Python module: xml.etree.ElementTree


<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Parsing an XML file

The parse() function reads an XML file and returns an ElementTree.

from xml.etree.ElementTree import parse

tree = parse('demo.xml')  # get the ElementTree

root = tree.getroot()  # get the root element
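
When the XML is already in memory rather than in a file such as demo.xml, the same tree can be built without touching the disk. The following is a minimal sketch, assuming a shortened stand-in document (XML_TEXT is a hypothetical name, not part of the original example):

```python
import io
from xml.etree.ElementTree import fromstring, parse

# Hypothetical in-memory document standing in for demo.xml.
XML_TEXT = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
    </country>
</data>
"""

# fromstring() parses text and returns the root Element directly.
root_from_text = fromstring(XML_TEXT)

# parse() also accepts a file-like object and returns an ElementTree.
tree = parse(io.StringIO(XML_TEXT))
root_from_tree = tree.getroot()

print(root_from_text.tag, root_from_tree.tag)
```

Note the difference in return types: fromstring() gives back an Element, while parse() gives back an ElementTree that still needs getroot().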

Element.tag, Element.attrib, and Element.text

In [6]: root.tag

Out[6]: 'data'

In [7]: root.attrib

Out[7]: {}

In [25]: root.text

Out[25]: '\n '
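
The whitespace-only root.text above comes from the indentation between the opening tag and its first child. A small sketch, assuming a hypothetical one-country document, shows where text and the related tail attribute end up:

```python
from xml.etree.ElementTree import fromstring

# Hypothetical one-country document; the indentation is part of the text.
root = fromstring('<data>\n    <country name="Panama"><rank>68</rank></country>\n</data>')
country = root[0]
rank = country[0]

print(repr(root.text))     # whitespace between <data> and its first child
print(country.attrib)      # attributes as a plain dict
print(int(rank.text))      # text is always a str, so convert explicitly
print(repr(country.tail))  # whitespace after </country>
```

Calling strip() on text is a common way to ignore indentation when only the real content matters.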

for child in root: iterate over the direct child elements

In [8]: for child in root:

   ...:     print(child.tag, child.attrib)

...:

country {'name': 'Liechtenstein'}

country {'name': 'Singapore'}

country {'name': 'Panama'}

Element.get(): get the value of an attribute

In [27]: for child in root:

   ...:     print(child.tag, child.get('name'))

...:

country Liechtenstein

country Singapore

country Panama
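
Element.get() also takes an optional default, which is handy when an attribute may be absent. A minimal sketch, assuming a hypothetical population attribute that the sample data does not define:

```python
from xml.etree.ElementTree import fromstring

country = fromstring('<country name="Singapore"><rank>4</rank></country>')

name = country.get("name")                       # attribute present
missing = country.get("population")              # absent: returns None
fallback = country.get("population", "unknown")  # absent: returns the default

print(name, missing, fallback)
```

Unlike country.attrib["population"], which would raise KeyError, get() never raises for a missing attribute.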

root.getchildren(): get the direct child elements (deprecated, and removed in Python 3.9)

In [21]: root.getchildren()

Out[21]:

[<Element 'country' at 0x7f673581c728>,

<Element 'country' at 0x7f673581ca98>,

<Element 'country' at 0x7f673581cc28>]
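
On Python 3.9 and later, getchildren() no longer exists and the call above raises AttributeError. A sketch of the usual replacement, using a trimmed version of the sample data:

```python
from xml.etree.ElementTree import fromstring

root = fromstring(
    "<data>"
    '<country name="Liechtenstein"/>'
    '<country name="Singapore"/>'
    '<country name="Panama"/>'
    "</data>"
)

children = list(root)  # direct children only, in document order
print(len(root))
print([child.get("name") for child in children])
```

Since an Element behaves like a sequence of its children, len(root) and list(root) cover what getchildren() used to do.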

root[0][1]: access child elements by index

In [9]: root[0][1].text

Out[9]: '2008'

In [10]: root[1][0].text

Out[10]: '4'
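
Indexing follows the usual sequence rules, so negative indexes and slices work as well. A short sketch under the assumption of a trimmed copy of the sample data:

```python
from xml.etree.ElementTree import fromstring

root = fromstring(
    "<data>"
    '<country name="Liechtenstein"><rank>1</rank><year>2008</year></country>'
    '<country name="Singapore"><rank>4</rank><year>2011</year></country>'
    '<country name="Panama"><rank>68</rank><year>2011</year></country>'
    "</data>"
)

last = root[-1]        # negative indexes count from the end
first_two = root[0:2]  # a slice returns a list of elements

print(last.get("name"))
print([c.get("name") for c in first_two])

try:
    root[3]            # there are only three countries
except IndexError:
    print("no fourth country")
```

Index-based access is brittle if the document layout changes, so looking elements up by tag is often the safer choice.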

root.find(): find direct children by tag, returning the first matching element

In [13]: root.find('country').attrib

Out[13]: {'name': 'Liechtenstein'}
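
When nothing matches, find() returns None, so chaining .attrib or .text onto the result can fail. A minimal sketch of guarding against that, with city and population as hypothetical tags that are not in the sample data:

```python
from xml.etree.ElementTree import fromstring

root = fromstring(
    "<data>"
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Singapore"><rank>4</rank></country>'
    "</data>"
)

first = root.find("country")  # first matching direct child
missing = root.find("city")   # no such child: None

# findtext() returns the text of the first match, or a default.
rank_text = first.findtext("rank")
population = first.findtext("population", default="n/a")

print(first.get("name"), missing, rank_text, population)
```

findtext() saves the explicit None check when only the text of the child is needed.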

root.findall(): find direct children by tag, returning a list of all matching elements

In [16]: for country in root.findall('country'):

    ...:     print(country.attrib)

...:

{'name': 'Liechtenstein'}

{'name': 'Singapore'}

{'name': 'Panama'}
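
With a bare tag name, findall() looks only at direct children. The sketch below, using a trimmed copy of the sample data, illustrates that a grandchild tag yields an empty list unless the path spells out the intermediate level:

```python
from xml.etree.ElementTree import fromstring

root = fromstring(
    "<data>"
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Singapore"><rank>4</rank></country>'
    '<country name="Panama"><rank>68</rank></country>'
    "</data>"
)

direct = root.findall("rank")          # rank is not a direct child of data
nested = root.findall("country/rank")  # path through the country level

print(direct)
print([r.text for r in nested])
```

An empty list rather than an exception is returned when nothing matches, so the result is always safe to loop over.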

root.iterfind(): find direct children by tag, returning a generator over all matching elements

In [22]: root.iterfind('country')

Out[22]: <generator object prepare_child.<locals>.select at 0x7f6736dccfc0> 
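
Because iterfind() is lazy, matches are produced only as they are consumed. A small sketch, again on a trimmed copy of the sample data:

```python
from xml.etree.ElementTree import fromstring

root = fromstring(
    "<data>"
    '<country name="Liechtenstein"/>'
    '<country name="Singapore"/>'
    '<country name="Panama"/>'
    "</data>"
)

matches = root.iterfind("country")

first = next(matches)                       # consume one match
rest = [c.get("name") for c in matches]     # the generator continues from there

print(first.get("name"), rest)
```

This can be useful for stopping early on large documents, whereas findall() always builds the full list.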

Supported XPath expressions (XML Path)

In [19]: root.findall('.//rank')  # find rank elements at any depth

Out[19]:

[<Element 'rank' at 0x7f673581c8b8>,

<Element 'rank' at 0x7f673581c6d8>,

<Element 'rank' at 0x7f673581cc78>]

In [32]: root.findall('country/*')  # find grandchildren: every child of each country

Out[32]:

[<Element 'rank' at 0x7f673581c8b8>,

<Element 'year' at 0x7f673581cbd8>,

<Element 'gdppc' at 0x7f673581c958>,

<Element 'neighbor' at 0x7f673581c688>,

<Element 'neighbor' at 0x7f673581cb38>,

<Element 'rank' at 0x7f673581c6d8>,

<Element 'year' at 0x7f673581c5e8>,

<Element 'gdppc' at 0x7f673581c868>,

<Element 'neighbor' at 0x7f673581cb88>,

<Element 'rank' at 0x7f673581cc78>,

<Element 'year' at 0x7f673581ccc8>,

<Element 'gdppc' at 0x7f673581cd18>,

<Element 'neighbor' at 0x7f673581cd68>,

<Element 'neighbor' at 0x7f673581cdb8>]

In [33]: root.findall('.//rank/..')  # .. selects the parent element

Out[33]:

[<Element 'country' at 0x7f673581c728>,

<Element 'country' at 0x7f673581ca98>,

<Element 'country' at 0x7f673581cc28>]

In [34]: root.findall('country[@name]')  # country elements that have a name attribute

Out[34]:

[<Element 'country' at 0x7f673581c728>,

<Element 'country' at 0x7f673581ca98>,

<Element 'country' at 0x7f673581cc28>]

In [35]: root.findall('country[@name="Singapore"]')  # country elements whose name attribute is Singapore

Out[35]: [<Element 'country' at 0x7f673581ca98>]

In [36]: root.findall('country[rank]')  # country elements that have a rank child

Out[36]:

[<Element 'country' at 0x7f673581c728>,

<Element 'country' at 0x7f673581ca98>,

<Element 'country' at 0x7f673581cc28>]

In [37]: root.findall('country[rank="68"]')  # country elements with a rank child whose text is 68

Out[37]: [<Element 'country' at 0x7f673581cc28>]

In [38]: root.findall('country[1]')  # the first country (positions are 1-based)

Out[38]: [<Element 'country' at 0x7f673581c728>]

In [39]: root.findall('country[last()]')  # the last country

Out[39]: [<Element 'country' at 0x7f673581cc28>]

In [40]: root.findall('country[last()-1]')  # the second-to-last country

Out[40]: [<Element 'country' at 0x7f673581ca98>]
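
The predicates above return Element objects, so a typical next step is to pull names or text out of the matches. A sketch that combines child-text and attribute predicates on a trimmed copy of the sample data:

```python
from xml.etree.ElementTree import fromstring

root = fromstring(
    "<data>"
    '<country name="Liechtenstein"><year>2008</year>'
    '<neighbor name="Austria" direction="E"/>'
    '<neighbor name="Switzerland" direction="W"/></country>'
    '<country name="Singapore"><year>2011</year>'
    '<neighbor name="Malaysia" direction="N"/></country>'
    '<country name="Panama"><year>2011</year>'
    '<neighbor name="Costa Rica" direction="W"/>'
    '<neighbor name="Colombia" direction="E"/></country>'
    "</data>"
)

# Countries whose year child has the text 2011.
names_2011 = [c.get("name") for c in root.findall("country[year='2011']")]

# Neighbors of any country that lie to the west.
west = [n.get("name") for n in root.findall("country/neighbor[@direction='W']")]

print(names_2011)
print(west)
```

ElementTree implements only a subset of XPath, so expressions beyond the forms listed in this section may be rejected or unsupported.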

root.iter(): recursively iterate over the element and all of its descendants, or only those with a given tag

In [29]: root.iter()

Out[29]: <_elementtree._element_iterator at 0x7f67355dd728>

In [30]: list(root.iter())

Out[30]:

[<Element 'data' at 0x7f673581c778>,

<Element 'country' at 0x7f673581c728>,

<Element 'rank' at 0x7f673581c8b8>,

<Element 'year' at 0x7f673581cbd8>,

<Element 'gdppc' at 0x7f673581c958>,

<Element 'neighbor' at 0x7f673581c688>,

<Element 'neighbor' at 0x7f673581cb38>,

<Element 'country' at 0x7f673581ca98>,

<Element 'rank' at 0x7f673581c6d8>,

<Element 'year' at 0x7f673581c5e8>,

<Element 'gdppc' at 0x7f673581c868>,

<Element 'neighbor' at 0x7f673581cb88>,

<Element 'country' at 0x7f673581cc28>,

<Element 'rank' at 0x7f673581cc78>,

<Element 'year' at 0x7f673581ccc8>,

<Element 'gdppc' at 0x7f673581cd18>,

<Element 'neighbor' at 0x7f673581cd68>,

<Element 'neighbor' at 0x7f673581cdb8>]

In [31]: list(root.iter('rank'))

Out[31]:

[<Element 'rank' at 0x7f673581c8b8>,

<Element 'rank' at 0x7f673581c6d8>,

<Element 'rank' at 0x7f673581cc78>]
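
Since iter() walks the whole subtree in document order, it is a convenient way to collect values without knowing the nesting depth. A minimal sketch on a trimmed copy of the sample data (the variable names are illustrative):

```python
from xml.etree.ElementTree import fromstring

root = fromstring(
    "<data>"
    '<country name="Liechtenstein"><rank>1</rank>'
    '<neighbor name="Austria" direction="E"/>'
    '<neighbor name="Switzerland" direction="W"/></country>'
    '<country name="Singapore"><rank>4</rank>'
    '<neighbor name="Malaysia" direction="N"/></country>'
    '<country name="Panama"><rank>68</rank>'
    '<neighbor name="Costa Rica" direction="W"/>'
    '<neighbor name="Colombia" direction="E"/></country>'
    "</data>"
)

# All rank values, converted from text to int.
ranks = [int(r.text) for r in root.iter("rank")]

# Map each country name to the names of its neighbors.
neighbors = {
    c.get("name"): [n.get("name") for n in c.iter("neighbor")]
    for c in root.iter("country")
}

print(ranks)
print(neighbors["Panama"])
```

Note that iter() without a tag also yields the element it was called on, which is why data itself appears first in the listing above.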

  
