Parsing XML with SAX in Python


#books.xml
<catalog>
  <book isbn="0-596-00128-2">
    <title>Python &amp; XML</title>
    <title>Python &amp; HTML</title>
    <date>December 2001</date>
    <author>Jones, Drake</author>
  </book>
  <book isbn="0-596-15810-6">
    <title>Programming Python, 4th Edition</title>
    <date>October 2010</date>
    <author>Lutz</author>
  </book>
  <book isbn="0-596-15806-8">
    <title>Learning Python, 4th Edition</title>
    <date>September 2009</date>
    <author>Lutz</author>
  </book>
  <book isbn="0-596-15808-4">
    <title>Python Pocket Reference, 4th Edition</title>
    <date>October 2009</date>
    <author>Lutz</author>
  </book>
  <book isbn="0-596-00797-3">
    <title>Python Cookbook, 2nd Edition</title>
    <date>March 2005</date>
    <author>Martelli, Ravenscroft, Ascher</author>
  </book>
  <book isbn="0-596-10046-9">
    <title>Python in a Nutshell, 2nd Edition</title>
    <date>July 2006</date>
    <author>Martelli</author>
  </book>
  <!-- plus many more Python books that should appear here -->
</catalog>

# -*- coding: utf-8 -*-

__author__ = 'hdfs'

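'''
In short, SAX parsing here works in three stages, matching the handler's
startElement, characters and endElement callbacks. SAX reads the document
linearly, so it stays efficient on large XML files.
'''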

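import pprint
import xml.sax
import xml.sax.handler


class BookHandler(xml.sax.handler.ContentHandler):

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.inTitle = False
        self.buffer = ""
        self.isbn = None
        self.mapping = {}

    def startElement(self, name, attrs):
        # Start of a <book> element: reset the buffer and remember the ISBN.
        if name == "book":
            self.buffer = ""
            self.isbn = attrs["isbn"]
        # Start of a <title> element.
        elif name == "title":
            self.inTitle = True

    def characters(self, data):
        # Inside a <title>, append the text to the buffer. The buffer is only
        # reset per <book>, so text from several <title> children is joined.
        if self.inTitle:
            self.buffer += data

    # End of an element.
    def endElement(self, name):
        if name == "title":
            self.inTitle = False
            self.mapping[self.isbn] = self.buffer


parser = xml.sax.make_parser()
handler = BookHandler()
parser.setContentHandler(handler)
parser.parse('books.xml')
pprint.pprint(handler.mapping)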

Result:

{u'0-596-00128-2': u'Python & XMLPython & HTML',
 u'0-596-00797-3': u'Python Cookbook, 2nd Edition',
 u'0-596-10046-9': u'Python in a Nutshell, 2nd Edition',
 u'0-596-15806-8': u'Learning Python, 4th Edition',
 u'0-596-15808-4': u'Python Pocket Reference, 4th Edition',
 u'0-596-15810-6': u'Programming Python, 4th Edition'}
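
In the result above, the first book's two titles are run together because the buffer is reset once per book rather than once per title. If separate titles are wanted, one possible variation is sketched below. It is not part of the original script: the names `TitleListHandler`, `parse_titles` and `SAMPLE` are made up for illustration, and the inline XML is a shortened stand-in for books.xml so that the sketch runs on its own.

```python
import xml.sax
import xml.sax.handler
from collections import defaultdict

# Hypothetical inline sample so the sketch runs without books.xml.
SAMPLE = b"""<catalog>
  <book isbn="0-596-00128-2">
    <title>Python &amp; XML</title>
    <title>Python &amp; HTML</title>
  </book>
  <book isbn="0-596-15810-6">
    <title>Programming Python, 4th Edition</title>
  </book>
</catalog>"""


class TitleListHandler(xml.sax.handler.ContentHandler):
    """Collect every <title> of a <book> as a separate list entry."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.in_title = False
        self.isbn = None
        self.chunks = []
        self.mapping = defaultdict(list)

    def startElement(self, name, attrs):
        if name == "book":
            self.isbn = attrs["isbn"]
        elif name == "title":
            self.in_title = True
            self.chunks = []  # reset per title, not per book

    def characters(self, data):
        # The parser may deliver one text node in several pieces,
        # so collect the pieces and join them when the element ends.
        if self.in_title:
            self.chunks.append(data)

    def endElement(self, name):
        if name == "title":
            self.in_title = False
            self.mapping[self.isbn].append("".join(self.chunks))


def parse_titles(xml_bytes):
    """Parse XML given as bytes and return {isbn: [title, ...]}."""
    handler = TitleListHandler()
    xml.sax.parseString(xml_bytes, handler)
    return dict(handler.mapping)


titles = parse_titles(SAMPLE)
print(titles)
```

Storing a list per ISBN keeps the streaming character of SAX while avoiding the joined string; whether that or the original concatenation is preferable depends on how the mapping is used afterwards.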
