148 lines
		
	
	
		
			5.2 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			148 lines
		
	
	
		
			5.2 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| :mod:`xml.dom.pulldom` --- Support for building partial DOM trees
 | |
| =================================================================
 | |
| 
 | |
| .. module:: xml.dom.pulldom
 | |
|    :synopsis: Support for building partial DOM trees from SAX events.
 | |
| 
 | |
| .. moduleauthor:: Paul Prescod <paul@prescod.net>
 | |
| 
 | |
| **Source code:** :source:`Lib/xml/dom/pulldom.py`
 | |
| 
 | |
| --------------
 | |
| 
 | |
| The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
 | |
| asked to produce DOM-accessible fragments of the document where necessary. The
 | |
| basic concept involves pulling "events" from a stream of incoming XML and
 | |
| processing them. In contrast to SAX which also employs an event-driven
 | |
| processing model together with callbacks, the user of a pull parser is
 | |
| responsible for explicitly pulling events from the stream, looping over those
 | |
| events until either processing is finished or an error condition occurs.
 | |
| 
 | |
| 
 | |
| .. warning::
 | |
| 
 | |
|    The :mod:`xml.dom.pulldom` module is not secure against
 | |
|    maliciously constructed data.  If you need to parse untrusted or
 | |
|    unauthenticated data see :ref:`xml-vulnerabilities`.
 | |
| 
 | |
| .. versionchanged:: 3.7.1
 | |
| 
 | |
|    The SAX parser no longer processes general external entities by default to
 | |
|    increase security by default. To enable processing of external entities,
 | |
|    pass a custom parser instance in::
 | |
| 
 | |
|       from xml.dom.pulldom import parse
 | |
|       from xml.sax import make_parser
 | |
|       from xml.sax.handler import feature_external_ges
 | |
| 
 | |
|       parser = make_parser()
 | |
|       parser.setFeature(feature_external_ges, True)
 | |
|       parse(filename, parser=parser)
 | |
| 
 | |
| 
 | |
| Example::
 | |
| 
 | |
|    from xml.dom import pulldom
 | |
| 
 | |
|    doc = pulldom.parse('sales_items.xml')
 | |
|    for event, node in doc:
 | |
|        if event == pulldom.START_ELEMENT and node.tagName == 'item':
 | |
|            if int(node.getAttribute('price')) > 50:
 | |
|                doc.expandNode(node)
 | |
|                print(node.toxml())
 | |
| 
 | |
| ``event`` is a constant and can be one of:
 | |
| 
 | |
| * :data:`START_ELEMENT`
 | |
| * :data:`END_ELEMENT`
 | |
| * :data:`COMMENT`
 | |
| * :data:`START_DOCUMENT`
 | |
| * :data:`END_DOCUMENT`
 | |
| * :data:`CHARACTERS`
 | |
| * :data:`PROCESSING_INSTRUCTION`
 | |
| * :data:`IGNORABLE_WHITESPACE`
 | |
| 
 | |
| ``node`` is an object of type :class:`xml.dom.minidom.Document`,
 | |
| :class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.
 | |
| 
 | |
| Since the document is treated as a "flat" stream of events, the document "tree"
 | |
| is implicitly traversed and the desired elements are found regardless of their
 | |
| depth in the tree. In other words, one does not need to consider hierarchical
 | |
| issues such as recursive searching of the document nodes, although if the
 | |
| context of elements were important, one would either need to maintain some
 | |
| context-related state (i.e. remembering where one is in the document at any
 | |
| given point) or to make use of the :func:`DOMEventStream.expandNode` method
 | |
| and switch to DOM-related processing.
 | |
| 
 | |
| 
 | |
| .. class:: PullDom(documentFactory=None)
 | |
| 
 | |
|    Subclass of :class:`xml.sax.handler.ContentHandler`.
 | |
| 
 | |
| 
 | |
| .. class:: SAX2DOM(documentFactory=None)
 | |
| 
 | |
|    Subclass of :class:`xml.sax.handler.ContentHandler`.
 | |
| 
 | |
| 
 | |
| .. function:: parse(stream_or_string, parser=None, bufsize=None)
 | |
| 
 | |
|    Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
 | |
|    either a file name, or a file-like object. *parser*, if given, must be an
 | |
|    :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
 | |
|    document handler of the
 | |
|    parser and activate namespace support; other parser configuration (like
 | |
|    setting an entity resolver) must have been done in advance.
 | |
| 
 | |
| If you have XML in a string, you can use the :func:`parseString` function instead:
 | |
| 
 | |
| .. function:: parseString(string, parser=None)
 | |
| 
 | |
|    Return a :class:`DOMEventStream` that represents the (Unicode) *string*.
 | |
| 
 | |
| .. data:: default_bufsize
 | |
| 
 | |
|    Default value for the *bufsize* parameter to :func:`parse`.
 | |
| 
 | |
|    The value of this variable can be changed before calling :func:`parse` and
 | |
|    the new value will take effect.
 | |
| 
 | |
| .. _domeventstream-objects:
 | |
| 
 | |
| DOMEventStream Objects
 | |
| ----------------------
 | |
| 
 | |
| .. class:: DOMEventStream(stream, parser, bufsize)
 | |
| 
 | |
|    .. deprecated:: 3.8
 | |
|       Support for :meth:`sequence protocol <__getitem__>` is deprecated.
 | |
| 
 | |
|    .. method:: getEvent()
 | |
| 
 | |
|       Return a tuple containing *event* and the current *node* as
 | |
|       :class:`xml.dom.minidom.Document` if event equals :data:`START_DOCUMENT`,
 | |
|       :class:`xml.dom.minidom.Element` if event equals :data:`START_ELEMENT` or
 | |
|       :data:`END_ELEMENT` or :class:`xml.dom.minidom.Text` if event equals
 | |
|       :data:`CHARACTERS`.
 | |
|       The current node does not contain information about its children, unless
 | |
|       :func:`expandNode` is called.
 | |
| 
 | |
|    .. method:: expandNode(node)
 | |
| 
 | |
|       Expands all children of *node* into *node*. Example::
 | |
| 
 | |
|           from xml.dom import pulldom
 | |
| 
 | |
|           xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
 | |
|           doc = pulldom.parseString(xml)
 | |
|           for event, node in doc:
 | |
|               if event == pulldom.START_ELEMENT and node.tagName == 'p':
 | |
|                   # Following statement only prints '<p/>'
 | |
|                   print(node.toxml())
 | |
|                   doc.expandNode(node)
 | |
|                   # Following statement prints node with all its children '<p>Some text <div>and more</div></p>'
 | |
|                   print(node.toxml())
 | |
| 
 | |
|    .. method:: DOMEventStream.reset()
 | |
| 
 |