Simple API for XML
== Benefits ==

A SAX parser only needs to report each parsing event as it happens, and normally discards almost all of that information once reported (it does, however, keep some things, for example a list of all elements that have not yet been closed, in order to catch later errors such as end-tags in the wrong order). Thus, the minimum memory required for a SAX parser is proportional to the maximum depth of the XML file (i.e., of the XML tree) and the maximum data involved in a single XML event (such as the name and attributes of a single start-tag, or the content of a processing instruction). This much [[Memory (computers)|memory]] is usually considered negligible.

A DOM parser, in contrast, must first build a tree representation of the entire document in memory, so its memory use grows with the length of the document. This takes considerable time and space for large documents (memory allocation and data-structure construction take time).<ref>{{cite conference|last1=Wu|first1=D.|last2=Chau|first2=K. T.|last3=Wang|first3=J.|last4=Pan|first4=C.|date=January 2019|title=A comparative study on performance of XML parser APIs (DOM and SAX) in parsing efficiency.|conference=3rd International Conference on Cryptography, Security and Privacy|location=Kuala Lumpur, Malaysia|publisher=Association for Computing Machinery|pages=88–92|doi=10.1145/3309074.3309124}}</ref> The compensating advantage is that once loaded, ''any'' part of the document can be accessed in any order.
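The constant-memory behaviour described above can be sketched with Python's standard <code>xml.sax</code> module: the handler below keeps only the list of currently open elements (the per-depth state the text mentions), never a tree, so its memory use is bounded by the nesting depth rather than the document length.

```python
import xml.sax

class DepthTracker(xml.sax.ContentHandler):
    """Keeps only the stack of open elements, never the whole document."""
    def __init__(self):
        super().__init__()
        self.open_elements = []  # one entry per currently open element
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.max_depth = max(self.max_depth, len(self.open_elements))

    def endElement(self, name):
        # The open-element list is what lets a SAX parser catch
        # end-tags in the wrong order.
        assert self.open_elements[-1] == name
        self.open_elements.pop()

handler = DepthTracker()
xml.sax.parseString(b"<a><b><c/></b><b/></a>", handler)
print(handler.max_depth)  # 3
```

After parsing, the stack is empty again; at no point did the handler hold more than one entry per level of nesting.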
Because of the event-driven nature of SAX, processing documents is generally far faster than with DOM-style parsers, ''so long as'' the processing can be done in a single start-to-end pass.<ref>{{cite conference|first=Chao|last=Wang|year=2020|title=XML Parsing Technique|conference=Innovative Computing: IC 2020|location=Kuala Lumpur, Malaysia|publisher=Springer|pages=1519–1526|doi=10.1007/978-981-15-5959-4_185}}</ref> Many tasks, such as indexing, conversion to other formats, and very simple formatting, can be done that way. Other tasks, such as sorting, rearranging sections, getting from a link to its target, or looking up information on one element to help process a later one, require accessing the document structure in complex orders and will be much faster with DOM than with multiple SAX passes.

Some implementations do not neatly fit either category: a DOM approach can keep its [[persistent data]] on disk, cleverly organized for speed (editors such as [[SoftQuad Author/Editor]] and large-document browser/indexers such as [[DynaText]] do this), while a SAX approach can cache information for later use (any validating SAX parser keeps more information than described above). Such implementations blur the DOM/SAX tradeoffs, but are often very effective in practice.

Due to the nature of DOM, streamed reading from disk requires techniques such as [[lazy evaluation]], caches, [[virtual memory]], or persistent data structures (one such technique is disclosed in US patent 5557722). Processing XML documents larger than main memory is sometimes thought impossible because some DOM parsers do not allow it.
However, it is no less feasible than sorting a dataset larger than main memory: disk space can serve as auxiliary memory to sidestep the limitation.<ref>{{cite web |first=Frank|last=Charlie | access-date = 2011-10-20 | url = http://www.devx.com/xml/article/16922/1954 | publisher = devX | title = XML Parsers: DOM and SAX Put to the Test | date = February 2001 | quote = Although these tests do not show it, SAX parsers typically are faster for very large documents where the DOM model hits virtual memory or consumes all available memory. }}</ref>
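The single-pass, streaming style that makes SAX suitable for documents larger than main memory can be sketched with Python's <code>xml.sax</code> incremental parser interface. The <code>item</code> element name is an illustrative assumption, not part of any real schema; the point is that the document is fed in chunks and only the current chunk plus the handler's small state are ever resident.

```python
import xml.sax

class ItemCounter(xml.sax.ContentHandler):
    """Single-pass task: count <item> elements without building a tree."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":  # hypothetical element name for illustration
            self.count += 1

parser = xml.sax.make_parser()
handler = ItemCounter()
parser.setContentHandler(handler)

# Feed the document in small chunks, as if streaming from a file far
# larger than main memory; memory use stays constant throughout.
parser.feed(b"<items>")
for _ in range(1000):
    parser.feed(b"<item/>")
parser.feed(b"</items>")
parser.close()
print(handler.count)  # 1000
```

A DOM parser handed the same input would allocate a node per element before the count could even begin; the SAX version's footprint is independent of the number of elements.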