== Programming interfaces ==
The design goals of XML include, "It shall be easy to write programs which process XML documents."{{sfnp|Bray|Paoli|Sperberg-McQueen|Maler|2008|loc=section 1.1}} Despite this, the XML specification contains almost no information about how programmers might go about doing such processing. The [[XML Infoset]] specification provides a vocabulary to refer to the constructs within an XML document, but does not provide any guidance on how to access this information. A variety of [[API]]s for accessing XML have been developed and used, and some have been standardized.

Existing APIs for XML processing tend to fall into these categories:
* Stream-oriented APIs accessible from a programming language, for example [[Simple API for XML|SAX]] and [[StAX]].
* Tree-traversal APIs accessible from a programming language, for example [[DOM (XML API)|DOM]].
* [[XML data binding]], which provides an automated translation between an XML document and programming-language objects.
* Declarative transformation languages such as [[XSLT]] and [[XQuery]].
* Syntax extensions to general-purpose programming languages, for example [[LINQ]] and [[Scala (programming language)|Scala]].

Stream-oriented facilities require less memory and, for certain tasks based on a linear traversal of an XML document, are faster and simpler than other alternatives. Tree-traversal and data-binding APIs typically require the use of much more memory, but are often found more convenient for use by programmers; some include declarative retrieval of document components via the use of XPath expressions.
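As an illustration of declarative retrieval from an in-memory tree, the following minimal sketch uses the limited XPath subset supported by Python's standard <code>xml.etree.ElementTree</code> module. The sample document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = (
    "<library>"
    "<book year='1965'><title>Dune</title></book>"
    "<book year='1815'><title>Emma</title></book>"
    "</library>"
)

# The whole document is parsed into a tree held in memory.
root = ET.fromstring(DOC)

# The path expression states which nodes are wanted, not how to reach them:
# the <title> child of every <book> whose "year" attribute equals 1965.
titles = [t.text for t in root.findall("./book[@year='1965']/title")]
print(titles)
```

ElementTree implements only part of XPath; full XPath support generally requires a third-party library.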

XSLT is designed for declarative description of XML document transformations, and has been widely implemented both in server-side packages and Web browsers. XQuery overlaps XSLT in its functionality, but is designed more for searching of large [[XML database]]s.

=== Simple API for XML ===
{{Main|Simple API for XML}}

[[Simple API for XML]] (SAX) is a [[Lexical analysis|lexical]], [[Event-driven programming|event-driven]] API in which a document is read serially and its contents are reported as [[callbacks]] to various [[Method (computer science)|methods]] on a [[Event handler|handler object]] of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed. It is better suited to situations in which certain types of information are always handled the same way, no matter where they occur in the document.
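The callback style can be sketched with Python's standard <code>xml.sax</code> module. The handler class <code>TitleCollector</code> and the sample document are assumptions made for this example; the handler has to track for itself whether the parser is currently inside a <code>title</code> element.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element, wherever it occurs."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False   # state the handler must maintain itself
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # The parser may report one run of text in several chunks.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

# Hypothetical sample document, made up for this illustration.
DOC = (
    b"<library>"
    b"<book><title>Dune</title></book>"
    b"<book><title>Emma</title></book>"
    b"</library>"
)

handler = TitleCollector()
xml.sax.parseString(DOC, handler)   # the parser drives; the handler reacts
print(handler.titles)
```

Because every <code>title</code> element is handled the same way regardless of position, the handler needs only a single flag; extracting data that depends on the surrounding context would require more bookkeeping.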

=== Pull parsing ===
Pull parsing treats the document as a series of items read in sequence using the [[Iterator pattern|iterator design pattern]]. This makes it possible to write [[recursive descent parser]]s in which the structure of the parsing code mirrors the structure of the XML being parsed. Intermediate results can be held in local variables within the parsing functions, passed down as function parameters to lower-level functions, or returned as function return values to higher-level functions.<ref>{{cite web|url=http://www.xml.com/pub/a/2005/07/06/tr.html|title=Push, Pull, Next!|first=Bob|last=DuCharme|website=Xml.com|access-date=16 November 2017}}</ref> Examples of pull parsers include Data::Edit::Xml in [[Perl]], [[StAX]] in the [[Java (programming language)|Java]] programming language, XMLPullParser in [[Smalltalk]], XMLReader in [[PHP]], ElementTree.iterparse in [[Python (programming language)|Python]], SmartXML in [[Red (programming language)|Red]], System.Xml.XmlReader in the [[.NET Framework]], and the DOM traversal API (NodeIterator and TreeWalker).

A pull parser creates an iterator that sequentially visits the various elements, attributes, and data in an XML document. Code that uses this iterator can test the current item (to tell, for example, whether it is a start-tag, an end-tag, or text), inspect its properties (local name, [[XML namespace|namespace]], values of XML attributes, value of text, etc.), and move the iterator to the next item. The code can thus extract information from the document as it traverses it. The recursive-descent approach lends itself to keeping data as typed local variables in the parsing code, whereas SAX typically requires the application to maintain intermediate data manually, for example in a stack that tracks the ancestors of the element being parsed. Pull-parsing code can therefore be more straightforward to understand and maintain than SAX parsing code.
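The recursive-descent style can be sketched with Python's <code>ElementTree.iterparse</code>, requesting both start and end events. The function names <code>parse_order</code> and <code>parse_item</code> and the sample document are hypothetical; the point is that each function consumes events from the shared iterator and keeps its intermediate data in local variables.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = b"<order id='7'><item qty='2'>pen</item><item qty='1'>ink</item></order>"

def parse_item(events, start_elem):
    """Consume events up to the matching </item>; return (name, quantity)."""
    qty = int(start_elem.get("qty"))   # attributes are available at "start"
    for event, elem in events:
        if event == "end" and elem is start_elem:
            return elem.text, qty      # text is complete at "end"
    raise ValueError("unterminated <item>")

def parse_order(events):
    """Consume events for one <order>; return (order id, list of items)."""
    order_id = None
    items = []                         # intermediate data lives in locals
    for event, elem in events:
        if event == "start" and elem.tag == "order":
            order_id = elem.get("id")
        elif event == "start" and elem.tag == "item":
            # Hand the same iterator to the lower-level function.
            items.append(parse_item(events, elem))
        elif event == "end" and elem.tag == "order":
            return order_id, items
    raise ValueError("no complete <order> found")

events = iter(ET.iterparse(io.BytesIO(DOC), events=("start", "end")))
order_id, items = parse_order(events)
print(order_id, items)
```

The calling code pulls each event when it is ready for it, so no explicit element stack is needed; the call stack of the parsing functions plays that role.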

=== Document Object Model ===
{{Main|Document Object Model}}

The [[Document Object Model]] (DOM) is an interface that allows navigation of the entire document as if it were a tree of [[Node (computer science)|node]] [[Object (computer science)|objects]] representing the document's contents. A DOM document can be created by a parser, or can be generated manually by users (with limitations). Data types in DOM nodes are abstract; implementations provide their own programming language-specific [[language binding|bindings]]. DOM implementations tend to be [[memory]]-intensive, as they generally require the entire document to be loaded into memory and constructed as a tree of objects before access is allowed.
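A minimal sketch of both uses, navigating a parsed tree and building nodes by hand, is shown below using <code>xml.dom.minidom</code>, the lightweight DOM implementation in Python's standard library. The sample document is invented for the example.

```python
from xml.dom import minidom

# Hypothetical sample document, made up for this illustration.
DOC = "<library><book year='1965'><title>Dune</title></book></library>"

# Parsing builds the complete tree in memory before any access is possible.
dom = minidom.parseString(DOC)
root = dom.documentElement

# Navigate the tree of node objects.
book = root.getElementsByTagName("book")[0]
title = book.getElementsByTagName("title")[0]
title_text = title.firstChild.data     # the text node under <title>

# Nodes can also be created manually and attached to the tree.
new_book = dom.createElement("book")
new_book.setAttribute("year", "1815")
new_title = dom.createElement("title")
new_title.appendChild(dom.createTextNode("Emma"))
new_book.appendChild(new_title)
root.appendChild(new_book)

serialized = root.toxml()
print(title_text)
print(serialized)
```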

=== Data binding ===
[[XML data binding]] is a technique for simplifying development of applications that need to work with XML documents. It involves mapping the XML document to a hierarchy of strongly typed objects, rather than using the generic objects created by a DOM parser. The resulting code is often easier to read and maintain, and it can help to identify problems at compile time rather than at run time. XML data binding is particularly well-suited to applications where the document structure is known and fixed at the time the application is written. With a strongly typed representation of the XML data, developers can take advantage of [[integrated development environment]] (IDE) features such as auto-completion, code refactoring, and syntax highlighting, which can make it easier to write correct code. Example data-binding systems include the [[Java Architecture for XML Binding]] (JAXB), XML Serialization in [[.NET Framework]],<ref>{{cite web|first=Dare|last=Obasanjo|url=http://msdn.microsoft.com/en-us/library/ms950721.aspx|title=XML Serialization in the .NET Framework|website=Microsoft Developer Network|date=30 June 2006 |access-date=31 July 2009}}</ref> and XML serialization in [[gSOAP]].
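The idea can be sketched in Python with hand-written mapping functions. This is only an illustration of the resulting programming model: data-binding tools such as those named above typically generate the equivalent mapping automatically from a schema or from annotated classes. The classes <code>Book</code> and <code>Library</code>, the two helper functions, and the document layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET

@dataclass
class Book:
    title: str
    year: int          # typed field instead of a generic attribute string

@dataclass
class Library:
    books: List[Book]

def library_from_xml(text: str) -> Library:
    """Map a <library> document onto typed objects (hand-written binding)."""
    root = ET.fromstring(text)
    return Library(
        books=[
            Book(title=b.findtext("title"), year=int(b.get("year")))
            for b in root.findall("book")
        ]
    )

def library_to_xml(library: Library) -> str:
    """Serialize the typed objects back to XML."""
    root = ET.Element("library")
    for book in library.books:
        node = ET.SubElement(root, "book", year=str(book.year))
        ET.SubElement(node, "title").text = book.title
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample document, made up for this illustration.
DOC = "<library><book year='1965'><title>Dune</title></book></library>"

library = library_from_xml(DOC)
print(library.books[0].title, library.books[0].year + 1)  # year is an int
round_trip = library_from_xml(library_to_xml(library))
```

Application code then works with <code>library.books[0].year</code> as an integer rather than querying generic tree nodes, which is what lets type checkers and IDEs catch mistakes early.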

=== XML as data type ===
XML has appeared as a [[first-class data type]] in several languages. The [[ECMAScript for XML]] (E4X) extension to the [[ECMAScript]]/JavaScript language explicitly defines two specific objects (XML and XMLList) for JavaScript, which support XML document nodes and XML node lists as distinct objects and use a dot-notation specifying parent-child relationships.<ref>{{cite web|title=Processing XML with E4X|url=https://developer.mozilla.org/en/core_javascript_1.5_guide/processing_xml_with_e4x|work=Mozilla Developer Center|publisher=Mozilla Foundation|access-date=2010-07-27|archive-date=2011-05-01|archive-url=https://web.archive.org/web/20110501151224/https://developer.mozilla.org/en/core_javascript_1.5_guide/processing_xml_with_e4x|url-status=dead}}</ref> E4X is supported by the [[Mozilla]] 2.5+ browsers (though now deprecated) and Adobe [[ActionScript]], but has not been widely adopted. Similar notations are used in Microsoft's [[LINQ]] implementation for Microsoft .NET 3.5 and above, and in [[Scala (programming language)|Scala]] (which uses the Java VM). The open-source xmlsh application, which provides a Linux-like shell with special features for XML manipulation, similarly treats XML as a data type, using the <[ ]> notation.<ref>{{cite web|url=http://www.xmlsh.org/CoreSyntax|title=XML Shell: Core Syntax|website=Xmlsh.org|date=2010-05-13|access-date=22 August 2010}}</ref> The [[Resource Description Framework]] defines a data type <code>rdf:XMLLiteral</code> to hold wrapped, [[canonical XML]].<ref>{{cite web|editor1-first=G.|editor1-last=Klyne|editor2-first=J. J.|editor2-last=Carroll|url=https://www.w3.org/TR/rdf10-concepts#dfn-rdf-XMLLiteral|date=10 February 2004|title=Resource Description Framework (RDF): Concepts and Abstract Syntax|format=W3C Recommendation|publisher=W3C|at=section 5.1}}</ref> Facebook has produced extensions to the [[PHP]] and [[JavaScript]] languages that add XML to the core syntax in a similar fashion to E4X, namely [[XHP]] and [[React (JavaScript library)#JSX|JSX]] respectively.