Fundamental XML Standards and Interfaces
February 24, 2013

1 Specifications and validity of XML

1.1 Up-to-date Specifications of XML
• Original specification (W3C Recommendation) XML 1.0 at W3C: http://www.w3.org/XML/
• 5th Edition (corrections and updates, no major changes): Extensible Markup Language (XML) 1.0 (Fifth Edition) (http://www.w3.org/TR/REC-xml)
• Commented version at XML.COM (Annotated XML): http://www.xml.com/pub/a/axml/axmlintro.html
• XML 1.1 (Second Edition) (http://www.w3.org/TR/xml11) - changes induced by the introduction of Unicode 3, easier normalization, and a specified procedure for handling "end of line" characters. XML 1.1 is not bound to a specific version of Unicode; it always follows the latest version.

1.2 Which version to use?
Which version to use in new applications? See the W3C XML Core Working Group (http://www.w3.org/XML/Core/#Publications) for the answer:
• unless you are writing a parser or an XML-generating application (editor), use XML 1.0 (backward compatibility)
• new parsers should "know" XML 1.1

1.3 Validity of XML documents
• To repeat: every XML document must be WELL-FORMED.
• New: an XML document can also be VALID, which is a stricter requirement than well-formedness. Validity usually means conformance to a DTD (Document Type Definition) or, more recently, conformance to an XML Schema or another schema (RelaxNG, Schematron).

2 Document Type Definition (DTD)

2.1 Document Type Definition (DTD)
• Document Type Definition (the use of, or reference to, this definition in a document is a Document Type Declaration).
• Specified in the (core) XML 1.0 standard.
• Describes the allowed element content, the presence and content of attributes and their default values, and defines the entities used.
• A DTD may be internal, external (internal and external subset), or "mixed", i.e. both.
• A document conforming to a DTD is called valid.
• DTD and languages with a similar purpose are called modeling languages: they model/define concrete markups.
• The syntax of DTD IS NOT XML (in contrast to XML Schema and many other modeling languages).

2.2 Motivation for DTD, comparison, pros and cons
Problems with DTD?
• The fundamental problem of DTD is its incompatibility with XML Namespaces and
• its lack of modeling expressiveness: some constructs cannot be constrained by a DTD.
• The direct, more powerful, but also more complex alternative is W3C XML Schema (http://www.w3.org/XML/Schema).
• Powerful and simpler alternatives to XML Schema are e.g. RelaxNG (http://relaxng.org); on Wikipedia: RELAX NG (http://en.wikipedia.org/wiki/RELAX_NG).

2.3 Why use DTD?
Why use DTD at all?
• Simple. All parsers are fine with it.
• Sufficient for many markups.

2.4 DTD - tutorials
• Webreview: http://www.webreview.com/2000/08_11/developers/08_11_00_2.shtml
• ZVON: http://www.zvon.org/xxl/DTDTutorial/General/contents.html
• XML DTD Tutorial (101): http://www.xml101.com/dtd/
• W3Schools DTD Tutorial: http://www.w3schools.com (http://www.w3school.com)

2.5 DTD in more details / 1
The DTD declaration is placed immediately before the root element! Its general form is <!DOCTYPE root-element external-identifier [ internal subset ]>.
• The internal part and the external part (internal or external subset) may each be present or absent; both can be present at once.

2.6 DTD in more details / 2
The external identifier can be either
• PUBLIC "PUBLIC ID" "URI" (suitable for "public", generally recognized DTDs) or
• SYSTEM "URI" - for private or otherwise not well established DTDs (the "URI" need not be a real URL on the network; it may also be a file on the (local) filesystem, and it is resolved according to the system where the resolution takes place).
The internal and external parts have the same significance; they must not conflict (e.g. by defining the same element twice).
A DTD contains a list of definitions of individual elements, their attribute lists, entities and notations.

2.7 DTD - conditional sections
For "commenting out" portions of DTDs, e.g. for experimenting.
• <![IGNORE[ ... ]]>
• <![INCLUDE[ ... ]]>

2.8 DTD - element type definition / 1
Describes the allowed content of the element, in the form <!ELEMENT element-name ...>, where ... can be
• EMPTY - for an empty element, which may be written as <element-name/> or <element-name></element-name> with the same logical meaning
• ANY - any element content is allowed, i.e. text nodes, child elements, ...
• a specification of child elements - <!ELEMENT element-name (child elements specification)>
• mixed - containing both text and child elements given by enumeration: <!ELEMENT element-name (#PCDATA | child1 | child2)*>
• for MIXED content, neither the order nor the cardinality of concrete child elements can be specified.
• The star (*) is required: any cardinality is always allowed.

2.9 DTD - element type definition / 2
For specifying the child elements, we use:
• the sequence operator , (followed by)
• the choice operator | (select one of)
• parentheses () with their usual meaning
• the two operators , and | CANNOT be combined within one group
• the cardinality (occurrence) of child elements can be specified/limited by "star", "question mark" and "plus", with their usual meaning.
• No specifier means exactly one occurrence is allowed.

2.10 DTD - attribute definition
Describes the (data) type and/or implicit values of attributes of the respective element, in the form <!ATTLIST element-name attribute-name value-type default>.

2.11 DTD - definition of attribute value type
Allowed value types are as follows:
• CDATA
• NMTOKEN
• NMTOKENS
• ID
• IDREF
• IDREFS
• ENTITY
• ENTITIES
• enumeration - e.g. (value1|value2|value3)
• enumeration of notations - e.g. NOTATION (notation1|notation2|notation3)

2.12 DTD - cardinality of attributes
The presence of an attribute may be obligatory:
• #REQUIRED - the attribute is required
• #IMPLIED - the attribute is optional
• #FIXED "fixed-value" - the attribute need not be written in the document, but if it is, it must have the value fixed-value; if it is omitted, the parser supplies that value

2.13 DTD - implicit attribute value
An attribute (including an optional one) may have an implicit (default) value:
• "implicit value" - the attribute is optional; if it is not present, the implicit value is used instead.

3 Physical Structure (Entities)

3.1 Entity - declaration and usage
We distinguish:
• the declaration of an entity
• the reference to (i.e. use of) a declared entity.

3.2 General entities
may be
• parsed - files with (well-formed) markup,
• unparsed - e.g. binary files,
• character entities - characters, e.g. &gt; refers to a character entity.
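The declarations described in sections 2 and 3 can be tried out with the validating parser that ships with the JDK. The following is a minimal sketch, not a reference solution: the document, the element names note and body, the attribute lang and the entity author are all made up for this illustration. It shows an internal DTD subset with an element definition, an attribute with an implicit value and a parsed general entity, and it collects validity errors through an ErrorHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DtdDemo {
    // Hypothetical document with an internal DTD subset (names are illustrative).
    static final String VALID =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (body)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "  <!ATTLIST note lang CDATA \"en\">\n"
        + "  <!ENTITY author \"J. Doe\">\n"
        + "]>\n"
        + "<note><body>Written by &author;</body></note>";

    // Same DTD, but the mandatory body child is missing: well-formed, not valid.
    static final String INVALID =
        VALID.replace("<body>Written by &author;</body>", "");

    /** Parses xml with DTD validation switched on; validity errors go to errors. */
    static Document parse(String xml, final List<String> errors) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        Element root = parse(VALID, errors).getDocumentElement();
        System.out.println("validity errors: " + errors.size());
        System.out.println("implicit lang:   " + root.getAttribute("lang"));
        System.out.println("expanded entity: " + root.getTextContent());

        List<String> errors2 = new ArrayList<>();
        parse(INVALID, errors2);
        System.out.println("errors for the invalid document: " + errors2.size());
    }
}
```

Note that a validity error does not stop parsing: the invalid document is still well-formed, so a DOM tree is built and the problem is only reported through the error handler.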
3.3 Parameter entities
• usable only inside a DTD, somewhat similar to "macros" in programming languages
• suitable e.g. for declarations of attribute lists (if they are long and used repeatedly)
• see the DTD for HTML 4.01 - http://www.w3.org/TR/html4/sgml/dtd.html
• a parameter entity is defined e.g. as <!ENTITY % name "replacement text"> and referenced as %name;

4 XML Base

4.1 XML Base
• XML Base (Second Edition), W3C Recommendation 28 Jan 2009: http://www.w3.org/TR/xmlbase/
• Standard for the evaluation of relative URLs in links to/from XML documents. A facility similar to HTML BASE, for defining base URIs for parts of XML documents.
• Defines how to use the reserved attribute xml:base, which denotes the base URI for relative URIs.
• It complements the XLink specification.
• It works by "overriding" the XML base inherited from parent (ancestor) elements.

4.2 XML Base - example
Note the use of the reserved prefix xml:

5 XML Namespaces

5.1 XML Namespaces
• XML Namespaces (W3C Recommendation, currently Namespaces in XML 1.0 (Third Edition), W3C Recommendation 8 Dec 2009): http://www.w3.org/TR/REC-xml-names
• for XML 1.1 there is Namespaces in XML 1.1 (Second Edition), W3C Recommendation 16 August 2006 (http://www.w3.org/TR/xml-names11/). Andrew Layman, Richard Tobin, Tim Bray, Dave Hollander
• They define logical spaces for the names of elements and attributes in an XML document.
• They give elements and attributes a "third dimension".
• Each NS in XML has exactly one ("globally") unique identifier, given by a URI (URIs are a superset of URLs).
• The NS identified by a URI is in no way related to content that might be available at that URL ("nothing is downloaded when processing NSs").

5.2 Prefixes and Equivalence of NSs /1
• Instead of writing URIs to denote a namespace in a document, one uses prefixes mapped to the respective URI using xmlns:prefix="URI". An element or attribute name containing a colon (:) is called a Qualified Name, QName.
• Two NSs are equal iff their URIs are the same character by character (in Unicode).
• NSs do not apply to text nodes.

5.3 Prefixes and Equivalence of NSs /2
• An element/attribute need not be in any namespace.
• A NS prefix declaration, or a declaration of the implicit NS, applies recursively to all descendants (child elements, their children etc.), unless another declaration "remaps" the given prefix.
• One NS is the so-called implicit (default) NS, declared by the attribute xmlns=
• Default NSs are NOT applied to attributes! Thus attributes without an explicit prefix do not belong to any NS.

5.4 Default NS – example
Huráááá
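The rule that a default NS applies to elements but not to attributes can be checked with a namespace-aware DOM parser. The sketch below is an illustration only; the namespace URI http://example.org/greeting, the element names greeting and text, and the attribute lang are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo {
    // Hypothetical namespace URI and names, chosen for this sketch only.
    static final String NS = "http://example.org/greeting";
    static final String XML =
        "<greeting xmlns=\"" + NS + "\" lang=\"cs\"><text>Hooray</text></greeting>";

    /** Parses xml with namespace processing switched on and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML);
        Element child = (Element) root.getFirstChild();
        Attr lang = root.getAttributeNode("lang");

        // The default NS declared on the root is inherited by the child element ...
        System.out.println("root:  " + root.getNamespaceURI());
        System.out.println("child: " + child.getNamespaceURI());
        // ... but the unprefixed attribute is in no namespace at all.
        System.out.println("lang:  " + lang.getNamespaceURI());
    }
}
```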

5.5 Explicit (prefixed) NS – example
Huráááá

5.6 Issues related to NS
NSs are NOT compatible with DTD. A DTD strictly differentiates between e.g. the names xi:include and include, even if they belong to the same NS and should thus have the same interpretation/meaning for applications.

6 XML Information Set

6.1 XML Information Set (XML Infoset) - goals
• XML Infoset 2nd Edition, W3C Recommendation, first published on 24 October 2001, revised 4 February 2004, John Cowan, Richard Tobin, http://www.w3.org/TR/xml-infoset/
• The Infoset describes "what information we can get from a node (element, document, attribute...)".
• In other words: an application should not rely on any other information, such as attribute order.
• Any well-formed XML document conforming to XML Namespaces has an Infoset.

6.2 XML Infoset - structure
• An Infoset consists of information items.
• The Infoset relates to a document with expanded (resolved) entities.
• We distinguish among the information items of a document, element, attribute, character, PI, unexpanded entity reference, unparsed entity, and notation.

7 Canonical Form

7.1 Canonical Form of XML Document
• Canonical XML Version 1.0, W3C Recommendation 15 March 2001, http://www.w3.org/TR/xml-c14n
• The goal of the canonical form is to give criteria and an algorithm for deciding the equivalence of XML documents that are "logically" the same and differ only in their physical form (entities, attribute order, character encoding).
• Canonicalization "wipes out" differences that are not significant for applications.
• Canonicalization is indispensable in some important applications, e.g. the electronic signature of XML data (when calculating the digest).

7.2 Canonical Form - principles /1
Main principles for constructing the canonical form of an XML document:
• encoding in UTF-8
• line breaks (CR, LF) normalized according to the algorithm given in the XML 1.0 specification
• attribute values normalized
• references to character and parsed entities replaced by their content
• CDATA sections also replaced by their content
• the XML declaration and the DTD removed

7.3 Canonical Form - principles /2
• whitespace outside of the root element normalized
• otherwise (except for line breaks), whitespace is preserved
• attribute values always in double quotes "
• special characters in attribute values replaced by character references
• superfluous NS declarations removed
• default attribute values added to all elements where relevant
• attributes and NS declarations ordered lexicographically

7.4 Issues with Canonical Form
Certain information is lost (mostly information from the DTD):
• unparsed entities (e.g. binary ones) are no longer accessible after canonicalization
• notations
• attribute types (incl. default values)

8 Terms

8.1 API Task
• offer simple, standardized access to XML
• connect the application to the parser, and applications to each other
• XML processing without knowledge of the physical document structure (entities)
• efficient XML processing.

8.2 XML APIs Fundamental Types
• Tree-based API
• Event-based API
• API based on pulling events/elements off the document (Pull API).

9 Tree-based API

9.1 Map XML Document to Memory Based Tree Structure
• allows traversing the entire document tree
• best-known: Document Object Model (DOM from W3C, see http://www.w3.org/DOM/)

9.2 Programming Language Specific Models
• Java: JDOM - http://jdom.org
• Java: dom4j - http://dom4j.org
• Java: XOM - http://www.xom.nu
• Python: 4Suite - http://4suite.org
• PHP: SimpleXML - http://www.php.net/simplexml

10 Event-based API

10.1 Generate Sequence of Events while parsing the Document
• technical realization: callback methods
• the application implements handlers (which process the generated events)
• an event-based API:
  – works on a lower level than a tree-based one
  – leaves more processing to the application
  – saves memory: it does not create any persistent objects.
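The callback mechanism can be sketched with the SAX parser bundled with the JDK. This is an illustrative example only: the class name SaxEventsDemo, the helper method events and the sample input are assumptions made for this sketch. The handler records each event as a line of text instead of building any tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsDemo {
    /** Parses xml and returns a textual log of the SAX events the handler received. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        // The application implements the callbacks; the parser calls them while reading.
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("start document"); }
            @Override public void endDocument() { log.add("end document"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                // attributes arrive together with the start-element event
                log.add("start element: " + qName + " attributes=" + atts.getLength());
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("end element: " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // a parser may deliver one text node in several chunks
                log.add("characters: " + new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                            .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<doc><para id=\"p1\">Hi</para></doc>")) {
            System.out.println(event);
        }
    }
}
```

Nothing of the document survives after parsing unless the handler stores it itself, which is where the memory saving of this approach comes from.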
10.2 Event Examples
• start document, end document
• start element (carries the attributes as well), end element
• processing instruction
• comment
• entity reference
• Best-known event-based API - SAX http://www.saxproject.org

10.3 SAX - Document Analysis Example
The document
    <doc><para>Hello, world!</para><!-- that's all folks --><hr/></doc>
generates the following events:
    start document
    start element: doc, list of attributes: empty
    start element: para, list of attributes: empty
    characters: Hello, world!
    end element: para
    comment: that's all folks
    start element: hr
    end element: hr
    end element: doc
    end document

10.4 When to use event-based API?
• Easier for the parser programmer, more difficult for the application programmer.
• The application programmer has no complete document available and must keep track of the parsing state.
• Suitable for tasks that can be solved without having the entire document at hand.
• Usually the fastest possible processing.
• Difficulties in writing applications can be eased by extensions such as Streaming Transformations for XML (STX) (http://stx.sourceforge.net)

10.5 Optional SAX Parser Features
The behavior of a SAX parser can be controlled using so-called features and properties.
• For the optional features of SAX parsers see http://www.saxproject.org/?selected=get-set
• For more details on properties and features see Use properties and features in SAX parsers (???) (IBM DeveloperWorks/XML).

10.6 SAX filters
SAX filters (implementations of the org.xml.sax.XMLFilter interface) can be programmed using the SAX API. An instance of such a class accepts input events, processes them and sends them to the output. For more information on event filtering see e.g. Change the events output by a SAX stream (http://www.ibm.com/developerworks/xml/library/x-tipsaxfilter/) (IBM DeveloperWorks/XML).

10.7 Additional SAX References
• Primary source - http://www.saxproject.org
• SAX Tutorial on JAXP http://java.sun.com/webservices/reference/tutorials/jaxp/html/sax.html

11 Pull-based APIs

11.1 Pull-based APIs
• The application does not process incoming events; it pulls data from the processed file itself.
• Can be used when the programmer knows the structure of the input data and can pull them off the file.
• ... the opposite of an event-based API.
• Very comfortable for the application programmer, but implementations are usually slower than push (event-based) APIs.
• Java offers the XML-PULL parser API - see Common API for XML Pull Parsing (http://www.xmlpull.org/) and also
• a newer API - Streaming API for XML (StAX) (http://www.jcp.org/en/jsr/detail?id=173), developed as a product of the JCP (Java Community Process).

11.2 Streaming API for XML (StAX)
StAX (JSR 173) has been included in the Java platform since Java SE 6. It offers two ways of pull-based processing:
• pulling the events using an iterator - more comfortable
• low-level access using a so-called cursor - faster.

11.3 StAX - an Iterator Example
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import javax.xml.namespace.QName;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    public class ParseByIterator {

        public static void main(String[] args) throws FileNotFoundException, XMLStreamException {
            // The original listing selects the reference implementation through the
            // system property javax.xml.stream.XMLInputFactory; its class name is
            // truncated in the source ("com.bea.xml.stream.MXParse..."), so the
            // default factory of the platform is used here.
            XMLInputFactory xmlif = XMLInputFactory.newInstance();
            // Create an XML stream reader
            XMLStreamReader xmlr = xmlif.createXMLStreamReader(new FileReader("somefile.xml"));
            while (xmlr.hasNext()) {
                processEvent(xmlr);
                xmlr.next();
            }
        }

        /**
         * Process a single event
         *
         * @param xmlr - the XML stream reader
         */
        private static void processEvent(XMLStreamReader xmlr) {
            switch (xmlr.getEventType()) {
                case XMLStreamConstants.START_ELEMENT:
                    processName(xmlr);
                    processAttributes(xmlr);
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    processName(xmlr);
                    break;
                case XMLStreamConstants.SPACE:
                case XMLStreamConstants.CHARACTERS:
                    int start = xmlr.getTextStart();
                    int length = xmlr.getTextLength();
                    String text = new String(xmlr.getTextCharacters(), start, length);
                    break;
                case XMLStreamConstants.COMMENT:
                case XMLStreamConstants.PROCESSING_INSTRUCTION:
                    if (xmlr.hasText()) {
                        String piOrComment = xmlr.getText();
                    }
                    break;
            }
        }

        private static void processName(XMLStreamReader xmlr) {
            if (xmlr.hasName()) {
                String prefix = xmlr.getPrefix();
                String uri = xmlr.getNamespaceURI();
                String localName = xmlr.getLocalName();
            }
        }

        private static void processAttributes(XMLStreamReader xmlr) {
            for (int i = 0; i < xmlr.getAttributeCount(); i++) {
                processAttribute(xmlr, i);
            }
        }

        private static void processAttribute(XMLStreamReader xmlr, int index) {
            String prefix = xmlr.getAttributePrefix(index);
            String namespace = xmlr.getAttributeNamespace(index);
            QName localName = xmlr.getAttributeName(index);
            String value = xmlr.getAttributeValue(index);
        }
    }
Example from Tip: Use XML streaming parsers (http://www.ibm.com/developerworks/xml/library/x-tipstx) (IBM DeveloperWorks, XML section).
Note: despite the class name, the listing reads the document through XMLStreamReader, which is the cursor interface of StAX; the iterator interface is XMLEventReader.

11.4 StAX - a Cursor Example
The listing in 11.3 is based on XMLStreamReader, the cursor interface, so the same listing serves as the cursor example.
Sample from Tip: Use XML streaming parsers (http://www.ibm.com/developerworks/xml/library/x-tipstx) (IBM DeveloperWorks, XML section).

12 Document Object Model (DOM)

12.1 Basic Interface to Process and Access the Tree Representation of an XML Data
• Three versions of DOM: DOM Level 1, 2, 3
• DOM does not depend on XML parsing.
• Described using IDL plus API descriptions for particular programming languages (C++, Java, etc.)

12.2 HTML Documents Specific DOM
• The HTML Core DOM is more or less consolidated with the XML DOM
• Intended for use with CSS
• Used for dynamic HTML programming (scripting using VB Script, JavaScript, etc.)
• Contains the browser environment (windows, history, etc.) besides the document model itself.

12.3 DOM references
• JAXP Tutorial, the part dedicated to DOM: Part III: XML and the Document Object Model (DOM) (http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/dom/index.html)
• Portal dedicated to DOM http://www.oasis-open.org/cover/dom.html
• DOM 1 Interface visual overview http://www.xml.com/pub/a/1999/07/dom/index.html
• Tutorial "Understanding DOM (Level 2)" available at http://ibm.com/developer/xml

12.4 DOM Implementation
• Included in many parsers, e.g. the Xerces parser (http://xml.apache.org).
• Part of JAXP (Java API for XML Processing) - http://java.sun.com/xml/jaxp/index.html
• Standalone implementations independent of parsers:
  – dom4j - http://dom4j.org
  – EXML (Electric XML) - http://www.themindelectric.net

13 Using DOM in Java

13.1 What do we need?
The new Java versions (JDK and JRE) support DOM natively; no additional library is needed. Applications import the needed symbols (interfaces, classes, etc.) mostly from the package org.w3c.dom.

13.2 What will we need often?
The most often used interfaces are:
Element - corresponds to an element in the logical document structure. It gives access to the name of the element, the names of its attributes, and its child nodes (including textual ones). Useful methods:
• Node getParentNode() - returns the parent node
• String getTextContent() - returns the textual content of the element.
• NodeList getElementsByTagName(String name) - returns the list of descendant elements (child elements and their descendants) with the given name.
Node - super interface of Element; corresponds to a general node in the logical document structure, which may be an element, a textual node, a comment, etc.
NodeList - a list of nodes (e.g. the result of calling getElementsByTagName).
It offers the following methods for its processing:
• int getLength() - returns the number of nodes in the list
• Node item(int index) - returns the node at position index
Document - corresponds to the document node (it is the parent of the root element)

13.3 Example 1 - creating DOM tree from file
Example of a method reading a DOM tree from an XML file (see Homework 1):
    import java.io.IOException;
    import java.net.URL;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.xml.sax.SAXException;

    public class Uloha1 {
        /**
         * Constructor creating a new instance of the Uloha1 class by reading
         * the XML document at the given URL.
         */
        private Uloha1(URL url) throws SAXException, ParserConfigurationException, IOException {
            // Create a new instance of the factory class
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Get a new instance of DocumentBuilder from the factory
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Use the DocumentBuilder to process the XML document
            // and obtain the document model in the form of a W3C DOM
            Document doc = builder.parse(url.toString());
        }
    }

13.4 Example 2 - DOM tree modification
Example of a method manipulating the DOM tree of a document (see Homework 1):
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class Uloha1 {
        Document doc;

        /**
         * Method for a salary modification. If a person's salary is less than
         * minimum, the salary is increased to minimum.
         * No action is performed for the other persons.
         */
        public void adjustSalary(double minimum) {
            // get the list of salaries
            NodeList salaries = doc.getElementsByTagName("salary");
            for (int i = 0; i < salaries.getLength(); i++) {
                // get the salary element
                Element salaryElement = (Element) salaries.item(i);
                // get the current salary
                double salary = Double.parseDouble(salaryElement.getTextContent());
                if (salary < minimum) {
                    // modify the text node/content of the element
                    salaryElement.setTextContent(String.valueOf(minimum));
                }
            }
        }
    }
13.5 Example 3 - storing a DOM tree into an XML file
Example of a method storing a DOM tree into a file (see Homework 1). The procedure uses a transformation that we have not covered yet; let us use it as a black box.
    import java.io.File;
    import java.io.IOException;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;

    public class Uloha1 {
        Document doc;

        /**
         * Method storing the DOM tree held in doc into the given output file.
         */
        public void serializetoXML(File output)
                throws IOException, TransformerConfigurationException, TransformerException {
            // Create a new instance of the factory class
            TransformerFactory factory = TransformerFactory.newInstance();
            // A transformer created without a stylesheet copies its input unchanged
            Transformer transformer = factory.newTransformer();
            // The input is the document held in memory
            DOMSource source = new DOMSource(doc);
            // The output of the transformation is the output file
            StreamResult result = new StreamResult(output);
            // Run the transformation
            transformer.transform(source, result);
        }
    }
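The three DOM steps (reading, modifying, storing) can be combined into one self-contained round trip. The sketch below is only an illustration under stated assumptions: it works on strings instead of files so that it runs without any input file, and the element names staff and person are invented (only salary comes from the example above).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RoundTripDemo {
    /** Parses xml, raises every salary below minimum to minimum, returns the new XML text. */
    static String raiseSalaries(String xml, double minimum) {
        try {
            // 1. read: build the DOM tree
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // 2. modify: same logic as adjustSalary
            NodeList salaries = doc.getElementsByTagName("salary");
            for (int i = 0; i < salaries.getLength(); i++) {
                Node salary = salaries.item(i);
                if (Double.parseDouble(salary.getTextContent()) < minimum) {
                    salary.setTextContent(String.valueOf(minimum));
                }
            }

            // 3. store: identity transformation from the tree to a character stream
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input document
        String xml = "<staff><person><salary>900</salary></person>"
                   + "<person><salary>1500</salary></person></staff>";
        System.out.println(raiseSalaries(xml, 1000));
    }
}
```

Because the minimum is a double, String.valueOf writes the raised salary as 1000.0 rather than 1000; an application that cares about the textual format would have to format the number itself.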
14 Alternative tree-based models

14.1 XML Object Model (XOM)
• XOM (XML Object Model) was created as a one-man project (author Elliotte Rusty Harold).
• It is an interface that strictly respects the logical model of XML data.
• For motivation and specification see the XOM home page (http://cafeconleche.org/XOM/).
• The open-source XOM implementation (http://cafeconleche.org/XOM/xom-1.0d24.zip) is available there, and
• the API documentation (http://cafeconleche.org/XOM/apidocs/) too.

14.2 Alternative parsers and tree models - NanoXML
• Very small (in terms of code size) tree-based interface and parser all in one
• available as open source at http://nanoxml.n3.net
• adapted for mobile devices as well
• not the best in terms of run-time speed and memory efficiency.

14.3 DOM4J - a practical, well usable tree-based model
• comfortable, fast and memory-efficient tree-oriented interface
• designed and optimized for Java
• available as open source at http://dom4j.org
• an excellent "cookbook" (http://dom4j.org/cookbook/cookbook.html) is available
• dom4j is powerful, see the efficiency comparison of tree-based models (http://www.ibm.com/developerworks/xml/library/x-injava/)
15 Tree and event-based access combinations

15.1 Events → tree
• Allows us to skip or filter out the "uninteresting" parts of a document by monitoring the events, and then
• to create an in-memory tree from the "interesting" part of the document only, and to process just that part.
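One way to sketch the events → tree combination with the standard JDK classes is a SAX filter placed in front of an identity transformation that writes into a DOMResult. This is an assumption-laden illustration, not the only possible design: the class names EventsToTreeDemo and SkipFilter and the element names in the sample are invented, and the filter handles only elements and text (comments and processing instructions are not filtered).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class EventsToTreeDemo {
    /** SAX filter that drops every element named skipName together with its content. */
    static class SkipFilter extends XMLFilterImpl {
        private final String skipName;
        private int depth = 0; // greater than 0 while inside a skipped subtree

        SkipFilter(XMLReader parent, String skipName) {
            super(parent);
            this.skipName = skipName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || qName.equals(skipName)) { depth++; return; }
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) super.characters(ch, start, length);
        }
    }

    /** Builds a DOM tree from xml, leaving out all skipName subtrees. */
    static Document buildFilteredTree(String xml, String skipName) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            // The filter sits between the parser (event producer) and the tree builder.
            SAXSource source = new SAXSource(new SkipFilter(reader, skipName),
                                             new InputSource(new StringReader(xml)));
            DOMResult result = new DOMResult();
            TransformerFactory.newInstance().newTransformer().transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: the log subtree is the "uninteresting" part.
        String xml = "<staff><person><name>Ann</name></person>"
                   + "<log><name>noise</name></log></staff>";
        Document doc = buildFilteredTree(xml, "log");
        System.out.println("name elements kept: " + doc.getElementsByTagName("name").getLength());
    }
}
```

The skipped part of the document is still read by the parser, but no nodes are ever allocated for it, so the tree stays proportional to the interesting part.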
15.2 Tree → events
• We create an entire document tree (and process it), and
• then we walk through the tree and generate events as if we were reading the XML file.
• This allows an easy integration of both processing types in a single application.
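The opposite direction can be sketched with the same JDK building blocks: an identity transformation from a DOMSource into a SAXResult walks the tree and calls the SAX handler as if the document were being parsed. The class name TreeToEventsDemo, the helper methods and the sample document are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TreeToEventsDemo {
    /** Builds a DOM tree from xml (namespace-aware). */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Walks the tree and returns the names of elements in the order of their start events. */
    static List<String> replay(Document doc) {
        final List<String> names = new ArrayList<>();
        // An ordinary SAX handler; it cannot tell that the events come from a tree.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        };
        try {
            TransformerFactory.newInstance().newTransformer()
                              .transform(new DOMSource(doc), new SAXResult(handler));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><para>Hello, world!</para><hr/></doc>");
        System.out.println(replay(doc));
    }
}
```

A handler written for event-based processing can thus be reused unchanged on a document that already exists as a tree.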
15.3 Virtual object models
• The DOM model of the document is not held in memory, but is created on demand when particular nodes are accessed.
• Combines the advantages of event-based and tree-based processing (speed and comfort).
• One implementation is the Sablotron processor (see http://www.xml.com/pub/a/2002/03/13/sablotron.html or http://www.gingerall.org/charlie/ga/xml/p_sab.xml)