XML Namespaces
XML Namespaces (W3C Recommendation, currently Namespaces in XML 1.0 (Third Edition) W3C Recommendation 8 Dec 2009): http://www.w3.org/TR/REC-xml-names
For the newer XML 1.1 there is Namespaces in XML 1.1 (Second Edition), W3C Recommendation 16 August 2006 (editors: Andrew Layman, Richard Tobin, Tim Bray, Dave Hollander).
They define logical spaces for the names of elements and attributes in an XML document.
They give elements and attributes a "third dimension".
Each NS in XML is identified by exactly one ("globally") unique identifier, a URI (URIs are a superset of URLs).
A NS identified by a URI has no relation to any content that might be available at that URL ("nothing is downloaded when processing NSs").
Prefixes and Equivalence of NSs (1)
Instead of the URI, a document denotes a namespace by a prefix. The prefix is mapped to the respective URI with the attribute xmlns:prefix="URI".
An element or attribute name containing a colon (:) is called a qualified name (QName).
Two NSs are equal if and only if their URIs are identical character by character (in Unicode).
Namespaces do not apply to text nodes.
Element/attribute need not be in a namespace.
A prefix declaration or a default NS declaration applies recursively to all descendants (child elements, their children, etc.), unless another declaration "remaps" the given prefix.
One NS can be declared as the so-called implicit (default) NS, using the attribute xmlns="URI".
The default NS is NOT applied to attributes! Attributes without an explicit prefix therefore belong to no NS.
Example 1. Default NS
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
Example 2. Prefixed NS
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
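The rules above can be seen in a minimal sketch that uses the JDK's built-in DOM parser (JAXP). It assumes that namespace processing is switched on with setNamespaceAware(true), which is off by default. The class name NamespaceDemo is hypothetical. Examples 1 and 2 are closed as empty elements here only so that they are well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Parses an XML string with namespace processing switched on;
    // DocumentBuilderFactory is not namespace-aware by default.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Examples 1 and 2, closed as empty elements so that they are well-formed
        Element html1 = parse("<html xmlns=\"" + XHTML + "\" xml:lang=\"en\" lang=\"en\"/>")
                .getDocumentElement();
        Element html2 = parse("<xhtml:html xmlns:xhtml=\"" + XHTML + "\" xml:lang=\"en\" lang=\"en\"/>")
                .getDocumentElement();

        // Same namespace URI and local name, different prefixes: the same name
        System.out.println(html1.getNamespaceURI().equals(html2.getNamespaceURI())); // true
        System.out.println(html1.getLocalName() + " " + html2.getLocalName());       // html html

        // The unprefixed attribute lang is in no namespace (the default NS is not applied)
        Attr lang = html1.getAttributeNodeNS(null, "lang");
        System.out.println(lang.getNamespaceURI());                                  // null

        // The prefix xml is predeclared and bound to a fixed URI
        System.out.println(html1.getAttributeNS(XML_NS, "lang"));                    // en
    }
}
```

The prefix itself does not affect getNamespaceURI(). Only the URI matters, which matches the equivalence rule for NSs.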
Issues related to NS
NSs are not compatible with DTDs.
A DTD strictly distinguishes between, e.g., the names html and xhtml:html (see Examples 1 and 2),
even if they belong to the same NS and should thus have the same interpretation/meaning for applications.
APIs for XML Processing (recap)
APIs offer simple, standardized access to XML.
APIs connect applications to the parser and to each other.
APIs allow XML processing without knowledge of physical document structure (entities).
APIs optimize XML processing.
Fundamental Types of XML APIs
Tree-based API
Event-based API
API based on pulling events/elements off the document (Pull API).
Tree-based API
They map an XML document to an in-memory tree structure.
They allow traversal of the entire tree.
The best known is the Document Object Model (DOM) from W3C, see http://www.w3.org/DOM
Programming Language Specific Models
Java: JDOM - http://jdom.org
Java: dom4j - http://dom4j.org
Java: XOM - http://www.xom.nu
Python: 4Suite - http://4suite.org
PHP: SimpleXML - http://www.php.net/simplexml
Document Object Model (DOM)
Basic interface to process and access the tree representation of XML data
Three versions of DOM: DOM Level 1, 2, 3
The DOM does not depend on the XML parser used.
It is described in IDL, plus API descriptions (bindings) for particular programming languages (C++, Java, etc.).
DOM Specific to HTML Documents
The HTML Core DOM is more or less consolidated with the XML DOM.
Linked to CSS (access to styles).
Used for dynamic HTML programming (scripting in VBScript, JavaScript, etc.).
Besides the document model itself, it contains the browser environment (windows, history, etc.).
DOM references
JAXP Tutorial, part dedicated to the DOM Part III: XML and the Document Object Model (DOM) (http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/dom/index.html)
Portal dedicated to the DOM http://www.oasis-open.org/cover/dom.html
DOM 1 Interface visual overview http://www.xml.com/pub/a/1999/07/dom/index.html
Tutorial "Understanding DOM (Level 2)", available at https://www.ibm.com/developerworks/xml/
Using DOM in Java
Recent Java versions (JDK and JRE) support DOM natively, so no additional library is needed.
Applications need to import the required symbols (interfaces, classes, etc.), mostly from the package org.w3c.dom.
What will we need often?
The most often used interfaces are:
- Element - corresponds to an element in the logical document structure. It gives access to the element name, the names of its attributes and its child nodes (including text nodes). Useful methods:
  - Node getParentNode() - returns the parent node
  - String getTextContent() - returns the text content of the element (including the text of its descendants)
  - NodeList getElementsByTagName(String name) - returns the list of descendant elements (children, their children, etc.) with the given name, in document order
- Node - super-interface of Element. It corresponds to a general node in the logical document structure, which may be an element, a text node, a comment, etc.
- NodeList - a list of nodes (for example, the result of calling getElementsByTagName). It offers the following methods for its processing:
  - int getLength() - returns the number of nodes in the list
  - Node item(int index) - returns the node at position index (counted from 0)
- Document - corresponds to the document node (the parent of the root element)
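The following small sketch shows these interfaces working together on a hypothetical staff document. The class InterfacesDemo and its data are illustrative only. It shows that getElementsByTagName searches all descendants, not only the children, and that the Document node is the parent of the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InterfacesDemo {
    // Hypothetical sample data; whitespace-free so that no extra text nodes appear
    static final String XML = "<staff><dept>"
            + "<person><name>Ann</name><salary>900</salary></person>"
            + "<person><name>Bob</name><salary>1200</salary></person>"
            + "</dept></staff>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();            // <staff>
        // getElementsByTagName finds <name> elements two levels below <staff>
        NodeList names = root.getElementsByTagName("name");
        for (int i = 0; i < names.getLength(); i++) {
            Node name = names.item(i);
            Node person = name.getParentNode();              // the enclosing <person>
            System.out.println(name.getTextContent() + " in " + person.getNodeName());
        }
        // The Document node is the parent of the root element
        System.out.println(root.getParentNode() == doc);    // true
    }
}
```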
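Example 1 - creating a DOM tree from a file
import java.io.IOException;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class Task1 {
    private Document doc;
    private Task1(URL url) throws SAXException,
            ParserConfigurationException, IOException {
        // We create a new instance of the factory class
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // We get a new instance of DocumentBuilder using the factory class.
        DocumentBuilder builder = factory.newDocumentBuilder();
        // We use the DocumentBuilder to parse an XML document
        // and we get the document model in the form of a W3C DOM tree
        doc = builder.parse(url.toString());
    }
}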
Example 2 - DOM tree modification
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class Task1 {
    private Document doc;
    // Method for a salary modification.
    // If a person's salary is less than the minimum,
    // the salary is increased to the minimum.
    // No action is performed for the other persons.
    public void adjustSalary(double minimum) {
        // get the list of salaries
        NodeList salaries = doc.getElementsByTagName("salary");
        for (int i = 0; i < salaries.getLength(); i++) {
            // get the salary element
            Element salaryElement = (Element) salaries.item(i);
            // get payment (the text content of the element)
            double salary = Double.parseDouble(
                    salaryElement.getTextContent().trim());
            if (salary < minimum) {
                // modify the text node/content of element
                salaryElement.setTextContent(String.valueOf(minimum));
            }
        }
    }
}
Example 3 - storing a DOM tree into an XML file
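Example of a method that stores a DOM tree into a file (see Homework 1). The procedure uses a transformation, which we do not know yet. Let us use it as a black box.
import java.io.File;
import java.io.IOException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
public class Task1 {
    private Document doc;
    public void serializetoXML(File output) throws IOException,
            TransformerException {
        // We create new instance of a factory class.
        TransformerFactory factory
                = TransformerFactory.newInstance();
        // Without a stylesheet, newTransformer() returns an identity transformation
        Transformer transformer
                = factory.newTransformer();
        // The input is the document placed in memory
        DOMSource source = new DOMSource(doc);
        // The transformation output is the output file
        StreamResult result = new StreamResult(output);
        // Let's make the transformation
        transformer.transform(source, result);
    }
}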
Alternative tree-based models
XML Object Model (XOM)
XOM (XML Object Model) was created as a one-man project (author Elliotte Rusty Harold).
It is an interface that strictly respects the logical model of XML data.
For motivation and specification, see the XOM home page (http://www.xom.nu).
There you can get the open-source XOM implementation and
the API documentation, too.
dom4j - a practically usable tree-based model
a convenient, fast and memory-efficient tree-oriented interface
designed and optimized for Java
available as open source at http://dom4j.org
an excellent "cookbook" is available (http://dom4j.org/cookbook/cookbook.html)
dom4j performs well, see the efficiency comparison of tree-based models (http://www.ibm.com/developerworks/xml/library/x-injava/)
Tree and event-based access combinations
Events → tree
Tree → events
Events → tree
These combinations let us skip or filter out the "uninteresting" parts of a document by monitoring events, and then
build an in-memory tree only from the "interesting" part of the document and process just that part.
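Here is a minimal sketch of the events → tree idea, using SAX and DOM from the JDK. The class SubtreeBuilder, its extract method and the sample log document are hypothetical. To keep it short, the sketch builds non-namespace-aware DOM nodes and handles only elements, attributes and text. Comments and processing instructions are dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Events -> tree: SAX events are ignored until an "interesting" element starts;
// from then until it ends, the events are turned into DOM nodes.
class SubtreeBuilder extends DefaultHandler {
    private final String interesting;      // name of the elements worth keeping
    private final Document factoryDoc;     // owner document used to create nodes
    private final List<Element> results = new ArrayList<>();
    private Node current;                  // null while outside an interesting subtree

    SubtreeBuilder(String interesting) throws Exception {
        this.interesting = interesting;
        this.factoryDoc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && !qName.equals(interesting)) {
            return;                        // uninteresting part: nothing is built
        }
        Element element = factoryDoc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current == null) {
            results.add(element);          // root of a new kept subtree
        } else {
            current.appendChild(element);
        }
        current = element;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(factoryDoc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            // Kept subtree roots have no parent, so we drop back to "outside"
            Node parent = current.getParentNode();
            current = (parent instanceof Element) ? parent : null;
        }
    }

    static List<Element> extract(String xml, String elementName) throws Exception {
        SubtreeBuilder handler = new SubtreeBuilder(elementName);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: only the <entry> elements are of interest
        String xml = "<log><noise>skipped</noise><entry id=\"1\"><msg>hi</msg></entry>"
                + "<noise/><entry id=\"2\"><msg>bye</msg></entry></log>";
        for (Element entry : extract(xml, "entry")) {
            System.out.println(entry.getAttribute("id") + ": " + entry.getTextContent());
        }
    }
}
```

Only the two entry subtrees ever exist in memory. The noise elements pass through as events and are discarded.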
Tree → events
We build an entire document tree (and process it),
then we traverse the tree and generate events as if the XML file were being read.
It allows us easy integration of both processing types in a single application.
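In the tree → events direction, the JDK does not need a hand-written tree walker. An identity Transformer can read a DOMSource and report the tree as SAX events to a SAXResult. The following is a minimal sketch; the Recorder handler and the TreeToEvents class are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TreeToEvents {
    // A SAX handler that only records the element events it receives
    static class Recorder extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            log.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            log.append("</").append(qName).append('>');
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Walks an existing DOM tree and emits SAX events, as if the file were being read
    static String replay(Document doc) throws Exception {
        Recorder recorder = new Recorder();
        // newTransformer() without a stylesheet is the identity transformation
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(recorder));
        return recorder.log.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/><c><d/></c></a>");
        System.out.println(replay(doc));
    }
}
```

Any existing SAX ContentHandler could take the place of Recorder. This is what makes it easy to plug tree-based and event-based code into one application.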
Virtual object models
The document's DOM model is not held in memory as a whole. Nodes are created on demand as particular nodes are accessed.
It combines the advantages of event-based and tree-based processing (speed and convenience).
An implementation exists: the Sablotron processor, http://www.xml.com/pub/a/2002/03/13/sablotron.html
XPath - basic principles
XPath is a syntax used to specify parts of XML documents (nodes, sets of nodes, sequences of nodes). It cannot address parts of text nodes.
XPath uses a syntax similar to file system paths.
XPath offers a standard function library (some XPath 2.0 and even some XPath 1.x processors also support user-defined functions).
XPath is the basis of XSLT (since XPath 1.0) and of XQuery (since XPath 2.0).
XPath does not use XML syntax (it would be too verbose).
XPath 1.0 and 2.0 are W3C Recommendations - http://www.w3.org/TR/xpath
XPath - Application Domains
Advanced XML Data navigation
Select the 3rd node b:
Select a node b, it has a child node c:
Select an empty node b:
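The original expressions for these three selections did not survive. The following is therefore only an illustrative sketch, using a hypothetical sample document evaluated through the JDK's javax.xml.xpath API (XPath 1.0). The expressions (//b)[3], //b[c] and //b[not(node())] are one reasonable way to write the three selections.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathNavigation {
    // Hypothetical sample document; the id attributes only label the b elements
    static final String XML =
            "<a><b id=\"1\"><c/></b><b id=\"2\"/><b id=\"3\">text</b></a>";

    // Evaluates an expression and concatenates the ids of the selected elements
    static String ids(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression,
                new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            sb.append(((Element) nodes.item(i)).getAttribute("id"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(ids("(//b)[3]"));         // 3rd b in the document: 3
        System.out.println(ids("//b[c]"));           // b that has a child c: 1
        System.out.println(ids("//b[not(node())]")); // empty b: 2
    }
}
```

The parentheses in (//b)[3] matter. Without them, //b[3] would select every b that is the third b child of its own parent.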
Transformation (XSLT)
used to select the nodes that are to be processed
<xsl:value-of select="./c"/>
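Here is a minimal sketch of XPath inside XSLT, run with the JDK's built-in XSLT 1.0 processor through JAXP. The stylesheet and the input document are hypothetical. The template for b uses the same select="./c" as above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ValueOfDemo {
    // A minimal stylesheet: for every b, output the string value of its child c in brackets
    static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\"><xsl:apply-templates select=\"//b\"/></xsl:template>"
            + "<xsl:template match=\"b\">[<xsl:value-of select=\"./c\"/>]</xsl:template>"
            + "</xsl:stylesheet>";

    static String run(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input document
        System.out.println(run("<a><b><c>one</c></b><b><c>two</c></b></a>")); // [one][two]
    }
}
```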
Selection parts of XML query languages (XQuery)
Some XML modeling languages (Schematron, XML Schema)
XPath - paths and locations
A path describes (i.e. "navigates to") a location in an XML document. Path syntax is constructed similarly to file system paths, i.e. a path is either
relative - related to a context node (CN), see further, or
absolute - starting from the root node of the document (the parent of the root element); predicates in each step are still evaluated relative to the nodes selected by that step.
XPath - syntactic rules
[20] PathExpr ::= AbsolutePathExpr | RelativePathExpr
[22] AbsolutePathExpr ::= ("/" RelativePathExpr?) | ("//" RelativePathExpr)
[23] RelativePathExpr ::= StepExpr (("/" | "//") StepExpr)*
[24] StepExpr ::= AxisStep | GeneralStep
[25] AxisStep ::= (Axis? NodeTest StepQualifiers) | AbbreviatedStep
XPath - axes
Axes (singular axis, plural axes) are sets of document nodes defined relative to the context.
Context is formed by a document and the current (context) node (CN).
Axes are:
- child: contains the direct child nodes of CN
- descendant: contains all descendants of CN (never attributes or NS nodes)
- parent: contains the parent node of CN (if it exists)
- ancestor: contains all ancestors of CN, i.e. parent, grandparent, etc. up to the root node (the axis is empty only for the root node itself)
- following-sibling: contains all following siblings of CN (the axis is empty for NS and attribute nodes)
- preceding-sibling: ditto, but contains the preceding siblings
- following: contains all nodes following CN in document order (except its descendants, attributes and NS nodes)
- preceding: ditto, but contains the preceding nodes (except ancestors, attributes and NS nodes)
- attribute: contains the attributes of CN (non-empty for elements only)
- namespace: contains all NS nodes of CN (non-empty for elements only)
- self: contains CN itself
- descendant-or-self: contains the union of the descendant and self axes
- ancestor-or-self: contains the union of the ancestor and self axes
Figure 1. //b/child::*
Example 3. //b/descendant::*
Example 4. //d/parent::*
Example 5. //d/ancestor::*
Example 6. //b/following-sibling::*
Example 7. //b/preceding-sibling::*
Example 8. /a/b/c/following::*
Example 9. /a/b/e/preceding::*
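The sample documents for the examples above are not reproduced here. This sketch therefore evaluates the same axis expressions against a hypothetical document through javax.xml.xpath. The class AxesDemo and the document are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxesDemo {
    // Hypothetical sample document used to compare the axes
    static final String XML = "<a><b><c><h/></c><d/><e/></b><f><g/></f></a>";

    // Returns the names of the nodes selected by the expression
    static List<String> names(String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeName());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String[] expressions = {
            "//b/child::*", "//b/descendant::*", "//d/parent::*", "//d/ancestor::*",
            "//b/following-sibling::*", "//b/preceding-sibling::*",
            "/a/b/c/following::*", "/a/b/e/preceding::*"
        };
        for (String expr : expressions) {
            System.out.println(expr + " -> " + names(expr));
        }
    }
}
```

For this document, //b/child::* selects c, d and e, and //b/descendant::* adds h. //d/parent::* selects b, and //d/ancestor::* selects a and b. //b/following-sibling::* selects f, while //b/preceding-sibling::* is empty. /a/b/c/following::* selects d, e, f and g, but not h, which is a descendant of c. /a/b/e/preceding::* selects c, h and d, but not the ancestors a and b.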
XPath - predicates
A predicate filters a node set, for example the one selected by a path step.
Example: /article/para[3] selects the 3rd paragraph (element para) of the article (element article).
The simplest predicate expression is a proximity position specification, as in the preceding example.
Beware of reverse axes (ancestor, preceding, …): positions are always numbered outward from CN, i.e. opposite to the physical document order.
The position specification 3 can be replaced by the expression position()=3.
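The following short sketch contrasts forward and reverse axes in predicates, using a hypothetical three-paragraph article evaluated with javax.xml.xpath. XPath.evaluate(String, InputSource) converts the result to a string, which for a node set is the string value of its first node in document order.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PredicateDemo {
    // Hypothetical article with three paragraphs
    static final String XML =
            "<article><para>one</para><para>two</para><para>three</para></article>";

    // Returns the string value of the first selected node (XPath 1.0 string conversion)
    static String value(String expression) throws Exception {
        return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(value("/article/para[3]"));              // three
        System.out.println(value("/article/para[position()=3]"));   // three
        // Reverse axis: positions are counted outward from the context node
        System.out.println(value("/article/para[3]/preceding-sibling::para[1]")); // two
        System.out.println(value("/article/para[3]/preceding-sibling::para[2]")); // one
        // Parentheses make it a filter on the whole node set, in document order
        System.out.println(value("(/article/para[3]/preceding-sibling::para)[1]")); // one
    }
}
```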
XPath - expressions
Expressions are used in predicates, for computations, etc. They may contain XPath functions.
Expressions may operate on:
text strings
numbers (floating-point numbers)
logical values (boolean)
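Here is a minimal sketch of the three basic value types, assuming a hypothetical order document and the JDK's XPath 1.0 engine. The constants XPathConstants.NUMBER, STRING and BOOLEAN request the result as a Double, a String and a Boolean.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExpressionDemo {
    // Hypothetical order document
    static final String XML =
            "<order><item price=\"2.5\"/><item price=\"10\"/></order>";

    // Evaluates an expression and converts the result to the requested XPath type
    static Object eval(String expression, QName type) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc, type);
    }

    public static void main(String[] args) throws Exception {
        // numbers: XPath 1.0 numbers are always floating point (Double in Java)
        System.out.println(eval("count(//item)", XPathConstants.NUMBER));        // 2.0
        System.out.println(eval("sum(//item/@price)", XPathConstants.NUMBER));   // 12.5
        // strings
        System.out.println(eval("concat('items: ', count(//item))", XPathConstants.STRING)); // items: 2
        // booleans: a node set compared with a number is true if SOME node matches
        System.out.println(eval("//item/@price > 5", XPathConstants.BOOLEAN));   // true
    }
}
```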
XPath - short notation - Examples
- para: selects all child elements of the context node with name para
- *: selects all element children of the context node
- text(): selects all text node children of the context node
- @name: selects the name attribute of the context node
- @*: selects all the attributes of the context node
- para[1]: selects the first para child of the context node
- para[last()]: selects the last para child of the context node
- */para: selects all para grandchildren of the context node
- /doc/chapter[5]/section[2]: selects the second section of the fifth chapter of the doc
- chapter//para: selects all para descendants of the chapter children of the context node
- //para: selects all para elements in the document
- //olist/item: selects all item elements with parent element olist
- .//para: selects all descendant elements of CN with name para
- ..: selects the parent node of CN
- ../@lang: selects the lang attribute of the parent node of CN
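Abbreviations are only shorthand for the full forms. This sketch checks that by evaluating each short form and its full form against the same hypothetical document and comparing the number of selected nodes. The class AbbreviationDemo and the document are illustrative, and the expressions are evaluated with javax.xml.xpath.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AbbreviationDemo {
    // Hypothetical document
    static final String XML = "<doc><chapter>"
            + "<para type=\"warning\">w</para><para>p</para>"
            + "<section><para>nested</para></section>"
            + "</chapter></doc>";

    // Returns how many nodes the expression selects
    static int count(String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Each pair: abbreviated form, then the full (unabbreviated) form
        String[][] pairs = {
            {"/doc/chapter/para", "/child::doc/child::chapter/child::para"},
            {"//para", "/descendant-or-self::node()/child::para"},
            {"//para[@type='warning']",
             "/descendant-or-self::node()/child::para[attribute::type='warning']"},
            {"//section/para/..",
             "/descendant-or-self::node()/child::section/child::para/parent::node()"},
        };
        for (String[] pair : pairs) {
            System.out.println(pair[0] + ": " + count(pair[0]) + " = " + count(pair[1]));
        }
    }
}
```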
XPath - short notation (2)
The most commonly used abbreviation is for the child axis:
we write article/para instead of child::article/child::para.
For the attribute axis we write para[@type="warning"]
instead of child::para[attribute::type="warning"].
Another frequently used abbreviation is //
instead of /descendant-or-self::node()/
and, of course, the shortcuts . (for self::node()) and .. (for parent::node()).
For clarity, we sometimes keep the longer form: do not insist on abbreviating at all costs!
Further Information on XPath
XPath on W3C: http://www.w3.org/TR/xpath
Zvon XPath Tutorial: http://zvon.org/xxl/XPathTutorial/Output/index.html
XPath Tutorial on W3Schools: http://www.w3schools.com/xpath/xpath_intro.asp
XPath 2.0
Final specification available at - http://www.w3.org/TR/xpath20/
A different view of the values of XPath expressions: everything is a sequence (even a single item)
→ this removes the problems with the ordering of node sets.
Introduces conditional expressions and loops (for expressions).
Introduces user-defined functions (dynamically evaluate XPath expressions)
Users can use universal and existential quantifiers, e.g. some $s in student satisfies $s/name = "Fred", every $s in student satisfies $s/@id.
For more details see http://www.saxonica.com/ (the site also offers Saxon, an XPath/XSLT/XQuery processor).
XPath 2.0 - examples
String functions - http://www.fi.muni.cz/~tomp/xml03/xpath20/string.html
Numeric functions - http://www.fi.muni.cz/~tomp/xml03/xpath20/numeric.html
Sequence functions - http://www.fi.muni.cz/~tomp/xml03/xpath20/sequence.html
Boolean functions - http://www.fi.muni.cz/~tomp/xml03/xpath20/boolean.html