XML Namespaces

  • XML Namespaces (W3C Recommendation, currently Namespaces in XML 1.0 (Third Edition) W3C Recommendation 8 Dec 2009): http://www.w3.org/TR/REC-xml-names

  • For XML 1.1, there is Namespaces in XML 1.1 (Second Edition), W3C Recommendation 16 August 2006 (editors: Andrew Layman, Richard Tobin, Tim Bray, Dave Hollander).

  • They define logical spaces for the names of elements and attributes in an XML document.

  • They give the elements and attributes the "third dimension".

  • Each namespace is identified by exactly one ("globally") unique identifier, a URI (URIs are a superset of URLs).

  • A namespace URI is not related in any way to content that might be available at that URL ("nothing is downloaded when processing namespaces").

Prefixes and Equivalence of NSs (1)

  • Instead of writing URIs to denote namespaces in the document, one uses prefixes, each bound to its URI with xmlns:prefix="URI".

  • An element or attribute name containing a colon (:), e.g. xi:include, is called a qualified name (QName).

  • Two namespaces are equal iff their URIs are identical character by character (in Unicode).

  • Namespaces do not apply to text nodes.

  • An element or attribute need not be in any namespace.

  • A prefix declaration or a default namespace declaration applies to the element that carries it and recursively to all its descendants (child elements, their children, etc.), unless another declaration rebinds ("remaps") the given prefix.

  • One namespace can be declared as the so-called implicit (default) namespace, using the attribute xmlns="URI".

  • The default namespace is NOT applied to attributes! Attributes without an explicit prefix therefore belong to no namespace.
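The rule that the default namespace covers elements but not unprefixed attributes can be observed with the JDK's namespace-aware DOM parser. This is a minimal sketch; the class name DefaultNamespaceDemo is hypothetical, and the input is the html start tag from Example 1 below, written as an empty element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo {
    // Parses an XML string with a namespace-aware DOM parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JAXP factories are NOT namespace-aware by default.
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html xmlns=\"http://www.w3.org/1999/xhtml\""
            + " xml:lang=\"en\" lang=\"en\"/>");
        Element html = doc.getDocumentElement();
        // The unprefixed element is in the default namespace.
        System.out.println(html.getNamespaceURI());
        // The prefixed attribute xml:lang is in the predeclared XML namespace.
        System.out.println(html.getAttributeNode("xml:lang").getNamespaceURI());
        // The unprefixed attribute lang is in no namespace at all (null).
        System.out.println(html.getAttributeNode("lang").getNamespaceURI());
    }
}
```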

Example 1. Default NS

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

Example 2. Prefixed NS

<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

  • Namespaces are not compatible with DTDs.

  • A DTD compares names purely lexically: it distinguishes, e.g., xi:include from include even if both belong to the same namespace and should thus have the same interpretation/meaning for applications.
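The difference between the lexical name (what a DTD sees) and the namespace-based name (URI + local name) shows up directly in the DOM API. This sketch uses the XInclude namespace URI with two equivalent spellings of include; the class name QNameDemo and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QNameDemo {
    static final String XI = "http://www.w3.org/2001/XInclude";
    // Both child elements are {XI}include, written with different QNames.
    static final String XML =
        "<doc xmlns:xi=\"" + XI + "\">"
        + "<xi:include href=\"a.xml\"/>"
        + "<include xmlns=\"" + XI + "\" href=\"b.xml\"/>"
        + "</doc>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // Lexical (DTD-like) view: matches the literal qualified name only.
        System.out.println(doc.getElementsByTagName("xi:include").getLength()); // 1
        // Namespace view: matches namespace URI + local name.
        NodeList both = doc.getElementsByTagNameNS(XI, "include");
        System.out.println(both.getLength()); // 2
        for (int i = 0; i < both.getLength(); i++) {
            Element e = (Element) both.item(i);
            // prints "xi:include include", then "include include"
            System.out.println(e.getTagName() + " " + e.getLocalName());
        }
    }
}
```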

APIs for XML Processing (recap)

  • APIs offer simple, standardized access to XML.

  • APIs connect applications to the parser and to each other.

  • APIs allow XML processing without knowledge of physical document structure (entities).

  • APIs optimize XML processing.

Fundamental Types of XML APIs

  • Tree-based API

  • Event-based API

  • API based on pulling events/elements off the document (Pull API).
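To make the pull style concrete, here is a minimal sketch using StAX (javax.xml.stream), the pull API shipped with the JDK: the application loop asks for the next event instead of receiving callbacks. The class name PullApiDemo and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullApiDemo {
    // Counts start tags with the given local name by pulling events one by one.
    static int countElements(String xml, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            // The application, not the parser, decides when to fetch the next event.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(countElements("<a><b/><c><b/></c><b/></a>", "b")); // 3
    }
}
```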

Tree-based API

  • They map an XML document to a memory-based tree structure.

  • They allow traversal of the entire tree.

  • The best known is the Document Object Model (DOM) from the W3C, see http://www.w3.org/DOM
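Traversing the entire in-memory tree can be sketched as a recursive, depth-first walk over the W3C DOM in Java, starting at the document node. The class name TreeWalkDemo and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TreeWalkDemo {
    // Pre-order walk: visit the node, then its children, i.e. document order.
    static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }

    static List<String> elementNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> names = new ArrayList<>();
        collect(doc, names); // start at the document node
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b><c/></b><d/></a>")); // [a, b, c, d]
    }
}
```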

Programming Language Specific Models

Document Object Model (DOM)

  • Basic interface to process and access the tree representation of XML data

  • Three versions of DOM: DOM Level 1, 2, 3

  • DOM does not depend on how the XML is parsed.

  • Described in IDL, plus API bindings for particular programming languages (C++, Java, etc.)

HTML Documents Specific DOM

  • The HTML Core DOM is more or less consolidated with the XML DOM.

  • Designed to work with CSS.

  • Used for dynamic HTML programming (scripting in VBScript, JavaScript, etc.)

  • Besides the document model itself, it also covers the browser environment (windows, history, etc.).

DOM references

Using DOM in Java

  • Recent Java versions (JDK and JRE) support DOM natively; no additional library is needed.

  • Applications import the required symbols (interfaces, classes, etc.), mostly from the package org.w3c.dom.

What will we need often?

The most often used interfaces are:

  • Element corresponds to an element in the logical document structure. It gives access to the element name, attribute names and child nodes (including text nodes). Useful methods:

  • Node getParentNode() - returns the parent node

  • String getTextContent() - returns the text content of the element (including the text of its descendants).

  • NodeList getElementsByTagName(String name) - returns the list of descendant elements (children, their children, etc.) with the given name.

  • Node is the superinterface of Element; it corresponds to a general node in the logical document structure: an element, a text node, a comment, etc.

  • NodeList is a list of nodes (for example, the result of calling getElementsByTagName). It offers the following methods for processing it:

  • int getLength() - returns the number of nodes in a list

  • Node item(int index) - returns the node at position index

  • Document corresponds to the document node (it is the parent of the root element).
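The interfaces listed above typically work together as in the following sketch: Document and Element search with getElementsByTagName, NodeList is iterated with getLength/item, and Node navigates upward with getParentNode. The staff/person/name/salary vocabulary and the class name DomInterfacesDemo are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomInterfacesDemo {
    // Hypothetical sample data.
    static final String XML =
        "<staff>"
        + "<person><name>Alice</name><salary>1000</salary></person>"
        + "<person><name>Bob</name><salary>1500</salary></person>"
        + "</staff>";

    static List<String> salaries(Document doc) {
        List<String> result = new ArrayList<>();
        // Document.getElementsByTagName searches the whole document.
        NodeList list = doc.getElementsByTagName("salary");
        // NodeList: getLength() and item(i).
        for (int i = 0; i < list.getLength(); i++) {
            Element salary = (Element) list.item(i);
            // Node.getParentNode() returns the enclosing <person>.
            Element person = (Element) salary.getParentNode();
            // Element.getElementsByTagName searches only person's descendants.
            String name = person.getElementsByTagName("name").item(0).getTextContent();
            result.add(name + "=" + salary.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        // The document node is the parent of the root element <staff>.
        System.out.println(doc.getDocumentElement().getNodeName()); // staff
        System.out.println(salaries(doc)); // [Alice=1000, Bob=1500]
    }
}
```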

Example 1 - creating DOM tree from file

import java.io.IOException;
import java.net.URL;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class Task1 {
  private Document doc; // the parsed tree, kept for the other methods
  private Task1(URL url) throws SAXException,
    ParserConfigurationException, IOException {
    // We create new instance of factory class
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // We get new instance of DocumentBuilder using the factory class.
    DocumentBuilder builder = factory.newDocumentBuilder();
    // We utilize the DocumentBuilder to process an XML document
    // and we get document model in form of W3C DOM
    doc = builder.parse(url.toString());
  }
}

Example 2 - DOM tree modification

import org.w3c.dom.*;

public class Task1 {
  private Document doc;
  // Method for a salary modification.
  // If the person's salary is less than the
  // minimum, the salary is increased to the minimum.
  // No action is performed for the other persons.
  public void adjustSalary(double minimum) {
    // get the list of salaries
    NodeList salaries = doc.getElementsByTagName("salary");
    for (int i = 0; i < salaries.getLength(); i++) {
      // get the salary element
      Element salaryElement = (Element) salaries.item(i);
      // get payment (the text content of the element)
      double salary = Double.parseDouble(
          salaryElement.getTextContent().trim());
      if (salary < minimum) {
        // modify the text node/content of element
        salaryElement.setTextContent(String.valueOf(minimum));
      }
    }
  }
}

Example 3 - storing a DOM tree into an XML file

Example of a method storing a DOM tree into a file (see Homework 1). The procedure uses a transformation, which we have not covered yet; let us use it as a black box.

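import java.io.File;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class Task1 {
   private Document doc;
   // transform() throws TransformerException (the superclass of
   // TransformerConfigurationException), so that is what we declare.
   public void serializetoXML(File output) throws TransformerException {
      // We create new instance of a factory class.
      TransformerFactory factory
        = TransformerFactory.newInstance();
      Transformer transformer
        = factory.newTransformer();
      // The input is the document placed in a memory
      DOMSource source = new DOMSource(doc);
      // The transformation output is the output file
      StreamResult result = new StreamResult(output);
      // Let's make the transformation
      transformer.transform(source, result);
   }
}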

Alternative tree-based models

XML Object Model (XOM)

  • XOM (XML Object Model) was created as a one-man project (author Elliotte Rusty Harold).

  • It is an interface that strictly respects the logical model of XML data.

  • For motivation and specification, see the XOM home page (http://www.xom.nu).

  • There you can get the open-source XOM implementation and the API documentation, too.

DOM4J - practically usable tree-based model

Tree and event-based access combinations

  • Events → tree

  • Tree → events

Events → tree

  • They allow us to skip or filter out the "uninteresting" parts of a document by monitoring events, and then

  • to build an in-memory tree from the "interesting" part of the document only and process that part.
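The events → tree combination can be sketched with a SAX handler that ignores events until it meets an "interesting" element and only then starts creating DOM nodes, so the rest of the document is never held as a tree. The class name EventsToTreeDemo, the wrapper element result and the log/entry/error vocabulary are all hypothetical; namespaces are ignored for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventsToTreeDemo extends DefaultHandler {
    private final String wanted; // name of the "interesting" element
    private final Document doc;  // holds the collected subtrees
    private final Element root;  // hypothetical wrapper element <result>
    private Node current;        // insertion point; null while skipping

    EventsToTreeDemo(String wanted) throws Exception {
        this.wanted = wanted;
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        root = doc.createElement("result");
        doc.appendChild(root);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && qName.equals(wanted)) {
            current = root; // entering an interesting part
        }
        if (current != null) {
            Element e = doc.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                e.setAttribute(atts.getQName(i), atts.getValue(i));
            }
            current.appendChild(e);
            current = e;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            current = current.getParentNode();
            if (current == root) {
                current = null; // left the interesting part
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    static Element extract(String xml, String wanted) throws Exception {
        EventsToTreeDemo handler = new EventsToTreeDemo(wanted);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.root;
    }

    public static void main(String[] args) throws Exception {
        String log = "<log><entry>ok</entry><error code=\"7\">disk full</error>"
            + "<entry>ok</entry></log>";
        Element result = extract(log, "error");
        System.out.println(result.getElementsByTagName("error").getLength()); // 1
        System.out.println(result.getTextContent()); // disk full
    }
}
```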

Tree → events

  • We build an entire document tree (and process it), and

  • then traverse the tree, generating events as if the XML file were being read.

  • It allows us easy integration of both processing types in a single application.
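In the JDK, one way to turn a tree back into events is an identity transformation from a DOMSource into a SAXResult, which replays the DOM as SAX callbacks on any ContentHandler. This is a minimal sketch; the class name TreeToEventsDemo and the recorded event strings are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TreeToEventsDemo {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Replays an in-memory DOM tree as a stream of SAX events.
    static List<String> replay(Document doc) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        // Identity transformation (no stylesheet): DOM in, SAX events out.
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new SAXResult(handler));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(replay(parse("<a><b/></a>")));
        // [start a, start b, end b, end a]
    }
}
```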

Virtual object models

  • The DOM model of the document is not held in memory; it is created on demand as particular nodes are accessed.

  • combines event-based and tree-based processing advantages (speed and comfort)

  • There is an implementation: the Sablotron processor, http://www.xml.com/pub/a/2002/03/13/sablotron.html

XPath - basic principles

  • XPath is a syntax used to address parts of XML documents (nodes, sets of nodes, sequences of nodes); it cannot address parts of text nodes.

  • XPath uses a syntax similar to file-system paths.

  • XPath offers a standard function library (some processors, XPath 2.0 ones and even some XPath 1.x ones, also support user-defined functions).

  • XPath is the basis of XSLT (XPath 1.0 underlies XSLT 1.0) and of XQuery (XQuery 1.0 extends XPath 2.0).

  • XPath does not use XML syntax (it would be too verbose).

  • XPath 1.0 and 2.0 are W3C Recommendations - http://www.w3.org/TR/xpath
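Java ships an XPath evaluator in javax.xml.xpath; note that the JDK implementation supports XPath 1.0 only. A minimal sketch of evaluating path expressions to their string values; the class name XPathBasicsDemo and the lib/book/title document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathBasicsDemo {
    // Hypothetical sample document.
    static final String XML = "<lib><book><title>XML</title></book>"
        + "<book><title>XPath</title></book></lib>";

    // Evaluates an XPath 1.0 expression and returns the string value of the result.
    static String eval(String xml, String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(eval(XML, "/lib/book[2]/title")); // XPath
        System.out.println(eval(XML, "count(//book)"));      // 2
    }
}
```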

XPath - Application Domains

  • Advanced XML Data navigation

  • Select the 3rd node b:

  • Select the nodes b that have a child node c:

  • Select the empty nodes b:


XPath - Application Domains

  • Transformation (XSLT)

  • used to select the nodes to be processed

   <xsl:value-of select="./c"/>
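The xsl:value-of instruction above evaluates its select expression relative to the current node. A minimal sketch running such a stylesheet with the JDK's built-in XSLT 1.0 processor; the stylesheet wrapped around the instruction, the input document and the class name XsltValueOfDemo are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltValueOfDemo {
    // For every b element, output the string value of its child c, then ";".
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//b\">"
        + "<xsl:value-of select=\"./c\"/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xslt, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Inside xsl:for-each, each b in turn is the context node for ./c
        System.out.println(transform(XSLT, "<a><b><c>1</c></b><b><c>2</c></b></a>")); // 1;2;
    }
}
```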

XPath - Application Domains

  • The selection parts of XML query languages (XQuery)

  • Some XML modeling languages (Schematron, XML Schema)

XPath - terms paths and locations

  • A path describes (i.e. "navigates to") a location in an XML document. Path syntax is constructed similarly to file-system paths; a path is either

    • relative - evaluated relative to a context node (CN), see below, or

    • absolute - starting at the root node of the document (the parent of the root element), although predicates are still evaluated relative to the CN.
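The difference between relative and absolute paths can be seen by evaluating both against the same context node with javax.xml.xpath. A minimal sketch; the class name PathContextDemo and the a/b/c document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PathContextDemo {
    static final String XML = "<a><b><c/><c/></b><b><c/></b></a>";

    // Returns {relative count, absolute count} with /a/b[2] as the context node.
    static int[] counts() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xp = XPathFactory.newInstance().newXPath();
        // Choose the second b element as the context node (CN).
        Node cn = (Node) xp.evaluate("/a/b[2]", doc, XPathConstants.NODE);
        // Relative path: evaluated from CN, so only CN's own c children.
        NodeList rel = (NodeList) xp.evaluate("c", cn, XPathConstants.NODESET);
        // Absolute path: starts at the root node, whatever CN is.
        NodeList abs = (NodeList) xp.evaluate("/a/b/c", cn, XPathConstants.NODESET);
        return new int[] {rel.getLength(), abs.getLength()};
    }

    public static void main(String[] args) throws Exception {
        int[] c = counts();
        System.out.println("relative: " + c[0] + ", absolute: " + c[1]);
        // relative: 1, absolute: 3
    }
}
```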

XPath - syntactic rules

[20] PathExpr ::= AbsolutePathExpr | RelativePathExpr
[22] AbsolutePathExpr ::= ("/" RelativePathExpr?) | ("//" RelativePathExpr)
[23] RelativePathExpr ::= StepExpr (("/" | "//") StepExpr)*
[24] StepExpr ::= AxisStep | GeneralStep
[25] AxisStep ::= (Axis? NodeTest StepQualifiers) | AbbreviatedStep

XPath - axes

  • An axis (plural axes) is a set of document nodes selected relative to the context.

  • Context is formed by a document and the current (context) node (CN).

  • Axes are:

    • child - contains direct child nodes of CN

    • descendant - contains all descendants of CN (attribute and namespace nodes are never included).

    • parent - contains the parent node of CN (if it exists)

    • ancestor - contains all ancestors of CN: its parent, grandparent, etc., up to and including the root node (empty if CN is the root node itself)

    • following-sibling - contains all following siblings of CN (the axis is empty for namespace and attribute nodes)

    • preceding-sibling - likewise, but contains the preceding siblings.

    • following - contains all nodes following CN in document order (except descendants, attribute nodes and namespace nodes)

    • preceding - likewise, but contains all nodes preceding CN (except ancestors, attribute nodes and namespace nodes)

    • attribute - contains the attributes of CN (non-empty for elements only)

    • namespace - contains all namespace nodes of CN (non-empty for elements only)

    • self - the CN itself

    • descendant-or-self - contains the union of descendant and self axes

    • ancestor-or-self - contains the union of ancestor and self axes

Figure 1. //b/child::*

Example 3. //b/descendant::*

Example 4. //d/parent::*

Example 5. //d/ancestor::*

Example 6. //b/following-sibling::*

Example 7. //b/preceding-sibling::*

Example 8. /a/b/c/following::*

Example 9. /a/b/e/preceding::*
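The axis expressions listed above can be tried out with javax.xml.xpath. Since the original sample document is not available, this sketch uses a hypothetical one (a has children x, b, y; b has children c, d, e; d has a child f). Selected element names are collected into a sorted set, so the printed order is alphabetical, not document order; the class name AxesDemo is also hypothetical.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AxesDemo {
    // Hypothetical document for the axis examples.
    static final String XML = "<a><x/><b><c/><d><f/></d><e/></b><y/></a>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Names of the selected elements, sorted alphabetically.
    static Set<String> select(Document doc, String expr) throws XPathExpressionException {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET);
        Set<String> names = new TreeSet<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        String[] exprs = {
            "//b/child::*",             // [c, d, e]
            "//b/descendant::*",        // [c, d, e, f]
            "//d/parent::*",            // [b]
            "//d/ancestor::*",          // [a, b]
            "//b/following-sibling::*", // [y]
            "//b/preceding-sibling::*", // [x]
            "/a/b/c/following::*",      // [d, e, f, y]
            "/a/b/e/preceding::*"       // [c, d, f, x]
        };
        for (String e : exprs) {
            System.out.println(e + " -> " + select(doc, e));
        }
    }
}
```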

XPath - predicates

  • A predicate filters a node set, for example one selected by a path.

  • Example: /article/para[3] selects the third paragraph (element para) of the article (element article).

  • The simplest predicate expression is a proximity position, as in the preceding example.

    • Beware of reverse axes (ancestor, preceding, …): positions are always counted outward from CN, i.e. opposite to document order.

    • The position specification 3 can be replaced by the expression position()=3.
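Both the position()=3 equivalence and the reverse-axis numbering can be checked with javax.xml.xpath: on the ancestor and preceding-sibling axes, [1] is the node nearest to the context node. A minimal sketch; the class name PredicateDemo and the article/para/sect document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class PredicateDemo {
    // Hypothetical document.
    static final String XML = "<article><para>one</para><para>two</para>"
        + "<para>three</para><sect><para>deep</para></sect></article>";

    // Returns the string value of an XPath 1.0 expression over XML.
    static String eval(String expr) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath()
            .evaluate(expr, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(eval("/article/para[3]"));            // three
        System.out.println(eval("/article/para[position()=3]")); // three
        // Reverse axes: positions count outward from the context node.
        System.out.println(eval("/article/para[3]/preceding-sibling::para[1]")); // two
        System.out.println(eval("name(//sect/para/ancestor::*[1])")); // sect
        System.out.println(eval("name(//sect/para/ancestor::*[2])")); // article
    }
}
```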

XPath - expressions

  • Expressions are used in predicates, for computations, etc. They may contain XPath function calls.

  • Expressions may operate on:

    • text strings

    • numbers (floating-point numbers)

    • logical values (boolean)

    • nodes

    • sequences.
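The JDK's XPath API (XPath 1.0 only, so it has no sequences) lets the caller request the result as a string, number, boolean or node set through the XPathConstants return types. A minimal sketch; the class name ExpressionTypesDemo and the order/item document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExpressionTypesDemo {
    // Hypothetical document.
    static final String XML = "<order><item price=\"2.5\"/><item price=\"4\"/></order>";

    // Evaluates expr and converts the result to the requested XPath type.
    static Object eval(String expr, QName type) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
    }

    public static void main(String[] args) throws Exception {
        // number (a Java Double)
        System.out.println(eval("sum(//item/@price)", XPathConstants.NUMBER)); // 6.5
        // string
        System.out.println(eval("concat('items: ', count(//item))", XPathConstants.STRING)); // items: 2
        // boolean: true if ANY price is greater than 3
        System.out.println(eval("//item/@price > 3", XPathConstants.BOOLEAN)); // true
        // node set
        NodeList items = (NodeList) eval("//item", XPathConstants.NODESET);
        System.out.println(items.getLength()); // 2
    }
}
```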

XPath - short notation - Examples

  • para selects all para element children of the context node

  • * selects all element children of the context node

  • text() selects all text node children of the context node

  • @name selects the name attribute of the context node

  • @* selects all the attributes of the context node

  • para[1] selects the first para child of the context node

  • para[last()] selects the last para child of the context node

  • */para selects all para grandchildren of the context node

  • /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc

  • chapter//para - selects all para descendants of the chapter element children of the context node

  • //para - selects all elements para in the document

  • //olist/item - selects all elements item with parent element olist

  • .//para selects all para element descendants of CN

  • .. selects the parent node of CN

  • ../@lang selects the lang attribute of the parent node of CN

XPath - short notation (2)

The most commonly used abbreviation is for the child axis:

  • we use article/para instead of child::article/child::para.

  • for the attribute axis, we use para[@type="warning"] instead of child::para[attribute::type="warning"]

  • Another frequent abbreviation is // for /descendant-or-self::node()/

  • and, of course, the shortcuts . (self::node()) and .. (parent::node())

For clarity, we sometimes keep the longer form: do not insist on abbreviating at all costs!
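That each abbreviation is just shorthand for its long form can be checked by evaluating both forms against the same context node and comparing the selected nodes. A minimal sketch with javax.xml.xpath; the class name AbbreviationDemo and the doc/article/para document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AbbreviationDemo {
    // Hypothetical document.
    static final String XML = "<doc><article><para type=\"warning\">w</para>"
        + "<para>n</para></article></doc>";

    // Each pair: {abbreviated form, unabbreviated form}.
    static final String[][] PAIRS = {
        {"article/para", "child::article/child::para"},
        {"article/para[@type='warning']",
         "child::article/child::para[attribute::type='warning']"},
        {"//para", "/descendant-or-self::node()/child::para"},
        {".", "self::node()"},
        {"article/para/..", "child::article/child::para/parent::node()"}
    };

    static boolean sameNodes(NodeList x, NodeList y) {
        if (x.getLength() != y.getLength()) return false;
        for (int i = 0; i < x.getLength(); i++) {
            if (!x.item(i).isSameNode(y.item(i))) return false;
        }
        return true;
    }

    // True if every pair selects the same, non-empty node list from <doc>.
    static boolean allEquivalent() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        Node context = doc.getDocumentElement(); // <doc> is the context node
        XPath xp = XPathFactory.newInstance().newXPath();
        for (String[] p : PAIRS) {
            NodeList a = (NodeList) xp.evaluate(p[0], context, XPathConstants.NODESET);
            NodeList b = (NodeList) xp.evaluate(p[1], context, XPathConstants.NODESET);
            if (a.getLength() == 0 || !sameNodes(a, b)) return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(allEquivalent()); // true
    }
}
```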

Further Information on XPath

XPath 2.0

  • Final specification available at - http://www.w3.org/TR/xpath20/

  • A different view of the return values of XPath expressions: everything is a sequence (a single item is a sequence of length one)

  • → this removes the ordering problems of node sets

  • Introduces conditional expressions (if-then-else) and iteration (for expressions).

  • Supports user-defined functions, declared in the host language (e.g. XSLT 2.0 or XQuery).

  • Quantified expressions, existential and universal, for example: some $s in student satisfies $s/name = "Fred", every $s in student satisfies $s/@id

  • For more details, see http://www.saxonica.com/ (the site also offers the Saxon XPath/XSLT/XQuery processor).

XPath 2.0 - examples