XML Namespaces
-
XML Namespaces (W3C Recommendation, currently Namespaces in XML 1.0 (Third Edition) W3C Recommendation 8 Dec 2009): http://www.w3.org/TR/REC-xml-names
-
For XML 1.1, there is Namespaces in XML 1.1 (Second Edition), W3C Recommendation 16 August 2006. Editors: Andrew Layman, Richard Tobin, Tim Bray, Dave Hollander
-
Namespaces define logical spaces for the names of elements and attributes in an XML document.
-
They give the elements and attributes the "third dimension".
-
Each NS in XML has exactly one ("globally") unique identifier, given by a URI (URIs are a superset of URLs).
-
The NS corresponding to a URI is in no way related to any content that might be available at that URL ("nothing is downloaded when processing NSs").
Prefixes and Equivalence of NSs (1)
-
Instead of writing the URI itself to denote a namespace in a document, one uses a prefix mapped to the respective URI using
xmlns:prefix="URI"
. -
An element or attribute name that may carry a prefix, separated by a colon (
:
) from the local part, is called a Qualified Name (QName); an unprefixed name is a QName too. -
Two NSs are equal iff their URIs are identical character for character (in Unicode).
-
Namespaces do not apply to text nodes.
-
Element/attribute need not be in a namespace.
-
An NS prefix declaration or a declaration of the implicit NS applies to the element it appears on and recursively to all its descendants (child elements, their children etc.), unless another declaration "remaps" the given prefix.
-
One NS is the so-called implicit (default) NS, declared by the attribute
xmlns=
-
Default NSs are NOT applied to attributes! Attributes without an explicit prefix therefore do not belong to any NS.
Example 1. Default NS
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<body>
<h1>Huraaaa</h1>
</body>
</html>
Example 2. Prefixed NS
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<xhtml:body>
<xhtml:h1>Huraaaa</xhtml:h1>
</xhtml:body>
</xhtml:html>
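The rules above can be checked with a namespace-aware DOM parser. The following is only a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name NamespaceDemo and its parse helper are hypothetical names introduced for illustration. It compares the root elements of Example 1 and Example 2 (shortened) and looks at the namespace of the unprefixed lang attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Parses an XML string with namespace processing switched on
    // (JAXP factories are not namespace-aware by default).
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Element byDefault = parse("<html xmlns='" + XHTML + "' lang='en'><body/></html>")
                .getDocumentElement();
        Element byPrefix = parse("<x:html xmlns:x='" + XHTML + "' lang='en'><x:body/></x:html>")
                .getDocumentElement();

        // Same NS URI and same local name, although the QNames differ.
        System.out.println(byDefault.getNamespaceURI().equals(byPrefix.getNamespaceURI()));
        System.out.println(byDefault.getLocalName().equals(byPrefix.getLocalName()));
        // The declaration is inherited by the child element body.
        System.out.println(byDefault.getFirstChild().getNamespaceURI());
        // The unprefixed attribute lang is in no namespace.
        System.out.println(byDefault.getAttributeNode("lang").getNamespaceURI());
    }
}
```

For an application that compares namespace URI and local name, both documents are therefore equivalent; only the serialized QNames differ.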
Issues related to NS
-
NSs are not compatible with DTDs.
-
A DTD strictly differentiates between, e.g., the names
xi:include
and include
even if they belong to the same NS and should thus have the same interpretation/meaning for applications.
API for XML Processing (recap)
-
APIs offer simple standardized XML access.
-
APIs connect an application to the parser, and applications to each other.
-
APIs allow XML processing without knowledge of physical document structure (entities).
-
APIs optimize XML processing.
XML APIs Fundamental Types
-
Tree-based API
-
Event-based API
-
API based on pulling events/elements off the document (Pull API).
Tree-based API
-
Tree-based APIs map an XML document to a memory-based tree structure.
-
They allow the entire tree to be traversed.
-
The best known is the Document Object Model (DOM from W3C, see http://www.w3.org/DOM)
Programming Language Specific Models
-
Java: JDOM - http://jdom.org
-
Java: dom4j - http://dom4j.org
-
Java: XOM - http://www.xom.nu
-
Python: 4Suite - http://4suite.org
-
PHP: SimpleXML - http://www.php.net/simplexml
Document Object Model (DOM)
-
Basic interface to process and access the tree representation of XML data
-
Three versions of DOM: DOM Level 1, 2, 3
-
DOM does not depend on how the XML is parsed.
-
Described using IDL + API descriptions for particular programming languages (C++, Java, etc.)
HTML Documents Specific DOM
-
The HTML Core DOM is more or less consolidated with the XML DOM
-
Designed to work with CSS
-
Used for dynamic HTML programming (scripting using VBScript, JavaScript, etc.)
-
Contains the browser environment (windows, history, etc) besides the document model itself.
DOM references
-
JAXP Tutorial, part dedicated to the DOM Part III: XML and the Document Object Model (DOM) (http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/dom/index.html)
-
Portal dedicated to the DOM http://www.oasis-open.org/cover/dom.html
-
DOM 1 Interface visual overview http://www.xml.com/pub/a/1999/07/dom/index.html
-
Tutorial "Understanding DOM (Level 2)" available at https://www.ibm.com/developerworks/xml/
Using DOM in Java
-
DOM is supported natively in current Java versions (JDK and JRE); no additional library is needed.
-
Applications need to import the required symbols (interfaces, classes, etc.), mostly from package
org.w3c.dom
.
What will we need often?
Most often used interfaces are:
-
Element
corresponds to an element in the logical document structure. It gives access to the name of the element, its attributes and its child nodes (including text nodes). Useful methods: -
Node getParentNode()
- returns the parent node -
String getTextContent()
- returns textual content of the element. -
NodeList getElementsByTagName(String name)
- returns the list of descendant elements (child elements and their descendants) with the given name. -
Node
superinterface of Element
, corresponds to a general node in the logical document structure; it may be an element, text node, comment, etc. -
NodeList
a list of nodes (a result of calling getElementsByTagName
for example). It offers the following methods for its processing: -
int getLength()
- returns the number of nodes in a list -
Node item(int index)
- returns the node at position index -
Document
corresponds to the document node (it is the parent of the root element)
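The interplay of these interfaces can be sketched as follows. This is an illustrative example only: the class DomInterfacesDemo, the method describe and the staff/person/name vocabulary are assumptions made up for the sketch, not part of the course material.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomInterfacesDemo {
    // Returns "parentName/name=text" for every element with the given name.
    static List<String> describe(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        // Searches all descendants of root, not only its direct children.
        NodeList found = root.getElementsByTagName(name);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            Element e = (Element) found.item(i);
            out.add(e.getParentNode().getNodeName() + "/" + e.getTagName()
                    + "=" + e.getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<staff><person><name>Ann</name></person>"
                + "<team><person><name>Bob</name></person></team></staff>";
        System.out.println(describe(xml, "name"));
    }
}
```

Note that the second name element is found although it is nested one level deeper than the first one.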
Example 1 - creating DOM tree from file
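import java.io.IOException;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class Task1 {
    private Document doc;

    private Task1(URL url) throws SAXException,
            ParserConfigurationException, IOException {
        // We create a new instance of the factory class.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // We get a new instance of DocumentBuilder from the factory.
        DocumentBuilder builder = factory.newDocumentBuilder();
        // We use the DocumentBuilder to process an XML document
        // and keep the document model in the form of a W3C DOM.
        doc = builder.parse(url.toString());
    }
}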
Example 2 - DOM tree modification
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class Task1 {
    private Document doc;

    // Method for salary modification.
    // If a person's salary is less than the minimum,
    // the salary is increased to the minimum.
    // No action is performed for the other persons.
    public void adjustSalary(double minimum) {
        // get the list of salary elements
        NodeList salaries = doc.getElementsByTagName("salary");
        for (int i = 0; i < salaries.getLength(); i++) {
            // get the salary element
            Element salaryElement = (Element) salaries.item(i);
            // get the current salary as a number
            double salary = Double.parseDouble(
                    salaryElement.getTextContent());
            if (salary < minimum) {
                // modify the text node/content of the element
                salaryElement.setTextContent(String.valueOf(minimum));
            }
        }
    }
}
Example 3 - storing a DOM tree into an XML file
Example of a method storing a DOM tree into a file (see Homework 1). The procedure uses a transformation, which we have not covered yet. Let us use it as a black box.
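import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class Task1 {
    private Document doc;

    // transform() throws the checked TransformerException, which also
    // covers TransformerConfigurationException (its subclass).
    public void serializetoXML(File output) throws TransformerException {
        // We create a new instance of the factory class.
        TransformerFactory factory
                = TransformerFactory.newInstance();
        // Without a stylesheet we get an identity transformation.
        Transformer transformer
                = factory.newTransformer();
        // The input is the document held in memory.
        DOMSource source = new DOMSource(doc);
        // The transformation output is the output file.
        StreamResult result = new StreamResult(output);
        // Let's perform the transformation.
        transformer.transform(source, result);
    }
}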
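Examples 1 to 3 can be combined into one round trip that is easy to try out without any files. The sketch below is an assumption-laden variant, not the homework solution: the class RoundTripDemo and the method roundTrip are hypothetical, the input is a string instead of a URL, and the result is written to a StringWriter instead of a File.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RoundTripDemo {
    // Parses the XML, sets every salary element to newValue and
    // serializes the modified tree back to a string.
    static String roundTrip(String xml, String newValue) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList salaries = doc.getElementsByTagName("salary");
        for (int i = 0; i < salaries.getLength(); i++) {
            salaries.item(i).setTextContent(newValue);
        }
        // Identity transformation, used as a black-box serializer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<staff><salary>100</salary></staff>", "150"));
    }
}
```

Only the Result object changes compared with Example 3; the same transformer can write to a file, a stream or a writer.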
Alternative tree-based models
XML Object Model (XOM)
-
XOM (XML Object Model) was created as a one-man project (author Elliotte Rusty Harold).
-
It is an interface that strictly respects the logical model of XML data.
-
For motivation and specification see the XOM home page (http://www.xom.nu).
-
The open-source XOM implementation and the API documentation are available there, too.
DOM4J - practically usable tree-based model
-
convenient, fast and memory-efficient tree-oriented interface
-
designed and optimized for Java
-
available as open-source at http://dom4j.org
-
an excellent "cookbook" (http://dom4j.org/cookbook/cookbook.html) is available
-
dom4j is powerful, see the tree-based models efficiency comparison (http://www.ibm.com/developerworks/xml/library/x-injava/)
Tree and event-based access combinations
-
Events → tree
-
Tree → events
Events → tree
-
They allow us to skip or filter out the "uninteresting" part of a document by monitoring the events, and then
-
to create a memory-based tree from the "interesting" part of the document only and process just that part.
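One possible way to realize the events → tree combination is a SAX handler that creates DOM nodes only while it is inside the interesting element. This is a minimal sketch under stated assumptions: the class SubtreeBuilder and the method extract are hypothetical, only the first matching element is kept, and namespaces are ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SubtreeBuilder extends DefaultHandler {
    private final String wanted; // name of the "interesting" element
    private final Document doc;  // receives the partial tree
    private Node current;        // null while events are being skipped

    SubtreeBuilder(String wanted) throws Exception {
        this.wanted = wanted;
        this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        boolean firstMatch = qName.equals(wanted) && doc.getDocumentElement() == null;
        if (current == null && !firstMatch) {
            return; // uninteresting part: no node is created
        }
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        (current == null ? doc : current).appendChild(e);
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current != null) {
            Node parent = current.getParentNode();
            current = (parent == doc) ? null : parent; // leaving the subtree
        }
    }

    // Runs the SAX parser over xml and returns a DOM of the wanted subtree only.
    static Document extract(String xml, String wanted) throws Exception {
        SubtreeBuilder handler = new SubtreeBuilder(wanted);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.doc;
    }
}
```

Memory consumption then depends on the size of the extracted part, not on the size of the whole input document.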
Tree → events
-
We create an entire document tree (and process it), and
-
we then walk through the tree and generate events as if we were reading the XML file.
-
It allows us easy integration of both processing types in a single application.
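In Java, one way to obtain events from a tree is to run the identity transformation from a DOMSource into a SAXResult, so that an ordinary SAX handler receives the events. The sketch below assumes the transformer bundled with the JDK; the class TreeToEvents and the method replay are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TreeToEvents {
    // Builds a DOM tree from xml and replays it as SAX events,
    // recording the element start and end events in order.
    static List<String> replay(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("end " + qName);
            }
        };
        // The identity transformation walks the tree and fires the events.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(handler));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(replay("<a><b/></a>"));
    }
}
```

The handler does not need to know whether its events come from a parser or from a tree, which is what makes the integration easy.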
Virtual object models
-
The DOM model of the document is not held in memory as a whole, but is created on demand as particular nodes are accessed.
-
combines event-based and tree-based processing advantages (speed and comfort)
-
There is an implementation: the Sablotron processor, http://www.xml.com/pub/a/2002/03/13/sablotron.html
XPath - basic principles
-
XPath is a syntax used to address parts of XML documents (nodes, sets of nodes, sequences of nodes; it cannot address parts of text nodes).
-
XPath uses a syntax similar to file system paths.
-
XPath offers a standard function library (some XPath 2.0 and even XPath 1.x processors also support user-defined functions).
-
XPath is the basis of XSLT (since XSLT 1.0) and of XQuery (XQuery 1.0 builds on XPath 2.0).
-
XPath does not use XML syntax (it would be too long)
-
XPath 1.0 and 2.0 are W3C Recommendations - http://www.w3.org/TR/xpath
XPath - Application Domains
-
Advanced XML Data navigation
<?xml version="1.0"?>
<a>
<b/>
<b>
<c/>
</b>
<b>
<c/>
</b>
</a>
-
Select the 3rd node b:
//b[3]
-
Select the b nodes that have a child node c:
//b[./c]
-
Select the b nodes that have no child elements:
//b[count(./*)=0]
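The three expressions can be tried out with the javax.xml.xpath API that is part of the JDK (XPath 1.0 only). A minimal sketch; the class XPathNavigationDemo and the method count are hypothetical, and the document is the sample above written on one line.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathNavigationDemo {
    static final String XML = "<a><b/><b><c/></b><b><c/></b></a>";

    // Evaluates the expression against the sample document and
    // returns the number of selected nodes.
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("//b[3]"));            // 1: the third b
        System.out.println(count("//b[./c]"));          // 2: the b nodes with a child c
        System.out.println(count("//b[count(./*)=0]")); // 1: the b without child elements
    }
}
```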
XPath - Application Domains
-
Transformation (XSLT)
-
used to select the nodes that are to be processed
<xsl:value-of select="./c"/>
XPath - Application Domains
-
The selection part of XML query languages (XQuery)
-
Some XML modeling languages (Schematron, XML Schema)
-
…
XPath - terms paths and locations
-
A path describes (i.e. "navigates to") a location in an XML document. Path syntax is constructed similarly to paths in file systems; a path is either
-
relative - related to a context node (CN), see further, or
-
absolute - starting at the document root (the root node, i.e. the parent of the root element); predicates are still evaluated relative to their own context node.
-
XPath - syntactic rules
[20] PathExpr ::= AbsolutePathExpr | RelativePathExpr
[22] AbsolutePathExpr ::= ("/" RelativePathExpr?) | ("//" RelativePathExpr)
[23] RelativePathExpr ::= StepExpr (("/" | "//") StepExpr)*
[24] StepExpr ::= AxisStep | GeneralStep
[25] AxisStep ::= (Axis? NodeTest StepQualifiers) | AbbreviatedStep
XPath - axes
-
Axes (singular axis, plural axes) are sets of document nodes defined relative to the context.
-
Context is formed by a document and the current (context) node (CN).
-
Axes are:
-
child
- contains direct child nodes of CN -
descendant
- contains all descendants of CN except attributes. -
parent
- contains the parent node of CN (if it exists) -
ancestor
- contains all ancestors of CN, i.e. the parent, grandparent, etc. up to the root (the axis is empty if CN is the root itself) -
following-sibling
- contains all following siblings of CN (the axis is empty for NS and attributes) -
preceding-sibling
- ditto, but it contains the preceding siblings. -
following
- contains all nodes that follow the CN in document order (except descendants, attributes and NS nodes) -
preceding
- ditto, but contains the preceding nodes (except ancestors, attributes, NS nodes) -
attribute
- contains attributes (for elements only) -
namespace
- contains all NS nodes of CN (for elements only) -
self
- the CN itself -
descendant-or-self
- contains the union of descendant and self axes -
ancestor-or-self
- contains the union of ancestor and self axes
-
Figure 1. //b/child::*
<?xml version="1.0"?>
<a>
<b/>
<b>
<c/>
</b>
<b>
<c/>
</b>
</a>
Example 3. //b/descendant::*
<?xml version="1.0"?>
<a>
<b/>
<b>
<c>
<d/>
</c>
</b>
<b>
<c/>
</b>
</a>
Example 4. //d/parent::*
<?xml version="1.0"?>
<a>
<b/>
<b>
<c>
<d/>
</c>
</b>
<b>
<c/>
</b>
</a>
Example 5. //d/ancestor::*
<?xml version="1.0"?>
<a>
<b/>
<b>
<c>
<d/>
</c>
</b>
<b>
<c/>
</b>
</a>
Example 6. //b/following-sibling::*
<?xml version="1.0"?>
<a>
<b/>
<b>
<c>
<d/>
</c>
</b>
<b>
<c/>
</b>
</a>
Example 7. //b/preceding-sibling::*
<?xml version="1.0"?>
<a>
<b/>
<b>
<c>
<d/>
</c>
</b>
<b>
<c/>
</b>
</a>
Example 8. /a/b/c/following::*
<?xml version="1.0"?>
<a>
<b/>
<b>
<c>
<d/>
</c>
<e/>
</b>
<b>
<c/>
</b>
</a>
Example 9. /a/b/e/preceding::*
<?xml version="1.0"?>
<a>
<b/>
<b>
<c>
<d/>
</c>
</b>
<b>
<d/>
<e/>
</b>
</a>
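To see which nodes the axis examples actually select, the expressions can be evaluated programmatically. The following sketch uses the XPath 1.0 engine of the JDK; the class AxisDemo and the method select are hypothetical, and all expressions are run against one fixed document (the one from Example 8), so the results differ from the examples that use other documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisDemo {
    // The document of Example 8, written on one line.
    static final String XML = "<a><b/><b><c><d/></c><e/></b><b><c/></b></a>";

    // Returns the names of all nodes selected by the expression.
    static List<String> select(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            names.add(hits.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String[] expressions = {
            "//b/child::*", "//b/descendant::*", "//d/parent::*", "//d/ancestor::*",
            "//b/following-sibling::*", "//b/preceding-sibling::*",
            "/a/b/c/following::*", "/a/b/e/preceding::*"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + select(expression));
        }
    }
}
```

For instance, /a/b/e/preceding::* selects the first b, the first c and d, but neither a nor the second b, because ancestors of e are excluded from the preceding axis.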
XPath - predicates
-
A predicate filters the node set selected, for example, by a path.
-
Example: /article/para[3] - selects the 3rd paragraph (element para) of the article (element article)
-
The simplest predicate expression is a proximity position, as in the preceding example.
-
Beware of reverse axes (ancestor, preceding, …): the position is always counted starting from the CN, i.e. in the direction opposite to document order.
-
The position specification 3 can be replaced by the expression position()=3.
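The numbering on reverse axes is a frequent source of mistakes, so a small experiment may help. This is a sketch only, using the XPath 1.0 engine of the JDK; the class PredicateDemo, the method nameOf and the sample document are assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class PredicateDemo {
    static final String XML = "<a><b/><b><c><d/></c><e/></b><b><c/></b></a>";

    // Returns the name of the single node selected by the expression.
    static String nameOf(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE);
        return node.getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Reverse axis: position 1 is the ancestor nearest to the CN.
        System.out.println(nameOf("//d/ancestor::*[1]"));      // c
        System.out.println(nameOf("//d/ancestor::*[last()]")); // a
        // Parentheses turn the result into a node set in document order.
        System.out.println(nameOf("(//d/ancestor::*)[1]"));    // a
        // The nearest preceding node of e is d.
        System.out.println(nameOf("/a/b/e/preceding::*[1]"));  // d
        // A number in a predicate is short for position()=number.
        System.out.println(nameOf("/a/b[position()=3]/c"));    // c
    }
}
```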
-
XPath - expressions
-
Used in predicates, for computations, etc. They may contain XPath functions.
-
Expressions may operate on:
-
text strings
-
numbers (floating-point numbers)
-
logical values (boolean)
-
nodes
-
sequences.
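With the XPath 1.0 API of the JDK, the requested result type is passed as a constant, which makes the value types listed above visible (sequences exist only from XPath 2.0 on and are not covered here). A minimal sketch; the class ExpressionDemo, the method eval and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExpressionDemo {
    static final String XML = "<a><b/><b><c/></b><b><c/></b></a>";

    // Evaluates the expression and converts the result to the requested type.
    static Object eval(String expression, QName type) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc, type);
    }

    public static void main(String[] args) throws Exception {
        // number (a Double in Java)
        System.out.println(eval("count(//b)", XPathConstants.NUMBER));
        // text string
        System.out.println(eval("concat(name(/*), '-', count(//c))", XPathConstants.STRING));
        // logical value
        System.out.println(eval("count(//b) > count(//c)", XPathConstants.BOOLEAN));
    }
}
```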
-
XPath - short notation - Examples
-
para
selects all para element children of the context node -
*
selects all element children of the context node -
text()
selects all text node children of the context node -
@name
selects the name attribute of the context node -
@*
selects all the attributes of the context node -
para[1]
selects the first para
child of the context node -
para[last()]
selects the last para
child of the context node -
*/para
selects all para grandchildren of the context node -
/doc/chapter[5]/section[2]
selects the second section of the fifth chapter of the doc -
chapter//para
- selects all descendants of element chapter
with name para
-
//para
- selects all elements para
in the document -
//olist/item
- selects all elements item
with parent element olist
-
.//para
selects all descendant nodes of CN with name para
-
..
selects the parent node of CN -
../@lang
selects a lang attribute of CN parent node
XPath - short notation (2)
The most commonly used short notation is for the child axis:
-
we use article/para instead of
child::article/child::para
. -
for the attribute axis, we use
para[@type="warning"]
instead of child::para[attribute::type="warning"]
-
Another frequently used short notation is
//
instead of /descendant-or-self::node()/
-
and of course the shortcuts
.
and ..
For clarity, we sometimes keep the longer form; do not avoid it at all costs!
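That the short and the long notation really select the same nodes can be verified by evaluating both forms against one document and comparing the results node by node. The following is an illustrative sketch; the class NotationDemo, the method sameNodes and the sample article document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NotationDemo {
    static final String XML = "<article><para type='warning'>w</para><para>p</para>"
            + "<sec><para type='warning'>n</para></sec></article>";

    // True if both expressions select exactly the same nodes.
    static boolean sameNodes(String first, String second) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList a = (NodeList) xpath.evaluate(first, doc, XPathConstants.NODESET);
        NodeList b = (NodeList) xpath.evaluate(second, doc, XPathConstants.NODESET);
        if (a.getLength() != b.getLength()) {
            return false;
        }
        for (int i = 0; i < a.getLength(); i++) {
            boolean found = false;
            for (int j = 0; j < b.getLength(); j++) {
                found |= a.item(i).isSameNode(b.item(j));
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameNodes("article/para", "child::article/child::para"));
        System.out.println(sameNodes("//para[@type='warning']",
                "/descendant-or-self::node()/child::para[attribute::type='warning']"));
        System.out.println(sameNodes("//sec/..", "//sec/parent::node()"));
        System.out.println(sameNodes(".", "self::node()"));
    }
}
```

All four comparisons are expected to print true, since each pair is just two spellings of the same expression.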
Further Information on XPath
-
XPath on W3C: http://www.w3.org/TR/xpath
-
Zvon XPath Tutorial: http://zvon.org/xxl/XPathTutorial/Output/index.html
-
XPath Tutorial on W3Schools: http://www.w3schools.com/xpath/xpath_intro.asp
XPath 2.0
-
Final specification available at - http://www.w3.org/TR/xpath20/
-
A different view of the return values of XPath expressions: everything is a sequence (even one containing a single item)
-
→ removes the problems with node order in node sets
-
Introduces conditional expressions and loops (for expressions).
-
Supports user-defined functions declared in the host language (e.g. XSLT 2.0, XQuery); some processors can also dynamically evaluate XPath expressions
-
Users can use universal and existential quantifiers, for example some $s in student satisfies $s/name="Fred", every $s in student satisfies $s/@id
-
For more details see http://www.saxonica.com/ (the site also hosts the XPath/XSLT/XQuery processor Saxon).
XPath 2.0 - examples
-
String functions - http://www.fi.muni.cz/~tomp/xml03/xpath20/string.html
-
Numeric functions - http://www.fi.muni.cz/~tomp/xml03/xpath20/numeric.html
-
Sequence functions - http://www.fi.muni.cz/~tomp/xml03/xpath20/sequence.html
-
Boolean functions - http://www.fi.muni.cz/~tomp/xml03/xpath20/boolean.html