XPath basic principles
-
XPath is a syntax used to specify parts of XML documents (primitive values, nodes, sequences of values or nodes).
-
XPath cannot select a part of a text node; a text node is always addressed as a whole.
-
Its name derives from the path expression, which provides a means of hierarchical addressing of the nodes in an XML tree.
-
XPath uses a syntax similar to file system paths.
-
XPath offers a standard function library; some XPath 2.0 processors, and even some XPath 1.x processors, also support user-defined functions.
XPath "relatives"
-
XPath has been used as the basis of XSLT since version 1.0 and
-
in XQuery since XPath version 2.0.
-
XPath does not use XML syntax (it would be too verbose).
-
The latest XML Path Language (XPath) 3.0, W3C Recommendation 08 April 2014
-
Backward compatibility: nearly all XPath 1.0 expressions continue to deliver the same result with XPath 3.0 (for exceptions see http://www.w3.org/TR/xpath-30/#id-backwards-compatibility)
Crucial Learning Resource
-
Zvon tutorial: very nice for learning step by step
-
PathEnq nice to
XPath domain: Advanced XML Data navigation
<?xml version="1.0"?> <a> <b/> <b> <c/> </b> <b> <c/> </b> </a>
-
Select every node b that is the 3rd b child of its parent:
//b[3]
-
Select a node b, which has a child node c:
//b[./c]
-
Select an empty node b (i.e. one with no child elements):
//b[count(./*)=0]
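The three selections above can be tried out with a short Python sketch. It assumes the standard-library module xml.etree.ElementTree, which implements only a subset of XPath: paths must be relative (hence ".//b" rather than "//b") and count() is not available, so the last selection is emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# The sample document from this section.
XML = "<a> <b/> <b> <c/> </b> <b> <c/> </b> </a>"
root = ET.fromstring(XML)      # root is the <a> element
all_b = root.findall("b")      # the three <b> children, in document order

# //b[3]: ElementTree needs a relative path, so ".//b[3]" is used.
third_b = root.findall(".//b[3]")

# //b[./c]: <b> elements that have a <c> child.
b_with_c = root.findall(".//b[c]")

# //b[count(./*)=0]: count() is not supported by ElementTree,
# so the "no child elements" test is done in Python instead.
empty_b = [b for b in root.iter("b") if len(b) == 0]

print(third_b == [all_b[2]])    # True: the third <b>
print(b_with_c == all_b[1:])    # True: the second and third <b>
print(empty_b == [all_b[0]])    # True: the first <b>
```

A full XPath processor would accept the three expressions exactly as written in the slides.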
XPath domain: Transformation (XSLT)
-
Select nodes that have to be processed next:
<xsl:apply-templates select="para"/>
-
Select value:
<xsl:value-of select="para/@id"/>
XPath domain: Selection parts in XQuery
-
(F)or part, e.g.
for $para in $doc//para
selects all para elements in the document $doc
-
(L)et part, e.g.
let $mypara := $doc//para[@owner="myself"]
-
(W)here part, e.g.
where $para[@class="task"]
-
(O)rder part, e.g.
order by $para/@created
XPath domain: Modeling languages
-
Schematron
-
XML Schema
XPath paths and locations
A path describes (or "navigates" to) a location in an XML document. Path syntax is constructed in a way similar to file system paths, i.e. a path is either:
- relative
-
evaluated relative to a context node (CN), see below, or
- absolute
-
evaluated from the document root, although predicates are still evaluated relative to the CN.
XPath data types
-
Since XPath 3.0, the data types are unified with the XML Schema and XQuery data types
-
XQuery and XPath Data Model 3.0, W3C Recommendation 08 April 2014
Axes
-
Axes (singular: axis, plural: axes) are sets of document nodes defined in relation to the context, usually relative to the context node.
-
Context is formed by a document and the current (context) node (CN).
List of Axes
- child
-
contains direct child nodes of CN
- descendant
-
contains all descendants of CN except attributes.
- parent
-
contains the parent node of CN (if it exists)
- ancestor
-
contains all ancestors of CN, i.e. its parent, grandparent, etc., up to the root (the axis is empty if the CN is the root itself)
- following-sibling
-
contains all following siblings of CN (the axis is empty for namespace (NS) nodes and attributes)
- preceding-sibling
-
ditto, but contains the preceding siblings.
- following
-
contains all nodes following the CN in document order (except attributes, descendants of CN and NS nodes)
- preceding
-
ditto, but contains the nodes preceding the CN (except ancestors, attributes and NS nodes)
- attribute
-
contains the attributes of CN (non-empty for elements only)
- namespace
-
contains all NS nodes of CN (for elements only)
- self
-
the CN itself
- descendant-or-self
-
contains the union of descendant and self axes
- ancestor-or-self
-
contains the union of ancestor and self axes
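To make the axis definitions concrete, the following sketch emulates a few of them by hand in Python. It assumes the standard-library xml.etree.ElementTree, which does not implement the axis:: syntax and has no document node, attribute nodes or namespace nodes, so only element nodes are covered. The helper names (children, descendants, ancestors, ...) are made up for this illustration and are not part of any library.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/><b><c><d/></c></b><b><c/></b></a>")

# Elements do not know their parent, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def children(cn):
    """child axis (element children only)."""
    return list(cn)

def descendants(cn):
    """descendant axis: all elements below CN, in document order."""
    return list(cn.iter())[1:]          # iter() yields CN itself first

def ancestors(cn):
    """ancestor axis: parent, grandparent, ... (nearest first)."""
    found = []
    while cn in parent_of:
        cn = parent_of[cn]
        found.append(cn)
    return found

def following_siblings(cn):
    """following-sibling axis, in document order."""
    parent = parent_of.get(cn)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(cn) + 1:]

def preceding_siblings(cn):
    """preceding-sibling axis (reverse axis: nearest sibling first)."""
    parent = parent_of.get(cn)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(cn)][::-1]

def tags(nodes):
    return [n.tag for n in nodes]

d = root.find(".//d")
second_b = root.findall("b")[1]

print(tags(children(root)))                 # ['b', 'b', 'b']
print(tags(descendants(root)))              # ['b', 'b', 'c', 'd', 'b', 'c']
print(tags(ancestors(d)))                   # ['c', 'b', 'a']
print(len(following_siblings(second_b)))    # 1
print(len(preceding_siblings(second_b)))    # 1
```

The -or-self variants follow directly: prepend the CN itself to the result of descendants or ancestors.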
XPath online testers
-
Many online testers let you evaluate XPath expressions against a provided XML document without any local installation.
Figure 1. //b/child::*
<?xml version="1.0"?> <a> <b/> <b> <c/> </b> <b> <c/> </b> </a>
Example //b/descendant::*
<?xml version="1.0"?> <a> <b/> <b> <c> <d/> </c> </b> <b> <c/> </b> </a>
Example //d/parent::*
<?xml version="1.0"?> <a> <b/> <b> <c> <d/> </c> </b> <b> <c/> </b> </a>
Example //d/ancestor::*
<?xml version="1.0"?> <a> <b/> <b> <c> <d/> </c> </b> <b> <c/> </b> </a>
Example 6. //b/following-sibling::*
<?xml version="1.0"?> <a> <b/> <b> <c> <d/> </c> </b> <b> <c/> </b> </a>
Example 7. //b/preceding-sibling::*
<?xml version="1.0"?> <a> <b/> <b> <c> <d/> </c> </b> <b> <c/> </b> </a>
Example 8. /a/b/c/following::*
<?xml version="1.0"?> <a> <b/> <b> <c> <d/> </c> <e/> </b> <b> <c/> </b> </a>
Example 9. /a/b/e/preceding::*
<?xml version="1.0"?> <a> <b/> <b> <c> <d/> </c> </b> <b> <d/> <e/> </b> </a>
Predicates
-
Figure:
/article/para[3]
- selects the 3rd paragraph (element para) of the article (element article)
-
The simplest predicate expression is a proximity position, as in the preceding example.
-
Beware of reverse axes (ancestor, preceding, …): positions are always numbered starting from the CN, i.e. in the direction opposite to document order.
-
The position specification 3 can be replaced by the expression position()=3.
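Positional predicates and the numbering on reverse axes can be illustrated with a small sketch. It assumes Python's standard xml.etree.ElementTree (which supports numeric predicates such as para[3] but not position() or the axis syntax) and a made-up four-paragraph article; the reverse axis is emulated with list slicing.

```python
import xml.etree.ElementTree as ET

# Hypothetical article with four paragraphs.
article = ET.fromstring(
    "<article><para>one</para><para>two</para>"
    "<para>three</para><para>four</para></article>"
)

# para[3] relative to <article>, i.e. /article/para[3] in the full document.
third = article.find("para[3]")
print(third.text)            # three

# Reverse axis: with the fourth para as CN, preceding-sibling::para[1]
# is the NEAREST preceding sibling, not the first para of the document.
paras = article.findall("para")
cn = paras[3]
preceding = paras[:paras.index(cn)][::-1]   # proximity order, nearest first
print(preceding[0].text)     # three  (would be preceding-sibling::para[1])
print(preceding[2].text)     # one    (would be preceding-sibling::para[3])
```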
Expressions
-
Used in predicates for calculations. Expressions may contain XPath functions. Expressions may operate on:
-
text strings
-
numbers (floating-point numbers)
-
logical values (boolean)
-
nodes
-
sequences.
-
Short notation — examples 1
-
para
-
selects all para element children of the context node
-
*
-
selects all element children of the context node
-
text()
-
selects all text node children of the context node
-
@name
-
selects the name attribute of the context node
-
@*
-
selects all the attributes of the context node
-
para[1]
-
selects the first para child of the context node
-
para[last()]
-
selects the last para child of the context node
-
*/para
-
selects all para grandchildren of the context node
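Most of these abbreviated expressions can be evaluated directly with Python's standard xml.etree.ElementTree, as sketched here on a made-up context node. This is an assumption-laden illustration: ElementTree does not accept text(), @name or @* as path steps, so text and attributes are read through the element API instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node: a <section> with two attributes.
cn = ET.fromstring(
    '<section name="intro" lang="en">'
    "<para>first</para><note><para>nested</para></note><para>last</para>"
    "</section>"
)

para_children = [e.text for e in cn.findall("para")]      # para
element_children = [e.tag for e in cn.findall("*")]       # *
first_para = cn.find("para[1]").text                      # para[1]
last_para = cn.find("para[last()]").text                  # para[last()]
grandchildren = [e.text for e in cn.findall("*/para")]    # */para

# @name and @* have no path equivalent here; use the attribute API.
name_attr = cn.get("name")
all_attrs = sorted(cn.attrib)

print(para_children)      # ['first', 'last']
print(element_children)   # ['para', 'note', 'para']
print(first_para)         # first
print(last_para)          # last
print(grandchildren)      # ['nested']
print(name_attr)          # intro
print(all_attrs)          # ['lang', 'name']
```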
Short notation — examples 2
-
/doc/chapter[5]/section[2]
-
selects the second section of the fifth chapter of the doc
-
chapter//para
-
selects all para elements that are descendants of a chapter child of the context node
-
//para
-
selects all para elements in the document
-
//olist/item
-
selects all item elements whose parent is an olist element
-
.//para
-
selects all para descendants of the CN
-
..
-
selects the parent node of the CN
-
../@lang
-
selects the lang attribute of the parent node of the CN
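The following sketch runs close equivalents of these expressions on a made-up document. It assumes Python's standard xml.etree.ElementTree, which only accepts relative paths (so //para becomes .//para, evaluated from the root element) and resolves .. only within the subtree of the element the search starts from.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; <doc> is the element the searches start from.
doc = ET.fromstring(
    '<doc lang="en">'
    "<chapter><para>c1</para><section><para>c1s1</para></section></chapter>"
    "<chapter><olist><item>i1</item><item>i2</item></olist></chapter>"
    "</doc>"
)

# chapter//para, with <doc> as the context node
chapter_paras = [p.text for p in doc.findall("chapter//para")]

# //para, written as a relative path from the root element
all_paras = [p.text for p in doc.findall(".//para")]

# //olist/item
items = [i.text for i in doc.findall(".//olist/item")]

# /doc/chapter[1]/section[1]/para, relative to <doc>
nested = doc.find("chapter[1]/section[1]/para").text

# .. and ../@lang: step to a chapter, then back up to its parent
parent = doc.find("chapter/..")
parent_lang = parent.get("lang")

print(chapter_paras)    # ['c1', 'c1s1']
print(all_paras)        # ['c1', 'c1s1']
print(items)            # ['i1', 'i2']
print(nested)           # c1s1
print(parent is doc)    # True
print(parent_lang)      # en
```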
XPath - short notation (2)
The most commonly used short notation is for the child axis:
-
we write article/para instead of child::article/child::para.
-
for the attribute axis, we write para[@type="warning"] instead of child::para[attribute::type="warning"]
-
Another common short notation is // instead of /descendant-or-self::node()/
-
and of course the shortcuts . and ..
For clarity, we sometimes keep the longer form: do not avoid it at all costs!
Further Information on XPath
-
XPath on W3C: http://www.w3.org/TR/xpath
-
Zvon XPath Tutorial: http://zvon.org/xxl/XPathTutorial/Output/index.html
-
XPath Tutorial on W3Schools: http://www.w3schools.com/xpath/xpath_intro.asp
XPath 2.0
-
Final specification available at http://www.w3.org/TR/xpath20/
-
A different view of the return values of XPath expressions: everything is a sequence (a single item is a sequence of length one), which removes the ordering problems of node sets
-
Introduces conditional expressions and loops.
-
Introduces user-defined functions (dynamically evaluate XPath expressions)
-
Users can use universal and existential quantifiers, for example
some $s in student satisfies $s/name="Fred"
and every $s in student satisfies $s/@id
-
For more details see http://www.saxonica.com/; the site also hosts the XPath/XSLT/XQuery processor Saxon.
XPath 2.0 - examples
- String functions
- Numeric functions
- Sequence functions
- Boolean functions
Resources on XPath
-
Programming in XPath 3.0 (D. Novatchev)
-
XPath functions (Mozilla)