XPath
PB138 - Markup Languages
Tomas Pitner
March 1, 2013

XPath - core principles
• XPath is a special language (not an XML markup) for specifying parts of XML documents (nodes, node sets, node sequences).
• Parts of text nodes cannot be specified using XPath.
• XPath uses a syntax resembling the one used for specifying a (file) path in a file system.
• XPath uses a library of standard functions.
• XPath 2.0 processors and some XPath 1.x processors may also use a user-defined set of functions.
• XPath 1.0 is the basis for XSLT; XPath 2.0 is also the basis for XQuery.
• XPath syntax is NOT XML (it would be too verbose).
• XPath 1.0 and 2.0 are W3C Recommendations - http://www.w3.org/TR/xpath

XPath - application areas
• Advanced navigation in XML:
  • Select every b element that is the third b child of its parent: //b[3]
  • Select every b element that has a child c: //b[./c]
  • Select every b element with no child elements: //b[count(./*)=0]
• Transformation (XSLT (http://www.w3.org/TR/xslt)):
  • used e.g. to select the nodes to process: <xsl:apply-templates select="./c"/>
• In the "selection part" of XML query languages (XQuery (http://www.w3.org/XML/Query/))
• In some modeling languages (Schematron (http://www.schematron.com/), XML Schema (http://www.w3.org/XML/Schema))
• ...

Paths and locations
A path defines (i.e. navigates to) a location in a document. Paths are constructed similarly to paths in a file system: a relative path is evaluated from the so-called context node (CN, see later), an absolute path from the document root. Predicates (expressions) are likewise evaluated in relation to the CN.

[20] PathExpr ::= AbsolutePathExpr | RelativePathExpr
[22] AbsolutePathExpr ::= ("/" RelativePathExpr?) | …
[23] RelativePathExpr ::= StepExpr (("/" | "//") Ste…
[24] StepExpr ::= AxisStep | GeneralStep
[25] AxisStep ::= (Axis? NodeTest StepQualifiers) | …

Axes
Axes (singular: axis, plural: axes) are sets (sequences) of document nodes, usually, but not exclusively, leading out from the context node.
The context is composed of the document and the current (context) node (CN).

Axes:
• child: all child nodes of the CN
• descendant: all descendants of the CN (no attributes)
• parent: the parent node of the CN
• ancestor: all ancestor nodes (parent, parent of the parent, etc.)
• following-sibling: all following siblings of the CN (empty for namespace nodes and attributes)
• preceding-sibling: ditto, but preceding siblings
• following: all nodes located after the CN (excluding attributes, descendants of the CN, and the CN itself)
• preceding: similarly, but all nodes located before the CN (excluding attributes and ancestors)
• attribute: all attributes of the CN (which must be an element)

[Figures: illustrations of the individual axes, e.g. //b/child::* and //b/descendant::*]

Predicates
• A predicate is a condition used to select (filter) nodes specified e.g. by a path.
• Example: /article/para[3] selects the third para of the article.
• The simplest predicate is a number (the proximity position), as in the example above.
• Beware of reverse axes (ancestor, preceding, ...): the position is always counted outwards from the CN.
• [3] could equally be written as [position()=3].

Expressions
Expressions are used in predicates, calculations (aggregation), etc. They may contain XPath functions.
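The axes and proximity positions described above can be sketched in Python. This is only an illustration under stated assumptions: the standard-library `xml.etree.ElementTree` module supports just a small subset of XPath (no named axes), so the axis helpers below (`child`, `descendant`, `ancestor`, `following_sibling`, `preceding_sibling`) are hypothetical functions written for this example, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the id attributes only make results readable.
DOC = (
    "<a>"
    "<b id='1'/>"
    "<b id='2'><c/><d><e/></d></b>"
    "<b id='3'/>"
    "<b id='4'/>"
    "</a>"
)
root = ET.fromstring(DOC)

# ElementTree keeps no parent pointers, so build a child -> parent map.
PARENT = {kid: parent for parent in root.iter() for kid in parent}


def child(cn):
    """child axis: element children of the context node (CN)."""
    return list(cn)


def descendant(cn):
    """descendant axis: iter() yields the CN first, so drop it."""
    return list(cn.iter())[1:]


def ancestor(cn):
    """ancestor axis (reverse axis): nearest ancestor first."""
    found = []
    while cn in PARENT:
        cn = PARENT[cn]
        found.append(cn)
    return found


def following_sibling(cn):
    """following-sibling axis, in document order."""
    if cn not in PARENT:
        return []
    siblings = list(PARENT[cn])
    return siblings[siblings.index(cn) + 1:]


def preceding_sibling(cn):
    """preceding-sibling axis (reverse axis): nearest sibling first."""
    if cn not in PARENT:
        return []
    siblings = list(PARENT[cn])
    return siblings[:siblings.index(cn)][::-1]


b2 = root.find("b[2]")          # predicate [2]: proximity position
b3 = root.find("b[3]")
b_last = root.find("b[last()]")
e = root.find(".//e")

print([n.tag for n in child(b2)])                  # ['c', 'd']
print([n.tag for n in descendant(b2)])             # ['c', 'd', 'e']
print([n.tag for n in ancestor(e)])                # ['d', 'b', 'a']
print([n.get("id") for n in following_sibling(b3)])  # ['4']
# On a reverse axis, position 1 is the node closest to the CN:
print([n.get("id") for n in preceding_sibling(b3)])  # ['2', '1']
print(b_last.get("id"))                            # 4
```

In a full XPath processor the same selections would be written with the axis syntax directly, e.g. `preceding-sibling::b[1]` for the nearest preceding sibling; the list order returned by `preceding_sibling` mirrors how positions are counted outwards from the CN.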
Expressions can be:
• string (characters)
• numeric (floating-point numbers)
• logic (boolean)
• nodes
• sequences

XPath - Examples of shortened notation
• para selects all para child elements of the CN
• * selects all element children of the context node
• text() selects all text node children of the context node
• @name selects the name attribute of the context node
• @* selects all the attributes of the context node
• para[1] selects the first para child of the context node
• para[last()] selects the last para child of the context node
• */para selects all para grandchildren of the context node
• /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc
• chapter//para selects all para descendant elements of chapter
• //para selects all para elements in the document
• //olist/item selects all item elements having an olist parent
• . selects the CN
• .//para selects all para descendants of the CN

XPath - Shortened notation (2)
The most frequently used shortening is that of the child axis:
• we write article/para instead of child::article/child::para
• and for attributes: we write para[@type="warning"] instead of child::para[attribute::type="warning"]
• // is used instead of /descendant-or-self::node()/
• and there are the shorthands dot (.) for the context node and double-dot (..) for its parent
Sometimes it is good to preserve the full (long) form. So, please, learn it!

Infosources on XPath
• XPath / W3C: http://www.w3.org/TR/xpath
• Zvon XPath Tutorial: http://zvon.org/xxl/XPathTutorial/Output/index.html
• XPath Tutorial / W3Schools: http://www.w3schools.com/xpath/xpath_intro.asp

XPath 2.0
• The Recommendation - http://www.w3.org/TR/xpath20/
• The return value of every XPath expression is a sequence (even if it has only one item),
• so an ORDER is defined on the returned nodes.
• Introduces conditional expressions and loops.
• User functions (in fact, dynamically evaluated expressions in XPath).
• Universal and existential quantifiers can be used, e.g. some $s in student satisfies $s/name="Fred" or every $s in student satisfies $s/@id
• See also e.g.
http://www.saxonica.com/, where the XPath/XSLT/XQuery processor Saxon is also available.

XPath 2.0 - functions
• String functions (http://www.fi.muni.cz/~tomp/xml03/xpath20/string.html)
• Numerical functions (http://www.fi.muni.cz/~tomp/xml03/xpath20/numeric.html)
• Sequence functions (http://www.fi.muni.cz/~tomp/xml03/xpath20/sequence.html)
• Boolean functions (http://www.fi.muni.cz/~tomp/xml03/xpath20/boolean.html)
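
As a closing illustration, several of the shortened-notation examples listed earlier can be tried out in Python. This is a minimal sketch with assumptions: it uses the standard-library `xml.etree.ElementTree` module, which implements only a subset of XPath 1.0 (no absolute paths on elements, no `text()` or `@name` node selection, no XPath 2.0 features), and the sample document and the `texts` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the doc/chapter/section/para examples.
DOC = (
    "<doc>"
    "<chapter name='one'>"
    "<para>p1</para>"
    "<section><para>p2</para></section>"
    "</chapter>"
    "<chapter name='two'>"
    "<para type='warning'>p3</para>"
    "<para>p4</para>"
    "</chapter>"
    "</doc>"
)
doc = ET.fromstring(DOC)            # used as the context node (CN)
chapter2 = doc.find("chapter[2]")   # second chapter child of the CN


def texts(nodes):
    """Hypothetical helper: text content of each selected element."""
    return [n.text for n in nodes]


# para: all para children of the CN (here the CN is chapter2)
print(texts(chapter2.findall("para")))                    # ['p3', 'p4']
# *: all element children of the CN
print([n.tag for n in doc.findall("*")])                  # ['chapter', 'chapter']
# para[1] and para[last()]: first and last para child
print(texts(chapter2.findall("para[1]")))                 # ['p3']
print(texts(chapter2.findall("para[last()]")))            # ['p4']
# */para: para grandchildren of the CN
print(texts(doc.findall("*/para")))                       # ['p1', 'p3', 'p4']
# chapter//para: para descendants of the chapter children
print(texts(doc.findall("chapter//para")))                # ['p1', 'p2', 'p3', 'p4']
# .//para: all para descendants of the CN
print(texts(doc.findall(".//para")))                      # ['p1', 'p2', 'p3', 'p4']
# para[@type="warning"]: attribute test inside a predicate
print(texts(doc.findall(".//para[@type='warning']")))     # ['p3']
# ElementTree cannot select attribute nodes with @name; read the value instead.
print(chapter2.get("name"))                               # two
```

A complete XPath 1.0 or 2.0 processor (such as Saxon, mentioned above) would accept the full axis syntax and the function library as well; the subset here is enough to check how the shortened forms behave.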