Specifications and validity of XML Document Type Definition (I
PB138 - Markup Languages
Tomas Pitner February 24, 2013
• Specifications and validity of XML
• Document Type Definition (DTD)
• Physical Structure (Entities)
• XML Base
• XML Namespaces
• XML Information Set
• Canonical Form
• Terms
• Tree-based API
• Event-based API
• Pull-based APIs
• Document Object Model (DOM)
• Using DOM in Java
• Alternative tree-based models
• Tree and event-based access combinations
Up-to-date Specifications of XML
• Original specification (W3C Recommendation) XML 1.0 at W3C: http://www.w3.org/XML/
• 5th Edition (corrections and updates, no major changes): Extensible Markup Language (XML) 1.0 (Fifth Edition) (http://www.w3.org/TR/REC-xml)
• Commented version at XML.COM (Annotated XML): http://www.xml.com/pub/a/axml/axmlintro.html
• XML 1.1 (Second Edition) (http://www.w3.org/TR/xml11): changes induced by the introduction of Unicode 3, easier normalization, and a specified procedure for handling end-of-line characters. XML 1.1 is not bound to a specific version of Unicode; it always follows the latest version.
Which version to use?
Which version should new applications use? See the W3C XML Core Working Group (http://www.w3.org/XML/Core/#Publications) for the answer:
• Unless you are writing a parser or an XML-generating application (e.g. an editor), use XML 1.0 (for backward compatibility).
• New parsers should also understand XML 1.1.
Validity of XML documents
• To repeat: every XML document must be WELL-FORMED.
• New: an XML document can also be VALID, which is a stricter requirement than WELL-FORMEDNESS.
Validity usually means conformance to a DTD (Document Type Definition) or, more recently, to an XML Schema or another schema language (RelaxNG, Schematron).
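The difference between the two notions can be sketched on a small document. The "note" document type and its element names are hypothetical, chosen only for illustration. The document is well-formed, yet a validating parser would be expected to reject it.

```xml
<?xml version="1.0"?>
<!-- Hypothetical document type "note", declared in the internal subset. -->
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- Well-formed: every tag is closed and properly nested.
     NOT valid: the DTD requires a <to> element before <body>. -->
<note>
  <body>Meeting at ten.</body>
</note>
```

Adding `<to>Jane</to>` before `<body>` would make the document valid as well.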
Document Type Definition (DTD)
• DTD stands for Document Type Definition; the reference to such a definition from a document is the Document Type Declaration.
• Specified in the core XML 1.0 standard.
• Describes the allowed content of elements, the presence and content of attributes and their default values, and defines the entities used.
• A DTD may be internal, external (internal and external subset), or "mixed", i.e. both.
• A document conforming to a DTD is called valid ("platný" in Czech).
• DTD and languages with a similar purpose are called modeling languages: they model/define concrete markups.
• The syntax of DTD IS NOT XML (in contrast to XML Schema and many other modeling languages).
Motivation for DTD, comparison, pros and cons
Problems with DTD?
• The fundamental problem of DTD is its incompatibility with XML Namespaces.
• It also lacks modeling expressiveness: some constructs cannot be constrained by a DTD.
• The direct, more powerful, but also more complex alternative is W3C XML Schema (http://www.w3.org/XML/Schema).
• Powerful yet simpler alternatives to XML Schema include RelaxNG (http://relaxng.org); see also RELAX_NG on Wikipedia (http://en.wikipedia.org/wiki/RELAX_NG).
Why use DTD?
Why use DTD at all?
• It is simple, and all parsers can handle it.
• Sufficient for many markups.
DTD - tutorials
• Webreview: http://www.webreview.com/2000/08_11/developers/08_11_00_2.shtml
• ZVON: http://www.zvon.org/xxl/DTDTutorial/General/contents.html
• XML DTD Tutorial (101): http://www.xml101.com/dtd/
• W3Schools DTD Tutorial: http://www.w3schools.com
DTD in more details / 1
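The document type declaration (DOCTYPE) is placed before the root element, in the form <!DOCTYPE root-element-name external-identifier [ internal subset ]>.
• The internal and external parts (internal and external subset) are each optional; either one or both may be present.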
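A minimal sketch of a declaration that uses both parts at once. The file name "addressbook.dtd", the element names, and the entity "company" are assumptions made for this illustration; the external file is expected to declare the elements.

```xml
<?xml version="1.0"?>
<!-- External subset: the hypothetical file addressbook.dtd. -->
<!DOCTYPE addressbook SYSTEM "addressbook.dtd" [
  <!-- Internal subset: declarations local to this one document. -->
  <!ENTITY company "Example Ltd.">
]>
<addressbook>
  <person>Jane Doe, &company;</person>
</addressbook>
```

Dropping the `SYSTEM "addressbook.dtd"` part leaves only the internal subset; dropping the bracketed part leaves only the external one.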
DTD in more details / 2
The external identifier can be either
• PUBLIC "PUBLIC ID" "URI" (suitable for "public", generally recognized DTDs) or
• SYSTEM "URI" for private or less established DTDs (the "URI" need not be a real URL on the network; it may also be a file on the local filesystem, and it is resolved according to the system where the resolution takes place).
The internal and external parts have the same significance; they must not conflict (e.g. by containing two definitions of the same element).
A DTD contains definitions of individual elements, their attribute lists, entities, and notations.
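Both kinds of external identifier can be seen in practice. The sketch below uses the well-known XHTML 1.0 Strict public identifier; the SYSTEM variant in the trailing comment refers to a hypothetical local file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- PUBLIC: a public identifier followed by a URI (the system identifier). -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>
<!-- SYSTEM variant for a private DTD (hypothetical file name):
     <!DOCTYPE addressbook SYSTEM "addressbook.dtd"> -->
```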
DTD - conditional sections
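Conditional sections serve for "commenting out" portions of a DTD, e.g. for experimenting.
• <![IGNORE[ ... ]]> - the enclosed declarations are skipped
• <![INCLUDE[ ... ]]> - the enclosed declarations are processed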
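A common way to switch between two variants of a declaration is to put the keyword into a parameter entity, roughly as sketched below. The file and element names are hypothetical. Note that conditional sections are permitted only in the external subset, so this is a fragment of a DTD file rather than a complete XML document.

```xml
<!-- Fragment of a hypothetical external DTD file (e.g. book.dtd). -->
<!ENTITY % draft "INCLUDE">
<!ENTITY % final "IGNORE">

<![%draft;[
  <!-- processed while %draft; expands to INCLUDE -->
  <!ELEMENT book (title, comments*, chapter+)>
]]>
<![%final;[
  <!-- skipped while %final; expands to IGNORE -->
  <!ELEMENT book (title, chapter+)>
]]>
```

Swapping the two entity values activates the other declaration of `book` without touching the sections themselves.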
DTD - element type definition / 1
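Describes the allowed content of an element, in the form <!ELEMENT element-name ...>, where ... can be
• EMPTY - for an empty element, which may be written as <element-name/> or <element-name></element-name> with the same logical meaning
• ANY - any element content is allowed, i.e. text nodes, child elements, ...
• element content - the element may contain child elements only, e.g. <!ELEMENT element-name (child1, child2)>
• mixed content - both text and child elements given by an enumeration, e.g. <!ELEMENT element-name (#PCDATA | child1 | child2)*>
• For mixed content, neither the order nor the cardinality of the individual child elements can be specified.
• The star (*) is required whenever child elements are listed, so any number of occurrences is always allowed.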
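The four kinds of content can be put side by side in one small document. All element names here are made up for the illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE article [
  <!-- element content: child elements only -->
  <!ELEMENT article  (title, para, hr, appendix)>
  <!ELEMENT title    (#PCDATA)>
  <!-- mixed content: text interleaved with em and code, any order, any count -->
  <!ELEMENT para     (#PCDATA | em | code)*>
  <!ELEMENT em       (#PCDATA)>
  <!ELEMENT code     (#PCDATA)>
  <!-- EMPTY: no content at all -->
  <!ELEMENT hr       EMPTY>
  <!-- ANY: text and any declared elements -->
  <!ELEMENT appendix ANY>
]>
<article>
  <title>DTD basics</title>
  <para>Use <code>EMPTY</code> for <em>empty</em> elements.</para>
  <hr/>
  <appendix>Free text and <em>declared</em> elements.</appendix>
</article>
```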
DTD - element type definition / 2
For specifying the child elements, we use:
• the sequence operator , (followed by)
• the choice operator | (select one)
• parentheses ( ) with their usual grouping meaning
• the two operators , and | CANNOT be mixed within one group
• the cardinality (occurrence) of child elements can be limited by the star (*, zero or more), question mark (?, zero or one), and plus (+, one or more)
• No specifier means exactly one occurrence.
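A short sketch combining these operators in one content model; the "person" vocabulary is hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- exactly one name, an optional nickname, one or more contacts
       (each either an email or a phone), then any number of notes -->
  <!ELEMENT person   (name, nickname?, (email | phone)+, note*)>
  <!-- Not allowed: (name, email | phone), which mixes , and | in one group -->
  <!ELEMENT name     (#PCDATA)>
  <!ELEMENT nickname (#PCDATA)>
  <!ELEMENT email    (#PCDATA)>
  <!ELEMENT phone    (#PCDATA)>
  <!ELEMENT note     (#PCDATA)>
]>
<person>
  <name>Jane Doe</name>
  <email>jane@example.org</email>
  <phone>+420 000 000 000</phone>
</person>
```

The nested parentheses are what makes the combination legal: the choice forms its own group inside the sequence.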
DTD - attribute definition
Describes the (data) types and/or default (implicit) values of the attributes of a given element, in the form <!ATTLIST element-name attribute-name type default>.
DTD - definition of attribute value type
Allowed value types are as follows:
• CDATA
• NMTOKEN
• NMTOKENS
• ID
• IDREF
• IDREFS
• ENTITY
• ENTITIES
• enumeration - e.g. (value1 | value2 | value3)
• enumeration of notations - e.g. NOTATION (notation1 | notation2 | notation3)
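Several of these types are shown together in the sketch below. The "library" vocabulary and the attribute names are assumptions for the example only.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book
    id      ID                       #REQUIRED
    follows IDREF                    #IMPLIED
    lang    NMTOKEN                  #IMPLIED
    genre   (novel | poetry | drama) #IMPLIED
    remark  CDATA                    #IMPLIED>
]>
<library>
  <!-- id values must be unique within the document -->
  <book id="b1" lang="en" genre="novel">Part one</book>
  <!-- follows must match the value of some ID attribute in the document -->
  <book id="b2" follows="b1" remark="any text at all">Part two</book>
</library>
```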
DTD - cardinality of attributes
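The presence of an attribute can be prescribed as follows:
• #REQUIRED - the attribute is required
• #IMPLIED - the attribute is optional
• #FIXED "fixed-value" - the attribute always has the value fixed-value; it may be omitted in the document, but if written, it must carry exactly this value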
DTD - implicit attribute value
An attribute may have a default (implicit) value:
• "default value" - the attribute is optional; if it is not present, the default value is used instead.
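The presence keywords and a plain default value can be compared in one attribute list. The "order" vocabulary below is hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST order
    number   CDATA #REQUIRED
    customer CDATA #IMPLIED
    currency CDATA #FIXED "EUR">
  <!-- plain default value: used whenever quantity is not written -->
  <!ATTLIST item
    quantity CDATA "1">
]>
<!-- customer is omitted (allowed); currency is omitted and supplied as "EUR" -->
<order number="2013-042">
  <!-- quantity defaults to "1" here -->
  <item>Pencil</item>
  <item quantity="3">Notebook</item>
</order>
```

A parser that reads the DTD reports `currency="EUR"` and `quantity="1"` as if they had been written in the document.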
Entity - declaration and usage
We distinguish:
• the declaration of an entity
• the reference to (i.e. use of) a declared entity.
General entities may be
• parsed - files with (well-formed) markup,
• unparsed - e.g. binary files,
• character entities - single characters, e.g. &gt; stands for the character >.
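A sketch showing the declaration and use of each kind of general entity. The file names "chapter1.xml" and "logo.png", the notation, and the element names are assumptions for the illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report (#PCDATA | logo)*>
  <!ELEMENT logo EMPTY>
  <!ATTLIST logo src ENTITY #REQUIRED>
  <!-- internal parsed entity: the replacement text is given inline -->
  <!ENTITY course "PB138 - Markup Languages">
  <!-- external parsed entity: a hypothetical file with well-formed markup;
       it would be pulled in by writing &chapter1; in the content -->
  <!ENTITY chapter1 SYSTEM "chapter1.xml">
  <!-- unparsed entity: a binary file tied to a declared notation -->
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY logoImage SYSTEM "logo.png" NDATA png>
]>
<report>
  Course: &course; (1 &lt; 2, and &#169; is a character reference)
  <logo src="logoImage"/>
</report>
```

An unparsed entity is never referenced with `&name;`; it is only named in an attribute of type ENTITY or ENTITIES.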
Parameter entities
• Used only inside a DTD; somewhat similar to "macros" in programming languages.
• Suitable e.g. for declarations of attribute lists (if they are long and used repeatedly).
• See the DTD for HTML 4.01: http://www.w3.org/TR/html4/sgml/dtd.html
• A parameter entity is defined e.g. as <!ENTITY % name "replacement text"> and referenced as %name;
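The attribute-list use case might look roughly like this. The entity name "coreattrs" and the element names are chosen for illustration. Because parameter-entity references inside markup declarations are allowed only in the external subset, the sketch is a fragment of a DTD file, not a complete document.

```xml
<!-- Fragment of a hypothetical external DTD file (page.dtd). -->
<!ENTITY % coreattrs
  "id    ID    #IMPLIED
   class CDATA #IMPLIED">

<!ELEMENT heading (#PCDATA)>
<!-- expands to: <!ATTLIST heading id ID #IMPLIED class CDATA #IMPLIED> -->
<!ATTLIST heading %coreattrs;>

<!ELEMENT para (#PCDATA)>
<!ATTLIST para %coreattrs;>
```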
XML Base
• XML Base (Second Edition), W3C Recommendation 28 Jan 2009: http://www.w3.org/TR/xmlbase/
• A standard for resolving relative URLs in links to/from XML documents. It provides a facility similar to HTML BASE for defining base URIs for parts of XML documents.
• It defines the reserved attribute xml:base, which denotes the base URI for relative URIs.
• It complements the XLink specification.
• An xml:base on an element overrides the base URI inherited from its parent (ancestor) elements.
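A minimal sketch of base URI resolution with a nested override. All URIs and element names are hypothetical; XLink attributes are used only as a typical carrier of relative URIs.

```xml
<?xml version="1.0"?>
<catalog xml:base="http://example.org/docs/"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- resolves to http://example.org/docs/intro.xml -->
  <item xlink:type="simple" xlink:href="intro.xml">Introduction</item>
  <!-- a relative xml:base is itself resolved against the inherited base,
       giving http://example.org/docs/archive/ for this subtree -->
  <section xml:base="archive/">
    <!-- resolves to http://example.org/docs/archive/report.xml -->
    <item xlink:type="simple" xlink:href="report.xml">Report</item>
  </section>
</catalog>
```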
Note the use of the reserved prefix xml:
XML Namespaces
• XML Namespaces (W3C Recommendation, currently Namespaces in XML 1.0 (Third Edition), W3C Recommendation 8 Dec 2009): http://www.w3.org/TR/REC-xml-names
• For XML 1.1, there is Namespaces in XML 1.1 (Second Edition), W3C Recommendation 16 August 2006 (http://www.w3.org/TR/xml-names11/), by Andrew Layman, Richard Tobin, Tim Bray, Dave Hollander.
• They define logical spaces for the names of elements and attributes in an XML document.
• They give elements and attributes a "third dimension".
• Each NS in XML has exactly one globally unique identifier, given by a URI (URIs are a superset of URLs).
• The NS identified by a URI is in no way related to any content that might be available at that URL.
Use of NSs / 1
• Instead of writing URIs directly in a document, one uses prefixes mapped to the respective NS URI by xmlns:prefix="URI".
• An element or attribute name written as prefix:local-name is called a Qualified Name (QName).
• Two NSs are equal iff their URIs are identical character by character (in Unicode).
• NSs do not apply to text nodes.
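A short sketch of prefix declarations in use. Both namespace URIs are hypothetical identifiers; nothing needs to exist at those addresses.

```xml
<?xml version="1.0"?>
<lib:library xmlns:lib="http://example.org/ns/library"
             xmlns:p="http://example.org/ns/person">
  <lib:book>
    <!-- lib:title is the title of a book -->
    <lib:title>XML in Practice</lib:title>
    <p:author>
      <!-- same local name "title", but a different namespace -->
      <p:title>Dr.</p:title>
      <p:name>Jane Doe</p:name>
    </p:author>
  </lib:book>
</lib:library>
```

The prefixes themselves carry no meaning: replacing `lib` with any other prefix bound to the same URI yields an equivalent document.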
Use of NSs / 2
• An element or attribute need not be in any NS.
• A prefix declaration, or the declaration of the default NS, applies recursively to all descendants (child elements, their children, etc.) unless another declaration "remaps" the given prefix.
• One NS may be the so-called default (implicit) NS, declared by the attribute xmlns="URI".
• Default NSs are NOT applied to attributes; attributes without an explicit prefix therefore do not belong to any NS.
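The scoping rules, including the special treatment of attributes, can be sketched as follows. The namespace URIs and names are again hypothetical.

```xml
<?xml version="1.0"?>
<book xmlns="http://example.org/ns/library"
      xmlns:meta="http://example.org/ns/metadata"
      lang="en" meta:status="draft">
  <!-- book and title are in the default namespace;
       lang is in no namespace; meta:status is in the metadata namespace -->
  <title>XML in Practice</title>
  <!-- xmlns="" undeclares the default namespace for this subtree,
       so note is in no namespace -->
  <note xmlns="">internal remark</note>
</book>
```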