Standards, specifications, XML processing APIs
For new applications, see the W3C XML Core Working Group publications (http://www.w3.org/XML/Core/#Publications) for the answer:
Problems with DTD?
http://www.zvon.org/xxl/DTDTutorial/General/contents.html (including a Czech version)
http://edutechwiki.unige.ch/en/DTD_tutorial (not just DTD but much more)
The document type declaration (DOCTYPE) is placed in the document prolog, before the root element!
<!DOCTYPE root-elt-name External-ID [ internal part of DTD ]>
The internal part (internal subset) and the external part (external subset) are both optional: either one, both, or neither may be present.
The external identifier can be either
PUBLIC "PUBLIC ID" "URI"
(suitable for "public", generally recognized DTDs) or
SYSTEM "URI"
for private or less established DTDs
(the "URI" need not be a real URL on the network; it may also be a file on the (local) filesystem, and it is resolved according to the system on which the document is processed)
The internal and external parts have the same significance (they must not conflict, e.g. by containing two definitions of the same element). A DTD contains definitions of the individual elements, lists of their attributes, entities, and notations.
For "commenting out" portions of a DTD, e.g. for experimenting, conditional sections can be used (only in the external subset):
<![IGNORE[ this will be ignored ]]>
<![INCLUDE[ this will be included into DTD (i.e. not ignored)]]>
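The pieces above can be tried out with the JAXP DOM parser shipped with the JDK. The following is a minimal sketch, not a reference setup: the file name note.dtd, the element names and the in-memory DTD text are hypothetical. An EntityResolver stands in for resolving the SYSTEM identifier, the internal part declares one element, and the external part uses conditional sections, which the XML specification permits only in the external subset.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DoctypeDemo {
    // Hypothetical external part (external subset), normally stored in note.dtd.
    // Conditional sections are permitted here, but not in the internal part.
    static final String EXTERNAL_DTD =
        "<!ELEMENT note (to, body)>\n" +
        "<![INCLUDE[ <!ELEMENT to (#PCDATA)> ]]>\n" +
        "<![IGNORE[ <!ELEMENT to EMPTY> ]]>\n";

    // DOCTYPE with both an external identifier and an internal part
    static final String DOCUMENT =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE note SYSTEM \"note.dtd\" [\n" +
        "  <!ELEMENT body (#PCDATA)>\n" +
        "]>\n" +
        "<note><to>Alice</to><body>Hello</body></note>";

    /** Parses with DTD validation and returns the validity errors reported. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);  // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Serve the SYSTEM identifier "note.dtd" from memory instead of the filesystem
        builder.setEntityResolver((publicId, systemId) ->
            systemId != null && systemId.endsWith("note.dtd")
                ? new InputSource(new StringReader(EXTERNAL_DTD)) : null);
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DOCUMENT));                                // valid document
        System.out.println(validate(DOCUMENT.replace("<to>Alice</to>", "")));  // <to> is missing
    }
}
```

If the IGNORE section were switched to INCLUDE, the element to would be declared twice, which is exactly the kind of conflict the DTD must not contain.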
Describes the allowed content of the element, in the form <!ELEMENT element-name … >, where … can be:
EMPTY
for an empty element, which may be represented as <element/> or <element></element> with the same logical meaning
ANY
any element content is allowed, i.e. text nodes, child elements, …
element content
the element contains child elements only: <!ELEMENT element-name (specification of child elements)>
MIXED
containing both text and child elements given by enumeration:
<!ELEMENT element-name (#PCDATA | specification of child elements)*>
For MIXED content, the order or cardinality of particular child elements cannot be specified. The star (*) is required (whenever child elements are listed) and any number of occurrences is always allowed.
For specifying the child elements, we use:
, (sequence), | (choice) and () (grouping), with the usual meaning;
* (zero or more), ? (zero or one) and + (one or more occurrences), also with the usual meaning.
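The four kinds of content models can be checked with a validating SAX parser from the JDK. This is a sketch under assumptions: the article/title/para/em/br/note vocabulary is hypothetical and exists only to show EMPTY, ANY, element content and mixed content side by side.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ContentModelDemo {
    // Hypothetical DTD covering all four kinds of element declarations
    static final String DTD =
        "<!DOCTYPE article [\n" +
        "  <!ELEMENT article (title, para+, note?)>\n" +  // element content
        "  <!ELEMENT title (#PCDATA)>\n" +
        "  <!ELEMENT para (#PCDATA | em | br)*>\n" +       // mixed content
        "  <!ELEMENT em (#PCDATA)>\n" +
        "  <!ELEMENT br EMPTY>\n" +                         // empty element
        "  <!ELEMENT note ANY>\n" +                         // any content
        "]>\n";

    /** Validates DTD + body and returns the validity errors found. */
    static List<String> errors(String body) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        List<String> errors = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(DTD + body)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
        });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(errors("<article><title>T</title><para>Text <em>mixed</em><br/> more</para></article>"));
        System.out.println(errors("<article><para>No title</para></article>"));
    }
}
```

DefaultHandler ignores validity errors by default, so the sketch overrides error() to collect them.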
Describes the (data) type and/or default values of the attributes of the respective element.
<!ATTLIST element-name attribute-name attribute-value-type default-value>
Allowed value types are as follows:
CDATA
NMTOKEN
NMTOKENS
ID
IDREF
IDREFS
ENTITY
ENTITIES
(value1|value2|value3)
NOTATION (notation1|notation2|notation3)
The presence of an attribute can be specified as:
#REQUIRED
attribute is required
#IMPLIED
attribute is optional
#FIXED "fixed-value"
the attribute always has the value fixed-value (if it is present, it must have exactly this value; if it is absent, fixed-value is used)
An attribute may instead have a default value: then the attribute is optional, but if it is not present, the default value is used instead.
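Default and fixed values are filled in by the parser, so the application sees them as if they were written in the document. A small sketch with the JDK DOM parser, assuming a hypothetical price element; even without validation, the parser reads the internal subset and applies its attribute defaults.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DefaultAttributesDemo {
    // Hypothetical document with an internal DTD subset
    static final String XML =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE price [\n" +
        "  <!ELEMENT price (#PCDATA)>\n" +
        "  <!ATTLIST price\n" +
        "    currency CDATA \"CZK\"\n" +           // default value
        "    source (shop|web) #IMPLIED\n" +       // optional, no default
        "    version CDATA #FIXED \"1.0\">\n" +    // fixed value
        "]>\n" +
        "<price>120</price>";

    static Element parse(String xml) throws Exception {
        // Non-validating parse: internal subset defaults are still applied
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element price = parse(XML);
        System.out.println(price.getAttribute("currency"));     // CZK (from the DTD)
        System.out.println(price.getAttribute("version"));      // 1.0 (fixed)
        System.out.println(price.hasAttribute("source"));       // false (#IMPLIED)
        // Defaulted attributes are marked as not specified in the document
        System.out.println(price.getAttributeNode("currency").getSpecified());  // false
    }
}
```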
We distinguish:
parsed entities - files with a (well-formed) markup,
unparsed entities - e.g. binary files,
character entities - e.g. &gt; refers to a character entity.
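How references to parsed and character entities are expanded can be seen with the JDK DOM parser, which replaces entity references by their text by default. A sketch only; the entity name company and its value are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EntityDemo {
    static final String XML =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE p [\n" +
        "  <!ENTITY company \"Example Ltd.\">\n" +  // hypothetical internal parsed entity
        "]>\n" +
        "<p>&company; &gt; &#65;</p>";              // entity, predefined and character references

    /** Returns the text of the root element after all references are expanded. */
    static String text(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(text(XML));  // Example Ltd. > A
    }
}
```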
Example from XML Base specification http://www.w3.org/TR/xmlbase/
<?xml version="1.0"?>
<e1 xml:base="http://example.org/wine/">
<e2 xml:base="rosé"/>
</e1>
In the example above, the base URI of element e2 should be returned as "http://example.org/wine/rosé".
[Note the use of the reserved prefix xml.]
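The base URI of e2 is obtained by resolving its relative xml:base value against the base URI inherited from e1. This sketch uses java.net.URI for the resolution step; the helper name resolve is hypothetical, and walking the element ancestors to collect the xml:base values is left out.

```java
import java.net.URI;

class XmlBaseDemo {
    /** Resolves an element's xml:base value against the base URI inherited from its parent. */
    static String resolve(String inheritedBase, String xmlBase) {
        return URI.create(inheritedBase).resolve(xmlBase).toString();
    }

    public static void main(String[] args) {
        // e1 sets the base, e2 refines it with a relative reference
        System.out.println(resolve("http://example.org/wine/", "ros\u00e9"));  // http://example.org/wine/rosé
        System.out.println(resolve("http://example.org/wine/", "../beer/"));   // http://example.org/beer/
    }
}
```

An absolute xml:base value simply replaces the inherited base.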
W3C Recommendation, currently Namespaces in XML 1.0 (Third Edition) W3C Recommendation 8 Dec 2009: http://www.w3.org/TR/REC-xml-names
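Namespaces in XML 1.1 (Second Edition), W3C Recommendation 16 August 2006 (http://www.w3.org/TR/xml-names11/). Andrew Layman, Richard Tobin, Tim Bray, Dave Hollander
A namespace is declared with the attribute xmlns:prefix="URI".
A name consisting of a prefix and a local part separated by a colon (:) is denoted as a so-called Qualified Name, QName.
The default namespace (for element names without a prefix) is declared with xmlns="URI".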
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<body>
<h1>Hurááááá</h1>
</body>
</html>
<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<xhtml:body>
<xhtml:h1>Huráááááá</xhtml:h1>
</xhtml:body>
</xhtml:html>
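Both documents above describe the same elements: a namespace-aware parser reports the same namespace URI and local name, and only the QName differs. A sketch with the JDK DOM parser; the two shortened XHTML strings are trimmed-down versions of the examples above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String DEFAULT_NS =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
    static final String PREFIXED =
        "<xhtml:html xmlns:xhtml=\"http://www.w3.org/1999/xhtml\"><xhtml:body/></xhtml:html>";

    static Element root(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // JAXP factories are not namespace-aware by default
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        for (String xml : new String[] {DEFAULT_NS, PREFIXED}) {
            Element e = root(xml);
            // {namespace URI}local name identifies the element; the tag name is the QName
            System.out.println("{" + e.getNamespaceURI() + "}" + e.getLocalName() + "  QName: " + e.getTagName());
        }
    }
}
```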
The elements xi:include and include are different from the DTD's point of view, even if they belong to the same namespace and should thus have the same interpretation/meaning for applications.
Main principles for constructing the canonical form of an XML document:
CDATA sections are also replaced by their content
the XML declaration and the DTD reference are removed
Some information is lost (mostly information from the DTD).
start document, end document
start element (contains the attributes as well), end element
characters
processing instruction
comment
entity reference
<?xml version="1.0"?>
<doc>
<para>Hello, world!</para>
<!-- that’s all folks -->
<hr/>
</doc>
It generates the following events:
start document
start element: doc
list of attributes: empty
start element: para
list of attributes: empty
characters: Hello, world!
end element: para
comment: that’s all folks
start element: hr
list of attributes: empty
end element: hr
end element: doc
end document
(Whitespace-only characters events between the tags are omitted for brevity.)
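The event sequence above can be reproduced with the JDK SAX parser. This is a sketch, not the only way: it uses org.xml.sax.ext.DefaultHandler2, which also implements LexicalHandler so that comments are reported, and it buffers text because characters() may deliver one text node in several chunks. Whitespace-only text is dropped, as in the listing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

class SaxEventsDemo {
    static final String XML =
        "<?xml version=\"1.0\"?>\n<doc>\n<para>Hello, world!</para>\n" +
        "<!-- that\u2019s all folks -->\n<hr/>\n</doc>";

    static List<String> events(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        StringBuilder text = new StringBuilder();  // collects chunks of one text node
        DefaultHandler2 handler = new DefaultHandler2() {
            void flush() {  // emit buffered text, skipping whitespace between tags
                if (!text.toString().trim().isEmpty()) events.add("characters: " + text);
                text.setLength(0);
            }
            @Override public void startDocument() { events.add("start document"); }
            @Override public void endDocument() { flush(); events.add("end document"); }
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                flush();
                events.add("start element: " + qName + ", attributes: " + atts.getLength());
            }
            @Override public void endElement(String uri, String localName, String qName) {
                flush();
                events.add("end element: " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
            @Override public void comment(char[] ch, int start, int length) {
                flush();
                events.add("comment: " + new String(ch, start, length).trim());
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Comments are not ContentHandler events; they need a LexicalHandler
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        events(XML).forEach(System.out::println);
    }
}
```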
SAX filters (implementations of the org.xml.sax.XMLFilter interface) can be programmed using the SAX API.
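A filter sits between the parser and the consumer of the events and may change, add or drop them. A minimal sketch built on org.xml.sax.helpers.XMLFilterImpl, which passes every event through unless a method is overridden; the filter's behaviour (dropping all elements with a given name) is a hypothetical example, and an identity Transformer serves as the consumer that writes the filtered events back out as XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops every element with the given name, including its content
class DropElementFilter extends XMLFilterImpl {
    private final String dropped;
    private int depth = 0;  // > 0 while inside a dropped element

    DropElementFilter(XMLReader parent, String dropped) {
        super(parent);
        this.dropped = dropped;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        if (depth > 0 || qName.equals(dropped)) { depth++; return; }  // swallow the event
        super.startElement(uri, localName, qName, atts);               // pass it on
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth > 0) { depth--; return; }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) super.characters(ch, start, length);
    }

    /** Parses xml through the filter and serializes the filtered events. */
    static String filter(String xml, String dropped) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        SAXSource source = new SAXSource(new DropElementFilter(parser, dropped),
                new InputSource(new StringReader(xml)));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(filter("<doc><para>Hello, world!</para><hr/></doc>", "hr"));
    }
}
```

Filters can be chained: each XMLFilter takes another XMLReader (a parser or a further filter) as its parent.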
(from Oracle Java Tutorials http://docs.oracle.com/javase/tutorial/jaxp/stax/example.html)
<?xml version="1.0" encoding="UTF-8"?>
<BookCatalogue xmlns="http://www.publishing.org">
<Book>
<Title>Yogasana Vijnana: the Science of Yoga</Title>
<Author>Dhirendra Brahmachari</Author>
<Date>1966</Date>
<ISBN>81-40-34319-4</ISBN>
<Publisher>Dhirendra Yoga Publications</Publisher>
<Cost currency="INR">11.50</Cost>
</Book>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
<Cost currency="USD">2.95</Cost>
</Book>
</BookCatalogue>
In this example, the client application pulls the next event in the XML stream by calling the next method on the parser; for example:
XMLInputFactory xmlif = XMLInputFactory.newInstance();
try (FileInputStream in = new FileInputStream(filename)) {
    // pass the file name: all relative entity references
    // will be resolved against it as the base URI
    XMLStreamReader xmlr = xmlif.createXMLStreamReader(filename, in);
    // when the XMLStreamReader is created,
    // it is positioned at the START_DOCUMENT event
    int eventType = xmlr.getEventType();
    printEventType(eventType);
    printStartDocument(xmlr);
    // check if there are more events in the input stream
    while (xmlr.hasNext()) {
        eventType = xmlr.next();
        printEventType(eventType);
        // these helper methods (defined in the tutorial's example class,
        // not shown here) print information about the current event
        printStartElement(xmlr);
        printEndElement(xmlr);
        printText(xmlr);
        printPIData(xmlr);
        printComment(xmlr);
    }
    xmlr.close();
} catch (XMLStreamException | IOException e) {
    e.printStackTrace();
}
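A self-contained variant of the pull loop, assuming only the JDK's javax.xml.stream API: it reads the catalogue from a string and pulls out the book titles. The class and method names are hypothetical; the point is that the application decides when to ask for the next event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxTitles {
    static List<String> titles(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // the application pulls the next event from the parser
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("Title")) {
                    titles.add(reader.getElementText());  // reads the text up to </Title>
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<BookCatalogue xmlns=\"http://www.publishing.org\">"
                + "<Book><Title>The First and Last Freedom</Title></Book></BookCatalogue>";
        System.out.println(titles(xml));  // [The First and Last Freedom]
    }
}
```

getLocalName() ignores the default namespace of the catalogue; a stricter version would also compare getNamespaceURI().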
The DOM interfaces are located in the package org.w3c.dom.
Most often used interfaces are:
Element
corresponds to an element in the logical document structure. It allows
us to access the name of the element, the names of its attributes and its child nodes
(including textual ones). Useful methods:
Node getParentNode()
- returns the parent node
String getTextContent()
- returns the textual content of the element and all its descendants.
NodeList getElementsByTagName(String name)
- returns the list
of descendants (child nodes, their children, and so on) with the given name, in document order.
Node
superinterface of Element
; corresponds to a general node in the logical
document structure, which may be an element, a text node, a comment, etc.
NodeList
a list of nodes (a result of calling getElementsByTagName
for example).
It offers the following methods for its processing:
int getLength()
- returns the number of nodes in a list
Node item(int index)
- returns the node at position index
Document
corresponds to the document node (it is the parent of the root element)
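The interfaces above can be exercised in a few lines. A sketch with the JDK DOM parser; the persons/person/name vocabulary is hypothetical (only the salary element comes from the examples in this section). It shows that getElementsByTagName finds descendants at any depth, and that the root element's parent is the Document node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomNavigation {
    // Hypothetical input document
    static final String XML =
        "<persons><person><name>Alice</name><salary>12000</salary></person>" +
        "<person><name>Bob</name><salary>20000</salary></person></persons>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Lists "parent-name: text" for every salary element below the root. */
    static List<String> salaries(Document doc) {
        Element root = doc.getDocumentElement();  // its parent is the Document node
        // Finds descendants at any depth, not only direct children of root
        NodeList salaries = root.getElementsByTagName("salary");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < salaries.getLength(); i++) {
            Node salary = salaries.item(i);
            result.add(salary.getParentNode().getNodeName() + ": " + salary.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(salaries(parse(XML)));  // [person: 12000, person: 20000]
    }
}
```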
import java.io.IOException;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class Uloha1 {
    /** The document model loaded by the constructor. */
    Document doc;

    /**
     * Constructor creating a new instance of the Uloha1 class by reading the XML document
     * at the given URL.
     */
    private Uloha1(URL url) throws SAXException, ParserConfigurationException, IOException {
        // We create a new instance of the factory class
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // We get a new instance of DocumentBuilder using the factory class
        DocumentBuilder builder = factory.newDocumentBuilder();
        // We use the DocumentBuilder to process the XML document
        // and get the document model in the form of W3C DOM
        doc = builder.parse(url.toString());
    }
}
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class Uloha1 {
    Document doc;

    /**
     * Method for a salary modification. If a person's salary is less than
     * <code>minimum</code>, the salary will be increased to
     * <code>minimum</code>.
     * No action is performed for the rest of the persons.
     */
    public void adjustSalary(double minimum) {
        // get the list of salaries
        NodeList salaries = doc.getElementsByTagName("salary");
        for (int i = 0; i < salaries.getLength(); i++) {
            // get the salary element
            Element salaryElement = (Element) salaries.item(i);
            // get the payment
            double salary = Double.parseDouble(salaryElement.getTextContent());
            if (salary < minimum) {
                // modify the text node/content of the element
                salaryElement.setTextContent(String.valueOf(minimum));
            }
        }
    }
}
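To try the salary method end to end without a URL, the same loop can be run on a document parsed from a string. A sketch under assumptions: the persons/person wrapper elements are hypothetical, and adjustSalary is turned into a static method taking the Document as a parameter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SalaryDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Same logic as adjustSalary above: raise every salary below minimum. */
    static void adjustSalary(Document doc, double minimum) {
        NodeList salaries = doc.getElementsByTagName("salary");
        for (int i = 0; i < salaries.getLength(); i++) {
            Element salaryElement = (Element) salaries.item(i);
            double salary = Double.parseDouble(salaryElement.getTextContent().trim());
            if (salary < minimum) {
                salaryElement.setTextContent(String.valueOf(minimum));  // replaces the text node
            }
        }
    }

    /** Collects the current text of all salary elements. */
    static List<String> salaries(Document doc) {
        NodeList nodes = doc.getElementsByTagName("salary");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) result.add(nodes.item(i).getTextContent());
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<persons><person><salary>12000</salary></person>"
                + "<person><salary>20000</salary></person></persons>");
        adjustSalary(doc, 15000);
        System.out.println(salaries(doc));  // [15000.0, 20000]
    }
}
```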
Example of a method storing a DOM tree into a file (see Homework 1). The procedure uses a transformation we have not covered yet; let us use it as a black box.
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class Uloha1 {
    Document doc;

    /**
     * Method storing the DOM tree <code>doc</code> into the given file.
     * (TransformerConfigurationException, thrown by newTransformer(),
     * is a subclass of TransformerException.)
     */
    public void serializetoXML(File output) throws TransformerException {
        // We create a new instance of the factory class.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer();
        // The input is the document held in memory
        DOMSource source = new DOMSource(doc);
        // The transformation output is the output file
        StreamResult result = new StreamResult(output);
        // Let's perform the transformation
        transformer.transform(source, result);
    }
}
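The same black-box transformation can write into a String instead of a File, which is handy for checking the output. A sketch assuming the JDK's default identity Transformer; the small person/salary document built in memory is hypothetical, and OutputKeys.INDENT only asks for pretty-printing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SerializeDemo {
    /** Builds a tiny hypothetical document in memory. */
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element person = doc.createElement("person");
        Element salary = doc.createElement("salary");
        salary.setTextContent("15000");
        person.appendChild(salary);
        doc.appendChild(person);
        return doc;
    }

    /** Serializes the tree into a String instead of a File. */
    static String toXmlString(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");  // pretty-print
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = toXmlString(sample());
        System.out.println(xml);
        // round trip: the serialized text parses back into an equivalent tree
        System.out.println(parse(xml).getDocumentElement().getTextContent().trim());  // 15000
    }
}
```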
Tree and event-based access combinations
Events → tree:
- Allows us to skip or filter out the "uninteresting" part of a document by monitoring the events, and then
- create an in-memory tree from the "interesting" part of the document only and process just that part.
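The events → tree combination can be sketched with StAX and DOM: events before the interesting element are skipped, and only its subtree is turned into DOM nodes. This is an illustration under stated assumptions, not a library feature: the method name extract and the sample vocabulary are hypothetical, names are copied as local names (namespaces are ignored), and comments and processing instructions inside the subtree are dropped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class EventsToTree {
    /** Builds a DOM tree from the first element named wanted; other events are skipped. */
    static Document extract(String xml, String wanted) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node current = doc;  // node that receives the next child
        int depth = 0;       // nesting level inside the wanted subtree (0 = outside)
        while (r.hasNext()) {
            int event = r.next();
            if (depth == 0 && !(event == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals(wanted))) {
                continue;  // "uninteresting" part: the event is only monitored
            }
            switch (event) {
                case XMLStreamConstants.START_ELEMENT: {
                    Element e = doc.createElement(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        e.setAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    current.appendChild(e);
                    current = e;
                    depth++;
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    current.appendChild(doc.createTextNode(r.getText()));
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    depth--;
                    if (depth == 0) { r.close(); return doc; }  // subtree complete
                    break;
                default:
                    break;  // comments, PIs, etc. are ignored in this sketch
            }
        }
        r.close();
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document book = extract("<catalogue><book id=\"1\"><title>A</title></book>"
                + "<book id=\"2\"><title>B</title></book></catalogue>", "book");
        System.out.println(book.getDocumentElement().getAttribute("id"));  // 1
    }
}
```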
Tree → events:
- We build the entire document tree (and process it), and
- then we traverse the tree and generate events as if we were reading the XML file.
- This allows easy integration of both processing types in a single application.
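For the tree → events direction, the JDK identity Transformer can replay an in-memory DOM tree as SAX events: the input is a DOMSource and the output a SAXResult wrapping an ordinary ContentHandler. A sketch; the handler recording only element starts and ends is a hypothetical stand-in for any existing SAX-based code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TreeToEvents {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Walks the tree via an identity transformation and records SAX element events. */
    static List<String> replay(Document doc) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(handler));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(replay(parse("<doc><para>Hello</para><hr/></doc>")));
    }
}
```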
Virtual object models