Outline

  • Logical and physical structure

  • Concepts: node, element, attribute, processing instruction, text node, comment

XML document structure

Fundamental requirement for every XML document: it must be well-formed, that is:

  1. It contains a prolog (heading) and exactly one root element. Before and after the root element, there can be processing instructions and comments (Misc).

  2. It meets all the well-formedness constraints given in the specification.

  3. Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.
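The well-formedness requirement can be tried out with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<doc><p>ok</p></doc>"))  # True: one root, tags properly nested
print(is_well_formed("<doc><p>ok</doc></p>"))  # False: tags overlap
print(is_well_formed("<a/><b/>"))              # False: two root elements
```

A non-validating parser such as this one checks well-formedness only; it says nothing about validity against a DTD or schema.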

XML document structure — further info

XML document structure

  • In XML document structure we distinguish between physical and logical structure.

  • Application programmers are usually interested only in the logical structure,

  • while for content authors, XML editors, and processors the physical structure may also be important.

Physical and logical structure

Logical structure

A document is divided into elements (one of them is the root), their attributes, text nodes in elements, processing instructions, notations, comments.

Physical structure

One logical document may be stored in one or more entities; there is always at least the document entity.

Composition of logical structure

  • node (element, attribute, text node, processing instructions, comments)

  • element

  • attribute

  • text node

  • processing instructions

  • comments
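The node kinds listed above can be observed directly in a DOM tree. This is a small sketch using Python's standard `xml.dom.minidom`; the sample document and the `KIND` lookup table are hypothetical and exist only for this illustration.

```python
from xml.dom import Node, minidom

# Human-readable names for the node kinds discussed in this section.
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text node",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
}

# Hypothetical sample document that contains each kind of node once.
doc = minidom.parseString(
    '<note id="n1"><!-- draft --><?render fast?>Hello</note>'
)
root = doc.documentElement

root_kind = KIND[root.nodeType]
attr_kind = KIND[root.getAttributeNode("id").nodeType]
child_kinds = [KIND[child.nodeType] for child in root.childNodes]

print(root_kind)    # element
print(attr_kind)    # attribute
print(child_kinds)  # ['comment', 'processing instruction', 'text node']
```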

Parts of logical structure (Czech terms)

  • uzel = node (element, attribute, text node, processing instruction, comment)

  • element = element

  • atribut = attribute

  • textový uzel = text node

  • instrukce pro zpracování = processing instruction

  • komentář = comment

Elements

are objects delimited by a start tag and an end tag, for example:

<body background="yellow">
   <h1>text node — content of element h1</h1>
   <p>text node — content of element p</p>
</body>

Elements — empty

If an element is empty (it has neither child elements nor text content), we can write just an empty-element tag, e.g.:

<tagname tagattribute1 tagattribute2 ... />
<hr width='50%' />

Or equivalently (from the logical viewpoint):

<hr width='50%'></hr>
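That the two notations are logically equivalent can be checked by parsing both and comparing the results. A minimal sketch with Python's `xml.etree.ElementTree`; the `summary` helper is an assumption introduced for this comparison.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<hr width='50%' />")
long_form = ET.fromstring("<hr width='50%'></hr>")


def summary(element):
    """Reduce an element to its logical content: name, attributes, text, child count."""
    return (element.tag, dict(element.attrib), element.text or "", len(element))


print(summary(short_form) == summary(long_form))  # True
```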

Attributes

  • Attributes are attached to elements and carry additional information about them, e.g. an ID, the required formatting (style) in (X)HTML, or links to other elements.

  • Conceptually, we could replace all attributes with elements but we keep attributes to maintain readability.

  • The attribute content is NOT further structured, at least not according to the XML standards. An application may interpret it differently, but this is generally not recommended (cf. attributes in the relational data model).

  • The physical order of attributes in the start tag is NOT important and generally is NOT considered.
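The irrelevance of attribute order can be illustrated by parsing two start tags that differ only in that order. A short sketch with `xml.etree.ElementTree`; the `img` element and its attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<img src="a.png" alt="A"/>')
second = ET.fromstring('<img alt="A" src="a.png"/>')

# Attributes are exposed as a name -> value mapping, so physical order plays no role.
print(first.attrib == second.attrib)  # True
```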

How to write attributes

  • An attribute is composed of its name and value.

  • Attributes are written in the start tag or in the empty-element tag.

  • The attribute value is always enclosed in single or double quotes and separated from the attribute name by an equals sign (=).

  • Attribute names follow the same rules as element names.

  • One element can never have two or more attributes with the same name.

  • If namespaces are used, two attributes with the same local name belonging to the same namespace are not allowed either.
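Some of these rules are enforced by the parser as part of well-formedness. The sketch below feeds a few hypothetical `td` tags to `xml.etree.ElementTree`; the helper name `parses` is an assumption.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<td align='left'/>"))                # True: single quotes
print(parses('<td align="left"/>'))                # True: double quotes
print(parses("<td align=left/>"))                  # False: value not quoted
print(parses("<td align='left' align='right'/>"))  # False: duplicate attribute name
```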

Attributes — example

<hr width='50%'/>

<table border='1'>
  <tr>
       <td>one</td>
       <td>two</td>
  </tr>
  <tr>
       <td>three</td>
       <td>four</td>
  </tr>
</table>

Text node

  • Text nodes carry textual information, i.e. the textual content.

  • E.g. in the following sample, the text hello! is the text node, not the whole element em.

    <em>hello!</em>
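The distinction between an element and the text node inside it shows up in a DOM tree as two separate nodes. A brief sketch using `xml.dom.minidom` on a similar one-element document:

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<em>hello!</em>")
em = doc.documentElement  # the element node
text = em.firstChild      # its only child: the text node

print(em.nodeType == Node.ELEMENT_NODE, em.toxml())  # True <em>hello!</em>
print(text.nodeType == Node.TEXT_NODE, text.data)    # True hello!
```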

Processing instructions

  • Processing instructions are written using the <?target content?> markup.

  • They inform an application about the expected processing or setting.

  • They do not carry content.

    <?xml-stylesheet href="mystyle.xsl"?>

  • Here href is NOT an attribute; PIs do not contain attributes!
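A parser hands a processing instruction to the application as just a target and an unstructured data string. The sketch below uses `xml.dom.minidom` with the standard `xml-stylesheet` target; the regular expression that extracts `href` is an application-level convention assumed here, not something the XML parser does.

```python
import re
from xml.dom import Node, minidom

doc = minidom.parseString('<?xml-stylesheet href="mystyle.xsl"?><doc/>')
pi = doc.firstChild  # the PI precedes the root element

print(pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE)  # True
print(pi.target)  # xml-stylesheet
print(pi.data)    # href="mystyle.xsl"

# The parser delivers the data as one string; splitting it into
# name="value" pairs is up to the application.
pseudo_attrs = dict(re.findall(r'([\w-]+)="([^"]*)"', pi.data))
print(pseudo_attrs)  # {'href': 'mystyle.xsl'}
```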

Notations

  • A notation is declared as <!NOTATION name declaration>.

  • It is mostly used to describe binary / non-XML entities, e.g. images in GIF, PNG, …

  • It declares how the binary data should be processed.

Comments

  • As in HTML, a comment is enclosed in <!-- content -->

  • The comment content is only the text between the delimiters, NOT the whole comment including markup.

  • Comments are usually not important for processing, but this may depend on the application; e.g. Server Side Includes (SSI) use comments.

  • Parsers therefore should be able to forward comments to the applications.

  • SAX version 1 parsers ignore comments! SAX2 reports them through the LexicalHandler extension interface (in Java, the package org.xml.sax.ext).
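Python's SAX binding offers a comparable extension mechanism, which can serve as a sketch of how comments reach an application. The class `CommentCollector` and the sample document are assumptions; the handler is registered through the `property_lexical_handler` property of the standard `xml.sax` expat reader.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class CommentCollector:
    """Minimal lexical handler: records comments, ignores DTD and CDATA events."""

    def __init__(self):
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


collector = CommentCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(ContentHandler())  # default handler, ignores content events
parser.setProperty(property_lexical_handler, collector)
parser.parse(io.BytesIO(b"<doc><!-- build 42 --><p>text</p></doc>"))

print(collector.comments)  # [' build 42 ']
```

Without the lexical handler, the same parse would deliver only element and character events, and the comment would be silently dropped.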

Entity

  • An entity is the basic unit of the physical composition of a document. It corresponds to a string or a file…

  • Parsers should process entities in such a way that applications do not need to know about them.
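This transparency can be seen with an internal entity: the application receives the replacement text and never sees the reference. A minimal sketch with `xml.etree.ElementTree`; the entity name `co` and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares an entity that is stored as a plain string.
text = (
    '<!DOCTYPE doc [<!ENTITY co "Example Corp">]>'
    "<doc>Made by &co;.</doc>"
)
root = ET.fromstring(text)

# The parser has already replaced the reference with the entity content.
print(root.text)  # Made by Example Corp.
```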

Document node

We distinguish:

Document node

is the parent of the root element; it may also contain PIs, notations, DOCTYPE, etc., and

Root element

is the core part of an XML document. Every document has exactly one.
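The difference between the two is visible in a DOM tree, where the document node sits above the root element and also holds top-level comments and PIs. A sketch using `xml.dom.minidom`; the sample document, including the `app` PI target, is hypothetical.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    "<!-- header comment -->"
    '<?app mode="draft"?>'
    "<root><child/></root>"
)
root = doc.documentElement
top_level = [child.nodeType for child in doc.childNodes]

print(doc.nodeType == Node.DOCUMENT_NODE)  # True
print(root.tagName)                        # root
print(root.parentNode is doc)              # True
# The document node holds the comment, the PI, and the single root element.
print(top_level == [Node.COMMENT_NODE,
                    Node.PROCESSING_INSTRUCTION_NODE,
                    Node.ELEMENT_NODE])    # True
```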

In more detail…

in the next chapter, XML family standards.