Week 02: XML, schema and validation, DOM
1
Agenda
Markup languages
XML basics
XML schema
DOM
Short demo
Hands on: Iteration 01
2
Let's dive into it!
3
Markup languages (recap)
natural language + special constructs ("marks")
for instance HTML, Markdown, TeX
easily readable by both computers and humans
4
Example: Markdown
**bold text**
[Gitlab FI MUNI](...)
##### Heading level 5
5
XML
eXtensible Markup Language
data exchange format
translations
web scraping
.xml file extension
6
Harry Potter
J. K. Rowling
2005
29.99
Learning XML
Erik T. Ray
2003
39.95
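Note: the values above read like the text content of a small book catalogue. A minimal sketch of what such a document might look like follows. The element names (bookstore, book, title, author, year, price) are assumptions for illustration; only the values come from the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Element names are assumed for illustration; only the values come from the slide. -->
<bookstore>
  <book>
    <title>Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
```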
7
XML document structure
8
comment
processing instruction
root element
child/nested elements
start/end tags
text node
attribute
Note: Elements are also nodes.
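The parts listed above can be pointed out in one small document. This is a sketch: the element names and the stylesheet file name books.css are hypothetical. The first line is the XML declaration, which looks like a processing instruction but is formally a separate construct.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- comment: a note for human readers -->
<!-- the next line is a processing instruction: a hint for the consuming application -->
<?xml-stylesheet type="text/css" href="books.css"?>
<bookstore>                      <!-- root element -->
  <book category="children">     <!-- child element; its start tag carries an attribute -->
    <title>Harry Potter</title>  <!-- start tag, text node, end tag -->
  </book>                        <!-- end tag of book -->
</bookstore>
```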
9
Basic rules
all elements must have an end tag OR be empty and self-closing
all elements must be properly nested (overlapping is not allowed)
all attribute values must be enclosed in quotes
each document must have exactly one root element
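A short sketch of the rules above, using a hypothetical note document. The well-formed version is the actual markup; a violating counterpart for each rule is shown inside the comment before it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Exactly one root element wraps everything else. -->
<note>
  <!-- End tag present. Not well-formed: <to>Alice -->
  <to>Alice</to>
  <!-- Empty element, self-closing. Not well-formed: <urgent> -->
  <urgent/>
  <!-- Properly nested. Not well-formed: <b><i>Hi</b></i> -->
  <body><b><i>Hi</i></b></body>
  <!-- Attribute value in quotes. Not well-formed: <date format=iso> -->
  <date format="iso">2024-01-01</date>
</note>
```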
10
You are free to choose your own naming conventions and element names.
But remember,
"with great power comes great responsibility."
11
To avoid
12
Element or attribute?
children
en
Harry Potter
J. K. Rowling
2005
29.99
vs.
Harry Potter
J. K. Rowling
2005
29.99
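The two variants being compared probably looked roughly like this. It is a sketch under assumptions: the names category and lang, and which element each attribute is attached to, are guesses based on the values "children" and "en".

```xml
<bookstore>
  <!-- Variant 1: category and language modelled as child elements -->
  <book>
    <category>children</category>
    <lang>en</lang>
    <title>Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <!-- Variant 2: the same information modelled as attributes -->
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
```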
13
Element or attribute?
Attribute value:
cannot be further structured
"atomic" value
holds additional information about its element
for instance, lang
Note: do not put too many attributes on a single element. It quickly becomes hard for humans to read.
As a rule of thumb, 3-4 attributes are the maximum.
Element:
even if nested, it is a meaningful structure itself
usually structured
contains any number of nodes
for instance, person
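A small sketch of this guideline, with hypothetical element names: lang is a single atomic value describing title, so it fits an attribute, while person has parts of its own and is therefore an element.

```xml
<book>
  <!-- atomic extra information about title: an attribute -->
  <title lang="en">Harry Potter</title>
  <!-- a meaningful structure with parts of its own: an element -->
  <author>
    <person>
      <givenNames>J. K.</givenNames>
      <surname>Rowling</surname>
    </person>
  </author>
</book>
```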
14
XML Schema
describes the structure of an XML document
what elements and attributes are allowed to appear, in which order, how many times...
data types of elements and attributes
default or fixed values for elements/attributes
namespaces
follows XML syntax itself
.xsd file extension
Note: we validate an XML document against a schema. So a document
is valid if it conforms to a given schema.
15
16
Definition Header
...
...
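A minimal sketch of a schema header, assuming the conventional xs prefix used in the type names on these slides. The body is left as a placeholder comment, as on the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The xs prefix is bound to the XML Schema namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- element, attribute, and type definitions go here -->
</xs:schema>
```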
17
Element
This is a simple element. It can contain only text.
Limiting occurrences
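A sketch of both ideas with hypothetical element names. Note that minOccurs and maxOccurs are only allowed on elements declared or referenced inside a content model, not on top-level declarations.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a simple element: text only, no children, no attributes -->
  <xs:element name="title" type="xs:string"/>

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly once (minOccurs and maxOccurs both default to 1) -->
        <xs:element ref="title"/>
        <!-- at least one author, no upper limit -->
        <xs:element name="author" type="xs:string"
                    minOccurs="1" maxOccurs="unbounded"/>
        <!-- optional: zero or one -->
        <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```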
18
Types: simple types
Basic types
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
...
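The built-in types above might be used like this. The element names are hypothetical; each comment shows an example value in the lexical form the type accepts.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>     <!-- Harry Potter -->
  <xs:element name="price" type="xs:decimal"/>    <!-- 29.99 -->
  <xs:element name="year" type="xs:integer"/>     <!-- 2005 -->
  <xs:element name="inStock" type="xs:boolean"/>  <!-- true, false, 1 or 0 -->
  <xs:element name="published" type="xs:date"/>   <!-- 2005-07-16 (YYYY-MM-DD) -->
  <xs:element name="opensAt" type="xs:time"/>     <!-- 09:30:00 (hh:mm:ss) -->
</xs:schema>
```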
19
Types: simple types
defined inside an xs:simpleType element
user defined types
restrictions, unions, enumerations
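A sketch of user-defined simple types covering an enumeration, a range restriction, and a union. All type names and values are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- enumeration: only the listed values are allowed -->
  <xs:simpleType name="categoryType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="children"/>
      <xs:enumeration value="web"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- restriction: a non-negative decimal with at most two decimal places -->
  <xs:simpleType name="priceType">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="freeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="free"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- union: a value is valid if it matches any of the member types -->
  <xs:simpleType name="priceOrFreeType">
    <xs:union memberTypes="priceType freeType"/>
  </xs:simpleType>

  <xs:element name="price" type="priceOrFreeType"/>
</xs:schema>
```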
20
...
21
22
Attributes
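A sketch of attribute declarations, assuming hypothetical category and lang attributes on a book element. Attributes are declared after the content model of the complex type; use="required" makes an attribute mandatory, and default supplies a value when the attribute is omitted.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
      <!-- must be present on every book -->
      <xs:attribute name="category" type="xs:string" use="required"/>
      <!-- optional; treated as "en" when left out -->
      <xs:attribute name="lang" type="xs:string" default="en"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```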
23
Complex types
user defined
defined inside an xs:complexType element
xs:sequence => child elements must appear in the declared order
xs:all => child elements may appear in any order
xs:choice => exactly one of the listed child elements appears
...
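The three compositors side by side, as a sketch with hypothetical type and element names:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:sequence: title must come before author -->
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- xs:all: width and height in either order -->
  <xs:complexType name="sizeType">
    <xs:all>
      <xs:element name="width" type="xs:decimal"/>
      <xs:element name="height" type="xs:decimal"/>
    </xs:all>
  </xs:complexType>

  <!-- xs:choice: either email or phone, not both -->
  <xs:complexType name="contactType">
    <xs:choice>
      <xs:element name="email" type="xs:string"/>
      <xs:element name="phone" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
```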
24
25
Is the following XML element valid? For xs:sequence? For xs:all?
World
Hello
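The schema and markup for this question were lost, so here is one possible reading as a sketch. It assumes hypothetical elements first and second inside a greeting element, declared in that order, while the instance lists them in reverse.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed declaration: first, then second -->
  <xs:element name="greeting">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="second" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!--
    Instance under test (children in reverse order):
      <greeting>
        <second>World</second>
        <first>Hello</first>
      </greeting>
    With xs:sequence, as declared above: invalid, because first must precede second.
    With xs:all in place of xs:sequence: valid, because the order is not checked.
  -->
</xs:schema>
```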
26
Relational vs. Non-relational data model
Relational (ERD model)
atomic, flat
general view
no data duplication => usage of unique keys as reference
data relations => entity relations, foreign keys
usage of ids
Non-relational (XML document)
structured, nested
specific view
data duplication (sometimes only partially)
data relations => elements nested one in another
ids not necessary
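A sketch of the non-relational side, using hypothetical data: the author is nested inside each book, so the same author data appears twice and no ids are needed. The comment at the end outlines how a relational model would hold the same information.

```xml
<library>
  <book>
    <title>Book A</title>
    <!-- relation expressed by nesting -->
    <author>
      <name>Author X</name>
    </author>
  </book>
  <book>
    <title>Book B</title>
    <!-- the same author again: duplicated data -->
    <author>
      <name>Author X</name>
    </author>
  </book>
  <!--
    Relational equivalent: one row in an authors table (id 1, Author X)
    and two rows in a books table that both point to it via author_id = 1.
  -->
</library>
```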
27
Bonus topic
28
DOM
Document Object Model
interface (cross-platform, language-independent)
represents a document as a tree structure
each node is an object representing a part of the document
nodes can have event handlers
used by browsers to represent an HTML page
applicable to XML as well
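To make the tree idea concrete, here is a small hypothetical document together with the node tree a DOM parser would build for it, drawn in the trailing comment. The document is written without whitespace between tags, because such whitespace would otherwise show up as extra text nodes.

```xml
<book category="children"><title>Harry Potter</title><price>29.99</price></book>
<!--
  Document
    Element: book (attribute: category = "children")
      Element: title
        Text: "Harry Potter"
      Element: price
        Text: "29.99"
-->
```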
29
Demo
Modelling Discord using XML
1. data modelling
2. schema definition
3. document validation
Note: Demo code is available in the Interactive
syllabus for seminars.
30
Starting point
Discord ERD model (slightly extended)
31
How to run a validation?
online, for instance: freeformatter or utilities-online
VS Code extension, for instance: XML extension from Red Hat (it also handles formatting and more)
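Many validators and editors can pick up the schema from a hint inside the document itself. A sketch for a schema without a target namespace follows. The file name bookstore.xsd and the element names are hypothetical, and tool support for this hint may vary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xsi:noNamespaceSchemaLocation points to the schema file, here relative to this document -->
<bookstore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="bookstore.xsd">
  <book>
    <title>Harry Potter</title>
  </book>
</bookstore>
```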
32
Now, it's your turn :)
The assignment for Iteration 01 can be found in Gitlab issues.
Continue as described in "How to download new iteration" on the Gitlab Wiki.
If you struggle, don't hesitate to ask for help :)
Note: even though XML does not require indentation or consistent formatting,
please format your documents anyway. It makes them much more readable.
33