PB138/07 - Modern Markup Languages and Their Applications
Lab 01 [24.02.2020]
Course instructions and Introduction to XML, DOM & JAXP
Bruno Rossi
Department of Computer Systems and Communications,
Lasaris (Lab of Software Architectures and Information Systems)
Masaryk University, Brno
Introduction
●
About your lab instructor
●
About your previous experience
●
Content of the seminars
Evaluation
●
Assignments/Tasks solved during each seminar session
→ 20 pts (2pts x 10 seminars, starting week 3)
●
Team project assigned mid-semester (teams of 4 students)
→ 0-40 pts
●
Final Exam
→ 0-40 pts
●
To pass the course:
→ 70 pts (60 with 'zápočet', i.e. course credit)
Attendance at seminars is not compulsory, but to get the points you
must submit the completed task for each week
Structure of the Seminars
●
During the semester:
→ each week an incremental task to be completed
●
Second part:
→ work starts on the development of the project to be delivered
Introduction to XML, DOM & JAXP
XML = eXtensible Markup Language
DOM = Document Object Model
JAXP = Java API for XML Processing
XML
●
XML is a mark-up language created to store & transport
information in a structured form
What is the difference between well-formed
and valid XML?
Can a 'non-well-formed' XML be valid?
Can a 'non-valid' XML be well-formed?
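One way to explore these questions is to parse the same text twice with JAXP, once with validation off and once with validation on. The following is a minimal sketch, not part of the lab material: the class name, the inline DTD and the three sample documents are assumptions made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedVsValid {

    // Hypothetical sample documents (assumptions, not taken from the lab files).
    static final String DTD =
        "<!DOCTYPE world [<!ELEMENT world (continent*)><!ELEMENT continent EMPTY>]>";
    static final String VALID = DTD + "<world><continent/></world>";
    static final String WELL_FORMED_ONLY = DTD + "<world><city/></world>";
    static final String BROKEN = "<world><continent></world>";

    /** Parses without validation: only well-formedness is checked. */
    static boolean isWellFormed(String xml) {
        return parses(xml, false);
    }

    /** Parses with DTD validation: the text must be well-formed and valid. */
    static boolean isValid(String xml) {
        return parses(xml, true);
    }

    private static boolean parses(String xml, boolean validating) {
        final List<SAXParseException> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // Validity violations are reported as recoverable errors.
                @Override public void error(SAXParseException e) { errors.add(e); }
                // Well-formedness violations are fatal and abort the parse.
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return errors.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("VALID:            well-formed=" + isWellFormed(VALID)
            + " valid=" + isValid(VALID));
        System.out.println("WELL_FORMED_ONLY: well-formed=" + isWellFormed(WELL_FORMED_ONLY)
            + " valid=" + isValid(WELL_FORMED_ONLY));
        System.out.println("BROKEN:           well-formed=" + isWellFormed(BROKEN)
            + " valid=" + isValid(BROKEN));
    }
}
```

Note that the validating run can only succeed if the non-validating run does, because a well-formedness error stops the parser before any validity check can complete.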
XML well-formedness
●
Try yourself: create 6 files
01.xml
02.xml
03.xml
04.xml
05.xml
B&N
06.xml
●
Before running xmllint, think about which files should not pass the
well-formedness check
●
At the prompt (if you have only those xml files in the dir)
$> xmllint --noout *.xml
●
Try to fix the errors and re-run xmllint (see also next slide)
●
What happens if you add the --valid flag to xmllint?
XML Escaping
"  &quot;
'  &apos;
<  &lt;
>  &gt;
&  &amp;
●
→ A literal < or & cannot be used in attribute values; use the escapes instead
●
Try rewriting the following by proper escaping
B&N
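The five predefined escapes can also be applied programmatically. The sketch below is a hand-written helper for illustration only; the class and method names are assumptions, and in practice a DOM serializer performs this escaping for you.

```java
public class XmlEscape {

    /** Replaces the five characters that have predefined XML entities. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The text from the exercise: the ampersand becomes &amp;
        System.out.println(escape("B&N"));
    }
}
```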
Playing with encoding
●
Download the file 01-enc.zip
●
Uncompress the files in a directory
●
You should have two files: enc.xml and enc-prob.xml
●
Try to run xmllint on both
●
Running on enc-prob.xml should give you the following error:
enc-prob.xml:2: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE0 0x3E 0x0A 0x20
●
Try to fix this issue in the file (hint: you can either use a text editor or
iconv)
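As an alternative to iconv, the conversion can be sketched in a few lines of Java. This assumes the problematic file is encoded in ISO-8859-1, which is a guess that should be verified before converting; the class name and the command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Latin1ToUtf8 {

    /** Decodes the bytes as ISO-8859-1 and re-encodes the text as UTF-8. */
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Latin1ToUtf8 <input-file> <output-file>");
            return;
        }
        byte[] input = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), latin1ToUtf8(input));
    }
}
```

For example, the single byte 0xE0 from the error message becomes the two bytes 0xC3 0xA0 after conversion.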
Playing with encodings - hints
●
In a Linux terminal, you can look at the file encoding
$> file --mime enc.xml
enc.xml: application/xml; charset=utf-8
$> file --mime enc-prob.xml
enc-prob.xml: application/xml; charset=iso-8859-1
●
You can look at the byte dump of the two files
$> hexdump enc.xml
0000000 3f3c 6d78 206c 6576 7372 6f69 3d6e 3122
0000010 302e 2022 6e65 6f63 6964 676e 223d 7475
0000020 2d66 2238 3e3f 3c0a 6f6c 6163 696c c374
0000030 3ea0 200a 6f52 656d 3c0a 6c2f 636f 6c61
0000040 7469 a0c3 003e
$> hexdump enc-prob.xml
0000000 3f3c 6d78 206c 6576 7372 6f69 3d6e 3122
0000010 302e 2022 6e65 6f63 6964 676e 223d 7475
0000020 2d66 2238 3e3f 3c0a 6f6c 6163 696c e074
0000030 0a3e 5620 6e65 6369 0a65 2f3c 6f6c 6163
0000040 696c e074 003e
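The parser error can be reproduced with a strict UTF-8 decoder. The following is a small sketch, with a hypothetical class name, that reports whether a byte sequence is proper UTF-8; the sample bytes are the ones from the xmllint error message and from the dumps above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /** Returns true only if the bytes decode as UTF-8 without any error. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Bytes reported by xmllint for enc-prob.xml: 0xE0 0x3E 0x0A 0x20
        byte[] fromProblemFile = { (byte) 0xE0, 0x3E, 0x0A, 0x20 };
        // The same position in enc.xml: 0xC3 0xA0 0x3E 0x0A
        byte[] fromGoodFile = { (byte) 0xC3, (byte) 0xA0, 0x3E, 0x0A };
        System.out.println("enc-prob.xml bytes valid UTF-8: " + isValidUtf8(fromProblemFile));
        System.out.println("enc.xml bytes valid UTF-8:      " + isValidUtf8(fromGoodFile));
    }
}
```

In UTF-8 the byte 0xE0 announces a three-byte sequence, so it must be followed by continuation bytes; 0x3E is not one, which is why the decoder rejects the first sample.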
Create a new XML file
●
Create a new XML file called continent.xml
●
Represent several continents in the file (e.g. Asia, Africa, America,
Europe, Australia). Each continent has a name attribute
●
Each continent contains one or more cities
●
Each city has an id attribute and three sub-elements: name,
population, and pollution (one of 'low', 'medium', 'high')
●
Add several continents and cities to the file
●
Check the well-formedness of the XML file with xmllint
●
Hint: you can use xmllint to output a nicely formatted xml file
from the command line:
$> cat continent.xml | xmllint --format -
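The structure described above can also be produced from code, which is a quick way to see what a matching file looks like. The following sketch builds one continent with one city through the DOM API and prints it indented. The root element name 'world', the city values and the class name are assumptions for illustration; the population figure is a placeholder, not real data.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ContinentBuilder {

    /** Builds a tiny document: world > continent(name) > city(id) > name, population, pollution. */
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element world = doc.createElement("world"); // assumed root element name
        doc.appendChild(world);

        Element europe = doc.createElement("continent");
        europe.setAttribute("name", "europe");
        world.appendChild(europe);

        Element city = doc.createElement("city");
        city.setAttribute("id", "c1");
        europe.appendChild(city);
        city.appendChild(textElement(doc, "name", "Brno"));
        city.appendChild(textElement(doc, "population", "1000")); // placeholder value
        city.appendChild(textElement(doc, "pollution", "low"));
        return doc;
    }

    private static Element textElement(Document doc, String tag, String text) {
        Element element = doc.createElement(tag);
        element.setTextContent(text);
        return element;
    }

    /** Serializes the document as indented XML text. */
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(build()));
    }
}
```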
Using JAXP (1/2)
●
We will start getting familiar with JAXP in this exercise (and continue
next time)
●
In this lab we are interested in the DOM API:
– org.w3c.dom
– javax.xml.parsers
●
Check the following documentation (so it will be familiar for the next
lab session):
→ http://www.oracle.com/technetwork/java/intro-140052.html (intro to JAXP)
→ http://www.oracle.com/technetwork/java/dom-139036.html (DOM API)
→ https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/package-summary.html (org.w3c.dom)
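To show how the two packages fit together, here is a minimal sketch that parses XML text into an org.w3c.dom.Document with javax.xml.parsers. The class name, the helper method and the inline sample text are assumptions for illustration; the sample project may load its document differently.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseDemo {

    /** Parses XML text into a DOM tree; rejects text that is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<world><continent name=\"asia\"/></world>");
        // getDocumentElement() gives the root element of the tree.
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```

To read a file instead of a string, DocumentBuilder also offers parse(java.io.File).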
Using JAXP (2/2)
●
Use the sample project (01-02-xml-ex.zip) to familiarize yourself with the
API
●
Just consider the class 'XML'
●
You can modify the method doMyXMLTransformations(org.w3c.dom.Document)
●
In the src/ folder, there is already a continent.xml file with
some data that you can use
●
Try to run the application
Task 1: Output Element Names
●
In doMyXMLTransformations(Document document)
1. Get the root element
2. Print the root element name ('world' in the sample file)
3. Get all the continents in the file (you can use
getElementsByTagName() )
4. Iterate over the continents and print the names (name attribute)
– output based on the sample should be (asia, africa,
europe, america)
Task 1: Output Element Names - hints
●
Use getDocumentElement() to get the root (it is returned as an Element)
●
Cast each Node taken from a NodeList to Element
●
Iterate over the NodeList returned by getElementsByTagName()
●
Use getAttribute("name") to print the attribute ‘name’
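Put together, the hints might look like the sketch below. It is one possible approach, not the reference solution: the class name, the helper methods and the inline sample text are assumptions, and the sample only mimics the continent names mentioned for the provided file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Task1Sketch {

    // Hypothetical stand-in for the sample continent.xml.
    static final String SAMPLE =
        "<world>"
        + "<continent name=\"asia\"/><continent name=\"africa\"/>"
        + "<continent name=\"europe\"/><continent name=\"america\"/>"
        + "</world>";

    /** Helper for the demo: parses XML text into a Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Collects the 'name' attribute of every continent element, in document order. */
    static List<String> continentNames(Document document) {
        Element root = document.getDocumentElement();
        NodeList continents = root.getElementsByTagName("continent");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < continents.getLength(); i++) {
            // item(i) returns a Node, so a cast is needed to reach getAttribute().
            Element continent = (Element) continents.item(i);
            names.add(continent.getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        Document document = parse(SAMPLE);
        System.out.println(document.getDocumentElement().getTagName());
        for (String name : continentNames(document)) {
            System.out.println(name);
        }
    }
}
```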
References
Suggested material:
●
Extensible Markup Language (XML) 1.0 W3C Recommendation (annotated):
→ https://www.xml.com/axml/testaxml.htm
●
Introduction to JAXP from Oracle docs:
→ http://www.oracle.com/technetwork/java/intro-140052.html (chapter 1)
→ http://www.oracle.com/technetwork/java/dom-139036.html (chapter 3)