Week 02: XML, schema and validation, DOM

Agenda
- Markup languages
- XML basics
- XML schema
- DOM
- Short demo
- Hands on: Iteration 01

Let's dive into it!

Markup languages (recap)
- natural language + special constructs ("marks")
- for instance HTML, Markdown, TeX
- easily readable for both computers and humans

Example: Markdown (bold text, GitLab FI MUNI, Heading level 5)

XML
- eXtensible Markup Language
- used as a data exchange format, for translations, and in web scraping
- .xml file extension

Example document with two books: Harry Potter, J. K. Rowling, 2005, 29.99; Learning XML, Erik T. Ray, 2003, 39.95

XML document structure
The annotated parts of a document:
- comment
- processing instruction
- root element
- child/nested elements
- start/end tags
- text node
- attribute
Note: Elements are also nodes.

Basic rules
- all elements must have an end tag OR be empty and self-closing
- all elements must be properly nested (overlapping is not allowed)
- all attribute values must be enclosed in quotes
- each document must have exactly one root element

Naming conventions
Both the naming convention and the names of elements are up to you. But remember, "with great power comes great responsibility."

To avoid

Element or attribute?
The same book (Harry Potter, J. K. Rowling, 2005, 29.99) with its language stored as a child element, <lang>en</lang>, vs. with its language stored as an attribute.

Attribute value:
- cannot be further structured ("atomic" value)
- holds additional information about the element it belongs to
- for instance, lang
Note: do not put too many attributes on a single element; it quickly becomes hard for humans to read. As a rule of thumb, 3 to 4 attributes are a maximum.

Element:
- even if nested, it is a meaningful structure in itself
- usually structured
- contains any number of nodes
- for instance, person

XML Schema
Describes the structure of an XML document:
- which elements and attributes may appear, in which order, how many times...
- data types of elements and attributes
- default or fixed values for elements/attributes
- namespaces
It follows XML syntax itself and uses the .xsd file extension.
Note: we validate an XML document against a schema.
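Validation against a schema presupposes that the document is well-formed, i.e. that it follows the basic rules listed earlier; a parser rejects a broken document before any schema is consulted. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper `is_well_formed` and the sample snippets are hypothetical illustrations, one per rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Hypothetical snippets, each breaking (or respecting) one basic rule.
samples = {
    "ok": '<book lang="en"><title>Harry Potter</title><year>2005</year></book>',
    "self-closing": '<book lang="en"><cover/></book>',
    "missing end tag": "<book><title>Harry Potter</book>",
    "overlapping": "<book><title><year>Harry Potter</title></year></book>",
    "unquoted attribute": "<book lang=en><title>Harry Potter</title></book>",
    "two roots": "<book>A</book><book>B</book>",
}

for name, text in samples.items():
    print(f"{name}: {is_well_formed(text)}")
```

Only "ok" and "self-closing" pass; every other snippet raises a parse error. Well-formedness says nothing about which elements are allowed where; that is the schema's job.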
A document is valid if it conforms to a given schema.

Definition
- header
- ...

Element
This is a simple element. It can contain only text.
Limiting occurrences (how many times an element may appear, via minOccurs and maxOccurs)

Types: simple types
Basic types: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time, ...

Types: simple types
- definition inside an xs:simpleType element
- user-defined types
- restrictions, unions, enumerations
...

Attributes

Complex types
- user-defined
- definition inside an xs:complexType element
- xs:sequence => all child elements, in the specified order
- xs:all => all child elements, in any order
- xs:choice => only one of the child elements
- ...

Is the following XML element (whose children contain "World" followed by "Hello") valid for xs:sequence? For xs:all?

Relational vs. non-relational data model
Relational (ERD model):
- atomic, flat
- general view
- no data duplication => unique keys used as references
- data relations => entity relationships, foreign keys
- usage of IDs
Non-relational (XML document):
- structured, nested
- specific view
- data duplication (sometimes only partial)
- data relations => elements nested one in another
- IDs not necessary

Bonus topic

DOM
Document Object Model
- interface (cross-platform, language-independent)
- represents a document as a tree structure
- each node is an object
- nodes can have event handlers
- used by browsers to represent an HTML page
- applicable to XML as well

Demo
Modelling Discord using XML
1. data modelling
2. schema definition
3. document validation
Note: Demo code is available in the Interactive syllabus for seminars.

Starting point
Discord ERD model (slightly extended)

How to run a validation?
- online, for instance: freeformatter or utilities-online
- VS Code extension, for instance: XML extension from Red Hat (also does formatting and more)

Now, it's your turn :)
The assignment for Iteration 01 can be found in GitLab issues. Continue as described in "How to download new iteration" on the GitLab Wiki.
If you struggle, don't hesitate to ask for help :)
Note: even though XML does not require indentation or formatting, please format your documents anyway. It makes them much more readable.
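Formatting can also be automated, and doing so touches the DOM from the bonus topic: the document is parsed into a tree of node objects, which can be walked and then serialized again with indentation. A minimal sketch, assuming Python's standard `xml.dom.minidom` module (a lightweight DOM implementation without event handlers); the sample document is hypothetical.

```python
from xml.dom import minidom

# Hypothetical compact input: well-formed, but unindented and hard to read.
raw = ('<books><book lang="en"><title>Harry Potter</title>'
       '<year>2005</year></book></books>')

# Parse into a DOM tree; the Document node wraps the single root element.
doc = minidom.parseString(raw)
root = doc.documentElement

# Walk the tree: element nodes expose tag names, attributes and child nodes.
titles = []
for book in root.getElementsByTagName("book"):
    title = book.getElementsByTagName("title")[0]
    # The title's text is a separate text node: the element's first child.
    titles.append((book.getAttribute("lang"), title.firstChild.data))

# Serialize the same tree with two-space indentation.
pretty = doc.toprettyxml(indent="  ")

print(root.tagName)
print(titles)
print(pretty)
```

`toprettyxml` keeps existing whitespace text nodes, so running it on an already indented document tends to add blank lines; for everyday work, the Red Hat VS Code extension mentioned above is the more convenient formatter.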
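The xs:sequence vs. xs:all question from the schema part can be checked mechanically. This is a toy sketch, not a real XSD validator: it only compares child element names, and it assumes a hypothetical `<greeting>` element that declares the children `hello` and `world`, each required exactly once (minOccurs and maxOccurs of 1). Full validation needs a schema-aware tool such as the ones listed under "How to run a validation?".

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: <greeting> declares hello, then world.
DECLARED = ["hello", "world"]

def matches_sequence(element, declared):
    """xs:sequence: every declared child exactly once, in the declared order."""
    return [child.tag for child in element] == declared

def matches_all(element, declared):
    """xs:all: every declared child exactly once, in any order."""
    return sorted(child.tag for child in element) == sorted(declared)

# The element from the exercise: "World" comes before "Hello".
greeting = ET.fromstring(
    "<greeting><world>World</world><hello>Hello</hello></greeting>"
)

print("xs:sequence:", matches_sequence(greeting, DECLARED))  # False
print("xs:all:", matches_all(greeting, DECLARED))  # True
```

Under these assumptions the swapped order fails xs:sequence but passes xs:all, which is exactly the difference between the two compositors.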