Outline
- Types of transformations
- XML pipelining
- TODO: Tools
XML Transformations
An XML transformation language is a programming language designed specifically to transform an input XML document into an output document that satisfies a specific goal.
There are two special cases of transformation:
- XML to XML
  - the output document is an XML document.
- XML to Data
  - the output document is a byte stream.
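The two cases can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration: the sample document, the function names to_xml and to_data, and the CSV-like byte layout are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document used only for this illustration.
SOURCE = (
    "<people>"
    "<person><name>Ann</name><age>30</age></person>"
    "<person><name>Bob</name><age>25</age></person>"
    "</people>"
)

def to_xml(root):
    """XML to XML: build a new XML document that lists only the names."""
    out = ET.Element("names")
    for person in root.findall("person"):
        ET.SubElement(out, "name").text = person.findtext("name")
    return out

def to_data(root):
    """XML to Data: emit a byte stream (here, comma-separated lines)."""
    rows = [
        f"{person.findtext('name')},{person.findtext('age')}"
        for person in root.findall("person")
    ]
    return "\n".join(rows).encode("utf-8")

root = ET.fromstring(SOURCE)
xml_out = ET.tostring(to_xml(root), encoding="unicode")
data_out = to_data(root)
```

Here xml_out is still an XML document, while data_out is plain bytes that an XML parser would not accept.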
XML Pipeline
In software, an XML Pipeline is formed when XML (Extensible Markup Language) processes, especially XML transformations and XML validations, are connected. For instance, given two transformations T1 and T2, the two can be connected so that an input XML document is transformed by T1 and the output of T1 is then fed as the input document to T2. Simple pipelines like this one are called linear: a single input document always goes through the same sequence of transformations to produce a single output document.
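A linear pipeline of this kind can be sketched as plain function composition. The helper pipeline and the concrete behaviour of t1 and t2 below are assumptions chosen for illustration; the text only says that the output of T1 becomes the input of T2.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def t1(doc):
    """Hypothetical T1: rename every <item> element to <entry>."""
    for element in list(doc.iter("item")):
        element.tag = "entry"
    return doc

def t2(doc):
    """Hypothetical T2: record the number of <entry> children on the root."""
    doc.set("count", str(len(doc.findall("entry"))))
    return doc

def pipeline(*steps):
    """Connect steps so that each step's output is the next step's input."""
    return lambda doc: reduce(lambda current, step: step(current), steps, doc)

linear = pipeline(t1, t2)
result = linear(ET.fromstring("<list><item>a</item><item>b</item></list>"))
output = ET.tostring(result, encoding="unicode")
```

Because the sequence of steps is fixed when the pipeline is built, every input document follows the same path, which is what makes the pipeline linear.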
XML Pipeline operations
- Linear operations
- Non-linear operations
Linear operations
- Micro operations
- Document operations
- Sequence operations
Linear: Micro-operations
They operate at the inner document level:
- Rename
  - renames elements or attributes without modifying the content
- Replace
  - replaces elements or attributes
- Insert
  - adds a new data element to the output stream at a specified point
- Delete
  - removes an element or attribute (also known as pruning the input tree)
- Wrap
  - wraps elements with additional elements
- Reorder
  - changes the order of elements
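Some of these micro-operations can be sketched on an in-memory tree with xml.etree.ElementTree. The function names, the sample document, and the choice to sort by tag name for Reorder are assumptions for this example; Replace and Insert would follow the same pattern and are omitted for brevity.

```python
import xml.etree.ElementTree as ET

def rename(root, old, new):
    """Rename: change the tag, leave content and attributes untouched."""
    for element in list(root.iter(old)):
        element.tag = new

def delete(root, tag):
    """Delete: prune every element with the given tag from the tree."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)

def wrap(root, tag, wrapper):
    """Wrap: put each matching element inside a new wrapper element."""
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == tag:
                box = ET.Element(wrapper)
                parent.remove(child)
                box.append(child)
                parent.insert(index, box)

def reorder(root):
    """Reorder: sort the direct children by tag name (an arbitrary rule)."""
    root[:] = sorted(root, key=lambda element: element.tag)

doc = ET.fromstring("<person><name>Ann</name><tmp/><age>30</age></person>")
rename(doc, "name", "fullname")
delete(doc, "tmp")
wrap(doc, "age", "details")
reorder(doc)
micro_result = ET.tostring(doc, encoding="unicode")
```

Each function touches only the nodes it matches, so the rest of the document passes through unchanged.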
Linear: Document operations
They take the input document as a whole:
- Identity transform
  - makes a verbatim copy of its input to the output
- Compare
  - takes two documents and compares them
- Transform
  - executes a transformation on the input document using a specified XSLT stylesheet
- Split
  - takes a single XML document and splits it into distinct documents
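A minimal sketch of three of these document operations follows. It assumes that Compare means equality of canonical forms and that Split produces one document per child of the root; both are choices made for this example. Transform is left out because Python's standard library has no XSLT processor.

```python
import copy
import xml.etree.ElementTree as ET

def identity(doc):
    """Identity transform: return a verbatim, independent copy."""
    return copy.deepcopy(doc)

def compare(first, second):
    """Compare: True when both documents have the same canonical form."""
    canonical = [
        ET.canonicalize(ET.tostring(doc, encoding="unicode"))
        for doc in (first, second)
    ]
    return canonical[0] == canonical[1]

def split(doc):
    """Split: turn each child of the root into a document of its own."""
    return [copy.deepcopy(child) for child in doc]

source = ET.fromstring('<orders><order id="1"/><order id="2"/></orders>')
clone = identity(source)
parts = split(source)
same = compare(source, clone)
```

Comparing canonical forms rather than raw strings ignores differences such as attribute order, which is usually what a document-level comparison is after.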
Linear: Sequence operations
They were mainly introduced by XProc and handle a sequence of documents as a whole:
- Count
  - takes a sequence of documents and counts them
- Identity transform
  - makes a verbatim copy of its input sequence of documents to the output
- Split-sequence
  - takes a sequence of documents as input and routes them to different outputs depending on matching rules
- Wrap-sequence
  - takes a sequence of documents as input and wraps them into one or more documents
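The sequence operations can be approximated by treating a sequence of documents as a Python list. This is a sketch, not XProc itself: the function names mirror the list above, the matching rule is an arbitrary predicate, and wrap_sequence here always produces a single wrapper document.

```python
import xml.etree.ElementTree as ET

def count(docs):
    """Count: number of documents in the sequence."""
    return len(docs)

def split_sequence(docs, matches):
    """Split-sequence: route documents to two outputs by a matching rule."""
    matched = [doc for doc in docs if matches(doc)]
    not_matched = [doc for doc in docs if not matches(doc)]
    return matched, not_matched

def wrap_sequence(docs, wrapper):
    """Wrap-sequence: place all documents under one new wrapper element."""
    root = ET.Element(wrapper)
    root.extend(docs)
    return root

docs = [ET.fromstring(text) for text in ("<a/>", "<b/>", "<a/>")]
total = count(docs)
matched, not_matched = split_sequence(docs, lambda doc: doc.tag == "a")
wrapped = wrap_sequence(docs, "batch")
```

None of these functions look inside the individual documents beyond the matching rule, which is what distinguishes them from micro and document operations.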
Non-linear operations
- Conditionals
  - where a given transformation is executed if a condition is met, while another transformation is executed otherwise
- Loops
  - where a transformation is executed on each node of a node set selected from a document, or is repeated until a condition evaluates to false
- Tees
  - where a document is fed to multiple transformations, potentially happening in parallel
- Aggregations
  - where multiple documents are aggregated into a single document
- Exception Handling
  - where failures in processing can result in an alternate pipeline being processed
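The non-linear operations can be sketched as small combinators over transformation functions. All names below (choose, for_each, tee, aggregate, try_catch, mark, fail) are hypothetical helpers written for this example; the tee runs its branches one after another rather than in parallel, and only the per-node form of a loop is shown.

```python
import copy
import xml.etree.ElementTree as ET

def choose(condition, if_true, if_false):
    """Conditional: pick one of two transformations per document."""
    return lambda doc: if_true(doc) if condition(doc) else if_false(doc)

def for_each(path, step):
    """Loop: apply a step to every node selected by a path."""
    def run(doc):
        for node in doc.findall(path):
            step(node)
        return doc
    return run

def tee(*steps):
    """Tee: feed an independent copy of the document to each step."""
    return lambda doc: [step(copy.deepcopy(doc)) for step in steps]

def aggregate(docs, wrapper):
    """Aggregation: combine several documents into a single one."""
    root = ET.Element(wrapper)
    root.extend(docs)
    return root

def try_catch(main, fallback):
    """Exception handling: run an alternate pipeline when main fails."""
    def run(doc):
        try:
            return main(doc)
        except Exception:
            return fallback(doc)
    return run

def mark(value):
    """Hypothetical step: set a status attribute on the root."""
    def step(doc):
        doc.set("status", value)
        return doc
    return step

def fail(doc):
    """Hypothetical step that always fails."""
    raise ValueError("hypothetical processing failure")

doc = ET.fromstring("<order total='120'><line/><line/></order>")
route = choose(lambda d: int(d.get("total")) > 100, mark("large"), mark("small"))
branches = tee(route, try_catch(fail, mark("recovered")))(doc)
merged = aggregate(branches, "results")
looped = for_each("line", lambda node: node.set("seen", "yes"))(doc)
```

Unlike a linear pipeline, the path a document takes here depends on its content (choose), on failures (try_catch), or fans out and back in (tee followed by aggregate).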