Preamble

Lasaris

Outline

  • Types of transformations
  • XML pipelining
  • TODO: Tools

XML Transformations

  • An XML transformation language is a programming language designed specifically to transform an input XML document into an output document which satisfies some specific goal.
  • There are two special cases of transformation:
    XML to XML

    the output document is an XML document.

    XML to Data

    the output document is a byte stream.
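The difference between the two cases can be shown with Python's standard xml.etree.ElementTree module. This is a minimal sketch that is not tied to any particular transformation language. The input document, the element names, and the choice of CSV as the "data" output are hypothetical and were picked only for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input document used for both kinds of transformation.
SOURCE = "<books><book year='2001'>XSLT</book><book year='2010'>XProc</book></books>"


def xml_to_xml(doc: ET.Element) -> ET.Element:
    """XML to XML: the result is again an XML tree (one <title> per <book>)."""
    out = ET.Element("titles")
    for book in doc.iter("book"):
        ET.SubElement(out, "title").text = book.text
    return out


def xml_to_data(doc: ET.Element) -> bytes:
    """XML to data: the result is a plain byte stream (here UTF-8 encoded CSV)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "year"])
    for book in doc.iter("book"):
        writer.writerow([book.text, book.get("year")])
    return buf.getvalue().encode("utf-8")


root = ET.fromstring(SOURCE)
print(ET.tostring(xml_to_xml(root), encoding="unicode"))
print(xml_to_data(root))
```

The first result can be parsed again as XML and fed to further transformations. The second is opaque bytes that only a CSV reader understands.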

XML Pipeline

  • In software, an XML Pipeline is formed when XML (Extensible Markup Language) processes, especially XML transformations and XML validations, are connected. For instance, given two transformations T1 and T2, the two can be connected so that an input XML document is transformed by T1 and then the output of T1 is fed as input document to T2. Simple pipelines like the one described above are called linear; a single input document always goes through the same sequence of transformations to produce a single output document.
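A linear pipeline amounts to function composition. The sketch below assumes that every step is a Python function from an ElementTree element to an element. The steps t1, t2 and validate are hypothetical. A validation step here either returns its input unchanged or raises an error, which stops the pipeline.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable

# A pipeline step maps one XML tree to another.
Step = Callable[[ET.Element], ET.Element]


def pipeline(*steps: Step) -> Step:
    """Linear pipeline: the output of each step is the input of the next one."""
    return lambda doc: reduce(lambda acc, step: step(acc), steps, doc)


def t1(doc: ET.Element) -> ET.Element:
    # Hypothetical T1: upper-case the text of every <name> (modifies doc in place).
    for name in doc.iter("name"):
        name.text = (name.text or "").upper()
    return doc


def validate(doc: ET.Element) -> ET.Element:
    # Hypothetical validation: every <person> needs a <name>; otherwise fail.
    for person in doc.iter("person"):
        if person.find("name") is None:
            raise ValueError("<person> without <name>")
    return doc


def t2(doc: ET.Element) -> ET.Element:
    # Hypothetical T2: wrap the whole document in an <envelope> element.
    envelope = ET.Element("envelope")
    envelope.append(doc)
    return envelope


run = pipeline(t1, validate, t2)
result = run(ET.fromstring("<people><person><name>ada</name></person></people>"))
print(ET.tostring(result, encoding="unicode"))
```

Every input follows the same fixed sequence t1, validate, t2. This is exactly what makes the pipeline linear.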

XML Pipeline operations

  • Linear operations
  • Non-linear operations

Linear operations

  • Micro operations
  • Document operations
  • Sequence operations

Linear: Micro-operations

Operate at the inner document level:

Rename

renames elements or attributes without modifying the content

Replace

replaces elements or attributes

Insert

adds a new data element to the output stream at a specified point

Delete

removes an element or attribute (also known as pruning the input tree)

Wrap

wraps elements with additional elements

Reorder

changes the order of elements
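The six micro-operations can be sketched on an in-memory ElementTree. This sketch works on a whole tree rather than on an output stream, so "at a specified point" becomes a child index. All function names are hypothetical. ElementTree keeps no parent pointers, so the operations that move or replace a child loop over parents and edit their child lists.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable


def rename(root: ET.Element, old: str, new: str) -> None:
    # Rename: change the tag only; attributes, text and children are kept.
    for el in list(root.iter(old)):
        el.tag = new


def replace(root: ET.Element, tag: str, make: Callable[[ET.Element], ET.Element]) -> None:
    # Replace: swap every <tag> child for the element built by make(old_element).
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == tag:
                parent[i] = make(child)


def insert(root: ET.Element, parent_tag: str, index: int, new: ET.Element) -> None:
    # Insert: add a copy of `new` at position `index` inside every <parent_tag>.
    for parent in list(root.iter(parent_tag)):
        parent.insert(index, copy.deepcopy(new))


def delete(root: ET.Element, tag: str) -> None:
    # Delete (pruning): remove every <tag> element together with its subtree.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)


def delete_attr(root: ET.Element, name: str) -> None:
    # Delete for attributes: drop attribute `name` wherever it occurs.
    for el in root.iter():
        el.attrib.pop(name, None)


def wrap(root: ET.Element, tag: str, wrapper: str) -> None:
    # Wrap: put each <tag> child inside a new <wrapper> at the same position.
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == tag:
                outer = ET.Element(wrapper)
                parent[i] = outer
                outer.append(child)


def reorder(root: ET.Element, parent_tag: str, key: Callable[[ET.Element], str]) -> None:
    # Reorder: sort the children of every <parent_tag> by key(child).
    for parent in list(root.iter(parent_tag)):
        parent[:] = sorted(parent, key=key)


doc = ET.fromstring('<list><item n="2">b</item><item n="1" tmp="x">a</item><draft/></list>')
rename(doc, "list", "items")
delete_attr(doc, "tmp")
delete(doc, "draft")
reorder(doc, "items", key=lambda e: e.get("n", ""))
wrap(doc, "item", "entry")
insert(doc, "items", 0, ET.Element("header"))
replace(doc, "header", lambda old: ET.Element("title", {"from": old.tag}))
print(ET.tostring(doc, encoding="unicode"))
```

Each operation touches only the nodes it selects. The rest of the document passes through unchanged, which is what sets these apart from the document-level operations.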

Linear: Document operations

They take the input document as a whole:

Identity transform

makes a verbatim copy of its input to the output

Compare

takes two documents and compares them

Transform

executes a transformation on the input document using a specified XSLT stylesheet

Split

takes a single XML document and splits it into distinct documents
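Three of the document operations can be sketched with the Python standard library alone. Python's standard library has no XSLT processor (the third-party lxml package provides one as lxml.etree.XSLT), so the Transform step is left out of this sketch. The function names are hypothetical. In this sketch, "equal" means "equal after XML canonicalization (C14N 2.0)", which is one possible definition of Compare among several.

```python
import copy
import xml.etree.ElementTree as ET


def identity(doc: ET.Element) -> ET.Element:
    # Identity transform: an independent, verbatim (deep) copy of the input.
    return copy.deepcopy(doc)


def compare(a: str, b: str) -> bool:
    # Compare: two documents count as equal when their canonical XML (C14N 2.0)
    # forms match, so attribute order, quote style and empty-tag syntax are ignored.
    # strip_text=True also ignores leading/trailing whitespace in text content.
    return ET.canonicalize(a, strip_text=True) == ET.canonicalize(b, strip_text=True)


def split(doc: ET.Element, tag: str) -> list[str]:
    # Split: every <tag> element becomes the root of a separate output document.
    parts = []
    for el in doc.iter(tag):
        part = copy.deepcopy(el)
        part.tail = None  # text after the element belongs to its parent, not the part
        parts.append(ET.tostring(part, encoding="unicode"))
    return parts


orders = ET.fromstring('<orders>\n  <order id="1"/>\n  <order id="2"/>\n</orders>')
copy_of_orders = identity(orders)
print(copy_of_orders is orders)
print(compare('<a x="1" y="2"/>', "<a y='2' x='1'></a>"))
print(split(orders, "order"))
```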

Linear: Sequence operations

They were introduced mainly in XProc and handle a sequence of documents as a whole:

Count

takes a sequence of documents and counts them

Identity transform

makes a verbatim copy of its input sequence of documents to the output

Split-sequence

takes a sequence of documents as input and routes them to different outputs depending on matching rules

Wrap-sequence

takes a sequence of documents as input and wraps them into one or more documents
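The sequence operations can be sketched over a Python list of parsed documents. The sketch is loosely modeled on the XProc steps p:count, p:identity, p:split-sequence and p:wrap-sequence, but it is not their actual interface. The <result> wrapper for the count is a hypothetical choice, and the predicate and grouping key are Python functions standing in for XPath expressions.

```python
import copy
import xml.etree.ElementTree as ET
from itertools import groupby
from typing import Callable, Optional

Docs = list[ET.Element]


def count(docs: Docs) -> ET.Element:
    # Count: the answer is itself a small XML document, e.g. <result>3</result>.
    result = ET.Element("result")
    result.text = str(len(docs))
    return result


def identity_sequence(docs: Docs) -> Docs:
    # Identity for sequences: a verbatim copy of every document, order preserved.
    return [copy.deepcopy(d) for d in docs]


def split_sequence(docs: Docs, test: Callable[[ET.Element], bool]) -> tuple[Docs, Docs]:
    # Split-sequence: route each document to the "matched" or "not matched" output.
    matched = [d for d in docs if test(d)]
    not_matched = [d for d in docs if not test(d)]
    return matched, not_matched


def wrap_sequence(docs: Docs, wrapper: str,
                  key: Optional[Callable[[ET.Element], str]] = None) -> Docs:
    # Wrap-sequence: put all documents into one <wrapper>, or, when `key` is given,
    # wrap each run of *adjacent* documents sharing a key in its own <wrapper>.
    groups = [docs] if key is None else [list(g) for _, g in groupby(docs, key=key)]
    wrapped = []
    for group in groups:
        outer = ET.Element(wrapper)
        outer.extend([copy.deepcopy(d) for d in group])
        wrapped.append(outer)
    return wrapped


docs = [ET.fromstring(s) for s in ("<a/>", "<a/>", "<b/>", "<a/>")]
print(ET.tostring(count(docs), encoding="unicode"))
matched, rest = split_sequence(docs, lambda d: d.tag == "a")
print(len(matched), len(rest))
print([ET.tostring(w, encoding="unicode") for w in wrap_sequence(docs, "group", key=lambda d: d.tag)])
```

Grouping only adjacent documents means the last <a/> gets its own wrapper rather than joining the first two. This is why wrap-sequence can produce "one or more" documents.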

Non-linear operations

Conditionals

where a given transformation is executed if a condition is met while another transformation is executed otherwise

Loops

where a transformation is executed on each node of a node set selected from a document or a transformation is executed until a condition evaluates to false

Tees

where a document is fed to multiple transformations potentially happening in parallel

Aggregations

where multiple documents are aggregated into a single document

Exception Handling

where a failure during processing can result in an alternate pipeline being processed
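Each non-linear construct can be expressed as a combinator that builds a new step from existing steps. This is a minimal sketch, assuming steps are Python functions from an element to an element. The combinator names, the retag and add_line helper steps, and the use of a thread pool for the tee are all hypothetical choices. The node set for the loop is selected with ElementTree's limited ElementPath syntax rather than full XPath.

```python
import copy
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Step = Callable[[ET.Element], ET.Element]


def choose(cond: Callable[[ET.Element], bool], then: Step, otherwise: Step) -> Step:
    # Conditional: run `then` if cond(doc) holds, `otherwise` if it does not.
    return lambda doc: then(doc) if cond(doc) else otherwise(doc)


def for_each(path: str, step: Step) -> Callable[[ET.Element], list[ET.Element]]:
    # Loop over a node set: apply `step` to a copy of each node the path selects.
    return lambda doc: [step(copy.deepcopy(node)) for node in doc.findall(path)]


def loop_while(cond: Callable[[ET.Element], bool], step: Step) -> Step:
    # Loop until the condition evaluates to false: re-apply `step` while cond(doc) holds.
    def run(doc: ET.Element) -> ET.Element:
        while cond(doc):
            doc = step(doc)
        return doc
    return run


def tee(*branches: Step) -> Callable[[ET.Element], list[ET.Element]]:
    # Tee: feed an independent copy of the document to every branch, in parallel threads.
    def run(doc: ET.Element) -> list[ET.Element]:
        copies = [copy.deepcopy(doc) for _ in branches]
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda branch, d: branch(d), branches, copies))
    return run


def aggregate(docs: list[ET.Element], root_tag: str) -> ET.Element:
    # Aggregation: combine several documents under one new root element.
    root = ET.Element(root_tag)
    root.extend([copy.deepcopy(d) for d in docs])
    return root


def try_catch(step: Step, fallback: Step) -> Step:
    # Exception handling: if `step` fails, run the alternate `fallback` step instead.
    def run(doc: ET.Element) -> ET.Element:
        try:
            return step(copy.deepcopy(doc))  # a failed step cannot leave doc half-modified
        except Exception:
            return fallback(doc)
    return run


def retag(name: str) -> Step:
    # Hypothetical helper step: a copy of the document with a different root tag.
    def step(doc: ET.Element) -> ET.Element:
        out = copy.deepcopy(doc)
        out.tag = name
        return out
    return step


def add_line(doc: ET.Element) -> ET.Element:
    # Hypothetical helper step: a copy of the document with one more <line/> child.
    out = copy.deepcopy(doc)
    ET.SubElement(out, "line")
    return out


order = ET.fromstring('<order total="120"><line sku="a"/><line sku="b"/></order>')
classify = choose(lambda d: int(d.get("total")) > 100, retag("big-order"), retag("small-order"))
safe = try_catch(classify, retag("invalid-order"))
print(safe(order).tag, safe(ET.fromstring("<order/>")).tag)
print([item.get("sku") for item in for_each("line", retag("item"))(order)])
print(len(loop_while(lambda d: len(d) < 4, add_line)(order)))
print(ET.tostring(aggregate(tee(retag("copy-a"), retag("copy-b"))(order), "report"), encoding="unicode"))
```

A tee is usually followed by an aggregation, as in the last line, so the parallel branches join back into a single document. An <order> without a total attribute makes the condition raise, and try_catch switches to the alternate step.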

Resource on XML Pipeline