PB138 - Markup Languages
Tomas Pitner May 6, 2013
• Motivation for Docbook
• Basic structures of Docbook
• Docbook versions and variants
• Docbook tooling
Introduction
What is Docbook?
• a big project: one comprehensive markup for all programmer documentation
• now also used for many other purposes: writing papers (article), books (book), chapters (chapter), sections (section, sectX)
• authored by Norman Walsh (formerly of Sun Microsystems Inc.)
• for details (DTD, help, software, styles) see docbook.org (http://docbook.org)
• probably the largest markup for technical documentation ever
• there is TDG (DocBook: The Definitive Guide), also available as Windows Help (/~tomp/xml/tdg-en-2.0.7.chm)
• Docbook is an XML (and SGML) markup for writing documents, especially those of a technical nature (computer/software manuals, technical documentation).
• It was originally created as a tool to cope with the large documentation of UNIX systems.
• In principle, DB is a logical (semantic) markup, i.e. the visual representation is not important when writing the source. Text is created using semantic elements for:
• big text blocks (book, paper, chapter, section, paragraph, screen...)
• smaller in-line parts (emphasized, link, product name, command,...)
• multimedia elements (images, videos, sounds...)
• helper elements and metadata (title, authorship, date of creation, copyright, index items, ToC...)
• Easy processing:
• visualization (using CSS, using XSLT for transformation to HTML, via LaTeX or XSL-FO to PDF, but also PostScript, RTF, DVI and plain ASCII...), or conversion to documentation/help formats (HTML Help/Microsoft CHM, man pages)
• selected parts or elements can be extracted separately (take the intro chapter, generate the book ToC...), or several texts can be combined into one
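The extraction mentioned in the last point can be sketched with Python's standard xml.etree.ElementTree. This minimal example assumes a namespaced Docbook 5 document whose chapters each carry a title child; it collects the chapter titles in document order as a rudimentary ToC. The function name table_of_contents is our own, not part of any Docbook tool.

```python
import xml.etree.ElementTree as ET

# Docbook 5 elements live in this namespace; ElementTree spells them "{uri}name"
DB = "{http://docbook.org/ns/docbook}"


def table_of_contents(source: str) -> list:
    """Return the titles of all chapters of a Docbook 5 document, in document order."""
    root = ET.fromstring(source)
    toc = []
    for chapter in root.iter(DB + "chapter"):
        title = chapter.find(DB + "title")
        # itertext() also picks up text nested in inline markup inside the title
        toc.append("".join(title.itertext()) if title is not None else "(untitled)")
    return toc


book = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Very simple book</title>
  <chapter><title>Chapter 1</title><para>Hello world!</para></chapter>
  <chapter><title>Chapter 2</title><para>Hello again, world!</para></chapter>
</book>"""

print(table_of_contents(book))  # ['Chapter 1', 'Chapter 2']
```

Because the markup is semantic, the same approach extracts any other element type (e.g. only the first chapter) without caring how it would be rendered.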
• Docbook has existed since the beginning of the 1990s (1991), at that time as an SGML markup.
• After XML became the de facto standard for semistructured data (the W3C XML specification dates from 1998), Docbook has been encoded predominantly in XML, mainly because of the plethora of available tools.
• Further development takes place under OASIS (http://www.oasis-open.org) (the Organization for the Advancement of Structured Information Standards).
• Jirka Kosek (http://www.kosek.cz) is involved in the development; the editor of the specifications is Norm Walsh (http://norman.walsh.name).
Storing files
The usual extension for files containing Docbook documents is .dbk, or simply .xml.
The MIME type for Docbook is application/docbook+xml.
Document categories
The nature (purpose, size) of a document is mainly determined by the structural elements it uses. The categories include:
• set: a collection of books (book) or other collections; may be nested
• book: a book containing chapters (chapter), papers (article) or parts (part); may contain indices (index), appendices etc.
• part: a part containing one or more chapters; may be nested; may contain introductory texts
• article: a paper; may contain a sequence of block elements (e.g. paragraphs) and sections
• chapter: a named and usually numbered section of a bigger document (book, part)
• appendix: an appendix
• dedication: a dedication of the document
Block elements
• paragraphs
• tables
• lists
• examples
• figures, etc.
These block elements are rendered in the order in which they are read, i.e. top-down in Western languages, but right-to-left (column by column) in traditionally typeset vertical Chinese.
Inline elements
Contained in block elements:
• emphasized text (emphasis...)
• links (e.g. link, ulink, olink...)
• semantic elements expressing meaning (keyword, command, filename...)
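Because inline elements carry meaning rather than appearance, they can be harvested mechanically. As an illustration (a sketch, not an existing Docbook tool; command_index is a hypothetical name), the following Python code builds a simple index of all command elements in a Docbook 5 document, ignoring filename and emphasis.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DB = "{http://docbook.org/ns/docbook}"


def command_index(source: str) -> Counter:
    """Count how often each command (marked up with <command>) is mentioned."""
    root = ET.fromstring(source)
    return Counter("".join(cmd.itertext()).strip() for cmd in root.iter(DB + "command"))


article = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Working with files</title>
  <para>Use <command>ls</command> to list the files in <filename>/tmp</filename>;
    <command>ls</command> has <emphasis>many</emphasis> options,
    while <command>rm</command> deletes files.</para>
</article>"""

print(command_index(article))  # Counter({'ls': 2, 'rm': 1})
```

With purely visual markup (e.g. just bold or monospace), the commands could not be told apart from file names.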
Example of a Docbook 5 document
Docbook 5 is the latest version, still under active development. It uses XML namespaces and no DOCTYPE declaration.
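<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Very simple book</title>
  <chapter>
    <title>Chapter 1</title>
    <para>Hello world!</para>
    <para>I hope that your day is proceeding <emphasis>splendidly</emphasis>!</para>
  </chapter>
  <chapter>
    <title>Chapter 2</title>
    <para>Hello again, world!</para>
  </chapter>
</book>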
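Since Docbook 5 puts its elements into the namespace http://docbook.org/ns/docbook, while Docbook 4 documents use no namespace (and usually a DOCTYPE), a tool can tell the two apart from the root element alone. A minimal heuristic in Python, assuming well-formed input; docbook_major_version is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

DOCBOOK5_NS = "http://docbook.org/ns/docbook"


def docbook_major_version(source: str) -> int:
    """Heuristic: 5 if the root element is in the Docbook 5 namespace, otherwise 4."""
    root = ET.fromstring(source)
    # ElementTree reports a namespaced tag as "{namespace-uri}localname"
    return 5 if root.tag.startswith("{" + DOCBOOK5_NS + "}") else 4


db5 = '<book xmlns="http://docbook.org/ns/docbook" version="5.0"><title>T</title></book>'
db4 = ('<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" '
       '"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">'
       '<book><title>T</title></book>')
print(docbook_major_version(db5), docbook_major_version(db4))  # 5 4
```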
Still Docbook 4?
• Either way... You won't make a big mistake by still using 4.x, since there is a plethora of tools and docs for it.
• Conversion to DB 5 is possible any time later...
Customization
• DocBook can be used in its basic form (Full),
• in a simplified form (Simplified), or
• as a customization.
A customization means to:
• modify the schema
• possibly modify the (XSL) styles
• customize XSL styles by importing the original stylesheet and overriding selected templates
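A customization layer is itself a small XSLT stylesheet. The Python sketch below merely generates such a layer: it imports the stock stylesheet and overrides Docbook XSL parameters (section.autolabel switches on section numbering; html.stylesheet names a CSS file to link). Overriding a template works the same way, by adding an xsl:template with the same name or match pattern. The import URI is an assumption; point it to your own Docbook XSL installation.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"


def customization_layer(base_stylesheet: str, params: dict) -> str:
    """Generate an XSLT customization layer that imports the original
    Docbook XSL stylesheet and overrides selected parameters."""
    ET.register_namespace("xsl", XSL)
    sheet = ET.Element(f"{{{XSL}}}stylesheet", version="1.0")
    # xsl:import must be the first child; definitions in this layer take precedence
    ET.SubElement(sheet, f"{{{XSL}}}import", href=base_stylesheet)
    for name, value in params.items():
        # select holds an XPath expression, so string values need inner quotes
        ET.SubElement(sheet, f"{{{XSL}}}param", name=name, select=value)
    return ET.tostring(sheet, encoding="unicode")


# assumed URI of the stock HTML stylesheet; adjust to your local installation
layer = customization_layer(
    "http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl",
    {"section.autolabel": "1", "html.stylesheet": "'style.css'"},
)
print(layer)
```

The generated file is then used in place of docbook.xsl; because of xsl:import, everything not overridden is taken from the original stylesheet.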
Simplified Docbook
Derived languages/markups can be created by reducing or extending the set of allowed elements:
• Simplified Docbook: from each family of related elements just one is preserved, e.g. programlisting, but not screen
• no "big things" like books, just articles
• any document in Simplified Docbook is also a (full) Docbook document
Docs for Simplified Docbook are online (http://www.docbook.org/schemas/simplified).
Docbook Slides
• An extension :-) of Simplified Docbook
• For writing (PowerPoint-like) presentations, "foils".
• XSLT styles can generate static or JavaScript-enabled web/HTML pages.
• In modern browsers one can even navigate through the structure (go to the next slide, to the ToC, etc.).
• In the worst case, any plain-text editor can be used, provided it supports the required charset and encoding (e.g. Unicode/UTF-8).
• It is better to use an editor with auto-closing (or even auto-completion) of elements.
• Support for on-the-fly validation is better still!
• Ideally, a WYSIWYG editor producing valid Docbook text, e.g. XMLmind XML Editor (XXE) or oXygen.
XMLmind (http://www.xmlmind.com/) by Pixware: a powerful WYSIWYG editor for Docbook, DITA, XHTML and other formats including ebooks; it can be further customized and is suitable for enterprise environments and integration. Professional and Evaluation licenses.
oXygen: Syncro Soft SRL's (http://www.oxygenxml.com/) oXygen Editor/Developer/Author.
GNU Emacs with nxml-mode (http://www.thaiopensource.com/nxml-mode/)
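When only a plain-text editor is available, at least well-formedness can be checked from a script. A sketch using Python's standard library (check_well_formed is a hypothetical name); note that it checks well-formedness only, not validity against the Docbook schema, which needs a RELAX NG validator.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(source: str) -> Optional[str]:
    """Return None for well-formed XML, otherwise the parser's error message.
    This is a well-formedness check only, not validation against the Docbook schema."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the parser
        return f"line {line}, column {column}: {err}"
    return None


print(check_well_formed("<para>Hello <emphasis>world</emphasis>!</para>"))  # None
print(check_well_formed("<para>Hello <emphasis>world</para>"))
```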
• Docbook 4.x was defined by a DTD
• Docbook 5.x uses namespaces and is defined by RELAX NG (plus Schematron) schemas
• for the transition, see http://docbook.org/docs/howto/
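To give a feel for the transition, this Python sketch performs a few of the mechanical changes from Docbook 4 to 5: moving elements into the Docbook namespace, adding version="5.0", turning id into xml:id and ulink url="..." into link xlink:href="...". It is deliberately incomplete (many other elements changed as well); for real documents follow the transition guide mentioned above. The name upgrade_to_docbook5 is hypothetical.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # serialized with the built-in prefix "xml"
XLINK = "http://www.w3.org/1999/xlink"


def upgrade_to_docbook5(source: str) -> str:
    """Partial Docbook 4 -> 5 conversion: Docbook namespace, version="5.0",
    id -> xml:id, <ulink url="..."> -> <link xlink:href="...">."""
    ET.register_namespace("", DB)  # Docbook 5 as the default namespace
    ET.register_namespace("xlink", XLINK)
    root = ET.fromstring(source)
    for elem in root.iter():
        if elem.tag == "ulink":
            elem.tag = "link"
            url = elem.attrib.pop("url", None)
            if url is not None:
                elem.set(f"{{{XLINK}}}href", url)
        if "id" in elem.attrib:
            elem.set(f"{{{XML_NS}}}id", elem.attrib.pop("id"))
        elem.tag = f"{{{DB}}}{elem.tag}"  # move the element into the namespace
    root.set("version", "5.0")
    return ET.tostring(root, encoding="unicode")


db4 = ('<book id="simple"><title>Very simple book</title>'
       '<chapter><title>Chapter 1</title><para>See '
       '<ulink url="http://docbook.org/">docbook.org</ulink>.</para></chapter></book>')
print(upgrade_to_docbook5(db4))
```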
Tools are used mainly for conversion into other document formats ("Office-like" ones such as Office Open XML, Open Document Format, RTF, Word processing XML) or for visualization via PDF, PS, XSL-FO, or web formats (XHTML 1.x, XHTML 5).
• The fundamental tools are the Docbook XSL (http://en.wikipedia.org/wiki/DocBook_XSL) stylesheets
• well parameterized, rich, modifiable
• a book on Docbook XSL published by Sagehill (http://www.sagehill.net/docbookxsl/index.html)
• complete reference (http://docbook.sourceforge.net/release/xsl/current/doc/) for using Docbook XSL
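Docbook XSL parameters can also be set at transformation time instead of in a customization layer. A sketch that assembles an xsltproc command line with --stringparam overrides (section.autolabel and html.stylesheet are Docbook XSL parameters; the helper name xsltproc_command and the stylesheet path are assumptions, and the path differs between installations):

```python
import shlex


def xsltproc_command(stylesheet: str, document: str, output: str, params: dict) -> list:
    """Assemble an xsltproc call applying a Docbook XSL stylesheet,
    with parameters overridden via --stringparam."""
    cmd = ["xsltproc", "--output", output]
    for name, value in params.items():
        # --stringparam passes the value as a plain string (no XPath evaluation)
        cmd += ["--stringparam", name, value]
    return cmd + [stylesheet, document]


STYLESHEET = "/usr/share/xml/docbook/stylesheet/docbook-xsl/html/docbook.xsl"  # assumed path
cmd = xsltproc_command(
    STYLESHEET,
    "book.xml",
    "book.html",
    {"section.autolabel": "1", "html.stylesheet": "style.css"},
)
print(shlex.join(cmd))
# with xsltproc installed: subprocess.run(cmd, check=True)
```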
Text Encoding Initiative (TEI)
An initiative aimed at creating, and supporting applications of, the capture of texts of various kinds in a standardized form
• today in XML syntax (P5), previously SGML (up to P3), or both (P4)
• extensive markup (even more elements than e.g. Docbook)
• better support for document metadata and the document life cycle (creation, revisions)
• used for diverse documents (texts created on a computer, scanned texts, historical documents, documents in non-European languages)
• the markup is modular: it can be assembled to fit specific needs
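To show the metadata-first structure, here is a Python sketch that generates a minimal TEI P5 document: a teiHeader whose fileDesc holds the title, publication and source statements, followed by the text body. The helper names and the placeholder wording of the publication and source statements are our own.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"


def q(name: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI}}}{name}"


def minimal_tei(title: str, paragraphs: list) -> str:
    """Build a minimal TEI P5 document: metadata in teiHeader, content in text/body."""
    ET.register_namespace("", TEI)
    tei = ET.Element(q("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, q("titleStmt")), q("title")).text = title
    # placeholder statements; a real edition would describe publisher and source here
    ET.SubElement(ET.SubElement(file_desc, q("publicationStmt")), q("p")).text = "Unpublished draft."
    ET.SubElement(ET.SubElement(file_desc, q("sourceDesc")), q("p")).text = "Born-digital text."
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    for paragraph in paragraphs:
        ET.SubElement(body, q("p")).text = paragraph
    return ET.tostring(tei, encoding="unicode")


print(minimal_tei("A short note", ["First paragraph.", "Second paragraph."]))
```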
Applications of TEI markup
• sample texts in TEI (http://wiki.tei-c.org/index.php/Samples) (mostly XML)
• the manual (Guidelines (http://www.tei-c.org/Guidelines/P5/)) for TEI P5
IBM and the OASIS consortium have introduced the DITA (http://docs.oasis-open.org/dita/v1.0/archspec/ditaspec.toc.html) architecture as:
• a tool for creating topic-oriented marked-up content, with the option of specialization for particular purposes.
• Unlike e.g. Docbook, it is not a single fixed markup.
• It uses principles similar to those of object-oriented languages.
• Specialization means inheriting properties (e.g. formatting) and making them more specific.
• It is used where large, highly structured, reusable content with precisely defined semantics is created.
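Specialization is visible in every DITA element's class attribute, which lists the element's ancestry, e.g. "- topic/body concept/conbody " for the body of a concept topic ("-" marks structural, "+" domain specialization). A processor that does not know a specialized type falls back to the nearest ancestor it does know. A minimal Python sketch of this lookup, with a hypothetical rule table and helper name:

```python
from typing import Optional


def resolve_rule(class_attr: str, rules: dict) -> Optional[str]:
    """Choose a processing rule for a DITA element from its class attribute.
    The attribute lists the ancestry from the base type to the most specialized
    one, e.g. '- topic/body concept/conbody '; the most specialized known type wins."""
    tokens = class_attr.split()[1:]  # drop the leading '-' (structural) or '+' (domain)
    for type_name in reversed(tokens):
        if type_name in rules:
            return rules[type_name]
    return None  # not even the base type is known


# hypothetical rule table of a processor that knows only a few types
rules = {"topic/body": "generic body", "topic/ph": "generic phrase", "hi-d/b": "bold"}
print(resolve_rule("- topic/body concept/conbody ", rules))  # generic body
print(resolve_rule("+ topic/ph hi-d/b ", rules))  # bold
```

This is the inheritance mentioned above: the specialized conbody is processed like its ancestor topic/body unless a more specific rule exists.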
• since 2001, DITA developed by IBM (motivation: fixed markups are not enough...)
• 2004: IBM donates the standard to OASIS for stewardship
• Development is overseen by the OASIS DITA Technical Committee (http://www.oasis-open.org/committees/dita/).
• April 2005: Version 1.0 of the DITA specification:
• OASIS Darwin Information Typing Architecture (DITA) Language Specification (http://xml.coverpages.org/DITAv10-OS-LangSpec20050509.pdf)
• OASIS Darwin Information Typing Architecture (DITA) Architectural Specification (http://xml.coverpages.org/DITAv10-OS-ArchSpec20050509.pdf)
topic: a unit of information given by a title and content; small enough to be indivisible in terms of content and authoring (anything smaller would not make complete sense), e.g. the answer to a single question
map: a document organizing topics into larger units and capturing the relationships between topics, including e.g. a table of contents
specialization: a technique for defining new structural types or new information domains with maximum reuse of the existing design and code; the emphasis is on reducing the cost of moving to new types (data exchange, migration, management)
structural vs. domain specialization: structural specialization makes it possible to create new topic types or map types; domain specialization allows new markup usable across multiple structural types
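A map only references topics; the hierarchy of its topicref elements yields, for example, a table of contents. A Python sketch, assuming an un-namespaced DITA map whose topicref elements carry href attributes (the file names and the helper name map_outline are made up):

```python
import xml.etree.ElementTree as ET


def map_outline(source: str) -> list:
    """Turn the nested topicref elements of a DITA map into an indented outline."""
    lines = []

    def walk(parent, depth):
        for ref in parent.findall("topicref"):
            lines.append("  " * depth + ref.get("href", "(no href)"))
            walk(ref, depth + 1)  # nested topicrefs become sub-entries

    walk(ET.fromstring(source), 0)
    return lines


ditamap = """<map>
  <topicref href="installing.dita">
    <topicref href="requirements.dita"/>
  </topicref>
  <topicref href="configuring.dita"/>
</map>"""

print("\n".join(map_outline(ditamap)))
```

The topics themselves stay independent files, so the same topic can be referenced from several maps.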
CambridgeDocs offers a solution for authoring and managing documents designed according to DITA: xDoc Pro (http://www.cambridgedocs.com/solutions/dita.htm).