Preamble

Lasaris

Outline

  • What are markup languages
  • Motivation
  • History
  • Main representatives

What are Markup Languages?

  • Formal (computer) languages that allow, in addition to ordinary natural-language text, syntactically distinguishable constructs that specify the structure of the text, the meaning of its parts, etc. They also allow the text to carry its own metadata (information about origin, content, authorship, dating, usage rights, …).
  • Well-known markup languages include the languages of the web (HTML, XML, …), but also others, such as the typesetting formats of the TeX system and nroff and troff, the text (documentation/help) formatting tools of UNIX-like systems.
  • Page description languages for printing and presentation, such as PostScript or PDF, have similar characteristics (text plus markup or commands).
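The split between text, markup and metadata can be made concrete with a small sketch. It uses Python's standard `html.parser` module; the sample document, the class name `MarkupSplitter` and its attributes are assumptions made up for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample document: natural-language text plus markup and metadata.
DOCUMENT = """<html>
<head>
<meta name="author" content="Jane Doe">
<meta name="date" content="2001-05-17">
<title>Report</title>
</head>
<body>
<h1>Summary</h1>
<p>Sales grew <em>strongly</em> this year.</p>
</body>
</html>"""


class MarkupSplitter(HTMLParser):
    """Collects tags, metadata and plain text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []       # names of the start tags, in document order
        self.metadata = {}   # name -> content taken from <meta> elements
        self.text = []       # non-empty runs of natural-language text

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.metadata[attributes["name"]] = attributes["content"]

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


splitter = MarkupSplitter()
splitter.feed(DOCUMENT)
splitter.close()

print(splitter.metadata)  # metadata stored inside the document
print(splitter.tags)      # the markup constructs
print(splitter.text)      # the natural-language text
```

The parser can tell the three kinds of content apart only because the markup is syntactically distinguishable from the text around it.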

What are Markup Languages? (contd)

  • The distinguishing characteristic of markup languages, compared with programming languages, is the predominance of natural-language text over the rest of the content (markup, declarations), which is why the files are often referred to as documents.
  • Natural-language text does not predominate in every application, however.
  • For example, XML is used as a format for exchanging business (database, tabular) data, where there is more markup than text, and the text itself is a textual record of data of other types (number, date, logical value).
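A minimal sketch of the data-exchange case, using Python's standard `xml.etree.ElementTree`. The order record, its element names and the conversion choices are hypothetical and serve only to show text content being read back as numbers, a date and a logical value.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Hypothetical business record: far more markup than natural-language text.
ORDER_XML = """<order id="1042">
  <customer>ACME Ltd.</customer>
  <issued>2004-03-15</issued>
  <paid>true</paid>
  <item><name>Widget</name><quantity>3</quantity><price>19.90</price></item>
  <item><name>Gadget</name><quantity>1</quantity><price>5.00</price></item>
</order>"""

root = ET.fromstring(ORDER_XML)

# The element content is text, but it records values of other types.
order_id = int(root.get("id"))                        # number
issued = date.fromisoformat(root.findtext("issued"))  # date
paid = root.findtext("paid") == "true"                # logical value
total = sum(
    int(item.findtext("quantity")) * Decimal(item.findtext("price"))
    for item in root.findall("item")
)

print(order_id, issued, paid, total)
```

`Decimal` is used for the prices so that the sum is exact; this is a design choice of the sketch, not something XML itself prescribes.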

The Nature of Markup

Markup languages fall into three main categories according to the nature of the markup and the way it is interpreted:

  1. Presentational markup usually takes the form of binary codes embedded in the text, e.g. the classical (older) formats of word processors.
  2. Procedural markup indicates how the processor (the processing application) should handle the text. It is usually a sequence of instructions to be performed on sections of the text. This sequence is processed consecutively, and the usual programming constructs (branching, loops, subroutines, variables) are available. E.g. TeX, PostScript.
  3. Descriptive markup declaratively defines the document structure and the meaning of its parts. It does not say exactly which steps should be performed during processing; that is usually left to the applications. E.g. HTML.
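The descriptive case can be sketched as follows: the same marked-up note is handed to two applications, and each decides for itself how to process it. The element names (`note`, `title`, `para`, `term`) and both rendering functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive markup: it names the parts, not the processing steps.
NOTE_XML = (
    "<note><title>Markup</title>"
    "<para>Tags describe <term>structure</term>.</para></note>"
)


def to_plain_text(note):
    """One application: render the note for a text terminal."""
    title = note.findtext("title")
    para = "".join(note.find("para").itertext())
    return f"{title.upper()}\n{para}"


def to_html(note):
    """Another application: render the same note as HTML."""
    title = note.findtext("title")
    para = note.find("para")
    parts = [para.text or ""]
    for child in para:
        # This application decides that a <term> is shown in italics.
        parts.append(f"<i>{child.text}</i>{child.tail or ''}")
    return f"<h1>{title}</h1><p>{''.join(parts)}</p>"


note = ET.fromstring(NOTE_XML)
print(to_plain_text(note))
print(to_html(note))
```

Nothing in the note itself mentions upper case or italics. With procedural markup, by contrast, such instructions would be written into the document.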

Tagging without computers

Up to around the sixties, the concept of tagging was known only in non-computer contexts:

  • The first (informal) markup languages were used for processing the text of books and for their typesetting.
  • Proofreaders and typographers wrote marks in the margins of the paper to indicate which font to use, to make proofreading corrections, etc.

Early computer applications of markup

  • The first systems for computerized text processing suffered from the fact that their target printing devices differed greatly, so the output had to be "programmed" for each of them.
  • The GenCode standard (by William W. Tunnicliffe) was therefore developed. It allowed generic print output to be marked in the text, and a special compiler then customized the output for a particular device.
  • Charles Goldfarb of IBM, who developed the IBM GML language in the early seventies, is often considered the "real father" of markup languages.

Early computer applications of markup (contd)

  • SGML was later created on the basis of these two languages. It is in fact not a (single) language but a meta-language, i.e. a standard for defining languages.
  • A somewhat different path was taken by Donald Knuth's TeX markup language (1970s and 1980s), which describes how a typesetting system should place text in a printed document.
  • Frequently, the LaTeX macro system (Leslie Lamport) is used instead of plain TeX. It adds a descriptive / declarative character to TeX (for example, it describes the logical structure of the document).

Later markup standards — SGML

SGML was the first truly widespread and relatively widely applied markup standard (on a scale that, of course, cannot be compared with today’s popularity of XML …).

  • It evolved as a modernization of GML, followed by formalization and subsequent adoption as an ISO standard.
  • It is a metalanguage, i.e. a set of rules for designing specific markup languages (SGML applications).

SGML

  • Languages designed according to the SGML rules are well suited to typing by hand: they need less markup than XML later did. However, a DTD describing the document structure had to exist, and every document had to be connected to it.
  • Later, in the late 1990s, SGML became the basis for the formulation of XML, a format that is easier to process by machine and does not require the document structure to be described for each file.
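The last point can be sketched briefly: an XML parser accepts a document that has no DTD at all, as long as it is well formed. The sample documents and the helper `is_well_formed` are hypothetical. Python's standard `xml.etree.ElementTree` is used here because it checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE and no DTD: the parser only checks that the markup is well formed.
WELL_FORMED = "<memo><to>Alice</to><body>Hello</body></memo>"
# Hypothetical broken input: the <to> element is never closed.
NOT_WELL_FORMED = "<memo><to>Alice<body>Hello</body></memo>"


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, without consulting any DTD."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

The price of dropping the mandatory DTD is that every element must be closed explicitly, which is part of why XML carries more markup than hand-typed SGML.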