Unicode encodings

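All XML applications (particularly parsers) must be able to process text in several Unicode encodings; the XML specification requires every parser to accept at least UTF-8 and UTF-16. The most common encodings in CZ/SK/EU are: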

  • 8-bit, traditional: US-ASCII (in fact 7-bit), ISO-8859-2 (ISO Latin 2), Windows-1250 (= Cp1250). Each covers only a small subset of Unicode (at most 256 chars).

  • UTF-8: encodes every Unicode char as a variable-length sequence of 1 to 4 bytes (the original 1-6 byte form is obsolete). US-ASCII chars take 1 byte, Czech/Slovak accented chars take 2 bytes.

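  • UTF-16: same variable-length principle as UTF-8, but the basic unit is a 16-bit (2-byte) word. Most chars take one word; chars outside the Basic Multilingual Plane take two (a surrogate pair). Byte order matters, hence the variants UTF-16BE/UTF-16LE and the byte order mark (BOM).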
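To make the size differences listed above concrete, here is a small Python sketch that encodes a few sample characters in each encoding and reports the byte count (or None when the encoding cannot represent the character). The sample characters and the helper name `encoded_length` are illustrative choices, not part of the original notes; UTF-16 is measured as `utf-16-be` so that no BOM is counted.

```python
from typing import Optional

# Illustrative samples: ASCII letter, two Czech letters, the euro sign,
# and a musical symbol outside the Basic Multilingual Plane.
SAMPLES = ["A", "č", "ř", "€", "𝄞"]
ENCODINGS = ["us-ascii", "iso-8859-2", "cp1250", "utf-8", "utf-16-be"]


def encoded_length(ch: str, encoding: str) -> Optional[int]:
    """Return the number of bytes ch needs in encoding, or None if unencodable."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in SAMPLES:
    row = {enc: encoded_length(ch, enc) for enc in ENCODINGS}
    print(f"U+{ord(ch):04X}", row)
```

The output shows, for example, that "č" takes 1 byte in ISO-8859-2 and Cp1250 but 2 in UTF-8, that "€" exists in Cp1250 but not in ISO-8859-2, and that U+1D11E needs 4 bytes in both UTF-8 and UTF-16 (one surrogate pair).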
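From the parser's point of view, the encoding is announced in the XML declaration and the parser must decode the bytes accordingly. The following sketch, assuming Python's standard `xml.etree.ElementTree` (backed by expat, which handles UTF-8/UTF-16 natively and single-byte encodings such as ISO-8859-2 and Windows-1250 through Python's codecs), builds the same tiny document in four encodings and parses each one back. The helper `make_document` and the sample phrase are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

TEXT = "Žluťoučký kůň"  # illustrative Czech phrase with several accented chars


def make_document(encoding: str) -> bytes:
    """Build a tiny XML document whose declaration names the given encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><note>{TEXT}</note>'
    # Encode the whole document, declaration included, in that encoding
    # (Python's "utf-16" codec also prepends a BOM).
    return doc.encode(encoding)


for enc in ["utf-8", "utf-16", "iso-8859-2", "windows-1250"]:
    data = make_document(enc)
    root = ET.fromstring(data)  # the parser reads the declared encoding
    print(f"{enc:>12}: {len(data):3d} bytes -> {root.text}")
```

Every variant decodes to the same text; only the byte sequences (and their lengths) differ.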