The XML specification does not allow every character everywhere in an XML document (e.g. in element names, attribute content…). It is good to know the meaning of:
Both standards try to solve the same problem: character sets with more than 256 characters.
up to 64 K characters: enough for European languages and alphabets, but not sufficient for all the world's languages (e.g. Chinese).
covers "everything".
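To make the 64 K limit concrete, here is a minimal Python sketch (Python is only our choice for illustration). It assumes the usual definition that the first 64 K code points, U+0000 to U+FFFF, form the Basic Multilingual Plane (BMP). Everyday Chinese characters such as U+4E2D fall inside it, but rarer ideographs such as U+20000 do not, and that is where a 16-bit character set runs out.

```python
# Sketch: does a character fit into the first 64 K code points (the BMP)?
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def in_bmp(ch: str) -> bool:
    """Return True if the single character ch has a code point <= U+FFFF."""
    return ord(ch) <= BMP_MAX


samples = {
    "A": "US-ASCII letter",
    "\u010d": "Czech letter c with caron",
    "\u4e2d": "common Chinese character",
    "\U00020000": "rare Chinese character (CJK Extension B)",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: in BMP = {in_bmp(ch)}")
```

Only the last sample reports False; it needs the full code space, which ends at U+10FFFF.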
All XML applications (particularly parsers) must be able to process some Unicode encodings. The most common in CZ/SK/EU are:
US-ASCII, ISO-8859-2 (ISO Latin 2), Windows-1250 (= Cp1250): each covers just a subset of Unicode.
UTF-8: encodes every Unicode character as a sequence of 1 to 4 bytes (the original definition allowed up to 6); US-ASCII characters take 1 byte, Czech/Slovak letters 2 bytes.
UTF-16: same principle as UTF-8, but the basic unit is a 16-bit (2-byte) word; characters beyond the first 64 K take two words (4 bytes).
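UCS-2: direct encoding of Unicode; characters from the BMP (Basic Multilingual Plane, the first 64 K code points) are represented directly by their code point numbers in 2 bytes, and characters outside the BMP cannot be encoded at all.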
UCS-4: likewise, but for the whole of Unicode, 4 bytes per character; not efficient, since even US-ASCII and European-language characters take 4 bytes each.
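The following sketch compares, for a few sample characters, how many bytes each of the charsets and encodings listed above needs (None means the charset cannot represent the character at all). It relies on Python's codec names (ascii, iso8859_2, cp1250, utf-8, utf-16-be, utf-32-be). Python has no UCS-2 codec, so the UTF-16 row stands in for UCS-2 only for BMP characters, and UTF-32 stands in for UCS-4.

```python
# Sketch: bytes needed per character in each charset/encoding.
# Assumption: Python codec names; UTF-16 approximates UCS-2 (BMP only),
# UTF-32 corresponds to UCS-4. Big-endian variants avoid a byte order mark.
ENCODINGS = {
    "US-ASCII": "ascii",
    "ISO-8859-2": "iso8859_2",
    "Windows-1250": "cp1250",
    "UTF-8": "utf-8",
    "UTF-16": "utf-16-be",
    "UCS-4 / UTF-32": "utf-32-be",
}


def encoded_size(ch, codec):
    """Number of bytes for ch in codec, or None if the charset lacks ch."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None


# A, c with caron, euro sign, and an emoji outside the BMP
samples = ["A", "\u010d", "\u20ac", "\U0001F600"]

for name, codec in ENCODINGS.items():
    sizes = [encoded_size(ch, codec) for ch in samples]
    print(f"{name:15} {sizes}")
```

Reading the rows: US-ASCII knows only "A"; ISO-8859-2 adds the Czech letter but lacks the euro sign, which Windows-1250 has; UTF-16 needs 4 bytes (a surrogate pair) for the emoji, a character UCS-2 cannot encode; UTF-32 spends 4 bytes on everything.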
The UTF-8 and UTF-16 encodings are the most important for XML, particularly UTF-8 (but every parser must support both).
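Because UTF-8 matters most in practice, it may help to see where its 1 to 4 byte lengths come from. This sketch derives the length from the code point ranges defined in RFC 3629 and checks the result against Python's own UTF-8 encoder; the function name utf8_length is our own, not a library API.

```python
# Sketch: UTF-8 sequence length from the code point (RFC 3629 ranges).
# Surrogate code points (U+D800..U+DFFF) are not valid characters and are
# not treated specially here.


def utf8_length(cp: int) -> int:
    """Bytes needed to encode code point cp in UTF-8 (1 to 4)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp <= 0x7F:     # US-ASCII: 0xxxxxxx
        return 1
    if cp <= 0x7FF:    # includes Czech/Slovak letters: 110xxxxx 10xxxxxx
        return 2
    if cp <= 0xFFFF:   # rest of the BMP: 1110xxxx + 2 continuation bytes
        return 3
    return 4           # beyond the BMP: 11110xxx + 3 continuation bytes


# A, s/c/r with caron, euro sign, emoji; compare with the built-in encoder
for ch in "A\u0161\u010d\u0159\u20ac\U0001F600":
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```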
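The encoding is declared in the XML declaration at the start of the document, e.g.:
<?xml version="1.0" encoding="Windows-1250"?>
If no encoding is declared, then UTF-8 or UTF-16 is used (UTF-16 is recognized by its byte order mark).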
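A minimal sketch of both rules using Python's standard xml.etree.ElementTree module (built on the expat parser): one document declares Windows-1250, the other two have no declaration and are read as UTF-8, or as UTF-16 when they start with a byte order mark. The sample documents and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEXT = "\u017dlu\u0165ou\u010dk\u00fd k\u016f\u0148"  # Czech sample text

# 1) The XML declaration names the encoding; the bytes really are Windows-1250.
declared = ('<?xml version="1.0" encoding="Windows-1250"?>'
            f"<text>{TEXT}</text>").encode("cp1250")

# 2) No declaration, plain UTF-8 bytes: read as UTF-8.
utf8_doc = f"<text>{TEXT}</text>".encode("utf-8")

# 3) No declaration, UTF-16 bytes: Python's "utf-16" codec writes a byte
#    order mark, from which the parser detects UTF-16.
utf16_doc = f"<text>{TEXT}</text>".encode("utf-16")

for label, data in [("Windows-1250", declared),
                    ("UTF-8", utf8_doc),
                    ("UTF-16", utf16_doc)]:
    root = ET.fromstring(data)
    print(f"{label:12} {len(data):3} bytes, decoded correctly: {root.text == TEXT}")
```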