Federico Flueckiger: Contributions Towards a Unified Concept of Information
Source: http://mypage.bluewin.ch/federico.flueckiger/Uci/Download/Ctrb2uci.pdf [edited]

2.1.2 Claude E. Shannon's statistical model

Wiener's idea of a statistical theory about the amount of information was realised by Claude E. Shannon in [Shannon 1969] {Footnote(4): [Shannon 1969] is the fourth edition of the work first published in 1948.}. The basic problem that needed to be solved, according to Shannon, was the reproduction at one point of a message produced at another point. He deliberately excluded from his investigation the question of the meaning of a message {Footnote(5): In chapter 2 'meaning' designates the extension of a message, i.e. the reference of the message to the things of the real world. An exception to this is chapter 2.3.3, where the deviation from this usage is explicitly noted. A detailed analysis of the concept of meaning in particular and the theory of semantics in general with reference to our concept of information will follow in chapter 3.}, arguing that:

"Frequently the messages have meaning; that is they refer to or are correlated according to some system with physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design." [Shannon 1969, p. 31]

According to Shannon, in order for messages to be transmitted, we need a communication system. By this he means an arrangement such as is represented in Figure 1. Shannon describes the components of the communication system as follows:

1. The information source (or message source) produces messages or parts of messages intended for a particular destination.
2. On the basis of the message, the transmitter produces a sequence of signals such that they can be transmitted over a channel.
3. The channel is merely the medium used to transmit the signals from transmitter to receiver. During transmission the signals may be perturbed and distorted by a so-called noise source.
4. The receiver usually performs the reverse operation to the transmitter, reconstructing the original message from the signals if possible.
5. The destination is the addressee of the message and can be either a person or a thing. It requires a priori knowledge about the information source which enables it to understand the message transmitted. In any case, the destination must know the set of signs available to the information source.

On the basis of his communication system, Shannon defines statistical quantities such as channel capacity and amount of information for discrete and continuous, noisy and noiseless systems. For the purpose of the present thesis, only the derivation of the amount of information for discrete noiseless information sources is relevant:

- Channel capacity is irrelevant for a definition of information. What matters is not how much information per unit of time can be sent over a channel, but only that information has been sent and what it consisted of.
- Except for degenerate cases, continuous behaviour can be simulated to any desired degree of precision by means of discrete models.
- The noise source can be represented as an additional source of information which informs (or misinforms) the channel and thereby the information transmitted.
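To make the interplay of these components concrete, the following Python sketch models a minimal discrete communication system in the spirit of Figure 1. The 8-bit character encoding, the bit-flip noise model and the chosen flip probability are assumptions made for this illustration only; they are not taken from Shannon's text.

```python
import random

# Illustrative model of the communication system: information source ->
# transmitter -> channel (with noise source) -> receiver -> destination.

def transmitter(message: str) -> list[int]:
    """Turn the message into a sequence of signals (here: one byte per sign)."""
    return [int(bit) for ch in message for bit in format(ord(ch), "08b")]

def channel(signal: list[int], flip_probability: float) -> list[int]:
    """Transmit the signal; the noise source flips each bit with a small probability."""
    return [bit ^ 1 if random.random() < flip_probability else bit for bit in signal]

def receiver(signal: list[int]) -> str:
    """Perform the reverse operation of the transmitter: signals back into a message."""
    return "".join(chr(int("".join(str(b) for b in signal[i:i + 8]), 2))
                   for i in range(0, len(signal), 8))

# The information source produces a message; the destination receives it,
# possibly perturbed by the noise source.
original = "HELLO"
received = receiver(channel(transmitter(original), flip_probability=0.01))
print(original, "->", received)
```

With a flip probability of zero the destination always recovers the original message; with a non-zero value it occasionally receives a distorted one, which is exactly the effect attributed to the noise source above.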
Shannon assumes that the set of messages which a discrete information source can produce consists of a finite number of elements N1, N2, ..., Nn (the elementary messages or signs), with probabilities of occurrence p1, p2, ..., pn respectively. Since under very general conditions an information source builds its messages from sign sequences produced in a non-deterministic way, the information source can be interpreted as a stochastic process {Footnote(6): More detailed descriptions of stochastic processes and their special cases can be found in [Heller et al. 1978], [Ehrenstrasser 1974] or [Beyer et al. 1978].}. In many cases, such as the transmission of a text in a natural language, the probabilities moreover depend on the preceding states. Thus in English the letter 'd' is more likely to be followed by the letter 'e' than by the letter 'z'. In such cases the information source operates as a discrete Markov process (a special case of a stochastic process) or Markov chain respectively. Where the transition probabilities converge on a particular probability distribution independently of the starting distribution, as is typically the case with information sources that produce texts in a given language, it is usual to speak of an ergodic Markov chain.

The ergodicity of transition probabilities can be illustrated by imagining that an information source can be made to produce texts automatically and without human input [Shannon 1969, p. 43 ff.]. If digrams {Footnote(7): By a digram one understands the fact that the occurrence of a sign depends not only on its own probability of occurrence but also on the sign preceding it.} are used, the resulting sequence will already differ greatly from a sequence produced using only the individual signs' probabilities of occurrence. If trigrams or indeed tetragrams {Footnote(8): For trigrams the two preceding signs and for tetragrams the three preceding signs determine the occurrence of a sign.} are used for generating text, it is possible to produce almost meaningful text (a short digram-based generation sketch follows below).

Apart from the requirement that abstraction be made from the form and content of the information, the measure for the amount of information - called H(p1, p2, ..., pn) by Shannon - should have the following properties:

1. H is continuous in the pi.
2. If all pi are equal (pi = 1/n), then H must increase monotonically with n.
3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the two new values of H.

In the form of a theorem, Shannon postulates in [Shannon 1969, p. 49 f.] that the only function satisfying the three criteria above is of the following form:

Formula 1:   H = -K (p1 log p1 + p2 log p2 + ... + pn log pn)

where K is a normalising positive constant. Shannon shows in [Shannon 1969, p. 87 ff.] that Formula 1 is also valid for the more general case of real pi. With this formula Shannon created a direct equivalent to entropy as defined in statistical mechanics and thus contradicted Wiener's assertion that the amount of information was the same as negative entropy. The debate about the minus sign continued for quite a while until Shannon's formula was generally adopted, probably because it was sufficiently formalised to be applied to practical cases immediately.

The amount of information H from Formula 1 is a quantity which allows us to calculate the average number of signs (from a set of signs x1, x2, ..., xn) needed for coding a message. Since the resulting value is not always an integer, Shannon speaks of the measure of information H* of the coding operation, where H* >= H.
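The digram mechanism described above can be illustrated with a short Python sketch that estimates first-order transition statistics from a sample text and then generates a new sign sequence from them. The sample text, the generated length and the restart rule for signs without observed successors are arbitrary assumptions made for this sketch only.

```python
import random
from collections import defaultdict

# Digram-based generation: the occurrence of a sign depends on the sign
# immediately preceding it (cf. Footnote 7).
sample_text = "the theory of the amount of information of an information source"

# Collect, for every sign, the list of signs observed to follow it; sampling
# from these lists reproduces the digram frequencies of the sample text.
followers = defaultdict(list)
for current_sign, next_sign in zip(sample_text, sample_text[1:]):
    followers[current_sign].append(next_sign)

sign = random.choice(sample_text)   # start from a randomly chosen sign
generated = [sign]
for _ in range(60):
    successors = followers.get(sign)
    # If the current sign has no observed successor, restart at a random sign.
    sign = random.choice(successors) if successors else random.choice(sample_text)
    generated.append(sign)

print("".join(generated))
```

Conditioning on the two or three preceding signs instead of one turns this into the trigram or tetragram case mentioned in Footnote 8.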
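The following sketch evaluates Formula 1 for a concrete case, with K = 1 and the base-2 logarithm so that H is measured in bits per sign, and compares it with the average code-word length of a binary prefix code, which plays the role of H* here. The four-sign distribution and the particular code are assumptions chosen for this illustration.

```python
from math import log2

def amount_of_information(probabilities):
    """Formula 1 with K = 1 and base-2 logarithms: H = -sum(p_i * log2(p_i))."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

# An assumed information source with four signs and their probabilities of occurrence.
p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

# An assumed binary prefix code for the four signs; its average length per sign
# takes the role of the coding measure H*.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

H = amount_of_information(p.values())
H_star = sum(p[sign] * len(code[sign]) for sign in p)

print(f"H  = {H:.3f} bits per sign")    # approximately 1.846
print(f"H* = {H_star:.3f} bits per sign")  # 1.900, so H* >= H as stated above
```

The gap between H* and H in this example is the superfluous information whose treatment as redundancy is taken up in the next paragraph.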
The resulting superfluous information, redundancy r, is calculated as follows:

Formula 2

Shannon's theory of the amount of information was dismissed by many later authors as merely a theory of syntactical information, because it excluded the semantic and pragmatic levels. In particular, Yehoshua Bar-Hillel and authors drawing on his work, such as Doede Nauta jr., vehemently attacked Shannon's theory. They said that, except for the theory about the amount of information of an information source, considered irrelevant by this school of thought, information as a notion was not defined by Shannon at all (cf. [Bar-Hillel 1964, p. 301]). Moreover, they claimed that physical entropy, as a purely empirical concept, was unsuitable for a definition of semantic information, which for them was a logical concept (cf. [Bar-Hillel 1964, p. 309]). In their view, any attempt to develop Shannon's theory into a universal theory of information would necessarily reach an impasse. To this day, though, there is no quantity in information theory as well supported and as generally accepted as Shannon's amount of information. On the other hand, Shannon's work is rightly seen as offering no indications towards a conceptual clarification of information.