3 The basic concepts of mathematical statistics

Probability theory and mathematical statistics share the subject of their interest: both describe reality while respecting the impact of stable random influences. In contrast to probability theory, mathematical statistics deals with greater uncertainty about the reality under consideration: we have to work with a whole family of a priori admissible probability measures $\mathcal{P}$, not with a single measure as probability theory does. We do not know in advance which measure is appropriate. On the basis of observed and analyzed statistical data we try to identify, at least approximately, the best fitting probability measure.

Consider an example. A population consists of 10 balls of which we know only that each is black or white. We do not know the exact numbers of black and white balls and cannot find them out directly, but we can draw one ball with replacement many times. On the basis of the results of the drawing we may estimate the unknown number of black balls; this estimate is plausible if the number of draws is sufficiently large. Imagine we have drawn one ball 100 times (with replacement) and only 9 of the draws gave a black ball. It is highly probable that the number of black balls is less than the number of white balls, and we can assert more: the numbers 1, 2, ..., 9 are the candidates for the unknown quantity of black balls, and according to these results the number one seems to be the most plausible candidate.

Mathematical statistics is concerned with (among other things):
a) the theory of parameter estimation (e.g. estimating the quantity of black balls in the population),
b) the theory of hypothesis testing (e.g. testing the hypothesis that there are c black balls).

Both procedures are based on statistical data organized in data sets, and they give plausible results only if the drawing fulfils certain assumptions. The term random sample, introduced below, formalizes this idea of a properly collected data set.
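The ball-drawing idea above can be sketched as a small simulation: estimate the number of black balls as 10 times the observed proportion of black draws. This is a minimal illustration, not part of the text; the function name and the use of a seeded generator are my own choices.

```python
import random

def estimate_black(num_black, n_draws=100, seed=0):
    """Draw one ball with replacement n_draws times from an urn of 10 balls,
    num_black of which are black, and estimate the number of black balls as
    10 * (observed proportion of black draws), rounded to an integer."""
    rng = random.Random(seed)
    black_draws = sum(rng.random() < num_black / 10 for _ in range(n_draws))
    return round(10 * black_draws / n_draws)

# With 1 black ball out of 10 and many draws, the estimate is usually 1,
# matching the reasoning in the text (9 black results in 100 draws).
print(estimate_black(num_black=1, n_draws=100))
```

With a large number of draws the observed proportion stabilizes near the true one, which is why the estimate becomes plausible only for sufficiently many draws.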
[Usually, the data processed by statisticians are taken from a population. The portion drawn is called a sample. If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole.]

Definition 3.1
(i.) Let $X_1, \ldots, X_n$ be independent and identically distributed random variables, $X_i \sim \mathcal{L}()$, $i = 1, \ldots, n$, where $\mathcal{L}()$ is a probability distribution. Then $X_1, \ldots, X_n$ is referred to as a random sample of size $n$ from the probability distribution $\mathcal{L}()$. The realizations $x_1, \ldots, x_n$ organized into a column vector form the data set.
(ii.) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be independent and identically distributed random vectors, $(X_i, Y_i) \sim \mathcal{L}_2()$, $i = 1, \ldots, n$, where $\mathcal{L}_2()$ is a joint probability distribution. Then $(X_1, Y_1), \ldots, (X_n, Y_n)$ is referred to as a random sample of size $n$ from the two-dimensional probability distribution $\mathcal{L}_2()$. The realizations $(x_1, y_1), \ldots, (x_n, y_n)$ organized into an $n \times 2$ matrix form the data set.
(iii.) Analogously, a random sample of size $n$ from a $p$-dimensional probability distribution $\mathcal{L}_p()$, $p \ge 3$, can be defined.
(iv.) Any function $T$ of a random sample (or of a number of random samples), where the function itself is independent of the sample's distribution, is called a statistic. The term is used both for the function and for the value of the function on a given sample.

Remark
A statistic is an observable random variable, which differentiates it from a parameter, a generally unobservable quantity describing a property of a statistical population.

Corollary 3.2
Let $X_1, \ldots, X_n$ be a random sample from a distribution with distribution function $F(x)$. Let $F(\mathbf{x}) = F(x_1, \ldots, x_n)$ be the joint distribution function of the random vector $(X_1, \ldots, X_n)$. Then the following assertion holds:
$$F(\mathbf{x}) = F(x_1) \, F(x_2) \cdots F(x_n).$$

The following definition lists essential, often used statistics (plural of statistic).

Definition 3.3
(i.) Let $X_1, \ldots, X_n$ be a random sample, $n \ge 2$.
– The statistic $M = \frac{1}{n}\sum_{i=1}^{n} X_i$ is called the sample mean. [Instead of $M$ it can be denoted $\bar{X}$.]
– The statistic $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - M)^2$ is called the sample variance.
– The statistic $S = \sqrt{S^2}$ is called the sample standard deviation.
– The statistic $F_n(x) = \frac{1}{n}\,\mathrm{card}\{i : X_i \le x\}$, $x \in \mathbb{R}$, is called the value of the sample distribution function at a point $x$. [For any fixed real number $x$, $\mathrm{card}\{i : X_i \le x\}$ is the number of those realizations of the random vector which are less than or equal to $x$.]
(ii.) Let $X_{11}, \ldots, X_{1n_1}; \ldots; X_{p1}, \ldots, X_{pn_p}$ be a sequence of $p$ independent random samples of sizes $n_1 \ge 2, \ldots, n_p \ge 2$. The total size is $n = \sum_{j=1}^{p} n_j$. Let us denote the sample means by $M_1, \ldots, M_p$ and the sample variances of the particular samples by $S_1^2, \ldots, S_p^2$. Let $c_1, \ldots, c_p$ be real constants, at least one of which is non-zero.
– The statistic $\sum_{j=1}^{p} c_j M_j$ is called a linear combination of sample means.
– The statistic $S^2 = \frac{\sum_{j=1}^{p} (n_j - 1) S_j^2}{n - p}$ is called the weighted mean of sample variances.
(iii.) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a random sample from a two-dimensional distribution. Let us denote the sample means by $M_1 = \frac{1}{n}\sum_{i=1}^{n} X_i$, $M_2 = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and the sample variances by $S_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - M_1)^2$, $S_2^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - M_2)^2$.
– The statistic $S_{12} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - M_1)(Y_i - M_2)$ is called the sample covariance.
– The statistic
$$R_{12} = \begin{cases} \frac{1}{n-1}\sum_{i=1}^{n} \frac{X_i - M_1}{S_1} \cdot \frac{Y_i - M_2}{S_2} = \frac{S_{12}}{S_1 S_2} & \text{for } S_1 S_2 \neq 0, \\ 0 & \text{otherwise} \end{cases}$$
is called the sample correlation coefficient.

[Transforming the random sample by a particular function we obtain the statistics $M$, $S^2$, $S$, $S_{12}$, $R_{12}$; thus these statistics are random variables and are denoted by capital letters. A numerical realization of the random sample leads to numerical realizations of these statistics, which are denoted by small letters $m$, $s^2$, $s$, $s_{12}$, $r_{12}$. These realizations correspond to the characteristics known from descriptive statistics, but there is an important difference.]
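The statistics of Definition 3.3 translate directly into code. A minimal Python sketch follows (the function names are my own, not the text's); the values are checked against the measurement data of Example 3.4 below.

```python
import math

def sample_mean(xs):
    """M = (1/n) * sum of X_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """S^2 with the 1/(n-1) factor (not 1/n as in descriptive statistics)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """S = sqrt(S^2)."""
    return math.sqrt(sample_variance(xs))

def ecdf_value(xs, x):
    """Value of the sample distribution function F_n at the point x."""
    return sum(1 for xi in xs if xi <= x) / len(xs)

def sample_covariance(xs, ys):
    """S_12 with the 1/(n-1) factor."""
    m1, m2 = sample_mean(xs), sample_mean(ys)
    return sum((a - m1) * (b - m2) for a, b in zip(xs, ys)) / (len(xs) - 1)

def sample_correlation(xs, ys):
    """R_12 = S_12 / (S_1 * S_2), defined as 0 when S_1 * S_2 = 0."""
    s1, s2 = sample_std(xs), sample_std(ys)
    if s1 * s2 == 0:
        return 0.0
    return sample_covariance(xs, ys) / (s1 * s2)

# Data of Example 3.4:
data = [2, 1.8, 2.1, 2.4, 1.9, 2.1, 2, 1.8, 2.3, 2.2]
print(round(sample_mean(data), 2))      # 2.06
print(round(sample_variance(data), 4))  # 0.0404
```

Note that `sample_variance` uses the sum-of-squares form only implicitly; numerically it agrees with the expansion $\frac{1}{n-1}\big[\sum x_i^2 - n m^2\big]$ used in the worked examples.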
[In the case of the variance, covariance and correlation coefficient, the constant $\frac{1}{n-1}$ stands in front of the sum instead of the $\frac{1}{n}$ used in descriptive statistics.]

Example 3.4
An unknown constant was measured, mutually independently, 10 times. The results of the measurements are: 2; 1.8; 2.1; 2.4; 1.9; 2.1; 2; 1.8; 2.3; 2.2. We can view these results as a realization of a random sample $X_1, \ldots, X_{10}$. Calculate $m$, $s^2$ and the values of the sample distribution function $F_{10}(x)$.

Solution
$$m = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{10}(2 + 1.8 + \cdots + 2.2) = 2.06$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - m)^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i^2 - 2 m x_i + m^2) = \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i^2 - 2m\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} m^2\Big] =$$
$$= \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i^2 - 2m \cdot nm + nm^2\Big] = \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i^2 - nm^2\Big] = \frac{1}{9}\big(2^2 + 1.8^2 + \cdots + 2.2^2 - 10 \cdot 2.06^2\big) = 0.0404$$
$$s = \sqrt{s^2} = \sqrt{0.0404} = 0.2011$$

To make the calculation of $F_{10}(x)$ easier, the results of the measurements are put in ascending order: 1.8; 1.8; 1.9; 2; 2; 2.1; 2.1; 2.2; 2.3; 2.4. Then:
for $x < 1.8$: $F_{10}(x) = 0$
for $1.8 \le x < 1.9$: $F_{10}(x) = 0.2$
for $1.9 \le x < 2$: $F_{10}(x) = 0.3$
for $2 \le x < 2.1$: $F_{10}(x) = 0.5$
for $2.1 \le x < 2.2$: $F_{10}(x) = 0.7$
for $2.2 \le x < 2.3$: $F_{10}(x) = 0.8$
for $2.3 \le x < 2.4$: $F_{10}(x) = 0.9$
for $x \ge 2.4$: $F_{10}(x) = 1$

Example 3.5
Consider 11 randomly drawn cars of a particular brand. The random variable $X$ stands for the age of a car (in years) and $Y$ stands for the price of the car (in Kč). The results are listed in the following table:

X:  5   4   6   5   5   5   6   6   2   7   7
Y: 85 103  70  82  89  98  66  95 169  70  48

Calculate and interpret $r_{12}$.

Solution
$$m_1 = \frac{1}{11}(5 + 4 + \cdots + 7) = 5.28, \qquad m_2 = \frac{1}{11}(85 + \cdots + 48) = 88.63$$
$$s_1^2 = \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i^2 - n m_1^2\Big] = \frac{1}{10}\big(5^2 + 4^2 + \cdots + 7^2 - 11 \cdot 5.28^2\big) = 2.02$$
$$s_2^2 = \frac{1}{n-1}\Big[\sum_{i=1}^{n} y_i^2 - n m_2^2\Big] = \frac{1}{10}\big(85^2 + 103^2 + \cdots + 48^2 - 11 \cdot 88.63^2\big) = 970.85$$
$$s_{12} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - m_1)(y_i - m_2) = \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i y_i - n m_1 m_2\Big] = \frac{1}{10}\big(5 \cdot 85 + 4 \cdot 103 + \cdots + 7 \cdot 48 - 11 \cdot 5.28 \cdot 88.63\big) = -40.89$$
$$r_{12} = \frac{s_{12}}{s_1 s_2} = \frac{-40.89}{\sqrt{2.02} \cdot \sqrt{970.85}} = -0.92$$
(Intermediate results are displayed rounded to two decimal places; the calculations use the unrounded values.)

There is a strong decreasing linear relationship between the variables $X$ and $Y$: the older the car, the lower the price.

The essential properties of frequently used statistics are listed in the following theorem. The properties mentioned in the first paragraph will be derived in the seminar.

Theorem 3.6
1.) Let $X_1, \ldots, X_n$ be a random sample from a distribution with expected value $\mu$, variance $\sigma^2$ and distribution function $F(x)$. Then:
$$E(M) = \mu, \qquad D(M) = \frac{\sigma^2}{n},$$
$$E(S^2) = \sigma^2 \quad (n \ge 2),$$
$$\text{for any } x \in \mathbb{R}: \quad E[F_n(x)] = F(x), \qquad D[F_n(x)] = \frac{F(x)\,(1 - F(x))}{n}.$$
2.) Let $X_{11}, \ldots, X_{1n_1}; \ldots; X_{p1}, \ldots, X_{pn_p}$ be a sequence of $p$ independent random samples with mean values $\mu_1, \ldots, \mu_p$ and an identical variance $\sigma^2$ for each of the $p$ samples. Let us denote the total size by $n = \sum_{j=1}^{p} n_j$. Further, let $c_1, \ldots, c_p$ be real constants, at least one of which is non-zero. Then:
$$E\Big(\sum_{j=1}^{p} c_j M_j\Big) = \sum_{j=1}^{p} c_j \mu_j, \qquad E(S^2) = \sigma^2.$$
3.) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a random sample from a two-dimensional distribution with covariance $\sigma_{12}$ and correlation coefficient $\rho$. Then:
$$E(S_{12}) = \sigma_{12};$$
however, $E(R_{12})$ is only approximately equal to $\rho$ (the approximation is appropriate for $n \ge 30$).

Remark 3.7
Methods of mathematical statistics are often used for analyzing and interpreting the results of experiments. In order to do this correctly it is very important to design the experiments in the right way. A list of the basic patterns of arranging experimental units into groups follows.
a) Simple observation: The random variable $X$ is observed under identical conditions. A random sample $X_1, \ldots, X_n$ corresponds to this arrangement of the experiment.
b) Dual observation: The random variable $X$ is observed under two different conditions. There are two different strategies for this arrangement of the experiment:
– Two-sample comparison: Two independent samples $X_{11}, \ldots, X_{1n_1}; X_{21}, \ldots, X_{2n_2}$, whose sizes may differ, correspond to this design.
– Paired comparison: A random sample $(X_{11}, X_{12}), \ldots, (X_{n1}, X_{n2})$ from a two-dimensional distribution corresponds to this design. In this case we transform the given sample into a random sample $Z_1, \ldots, Z_n$, where $Z_i = X_{i1} - X_{i2}$, $i = 1, 2, \ldots, n$. This procedure reduces the problem to a simple observation.
c) Multiple observation: The random variable $X$ is observed under $p \ge 3$ different conditions. There are two different strategies for this arrangement of the experiment:
– Multi-sample comparison: $p$ independent samples $X_{11}, \ldots, X_{1n_1}; \ldots; X_{p1}, \ldots, X_{pn_p}$, whose sizes may differ, correspond to this design.
– Block comparison: A random sample $(X_{11}, \ldots, X_{1p}), \ldots, (X_{n1}, \ldots, X_{np})$ from a $p$-dimensional distribution corresponds to this design.
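The reduction used in paired comparison, $Z_i = X_{i1} - X_{i2}$, can be sketched in a few lines of Python. The sample values below are invented purely for illustration.

```python
def pairs_to_differences(pairs):
    """Transform a sample of pairs (X_i1, X_i2) into the one-dimensional
    sample Z_i = X_i1 - X_i2, reducing a paired comparison to a simple
    observation, to which one-sample methods then apply."""
    return [x1 - x2 for x1, x2 in pairs]

# Hypothetical paired measurements (e.g. the same unit measured under
# two different conditions):
pairs = [(2.1, 1.8), (2.4, 2.0), (1.9, 1.9), (2.2, 2.1)]
zs = pairs_to_differences(pairs)
print(zs)
```

The point of the transformation is that the dependence within each pair is absorbed into the single difference $Z_i$, so the $Z_1, \ldots, Z_n$ form an ordinary random sample.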