3 The basic concepts of mathematical statistics

Probability theory and mathematical statistics share the subject of their interest: both describe reality while respecting the impact of stable random influences. In contrast to probability theory, mathematical statistics deals with greater uncertainty about the reality under consideration: we have to work with a whole family of a priori admissible probability measures $\mathcal{P}$, not with a single measure as probability theory does. We do not know in advance which measure is appropriate. On the basis of observed and analyzed statistical data we try to identify, at least approximately, the best fitting probability measure.

Consider an example. A population consists of 10 balls of which we know only that each is black or white. We do not know the exact numbers of black and white balls and cannot find them out directly, but we can draw one ball with replacement many times. On the basis of the results of the drawing we may estimate the unknown number of black balls; this estimate is plausible if the number of draws is sufficiently large. Imagine we have drawn one ball 100 times (with replacement) and only 9 of the draws gave a black ball. It is highly probable that the number of black balls is less than the number of white balls, and we can assert more: the numbers 1, 2, ..., 9 are the candidates for the unknown quantity of black balls, and according to these results the number one seems to be the most plausible candidate.

Mathematical statistics is concerned with (among other things):
a) the theory of parameter estimation (e.g. estimating the quantity of black balls in the population),
b) the theory of hypothesis testing (e.g. testing the hypothesis that there are c black balls).

Both procedures are based on statistical data organized in data sets, and they give plausible results only if the drawing fulfils certain assumptions. The term random sample, introduced below, formalizes this idea of a properly collected data set.
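The ball-drawing idea above can be sketched as a small simulation: estimate the number of black balls as 10 times the observed proportion of black draws. This is a minimal illustration, not part of the text; the function name and the use of a seeded generator are my own choices.

```python
import random

def estimate_black(num_black, n_draws=100, seed=0):
    """Draw one ball with replacement n_draws times from an urn of 10 balls,
    num_black of which are black, and estimate the number of black balls as
    10 * (observed proportion of black draws), rounded to an integer."""
    rng = random.Random(seed)
    black_draws = sum(rng.random() < num_black / 10 for _ in range(n_draws))
    return round(10 * black_draws / n_draws)

# With 1 black ball out of 10 and many draws, the estimate is usually 1,
# matching the reasoning in the text (9 black results in 100 draws).
print(estimate_black(num_black=1, n_draws=100))
```

With a large number of draws the observed proportion stabilizes near the true one, which is why the estimate becomes plausible only for sufficiently many draws.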
[Usually, the data processed by statisticians are taken from a population. The portion drawn is called a sample. If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole.]

Definition 3.1
(i.) Let $X_1, \ldots, X_n$ be independent and identically distributed random variables, $X_i \sim \mathcal{L}()$, $i = 1, \ldots, n$, where $\mathcal{L}()$ is a probability distribution. Then $X_1, \ldots, X_n$ is referred to as a random sample of size $n$ from the probability distribution $\mathcal{L}()$. The realizations $x_1, \ldots, x_n$ organized into a column vector form the data set.
(ii.) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be independent and identically distributed random vectors, $(X_i, Y_i) \sim \mathcal{L}_2()$, $i = 1, \ldots, n$, where $\mathcal{L}_2()$ is a joint probability distribution. Then $(X_1, Y_1), \ldots, (X_n, Y_n)$ is referred to as a random sample of size $n$ from the two-dimensional probability distribution $\mathcal{L}_2()$. The realizations $(x_1, y_1), \ldots, (x_n, y_n)$ organized into an $n \times 2$ matrix form the data set.
(iii.) Analogously, a random sample of size $n$ from a $p$-dimensional probability distribution $\mathcal{L}_p()$, $p \ge 3$, can be defined.
(iv.) Any function $T$ of a random sample (or of a number of random samples), where the function itself is independent of the sample's distribution, is called a statistic. The term is used both for the function and for the value of the function on a given sample.

Remark
A statistic is an observable random variable, which differentiates it from a parameter, a generally unobservable quantity describing a property of a statistical population.

Corollary 3.2
Let $X_1, \ldots, X_n$ be a random sample from a distribution with distribution function $F(x)$. Let $F(\mathbf{x}) = F(x_1, \ldots, x_n)$ be the joint distribution function of the random vector $(X_1, \ldots, X_n)$. Then the following assertion holds:
$$F(\mathbf{x}) = F(x_1) \, F(x_2) \cdots F(x_n).$$

The following definition lists essential, often used statistics (plural of statistic).

Definition 3.3
(i.) Let $X_1, \ldots, X_n$ be a random sample, $n \ge 2$.
– The statistic $M = \frac{1}{n}\sum_{i=1}^{n} X_i$ is called the sample mean. [Instead of $M$ it can be denoted $\bar{X}$.]
– The statistic $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - M)^2$ is called the sample variance.
– The statistic $S = \sqrt{S^2}$ is called the sample standard deviation.
– The statistic $F_n(x) = \frac{1}{n}\,\mathrm{card}\{i : X_i \le x\}$, $x \in \mathbb{R}$, is called the value of the sample distribution function at a point $x$. [For any fixed real number $x$, $\mathrm{card}\{i : X_i \le x\}$ is the number of those realizations of the random vector which are less than or equal to $x$.]
(ii.) Let $X_{11}, \ldots, X_{1n_1}; \ldots; X_{p1}, \ldots, X_{pn_p}$ be a sequence of $p$ independent random samples of sizes $n_1 \ge 2, \ldots, n_p \ge 2$. The total size is $n = \sum_{j=1}^{p} n_j$. Let us denote the sample means by $M_1, \ldots, M_p$ and the sample variances of the particular samples by $S_1^2, \ldots, S_p^2$. Let $c_1, \ldots, c_p$ be real constants, at least one of which is non-zero.
– The statistic $\sum_{j=1}^{p} c_j M_j$ is called a linear combination of sample means.
– The statistic $S^2 = \frac{\sum_{j=1}^{p} (n_j - 1) S_j^2}{n - p}$ is called the weighted mean of sample variances.
(iii.) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a random sample from a two-dimensional distribution. Let us denote the sample means by $M_1 = \frac{1}{n}\sum_{i=1}^{n} X_i$, $M_2 = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and the sample variances by $S_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - M_1)^2$, $S_2^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - M_2)^2$.
– The statistic $S_{12} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - M_1)(Y_i - M_2)$ is called the sample covariance.
– The statistic
$$R_{12} = \begin{cases} \frac{1}{n-1}\sum_{i=1}^{n} \frac{X_i - M_1}{S_1} \cdot \frac{Y_i - M_2}{S_2} = \frac{S_{12}}{S_1 S_2} & \text{for } S_1 S_2 \neq 0, \\ 0 & \text{otherwise} \end{cases}$$
is called the sample correlation coefficient.

[Transforming the random sample by a particular function we obtain the statistics $M$, $S^2$, $S$, $S_{12}$, $R_{12}$; thus these statistics are random variables and are denoted by capital letters. A numerical realization of the random sample leads to numerical realizations of these statistics, which are denoted by small letters $m$, $s^2$, $s$, $s_{12}$, $r_{12}$. These realizations correspond to the characteristics known from descriptive statistics, but there is an important difference.]
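The statistics of Definition 3.3 translate directly into code. A minimal Python sketch follows (the function names are my own, not the text's); the values are checked against the measurement data of Example 3.4 below.

```python
import math

def sample_mean(xs):
    """M = (1/n) * sum of X_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """S^2 with the 1/(n-1) factor (not 1/n as in descriptive statistics)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """S = sqrt(S^2)."""
    return math.sqrt(sample_variance(xs))

def ecdf_value(xs, x):
    """Value of the sample distribution function F_n at the point x."""
    return sum(1 for xi in xs if xi <= x) / len(xs)

def sample_covariance(xs, ys):
    """S_12 with the 1/(n-1) factor."""
    m1, m2 = sample_mean(xs), sample_mean(ys)
    return sum((a - m1) * (b - m2) for a, b in zip(xs, ys)) / (len(xs) - 1)

def sample_correlation(xs, ys):
    """R_12 = S_12 / (S_1 * S_2), defined as 0 when S_1 * S_2 = 0."""
    s1, s2 = sample_std(xs), sample_std(ys)
    if s1 * s2 == 0:
        return 0.0
    return sample_covariance(xs, ys) / (s1 * s2)

# Data of Example 3.4:
data = [2, 1.8, 2.1, 2.4, 1.9, 2.1, 2, 1.8, 2.3, 2.2]
print(round(sample_mean(data), 2))      # 2.06
print(round(sample_variance(data), 4))  # 0.0404
```

Note that `sample_variance` uses the sum-of-squares form only implicitly; numerically it agrees with the expansion $\frac{1}{n-1}\big[\sum x_i^2 - n m^2\big]$ used in the worked examples.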
[In the case of the variance, covariance and correlation coefficient, the constant $\frac{1}{n-1}$ stands in front of the sum instead of the $\frac{1}{n}$ used in descriptive statistics.]

Example 3.4
An unknown constant was measured, mutually independently, 10 times. The results of the measurements are: 2; 1.8; 2.1; 2.4; 1.9; 2.1; 2; 1.8; 2.3; 2.2. We can view these results as a realization of a random sample $X_1, \ldots, X_{10}$. Calculate $m$, $s^2$ and the values of the sample distribution function $F_{10}(x)$.

Solution
$$m = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{10}(2 + 1.8 + \cdots + 2.2) = 2.06$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - m)^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i^2 - 2 m x_i + m^2) = \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i^2 - 2m\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} m^2\Big] =$$
$$= \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i^2 - 2m \cdot nm + nm^2\Big] = \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i^2 - nm^2\Big] = \frac{1}{9}\big(2^2 + 1.8^2 + \cdots + 2.2^2 - 10 \cdot 2.06^2\big) = 0.0404$$
$$s = \sqrt{s^2} = \sqrt{0.0404} = 0.2011$$

To make the calculation of $F_{10}(x)$ easier, the results of the measurements are put in ascending order: 1.8; 1.8; 1.9; 2; 2; 2.1; 2.1; 2.2; 2.3; 2.4. Then:
for $x < 1.8$: $F_{10}(x) = 0$
for $1.8 \le x < 1.9$: $F_{10}(x) = 0.2$
for $1.9 \le x < 2$: $F_{10}(x) = 0.3$
for $2 \le x < 2.1$: $F_{10}(x) = 0.5$
for $2.1 \le x < 2.2$: $F_{10}(x) = 0.7$
for $2.2 \le x < 2.3$: $F_{10}(x) = 0.8$
for $2.3 \le x < 2.4$: $F_{10}(x) = 0.9$
for $x \ge 2.4$: $F_{10}(x) = 1$

Example 3.5
Consider 11 randomly drawn cars of a particular brand. The random variable $X$ stands for the age of a car (in years) and $Y$ stands for the price of the car (in Kč). The results are listed in the following table:

X:  5   4   6   5   5   5   6   6   2   7   7
Y: 85 103  70  82  89  98  66  95 169  70  48

Calculate and interpret $r_{12}$.

Solution
$$m_1 = \frac{1}{11}(5 + 4 + \cdots + 7) = 5.28, \qquad m_2 = \frac{1}{11}(85 + \cdots + 48) = 88.63$$
$$s_1^2 = \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i^2 - n m_1^2\Big] = \frac{1}{10}\big(5^2 + 4^2 + \cdots + 7^2 - 11 \cdot 5.28^2\big) = 2.02$$
$$s_2^2 = \frac{1}{n-1}\Big[\sum_{i=1}^{n} y_i^2 - n m_2^2\Big] = \frac{1}{10}\big(85^2 + 103^2 + \cdots + 48^2 - 11 \cdot 88.63^2\big) = 970.85$$
$$s_{12} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - m_1)(y_i - m_2) = \frac{1}{n-1}\Big[\sum_{i=1}^{n} x_i y_i - n m_1 m_2\Big] = \frac{1}{10}\big(5 \cdot 85 + 4 \cdot 103 + \cdots + 7 \cdot 48 - 11 \cdot 5.28 \cdot 88.63\big) = -40.89$$
$$r_{12} = \frac{s_{12}}{s_1 s_2} = \frac{-40.89}{\sqrt{2.02} \cdot \sqrt{970.85}} = -0.92$$
(Intermediate results are displayed rounded to two decimal places; the calculations use the unrounded values.)

There is a strong decreasing linear relationship between the variables $X$ and $Y$: the older the car, the lower the price.

The essential properties of frequently used statistics are listed in the following theorem. The properties mentioned in the first paragraph will be derived in the seminar.

Theorem 3.6
1.) Let $X_1, \ldots, X_n$ be a random sample from a distribution with expected value $\mu$, variance $\sigma^2$ and distribution function $F(x)$. Then:
$$E(M) = \mu, \qquad D(M) = \frac{\sigma^2}{n},$$
$$E(S^2) = \sigma^2 \quad (n \ge 2),$$
$$\text{for any } x \in \mathbb{R}: \quad E[F_n(x)] = F(x), \qquad D[F_n(x)] = \frac{F(x)\,(1 - F(x))}{n}.$$
2.) Let $X_{11}, \ldots, X_{1n_1}; \ldots; X_{p1}, \ldots, X_{pn_p}$ be a sequence of $p$ independent random samples with mean values $\mu_1, \ldots, \mu_p$ and an identical variance $\sigma^2$ for each of the $p$ samples. Let us denote the total size by $n = \sum_{j=1}^{p} n_j$. Further, let $c_1, \ldots, c_p$ be real constants, at least one of which is non-zero. Then:
$$E\Big(\sum_{j=1}^{p} c_j M_j\Big) = \sum_{j=1}^{p} c_j \mu_j, \qquad E(S^2) = \sigma^2.$$
3.) Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a random sample from a two-dimensional distribution with covariance $\sigma_{12}$ and correlation coefficient $\rho$. Then:
$$E(S_{12}) = \sigma_{12};$$
however, $E(R_{12})$ is only approximately equal to $\rho$ (the approximation is appropriate for $n \ge 30$).

Remark 3.7
Methods of mathematical statistics are often used for analyzing and interpreting the results of experiments. In order to do this correctly it is very important to design the experiments in the right way. A list of the basic patterns of arranging experimental units into groups follows.
a) Simple observation: The random variable $X$ is observed under identical conditions. A random sample $X_1, \ldots, X_n$ corresponds to this arrangement of the experiment.
b) Dual observation: The random variable $X$ is observed under two different conditions. There are two different strategies for this arrangement of the experiment:
– Two-sample comparison: Two independent samples $X_{11}, \ldots, X_{1n_1}; X_{21}, \ldots, X_{2n_2}$, whose sizes may differ, correspond to this design.
– Paired comparison: A random sample $(X_{11}, X_{12}), \ldots, (X_{n1}, X_{n2})$ from a two-dimensional distribution corresponds to this design. In this case we transform the given sample into a random sample $Z_1, \ldots, Z_n$, where $Z_i = X_{i1} - X_{i2}$, $i = 1, 2, \ldots, n$. This procedure reduces the problem to a simple observation.
c) Multiple observation: The random variable $X$ is observed under $p \ge 3$ different conditions. There are two different strategies for this arrangement of the experiment:
– Multi-sample comparison: $p$ independent samples $X_{11}, \ldots, X_{1n_1}; \ldots; X_{p1}, \ldots, X_{pn_p}$, whose sizes may differ, correspond to this design.
– Block comparison: A random sample $(X_{11}, \ldots, X_{1p}), \ldots, (X_{n1}, \ldots, X_{np})$ from a $p$-dimensional distribution corresponds to this design.
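The reduction used in paired comparison, $Z_i = X_{i1} - X_{i2}$, can be sketched in a few lines of Python. The sample values below are invented purely for illustration.

```python
def pairs_to_differences(pairs):
    """Transform a sample of pairs (X_i1, X_i2) into the one-dimensional
    sample Z_i = X_i1 - X_i2, reducing a paired comparison to a simple
    observation, to which one-sample methods then apply."""
    return [x1 - x2 for x1, x2 in pairs]

# Hypothetical paired measurements (e.g. the same unit measured under
# two different conditions):
pairs = [(2.1, 1.8), (2.4, 2.0), (1.9, 1.9), (2.2, 2.1)]
zs = pairs_to_differences(pairs)
print(zs)
```

The point of the transformation is that the dependence within each pair is absorbed into the single difference $Z_i$, so the $Z_1, \ldots, Z_n$ form an ordinary random sample.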