6 Statistical inferences based on a single sample from the normal distribution

It is not difficult to find random variables that describe natural or social phenomena and that are, or can be assumed to be, normally distributed. For random variables that are not normally distributed, the central limit theorem can be invoked when the sample is large, which yields an approximately normal distribution of the sample mean. Random samples from the normal distribution therefore deserve close attention. The normal distribution is fully specified by two parameters, the mean $\mu$ and the variance $\sigma^2$, so we deal with tasks concerning these parameters, such as constructing confidence intervals and testing hypotheses. For a simple random sample from the normal distribution, the following theorem lists the common statistics and their distributions.

Theorem 6.1 Let $X_1, \dots, X_n$ be a random sample from the normal distribution $N(\mu, \sigma^2)$. Then:

1. The sample mean $M = \frac{1}{n}\sum_{i=1}^{n} X_i$ and the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - M)^2$ are mutually independent.
2. $U = \frac{M - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, thus $M \sim N(\mu, \sigma^2/n)$. [The pivotal statistic $U$ is used for inferences about $\mu$ when $\sigma^2$ is known.]
3. $K = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$. [The pivotal statistic $K$ is used for inferences about $\sigma^2$ when $\mu$ is unknown.]
4. $T = \frac{M - \mu}{S/\sqrt{n}} \sim t(n-1)$. [The pivotal statistic $T$ is used for inferences about $\mu$ when $\sigma^2$ is unknown.]
5. $\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$. [This pivotal statistic is used for inferences about $\sigma^2$ when $\mu$ is known.]

[Figure: density plots marking the quantiles $u_\alpha$, $u_{\alpha/2}$, $u_{1-\alpha/2}$, $u_{1-\alpha}$ of $N(0, 1)$; $\chi^2_\alpha(n)$, $\chi^2_{\alpha/2}(n)$, $\chi^2_{1-\alpha/2}(n)$, $\chi^2_{1-\alpha}(n)$ of $\chi^2(n)$; $t_\alpha(\nu)$, $t_{\alpha/2}(\nu)$, $t_{1-\alpha/2}(\nu)$, $t_{1-\alpha}(\nu)$ of $t(\nu)$; and $F_\alpha(\nu_1, \nu_2)$, $F_{\alpha/2}(\nu_1, \nu_2)$, $F_{1-\alpha/2}(\nu_1, \nu_2)$, $F_{1-\alpha}(\nu_1, \nu_2)$ of $F(\nu_1, \nu_2)$.]

The quantiles satisfy:

$u_\alpha = -u_{1-\alpha}$, $\quad t_\alpha(\nu) = -t_{1-\alpha}(\nu)$, $\quad F_\alpha(\nu_1, \nu_2) = \frac{1}{F_{1-\alpha}(\nu_2, \nu_1)}$

Example 6.2 The weight of a packet of granulated sugar follows the normal distribution $N(1002, \dots$ […]

Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. Let $n \ge 2$ and let $c$ be a constant. The test of $H_0: \mu = c$ versus $H_1: \mu \ne c$ (or $H_1: \mu < c$, or $H_1: \mu > c$) is called the z-test.

Let $X_1, \dots$
, $X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. Let $n \ge 2$ and let $c$ be a constant. The test of $H_0: \mu = c$ versus $H_1: \mu \ne c$ (or $H_1: \mu < c$, or $H_1: \mu > c$) is called the one-sample t-test.

Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\mu$ is unknown. Let $n \ge 2$ and let $c$ be a constant. The test of $H_0: \sigma^2 = c$ versus $H_1: \sigma^2 \ne c$ (or $H_1: \sigma^2 < c$, or $H_1: \sigma^2 > c$) is called the test about variance.

Remark 6.6 The choice of the test statistic for a particular test is analogous to the choice of the pivotal statistic in 6.3: for the z-test the test statistic $T_0$ is derived from the statistic $U$, for the t-test from the statistic $T$, and for the test about variance from the statistic $K$. Beware of the ambiguity of the letter $T$: in general $T_0$ stands for any test statistic, whereas in the t-test $T$ stands for the statistic following Student's t-distribution. Under the null hypothesis we can write $T_0 = U$, $T_0 = T$ and $T_0 = K$, respectively.

Theorem 6.7 Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, $c \in \mathbb{R}$, $n \ge 2$.

1. In the z-test at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic $T_0 = \frac{M - c}{\sigma/\sqrt{n}}$ falls within the critical region $W$. The critical regions for the individual forms of the alternative hypothesis are:

   two-tailed test, $H_1: \mu \ne c$: $\quad W = (-\infty, -u_{1-\alpha/2}] \cup [u_{1-\alpha/2}, \infty)$
   left-tailed test, $H_1: \mu < c$: $\quad W = (-\infty, -u_{1-\alpha}]$
   right-tailed test, $H_1: \mu > c$: $\quad W = [u_{1-\alpha}, \infty)$

2. In the t-test at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic $T_0 = \frac{M - c}{S/\sqrt{n}}$ falls within the critical region $W$:

   two-tailed test, $H_1: \mu \ne c$: $\quad W = (-\infty, -t_{1-\alpha/2}(n-1)] \cup [t_{1-\alpha/2}(n-1), \infty)$
   left-tailed test, $H_1: \mu < c$: $\quad W = (-\infty, -t_{1-\alpha}(n-1)]$
   right-tailed test, $H_1: \mu > c$: $\quad W = [t_{1-\alpha}(n-1), \infty)$

3. In the test about variance at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic $T_0 = \frac{(n-1)S^2}{c}$ falls within the critical region $W$:

   two-tailed test, $H_1: \sigma^2 \ne c$: $\quad W = (0, \chi^2_{\alpha/2}(n-1)] \cup [\chi^2_{1-\alpha/2}(n-1), \infty)$
   left-tailed test, $H_1: \sigma^2 < c$: $\quad W = (0, \chi^2_{\alpha}(n-1)]$
   right-tailed test, $H_1: \sigma^2 > c$: $\quad W = [\chi^2_{1-\alpha}(n-1), \infty)$

Example 6.8 According to the wrapper, the net weight of a chocolate bar should be 125 g. The manufacturer received complaints from buyers that the weight was lower than declared. The audit division therefore randomly selected 50 chocolate bars and found a mean weight of 122 g and a standard deviation of 8.6 g. Assuming that the weight of the chocolate bars follows the normal distribution, can we conclude at the significance level $\alpha = 0.01$ that the buyers' complaints are justified?

Solution $X_1, \dots, X_{50} \sim N(\mu, \sigma^2)$. We test $H_0: \mu = 125$ versus $H_1: \mu < 125$. The parameter $\sigma^2$ is unknown, so the task leads to the one-sample t-test.

The test statistic: $T_0 = \frac{M - c}{S/\sqrt{n}}$.
Its numerical realization: $t_0 = \frac{122 - 125}{8.6/\sqrt{50}} = -2.4667$.
The critical region: $W = (-\infty, -t_{1-\alpha}(n-1)] = (-\infty, -t_{0.99}(49)] = (-\infty, -2.4049]$.

Since $t_0 \in W$, $H_0$ is rejected at the significance level 0.01. The buyers' complaints can be regarded as justified, with a risk of error of at most 1%.

A random sample from a two-dimensional normal distribution can be converted to a single normal sample, to which the inferences stated above apply.

Remark 6.9 (on a random sample from the two-dimensional normal distribution) Let

$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \dots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix} \sim N_2\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)$, $\quad n \ge 2$.

Using a linear transformation, the random vector $(X, Y)'$ is converted to the scalar random variable

$Z = X - Y \sim N(\mu_1 - \mu_2, \ \sigma_1^2 - 2\sigma_{12} + \sigma_2^2)$.

Let us denote $\mu = \mu_1 - \mu_2$ and $\sigma^2 = \sigma_1^2 - 2\sigma_{12} + \sigma_2^2$. Thus the random sample $(X_1 - Y_1), \dots, (X_n - Y_n) = Z_1, \dots, Z_n$ follows the normal distribution $N(\mu, \sigma^2)$ and is called the differential random sample.

Theorem 6.10 Let

$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \dots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix} \sim N_2\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)$, $\quad n \ge 2$,

and let the variance-covariance matrix $\Sigma$ be unknown. The limits of the $100(1-\alpha)\%$ empirical confidence interval for the parametric function $\mu = \mu_1 - \mu_2$ have the form

$d = m - \frac{s}{\sqrt{n}} \, t_{1-\alpha/2}(n-1)$
$h = m + \frac{s}{\sqrt{n}} \, t_{1-\alpha/2}(n-1)$

where $m$ and $s$ are the realized sample mean and sample standard deviation of the differential random sample.

Example 6.11 The content of a chemical in a solution was measured by two laboratory methods (data in percentages). The random sample consists of 5 specimens:

Specimen   Method 1   Method 2
1          2.3        2.4
2          1.9        2.0
3          2.1        2.0
4          2.4        2.3
5          2.6        2.5

Assuming the sample is drawn from a two-dimensional normal distribution, determine the 90% empirical confidence interval for the difference between the expected values of the two methods.

Solution First we transform the given sample to the differential sample:

$z_1 = -0.1$, $z_2 = -0.1$, $z_3 = 0.1$, $z_4 = 0.1$, $z_5 = 0.1$

$m = 0.02$, $\quad s^2 = 0.012$, $\quad s = 0.109545$, $\quad n = 5$

$t_{1-\alpha/2}(n-1) = t_{0.95}(4) = 2.1318$

$d = m - \frac{s}{\sqrt{n}} \, t_{1-\alpha/2}(n-1) = 0.02 - \frac{0.109545}{\sqrt{5}} \cdot 2.1318 = -0.0844$
$h = m + \frac{s}{\sqrt{n}} \, t_{1-\alpha/2}(n-1) = 0.02 + \frac{0.109545}{\sqrt{5}} \cdot 2.1318 = 0.1244$

The confidence interval $-0.0844 < \mu < 0.1244$ holds with probability at least 0.90.

Definition 6.12 Let

$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \dots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix} \sim N_2\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)$, $\quad n \ge 2$.

The test of $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \ne 0$ is called the paired t-test. By passing to the differential random sample, the paired t-test is converted to the one-sample t-test.

Example 6.13 The following table lists the rate of return on investment (in percentages) of 12 randomly selected companies; the return on foreign investments is represented by the random variable $X$ and the return on domestic investments by the random variable $Y$:

Company    1    2    3    4    5    6    7    8    9   10   11   12
X         10   12   14   12   12   17    9   15    9   11    7   15
Y         11   14   15   11   13   16   10   13   11   17    9   19

Assuming the sample is drawn from a two-dimensional normal distribution, test at the significance level $\alpha = 0.1$ the hypothesis that there is no difference between the expected returns on foreign and domestic investments. Use a) the confidence interval method, b) the classical method (critical region).
Solution First we transform the given sample to the differential sample $Z_i = X_i - Y_i$, $i = 1, \dots, 12$. The realized sample characteristics are $m = -1.3333$ and $s^2 = 4.7879$. We test the hypothesis $H_0: \mu = 0$ versus $H_1: \mu \ne 0$. The required quantile is $t_{0.95}(11) = 1.7959$.

ad a)

$d = m - \frac{s}{\sqrt{n}} \, t_{1-\alpha/2}(n-1) = -1.3333 - \frac{\sqrt{4.7879}}{\sqrt{12}} \cdot 1.7959 = -2.4677$
$h = m + \frac{s}{\sqrt{n}} \, t_{1-\alpha/2}(n-1) = -1.3333 + \frac{\sqrt{4.7879}}{\sqrt{12}} \cdot 1.7959 = -0.1989$

Since $0 \notin (-2.4677, -0.1989)$, $H_0$ is rejected at the significance level 0.1.

ad b)

The test statistic: $T_0 = \frac{M - c}{S/\sqrt{n}}$.
Its numerical realization: $t_0 = \frac{-1.3333 - 0}{\sqrt{4.7879}/\sqrt{12}} = -2.11085$.
The critical region: $W = (-\infty, -t_{1-\alpha/2}(n-1)] \cup [t_{1-\alpha/2}(n-1), \infty) = (-\infty, -1.7959] \cup [1.7959, \infty)$.

Since $t_0 \in W$, $H_0$ is rejected at the significance level 0.1.
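The computation in Example 6.13 can be checked numerically. The following is a minimal Python sketch, not part of the original text. The function name `paired_t_summary` is hypothetical, the pairing of the table values into the `x` and `y` lists is an assumption about the table layout (it reproduces the stated $m$), and the quantile $t_{0.95}(11) = 1.7959$ is taken from the solution rather than computed.

```python
import math
import statistics


def paired_t_summary(x, y, t_quantile, c=0.0):
    """Reduce paired data to differences; return (m, s2, t0, (d, h)).

    t_quantile is the quantile t_{1-alpha/2}(n-1), supplied by the caller
    (for example from statistical tables).
    """
    z = [xi - yi for xi, yi in zip(x, y)]  # differential sample Z_i = X_i - Y_i
    n = len(z)
    m = statistics.mean(z)                 # realized sample mean
    s2 = statistics.variance(z)            # realized sample variance (divisor n - 1)
    se = math.sqrt(s2 / n)                 # s / sqrt(n)
    t0 = (m - c) / se                      # realized test statistic
    d = m - se * t_quantile                # lower confidence limit
    h = m + se * t_quantile                # upper confidence limit
    return m, s2, t0, (d, h)


# Data of Example 6.13; the (X, Y) pairing is an assumed reading of the table.
x = [10, 12, 14, 12, 12, 17, 9, 15, 9, 11, 7, 15]
y = [11, 14, 15, 11, 13, 16, 10, 13, 11, 17, 9, 19]
T_095_11 = 1.7959  # quantile t_{0.95}(11) as given in the solution

m, s2, t0, (d, h) = paired_t_summary(x, y, T_095_11)

# a) confidence interval method: reject H0 if c = 0 lies outside (d, h)
reject_by_interval = not (d < 0.0 < h)
# b) classical method: reject H0 if t0 falls in the two-tailed critical region
reject_by_region = abs(t0) >= T_095_11

print(round(m, 4), round(s2, 4), round(t0, 4), round(d, 4), round(h, 4))
print(reject_by_interval, reject_by_region)
```

Both decision rules use the same quantile and the same standard error, so for the two-tailed test they always lead to the same decision. Here both reject $H_0$ at the significance level 0.1.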