6    Statistical inference based on a single sample from the normal distribution
It is not difficult to find random variables describing natural or social phenomena that are, or can be assumed to be, normally distributed. Even when the random variables are not normally distributed, for a large sample we can invoke the central limit theorem, so that the sample mean is approximately normally distributed. Random samples from the normal distribution therefore deserve close attention.
The normal distribution is fully specified by two parameters, the mean $\mu$ and the variance $\sigma^2$. We therefore deal with tasks concerning these parameters, e.g. constructing confidence intervals or testing hypotheses.
For a simple random sample from the normal distribution, the following theorem lists the common statistics and their distributions:
Theorem 6.1
Let $X_1, \dots, X_n$ be a random sample from the normal distribution $N(\mu, \sigma^2)$. Then:
1.   The sample mean $M = \frac{1}{n}\sum_{i=1}^{n} X_i$ and the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - M)^2$ are mutually independent.
2.   $U = \frac{M - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, thus $M \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
[The pivotal statistic $U$ is used for inference about $\mu$ when $\sigma^2$ is known.]
3.   $K = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$
[The pivotal statistic $K$ is used for inference about $\sigma^2$ when $\mu$ is unknown.]
4.   $T = \frac{M - \mu}{S/\sqrt{n}} \sim t(n-1)$
[The pivotal statistic $T$ is used for inference about $\mu$ when $\sigma^2$ is unknown.]
5.   $\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$
[This pivotal statistic is used for inference about $\sigma^2$ when $\mu$ is known.]
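[Figure: densities of the $N(0,1)$, $\chi^2(n)$, $t(\nu)$ and $F(\nu_1, \nu_2)$ distributions with the quantiles for $\alpha$, $\alpha/2$, $1-\alpha/2$ and $1-\alpha$ marked on the horizontal axis.]
The quantiles satisfy the relations
$u_\alpha = -u_{1-\alpha}$,    $t_\alpha(\nu) = -t_{1-\alpha}(\nu)$,    $F_\alpha(\nu_1, \nu_2) = \frac{1}{F_{1-\alpha}(\nu_2, \nu_1)}$.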
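The symmetry of the standard normal quantiles can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `u` is an assumption introduced here for illustration and does not come from the text.

```python
from statistics import NormalDist

# Standard normal distribution N(0, 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)


def u(alpha: float) -> float:
    """Return the alpha-quantile u_alpha of N(0, 1)."""
    return std_normal.inv_cdf(alpha)


alpha = 0.05
lower = u(alpha)        # u_{0.05}
upper = u(1 - alpha)    # u_{0.95}

# The two quantiles differ only in sign (up to floating-point error).
print(round(lower, 4), round(upper, 4))
```

The quantiles of the t, chi-square and F distributions are not available in the standard library; they would have to be taken from tables or from a statistics package.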
Example 6.2
The weight of a packet of granulated sugar follows the normal distribution $N(1002\ \mathrm{g}, 64\ \mathrm{g}^2)$. An inspection randomly draws 9 packets from one series and checks whether their average weight is at least 999 g. Otherwise the enterprise has to pay a penalty. Find the probability that the enterprise will have to pay the penalty.
Solution
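$X_1, \dots, X_9 \sim N(1002, 64)$, $M \sim N\left(1002, \frac{64}{9}\right)$, $P(M < 999) = ?$
$P(M < 999) = P\left(\frac{M - 1002}{8/3} < \frac{999 - 1002}{8/3}\right) = P\left(U < -\frac{9}{8}\right) = 1 - \Phi\left(\frac{9}{8}\right) = 1 - \Phi(1.125) \approx 1 - 0.87076 = 0.12924$.
Here the argument was rounded to 1.13 for the table lookup; without rounding, $\Phi(1.125) \approx 0.8697$ and the probability is approximately 0.1303.
The probability that the enterprise will have to pay the penalty is approximately 13%.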
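The calculation in Example 6.2 can be reproduced in a few lines. This is a sketch using the standard-library `NormalDist`; the variable names are illustrative assumptions. Because it evaluates the distribution function without rounding the argument to two decimals, the result (about 0.130) differs slightly from the table-based value 0.12924.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 6.2: weight ~ N(1002, 64), sample of n = 9 packets.
mu, sigma2, n = 1002.0, 64.0, 9
threshold = 999.0

# By Theorem 6.1, M ~ N(mu, sigma^2 / n); NormalDist expects the standard deviation.
sample_mean_dist = NormalDist(mu=mu, sigma=sqrt(sigma2 / n))

# Probability that the average weight falls below the threshold.
p_penalty = sample_mean_dist.cdf(threshold)
print(round(p_penalty, 4))
```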
A common task in statistics is to derive confidence intervals for unknown parameters. For the normal distribution these are the parameters $\mu$ and $\sigma^2$, so four situations may occur: finding the confidence interval 1. for $\mu$ when $\sigma^2$ is known; 2. for $\sigma^2$ when $\mu$ is unknown; 3. for $\mu$ when $\sigma^2$ is unknown; and 4. for $\sigma^2$ when $\mu$ is known. For each of these situations the appropriate pivotal statistic has to be selected. Using procedure 4.9, the construction of the confidence interval is then straightforward. For the first situation this was done in Example 4.11. The following theorem states the lower and upper limits of the confidence intervals for each of the four situations.
Theorem 6.3
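Let $X_1, \dots, X_n$ be a random sample from the normal distribution $N(\mu, \sigma^2)$. Let us consider the $100(1-\alpha)\%$ empirical confidence interval with lower limit $d$ and upper limit $h$.
1.   The confidence interval for $\mu$ when $\sigma^2$ is known is derived from the pivotal statistic $U = \frac{M - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$. The limits are:
two-sided conf. int.   $(d, h) = \left(m - \frac{\sigma}{\sqrt{n}}\, u_{1-\alpha/2},\ m + \frac{\sigma}{\sqrt{n}}\, u_{1-\alpha/2}\right)$
left-sided conf. int.   $(d, \infty) = \left(m - \frac{\sigma}{\sqrt{n}}\, u_{1-\alpha},\ \infty\right)$
right-sided conf. int.   $(-\infty, h) = \left(-\infty,\ m + \frac{\sigma}{\sqrt{n}}\, u_{1-\alpha}\right)$
2.   The confidence interval for $\sigma^2$ when $\mu$ is unknown is derived from the pivotal statistic $K = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$. The limits are:
two-sided conf. int.   $(d, h) = \left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)},\ \frac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)}\right)$
left-sided conf. int.   $(d, \infty) = \left(\frac{(n-1)s^2}{\chi^2_{1-\alpha}(n-1)},\ \infty\right)$
right-sided conf. int.   $(-\infty, h) = \left(-\infty,\ \frac{(n-1)s^2}{\chi^2_{\alpha}(n-1)}\right)$
3.   The confidence interval for $\mu$ when $\sigma^2$ is unknown is derived from the pivotal statistic $T = \frac{M - \mu}{S/\sqrt{n}} \sim t(n-1)$. The limits are:
two-sided conf. int.   $(d, h) = \left(m - \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1),\ m + \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1)\right)$
left-sided conf. int.   $(d, \infty) = \left(m - \frac{s}{\sqrt{n}}\, t_{1-\alpha}(n-1),\ \infty\right)$
right-sided conf. int.   $(-\infty, h) = \left(-\infty,\ m + \frac{s}{\sqrt{n}}\, t_{1-\alpha}(n-1)\right)$
4.   The confidence interval for $\sigma^2$ when $\mu$ is known is derived from the pivotal statistic $\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$. The limits are:
two-sided conf. int.   $(d, h) = \left(\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2_{1-\alpha/2}(n)},\ \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2_{\alpha/2}(n)}\right)$
left-sided conf. int.   $(d, \infty) = \left(\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2_{1-\alpha}(n)},\ \infty\right)$
right-sided conf. int.   $(-\infty, h) = \left(-\infty,\ \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\chi^2_{\alpha}(n)}\right)$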
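The two-sided limits of cases 1 to 3 of Theorem 6.3 can be sketched as small helper functions. The function names `mean_ci` and `variance_ci` and the numerical inputs below are hypothetical, chosen only for illustration. Since the standard library provides only normal quantiles, the t and chi-square quantiles are passed in by the caller; the two chi-square values used here are standard table values for 9 degrees of freedom, not values from the text.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(m, spread, n, quantile):
    """Two-sided limits m -/+ spread / sqrt(n) * quantile.

    Use spread = sigma with a N(0, 1) quantile (case 1), or
    spread = s with a t(n - 1) quantile (case 3).
    """
    half_width = spread / sqrt(n) * quantile
    return m - half_width, m + half_width


def variance_ci(s2, n, chi2_lower, chi2_upper):
    """Two-sided limits for sigma^2 when mu is unknown (case 2).

    chi2_lower and chi2_upper are the alpha/2 and 1 - alpha/2
    quantiles of chi2(n - 1), supplied by the caller.
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


# Hypothetical inputs, not taken from the text.
alpha = 0.05
u = NormalDist().inv_cdf(1 - alpha / 2)      # u_{0.975}
d, h = mean_ci(m=10.0, spread=2.0, n=16, quantile=u)
print(round(d, 3), round(h, 3))

# Table values: chi2_{0.025}(9) = 2.7004, chi2_{0.975}(9) = 19.0228.
var_d, var_h = variance_ci(s2=4.0, n=10, chi2_lower=2.7004, chi2_upper=19.0228)
print(round(var_d, 3), round(var_h, 3))
```

Note that the larger chi-square quantile produces the lower limit for the variance, because the quantile appears in the denominator.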
Example 6.4
The constant $\mu$ was measured 10 times independently. The results of the measurements are:
2    1.8    2.1    2.4    1.9    2.1    2    1.8    2.3    2.2
These results are assumed to be a numerical realization of a random sample $X_1, \dots, X_{10}$ from the distribution $N(\mu, \sigma^2)$, where the parameters $\mu$, $\sigma^2$ are unknown. Find the 95% confidence interval for the parameter $\mu$: a) two-sided, b) left-sided, c) right-sided.
Solution
This is the confidence interval for $\mu$ when $\sigma^2$ is unknown. The confidence limits are derived from the statistic $T = \frac{M - \mu}{S/\sqrt{n}} \sim t(n-1)$, whose quantiles are looked up in tables.
$n = 10$,    $\alpha = 0.05$,    $t_{1-\alpha/2}(n-1) = t_{0.975}(9) = 2.2622$,    $t_{1-\alpha}(n-1) = t_{0.95}(9) = 1.8331$
$m = 2.06$,    $s^2 = 0.0404$,    $s = 0.2011$
ad a)
$d = m - \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1) = 2.06 - \frac{0.2011}{\sqrt{10}} \cdot 2.2622 = 1.92$
$h = m + \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1) = 2.06 + \frac{0.2011}{\sqrt{10}} \cdot 2.2622 = 2.20$
$1.92 < \mu < 2.20$ with probability at least 0.95
ad b)
$d = m - \frac{s}{\sqrt{n}}\, t_{1-\alpha}(n-1) = 2.06 - \frac{0.2011}{\sqrt{10}} \cdot 1.8331 = 1.94$
$1.94 < \mu$ with probability at least 0.95
ad c)
$h = m + \frac{s}{\sqrt{n}}\, t_{1-\alpha}(n-1) = 2.06 + \frac{0.2011}{\sqrt{10}} \cdot 1.8331 = 2.18$
$\mu < 2.18$ with probability at least 0.95
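The limits of Example 6.4 can be recomputed from the raw measurements. This is a minimal sketch with the standard library; the two quantiles of $t(9)$ are copied from the example as table values, and the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Measurements from Example 6.4.
data = [2, 1.8, 2.1, 2.4, 1.9, 2.1, 2, 1.8, 2.3, 2.2]
n = len(data)
m = mean(data)     # sample mean
s = stdev(data)    # sample standard deviation (divisor n - 1)

# Quantiles of t(9) taken from the example (table values).
t_975 = 2.2622     # t_{0.975}(9), for the two-sided interval
t_95 = 1.8331      # t_{0.95}(9), for the one-sided intervals

se = s / sqrt(n)   # estimated standard error of the mean
two_sided = (m - se * t_975, m + se * t_975)
left_sided_d = m - se * t_95
right_sided_h = m + se * t_95

print([round(v, 2) for v in two_sided],
      round(left_sided_d, 2), round(right_sided_h, 2))
```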
So much for confidence intervals; let us now turn to hypothesis testing. We follow the classical method based on the critical region; the other methods can be derived easily.
Definition 6.5
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. Let $n \ge 2$ and let $c$ be a constant. The test of $H_0: \mu = c$ versus $H_1: \mu \ne c$ (or $H_1: \mu < c$, or $H_1: \mu > c$) is called the z-test.
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. Let $n \ge 2$ and let $c$ be a constant. The test of $H_0: \mu = c$ versus $H_1: \mu \ne c$ (or $H_1: \mu < c$, or $H_1: \mu > c$) is called the one-sample t-test.
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\mu$ is unknown. Let $n \ge 2$ and let $c$ be a constant. The test of $H_0: \sigma^2 = c$ versus $H_1: \sigma^2 \ne c$ (or $H_1: \sigma^2 < c$, or $H_1: \sigma^2 > c$) is called the test about variance.
Remark 6.6
The choice of the test statistic for a particular test is analogous to the choice of the pivotal statistic in Theorem 6.3: for the z-test the test statistic $T_0$ is derived from the statistic $U$, for the t-test from the statistic $T$, and for the test about variance from the statistic $K$.
Beware of the ambiguity of the letter T. In general, $T_0$ stands for any test statistic; in the t-test, $T$ stands for the statistic following Student's t-distribution. Under the null hypothesis we can write $T_0 = U$, $T_0 = T$, $T_0 = K$, respectively.
Theorem 6.7
Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, $c \in \mathbb{R}$, $n \ge 2$.
1. Considering the z-test at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic $T_0 = \frac{M - c}{\sigma/\sqrt{n}}$ falls within the critical region $W$. According to the form of the alternative hypothesis, the corresponding critical regions are:
two-tailed test	$H_1: \mu \ne c$	$W = (-\infty, -u_{1-\alpha/2}] \cup [u_{1-\alpha/2}, \infty)$
left-tailed test	$H_1: \mu < c$	$W = (-\infty, -u_{1-\alpha}]$
right-tailed test	$H_1: \mu > c$	$W = [u_{1-\alpha}, \infty)$
2. Considering the t-test at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic $T_0 = \frac{M - c}{S/\sqrt{n}}$ falls within the critical region $W$.
two-tailed test	$H_1: \mu \ne c$	$W = (-\infty, -t_{1-\alpha/2}(n-1)] \cup [t_{1-\alpha/2}(n-1), \infty)$
left-tailed test	$H_1: \mu < c$	$W = (-\infty, -t_{1-\alpha}(n-1)]$
right-tailed test	$H_1: \mu > c$	$W = [t_{1-\alpha}(n-1), \infty)$
3. Considering the test about variance at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic $T_0 = \frac{(n-1)S^2}{c}$ falls within the critical region $W$.
two-tailed test	$H_1: \sigma^2 \ne c$	$W = (0, \chi^2_{\alpha/2}(n-1)] \cup [\chi^2_{1-\alpha/2}(n-1), \infty)$
left-tailed test	$H_1: \sigma^2 < c$	$W = (0, \chi^2_{\alpha}(n-1)]$
right-tailed test	$H_1: \sigma^2 > c$	$W = [\chi^2_{1-\alpha}(n-1), \infty)$
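The decision rule for the z-test and the t-test in Theorem 6.7 can be sketched as a single check against the critical region. The function name `in_critical_region`, the labels for the alternatives, and the numbers in the usage part are hypothetical and serve only as an illustration. The helper relies on the symmetry of the normal and t quantiles, so it does not cover the test about variance.

```python
from math import sqrt
from statistics import NormalDist


def in_critical_region(t0, q, alternative):
    """Check whether t0 falls into the critical region W of a z-test or t-test.

    q is the positive quantile matching the alternative: the (1 - alpha/2)
    quantile for "two-sided", the (1 - alpha) quantile otherwise.
    """
    if alternative == "two-sided":
        return t0 <= -q or t0 >= q
    if alternative == "less":
        return t0 <= -q
    if alternative == "greater":
        return t0 >= q
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical z-test: H0: mu = 10 versus H1: mu != 10, with sigma = 2 known.
m, c, sigma, n, alpha = 10.6, 10.0, 2.0, 25, 0.05
t0 = (m - c) / (sigma / sqrt(n))             # realization of T0
q = NormalDist().inv_cdf(1 - alpha / 2)      # u_{1 - alpha/2}
reject = in_critical_region(t0, q, "two-sided")
print(round(t0, 2), reject)
```

For a t-test the same function applies with `q` taken from a table of $t(n-1)$ quantiles.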
Example 6.8
According to the wrapper, the net weight of a chocolate bar should be 125 g. The manufacturer received complaints from buyers that the weight was lower than declared. The audit division therefore randomly drew 50 chocolates and found a mean weight of 122 g with a standard deviation of 8.6 g. Assuming that the weight of the chocolates follows the normal distribution, can we conclude at the significance level $\alpha = 0.01$ that the buyers' complaints are justified?
Solution
$X_1, \dots, X_{50} \sim N(\mu, \sigma^2)$. We are testing $H_0: \mu = 125$ versus $H_1: \mu < 125$. The parameter $\sigma^2$ is unknown, so the task leads to the one-sample t-test.
The test statistic: $T_0 = \frac{M - c}{S/\sqrt{n}}$.
Its numerical realization: $t_0 = \frac{122 - 125}{8.6/\sqrt{50}} = -2.4667$.
Critical region: $W = (-\infty, -t_{1-\alpha}(n-1)] = (-\infty, -t_{0.99}(49)] = (-\infty, -2.4049]$. Since $t_0 \in W$, $H_0$ is rejected at the significance level 0.01.
We conclude that the buyers' complaints are justified; the risk of an error is at most 1%.
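The arithmetic of Example 6.8 can be checked with a short sketch. The quantile $t_{0.99}(49) = 2.4049$ is copied from the example as a table value; the variable names are illustrative assumptions.

```python
from math import sqrt

# Summary statistics from Example 6.8.
n, m, s = 50, 122.0, 8.6
c = 125.0              # declared weight under H0
t_99_49 = 2.4049       # t_{0.99}(49), table value from the example

# Realization of the test statistic T0 = (M - c) / (S / sqrt(n)).
t0 = (m - c) / (s / sqrt(n))

# Left-tailed test: reject H0 when t0 lies in (-inf, -t_{0.99}(49)].
reject = t0 <= -t_99_49
print(round(t0, 4), reject)
```

The realization lies only slightly beyond the critical value, so the conclusion is sensitive to the chosen significance level.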
A random sample from a two-dimensional normal distribution can be converted to a single normal sample. The inferences stated above can then be used.
Remark 6.9 (on a random sample from the two-dimensional normal distribution)
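Let $\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \dots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix}$ be a random sample from $N_2\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}\right)$, $n \ge 2$.
Using a linear transformation, the random vector $\begin{pmatrix} X \\ Y \end{pmatrix}$ is converted to the scalar random variable $Z = X - Y \sim N(\mu_1 - \mu_2,\ \sigma_1^2 - 2\sigma_{12} + \sigma_2^2)$.
Let us denote $\mu = \mu_1 - \mu_2$, $\sigma^2 = \sigma_1^2 - 2\sigma_{12} + \sigma_2^2$.
Thus the random sample $X_1 - Y_1, \dots, X_n - Y_n = Z_1, \dots, Z_n$ follows the normal distribution $N(\mu, \sigma^2)$ and is called the differential random sample.
Theorem 6.10
Let $\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \dots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix} \sim N_2\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}\right)$, $n \ge 2$, and let the variance-covariance matrix $\Sigma$ be unknown. Considering the $100(1-\alpha)\%$ empirical confidence interval, the confidence limits for the parametric function $\mu = \mu_1 - \mu_2$ have the form:
$d = m - \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1)$,    $h = m + \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1)$,
where $m$ and $s$ are the sample mean and the sample standard deviation of the differential random sample.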
Example 6.11
The content of a chemical in a solution was measured by two laboratory methods (data in percentages). The random sample consists of 5 specimens:
specimen number	1	2	3	4	5
1. method	2.3	1.9	2.1	2.4	2.6
2. method	2.4	2.0	2.0	2.3	2.5
Assuming the sample is drawn from a two-dimensional normal distribution, determine the 90% empirical confidence interval for the difference between the expected values of the two methods.
Solution
First we transform the given sample to the differential sample $z_i = x_i - y_i$:
$z_1 = -0.1$,    $z_2 = -0.1$,    $z_3 = 0.1$,    $z_4 = 0.1$,    $z_5 = 0.1$
$m = 0.02$,    $s^2 = 0.012$,    $s = 0.109545$,    $n = 5$,    $t_{1-\alpha/2}(n-1) = t_{0.95}(4) = 2.1318$
$d = m - \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1) = 0.02 - \frac{0.109545}{\sqrt{5}} \cdot 2.1318 = -0.0844$
$h = m + \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1) = 0.02 + \frac{0.109545}{\sqrt{5}} \cdot 2.1318 = 0.1244$
The confidence interval $-0.0844 < \mu < 0.1244$ holds with probability at least 0.90.
Definition 6.12
Let $\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix}, \dots, \begin{pmatrix} X_n \\ Y_n \end{pmatrix}$ be a random sample from $N_2\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}\right)$, $n \ge 2$.
The test of $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \ne 0$ is called the paired t-test. Using the differential random sample, the paired t-test is converted to the one-sample t-test.
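The reduction of a paired sample to the differential sample, as used in Example 6.11, can be sketched as follows. The data are the five pairs of measurements from that example, and $t_{0.95}(4) = 2.1318$ is the table value quoted there; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Paired measurements from Example 6.11 (percentages).
method_1 = [2.3, 1.9, 2.1, 2.4, 2.6]
method_2 = [2.4, 2.0, 2.0, 2.3, 2.5]

# Differential sample z_i = x_i - y_i.
z = [x - y for x, y in zip(method_1, method_2)]
n = len(z)
m = mean(z)      # sample mean of the differences
s = stdev(z)     # sample standard deviation (divisor n - 1)

t_95_4 = 2.1318  # t_{0.95}(4), table value for the 90% two-sided interval

half_width = s / sqrt(n) * t_95_4
d, h = m - half_width, m + half_width
print(round(d, 4), round(h, 4))
```

Once the differences are formed, everything else is the one-sample procedure of Theorem 6.3, case 3.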
Example 6.13
The following table lists the rate of return on investment (in percentages) of 12 randomly drawn companies, whose foreign investments are represented by the random variable X and domestic investments by the random variable Y:
company number	1	2	3	4	5	6	7	8	9	10	11	12
X	10	12	14	12	12	17	9	15	9	11	7	15
Y	11	14	15	11	13	16	10	13	11	17	9	19
Assuming the sample is drawn from a two-dimensional normal distribution, test at the significance level $\alpha = 0.1$ the hypothesis that there is no difference between foreign and domestic investment. Use a) the confidence interval method, b) the classical method.
Solution
First we transform the given sample to the differential sample $Z_i = X_i - Y_i$, $i = 1, \dots, 12$. The realizations of the sample characteristics are $m = -1.3333$, $s^2 = 4.7879$. We are testing the hypothesis $H_0: \mu = 0$ versus $H_1: \mu \ne 0$; the required quantile is $t_{0.95}(11) = 1.7959$.
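ad a)
$d = m - \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1) = -1.3333 - \frac{\sqrt{4.7879}}{\sqrt{12}} \cdot 1.7959 = -2.4677$
$h = m + \frac{s}{\sqrt{n}}\, t_{1-\alpha/2}(n-1) = -1.3333 + \frac{\sqrt{4.7879}}{\sqrt{12}} \cdot 1.7959 = -0.1989$
Since $0 \notin (-2.4677, -0.1989)$, $H_0$ is rejected at the significance level 0.1.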
ad b)
The test statistic: $T_0 = \frac{M - c}{S/\sqrt{n}}$.
Its numerical realization: $t_0 = \frac{-1.3333 - 0}{\sqrt{4.7879}/\sqrt{12}} = -2.11085$.
The critical region: $W = (-\infty, -t_{1-\alpha/2}(n-1)] \cup [t_{1-\alpha/2}(n-1), \infty) = (-\infty, -1.7959] \cup [1.7959, \infty)$.
Since $t_0 \in W$, $H_0$ is rejected at the significance level 0.1.
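Both approaches of Example 6.13 can be reproduced from the raw table. This is a sketch with the standard library; $t_{0.95}(11) = 1.7959$ is the table value quoted in the example, and the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Rates of return from Example 6.13 (percentages).
x = [10, 12, 14, 12, 12, 17, 9, 15, 9, 11, 7, 15]    # foreign
y = [11, 14, 15, 11, 13, 16, 10, 13, 11, 17, 9, 19]  # domestic

z = [xi - yi for xi, yi in zip(x, y)]   # differential sample
n = len(z)
m = mean(z)
s = stdev(z)                            # divisor n - 1
se = s / sqrt(n)

t_95_11 = 1.7959                        # t_{0.95}(11), table value

# a) Confidence interval method: reject H0 if 0 lies outside (d, h).
d, h = m - se * t_95_11, m + se * t_95_11
reject_by_interval = not (d < 0 < h)

# b) Classical method: reject H0 if t0 falls into the two-tailed critical region.
t0 = (m - 0) / se
reject_by_region = abs(t0) >= t_95_11

print(round(d, 4), round(h, 4), round(t0, 4))
print(reject_by_interval, reject_by_region)
```

The two methods use the same quantile and the same standard error, so they necessarily lead to the same decision.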