7 Statistical inference based on two independent samples from the normal distribution

In this chapter we are concerned with two independent samples, where the first sample follows the distribution $N(\mu_1, \sigma_1^2)$ and the second follows the distribution $N(\mu_2, \sigma_2^2)$. We will make inferences (interval estimation and hypothesis testing) about the parametric functions $\mu_1 - \mu_2$ and $\sigma_1^2/\sigma_2^2$. The statistics derived from the sample means and sample variances of the two samples, together with their distributions, are stated in the following theorem.

Theorem 7.1 Consider two independent samples. Let $X_{11}, \dots, X_{1n_1}$ be a random sample from the normal distribution $N(\mu_1, \sigma_1^2)$ and $X_{21}, \dots, X_{2n_2}$ be a random sample from the normal distribution $N(\mu_2, \sigma_2^2)$, where $n_1 \ge 2$, $n_2 \ge 2$. Denote by $M_1, M_2$ the sample means, by $S_1^2, S_2^2$ the sample variances, and by
$$S_*^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$$
the weighted mean of the sample variances. Then:

1. The statistics $M_1 - M_2$ and $S_*^2$ are independent.

2. $$U = \frac{(M_1-M_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}} \sim N(0,1), \quad\text{thus}\quad M_1 - M_2 \sim N\!\left(\mu_1-\mu_2,\ \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}\right).$$
[The pivotal statistic $U$ is used for inferences about $\mu_1-\mu_2$ when $\sigma_1^2, \sigma_2^2$ are known.]

3. If $\sigma_1^2 = \sigma_2^2 =: \sigma^2$, then
$$K = \frac{(n_1+n_2-2)S_*^2}{\sigma^2} \sim \chi^2(n_1+n_2-2).$$
[The pivotal statistic $K$ is used for inferences about the common variance $\sigma^2$.]

4. If $\sigma_1^2 = \sigma_2^2 =: \sigma^2$, then
$$T = \frac{(M_1-M_2)-(\mu_1-\mu_2)}{S_*\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}} \sim t(n_1+n_2-2).$$
[The pivotal statistic $T$ is used for inferences about $\mu_1-\mu_2$ when $\sigma_1^2, \sigma_2^2$ are unknown but equal.]

5. $$F = \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} \sim F(n_1-1,\ n_2-1).$$
[The pivotal statistic $F$ is used for inferences about $\sigma_1^2/\sigma_2^2$.]

Using these pivotal statistics we can construct confidence intervals for the parametric functions $\mu_1-\mu_2$ and $\sigma_1^2/\sigma_2^2$. When estimating $\mu_1-\mu_2$ we have to distinguish whether the variances are known or unknown. If they are unknown, we have to find out whether they are equal or not. The equality of variances can be tested by means of the F-test, which will be shown later.
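To make the step from a pivotal statistic to a confidence interval concrete, the following is a minimal sketch that inverts the pivots $T$ and $F$ of Theorem 7.1. It is an illustration under stated assumptions, not the author's own procedure: the function names are hypothetical, the numerical inputs are the summary statistics quoted in Examples 7.4 and 7.5 later in this chapter, and the quantiles are the tabulated values quoted there rather than values computed by a library.

```python
from math import sqrt


def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Weighted mean of the two sample variances (hypothetical helper name)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def mean_diff_interval(m1, m2, n1, s1_sq, n2, s2_sq, t_quantile):
    """Two-sided interval for mu1 - mu2 from the pivot T (equal unknown variances).

    t_quantile is assumed to be t_{1-alpha/2}(n1 + n2 - 2), read from a table.
    """
    s_pooled = sqrt(pooled_variance(n1, s1_sq, n2, s2_sq))
    half_width = s_pooled * sqrt(1 / n1 + 1 / n2) * t_quantile
    diff = m1 - m2
    return diff - half_width, diff + half_width


def variance_ratio_interval(s1_sq, s2_sq, f_upper, f_lower):
    """Two-sided interval for sigma1^2 / sigma2^2 from the pivot F.

    f_upper is F_{1-alpha/2}(n1-1, n2-1), f_lower is F_{alpha/2}(n1-1, n2-1).
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower


# Summary statistics as quoted in Example 7.4 (chlorine content, g/l).
n1, n2 = 25, 10
m1, m2 = 34.48, 35.59
s1_sq, s2_sq = 1.7482, 1.7121

# Tabulated quantiles as quoted in Examples 7.4 and 7.5 (alpha = 0.05).
T_975_33 = 2.035             # t_{0.975}(33)
F_975_24_9 = 3.6142          # F_{0.975}(24, 9)
F_025_24_9 = 1 / 2.7027      # F_{0.025}(24, 9) = 1 / F_{0.975}(9, 24)

mean_lo, mean_hi = mean_diff_interval(m1, m2, n1, s1_sq, n2, s2_sq, T_975_33)
ratio_lo, ratio_hi = variance_ratio_interval(s1_sq, s2_sq, F_975_24_9, F_025_24_9)

print(f"mu1 - mu2 in ({mean_lo:.3f}, {mean_hi:.3f})")
print(f"sigma1^2 / sigma2^2 in ({ratio_lo:.2f}, {ratio_hi:.2f})")
```

The quantiles are passed in as arguments on purpose: the sketch then depends only on the Python standard library, and the same functions work for any confidence level once the matching table values are supplied.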
The following theorem states the confidence limits for these parametric functions.

Theorem 7.2 Consider two independent samples. Let $X_{11}, \dots, X_{1n_1}$ be a random sample from the normal distribution $N(\mu_1, \sigma_1^2)$ and $X_{21}, \dots, X_{2n_2}$ be a random sample from the normal distribution $N(\mu_2, \sigma_2^2)$, where $n_1 \ge 2$, $n_2 \ge 2$. Further, consider the confidence level $100(1-\alpha)\%$. Then:

1. The confidence interval for $\mu_1-\mu_2$, when $\sigma_1^2, \sigma_2^2$ are known, is derived from the pivotal statistic
$$U = \frac{(M_1-M_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}} \sim N(0,1).$$
The limits are:
two-sided c.i.: $$(d, h) = \left(m_1-m_2-\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\;u_{1-\alpha/2},\ \ m_1-m_2+\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\;u_{1-\alpha/2}\right)$$
left-sided c.i.: $$(d, \infty) = \left(m_1-m_2-\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\;u_{1-\alpha},\ \ \infty\right)$$
right-sided c.i.: $$(-\infty, h) = \left(-\infty,\ \ m_1-m_2+\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\;u_{1-\alpha}\right)$$

2. The confidence interval for the common unknown variance $\sigma^2$ is derived from the pivotal statistic
$$K = \frac{(n_1+n_2-2)S_*^2}{\sigma^2} \sim \chi^2(n_1+n_2-2).$$
The limits are:
two-sided c.i.: $$(d, h) = \left(\frac{(n_1+n_2-2)s_*^2}{\chi^2_{1-\alpha/2}(n_1+n_2-2)},\ \ \frac{(n_1+n_2-2)s_*^2}{\chi^2_{\alpha/2}(n_1+n_2-2)}\right)$$
left-sided c.i.: $$(d, \infty) = \left(\frac{(n_1+n_2-2)s_*^2}{\chi^2_{1-\alpha}(n_1+n_2-2)},\ \ \infty\right)$$
right-sided c.i.: $$(-\infty, h) = \left(-\infty,\ \ \frac{(n_1+n_2-2)s_*^2}{\chi^2_{\alpha}(n_1+n_2-2)}\right)$$

3. The confidence interval for $\mu_1-\mu_2$, when $\sigma_1^2, \sigma_2^2$ are unknown but equal, is derived from the pivotal statistic
$$T = \frac{(M_1-M_2)-(\mu_1-\mu_2)}{S_*\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}} \sim t(n_1+n_2-2).$$
The limits are:
two-sided c.i.: $$(d, h) = \left(m_1-m_2-s_*\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\;t_{1-\alpha/2}(n_1+n_2-2),\ \ m_1-m_2+s_*\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\;t_{1-\alpha/2}(n_1+n_2-2)\right)$$
left-sided c.i.: $$(d, \infty) = \left(m_1-m_2-s_*\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\;t_{1-\alpha}(n_1+n_2-2),\ \ \infty\right)$$
right-sided c.i.: $$(-\infty, h) = \left(-\infty,\ \ m_1-m_2+s_*\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\;t_{1-\alpha}(n_1+n_2-2)\right)$$

4. The confidence interval for the ratio of variances $\sigma_1^2/\sigma_2^2$ is derived from the pivotal statistic
$$F = \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} \sim F(n_1-1,\ n_2-1).$$
The limits are:
two-sided c.i.: $$(d, h) = \left(\frac{s_1^2/s_2^2}{F_{1-\alpha/2}(n_1-1, n_2-1)},\ \ \frac{s_1^2/s_2^2}{F_{\alpha/2}(n_1-1, n_2-1)}\right)$$
left-sided c.i.: $$(d, \infty) = \left(\frac{s_1^2/s_2^2}{F_{1-\alpha}(n_1-1, n_2-1)},\ \ \infty\right)$$
right-sided c.i.: $$(-\infty, h) = \left(-\infty,\ \ \frac{s_1^2/s_2^2}{F_{\alpha}(n_1-1, n_2-1)}\right)$$

Remark 7.3 If the assumption of equal variances in point 3 of the previous theorem does not hold, an approximate $100(1-\alpha)\%$ confidence interval for $\mu_1-\mu_2$ can still be constructed. In this case the statistic $T$, with $S_*\sqrt{1/n_1+1/n_2}$ replaced by $\sqrt{S_1^2/n_1+S_2^2/n_2}$, has approximately Student's distribution $t(\nu)$, where the degrees of freedom are calculated as
$$\nu = \frac{\left(s_1^2/n_1+s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}$$
(the so-called Welch approximation). If $\nu$ is not an integer, use linear interpolation between the tabulated quantiles.

Example 7.4 The chlorine content (g/l) was tested in two tanks. 25 specimens were drawn from the first tank and 10 specimens from the second one. The realizations of the sample statistics are $m_1 = 34.48$, $m_2 = 35.59$, $s_1^2 = 1.7482$, $s_2^2 = 1.7121$. The specimen values are assumed to be realizations of two independent samples drawn from the normal distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$. Determine the 95% empirical confidence interval for the difference of the expected values $\mu_1-\mu_2$.

Solution We have to construct the confidence interval for $\mu_1-\mu_2$ when the variances $\sigma_1^2, \sigma_2^2$ are unknown but equal. It is derived from the pivotal statistic
$$T = \frac{(M_1-M_2)-(\mu_1-\mu_2)}{S_*\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}} \sim t(n_1+n_2-2).$$
We need the quantile $t_{1-\alpha/2}(n_1+n_2-2) = t_{0.975}(33) = 2.035$ and the realization of the weighted mean of the sample variances
$$s_*^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2} = \frac{24\cdot 1.7482+9\cdot 1.7121}{33} = 1.7384.$$
Thus the confidence limits are
$$d = m_1-m_2-s_*\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\;t_{1-\alpha/2}(n_1+n_2-2) = 34.48-35.59-\sqrt{1.7384}\,\sqrt{\frac{1}{25}+\frac{1}{10}}\cdot 2.035 = -2.114,$$
$$h = m_1-m_2+s_*\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\;t_{1-\alpha/2}(n_1+n_2-2) = 34.48-35.59+\sqrt{1.7384}\,\sqrt{\frac{1}{25}+\frac{1}{10}}\cdot 2.035 = -0.106.$$
Hence $\mu_1-\mu_2 \in (-2.114\ \text{g/l},\ -0.106\ \text{g/l})$ with probability at least 0.95.

Example 7.5 Consider the previous example, now assuming that the independent samples come from the distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. Determine the 95% empirical confidence interval for the ratio of variances.
Solution We have to construct the confidence interval for the ratio of variances $\sigma_1^2/\sigma_2^2$. It is derived from the pivotal statistic
$$F = \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} \sim F(n_1-1,\ n_2-1).$$
We need the quantiles $F_{1-\alpha/2}(n_1-1, n_2-1) = F_{0.975}(24, 9) = 3.6142$ and $F_{\alpha/2}(n_1-1, n_2-1) = F_{0.025}(24, 9) = \dfrac{1}{F_{0.975}(9, 24)} = \dfrac{1}{2.7027}$. Thus the confidence limits are
$$d = \frac{s_1^2/s_2^2}{F_{1-\alpha/2}(n_1-1, n_2-1)} = \dots = 0.28, \qquad h = \frac{s_1^2/s_2^2}{F_{\alpha/2}(n_1-1, n_2-1)} = \dots = 2.76.$$
Hence $P\left(\sigma_1^2/\sigma_2^2 \in (0.28;\ 2.76)\right) \ge 0.95$.

Definition 7.6 Consider two independent samples. Let $X_{11}, \dots, X_{1n_1}$ be a random sample from the normal distribution $N(\mu_1, \sigma_1^2)$ and $X_{21}, \dots, X_{2n_2}$ be a random sample from the normal distribution $N(\mu_2, \sigma_2^2)$, where $n_1 \ge 2$, $n_2 \ge 2$. Let $c$ be a constant.

(i) Assume that $\sigma_1^2, \sigma_2^2$ are known. Then the test of $H_0: \mu_1-\mu_2 = c$ versus $H_1: \mu_1-\mu_2 \ne c$ (or alternatively $H_1: \mu_1-\mu_2 < c$, or $H_1: \mu_1-\mu_2 > c$) is called the two-sample z-test.

(ii) Assume that $\sigma_1^2, \sigma_2^2$ are unknown, but that $\sigma_1^2 = \sigma_2^2$ holds. Then the test of $H_0: \mu_1-\mu_2 = c$ versus $H_1: \mu_1-\mu_2 \ne c$ (or alternatively $H_1: \mu_1-\mu_2 < c$, or $H_1: \mu_1-\mu_2 > c$) is called the two-sample t-test.

(iii) The test of $H_0: \sigma_1^2/\sigma_2^2 = 1$ versus $H_1: \sigma_1^2/\sigma_2^2 \ne 1$ (or alternatively $H_1: \sigma_1^2/\sigma_2^2 < 1$, or $H_1: \sigma_1^2/\sigma_2^2 > 1$) is called the F-test.

Remark 7.7 The choice of the test statistic for a particular test is analogous to the choice of the pivotal statistic in Theorem 7.2: for the two-sample z-test the test statistic $T_0$ is derived from the statistic $U$, for the two-sample t-test it is derived from the statistic $T$, and for the F-test it is derived from the statistic $F$.

Theorem 7.8 Consider two independent samples. Let $X_{11}, \dots, X_{1n_1}$ be a random sample from the normal distribution $N(\mu_1, \sigma_1^2)$ and $X_{21}, \dots, X_{2n_2}$ be a random sample from the normal distribution $N(\mu_2, \sigma_2^2)$, where $n_1 \ge 2$, $n_2 \ge 2$. Let $c$ be a constant.

1. In the two-sample z-test at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic
$$T_0 = \frac{M_1-M_2-c}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}$$
falls within the critical region $W$. The critical regions corresponding to the form of the alternative hypothesis are:
two-tailed test, $H_1: \mu_1-\mu_2 \ne c$: $W = (-\infty, -u_{1-\alpha/2}] \cup [u_{1-\alpha/2}, \infty)$
left-tailed test, $H_1: \mu_1-\mu_2 < c$: $W = (-\infty, -u_{1-\alpha}]$
right-tailed test, $H_1: \mu_1-\mu_2 > c$: $W = [u_{1-\alpha}, \infty)$

2. In the two-sample t-test at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic
$$T_0 = \frac{M_1-M_2-c}{S_*\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$
falls within the critical region $W$:
two-tailed test, $H_1: \mu_1-\mu_2 \ne c$: $W = (-\infty, -t_{1-\alpha/2}(n_1+n_2-2)] \cup [t_{1-\alpha/2}(n_1+n_2-2), \infty)$
left-tailed test, $H_1: \mu_1-\mu_2 < c$: $W = (-\infty, -t_{1-\alpha}(n_1+n_2-2)]$
right-tailed test, $H_1: \mu_1-\mu_2 > c$: $W = [t_{1-\alpha}(n_1+n_2-2), \infty)$

3. In the F-test at the significance level $\alpha$, the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$ if the realization of the test statistic
$$T_0 = \frac{S_1^2}{S_2^2}$$
falls within the critical region $W$:
two-tailed test, $H_1: \sigma_1^2/\sigma_2^2 \ne 1$: $W = (0, F_{\alpha/2}(n_1-1, n_2-1)] \cup [F_{1-\alpha/2}(n_1-1, n_2-1), \infty)$
left-tailed test, $H_1: \sigma_1^2/\sigma_2^2 < 1$: $W = (0, F_{\alpha}(n_1-1, n_2-1)]$
right-tailed test, $H_1: \sigma_1^2/\sigma_2^2 > 1$: $W = [F_{1-\alpha}(n_1-1, n_2-1), \infty)$

Example 7.9 In the restaurant "White Pony" the service time was measured 20 times. The results are: 6, 8, 11, 4, 7, 6, 10, 6, 9, 8, 5, 12, 13, 10, 9, 8, 7, 11, 10, 5. In the restaurant "Golden Lion" the same measurement was made 15 times, with the results: 9, 11, 10, 7, 6, 4, 8, 13, 5, 15, 8, 5, 6, 8, 7. Assuming that both samples are independent and normally distributed, test at the 0.05 significance level whether the mean service times in the two restaurants are equal.

Solution At the significance level 0.05 we test $H_0: \mu_1-\mu_2 = 0$ versus $H_1: \mu_1-\mu_2 \ne 0$. Since the variances are unknown, this is a two-sample t-test. This test can be used only if the assumption of equal variances holds.
This equality has to be tested first, and the F-test is the appropriate tool. The realizations of the sample statistics are $m_1 = 8.25$, $m_2 = 8.133$, $s_1^2 = 6.307$, $s_2^2 = 9.41$, and
$$s_*^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2} = \frac{19\cdot 6.307+14\cdot 9.41}{19+14} = 7.623.$$
Thus, at the significance level 0.05, we test the hypothesis $H_0: \sigma_1^2/\sigma_2^2 = 1$ versus $H_1: \sigma_1^2/\sigma_2^2 \ne 1$.

The test statistic is $T_0 = S_1^2/S_2^2$, with numerical realization
$$t_0 = \frac{6.307}{9.41} = 0.6702.$$
The critical region is
$$W = (0, F_{\alpha/2}(n_1-1, n_2-1)] \cup [F_{1-\alpha/2}(n_1-1, n_2-1), \infty) = (0, F_{0.025}(19, 14)] \cup [F_{0.975}(19, 14), \infty)$$
$$= \left(0,\ \frac{1}{F_{0.975}(14, 19)}\right] \cup [2.8607, \infty) = (0,\ 0.3778] \cup [2.8607, \infty).$$
Since $t_0 \notin W$, the hypothesis $H_0$ of equal variances is not rejected at the significance level 0.05, and the two-sample t-test may follow.

We test $H_0: \mu_1-\mu_2 = 0$ versus $H_1: \mu_1-\mu_2 \ne 0$. The test statistic is
$$T_0 = \frac{M_1-M_2-c}{S_*\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}},$$
with numerical realization
$$t_0 = \frac{8.25-8.133}{\sqrt{7.623}\,\sqrt{\dfrac{1}{20}+\dfrac{1}{15}}} = 0.124.$$
The critical region is
$$W = (-\infty, -t_{1-\alpha/2}(n_1+n_2-2)] \cup [t_{1-\alpha/2}(n_1+n_2-2), \infty) = (-\infty, -t_{0.975}(33)] \cup [t_{0.975}(33), \infty) = (-\infty, -2.035] \cup [2.035, \infty).$$
Since $t_0 \notin W$, $H_0$ is not rejected at the significance level 0.05. [Thus the data do not provide evidence against the equality of the mean service times.]
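The two-step procedure of Example 7.9 (F-test for equal variances, then the two-sample t-test) can be retraced numerically. The sketch below is an illustration only: it recomputes the sample statistics from the raw service times with the Python standard library and compares them with the tabulated quantiles quoted in the solution; the variable names are hypothetical. Recomputed from the listed data, the first sample variance comes out at about 6.303 rather than the quoted 6.307, a difference that does not affect either decision.

```python
from math import sqrt
from statistics import mean, variance

# Service times from Example 7.9.
white_pony = [6, 8, 11, 4, 7, 6, 10, 6, 9, 8, 5, 12, 13, 10, 9, 8, 7, 11, 10, 5]
golden_lion = [9, 11, 10, 7, 6, 4, 8, 13, 5, 15, 8, 5, 6, 8, 7]

# Tabulated quantiles as quoted in the solution (alpha = 0.05).
F_025_19_14 = 0.3778   # F_{0.025}(19, 14) = 1 / F_{0.975}(14, 19)
F_975_19_14 = 2.8607   # F_{0.975}(19, 14)
T_975_33 = 2.035       # t_{0.975}(33)

n1, n2 = len(white_pony), len(golden_lion)
m1, m2 = mean(white_pony), mean(golden_lion)
s1_sq, s2_sq = variance(white_pony), variance(golden_lion)  # sample variances (divisor n - 1)

# Step 1: two-tailed F-test of H0: sigma1^2 / sigma2^2 = 1.
f0 = s1_sq / s2_sq
reject_equal_variances = f0 <= F_025_19_14 or f0 >= F_975_19_14

# Step 2: two-tailed two-sample t-test of H0: mu1 - mu2 = 0,
# justified here because equal variances were not rejected in step 1.
c = 0
pooled_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t0 = (m1 - m2 - c) / (sqrt(pooled_sq) * sqrt(1 / n1 + 1 / n2))
reject_equal_means = abs(t0) >= T_975_33

print(f"F-test: f0 = {f0:.4f}, reject H0: {reject_equal_variances}")
print(f"t-test: t0 = {t0:.3f}, reject H0: {reject_equal_means}")
```

Both flags come out false, which matches the conclusions of the solution: neither the equality of variances nor the equality of the mean service times is rejected at the 0.05 level.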