8 Statistical inference based on one sample and two independent samples from the Bernoulli distribution

Theorem 8.1 Let $X_1, \dots, X_n$ be a random sample from the Bernoulli distribution $A(\vartheta)$ and let the condition $n\vartheta(1-\vartheta) > 9$ hold. Let $M = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean (sometimes referred to as the sample proportion). Then the statistic

$$U = \frac{M - \vartheta}{\sqrt{\frac{M(1-M)}{n}}} \approx N(0, 1).$$

This is read: the statistic $U$ asymptotically follows the standard normal distribution. Thus the $100(1-\alpha)\%$ asymptotic confidence limits for the parameter $\vartheta$ are:

$$d = m - \sqrt{\frac{m(1-m)}{n}}\, u_{1-\alpha/2}, \qquad h = m + \sqrt{\frac{m(1-m)}{n}}\, u_{1-\alpha/2},$$

where $m$ is the realization of $M$ and $u_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of $N(0, 1)$.

Remark 8.2 It is essential to understand the interpretation of the mean $M$. The random variable $X_i$ takes the values one and zero, where one stands for success. Then $\sum_{i=1}^{n} X_i$ is the number of successes in $n$ independent trials and the fraction $\frac{1}{n}\sum_{i=1}^{n} X_i$ is the proportion of successes. The sample proportion is the statistic that estimates the success probability $\vartheta$.

Example 8.3 The marketing department of a company analyzes a competitor's market share for a product that the company itself also manufactures. In a random sample of 100 consumers, 34 were found to use the competitor's product; the rest use the company's product. Find the 95% confidence interval for the competitor's share of the market.

Solution Let $X_i$ be a random variable that takes the value 1 if the $i$-th consumer uses the competitor's product and the value 0 otherwise, $i = 1, 2, \dots, 100$. Then $X_i \sim A(\vartheta)$ and $X_1, \dots, X_n$ is a random sample from the Bernoulli distribution. The task is to construct the confidence interval for the parameter $\vartheta$ of this distribution.

$$n = 100, \qquad m = \frac{34}{100}, \qquad u_{1-\alpha/2} = u_{0.975} = 1.96$$

Since the parameter $\vartheta$ in the approximation condition $n\vartheta(1-\vartheta) > 9$ is unknown, it is replaced by its estimate $m$: $100 \cdot 0.34 \cdot 0.66 = 22.44 > 9$. Thus the estimate $m$ satisfies the condition.
Then:

$$d = 0.34 - \sqrt{\frac{0.34 \cdot 0.66}{100}} \cdot 1.96 = 0.2472, \qquad h = 0.34 + \sqrt{\frac{0.34 \cdot 0.66}{100}} \cdot 1.96 = 0.4328$$

Thus $0.2472 < \vartheta < 0.4328$ with probability approximately 0.95. [$\vartheta$ is the probability that a randomly drawn consumer uses the competitor's product; this probability lies within the interval $(0.2472;\ 0.4328)$. The confidence that this interval contains the true parameter is roughly 95%.]

Theorem 8.4 Let $X_1, \dots, X_n$ be a random sample from $A(\vartheta)$, let $c \in (0, 1)$, let $M$ be the sample mean and let the condition $n\vartheta(1-\vartheta) > 9$ hold. At the asymptotic significance level $\alpha$ the null hypothesis $H_0: \vartheta = c$ is rejected in favour of the alternative hypothesis $H_1$ if the realization of the test statistic

$$T_0 = \frac{M - c}{\sqrt{\frac{c(1-c)}{n}}}$$

falls within the critical region $W$. The critical region depends on the form of the alternative hypothesis:

two-tailed test $H_1: \vartheta \neq c$: $W = (-\infty, -u_{1-\alpha/2}] \cup [u_{1-\alpha/2}, \infty)$
left-tailed test $H_1: \vartheta < c$: $W = (-\infty, -u_{1-\alpha}]$
right-tailed test $H_1: \vartheta > c$: $W = [u_{1-\alpha}, \infty)$

[If $H_0$ is true, then $T_0 \approx N(0, 1)$.]

Remark 8.5 The test statistic is derived using the de Moivre-Laplace theorem:

$$T_0 = \frac{M - \vartheta}{\sqrt{\frac{\vartheta(1-\vartheta)}{n}}} \approx N(0, 1),$$

where $\vartheta = c$ under $H_0$.

Remark 8.6 The pivotal statistic used to construct the confidence interval differs from the test statistic stated in the previous theorem! The former estimates the variance by $M(1-M)/n$, whereas the latter uses the hypothesized value $c(1-c)/n$.

Example 8.7 A manufacturer of certain components declares that the probability of producing a defective product is $\vartheta = 0.01$. A random sample of 1000 products was drawn and 16 of them were found to be defective. At the asymptotic significance level 0.05 test the hypothesis $H_0: \vartheta = 0.01$ against $H_1: \vartheta \neq 0.01$.

Solution Since the parameter $\vartheta$ is unknown, the condition of normal approximation $n\vartheta(1-\vartheta) > 9$ is replaced by the condition $nm(1-m) > 9$: $1000 \cdot \frac{16}{1000} \cdot \frac{984}{1000} = 15.744 > 9$, thus the normal approximation is possible. The realization of the test statistic is:

$$t_0 = \frac{16/1000 - 0.01}{\sqrt{\frac{0.01 \cdot 0.99}{1000}}} = 1.907$$

The critical region is:

$$W = (-\infty, -u_{1-\alpha/2}] \cup [u_{1-\alpha/2}, \infty) = (-\infty, -1.96] \cup [1.96, \infty).$$
Since $1.907 \notin W$, $H_0$ is not rejected at the asymptotic significance level 0.05. [Based on the values of the random sample there is no reason to doubt the declared probability 0.01 of manufacturing a defective product.]

Theorem 8.8 Let us consider two independent samples. Let $X_{11}, \dots, X_{1n_1}$ be a random sample from the Bernoulli distribution $A(\vartheta_1)$ and $X_{21}, \dots, X_{2n_2}$ be a random sample from $A(\vartheta_2)$. Let the conditions $n_i\vartheta_i(1-\vartheta_i) > 9$, $i = 1, 2$, hold. Let $M_1, M_2$ be the sample means. Then the statistic

$$U = \frac{(M_1 - M_2) - (\vartheta_1 - \vartheta_2)}{\sqrt{\frac{M_1(1-M_1)}{n_1} + \frac{M_2(1-M_2)}{n_2}}} \approx N(0, 1).$$

Thus the $100(1-\alpha)\%$ asymptotic confidence limits for the parametric function $\vartheta_1 - \vartheta_2$ are:

$$d = m_1 - m_2 - \sqrt{\frac{m_1(1-m_1)}{n_1} + \frac{m_2(1-m_2)}{n_2}}\, u_{1-\alpha/2}$$

$$h = m_1 - m_2 + \sqrt{\frac{m_1(1-m_1)}{n_1} + \frac{m_2(1-m_2)}{n_2}}\, u_{1-\alpha/2}$$

Example 8.9 The management of a supermarket advertised a week of price reductions. The aim was to find out whether the price reductions affect the proportion of heavy shopping (over 500 Kč). During a week without reductions, 200 customers were drawn at random and 97 of them had done heavy shopping. During the week with reductions the size of the random sample was 300 and the number of heavy shoppers was 162. Determine the 95% asymptotic confidence interval for the difference between the probabilities of heavy shopping during the week without reductions and the week with reductions.

Solution The random variable $X_{1i}$ takes the value 1 if, during the week without price reductions, the $i$-th randomly drawn customer does heavy shopping and the value 0 otherwise, $i = 1, \dots, 200$. The random variables $X_{1,1}, \dots, X_{1,200}$ form a random sample from the distribution $A(\vartheta_1)$. Further, the random variable $X_{2i}$ takes the value 1 if, during the week with price reductions, the $i$-th randomly drawn customer does heavy shopping and the value 0 otherwise, $i = 1, \dots, 300$. The random variables $X_{2,1}, \dots, X_{2,300}$ form a random sample from the distribution $A(\vartheta_2)$, and this sample is independent of the previous one.

$$n_1 = 200, \quad n_2 = 300, \quad m_1 = 97/200, \quad m_2 = 162/300.$$
To verify the conditions $n_i\vartheta_i(1-\vartheta_i) > 9$, $i = 1, 2$, of normal approximation, the unknown parameters $\vartheta_i$ are replaced by their estimates $m_i$. These estimates meet the conditions:

$$200 \cdot \frac{97}{200} \cdot \frac{103}{200} = 49.955 > 9, \qquad 300 \cdot \frac{162}{300} \cdot \frac{138}{300} = 74.52 > 9.$$

Thus the $100(1-\alpha)\%$ asymptotic confidence limits for the parametric function $\vartheta_1 - \vartheta_2$ are:

$$d = m_1 - m_2 - \sqrt{\frac{m_1(1-m_1)}{n_1} + \frac{m_2(1-m_2)}{n_2}}\, u_{1-\alpha/2} = \frac{97}{200} - \frac{162}{300} - \sqrt{\frac{\frac{97}{200}\left(1-\frac{97}{200}\right)}{200} + \frac{\frac{162}{300}\left(1-\frac{162}{300}\right)}{300}} \cdot 1.96 = -0.1443$$

$$h = m_1 - m_2 + \sqrt{\frac{m_1(1-m_1)}{n_1} + \frac{m_2(1-m_2)}{n_2}}\, u_{1-\alpha/2} = \frac{97}{200} - \frac{162}{300} + \sqrt{\frac{\frac{97}{200}\left(1-\frac{97}{200}\right)}{200} + \frac{\frac{162}{300}\left(1-\frac{162}{300}\right)}{300}} \cdot 1.96 = 0.0343$$

Hence the parametric function $\vartheta_1 - \vartheta_2 \in (-0.1443,\ 0.0343)$ with probability approximately 0.95.

Theorem 8.10 Let us consider two independent samples. Let $X_{11}, \dots, X_{1n_1}$ be a random sample from the Bernoulli distribution $A(\vartheta_1)$ and $X_{21}, \dots, X_{2n_2}$ be a random sample from $A(\vartheta_2)$. Let the conditions $n_i\vartheta_i(1-\vartheta_i) > 9$, $i = 1, 2$, hold. Let $M_1, M_2$ be the sample means. At the asymptotic significance level $\alpha$ the null hypothesis $H_0: \vartheta_1 - \vartheta_2 = c$ is rejected in favour of the alternative hypothesis if the realization of the test statistic

$$T_0 = \frac{(M_1 - M_2) - c}{\sqrt{\frac{M_1(1-M_1)}{n_1} + \frac{M_2(1-M_2)}{n_2}}}$$

falls within the critical region $W$. The critical region depends on the form of the alternative hypothesis:

two-sided test $H_1: \vartheta_1 - \vartheta_2 \neq c$: $W = (-\infty, -u_{1-\alpha/2}] \cup [u_{1-\alpha/2}, \infty)$
left-sided test $H_1: \vartheta_1 - \vartheta_2 < c$: $W = (-\infty, -u_{1-\alpha}]$
right-sided test $H_1: \vartheta_1 - \vartheta_2 > c$: $W = [u_{1-\alpha}, \infty)$

[If $H_0$ is true, then $T_0 \approx N(0, 1)$.]

Remark 8.11 In the case of $H_0: \vartheta_1 - \vartheta_2 = 0$ ($c = 0$) the following test statistic is preferable:

$$T_0 = \frac{M_1 - M_2}{\sqrt{M(1-M)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{where } M = \frac{n_1 M_1 + n_2 M_2}{n_1 + n_2}.$$

[If $H_0$ is true, then $T_0 \approx N(0, 1)$.]

Example 8.12 Using the data from Example 8.9, test at the asymptotic significance level 0.05 the hypothesis that the week of price reductions does not increase the probability of heavy shopping.

Solution We run the left-tailed test of $H_0: \vartheta_1 - \vartheta_2 = 0$ versus $H_1: \vartheta_1 - \vartheta_2 < 0$ at the asymptotic significance level $\alpha = 0.05$.
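$$n_1 = 200, \quad n_2 = 300, \quad m_1 = 97/200, \quad m_2 = 162/300, \quad m = (97 + 162)/500 = 0.518.$$

The assumptions of normal approximation have been verified in Example 8.9.

a) Using the confidence interval method: for the left-tailed test we use the right-sided confidence interval $(-\infty, h)$:

$$h = m_1 - m_2 + \sqrt{\frac{m_1(1-m_1)}{n_1} + \frac{m_2(1-m_2)}{n_2}}\, u_{1-\alpha} = \frac{97}{200} - \frac{162}{300} + \sqrt{\frac{\frac{97}{200}\left(1-\frac{97}{200}\right)}{200} + \frac{\frac{162}{300}\left(1-\frac{162}{300}\right)}{300}} \cdot 1.645 = 0.02$$

Since the value $c = 0$ lies within the interval $(-\infty;\ 0.02)$, $H_0$ is not rejected at the asymptotic significance level $\alpha = 0.05$. The data therefore give no evidence that the week of price reductions increases the probability of heavy shopping.

b) Using the classical method: the test statistic is

$$T_0 = \frac{M_1 - M_2}{\sqrt{M(1-M)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{where } M = \frac{n_1 M_1 + n_2 M_2}{n_1 + n_2}.$$

$$m = \frac{200 \cdot 97/200 + 300 \cdot 162/300}{200 + 300} = 0.518$$

$$t_0 = \frac{97/200 - 162/300}{\sqrt{0.518\,(1 - 0.518)\left(\frac{1}{200} + \frac{1}{300}\right)}} = -1.2058$$

The critical region is:

$$W = (-\infty, -u_{1-\alpha}] = (-\infty, -u_{0.95}] = (-\infty, -1.645].$$

Since $t_0 \notin W$, $H_0$ is not rejected at the asymptotic significance level $\alpha = 0.05$.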
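
The calculations in Examples 8.3, 8.7, 8.9 and 8.12 can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names (`u`, `one_sample_ci`, `one_sample_t0`, `two_sample_ci`, `pooled_t0`) are hypothetical helpers introduced here for illustration, and the quantiles come from the standard library's `statistics.NormalDist` rather than from the rounded table values 1.96 and 1.645, so results may differ from the text in the last printed digit.

```python
from math import sqrt
from statistics import NormalDist


def u(p):
    """Quantile u_p of the standard normal distribution N(0, 1)."""
    return NormalDist().inv_cdf(p)


def one_sample_ci(successes, n, alpha=0.05):
    """Asymptotic confidence limits (d, h) from Theorem 8.1."""
    m = successes / n
    half_width = sqrt(m * (1 - m) / n) * u(1 - alpha / 2)
    return m - half_width, m + half_width


def one_sample_t0(successes, n, c):
    """Realization t0 of the test statistic from Theorem 8.4."""
    m = successes / n
    return (m - c) / sqrt(c * (1 - c) / n)


def two_sample_ci(s1, n1, s2, n2, alpha=0.05):
    """Asymptotic confidence limits (d, h) for theta1 - theta2 (Theorem 8.8)."""
    m1, m2 = s1 / n1, s2 / n2
    se = sqrt(m1 * (1 - m1) / n1 + m2 * (1 - m2) / n2)
    half_width = se * u(1 - alpha / 2)
    return m1 - m2 - half_width, m1 - m2 + half_width


def pooled_t0(s1, n1, s2, n2):
    """Realization t0 of the pooled test statistic from Remark 8.11 (c = 0)."""
    m1, m2 = s1 / n1, s2 / n2
    m = (s1 + s2) / (n1 + n2)  # weighted mean of the two sample proportions
    return (m1 - m2) / sqrt(m * (1 - m) * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    print("Example 8.3 CI:", one_sample_ci(34, 100))
    print("Example 8.7 t0:", one_sample_t0(16, 1000, 0.01))
    print("Example 8.9 CI:", two_sample_ci(97, 200, 162, 300))
    t0 = pooled_t0(97, 200, 162, 300)
    # Left-tailed test: reject H0 only if t0 <= -u_{1-alpha}
    print("Example 8.12 t0:", t0, "reject H0:", t0 <= -u(0.95))
```

Under these assumptions the sketch should reproduce the limits and test statistics of the worked examples to about four decimal places. The normal-approximation conditions $nm(1-m) > 9$ are not checked by the code and have to be verified separately, as in the solutions above.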