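Seminar 10

Problem 1: A factory employs 15 workers. For each of them, the number of work shifts (variable X) and the number of final products (variable Y) were recorded.

X: 20 21 18 17 20 18 19 21 20 14 16 19 21 15 15
Y: 92 93 83 80 91 85 82 98 90 60 73 86 96 64 81

a) Assuming that the dependence of Y on X is described by a regression line, construct the matrix of regressors, calculate the least-squares estimators of the regression parameters and state the sample regression function.
b) Find the estimator of the variance and the coefficient of determination, and interpret the latter. To simplify part b), the following statistics are given: $S_E = \mathbf{e}'\mathbf{e} = 238{,}5169$ and $s_y^2 = 121{,}4$ ($s_y^2$ is the realization of the sample variance of Y).
c) Find 95% confidence intervals for the regression parameters.
d) At the significance level 0,05, carry out the overall F-test.
e) At the significance level 0,05, carry out the separate t-tests.
f) Estimate the number of final products for 18 work shifts.
g) Draw a scatter plot with the sample regression function.

Solution:

ad a) The matrix $\mathbf{X}$ (15×2) consists of a column of ones and a second column containing the values of X. The least-squares estimators (LSE) of the regression parameters are obtained from the formula $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, where

$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} 15 & 274 \\ 274 & 5084 \end{pmatrix}, \quad (\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{1184}\begin{pmatrix} 5084 & -274 \\ -274 & 15 \end{pmatrix} \approx \begin{pmatrix} 4{,}2939 & -0{,}2314 \\ -0{,}2314 & 0{,}0127 \end{pmatrix}, \quad \mathbf{X}'\mathbf{y} = \begin{pmatrix} 1254 \\ 23246 \end{pmatrix},$$

thus $\mathbf{b} = (b_0, b_1)' \approx (5{,}0101;\ 4{,}3024)'$. Hence the sample regression line is

$$\hat{y} = 5{,}0101 + 4{,}3024\,x.$$

ad b) The estimator of the variance is $s^2 = \dfrac{S_E}{n-p-1} = \dfrac{238{,}5169}{13} \approx 18{,}3475$.
With $m_y = 83{,}6$ and $s_y^2 = 121{,}4$:
$S_T = (n-1)\,s_y^2 = 14 \cdot 121{,}4 = 1699{,}6$,
$S_R = S_T - S_E = 1699{,}6 - 238{,}5169 = 1461{,}0831$.
The coefficient of determination is $R^2 = \dfrac{S_R}{S_T} = \dfrac{1461{,}0831}{1699{,}6} \approx 0{,}8597$. Thus 85,97 % of the variation of Y is explained by the regression line.

ad c) To form the confidence intervals we need the estimated standard errors $s(b_j) = s\sqrt{v_{jj}}$, where $v_{jj}$ are the diagonal elements of $(\mathbf{X}'\mathbf{X})^{-1}$: $v_{00} = 4{,}2939$ and $v_{11} = 0{,}0127$. With $s = \sqrt{18{,}3475} \approx 4{,}2834$ we get $s(b_0) = s\sqrt{v_{00}} \approx 8{,}8759$ and $s(b_1) = s\sqrt{v_{11}} \approx 0{,}4827$. The limits of the 95% confidence intervals for the regression parameters $\beta_0$ and $\beta_1$ are then derived from the formula

$$b_j \pm t_{1-\alpha/2}(n-p-1)\, s(b_j), \quad j = 0, 1,$$

with $t_{0{,}975}(13) \approx 2{,}1604$.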
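The hand computations of parts a) to c) can be checked numerically. Below is a minimal sketch in plain Python (standard library only; no NumPy or SciPy is assumed). The quantiles $t_{0,975}(13) \approx 2{,}1604$ and $F_{0,95}(1, 13) \approx 4{,}6672$ are hard-coded table values, not computed. The same script also evaluates the statistics used in parts d) to f). Because it keeps full precision (for instance $v_{11} = 15/1184$ instead of the rounded 0,0127), the last digits may differ slightly from the hand calculation; the interval for $\beta_1$, for example, comes out as roughly (3,2608; 5,3439).

```python
# Problem 1: regression line y = b0 + b1*x fitted by least squares.
# Plain-Python sketch; the critical values are table values (assumed, not computed).

x = [20, 21, 18, 17, 20, 18, 19, 21, 20, 14, 16, 19, 21, 15, 15]  # work shifts
y = [92, 93, 83, 80, 91, 85, 82, 98, 90, 60, 73, 86, 96, 64, 81]  # final products
n, p = len(x), 1  # p = number of regressors besides the intercept

# Matrix of regressors: a column of ones and the column of x values (15 x 2).
X = [[1.0, float(xi)] for xi in x]

# X'X (2 x 2) and X'y (2 x 1)
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]

# (X'X)^{-1} via the explicit inverse formula for a 2 x 2 matrix
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
inv = [[xtx[1][1] / det, -xtx[0][1] / det],
       [-xtx[1][0] / det, xtx[0][0] / det]]

# Least-squares estimate b = (X'X)^{-1} X'y
b = [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]

# Sums of squares, variance estimator s^2 and coefficient of determination R^2
fitted = [b[0] + b[1] * xi for xi in x]
s_e = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
mean_y = sum(y) / n
s_t = sum((yi - mean_y) ** 2 for yi in y)
s_r = s_t - s_e
s2 = s_e / (n - p - 1)
r2 = s_r / s_t

# Standard errors s(b_j) = s * sqrt(v_jj) and 95% confidence intervals
t_crit = 2.1604  # t_{0.975}(13), table value
f_crit = 4.6672  # F_{0.95}(1, 13), table value
se = [(s2 * inv[j][j]) ** 0.5 for j in range(2)]
ci = [(b[j] - t_crit * se[j], b[j] + t_crit * se[j]) for j in range(2)]

# Overall F statistic, separate t statistics and the prediction for 18 shifts
f_stat = (s_r / p) / s2
t_stats = [b[j] / se[j] for j in range(2)]
y_hat_18 = b[0] + b[1] * 18

print("b =", [round(v, 4) for v in b])
print("S_E =", round(s_e, 4), " s^2 =", round(s2, 4), " R^2 =", round(r2, 4))
print("95% CIs:", [(round(lo, 4), round(hi, 4)) for lo, hi in ci])
print("F =", round(f_stat, 4), " reject H0:", f_stat >= f_crit)
print("t =", [round(t, 4) for t in t_stats],
      " reject H0:", [abs(t) >= t_crit for t in t_stats])
print("prediction for x = 18:", round(y_hat_18, 2))
```

In the simple-regression case the squared t statistic for the slope equals the F statistic exactly, which the script makes easy to confirm.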
* For $\beta_0$ the limits are $5{,}0101 \pm 2{,}1604 \cdot 8{,}8759$, thus $-14{,}1654 < \beta_0 < 24{,}1856$ with 95% confidence.
* For $\beta_1$ the limits are $4{,}3024 \pm 2{,}1604 \cdot 0{,}4827$, thus $3{,}2596 < \beta_1 < 5{,}3452$ with 95% confidence.

ad d) In the overall F-test we test $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$ at the significance level $\alpha = 0{,}05$. The realization of the test statistic is in the last column of the following ANOVA table:

| source of variation | sum of squares | degrees of freedom | mean square | F statistic |
|---|---|---|---|---|
| model | $S_R = 1461{,}0831$ | $p = 1$ | $S_R/p = 1461{,}0831$ | 79,6341 |
| residual | $S_E = 238{,}5169$ | $n-p-1 = 13$ | $S_E/(n-p-1) = 18{,}3475$ | - |
| total | $S_T = 1699{,}6$ | $n-1 = 14$ | - | - |

thus $F = \dfrac{S_R/p}{S_E/(n-p-1)} = \dfrac{1461{,}0831}{18{,}3475} \approx 79{,}6341$ and the critical region is $W = [F_{0{,}95}(1, 13);\ \infty) = [4{,}6672;\ \infty)$. Since $F \in W$, we reject the null hypothesis at the level 0,05; the slope $\beta_1$ is significant in our model.

ad e) In the separate t-tests we test:

I. $H_0: \beta_0 = 0$ versus $H_1: \beta_0 \neq 0$ at the significance level $\alpha = 0{,}05$. The realization of the test statistic is $t = b_0/s(b_0) = 5{,}0101/8{,}8759 \approx 0{,}5645$, and the critical region is $W = \{t : |t| \geq t_{0{,}975}(13) = 2{,}1604\}$. Since $t \notin W$, we do not reject the null hypothesis at the level 0,05; the intercept $\beta_0$ is not significantly different from zero.

II. $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$ at the significance level $\alpha = 0{,}05$. The realization of the test statistic is $t = b_1/s(b_1) = 4{,}3024/0{,}4827 \approx 8{,}91$, and the critical region is again $W = \{t : |t| \geq 2{,}1604\}$. Since $t \in W$, we reject the null hypothesis at the level 0,05; the slope $\beta_1$ is significant in our model.

For a regression line, the t-test for $\beta_1$ is equivalent to the overall F-test, because $t^2 = F$ (here $8{,}91^2 \approx 79{,}4$; the small difference from $F = 79{,}6341$ comes from rounding $v_{11}$ to 0,0127).

ad f) For $x = 18$ the regression estimate is $\hat{y} = 5{,}0101 + 4{,}3024 \cdot 18 \approx 82{,}45$ final products.

ad g) [Figure: scatter plot of Y against X with the sample regression line]

Problem 2: For a Škoda 120 car, the petrol consumption (l/100 km) depends on the speed (km/h).

| speed (km/h) | 40 | 50 | 60 | 70 | 80 | 90 | 100 | 110 |
|---|---|---|---|---|---|---|---|---|
| consumption (l/100 km) | 5,7 | 5,4 | 5,2 | 5,2 | 5,8 | 6,0 | 7,5 | 8,1 |

a) Draw a scatter plot of the data and suggest the form of the regression function.
b) Construct the matrix of regressors, calculate the least-squares estimators of the regression parameters, find the estimator of the variance and the coefficient of determination, and interpret the latter.
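c) Find 95% confidence intervals for the regression parameters.
d) At the significance level 0,05, carry out the overall F-test.
e) At the significance level 0,05, carry out the separate t-tests.
f) Estimate the petrol consumption at a speed of 80 km/h.
g) Draw a scatter plot with the sample regression function.

Solution:

ad a) [Figure: scatter plot of consumption against speed] The consumption first decreases and then increases with speed, which suggests a quadratic regression function $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$.

ad b) The matrix of regressors (8×3) consists of a column of ones, the speeds and the squared speeds:

$$\mathbf{X} = \begin{pmatrix} 1 & 40 & 1600 \\ 1 & 50 & 2500 \\ 1 & 60 & 3600 \\ 1 & 70 & 4900 \\ 1 & 80 & 6400 \\ 1 & 90 & 8100 \\ 1 & 100 & 10000 \\ 1 & 110 & 12100 \end{pmatrix}, \quad \mathbf{X}'\mathbf{X} = \begin{pmatrix} 8 & 600 & 49200 \\ 600 & 49200 & 4320000 \\ 49200 & 4320000 & 398760000 \end{pmatrix}, \quad \mathbf{X}'\mathbf{y} = \begin{pmatrix} 48{,}9 \\ 3819 \\ 325550 \end{pmatrix}.$$

From $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ we obtain the sample regression function

$$\hat{y} = 9{,}751786 - 0{,}150536\,x + 0{,}001244\,x^2.$$

The fitted values and residuals are
$\hat{\mathbf{y}} = \mathbf{X}\mathbf{b} \approx (5{,}7208;\ 5{,}3351;\ 5{,}1982;\ 5{,}3101;\ 5{,}6708;\ 6{,}2804;\ 7{,}1387;\ 8{,}2458)'$,
$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} \approx (-0{,}0208;\ 0{,}0649;\ 0{,}0018;\ -0{,}1101;\ 0{,}1292;\ -0{,}2804;\ 0{,}3613;\ -0{,}1458)'$,
so $S_E = \mathbf{e}'\mathbf{e} = 0{,}263869$ and the estimator of the variance is $s^2 = S_E/(n-p-1) = 0{,}263869/5 \approx 0{,}05277$.

$S_T = (\mathbf{y} - \mathbf{m}_y)'(\mathbf{y} - \mathbf{m}_y)$, where $\mathbf{m}_y$ is the column vector (n×1) whose entries all equal the sample mean of Y, $m_y = 6{,}1125$; thus $S_T = 8{,}32875$ (alternatively $S_T = (n-1)\,s_y^2$). Then $S_R = S_T - S_E = 8{,}32875 - 0{,}263869 = 8{,}06488$ and $R^2 = S_R/S_T = 8{,}06488/8{,}32875 \approx 0{,}9683$, so 96,83 % of the variation in consumption is explained by the quadratic regression function.

ad c) The diagonal elements of $(\mathbf{X}'\mathbf{X})^{-1}$ are $v_{00} \approx 16{,}9464$, $v_{11} \approx 0{,}013631$ and $v_{22} \approx 5{,}952 \cdot 10^{-7}$; with $s = \sqrt{0{,}05277} \approx 0{,}2297$ this gives $s(b_0) \approx 0{,}9457$, $s(b_1) \approx 0{,}02682$ and $s(b_2) \approx 0{,}000177$. The limits are $b_j \pm t_{0{,}975}(5)\, s(b_j)$ with $t_{0{,}975}(5) \approx 2{,}5706$:

I for $\beta_0$: $7{,}3208 < \beta_0 < 12{,}1828$
II for $\beta_1$: $-0{,}2195 < \beta_1 < -0{,}0816$
III for $\beta_2$: $0{,}000788 < \beta_2 < 0{,}001700$

each with 95% confidence.

ad d) F-test, $\alpha = 0{,}05$: $H_0: (\beta_1, \beta_2) = (0, 0)$ versus $H_1: (\beta_1, \beta_2) \neq (0, 0)$.

| source of variation | sum of squares | degrees of freedom | mean square | F statistic |
|---|---|---|---|---|
| model | $S_R = 8{,}06488$ | $p = 2$ | $S_R/p = 4{,}03244$ | 76,41 |
| residual | $S_E = 0{,}263869$ | $n-p-1 = 5$ | $S_E/(n-p-1) = 0{,}05277$ | - |
| total | $S_T = 8{,}32875$ | $n-1 = 7$ | - | - |

$F = \dfrac{S_R/p}{S_E/(n-p-1)} = \dfrac{4{,}03244}{0{,}05277} \approx 76{,}41$ and the critical region is $W = [F_{0{,}95}(2, 5);\ \infty) = [5{,}7861;\ \infty)$. Since $F \in W$, we reject $H_0$: the regression function as a whole is significant.

ad e) t-tests, $\alpha = 0{,}05$; in each case the critical region is $W = \{t : |t| \geq t_{0{,}975}(5) = 2{,}5706\}$.
I for $\beta_0$: $H_0: \beta_0 = 0$ versus $H_1: \beta_0 \neq 0$; $t = b_0/s(b_0) \approx 10{,}31 \in W$, so we reject $H_0$.
II for $\beta_1$: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$; $t = b_1/s(b_1) \approx -5{,}61 \in W$, so we reject $H_0$.
III for $\beta_2$: $H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$; $t = b_2/s(b_2) \approx 7{,}02 \in W$, so we reject $H_0$.
All three parameters are significant at the level 0,05.

ad f) For $x = 80$ the regression estimate is $\hat{y} = 9{,}751786 - 0{,}150536 \cdot 80 + 0{,}001244 \cdot 80^2 \approx 5{,}67$ l/100 km.

ad g) [Figure: scatter plot of consumption against speed with the fitted quadratic regression function]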
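The numbers of Problem 2 can be checked with a similar plain-Python sketch. With three parameters the explicit 2×2 inverse no longer applies, so the sketch uses a small Gauss-Jordan inversion with partial pivoting (a helper written for this example, not a library function). The quantiles $t_{0,975}(5) \approx 2{,}5706$ and $F_{0,95}(2, 5) \approx 5{,}7861$ are hard-coded table values.

```python
# Problem 2: quadratic regression y = b0 + b1*x + b2*x^2 fitted by least squares.
# Plain-Python sketch; the critical values are table values (assumed, not computed).

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    # Augment a with the identity matrix: [a | I]
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for col in range(k):
        # Swap in the row with the largest pivot to limit rounding errors
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        # Eliminate this column from all other rows
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]  # right half now holds a^{-1}

speed = [40, 50, 60, 70, 80, 90, 100, 110]              # km/h
consumption = [5.7, 5.4, 5.2, 5.2, 5.8, 6.0, 7.5, 8.1]  # l/100 km
n, p = len(speed), 2  # p = number of regressors besides the intercept

# Matrix of regressors: ones, x, x^2 (8 x 3)
X = [[1.0, float(v), float(v * v)] for v in speed]
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * yi for row, yi in zip(X, consumption)) for i in range(3)]

# b = (X'X)^{-1} X'y
inv = inverse(xtx)
b = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]

# Fitted values, residuals, sums of squares, s^2 and R^2
fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(consumption, fitted)]
s_e = sum(e * e for e in residuals)
mean_y = sum(consumption) / n
s_t = sum((yi - mean_y) ** 2 for yi in consumption)
s_r = s_t - s_e
s2 = s_e / (n - p - 1)
r2 = s_r / s_t

# Confidence intervals, t statistics, overall F statistic, prediction at 80 km/h
t_crit = 2.5706  # t_{0.975}(5), table value
f_crit = 5.7861  # F_{0.95}(2, 5), table value
se = [(s2 * inv[j][j]) ** 0.5 for j in range(3)]
ci = [(b[j] - t_crit * se[j], b[j] + t_crit * se[j]) for j in range(3)]
t_stats = [b[j] / se[j] for j in range(3)]
f_stat = (s_r / p) / s2
y_hat_80 = b[0] + b[1] * 80 + b[2] * 80 ** 2

print("b =", [round(v, 6) for v in b])
print("S_E =", round(s_e, 6), " s^2 =", round(s2, 5), " R^2 =", round(r2, 4))
print("95% CIs:", [(round(lo, 6), round(hi, 6)) for lo, hi in ci])
print("t =", [round(t, 2) for t in t_stats],
      " reject H0:", [abs(t) >= t_crit for t in t_stats])
print("F =", round(f_stat, 2), " reject H0:", f_stat >= f_crit)
print("prediction for 80 km/h:", round(y_hat_80, 2))
```

Working with raw powers of the speed makes $\mathbf{X}'\mathbf{X}$ badly scaled (entries from 8 up to about $4 \cdot 10^8$). For a 3×3 system in double precision this is harmless; for larger problems, centring the speed before squaring would be the more robust choice.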