Ketevani Kapanadze
Brno, 2020
Multiple Regression Analysis: Estimation and Inference (Chapters 3 and 4)

From previous class
• Properties of OLS on any sample of data
• Fitted values and residuals
– Fitted or predicted values
– Deviations from the regression line (= residuals)
• Algebraic properties of OLS regression
– Deviations from the regression line sum up to zero
– The correlation between the deviations and the regressors is zero
– The sample averages of y and x lie on the regression line

Properties of OLS
• Goodness-of-fit
• Measures of variation
– Total sum of squares (SST): represents the total variation in the dependent variable
– Explained sum of squares (SSE): represents the variation explained by the regression
– Residual sum of squares (SSR): represents the variation not explained by the regression
• „How well does the explanatory variable explain the dependent variable?“

Properties of OLS
• Decomposition of total variation: SST = SSE + SSR (total variation = explained part + unexplained part)
• Goodness-of-fit measure (R-squared): R² = SSE/SST = 1 − SSR/SST
• R-squared measures the fraction of the total variation that is explained by the regression

Properties of OLS: Examples
• CEO salary and return on equity: the regression explains only 1.3% of the total variation in salaries
• Voting outcomes and campaign expenditures: the regression explains 85.6% of the total variation in election outcomes
• Caution: a high R-squared does not necessarily mean that the regression has a causal interpretation!

Expected Values and Variances of the OLS Estimators
• The estimated regression coefficients are random variables because they are calculated from a random sample (the data are random and depend on the particular sample that has been drawn)
• The question is what the estimators estimate on average and how large their variability in repeated samples is

• Standard assumptions for the linear regression model
• Assumption SLR.1 (Linear in parameters): in the population, the relationship between y and x is linear
• Assumption SLR.2 (Random sampling): the data are a random sample drawn from the population; each data point therefore follows the population equation

• Assumptions for the linear regression model (cont.)
• Assumption SLR.3 (Sample variation in the explanatory variable): the values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable)
• Assumption SLR.4 (Zero conditional mean): the value of the explanatory variable must contain no information about the mean of the unobserved factors

• Theorem 2.1 (Unbiasedness of OLS): under assumptions SLR.1–SLR.4, the OLS estimators are unbiased for the population parameters
• Interpretation of unbiasedness
– The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw
– However, on average they will be equal to the values that characterize the true relationship between y and x in the population
– „On average“ means: if sampling were repeated, i.e. if drawing the random sample and doing the estimation were repeated many times
– In a given sample, estimates may differ considerably from the true values
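The following minimal sketch (Python, simulated data) illustrates Theorem 2.1: across many repeated samples the OLS slope estimates average out to the true population slope, even though any single estimate misses it. The population model, parameter values, and sample size are illustrative assumptions, not the lecture's data.

```python
# Minimal sketch of Theorem 2.1 (unbiasedness) using simulated data.
# beta0, beta1, n and reps are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, n, reps = 1.0, 0.5, 100, 5000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, n)           # explanatory variable varies (SLR.3)
    u = rng.normal(0, 2, n)             # unobserved factors with E(u|x) = 0 (SLR.4)
    y = beta0 + beta1 * x + u           # population model, linear in parameters (SLR.1)
    xd, yd = x - x.mean(), y - y.mean()
    slopes[r] = (xd @ yd) / (xd @ xd)   # OLS slope for this particular sample

print(slopes.mean())  # averages out close to the true value 0.5
print(slopes.std())   # sampling variability across repeated samples
```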
Expected Values and Variances of the OLS Estimators
• Variances of the OLS estimators
– Depending on the sample, the estimates will be nearer to or farther away from the true population values
– How far can we expect our estimates to be away from the true population values on average (= sampling variability)?
– Sampling variability is measured by the estimators‘ variances
• Assumption SLR.5 (Homoskedasticity): the value of the explanatory variable must contain no information about the variability of the unobserved factors

• Graphical illustration of homoskedasticity: the variability of the unobserved influences does not depend on the value of the explanatory variable

• An example of heteroskedasticity (wage and education): the variance of the unobserved determinants of wages increases with the level of education

• Theorem 2.2 (Variances of the OLS estimators): under assumptions SLR.1–SLR.5, the sampling variance of the slope estimator is σ²/SST_x, where σ² is the error variance and SST_x = Σ(x_i − mean of x)² is the total sample variation in x (an analogous formula holds for the intercept)
• Conclusion: the sampling variability of the estimated regression coefficients is larger the larger the variability of the unobserved factors, and smaller the larger the variation in the explanatory variable

• Estimating the error variance
– The variance of u does not depend on x, i.e. it is equal to the unconditional variance
– One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately, this estimate would be biased
– An unbiased estimate of the error variance can be obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients (here n − 2)

• Theorem 2.3 (Unbiasedness of the error variance): the estimator just described is unbiased for σ²
• Calculation of standard errors for regression coefficients: the estimated standard deviations of the regression coefficients are called „standard errors“; they measure how precisely the regression coefficients are estimated (plug in the estimate of σ for the unknown σ)

MULTIPLE REGRESSION MODELS

Multiple Regression Analysis: Estimation
• Definition of the multiple linear regression model: y = β0 + β1·x1 + β2·x2 + … + βk·xk + u
– y: dependent variable, explained variable, response variable, …
– x1, …, xk: independent variables, explanatory variables, controls, …
– u: error term, disturbance, unobservables, …
– β0: intercept; β1, …, βk: slope parameters

Estimation: Motivation
• Motivation for multiple regression
– Incorporate more explanatory factors into the model
– Explicitly hold fixed other factors that otherwise would end up in the error term
– Allow for more flexible functional forms
• Example: wage equation
– Hourly wage is regressed on years of education and labor market experience; the error term collects all other factors…
– The coefficient on education now measures the effect of education explicitly holding experience fixed

• Example: average test scores and per-student spending
– The average standardized test score of a school is regressed on per-student spending at this school and the average family income of students at this school; other factors go into the error term
– Per-student spending is likely to be correlated with average family income at a given high school because of school financing
– Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores
– In a simple regression model, the effect of per-student spending would partly include the effect of family income on test scores

• Example: family income and family consumption
– Family consumption is explained by family income, family income squared, and other factors
– The model has two explanatory variables: income and income squared
– Consumption is explained as a quadratic function of income
– One has to be very careful when interpreting the coefficients: by how much does consumption increase if income is increased by one unit? The answer depends on how much income there already is
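A minimal sketch (Python, simulated data) of the last point: in a quadratic specification the effect of one more unit of income is not a single number but β1 + 2·β2·income, so it must be evaluated at a chosen income level. The data-generating process and variable names below are illustrative assumptions.

```python
# Minimal sketch: marginal effect in a quadratic regression, using simulated data.
# cons = b0 + b1*inc + b2*inc**2 + u; all numbers below are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
inc = rng.uniform(10, 100, 500)                          # family income
cons = 5 + 0.8 * inc - 0.002 * inc**2 + rng.normal(0, 5, 500)

X = sm.add_constant(np.column_stack([inc, inc**2]))      # constant, inc, inc^2
res = sm.OLS(cons, X).fit()
b1, b2 = res.params[1], res.params[2]

# The effect of one additional unit of income depends on the income level:
for level in (20, 50, 80):
    print(level, b1 + 2 * b2 * level)                    # d cons / d inc at `level`
```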
Estimation: Motivation
• Example: CEO salary, sales, and CEO tenure
– The log of CEO salary is regressed on log sales and a quadratic function of CEO tenure with the firm
– The model assumes a constant elasticity relationship between CEO salary and the sales of his or her firm
– The model assumes a quadratic relationship between CEO salary and his or her tenure with the firm
• Meaning of „linear“ regression: the model has to be linear in the parameters (not in the variables)

Estimation: OLS Estimation of the Multiple Regression Model
• Random sample drawn from the population model
• Regression residuals: the differences between the observed and the fitted values of the dependent variable
• Minimize the sum of squared residuals, i.e. choose the estimates so that the sum of the squared differences between the observed values and the fitted regression line is as small as possible (the minimization will be carried out by the computer)

• Interpretation of the multiple regression model
– The coefficient on the j-th regressor answers: by how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant?
– The multiple linear regression model manages to hold the values of the other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration
– „Ceteris paribus“ interpretation
– It still has to be assumed that unobserved factors do not change if the explanatory variables are changed

• Example: determinants of college GPA
– colGPA: grade point average at college; hsGPA: high school grade point average; ACT: achievement test score
• Interpretation
– Holding ACT fixed, another point on the high school grade point average is associated with another .453 points of college grade point average
– Or: if we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 higher than that of student B
– Holding the high school grade point average fixed, another 10 points on the ACT are associated with less than one additional point of college GPA

• Properties of OLS on any sample of data
• Fitted values and residuals (fitted or predicted values; residuals)
• Algebraic properties of OLS regression
– The deviations from the regression line sum up to zero
– The correlations between the deviations and the regressors are zero
– The sample averages of y and of the regressors lie on the regression line

• Goodness-of-fit
– Decomposition of total variation: SST = SSE + SSR
– R-squared: R² = SSE/SST = 1 − SSR/SST
– Alternative expression: R-squared is equal to the squared correlation coefficient between the actual and the predicted values of the dependent variable
– Notice that R-squared can only increase if another explanatory variable is added to the regression

Estimation: The Expected Value of the OLS Estimators
• Standard assumptions for the multiple regression model
• Assumption MLR.1 (Linear in parameters)
• Assumption MLR.2 (Random sampling)
• Assumption MLR.3 (No perfect collinearity): „In the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables“
• Assumption MLR.4 (Zero conditional mean)

• Example of perfect collinearity: small sample
– In a small sample, avginc may accidentally be an exact multiple of expend; it will not be possible to disentangle their separate effects because there is exact covariation
• Example of perfect collinearity: relationships between regressors
– Either shareA or shareB has to be dropped from the regression because there is an exact linear relationship between them: shareA + shareB = 1
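A minimal sketch (Python, simulated data) of the shareA/shareB case above: because the two shares add up to one, the regressor matrix (which also contains a constant) is rank deficient, so the two separate coefficients are not identified. The variable names follow the slide; the data are simulated.

```python
# Minimal sketch: perfect collinearity (violation of MLR.3) with simulated data.
import numpy as np

rng = np.random.default_rng(2)
shareA = rng.uniform(0, 1, 200)
shareB = 1.0 - shareA                       # exact linear relationship: shareA + shareB = 1
X = np.column_stack([np.ones(200), shareA, shareB])

print(np.linalg.matrix_rank(X))             # 2, not 3: the columns are linearly dependent
# Dropping one of the two shares restores full column rank (and identification):
print(np.linalg.matrix_rank(X[:, :2]))      # 2 columns, rank 2
```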
Estimation: The Expected Value of the OLS Estimators
• Assumption MLR.4 (Zero conditional mean): the values of the explanatory variables must contain no information about the mean of the unobserved factors
• In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error
• Example: average test scores
– If avginc were not included in the regression, it would end up in the error term; it would then be hard to defend that expend is uncorrelated with the error

• Discussion of the zero conditional mean assumption
– Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of assumption MLR.4
– Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanatory variables are exogenous
– Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators
• Theorem 3.1 (Unbiasedness of OLS): under assumptions MLR.1–MLR.4, the OLS estimators are unbiased
– Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values

• Including irrelevant variables in a regression model
– Suppose a regressor is included whose coefficient is equal to zero in the population
– This causes no problem for unbiasedness, because the expected value of its estimated coefficient is zero; however, including irrelevant variables may increase the sampling variance
• Omitting relevant variables: the simple case
– True model (contains x1 and x2); estimated model (x2 is omitted)

• Omitted variable bias
– If x1 and x2 are correlated, assume a linear regression relationship between them (x2 regressed on x1, with its own error term)
– If y is only regressed on x1, the estimated intercept and the estimated slope on x1 absorb part of the effect of the omitted x2: on average, the short-regression slope equals the true slope on x1 plus the product of the slope on x2 and the slope from regressing x2 on x1
• Conclusion: all estimated coefficients will be biased

• Example: omitting ability in a wage equation
– Both the effect of ability on wages and the slope from regressing ability on education will be positive
– The return to education will therefore be overestimated, because the bias term (the product of these two positive quantities) is positive: it will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average
• When is there no omitted variable bias? If the omitted variable is irrelevant or uncorrelated with the included regressor

• Omitted variable bias: more general cases
– True model (contains x1, x2 and x3); estimated model (x3 is omitted)
– No general statements are possible about the direction of the bias
– The analysis is as in the simple case if one regressor is uncorrelated with the others
• Example: omitting ability in a wage equation
– If exper is approximately uncorrelated with educ and abil, then the direction of the omitted variable bias can be analyzed as in the simple two-variable case
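A minimal sketch (Python, simulated data) of the omitted variable bias just described: when the omitted regressor is correlated with the included one, the short regression's slope estimates b1 + b2·d1 on average rather than b1. All parameter values and variable roles (think education and ability) are illustrative assumptions.

```python
# Minimal sketch of omitted variable bias with simulated data.
# True model: y = b0 + b1*x1 + b2*x2 + u, with x2 = d0 + d1*x1 + v (correlated with x1).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, b1, b2, d1 = 2000, 0.08, 0.5, 0.6        # illustrative values (think educ and ability)

x1 = rng.normal(12, 2, n)                   # included regressor, e.g. years of education
x2 = 1.0 + d1 * x1 + rng.normal(0, 1, n)    # omitted regressor, e.g. ability
y = 0.5 + b1 * x1 + b2 * x2 + rng.normal(0, 1, n)

long = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()   # x2 included
short = sm.OLS(y, sm.add_constant(x1)).fit()                         # x2 omitted

print(long.params[1])    # close to b1 = 0.08
print(short.params[1])   # close to b1 + b2*d1 = 0.38: the slope is biased upward
```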
Estimation: The Variance of the OLS Estimators
• Standard assumptions for the multiple regression model (cont.)
• Assumption MLR.5 (Homoskedasticity)
– Shorthand notation: all explanatory variables are collected in a random vector
• Under assumptions MLR.1–MLR.5, the sampling variance of the slope estimator for x_j is σ² / [SST_j · (1 − R_j²)], where SST_j is the total sample variation in x_j and R_j² is the R-squared from regressing x_j on all other explanatory variables

• Components of the OLS variances
• 1) The error variance σ²
– A high error variance increases the sampling variance because there is more „noise“ in the equation
– A large error variance necessarily makes estimates imprecise
– The error variance does not decrease with the sample size
• 2) The total sample variation in the explanatory variable, SST_j
– More sample variation leads to more precise estimates
– Total sample variation automatically increases with the sample size
– Increasing the sample size is thus a way to get more precise estimates

• 3) Linear relationships among the independent variables
– Regress x_j on all other independent variables (including a constant); the R-squared of this regression, R_j², will be higher the better x_j can be linearly explained by the other independent variables
– The sampling variance of the coefficient on x_j will be higher the better the explanatory variable x_j can be linearly explained by the other independent variables
– The problem of almost linearly dependent explanatory variables is called multicollinearity (i.e. R_j² close to one for some j)

• An example of multicollinearity: the average standardized test score of a school is regressed on expenditures for teachers, expenditures for instructional materials, and other expenditures
– The different expenditure categories will be strongly correlated, because if a school has a lot of resources it will spend a lot on everything
– It will be hard to estimate the differential effects of different expenditure categories because all expenditures are either high or low
– For precise estimates of the differential effects, one would need information about situations where expenditure categories change differentially
– As a consequence, the sampling variance of the estimated effects will be large

• Discussion of the multicollinearity problem
– In the above example, it would probably be better to lump all expenditure categories together because their effects cannot be disentangled
– In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias)
– Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of the other effects may be very precise
– Note that multicollinearity is not a violation of MLR.3 in the strict sense
– Multicollinearity may be detected through „variance inflation factors“, VIF_j = 1/(1 − R_j²); as an (arbitrary) rule of thumb, the variance inflation factor should not be larger than 10 (see the sketch at the end of this block)

• Estimating the error variance
• Theorem 3.3 (Unbiased estimator of the error variance)
– An unbiased estimate of the error variance can be obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients
– The number of observations minus the number of estimated parameters is also called the degrees of freedom
– The n estimated squared residuals in the sum are not completely independent but are related through the k+1 equations that define the first-order conditions of the minimization problem
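Returning to the variance inflation factors mentioned above, here is a minimal sketch (Python, simulated data): each regressor is regressed on the others and VIF_j = 1/(1 − R_j²) is computed. The variables and the degree of correlation between them are assumptions chosen so that two regressors are nearly collinear.

```python
# Minimal sketch: variance inflation factors with simulated data.
# x1 and x2 are built to be strongly correlated; x3 is unrelated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
z = rng.normal(0, 1, n)
x1 = z + rng.normal(0, 0.2, n)
x2 = z + rng.normal(0, 0.2, n)
x3 = rng.normal(0, 1, n)

X = np.column_stack([x1, x2, x3])
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)                        # all regressors except x_j
    r2_j = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    print(j, 1.0 / (1.0 - r2_j))                            # VIF_j: large for x1, x2; near 1 for x3
```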
Estimation: The Variance of the OLS Estimators
• Estimation of the sampling variances of the OLS estimators
– The true sampling variance of an estimated coefficient is unknown because σ² is unknown; plugging in the estimate of σ² for the unknown σ² gives the estimated sampling variance
– Note that these formulas are only valid under assumptions MLR.1–MLR.5 (in particular, there has to be homoskedasticity)

Estimation: Efficiency of OLS
• Efficiency of OLS: the Gauss-Markov theorem
– Under assumptions MLR.1–MLR.5, OLS is unbiased
– However, under these assumptions there may be many other estimators that are unbiased
– Which one is the unbiased estimator with the smallest variance?
– In order to answer this question one usually limits oneself to linear estimators, i.e. estimators that are linear in the dependent variable; the weights may be an arbitrary function of the sample values of all the explanatory variables, and the OLS estimator can be shown to be of this form

• Theorem 3.4 (Gauss-Markov theorem)
– Under assumptions MLR.1–MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e. they have the smallest variance among all estimators that are linear in the dependent variable and unbiased
– OLS is only the best estimator if MLR.1–MLR.5 hold; if there is heteroskedasticity, for example, there are better estimators

Multiple Regression Analysis: Inference
• Statistical inference in the regression model
– Hypothesis tests about population parameters
– Construction of confidence intervals
• Sampling distributions of the OLS estimators
– The OLS estimators are random variables
– We already know their expected values and their variances
– However, for hypothesis tests we need to know their distribution
– In order to derive their distribution we need additional assumptions
– Assumption about the distribution of the errors: normal distribution

Inference: Sampling Distributions of the OLS Estimators
• Assumption MLR.6 (Normality of error terms): the error term is normally distributed, independently of the explanatory variables
– It is assumed that the unobserved factors are normally distributed around the population regression function
– The form and the variance of the distribution do not depend on any of the explanatory variables
– It follows that, conditional on the explanatory variables, the dependent variable is normally distributed around the regression function

• Discussion of the normality assumption
– The error term is the sum of „many“ different unobserved factors
– Sums of independent factors are normally distributed (CLT)
– Problems: How many different factors are there, and is their number large enough? The individual factors may have very heterogeneous distributions. How independent are the different factors?
– The normality of the error term is an empirical question
– At least the error distribution should be „close“ to normal
– In many cases, normality is questionable or impossible by definition

• Discussion of the normality assumption (cont.)
– Examples where normality cannot hold: wages (nonnegative; also: minimum wage), number of arrests (takes on a small number of integer values), unemployment (indicator variable, takes on only 1 or 0)
– In some cases, normality can be achieved through transformations of the dependent variable (e.g. use log(wage) instead of wage)
– Under normality, OLS is the best (even nonlinear) unbiased estimator
– Important: for the purposes of statistical inference, the assumption of normality can be replaced by a large sample size
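A minimal sketch (Python, simulated data) of the transformation point above: a nonnegative, right-skewed outcome such as wages is far from normal, but its logarithm can be much closer to normal. The lognormal parameters below are arbitrary assumptions, not estimates from any wage data.

```python
# Minimal sketch: a log transformation can bring a skewed outcome closer to normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
wage = rng.lognormal(mean=2.5, sigma=0.6, size=5000)   # simulated right-skewed "wages"

print(stats.skew(wage))          # clearly positive: long right tail
print(stats.skew(np.log(wage)))  # close to zero, since log(wage) is normal by construction
```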
Inference: Sampling Distributions of the OLS Estimators
• Terminology: assumptions MLR.1–MLR.5 are the „Gauss-Markov assumptions“; assumptions MLR.1–MLR.6 are the „classical linear model (CLM) assumptions“
• Theorem 4.1 (Normal sampling distributions): under assumptions MLR.1–MLR.6, the estimators are normally distributed around the true parameters with the variance that was derived earlier, and the standardized estimators follow a standard normal distribution

Inference: The t Test
• Testing hypotheses about a single population parameter
• Theorem 4.2 (t-distribution for standardized estimators): under assumptions MLR.1–MLR.6, if the standardization is done using the estimated standard deviation (= standard error), the standard normal distribution is replaced by a t-distribution with n − k − 1 degrees of freedom
– Note: the t-distribution is close to the standard normal distribution if n − k − 1 is large
• Null hypothesis (for more general hypotheses, see below): the population parameter is equal to zero, i.e. after controlling for the other independent variables, there is no effect of x_j on y

• t-statistic (or t-ratio): the estimated coefficient divided by its standard error
– The t-statistic will be used to test the above null hypothesis. The farther the estimated coefficient is away from zero, the less likely it is that the null hypothesis holds. But what does „far“ away from zero mean? This depends on the variability of the estimated coefficient, i.e. its standard deviation. The t-statistic measures how many estimated standard deviations the estimated coefficient is away from zero.
• Distribution of the t-statistic if the null hypothesis is true
• Goal: define a rejection rule so that, if the null hypothesis is true, H0 is rejected only with a small probability (= significance level, e.g. 5%)

• Testing against one-sided alternatives (greater than zero)
– Test the null hypothesis that the coefficient is zero against the alternative that it is greater than zero
– Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is „too large“ (i.e. larger than a critical value)
– Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases
– In the given example, this is the point of the t-distribution with 28 degrees of freedom that is exceeded in 5% of the cases: reject if the t-statistic is greater than 1.701

• Example: wage equation
– Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages
– Test the null hypothesis of no experience effect against the alternative of a positive effect; one would either expect a positive effect of experience on hourly wage or no effect at all
– (Standard errors are reported with the coefficients)

• Example: wage equation (cont.)
– The t-statistic for experience is compared with the critical values for the 5% and the 1% significance level (these are conventional significance levels); with these degrees of freedom the standard normal approximation applies
– The null hypothesis is rejected because the t-statistic exceeds the critical value
– „The effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.“
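A minimal sketch (Python, simulated data) of the mechanics of the one-sided t test just illustrated: form t = estimate / standard error, look up the one-sided critical value of the t-distribution with n − k − 1 degrees of freedom, and reject if the statistic exceeds it. The data-generating process and coefficient values are illustrative assumptions, not the wage data from the slides.

```python
# Minimal sketch: one-sided t test of H0: beta_exper = 0 against H1: beta_exper > 0,
# on simulated data (illustrative parameter values).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
n = 200
educ = rng.uniform(8, 18, n)
exper = rng.uniform(0, 30, n)
logwage = 0.6 + 0.08 * educ + 0.01 * exper + rng.normal(0, 0.3, n)

X = sm.add_constant(np.column_stack([educ, exper]))   # constant, educ, exper
res = sm.OLS(logwage, X).fit()

j = 2                                  # position of exper in the parameter vector
t = res.params[j] / res.bse[j]         # t statistic for H0: beta_j = 0
df = int(res.df_resid)                 # n - k - 1 degrees of freedom
crit = stats.t.ppf(0.95, df)           # one-sided 5% critical value
print(t, crit, t > crit)               # reject H0 if the t statistic exceeds the critical value
```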
• Testing against one-sided alternatives (less than zero)
– Test the null hypothesis that the coefficient is zero against the alternative that it is less than zero
– Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is „too small“ (i.e. smaller than a critical value)
– Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases
– In the given example, this is the point of the t-distribution with 18 degrees of freedom below which 5% of the cases lie: reject if the t-statistic is less than −1.734

• Example: student performance and school size
– Test whether smaller school size leads to better student performance
– Variables: percentage of students passing a maths test, average annual teacher compensation, staff per one thousand students, and school enrollment (= school size)
– Test the null hypothesis of no enrollment effect against the alternative of a negative effect: do larger schools hamper student performance, or is there no such effect?

• Example: student performance and school size (cont.)
– The t-statistic is compared with the critical values for the 5% and the 15% significance level (with these degrees of freedom the standard normal approximation applies)
– The null hypothesis is not rejected because the t-statistic is not smaller than the critical value
– One cannot reject the hypothesis that there is no effect of school size on student performance (not even at a lax significance level of 15%)

• Example: student performance and school size (cont.)
– Alternative specification of the functional form, with enrollment entering in logarithmic form; the same one-sided test is carried out, and the R-squared is slightly higher

• Example: student performance and school size (cont.)
– The t-statistic is now beyond the critical value for the 5% significance level, so the null hypothesis is rejected
– The hypothesis that there is no effect of school size on student performance can be rejected in favor of the hypothesis that the effect is negative
– How large is the effect? A 10% increase in enrollment is associated with about 0.129 percentage points fewer students passing the test (a small effect)

• Testing against two-sided alternatives
– Test the null hypothesis that the coefficient is zero against the alternative that it is different from zero
– Reject the null hypothesis in favour of the alternative hypothesis if the absolute value of the estimated coefficient is too large
– Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases; in the given example, these are the points of the t-distribution such that 5% of the cases lie in the two tails
– Reject if the t-statistic is less than −2.06 or greater than 2.06 (i.e. if its absolute value exceeds 2.06)

• Example: determinants of college GPA
– skipped: lectures missed per week
– The effects of hsGPA and skipped are significantly different from zero at the 1% significance level; the effect of ACT is not significantly different from zero, not even at the 10% significance level
– For the critical values, the standard normal distribution can be used

• „Statistically significant“ variables in a regression
– If a regression coefficient is different from zero in a two-sided test, the corresponding variable is said to be „statistically significant“
– If the number of degrees of freedom is large enough so that the normal approximation applies, the following rules of thumb apply: an absolute t-statistic above 1.645 means „statistically significant at the 10% level“, above 1.96 means „statistically significant at the 5% level“, and above 2.576 means „statistically significant at the 1% level“
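A minimal sketch (Python) of the two-sided test and the rules of thumb above. The coefficient, standard error, and degrees of freedom are made-up numbers chosen purely for illustration.

```python
# Minimal sketch: two-sided significance and the large-sample rules of thumb.
from scipy import stats

beta_hat, se, df = 0.025, 0.010, 500
t = beta_hat / se                          # t statistic for H0: beta_j = 0
p = 2 * stats.t.sf(abs(t), df)             # two-sided p-value

print(t, p)
# Rules of thumb for large degrees of freedom (10%, 5%, 1% levels):
print(abs(t) > 1.645, abs(t) > 1.96, abs(t) > 2.576)
```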
• Guidelines for discussing economic and statistical significance
– If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance
– The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
– If a variable is statistically and economically important but has the „wrong“ sign, the regression model might be misspecified
– If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may think of dropping it from the regression
– If the sample size is small, effects might be imprecisely estimated, so the case for dropping insignificant variables is less strong

• Testing more general hypotheses about a regression coefficient
– Null hypothesis: the coefficient equals some hypothesized value
– t-statistic: the difference between the estimate and the hypothesized value, divided by the standard error
– The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic

• Example: campus crime and enrollment
– An interesting hypothesis is whether crime increases by one percent if enrollment is increased by one percent
– The estimate is different from one, but is this difference statistically significant? The hypothesis is rejected at the 5% level

• Computing p-values for t-tests
– If the significance level is made smaller and smaller, there will be a point where the null hypothesis can no longer be rejected
– The reason is that, by lowering the significance level, one increasingly wants to avoid the error of rejecting a correct H0
– The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test
– A small p-value is evidence against the null hypothesis, because the null hypothesis would be rejected even at small significance levels
– A large p-value is evidence in favor of the null hypothesis
– p-values are more informative than tests at fixed significance levels

• How the p-value is computed (here: two-sided test)
– The p-value is the significance level at which one is indifferent between rejecting and not rejecting the null hypothesis
– In the two-sided case, the p-value is thus the probability that the t-distributed variable takes on a larger absolute value than the realized value of the test statistic
– From this it is clear that a null hypothesis is rejected if and only if the corresponding p-value is smaller than the significance level
– In the example, the value of the test statistic does not exceed the critical values for a 5% significance level, so for a significance level of 5% the t-statistic would not lie in the rejection region

Inference: Confidence Intervals
• Confidence intervals
– Simple manipulation of the result in Theorem 4.2 implies that the interval „estimate plus/minus (critical value of the two-sided test) times standard error“ covers the true parameter with probability equal to the confidence level; this gives the lower and upper bounds of the confidence interval
• Interpretation of the confidence interval
– The bounds of the interval are random
– In repeated samples, the interval that is constructed in the above way will cover the population regression coefficient in 95% of the cases

• Confidence intervals for typical confidence levels: when the degrees of freedom are large, use the rules of thumb for the critical values
• Relationship between confidence intervals and hypothesis tests: reject the null hypothesis that the coefficient equals a hypothesized value, in favor of the two-sided alternative, if and only if that value lies outside the confidence interval

• Example: model of firms‘ R&D expenditures
– Variables: spending on R&D, annual sales, and profits as a percentage of sales
– The effect of sales on R&D is relatively precisely estimated, as the interval is narrow; moreover, the effect is significantly different from zero, because zero is outside the interval
– The effect of profits as a percentage of sales is imprecisely estimated, as the interval is very wide; it is not even statistically significant, because zero lies in the interval
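A minimal sketch (Python, simulated data) of how such a confidence interval is built: estimate plus/minus c times the standard error, with c the 97.5th percentile of the t-distribution with n − k − 1 degrees of freedom for a 95% interval. The data-generating process below only loosely mimics an R&D-on-sales regression and is purely an assumption.

```python
# Minimal sketch: a 95% confidence interval for a slope coefficient, simulated data.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
n = 60
sales = rng.uniform(100, 5000, n)
rd = 2.0 + 0.03 * sales + rng.normal(0, 30, n)     # illustrative "R&D spending"

res = sm.OLS(rd, sm.add_constant(sales)).fit()
b, se, df = res.params[1], res.bse[1], int(res.df_resid)
c = stats.t.ppf(0.975, df)                          # critical value of the two-sided 5% test

print(b - c * se, b + c * se)                       # 95% CI built by hand
print(res.conf_int()[1])                            # statsmodels reports the same interval
```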
Inference: Testing Hypotheses About a Linear Combination of Parameters
• Example: return to education at 2-year vs. at 4-year colleges
– The regression contains years of education at 2-year colleges and years of education at 4-year colleges
– Test the null hypothesis that the two returns are equal against the alternative that the return to a year at a 2-year college is smaller
– A possible test statistic: the difference between the two estimates, normalized by the estimated standard deviation of the difference; the null hypothesis would have to be rejected if the statistic is „too negative“ to believe that the true difference between the parameters is equal to zero

• This is impossible to compute with standard regression output, because the standard deviation of the difference requires the covariance between the two estimates, which is usually not available in regression output
• Alternative method: define a new parameter as the difference between the two coefficients, test whether it is zero against the alternative that it is negative, and substitute this definition into the original regression; this creates a new regressor (= total years of college)

• Estimation results
– With total years of college as the new regressor, the hypothesis is rejected at the 10% level but not at the 5% level
• This method always works for single linear hypotheses

Inference: The F Test
• Testing multiple linear restrictions: the F test
• Testing exclusion restrictions
– Example: the salary of a major league baseball player is regressed on years in the league, the average number of games per year, the batting average, home runs per year, and runs batted in per year
– Test whether the performance measures have no effect / can be excluded from the regression: the null hypothesis sets their coefficients to zero, against the alternative that at least one of them is different from zero

• Estimation of the unrestricted model
– None of these variables is statistically significant when tested individually
– Idea: how would the model fit be if these variables were dropped from the regression?

• Estimation of the restricted model
– The sum of squared residuals necessarily increases, but is the increase statistically significant?
• Test statistic: the increase in the sum of squared residuals, relative to the sum of squared residuals of the unrestricted model and adjusted for the number of restrictions and the degrees of freedom
– The relative increase of the sum of squared residuals when going from H1 to H0 follows an F-distribution (if the null hypothesis H0 is correct)

• Rejection rule (Figure 4.7)
– An F-distributed variable only takes on positive values; this corresponds to the fact that the sum of squared residuals can only increase when moving from H1 to H0
– Choose the critical value so that the null hypothesis is rejected in, for example, 5% of the cases, although it is true

• Test decision in the example
– Given the number of restrictions to be tested and the degrees of freedom of the unrestricted model, the null hypothesis is overwhelmingly rejected (even at very small significance levels)
• Discussion
– The three variables are „jointly significant“
– They were not significant when tested individually
– The likely reason is multicollinearity between them

• Test of overall significance of a regression
– The null hypothesis states that the explanatory variables are not useful at all in explaining the dependent variable
– The restricted model is a regression on a constant only
– The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected

Next Class
• Freeing up the classical assumptions (heteroskedasticity)