Ketevani Kapanadze
Brno, 2020
Multiple Regression Analysis: Estimation and Inference (Chapters 3 and 4)

From previous class
• Properties of OLS on any sample of data
• Fitted values and residuals
– Fitted or predicted values
– Deviations from the regression line (= residuals)
• Algebraic properties of OLS regression
– Deviations from the regression line sum up to zero
– The correlation between the deviations and the regressors is zero
– The sample averages of y and x lie on the regression line

Properties of OLS
• Goodness-of-fit
• Measures of variation
– Total sum of squares (SST): represents the total variation in the dependent variable
– Explained sum of squares (SSE): represents the variation explained by the regression
– Residual sum of squares (SSR): represents the variation not explained by the regression
• „How well does the explanatory variable explain the dependent variable?“

Properties of OLS
• Decomposition of total variation: SST = SSE + SSR (total variation = explained part + unexplained part)
• Goodness-of-fit measure (R-squared): R² = SSE/SST = 1 − SSR/SST
• R-squared measures the fraction of the total variation that is explained by the regression

Properties of OLS: Examples
• CEO salary and return on equity: the regression explains only 1.3% of the total variation in salaries
• Voting outcomes and campaign expenditures: the regression explains 85.6% of the total variation in election outcomes
• Caution: a high R-squared does not necessarily mean that the regression has a causal interpretation!

Expected Values and Variances of the OLS Estimators
• The estimated regression coefficients are random variables because they are calculated from a random sample (the data are random and depend on the particular sample that has been drawn)
• The question is what the estimators estimate on average and how large their variability in repeated samples is

• Standard assumptions for the linear regression model
• Assumption SLR.1 (Linear in parameters): in the population, the relationship between y and x is linear
• Assumption SLR.2 (Random sampling): the data are a random sample drawn from the population; each data point therefore follows the population equation

• Assumptions for the linear regression model (cont.)
• Assumption SLR.3 (Sample variation in the explanatory variable): the values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable)
• Assumption SLR.4 (Zero conditional mean): the value of the explanatory variable must contain no information about the mean of the unobserved factors

• Theorem 2.1 (Unbiasedness of OLS): under assumptions SLR.1–SLR.4, the OLS estimators are unbiased for the population parameters
• Interpretation of unbiasedness
– The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw
– However, on average they will be equal to the values that characterize the true relationship between y and x in the population
– „On average“ means: if sampling were repeated, i.e. if drawing the random sample and doing the estimation were repeated many times
– In a given sample, estimates may differ considerably from the true values
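The following minimal sketch (Python, simulated data) illustrates Theorem 2.1: across many repeated samples the OLS slope estimates average out to the true population slope, even though any single estimate misses it. The population model, parameter values, and sample size are illustrative assumptions, not the lecture's data.

```python
# Minimal sketch of Theorem 2.1 (unbiasedness) using simulated data.
# beta0, beta1, n and reps are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, n, reps = 1.0, 0.5, 100, 5000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, n)           # explanatory variable varies (SLR.3)
    u = rng.normal(0, 2, n)             # unobserved factors with E(u|x) = 0 (SLR.4)
    y = beta0 + beta1 * x + u           # population model, linear in parameters (SLR.1)
    xd, yd = x - x.mean(), y - y.mean()
    slopes[r] = (xd @ yd) / (xd @ xd)   # OLS slope for this particular sample

print(slopes.mean())  # averages out close to the true value 0.5
print(slopes.std())   # sampling variability across repeated samples
```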
Expected Values and Variances of the OLS Estimators
• Variances of the OLS estimators
– Depending on the sample, the estimates will be nearer to or farther away from the true population values
– How far can we expect our estimates to be away from the true population values on average (= sampling variability)?
– Sampling variability is measured by the estimators‘ variances
• Assumption SLR.5 (Homoskedasticity): the value of the explanatory variable must contain no information about the variability of the unobserved factors

• Graphical illustration of homoskedasticity: the variability of the unobserved influences does not depend on the value of the explanatory variable

• An example of heteroskedasticity (wage and education): the variance of the unobserved determinants of wages increases with the level of education

• Theorem 2.2 (Variances of the OLS estimators): under assumptions SLR.1–SLR.5, the sampling variance of the slope estimator is σ²/SST_x, where σ² is the error variance and SST_x = Σ(x_i − mean of x)² is the total sample variation in x (an analogous formula holds for the intercept)
• Conclusion: the sampling variability of the estimated regression coefficients is larger the larger the variability of the unobserved factors, and smaller the larger the variation in the explanatory variable

• Estimating the error variance
– The variance of u does not depend on x, i.e. it is equal to the unconditional variance
– One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately, this estimate would be biased
– An unbiased estimate of the error variance can be obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients (here n − 2)

• Theorem 2.3 (Unbiasedness of the error variance): the estimator just described is unbiased for σ²
• Calculation of standard errors for regression coefficients: the estimated standard deviations of the regression coefficients are called „standard errors“; they measure how precisely the regression coefficients are estimated (plug in the estimate of σ for the unknown σ)

MULTIPLE REGRESSION MODELS

Multiple Regression Analysis: Estimation
• Definition of the multiple linear regression model: y = β0 + β1·x1 + β2·x2 + … + βk·xk + u
– y: dependent variable, explained variable, response variable, …
– x1, …, xk: independent variables, explanatory variables, controls, …
– u: error term, disturbance, unobservables, …
– β0: intercept; β1, …, βk: slope parameters

Estimation: Motivation
• Motivation for multiple regression
– Incorporate more explanatory factors into the model
– Explicitly hold fixed other factors that otherwise would end up in the error term
– Allow for more flexible functional forms
• Example: wage equation
– Hourly wage is regressed on years of education and labor market experience; the error term collects all other factors…
– The coefficient on education now measures the effect of education explicitly holding experience fixed

• Example: average test scores and per-student spending
– The average standardized test score of a school is regressed on per-student spending at this school and the average family income of students at this school; other factors go into the error term
– Per-student spending is likely to be correlated with average family income at a given high school because of school financing
– Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores
– In a simple regression model, the effect of per-student spending would partly include the effect of family income on test scores

• Example: family income and family consumption
– Family consumption is explained by family income, family income squared, and other factors
– The model has two explanatory variables: income and income squared
– Consumption is explained as a quadratic function of income
– One has to be very careful when interpreting the coefficients: by how much does consumption increase if income is increased by one unit? The answer depends on how much income there already is
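A minimal sketch (Python, simulated data) of the last point: in a quadratic specification the effect of one more unit of income is not a single number but β1 + 2·β2·income, so it must be evaluated at a chosen income level. The data-generating process and variable names below are illustrative assumptions.

```python
# Minimal sketch: marginal effect in a quadratic regression, using simulated data.
# cons = b0 + b1*inc + b2*inc**2 + u; all numbers below are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
inc = rng.uniform(10, 100, 500)                          # family income
cons = 5 + 0.8 * inc - 0.002 * inc**2 + rng.normal(0, 5, 500)

X = sm.add_constant(np.column_stack([inc, inc**2]))      # constant, inc, inc^2
res = sm.OLS(cons, X).fit()
b1, b2 = res.params[1], res.params[2]

# The effect of one additional unit of income depends on the income level:
for level in (20, 50, 80):
    print(level, b1 + 2 * b2 * level)                    # d cons / d inc at `level`
```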
Estimation: Motivation
• Example: CEO salary, sales, and CEO tenure
– The log of CEO salary is regressed on log sales and a quadratic function of CEO tenure with the firm
– The model assumes a constant elasticity relationship between CEO salary and the sales of his or her firm
– The model assumes a quadratic relationship between CEO salary and his or her tenure with the firm
• Meaning of „linear“ regression: the model has to be linear in the parameters (not in the variables)

Estimation: OLS Estimation of the Multiple Regression Model
• Random sample drawn from the population model
• Regression residuals: the differences between the observed and the fitted values of the dependent variable
• Minimize the sum of squared residuals, i.e. choose the estimates so that the sum of the squared differences between the observed values and the fitted regression line is as small as possible (the minimization will be carried out by the computer)

• Interpretation of the multiple regression model
– The coefficient on the j-th regressor answers: by how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant?
– The multiple linear regression model manages to hold the values of the other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration
– „Ceteris paribus“ interpretation
– It still has to be assumed that unobserved factors do not change if the explanatory variables are changed

• Example: determinants of college GPA
– colGPA: grade point average at college; hsGPA: high school grade point average; ACT: achievement test score
• Interpretation
– Holding ACT fixed, another point on the high school grade point average is associated with another .453 points of college grade point average
– Or: if we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 higher than that of student B
– Holding the high school grade point average fixed, another 10 points on the ACT are associated with less than one additional point of college GPA

• Properties of OLS on any sample of data
• Fitted values and residuals (fitted or predicted values; residuals)
• Algebraic properties of OLS regression
– The deviations from the regression line sum up to zero
– The correlations between the deviations and the regressors are zero
– The sample averages of y and of the regressors lie on the regression line

• Goodness-of-fit
– Decomposition of total variation: SST = SSE + SSR
– R-squared: R² = SSE/SST = 1 − SSR/SST
– Alternative expression: R-squared is equal to the squared correlation coefficient between the actual and the predicted values of the dependent variable
– Notice that R-squared can only increase if another explanatory variable is added to the regression

Estimation: The Expected Value of the OLS Estimators
• Standard assumptions for the multiple regression model
• Assumption MLR.1 (Linear in parameters)
• Assumption MLR.2 (Random sampling)
• Assumption MLR.3 (No perfect collinearity): „In the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables“
• Assumption MLR.4 (Zero conditional mean)

• Example of perfect collinearity: small sample
– In a small sample, avginc may accidentally be an exact multiple of expend; it will not be possible to disentangle their separate effects because there is exact covariation
• Example of perfect collinearity: relationships between regressors
– Either shareA or shareB has to be dropped from the regression because there is an exact linear relationship between them: shareA + shareB = 1
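A minimal sketch (Python, simulated data) of the shareA/shareB case above: because the two shares add up to one, the regressor matrix (which also contains a constant) is rank deficient, so the two separate coefficients are not identified. The variable names follow the slide; the data are simulated.

```python
# Minimal sketch: perfect collinearity (violation of MLR.3) with simulated data.
import numpy as np

rng = np.random.default_rng(2)
shareA = rng.uniform(0, 1, 200)
shareB = 1.0 - shareA                       # exact linear relationship: shareA + shareB = 1
X = np.column_stack([np.ones(200), shareA, shareB])

print(np.linalg.matrix_rank(X))             # 2, not 3: the columns are linearly dependent
# Dropping one of the two shares restores full column rank (and identification):
print(np.linalg.matrix_rank(X[:, :2]))      # 2 columns, rank 2
```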
Estimation: The Expected Value of the OLS Estimators
• Assumption MLR.4 (Zero conditional mean): the values of the explanatory variables must contain no information about the mean of the unobserved factors
• In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error
• Example: average test scores
– If avginc were not included in the regression, it would end up in the error term; it would then be hard to defend that expend is uncorrelated with the error

• Discussion of the zero conditional mean assumption
– Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of assumption MLR.4
– Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanatory variables are exogenous
– Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators
• Theorem 3.1 (Unbiasedness of OLS): under assumptions MLR.1–MLR.4, the OLS estimators are unbiased
– Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values

• Including irrelevant variables in a regression model
– Suppose a regressor is included whose coefficient is equal to zero in the population
– This causes no problem for unbiasedness, because the expected value of its estimated coefficient is zero; however, including irrelevant variables may increase the sampling variance
• Omitting relevant variables: the simple case
– True model (contains x1 and x2); estimated model (x2 is omitted)

• Omitted variable bias
– If x1 and x2 are correlated, assume a linear regression relationship between them (x2 regressed on x1, with its own error term)
– If y is only regressed on x1, the estimated intercept and the estimated slope on x1 absorb part of the effect of the omitted x2: on average, the short-regression slope equals the true slope on x1 plus the product of the slope on x2 and the slope from regressing x2 on x1
• Conclusion: all estimated coefficients will be biased

• Example: omitting ability in a wage equation
– Both the effect of ability on wages and the slope from regressing ability on education will be positive
– The return to education will therefore be overestimated, because the bias term (the product of these two positive quantities) is positive: it will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average
• When is there no omitted variable bias? If the omitted variable is irrelevant or uncorrelated with the included regressor

• Omitted variable bias: more general cases
– True model (contains x1, x2 and x3); estimated model (x3 is omitted)
– No general statements are possible about the direction of the bias
– The analysis is as in the simple case if one regressor is uncorrelated with the others
• Example: omitting ability in a wage equation
– If exper is approximately uncorrelated with educ and abil, then the direction of the omitted variable bias can be analyzed as in the simple two-variable case
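A minimal sketch (Python, simulated data) of the omitted variable bias just described: when the omitted regressor is correlated with the included one, the short regression's slope estimates b1 + b2·d1 on average rather than b1. All parameter values and variable roles (think education and ability) are illustrative assumptions.

```python
# Minimal sketch of omitted variable bias with simulated data.
# True model: y = b0 + b1*x1 + b2*x2 + u, with x2 = d0 + d1*x1 + v (correlated with x1).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, b1, b2, d1 = 2000, 0.08, 0.5, 0.6        # illustrative values (think educ and ability)

x1 = rng.normal(12, 2, n)                   # included regressor, e.g. years of education
x2 = 1.0 + d1 * x1 + rng.normal(0, 1, n)    # omitted regressor, e.g. ability
y = 0.5 + b1 * x1 + b2 * x2 + rng.normal(0, 1, n)

long = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()   # x2 included
short = sm.OLS(y, sm.add_constant(x1)).fit()                         # x2 omitted

print(long.params[1])    # close to b1 = 0.08
print(short.params[1])   # close to b1 + b2*d1 = 0.38: the slope is biased upward
```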
Estimation: The Variance of the OLS Estimators
• Standard assumptions for the multiple regression model (cont.)
• Assumption MLR.5 (Homoskedasticity)
– Shorthand notation: all explanatory variables are collected in a random vector
• Under assumptions MLR.1–MLR.5, the sampling variance of the slope estimator for x_j is σ² / [SST_j · (1 − R_j²)], where SST_j is the total sample variation in x_j and R_j² is the R-squared from regressing x_j on all other explanatory variables

• Components of the OLS variances
• 1) The error variance σ²
– A high error variance increases the sampling variance because there is more „noise“ in the equation
– A large error variance necessarily makes estimates imprecise
– The error variance does not decrease with the sample size
• 2) The total sample variation in the explanatory variable, SST_j
– More sample variation leads to more precise estimates
– Total sample variation automatically increases with the sample size
– Increasing the sample size is thus a way to get more precise estimates

• 3) Linear relationships among the independent variables
– Regress x_j on all other independent variables (including a constant); the R-squared of this regression, R_j², will be higher the better x_j can be linearly explained by the other independent variables
– The sampling variance of the coefficient on x_j will be higher the better the explanatory variable x_j can be linearly explained by the other independent variables
– The problem of almost linearly dependent explanatory variables is called multicollinearity (i.e. R_j² close to one for some j)

• An example of multicollinearity: the average standardized test score of a school is regressed on expenditures for teachers, expenditures for instructional materials, and other expenditures
– The different expenditure categories will be strongly correlated, because if a school has a lot of resources it will spend a lot on everything
– It will be hard to estimate the differential effects of different expenditure categories because all expenditures are either high or low
– For precise estimates of the differential effects, one would need information about situations where expenditure categories change differentially
– As a consequence, the sampling variance of the estimated effects will be large

• Discussion of the multicollinearity problem
– In the above example, it would probably be better to lump all expenditure categories together because their effects cannot be disentangled
– In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias)
– Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of the other effects may be very precise
– Note that multicollinearity is not a violation of MLR.3 in the strict sense
– Multicollinearity may be detected through „variance inflation factors“, VIF_j = 1/(1 − R_j²); as an (arbitrary) rule of thumb, the variance inflation factor should not be larger than 10 (see the sketch at the end of this block)

• Estimating the error variance
• Theorem 3.3 (Unbiased estimator of the error variance)
– An unbiased estimate of the error variance can be obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients
– The number of observations minus the number of estimated parameters is also called the degrees of freedom
– The n estimated squared residuals in the sum are not completely independent but are related through the k+1 equations that define the first-order conditions of the minimization problem
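Returning to the variance inflation factors mentioned above, here is a minimal sketch (Python, simulated data): each regressor is regressed on the others and VIF_j = 1/(1 − R_j²) is computed. The variables and the degree of correlation between them are assumptions chosen so that two regressors are nearly collinear.

```python
# Minimal sketch: variance inflation factors with simulated data.
# x1 and x2 are built to be strongly correlated; x3 is unrelated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
z = rng.normal(0, 1, n)
x1 = z + rng.normal(0, 0.2, n)
x2 = z + rng.normal(0, 0.2, n)
x3 = rng.normal(0, 1, n)

X = np.column_stack([x1, x2, x3])
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)                        # all regressors except x_j
    r2_j = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    print(j, 1.0 / (1.0 - r2_j))                            # VIF_j: large for x1, x2; near 1 for x3
```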
Estimation: The Variance of the OLS Estimators
• Estimation of the sampling variances of the OLS estimators
– The true sampling variance of an estimated coefficient is unknown because σ² is unknown; plugging in the estimate of σ² for the unknown σ² gives the estimated sampling variance
– Note that these formulas are only valid under assumptions MLR.1–MLR.5 (in particular, there has to be homoskedasticity)

Estimation: Efficiency of OLS
• Efficiency of OLS: the Gauss-Markov theorem
– Under assumptions MLR.1–MLR.5, OLS is unbiased
– However, under these assumptions there may be many other estimators that are unbiased
– Which one is the unbiased estimator with the smallest variance?
– In order to answer this question one usually limits oneself to linear estimators, i.e. estimators that are linear in the dependent variable; the weights may be an arbitrary function of the sample values of all the explanatory variables, and the OLS estimator can be shown to be of this form

• Theorem 3.4 (Gauss-Markov theorem)
– Under assumptions MLR.1–MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e. they have the smallest variance among all estimators that are linear in the dependent variable and unbiased
– OLS is only the best estimator if MLR.1–MLR.5 hold; if there is heteroskedasticity, for example, there are better estimators

Multiple Regression Analysis: Inference
• Statistical inference in the regression model
– Hypothesis tests about population parameters
– Construction of confidence intervals
• Sampling distributions of the OLS estimators
– The OLS estimators are random variables
– We already know their expected values and their variances
– However, for hypothesis tests we need to know their distribution
– In order to derive their distribution we need additional assumptions
– Assumption about the distribution of the errors: normal distribution

Inference: Sampling Distributions of the OLS Estimators
• Assumption MLR.6 (Normality of error terms): the error term is normally distributed, independently of the explanatory variables
– It is assumed that the unobserved factors are normally distributed around the population regression function
– The form and the variance of the distribution do not depend on any of the explanatory variables
– It follows that, conditional on the explanatory variables, the dependent variable is normally distributed around the regression function

• Discussion of the normality assumption
– The error term is the sum of „many“ different unobserved factors
– Sums of independent factors are normally distributed (CLT)
– Problems: How many different factors are there, and is their number large enough? The individual factors may have very heterogeneous distributions. How independent are the different factors?
– The normality of the error term is an empirical question
– At least the error distribution should be „close“ to normal
– In many cases, normality is questionable or impossible by definition

• Discussion of the normality assumption (cont.)
– Examples where normality cannot hold: wages (nonnegative; also: minimum wage), number of arrests (takes on a small number of integer values), unemployment (indicator variable, takes on only 1 or 0)
– In some cases, normality can be achieved through transformations of the dependent variable (e.g. use log(wage) instead of wage)
– Under normality, OLS is the best (even nonlinear) unbiased estimator
– Important: for the purposes of statistical inference, the assumption of normality can be replaced by a large sample size
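A minimal sketch (Python, simulated data) of the transformation point above: a nonnegative, right-skewed outcome such as wages is far from normal, but its logarithm can be much closer to normal. The lognormal parameters below are arbitrary assumptions, not estimates from any wage data.

```python
# Minimal sketch: a log transformation can bring a skewed outcome closer to normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
wage = rng.lognormal(mean=2.5, sigma=0.6, size=5000)   # simulated right-skewed "wages"

print(stats.skew(wage))          # clearly positive: long right tail
print(stats.skew(np.log(wage)))  # close to zero, since log(wage) is normal by construction
```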
Inference: Sampling Distributions of the OLS Estimators
• Terminology: assumptions MLR.1–MLR.5 are the „Gauss-Markov assumptions“; assumptions MLR.1–MLR.6 are the „classical linear model (CLM) assumptions“
• Theorem 4.1 (Normal sampling distributions): under assumptions MLR.1–MLR.6, the estimators are normally distributed around the true parameters with the variance that was derived earlier, and the standardized estimators follow a standard normal distribution

Inference: The t Test
• Testing hypotheses about a single population parameter
• Theorem 4.2 (t-distribution for standardized estimators): under assumptions MLR.1–MLR.6, if the standardization is done using the estimated standard deviation (= standard error), the standard normal distribution is replaced by a t-distribution with n − k − 1 degrees of freedom
– Note: the t-distribution is close to the standard normal distribution if n − k − 1 is large
• Null hypothesis (for more general hypotheses, see below): the population parameter is equal to zero, i.e. after controlling for the other independent variables, there is no effect of x_j on y

• t-statistic (or t-ratio): the estimated coefficient divided by its standard error
– The t-statistic will be used to test the above null hypothesis. The farther the estimated coefficient is away from zero, the less likely it is that the null hypothesis holds. But what does „far“ away from zero mean? This depends on the variability of the estimated coefficient, i.e. its standard deviation. The t-statistic measures how many estimated standard deviations the estimated coefficient is away from zero.
• Distribution of the t-statistic if the null hypothesis is true
• Goal: define a rejection rule so that, if the null hypothesis is true, H0 is rejected only with a small probability (= significance level, e.g. 5%)

• Testing against one-sided alternatives (greater than zero)
– Test the null hypothesis that the coefficient is zero against the alternative that it is greater than zero
– Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is „too large“ (i.e. larger than a critical value)
– Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases
– In the given example, this is the point of the t-distribution with 28 degrees of freedom that is exceeded in 5% of the cases: reject if the t-statistic is greater than 1.701

• Example: wage equation
– Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages
– Test the null hypothesis of no experience effect against the alternative of a positive effect; one would either expect a positive effect of experience on hourly wage or no effect at all
– (Standard errors are reported with the coefficients)

• Example: wage equation (cont.)
– The t-statistic for experience is compared with the critical values for the 5% and the 1% significance level (these are conventional significance levels); with these degrees of freedom the standard normal approximation applies
– The null hypothesis is rejected because the t-statistic exceeds the critical value
– „The effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.“
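A minimal sketch (Python, simulated data) of the mechanics of the one-sided t test just illustrated: form t = estimate / standard error, look up the one-sided critical value of the t-distribution with n − k − 1 degrees of freedom, and reject if the statistic exceeds it. The data-generating process and coefficient values are illustrative assumptions, not the wage data from the slides.

```python
# Minimal sketch: one-sided t test of H0: beta_exper = 0 against H1: beta_exper > 0,
# on simulated data (illustrative parameter values).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
n = 200
educ = rng.uniform(8, 18, n)
exper = rng.uniform(0, 30, n)
logwage = 0.6 + 0.08 * educ + 0.01 * exper + rng.normal(0, 0.3, n)

X = sm.add_constant(np.column_stack([educ, exper]))   # constant, educ, exper
res = sm.OLS(logwage, X).fit()

j = 2                                  # position of exper in the parameter vector
t = res.params[j] / res.bse[j]         # t statistic for H0: beta_j = 0
df = int(res.df_resid)                 # n - k - 1 degrees of freedom
crit = stats.t.ppf(0.95, df)           # one-sided 5% critical value
print(t, crit, t > crit)               # reject H0 if the t statistic exceeds the critical value
```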
• Testing against one-sided alternatives (less than zero)
– Test the null hypothesis that the coefficient is zero against the alternative that it is less than zero
– Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is „too small“ (i.e. smaller than a critical value)
– Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases
– In the given example, this is the point of the t-distribution with 18 degrees of freedom below which 5% of the cases lie: reject if the t-statistic is less than −1.734

• Example: student performance and school size
– Test whether smaller school size leads to better student performance
– Variables: percentage of students passing a maths test, average annual teacher compensation, staff per one thousand students, and school enrollment (= school size)
– Test the null hypothesis of no enrollment effect against the alternative of a negative effect: do larger schools hamper student performance, or is there no such effect?

• Example: student performance and school size (cont.)
– The t-statistic is compared with the critical values for the 5% and the 15% significance level (with these degrees of freedom the standard normal approximation applies)
– The null hypothesis is not rejected because the t-statistic is not smaller than the critical value
– One cannot reject the hypothesis that there is no effect of school size on student performance (not even at a lax significance level of 15%)

• Example: student performance and school size (cont.)
– Alternative specification of the functional form, with enrollment entering in logarithmic form; the same one-sided test is carried out, and the R-squared is slightly higher

• Example: student performance and school size (cont.)
– The t-statistic is now beyond the critical value for the 5% significance level, so the null hypothesis is rejected
– The hypothesis that there is no effect of school size on student performance can be rejected in favor of the hypothesis that the effect is negative
– How large is the effect? A 10% increase in enrollment is associated with about 0.129 percentage points fewer students passing the test (a small effect)

• Testing against two-sided alternatives
– Test the null hypothesis that the coefficient is zero against the alternative that it is different from zero
– Reject the null hypothesis in favour of the alternative hypothesis if the absolute value of the estimated coefficient is too large
– Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases; in the given example, these are the points of the t-distribution such that 5% of the cases lie in the two tails
– Reject if the t-statistic is less than −2.06 or greater than 2.06 (i.e. if its absolute value exceeds 2.06)

• Example: determinants of college GPA
– skipped: lectures missed per week
– The effects of hsGPA and skipped are significantly different from zero at the 1% significance level; the effect of ACT is not significantly different from zero, not even at the 10% significance level
– For the critical values, the standard normal distribution can be used

• „Statistically significant“ variables in a regression
– If a regression coefficient is different from zero in a two-sided test, the corresponding variable is said to be „statistically significant“
– If the number of degrees of freedom is large enough so that the normal approximation applies, the following rules of thumb apply: an absolute t-statistic above 1.645 means „statistically significant at the 10% level“, above 1.96 means „statistically significant at the 5% level“, and above 2.576 means „statistically significant at the 1% level“
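A minimal sketch (Python) of the two-sided test and the rules of thumb above. The coefficient, standard error, and degrees of freedom are made-up numbers chosen purely for illustration.

```python
# Minimal sketch: two-sided significance and the large-sample rules of thumb.
from scipy import stats

beta_hat, se, df = 0.025, 0.010, 500
t = beta_hat / se                          # t statistic for H0: beta_j = 0
p = 2 * stats.t.sf(abs(t), df)             # two-sided p-value

print(t, p)
# Rules of thumb for large degrees of freedom (10%, 5%, 1% levels):
print(abs(t) > 1.645, abs(t) > 1.96, abs(t) > 2.576)
```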
• Guidelines for discussing economic and statistical significance
– If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance
– The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
– If a variable is statistically and economically important but has the „wrong“ sign, the regression model might be misspecified
– If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may think of dropping it from the regression
– If the sample size is small, effects might be imprecisely estimated, so the case for dropping insignificant variables is less strong

• Testing more general hypotheses about a regression coefficient
– Null hypothesis: the coefficient equals some hypothesized value
– t-statistic: the difference between the estimate and the hypothesized value, divided by the standard error
– The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic

• Example: campus crime and enrollment
– An interesting hypothesis is whether crime increases by one percent if enrollment is increased by one percent
– The estimate is different from one, but is this difference statistically significant? The hypothesis is rejected at the 5% level

• Computing p-values for t-tests
– If the significance level is made smaller and smaller, there will be a point where the null hypothesis can no longer be rejected
– The reason is that, by lowering the significance level, one increasingly wants to avoid the error of rejecting a correct H0
– The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test
– A small p-value is evidence against the null hypothesis, because the null hypothesis would be rejected even at small significance levels
– A large p-value is evidence in favor of the null hypothesis
– p-values are more informative than tests at fixed significance levels

• How the p-value is computed (here: two-sided test)
– The p-value is the significance level at which one is indifferent between rejecting and not rejecting the null hypothesis
– In the two-sided case, the p-value is thus the probability that the t-distributed variable takes on a larger absolute value than the realized value of the test statistic
– From this it is clear that a null hypothesis is rejected if and only if the corresponding p-value is smaller than the significance level
– In the example, the value of the test statistic does not exceed the critical values for a 5% significance level, so for a significance level of 5% the t-statistic would not lie in the rejection region

Inference: Confidence Intervals
• Confidence intervals
– Simple manipulation of the result in Theorem 4.2 implies that the interval „estimate plus/minus (critical value of the two-sided test) times standard error“ covers the true parameter with probability equal to the confidence level; this gives the lower and upper bounds of the confidence interval
• Interpretation of the confidence interval
– The bounds of the interval are random
– In repeated samples, the interval that is constructed in the above way will cover the population regression coefficient in 95% of the cases

• Confidence intervals for typical confidence levels: when the degrees of freedom are large, use the rules of thumb for the critical values
• Relationship between confidence intervals and hypothesis tests: reject the null hypothesis that the coefficient equals a hypothesized value, in favor of the two-sided alternative, if and only if that value lies outside the confidence interval

• Example: model of firms‘ R&D expenditures
– Variables: spending on R&D, annual sales, and profits as a percentage of sales
– The effect of sales on R&D is relatively precisely estimated, as the interval is narrow; moreover, the effect is significantly different from zero, because zero is outside the interval
– The effect of profits as a percentage of sales is imprecisely estimated, as the interval is very wide; it is not even statistically significant, because zero lies in the interval
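A minimal sketch (Python, simulated data) of how such a confidence interval is built: estimate plus/minus c times the standard error, with c the 97.5th percentile of the t-distribution with n − k − 1 degrees of freedom for a 95% interval. The data-generating process below only loosely mimics an R&D-on-sales regression and is purely an assumption.

```python
# Minimal sketch: a 95% confidence interval for a slope coefficient, simulated data.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
n = 60
sales = rng.uniform(100, 5000, n)
rd = 2.0 + 0.03 * sales + rng.normal(0, 30, n)     # illustrative "R&D spending"

res = sm.OLS(rd, sm.add_constant(sales)).fit()
b, se, df = res.params[1], res.bse[1], int(res.df_resid)
c = stats.t.ppf(0.975, df)                          # critical value of the two-sided 5% test

print(b - c * se, b + c * se)                       # 95% CI built by hand
print(res.conf_int()[1])                            # statsmodels reports the same interval
```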
Inference: Testing Hypotheses About a Linear Combination of Parameters
• Example: return to education at 2-year vs. at 4-year colleges
– The regression contains years of education at 2-year colleges and years of education at 4-year colleges
– Test the null hypothesis that the two returns are equal against the alternative that the return to a year at a 2-year college is smaller
– A possible test statistic: the difference between the two estimates, normalized by the estimated standard deviation of the difference; the null hypothesis would have to be rejected if the statistic is „too negative“ to believe that the true difference between the parameters is equal to zero

• This is impossible to compute with standard regression output, because the standard deviation of the difference requires the covariance between the two estimates, which is usually not available in regression output
• Alternative method: define a new parameter as the difference between the two coefficients, test whether it is zero against the alternative that it is negative, and substitute this definition into the original regression; this creates a new regressor (= total years of college)

• Estimation results
– With total years of college as the new regressor, the hypothesis is rejected at the 10% level but not at the 5% level
• This method always works for single linear hypotheses

Inference: The F Test
• Testing multiple linear restrictions: the F test
• Testing exclusion restrictions
– Example: the salary of a major league baseball player is regressed on years in the league, the average number of games per year, the batting average, home runs per year, and runs batted in per year
– Test whether the performance measures have no effect / can be excluded from the regression: the null hypothesis sets their coefficients to zero, against the alternative that at least one of them is different from zero

• Estimation of the unrestricted model
– None of these variables is statistically significant when tested individually
– Idea: how would the model fit be if these variables were dropped from the regression?

• Estimation of the restricted model
– The sum of squared residuals necessarily increases, but is the increase statistically significant?
• Test statistic: the increase in the sum of squared residuals, relative to the sum of squared residuals of the unrestricted model and adjusted for the number of restrictions and the degrees of freedom
– The relative increase of the sum of squared residuals when going from H1 to H0 follows an F-distribution (if the null hypothesis H0 is correct)

• Rejection rule (Figure 4.7)
– An F-distributed variable only takes on positive values; this corresponds to the fact that the sum of squared residuals can only increase when moving from H1 to H0
– Choose the critical value so that the null hypothesis is rejected in, for example, 5% of the cases, although it is true

• Test decision in the example
– Given the number of restrictions to be tested and the degrees of freedom of the unrestricted model, the null hypothesis is overwhelmingly rejected (even at very small significance levels)
• Discussion
– The three variables are „jointly significant“
– They were not significant when tested individually
– The likely reason is multicollinearity between them

• Test of overall significance of a regression
– The null hypothesis states that the explanatory variables are not useful at all in explaining the dependent variable
– The restricted model is a regression on a constant only
– The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected

Next Class
• Freeing up the classical assumptions (heteroskedasticity)