10 Simple linear regression

This chapter is concerned with relations among variables such as demand and supply relations, cost functions, production functions and many others. Deterministic relations, characterized by a function $y = f(x)$, come mostly from the natural sciences. A relation between $X$ and $Y$ is deterministic if each element of the domain is paired with exactly one element of the range. In economic situations deterministic relations are very rare; we usually deal with stochastic (probabilistic) relations, which are more realistic for most real-world situations. A relation between $X$ and $Y$ is said to be stochastic if for each value of $X$ there is a whole probability distribution of values of $Y$. Thus for any given value of $X$ the variable $Y$ may assume some specific value (or fall within some specific interval) with a probability smaller than one and greater than zero; its value is affected by a random disturbance. Regression analysis deals with stochastic relations. Its aims are a) to determine the form of a function which may describe the dependence of $Y$ on $X$ and b) to estimate the parameters of the selected function.

ad a) To determine the form of the function we may start from logical analysis (i.e. follow some economic theory), or the form may be suggested by a two-dimensional diagram (scatter plot). (The latter approach can be used only when the dependent (response) variable is a function of just one independent (explanatory, predictor) variable.) Commonly used forms of regression functions are:

* regression line: $E(Y|x) = \beta_0 + \beta_1 x$
* regression parabola: $E(Y|x) = \beta_0 + \beta_1 x + \beta_2 x^2$
* regression polynomial of degree $p$: $E(Y|x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p$
* regression hyperbola: $E(Y|x) = \beta_0 + \beta_1 \frac{1}{x}$
* regression logarithmic function: $E(Y|x) = \beta_0 + \beta_1 \ln x$

Each of the listed regression functions is a simple linear regression function. (The term linear regression function is used if the function is linear with respect to the parameters $\beta_0, \beta_1, \beta_2, \dots$. It is said to be simple if the dependent variable is a function of just one independent variable; otherwise it is said to be multiple.)

ad b) The unknown parameters $\beta_0, \beta_1, \beta_2, \dots$ are estimated so as to fit the data set of $n$ pairs of observed values $(x_1, y_1), \dots, (x_n, y_n)$. The method of least squares is the commonly used estimation method.

10.1 Specification of the classical simple linear regression model

A model consists of the regression equation and the basic assumptions. Let us begin with the equation:

$Y = \beta_0 + \beta_1 f_1(x) + \dots + \beta_p f_p(x) + \varepsilon$

where:
$Y$ is a dependent random variable, which is observable,
$x$ is an independent non-stochastic variable, which is observable,
$\varepsilon$ is a random error, which accounts for the random factors and is unobservable,
$\beta_0 + \beta_1 f_1(x) + \dots + \beta_p f_p(x)$ is the theoretical regression function with unknown parameters $\beta_0, \beta_1, \dots, \beta_p$.

For $n$ observations the regression equation can be written as

$y_1 = \beta_0 + \beta_1 f_1(x_1) + \dots + \beta_p f_p(x_1) + \varepsilon_1$
$\vdots$
$y_i = \beta_0 + \beta_1 f_1(x_i) + \dots + \beta_p f_p(x_i) + \varepsilon_i$
$\vdots$
$y_n = \beta_0 + \beta_1 f_1(x_n) + \dots + \beta_p f_p(x_n) + \varepsilon_n$

The subscript $i = 1, \dots, n$ refers to the $i$-th observation. Observations on $X$ and $Y$ can be made over time, in which case we speak of time-series data, or they can be made over individuals, objects, or geographical areas, in which case we speak of cross-section data.
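To make the stochastic character of the model tangible before stating the assumptions, here is a minimal simulation sketch in Python (an illustration only, assuming numpy is available; the parameter values $\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$ and the grid of $x$ values are purely hypothetical). For every fixed $x$ the observed $Y$ equals the value of the regression function plus a random disturbance, so repeated observations at the same $x$ scatter around $E(Y|x)$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Illustrative "true" parameters of a regression line E(Y|x) = beta0 + beta1*x
beta0, beta1, sigma = 2.0, 0.5, 1.0

n = 50
x = np.linspace(1.0, 25.0, n)           # non-stochastic explanatory variable
eps = rng.normal(0.0, sigma, size=n)    # random errors epsilon_i ~ N(0, sigma^2)
y = beta0 + beta1 * x + eps             # observed values of the dependent variable

# For a fixed value of x the variable Y has a whole distribution of values:
print(beta0 + beta1 * x[0] + rng.normal(0.0, sigma, size=5))
```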
Assumptions about the random errors $\varepsilon_i$, $i = 1, \dots, n$:

a) $E(\varepsilon_i) = 0$ [the errors have zero mean, i.e. they are not systematic]
b) $D(\varepsilon_i) = \sigma^2 > 0$ [each observation is made with equal precision]
c) $C(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$ [there is no linear relationship between the errors]
d) $\varepsilon_i \sim N(0, \sigma^2)$ [the errors are normally distributed]

Violations of some basic assumptions are shown in the following pictures. In the first one there is a violation of assumption b) and we speak of heteroskedasticity of the random errors; in the second one there is a violation of assumption c) and we speak of autocorrelation of the random errors.

[Figure: scatter plots of the observed pairs $(x, y)$ together with the regression function; left panel: heteroskedasticity, right panel: autocorrelation.]

The following pictures show weak and strong linear dependence under the basic assumptions:

[Figure: scatter plots of $(x, y)$; left panel: weak linear dependence (low fit), right panel: strong linear dependence (strong fit).]

Once the mathematical form of the relation is specified, the unknown parameters $\beta_0, \beta_1, \dots, \beta_p$ have to be estimated.

10.2 The least squares estimators of regression parameters and the notation

$b_0, b_1, \dots, b_p$   estimators of the regression parameters $\beta_0, \beta_1, \dots, \beta_p$
$b_0 + b_1 f_1(x) + \dots + b_p f_p(x)$   empirical (sample) regression function
$\hat{y}_i = b_0 + b_1 f_1(x_i) + \dots + b_p f_p(x_i)$   regression estimate of the $i$-th value of the random variable $Y$
$e_i = y_i - \hat{y}_i$   $i$-th residual
$S_E = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n e_i^2$   residual sum of squares
$s^2 = \frac{S_E}{n-p-1}$   estimator of the variance $\sigma^2$
$S_R = \sum_{i=1}^n (\hat{y}_i - m_2)^2$, where $m_2 = \frac{1}{n}\sum_{i=1}^n y_i$   regression sum of squares
$S_T = \sum_{i=1}^n (y_i - m_2)^2$   total sum of squares [It holds that $S_T = S_R + S_E$.]
$ID^2 = \frac{S_R}{S_T} = 1 - \frac{S_E}{S_T}$   coefficient of determination [$ID^2 \in \langle 0, 1 \rangle$]

[The coefficient of determination is a measure of "goodness of fit"; it is simply the proportion of the variation of $Y$ that can be attributed to the variation of $X$ and describes how well the sample regression function fits the observed data. A zero value of $ID^2$ indicates the poorest and a unit value the best fit that can be attained.]

10.3 The method of least squares

The purpose of the least squares method is to find estimators $b_0, b_1, \dots, b_p$ of the regression parameters $\beta_0, \beta_1, \dots, \beta_p$ such that the sum of squared residuals is as small as possible. (The regression estimates then fit the data "best".) Thus

$S(\beta_0, \beta_1, \dots, \beta_p) = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left[y_i - \beta_0 - \beta_1 f_1(x_i) - \dots - \beta_p f_p(x_i)\right]^2 \rightarrow \min$

We have to minimize the function $S(\beta_0, \beta_1, \dots, \beta_p)$, which depends only on the unknown parameters of the regression model. The procedure is as follows:

1. Differentiate $S(\beta_0, \beta_1, \dots, \beta_p)$ with respect to each regression parameter.
2. Set each derivative equal to zero. This leads to a system of $p+1$ equations in $p+1$ unknowns. These equations are generally known as the least squares normal equations.
3. Solving the least squares normal equations we obtain the desired estimators $b_0, b_1, \dots, b_p$ of the regression parameters $\beta_0, \beta_1, \dots, \beta_p$.

The least squares normal equations have the form:

$\beta_0 \sum 1 + \beta_1 \sum f_1 + \beta_2 \sum f_2 + \dots + \beta_p \sum f_p = \sum y_i$
$\beta_0 \sum f_1 + \beta_1 \sum f_1^2 + \beta_2 \sum f_1 f_2 + \dots + \beta_p \sum f_1 f_p = \sum y_i f_1$
$\vdots$
$\beta_0 \sum f_p + \beta_1 \sum f_p f_1 + \beta_2 \sum f_p f_2 + \dots + \beta_p \sum f_p^2 = \sum y_i f_p$

where the symbol $\sum$ stands for $\sum_{i=1}^n$ and the symbol $f_j$ stands for $f_j(x_i)$, so that, for example, $\sum f_j = \sum_{i=1}^n f_j(x_i)$. The values of $\beta_0, \beta_1, \dots, \beta_p$ which solve the least squares normal equations are denoted $b_0, b_1, \dots, b_p$.
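As a numerical illustration of the procedure above, the following Python sketch (an illustration only, assuming numpy; the helper name normal_equations_fit and the simulated parabola data are hypothetical) assembles the sums $\sum f_j f_k$ and $\sum y_i f_j$ and solves the resulting normal equations for the estimates $b_0, b_1, \dots, b_p$.

```python
import numpy as np

def normal_equations_fit(x, y, funcs):
    """Solve the least squares normal equations for the model
    E(Y|x) = beta0 + beta1*f1(x) + ... + betap*fp(x)."""
    n = len(x)
    # Columns: 1, f1(x_i), ..., fp(x_i)
    F = np.column_stack([np.ones(n)] + [f(x) for f in funcs])
    A = F.T @ F            # matrix of the sums  sum f_j * f_k
    rhs = F.T @ y          # right-hand sides    sum y_i * f_j
    return np.linalg.solve(A, rhs)   # estimates b0, b1, ..., bp

# Example: a regression parabola fitted to simulated data (illustrative values)
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
y = 1.0 - 2.0 * x + 0.3 * x**2 + rng.normal(0.0, 1.0, size=x.size)
print(normal_equations_fit(x, y, [lambda t: t, lambda t: t**2]))
```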
Example 10.4
Consider the regression line and find the estimators $b_0, b_1$ of the parameters $\beta_0, \beta_1$. (The basic assumptions of the classical regression model are assumed to be satisfied.)

Solution
The estimates $b_0, b_1$ can be obtained from the least squares normal equations:

$b_0 \sum_{i=1}^n 1 + b_1 \sum_{i=1}^n x_i = \sum_{i=1}^n y_i$
$b_0 \sum_{i=1}^n x_i + b_1 \sum_{i=1}^n x_i^2 = \sum_{i=1}^n y_i x_i$

The solution is

$b_0 = \frac{\sum_{i=1}^n y_i \sum_{i=1}^n x_i^2 - \sum_{i=1}^n x_i \sum_{i=1}^n y_i x_i}{n \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}$

$b_1 = \frac{n \sum_{i=1}^n y_i x_i - \sum_{i=1}^n y_i \sum_{i=1}^n x_i}{n \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}$

Thus the estimated sample regression line is $\hat{y} = b_0 + b_1 x$. (Notice that $b_0, b_1$ are random variables; they depend on the realizations $(x_i, y_i)$, while the parameters $\beta_0, \beta_1$ are constants.)

10.5 The matrix notation of the classical linear regression model and its solution

The model

$y_i = \beta_0 + \beta_1 f_1(x_i) + \dots + \beta_p f_p(x_i) + \varepsilon_i, \quad i = 1, \dots, n,$

can be expressed in matrix notation as $y = X\beta + \varepsilon$, that is

$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & f_1(x_1) & \dots & f_p(x_1) \\ 1 & f_1(x_2) & \dots & f_p(x_2) \\ \vdots & & & \vdots \\ 1 & f_1(x_n) & \dots & f_p(x_n) \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$

Notation:
$y$   a column vector of observed values of the dependent random variable $Y$
$X$   a matrix of observed values of the regressors [we assume the rank $h(X) = p + 1 < n$, i.e. the columns of $X$ are linearly independent]
$\beta$   a column vector of regression parameters
$\varepsilon$   a column vector of random errors

The assumptions of the model can be rewritten as $\varepsilon \sim N_n(0, \sigma^2 I)$.

As stated in 10.3, the estimators $b_0, b_1, \dots, b_p$ of the regression parameters $\beta_0, \beta_1, \dots, \beta_p$ can be obtained by solving the least squares normal equations. In matrix notation:

$X'X\beta = X'y$   the least squares normal equations
$b = (X'X)^{-1}X'y$   the least squares estimators (LSE, frequently called OLS, ordinary least squares)
$\hat{y} = Xb$   a vector of regression estimates
$e = y - \hat{y}$   a vector of residuals

10.6 Properties of the least squares estimator $b = (X'X)^{-1}X'y$

1. The estimator $b$ is linear; it is a linear combination of the random vector $y$.
2. The estimator $b$ is unbiased; it holds that $E(b) = \beta$.
3. The estimator $b$ has the variance-covariance matrix $\mathrm{var}(b) = \sigma^2 (X'X)^{-1}$.
4. The estimator $b$ is normally distributed with mean vector $\beta$ and variance-covariance matrix $\mathrm{var}(b) = \sigma^2 (X'X)^{-1}$, thus $b \sim N_{p+1}(\beta, \sigma^2 (X'X)^{-1})$; the normality follows from $\varepsilon \sim N_n(0, \sigma^2 I)$ and the first property.
5. The estimator $b$ is the best linear unbiased estimator (BLUE) of the vector $\beta$.

Remark 10.7
The last property is known as the Gauss-Markov theorem. "Best" means that if $b^*$ is any other linear unbiased estimator, then $\mathrm{var}(b) \leq \mathrm{var}(b^*)$. [$\mathrm{var}(b^*) - \mathrm{var}(b)$ is a positive semi-definite matrix.]

Since we know the distribution of the vector of estimators $b$, we may proceed with statistical inferences about the regression parameters $\beta$. However, the parameter $\sigma^2$, which enters the variance-covariance matrix $\mathrm{var}(b)$, is unknown. Thus we have to obtain its estimator and, consequently, estimators of the variances of the elements of $b$. The variance-covariance matrix has the form

$\mathrm{var}(b) = \begin{pmatrix} \mathrm{var}(b_0) & \mathrm{cov}(b_0, b_1) & \dots & \mathrm{cov}(b_0, b_p) \\ \mathrm{cov}(b_1, b_0) & \mathrm{var}(b_1) & \dots & \mathrm{cov}(b_1, b_p) \\ \vdots & & & \vdots \\ \mathrm{cov}(b_p, b_0) & \mathrm{cov}(b_p, b_1) & \dots & \mathrm{var}(b_p) \end{pmatrix} = \sigma^2 (X'X)^{-1}$

Thus the variances $D(b_j)$, $j = 0, 1, \dots, p$, are the diagonal elements of the matrix $\sigma^2 (X'X)^{-1}$. Recall (from 10.2) that $s^2 = \frac{S_E}{n-p-1}$ is an unbiased estimator of the parameter $\sigma^2$. Thus the matrix $s^2 (X'X)^{-1}$ estimates the variance-covariance matrix $\mathrm{var}(b)$ and its diagonal elements estimate the variances $D(b_j)$.
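The matrix formulas above translate directly into code. The sketch below (an illustration only, assuming numpy; the helper name ols is ours, not a standard routine) computes $b = (X'X)^{-1}X'y$, the residuals, the estimate $s^2$ of $\sigma^2$ and the estimated variance-covariance matrix $s^2 (X'X)^{-1}$.

```python
import numpy as np

def ols(X, y):
    """Least squares estimation in matrix form (sketch of 10.5 and 10.6).
    X must contain a leading column of ones and have linearly independent columns."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y              # b = (X'X)^{-1} X'y
    y_hat = X @ b                      # vector of regression estimates
    e = y - y_hat                      # vector of residuals
    n, k = X.shape                     # k = p + 1
    s2 = (e @ e) / (n - k)             # unbiased estimator of sigma^2
    var_b = s2 * XtX_inv               # estimated variance-covariance matrix of b
    return b, y_hat, e, s2, var_b

# Example: regression line with illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones(x.size), x])
b, y_hat, e, s2, var_b = ols(X, y)
print(b, s2)
```

In numerical practice the explicit inverse $(X'X)^{-1}$ is usually avoided in favour of a stable routine such as np.linalg.lstsq; the explicit form is kept here only to mirror the textbook notation.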
The following notation is used:
$v_{jj}$   the $j$-th diagonal element of the matrix $(X'X)^{-1}$
$s_{b_j} = s\sqrt{v_{jj}}$   the standard error of $b_j$

10.8 The confidence intervals for the regression parameters

The statistic $T_j = \frac{b_j - \beta_j}{s_{b_j}}$ follows the t-distribution $t(n-p-1)$ for $j = 0, 1, \dots, p$. Thus the limits of the $100(1-\alpha)\%$ confidence interval for $\beta_j$ are calculated as

$b_j \pm s_{b_j}\, t_{1-\alpha/2}(n-p-1)$

10.9 Test of significance of single parameters (separate t-tests)

At the significance level $\alpha$ we test, for $j = 0, 1, \dots, p$,

$H_0: \beta_j = 0$ versus $H_1: \beta_j \neq 0$.

The null hypothesis asserts that the vector $y$ is not influenced by the $j$-th column of the matrix $X$. If the null is rejected, it is concluded that the parameter $\beta_j$ is relevant in our model. The test statistic $T_j = \frac{b_j}{s_{b_j}}$ follows the distribution $t(n-p-1)$ if $H_0$ is true. The critical region is

$W = (-\infty; -t_{1-\alpha/2}(n-p-1)\rangle \cup \langle t_{1-\alpha/2}(n-p-1); \infty)$

10.10 Test of significance of regression (the overall F-test)

At the significance level $\alpha$ we test:

$H_0: (\beta_1, \beta_2, \dots, \beta_p) = (0, 0, \dots, 0)$ versus $H_1: (\beta_1, \beta_2, \dots, \beta_p) \neq (0, 0, \dots, 0)$.

$H_0$ is a more extensive hypothesis stating that none of the explanatory variables has an influence on $Y$. If $H_0$ is true, then the variation of $Y$ from observation to observation is not affected by changes in any of the explanatory variables but is purely random, $Y = \beta_0 + \varepsilon$.

The test statistic $F = \frac{S_R/p}{S_E/(n-p-1)}$ follows the distribution $F(p, n-p-1)$ if $H_0$ is true. The critical region is

$W = \langle F_{1-\alpha}(p, n-p-1); \infty)$

The F-test results are usually presented in an ANOVA table:

Source of variability    Sum of squares    Degrees of freedom    Mean square        Test statistic
regression model         $S_R$             $p$                   $S_R/p$            $\frac{S_R/p}{S_E/(n-p-1)}$
error                    $S_E$             $n-p-1$               $S_E/(n-p-1)$
total                    $S_T$             $n-1$
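The quantities of 10.8, 10.9 and 10.10 can be computed from the same ingredients. The following sketch (an illustration only, assuming numpy and scipy are available; the helper name inference is hypothetical) returns the confidence limits $b_j \pm s_{b_j} t_{1-\alpha/2}(n-p-1)$, the separate t statistics with their p-values, and the overall F statistic with its p-value.

```python
import numpy as np
from scipy import stats

def inference(X, y, alpha=0.05):
    """Confidence intervals, separate t-tests and the overall F-test
    for a model whose design matrix X has a leading column of ones."""
    n, k = X.shape
    p = k - 1
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    SE = e @ e                               # residual sum of squares
    ST = np.sum((y - y.mean()) ** 2)         # total sum of squares
    SR = ST - SE                             # regression sum of squares (S_T = S_R + S_E)
    s2 = SE / (n - p - 1)
    sb = np.sqrt(s2 * np.diag(XtX_inv))      # standard errors s_{b_j}

    t_q = stats.t.ppf(1 - alpha / 2, n - p - 1)
    ci = np.column_stack([b - t_q * sb, b + t_q * sb])   # 100(1-alpha)% intervals

    T = b / sb                               # t statistics for H0: beta_j = 0
    t_pvals = 2 * stats.t.sf(np.abs(T), n - p - 1)

    F = (SR / p) / (SE / (n - p - 1))        # overall F statistic
    F_pval = stats.f.sf(F, p, n - p - 1)
    return ci, T, t_pvals, F, F_pval
```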