(b) (c)
x
y
x = --
1
2
y = 0
MATH 164, Spring 2001 Due Date: Name(s):
Honors Project 1: Least Squares
The Analytic Approach
In many science and engineering applications, one is often given a set of data points {(xi, yi), i = 1, ..., N}
in R2
, and interested in finding the line which "best fits this data" (see Fig. (a) below). One solution to
this problem is provided by classical least squares: finding the line for which the sum of the squares of the
distances di from the data points to the line in the direction of the dependent axis is a minimum (see Fig. (b)
below). Another solution to this problem is provided by invariant least squares: finding the line for which
the sum of the squares of the perpendicular distances di from the data points to the line is a minimum (see
Fig. (c) below). In general, these lines are different.
In MATH 261, it will follow (as an application of optimization of functions of two variables) that the line
y = mx + b which best approximates the data in the sense of classical least squares is given by
m =
N xiyi - ( xi) ( yi)
N x2
i - ( xi)
2 and b =
( yi) x2
i - ( xi) ( xiyi)
N x2
i - ( xi)
2 .
In MATH 351, we will discuss how to find the line which best approximates the data in the sense of invariant
least squares.
The classical least squares line is easier to compute than the invariant least squares line,
which is certainly a strength of classical least squares. On the other hand, the classical least
squares line depends on which variable is assumed to be independent and which is assumed
to be dependent, and this is a weakness. In particular, classical least squares assumes the
line which best approximates data has an equation of the form y = mx + b, and for a set
of data points such as {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)}, it will not necessarily
give the "best" answer. (The line whose equation is x = 1/2 -- which is the invariant least
squares line of fit -- appears to fit the data better than the line whose equation is y = 0 --
which is the classical least squares line of fit.)
Exercise 1: Find the constants m and b that define the line y = mx + b that best matches the following
data in the sense of classical least squares, to the nearest 2 decimal places.
x 1.0 1.2 1.5 1.8 2.0 2.3 2.4 2.6 3.4 3.7 3.8 3.9
y 2.7 4.0 3.6 4.2 4.5 5.1 5.4 5.7 7.2 7.8 7.8 6.0
Exercise 2: Repeat Exercise 1 (use the same data set, but now) finding the line x = ny + d that best
matches the data in the sense of classical least squares.
Exercise 3: Rewrite the equation x = ny + d in the form y = mx + b. (You should get an equation that is
different than the one you got in Exercise 1!)
In many science and engineering applications, one is also often given a set of data points {(xi, yi), i =
1, ..., N} in R2
, and interested in finding a curve other than a line which "best fits this data". For example,
you might want to fit a curve of the form y = Aekx
to such a set of data points. (This might happen if
you were studying population growth or radioactive decay problems, for example.) The difficulty in trying
to mimic the MATH 261 computations that we used to arrive at the classical least squares solution to our
problem, is that the systems of equations we would have to solve is much more complicated; in fact it is
too complicated to be of practical value. Instead of approaching our problem this way, it is much easier to
try to transform the data and the equation to which data is to be fit. For example, to fit a curve of the
form y = Aekx
to the data {(xi, yi), i = 1, ..., N}, we might take logarithms, transforming our equation into
ln y = kx + ln A, and our data into {(xi, ln yi), i = 1, ..., N}. By using the classical least squares solution to
the problem of fitting a line Y = mx + b to this "new" data set, we would find the m = k and b = ln A --
or k = m and A = eb
-- for which y = Aekx
best fits our original data set {(xi, yi), i = 1, ..., N}.
Exercise 4: Find the constants A and k that define the curve of the form y = Aekx
that best matches the
following data, to the nearest 2 decimal places.
x 1.0 1.4 1.8 2.5 2.9 3.2 3.6 3.9 4.0 4.2 4.4 4.5
y 0.9 1.1 1.3 1.7 2.0 2.3 2.8 3.1 3.3 3.6 3.9 4.0
One hidden advantage to transforming data by taking logarithms (as we did above) is that taking logarithms
tends to temper the size of data: if the yi-values in a data set are much larger or much smaller than the
corresponding xi-values, then the logs of the yi-values may be more commensurate with the corresponding
xi-values.
Exercise 5: By taking logs of both x- and y-data values, find the constants A and p that define the curve
of the form y = Axp
which best fits the following data, to the nearest 2 decimal places.
x 1 10 20 50 100 110
y 1.2 3.2 4.2 6.1 8.1 8.4
On the other hand, one disadvantage to transforming data is that data can often be transformed in more
than one way, and the results obtained via different transformations may be different.
The Geometric Approach
So far, the processes we have discussed have been analytical. In many applications, however, they are
often studied geometrically.
Given a set of data points {(xi, yi), i = 1, ..., N}, finding the line y = mx + b which best approximates it
(in whatever sense you might please) is straightforward: the data is plotted, the line is estimated by eye,
and -- having drawn the line -- the parameters that describe it are estimated by eye.
1 2 3 4 5 6 7 8 9 10 20 x
y
y = log10 x
10 100 1000
1
10
0.1
1
10
100
Exercise 6: By graphing, find the constants m and b that define the line y = mx + b that best matches the
data from Exercise 1 to the nearest 2 decimal places.
Finding the curve y = Aekx
that best fits a set of data points
is a little trickier, however. To find the curve of the form y =
Aekx
that best fits the set of data points {(xi, yi), i = 1, ..., N},
you might numerically transform the data into data of the form
{(xi, ln yi), i = 1, ..., N}, plot the data, estimate the line by eye,
estimate the parameters m and b that define the line, and then
numerically compute k = m and A = eb
. Alternately, rather than
taking the time and effort to transform the data, you might work
on semi-log graph paper. A sample of such paper is to the right.
On semi-log graph paper the y-axis tic marks are rescaled by log10 (see the figure below)
so that the units along the vertical axis can be labeled in any of
the manners illustrated to the right. Using semi-log graph paper
facilitates the plotting of y-values over a large range without the
need for numerical conversions by hand. One of the things that
makes semi-log graph paper so useful is the regularity in the irregularity
of the spacing of the y-axis tic marks; this regularity is a
consequence of the fact that log10 10n
= n log10 10 = n (or, if you
are thinking in terms of natural logs, ln en
= n ln e = n.)
Exercise 7: By graphing on semi-log graph paper, find the constants A and k that define the line y = Aekx
that best matches the data from Exercise 4 to the nearest 2 decimal places.
In addition to semi-log graph paper -- which facilitates the plotting
of y-values over a large range without the need for numerical
conversions by hand -- there is log-log graph paper which facilitates
the plotting of x- and y-values over a large range without the
need for numerical conversions by hand.
Exercise 8: By graphing on log-log graph paper, find the constants
A and p that define the line y = Axp
that best matches the
data from Exercise 5 to the nearest 2 decimal places.
Exercise 9: Your answers to Exercises 7 and 8 are probably not
the same. Why?