Distance and Approximation

A straight line may be the shortest distance between two points, but it is by no means the most interesting.
—Doctor Who, in "The Time Monster" by Robert Sloman, BBC, 1972

Although this may seem a paradox, all exact science is dominated by the idea of approximation.
—Bertrand Russell, in W. H. Auden and L. Kronenberger, eds., The Viking Book of Aphorisms, Viking, 1962, p. 263

Figure 7.1 Taxicab distance from A to B

7.0 Introduction: Taxicab Geometry

We live in a three-dimensional Euclidean world, and, therefore, concepts from Euclidean geometry govern our way of looking at the world. In particular, imagine stopping people on the street and asking them to fill in the blank in the following sentence: "The shortest distance between two points is a ____." They will almost certainly respond with "straight line." There are, however, other equally sensible and intuitive notions of distance. By allowing ourselves to think of "distance" in a more flexible way, we will open the door to the possibility of having a "distance" between polynomials, functions, matrices, and many other objects that arise in linear algebra.

In this section, you will discover a type of "distance" that is every bit as real as the straight-line distance you are used to from Euclidean geometry (the one that is a consequence of Pythagoras' Theorem). As you'll see, this new type of "distance" still behaves in some familiar ways.

Suppose you are standing at an intersection in a city, trying to get to a restaurant at another intersection. If you ask someone how far it is to the restaurant, that person is unlikely to measure distance "as the crow flies" (i.e., using the Euclidean version of distance). Instead, the response will be something like "It's five blocks away." Since this is the way taxicab drivers measure distance, we will refer to this notion of "distance" as taxicab distance.

Figure 7.1 shows an example of taxicab distance. The shortest path from A to B requires traversing the sides of five city blocks. Notice that although there is more than one route from A to B, all shortest routes require three horizontal moves and two vertical moves, where a "move" corresponds to the side of one city block. (How many shortest routes are there from A to B?) Therefore, the taxicab distance from A to B is 5. Idealizing this situation, we will assume that all blocks are unit squares, and we will use the notation d_t(A, B) for the taxicab distance from A to B.

Problem 1 Find the taxicab distance between the following pairs of points:
(a) (1, 2) and (5, 5)   (b) (2, 4) and (3, -2)   (c) (0, 0) and (-A, -3) [first coordinate illegible in the source]
(d) (-2, 3) and (1, 3)   (e) [the points in this part are illegible in the source]   (f) (2.5, 4.6) and (3.1, 1.5)

Problem 2 Which of the following is the correct formula for the taxicab distance d_t(A, B) between A = (a_1, a_2) and B = (b_1, b_2)?
(a) d_t(A, B) = (a_1 - b_1) + (a_2 - b_2)
(b) d_t(A, B) = (|a_1| - |b_1|) + (|a_2| - |b_2|)
(c) d_t(A, B) = |a_1 - b_1| + |a_2 - b_2|

We can define the taxicab norm of a vector v as

||v||_t = d_t(v, 0)

Problem 3 Find ||v||_t for the vectors in parts (a)-(d). (The entries of these vectors are illegible in the source.)

Problem 4 Show that Theorem 1.3 is true for the taxicab norm.

Problem 5 Verify the Triangle Inequality (Theorem 1.5), using the taxicab norm and the pairs of vectors in parts (a) and (b). (The entries of these vectors are illegible in the source.)

Problem 6 Show that the Triangle Inequality is true, in general, for the taxicab norm.

In Euclidean geometry, we can define a circle of radius r, centered at the origin, as the set of all x such that ||x|| = r.
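Before turning to circles, here is a small computational sketch of the taxicab distance and taxicab norm just defined; it is not part of the text. The function names are ours, and the coordinates used to realize the path of Figure 7.1 (three horizontal and two vertical blocks) are an assumption made for illustration. The distance formula is the one from Problem 2(c).

```python
# A sketch (not from the text) of taxicab distance and the taxicab norm.
# Function names and the coordinates chosen for A and B are our assumptions.

def taxicab_distance(a, b):
    """d_t(A, B) = |a1 - b1| + |a2 - b2|  (the formula in Problem 2(c))."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def taxicab_norm(v):
    """||v||_t = d_t(v, 0)."""
    return taxicab_distance(v, (0, 0))

# One way to realize Figure 7.1: three horizontal and two vertical unit blocks.
assert taxicab_distance((0, 0), (3, 2)) == 5

# Problem 1(a): (1, 2) and (5, 5)
print(taxicab_distance((1, 2), (5, 5)))          # 7

# Spot check of the Triangle Inequality ||u + v||_t <= ||u||_t + ||v||_t
u, v = (3, -2), (-1, 4)
w = (u[0] + v[0], u[1] + v[1])
assert taxicab_norm(w) <= taxicab_norm(u) + taxicab_norm(v)
```

With d_t and ||·||_t in hand, we can ask which points lie at a fixed taxicab distance r from the origin.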
Analogously, we can define a taxicab circle of radius r, centered at the origin, as the set of all x such that jjx||, = r. Problem 1 Draw taxicab circles centered at the origin with the following radii: (a) r = 3 (b) r = 4 (c) r = 1 Problem 8 In Euclidean geometry, the value of ir is half the circumference of a unit circle (a circle of radius 1). Lets define taxicab pi to be the number irt that is half the circumference of a taxicab unit circle. What is the value of 7Tf? In Euclidean geometry, the perpendicular bisector of a line segment AB can be defined as the set of all points that are equidistant from A and B. If we use taxicab distance instead of Euclidean distance, it is reasonable to ask what the perpendicular bisector of a line segment now looks like. To be precise, the taxicab perpendicular bisector of AB is the set of all points X such that d,(X.A) = dt(X,B) Problem 9 Draw the taxicab perpendicular bisector of AB for the following pairs of points: (a) A = (2,1), B = (4,1) (b) A = (-1,3), B = (-1, -2) (c) A = (1, 1), B = (5, 3) (d) A = (1, 1), B = (5, 5) As these problems illustrate, taxicab geometry shares some properties with Euclidean geometry, but it also differs in some striking ways. In this chapter, we will Section 7.1 Inner Product Spaces 531 encounter several other types of distances and norms, each of which is useful in its own way. We will try to discover what they have in common and use these common properties to our advantage. We will also explore a variety of approximation problems in which the notion of "distance" plays an important role. Inner Product Spaces In Chapter 1, we defined the dot product u • v of vectors u and v in R", and we have made repeated use of this operation throughout this book. In this section, we will use the properties of the dot product as a means of defining the general notion of an inner product. In the next section, we will show that inner products can be used to define analogues of "length" and "distance" in vector spaces other than W. The following definition is our starting point; it is based on the properties of the dot product proved in Theorem 1.2. DCfillitlOll An inner product on a vector space Vis an operation that assigns to every pair of vectors u and v in V a real number (u, v) such that the following properties hold for all vectors u, v, and w in V and all scalars c: 1. (u, v) = (v, u) 2. (u, v + w) = (u, v) + (u, w) 3. (cu, v) = c(u, v) 4. (u, u) £ 0 and (it, u) = 0 if and only if u = 0 A vector space with an inner product is called an inner product space. Remark Technically, this definition defines a real inner product space, since it assumes that V is a real vector space and since the inner product of two vectors is a real number. There are complex inner product spaces too, but their definition is somewhat different. (See Exploration: Vectors and Matrices with Complex Entries at the end of this section.) Example 7.1 W is an inner product space with (u, v) = u ■ v. Properties (1) through (4) were verified as Theorem 1.2. The dot product is not the only inner product that can be defined on W. Example 7.2 Letu = 1 and v = ."2 J V be two vectors in R2. Show that (u, v) = 2w,v, + 3m2v2 defines an inner product. S32 Chapter 7 Distance and Approximation Solution We must verify properties (1) through (4). Property (1) holds because (u, v) = 2ulvl + 3u2v2 - 2v1u1 + 3v2u2 = (v, u) Next, let w = . 
We check that (u, v + w) = 2«1(v] + Wj) + 3u2(v2 + w2) = 2ulvl + 2u1wl + 3u2v2 + 3w2w2 = (2«iVj + 3u2v2) + (2ulwl + 3u2w2) = (lI)V) + (u, w) which proves property (2). If c is a scalar, then (cu, v) = 2(cw1)v1 + 3(c«2)v2 = c(2uxvx + 3«2v2) = c(u,v) which verifies property (3). Finally, (u,u) = 2«!»! + 3u2«2 = 2«i + 3u\ S: 0 and it is clear that (u, u) = 2u\ + 3u\ = 0 if and only if ux = u2 = 0 (that is, if and only if u = 0). This verifies property (4), completing the proof that (u, v), as defined, is an inner product. Example 7.2 can be generalized to show that if wx,..., w„ are positive scalars and ui Vi u = sold v = .Un- -V„- are vectors in W, then '■;;',^^.v';•^■t^^''vV■.v:,>'-■^■ ■ ■■ (u, v) = wxulvl + • ■ ■ + wnu„v„ (1) defines an inner product on R", called a weighted dot product. If any of the weights Wj is negative or zero, then Equation (1) does not define an inner product. (See Exercises 13 and 14.) Recall that the dot product can be expressed as u • v = u2v. Observe that we can write the weighted dot product in Equation (1) as (u, v) = urWv Section 7.1 Inner Product Spaces 533 where W is the n X n diagonal matrix W = w The next example further generalizes this type of inner product. Example 7.3 Let A be a symmetric, positive definite « X n matrix (see Section 5,5) and let u and v be vectors in W. Show that , (u, v) = u7Av defines an inner product. Solution We check that (u, v) = u7Av = u-Av = ,\vu = ATvu = (vTA)T'U - vrAu = (v, u) Also, (u, v + w) = urA(v + w) = urAv + uTAw = (u, v) + (u, w) and (cu,v) = (cu)TAv =5 c(urAv) = c{u, v) Finally, since A is positive definite, (u, u) = urAu > 0 for all u £ 0, so (u, u) u'Au = 0 if and only if u = 0. This establishes the last property. 4 To illustrate Example 7.3, let A 4 -21 -2 7 . Then (u, v) = urAv = [ w, u2 4 -2 -2 7 = 4lUxVi — 2uiV2 ~ 2«2vi + 7u2v2 Example 7.4 The matrix A is positive definite, by Theorem 5.22, since its eigenvalues are 3 and 8. Hence, (u, v) defines an inner product on U2. We now define some inner products on vector spaces other than W. In 9^2» let p(x) = flo + <*\X + a2x2 and q(x) — b0 + bxx + b2x2. Show that (p(x), q(x)) = a0b0 + alb1 + a2b2 defines an inner product on = fMg(x)dx a defines an inner product on % [a, b]. Solution We have rb rb (fig) = f(x)g(x) dx = g(x)f(x) dx = (g,f) Ja Ja Also, if h is in % [a, b], then ifig+h)= f(x)(g(x) + h(x)) dx (f(x)g(x) +f(x)h(x))dx ■ b f(x)g(x) dx + f{x)h{x) dx a = (fig) + {f,h) If c is a scalar, then rb (Cf>g)= cf(x)g(x)dx b f(x)g(x) dx = c(fig) rb Finally, {/,/) = (f(x))2dx S 0, and it follows from a theorem of calculus that, since/ f 2 is continuous, {//) = (f(x)) dx = 0 if and only if/is the zero function. Therefore, J a (/, g) is an inner product on % [a, b]. Example 7.5 also defines an inner product on any subspace oi% [a, b]. For example, we could restrict our attention to polynomials defined on the interval [a, b]. Suppose we consider SP [0, 1], the vector space of all polynomials on the interval [0, 1]. Then, using the inner product of Example 7.5, we have (x2,1 + x) = x2(\ + x) dx = o V x4 — + — 3 4 Or + x) dx I 1 7 3 4 ~ 12 Section 7.1 Inner Product Spaces Properties of Inner Products The following theorem summarizes some additional properties that follow from the definition of inner product. Tl)60rem 7.1 Let u, v, and w be vectors in an inner product space Vand let c be a scalar. a. (u + v, w) = (u, w) + (v, w) b. (u, cv) = c{u, v) c. (u, 0) = (0, v) = 0 Proof We prove property (a), leaving the proofs of properties (b) and (c) as Exercises 23 and 24. 
Referring to the definition of inner product, we have

(u + v, w) = (w, u + v)          by (1)
           = (w, u) + (w, v)     by (2)
           = (u, w) + (v, w)     by (1)

Length, Distance, and Orthogonality

In an inner product space, we can define the length of a vector, the distance between vectors, and orthogonal vectors, just as we did in Section 1.2. We simply have to replace every use of the dot product u · v by the more general inner product (u, v).

Definition   Let u and v be vectors in an inner product space V.
1. The length (or norm) of v is ||v|| = √(v, v).
2. The distance between u and v is d(u, v) = ||u - v||.
3. u and v are orthogonal if (u, v) = 0.

Note that ||v|| is always defined, since (v, v) ≥ 0 by the definition of inner product, so we can take the square root of this nonnegative quantity. As in R^n, a vector of length 1 is called a unit vector. The unit sphere in V is the set S of all unit vectors in V.

Example 7.6   Consider the inner product on C[0, 1] given in Example 7.5. If f(x) = x and g(x) = 3x - 2, find
(a) ||f||   (b) d(f, g)   (c) (f, g)

Solution   (a) We find that

(f, f) = ∫₀¹ f(x)² dx = ∫₀¹ x² dx = [x³/3]₀¹ = 1/3

so ||f|| = √(f, f) = 1/√3.

(b) Since d(f, g) = ||f - g|| = √(f - g, f - g) and

f(x) - g(x) = x - (3x - 2) = 2 - 2x = 2(1 - x)

we have

(f - g, f - g) = ∫₀¹ (f(x) - g(x))² dx = ∫₀¹ 4(1 - x)² dx = 4[x - x² + x³/3]₀¹ = 4/3

Combining these facts, we see that d(f, g) = √(4/3) = 2/√3.

(c) We compute

(f, g) = ∫₀¹ f(x)g(x) dx = ∫₀¹ x(3x - 2) dx = ∫₀¹ (3x² - 2x) dx = [x³ - x²]₀¹ = 0

Thus, f and g are orthogonal.

It is important to remember that the "distance" between f and g in Example 7.6 does not refer to any measurement related to the graphs of these functions. Neither does the fact that f and g are orthogonal mean that their graphs intersect at right angles. We are simply applying the definition of a particular inner product. However, in doing so, we should be guided by the corresponding notions in R² and R³, where the inner product is the dot product. The geometry of Euclidean space can still guide us here, even though we cannot visualize things in the same way.

Example 7.7   Using the inner product on R² defined in Example 7.2, draw a sketch of the unit sphere (circle).

Solution   If x = [x, y]ᵀ, then (x, x) = 2x² + 3y². Since the unit sphere (circle) consists of all x such that ||x|| = 1, we have

1 = ||x|| = √(x, x) = √(2x² + 3y²)   or   2x² + 3y² = 1

This is the equation of an ellipse, and its graph is shown in Figure 7.2 (a unit circle that is an ellipse).

We will discuss properties of length, distance, and orthogonality in the next section and in the exercises. One result that we will need in this section is the generalized version of Pythagoras' Theorem, which extends Theorem 1.6.

Theorem 7.2   Pythagoras' Theorem
Let u and v be vectors in an inner product space V. Then u and v are orthogonal if and only if

||u + v||² = ||u||² + ||v||²

Proof   As you will be asked to prove in Exercise 32, we have

||u + v||² = (u + v, u + v) = ||u||² + 2(u, v) + ||v||²

It follows immediately that ||u + v||² = ||u||² + ||v||² if and only if (u, v) = 0.

Orthogonal Projections and the Gram-Schmidt Process

In Chapter 5, we discussed orthogonality in R^n. Most of this material generalizes nicely to general inner product spaces. For example, an orthogonal set of vectors in an inner product space V is a set {v₁, ..., v_k} of vectors from V such that (v_i, v_j) = 0 whenever v_i ≠ v_j. An orthonormal set of vectors is then an orthogonal set of unit vectors.
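As a quick numerical sanity check of this kind of orthogonality, the sketch below (not part of the text) approximates the integral inner product of Example 7.5 with scipy.integrate.quad and reproduces the three answers of Example 7.6. The helper name ip is ours, and SciPy is assumed to be available.

```python
# A sketch (not from the text) that checks Example 7.6 numerically.
# The helper `ip` approximates (f, g) = integral of f(x)g(x) over [a, b].
from math import sqrt
from scipy.integrate import quad

def ip(f, g, a=0.0, b=1.0):
    value, _error = quad(lambda x: f(x) * g(x), a, b)
    return value

f = lambda x: x
g = lambda x: 3 * x - 2
diff = lambda x: f(x) - g(x)

print(sqrt(ip(f, f)))        # ||f||   = 0.577... = 1/sqrt(3)
print(sqrt(ip(diff, diff)))  # d(f, g) = 1.154... = 2/sqrt(3)
print(ip(f, g))              # (f, g)  = 0.0 up to roundoff, so f and g are orthogonal
```

The same helper, called with a = -1 and b = 1, can be used to check the Gram-Schmidt computation on [-1, 1] that follows.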
An orthogonal basis for a subspace W of V is just a basis for W that is an orthogonal set; similarly, an orthonormal basis for a subspace W of Vis a basis for W that is an orthonormal set. In W, the Gram-Schmidt Process (Theorem 5.15) shows that every subspace has an orthogonal basis. We can mimic the construction of the Gram-Schmidt Process to show that every finite-dimensional subspace of an innec product space has an orthogonal basis—-all we need to do is replace the dot product by the more general inner product. We illustrate this approach with an example, (Compare the steps here with those in Example 5.13.) dx flillilS 1.8 Construct an orthogonal basis for S?2 with respect to the inner product (/>£> = f(x)g(x)dx by applying the Gram-Schmidt Process to the basis {1, x, x2}. Solution Let x, = 1, x2 = x, and x3 - x2. We begin by setting v, - x; = 1. Next we compute dx 2 and (t„ x2) = j x dx = 0 538 Chapter 7 Distance and Approximation Adrien Marie Legendre (1752-183: was a French mathematician who worked in astronomy, number theory, and elliptic functions. He was involved in several heated disputes with Gauss. Legendre gave the first published statement of the law of quadratic reciprocity in number theory in 1765. Gauss, however, gave the first rigorous proof of this result in 1801 and claimed credit for the result, prompting understandable outrage from Legendre. Then in 1806, Legendre gave the first published application of the method of least squares in a book on the orbits of comets. Gauss published on die same topic in 1809 but claimed he had been using the method since 1795, once again infuriating Legendre. M w Therefore, (vi.x2) 0., v2 = x2 ---rV] = x - -(1) (vi. Vi) 2 To find v3, we first compute (▼l.Ä3> x dx = v2, X, = x~ dx = 0, Then It follows that {v„ v2, v3} is an orthogonal basis for 3P2 on the interval [—1, 1]. The polynomials are the first three Legendre polynomials. If we divide each of these polynomials by its length relative to the same inner product, we obtain normalized Legendre polynomials (see Exercise 41). Just as we did in Section 5.2, we can define the orthogonal projection projlv(v) of a vector v onto a subspace W of an inner product space. If {u^ . . . , UjJ is an orthogonal basis for W, then 2. [Hint: You will need the fact that a polynomial of degree n has at most n zeros. See Appendix D.] 12. Repeat Exercise 5 using the inner product of Exercise 11 with a = 0, b = 1, c = 2. In Exercises 13-18, determine which of the four inner product axioms do not hold. Give a specific example in each case. in U2. Define (u, v) = uxvx. n I in IR2. Define 13. Let u = «i andv = .v2. 14. Let u - -«1 .«2. and v = "v.! (u, v) = uxvx — u2v2. 15. Let u = »1 ."2. andv = _v2„ in 1R2. Define Lv2J (U, v) = UXV2 + M2Vj. 16. In 3f2, define (p(x), q(x)) = p(0)q(0). 17. In 3\, define (p(x), q(x)) = p{l)q{\). 18. In M22, define (A, B) = det(AB). In Exercises 19 and 20, (u, v) defines an inner product on IR2, where u = M] andv = 1 . Find a symmetric .«2J lv2„ matrix A such that (u, v) = urAv. 19. (u, v) = 4w,V! + UjV2 + u2Vi + 4u2v, 20. (u, v) = KjV] + 2U[V2 + 2«2V! + 5w2v2 Exercises 21 and 22, sketch the unit circle in U2for the given inner product, where u "Uj" andv = _«2_ A'2- 21. (u, v) = uxvx + \u2v2 22. (u,v) = 4Uj Vj + «iV2 + m2Vj + 4m2v2 23. Prove Theorem 7.1(b). 24. Prove Theorem 7.1(c). 
In Exercises 25-29, suppose that u, v, and w are vectors in an inner product space such that (u, v) = 1, (u, w) = 5, (v,w) = 0 ||u|| = 1, ||v|| = V3, ||w|| = 2 Evaluate the expressions in Exercises 25-28. 25. (u + w, v — w) 26. (2v ~ w, 3u + 2w) 27. ||u + v|| 28. ||2u - 3v + w|| 29. Show that u + v = w. [Hint: How can you use the properties of inner product to verify that u + v - w = 0?] 30. Show that, in an inner product space, there cannot be unit vectors u and v with (u, v) < -1. In Exercises 31-36, (u,v) is an inner product. In Exercises 31-34, prove that the given statement is an identity. 31. (u + v,u - v) - """2 - iL'"2 32. ||« + vll2 = ||u ;|U|[" ~ ||V|| + 2{u,v) + ||v| 33. u 2 + = f||u + v| 2llU ,112 34. (u,v) = i||u +v||^ - jllu-35. Provefhat II u + v II = II u — v|| if and only if u and v are orthogonal. 36. Prove that d(u, v) = V||u||2 + ||v||2ifandonlyifu and v are orthogonal. In Exercises 37-40, apply the Gram-Schmidt Process to the basis B to obtain an orthogonal basis for the inner product space V relative to the given inner product. 37. V B = Example 7.2 38. V = R2, B = with the inner product in with the inner product 39 1S_40. 0« 41 42, immediately following Example 7.3 V = £?2, B = {l, 1 + x, 1 + x + x2}, with the inner product in Example 7.4 V = 9>2[0, l],B = {1,1 + x, 1 + x + x2}, with the inner product in Example 7.5 (a) Compute the first three normalized Legendre polynomials. (See Example 7.8.) (b) Use the Gram-Schmidt Process to find the fourth normalized Legendre polynomial. If we multiply the Legendre polynomial of degree n by an appropriate scalar we can obtain a polynomial L„(x) such that Ln(l) = 1. (a) Find LQ(x), Lx(x), L2(x), and L3(x). (b) It can be shown that Ln(x) satisfies the recurrence relation 2n — 1 n — 1 Ln(x) = -xL„_i(x)--i»-2W 542 Chapter 7 Distance and Approximation for all n > 2. Verify this recurrence for L2(x) and L3(x). Then use it to compute L4(x) and L5(x). 43. Verify that if W is a subspace of an inner product space V and v is in V, then perpw(v) is orthogonal to all w in W. 44. Let u and v be vectors in an inner product space V. Prove the Cauchy-Schwarz Inequality for u # 0 as follows: (a) Let tbe a real scalar. Then (tu + v, tu + v) £ 0 for all values of t. Expand this inequality to obtain a quadratic inequality of the form at2 + bt + c > 0 What are a, b, and c in terms of u and v? (b) Use your knowledge of quadratic equations and their graphs to obtain a condition on a, b, and c for which the inequality in part (a) is true. (c) Show that, in terms of u and v, your condition in part (b) is equivalent to the Cauchy-Schwarz Inequality. Explorations Vectors and Matrices with Complex Entries In this book, we have developed the theory and applications of real vector spaces, the most basic example of which is W. We have also explored the finite vector spaces lnp and their applications. The set C" of «-tuples of complex numbers is also a vector space, with the complex numbers C as scalars. The vector space axioms (Section 6.1) all hold for C", and concepts such as linear independence, basis, and dimension carry over from W without difficulty. The first notable difference between W and C" is in the definition of dot product. If we define the dot product in C" as in W, then for the nonzero vector v = have V v^v = V;2 + i2" V/Zi + i = vo = o This is clearly an undesirable situation (a nonzero vector whose length is zero) and violates Theorems 1.2(d) and 1.3. We now generalize the real dot product to C" in a way that avoids this type of difficulty. 
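The following sketch (ours, not the text's) illustrates numerically the difficulty just described for a vector such as [1, i]ᵀ, and previews the repair made by the definition that follows: NumPy's vdot conjugates its first argument, which is exactly what keeps the "length" of a nonzero complex vector positive.

```python
# A sketch (not from the text) of why the real dot product fails on C^n.
import numpy as np

v = np.array([1, 1j])        # a nonzero complex vector

print(v @ v)                 # 1 + i^2 = 0: the "real-style" dot product gives length 0
print(np.vdot(v, v))         # np.vdot conjugates its first argument: 1 + 1 = 2
print(np.linalg.norm(v))     # sqrt(2), consistent with the conjugated product
```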
Vi Definition ifu = and v = are vectors in C", then the complex dot product of u and v is defined by u • V = ulvl + The norm (or length) of a complex vector v is defined as in the real case: ||v|| — Vv'V. Likewise, the distance between two complex vectors u and v is still defined as d(u, v) = jju-vj|. 543 1. Show that, for v = 2. Let u = 1 [l (a) u*v (b) ||-u.|| (c) ||v|| (d) d(u,v) (e) a nonzero vector orthogonal to u (f) a nonzero vector orthogonal to v The complex dot product is an example of the more general notion of a complex inner product, which satisfies the same conditions as a real inner product with two exceptions. Problem 3 provides a summary. 3. Prove that the complex dot product satisfies the following properties for all vectors u, v, and w in C" and all complex scalars. (a) u-v = vu (b) u • (v + w) = u • v + u • w (c) (cu) *v = c (u • v) and u • (cv) = c(u • v) (d) u • u 3s 0 and u • u = 0 if and only if u = 0. For matrices with complex entries, addition, multiplication by complex scalars, transpose, and matrix multiplication are all defined exactly as we did for real matrices in Section 3.1, and the algebraic properties of these operations still hold. (See Section 3.2.) Likewise, we have the notion of the inverse and determinant of a square complex matrix just as in the real case, and the techniques and properties all carry over to the complex case. (See Sections 3.3 and 4.2.) The notion of transpose is, however, less useful in the complex case than in the real case. The following definition provides an alternative. -——-;-;---:-« DfifinitiOII If A is a complex matrix, then the conjugate transpose of A is the matrix A* defined by A* = A1' and v 2 - 3; 1 + 5i Find: In the preceding definition, A refers to the matrix whose entries are the complex conjugates of the corresponding entries of A; that is, if A = [fly] .then A = [atA. 4. Find the conjugate transpose A* of the given matrix: i 2f (a) A = (c) A = -» 3 2-i 1+3» -2 4 0 3 - 4» (b) A (d) A 2 5-2» 5 + 2» -1 3i 0 1 + i 1 - i 4 i 1 + i 0 -» Properties of the complex conjugate (Appendix C) extend to matrices, as the next problem shows. 5. Let A and B be complex matrices, and let c be a complex scalar. Prove the following properties: (a) A' = A (b) ATB = A + B (c) cA = cA (d) AB =AB (e) (A)T = (Ar) Hermitian matrices are named after the French mathematician Charles Hermite (1822-1901). Hermite is best known for his proof that the number e is transcendental, but he also was the first to use the term orthogonal matrices, and he proved that symmetric (and Hermitian) matrices have real eigenvalues. The properties in Problem 5 can be used to establish the following properties of the conjugate transpose, which are analogous to the properties of the transpose for real matrices (Theorem 3.4). 6. Let A and B be complex matrices, and let c be a complex scalar. Prove the following properties: (a) (A*)* = A (c) (cA)* = cA* (b) (A + 5)* = A* + B* (d) (AB)* = B*A* 7. Show that for vectors u and v in C", the complex dot product satisfies u • v = u*v. (This result is why we defined the complex dot product as we did. It gives us the analogue of the formula u • v = urv for vectors in W.) For real matrices, we have seen the importance of symmetric matrices, especially in our study of diagonalization. Recall that a real matrix A is symmetric if A1 - A. For complex matrices, the following definition is the correct generalization. 
D6f InitiOn A square complex matrix A is called Hermitian if A* = A—that is, if it is equal to its own conjugate transpose. 8. Prove that the diagonal entries of a Hermitian matrix must be real. 9. Which of the following matrices are Hermitian? (a) A = (c) A = (e) A 2 1 + 1 - i i -3 1 - 5i ■1 + 5i 3 0 3 2 -3 0 -1 -2 1 0 (b) A = (d) A (f) A = -1 2 - 3/ 1 1 - 4/ . 3 + i 3 0 0 2 -2 1 2 - 3i 5 1 + 4i 2 —i 3 - / i 0 10. Prove that the eigenvalues of a Hermitian matrix are real numbers. [Hint: The proof of Theorem 5.18 can be adapted by making use of the conjugate transpose operation.] 11. Prove that if A is a Hermitian matrix, then eigenvectors corresponding to distinct eigenvalues of A are orthogonal. [Hint: Adapt the proof of Theorem 5.19 using u • v = u*v instead of u • v = u v.] Recall that a square real matrix Q is orthogonal if Q~1 = Q1, The next definition provides the complex analogue. Definition A square complex matrix Uis called unitary if U~1 = U*. Just as for orthogonal matrices, in practice it is not necessary to compute U"1 directly. You need only show that 17* 17 = / to verify that U is unitary. 545 12. Which of the following matrices are unitary? For those that are unitary, give their inverses. (a) (c) 3/5 4i/5 -4/5 31/5 (b) (d) (1 + f)/V6 0 2/VfT 0 1 0 .(-1 - f)/V3 0 1/V3. Unitary matrices behave in most respects like orthogonal matrices. The following problem gives some alternative characterizations of unitary matrices. 13. Prove that the following statements are equivalent for a square complex matrix U: (a) U is unitary. (b) The columns of U form an orthonormal set in C" with respect to the complex dot product. (c) The rows of U form an orthonormal set in C" with respect to the complex dot product. (d) ||l/x|| = ||x|| for every x in C". (e) L'x • Uy — x • y for every x and y in C". [Hint: Adapt the proofs of Theorems 5.4-5.7.] 14. Repeat Problem 12, this time by applying the criterion in part (b) or part (c) of Problem 13. The next definition is the natural generalization of orthogonal diagonalizability to complex matrices. Definition A square complex matrix A is called unitarily diagonalizable if there exists a unitary matrix U and a diagonal matrix D such that U*AU = D The process for diagonalizing a unitarily diagonalizable n X n matrix A mimics the real case. The columns of U must form an orthonormal basis for C" consisting of eigenvectors of A. Therefore, we (1) compute the eigenvalues of A, (2) find a basis for each eigenspace, (3) ensure that each eigenspace basis consists of orthonormal vectors (using the Gram-Schmidt Process, with the complex dot product, if necessary), (4) form the matrix U whose columns are the orthonormal eigenvectors just found. Then U *AU will be a diagonal matrix D whose diagonal entries are the eigenvalues of A, arranged in the same order as the corresponding eigenvectors in the columns of U. 15. In each of the following, find a unitary matrix U and a diagonal matrix D such that 17* A U = D. (b) A = (a) A = -i 2 (c) A r -i 1 - i 1 + i 0 (d) A = -1 0_ 0 2 1 + i 0 The matrices in (a), (c), and (d) of the preceding problem are all Hermitian. It turns out that every Hermitian matrix is unitarily diagonalizable. (This is the Complex Spectral Theorem, which can be proved by adapting the proof of Theorem 5.20.) At this point you probably suspect that the converse of this result must also be true—namely, that every unitarily diagonalizable matrix must be Hermitian. 
But unfortunately this is falsel (Can you see where the complex analogue of the proof of Theorem 5.17 breaks down?) For a specific counterexample, take the matrix in part (b) of Problem 15. It is not Hermitian, but it is unitarily diagonalizable. It turns out that the correct characterization of unitary diagonalizability is the following theorem, the proof of which can be found in more advanced textbooks. A square complex matrix A is unitarily diagonalizable if and only if A*A = AA* A matrix A for which A*A = AA* is called normal. 16. Show that every Hermitian matrix, every unitary matrix, and every skew-Hermitian matrix (A* = —A) is normal. (Note that in the real case, this result refers to symmetric, orthogonal, and skew-symmetric matrices, respectively.) 17. Prove that if a square complex matrix is unitarily diagonalizable, then it must be normal. Geometric Inequalities and Optimization Problems This exploration will introduce some powerful (and perhaps surprising) applications of various inequalities, such as the Cauchy-Schwarz Inequality. As you will see, certain maximization/minimization problems {optimization problems) that typically arise in a calculus course can be solved without using calculus at all! Recall that the Cauchy-Schwarz Inequality in W states that for all vectors u and v, |u"v| s ||u|| ||v|| with equality if and only if u and v are scalar multiples of each other. If u = [Xi ■■■ x„]7andv=[v1 ••• yH] , the above inequality is equivalent to kiji + • • • + x„y„\ < Vx] + ■ • • + x\ VyT+ + yl Squaring both sides and using summation notation, we have See Linear Algebra with Applications by S. J. Leon (Upper Saddle River, NJ: Prentice-Hall, 2002). 541 Equality holds if and only if there is some scalar k such that^, = kx. for i = 1,..., n. Let's begin by using Cauchy-Schwarz to derive a special case of one of the most useful of all inequalities. 1. Let x and y be nonnegative real numbers. Apply the Cauchy-Schwarz Inequality to u Vx and v = Vy Vx to show that ._ x + y Vxy < (1) D c o \i \ X Y / y 1 Figure 7.4 with equality if and only ifx = y. 2. (a) Prove inequality (1) directly. [Hint: Square both sides.] (b) Figure 7.4 shows a circle with center O and diameter AB — AC + CB = x + y. The segment CD is perpendicular to AB. Prove that CD = Vxy and use this result to deduce inequality (1). [Hint: Use similar triangles.] The right-hand side of inequality (1) is the familiar arithmetic mean (or average) of the numbers x and y. The left-hand side shows the less familiar geometric mean of x and y. Accordingly, inequality (1) is known as the Arithmetic Mean-Geometric Mean Inequality (AMGM). It holds more generally; for n nonnegative variables Xn ..., x„, it states 'if---... v v x^Xj' ' ' Xn with equality if and only if xi = x2 = • x„. In words, the AMGM Inequality says that the geometric mean of a set of nonnegative numbers is always less than or equal to their arithmetic mean, and the two are the same precisely when all of the numbers are the same. (For the general proof, see Appendix B.) We now explore how such an inequality can be applied to optimization problems. Here is a typical calculus problem. Figure 7.5 3Ml Example 7.9 Prove that among all rectangles whose perimeter is 100 units, the square has the largest area. Solution If we let x and y be the dimensions of the rectangle (see Figure 7.5), then the area we want to maximize is given by A = xy We are given that the perimeter satisfies 2x + 2y = 100 which is the same as x + y = 50. 
We can relate xy and x + y using the AMGM Inequality: x + y Vxy S —-— or, equivalently, xy £ %(x + y) Since x + y = 50 is a constant (and this is the key), we see that the maximum value of A = xy is 502/4 = 625 and it occurs when x = y = 25. JjJot a derivative in sight! Isn't that impressive? Notice that in this maximization problem, the crucial step was showing that the right-hand side of the AMGM Inequality was constant. In a similar fashion, we may be able to apply the inequality to a minimization problem if we can arrange for the left-hand side to be constant. Example 7.10 Prove that among all rectangular prisms with volume 8 m3, the cube has the minimum surface area. Solution As shown in Figure 7.6, if the dimensions of such a prism are x, y, and z, then its volume is given by V = xyz Thus, we are given that xyz = 8. The surface area to be minimized is S = 2xy + 2yz + 2zx Since this is a three-variable problem, the obvious thing to try is the version of the AMGM Inequality for n = 3—namely, x + y + z vxyz Unfortunately, the expression for S does not appear here. However, the AMGM Inequality also implies that S 2xy + 2yz + 2zx 3 ~~ 3 > ^(2xy)(2yz)(2zx) = 2^/{.xyzf = 2V64 = 8 which is equivalent to S 2: 24. Therefore, the minimum value of S is 24, and it occurs when 2xy = 2yz = 2zx (Why?) This implies that x = y = z = 2 (i.e., the rectangular prism is a cube). 3. Prove that among all rectangles with area 100 square units, the square has the smallest perimeter. 4. What is the minimum value of f{x) = x + — for x > 0? 549 5. A cardboard box with a square base and an open top is to be constructed from a square of cardboard 10 cm on a side by cutting out four squares at the corners and folding up the sides. What should the dimensions of the box be in order to make the enclosed volume as large as possible? 6. Find the minimum value oif(x, y, z) — (x + y) (y + z) (z + x) if x, y, and z are positive real numbers such that xyz = 1. 8 7. For x > y > 0, find the minimum value of x H--------. [Hint: A substitution might help.] y(x - y)' The Cauchy-Schwarz Inequality itself can be applied to similar problems, as the next example illustrates. Example 7.11 Find the maximum value of the function f(x, y, z) = 3x + y + 2z subject to the constraint x2 + y2 + z2 = 1. Where does the maximum value occur? f Solution This sort of problem is usually handled by techniques covered in a multi- variable calculus course. Here's how to use the Cauchy-Schwarz Inequality. The function 3x + y + 2z has the form of a dot product, so we let "3" X u = 1 and v = y _2_ Then the componentwise form of the Cauchy-Schwarz Inequality gives (3x + y + 2z)2..... (32 + l2 + 22)(x2 + y2 + z2) 14 Thus, the maximum value of our function is Vl4, and it occurs when X "3" y = k 1 _ ~ _ 2_ Therefore, x = 3k, y = k, and z = 2k, so 3(3fc) + k k = 1 / Vl4, and hence 2(2k) = VÜ. It follows that x "3/VI4" y = 1/V14 _z_ _2/Vl4_ 8. Find the maximum value of f{x, y, z) = x + 2y + Az subject to x2 + ly1 + z2 = 1. z2 9. Find the minimum value off(x, 7,2) = x2 + y2+ — subjecttox + y + z = 10. 10. Find the maximum value of sin 9 + cos 9. 11. Find the point on the line x + 2y = 5 that is closest to the origin. There are many other inequalities that can be used to solve optimization problems. The quadratic mean of the numbers xu ..., x„ is defined as 550 If X 1> ..., xn are nonzero, their harmonic mean is given by n l/x, + l/x-2 + • • ■ + l/x„ It turns out that the quadratic, arithmetic, geometric, and harmonic means are all related. 12. 
Let x and y be positive real numbers. Show that with equality if and only if x = y. (The middle inequality is just AMGM, so you need only establish the first and third inequalities.) 13. Find the area of the largest rectangle that can be inscribed in a semicircle of radius r (Figure 7.7). 14. Find the minimum value of the function for x,y > 0. [Hint: (x + yf/xy = (x + y)(l/x + 1/y).] 15. Let x and y be positive real numbers with x + y — 1. Show that the minimum value of f(x,y) = (x + y)2 XV is y, and determine the values of x and y for which it occurs. 551 552 Chapter 7 Distance and Approximation Norms and Distance Functions ■■ mm In the last section, you saw that it is possible to define length and distance in an inner product space. As you will see shortly, there are also some versions of these two concepts that are not defined in terms of an inner product. To begin, we need to specify the properties that we want a "length function" to have. The following definition does this, using as its basis Theorem 1.3 and the Triangle Inequality. DCfinitlOII A norm on a vector space V is a mapping that associates with each vector v a real number ||v||, called the norm of v, such that the following properties are satisfied for all vectors u and v and all scalars c: 1. jjv|| > 0, and ||v|| = 0 if and only if v = 0. 2. jjcvji = |c|||v|| 3. j|u + v| < H| + ||v|| A vector space with a norm is called a normed linear space. Show that in an inner product space, ||v|| = V (v, v) defines a norm. Solution Clearly, V(v, v) > 0. Moreover, V{v, v) = 0 ^ (v, v) = 0 v = 0 by the definition of inner product. This proves property (1). For property (2), we only need to note that |jcv|! = V(cv, cvj = V^vTvj = V^Vty.v) = |c|||v|| Property (3) is just the Triangle Inequality, which we verified in Theorem 7.4. We now look at some examples of norms that are not defined in terms of an inner product. Example 7.13 is the mathematical generalization to W of the taxicab norm that we explored in the Introduction to this chapter. Example 7.13 The sum norm ||v||s of a vector v in W is the sum of the absolute values of its components. That is, if v = [vj • • • v„]T, then Wh - ini Show that the sum norm is a norm. • + I v„ Solution Clearly, ||v||s = |vj H— + |v„| > 0, and the only way to achieve equality is if | Vj | = • ■ • = | v„ | =0. But this is so if and only if Vj = ■ • • = v„ = 0 or, equivalently, v = 0, proving property (1). For property (2), we see that cv = [cVj ■•• cv„]r,so |kv|s= |cv,| + •••+ |cvj = IcKlvil + ■••+ |vj) - |c|H. Section 7,2 Norms and Distance Functions 553 Finally, the Triangle Inequality holds, because if u = [u1 •'• un] , then !lu + v!ls = lui + vJ + ■ • • + \u„ + v„\ - (k! + hi) + ■■■ + (!«„! + \vn\) = (\Ul\ +■■■+ \un\) + (jvj + • • • + |vj) = ||u||, + ||vj|s The sum norm is also known as the l-norm and is often denoted by \\\\x. On fR2, it is the same as the taxicab norm. As Example 7.13 shows, it is possible to have several norms on the same vector space. Example 7.14 illustrates another norm on W. Example 1,14 The max norm j|v|jm of a vector v in U" is the largest number among the absolute values of its components. That is, ifv = [v. I T then ||v||m = max{ r: ,..., jvj} Show that the max norm is a norm. Again, it is clear that j|v|Jm _ 0. If ||v||m = 0, then the largest of |v1 [,..., j vj is zero, and so they all are. Hence, v, = • • • = v„ = 0, so v = 0. This verifies property (1). 
Next, we observe that for any scalar c, ||cv|!m = max{|cv,|,..., |cv„|} = |cjma_{|v,|,..., |v„|} = |c||jv||m Finally, for u = [ux ■■• un] , we have ||u + v|w = max{|«i + vx\,\u„ + v„\] ^max{|«J + |vj,..., |«„| + |v„|} < max{ u ...., \un\} + max{|vil,..., jvj} = jjujj,,, + ||v||m (Why is the second inequality true?) This verifies the Triangle Inequality. The max norm is also known as the co-norm or uniform norm and is often denoted by jjvjl». In general, it is possible to define a norm j|vjj?) on W by \\v\\„ = (I v, !? + •••+ \vn\P)u* for any real number p > 1. For p = 1, jjvjji = jjv|js, justifying the term l-norm. For p = 2, 1M|2 = (IvJ2 + • • • + !v„|2)1/2 = Vv[+^4- vt which is just the familiar norm on R" obtained from the dot product. Called the 2-norm or Euclidean norm, it is often denoted by !|vj|£. As p gets large, it can be shown using calculus that [|vL approaches the max norm ||v||m. This justifies the use of the alternative notation lIviL for this norm. ___,__..................__,_______,_._Ste Exam _—i—-——~—.............-------.....■ -■.........." ■- ........j.............. —JB*-For a vector v in Z", define v H to be w(v), the weight of v. Show that it defines a 554 Chapter 7 Distance and Approximation Solution Certainly, ||v||H = w(v) 5: 0, and the only vector whose weight is zero is the zero vector. Therefore, property (1) is true. Since the only candidates for a scalar c are 0 and 1, property (2) is immediate. To verify the Triangle Inequality, first observe that if u and v are vectors in Z?, then w(u + v) counts the number of places in which u and v differ. [For example, if u = [1 1 0 1 0]r and v = [0 1 1 1 l]r then u + v = [1 0 1 0 ljr, so w(u + v) = 3, in agreement with the fact that u and v differ in exactly three positions.] Suppose that both u and v have zeros in n0 positions and Is in nx positions, u has a 0 and v has a 1 in «ol positions, and u has a 1 and v has a 0 in nl0 positions. (In the example above, n0 = 0, n, = 2, nm = 2, and nl0 =1.) Now w(u) = «, + n10, w(v) — «! + «01, and w(u + v) = nU) + «01 Therefore, + V|H = w(U + V) = «i0 + Hqi = («i + «10) + («, + nm) - 2«i < («i + «i0) + («i + «0i) = w(u) + w(v) = |)u||H + ||v||H The norm ||v||H is called the Hamming norm. Distance Functions ' For any norm, we can define a distance function just as we did in the last section—namely, d(u, v) == |ju - v|j Example 7.16 Let u = (b) the su 3" -2. m nc and v = irm, and (c -l" 1 ) the . Compute d( u, v) relative to (a) the Euclidean norm, max norm. Solution Each calculation requires knowing that u — v = . (a) As is by now quite familiar, dE(u,\) = |ju - v||E = V42 + (-3)2 = V25 = 5 (b) ds(u,v) = lju -v\\s = |4| + j-3| = 7 (c) d,„(u, v) = ||u - v||„, = max{|4|, |-3|} = 4 k The distance function on Zjj determined by the Hamming norm is called the Hamming distance. We will explore its use in error-correcting codes in Section 8.5. Example 7,17 provides an illustration of the Hamming distance. Section 7.2 Norms and Distance Functions 555 • ? 11 Find the Hamming distance between u= (1 1 0 1 0] T and v - [0 1 1 1 1]T Solution Since we are working over Z,, u - v = u + v. But dH(u,v) = ||u + v\\H = w(u + v) As we noted in Example 7.15, this is just the number of positions in which u and v differ. "The given vectors are the same ones used in that example; the calculation is therefore exactly the same. Hence, dH(u, v) = 3. 4 Theorem 7.5 summarizes the most important properties of a distance function. Theorem. 7.5 Let d be a distance function defined on a normed linear space V. 
The following properties hold for all vectors u, v, and w in V: a. d(u, v) > 0, and d(u, v) = 0 if and only if u - v. b. d(u, v) = d(v, u) c. d(u, w) 5 d(u, v) + d(v, w) (a) Using property (1) from the definition of a norm, it is easy to check that d(u, v) == || u — vjj s 0, with equality holding if and only if u — v = 0 or, equivalently, u = v. (b) You are asked to prove property (b) in Exercise 19. (c) We apply the Triangle Inequality to obtain A function d satisfying the three properties of Theorem 7.5 is also called a metric, and a vector space that possesses such a function is called a metric space. These are very important in many branches of mathematics and are studied in detail in more advanced courses. We can define norms for matrices exactly as we defined norms for vectors in R". After all, the vector space Mmn of all m X n matrices is isomorphic to lRm", so this is not difficult to do. Of course, properties (1), (2), and (3) of a norm will also hold in the setting of matrices. It turns out that, for matrices, the norms that are most useful satisfy an additional property. (We will restrict our attention to square matrices, but it is possible to generalize everything to arbitrary matrices.) d(u, v) + d(v, w) = |ju - vjj + jjv - wjj > jj(u - v) + (v - w) = I u — wj = d(u, w) Matrix Norms 556 Chapter 7 Distance and Approximation Definition A matrix norm on Mnn is a mapping that associates with each n X n matrix A a real number ||A|J, called the norm of A, such that the following properties are satisfied for all n X n matrices A and J3 and all scalars c. 1. jjA| :• 0 and ||A|| = 0 if and only if A = O, 2. \\cA\\ = |c|||A|| 3. ||A + B\\ < ||A|| + 4. ||AB|| < ||A||||B|| A matrix norm on Mnn is said to be compatible with a vector norm |x| on W if, for all n X n matrices A and all vectors x in W, we have IIAxil £ llAllllxll Example 7.18 The Frobenius norm \\A\\F of a matrix A is obtained by stringing out the entries of the matrix into a vector and then taking the Euclidean norm. In other words, ||A|F is just the square root of the sum of the squares of the entries of A. So, if A = [fly], then (a) Find the Frobenius norm of A = (b) Show that the Frobenius norm is compatible with the Euclidean norm. (c) Show that the Frobenius norm is a matrix norm. (a) ||A ||,,. = \/3- +71l7T2T+~41 = V30 Before we continue, observe that if Aj = [3 — 1] andA2 = [2 4] are the row vectors of A, then || AT || £ = V3"2 + (-lp and ||A2||£ = V2^ + 4AThus, ||A||f =V||A1||i+ ||A2||| Similarly, if ai "3" -1" anda2 = .2. IIa are the column vectors of A, then It is easy to see that these facts extend ton X n matrices in general. We will use these observations to solve parts (b) and (c). (b) Write A = 'Ai l A„ Section 7.2 Norms and Distance Functions 557 Then Ax v. = A,x uA„x = VllA^II + + A„x + + A n\\E\ = (N/||A1||| + ...+ ||A,||S)||x||B = ||A||f||x||B where the inequality arises from the Cauchy-Schwarz Inequality applied to the dot products of the row vectors A; with the column vector x. (Do you see how Cauchy-Schwarz has been applied?) Hence, the Frobenius norm is compatible with the Euclidean norm. (c) Let b, denote the Jth column of B. Using the matrix-column representation of the product AB, we have ||AB||F= ||[Ab,---Ab„]||F V||Ab1||i + ---+ llAbJi V||A||II|b1||I + •••+ \\A\\2F\\bn by part (b) = iiAiuVINif + = llAllJßllp •+ b 2 «IIb which proves property (4) of the definition of a matrix norm. 
Properties (1) through (3) are true, since the Frobenius norm is derived from the Euclidean norm, which satisfies these properties. Therefore, the Frobenius norm is a matrix norm. For many applications, the Frobenius matrix norm is not the best (or the easiest) one to use. The most useful types of matrix norms arise from considering the effect of the matrix transformation corresponding to the square matrix A. This transformation maps a vector x into Ax. One way to measure the "size" of A is to compare ||x|| and ||Ax|| using any convenient (vector) norm. Lets think ahead. Whatever definition of || A || we arrive at, we know we are going to want it to be compatible with the vector norm we are using; that is, we will need Ax A x or Ax IAII forx # 0 !! Ax|| The expression —r.—r— measures the "stretching capability" of A. If we normalize each nonzero vector x by dividing it by its norm, we get unit vectors x 1 x and thus Ax Ax — 4„V IUI! V llxll / Ax 558 Chapter 7 Distance and Approximation y 4-- -4- Figure 7.8 Figure 7.8 shows how the matrix A = affects the unit circle in Mr—it maps it If x ranges over all nonzero vectors in IR", then x ranges over all unit vectors (i.e., the unit sphere) and the set of all vectors Ax determines some curve in W. For example, "3 2 .2 0_ into an ellipse. With the Euclidean norm, the maximum value of || Ax|| is clearly just half the length of the principal axis—-in this case, 4 units. We express this by writing max ||Ax|| = 4. IliB-i In Section 7.4, we will see that this is not an isolated phenomenon. That is, !|Ax|l max-f^—77--= max||Ax|| **o ||x|| always exists, and there is a particular unit vector y for which || Ay|| is maximum. Now we prove that || A || = max || Ax|| defines a matrix norm. IWhi Theorem 7.6 If ||x|| is a vector norm on IR", then ||A|| = max ||Ax|| defines a matrix norm on M„ -■- lixl! -i that is compatible with the vector norm that induces it. Pf00f (1) Certainly, ||Ax|| S: 0 for all vectors x, so, in particular, this inequality is true if || x j| = 1.Hence, ||Ai| = max ||Ax|| > 0 also. If || A || = 0, then we must have 11x11=1 11 Ax 11 = 0—and, hence, Ax = 0—for all x with ||x|| = 1. In particular, Ae; = 0 for each of the standard basis vectors e; in W. But Ae,- is just the ith column of A, so we M must have A = O. Conversely, if A = O, it is clear that || A || =0. (Why?) (2) Let c be a scalar. Then ||cA|| = max||cAx|| = max|c| ||Ax|| = |c|max||Ax|| = |c|||a|| llx||-=l IUII = 1 !|x|| = l Section 7,2 Norms and Distance Functions (3) Let B be an n X n matrix and let y be a unit vector for which \\A + B|i = max || U + B)x|| = ||(A + £)y|| Then |JA + B|| = ||(A + B)y|| = II Ay + By || < ||Ay|| + ||J5y|| < mil + iibii (Where nioes the second inequality come from?) Next, we show that our definition is compatible with the vector norm [ property (5)] and then use this fact to complete the proof that we have a matrix norm. (5) If x = 0, then the inequality || Ax!| S || A || ||x|| is true, since both sides are zero. If x # 0, then from the comments preceding this theorem, IUx|| !!ax|| .I h ~r,—r,— < max-H—n— = ||A|| |jx|| **o ||x|| Hence, ||Ax|| < |U|| ||x||. (4) Let z be a unit vector such that || AB || = max 11 (AS) x|| = || ABz 11. Then Iklhi ii AB || = ||ABz|| = ||A(Bz)|| < || A || ||Bz|| by property (5) ' < || A || ||B|| ||z|| by property (5) = lUllllBl! This completes the proof that || A || = max || Ax || defines a matrix norm on M„„ that is compatible with the vector norm that induces it. 
-3§fHBB BefliiltiQIi The matrix norm || A || in Theorem 7.6 is called the operator norm induced by the vector norm ||x||. The term operator norm reflects the fact that a matrix transformation arising from a square matrix is also called a linear operator. This norm is therefore a measure of the stretching capability of a linear operator. The three most commonly used operator norms are those induced by the sum norm, the Euclidean norm, and the max norm—namely, HaI^ = max ||Ax||s, ||a||2 = max ||Ax||E, ||a||,x = max ||Ax||m IMI 0, as we wished to show. f 11 Compute [| M \\„ in Example 2.37 and use this value to find the number of iterations required to approximate the solution to three-decimal-place accuracy (after rounding) if the initial vector is x0 = 0. Chapter 7 Distance and Approximation Exercises 7.2 Solution We have already computed M = 0 0 Ml 0.6 < 1 (implying that Jacobi's method converges in Example 2.37, as we saw). The approximate solution xk will be accurate to three decimal places if the error vector xk — x has the property that each of its components is less than 0.0005 in absolute value. (Why?) Thus, we need only guarantee that the maximum absolute component of x.( - x is less than 0.0005. In other words, we need to find the smallest value of k such that IIXi - x\\m < 0.0005 Using Equation (9) above, we see that x,, - xlL < IImLIIx, , - x|L < llMjIil' Now ml = 0.6 and xn so H-2 I m|!*| 0.714 1.400 x„ - x = 1.4, m (0.6)*(1.4) (If we knew the exact solution in advance, we could use it instead of xt. In practice, this is not the case, so we use an approximation to the solution, as we have done here.) Therefore, we need to find k such that (0.6)*(1.4) < 0.0005 We can solve this inequality by taking logarithms (base 10) of both sides. We have log10((0.6)*(l.4)) < log10(5 X 10~4) --=> fclog10(0.6) + log10(l.4) < log105 - 4 => -0.222k + 0.146 < -3.301 =>k> 15.5 Since k must be an integer, we can therefore conclude that k = 16 will work and that 16 iterations of Jacobi's method will give us three-decimal-place accuracy in this example. (In fact, it appears from our calculations in Example 2.37 that we get this degree of accuracy sooner, but our goal here was only to come up with an estimate.) 2' In Exercises 1-3, let u = 4 andv = -2 .-5 0 4. (a) What does ds(u, v) measure? (b) What does dm(u, v) measure? 1. Compute the Euclidean norm, the sum norm, and the max norm of u. 2. Compute the Euclidean norm, the sum norm, and the max norm of v. 3. Compute d(u, v) relative to the Euclidean norm, the sum norm, and the max norm. 1 0 IT 1 1 0 0 11' and In Exercises 5 and 6, let u = v = [0 1 1 0 1 1 1] 5. Compute the Hamming norms of u and v. 6. Compute the Hamming distance between u and v. 7. (a) For which vectors v is !|v||£ = ||v|jm? Explain your answer. Section 7.2 Norms and Distance Functions 567 (b) For which vectors v is [ | v 11, = 11 v j | ,„? Explain your answer. (c) For which vectors v is ||v||s = ||v||m = ||v||£? Explain your answer. 8. (a) Under what conditions on u and v is ||n + v||£ = 11u!! e + 11v 11 e • Explain your answer. (b) Under what conditions on u and v is ||U + v||s = ||u||s + ||vI!s? Explain your answer. (c) Under what conditions on u and v is II u + v||,„ = ||u||m + || v||,„? Explain your answer. 9. Show that for all v in R", ||v||m< ||v||£. 10. Show that for all v in U", ||v||£< ||v||s. 11. Show that for all v in I 12. Show that for all v in I I vII s: Ml* «l|V||m. V«||v||f 13. Draw the unit circles in and the max norm. relative to the sum norm 14. 
By showing that the identity of Exercise 33 in Section 7.1 fails, show that the sum norm does not arise from any inner product. In Exercises 15-18, prove that vector space V. defines a norm on the 15. V — o>- 16. V = M„ = max{|2a|, \3b\} a11 = max{ I ay j} ,17. V=<€[0,1], 11/1 18. 11/11 = max \f(x)\ 0Sl<] \f(x)\ dx 19. Prove Theorem 7.5(b), In Exercises20-25, compute ||a||f, ||a||],and ||a| '2 3" 0 -1 20. A = 21. A = _4 1_ .-3 3. "2 1 r 1 5" 22. A = 23. A = 1 3 2 2 -1. .1 1 3_ 0 -5 2 "4 2 -1 24. A = 3 1 -3 25. A = 0 1 2 -4 ! 3 3 3 0 In Exercises26-31, find vectorsxandy with ||x|j, = 1 and ||y||m= 1 such that \\A\\, = |JAx||5aw«i ||a|U, = ||Ay||w where A is the matrix in the given exercise. 26. Exercise 20 27. Exercise 21 28. Exercise 22 29. Exercise 23 30. Exercise 24 31. Exercise 25 32. Prove Theorem 7.7(b). 33. (a) If || A || is an operator norm, prove that ||l|| = 1, where 7 is an identity matrix, (b) Is there a vector norm that induces the Frobenius norm as an operator norm? Why or why not? 34. Let || A || be a matrix norm that is compatible with a vector norm ||x||. Prove that ||a|| 2: ja for every eigenvalue A of A. In Exercises 35-40, find condY(A) and condJyA). State whether the given matrix is ill-conditioned. '3 f 1 -2 35. A = 36. A = .4 2. .~3 6 "l 0.99" 150 200 37. A = 38. A = .1 1 L3001 4002 "l 1 f "1 \ l -3 39. A = 5 5 6 40. A = 1 l 2 3 1 4 .1 0 0. 1 1 -3 4 1 5 - "l k 41. Let A = 1 1 (a) Find a formula for cond«(A) in terms of k. (b) What happens to condK.(A) as k approaches 1? 42. Consider the linear system Ax = b, where A is invert-ible. Suppose an error Ab changes b to b' = b + Ab. Let x' be the solution to the new system; that is, Ax' = b'. Let x' = x + Ax so that Ax represents the resulting error in the solution of the system. Show that Ax cond(A)- Ab for any compatible matrix norm. 43. Let A 10 10 [100 andb = 10 9j [ 99 (a) Compute cond,=(A). (b) Suppose A is changed to A' = large a relative change can this change produce in the solution to Ax = b? [Hint: Use inequality (1) from this section.! 10 10 10 11 . How Chapter 7 Distance and Approximation (c) Solve the systems using A and A' and determine the actual relative error. [100" (d) Suppose b is changed to b' 101 How large a relative change can this change produce in the solution to Ax — b? [Hint: Use Exercise 42.] (e) Solve the systems using b and b' and determine the actual relative error. 46. Show that if A and B are invertible matrices, then cond(AB) £ cond(A)cond(B) with respect to any matrix norm. 47. Let A be an invertible matrix and let A] and A„ be the eigenvalues with the largest and smallest absolute values, respectively. Show that |Ai| I A. I cond (A) "l 1 r V 44. Let A = 2 5 0 andb = 2 .1 -1 2_ .3. (a) Compute cond, (A). (b) Suppose A is changed to A' = . How large a relative change can this change produce in the solution to Ax = b? [Hint: Use inequality (1) from this section.] (c) Solve the systems using A and A' and determine the actual relative error. l" (d) Suppose b is changed to b' = . How large a relative change can this change produce in the solution to Ax = b? [Hint: Use Exercise 42.] (e) Solve the systems using b and b' and determine the actual relative error. 45. Show that if A is an invertible matrix, then cond(A) 2: 1 with respect to any matrix norm. [Hint: See Exercise 34 and Theorem 4.18(b) in Section 4.3.] cas in Exercises 48-51, write the given system in the form of Equation (7). 
Then use the method of Example 7.22 to estimate the number of iterations ofjacobi's method that will be needed to approximate the solution to three-decimal-place accuracy. (UsexQ — O.J Compare your answer with the solution computed in the given exercise from Section 2.5. 48. Exercise 1, Section 2.5 49. Exercise 3, Section 2.5 50. Exercise 4, Section 2.5 51. Exercise 5, Section 2.5 Exercise 52(c) refers to the Leontief model of an open economy, as discussed in Sections 2.4 and 3.7. 52. Let A be an n X n matrix such that ||A|| < 1, where the norm is either the sum norm or the max norm, (a) Prove that A" -> O as n -* ». (b.) Deduce from (a) that I — A is invertible and (I - A)'1 = I + A + A2 + A3 + ■ • • [Hint: See the proof of Theorem 3.34.] (c) Show that (b) can be used to prove Corollaries 3.35 and 3.36. Least Squares Approximation In many branches of science, experimental data are used to infer a mathematical relationship among the variables being measured. For example, we might measure the height of a tree at various points in time and try to deduce a function that expresses the tree's height h in terms of time t. Or, we might measure the size p of a population over time and try to find a rule that relates p to t. Relationships between variables are also of interest in business; for example, a company producing widgets may be interested in knowing the relationship between its total costs c and the number n of widgets produced. In each of these examples, the data come in the form of two measurements: one for the independent variable and one for the (supposedly) dependent variable. Thus, we have a set of data points (xt, y{), and we are looking for a function that best approximates the relationship between the independent variable X and the dependent variable y. Figure 7.9 shows examples in which experimental data points are plotted, along with a curve that approximately "fits" the data. Section 7.3 Least Squares Approximation 569 Figure 7.9 Curves of "best fit" English mathematician who, while a fellow at Cambridge, edited the second edition of Newton's Prin-cipia. Although he published little, he made important discoveries in the theory of logarithms, integral calculus, and numerical methods. The method of least squares, which we are about to consider, is attributed to Gauss. A new asteroid, Ceres, was discovered on New Year's Day, 1801, but it disappeared behind the sun shortly after it was observed. Astronomers predicted when and where Ceres would reappear, but their calculations differed greatly from those done, independently, by Gauss. Ceres reappeared on December 7, 1801, almost exactly where Gauss had predicted it would be. Although he did not disclose his methods at the time, Gauss had used his least squares approximation method, which he described in a paper in 1809. The same method was actually known earlier; Cotes anticipated the method in the early 18th century, and Legendre published a paper on it in 1806. Nevertheless, Gauss is generally given credit for the method of least squares approximation. We begin our exploration of approximation with a more general result. Chapter 7 Distance and Approximation The Best Approximation Theorem In the sciences, there are many problems that can be phrased generally as "What is the best approximation to X of type 7?" X might be a set of data points, a function, a vector, or many other things, while Y might be a particular type of function, a vector belonging to a certain vector space, etc. 
A typical example of such a problem is finding the vector w in a subspace W of a vector space V that best approximates (i.e., is closest to) a given vector v in V. This problem gives rise to the following definition. PefiPitiOn If Wis a subspace of a normcd linear space V and if v is a vector in V, then the best approximation to v in W is the vector v in W such that |]v — v|| < |jv — w|| for every vector w in W different from v. In R2 or R3, we are used to thinking of "shortest distance" as corresponding to "perpendicular distance." In algebraic terminology, "shortest distance" relates to the notion of orthogonal projection: If Wis a subspace of W and vis a vector in R", then we expect projw(v) to be the vector in W that is closest to v (Figure 7.10). Since orthogonal projection can be defined in any inner product space, we have the following theorem. Figure 7.10 If v -- projw-(v), then ||v — v|| < ||v - w|! for all w # v Theorem 7.8 Ihe Best Approximation Tlieorem If W is a finite-dimensional subspace of an inner product space V and if v is a vector in V, then proj w(v) is the best approximation to v in W. Let w be a vector in W different from projw(v). Then projw(v) - w is also in W, so v ~ proju-(v) = perpw(v) is orthogonal to projw(v) - w, by Exercise 43 in Section 7.1. Pythagoras' Theorem now implies that jjv - projw(v)|2 + jjprojH,(v) - w||2 = ||(v - projw(v)) + (projw(v) - w)f Section 7.3 Least Squares Approximation 511 as Figure 7.10 illustrates. However, ||projw(v) — wj|2 > 0, since w + projw(v), so j|v - projw(v)jj2 < ||v -- projw(v)j|2 + ||projw(v) - wf = ||v - wj|2 or, equivalently, ||v - projw(v)|| < ||v - w|| _SH Example 7.23 Letu, r 5" "3~ 2 .u2 = -2 , andv = 2 1_ .5. . Find the best approximation to v in the plane W = span (tlx, u2) and nndtne Euclidean distance from v to W. Solution The vector in W that best approximates v is projw(v). Since u: and u: are orthogonal, projw(v) u, - v u, -u, u, + u, • V u, u- 1 5 3 2 T 30 -2 = _2 . ••• 1. 1_ 1 L 5- The distance from v to Wis the distance from v to the point in W closest to v. But this distance is just ||perpw(v)|| = jjv — projvv(v)jj. We compute 3 3 0 v - projw(v) = 2 - _2 5 = 12 5 * 5 24 L 5 J so ||v - projw(v)i! = VO2 + (f)2 + (f)2 = which is the distance from v to W. 'f = 12V5/5 In Section 7.5, we will look at other examples of the Best Approximation Theorem when we explore the problem of approximating functions. Remark The orthogonal projection of a vector v onto a subspace W is defined in terms of an orthogonal basis for W. The Best Approximation Theorem gives us an alternative proof that proj w (v) does not depend on the choice of this basis, since there can be only one vector in W that is closest to v—namely, proj!V(v). Least Squares Approximation We now turn to the problem of finding a curve that "best fits" a set of data points. Before we can proceed, however, we need to define what we mean by "best fit." Suppose the data points (1, 2), (2, 2), and (3, 4) have arisen from measurements taken during some experiment. Also suppose we have reason to believe that the x and y values are related by a linear function; that is, we expect the points to lie on some line with equation/ = a + bx. 
If our measurements were accurate, all three points would satisfy this equation and we would have 2 = a + b • 1 2 = a + b • 2 4 = a + fo-3 Chapter 7 Distance and Approximation This is a system of three linear equations in two variables: a + b = 2 a + 2b = 2 or a + 3b = 4 Unfortunately, this system is inconsistent (since the three points do not lie on a straight line). So we will settle for a line that comes "as close as possible" to passing through our points. For any line, we will measure the vertical distance from each data point to the line (representing the errors in the v-direction), and then we will try to choose the line that minimizes the total error. Figure 7.11 illustrates the situation. 1 1 2' a 1 2 — 2 b 1 3 4 a + bx Figaro 1.11 Finding the line that minimizes e\ + e\ + S3 If the errors are denoted by elt e2> and e3, then we can form the error vector e e IS3J We want e to be as small as possible, so j|ej| must be as close to zero as possible. Which norm should we use? It turns out that the familiar Euclidean norm is the best choice. (The sum norm would also be a sensible choice, since ||e[js = + + e3 is the actual sum of the errors in Figure 7.11. However, the absolute value signs are hard to work with, and, as you will soon see, the choice of the Euclidean norm leads to some very nice formulas.) So we are going to minimize ||e| = Ve2 + ef + ef or, equivalently, ||e|j2 = e2 + s2 + e2 This is where the term "least squares" comes from: We need to find the smallest sum of squares, in the sense of the foregoing equation. The number |je|| is called the least squares error of the approximation. Section 7.3 Least Squares Approximation 573 From Figure 7.11, we also obtain the following formulas for e„ e2, and s3 in our example: Et = 2 - (a + b' 1) e2 = 2 - (a + b• 2) s3 = 4 - (a + 3) Example 7.24 Which of the following lines gives the smallest least squares error for the data points (1, 2), (2, 2), and (3, 4)? , (a) y = 1 + x (b) y = -2 + 2x (c) y = 1 + x Solution Table 7.1 shows the necessary calculations. Table 7.1 y = 1 + x _y = -2 + 2x_y = § + * 2 -(1 + 1)= 0 2 -(-2 + 2) = 2 2 -(f+l)=| 2 ~ (1 + 2) = -1 2 - (-2 + 4) = 0 2 - (| + 2) = -f 4 - (1 + 3) = 0 4 - (-2 + 6) = 0 4 - (| + 3) = | Figure 7.12 We see that the line y = | + x produces the smallest least squares error among these three lines. Figure 7.12 shows the data points and all three lines. i, S74 Chapter 7 Distance and Approximation It turns out that the line y = § + x in Example 7.24 gives the smallest least squares error of any line, even though it passes through none of the given points. The rest of this section is devoted to illustrating why this is so. In general, suppose we have n data points (Xi,)'i),..., (x„,yn) and a line y = a + bx. Our error vector is e = where e,- = y) — (a + bx,). The line y = a + bx that minimizes s\ + • • • + s2 is called the least squares approximating line (or the line of best fit) for the points (jq, y,),..., (x„,y„). As noted prior to Example 7.24, we can express this problem in matrix form. If the given points were actually on the line y = a + bx, then the «linear equations a + bxl = a + foc„ = yn would all be true (i.e., the system would be consistent). Our interest is in the case where the points are not collinear, in which case the system is inconsistent. In matrix form, we have " 1 1 a b = . 1 Xfl-. ->•„- which is of the form Ax = b, where A i v 1 X-, 1 x„ yi Jn The error vector e is just b — Ax (check this), and we want to minimize |[e|[2 or, equiv-alently, jjejj. 
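As a quick numerical check of Example 7.24, the least squares error ||e|| = ||b − Ax|| can be evaluated directly for each candidate line. The short computation below is a sketch in Python with NumPy (standing in for the CAS mentioned in the exercises; it is an illustration only and not part of the text), with x holding the coefficients [a, b] of the line y = a + bx.

```python
import numpy as np

# Data points (1, 2), (2, 2), (3, 4) in the matrix form Ax = b, with x = [a, b]
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 2.0, 4.0])

# Candidate lines y = a + bx from Example 7.24
candidates = {
    "y = 1 + x":   np.array([1.0, 1.0]),
    "y = -2 + 2x": np.array([-2.0, 2.0]),
    "y = 2/3 + x": np.array([2.0 / 3.0, 1.0]),
}

for label, x in candidates.items():
    e = b - A @ x                                   # error vector e = b - Ax
    print(label, "||e|| =", round(np.linalg.norm(e), 3))

# Expected: 1.0, 2.0, and 0.816 -- the third line gives the smallest error
```

The three values agree with the comparison in Example 7.24, and the third line is exactly the least squares approximating line found below.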
We can therefore rephrase our problem in terms of matrices as follows. D6filliti0ll If A is an m X n matrix and b is in Rm, a least squares solution of Ax = b is a vector x in IR" such that ■ jib - Axil < i|b - Ax|| for all x in W. Section 7.3 Least Squares Approximation 575 Solution of the Least Squares Problem Any vector of the form Ax is in the column space of A, and as x varies over all vectors in R", Ax varies over all vectors in col(A). A least squares solution of Ax = b is therefore equivalent to a vector y in col (A) such that ||b - yfl < [|b - y|i for all y in col(A). In other words, we need the closest vector in col(A) to b. By the Best Approximation Theorem, the vector we want is the orthogonal projection of b onto cql(A). Thus, if x is a least squares solution of Ax = b, we have Ax = projcol(A)(b) (1) In order to find x, it would appear that we need to first compute projcol(A)(b) and then solve the system (1). However, there is a better way to proceed. We know that b - Ax = b - projcolU)(b) = perpcoI(A)(b) is orthogonal to col(A). So b - Ax is in (col(A))1 = null(AJ). Therefore Ar(b - Ax) = 0, which, in turn, is equivalent to Arb - ArAx = 0 or A7Ax = Arb This represents a system of equations known as the normal equations for x. We have just established that the solutions of the normal equations for x are precisely the least squares solutions of Ax — b. This proves the first part of the following theorem. Theorem 7.9 The Least Squares Theorem Let A be an m X n matrix and let b be in Rm. Then Ax = b always has at least one least squares solution x. Moreover: a. x is a least squares solution of Ax = b if and only if x is a solution of the normal equations A7 Ax = A1 b. b. A has linearly independent columns if and only if ATA is invertible. In this case, the least squares solution of Ax = b is unique and is given by x = (A''A)-'Arb Proof We have already established property (a). For property (b), we note that the n columns of A are linearly independent if and only if rank(A) = n. But this is true if and only if A7A is invertible, by Theorem 3.28. If A1 A is invertible, then the unique solution of A7Ax = A' b is clearly x = (ATA)'1 Arb. _ 576 Chapter 7 Distance and Approximation Example 7.25 Find a least squares solution to the inconsistent system Ax = b, where A = 1 5" "3" 2 -2 and b = 2 .-1 1. .5. Solution We compute A A 6 0 0 30 1 2 -1" ' 2 and Arb = 5 -2 1 2 — 16 The normal equations ATAx = ATb are just "6 0" x = ~21 .0 30_ .16J which yield x The fact that this solution is unique was guaranteed by Theorem 7.9(b), since the columns of A are clearly linearly independent. Remark We could have phrased Example 7.25 as follows: Find the best approximation to b in the column space of A. The resulting equations give the system Ax = b whose least squares solution we just found. (Verify this.) In this case, the components of x are the coefficients of that linear combination of the columns of A that produces the best approximation to b—namely, f 5" 3" 2 + h -2 = 2 ~5 .-1. 1. 1 - 5- This is exactly the result of Example 7.23. Compare the two approaches. Example 7.26 Find the least squares approximating line for the data points (1, 2), (2, 2), and (3, 4) from Example 7.24. We have already seen that the corresponding system Ax = b is "l r ~2~ a 1 2 — 2 b A 3. .4. Section 7.3 Least Squares Approximation 577 where y — a + bx is the line we seek. Since the columns of A are clearly linearly independent, there will be a unique least squares solution, by part (b) of the Least Squares Theorem. 
We compute r [ill A A = U 2 3 Hence, we can solve the normal equations A1 Ax = A!b, using Gaussian elimination to obtain [A1 A | Arb] = 1 1 1 2 1 3 3 6 6 14 and A'b tu - 2 "l 1 1 2 = 8" 1 2 3_ 4. "3 6 8 -y 1 0 2 3 6 14 18_ 0 1 1 So x , from which we see that a = I, = 1 are the coefficients of the least squares approximating line: y = j + x. 4 The line we just found is the line in Example 7.24(c), so we have justified our claim that this line produces the smallest least squares error for the data points (1,2), (2, 2), and (3, 4). Notice that if x is a least squares solution of Ax = b, we may compute the least squares error as Hell = lib - Ax|| Since Ax = projcol(yl)(b), this is just the length of perpcol{A)(b)-from b to the column space'of A. In Example 7.26, we had -that is, the distance Ax ~1 "l r - 2" 1" 3 2 _ 1 2 3 = 2 3 1 .4. .1 3, - 3- so, as in Example 7.24(c), we have a least squares error of ||e|| 0.816. Note that the columns of A in Example 7.26 are linearly independent, so (ATA)~1 exists, and we could calculate x as x = (AJA)_1A7b. However, it is almost always easier to solve the normal equations using Gaussian elimination (or to let your CAS do it for you!). It is interesting to look at Example 7.26 from two different geometric points of view. On the one hand, we have the least squares approximating line y = | + x, with corresponding errors el — j, e2 = —f, and s3 = j, as shown in Figure 7.13(a). Equivalently, we have the projection of b onto the column space of A, as shown in Figure 7.13(b). Here, projcoiu) (b) Ax = 1 1 1 2 1 3 578 Chapter 7 Distance and Approximation y Figure 7.13 and the least squares error vector is e = if the data points were collinear?] . [What would Figure 7.13(b) look like Example 7.27 Find the least squares approximating line and the least squares error for the points (1,1), (2, 2), (3, 2), and (4, 3). Solution Let y = a + bx be the equation of the line we seek. Then, substituting the four points into this equation, we obtain a + b = 1 a + 2b = 2 a + 3b = 2 a + 4b = 3 So we want the least squares solution of Ax = b, where or "1 r "1" 1 2 a 2 1 3 X 2 1 4 3 "1 1" T 1 2 2 and b = 1 3 2 J 4_ 3 Since the columns of a are linearly independent, the solution we want is x = {ATAYlATb = • fc (Check this calculation.) Therefore, we take a = j and ib = f, producing the least squares approximating line y = 5 + fx, as shown in Figure 7.14. Section /.3 Least Squares Approximation 579 ► x figuro 7.14 Since Ax "1" "1 r 2 1 2 2 1 3 3 1 4 the least squares error is j|e|| = V5/5 ~ 0.447. We can use the method of least squares to approximate data points by curves other than straight lines. Example 7.28 Find the parabola that gives the best least squares approximation to the points (—1,1), (0, -1), (1,0), and (2, 2). Solution Ihe equation of a parabola is a quadratic y — a + bx + cx2. Substituting the given points into this quadratic, we obtain the linear system (I - b + c = 1 "1 -1 1 a = -1 1 0 0 + b + c or a = 0 1 1 1 a + 2b + Ac = 2 1 2 4 Thus, we want the least squares approximation of Ax = b, where A = "1 -1 r 1 0 0 1 1 i 1 2 4 and b = Chapter 7 Distance and Approximation We compute ~4 2 6 2 6 8 .6 8 18 so the normal equations are given by A'A and AJb 3jk - "4 2 6~ "2~ 2 6 8 x = 3 _6 8 18. .9. whose solution is Thus, the least squares approximating parabola has the equation as shown in Figure 7.15. 
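The normal-equation computations in Examples 7.26 through 7.28 are easy to reproduce numerically. The sketch below (Python with NumPy; an illustration only, not part of the examples) sets up the design matrix for the parabola of Example 7.28 and solves A^T A x = A^T b directly.

```python
import numpy as np

# Example 7.28: fit the parabola y = a + b*x + c*x^2 to (-1, 1), (0, -1), (1, 0), (2, 2)
xs = np.array([-1.0, 0.0, 1.0, 2.0])
ys = np.array([1.0, -1.0, 0.0, 2.0])

# Design matrix with columns 1, x, x^2
A = np.column_stack([np.ones_like(xs), xs, xs**2])

ATA = A.T @ A          # [[4, 2, 6], [2, 6, 8], [6, 8, 18]], as in the text
ATb = A.T @ ys         # [2, 3, 9]
coeffs = np.linalg.solve(ATA, ATb)

print(coeffs)          # approximately [-0.7, -0.6, 1.0]
```

For larger problems, np.linalg.lstsq(A, ys, rcond=None) returns the same least squares solution without forming A^T A explicitly, which is preferable when the normal equations are ill-conditioned, a point taken up again with the QR factorization later in this section.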
y = -is - !* + x2 figure 7.15 A least squares approximating parabola 4 One of the important uses of least squares approximation is to estimate constants associated with various processes. The next example illustrates this application in the context of population growth. Recall from Section 6.7 that a population that is growing (or decaying) exponentially satisfies an equation of the form pit) = cekt, where p{t) is the size of the population at time t and c and k are constants. Clearly, c = p(0), but k is not so easy to determine. It is easy to see that k = P'it) Pit) which explains why k is sometimes referred to as the relative growth rate of the population: It is the ratio of the growth rate p'it) to the size of the population pit). Section 7.3 Least Squares Approximation 581 cas Example 7.29 Table 7.2 Population Year (in billions) 1950 2.56 1960 3.04 1970 3.71 1980 4.46 1990 5.28 2000 6.08 Source: U.S. Bureau of the Census, International Data Base Table 7.2 gives the population of the world at 10-year intervals for the second half of the 20th century. Assuming an exponential growth model, find the relative growth rate and predict the world's population in 2010. vnt'i'a* Let's agree to measure time t in 10-year intervals so that I = 0 is 1950, t ~ 1 is 1960, and so on. Since c = p(0) = 2.56, the equation for the growth rate of the population is p = 2.56ekt How can we use the method of least squares on this equation? If we take the natural logarithm of both sides, we convert the equation into a linear one: In p = ln(2.56efc0 = In 2,56 + ln(e*') « 0.94 + kt Plugging in the values of t and p from Table 7.2 yields the following system (where we have rounded calculations to three decimal places): 0.94 = 0.94 k = 0.172 2k = 0.371 3k = 0.555 4k = 0.724 5k = 0.865 We can ignore the first equation (it just corresponds to the initial condition c = p(0) = 2.56). The remaining equations correspond to a system Ax = b, with A = "l" ' 0.172" 2 0.371 3 and b = 0.555 4 0.724 .5. _ 0.865 _ Since ArA = 55 and ATb - 9.80, the corresponding normal equations are just the single equation 55x = 9.80 Therefore, k = x: = 9.80/55 0.178. Consequently, the least squares solution has the formp = 2.56eomt (see Figure 7.16). The world's population in 2010 corresponds to t = 6, from which we obtain p(6) = 2.56e' 0.178(6) 7.448 Thus, if our model is accurate, there will be approximately 7.45 billion people on Earth in the year 2010. (The U.S. Census Bureau estimates that the global population will be "only" 6.82 billion in 2010. Why do you think our estimate is higher?) Chapter 7 Distance and Approximation Figure 7.16 Least Squares via the Off Factorization It is often the case that the normal equations for a least squares problem are ill-conditioned. Therefore, a small numerical error in performing Gaussian elimination will result in a large error in the least squares solution. Consequently, in practice, other methods are usually used to compute least squares approximations. It turns out that the QR factorization of A yields a more reliable way of computing the least squares approximation of Ax = b. Theorem 1.10 Let A be an m X n matrix with linearly independent columns and let b be in Um. If A = QR is a QR factorization of A, then the unique least squares solution x of Ax = b is x R Qrb Recall from Theorem 5.16 that the QR factorization A = QR involves an m X n matrix Q with orthonormal columns and an invertible upper triangular matrix R. 
From the Least Squares Theorem, we have ArAx = Arb (QRfQRx = (QR)Tb => RTQTQRx = R'Q'b => RTRx = R'Q'b H » since Q1Q = I. (Why?) Since R is invertible, so is RT, and hence we have _Rx = Qrb or, equivalently, I = R^QTb "^M Remark Since R is upper triangular, in practice it is easier to solve Rx Q'b directly than to invert R and compute jR^Q'b. Section 7.3 Least Squares Approximation 583 - - ! 38 Use the QR factorization to find a least squares solution of Ax = b, where A = " 1 2 2~ " 2" -1 1 2 -3 and b = -1 0 1 -2 1 1 2_ 0_ Solution From Example 5.15, 1/2 3 A = QR = 3/10 1/2 3V5/10 1/2 V5/10 1/2 V5/10 •V6/6 0 V6/6 V6/3 V5 1/2' 3V5/2 V6/2. We have 1/2 -1/2 -1/2 1/2 " 2" -3 _ 7/2* QTb = 3V5/10 3V5/10 V5/10 Vs/io = -V5/2 . - V6/6 0 V6/6 V6/3 . 0 „-2V6/3_ so we require the solution to Rx = Qrb, or '2 1 1/2 7/2 0 V5 3V5/2 x = -V5/2 _0 0 V6/2 . .-2V6/3. Back substitution quickly yields 4/3" x = 3/2 4/3 _ Orthogonal Projection Revisited One of the nice byproducts of the least squares method is a new formula for the orthogonal projection of a vector onto a subspace of Wn. Theorem 7.11 Let W be a subspace of R"! and let A be an m X n matrix whose columns form a basis for W. If v is any vector in U"\ then the orthogonal projection of v onto W is the vector The linear transformation P as its standard matrix. projw-(v) = A{ArAYlATv -> W that projects W" onto Whas A(ATA)~1A1 PrOOt Given the way we have constructed A, its column space is W. Since the columns of A are linearly independent, the Least Squares Theorem guarantees that there is a unique least squares solution to Ax = v given by x = (ATAylATv Chapter 7 Distance and Approximation tit mm 1.31 By Equation (1), Therefore, Ax = projcol(A)(v) = projw(v) projw(v) = A((ATA) A'v) = (A{ATAYlAT)v as required. Since this equation holds for all v in Rm, the last statement of the theorem follows immediately. — We will illustrate Theorem 7.11 by revisiting Example 5.11. Find the orthogonal projection of v 3 -1 2 onto the plane W in R3 with equation x - y + 2z = 0, and give the standard matrix of the orthogonal projection transformation onto W. SolltlQB As in Example 5.11, we will take as a basis for W the set We form the matrix A = -1 1 1 with these basis vectors as its columns. Then A'A = 1 1 0 Til 1 -1 1 1 0 1 2 0 0 3 so (ATAY By Theorem 7.11, the standard matrix of the orthogonal projection transformation onto Wis A(ÄrA)"'1AT = A so the orthogonal projection of v onto W is 1 -1 1 1 0 1 projw(v) = A(ATA)-lATv 1 1 0 -1 1 1 which agrees with our solution to Example 5.11. Section 7.3 Least Squares Approximation 585 Remark Since the projection of a vector onto a subspace W is unique, the standard matrix of this linear transformation (as given by Theorem 7.11) cannot depend on the choice of basis for W. In other words, with a different basis for W, we have a H j> different matrix A, but the matrix A(ATAYlAT will be the same! (You are asked to verify this in Exercise 43.) The Pseudoinverse of a Matrix If A is an n X n matrix with linearly independent columns, then it is invertible, and the unique solution to Ax = b is x = A~'b. If m > n and A is m X n with linearly independent columns, then Ax = b has no exact solution, but the best approximation is given by the unique least squares solution x = (A1 A) ~1A1 b. The matrix (ArA) ~1A r therefore plays the role of an "inverse of A" in this situation. 
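Theorem 7.11 is also easy to check numerically. The sketch below (Python with NumPy; an illustration only, not part of the text) forms the projection matrix P = A(A^T A)^{-1} A^T using the orthogonal basis for W from Example 7.31 and applies it to v.

```python
import numpy as np

# Example 7.31: W is the plane x - y + 2z = 0; its basis vectors are the columns of A
A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [0.0,  1.0]])
v = np.array([3.0, -1.0, 2.0])

# Standard matrix of the orthogonal projection onto W (Theorem 7.11)
P = A @ np.linalg.inv(A.T @ A) @ A.T

w = P @ v
print(w)                          # approximately [ 1.667, 0.333, -0.667 ]
print(w[0] - w[1] + 2 * w[2])     # ~0, so the projection lies in the plane W
print(np.allclose(P @ P, P))      # True: projecting twice is the same as projecting once
```

For the least squares problem itself, np.linalg.qr(A) followed by solving Rx = Q^T b carries out the method of Theorem 7.10, avoiding the normal equations entirely.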
Definition If A is a matrix with linearly independent columns, then the pseudoinverse of A is the matrix A + defined by A+ = (A'AT'A1 Observe that if A is m X n, then A+ is n X m. Example 7.32 "l l1 Find the pseudoinverse of A = 1 2 .1 3. Solution We have already done most of the calculations in Example 7.26. Using our previous work, we have A+ = (ATA)~1AT = 3 -1" i l 1" - 4 1 3 3 2" 3 L-i I l 2 3. -\ o 1 2- The pseudoinverse is a convenient shorthand notation for some of the concepts we have been exploring. For example, if A is m X n with linearly independent columns, the least squares solution of Ax = b is given by x - A b and the standard matrix of the orthogonal projection P from Um onto col(A) is [P] = AA + If A is actually a square matrix, then it is easy to show that A* = A^1 (see Exercise 53). In this case, the least squares solution of Ax = b is the exact solution, since x = A b = A 'b = x Chapter 7 Distance and Approximation The projection matrix becomes [P] = AA+ = AA~~l = I. (What is the geometric interpretation of this equality?) Theorem 7.12 summarizes the key properties of the pseudoinverse of a matrix. (Before reading the proof of this theorem, verify these properties for the matrix in Example 7.32.) TheOreM 7.12 Let A be a matrix with linearly independent columns. Then the pseudoinverse A+ of A satisfies the following properties, called the Penrose conditions for A: AA+A =* A b. A+AA+ = A+ c. AA+ and A+A are symmetric. We prove condition (a) and half of condition (c) and leave the proofs of the remaining conditions as Exercises 54 and 55. (a) We compute AA-A = AdA^y'A^A = a{ataY1{ata) = AI = A (c) By Theorem 3.4, A1 A is symmetric. Therefore, (A'A)'1 is also symmetric, by Exercise 46 in Section 3.3. Taking the transpose of AA+, we have (AA'V = (AiA'AyW = (AT)T((ATA)-l)TAT = A{ATA)lAT = AA+ _ Exercise 56 explores further properties of the pseudoinverse. In the next section, we will see how to extend the definition of A+ to handle all matrices, whether or not the columns of A are linearly independent. Exercises 7.3_ CAS In Exercises 1-3, consider the data points (1, 0), (2, 1), and (3, 5). Compute the least squares error for the given line. In each case, plot the points and the line. l.y = -2 + 2% 2. y = x 3. y « -3 + \x In Exercises 4-6, consider the data points ( — 5, 3), (0, 3), (5, 2), and (10, 0). Compute the least squares error for the given line. In each case, plot the points and the line. 4. v = 3 - \x 5. y = | 6. y = 2 - \x In Exercises 7-14, find the least squares approximating line for the given points and compute the corresponding least squares error. 7. (1,0), (2, 1), (3, 5) 8. (1,6), (2, 3), (3, 1) 9. (0, 4), (1, 1), (2, 0) 10. (0, 3), (1, 3), (2, 5) 11. (-5, -1), (0, 1), (5, 2), (10, 4) Section 7.3 Least Squares Approximation 387 12. (-5, 3), (0, 3), (5, 2), (10, 0) 13. (1,1), (2, 3), (3, 4), (4, 5), (5, 7) 14. (I, 10), (2, 8), (3, 5), (4, 3), (5,0) In Exercises 15-18, find the least squares approximating parabola for the given points, 15. (1,1), (2,-2), (3, 3), (4, 4) 16. (1,6), (2, 0),(3,0), (4, 2) 17. (-2, 4), (-1,7), (0, 3), (1,0), (2,-1) 18. (-2, 0), (-1, -11), (0, -10), (1, -9), (2, 8) In Exercises 19-22, find a least squares solution of Ax -- b by constructing and solving the normal equations. "3 1 1 19. A = 1 1 ,b = 1 „1 2. J "l 2" 20. A = 3 I ,b = 2 I. "1 2~ 0 3 ,b = 21. A = 2 5 3 0 4' 1 -2 4 1 0" " 1 2 -1 ,b = 5 -1 1 -1 0 2 2 22. 
A In Exercises 23 and 24, show that the least squares solution of Ax = b is not unique and solve the normal equations to find all the least squares solutions. 23. A = "1 1 0 0" ■ r 1 0 1 1 -3 ,b = 0 -1 1 1 2 J -1 1 0_ 4- "0 1 1 0" 5 1 -1 1 — 1 ,b = 3 1 0 1 0 -1 1 1 1 1 1 24. A = In Exercises 25 and 26, find the best approximation to a solution of the given system of equations. 25. x + y - z = 2 26. 2x + 3y + z = 21 —y + 2z= 6 x + y + z 7 3x + 2y — z — 11 ~ x + y — z = 14 —x + z = 0 2y + z = 0 In Exercises 27 and 28, a QR factorization of A is given. Use it to find a least squares solution of Ax = b. "2 r ~2 3 1 ~ 3 27. A = 2 0 ,Q = 2 3 2 .1 i_ 1 L3 2 3- b 2 3 -1 28. A = R = 1 2 .-1 V6 0 Q ■V6/2" 1/V2 1/V6 2/V6 .-1/V6 f 1/V2 0 1/V2. b - 29. A tennis ball is dropped from various heights, and the height of the ball on the first bounce is measured. Use the data in Table 7.3 to find the least squares approximating line for bounce height b as a linear function of initial height h. h (cm) b (cm) 20 14.5 40 31 48 36 60 45.5 80 59 100 73.5 30. Hooke's Law states that the length L of a spring is a linear function of the force F applied to it. (See Figure 7.17 and Example 6.92.) Accordingly, there are constants a and b such that L = a + bF Table 7.4 shows the results of attaching various weights to a spring. Talis IJ F (oz) 2 L (in.) 7.4 4 9.6 11.5 13.6 Chapter 7 Distance and Approximation ISIS« 11 Year of Birth 1920 1930 1940 1950 1960 1970 1980 1990 Life Expectancy (years) 54.1 59.7 62.9 68.2 69.7 70.8 73.7 75.4 Source: World Almanac and Book of Facts. New York: World Almanac Books, 1999 (a) Determine the constants a and b by finding the least squares approximating line for thes^data. What does a represent? (b) Estimate the length of the spring when a weight of 5 ounces is attached. 31. Table 7.5 gives life expectancies for people born in the United States in the given years. (a) Determine the least squares approximating line for these data and use it to predict the life expectancy of someone born in 2000. (b) How good is this model? Explain. 32. When an object is thrown straight up into the air, Newton's Second Law of Motion states that its height s(t) at time t is given by s(t) = s0 + v0r + \gt2 where v0 is its initial velocity and g is the constant of acceleration due to gravity. Suppose we take the measurements shown in Table 7.6. Time(s) 0.5 1 1.5 2 3 Height (m) 11 17 21 23 18 (a) Find the least squares approximating quadratic for these data. (b) Estimate the height at which the object was released (in m), its initial velocity (in m/s), and its acceleration due to gravity (in m/s2). (c) Approximately when will the object hit the ground? 33. Table 7.7 gives the population of the United States at 10-year intervals for the years 1950-2000. (a) Assuming an exponential growth model of the form p(t) = cekt, where p(t) is the population at time r, use least squares to find the equation for the growth rate of the population. [Hint: Let t = 0 be 1950.] (b) Use the equation to estimate the U.S. population in 2010. filli 7.7 Population Year (in millions) 1950 150 1960 179 1970 203 1980 227 1990 250 2000 281 Source: U.S. Bureau of the Census 34. Table 7.8 shows average major league baseball salaries for,the years 1970-2005. (a) Find the least squares approximating quadratic for these data. (b) Find the least squares approximating exponential for these data. (c) Which equation gives the better approximation? Why? (d) What do you estimate the average major league baseball salary will be in 2010 and 2015? 
Iltilf I.I Average Salary Year (thousands of dollars) 1970 29.3 1975 44.7 1980 143.8 1985 371.6 1990 597.5 1995 1110.8 2000 1895.6 2005 2476.6 Source: Major Leaf ;ue Baseball Players Association Section 7.3 Least Squares Approximation S89 35. A 200 mg sample of radioactive polonium-210 is observed as it decays. Table 7.9 shows the mass remaining at various times. Assuming an exponential decay model, use least squares to find the half-life of polonium-210. (See Section 6.7.) Table 7.9 Time (days) Mass (mg) 0 200 30 172 60 148 90 128 36. Find the plane z = a + bx + cy that best fits the data points (0, -4, 0), (5, 0, 0), (4, -1,1), (1, -3,1), and (-1, ~5,-2). In Exercises 37-42, find the standard matrix of the orthogonal projection onto the subspace W. Then use this matrix to find the orthogonal projection ofv onto W. 37. W = span 38. W = span 39. W = span 40. W = span 41. W = span 42. W = span I v = I v = , V = "l" , v = 0 J .0. 1 0 -1 V 2 / .3. 43. Verify that the standard matrix of the projection onto Win Example 7.31 (as constructed by Theorem 7.11) does not depend on the choice of basis. Take as a basis for W and repeat the calculations to show that the resulting projection matrix is the same. 44. Let A be a matrix with linearly independent columns and let P = A(ATA)'1AT be the matrix of orthogonal projection onto col(A). (a) Show that P is symmetric. (b) Show that P is idempotent. In Exercises 45-52, compute the pseudoinverse of A. l" 45. A = 47. A = 49. A 51. A 1 2. 1 3 -1 1 0 2 1 1 0 1 '10 0' 1 0 1 0 1 1 1 1 1 46. A = 48. A = 50. A 52. A -1 2 1 3 3 1 0 1 1 1 0 0 53. (a) Show that if A is a square matrix with linearly independent columns, then A+ = A-1, (b) If A is an m X n matrix with orthonormal columns, what is A+? 54. Prove Theorem 7.12(b). 55. Prove the remaining part of Theorem 7.12(c). 56. Let A be a matrix with linearly independent columns. Prove the following: (a) (cA)T = (l/c)A+ for all scalars c# 0. (b) (A+)+ = A if A is a square matrix. (c) (A7)"*" = (A+)rifA is a square matrix. 57. Let n data points (x1;y{),..., (x„,yn) be given. Show that if the points do not all lie on the same vertical line, then they have a unique least squares approximating line. 58. Let n data points (xuyi),..., (xn,yn) be given. Generalize Exercise 57 to show that if at least k + 1 of distinct, then the given points have a unique least squares approximating polynomial of degree at most k. 590 Chapter 7 Distance and Approximation The Singular Value Decomposition In Chapter 5, we saw that every symmetric matrix A can be factored as A = PDP r, where P is an orthogonal matrix and D is a diagonal matrix displaying the eigenvalues of A. If A is not symmetric, such a factorization is not possible, but as we learned in Chapter 4, we may still be able to factor a square matrix A as A = PDP"1, where D is as before but P is now simply an invertible matrix. However, not every matrix is diagonalizable, so it may surprise you that we will now show that every matrix (symmetric or not, square or not) has a factorization of the form A = PDQ T, where P and Q are orthogonal and D is a diagonal matrix! This remarkable result is the singular value decomposition (SVD), and it is one of the most important of all matrix factorizations. In this section, we will show how to compute the SVD of a matrix and then consider some of its many applications. Along the way, we will tie up some loose ends by answering a few questions that were left open in previous sections. 
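A quick numerical illustration of this claim may be helpful before we develop the theory. The sketch below (Python with NumPy; an illustration only, not part of the text) factors a small non-symmetric, non-square matrix, the same matrix analyzed in Example 7.33 below, and confirms that the two outer factors are orthogonal and that the middle factor is "diagonal."

```python
import numpy as np

# A small non-symmetric, non-square matrix (the matrix of Example 7.33)
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)       # s holds the singular values, in decreasing order
print(s)                          # approximately [1.732, 1.0], i.e. sqrt(3) and 1

# Rebuild the 3 x 2 "diagonal" factor and verify the factorization A = U Sigma V^T
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))                      # True
print(np.allclose(U.T @ U, np.eye(3)),
      np.allclose(Vt @ Vt.T, np.eye(2)))                   # True True: U and V are orthogonal
```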
The Singular Values of a Matrix For any m X n matrix A, the « X n matrix A1 A is symmetric and hence can be orthogonally diagonalized, by the Spectral Theorem. Not only are the eigenvalues of ArA all real (Theorem 5.18), they are all nonnegative. To show this, let A be an eigenvalue of ATA with corresponding unit eigenvector v. Then 0 < ||Av||2 = (Av) • (Av) = (Av)rAv = vrATAv = vrAv = A(vv) = Ajjvj|2 = A It therefore makes sense to take (positive) square roots of these eigenvalues. Definition If A is an m X n matrix, the singular values of A are the square roots of the eigenvalues of ATA and are denoted by ax,..., crn. It is conventional to arrange the singular values so that crx 5: a2 s • • • > an. Find the singular values of Solution The matrix A 1 1 1 0 0 1 "l 1 1 1 0 2 I 1 0 35= 1 0 1_ 1 0 1 A A has eigenvalues A! = 3 and A2 = 1. Consequently, the singular values of A are ax = ^Tx = V3andcr2 = VXJ = 1. Section 7,4 The Singular Value Decomposition To understand the significance of the singular values of an m X n matrix A, consider the eigenvectors of ATA. Since ATA is symmetric, we know that there is an orthonormal basis for Un that consists of eigenvectors of ATA. Let {v1;..., v„} be such a basis corresponding to the eigenvalues of A1 A, ordered so that Aj 2: A2 > • • • > A„. From our calculations just before the definition, Therefore, A; = \\Avtf c7: = VA^ = |JAVj| In other words, the singular values of A are the lengths of the vectors Avv ..., Av„. Geometrically, this result has a nice interpretation. Consider Example 7.33 again. If x lies on the unit circle in R2 (i.e., ||xj| = 1), then ijAxi!2 = (Ax) • (Ax) = (Ax)r(Ax) x'ArAx '2 r hi .1 2_ 1*2. = 2%2 + 2x,x2 + 2x\ which we recognize is a quadratic form. By Theorem 5.23, the maximum and minimum values of this quadratic form, subject to the constraint ||x|| = 1, are A, = 3 and A2 = 1, respectively, and they occur at the corresponding eigenvectors of ATA—that is, when x = Vj = 1/V2 and x = v, = -1/V2 1/V2 , respectively. Since IIAvJ v/A'Av; = A; for i = 1, 2, we see that al ~ jjAvjjj = V3 and cr2 = \\Av2\\ = 1 are the maximum and minimum values of the lengths ||Ax| as x traverses the unit circle in R . Now, the linear transformation corresponding to A maps R2 onto the plane in R3 with equation x — y — z = 0 (verify this), and the image of the unit circle under this transformation is an ellipse that lies in this plane. (We will verify this fact in general shortly; see Figure 7.18.) So arl and 0 and a2 — ■— or,. > 0 and it,.+ i = crr+2 = '' • = cr„ = 0. Then there exist an m X m orthogonal matrix L/, an n X n orthogonal matrix V, and an m X n matrix 2 of the form shown in Equation (1) such that A = VtVT A factorization of A as in Theorem 7.13 is called a singular value decomposition (SVD) of A. The columns of U are called left singular vectors of A, and the columns of V are called right singular vectors of A. The matrices U and V are not uniquely determined by A, but 2 must contain the singular values of A, as in Equation (1). (See Exercise 25.) Example 7.34 Find a singular value decomposition for the following matrices: i r (a) A 1 1 0 0 0 1 (b) A 1 0 0 1 594 Chapter 7 Distance and Approximation Solution (a) We compute "l 1 (f ArA 1 1 0 _0 0 1. and find that its eigenvalues are Ai = 2, A, = 1, ( eigenvectors 1 0 -1 1 0 > 1 0 1 0 (Verify this.) These vectors are orthogonal, so we normalize them to obtain "l/vr "o" "-1/V2" 1/V2 0 > v3 = 1/V2 0 .1 0 The singular values of A are cr. = V2, a2 = VI = 1, and o", = Vo = 0. 
Thus, and X V 1/V2 0 -1/V2 1/V2 0 1/V2 0 1 0 V2 0 0 0 1 0 To find U, we compute 1 and Ui = —Av, = —7= o-i V2 1 1 U2 = -Ay2 = - 1 1 0 0 0 I 1/V2' 1/V2 0 "o~ "1 1 0 0 0 ss „0 0 1 1 1 These vectors already form an orthonormal basis (the standard basis) for [R2, so we have p 0 10 1 This yields the SVD A = "1 1 0' "1 0" "V2 0 0" .0 0 1. _o 1. . 0 1 o_ 1/V2 1/V2 0' 0 0 1 -1/V2 1/V2 0. utv1 a » which can be easily checked. (Note that V had to be transposed. Also note that the singular value cr3 does not appear in 2.) Section 7.4 The Singular Value Decomposition S95 (b) This is the matrix in Example 7.33, so we already know that the singular values are r , , rl/V2] , [-1/V2 cr, = V3 and o~2 — 1, corresponding to Vj = 1/V2 and v2 1/V2 . So 2 = V3 0 0 1 0 0 and V = 1/V2 -1/V2 1/V2 1/V2 For L7, we compute 1 1 —Av, = —= o"i V3 1 1 1 0 0 1 1/V2 1/V2 2/V6' 1/V6 1/V6 and u, = —Av, 1 1 1 0 0 1 ■1/V2 1/V2 0 ■1/V2 1/V2 This time, we need to extend {u^ u2} to an orthonormal basis for R . There are several ways to proceed; one method is to use the Gram-Schmidt Process, as in Example 5.14. We first need to find a linearly independent set of three vectors that contains Ui and u2. If e3 is the third standard basis vector in R , it is clear that {u1; u2, e3} is linearly independent. (Here, you should be able to determine this by inspection, but a reliable method to use in general is to row reduce the matrix with these vectors as its columns and use the Fundamental Theorem.) Applying Gram-Schmidt (with normalization) to {%, u2, e3} (only the last step is needed), we find -i/vT 1/V3 1/V3. so U = 2/V6 0 -1/V3' 1/V6 -1/V2 1/V3 i/y/6 1/V2 1/V3, and we have the SVD "l f "2/V6 0 -1/V3" " V3 0" A = 1 0 = 1/V6 -1/V2 1/V3 0 1 _o 1. .1/V6 1/V2 1/V3_ . 0 0. 1/V2 1/V2 -1/V2 1/V2_ U2Vr There is another form of the singular value decomposition, analogous to the spectral decomposition of a symmetric matrix. It is obtained from the SVD by an outer product expansion and is very useful in applications. We can obtain this version of the SVD by imitating what we did to obtain the spectral decomposition. 596 Chapter 7 Distance and Approximation Accordingly, we have A = U£Vr = [u, ■ ... o 0 0 • • • ar 0 0 "ml [-1 0 -T, = cr,U, crrurJ crr. 0 o-, 0 •••.or, u,J[0] = (TjUiV,1 + • • • + 0-.UrV7 using block multiplication and the column-row representation of the product. The following theorem summarizes the process for obtaining this outer product form of theSVD. Theorem 7.14 The Outer Product Form of the SVD Let A be an m X n matrix with singular values cr,. > 0 and ar+l = cry+j = • • • = cr„ = 0. Let Uj,..., u,. be left singular vectors and let v,,..., v,. be right singular vectors of A corresponding to these singular values. Then A = cr1u1V1' + • • • + crrurvr7 Remark If A is a positive definite, symmetric matrix, then Theorems 7.13 and 7.14 both reduce to results that we already know. In this case, it is not hard to show that the SVD generalizes the Spectral Theorem and that Theorem 7.14 generalizes the spectral decomposition. (See Exercise 27.) The SVD of a matrix A contains much important information about A, as outlined in the crucial Theorem 7.15. Section 7.4 The Singular Value Decomposition 51? TllCOreill 7.15 LetA = C/2V3 be a singular value decomposition of an m X n matrix A. Let a crr be all the nonzero singular values of A. Then: a. The rank of A is r. b. {ui,..., u,J is an orthonormal basis for col(A). c. {ur+1,..., um} is an orthonormal basis for null(AT). d. {V],..., vr} is an orthonormal basis for row(A). e. 
|vr+v„} is an orthonormal basis for null(A). Proof (a) By Exercise 61 in Section 3.5, we have rank(A) = rmUUXV1) = rankCSF1) — rank(X) = r (b) We already know that {ut,..., u,J is an orthonormal set. Therefore, it is linearly independent, by Theorem 5.1. Since u, — (l/a^AVj for i = 1,..., r, each u; is in the a *» column space of A, (Why?) Furthermore, r = rank(A) = dim(col(A)) Therefore, {uu ..., u.} is a.n orthonormal basis for col(A), by Theorem 6.10(c). (c) Since {ui.....um} is an orthonormal basis for Rm and {Uj,..., ur} is a basis for col(A), by property (b), it follows that {ur+!,..., um} is an orthonormal basis for the orthogonal complement of col(A). But (col(A)) = null(A3), by Theorem 5.10. (e) Since A%+1 ==■•• = Av„ = 0 the set {vr+1,..., v„} is an orthonormal set contained in the null space of A. Therefore, {vr+1,..., v„} is a linearly independent set of n — r vectors in null(A). But dim(null(A)) = n - r by the Rank Theorem, so {vr+ v..., vj is an orthonormal basis for null(A), by Theorem 6.10(c). (d) Property (d) follows from property (e) and Theorem 5.10. (You are asked to prove this in Exercise 32.) a— The SVD provides new geometric insight into the effect of matrix transformations. We have noted several times (without proof) that an m X n matrix transforms the unit sphere in R" into an ellipsoid in Rm. This point arose, for example, in our discussions of Perron's Theorem and of operator norms, as well as in the introduction to singular values in this section. We now prove this result. Chapter 7 Distance and Approximation Theorem 7.16 Let A be an m X n matrix with rank r. Then the image of the unit sphere in M' - under the matrix transformation that maps x to Ax is a. the surface of an ellipsoid in Um if r - n. b, a solid ellipsoid in IR"' if r < n. Frost Let A = U2Fr be a singular value decomposition of the m X n matrix A. Let the left and right singular vectors of A be i^,..., um and Vj,..., v„, respectively. Since rank(A) = r, the singular values of A satisfy al > (i2 s • • • S ov > 0 and + o* which establishes the result. __... JiH cas ^ Verify Theorem 7.17 for the matrix A in Example 7.18. has singular values 4,5150 and 3.1008. We SOiUtlOn The matrix A check that 3 -1 2 4 V4.51502 + 3.10082 = V3Ö = \\A\\F which agrees with Example 7.18. In Section 7.2, we commented that there is no easy formula for the operator 2-norm of a matrix A. Although that is true, the SVD of A provides us with a very nice expression for |A||2. Recall that !jA|2 = max|jAx|| where the vector norm is the ordinary Euclidean norm. By Theorem 7.16, for ||x[| = 1, the set of vectors |Ax|j lies on or inside an ellipsoid whose semi-axes have lengths Chapter 7 Distance and Approximation equal to the nonzero singular values of A. It follows immediately that the largest of these is crl5 so IIA , = a. This provides us with a neat way to express the condition number of a (square) matrix with respect to the operator 2-norm. Recall that the condition number (with respect to the operator 2-norm) of an invertible matrix A is defined as cond2(A) = |A-I1|2|]A(|2 t As you will be asked to show in Exercise 28, if A = UlVr, then A"1 = V"2~lU . Therefore, the singular values of A"1 are l/ o~r > 0 of A. The pseudoinverse (or Moore-Penrose inverse) of A and D is an r X r diagonal matrix containing the nonzero singular values >•••> crr > 0 of A. 
The pi is the n X m matrix A* defined by where 2 + is the n X m matrix A4 = VV~UT D~l O O O Section 7.4 The Singular Value Decomposition 603 ________ Example 7.39 Find the pseudoinverses of the matrices in Example 7.34. Solution (a) From the SVD 1 A = 1 1 0" "1 0" "V2 0 0" 0 0 1. _o 1. . 0 10 1/V2 1/V2 0 0 0 1 .-1/V2 1/V2 0. utvT One of those who was unaware of Moore's work on matrix inverses was Roger Penrose (b.1931), who introduced his own notion of a generalized matrix inverse in 1955. Penrose has made many contributions to geometry and theoretical physics. He is also the inventor of a type of nonperiodic tiling that covers the plane with only two different shapes of tile, yet has no repeating pattern. He has received many awards, including the 1988 Wolf Prize in Physics, which he shared with Stephen Hawking. In 1994, he was knighted for services to science. Sir Roger Penrose is currently the Emeritus Rouse Ball Professor of Mathematics at the University of Oxford. we form Then 1/V2 0' 0 1 0 0 vl+uT = (b) We have the SVD 0 -1/V2" "1/V2 o" 1 0" .0 1. "1/2 o" 0 1/V2 0 1 = 1/2 0 1 0 0 0 0 1 *1 f ~2/V6 0 -l/VT "V3 0" A = 1 0 = i/Vi -1/V2 1/V3 0 1 -0 1. _1/V6 1/V2 1/V3. o 0. 1/V2 1/V2 T/V2 1/V2_ = L/2Vr so and 1/V3 0 0 0 1 0 A+ = VT'rU t _ 1/V2 1/V2 I/V2 1/V2. 1/V3 0 0 0 1 0 2/V6 1/V6 1/V6 0 -1/V2 1/V2 -1/V3 1/V3 1/V3. 1/3 2/3 -1/3 1/3 -1/3 2/3 It is straightforward to check that this new definition of the pseudoinverse generalizes the old one, for if them X n matrix .A U1,VT has linearly independent columns, then direct substitution shows that (ATA) ~ lA T = V~L+ U1. (You are asked to verify this in Exercise 50.) Other properties of the pseudoinverse are explored in the exercises. We have seen that when A has linearly independent columns, there is a unique least squares solution x to Ax = b; that is, the normal equations ATAx = ATb have the unique solution x = (ATA)~lATb = .\ b When the columns of A are linearly dependent, then ATA is not invertible, so the normal equations have infinitely many solutions. In this case, we will ask for the solution x of minimum length (i.e., the one closest to the origin). It turns out that this time we simply use the general version of the pseudoinverse. 6 Chapter 7 Distance and Approximation TiieOfeitl 7.18 The least squares problem Ax = b has a unique least squares solution x of minimal length that is given by x = A b Proof Let A be an m X n matrix of rank r with SVD A = 1/2 Vr (so that A+ = V"2+UT). Lety = V Tx and let c = L7 'b. Write y and c in block form as and c = V j2. _c2_ where yj and Cj are in W. We wish to minimize ||b — Ax|| or, equivalently, jjb — Axjj2. Using Theorem 5.6 and the fact that UT is orthogonal (because U is), we have Axj 2 - IlTTl-, UT{b - Ax)||2 = |Ur(b - U2Vrx)jj2 = \\U'b - UTUlVlx\\ = l!c-2y!J2= " [V D 0' 2 o 0_ T2- c2 The only part of this expression that we have any control over is yu so the minimum value occurs when Cj — Dyl = 0 or, equivalently, when y, = D~lCx. So all least squares solutions x are of the form Set x = Vy = V x = Vy = V D_1C! Y2 . 0 We claim that this x is the least squares solution of minimal length. To show this, lets suppose that x' = vy = v L Yi is a different least squares solution (hence, y2 ^ 0). Then HI = ]VyI = ||yfl < lly'l = ||vy'|| = ||x'| as claimed. We still must show that x is equal to A+b. To do so, we simply compute Dl 0" V 0 = V _c2. 
x = Vy = V u j = W'c = 1'2 l.vb = ,1 b Example 7.40 Find the minimum length least squares solution of Ax = b, where '1 I "0' A = and b = .1 1_ 1 Section 7.4 The Singular Value Decomposition 605 Solution The corresponding equations x + y = 0 .v — y = I are clearly inconsistent, so a least squares solution is our only hope. Moreover, the columns of A are linearly dependent, so there will be infinitely many least squares solutions—among which we want the one with minimal length. An SVD of A is given by A ~ 1 1 1 1 1/V2 1/V2 1/V2 -1/V2 '2 0" _0 1/V2 1/V2 1/V2 -1/V2_ = 1/2 VT (Verify this.) It follows that A+ = V£+[/r = 1/V2 1/V2" "1/2 o~ "1/V2 I/vT -1/4 1/4 1/V2 -1/V2. 0_ 1/V2 -1/V2. .1/4 1/4. so x = A b •1 1 - "0" "1" 4 I 4 1 .4 1 4. J. 1 .4. You can see that the minimum least squares solution in Example 7.40 satisfies x + y *= |, In a sense, this is a compromise between the two equations we started with. In Exercise 49, you are asked to solve the normal equations for this problem directly and to verify that this solution really is the one closest to the origin. The Fundamental Theorem of (avertible Matrices it is appropriate to conclude by revisiting the Fundamental Theorem of Invertible Matrices one more time. Not surprisingly, the singular values of a square matrix tell us when the matrix is invertible. Th60feiTI 7.19 The Fundamental Theorem of Invertible Matrices: Final Version Let A be an n X n matrix and let T : V —> W be a linear transformation whose matrix [T\c+-b wim respect to bases B and C of V and W, respectively, is A. The following statements are equivalent: a. A is invertible. b. Ax = b has a unique solution for every b in W. c. Ax = 0 has only the trivial solution, d. The reduced row echelon form of A is /„. e. A is a product of elementary matrices. f. rank(A) = n g. nullity(A) = 0 h. The column vectors of A are linearly independent. i. The column vectors of A span Un. j. The column vectors of A form a basis for R". k. The row vectors of A are linearly independent. 1. The row vectors of A span R". Chapter 7 Distance and Approximation m. The row vectors of A form a basis for W, n. detA^O o. 0 is not an eigenvalue of A. p. T is invertible. q. T is one-to-one. r. T is onto. s. ker(T) = {0} t. range(T) = W u. 0 is not a singular value of A. First note that, by the definition of singular values, 0 is a singular value of A if and only if 0 is an eigenvalue of A7 A. (a) =*• (u) If A is invertible, so is A , and hence A1 A is as well. Therefore, property (o) implies that 0 is not an eigenvalue of ArA, so 0 is not a singular value of A. (u) => (a) If 0 is not a singular value of A, then 0 is not an eigenvalue of ATA. Therefore, ATA is invertible, by the equivalence of properties (a) and (o). But then rank(A) = n, by Theorem 3.28, so A is invertible, by the equivalence of properties (a) Amongthe many applications of the SVD, one of the most impressive is its use in compressing digital images so that they can be efficiently transmitted electronically (by satellite, fax, Internet, or the like). We have already discussed the problem of detecting and correcting errors in such transmissions. The problem we now wish to consider has to do with reducing the amount of information that has to be transmitted, without losing any essential information. In the case of digital images, let's suppose we have a grayscale picture that is 340 X 280 pixels in size. Each pixel is one of 256 shades of gray, which we can represent by a number between 0 and 255. 
We can store this information in a 340 X 280 matrix A, but transmitting and manipulating these 95,200 numbers is very expensive. The idea behind image compression is that some parts of the picture are less interesting than others. For example, in a photograph of someone standing outside, there may be a lot of sky in the background, while the person's face contains a lot of detail. We can probably get away with transmitting every second or third pixel in the background, but we would like to keep all the pixels in the region of the face. It turns out that the small singular values in the SVD of the matrix A come from the "boring" parts of the image, and we can ignore many of them. Suppose, then, that we have the SVD of A in outer product form A = (TjUjVi + ■ • ■ + (rru,\' Let k ^ rand define Ak = cr,Ujv[ + • • • + akukvk Then Ak is an approximation to A that corresponds to keeping only the first' k singular values and the corresponding singular vectors. For our 340 X 280 example, we may discover that it is enough to transmit only the data corresponding to the first 20 singular values. Then, instead of transmitting 95,200 numbers, we need only send 20 singular values plus the 20 vectors u,,..., u20 in U340 and the 20 vectors V],..., v20 in ffS280, for a total of 20 + 20 • 340 + 20 • 280 = 12,420 ■m numbers. This represents a substantial saving! The picture of the mathematician Gauss in Figure 7.22 is a 340 X 280 pixel image. It has 256 shades of gray, so the corresponding matrix A is 340 X 280, with entries fl| between 0 and 255. It turns out that the matrix A has rank 280. If we approximate A by Ak, as described above, we get an image that corresponds to the first k singular values of A. Figure 7.23 shows several of these images for values of k from 2 to 256. At first, the image is very blurry, but fairly quickly it takes shape. Notice that A32 already gives a pretty good approximation to the actual image (which comes from A = A2g0, as shown in the upper left-hand corner of Figure 7.23). Some of the singular values of A are o~i = 49,096, al6 --- 22,589, ai2 = 10,187, ■** (764 = 484, o"i28 = 182, a256 = 5, and i> Ul) Example 7,41 Find the best linear approximation to fix) = ex on the interval [—1, 1]. Xr.tmt Linear functions "are polynomials of degree 1, so we use the subspace f W = 9V-1, 1] of^f-1, 1] with the inner product r 1 {/,#) = f(x)g(x)dx A basis for 9* J— 1, 1] is given by {1, .*}. Since rl x = 0 this is an orthogonal basis, so the best approximation to/in Wis g(x) = projw(e*) (1, ex) (x, ex) -1 + -—rx (1,1) (x,x)' (l-ex)dx xexdx + (l-l)dx - -l e — e~ x dx + 2e l) + 3e'\x = 1.18' + l.lOx ' : Chapter 7 Distance and Approximation where we have used integration by parts to evaluate tions.) See Figure 7.24. f(x) = ex xex dx. (Check these calcula- I/g{x) = 1,18 + l.lOx figure 7.14 The error in approximating / by g is the one specified by the Best Approximation Theorem: the distance \\f — g\\ between/and g relative to the inner product on % [— 1,1]. This error is just II/-*« = (fix) - g(x))2 dx and is often called the root mean square error. With the aid of a CAS, we find that the root mean square error in Example 7.41 is \ex - (|(e - e"1) + 3e~lx) (ex ~ |(e - e'1) - 3e~lx)2 dx «- 0.23 Remark The root mean square error can be thought of as analogous to the area between the graphs of /and g on the specified interval. Recall that the area between the graphs of/and g on the interval [a, b] is given by \f(x) - g(x) I dx (See Figure 7.25.) 
Although the equation in the above Remark is a sensible measure of the "error" between /and g, the absolute value sign makes it hard to work with. The root mean square error is easier to use and therefore preferable. The square root is necessary to "compensate" for the squaring and to keep the unit of measurement the same as it would be for the area between the curves. For comparison purposes, the area between the graphs of /'and g in Example 7.41 is He - e"1) - 3e"1x\ dx 0.28 Figure 7.25 Section 7.5 Applications 613 Example 7.42 Find the best quadratic approximation to/(x) = ex on the interval [—1,1], Solution A quadratic function is a polynomial of the form g{x) — a + bx + cx2 in W = 9f>2[~ 1' 1] ■ This time, the standard basis {1, x, x2} is not orthogonal. However, we can construct an orthogonal basis using the Gram-Schmidt Process, as we did in Example 7.8. The result is the set of Legendre polynomials {l, x,X 3} Using this set as our basis, we compute the best approximation to fin W as g(x) = projw(e*). The linear terms in this calculation are exactly as in Example 7.41, so we only require the additional calculations (x2 - k)exdx dx - |j exdx = f(e - 7el) and el ,-l 1 „2 _ 1\ _ I („2 _ 1\2 J„ - x2 - \) = (x2 - j-)2 dx = I (x4 - fx2 + \)dx = Then the best quadratic approximation to/(x) ~ ex on the interval [— 1,1] is ; e n 1. The approximation to/given by Equations (2) and (3) is called the nth-order Fourier approximation to/on [—it, tt]. The coefficients are called the Fourier coefficients off. Example 7.44 Find the fourth-order Fourier approximation to/(x) = x on [—tt, tt]. -,- Solution Using formulas (3), we obtain 1 2tt xdx 2tt and for k a l, integration by parts yields at = — x cos kx dx — — — sin fee + TV cos toe 616 Chapter 7 Distance and Approximation and Jean-Baptiste Joseph Fourier (1768-1830) was a French mathematician and physicist who gained prominence through his investigation into the theory of heat. In his landmark solution of the so-called heat equation, he introduced techniques related to what are now known as Fourier series, a tool widely used in many branches of mathematics, physics, and engineering. Fourier was a political activist during the French revolution and became a favorite of Napoleon, accompanying him on his Egyptian campaign in 1798. Later Napoleon appointed Fourier Prefect of Isere, where he oversaw many important engineering projects. In 1808, Fourier was made a baron. He is commemorated by a plaque on the Fiffel Tower. = < x sin kx dx 77 X , 1 ■ 7 cos kx + ~*r sin kx k k -it cos kir — it cos(—kir) — if k is even if k is odd = 2(-l)*+1 k It follows that the fourth-order Fourier approximation to f(x) ~ x on [—tt, tt] is 2(sin x — j sin 2x + | sin 3x — \ sin 4x) Figure 7.27 shows the first four Fourier approximations tof(x) =* x on [—ir, tt] . n = 3 n = 4 Figure 7.27 Section 7.5 Applications You can clearly see the approximations in Figure 7.27 improving, a fact that can be confirmed by computing the root mean square error in each case. As the order of the Fourier approximation increases, it can be shown that this error approaches zero. The trigonometric polynomial then becomes an infinite series, and we write fix) = a0 + ^iak cos kx + bk sin kx) This is called the Fourier series of/on [ — tt, tt]. Mariner 9 used the Reed-Muller code R5, whose minimum distance is 24 = 16. By Theorem 1, this code can correct k errors, where 2k + 1 S 16. The largest value of k for which this inequality is true is k = 7. 
Thus, R- not only contains exactly the right number of code vectors for transmitting 64 shades of gray but also is capable of correcting up to 7 errors, making it quite reliable. This explains why the images transmitted by Mariner 9 were so sharp! Exercises 7.5 Approximation of Functions In Exercises 1-4, find the best linear approximation tofon the interval [ — 1,1 ]. 1. fix) = x2 3. f(x) = xi 2. fix) = x2 + 2x 4. fix) = sin(irx/2) In Exercises 5 and 6, find the best quadratic approximation tofon the interval [ — 1, 1 ]. 5. fix) = \x\ 6. fix) = cos{ttx/2) 7. Apply the Gram-Schmidt Process to the basis {1, x] to construct an orthogonal basis for ^[0,1]. 8. Apply the Gram-Schmidt Process to the basis {l, x, x2} to construct an orthogonal basis for 2?2 [0, 1]. In Exercises 9-12, find the best linear approximation tofon the interval [0,1], 9. fix) = x2 10. fix) = Vx ll.f{x) = ex 12. fix) - smirrx/D In Exercises 13-16, find the best quadratic approximation tof on the interval [0,1], 13. fix) = x3 14. fix) = Vx 15. fix) = ex 16. fix) = sin(77x/2) 17. Show that 1 is orthogonal to cos kx and sin kx in ^[-ir, -it] forfc> 1. 18. Show that cos jx is orthogonal to cos kx in % [ — tt, tt] for j i> k,j, /i21. 19. Show that sin jx is orthogonal to sin kx in % [ — tt, it] for; # k,j, k 1. 20. Show that |J 11[2 = 2tt and ||cos fcx||2 = - in <[ -~ rr. tt]. In Exercises 21 and 22, find the third-order Fourier approximation tofon [ —tt, tt], 21. fix) = |*j 22, fix) = x2 In Exercises 23-26, find the Fourier coefficients aQ, ak, and bkoffon [-tt, tt}. '0 if-7T s x < 0 1 if 0 < x < tt -1 if-7Tx 3. (A, B) = tr(ArB) for A, B in M22 4. (fig) = ( max f(x))(maxß(x)) for/, g in % [0,1] In Questions 5 and 6, compute the indicated quantity using the specified inner product. 5. i| 1 + x + x2\\ if (a0 + axx + a2x2, b0 + btx + b2x2) = a0b0 + ä\bx + a2b2 6. d(x, x2) if (p(x), q(x)) = j^p(x)qix) dx In Questions 7 and 8, construct an orthogonal set of vectors by applying the Gram-Schmidt Process to the given set of vectors using the specified inner product. 7. if (u, v) = u Av, where A = 6 4 4 6 ... 8. {1, x, x2} if (p(x), q(x)) = p(x)q(x) dx - o In Questions 9 and 10, determine whether the definition gives a norm. 9. || vl| = v'vforvin U" 10, \\p(x)\\ = |^(0)| + \p(l) - p(0)| forp(x) inS?! Section 7.5 Applications 618 11. Show that the matrix A = 1 0.1 0.11 0.1 0.11 0.111 0.11 0.111 0.1111 ill-conditioned. 12. Prove that if Q is an orthogonal n X n matrix, then its Frobenius norm is ||Q||f = V». 13. Find the line of best fit through the points (1, 2), (2, 3), (3, 5), and (4, 7). 14. Find the least squares solution of t 1' 0 -1 3 15. Find the orthogonal projection of x "l f column space of A onto the 16. If u and v are orthonormal vectors, show that P = uur + w' is the standard matrix of an orthogonal projection onto span (u, v). [Hint: Show that P = A{ATA)~lAT for some matrix A.] In Questions 17 and 18, find (a) the singular values, (b) a singular value decomposition, and (c) the pseudoinverse of the matrix A. i r 18. A 17. A 1 -1 1 -1 0 0 .1 -1. 19. If P and Q are orthogonal matrices for which PAQ is defined, prove that PAQ has the same singular values as A. 20. If A is a square matrix for which A2 = O, prove that (A+)2 = O. Please, sir, I want some more. —Oliver Charles Dickens, Oliver Twist Anyone who understands algebraic notation reads at a glance in an equation results reached arithmetically only with great labour and pains. 
—Augustin Cournot Researches into the Mathematical Principles of the Theory of Wealth Translated by Nathaniel T, Bacon Macmillan, 1897, p. 4 '0 Appendix A Mathematical Notation and Methods of Proof In this book, an effort has been made to use "mathematical English" as much as possible, keeping mathematical notation to a minimum. However, mathematical notation is a convenient shorthand that can greatly simplify the amount of writing we have to do. Moreover, it is commonly used in every branch of mathematics, so the ability to read and write mathematical notation is an essential ingredient of mathematical understanding. Finally, there are some theorems whose proofs become "obvious" if the right notation is used. Proving theorems in mathematics is as much an art as a science. For the beginner, it is often hard to know what approach to use in proving a theorem; there are many approaches, any one of which might turn out to be the best. To become proficient at proofs, it is important to study as many examples as possible and to get plenty of practice. This appendix summarizes basic mathematical notation applied to sets. Summation notation, a useful shorthand for dealing with sums, is also discussed. Finally, some approaches to proofs are illustrated with generic examples. Set Notation A set is a collection of objects, called the elements (or members) of the set. Examples of sets include the set of all words in this text, the set of all books in your college library, the set of positive integers, and the set of all vectors in the plane whose equation is 2x + 3y — z = 0. It is often possible to list the elements of a set, in which case it is conventional to enclose the list within braces. For example, we have {1,2,3}, {a,t,x,z}, {2,4,6,..., 100}, 77 2 77 77 477 577 6 Note that ellipses (...) denote elements omitted when a pattern is present. (What is the pattern in the last two examples?) Infinite sets are often expressed using ellipses. For example, the set of positive integers is usually denoted by N or Z+, so N = T = {1,2,3,...} The set of all integers is denoted by Z, so Z = {...,-2,-1,0,1,2,...} Two sets are considered to be equal if they contain exactly the same elements. The order in which elements are listed does not matter, and repetitions are not counted. Thus, "Exercises and selected odd-numbered answers for this appendix can be found on the student companion website. A1 Appendix A Mathematical Notation and Methods of Proof {1,2,3} = {2, 1,3} = (1,3,2, 1} The symbol E means "is an element of" or "is in," and the symbol & denotes the negation—that is, "is not an element of" or "is not in." For example, 5 e Z+ but 0 g I* It is often more convenient to describe a set in terms of a rule satisfied by all of its elements. In such cases, set builder notation is appropriate. The format is {x: x satisfies P} where P represents a property or a collection of properties that the element x must satisfy. The colon is pronounced "such that." For example, {n:«GZ,«>0} is read as "the set of all n such that n is an integer and n is greater than zero." This is just another way of describing the positive integers Z+. (We could also write Z+ = {n G Z : n > 0}.) The empty set is the set with no elements. It is denoted by either 0 or {}. Example A.1 Describe in words the following sets: (a) A = {n:n = 2U£Z( (c) C = {x G R : 4x2 - 4x - 3 = 0} (b) B = {m/n :m,n£ (d) D = {x £ Z : 4x2 Z, n * 0} - 4x - 3 = 0} Solution (a) A is the set of numbers n that are integer multiples of 2. 
Therefore, A is the set of all even integers. (b) B is the set of all expressions of the form m/n, where m and n are integers and n is nonzero. This is the set of rational numbers, usually denoted by Q. (Note that this way of describing Q produces many repetitions; however, our convention, as noted above, is that we include only one occurrence of each element. Thus, this expression precisely describes the set of all rational numbers.) (c) C is the set of all real solutions of the equation 4x2 - 4x - 3 = 0. By factoring or using the quadratic formula, we find that the roots of this equation are — 5 and f. (Verify this.) Therefore, c = {-§,!} (d) From the solution to (c) we see that there are no solutions to 4x2 — 4x — 3 = 0 in R that are integers. Therefore, D is the empty set, which we can express by writing D = 0. ' 1 John Venn (1834-1923) was an English mathematician who studied at Cambridge University and later lectured there. He worked primarily in mathematical logic and is best known for inventing Venn diagrams. If every element of a set A is also an element of a set B, then A is called a subset of B, denoted A C B. We can represent this situation schematically using a Venn diagram, as shown in Figure A.l, (The rectangle represents the universal set, a set large enough to contain all of the other sets in question—in this case, A and B.) ft" Appendix A Mathematical Notation and Methods of Proof A C B (a) {1,2,3} £{1,2,3,4,5} (b) Z+CZCH (c) Let A be the set of all positive integers whose last two digits are 24 and let B be the set of all positive integers that are evenly divisible by 4. Then if n is in A, it is of the form n = 100/c + 24 for some integer k. (For example, 36,524 = 100 • 365 + 24.) But then n = lOOfc + 24 = 4(25fc + 6) so «/4 = 25k + 6, which is an integer. Hence, n is evenly divisible by 4, so it is in B. Therefore, A C B. We can show that two sets A and B are equal by showing that each is a subset of the other. This strategy is particularly useful if the sets are defined abstractly or if it is not easy to list and compare their elements. Let A be the set of all positive integers whose last two digits form a number that is evenly divisible by 4. In the case of a one-digit number, we take its tens digit to be 0. Let B be the set of all positive integers that are evenly divisible by 4. Show that A = B. Solution As in Example A.2(c), it is easy to see that A C B. If n is in A, then we can split off the number m formed by its last two digits by writing n = lOOfc + m for some integer k. But, since m is divisible by 4, we have m = 4r for some integer r. Therefore, n = 100A: + m = 100k + 4r = 4(25fc + r) so n is also evenly divisible by 4. Hence, A C B. To show that B C A, let n be in B. That is, n is evenly divisible by 4. Let's say that n = 45, where s is an integer. If m is the number formed by the last two digits of n, then, as above, n = 100/c + m for some integer k. But now m = n - 100/c = 4s - lOOfc = 4(s - 25A;) which implies that m is evenly divisible by 4, since s — 25k is an integer, 'therefore, n is in A, and we have shown that B C A. Since AC B and B C A, we must have A = B Appendix A Mathematical Notation and Methods of Proof The intersection of sets A and B is denoted by A fl B and consists of the elements that A and B have in common. That is, A ("1 B = {x : x E A and x £ B} Figure A.2 shows a Venn diagram of this case. The union of A and B is denoted by A U B and consists of the elements that are in either A or B (or both). 
That is, A U B = [x : x E A or x E B} See Figure A.3. Figure A.2 A D B A U B Let A = {n2: n E T, 1 < « < 4} and let B = {n E Z+ : « < 10 and « is odd}. Find A fl B and A U B. SelUliii We see that A = {l2,22,32, 42} = {1,4,9, 16} and B = {1,3,5,7,9} Therefore, A fl B = {1, 9} and A U B = {1, 3,4, 5, 7, 9, 16}. If A fl B = 0, then A and B are called disjoint sets. (See Figure A.4.) For example, the set of even integers and the set of odd integers are disjoint. Summation Notation Summation notation is a convenient shorthand to use to write out a sum such as 1 + 2 + 3 + • • • + 100 where we want to leave out all but a few terms. As in set notation, ellipses (...) convey that we have established a pattern and have simply left out some intermediate terms. In the above example, readers are expected to recognize that we are summing all of the positive integers from 1 to 100. However, ellipses can be ambiguous. For example, what would one make of the following sum? 1 + 2 + • • • + 64 Is this the sum of all positive integers from 1 to 64 or just the powers of two, 1 + 2 + 4 + 8+16 + 32 + 64? It is often clearer (and shorter) to use summation notation (or sigma notation). Figure fl.4 Disjoint sets 2 is the capital Greek letter sigma, corresponding to S (for "sum"). Summation notation was introduced by Fourier in 1820 and was quickly adopted by the mathematical community. Appendix A Mathematical Notation and Methods of Proof We can abbreviate a sum of the form at + a2 + ~ ~ • + an (1) as 5>* fc«=i (2) which tells us to sum the terms ak over all integers fc ranging from 1 to n. An alternative version of this expression is The subscript fc is called the imfec of summation. It is a "dummy variable" in the sense that it does not appear in the actual sum in expression (1). Therefore, we can use any letter we like as the index of summation (as long as it doesn't already appear somewhere else in the expressions we are summing). Thus, expression (2) can also be written as « The index of summation need not start at 1. The sum a3 + a4 H— + agg becomes 99 although we can arrange for the index to begin at 1 by rewriting the expression as 97 4 = 1 The key to using summation notation effectively is being able to recognize patterns. Write the following sums using summation notation. (a) 1 + 2 + 4 + ■•• + 64 (b) 1 + 3 + 5 + ••• + 99 (c) 3 + 8 + 15 + •■• + 99 (a) We recognize this expression as a sum of powers of 2: 1 + 2 + 4 + 64 = 2U + V Therefore, the index of summation appears as the exponent, and we have >j 2k. 4 = 0 (b) This expression is the sum of all the odd integers from 1 to 99. Every odd integer is of the form 2k + 1, so the sum is 1 + 3 + 5 + • • • + 99 = (2-0 + 1) + (2-1 + 1) + (2-2 + 1) + • • • + (2-49 + 1) = 2(2fc+ 1) 4=0 (c) The pattern here is less clear, but a little reflection reveals that each term is 1 less than a perfect square: 3 + 8 + 15 + • • • + 99 = (22 ~ 1) + (32 - 1) + (42 - 1) + • • ■ + (102 - 1) 10 = 2>2 -1) k-2 . ^4- Appendix A Mathematical Notation and Methods of Proof Example A.6 Rewrite each of the sums in Example A.5 so that the index of summation starts at 1. (a) If we use the change of variable i = k + 1, then, as k goes from 0 to 6, i goes from 1 to 7. Since k = ( — 1, we obtain (c=0 i=l (b) Using the same substitution as in part (a), we get 49 50 50 ^dk +1) = 2<2<* - i) + i) = 2>' - i) (c) The substitution i = k - 2 will work (try it), but it is easier just to add a term corresponding tok= 1, since l2 — 1 = 0. 
Therefore, 10 10 2>2 - 1) = 2>2 - 1) k -2 *«1 Multiple summations arise when there is more than one index of summation, as there is with a matrix. The notation n means to sum the terms at, as i and j each range independently from 1 to n. The sum in expression (3) is equivalent to either ft n SIX where we sum first over j and then over i (we always work from the inside out), or n n ;=1 >=1 where the order of summation is reversed. Example A.7 i Write out ^!' using both possible orders of summation. U-J ' Solution r*l J«l ;=l = (l1 + l2 + l3) + (21 + 22 + 23) + (31 + 32 + 33) = (1 + 1 + 1) + (2 + 4 + 8) + (3 + 9 + 27) = 56 Appendix A Mathematical Notation and Methods of Proof and 22'v' = 2u'+ 2*+ 3*) = (l1 + 21 + 31) + (l2 + 22 + 32) + (I3 4- 23 + 33) = (1 + 2 + 3) + (1 + 4 + 9) + (1 + 8 + 27) = 56 How fo Solve It is the title of a book by the mathematician George Polya (1887-1985 Since its publication in 1945, How to Solve It has sold over a million copies and has been translated into 17 languages. « Polya was born in Hungary, but because of the political situation in Europe, he moved to the United States in 1940. He subsequently taught at Brown and Stanford Universities, where he did mathematical research and developed a well-deserved reputation as an outstanding teacher. The Polya Prize is awarded annually by the Society for Industrial and Applied Mathematics for major contributions to areas of mathematics close to those on which Polya worked. The Mathematical Association of America annually awards Polya Lectureships to mathematicians demonstrating the high-quality exposition for which Polya was known. Remark Of course, the value of the sum in Example A,7 is the same no matter which order, of summation we choose, because the sum is finite. It is also possible to consider infinite sums (known as infinite series in calculus), but such sums do not always have a value and great care must be taken when rearranging or manipulating their terms. For example, suppose we let S = ±2k fc=o Then S=l+2 + 4 + 8 + -- - = 1 + 2(1 + 2 + 4 + • • •) = 1 + IS from which it follows that S = — 1. This is clearly nonsense, since S is a sum of non-negative terms! (Where is the error?) Methods of Proof The notion of proof is at the very heart of mathematics. It is one thing to know what is true; it is quite another to know why it is true and to be able to demonstrate its truth by means of a logically connected sequence of statements. The intention here is not to try to teach you how to do proofs; you will become better at doing proofs by studying examples and by practicing—something you should do often as you work through this text. The intention of this brief section is simply to provide a few elementary examples of some types of proofs. The proofs of theorems in the text will provide further illustrations of "how to solve it." Roughly speaking, mathematical proofs fall into two categories: direct proofs and indirect proofs. Many theorems have the structure "if P, then Q," where P (the hypothesis, or premise) and Q (the conclusion) are statements that are either true or false. We denote such an implication by P =4» Q. A direct proof proceeds by establishing a chain of implications P Px P2 ■ ■ ■ =*■ P„ Q leading directly from P to Q. Prove that any two consecutive perfect squares differ by an odd number. This instruction can be rephrased as "Prove that if a and b are consecutive perfect squares, then a — bis odd." 
Hence, it has the form P => Q, with P being "a and b are consecutive perfect squares" and Q being "a — b is odd." Appendix A Mathematical Notation and Methods of Proof as Assume that a and b are consecutive perfect squares, with a> b. Then a = (n + l)2 and b = n2 for some integer n. But now a - b = in + l)2 - n2 = n2 + 2n + 1 — n2 = 2n + 1 so a — t> is odd. There are two types of indirect proofs that can be used to establish a conditional statement of the form P => Q. A proof by contradiction assumes that the hypothesis P is true, just as in a direct proof, but then supposes that the conclusion Q is false. The strategy then is to show that this is not possible (i.e., to rule out the possibility that the conclusion is false) by finding a contradiction to the truth of P. It then follows that Q must be true. Example A.9 Let n be a positive integer. Prove that if n2 is even, so is n. (Take a few minutes to try to find a direct proof of this assertion; it will help you to appreciate the indirect proof that follows.) Assume that n is a positive integer such that n2 is even. Now suppose that n is not even. Then n is odd, so 2k 1 for some integer k. But if so, we have n2 = {2k + 1) 4k2 + 4k + 1 so n2 is odd, since it is 1 more than the even number 4k + 4k. This contradicts our hypothesis that n2 is even. We conclude that our supposition that n was not even must have been false; in other words, n must be even. Closely related to the method of proof by contradiction is proof by contraposi-tive. The negative of a statement P is the statement "it is not the case that P" abbreviated symbolically as ->P and pronounced "not P." For example, if P is "« is even," then ->P is "it is not the case that n is even"—in other words, "n is odd." The contrapositive of the statement P =*• Q is the statement ->Q -iP. A conditional statement P=i> Q and its contrapositive ~iQ =4- -vP are logically equivalent in the sense that :«--»" they are either both true or both false. (For example, if P =** Q is a theorem, then so is ->Q -iP. To see this, note that if the hypothesis -iQ is true, then Q is false. The conclusion ->P cannot be false, for if it were, then P would be true and our known theorem P=$Q would imply the truth of Q, giving us a contradiction. It follows that ->P is true and we have proved -iQ =>• -iP.) Here is a contrapositive proof of the assertion in Example A.9. M Appendix A Mathematical Notation and Methods of Proof Example A.10 Let n be a positive integer. Prove that if n2 is even, so is n. Solution The contrapositive of the given statement is "If n is not even, then n1 is not even" or "If n is odd, so is n2" To prove this contrapositive, assume that n is odd. Then n — 2k + 1 for some integer k. As before, this means that n2 = (2k + I)2 = Ak2 + 4k + 1 is odd, which completes the proof of the contrapositive. Since the contrapositive is true, so is the original statement. a Although we do not require a new method of proof to handle it, we will briefly consider how to prove an "if and only if" theorem. A statement of the form "P if and only if Q" signals a double implication, which we denote by P Q. To prove such a statement, we must prove P Q and Q =#■ P. To do so, we can use the techniques described earlier, where appropriate. It is important to notice that the "if" part of P Q is "P if Q," which is Q => P; the "only if" part of P 4* Q is "P only if Q," meaning P =>• Q. 
The implication P Q is sometimes read as "P is sufficient for Q" or "Q is necessary for P"; Q P is read "Q is sufficient for P" or "P is necessary for Q." Taken together, they are P Q, or "P is necessary and sufficient for Q" and vice versa. Example Jl.11 A pawn is placed on a chessboard and is allowed to move one square at a time, either horizontally or vertically. A pawns tour of a chessboard is a path taken by a pawn, moving as described, that visits each square exactiy once, starting and ending on the same square. Prove that there is a pawn's tour of an n X n chessboard if and only if n is even. Solution [ ] ("if") Assume that n is even. It is easy to see that the strategy illustrated in Figure A.5 for a 6 X 6 chessboard will always give a pawn's tour. [ ] ("only if") Suppose that there is a pawn's tour of an n X n chessboard. We will give a proof by contradiction that n must be even. To this end, let's assume that n is odd. At each move, the pawn moves to a square of a different color. The total number of moves in its tour is n2, which is also an odd number, according to the proof in Example A. 10. Therefore, the pawn must end up on a square of the opposite color from that of the square on which it started. (Why?) This is impossible, since the pawn ends where it started, so we have a contradiction. It follows that n cannot be odd; hence, n is even and the proof is complete. Figure A.5 Some theorems assert that several statements are equivalent. This means that each is true if and only if all of the others are true. Showing that n statements are equivalent requires n~ — n 'if and only if" proofs. In practice, 2!(« - 2)! 2 however, it is often easier to establish a "ring" of n implications that links all of the statements. The proof of the Fundamental Theorem of Invertible Matrices provides an excellent example of this approach. Appendix B © Great fleas have little fleas upon their backs to bite 'em, And little fleas have lesser fleas, and so ad infinitum, —Augustus De Morgan A Budget of Paradoxes Longmans, Green, and Company, 1872, p. 377 Mathematical Induction The ability to spot patterns is one of the keys to success in mathematical problem solving. Consider the following pattern: 1 = 1 1+3 = 4 1+3+5=9 1 + 3 + 5 + 7= 16 1 + 3 + 5 + 7 + 9 = 25 The sums are all perfect squares: l2, 22, 32,42, 52. It seems reasonable to conjecture that this pattern will continue to hold; that is, the sum of consecutive odd numbers, starting at 1, will always be a perfect square. Let's try to be more precise. If the sum is n2, then the last odd number in the sum is 2« - 1. (Check this in the five cases above.) In symbols, our conjecture becomes 1 + 3 + 5 + • ■ • + (2n - 1) for all n s 1 (1) Notice that Equation (1) is really an infinite collection of statements, one for each value of n 3: 1. Although our conjecture seems reasonable, we cannot assume that the pattern continues—we need to prove it. This is where mathematical induction comes in. First Principle of Mathematical Induction Let S(n) be a statement about the positive integer n. If 1. 5(1) is true and 2. for all k 2= 1, the truth of S(k) implies the truth of S(k + 1) then S(n) is true for all »2 1, Verifying that S(l) is true is called the basis step. The assumption that S(k) is true for some k S: 1 is called the induction hypothesis. Using the induction hypothesis to prove that S(k + 1) is then true is called the induction step. 
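Before applying the principle, it can be reassuring to test the conjecture in Equation (1) computationally. Such a check is not a proof—it only covers finitely many cases—but it suggests that an induction argument is worth attempting. A minimal sketch, using nothing beyond the Python standard library:

```python
# Sanity check, not a proof: test 1 + 3 + 5 + ... + (2n - 1) == n**2
# for the first 500 positive integers n.

def sum_of_odds(n):
    return sum(2 * k - 1 for k in range(1, n + 1))

assert all(sum_of_odds(n) == n ** 2 for n in range(1, 501))
print("Conjecture verified for n = 1, ..., 500")
```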
Mathematical induction has been referred to as the domino principle because it is analogous to showing that a line of dominoes will fall down if (1) the first domino can be knocked down (the basis step) and (2) knocking down any domino (the induction hypothesis) will knock over the next domino (the induction step). See Figure B.l. We now use the principle of mathematical induction to prove Equation (1). *Exercises and selected odd-numbered answers for this appendix can be found on the student companion website, Appendix B Mathematical Induction If the first domino falls, and ... each domino that falls knocks down the next one, then all the dominoes can be made to fall by pushing over the first one. figure B.1 EXaiTiPl8 B.1 Use mathematical induction to prove that 1+3 + 5 -+-••• + (2« - 1) = «2 for all «S 1, Solution For « = 1, the sum on the left-hand side is just 1, while the right-hand side is l2. Since 1 = l2, this completes the basis step. Now assume that the formula is true for some integer k Se 1. That is, assume that 1 + 3 + 5 + • • • + (2k - 1) = 1,2 (This is the induction hypothesis.) The induction step consists of proving that the formula is true when n = k + 1. We see that when n — k + 1, the left-hand side of formula (1) is 1 + 3 + 5 + • • • + (2(k + 1) - 1) = 1 + 3 + 5 + ■ • • + (2k + 1) = 1 + 3 + 5 + ' • • + (2k - 1) + (2k + 1) k2 + 2k + = (k + l)2 by the induction hypothesis which is the right-hand side of Equation (1) when n = k + 1. This completes the induction step, and we conclude that Equation (1) is true for all n 2: 1, by the principle of mathematical induction. Appendix B Mathematical Induction Example B.2 The next example gives a proof of a useful formula for the sum of the first n positive integers. The formula appears several times in the text; for example, see the solution to Exercise 51 in Section 2.4. Prove that for all 1 + 2 + • • • + » n(n + 1) Solution The formula is true for n = 1, since 1(1 + 1) 1 = —--- 2 Assume that the formula is true for n = k; that is, k(k + 1) 1 + 2 + k = We need to show that the formula is true when n = k + 1; that is, we must prove that (k + l)[(k + 1) + 1] 1 + 2 + • • • + (k + 1) But we see that 1 + 2 + • • • + (k + 1) = (1 + 2 + • • * + k) + (k + 1) k(k + 1) „ = ~.--- + (k + 1) by the induction hypothesis k(k + 1) + 2(k + 1) 3k + 2 (k + l)(k + 2) (k + l)[(k + 1) + 1] which is what we needed to show. This completes the induction step, and we conclude that the formula is true for all tiS 1, by the principle of mathematical induction. . In a similar vein, we can prove that the sum of the squares of the first n positive integers satisfies the formula l2 + 22 + 32 + • • • + n2 = for all n a 1. (Verify this for yourself.) n(n + i)(2n + 1) Appendix B Mathematical Induction Example B.3 The basis step need not be for n — 1, as the next two examples illustrate. Prove that n\ > 2" for all integers n > 4. Solution The basis step here is when n = 4. The inequality is clearly true in this case, since 4! = 24 > 16 = 24 Assume that kl >2k for some integer k S 4. Then (fc + 1)! = (k + \)k\ > (k + 1)2 by the induction hypothesis > 5 • 2k since H>4 > 2-2* which verifies the inequality for n — k + 1 and completes the induction step. We conclude that«! > 2" for all integers n 2: 4, by the principle of mathematical induction. If a is a nonzero real number and n & 0 is an integer, we can give a recursive definition of the power a" that is compatible with mathematical induction. 
We define a" = 1 and, for n & 0, (This form avoids the ellipses used in the version an = aa ■ • • a.) We can now use mathematical induction to verify a familiar property of exponents. Let a be a nonzero real number. Prove that ama" = am+" for all integers m, n a 0. Solution At first glance, it is not clear how to proceed, since there are two variables, m and n. But we simply need to keep one of them fixed and perform our induction using the other. So, let m 2: 0 be a fixed integer. When n = 0, we have ama° = am-l = am = a"'+0 using the definition a0 = 1. Hence, the basis step is true. Now assume that the formula holds when n = k, where k St 0. Then amak = am+k. For n = k + 1, using our recursive definition and the fact that addition and multiplication are associative, we see that amak+i _ „m,J< am{aKa) (amak)a = am+ka = a (m+k) +1 by definition by the induction hypothesis by definition Appendix B Mathematical Induction Therefore, the formula is true for n = k + 1, and the induction step is complete. We conclude that a'"a" = am+n for all integers m, n £ 0, by the principle of mathematical induction. In Examples B.l through B.4, the use of the induction hypothesis during the induction step is relatively straightforward. However, this is not always the case. An alternative version of the principle of mathematical induction is often more useful. Second Principle of Mathematical Induction Let S(n) be a statement about the positive integer n. If 1. SO) is true and 2. the truth of S(l), S(2),..., S(k) implies the truth of S(k + 1) then S(n) is true for all ft£l, Example B.i The only difference between the two principles of mathematical induction is in the induction hypothesis: The first version assumes that S(k) is true, whereas the second version assumes that all of S(l), S(2),..., S(k) are true. This makes the second principle seem weaker than the first, since we need to assume more in order to prove S(k + 1) (although, paradoxically, the second principle is sometimes called strong induction). In fact, however, the two principles are logically equivalent: Each one implies the other. (Can you see why?) The next example presents an instance in which the second principle of mathematical induction is easier to use than the first. Recall that a prime number is a positive integer whose only positive integer factors are 1 and itself. Prove that every positive integer n 2: 2 either is prime or can be factored into a product of primes. Solution The result is clearly true when n = 2, since 2 is prime. Now assume that for all integers n between 2 and k, n either is prime or can be factored into a product of primes. Let n = k + 1. If k + lis prime, we are done. Otherwise, it must factor into a product of two smaller integers—say, k + 1 = ab Since 2 ■& a, b £ k (why?), the induction hypothesis applies to a and b. Therefore, a = Pi ''' Pr b = qx • ■ • qs where the p's and q's are all prime. Then ah** pr" prqi ■•• 0, as shown in Figure C.4. We have a = r cos 6 and b = r sin 6 so z = a + bi = rcosö + (rsinö)« Thus, any complex number can be written in the polar form z = r(cos 0 + i sin 0) Appendix C Complex Numbers ■ v Ex Im A = 1 + i > 1 -*-Re > ce3 -2-- w — 1 v3( Figure CS where r = | z\ = V a2 + b1 and tan 0 = b/a. The angle 9 is called an argument of z and is denoted by arg z. Observe that arg z is not unique: Adding or subtracting any integer multiple of 27T gives another argument of z. 
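As a quick numerical illustration (a sketch only, assuming Python's cmath and math modules rather than anything in the text), the following converts a complex number to polar form and confirms that shifting the argument by 2π describes the same number:

```python
import cmath
import math

z = 1 + 1j
r, theta = cmath.polar(z)   # r = |z|, theta is one argument of z
print(r, theta)             # about 1.41421..., 0.78539...  (that is, sqrt(2) and pi/4)

# Adding any integer multiple of 2*pi to the argument gives back the same number:
w = r * (math.cos(theta + 2 * math.pi) + 1j * math.sin(theta + 2 * math.pi))
print(abs(w - z) < 1e-12)   # True
```

So a nonzero complex number has infinitely many arguments, all differing by integer multiples of 2π.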
However, there is only one argument 6 that satisfies — tt<9 £ w This is called the principal argument of z and is denoted by Arg z. Write the following complex numbers in polar form using their principal arguments: (a) z*= 1 + i (b) w - 1 - V5i Solution (a) We compute r= \z\ = Vi2"+ l2 = V2 and tan 6 = - = 1 tt Therefore, Arg z = 9 - - (= 45°), and we have tt tt z --- V2 cos---r i sin — V 4 4 as shown in Figure C.5. (b) We have Vi2 + (-V3)2 = V4 = 2 and tan 6 = •V3 -V3 Since w lies in the fourth quadrant, we must have Arg z — 0 — —~ (= —60°), Therefore, w = 2 cos 7T I Sllli — — See Figure C.5. The polar form of complex numbers can be used to give geometric interpretations of multiplication and division. Let Zj = r1(cos9l + i sin 9 J and z2 = r2(cos 92 + i sin 92) Multiplying, we obtain ZjZj = rjfjfcosPi + i sin 6j)(cos 6| + / sin #2) = r,r2[(cosSj cos92 — sinsin92) + /(sin()•. cos#2 + cos0: sin92)] Using the trigonometric identities cos(0l + 92) = cos cos #2 — sinS] sin 92 mi(0x + 92) = sin 9X cosfl2 + cos 9X sin 02 Appendix C Complex Numbers Figure G.6 ex + ez we obtain ► Re zxz2 = r1r2[cos(01 + 62) + isin(Ox + 62)] (1) which is the polar form of a complex number with absolute value rxr2 and argument 6X + 02. This shows that \zxz2\ = !ziii22! and arg(z]z2) = argZi + argz2 Equation (1) says that to multiply two complex numbers, we multiply their absolute values and add their arguments. See Figure C.6. Similarly, using the subtraction identities for sine and cosine, we can show that [cosiö, - ö'2) + ;'sin(0, - d2)} if z * 0 B » (Verify this.) Therefore, Figure 6.7 !%! , (zi\ j—j- and arg^ ) = arg Zi - arg z2 and we see that to divide two complex numbers, we divide their absolute values and subtract their arguments. As a special case of the last result, we obtain a formula for the reciprocal of a complex number in polar form. Setting zx = 1 (and therefore 6X = 0) and z2 = z (and #.Re therefore 82 = 0), we obtain the following: It's = r(cos6> + ;' sin 0) is nonzero, then 1 1 - = -(cos f) - i sin H) z r See Figure C.7. I Find the product of 1 + i and 1 - V3;' in polar form. Sektion From Example C.3, we have 1 + i = V2| cos--b i sin — ) and 1 — V3f = l{ cos V 4 7T \ / tt ■ — J + ( sin-- 3/ V 3 Therefore, (1 + - V30 = 2V2 = 2V2 See Figure C.8. cos - cos tt 12 tt\ . . / tt + i sm 37 \ '4 w ■ ■ ( tt i sm — / V 12 Appendix C Complex Numbers 60 1m We therefore have a method for finding the sine and cosine of an angle such as nl\2 that is not a special angle but that can be obtained as a sum or difference of special angles. De Moivre's Theorem If n is a positive integer and z — r(cos 8 + i sin 8), then repeated use of Equation (1) yields formulas for the powers of z: z2 = r2(cos 20 + i sin 20) z3 = zz2 = r3(cos 30 + i sin 30) z4 = zz3 - r4(cos4# + / sin 48) In general, we have the following result, known as De Moivre's Theorem. Abraham De Moivre (1667-1754) was a French mathematician who made important contributions to trigonometry, analytic geometry, probability, and statistics. Appendix C Complex Numbers Theorem C. 1 De Moivre's Theorem If z = r(cos 8 + i sin 8) and n is a positive integer, then z" = rn(cosn0 + isin nö) Stated differently, we have \z"\ = \z\n and arg(z") = nargz In words, De Moivre's Theorem says that to take the nth. power of a complex number, we take the nth power of its absolute value and multiply its argument by n. 
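Here is a small numerical check of the theorem (an illustrative sketch only; the test value z and exponent n are arbitrary choices, and Python's cmath module is assumed):

```python
import cmath

z, n = 2 - 3j, 5
r, theta = cmath.polar(z)

direct   = z ** n
demoivre = r ** n * (cmath.cos(n * theta) + 1j * cmath.sin(n * theta))

print(abs(direct - demoivre) < 1e-9)   # True: the two computations agree
```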
Example C.5 (Powers of 1 + i)  Find (1 + i)^6.

Solution  From Example C.3(a), we have 1 + i = √2(cos(π/4) + i sin(π/4)). Hence, De Moivre's Theorem gives

(1 + i)^6 = (√2)^6 (cos(6π/4) + i sin(6π/4)) = 8(cos(3π/2) + i sin(3π/2)) = 8(0 + i(−1)) = −8i

See Figure C.9, which shows 1 + i, (1 + i)^2, (1 + i)^3, ..., (1 + i)^6.

We can also use De Moivre's Theorem to find nth roots of complex numbers. An nth root of the complex number z is any complex number w such that w^n = z. In polar form, write w = s(cos φ + i sin φ). If w^n = z, then De Moivre's Theorem gives s^n(cos nφ + i sin nφ) = r(cos θ + i sin θ). Comparing absolute values, we get s^n = r, so s = r^(1/n), the real positive nth root of r. Comparing arguments, we see that nφ
and 6 differ by an integer multiple of 2 77; that is, 8 + 2fcir tup = 8 + IkTT or (p = n ■ AppendixC Complex Numbers CS where k is an integer. Therefore 0 + 2kir\ (0 + 2kir s|--) + i sinl--- describes the possible nth roots of z as k ranges over the integers. It is not hard to show that fc = 0,1, 2,..., n — 1 produce distinct values of w, so there are exactly n different nth roots of z = r(cos 0 + /' sin 0). We summarize this result as follows: Let z — r(cos 0 + i sin 0) and let n be a positive integer. Then z has exactly n distinct Mh roots given by ,1/» 0 + 2fciT + i sin 0 + IkTT (2) for'Kf = 0,1,2, ...,■« - 1. Example C.6 *-Re Figure MO The cube roots of —27 Find the three cube roots of —27. Solution In polar form, -27 = 27(cos 77 + i sin 77). It follows that the cube roots of — 27 are given by cos (-27)113 = 271/3 Using formula (2) with n = 3, we obtain 27l/3 'it + 2ir- 2kTT\ i tt + i sin 2krr for k = 0,1,2 77 tt cos--h 1 sin — 3 3 = 3 V3 3 3V3 = —I--i 2 2 27i/3 27' cos 7T + 4-7T + «sin 7t + 27t tt + 47t 3(cos tt + i sin tt) = —3 577 577 3 cos---V i sin — 3 3 *1 V3 3V3 As Figure C. 10 shows, the three cube roots of — 27 are equally spaced 2tt/3 radians (120°) apart around a circle of radius 3 centered at the origin. In general, formula (2) implies that the «th roots of Z = r(cos 0 + i sin 0) will lie on a circle of radius riln centered at the origin. Moreover, they will be equally spaced 277/n radians (360/n°) apart. (Verify this.) Thus, if we can find one nth root of z, the remaining nth roots of z can be obtained by rotating the first root through successive increments of 2t7/« radians. Had we known this in Example C.6, we could have used the fact that the real cube root of —27 is —3 and then rotated it twice through an angle of 277/3 radians (120°) to get the other two cube roots. Numbers 1 Leonhard Euler (1707-1783) was the most prolific mathematician of all time. He has over 900 pub-| lications to his name, and his collected works fill over 70 volumes. There are so many results attributed to him that "Euler's formula" or "Euler's Theorem" can mean many different things, depending | on the context. Euler worked in so many areas of mathematics, it is difficult to list them all. His contribu- I tions to calculus and analysis, differential equations, number theory, geometry, topology, me- I chanics, and other areas of applied mathematics continue to be influential. He also intro- I duced much of the notation we currently use, including tt, e, i, 2 for summation, A for difference, and/(x) for a function, and was the first to treat sine and cosine as functions. Euler was born in Switzerland but spent most of his mathematical life in Russia and Germany. In 1727, he joined the St. Petersburg Academy of Sciences, which had been founded by Catherine 1, the wife of Peter the Great. He went to Berlin in 1741 at the invitation of Frederick the Great, but i returned in 1766 to St. Petersburg, where he remained until his death. When he was young, he lost the vision in one eye as the result of an illness, and by 1776 he had lost the vision in the other eye and was totally blind. Remarkably, his mathematical output did not diminish, and he continued to be productive until the day he died. Euler's Formula In calculus, you learn that the function ez has a power series expansion z2 z3 f = 1 + z + — + — + 2! 3! that converges for every real number z. 
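As a small numerical illustration (not part of the derivation that follows; it assumes only Python's math module and an arbitrary test value x), partial sums of this series converge rapidly to e^x:

```python
import math

def exp_series(x, terms=20):
    """Partial sum 1 + x + x**2/2! + ... + x**(terms - 1)/(terms - 1)!."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

x = 1.5
print(exp_series(x), math.exp(x))                  # both about 4.481689...
print(abs(exp_series(x) - math.exp(x)) < 1e-12)    # True
```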
It can be shown that this expansion also works when z is a complex number and that the complex exponential function ez obeys the usual rules for exponents. The sine and cosine functions also have power series expansions: y, 3 J y,7 A ./V ,\ sin x = x---1------1-- 3! 5! 7! 2,4 6 0c A- x cos x — 1---1------H-- 2! 4! 6! If we let z = ix, where x is a real number, then we have (ix)2 (ix)3 e" = eft = 1 + ix + — — + -~ + • • • 2! 3! Using the fact that i2 = -1, iJ = - i, i* = 1, i5 = i, and so on, repeating in a cycle of length 4, we see that 2 -3 4 -5 .6 '7 iY X XX X XX X XX e = 1 + xx--------1-----1---------h H--- 2! 3! 4! 5! 6! 7! 24 6 \ / 357 X XX \ I X X X = 1----+----+---- + i\x--+-----+ 2! 4! 6! /V 3! 5! 7! = cos x + i sin x This remarkable result is known as Euler's formula. AppendixC Complex Numbers ;: Theorem G.2 Euler's Formula For any real number x, e'x — cos x + isinx Using Euler's formula, we see that the polar form of a complex number can be written more compactly as i z = r(cos 0 + i sin 6) = re'0 For example, from Example C.3(a), we have 1 + i = V^cos™ + / sin ~j = V2ein/4 We can also go in the other direction and convert a complex exponential back into polar or standard form. ... ^ EXflit Write the following in the form a + hi: (a) eilT (b) e2+br/i < Solution (a) Using Euler's formula, we have em = cos tt + i sin it = — 1 + i • 0 = — 1 (If we write this equation as eOT + 1 = 0, we obtain what is surely one of the most remarkable equations in mathematics. It contains the fundamental operations of addition, multiplication, and exponentiation; the additive identity 0 and the multiplicative identity 1; the two most important transcendental numbers, tt and e; and the complex unit i—-all in one equation!) (b) Using rules for exponents together with Euler's formula, we obtain e2V2 e2V2 _-----1_------2 2 2 If z = re'6 -■■ r(cos 6 + i sin 0), then 2 = r(cos 6 — i sin 0) The trigonometric identities cos(—0) = cos 0 and sin(—0) (3) = —sinö C11 Appendix C Complex Numbers allow us to rewrite Equation (3) as z = r(cos(-0) + isin(-0)) = reK~0) This gives the following useful formula for the conjugate: If z = re* then z = re""' Note Etiler s formula gives a quick, one-line proof of De Moivre s Theorem: [r(cos0 + i sin0)]" = (rew)n = rneine = r"(cos n6 + island) Appendix D Polynomials Euler gave the most algebraic of the proofs of the existence of the roots of [a polynomial] equation.... 7 regard it as unjust to ascribe this proof exclusively to Gauss, who merely added the finishing touches. —Georg Frobenius, 1907 Quoted on the MacTutor History of Mathematics archive, http://www-history.mcs ,st-and.ac.uk/history/ A polynomial is a function p of a single variable x that can be written in the form p(x) = fl0 + axx + a2x2 + • • • + anx" (1) where a0, a„ are constants (an * 0), called the coefficients of p. With the con- vention that x° = 1, we can use summation notation to write p as n 4=0 The integer n is called the degree of p, which is denoted by writing deg p = n. A polynomial of degree zero is called a constant polynomial. IsilSlli 1,1 Which of the following are polynomials? (a) 2 - fx + V_x2 (d) ln(~ (g) cos(2 cos'^x) (b) 2 - 1 (e) 3xl x2 — 5x + 6 x - 2 (c) \/lx2 (f) Vx (h) ex Solution (a) This is the only one that is obviously a polynomial. (b) A polynomial of the form shown in Equation (1) cannot become infinite as x approaches a finite value [limp(x) =f ±oo], whereas 2 — l/3x2 approaches ~oo as x approaches zero. Hence, it is not a polynomial. 
(c) We have V2X2 = V~iVx2 = V2\x\ which is equal to V2x when %S0 and to - V_x when x < 0. Therefore, this expression is formed by "splicing together" two polynomials (a piecewisepolynomial), but it is not a polynomial itself. "Exercises and selected odd-numbered answers for this appendix can be found on the student companion website. D1 Appendix D Polynomials (d) Using properties of exponents and logarithms, we have Í2e5x'\ lni —spi = ]n(2e5x'~3x) = In 2 + ln(e5x'~3x) = In 2 + 5x3 - 3x = In 2 - 3x + 5x3 so this expression is a polynomial. (e) The domain of this function consists of all real numbers x # 2. For these values of x, the function simplifies to t x2 - 5x + 6 (x - 2)(x - 3) --------_ x - 3 x — 2 x — 2 so we can say that it is a polynomial on its domain. (f) We see that this function cannot be a polynomial (even on its domain x > 0), since repeated differentiation of a polynomial of the form shown in Equation (1) M—** eventually results in zero and Vx does not have this property. (Verify this.) (g) The domain of this expression is-l, using long division to obtain the quotient p/q. The next example illustrates the procedure, which is the same as for long division of one integer into another. Just as the quotient of two integers is not, in general, an integer, the quotient of two polynomials is not, in general, another polynomial. Compute - 1 + 2x - x2 + 3x3 4x + x We will perform long division. It is helpful to write each polynomial with decreasing powers of x. Accordingly, we have x2 - 4x + ll^'-'^Tl^'+'i We begin by dividing x2 into 3x3 to obtain the partial quotient 3x. We then multiply 3% by the divisor x2 - 4% + 2, subtract the result, and bring down the next term from the dividend (3x3 — x2 + 2% + 1): x2 - 4x + 2J3x3 ■ x + 2x + 1 12x2 + 6% Ilx* - 4x + 1 Then we repeat the process with 1 lx2, multiplying 11 by x2 — 4% + 2 and subtracting the result from 1 lx2 — 4x + 1. We obtain 3% + 11 - 4x + 2)3a x2 + 2% + 1 3x3 - 12x2 + 6% llx2 - 4x + 1 llx2 r 44% + 22 40% - 21 BI Appendix D Polynomials We now have a remainder 40x — 21. Its degree is less than that of the divisor x2 — Ax + 2, so the process stops, and we have found that 3%3 - x2 + 2x + 1 = (x2 ~ 4x + 2)(3x + 11) + (40x - 21) 3x3 - x2 + 2x + 1 40x -- 21 or ----;--;——---= 3x + 11 + 4x + 2 x — 4x + 2 Example D.4 can be generalized to give the following result, known as the division algorithm.» Theorem D.1 The Division Algorithm If/and g are polynomials with degg S deg/, then there are polynomials q and r such that fix) = gix)q{x) + r{x) where either r = 0 or deg r < deg g. In Example D.4, fix) = 3x3 - x2 + 2x + 1, g{x) = x2 - 4x + 2, ^{x) = 3x + 11, and r(x) = 40* - 21 In the division algorithm, if the remainder is zero, then fix) = gix)q{x) and we say that g is a factor of/ (Notice that q is also a factor of/) There is a close connection between the factors of a polynomial and its zeros. A zero of a polynomial /is a number a such that/(a) = 0. [The number a is also called a root of the polynomial equation fix) = 0.] The following result, known as the Factor Theorem, establishes the connection between factors of a polynomial and its zeros. Theorem D.2 Hie Factor Theorem Let /be a polynomial and let a be a constant. Then a is a zero of/if and only if x — a is a factor of/(x). Proof By the division algorithm, fix) = {x — a)q{x) + r{x) where either r(x) = 0 or deg r < deg(x — a) = 1. Thus, in either case, r{x) = r is a constant. 
Now, f{d) = (a — a)q{a) + r = r Appendix D Polynomials so/(a) = 0 if and only if r — 0, which is equivalent to /(x) = (x - a)q{x) as we needed lo prove. —MKKKM There is no method that is guaranteed to find the zeros of a given polynomial. However, there are some guidelines that are useful in special cases. The case of a polynomial with integer coefficients is particularly interesting. The following result, known as the Rational Roots Theorem, gives criteria for a zero of such a polynomial to be a rational number. Theorem 0.3 The Rational Roots Theorem Let f{x) = a0 + ayx + ■ ■ • + fl„x" be a polynomial with integer coefficients and let alh be a rational number written in lowest terms. If alb is a zero of/, then «0 is a multiple of a and a„ is a multiple ofb. Proof If a/£> is a zero of/, then aa + a \b Multiplying through by b", we have "[b] a0bn + axab" an^.lan"lb + a„a = 0 which implies that a0bn + aiabn~l + ■■■ + a„^anlb = -a„an (1) (2) The left-hand side of Equation (2) is a multiple of b, so anan must be a multiple of b also. Since a/b is in lowest terms, a and b have no common factors greater than 1. Therefore, an must be a multiple of b. We can also write Equation (1) as -a0b" = axabn~x + ■ ■ ■ + an_xan~lb + anan and a similar argument shows that a0 must be a multiple of a. (Show this.) Example 0.5 ! Find all the rational roots of the equation 6x3 + I ,lv' 4 = 0 (3) If a/b is a root of this equation, then 6 is a multiple of b and -4 is a multiple of a, by the Rational Roots Theorem. Therefore, ae{±l, ±2, ±4} and be{±1, ±2, ±3, ±6} Appendix D Polynomials Forming all possible rational numbers a/b with these choices of a and b, we see that the only possible rational roots of the given equation are -t- 1 +1 +4 +1 -hi -t- - -f-i _ 1, _ Z, _ 1, _ 2> — 3» — 3> — 3> — 6 Substituting these values into Equation (3) one at a time, we find that — 2, —|, and \ are the only values from this list that are actually roots. (Check these.) As we will see shortly, a polynomial equation of degree 3 cannot have more than three roots, so these are not only all the rational roots of Equation (3) but also its only roots. : We can improve on the trial-and-error method of Example D.5 in various ways. For example, once we find one root a of a given polynomial equation/(x) = 0, we know that x - a is a factor of/(x)—say,/(x) = (x - a)g(x). We can therefore divide f(x) by x — a (using long division) to find g(x). Since deg g < deg/ the roots ofg(x) = 0 [which are also roots of/(x) = 0] may be easier to find. In particular, ifg(x) is a quadratic polynomial, we have access to the quadratic formula. Suppose ax2 + bx + c = 0 (We may assume that a is positive, since multiplying both sides by -1 would produce an equivalent equation otherwise.) Then, completing the square, we have , b b2\ b2 a\ x 4—x H--r =--c a 4a 4a (Verify this.) Equivalently, b V b2 ( b\2 b2 - 4ac ay x _|--~---c or I x -j--=---— 2a J 4a \ 2a 4a Therefore, x + b lb2 - 4ac ±Vb2 - 4ac 2a V 4a2 2a -b ± VlF^lac or x 2a Let's revisit the equation from Example D.5 with the quadratic formula in mind. --------------------------__--- Find the roots of 6x3 + 13x2 - 4 = 0. Solution Let's suppose we use the Rational Roots Theorem to discover that x = — 2 ' is a rational root of 6x3 + 13x2 — 4 = 0. Then x + 2 is a factor of 6x3 + 13x2 — 4, and long division gives 6x3 + 13x2 - 4 = (x 4- 2)(6x2 + x - 2) Appendix D Polynomials (Check this.) 
We can now apply the quadratic formula to the second factor to find that its zeros are

x = (−1 ± √(1² − 4(6)(−2)))/(2·6) = (−1 ± √49)/12 = (−1 ± 7)/12

or, in lowest terms, 1/2 and −2/3. Thus, the three roots of Equation (3) are −2, 1/2, and −2/3, as we determined in Example D.5.

Remark  The Factor Theorem establishes a connection between the zeros of a polynomial and its linear factors. However, a polynomial without linear factors may still have factors of higher degree. Furthermore, when asked to factor a polynomial, we need to know the number system to which the coefficients of the factors are supposed to belong. For example, consider the polynomial p(x) = x⁴ + 1. Over the rational numbers Q, the only possible zeros of p are 1 and −1, by the Rational Roots Theorem. A quick check shows that neither of these actually works, so p(x) has no linear factors with rational coefficients, by the Factor Theorem. However, p(x) may still factor into a product of two quadratics. We will check for quadratic factors using the method of undetermined coefficients. Suppose that

x⁴ + 1 = (x² + ax + b)(x² + cx + d)

Expanding the right-hand side and comparing coefficients, we obtain the equations

a + c = 0,  b + ac + d = 0,  bc + ad = 0,  bd = 1

If a = 0, then c = 0 and d = −b. This gives −b² = 1, which has no solutions in Q. Hence, we may assume that a ≠ 0. Then c = −a, and we obtain d = b. It now follows that b² = 1, so b = 1 or b = −1. This implies that a² = 2 or a² = −2, respectively, neither of which has solutions in Q. It follows that x⁴ + 1 cannot be factored over Q. We say that it is irreducible over Q. However, over the real numbers R, x⁴ + 1 does factor. The calculations we have just done show that

x⁴ + 1 = (x² + √2x + 1)(x² − √2x + 1)

(Why?) To see whether we can factor further, we apply the quadratic formula. We see that the first factor has zeros

x = (−√2 ± √((√2)² − 4))/2 = (−√2 ± √(−2))/2 = −1/√2 ± (1/√2)i

which are in C but not in R. Hence, x² + √2x + 1 cannot be factored into linear factors over R. Similarly, x² − √2x + 1 cannot be factored into linear factors over R. Our calculations show that a complete factorization of x⁴ + 1 is possible over the complex numbers C. The four zeros of x⁴ + 1 are

a = 1/√2 + (1/√2)i,  ā = 1/√2 − (1/√2)i,  −a = −1/√2 − (1/√2)i,  −ā = −1/√2 + (1/√2)i

which, as Figure D.1 shows, all lie on the unit circle in the complex plane. Thus, the factorization of x⁴ + 1 is

x⁴ + 1 = (x − a)(x − ā)(x + a)(x + ā)

The preceding Remark illustrates several important properties of polynomials. Notice that the polynomial p(x) = x⁴ + 1 satisfies deg p = 4 and has exactly four zeros in C. Furthermore, its complex zeros occur in conjugate pairs; that is, its complex zeros can be paired up as {a, ā} and {−a, −ā}. These last two facts are true in general. The first is an instance of the Fundamental Theorem of Algebra (FTA), a result that was first proved by Gauss in 1797.

Theorem D.4  The Fundamental Theorem of Algebra
Every polynomial of degree n with real or complex coefficients has exactly n zeros (counting multiplicities) in C.

This important theorem is sometimes stated as "Every polynomial with real or complex coefficients has a zero in C." Let's call this statement FTA′. Certainly, FTA implies FTA′. Conversely, if FTA′ is true, then if we have a polynomial p of degree n, it has a zero a in C.
The Factor Theorem then tells us that x — a is a factor oip(x), so p(x) = (x - a)q(x) where q is a polynomial of degree n — 1 (also with real or complex coefficients). We can now apply FTA' to q to get another zero, and so on, making FTA true. This argu-H >. ment can be made into a nice induction proof. (Try it.) It is not possible to give a formula (along the lines of the quadratic formula) for the zeros of polynomials of degree 5 or more. (The work of Abel and Galois confirmed this; see page 311.) Consequently, other methods must be used to prove FTA. The proof that Gauss gave uses topological methods and can be found in more advanced mathematics courses. Now suppose that p(x) = a0 + a,x + • • • + a„xn ► Re Appendix D Polynomials D9 is a polynomial with real coefficients. Let a be a complex zero of p so that a0 + axa + ■ ■ • + a„a" = p(a) = 0 Then, using properties of conjugates, we have p(a) = a0 + axa + ■ ■ ■ + ana" — a0 + axa + • ■ ■ + ana" = a0 + axa + • • • + a„a" = p[ci) = 0 = 0 Thus, a is also a zero of p. This proves the following result: The complex zeros of a polynomial with real coefficients occur in conjugate pairs. In some situations, we do not need to know what the zeros of a polynomial are— we only need to know7 where they are located. For example, we might only need to know whether the zeros are positive or negative (as in Theorem 4.35). One theorem that is useful in this regard is Descartes' Rule of Signs. It allows us to make certain predictions about the number of positive zeros of a polynomial with real coefficients based on the signs of these coefficients. Given a polynomial aa + axx +•■• + anx", write its nonzero coefficients in order. Replace each positive coefficient by a plus sign and each negative coefficient by a minus sign. We will say that the polynomial has k sign changes if there are k places where the coefficients change sign. For example, the polynomial 2 — 3x + 4x3 + x4 — 7x5 has the sign pattern +^_+ +_j so it has three sign changes, as indicated. Descartes' stated this rule in his 1637 book La Geometrie, but did not give a proof. Several mathematicians later furnished a proof, and Gauss provided a somewhat sharper version of the theorem in 1828. Theorem D.5 Descartes' Rule of Signs Let p be a polynomial with real coefficients that has k sign changes. Then the number of positive zeros of p (counting multiplicities) is at most k. In words, Descartes' Rule of Signs says that a real polynomial cannot have more positive zeros than it has sign changes. Example 0.7 Show that the polynomial p{x) = 4 + 2x2 — 7x4 has exactly one positive zero Solution The coefficients of p have the sign pattern + H—, which has only one sign change. So, by Descartes' Rule of Signs, p has at most one positive zero. But p(0) — 4 andp(l) = —1, so there is a zero somewhere in the interval (0, 1). Hence, this is the only positive zero of p. Appendix D Polynomials We can also use Descartes' Rule of Signs to give a bound on the number of negative zeros of a polynomial with real coefficients. Let p(x) = a0 + axx + a2x2 + • • • + anx" and let b be a negative zero of p. Then b = — c for c > 0, and we have 0 = p(b) = a0 + axb + a2b2 H-----h anbn = a0 - aAc + a2c2 —!-••• + (~l)"ancn But p(~x) = a0 - axx + a2x2 —I-----h (~l)nanxn so c is a positive zero ofp(-x). Therefore, p(x) has exactly as many negative zeros as p(—x) has positive zeros. Combined with Descartes' Rule of Signs, this observation yields the following: Let p be a polynomial with real coefficients. 
Then the number of negative zeros of p is at most the number of sign changes ofp(-x). -.— , , ,., , ..--»► Show that the zeros ofp(x) = 1 + 3x + 2x2 + x5 cannot all be real. Solution The coefficients of p(x) have no sign changes, so p has no positive zeros. ▼ Sincep(-x) - 1 — 3x + 2x2 - x5 has three sign changes among its coefficients, p has at most three negative zeros. We note that 0 is not a zero of p either, so p has at most three real zeros. Therefore, it has at least two complex (nonreal) zeros. Answers to Selected Odd-Numbered Exercises Answers are easy. It's asking the right questions [that's] hard. —Doctor Who "The Face of Evil," By Chris Boucher BBC, 1977 (d) I-►.T 5. (a) 0>> v A 2- 1^ H-h-\—*-x 1 2 3 ANS2 Answers to Selected Odd-Numbered Exercises (c) H—I—h -2 \ (d) i i 6 3 7. a + b = [5, 3] 11. [3, -2,3] 13. u = 1/2" "-V3/2" .V3/2. , v — _ -1/2. , U + V (1 - V3)/2 .(V3 - l)/2. , u - v = (1 + \/3)/2 _(1 + V3)/2. 15. 5a 17. x 3a 19. 2 X /"\ X \ X\ /' 2)( X 4 V 6 \ - 21. w -2u + 4v 25. u + v = 27. u + v = [0,1,0,0] 29. + 0 1 2 3 0 0 1 2 3 1 1 2 3 0 2 2. 3 0 1 3 3 0 1 2 31.0 35.0 39. 5 43. [0, 0, 2, 2], [2,3, 0 12 3 0 0 0 0 0 10 12 3 2 0 2 0 2 3 0 3 2 1 33. 1 37. 2, 0, 3 41. [1,1,0] 45. x = 2 47. No solution 49. x = 3 51. No solution 53. x = 2 55. x — 1, or x = 5 57. (a) All a # 0 (b) a = 1, 5 (c) a and m can have no common factors other than 1 [i.e., the greatest common divisor (gcd) of a and m is 1]. Exercises 1.2 1. -1 3. 11 5. 2 7. V5, -1/V5 2/V5. 9. Vli, 1/V14 2/Vli 3/V14 Answers to Selected Odd-Numbered Exercises ANS3 11. V6, [1/V6, 1/V3, 1/V2, 0] 13. Vl7 15. V6 17. (a) u • v is a scalar, not a vector. (c) v • w is a scalar and u is a vector. 19. Acute 21. Acute 23. Acute 25.60° 27. «88.10° 29. =14.34° 31. Since AB ■ AC = right angle. 4" r 1 • 1 _-l_ -3 = 0, ZBAC is a "f r "-l" r 1 1 >d3 = 1 ,d4 = -1 _1_ .-1. . 1_ 1, 33. If we take the cube to be a unit cube (as in Figure 1.34), the four diagonals are given by the vectors d, Since d, • d; 0 for all i + j (six possibilities), no two diagonals are perpendicular. 35. D = (-2, 1, 1) 37. 5 mi/h at an angle of 53.13° to the bank 39. 60° r 1- " -0.301 41. 43. 45. 0.033 -0.252 47. .4 = V45/2 51. v is of the form k — a 49. k = -2, 3 , where k is a scalar. 53. The Cauchy-Schwarz Inequality would be violated. Exercises 1.3 "3 2 l.(a) 3. (a) (b) x = 1 - t 0 (b) 3x + 2y = 0 X "1" — + t J- .0. 3, y = 3f X r x = t 5. (a) y t -1 (b) y= -t .z. 4_ z = 4t '3' X 7. (a) 2 • y = 2 (b) 3x + 2y + .1, 9. (a) X 2 -3 y = s 1 + t 2 z 2 1 (b) x = 2s y = 5 Z = 25 + f 3f 2f 1 + r 2 r -2 2 X 1 3 -1 y = 1 + 5 -1 + r 0 2 1 1 — 2 11. 13. 15. (a) x = f / = -1 + 3f 17. Direction vectors for the two lines are given by 1 X 0 1 = +1 y -i 3 if anc and d, = nh . The lines are perpendicular only if dj and d2 are orthogonal. But dj • dj = 0 if and only if 1 + m1m7 = 0 or, equivalently, minh = ~ 1- 19. (a) Perpendicular (c) Perpendicular (b) Parallel (d) Perpendicular a: 2 3 21. — .-1. + f _2_ X ■-1" 23. = 0 3 _z_ 3 -1 25. (a) x — 0, x = 1, y = 0, y = 1, z = 0, z = 1 (b) x — y = 0 (c) x + y - z = 0 27. 3 V2/2 29. 2 V3/3 31. (5, \) 33.(1,1,1) 35. 18V13/13 37.1 43. «78.9° 45. =80.4° Exercises 1.4 1. 13NatapproxN67.38 E 3. 8V3 Nat an angle of 30° to f, 5. 4 N at an angle of 60° to f2 7. 5 N at an angle of 60° to the given force, 5V3N perpendicular to the 5 N force 9. 750\ 2X 11. 980 N 13. « 117.6 N in the 15 cm wire, ~ 88.2 N in the 20 cm wire ANS4 Answers to Selected Odd-Numbered Exercises Review Questions 1. (a) T (c) F (e) T (g) F (i) T '-2/VI r 3.x 11 5. 120° 7. 
1/V5 0 9.2x + 3y-z=7 11. V6/2 13. The Cauchy-Schwarz Inequality would be violated. 15.2V6/3 17. x = 2 19. 3 Chapter 2 Exercises 2.1 1. Linear 3. Not linear because of the x"1 term 5. Not linear 7. 2x + Ay = 7 9. x + y = 4(x, y # 0) 2/1 4 - 25 - 3f 13. 5 15. Unique solution, x = 3, y = —3 1—I—► * 17. No solution 1—► * 19. [7, 3] 23. [5, -2, 1, 1] 27. 1 -1 0" 2 1 3 21. i i' l 25. [2, — 7 - 32] 1 5 -l" 29. 1 1 -5 2 4 4_ 31. y + z = 1 33. [1, 1] x — y =1 2x — y + z = 1 35. [4,-1] 37. No solution 39. (a) 2x + y = 3 (b) x = f - js 4x + 2 v = 6 y = s 41. Let u — ~ and v = —. The solution is x = k y = —5 x y 43. Let w = tan x, v = sin y, w — cos z. One solution is x = 7r/4,y = -tt/6, z = 7r/3. (There are infinitely many solutions.) Exercises 2.2 l.No 5. No 3. Reduced row echelon form 7. No "l 1 1 9. (a) 0 1 1 „0 0 1. "l 0 -1 13. (b) 0 1 -1 _0 0 0 11. (b) 1 0 0 1 0 0 15. Perform elementary row operations in the order R4 + 29R3, SR3, R4 - 3R2, R2 <-> R3, R4 - Rv R3 + 2RV and, finally, R2 + 2RV 17. One possibility is to perform elementary row operations on A in the order R2 - 3RV \R2, Rx + 2R2, R2 + 3RP i?! <-> R2. 19. Hint: Pick a random 2X2 matrix and try this— carefully! 21. This is really two elementary row operations combined: 3R2 and R2 — 2R}. 23. Exercise 1: 3; Exercise 3: 2; Exercise 5: 2; Exercise 7: 3 25. 27. t 29. 2 -1 31. 24 6 0 12 -10 -2 6 -6 0 + r 1 + s 0 + / 0 0 0 1 0 0 0 0 1 33. No solution 35. Unique solution Answers to Selected Odd-Numbered Exercises ANS5 37. Infinitely many solutions 39. Hint: Show that if ad — be + 0, the rank of a b c d is 2. (There are two cases: a = 0 and a # 0.) Use the Rank Theorem to deduce that the given system must have a unique solution. 41. (a) No solution if k = -1 (b) A unique solution if k # ±1 (c) Infinitely many solutions if fc = 1 43. (a) No solution if k = 1 (b) A unique solution if k 41 —2,1 (c) Infinitely many solutions if k = —2 45. X o" 9' y = -1 + t -10 _2_ 1. -7. 49. No intersection 51. The required vectors x = are the solutions of the homogeneous system with augmented matrix 0 0 By Theorem 3, there are infinitely many solutions. If Uj # 0 and m, v2 - h2Vj # 0, the solutions are given by u2v3 — U3V2 t U3V1 — KjV3 -Mlv2 ~ w2v1- But a direct check shows that these are still solutions even if ux = 0 and/or UjV, — u1vl = 0. 1 53. 55. 57. 7. Yes Exercises 2.3 1. Yes 3. No 5. Yes 9. We need to show that the vector equation x 1 y -i _ a b_ + has a solution for all values of a and fr. This vector equation is equivalent to the linear system . Row whose augmented matrix is reduction yields \l 1 a [l -1 a b — a see that there is a (unique) solution 1 1 0 -2 from which we can [Further row operations yield x = (a + b)/2, y=(a- b)/2.] Hence, U2 = span( 11. We need to show that the vector equation x f 1" 1" spanl 1. .-1. + y 1 0 a 1 + 2 1 = b 0 1 c has a solution for all values of a, b, and c. This vector equation is equivalent to the linear system whose augmented matrix is 0 0 . Row reduction yields a b b + c , from which we can see that there is a (unique) solution. [Further row operations yield x = (a — b + c)/2, y = (a + b — c)/2, z = ( -a + b + ( 1 1 0 \ Hence, U3 = span 0 > 1 í 1 { 1 0 1 / 13. (a) The line through the origin with direction '-I vector L 2. (b) The line with general equation 2x + y — 0 15. (a) The plane through the origin with direction V 3' 3 vectors 2 i 2 3_ .0. whose solution is r . It follows that there are (b) The plane with general equation 2x — y + 4z = 0 17. 
Substitution yields the linear system a + 3c = 0 -a + b - 3c = 0 -3 0 1 infinitely many solutions, the simplest perhaps being a = ~3, b = 0, c = 1. 19. u = u + 0(u + v) + 0(u + v + w) v = (-l)u + (u + v) + 0(u + v + w) w = Ou + (-l)(u + v) + (u + v + w) Answers to Selected Odd-Numbered Exercises 21. (c) We must show that span(e,, e:, e3) = span(e,, e, + e2, ej + e2 + e3). We know that span(e,, e, + e2, e + e. + e3) CR' = span(ep e2, e3). From Exercise 19, e., e2, and e3 all belong to span(ep e. + e,, et + e2 + e3). Therefore, by Exercise 21(b), span(ep e2, e3) = span(ep e, + e2, ej + e2 + e3). 23. Linearly independent 0 2 2 25. Linearly dependent, — 1 + 1 = 0 2 3 1 27. Linearly dependent, since the set contains the zero vector 29. Linearly independent 31. Linearly dependent, 43. (a) Yes 3 -1 -1 1 -1 3 -I 1 + + 1 1 1 3 -1 -1 3 1 (b) No Exercises 2.4 l.xl = 160, x2 = 120, x3 ~ 160 3. two small, three medium, four large 5. 65 bags of house blend, 30 bags of special blend, 45 bags of gourmet blend 7. 4FeS2 + 1102 -> 21V-0 + 8S02 9. 2C4H,0 + 1302 -> 8C02 + 10H2O 11.2C5HuOH + 1502 -> 12H20 + 10CO2 13. Na2C03 + 4C + N2 -> 2NaCN + 3CO 15.(a)/,= 30 - f (b)/, = 15,/3 = 15 f2 = -10 + t (c) 0 < fx < 20 0 < /2 < 20 10 200 (from the/ equa- tion), but/ = ŕ < 150 (from the/, equation). This is a contradiction, (d) 50 j. Then, by the definition of an upper triangular matrix, an ~ aa = " ' ' = at,i-i ~ 0 and hj = = •••=&„/= 0 Now let C ■ -n) AB. Then q, = anbxj + ai2b2j + ■ = 0 • bxj + 0 ■ • + a,„ -0 = 0 from which it follows that C is upper triangular. 35. (a) A, B symmetric => (A + B)r = A1 + Br = A + B A + B is symmetric 37. Matrices (b) and (c) are skew-symmetric. 41. Either A or B (or both) must be the zero matrix. 1 2 3 "l 3 5 0 -1 -2 43. (b) 4 5 6 = 3 5 7 + 1 0 -1 J 8 9_ 5 7 9_ 2 1 0 47. Hint: Use the trace. Exercises 3.3 2 -7 -1 4 1. 5. Not invertible 3. Not invertible -2.8] 7. 1.6 0.3 1 J a/(a2 + b2) b/(a2 + b2) _-b/(a2 + b2) a/(a2 + b2) "-5" 11. 9. 13. (a) x1 = 4" "-5" 6" 1 > X2 = ,x3 = - ~~2- 2_ _-2_ 17. 21. 23. (c) The method in part (b) uses fewer multiplications, (b) (AB)"1 = A~1B~1 if and only if AB = BA X = A~l(BA)2Bl A 1 0 0" 27. E = 25. £ 29. E = X = (ABTlBA 0 0 1 0 1 0 1 0 0. 1 0 0 0 1 2 0 0 1 0 1 0 -1 0 1 31. 3 0 0 1 33. 39. 43. 45. 47. 0 1 1 0 1 0 0^ "l 0 0 35. 0 1 2 37. 0 1/c 0 _0 0 1 _0 0 i 1 0 •1 1 0 -2 1 0 1 1 (a) If A is invertible, then BA = CA=> (BA)A~! = (CA)A"1 B(AA ') = C(AA ') => B7 = CI = B = C. Hint: Rewrite A2 - 2A + 7 = O as A (27 - A) = 7. If AB is invertible, then there exists a matrix X such that (AB)X = I. But then A(BX) = 7 too, so A is invertible (with inverse BX). ~l/{a2 + 1) a/(a2 + 1) 49. 53. Not invertible 51. -a/(a2 + 1) 1/V + DJ 1/a 0 0 n 55. -1/a2 1/a 0 ,a * 1/a3 -1/a2 l/a_ "-11 - -2 5 -4" 4 1 -2 2 57. 5 1 -2 2 9 2 -4 3_ " 1 0 0 0 " 0 1 0 0 59. 0 0 1 0 _ — a/d -bid -c/d l/d_ d+Q 61. Not invertible 63. ANS12 Answers to Selected Odd-Numbered Exercises 69. 1 0 0 0" 0 1 0 0 -2 -3 1 0 -1 -2 0 1 71. "-1 0 1 r 0 1 -1 0 0 1 0 0 1 "1 0 0. Exercises 3.4 1. 7. 11. 13. 1 0 0 4 1 0 .8 3 1 1 0 2 1 0 3 1 -1 0 -2 3. 2 3 T o .0 o o o o 0 1 -3/2 -2 -1 -7' -15 —2 2 1 0 0 0 1 0 0 0 1 2 3 -3 -6 0 3. 1 2 0 2 0 0 0 0 1 - 0 3 3 0 0 15. L1 = "1 0" , U"1 = - i l -2 12 .1 1. 19. 21. 23. 25. 27. A = "0 0 0 1 .1 0 "0 1 1 0 0 0 _0 0 ~0 1 1 0 .0 0 "0 1 1 0 0 0 0 0 -5/12 1/12 1/6 1/6. 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 1 0 "o 1 0" 1 0 0 .0 0 1 0 0 0 1 0 10 0 0 0 10 10 0 0 1 0 0 0 1 0 1 5 1 0 0 10' 0 10 0 10 0 0 0 0 0 1 -1 2 1 0 1 4 0 0 -16 0 0 0 1 0 0 0 1 0 -1 0 1 31. 
1 0 -1 1 -1 0 0 0 "-2 0 0 6 1 1 -1 1 2' 3 0 1 1 0 0 4 1 -4 Exercises 3.5 1. Subspace 3. Subspace 5. Subspace 7. Not a subspace 11. b is in col(A), w is not in row(A). 15. No 17. {[1 0 — 1 ], [0 1 2]}isabasisforrow(A) 1 is a basis for col(A); -2 1 is a basis for null(A). 19. {[1 0 1 0], [0 1 -1 0], [0 0 0 1 ]} is a basis (Til r11 r r for row(A); I 0 , 1 , 1 0 1 -1 is a basis for col(A); *-r < l > l °_ > is a basis for null(A). 21.{[1 0 -1], [1 1 1]} is a basis for row(A); is a basis for col(A) 23- {[1 1 0 1], [0 1 -1 1], [0 1 -1 -1]} is a basis for row(A); col(A) 1 0 0 0 J 1 0 0 0 1 is a basis for 25. Both {[ 1 0 -1], [0 1 2]}and{[l 0 -1], [1 1 1 ]} are linearly independent spanning sets for row(A) = {{a b -a + 2b]}. Both are linearly independent spanning and sets for col(A) 1 -1 1 0 o. 1 27. 29. {[1 0 0], [0 1 0], [0 0 1]) 31.{[2 -3 1], [1 - 1 0], [4 -4 1]} 35. rank(A) = 2, nullity(A) = 1 37. rank(A) = 3, nullity(A) = 1 39. If A is 3 X 5, then rank(A) £ 3, so there cannot be more than three linearly independent columns. 41. nullity(A) = 2, 3,4, or 5 Answers to Selected Odd-Numbered Exercises ANS13 43. If a = ~1, then rank(A) = I; if a — 2, then rank(A) = 2; for a # -1,2, rank(A) = 3. 45. Yes 47. Yes 49. No 51. w is in span(/3) if and only if the linear system with augmented matrix [B j w] is consistent, which is true in this case, since "l 1 r "l 0 3~ B\w]- 2 0 6 -► 0 1 -2 „0 -1 2, .0 0 0. From this reduced row echelon form, it is also clear that [w]s = _^ 53. rank(A) = 2, nullity(A) = 1 55. rank(A) = 3, nullity(A) = 1 57. Let A1;..., A„, be the row vectors of A so that row(A) = span(At,..., Am). If x is in null(A), then, since Ax = 0, we also have A, • x = 0 for / = 1,... ,m, by the row-column definition of matrix multiplication. If r is in row(A), then r is of the form r = c,Ai + • • • + cmAm. Therefore, r • x = (cjA] + • • • + cmAJ •x = Ci(Ai'x) + • • ■ + c,„(Am -x) = 0 59. (a) If a set of columns of AB is linearly independent, then the corresponding columns of B are linearly independent (by an argument similar to that needed to prove Exercise 29 in Section 3.1). It follows that the maximum number k of linearly independent columns of AB [i.e., k = rank(AB)] is not more than the maximum number r of linearly independent columns of B [i.e., r = rank(B)]. In other words, rank(AB) < rank(B). 61. (a) From Exercise 59(a), rank(LA) _ rank(A) and rank(A) = rank((C/_1L/) A) = rnnk(U"l(UA)) < rank(C/A). Hence, rank(LA) = rank(A). Exercises 3.6 i. r(u) = 0" 8" . T(v) = 1 .11. 11. 1 -1 13. 15. [F] -1 0 17. [D] 1 -3. 2 0 0 3 19. k 0 0 1 stretches or contracts in the x-direction (com bined with a reflection in the y-axis if k < 0); Lo k stretches or contracts in the y -direction (combined 0 1 1 0 with a reflection in the x-axis if k < 0); reflection in the line y = x; is a shear in the y -direction. For 1 k 0 1 is a shear in the x-direction example, 1 0 k l y A. (0, 1) (0. 0) (1,1) (1,0) (0,1) (0, 0) (1.1) (1,0) 21. 25. V3/2 1/2 -1/2 V3/2. 0 -1 -1 0 31. [S ° T] = 33. [S ° T] = 35. [S 0 T] = *• x 23. 27. 1 _I 2 2 i 1 "2 2 3 4' "1 5 1 1 5 5. -8 5 4 1 0 6 - 1 -2 1 0 -1 1 0 -1 37. ■V3/2 1/2 1/2 V3/2. -1 0 1 39. -V3/2 1/2 -1/2 -V3/2. 45. In vector form, let the parallel lines be given by x = p + td and x' = p' + td. Their images are T(x) = T(p + fd) = T(p) + tT(d) and T(x') = T(p' + td) = T(p') + tT(d). Suppose T(d) * 0. If T(p') — T(p) is parallel to T(d), then the images represent the same line; otherwise the images represent distinct parallel lines. 
On the other hand, if T(d) = 0, ANS14 Answers to Selected Odd-Numbered Exercises then the images represent two distinct points if T(p') ¥= T(p) and single point otherwise. 47. (-2, 3) (-2, -3) Exercises 3.7 "0.4 1.x, = 0.6 (2, 3) (2, -3) "0.38" .0.62. ~150~ "155" 5. Xj = 120 ,x2 = 120 120 115 9. (a) P = "0.662 0.250 .0.338 0.750 (c) 42.5% wet, 57.5% dry 3. 64% 7 JL (b) 0.353 11. (a) P 0.08 0.07 0.85 0.09 0.11 0.80 0.11 0.05 0.84 (b) 0.08, 0.1062, 0.1057, 0.1057, 0.1057 (c) 10.6% good, 5.5% fair, 83.9% poor 13. The entries of the vector jP are just the column sums of the matrix P. So P is stochastic if and only if jP = j. 15.4 19. Yes, x 23. No 27. Productive 17. 9.375 21. No 25. Yes, x = 10 27 35 31.x = 37. X, 39.x, = 10 16 45 6 500 70 50 29. Not productive 10 33. Yes, x 120" "375" , x2 = ,x3 = . 27. _ 72. 720 350 35 41. (a) For Lp we have Xj = 50 1175 504 175 40 40 160" "800" "640" , X4 = .160. .128. .640. "2560" "12800" .2560. ,Xg = . 2048. 200 _ 32 "3200 512 "10240 .10240 (b) The first population oscillates between two states,while the second approaches a steady state. 43. The population oscillates through a cycle of three states (for the relative population): If 0.1 < s S 1, the actual population is growing; if s = 0.1, the actual population goes through a cycle of length 3; and if 0 S s < 0,1, the actual population is declining (and will eventually die out). 45. A 47. A Answers to Selected Odd-Numbered Exercises ANSIS 51. 53. A "0 1 1 0" 0 0 0 0 0 1 0 1 1 0 0 0 55. A = 0 10 10 10 0 10 110 0 0 1 0 0 0 1 1 0 0 0 0 61.2 63. 3 65.0 67. 3 69. (a) Vertex i is not adjacent to any other vertices. 71. If we use direct wins only, P2 is in first place; P3, P4, and P6 tie for second place; and P} and P5 tie for third place. If we combine direct and indirect wins, the players rank as follows: P2 in first place, followed by P6, P4, P3,P5, and Pj. 73. (a) Ann Bert Ehaz Dana Carla (b) two steps; all of the off-diagonal entries of the second row of A + A2 are nonzero, (d) If the graph has n vertices, check the (f,;') entry of the powers Ak for k = 1,... ,n — 1. Vertex i is connected to vertex j by a path of length k if and onlyif(A'')y * 0. 75. (AAr)y counts the number of vertices adjacent to both vertex i and vertex;'. 77. Bipartite 79. Bipartite Review Questions 1. (a) T (c) F 3. Impossible (e) T (g) T (i) T iZ 83 1_ 83 5_ 166 1 3" " 4 10 7. + .3 9. .10 25 0 = 1, 11. Because (J - A)(I + A + A2) = I - A3 = I (I - AY1 = 1 + A+A2. 13. Abasisforrow(A)is{[l, -2, 0,-1,0], [0, 0, 1,2,0], [0, 0, 0, 0, 1]}; a basis for col(A) is (or the standard basis for R3); and a basis for null(A) is 2 5 5 1 2 1 4 3 6 "2" f 1 0 0 -2 > \ 0 1 I A 0_ - 15. An invertible matrix has a trivial (zero) null space. If A is invertible, then so is Ar, and so both A and A1 have trivial null spaces. If A is not invertible, then A and AT need not have the same null space. For example, take 11 1" A 0 0 17. Because A has n linearly independent columns, rank(A) = n. Hence rank(ArA) = n by Theorem 3.28. Because A A is n Xn, this implies that ATA is invertible, by the Fundamental Theorem of Invertible Matrices. AAT need not be invertible. For example, take A 19. -1/5V2 -3/5 V2 2/5V^ 6/5V2 Chapter 4 Exercises 4.1 1. Av = = 3v, A = 3 AHS16 Answers to Selected Odd-Numbered Exercises 3. Av 5. Av = •3v. A = -3 = 3v, A = 3 9. 11. 1 1 -1 13. A = 1,£, = span 15. A = 0, £0 = span 17. A = 2, E2 = span "l" 19. V = , A = l;v = 1/V2] 21. v = , A = 2; v LI/V2J 23. A = 2, £2 = span I "0" 1_ > "0" J. "1* .0. > 1 , A = 2 . 1/V2J' ; A = 3, £, = span 25. 
A = 2, £2 = span^ A l- 0 x 2x H—N-.....H~"»l—► * 1 27. A = 1 + ;. jS1+j = span: span! A = 1 - /,£,_; = 29. A = 1 + i, El+i = span A = 1 — i, BH = span 31. A = 1,2 Exercises 4.2 1. 16 3. 0 33. A = 4 •18 11. a2b + ab2 13.4 27. -24 35. 8 47. -6 7. 6 15. afrdg 29. 0 37. -4 49. 9. -12 17. 0 25. 2 31. 0 33. -24 39.-8 45. fc^0,2 51. (-2)3" 53. det(Aß) = (det A)(det 5) = (det B) (det A) = det(BA) 55. 0, 1 1- 59.x = -!,>' = 0,2 = 1 61. ' 57.x = \,y = -\ 63. 2 2 1 0 1 -1 0 0 1 Exercises 4.3 1. (a) A2 - 7A + 12 (b) A = 3,4 (c) E3 = span ; £4 = span! (d) The algebraic and geometric multiplicities are all 1. 3. (a) -A3 + 2A2 + 5A - 6 (b) A = -2, 1, 3 / r ) ( V \ (c) jEL2 = span -3 = span 0 I ) \ _o_ / £3 ~ span 1 2 10 (d) The algebraic and geometric multiplicities are all 1. 5. (a) -A3 + A2 (b) A = 0,1 (c) £0 = span (d) A = 0 has algebraic multiplicity 2 and geometric multiplicity 1; A = 1 has algebraic and geometric multiplicity 1. 7. (a) -A3 + 9A2 - 27A + 27 (b) A = 3 / 2 \ I V \ -1 ; £j = span 0 I 1, J \ .1. 1 Answers to Selected Odd-Numbered Exercises ANSII ( "-1" "0" \ (c) £3 = span 0 > 1 V 1. .0. / (d) A = 3 has algebraic multiplicity 3 and geometric multiplicity 2. 9. (a) A4 - 6A3 + 9A2 + 4A - 12 (b) A = -1,2,3 I (c) £.....! = span " 0" \ 0 -2 1_ / / " r \ -1 span 0 I 0. / span \ / (d) A = — 1 and A = 3 have algebraic and geometric multiplicity 1; A = 2 has algebraic multiplicity 2 and geometric multiplicity 1. 11. (a) A4 - 4A3 4- 2A2 + 4A - 3 (b) A = -1, 1,3 (c) E_! = span Ex = span 3j7 span 0 0 2 Mi \ 15. (d) A = -1 and A = 3 have algebraic and geometric multiplicity 1; A = 1 has algebraic and geometric multiplicity 2. 2 (2 • 320 - l)/320 2 2" 2~9 + 3-2' -2~9 + 3-21 17. 23. (a) A = —2, E_2 = span spanM J -5 A = 5, E5 (b) (i) A = -j,£_1/2 = span span A - 5, £1/5 (iii) A = 0, £0 = span 2 -5 ;A = 7, = span ■3 4 -12" 27. 1 0 0 , _ 0 1 0_ 35. A2 = 4A - 5/, A 3 _ A4 = 24A - - 551 -A3 - 3A2 + 4A - 12 14, 4 11 37. A = —A + -I, A'2 =--A + —7 5 5 25 25 Exercises 4.4 1. The characteristic polynomial of A is A2 — 5 A + 1, but that of B is A2 - 2A + 1. 3. The eigenvalues of A are A = 2 and A = 4, but those of B are A = 1 and A = 4. l" S.'Ai = 4, £4 = span f "3 6,£6 = span 2 { .3 / 0" 1 \ 1 > 0 I »-1. _-l / ;A2 3, £3 = span -2, £_2 = span 9. Not diagonalizable '1 1 -l" "2 0 0" 11. p = 1 1 1 ,D = 0 -1 0 .1 -2 0_ .0 0 1 13. Not diagonalizable 15. P = 17. -11605 24234. (3* + 3(-l)*)/4 Okrl ~ 3(-l)*)/4] (3k-(~l)k)/4 (3*« + (-l)*)/4 J "1 0 0 -1 "2 0 0 0" 0 1 0 0 0 2 0 0 ,D = 0 0 1 0 0 0 -2 0 0 0 1 _0 0 0 -2_ 35839 -69630" ANS18 Answers to Selected Odd-Numbered Exercises 21. 23. 1 1 1 0-1 0 0 0-1 (5 + 2l+2 + (-3)VlQ (2* - (-3)*)/5 (-5 + 2ki2 + (-3)/()/T0 (2h+1 - 2(-3)k)/5 (2* + 4(~3)*)/5 (2*h - 2(-3)":)/5 (-5 + 2*+2 + (-3)*)/l0 (21 - (-3)*)/5 (5 + 2m + (-3)*)/10 . 25. fc = 0 27. it = 0 29. All real values of k 37. If A ~ 5, then there is an invertible matrix P such that B = P~XAP. Therefore, we have tr(B) = tr(P_1AP) = tr(P-1CAJP)) = tr((AP)P"') = tr(APP-1) = tr(Af) = tr(A) using Exercise 45 in Section 3.2, 7 -2" 39. P = 41. P 10 -5 0 51. (b) dim£_, = 1, dimf, dim£2 = 3 Exercises 4.5 1 2.5 l.(a) 6.000 (b) A, = 3. (a) 1 [0.61" (b) Aj = (3 + V5)/2 5. (a) m$ = 11.001, y5 = 7. (a) m8 = 10.000, y8 = = 2.618 -0.333" 1.000. V 0 1 9. 7* 26 8 1 0.308 26 17.692 [ 5.923.1 1 0.335. 17.692 18.018 6.004 1 0.333. 18.018 17.9"99 6.000 1 0.333. 17.999 18.000 6.000 'l 0.333. 18.000 Therefore, A! *> 18, v. 1 0.333 k 0 1 2 3 4 5 6 11. 1 7 7.571 7.755" "7.808" "7.823" ~7.827~ .0. .2. .2.857_ .3.132. .3.212. .3.234. .3.240. 
Yk "l' "l "l "l "l "l 1 l0. .0.286. .0.377. .0.404. .0.411. .0.413. .0.414. mk 1 7 7.571 7.755 7.808 7.823 7.827 Therefore, A, ~ 7.827, Vj 1 0.414 Answers to Selected Odd-Numbered Exercises 13. k 1 y* mk 21 15 13 1 0.714 0.619 21 16.809 12.238 10.714 1 0.728 .0.637. 16.809 17.011 12.371 10.824 1 0.727 .0.636. 17.011 16.999 12.363 10.818 'l 0.727 .0.636. 16.999 17.000 12.363 10.818 "l 0.727 .0.636. 17.000 Therefore, A i « 17, Vj ** 1 0.727 0.636. 15. A, <*» 5, Vj 1 0 0.333 k 0 1 2 3 4 5 6 [11 "7" "7.571" "7.755" "7.808" 7.823" "7.827 xk .0. .2. .2.857. .3.132. .3.212. .3.234. .3.240 7 7.755 7.823 7.828 7.828 7.828 7.828 Yk "l" 1 1 "l "l rl "l o ,0.286 0.377 0.404 0.411 0.413 0.414 k 0 1 2 3 4 5 R(xk) Yk 1 1 1 16.333 l" 1 .1, 21 15 13 16.998 1 0.714 0.619 16.809 12.238 10.714 17.000 1 0.728 0.637 17.011 12.371 10.824 17.000 1 0.727 0.636 16.999 12.363 10.818 17.000 1 0.727 0.636 17.000 12.363 10.818. 17.000 1 0.727 0.636 21. k y* mk 5 u. 1 0.8 5 4.8 3.2 1 0.667 4.8 3 4 5 6 7 8 4.667" 4.571" "4.500" "4.444" "4.400" "4.364" .2.667. .2.286. .2.000. .1.778. .1.600. .1.455. "l "l "l 1 "1 "l .0.571. .0.500. .0.444. .0.400. .0.364. .0.333. 4.667 4.571 4.500 4.444 4.400 4.364 Since A! = A2 = 4, v, = the exact answer. mk is converging slowly to ANS20 Answers to Selected Odd-Numbered Exercises 23. A: 1 8 mk 5 4 .1, 1 0.8 0.2 4.2 3.2 0.2 1 0.762 0.048 4.2 4.048 3.048 0.048 1 0.753 0.012 4.048 4.012 3.012 0.012 1 0.751 0.003 4.012 4.003 3.003 0.003. 1 0.750 0.001. 4.003 4.001 3.001 .0.001. 1 0.750 .0 4.001 4.000 3.000 0.000. 1 0.750 0 4.000 4.000 3.000 .0.000. 'l 0.750 .0 4.000 In this case, Aj = A2 = 4 and EA = span Clearly, mk is converging to 4 and jk is converging to a / "l" "o~ \ 0 1 \ _0_ .0. / f "o" vector in the eigenspace £4—namely, 0 + 0.75 1 _0 0 25. k 1 r xk _1_ 0. "l" "i" Yk .1. .0. mk I l -1 -1 1 1 -1 The exact eigenvalues are complex (i and —;'), so the power method cannot possibly converge to either the dominant eigenvalue or the dominant eigenvector if we start with a real initial iterate. Instead, the power method oscillates between two sets of real vectors. k 0 1 2 3 4 5 1 3 2.500 2.250 2.125 2.063 1 4 4.000 4.000 4.000 4.000 .1. .3. . 2.500 _ . 2.250 _ .2.125. .2.063. 1 0.750 0.625 0.562 0.531 0.516 Yk 1 1 1 1 1 1 -1_ - 0.750 _ . 0.625 _ _ 0.562. .0.531, .0.516. mk 1 4 4 4 4 4 Answers to Selected Odd-Numbered Exercises ANS21 The eigenvalues are Aj = —12, A, = 4, A3 = 2, with corresponding eigenvectors r V T" 0 > V2 = 2 ,v3 = 0 _-l_ .1. .1. Since Xq = |v2 + jv3, the initial vector x0 has a zero component in the direction of the dominant eigenvector, so the power method cannot converge to the dominant eigenvalue/eigenvector. Instead, it converges to a second eigenvalue/eigenvector pair, as the calculations show. 29. Apply the power method to A - 187 : 12 15 k 0 1 2 3 T 8" 15.2 15.2 .1. -10. -19 -19 rl" '-0.8" "-0.8" "-0.8" Yk .1. 1 1 1 mk 1 -10 -19 -19 Thus, —19 is the dominant eigenvalue of A — 187, and A2= —19 + 18 = —1 is the second eigenvalue of A. 31. Apply the power method to A - 177 = -8 4 8 4 -2 -4 8 -4 -8 Yk mk -0.667 1 4 -2 -4. 1 -0.5 -1 4 18 -18 9 18. 1 -0.5 1 -18 -18 -18 9 18. 1 -0.5 -1 -18 -18 In-this case, there is no dominant eigenvalue. (We could choose either 18 or —18 for mk, k a 2.) However, the Rayleigh quotient method (Exercises 17-20) converges to —18. Thus, —18 is the dominant eigenvalue of A - 177, and A2 = -18 + 17 = -1 is the second eigenvalue of A. 33. 1 2 3 4 5 "- 0.833" 0.798" 0.800" 0.800" 1.056_ .-0.997. .-1.000. .-1.000. 
"-0.789" -0.801" "-0.800" "-0.800" 1 1 1 1 Yk 0.5 -0.5, l" -1. 0.5 1.056 -0.997 ■1.000 1.000 Thus, the eigenvalue of A that is smallest in magnitude is l/(-l) = -1. ANS22 Answers to Selected Odd-Numbered Exercises k 0 l 2 3 4 5 . . . ■ ■. r *-0.500" 0.500" 0.500" 0.500"" 0.500" l 0.000 0.333 0.111 0.259 0.160 0.500. .-0.500. .-0.500. .-0.500. .-0.500. r "-1.000" "-1.000" "-1.000" ~-l.Q00~ "-1.000 " l 0.000 -0.667 -0.222 -0.518 -0.321 1.000. 1.000. 1.000. 1.000. 1.000. l -0.500 -0.500 -0.500 -0.500 -0.500 Clearly, mk converges to —0.5, so the smallest eigenvalue ofAisl/(-0.5) = -2. 37. The calculations are the same as for Exercise 33. 39. We apply the inverse power method to A — 51 — -1 0 -1 -2 6 0 0 . Taking x0 , we have 1 0.200 1 -0.500 _1_ 0.200. "l" "-0.400" Yk 1 1 .1. .-0.400. mk 1 -0.500 -0.080 -0.500 -0.080 0.160' 1 0.160. 0.500 0.032 -0.500 0.032. -0.064' 1 -0.064. -0.500 Clearly, mk converges to —0.5, so the eigenvalue of A closest to 5 is 5 + l/(-Q.5) = 5-2 = 3. 41. 0.732 47 43. -0.619 Im 4 2- -2- -*-Re I—»-Re 51. Hint: Show that 0 is not contained in any Gerschgorin disk and then apply Theorem 4.16. 53. Exercise 52 implies that JAJ is less than or equal to all of the column sums of A for every eigenvalue A. But for a stochastic matrix, all column sums are 1. Hence JAj < l. Exercises 4,6 1. Not regular 5. Not regular 3. Regular 7.1 = I I 5 5 9.1 = 11. 1, 0.304 0.304 0.304 0.354 0.354 0.354 0.342 0.342 0.342. 13. 2, 16 4 1 15. The population is increasing, decreasing, and constant, respectively. Answers to Selected Odd-Numbered Exercises 17. P~lLP = bl b2Si J>3Si52 1 0 0 0 1 0 0 0 0 0 sn~2 bnSiS2 ■ • • S„_ 0 0 Ü 0 in- 1 65. (a) x(t) = -120em + 520ellt/w,y(t) = 240e8t/5 + 2601" 10. Strain X dies out after approximately 2.93 days; strain Y continues to grow. 67. a = 10, b = 20;x(t) = 10ef(cos t + sin f) + 10, y(t) = 10e'(cos f — sin t) + 20. Species Y dies out when t = 1.22. The characteristic polynomial of L is (A" — bxKn b2S]\n-2 - hs^x"-3-----&„w ■ -s^K-iT. 19. A = 1.746, p 21. A « 1.092, p 0.660 0.264 .0.076 0.535" 0.147 0.094 0.078 0.064 0.053 0.029 25. (a) h = 0.082 31.3, 35. Irreducible 45. 0, 1, 1,0,-1 49. yn = {n- \)2n 1 29. 3, 33. Reducible 43. 1,2,4, 8,16 47. x„ = 4" - (-1)" 51. b„ 2V3 57. (a) dx = 1, d2 = 2, d3 (b) dn = d„_] + d„_2 [(1 + V3)" - (1 - \/3)"l 3, d4 = 5, d5 = (c) d„ 1 V5 1 + V5\" 1 - V'5 2 / V 2 59. The general solution isx(f) = — 30^' + C2eu, y{t) = 2Cle~t + C2e4t. Tlie specific solution is x(f) = -3e'' + 3e4\ y(t) = 2e"! + 3e4t. 61. The general solution is Xj(0 = (1 + V2)C1ev5t + (1 - V2)C2e-v/5',x2(0 = C1ev'2t+ C2e-v'2,.The specific solution is Xj(f) = (2 + V2)eV274 + (2 - V2VV2y4, x2(t) = V2e^l4 - y/ü^/A. 63. The general solution is x(f) = — C, + C3e~',y(f) = Cj + C2e( - C3e~f, z(r) = Q + C2e'. The specific solution is x(t) = 2 — e~', y{t) = —2 + e' + e~\ z(t) = -2 + ef. 77. (a) 79. (a) 81. (a) 83. (a) = Qe2< + C2e 3f 1 3 9 27 .1. J .3. y „9. > _27 [1" T "1" r .1. .1. 1 .1. l -1 0.6 0.6 1.75" 3.125" > .-0.5 . ,-1-75 „ 0.36 0.216 > _0.36_ .0.216. (c) Repeller (c) Neither (c) Saddle point (c) Attractor 85. r = V2,6 = 45°, spiral repeller 87. r = 2,6 — —60°, spiral repeller -1 -1" "0.2 -0.1" 3" 89. P = ,c = 5 1 o_ 0,1 0.2 spiral attractor 91. P 1/2 -V3/2 1 0 orbital center 1/2 -V'3/2 3/2 1/2 Review Questions 1. (a) F (c) F (e) F (g) T (i) F 3. -18 5. Since A7 = —A, we have det A = det(Ar) - det(-A) = (-1)" detA = -det A by Theorem 4.7 and the fact that n is odd. It follows that det A = 0. 7. Ax 5 .10 9. 
(a) 4 - 3A2 = 5x, A A3 (c) jB] = span / r ( 2' "l" \ -l , E.....2 = span -1 0 I 0_ I \ 0_ .1. 1 11. 13. Not similar 15. Not similar 162 158. 17. 0, l,or-l 19. If Ax = Ax, then (A2 - 5A + 27)x = A2x - 5Ax + 2x = 32x - 5(3x) + 2x = -4x. R N12\ Answers to Selected Odd-Numbered Exercises feasor Exercises 5.1 1. Orthogonal 3. Not orthogonal 5. Orthogonal 11. Orthonormal i- "o" 7. [w]B = 2 9. [w]B = 3 1 13. 1/3' 2/3 2/3 2/V5 ' -1/V5 0 L 3 J 2/3V5 4/3V5 -5/3V5 15. Orthonormal 17. Orthogonal, 19. Orthogonal, 1/V2 -I/V2 _1/V2 I/V2. cos 8 sin 8 — cos v -sin18 21. Not orthogonal 27. cos(Z.(Qx, Qy)) Qx-Qy cos2 0 sin 0 sin 8 0 -cos 0 sin 0 cos 0 x-y iQxHIlQyll ||x||||y|| = cos(Z(x, y)) by Theorem 5.6 29. Rotation, 8 = 45° 31. Reflection, y = V3x 33. (a) A(AT + BT)B = AArB + ABTB = IB + Al = B + A = A + B (b) From part (a), det(A + B) = det(A(Ar + Br)B) = detAdet(AT + Br)detB = detAdet((A + 5)r)detB = det A det(A + B)det B Assume that det A + det B = 0 (so that det B = -det A) but that A + Bis invertible. Then det(A + B) * 0, so 1 = det A detB = det A(-det A) = -(det A)2. This is impossible, so we conclude that A + B cannot be invertible. Exercises 5.2 1. W1 = 3. W- - : x = f, y = f, 2 = — £ I f X : x — y + 3z I _z_ 5. W = 7. row(A):{[l 0 1], [0 1 -2]}, null(A): "0" I 3 .1. 1 9. col(A): _ r "-1* 5 2 1 0 > 1 I _-l_ _-l_ - -5"1 - 1 > -7 0_ , null(Ar): 15. 13. 17. --4- "-3" 1 0 0 1 3. 0_ -_2" ' 11' 7 * 2 1 2 19. v = 5 _6 . 5 _ + 5 4 . 5 - 21. v = -2 7 . 2- + 0 1 1- 2 25. No Exercises 5.3 i 1.Y, 3. v, q2 ■ "- 1/2 "1/v2" "-I/V2" , v 1/2 .1/V2. '^2 1/v2. r '2' 0" 1/V3" -1 = 1 . v3 = 1 1 = 1/V3 > -i_ .1. 1. .- 1/V3. 2/v6" 0 1 1/V6 = -I/V2 1/V6. . 1/V2 _ 5. f - i* 2 1 1 2 I .0. 2. 7. v + Answers to Selected Odd-Numbered Exercises ANS2S < "a" 3 - - 15" j 35 34 11. 1 ) 34 35 > 0 K _l 9 _ J _ 7 - - 34- 13. Q 15. 0 1/V2 1/V2 1/V2 0 T/V2 2/V6 -1/V6 1/V6 1/V3 1/V6 1/V3 -2/V2 1/V3 1/V6 1/V3 1/V3 -1/V3. V2 0 0 1/V2 1/vT 3/V6 1/V6 0 2/V3. 3 9 17. R = 0 6 .0 0 19. A = AI 21. A"1 = (Qfl) '1/V2 0 0 0 2/V6 1/V3 23. Let Rx = 0. Then Ax = Q£x = Q0 = 0. Since Ax represents a linear combination of the columns of A (which are linearly independent), we must have x = 0. Hence, R is invertible, by the Fundamental Theorem. = R 'CP = 1/V6 -1/2V3 2/V6 -1/2V3 • 0 3/2V3 1/V2 1/V2" 1/V6 1/V6 1/V3 -1/V3. Exercises 5.4 '1/V2 l.Q = 3.Q = 5.Q = 7.Q = 9.Q = .1/V2 2/Vi 1/V3 1 0 0 1/V2 0 1/V2 -1/V2, 1/V3 -2/V6. D = D = 0 1/V2 1/V2 1/V2 0 l/VT 1 0 0 1/V2 0 1/V2 1/V2 0 1/V2 0 0 1/V2 0 1/V2 D = D = 1/V2 1/V2 0 0 .0 0 "2 0 0' 0 1 0 .0 0 0. 0 ■ 0 1/V2 ■1/V2. D 11. QTAQ = "1/V2 .1/V2 I/V2 1/V2 1/V2 -I/V2. 0 = D 1/V2" •1/V2. + b 0 a - 13. (a) If A and B are orthogonally diagonalizable, then each is symmetric, by the Spectral Theorem. Therefore, A + B is symmetric, by Exercise 35 in Section 3.2, and so is orthogonally diagonalizable, by the Spectral Theorem. 15. If A and B are orthogonally diagonalizable, then each is symmetric, by the Spectral Theorem. Since AB = BA, AB is also symmetric, by Exercise 36 in Section 3.2. Hence, AB is orthogonally diagonalizable, by the Spectral Theorem. 17. A 19. A 21. + 5 0 0 0 0 0 0 0 0 "0 0 0~ "0 0 0 + 0 2 2 + 0 -1 1 0 2 2 0 1 -1 23. Exercises 5.5 1. 2x2 + 6xy + 4y2 3. 123 5. -5 "i 3" "3 -§- 7. 9. .-! -1. .3 2_ 11. 5 1 -2 13. Q 15. Q = 17. Q = 2/V5 I/V5 1/V5 -2/V5. '2/V5 2/3 V5 0 5/3 V5 A/V5 -4/3 V5 I/V3 1/V2 -1/V3 0 .-1/V3 I/V2 -1/V6 (v'Y ~ (z')2 ,yl + 6yl -1/3' 2/3 2/3. 1/V6 2/V6 . 9yi + 9y\ - 9yj ,2(x')2 + INI 28 Answers to Selected Odd-Numbered Exercises 19. Positive definite 23. Positive definite 21. 
Negative definite 25. Indefinite 43. Hyperbola, x' = x,y' = y + j, (x')2/4 - {y'f/9 27. For any vector x, we have X Ax = x B Bx = (Bx)r(Bx) = IJBxf > 0. IfxrAx = 0,then j|Bx||2 = 0, so Bx= 0. Since B is invertible, this implies that x = 0. Therefore, xTAx > 0 for all x ¥= 0, and hence A = BTB is positive definite. 29. (a) Every eigenvalue of cA is of the form cA for some eigenvalue A of A. By Theorem 5.24, A > 0, so cA > 0, since c is positive. Hence, cA is positive definite, by Theorem 5.24. (c) Let x * 0. Then xTAx > 0 and xrBx > 0, since A and B are positive definite. But then xJ (A + B)x = xrAx + xrBx > 0, so A + B is positive definite. 31. The maximum value of/(x) is 2 when x = ± the minimum value of/(x) is 0 when x = 33. The maximum value off(x) is 4 when x =: the minimum value of f(x) is 1 when x : 1/V2" -1/V2} "1/V2" .I/V2 "l/VT 4- 1/V3 .1/V3. I/V2" "-1/V5" 0 or ± 1/V2 .-1/V2. 0 35. Ellipse 37. Parabola 39. Hyperbola 41. Circle, x' = x - 2,y' = y - 2, (x')2 + (y')2 = 4 5- 45. Parabola, x' = x — 2, v' = y + 2, x' = — \{y')2 y y 47. Ellipse, (x'Ý/4 + (y'f/12 = 1 49. Hyperbola, (x')2 - {y'f = 1 Answers to Selected Odd-Numbered Exercises : (2 51. Ellipse, (x")2/50 + (y")2/10 = 1 53. Hyperbola, (x"f - (y"f = 1 55. Degenerate (two lines) 57. Degenerate (a point) y A 2- H—I—h H—I—I—*-x -2- 59. Degenerate (two lines) y 61. Hyperboloid of one sheet, (x')2 - (y')2 + 3(z')2 = 1 63. Hyperbolic paraboloid, z = — (x')2 + (y')1 65. Hyperbolic paraboloid, x' = — V3(y')2 + V3(z')2 67. Ellipsoid, 3(x")2 + (y")2 + 2(z")2 = 4 Review Questions 1. (a) T (c) T (e) F (g) F (i) F 9/2 " 2/3 5. Verify that QTQ = I. .-11/6. 3. 7. Theorem 5.6(c) shows that if v; • V, = 0, then Qv. • Qy = 0. Theorem 5.6(b) shows that {Qv., ..., Qvk} consists of unit vectors, because {v,,..., \k} does. Hence, {Qvj,..., Qvk} is an orthonormal set. r r—i " 2 9. 11. 13. row(A): {[1 0 2 3 4], [0102 1]} col(A): null(A): < 0 2 3 1" "-1" -1 2 2 1 3_ — 5 ~2 -3 0 -2 1 0 0 1 0 0 -4 -1 0 0 1 null(Ar): 15. (a) i 1 -4 3 1 1 4 1 3 1 1 4 1 3 _1_ . ""I_ 0 17. 19. I-2 1- < 1 1 2 i 3 0 5 "1 1 3 { 0 0 -1 - 1 "2 3 2 0~ 3 2 1 ~2 0 0 0 1. Exercises 6.1 1. Vector space 3. Not a vector space; axiom 1 fails. 5. Not a vector space; axiom 8 fails. 7. Vector space 9. Vector space 11. Vector space 15. Complex vector space AMS28 Answers to Selected Odd-Numbered Exercises 17. Not a complex vector space; axiom 6 fails. 19. Not a vector space; axioms 1,4, and 6 fail. 21. Not a vector space; the operations of addition and multiplication are not even the same. 27. Not a subspace 31. Subspace 35. Subspace 39. Subspace 43. Not a subspace 25. Subspace 29. Not a subspace 33. Subspace 37. Not a subspace 41. Subspace 45. Not a subspace 47. Take U to be the x -axis and W the y -axis, for example. Then V and „0. [iJ are in 17 U W, but "1" "0" + .0. 1 is not. 51. No 53. Yes; s(x) = (3 + 2t)p(x) + (14- t)q(x) + tr(x) for any scalar t. 55. Yes; h(x) = f(x) + g(x) 57. No 59. No 61. Yes Exercises 6.2 1. Linearly independent 3. Linearly dependent; 1 0" -1 f = 4 + 1 7 -2 2 "3 0' 0 2 - 2 I r—1 -3 1 5. Linearly independent 7. Linearly dependent; 3x + 2x2 = 7x — 2(2x — x2) 9. Linearly independent 11. Linearly dependent; 1 = sin2x + cos2x 13. Linearly dependent; ln(x2) = -2 In 2-1 + 2 • ln(2x) 17. (a) Linearly independent (b) Linearly dependent 35. dimV = 2,B = {l - x, 1 "l 0 37. dim V = 3, B = 39. dim V = 2, B = 0 0 1 0 0 1 - x2} 0 1 0 0 0 1 0 0 0 0 0 1 41. (n2 - n)/2 43. 
(a) dim(U X V) = dim 17 + dim V (b) Show that if {wv ..., w,J is a basis for W, then {(wj, Wj),..., (w„, wn)} is a basis for A. 45. {l + x, 1 + x + x2, l} 47. 1 0 0 1 49. {1,1 + x) 51. {1 - x,x - X2} 53. {sin2 x, cos2 x} 59. (a) pQ(x) = |x2 p2(x) = |x2 0 1 0 -1 1 0 1 0 1 0 0 0 Ix + 1 3, p;(.v) x2 + 4x - 3, 61. (c) (i) 3x2 - 16x + 19 (ii) x2 - 4x + 5 63. (p" - l)(pn - p)(p" - p2)- ■ ■ {f - p"'!) Exercises 6:3 1. [x]8 = 1 1 1 -1 > *C- l >;);- c - 1 1 1 0 0 3. Me = 0 » [x]c = -1 -1 1 0 .-1 „-l 0 -1 1 5. [p(x))B = 1 0 0 1 1 0 1 1 1 2 1 0 1 1 1 Ap(x)]c -3 2 Pc- -1 1 1 0 19. Basis 21. Not a basis r T "l 0 (f 23. Not a basis 25. Not a basis 7. [pWJp = -l . ip(x)]c = 0 1 1 0 r -ii i_ _1_ _1 1 1. 27. [A][ -1 -1 4 29. [p(x)]2 6 -1 3 1 -1 0 0 0 1 0 -1 1 Answers to Selected Odd-Numbered Exercises MS 9. [A]i ' 4~ 5 -] 2 2 > [A]c = 0 0 -3 9 2. and rl k ,P> (, a \ ka [k )-T \ b. kb_ 11. \f(x))B -1 1 1 -2 2 -5 In 2 0 1 1 ~2 > Pb*-c ~ c+-b ■ 1 2 0 -1 1 1 1 0 0 0 1 1 Therefore, T is linear. -7 15. T = (ka) + (ka + kb)x = k(a + (a + b)x) = kT a + 3b 9 a + 7b 5 - 14* - 8xz, T a — b x+ , c*~b 13. (a) (b) 15. B = 1/2 -1/2 J (3 - 2V/3)/2 (-3V3 + 2)/2 2 + 2V3" 2V3 - 2 [/(*)]C = -1/2 5/2j' 1 1 0 ~2 3.232' .-1-598. 17. T(4 - x + 3x2) = 4 + 3x + 5x2, T(a + bx+ cx2) = (3a — b — cN a + cx + I- V 2 19. Hmf; Let a = T(EU), b = T(£12), c = T(£21), d=T(£22). 23. Hint: Consider the effect of T and D on the standard basis for S?„. "5.464" "2" "4 -1" J .464. 25. (S 0 T) 1 — 0 6 . (S »D 2x —7 . 0 2x + 2y_ -1 -1 (r°s) X UM does not make sense. 17. -2 - 8(x - 1) - 5(x - l)2 19. -1 + 3(x + 1) - 3(x + l)2 + (x+ I)3 27. (S o T)(p(x)) = p'(x + 1), (To S)(p(x)) (p(x+l))' = p'(* + 1) 3. Linear transformation Exercises 6.4 1. Linear transformation 5. Linear transformation 7. Not a linear transformation 9. Linear transformation 11. Not a linear transformation 13. We have S(p(x) + q(x)) = S((p + q)(x)) = x((p + q)(x)) - x(p(x) + q(x)) = xp(x) + xq(x) = S(p(x)) + S(q(x)) and S(cp(x)) = S((cp)(x)) = x((cp)(x)) = x(cp(x)) = cxp(x) = cS(jp(x)) Therefore, S is linear. Similarly, a + c b + d_ = (a + c) + ((a + c) + (b + d))x = (a + (a + b)x) + (c + (c + d)x) 29. (S'° T) = si r = s x — y _—3x + 4y_ 4(x - y) + (-3x + 4y) 3(x - y) + (-3x + 4y) 1- x J .y. x J x Y-A 4x + y = T[S J- V y. ) \ _3x + y_ X J. (4x + y) - (3x + y) Therefore, S 0 T = 1 and T0 S — I, so S and T are inverses. Exercises 6.5 T + T 1. (a) Only (ii) is in ker(T'). (b) Only (iii) is in range(T). , "0 b (c) ker(T) = c 0 range(T) = Tl + 71 3. (a) Only (iii) is in ker(T). (b) All of them are in range (T). (c) ker(T) = {a + bx + cx2: a = —c, b {t + tx- fx2}, range(T) = !R2 a 0; 0 d1 -c} = AHS30 Answers to Selected Odd-Numbered Exercises 5. A basis for ker(T) is for range (T) is and a basis J "l 0' "0 0" 1 0 0_ _0 1. nullity(T) dim M22. 7. A basis for ker(T) is {l + x rank(T) = 2, and rank(T) + nullity(T) = 4 = x2}, and a basis for range (T) is >;rank(T) = 2, nullity(T) = 1, and rank(T) + nullity(T) = 3 = dim;-/',. 9. rank(T) = nullity(T) = 2 ll.rank(T) = nullity(T) = 2 13. rank(T) = 1, nullity(T') = 2 15. One-to-one and onto 17. Neither one-to-one nor onto 19. One-to-one but not onto 21. Isomorphic, T a 0 (f a 0 b 0 = b „0 0 c_ 23. Not isomorphic 25. Isomorphic, T(a + bi) 31. Hint: Define T: % [0,1] -> [0, 2] by letting T(f) be the function whose value at x is (T(f))(x) — f(x/2) for xin [0, 2]. 33. (a) Let Vj and v2 be in Vmd let (S ° T)(Vi) --(S o D(v2). Then S(r(v,)) = S(T(v2)), so r(vj) = T(v2), since S is one-to-one. But now v, = v2, since T is one-to-one. 
Hence, S ° T is one-to-one. 35. (a) By the Rank Theorem, rank(T) + nullity(T) = dim V. If Tis onto, then range(T) = W, so rank(T) = dim(range(T)) = dim W. Therefore, dim V + nullity(T) < dim W + nullity (T) = rank(T) + nullity(r) = dim V so nullity (T) < 0, which is impossible. Therefore, T cannot be onto. Exercises 6.6 1. [TW = 0 -1 2 -4 , [T'W4 + 2x]B = [2 - 4x]c= [7(4 + 2x)]c 3. [T] 1 0 0 0 1 0 0 0 1 m rfl 4- bx + cx2]r = "l 0 o" a a 0 1 0 b = b = [a + b(x + 2) + 0 0 1 c c c(x + 2)2)]c = [T(a + bx+ cx2)]c Pi 0 0" 1 1 1 5. [T] [T] c+-bi bx + cx2]B 1 0 0 1 1 1 a b + c a 4- b • 0 + c • 02 a + b'l + c I2 [T(« + bx cx2)]c Jc 6 4 -3 2 -2 -1. . [T]c <-b 7. B 6 4" "o" '7' "7" 2 -3 -2 0 — 7 — 7 -3 2,-1. _7_ .7. c .7. T -7 7 9. [TWS = "1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 mcWA], 1 0 0 0' 0 0 10 0 10 0 0 0 0 1 11. [7] c^b 0 -1 1 0 c - a - 0 -1 1 0 1 0 0 -1 a a b c c h d_ -1 0 0 1 — a c b d. = [T(A)] 1 0" 0 1 0 1 , [T] — -1 0_ a' c - b~ b d — a c a - d d k — c b d — a d b — c = [AB - BA]C= [T(A)_ Answers to Selected Odd-Numbered Exercises HNS81 13. (b) [D]B-- (c) [D]g[3 sinx — 5 cosx]g ; 0 -1 3 -5 = [ 3 cos x + 5 sin x ] g D (3 sin x — 5 cos x) ] B 15. (a) [D]B = 17. [S°TWB 2 0 0 0 2 1 .0 -1 2. "-1 -2" 1 -1. 19. Invertible, T + fox) = — b + ax 21. Invertible, T_1(p(x)) = p(x - 2) 23. Invertible, T~\a + bx + cx2) = (a - b + 2c) + (b - 2c)x + cx2 or T'^ipix)) = p(x) - p'(x) + p"(x) 25. Not invertible 27. — 3 sin x — cos x + C 29.fe2*cosx - sinx + C 31. C 33. C = {l - x, 2 + x} 35. C = {l,x} 37. [r]g- (d2 - d22)/(d2 + d\) 2dld2/(df + d\) 2dld2/(d2 + d22) {d\ - d\)l{d\ + d22) Exercises 6.7 1. y(f) = 2e}t/e3 3. y(t) = ((1 - e4)e:" + (e3 - l)e4t)/(e3 - e4) 5. /(f) = ' eV5 - 1 ■e(l + V5)l/2 _ g(l-V5)t/2j 7. v(f) = e* - (1 - e~l)te* 9. y(f) = ((/c + l)ekl + (k- l)e-kt)/2k ll.y(t) = e'cos(2r) 13. (a) p(t) = 100e'n(I6)t/3 « lOOe0924' (b) 45 minutes (c) In 9.968 hours 15. (a) m(t) = 50e~d, where c = ln2/1590 « 4.36 X 10"4; 32.33 mg remain after 1000 years, (b) After 3691.9 years 5 - 10cos(10VK) /— 17. x(0 =----—- sin(Vkt) + 10 cos(VIr) sin(lOVK) 19. (b) No Review Questions 1. (a) F (c) T 3. Subspace 5. Subspace 7. Let CiA + c2B = O. Then cYA - c2.B = c,AT + c2BT = (ctA + c2J3)r = O. Adding, we have 2c,A = O, so Cj = 0 because A is nonzero. Hence c25 = O, and so c2 = 0. Thus, {A, B} is linearly independent. 9. {l,x2,x4}, dim W = 3 11. Linear transformation "1 0 -r 0 1 -2 0 0 i 1 0 -l 13. Linear transformation 15. n2 — 1 17. 19. S ° T is the zero transformation. Chapter 7 Exercises 7.1 1. (a) -10 (b) Vl4 (c) V93 3. Any nonzero scalar multiple of 5. (a) 1 (b) V13 (c) V14 7. x2 is one possibility 9. (a) 77 (b) Vtt "0 13. Axiom (4) fails: u = 15. Axiom (4) fails: u = (c) Vtt # 0, but (u, u) = 0. + 0, but (u, u) = 0. 17. Axiom (4) fails: p(x) = 1 - x is not the zero polynomial, but (p(x),p(x)) = 0. r4 r (e) F (g) F (i) T = (u, u) + (v, v) + (w, w) + 2(u, v) - 2(u, w) - 2(v, w) = 1+ 3+ 4 + 2- 10 - 0 = 0 AHS32 Answers to Selected Odd-Numbered Exercises Therefore, jju + v — w|j — 0, so, by axiom (4), u4-v — w = 0oru + v = w. 31. (u + v, u - v) = (u, u) - (u, v) + (v, u) - (v, v) = i|u||2 - (u,v) + (u, v) - j|vjj2 = ||u||2 - ||v||2 33. Using Exercise 32 and a similar identity for jju - v|j2, we have |ju + v||2 + |ju - v|2 = (u + v, u + v) + (u - v, u - v) = ||uj|2 + 2{u,v) + j;v|j2 + ||u||2 - 2{u, v) + ||v||2 = 2!ju|j2 + 2|[v|!2 Dividing by 2 yields the identity we want. 35. |ju + v|| = jju - v|j •*+■ Jju + v||2 = |)u - vj|2 4* |[u||2 + 2{u, v) + |v||2 = Siu||2 - 2(u,v) + ||v{|2 2{u,v) = -2{u,v)^(u, v) = 0 37. 39. 
{l,x,x2} 41. (a) 1/V2, V3x/V2,V5(3x2 - 1)/2V2 (b) V7(5x3 - 3x)/2V2 Exercises 7.2 1. |u||£ = V42, jjujj, = 10, ||u||m = 5 3. d£(u, v) = V70, d£u, v) = 14, dm(u, v) = 6 5. i|u|i„ = 4, |jv||„ = 5 7. (a) At most one component of v is nonzero. 9. Suppose ||vj|m = \vk\. Then ||v||£ = Vvf 4----+v|T - ■ • + vl a Vvl = I vk\ = Ijvjl 11. Suppose ||vj[m = |vkj. Then jv,-j s \vk\ for i = 1,so ML = In! + ■■• + \v„\ < (vjtl + ■■ ■ + \vk\ = n\vk\ = «SJv|m 13. y 21. jjAjjp = V19, jjAjJ! = 4, j|Ajjx = 6 23. jjAj|F = V31, |jA||i = 6, |A|jx = 6 25. ||A||F= 2V/iT,i|A|1 = 7, \{A\\„ = 7 27.x ~0~ "i -1* ,y = l. 29, x = 0 .1. >y = l .1. "l" 1 31. x = 0 ,y = -l 0 -l 33. (a) By the definition of an operator norm, \\I\\ = max||Jx|| = max||x|| = 1. W=i MM 35. condj(A) = cond„(A) = 21; well-conditioned 37. cond^A) = condo;(A) = 400; ill-conditioned 39. condi(A) = 77, condx(A) = 128; moderately ill-conditioned 41. (a) cond^CA) = (max{|fe| 4- 1,2» • I max + 1 k - 1 43. (a) condx(A) = 40 (b) At most 400% relative change 45. Using Exercise 33(a), we have cond(A) WAA"'1! = ||/|| = 1. 49. k > 6 51. k > 10 iiAllllA-1!! Exercises 7.3 l. []e|] = V2 5. Bel = V7 1.414 2.646 3. ilejj = V6/2 « 1.225 7.y 9. y 11. y 13. y 15. y 19.x 23.x -3 + ix, ilejj 2x, llell ' ii 3 z 4- -8-x llell 10 ' 25x> Fll 1 I* lle!l lx + x2 « 1.225 0.816 0.447 « 0.632 7 Lis 17. y = 21. x = 18 _ iZr _ Ir2 5 10x 2A 4~ 4 + t ' -5 - t -5 - It t 25. 42 11 19 11 11 11 J Answers to Selected Odd-Numbered Exercises ANS 13 27.x = 29. y = 0.92 + 0.73% 31. (a) If we let the year 1920 correspond to t = 0, then y = 56.6 + 2.9f; 79.9 years 33. (a) p(t) - 150eai31! 35.139 days 37. -1 1 1 " "2" .J. 3 3 3 2 7 39. 1 3 ] 3 1 3 > 2 -2- 1 -3 1 3 1 3- .2. 41. 45. A+ = [ 47. A+ 51. A 49. A+ = 1 -1 0 1 53. (a) If A is invertible, so is Ar, and we have A+ = ~o~ 0 23. (Exercise 7) A = 3 1 [0 1] + 2 0 .0. L.-1- 33. The line segment [-1,1] 35. The solid ellipse--h — 5 4 1 37. (a) HAD, = V2 39. (a) ||A||2 = 1.95 41. A" 45. A+ = 47. A+ = A A' 25 25 A A 25 25 43. A" III 6 6 6 III 6 6 6 J 61. 63. V2 0 0 0 2 -1 -1 3 0.52 1.04 1 1 1/V2 1/V2 ■1/V2 1/V2 0 1 -1 0 [1 o] (b) cond2(A) = » (b) cond2(A) = 38.11 0 Exercises 7.4 1. 2,3 9. V5,2,0 11. A 13. A 15. A = 17. A = 19. A = 3. V^,0 V2 0 0 0 5. 5 7. 2,3 I/V2 I/V2" .1/V2 -I/V2 3 0 0 2 -1 0 "5" ,0. [1] 0 1 .0 - 1 0 0 1 0 1 0 0 -1 0 3 0 0 2 0 0 V5 0 0 0 2 0 0 1 1 0 2/V5 0 0 1 1/V5 0 21. A = V2 [1/V2 l/V2]+0 1/V5 0 -2/V5J Exercises 7.5 1. -g(x) = \ 3. g(x) = fx 5- g(x) = il + jfx2 7.{l,x-i} 9. g(x) = x - 1 6 11. ^(x) = (4e - - 10) + (18 - 6e)x « 0.87 + 1.69* 13.g(x) =i- fx + fx2 15. g(x) = 39e - 105 + (588 - 216e)x + (210e - 570)x2 = 1.01 + 0.85x + 0.84x2 TT 4 ( COS 3% 21.---cos x + 2 TT [-I/V2 1/V2] (Exercise 3) 23. a0 = 5> ak = °> ^* ~ 25. a0 = 77, = 0, bk = Review Questions 1. (a) T (c) F 3. Inner product 1 - (-1)' kTT 2(-l)k (e) T (g) T 5. V3 7. (i) T / r \ ■ r 2 I 1. /' 1 - 2- ANS34 Answers to Selected Odd-Numbered Exercises 9. Not a norm I3.7 = 1.7* 17. (a) V2, V2 11. concUA) » 2432 15. ~1/V2 1/V2 0" "V2 0" (b) A = 0 0 1 0 V2 . 1/V5 -1/V2 o_ 0 0_ (c) Af = 19. The singular values of PAQ are the square roots of the eigenvalues of {PAQ)T(PAQ) = QrATPTPAQ = QT(ATA)Q. But Qr(ArA)Q is similar to ATA because QT = Q_1, and hence it has the same eigenvalues as ATA. Thus, PAQ and A have the same singular values. ■ K. 
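The last answer above rests on the fact that multiplying a matrix on either side by orthogonal matrices does not change its singular values. A quick numerical illustration of that fact is sketched below; it uses NumPy with randomly generated test matrices, and the helper random_orthogonal is an ad hoc name introduced here (it is an informal check of the argument, not part of the text's exercise sets).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    # The Q factor of a QR factorization of a random square matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

m, n = 5, 3
A = rng.standard_normal((m, n))
P = random_orthogonal(m)            # P^T P = I_m
Q = random_orthogonal(n)            # Q^T Q = I_n

sv_A = np.linalg.svd(A, compute_uv=False)
sv_PAQ = np.linalg.svd(P @ A @ Q, compute_uv=False)

# Both svd calls return the singular values in decreasing order, so they can be
# compared entrywise; they agree up to floating-point roundoff, as the argument
# via (PAQ)^T (PAQ) = Q^T (A^T A) Q predicts.
print(np.allclose(sv_A, sv_PAQ))    # expected output: True
```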
Index Abel, Niels Henrik, 311, D8 Absolute value, C3 Addition closure under, 192, 429 of complex numbers, CI of matrices, 140 of polynomials, D2 of vectors, 5-6, 9, 429 Adjacency matrix, 242, 244 Adjoint (adjugate) of a matrix, 276 Al-Khwarizmi, Abu Ja'far Muhammad ibn Musa, 85 Algebraic multiplicity, 294 Algebraic properties of vectors, 10 Algorithm, 85 Allocation of resources, 99-101 Altitude of a triangle, 33 Angle between vectors, 24-26 Argand, Jean-Robert, CI Argand plane, CI Argument of a complex number, C4 Arithmetic mean, 548 Arithmetic Mean—Geometric Mean Inequality, 548 Associativity, 10, 154, 158, 223, 429 Attractor, 350 Augmented matrix, 61, 64 Axioms inner product space, 531 vector space, 429 B Back substitution, 61 Balanced chemical equation, 101-102 Basis, 198, 446-448 change of, 463-470, 507-509 coordinates with respect to, 208, 448-452 orthogonal, 370, 537 orthonormal, 372, 537 standard, 198, 447 Basis step, Bl Basis Theorem, 202, 453 Best Approximation Theorem, 570 Best approximation, to a vector, 570-571 Binary vector, 14 Binet, Jacques, 338 Binet's formula, 339, 428 Bipartite graph, 251,254 Block, 145 Block multiplication, 148 Block triangular form, 283 Bunyakovsky, Viktor Yakovlevitch, 539 C % 435 C2, 432 C", 543 Carroll, Lewis, 141,284 Cassini, Giovanni Domenico, 362 Cassinis identity, 362 Cauchy, Augustin-Louis, 273, 280 Cauchy-Schwarz Inequality, 22, 539-540 Cayley, Arthur, 300 Cayley-Hamilton Theorem, 300 Centroid of a triangle, 32 Change of basis, 463-470 Characteristic equation, 292 Characteristic polynomial, 292 Circuit, 242 Circumcenter of a triangle, 33 Closure under addition, 192, 429 under linear combinations, 192 under scalar multiplication, 192, 429 Codomain, 212 Coefficient(s) Fourier, 615 of a linear combination, 12, 154 of a linear equation, 58 matrix, 64 method of undetermined, D7 of a polynomial, Dl Cofactor, 266 Cofactor expansion, 266-269 Column matrix, 138 Column-row representation of a matrix product, 147 Column space, 195 Column vector, 3, 138 Commutativity, 10, 19, 154, 429 Companion matrix, 299 Complete bipartite graph, 254 Complex dot product, 543 Complex numbers, Cl-Cll absolute value of, C3 addition of, CI argument of, C4 conjugate of, C2 division of, C2, C5 equality of, CI imaginary part of, CI modulus of, C3 multiplication of, C1-C2, C5 negative of, C2 polar form of, C3-C6 powers of, C6-C7 principal argument of, C4 real part of, CI roots of, C7-C8 Complex plane, CI Complex vector space, 429, 543 Component of a vector, 3 orthogonal to a subspace, 382, 538 Composition of linear transformations, 219, 476-478 Condensation method, 284-285 Condition number, 562,602 Conic sections, 415-416 Conjugate of complex numbers, C2-C3 Conjugate transpose of a matrix, 544-545 Connected graph, 361 Conservation of flow, 102 Consistent linear system, 60 Constant polynomial, D1 Constrained optimization, 413-415, 547-551 Consumption matrix, 236 Contradiction, proof by, A8 Contrapositive, proof by, A8 Convergence of iterative methods, 125, 316, 563-566 Coordinate grid, 13 Coordinate vector, 208, 448-452 Coordinates, 207-209 Cotes, Roger, 569 Cramer, Gabriel, 274 Cramers Rule, 274-275 Cross product, 48-49, 286-287 Crystallographic restriction, 517 Curve fitting, 290-291 D 3, 435 De Moivre, Abraham, C6 De Moivre's Theorem, C6-C9 Degenerate conic, 415, 424 Degree of a polynomial, Dl Demand vector, 236 Descartes, René, 3, D9 :,; Index Descartes' Rule of Signs, D9-D10 Determinant(s), 165, 263-265 cofactor expansion of, 266-269 of elementary matrices, 271-272 geometric 
applications of, 286-291 history of, 280-281 and matrix operations, 272-274 of« X n matrices, 265-269 properties of, 269-274 Vandermonde, 291 Diagonal entries of a matrix, 139 Diagonal matrix, 139 Diagonalizable linear transformation, 509 Diagonalizabie matrix, 303 orthogonally, 400 unitarily, 546-547 Diagonalization, 303-309 orthogonal, 400-407 Diagonalization Theorem, 307 Diagonalizing a quadratic form, 411 Diagonally dominant matrix, 128, 324 Difference of complex numbers, C2 of matrices, 140 of polynomials, D2 of vectors, 8, 433 Differential equation(s), 363,436, 518 boundary conditions for, 523 homogeneous, 436, 518-525 initial conditions for, 340, 343, 344, 363 solution of, 518 system of linear, 340-348 Differential operator, 473 Digital image compression, 607-608 Digraph, 243 Dimension, 203, 452-456 Direct proof, A7 Direction vector, 35, 39 Disjoint sets, A4 Distance Hamming, 554 from a point to a line, 41-43 from a point to a plane, 43-44 taxi cab, 529-531 between vectors, 23-24, 535 Distance functions, 554-555 Distributivity, 10, 19, 154, 158, 429 Divergence, 127 Division algorithm, D4 Dodgson, Charles Lutwidge, 281, 284 Domain, 212 Dominant eigeiwalue, 311 Dominant eigenvector, 311 Dot product, 18-20, 49 complex, 543 weighted, 532 Dual space, 514 Dynamical system, 253, 348-355 trajectory of 349 E Echelon form of a matrix reduced row, 73 row, 65 Edge of a graph, 242 Eigenspace, 256 Eigenvalue(s), 254 algebraic multiplicity of, 294 dominant, 311 geometric multiplicity of, 294 inverse power method for computing, 317- 318 power method for computing, 311-316 shifted inverse power method for computing, 318- 319 shifted power method for computing, 316-3171 Eigenvector(s), 254 dominant, 311 orthogonal, 402 Electrical network, 104-107 Elementary matrix, 170 Elementary reflector, 397 Elementary row operations, 66 Elements of a matrix, 138 Elimination Gauss-Jordan, 72-76 Gaussian, 68-72 Empty set, A2 Equality of complex numbers, CI of matrices, 139 of polynomials, D2 of sets, A1-A2 of vectors, 4 Equation(s) linear, 58 normai, 575 system of linear, 59 Equilibrium, 50,107 Equivalence relation, 302 Error vector, 565, 572 Euclidean norm, 553 Euler, Leonhard, C9 Euler's formula, C9-C11 Even function, 617 Exchange matrix, 235 Expansion by cofactors, 266-269 Exponential of a matrix, 346 F gF,431 Factor Theorem, D4 Factorization IV, 180-186 modified QR, 396-398 QR, 392-394 Feasible solution, 236 Fibonacci, 336 Fibonacci numbers, 335, 338-339, 427 Field, 429 Finite-dimensional vector space, 453 Finite linear games, 109-113 Floating-point form, 83 Force vectors, 50-53 Fourier approximation, 615 Fourier coefficients, 615 Fourier, Jean-Baptiste Joseph, 616 Fourier series, 617 Free variable, 71 Frobenius, Georg, 204 Frobenius norm, 556 Fundamental subspaces of a matrix, 380 Fundamental Theorem of Algebra, D8 Fundamental Theorem of Invertible Matrices, 172, 206, 296,512, 605-606 G Galilei, Galileo, 526 Galois, Evariste, 311, D8 Gauss, Carl Friedrich, 69, 125, 538, 569, D8 Gauss-Jordan elimination, 72-76 Gauss-Jordan inverse method, 175-178 Gauss-Seidel method, 124-131 Gaussian elimination, 68-72 General form of the equation of a line, 34, 36,41 General form of the equation of a plane, 38, 41 Geometric mean, 548 Geometric multiplicity, 294 Gerschgorin disk, 319 Gerschgorin Disk Theorem, 321 Gerschgorin, Semyon Aranovich, 319 Gerschgorin's theorem, 319-322 Gibbs, Josiah Willard, 49 Global Positioning System (GPS), 121-123 Google, 358 Gram, Jörgen Pedersen, 390 Gram-Schmidt Process, 388-392 Graph, 242, 
253-254 adjacency matrix of, 242, 244 bipartite, 251 complete, 253 complete bipartite, 254 connected, 361 cycle, 254 directed (digraph), 243 edges of, 242 k-regular, 361 path in a, 242 Petersen, 254 vertices of, 242 Grassmann, Hermann, 429 Grassmann's Identity, 458, 496 H Half-life, 520 Hamilton, William Rowan, 2, 300 Hamming distance, 554 Hamming norm, 554 Harmonic mean, 551 Head of a vector, 3 Head-to-tail rule, 6 Hermitian matrix, 545 Hilbert, David, 403 Hoene-Wronski, Jösef Maria, 457 Homogeneous linear differential equations, 518-525 Flomogeneous linear system, 76 Hooke's Law, 524 Householder, Alston Scott, 397 Householder matrix, 397 Hyperplane, 40 Index 13 I !,C1 Idempotent matrix, 179 Identity matrix, 139 Identity transformation, 221, 474 Ill-conditioned linear system, 84 Ill-conditioned matrix, 561 Image, 212 Imaginary axis, CI Imaginary conic, 424 Imaginary part of a complex number, CI Inconsistent linear system, 60 Indefinite matrix, 413 quadratic form of, 413 Index of summation, A5 Indirect proof, A7 Induction hypothesis, Bl Induction step, Bl Infinite-dimensional vector space, 453 Initial point of a vector, 3 Inner product, 531 Inner product space, 531-534 and Cauchy-Schwarz and Triangle Inequalities, 539-540 distance between vectors in, 535 length of vectors in, 535 orthogonal vectors in, 535 properties of, 535 Integers modulo m, 14-16 Interior of a matrix, 284 Intersection of sets, A4 Inverse Gauss-fordan method of computing, 175-178 of a linear transformation, 221-222, 478-479 of a matrix, 163 Inverse power method, 317-318 shifted, 318-319 Invertible linear transformation, 221-222, 478-479 Invertible matrix, 163-170 Irreducible matrix, 335 Irreducible polynomial, D7 Isometry, 375 Isomorphism, 493-495 Iterative method (s) convergence of, 125, 316, 563-566 Gauss-Seidel method, 124-131 inverse power method, 317-318 Jacobi's method, 124-131 power method, 311-316 shifted inverse power method, 318-319 shifted power method, 316-317 J Jacobi, Carl Gustav, 124 Jacobis method, 124-131 Jordan, Wilhelm, 72 K Kernel, 482 Kirchhoff's Laws, 104 L Lagrange interpolation formula, 459 Lagrange, Joseph-Louis, 458 Lagrange polynomials, 458 Laplace Expansion Theorem, 266, 277-280 Laplace, Pierre Simon, 267 Lattice, 516 Leading entry, 65 Leading 1,73 Leading variable, 71 Least squares approximating line, 574 Least squares approximation, 568-569, 571-582 Best Approximation Theorem and, 570-571 and orthogonal projection, 583-585 and the pseudoinverse of a matrix, 585-586 via the qr factorization, 582-583 via the singular value decomposition, 603-605 Least squares error, 572 Least squares solution, 574 of minimal length, 603-604 Least Squares Theorem, 575 Left singular vectors, 593 Legendre, Adrien Marie, 538 Legendre polynomials, 538 Leibniz, Gottfried Wilhelm von, 281 Lemma, 271 Length of a binary vector, 14 of an m-ary vector, 16 of a path, 242 of a vector, 20, 535 Leonardo of Pisa, 336 Leontief closed model, 108,235 Leontief open model, 108, 236 Leontief, Wassily, 107 Leslie matrix, 240 Leslie model, 239-241, 330-332 Line, 34-38 of best fit, 574 equation(s) of, 34, 36,41 least squares approximating, 574 Linear combination, 12, 154, 433 Linear dependence, 92-93, 157, 443, 446 Linear economic models, 107-109, 235-236 Linear equation(s), 58, 59. 
See also Linear system(s) Linear independence, 92-97, 157, 443-446 Linear recurrence relations, 335-336 Linear system(s), 58-62 augmented matrix of, 61,64 coefficient matrix of, 64 consistent, 60 direct method for solving, 64-79 equivalent, 60 homogeneous, 76 ill-conditioned, 84 inconsistent, 60 iterative methods for solving, 124-131 solution (set) of, 59 over Rp, 77-79 Linear transformation(s), 213-214, 472-474 onto, 488 composition of, 219, 476-478 diagonalizable, 509 identity, 221,474 inverse of, 221-222, 478-479 invertible, 221-222, 478-479 kernel of, 482 matrix of, 216, 497-503 nullity of, 484 one-to-one, 488 properties of, 475-476 zero, 474 Linearly dependent matrices, 157 Linearly dependent vectors, 93, 443, 446 Linearly independent matrices, 157 Linearly independent vectors, 93, 443,446 Long range transition matrix, 329 LU factorization, 180-186 Lucas, Edouard, 336,428 M m-ary vector, 16 Mm„,430 Maclaurin, Colin, 274, 280 Magic square, 460-462 classical, 460 weight of a, 460 Mantissa, 83 Markov, Andrei Andreyevich, 230 Markov chain, 230-235, 325-330 Mathematical induction, B1-B7 first principle of, Bl second principle of, B5 Matrix (matrices), 61, 138 addition of, 140 adjacency, 242, 244 adjoint (adjugate), 276 associated with a quadratic form, 409 augmented, 61, 64 change-of-basis, 465 characteristic equation of, 292 characteristic polynomial of, 292 coefficient, 64 column space of, 195 companion, 299 condition number of, 562 conjugate transpose of, 544-545 consumption, 236 determinant of, 165, 264, 265-269 diagonal, 139 diagonalizable, 303-309 difference of, 140 eigenspace of, 256 eigenvalue of, 254 eigenvector of, 254 elementary, 170 elements of, 138 entries of, 138 equality of, 139 exchange, 235 exponential of, 346 factorization of, 180 fundamental subspaces of, 380 Hermitian, 545 idempotent, 179 identity, 139 ill-conditioned, 561 indefinite, 413 interior, 284 inverse of, 163 invertible, 163-170 Index Matrix (Continued) irreducible, 335 Leslie, 240 of a linear transformation, 216,497-503 multiplication of, 141-143 negative definite, 413 negative of, 140 negative semidefinite, 413 nilpotent, 282 norm of, 555-561 normal, 547 null space of, 197 nullity of, 204 orthogonal, 373-376 orthogonally diagonalizable, 400 partitioned, 145-149 permutation, 187 positive, 325 positive definite, 413 positive semidefinite, 413 powers of, 149-150 primitive, 335 productive, 237-238 projection, 218-219, 366, 586 pseudoinverse of, 585-586,602-603 rank of, 72, 204 reduced row echelon form of, 73 reducible, 334 regular, 325 row echelon form of, 65 row equivalent, 68 row space of, 195 scalar, 139 scalar multiple of, 140 similar, 301-303 singular values of, 590-591 singular vectors of, 593 size of, 138 skew-symmetric, 162 square, 139 standard, 216 stochastic, 232 strictly diagonally dominant, 128 sum of, 140 symmetric, 151-152,160-161 trace of, 162 transition, 231 transpose of, 151, 159-160 unit lower triangular, 181 unitarily diagonalizable, 546-547 unitary, 545-546 upper triangular, 162 zero, 141 Matrix-column representation of a matrix product, 146 Matrix factorization, 180. 
See also Singular value decomposition (SVD) and diagonalization, 303-309 LU, 180-186 modified QR, 396-398 ptw, 186-187 QR, 392-394 and Schur's Triangularization Theorem, 408 Matrix transformation, 211-216, 472 projection, 218-219, 509-510 reflection, 215 rotation, 216-218 Max norm, 553 Mean arithmetic, 548 geometric, 548 harmonic, 551 quadratic, 550 Median of a triangle, 32 Metric, 555 1 Metric space, 555 Minimum length least squares solution, 603-604 Minor, 264 Modified qr factorization, 396-398 Modular arithmetic, 13-16 Modulus of a complex number, C3 Moore, Eliakim Hastings, 602 Moore-Penrose inverse, 602 Muir, Thomas, 281 Multiplication of complex numbers, C1--C2, C5 of matrices, 141-143 of polynomials, D2-D3 scalar, 7-8, 140,429 Multiplicity of an eigenvalue algebraic, 294 geometric, 294 N Negative of a complex number, C2 of a matrix, 140 of a vector, 8, 429 Negative definite matrix, 413 quadratic form of, 413 Negative semidefinite matrix, 413 quadratic form of, 413 Net reproduction rate, 360 Network, 102 Network analysis, 102-103 Newton's Second Law of Motion, 524 Nilpotent matrix, 282 Node, 102 Nondegenerate conic, 415-416 Norm of a matrix, 555-561 1- , 559 2- , 559 7-, 559 compatible, 556 Frobenius, 556 operator, 559 Norm of a vector, 20, 535, 552 1- , 553 2- , 553 7-, 553 Euclidean, 553 Hamming, 554 max, 553 sum, 552 taxicab, 530 uniform, 553 Normal equations, 575 Normal form of the equation of a line, 34, 36, 41 Normal form of the equation of a plane, 38, 41 Normal matrix, 547 Normal vector, 34, 38 Normalizing a vector, 21 Normed linear space, 552 Null space, 197 Nullit)' of a linear transformation, 484 of a matrix, 204 o Odd function, 617 Ohm's Law, 104 One-to-one, 488 Onto, 488 Operator norm, 559 Optimization constrained, 413-415 geometric inequalities and, 547-551 Orbital center, 355 Ordered n-tuple, 9 Ordered pair, 3 Ordered triple, 8 Orthocenter of a triangle, 33 Orthogonal basis, 370, 537 Orthogonal complement, 378-382 Orthogonal Decomposition Theorem, 384-385 Orthogonal diagonalization, 400-407 Orthogonal matrix, 373-376 Orthogonal projection, 382-387, 538 least squares approximation, 583-585 Orthogonal set of vectors, 369-373, 537 Orthogonal vectors, 26, 535 Orthonormal basis, 372, 537 Orthonormal set of vectors, 372, 537 Outer product, 147 Outer product expansion, 147 Outer product form of the SVD, 596 P 9, 431 <3>„ 431 Parallel vectors, 8 Parallelogram rule, 6 Parameter, 36 Parametric equation of a line, 36,41 of a plane, 39,41 Partial fractions, 119 Partial pivoting, 84-85 Partitioned matrix, 145-149 Pafh(s) k-, 243 length of, 242 number of, 242-245 simple, 242 Peano, Giuseppe, 429 Penrose conditions, 586 Penrose, Roger, 603 Permutation matrix, 187 Perpendicular bisector, 33 Index Perron eigenvector, 335 Perron-Frobenius Theorem, 332-335 Perron, Oskar, 332 Perron root, 335 Perrons Theorem, 333 Petersen graph, 254 Pivot, 66 Pivoting, 66 partial, 84-85 Plane, 38-41 Argand, CI complex, CI equation of, 38, 39, 41 Polar decomposition, 610 Polar form of a complex number, C3-C6 Polya, George, A7 Polynomial, D1-D10 characteristic, 292 constant, Dl degree of, Dl irreducible, D7 Lagrange, 458 Legendre, 538 Taylor, 472 trigonometric, 614 zero of, D4 Population distribution vector, 239 Population growth, 239-241, 330-332 Positive definite matrix, 413 quadratic form of, 413 Positive matrix, 325 Positive semidefmite matrix, 413 quadratic form of, 413 Power method, 311-316 inverse, 317-318 shifted, 316-317 shifted inverse, 318-319 Predator-prey model, 343 Price vector(s), 235 
Primitive matrix, 335 Principal argument of a complex number, C4 Principal Axes Theorem, 411 Probability vector, 231 Product of complex numbers, C1-C2 of matrices, 141-143 of polynomials, D2-D3 Production vector, 236 Projection orthogonal, 382-387, 538 into a subspace, 382 onto a vector, 27-28 Projection form of the Spectral Theorem, 405 Projection matrix, 218-219, 366, 586 Proof by contradiction, A8 by contrapositive, A8 direct, A7 indirect, A7 by mathematical induction, B1-B7 Pseudoinverse of a matrix, 585-586, 602-603 Pythagoras' Theorem, 26, 537 Q qr algorithm, 398-399 qr factorization, 392-394 least squares and, 582-583 modified, 396-398 Quadratic equation(s), D6 graphing, 415-423 Quadratic form, 408-416 indefinite, 413 matrix associated with, 409 negative definite, 413 nsgative semidefmite, 413 positive definite, 413 positive semidefmite, 413 Quadratic mean, 550 Quadric surface, 420 Quotient of complex numbers, C2, C5 R R,4 R3,8 B",9-ll Racetrack game, 1-3 Range, 212, 482 Rank of a linear transformation, 484 of a matrix, 72, 204 singular value decomposition, 600 Rank Theorem, 72, 205, 386, 486 Ranking vector, 356-358 Rational Roots Theorem, D5 Rayleigh, Baron, 316 Rayleigh quotient, 316 Real axis, CI Real part of a complex number, CI Recurrence relation, 336 solution of, 337 Reduced row echelon form, 73 Reducible matrix, 334 Reflection, 215 Regular graph, 361 Regular matrix, 325 Repeller, 352 Resolving a vector, 51 Resultants, 50 Right singular vectors, 593 Robotics, 226-229 Root, for a polynomial equation, D4 Root mean square error, 612 Rotation, 216-218 center of, 516 Rotational symmetry, 516 Roundoff error, 62, 83-84 Row echelon form, 65 Row equivalent matrices, 68 Row matrix, 138 Row-matrix representation of a matrix product, 146 Row reduction, 66 Row space, 195 Row vector, 3, 138 Saddle point, 352 Scalar, 8 Scalar matrix, 139 Scalar multiple, 481 Scalar multiplication, 7-8, 9, 140, 429 closure under, 192,429 Scaling, 314 Schmidt, Erhard, 390 Schur complement, 283 Schur, Issai, 283 Schur's Triangularization Theorem, 408 Schwarz, Karl Herman Amandus, 539 SeidtT, Philipp Ludwig, 125 Seki Kowa, Takakazu, 280 Set(s), A1-A4 disjoint, A4 elements of, Al empty, A2 intersection of, A4 subset of, A2 union of, A4 Shifted inverse power method, 318-319 Similar matrices, 301-303, 508 Simple path, 242 Singular value decomposition (SVD), 590-599 applications of, 599-606 and condition number, 602 and least squares approximation, 603-605 and matrix norms, 600-602 outer product form of, 596 and polar decomposition, 610 and pseudoinverse, 602-603 and rank, 600 Singular values, 590-591 Singular vectors, 593 Size of a matrix, 138 Skew lines, 76 Skew-symmetric matrix, 162 Solution of a differential equation, 518 least squares, 574-582 of a linear system, 59 minimum length least squares, 603 of a recurrence relation, 337 of a system of differential equations, 340-342 Span, 90, 156, 193,438 Spanning set of vectors, 88-92 Spanning sets, 438-441 Spectral decomposition, 405 Spectral Theorem, 403 projection form of, 405 Spectrum, 403 Spiral attractor, 355 Spiral repeller, 355 Square matrix, 139, 374 Square root of a matrix, 424 Standard basis, 198, 447 Standard matrix, 216 Standard position, 4 Standard unit vectors, 22 State vector, 231 Steadv state vector, 233 Index Stochastic matrix, 232 Strictly diagonally dominant matrix, 324 Strutt, John William, 316 Submatrices, 145 Subset, A2 Subspace(s), 192, 433-438 fundamental, 380 spanned by a set of vectors, 192-193, 441 sum of, 442 trivial, 437 zero, 437 
Subtraction: of complex numbers, C2; of matrices, 140; of polynomials, D2; of vectors, 8, 433
Sum: of complex numbers, C1; of linear transformations, 481; of matrices, 140; of polynomials, D2; of subspaces, 442; of vectors, 5-6, 9, 439
Sum norm, 552
Summation notation, A4-A7
Sustainable harvesting policy, 360
Sylvester, James Joseph, 206, 280
Symmetric matrix, 151-152, 160-161
System of linear differential equations, 340-348
System(s) of linear equations, See Linear system(s)
T
Tail of a vector, 3
Taussky-Todd, Olga, 320
Taxicab circle, 530
Taxicab distance, 529-531
Taxicab norm, 530
Taxicab perpendicular bisector, 530
Taxicab pi, 530
Taylor polynomial, 472
Terminal point of a vector, 3
Ternary vector, 16
Theorem, 10
Tiling, 515
Tournament, 244
Trace of a matrix, 162
Transformation, 212; linear, 213-214, 472-474; matrix, 211-216, 472
Transitional matrix, 231
Transitional probabilities, 230
Translational symmetry, 516
Transpose of a matrix, 151, 159-160
Triangle inequality, 22, 540, 552
Trigonometric polynomial, 614
Triple scalar product identity, 287
Trivial subspace, 437
Turing, Alan Mathison, 181
U
Uniform norm, 553
Union of sets, A4
Unit circle, 21
Unit lower triangular matrix, 181
Unit sphere, 535
Unit vector, 21, 535
Unitarily diagonalizable matrix, 546-547
Unitary matrix, 545-546
Upper triangular matrix, 162; block, 283
V
Vandermonde, Alexandre-Théophile, 291
Vandermonde determinant, 291
Vector(s), 3, 9, 429, 439; addition of, 5-6, 9, 439; algebraic properties of, 10; angle between, 24-26; binary, 14; column, 3, 138; complex, 429, 432, 543-544; complex dot product of, 543; components of, 3; coordinate, 208, 448-452; cross product of, 48-49, 286-287; demand, 236; direction, 35, 39; distance between, 23-24, 535; dot product of, 18-20; equality of, 3; force, 50-53; inner product of, 531; length of, 20, 535; linear combination of, 12, 433; linearly dependent, 92-93, 443, 446; linearly independent, 92-97, 443, 446; norm of, 20, 535, 552; normal, 34, 38; orthogonal, 26, 369-373, 535, 537; orthonormal, 372, 537; parallel, 8; population distribution, 240; price, 235; probability, 231; production, 236; ranking, 356-358; resultant, 50; row, 3, 138; scalar multiplication of, 7-8, 9, 429; span of, 438; spanning sets of, 88-92; state, 231; steady-state, 233; ternary, 16; unit, 21, 535; zero, 4, 429
Vector form of the equation of a line, 36, 41
Vector form of the equation of a plane, 39, 41
Vector space(s), 429; basis for, 446; complex, 429, 432, 543-544; dimension of, 453; finite-dimensional, 453; infinite-dimensional, 453; isomorphic, 493-495; subspace of, 433-438; over Z_p, 429, 432
Venn diagram, A2-A3
Venn, John, A2
Vertex of a graph, 242
W
Weight of a magic square, 460
Weighted dot product, 532
Well-conditioned matrix, 561
Wessel, Caspar, C1
Weyl, Hermann, 429
Wheatstone bridge circuit, 105-106
Wilson, Edwin B., 49
Wronskian, 457
X
x-axis, 3
xy-plane, 8
xz-plane, 8
Y
y-axis, 3
yz-plane, 8
Z
Z, 14
Z_2, 14
Z_2^n, 14
Z_m, 16
Z_m^n, 16
z-axis, 8
Zero matrix, 141
Zero of a polynomial, D4
Zero subspace, 437
Zero transformation, 474
Zero vector, 4, 429
Index of Notation
v vector, 3, 439
AB vector as directed line segment, 3
[v_1, v_2, ..., v_n] vector as ordered n-tuple, 9
R^n vector space of ordered n-tuples of real numbers, 9
Z_2 the integers modulo 2, 14
Z_2^n binary vectors of length n, 14
Z_m the integers modulo m, 14
Z_m^n m-ary vectors of length n, 16
0 zero vector, 4
u · v dot product of vectors, 18
||v|| length (norm) of a vector, 20, 535, 552
e_1, e_2, ..., e_n standard unit (basis) vectors in R^n, 22
d(u, v) distance between vectors, 23
proj_u(v) (orthogonal) projection of v onto u, 27
u × v cross product of vectors, 48
[A | b] augmented matrix, 61
rank(A) rank of a matrix, 72, 204
span(v_1, v_2, ..., v_k) span of a set of vectors, 99, 438
A = [a_ij] m × n matrix, 138-139
I = I_n identity matrix, 139
O = O_n zero matrix, 141
A^k kth power of a (square) matrix, 149
A^T transpose of a matrix, 151
A^-1 inverse of a matrix, 163
det A = |A| determinant of a matrix, 165, 264-265
row(A) row space of a matrix, 195
col(A) column space of a matrix, 195
null(A) null space of a matrix, 197
B basis, 198, 446
dim V dimension of a vector space, 203, 454
nullity(A) nullity of a matrix, 204
[v]_B coordinate vector of v with respect to the basis B, 208, 449
T linear transformation, 213, 472
T_A matrix transformation, 214
S ∘ T composition of linear transformations, 219, 477
[T] standard matrix of a linear transformation, 216
λ eigenvalue, 254
E_λ eigenspace, 256
C_ij (i, j)-cofactor of a matrix, 266
A_i(b) matrix A with column i replaced by b, 274
adj A adjoint of a matrix, 276
C(p) companion matrix of a polynomial, 299
c_A(λ) characteristic polynomial of matrix A, 292
A ~ B A is similar to B, 301
A = PDP^-1 eigenvalue decomposition (diagonalizable matrix A), 309
D_i ith Gerschgorin disk, 319
f_n nth Fibonacci number, 335, 427
e^A matrix exponential, 346
W^⊥ orthogonal complement of subspace W, 378
proj_W(v) orthogonal projection of v onto subspace W, 382
perp_W(v) component of v orthogonal to W, 382
A = QDQ^T spectral decomposition (symmetric matrix A), 400, 405
x^T Ax quadratic form, 409
M_mn vector space of m × n matrices, 430
𝒫_n vector space of polynomials of degree ≤ n, 431
𝒫 vector space of all polynomials, 431
ℱ vector space of functions f: R → R, 431
ℱ[a, b] vector space of functions f: [a, b] → R, 432
C^n vector space of ordered n-tuples of complex numbers, 432, 540
𝒞 vector space of continuous functions f: R → R, 435
𝒞[a, b] vector space of continuous functions f: [a, b] → R, 435
𝒟 vector space of differentiable functions f: R → R, 435
𝒟[a, b] vector space of differentiable functions f: [a, b] → R, 435
P_{C←B} change-of-basis matrix, 465
ker(T) kernel of a linear transformation, 482
range(T) range of a linear transformation, 482
V ≅ W V is isomorphic to W, 493
[T]_{C←B} matrix of T with respect to bases B and C, 498
(u, v) inner product of vectors, 531
Ā complex conjugate of a matrix, 544
A* conjugate transpose of a matrix, 544
||v||_s (= ||v||_1) sum norm (= 1-norm), 552
||v||_m (= ||v||_∞) max norm (= ∞-norm), 553
||v||_E (= ||v||_2) Euclidean norm (= 2-norm), 553
||v||_H Hamming norm, 554
||A|| matrix norm, 556
||A||_F Frobenius norm, 556
cond(A) condition number of a matrix, 562
x̄ least squares solution of Ax = b, 574
A^+ pseudoinverse of a matrix, 585, 602
σ singular value, 590
A = UΣV^T singular value decomposition (SVD), 593