Distance and Approximation

A straight line may be the shortest distance between two points, but it is by no means the most interesting.
—Doctor Who, in "The Time Monster" by Robert Sloman, BBC, 1972

Although this may seem a paradox, all exact science is dominated by the idea of approximation.
—Bertrand Russell, in W. H. Auden and L. Kronenberger, eds., The Viking Book of Aphorisms, Viking, 1962, p. 263

Figure 7.1 Taxicab distance from A to B

7.0 Introduction: Taxicab Geometry

We live in a three-dimensional Euclidean world, and, therefore, concepts from Euclidean geometry govern our way of looking at the world. In particular, imagine stopping people on the street and asking them to fill in the blank in the following sentence: "The shortest distance between two points is a ____." They will almost certainly respond with "straight line." There are, however, other equally sensible and intuitive notions of distance. By allowing ourselves to think of "distance" in a more flexible way, we will open the door to the possibility of having a "distance" between polynomials, functions, matrices, and many other objects that arise in linear algebra.

In this section, you will discover a type of "distance" that is every bit as real as the straight-line distance you are used to from Euclidean geometry (the one that is a consequence of Pythagoras' Theorem). As you'll see, this new type of "distance" still behaves in some familiar ways.

Suppose you are standing at an intersection in a city, trying to get to a restaurant at another intersection. If you ask someone how far it is to the restaurant, that person is unlikely to measure distance "as the crow flies" (i.e., using the Euclidean version of distance). Instead, the response will be something like "It's five blocks away." Since this is the way taxicab drivers measure distance, we will refer to this notion of "distance" as taxicab distance.

Figure 7.1 shows an example of taxicab distance. The shortest path from A to B requires traversing the sides of five city blocks. Notice that although there is more than one route from A to B, all shortest routes require three horizontal moves and two vertical moves, where a "move" corresponds to the side of one city block. (How many shortest routes are there from A to B?) Therefore, the taxicab distance from A to B is 5. Idealizing this situation, we will assume that all blocks are unit squares, and we will use the notation d_t(A, B) for the taxicab distance from A to B.

Problem 1 Find the taxicab distance between the following pairs of points:
(a) (1, 2) and (5, 5)   (b) (2, 4) and (3, -2)   (c) (0, 0) and (-A, -3) [first coordinate illegible in the source]
(d) (-2, 3) and (1, 3)   (e) [the points in this part are illegible in the source]   (f) (2.5, 4.6) and (3.1, 1.5)

Problem 2 Which of the following is the correct formula for the taxicab distance d_t(A, B) between A = (a_1, a_2) and B = (b_1, b_2)?
(a) d_t(A, B) = (a_1 - b_1) + (a_2 - b_2)
(b) d_t(A, B) = (|a_1| - |b_1|) + (|a_2| - |b_2|)
(c) d_t(A, B) = |a_1 - b_1| + |a_2 - b_2|

We can define the taxicab norm of a vector v as

||v||_t = d_t(v, 0)

Problem 3 Find ||v||_t for the vectors in parts (a)-(d). (The entries of these vectors are illegible in the source.)

Problem 4 Show that Theorem 1.3 is true for the taxicab norm.

Problem 5 Verify the Triangle Inequality (Theorem 1.5), using the taxicab norm and the pairs of vectors in parts (a) and (b). (The entries of these vectors are illegible in the source.)

Problem 6 Show that the Triangle Inequality is true, in general, for the taxicab norm.

In Euclidean geometry, we can define a circle of radius r, centered at the origin, as the set of all x such that ||x|| = r.
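Before turning to circles, here is a small computational sketch of the taxicab distance and taxicab norm just defined; it is not part of the text. The function names are ours, and the coordinates used to realize the path of Figure 7.1 (three horizontal and two vertical blocks) are an assumption made for illustration. The distance formula is the one from Problem 2(c).

```python
# A sketch (not from the text) of taxicab distance and the taxicab norm.
# Function names and the coordinates chosen for A and B are our assumptions.

def taxicab_distance(a, b):
    """d_t(A, B) = |a1 - b1| + |a2 - b2|  (the formula in Problem 2(c))."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def taxicab_norm(v):
    """||v||_t = d_t(v, 0)."""
    return taxicab_distance(v, (0, 0))

# One way to realize Figure 7.1: three horizontal and two vertical unit blocks.
assert taxicab_distance((0, 0), (3, 2)) == 5

# Problem 1(a): (1, 2) and (5, 5)
print(taxicab_distance((1, 2), (5, 5)))          # 7

# Spot check of the Triangle Inequality ||u + v||_t <= ||u||_t + ||v||_t
u, v = (3, -2), (-1, 4)
w = (u[0] + v[0], u[1] + v[1])
assert taxicab_norm(w) <= taxicab_norm(u) + taxicab_norm(v)
```

With d_t and ||·||_t in hand, we can ask which points lie at a fixed taxicab distance r from the origin.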
Analogously, we can define a taxicab circle of radius r, centered at the origin, as the set of all x such that jjx||, = r. Problem 1 Draw taxicab circles centered at the origin with the following radii: (a) r = 3 (b) r = 4 (c) r = 1 Problem 8 In Euclidean geometry, the value of ir is half the circumference of a unit circle (a circle of radius 1). Lets define taxicab pi to be the number irt that is half the circumference of a taxicab unit circle. What is the value of 7Tf? In Euclidean geometry, the perpendicular bisector of a line segment AB can be defined as the set of all points that are equidistant from A and B. If we use taxicab distance instead of Euclidean distance, it is reasonable to ask what the perpendicular bisector of a line segment now looks like. To be precise, the taxicab perpendicular bisector of AB is the set of all points X such that d,(X.A) = dt(X,B) Problem 9 Draw the taxicab perpendicular bisector of AB for the following pairs of points: (a) A = (2,1), B = (4,1) (b) A = (-1,3), B = (-1, -2) (c) A = (1, 1), B = (5, 3) (d) A = (1, 1), B = (5, 5) As these problems illustrate, taxicab geometry shares some properties with Euclidean geometry, but it also differs in some striking ways. In this chapter, we will Section 7.1 Inner Product Spaces 531 encounter several other types of distances and norms, each of which is useful in its own way. We will try to discover what they have in common and use these common properties to our advantage. We will also explore a variety of approximation problems in which the notion of "distance" plays an important role. Inner Product Spaces In Chapter 1, we defined the dot product u • v of vectors u and v in R", and we have made repeated use of this operation throughout this book. In this section, we will use the properties of the dot product as a means of defining the general notion of an inner product. In the next section, we will show that inner products can be used to define analogues of "length" and "distance" in vector spaces other than W. The following definition is our starting point; it is based on the properties of the dot product proved in Theorem 1.2. DCfillitlOll An inner product on a vector space Vis an operation that assigns to every pair of vectors u and v in V a real number (u, v) such that the following properties hold for all vectors u, v, and w in V and all scalars c: 1. (u, v) = (v, u) 2. (u, v + w) = (u, v) + (u, w) 3. (cu, v) = c(u, v) 4. (u, u) £ 0 and (it, u) = 0 if and only if u = 0 A vector space with an inner product is called an inner product space. Remark Technically, this definition defines a real inner product space, since it assumes that V is a real vector space and since the inner product of two vectors is a real number. There are complex inner product spaces too, but their definition is somewhat different. (See Exploration: Vectors and Matrices with Complex Entries at the end of this section.) Example 7.1 W is an inner product space with (u, v) = u ■ v. Properties (1) through (4) were verified as Theorem 1.2. The dot product is not the only inner product that can be defined on W. Example 7.2 Letu = 1 and v = ."2 J V be two vectors in R2. Show that (u, v) = 2w,v, + 3m2v2 defines an inner product. S32 Chapter 7 Distance and Approximation Solution We must verify properties (1) through (4). Property (1) holds because (u, v) = 2ulvl + 3u2v2 - 2v1u1 + 3v2u2 = (v, u) Next, let w = . 
We check that (u, v + w) = 2«1(v] + Wj) + 3u2(v2 + w2) = 2ulvl + 2u1wl + 3u2v2 + 3w2w2 = (2«iVj + 3u2v2) + (2ulwl + 3u2w2) = (lI)V) + (u, w) which proves property (2). If c is a scalar, then (cu, v) = 2(cw1)v1 + 3(c«2)v2 = c(2uxvx + 3«2v2) = c(u,v) which verifies property (3). Finally, (u,u) = 2«!»! + 3u2«2 = 2«i + 3u\ S: 0 and it is clear that (u, u) = 2u\ + 3u\ = 0 if and only if ux = u2 = 0 (that is, if and only if u = 0). This verifies property (4), completing the proof that (u, v), as defined, is an inner product. Example 7.2 can be generalized to show that if wx,..., w„ are positive scalars and ui Vi u = sold v = .Un- -V„- are vectors in W, then '■;;',^^.v';•^■t^^''vV■.v:,>'-■^■ ■ ■■ (u, v) = wxulvl + • ■ ■ + wnu„v„ (1) defines an inner product on R", called a weighted dot product. If any of the weights Wj is negative or zero, then Equation (1) does not define an inner product. (See Exercises 13 and 14.) Recall that the dot product can be expressed as u • v = u2v. Observe that we can write the weighted dot product in Equation (1) as (u, v) = urWv Section 7.1 Inner Product Spaces 533 where W is the n X n diagonal matrix W = w The next example further generalizes this type of inner product. Example 7.3 Let A be a symmetric, positive definite « X n matrix (see Section 5,5) and let u and v be vectors in W. Show that , (u, v) = u7Av defines an inner product. Solution We check that (u, v) = u7Av = u-Av = ,\vu = ATvu = (vTA)T'U - vrAu = (v, u) Also, (u, v + w) = urA(v + w) = urAv + uTAw = (u, v) + (u, w) and (cu,v) = (cu)TAv =5 c(urAv) = c{u, v) Finally, since A is positive definite, (u, u) = urAu > 0 for all u £ 0, so (u, u) u'Au = 0 if and only if u = 0. This establishes the last property. 4 To illustrate Example 7.3, let A 4 -21 -2 7 . Then (u, v) = urAv = [ w, u2 4 -2 -2 7 = 4lUxVi — 2uiV2 ~ 2«2vi + 7u2v2 Example 7.4 The matrix A is positive definite, by Theorem 5.22, since its eigenvalues are 3 and 8. Hence, (u, v) defines an inner product on U2. We now define some inner products on vector spaces other than W. In 9^2» let p(x) = flo + <*\X + a2x2 and q(x) — b0 + bxx + b2x2. Show that (p(x), q(x)) = a0b0 + alb1 + a2b2 defines an inner product on = fMg(x)dx a defines an inner product on % [a, b]. Solution We have rb rb (fig) = f(x)g(x) dx = g(x)f(x) dx = (g,f) Ja Ja Also, if h is in % [a, b], then ifig+h)= f(x)(g(x) + h(x)) dx (f(x)g(x) +f(x)h(x))dx ■ b f(x)g(x) dx + f{x)h{x) dx a = (fig) + {f,h) If c is a scalar, then rb (Cf>g)= cf(x)g(x)dx b f(x)g(x) dx = c(fig) rb Finally, {/,/) = (f(x))2dx S 0, and it follows from a theorem of calculus that, since/ f 2 is continuous, {//) = (f(x)) dx = 0 if and only if/is the zero function. Therefore, J a (/, g) is an inner product on % [a, b]. Example 7.5 also defines an inner product on any subspace oi% [a, b]. For example, we could restrict our attention to polynomials defined on the interval [a, b]. Suppose we consider SP [0, 1], the vector space of all polynomials on the interval [0, 1]. Then, using the inner product of Example 7.5, we have (x2,1 + x) = x2(\ + x) dx = o V x4 — + — 3 4 Or + x) dx I 1 7 3 4 ~ 12 Section 7.1 Inner Product Spaces Properties of Inner Products The following theorem summarizes some additional properties that follow from the definition of inner product. Tl)60rem 7.1 Let u, v, and w be vectors in an inner product space Vand let c be a scalar. a. (u + v, w) = (u, w) + (v, w) b. (u, cv) = c{u, v) c. (u, 0) = (0, v) = 0 Proof We prove property (a), leaving the proofs of properties (b) and (c) as Exercises 23 and 24. 
Referring to the definition of inner product, we have

(u + v, w) = (w, u + v)          by (1)
           = (w, u) + (w, v)     by (2)
           = (u, w) + (v, w)     by (1)

Length, Distance, and Orthogonality

In an inner product space, we can define the length of a vector, the distance between vectors, and orthogonal vectors, just as we did in Section 1.2. We simply have to replace every use of the dot product u · v by the more general inner product (u, v).

Definition   Let u and v be vectors in an inner product space V.
1. The length (or norm) of v is ||v|| = √(v, v).
2. The distance between u and v is d(u, v) = ||u - v||.
3. u and v are orthogonal if (u, v) = 0.

Note that ||v|| is always defined, since (v, v) ≥ 0 by the definition of inner product, so we can take the square root of this nonnegative quantity. As in R^n, a vector of length 1 is called a unit vector. The unit sphere in V is the set S of all unit vectors in V.

Example 7.6   Consider the inner product on C[0, 1] given in Example 7.5. If f(x) = x and g(x) = 3x - 2, find
(a) ||f||   (b) d(f, g)   (c) (f, g)

Solution   (a) We find that

(f, f) = ∫₀¹ f(x)² dx = ∫₀¹ x² dx = [x³/3]₀¹ = 1/3

so ||f|| = √(f, f) = 1/√3.

(b) Since d(f, g) = ||f - g|| = √(f - g, f - g) and

f(x) - g(x) = x - (3x - 2) = 2 - 2x = 2(1 - x)

we have

(f - g, f - g) = ∫₀¹ (f(x) - g(x))² dx = ∫₀¹ 4(1 - x)² dx = 4[x - x² + x³/3]₀¹ = 4/3

Combining these facts, we see that d(f, g) = √(4/3) = 2/√3.

(c) We compute

(f, g) = ∫₀¹ f(x)g(x) dx = ∫₀¹ x(3x - 2) dx = ∫₀¹ (3x² - 2x) dx = [x³ - x²]₀¹ = 0

Thus, f and g are orthogonal.

It is important to remember that the "distance" between f and g in Example 7.6 does not refer to any measurement related to the graphs of these functions. Neither does the fact that f and g are orthogonal mean that their graphs intersect at right angles. We are simply applying the definition of a particular inner product. However, in doing so, we should be guided by the corresponding notions in R² and R³, where the inner product is the dot product. The geometry of Euclidean space can still guide us here, even though we cannot visualize things in the same way.

Example 7.7   Using the inner product on R² defined in Example 7.2, draw a sketch of the unit sphere (circle).

Solution   If x = [x, y]ᵀ, then (x, x) = 2x² + 3y². Since the unit sphere (circle) consists of all x such that ||x|| = 1, we have

1 = ||x|| = √(x, x) = √(2x² + 3y²)   or   2x² + 3y² = 1

This is the equation of an ellipse, and its graph is shown in Figure 7.2 (a unit circle that is an ellipse).

We will discuss properties of length, distance, and orthogonality in the next section and in the exercises. One result that we will need in this section is the generalized version of Pythagoras' Theorem, which extends Theorem 1.6.

Theorem 7.2   Pythagoras' Theorem
Let u and v be vectors in an inner product space V. Then u and v are orthogonal if and only if

||u + v||² = ||u||² + ||v||²

Proof   As you will be asked to prove in Exercise 32, we have

||u + v||² = (u + v, u + v) = ||u||² + 2(u, v) + ||v||²

It follows immediately that ||u + v||² = ||u||² + ||v||² if and only if (u, v) = 0.

Orthogonal Projections and the Gram-Schmidt Process

In Chapter 5, we discussed orthogonality in R^n. Most of this material generalizes nicely to general inner product spaces. For example, an orthogonal set of vectors in an inner product space V is a set {v₁, ..., v_k} of vectors from V such that (v_i, v_j) = 0 whenever v_i ≠ v_j. An orthonormal set of vectors is then an orthogonal set of unit vectors.
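As a quick numerical sanity check of this kind of orthogonality, the sketch below (not part of the text) approximates the integral inner product of Example 7.5 with scipy.integrate.quad and reproduces the three answers of Example 7.6. The helper name ip is ours, and SciPy is assumed to be available.

```python
# A sketch (not from the text) that checks Example 7.6 numerically.
# The helper `ip` approximates (f, g) = integral of f(x)g(x) over [a, b].
from math import sqrt
from scipy.integrate import quad

def ip(f, g, a=0.0, b=1.0):
    value, _error = quad(lambda x: f(x) * g(x), a, b)
    return value

f = lambda x: x
g = lambda x: 3 * x - 2
diff = lambda x: f(x) - g(x)

print(sqrt(ip(f, f)))        # ||f||   = 0.577... = 1/sqrt(3)
print(sqrt(ip(diff, diff)))  # d(f, g) = 1.154... = 2/sqrt(3)
print(ip(f, g))              # (f, g)  = 0.0 up to roundoff, so f and g are orthogonal
```

The same helper, called with a = -1 and b = 1, can be used to check the Gram-Schmidt computation on [-1, 1] that follows.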
An orthogonal basis for a subspace W of V is just a basis for W that is an orthogonal set; similarly, an orthonormal basis for a subspace W of Vis a basis for W that is an orthonormal set. In W, the Gram-Schmidt Process (Theorem 5.15) shows that every subspace has an orthogonal basis. We can mimic the construction of the Gram-Schmidt Process to show that every finite-dimensional subspace of an innec product space has an orthogonal basis—-all we need to do is replace the dot product by the more general inner product. We illustrate this approach with an example, (Compare the steps here with those in Example 5.13.) dx flillilS 1.8 Construct an orthogonal basis for S?2 with respect to the inner product (/>£> = f(x)g(x)dx by applying the Gram-Schmidt Process to the basis {1, x, x2}. Solution Let x, = 1, x2 = x, and x3 - x2. We begin by setting v, - x; = 1. Next we compute dx 2 and (t„ x2) = j x dx = 0 538 Chapter 7 Distance and Approximation Adrien Marie Legendre (1752-183: was a French mathematician who worked in astronomy, number theory, and elliptic functions. He was involved in several heated disputes with Gauss. Legendre gave the first published statement of the law of quadratic reciprocity in number theory in 1765. Gauss, however, gave the first rigorous proof of this result in 1801 and claimed credit for the result, prompting understandable outrage from Legendre. Then in 1806, Legendre gave the first published application of the method of least squares in a book on the orbits of comets. Gauss published on die same topic in 1809 but claimed he had been using the method since 1795, once again infuriating Legendre. M w Therefore, (vi.x2) 0., v2 = x2 ---rV] = x - -(1) (vi. Vi) 2 To find v3, we first compute (▼l.Ä3> x dx = v2, X, = x~ dx = 0, Then It follows that {v„ v2, v3} is an orthogonal basis for 3P2 on the interval [—1, 1]. The polynomials are the first three Legendre polynomials. If we divide each of these polynomials by its length relative to the same inner product, we obtain normalized Legendre polynomials (see Exercise 41). Just as we did in Section 5.2, we can define the orthogonal projection projlv(v) of a vector v onto a subspace W of an inner product space. If {u^ . . . , UjJ is an orthogonal basis for W, then 2. [Hint: You will need the fact that a polynomial of degree n has at most n zeros. See Appendix D.] 12. Repeat Exercise 5 using the inner product of Exercise 11 with a = 0, b = 1, c = 2. In Exercises 13-18, determine which of the four inner product axioms do not hold. Give a specific example in each case. in U2. Define (u, v) = uxvx. n I in IR2. Define 13. Let u = «i andv = .v2. 14. Let u - -«1 .«2. and v = "v.! (u, v) = uxvx — u2v2. 15. Let u = »1 ."2. andv = _v2„ in 1R2. Define Lv2J (U, v) = UXV2 + M2Vj. 16. In 3f2, define (p(x), q(x)) = p(0)q(0). 17. In 3\, define (p(x), q(x)) = p{l)q{\). 18. In M22, define (A, B) = det(AB). In Exercises 19 and 20, (u, v) defines an inner product on IR2, where u = M] andv = 1 . Find a symmetric .«2J lv2„ matrix A such that (u, v) = urAv. 19. (u, v) = 4w,V! + UjV2 + u2Vi + 4u2v, 20. (u, v) = KjV] + 2U[V2 + 2«2V! + 5w2v2 Exercises 21 and 22, sketch the unit circle in U2for the given inner product, where u "Uj" andv = _«2_ A'2- 21. (u, v) = uxvx + \u2v2 22. (u,v) = 4Uj Vj + «iV2 + m2Vj + 4m2v2 23. Prove Theorem 7.1(b). 24. Prove Theorem 7.1(c). 
In Exercises 25-29, suppose that u, v, and w are vectors in an inner product space such that (u, v) = 1, (u, w) = 5, (v,w) = 0 ||u|| = 1, ||v|| = V3, ||w|| = 2 Evaluate the expressions in Exercises 25-28. 25. (u + w, v — w) 26. (2v ~ w, 3u + 2w) 27. ||u + v|| 28. ||2u - 3v + w|| 29. Show that u + v = w. [Hint: How can you use the properties of inner product to verify that u + v - w = 0?] 30. Show that, in an inner product space, there cannot be unit vectors u and v with (u, v) < -1. In Exercises 31-36, (u,v) is an inner product. In Exercises 31-34, prove that the given statement is an identity. 31. (u + v,u - v) - """2 - iL'"2 32. ||« + vll2 = ||u ;|U|[" ~ ||V|| + 2{u,v) + ||v| 33. u 2 + = f||u + v| 2llU ,112 34. (u,v) = i||u +v||^ - jllu-35. Provefhat II u + v II = II u — v|| if and only if u and v are orthogonal. 36. Prove that d(u, v) = V||u||2 + ||v||2ifandonlyifu and v are orthogonal. In Exercises 37-40, apply the Gram-Schmidt Process to the basis B to obtain an orthogonal basis for the inner product space V relative to the given inner product. 37. V B = Example 7.2 38. V = R2, B = with the inner product in with the inner product 39 1S_40. 0« 41 42, immediately following Example 7.3 V = £?2, B = {l, 1 + x, 1 + x + x2}, with the inner product in Example 7.4 V = 9>2[0, l],B = {1,1 + x, 1 + x + x2}, with the inner product in Example 7.5 (a) Compute the first three normalized Legendre polynomials. (See Example 7.8.) (b) Use the Gram-Schmidt Process to find the fourth normalized Legendre polynomial. If we multiply the Legendre polynomial of degree n by an appropriate scalar we can obtain a polynomial L„(x) such that Ln(l) = 1. (a) Find LQ(x), Lx(x), L2(x), and L3(x). (b) It can be shown that Ln(x) satisfies the recurrence relation 2n — 1 n — 1 Ln(x) = -xL„_i(x)--i»-2W 542 Chapter 7 Distance and Approximation for all n > 2. Verify this recurrence for L2(x) and L3(x). Then use it to compute L4(x) and L5(x). 43. Verify that if W is a subspace of an inner product space V and v is in V, then perpw(v) is orthogonal to all w in W. 44. Let u and v be vectors in an inner product space V. Prove the Cauchy-Schwarz Inequality for u # 0 as follows: (a) Let tbe a real scalar. Then (tu + v, tu + v) £ 0 for all values of t. Expand this inequality to obtain a quadratic inequality of the form at2 + bt + c > 0 What are a, b, and c in terms of u and v? (b) Use your knowledge of quadratic equations and their graphs to obtain a condition on a, b, and c for which the inequality in part (a) is true. (c) Show that, in terms of u and v, your condition in part (b) is equivalent to the Cauchy-Schwarz Inequality. Explorations Vectors and Matrices with Complex Entries In this book, we have developed the theory and applications of real vector spaces, the most basic example of which is W. We have also explored the finite vector spaces lnp and their applications. The set C" of «-tuples of complex numbers is also a vector space, with the complex numbers C as scalars. The vector space axioms (Section 6.1) all hold for C", and concepts such as linear independence, basis, and dimension carry over from W without difficulty. The first notable difference between W and C" is in the definition of dot product. If we define the dot product in C" as in W, then for the nonzero vector v = have V v^v = V;2 + i2" V/Zi + i = vo = o This is clearly an undesirable situation (a nonzero vector whose length is zero) and violates Theorems 1.2(d) and 1.3. We now generalize the real dot product to C" in a way that avoids this type of difficulty. 
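The following sketch (ours, not the text's) illustrates numerically the difficulty just described for a vector such as [1, i]ᵀ, and previews the repair made by the definition that follows: NumPy's vdot conjugates its first argument, which is exactly what keeps the "length" of a nonzero complex vector positive.

```python
# A sketch (not from the text) of why the real dot product fails on C^n.
import numpy as np

v = np.array([1, 1j])        # a nonzero complex vector

print(v @ v)                 # 1 + i^2 = 0: the "real-style" dot product gives length 0
print(np.vdot(v, v))         # np.vdot conjugates its first argument: 1 + 1 = 2
print(np.linalg.norm(v))     # sqrt(2), consistent with the conjugated product
```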
Vi Definition ifu = and v = are vectors in C", then the complex dot product of u and v is defined by u • V = ulvl + The norm (or length) of a complex vector v is defined as in the real case: ||v|| — Vv'V. Likewise, the distance between two complex vectors u and v is still defined as d(u, v) = jju-vj|. 543 1. Show that, for v = 2. Let u = 1 [l (a) u*v (b) ||-u.|| (c) ||v|| (d) d(u,v) (e) a nonzero vector orthogonal to u (f) a nonzero vector orthogonal to v The complex dot product is an example of the more general notion of a complex inner product, which satisfies the same conditions as a real inner product with two exceptions. Problem 3 provides a summary. 3. Prove that the complex dot product satisfies the following properties for all vectors u, v, and w in C" and all complex scalars. (a) u-v = vu (b) u • (v + w) = u • v + u • w (c) (cu) *v = c (u • v) and u • (cv) = c(u • v) (d) u • u 3s 0 and u • u = 0 if and only if u = 0. For matrices with complex entries, addition, multiplication by complex scalars, transpose, and matrix multiplication are all defined exactly as we did for real matrices in Section 3.1, and the algebraic properties of these operations still hold. (See Section 3.2.) Likewise, we have the notion of the inverse and determinant of a square complex matrix just as in the real case, and the techniques and properties all carry over to the complex case. (See Sections 3.3 and 4.2.) The notion of transpose is, however, less useful in the complex case than in the real case. The following definition provides an alternative. -——-;-;---:-« DfifinitiOII If A is a complex matrix, then the conjugate transpose of A is the matrix A* defined by A* = A1' and v 2 - 3; 1 + 5i Find: In the preceding definition, A refers to the matrix whose entries are the complex conjugates of the corresponding entries of A; that is, if A = [fly] .then A = [atA. 4. Find the conjugate transpose A* of the given matrix: i 2f (a) A = (c) A = -» 3 2-i 1+3» -2 4 0 3 - 4» (b) A (d) A 2 5-2» 5 + 2» -1 3i 0 1 + i 1 - i 4 i 1 + i 0 -» Properties of the complex conjugate (Appendix C) extend to matrices, as the next problem shows. 5. Let A and B be complex matrices, and let c be a complex scalar. Prove the following properties: (a) A' = A (b) ATB = A + B (c) cA = cA (d) AB =AB (e) (A)T = (Ar) Hermitian matrices are named after the French mathematician Charles Hermite (1822-1901). Hermite is best known for his proof that the number e is transcendental, but he also was the first to use the term orthogonal matrices, and he proved that symmetric (and Hermitian) matrices have real eigenvalues. The properties in Problem 5 can be used to establish the following properties of the conjugate transpose, which are analogous to the properties of the transpose for real matrices (Theorem 3.4). 6. Let A and B be complex matrices, and let c be a complex scalar. Prove the following properties: (a) (A*)* = A (c) (cA)* = cA* (b) (A + 5)* = A* + B* (d) (AB)* = B*A* 7. Show that for vectors u and v in C", the complex dot product satisfies u • v = u*v. (This result is why we defined the complex dot product as we did. It gives us the analogue of the formula u • v = urv for vectors in W.) For real matrices, we have seen the importance of symmetric matrices, especially in our study of diagonalization. Recall that a real matrix A is symmetric if A1 - A. For complex matrices, the following definition is the correct generalization. 
D6f InitiOn A square complex matrix A is called Hermitian if A* = A—that is, if it is equal to its own conjugate transpose. 8. Prove that the diagonal entries of a Hermitian matrix must be real. 9. Which of the following matrices are Hermitian? (a) A = (c) A = (e) A 2 1 + 1 - i i -3 1 - 5i ■1 + 5i 3 0 3 2 -3 0 -1 -2 1 0 (b) A = (d) A (f) A = -1 2 - 3/ 1 1 - 4/ . 3 + i 3 0 0 2 -2 1 2 - 3i 5 1 + 4i 2 —i 3 - / i 0 10. Prove that the eigenvalues of a Hermitian matrix are real numbers. [Hint: The proof of Theorem 5.18 can be adapted by making use of the conjugate transpose operation.] 11. Prove that if A is a Hermitian matrix, then eigenvectors corresponding to distinct eigenvalues of A are orthogonal. [Hint: Adapt the proof of Theorem 5.19 using u • v = u*v instead of u • v = u v.] Recall that a square real matrix Q is orthogonal if Q~1 = Q1, The next definition provides the complex analogue. Definition A square complex matrix Uis called unitary if U~1 = U*. Just as for orthogonal matrices, in practice it is not necessary to compute U"1 directly. You need only show that 17* 17 = / to verify that U is unitary. 545 12. Which of the following matrices are unitary? For those that are unitary, give their inverses. (a) (c) 3/5 4i/5 -4/5 31/5 (b) (d) (1 + f)/V6 0 2/VfT 0 1 0 .(-1 - f)/V3 0 1/V3. Unitary matrices behave in most respects like orthogonal matrices. The following problem gives some alternative characterizations of unitary matrices. 13. Prove that the following statements are equivalent for a square complex matrix U: (a) U is unitary. (b) The columns of U form an orthonormal set in C" with respect to the complex dot product. (c) The rows of U form an orthonormal set in C" with respect to the complex dot product. (d) ||l/x|| = ||x|| for every x in C". (e) L'x • Uy — x • y for every x and y in C". [Hint: Adapt the proofs of Theorems 5.4-5.7.] 14. Repeat Problem 12, this time by applying the criterion in part (b) or part (c) of Problem 13. The next definition is the natural generalization of orthogonal diagonalizability to complex matrices. Definition A square complex matrix A is called unitarily diagonalizable if there exists a unitary matrix U and a diagonal matrix D such that U*AU = D The process for diagonalizing a unitarily diagonalizable n X n matrix A mimics the real case. The columns of U must form an orthonormal basis for C" consisting of eigenvectors of A. Therefore, we (1) compute the eigenvalues of A, (2) find a basis for each eigenspace, (3) ensure that each eigenspace basis consists of orthonormal vectors (using the Gram-Schmidt Process, with the complex dot product, if necessary), (4) form the matrix U whose columns are the orthonormal eigenvectors just found. Then U *AU will be a diagonal matrix D whose diagonal entries are the eigenvalues of A, arranged in the same order as the corresponding eigenvectors in the columns of U. 15. In each of the following, find a unitary matrix U and a diagonal matrix D such that 17* A U = D. (b) A = (a) A = -i 2 (c) A r -i 1 - i 1 + i 0 (d) A = -1 0_ 0 2 1 + i 0 The matrices in (a), (c), and (d) of the preceding problem are all Hermitian. It turns out that every Hermitian matrix is unitarily diagonalizable. (This is the Complex Spectral Theorem, which can be proved by adapting the proof of Theorem 5.20.) At this point you probably suspect that the converse of this result must also be true—namely, that every unitarily diagonalizable matrix must be Hermitian. 
But unfortunately this is falsel (Can you see where the complex analogue of the proof of Theorem 5.17 breaks down?) For a specific counterexample, take the matrix in part (b) of Problem 15. It is not Hermitian, but it is unitarily diagonalizable. It turns out that the correct characterization of unitary diagonalizability is the following theorem, the proof of which can be found in more advanced textbooks. A square complex matrix A is unitarily diagonalizable if and only if A*A = AA* A matrix A for which A*A = AA* is called normal. 16. Show that every Hermitian matrix, every unitary matrix, and every skew-Hermitian matrix (A* = —A) is normal. (Note that in the real case, this result refers to symmetric, orthogonal, and skew-symmetric matrices, respectively.) 17. Prove that if a square complex matrix is unitarily diagonalizable, then it must be normal. Geometric Inequalities and Optimization Problems This exploration will introduce some powerful (and perhaps surprising) applications of various inequalities, such as the Cauchy-Schwarz Inequality. As you will see, certain maximization/minimization problems {optimization problems) that typically arise in a calculus course can be solved without using calculus at all! Recall that the Cauchy-Schwarz Inequality in W states that for all vectors u and v, |u"v| s ||u|| ||v|| with equality if and only if u and v are scalar multiples of each other. If u = [Xi ■■■ x„]7andv=[v1 ••• yH] , the above inequality is equivalent to kiji + • • • + x„y„\ < Vx] + ■ • • + x\ VyT+ + yl Squaring both sides and using summation notation, we have See Linear Algebra with Applications by S. J. Leon (Upper Saddle River, NJ: Prentice-Hall, 2002). 541 Equality holds if and only if there is some scalar k such that^, = kx. for i = 1,..., n. Let's begin by using Cauchy-Schwarz to derive a special case of one of the most useful of all inequalities. 1. Let x and y be nonnegative real numbers. Apply the Cauchy-Schwarz Inequality to u Vx and v = Vy Vx to show that ._ x + y Vxy < (1) D c o \i \ X Y / y 1 Figure 7.4 with equality if and only ifx = y. 2. (a) Prove inequality (1) directly. [Hint: Square both sides.] (b) Figure 7.4 shows a circle with center O and diameter AB — AC + CB = x + y. The segment CD is perpendicular to AB. Prove that CD = Vxy and use this result to deduce inequality (1). [Hint: Use similar triangles.] The right-hand side of inequality (1) is the familiar arithmetic mean (or average) of the numbers x and y. The left-hand side shows the less familiar geometric mean of x and y. Accordingly, inequality (1) is known as the Arithmetic Mean-Geometric Mean Inequality (AMGM). It holds more generally; for n nonnegative variables Xn ..., x„, it states 'if---... v v x^Xj' ' ' Xn with equality if and only if xi = x2 = • x„. In words, the AMGM Inequality says that the geometric mean of a set of nonnegative numbers is always less than or equal to their arithmetic mean, and the two are the same precisely when all of the numbers are the same. (For the general proof, see Appendix B.) We now explore how such an inequality can be applied to optimization problems. Here is a typical calculus problem. Figure 7.5 3Ml Example 7.9 Prove that among all rectangles whose perimeter is 100 units, the square has the largest area. Solution If we let x and y be the dimensions of the rectangle (see Figure 7.5), then the area we want to maximize is given by A = xy We are given that the perimeter satisfies 2x + 2y = 100 which is the same as x + y = 50. 
We can relate xy and x + y using the AMGM Inequality: x + y Vxy S —-— or, equivalently, xy £ %(x + y) Since x + y = 50 is a constant (and this is the key), we see that the maximum value of A = xy is 502/4 = 625 and it occurs when x = y = 25. JjJot a derivative in sight! Isn't that impressive? Notice that in this maximization problem, the crucial step was showing that the right-hand side of the AMGM Inequality was constant. In a similar fashion, we may be able to apply the inequality to a minimization problem if we can arrange for the left-hand side to be constant. Example 7.10 Prove that among all rectangular prisms with volume 8 m3, the cube has the minimum surface area. Solution As shown in Figure 7.6, if the dimensions of such a prism are x, y, and z, then its volume is given by V = xyz Thus, we are given that xyz = 8. The surface area to be minimized is S = 2xy + 2yz + 2zx Since this is a three-variable problem, the obvious thing to try is the version of the AMGM Inequality for n = 3—namely, x + y + z vxyz Unfortunately, the expression for S does not appear here. However, the AMGM Inequality also implies that S 2xy + 2yz + 2zx 3 ~~ 3 > ^(2xy)(2yz)(2zx) = 2^/{.xyzf = 2V64 = 8 which is equivalent to S 2: 24. Therefore, the minimum value of S is 24, and it occurs when 2xy = 2yz = 2zx (Why?) This implies that x = y = z = 2 (i.e., the rectangular prism is a cube). 3. Prove that among all rectangles with area 100 square units, the square has the smallest perimeter. 4. What is the minimum value of f{x) = x + — for x > 0? 549 5. A cardboard box with a square base and an open top is to be constructed from a square of cardboard 10 cm on a side by cutting out four squares at the corners and folding up the sides. What should the dimensions of the box be in order to make the enclosed volume as large as possible? 6. Find the minimum value oif(x, y, z) — (x + y) (y + z) (z + x) if x, y, and z are positive real numbers such that xyz = 1. 8 7. For x > y > 0, find the minimum value of x H--------. [Hint: A substitution might help.] y(x - y)' The Cauchy-Schwarz Inequality itself can be applied to similar problems, as the next example illustrates. Example 7.11 Find the maximum value of the function f(x, y, z) = 3x + y + 2z subject to the constraint x2 + y2 + z2 = 1. Where does the maximum value occur? f Solution This sort of problem is usually handled by techniques covered in a multi- variable calculus course. Here's how to use the Cauchy-Schwarz Inequality. The function 3x + y + 2z has the form of a dot product, so we let "3" X u = 1 and v = y _2_ Then the componentwise form of the Cauchy-Schwarz Inequality gives (3x + y + 2z)2..... (32 + l2 + 22)(x2 + y2 + z2) 14 Thus, the maximum value of our function is Vl4, and it occurs when X "3" y = k 1 _ ~ _ 2_ Therefore, x = 3k, y = k, and z = 2k, so 3(3fc) + k k = 1 / Vl4, and hence 2(2k) = VÜ. It follows that x "3/VI4" y = 1/V14 _z_ _2/Vl4_ 8. Find the maximum value of f{x, y, z) = x + 2y + Az subject to x2 + ly1 + z2 = 1. z2 9. Find the minimum value off(x, 7,2) = x2 + y2+ — subjecttox + y + z = 10. 10. Find the maximum value of sin 9 + cos 9. 11. Find the point on the line x + 2y = 5 that is closest to the origin. There are many other inequalities that can be used to solve optimization problems. The quadratic mean of the numbers xu ..., x„ is defined as 550 If X 1> ..., xn are nonzero, their harmonic mean is given by n l/x, + l/x-2 + • • ■ + l/x„ It turns out that the quadratic, arithmetic, geometric, and harmonic means are all related. 12. 
Let x and y be positive real numbers. Show that with equality if and only if x = y. (The middle inequality is just AMGM, so you need only establish the first and third inequalities.) 13. Find the area of the largest rectangle that can be inscribed in a semicircle of radius r (Figure 7.7). 14. Find the minimum value of the function for x,y > 0. [Hint: (x + yf/xy = (x + y)(l/x + 1/y).] 15. Let x and y be positive real numbers with x + y — 1. Show that the minimum value of f(x,y) = (x + y)2 XV is y, and determine the values of x and y for which it occurs. 551 552 Chapter 7 Distance and Approximation Norms and Distance Functions ■■ mm In the last section, you saw that it is possible to define length and distance in an inner product space. As you will see shortly, there are also some versions of these two concepts that are not defined in terms of an inner product. To begin, we need to specify the properties that we want a "length function" to have. The following definition does this, using as its basis Theorem 1.3 and the Triangle Inequality. DCfinitlOII A norm on a vector space V is a mapping that associates with each vector v a real number ||v||, called the norm of v, such that the following properties are satisfied for all vectors u and v and all scalars c: 1. jjv|| > 0, and ||v|| = 0 if and only if v = 0. 2. jjcvji = |c|||v|| 3. j|u + v| < H| + ||v|| A vector space with a norm is called a normed linear space. Show that in an inner product space, ||v|| = V (v, v) defines a norm. Solution Clearly, V(v, v) > 0. Moreover, V{v, v) = 0 ^ (v, v) = 0 v = 0 by the definition of inner product. This proves property (1). For property (2), we only need to note that |jcv|! = V(cv, cvj = V^vTvj = V^Vty.v) = |c|||v|| Property (3) is just the Triangle Inequality, which we verified in Theorem 7.4. We now look at some examples of norms that are not defined in terms of an inner product. Example 7.13 is the mathematical generalization to W of the taxicab norm that we explored in the Introduction to this chapter. Example 7.13 The sum norm ||v||s of a vector v in W is the sum of the absolute values of its components. That is, if v = [vj • • • v„]T, then Wh - ini Show that the sum norm is a norm. • + I v„ Solution Clearly, ||v||s = |vj H— + |v„| > 0, and the only way to achieve equality is if | Vj | = • ■ • = | v„ | =0. But this is so if and only if Vj = ■ • • = v„ = 0 or, equivalently, v = 0, proving property (1). For property (2), we see that cv = [cVj ■•• cv„]r,so |kv|s= |cv,| + •••+ |cvj = IcKlvil + ■••+ |vj) - |c|H. Section 7,2 Norms and Distance Functions 553 Finally, the Triangle Inequality holds, because if u = [u1 •'• un] , then !lu + v!ls = lui + vJ + ■ • • + \u„ + v„\ - (k! + hi) + ■■■ + (!«„! + \vn\) = (\Ul\ +■■■+ \un\) + (jvj + • • • + |vj) = ||u||, + ||vj|s The sum norm is also known as the l-norm and is often denoted by \\\\x. On fR2, it is the same as the taxicab norm. As Example 7.13 shows, it is possible to have several norms on the same vector space. Example 7.14 illustrates another norm on W. Example 1,14 The max norm j|v|jm of a vector v in U" is the largest number among the absolute values of its components. That is, ifv = [v. I T then ||v||m = max{ r: ,..., jvj} Show that the max norm is a norm. Again, it is clear that j|v|Jm _ 0. If ||v||m = 0, then the largest of |v1 [,..., j vj is zero, and so they all are. Hence, v, = • • • = v„ = 0, so v = 0. This verifies property (1). 
Next, we observe that for any scalar c, ||cv|!m = max{|cv,|,..., |cv„|} = |cjma_{|v,|,..., |v„|} = |c||jv||m Finally, for u = [ux ■■• un] , we have ||u + v|w = max{|«i + vx\,\u„ + v„\] ^max{|«J + |vj,..., |«„| + |v„|} < max{ u ...., \un\} + max{|vil,..., jvj} = jjujj,,, + ||v||m (Why is the second inequality true?) This verifies the Triangle Inequality. The max norm is also known as the co-norm or uniform norm and is often denoted by jjvjl». In general, it is possible to define a norm j|vjj?) on W by \\v\\„ = (I v, !? + •••+ \vn\P)u* for any real number p > 1. For p = 1, jjvjji = jjv|js, justifying the term l-norm. For p = 2, 1M|2 = (IvJ2 + • • • + !v„|2)1/2 = Vv[+^4- vt which is just the familiar norm on R" obtained from the dot product. Called the 2-norm or Euclidean norm, it is often denoted by !|vj|£. As p gets large, it can be shown using calculus that [|vL approaches the max norm ||v||m. This justifies the use of the alternative notation lIviL for this norm. ___,__..................__,_______,_._Ste Exam _—i—-——~—.............-------.....■ -■.........." ■- ........j.............. —JB*-For a vector v in Z", define v H to be w(v), the weight of v. Show that it defines a 554 Chapter 7 Distance and Approximation Solution Certainly, ||v||H = w(v) 5: 0, and the only vector whose weight is zero is the zero vector. Therefore, property (1) is true. Since the only candidates for a scalar c are 0 and 1, property (2) is immediate. To verify the Triangle Inequality, first observe that if u and v are vectors in Z?, then w(u + v) counts the number of places in which u and v differ. [For example, if u = [1 1 0 1 0]r and v = [0 1 1 1 l]r then u + v = [1 0 1 0 ljr, so w(u + v) = 3, in agreement with the fact that u and v differ in exactly three positions.] Suppose that both u and v have zeros in n0 positions and Is in nx positions, u has a 0 and v has a 1 in «ol positions, and u has a 1 and v has a 0 in nl0 positions. (In the example above, n0 = 0, n, = 2, nm = 2, and nl0 =1.) Now w(u) = «, + n10, w(v) — «! + «01, and w(u + v) = nU) + «01 Therefore, + V|H = w(U + V) = «i0 + Hqi = («i + «10) + («, + nm) - 2«i < («i + «i0) + («i + «0i) = w(u) + w(v) = |)u||H + ||v||H The norm ||v||H is called the Hamming norm. Distance Functions ' For any norm, we can define a distance function just as we did in the last section—namely, d(u, v) == |ju - v|j Example 7.16 Let u = (b) the su 3" -2. m nc and v = irm, and (c -l" 1 ) the . Compute d( u, v) relative to (a) the Euclidean norm, max norm. Solution Each calculation requires knowing that u — v = . (a) As is by now quite familiar, dE(u,\) = |ju - v||E = V42 + (-3)2 = V25 = 5 (b) ds(u,v) = lju -v\\s = |4| + j-3| = 7 (c) d,„(u, v) = ||u - v||„, = max{|4|, |-3|} = 4 k The distance function on Zjj determined by the Hamming norm is called the Hamming distance. We will explore its use in error-correcting codes in Section 8.5. Example 7,17 provides an illustration of the Hamming distance. Section 7.2 Norms and Distance Functions 555 • ? 11 Find the Hamming distance between u= (1 1 0 1 0] T and v - [0 1 1 1 1]T Solution Since we are working over Z,, u - v = u + v. But dH(u,v) = ||u + v\\H = w(u + v) As we noted in Example 7.15, this is just the number of positions in which u and v differ. "The given vectors are the same ones used in that example; the calculation is therefore exactly the same. Hence, dH(u, v) = 3. 4 Theorem 7.5 summarizes the most important properties of a distance function. Theorem. 7.5 Let d be a distance function defined on a normed linear space V. 
The following properties hold for all vectors u, v, and w in V: a. d(u, v) > 0, and d(u, v) = 0 if and only if u - v. b. d(u, v) = d(v, u) c. d(u, w) 5 d(u, v) + d(v, w) (a) Using property (1) from the definition of a norm, it is easy to check that d(u, v) == || u — vjj s 0, with equality holding if and only if u — v = 0 or, equivalently, u = v. (b) You are asked to prove property (b) in Exercise 19. (c) We apply the Triangle Inequality to obtain A function d satisfying the three properties of Theorem 7.5 is also called a metric, and a vector space that possesses such a function is called a metric space. These are very important in many branches of mathematics and are studied in detail in more advanced courses. We can define norms for matrices exactly as we defined norms for vectors in R". After all, the vector space Mmn of all m X n matrices is isomorphic to lRm", so this is not difficult to do. Of course, properties (1), (2), and (3) of a norm will also hold in the setting of matrices. It turns out that, for matrices, the norms that are most useful satisfy an additional property. (We will restrict our attention to square matrices, but it is possible to generalize everything to arbitrary matrices.) d(u, v) + d(v, w) = |ju - vjj + jjv - wjj > jj(u - v) + (v - w) = I u — wj = d(u, w) Matrix Norms 556 Chapter 7 Distance and Approximation Definition A matrix norm on Mnn is a mapping that associates with each n X n matrix A a real number ||A|J, called the norm of A, such that the following properties are satisfied for all n X n matrices A and J3 and all scalars c. 1. jjA| :• 0 and ||A|| = 0 if and only if A = O, 2. \\cA\\ = |c|||A|| 3. ||A + B\\ < ||A|| + 4. ||AB|| < ||A||||B|| A matrix norm on Mnn is said to be compatible with a vector norm |x| on W if, for all n X n matrices A and all vectors x in W, we have IIAxil £ llAllllxll Example 7.18 The Frobenius norm \\A\\F of a matrix A is obtained by stringing out the entries of the matrix into a vector and then taking the Euclidean norm. In other words, ||A|F is just the square root of the sum of the squares of the entries of A. So, if A = [fly], then (a) Find the Frobenius norm of A = (b) Show that the Frobenius norm is compatible with the Euclidean norm. (c) Show that the Frobenius norm is a matrix norm. (a) ||A ||,,. = \/3- +71l7T2T+~41 = V30 Before we continue, observe that if Aj = [3 — 1] andA2 = [2 4] are the row vectors of A, then || AT || £ = V3"2 + (-lp and ||A2||£ = V2^ + 4AThus, ||A||f =V||A1||i+ ||A2||| Similarly, if ai "3" -1" anda2 = .2. IIa are the column vectors of A, then It is easy to see that these facts extend ton X n matrices in general. We will use these observations to solve parts (b) and (c). (b) Write A = 'Ai l A„ Section 7.2 Norms and Distance Functions 557 Then Ax v. = A,x uA„x = VllA^II + + A„x + + A n\\E\ = (N/||A1||| + ...+ ||A,||S)||x||B = ||A||f||x||B where the inequality arises from the Cauchy-Schwarz Inequality applied to the dot products of the row vectors A; with the column vector x. (Do you see how Cauchy-Schwarz has been applied?) Hence, the Frobenius norm is compatible with the Euclidean norm. (c) Let b, denote the Jth column of B. Using the matrix-column representation of the product AB, we have ||AB||F= ||[Ab,---Ab„]||F V||Ab1||i + ---+ llAbJi V||A||II|b1||I + •••+ \\A\\2F\\bn by part (b) = iiAiuVINif + = llAllJßllp •+ b 2 «IIb which proves property (4) of the definition of a matrix norm. 
Properties (1) through (3) are true, since the Frobenius norm is derived from the Euclidean norm, which satisfies these properties. Therefore, the Frobenius norm is a matrix norm. For many applications, the Frobenius matrix norm is not the best (or the easiest) one to use. The most useful types of matrix norms arise from considering the effect of the matrix transformation corresponding to the square matrix A. This transformation maps a vector x into Ax. One way to measure the "size" of A is to compare ||x|| and ||Ax|| using any convenient (vector) norm. Lets think ahead. Whatever definition of || A || we arrive at, we know we are going to want it to be compatible with the vector norm we are using; that is, we will need Ax A x or Ax IAII forx # 0 !! Ax|| The expression —r.—r— measures the "stretching capability" of A. If we normalize each nonzero vector x by dividing it by its norm, we get unit vectors x 1 x and thus Ax Ax — 4„V IUI! V llxll / Ax 558 Chapter 7 Distance and Approximation y 4-- -4- Figure 7.8 Figure 7.8 shows how the matrix A = affects the unit circle in Mr—it maps it If x ranges over all nonzero vectors in IR", then x ranges over all unit vectors (i.e., the unit sphere) and the set of all vectors Ax determines some curve in W. For example, "3 2 .2 0_ into an ellipse. With the Euclidean norm, the maximum value of || Ax|| is clearly just half the length of the principal axis—-in this case, 4 units. We express this by writing max ||Ax|| = 4. IliB-i In Section 7.4, we will see that this is not an isolated phenomenon. That is, !|Ax|l max-f^—77--= max||Ax|| **o ||x|| always exists, and there is a particular unit vector y for which || Ay|| is maximum. Now we prove that || A || = max || Ax|| defines a matrix norm. IWhi Theorem 7.6 If ||x|| is a vector norm on IR", then ||A|| = max ||Ax|| defines a matrix norm on M„ -■- lixl! -i that is compatible with the vector norm that induces it. Pf00f (1) Certainly, ||Ax|| S: 0 for all vectors x, so, in particular, this inequality is true if || x j| = 1.Hence, ||Ai| = max ||Ax|| > 0 also. If || A || = 0, then we must have 11x11=1 11 Ax 11 = 0—and, hence, Ax = 0—for all x with ||x|| = 1. In particular, Ae; = 0 for each of the standard basis vectors e; in W. But Ae,- is just the ith column of A, so we M must have A = O. Conversely, if A = O, it is clear that || A || =0. (Why?) (2) Let c be a scalar. Then ||cA|| = max||cAx|| = max|c| ||Ax|| = |c|max||Ax|| = |c|||a|| llx||-=l IUII = 1 !|x|| = l Section 7,2 Norms and Distance Functions (3) Let B be an n X n matrix and let y be a unit vector for which \\A + B|i = max || U + B)x|| = ||(A + £)y|| Then |JA + B|| = ||(A + B)y|| = II Ay + By || < ||Ay|| + ||J5y|| < mil + iibii (Where nioes the second inequality come from?) Next, we show that our definition is compatible with the vector norm [ property (5)] and then use this fact to complete the proof that we have a matrix norm. (5) If x = 0, then the inequality || Ax!| S || A || ||x|| is true, since both sides are zero. If x # 0, then from the comments preceding this theorem, IUx|| !!ax|| .I h ~r,—r,— < max-H—n— = ||A|| |jx|| **o ||x|| Hence, ||Ax|| < |U|| ||x||. (4) Let z be a unit vector such that || AB || = max 11 (AS) x|| = || ABz 11. Then Iklhi ii AB || = ||ABz|| = ||A(Bz)|| < || A || ||Bz|| by property (5) ' < || A || ||B|| ||z|| by property (5) = lUllllBl! This completes the proof that || A || = max || Ax || defines a matrix norm on M„„ that is compatible with the vector norm that induces it. 
-3§fHBB BefliiltiQIi The matrix norm || A || in Theorem 7.6 is called the operator norm induced by the vector norm ||x||. The term operator norm reflects the fact that a matrix transformation arising from a square matrix is also called a linear operator. This norm is therefore a measure of the stretching capability of a linear operator. The three most commonly used operator norms are those induced by the sum norm, the Euclidean norm, and the max norm—namely, HaI^ = max ||Ax||s, ||a||2 = max ||Ax||E, ||a||,x = max ||Ax||m IMI 0, as we wished to show. f 11 Compute [| M \\„ in Example 2.37 and use this value to find the number of iterations required to approximate the solution to three-decimal-place accuracy (after rounding) if the initial vector is x0 = 0. Chapter 7 Distance and Approximation Exercises 7.2 Solution We have already computed M = 0 0 Ml 0.6 < 1 (implying that Jacobi's method converges in Example 2.37, as we saw). The approximate solution xk will be accurate to three decimal places if the error vector xk — x has the property that each of its components is less than 0.0005 in absolute value. (Why?) Thus, we need only guarantee that the maximum absolute component of x.( - x is less than 0.0005. In other words, we need to find the smallest value of k such that IIXi - x\\m < 0.0005 Using Equation (9) above, we see that x,, - xlL < IImLIIx, , - x|L < llMjIil' Now ml = 0.6 and xn so H-2 I m|!*| 0.714 1.400 x„ - x = 1.4, m (0.6)*(1.4) (If we knew the exact solution in advance, we could use it instead of xt. In practice, this is not the case, so we use an approximation to the solution, as we have done here.) Therefore, we need to find k such that (0.6)*(1.4) < 0.0005 We can solve this inequality by taking logarithms (base 10) of both sides. We have log10((0.6)*(l.4)) < log10(5 X 10~4) --=> fclog10(0.6) + log10(l.4) < log105 - 4 => -0.222k + 0.146 < -3.301 =>k> 15.5 Since k must be an integer, we can therefore conclude that k = 16 will work and that 16 iterations of Jacobi's method will give us three-decimal-place accuracy in this example. (In fact, it appears from our calculations in Example 2.37 that we get this degree of accuracy sooner, but our goal here was only to come up with an estimate.) 2' In Exercises 1-3, let u = 4 andv = -2 .-5 0 4. (a) What does ds(u, v) measure? (b) What does dm(u, v) measure? 1. Compute the Euclidean norm, the sum norm, and the max norm of u. 2. Compute the Euclidean norm, the sum norm, and the max norm of v. 3. Compute d(u, v) relative to the Euclidean norm, the sum norm, and the max norm. 1 0 IT 1 1 0 0 11' and In Exercises 5 and 6, let u = v = [0 1 1 0 1 1 1] 5. Compute the Hamming norms of u and v. 6. Compute the Hamming distance between u and v. 7. (a) For which vectors v is !|v||£ = ||v|jm? Explain your answer. Section 7.2 Norms and Distance Functions 567 (b) For which vectors v is [ | v 11, = 11 v j | ,„? Explain your answer. (c) For which vectors v is ||v||s = ||v||m = ||v||£? Explain your answer. 8. (a) Under what conditions on u and v is ||n + v||£ = 11u!! e + 11v 11 e • Explain your answer. (b) Under what conditions on u and v is ||U + v||s = ||u||s + ||vI!s? Explain your answer. (c) Under what conditions on u and v is II u + v||,„ = ||u||m + || v||,„? Explain your answer. 9. Show that for all v in R", ||v||m< ||v||£. 10. Show that for all v in U", ||v||£< ||v||s. 11. Show that for all v in I 12. Show that for all v in I I vII s: Ml* «l|V||m. V«||v||f 13. Draw the unit circles in and the max norm. relative to the sum norm 14. 
By showing that the identity of Exercise 33 in Section 7.1 fails, show that the sum norm does not arise from any inner product. In Exercises 15-18, prove that vector space V. defines a norm on the 15. V — o>- 16. V = M„ = max{|2a|, \3b\} a11 = max{ I ay j} ,17. V=<€[0,1], 11/1 18. 11/11 = max \f(x)\ 0Sl<] \f(x)\ dx 19. Prove Theorem 7.5(b), In Exercises20-25, compute ||a||f, ||a||],and ||a| '2 3" 0 -1 20. A = 21. A = _4 1_ .-3 3. "2 1 r 1 5" 22. A = 23. A = 1 3 2 2 -1. .1 1 3_ 0 -5 2 "4 2 -1 24. A = 3 1 -3 25. A = 0 1 2 -4 ! 3 3 3 0 In Exercises26-31, find vectorsxandy with ||x|j, = 1 and ||y||m= 1 such that \\A\\, = |JAx||5aw«i ||a|U, = ||Ay||w where A is the matrix in the given exercise. 26. Exercise 20 27. Exercise 21 28. Exercise 22 29. Exercise 23 30. Exercise 24 31. Exercise 25 32. Prove Theorem 7.7(b). 33. (a) If || A || is an operator norm, prove that ||l|| = 1, where 7 is an identity matrix, (b) Is there a vector norm that induces the Frobenius norm as an operator norm? Why or why not? 34. Let || A || be a matrix norm that is compatible with a vector norm ||x||. Prove that ||a|| 2: ja for every eigenvalue A of A. In Exercises 35-40, find condY(A) and condJyA). State whether the given matrix is ill-conditioned. '3 f 1 -2 35. A = 36. A = .4 2. .~3 6 "l 0.99" 150 200 37. A = 38. A = .1 1 L3001 4002 "l 1 f "1 \ l -3 39. A = 5 5 6 40. A = 1 l 2 3 1 4 .1 0 0. 1 1 -3 4 1 5 - "l k 41. Let A = 1 1 (a) Find a formula for cond«(A) in terms of k. (b) What happens to condK.(A) as k approaches 1? 42. Consider the linear system Ax = b, where A is invert-ible. Suppose an error Ab changes b to b' = b + Ab. Let x' be the solution to the new system; that is, Ax' = b'. Let x' = x + Ax so that Ax represents the resulting error in the solution of the system. Show that Ax cond(A)- Ab for any compatible matrix norm. 43. Let A 10 10 [100 andb = 10 9j [ 99 (a) Compute cond,=(A). (b) Suppose A is changed to A' = large a relative change can this change produce in the solution to Ax = b? [Hint: Use inequality (1) from this section.! 10 10 10 11 . How Chapter 7 Distance and Approximation (c) Solve the systems using A and A' and determine the actual relative error. [100" (d) Suppose b is changed to b' 101 How large a relative change can this change produce in the solution to Ax — b? [Hint: Use Exercise 42.] (e) Solve the systems using b and b' and determine the actual relative error. 46. Show that if A and B are invertible matrices, then cond(AB) £ cond(A)cond(B) with respect to any matrix norm. 47. Let A be an invertible matrix and let A] and A„ be the eigenvalues with the largest and smallest absolute values, respectively. Show that |Ai| I A. I cond (A) "l 1 r V 44. Let A = 2 5 0 andb = 2 .1 -1 2_ .3. (a) Compute cond, (A). (b) Suppose A is changed to A' = . How large a relative change can this change produce in the solution to Ax = b? [Hint: Use inequality (1) from this section.] (c) Solve the systems using A and A' and determine the actual relative error. l" (d) Suppose b is changed to b' = . How large a relative change can this change produce in the solution to Ax = b? [Hint: Use Exercise 42.] (e) Solve the systems using b and b' and determine the actual relative error. 45. Show that if A is an invertible matrix, then cond(A) 2: 1 with respect to any matrix norm. [Hint: See Exercise 34 and Theorem 4.18(b) in Section 4.3.] cas in Exercises 48-51, write the given system in the form of Equation (7). 
Then use the method of Example 7.22 to estimate the number of iterations ofjacobi's method that will be needed to approximate the solution to three-decimal-place accuracy. (UsexQ — O.J Compare your answer with the solution computed in the given exercise from Section 2.5. 48. Exercise 1, Section 2.5 49. Exercise 3, Section 2.5 50. Exercise 4, Section 2.5 51. Exercise 5, Section 2.5 Exercise 52(c) refers to the Leontief model of an open economy, as discussed in Sections 2.4 and 3.7. 52. Let A be an n X n matrix such that ||A|| < 1, where the norm is either the sum norm or the max norm, (a) Prove that A" -> O as n -* ». (b.) Deduce from (a) that I — A is invertible and (I - A)'1 = I + A + A2 + A3 + ■ • • [Hint: See the proof of Theorem 3.34.] (c) Show that (b) can be used to prove Corollaries 3.35 and 3.36. Least Squares Approximation In many branches of science, experimental data are used to infer a mathematical relationship among the variables being measured. For example, we might measure the height of a tree at various points in time and try to deduce a function that expresses the tree's height h in terms of time t. Or, we might measure the size p of a population over time and try to find a rule that relates p to t. Relationships between variables are also of interest in business; for example, a company producing widgets may be interested in knowing the relationship between its total costs c and the number n of widgets produced. In each of these examples, the data come in the form of two measurements: one for the independent variable and one for the (supposedly) dependent variable. Thus, we have a set of data points (xt, y{), and we are looking for a function that best approximates the relationship between the independent variable X and the dependent variable y. Figure 7.9 shows examples in which experimental data points are plotted, along with a curve that approximately "fits" the data. Section 7.3 Least Squares Approximation 569 Figure 7.9 Curves of "best fit" English mathematician who, while a fellow at Cambridge, edited the second edition of Newton's Prin-cipia. Although he published little, he made important discoveries in the theory of logarithms, integral calculus, and numerical methods. The method of least squares, which we are about to consider, is attributed to Gauss. A new asteroid, Ceres, was discovered on New Year's Day, 1801, but it disappeared behind the sun shortly after it was observed. Astronomers predicted when and where Ceres would reappear, but their calculations differed greatly from those done, independently, by Gauss. Ceres reappeared on December 7, 1801, almost exactly where Gauss had predicted it would be. Although he did not disclose his methods at the time, Gauss had used his least squares approximation method, which he described in a paper in 1809. The same method was actually known earlier; Cotes anticipated the method in the early 18th century, and Legendre published a paper on it in 1806. Nevertheless, Gauss is generally given credit for the method of least squares approximation. We begin our exploration of approximation with a more general result. Chapter 7 Distance and Approximation The Best Approximation Theorem In the sciences, there are many problems that can be phrased generally as "What is the best approximation to X of type 7?" X might be a set of data points, a function, a vector, or many other things, while Y might be a particular type of function, a vector belonging to a certain vector space, etc. 
A typical example of such a problem is finding the vector w in a subspace W of a vector space V that best approximates (i.e., is closest to) a given vector v in V. This problem gives rise to the following definition. PefiPitiOn If Wis a subspace of a normcd linear space V and if v is a vector in V, then the best approximation to v in W is the vector v in W such that |]v — v|| < |jv — w|| for every vector w in W different from v. In R2 or R3, we are used to thinking of "shortest distance" as corresponding to "perpendicular distance." In algebraic terminology, "shortest distance" relates to the notion of orthogonal projection: If Wis a subspace of W and vis a vector in R", then we expect projw(v) to be the vector in W that is closest to v (Figure 7.10). Since orthogonal projection can be defined in any inner product space, we have the following theorem. Figure 7.10 If v -- projw-(v), then ||v — v|| < ||v - w|! for all w # v Theorem 7.8 Ihe Best Approximation Tlieorem If W is a finite-dimensional subspace of an inner product space V and if v is a vector in V, then proj w(v) is the best approximation to v in W. Let w be a vector in W different from projw(v). Then projw(v) - w is also in W, so v ~ proju-(v) = perpw(v) is orthogonal to projw(v) - w, by Exercise 43 in Section 7.1. Pythagoras' Theorem now implies that jjv - projw(v)|2 + jjprojH,(v) - w||2 = ||(v - projw(v)) + (projw(v) - w)f Section 7.3 Least Squares Approximation 511 as Figure 7.10 illustrates. However, ||projw(v) — wj|2 > 0, since w + projw(v), so j|v - projw(v)jj2 < ||v -- projw(v)j|2 + ||projw(v) - wf = ||v - wj|2 or, equivalently, ||v - projw(v)|| < ||v - w|| _SH Example 7.23 Letu, r 5" "3~ 2 .u2 = -2 , andv = 2 1_ .5. . Find the best approximation to v in the plane W = span (tlx, u2) and nndtne Euclidean distance from v to W. Solution The vector in W that best approximates v is projw(v). Since u: and u: are orthogonal, projw(v) u, - v u, -u, u, + u, • V u, u- 1 5 3 2 T 30 -2 = _2 . ••• 1. 1_ 1 L 5- The distance from v to Wis the distance from v to the point in W closest to v. But this distance is just ||perpw(v)|| = jjv — projvv(v)jj. We compute 3 3 0 v - projw(v) = 2 - _2 5 = 12 5 * 5 24 L 5 J so ||v - projw(v)i! = VO2 + (f)2 + (f)2 = which is the distance from v to W. 'f = 12V5/5 In Section 7.5, we will look at other examples of the Best Approximation Theorem when we explore the problem of approximating functions. Remark The orthogonal projection of a vector v onto a subspace W is defined in terms of an orthogonal basis for W. The Best Approximation Theorem gives us an alternative proof that proj w (v) does not depend on the choice of this basis, since there can be only one vector in W that is closest to v—namely, proj!V(v). Least Squares Approximation We now turn to the problem of finding a curve that "best fits" a set of data points. Before we can proceed, however, we need to define what we mean by "best fit." Suppose the data points (1, 2), (2, 2), and (3, 4) have arisen from measurements taken during some experiment. Also suppose we have reason to believe that the x and y values are related by a linear function; that is, we expect the points to lie on some line with equation/ = a + bx. 
If our measurements were accurate, all three points would satisfy this equation and we would have 2 = a + b • 1 2 = a + b • 2 4 = a + fo-3 Chapter 7 Distance and Approximation This is a system of three linear equations in two variables: a + b = 2 a + 2b = 2 or a + 3b = 4 Unfortunately, this system is inconsistent (since the three points do not lie on a straight line). So we will settle for a line that comes "as close as possible" to passing through our points. For any line, we will measure the vertical distance from each data point to the line (representing the errors in the v-direction), and then we will try to choose the line that minimizes the total error. Figure 7.11 illustrates the situation. 1 1 2' a 1 2 — 2 b 1 3 4 a + bx Figaro 1.11 Finding the line that minimizes e\ + e\ + S3 If the errors are denoted by elt e2> and e3, then we can form the error vector e e IS3J We want e to be as small as possible, so j|ej| must be as close to zero as possible. Which norm should we use? It turns out that the familiar Euclidean norm is the best choice. (The sum norm would also be a sensible choice, since ||e[js = + + e3 is the actual sum of the errors in Figure 7.11. However, the absolute value signs are hard to work with, and, as you will soon see, the choice of the Euclidean norm leads to some very nice formulas.) So we are going to minimize ||e| = Ve2 + ef + ef or, equivalently, ||e|j2 = e2 + s2 + e2 This is where the term "least squares" comes from: We need to find the smallest sum of squares, in the sense of the foregoing equation. The number |je|| is called the least squares error of the approximation. Section 7.3 Least Squares Approximation 573 From Figure 7.11, we also obtain the following formulas for e„ e2, and s3 in our example: Et = 2 - (a + b' 1) e2 = 2 - (a + b• 2) s3 = 4 - (a + 3) Example 7.24 Which of the following lines gives the smallest least squares error for the data points (1, 2), (2, 2), and (3, 4)? , (a) y = 1 + x (b) y = -2 + 2x (c) y = 1 + x Solution Table 7.1 shows the necessary calculations. Table 7.1 y = 1 + x _y = -2 + 2x_y = § + * 2 -(1 + 1)= 0 2 -(-2 + 2) = 2 2 -(f+l)=| 2 ~ (1 + 2) = -1 2 - (-2 + 4) = 0 2 - (| + 2) = -f 4 - (1 + 3) = 0 4 - (-2 + 6) = 0 4 - (| + 3) = | Figure 7.12 We see that the line y = | + x produces the smallest least squares error among these three lines. Figure 7.12 shows the data points and all three lines. i, S74 Chapter 7 Distance and Approximation It turns out that the line y = § + x in Example 7.24 gives the smallest least squares error of any line, even though it passes through none of the given points. The rest of this section is devoted to illustrating why this is so. In general, suppose we have n data points (Xi,)'i),..., (x„,yn) and a line y = a + bx. Our error vector is e = where e,- = y) — (a + bx,). The line y = a + bx that minimizes s\ + • • • + s2 is called the least squares approximating line (or the line of best fit) for the points (jq, y,),..., (x„,y„). As noted prior to Example 7.24, we can express this problem in matrix form. If the given points were actually on the line y = a + bx, then the «linear equations a + bxl = a + foc„ = yn would all be true (i.e., the system would be consistent). Our interest is in the case where the points are not collinear, in which case the system is inconsistent. In matrix form, we have " 1 1 a b = . 1 Xfl-. ->•„- which is of the form Ax = b, where A i v 1 X-, 1 x„ yi Jn The error vector e is just b — Ax (check this), and we want to minimize |[e|[2 or, equiv-alently, jjejj. 
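As a quick numerical check of Example 7.24, the least squares error ||e|| = ||b − Ax|| can be evaluated directly for each candidate line. The short computation below is a sketch in Python with NumPy (standing in for the CAS mentioned in the exercises; it is an illustration only and not part of the text), with x holding the coefficients [a, b] of the line y = a + bx.

```python
import numpy as np

# Data points (1, 2), (2, 2), (3, 4) in the matrix form Ax = b, with x = [a, b]
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 2.0, 4.0])

# Candidate lines y = a + bx from Example 7.24
candidates = {
    "y = 1 + x":   np.array([1.0, 1.0]),
    "y = -2 + 2x": np.array([-2.0, 2.0]),
    "y = 2/3 + x": np.array([2.0 / 3.0, 1.0]),
}

for label, x in candidates.items():
    e = b - A @ x                                   # error vector e = b - Ax
    print(label, "||e|| =", round(np.linalg.norm(e), 3))

# Expected: 1.0, 2.0, and 0.816 -- the third line gives the smallest error
```

The three values agree with the comparison in Example 7.24, and the third line is exactly the least squares approximating line found below.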
We can therefore rephrase our problem in terms of matrices as follows. D6filliti0ll If A is an m X n matrix and b is in Rm, a least squares solution of Ax = b is a vector x in IR" such that ■ jib - Axil < i|b - Ax|| for all x in W. Section 7.3 Least Squares Approximation 575 Solution of the Least Squares Problem Any vector of the form Ax is in the column space of A, and as x varies over all vectors in R", Ax varies over all vectors in col(A). A least squares solution of Ax = b is therefore equivalent to a vector y in col (A) such that ||b - yfl < [|b - y|i for all y in col(A). In other words, we need the closest vector in col(A) to b. By the Best Approximation Theorem, the vector we want is the orthogonal projection of b onto cql(A). Thus, if x is a least squares solution of Ax = b, we have Ax = projcol(A)(b) (1) In order to find x, it would appear that we need to first compute projcol(A)(b) and then solve the system (1). However, there is a better way to proceed. We know that b - Ax = b - projcolU)(b) = perpcoI(A)(b) is orthogonal to col(A). So b - Ax is in (col(A))1 = null(AJ). Therefore Ar(b - Ax) = 0, which, in turn, is equivalent to Arb - ArAx = 0 or A7Ax = Arb This represents a system of equations known as the normal equations for x. We have just established that the solutions of the normal equations for x are precisely the least squares solutions of Ax — b. This proves the first part of the following theorem. Theorem 7.9 The Least Squares Theorem Let A be an m X n matrix and let b be in Rm. Then Ax = b always has at least one least squares solution x. Moreover: a. x is a least squares solution of Ax = b if and only if x is a solution of the normal equations A7 Ax = A1 b. b. A has linearly independent columns if and only if ATA is invertible. In this case, the least squares solution of Ax = b is unique and is given by x = (A''A)-'Arb Proof We have already established property (a). For property (b), we note that the n columns of A are linearly independent if and only if rank(A) = n. But this is true if and only if A7A is invertible, by Theorem 3.28. If A1 A is invertible, then the unique solution of A7Ax = A' b is clearly x = (ATA)'1 Arb. _ 576 Chapter 7 Distance and Approximation Example 7.25 Find a least squares solution to the inconsistent system Ax = b, where A = 1 5" "3" 2 -2 and b = 2 .-1 1. .5. Solution We compute A A 6 0 0 30 1 2 -1" ' 2 and Arb = 5 -2 1 2 — 16 The normal equations ATAx = ATb are just "6 0" x = ~21 .0 30_ .16J which yield x The fact that this solution is unique was guaranteed by Theorem 7.9(b), since the columns of A are clearly linearly independent. Remark We could have phrased Example 7.25 as follows: Find the best approximation to b in the column space of A. The resulting equations give the system Ax = b whose least squares solution we just found. (Verify this.) In this case, the components of x are the coefficients of that linear combination of the columns of A that produces the best approximation to b—namely, f 5" 3" 2 + h -2 = 2 ~5 .-1. 1. 1 - 5- This is exactly the result of Example 7.23. Compare the two approaches. Example 7.26 Find the least squares approximating line for the data points (1, 2), (2, 2), and (3, 4) from Example 7.24. We have already seen that the corresponding system Ax = b is "l r ~2~ a 1 2 — 2 b A 3. .4. Section 7.3 Least Squares Approximation 577 where y — a + bx is the line we seek. Since the columns of A are clearly linearly independent, there will be a unique least squares solution, by part (b) of the Least Squares Theorem. 
We compute r [ill A A = U 2 3 Hence, we can solve the normal equations A1 Ax = A!b, using Gaussian elimination to obtain [A1 A | Arb] = 1 1 1 2 1 3 3 6 6 14 and A'b tu - 2 "l 1 1 2 = 8" 1 2 3_ 4. "3 6 8 -y 1 0 2 3 6 14 18_ 0 1 1 So x , from which we see that a = I, = 1 are the coefficients of the least squares approximating line: y = j + x. 4 The line we just found is the line in Example 7.24(c), so we have justified our claim that this line produces the smallest least squares error for the data points (1,2), (2, 2), and (3, 4). Notice that if x is a least squares solution of Ax = b, we may compute the least squares error as Hell = lib - Ax|| Since Ax = projcol(yl)(b), this is just the length of perpcol{A)(b)-from b to the column space'of A. In Example 7.26, we had -that is, the distance Ax ~1 "l r - 2" 1" 3 2 _ 1 2 3 = 2 3 1 .4. .1 3, - 3- so, as in Example 7.24(c), we have a least squares error of ||e|| 0.816. Note that the columns of A in Example 7.26 are linearly independent, so (ATA)~1 exists, and we could calculate x as x = (AJA)_1A7b. However, it is almost always easier to solve the normal equations using Gaussian elimination (or to let your CAS do it for you!). It is interesting to look at Example 7.26 from two different geometric points of view. On the one hand, we have the least squares approximating line y = | + x, with corresponding errors el — j, e2 = —f, and s3 = j, as shown in Figure 7.13(a). Equivalently, we have the projection of b onto the column space of A, as shown in Figure 7.13(b). Here, projcoiu) (b) Ax = 1 1 1 2 1 3 578 Chapter 7 Distance and Approximation y Figure 7.13 and the least squares error vector is e = if the data points were collinear?] . [What would Figure 7.13(b) look like Example 7.27 Find the least squares approximating line and the least squares error for the points (1,1), (2, 2), (3, 2), and (4, 3). Solution Let y = a + bx be the equation of the line we seek. Then, substituting the four points into this equation, we obtain a + b = 1 a + 2b = 2 a + 3b = 2 a + 4b = 3 So we want the least squares solution of Ax = b, where or "1 r "1" 1 2 a 2 1 3 X 2 1 4 3 "1 1" T 1 2 2 and b = 1 3 2 J 4_ 3 Since the columns of a are linearly independent, the solution we want is x = {ATAYlATb = • fc (Check this calculation.) Therefore, we take a = j and ib = f, producing the least squares approximating line y = 5 + fx, as shown in Figure 7.14. Section /.3 Least Squares Approximation 579 ► x figuro 7.14 Since Ax "1" "1 r 2 1 2 2 1 3 3 1 4 the least squares error is j|e|| = V5/5 ~ 0.447. We can use the method of least squares to approximate data points by curves other than straight lines. Example 7.28 Find the parabola that gives the best least squares approximation to the points (—1,1), (0, -1), (1,0), and (2, 2). Solution Ihe equation of a parabola is a quadratic y — a + bx + cx2. Substituting the given points into this quadratic, we obtain the linear system (I - b + c = 1 "1 -1 1 a = -1 1 0 0 + b + c or a = 0 1 1 1 a + 2b + Ac = 2 1 2 4 Thus, we want the least squares approximation of Ax = b, where A = "1 -1 r 1 0 0 1 1 i 1 2 4 and b = Chapter 7 Distance and Approximation We compute ~4 2 6 2 6 8 .6 8 18 so the normal equations are given by A'A and AJb 3jk - "4 2 6~ "2~ 2 6 8 x = 3 _6 8 18. .9. whose solution is Thus, the least squares approximating parabola has the equation as shown in Figure 7.15. 
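The normal-equation computations in Examples 7.26 through 7.28 are easy to reproduce numerically. The sketch below (Python with NumPy; an illustration only, not part of the examples) sets up the design matrix for the parabola of Example 7.28 and solves A^T A x = A^T b directly.

```python
import numpy as np

# Example 7.28: fit the parabola y = a + b*x + c*x^2 to (-1, 1), (0, -1), (1, 0), (2, 2)
xs = np.array([-1.0, 0.0, 1.0, 2.0])
ys = np.array([1.0, -1.0, 0.0, 2.0])

# Design matrix with columns 1, x, x^2
A = np.column_stack([np.ones_like(xs), xs, xs**2])

ATA = A.T @ A          # [[4, 2, 6], [2, 6, 8], [6, 8, 18]], as in the text
ATb = A.T @ ys         # [2, 3, 9]
coeffs = np.linalg.solve(ATA, ATb)

print(coeffs)          # approximately [-0.7, -0.6, 1.0]
```

For larger problems, np.linalg.lstsq(A, ys, rcond=None) returns the same least squares solution without forming A^T A explicitly, which is preferable when the normal equations are ill-conditioned, a point taken up again with the QR factorization later in this section.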
y = -is - !* + x2 figure 7.15 A least squares approximating parabola 4 One of the important uses of least squares approximation is to estimate constants associated with various processes. The next example illustrates this application in the context of population growth. Recall from Section 6.7 that a population that is growing (or decaying) exponentially satisfies an equation of the form pit) = cekt, where p{t) is the size of the population at time t and c and k are constants. Clearly, c = p(0), but k is not so easy to determine. It is easy to see that k = P'it) Pit) which explains why k is sometimes referred to as the relative growth rate of the population: It is the ratio of the growth rate p'it) to the size of the population pit). Section 7.3 Least Squares Approximation 581 cas Example 7.29 Table 7.2 Population Year (in billions) 1950 2.56 1960 3.04 1970 3.71 1980 4.46 1990 5.28 2000 6.08 Source: U.S. Bureau of the Census, International Data Base Table 7.2 gives the population of the world at 10-year intervals for the second half of the 20th century. Assuming an exponential growth model, find the relative growth rate and predict the world's population in 2010. vnt'i'a* Let's agree to measure time t in 10-year intervals so that I = 0 is 1950, t ~ 1 is 1960, and so on. Since c = p(0) = 2.56, the equation for the growth rate of the population is p = 2.56ekt How can we use the method of least squares on this equation? If we take the natural logarithm of both sides, we convert the equation into a linear one: In p = ln(2.56efc0 = In 2,56 + ln(e*') « 0.94 + kt Plugging in the values of t and p from Table 7.2 yields the following system (where we have rounded calculations to three decimal places): 0.94 = 0.94 k = 0.172 2k = 0.371 3k = 0.555 4k = 0.724 5k = 0.865 We can ignore the first equation (it just corresponds to the initial condition c = p(0) = 2.56). The remaining equations correspond to a system Ax = b, with A = "l" ' 0.172" 2 0.371 3 and b = 0.555 4 0.724 .5. _ 0.865 _ Since ArA = 55 and ATb - 9.80, the corresponding normal equations are just the single equation 55x = 9.80 Therefore, k = x: = 9.80/55 0.178. Consequently, the least squares solution has the formp = 2.56eomt (see Figure 7.16). The world's population in 2010 corresponds to t = 6, from which we obtain p(6) = 2.56e' 0.178(6) 7.448 Thus, if our model is accurate, there will be approximately 7.45 billion people on Earth in the year 2010. (The U.S. Census Bureau estimates that the global population will be "only" 6.82 billion in 2010. Why do you think our estimate is higher?) Chapter 7 Distance and Approximation Figure 7.16 Least Squares via the Off Factorization It is often the case that the normal equations for a least squares problem are ill-conditioned. Therefore, a small numerical error in performing Gaussian elimination will result in a large error in the least squares solution. Consequently, in practice, other methods are usually used to compute least squares approximations. It turns out that the QR factorization of A yields a more reliable way of computing the least squares approximation of Ax = b. Theorem 1.10 Let A be an m X n matrix with linearly independent columns and let b be in Um. If A = QR is a QR factorization of A, then the unique least squares solution x of Ax = b is x R Qrb Recall from Theorem 5.16 that the QR factorization A = QR involves an m X n matrix Q with orthonormal columns and an invertible upper triangular matrix R. 
From the Least Squares Theorem, we have ArAx = Arb (QRfQRx = (QR)Tb => RTQTQRx = R'Q'b => RTRx = R'Q'b H » since Q1Q = I. (Why?) Since R is invertible, so is RT, and hence we have _Rx = Qrb or, equivalently, I = R^QTb "^M Remark Since R is upper triangular, in practice it is easier to solve Rx Q'b directly than to invert R and compute jR^Q'b. Section 7.3 Least Squares Approximation 583 - - ! 38 Use the QR factorization to find a least squares solution of Ax = b, where A = " 1 2 2~ " 2" -1 1 2 -3 and b = -1 0 1 -2 1 1 2_ 0_ Solution From Example 5.15, 1/2 3 A = QR = 3/10 1/2 3V5/10 1/2 V5/10 1/2 V5/10 •V6/6 0 V6/6 V6/3 V5 1/2' 3V5/2 V6/2. We have 1/2 -1/2 -1/2 1/2 " 2" -3 _ 7/2* QTb = 3V5/10 3V5/10 V5/10 Vs/io = -V5/2 . - V6/6 0 V6/6 V6/3 . 0 „-2V6/3_ so we require the solution to Rx = Qrb, or '2 1 1/2 7/2 0 V5 3V5/2 x = -V5/2 _0 0 V6/2 . .-2V6/3. Back substitution quickly yields 4/3" x = 3/2 4/3 _ Orthogonal Projection Revisited One of the nice byproducts of the least squares method is a new formula for the orthogonal projection of a vector onto a subspace of Wn. Theorem 7.11 Let W be a subspace of R"! and let A be an m X n matrix whose columns form a basis for W. If v is any vector in U"\ then the orthogonal projection of v onto W is the vector The linear transformation P as its standard matrix. projw-(v) = A{ArAYlATv -> W that projects W" onto Whas A(ATA)~1A1 PrOOt Given the way we have constructed A, its column space is W. Since the columns of A are linearly independent, the Least Squares Theorem guarantees that there is a unique least squares solution to Ax = v given by x = (ATAylATv Chapter 7 Distance and Approximation tit mm 1.31 By Equation (1), Therefore, Ax = projcol(A)(v) = projw(v) projw(v) = A((ATA) A'v) = (A{ATAYlAT)v as required. Since this equation holds for all v in Rm, the last statement of the theorem follows immediately. — We will illustrate Theorem 7.11 by revisiting Example 5.11. Find the orthogonal projection of v 3 -1 2 onto the plane W in R3 with equation x - y + 2z = 0, and give the standard matrix of the orthogonal projection transformation onto W. SolltlQB As in Example 5.11, we will take as a basis for W the set We form the matrix A = -1 1 1 with these basis vectors as its columns. Then A'A = 1 1 0 Til 1 -1 1 1 0 1 2 0 0 3 so (ATAY By Theorem 7.11, the standard matrix of the orthogonal projection transformation onto Wis A(ÄrA)"'1AT = A so the orthogonal projection of v onto W is 1 -1 1 1 0 1 projw(v) = A(ATA)-lATv 1 1 0 -1 1 1 which agrees with our solution to Example 5.11. Section 7.3 Least Squares Approximation 585 Remark Since the projection of a vector onto a subspace W is unique, the standard matrix of this linear transformation (as given by Theorem 7.11) cannot depend on the choice of basis for W. In other words, with a different basis for W, we have a H j> different matrix A, but the matrix A(ATAYlAT will be the same! (You are asked to verify this in Exercise 43.) The Pseudoinverse of a Matrix If A is an n X n matrix with linearly independent columns, then it is invertible, and the unique solution to Ax = b is x = A~'b. If m > n and A is m X n with linearly independent columns, then Ax = b has no exact solution, but the best approximation is given by the unique least squares solution x = (A1 A) ~1A1 b. The matrix (ArA) ~1A r therefore plays the role of an "inverse of A" in this situation. 
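Theorem 7.11 is also easy to check numerically. The sketch below (Python with NumPy; an illustration only, not part of the text) forms the projection matrix P = A(A^T A)^{-1} A^T using the orthogonal basis for W from Example 7.31 and applies it to v.

```python
import numpy as np

# Example 7.31: W is the plane x - y + 2z = 0; its basis vectors are the columns of A
A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [0.0,  1.0]])
v = np.array([3.0, -1.0, 2.0])

# Standard matrix of the orthogonal projection onto W (Theorem 7.11)
P = A @ np.linalg.inv(A.T @ A) @ A.T

w = P @ v
print(w)                          # approximately [ 1.667, 0.333, -0.667 ]
print(w[0] - w[1] + 2 * w[2])     # ~0, so the projection lies in the plane W
print(np.allclose(P @ P, P))      # True: projecting twice is the same as projecting once
```

For the least squares problem itself, np.linalg.qr(A) followed by solving Rx = Q^T b carries out the method of Theorem 7.10, avoiding the normal equations entirely.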
Definition If A is a matrix with linearly independent columns, then the pseudoinverse of A is the matrix A + defined by A+ = (A'AT'A1 Observe that if A is m X n, then A+ is n X m. Example 7.32 "l l1 Find the pseudoinverse of A = 1 2 .1 3. Solution We have already done most of the calculations in Example 7.26. Using our previous work, we have A+ = (ATA)~1AT = 3 -1" i l 1" - 4 1 3 3 2" 3 L-i I l 2 3. -\ o 1 2- The pseudoinverse is a convenient shorthand notation for some of the concepts we have been exploring. For example, if A is m X n with linearly independent columns, the least squares solution of Ax = b is given by x - A b and the standard matrix of the orthogonal projection P from Um onto col(A) is [P] = AA + If A is actually a square matrix, then it is easy to show that A* = A^1 (see Exercise 53). In this case, the least squares solution of Ax = b is the exact solution, since x = A b = A 'b = x Chapter 7 Distance and Approximation The projection matrix becomes [P] = AA+ = AA~~l = I. (What is the geometric interpretation of this equality?) Theorem 7.12 summarizes the key properties of the pseudoinverse of a matrix. (Before reading the proof of this theorem, verify these properties for the matrix in Example 7.32.) TheOreM 7.12 Let A be a matrix with linearly independent columns. Then the pseudoinverse A+ of A satisfies the following properties, called the Penrose conditions for A: AA+A =* A b. A+AA+ = A+ c. AA+ and A+A are symmetric. We prove condition (a) and half of condition (c) and leave the proofs of the remaining conditions as Exercises 54 and 55. (a) We compute AA-A = AdA^y'A^A = a{ataY1{ata) = AI = A (c) By Theorem 3.4, A1 A is symmetric. Therefore, (A'A)'1 is also symmetric, by Exercise 46 in Section 3.3. Taking the transpose of AA+, we have (AA'V = (AiA'AyW = (AT)T((ATA)-l)TAT = A{ATA)lAT = AA+ _ Exercise 56 explores further properties of the pseudoinverse. In the next section, we will see how to extend the definition of A+ to handle all matrices, whether or not the columns of A are linearly independent. Exercises 7.3_ CAS In Exercises 1-3, consider the data points (1, 0), (2, 1), and (3, 5). Compute the least squares error for the given line. In each case, plot the points and the line. l.y = -2 + 2% 2. y = x 3. y « -3 + \x In Exercises 4-6, consider the data points ( — 5, 3), (0, 3), (5, 2), and (10, 0). Compute the least squares error for the given line. In each case, plot the points and the line. 4. v = 3 - \x 5. y = | 6. y = 2 - \x In Exercises 7-14, find the least squares approximating line for the given points and compute the corresponding least squares error. 7. (1,0), (2, 1), (3, 5) 8. (1,6), (2, 3), (3, 1) 9. (0, 4), (1, 1), (2, 0) 10. (0, 3), (1, 3), (2, 5) 11. (-5, -1), (0, 1), (5, 2), (10, 4) Section 7.3 Least Squares Approximation 387 12. (-5, 3), (0, 3), (5, 2), (10, 0) 13. (1,1), (2, 3), (3, 4), (4, 5), (5, 7) 14. (I, 10), (2, 8), (3, 5), (4, 3), (5,0) In Exercises 15-18, find the least squares approximating parabola for the given points, 15. (1,1), (2,-2), (3, 3), (4, 4) 16. (1,6), (2, 0),(3,0), (4, 2) 17. (-2, 4), (-1,7), (0, 3), (1,0), (2,-1) 18. (-2, 0), (-1, -11), (0, -10), (1, -9), (2, 8) In Exercises 19-22, find a least squares solution of Ax -- b by constructing and solving the normal equations. "3 1 1 19. A = 1 1 ,b = 1 „1 2. J "l 2" 20. A = 3 I ,b = 2 I. "1 2~ 0 3 ,b = 21. A = 2 5 3 0 4' 1 -2 4 1 0" " 1 2 -1 ,b = 5 -1 1 -1 0 2 2 22. 
A In Exercises 23 and 24, show that the least squares solution of Ax = b is not unique and solve the normal equations to find all the least squares solutions. 23. A = "1 1 0 0" ■ r 1 0 1 1 -3 ,b = 0 -1 1 1 2 J -1 1 0_ 4- "0 1 1 0" 5 1 -1 1 — 1 ,b = 3 1 0 1 0 -1 1 1 1 1 1 24. A = In Exercises 25 and 26, find the best approximation to a solution of the given system of equations. 25. x + y - z = 2 26. 2x + 3y + z = 21 —y + 2z= 6 x + y + z 7 3x + 2y — z — 11 ~ x + y — z = 14 —x + z = 0 2y + z = 0 In Exercises 27 and 28, a QR factorization of A is given. Use it to find a least squares solution of Ax = b. "2 r ~2 3 1 ~ 3 27. A = 2 0 ,Q = 2 3 2 .1 i_ 1 L3 2 3- b 2 3 -1 28. A = R = 1 2 .-1 V6 0 Q ■V6/2" 1/V2 1/V6 2/V6 .-1/V6 f 1/V2 0 1/V2. b - 29. A tennis ball is dropped from various heights, and the height of the ball on the first bounce is measured. Use the data in Table 7.3 to find the least squares approximating line for bounce height b as a linear function of initial height h. h (cm) b (cm) 20 14.5 40 31 48 36 60 45.5 80 59 100 73.5 30. Hooke's Law states that the length L of a spring is a linear function of the force F applied to it. (See Figure 7.17 and Example 6.92.) Accordingly, there are constants a and b such that L = a + bF Table 7.4 shows the results of attaching various weights to a spring. Talis IJ F (oz) 2 L (in.) 7.4 4 9.6 11.5 13.6 Chapter 7 Distance and Approximation ISIS« 11 Year of Birth 1920 1930 1940 1950 1960 1970 1980 1990 Life Expectancy (years) 54.1 59.7 62.9 68.2 69.7 70.8 73.7 75.4 Source: World Almanac and Book of Facts. New York: World Almanac Books, 1999 (a) Determine the constants a and b by finding the least squares approximating line for thes^data. What does a represent? (b) Estimate the length of the spring when a weight of 5 ounces is attached. 31. Table 7.5 gives life expectancies for people born in the United States in the given years. (a) Determine the least squares approximating line for these data and use it to predict the life expectancy of someone born in 2000. (b) How good is this model? Explain. 32. When an object is thrown straight up into the air, Newton's Second Law of Motion states that its height s(t) at time t is given by s(t) = s0 + v0r + \gt2 where v0 is its initial velocity and g is the constant of acceleration due to gravity. Suppose we take the measurements shown in Table 7.6. Time(s) 0.5 1 1.5 2 3 Height (m) 11 17 21 23 18 (a) Find the least squares approximating quadratic for these data. (b) Estimate the height at which the object was released (in m), its initial velocity (in m/s), and its acceleration due to gravity (in m/s2). (c) Approximately when will the object hit the ground? 33. Table 7.7 gives the population of the United States at 10-year intervals for the years 1950-2000. (a) Assuming an exponential growth model of the form p(t) = cekt, where p(t) is the population at time r, use least squares to find the equation for the growth rate of the population. [Hint: Let t = 0 be 1950.] (b) Use the equation to estimate the U.S. population in 2010. filli 7.7 Population Year (in millions) 1950 150 1960 179 1970 203 1980 227 1990 250 2000 281 Source: U.S. Bureau of the Census 34. Table 7.8 shows average major league baseball salaries for,the years 1970-2005. (a) Find the least squares approximating quadratic for these data. (b) Find the least squares approximating exponential for these data. (c) Which equation gives the better approximation? Why? (d) What do you estimate the average major league baseball salary will be in 2010 and 2015? 
Iltilf I.I Average Salary Year (thousands of dollars) 1970 29.3 1975 44.7 1980 143.8 1985 371.6 1990 597.5 1995 1110.8 2000 1895.6 2005 2476.6 Source: Major Leaf ;ue Baseball Players Association Section 7.3 Least Squares Approximation S89 35. A 200 mg sample of radioactive polonium-210 is observed as it decays. Table 7.9 shows the mass remaining at various times. Assuming an exponential decay model, use least squares to find the half-life of polonium-210. (See Section 6.7.) Table 7.9 Time (days) Mass (mg) 0 200 30 172 60 148 90 128 36. Find the plane z = a + bx + cy that best fits the data points (0, -4, 0), (5, 0, 0), (4, -1,1), (1, -3,1), and (-1, ~5,-2). In Exercises 37-42, find the standard matrix of the orthogonal projection onto the subspace W. Then use this matrix to find the orthogonal projection ofv onto W. 37. W = span 38. W = span 39. W = span 40. W = span 41. W = span 42. W = span I v = I v = , V = "l" , v = 0 J .0. 1 0 -1 V 2 / .3. 43. Verify that the standard matrix of the projection onto Win Example 7.31 (as constructed by Theorem 7.11) does not depend on the choice of basis. Take as a basis for W and repeat the calculations to show that the resulting projection matrix is the same. 44. Let A be a matrix with linearly independent columns and let P = A(ATA)'1AT be the matrix of orthogonal projection onto col(A). (a) Show that P is symmetric. (b) Show that P is idempotent. In Exercises 45-52, compute the pseudoinverse of A. l" 45. A = 47. A = 49. A 51. A 1 2. 1 3 -1 1 0 2 1 1 0 1 '10 0' 1 0 1 0 1 1 1 1 1 46. A = 48. A = 50. A 52. A -1 2 1 3 3 1 0 1 1 1 0 0 53. (a) Show that if A is a square matrix with linearly independent columns, then A+ = A-1, (b) If A is an m X n matrix with orthonormal columns, what is A+? 54. Prove Theorem 7.12(b). 55. Prove the remaining part of Theorem 7.12(c). 56. Let A be a matrix with linearly independent columns. Prove the following: (a) (cA)T = (l/c)A+ for all scalars c# 0. (b) (A+)+ = A if A is a square matrix. (c) (A7)"*" = (A+)rifA is a square matrix. 57. Let n data points (x1;y{),..., (x„,yn) be given. Show that if the points do not all lie on the same vertical line, then they have a unique least squares approximating line. 58. Let n data points (xuyi),..., (xn,yn) be given. Generalize Exercise 57 to show that if at least k + 1 of distinct, then the given points have a unique least squares approximating polynomial of degree at most k. 590 Chapter 7 Distance and Approximation The Singular Value Decomposition In Chapter 5, we saw that every symmetric matrix A can be factored as A = PDP r, where P is an orthogonal matrix and D is a diagonal matrix displaying the eigenvalues of A. If A is not symmetric, such a factorization is not possible, but as we learned in Chapter 4, we may still be able to factor a square matrix A as A = PDP"1, where D is as before but P is now simply an invertible matrix. However, not every matrix is diagonalizable, so it may surprise you that we will now show that every matrix (symmetric or not, square or not) has a factorization of the form A = PDQ T, where P and Q are orthogonal and D is a diagonal matrix! This remarkable result is the singular value decomposition (SVD), and it is one of the most important of all matrix factorizations. In this section, we will show how to compute the SVD of a matrix and then consider some of its many applications. Along the way, we will tie up some loose ends by answering a few questions that were left open in previous sections. 
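A quick numerical illustration of this claim may be helpful before we develop the theory. The sketch below (Python with NumPy; an illustration only, not part of the text) factors a small non-symmetric, non-square matrix, the same matrix analyzed in Example 7.33 below, and confirms that the two outer factors are orthogonal and that the middle factor is "diagonal."

```python
import numpy as np

# A small non-symmetric, non-square matrix (the matrix of Example 7.33)
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)       # s holds the singular values, in decreasing order
print(s)                          # approximately [1.732, 1.0], i.e. sqrt(3) and 1

# Rebuild the 3 x 2 "diagonal" factor and verify the factorization A = U Sigma V^T
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))                      # True
print(np.allclose(U.T @ U, np.eye(3)),
      np.allclose(Vt @ Vt.T, np.eye(2)))                   # True True: U and V are orthogonal
```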
The Singular Values of a Matrix For any m X n matrix A, the « X n matrix A1 A is symmetric and hence can be orthogonally diagonalized, by the Spectral Theorem. Not only are the eigenvalues of ArA all real (Theorem 5.18), they are all nonnegative. To show this, let A be an eigenvalue of ATA with corresponding unit eigenvector v. Then 0 < ||Av||2 = (Av) • (Av) = (Av)rAv = vrATAv = vrAv = A(vv) = Ajjvj|2 = A It therefore makes sense to take (positive) square roots of these eigenvalues. Definition If A is an m X n matrix, the singular values of A are the square roots of the eigenvalues of ATA and are denoted by ax,..., crn. It is conventional to arrange the singular values so that crx 5: a2 s • • • > an. Find the singular values of Solution The matrix A 1 1 1 0 0 1 "l 1 1 1 0 2 I 1 0 35= 1 0 1_ 1 0 1 A A has eigenvalues A! = 3 and A2 = 1. Consequently, the singular values of A are ax = ^Tx = V3andcr2 = VXJ = 1. Section 7,4 The Singular Value Decomposition To understand the significance of the singular values of an m X n matrix A, consider the eigenvectors of ATA. Since ATA is symmetric, we know that there is an orthonormal basis for Un that consists of eigenvectors of ATA. Let {v1;..., v„} be such a basis corresponding to the eigenvalues of A1 A, ordered so that Aj 2: A2 > • • • > A„. From our calculations just before the definition, Therefore, A; = \\Avtf c7: = VA^ = |JAVj| In other words, the singular values of A are the lengths of the vectors Avv ..., Av„. Geometrically, this result has a nice interpretation. Consider Example 7.33 again. If x lies on the unit circle in R2 (i.e., ||xj| = 1), then ijAxi!2 = (Ax) • (Ax) = (Ax)r(Ax) x'ArAx '2 r hi .1 2_ 1*2. = 2%2 + 2x,x2 + 2x\ which we recognize is a quadratic form. By Theorem 5.23, the maximum and minimum values of this quadratic form, subject to the constraint ||x|| = 1, are A, = 3 and A2 = 1, respectively, and they occur at the corresponding eigenvectors of ATA—that is, when x = Vj = 1/V2 and x = v, = -1/V2 1/V2 , respectively. Since IIAvJ v/A'Av; = A; for i = 1, 2, we see that al ~ jjAvjjj = V3 and cr2 = \\Av2\\ = 1 are the maximum and minimum values of the lengths ||Ax| as x traverses the unit circle in R . Now, the linear transformation corresponding to A maps R2 onto the plane in R3 with equation x — y — z = 0 (verify this), and the image of the unit circle under this transformation is an ellipse that lies in this plane. (We will verify this fact in general shortly; see Figure 7.18.) So arl and 0 and a2 — ■— or,. > 0 and it,.+ i = crr+2 = '' • = cr„ = 0. Then there exist an m X m orthogonal matrix L/, an n X n orthogonal matrix V, and an m X n matrix 2 of the form shown in Equation (1) such that A = VtVT A factorization of A as in Theorem 7.13 is called a singular value decomposition (SVD) of A. The columns of U are called left singular vectors of A, and the columns of V are called right singular vectors of A. The matrices U and V are not uniquely determined by A, but 2 must contain the singular values of A, as in Equation (1). (See Exercise 25.) Example 7.34 Find a singular value decomposition for the following matrices: i r (a) A 1 1 0 0 0 1 (b) A 1 0 0 1 594 Chapter 7 Distance and Approximation Solution (a) We compute "l 1 (f ArA 1 1 0 _0 0 1. and find that its eigenvalues are Ai = 2, A, = 1, ( eigenvectors 1 0 -1 1 0 > 1 0 1 0 (Verify this.) These vectors are orthogonal, so we normalize them to obtain "l/vr "o" "-1/V2" 1/V2 0 > v3 = 1/V2 0 .1 0 The singular values of A are cr. = V2, a2 = VI = 1, and o", = Vo = 0. 
Thus, and X V 1/V2 0 -1/V2 1/V2 0 1/V2 0 1 0 V2 0 0 0 1 0 To find U, we compute 1 and Ui = —Av, = —7= o-i V2 1 1 U2 = -Ay2 = - 1 1 0 0 0 I 1/V2' 1/V2 0 "o~ "1 1 0 0 0 ss „0 0 1 1 1 These vectors already form an orthonormal basis (the standard basis) for [R2, so we have p 0 10 1 This yields the SVD A = "1 1 0' "1 0" "V2 0 0" .0 0 1. _o 1. . 0 1 o_ 1/V2 1/V2 0' 0 0 1 -1/V2 1/V2 0. utv1 a » which can be easily checked. (Note that V had to be transposed. Also note that the singular value cr3 does not appear in 2.) Section 7.4 The Singular Value Decomposition S95 (b) This is the matrix in Example 7.33, so we already know that the singular values are r , , rl/V2] , [-1/V2 cr, = V3 and o~2 — 1, corresponding to Vj = 1/V2 and v2 1/V2 . So 2 = V3 0 0 1 0 0 and V = 1/V2 -1/V2 1/V2 1/V2 For L7, we compute 1 1 —Av, = —= o"i V3 1 1 1 0 0 1 1/V2 1/V2 2/V6' 1/V6 1/V6 and u, = —Av, 1 1 1 0 0 1 ■1/V2 1/V2 0 ■1/V2 1/V2 This time, we need to extend {u^ u2} to an orthonormal basis for R . There are several ways to proceed; one method is to use the Gram-Schmidt Process, as in Example 5.14. We first need to find a linearly independent set of three vectors that contains Ui and u2. If e3 is the third standard basis vector in R , it is clear that {u1; u2, e3} is linearly independent. (Here, you should be able to determine this by inspection, but a reliable method to use in general is to row reduce the matrix with these vectors as its columns and use the Fundamental Theorem.) Applying Gram-Schmidt (with normalization) to {%, u2, e3} (only the last step is needed), we find -i/vT 1/V3 1/V3. so U = 2/V6 0 -1/V3' 1/V6 -1/V2 1/V3 i/y/6 1/V2 1/V3, and we have the SVD "l f "2/V6 0 -1/V3" " V3 0" A = 1 0 = 1/V6 -1/V2 1/V3 0 1 _o 1. .1/V6 1/V2 1/V3_ . 0 0. 1/V2 1/V2 -1/V2 1/V2_ U2Vr There is another form of the singular value decomposition, analogous to the spectral decomposition of a symmetric matrix. It is obtained from the SVD by an outer product expansion and is very useful in applications. We can obtain this version of the SVD by imitating what we did to obtain the spectral decomposition. 596 Chapter 7 Distance and Approximation Accordingly, we have A = U£Vr = [u, ■ ... o 0 0 • • • ar 0 0 "ml [-1 0 -T, = cr,U, crrurJ crr. 0 o-, 0 •••.or, u,J[0] = (TjUiV,1 + • • • + 0-.UrV7 using block multiplication and the column-row representation of the product. The following theorem summarizes the process for obtaining this outer product form of theSVD. Theorem 7.14 The Outer Product Form of the SVD Let A be an m X n matrix with singular values cr,. > 0 and ar+l = cry+j = • • • = cr„ = 0. Let Uj,..., u,. be left singular vectors and let v,,..., v,. be right singular vectors of A corresponding to these singular values. Then A = cr1u1V1' + • • • + crrurvr7 Remark If A is a positive definite, symmetric matrix, then Theorems 7.13 and 7.14 both reduce to results that we already know. In this case, it is not hard to show that the SVD generalizes the Spectral Theorem and that Theorem 7.14 generalizes the spectral decomposition. (See Exercise 27.) The SVD of a matrix A contains much important information about A, as outlined in the crucial Theorem 7.15. Section 7.4 The Singular Value Decomposition 51? TllCOreill 7.15 LetA = C/2V3 be a singular value decomposition of an m X n matrix A. Let a crr be all the nonzero singular values of A. Then: a. The rank of A is r. b. {ui,..., u,J is an orthonormal basis for col(A). c. {ur+1,..., um} is an orthonormal basis for null(AT). d. {V],..., vr} is an orthonormal basis for row(A). e. 
|vr+v„} is an orthonormal basis for null(A). Proof (a) By Exercise 61 in Section 3.5, we have rank(A) = rmUUXV1) = rankCSF1) — rank(X) = r (b) We already know that {ut,..., u,J is an orthonormal set. Therefore, it is linearly independent, by Theorem 5.1. Since u, — (l/a^AVj for i = 1,..., r, each u; is in the a *» column space of A, (Why?) Furthermore, r = rank(A) = dim(col(A)) Therefore, {uu ..., u.} is a.n orthonormal basis for col(A), by Theorem 6.10(c). (c) Since {ui.....um} is an orthonormal basis for Rm and {Uj,..., ur} is a basis for col(A), by property (b), it follows that {ur+!,..., um} is an orthonormal basis for the orthogonal complement of col(A). But (col(A)) = null(A3), by Theorem 5.10. (e) Since A%+1 ==■•• = Av„ = 0 the set {vr+1,..., v„} is an orthonormal set contained in the null space of A. Therefore, {vr+1,..., v„} is a linearly independent set of n — r vectors in null(A). But dim(null(A)) = n - r by the Rank Theorem, so {vr+ v..., vj is an orthonormal basis for null(A), by Theorem 6.10(c). (d) Property (d) follows from property (e) and Theorem 5.10. (You are asked to prove this in Exercise 32.) a— The SVD provides new geometric insight into the effect of matrix transformations. We have noted several times (without proof) that an m X n matrix transforms the unit sphere in R" into an ellipsoid in Rm. This point arose, for example, in our discussions of Perron's Theorem and of operator norms, as well as in the introduction to singular values in this section. We now prove this result. Chapter 7 Distance and Approximation Theorem 7.16 Let A be an m X n matrix with rank r. Then the image of the unit sphere in M' - under the matrix transformation that maps x to Ax is a. the surface of an ellipsoid in Um if r - n. b, a solid ellipsoid in IR"' if r < n. Frost Let A = U2Fr be a singular value decomposition of the m X n matrix A. Let the left and right singular vectors of A be i^,..., um and Vj,..., v„, respectively. Since rank(A) = r, the singular values of A satisfy al > (i2 s • • • S ov > 0 and + o* which establishes the result. __... JiH cas ^ Verify Theorem 7.17 for the matrix A in Example 7.18. has singular values 4,5150 and 3.1008. We SOiUtlOn The matrix A check that 3 -1 2 4 V4.51502 + 3.10082 = V3Ö = \\A\\F which agrees with Example 7.18. In Section 7.2, we commented that there is no easy formula for the operator 2-norm of a matrix A. Although that is true, the SVD of A provides us with a very nice expression for |A||2. Recall that !jA|2 = max|jAx|| where the vector norm is the ordinary Euclidean norm. By Theorem 7.16, for ||x[| = 1, the set of vectors |Ax|j lies on or inside an ellipsoid whose semi-axes have lengths Chapter 7 Distance and Approximation equal to the nonzero singular values of A. It follows immediately that the largest of these is crl5 so IIA , = a. This provides us with a neat way to express the condition number of a (square) matrix with respect to the operator 2-norm. Recall that the condition number (with respect to the operator 2-norm) of an invertible matrix A is defined as cond2(A) = |A-I1|2|]A(|2 t As you will be asked to show in Exercise 28, if A = UlVr, then A"1 = V"2~lU . Therefore, the singular values of A"1 are l/ o~r > 0 of A. The pseudoinverse (or Moore-Penrose inverse) of A and D is an r X r diagonal matrix containing the nonzero singular values >•••> crr > 0 of A. 
The pi is the n X m matrix A* defined by where 2 + is the n X m matrix A4 = VV~UT D~l O O O Section 7.4 The Singular Value Decomposition 603 ________ Example 7.39 Find the pseudoinverses of the matrices in Example 7.34. Solution (a) From the SVD 1 A = 1 1 0" "1 0" "V2 0 0" 0 0 1. _o 1. . 0 10 1/V2 1/V2 0 0 0 1 .-1/V2 1/V2 0. utvT One of those who was unaware of Moore's work on matrix inverses was Roger Penrose (b.1931), who introduced his own notion of a generalized matrix inverse in 1955. Penrose has made many contributions to geometry and theoretical physics. He is also the inventor of a type of nonperiodic tiling that covers the plane with only two different shapes of tile, yet has no repeating pattern. He has received many awards, including the 1988 Wolf Prize in Physics, which he shared with Stephen Hawking. In 1994, he was knighted for services to science. Sir Roger Penrose is currently the Emeritus Rouse Ball Professor of Mathematics at the University of Oxford. we form Then 1/V2 0' 0 1 0 0 vl+uT = (b) We have the SVD 0 -1/V2" "1/V2 o" 1 0" .0 1. "1/2 o" 0 1/V2 0 1 = 1/2 0 1 0 0 0 0 1 *1 f ~2/V6 0 -l/VT "V3 0" A = 1 0 = i/Vi -1/V2 1/V3 0 1 -0 1. _1/V6 1/V2 1/V3. o 0. 1/V2 1/V2 T/V2 1/V2_ = L/2Vr so and 1/V3 0 0 0 1 0 A+ = VT'rU t _ 1/V2 1/V2 I/V2 1/V2. 1/V3 0 0 0 1 0 2/V6 1/V6 1/V6 0 -1/V2 1/V2 -1/V3 1/V3 1/V3. 1/3 2/3 -1/3 1/3 -1/3 2/3 It is straightforward to check that this new definition of the pseudoinverse generalizes the old one, for if them X n matrix .A U1,VT has linearly independent columns, then direct substitution shows that (ATA) ~ lA T = V~L+ U1. (You are asked to verify this in Exercise 50.) Other properties of the pseudoinverse are explored in the exercises. We have seen that when A has linearly independent columns, there is a unique least squares solution x to Ax = b; that is, the normal equations ATAx = ATb have the unique solution x = (ATA)~lATb = .\ b When the columns of A are linearly dependent, then ATA is not invertible, so the normal equations have infinitely many solutions. In this case, we will ask for the solution x of minimum length (i.e., the one closest to the origin). It turns out that this time we simply use the general version of the pseudoinverse. 6 Chapter 7 Distance and Approximation TiieOfeitl 7.18 The least squares problem Ax = b has a unique least squares solution x of minimal length that is given by x = A b Proof Let A be an m X n matrix of rank r with SVD A = 1/2 Vr (so that A+ = V"2+UT). Lety = V Tx and let c = L7 'b. Write y and c in block form as and c = V j2. _c2_ where yj and Cj are in W. We wish to minimize ||b — Ax|| or, equivalently, jjb — Axjj2. Using Theorem 5.6 and the fact that UT is orthogonal (because U is), we have Axj 2 - IlTTl-, UT{b - Ax)||2 = |Ur(b - U2Vrx)jj2 = \\U'b - UTUlVlx\\ = l!c-2y!J2= " [V D 0' 2 o 0_ T2- c2 The only part of this expression that we have any control over is yu so the minimum value occurs when Cj — Dyl = 0 or, equivalently, when y, = D~lCx. So all least squares solutions x are of the form Set x = Vy = V x = Vy = V D_1C! Y2 . 0 We claim that this x is the least squares solution of minimal length. To show this, lets suppose that x' = vy = v L Yi is a different least squares solution (hence, y2 ^ 0). Then HI = ]VyI = ||yfl < lly'l = ||vy'|| = ||x'| as claimed. We still must show that x is equal to A+b. To do so, we simply compute Dl 0" V 0 = V _c2. 
x = Vy = V u j = W'c = 1'2 l.vb = ,1 b Example 7.40 Find the minimum length least squares solution of Ax = b, where '1 I "0' A = and b = .1 1_ 1 Section 7.4 The Singular Value Decomposition 605 Solution The corresponding equations x + y = 0 .v — y = I are clearly inconsistent, so a least squares solution is our only hope. Moreover, the columns of A are linearly dependent, so there will be infinitely many least squares solutions—among which we want the one with minimal length. An SVD of A is given by A ~ 1 1 1 1 1/V2 1/V2 1/V2 -1/V2 '2 0" _0 1/V2 1/V2 1/V2 -1/V2_ = 1/2 VT (Verify this.) It follows that A+ = V£+[/r = 1/V2 1/V2" "1/2 o~ "1/V2 I/vT -1/4 1/4 1/V2 -1/V2. 0_ 1/V2 -1/V2. .1/4 1/4. so x = A b •1 1 - "0" "1" 4 I 4 1 .4 1 4. J. 1 .4. You can see that the minimum least squares solution in Example 7.40 satisfies x + y *= |, In a sense, this is a compromise between the two equations we started with. In Exercise 49, you are asked to solve the normal equations for this problem directly and to verify that this solution really is the one closest to the origin. The Fundamental Theorem of (avertible Matrices it is appropriate to conclude by revisiting the Fundamental Theorem of Invertible Matrices one more time. Not surprisingly, the singular values of a square matrix tell us when the matrix is invertible. Th60feiTI 7.19 The Fundamental Theorem of Invertible Matrices: Final Version Let A be an n X n matrix and let T : V —> W be a linear transformation whose matrix [T\c+-b wim respect to bases B and C of V and W, respectively, is A. The following statements are equivalent: a. A is invertible. b. Ax = b has a unique solution for every b in W. c. Ax = 0 has only the trivial solution, d. The reduced row echelon form of A is /„. e. A is a product of elementary matrices. f. rank(A) = n g. nullity(A) = 0 h. The column vectors of A are linearly independent. i. The column vectors of A span Un. j. The column vectors of A form a basis for R". k. The row vectors of A are linearly independent. 1. The row vectors of A span R". Chapter 7 Distance and Approximation m. The row vectors of A form a basis for W, n. detA^O o. 0 is not an eigenvalue of A. p. T is invertible. q. T is one-to-one. r. T is onto. s. ker(T) = {0} t. range(T) = W u. 0 is not a singular value of A. First note that, by the definition of singular values, 0 is a singular value of A if and only if 0 is an eigenvalue of A7 A. (a) =*• (u) If A is invertible, so is A , and hence A1 A is as well. Therefore, property (o) implies that 0 is not an eigenvalue of ArA, so 0 is not a singular value of A. (u) => (a) If 0 is not a singular value of A, then 0 is not an eigenvalue of ATA. Therefore, ATA is invertible, by the equivalence of properties (a) and (o). But then rank(A) = n, by Theorem 3.28, so A is invertible, by the equivalence of properties (a) Amongthe many applications of the SVD, one of the most impressive is its use in compressing digital images so that they can be efficiently transmitted electronically (by satellite, fax, Internet, or the like). We have already discussed the problem of detecting and correcting errors in such transmissions. The problem we now wish to consider has to do with reducing the amount of information that has to be transmitted, without losing any essential information. In the case of digital images, let's suppose we have a grayscale picture that is 340 X 280 pixels in size. Each pixel is one of 256 shades of gray, which we can represent by a number between 0 and 255. 
We can store this information in a 340 X 280 matrix A, but transmitting and manipulating these 95,200 numbers is very expensive. The idea behind image compression is that some parts of the picture are less interesting than others. For example, in a photograph of someone standing outside, there may be a lot of sky in the background, while the person's face contains a lot of detail. We can probably get away with transmitting every second or third pixel in the background, but we would like to keep all the pixels in the region of the face. It turns out that the small singular values in the SVD of the matrix A come from the "boring" parts of the image, and we can ignore many of them. Suppose, then, that we have the SVD of A in outer product form A = (TjUjVi + ■ • ■ + (rru,\' Let k ^ rand define Ak = cr,Ujv[ + • • • + akukvk Then Ak is an approximation to A that corresponds to keeping only the first' k singular values and the corresponding singular vectors. For our 340 X 280 example, we may discover that it is enough to transmit only the data corresponding to the first 20 singular values. Then, instead of transmitting 95,200 numbers, we need only send 20 singular values plus the 20 vectors u,,..., u20 in U340 and the 20 vectors V],..., v20 in ffS280, for a total of 20 + 20 • 340 + 20 • 280 = 12,420 ■m numbers. This represents a substantial saving! The picture of the mathematician Gauss in Figure 7.22 is a 340 X 280 pixel image. It has 256 shades of gray, so the corresponding matrix A is 340 X 280, with entries fl| between 0 and 255. It turns out that the matrix A has rank 280. If we approximate A by Ak, as described above, we get an image that corresponds to the first k singular values of A. Figure 7.23 shows several of these images for values of k from 2 to 256. At first, the image is very blurry, but fairly quickly it takes shape. Notice that A32 already gives a pretty good approximation to the actual image (which comes from A = A2g0, as shown in the upper left-hand corner of Figure 7.23). Some of the singular values of A are o~i = 49,096, al6 --- 22,589, ai2 = 10,187, ■** (764 = 484, o"i28 = 182, a256 = 5, and i> Ul) Example 7,41 Find the best linear approximation to fix) = ex on the interval [—1, 1]. Xr.tmt Linear functions "are polynomials of degree 1, so we use the subspace f W = 9V-1, 1] of^f-1, 1] with the inner product r 1 {/,#) = f(x)g(x)dx A basis for 9* J— 1, 1] is given by {1, .*}. Since rl x = 0 this is an orthogonal basis, so the best approximation to/in Wis g(x) = projw(e*) (1, ex) (x, ex) -1 + -—rx (1,1) (x,x)' (l-ex)dx xexdx + (l-l)dx - -l e — e~ x dx + 2e l) + 3e'\x = 1.18' + l.lOx ' : Chapter 7 Distance and Approximation where we have used integration by parts to evaluate tions.) See Figure 7.24. f(x) = ex xex dx. (Check these calcula- I/g{x) = 1,18 + l.lOx figure 7.14 The error in approximating / by g is the one specified by the Best Approximation Theorem: the distance \\f — g\\ between/and g relative to the inner product on % [— 1,1]. This error is just II/-*« = (fix) - g(x))2 dx and is often called the root mean square error. With the aid of a CAS, we find that the root mean square error in Example 7.41 is \ex - (|(e - e"1) + 3e~lx) (ex ~ |(e - e'1) - 3e~lx)2 dx «- 0.23 Remark The root mean square error can be thought of as analogous to the area between the graphs of /and g on the specified interval. Recall that the area between the graphs of/and g on the interval [a, b] is given by \f(x) - g(x) I dx (See Figure 7.25.) 
Although the equation in the above Remark is a sensible measure of the "error" between /and g, the absolute value sign makes it hard to work with. The root mean square error is easier to use and therefore preferable. The square root is necessary to "compensate" for the squaring and to keep the unit of measurement the same as it would be for the area between the curves. For comparison purposes, the area between the graphs of /'and g in Example 7.41 is He - e"1) - 3e"1x\ dx 0.28 Figure 7.25 Section 7.5 Applications 613 Example 7.42 Find the best quadratic approximation to/(x) = ex on the interval [—1,1], Solution A quadratic function is a polynomial of the form g{x) — a + bx + cx2 in W = 9f>2[~ 1' 1] ■ This time, the standard basis {1, x, x2} is not orthogonal. However, we can construct an orthogonal basis using the Gram-Schmidt Process, as we did in Example 7.8. The result is the set of Legendre polynomials {l, x,X 3} Using this set as our basis, we compute the best approximation to fin W as g(x) = projw(e*). The linear terms in this calculation are exactly as in Example 7.41, so we only require the additional calculations (x2 - k)exdx dx - |j exdx = f(e - 7el) and el ,-l 1 „2 _ 1\ _ I („2 _ 1\2 J„ - x2 - \) = (x2 - j-)2 dx = I (x4 - fx2 + \)dx = Then the best quadratic approximation to/(x) ~ ex on the interval [— 1,1] is ; e n 1. The approximation to/given by Equations (2) and (3) is called the nth-order Fourier approximation to/on [—it, tt]. The coefficients are called the Fourier coefficients off. Example 7.44 Find the fourth-order Fourier approximation to/(x) = x on [—tt, tt]. -,- Solution Using formulas (3), we obtain 1 2tt xdx 2tt and for k a l, integration by parts yields at = — x cos kx dx — — — sin fee + TV cos toe 616 Chapter 7 Distance and Approximation and Jean-Baptiste Joseph Fourier (1768-1830) was a French mathematician and physicist who gained prominence through his investigation into the theory of heat. In his landmark solution of the so-called heat equation, he introduced techniques related to what are now known as Fourier series, a tool widely used in many branches of mathematics, physics, and engineering. Fourier was a political activist during the French revolution and became a favorite of Napoleon, accompanying him on his Egyptian campaign in 1798. Later Napoleon appointed Fourier Prefect of Isere, where he oversaw many important engineering projects. In 1808, Fourier was made a baron. He is commemorated by a plaque on the Fiffel Tower. = < x sin kx dx 77 X , 1 ■ 7 cos kx + ~*r sin kx k k -it cos kir — it cos(—kir) — if k is even if k is odd = 2(-l)*+1 k It follows that the fourth-order Fourier approximation to f(x) ~ x on [—tt, tt] is 2(sin x — j sin 2x + | sin 3x — \ sin 4x) Figure 7.27 shows the first four Fourier approximations tof(x) =* x on [—ir, tt] . n = 3 n = 4 Figure 7.27 Section 7.5 Applications You can clearly see the approximations in Figure 7.27 improving, a fact that can be confirmed by computing the root mean square error in each case. As the order of the Fourier approximation increases, it can be shown that this error approaches zero. The trigonometric polynomial then becomes an infinite series, and we write fix) = a0 + ^iak cos kx + bk sin kx) This is called the Fourier series of/on [ — tt, tt]. Mariner 9 used the Reed-Muller code R5, whose minimum distance is 24 = 16. By Theorem 1, this code can correct k errors, where 2k + 1 S 16. The largest value of k for which this inequality is true is k = 7. 
Thus, R- not only contains exactly the right number of code vectors for transmitting 64 shades of gray but also is capable of correcting up to 7 errors, making it quite reliable. This explains why the images transmitted by Mariner 9 were so sharp! Exercises 7.5 Approximation of Functions In Exercises 1-4, find the best linear approximation tofon the interval [ — 1,1 ]. 1. fix) = x2 3. f(x) = xi 2. fix) = x2 + 2x 4. fix) = sin(irx/2) In Exercises 5 and 6, find the best quadratic approximation tofon the interval [ — 1, 1 ]. 5. fix) = \x\ 6. fix) = cos{ttx/2) 7. Apply the Gram-Schmidt Process to the basis {1, x] to construct an orthogonal basis for ^[0,1]. 8. Apply the Gram-Schmidt Process to the basis {l, x, x2} to construct an orthogonal basis for 2?2 [0, 1]. In Exercises 9-12, find the best linear approximation tofon the interval [0,1], 9. fix) = x2 10. fix) = Vx ll.f{x) = ex 12. fix) - smirrx/D In Exercises 13-16, find the best quadratic approximation tof on the interval [0,1], 13. fix) = x3 14. fix) = Vx 15. fix) = ex 16. fix) = sin(77x/2) 17. Show that 1 is orthogonal to cos kx and sin kx in ^[-ir, -it] forfc> 1. 18. Show that cos jx is orthogonal to cos kx in % [ — tt, tt] for j i> k,j, /i21. 19. Show that sin jx is orthogonal to sin kx in % [ — tt, it] for; # k,j, k 1. 20. Show that |J 11[2 = 2tt and ||cos fcx||2 = - in <[ -~ rr. tt]. In Exercises 21 and 22, find the third-order Fourier approximation tofon [ —tt, tt], 21. fix) = |*j 22, fix) = x2 In Exercises 23-26, find the Fourier coefficients aQ, ak, and bkoffon [-tt, tt}. '0 if-7T s x < 0 1 if 0 < x < tt -1 if-7Tx 3. (A, B) = tr(ArB) for A, B in M22 4. (fig) = ( max f(x))(maxß(x)) for/, g in % [0,1] In Questions 5 and 6, compute the indicated quantity using the specified inner product. 5. i| 1 + x + x2\\ if (a0 + axx + a2x2, b0 + btx + b2x2) = a0b0 + ä\bx + a2b2 6. d(x, x2) if (p(x), q(x)) = j^p(x)qix) dx In Questions 7 and 8, construct an orthogonal set of vectors by applying the Gram-Schmidt Process to the given set of vectors using the specified inner product. 7. if (u, v) = u Av, where A = 6 4 4 6 ... 8. {1, x, x2} if (p(x), q(x)) = p(x)q(x) dx - o In Questions 9 and 10, determine whether the definition gives a norm. 9. || vl| = v'vforvin U" 10, \\p(x)\\ = |^(0)| + \p(l) - p(0)| forp(x) inS?! Section 7.5 Applications 618 11. Show that the matrix A = 1 0.1 0.11 0.1 0.11 0.111 0.11 0.111 0.1111 ill-conditioned. 12. Prove that if Q is an orthogonal n X n matrix, then its Frobenius norm is ||Q||f = V». 13. Find the line of best fit through the points (1, 2), (2, 3), (3, 5), and (4, 7). 14. Find the least squares solution of t 1' 0 -1 3 15. Find the orthogonal projection of x "l f column space of A onto the 16. If u and v are orthonormal vectors, show that P = uur + w' is the standard matrix of an orthogonal projection onto span (u, v). [Hint: Show that P = A{ATA)~lAT for some matrix A.] In Questions 17 and 18, find (a) the singular values, (b) a singular value decomposition, and (c) the pseudoinverse of the matrix A. i r 18. A 17. A 1 -1 1 -1 0 0 .1 -1. 19. If P and Q are orthogonal matrices for which PAQ is defined, prove that PAQ has the same singular values as A. 20. If A is a square matrix for which A2 = O, prove that (A+)2 = O. Please, sir, I want some more. —Oliver Charles Dickens, Oliver Twist Anyone who understands algebraic notation reads at a glance in an equation results reached arithmetically only with great labour and pains. 
—Augustin Cournot Researches into the Mathematical Principles of the Theory of Wealth Translated by Nathaniel T, Bacon Macmillan, 1897, p. 4 '0 Appendix A Mathematical Notation and Methods of Proof In this book, an effort has been made to use "mathematical English" as much as possible, keeping mathematical notation to a minimum. However, mathematical notation is a convenient shorthand that can greatly simplify the amount of writing we have to do. Moreover, it is commonly used in every branch of mathematics, so the ability to read and write mathematical notation is an essential ingredient of mathematical understanding. Finally, there are some theorems whose proofs become "obvious" if the right notation is used. Proving theorems in mathematics is as much an art as a science. For the beginner, it is often hard to know what approach to use in proving a theorem; there are many approaches, any one of which might turn out to be the best. To become proficient at proofs, it is important to study as many examples as possible and to get plenty of practice. This appendix summarizes basic mathematical notation applied to sets. Summation notation, a useful shorthand for dealing with sums, is also discussed. Finally, some approaches to proofs are illustrated with generic examples. Set Notation A set is a collection of objects, called the elements (or members) of the set. Examples of sets include the set of all words in this text, the set of all books in your college library, the set of positive integers, and the set of all vectors in the plane whose equation is 2x + 3y — z = 0. It is often possible to list the elements of a set, in which case it is conventional to enclose the list within braces. For example, we have {1,2,3}, {a,t,x,z}, {2,4,6,..., 100}, 77 2 77 77 477 577 6 Note that ellipses (...) denote elements omitted when a pattern is present. (What is the pattern in the last two examples?) Infinite sets are often expressed using ellipses. For example, the set of positive integers is usually denoted by N or Z+, so N = T = {1,2,3,...} The set of all integers is denoted by Z, so Z = {...,-2,-1,0,1,2,...} Two sets are considered to be equal if they contain exactly the same elements. The order in which elements are listed does not matter, and repetitions are not counted. Thus, "Exercises and selected odd-numbered answers for this appendix can be found on the student companion website. A1 Appendix A Mathematical Notation and Methods of Proof {1,2,3} = {2, 1,3} = (1,3,2, 1} The symbol E means "is an element of" or "is in," and the symbol & denotes the negation—that is, "is not an element of" or "is not in." For example, 5 e Z+ but 0 g I* It is often more convenient to describe a set in terms of a rule satisfied by all of its elements. In such cases, set builder notation is appropriate. The format is {x: x satisfies P} where P represents a property or a collection of properties that the element x must satisfy. The colon is pronounced "such that." For example, {n:«GZ,«>0} is read as "the set of all n such that n is an integer and n is greater than zero." This is just another way of describing the positive integers Z+. (We could also write Z+ = {n G Z : n > 0}.) The empty set is the set with no elements. It is denoted by either 0 or {}. Example A.1 Describe in words the following sets: (a) A = {n:n = 2U£Z( (c) C = {x G R : 4x2 - 4x - 3 = 0} (b) B = {m/n :m,n£ (d) D = {x £ Z : 4x2 Z, n * 0} - 4x - 3 = 0} Solution (a) A is the set of numbers n that are integer multiples of 2. 
Therefore, A is the set of all even integers. (b) B is the set of all expressions of the form m/n, where m and n are integers and n is nonzero. This is the set of rational numbers, usually denoted by Q. (Note that this way of describing Q produces many repetitions; however, our convention, as noted above, is that we include only one occurrence of each element. Thus, this expression precisely describes the set of all rational numbers.) (c) C is the set of all real solutions of the equation 4x2 - 4x - 3 = 0. By factoring or using the quadratic formula, we find that the roots of this equation are — 5 and f. (Verify this.) Therefore, c = {-§,!} (d) From the solution to (c) we see that there are no solutions to 4x2 — 4x — 3 = 0 in R that are integers. Therefore, D is the empty set, which we can express by writing D = 0. ' 1 John Venn (1834-1923) was an English mathematician who studied at Cambridge University and later lectured there. He worked primarily in mathematical logic and is best known for inventing Venn diagrams. If every element of a set A is also an element of a set B, then A is called a subset of B, denoted A C B. We can represent this situation schematically using a Venn diagram, as shown in Figure A.l, (The rectangle represents the universal set, a set large enough to contain all of the other sets in question—in this case, A and B.) ft" Appendix A Mathematical Notation and Methods of Proof A C B (a) {1,2,3} £{1,2,3,4,5} (b) Z+CZCH (c) Let A be the set of all positive integers whose last two digits are 24 and let B be the set of all positive integers that are evenly divisible by 4. Then if n is in A, it is of the form n = 100/c + 24 for some integer k. (For example, 36,524 = 100 • 365 + 24.) But then n = lOOfc + 24 = 4(25fc + 6) so «/4 = 25k + 6, which is an integer. Hence, n is evenly divisible by 4, so it is in B. Therefore, A C B. We can show that two sets A and B are equal by showing that each is a subset of the other. This strategy is particularly useful if the sets are defined abstractly or if it is not easy to list and compare their elements. Let A be the set of all positive integers whose last two digits form a number that is evenly divisible by 4. In the case of a one-digit number, we take its tens digit to be 0. Let B be the set of all positive integers that are evenly divisible by 4. Show that A = B. Solution As in Example A.2(c), it is easy to see that A C B. If n is in A, then we can split off the number m formed by its last two digits by writing n = lOOfc + m for some integer k. But, since m is divisible by 4, we have m = 4r for some integer r. Therefore, n = 100A: + m = 100k + 4r = 4(25fc + r) so n is also evenly divisible by 4. Hence, A C B. To show that B C A, let n be in B. That is, n is evenly divisible by 4. Let's say that n = 45, where s is an integer. If m is the number formed by the last two digits of n, then, as above, n = 100/c + m for some integer k. But now m = n - 100/c = 4s - lOOfc = 4(s - 25A;) which implies that m is evenly divisible by 4, since s — 25k is an integer, 'therefore, n is in A, and we have shown that B C A. Since AC B and B C A, we must have A = B Appendix A Mathematical Notation and Methods of Proof The intersection of sets A and B is denoted by A fl B and consists of the elements that A and B have in common. That is, A ("1 B = {x : x E A and x £ B} Figure A.2 shows a Venn diagram of this case. The union of A and B is denoted by A U B and consists of the elements that are in either A or B (or both). 
That is, A U B = [x : x E A or x E B} See Figure A.3. Figure A.2 A D B A U B Let A = {n2: n E T, 1 < « < 4} and let B = {n E Z+ : « < 10 and « is odd}. Find A fl B and A U B. SelUliii We see that A = {l2,22,32, 42} = {1,4,9, 16} and B = {1,3,5,7,9} Therefore, A fl B = {1, 9} and A U B = {1, 3,4, 5, 7, 9, 16}. If A fl B = 0, then A and B are called disjoint sets. (See Figure A.4.) For example, the set of even integers and the set of odd integers are disjoint. Summation Notation Summation notation is a convenient shorthand to use to write out a sum such as 1 + 2 + 3 + • • • + 100 where we want to leave out all but a few terms. As in set notation, ellipses (...) convey that we have established a pattern and have simply left out some intermediate terms. In the above example, readers are expected to recognize that we are summing all of the positive integers from 1 to 100. However, ellipses can be ambiguous. For example, what would one make of the following sum? 1 + 2 + • • • + 64 Is this the sum of all positive integers from 1 to 64 or just the powers of two, 1 + 2 + 4 + 8+16 + 32 + 64? It is often clearer (and shorter) to use summation notation (or sigma notation). Figure fl.4 Disjoint sets 2 is the capital Greek letter sigma, corresponding to S (for "sum"). Summation notation was introduced by Fourier in 1820 and was quickly adopted by the mathematical community. Appendix A Mathematical Notation and Methods of Proof We can abbreviate a sum of the form at + a2 + ~ ~ • + an (1) as 5>* fc«=i (2) which tells us to sum the terms ak over all integers fc ranging from 1 to n. An alternative version of this expression is The subscript fc is called the imfec of summation. It is a "dummy variable" in the sense that it does not appear in the actual sum in expression (1). Therefore, we can use any letter we like as the index of summation (as long as it doesn't already appear somewhere else in the expressions we are summing). Thus, expression (2) can also be written as « The index of summation need not start at 1. The sum a3 + a4 H— + agg becomes 99 although we can arrange for the index to begin at 1 by rewriting the expression as 97 4 = 1 The key to using summation notation effectively is being able to recognize patterns. Write the following sums using summation notation. (a) 1 + 2 + 4 + ■•• + 64 (b) 1 + 3 + 5 + ••• + 99 (c) 3 + 8 + 15 + •■• + 99 (a) We recognize this expression as a sum of powers of 2: 1 + 2 + 4 + 64 = 2U + V Therefore, the index of summation appears as the exponent, and we have >j 2k. 4 = 0 (b) This expression is the sum of all the odd integers from 1 to 99. Every odd integer is of the form 2k + 1, so the sum is 1 + 3 + 5 + • • • + 99 = (2-0 + 1) + (2-1 + 1) + (2-2 + 1) + • • • + (2-49 + 1) = 2(2fc+ 1) 4=0 (c) The pattern here is less clear, but a little reflection reveals that each term is 1 less than a perfect square: 3 + 8 + 15 + • • • + 99 = (22 ~ 1) + (32 - 1) + (42 - 1) + • • ■ + (102 - 1) 10 = 2>2 -1) k-2 . ^4- Appendix A Mathematical Notation and Methods of Proof Example A.6 Rewrite each of the sums in Example A.5 so that the index of summation starts at 1. (a) If we use the change of variable i = k + 1, then, as k goes from 0 to 6, i goes from 1 to 7. Since k = ( — 1, we obtain (c=0 i=l (b) Using the same substitution as in part (a), we get 49 50 50 ^dk +1) = 2<2<* - i) + i) = 2>' - i) (c) The substitution i = k - 2 will work (try it), but it is easier just to add a term corresponding tok= 1, since l2 — 1 = 0. 
Therefore, 10 10 2>2 - 1) = 2>2 - 1) k -2 *«1 Multiple summations arise when there is more than one index of summation, as there is with a matrix. The notation n means to sum the terms at, as i and j each range independently from 1 to n. The sum in expression (3) is equivalent to either ft n SIX where we sum first over j and then over i (we always work from the inside out), or n n ;=1 >=1 where the order of summation is reversed. Example A.7 i Write out ^!' using both possible orders of summation. U-J ' Solution r*l J«l ;=l = (l1 + l2 + l3) + (21 + 22 + 23) + (31 + 32 + 33) = (1 + 1 + 1) + (2 + 4 + 8) + (3 + 9 + 27) = 56 Appendix A Mathematical Notation and Methods of Proof and 22'v' = 2u'+ 2*+ 3*) = (l1 + 21 + 31) + (l2 + 22 + 32) + (I3 4- 23 + 33) = (1 + 2 + 3) + (1 + 4 + 9) + (1 + 8 + 27) = 56 How fo Solve It is the title of a book by the mathematician George Polya (1887-1985 Since its publication in 1945, How to Solve It has sold over a million copies and has been translated into 17 languages. « Polya was born in Hungary, but because of the political situation in Europe, he moved to the United States in 1940. He subsequently taught at Brown and Stanford Universities, where he did mathematical research and developed a well-deserved reputation as an outstanding teacher. The Polya Prize is awarded annually by the Society for Industrial and Applied Mathematics for major contributions to areas of mathematics close to those on which Polya worked. The Mathematical Association of America annually awards Polya Lectureships to mathematicians demonstrating the high-quality exposition for which Polya was known. Remark Of course, the value of the sum in Example A,7 is the same no matter which order, of summation we choose, because the sum is finite. It is also possible to consider infinite sums (known as infinite series in calculus), but such sums do not always have a value and great care must be taken when rearranging or manipulating their terms. For example, suppose we let S = ±2k fc=o Then S=l+2 + 4 + 8 + -- - = 1 + 2(1 + 2 + 4 + • • •) = 1 + IS from which it follows that S = — 1. This is clearly nonsense, since S is a sum of non-negative terms! (Where is the error?) Methods of Proof The notion of proof is at the very heart of mathematics. It is one thing to know what is true; it is quite another to know why it is true and to be able to demonstrate its truth by means of a logically connected sequence of statements. The intention here is not to try to teach you how to do proofs; you will become better at doing proofs by studying examples and by practicing—something you should do often as you work through this text. The intention of this brief section is simply to provide a few elementary examples of some types of proofs. The proofs of theorems in the text will provide further illustrations of "how to solve it." Roughly speaking, mathematical proofs fall into two categories: direct proofs and indirect proofs. Many theorems have the structure "if P, then Q," where P (the hypothesis, or premise) and Q (the conclusion) are statements that are either true or false. We denote such an implication by P =4» Q. A direct proof proceeds by establishing a chain of implications P Px P2 ■ ■ ■ =*■ P„ Q leading directly from P to Q. Prove that any two consecutive perfect squares differ by an odd number. This instruction can be rephrased as "Prove that if a and b are consecutive perfect squares, then a — bis odd." 
Hence, it has the form P => Q, with P being "a and b are consecutive perfect squares" and Q being "a — b is odd." Appendix A Mathematical Notation and Methods of Proof as Assume that a and b are consecutive perfect squares, with a> b. Then a = (n + l)2 and b = n2 for some integer n. But now a - b = in + l)2 - n2 = n2 + 2n + 1 — n2 = 2n + 1 so a — t> is odd. There are two types of indirect proofs that can be used to establish a conditional statement of the form P => Q. A proof by contradiction assumes that the hypothesis P is true, just as in a direct proof, but then supposes that the conclusion Q is false. The strategy then is to show that this is not possible (i.e., to rule out the possibility that the conclusion is false) by finding a contradiction to the truth of P. It then follows that Q must be true. Example A.9 Let n be a positive integer. Prove that if n2 is even, so is n. (Take a few minutes to try to find a direct proof of this assertion; it will help you to appreciate the indirect proof that follows.) Assume that n is a positive integer such that n2 is even. Now suppose that n is not even. Then n is odd, so 2k 1 for some integer k. But if so, we have n2 = {2k + 1) 4k2 + 4k + 1 so n2 is odd, since it is 1 more than the even number 4k + 4k. This contradicts our hypothesis that n2 is even. We conclude that our supposition that n was not even must have been false; in other words, n must be even. Closely related to the method of proof by contradiction is proof by contraposi-tive. The negative of a statement P is the statement "it is not the case that P" abbreviated symbolically as ->P and pronounced "not P." For example, if P is "« is even," then ->P is "it is not the case that n is even"—in other words, "n is odd." The contrapositive of the statement P =*• Q is the statement ->Q -iP. A conditional statement P=i> Q and its contrapositive ~iQ =4- -vP are logically equivalent in the sense that :«--»" they are either both true or both false. (For example, if P =** Q is a theorem, then so is ->Q -iP. To see this, note that if the hypothesis -iQ is true, then Q is false. The conclusion ->P cannot be false, for if it were, then P would be true and our known theorem P=$Q would imply the truth of Q, giving us a contradiction. It follows that ->P is true and we have proved -iQ =>• -iP.) Here is a contrapositive proof of the assertion in Example A.9. M Appendix A Mathematical Notation and Methods of Proof Example A.10 Let n be a positive integer. Prove that if n2 is even, so is n. Solution The contrapositive of the given statement is "If n is not even, then n1 is not even" or "If n is odd, so is n2" To prove this contrapositive, assume that n is odd. Then n — 2k + 1 for some integer k. As before, this means that n2 = (2k + I)2 = Ak2 + 4k + 1 is odd, which completes the proof of the contrapositive. Since the contrapositive is true, so is the original statement. a Although we do not require a new method of proof to handle it, we will briefly consider how to prove an "if and only if" theorem. A statement of the form "P if and only if Q" signals a double implication, which we denote by P Q. To prove such a statement, we must prove P Q and Q =#■ P. To do so, we can use the techniques described earlier, where appropriate. It is important to notice that the "if" part of P Q is "P if Q," which is Q => P; the "only if" part of P 4* Q is "P only if Q," meaning P =>• Q. 
The implication P Q is sometimes read as "P is sufficient for Q" or "Q is necessary for P"; Q P is read "Q is sufficient for P" or "P is necessary for Q." Taken together, they are P Q, or "P is necessary and sufficient for Q" and vice versa. Example Jl.11 A pawn is placed on a chessboard and is allowed to move one square at a time, either horizontally or vertically. A pawns tour of a chessboard is a path taken by a pawn, moving as described, that visits each square exactiy once, starting and ending on the same square. Prove that there is a pawn's tour of an n X n chessboard if and only if n is even. Solution [ ] ("if") Assume that n is even. It is easy to see that the strategy illustrated in Figure A.5 for a 6 X 6 chessboard will always give a pawn's tour. [ ] ("only if") Suppose that there is a pawn's tour of an n X n chessboard. We will give a proof by contradiction that n must be even. To this end, let's assume that n is odd. At each move, the pawn moves to a square of a different color. The total number of moves in its tour is n2, which is also an odd number, according to the proof in Example A. 10. Therefore, the pawn must end up on a square of the opposite color from that of the square on which it started. (Why?) This is impossible, since the pawn ends where it started, so we have a contradiction. It follows that n cannot be odd; hence, n is even and the proof is complete. Figure A.5 Some theorems assert that several statements are equivalent. This means that each is true if and only if all of the others are true. Showing that n statements are equivalent requires n~ — n 'if and only if" proofs. In practice, 2!(« - 2)! 2 however, it is often easier to establish a "ring" of n implications that links all of the statements. The proof of the Fundamental Theorem of Invertible Matrices provides an excellent example of this approach. Appendix B © Great fleas have little fleas upon their backs to bite 'em, And little fleas have lesser fleas, and so ad infinitum, —Augustus De Morgan A Budget of Paradoxes Longmans, Green, and Company, 1872, p. 377 Mathematical Induction The ability to spot patterns is one of the keys to success in mathematical problem solving. Consider the following pattern: 1 = 1 1+3 = 4 1+3+5=9 1 + 3 + 5 + 7= 16 1 + 3 + 5 + 7 + 9 = 25 The sums are all perfect squares: l2, 22, 32,42, 52. It seems reasonable to conjecture that this pattern will continue to hold; that is, the sum of consecutive odd numbers, starting at 1, will always be a perfect square. Let's try to be more precise. If the sum is n2, then the last odd number in the sum is 2« - 1. (Check this in the five cases above.) In symbols, our conjecture becomes 1 + 3 + 5 + • ■ • + (2n - 1) for all n s 1 (1) Notice that Equation (1) is really an infinite collection of statements, one for each value of n 3: 1. Although our conjecture seems reasonable, we cannot assume that the pattern continues—we need to prove it. This is where mathematical induction comes in. First Principle of Mathematical Induction Let S(n) be a statement about the positive integer n. If 1. 5(1) is true and 2. for all k 2= 1, the truth of S(k) implies the truth of S(k + 1) then S(n) is true for all »2 1, Verifying that S(l) is true is called the basis step. The assumption that S(k) is true for some k S: 1 is called the induction hypothesis. Using the induction hypothesis to prove that S(k + 1) is then true is called the induction step. 
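Before applying the principle, it can be reassuring to test the conjecture in Equation (1) computationally. Such a check is not a proof—it only covers finitely many cases—but it suggests that an induction argument is worth attempting. A minimal sketch, using nothing beyond the Python standard library:

```python
# Sanity check, not a proof: test 1 + 3 + 5 + ... + (2n - 1) == n**2
# for the first 500 positive integers n.

def sum_of_odds(n):
    return sum(2 * k - 1 for k in range(1, n + 1))

assert all(sum_of_odds(n) == n ** 2 for n in range(1, 501))
print("Conjecture verified for n = 1, ..., 500")
```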
Mathematical induction has been referred to as the domino principle because it is analogous to showing that a line of dominoes will fall down if (1) the first domino can be knocked down (the basis step) and (2) knocking down any domino (the induction hypothesis) will knock over the next domino (the induction step). See Figure B.l. We now use the principle of mathematical induction to prove Equation (1). *Exercises and selected odd-numbered answers for this appendix can be found on the student companion website, Appendix B Mathematical Induction If the first domino falls, and ... each domino that falls knocks down the next one, then all the dominoes can be made to fall by pushing over the first one. figure B.1 EXaiTiPl8 B.1 Use mathematical induction to prove that 1+3 + 5 -+-••• + (2« - 1) = «2 for all «S 1, Solution For « = 1, the sum on the left-hand side is just 1, while the right-hand side is l2. Since 1 = l2, this completes the basis step. Now assume that the formula is true for some integer k Se 1. That is, assume that 1 + 3 + 5 + • • • + (2k - 1) = 1,2 (This is the induction hypothesis.) The induction step consists of proving that the formula is true when n = k + 1. We see that when n — k + 1, the left-hand side of formula (1) is 1 + 3 + 5 + • • • + (2(k + 1) - 1) = 1 + 3 + 5 + ■ • • + (2k + 1) = 1 + 3 + 5 + ' • • + (2k - 1) + (2k + 1) k2 + 2k + = (k + l)2 by the induction hypothesis which is the right-hand side of Equation (1) when n = k + 1. This completes the induction step, and we conclude that Equation (1) is true for all n 2: 1, by the principle of mathematical induction. Appendix B Mathematical Induction Example B.2 The next example gives a proof of a useful formula for the sum of the first n positive integers. The formula appears several times in the text; for example, see the solution to Exercise 51 in Section 2.4. Prove that for all 1 + 2 + • • • + » n(n + 1) Solution The formula is true for n = 1, since 1(1 + 1) 1 = —--- 2 Assume that the formula is true for n = k; that is, k(k + 1) 1 + 2 + k = We need to show that the formula is true when n = k + 1; that is, we must prove that (k + l)[(k + 1) + 1] 1 + 2 + • • • + (k + 1) But we see that 1 + 2 + • • • + (k + 1) = (1 + 2 + • • * + k) + (k + 1) k(k + 1) „ = ~.--- + (k + 1) by the induction hypothesis k(k + 1) + 2(k + 1) 3k + 2 (k + l)(k + 2) (k + l)[(k + 1) + 1] which is what we needed to show. This completes the induction step, and we conclude that the formula is true for all tiS 1, by the principle of mathematical induction. . In a similar vein, we can prove that the sum of the squares of the first n positive integers satisfies the formula l2 + 22 + 32 + • • • + n2 = for all n a 1. (Verify this for yourself.) n(n + i)(2n + 1) Appendix B Mathematical Induction Example B.3 The basis step need not be for n — 1, as the next two examples illustrate. Prove that n\ > 2" for all integers n > 4. Solution The basis step here is when n = 4. The inequality is clearly true in this case, since 4! = 24 > 16 = 24 Assume that kl >2k for some integer k S 4. Then (fc + 1)! = (k + \)k\ > (k + 1)2 by the induction hypothesis > 5 • 2k since H>4 > 2-2* which verifies the inequality for n — k + 1 and completes the induction step. We conclude that«! > 2" for all integers n 2: 4, by the principle of mathematical induction. If a is a nonzero real number and n & 0 is an integer, we can give a recursive definition of the power a" that is compatible with mathematical induction. 
We define a" = 1 and, for n & 0, (This form avoids the ellipses used in the version an = aa ■ • • a.) We can now use mathematical induction to verify a familiar property of exponents. Let a be a nonzero real number. Prove that ama" = am+" for all integers m, n a 0. Solution At first glance, it is not clear how to proceed, since there are two variables, m and n. But we simply need to keep one of them fixed and perform our induction using the other. So, let m 2: 0 be a fixed integer. When n = 0, we have ama° = am-l = am = a"'+0 using the definition a0 = 1. Hence, the basis step is true. Now assume that the formula holds when n = k, where k St 0. Then amak = am+k. For n = k + 1, using our recursive definition and the fact that addition and multiplication are associative, we see that amak+i _ „m,J< am{aKa) (amak)a = am+ka = a (m+k) +1 by definition by the induction hypothesis by definition Appendix B Mathematical Induction Therefore, the formula is true for n = k + 1, and the induction step is complete. We conclude that a'"a" = am+n for all integers m, n £ 0, by the principle of mathematical induction. In Examples B.l through B.4, the use of the induction hypothesis during the induction step is relatively straightforward. However, this is not always the case. An alternative version of the principle of mathematical induction is often more useful. Second Principle of Mathematical Induction Let S(n) be a statement about the positive integer n. If 1. SO) is true and 2. the truth of S(l), S(2),..., S(k) implies the truth of S(k + 1) then S(n) is true for all ft£l, Example B.i The only difference between the two principles of mathematical induction is in the induction hypothesis: The first version assumes that S(k) is true, whereas the second version assumes that all of S(l), S(2),..., S(k) are true. This makes the second principle seem weaker than the first, since we need to assume more in order to prove S(k + 1) (although, paradoxically, the second principle is sometimes called strong induction). In fact, however, the two principles are logically equivalent: Each one implies the other. (Can you see why?) The next example presents an instance in which the second principle of mathematical induction is easier to use than the first. Recall that a prime number is a positive integer whose only positive integer factors are 1 and itself. Prove that every positive integer n 2: 2 either is prime or can be factored into a product of primes. Solution The result is clearly true when n = 2, since 2 is prime. Now assume that for all integers n between 2 and k, n either is prime or can be factored into a product of primes. Let n = k + 1. If k + lis prime, we are done. Otherwise, it must factor into a product of two smaller integers—say, k + 1 = ab Since 2 ■& a, b £ k (why?), the induction hypothesis applies to a and b. Therefore, a = Pi ''' Pr b = qx • ■ • qs where the p's and q's are all prime. Then ah** pr" prqi ■•• 0, as shown in Figure C.4. We have a = r cos 6 and b = r sin 6 so z = a + bi = rcosö + (rsinö)« Thus, any complex number can be written in the polar form z = r(cos 0 + i sin 0) Appendix C Complex Numbers ■ v Ex Im A = 1 + i > 1 -*-Re > ce3 -2-- w — 1 v3( Figure CS where r = | z\ = V a2 + b1 and tan 0 = b/a. The angle 9 is called an argument of z and is denoted by arg z. Observe that arg z is not unique: Adding or subtracting any integer multiple of 27T gives another argument of z. 
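As a quick numerical illustration (a sketch only, assuming Python's cmath and math modules rather than anything in the text), the following converts a complex number to polar form and confirms that shifting the argument by 2π describes the same number:

```python
import cmath
import math

z = 1 + 1j
r, theta = cmath.polar(z)   # r = |z|, theta is one argument of z
print(r, theta)             # about 1.41421..., 0.78539...  (that is, sqrt(2) and pi/4)

# Adding any integer multiple of 2*pi to the argument gives back the same number:
w = r * (math.cos(theta + 2 * math.pi) + 1j * math.sin(theta + 2 * math.pi))
print(abs(w - z) < 1e-12)   # True
```

So a nonzero complex number has infinitely many arguments, all differing by integer multiples of 2π.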
However, there is only one argument 6 that satisfies — tt<9 £ w This is called the principal argument of z and is denoted by Arg z. Write the following complex numbers in polar form using their principal arguments: (a) z*= 1 + i (b) w - 1 - V5i Solution (a) We compute r= \z\ = Vi2"+ l2 = V2 and tan 6 = - = 1 tt Therefore, Arg z = 9 - - (= 45°), and we have tt tt z --- V2 cos---r i sin — V 4 4 as shown in Figure C.5. (b) We have Vi2 + (-V3)2 = V4 = 2 and tan 6 = •V3 -V3 Since w lies in the fourth quadrant, we must have Arg z — 0 — —~ (= —60°), Therefore, w = 2 cos 7T I Sllli — — See Figure C.5. The polar form of complex numbers can be used to give geometric interpretations of multiplication and division. Let Zj = r1(cos9l + i sin 9 J and z2 = r2(cos 92 + i sin 92) Multiplying, we obtain ZjZj = rjfjfcosPi + i sin 6j)(cos 6| + / sin #2) = r,r2[(cosSj cos92 — sinsin92) + /(sin()•. cos#2 + cos0: sin92)] Using the trigonometric identities cos(0l + 92) = cos cos #2 — sinS] sin 92 mi(0x + 92) = sin 9X cosfl2 + cos 9X sin 02 Appendix C Complex Numbers Figure G.6 ex + ez we obtain ► Re zxz2 = r1r2[cos(01 + 62) + isin(Ox + 62)] (1) which is the polar form of a complex number with absolute value rxr2 and argument 6X + 02. This shows that \zxz2\ = !ziii22! and arg(z]z2) = argZi + argz2 Equation (1) says that to multiply two complex numbers, we multiply their absolute values and add their arguments. See Figure C.6. Similarly, using the subtraction identities for sine and cosine, we can show that [cosiö, - ö'2) + ;'sin(0, - d2)} if z * 0 B » (Verify this.) Therefore, Figure 6.7 !%! , (zi\ j—j- and arg^ ) = arg Zi - arg z2 and we see that to divide two complex numbers, we divide their absolute values and subtract their arguments. As a special case of the last result, we obtain a formula for the reciprocal of a complex number in polar form. Setting zx = 1 (and therefore 6X = 0) and z2 = z (and #.Re therefore 82 = 0), we obtain the following: It's = r(cos6> + ;' sin 0) is nonzero, then 1 1 - = -(cos f) - i sin H) z r See Figure C.7. I Find the product of 1 + i and 1 - V3;' in polar form. Sektion From Example C.3, we have 1 + i = V2| cos--b i sin — ) and 1 — V3f = l{ cos V 4 7T \ / tt ■ — J + ( sin-- 3/ V 3 Therefore, (1 + - V30 = 2V2 = 2V2 See Figure C.8. cos - cos tt 12 tt\ . . / tt + i sm 37 \ '4 w ■ ■ ( tt i sm — / V 12 Appendix C Complex Numbers 60 1m We therefore have a method for finding the sine and cosine of an angle such as nl\2 that is not a special angle but that can be obtained as a sum or difference of special angles. De Moivre's Theorem If n is a positive integer and z — r(cos 8 + i sin 8), then repeated use of Equation (1) yields formulas for the powers of z: z2 = r2(cos 20 + i sin 20) z3 = zz2 = r3(cos 30 + i sin 30) z4 = zz3 - r4(cos4# + / sin 48) In general, we have the following result, known as De Moivre's Theorem. Abraham De Moivre (1667-1754) was a French mathematician who made important contributions to trigonometry, analytic geometry, probability, and statistics. Appendix C Complex Numbers Theorem C. 1 De Moivre's Theorem If z = r(cos 8 + i sin 8) and n is a positive integer, then z" = rn(cosn0 + isin nö) Stated differently, we have \z"\ = \z\n and arg(z") = nargz In words, De Moivre's Theorem says that to take the nth. power of a complex number, we take the nth power of its absolute value and multiply its argument by n. 
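Here is a small numerical check of the theorem (an illustrative sketch only; the test value z and exponent n are arbitrary choices, and Python's cmath module is assumed):

```python
import cmath

z, n = 2 - 3j, 5
r, theta = cmath.polar(z)

direct   = z ** n
demoivre = r ** n * (cmath.cos(n * theta) + 1j * cmath.sin(n * theta))

print(abs(direct - demoivre) < 1e-9)   # True: the two computations agree
```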
Example C.5 (Powers of 1 + i)  Find (1 + i)^6.

Solution  From Example C.3(a), we have 1 + i = √2(cos(π/4) + i sin(π/4)). Hence, De Moivre's Theorem gives

(1 + i)^6 = (√2)^6 (cos(6π/4) + i sin(6π/4)) = 8(cos(3π/2) + i sin(3π/2)) = 8(0 + i(−1)) = −8i

See Figure C.9, which shows 1 + i, (1 + i)^2, (1 + i)^3, ..., (1 + i)^6.

We can also use De Moivre's Theorem to find nth roots of complex numbers. An nth root of the complex number z is any complex number w such that w^n = z. In polar form, write w = s(cos φ + i sin φ). If w^n = z, then De Moivre's Theorem gives s^n(cos nφ + i sin nφ) = r(cos θ + i sin θ). Comparing absolute values, we get s^n = r, so s = r^(1/n), the real positive nth root of r. Comparing arguments, we see that nφ
and 6 differ by an integer multiple of 2 77; that is, 8 + 2fcir tup = 8 + IkTT or (p = n ■ AppendixC Complex Numbers CS where k is an integer. Therefore 0 + 2kir\ (0 + 2kir s|--) + i sinl--- describes the possible nth roots of z as k ranges over the integers. It is not hard to show that fc = 0,1, 2,..., n — 1 produce distinct values of w, so there are exactly n different nth roots of z = r(cos 0 + /' sin 0). We summarize this result as follows: Let z — r(cos 0 + i sin 0) and let n be a positive integer. Then z has exactly n distinct Mh roots given by ,1/» 0 + 2fciT + i sin 0 + IkTT (2) for'Kf = 0,1,2, ...,■« - 1. Example C.6 *-Re Figure MO The cube roots of —27 Find the three cube roots of —27. Solution In polar form, -27 = 27(cos 77 + i sin 77). It follows that the cube roots of — 27 are given by cos (-27)113 = 271/3 Using formula (2) with n = 3, we obtain 27l/3 'it + 2ir- 2kTT\ i tt + i sin 2krr for k = 0,1,2 77 tt cos--h 1 sin — 3 3 = 3 V3 3 3V3 = —I--i 2 2 27i/3 27' cos 7T + 4-7T + «sin 7t + 27t tt + 47t 3(cos tt + i sin tt) = —3 577 577 3 cos---V i sin — 3 3 *1 V3 3V3 As Figure C. 10 shows, the three cube roots of — 27 are equally spaced 2tt/3 radians (120°) apart around a circle of radius 3 centered at the origin. In general, formula (2) implies that the «th roots of Z = r(cos 0 + i sin 0) will lie on a circle of radius riln centered at the origin. Moreover, they will be equally spaced 277/n radians (360/n°) apart. (Verify this.) Thus, if we can find one nth root of z, the remaining nth roots of z can be obtained by rotating the first root through successive increments of 2t7/« radians. Had we known this in Example C.6, we could have used the fact that the real cube root of —27 is —3 and then rotated it twice through an angle of 277/3 radians (120°) to get the other two cube roots. Numbers 1 Leonhard Euler (1707-1783) was the most prolific mathematician of all time. He has over 900 pub-| lications to his name, and his collected works fill over 70 volumes. There are so many results attributed to him that "Euler's formula" or "Euler's Theorem" can mean many different things, depending | on the context. Euler worked in so many areas of mathematics, it is difficult to list them all. His contribu- I tions to calculus and analysis, differential equations, number theory, geometry, topology, me- I chanics, and other areas of applied mathematics continue to be influential. He also intro- I duced much of the notation we currently use, including tt, e, i, 2 for summation, A for difference, and/(x) for a function, and was the first to treat sine and cosine as functions. Euler was born in Switzerland but spent most of his mathematical life in Russia and Germany. In 1727, he joined the St. Petersburg Academy of Sciences, which had been founded by Catherine 1, the wife of Peter the Great. He went to Berlin in 1741 at the invitation of Frederick the Great, but i returned in 1766 to St. Petersburg, where he remained until his death. When he was young, he lost the vision in one eye as the result of an illness, and by 1776 he had lost the vision in the other eye and was totally blind. Remarkably, his mathematical output did not diminish, and he continued to be productive until the day he died. Euler's Formula In calculus, you learn that the function ez has a power series expansion z2 z3 f = 1 + z + — + — + 2! 3! that converges for every real number z. 
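As a small numerical illustration (not part of the derivation that follows; it assumes only Python's math module and an arbitrary test value x), partial sums of this series converge rapidly to e^x:

```python
import math

def exp_series(x, terms=20):
    """Partial sum 1 + x + x**2/2! + ... + x**(terms - 1)/(terms - 1)!."""
    return sum(x ** k / math.factorial(k) for k in range(terms))

x = 1.5
print(exp_series(x), math.exp(x))                  # both about 4.481689...
print(abs(exp_series(x) - math.exp(x)) < 1e-12)    # True
```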
It can be shown that this expansion also works when z is a complex number and that the complex exponential function ez obeys the usual rules for exponents. The sine and cosine functions also have power series expansions: y, 3 J y,7 A ./V ,\ sin x = x---1------1-- 3! 5! 7! 2,4 6 0c A- x cos x — 1---1------H-- 2! 4! 6! If we let z = ix, where x is a real number, then we have (ix)2 (ix)3 e" = eft = 1 + ix + — — + -~ + • • • 2! 3! Using the fact that i2 = -1, iJ = - i, i* = 1, i5 = i, and so on, repeating in a cycle of length 4, we see that 2 -3 4 -5 .6 '7 iY X XX X XX X XX e = 1 + xx--------1-----1---------h H--- 2! 3! 4! 5! 6! 7! 24 6 \ / 357 X XX \ I X X X = 1----+----+---- + i\x--+-----+ 2! 4! 6! /V 3! 5! 7! = cos x + i sin x This remarkable result is known as Euler's formula. AppendixC Complex Numbers ;: Theorem G.2 Euler's Formula For any real number x, e'x — cos x + isinx Using Euler's formula, we see that the polar form of a complex number can be written more compactly as i z = r(cos 0 + i sin 6) = re'0 For example, from Example C.3(a), we have 1 + i = V^cos™ + / sin ~j = V2ein/4 We can also go in the other direction and convert a complex exponential back into polar or standard form. ... ^ EXflit Write the following in the form a + hi: (a) eilT (b) e2+br/i < Solution (a) Using Euler's formula, we have em = cos tt + i sin it = — 1 + i • 0 = — 1 (If we write this equation as eOT + 1 = 0, we obtain what is surely one of the most remarkable equations in mathematics. It contains the fundamental operations of addition, multiplication, and exponentiation; the additive identity 0 and the multiplicative identity 1; the two most important transcendental numbers, tt and e; and the complex unit i—-all in one equation!) (b) Using rules for exponents together with Euler's formula, we obtain e2V2 e2V2 _-----1_------2 2 2 If z = re'6 -■■ r(cos 6 + i sin 0), then 2 = r(cos 6 — i sin 0) The trigonometric identities cos(—0) = cos 0 and sin(—0) (3) = —sinö C11 Appendix C Complex Numbers allow us to rewrite Equation (3) as z = r(cos(-0) + isin(-0)) = reK~0) This gives the following useful formula for the conjugate: If z = re* then z = re""' Note Etiler s formula gives a quick, one-line proof of De Moivre s Theorem: [r(cos0 + i sin0)]" = (rew)n = rneine = r"(cos n6 + island) Appendix D Polynomials Euler gave the most algebraic of the proofs of the existence of the roots of [a polynomial] equation.... 7 regard it as unjust to ascribe this proof exclusively to Gauss, who merely added the finishing touches. —Georg Frobenius, 1907 Quoted on the MacTutor History of Mathematics archive, http://www-history.mcs ,st-and.ac.uk/history/ A polynomial is a function p of a single variable x that can be written in the form p(x) = fl0 + axx + a2x2 + • • • + anx" (1) where a0, a„ are constants (an * 0), called the coefficients of p. With the con- vention that x° = 1, we can use summation notation to write p as n 4=0 The integer n is called the degree of p, which is denoted by writing deg p = n. A polynomial of degree zero is called a constant polynomial. IsilSlli 1,1 Which of the following are polynomials? (a) 2 - fx + V_x2 (d) ln(~ (g) cos(2 cos'^x) (b) 2 - 1 (e) 3xl x2 — 5x + 6 x - 2 (c) \/lx2 (f) Vx (h) ex Solution (a) This is the only one that is obviously a polynomial. (b) A polynomial of the form shown in Equation (1) cannot become infinite as x approaches a finite value [limp(x) =f ±oo], whereas 2 — l/3x2 approaches ~oo as x approaches zero. Hence, it is not a polynomial. 
(c) We have V2X2 = V~iVx2 = V2\x\ which is equal to V2x when %S0 and to - V_x when x < 0. Therefore, this expression is formed by "splicing together" two polynomials (a piecewisepolynomial), but it is not a polynomial itself. "Exercises and selected odd-numbered answers for this appendix can be found on the student companion website. D1 Appendix D Polynomials (d) Using properties of exponents and logarithms, we have Í2e5x'\ lni —spi = ]n(2e5x'~3x) = In 2 + ln(e5x'~3x) = In 2 + 5x3 - 3x = In 2 - 3x + 5x3 so this expression is a polynomial. (e) The domain of this function consists of all real numbers x # 2. For these values of x, the function simplifies to t x2 - 5x + 6 (x - 2)(x - 3) --------_ x - 3 x — 2 x — 2 so we can say that it is a polynomial on its domain. (f) We see that this function cannot be a polynomial (even on its domain x > 0), since repeated differentiation of a polynomial of the form shown in Equation (1) M—** eventually results in zero and Vx does not have this property. (Verify this.) (g) The domain of this expression is-l, using long division to obtain the quotient p/q. The next example illustrates the procedure, which is the same as for long division of one integer into another. Just as the quotient of two integers is not, in general, an integer, the quotient of two polynomials is not, in general, another polynomial. Compute - 1 + 2x - x2 + 3x3 4x + x We will perform long division. It is helpful to write each polynomial with decreasing powers of x. Accordingly, we have x2 - 4x + ll^'-'^Tl^'+'i We begin by dividing x2 into 3x3 to obtain the partial quotient 3x. We then multiply 3% by the divisor x2 - 4% + 2, subtract the result, and bring down the next term from the dividend (3x3 — x2 + 2% + 1): x2 - 4x + 2J3x3 ■ x + 2x + 1 12x2 + 6% Ilx* - 4x + 1 Then we repeat the process with 1 lx2, multiplying 11 by x2 — 4% + 2 and subtracting the result from 1 lx2 — 4x + 1. We obtain 3% + 11 - 4x + 2)3a x2 + 2% + 1 3x3 - 12x2 + 6% llx2 - 4x + 1 llx2 r 44% + 22 40% - 21 BI Appendix D Polynomials We now have a remainder 40x — 21. Its degree is less than that of the divisor x2 — Ax + 2, so the process stops, and we have found that 3%3 - x2 + 2x + 1 = (x2 ~ 4x + 2)(3x + 11) + (40x - 21) 3x3 - x2 + 2x + 1 40x -- 21 or ----;--;——---= 3x + 11 + 4x + 2 x — 4x + 2 Example D.4 can be generalized to give the following result, known as the division algorithm.» Theorem D.1 The Division Algorithm If/and g are polynomials with degg S deg/, then there are polynomials q and r such that fix) = gix)q{x) + r{x) where either r = 0 or deg r < deg g. In Example D.4, fix) = 3x3 - x2 + 2x + 1, g{x) = x2 - 4x + 2, ^{x) = 3x + 11, and r(x) = 40* - 21 In the division algorithm, if the remainder is zero, then fix) = gix)q{x) and we say that g is a factor of/ (Notice that q is also a factor of/) There is a close connection between the factors of a polynomial and its zeros. A zero of a polynomial /is a number a such that/(a) = 0. [The number a is also called a root of the polynomial equation fix) = 0.] The following result, known as the Factor Theorem, establishes the connection between factors of a polynomial and its zeros. Theorem D.2 Hie Factor Theorem Let /be a polynomial and let a be a constant. Then a is a zero of/if and only if x — a is a factor of/(x). Proof By the division algorithm, fix) = {x — a)q{x) + r{x) where either r(x) = 0 or deg r < deg(x — a) = 1. Thus, in either case, r{x) = r is a constant. 
Now, f{d) = (a — a)q{a) + r = r Appendix D Polynomials so/(a) = 0 if and only if r — 0, which is equivalent to /(x) = (x - a)q{x) as we needed lo prove. —MKKKM There is no method that is guaranteed to find the zeros of a given polynomial. However, there are some guidelines that are useful in special cases. The case of a polynomial with integer coefficients is particularly interesting. The following result, known as the Rational Roots Theorem, gives criteria for a zero of such a polynomial to be a rational number. Theorem 0.3 The Rational Roots Theorem Let f{x) = a0 + ayx + ■ ■ • + fl„x" be a polynomial with integer coefficients and let alh be a rational number written in lowest terms. If alb is a zero of/, then «0 is a multiple of a and a„ is a multiple ofb. Proof If a/£> is a zero of/, then aa + a \b Multiplying through by b", we have "[b] a0bn + axab" an^.lan"lb + a„a = 0 which implies that a0bn + aiabn~l + ■■■ + a„^anlb = -a„an (1) (2) The left-hand side of Equation (2) is a multiple of b, so anan must be a multiple of b also. Since a/b is in lowest terms, a and b have no common factors greater than 1. Therefore, an must be a multiple of b. We can also write Equation (1) as -a0b" = axabn~x + ■ ■ ■ + an_xan~lb + anan and a similar argument shows that a0 must be a multiple of a. (Show this.) Example 0.5 ! Find all the rational roots of the equation 6x3 + I ,lv' 4 = 0 (3) If a/b is a root of this equation, then 6 is a multiple of b and -4 is a multiple of a, by the Rational Roots Theorem. Therefore, ae{±l, ±2, ±4} and be{±1, ±2, ±3, ±6} Appendix D Polynomials Forming all possible rational numbers a/b with these choices of a and b, we see that the only possible rational roots of the given equation are -t- 1 +1 +4 +1 -hi -t- - -f-i _ 1, _ Z, _ 1, _ 2> — 3» — 3> — 3> — 6 Substituting these values into Equation (3) one at a time, we find that — 2, —|, and \ are the only values from this list that are actually roots. (Check these.) As we will see shortly, a polynomial equation of degree 3 cannot have more than three roots, so these are not only all the rational roots of Equation (3) but also its only roots. : We can improve on the trial-and-error method of Example D.5 in various ways. For example, once we find one root a of a given polynomial equation/(x) = 0, we know that x - a is a factor of/(x)—say,/(x) = (x - a)g(x). We can therefore divide f(x) by x — a (using long division) to find g(x). Since deg g < deg/ the roots ofg(x) = 0 [which are also roots of/(x) = 0] may be easier to find. In particular, ifg(x) is a quadratic polynomial, we have access to the quadratic formula. Suppose ax2 + bx + c = 0 (We may assume that a is positive, since multiplying both sides by -1 would produce an equivalent equation otherwise.) Then, completing the square, we have , b b2\ b2 a\ x 4—x H--r =--c a 4a 4a (Verify this.) Equivalently, b V b2 ( b\2 b2 - 4ac ay x _|--~---c or I x -j--=---— 2a J 4a \ 2a 4a Therefore, x + b lb2 - 4ac ±Vb2 - 4ac 2a V 4a2 2a -b ± VlF^lac or x 2a Let's revisit the equation from Example D.5 with the quadratic formula in mind. --------------------------__--- Find the roots of 6x3 + 13x2 - 4 = 0. Solution Let's suppose we use the Rational Roots Theorem to discover that x = — 2 ' is a rational root of 6x3 + 13x2 — 4 = 0. Then x + 2 is a factor of 6x3 + 13x2 — 4, and long division gives 6x3 + 13x2 - 4 = (x 4- 2)(6x2 + x - 2) Appendix D Polynomials (Check this.) 
We can now apply the quadratic formula to the second factor to find that its zeros are

x = (−1 ± √(1² − 4(6)(−2)))/(2·6) = (−1 ± √49)/12 = (−1 ± 7)/12

or, in lowest terms, 1/2 and −2/3. Thus, the three roots of Equation (3) are −2, 1/2, and −2/3, as we determined in Example D.5.

Remark  The Factor Theorem establishes a connection between the zeros of a polynomial and its linear factors. However, a polynomial without linear factors may still have factors of higher degree. Furthermore, when asked to factor a polynomial, we need to know the number system to which the coefficients of the factors are supposed to belong. For example, consider the polynomial p(x) = x⁴ + 1. Over the rational numbers Q, the only possible zeros of p are 1 and −1, by the Rational Roots Theorem. A quick check shows that neither of these actually works, so p(x) has no linear factors with rational coefficients, by the Factor Theorem. However, p(x) may still factor into a product of two quadratics. We will check for quadratic factors using the method of undetermined coefficients. Suppose that

x⁴ + 1 = (x² + ax + b)(x² + cx + d)

Expanding the right-hand side and comparing coefficients, we obtain the equations

a + c = 0,  b + ac + d = 0,  bc + ad = 0,  bd = 1

If a = 0, then c = 0 and d = −b. This gives −b² = 1, which has no solutions in Q. Hence, we may assume that a ≠ 0. Then c = −a, and we obtain d = b. It now follows that b² = 1, so b = 1 or b = −1. This implies that a² = 2 or a² = −2, respectively, neither of which has solutions in Q. It follows that x⁴ + 1 cannot be factored over Q. We say that it is irreducible over Q. However, over the real numbers R, x⁴ + 1 does factor. The calculations we have just done show that

x⁴ + 1 = (x² + √2x + 1)(x² − √2x + 1)

(Why?) To see whether we can factor further, we apply the quadratic formula. We see that the first factor has zeros

x = (−√2 ± √((√2)² − 4))/2 = (−√2 ± √(−2))/2 = −1/√2 ± (1/√2)i

which are in C but not in R. Hence, x² + √2x + 1 cannot be factored into linear factors over R. Similarly, x² − √2x + 1 cannot be factored into linear factors over R. Our calculations show that a complete factorization of x⁴ + 1 is possible over the complex numbers C. The four zeros of x⁴ + 1 are

a = 1/√2 + (1/√2)i,  ā = 1/√2 − (1/√2)i,  −a = −1/√2 − (1/√2)i,  −ā = −1/√2 + (1/√2)i

which, as Figure D.1 shows, all lie on the unit circle in the complex plane. Thus, the factorization of x⁴ + 1 is

x⁴ + 1 = (x − a)(x − ā)(x + a)(x + ā)

The preceding Remark illustrates several important properties of polynomials. Notice that the polynomial p(x) = x⁴ + 1 satisfies deg p = 4 and has exactly four zeros in C. Furthermore, its complex zeros occur in conjugate pairs; that is, its complex zeros can be paired up as {a, ā} and {−a, −ā}. These last two facts are true in general. The first is an instance of the Fundamental Theorem of Algebra (FTA), a result that was first proved by Gauss in 1797.

Theorem D.4  The Fundamental Theorem of Algebra
Every polynomial of degree n with real or complex coefficients has exactly n zeros (counting multiplicities) in C.

This important theorem is sometimes stated as "Every polynomial with real or complex coefficients has a zero in C." Let's call this statement FTA′. Certainly, FTA implies FTA′. Conversely, if FTA′ is true, then if we have a polynomial p of degree n, it has a zero a in C.
The Factor Theorem then tells us that x — a is a factor oip(x), so p(x) = (x - a)q(x) where q is a polynomial of degree n — 1 (also with real or complex coefficients). We can now apply FTA' to q to get another zero, and so on, making FTA true. This argu-H >. ment can be made into a nice induction proof. (Try it.) It is not possible to give a formula (along the lines of the quadratic formula) for the zeros of polynomials of degree 5 or more. (The work of Abel and Galois confirmed this; see page 311.) Consequently, other methods must be used to prove FTA. The proof that Gauss gave uses topological methods and can be found in more advanced mathematics courses. Now suppose that p(x) = a0 + a,x + • • • + a„xn ► Re Appendix D Polynomials D9 is a polynomial with real coefficients. Let a be a complex zero of p so that a0 + axa + ■ ■ • + a„a" = p(a) = 0 Then, using properties of conjugates, we have p(a) = a0 + axa + ■ ■ ■ + ana" — a0 + axa + • ■ ■ + ana" = a0 + axa + • • • + a„a" = p[ci) = 0 = 0 Thus, a is also a zero of p. This proves the following result: The complex zeros of a polynomial with real coefficients occur in conjugate pairs. In some situations, we do not need to know what the zeros of a polynomial are— we only need to know7 where they are located. For example, we might only need to know whether the zeros are positive or negative (as in Theorem 4.35). One theorem that is useful in this regard is Descartes' Rule of Signs. It allows us to make certain predictions about the number of positive zeros of a polynomial with real coefficients based on the signs of these coefficients. Given a polynomial aa + axx +•■• + anx", write its nonzero coefficients in order. Replace each positive coefficient by a plus sign and each negative coefficient by a minus sign. We will say that the polynomial has k sign changes if there are k places where the coefficients change sign. For example, the polynomial 2 — 3x + 4x3 + x4 — 7x5 has the sign pattern +^_+ +_j so it has three sign changes, as indicated. Descartes' stated this rule in his 1637 book La Geometrie, but did not give a proof. Several mathematicians later furnished a proof, and Gauss provided a somewhat sharper version of the theorem in 1828. Theorem D.5 Descartes' Rule of Signs Let p be a polynomial with real coefficients that has k sign changes. Then the number of positive zeros of p (counting multiplicities) is at most k. In words, Descartes' Rule of Signs says that a real polynomial cannot have more positive zeros than it has sign changes. Example 0.7 Show that the polynomial p{x) = 4 + 2x2 — 7x4 has exactly one positive zero Solution The coefficients of p have the sign pattern + H—, which has only one sign change. So, by Descartes' Rule of Signs, p has at most one positive zero. But p(0) — 4 andp(l) = —1, so there is a zero somewhere in the interval (0, 1). Hence, this is the only positive zero of p. Appendix D Polynomials We can also use Descartes' Rule of Signs to give a bound on the number of negative zeros of a polynomial with real coefficients. Let p(x) = a0 + axx + a2x2 + • • • + anx" and let b be a negative zero of p. Then b = — c for c > 0, and we have 0 = p(b) = a0 + axb + a2b2 H-----h anbn = a0 - aAc + a2c2 —!-••• + (~l)"ancn But p(~x) = a0 - axx + a2x2 —I-----h (~l)nanxn so c is a positive zero ofp(-x). Therefore, p(x) has exactly as many negative zeros as p(—x) has positive zeros. Combined with Descartes' Rule of Signs, this observation yields the following: Let p be a polynomial with real coefficients. 
Then the number of negative zeros of p is at most the number of sign changes ofp(-x). -.— , , ,., , ..--»► Show that the zeros ofp(x) = 1 + 3x + 2x2 + x5 cannot all be real. Solution The coefficients of p(x) have no sign changes, so p has no positive zeros. ▼ Sincep(-x) - 1 — 3x + 2x2 - x5 has three sign changes among its coefficients, p has at most three negative zeros. We note that 0 is not a zero of p either, so p has at most three real zeros. Therefore, it has at least two complex (nonreal) zeros. Answers to Selected Odd-Numbered Exercises Answers are easy. It's asking the right questions [that's] hard. —Doctor Who "The Face of Evil," By Chris Boucher BBC, 1977 (d) I-►.T 5. (a) 0>> v A 2- 1^ H-h-\—*-x 1 2 3 ANS2 Answers to Selected Odd-Numbered Exercises (c) H—I—h -2 \ (d) i i 6 3 7. a + b = [5, 3] 11. [3, -2,3] 13. u = 1/2" "-V3/2" .V3/2. , v — _ -1/2. , U + V (1 - V3)/2 .(V3 - l)/2. , u - v = (1 + \/3)/2 _(1 + V3)/2. 15. 5a 17. x 3a 19. 2 X /"\ X \ X\ /' 2)( X 4 V 6 \ - 21. w -2u + 4v 25. u + v = 27. u + v = [0,1,0,0] 29. + 0 1 2 3 0 0 1 2 3 1 1 2 3 0 2 2. 3 0 1 3 3 0 1 2 31.0 35.0 39. 5 43. [0, 0, 2, 2], [2,3, 0 12 3 0 0 0 0 0 10 12 3 2 0 2 0 2 3 0 3 2 1 33. 1 37. 2, 0, 3 41. [1,1,0] 45. x = 2 47. No solution 49. x = 3 51. No solution 53. x = 2 55. x — 1, or x = 5 57. (a) All a # 0 (b) a = 1, 5 (c) a and m can have no common factors other than 1 [i.e., the greatest common divisor (gcd) of a and m is 1]. Exercises 1.2 1. -1 3. 11 5. 2 7. V5, -1/V5 2/V5. 9. Vli, 1/V14 2/Vli 3/V14 Answers to Selected Odd-Numbered Exercises ANS3 11. V6, [1/V6, 1/V3, 1/V2, 0] 13. Vl7 15. V6 17. (a) u • v is a scalar, not a vector. (c) v • w is a scalar and u is a vector. 19. Acute 21. Acute 23. Acute 25.60° 27. «88.10° 29. =14.34° 31. Since AB ■ AC = right angle. 4" r 1 • 1 _-l_ -3 = 0, ZBAC is a "f r "-l" r 1 1 >d3 = 1 ,d4 = -1 _1_ .-1. . 1_ 1, 33. If we take the cube to be a unit cube (as in Figure 1.34), the four diagonals are given by the vectors d, Since d, • d; 0 for all i + j (six possibilities), no two diagonals are perpendicular. 35. D = (-2, 1, 1) 37. 5 mi/h at an angle of 53.13° to the bank 39. 60° r 1- " -0.301 41. 43. 45. 0.033 -0.252 47. .4 = V45/2 51. v is of the form k — a 49. k = -2, 3 , where k is a scalar. 53. The Cauchy-Schwarz Inequality would be violated. Exercises 1.3 "3 2 l.(a) 3. (a) (b) x = 1 - t 0 (b) 3x + 2y = 0 X "1" — + t J- .0. 3, y = 3f X r x = t 5. (a) y t -1 (b) y= -t .z. 4_ z = 4t '3' X 7. (a) 2 • y = 2 (b) 3x + 2y + .1, 9. (a) X 2 -3 y = s 1 + t 2 z 2 1 (b) x = 2s y = 5 Z = 25 + f 3f 2f 1 + r 2 r -2 2 X 1 3 -1 y = 1 + 5 -1 + r 0 2 1 1 — 2 11. 13. 15. (a) x = f / = -1 + 3f 17. Direction vectors for the two lines are given by 1 X 0 1 = +1 y -i 3 if anc and d, = nh . The lines are perpendicular only if dj and d2 are orthogonal. But dj • dj = 0 if and only if 1 + m1m7 = 0 or, equivalently, minh = ~ 1- 19. (a) Perpendicular (c) Perpendicular (b) Parallel (d) Perpendicular a: 2 3 21. — .-1. + f _2_ X ■-1" 23. = 0 3 _z_ 3 -1 25. (a) x — 0, x = 1, y = 0, y = 1, z = 0, z = 1 (b) x — y = 0 (c) x + y - z = 0 27. 3 V2/2 29. 2 V3/3 31. (5, \) 33.(1,1,1) 35. 18V13/13 37.1 43. «78.9° 45. =80.4° Exercises 1.4 1. 13NatapproxN67.38 E 3. 8V3 Nat an angle of 30° to f, 5. 4 N at an angle of 60° to f2 7. 5 N at an angle of 60° to the given force, 5V3N perpendicular to the 5 N force 9. 750\ 2X 11. 980 N 13. « 117.6 N in the 15 cm wire, ~ 88.2 N in the 20 cm wire ANS4 Answers to Selected Odd-Numbered Exercises Review Questions 1. (a) T (c) F (e) T (g) F (i) T '-2/VI r 3.x 11 5. 120° 7. 
1/V5 0 9.2x + 3y-z=7 11. V6/2 13. The Cauchy-Schwarz Inequality would be violated. 15.2V6/3 17. x = 2 19. 3 Chapter 2 Exercises 2.1 1. Linear 3. Not linear because of the x"1 term 5. Not linear 7. 2x + Ay = 7 9. x + y = 4(x, y # 0) 2/1 4 - 25 - 3f 13. 5 15. Unique solution, x = 3, y = —3 1—I—► * 17. No solution 1—► * 19. [7, 3] 23. [5, -2, 1, 1] 27. 1 -1 0" 2 1 3 21. i i' l 25. [2, — 7 - 32] 1 5 -l" 29. 1 1 -5 2 4 4_ 31. y + z = 1 33. [1, 1] x — y =1 2x — y + z = 1 35. [4,-1] 37. No solution 39. (a) 2x + y = 3 (b) x = f - js 4x + 2 v = 6 y = s 41. Let u — ~ and v = —. The solution is x = k y = —5 x y 43. Let w = tan x, v = sin y, w — cos z. One solution is x = 7r/4,y = -tt/6, z = 7r/3. (There are infinitely many solutions.) Exercises 2.2 l.No 5. No 3. Reduced row echelon form 7. No "l 1 1 9. (a) 0 1 1 „0 0 1. "l 0 -1 13. (b) 0 1 -1 _0 0 0 11. (b) 1 0 0 1 0 0 15. Perform elementary row operations in the order R4 + 29R3, SR3, R4 - 3R2, R2 <-> R3, R4 - Rv R3 + 2RV and, finally, R2 + 2RV 17. One possibility is to perform elementary row operations on A in the order R2 - 3RV \R2, Rx + 2R2, R2 + 3RP i?! <-> R2. 19. Hint: Pick a random 2X2 matrix and try this— carefully! 21. This is really two elementary row operations combined: 3R2 and R2 — 2R}. 23. Exercise 1: 3; Exercise 3: 2; Exercise 5: 2; Exercise 7: 3 25. 27. t 29. 2 -1 31. 24 6 0 12 -10 -2 6 -6 0 + r 1 + s 0 + / 0 0 0 1 0 0 0 0 1 33. No solution 35. Unique solution Answers to Selected Odd-Numbered Exercises ANS5 37. Infinitely many solutions 39. Hint: Show that if ad — be + 0, the rank of a b c d is 2. (There are two cases: a = 0 and a # 0.) Use the Rank Theorem to deduce that the given system must have a unique solution. 41. (a) No solution if k = -1 (b) A unique solution if k # ±1 (c) Infinitely many solutions if fc = 1 43. (a) No solution if k = 1 (b) A unique solution if k 41 —2,1 (c) Infinitely many solutions if k = —2 45. X o" 9' y = -1 + t -10 _2_ 1. -7. 49. No intersection 51. The required vectors x = are the solutions of the homogeneous system with augmented matrix 0 0 By Theorem 3, there are infinitely many solutions. If Uj # 0 and m, v2 - h2Vj # 0, the solutions are given by u2v3 — U3V2 t U3V1 — KjV3 -Mlv2 ~ w2v1- But a direct check shows that these are still solutions even if ux = 0 and/or UjV, — u1vl = 0. 1 53. 55. 57. 7. Yes Exercises 2.3 1. Yes 3. No 5. Yes 9. We need to show that the vector equation x 1 y -i _ a b_ + has a solution for all values of a and fr. This vector equation is equivalent to the linear system . Row whose augmented matrix is reduction yields \l 1 a [l -1 a b — a see that there is a (unique) solution 1 1 0 -2 from which we can [Further row operations yield x = (a + b)/2, y=(a- b)/2.] Hence, U2 = span( 11. We need to show that the vector equation x f 1" 1" spanl 1. .-1. + y 1 0 a 1 + 2 1 = b 0 1 c has a solution for all values of a, b, and c. This vector equation is equivalent to the linear system whose augmented matrix is 0 0 . Row reduction yields a b b + c , from which we can see that there is a (unique) solution. [Further row operations yield x = (a — b + c)/2, y = (a + b — c)/2, z = ( -a + b + ( 1 1 0 \ Hence, U3 = span 0 > 1 í 1 { 1 0 1 / 13. (a) The line through the origin with direction '-I vector L 2. (b) The line with general equation 2x + y — 0 15. (a) The plane through the origin with direction V 3' 3 vectors 2 i 2 3_ .0. whose solution is r . It follows that there are (b) The plane with general equation 2x — y + 4z = 0 17. 
Substitution yields the linear system a + 3c = 0 -a + b - 3c = 0 -3 0 1 infinitely many solutions, the simplest perhaps being a = ~3, b = 0, c = 1. 19. u = u + 0(u + v) + 0(u + v + w) v = (-l)u + (u + v) + 0(u + v + w) w = Ou + (-l)(u + v) + (u + v + w) Answers to Selected Odd-Numbered Exercises 21. (c) We must show that span(e,, e:, e3) = span(e,, e, + e2, ej + e2 + e3). We know that span(e,, e, + e2, e + e. + e3) CR' = span(ep e2, e3). From Exercise 19, e., e2, and e3 all belong to span(ep e. + e,, et + e2 + e3). Therefore, by Exercise 21(b), span(ep e2, e3) = span(ep e, + e2, ej + e2 + e3). 23. Linearly independent 0 2 2 25. Linearly dependent, — 1 + 1 = 0 2 3 1 27. Linearly dependent, since the set contains the zero vector 29. Linearly independent 31. Linearly dependent, 43. (a) Yes 3 -1 -1 1 -1 3 -I 1 + + 1 1 1 3 -1 -1 3 1 (b) No Exercises 2.4 l.xl = 160, x2 = 120, x3 ~ 160 3. two small, three medium, four large 5. 65 bags of house blend, 30 bags of special blend, 45 bags of gourmet blend 7. 4FeS2 + 1102 -> 21V-0 + 8S02 9. 2C4H,0 + 1302 -> 8C02 + 10H2O 11.2C5HuOH + 1502 -> 12H20 + 10CO2 13. Na2C03 + 4C + N2 -> 2NaCN + 3CO 15.(a)/,= 30 - f (b)/, = 15,/3 = 15 f2 = -10 + t (c) 0 < fx < 20 0 < /2 < 20 10 200 (from the/ equa- tion), but/ = ŕ < 150 (from the/, equation). This is a contradiction, (d) 50 j. Then, by the definition of an upper triangular matrix, an ~ aa = " ' ' = at,i-i ~ 0 and hj = = •••=&„/= 0 Now let C ■ -n) AB. Then q, = anbxj + ai2b2j + ■ = 0 • bxj + 0 ■ • + a,„ -0 = 0 from which it follows that C is upper triangular. 35. (a) A, B symmetric => (A + B)r = A1 + Br = A + B A + B is symmetric 37. Matrices (b) and (c) are skew-symmetric. 41. Either A or B (or both) must be the zero matrix. 1 2 3 "l 3 5 0 -1 -2 43. (b) 4 5 6 = 3 5 7 + 1 0 -1 J 8 9_ 5 7 9_ 2 1 0 47. Hint: Use the trace. Exercises 3.3 2 -7 -1 4 1. 5. Not invertible 3. Not invertible -2.8] 7. 1.6 0.3 1 J a/(a2 + b2) b/(a2 + b2) _-b/(a2 + b2) a/(a2 + b2) "-5" 11. 9. 13. (a) x1 = 4" "-5" 6" 1 > X2 = ,x3 = - ~~2- 2_ _-2_ 17. 21. 23. (c) The method in part (b) uses fewer multiplications, (b) (AB)"1 = A~1B~1 if and only if AB = BA X = A~l(BA)2Bl A 1 0 0" 27. E = 25. £ 29. E = X = (ABTlBA 0 0 1 0 1 0 1 0 0. 1 0 0 0 1 2 0 0 1 0 1 0 -1 0 1 31. 3 0 0 1 33. 39. 43. 45. 47. 0 1 1 0 1 0 0^ "l 0 0 35. 0 1 2 37. 0 1/c 0 _0 0 1 _0 0 i 1 0 •1 1 0 -2 1 0 1 1 (a) If A is invertible, then BA = CA=> (BA)A~! = (CA)A"1 B(AA ') = C(AA ') => B7 = CI = B = C. Hint: Rewrite A2 - 2A + 7 = O as A (27 - A) = 7. If AB is invertible, then there exists a matrix X such that (AB)X = I. But then A(BX) = 7 too, so A is invertible (with inverse BX). ~l/{a2 + 1) a/(a2 + 1) 49. 53. Not invertible 51. -a/(a2 + 1) 1/V + DJ 1/a 0 0 n 55. -1/a2 1/a 0 ,a * 1/a3 -1/a2 l/a_ "-11 - -2 5 -4" 4 1 -2 2 57. 5 1 -2 2 9 2 -4 3_ " 1 0 0 0 " 0 1 0 0 59. 0 0 1 0 _ — a/d -bid -c/d l/d_ d+Q 61. Not invertible 63. ANS12 Answers to Selected Odd-Numbered Exercises 69. 1 0 0 0" 0 1 0 0 -2 -3 1 0 -1 -2 0 1 71. "-1 0 1 r 0 1 -1 0 0 1 0 0 1 "1 0 0. Exercises 3.4 1. 7. 11. 13. 1 0 0 4 1 0 .8 3 1 1 0 2 1 0 3 1 -1 0 -2 3. 2 3 T o .0 o o o o 0 1 -3/2 -2 -1 -7' -15 —2 2 1 0 0 0 1 0 0 0 1 2 3 -3 -6 0 3. 1 2 0 2 0 0 0 0 1 - 0 3 3 0 0 15. L1 = "1 0" , U"1 = - i l -2 12 .1 1. 19. 21. 23. 25. 27. A = "0 0 0 1 .1 0 "0 1 1 0 0 0 _0 0 ~0 1 1 0 .0 0 "0 1 1 0 0 0 0 0 -5/12 1/12 1/6 1/6. 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 1 0 "o 1 0" 1 0 0 .0 0 1 0 0 0 1 0 10 0 0 0 10 10 0 0 1 0 0 0 1 0 1 5 1 0 0 10' 0 10 0 10 0 0 0 0 0 1 -1 2 1 0 1 4 0 0 -16 0 0 0 1 0 0 0 1 0 -1 0 1 31. 
1 0 -1 1 -1 0 0 0 "-2 0 0 6 1 1 -1 1 2' 3 0 1 1 0 0 4 1 -4 Exercises 3.5 1. Subspace 3. Subspace 5. Subspace 7. Not a subspace 11. b is in col(A), w is not in row(A). 15. No 17. {[1 0 — 1 ], [0 1 2]}isabasisforrow(A) 1 is a basis for col(A); -2 1 is a basis for null(A). 19. {[1 0 1 0], [0 1 -1 0], [0 0 0 1 ]} is a basis (Til r11 r r for row(A); I 0 , 1 , 1 0 1 -1 is a basis for col(A); *-r < l > l °_ > is a basis for null(A). 21.{[1 0 -1], [1 1 1]} is a basis for row(A); is a basis for col(A) 23- {[1 1 0 1], [0 1 -1 1], [0 1 -1 -1]} is a basis for row(A); col(A) 1 0 0 0 J 1 0 0 0 1 is a basis for 25. Both {[ 1 0 -1], [0 1 2]}and{[l 0 -1], [1 1 1 ]} are linearly independent spanning sets for row(A) = {{a b -a + 2b]}. Both are linearly independent spanning and sets for col(A) 1 -1 1 0 o. 1 27. 29. {[1 0 0], [0 1 0], [0 0 1]) 31.{[2 -3 1], [1 - 1 0], [4 -4 1]} 35. rank(A) = 2, nullity(A) = 1 37. rank(A) = 3, nullity(A) = 1 39. If A is 3 X 5, then rank(A) £ 3, so there cannot be more than three linearly independent columns. 41. nullity(A) = 2, 3,4, or 5 Answers to Selected Odd-Numbered Exercises ANS13 43. If a = ~1, then rank(A) = I; if a — 2, then rank(A) = 2; for a # -1,2, rank(A) = 3. 45. Yes 47. Yes 49. No 51. w is in span(/3) if and only if the linear system with augmented matrix [B j w] is consistent, which is true in this case, since "l 1 r "l 0 3~ B\w]- 2 0 6 -► 0 1 -2 „0 -1 2, .0 0 0. From this reduced row echelon form, it is also clear that [w]s = _^ 53. rank(A) = 2, nullity(A) = 1 55. rank(A) = 3, nullity(A) = 1 57. Let A1;..., A„, be the row vectors of A so that row(A) = span(At,..., Am). If x is in null(A), then, since Ax = 0, we also have A, • x = 0 for / = 1,... ,m, by the row-column definition of matrix multiplication. If r is in row(A), then r is of the form r = c,Ai + • • • + cmAm. Therefore, r • x = (cjA] + • • • + cmAJ •x = Ci(Ai'x) + • • ■ + c,„(Am -x) = 0 59. (a) If a set of columns of AB is linearly independent, then the corresponding columns of B are linearly independent (by an argument similar to that needed to prove Exercise 29 in Section 3.1). It follows that the maximum number k of linearly independent columns of AB [i.e., k = rank(AB)] is not more than the maximum number r of linearly independent columns of B [i.e., r = rank(B)]. In other words, rank(AB) < rank(B). 61. (a) From Exercise 59(a), rank(LA) _ rank(A) and rank(A) = rank((C/_1L/) A) = rnnk(U"l(UA)) < rank(C/A). Hence, rank(LA) = rank(A). Exercises 3.6 i. r(u) = 0" 8" . T(v) = 1 .11. 11. 1 -1 13. 15. [F] -1 0 17. [D] 1 -3. 2 0 0 3 19. k 0 0 1 stretches or contracts in the x-direction (com bined with a reflection in the y-axis if k < 0); Lo k stretches or contracts in the y -direction (combined 0 1 1 0 with a reflection in the x-axis if k < 0); reflection in the line y = x; is a shear in the y -direction. For 1 k 0 1 is a shear in the x-direction example, 1 0 k l y A. (0, 1) (0. 0) (1,1) (1,0) (0,1) (0, 0) (1.1) (1,0) 21. 25. V3/2 1/2 -1/2 V3/2. 0 -1 -1 0 31. [S ° T] = 33. [S ° T] = 35. [S 0 T] = *• x 23. 27. 1 _I 2 2 i 1 "2 2 3 4' "1 5 1 1 5 5. -8 5 4 1 0 6 - 1 -2 1 0 -1 1 0 -1 37. ■V3/2 1/2 1/2 V3/2. -1 0 1 39. -V3/2 1/2 -1/2 -V3/2. 45. In vector form, let the parallel lines be given by x = p + td and x' = p' + td. Their images are T(x) = T(p + fd) = T(p) + tT(d) and T(x') = T(p' + td) = T(p') + tT(d). Suppose T(d) * 0. If T(p') — T(p) is parallel to T(d), then the images represent the same line; otherwise the images represent distinct parallel lines. 
On the other hand, if T(d) = 0, ANS14 Answers to Selected Odd-Numbered Exercises then the images represent two distinct points if T(p') ¥= T(p) and single point otherwise. 47. (-2, 3) (-2, -3) Exercises 3.7 "0.4 1.x, = 0.6 (2, 3) (2, -3) "0.38" .0.62. ~150~ "155" 5. Xj = 120 ,x2 = 120 120 115 9. (a) P = "0.662 0.250 .0.338 0.750 (c) 42.5% wet, 57.5% dry 3. 64% 7 JL (b) 0.353 11. (a) P 0.08 0.07 0.85 0.09 0.11 0.80 0.11 0.05 0.84 (b) 0.08, 0.1062, 0.1057, 0.1057, 0.1057 (c) 10.6% good, 5.5% fair, 83.9% poor 13. The entries of the vector jP are just the column sums of the matrix P. So P is stochastic if and only if jP = j. 15.4 19. Yes, x 23. No 27. Productive 17. 9.375 21. No 25. Yes, x = 10 27 35 31.x = 37. X, 39.x, = 10 16 45 6 500 70 50 29. Not productive 10 33. Yes, x 120" "375" , x2 = ,x3 = . 27. _ 72. 720 350 35 41. (a) For Lp we have Xj = 50 1175 504 175 40 40 160" "800" "640" , X4 = .160. .128. .640. "2560" "12800" .2560. ,Xg = . 2048. 200 _ 32 "3200 512 "10240 .10240 (b) The first population oscillates between two states,while the second approaches a steady state. 43. The population oscillates through a cycle of three states (for the relative population): If 0.1 < s S 1, the actual population is growing; if s = 0.1, the actual population goes through a cycle of length 3; and if 0 S s < 0,1, the actual population is declining (and will eventually die out). 45. A 47. A Answers to Selected Odd-Numbered Exercises ANSIS 51. 53. A "0 1 1 0" 0 0 0 0 0 1 0 1 1 0 0 0 55. A = 0 10 10 10 0 10 110 0 0 1 0 0 0 1 1 0 0 0 0 61.2 63. 3 65.0 67. 3 69. (a) Vertex i is not adjacent to any other vertices. 71. If we use direct wins only, P2 is in first place; P3, P4, and P6 tie for second place; and P} and P5 tie for third place. If we combine direct and indirect wins, the players rank as follows: P2 in first place, followed by P6, P4, P3,P5, and Pj. 73. (a) Ann Bert Ehaz Dana Carla (b) two steps; all of the off-diagonal entries of the second row of A + A2 are nonzero, (d) If the graph has n vertices, check the (f,;') entry of the powers Ak for k = 1,... ,n — 1. Vertex i is connected to vertex j by a path of length k if and onlyif(A'')y * 0. 75. (AAr)y counts the number of vertices adjacent to both vertex i and vertex;'. 77. Bipartite 79. Bipartite Review Questions 1. (a) T (c) F 3. Impossible (e) T (g) T (i) T iZ 83 1_ 83 5_ 166 1 3" " 4 10 7. + .3 9. .10 25 0 = 1, 11. Because (J - A)(I + A + A2) = I - A3 = I (I - AY1 = 1 + A+A2. 13. Abasisforrow(A)is{[l, -2, 0,-1,0], [0, 0, 1,2,0], [0, 0, 0, 0, 1]}; a basis for col(A) is (or the standard basis for R3); and a basis for null(A) is 2 5 5 1 2 1 4 3 6 "2" f 1 0 0 -2 > \ 0 1 I A 0_ - 15. An invertible matrix has a trivial (zero) null space. If A is invertible, then so is Ar, and so both A and A1 have trivial null spaces. If A is not invertible, then A and AT need not have the same null space. For example, take 11 1" A 0 0 17. Because A has n linearly independent columns, rank(A) = n. Hence rank(ArA) = n by Theorem 3.28. Because A A is n Xn, this implies that ATA is invertible, by the Fundamental Theorem of Invertible Matrices. AAT need not be invertible. For example, take A 19. -1/5V2 -3/5 V2 2/5V^ 6/5V2 Chapter 4 Exercises 4.1 1. Av = = 3v, A = 3 AHS16 Answers to Selected Odd-Numbered Exercises 3. Av 5. Av = •3v. A = -3 = 3v, A = 3 9. 11. 1 1 -1 13. A = 1,£, = span 15. A = 0, £0 = span 17. A = 2, E2 = span "l" 19. V = , A = l;v = 1/V2] 21. v = , A = 2; v LI/V2J 23. A = 2, £2 = span I "0" 1_ > "0" J. "1* .0. > 1 , A = 2 . 1/V2J' ; A = 3, £, = span 25. 
A = 2, £2 = span^ A l- 0 x 2x H—N-.....H~"»l—► * 1 27. A = 1 + ;. jS1+j = span: span! A = 1 - /,£,_; = 29. A = 1 + i, El+i = span A = 1 — i, BH = span 31. A = 1,2 Exercises 4.2 1. 16 3. 0 33. A = 4 •18 11. a2b + ab2 13.4 27. -24 35. 8 47. -6 7. 6 15. afrdg 29. 0 37. -4 49. 9. -12 17. 0 25. 2 31. 0 33. -24 39.-8 45. fc^0,2 51. (-2)3" 53. det(Aß) = (det A)(det 5) = (det B) (det A) = det(BA) 55. 0, 1 1- 59.x = -!,>' = 0,2 = 1 61. ' 57.x = \,y = -\ 63. 2 2 1 0 1 -1 0 0 1 Exercises 4.3 1. (a) A2 - 7A + 12 (b) A = 3,4 (c) E3 = span ; £4 = span! (d) The algebraic and geometric multiplicities are all 1. 3. (a) -A3 + 2A2 + 5A - 6 (b) A = -2, 1, 3 / r ) ( V \ (c) jEL2 = span -3 = span 0 I ) \ _o_ / £3 ~ span 1 2 10 (d) The algebraic and geometric multiplicities are all 1. 5. (a) -A3 + A2 (b) A = 0,1 (c) £0 = span (d) A = 0 has algebraic multiplicity 2 and geometric multiplicity 1; A = 1 has algebraic and geometric multiplicity 1. 7. (a) -A3 + 9A2 - 27A + 27 (b) A = 3 / 2 \ I V \ -1 ; £j = span 0 I 1, J \ .1. 1 Answers to Selected Odd-Numbered Exercises ANSII ( "-1" "0" \ (c) £3 = span 0 > 1 V 1. .0. / (d) A = 3 has algebraic multiplicity 3 and geometric multiplicity 2. 9. (a) A4 - 6A3 + 9A2 + 4A - 12 (b) A = -1,2,3 I (c) £.....! = span " 0" \ 0 -2 1_ / / " r \ -1 span 0 I 0. / span \ / (d) A = — 1 and A = 3 have algebraic and geometric multiplicity 1; A = 2 has algebraic multiplicity 2 and geometric multiplicity 1. 11. (a) A4 - 4A3 4- 2A2 + 4A - 3 (b) A = -1, 1,3 (c) E_! = span Ex = span 3j7 span 0 0 2 Mi \ 15. (d) A = -1 and A = 3 have algebraic and geometric multiplicity 1; A = 1 has algebraic and geometric multiplicity 2. 2 (2 • 320 - l)/320 2 2" 2~9 + 3-2' -2~9 + 3-21 17. 23. (a) A = —2, E_2 = span spanM J -5 A = 5, E5 (b) (i) A = -j,£_1/2 = span span A - 5, £1/5 (iii) A = 0, £0 = span 2 -5 ;A = 7, = span ■3 4 -12" 27. 1 0 0 , _ 0 1 0_ 35. A2 = 4A - 5/, A 3 _ A4 = 24A - - 551 -A3 - 3A2 + 4A - 12 14, 4 11 37. A = —A + -I, A'2 =--A + —7 5 5 25 25 Exercises 4.4 1. The characteristic polynomial of A is A2 — 5 A + 1, but that of B is A2 - 2A + 1. 3. The eigenvalues of A are A = 2 and A = 4, but those of B are A = 1 and A = 4. l" S.'Ai = 4, £4 = span f "3 6,£6 = span 2 { .3 / 0" 1 \ 1 > 0 I »-1. _-l / ;A2 3, £3 = span -2, £_2 = span 9. Not diagonalizable '1 1 -l" "2 0 0" 11. p = 1 1 1 ,D = 0 -1 0 .1 -2 0_ .0 0 1 13. Not diagonalizable 15. P = 17. -11605 24234. (3* + 3(-l)*)/4 Okrl ~ 3(-l)*)/4] (3k-(~l)k)/4 (3*« + (-l)*)/4 J "1 0 0 -1 "2 0 0 0" 0 1 0 0 0 2 0 0 ,D = 0 0 1 0 0 0 -2 0 0 0 1 _0 0 0 -2_ 35839 -69630" ANS18 Answers to Selected Odd-Numbered Exercises 21. 23. 1 1 1 0-1 0 0 0-1 (5 + 2l+2 + (-3)VlQ (2* - (-3)*)/5 (-5 + 2ki2 + (-3)/()/T0 (2h+1 - 2(-3)k)/5 (2* + 4(~3)*)/5 (2*h - 2(-3)":)/5 (-5 + 2*+2 + (-3)*)/l0 (21 - (-3)*)/5 (5 + 2m + (-3)*)/10 . 25. fc = 0 27. it = 0 29. All real values of k 37. If A ~ 5, then there is an invertible matrix P such that B = P~XAP. Therefore, we have tr(B) = tr(P_1AP) = tr(P-1CAJP)) = tr((AP)P"') = tr(APP-1) = tr(Af) = tr(A) using Exercise 45 in Section 3.2, 7 -2" 39. P = 41. P 10 -5 0 51. (b) dim£_, = 1, dimf, dim£2 = 3 Exercises 4.5 1 2.5 l.(a) 6.000 (b) A, = 3. (a) 1 [0.61" (b) Aj = (3 + V5)/2 5. (a) m$ = 11.001, y5 = 7. (a) m8 = 10.000, y8 = = 2.618 -0.333" 1.000. V 0 1 9. 7* 26 8 1 0.308 26 17.692 [ 5.923.1 1 0.335. 17.692 18.018 6.004 1 0.333. 18.018 17.9"99 6.000 1 0.333. 17.999 18.000 6.000 'l 0.333. 18.000 Therefore, A! *> 18, v. 1 0.333 k 0 1 2 3 4 5 6 11. 1 7 7.571 7.755" "7.808" "7.823" ~7.827~ .0. .2. .2.857_ .3.132. .3.212. .3.234. .3.240. 
Yk "l' "l "l "l "l "l 1 l0. .0.286. .0.377. .0.404. .0.411. .0.413. .0.414. mk 1 7 7.571 7.755 7.808 7.823 7.827 Therefore, A, ~ 7.827, Vj 1 0.414 Answers to Selected Odd-Numbered Exercises 13. k 1 y* mk 21 15 13 1 0.714 0.619 21 16.809 12.238 10.714 1 0.728 .0.637. 16.809 17.011 12.371 10.824 1 0.727 .0.636. 17.011 16.999 12.363 10.818 'l 0.727 .0.636. 16.999 17.000 12.363 10.818 "l 0.727 .0.636. 17.000 Therefore, A i « 17, Vj ** 1 0.727 0.636. 15. A, <*» 5, Vj 1 0 0.333 k 0 1 2 3 4 5 6 [11 "7" "7.571" "7.755" "7.808" 7.823" "7.827 xk .0. .2. .2.857. .3.132. .3.212. .3.234. .3.240 7 7.755 7.823 7.828 7.828 7.828 7.828 Yk "l" 1 1 "l "l rl "l o ,0.286 0.377 0.404 0.411 0.413 0.414 k 0 1 2 3 4 5 R(xk) Yk 1 1 1 16.333 l" 1 .1, 21 15 13 16.998 1 0.714 0.619 16.809 12.238 10.714 17.000 1 0.728 0.637 17.011 12.371 10.824 17.000 1 0.727 0.636 16.999 12.363 10.818 17.000 1 0.727 0.636 17.000 12.363 10.818. 17.000 1 0.727 0.636 21. k y* mk 5 u. 1 0.8 5 4.8 3.2 1 0.667 4.8 3 4 5 6 7 8 4.667" 4.571" "4.500" "4.444" "4.400" "4.364" .2.667. .2.286. .2.000. .1.778. .1.600. .1.455. "l "l "l 1 "1 "l .0.571. .0.500. .0.444. .0.400. .0.364. .0.333. 4.667 4.571 4.500 4.444 4.400 4.364 Since A! = A2 = 4, v, = the exact answer. mk is converging slowly to ANS20 Answers to Selected Odd-Numbered Exercises 23. A: 1 8 mk 5 4 .1, 1 0.8 0.2 4.2 3.2 0.2 1 0.762 0.048 4.2 4.048 3.048 0.048 1 0.753 0.012 4.048 4.012 3.012 0.012 1 0.751 0.003 4.012 4.003 3.003 0.003. 1 0.750 0.001. 4.003 4.001 3.001 .0.001. 1 0.750 .0 4.001 4.000 3.000 0.000. 1 0.750 0 4.000 4.000 3.000 .0.000. 'l 0.750 .0 4.000 In this case, Aj = A2 = 4 and EA = span Clearly, mk is converging to 4 and jk is converging to a / "l" "o~ \ 0 1 \ _0_ .0. / f "o" vector in the eigenspace £4—namely, 0 + 0.75 1 _0 0 25. k 1 r xk _1_ 0. "l" "i" Yk .1. .0. mk I l -1 -1 1 1 -1 The exact eigenvalues are complex (i and —;'), so the power method cannot possibly converge to either the dominant eigenvalue or the dominant eigenvector if we start with a real initial iterate. Instead, the power method oscillates between two sets of real vectors. k 0 1 2 3 4 5 1 3 2.500 2.250 2.125 2.063 1 4 4.000 4.000 4.000 4.000 .1. .3. . 2.500 _ . 2.250 _ .2.125. .2.063. 1 0.750 0.625 0.562 0.531 0.516 Yk 1 1 1 1 1 1 -1_ - 0.750 _ . 0.625 _ _ 0.562. .0.531, .0.516. mk 1 4 4 4 4 4 Answers to Selected Odd-Numbered Exercises ANS21 The eigenvalues are Aj = —12, A, = 4, A3 = 2, with corresponding eigenvectors r V T" 0 > V2 = 2 ,v3 = 0 _-l_ .1. .1. Since Xq = |v2 + jv3, the initial vector x0 has a zero component in the direction of the dominant eigenvector, so the power method cannot converge to the dominant eigenvalue/eigenvector. Instead, it converges to a second eigenvalue/eigenvector pair, as the calculations show. 29. Apply the power method to A - 187 : 12 15 k 0 1 2 3 T 8" 15.2 15.2 .1. -10. -19 -19 rl" '-0.8" "-0.8" "-0.8" Yk .1. 1 1 1 mk 1 -10 -19 -19 Thus, —19 is the dominant eigenvalue of A — 187, and A2= —19 + 18 = —1 is the second eigenvalue of A. 31. Apply the power method to A - 177 = -8 4 8 4 -2 -4 8 -4 -8 Yk mk -0.667 1 4 -2 -4. 1 -0.5 -1 4 18 -18 9 18. 1 -0.5 1 -18 -18 -18 9 18. 1 -0.5 -1 -18 -18 In-this case, there is no dominant eigenvalue. (We could choose either 18 or —18 for mk, k a 2.) However, the Rayleigh quotient method (Exercises 17-20) converges to —18. Thus, —18 is the dominant eigenvalue of A - 177, and A2 = -18 + 17 = -1 is the second eigenvalue of A. 33. 1 2 3 4 5 "- 0.833" 0.798" 0.800" 0.800" 1.056_ .-0.997. .-1.000. .-1.000. 
"-0.789" -0.801" "-0.800" "-0.800" 1 1 1 1 Yk 0.5 -0.5, l" -1. 0.5 1.056 -0.997 ■1.000 1.000 Thus, the eigenvalue of A that is smallest in magnitude is l/(-l) = -1. ANS22 Answers to Selected Odd-Numbered Exercises k 0 l 2 3 4 5 . . . ■ ■. r *-0.500" 0.500" 0.500" 0.500"" 0.500" l 0.000 0.333 0.111 0.259 0.160 0.500. .-0.500. .-0.500. .-0.500. .-0.500. r "-1.000" "-1.000" "-1.000" ~-l.Q00~ "-1.000 " l 0.000 -0.667 -0.222 -0.518 -0.321 1.000. 1.000. 1.000. 1.000. 1.000. l -0.500 -0.500 -0.500 -0.500 -0.500 Clearly, mk converges to —0.5, so the smallest eigenvalue ofAisl/(-0.5) = -2. 37. The calculations are the same as for Exercise 33. 39. We apply the inverse power method to A — 51 — -1 0 -1 -2 6 0 0 . Taking x0 , we have 1 0.200 1 -0.500 _1_ 0.200. "l" "-0.400" Yk 1 1 .1. .-0.400. mk 1 -0.500 -0.080 -0.500 -0.080 0.160' 1 0.160. 0.500 0.032 -0.500 0.032. -0.064' 1 -0.064. -0.500 Clearly, mk converges to —0.5, so the eigenvalue of A closest to 5 is 5 + l/(-Q.5) = 5-2 = 3. 41. 0.732 47 43. -0.619 Im 4 2- -2- -*-Re I—»-Re 51. Hint: Show that 0 is not contained in any Gerschgorin disk and then apply Theorem 4.16. 53. Exercise 52 implies that JAJ is less than or equal to all of the column sums of A for every eigenvalue A. But for a stochastic matrix, all column sums are 1. Hence JAj < l. Exercises 4,6 1. Not regular 5. Not regular 3. Regular 7.1 = I I 5 5 9.1 = 11. 1, 0.304 0.304 0.304 0.354 0.354 0.354 0.342 0.342 0.342. 13. 2, 16 4 1 15. The population is increasing, decreasing, and constant, respectively. Answers to Selected Odd-Numbered Exercises 17. P~lLP = bl b2Si J>3Si52 1 0 0 0 1 0 0 0 0 0 sn~2 bnSiS2 ■ • • S„_ 0 0 Ü 0 in- 1 65. (a) x(t) = -120em + 520ellt/w,y(t) = 240e8t/5 + 2601" 10. Strain X dies out after approximately 2.93 days; strain Y continues to grow. 67. a = 10, b = 20;x(t) = 10ef(cos t + sin f) + 10, y(t) = 10e'(cos f — sin t) + 20. Species Y dies out when t = 1.22. The characteristic polynomial of L is (A" — bxKn b2S]\n-2 - hs^x"-3-----&„w ■ -s^K-iT. 19. A = 1.746, p 21. A « 1.092, p 0.660 0.264 .0.076 0.535" 0.147 0.094 0.078 0.064 0.053 0.029 25. (a) h = 0.082 31.3, 35. Irreducible 45. 0, 1, 1,0,-1 49. yn = {n- \)2n 1 29. 3, 33. Reducible 43. 1,2,4, 8,16 47. x„ = 4" - (-1)" 51. b„ 2V3 57. (a) dx = 1, d2 = 2, d3 (b) dn = d„_] + d„_2 [(1 + V3)" - (1 - \/3)"l 3, d4 = 5, d5 = (c) d„ 1 V5 1 + V5\" 1 - V'5 2 / V 2 59. The general solution isx(f) = — 30^' + C2eu, y{t) = 2Cle~t + C2e4t. Tlie specific solution is x(f) = -3e'' + 3e4\ y(t) = 2e"! + 3e4t. 61. The general solution is Xj(0 = (1 + V2)C1ev5t + (1 - V2)C2e-v/5',x2(0 = C1ev'2t+ C2e-v'2,.The specific solution is Xj(f) = (2 + V2)eV274 + (2 - V2VV2y4, x2(t) = V2e^l4 - y/ü^/A. 63. The general solution is x(f) = — C, + C3e~',y(f) = Cj + C2e( - C3e~f, z(r) = Q + C2e'. The specific solution is x(t) = 2 — e~', y{t) = —2 + e' + e~\ z(t) = -2 + ef. 77. (a) 79. (a) 81. (a) 83. (a) = Qe2< + C2e 3f 1 3 9 27 .1. J .3. y „9. > _27 [1" T "1" r .1. .1. 1 .1. l -1 0.6 0.6 1.75" 3.125" > .-0.5 . ,-1-75 „ 0.36 0.216 > _0.36_ .0.216. (c) Repeller (c) Neither (c) Saddle point (c) Attractor 85. r = V2,6 = 45°, spiral repeller 87. r = 2,6 — —60°, spiral repeller -1 -1" "0.2 -0.1" 3" 89. P = ,c = 5 1 o_ 0,1 0.2 spiral attractor 91. P 1/2 -V3/2 1 0 orbital center 1/2 -V'3/2 3/2 1/2 Review Questions 1. (a) F (c) F (e) F (g) T (i) F 3. -18 5. Since A7 = —A, we have det A = det(Ar) - det(-A) = (-1)" detA = -det A by Theorem 4.7 and the fact that n is odd. It follows that det A = 0. 7. Ax 5 .10 9. 
(a) 4 - 3A2 = 5x, A A3 (c) jB] = span / r ( 2' "l" \ -l , E.....2 = span -1 0 I 0_ I \ 0_ .1. 1 11. 13. Not similar 15. Not similar 162 158. 17. 0, l,or-l 19. If Ax = Ax, then (A2 - 5A + 27)x = A2x - 5Ax + 2x = 32x - 5(3x) + 2x = -4x. R N12\ Answers to Selected Odd-Numbered Exercises feasor Exercises 5.1 1. Orthogonal 3. Not orthogonal 5. Orthogonal 11. Orthonormal i- "o" 7. [w]B = 2 9. [w]B = 3 1 13. 1/3' 2/3 2/3 2/V5 ' -1/V5 0 L 3 J 2/3V5 4/3V5 -5/3V5 15. Orthonormal 17. Orthogonal, 19. Orthogonal, 1/V2 -I/V2 _1/V2 I/V2. cos 8 sin 8 — cos v -sin18 21. Not orthogonal 27. cos(Z.(Qx, Qy)) Qx-Qy cos2 0 sin 0 sin 8 0 -cos 0 sin 0 cos 0 x-y iQxHIlQyll ||x||||y|| = cos(Z(x, y)) by Theorem 5.6 29. Rotation, 8 = 45° 31. Reflection, y = V3x 33. (a) A(AT + BT)B = AArB + ABTB = IB + Al = B + A = A + B (b) From part (a), det(A + B) = det(A(Ar + Br)B) = detAdet(AT + Br)detB = detAdet((A + 5)r)detB = det A det(A + B)det B Assume that det A + det B = 0 (so that det B = -det A) but that A + Bis invertible. Then det(A + B) * 0, so 1 = det A detB = det A(-det A) = -(det A)2. This is impossible, so we conclude that A + B cannot be invertible. Exercises 5.2 1. W1 = 3. W- - : x = f, y = f, 2 = — £ I f X : x — y + 3z I _z_ 5. W = 7. row(A):{[l 0 1], [0 1 -2]}, null(A): "0" I 3 .1. 1 9. col(A): _ r "-1* 5 2 1 0 > 1 I _-l_ _-l_ - -5"1 - 1 > -7 0_ , null(Ar): 15. 13. 17. --4- "-3" 1 0 0 1 3. 0_ -_2" ' 11' 7 * 2 1 2 19. v = 5 _6 . 5 _ + 5 4 . 5 - 21. v = -2 7 . 2- + 0 1 1- 2 25. No Exercises 5.3 i 1.Y, 3. v, q2 ■ "- 1/2 "1/v2" "-I/V2" , v 1/2 .1/V2. '^2 1/v2. r '2' 0" 1/V3" -1 = 1 . v3 = 1 1 = 1/V3 > -i_ .1. 1. .- 1/V3. 2/v6" 0 1 1/V6 = -I/V2 1/V6. . 1/V2 _ 5. f - i* 2 1 1 2 I .0. 2. 7. v + Answers to Selected Odd-Numbered Exercises ANS2S < "a" 3 - - 15" j 35 34 11. 1 ) 34 35 > 0 K _l 9 _ J _ 7 - - 34- 13. Q 15. 0 1/V2 1/V2 1/V2 0 T/V2 2/V6 -1/V6 1/V6 1/V3 1/V6 1/V3 -2/V2 1/V3 1/V6 1/V3 1/V3 -1/V3. V2 0 0 1/V2 1/vT 3/V6 1/V6 0 2/V3. 3 9 17. R = 0 6 .0 0 19. A = AI 21. A"1 = (Qfl) '1/V2 0 0 0 2/V6 1/V3 23. Let Rx = 0. Then Ax = Q£x = Q0 = 0. Since Ax represents a linear combination of the columns of A (which are linearly independent), we must have x = 0. Hence, R is invertible, by the Fundamental Theorem. = R 'CP = 1/V6 -1/2V3 2/V6 -1/2V3 • 0 3/2V3 1/V2 1/V2" 1/V6 1/V6 1/V3 -1/V3. Exercises 5.4 '1/V2 l.Q = 3.Q = 5.Q = 7.Q = 9.Q = .1/V2 2/Vi 1/V3 1 0 0 1/V2 0 1/V2 -1/V2, 1/V3 -2/V6. D = D = 0 1/V2 1/V2 1/V2 0 l/VT 1 0 0 1/V2 0 1/V2 1/V2 0 1/V2 0 0 1/V2 0 1/V2 D = D = 1/V2 1/V2 0 0 .0 0 "2 0 0' 0 1 0 .0 0 0. 0 ■ 0 1/V2 ■1/V2. D 11. QTAQ = "1/V2 .1/V2 I/V2 1/V2 1/V2 -I/V2. 0 = D 1/V2" •1/V2. + b 0 a - 13. (a) If A and B are orthogonally diagonalizable, then each is symmetric, by the Spectral Theorem. Therefore, A + B is symmetric, by Exercise 35 in Section 3.2, and so is orthogonally diagonalizable, by the Spectral Theorem. 15. If A and B are orthogonally diagonalizable, then each is symmetric, by the Spectral Theorem. Since AB = BA, AB is also symmetric, by Exercise 36 in Section 3.2. Hence, AB is orthogonally diagonalizable, by the Spectral Theorem. 17. A 19. A 21. + 5 0 0 0 0 0 0 0 0 "0 0 0~ "0 0 0 + 0 2 2 + 0 -1 1 0 2 2 0 1 -1 23. Exercises 5.5 1. 2x2 + 6xy + 4y2 3. 123 5. -5 "i 3" "3 -§- 7. 9. .-! -1. .3 2_ 11. 5 1 -2 13. Q 15. Q = 17. Q = 2/V5 I/V5 1/V5 -2/V5. '2/V5 2/3 V5 0 5/3 V5 A/V5 -4/3 V5 I/V3 1/V2 -1/V3 0 .-1/V3 I/V2 -1/V6 (v'Y ~ (z')2 ,yl + 6yl -1/3' 2/3 2/3. 1/V6 2/V6 . 9yi + 9y\ - 9yj ,2(x')2 + INI 28 Answers to Selected Odd-Numbered Exercises 19. Positive definite 23. Positive definite 21. 
Negative definite 25. Indefinite 43. Hyperbola, x' = x,y' = y + j, (x')2/4 - {y'f/9 27. For any vector x, we have X Ax = x B Bx = (Bx)r(Bx) = IJBxf > 0. IfxrAx = 0,then j|Bx||2 = 0, so Bx= 0. Since B is invertible, this implies that x = 0. Therefore, xTAx > 0 for all x ¥= 0, and hence A = BTB is positive definite. 29. (a) Every eigenvalue of cA is of the form cA for some eigenvalue A of A. By Theorem 5.24, A > 0, so cA > 0, since c is positive. Hence, cA is positive definite, by Theorem 5.24. (c) Let x * 0. Then xTAx > 0 and xrBx > 0, since A and B are positive definite. But then xJ (A + B)x = xrAx + xrBx > 0, so A + B is positive definite. 31. The maximum value of/(x) is 2 when x = ± the minimum value of/(x) is 0 when x = 33. The maximum value off(x) is 4 when x =: the minimum value of f(x) is 1 when x : 1/V2" -1/V2} "1/V2" .I/V2 "l/VT 4- 1/V3 .1/V3. I/V2" "-1/V5" 0 or ± 1/V2 .-1/V2. 0 35. Ellipse 37. Parabola 39. Hyperbola 41. Circle, x' = x - 2,y' = y - 2, (x')2 + (y')2 = 4 5- 45. Parabola, x' = x — 2, v' = y + 2, x' = — \{y')2 y y 47. Ellipse, (x'Ý/4 + (y'f/12 = 1 49. Hyperbola, (x')2 - {y'f = 1 Answers to Selected Odd-Numbered Exercises : (2 51. Ellipse, (x")2/50 + (y")2/10 = 1 53. Hyperbola, (x"f - (y"f = 1 55. Degenerate (two lines) 57. Degenerate (a point) y A 2- H—I—h H—I—I—*-x -2- 59. Degenerate (two lines) y 61. Hyperboloid of one sheet, (x')2 - (y')2 + 3(z')2 = 1 63. Hyperbolic paraboloid, z = — (x')2 + (y')1 65. Hyperbolic paraboloid, x' = — V3(y')2 + V3(z')2 67. Ellipsoid, 3(x")2 + (y")2 + 2(z")2 = 4 Review Questions 1. (a) T (c) T (e) F (g) F (i) F 9/2 " 2/3 5. Verify that QTQ = I. .-11/6. 3. 7. Theorem 5.6(c) shows that if v; • V, = 0, then Qv. • Qy = 0. Theorem 5.6(b) shows that {Qv., ..., Qvk} consists of unit vectors, because {v,,..., \k} does. Hence, {Qvj,..., Qvk} is an orthonormal set. r r—i " 2 9. 11. 13. row(A): {[1 0 2 3 4], [0102 1]} col(A): null(A): < 0 2 3 1" "-1" -1 2 2 1 3_ — 5 ~2 -3 0 -2 1 0 0 1 0 0 -4 -1 0 0 1 null(Ar): 15. (a) i 1 -4 3 1 1 4 1 3 1 1 4 1 3 _1_ . ""I_ 0 17. 19. I-2 1- < 1 1 2 i 3 0 5 "1 1 3 { 0 0 -1 - 1 "2 3 2 0~ 3 2 1 ~2 0 0 0 1. Exercises 6.1 1. Vector space 3. Not a vector space; axiom 1 fails. 5. Not a vector space; axiom 8 fails. 7. Vector space 9. Vector space 11. Vector space 15. Complex vector space AMS28 Answers to Selected Odd-Numbered Exercises 17. Not a complex vector space; axiom 6 fails. 19. Not a vector space; axioms 1,4, and 6 fail. 21. Not a vector space; the operations of addition and multiplication are not even the same. 27. Not a subspace 31. Subspace 35. Subspace 39. Subspace 43. Not a subspace 25. Subspace 29. Not a subspace 33. Subspace 37. Not a subspace 41. Subspace 45. Not a subspace 47. Take U to be the x -axis and W the y -axis, for example. Then V and „0. [iJ are in 17 U W, but "1" "0" + .0. 1 is not. 51. No 53. Yes; s(x) = (3 + 2t)p(x) + (14- t)q(x) + tr(x) for any scalar t. 55. Yes; h(x) = f(x) + g(x) 57. No 59. No 61. Yes Exercises 6.2 1. Linearly independent 3. Linearly dependent; 1 0" -1 f = 4 + 1 7 -2 2 "3 0' 0 2 - 2 I r—1 -3 1 5. Linearly independent 7. Linearly dependent; 3x + 2x2 = 7x — 2(2x — x2) 9. Linearly independent 11. Linearly dependent; 1 = sin2x + cos2x 13. Linearly dependent; ln(x2) = -2 In 2-1 + 2 • ln(2x) 17. (a) Linearly independent (b) Linearly dependent 35. dimV = 2,B = {l - x, 1 "l 0 37. dim V = 3, B = 39. dim V = 2, B = 0 0 1 0 0 1 - x2} 0 1 0 0 0 1 0 0 0 0 0 1 41. (n2 - n)/2 43. 
(a) dim(U X V) = dim 17 + dim V (b) Show that if {wv ..., w,J is a basis for W, then {(wj, Wj),..., (w„, wn)} is a basis for A. 45. {l + x, 1 + x + x2, l} 47. 1 0 0 1 49. {1,1 + x) 51. {1 - x,x - X2} 53. {sin2 x, cos2 x} 59. (a) pQ(x) = |x2 p2(x) = |x2 0 1 0 -1 1 0 1 0 1 0 0 0 Ix + 1 3, p;(.v) x2 + 4x - 3, 61. (c) (i) 3x2 - 16x + 19 (ii) x2 - 4x + 5 63. (p" - l)(pn - p)(p" - p2)- ■ ■ {f - p"'!) Exercises 6:3 1. [x]8 = 1 1 1 -1 > *C- l >;);- c - 1 1 1 0 0 3. Me = 0 » [x]c = -1 -1 1 0 .-1 „-l 0 -1 1 5. [p(x))B = 1 0 0 1 1 0 1 1 1 2 1 0 1 1 1 Ap(x)]c -3 2 Pc- -1 1 1 0 19. Basis 21. Not a basis r T "l 0 (f 23. Not a basis 25. Not a basis 7. [pWJp = -l . ip(x)]c = 0 1 1 0 r -ii i_ _1_ _1 1 1. 27. [A][ -1 -1 4 29. [p(x)]2 6 -1 3 1 -1 0 0 0 1 0 -1 1 Answers to Selected Odd-Numbered Exercises MS 9. [A]i ' 4~ 5 -] 2 2 > [A]c = 0 0 -3 9 2. and rl k ,P> (, a \ ka [k )-T \ b. kb_ 11. \f(x))B -1 1 1 -2 2 -5 In 2 0 1 1 ~2 > Pb*-c ~ c+-b ■ 1 2 0 -1 1 1 1 0 0 0 1 1 Therefore, T is linear. -7 15. T = (ka) + (ka + kb)x = k(a + (a + b)x) = kT a + 3b 9 a + 7b 5 - 14* - 8xz, T a — b x+ , c*~b 13. (a) (b) 15. B = 1/2 -1/2 J (3 - 2V/3)/2 (-3V3 + 2)/2 2 + 2V3" 2V3 - 2 [/(*)]C = -1/2 5/2j' 1 1 0 ~2 3.232' .-1-598. 17. T(4 - x + 3x2) = 4 + 3x + 5x2, T(a + bx+ cx2) = (3a — b — cN a + cx + I- V 2 19. Hmf; Let a = T(EU), b = T(£12), c = T(£21), d=T(£22). 23. Hint: Consider the effect of T and D on the standard basis for S?„. "5.464" "2" "4 -1" J .464. 25. (S 0 T) 1 — 0 6 . (S »D 2x —7 . 0 2x + 2y_ -1 -1 (r°s) X UM does not make sense. 17. -2 - 8(x - 1) - 5(x - l)2 19. -1 + 3(x + 1) - 3(x + l)2 + (x+ I)3 27. (S o T)(p(x)) = p'(x + 1), (To S)(p(x)) (p(x+l))' = p'(* + 1) 3. Linear transformation Exercises 6.4 1. Linear transformation 5. Linear transformation 7. Not a linear transformation 9. Linear transformation 11. Not a linear transformation 13. We have S(p(x) + q(x)) = S((p + q)(x)) = x((p + q)(x)) - x(p(x) + q(x)) = xp(x) + xq(x) = S(p(x)) + S(q(x)) and S(cp(x)) = S((cp)(x)) = x((cp)(x)) = x(cp(x)) = cxp(x) = cS(jp(x)) Therefore, S is linear. Similarly, a + c b + d_ = (a + c) + ((a + c) + (b + d))x = (a + (a + b)x) + (c + (c + d)x) 29. (S'° T) = si r = s x — y _—3x + 4y_ 4(x - y) + (-3x + 4y) 3(x - y) + (-3x + 4y) 1- x J .y. x J x Y-A 4x + y = T[S J- V y. ) \ _3x + y_ X J. (4x + y) - (3x + y) Therefore, S 0 T = 1 and T0 S — I, so S and T are inverses. Exercises 6.5 T + T 1. (a) Only (ii) is in ker(T'). (b) Only (iii) is in range(T). , "0 b (c) ker(T) = c 0 range(T) = Tl + 71 3. (a) Only (iii) is in ker(T). (b) All of them are in range (T). (c) ker(T) = {a + bx + cx2: a = —c, b {t + tx- fx2}, range(T) = !R2 a 0; 0 d1 -c} = AHS30 Answers to Selected Odd-Numbered Exercises 5. A basis for ker(T) is for range (T) is and a basis J "l 0' "0 0" 1 0 0_ _0 1. nullity(T) dim M22. 7. A basis for ker(T) is {l + x rank(T) = 2, and rank(T) + nullity(T) = 4 = x2}, and a basis for range (T) is >;rank(T) = 2, nullity(T) = 1, and rank(T) + nullity(T) = 3 = dim;-/',. 9. rank(T) = nullity(T) = 2 ll.rank(T) = nullity(T) = 2 13. rank(T) = 1, nullity(T') = 2 15. One-to-one and onto 17. Neither one-to-one nor onto 19. One-to-one but not onto 21. Isomorphic, T a 0 (f a 0 b 0 = b „0 0 c_ 23. Not isomorphic 25. Isomorphic, T(a + bi) 31. Hint: Define T: % [0,1] -> [0, 2] by letting T(f) be the function whose value at x is (T(f))(x) — f(x/2) for xin [0, 2]. 33. (a) Let Vj and v2 be in Vmd let (S ° T)(Vi) --(S o D(v2). Then S(r(v,)) = S(T(v2)), so r(vj) = T(v2), since S is one-to-one. But now v, = v2, since T is one-to-one. 
Hence, S ° T is one-to-one. 35. (a) By the Rank Theorem, rank(T) + nullity(T) = dim V. If Tis onto, then range(T) = W, so rank(T) = dim(range(T)) = dim W. Therefore, dim V + nullity(T) < dim W + nullity (T) = rank(T) + nullity(r) = dim V so nullity (T) < 0, which is impossible. Therefore, T cannot be onto. Exercises 6.6 1. [TW = 0 -1 2 -4 , [T'W4 + 2x]B = [2 - 4x]c= [7(4 + 2x)]c 3. [T] 1 0 0 0 1 0 0 0 1 m rfl 4- bx + cx2]r = "l 0 o" a a 0 1 0 b = b = [a + b(x + 2) + 0 0 1 c c c(x + 2)2)]c = [T(a + bx+ cx2)]c Pi 0 0" 1 1 1 5. [T] [T] c+-bi bx + cx2]B 1 0 0 1 1 1 a b + c a 4- b • 0 + c • 02 a + b'l + c I2 [T(« + bx cx2)]c Jc 6 4 -3 2 -2 -1. . [T]c <-b 7. B 6 4" "o" '7' "7" 2 -3 -2 0 — 7 — 7 -3 2,-1. _7_ .7. c .7. T -7 7 9. [TWS = "1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 mcWA], 1 0 0 0' 0 0 10 0 10 0 0 0 0 1 11. [7] c^b 0 -1 1 0 c - a - 0 -1 1 0 1 0 0 -1 a a b c c h d_ -1 0 0 1 — a c b d. = [T(A)] 1 0" 0 1 0 1 , [T] — -1 0_ a' c - b~ b d — a c a - d d k — c b d — a d b — c = [AB - BA]C= [T(A)_ Answers to Selected Odd-Numbered Exercises HNS81 13. (b) [D]B-- (c) [D]g[3 sinx — 5 cosx]g ; 0 -1 3 -5 = [ 3 cos x + 5 sin x ] g D (3 sin x — 5 cos x) ] B 15. (a) [D]B = 17. [S°TWB 2 0 0 0 2 1 .0 -1 2. "-1 -2" 1 -1. 19. Invertible, T + fox) = — b + ax 21. Invertible, T_1(p(x)) = p(x - 2) 23. Invertible, T~\a + bx + cx2) = (a - b + 2c) + (b - 2c)x + cx2 or T'^ipix)) = p(x) - p'(x) + p"(x) 25. Not invertible 27. — 3 sin x — cos x + C 29.fe2*cosx - sinx + C 31. C 33. C = {l - x, 2 + x} 35. C = {l,x} 37. [r]g- (d2 - d22)/(d2 + d\) 2dld2/(df + d\) 2dld2/(d2 + d22) {d\ - d\)l{d\ + d22) Exercises 6.7 1. y(f) = 2e}t/e3 3. y(t) = ((1 - e4)e:" + (e3 - l)e4t)/(e3 - e4) 5. /(f) = ' eV5 - 1 ■e(l + V5)l/2 _ g(l-V5)t/2j 7. v(f) = e* - (1 - e~l)te* 9. y(f) = ((/c + l)ekl + (k- l)e-kt)/2k ll.y(t) = e'cos(2r) 13. (a) p(t) = 100e'n(I6)t/3 « lOOe0924' (b) 45 minutes (c) In 9.968 hours 15. (a) m(t) = 50e~d, where c = ln2/1590 « 4.36 X 10"4; 32.33 mg remain after 1000 years, (b) After 3691.9 years 5 - 10cos(10VK) /— 17. x(0 =----—- sin(Vkt) + 10 cos(VIr) sin(lOVK) 19. (b) No Review Questions 1. (a) F (c) T 3. Subspace 5. Subspace 7. Let CiA + c2B = O. Then cYA - c2.B = c,AT + c2BT = (ctA + c2J3)r = O. Adding, we have 2c,A = O, so Cj = 0 because A is nonzero. Hence c25 = O, and so c2 = 0. Thus, {A, B} is linearly independent. 9. {l,x2,x4}, dim W = 3 11. Linear transformation "1 0 -r 0 1 -2 0 0 i 1 0 -l 13. Linear transformation 15. n2 — 1 17. 19. S ° T is the zero transformation. Chapter 7 Exercises 7.1 1. (a) -10 (b) Vl4 (c) V93 3. Any nonzero scalar multiple of 5. (a) 1 (b) V13 (c) V14 7. x2 is one possibility 9. (a) 77 (b) Vtt "0 13. Axiom (4) fails: u = 15. Axiom (4) fails: u = (c) Vtt # 0, but (u, u) = 0. + 0, but (u, u) = 0. 17. Axiom (4) fails: p(x) = 1 - x is not the zero polynomial, but (p(x),p(x)) = 0. r4 r (e) F (g) F (i) T = (u, u) + (v, v) + (w, w) + 2(u, v) - 2(u, w) - 2(v, w) = 1+ 3+ 4 + 2- 10 - 0 = 0 AHS32 Answers to Selected Odd-Numbered Exercises Therefore, jju + v — w|j — 0, so, by axiom (4), u4-v — w = 0oru + v = w. 31. (u + v, u - v) = (u, u) - (u, v) + (v, u) - (v, v) = i|u||2 - (u,v) + (u, v) - j|vjj2 = ||u||2 - ||v||2 33. Using Exercise 32 and a similar identity for jju - v|j2, we have |ju + v||2 + |ju - v|2 = (u + v, u + v) + (u - v, u - v) = ||uj|2 + 2{u,v) + j;v|j2 + ||u||2 - 2{u, v) + ||v||2 = 2!ju|j2 + 2|[v|!2 Dividing by 2 yields the identity we want. 35. |ju + v|| = jju - v|j •*+■ Jju + v||2 = |)u - vj|2 4* |[u||2 + 2{u, v) + |v||2 = Siu||2 - 2(u,v) + ||v{|2 2{u,v) = -2{u,v)^(u, v) = 0 37. 39. 
{l,x,x2} 41. (a) 1/V2, V3x/V2,V5(3x2 - 1)/2V2 (b) V7(5x3 - 3x)/2V2 Exercises 7.2 1. |u||£ = V42, jjujj, = 10, ||u||m = 5 3. d£(u, v) = V70, d£u, v) = 14, dm(u, v) = 6 5. i|u|i„ = 4, |jv||„ = 5 7. (a) At most one component of v is nonzero. 9. Suppose ||vj|m = \vk\. Then ||v||£ = Vvf 4----+v|T - ■ • + vl a Vvl = I vk\ = Ijvjl 11. Suppose ||vj[m = |vkj. Then jv,-j s \vk\ for i = 1,so ML = In! + ■■• + \v„\ < (vjtl + ■■ ■ + \vk\ = n\vk\ = «SJv|m 13. y 21. jjAjjp = V19, jjAjJ! = 4, j|Ajjx = 6 23. jjAj|F = V31, |jA||i = 6, |A|jx = 6 25. ||A||F= 2V/iT,i|A|1 = 7, \{A\\„ = 7 27.x ~0~ "i -1* ,y = l. 29, x = 0 .1. >y = l .1. "l" 1 31. x = 0 ,y = -l 0 -l 33. (a) By the definition of an operator norm, \\I\\ = max||Jx|| = max||x|| = 1. W=i MM 35. condj(A) = cond„(A) = 21; well-conditioned 37. cond^A) = condo;(A) = 400; ill-conditioned 39. condi(A) = 77, condx(A) = 128; moderately ill-conditioned 41. (a) cond^CA) = (max{|fe| 4- 1,2» • I max + 1 k - 1 43. (a) condx(A) = 40 (b) At most 400% relative change 45. Using Exercise 33(a), we have cond(A) WAA"'1! = ||/|| = 1. 49. k > 6 51. k > 10 iiAllllA-1!! Exercises 7.3 l. []e|] = V2 5. Bel = V7 1.414 2.646 3. ilejj = V6/2 « 1.225 7.y 9. y 11. y 13. y 15. y 19.x 23.x -3 + ix, ilejj 2x, llell ' ii 3 z 4- -8-x llell 10 ' 25x> Fll 1 I* lle!l lx + x2 « 1.225 0.816 0.447 « 0.632 7 Lis 17. y = 21. x = 18 _ iZr _ Ir2 5 10x 2A 4~ 4 + t ' -5 - t -5 - It t 25. 42 11 19 11 11 11 J Answers to Selected Odd-Numbered Exercises ANS 13 27.x = 29. y = 0.92 + 0.73% 31. (a) If we let the year 1920 correspond to t = 0, then y = 56.6 + 2.9f; 79.9 years 33. (a) p(t) - 150eai31! 35.139 days 37. -1 1 1 " "2" .J. 3 3 3 2 7 39. 1 3 ] 3 1 3 > 2 -2- 1 -3 1 3 1 3- .2. 41. 45. A+ = [ 47. A+ 51. A 49. A+ = 1 -1 0 1 53. (a) If A is invertible, so is Ar, and we have A+ = ~o~ 0 23. (Exercise 7) A = 3 1 [0 1] + 2 0 .0. L.-1- 33. The line segment [-1,1] 35. The solid ellipse--h — 5 4 1 37. (a) HAD, = V2 39. (a) ||A||2 = 1.95 41. A" 45. A+ = 47. A+ = A A' 25 25 A A 25 25 43. A" III 6 6 6 III 6 6 6 J 61. 63. V2 0 0 0 2 -1 -1 3 0.52 1.04 1 1 1/V2 1/V2 ■1/V2 1/V2 0 1 -1 0 [1 o] (b) cond2(A) = » (b) cond2(A) = 38.11 0 Exercises 7.4 1. 2,3 9. V5,2,0 11. A 13. A 15. A = 17. A = 19. A = 3. V^,0 V2 0 0 0 5. 5 7. 2,3 I/V2 I/V2" .1/V2 -I/V2 3 0 0 2 -1 0 "5" ,0. [1] 0 1 .0 - 1 0 0 1 0 1 0 0 -1 0 3 0 0 2 0 0 V5 0 0 0 2 0 0 1 1 0 2/V5 0 0 1 1/V5 0 21. A = V2 [1/V2 l/V2]+0 1/V5 0 -2/V5J Exercises 7.5 1. -g(x) = \ 3. g(x) = fx 5- g(x) = il + jfx2 7.{l,x-i} 9. g(x) = x - 1 6 11. ^(x) = (4e - - 10) + (18 - 6e)x « 0.87 + 1.69* 13.g(x) =i- fx + fx2 15. g(x) = 39e - 105 + (588 - 216e)x + (210e - 570)x2 = 1.01 + 0.85x + 0.84x2 TT 4 ( COS 3% 21.---cos x + 2 TT [-I/V2 1/V2] (Exercise 3) 23. a0 = 5> ak = °> ^* ~ 25. a0 = 77, = 0, bk = Review Questions 1. (a) T (c) F 3. Inner product 1 - (-1)' kTT 2(-l)k (e) T (g) T 5. V3 7. (i) T / r \ ■ r 2 I 1. /' 1 - 2- ANS34 Answers to Selected Odd-Numbered Exercises 9. Not a norm I3.7 = 1.7* 17. (a) V2, V2 11. concUA) » 2432 15. ~1/V2 1/V2 0" "V2 0" (b) A = 0 0 1 0 V2 . 1/V5 -1/V2 o_ 0 0_ (c) Af = 19. The singular values of PAQ are the square roots of the eigenvalues of {PAQ)T(PAQ) = QrATPTPAQ = QT(ATA)Q. But Qr(ArA)Q is similar to ATA because QT = Q_1, and hence it has the same eigenvalues as ATA. Thus, PAQ and A have the same singular values. ■ K. 
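The last answer above rests on the fact that multiplying a matrix on either side by orthogonal matrices does not change its singular values. A quick numerical illustration of that fact is sketched below; it uses NumPy with randomly generated test matrices, and the helper random_orthogonal is an ad hoc name introduced here (it is an informal check of the argument, not part of the text's exercise sets).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    # The Q factor of a QR factorization of a random square matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

m, n = 5, 3
A = rng.standard_normal((m, n))
P = random_orthogonal(m)            # P^T P = I_m
Q = random_orthogonal(n)            # Q^T Q = I_n

sv_A = np.linalg.svd(A, compute_uv=False)
sv_PAQ = np.linalg.svd(P @ A @ Q, compute_uv=False)

# Both svd calls return the singular values in decreasing order, so they can be
# compared entrywise; they agree up to floating-point roundoff, as the argument
# via (PAQ)^T (PAQ) = Q^T (A^T A) Q predicts.
print(np.allclose(sv_A, sv_PAQ))    # expected output: True
```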
Index Abel, Niels Henrik, 311, D8 Absolute value, C3 Addition closure under, 192, 429 of complex numbers, CI of matrices, 140 of polynomials, D2 of vectors, 5-6, 9, 429 Adjacency matrix, 242, 244 Adjoint (adjugate) of a matrix, 276 Al-Khwarizmi, Abu Ja'far Muhammad ibn Musa, 85 Algebraic multiplicity, 294 Algebraic properties of vectors, 10 Algorithm, 85 Allocation of resources, 99-101 Altitude of a triangle, 33 Angle between vectors, 24-26 Argand, Jean-Robert, CI Argand plane, CI Argument of a complex number, C4 Arithmetic mean, 548 Arithmetic Mean—Geometric Mean Inequality, 548 Associativity, 10, 154, 158, 223, 429 Attractor, 350 Augmented matrix, 61, 64 Axioms inner product space, 531 vector space, 429 B Back substitution, 61 Balanced chemical equation, 101-102 Basis, 198, 446-448 change of, 463-470, 507-509 coordinates with respect to, 208, 448-452 orthogonal, 370, 537 orthonormal, 372, 537 standard, 198, 447 Basis step, Bl Basis Theorem, 202, 453 Best Approximation Theorem, 570 Best approximation, to a vector, 570-571 Binary vector, 14 Binet, Jacques, 338 Binet's formula, 339, 428 Bipartite graph, 251,254 Block, 145 Block multiplication, 148 Block triangular form, 283 Bunyakovsky, Viktor Yakovlevitch, 539 C % 435 C2, 432 C", 543 Carroll, Lewis, 141,284 Cassini, Giovanni Domenico, 362 Cassinis identity, 362 Cauchy, Augustin-Louis, 273, 280 Cauchy-Schwarz Inequality, 22, 539-540 Cayley, Arthur, 300 Cayley-Hamilton Theorem, 300 Centroid of a triangle, 32 Change of basis, 463-470 Characteristic equation, 292 Characteristic polynomial, 292 Circuit, 242 Circumcenter of a triangle, 33 Closure under addition, 192, 429 under linear combinations, 192 under scalar multiplication, 192, 429 Codomain, 212 Coefficient(s) Fourier, 615 of a linear combination, 12, 154 of a linear equation, 58 matrix, 64 method of undetermined, D7 of a polynomial, Dl Cofactor, 266 Cofactor expansion, 266-269 Column matrix, 138 Column-row representation of a matrix product, 147 Column space, 195 Column vector, 3, 138 Commutativity, 10, 19, 154, 429 Companion matrix, 299 Complete bipartite graph, 254 Complex dot product, 543 Complex numbers, Cl-Cll absolute value of, C3 addition of, CI argument of, C4 conjugate of, C2 division of, C2, C5 equality of, CI imaginary part of, CI modulus of, C3 multiplication of, C1-C2, C5 negative of, C2 polar form of, C3-C6 powers of, C6-C7 principal argument of, C4 real part of, CI roots of, C7-C8 Complex plane, CI Complex vector space, 429, 543 Component of a vector, 3 orthogonal to a subspace, 382, 538 Composition of linear transformations, 219, 476-478 Condensation method, 284-285 Condition number, 562,602 Conic sections, 415-416 Conjugate of complex numbers, C2-C3 Conjugate transpose of a matrix, 544-545 Connected graph, 361 Conservation of flow, 102 Consistent linear system, 60 Constant polynomial, D1 Constrained optimization, 413-415, 547-551 Consumption matrix, 236 Contradiction, proof by, A8 Contrapositive, proof by, A8 Convergence of iterative methods, 125, 316, 563-566 Coordinate grid, 13 Coordinate vector, 208, 448-452 Coordinates, 207-209 Cotes, Roger, 569 Cramer, Gabriel, 274 Cramers Rule, 274-275 Cross product, 48-49, 286-287 Crystallographic restriction, 517 Curve fitting, 290-291 D 3, 435 De Moivre, Abraham, C6 De Moivre's Theorem, C6-C9 Degenerate conic, 415, 424 Degree of a polynomial, Dl Demand vector, 236 Descartes, René, 3, D9 :,; Index Descartes' Rule of Signs, D9-D10 Determinant(s), 165, 263-265 cofactor expansion of, 266-269 of elementary matrices, 271-272 geometric 
applications of, 286-291 history of, 280-281 and matrix operations, 272-274 of« X n matrices, 265-269 properties of, 269-274 Vandermonde, 291 Diagonal entries of a matrix, 139 Diagonal matrix, 139 Diagonalizable linear transformation, 509 Diagonalizabie matrix, 303 orthogonally, 400 unitarily, 546-547 Diagonalization, 303-309 orthogonal, 400-407 Diagonalization Theorem, 307 Diagonalizing a quadratic form, 411 Diagonally dominant matrix, 128, 324 Difference of complex numbers, C2 of matrices, 140 of polynomials, D2 of vectors, 8, 433 Differential equation(s), 363,436, 518 boundary conditions for, 523 homogeneous, 436, 518-525 initial conditions for, 340, 343, 344, 363 solution of, 518 system of linear, 340-348 Differential operator, 473 Digital image compression, 607-608 Digraph, 243 Dimension, 203, 452-456 Direct proof, A7 Direction vector, 35, 39 Disjoint sets, A4 Distance Hamming, 554 from a point to a line, 41-43 from a point to a plane, 43-44 taxi cab, 529-531 between vectors, 23-24, 535 Distance functions, 554-555 Distributivity, 10, 19, 154, 158, 429 Divergence, 127 Division algorithm, D4 Dodgson, Charles Lutwidge, 281, 284 Domain, 212 Dominant eigeiwalue, 311 Dominant eigenvector, 311 Dot product, 18-20, 49 complex, 543 weighted, 532 Dual space, 514 Dynamical system, 253, 348-355 trajectory of 349 E Echelon form of a matrix reduced row, 73 row, 65 Edge of a graph, 242 Eigenspace, 256 Eigenvalue(s), 254 algebraic multiplicity of, 294 dominant, 311 geometric multiplicity of, 294 inverse power method for computing, 317- 318 power method for computing, 311-316 shifted inverse power method for computing, 318- 319 shifted power method for computing, 316-3171 Eigenvector(s), 254 dominant, 311 orthogonal, 402 Electrical network, 104-107 Elementary matrix, 170 Elementary reflector, 397 Elementary row operations, 66 Elements of a matrix, 138 Elimination Gauss-Jordan, 72-76 Gaussian, 68-72 Empty set, A2 Equality of complex numbers, CI of matrices, 139 of polynomials, D2 of sets, A1-A2 of vectors, 4 Equation(s) linear, 58 normai, 575 system of linear, 59 Equilibrium, 50,107 Equivalence relation, 302 Error vector, 565, 572 Euclidean norm, 553 Euler, Leonhard, C9 Euler's formula, C9-C11 Even function, 617 Exchange matrix, 235 Expansion by cofactors, 266-269 Exponential of a matrix, 346 F gF,431 Factor Theorem, D4 Factorization IV, 180-186 modified QR, 396-398 QR, 392-394 Feasible solution, 236 Fibonacci, 336 Fibonacci numbers, 335, 338-339, 427 Field, 429 Finite-dimensional vector space, 453 Finite linear games, 109-113 Floating-point form, 83 Force vectors, 50-53 Fourier approximation, 615 Fourier coefficients, 615 Fourier, Jean-Baptiste Joseph, 616 Fourier series, 617 Free variable, 71 Frobenius, Georg, 204 Frobenius norm, 556 Fundamental subspaces of a matrix, 380 Fundamental Theorem of Algebra, D8 Fundamental Theorem of Invertible Matrices, 172, 206, 296,512, 605-606 G Galilei, Galileo, 526 Galois, Evariste, 311, D8 Gauss, Carl Friedrich, 69, 125, 538, 569, D8 Gauss-Jordan elimination, 72-76 Gauss-Jordan inverse method, 175-178 Gauss-Seidel method, 124-131 Gaussian elimination, 68-72 General form of the equation of a line, 34, 36,41 General form of the equation of a plane, 38, 41 Geometric mean, 548 Geometric multiplicity, 294 Gerschgorin disk, 319 Gerschgorin Disk Theorem, 321 Gerschgorin, Semyon Aranovich, 319 Gerschgorin's theorem, 319-322 Gibbs, Josiah Willard, 49 Global Positioning System (GPS), 121-123 Google, 358 Gram, Jörgen Pedersen, 390 Gram-Schmidt Process, 388-392 Graph, 242, 
253-254 adjacency matrix of, 242, 244 bipartite, 251 complete, 253 complete bipartite, 254 connected, 361 cycle, 254 directed (digraph), 243 edges of, 242 k-regular, 361 path in a, 242 Petersen, 254 vertices of, 242 Grassmann, Hermann, 429 Grassmann's Identity, 458, 496 H Half-life, 520 Hamilton, William Rowan, 2, 300 Hamming distance, 554 Hamming norm, 554 Harmonic mean, 551 Head of a vector, 3 Head-to-tail rule, 6 Hermitian matrix, 545 Hilbert, David, 403 Hoene-Wronski, Jösef Maria, 457 Homogeneous linear differential equations, 518-525 Flomogeneous linear system, 76 Hooke's Law, 524 Householder, Alston Scott, 397 Householder matrix, 397 Hyperplane, 40 Index 13 I !,C1 Idempotent matrix, 179 Identity matrix, 139 Identity transformation, 221, 474 Ill-conditioned linear system, 84 Ill-conditioned matrix, 561 Image, 212 Imaginary axis, CI Imaginary conic, 424 Imaginary part of a complex number, CI Inconsistent linear system, 60 Indefinite matrix, 413 quadratic form of, 413 Index of summation, A5 Indirect proof, A7 Induction hypothesis, Bl Induction step, Bl Infinite-dimensional vector space, 453 Initial point of a vector, 3 Inner product, 531 Inner product space, 531-534 and Cauchy-Schwarz and Triangle Inequalities, 539-540 distance between vectors in, 535 length of vectors in, 535 orthogonal vectors in, 535 properties of, 535 Integers modulo m, 14-16 Interior of a matrix, 284 Intersection of sets, A4 Inverse Gauss-fordan method of computing, 175-178 of a linear transformation, 221-222, 478-479 of a matrix, 163 Inverse power method, 317-318 shifted, 318-319 Invertible linear transformation, 221-222, 478-479 Invertible matrix, 163-170 Irreducible matrix, 335 Irreducible polynomial, D7 Isometry, 375 Isomorphism, 493-495 Iterative method (s) convergence of, 125, 316, 563-566 Gauss-Seidel method, 124-131 inverse power method, 317-318 Jacobi's method, 124-131 power method, 311-316 shifted inverse power method, 318-319 shifted power method, 316-317 J Jacobi, Carl Gustav, 124 Jacobis method, 124-131 Jordan, Wilhelm, 72 K Kernel, 482 Kirchhoff's Laws, 104 L Lagrange interpolation formula, 459 Lagrange, Joseph-Louis, 458 Lagrange polynomials, 458 Laplace Expansion Theorem, 266, 277-280 Laplace, Pierre Simon, 267 Lattice, 516 Leading entry, 65 Leading 1,73 Leading variable, 71 Least squares approximating line, 574 Least squares approximation, 568-569, 571-582 Best Approximation Theorem and, 570-571 and orthogonal projection, 583-585 and the pseudoinverse of a matrix, 585-586 via the qr factorization, 582-583 via the singular value decomposition, 603-605 Least squares error, 572 Least squares solution, 574 of minimal length, 603-604 Least Squares Theorem, 575 Left singular vectors, 593 Legendre, Adrien Marie, 538 Legendre polynomials, 538 Leibniz, Gottfried Wilhelm von, 281 Lemma, 271 Length of a binary vector, 14 of an m-ary vector, 16 of a path, 242 of a vector, 20, 535 Leonardo of Pisa, 336 Leontief closed model, 108,235 Leontief open model, 108, 236 Leontief, Wassily, 107 Leslie matrix, 240 Leslie model, 239-241, 330-332 Line, 34-38 of best fit, 574 equation(s) of, 34, 36,41 least squares approximating, 574 Linear combination, 12, 154, 433 Linear dependence, 92-93, 157, 443, 446 Linear economic models, 107-109, 235-236 Linear equation(s), 58, 59. 
See also Linear system(s) Linear independence, 92-97, 157, 443-446 Linear recurrence relations, 335-336 Linear system(s), 58-62 augmented matrix of, 61,64 coefficient matrix of, 64 consistent, 60 direct method for solving, 64-79 equivalent, 60 homogeneous, 76 ill-conditioned, 84 inconsistent, 60 iterative methods for solving, 124-131 solution (set) of, 59 over Rp, 77-79 Linear transformation(s), 213-214, 472-474 onto, 488 composition of, 219, 476-478 diagonalizable, 509 identity, 221,474 inverse of, 221-222, 478-479 invertible, 221-222, 478-479 kernel of, 482 matrix of, 216, 497-503 nullity of, 484 one-to-one, 488 properties of, 475-476 zero, 474 Linearly dependent matrices, 157 Linearly dependent vectors, 93, 443, 446 Linearly independent matrices, 157 Linearly independent vectors, 93, 443,446 Long range transition matrix, 329 LU factorization, 180-186 Lucas, Edouard, 336,428 M m-ary vector, 16 Mm„,430 Maclaurin, Colin, 274, 280 Magic square, 460-462 classical, 460 weight of a, 460 Mantissa, 83 Markov, Andrei Andreyevich, 230 Markov chain, 230-235, 325-330 Mathematical induction, B1-B7 first principle of, Bl second principle of, B5 Matrix (matrices), 61, 138 addition of, 140 adjacency, 242, 244 adjoint (adjugate), 276 associated with a quadratic form, 409 augmented, 61, 64 change-of-basis, 465 characteristic equation of, 292 characteristic polynomial of, 292 coefficient, 64 column space of, 195 companion, 299 condition number of, 562 conjugate transpose of, 544-545 consumption, 236 determinant of, 165, 264, 265-269 diagonal, 139 diagonalizable, 303-309 difference of, 140 eigenspace of, 256 eigenvalue of, 254 eigenvector of, 254 elementary, 170 elements of, 138 entries of, 138 equality of, 139 exchange, 235 exponential of, 346 factorization of, 180 fundamental subspaces of, 380 Hermitian, 545 idempotent, 179 identity, 139 ill-conditioned, 561 indefinite, 413 interior, 284 inverse of, 163 invertible, 163-170 Index Matrix (Continued) irreducible, 335 Leslie, 240 of a linear transformation, 216,497-503 multiplication of, 141-143 negative definite, 413 negative of, 140 negative semidefinite, 413 nilpotent, 282 norm of, 555-561 normal, 547 null space of, 197 nullity of, 204 orthogonal, 373-376 orthogonally diagonalizable, 400 partitioned, 145-149 permutation, 187 positive, 325 positive definite, 413 positive semidefinite, 413 powers of, 149-150 primitive, 335 productive, 237-238 projection, 218-219, 366, 586 pseudoinverse of, 585-586,602-603 rank of, 72, 204 reduced row echelon form of, 73 reducible, 334 regular, 325 row echelon form of, 65 row equivalent, 68 row space of, 195 scalar, 139 scalar multiple of, 140 similar, 301-303 singular values of, 590-591 singular vectors of, 593 size of, 138 skew-symmetric, 162 square, 139 standard, 216 stochastic, 232 strictly diagonally dominant, 128 sum of, 140 symmetric, 151-152,160-161 trace of, 162 transition, 231 transpose of, 151, 159-160 unit lower triangular, 181 unitarily diagonalizable, 546-547 unitary, 545-546 upper triangular, 162 zero, 141 Matrix-column representation of a matrix product, 146 Matrix factorization, 180. 
See also Singular value decomposition (SVD) and diagonalization, 303-309 LU, 180-186 modified QR, 396-398 ptw, 186-187 QR, 392-394 and Schur's Triangularization Theorem, 408 Matrix transformation, 211-216, 472 projection, 218-219, 509-510 reflection, 215 rotation, 216-218 Max norm, 553 Mean arithmetic, 548 geometric, 548 harmonic, 551 quadratic, 550 Median of a triangle, 32 Metric, 555 1 Metric space, 555 Minimum length least squares solution, 603-604 Minor, 264 Modified qr factorization, 396-398 Modular arithmetic, 13-16 Modulus of a complex number, C3 Moore, Eliakim Hastings, 602 Moore-Penrose inverse, 602 Muir, Thomas, 281 Multiplication of complex numbers, C1--C2, C5 of matrices, 141-143 of polynomials, D2-D3 scalar, 7-8, 140,429 Multiplicity of an eigenvalue algebraic, 294 geometric, 294 N Negative of a complex number, C2 of a matrix, 140 of a vector, 8, 429 Negative definite matrix, 413 quadratic form of, 413 Negative semidefinite matrix, 413 quadratic form of, 413 Net reproduction rate, 360 Network, 102 Network analysis, 102-103 Newton's Second Law of Motion, 524 Nilpotent matrix, 282 Node, 102 Nondegenerate conic, 415-416 Norm of a matrix, 555-561 1- , 559 2- , 559 7-, 559 compatible, 556 Frobenius, 556 operator, 559 Norm of a vector, 20, 535, 552 1- , 553 2- , 553 7-, 553 Euclidean, 553 Hamming, 554 max, 553 sum, 552 taxicab, 530 uniform, 553 Normal equations, 575 Normal form of the equation of a line, 34, 36, 41 Normal form of the equation of a plane, 38, 41 Normal matrix, 547 Normal vector, 34, 38 Normalizing a vector, 21 Normed linear space, 552 Null space, 197 Nullit)' of a linear transformation, 484 of a matrix, 204 o Odd function, 617 Ohm's Law, 104 One-to-one, 488 Onto, 488 Operator norm, 559 Optimization constrained, 413-415 geometric inequalities and, 547-551 Orbital center, 355 Ordered n-tuple, 9 Ordered pair, 3 Ordered triple, 8 Orthocenter of a triangle, 33 Orthogonal basis, 370, 537 Orthogonal complement, 378-382 Orthogonal Decomposition Theorem, 384-385 Orthogonal diagonalization, 400-407 Orthogonal matrix, 373-376 Orthogonal projection, 382-387, 538 least squares approximation, 583-585 Orthogonal set of vectors, 369-373, 537 Orthogonal vectors, 26, 535 Orthonormal basis, 372, 537 Orthonormal set of vectors, 372, 537 Outer product, 147 Outer product expansion, 147 Outer product form of the SVD, 596 P 9, 431 <3>„ 431 Parallel vectors, 8 Parallelogram rule, 6 Parameter, 36 Parametric equation of a line, 36,41 of a plane, 39,41 Partial fractions, 119 Partial pivoting, 84-85 Partitioned matrix, 145-149 Pafh(s) k-, 243 length of, 242 number of, 242-245 simple, 242 Peano, Giuseppe, 429 Penrose conditions, 586 Penrose, Roger, 603 Permutation matrix, 187 Perpendicular bisector, 33 Index Perron eigenvector, 335 Perron-Frobenius Theorem, 332-335 Perron, Oskar, 332 Perron root, 335 Perrons Theorem, 333 Petersen graph, 254 Pivot, 66 Pivoting, 66 partial, 84-85 Plane, 38-41 Argand, CI complex, CI equation of, 38, 39, 41 Polar decomposition, 610 Polar form of a complex number, C3-C6 Polya, George, A7 Polynomial, D1-D10 characteristic, 292 constant, Dl degree of, Dl irreducible, D7 Lagrange, 458 Legendre, 538 Taylor, 472 trigonometric, 614 zero of, D4 Population distribution vector, 239 Population growth, 239-241, 330-332 Positive definite matrix, 413 quadratic form of, 413 Positive matrix, 325 Positive semidefmite matrix, 413 quadratic form of, 413 Power method, 311-316 inverse, 317-318 shifted, 316-317 shifted inverse, 318-319 Predator-prey model, 343 Price vector(s), 235 
Primitive matrix, 335 Principal argument of a complex number, C4 Principal Axes Theorem, 411 Probability vector, 231 Product of complex numbers, C1-C2 of matrices, 141-143 of polynomials, D2-D3 Production vector, 236 Projection orthogonal, 382-387, 538 into a subspace, 382 onto a vector, 27-28 Projection form of the Spectral Theorem, 405 Projection matrix, 218-219, 366, 586 Proof by contradiction, A8 by contrapositive, A8 direct, A7 indirect, A7 by mathematical induction, B1-B7 Pseudoinverse of a matrix, 585-586, 602-603 Pythagoras' Theorem, 26, 537 Q qr algorithm, 398-399 qr factorization, 392-394 least squares and, 582-583 modified, 396-398 Quadratic equation(s), D6 graphing, 415-423 Quadratic form, 408-416 indefinite, 413 matrix associated with, 409 negative definite, 413 nsgative semidefmite, 413 positive definite, 413 positive semidefmite, 413 Quadratic mean, 550 Quadric surface, 420 Quotient of complex numbers, C2, C5 R R,4 R3,8 B",9-ll Racetrack game, 1-3 Range, 212, 482 Rank of a linear transformation, 484 of a matrix, 72, 204 singular value decomposition, 600 Rank Theorem, 72, 205, 386, 486 Ranking vector, 356-358 Rational Roots Theorem, D5 Rayleigh, Baron, 316 Rayleigh quotient, 316 Real axis, CI Real part of a complex number, CI Recurrence relation, 336 solution of, 337 Reduced row echelon form, 73 Reducible matrix, 334 Reflection, 215 Regular graph, 361 Regular matrix, 325 Repeller, 352 Resolving a vector, 51 Resultants, 50 Right singular vectors, 593 Robotics, 226-229 Root, for a polynomial equation, D4 Root mean square error, 612 Rotation, 216-218 center of, 516 Rotational symmetry, 516 Roundoff error, 62, 83-84 Row echelon form, 65 Row equivalent matrices, 68 Row matrix, 138 Row-matrix representation of a matrix product, 146 Row reduction, 66 Row space, 195 Row vector, 3, 138 Saddle point, 352 Scalar, 8 Scalar matrix, 139 Scalar multiple, 481 Scalar multiplication, 7-8, 9, 140, 429 closure under, 192,429 Scaling, 314 Schmidt, Erhard, 390 Schur complement, 283 Schur, Issai, 283 Schur's Triangularization Theorem, 408 Schwarz, Karl Herman Amandus, 539 SeidtT, Philipp Ludwig, 125 Seki Kowa, Takakazu, 280 Set(s), A1-A4 disjoint, A4 elements of, Al empty, A2 intersection of, A4 subset of, A2 union of, A4 Shifted inverse power method, 318-319 Similar matrices, 301-303, 508 Simple path, 242 Singular value decomposition (SVD), 590-599 applications of, 599-606 and condition number, 602 and least squares approximation, 603-605 and matrix norms, 600-602 outer product form of, 596 and polar decomposition, 610 and pseudoinverse, 602-603 and rank, 600 Singular values, 590-591 Singular vectors, 593 Size of a matrix, 138 Skew lines, 76 Skew-symmetric matrix, 162 Solution of a differential equation, 518 least squares, 574-582 of a linear system, 59 minimum length least squares, 603 of a recurrence relation, 337 of a system of differential equations, 340-342 Span, 90, 156, 193,438 Spanning set of vectors, 88-92 Spanning sets, 438-441 Spectral decomposition, 405 Spectral Theorem, 403 projection form of, 405 Spectrum, 403 Spiral attractor, 355 Spiral repeller, 355 Square matrix, 139, 374 Square root of a matrix, 424 Standard basis, 198, 447 Standard matrix, 216 Standard position, 4 Standard unit vectors, 22 State vector, 231 Steadv state vector, 233 Index Stochastic matrix, 232 Strictly diagonally dominant matrix, 324 Strutt, John William, 316 Submatrices, 145 Subset, A2 Subspace(s), 192, 433-438 fundamental, 380 spanned by a set of vectors, 192-193, 441 sum of, 442 trivial, 437 zero, 437 
Subtraction: of complex numbers, C2; of matrices, 140; of polynomials, D2; of vectors, 8, 433
Sum: of complex numbers, C1; of linear transformations, 481; of matrices, 140; of polynomials, D2; of subspaces, 442; of vectors, 5-6, 9, 439
Sum norm, 552
Summation notation, A4-A7
Sustainable harvesting policy, 360
Sylvester, James Joseph, 206, 280
Symmetric matrix, 151-152, 160-161
System of linear differential equations, 340-348
System(s) of linear equations, See Linear system(s)
T
Tail of a vector, 3
Taussky-Todd, Olga, 320
Taxicab circle, 530
Taxicab distance, 529-531
Taxicab norm, 530
Taxicab perpendicular bisector, 530
Taxicab pi, 530
Taylor polynomial, 472
Terminal point of a vector, 3
Ternary vector, 16
Theorem, 10
Tiling, 515
Tournament, 244
Trace of a matrix, 162
Transformation, 212; linear, 213-214, 472-474; matrix, 211-216, 472
Transitional matrix, 231
Transitional probabilities, 230
Translational symmetry, 516
Transpose of a matrix, 151, 159-160
Triangle inequality, 22, 540, 552
Trigonometric polynomial, 614
Triple scalar product identity, 287
Trivial subspace, 437
Turing, Alan Mathison, 181
U
Uniform norm, 553
Union of sets, A4
Unit circle, 21
Unit lower triangular matrix, 181
Unit sphere, 535
Unit vector, 21, 535
Unitarily diagonalizable matrix, 546-547
Unitary matrix, 545-546
Upper triangular matrix, 162; block, 283
V
Vandermonde, Alexandre-Théophile, 291
Vandermonde determinant, 291
Vector(s), 3, 9, 429, 439; addition of, 5-6, 9, 439; algebraic properties of, 10; angle between, 24-26; binary, 14; column, 3, 138; complex, 429, 432, 543-544; complex dot product of, 543; components of, 3; coordinate, 208, 448-452; cross product of, 48-49, 286-287; demand, 236; direction, 35, 39; distance between, 23-24, 535; dot product of, 18-20; equality of, 3; force, 50-53; inner product of, 531; length of, 20, 535; linear combination of, 12, 433; linearly dependent, 92-93, 443, 446; linearly independent, 92-97, 443, 446; norm of, 20, 535, 552; normal, 34, 38; orthogonal, 26, 369-373, 535, 537; orthonormal, 372, 537; parallel, 8; population distribution, 240; price, 235; probability, 231; production, 236; ranking, 356-358; resultant, 50; row, 3, 138; scalar multiplication of, 7-8, 9, 429; span of, 438; spanning sets of, 88-92; state, 231; steady-state, 233; ternary, 16; unit, 21, 535; zero, 4, 429
Vector form of the equation of a line, 36, 41
Vector form of the equation of a plane, 39, 41
Vector space(s), 429; basis for, 446; complex, 429, 432, 543-544; dimension of, 453; finite-dimensional, 453; infinite-dimensional, 453; isomorphic, 493-495; subspace of, 433-438; over Z_p, 429, 432
Venn diagram, A2-A3
Venn, John, A2
Vertex of a graph, 242
W
Weight of a magic square, 460
Weighted dot product, 532
Well-conditioned matrix, 561
Wessel, Caspar, C1
Weyl, Hermann, 429
Wheatstone bridge circuit, 105-106
Wilson, Edwin B., 49
Wronskian, 457
X
x-axis, 3
xy-plane, 8
xz-plane, 8
Y
y-axis, 3
yz-plane, 8
Z
Z, 14
Z_2, 14
Z_2^n, 14
Z_m, 16
Z_m^n, 16
z-axis, 8
Zero matrix, 141
Zero of a polynomial, D4
Zero subspace, 437
Zero transformation, 474
Zero vector, 4, 429
Index of Notation
v vector, 3, 439
AB vector as directed line segment, 3
[v_1, v_2, ..., v_n] vector as ordered n-tuple, 9
R^n vector space of ordered n-tuples of real numbers, 9
Z_2 the integers modulo 2, 14
Z_2^n binary vectors of length n, 14
Z_m the integers modulo m, 14
Z_m^n m-ary vectors of length n, 16
0 zero vector, 4
u · v dot product of vectors, 18
||v|| length (norm) of a vector, 20, 535, 552
e_1, e_2, ..., e_n standard unit (basis) vectors in R^n, 22
d(u, v) distance between vectors, 23
proj_u(v) (orthogonal) projection of v onto u, 27
u × v cross product of vectors, 48
[A | b] augmented matrix, 61
rank(A) rank of a matrix, 72, 204
span(v_1, v_2, ..., v_k) span of a set of vectors, 99, 438
A = [a_ij] m × n matrix, 138-139
I = I_n identity matrix, 139
O = O_n zero matrix, 141
A^k kth power of a (square) matrix, 149
A^T transpose of a matrix, 151
A^-1 inverse of a matrix, 163
det A = |A| determinant of a matrix, 165, 264-265
row(A) row space of a matrix, 195
col(A) column space of a matrix, 195
null(A) null space of a matrix, 197
B basis, 198, 446
dim V dimension of a vector space, 203, 454
nullity(A) nullity of a matrix, 204
[v]_B coordinate vector of v with respect to the basis B, 208, 449
T linear transformation, 213, 472
T_A matrix transformation, 214
S ∘ T composition of linear transformations, 219, 477
[T] standard matrix of a linear transformation, 216
λ eigenvalue, 254
E_λ eigenspace, 256
C_ij (i, j)-cofactor of a matrix, 266
A_i(b) matrix A with column i replaced by b, 274
adj A adjoint of a matrix, 276
C(p) companion matrix of a polynomial, 299
c_A(λ) characteristic polynomial of matrix A, 292
A ~ B A is similar to B, 301
A = PDP^-1 eigenvalue decomposition (diagonalizable matrix A), 309
D_i ith Gerschgorin disk, 319
f_n nth Fibonacci number, 335, 427
e^A matrix exponential, 346
W^⊥ orthogonal complement of subspace W, 378
proj_W(v) orthogonal projection of v onto subspace W, 382
perp_W(v) component of v orthogonal to W, 382
A = QDQ^T spectral decomposition (symmetric matrix A), 400, 405
x^T Ax quadratic form, 409
M_mn vector space of m × n matrices, 430
𝒫_n vector space of polynomials of degree ≤ n, 431
𝒫 vector space of all polynomials, 431
ℱ vector space of functions f: R → R, 431
ℱ[a, b] vector space of functions f: [a, b] → R, 432
C^n vector space of ordered n-tuples of complex numbers, 432, 540
𝒞 vector space of continuous functions f: R → R, 435
𝒞[a, b] vector space of continuous functions f: [a, b] → R, 435
𝒟 vector space of differentiable functions f: R → R, 435
𝒟[a, b] vector space of differentiable functions f: [a, b] → R, 435
P_{C←B} change-of-basis matrix, 465
ker(T) kernel of a linear transformation, 482
range(T) range of a linear transformation, 482
V ≅ W V is isomorphic to W, 493
[T]_{C←B} matrix of T with respect to bases B and C, 498
(u, v) inner product of vectors, 531
Ā complex conjugate of a matrix, 544
A* conjugate transpose of a matrix, 544
||v||_s (= ||v||_1) sum norm (= 1-norm), 552
||v||_m (= ||v||_∞) max norm (= ∞-norm), 553
||v||_E (= ||v||_2) Euclidean norm (= 2-norm), 553
||v||_H Hamming norm, 554
||A|| matrix norm, 556
||A||_F Frobenius norm, 556
cond(A) condition number of a matrix, 562
x̄ least squares solution of Ax = b, 574
A^+ pseudoinverse of a matrix, 585, 602
σ singular value, 590
A = UΣV^T singular value decomposition (SVD), 593