4 Eigenvalues and Eigenvector: Almost every combination of the adjectives proper, latent, characteristic, eigen and secular, with the nouns root, number and value, has been used in the literature for what we call a proper value. —Paul R. Haimos Finite Dimensional Vector Spaces (2nd edition) Van Nostrand, 1958, p. 102 4.0 Introduction: A Dynamical System on Graphs CAS We saw in the last chapter that iterating matrix multiplication often produces interesting results. Both Markov chains and the Leslie model of population growth exhibit steady states in certain situations. One of the goals of this chapter is to help you understand such behavior. First we will look at another iterative process, or dynamical system, that uses matrices. (In the problems that follow, you will find it helpful to use a CAS or a calculator with matrix capabilities to facilitate the computations.) Our example involves graphs (see Section 3.7). A complete graph is any graph in which every vertex is adjacent to every other vertex. If a complete graph has n vertices, it is denoted by Kn. For example, Figure 4.1 shows a representation of K4. Problem 1 Pick any vector x in R4 with nonnegative entries and label the vertices of K4 with the components of x, so that vx is labeled with xu and so on. Compute the adjacency matrix A ofK4 and relabel the vertices of the graph with the corresponding components of Ax. Try this for several vectors x and explain, in terms of the graph, how the new labels can be determined from the old labels. Problem 2 Now iterate the process in Problem 1. That is, for a given choice of x, relabel the vertices as described above and then apply A again (and again, and again) until a pattern emerges. Since components of the vectors themselves will get quite large, we will scale them by dividing each vector by its largest component after each iteration. Thus, if a computation results in the vector '4' 2 1 _1 we will replace it by Figure 4.1 "4" "1 " 2 0.5 1 0.25 _1 _0.25_ 253 254 Chapter 4 Eigenvalues and Eigenvectors Figure 4.2 Figure 4.3 Figure 4.4 Note that this process guarantees that the largest component of each vector will now be 1. Do this for K4, then K3 and K5. Use at least ten iterations and two-decimal-place accuracy. What appears to be happening? Problem 3 You should have noticed that, in each case, the labeling vector is approaching a certain vector (a steady state label!). Label the vertices of the complete graphs with this steady state vector and apply the adjacency matrix A one more time (without scaling). What is the relationship between the new labels and the old ones? Problem 4 Make a conjecture about the general case Kn. What is the steady state label? What happens if we label K„ with the steady state vector and apply the adjacency matrix A without scaling? Problem 5 The Petersen graph is shown in Figure 4.2. Repeat the process in Problems 1 through 3 with this graph. We will now explore the process with some other classes of graphs to see if they behave the same way. The cycle C„ is the graph with n vertices arranged in a cyclic fashion. For example, C5 is the graph shown in Figure 4.3. Problem 6 Repeat the process of Problems 1 through 3 with cycles C„ for various odd values of n and make a conjecture about the general case. Problem 1 Repeat Problem 6 with even values of n. What happens? 
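For readers without a CAS at hand, the relabel-and-scale process of Problems 1 and 2 is easy to automate. The following is a minimal sketch in Python with NumPy; the function name and the starting vector are illustrative choices, not part of the text.

```python
import numpy as np

# Adjacency matrix of the complete graph K4: every vertex is adjacent to every
# other vertex, so the off-diagonal entries are 1 and the diagonal is 0.
A = np.ones((4, 4)) - np.eye(4)

def relabel_and_scale(A, x, iterations=10):
    """Repeatedly relabel the vertices with A @ x, dividing by the largest
    component after each step (the scaling described in Problem 2)."""
    x = np.array(x, dtype=float)
    for _ in range(iterations):
        x = A @ x
        x = x / x.max()            # the largest label becomes 1
        print(np.round(x, 2))
    return x

# Any nonzero starting vector with nonnegative entries will do, for example:
relabel_and_scale(A, [4, 2, 1, 1])
```

The same function, fed the adjacency matrix of K3, K5, the Petersen graph, a cycle, or a complete bipartite graph, produces the iterates asked for in Problems 2 through 8.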
A bipartite graph is a complete bipartite graph (see Exercises 74-78 in Section 3.7) if its vertices can be partitioned into sets U and V such that every vertex in 17 is adjacent to every vertex in V, and vice versa. If U and V each have n vertices, then the graph is denoted by Knn. For example, Ki3 is the graph in Figure 4.4. Problem 8 Repeat the process of Problems 1 through 3 with complete bipartite graphs K,hn for various values of n. What happens? By the end of this chapter, you will be in a position to explain the observations you have made in this Introduction. Introduction to Eigenvalues anil Eigenvectors In Chapter 3, we encountered the notion of a steady state vector in the context of two applications: Markov chains and the Leslie model of population growth. For a Markov chain with transition matrix P, a steady state vector x had the property that Px = x; for a Leslie matrix L, a steady state vector was a population vector x satisfying Lx — rx, where r represented the steady state growth rate. For example, we saw that The German adjective eigen means "own" or "characteristic of." and are charac- teristic of a matrix in the sense that they contain important information about the nature of the matrix. The letter A (lambda), the Greek equivalent of the English letter L, is used for eigenvalues because at one time they were also known as latent values. The prefix eigen is pronounced "EYE-gun." "0.7 0.2" "0.4" "0,4" „0.3 0.8. .0,6. _0.6_ and 0 4 3 0.5 0 0 0 0.25 0 18 18 6 = 1.5 6 1 1 In this chapter, we investigate this phenomenon more generally. That is, for a square matrix A, we ask whether there exist nonzero vectors x such that Ax is just a scalar multiple of x. This is the eigenvalue problem, and it is one of the most central problems in linear algebra. It has applications throughout mathematics and in many other fields as well. Definition Let A be an n X n matrix. A scalar A is called an eigenvalue of A if there is a nonzero vector x such that Ax = Ax. Such a vector x is called an eigenvector of A corresponding to A. Section 4.1 Introduction to Eigenvalues and Eigenvectors 255 Example 4.1 Show that x = eigenvalue. f V _1_ is an eigenvector of A = "3 f _1 3_ and find the corresponding Solution We compute 3 f Y "4l [il Ax = .1 3_ .1. — J=4L = 4x from which it follows that x is an eigenvector of A corresponding to the eigenvalue 4 ...............__________________________________________.____ - ■___._:_:_._:_W liiiilli 4.2 Show that 5 is an eigenvalue of A = 1 2 4 3 and determine all eigenvectors corre- sponding to this eigenvalue. Solution We must show that there is a nonzero vector x such that Ax = 5x. But this equation is equivalent to the equation (A — 5/)x = 0, so we need to compute the null space of the matrix A — 51. We find that A - 51 = 1 2 "5 0" "-4 2 _4 3_ _0 5_ 4 -2. Since the columns of this matrix are clearly linearly dependent, the Fundamental Theorem of Invertible Matrices implies that its null space is nonzero. Thus, Ax = 5x has a nontrivial solution, so 5 is an eigenvalue of A. We find its eigenvectors by computing the null space: [A - 51J 0] "-4 2 cf 1 -\ o" -> 4 -2 0. .0 0 0^ Thus, if x is an eigenvector corresponding to the eigenvalue 5, it satisfies X\ — jx2 = 0, or xx — \xt, so these eigenvectors are of the form L x2 That is, they are the nonzero multiples of 1 of (or, equivalently, the nonzero multiples The set of all eigenvectors corresponding to an eigenvalue A of an n X n matrix A is just the set of nonzero vectors in the null space of A — XI. 
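Here is a quick numerical check of Example 4.2, a sketch assuming NumPy is available: the matrix A − 5I has determinant zero, so its null space is nontrivial, and any multiple of (1, 2) satisfies Ax = 5x.

```python
import numpy as np

A = np.array([[1, 2],
              [4, 3]])
lam = 5                                       # candidate eigenvalue from Example 4.2

print(np.linalg.det(A - lam * np.eye(2)))     # 0.0 (up to roundoff), so 5 is an eigenvalue
x = np.array([1, 2])                          # a vector from the eigenspace E_5
print(A @ x, lam * x)                         # both print [ 5 10]
```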
It follows that this set of eigenvectors, together with the zero vector in W, is the null space of A — A/. 256 Chapter 4 Eigenvalues and Eigenvectors DOfillltiOll Let A be an n X n matrix and let A be an eigenvalue of A. The collection of all eigenvectors corresponding to A, together with the zero vector, is called the eigenspace of A and is denoted by Ek. Therefore, in Example 4.2, E5 = 11 * |. ......... w Exam • Show that A = 6 is an eigenvalue of ^4 = "7 1-2" -3 3 6 and find a basis for its eigenspace, r . 2 2 2_ Solution As in Example 4.2, we compute the null space of A — 61. Row reduction produces 1 1 -2 ~1 1 -1 A - 61 = -3 -3 6 -->■ 0 0 0 2 2 -4 0 0 0 from which we see that the null space of A — 61 is nonzero. Hence, 6 is an eigenvalue of A, and the eigenvectors corresponding to this eigenvalue satisfy x1 + x2 — 2x3 = 0, or xx = —x2 + 2x3. It follows that I X2 span In IR2, we can give a geometric interpretation of the notion of an eigenvector. The equation Ax = Ax says that the vectors Ax and x are parallel. Thus, x is an eigenvector of A if and only if A transforms x into a parallel vector [or, equivalently, if and only if TA(x) is parallel to x, where TA is the matrix transformation corresponding to A]. Find the eigenvectors and eigenvalues of A = geometrically. Solution We recognize that A is the matrix of a reflection F in the x-axis (see Example 3.56). The only vectors that F maps parallel to themselves are vectors parallel ", which are reversed (eigenvalue —1), and vectors ), which are sent to themselves (eigenvalue 1) to the v-axis (i.e., multiples of parallel to the ,\--axis (i.e., multiples of (see Figure 4.5). Accordingly, A = -corresponding eigenspaces are • 1 and A = 1 are the eigenvalues of A, and the span and Ei span Section 4.1 Introduction to Eigenvalues and Eigenvectors 251 y A 3- -3-- Flgore 4.5 The eigenvectors of a reflection Figure 4.6 The discussion is based on the article "Eigenpictures; Picturing the Eigenvector Problem" by Steven Schonefeld in The College Mathematics Journal 26 (1996), pp. 316-319. Another way to think of eigenvectors geometrically is to draw x and Ax head-to-tail. Then x will be an eigenvector of A if and only if x and Ax are aligned in a straight line. In Figure 4.6, x is an eigenvector of A but y is not. If x is an eigenvector of A corresponding to the eigenvalue A, then so is any nonzero multiple of x. So, if we want to search for eigenvectors geometrically, we need only consider the effect of A on unit vectors. Figure 4.7(a) shows what happens when 3 1 we transform unit vectors with the matrix A = .of Example 4.1 and display l/vT 1 3 the results head-to-tail, as in Figure 4.6. We can see that the vector x 1/V2 is an eigenvector, but we also notice that there appears to be an eigenvector in the second [~-l/V2~ quadrant. Indeed, this is the case, and it turns out to be the vector \lsJ2 258 Chapter 4 Eigenvalues and Eigenvectors y Figure 4.7 In Figure 4.7(b), we see what happens when we use the matrix A — There are no eigenvectors at all! We now know how to find eigenvectors once we have the corresponding eigenvalues, and we have a geometric interpretation of them—but one question remains: How do we first find the eigenvalues of a given matrix? The key is the observation that A is an eigenvalue of A if and only if the null space of A — A J is nontrivial. I a b Recall from Section 3.3 that the determinant of a 2 X 2 matrix A = the expression det A — ad — be, and A is invertible if and only if det A is nonzero. 
Furthermore, the Fundamental Theorem of Invertible Matrices guarantees that a matrix has a nontrivial null space if and only if it is noninvertible, hence, if and only if its determinant is zero. Putting these facts together, we see that (for 2 x 2 matrices at least) λ is an eigenvalue of A if and only if det(A − λI) = 0. This fact characterizes eigenvalues, and we will soon generalize it to square matrices of arbitrary size. For the moment, though, let's see how to use it with 2 x 2 matrices.

Example 4.5 Find all of the eigenvalues and corresponding eigenvectors of the matrix A = [3 1; 1 3] from Example 4.1.

Solution The preceding remarks show that we must find all solutions λ of the equation det(A − λI) = 0. Since

det(A − λI) = det [ 3−λ    1  ]  =  (3 − λ)(3 − λ) − 1  =  λ² − 6λ + 8
                  [  1    3−λ ]

we need to solve the quadratic equation λ² − 6λ + 8 = 0. The solutions to this equation are easily found to be λ = 4 and λ = 2. These are therefore the eigenvalues of A.

To find the eigenvectors corresponding to the eigenvalue λ = 4, we compute the null space of A − 4I. We find

[A − 4I | 0] = [ −1   1 | 0 ]   →   [ 1  −1 | 0 ]
               [  1  −1 | 0 ]       [ 0   0 | 0 ]

from which it follows that x = [x1; x2] is an eigenvector corresponding to λ = 4 if and only if x1 − x2 = 0, or x1 = x2. Hence, these eigenvectors are of the form [x2; x2], so the eigenspace E4 = span([1; 1]).

Similarly, for λ = 2, we have

[A − 2I | 0] = [ 1  1 | 0 ]   →   [ 1  1 | 0 ]
               [ 1  1 | 0 ]       [ 0  0 | 0 ]

so y = [y1; y2] is an eigenvector corresponding to λ = 2 if and only if y1 + y2 = 0, or y1 = −y2. Thus, the eigenspace E2 = span([−1; 1]).

Figure 4.8 shows graphically how the eigenvectors of A are transformed when multiplied by A: an eigenvector x in the eigenspace E4 is transformed into 4x, and an eigenvector y in the eigenspace E2 is transformed into 2y. As Figure 4.7(a) shows, the eigenvectors of A are the only vectors in R² that are transformed into scalar multiples of themselves when multiplied by A.

Figure 4.8 How A transforms eigenvectors

Remark You will recall that a polynomial equation with real coefficients (such as the quadratic equation in Example 4.5) need not have real roots; it may have complex roots. (See Appendix C.) It is also possible to compute eigenvalues and eigenvectors when the entries of a matrix come from Zp, where p is prime. Thus, it is important to specify the setting we intend to work in before we set out to compute the eigenvalues of a matrix. However, unless otherwise specified, the eigenvalues of a matrix whose entries are real numbers will be assumed to be real as well.

Example 4.6 Interpret the matrix in Example 4.5 as a matrix over Z3 and find its eigenvalues in that field.

Solution The solution proceeds exactly as above, except we work modulo 3. Hence, the quadratic equation λ² − 6λ + 8 = 0 becomes λ² + 2 = 0. This equation is the same as λ² = −2 = 1, giving λ = 1 and λ = −1 = 2 as the eigenvalues in Z3. (Check that the same answer would be obtained by first reducing A modulo 3 to obtain [0 1; 1 0] and then working with this matrix.)

Example 4.7 Find the eigenvalues of A = [0 −1; 1 0] (a) over R and (b) over the complex numbers C.

Solution We must solve the equation

0 = det(A − λI) = det [ −λ  −1 ]  =  λ² + 1
                      [  1  −λ ]

(a) Over R, there are no solutions, so A has no real eigenvalues.
(b) Over C, the solutions are λ = i and λ = −i. (See Appendix C.)
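The computations in Examples 4.5 through 4.7 are easy to automate for 2 x 2 matrices. The sketch below (plain Python; the function name is an illustrative choice) finds the roots of the characteristic equation λ² − tr(A)λ + det A = 0 (see Exercise 35 at the end of this section), returning them as complex numbers so that the situation of Example 4.7 is covered as well.

```python
import cmath

def eigenvalues_2x2(A):
    # Roots of det(A - lambda*I) = lambda^2 - tr(A)*lambda + det(A)
    # for a 2 x 2 matrix A, via the quadratic formula.
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

print(eigenvalues_2x2([[3, 1], [1, 3]]))    # eigenvalues 4 and 2, as in Example 4.5
print(eigenvalues_2x2([[0, -1], [1, 0]]))   # eigenvalues i and -i, as in Example 4.7
```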
In the next section, we will extend the notion of determinant from 2 X 2 to n X n matrices, which in turn will allow us to find the eigenvalues of arbitrary square matrices. (In fact, this isn't quite true—but we will at least be able to find a polynomial equation that the eigenvalues of a given matrix must satisfy.) Exercises 4.1 In Exercises 1-6, show thatv is an eigenvector of A and find the corresponding eigenvalue. T 1 1. A 2. A 3. A = 0 3 .3 oj' 1 2 2 lj' — 1 1 6 0 f — 2 4. A = "4 .5 -2" -7_ ,v = "3 0 o" 5. A = 0 1 -2 J 0 1. "o 1 -l" 6. A = 1 1 1 1 2 0 2" , v = -1 1. ~-2~ , v = 1 1- Section 4,1 Introduction to Eigenvalues and Eigenvectors In Exercises 7-12, show that A is an eigenvalue of A and find one eigenvector corresponding to this eigenvalue. 7. A 8. A 9. A = 2 2 2 -1 , A = 3 2 3 3 2 ,A = -1 10. A 11. A 12. A 0 4 -1 5 4 -2 5 -7 , A = -6 1 0 2 — I 1 1 2 0 1J 3 1 -1" 1 1 1 4 2 0 A = -1 A = 2 In Exercises 13-18, find the eigenvalues and eigenvectors of A geometrically. 13. A = 14. A = 15. A = 16. A = -1 0 0 1 0 1 1 o 1 0 0 0 16 11" 25 25 13 JL 25 25- (reflection in the y-axis) (reflection in the line y = x) (projection onto the x-axis) (projection onto the line through the origin with direction vector '2 Ol 17. A = ^ (stretching by a factor of 2 horizontally and a factor of 3 vertically) 18. A = 0 -1 .1 0 about the origin) (counterclockwise rotation of 90° In Exercises 19-22, the unit vectors x in R and their images Ax under the action ofa2X2 matrix A are drawn head-to-tail, as in Figure 4.7. Estimate the eigenvectors and eigenvalues of A from each "eigenpicture." 19. 20. *~x 282 Chapter 4 Eigenvalues and Eigenvectors 21. f--*-x In Exercises 23-26, use the method of Example 4.5 to find all of the eigenvalues of the matrix A. Give bases for each of the corresponding eigenspaces. Illustrate the eigenspaces and the effect of multiplying eigenvectors by A as in Figure 4.8. 23. A = 25. A = -1 1 5 24. A 26. A 2 4 6 0 1 2 -2 3 29. A = 1 i i 1 30. A 1 + i In Exercises 31-34, find all of the eigenvalues of the ma- trix A over the indicated Z„ 31. A = 33. A = 1 0 1 2 3 1 4 0 over Z3 over Zq 32. A 34. A 2 1 1 2 1 4 4 0 •Z, over Zr 35. (a) Show that the eigenvalues of the 2 X 2 matrix a & are the solutions of the quadratic equation A2 - tr(A)A + det A = 0, where tr(A) is the trace of A. (See page 162.) (b) Show that the eigenvalues of the matrix A in part (a) are A = \{a + d ± Via - df + 4bc) (c) Show that the trace and determinant of the matrix A in part (a) are given by tr(A) = A[ + A2 and det A — AtA2 where At and A2 are the eigenvalues of A. 36. Consider again the matrix A in Exercise 35. Give conditions on a, b, c, and d such that A has (a) two distinct real eigenvalues, (b) one real eigenvalue, and (c) no real eigenvalues. 37. Show that the eigenvalues of the upper triangular matrix A a b 0 d In Exercises 27-30, find all of the eigenvalues of the matrix A over the complex numbers C. Give bases for each of the corresponding eigenspaces. are A = a and A = d, and find the corresponding eigenspaces. •b 38. Let a and b be real numbers. Find the eigenvalues and corresponding eigenspaces of A = 27. A = 28. A a b -b a over the complex numbers. Section 4.2 Determinants 263 Determinants_ Historically, determinants preceded matrices—a curious fact in light of the way linear algebra is taught today, with matrices before determinants. 
Nevertheless, determinants arose independently of matrices in the solution of many practical problems, and the theory of determinants was well developed almost two centuries before matrices were deemed worthy of study in and of themselves. A snapshot of the history of determinants is presented at the end of this section. flu (In Recall that the determinant of the 2X2 matrix A = is det A = ana22 We first encountered this expression when we determined ways to compute the inverse of a matrix. In particular, we found that 1 flllß22 - fl]2«2 ön J The determinant of a matrix A is sometimes also denoted by A |, so for the 2X2 matrix A = Lfl21 a,- we may also write A = llarning This notation for the determinant is reminiscent of absolute value no- tation. It is easy to mistake , the notation for determinant, for the notation for the matrix itself. Do not confuse these. Fortunately, it will usually be clear from the context which is intended. We define the determinant of a 1 X 1 matrix A = [a] to be det A = \a\ = a (Note that we really have to be careful with notation here: | a j does not denote the absolute value of a in this case.) How then should we define the determinant of a 3X3 matrix? If you ask your CAS for the inverse of a b c A = d e f -g h i. the answer will be equivalent to where A = aei — afh — bdi + bfg + cdh — ceg. Observe that A = aei — afh — bdi + bfg + cdh — ceg = a(ei — fh) — b(di — fg) + cidh — eg ei - fh ch - bi bf- ce di ai — ^g cd — af dh - eg bg - ah ae — bd e f d f d e a - b + c h i g * g h 264 Chapter 4 Eigenvalues and Eigenvectors and that each of the entries in the matrix portion of A'1 appears to be the determinant of a 2 X 2 submatrix of A. In fact, this is true, and it is the basis of the definition of the determinant of a 3 X 3 matrix. The definition is recursive in the sense that the determinant of a 3 X 3 matrix is defined in terms of determinants of 2 X 2 matrices. «12 «13 Definition Let a = «21 «22 «23 . Then the determinant of A is the scalar .«31 «32 «33- «22 «23 «21 «23 + «13 «21 «22 - au «32 «33 «31 «33 «31 «32 (1) Notice that each of the 2 X 2 determinants is obtained by deleting the row and column of A that contain the entry the determinant is being multiplied by. For example, the first summand is an multiplied by the determinant of the submatrix obtained by deleting row 1 and column 1. Notice also that the plus and minus signs alternate in Equation (1). If we denote by A,-,-the submatrix of a matrix A obtained by deleting row i and column j, then we may abbreviate Equation (1) as det A = fln det An - an det A12 + a13 det Ax = 2(-i)I+>«„det Av For any square matrix A, det A« is called the (i, j)-minor of A. Example 4.8 Compute the determinant of r Solution We compute A = "5 -3 2"1 1 0 2 l2 -1 3_ 0 2 - (-3) 1 2 1 0 + 2 -1 3 2 3 2 -1 det A = 5 = 5(0 - (-2)) + 3(3 - 4) + 2(-l - 0) = 5(2) + 3(-l) + 2(-l) = 5 With a little practice, you should find that you can easily work out 2X2 determinants in your head. Writing out the second line in the above solution is then unnecessary. Another method for calculating the determinant of a 3 X 3 matrix is analogous to the method for calculating the determinant of a 2 X 2 matrix. Copy the first two columns of A to the right of the matrix and take the products of the elements on the six Section 4.2 Determinants llilll diagonals shown below. Attach plus signs to the products from the downward-sloping diagonals and attach minus signs to the products from the upward-sloping diagonals. 
(2) This method gives «1I«22«33 + fll2«23fl31 + fl13fl21«32 ~~ fl31«22fl13 ~ «32«23«11 ~* «33fl21fl12 In Exercise 19, you are asked to check that this result agrees with that from Equation (1) for a 3 X 3 determinant. Calculate the determinant of the matrix in Example 4.8 using the method shown in (2). Solution We adjoin to A its first two columns and compute the six indicated T products: Adding the three products at the bottom and subtracting the three products at the top gives detA = 0 + (-12) + (-2) - 0 - (-10) - (~9) = 5 as before. 4 Warning We are about to define determinants for arbitrary square matrices. However, there is no analogue of the method in Example 4.9 for larger matrices. It is valid only for 3 X 3 matrices. Determinants of n x n Matrices The definition of the determinant of a 3 X 3 matrix extends naturally to arbitrary square matrices. Definition Let A = [a$ be an n X n matrix, where n>2. Then the determinant of A is the scalar det A = \A\ = (!,, del A,, - au det AS2 + • • • + (-1)' n = 2(-!)1'"i-det AD ' det A, (3) 266 Chapter 4 Eigenvalues and Eigenvectors It is convenient to combine a minor with its plus or minus sign. To this end, we define the (i, j)-cofactor of A to be C„ = (-if*1 det A„ With this notation, definition (3) becomes detA=2>yCy (4) Exercise 20 asks you to check that this definition correctly gives the formula for the determinant of a 2 X 2 matrix when n = 2. Definition (4) is often referred to as cofactor expansion along the first row. It is an amazing fact that we get exactly the same result by expanding along any row (or even any column)! We summarize this fact as a theorem but defer the proof until the end of this section (since it is somewhat lengthy and would interrupt our discussion if we were to present it here). Theorem 4.1 The Laplace Expansion Theorem The determinant of an n X n matrix A = [a,-,-], where nS2, can be computed as det A = anCn + al2Cl2 + • • • f det A«, each cofactor is plus or minus the corresponding minor, with the correct sign given by the term (— 1)'+J. A quick way to determine whether the sign is + or — is to remember that the signs form a "checkerboard" pattern: + - + - • • — + - + • • + - + - • • - + - + • • Section 4.2 Determinants w. lit 10 Compute the determinant of the matrix "5 -3 2~ A = 1 0 2 .2 -1 3_ 1 by (a) cofactor expansion along the third row and (b) cofactor expansion along the I second column. Pierre Simon Laplace (1749-1827) was born in Normandy, France, and was expected to become a clergyman until his mathematical talents were noticed at school. He made many important contributions to calculus, probability, and astronomy. He was an examiner of the young Napoleon Bonaparte at the Royal Artillery Corps and later, when Napoleon was in power, served briefly as Minister of the Interior and then Chancellor of the Senate. Laplace was granted the title of Count of the Empire in 1806 and received the title of Marquis de Laplace in 1817. 4,11 Solution (a) We compute det A = üi,C,, + a,iC« + a„C, -3 2 - (-1) 5 2 5 -3 2 + 3 0 2 1 2 1 0 = 2(-6) + 8 + 3(3) = 5 (b) In this case, we have det A = anCn + Ö22C22 + ^32^32 = -(-3) 1 2 5 2 - (-1) 5 2 + 0 2 3 2 3 1 2 = 3(-l) + 0 + = 5 Notice that in part (b) of Example 4.10 we needed to do fewer calculations than in part (a) because we were expanding along a column that contained a zero entry— namely, a12\ therefore, we did not need to compute C22. 
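The claim of Theorem 4.1, that expansion along any row gives the same value, can also be checked computationally. Below is a minimal sketch in Python (the function name is ours, not part of the text): cofactor expansion along a chosen row, applied to the matrix of Example 4.10.

```python
def det_laplace(A, row=0):
    """Determinant by cofactor expansion along the given (0-indexed) row:
    det A = sum over j of A[row][j] * (-1)**(row+j) * det(A with that row
    and column j deleted)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [r[:j] + r[j + 1:] for k, r in enumerate(A) if k != row]
        total += (-1) ** (row + j) * A[row][j] * det_laplace(minor)
    return total

A = [[5, -3, 2],
     [1,  0, 2],
     [2, -1, 3]]
# Expanding along the first row and along the third row gives the same value, 5.
print(det_laplace(A, row=0), det_laplace(A, row=2))
```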
It follows that the Laplace Expansion Theorem is most useful when the matrix contains a row or column with lots of zeros, since, by choosing to expand along that row or column, we minimize the number of cofactors we need to compute. Compute the determinant of A = " 2 -3 0 1 5 4 2 0 1 -1 0 3 -2 1 0 0 Solution First, notice that column 3 has only one nonzero entry; we should therefore expand along this column. Next, note that the +/— pattern assigns a minus sign 268 Chapter 4 Eigenvalues and Eigenvectors to the entry a23 = 2. Thus, we have (Jet A = «13^13 + «23^23 + fl33^33 + 043C43 = 0(C13) + 2C23 + 0(C„) + 0(C43) 2 -3 1 1 3 1 0 We now continue by expanding along the third row of the determinant above (the third column would also be a good choice) to get det A = -2 1 -3 1 2 1 \ ("2 -1 3 1 3 ) = -2(-2(-8) - 5) = -2(11) = -22 (Note that the +/— pattern for the 3X3 minor is not that of the original matrix but that of a 3 X 3 matrix in general.) The Laplace expansion is particularly useful when the matrix is (upper or lower) triangular. Example 4.12 Compute the determinant of "2-3 10 0 3 2 5 0 0 16 0 0 0 5 0 0 0 0 We expand along the first column to get 3 2 5 0 1 6 det A = 2 0 0 5 0 0 0 (We have omitted all cofactors corresponding to zero entries.) Now we expand along the first column again: 1 6 0 det A = 2 • 3 0 5 2 0 0-1 Continuing to expand along the first column, we complete the calculation: det A =2-3-1 = 2 • 3 • 1 • (5(-1) - 2 • 0) = 2 • 3 • 1 • 5 • (-1) ■30 Section 4.2 Determinants 269 Example 4.12 should convince you that the determinant of a triangular matrix is the product of its diagonal entries. You are asked to give a proof of this fact in Exercise 21. We record the result as a theorem. ThSOFBIII 4.2 The determinant of a triangular matrix is the product of the entries on its main diagonal. Specifically, if A = [a,J is an n X n triangular matrix, then det A = ana22 - ■ -ann Note In general (that is, unless the matrix is triangular or has some other special form), computing a determinant by cofactor expansion is not efficient. For example, the determinant of a 3 X 3 matrix has 6 = 3! summands, each requiring two multiplications, and then five additions and subtractions are needed to finish off the calculations. For an n X n matrix, there will be n\ summands, each with n — 1 multiplications, and then n\ — 1 additions and subtractions. The total number of operations is thus T(n) = (n - \)n\ + n\ - 1 > n\ Even the fastest of supercomputers cannot calculate the determinant of a moderately large matrix using cofactor expansion. To illustrate: Suppose we needed to calculate a 50 X 50 determinant. (Matrices much larger than 50 X 50 are used to store the data from digital images such as those transmitted over the Internet or taken by a digital camera.) To calculate the determinant directly would require, in general, more than 50! operations, and 50! = 3 X 1064. If we had a computer that could perform a trillion (1012) operations per second, it would take approximately 3 X 1032 seconds, or almost 1045 years, to finish the calculations. To put this in perspective, consider that astronomers estimate the age of the universe to be at least 10 billion (1010) years. Thus, on even a very fast supercomputer, calculating a 50 X 50 determinant by cofactor expansion would take more than 1030 times the age of the universe! Fortunately, there are better methods—and we now turn to developing more computationally effective means of finding determinants. 
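To make the operation count T(n) = (n − 1)n! + n! − 1 quoted above concrete, a short computation (plain Python) shows how quickly it outruns the roughly n³ operations that row reduction needs.

```python
from math import factorial

def cofactor_ops(n):
    # The operation count for cofactor expansion given in the text:
    # T(n) = (n - 1) * n! + n! - 1
    return (n - 1) * factorial(n) + factorial(n) - 1

for n in (3, 5, 10, 25, 50):
    print(n, cofactor_ops(n))
# Already at n = 25 the count is about 4 * 10**26; at n = 50 it is roughly
# 1.5 * 10**66, while row reduction needs only on the order of
# n**3 = 125,000 operations for n = 50.
```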
First, we need to look at some of the properties of determinants. Properties of Determinants The most efficient way to compute determinants is to use row reduction. However, not every elementary row operation leaves the determinant of a matrix unchanged. The next theorem summarizes the main properties you need to understand in order to use row reduction effectively. Theorem 4.3 Let A = [a^ be a square matrix. a. If A has a zero row (column), then det A = 0. b. If B is obtained by interchanging two rows (columns) of A, then det B = — det A. c. If A has two identical rows (columns), then det A = 0. d. If B is obtained by multiplying a row (column) of A by k, then det B = k det A. e. If A, B, and C are identical except that the ith row (column) of C is the sum of the ith rows (columns) of A and B, then det C = det A + det B. f. If B is obtained by adding a multiple of one row (column) of A to another row (column), then det B = det A. Chapter 4 Eigenvalues and Eigenvectors We will prove (b) as Lemma 4,14 at the end of this section. The proofs of properties (a) and (f) are left as exercises. We will prove the remaining properties in terms of rows; the corresponding proofs for columns are analogous. (c) If A has two identical rows, swap them to obtain the matrix B. Clearly, B = A, so detJ = detA. On the other hand, by (b),detJ3 = -det A. Therefore, det A = -detA, so det A = 0. (d) Suppose row i of A is multiplied by k to produce B; that is, b« = kati for j = 1,..,,«. Since the cefaclors C« of the elements in the ith rows of A and B are identical (why?), expanding along the ith row of B gives n n n detJ? = ^bijCjj = ^fcfl,jC,j = kj^iiijC;: = k detA (e) As in (d), the cofactors Cy of the elements in the ith rows of A, B, and C are identical. Moreover, c« — a» + b* for j = 1,..., n. We expand along the ith row of C to obtain n n n n detC =2c«Cy =2^ay + MCv =2fli>'C«j +2byCy = detA + detB Notice that properties (b), (d), and (f) are related to elementary row operations. Since the echelon form of a square matrix is necessarily upper triangular, we can combine these properties with Theorem 2 to calculate determinants efficiently. (See Exploration: Counting Operations in Chapter 2, which shows that row reduction of an n X n matrix uses on the order of n3 operations, far fewer than the n\ needed for cofactor expansion.) The next examples illustrate the computation of determinants using row reduction. rumple 4.13 Compute det A if (a) A = (b) A = 2 3 -1 0 5 3 .-4 -6 2, "0 2 -4 5" 3 0 ~3 6 2 4 5 7 5 -1 -3 1 Solution (a) Using property (f) and then property (a), we have 2 3 -1 2 3 -1 det A = 0 5 3 = 0 5 3 -4 -6 2 0 0 0 Section 4.2 Determinants 271 (b) We reduce A to echelon form as follows (there are other possible ways to do this): 0 2 - 4 5 3 0 -3 6 1 0 -1 2 3 o - 3 6 0 2 -4 5 u 0 2 -4 5 det A = = — = -3 2 4 5 7 2 4 5 7 2 4 5 7 5 -1 - -3 1 5 -1 -3 1 5 -1 -3 1 Rt-2R 1 0 1 2 1 0 -1 2 L R,-5R 0 2 4 5 -(-3) 0 -1 2 — 9 = 3 0 4 7 3 0 4 7 3 0 -1 2 -9 0 2 -4 5 R3+4R2 1 0 -1 2 Rt+2R2 0 -1 2 -9 = 3 0 0 15 -33 0 0 0 -13 3 'l-(-l)' 15 •(- -13) = 585 Remark By Theorem 4.3, we can also use elementary column operations in the process of computing determinants, and we can "mix and match" elementary row and column operations. For example, in Example 4.13(a), we could have started by adding column 3 to column 1 to create a leading 1 in the upper left-hand corner. In fact, the method we used was faster, but in other examples column operations may speed up the calculations. 
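The strategy of Example 4.13 translates directly into an algorithm that needs on the order of n³ operations. Here is a minimal sketch in plain Python (the function name is ours); in line with Theorem 4.3, it flips the sign of the determinant for every row interchange and leaves it unchanged when a multiple of one row is added to another.

```python
def det_by_row_reduction(A):
    """Determinant via reduction to triangular form, then the product of the
    pivots (Theorem 4.2), with one sign change per row swap (Theorem 4.3(b))."""
    A = [row[:] for row in A]                   # work on a copy
    n, sign, det = len(A), 1, 1.0
    for i in range(n):
        # find a nonzero pivot in column i, swapping rows if necessary
        p = next((r for r in range(i, n) if A[r][i] != 0), None)
        if p is None:
            return 0.0                          # a zero pivot column: det A = 0
        if p != i:
            A[i], A[p] = A[p], A[i]
            sign = -sign
        for r in range(i + 1, n):
            m = A[r][i] / A[i][i]
            A[r] = [A[r][c] - m * A[i][c] for c in range(n)]
        det *= A[i][i]
    return sign * det

# The matrix of Example 4.13(b)
print(det_by_row_reduction([[0,  2, -4, 5],
                            [3,  0, -3, 6],
                            [2,  4,  5, 7],
                            [5, -1, -3, 1]]))   # 585.0 (up to roundoff)
```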
Keep this in mind when you work determinants by hand. Determinants of Elementary Matrices Recall from Section 3.3 that an elementary matrix results from performing an elementary row operation on an identity matrix. Setting A = l„ in Theorem 4.3 yields the following theorem. Theorem 4.4 Let £ be an n X n elementary matrix. a. If £ results from interchanging two rows of I„, then det £ = — 1. b. If E results from multiplying one row of I„ by k, then det E = k. c. If £ results from adding a multiple of one row of In to another row, then det£ - I. The word lemma is derived from the Greek verb lambanein, which means "to grasp." In mathematics, a lemma is a "helper theorem" that we "grasp hold of" and use to prove another, usually more important, theorem. Proof Since det I„ = 1, applying (b), (d), and (f) of Theorem 4.3 immediately gives (a), (b), and (c), respectively, of Theorem 4.4. _■ I Next, recall that multiplying a matrix B by an elementary matrix on the left performs the corresponding elementary row operation on B. We can therefore rephrase (b), (d), and (f) of Theorem 4.3 succinctly as the following lemma, the proof of which is straightforward and is left as Exercise 43. 272 Chapter 4 Eigenvalues and Eigenvectors LBItintB 4.5 Let B be an n X n matrix and let E be an n X n elementary matrix. Then detCEB) = (det £)(det B) We can use Lemma 4.5 to prove the main theorem of this section: a characterization of invertibility in terms of determinants. ThBOrBlll 4.6 A square matrix A is invertible if and only if det A 0. f".;!:■; Let A be an n X n matrix and let R be the reduced row echelon form of A. We will show first that det A ¥= 0 if and only if det R # 0. Let Ev E2, . . . , E, be the elementary matrices corresponding to the elementary row operations that reduce A to JR. Then Er-■ ■E2ElA = R Taking determinants of both sides and repeatedly applying Lemma 4.5, we obtain (det£r)- • •(det£2)(det£1)(detA) = detR By Theorem 4.4, the determinants of all the elementary matrices are nonzero. We conclude that det A ^ 0 if and only if det R ¥* 0. Now suppose that A is invertible. Then, by the Fundamental Theorem of Invertible Matrices, R = In, so det R = 1 ¥= 0. Hence, det A # 0 also. Conversely, if det A -+ 0, then det R ¥= 0, so R cannot contain a zero row, by Theorem 4.3(a). It follows that j§ » R must be J„ (why?), so A is invertible, by the Fundamental Theorem again. Determinants and Matrix Operations Let's now try to determine what relationship, if any, exists between determinants and some of the basic matrix operations. Specifically, we would like to find formulas for det(AA), det(A + B), det(AB), det(A~L), and det(Ar) in terms of det A and det B. M »■ Theorem 4.3(d) does not say that det(fcA) = k det A, The correct relationship between scalar multiplication and determinants is given by the following theorem. Theorem 4.7 If A is an « X » matrix, then det(fcA) = fc"detA You are asked to give a proof of this theorem in Exercise 44. Unfortunately, there is no simple formula for det (A + B), and in general, det(A + B) # det A + det B, (Find two 2X2 matrices that verify this.) It therefore comes as a pleasant surprise to find out that determinants are quite compatible with matrix multiplication. Indeed, we have the following nice formula due to Cauchy. Section 4.2 Determinants Augustin Louis Cauchy (1789-1857) was born in Paris and studied engineering but switched to mathematics because of poor health. 
A brilliant and prolific mathematician, he published over 700 papers, many on quite difficult problems. His name can be found on many theorems and definitions in differential equations, infinite series, probability theory, algebra, and physics. He is noted for introducing rigor into calculus, laying the foundation for the branch of mathematics known as analysis. Politically conservative, Cauchy was a royalist, and in 1830 he followed Charles X into exile. He returned to France in 1838 but did not return to his post at the Sorbonne until the university dropped its requirement that faculty swear an oath of loyalty to the new king. The0r6H1 4.8 If A and Baren X n matrices, then det(AB) = (det A)(det B) Proof We consider two cases: A invertible and A not invertible. If A is invertible, then, by the Fundamental Theorem of Invertible Matrices, it can be written as a product of elementary matrices—say, A = £,£2' * *-Bjt Then AB = E]E2''' E^B, so k applications of Lemma 4.5 give det(AB) = det(£i£2- • -EkB) = (det^Xdet E2) ■ ■ - (det Ek){det B) Continuing to apply Lemma 4.5, we obtain det(AB) = det^iSj- • -JB^detB = (det A)(det B) On the other hand, if A is not invertible, then neither is AB, by Exercise 47 in Section 3.3. Thus, by Theorem 4.6, det A = 0 and det(AB) = 0. Consequently, det(AB) = (det A)(det B), since both sides are zero. ____________ Example 4.14 Applying Theorem 4.8 to A = "2 1" -2 3, and B = '5 l" 2 1. , we find that r AB = 12 3 _16 5 and that det A = 4, det B = 3, and det(AB) =12 = 4-3 = (det A)(det B), as claimed. (Check these assertions!) A The next theorem gives a nice relationship between the determinant of an invertible matrix and the determinant of its inverse. 274 Chapter 4 Eigenvalues and Eigenvectors Theorem 4.9 If A is invertible, then det(A_1) det A Proof Since A is invertible, AA-1 = I, so det(AA_1) = det I = 1. Hence, (det A)(det A~!) = 1, by Theorem 4.8, and since det A # 0 (why?), dividing by det A yields the result. _iHiS Example 4.15 Verify Theorem 4.9 for the matrix A of Example 4.14. Solution We compute so det A A" 3 -l" - 3 r 4 4 -2 2. 1 1 - 2 2- _ 3 1 \ 4, \ 2 ~~ 8 8 1 4 det A Remark The beauty of Theorem 4.9 is that sometimes we do not need to know what the inverse of a matrix is, but only that it exists, or to know what its determinant is. For the matrix A in the last two examples, once we know that det A = 4 + 0, we immediately can deduce that A is invertible and that det A"1 = \ without actually computing A~ . We now relate the determinant of a matrix A to that of its transpose A1. Since the rows of A1' are just the columns of A, evaluating det A1 by expanding along the first row is identical to evaluating det A by expanding along its first column, which the Laplace Expansion Theorem allows us to do. Thus, we have the following result. Theorem 4.10 For any square matrix A, det A = det A' Cramer's Rule and the Adjoint Gabriel Cramer (1704-1752) was a Swiss mathematician. The rule that bears his name was published in 1750, in his treatise Introduction to the Analysis of Algebraic Curves. As early as 1730, however, special cases of the formula were known to other mathematicians, including the Scotsman Colin Maclaurin (1698-1746), perhaps the greatest of the British mathematicians who were the "successors of Newton," In this section, we derive two useful formulas relating determinants to the solution of linear systems and the inverse of a matrix. 
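The assertions of Example 4.14 can be checked numerically. The sketch below assumes NumPy; the entries of A are taken as [2 1; 2 3], consistent with the values det A = 4, det B = 3, and AB = [12 3; 16 5] reported there, and the last line checks the value det(A⁻¹) = 1/4 that appears in Example 4.15.

```python
import numpy as np

A = np.array([[2, 1],
              [2, 3]])
B = np.array([[5, 1],
              [2, 1]])

print(np.linalg.det(A), np.linalg.det(B))     # 4.0 and 3.0
print(np.linalg.det(A @ B))                   # 12.0, which equals (det A)(det B)
print(np.linalg.det(np.linalg.inv(A)))        # 0.25, which equals 1/det A
```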
The first of these, Cramer's Rule, gives a formula for describing the solution of certain systems of n linear equations in n variables entirely in terms of determinants. While this result is of little practical use beyond 2X2 systems, it is of great theoretical importance. We will need some new notation for this result and its proof. For an n X n matrix A and a vector b in W, let A,-(b) denote the matrix obtained by replacing the ;'th column of A by b. That is, Column i I Afb) = [ar--b---aj Section 4.2 Determinants Theorem 4.11 Cramers Rule Let A be an invertible n X n matrix and let b be a vector in W. Then the unique solution x of the system Ax = b is given by det(A,(b)) r x —- for i = 1,..., n detA Pros! The columns of the identity matrix I = I„ are the standard unit vectors e,, e2,..., e„. If Ax = b, then ej = [Aei a„] = A,.(b) Ax AIj(x) = A[e! • • ■ x = [a, ••• b Therefore, by Theorem 4.8, (detA)(det7,(x)) = det(A7,(x)) = det(A,(b)) Now Ae„ det7,(x) 1 0 0 1 0 0 0 0 0 0 xn-l 0 0 0 0 0 0 1 0 0 1 as can be seen by expanding along the ith row. Thus, (det A)xt = det(A,(b)), and the result follows by dividing by det A (which is nonzero, since A is invertible). Example 4.16 Use Cramer's Rule to solve the system X, + 2x2 = 2 » ~xl + 4x2 = 1 Solution We compute detA 1 2 -1 4 = 6, det(Aj(b)) 2 2 1 4 6, and det(A2(b)) = = 3 1 2 1 1 By Cramer's Rule, detU^b)) 6 , det(A2(b)) 3 1 x = -= — = i and x2 = -:--= — = - detA 6 detA 6 2 276 Chapter 4 Eigenvalues and Eigenvectors Remark As noted previously, Cramer's Rule is computationally inefficient for all but small systems of linear equations because it involves the calculation of many determinants. The effort expended to compute just one of these determinants, using even the most efficient method, would be better spent using Gaussian elimination to solve the system directly. The final result of this section is a formula for the inverse of a matrix in terms of determinants. This formula was hinted at by the formula for the inverse of a 3 X 3 matrix, which was given without proof at the beginning of this section. Thus, we have come full circle. Let's discover the formula for ourselves. If A is an invertible n X n matrix, its inverse is the (unique) matrix X that satisfies the equation AX = I. Solving for X one column at a time, let jl be the jth column of X. That is, Therefore, Ax; = &, and by Cramer's Rule, det(A,(ei)) det A However, det(A,.(e;.)) = au an a2l a22 a)i aj2 (Xn\ (Xn ith column i • • 0 • • 0 .. I .. • • 0 ■ • (-ly+'detA,-,. - CM which is the (j, i) -cofactor of A. It follows that xij = (l/detA)Cj7, soAM = X = (1/det A) [C;;] = (1/det A)[C,/. In words, the inverse of A is the transpose of the matrix of cofactors of A, divided by the determinant of A. The matrix [Cj,] = [Cy]: C12 c22 Cml is called the adjoint (or adjugate) of A and is denoted by adj A. The result we have just proved can be stated as follows. Section 4.2 Determinants —__ Th60r6m 4.12 Let A be an invertible n X n matrix. Then A"1 = ,--adj A det A 1 Example 4.1? Use the adjoint method to compute the inverse of A = 1 2 -1 2 2 4 1 3 -3 Solution We compute det A - -2 and the nine cofactors 2 4 Cn = + . . = -18 C„ = On " Q, = + 4 -3 -1 -3 -1 4 2 4 1 -3 = 3 = 10 c22 C32 + = 10 c13 = + = —2 C23 = — = —6 C33 = + = 4 = -1 = -2 The adjoint is the transpose of the matrix of cofactors—namely, adj A -18 10 4^ r "-18 3 10 3 -2 -1 = 10 -2 -6 10 -6 ~2 4 -1 -2 Then det A adj A 18 3 10" 9 3 2 -5" 10 -2 -6 = -5 1 3 4 -1 -2_ .-2 1 2 1. 
which is the same answer we obtained (with less work) in Example 3,30. Proof of the Laplace Expansion Theorem Unfortunately, there is no short, easy proof of the Laplace Expansion Theorem. The proof we give has the merit of being relatively straightforward. We break it down into several steps, the first of which is to prove that cofactor expansion along the first row of a matrix is the same as cofactor expansion along the first column. Lemma 4.13 Let A be an n X n matrix. Then auCn + auC12 + ■ • ■ + a%nCln = detA = auCn + a2iC2\ + a„vC„ (7) 218 Chapter 4 Eigenvalues and Eigenvectors Proof. We prove this lemma by induction on n. For n = 1, the result is trivial. Now assume that the result is true for (n — 1) X (« — 1) matrices; this is our induction hypothesis. Note that, by the definition of cofactor (or minor), all of the terms containing an are accounted for by the summand anCu. We can therefore ignore terms containing flu. The j'th summand on the right-hand side of Equation (7) is aaCn = an(—l)'+l det An. Now we expand det A;i along the first row: fl12 fl13 • • ' ci\j • • • a,\n ai-l,2 ai~l,3 ' ' ' ai-l,j ' ' ' ai-l,n ai+l,2 ai-\l,3 ' ' ' ai+l,j ' ' ' ai+l,n an2 an3 ' ' ' anj ' ' ' an,n The jth term in this expansion of det Ajx is a^i— det AUiVp where the nota- tion AkiiK denotes the submatrix of A obtained by deleting rows k and / and columns r and S. Combining these, we see that the term containing on the right-hand side of Equation (7) is «n(-l)'+1fly(-l)1+'-1 detA1Uj = (-l)i+i+lauaij det AUAj What is the term containing anau on the left-hand side of Equation (7)? The factor aVl occurs in the ;'th summand, ctuCu = au(— l)1+J det Ay. By the induction hypothesis, we can expand det Ay along its first column: fl21 ' ' ' a2,j-l a2,j-H " ' ' a2n a3l ' ' ' a3,j-l a3,j+\ ' ' ' a3n ai\ ' ' ' ai,j~i ai,j+l ' ' * 0-in : •. : aul " ' ' an,j-l an,j+l ' ' " ann The rth term in this expansion of det Al-j is an(—1)(' det AliAj, so the term containing anaXj on the left-hand side of Equation (7) is ay(-l)1+'fla(-l)(^1)+1 det Aluj = (-D^a^y det Aiuj which establishes that the left- and right-hand sides of Equation (7) are equivalent. Next, we prove property (b) of Theorem 4.3. Lemma 4.14 Let A be an n X n matrix and let B be obtained by interchanging any two rows (columns) of A. Then det B = -det A Section 4.2 Determinants 219 Proof Once again, the proof is by induction on n. The result can be easily checked when « = 2, so assume that it is true for (n — 1) X (n — 1) matrices. We will prove that the result is true for n X n matrices. First, we prove that it holds when two adjacent rows of A are interchanged—say, rows r and r + 1. By Lemma 4.13, we can evaluate det B by cofactor expansion along its first column. The ith term in this expansion is ( — l)l + 'bn det Bn. \ii + r and i r + 1, then bn = an and Bn is an (w — 1) X (n — 1) submatrix that is identical to AA except that two adjacent rows have been interchanged. ar+l,n a,.„ Thus, by the induction hypothesis, det Bn = —det Aa \fi=£r and I # r + 1. If i = r, then b(1 =-■ flr+u and f$n = Ar+1>1. au a12 • • ■ aln Row 1 'r+1,1 *r+l,2 Therefore, the rth summand in det B is (-l)r+1brtdetBrt = (-l)''+1ar+udetAr+u = -(-l)(r+1)+1ar. u det Ar+1>1 Similarly, if i = r + 1, then bn = arh Bn = Arl, and the (r + l)st summand in det B is (-l)(r+i)+1br+uldetBr+1A = (-l)rflrldetArl = -(-l)r+1a„ det Ar! 
In other words, the rth and (r + l)st terms in the first column cofactor expansion of det B are the negatives of the (r + l)st and rth terms, respectively, in the first column cofactor expansion of det A. Substituting all of these results into det B and using Lemma 4.13 again, we obtain n detB = 2(-l)i+1fcfl detB.i (=i = 2 (-l)i+1fc,.1detBil+(-l)r+1fcrldetBrl + (-l)Cr+1)+1l>f+udetBr+1>1 i=i = 2 (-D'+1fln(^detA,,) - (-l)(r+,)+lar+1J det Ar+U1-(-l)r+1arl det A,, = ~^(~l)i+landetAa = -det A Chapter 4 Eigenvalues and Eigenvectors This proves the result for n X n matrices if adjacent rows are interchanged. To see that it holds for arbitrary row interchanges, we need only note that, for example, rows r and s, where r < s, can be swapped by performing 2(s — r) — 1 interchanges of adjacent rows (see Exercise 67). Since the number of interchanges is odd and each one changes the sign of the determinant, the net effect is a change of sign, as desired. The proof for column interchanges is analogous, except that we expand along row 1 instead of along column 1. -^91 We can now prove the Laplace Expansion Theorem. Proof 01 Theorem 4.1 Let B be the matrix obtained by moving row i of A to the top, using i — 1 interchanges of adjacent rows. By Lemma 4.14, detB = (-l)i_1detA. But bXj = an and Bu = Ay for; = 1,..., n. det B = ai+i,i Thus, - ' ain fly - aln ai-\,n ai+i,n anj ■ n det A = (-1)'' 1 det B = (~l)H1]£(~l)1+^y det By = (-D'-^i-D'-^detA^- = 2(-Dŕ%detAiŕ which gives the formula for cofactor expansion along row ('. The proof for column expansion is similar, invoking Lemma 4.13 so that we can use column expansion instead of row expansion (see Exercise 68). A self-taught child prodigy, Takakazu Seki Kowa (1642-1708) was descended from a family of samurai warriors. In addition to discovering determinants, he wrote about diophantine equations, magic squares, and Bernoulli numbers (before Bernoulli) and quite likely made discoveries in calculus. A Brief History of Determinants As noted at the beginning of this section, the history of determinants predates that of matrices. Indeed, determinants were first introduced, independently, by Seki in 1683 and Leibniz in 1693. In 1748, determinants appeared in Maclaurins Treatise on Algebra, which included a treatment of Cramer's Rule up to the 4X4 case. In 1750, Cramer himself proved the general case of his rule, applying it to curve fitting, and in 1772, Laplace gave a proof of his expansion theorem. The term determinant was not coined until 1801, when it was used by Gauss. Cauchy made the first use of determinants in the modern sense in 1812. Cauchy, in fact, was responsible for developing much of the early theory of determinants, including several important results that we have mentioned: the product rule for determinants, the characteristic polynomial, and the notion of a diagonalizable matrix. Determinants did not become widely known until 1841, when Jacobi popularized them, albeit in the context of functions of several variables, such as are encountered in a multivariable calculus course. (These types of determinants were called "Jacobians" by Sylvester around 1850, a term that is still used today.) Section 4.2 Determinants 281 Gottfried Wilhelm von Leibniz (1646-1716) was born in Leipzig and studied law, theology, philosophy, and mathematics. He is probably best known for developing (with Newton, independently) the main ideas of differential and integral calculus. However, his contributions to other branches of mathematics are also impressive. 
He developed the notion of a determinant, knew versions of Cramers Rule and the Laplace Expansion Theorem before others were given credit for them, and laid the foundation for matrix theory through work he did on quadratic forms. Leibniz also was the first to develop the binary system of arithmetic. He believed in the importance of good notation and, along with the familiar notation for derivatives and integrals, introduced a form of subscript notation for the coefficients of a linear system that is essentially the notation we use today. By the late 19th century, the theory of determinants had developed to the stage that entire books were devoted to it, including Dodgsons An Elementary Treatise on Determinants in 1867 and Thomas Muir's monumental five-volume work, which appeared in the early 20th century. While their history is fascinating, today determinants are of theoretical more than practical interest. Cramer's Rule is a hopelessly inefficient method for solving a system of linear equations, and numerical methods have replaced any use of determinants in the computation of eigenvalues. Determinants are used, however, to give students an initial understanding of the characteristic polynomial (as in Sections 4.1 and 4.3). Exercises 4.2 Compute the determinants in Exercises 1 -6 using cofactor expansion along the first row and along the first column. 1 0 3 0 1 1. 5 1 1 2. 2 3 0 1 2 1 3 1 -1 0 1 1 0 3. -1 0 1 4. 1 0 1 0 1 -1 0 1 1 1 2 3 1 2 3 5. 2 3 1 6. 4 5 6 3 1 2 7 8 9 Compute the determinants in Exercises 7-15 using cofactor expansion along any row or column that seems convenient. 5 2 2 -1 1 2 3 0 0 8. 15. 0 0 0 a 0 0 b c 0 d e f g h i j -4 1 3 cos 6 sin 0 tan© 9. 2 -2 4 10. 0 cos# sin# 1 -1 0 0 sinfl COS0 a b 0 0 a 0 11. 0 a b 12. b c d a 0 b 0 e 0 1 - 1 0 3 2 0 3 -1 2 5 2 6 1 0 2 2 13. 14. 0 1 0 0 0 1 1 4 1 4 2 1 2 0 1 -3 282 Chapter 4 Eigenvalues and Eigenvectors In Exercises 16-18, compute the indicated 3X3 determinants using the method of Example 4.9. 16. The determinant in Exercise 6 17. The determinant in Exercise 8 18. The determinant in Exercise 11 19. Verify that the method indicated in (2) agrees with Equation (1) for a 3 X 3 determinant. 20. Verify that definition (4) agrees with the definition of a 2X2 determinant when n — 2. 21. Prove Theorem 4.2. [Hint: A proof by induction would be appropriate here.] In Exercises 22-25, evaluate the given determinant using elementary row and/or column operations and Theorem 4.3 to reduce the matrix to row echelon form. 22. The determinant in Exercise 1 23. The determinant in Exercise 9 24. The determinant in Exercise 13 25. The determinant in Exercise 14 In Exercises 26-34, use properties of determinants to evaluate the given determinant by inspection. Explain your reasoning. Find the determinants in Exercises 35-40, assuming that 1 1 1 3 1 0 26. 3 o - 2 27. 0 -2 5 2 2 2 0 0 4 0 0 1 2 3 4 28. 0 5 2 29. 1 -3 2 3 -1 4 -1 5 2 1 2 3 4 1 3 30. 0 4 1 31. -2 0 - 2 1 6 4 5 4 1 1 0 0 0 0 2 0 0 0 0 1 0 -3 0 0 0 32. 33. 0 1 0 0 0 0 0 4 0 0 0 1 0 0 1 0 34. 0 1 0 1 0 1 1 0 0 0 0 11 a b d e s h 35. 37. 39. 2a 2b 2c d e f g h i d e f a b c g h i 2c b a 2f e d 2i h s 36. 38. 2a b/3 — c 2d e/3 -f 2g h/3 -i a — c d-f S ~ i a + 2g b + 2h c + 2i I. 3d + 2g 3e + 2h 3/ + 2i g h i 41. Prove Theorem 4.3(a). 42. Prove Theorem 4.3(f). 43. Prove Lemma 4.5. 44. Prove Theorem 4.7. In Exercises 45 and 46, use Theorem 4.6 to find all values of kfor which A is invertible. 45. A = 46. 
A = k -k 0 k + 1 k -8 k k 0 k2 2 k 0 k k 3 1 k ~ 1 In Exercises 47-52, assume that A and B are nX n matrices with det A — 3 and det B = — 2. Find the indicated determinants. 47. det(AB) 48. det(A2) 49. det(B"1A) 50. det(2A) 51. det(3£T) 52. det(AAT) In Exercises 53-56, A and B are n X n matrices. 53. Prove that det {AS) = det(BA). 54. If B is invertible, prove that det(B~lAB) = det(A). 55. If A is idempotent (that is, A2 ~ A), find all possible values of det (A). 56. A square matrix A is called nilpotent if A"' = O for some m > 1. (The word nilpotent comes from the Latin nil, meaning "nothing," and potere, meaning "to have power." A nilpotent matrix is thus one that Section 4.2 Determinants becomes "nothing"—that is, the zero matrix—when raised to some power.) Find all possible values of det (A) if A is nilpotent. In Exercises 57-60, use Cramer's Rule to solve the given linear system. 57. x + v = 1 x ~ y = 2 59. 2x + y + 3z = 1 y + z = 1 2 = 1 58. 2x - y = 5 x + 3y = —1 60. x + y — z = 1 x + y + z = 2 x — y =3 1» Exercises 61-64, use Theorem 4.12 to compute the inverse of the coefficient matrix for the given exercise. 61. Exercise 57 62. Exercise 58 63. Exercise 59 64. Exercise 60 65. If A is an invertible n X n matrix, show that adj A is also invertible and that 1 (adj A)" -A = adj (A"1) det A 66. If A is an n X n matrix, prove that det(adjA) = (det A)"""1 67. Verify that if r < s, then rows r and 5 of a matrix can be interchanged by performing 2(5 — r) — 1 interchanges of adjacent rows. 68. Prove that the Laplace Expansion Theorem holds for column expansion along the jth column. 69. Let A be a square matrix that can be partitioned as A = p Q o s_ where P and S are square matrices. Such a matrix is said to be in block (upper) triangular form. Prove that det A = (det P) (det S) [Hint: Try a proof by induction on the number of rows ofP.] 70. (a) Give an example to show that if A can be partitioned as A = where P, Q, R, and S are all square, then it is not necessarily true that det A = (det P)(det S) - (det Q)(det R) (b) Assume that A is partitioned as in part (a) and that P is invertible. Let ~p L.9.." R i'i. \o -RP~l II B = Compute det (BA) using Exercise 69 and use the result to show that det A = det P det(S - RP^Q) [The matrix S - RP~lQ is called the Schur complement of Pin A, after Issai Schur (1875-1941), who was born in Belarus but spent most of his life in Germany. He is known mainly for his fundamental work on the representation theory of groups, but he also worked in number theory, analysis, and other areas.] (c) Assume that A is partitioned as in part (a), that P is invertible, and that PR = RP. Prove that detA = det(PS - RQ) Writing Project Which Came First: The Matrix or the Determinant? The way in which matrices and determinants are taught today—matrices before determinants—bears little resemblance to the way these topics developed historically. There is a brief history of determinants at the end of Section 4.2. Write a report on the history of matrices and determinants. How did the notations used for each evolve over time? Who were some of the key mathematicians involved and what were their contributions? 1. Florian Cajori, A History of Mathematical Notations (New York: Dover, 1993). 2. Howard Eves, An Introduction to the History of Mathematics (Sixth Edition) (Philadelphia: Saunders College Publishing, 1990), 3. Victor J. Katz, A History of Mathematics: An Introduction (Third Edition) (Reading, MA: Addison Wesley Longman, 2008). 4. 
Eberhard Knobloch, Determinants, in Ivor Grattan-Guinness, ed., Companion Encyclopedia of the History and Philosophy of the Mathematical Sciences (London: Routledge, 2013). Vignette '■>y Charles Lutwidge Dodgson (1832-1898) is much better known by his pen name, Lewis Carroll, under which he wrote Alice's Adventures in Wonderland and Through the Looking Glass. He also wrote several mathematics books and collections of logic puzzles. This vignette is based on the article "Lewis Carroll's Condensation Method for Evaluating Determinants" by Adrian Rice and Eve Torrence in Math Horizons, November 2006, pp. 12-15. For further details of the condensation method, see David M. Bressoud, Proofs and Confirmations; The Story of the Alternating Sign Matrix Conjecture, MAA Spectrum Series (Cambridge University Press, 1999). 284 Lewis Carroll's Condensation Method In 1866, Charles Dodgson—better known by his pseudonym Lewis Carroll-published his only mathematical research paper. In it, he described a "new and brief method" for computing determinants, which he called "condensation." Although not well known today and rendered obsolete by numerical methods for evaluating determinants, the condensation method is very useful for hand calculation. When calculators or computer algebra systems are not available, many students find condensation to be their method of choice. It requires only the ability to compute 2X2 determinants. We require the following terminology. DCfinitlOil If A is an n X n matrix with n > 3, the interior of A, denoted int(A), is the (« — 2) X (« — 2) matrix obtained by deleting the first row, last row, first column, and last column of A. We will illustrate the condensation method for the 5 X 5 matrix 2 3 -1 2 0 1 2 3 1 -4 A = 2 -1 2 1 1 3 1 -1 2 -2 -4 1 0 1 2 Begin by setting A0 equal to the 6 X 6 matrix all of whose entries are 1. Then, we set A] = A. It is useful to imagine A0 as the base of a pyramid with Ax centered on top of A0. We are going to add successively smaller and smaller layers to the pyramid until we reach a 1 X 1 matrix at the top—this will contain det A. (Figure 4.9) Figure 4.9 Next, we "condense" Al into a 4 X 4 matrix A'2 whose entries are the determinants of all 2 X 2 submatrices of A{: a; -i 3 3 2 2 -1 -1 0 -1 2 3 1 3 2 2 -1 -1 0 1 11 ■5 7 5 -1 7 1 -7 -8 1 5 5 -4 -1 6 Now we divide each entry of A'2 by the corresponding entry of int(A0) to get matrix A2. Since A0 is all Is, this means A2 = A2. We repeat the procedure, constructing A 3 from the 2 X 2 submatrices of A2 and then dividing each entry of A 3 by the corresponding entry of int(A {), and so on. We obtain: a; 1 -5 -5 5 5 7 11 7 7 -1 _1 1 11 7 7 -1 -1 1 i5 5 -4 -4 6 62 60 -27 30 36 -29 12 -4 26 A3 = 62/2 60/3 -27/1 -30/-1 36/2 -29/1 12/1 -4/-1 26/2 31 20 -27 30 18 -29 12 4 13 ai = 31 20 20 -27 30 18 18 -29 30 18 18 -29 12 4 4 13 -42/7 -94/1 .-96/(- 1) 350/5. -42 -96 -94 350 -6 -94 96 70 A'5 A5 = [8604/18 -6 -94 96 70 = [8604], [478] As can be checked by other methods, det A = 478. In general, for an n X n matrix A, the condensation method will produce a 1 X 1 matrix An containing det A. Clearly, the method breaks down if the interior of any of the A J s contains a zero, since we would then be trying to divide by zero to construct A,+ 1. However, careful use of elementary row and column operations can be used to eliminate the zeros so that we can proceed. 285 Exploration Geometric Applications of Determinants This exploration will reveal some of the amazing applications of determinants to geometry. 
In particular, we will see that determinants are closely related to area and volume formulas and can be used to produce the equations of lines, planes, and certain other curves. Most of these ideas arose when the theory of determinants was being developed as a subject in its own right. The Cross Product Recall from Exploration: The Cross Product in Chapter 1 that the cross product is the vector u X v defined by of u = and v = v2 -«3- -V3_ u X v u2v3 - U3V2 M3V1 ~ U]V3 uxv2 — u2vx If we write this cross product as (w2v3 ~ u^v2)e1 — (tqv3 — u3v1)e2 + (uxv2 — w2v1)e3, where e2, and e3 are the standard basis vectors, then we see that the form of this formula is u X v = det «2 V2 if we expand along the first column. (This is not a proper determinant, of course, since e1; e2, and e3 are vectors, not scalars; however, it gives a useful way of remembering the somewhat awkward cross product formula. It also lets us use properties of determinants to verify some of the properties of the cross product.) Now let's revisit some of the exercises from Chapter 1. 286 1. Use the determinant version of the cross product to compute u X v. "o" 3~ 3~ o~ (a) u = 1 , v = -1 (b)u = -1 , v = 1 1 2_ 2. 1 (c) -1 2 3 2~ 1 1 »v = -4 (d)u = 1 , v = 2 ,-6_ ,1. .3 «! 2, If u = «2 . V = , and w = w2 -«3_ -V3- , show that u • (v X w) = det u, L«3 v2 v3 ^2 3. Use properties of determinants (and Problem 2 above, if necessary) to prove the given property of the cross product. (a) v X u = (u X v) (b) u X 0 = 0 (c) u X u = 0 (d) u X kv = k(u X v) (e) u X (v + w) = u X v + u X w (f) u • (u X v) = 0 and v (u X v) = 0 (g) u • (v X w) = (u X v) • w (the triple scalar product identity) Area and Volume We can now give a geometric interpretation of the determinants of 2 X 2 and 3X3 matrices. Recall that if u and v are vectors in R3, then the area A of the parallelogram determined by these vectors is given by A = || u X v ||. (See Exploration: The Cross Product in Chapter 1.) Let u = and v = determined by u and v is given by . Show that the area A of the parallelogram det [Hint: Write u and v as «2 and Vi 0 0 5. Derive the area formula in Problem 4 geometrically, using Figure 4.10 as a guide. [Hint: Subtract areas from the large rectangle until the parallelogram remains.] Where does the absolute value sign come from in this case? 287 vxw A V Figure 4.11 6. Find the area cf the parallelogram determined by u and v. *2" r-l" (b)u = "3" "5" > v = , v = .3. 4_ A 5. Generalizing from Problems 4-6, consider ^parallelepiped, a three-dimensional solid resembling a "slanted" brick, whose six faces are all parallelograms with opposite faces parallel and congruent (Figure 4.11). Its volume is given by the area of its base times its height. 7. Prove that the volume V of the parallelepiped determined by u, v, and w is given by the absolute value of the determinant of the 3 X 3 matrix [u v w] with u, v, and w as its columns. [Hint: From Figure 4.11 you can see that the height h can be expressed as h = jju[|cos 8, where 6 is the angle between u and v X w. Use this fact to show that V = |u • (v X w) | and apply the result of Problem 2.] 8. Show that the volume V of the tetrahedron determined by u, v, and w (Figure 4.12) is given by V = ||u-(v x w)| [Hint: From geometry, we know that the volume of such a solid is V = 5 (area of the base) (height).] Now let's view these geometric interpretations from a transformational point of view. Let A be a 2 X 2 matrix and let P be the parallelogram determined by the vectors u and v. 
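Before doing so, it may help to see the formulas above numerically. The following Python/NumPy sketch (our own illustration; the vectors u, v, w are arbitrary choices) checks that u . (v x w) equals det[u v w], as in Problem 2, and computes the volume |u . (v x w)| of the parallelepiped they determine, as in Problem 7.

    import numpy as np

    # u x v, written out from the cofactor pattern of the symbolic determinant above
    def cross(u, v):
        return np.array([u[1]*v[2] - u[2]*v[1],
                         u[2]*v[0] - u[0]*v[2],
                         u[0]*v[1] - u[1]*v[0]])

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([0.0, 1.0, -1.0])
    w = np.array([2.0, 0.0, 1.0])

    # Problem 2: u . (v x w) equals det[u v w]; both give -9 here
    print(np.dot(u, cross(v, w)))
    print(np.linalg.det(np.column_stack([u, v, w])))

    # Problem 7: volume of the parallelepiped determined by u, v, w
    print(abs(np.linalg.det(np.column_stack([u, v, w]))))   # 9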
We will consider the effect of the matrix transformation TA on the area of P. Let TA(P) denote the parallelogram determined by TA(u) = Au and TA(v) = Av. 9. Prove that the area of TA(P) is given by |det A| (area of P). 10. Let A be a 3 X 3 matrix and let P be the parallelepiped determined by the vectors u, v, and w. Let TA(P) denote the parallelepiped determined by TA(u) = Au, TA(y) — Av, and TA(w) = Aw. Prove that the volume of TA(P) is given by |det A j (volume of P). The preceding problems illustrate that the determinant of a matrix captures what the corresponding matrix transformation does to the area or volume of figures upon which the transformation acts. (Although we have considered only certain types of figures, the result is perfectly general and can be made rigorous. We will not do so here.) Lines and Planes Suppose we are given two distinct points (xx, yx) and (x2, y2) in the plane. There is a unique line passing through these points, and its equation is of the form ax + by + c = 0 Since the two given points are on this line, their coordinates satisfy this equation. Thus, axx + byx + c = 0 ax2 + by2 + c = 0 The three equations together can be viewed as a system of linear equations in the variables a, b, and c. Since there is a nontrivial solution (i.e., the line exists), the coefficient matrix x y x2 y2 cannot be invertible, by the Fundamental Theorem of Invertible Matrices. Consequently, its determinant must be zero, by Theorem 4.6. Expanding this determinant gives the equation of the line. The equation of the line through the points (xx, yx) and (x2, y2) is given by x y *2 72 = 0 11. Use the method described above to find the equation of the line through the given points. (a) (2,3) and (-1,0) (b) (l, 2) and (4, 3) 12. Prove that the three points (xt, yx), (x2, y2), and (x3, y3) are collinear (lie on the same line) if and only if xi yi l x2 y2 1 x3 ys 1 = o 13. Show that the equation of the plane through the three noncollinear points (xx, yx, zt), (x2, y2, z2), and (x3, y3, %) is given by y z x2 y2 z2 *a yi z3 What happens if the three points are collinear? [Hint: Explain what happens when row reduction is used to evaluate the determinant.] 288 14. Prove that the four points (x^y,, zx), (x2,y2, z2), (x3,y}, z3), and (x4,y4,z4) are coplanar (lie in the same plane) if and only if x1 yx Zj 1 x2 y2 z2 1 x3 y3 z3 1 x4 7i z4 1 = 0 Figure 4.13 Curve fitting When data arising from experimentation take the form of points (x, y) that can be plotted in the plane, it is often of interest to find a relationship between the variables x and y. Ideally, we would like to find a function whose graph passes through all of the points. Sometimes all we want is an approximation (see Section 7.3), but exact results are also possible in certain situations. 15. From Figure 4.13 it appears as though we may be able to find a parabola passing through the points A(-l, 10), B(0, 5), and C(3, 2). The equation of such a parabola is of the form y = a + bx + cx2. By substituting the given points into this equation, set up a system of three linear equations in the variables a, b, and c. Without solving the system, use Theorem 4.6 to argue that it must have a unique solution. Then solve the system to find the equation of the parabola in Figure 4.13. 16. Use the method of Problem 15 to find the polynomials of degree at most 2 that pass through the following sets of points. (a) A(l, -1), B(2,4), C(3, 3) (b) A(-l, -3), 5(1, -1), C(3,1) 17. 
Generalizing from Problems 15 and 16, suppose ax, a2, and a3 are distinct real numbers. For any real numbers by, b2> and b3, we want to show that there is a unique quadratic with equation of the form y = a + bx + cx2 passing through the points (flu bx), (a2, b2), and (a3, b3). Do this by demonstrating that the coefficient matrix of the associated linear system has the determinant 1 1 a 1 a = («2 - fl,)(a3 - a,)(fl3 - a2) which is necessarily nonzero. (Why?) 18. Let av, a2, a3, and a4 be distinct real numbers. Show that «3 2 (a2 — a[)(a3 — a,)(a4 — a,)(a3 — a2)(a4 — a2)(a4 — a3) # 0 For any real numbers bl,b2,bi, and bA, use this result to prove that there is a unique cubic with equation y = a + bx + cx2 + dx1 passing through the four points {ar, bx), (a2, b2), (a3, b3), and (a4, bA). (Do not actually solve for a, b, c, and d.) 290 19. Let be n real numbers. Prove that 1 ax a\ • • • a"'1 i 2 n-1 1 a2 a2 ■ • ■ a2 1 a3 aj • • • a""1 1 a„ a?, • • • a;;"1 where rii<, 0 1 -1 0 2 -5 3 0 _0 0 0 0 (We knew in advance that we must get at least one zero row. Why?) Thus, x = x2 .*3. in the eigenspace £, if and only if xx - x3 = 0 and x2 - .v3 = 0. Setting the free variable x3 = t, we see that x-, = I and x2 — t, from which it follows that = it span To find the eigenvectors corresponding to A3 = 2, we find the null space of A - 21 by row reduction: So x = '-2 1 0 o" "l 0 i 4 0" [A - 21 \0] = 0 -2 1 0 0 ] _1 0 2 -5 2 0_ _0 0 0 0- is in the eigenspace £2 if and only if xx 4*3 and = 2x Setting the span ij7 span /ri 2 \L4 hapter 4 Eigenvalues and Eigenvectors where we have cleared denominators in the basis by multiplying through by the least Wt common denominator 4. (Why is this permissible?) Irsssala I.IS Remark Notice that in Example 4.18, A is a 3 X 3 matrix but has only two distinct eigenvalues. However, if we count multiplicities, A has exactly three eigenvalues (A = 1 twice and A = 2 once). This is what the Fundamental Theorem of Algebra guarantees. Let us define the algebraic multiplicity of an eigenvalue to be its multiplicity as a root of the characteristic equation. Thus, A = 1 has algebraic multiplicity 2 and A = 2 has algebraic multiplicity 1. Next notice that each eigenspace has a basis consisting of just one vector. In other words, dim £, = dim E2 = 1. Let us define the geometric multiplicity of an eigenvalue A to be dim £A, the dimension of its corresponding eigenspace. As you will see in Section 4.4, a comparison of these two notions of multiplicity is important. Find the eigenvalues and the corresponding eigenspaces of -l o r A ! 0 -3 0 -1 Solution The characteristic equation is -1 - A 0 0 = det(A - AI) = -A 0 = -A(A2 + 2 A) ~A2(A + 2) 1 -3 -1 - A = -A T - A 1 1 1 - A Hence, the eigenvalues are Xl = A2 = 0 and A3 = —2. Thus, the eigenvalue 0 has algebraic multiplicity 2 and the eigenvalue -2 has algebraic multiplicity 1. For Aj = A2 = 0, we compute 0 1 o" 0 -1 o" [A - 0/ |0] = = [A|0] = 3 0 -3 0 > 0 0 0 0 1 0 -1 0, 0 0 0 0 from which it follows that an eigenvector x in E0 satisfies xx = x3. Therefore, both x2 and x, are free. Setting x2 = s and x3 = t, we have En = + t -— span For k3 "l 0 1 0~ \l 0 1 o" A - (-2)/|Q] = [A + 2/jd] = 3 2 -3 0 ->■ 0 1 -3 0 „1 0 1 0 .0 0 0 0 Section 4.3 Eigenvalues and Eigenvectors of n X n Matrices so x3 — t is free and xx = —x3 = —t and x2 = 3x3 — 3f. Consequently, -t 3t t = t span -1 3 1 It follows that Aj = A2 = 0 has geometric multiplicity 2 and A3 = —2 has geometric multiplicity 1. 
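These multiplicities can be checked with a CAS. The following Python/NumPy sketch (our own illustration, not part of the example) computes the eigenvalues of the matrix just analyzed and recovers each geometric multiplicity as 3 - rank(A - lambda*I), the dimension of the corresponding eigenspace.

    import numpy as np

    A = np.array([[-1.0, 0.0,  1.0],
                  [ 3.0, 0.0, -3.0],
                  [ 1.0, 0.0, -1.0]])

    print(np.round(np.linalg.eigvals(A), 6))        # 0, 0, -2 in some order

    # geometric multiplicity of lambda = dim(null(A - lambda*I)) = 3 - rank(A - lambda*I)
    for lam in [0.0, -2.0]:
        rank = np.linalg.matrix_rank(A - lam * np.eye(3))
        print(lam, 3 - rank)                        # 0 -> 2, -2 -> 1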
(Note that the algebraic multiplicity equals the geometric multiplicity for each eigenvalue.) In some situations, the eigenvalues of a matrix are very easy to find. If A is a triangular matrix, then so is A — XI, and Theorem 4.2 says that det (A ~ XI) is just the product of the diagonal entries. This implies that the characteristic equation of a triangular matrix is X)(a22 — A)•• • (fi A) = 0 from which it follows immediately that the eigenvalues are Ax = su, A2 = a22, . . A„ = ann. We summarize this result as a theorem and illustrate it with an example. Theorem 4.15 The eigenvalues of a triangular matrix are the entries on its main diagonal. lusmple 4 20 The eigenvalues of A = 2 0 0 0" 1 1 0 0 3 0 3 0 5 7 4 -2 are A: = 2, A2 = 1, X3 = 3, and A4 = —2, by Theorem 4.15. Indeed, the characteristic polynomial is just (2 — A)(l — A)(3 — A)(—2 — A). Note that diagonal matrices are a special case of Theorem 4.15. In fact, a diagonal matrix is both upper and lower triangular. Eigenvalues capture much important information about the behavior of a matrix. Once we know the eigenvalues of a matrix, we can deduce a great many things without doing any more work. The next theorem is one of the most important in this regard. Theorem 4.16 A square matrix A is invertible if and only if 0 is not an eigenvalue of A. PfOOf Let A be a square matrix. By Theorem 4.6, A is invertible if and only if det A # 0. But det A ¥= 0 is equivalent to det (A — 01) * 0, which says that 0 is not a root of the characteristic equation of A (i.e., 0 is not an eigenvalue of A).-- We can now extend the Fundamental Theorem of Invertible Matrices to include results we have proved in this chapter. Chapter 4 Eigenvalues and Eigenvectors Theorem 4.17 The Fundamental Theorem of Invertible Matrices: Version 3 Let A be an n X n matrix. The following statements are equivalent: a. A is invertible. b. Ax = b has a unique solution for every b in U". c. Ax = 0 has only the trivial solution. d. The reduced row echelon form of A is e. A is a product of elementary matrices. f. rank(A) = n g. nullity(A) = 0 h. The column vectors of A are linearly independent. i. The column vectors of A span IR". j. The column vectors of A form a basis for IR". k. The row vectors of A are linearly independent. 1. The row vectors of A span IR". m. The row vectors of A form a basis for W. n. det A # 0 o. 0 is not an eigenvalue of A. Proot The equivalence (a) (n) is Theorem 4.6, and we just proved (a) (o) in Theorem 4.16. There are nice formulas for the eigenvalues of the powers and inverses of a matrix. Theorem 4.18 Let A be a square matrix with eigenvalue A and corresponding eigenvector x. a. For any positive integer n, A" is an eigenvalue of A" with corresponding eigenvector x. b. If A is invertible, then 1/A is an eigenvalue of A"1 with corresponding eigenvector x. c. If A is invertible, then for any integer n, A" is an eigenvalue of A" with corresponding eigenvector x. Proof We are given that Ax = Ax. (a) We proceed by induction on n. For n = 1, the result is just what has been given. Assume the result is true for n — k. That is, assume that, for some positive integer k, Akx = Afcx. We must now prove the result for « = k + 1. But Ak+1x = ACA^x) = A(A!cx) by the induction hypothesis. Using property (d) of Theorem 3.3, we have A(A*x) = \k(Ax) = A*(Ax) = At+1x Thus, Ak+1x - Xk+lx, as required. By induction, the result is true for all integers n > 1. (b) You are asked to prove this property in Exercise 13. (c) You are asked to prove this property in Exercise 14. 
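Theorem 4.18 is easy to check numerically on a particular matrix. The following Python/NumPy sketch (our own illustration, using a 2 x 2 matrix whose eigenvalues are -1 and 2) confirms that A^5 has eigenvalues (-1)^5 and 2^5, that A^(-1) has eigenvalues -1 and 1/2, and that the eigenvectors are shared.

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [2.0, 1.0]])          # eigenvalues -1 and 2
    lam, V = np.linalg.eig(A)           # columns of V are eigenvectors

    print(np.linalg.eigvals(np.linalg.matrix_power(A, 5)))   # -1 and 32
    print(np.linalg.eigvals(np.linalg.inv(A)))               # -1 and 0.5

    # the eigenvectors carry over: A^5 v = lambda^5 v for each eigenvector v
    for i in range(2):
        v = V[:, i]
        print(np.allclose(np.linalg.matrix_power(A, 5) @ v, lam[i]**5 * v))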
__JIHBB The next example shows one application of this theorem. Section 4.3 Eigenvalues and Eigenvectors ofnXn Matrices 297 Example 4,21 Compute "0 I 10 r5" 2 1. .1. Solution Let A eigenvalues of A are A, r 0 1 2 1 and x = ; then what we want to find is A10x. The and Vt Thati ■ 1 and A2 ~ 2, with corresponding eigenvectors v, = -V, and Av2 = 2v, (Check this.) Since {v„ v2} forms a basis for U2 (why?), we can write x as a linear combination of vx and v2. Indeed, as is easily checked, x = 3vj + 2v2. Therefore, using Theorem 4.18(a), we have Al0x = A10(3Vl + 2v2) = 3(A10Vl) + 2(A,0v2) = 3(Al°)v1 + 2(A;°)v2 3(-l)1 This is certainly a lot easier than computing A10 first; in fact, there are no matrix l" + 2(210) "f 3 + 2ir '20511 = _-3 4- 212. -1. .2. .4093J multiplications at all! When it can be used, the method of Example 4.21 is quite general. We summarize it as the following theorem, which you are asked to prove in Exercise 42. Theorem 4.19 Suppose the n X n matrix A has eigenvectors vu v2, .. ., vm with corresponding eigenvalues A1( A2,..., Am. If x is a vector in U" that can be expressed as a linear combination of these eigenvectors—say, x = c;v, 4- c,v2 + ■■■ + c,„v,„ then, for any integer k, Ak% = CjAfv] + c2Afv2 4- • • • + c„Xym WarRiny The catch here is the "if" in the second sentence. There is absolutely no guarantee that such a linear combination is possible. The best possible situation would be if there were a basis of W consisting of eigenvectors of A; we will explore this possibility further in the next section. As a step in that direction, however, we have the following theorem, which states that eigenvectors corresponding to distinct eigenvalues are linearly independent. Theorem 4.20 LetAbeann X n matrix and let A1; A2,Am be distinct eigenvalues of A with corresponding eigenvectors vlt v2,..., vm. Then v1; v2,..., vm are linearly independent. Chapter 4 Eigenvalues and Eigenvectors ft Oil The proof is indirect We will assume that v,, v2,..., vm are linearly dependent and show that this assumption leads to a contradiction. If Vj, v2,..., v„ are linearly dependent, then one of these vectors must be expressible as a linear combination of the previous ones. Let vk+ j be the first of the vectors v, that can be so expressed. In other words, Vj, v2,. . ., Vj. are linearly independent, but there are scalars q, c2,. ■ ■, ck such that + ckvk (1) Multiplying both sides of Equation (1) by A from the left and using the fact that Av; = A;V; for each i, we have Xk+lvk+l = Avk+i = A(c,Vi + c2v2 + • • • + ckvk) = c.AVi + c2Av2 + • • • + ckAvk (2) = CiAjVj + c2A2v2 + ■ • ■ + ckXkvk Now we multiply both sides of Equation (1) by Afc+1 to get h+iVk+i = CiAfc+1v, + e2Ai+1v2 + • • • + ckXk+,vk (3) When we subtract Equation (3) from Equation (2), we obtain 0 = Cj(Ai - AA+1)v, + c2(A2 - Ai+i)v2 +----f ck(Xk - kk+l)vk The linear independence of v1; v2,..., vt implies that Cj(Aj - Ajt+i) = c2(A, - Ajt+J) = • • ■ = ck(kk - A,.+i) = 0 Since the eigenvalues A; are all distinct, the terms in parentheses (A, — Xk+l), i = I,____, k, are all nonzero. Hence, c, = c2 = • • • = ck = 0. This implies that vit+1 = civi + C2V2 + ■ • • + ckvk = OVj + 0v2 + • • • + 0vk = 0 which is impossible, since the eigenvector vk+l cannot be zero. Thus, we have a contradiction, which means that our assumption that v., v2, . . . , v„ are linearly dependent is false. It follows that v,, v2,..., vm must be linearly independent. 
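The computation in Example 4.21 can also be carried out with a CAS. The following Python/NumPy sketch (our own illustration) finds the coefficients c1 and c2 by solving a linear system, applies Theorem 4.19 with k = 10, and reproduces the vector (2051, 4093); the last line checks the answer against direct computation of A^10 x.

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [2.0, 1.0]])
    x = np.array([5.0, 1.0])

    v1 = np.array([1.0, -1.0])   # eigenvector for lambda1 = -1
    v2 = np.array([1.0,  2.0])   # eigenvector for lambda2 =  2

    # solve c1*v1 + c2*v2 = x for the coordinates of x in the eigenvector basis
    c = np.linalg.solve(np.column_stack([v1, v2]), x)   # c = [3, 2]

    k = 10
    result = c[0] * (-1.0)**k * v1 + c[1] * 2.0**k * v2
    print(result)                                       # [2051, 4093]
    print(np.linalg.matrix_power(A, k) @ x)             # the same, the long way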
Exercises 4.3 In Exercises 1-12, compute (a) the characteristic polynomial of A, (b) the eigenvalues of A, (c) a basis for each eigenspace of A, and (d) the algebraic and geometric multiplicity of each eigenvalue. 1. A = 3. A = 5. A 1 -2 1 0 • 0 1 -1 0 3 6_ 1 -2 0 2 -1 1 2. A = 4. A = 6. A 0 2 -1 3 0 1 7. A 9. A 11. A = 0 0' 0 0 0 0 1 0 0 1 8. A = 10. A = 0 0 1 0 -I 2 0 1 0 0 -1 2 -1 1 1 4 0 0 0 -1 0 1 0' 5 1 2 12. A = 4 0 0 4 0 0 1 0 0 3 1 0' 1 1 2 0 Section 4,3 Eigenvalues and Eigenvectors of n X n Matrices 299 13. Prove Theorem 4.18(b). 14. Prove Theorem 4.18(c). [Hint: Combine the proofs of parts (a) and (b) and see the fourth Remark following Theorem 3.9 (page 169).] In Exercises 15 and 16, A is a 2 X 2 matrix with eigenvec- tors vt and v, = corresponding to eigenvalues Aj = f and A2 = 2, respectively, andx = 15. FindA10x. 16. Find Akx. What happens as k becomes large (i.e., k - ► oo)? In Exercises 17 and 18, A is a 3 X 3 matrix with eigenvectors corresponding to eigenvalues Aj = — f, A2 = j, and A3 = 1, respectively, and "2" V ~r "l* 0 ,v2 = i , and v3 = 1 .0. _o_ .1. X = 17. Find A20x. 18. Find Akx. What happens as k becomes large (i.e., k —*■ oo)? 19. (a) Show that, for any square matrix A, A and A have the same characteristic polynomial and hence the same eigenvalues, (b) Give an example of a 2 X 2 matrix A for which A and A have different eigenspaces. 20. Let A be a nilpotent matrix (that is, A'" = O for some m > 1). Show that A = 0 is the only eigenvalue of A. 21. Let A be an idempotent matrix (that is, A2 = A). Show that A = 0 and A = 1 are the only possible eigenvalues of A. 22. If v is an eigenvector of A with corresponding eigenvalue A and c is a scalar, show that v is an eigenvector of A — cl with corresponding eigenvalue A — c. 23. (a) Find the eigenvalues and eigenspaces of 3 2 A 5 0 (b) Using Theorem 4.18 and Exercise 22, find the eigenvalues and eigenspaces of A~!, A — 21, and A + 21. 24. Let A and B be n X n matrices with eigenvalues A and jjl, respectively. (a) Give an example to show that A + p need not be an eigenvalue of A + B. (b) Give an example to show that A/n need not be an eigenvalue of AB. (c) Suppose A and p correspond to the same eigenvector x. Show that, in this case, A + fi is an eigenvalue of A + B and Xp is an eigenvalue of AB. 25. If A and B are two row equivalent matrices, do they necessarily have the same eigenvalues? Either prove that they do or give a counterexample. Letp(x) be the polynomial p(x) = xn + an. '1 + • ■ ■ + axx + a0 The companion matrix ofp{x) is the n X n matrix dp) an-2 • —ax -a0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 (4) 26. Find the companion matrix ofp(x) = x2 — 7x + 12 and then find the characteristic polynomial of C(p). 27. Find the companion matrix ofp(x) = x3 + 3x" — 4x + 12 and then find the characteristic polynomial ofC(p). 28. (a) Show that the companion matrix C(p) ofp(x) = x2 + ax + b has characteristic polynomial A2 + ak + b. (b) Show that if A is an eigenvalue of the companion matrix C(p) in part (a), then of C( p) corresponding to A. 29. (a) Show that the companion matrix C(p) of p (x) = x3 + ax2 + bx + c has characteristic polynomial -(A3 + aA2 + bX + c). (b) Show that if A is an eigenvalue of the companion "A2" matrix C{p) in part (a), then A is an eigenvector of C( p) corresponding to A. \ 30. Construct a nontriangular 2X2 matrix with eigenvalues 2 and 5. [Hint: Use Exercise 28.] 31. Construct a nontriangular 3X3 matrix with eigenvalues — 2, 1, and 3. [Hint: Use Exercise 29.] 32. 
(a) Use mathematical induction to prove that, for n > 2, the companion matrix C(p) ofp(x) = xn + a,,-!*"-1 + • • • + axx + a0 has characteristic polynomial ( — l)"p (A). [Hint: Expand by cofactors along the last column. You may find it helpful to introduce the polynomial q(x) = (p(x) — a0)/x.] Chapter 4 Eigenvalues and Eigenvectors (b) Show that if A is an eigenvalue of the companion matrix C(p) in Equation (4), then an eigenvector corresponding to A is given by A 1 Ifp(x) = x" + an^xx"~l 4- • • • + axx + a0 and A is a square matrix, we can define a square matrix p(A) by p(A) = A" + an..xA""1 + ■■■ + axA + a0I An important theorem in advanced linear algebra says that ifcA(X) is the characteristic polynomial of the matrix A, then cA(A) = O (in words, every matrix satisfies its characteristic equation). This is the celebrated Cayley-Hamilton Theorem, named after Arthur Cayley (1821-1895), pictured below, and Sir William Rowan Hamilton (see page 2). Cayley proved this theorem in 1858. Hamilton discovered it, independently, in his work on quaternions, a generalization of the complex numbers. and A3 = AA2 = A(—aA bl) 33. Verifythe Cayley-Hamilton Theorem for A = That is, find the characteristic polynomial cA(X) of A and show that cA(A) = O. 34. Verify the Cayley-Hamilton Theorem for 1 1 0~ A = The Cayley-Hamilton Theorem can be used to calculate powers and inverses of matrices. For example, if A is a 2 X 2 matrix with characteristic polynomial cA(A) = A2 + aX + b, then A2 + aA + bl = O, so A2 = -aA - bl = -aA2 - bA = -a(-aA - bl) - bA = (a2 - b)A + abl It is easy to see that by continuing in this fashion we can express any positive power of A as a linear combination of I and A. From A2 + aA + bl = O, we also obtain A(A + al) = -bl, so A"1 = --A b provided b 0. 35. For the matrix A in Exercise 33, use the Cayley-Hamilton Theorem to compute A2, A3, and A4 by expressing each as a linear combination of J and A. 36. For the matrix A in Exercise 34, use the Cayley-Hamilton Theorem to compute A3 and A4 by expressing each as a linear combination of I, A, and A". 37. For the matrix A in Exercise 33, use the Cayley-Hamilton Theorem to compute A"1 and A"2 by expressing each as a linear combination of I and A. 38. For the matrix A in Exercise 34, use the Cayley-Hamilton Theorem to compute A"1 and A'2 by expressing each as a linear combination of I, A, and A2, 39. Show that if the square matrix A can be partitioned as P Q] A = O ]S where P and S are square matrices, then the characteristic polynomial of A is cA(X) = cP(X)cs(X). [Hint: Use Exercise 69 in Section 4.2.] 40. Let Aj, A2,..., A„ be a complete set of eigenvalues (repetitions included) of the n X n matrix A. Prove that det(A) = A[A2 - • -A„ and tr(A) = A! + A2 + • • • + A„ [Hint: The characteristic polynomial of A factors as det(A - AJ) = (-1)"(A - AJ(A - A2) • • - (A - Xn) Find the constant term and the coefficient of A""1 on the left and right sides of this equation.] 41. Let A and B be n X n matrices. Prove that the sum of all the eigenvalues of A + B is the sum of all the eigenvalues of A and B individually. Prove that the product of all the eigenvalues of AB is the product of all the eigenvalues of A and B individually. (Compare this exercise with Exercise 24.) 42. Prove Theorem 4.19. Section 4.4 Similarity and Diagonalization 301 -.-,--.-.—_-- Writing Project The History of Eigenvalues Like much of linear algebra, the way the topic of eigenvalues is taught today does not correspond to its historical development. 
Eigenvalues arose out of problems in systems of differential equations before the concept of a matrix was even formulated. Write a report on the historical development of eigenvalues. Describe the types of mathematical problems in which they originally arose. Who were some of the key mathematicians involved with these problems? How did the terminology for eigenvalues change over time? 1. Thomas Hawkins, Cauchy and the Spectral Theory of Matrices, História Math-ematica 2 (1975), pp. 1-29. 2. Victor J. Katz, A History of Mathematics: An Introduction (Third Edition) (Reading, MA: Addison Wesley Longman, 2008). 3. Morris Kline, Mathematical Thought from Ancient to Modern Times (Oxford: Oxford University Press, 1972). Similarity and Diagonalization As you saw in the last section, triangular and diagonal matrices are nice in the sense that their eigenvalues are transparently displayed. It would be pleasant if we could relate a given square matrix to a triangular or diagonal one in such a way that they had exactly the same eigenvalues. Of course, we already know one procedure for converting a square matrix into triangular form—namely, Gaussian elimination. Unfortunately, this process does not preserve the eigenvalues of the matrix. In this section, we consider a different sort of transformation of a matrix that does behave well with respect to eigenvalues. Similar Matrices Definition Let A and B be n X n matrices. We say that A is similar to B if there is an invertible n X n matrix P such that P~~lAP = B. If A is similar to B, we write A ~ B. ■Mirks • If A ~ B, we can write, equivalently, that A = PBP~1 or AP = PB. • Similarity is a relation on square matrices in the same sense that "less than or equal to" is a relation on the integers. Note that there is a direction (or order) implicit in the definition. Just asa^b does not necessarily imply b £ a, we should not assume that A ~ B implies B — A. (In fact, this is true, as we will prove in the next theorem, but it does not follow immediately from the definition.) • The matrix P depends on A and B. It is not unique for a given pair of similar matrices A and B. To see this, simply take A = B = 1, in which case / ~ I, since P~lIP = I for any invertible matrix P. 302 Chapter 4 Eigenvalues and Eigenvectors Example 4.22 Let A = ] 2 1 0" andB = . Then A - ' B, since .0 -1. -2 -1. "l 2 1 - f 3 I "1 -1" 1 0 .0 - -1 1 1 .-1 -1. .1 1. -2 -1 Thus, AP = PB with P = [1 -1 LI 1 See the first Remark before Example 4.22. . (Note that it is not necessary to compute P~ . Theorem 4.21 LetA.B, and C ben X « matrices. a. A ~ A b. If A ~ B, then B ~ A. c. If A ~ B and B ~ C, then A ~ C. PfOOf (a) This property follows from the fact that I~ lAI = A. (b) If A ~ B, then P~lAP = B for some invertible matrix P. As noted in the first Remark on the previous page, this is equivalent to PBP~~l = A. Setting Q = P~\ we have CT'BQ = (P^j^BP"1 = PBP"1 = A. Therefore, by definition, B ~ A. (c) You are asked to prove property (c) in Exercise 30. _HSM Remark Any relation satisfying the three properties of Theorem 4.21 is called an equivalence relation. Equivalence relations arise frequently in mathematics, and objects that are related via an equivalence relation usually share important properties. We are about to see that this is true of similar matrices. Theorem 4.22 Let A and B be n X n matrices with A ~ B. Then a. det A = det B b. A is invertible if and only if B is invertible. c. A and B have the same rank. d. A and B have the same characteristic polynomial. e. 
A and B have the same eigenvalues. f. A'" ~ Bm for all integers m > 0. g. If A is invertible, then Am ~ Bra for all integers m. Proof We prove (a) and (d) and leave the remaining properties as exercises. If A ~ B, thenP~1AP = Bfor some invertible matrix P. (a) Taking determinants of both sides, we have detB = det(P^AP) = (det P_1)(det A)(det P) / 1 ,(det A)(det P) = det A V det P} (d) The characteristic polynomial of B is det(B - XI) = det(P" AP - A J) = det(P~lAP - AP-1/?) Section 4.4 Similarity and Diagonalization 303 = det (P~JAP - P '(\1)P) = det(p-\A - XI)P) = det(A - XI) with the last step following as in (a). Thus, det(B - AJ) = det (A - XI); that is, the characteristic polynomials of B and A are the same. Remark Two matrices may have properties (a) through (e) (and more) in common and yet still not be similar. For example, A = | and B = both have de- terminant 1 and rank 2, are invertible, and have characteristic polynomial (1 - A)2 and eigenvalues Aj = A2 = 1. But A is not similar to B, since P_1AP = P~lIP = I # B for any invertible matrix P. Theorem 4.22 is most useful in showing that two matrices are not similar, since A and B cannot be similar if any of properties (a) through (e) fails. Example 4,23 (a) A = (b) A = 1 2 2 1. "l 3 2 2 and B = and B are not similar, since det A = —3 but det 5=3. are not similar, since the characteristic polyno- mial of A is A2 - 3A - 4 while that of B is A2 - 4. (Check this.) Note that A and B do have the same determinant and rank, however. k Diagonalization The best possible situation is when a square matrix is similar to a diagonal matrix. As you are about to see, whether a matrix is diagonalizable is closely related to the eigenvalues and eigenvectors of the matrix. Definition An « X n matrix A is diagonalizable if there is a diagonal matrix D such that A is similar to D— that is, if there is an invertible n X n matrix P such that P ~'AP = D. Example 4,24 A-- 1 3 2 2 "1 3' '4 0" since, if P = and D = 1 ~2 ^0 -1 then P~~LAP — D, as can be easily checked. (Actually, it is faster to check the equivalent statement AP = PD, since it does not require finding P~~l.) Example 4.24 begs the question of where matrices P and D came from. Observe that the diagonal entries 4 and -1 of D are the eigenvalues of A, since they are the roots of its characteristic polynomial, which we found in Example 4.23(b). The origin of matrix P is less obvious, but, as we are about to demonstrate, its entries are obtained from the eigenvectors of A. Theorem 4.23 makes this connection precise. 304 Chapter 4 Eigenvalues and Eigenvectors ThCOrClll 4.23 Let A be an n X n matrix. Then A is diagonalizable if and only if A has n linearly independent eigenvectors. More precisely, there exist an invertible matrix P and a diagonal matrix D such that P ~1 AP = D if and only if the columns of P are n linearly independent eigenvectors of A and the diagonal entries of D are the eigenvalues of A corresponding to the eigenvectors in P in the same order. Proof Suppose first that A is similar to the diagonal matrix D via P lAP = D or, equivalently, AP = PD. Let the columns of P be p1} p2>..., p„ and let the diagonal entries of D be A1; A2,..., A„ Then A[p, p2 ■ • ■ P„] = [Pi P2 ' ■ • P«] 0 0 A2 0 0 or [Apj Ap2 Ap„] = [Ajpi A2p2 A„p„] (2) where the right-hand side is just the column-row representation of the product PD. 
Equating columns, we have Api = Ajpi, Ap2 - A2p2,..., Ap„ = A„p„ which proves that the column vectors of P are eigenvectors of A whose corresponding eigenvalues are the diagonal entries of D in the same order. Since P is invertible, its columns are linearly independent, by the Fundamental Theorem of Invertible Matrices. Conversely, if A has n linearly independent eigenvectors pl5 p2>..., p„ with corresponding eigenvalues Ax, A2,..., A„, respectively, then Apj = Ajpi, Ap2 = A2p2,..., Ap„ = A„p„ This implies Equation (2) above, which is equivalent to Equation (1). Consequently, if we take P to be the n X n matrix with columns p^ p2,..., p„, then Equation (1) becomes AP = PD. Since the columns of P are linearly independent, the Fundamental Theorem of Invertible Matrices implies that P is invertible, so P~lAP = D; that is, A is diagonalizable. - Example 4.25 If possible, find a matrix P that diagonalizes "0 1 0 A = 0 0 1 _2 -5 4 Solution We studied this matrix in Example 4.18, where we discovered that it has eigenvalues A] = A2 = 1 and A3 = 2. The eigenspaces have the following bases: For A! = A2 = 1, Ei has basis For A3 = 2, E2 has basis Section 4.4 Similarity and Diagonalization 305 Since all other eigenvectors are just multiples of one of these two basis vectors, there cannot be three linearly independent eigenvectors. By Theorem 4.23, therefore, A is not diagonalizable. Example 4.26 If possible, find a matrix P that diagonalizes -1 0 1 3 0-3 1 0 -1 Solution This is the matrix of Example 4.19. There, we found that the eigenvalues of A are A! = A2 = 0 and A, = -2, with the following bases for the eigenspaces: "(f V For A! = A2 = 0, £0 has basis p! = 1 and p2 = 0 0 l For A3 = —2, £_2 has basis p3 -1 3 1 It is straightforward to check that these three vectors are linearly independent. Thus, if we take "o 1 -1 p3 - 1 0 3 _0 1 1 0 0 o" 0 0 0 D 0 0 -2 then P is invertible. Furthermore, FlAP as can be easily checked. (If you are checking by hand, it is much easier to check the equivalent equation AP = PD.) a • When there are enough eigenvectors, they can be placed into the columns of P in any order. However, the eigenvalues will come up on the diagonal of D in the same order as their corresponding eigenvectors in P. For example, if we had chosen I Pi Pa P2 then we would have found P'lAP 0 -1 1 1 3 0 0 1 1 0 0 -2 0 0 0 306 Chapter 4 Eigenvalues and Eigenvectors • In Example 4.26, you were asked to check that the eigenvectors p,, p2, and p3 were linearly independent. Was it necessary to check this? We knew that {pj, p2} was linearly independent, since it was a basis for the eigenspace E0. We also knew that the sets {pj, p3} and {p2, p3} were linearly independent, by Theorem 4.20. But we could not conclude from this information that ip,, p2, p3} was linearly independent. The next theorem, however, guarantees that linear independence is preserved when the bases of different eigenspaces are combined. ThGOrBffi 4.24 Let A be an « X n matrix and let A,, A2,kk be distinct eigenvalues of A. If Bj is a basis for the eigenspace BA, then B = Bx U B2U • • • UBk (i.e., the total collection of basis vectors for all of the eigenspaces) is linearly independent. Proof Let Bi = {va, vi2,vin) for i = 1,..., k We have to show that B = {v„, v12,..., v,„;> v21, v22,..., v2Mj,..., vkl, vk2,..., yk„} is linearly independent. 
Suppose some nontrivial linear combination of these vectors is the zero vector—say, (cuv„ + • • • + Cln vln) + (c21v21 + • • ■ + c2n v2n) + ■■■+ (cklykl + ■■■ + ckntvk„) = 0 (3) Denoting the sums in parentheses by xb x2,... xh we can write Equation (3) as Xi + x2 + ■ • ■ + xk = 0 (4) Now each x,- is in £A (why?) and so either is an eigenvector corresponding to A; or is 0. But, since the eigenvalues A; are distinct, if any of the factors x.- is an eigenvector, they are linearly independent, by Theorem 4.20. Yet Equation (4) is a linear dependence relationship; this is a contradiction. We conclude that Equation (3) must be trivial; that is, all of its coefficients are zero. Hence, B is linearly independent. '1— There is one case in which diagonalizability is automatic: an n X n matrix with n distinct eigenvalues. Theorem 4.25 IfA is an n X n matrix with n distinct eigenvalues, then A is diagonalizable. Let Vj, v2,..., v„ be eigenvectors corresponding to the n distinct eigenvalues Br.....of A. (Why could there not be more than n such eigenvectors?) By Theorem 4.20, v,, v,,..., v„ are linearly independent, so, by Theorem 4.23, A is diagonalizable. — Example 4.27 The matrix '2 -3 7" A = 0 5 1 has eigenvalues Xx = 2, A2 = 5, and A3 = -1, by Theorem 4.15. Since these are three distinct eigenvalues for a 3 X 3 matrix, A is diagonalizable, by Theorem 4.25. (If we actually require a matrix P such that P'lAPis diagonal, we must still compute bases for the eigenspaces, as in Example 4.19 and Example 4.26 above.) Section 4.4 Similarity and EMagonalization 307 The final theorem of this section is an important result that characterizes diagonaliz-able matrices in terms of the two notions of multiplicity that were introduced following Example 4.18. It gives precise conditions under which an n X n matrix can be diago-nalized, even when it has fewer than n eigenvalues, as in Example 4.26. We first prove a lemma that holds whether or not a matrix is diagonalizable. Lemma 4.26 If A is an n X m matrix, then the geometric multiplicity of each eigenvalue is less than or equal to its algebraic multiplicity. Proof Suppose Aj is an eigenvalue of A with geometric multiplicity p; that is, dim £A] = p. Specifically, let £A. have basis By - {vx, v2,..., vp}. Let Q be any invertible nXfi matrix having v„v2,...,vp as its first p columns — say, Q = [Vi ■ • • v vp+1 • • ■ vj or, as a partitioned matrix, Let Q= [17 | V] , \c Q 1 = — D where C is p X n. Since the columns of U are eigenvectors corresponding to Xu AU = AjU. We also have = In = Q Q 0 ' 0 ~c [U V] = cu cv~ p_ pu from which we obtain CU = L, CV = O, DU = O, and DV = !„_., Therefore, Q-lAQ A [17 |V] \fiU\CAVl \pu\DAv\ CAUl CAV DAU\DAV. By Exercise 69 in Section 4.2, it follows that det(CT'AQ - XI) = (Ai - A)p det(DAV - AJ) CAV O DAV (5) But detCQ^AQ — XI) is the characteristic polynomial of Q~LAQ, which is the same as the characteristic polynomial of A, by Theorem 4.22(d). Thus, Equation (5) implies that the algebraic multiplicity of A y is at least p, its geometric multiplicity. Theorem 4.27 The Diagonalization Theorem Let A be an n X n matrix whose distinct eigenvalues are A[, A2,..., A^. The following statements are equivalent: a. A is diagonalizable. b. The union B of the bases of the eigenspaces of A (as in Theorem 4.24) contains n vectors. c. The algebraic multiplicity of each eigenvalue equals its geometric multiplicity. 
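Condition (c) is easy to test numerically for a specific matrix. As a small illustration (ours, not part of the theorem), the following Python/NumPy sketch checks the matrix of Example 4.25 and finds that the eigenvalue 1 has geometric multiplicity 1, less than its algebraic multiplicity 2, so that matrix is not diagonalizable.

    import numpy as np

    # the matrix of Example 4.25: eigenvalues 1 (algebraic multiplicity 2) and 2
    A = np.array([[0.0,  1.0, 0.0],
                  [0.0,  0.0, 1.0],
                  [2.0, -5.0, 4.0]])

    # geometric multiplicity of lambda = dim E_lambda = 3 - rank(A - lambda*I)
    for lam, alg_mult in [(1.0, 2), (2.0, 1)]:
        geo_mult = 3 - np.linalg.matrix_rank(A - lam * np.eye(3))
        print(lam, alg_mult, geo_mult)
    # lambda = 1: geometric multiplicity 1 < algebraic multiplicity 2,
    # so condition (c) fails and A is not diagonalizable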
308 Chapter 4 Eigenvalues and Eigenvectors (a) =>■ (b) If A is diagonalizable, then it has n linearly independent eigenvectors, by Theorem 4.23. If of these eigenvectors correspond to the eigenvalue A,-, a, * then B. contains at least n, vectors. (We already know that these nt vectors are linearly independent; the only thing that might prevent them from being a basis for £A is that they might not span it.) Thus, B contains at least n vectors. But, by Theorem 4.24, B is a linearly independent set in U"; hence, it contains exactly n vectors. (b) =4- (c) Let the geometric multiplicity of A, be d. = dim £A and let the algebraic multiplicity of A, be m;. By Lemma 4.26, d, £ m, for i = 1,..., k. Now assume that property (b) holds. Then we also have n = dx + d2 + ■ • • + <4 — m\ + m2 + ''' + mk But mx + m2 + ■ • ■ + mk = n, since the sum of the algebraic multiplicities of the eigenvalues of A is just the degree of the characteristic polynomial of A—namely, n. It follows that dx + d2 + ■ ■ ■ + dk = mx + m2 + • ■ ■ + mk, which implies that (m1 — rf[) + (m2 — d2) + • • * + (mk — dk) = 0 (6) Using Lemma 4.26 again, we know that m,- - d{ s 0 for i = 1,..., k, from which we can deduce that each summand in Equation (6) is zero; that is, m; = d,for i — l,...,k. (c) => (a) If the algebraic multiplicity m, and the geometric multiplicity d; are equal for each eigenvalue A; of A, then B has dx + d2 + ■ ■ ■ + dk = ml + m2 + ■ • ■ + mk — n vectors, which are linearly independent, by Theorem 4.24. Thus, these are n linearly independent eigenvectors of A, and A is diagonalizable, by Theorem 4.23. Example 4.28 (a) The matrix A = 0 1 0 0 2 -5 from Example 4.18 has two distinct eigenvalues, Aj = A2 = 1 and A3 = 2. Since the eigenvalue = A2 — 1 has algebraic multiplicity 2 but geometric multiplicity 1, A is not diagonalizable, by the Diagonalization Theorem. (See also Example 4.25.) [-1 0 I (b) The matrix A = 3 0 —3 from Example 4.19 also has two distinct eigen-.10-1. values, Xx — \2 = 0 and A3 = ~2. The eigenvalue 0 has algebraic and geometric multiplicity 2, and the eigenvalue —2 has algebraic and geometric multiplicity 1. Thus, this matrix is diagonalizable, by the Diagonalization Theorem. (This agrees with our findings in Example 4.26.) ^ We conclude this section with an application of diagonalization to the computation of the powers of a matrix. Example 4.29 • Compute A10 if A = t Solution In Exampl 0 f .2 1. e 4.21 we found that this matrix has eigenvalues Aj = — 1 and A2 = 2, with corresponding eigenvectors vx 1 -1 and v, = . It follows Section 4.4 Similarity and Diagonalization (from any one of a number of theorems in this section) that A is diagonalizable and P"lAP = D, where P = [v, v,l = 1 1 •1 2 and D -1 0 0 2 Solving for A, we have A = POP 1 and, by Theorem 4.22(f), A" = PD"?"1 for all n > 1. Since we have D" A" = PD"P ~l = "-1 0" 0 2_ 0 2". 1 1 -1 2 1 1 -1 2 (~1)8 0 0 2" (-1)" 0 0 2" 1 1 - 1 2 "2 r 3 3 .3 3- 2(-D" + 2" (-1)"+1 + 2" 3 3 2(-l)B+1 + 2"+1 (-1)"'2 + 2"+I Since we were only asked for A10, this is more than we needed. But now we can simply set n — 10 to find A1U - 2(~l)lc + 210 (-1)11 + 210"1 3 3 2(-l)" +2" (-1)12 + 211 3 3 342 341 682 683 Exercises 4.4 In Exercises I -4, show that A and B are not similar matrices, "4 i. 1. A = 2. A = "l 0" .0 1 3. A 4. A = 3 1 2 1 -4 6 '2 1 4 0 2 3 .0 0 4 1 2 0 0 1 -1 0 -1 1 3 -1" ,B = _—5 7_ 1 0 0 -14 0 2 3 4 In Exercises 5-7, a diagonalization of the matrix A is given in the form P^'AP = D. 
List the eigenvalues of A and bases for the corresponding eigenspaces. 5. -I 1 1 1 1 2 1 1 1 0 0 1 1 1 0 4 0 0 3_ 1 0 -1 1 0 -1 ~2 1 r '2 0 0 ,B = 0 i 0 = 0 0 0 2 0 1. ^0 0 -1 310 Chapter 4 Eigenvalues and Eigenvectors l i" 8 8 3 _ I 4 4 8 8- 0 0 -2 0 0 -2 In Exercises 8-15, determine whether A is diagonalizable and, if so, find an invertible matrix P and a diagonal matrixD such that P~'AP = D. 8. A = 10. A = 12. A = 14. A 5 2 2 5. 3 1 0 3 .0 0 10 0 2 2 1 0 1 0 0 2' 3 2 1 0 0 3 0 0 0 0 1 9. A = 11. A = 13. A = 15. A -3 4 -1 1 1 0 1 0 1 1 .1 10 12 1 -1 0 1 .110 ■2 0 0 0 2 0 0-2 0 0 0 0 In Exercises 16-23, use the method of Example 4.29 to compute the indicated power of the matrix. 16. 18. 20. 22. -4 -3 4 -3 -1 2 2 1 2 2 1 2 2 1 2 2 0 1 1 1 1 1 0 2 17. 19. 21. 23. -1 1 0 3 1 2 "l 0 - 0 1 2 -0 In Exercises 24-29, find all (real) values of k for which A is diagonalizable. 24. A = 26. A 25. A = 27. A 28. A 29. A = 30. Prove Theorem 4.21(c). 31. Prove Theorem 4.22(b). 32. Prove Theorem 4.22(c). 33. Prove Theorem 4.22(e). 34. Prove Theorem 4.22(f). 35. Prove Theorem 4.22(g). 36. If A and B are invertible matrices, show that AB and J3A are similar. 37. Prove that if A and B are similar matrices, then tr(A) — tr(B). [Hint: Find a way to use Exercise 45 from Section 3.2.] In general, it is difficult to show that two matrices are similar. However, if two similar matrices are diagonalizable, the task becomes easier. In Exercises 38-41, show that A and B are similar by showing that they are similar to the same diagonal matrix. Then find an invertible matrix P such that P"lAP = B. 38. A = 39. A = 40. A = 41. A 3 1 0 -1 5 -3 4 -2 1 -2 0 0 -1 0 ,B = ,B = 0 1 1. 2 1 1 1 2 2 1 -1 1 -6 4. 3 1 2 B = 2 -5 2 -1 2 -4. -3 -2 6 5 4 4 42. Prove that if A is similar to B, then A is similar to B1. 43. Prove that if A is diagonalizable, so is A . 44. Let A be an invertible matrix. Prove that if A is diagonalizable, so is A-'. 45. Prove that if A is a diagonalizable matrix with only one eigenvalue A, then A is of the form A = XI. (Such a matrix is called a scalar matrix.) 46. Let A and B be n X n matrices, each with n distinct eigenvalues. Prove that A and B have the same eigenvectors if and only if AB = BA. 47. Let A and B be similar matrices. Prove that the algebraic multiplicities of the eigenvalues of A and B are the same. Section 4.5 Iterative Methods for Computing Eigenvalues 811 48. Let A and B be similar matrices. Prove that the geometric multiplicities of the eigenvalues of A and B are the same. [Hint: Show that, if B = I' lAP, then every eigenvector of B is of the form P v for some eigenvector v of A.] 49. Prove that if A is a diagonalizable matrix such that every eigenvalue of A is either 0 or 1, then A is idem-potent (that is, A2 = A). 50. Let A be a nilpotent matrix (that is, Am = O for some m > 1). Prove that if A is diagonalizable, then A must be the zero matrix. 51. Suppose that A is a 6 X 6 matrix with characteristic polynomial cA(A) = (1 + A)(l - A)2(2 - A)3. (a) Prove that it is not possible to find three linearly independent vectors Vj, v2, v3 in IR6 such that Avj = v., Av2 = v\, and Av3 = v3. (b) If A is diagonalizable, what are the dimensions of the eigenspaces JELj, J5,, and £2? b] 52. Let A = (a) Prove that A is diagonalizable if (a — d)2 + Abe > 0 and is not diagonalizable if (a — df Abe < 0. + (b) Find two examples to demonstrate that if (a — d)2 + Abe = 0, then A may or may not be diagonalizable. 
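The computations of Example 4.29 are easy to carry out with a CAS. The following Python/NumPy sketch (our own illustration; np.linalg.eig returns unit eigenvectors, so its P differs from the hand-computed one by column scaling and ordering, which does not affect the result) diagonalizes A as in Theorem 4.23 and then uses D to compute A^10.

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [2.0, 1.0]])

    eigenvalues, P = np.linalg.eig(A)    # columns of P are eigenvectors of A
    D = np.diag(eigenvalues)

    print(np.allclose(np.linalg.inv(P) @ A @ P, D))          # P^{-1} A P = D

    # A^10 = P D^10 P^{-1}, with D^10 computed entrywise on the diagonal
    A10 = P @ np.diag(eigenvalues**10) @ np.linalg.inv(P)
    print(np.round(A10))                                     # [[342, 341], [682, 683]]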
In 1824, the Norwegian mathematician Niels Henrik Abel (1802-1829) proved that a general fifth-degree (quintic) polynomial equation is not solvable by radicals; that is, there is no formula for its roots in terms of its coefficients that uses only the operations of addition, subtraction, multiplication, division, and taking nth roots. In a paper written in 1830 and published posthumously in 1846, the French mathematician Evariste Galois (1811-1832) gave a more complete theory that established conditions under which an arbitrary polynomial equation can be solved by radicals. Galois's work was instrumental in establishing the branch of algebra called group theory, his approach to polynomial equations is now known as Galois theory. Iterative Methods for Computing Eigenvalues At this point, the only method we have for computing the eigenvalues of a matrix is to solve the characteristic equation. However, there are several problems with this method that render it impractical in all but small examples. The first problem is that it depends on the computation of a determinant, which is a very time-consuming process for large matrices. The second problem is that the characteristic equation is a polynomial equation, and there are no formulas for solving polynomial equations of degree higher than 4 (polynomials of degrees 2, 3, and 4 can be solved using the quadratic formula and its analogues). Thus, we are forced to approximate eigenvalues in most practical problems. Unfortunately, methods for approximating the roots of a polynomial are quite sensitive to roundoff error and are therefore unreliable. Instead, we bypass the characteristic polynomial altogether and take a different approach, approximating an eigenvector first and then using this eigenvector to find the corresponding eigenvalue. In this section, we will explore several variations on one such method that is based on a simple iterative technique. The Power Method The power method applies to an n X n matrix that has a dominant eigenvalue A;— that is, an eigenvalue that is larger in absolute value than all of the other eigenvalues. For example, if a matrix has eigenvalues —4, —3, 1, and 3, then —4 is the dominant eigenvalue, since 4 = [ — 4! > f—3| £ 3 2: |1|. On the other hand, a matrix with eigenvalues —4, —3, 3, and 4 has no dominant eigenvalue. The power method proceeds iteratively to produce a sequence of scalars that converges to Aj and a sequence of vectors that converges to the corresponding eigenvector Vj, the dominant eigenvector. For simplicity, we will assume that the matrix A is diagonalizable. The following theorem is the basis for the power method. 312 Chapter 4 Eigenvalues and Eigenvectors Theorem 4.28 Let A be an n X n diagonalizable matrix with dominant eigenvalue A[. Then there exists a nonzero vector x0 such that the sequence of vectors xk defined by xi = Axq, x2 — Axi, x3 = Ax2,..., x£ = Axj._i,... approaches a dominant eigenvector of A. Proof We may assume that the eigenvalues of A have been labeled so that JAj > |A2| > |A3| > |Aj Let v,, v2,. .., v„ be the corresponding eigenvectors. Since vu v2,..., v„ are linearly independent (why?), they form a basis for R". Consequently, we can write x0 as a linear combination of these eigenvectors—say, Xo = + c2v2 + • • • + c„v„ Now Xi = Ax0, x2 = Axt = A(Ax0) = A2x0, x3 = Ax2 = A(A2x0) = A3x0, and, generally, xk = A^Xq for k a 1 As we saw in Example 4.21, A^x,, = c^fa + c2A^v2 + ■ ■ • + c„A*v„ - \k A*l civi + ca( ~ ) v2 + A2V W where we have used the fact that Ai + 0. 
The fact that A! is the dominant eigenvalue means that each of the fractions A2/Ai, A3/A!,..., An/Ai, is less than 1 in absolute value. Thus, all go to zero as k —* oo. It follows that xk = a*Xo ~» AiCjVj as fc -> » (2) Now, since A[ # 0 and v, ¥= 0, x<. is approaching a nonzero multiple of V! (that is, an eigenvector corresponding to At) provided q # 0. (This is the required condition on the initial vector x0: It must have a nonzero component q in the direction of the dominant eigenvector v,.) _ Example 4.30 Approximate the dominant eigenvector of A : Theorem 4.28. 1 1 2 0 using the method of Solution We will take xn = as the initial vector. Then Xj — j4.Xq — 1 1 2 0 1 1 2 0 J X2 — AXy — We continue in this fashion to obtain the values of xk in Table 4.1. Section 4.5 Iterative Methods for Computing Eigenvalues 313 Table 4.1 k 0 1 2 3 4 5 6 7 8 1 1 3 5 11 21 43 85 171 0 2 2 6 10 22 42 86 170 rk — 0.50 1.5( ) 0.83 1.10 0.95 1.02 0.99 1.01 k — 1.00 3.00 1.67 2.20 1.91 2.05 1.98 2.01 Figure 4.14 Figure 4.14 shows what is happening geometrically. We know that the eigenspace for the dominant eigenvector will have dimension 1. (Why? See Exercise 46.) Therefore, it is a line through the origin in R2. The first few iterates xk are shown along with the directions they determine. It appears as though the iterates are converging on the line whose direction vector is . To confirm that this is the dominant eigenvector we seek, we need only observe that the ratio rk of the first to the second component of Xj. gets very close to 1 as A: increases. The second line in the body of Table 4.1 gives these values, and you can see clearly that rk is indeed approaching 1. We deduce that a [Y dominant eigenvector of A is u. Once we have found a dominant eigenvector, how can we find the corresponding dominant eigenvalue? One approach is to observe that if an xk is approximately a dominant eigenvector of A for the dominant eigenvalue Aj, then x/c-i = Axk *"* A.iXk It follows that the ratio lk of the first component of x*+i to that of x^ will approach Aj as k increases. Table 4.1 gives the values of lk, and you can see that they are approaching 2, which is the dominant eigenvalue. 314 Chapter 4 Eigenvalues and Eigenvectors There is a drawback to the method of Example 4.30: The components of the iterates xk get very large very quickly and can cause significant roundoff errors. To avoid this drawback, we can multiply each iterate by some scalar that reduces the magnitude of its components. Since scalar multiples of the iterates xk will still converge to a dominant eigenvector, this approach is acceptable. There are various ways to accomplish it. One is to normalize each xk by dividing it by ||xj (i.e., to make each iterate a unit vector). An easier method-—and the one we will use—is to divide each xk by the component with the maximum absolute value, so that the largest component is now 1. This method is called scaling. Thus, if mk denotes the component of xk with the maximum absolute value, we will replace xk by yk = (1/mk)xk. We illustrate this approach with the calculations from Example 4.30. For x0, there is nothing to do, since m0 = 1. Hence, y0 = x„ We then compute x( as before, but now we scale with mx — 2 to get y. V "0.5" ( - x, = - — \2J \2/ 1 Now the calculations change. We take "1 1" "0.5" 1.5" _2 0. .1 1 and scale to get 12 1_ 1.5, "1.5" "1 ) .1 0.67 The next few calculations are summarized in Table 4.2. You can now see clearly that the sequence of vectors yk is converging to dominant eigenvector. 
Moreover, the sequence of scalars rnk converges to the corresponding dominant eigenvalue Ax = 2. Table 4.2 k 0 1 2 3 4 5 6 7 8 "l "f "1.5" "1.67" "1.83" "1.91" "1.95" "1.98" "1.99" xk .0. .2. A . 2 1.67. 2 _1.91_ 2 .1.98. Yk "l" 0.5^ 1 "0.83" "l "0.95" "l "0.99" I .0. 1 .0.67, .1 L0.91 1 -0.98. .1 .0.99. mk 1 2 1.5 2 1.83 2 1.95 2 1.99 Section 4.5 Iterative Methods for Computing Eigenvalues 315 This method, called the power method, is summarized below. The Power Method Let A be a diagonalizable n X n matrix with a corresponding dominant eigenvalue A). 1. Let x0 = y0 be any initial vector in IR" whose largest component is 1. 2. Repeat the following steps for k — 1, 2,...: (a) Compute xk = Ayk^l. (b) Let mk be the component of x^ with the largest absolute value. (c) Setyt = (l/mk)xk. For most choices of x0, mk converges to the dominant eigenvalue A. and yk converges to a dominant eigenvector. Example 4.31 Use the power method to approximate the dominant eigenvalue and a dominant eigenvector of 0 5 -6 -4 12 -12 -2 -2 10 Solution Taking as our initial vector we compute the entries in Table 4.3. You can see that the vectors yk are approaching 0.50 1 -0.50 and the scalars m; are approaching 16. This suggests that they are, respectively, a dominant eigenvector and the dominant eigenvalue of A. Remarks • If the initial vector x0 has a zero component in the direction of the dominant eigenvector vt (i.e., if c = 0 in the proof of Theorem 4.28), then the power method Table 4.3 k 0 l 2 3 4 5 6 7 1 -l -9.33 8.62 8.12 8.03 8.01 8.00 1 -4 -19.33 17.31 16.25 16.05 16.01 16.00 1 6 11.67 -9.00 -8.20 -8.04 -8.01 -8.00 *r -0.17" 0.48" 0.50" 0.50" 0.50~ 0.50 " 0.50 Yk l -0.67 1 1 1 1 1 1 _i_ 1 .-0.60. .-0.52. .-0.50. .-0.50. .-0.50. .-0.50 l 6 -19.33 17.31 16.25 16.05 16.01 16.00 316 Chapter 4 Eigenvalues and Eigenvectors John William Strutt (1842-1919), Baron Rayleigh, was a British physicist who made major contributions to the fields of acoustics and optics. In 1871, he gave the first correct explanation of why the sky is blue, and in 1895, he discovered the inert gas argon, for which discovery he received the Nobel Prize in 1904. Rayleigh was president of the Royal Society from 1905 to 1908 and became chancellor of Cambridge University in 1908. He used Rayleigh quotients in an 1873 paper on vibrating systems and later in his book The Theory of Sound. will not converge to a dominant eigenvector. However, it is quite likely that during the calculation of the subsequent iterates, at some point roundoff error will produce an xk with a nonzero component in the direction of Vj. The power method will then start to converge to a multiple of Vj. (This is one instance where roundoff errors actually help!) * The power method still works when there is a repeated dominant eigenvalue, or even when the matrix is not diagonalizable, under certain conditions. Details may be found in most modern textbooks on numerical analysis. (See Exercises 21-24.) • For some matrices the power method converges rapidly to a dominant eigenvector, while for others the convergence may be quite slow. A careful look at the proof of Theorem 4.28 reveals why. Since |A2/A]| s |A3/A]| > • • • > IA„/Ail, if |A2/A,| is close to zero, then (A2/Aj) , (AK/Aj) * will all approach zero rapidly. Equation (2) then shows that xk = Akx0 will approach Xkclv1 rapidly too. As an illustration, consider Example 4.31. The eigenvalues are 16, 4, and 2, so A2/Aj = 4/16 = 0.25. 
Since 0.257 ~ 0.00006, by the seventh iteration we should have close to four-decimal-place accuracy. This is exactly what we saw. 9 There is an alternative way to estimate the dominant eigenvalue Aj of a matrix A in conjunction with the power method. First, observe that if Ax = Atx, then (Ax) • x (A,x) • x Ai(x • x) X'X x-x A, The expression R(x) = ((Ax) • x)/(x • x) is called a Rayleigh quotient. As we compute the iterates xj., the successive Rayleigh quotients R(xk) should approach Aj. In fact, for symmetric matrices, the Rayleigh quotient method is about twice as fast as the scaling factor method. (See Exercises 17-20.) The Shifted Power Method and the Inverse Power Method The power method can help us approximate the dominant eigenvalue of a matrix, but what should we do if we want the other eigenvalues? Fortunately, there are several variations of the power method that can be applied. The shifted power method uses the observation that, if A is an eigenvalue of A, then A — a is an eigenvalue of A — al for any scalar a (Exercise 22 in Section 4.3). Thus, if A[ is the dominant eigenvalue of A, the eigenvalues of A — A,f will be 0, A2 — A), A3 — Aj,.. ., A„ — Aj. We can then apply the power method to compute A2 — A,, and from this value we can find A2. Repeating this process will allow us to compute all of the eigenvalues. Example 4.32 1 Jse thi 1 f 2 0. : shifted power method to compute the second eigenvalue of the matrix A = from Example 4.30. In Example 4.30, we found that A, = 2. To find A2, we apply the power method to r-i ii A - 21 = L 2 -2_ , but other choices will also work. The calculations are summarized We take x0 = in Table 4.4. Section 4.5 Iterative Methods for Computing Eigenvalues 317 Table 4.4 k 0 1 2 3 4 v "-1" 1.5" 1.5" 1.5" 0. 2_ .-3 -3 _-3 "l" -0.5" -0.5" "-0.5" -0.5' y* 0. 1 1 1 1 mk 1 2 -3 -3 -3 Our choice of x0 has produced the eigenvalue — 3 after only two iterations. There fore, A2 — Ai = —3, so A2 — A: — 3 = 2 — 3 = —1 is the second eigenvalue of A. Recall from property (b) of Theorem 4.18 that if A is invertible with eigenvalue A, then A~l has eigenvalue 1/A. Therefore, if we apply the power method to A"1, its dominant eigenvalue will be the reciprocal of the smallest (in magnitude) eigenvalue of A. To use this inverse power method, we follow the same steps as in the power method, except that in step 2(a) we compute xk = A^1 yk-\. (In practice, we don't actually compute A "1 explicitiy; instead, we solve the equivalent equation Axk ~Yk-i for xk using Gaussian elimination. This turns out to be faster.) Use the inverse power method to compute the second eigenvalue of the matrix A = from Example 4.30. 2 0 r Solution We start, as in Example 4.30, with Xq = y0 = we use row reduction: [a|y0] = To solve Axt = y0, 1 1 f "l 0 0* -► 2 0 0_ .0 1 1 Thus.x, = "0" "0" .1. , so y, = 1 . Then we get x2 from Ax2 = [A |yj = "l 1 0" \ 1 0 0.5" 2 0 1. r _0 1 -0.5. Hence, x^ 0.5 -0.5 and, by scaling, we get y2 = Continuing, we get the values shown in Table 4.5, where the values mk are converging to —1. Thus, the smallest eigenvalue of A is the reciprocal of — 1 (which is also — 1). This agrees with our previous finding in Example 4.32. 318 Chapter 4 Eigenvalues and Eigenvectors Table 4.5 k 0 1 2 3 4 5 6 7 8 9 "1" o" 0.5" -0.5' 0.5 " 0.5" 0.5 0.5 0.5 " 0.5 " Xjfc .0. .1. _~0.5_ 1.5. .-0.83. .-1-1. _-0.95_ „-T.02. _-0.99_ L —1.01_ Yk "o" Y r "-0.33" -0.6] "-0.45" "-0.52" -0.49" -0.51" -0.50" .1. 0. -i 1 . 
1 J 1 1 1 1 1 mk 1 i 0.5 1.5 -0.83 -1.1 -0.95 -1.02 -0.99 -1.01 The Shifted Inverse Power Method The most versatile of the variants of the power method is one that combines the two just mentioned. It can be used to find an approximation for any eigenvalue, provided we have a close approximation to that eigenvalue. In other words, if a scalar a is given, the shifted inverse power method will find the eigenvalue A of A that is closest to a. If A is an eigenvalue of A and a + A, then A — al is invertible if a is not an eigenvalue of A and 1/(A — a) is an eigenvalue of (A — a/)"1. (See Exercise 45.) If a is close to A, then l/(A — a) will be a dominant eigenvalue of (A — a/)-1. In fact, if a is very close to A, then 1 /(A — a) will be much bigger in magnitude than the next eigenvalue, so (as noted in the third Remark following Example 4.31) the convergence will be very rapid. Examine 4.34 Use the shifted inverse power method to approximate the eigenvalue of A = 0 5 -6 -4 12 -12 —2 -2 10 that is closest to 5. Solution Shifting, we have A — 51 = -5 -4 -2 -6 -12 5 Now we apply the inverse power method with yo = We solve (A — 5/)x! = y0 for xl "-5 5 -6 r "i 0 0 -0.61" [A - 57 ! y0] = -4 7 -12 l —> 0 1 0 -0.88 _-2 -2 5 i _ .0 0 1 -0.39 Section 4.5 Iterative Methods for Computing Eigenvalues 319 Table 4.6 k £ Y i .1. "f i .1. mk 1 1_ -0.61" -0.88 -0.39. 0.69" 1.00 0.45. 0.88 -0.41 -0.69 -0.35. '0.59" 1.00 .0.51. -0.69 -0.47 -0.89 -0.44. "0.53" 1.00 .0.50. -0.89 -0.49 -0.95 -0.48. "0.5l" 1.00 .0.50. -0.95 -0.50 -0.98 -0.49. "0.50" 1.00 .0.50. -0.98 -0.50 -0.99 -0.50. "0.50" 1.00 .0.50. -0.99 -0.50 -1.00 -0.50. "0.50" 1.00 .0.50. -1.00 This gives -0.61 -0.88 -0.39 m1 = —0.88, and y; = 1 0.88 "-0.61" "0.69" -0.88 = 1 .-0.39. .0.45. We continue in this fashion to obtain the values in Table 4.6, from which we deduce that the eigenvalue of A closest to 5 is approximately 5 + l/m7 » 5 + l/(—1) = 4, which, in fact, is exact. The power method and its variants represent only one approach to the computation of eigenvalues. In Chapter 5, we will discuss another method based on the QR factorization of a matrix. For a more complete treatment of this topic, you can consult almost any textbook on numerical methods. a + w Gerschgorin's Theorem We owe this theorem to the Russian mathematician S. Gerschgorin (1901- 1933), who stated it in 1931. It did not receive much attention until 1949, when it was resurrected by Olga Taussky-Todd in a note she published in the American Mathematical Monthly. In this section, we have discussed several variations on the power method for approximating the eigenvalues of a matrix. All of these methods are iterative, and the speed with which they converge depends on the choice of initial vector. If only we had some "inside information" about the location of the eigenvalues of a given matrix, then we could make a judicious choice of the initial vector and perhaps speed up the convergence of the iterative process. Fortunately, there is a way to estimate the location of the eigenvalues of any matrix. Gerschgorin's Disk Theorem states that the eigenvalues of a (real or complex) nXn matrix all lie inside the union of n circular disks in the complex plane. Definition LetA = [ay]bea(realorcomplex)ttXnmatrix,andletr;denote the sum of the absolute values of the off-diagonal entries in the ith row of A; that 1S> ri ~ 2la')'-l ^ne ith Gerschgorin disk is the circular disk D; in the complex plane with center ai{ and radius n. 
That is, D; = {zinC:jz — %j £ r,} 320 Chapter 4 Eigenvalues and Eigenvectors Olga Taussky-Todd (1906-1995) was born in Olmiitz in the Austro-Hungarian Empire (now Olmuac in the Czech Republic). She received her doctorate in number theory from the University of Vienna in 1930. During World War II, she worked for the National Physical Laboratory in London, where she investigated the problem of flutter in the wings of supersonic aircraft. Although the problem involved differential equations, the stability of an aircraft depended on the eigenvalues of a related matrix. Taussky-Todd remembered Gerschgorin's Theorem from her graduate studies in Vienna and was able to use it to simplify the otherwise laborious computations needed to determine the eigenvalues relevant to the flutter problem. Taussky-Todd moved to the United States in 1947, and ten years later she became the first woman appointed to the California Institute of Technology. In her career, she produced over 200 publications and received numerous awards. She was instrumental in the development of the branch of mathematics now known as matrix theory. w Example 4.35 W Sketch the Gerschgorin disks and the eigenvalues for the following matrices: 1 (a) A = '2 1 .2 -3. 1 -3" .2 3. Sr (a) The two Gerschgorin disks are centered at 2 and — 3 with radii 1 and 2, respectively. The characteristic polynomial of A is A2 + A — 8, so the eigenvalues are A = (-1 ± Vl2~^4(--8))/2 » 2.37, -3.37 Figure 4.15 shows that the eigenvalues are contained within the two Gerschgorin disks. (b) The two Gerschgorin disks are centered at 1 and 3 with radii |—3| = 3 and 2, respectively. The characteristic polynomial of A is A2 — 4A + 9, so the eigenvalues are A = (4 ± V(-4)2~^M9))/2 = 2 ± iVi « 2 + 2.23», 2 - 2.23; Figure 4.16 plots the location of the eigenvalues relative to the Gerschgorin disks. Figure 4.15 Section 4.5 Iterative Methods for Computing Eigenvalues 321 Im 4-- As Example 4.35 suggests, the eigenvalues of a matrix are contained within its Gerschgorin disks. The next theorem verifies that this is so. Theorem 4.29 Gerschgorin's Disk Theorem Let A be an n X n (real or complex) matrix. Then every eigenvalue of A is contained within a Gerschgorin disk. Proof Let A be an eigenvalue of A with corresponding eigenvector x. Let x; be the entry of x with the largest absolute value—and hence nonzero. (Why?) Then Ax = Ax, the t'th row of which is [an ai2 ■ ■ ■ a,n\ Rearranging, we have Ax; or 2fly*j = 2 < j*i___ Hi_< ifl i _ |X,| \Xj\ |X;| jti " because |x,| ^ |x;| forj # i. This establishes that the eigenvalue A is contained within the Gerschgorin disk centered at a with radius r>. __JHB1 322 Chapter 4 Eigenvalues and Eigenvectors Remarks * There is a corresponding version of the preceding theorem for Gerschgorin disks whose radii are the sum of the off-diagonal entries in the ith column of A. * It can be shown that if k of the Gerschgorin disks are disjoint from the other disks, then exactly k eigenvalues are contained within the union of these k disks. In particular, if a single disk is disjoint from the other disks, then it must contain exactly one eigenvalue of the matrix. Example 4.35(a) illustrates this. * Note that in Example 4.35(a), 0 is not contained in a Gerschgorin disk; that is, 0 is not an eigenvalue of A. Hence, without any further computation, we can deduce that the matrix A is invertible by Theorem 4.16. 
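Because each disk is determined by a single row of the matrix, this kind of check is simple to automate. Here is a minimal sketch (Python with NumPy; the helper name gerschgorin_disks is our own) that lists the center and radius of each disk for the matrix of Example 4.35(a) and tests whether 0 could lie in any of them.

import numpy as np

def gerschgorin_disks(A):
    # For each row i, the disk has center a_ii and radius
    # r_i = sum of |a_ij| over the off-diagonal entries of row i.
    A = np.asarray(A, dtype=complex)
    return [(A[i, i], np.sum(np.abs(A[i])) - np.abs(A[i, i]))
            for i in range(A.shape[0])]

A = [[2, 1],
     [2, -3]]                          # the matrix of Example 4.35(a)
for center, radius in gerschgorin_disks(A):
    print(center, radius)              # centers 2 and -3, radii 1 and 2
# 0 lies outside every disk when |a_ii| > r_i for each row, so A is invertible
print(all(abs(c) > r for c, r in gerschgorin_disks(A)))   # True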
This observation is particularly useful when applied to larger matrices, because the Gerschgorin disks can be determined directly from the entries of the matrix. Example 4.38 ...... Consider the matrix A = 2 0 . Gerschgorins Theorem tells us that the eigen- values of A are contained within three disks centered at 2, 6, and 8 with radii 1,1, and 2, respectively. See Figure 4.17(a). Because the first disk is disjoint from the other two, it must contain exactly one eigenvalue, by the second Remark after Theorem 4.29. Because the characteristic polynomial of A has real coefficients, if it has complex roots (i.e., eigenvalues of A), they must occur in conjugate pairs. (See Appendix D.) Hence there is a unique real eigenvalue between 1 and 3, and the union of the other two disks contains two (possibly complex) eigenvalues whose real parts lie between 5 and 10. On the other hand, the first Remark after Theorem 4.29 tells us that the same three eigenvalues of A are contained in disks centered at 2, 6, and 8 with radii |, 1, and j> respectively. See Figure 4.17(b). These disks are mutually disjoint, so each contains a single (and hence real) eigenvalue. Combining these results, we deduce that A has three real eigenvalues, one in each of the intervals [1, 3], [5, 7], and [7.5, 8.5]. (Compute the actual eigenvalues of A to verify this.) Section 4.5 Iterative Methods for Computing Eigenvalues f Exercises 4.5 In Exercises 1-4, a matrix A is given along with an iterate x5, produced as in Example 4.30. (a) Use these data to approximate a dominant eigenvector whose first component is 1 and a corresponding dominant eigenvalue. (Use three-decimal-place accuracy.) (b) Compare your approximate eigenvalue in part (a) with the actual dominant eigenvalue. 1. A = 2. A = 3. A = 4. A 7 4" > x5 = 7811" 3 -1_ L-3904 2 1 1 1_ 1.5 0.5 2.0 3.0 " 4443" , x5 = .11109. 1 "144" >x5 = 89. x5 = 60.625 239.500 2 -3" , x5 = "-3.667" 3 10 11.001 In Exercises 5-8, a matrix A is given along with an iterate xk, produced using the power method, as in Example 4.31. (a) Approximate the dominant eigenvalue and eigenvector by computing the corresponding mk and yk. (b) Verify that you have approximated an eigenvalue and an eigenvector of A by comparing Ayk with mkyk. 5. A 6. A = 7. A 8. A 5 2 "5.530" 2 -2. > xio — 1.470 4 -1 6 1 1 0 -1 6 i 4 -2 -3 1 > x8 — 10.000 0.001 10.000 3.415 2.914 -1.207 In Exercises 9-14, use the power method to approximate the dominant eigenvalue and eigenvector of A. Use the given initial vector xQ, the specified number of iterations k, and three-decimal-place accuracy. 9. A 14 5 12 3 r .1. "-6 4" 1 10. A = 8 -2. i Xo — 0 11. A = 12. A = 13. A 14. A = 7 2 2 3 3.5 1.5 9 T i Xq — _o 1.5 -0.5 4 8 15 -4 -4 9 1 0 3 1 1 3 .x0 = ,k = 6 k = 5 V > X() — 1 .1. In Exercises 15 and 16, use the power method to approximate the dominant eigenvalue and eigenvector of A to two-decimal-place accuracy. Choose any initial vector you like (but keep the first Remark after Example 4.31 in mind!) and apply the method until the digit in the second decimal place of the iterates stops changing. 15. A = 16. A = 12 2 -6 -6 -2 12 Rayleigh quotients are described on page 316. In Exercises 17-20, to see how the Rayleigh quotient method approximates the dominant eigenvalue more rapidly than the ordinary power method, compute the successive Rayleigh quotients R(x,)for i = 1,..., k for the matrix A in the given exercise. 17. Exercise 11 19. Exercise 13 18. Exercise 12 20. 
Exercise 14 The matrices in Exercises 21-24 either are not diagonaliz-able or do not have a dominant eigenvalue (or both). Apply the power method anyway with the given initial vector xQ, performing eight iterations in each case. Compute the exact eigenvalues and eigenvectors and explain what is happening. 21. A = 23. A 24. A 1 "l" " 3 1] T >x„ = 22. A = .-1 lJ'X0 = 1—1 324 Chapter 4 Eigenvalues and Eigenvectors In Exercises 25-28, the power method does not converge to the dominant eigenvalue and eigenvector. Verify this, using the given initial vector x0. Compute the exact eigenvalues and eigenvectors and explain what is happening. 25. A 26. A 27. A 28. A -1 2 T — = .-1 1. .1. 2 1" "l" —2 > Xq = 5_ .1. "-5 1 7~ = 0 4 0 7 1 — -1 o" = 1 1 0 ) Xq — .1 -1 1. 42. p(x) = x -x-\a = 2 43. p(x) - x3 - 2x2 + 1, a = 0 44. p(x) = x3 - 5x2 + x + 1, a = 5 45. Let A be an eigenvalue of A with corresponding eigenvector x. If a + A and a is not an eigenvalue of A, show that 1/(A — a) is an eigenvalue of (A — al)~l with corresponding eigenvector x. (Why must A — al be invertible?) 46. If A has a dominant eigenvalue Aj, prove that the ei-genspace Ex is one-dimensional. km7 /n Exercises 47-50, draw the Gerschgorin disks for the given matrix. 47. In Exercises 29-32, apply the shifted power method to approximate the second eigenvalue of the matrix A in the given exercise. Use the given initial vector x0, k iterations, and three-decimal-place accuracy. 29. Exercise 9 30. Exercise 10 31. Exercise 13 32. Exercise 14 In Exercises 33-36, apply the inverse power method to approximate, for the matrix A in the given exercise, the eigenvalue that is smallest in magnitude. Use the given initial vector x0, k iterations, and three-decimal-place accuracy. 33. Exercise 9 34. Exercise 10 I 1 1 35. Exercise 7, Xg = 36. Exercise 14 In Exercises 37-40, use the shifted inverse power method to approximate, for the matrix A in the given exercise, the eigenvalue closest to a. 37. Exercise 9, a = 0 38. Exercise 12, a = 0 39. Exercise 7, a = 5 40. Exercise 13, a = —2 Exercise 32 in Section 4.3 demonstrates that every polynomial is (plus or minus) the characteristic polynomial of its own companion matrix. Therefore, the roots of a polynomial p are the eigenvalues of C(p). Hence, we can use the methods of this section to approximate the roots of any polynomial when exact results are not readily available. In Exercises 41-44, apply the shifted inverse power method to the companion matrix C(p) ofp to approximate the root of p closest to a to three decimal places. 41. p(x) = x2 + 2x - 2, a = 0 49. 1 1 0 2 —i 0 1 2 4 1 2 48. 1 2i 1 + i .1 0 5. .0 1 -2i "4 - 3i i 2 -2 - i -1 4- i 0 0 1 + i - i 5 + 6i 2i _ 1 -2i 2i -5 - 5i_ "2 1 2 0 0" i 4 4 i 4 0 0 i 6 6 i 6 0 0 l 8 8_ 50. 51. A square matrix is strictly diagonally dominant if the absolute value of each diagonal entry is greater than the sum of the absolute values of the remaining entries in that row. (See Section 2.5.) Use Gerschgorin's Disk Theorem to prove that a strictly diagonally dominant matrix must be invertible. [Hint: See the third Remark after Theorem 4.29.] 52. If A is an n X n matrix, let || A || denote the maximum of the sums of the absolute values of the rows of A; that is, (See Section 7.2.) Prove that ||A|| = max( 2 if A is an eigenvalue of A, then | A | £ || A ||. 53. Let A be an eigenvalue of a stochastic matrix A (see Section 3.7). Prove that | A| SI, [Hint: Apply Exercise 52 to Ar.] 54. 
Prove that the eigenvalues of A = are all real, and locate each of these eigenvalues within a closed interval on the real line. Section 4.6 Applications and the Perron-Frobenius Theorem 325 Applications and the Perron-Frobenius Theorem In this section, we will explore several applications of eigenvalues and eigenvectors. We begin by revisiting some applications from previous chapters. Markov Chains Section 3.7 introduced Markov chains and made several observations about the transition (stochastic) matrices associated with them. In particular, we observed that if P is the transition matrix of a Markov chain, then P has a steady state vector x. That is, there is a vector x such that Px = x. This is equivalent to saying that P has 1 as an eigenvalue. We are now in a position to prove this fact. ThdOFem 4.30 If P is the n X n transition matrix of a Markov chain, then 1 is an eigenvalue of P. Proot Recall that every transition matrix is stochastic; hence, each of its columns sums to 1. Therefore, if) is a row vector consisting of n Is, then jP = j. (See Exercise 13 in Section 3.7.) Taking transposes, we have pTf = (jp)r = jr which implies that j3 is an eigenvector of P with corresponding eigenvalue 1. By Exercise 19 in Section 4.3, P and P have the same eigenvalues, so 1 is also an eigenvalue of P. _-M In fact, much more is true. For most transition matrices, every eigenvalue A satisfies |A| ^ 1 and the eigenvalue 1 is dominant; that is, if A + 1, then jA| < 1. We need the following two definitions: A matrix is called positive if all of its entries are positive, and a square matrix is called regular if some power of it is positive. For example, "3 1" A = 11 6 3 1 2 2 3 is positive but B 2 0 is not. However, B is regular, since B2 is positive. Theorem 4.31 Let P be an n X n transition matrix with eigenvalue A. a. |A| < 1 b. If P is regular and A 1, then |A| < 1. Proof As in Theorem 4.30, the trick to proving this theorem is to use the fact that P has the same eigenvalues as P. (a) Let x be an eigenvector of PT corresponding to A and let jq. be the component of x with the largest absolute value m. Then |x;| ^ \xj\ = m for i = 1,2,..., n. Comparing the A:th components of the equation PTx = Ax, we have Pikx\ + P:k*i + ■■ ■ + p„kx„ = Xxk Chapter 4 Eigenvalues and Eigenvectors (Remember that the rows of PT are the columns of P.) Taking absolute values, we obtain |A[m = |AJ \xk\ = \Xxk\ = + p2kx2 + • • • + pnkxn\ \Plkxl\ + \P2kX2i + ■ ■ ■ + \pnkxJ Purkl + p2k\x2\ + ■ ■ ■ + pnk\xn\ Vim + plkm + ■■■ + pnkm (pik + P2k + '■■ + Pnk)m = m (1) The first inequality follows from the Triangle Inequality in R, and the last equality comes from the fact that the rows of PT sum to 1. Thus, [Ajm m. After dividing by m, we have |AJ ^ 1, as desired. (b) We will prove the equivalent implication: If jAj = 1, then A = 1. First, we show that it is true when P (and therefore PT) is a positive matrix. If |Aj = 1, then all of the inequalities in Equations (1) are actually equalities. In particular, Puki + p2k\x2\ + •■■ + p„k\x„\ = plkm + p2km + •■■ + pnkm Equivalently, Pikif k!) + pikh = o (2) Now, since Pis positive,pik > 0 for i = 1,2,...,«. Also, m — \x-\ £: 0 for i = 1,2,.,.,«. Therefore, each summand in Equation (2) must be zero, and this can happen only if \xt\ = m for / = 1,2,...,«. Furthermore, we get equality in the Triangle Inequality in If? if and only if all of the summands are positive or all are negative; in other words, the pikx{'s all have the same sign. 
This implies that my or x — m — m ■m -my where j is a row vector of n Is, as in Theorem 4.30. Thus, in either case, the eigenspace of P1 corresponding to A is Ex = span( jr). But, using the proof of Theorem 4.30, we see that f = PT j1 — kf, and, comparing components, we find that A = 1. This handles the case where P is positive. If P is regular, then some power of P is positive—say, P . It follows that P must also be positive. (Why?) Since A* and Xk+1 are eigenvalues of P* and Pk+1, respectively, by Theorem 4.18, we have just proved that Xk - Xk+1 = 1. Therefore, Afc(A - 1) = 0, which implies that A = 1, since A = 0 is impossible if jAI = 1. We can now explain some of the behavior of Markov chains that we observed in Chapter 3. In Example 3.64, we saw that for the transition matrix p - [°7 °'2l _ L0.3 0.8J and initial state vector Xq = 0.4" 0.6 0.6 0.4 , the state vectors xk converge to the vector x , a steady state vector for P (i.e., Px = x). We are going to prove that for regular Section 4.6 Applications and the Perron-Frobenius Theorem 327 Markov chains, this always happens. Indeed, we will prove much more. Recall that the state vectors x, satisfy xk = Pkx0. Let's investigate what happens to the powers P as P becomes large. iiätrnn 4.31 The transition matrix P 0.7 0.2 0.3 0.8 0 = det(P - XI) 0.7 - A 0.3 0.2 0.8 - A has characteristic equation A2 - 1.5A + 0.5 = (A - 1)(A - 0.5) so its eigenvalues are A, = 1 and A2 = 0.5. (Note that, thanks to Theorems 4.30 and 4.31, we knew in advance that 1 would be an eigenvalue and the other eigenvalue would be less than 1 in absolute value. However, we still needed to compute A2.) The eigenspaces are span So, taking Q = 2 1 _3 -1 used in Example 4.29 in Section 4.4, we have and E05 = span we know that Q 'PQ 1 0 0 0,5 Now, as k ■ Tr- pí = QDkQ 1 = co, (0.5)* -> 0, so and Pk - 1 0 0 0 2 1 3 -1 2 1 3 -1 0 (0.5)* 1 -1 D. From the method 1 -1 1 0 0 0 1 'M _ 0.4 0.4 -1 0.6 0.6 (Observe that the columns of this "limit matrix" are identical and each is a steady a b state vector for P.) Now let Xq = Then be any initial probability vector (i.e., a + b = 1). "0.4 0.4" fa "0.4a + 0.4b" "0.4* _0.6 0.6_ k .0.6a + Q,6b_ .0.6. xk = PkXo -» Not only does this explain what we saw in Example 3.64, it also tells us that the state 0,1 vectors xk will converge to the steady state vector x 0.6 for any choice of x0! There is nothing special about Example 4.37. The next theorem shows that this type of behavior always occurs with regular transition matrices. Before we can present the theorem, we need the following lemma. LCmma 4.32 Let P be a regular n X n transition matrix. If P is diagonalizable, then the dominant eigenvalue Aj = 1 has algebraic multiplicity 1. Chapter 4 Eigenvalues and Eigenvectors The eigenvalues of P and PT are the same. From the proof of Theorem 4.31(b), Aj = 1 has geometric multiplicity 1 as an eigenvalue of P1. Since P is diagonalizable, so is P , by Exercise 41 in Section 4.4. Therefore, the eigenvalue A t = 1 has algebraic multiplicity 1, by the Diagonalization Theorem. a— TlieOrBltl 4.33 Let P be a regular n X n transition matrix. Then as k —> oo, Pk approaches an n X n matrix L whose columns are identical, each equal to the same vector x. This vector x is a steady state probability vector for P. See Finite Markov Chains by J. G. Kemeny and }. L. Snell (New York: Springer-Verlag, 1976). To simplify the proof, we will consider only the case where P is diagonalizable. The theorem is true, however, without this assumption. 
We diagonalize P as Q~'PQ = D or, equivalent!)', P = QDQ~\ where D = At 0 0 A2 0 0 From Theorems 4.30 and 4.31, we know that each eigenvalue A, either is 1 or satisfies jAfj < 1. Hence, as k —»oc, Af approaches 1 or 0 for i = 1,..., n. It follows that FT approaches a diagonal matrix—say, D*—each of whose diagonal entries is 1 or 0. Thus, Pk = QDkQ'~1 approaches L = QD*Q~1. We write lim P1 = L We are taking some liberties with the notion of a limit. Nevertheless, these steps should be intuitively clear. Rigorous proofs follow from the properties of limits, which you may have encountered in a calculus course. Rather than get sidetracked with a discussion of matrix limits, we will omit the proofs. Observe that PL P lim Pk lc-><*> lim PP* lim P*fl Therefore, each column of L is an eigenvector of P corresponding to A! = 1. To see that each of these columns is a probability vector (i.e., I is a stochastic matrix), we need only observe that, if j is the row vector with n Is, then jL = j lim Pk = lim jPk = lim j = j fc-+=o k-*l since Pk is a stochastic matrix, by Exercise 14 in Section 3.7. Exercise 13 in Section 3.7 now implies that L is stochastic. We need only show that the columns of L are identical. The t'th column of I is just Le,, where e,< is the ith standard basis vector. Let Vj, v2,..., v„ be eigenvectors of P forming a basis of U", with Vj corresponding to At = 1. Write e« = CjVj + c2v2 + • • ■ for scalars cl5 c2,..., c„. Then, by Theorem 4.19, Pkei = CjlfcVi + c2A2v2 + • + c v + C A V By Lemma 4.32, A^ j= 1 for j # 1, so, by Theorem 4.31(b), |A,j < 1 for j # 1. Hence, Aj —> 0 as k -» oo, for j ^ 1. It follows that Le; = lim Pkt, = CjVj Section 4.6 Applications and the Perron-Frobenius Theorem 329 In other words, column i of I is an eigenvector corresponding to Aj = 1. But we have shown that the columns of L are probability vectors, so Le; is the unique multiple x of Vj whose components sum to 1. Since this is true for each column of L, it implies that all of the columns of L are identical, each equal to this vector x, -viMM Remark Since L is a stochastic matrix, we can interpret it as the long range transition matrix of the Markov chain. That is, represents the probability of being in state i, having started from state /', if the transitions were to continue indefinitely. The fact that the columns of L are identical says that the starting state does not matter, as the next example illustrates. tezmm 4.38 Recall the rat in a box from Example 3.65. The transition matrix was P = "o 1 3 1 -3 1 2 0 2 3 1 - 2 2 3 0_ We determined that the steady state probability vector was Hence, the powers of P approach I = -1 1 1 - 4 4 4 3 3 3 8 8 8 3 3 3 -8 8 8- 0.250 0.375 0.375 0.250 0.250 0.375 0.375 0.375 0.375 from which we can see that the rat will eventually spend 25% of its time in compartment 1 and 37.5% of its time in each of the other two compartments. -*—■- We conclude our discussion of regular Markov chains by proving that the steady state vector x is independent of the initial state. The proof is easily adapted to cover the case of state vectors whose components sum to an arbitrary constant—say, s. In the exercises, you are asked to prove some other properties of regular Markov chains. ThBOreil) 4.34 Let P be a regular n X n transition matrix, with x the steady state probability vector - for P, as in Theorem 4.33. Then, for any initial probability vector x0, the sequence of iterates xk approaches x. PrOOf Let %2 Chapter 4 Eigenvalues and Eigenvectors where xx + x2 H-----E x„ = 1. Since Xj. 
= P^x0>we must show that lim Pkx0 = x. Now, by Theorem 4.33, the long range transition matrix is L = [x x Therefore, x ] and lim P fc-»oo lim Pkx0 = (lim Pk)x0 = Lx0 k~*X fc—»=D = X X = + x2x + ■ • • + x„x = (*! + X2 + m • • + X„)x = X Example 4.39 Population Growth We return to the Leslie model of population growth, which we first explored in Section 3.7. In Example 3.67 in that section, we saw that for the Leslie matrix "0 4 3" 0.5 0 0 .0 0.25 0_ iterates of the population vectors began to approach a multiple of the vector 18" 6 1. In other words, the three age classes of this population eventually ended up in the ratio 18:6:1. Moreover, once this state is reached, it is stable, since the ratios for the following year are given by Lx = 0 0.5 0 4 0 0.25 18 "27 6 = 9 = 1.5x 1 1.5 and the components are still in the ratio 27:9:1.5 = 18:6:1. Observe that 1.5 represents the growth rate of this population when it has reached its steady state. We can now recognize that x is an eigenvector of L corresponding to the eigenvalue A = 1.5. Thus, the steady state growth rate is a positive eigenvalue of I, and an eigenvector corresponding to this eigenvalue represents the relative sizes of the age classes when the steady state has been reached. We can compute these directly, without having to iterate as we did before. Find the steady state growth rate and the corresponding ratios between the age classes for the Leslie matrix L above. Solution We need to find all positive eigenvalues and corresponding eigenvectors of L. The characteristic polynomial of L is -A 4 3 det(I - XI) = 0.5 -A 0 0 0.25 -A = -AJ + 2A + 0.375 Section 4.6 Applications and the Perron-Frobenius Theorem 331 so we must solve — A' + 2A + 0.375 = 0 or, equivalently, 8A'' — 16A — 3 = 0. Factoring, we have (2A - 3)(4A2 + 6A + 1) = 0 (See Appendix D.) Since the second factor has only the roots (— 3 + V5)/4 ~ — 0.19 and ( — 3 - V5)/4 ~ —1.31, the only positive root of this equation is A = f = 1.5. The corresponding eigenvectors are in the null space of L — 1.51, which we find by row reduction: "-1.5 4 3 0" "l 0 -18 0" [L - \.5I\0] = 0.5 -1.5 0 0 -► 0 1 -6 0 0 0.25 -1.5 0. _0 0 0 0 Thus, if x = x2 and x2 = 6x3. That is, is an eigenvector corresponding to A = 1.5, it satisfies xx = 18x3 £..5 = 1 "18x3" } { "18" \ 1 6X3 ■ = span \ 6 \ - x3 _ J 1. ! Hence, the steady state growth rate is 1.5, and when this rate has been reached, the age classes are in the ratio 18:6:1, as we saw before. In Example 4.39, there was only one candidate for the steady state growth rate: the unique positive eigenvalue of L. But what would we have done if L had had more than one positive eigenvalue or none? We were also apparently fortunate that there was a corresponding eigenvector all of whose components were positive, which allowed us to relate these components to the size of the population. We can prove that this situation is not accidental; that is, every Leslie matrix has exactly one positive eigenvalue and a corresponding eigenvector with positive components. Recall that the form of a Leslie matrix is (3) Since the entries st represent survival probabilities, we will assume that they are all nonzero (otherwise, the population would rapidly die out). We will also assume that at least one of the birth parameters bt is nonzero (otherwise, there would be no births and, again, the population would die out). With these standing assumptions, we can now prove the assertion we made above as a theorem. »! 
$$L = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_{n-1} & b_n \\ s_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & s_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & s_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & s_{n-1} & 0 \end{bmatrix} \tag{3}$$

Theorem 4.35 Every Leslie matrix has a unique positive eigenvalue and a corresponding eigenvector with positive components.
In other words, Ax = Axx, with x and Ax both positive. Proof For some nonzero vectors x, Ax s Ax for some scalar A. When this happens, then A(kx) S A(fcx) for all k > 0; thus, we need only consider unit vectors x. In Chapter 7, we will see that A maps the set of all unit vectors in IR" (the unit sphere) into a "generalized ellipsoid." So, as x ranges over the nonnegative vectors on this unit sphere, there will be a maximum value of A such that Ax s Ax. (See Figure 4.19.) Denote this number by \Y and the corresponding unit vector by xv y y ------.-->x Figure 4.18 y Figure 4.19 334 Chapter 4 Eigenvalues and Eigenvectors We now show that Axx = AjXj. If not, then Axx > \{Xi, and, applying A again, we obtain A(Axj) > ACAjXj) = Aj(Axi) where the inequality is preserved, since A is positive. (See Exercise 40 and Section 3.7 Exercise 36.) But then y = (l/|JAx1||)Axj is a unit vector that satisfies Ay > Ajy, so there will be some A2 > A! such that Ay > A2y. This contradicts the fact that Ax was the maximum value with this property. Consequently, it must be the case that Axx = XyXy, that is, Aj is an eigenvalue of A. Now A is positive and x, is positive, so AjXj = Axt > 0. This means that A] > 0 and Xl > 0, which completes the proof of (a) and (b). To prove (c), suppose A is any other (real or complex) eigenvalue of A with corresponding eigenvector z. Then Az = Az, and, taking absolute values, we have A z = A zl > Az = Az IAI z (4) where the middle inequality follows from the Triangle Inequality. (See Exercise 40.) Since jz[ > 0, the unit vector u in the direction of |z| is also positive and satisfies Au S jA|u. By the maximality of A [ from the first part of this proof, we must have |A| Aj. In fact, more is true. It turns out that At is dominant, so |A| < A[ for any eigenvalue A + At. It is also the case that A] has algebraic, and hence geometric, multiplicity 1. We will not prove these facts. Perron's Theorem can be generalized from positive to certain nonnegative matrices. Frobenius did so in 1912. The result requires a technical condition on the matrix. A square matrix A is called reducible if, subject to some permutation of the rows and the same permutation of the columns, A can be written in block form as B C O D where B and D are square. Equivalently, A is reducible if there is some permutation matrix P such that PAP (See page 187.) For example, the matrix A = 2 0 0 1 3 4 2 1 5 5 1 2 7 3 0 6 0 0 2 1 1 0 0 7 2 is reducible, since interchanging rows 1 and 3 and then columns 1 and 3 produces 7 2 1 3 0 1 2 4 5 5 0 0 2 1 3 0 0 6 2 1 0 0 1 7 2 Section 4.6 Applications and the Perron-Frobenius Theorem (This is just PAP1, where 0 0 10 0 0 10 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 Check this!) A square matrix A that is not reducible is called irreducible. If Ak > O for some k, then A is called primitive. For example, every regular Markov chain has a primitive transition matrix, by definition. It is not hard to show that every primitive matrix is irreducible, (Do you see why? Try showing the contrapositive of this.) Theorem 4.37 The Perron-Frobenius Theorem Let A be an irreducible nonnegative n X n matrix. Then A has a real eigenvalue X1 with the following properties: a. A, > 0 b. A; has a corresponding positive eigenvector. c. If A is any other eigenvalue of A, then A; i= A;. If A is primitive, then this inequality is strict. d. If A is an eigenvalue of A such that jAj = Ab then A is a (complex) root of the equation A" - A" = 0. e. At has algebraic multiplicity 1. See Matrix Analysis by R. A. Horn and C. 
R. Johnson (Cambridge, England: Cambridge University Press, 1985). The interested reader can find a proof of the Perron-Frobenius Theorem in many texts on nonnegative matrices or matrix analysis. The eigenvalue A2 is often called the Perron root of A, and a corresponding probability eigenvector (which is necessarily unique) is called the Perron eigenvector of A. Linear Recurrence Relations The Fibonacci numbers are the numbers in the sequence 0, 1,1, 2, 3, 5, 8,13, 21,..., where, after the first two terms, each new term is obtained by summing the two terms preceding it. If we denote the nth Fibonacci number by/„, then this sequence is completely defined by the equations/0 = 0,/j = 1, and, for n a 2, f n fn-í "b f n-i This last equation is an example of a linear recurrence relation. We will return to the Fibonacci numbers, but first we will consider linear recurrence relations somewhat more generally. Chapter 4 Eigenvalues and Eigenvectors Leonardo of Pisa (1170-1250), pictured left, is better known by his nickname, Fibonacci, which means "son of Bonaccio." He wrote a number of important books, many of which have survived, including Liber abaci and Liber quadratorum. The Fibonacci sequence appears as the solution to a problem in Liber abaci: "A certain man put a pair of rabbits in a place surrounded on all sides by a wall. How many pairs of rabbits can be produced from that pair in a year if it is supposed that every month each pair begets a new pair which from the second month on becomes productive?" The name Fibonacci numbers was given to the terms of this sequence by the French mathematician Edouard Lucas (1842-1891). Definition Let (*„) = (*„, x;, Xi,...) be a sequence of numbers that is defined as follows: 1. x0 = a0, Xj = av ..., Xfc-i = ak-v where a0, ax, ..., ak..x are scalars. 2. For all n s k, x„ = cxxn^ + c2xn~2 + • • • + CfXn^k, where c1; c2,..., ck are scalars. If ck # 0, the equation in (2) is called a linear recurrence relation of order k. The equations in (1) are referred to as the initial conditions of the recurrence. Thus, the Fibonacci numbers satisfy a linear recurrence relation of order 2. Remarks • If, in order to define the rtfh term in a recurrence relation, we require the (n — k)th term but no term before it, then the recurrence relation has order k. • The number of initial conditions is the order of the recurrence relation. • It is not necessary that the first term of the sequence be called x0. We could start at xx or anywhere else. « It is possible to have even more general linear recurrence relations by allowing the coefficients ct to be functions rather than scalars and by allowing an extra, isolated coefficient, which may also be a function. An example would be the recurrence x„ 2x,, n xn-2 *»~3 n + n We will not consider such recurrences here. Consider the sequence (x„) defined by the initial conditions xx = 1, x2 — 5 and the recurrence relation x„ = 5x„_, - 6x„._2 for n 5:2. Write out the first five terms of this sequence. Solution We are given the first two terms. We use the recurrence relation to calculate the next three terms. We have x3 = 5x2 - 6xj = 5 5 - - 6-1 = 19 = 5x3 — 6x2 = 5 19 - 6 • 5 = 65 x5 5x4 6x3 = 5 65 - 6-19 = 211 so the sequence begins 1, 5,19, 65, 211, Section 4.6 Applications and the Perron-Frobenius Theorem 337 Clearly, if we were interested in, say, the 100th term of the sequence in Example 4.40, then the approach used there would be rather tedious, since we would have to apply the recurrence relation 98 times. 
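A computer can, of course, carry out the iteration for us. Here is a minimal sketch in plain Python for the recurrence of Example 4.40 (the function name nth_term is our own).

def nth_term(n):
    # Iterate x_n = 5 x_{n-1} - 6 x_{n-2} with x_1 = 1 and x_2 = 5 (Example 4.40).
    x_prev, x_curr = 1, 5
    if n == 1:
        return x_prev
    for _ in range(n - 2):              # apply the recurrence n - 2 times
        x_prev, x_curr = x_curr, 5 * x_curr - 6 * x_prev
    return x_curr

print([nth_term(n) for n in range(1, 6)])   # [1, 5, 19, 65, 211], as computed above
print(nth_term(10))                          # 58025

Running the loop is quick even for the 100th term, but it produces the terms one at a time and gives no insight into how x_n depends on n.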
It would be nice if we could find an explicit formula for xn as a function of n. We refer to finding such a formula as solving the recurrence relation. We will illustrate the process with the sequence from Example 4.40. To begin, we rewrite the recurrence relation as a matrix equation. Let A = and introduce vectors x„ = for « > 2. Thus, x, x2 "5" x3 Xl. .1. , A3 19" "65" ,x4 = = 5_ *3. , and so on. Now observe that, for n & 2, we have Ax„. X„-x *»-2j 5x 6x„ n — 1 UiA,n—2 Xn~i x„ Notice that this is the same type of equation we encountered with Markov chains and Leslie matrices. As in those cases, we can write x„ = Ax„„! = A2x„...2 = • • • = An'2x2 We now use the technique of Example 4.29 to compute the powers of A. The characteristic equation of A is A2 ~ 5A + 6 = 0 from which we find that the eigenvalues are At = 3 and A2 = 2. (Notice that the form of the characteristic equation follows that of the recurrence relation. If we write the recurrence as xn — 5x„_i 4- 6x„_2 = 0, it is apparent that the coefficients are exactly the same!) The corresponding eigenspaces are E3 = span and £2 span Setting P 3 2 1 1 , we know that P lAP = D ~ 3 0 0 2 Then A = PDP'1 and Ak = pDkp i _ 3 2 LI 1 3 2 1 1 3*+1 - 3* 0 0 2k 3k 0 0 2k 3 2 1 1 1 -2 -1 3 *+i -2(3^1) 4- 3(2i+1) -2(3fc) + 3(2*) It now follows that xn 2 "3"_I — 2"-1 -2(3""') 4- 3(2""')" "5" 3" - 2" X2 - 3«-2 _ 2»-2 -2(3"-2) 4- 3(2"^2)_ 1 Chapter 4 Eigenvalues and Eigenvectors from which we read off the solution xn - 3" - 2". (To check our work, we could plug in n — 1, 2,..., 5 to verify that this formula gives the same terms that we calculated using the recurrence relation. Try it!) Observe that x„ is a linear combination of powers of the eigenvalues. This is necessarily the case as long as the eigenvalues are distinct [as Theorem 4.38(a) will make explicit]. Using this observation, we can save ourselves some work. Once we have computed the eigenvalues At = 3 and A2 = 2, we can immediately write where q and c2 are to be determined. Using the initial conditions, we have 1 = x1 = c^1 + c22' = 3C] + 2c2 when n = 1 and 5 ~ X2 - q32 + c22 when n = 2. We now solve the system 2 _ 9c «2 9Cj + 4c2 = 5 for Cj and c2 to obtain cx = 1 and c2 = — 1. Thus, xn = 3" — 2", as before. This is the method we will use in practice. We now illustrate its use to find an explicit formula for the Fibonacci numbers. Example 4.41 Solve the Fibonacci recurrence/0 = 0,/; = 1, and/„ = fn^l + fn^2 for n S 2. "■ Writing the recurrence as/„ -/„„,...../„_2 - 0, we see that the characteris- tic equation is A2 — A — 1 = 0, so the eigenvalues are 1 + V5 , ^ 1 - V5 Aj =------ and A2 =------- Jacques Binet (1786-1856) made contributions to matrix theory, number theory, physics, and astronomy. He discovered the rule for matrix multiplication in 1812. Binet's formula for the Fibonacci numbers is actually due to Euler, who published it in 1765; however, it was forgotten until Binet published his version in 1843. Like Cauchy, Binet was a royalist, and he lost his university position when Charles X abdicated in 1830. He received many honors for his work, including his election, in 1843, to the Academie des Sciences. It follows from the discussion above that the solution to the recurrence relation has the form /„ = CjA" + c2A? = q----- + c2--— for some scalars cx and c2. Using the initial conditions, we find 0 = Jo = qA? + c2k\ = cx + c2 and 1 = fi - c\A\ + c2X\ = c, 1 + V5 '"2 1 - V5 Solving for q and c2, we obtain cx = 1/VI and c2 = -1/V5. 
Hence, an explicit formula for the nth Fibonacci number is
$$f_n = \frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{n} - \frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^{n} \tag{5}$$
Formula (5) is remarkable because it is expressed in terms of the irrational number $\sqrt{5}$, yet the Fibonacci numbers are all integers! Try plugging in a few values of n to see how the $\sqrt{5}$ terms cancel out to leave the integer values $f_n$. Formula (5) is known as Binet's formula.

The method we have just outlined works for any second order linear recurrence relation whose associated eigenvalues are all distinct. When there is a repeated eigenvalue, the technique must be modified, since the diagonalization method we used may no longer work. The next theorem summarizes both situations.

Theorem 4.38 Let $x_n = ax_{n-1} + bx_{n-2}$ be a recurrence relation that is satisfied by a sequence $(x_n)$. Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of the associated characteristic equation $\lambda^2 - a\lambda - b = 0$.
a. If $\lambda_1 \neq \lambda_2$, then $x_n = c_1\lambda_1^n + c_2\lambda_2^n$ for some scalars $c_1$ and $c_2$.
b. If $\lambda_1 = \lambda_2 = \lambda$, then $x_n = c_1\lambda^n + c_2 n\lambda^n$ for some scalars $c_1$ and $c_2$.
In either case, $c_1$ and $c_2$ can be determined using the initial conditions.

Proof (a) Generalizing our discussion above, we can write the recurrence as $\mathbf{x}_n = A\mathbf{x}_{n-1}$, where
$$A = \begin{bmatrix} a & b \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad \mathbf{x}_{n-1} = \begin{bmatrix} x_{n-1} \\ x_{n-2} \end{bmatrix}$$
Since A has distinct eigenvalues, it can be diagonalized. The rest of the details are left for Exercise 53.

(b) We will show that $x_n = c_1\lambda^n + c_2 n\lambda^n$ satisfies the recurrence relation $x_n = ax_{n-1} + bx_{n-2}$ or, equivalently,
$$x_n - ax_{n-1} - bx_{n-2} = 0 \tag{6}$$
if $\lambda^2 - a\lambda - b = 0$. Since
$$x_{n-1} = c_1\lambda^{n-1} + c_2(n-1)\lambda^{n-1} \quad\text{and}\quad x_{n-2} = c_1\lambda^{n-2} + c_2(n-2)\lambda^{n-2}$$
substitution into Equation (6) yields
$$\begin{aligned}
x_n - ax_{n-1} - bx_{n-2} &= (c_1\lambda^n + c_2 n\lambda^n) - a\bigl(c_1\lambda^{n-1} + c_2(n-1)\lambda^{n-1}\bigr) - b\bigl(c_1\lambda^{n-2} + c_2(n-2)\lambda^{n-2}\bigr) \\
&= c_1(\lambda^n - a\lambda^{n-1} - b\lambda^{n-2}) + c_2\bigl(n\lambda^n - a(n-1)\lambda^{n-1} - b(n-2)\lambda^{n-2}\bigr) \\
&= c_1\lambda^{n-2}(\lambda^2 - a\lambda - b) + c_2 n\lambda^{n-2}(\lambda^2 - a\lambda - b) + c_2\lambda^{n-2}(a\lambda + 2b) \\
&= c_1\lambda^{n-2}(0) + c_2 n\lambda^{n-2}(0) + c_2\lambda^{n-2}(a\lambda + 2b) \\
&= c_2\lambda^{n-2}(a\lambda + 2b)
\end{aligned}$$
But, since $\lambda$ is a double root of $\lambda^2 - a\lambda - b = 0$, we must have $a^2 + 4b = 0$ and $\lambda = a/2$, using the quadratic formula. Consequently, $a\lambda + 2b = a^2/2 + 2b = -4b/2 + 2b = 0$, so
$$x_n - ax_{n-1} - bx_{n-2} = c_2\lambda^{n-2}(a\lambda + 2b) = c_2\lambda^{n-2}(0) = 0$$
Hence, the unique solution to the differential equation that satisfies the initial condition is x = x0ekt Suppose we have n differentiable functions of t— say, xx, x2,..., x„—that satisfy a system of differential equations x] qxxxx 4" (ii2x2 4* * * * 4" aXnxn X2 = «21*1 + «22*2 + • • ' + a2„X„ Xn ®n\x\ ' ^n2x2 -r * • * 4" dnnXn We can write this system in matrix form as x' = Ax, where ~xft)~ ~x[(tY x2(t) , x'(t) = .x„(t)_ and A «12 ' ' <*2n Now we can use matrix methods to help us find the solution. Section 4.6 Applications and the Perron-Frobenius Theorem 341 First, we make a useful observation. Suppose we want to solve the following system of differential equations: X% —" 5Xo Each equation can be solved separately, as above, to give *i = Qe2t x2 = C2est where Q and C2 are constants. Notice that, in matrix form, our equation x' = Ax has a diagonal coefficient matrix A = 2 0 0 5 and the eigenvalues 2 and 5 occur in the exponentials e2t and e5t of the solution. This suggests that, for an arbitrary system, we should start by diagonalizing the coefficient matrix, if possible. Solve the following system of differential equations: X\ ~~ Xy H~ x2 — 3Xj iBlUllOI Here the coefficient matrix is A — 1 2 3 2 and we find that the eigenvalues "2] . _-r are A, = 4 and A2 = -1, with corresponding eigenvectors v = II and v; = respectively. Therefore, A is diagonalizable, and the matrix P that does the job is P = fv, v2] 1 2 -1 3 1 We know that P~lAP = 4 0 0 -1 = D Let x = Py (so that x = Py') and substitute these results into the original equation x' = Ax to get Py' = APy or, equivalently, This is just the system whose general solution is y' = P "APy = Dy y'i = fi = C2e~' or y 342 Chapter 4 Eigenvalues and Eigenvectors To find x, we just compute x = Py = 2 -1 3 1 C,e4i Ce-' 2Qe4t - C2e-! 30^' + C-fi'' so xx = 2C\eA> — C2e ' and x2 = 3Cie'i' + C2e \ (Check that these values satisfy the given system.) Observe that we could also express the solution in Example 4.43 as f 4 x = Qe -1 1 = Qe^Vj + C2e'W2 This technique generalizes easily tonXn systems where the coefficient matrix is diagonalizable. The next theorem, whose proof is left as an exercise, summarizes the situation. Th£0r8ni 4.40 LetAbean«X«diagonalizablematrixandletP = [V] v2 ••■ v„ ] be such that F'lAP Ai 0 0 A2 0 0 Then the general solution to the system x' = Ax is x = de^'vi + C2e/Vv"2 + • • • + C„eK\ The next example involves a biological model in which two species live in the same ecosystem. It is reasonable to assume that the growth rate of each species depends on the sizes of both populations. (Of course, there are other factors that govern growth, but we will keep our model simple by ignoring these.) If Xi(t) and x2(f) denote the sizes of the two populations at time f, then x,'(f) and x2(t) are their rates of growth at time t. Our model is of the form x[(t) = flXi(f) + bx2(t) x2(t) = cxxit) + dx2(t) where the coefficients a, b, c, and d depend on the conditions. Example 4.44 Raccoons and squirrels inhabit the same ecosystem and compete with each other for food, water, and space. Let the raccoon and squirrel populations at time t years be given by r(f) and s(t), respectively. In the absence of squirrels, the raccoon growth rate is r\t) = 2.5r(f), but when squirrels are present, the competition slows the raccoon growth rate to r'(f) = 2.5r(i) - s(t). The squirrel population is similarly affected by the raccoons. 
In the absence of raccoons, the growth rate of the squirrel population is s'(t) = 2.5s(i)> and the population growth rate for squirrels when they are sharing the ecosystem with raccoons is s'{t) = ~0.25r(f) + 2.5s(f). Suppose that initially Section 4.6 Applications and the Perron-Frobenius Theorem 343 there are 60 raccoons and 60 squirrels in the ecosystem. Determine what happens to these two populations. Solution Our system is x = Ax, where >(*) = x(f) = sit) and A 2.5 -0.25 -1.0 2.5 The eigenvalues of A are A! = 3 and A2 = 2, with corresponding eigenvectors Vj = andv, = By Theorem 4.40, the general solution to our system is xit) = Qe3^ + C2e2tv2 = Qe3 + C2e2' (7) The initial population vector is x(0) = tion (7), we have HO) 5(0) "60" ^60_ so, setting t = 0 in Equa- "2" "60" + c2 — 1, ,60_ Q Solving this equation, we find Q = 15 and C2 = 45. Hence, x( 15e3' "-2" + 45e2' 2" i 1 from which we find r(f) = -30e3£ + 90e2' and 5(f) = 15e31 + 45e2t. Figure 4.20 shows the graphs of these two functions, and you can see clearly that the raccoon population dies out after a little more than 1 year. (Can you determine exactly when it dies out?) A We now consider a similar example, in which one species is a source of food for the other. Such a model is called & predator-prey model. Once again, our model will be drastically oversimplified in order to illustrate its main features. Raccoon and squirrel populations 344 Chapter 4 Eigenvalues and Eigenvectors a + bi Example 4.45 Robins and worms cohabit an ecosystem. The robins eat the worms, which are their only source of food. The robin and worm populations at time f years are denoted by r(t) and w(t), respectively, and the equations governing the growth of the two populations are r'(f) = wit) - 12 w'it) = -rit) + 10 (8) If initially 6 robins and 20 worms occupy the ecosystem, determine the behavior of the two populations over time. Solution The first thing we notice about this example is the presence of the extra constants, —12 and 10, in the two equations. Fortunately, we can get rid of them with a simple change of variables. If we let r(t) = x{t) + 10 and wit) = yit) + 12, then r'{t) = x'{t) and w'{t) = y'(t). Substituting into Equations (8), we have x'it) = yit) y'it) = -x{t) which is easier to work with. Equations (9) have the form x' = Ax, where A 0 f (9) 1 0 . Our new initial conditions are xiO) = riO) - 10 = 6 - 10 = -4 and yiO) = w(0) - 12 -4~ 20 - 12 = sox(0) = Proceeding as in the last example, we find the eigenvalues and eigenvectors of A. corresponding eigenvectors are also complex—namely, vl = Theorem 4.40, our solution has the form x(t) = Qe'Vj + C2e~'V2 = Qe'' 1 ~ i and A2 = — i. "1" i 1 andv2 = — i •By "1" + C2e'-" I _ / — i From x(0) , we get " 1 " "-4* . + c2 — 8_ c, whose solution is Q = —2 — 4i and C2 = —2 + 4i, So the solution to system (9) is 1 xit) = (-2 - 4i)e" + (-2 + 4j)e~" What are we to make of this solution? Robins and worms inhabit a real world-yet our solution involves complex numbers! Fearlessly proceeding, we apply Euler's formula e" = cos t + i sin t Section 4.6 Applications and the Perron-Frobenius Theorem 345 here's nm&Mim PROBLEM 1 CAHt FIGURE OUT. VJUATS OCtt, THMS MR\CW ONE. lOS HME TO \JSt CALCULUS AHO M&GlNAJfl NUMBERS FORTHVS LUMBERS?' ELECRTEEH THtRrt-'mwe m> RL"WCßt ITS Mmi£ CDHfUSIHG AT FIKST. HOW WD )W lEKRN ML TUB'? W£ NEVER EMEN GOtC TO SCHOOL'. tNSTHKT. TföERS N3£ BORK WW \T. CALVIN AND HOBBES © 1988 Watterson. Reprinted with permission of UNIVERSAL PRESS SYNDICATE. All rights reserved. 
(Appendix C) to get e !t = cos(-f) + isin(-f) = cosf - i sin f. Substituting, we have x(f) = (-2 - 4i)(cos t + i sin t) + (-2 + 4i')(cos t - i sin t) + (-2 cos f + 4 sin t) + i(—4 cos t - 2 sin f) (4 cos t + 2 sin f) + i(-2 cos f + 4 sin f) (-2 cos f + 4 sin f) + i(4 cos f + 2 sin f) (4 cos f + 2 sin t) + i (2 cos t — 4 sin f) —4 cos f + 8 sin t 8 cos f + 4 sin f This gives %(f) = —4 cos t + 8 sin t and/(f) = 8 cos t + 4 sin f. Putting evernhing in terms of our original variables, we conclude that and r(f) = x(t) + 10 = -4 cos f + 8 sin f + 10 w(f) = /(f) + 12 = 8 cos f + 4 sin f + 12 Chapter 4 Eigenvalues and Eigenvectors So our solution is real after all! The graphs of r(f) and w(t) in Figure 4.21 show that the two populations oscillate periodically. As the robin population increases, the worm population starts to decrease, but as the robins' only food source diminishes, their numbers start to decline as well. As the predators disappear, the worm population begins to recover. As its food supply increases, so does the robin population, and the cycle repeats itself. This oscillation is typical of examples in which the eigenvalues are complex. Plotting robins, worms, and time on separate axes, as in Figure 4.22, clearly reveals the cyclic nature of the two populations. We conclude this section by looking at what we have done from a different point of view. If x = x(t) is a differentiable function of t, then the general solution of the ordinary differential equation x' = ax is x — ceat, where c is a scalar. The systems of linear differential equations we have been considering have the form x' = Ax, so if we simply plowed ahead without thinking, we might be tempted to deduce that the solution would be x - ceAt, where c is a vector. But what on earth could this mean? On the right-hand side, we have the number e raised to the power of a matrix. This appears to be nonsense, yet you will see that there is a way to make sense of it. Let's start by considering the expression eA. In calculus, you learn that the function ex has a power series expansion x x ?x = 1 + x + + — + 21 31 that converges for every real number x. By analogy, let us define eA = I + A + A1 A 21 3! The right-hand side is just defined in terms of powers of A, and it can be shown that it converges for any real matrix A, So now eA is a matrix, called the exponential of A. But how can we compute eA or eAt?. For diagonal matrices, it is easy. Example 4.48 Compute eDt for D = Solution From the de 4 0~ _0 -1 finition, ---,-^ we have n, (Dt? (Dt? >Dl = 1 + Dt+ -~-L + ~~ + • 2! 3! 1 0 0 1 + 4t 0 0 - 1 + (4f) + |(4r)2 + i(4f)3 + 0 e4t 0 L 0 C (4r)2 0 0 (-r)2 + 3T (it)3 0 ] o (-tri l + (-t) + U-tf + u-tr + The matrix exponential is also nice if A is diagonalizable. Section 4.6 Applications and the Perron-Frobenius Theorem 347 iiitlliplS 4,4? Compute eA for A 1 2 3 2 Solution In Example 4.43, we found the eigenvalues of A to be Ax = 4 and A2 = -1, with corresponding eigenvectors v, "2" andv2 = .3. 1 respectively. Hence, with P = [v. v2 , we have P~ lAP = D = Since .A - POP \we haveA^ = PD,CP_1, so 6 =I + A+2Í + ¥ + = PIP'1 + POP"1 -^PD2P_1 + —PD-'P"1 + • 2! 3! D D P I + D + — + — 2! 3! = Pel>P~l '2 -\ .3 1. ~2e4 + 3e" 3e4 - 3e" 2e4 3e4 2 -1 3 1. - 2c 1 + 2e"1 We are now in a position to show that our bold (and seemingly foolish) guess at an "exponential" solution of x' = Ax was not so far off after all! ThCOreil) 4.41 Let A be an n X n diagonalizable matrix with eigenvalues A1; A2,..., A„. 
Then the general solution to the system x' = Ax is x = eAtc, where c is an arbitrary constant vector. If an initial condition x(0) is specified, then c = x(0). Proof Let P diagonalize A. Then A = PDP'\ and, as in Example 4,47, £m = PeiMp-\ Hence, we need to check that x' = Ax is satisfied by x = PeD'P 'c. Now, everything is constant except for eDt, so ix d dt dt —{PeDtp-lc) = P~-ieD<)p-lc dt (10) If D = A] 0 0 A, 0 0 348 Chapter 4 Eigenvalues and Eigenvectors For example, see Linear Algebra by S. H. Friedberg, A. J. Insel, and L. E. Spence (Englewood Cliffs, NJ: Prentice-Hall, 1979). then Taking derivatives, we have 0 0 0 0 Jut Pt\ - dt (eDt) ~AeKt) dt Kxeh 0 A, 0 0 A2 0 0 = DeDt 0 0 0 0 Substituting this result into Equation (10), we obtain x' = PDeD'P'lc = PDP^Pe'^P lc = (PDP~l)(PeDtP~l)c = AeAtc = Ax as required. The last statement follows easily from the fact that if x = x(t) = eA'c, then x(0) = eA'°c = e°c = Ic = c In fact, Theorem 4.41 is true even if A is not diagonalizable, but we will not prove this. Computation of matrix exponentials for nondiagonalizable matrices requires the Jordan normal form of a matrix, a topic that may be found in more advanced linear algebra texts. Ideally, this short digression has served to illustrate the power of mathematics to generalize and the value of creative thinking. Matrix exponentials turn out to be very important tools in many applications of linear algebra, both theoretical and applied. /a + wj Discrete Linear Dynamical Systems We conclude this chapter as we began it—by looking at dynamical systems. Markov chains and the Leslie model of population growth are examples of discrete linear dynamical systems. Each can be described by a matrix equation of the form xk+1 — Axk Section 4.6 Applications and the Perron-Frobenius Theorem 349 where the vector xk records the state of the system at "time" k and A is a square matrix. As we have seen, the long-term behavior of these systems is related to the eigenvalues and eigenvectors of the matrix A. The power method exploits the iterative nature of such dynamical systems to approximate eigenvalues and eigenvectors, and the Perron-Frobenius Theorem gives specialized information about the long-term behavior of a discrete linear dynamical system whose coefficient matrix A is nonnegative. When A is a 2 X 2 matrix, we can describe the evolution of a dynamical system geometrically. The equation xk+1 = Axk is really an infinite collection of equations. Beginning with an initial vector Xq, we have: Xj = Axq x2 = Axj x3 = Ax2 The set {xq, xlt x2, ...} is called a trajectory of the system. (For graphical purposes, we will identify each vector in a trajectory with its head so that we can plot it as a point.) Note that xk = Akx0. Example 4.48 Let A = 0.5 0 _0 0.8 in the trajectories with the following initial vectors: . For the dynamical system xk+l = Axk, plot the first five points (a) Xo (b) Xo = 0 -5 Solution (a) We compute x, = Ax0 0.625 0 ,x4 Ax3 = 0.3125 0 (c) Xg = 2.5 0 (d) xo 1.25 0 Xq — -^Xt — These are plotted in Figure 4.23, and the points are connected to highlight the trajectory. Similar calculations produce the trajectories marked (b), (c), and (d) in Figure 4.23. 350 Chapter 4 Eigenvalues and Eigenvectors In Example 4.48, every trajectory converges to 0. The origin is called an attractor in this case. We can understand why this is so from Theorem 4.19. The matrix A in Example 4.48 has eigenvectors and corresponding to its eigenvalues 0.5 and 0.8, respectively. (Check this.) 
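One way to carry out this check, and to generate trajectories like those in Figure 4.23, is numerically. The following is a minimal sketch (Python with NumPy); the helper name trajectory and the initial vector [5, 5] are our own choices for illustration.

import numpy as np

A = np.array([[0.5, 0.0],
              [0.0, 0.8]])              # the matrix of Example 4.48

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                       # [0.5 0.8]
print(eigenvectors)                      # the columns are the standard basis vectors e1 and e2

def trajectory(A, x0, steps):
    # Return the points x0, x1, ..., x_steps of the system x_{k+1} = A x_k.
    points = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        points.append(A @ points[-1])
    return points

for x in trajectory(A, [5.0, 5.0], 5):
    print(x)     # the points shrink toward 0 while drifting toward the y-axis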
In Example 4.48, every trajectory converges to 0. The origin is called an attractor in this case. We can understand why this is so from Theorem 4.19. The matrix A in Example 4.48 has eigenvectors [1, 0]^T and [0, 1]^T corresponding to its eigenvalues 0.5 and 0.8, respectively. (Check this.) Accordingly, for any initial vector

x_0 = c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix}

we have

x_k = A^k x_0 = c_1(0.5)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(0.8)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}

Because both (0.5)^k and (0.8)^k approach zero as k gets large, x_k approaches 0 for any choice of x_0. In addition, we know from Theorem 4.28 that because 0.8 is the dominant eigenvalue of A, x_k will approach a multiple of the corresponding eigenvector [0, 1]^T as long as c_2 ≠ 0 (c_2 being the coefficient of x_0 corresponding to [0, 1]^T). In other words, all trajectories except those that begin on the x-axis (where c_2 = 0) will approach the y-axis, as Figure 4.23 shows.

Example 4.49   Discuss the behavior of the dynamical system x_{k+1} = Ax_k corresponding to the matrix A = \begin{bmatrix} 0.65 & -0.15 \\ -0.15 & 0.65 \end{bmatrix}.

Solution   The eigenvalues of A are 0.5 and 0.8, with corresponding eigenvectors [1, 1]^T and [-1, 1]^T, respectively. (Check this.) Hence, for an initial vector x_0 = c_1[1, 1]^T + c_2[-1, 1]^T, we have

x_k = A^k x_0 = c_1(0.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2(0.8)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}

Once again the origin is an attractor, because x_k approaches 0 for any choice of x_0. If c_2 ≠ 0, the trajectory will approach the line through the origin with direction vector [-1, 1]^T. Several such trajectories are shown in Figure 4.24. The vectors x_0 for which c_2 = 0 lie on the line through the origin with direction vector [1, 1]^T, and the corresponding trajectory in this case follows this line into the origin.

Example 4.50   Discuss the behavior of the dynamical systems x_{k+1} = Ax_k corresponding to the following matrices:

(a) A = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}    (b) A = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}

Solution   (a) The eigenvalues of A are 5 and 3, with corresponding eigenvectors [1, 1]^T and [-1, 1]^T, respectively. Hence, for an initial vector x_0 = c_1[1, 1]^T + c_2[-1, 1]^T, we have

x_k = A^k x_0 = c_1 5^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 3^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}

As k becomes large, so do both 5^k and 3^k. Hence, x_k tends away from the origin. Because the dominant eigenvalue 5 has corresponding eigenvector [1, 1]^T, all trajectories for which c_1 ≠ 0 will eventually end up in the first or the third quadrant. Trajectories with c_1 = 0 start and stay on the line y = -x, whose direction vector is [-1, 1]^T. See Figure 4.25(a).

(b) In this example, the eigenvalues are 1.5 and 0.5, with corresponding eigenvectors [1, 1]^T and [-1, 1]^T, respectively. Hence,

x_k = c_1(1.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2(0.5)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix}

If c_1 = 0, then x_k = c_2(0.5)^k[-1, 1]^T approaches 0 as k → ∞. But if c_1 ≠ 0, then

x_k = c_1(1.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2(0.5)^k\begin{bmatrix} -1 \\ 1 \end{bmatrix} ≈ c_1(1.5)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix}

as k → ∞, and such trajectories asymptotically approach the line y = x. See Figure 4.25(b).

In Example 4.50(a), all points that start out near the origin become increasingly large in magnitude because |λ| > 1 for both eigenvalues; 0 is called a repeller. In Example 4.50(b), 0 is called a saddle point because the origin attracts points in some directions and repels points in other directions. In this case, one eigenvalue satisfies |λ| > 1 and the other satisfies |λ| < 1.

The next example shows what can happen when the eigenvalues of a real 2 × 2 matrix are complex (and hence conjugates of one another).

Example 4.51   Plot the trajectory beginning with x_0 = [4, 4]^T for the dynamical systems x_{k+1} = Ax_k corresponding to the following matrices:

(a) A = \begin{bmatrix} 0.5 & -0.5 \\ 0.5 & 0.5 \end{bmatrix}    (b) A = \begin{bmatrix} 0.2 & -1.2 \\ 0.6 & 1.4 \end{bmatrix}

Solution   The trajectories are shown in Figure 4.26(a) and (b), respectively. Note that (a) is a trajectory spiraling into the origin, whereas (b) appears to follow an elliptical orbit.
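Looking back over Examples 4.48 through 4.51, the long-term behavior in each case was governed by the absolute values of the eigenvalues. As a rough numerical summary (this code is our own illustration and deliberately ignores finer distinctions such as spiral versus non-spiral behavior), the sketch below labels the origin for each of the matrices above using only the eigenvalue moduli.

```python
import numpy as np

def eigenvalue_moduli(A):
    """Return the absolute values of the eigenvalues of A."""
    return np.abs(np.linalg.eigvals(A))

def classify_origin(A, tol=1e-12):
    """Classify 0 for x_{k+1} = A x_k using only |lambda| (real or complex)."""
    m = eigenvalue_moduli(A)
    if np.all(m < 1 - tol):
        return "attractor"
    if np.all(m > 1 + tol):
        return "repeller"
    if m.min() < 1 - tol and m.max() > 1 + tol:
        return "saddle point"
    return "borderline: some |lambda| = 1"

matrices = {
    "4.48":    [[0.5, 0.0], [0.0, 0.8]],
    "4.49":    [[0.65, -0.15], [-0.15, 0.65]],
    "4.50(a)": [[4.0, 1.0], [1.0, 4.0]],
    "4.50(b)": [[1.0, 0.5], [0.5, 1.0]],
    "4.51(a)": [[0.5, -0.5], [0.5, 0.5]],
    "4.51(b)": [[0.2, -1.2], [0.6, 1.4]],
}
for name, A in matrices.items():
    A = np.array(A)
    print(name, np.round(eigenvalue_moduli(A), 4), classify_origin(A))
```

For Example 4.51(a) the moduli are about 0.707, so the origin is an attractor (in fact a spiral attractor), while for Example 4.51(b) the moduli equal 1, the borderline case examined next.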
The following theorem explains the spiral behavior of the trajectory in Example 4.51(a).

Theorem 4.42   Let A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}. The eigenvalues of A are λ = a ± bi, and if a and b are not both zero, then A can be factored as

A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}

where r = |λ| = \sqrt{a^2 + b^2} and θ is the principal argument of a + bi.

Proof   The eigenvalues of A are

λ = \frac{1}{2}\left(2a ± \sqrt{4a^2 - 4(a^2 + b^2)}\right) = \frac{1}{2}\left(2a ± 2\sqrt{-b^2}\right) = a ± bi

by Exercise 35(b) in Section 4.1. Figure 4.27 displays a + bi, r, and θ; it follows that cos θ = a/r and sin θ = b/r. Hence,

A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\begin{bmatrix} a/r & -b/r \\ b/r & a/r \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}

Remark   Geometrically, Theorem 4.42 implies that when A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} ≠ O, the linear transformation T(x) = Ax is the composition of a rotation R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} through the angle θ followed by a scaling S = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix} with factor r (Figure 4.28: a rotation followed by a scaling). In Example 4.51(a), the eigenvalues are λ = 0.5 ± 0.5i, so r = |λ| = \sqrt{2}/2 ≈ 0.707 < 1, and hence the trajectories all spiral inward toward 0.

The next theorem shows that, in general, when a real 2 × 2 matrix has complex eigenvalues, it is similar to a matrix of the form \begin{bmatrix} a & -b \\ b & a \end{bmatrix}. For a complex vector

x = \begin{bmatrix} z \\ w \end{bmatrix} = \begin{bmatrix} a + bi \\ c + di \end{bmatrix}

we define the real part, Re x, and the imaginary part, Im x, of x to be

Re x = \begin{bmatrix} Re z \\ Re w \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix}   and   Im x = \begin{bmatrix} Im z \\ Im w \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix}

Theorem 4.43   Let A be a real 2 × 2 matrix with a complex eigenvalue λ = a - bi (where b ≠ 0) and corresponding eigenvector x. Then the matrix P = [Re x  Im x] is invertible and

A = P\begin{bmatrix} a & -b \\ b & a \end{bmatrix}P^{-1}

Proof   Let x = u + vi, so that Re x = u and Im x = v. From Ax = λx, we have

Au + Avi = Ax = λx = (a - bi)(u + vi) = au + avi - bui + bv = (au + bv) + (-bu + av)i

Equating real and imaginary parts, we obtain

Au = au + bv   and   Av = -bu + av

Now P = [u  v], so

P\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = [au + bv  -bu + av] = [Au  Av] = A[u  v] = AP

To show that P is invertible, it is enough to show that u and v are linearly independent. If u and v were not linearly independent, then it would follow that v = ku for some nonzero (real) scalar k, because neither u nor v is 0. Thus,

x = u + vi = u + kui = (1 + ki)u

Now, because A is real, taking conjugates in Ax = λx gives

A\bar{x} = \overline{Ax} = \overline{λx} = \bar{λ}\bar{x}

so \bar{x} = u - vi is an eigenvector corresponding to the other eigenvalue \bar{λ} = a + bi. But

\bar{x} = \overline{(1 + ki)u} = (1 - ki)u

because u is a real vector. Hence, the eigenvectors x and \bar{x} of A are both nonzero multiples of u and therefore are multiples of one another. This is impossible, because eigenvectors corresponding to distinct eigenvalues must be linearly independent, by Theorem 4.20. (This theorem is valid over the complex numbers as well as the real numbers.) This contradiction implies that u and v are linearly independent and hence P is invertible. It now follows from P\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = AP that

A = P\begin{bmatrix} a & -b \\ b & a \end{bmatrix}P^{-1}

Theorem 4.43 serves to explain Example 4.51(b). The eigenvalues of A = \begin{bmatrix} 0.2 & -1.2 \\ 0.6 & 1.4 \end{bmatrix} are 0.8 ± 0.6i. For λ = 0.8 - 0.6i, a corresponding eigenvector is

x = \begin{bmatrix} -1 - i \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \begin{bmatrix} -1 \\ 0 \end{bmatrix}i

From Theorem 4.43, it follows that for P = \begin{bmatrix} -1 & -1 \\ 1 & 0 \end{bmatrix} and C = \begin{bmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{bmatrix}, we have

A = PCP^{-1}   and   P^{-1}AP = C

For the given dynamical system x_{k+1} = Ax_k, we perform a change of variable. Let

x_k = Py_k   (or, equivalently, y_k = P^{-1}x_k)

Then Py_{k+1} = x_{k+1} = Ax_k = APy_k, so

y_{k+1} = P^{-1}APy_k = Cy_k

Now C has the same eigenvalues as A (why?), and |0.8 ± 0.6i| = 1. Thus, by Theorem 4.42, the dynamical system y_{k+1} = Cy_k simply rotates the points in every trajectory in a circle about the origin.
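Theorem 4.43 is easy to verify numerically for the matrix of Example 4.51(b). In the sketch below (our own code; any eigenvector returned by the software works, since the theorem does not depend on how the eigenvector is scaled), we build P = [Re x  Im x] and check that P^{-1}AP has the rotation-scaling form C.

```python
import numpy as np

A = np.array([[0.2, -1.2],
              [0.6,  1.4]])

eigvals, eigvecs = np.linalg.eig(A)

# Choose the eigenvalue written as lambda = a - bi in Theorem 4.43,
# i.e. the one with negative imaginary part.
idx = int(np.argmin(eigvals.imag))
lam, x = eigvals[idx], eigvecs[:, idx]
a, b = lam.real, -lam.imag           # here a = 0.8 and b = 0.6

# P = [Re x | Im x]; Theorem 4.43 says P is invertible and P^{-1} A P = C.
P = np.column_stack([x.real, x.imag])
C = np.linalg.inv(P) @ A @ P

expected = np.array([[a, -b],
                     [b,  a]])
print(np.round(C, 10))               # [[ 0.8 -0.6]
                                     #  [ 0.6  0.8]]
print(np.allclose(C, expected))      # True
print(np.isclose(abs(lam), 1.0))     # True: y_k only rotates about 0, so the
                                     # x-trajectory lies on a closed orbit
```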
To determine a trajectory of the dynamical system in Example 4.51(b), we iteratively apply the linear transformation T(x) = Ax = PCP^{-1}x. The transformation can be thought of as the composition of a change of variable (x to y), followed by the rotation determined by C, followed by the reverse change of variable (y back to x). We will encounter this idea again in the application to graphing quadratic equations in Section 5.5 and, more generally, as "change of basis" in Section 6.3. In Exercise 74 of Section 5.5, you will show that the trajectory in Example 4.51(b) is indeed an ellipse, as it appears to be from Figure 4.26(b).

To summarize, then: If a real 2 × 2 matrix A has complex eigenvalues λ = a ± bi, then the trajectories of the dynamical system x_{k+1} = Ax_k spiral inward if |λ| < 1 (0 is a spiral attractor), spiral outward if |λ| > 1 (0 is a spiral repeller), and lie on a closed orbit if |λ| = 1 (0 is an orbital center).

Vignette   Ranking Sports Teams and Searching the Internet

In any competitive sports league, it is not necessarily a straightforward process to rank the players or teams. Counting wins and losses alone overlooks the possibility that one team may accumulate a large number of victories against weak teams, while another team may have fewer victories but all of them against strong teams. Which of these teams is better? How should we compare two teams that never play one another? Should points scored be taken into account? Points against? Despite these complexities, the ranking of athletes and sports teams has become a commonplace and much-anticipated feature in the media. For example, there are various annual rankings of U.S. college football and basketball teams, and golfers and tennis players are also ranked internationally. There are many copyrighted schemes used to produce such rankings, but we can gain some insight into how to approach the problem by using the ideas from this chapter.

To establish the basic idea, let's revisit Example 3.69. Five tennis players play one another in a round-robin tournament. Wins and losses are recorded in the form of a digraph in which a directed edge from i to j indicates that player i defeats player j. The corresponding adjacency matrix A therefore has a_{ij} = 1 if player i defeats player j and a_{ij} = 0 otherwise:

A = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}

We would like to associate a ranking r_i with player i in such a way that r_i > r_j indicates that player i is ranked more highly than player j. For this
In this example, the ranking vector turns out to be 0.29 0.27 0.22 0.08 0.14 so we would rank the players in the order 1, 2, 3, 5,4. By modifying the matrix A, it is possible to take into account many of the complexities mentioned in the opening paragraph. However, this simple example has served to indicate one useful approach to the problem of ranking teams. 357 The same idea can be used to understand how an Internet search engine such as Google works. Older search engines used to return the results of a search unordered. Useful sites would often be buried among irrelevant ones. Much scrolling was often needed to uncover what you were looking for. By contrast, Google returns search results ordered according to their likely relevance. Thus, a method for ranking websites is needed. Instead of teams playing one another, we now have websites linking to one another. We can once again use a digraph to model the situation, only now an edge from i to j indicates that website i links to (or refers to) website;. So whereas for the sports team digraph, incoming directed edges are bad (they indicate losses), for the Internet digraph, incoming directed edges are good (they indicate links from other sites). In this setting, we want the ranking of website i to be proportional to sum of the rankings of all the websites that link to i. Using the digraph on page 356 to represent just five websites, we have r4 = a{rl + r2 + r,) for example. It is easy to see that we now want to use the transpose of the adjacency matrix of the digraph. Therefore, the ranking vector r must satisfy ATx = —r and will thus be the Perron eigenvector of A . In this example, we obtain "o 0 1 0 o" "0.14" 1 0 0 0 0 0.08 0 1 0 0 1 and r = 0.22 1 1 1 0 0 0.27 _1 1 0 1 0_ ^0.29. so a search that turns up these five sites would list them in the order 5,4, 3,1, 2. Google actually uses a variant of the method described here and computes the ranking vector via an iterative method very similar to the power method (Section 4.5). Section 4.6 Applications and the Perron-Frobenius Theorem Exercises 4.6 Markov Chains Which of the stochastic matrices in Exercises 1-6 are regular? 1. 3. 0 1 1 0 I 0 0.1 0 0.5 0.5 1 0 0.4 0 0.5 2. 4. J 2J \ 0 1 j 0 0 0 1 0 0.5 1 0 0.5 0 1 0 0 0 In Exercises 7-9, P is the transition matrix of a regular Markov chain. Find the long range transition matrix L of P. 7.P 9.P = 13 8.P 3 1 2 J 0.2 0.6 0.2 0.3 0.1 0.6 0.4 0.4 0.2 10. Prove that the steady state probability vector of a regular Markov chain is unique. [Hint: Use Theorem 4.33 or Theorem 4.34.] Population Growth In Exercises 11-14, calculate the positive eigenvalue and a corresponding positive eigenvector of the Leslie matrix L. 11. L = 13. L = 0 0.5 0 0.5 0 2 0. 7 0 0.5 12. L = 14. L 1 1.5 0.5 0 j 1 5 3' \ 0 0 0 I 0 15. If a Leslie matrix has a unique positive eigenvalue kx, what is the significance for the population if A, > 1? Ax< 1? Aj = 1? 16. Verify that the characteristic polynomial of the Leslie matrix L in Equation (3) is q(A) = (-l)"(A" ~ b.k"'1 ~ ^iA""2 - b3Sls2\n-3 — ■ ■ ■ — bns1s2 • • • V-i) [Hint: Use mathematical induction and expand det(L — A/) along the last column.] 1 0 0 • 0 0 0 • 0 0 0 sxs2 ■ 0 0 0 0 ■ 17. If all of the survival rates s,- are nonzero, let P = Compute P~lLP and use it to find the characteristic polynomial of L. [Hint: Refer to Exercise 32 in Section 4.3.] 18. Verify that an eigenvector of I corresponding to A] is 1 sx/kx sxs2/k\ sxs2s3/k\ SL5253 " ' ' V-l/K . 
[Hint: Combine Exercise 17 above with Exercise 32 in Section 4.3 and Exercise 46 in Section 4.4.] cas in Exercises 19-21, compute the steady state growth rate of the population with the Leslie matrix Lfrom the given exercise. Then use Exercise 18 to help find the corresponding distribution of the age classes. 19. Exercise 39 in Section 3.7 20. Exercise 40 in Section 3.7 21. Exercise 44 in Section 3.7 CAS 22. Many species of seal have suffered from commercial hunting. They have been killed for their skin, blubber, and meat. The fur trade, in particular, reduced some seal populations to the point of extinction. Today, the greatest threats to seal populations are decline of fish stocks due to overfishing, pollution, disturbance of habitat, entanglement in marine debris, and culling by fishery owners. Some seals have been declared endangered species; other species are carefully managed. Table 4.7 gives the birth and survival rates for the northern fur seal, divided into 2-year age classes. [The data are based on A. E. York and J. R. Hartley, "Pup Production Following Harvest of Female Northern Fur Seals," Canadian Journal of Fisheries and Aquatic Science, 38 (1981), pp. 84-90.] Chapter 4 Eigenvalues and Eigenvectors (b) Show that r = 1 if and only if A! = 1. (This represents zero population growth.) [Hint: Let a A2 A3 + a" Show that A is an eigenvalue of L if and only if gW = l.] (c) Assuming that there is a unique positive eigenvalue Aj, show that r < 1 if and only if the population is decreasing and r > 1 if and only if the population is increasing. ' 4.1 Age (years) Birth Rate Survival Rate 0-2 0.00 0.91 2-4 0.02 0.88 4-6 0.70 0.85 6-8 1.53 0.80 8-10 1.67 0.74 10-12 1.65 0.67 12-14 1.56 0.59 14-16 1.45 0.49 16-18 1.22 0.38 18-20 0.91 0.27 20-22 0.70 0.17 22-24 0.22 0.15 24-26 0.00 0.00 cas (a) Construct the Leslie matrix L for these data and compute the positive eigenvalue and a corresponding positive eigenvector. (b) In the long run, what percentage of seals will be in each age class and what will the growth rate be? Exercise 23 shows that the long-run behavior of a population can be determined directly from the entries of its Leslie matrix. 23. The net reproduction rate of a population is defined as r = bj + b2sx 4- b3sxs2 + • • ■ + b„SiS2- • • v-i a + bi where the bt are the birth rates and the Sj are the survival rates for the population. (a) Explain why r can be interpreted as the average number of daughters born to a single female over her lifetime. A sustainable harvesting policy is a procedure that allows a certain fraction of a population (represented by a population distribution vector x) to be harvested so that the population returns to x after one time interval (where a time interval is the length of one age class). Ifh is the fraction of each age class that is harvested, then we can express the harvest-ingprocedure mathematically as follows: If we start with a population vector x, after one time interval we have Lx; harvesting removes hLx, leaving Lx — hLx = (1 — h)Lx Sustainability requires that (1 - h)Lx = x 24. If A j is the unique positive eigenvalue of a Leslie matrix L and h is the sustainable harvest ratio, prove that/; = 1 - 1/Aj. 25. (a) Find the sustainable harvest ratio for the wood- land caribou in Exercise 44 in Section 3.7. (b) Using the data in Exercise 44 in Section 3.7, reduce the caribou herd according to your answer to part (a). Verify that the population returns to its original level after one time interval. 26. 
Find the sustainable harvest ratio for the seal in Exercise 22. (Conservationists have had to harvest seal populations when overfishing has reduced the available food supply to the point where the seals are in danger of starvation.) 27. Let L be a Leslie matrix with a unique positive eigenvalue Aj. Show that if A is any other (real or complex) eigenvalue of L, then | A| £> Aj. [Hint: Write A = r(cos 6 + i sin 9) and substitute it into the equation g(X) = 1, as in part (b) of Exercise 23, Use De Moivre's Theorem and then take the real part of both sides. The Triangle Inequality should prove useful.] Section 4.6 Applications and the Perron-Frobenius Theorem 361 The Perron-Froheilis Theorem In Exercises 28-31, find the Perron root and the corresponding Perron eigenvector of A. '2 Ol [1 29. A = .1 lJ 2 "0 1 1 1 0 1 1 1 0 28. A = 30. A = 31. A = It can be shown that a nonnegative nX n matrix is irreducible if and only if (I + A)™-1 > O. In Exercises 32-35, use this criterion to determine whether the matrix A is irreducible. If A is reducible, find a permutation of its rows and columns that puts A into the block form b j c ()"/) 32. A = "0 0 1 0" "0 0 1 0" 0 0 0 1 0 0 1 1 33. A = 0 1 0 0 1 0 0 0 J 0 0 0_ 1 1 0 0_ "o 1 0 0 o" "o 1 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 1 0 1 35. A = 1 0 0 0 1 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 I 1 34. A = 36. (a) If A is the adjacency matrix of a graph G, show that A is irreducible if and only if G is connected. (A graph is connected if there is a path between every pair of vertices.) (b) Which of the graphs in Section 4.0 have an irreducible adjacency matrix? Which have a primitive adjacency matrix? 37. Let G be a bipartite graph with adjacency matrix A. (a) Show that A is not primitive. (b) Show that if A is an eigenvalue of A, so is — A. [Hint: Use Exercise 80 in Section 3.7 and partition an eigenvector for A so that it is compatible with this partitioning of A. Use this partitioning to find an eigenvector for — A.] 38. A graph is called k-regular if k edges meet at each vertex. Let G be a fc-regular graph. M (a) Show that the adjacency matrix A of G has A = k as an eigenvalue. [Hint: Adapt Theorem 4.30.1 (b) Show that if A is primitive, then the other eigenvalues are all less than k in absolute value. [Hint: Adapt Theorem 4.31.] 39. Explain the results of your exploration in Section 4.0 in light of Exercises 36-38 and Section 4.5. In Exercise 40, the absolute value of a matrix A = [atf] is defined to be the matrix \ A\ — [\aJ ]. 40. Let A and B be n X n matrices, x a vector in W, and c a scalar. Prove the following matrix inequalities: (a) \cA\ = |c| \A\ (b) \A + B\ ss |A] + \b\ (c) |Axj < \A\ |x| (d) \AB\ =3 |A| \B\ 41. Prove that a 2 X 2 matrix A = 11 12 is reducible a2i a22_ if and only if au = 0 or aTi = 0. 42. Let A be a nonnegative, irreducible matrix such that I - A is invertible and (7 - A)"1 2= O. Let Aj and Vj be the Perron root and Perron eigenvector of A. (a) Prove that 0 < A, < 1. [Hint Apply Exercise 22 in Section 4.3 and Theorem 4.18(b).] (b) Deduce from (a) that V[ > Av,. Linear Recurrence Relations In Exercises 43-46, write out the first six terms of the sequence defined by the recurrence relation with the given initial conditions. 43. Xq — 1, xn — 2xiv..x for n 3s 1 44. a, = 128, a„ = a„_x/2 for n s 2 45. y0 = 0,yi = 1, y„ = y,,^ - y„.2 for n > 2 46. b0 = 1, bx = 1, b„ = 2lvi + b„„2 for n > 2 In Exercises 47-52, solve the recurrence relation with the given initial conditions. 47. x0 = 0, x, = 5, x„ = 3x„._i + 4xn...2 for n s 2 48. 
x0 = 0, Xi = 1, xn = 4x„._, — 3x„.„2 for n 5: 2 49. y, = l,y2 = 6,y„ = 4y„..: - 4>'„„2 fern > 3 50. a0 = 4, «i = 1, a„ = a„„! — a„~2/4 for n s 2 51. fr0 = 0, = 1, b„ = 2fr„_i + 2b„_2 for n St 2 52. The recurrence relation in Exercise 45. Show that your solution agrees with the answer to Exercise 45. 53. Complete the proof of Theorem 4.38(a) by showing that if the recurrence relation xn + bx„^ has 362 Chapter 4 Eigenvalues and Eigenvectors distinct eigenvalues A, + A2, then the solution will be of the form Xn ~ C\K + C2^2 [Hint: Show that the method of Example 4.40 works in general] 54. (a) Show that for any choice of initial conditions x0 = r and xx = 5, the scalars cx and c2 can be found, as stated in Theorem 4.38(a) and (b). (b) If the eigenvalues Aj and A2 are distinct and the initial conditions are xn = 0, x, = 1, show that 1 Ai A2 (a; - a"). 55. The Fibonacci recurrence./', = /„_] + /„_2 has the associated matrix equation x„ = Axn^u where x„ = l r and A = Jn-1_ .1 o. (a) With/0 = 0 and fj tion to prove that 1, use mathematical induc- fn+l fn . fn fn-1- for all «5 1, (b) Using part (a), prove that fn + lfn-l ~~ fn = for all n s 1. [This is called Cassini's Identity, after the astronomer Giovanni Domenico Cassini (1625-1712). Cassini was born in Italy but, on the invitation of Louis XIV, moved in 1669 to France, where he became director of the Paris Observatory. He became a French citizen and adopted the French version of his name: Jean-Dominique Cassini. Mathematics was one of his many interests other than astronomy. Cassini's Identity was published in 1680 in a paper submitted to the Royal Academy of Sciences in Paris.] (c) An 8 X 8 checkerboard can be dissected as shown in Figure 4.29(a) and the pieces reassembled to form the 5X13 rectangle in Figure 4.29(b). (b) Figure 4.29 The area of the square is 64 square units, but the rectangle's area is 65 square units! Where did the extra square come from? [Hint: What does this have to do with the Fibonacci sequence?] 56. You have a supply of three kinds of tiles: two kinds of 1 X 2 tiles and one kind of 1 X 1 tile, as shown in Figure 4.30. Figure 4.30 Let t„ be the number of different ways to cover a 1 X n rectangle with these tiles. For example, Figure 4.31 shows that t3 = 5. (a) Find tx,..., t5. (Does t0 make any sense? If so, what is it?) (b) Set up a second order recurrence relation for tn. (c) Using r, and f2 as the initial conditions, solve the recurrence relation in part (b). Check your answer against the data in part (a). Figure 4.31 The five ways to tile a 1 X 3 rectangle 57. You have a supply of 1 X 2 dominoes with which to cover a 2 X n rectangle. Let d„ be the number of different ways to cover the rectangle. For example, Figure 4.32 shows that a\ = 3. Section 4.6 Applications and the Perron-Frobenius Theorem The three ways to cover a 2 X 3 rectangle with 1 X 2 dominoes (a) Find dt,.,., d5. B ^ (Does d0 make any sense? If so, what is it?) (b) Set up a second order recurrence relation for dn. (c) Using dx and d2 as the initial conditions, solve the recurrence relation in part (b). Check your answer against the data in part (a). 58. In Example 4.41, find eigenvectors vx and v2 corre-1 + V5 . 1 - V5 sponding to A! k f and A, -. With 2 4 2 , verify formula (2) in Section 4.5. That is, show that, for some scalar q, lim k-*oo A? c,v. Systems of Linear Differential Equations a + H dx' In Exercises 59-64, find the general solution to the given system of differential equations. 
Then find the specific solution that satisfies the initial conditions. (Consider all functions to be functions of t.) 59. 60. 61. x x_ 62. y[ y'i 63. x y x + 3y, x(0) = 0 2x + 2y, y(0) = 5 2x — y, x(0) = 1 -x + 2y, y(0) = 1 x1 >'i + x2, — x2, + 72, y ■ x + x + y, *,(0) = 1 x2(0) = 0 yfO) = 1 yM = l z, x(0) = z, y(0) = z(Q) = - 64. x' = x + 3z, x(0) = 2 / = x - 2y + z, y(0) = 3 z' = 3x + z, z(0) = 4 65. A scientist places two strains of bacteria, X and Y, in a petri dish. Initially, there are 400 of X and 500 of Y. The two bacteria compete for food and space but do not feed on each other. If x = x(t) andy = y(t) are the numbers of the strains at time t days, the growth rates of the two populations are given by the system x' — 1,2% — 0.2y y' = -0.2x + 1.5y (a) Determine what happens to these two populations by solving the system of differential equations. (b) Explore the effect of changing the initial populations by letting x(0) = a andy(0) = b. Describe what happens to the populations in terms of a and b. 66. Two species, X and Y, live in a symbiotic relationship. That is, neither species can survive on its own and each depends on the other for its survival. Initially, there are 15 of X and 10 of Y. If x — x(f) andy = y(t) are the sizes of the populations at time t months, the growth rates of the two populations are given by the system x' = -0.8x + 0.4y y' = 0.4x - 0.2y Determine what happens to these two populations. In Exercises 67 and 68, species Xpreys on species Y. The sizes of the populations are represented by x = x(t) and y = y(t). The growth rate of each population is governed by the system of differential equations x' = Ax + b, where x and b is a constant vector. Determine what happens to the two populations for the given A and b and initial conditions x(0). (First show that there are constants a and b such that the substitutions x = u + a andy = v + b convert the system into an equivalent one with no constant terms.) x = 67. A = 68. A 1 -1 -1 -1 1 1 1 -1 ,b -30 ,x(0) = 20 „30 -10. " o" , x(0) = "10 40 30 69. Let x = x(f) be a twice-differentiable function and consider the second order differential equation x" + ax' + bx = 0 (11) (a) Show that the change of variables y = x' and z = x allows Equation (11) to be written as a system of two linear differential equations in y and z. Chapter 4 Eigenvalues and Eigenvectors (b) Show that the characteristic equation of the system in part (a) is A2 + aX + b = 0. 70. Show that there is a change of variables that converts the nth order differential equation „(«) + + an 0 into a system of n linear differential equations whose coefficient matrix is the companion matrix C(p) of the polynomial/?(A) = A" + a„_iA"-1 4- ■ ■ • + alA 4- a0. [The notation xw denotes the fcth derivative of X See Exercises 26-32 in Section 4.3 for the definition of a companion matrix.] In Exercises 71 and 72, use Exercise 69 to find the general solution of the given equation. 71. x" - 5x' + 6x = 0 72. x" + 4x' + 3x = 0 In Exercises 73-76, solve the system of differential equations in the given exercise using Theorem 4.41. 73. Exercise 59 74. Exercise 60 75. Exercise 63 76. Exercise 64 Discrete Linear Dynamical Systems In Exercises 77-84, consider the dynamical system (a) Compute and plot Xq, x1; x2, x3 for Xq (b) Compute and plot Xq, x,, x2, x3 for x^ = (c) Using eigenvalues and eigenvectors, classify the origin as an attractor, repeller, saddle point, or none of these. (d) Sketch several typical trajectories of the system. 77. A = 2 1 0 3 78. A 0.5 0 -0.5 0.5 79. A 81. 
A = 83. A 2 -I 1.5 -1 0.2 0.4 -0.2 0.8 80. A 82. A 84. A -4 2 1 -3. 0.1 0.9" 0.5 0.5. 0 -1.5 1.2 3.6 In Exercises 85-88, the given matrix is of the form a —b\ , , , , A = , .In each case, A can be factored as the b a\ product of a scaling matrix and a rotation matrix. Find the scaling factor r and the angle 0 of rotation. Sketch the first four points of the trajectory for the dynamical system and classify the origin as a spi- Xjfc+1 = Axk with Xq = ral attractor, spiral repeller, or orbital center. 85. A = 87. A = 1 -1 1 1 1 -Vi V3 I 86. A = 88. A 0 0.5 -0.5 0 '3/2 -1/2 1/2 -V3/2 such that A - PCP In Exercises 89-92, find an invertible matrix P and a matt —b b a_ Sketch the first six points of the trajectory for the dynamical r trix C of the form C system xk+, = Axk with Xq and classify the origin as a spiral attractor, spiral repeller, or orbital center. 89. A = 91. A = 0.1 0.1 1 1 -0.2 0.3 90. A = 92. A = 2 1 -2 0. 0 -1 1 Vi Chapter Review Key Definitions and Concepts adjoint of a matrix, 276 algebraic multiplicity of an eigenvalue, 294 characteristic equation, 292 characteristic polynomial, 292 cofactor expansion, 266 Cramer's Rule, 274-275 determinant, 263-265 diagonalizable matrix, 303 eigenvalue, 254 eigenvector, 254 eigenspace, 256 Fundamental Theorem of Invertible Matrices, 296 geometric multiplicity of an eigenvalue, 294 Gerschgorin disk, 319 Gerschgorin's Disk Theorem, 321 Laplace Expansion Theorem, 266 power method (and its variants), 311-319 properties of determinants, 269-274 similar matrices, 301 Chapter Review 365 Review Questions 1. Mark each of the following statements true or false: (a) For all square matrices A, det(—A) = — detA. (b) If A and B are n X n matrices, then det(AB) = det (BA). (c) If A and B are n X n matrices whose columns are the same but in different orders, then detB = -detA, (d) If A is invertible, then det(A_1) = det AT. (e) If 0 is the only eigenvalue of a square matrix A, then A is the zero matrix. (f) Two eigenvectors corresponding to the same eigenvalue must be linearly dependent. (g) If Ann X n matrix has n distinct eigenvalues, then it must be diagonalizable. (h) If an n X n matrix is diagonalizable, then it must have n distinct eigenvalues. (i) Similar matrices have the same eigenvectors. (j) If A and B are two n X n matrices with the same reduced row echelon form, then A is similar to B. 2. Let A 1 3 5 3 5 7 7 9 11 (a) Compute det A by cofactor expansion along any row or column. (b) Compute det A by first reducing A to triangular form. 2 and 4. Let A and £ be 4 X 4 matrices with det A det B = — |. Find det C for the indicated matrix C: (a) C = (AB)"1 (b) C = A2B(3Ar) 5. If A is a skew-symmetric n X n matrix and n is odd, prove that det A = 0. 6. Find all values of k for which -1 1 4 = 0. In Questions 7 and 8, show that x is an eigenvector of A and find the corresponding eigenvalue. r '3 1" 7.x = ,A = 2 4 3 8. x = A = 13 -60 -45 - 5 18 15 10 -40 -32 9. Let A -5 3 0 -6 4 0 (a) Find the characteristic polynomial of A. (b) Find all of the eigenvalues of A. (c) Find a basis for each of the eigenspaces of A. (d) Determine whether A is diagonalizable. If A is not diagonalizable, explain why not. If A is diagonalizable, find an invertible matrix P and a diagonal matrix D such that P :AP = D. 10. If A is a 3 X 3 diagonalizable matrix with eigenvalues -2, 3, and 4, find det A. 11. IfAisa2 X 2 matrix with eigenvalues A1 = \, A2 = — 1, and corresponding eigenvectors v, "3~ find A" 17} T 1 > V2 = .1. -1. 12. 
If A is a diagonalizable matrix and all of its eigenvalues satisfy ! A | < 1, prove that A" approaches the zero matrix as n gets large. In Questions 13-15, determine, with reasons, whether A is similar to B. If A ~ B, give an invertible matrix P such that y-1 AP = B. 4 13. A = a b c 3d 2e -¥ f 3. If d e f = 3, find 3a 2b -4c c 14. A g h i 3£ 2h - 4i i 15. A 3 2 0 'l 0 1 1 0 0 1 r2 2 ,B = .3 2. "3 0" ,B = „0 2_ 1 0 "l 1 o" ,B = 0 1 0 .0 0 1. 16. Let A = . Find all values of k for which: (a) A has eigenvalues 3 and — 1. (b) A has an eigenvalue with algebraic multiplicity 2. (c) A has no real eigenvalues. 17. IfA3 = A, what are the possible eigenvalues of A? 18. If a square matrix A has two equal rows, why must A have 0 as one of its eigenvalues? 19. If x is an eigenvector of A with eigenvalue A = 3, show that x is also an eigenvector of A2 — 5A + 21. What is the corresponding eigenvalue? 20. If A is similar to B with P~~ lAP = B and x is an eigenvector of A, show that P~lx is an eigenvector of B.