Matrices

We [Halmos and Kaplansky] share a philosophy about linear algebra: we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury.
—Irving Kaplansky, in Paul Halmos: Celebrating 50 Years of Mathematics, J. H. Ewing and F. W. Gehring, eds., Springer-Verlag, 1991, p. 88

3.0 Introduction: Matrices in Action

In this chapter, we will study matrices in their own right. We have already used matrices—in the form of augmented matrices—to record information about and to help streamline calculations involving systems of linear equations. Now you will see that matrices have algebraic properties of their own, which enable us to calculate with them, subject to the rules of matrix algebra. Furthermore, you will observe that matrices are not static objects, merely recording information and data; rather, they represent certain types of functions that "act" on vectors, transforming them into other vectors. These "matrix transformations" will begin to play a key role in our study of linear algebra and will shed new light on what you have already learned about vectors and systems of linear equations. Furthermore, matrices arise in many forms other than augmented matrices; we will explore some of the many applications of matrices at the end of this chapter.

In this section, we will consider a few simple examples to illustrate how matrices can transform vectors. In the process, you will get your first glimpse of "matrix arithmetic."

Consider the equations

$$y_1 = x_1 + 2x_2 \qquad\qquad (1)$$
$$y_2 = 3x_2$$

We can view these equations as describing a transformation of the vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ into the vector $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$. If we denote the matrix of coefficients of the right-hand side by $F$, then $F = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$, and we can rewrite the transformation as

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

or, more succinctly, $\mathbf{y} = F\mathbf{x}$. [Think of this expression as analogous to the functional notation $y = f(x)$ you are used to: $\mathbf{x}$ is the independent "variable" here, $\mathbf{y}$ is the dependent "variable," and $F$ is the name of the "function."]

Thus, if $\mathbf{x} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$, then Equations (1) give

$$y_1 = -2 + 2\cdot 1 = 0, \qquad y_2 = 3\cdot 1 = 3, \qquad\text{that is,}\qquad \mathbf{y} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$$

Problem 1  Compute $F\mathbf{x}$ for each of the four vectors $\mathbf{x}$ given in parts (a)–(d).

Problem 2  The heads of the four vectors x in Problem 1 locate the four corners of a square in the $x_1x_2$-plane. Draw this square and label its corners A, B, C, and D, corresponding to parts (a), (b), (c), and (d) of Problem 1. On separate coordinate axes (labeled $y_1$ and $y_2$), draw the four points determined by $F\mathbf{x}$ in Problem 1. Label these points A', B', C', and D'. Let's make the (reasonable) assumption that the line segment AB is transformed into the line segment A'B', and likewise for the other three sides of the square ABCD. What geometric figure is represented by A'B'C'D'?

Problem 3  The center of square ABCD is the origin $\begin{bmatrix}0\\0\end{bmatrix}$. What is the center of A'B'C'D'? What algebraic calculation confirms this?

Now consider the equations

$$z_1 = y_1 - y_2 \qquad\qquad (2)$$
$$z_2 = -2y_1$$

that transform a vector $\mathbf{y} = \begin{bmatrix}y_1\\y_2\end{bmatrix}$ into the vector $\mathbf{z} = \begin{bmatrix}z_1\\z_2\end{bmatrix}$. We can abbreviate this transformation as $\mathbf{z} = G\mathbf{y}$, where

$$G = \begin{bmatrix}1 & -1\\ -2 & 0\end{bmatrix}$$

Problem 4  We are going to find out how G transforms the figure A'B'C'D'. Compute Gy for each of the four vectors y that you computed in Problem 1. [That is, compute z = G(Fx). You may recognize this expression as being analogous to the composition of functions with which you are familiar.] Call the corresponding points A'', B'', C'', and D'', and sketch the figure A''B''C''D'' on coordinate axes.
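If you would like to check Problems 1–4 numerically, the short NumPy sketch below applies F and then G to the corners of a square. The corner coordinates used here are illustrative placeholders (any square centered at the origin will do); only F and G are taken from Equations (1) and (2).

```python
import numpy as np

# F and G are the coefficient matrices of Equations (1) and (2)
F = np.array([[1, 2],
              [0, 3]])
G = np.array([[ 1, -1],
              [-2,  0]])

# Illustrative corners of a square centered at the origin, one corner per column
corners = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]]).T

print(F @ corners)          # images A', B', C', D' under y = Fx
print(G @ (F @ corners))    # images A'', B'', C'', D'' under z = G(Fx)
print((G @ F) @ corners)    # the same points, computed with the single matrix GF
```

The last two lines agree, which is exactly the point of Problems 5 and 7: composing the two transformations amounts to multiplying the matrices.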
Problem 5  By substituting Equations (1) into Equations (2), obtain equations for $z_1$ and $z_2$ in terms of $x_1$ and $x_2$. If we denote the matrix of these equations by H, then we have z = Hx. Since we also have z = G(Fx), it is reasonable to write H = GF. Can you see how the entries of H are related to the entries of F and G?

Problem 6  Let's do the above process the other way around: First transform the square ABCD, using G, to obtain figure A*B*C*D*. Then transform the resulting figure, using F, to obtain A**B**C**D**. [Note: Don't worry about the "variables" x, y, and z here. Simply substitute the coordinates of A, B, C, and D into Equations (2) and then substitute the results into Equations (1).] Are A**B**C**D** and A''B''C''D'' the same? What does this tell you about the order in which we perform the transformations F and G?

Problem 7  Repeat Problem 5 with general matrices
$$F = \begin{bmatrix} f_{11} & f_{12}\\ f_{21} & f_{22}\end{bmatrix}, \quad G = \begin{bmatrix} g_{11} & g_{12}\\ g_{21} & g_{22}\end{bmatrix}, \quad\text{and}\quad H = \begin{bmatrix} h_{11} & h_{12}\\ h_{21} & h_{22}\end{bmatrix}$$
That is, if Equations (1) and Equations (2) have coefficients as specified by F and G, find the entries of H in terms of the entries of F and G. The result will be a formula for the "product" H = GF.

Problem 8  Repeat Problems 1–6 with the following matrices. (Your formula from Problem 7 may help to speed up the algebraic calculations.) Note any similarities or differences that you think are significant.
(a) $F = \begin{bmatrix}0&-1\\1&0\end{bmatrix},\ G = \begin{bmatrix}2&0\\0&3\end{bmatrix}$   (b) $F = \begin{bmatrix}1&1\\2&1\end{bmatrix},\ G = \begin{bmatrix}1&2\\1&1\end{bmatrix}$
(c) $F = \begin{bmatrix}1&2\\2&-1\end{bmatrix},\ G = \begin{bmatrix}2&-1\\-1&1\end{bmatrix}$   (d) $F = \begin{bmatrix}1&1\\1&1\end{bmatrix},\ G = \begin{bmatrix}2&1\\1&1\end{bmatrix}$

3.1 Matrix Operations

Although we have already encountered matrices, we begin by stating a formal definition and recording some facts for future reference.

Definition  A matrix is a rectangular array of numbers called the entries, or elements, of the matrix.

Although the entries will usually be chosen from the set $\mathbb{R}$ of real numbers, they may also be taken from the set $\mathbb{C}$ of complex numbers or from $\mathbb{Z}_p$, where p is prime.

Technically, there is a distinction between row/column matrices and vectors, but we will not belabor this distinction. We will, however, distinguish between row matrices/vectors and column matrices/vectors. This distinction is important—at the very least—for algebraic computations, as we will demonstrate.

The following are all examples of matrices:
$$\begin{bmatrix}\sqrt{5}&2\\1&0\end{bmatrix}, \quad \begin{bmatrix}1&1&1&1\end{bmatrix}, \quad \begin{bmatrix}5.1&1.2&-1\\6.9&0&4.4\\-7.3&9&8.5\end{bmatrix}, \quad \begin{bmatrix}7\end{bmatrix}$$

The size of a matrix is a description of the numbers of rows and columns it has. A matrix is called m × n (pronounced "m by n") if it has m rows and n columns. Thus, the examples above are matrices of sizes 2 × 2, 1 × 4, 3 × 3, and 1 × 1, respectively. A 1 × m matrix is called a row matrix (or row vector), and an n × 1 matrix is called a column matrix (or column vector).

We use double-subscript notation to refer to the entries of a matrix A. The entry of A in row i and column j is denoted by $a_{ij}$. Thus, if
$$A = \begin{bmatrix}3&9&-1\\0&5&4\end{bmatrix}$$
then $a_{13} = -1$ and $a_{22} = 5$. (The notation $A_{ij}$ is sometimes used interchangeably with $a_{ij}$.) We can therefore compactly denote a matrix A by $[a_{ij}]$ (or $[a_{ij}]_{m\times n}$ if it is important to specify the size of A, although the size will usually be clear from the context).

With this notation, a general m × n matrix A has the form
$$A = \begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots& &\vdots\\ a_{m1}&a_{m2}&\cdots&a_{mn}\end{bmatrix}$$
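These size and indexing conventions translate directly to array software. The following minimal NumPy sketch uses the 2 × 3 matrix above; note that NumPy numbers rows and columns from 0, so the entry $a_{13}$ is A[0, 2].

```python
import numpy as np

A = np.array([[3, 9, -1],
              [0, 5,  4]])

print(A.shape)      # (2, 3): a 2 x 3 matrix
print(A[0, 2])      # -1, the entry a_13 (rows and columns count from 0)
print(A[1, 0])      # 0, the entry a_21
```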
If the columns of A are the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$, then we may represent A as
$$A = [\,\mathbf{a}_1\ \ \mathbf{a}_2\ \cdots\ \mathbf{a}_n\,]$$
If the rows of A are $A_1, A_2, \ldots, A_m$, then we may represent A as
$$A = \begin{bmatrix}A_1\\A_2\\ \vdots\\ A_m\end{bmatrix}$$

The diagonal entries of A are $a_{11}, a_{22}, a_{33}, \ldots$, and if m = n (that is, if A has the same number of rows as columns), then A is called a square matrix. A square matrix whose nondiagonal entries are all zero is called a diagonal matrix. A diagonal matrix all of whose diagonal entries are the same is called a scalar matrix. If the scalar on the diagonal is 1, the scalar matrix is called an identity matrix.

For example, let
$$A = \begin{bmatrix}2&5&0\\-1&4&1\end{bmatrix}, \quad B = \begin{bmatrix}3&1\\4&5\end{bmatrix}, \quad C = \begin{bmatrix}3&0\\0&2\end{bmatrix}, \quad D = \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}$$
The diagonal entries of A are 2 and 4, but A is not square; B is a square matrix of size 2 × 2 with diagonal entries 3 and 5; C is a diagonal matrix; D is a 3 × 3 identity matrix. The n × n identity matrix is denoted by $I_n$ (or simply I if its size is understood).

Since we can view matrices as generalizations of vectors (and, indeed, matrices can and should be thought of as being made up of both row and column vectors), many of the conventions and operations for vectors carry through (in an obvious way) to matrices. Two matrices are equal if they have the same size and if their corresponding entries are equal. Thus, if $A = [a_{ij}]_{m\times n}$ and $B = [b_{ij}]_{r\times s}$, then A = B if and only if m = r and n = s and $a_{ij} = b_{ij}$ for all i and j.

Example 3.1  Consider the matrices
$$A = \begin{bmatrix}a&b\\c&d\end{bmatrix}, \quad B = \begin{bmatrix}2&0\\5&3\end{bmatrix}, \quad\text{and}\quad C = \begin{bmatrix}2&0&x\\5&3&y\end{bmatrix}$$
Neither A nor B can be equal to C (no matter what the values of x and y), since A and B are 2 × 2 matrices and C is 2 × 3. However, A = B if and only if a = 2, b = 0, c = 5, and d = 3.

Example 3.2  Consider the matrices
$$R = \begin{bmatrix}1&4&3\end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix}1\\4\\3\end{bmatrix}$$
Despite the fact that R and C have the same entries in the same order, R ≠ C, since R is 1 × 3 and C is 3 × 1. (If we read R and C aloud, they both sound the same: "one, four, three.") Thus, our distinction between row matrices/vectors and column matrices/vectors is an important one.

Matrix Addition and Scalar Multiplication

Generalizing from vector addition, we define matrix addition componentwise. If $A = [a_{ij}]$ and $B = [b_{ij}]$ are m × n matrices, their sum A + B is the m × n matrix obtained by adding the corresponding entries. Thus,
$$A + B = [a_{ij} + b_{ij}]$$
[We could equally well have defined A + B in terms of vector addition by specifying that each column (or row) of A + B is the sum of the corresponding columns (or rows) of A and B.] If A and B are not the same size, then A + B is not defined.

Example 3.3  Let
$$A = \begin{bmatrix}1&4&0\\-2&6&5\end{bmatrix}, \quad B = \begin{bmatrix}-3&1&-1\\3&0&2\end{bmatrix}, \quad\text{and}\quad C = \begin{bmatrix}4&3\\2&1\end{bmatrix}$$
Then
$$A + B = \begin{bmatrix}-2&5&-1\\1&6&7\end{bmatrix}$$
but neither A + C nor B + C is defined.

The componentwise definition of scalar multiplication will come as no surprise. If A is an m × n matrix and c is a scalar, then the scalar multiple cA is the m × n matrix obtained by multiplying each entry of A by c. More formally, we have
$$cA = c[a_{ij}] = [ca_{ij}]$$
[In terms of vectors, we could equivalently stipulate that each column (or row) of cA is c times the corresponding column (or row) of A.]

Example 3.4  For matrix A in Example 3.3,
$$2A = \begin{bmatrix}2&8&0\\-4&12&10\end{bmatrix}, \quad \tfrac{1}{2}A = \begin{bmatrix}\tfrac12&2&0\\-1&3&\tfrac52\end{bmatrix}, \quad\text{and}\quad (-1)A = \begin{bmatrix}-1&-4&0\\2&-6&-5\end{bmatrix}$$

The matrix (−1)A is written as −A and called the negative of A. As with vectors, we can use this fact to define the difference of two matrices: If A and B are the same size, then
$$A - B = A + (-B)$$

Example 3.5  For matrices A and B in Example 3.3,
$$A - B = \begin{bmatrix}1&4&0\\-2&6&5\end{bmatrix} - \begin{bmatrix}-3&1&-1\\3&0&2\end{bmatrix} = \begin{bmatrix}4&3&1\\-5&6&3\end{bmatrix}$$
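The componentwise definitions of addition, scalar multiplication, and subtraction are easy to check with software. Here is a small NumPy sketch that reproduces Examples 3.3–3.5.

```python
import numpy as np

A = np.array([[ 1, 4, 0],
              [-2, 6, 5]])
B = np.array([[-3, 1, -1],
              [ 3, 0,  2]])

print(A + B)    # [[-2  5 -1], [ 1  6  7]]   (Example 3.3)
print(2 * A)    # [[ 2  8  0], [-4 12 10]]   (Example 3.4)
print(A - B)    # [[ 4  3  1], [-5  6  3]]   (Example 3.5)
# Adding matrices of different sizes is not defined; with arrays of these shapes
# NumPy raises an error, so sizes should always be checked first.
```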
A matrix all of whose entries are zero is called a zero matrix and denoted by O (or $O_{m\times n}$ if it is important to specify its size). It should be clear that if A is any matrix and O is the zero matrix of the same size, then
$$A + O = A = O + A \quad\text{and}\quad A - A = O = -A + A$$

Matrix Multiplication

Mathematicians are sometimes like Lewis Carroll's Humpty Dumpty: "When I use a word," Humpty Dumpty said, "it means just what I choose it to mean—neither more nor less" (from Through the Looking Glass).

The Introduction in Section 3.0 suggested that there is a "product" of matrices that is analogous to the composition of functions. We now make this notion more precise. The definition we are about to give generalizes what you should have discovered in Problems 5 and 7 in Section 3.0. Unlike the definitions of matrix addition and scalar multiplication, the definition of the product of two matrices is not a componentwise definition. Of course, there is nothing to stop us from defining a product of matrices in a componentwise fashion; unfortunately, such a definition has few applications and is not as "natural" as the one we now give.

Definition  If A is an m × n matrix and B is an n × r matrix, then the product C = AB is an m × r matrix. The (i, j) entry of the product is computed as follows:
$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$$

Remarks
• Notice that A and B need not be the same size. However, the number of columns of A must be the same as the number of rows of B. If we write the sizes of A, B, and AB in order, we can see at a glance whether this requirement is satisfied. Moreover, we can predict the size of the product before doing any calculations, since the number of rows of AB is the same as the number of rows of A, while the number of columns of AB is the same as the number of columns of B:
$$\underset{m\times n}{A}\ \ \underset{n\times r}{B} = \underset{m\times r}{AB}$$
The two inner dimensions must agree; the outer dimensions give the size of AB.

• The formula for the entries of the product looks like a dot product, and indeed it is. It says that the (i, j) entry of the matrix AB is the dot product of the ith row of A and the jth column of B:
$$c_{ij} = \begin{bmatrix}a_{i1}&a_{i2}&\cdots&a_{in}\end{bmatrix}\cdot\begin{bmatrix}b_{1j}\\b_{2j}\\ \vdots\\ b_{nj}\end{bmatrix}$$
Notice that, in the expression $c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$, the "outer subscripts" on each ab term in the sum are always i and j, whereas the "inner subscripts" always agree and increase from 1 to n. We see this pattern clearly if we write $c_{ij}$ using summation notation:
$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$

Example 3.6  Compute AB if
$$A = \begin{bmatrix}1&3&-1\\-2&-1&1\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}-4&0&3&-1\\5&-2&-1&1\\-1&2&0&6\end{bmatrix}$$
Solution  Since A is 2 × 3 and B is 3 × 4, the product AB is defined and will be a 2 × 4 matrix. The first row of the product C = AB is computed by taking the dot product of the first row of A with each of the columns of B in turn. Thus,
$$c_{11} = 1(-4) + 3(5) + (-1)(-1) = 12$$
$$c_{12} = 1(0) + 3(-2) + (-1)(2) = -8$$
$$c_{13} = 1(3) + 3(-1) + (-1)(0) = 0$$
$$c_{14} = 1(-1) + 3(1) + (-1)(6) = -4$$
The second row of C is computed by taking the dot product of the second row of A with each of the columns of B in turn:
$$c_{21} = (-2)(-4) + (-1)(5) + (1)(-1) = 2$$
$$c_{22} = (-2)(0) + (-1)(-2) + (1)(2) = 4$$
$$c_{23} = (-2)(3) + (-1)(-1) + (1)(0) = -5$$
$$c_{24} = (-2)(-1) + (-1)(1) + (1)(6) = 7$$
Thus, the product matrix is given by
$$AB = \begin{bmatrix}12&-8&0&-4\\2&4&-5&7\end{bmatrix}$$
(With a little practice, you should be able to do these calculations mentally without writing out all of the details as we have done here. For more complicated examples, a calculator with matrix capabilities or a computer algebra system is preferable.)
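A computer algebra system or NumPy will happily do this bookkeeping. The sketch below reproduces Example 3.6 and illustrates the row-times-column rule for a single entry.

```python
import numpy as np

A = np.array([[ 1,  3, -1],
              [-2, -1,  1]])
B = np.array([[-4,  0,  3, -1],
              [ 5, -2, -1,  1],
              [-1,  2,  0,  6]])

C = A @ B                 # A is 2x3 and B is 3x4, so C is 2x4
print(C)                  # [[12 -8  0 -4], [ 2  4 -5  7]]

# Each entry of AB is the dot product of a row of A with a column of B:
print(A[0, :] @ B[:, 0])  # c_11 = 12
```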
Before we go further, we will consider two examples that justify our chosen definition of matrix multiplication.

Example 3.7  Ann and Bert are planning to go shopping for fruit for the next week. They each want to buy some apples, oranges, and grapefruit, but in differing amounts. Table 3.1 lists what they intend to buy. There are two fruit markets nearby—Sam's and Theo's—and their prices are given in Table 3.2. How much will it cost Ann and Bert to do their shopping at each of the two markets?

Table 3.1
         Apples   Grapefruit   Oranges
Ann        6          3          10
Bert       4          8           5

Table 3.2
              Sam's    Theo's
Apple         $0.10    $0.15
Grapefruit    $0.40    $0.30
Orange        $0.10    $0.20

Solution  If Ann shops at Sam's, she will spend 6(0.10) + 3(0.40) + 10(0.10) = $2.80. If she shops at Theo's, she will spend 6(0.15) + 3(0.30) + 10(0.20) = $3.80. Bert will spend 4(0.10) + 8(0.40) + 5(0.10) = $4.10 at Sam's and 4(0.15) + 8(0.30) + 5(0.20) = $4.00 at Theo's. (Presumably, Ann will shop at Sam's while Bert goes to Theo's.)

The "dot product form" of these calculations suggests that matrix multiplication is at work here. If we organize the given information into a demand matrix D and a price matrix P, we have
$$D = \begin{bmatrix}6&3&10\\4&8&5\end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix}0.10&0.15\\0.40&0.30\\0.10&0.20\end{bmatrix}$$
The calculations above are equivalent to computing the product
$$DP = \begin{bmatrix}6&3&10\\4&8&5\end{bmatrix}\begin{bmatrix}0.10&0.15\\0.40&0.30\\0.10&0.20\end{bmatrix} = \begin{bmatrix}2.80&3.80\\4.10&4.00\end{bmatrix}$$
Thus, the product matrix DP tells us how much each person's purchases will cost at each store (Table 3.3).

Table 3.3
        Sam's    Theo's
Ann     $2.80    $3.80
Bert    $4.10    $4.00

Example 3.8  Consider the linear system
$$\begin{aligned} x_1 - 2x_2 + 3x_3 &= 5\\ -x_1 + 3x_2 + x_3 &= 1\\ 2x_1 - x_2 + 4x_3 &= 14 \end{aligned} \qquad (1)$$
Observe that the left-hand side arises from the matrix product
$$\begin{bmatrix}1&-2&3\\-1&3&1\\2&-1&4\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$$
so the system (1) can be written as
$$\begin{bmatrix}1&-2&3\\-1&3&1\\2&-1&4\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}5\\1\\14\end{bmatrix}$$
or Ax = b, where A is the coefficient matrix, x is the (column) vector of variables, and b is the (column) vector of constant terms.

You should have no difficulty seeing that every linear system can be written in the form Ax = b. In fact, the notation [A | b] for the augmented matrix of a linear system is just shorthand for the matrix equation Ax = b. This form will prove to be a tremendously useful way of expressing a system of linear equations, and we will exploit it often from here on. Combining this insight with Theorem 2.4, we see that Ax = b has a solution if and only if b is a linear combination of the columns of A.

There is another fact about matrix operations that will also prove to be quite useful: Multiplication of a matrix by a standard unit vector can be used to "pick out" or "reproduce" a column or row of a matrix. Let
$$A = \begin{bmatrix}4&2&1\\0&5&-1\end{bmatrix}$$
and consider the products $A\mathbf{e}_3$ and $\mathbf{e}_2A$, with the unit vectors $\mathbf{e}_3$ and $\mathbf{e}_2$ chosen so that the products make sense. Thus,
$$A\mathbf{e}_3 = \begin{bmatrix}4&2&1\\0&5&-1\end{bmatrix}\begin{bmatrix}0\\0\\1\end{bmatrix} = \begin{bmatrix}1\\-1\end{bmatrix} \quad\text{and}\quad \mathbf{e}_2A = \begin{bmatrix}0&1\end{bmatrix}\begin{bmatrix}4&2&1\\0&5&-1\end{bmatrix} = \begin{bmatrix}0&5&-1\end{bmatrix}$$
Notice that $A\mathbf{e}_3$ gives us the third column of A and $\mathbf{e}_2A$ gives us the second row of A. We record the general result as a theorem.

Theorem 3.1  Let A be an m × n matrix, $\mathbf{e}_i$ a 1 × m standard unit vector, and $\mathbf{e}_j$ an n × 1 standard unit vector. Then
a. $\mathbf{e}_iA$ is the ith row of A and
b. $A\mathbf{e}_j$ is the jth column of A.

Proof  We prove (b) and leave proving (a) as Exercise 41. If $\mathbf{a}_1, \ldots, \mathbf{a}_n$ are the columns of A, then the product $A\mathbf{e}_j$ can be written
$$A\mathbf{e}_j = 0\mathbf{a}_1 + 0\mathbf{a}_2 + \cdots + 1\mathbf{a}_j + \cdots + 0\mathbf{a}_n = \mathbf{a}_j$$
We could also prove (b) by direct calculation:
$$A\mathbf{e}_j = \begin{bmatrix}a_{11}&\cdots&a_{1j}&\cdots&a_{1n}\\a_{21}&\cdots&a_{2j}&\cdots&a_{2n}\\ \vdots& &\vdots& &\vdots\\ a_{m1}&\cdots&a_{mj}&\cdots&a_{mn}\end{bmatrix}\begin{bmatrix}0\\ \vdots\\ 1\\ \vdots\\ 0\end{bmatrix} = \begin{bmatrix}a_{1j}\\a_{2j}\\ \vdots\\ a_{mj}\end{bmatrix}$$
since the 1 in $\mathbf{e}_j$ is the jth entry.
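The matrix form Ax = b and Theorem 3.1 are easy to experiment with. The following NumPy sketch solves the system of Example 3.8 and uses standard unit vectors to pick out a row and a column.

```python
import numpy as np

# The system of Example 3.8 in the form Ax = b
A = np.array([[ 1, -2, 3],
              [-1,  3, 1],
              [ 2, -1, 4]])
b = np.array([5, 1, 14])
print(np.linalg.solve(A, b))   # [6. 2. 1.]

# Theorem 3.1: multiplying by standard unit vectors picks out rows and columns
M  = np.array([[4, 2,  1],
               [0, 5, -1]])
e3 = np.array([0, 0, 1])       # n x 1 standard unit vector (n = 3)
e2 = np.array([0, 1])          # 1 x m standard unit vector (m = 2)
print(M @ e3)                  # third column of M: [ 1 -1]
print(e2 @ M)                  # second row of M:   [ 0  5 -1]
```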
Partitioned Matrices

It will often be convenient to regard a matrix as being composed of a number of smaller submatrices. By introducing vertical and horizontal lines into a matrix, we can partition it into blocks. There is a natural way to partition many matrices, particularly those arising in certain applications. For example, consider the matrix
$$A = \left[\begin{array}{ccc|cc}1&0&0&2&-1\\0&1&0&1&3\\0&0&1&4&0\\ \hline 0&0&0&1&7\\0&0&0&7&2\end{array}\right]$$
It seems natural to partition A as
$$A = \begin{bmatrix}I&B\\O&C\end{bmatrix}$$
where I is the 3 × 3 identity matrix, B is 3 × 2, O is the 2 × 3 zero matrix, and C is 2 × 2. In this way, we can view A as a 2 × 2 matrix whose entries are themselves matrices.

When matrices are being multiplied, there is often an advantage to be gained by viewing them as partitioned matrices. Not only does this frequently reveal underlying structures, but it often speeds up computation, especially when the matrices are large and have many blocks of zeros. It turns out that the multiplication of partitioned matrices is just like ordinary matrix multiplication.

We begin by considering some special cases of partitioned matrices. Each gives rise to a different way of viewing the product of two matrices. Suppose A is m × n and B is n × r, so the product AB exists. If we partition B in terms of its column vectors, as $B = [\mathbf{b}_1\mid \mathbf{b}_2\mid \cdots\mid \mathbf{b}_r]$, then
$$AB = A[\mathbf{b}_1\mid \mathbf{b}_2\mid \cdots\mid \mathbf{b}_r] = [A\mathbf{b}_1\mid A\mathbf{b}_2\mid \cdots\mid A\mathbf{b}_r]$$
This result is an immediate consequence of the definition of matrix multiplication. The form on the right is called the matrix-column representation of the product.

Example 3.9  If
$$A = \begin{bmatrix}1&3&2\\0&-1&1\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}4&-1\\1&2\\3&0\end{bmatrix}$$
then
$$A\mathbf{b}_1 = \begin{bmatrix}1&3&2\\0&-1&1\end{bmatrix}\begin{bmatrix}4\\1\\3\end{bmatrix} = \begin{bmatrix}13\\2\end{bmatrix} \quad\text{and}\quad A\mathbf{b}_2 = \begin{bmatrix}1&3&2\\0&-1&1\end{bmatrix}\begin{bmatrix}-1\\2\\0\end{bmatrix} = \begin{bmatrix}5\\-2\end{bmatrix}$$
Therefore, $AB = [A\mathbf{b}_1\mid A\mathbf{b}_2] = \begin{bmatrix}13&5\\2&-2\end{bmatrix}$. (Check by ordinary matrix multiplication.)

Observe that the matrix-column representation of AB allows us to write each column of AB as a linear combination of the columns of A with entries from B as the coefficients. For example,
$$\begin{bmatrix}13\\2\end{bmatrix} = \begin{bmatrix}1&3&2\\0&-1&1\end{bmatrix}\begin{bmatrix}4\\1\\3\end{bmatrix} = 4\begin{bmatrix}1\\0\end{bmatrix} + 1\begin{bmatrix}3\\-1\end{bmatrix} + 3\begin{bmatrix}2\\1\end{bmatrix}$$
(See Exercises 23 and 26.)

Suppose A is m × n and B is n × r, so the product AB exists. If we partition A in terms of its row vectors, as
$$A = \begin{bmatrix}A_1\\A_2\\ \vdots\\ A_m\end{bmatrix}, \quad\text{then}\quad AB = \begin{bmatrix}A_1\\A_2\\ \vdots\\ A_m\end{bmatrix}B = \begin{bmatrix}A_1B\\A_2B\\ \vdots\\ A_mB\end{bmatrix}$$
Once again, this result is a direct consequence of the definition of matrix multiplication. The form on the right is called the row-matrix representation of the product.

Example 3.10  Using the row-matrix representation, compute AB for the matrices in Example 3.9.
Solution  We compute
$$A_1B = \begin{bmatrix}1&3&2\end{bmatrix}\begin{bmatrix}4&-1\\1&2\\3&0\end{bmatrix} = \begin{bmatrix}13&5\end{bmatrix} \quad\text{and}\quad A_2B = \begin{bmatrix}0&-1&1\end{bmatrix}\begin{bmatrix}4&-1\\1&2\\3&0\end{bmatrix} = \begin{bmatrix}2&-2\end{bmatrix}$$
Therefore, $AB = \begin{bmatrix}A_1B\\A_2B\end{bmatrix} = \begin{bmatrix}13&5\\2&-2\end{bmatrix}$, as before.

The definition of the matrix product AB uses the natural partition of A into rows and B into columns; this form might well be called the row-column representation of the product. We can also partition A into columns and B into rows; this form is called the column-row representation of the product. In this case, we have
$$A = [\mathbf{a}_1\mid \mathbf{a}_2\mid \cdots\mid \mathbf{a}_n] \quad\text{and}\quad B = \begin{bmatrix}B_1\\B_2\\ \vdots\\ B_n\end{bmatrix}$$
so
$$AB = [\mathbf{a}_1\mid \mathbf{a}_2\mid \cdots\mid \mathbf{a}_n]\begin{bmatrix}B_1\\B_2\\ \vdots\\ B_n\end{bmatrix} = \mathbf{a}_1B_1 + \mathbf{a}_2B_2 + \cdots + \mathbf{a}_nB_n \qquad (2)$$
Notice that the sum resembles a dot product expansion; the difference is that the individual terms are matrices, not scalars. Let's make sure that this makes sense. Each term $\mathbf{a}_iB_i$ is the product of an m × 1 and a 1 × r matrix. Thus, each $\mathbf{a}_iB_i$ is an m × r matrix—the same size as AB. The products $\mathbf{a}_iB_i$ are called outer products, and (2) is called the outer product expansion of AB.

Example 3.11  Compute the outer product expansion of AB for the matrices in Example 3.9.
Solution  We have
$$A = [\mathbf{a}_1\mid \mathbf{a}_2\mid \mathbf{a}_3] = \begin{bmatrix}1&3&2\\0&-1&1\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}B_1\\B_2\\B_3\end{bmatrix} = \begin{bmatrix}4&-1\\1&2\\3&0\end{bmatrix}$$
The outer products are
$$\mathbf{a}_1B_1 = \begin{bmatrix}1\\0\end{bmatrix}\begin{bmatrix}4&-1\end{bmatrix} = \begin{bmatrix}4&-1\\0&0\end{bmatrix}, \quad \mathbf{a}_2B_2 = \begin{bmatrix}3\\-1\end{bmatrix}\begin{bmatrix}1&2\end{bmatrix} = \begin{bmatrix}3&6\\-1&-2\end{bmatrix}, \quad \mathbf{a}_3B_3 = \begin{bmatrix}2\\1\end{bmatrix}\begin{bmatrix}3&0\end{bmatrix} = \begin{bmatrix}6&0\\3&0\end{bmatrix}$$
(Observe that computing each outer product is exactly like filling in a multiplication table.) Therefore, the outer product expansion of AB is
$$\mathbf{a}_1B_1 + \mathbf{a}_2B_2 + \mathbf{a}_3B_3 = \begin{bmatrix}4&-1\\0&0\end{bmatrix} + \begin{bmatrix}3&6\\-1&-2\end{bmatrix} + \begin{bmatrix}6&0\\3&0\end{bmatrix} = \begin{bmatrix}13&5\\2&-2\end{bmatrix} = AB$$
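All three representations of the product can be verified directly. The NumPy sketch below recomputes AB from Example 3.9 column by column, row by row, and as a sum of outer products.

```python
import numpy as np

A = np.array([[1,  3, 2],
              [0, -1, 1]])
B = np.array([[4, -1],
              [1,  2],
              [3,  0]])
print(A @ B)                                              # [[13  5], [ 2 -2]]

# Matrix-column representation: column j of AB is A times column j of B
print(np.column_stack([A @ B[:, j] for j in range(B.shape[1])]))

# Row-matrix representation: row i of AB is row i of A times B
print(np.vstack([A[i, :] @ B for i in range(A.shape[0])]))

# Outer product expansion: AB = a1*B1 + a2*B2 + a3*B3
print(sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1])))
```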
We will make use of the outer product expansion in Chapters 5 and 7 when we discuss the Spectral Theorem and the singular value decomposition, respectively.

Each of the foregoing partitions is a special case of partitioning in general. A matrix A is said to be partitioned if horizontal and vertical lines have been introduced, subdividing A into submatrices called blocks. Partitioning allows A to be written as a matrix whose entries are its blocks. For example,
$$A = \left[\begin{array}{ccc|cc}1&0&0&2&-1\\0&1&0&1&3\\0&0&1&4&0\\ \hline 0&0&0&1&7\\0&0&0&7&2\end{array}\right] \quad\text{and}\quad B = \left[\begin{array}{cc|cc|c}4&3&1&2&1\\-1&2&2&1&1\\1&-5&3&3&1\\ \hline 1&0&0&0&2\\0&1&0&0&3\end{array}\right]$$
are partitioned matrices. They have the block structures
$$A = \begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}B_{11}&B_{12}&B_{13}\\B_{21}&B_{22}&B_{23}\end{bmatrix}$$
If two matrices are the same size and have been partitioned in the same way, it is clear that they can be added and multiplied by scalars block by block. Less obvious is the fact that, with suitable partitioning, matrices can be multiplied blockwise as well. The next example illustrates this process.

Example 3.12  Consider the matrices A and B above. If we ignore for the moment the fact that their entries are matrices, then A appears to be a 2 × 2 matrix and B a 2 × 3 matrix. Their product should thus be a 2 × 3 matrix given by
$$AB = \begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}\begin{bmatrix}B_{11}&B_{12}&B_{13}\\B_{21}&B_{22}&B_{23}\end{bmatrix} = \begin{bmatrix}A_{11}B_{11}+A_{12}B_{21}&A_{11}B_{12}+A_{12}B_{22}&A_{11}B_{13}+A_{12}B_{23}\\A_{21}B_{11}+A_{22}B_{21}&A_{21}B_{12}+A_{22}B_{22}&A_{21}B_{13}+A_{22}B_{23}\end{bmatrix}$$
But all of the products in this calculation are actually matrix products, so we need to make sure that they are all defined. A quick check reveals that this is indeed the case, since the numbers of columns in the blocks of A (3 and 2) match the numbers of rows in the blocks of B. The matrices A and B are said to be partitioned conformably for block multiplication.

Carrying out the calculations indicated gives us the product AB in partitioned form. For example,
$$A_{11}B_{11} + A_{12}B_{21} = I_3B_{11} + A_{12}I_2 = B_{11} + A_{12} = \begin{bmatrix}4&3\\-1&2\\1&-5\end{bmatrix} + \begin{bmatrix}2&-1\\1&3\\4&0\end{bmatrix} = \begin{bmatrix}6&2\\0&5\\5&-5\end{bmatrix}$$
(When some of the blocks are zero matrices or identity matrices, as is the case here, these calculations can be done quite quickly.) The calculations for the other five blocks of AB are similar. Check that the result is
$$AB = \left[\begin{array}{cc|cc|c}6&2&1&2&2\\0&5&2&1&12\\5&-5&3&3&9\\ \hline 1&7&0&0&23\\7&2&0&0&20\end{array}\right]$$
(Observe that the block in the upper-left corner is the result of our calculations above.) Check that you obtain the same answer by multiplying A by B in the usual way.

Matrix Powers

When A and B are two n × n matrices, their product AB will also be an n × n matrix. A special case occurs when A = B. It makes sense to define $A^2 = AA$ and, in general, to define $A^k$ as
$$A^k = \underbrace{AA\cdots A}_{k\ \text{factors}}$$
if k is a positive integer. Thus, $A^1 = A$, and it is convenient to define $A^0 = I_n$.

Before making too many assumptions, we should ask ourselves to what extent matrix powers behave like powers of real numbers. The following properties follow immediately from the definitions we have just given and are the matrix analogues of the corresponding properties for powers of real numbers.

If A is a square matrix and r and s are nonnegative integers, then
1. $A^rA^s = A^{r+s}$
2. $(A^r)^s = A^{rs}$

In Section 3.3, we will extend the definition and properties to include negative integer powers.
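Matrix powers can be computed with np.linalg.matrix_power, which follows the same convention $A^0 = I$. A small sketch, using the matrix F from Section 3.0 purely as an illustration:

```python
import numpy as np

A = np.array([[1, 2],
              [0, 3]])

print(np.linalg.matrix_power(A, 0))   # [[1 0], [0 1]], the identity
print(np.linalg.matrix_power(A, 3))   # [[ 1 26], [ 0 27]]

# The exponent laws A^r A^s = A^(r+s) and (A^r)^s = A^(rs) can be checked directly:
A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)
print(np.array_equal(A2 @ A3, np.linalg.matrix_power(A, 5)))   # True
```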
Example 3.13
(a) If $A = \begin{bmatrix}1&1\\1&1\end{bmatrix}$, then
$$A^2 = \begin{bmatrix}2&2\\2&2\end{bmatrix}, \quad A^3 = A^2A = \begin{bmatrix}2&2\\2&2\end{bmatrix}\begin{bmatrix}1&1\\1&1\end{bmatrix} = \begin{bmatrix}4&4\\4&4\end{bmatrix}$$
and, in general,
$$A^n = \begin{bmatrix}2^{n-1}&2^{n-1}\\2^{n-1}&2^{n-1}\end{bmatrix} \quad\text{for all } n \ge 1$$
We verify this formula by mathematical induction. The basis step is the case n = 1, where $A^1 = A$ has all entries equal to $2^0 = 1$, as required. The induction step is to prove that if the formula holds for n = k, then it holds for n = k + 1. Using the definition of matrix powers and the induction hypothesis, we compute
$$A^{k+1} = A^kA = \begin{bmatrix}2^{k-1}&2^{k-1}\\2^{k-1}&2^{k-1}\end{bmatrix}\begin{bmatrix}1&1\\1&1\end{bmatrix} = \begin{bmatrix}2^{k-1}+2^{k-1}&2^{k-1}+2^{k-1}\\2^{k-1}+2^{k-1}&2^{k-1}+2^{k-1}\end{bmatrix} = \begin{bmatrix}2^{k}&2^{k}\\2^{k}&2^{k}\end{bmatrix} = \begin{bmatrix}2^{(k+1)-1}&2^{(k+1)-1}\\2^{(k+1)-1}&2^{(k+1)-1}\end{bmatrix}$$
Thus, the formula holds for all n ≥ 1 by the principle of mathematical induction.

(b) If $B = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$, we find
$$B^2 = \begin{bmatrix}0&-1\\1&0\end{bmatrix}\begin{bmatrix}0&-1\\1&0\end{bmatrix} = \begin{bmatrix}-1&0\\0&-1\end{bmatrix}$$
Continuing,
$$B^3 = B^2B = \begin{bmatrix}-1&0\\0&-1\end{bmatrix}\begin{bmatrix}0&-1\\1&0\end{bmatrix} = \begin{bmatrix}0&1\\-1&0\end{bmatrix} \quad\text{and}\quad B^4 = B^3B = \begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}0&-1\\1&0\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$
Thus, $B^5 = B$, and the sequence of powers of B repeats in a cycle of four:
$$\begin{bmatrix}0&-1\\1&0\end{bmatrix},\ \begin{bmatrix}-1&0\\0&-1\end{bmatrix},\ \begin{bmatrix}0&1\\-1&0\end{bmatrix},\ \begin{bmatrix}1&0\\0&1\end{bmatrix},\ \begin{bmatrix}0&-1\\1&0\end{bmatrix},\ \ldots$$

The Transpose of a Matrix

Thus far, all of the matrix operations we have defined are analogous to operations on real numbers, although they may not always behave in the same way. The next operation has no such analogue.

Definition  The transpose of an m × n matrix A is the n × m matrix $A^T$ obtained by interchanging the rows and columns of A. That is, the ith column of $A^T$ is the ith row of A for all i.

Example 3.14  Let
$$A = \begin{bmatrix}1&3&2\\5&0&1\end{bmatrix}, \quad B = \begin{bmatrix}a&b\\c&d\end{bmatrix}, \quad\text{and}\quad C = \begin{bmatrix}5&-1&2\end{bmatrix}$$
Then their transposes are
$$A^T = \begin{bmatrix}1&5\\3&0\\2&1\end{bmatrix}, \quad B^T = \begin{bmatrix}a&c\\b&d\end{bmatrix}, \quad\text{and}\quad C^T = \begin{bmatrix}5\\-1\\2\end{bmatrix}$$

The transpose is sometimes used to give an alternative definition of the dot product of two vectors in terms of matrix multiplication. If
$$\mathbf{u} = \begin{bmatrix}u_1\\u_2\\ \vdots\\ u_n\end{bmatrix} \quad\text{and}\quad \mathbf{v} = \begin{bmatrix}v_1\\v_2\\ \vdots\\ v_n\end{bmatrix}$$
then
$$\mathbf{u}\cdot\mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n = \begin{bmatrix}u_1&u_2&\cdots&u_n\end{bmatrix}\begin{bmatrix}v_1\\v_2\\ \vdots\\ v_n\end{bmatrix} = \mathbf{u}^T\mathbf{v}$$

A useful alternative definition of the transpose is given componentwise:
$$(A^T)_{ij} = A_{ji} \quad\text{for all } i \text{ and } j$$
In words, the entry in row i and column j of $A^T$ is the same as the entry in row j and column i of A.

The transpose is also used to define a very important type of square matrix: a symmetric matrix.

Definition  A square matrix A is symmetric if $A^T = A$—that is, if A is equal to its own transpose.

Example 3.15  Let
$$A = \begin{bmatrix}1&3&2\\3&5&0\\2&0&4\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}1&2\\-1&3\end{bmatrix}$$
Then A is symmetric, since $A^T = A$; but B is not symmetric, since $B^T = \begin{bmatrix}1&-1\\2&3\end{bmatrix} \ne B$.

A symmetric matrix has the property that it is its own "mirror image" across its main diagonal. Figure 3.1 illustrates this property for a 3 × 3 matrix. The corresponding shapes represent equal entries; the diagonal entries (those on the dashed line) are arbitrary.

[Figure 3.1  A symmetric matrix]

A componentwise definition of a symmetric matrix is also useful. It is simply the algebraic description of the "reflection" property: A square matrix A is symmetric if and only if $A_{ij} = A_{ji}$ for all i and j.

Exercises 3.1

Let
$$A = \begin{bmatrix}3&0\\-1&5\end{bmatrix}, \quad B = \begin{bmatrix}4&-2&1\\0&2&3\end{bmatrix}, \quad C = \begin{bmatrix}1&2\\3&4\\5&6\end{bmatrix}, \quad D = \begin{bmatrix}0&-3\\-2&1\end{bmatrix}, \quad E = \begin{bmatrix}4&2\end{bmatrix}, \quad F = \begin{bmatrix}-1\\2\end{bmatrix}$$

In Exercises 1–16, compute the indicated matrices (if possible).
1. A + 2D    2. 3D − 2A    3. B − C    4. C − Bᵀ
5. AB    6. BD    7. D + BC    8. BBᵀ
9. E(AF)    10. F(DF)    11. FE    12. EF
13. BᵀCᵀ − (CB)ᵀ    14. DA − AD    15. A³    16. (I₂ − D)²

17. Give an example of a nonzero 2 × 2 matrix A such that A² = O.

18. Let $A = \begin{bmatrix}2&1\\6&3\end{bmatrix}$. Find 2 × 2 matrices B and C such that AB = AC but B ≠ C.
19. A factory manufactures three products (doohickies, gizmos, and widgets) and ships them to two warehouses for storage. The number of units of each product shipped to each warehouse is given by the matrix
$$A = \begin{bmatrix}200&75\\150&100\\100&125\end{bmatrix}$$
(where $a_{ij}$ is the number of units of product i sent to warehouse j and the products are taken in alphabetical order). The cost of shipping one unit of each product by truck is $1.50 per doohickey, $1.00 per gizmo, and $2.00 per widget. The corresponding unit costs to ship by train are $1.75, $1.50, and $1.00. Organize these costs into a matrix B and then use matrix multiplication to show how the factory can compare the cost of shipping its products to each of the two warehouses by truck and by train.

20. Referring to Exercise 19, suppose that the unit cost of distributing the products to stores is the same for each product but varies by warehouse because of the distances involved. It costs $0.75 to distribute one unit from warehouse 1 and $1.00 to distribute one unit from warehouse 2. Organize these costs into a matrix C and then use matrix multiplication to compute the total cost of distributing each product.

In Exercises 21–22, write the given system of linear equations as a matrix equation of the form Ax = b.
21. x₁ − 2x₂ + 3x₃ = 0
    2x₁ + x₂ − 5x₃ = 4
22. −x₁ + 2x₃ = 1
    x₁ − x₂ = 2
    x₂ + x₃ = −1

In Exercises 23–28, let
$$A = \begin{bmatrix}1&0\\-3&1\\2&0\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}2&1&-1\\1&0&2\end{bmatrix}$$
23. Use the matrix-column representation of the product to write each column of AB as a linear combination of the columns of A.
24. Use the row-matrix representation of the product to write each row of AB as a linear combination of the rows of B.
25. Compute the outer product expansion of AB.
26. Use the matrix-column representation of the product to write each column of BA as a linear combination of the columns of B.
27. Use the row-matrix representation of the product to write each row of BA as a linear combination of the rows of A.
28. Compute the outer product expansion of BA.

In Exercises 29 and 30, assume that the product AB makes sense.
29. Prove that if the columns of B are linearly dependent, then so are the columns of AB.
30. Prove that if the rows of A are linearly dependent, then so are the rows of AB.

In Exercises 31–34, compute AB by block multiplication, using the indicated partitioning.
31.–34. (Each exercise supplies a pair of partitioned matrices A and B.)

35. Let $A = \begin{bmatrix}0&1\\-1&1\end{bmatrix}$.
(a) Compute A², A³, ..., A⁷.
(b) What is A²⁰¹⁵? Why?

36. Let $B = \dfrac{1}{\sqrt{2}}\begin{bmatrix}1&-1\\1&1\end{bmatrix}$. Find, with justification, B²⁰¹⁵.

37. Let $A = \begin{bmatrix}1&1\\0&1\end{bmatrix}$. Find a formula for Aⁿ (n ≥ 1) and verify your formula using mathematical induction.

38. Let $A = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}$.
(a) Show that $A^2 = \begin{bmatrix}\cos 2\theta&-\sin 2\theta\\ \sin 2\theta&\cos 2\theta\end{bmatrix}$.
(b) Prove, by mathematical induction, that $A^n = \begin{bmatrix}\cos n\theta&-\sin n\theta\\ \sin n\theta&\cos n\theta\end{bmatrix}$ for n ≥ 1.

39. In each of the following, find the 4 × 4 matrix A = [a_ij] that satisfies the given condition:
(a) $a_{ij} = (-1)^{i+j}$    (b) $a_{ij} = j - i$    (c) $a_{ij} = (i-1)j$    (d) $a_{ij} = \sin\bigl((i+j-1)\pi/4\bigr)$

40. In each of the following, find the 6 × 6 matrix A = [a_ij] that satisfies the given condition:
(a) $a_{ij} = i + j$ if $i \le j$, and 0 otherwise    (b) $a_{ij} = 1$ if $|i - j| \le 1$, and 0 otherwise
(c) $a_{ij} = 0$ if $i > j$, and 1 otherwise    (d) $a_{ij} = 1$ if $6 < i + j$, and 0 otherwise

41. Prove Theorem 3.1(a).

3.2 Matrix Algebra

In some ways, the arithmetic of matrices generalizes that of vectors. We do not expect any surprises with respect to addition and scalar multiplication, and indeed there are none. This will allow us to extend to matrices several concepts that we are already familiar with from our work with vectors. In particular, linear combinations, spanning sets, and linear independence carry over to matrices with no difficulty. However, matrices have other operations, such as matrix multiplication, that vectors do not possess.
We should not expect matrix multiplication to behave like multiplication of real numbers unless we can prove that it does; in fact, it does not. In this section, we summarize and prove some of the main properties of matrix operations and begin to develop an algebra of matrices.

Properties of Addition and Scalar Multiplication

All of the algebraic properties of addition and scalar multiplication for vectors (Theorem 1.1) carry over to matrices. For completeness, we summarize these properties in the next theorem.

Theorem 3.2  Algebraic Properties of Matrix Addition and Scalar Multiplication
Let A, B, and C be matrices of the same size and let c and d be scalars. Then
a. A + B = B + A  (Commutativity)
b. (A + B) + C = A + (B + C)  (Associativity)
c. A + O = A
d. A + (−A) = O
e. c(A + B) = cA + cB  (Distributivity)
f. (c + d)A = cA + dA  (Distributivity)
g. c(dA) = (cd)A
h. 1A = A

The proofs of these properties are direct analogues of the corresponding proofs of the vector properties and are left as exercises. Likewise, the comments following Theorem 1.1 are equally valid here, and you should have no difficulty using these properties to perform algebraic manipulations with matrices. (Review Example 1.5 and see Exercises 17 and 18 at the end of this section.)

The associativity property allows us to unambiguously combine scalar multiplication and addition without parentheses. If A, B, and C are matrices of the same size, then
$$(2A + 3B) - C = 2A + (3B - C)$$
and so we can simply write 2A + 3B − C. Generally, then, if $A_1, A_2, \ldots, A_k$ are matrices of the same size and $c_1, c_2, \ldots, c_k$ are scalars, we may form the linear combination
$$c_1A_1 + c_2A_2 + \cdots + c_kA_k$$
We will refer to $c_1, c_2, \ldots, c_k$ as the coefficients of the linear combination. We can now ask and answer questions about linear combinations of matrices.

Example 3.16  Let
$$A_1 = \begin{bmatrix}0&1\\-1&0\end{bmatrix}, \quad A_2 = \begin{bmatrix}1&0\\0&1\end{bmatrix}, \quad\text{and}\quad A_3 = \begin{bmatrix}1&1\\1&1\end{bmatrix}$$
(a) Is $B = \begin{bmatrix}1&4\\2&1\end{bmatrix}$ a linear combination of $A_1$, $A_2$, and $A_3$?
(b) Is $C = \begin{bmatrix}1&2\\3&4\end{bmatrix}$ a linear combination of $A_1$, $A_2$, and $A_3$?

Solution  (a) We want to find scalars $c_1$, $c_2$, and $c_3$ such that $c_1A_1 + c_2A_2 + c_3A_3 = B$. Thus,
$$c_1\begin{bmatrix}0&1\\-1&0\end{bmatrix} + c_2\begin{bmatrix}1&0\\0&1\end{bmatrix} + c_3\begin{bmatrix}1&1\\1&1\end{bmatrix} = \begin{bmatrix}1&4\\2&1\end{bmatrix}$$
The left-hand side of this equation can be rewritten as
$$\begin{bmatrix}c_2+c_3&c_1+c_3\\-c_1+c_3&c_2+c_3\end{bmatrix}$$
Comparing entries and using the definition of matrix equality, we have four linear equations:
$$c_2 + c_3 = 1, \quad c_1 + c_3 = 4, \quad -c_1 + c_3 = 2, \quad c_2 + c_3 = 1$$
Gauss-Jordan elimination easily gives
$$\left[\begin{array}{ccc|c}0&1&1&1\\1&0&1&4\\-1&0&1&2\\0&1&1&1\end{array}\right] \longrightarrow \left[\begin{array}{ccc|c}1&0&0&1\\0&1&0&-2\\0&0&1&3\\0&0&0&0\end{array}\right]$$
(check this!), so $c_1 = 1$, $c_2 = -2$, and $c_3 = 3$. Thus, $A_1 - 2A_2 + 3A_3 = B$, which can be easily checked.

(b) This time we want to solve
$$c_1\begin{bmatrix}0&1\\-1&0\end{bmatrix} + c_2\begin{bmatrix}1&0\\0&1\end{bmatrix} + c_3\begin{bmatrix}1&1\\1&1\end{bmatrix} = \begin{bmatrix}1&2\\3&4\end{bmatrix}$$
Proceeding as in part (a), we obtain the linear system
$$c_2 + c_3 = 1, \quad c_1 + c_3 = 2, \quad -c_1 + c_3 = 3, \quad c_2 + c_3 = 4$$
Row reduction gives
$$\left[\begin{array}{ccc|c}0&1&1&1\\1&0&1&2\\-1&0&1&3\\0&1&1&4\end{array}\right] \longrightarrow \left[\begin{array}{ccc|c}0&1&1&1\\1&0&1&2\\-1&0&1&3\\0&0&0&3\end{array}\right]$$
We need go no further: The last row implies that there is no solution. Therefore, in this case, C is not a linear combination of $A_1$, $A_2$, and $A_3$.

Remark  Observe that the columns of the augmented matrix contain the entries of the matrices we are given. If we read the entries of each matrix from left to right and top to bottom, we get the order in which the entries appear in the columns of the augmented matrix. For example, we read $A_1$ as "0, 1, −1, 0," which corresponds to the first column of the augmented matrix. It is as if we simply "straightened out" the given matrices into column vectors. Thus, we would have ended up with exactly the same system of linear equations as in part (a) if we had asked
$$\text{Is } \begin{bmatrix}1\\4\\2\\1\end{bmatrix} \text{ a linear combination of } \begin{bmatrix}0\\1\\-1\\0\end{bmatrix},\ \begin{bmatrix}1\\0\\0\\1\end{bmatrix},\ \text{and } \begin{bmatrix}1\\1\\1\\1\end{bmatrix}?$$
We will encounter such parallels repeatedly from now on. In Chapter 6, we will explore them in more detail.
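The "straightening out" described in the Remark is exactly how one would check such questions with software: stack the flattened matrices as columns and solve the resulting linear system. A minimal NumPy sketch for Example 3.16(a):

```python
import numpy as np

A1 = np.array([[ 0, 1], [-1, 0]])
A2 = np.array([[ 1, 0], [ 0, 1]])
A3 = np.array([[ 1, 1], [ 1, 1]])
B  = np.array([[ 1, 4], [ 2, 1]])

# Each matrix, read left to right and top to bottom, becomes one column of the
# coefficient matrix of the linear system in Example 3.16.
M = np.column_stack([A1.flatten(), A2.flatten(), A3.flatten()])
c, residual, rank, _ = np.linalg.lstsq(M, B.flatten(), rcond=None)
print(np.round(c, 10))                      # [ 1. -2.  3.]
print(np.allclose(1*A1 - 2*A2 + 3*A3, B))   # True: B is a linear combination
```

Repeating the same computation with C in place of B leaves a nonzero residual, matching the conclusion of part (b).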
We can define the span of a set of matrices to be the set of all linear combinations of the matrices.

Example 3.17  Describe the span of the matrices $A_1$, $A_2$, and $A_3$ in Example 3.16.
Solution  One way to do this is simply to write out a general linear combination of $A_1$, $A_2$, and $A_3$. Thus,
$$c_1A_1 + c_2A_2 + c_3A_3 = c_1\begin{bmatrix}0&1\\-1&0\end{bmatrix} + c_2\begin{bmatrix}1&0\\0&1\end{bmatrix} + c_3\begin{bmatrix}1&1\\1&1\end{bmatrix} = \begin{bmatrix}c_2+c_3&c_1+c_3\\-c_1+c_3&c_2+c_3\end{bmatrix}$$
(which is analogous to the parametric representation of a plane). But suppose we want to know when the matrix $\begin{bmatrix}w&x\\y&z\end{bmatrix}$ is in span($A_1$, $A_2$, $A_3$). From the representation above, we know that it is when
$$\begin{bmatrix}c_2+c_3&c_1+c_3\\-c_1+c_3&c_2+c_3\end{bmatrix} = \begin{bmatrix}w&x\\y&z\end{bmatrix}$$
for some choice of scalars $c_1$, $c_2$, $c_3$. This gives rise to a system of linear equations whose left-hand side is exactly the same as in Example 3.16 but whose right-hand side is general. The augmented matrix of this system is
$$\left[\begin{array}{ccc|c}0&1&1&w\\1&0&1&x\\-1&0&1&y\\0&1&1&z\end{array}\right]$$
and row reduction produces
$$\left[\begin{array}{ccc|c}0&1&1&w\\1&0&1&x\\-1&0&1&y\\0&1&1&z\end{array}\right] \longrightarrow \left[\begin{array}{ccc|c}1&0&0&\tfrac12 x - \tfrac12 y\\0&1&0&w - \tfrac12 x - \tfrac12 y\\0&0&1&\tfrac12 x + \tfrac12 y\\0&0&0&w - z\end{array}\right]$$
(Check this carefully.) The only restriction comes from the last row, where clearly we must have w − z = 0 in order to have a solution. Thus, the span of $A_1$, $A_2$, and $A_3$ consists of all matrices $\begin{bmatrix}w&x\\y&z\end{bmatrix}$ for which w = z. That is,
$$\operatorname{span}(A_1, A_2, A_3) = \left\{\begin{bmatrix}w&x\\y&w\end{bmatrix}\right\}$$

Note  If we had known this before attempting Example 3.16, we would have seen immediately that $B = \begin{bmatrix}1&4\\2&1\end{bmatrix}$ is a linear combination of $A_1$, $A_2$, and $A_3$, since it has the necessary form (take w = 1, x = 4, and y = 2), but $C = \begin{bmatrix}1&2\\3&4\end{bmatrix}$ cannot be a linear combination of $A_1$, $A_2$, and $A_3$, since it does not have the proper form (1 ≠ 4).

Linear independence also makes sense for matrices. We say that matrices $A_1, A_2, \ldots, A_k$ of the same size are linearly independent if the only solution of the equation
$$c_1A_1 + c_2A_2 + \cdots + c_kA_k = O \qquad (1)$$
is the trivial one: $c_1 = c_2 = \cdots = c_k = 0$. If there are nontrivial coefficients that satisfy (1), then $A_1, A_2, \ldots, A_k$ are called linearly dependent.

Example 3.18  Determine whether the matrices $A_1$, $A_2$, and $A_3$ in Example 3.16 are linearly independent.
Solution  We want to solve the equation $c_1A_1 + c_2A_2 + c_3A_3 = O$. Writing out the matrices, we have
$$c_1\begin{bmatrix}0&1\\-1&0\end{bmatrix} + c_2\begin{bmatrix}1&0\\0&1\end{bmatrix} + c_3\begin{bmatrix}1&1\\1&1\end{bmatrix} = \begin{bmatrix}0&0\\0&0\end{bmatrix}$$
This time we get a homogeneous linear system whose left-hand side is the same as in Examples 3.16 and 3.17. (Are you starting to spot a pattern yet?) The augmented matrix row reduces to give
$$\left[\begin{array}{ccc|c}0&1&1&0\\1&0&1&0\\-1&0&1&0\\0&1&1&0\end{array}\right] \longrightarrow \left[\begin{array}{ccc|c}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&0\end{array}\right]$$
Thus, $c_1 = c_2 = c_3 = 0$, and we conclude that the matrices $A_1$, $A_2$, and $A_3$ are linearly independent.

Properties of Matrix Multiplication

Whenever we encounter a new operation, such as matrix multiplication, we must be careful not to assume too much about it. It would be nice if matrix multiplication behaved like multiplication of real numbers. Although in many respects it does, there are some significant differences.

Example 3.19  Consider the matrices
$$A = \begin{bmatrix}2&4\\-1&-2\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}1&0\\1&1\end{bmatrix}$$
Multiplying gives
$$AB = \begin{bmatrix}6&4\\-3&-2\end{bmatrix} \quad\text{and}\quad BA = \begin{bmatrix}2&4\\1&2\end{bmatrix}$$
Thus, AB ≠ BA. So, in contrast to multiplication of real numbers, matrix multiplication is not commutative—the order of the factors in a product matters!

It is easy to check that $A^2 = \begin{bmatrix}0&0\\0&0\end{bmatrix}$ (do so!). So, for matrices, the equation $A^2 = O$ does not imply that A = O (unlike the situation for real numbers, where the equation $x^2 = 0$ has only x = 0 as a solution).

However gloomy things might appear after the last example, the situation is not really bad at all—you just need to get used to working with matrices and to constantly remind yourself that they are not numbers. The next theorem summarizes the main properties of matrix multiplication.

Theorem 3.3  Properties of Matrix Multiplication
Let A, B, and C be matrices (whose sizes are such that the indicated operations can be performed) and let k be a scalar. Then
a. A(BC) = (AB)C  (Associativity)
b. A(B + C) = AB + AC  (Left distributivity)
c. (A + B)C = AC + BC  (Right distributivity)
d. k(AB) = (kA)B = A(kB)
e. $I_mA = A = AI_n$ if A is m × n  (Multiplicative identity)

Proof  We prove (b) and half of (e). We defer the proof of property (a) until Section 3.6. The remaining properties are considered in the exercises.
(b) To prove A(B + C) = AB + AC, we let the rows of A be denoted by $A_i$ and the columns of B and C by $\mathbf{b}_j$ and $\mathbf{c}_j$. Then the jth column of B + C is $\mathbf{b}_j + \mathbf{c}_j$ (since addition is defined componentwise), and thus
$$[A(B+C)]_{ij} = A_i\cdot(\mathbf{b}_j+\mathbf{c}_j) = A_i\cdot\mathbf{b}_j + A_i\cdot\mathbf{c}_j = (AB)_{ij} + (AC)_{ij} = (AB+AC)_{ij}$$
Since this is true for all i and j, we must have A(B + C) = AB + AC.
(e) To prove $AI_n = A$, we note that the identity matrix $I_n$ can be column-partitioned as
$$I_n = [\mathbf{e}_1\mid \mathbf{e}_2\mid \cdots\mid \mathbf{e}_n]$$
where $\mathbf{e}_i$ is a standard unit vector. Therefore,
$$AI_n = [A\mathbf{e}_1\mid A\mathbf{e}_2\mid \cdots\mid A\mathbf{e}_n] = [\mathbf{a}_1\mid \mathbf{a}_2\mid \cdots\mid \mathbf{a}_n] = A$$
by Theorem 3.1(b).

We can use these properties to further explore how closely matrix multiplication resembles multiplication of real numbers.

Example 3.20  If A and B are square matrices of the same size, is $(A + B)^2 = A^2 + 2AB + B^2$?
Solution  Using properties of matrix multiplication, we compute
$$(A+B)^2 = (A+B)(A+B) = (A+B)A + (A+B)B \quad\text{by left distributivity}$$
$$= A^2 + BA + AB + B^2 \quad\text{by right distributivity}$$
Therefore, $(A + B)^2 = A^2 + 2AB + B^2$ if and only if $A^2 + BA + AB + B^2 = A^2 + 2AB + B^2$. Subtracting $A^2$ and $B^2$ from both sides gives BA + AB = 2AB. Subtracting AB from both sides gives BA = AB.
Thus, $(A + B)^2 = A^2 + 2AB + B^2$ if and only if A and B commute. (Can you give an example of such a pair of matrices? Can you find two matrices that do not satisfy this property?)
Properties of the Transpose

Theorem 3.4  Properties of the Transpose
Let A and B be matrices (whose sizes are such that the indicated operations can be performed) and let k be a scalar. Then
a. $(A^T)^T = A$
b. $(A + B)^T = A^T + B^T$
c. $(kA)^T = k(A^T)$
d. $(AB)^T = B^TA^T$
e. $(A^r)^T = (A^T)^r$ for all nonnegative integers r

Proof  Properties (a)–(c) are intuitively clear and straightforward to prove (see Exercise 30). Proving property (e) is a good exercise in mathematical induction (see Exercise 31). We will prove (d), since it is not what you might have expected. [Would you have suspected that $(AB)^T = A^TB^T$ might be true?]
First, if A is m × n and B is n × r, then $B^T$ is r × n and $A^T$ is n × m. Thus, the product $B^TA^T$ is defined and is r × m. Since AB is m × r, $(AB)^T$ is r × m, and so $(AB)^T$ and $B^TA^T$ have the same size. We must now prove that their corresponding entries are equal.
We denote the ith row of a matrix X by $\operatorname{row}_i(X)$ and its jth column by $\operatorname{col}_j(X)$. Using these conventions, we see that
$$[(AB)^T]_{ij} = (AB)_{ji} = \operatorname{row}_j(A)\cdot\operatorname{col}_i(B) = \operatorname{col}_j(A^T)\cdot\operatorname{row}_i(B^T) = \operatorname{row}_i(B^T)\cdot\operatorname{col}_j(A^T) = [B^TA^T]_{ij}$$
(Note that we have used the definition of matrix multiplication, the definition of the transpose, and the fact that the dot product is commutative.) Since i and j are arbitrary, this result implies that $(AB)^T = B^TA^T$.

Remark  Properties (b) and (d) of Theorem 3.4 can be generalized to sums and products of finitely many matrices:
$$(A_1 + A_2 + \cdots + A_k)^T = A_1^T + A_2^T + \cdots + A_k^T \quad\text{and}\quad (A_1A_2\cdots A_k)^T = A_k^T\cdots A_2^TA_1^T$$
assuming that the sizes of the matrices are such that all of the operations can be performed. You are asked to prove these facts by mathematical induction in Exercises 32 and 33.

Example 3.21  Let
$$A = \begin{bmatrix}1&2\\3&4\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}4&-1&0\\2&3&1\end{bmatrix}$$
Then $A^T = \begin{bmatrix}1&3\\2&4\end{bmatrix}$, so
$$A + A^T = \begin{bmatrix}2&5\\5&8\end{bmatrix}$$
a symmetric matrix.
We have $B^T = \begin{bmatrix}4&2\\-1&3\\0&1\end{bmatrix}$, so
$$BB^T = \begin{bmatrix}4&-1&0\\2&3&1\end{bmatrix}\begin{bmatrix}4&2\\-1&3\\0&1\end{bmatrix} = \begin{bmatrix}17&5\\5&14\end{bmatrix} \quad\text{and}\quad B^TB = \begin{bmatrix}4&2\\-1&3\\0&1\end{bmatrix}\begin{bmatrix}4&-1&0\\2&3&1\end{bmatrix} = \begin{bmatrix}20&2&2\\2&10&3\\2&3&1\end{bmatrix}$$
Thus, both $BB^T$ and $B^TB$ are symmetric, even though B is not even square! (Check that $AA^T$ and $A^TA$ are also symmetric.)

The next theorem says that the results of Example 3.21 are true in general.

Theorem 3.5
a. If A is a square matrix, then $A + A^T$ is a symmetric matrix.
b. For any matrix A, $AA^T$ and $A^TA$ are symmetric matrices.

Proof  We prove (a) and leave proving (b) as Exercise 34. We simply check that
$$(A + A^T)^T = A^T + (A^T)^T = A^T + A = A + A^T$$
(using properties of the transpose and the commutativity of matrix addition). Thus, $A + A^T$ is equal to its own transpose and so, by definition, is symmetric.

Exercises 3.2

In Exercises 1–4, solve the equation for X, given that
$$A = \begin{bmatrix}1&2\\3&4\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}-1&0\\1&1\end{bmatrix}$$
1. X − 2A + 3B = O
2. 2X = A − B
3. 2(A + 2B) = 3X
4. 2(A − B + X) = 3(X − A)

In Exercises 5–8, write B as a linear combination of the other matrices, if possible.
5. $B = \begin{bmatrix}2&5\\0&3\end{bmatrix}$, $A_1 = \begin{bmatrix}1&2\\-1&1\end{bmatrix}$, $A_2 = \begin{bmatrix}0&1\\2&1\end{bmatrix}$
6. $B = \begin{bmatrix}2&3\\-4&2\end{bmatrix}$, $A_1 = \begin{bmatrix}1&0\\0&1\end{bmatrix}$, $A_2 = \begin{bmatrix}1&1\\0&1\end{bmatrix}$, $A_3 = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$
7.–8. (Each exercise supplies a matrix B and matrices $A_1$, $A_2$, $A_3$ of size 3 × 3.)

In Exercises 9–12, find the general form of the span of the indicated matrices, as in Example 3.17.
9. span($A_1$, $A_2$) in Exercise 5
10. span($A_1$, $A_2$, $A_3$) in Exercise 6
11. span($A_1$, $A_2$, $A_3$) in Exercise 7
12. span($A_1$, $A_2$, $A_3$, $A_4$) in Exercise 8

In Exercises 13–16, determine whether the given matrices are linearly independent.
13. $\begin{bmatrix}1&2\\3&4\end{bmatrix}$, $\begin{bmatrix}4&3\\2&1\end{bmatrix}$
14.–16. (Each exercise supplies a set of two or more matrices.)

17. Prove Theorem 3.2(a)–(d).
18. Prove Theorem 3.2(e)–(h).
19. Prove Theorem 3.3(c).
20. Prove Theorem 3.3(d).
21. Prove the half of Theorem 3.3(e) that was not proved in the text.
22. Prove that, for square matrices A and B, AB = BA if and only if $(A - B)(A + B) = A^2 - B^2$.

In Exercises 23–25, if $B = \begin{bmatrix}a&b\\c&d\end{bmatrix}$, find conditions on a, b, c, and d such that AB = BA.
23. $A = \begin{bmatrix}1&1\\0&1\end{bmatrix}$    24. $A = \begin{bmatrix}1&-1\\-1&1\end{bmatrix}$    25. $A = \begin{bmatrix}1&2\\3&4\end{bmatrix}$

26. Find conditions on a, b, c, and d such that $B = \begin{bmatrix}a&b\\c&d\end{bmatrix}$ commutes with both $\begin{bmatrix}1&0\\0&0\end{bmatrix}$ and $\begin{bmatrix}0&0\\0&1\end{bmatrix}$.
27. Find conditions on a, b, c, and d such that $B = \begin{bmatrix}a&b\\c&d\end{bmatrix}$ commutes with every 2 × 2 matrix.
28. Prove that if AB and BA are both defined, then AB and BA are both square matrices.

A square matrix is called upper triangular if all of the entries below the main diagonal are zero.
Thus, the form of an upper triangular matrix is
$$\begin{bmatrix}*&*&\cdots&*\\0&*&\cdots&*\\ \vdots&\ddots&\ddots&\vdots\\0&\cdots&0&*\end{bmatrix}$$
where the entries marked * are arbitrary. A more formal definition of such a matrix $A = [a_{ij}]$ is that $a_{ij} = 0$ if $i > j$.

29. Prove that the product of two upper triangular n × n matrices is upper triangular.
30. Prove Theorem 3.4(a)–(c).
31. Prove Theorem 3.4(e).
32. Using induction, prove that for all n ≥ 1, $(A_1 + A_2 + \cdots + A_n)^T = A_1^T + A_2^T + \cdots + A_n^T$.
33. Using induction, prove that for all n ≥ 1, $(A_1A_2\cdots A_n)^T = A_n^T\cdots A_2^TA_1^T$.
34. Prove Theorem 3.5(b).
35. (a) Prove that if A and B are symmetric n × n matrices, then so is A + B.
    (b) Prove that if A is a symmetric n × n matrix, then so is kA for any scalar k.
36. (a) Give an example to show that if A and B are symmetric n × n matrices, then AB need not be symmetric.
    (b) Prove that if A and B are symmetric n × n matrices, then AB is symmetric if and only if AB = BA.

A square matrix is called skew-symmetric if $A^T = -A$.

37. Which of the following matrices are skew-symmetric?
    (a)–(d) (Four matrices are given.)
38. Give a componentwise definition of a skew-symmetric matrix.
39. Prove that the main diagonal of a skew-symmetric matrix must consist entirely of zeros.
40. Prove that if A and B are skew-symmetric n × n matrices, then so is A + B.
41. If A and B are skew-symmetric 2 × 2 matrices, under what conditions is AB skew-symmetric?
42. Prove that if A is an n × n matrix, then $A - A^T$ is skew-symmetric.
43. (a) Prove that any square matrix A can be written as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Consider Theorem 3.5 and Exercise 42.]
    (b) Illustrate part (a) for the 3 × 3 matrix A whose first row is $\begin{bmatrix}1&2&3\end{bmatrix}$.

The trace of an n × n matrix $A = [a_{ij}]$ is the sum of the entries on its main diagonal and is denoted by tr(A). That is, $\operatorname{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}$.

44. If A and B are n × n matrices, prove the following properties of the trace:
    (a) tr(A + B) = tr(A) + tr(B)
    (b) tr(kA) = k tr(A), where k is a scalar
45. Prove that if A and B are n × n matrices, then tr(AB) = tr(BA).
46. If A is any matrix, to what is tr($AA^T$) equal?
47. Show that there are no 2 × 2 matrices A and B such that $AB - BA = I_2$.

3.3 The Inverse of a Matrix

In this section, we return to the matrix description Ax = b of a system of linear equations and look for ways to use matrix algebra to solve the system. By way of analogy, consider the equation ax = b, where a, b, and x represent real numbers and we want to solve for x. We can quickly figure out that we want x = b/a as the solution, but we must remind ourselves that this is true only if a ≠ 0. Proceeding more slowly, assuming that a ≠ 0, we will reach the solution by the following sequence of steps:
$$ax = b \Rightarrow \frac1a(ax) = \frac1a(b) \Rightarrow \left(\frac1a\,a\right)x = \frac{b}{a} \Rightarrow 1x = \frac{b}{a} \Rightarrow x = \frac{b}{a}$$
(This example shows how much we do in our head and how many properties of arithmetic and algebra we take for granted!)

To imitate this procedure for the matrix equation Ax = b, what do we need? We need to find a matrix A′ (analogous to 1/a) such that A′A = I, an identity matrix (analogous to 1). If such a matrix exists (analogous to the requirement that a ≠ 0), then we can do the following sequence of calculations:
$$A\mathbf{x} = \mathbf{b} \Rightarrow A'(A\mathbf{x}) = A'\mathbf{b} \Rightarrow (A'A)\mathbf{x} = A'\mathbf{b} \Rightarrow I\mathbf{x} = A'\mathbf{b} \Rightarrow \mathbf{x} = A'\mathbf{b}$$
(Why would each of these steps be justified?)

Our goal in this section is to determine precisely when we can find such a matrix A′. In fact, we are going to insist on a bit more: We want not only A′A = I but also AA′ = I.
This requirement forces A and A′ to be square matrices. (Why?)

Definition  If A is an n × n matrix, an inverse of A is an n × n matrix A′ with the property that
$$AA' = I \quad\text{and}\quad A'A = I$$
where $I = I_n$ is the n × n identity matrix. If such an A′ exists, then A is called invertible.

Example 3.22  If $A = \begin{bmatrix}2&5\\1&3\end{bmatrix}$, then $A' = \begin{bmatrix}3&-5\\-1&2\end{bmatrix}$ is an inverse of A, since
$$AA' = \begin{bmatrix}2&5\\1&3\end{bmatrix}\begin{bmatrix}3&-5\\-1&2\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix} \quad\text{and}\quad A'A = \begin{bmatrix}3&-5\\-1&2\end{bmatrix}\begin{bmatrix}2&5\\1&3\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$

Example 3.23  Show that the following matrices are not invertible:
(a) $O = \begin{bmatrix}0&0\\0&0\end{bmatrix}$   (b) $B = \begin{bmatrix}1&2\\2&4\end{bmatrix}$
Solution  (a) It is easy to see that the zero matrix O does not have an inverse. If it did, then there would be a matrix O′ such that OO′ = I = O′O. But the product of the zero matrix with any other matrix is the zero matrix, and so OO′ could never equal the identity matrix I. (Notice that this proof makes no reference to the size of the matrices and so is true for n × n matrices in general.)
(b) Suppose B has an inverse $B' = \begin{bmatrix}w&x\\y&z\end{bmatrix}$. The equation BB′ = I gives
$$\begin{bmatrix}1&2\\2&4\end{bmatrix}\begin{bmatrix}w&x\\y&z\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$
from which we get the equations
$$w + 2y = 1, \quad x + 2z = 0, \quad 2w + 4y = 0, \quad 2x + 4z = 1$$
Subtracting twice the first equation from the third yields 0 = −2, which is clearly absurd. Thus, there is no solution. (Row reduction gives the same result but is not really needed here.) We deduce that no such matrix B′ exists; that is, B is not invertible. (In fact, it does not even have an inverse that works on one side, let alone two!)

Remarks
• Even though we have seen that matrix multiplication is not, in general, commutative, A′ (if it exists) must satisfy A′A = AA′.
• The examples above raise two questions, which we will answer in this section: (1) How can we know when a matrix has an inverse? (2) If a matrix does have an inverse, how can we find it?
• We have not ruled out the possibility that a matrix A might have more than one inverse. The next theorem assures us that this cannot happen.

Theorem 3.6  If A is an invertible matrix, then its inverse is unique.
Proof  In mathematics, a standard way to show that there is just one of something is to show that there cannot be more than one. So, suppose that A has two inverses—say, A′ and A″. Then
$$AA' = I = A'A \quad\text{and}\quad AA'' = I = A''A$$
Thus,
$$A' = A'I = A'(AA'') = (A'A)A'' = IA'' = A''$$
Hence, A′ = A″, and the inverse is unique.

Thanks to this theorem, we can now refer to the inverse of an invertible matrix. From now on, when A is invertible, we will denote its (unique) inverse by $A^{-1}$ (pronounced "A inverse").

Warning  Do not be tempted to write $A^{-1} = \frac{1}{A}$! There is no such operation as "division by a matrix." Even if there were, how on earth could we divide the scalar 1 by the matrix A? If you ever feel tempted to "divide" by a matrix, what you really want to do is multiply by its inverse.
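Inverses are easy to check numerically: multiply in both orders and compare with the identity. The sketch below verifies Example 3.22; np.linalg.inv is used only as an independent check.

```python
import numpy as np

A = np.array([[2, 5],
              [1, 3]])
A_inv = np.array([[ 3, -5],
                  [-1,  2]])

print(A @ A_inv)            # [[1 0], [0 1]]
print(A_inv @ A)            # [[1 0], [0 1]]
print(np.linalg.inv(A))     # recovers the same matrix A_inv
```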
We can now complete the analogy that we set up at the beginning of this section.

Theorem 3.7  If A is an invertible n × n matrix, then the system of linear equations given by Ax = b has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$ for any b in $\mathbb{R}^n$.

Proof  Theorem 3.7 essentially formalizes the observation we made at the beginning of this section. We will go through it again, a little more carefully this time. We are asked to prove two things: that Ax = b has a solution and that it has only one solution. (In mathematics, such a proof is called an "existence and uniqueness" proof.)
To show that a solution exists, we need only verify that $\mathbf{x} = A^{-1}\mathbf{b}$ works. We check that
$$A(A^{-1}\mathbf{b}) = (AA^{-1})\mathbf{b} = I\mathbf{b} = \mathbf{b}$$
So $A^{-1}\mathbf{b}$ satisfies the equation Ax = b, and hence there is at least this solution. To show that this solution is unique, suppose y is another solution. Then Ay = b, and multiplying both sides of the equation by $A^{-1}$ on the left, we obtain the chain of implications
$$A^{-1}(A\mathbf{y}) = A^{-1}\mathbf{b} \Rightarrow (A^{-1}A)\mathbf{y} = A^{-1}\mathbf{b} \Rightarrow I\mathbf{y} = A^{-1}\mathbf{b} \Rightarrow \mathbf{y} = A^{-1}\mathbf{b}$$
Thus, y is the same solution as before, and therefore the solution is unique.

So, returning to the questions we raised in the Remarks before Theorem 3.6, how can we tell if a matrix is invertible and how can we find its inverse when it is invertible? We will give a general procedure shortly, but the situation for 2 × 2 matrices is sufficiently simple to warrant being singled out.

Theorem 3.8  If $A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$, then A is invertible if ad − bc ≠ 0, in which case
$$A^{-1} = \frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$
If ad − bc = 0, then A is not invertible.

The expression ad − bc is called the determinant of A, denoted det A. The formula for the inverse of $\begin{bmatrix}a&b\\c&d\end{bmatrix}$ (when it exists) is thus $\frac{1}{\det A}$ times the matrix obtained by interchanging the entries on the main diagonal and changing the signs on the other two entries. In addition to giving this formula, Theorem 3.8 says that a 2 × 2 matrix A is invertible if and only if det A ≠ 0. We will see in Chapter 4 that the determinant can be defined for all square matrices and that this result remains true, although there is no simple formula for the inverse of larger square matrices.

Proof  Suppose that det A = ad − bc ≠ 0. Then
$$\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}d&-b\\-c&a\end{bmatrix} = \begin{bmatrix}ad-bc&-ab+ba\\cd-dc&-cb+da\end{bmatrix} = \begin{bmatrix}ad-bc&0\\0&ad-bc\end{bmatrix} = (\det A)\begin{bmatrix}1&0\\0&1\end{bmatrix}$$
Similarly,
$$\begin{bmatrix}d&-b\\-c&a\end{bmatrix}\begin{bmatrix}a&b\\c&d\end{bmatrix} = (\det A)\begin{bmatrix}1&0\\0&1\end{bmatrix}$$
Since det A ≠ 0, we can multiply both sides of each equation by 1/det A to obtain
$$\begin{bmatrix}a&b\\c&d\end{bmatrix}\left(\frac{1}{\det A}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}\right) = \begin{bmatrix}1&0\\0&1\end{bmatrix} \quad\text{and}\quad \left(\frac{1}{\det A}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}\right)\begin{bmatrix}a&b\\c&d\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$
[Note that we have used property (d) of Theorem 3.3.] Thus, the matrix
$$\frac{1}{\det A}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$
satisfies the definition of an inverse, so A is invertible. Since the inverse of A is unique, by Theorem 3.6, we must have
$$A^{-1} = \frac{1}{\det A}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$
Conversely, assume that ad − bc = 0. We will consider separately the cases where a ≠ 0 and where a = 0. If a ≠ 0, then d = bc/a, so the matrix can be written as
$$A = \begin{bmatrix}a&b\\c&d\end{bmatrix} = \begin{bmatrix}a&b\\ka&kb\end{bmatrix}$$
where k = c/a. In other words, the second row of A is a multiple of the first. Referring to Example 3.23(b), we see that if A has an inverse $\begin{bmatrix}w&x\\y&z\end{bmatrix}$, then
$$\begin{bmatrix}a&b\\ka&kb\end{bmatrix}\begin{bmatrix}w&x\\y&z\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$
and the corresponding system of linear equations
$$aw + by = 1, \quad ax + bz = 0, \quad kaw + kby = 0, \quad kax + kbz = 1$$
has no solution. (Why?)
If a = 0, then ad − bc = 0 implies that bc = 0, and therefore either b or c is 0. Thus, A is of the form
$$\begin{bmatrix}0&0\\c&d\end{bmatrix} \quad\text{or}\quad \begin{bmatrix}0&b\\0&d\end{bmatrix}$$
In the first case,
$$\begin{bmatrix}0&0\\c&d\end{bmatrix}\begin{bmatrix}w&x\\y&z\end{bmatrix} = \begin{bmatrix}0&0\\cw+dy&cx+dz\end{bmatrix} \ne \begin{bmatrix}1&0\\0&1\end{bmatrix}$$
Similarly, $\begin{bmatrix}0&b\\0&d\end{bmatrix}$ cannot have an inverse. (Verify this.) Consequently, if ad − bc = 0, then A is not invertible.

Example 3.24  Find the inverses of $A = \begin{bmatrix}1&2\\3&4\end{bmatrix}$ and $B = \begin{bmatrix}12&-15\\4&-5\end{bmatrix}$, if they exist.
Solution  We have det A = 1(4) − 2(3) = −2 ≠ 0, so
$$A^{-1} = \frac{1}{-2}\begin{bmatrix}4&-2\\-3&1\end{bmatrix} = \begin{bmatrix}-2&1\\ \tfrac32&-\tfrac12\end{bmatrix}$$
(Check this.) On the other hand, det B = 12(−5) − (−15)(4) = 0, so B is not invertible.

Example 3.25  Use the inverse of the coefficient matrix to solve the linear system
$$x + 2y = 3$$
$$3x + 4y = -2$$
Solution  The coefficient matrix is the matrix $A = \begin{bmatrix}1&2\\3&4\end{bmatrix}$, whose inverse we computed in Example 3.24. By Theorem 3.7, Ax = b has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$. Here we have $\mathbf{b} = \begin{bmatrix}3\\-2\end{bmatrix}$; thus, the solution to the given system is
$$\mathbf{x} = A^{-1}\mathbf{b} = \begin{bmatrix}-2&1\\ \tfrac32&-\tfrac12\end{bmatrix}\begin{bmatrix}3\\-2\end{bmatrix} = \begin{bmatrix}-8\\ \tfrac{11}{2}\end{bmatrix}$$

Remark  Solving a linear system Ax = b via $\mathbf{x} = A^{-1}\mathbf{b}$ would appear to be a good method. Unfortunately, except for 2 × 2 coefficient matrices and matrices with certain special forms, it is almost always faster to use Gaussian or Gauss-Jordan elimination to find the solution directly. (See Exercise 13.) Furthermore, the technique of Example 3.25 works only when the coefficient matrix is square and invertible, while elimination methods can always be applied.

Properties of Invertible Matrices

The following theorem records some of the most important properties of invertible matrices.

Theorem 3.9
a. If A is an invertible matrix, then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.
b. If A is an invertible matrix and c is a nonzero scalar, then cA is an invertible matrix and $(cA)^{-1} = \frac1c A^{-1}$.
c. If A and B are invertible matrices of the same size, then AB is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
d. If A is an invertible matrix, then $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$.
e. If A is an invertible matrix, then $A^n$ is invertible for all nonnegative integers n and $(A^n)^{-1} = (A^{-1})^n$.

Proof  We will prove properties (a), (c), and (e), leaving properties (b) and (d) to be proven in Exercises 14 and 15.
(a) To show that $A^{-1}$ is invertible, we must argue that there is a matrix X such that
$$A^{-1}X = I = XA^{-1}$$
But A certainly satisfies these equations in place of X, so $A^{-1}$ is invertible and A is an inverse of $A^{-1}$. Since inverses are unique, this means that $(A^{-1})^{-1} = A$.
(c) Here we must show that there is a matrix X such that
$$(AB)X = I = X(AB)$$
The claim is that substituting $B^{-1}A^{-1}$ for X works. We check that
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$
where we have used associativity to shift the parentheses. Similarly, $(B^{-1}A^{-1})(AB) = I$ (check!), so AB is invertible and its inverse is $B^{-1}A^{-1}$.
(e) The basic idea here is easy enough. For example, when n = 2, we have
$$A^2(A^{-1})^2 = AAA^{-1}A^{-1} = AIA^{-1} = AA^{-1} = I$$
Similarly, $(A^{-1})^2A^2 = I$. Thus, $(A^{-1})^2$ is the inverse of $A^2$. It is not difficult to see that a similar argument works for any higher integer value of n. However, mathematical induction is the way to carry out the proof. The basis step is when n = 0, in which case we are being asked to prove that $A^0$ is invertible and that
$$(A^0)^{-1} = (A^{-1})^0$$
This is the same as showing that I is invertible and that $I^{-1} = I$, which is clearly true. (Why? See Exercise 16.) Now we assume that the result is true when n = k, where k is a specific nonnegative integer. That is, the induction hypothesis is to assume that $A^k$ is invertible and that
$$(A^k)^{-1} = (A^{-1})^k$$
The induction step requires that we prove that $A^{k+1}$ is invertible and that $(A^{k+1})^{-1} = (A^{-1})^{k+1}$. Now we know from (c) that $A^{k+1} = A^kA$ is invertible, since A and (by hypothesis) $A^k$ are both invertible. Moreover,
$$(A^{-1})^{k+1} = (A^{-1})^kA^{-1} = (A^k)^{-1}A^{-1} \quad\text{by the induction hypothesis}$$
$$= (AA^k)^{-1} \quad\text{by property (c)}$$
$$= (A^{k+1})^{-1}$$
Therefore, $A^n$ is invertible for all nonnegative integers n, and $(A^n)^{-1} = (A^{-1})^n$ by the principle of mathematical induction.

Remarks
• While all of the properties of Theorem 3.9 are useful, (c) is the one you should highlight. It is perhaps the most important algebraic property of matrix inverses. It is also the one that is easiest to get wrong. In Exercise 17, you are asked to give a counterexample to show that, contrary to what we might like, $(AB)^{-1} \ne A^{-1}B^{-1}$ in general. The correct property, $(AB)^{-1} = B^{-1}A^{-1}$, is sometimes called the socks-and-shoes rule, because, although we put our socks on before our shoes, we take them off in the reverse order.
• Property (c) generalizes to products of finitely many invertible matrices: If $A_1, A_2, \ldots, A_n$ are invertible matrices of the same size, then $A_1A_2\cdots A_n$ is invertible and
$$(A_1A_2\cdots A_n)^{-1} = A_n^{-1}\cdots A_2^{-1}A_1^{-1}$$
(See Exercise 18.) Thus, we can state: The inverse of a product of invertible matrices is the product of their inverses in the reverse order.
• Since, for real numbers, $\frac{1}{a+b} \ne \frac1a + \frac1b$, we should not expect that, for square matrices, $(A + B)^{-1} = A^{-1} + B^{-1}$ (and, indeed, this is not true in general; see Exercise 19). In fact, except for special matrices, there is no formula for $(A + B)^{-1}$.
• Property (e) allows us to define negative integer powers of an invertible matrix:

Definition  If A is an invertible matrix and n is a positive integer, then $A^{-n}$ is defined by
$$A^{-n} = (A^{-1})^n = (A^n)^{-1}$$

With this definition, it can be shown that the rules for exponentiation, $A^rA^s = A^{r+s}$ and $(A^r)^s = A^{rs}$, hold for all integers r and s, provided A is invertible.

One use of the algebraic properties of matrices is to help solve equations involving matrices. The next example illustrates the process. Note that we must pay particular attention to the order of the matrices in the product.

Example 3.26  Solve the following matrix equation for X (assuming that the matrices involved are such that all of the indicated operations are defined):
$$A^{-1}(BX)^{-1} = (A^{-1}B^3)^2$$
Solution  There are many ways to proceed here. One solution is
$$A^{-1}(BX)^{-1} = (A^{-1}B^3)^2 \Rightarrow ((BX)A)^{-1} = (A^{-1}B^3)^2 \Rightarrow \bigl[((BX)A)^{-1}\bigr]^{-1} = \bigl[(A^{-1}B^3)^2\bigr]^{-1}$$
$$\Rightarrow (BX)A = \bigl[(A^{-1}B^3)(A^{-1}B^3)\bigr]^{-1} \Rightarrow (BX)A = B^{-3}(A^{-1})^{-1}B^{-3}(A^{-1})^{-1} \Rightarrow BXA = B^{-3}AB^{-3}A$$
$$\Rightarrow B^{-1}BXAA^{-1} = B^{-1}B^{-3}AB^{-3}AA^{-1} \Rightarrow IXI = B^{-4}AB^{-3} \Rightarrow X = B^{-4}AB^{-3}$$
(Can you justify each step?) Note the careful use of Theorem 3.9(c) and the expansion of $(A^{-1}B^3)^2$. We have also made liberal use of the associativity of matrix multiplication to simplify the placement (or elimination) of parentheses.

Elementary Matrices

We are going to use matrix multiplication to take a different perspective on the row reduction of matrices. In the process, you will discover many new and important insights into the nature of invertible matrices.

Let
$$E = \begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix}5&7\\8&3\\-1&0\end{bmatrix}$$
If we compute the product, we find
$$EA = \begin{bmatrix}5&7\\-1&0\\8&3\end{bmatrix}$$
In other words, multiplying A by E (on the left) has the same effect as interchanging rows 2 and 3 of A. What is significant about E? It is simply the matrix we obtain by applying the same elementary row operation, $R_2 \leftrightarrow R_3$, to the identity matrix $I_3$. It turns out that this always works.

Definition  An elementary matrix is any matrix that can be obtained by performing an elementary row operation on an identity matrix.

Since there are three types of elementary row operations, there are three corresponding types of elementary matrices. Here are some more elementary matrices.

Example 3.27  Let
$$E_1 = \begin{bmatrix}1&0&0&0\\0&3&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}, \quad E_2 = \begin{bmatrix}0&0&1&0\\0&1&0&0\\1&0&0&0\\0&0&0&1\end{bmatrix}, \quad\text{and}\quad E_3 = \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&-2&0&1\end{bmatrix}$$
Each of these matrices has been obtained from the identity matrix $I_4$ by applying a single elementary row operation. The matrix $E_1$ corresponds to $3R_2$, $E_2$ to $R_1 \leftrightarrow R_3$, and $E_3$ to $R_4 - 2R_2$. Observe that when we left-multiply a 4 × n matrix by one of these elementary matrices, the corresponding elementary row operation is performed on the matrix. For example, if
$$A = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\\a_{41}&a_{42}&a_{43}\end{bmatrix}$$
then
$$E_1A = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\3a_{21}&3a_{22}&3a_{23}\\a_{31}&a_{32}&a_{33}\\a_{41}&a_{42}&a_{43}\end{bmatrix}, \quad E_2A = \begin{bmatrix}a_{31}&a_{32}&a_{33}\\a_{21}&a_{22}&a_{23}\\a_{11}&a_{12}&a_{13}\\a_{41}&a_{42}&a_{43}\end{bmatrix}, \quad E_3A = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\\a_{41}-2a_{21}&a_{42}-2a_{22}&a_{43}-2a_{23}\end{bmatrix}$$

Example 3.27 and Exercises 24–30 should convince you that any elementary row operation on any matrix can be accomplished by left-multiplying by a suitable elementary matrix. We record this fact as a theorem, the proof of which is omitted.

Theorem 3.10  Let E be the elementary matrix obtained by performing an elementary row operation on $I_n$. If the same elementary row operation is performed on an n × r matrix A, the result is the same as the matrix EA.

Remark  From a computational point of view, it is not a good idea to use elementary matrices to perform elementary row operations—just do them directly. However, elementary matrices can provide some valuable insights into invertible matrices and the solution of systems of linear equations.

We have already observed that every elementary row operation can be "undone," or "reversed." This same observation applied to elementary matrices shows us that they are invertible.

Example 3.28  Let
$$E_1 = \begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix}, \quad E_2 = \begin{bmatrix}1&0&0\\0&4&0\\0&0&1\end{bmatrix}, \quad\text{and}\quad E_3 = \begin{bmatrix}1&0&0\\0&1&0\\-2&0&1\end{bmatrix}$$
Then $E_1$ corresponds to $R_2 \leftrightarrow R_3$, which is undone by doing $R_2 \leftrightarrow R_3$ again. Thus, $E_1^{-1} = E_1$. (Check by showing that $E_1^2 = E_1E_1 = I$.) The matrix $E_2$ comes from $4R_2$, which is undone by performing $\tfrac14 R_2$. Thus,
$$E_2^{-1} = \begin{bmatrix}1&0&0\\0&\tfrac14&0\\0&0&1\end{bmatrix}$$
which can be easily checked. Finally, $E_3$ corresponds to the elementary row operation $R_3 - 2R_1$, which can be undone by the elementary row operation $R_3 + 2R_1$. So, in this case,
$$E_3^{-1} = \begin{bmatrix}1&0&0\\0&1&0\\2&0&1\end{bmatrix}$$
(Again, it is easy to check this by confirming that the product of this matrix and $E_3$, in both orders, is I.)
Notice that not only is each elementary matrix invertible, but its inverse is another elementary matrix of the same type. We record this finding as the next theorem.

Theorem 3.11  Each elementary matrix is invertible, and its inverse is an elementary matrix of the same type.

The Fundamental Theorem of Invertible Matrices

We are now in a position to prove one of the main results in this book—a set of equivalent characterizations of what it means for a matrix to be invertible. In a sense, much of linear algebra is connected to this theorem, either in the development of these characterizations or in their application. As you might expect, given this introduction, we will use this theorem a great deal. Make it your friend!

We refer to Theorem 3.12 as the first version of the Fundamental Theorem, since we will add to it in subsequent chapters. You are reminded that, when we say that a set of statements about a matrix A are equivalent, we mean that, for a given A, the statements are either all true or all false.

Theorem 3.12  The Fundamental Theorem of Invertible Matrices: Version 1
Let A be an n × n matrix. The following statements are equivalent:
a. A is invertible.
b. Ax = b has a unique solution for every b in $\mathbb{R}^n$.
c. Ax = 0 has only the trivial solution.
d. The reduced row echelon form of A is $I_n$.
e. A is a product of elementary matrices.

Proof  We will establish the theorem by proving the circular chain of implications
$$(a) \Rightarrow (b) \Rightarrow (c) \Rightarrow (d) \Rightarrow (e) \Rightarrow (a)$$
(a) ⇒ (b)  We have already shown that if A is invertible, then Ax = b has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$ for any b in $\mathbb{R}^n$ (Theorem 3.7).
(b) ⇒ (c)  Assume that Ax = b has a unique solution for any b in $\mathbb{R}^n$. This implies, in particular, that Ax = 0 has a unique solution. But a homogeneous system Ax = 0 always has x = 0 as one solution.
3] ; thus, the solution to the given system is Here we have b -2 A Remark Solving a linear system Ax = b viax = A -1b would appear to be a good method. Unfortunately, except for 2 X 2 coefficient matrices and matrices with certain special forms, it is almost always faster to use Gaussian or Gauss-Jordan elimination to find the solution directly. (See Exercise 13.) Furthermore, the technique of Example 3.25 works only when the coefficient matrix is square and invertible, while elimination methods can always be applied. Properties of Invertible Matrices The following theorem records some of the most important properties of invertible matrices. Theorem 3.9 a. If A is an invertible matrix, then A 1 is invertible and (A'1)- A b. If A is an invertible matrix and c is a nonzero scalar, then cA is an invertible matrix and (cA)"1 = —A"1 c c. If A and B are invertible matrices of the same size, then AB is invertible and (AB)_1 = B ]A ' 168 Chapter 3 Matrices d. If A is an invertible matrix, then AT is invertible and (A7)-1 = (A-1)7 e. If A is an invertible matrix, then A" is invertible for all nonnegative integers n and (A") 1 = (A ')" We will prove properties (a), (c), and (e), leaving properties (b) and (d) to be proven in Exercises 14 and 15. (a) To show that A ™1 is invertible, we must argue that there is a matrix X such that A'X = 1 = XT1 But A certainly satisfies these equations in place of X, so A~1 is invertible and A is arc inverse of A~ . Since inverses are unique, this means that (A'~l)'~l = A. (c) Here we must show that there is a matrix X such that (AB)X = I = X(AB) The claim is that substituting B^A-1 for X works. We check that (AB)(B'lA^1) = A^B^A-1 = A7A"1 = AA^1 = I where we have used associativity to shift the parentheses. Similarly, (B~ lA"1) (AB) = I (check!), so AB is invertible and its inverse is B~lA~1. (e) The basic idea here is easy enough. For example, when n = 2, we have A2(A^)2 = AAA "'A-'1 = AJA"1 = AA"! = 7 Similarly, (A~~')2A2 = I Thus, (A-1)2 is the inverse of A2. It is not difficult to see that a similar argument works for any higher integer value of n. However, mathematical induction is the way to carry out the proof. The basis step is when n = 0, in which case we are being asked to prove that A0 is invertible and that (AT1 = (A-1)0 This is the same as showing that I is invertible and that J"1 = I, which is clearly true. »-*- (Why? See Exercise 16.) Now we assume that the result is true when n = k, where A: is a specific nonnegative integer. That is, the induction hypothesis is to assume that Ak is invertible and that (AV = (A" l)k The induction step requires that we prove that Ak+l is invertible and that (Ak+1)~l = (A~l)k*\ Now we know from (c) that Ak+1 = AkA is invertible, since A and (by hypothesis) Ak are both invertible. Moreover, (A~l)M = {A~l)kA~l -■- (Ak)~1A~l by the induction hypothesis = (AA*9~l by property (c) = (A^r1 Section 3.3 The Inverse of a Matrix 169 Therefore, A" is invertible for all nonnegative integers n, and (An) 1 = (A ')" by the principle of mathematical induction. Remarks • While all of the properties of Theorem 3.9 are useful, (c) is the one you should highlight. It is perhaps the most important algebraic property of matrix inverses. It is also the one that is easiest to get wrong. In Exercise 17, you are asked to give a counterexample to show that, contrary to what we might like, (AB)~l # A-1B_1 in general. The correct property, (A8)~! 
Remarks

• While all of the properties of Theorem 3.9 are useful, (c) is the one you should highlight. It is perhaps the most important algebraic property of matrix inverses. It is also the one that is easiest to get wrong. In Exercise 17, you are asked to give a counterexample to show that, contrary to what we might like, (AB)^{-1} ≠ A^{-1}B^{-1} in general. The correct property, (AB)^{-1} = B^{-1}A^{-1}, is sometimes called the socks-and-shoes rule, because, although we put our socks on before our shoes, we take them off in the reverse order.

• Property (c) generalizes to products of finitely many invertible matrices: If A_1, A_2, ..., A_n are invertible matrices of the same size, then A_1A_2···A_n is invertible and

(A_1A_2···A_n)^{-1} = A_n^{-1}···A_2^{-1}A_1^{-1}

(See Exercise 18.) Thus, we can state: The inverse of a product of invertible matrices is the product of their inverses in the reverse order.

• Since, for real numbers, 1/(a + b) ≠ 1/a + 1/b in general, we should not expect that, for square matrices, (A + B)^{-1} = A^{-1} + B^{-1} (and, indeed, this is not true in general; see Exercise 19). In fact, except for special matrices, there is no formula for (A + B)^{-1}.

• Property (e) allows us to define negative integer powers of an invertible matrix:

Definition  If A is an invertible matrix and n is a positive integer, then A^{-n} is defined by

A^{-n} = (A^{-1})^n = (A^n)^{-1}

With this definition, it can be shown that the rules for exponentiation, A^r A^s = A^{r+s} and (A^r)^s = A^{rs}, hold for all integers r and s, provided A is invertible.

One use of the algebraic properties of matrices is to help solve equations involving matrices. The next example illustrates the process. Note that we must pay particular attention to the order of the matrices in the product.

Example 3.26  Solve the following matrix equation for X (assuming that the matrices involved are such that all of the indicated operations are defined):

A^{-1}(BX)^{-1} = (A^{-1}B^3)^2

Solution  There are many ways to proceed here. One solution is

A^{-1}(BX)^{-1} = (A^{-1}B^3)^2
⇒ ((BX)A)^{-1} = (A^{-1}B^3)^2
⇒ [((BX)A)^{-1}]^{-1} = [(A^{-1}B^3)^2]^{-1}
⇒ (BX)A = [(A^{-1}B^3)(A^{-1}B^3)]^{-1}
⇒ (BX)A = B^{-3}(A^{-1})^{-1}B^{-3}(A^{-1})^{-1}
⇒ BXA = B^{-3}AB^{-3}A
⇒ B^{-1}BXAA^{-1} = B^{-1}B^{-3}AB^{-3}AA^{-1}
⇒ IXI = B^{-4}AB^{-3}I
⇒ X = B^{-4}AB^{-3}

(Can you justify each step?) Note the careful use of Theorem 3.9(c) and the expansion of (A^{-1}B^3)^2. We have also made liberal use of the associativity of matrix multiplication to simplify the placement (or elimination) of parentheses.

Elementary Matrices

We are going to use matrix multiplication to take a different perspective on the row reduction of matrices. In the process, you will discover many new and important insights into the nature of invertible matrices.

Let

E = [1 0 0; 0 0 1; 0 1 0]   and   A = [5 7; 8 3; -1 0]

We find that

EA = [5 7; -1 0; 8 3]

In other words, multiplying A by E (on the left) has the same effect as interchanging rows 2 and 3 of A. What is significant about E? It is simply the matrix we obtain by applying the same elementary row operation, R2 ↔ R3, to the identity matrix I_3. It turns out that this always works.

Definition  An elementary matrix is any matrix that can be obtained by performing an elementary row operation on an identity matrix.

Since there are three types of elementary row operations, there are three corresponding types of elementary matrices. Here are some more elementary matrices.

Example 3.27  Let

E_1 = [1 0 0 0; 0 3 0 0; 0 0 1 0; 0 0 0 1],   E_2 = [0 0 1 0; 0 1 0 0; 1 0 0 0; 0 0 0 1],   and   E_3 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 -2 0 1]

Each of these matrices has been obtained from the identity matrix I_4 by applying a single elementary row operation. The matrix E_1 corresponds to 3R2, E_2 to R1 ↔ R3, and E_3 to R4 - 2R2. Observe that when we left-multiply a 4 × n matrix by one of these elementary matrices, the corresponding elementary row operation is performed on the matrix. For example, if

A = [a11 a12 a13; a21 a22 a23; a31 a32 a33; a41 a42 a43]

then

E_1A = [a11 a12 a13; 3a21 3a22 3a23; a31 a32 a33; a41 a42 a43],
E_2A = [a31 a32 a33; a21 a22 a23; a11 a12 a13; a41 a42 a43],
and
E_3A = [a11 a12 a13; a21 a22 a23; a31 a32 a33; a41 - 2a21  a42 - 2a22  a43 - 2a23]
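Example 3.27 can be reproduced numerically. The sketch below, assuming NumPy, builds each E_i by applying the stated row operation to I_4 and checks that left-multiplying an arbitrary 4 × 3 matrix by E_i performs the same operation on that matrix.

```python
import numpy as np

A = np.arange(12, dtype=float).reshape(4, 3)   # a generic 4 x 3 matrix

E1 = np.eye(4); E1[1, 1] = 3                   # 3R2 applied to I4
E2 = np.eye(4); E2[[0, 2]] = E2[[2, 0]]        # R1 <-> R3 applied to I4
E3 = np.eye(4); E3[3, 1] = -2                  # R4 - 2R2 applied to I4

B1 = A.copy(); B1[1] *= 3                      # the same operations applied directly to A
B2 = A.copy(); B2[[0, 2]] = B2[[2, 0]]
B3 = A.copy(); B3[3] -= 2 * B3[1]

print(np.allclose(E1 @ A, B1), np.allclose(E2 @ A, B2), np.allclose(E3 @ A, B3))
# True True True
```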
Example 3.27 and Exercises 24-30 should convince you that any elementary row operation on any matrix can be accomplished by left-multiplying by a suitable elementary matrix. We record this fact as a theorem, the proof of which is omitted.

Theorem 3.10  Let E be the elementary matrix obtained by performing an elementary row operation on I_n. If the same elementary row operation is performed on an n × r matrix A, the result is the same as the matrix EA.

Remark  From a computational point of view, it is not a good idea to use elementary matrices to perform elementary row operations—just do them directly. However, elementary matrices can provide some valuable insights into invertible matrices and the solution of systems of linear equations.

We have already observed that every elementary row operation can be "undone," or "reversed." This same observation applied to elementary matrices shows us that they are invertible.

Example 3.28  Let

E_1 = [1 0 0; 0 0 1; 0 1 0],   E_2 = [1 0 0; 0 4 0; 0 0 1],   and   E_3 = [1 0 0; 0 1 0; -2 0 1]

Then E_1 corresponds to R2 ↔ R3, which is undone by doing R2 ↔ R3 again. Thus, E_1^{-1} = E_1. (Check by showing that E_1^2 = E_1E_1 = I.) The matrix E_2 comes from 4R2, which is undone by performing (1/4)R2. Thus,

E_2^{-1} = [1 0 0; 0 1/4 0; 0 0 1]

which can be easily checked. Finally, E_3 corresponds to the elementary row operation R3 - 2R1, which can be undone by the elementary row operation R3 + 2R1. So, in this case,

E_3^{-1} = [1 0 0; 0 1 0; 2 0 1]

(Again, it is easy to check this by confirming that the product of this matrix and E_3, in both orders, is I.)

Notice that not only is each elementary matrix invertible, but its inverse is another elementary matrix of the same type. We record this finding as the next theorem.

Theorem 3.11  Each elementary matrix is invertible, and its inverse is an elementary matrix of the same type.

The Fundamental Theorem of Invertible Matrices

We are now in a position to prove one of the main results in this book—a set of equivalent characterizations of what it means for a matrix to be invertible. In a sense, much of linear algebra is connected to this theorem, either in the development of these characterizations or in their application. As you might expect, given this introduction, we will use this theorem a great deal. Make it your friend! We refer to Theorem 3.12 as the first version of the Fundamental Theorem, since we will add to it in subsequent chapters.

You are reminded that, when we say that a set of statements about a matrix A are equivalent, we mean that, for a given A, the statements are either all true or all false.

Theorem 3.12  The Fundamental Theorem of Invertible Matrices: Version 1

Let A be an n × n matrix. The following statements are equivalent:
a. A is invertible.
b. Ax = b has a unique solution for every b in R^n.
c. Ax = 0 has only the trivial solution.
d. The reduced row echelon form of A is I_n.
e. A is a product of elementary matrices.

Proof  We will establish the theorem by proving the circular chain of implications (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a).

(a) ⇒ (b)  We have already shown that if A is invertible, then Ax = b has the unique solution x = A^{-1}b for any b in R^n (Theorem 3.7).

(b) ⇒ (c)  Assume that Ax = b has a unique solution for any b in R^n. This implies, in particular, that Ax = 0 has a unique solution. But a homogeneous system Ax = 0 always has x = 0 as one solution.
So in this case, x = 0 must be the solution. (c) =r- (d) Suppose that Ax = 0 has only the trivial solution. The corresponding system of equations is d^yX^ &i2x2 "4" * * * *f" @ifi%fi 0 a21xl a22x2 + • ' • + Cl2nXn — 0 an\x\ anlx2 + ' " ' + annXn = 0 and we are assuming that its solution is Xx = 0 x2 = 0 x„ = 0 In other words, Gauss-Jordan elimination applied to the augmented matrix of the system gives "«u «12 ' ' «1„ 0" ~ 1 0 • • 0 0" [A|0] = «21 «22 ' «2n 0 -> 0 1 • • 0 0 = [/Jo _«»1 «„2 - «nn 0_ _ 0 0 • • 1 0_ Thus, the reduced row echelon form of A is In. (d) (e) If we assume that the reduced row echelon form of A is /„, then A can be reduced to In using a finite sequence of elementary row operations. By Theorem 3.10, each one of these elementary row operations can be achieved by left-multiplying by an appropriate elementary matrix. If the appropriate sequence of elementary matrices is Ej, J52,..., Ek (in that order), then we have Ek- ■ •E2ElA = /,, According to Theorem 3.11, these elementary matrices are all invertible. Therefore, so is their product, and we have A = (lv • ■E2El)-% = (£,•• -E^Y1 = E[lE2l■ • -Ek~l Again, each Ef1 is another elementary matrix, by Theorem 3.11, so we have written A as a product of elementary matrices, as required. (e) »=*■ (a) If A is a product of elementary matrices, then A is invertible, since elementary matrices are invertible and products of invertible matrices are invertible. Chapter 3 Matrices Example 3.29 "Never bring a cannon on stage in Act I unless you intend to fire it by the last act." -Anton Chekhov If possible, express A 2 3 1 3 as a product of elementary matrices. Solution We row reduce A as follows: a '2 3" 1 3" R2 - 2S, "l 3" I 3. „2 3. ,0 -3. 1 0 " 1 o" 0 -3 .0 1 Thus, the reduced row echelon form of A is the identity matrix, so the Fundamental Theorem assures us that A is invertible and can be written as a product of elementary matrices. We have E^EjEiA = I, where "0 r 1 0" 1 r 1 0" > E2 = » £3 — .1 0. -2 1. .0 1. are the elementary matrices corresponding to the four elementary row operations used to reduce A to I. As in the proof of the theorem, we have .A. — (^Ef^E 2-E 2-^1) 1 ^2 "^3 ^4 as required. 0 1 1 0 1 0 2 1 1 0 0 -3 Remark Because the sequence of elementary row operations that transforms A into I is not unique, neither is the representation of A as a product of elementary matrices. (Find a different way to express A as a product of elementary matrices.) The Fundamental Theorem is surprisingly powerful. To illustrate its power, we consider two of its consequences. The first is that, although the definition of an invertible matrix states that a matrix A is invertible if there is a matrix B such that both AB = I and BA = I are satisfied, we need only check one of these equations. Thus, we can cut our work in half! Theorem 3.13 Let A be a square matrix. If B is a square matrix such that either AB = I or BA = I, then A is invertible and B = A~ . Suppose BA = I. Consider the equation Ax = 0. Left-multiplying by B, we have BAx = B0. This implies that x = Ix = 0. Thus, the system represented by Ax = 0 has the unique solution x = 0. From the equivalence of (c) and (a) in the Fundamental Theorem, we know that A is invertible. (That is, A _ 1 exists and satisfies AA ~1 = I = A ~1A.) If we now right-multiply both sides of BA = J by A"1, we obtain BAA" = IA"1 =>BI = A"1 **> B = A~'J (The proof in the case of AB = / is left as Exercise 41.) 
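Example 3.29 and Theorem 3.13 can be checked directly. The sketch below (assuming NumPy) uses the four elementary matrices found in Example 3.29—corresponding to R1 ↔ R2, R2 - 2R1, (-1/3)R2, and R1 - 3R2—to confirm that E_4E_3E_2E_1A = I, that this same product is therefore A^{-1} (an observation made precise by Theorem 3.14 below), and that A is the product of the inverses in the opposite order.

```python
import numpy as np

A = np.array([[2., 3.], [1., 3.]])

# Elementary matrices for R1 <-> R2, R2 - 2R1, (-1/3)R2, R1 - 3R2 (Example 3.29)
E1 = np.array([[0., 1.], [1., 0.]])
E2 = np.array([[1., 0.], [-2., 1.]])
E3 = np.array([[1., 0.], [0., -1/3]])
E4 = np.array([[1., -3.], [0., 1.]])

print(np.allclose(E4 @ E3 @ E2 @ E1 @ A, np.eye(2)))      # True: they reduce A to I
print(np.allclose(E4 @ E3 @ E2 @ E1, np.linalg.inv(A)))   # True: the same product is A^(-1)

invs = [np.linalg.inv(E) for E in (E1, E2, E3, E4)]
print(np.allclose(invs[0] @ invs[1] @ invs[2] @ invs[3], A))
# True: A = E1^(-1) E2^(-1) E3^(-1) E4^(-1), as in Example 3.29
```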
The next consequence of the Fundamental Theorem is the basis for an efficient method of computing the inverse of a matrix. Section 3.3 The Inverse of a Matrix TheOreill 3.14 Lct A be a square matrix. If a sequence of elementary row operations reduces A to /, then the same sequence of elementary row operations transforms / into A '. PfiSf If A is row equivalent to I, then we can achieve the reduction by left-multiplying by a sequence Elt E2, ■ ■ ■, Ek of elementary matrices. Therefore, we have Ek- ■ - E2EXA = 1. Setting B — Ek- ■ ■ E2El gives BA = I. By Theorem 3.13, A is invert-ible and A-1 = B. Now applying the same sequence of elementary row operations to I is equivalent to left-multiplying / by Ek • ■ ■ E2E1 = B. The result is Ek-■ -E2EJ = Bl = B = A 1 Thus, I is transformed into A"1 by the same sequence of elementary row operations. The Gauss-Jordan Method for Computing the inverse We can perform row operations on A and I simultaneously by constructing a "super-augmented matrix" [A\I], Theorem 3.14 shows that if A is row equivalent to I [which, by the Fundamental Theorem (d) <=> (a), means that A is invertible], then elementary row operations will yield [A 17] HA"1] If A cannot be reduced to I, then the Fundamental Theorem guarantees us that A is not invertible. The procedure just described is simply Gauss-Jordan elimination performed on an n X 2«, instead of an n X (n + 1), augmented matrix. Another way to view this procedure is to look at the problem of finding A-1 as solving the matrix equation AX = I„ for an n X n matrix X. (This is sufficient, by the Fundamental Theorem, since a right inverse of A must be a two-sided inverse.) If we denote the columns ofXby x;,..., x,., then this matrix equation is equivalent to solving for the columns of X, one at a time. Since the columns of I„ are the standard unit vectors ..., e„, we thus have n systems of linear equations, all with coefficient matrix A: Axj = ev ..., Ax„ = e„ Since the same sequence of row operations is needed to bring A to reduced row echelon form in each case, the augmented matrices for these systems, [A | ej , . . . , [A | e„], can be combined as [A|eje2- ■ -e„ [A|J„ We now apply row operations to try to reduce A to 1„, which, if successful, will simultaneously solve for the columns of A"1, transforming I„ into A-1. We illustrate this use of Gauss-Jordan elimination with three examples. Example 3.30 Find the inverse of A = 1 2 -1 2 2 4 1 3 ~3 if it exists. 116 Chapter 3 Matrices Solution Gauss-Jordan elimination produces [A\I] Therefore, ——> r3-r2 r,+r3 r2 + 3r3 r, - 2r2 -» A' 1 2 - 1 1 0 0 2 2 4 0 1 0 1 3 - 3 0 0 1_ 1 2 - 1 1 0 0 0 -2 6 —2 1 0 0 1 - 2 -1 0 1 1 2 - 1 1 0 0 0 1 - 3 1 1 2 0 0 1 - 2 -1 0 1 1 2 - 1 1 0 0 0 1 - 3 1 1 2 0 0 0 1 -2 1 2 1 1 2 0 -1 1 2 r 0 1 0 -5 1 3 0 0 1 -2 1 2 1_ 1 0 0 9 3 2 -5 0 1 0 -3 1 3 0 0 1 •2 1 2 1 - 9 - 3 2 -5" -5 1 3 -2 1 2 1. (You should always check that AA 1 = I by direct multiplication. By Theorem 3.13, we do not need to check that A'1 A = / too.) Remark Notice that we have used the variant of Gauss-Jordan elimination that first introduces all of the zeros below the leading Is, from left to right and top to bottom, and then creates zeros above the leading Is, from right to left and bottom to top. This approach saves on calculations, as we noted in Chapter 2, but you may find it easier, when working by hand, to create all of the zeros in each column as you go. The answer, of course, will be the same. Example 3,31 Find the inverse of A = 1 -4 -1 6 2 -2 if it exists. 
Section 3.3 The Inverse of a Matrix ;5» Solution We proceed as in Example 3.30, adjoining the identity matrix to A and then trying to manipulate [A\I] into [I \ A~ ]. 2 1 -4 1 0 0 [A\I] = 4 -1 6 0 1 0 2 2 -2 0 0 1 r2 + 2r, "2 1 -4 1 0 o" -» 0 1 -2 2 1 0 .0 3 -6 1 0 1_ "l 2 -1 1 0 o" ->■ 0 1 -3 2 1 0 0 0 0 -5 -3 1 At this point, we see that it is not possible to reduce A to I, since there is a row of zeros on the left-hand side of the augmented matrix. Consequently, A is not invertible As the next example illustrates, everything works the same way over Zp, where p is prime. ___________,—________—_» Example 3.32 Find the inverse of A = ~2 2' .2 0. if it exists, over Z3. Solution 1 We use the Gauss-Jordan method, remembering that all calculations are in Z3. [A 11] = 2 2 2 0 1 1 2 0 1 1 0 1 1 0 0 1 1 0 0 1 2 0 0 1 2 0" 2 1. 0 2 2 1 Thus, A'1 = 0 2 2 1 , and it is easy to check that, over Z3, AA = 1. Solution 2 Since A is a 2 X 2 matrix, we can also compute A 1 using the formula given in Theorem 3.8. The determinant of A is detA = 2(0) - 2(2) = -1=2 in Z3 (since 2 + 1 =0). Thus, A"1 exists and is given by the formula in Theorem 3.8. We must be careful here, though, since the formula introduces the "fraction" l/det A Chapter 3 Matrices and there are no fractions in Z3. We must use multiplicative inverses rather than division. Instead of 1/det A = 1/2, we use 2"1; that is, we find the number x that satisfies the equation 2x = 1 in Z3. It is easy to see that x = 2 is the solution we want: In Z3, 2~' = 2, since 2(2) = 1. The formula for A-1 now becomes 0 -2' 'o f "0 2' -2 2_ — z .1 2 1 which agrees with our previous solution. Exercises 3.3 In Exercises 1-10, find the inverse of the given matrix (if it exists) using Theorem 3.8. "4 1, 3. 7. 9. 10. 1 3 6 3 4 6 -1.5 0.5 4. -4.2 2.4 4 -2 2 0 0 1 -1 0 1/V2 -1/V2 3.55 0.25 8.52 0.60 l/a 1/b l/c l/d , where neither a, b, c, nor d is 0 In Exercises 11 and 12, solve the given system using the method of Example 3.25. 11. 2x + y= -1 5x + 3y = 2 1 2 12. x[ — jc2 = 1 13. Let A 2 6 b, = -1 2 , and b3 = (a) Find A~1 and use it to solve the three systems Ax = bu Ax = b , and Ax = b3. (b) Solve all three systems at the same time by row reducing the augmented matrix [A | b, b2 b3] using Gauss-Jordan elimination. (c) Carefully count the total number of individual multiplications that you performed in (a) and in (b). You should discover that, even for this 2X2 example, one method uses fewer operations. For larger systems, the difference is even more pronounced, and this explains why computer systems do not use one of these methods to solve linear systems. 14. Prove Theorem 3.9(b). 15. Prove Theorem 3.9(d). 16. Prove that the n X n identity matrix I„ is invertible and that J"1 = ln. 17. (a) Give a counterexample to show that (AB)~l ¥= A^B"1 in general, (b) Under what conditions on A and B is (AB) ~' = A~lB~x?. Prove your assertion. 18. By induction, prove that if A,, A2,..., A„ are invertible matrices of the same size, then the product AXA2 • ' ■ A„ is invertible and (AiA2 ■ ■ • AJ"1 = A^1 • • • A2 !A[ 19. Give a counterexample to show that (A + B)'1 # "] in general. A~l + B~ In Exercises 20-23, solve the given matrix equation for X. Simplify your answers as much as possible. (In the words of Albert Einstein, "Every thing should be made as simple as possible, but not simpler!') Assume that all matrices are invertible. 20. XA1 = A'1 21. AXB = (BAf 22. (A'^XT1 = A(B "2A) 1 23. 
ABXA'^B"1 = I + A In Exercises 24-30, let A = C = i 2 -1 1 1 1 1 -1 0 1 2 -l" 1 1 1 2 1 -1 D = 0 1 -1 -1 3 -1 Section 3.3 The Inverse of a Matrix 179 In each case, find an elementary matrix E that satisfies the given equation. 24. EA = B 25. EB = A 26. EA = C 27. EC = A 28. EC = D 29. ED = C 30. Is there an elementary matrix E such that EA = D? Why or why not? In Exercises 31-38, find the inverse of the given elementary matrix. 31. 33. 35. 37. 0 -2 1 0 0 1 c # 0 32. 34. 36. 38. 2 1 0 1 0 1 0 0 1 0 In Exercises 39 and 40, find a sequence of elementary matrices Eu E2,..., Ek such that Ek- ■ ■ E1ElA = I. Use this sequence to write both A andA~l as products of elementary matrices. 39. A 40. A = 41. Prove Theorem 3.13 for the case of AB 42. (a) Prove that if A is invertible and AB = O, then B = O. (b) Give a counterexample to show that the result in part (a) may fail if A is not invertible. 43. (a) Prove that if A is invertible and BA = CA, then B = C. (b) Give a counterexample to show that the result in part (a) may fail if A is not invertible. 44. A square matrix A is called idempotent if A2 = A. (The word idempotent comes from the Latin idem, meaning "same," and potere, meaning "to have power." Thus, something that is idempotent has the "same power" when squared.) (a) Find three idempotent 2X2 matrices. (b) Prove that the only invertible idempotent n X n matrix is the identity matrix. 45. Show that if A is a square matrix that satisfies the equation A2 — 2A + / = O, then A"1 = 21 - A. 46. Prove that if a symmetric matrix is invertible, then its inverse is symmetric also. 47. Prove that if A and B are square matrices and AB is invertible, then both A and B are invertible. In Exercises 48-63, use the Gauss-Jordan method to find the inverse of the given matrix (if it exists). 48. 50. 52. 54. 56. 1 5 .1 4_ ~4 - 2 .2 "2 3 1 - 2 _2 0 "l 1 1 0 _0 1 "0 a b 0 _0 d ' V2 58. -4V2 V2 59. 61. 63. 0 C 0 ( 10 0 0 1 0 0 1 0 c d 0 0 0 a b 4 L3 1 1 3 2 4_ 5 0 2 4 6 1 49. 51. 53. 55. 57. -2 3 - 1 a —a 1 1 -1 3 1 2 3 a 0 0 1 a 0 1 a -1 1 -1 1 0 3 1 1 0 2V2 0 0 0 1 0 3 1. 60. 62. 0 1 1 1 '2 1 1 1 0 2 over Z, over Z, •z, Partitioning large square matrices can sometimes make their inverses easier to compute, particularly if the blocks have a nice form. In Exercises 64-68, verify by block multiplication that the inverse of a matrix, if partitioned as shown, is as claimed. (Assume that all inverses exist as needed.) 64. "A B~ -1 "A"1 -A^BD' O D O D1 180 Chapter 3 Matrices 65. 66. 67. 68. 0 b -l "-(bc; _1 (BC)~lB c j. _c(bc) _1 7 - c(bc)_1b. / b •■ i " (/- bo"1 ~(j- BCY'B c J. -C(I - bc)1 7 4- c(7 - - BC)lB_ 0 b -l c £>. (BD~ cr1 (bd^'o^'bd .£>~ C(BD D"1 - d 'cxbd" lc)-1bd-1. A c b £>_ -l P Q~ R S , where P= (A - bd^c)"1, Q = -i>BD~\ R = -D_1CP, and S = D^1 + D"lCPBD'1 In Exercises 69-72, partition the given matrix so that you can apply one of the formulas from Exercises 64-68, and then calculate the inverse using that formula. ■1 0 0 0' 69. 1 0 3 1 2 0 70. The matrix in Exercise 58 71. "0 0 J 1" 0 0 1 0 72. 0 -1 1 0 1 1 0 1 0 1 1 1 3 1 -15 2 The IU Factorization Just as it is natural (and illuminating) to factor a natural number into a product of other natural numbers—for example, 30 = 2 • 3 • 5—it is also frequently helpful to factor matrices as products of other matrices. Any representation of a matrix as a product of two or more other matrices is called a matrix factorization. For example, "3 -1" "l 0" "3 -1" .9 -5 .3 1. .0 -2 is a matrix factorization. 
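Any claimed factorization can be verified by multiplying the factors back together. For the factorization just displayed, [3 -1; 9 -5] = [1 0; 3 1][3 -1; 0 -2], a one-line check (assuming NumPy) is:

```python
import numpy as np

L = np.array([[1, 0], [3, 1]])
U = np.array([[3, -1], [0, -2]])
print(np.array_equal(L @ U, np.array([[3, -1], [9, -5]])))  # True
```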
Needless to say, some factorizations are more useful than others. In this section, we introduce a matrix factorization that arises in the solution of systems of linear equations by Gaussian elimination and is particularly well suited to computer implementation. In subsequent chapters, we will encounter other equally useful matrix factorizations. Indeed, the topic is a rich one, and entire books and courses have been devoted to it.

Consider a system of linear equations of the form Ax = b, where A is an n × n matrix. Our goal is to show that Gaussian elimination implicitly factors A into a product of matrices that then enable us to solve the given system (and any other system with the same coefficient matrix) easily. The following example illustrates the basic idea.

The LU factorization was introduced in 1948 by the great English mathematician Alan M. Turing (1912-1954) in a paper entitled "Rounding-off Errors in Matrix Processes" (Quarterly Journal of Mechanics and Applied Mathematics, 1 (1948), pp. 287-308). During World War II, Turing was instrumental in cracking the German "Enigma" code. However, he is best known for his work in mathematical logic that laid the theoretical groundwork for the development of the digital computer and the modern field of artificial intelligence. The "Turing test" that he proposed in 1950 is still used as one of the benchmarks in addressing the question of whether a computer can be considered "intelligent."

Example 3.33  Let

A = [2 1 3; 4 -1 3; -2 5 5]

Row reduction of A proceeds as follows:

A = [2 1 3; 4 -1 3; -2 5 5] → (R2 - 2R1, R3 + R1) → [2 1 3; 0 -3 -3; 0 6 8] → (R3 + 2R2) → [2 1 3; 0 -3 -3; 0 0 2] = U     (1)

The three elementary matrices E_1, E_2, E_3 that accomplish this reduction of A to echelon form U are (in order):

E_1 = [1 0 0; -2 1 0; 0 0 1],   E_2 = [1 0 0; 0 1 0; 1 0 1],   E_3 = [1 0 0; 0 1 0; 0 2 1]

Hence,

E_3E_2E_1A = U

Solving for A, we get

A = E_1^{-1}E_2^{-1}E_3^{-1}U = [1 0 0; 2 1 0; 0 0 1][1 0 0; 0 1 0; -1 0 1][1 0 0; 0 1 0; 0 -2 1]U = [1 0 0; 2 1 0; -1 -2 1]U = LU

Thus, A can be factored as A = LU, where U is an upper triangular matrix (see the exercises for Section 3.2) and L is unit lower triangular. That is, L has 1s on its main diagonal, zeros above the diagonal, and arbitrary entries below it.

The preceding example motivates the following definition.

Definition  Let A be a square matrix. A factorization of A as A = LU, where L is unit lower triangular and U is upper triangular, is called an LU factorization of A.

Remarks

• Observe that the matrix A in Example 3.33 had an LU factorization because no row interchanges were needed in the row reduction of A. Hence, all of the elementary matrices that arose were unit lower triangular. Thus, L was guaranteed to be unit lower triangular because inverses and products of unit lower triangular matrices are also unit lower triangular. (See Exercises 29 and 30.) If a zero had appeared in a pivot position at any step, we would have had to swap rows to get a nonzero pivot. This would have resulted in L no longer being unit lower triangular. We will comment further on this observation below. (Can you find a matrix for which row interchanges will be necessary?)

• The notion of an LU factorization can be generalized to nonsquare matrices by simply requiring U to be a matrix in row echelon form. (See Exercises 13 and 14.)

• Some books define an LU factorization of a square matrix A to be any factorization A = LU, where L is lower triangular and U is upper triangular.

The first remark above is essentially a proof of the following theorem.
Theorem 3.15  If A is a square matrix that can be reduced to row echelon form without using any row interchanges, then A has an LU factorization.

To see why the LU factorization is useful, consider a linear system Ax = b, where the coefficient matrix has an LU factorization A = LU. We can rewrite the system Ax = b as LUx = b or L(Ux) = b. If we now define y = Ux, then we can solve for x in two stages:

1. Solve Ly = b for y by forward substitution (see Exercises 25 and 26 in Section 2.1).
2. Solve Ux = y for x by back substitution.

Each of these linear systems is straightforward to solve because the coefficient matrices L and U are both triangular. The next example illustrates the method.

Example 3.34  Use an LU factorization of A = [2 1 3; 4 -1 3; -2 5 5] to solve Ax = b, where b = [1; -4; 9].

Solution  In Example 3.33, we found that A = LU with

L = [1 0 0; 2 1 0; -1 -2 1]   and   U = [2 1 3; 0 -3 -3; 0 0 2]

As outlined above, to solve Ax = b (which is the same as L(Ux) = b), we first solve Ly = b for y = [y1; y2; y3]. This is just the linear system

y1 = 1
2y1 + y2 = -4
-y1 - 2y2 + y3 = 9

Forward substitution (that is, working from top to bottom) yields

y1 = 1,   y2 = -4 - 2y1 = -6,   y3 = 9 + y1 + 2y2 = -2

Thus y = [1; -6; -2], and we now solve Ux = y for x = [x1; x2; x3]. This linear system is

2x1 + x2 + 3x3 = 1
-3x2 - 3x3 = -6
2x3 = -2

and back substitution quickly produces

x3 = -1,   -3x2 = -6 + 3x3 = -9 so that x2 = 3,   and   2x1 = 1 - x2 - 3x3 = 1 so that x1 = 1/2

Therefore, the solution to the given system Ax = b is x = [1/2; 3; -1].

An Easy Way to Find LU Factorizations

In Example 3.33, we computed the matrix L as a product of elementary matrices. Fortunately, L can be computed directly from the row reduction process without our needing to compute elementary matrices at all. Remember that we are assuming that A can be reduced to row echelon form without using any row interchanges. If this is the case, then the entire row reduction process can be done using only elementary row operations of the form R_i - kR_j. (Why do we not need to use the remaining elementary row operation, multiplying a row by a nonzero scalar?) In the operation R_i - kR_j, we will refer to the scalar k as the multiplier.

In Example 3.33, the elementary row operations that were used were, in order,

R2 - 2R1   (multiplier = 2)
R3 + R1 = R3 - (-1)R1   (multiplier = -1)
R3 + 2R2 = R3 - (-2)R2   (multiplier = -2)

The multipliers are precisely the entries of L that are below its diagonal! Indeed,

L = [1 0 0; 2 1 0; -1 -2 1]

and L21 = 2, L31 = -1, and L32 = -2. Notice that the elementary row operation R_i - kR_j has its multiplier k placed in the (i, j) entry of L.

Example 3.35  Find an LU factorization of

A = [3 1 3 -4; 6 4 8 -10; 3 2 5 -1; -9 5 -2 -4]

Solution  Reducing A to row echelon form, we have

A → (R2 - 2R1, R3 - 1R1, R4 - (-3)R1) → [3 1 3 -4; 0 2 2 -2; 0 1 2 3; 0 8 7 -16]
  → (R3 - (1/2)R2, R4 - 4R2) → [3 1 3 -4; 0 2 2 -2; 0 0 1 4; 0 0 -1 -8]
  → (R4 - (-1)R3) → [3 1 3 -4; 0 2 2 -2; 0 0 1 4; 0 0 0 -4] = U

The first three multipliers are 2, 1, and -3, and these go into the subdiagonal entries of the first column of L. So, thus far,

L = [1 0 0 0; 2 1 0 0; 1 * 1 0; -3 * * 1]

The next two multipliers are 1/2 and 4, so we continue to fill out L:

L = [1 0 0 0; 2 1 0 0; 1 1/2 1 0; -3 4 * 1]

The final multiplier, -1, replaces the last * in L to give

L = [1 0 0 0; 2 1 0 0; 1 1/2 1 0; -3 4 -1 1]

Thus, an LU factorization of A is

A = [3 1 3 -4; 6 4 8 -10; 3 2 5 -1; -9 5 -2 -4] = [1 0 0 0; 2 1 0 0; 1 1/2 1 0; -3 4 -1 1][3 1 3 -4; 0 2 2 -2; 0 0 1 4; 0 0 0 -4] = LU

as is easily checked.
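The multiplier bookkeeping of Example 3.35 and the two-stage solve of Example 3.34 translate directly into code. The sketch below is plain Python with no row interchanges, so—like Theorem 3.15—it assumes that no zero pivot is encountered; the function names are chosen here for illustration.

```python
def lu_factor(A):
    """Return (L, U) with A = LU, L unit lower triangular, U upper triangular.
    Assumes no row interchanges are needed (Theorem 3.15)."""
    n = len(A)
    U = [row[:] for row in A]  # working copy; becomes U
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            if U[i][j] == 0:
                continue
            if U[j][j] == 0:
                raise ValueError("zero pivot: a P^T LU factorization is needed")
            k = U[i][j] / U[j][j]      # multiplier for R_i - k R_j
            L[i][j] = k                # store it in the (i, j) entry of L
            for col in range(j, n):
                U[i][col] -= k * U[j][col]
    return L, U

def lu_solve(L, U, b):
    """Solve LUx = b: forward substitution for Ly = b, then back substitution for Ux = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Examples 3.33 and 3.34
A = [[2.0, 1.0, 3.0], [4.0, -1.0, 3.0], [-2.0, 5.0, 5.0]]
L, U = lu_factor(A)
print(L)   # multipliers 2, -1, -2 below the diagonal, as in Example 3.33
print(U)   # [2 1 3; 0 -3 -3; 0 0 2]
print(lu_solve(L, U, [1.0, -4.0, 9.0]))   # [0.5, 3.0, -1.0], matching Example 3.34
```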
Section 3.4 The LU Factorization 185 Remarks * In applying this method, it is important to note that the elementary row operations Rt — kRj must be performed from top to bottom within each column (using the diagonal entry as the pivot), and column by column from left to right. To illustrate what can go wrong if we do not obey these rules, consider the following row reduction: A ~ "l 2 2' R$ ~ 2R, "l 2 2' "l 2 2 1 1 1 -> 1 1 1 ~-> 0 -1 -1 2 2 1 _0 0 -1 _0 0 -1 = u This time the multipliers would be placed in L as follows: Li2 ~ 2,L21 = 1. We would get "l 0 0" 1 1 0 0 2 1 L = but A i= LU. (Check this! Find a correct LU factorization of A.) • An alternative way to construct L is to observe that the multipliers can be obtained directly from the matrices obtained at the intermediate steps of the row reduction process. In Example 3.33, examine the pivots and the corresponding columns of the matrices that arise in the row reduction A = 2 1 3 4-13 -2 5 5 •A, = 2 1 3 0 -3 -3 0 6 8 2 1 3 0 -3 -3 0 0 2 = 1/ The first pivot is 2, which occurs in the first column of A. Dividing the entries of this column vector that are on or below the diagonal by the pivot produces 2~ r 4 = 2 .-2. „-1- The next pivot is —3, which occurs in the second column of A\, Dividing the entries of this column vector that are on or below the diagonal by the pivot, we obtain _ -3 = 1 6. .-2. (-3) The final pivot (which we did not need to use) is 2, in the third column of U. Dividing the entries of this column vector that are on or below the diagonal by the pivot, we obtain If we place the resulting three column vectors side by side in a matrix, we have 1 2 1 -1 -2 1 _ ,2. .1. which is exactly I once the above-diagonal entries are filled with zeros. Chapter 3 Matrices In Chapter 2, we remarked that the row echelon form of a matrix is not unique. However, if an invertible matrix A has an LU factorization A =LU, then this factorization is unique. ThCOrCni 3.16 If A is an invertible matrix that has an LU factorization, then L and fJare unique. Suppose A = LU and A = /_,(/, are two LU factorizations of A. Then LU -LtUi, where I and Ll are unit lower triangular and U and U1 are upper triangular. In fact, U and U} are two (possibly different) row echelon forms of A. By Exercise 30, L, is invertible. Because A is invertible, its reduced row echelon form is an identity matrix I by the Fundamental Theorem of Invertible Matrices. H—*- Hence U also row reduces to I (why?) and so U is invertible also. Therefore, L^&lfltT1 = irKi/L^ir1 so {LJlL){UV~l) = (ir'ixXt/iir1) Hence, (If1!)/ = liUiU'1) so LflI = C^IT1 But Lj~'L is unit lower triangular by Exercise 29, and [/jt/ is upper triangular. 18 }»■ (Why?) It follows that L^LL = 17, IT1 is both unit lower triangular and upper triangular. The only such matrix is the identity matrix, so If 'L = J and 1/;IT 1 = J. It follows that I = I, and Lr = Lr1; so the LI7 factorization of A is unique._____~3m The r iu Factorization We now explore the problem of adapting the LU factorization to handle cases where row interchanges are necessary during Gaussian elimination. Consider the matrix A = 1 2 3 6 -1 1 A straightforward row reduction produces A-»B = 1 2 -1 0 0 5 0 3 3 which is not an upper triangular matrix. However, we can easily convert this into upper triangular form by swapping rows 2 and 3 of 5 to get U 1 2 0 3 0 0 Alternatively, we can swap rows 2 and 3 of A first. 
To this end, let P be the elementary matrix "l 0 0" 0 0 1 0 1 0 Section 3.4 The LV Factorization corresponding to interchanging rows 2 and 3, and let E be the product of the elementary matrices that then reduce PA to U (so that = L is unit lower triangular). Thus EPA = U, so A = (EPy^U = P~lE~lU = P~]LU. Now this handles only the case of a single row interchange. In general, P will be the product P = Pk- ■ -P2P\ of all the row interchange matrices PX,PP ... ,Pk (where Pi is performed first, and so on). Such a matrix P is called a permutation matrix. Observe that a permutation matrix arises from permuting the rows of an identity matrix in some order. For example, the following are all permutation matrices: 0 10 0" 0 0 0 1 10 0 0 0 0 1 0_ Fortunately, the inverse of a permutation matrix is easy to compute; in fact, no calculations are needed at all! Theorem 3.17 If P is a permutation matrix, then P_1 = PT. We must show that PTP = I. But the ith row of P1 is the same as the ith column of P, and these are both equal to the same standard unit vector e, because P is a permutation matrix. So (PTP)U ~ (ith row of Pr)(;th column of P) = e'e = e-e = 1 This shows that diagonal entries of PTP are all Is. On the other hand, if j i, then the jth column of P is a different standard unit vector from e—say e'. Thus, a typical off-diagonal entry of PJP is given by (PrP),7 = (ith row of Pr)(/'th column of P) = eV = e-e' = 0 Hence PTP is an identity matrix, as we wished to show. Thus, in general, we can factor a square matrix A as A = P~]LU = PLU. Definition Let A be a square matrix. A factorization of A asA = PJLU, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular, is called a PJLUfactorization of A. fxilllli 3.3S Find a PTLU factorization of A = f "0 0 6' 1 2 3 .2 1 4_ Solution First we reduce A to row echelon form. Clearly, we need at least one row interchange. 1 2 3~ 0 0 6 0 ~3 -2 0 1 1 0 0 0 1 1 0 0 0 1 0 A = 0 0 6 1 2 3 1 2 3 R,<->R2 0 0 6 2 1 4. _2 "l 1 4_ 2 3 0 0 3 0 -2 6 188 Chapter 3 Matrices We have used two row interchanj »es («i R2 and then R2 <-> R3) permutation matrix is "l 0 0" "o 1 0~ "o 1 o" P = P2Pi = 0 0 1 1 0 0 = 0 0 1 .0 1 0. _0 0 1. .1 0 0_ We now find an LU factorization of PA. "o 1 0~ "o 0 6' "l 2 3" 1 2 3 PA = 0 0 1 1 2 3 = 2 1 4 --► 0 -3 -2 .1 0 0_ 2 1 4_ .0 0 6_ .0 0 6. Hence L>, =2, and so A = PJLU = 0 0 1 1 0 0 0 1 0 1 0 0 2 1 0 0 0 1 The discussion above justifies the following theorem. TheOrSlIt 3.18 Every square matrix has a PTLU factorization. = u Remark Even for an invertible matrix, the PTLU factorization is not unique. In Example 3.36, a single row interchange R{ <-» R3 also would have worked, leading to a different P. However, once P has been determined, L and U are unique. Computational Considerations IfA is « X n, then the total number of operations (multiplications and divisions) required to solve a linear system Ax = b using an LU factorization of A) is T(n) ~ n3/3, the same as is required for Gaussian elimination. (See the Exploration "Counting Operations," in Chapter 2.) This is hardly surprising since the forward elimination phase produces the LU factorization in «* «3/3 steps, whereas both forward and backward substitution require = n2/2 steps. Therefore, for large values of n, the «3/3 term is dominant. From this point of view, then, Gaussian elimination and the LU factorization are equivalent. 
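Before turning to the practical advantages listed below, note that the factorization found in Example 3.36—and the claim of Theorem 3.17 that a permutation matrix is inverted simply by transposing it—can be verified numerically. The sketch below assumes NumPy and uses the P, L, and U obtained in Example 3.36.

```python
import numpy as np

A = np.array([[0., 0., 6.], [1., 2., 3.], [2., 1., 4.]])
P = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])   # P = P2 P1 from Example 3.36
L = np.array([[1., 0., 0.], [2., 1., 0.], [0., 0., 1.]])
U = np.array([[1., 2., 3.], [0., -3., -2.], [0., 0., 6.]])

print(np.allclose(P.T @ P, np.eye(3)))   # True: P^(-1) = P^T (Theorem 3.17)
print(np.allclose(P @ A, L @ U))         # True: PA = LU
print(np.allclose(A, P.T @ L @ U))       # True: A = P^T L U
```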
However, the LU factorization has other advantages: • From a storage point of view, the LU factorization is very compact because we can overwrite the entries of A with the entries of L and U as they are computed. In Example 3.33, we found that A 1 3 -1 3 5 5 1 0 0 2 1 0 -1 -2 1 1 -3 0 3 -3 2 = LU This can be stored as 2 -1 3 2 -3 -3 -1-2 2 Section 3.4 The LU Factorization 189 with the entries placed in the order (1,1), (1,2), (1,3), (2,1), (3,1), (2,2), (2,3), (3,2), (3,3). In other words, the subdiagonal entries of A are replaced by the corresponding multipliers. (Check that this works!) • Once an LU factorization of A has been computed, it can be used to solve as many linear systems of the form Ax = b as we like. We just need to apply the method of Example 3.34, varying the vector b each time. • For matrices with certain special forms, especially those with a large number of zeros (so-called "sparse" matrices) concentrated off the diagonal, there are methods that will simplify the computation of an LU factorization. In these cases, this method is faster than Gaussian elimination in solving Ax = b. • For an invertible matrix A, an LU factorization of A can be used to find A"1, if necessary. Moreover, this can be done in such a way that it simultaneously yields a factorization of A"1. (See Exercises 15-18.) Remark If you have a CAS (such as MATLAB) that has the LU factorization built in, you may notice some differences between your hand calculations and the computer output. This is because most CAS's will automatically try to perform partial pivoting to reduce roundoff errors. (See the Exploration "Partial Pivoting," in Chapter 2.) Turing's paper is an extended discussion of such errors in the context of matrix factorizations. This section has served to introduce one of the most useful matrix factorizations. In subsequent chapters, we will encounter other equally useful factorizations. Exercises 3.4 In Exercises 1-6, solve the system Ax — b using the given LU factorization of A. 1 0~ 1 1 1. A = 2. A = 3. A = -2 1 2 5, 4 -2 2 3 2 -2 4 - 1 i L 2 2 0 0 0 1 ~2 3 -4 -3 0 1 -2 4 -6 -2 1 0 6 -2 4 1 -1 , b , b O _2 Z 4 -3 1 0 0 0 1 0 1 2 -4 o" 1 0 0 4. A = 3 -1 4 = 3 2 1 0 1 2 2_ i - 2 0 1. "2 -4 0* 2" X 0 5 4 , b 0 „0 0 2_ -5„ 5. A "2 -1 0 0" "1 0 0 0 6 -4 5 -3 3 1 0 0 8 -4 1 0 4 0 1 0 4 -1 0 7 2 -1 5 1 X 6. A = 2 0 0 0 1 -2 3 -5 1 0 0' -3 0 0 0 4 "1" 2 , b = 2 1 0 0 0 0 3 -1 -3 9 3 0 5 2 2 0 0 1 0" " 1 2 -2 -4 3 9_ - " r -3 , b -1 0 0 0 0" 1 0 0 2 1 0 4 -2 1 In Exercises 7-12, find an LU factorization of the given matrix. 1. 1 2 -3 -1 8, 2 -4 3 1 190 Chapter 3 Matrices 9. 11. 12. "l 2 3" "2 2 -l" /« Exercises 19-22, write the given permutation ma 4 5 6 10. 4 0 4 product of elementary (row interchange) matrices. _8 7 9_ 3 4 4_ "0 0 0 r "o 0 r " 1 2 3 -1" 19. 1 0 0 20. 0 0 1 0 2 6 3 0 _0 1 0_ 0 1 0 0 0 6 -6 / 1 0 0 0_ _-l -2 -9 0_ "o 0 1 0 o" "0 1 0 0~ " 2 2 2 1" 0 0 0 1 1 0 0 0 0 -2 4 -1 2 21. 1 0 0 0 22. 0 0 0 1 0 4 4 7 3 0 0 1 0_ 0 0 0 0 1 6 9 5 8 _0 1 0 0 0_ Generalize the definition of LUfactorization to nonsquare matrices by simply requiring Uto be a matrix in row echelon form. With this modification, find an LU factorization of the matrices in Exercises 13 and 14. 13. 14. 1 0 0 3 0 0 0 1 -2 1 0 1 5 0 3 3 -3 -1 1 8 -2 5 2 -6 0 For an invertible matrix with an LU factorization A both L and U will be invertible and A~l = U~lL~1. Exercises 15 and 16, find L~ matrix. = LU, In U~l, and A"1 for the given 15. A in Exercise 1 16. 
A in Exercise 4 The inverse of a matrix can also be computed by solving several systems of equations using the method of Example 3.34. For an n X n matrix A, to find its inverse we need to solve AX = Infor then X n matrix X. Writing this equation as A[X] x2---x„] = [Cj e2- •• ej, using the matrix-column form of AX, we see that we need to solve n systems of linear equations: Axx = e1; Ax2 = e2,..., Ax„ = e • Moreover, we can use the factorization A = LU to solve each one of these systems. In Exercises 17 and 18, use the approach just outlined to find A ~1 for the given matrix. Compare with the method of Exercises 15 and 16. In Exercises 23-25, find a P1 LU factorization of the given matrix A. 23. A = 25. A 0 1 1 3 0 -1 0 0 1 4 2 1 24. A 0 0 1 2' -1 1 3 2 0 2 1 1 1 1 -1 0 ■1 1 1 1 26. Prove that there are exactly nl n X n permutation matrices. In Exercises 27-28, solve the system Ax = b using the given factorization A = PJLU. Because PP1 = I, PrLUx — b can be rewritten as LUx = Pb. This system can then be solved using the method of Example 3.34. 27. A 0 1 -1 0 1 2 3 2 = 1 0 1 1 -1 0 0 X 28. A = X 17. A in Exercise 1 18. A in Exercise 4 2 0 0 8 3 4 1 4 0 4 0 0 P LU, b = PTLU, b 0 1 _i 2 r i 5_ 0 1 -1 16 -4 4 Section 3.5 Subspaces, Basis, Dimension, and Rank ■ '1 29. Prove that a product of unit lower triangular matrices is unit lower triangular. 30. Prove that every unit lower triangular matrix is invertible and that its inverse is also unit lower triangular. An LDUfactorization of a square matrix A is a factorization A = LDU, where L is a unit lower triangular matrix, D is a diagonal matrix, and U is a unit upper triangular matrix (upper triangular with Is on its diagonal). In Exercises 31 and 32, find an LDU factorization of A. 31. A in Exercise 1 32. A in Exercise 4 33. If A is symmetric and invertible and has an LDU factorization, show that U = LT. 34. If A is symmetric and invertible and A = LDLT (with L unit lower triangular and D diagonal), prove that this factorization is unique. That is, prove that if we also have A = LjDjlf (with Ll unit lower triangular and Dt diagonal), then L = Lt and D = Dx. ■.......... Subspaces, Basis, Dimension, and Rank Figure 3.2 This section introduces perhaps the most important ideas in the entire book. We have already seen that there is an interplay between geometry and algebra: We can often use geometric intuition and reasoning to obtain algebraic results, and the power of algebra will often allow us to extend our findings well beyond the geometric settings in which they first arose. In our study of vectors, we have already encountered all of the concepts in this section informally. Here, we will start to become more formal by giving definitions for the key ideas. As you'll see, the notion of a subspace is simply an algebraic generalization of the geometric examples of lines and planes through the origin. The fundamental concept of a basis for a subspace is then derived from the idea of direction vectors for such lines and planes. The concept of a basis will allow us to give a precise definition of dimension that agrees with an intuitive, geometric idea of the term, yet is flexible enough to allow generalization to other settings. You will also begin to see that these ideas shed more light on what you already know about matrices and the solution of systems of linear equations. In Chapter 6, we will encounter all of these fundamental ideas again, in more detail. Consider this section a "getting to know you" session. 
A plane through the origin in U3 "looks like" a copy of !R2. Intuitively, we would agree that they are both "two-dimensional." Pressed further, we might also say that any calculation that can be done with vectors in U2 can also be done in a plane through the origin. In particular, we can add and take scalar multiples (and, more generally, form linear combinations) of vectors in such a plane, and the results are other vectors in the same plane. We say that, like R2, a plane through the origin is closed with respect to the operations of addition and scalar multiplication. (See Figure 3.2.) But are the vectors in this plane two- or three-dimensional objects? We might argue that they are three-dimensional because they live in R3 and therefore have three components. On the other hand, they can be described as a linear combination of just two vectors—direction vectors for the plane—and so are two-dimensional objects living in a two-dimensional plane. The notion of a subspace is the key to resolving this conundrum. Chapter 3 Matrices Definition A subspace of R" is any collection S of vectors in R" such that: 1. The zero vector 0 is in S. 2. If u and v are in S, then u + v is in S. (S is closed under addition.) 3. If u is in S and c is a scalar, then cu is in S. (S is closed under scalar multiplication.) We could have combined properties (2) and (3) and required, equivalently, that S be closed under linear combinations: If u„ u2,..., uk are in S and c,, c2,..., ck are scalars, then qU] + c2u2 + • • ■ + ckuk is in S. Every line and plane through the origin in R3 is a subspace of R3. It should be clear geometrically that properties (1) through (3) are satisfied. Here is an algebraic proof in the case of a plane through the origin. You are asked to give the corresponding proof for a line in Exercise 9. Let 8P be a plane through the origin with direction vectors vt and v2. Hence, 9 = span(vi, v2). The zero vector 0 is in./', since 0 = 0v{ + 0v2. Now let u = clv1 + c2v2 and v = d1vl + d2v2 be two vectors in 2P, Then u + v = (cjVj + c2v2) + (f^Vj + d2\2) = (cj + dl)vl + (c2 + d2)v2 Thus, u + v is a linear combination of vx and v2 and so is in Now let c be a scalar. Then cu = cCcjVi + c2v3) = (cc,)Vi + (cc2)v2 which shows that cu is also a linear combination of v1 and v2 and is therefore in 9?. We have shown that i? satisfies properties (1) through (3) and hence is a subspace of R3. » If you look carefully at the details of Example 3,37, you will notice that the fact that v$ and v2 were vectors in R3 played no role at all in the verification of the properties. Thus, the algebraic method we used should generalize beyond R3 and apply in situations where we can no longer visualize the geometry. It does. Moreover, the method of Example 3.37 can serve as a "template" in more general settings. When we generalize Example 3.37 to the span of an arbitrary set of vectors in any U", the result is important enough to be called a theorem. Tlt00r6in 3.19 Let v1( v2,..., \k be vectors in R". Then span^, v2,..., v^) is a subspace of R". iKafiipIg S.3! ; Proof Let S = span (v1; v2,..., \k). To check property (1) of the definition, we simply observe that the zero vector 0 is in S, since 0 = 0vx + 0v2 + • • • + 0vk. Section 3.5 Subspaces, Basis, Dimension, and Rank 193 Now let u = CjVi + c2v2 + • • • + ckvk and v = + d2v2 + • ■ • + dk\k be two vectors in S. 
Then u + v = (cjV, + c2v2 + • • • + ckvk) + (flVr, + d2v2 + • • • + dk\k) = (cj + di)vj + (c2 + rf2)v2 + ■ ■ - +(ck + dk)vk Thus, u + v is a linear combination of Vj, v2.....V* and so is in S. This verifies property (2). To show property (3), let c be a scalar. Then cu = cCqvj + c2v2 + • • • + CfcVi) = (cci)v, + (cc2)v2 H-----h (cck)yk which shows that cu is also a linear combination of vx, v2, . . . , vk and is therefore in S. We have shown that S satisfies properties (1) through (3) and hence is a subspace of or. - We will refer to span^j, v2,. . . , vk) as the subspace spanned by vu v2.....vk. We will often be able to save a lot of work by recognizing when Theorem 3.19 can be applied. .............________________________________,_- -_» Example 3.38 X Show that the set of all vectors y that satisfy the conditions x = 3y and z — —2y forms a subspace of R3. r Z Solution Substituting the two conditions into yields 3/ 3 y = y i -ly. .-2 Since y is arbitrary, the given set of vectors is span) ofR3, by Theorem 3.19. 3 1 -2 and is thus a subspace Geometrically, the set of vectors in Example 3.38 represents the line through the 3" origin in R3 with direction vector Chapter 3 Matrices Example 3.39 Determine whether the set of all vectors and z = — 2y is a subspace of R3. that satisfy the conditions x — 3y + 1 Solution This time, we have all vectors of the form 3y + 1 J . ~2y J " 3y + 1 0 p-js. The zero vector is not of this form. (Why not? Try solving y = 0 .) Hence L -ly , _o_ property (1) does not hold, so this set cannot be a subspace off?3. Determine whether the set of all vectors , where y = x2, is a subspace of R2. Solution These are the vectors of the form belongs to S (take x = 0), so property (1) holds. Let u Then call this set S. This time 0 = be in S, xl andv = X2 2 2 Xi -X2~ U + V = Xi + x2 which, in general, is not in S, since it does not have the correct form; that is, x2 + x\ ¥= (xx + x2)2. To be specific, we look for a counterexample. If is not in S since 5 # 32. Thus, V ~2 and v = .1. A. then both u and v are in S, but their sum u + v property (2) fails and S is not a subspace of R2. Remark In order for a set S to be a subspace of some R", we must prove that properties (1) through (3) hold in general. However, for S to fail to be a subspace of W, it is enough to show that one of the three properties fails to hold. The easiest course is usually to find a single, specific counterexample to illustrate the failure of the property. Once you have done so, there is no need to consider the other properties. Subspaces Associated with Matrices A great many examples of subspaces arise in the context of matrices. We have already encountered the most important of these in Chapter 2; we now revisit them with the notion of a subspace in mind. Section 3.5 Subspaces, Basis, Dimension, and Rank Definition Let A be an m X n matrix. 1. The row space of A is the subspace row(A) of IR" spanned by the rows of A. 2. The column space of A is the subspace col (A) of Um spanned by the columns of A. Remark Observe that, by Example 3.9 and the Remark that follows it, col(A) consists precisely of all vectors of the form Ax where x is in R". Example 3.41 Consider the matrix A 1 -1 0 1 3 -3 (a) Determine whether b = is in the column space of A. (b) Determine whether w = [4 5 ] is in the row space of A. (c) Describe row(A) and col (A). Solution (a) By Theorem 2.4 and the discussion preceding it, b is a linear combination of the columns of A if and only if the linear system Ax = b is consistent. 
We row reduce the augmented matrix as follows: "l -1 r "l 0 3" 0 1 2 -> 0 1 2 _3 -3 3_ .0 0 0. Thus, the system is consistent (and, in fact, has a unique solution). Therefore, b is in col (A). (This example is just Example 2.18, phrased in the terminology of this section.) (b) As we also saw in Section 2.3, elementary row operations simply create linear combinations of the rows of a matrix. That is, they produce vectors only in the row space of the matrix. If the vector w is in row(A), then w is a linear combination of the rows of A, so if we augment A by w as , it will be possible to apply elementary row operations to this augmented matrix to reduce it to form — using only elementary row operations of the form Rt + kR,, where i > j— in other words, working from top to bottom in each column. (Why?) In this example, we have "1 -1" -r "1 -r 0 3 1 -3 Ri-4R1 -> 0 0 i 0 K4-9J!2 -> 0 0 i 0 _4 5_ _0 9_ _0 0_ 196 Chapter 3 Matrices Therefore, w is a linear combination of the rows of A (in fact, these calculations show that w = 4[l -l]+9[0 1] —how?), and thus w is in row(A). (c) It is easy to check that, for any vector w = [x y], the augmented matrix ' reduces to 1 0 0 1 0 0 0 0 in a similar fashion. Therefore, every vector in U2 is in row(A), and so row(A) = R2. Finding col (A) is identical to solving Example 2.21, wherein we determined that it coincides with the plane (through the origin) in U3 with equation 3x — z = 0. (We will discover other ways to answer this type of question shortly.) Remark We could also have answered part (b) and the first part of part (c) by observing that any question about the rows of A is the corresponding question about the columns of AT. So, for example, w is in row(A) if and only if wT is in col(Ar). This is true if and only if the system Arx = w1 is consistent. We can now proceed as in part (a). (See Exercises 21-24.) The observations we have made about the relationship between elementary row operations and the row space are summarized in the following theorem. Th60rem 3.20 Let B be any matrix that is row equivalent to a matrix A. Then row(B) = row(A) Proof The matrix A can be transformed into B by a sequence of row operations. Consequently, the rows of B are linear combinations of the rows of A; hence, linear combinations of the rows of B are linear combinations of the rows of A. (See Exercise 21 in Section 2.3.) It follows that row(B) C row(A). On the other hand, reversing these row operations transforms B into A. Therefore, the above argument shows that row(A) C row(B). Combining these results, we have row(A) = row(5). ___3IHB There is another important subspace that we have already encountered: the set of solutions of a homogeneous system of linear equations. It is easy to prove that this subspace satisfies the three subspace properties. Theorem 3.21 Let A be an m X n matrix and let N be the set of solutions of the homogeneous linear system Ax = 0. Then N is a subspace of W. Proof [Note that x must be a (column) vector in that 0 = 0m is the zero vector in in N, Therefore, Au = 0 and Av " in order for Ax to be defined and ".] Since A0„ = 0m, 0„ is in N. Now let u and v be 0. It follows that A(u + v) = Au + Av = 0 + 0 0 Section 3.5 Subspaces, Basis, Dimension, and Rank Hence, u + v is in N. Finally, for any scalar c, A(cu) = c(Au) = cO = 0 and therefore cu is also in N. It follows that N is a subspace of R". _ 'fflHI Definition Let A be an m X n matrix. The null space of A is the subspace of R" consisting of solutions of the homogeneous linear system Ax = 0. 
It is denoted bynull(A). The fact that the null space of a matrix is a subspace allows us to prove what intuition and examples have led us to understand about the solutions of linear systems: They have either no solution, a unique solution, or infinitely many solutions. Theorem 3.22 Let A be a matrix whose entries are real numbers. For any system of linear equations Ax = b, exactly one of the following is true: a. There is no solution. b. There is a unique solution. c. There are infinitely many solutions. At first glance, it is not entirely clear how we should proceed to prove this theorem. A little reflection should persuade you that what we are really being asked to prove is that if (a) and (b) are not true, then (c) is the only other possibility. That is, if there is more than one solution, then there cannot be just two or even finitely many, but there must be infinitely many Proof If the system Ax — b has either no solutions or exactiy one solution, we are done. Assume, then, that there are at least two distinct solutions of Ax = b—say, x1 and x2. Thus, AXj = b and Ax2 = b with Xj # x2. It follows that A(x, - x2) = Axx - Ax2 = b - b = 0 Set x0 = X! — x:. Then x0 * 0 and Axo = 0. Hence, the null space of A is nontrivial, and since null(A) is closed under scalar multiplication, cx0 is in null(A) for every scalar c. Consequently, the null space of A contains infinitely many vectors (since it contains at least every vector of the form cx0 and there are infinitely many of these). Now, consider the (infinitely many) vectors of the form x; + cx0, as c varies through the set of real numbers. We have A(x, + cx0) = Ax, + cAxq = b + cO = b Therefore, there are infinitely many solutions of the equation Ax = b.__ Basis We can extract a bit more from the intuitive idea that subspaces are generalizations of planes through the origin in R . A plane is spanned by any two vectors that are Chapter 3 Matrices parallel to the plane but are not parallel to each other. In algebraic parlance, two such vectors span the plane and are linearly independent. Fewer than two vectors will not work; more than two vectors is not necessary. This is the essence of a basis for a subspace. DCflllitiOII A basis for a subspace S of R" is a set of vectors in S that 1. spans S and 2. is linearly independent. _:__________________—-^ Example 3.42 In Section 2.3, we saw that the standard unit vectors e2, . . . e„ in R" are linearly independent and span R". Therefore, they form a basis for R", called the standard basis. " r <— ___________—- Example 3.43 In Example 2.19, we showed that R2 = span! also linearly independent (as they are not multiples' ri"h r 2i f1! , . Since and are hi) L-iJ L3J , they form a basis for R2. <-- A subspace can (and will) have more than one basis. For example, we have just seen that R2 has the standard basis and the basis How- ever, we will prove shortly that the number of vectors in a basis for a given subspace will always be the same. Example 3.44 i Find a basis for S = span(u, v, w), where 3 2 0 u = -1 , v = 1 , and w = -5 5 .3 1 Solution The vectors u, v, and w already span S, so they will be a basis for S if they are also linearly independent. It is easy to determine that they are not; indeed, w = 2u — 3v. Therefore, we can ignore w, since any linear combinations involving u, v, and w can be rewritten to involve u and v alone. (Also see Exercise 47 in Section 2.3.) 
This implies that S = span (u, v, w) = span (u, v), and since u and v are certainly _ *» linearly independent (why?), they form a basis for S. (Geometrically, this means that u, v, and w all lie in the same plane and u and v can serve as a set of direction vectors for this plane.) A Section 3.5 Subspaces, Basis, Dimension, and Rank Example 3.45 Find a basis for the row space of A 1 3 -1 0 2 1 1 6 1 1 -2 1 Solution The reduced row echelon form of A is R 10 10 0 12 0 0 0 0 1 0 0 0 0 -ľ 3 4 0 By Theorem 3.20, row(A) = row(£), so it is enough to find a basis for the row space of R. But row(i?) is clearly spanned by its nonzero rows, and it is easy to check that the staircase pattern forces the first three rows of R to be linearly independent. (This is a general fact, one that you will need to establish to prove Exercise 33.) Therefore, a basis for the row space of A is {[10 10 -1], [0 1 2 0 3], [0 0 0 1 4]} We can use the method of Example 3.45 to find a basis for the subspace spanned by a given set of vectors. Example 3.46 Rework Example 3.44 using the method from Example 3.45. Solution We transpose u, v, and w to get row vectors and then form a matrix with these vectors as its rows: B = 3-15 2 1 3 0 -5 1 Proceeding as in Example 3.45, we reduce B to its reduced row echelon form "i o f 0 1 -| .0 0 0_ and use the nonzero row vectors as a basis for the row space. Since we started with column vectors, we must transpose again. Thus, a basis for span (u, v, w) is Remarks • In fact, we do not need to go all the way to reduced row echelon form—row echelon form is far enough. If U is a row echelon form of A, then the nonzero row vectors r; Chapter 3 Matrices of [/will form a basis for row(A) (see Exercise 33). This approach has the advantage of (often) allowing us to avoid fractions. In Example 3.46, B can be reduced to 17 = 3-15 0 -5 1 0 0 0 which gives us the basis 3 -1 5 0 -5 1 for span (u, v, w). * Observe that the methods used in Example 3.44, Example 3.46, and the Remark above will generally produce different bases. We now turn to the problem of finding a basis for the column space of a matrix A. One method is simply to transpose the matrix. The column vectors of A become the row vectors of AT, and we can apply the method of Example 3.45 to find a basis for row(Ar). Transposing these vectors then gives us a basis for col(A). (You are asked to do this in Exercises 21-24.) This approach, however, requires performing a new set of row operations on AT. Instead, we prefer to take an approach that allows us to use the row reduced form of A that we have already computed. Recall that a product Ax of a matrix and a vector corresponds to a linear combination of the columns of A with the entries of x as coefficients. Thus, a nontrivial solution to Ax = 0 represents a dependence relation among the columns of A. Since elementary row operations do not affect the solution set, if A is row equivalent to R, the columns of A have the same dependence relationships as the columns ofR. This important observation is the basis (no pun intended!) for the technique we now use to find a basis for col (A). ,,,____. Example 3.47 Find a basis for the column space of the matrix from Example 3.45, " 1 13 1 6" r A = 2-10 1-1 -3 2 1-2 1 4 16 13 Solution Let a; denote a column vector of A and let r, denote a column vector of the reduced echelon form R = 10 10-1 0 12 0 0 0 0 1 0 0 0 0 We can quickly see by inspection that r3 = rx + 2r2 and r5 = —r, + 3r2 + 4r4. 
(Check that, as predicted, the corresponding column vectors of A satisfy the same dependence relations.) Thus, r3 and r5 contribute nothing to col(R). The remaining column Section 3.5 Subspaces, Basis, Dimension, and Rank 201 vectors, r2, and r4, are linearly independent, since they are just standard unit vectors. The corresponding statements are therefore true of the column vectors of A. Thus, among the column vectors of A, we eliminate the dependent ones (a3 and a5), and the remaining ones will be linearly independent and hence form a basis for col(A). What is the fastest way to find this basis? Use the columns of A that correspond to the columns of R containing the leading Is. A basis for col(A) is {a,, a2, aj = < 1 1 2 -1 -3 2 4_ 1_ 1 1 -2 1 Warning Elementary row operations change the column space! In our example, col(A) i= col(R), since every vector in col(,R) has its fourth component equal to 0 but this is certainly not true of col (A). So we must go back to the original matrix A to get the column vectors for a basis of col(A). To be specific, in Example 3.47, r,, r2, and r4 do not form a basis for the column space of A. Example 3.48 Find a basis for the null space of matrix A from Example 3.47, Solution There is really nothing new here except the terminology. We simply have to find and describe the solutions of the homogeneous system Ax = 0. We have already computed the reduced row echelon form R of A, so all that remains to be done in Gauss-Jordan elimination is to solve for the leading variables in terms of the free variables. The final augmented matrix is [Ä0] = If 10 10 0 12 0 0 0 0 1 0 0 0 0 -1 3 4 0 x = then the leading Is are in columns 1,2, and 4, so we solve for X\, x2, and x4 in terms of the free variables x3 and x5. We get Setting x3 = 5 and x5 = t, we obtain -#3 + x5, x2 = —2*3 — 3x5, and xA = —4x5. *1 -5 + t ' "-\" f x2 -25 ~ 3r -2 -3 = s = s 1 + t 0 -At 0 -4 -*5. t 0_ 1 = su + tv Thus, u and v span null(A), and since they are linearly independent, they form a basis fornull(A). f 202 Chapter 3 Matrices Following is a summary of the most effective procedure to use to find bases for the row space, the column space, and the null space of a matrix A. 1. Find the reduced row echelon form R of A. 2. Use the nonzero row vectors of R (containing the leading Is) to form a basis for row(A). 3. Use the column vectors of A that correspond to the columns of R containing the leading Is (the pivot columns) to form a basis for col(A). 4. Solve for the leading variables of Rx = 0 in terms of the free variables, set the free variables equal to parameters, substitute back into x, and write the result as a linear combination off vectors (where/ is the number of free variables). These /'vectors form a basis for null(A). If we do not need to find the null space, then it is faster to simply reduce A to row echelon form to find bases for the row and column spaces. Steps 2 and 3 above remain valid (with the substitution of the word "pivots" for "leading Is"). Dimension and Rank We have observed that although a subspace will have different bases, each basis has the same number of vectors. This fundamental fact will be of vital importance from here on in this book. The Basis Theorem Let S be a subspace of R", Then any two bases for S have the same number of vectors. PrOOl Let B = ju., ii,, . . ., u,j and C = {v1; v2,. .., vj be bases for S. We need to prove that r = s. We do so by showing that neither of the other two possibilities, r s, can occur. Suppose that r < s. 
We will show that this forces C to be a linearly dependent set of vectors. To this end, let CjVj 4- c2v2 +• • ■+ c.v, = 0 (1) Since B is a basis for S, we can write each v(- as a linear combination of the elements a.-: Vj = anu{ + a12u2 + • • • + alrur v2 = a21u, + a22u2 + • • • + a2ru,. vs = aslU] + as2u2 + • • • + asrur Substituting the Equations (2) into Equation (1), we obtain Ci(anUi + ■ ■ ■ + alru,) + c2(a21U! + • • ■ + a2rur) + • • ■ + cfa^ + • • • + asruf) = 0 Theorem 3.23 Sherlock Holmes noted, "When you have eliminated the impossible, whatever remains, however improbable, must be the truth" (from The Sign of Four by Sir Arthur Conan Doyle). Section 3.5 Subspaces, Basis, Dimension, and Rank Regrouping, we have (cxdn + c2a2i + • • ■ + cs%)Uj + (c1a12 + c2a22 + "''+ C.A-2.)U2 + ■ ■ ■ + (cjalr + c2a2r + • • • + csasr)ur = 0 Now, since B is a basis, the ufs are linearly independent. So each of the expressions in parentheses must be zero: Cjflii + c2a21 + ■■ ■ + csasl = 0 c}al2 + c2a22 +■ •■ + csas2 = 0 cxalr + c2a2r + ■ ■ ■ + csasr = 0 This is a homogeneous system of r linear equations in the s variables q, c2,..., cs. (The fact that the variables appear to the left of the coefficients makes no difference.) Since r < s, we know from Theorem 2.3 that there are infinitely many solutions. In particular, there is a nontrivial solution, giving a nontrivial dependence relation in Equation (1). Thus, C is a linearly dependent set of vectors. But this finding contradicts the fact that C was given to be a basis and hence linearly independent. We conclude that r < s is not possible. Similarly (interchanging the roles of B and C), we find that r> s leads to a contradiction. Hence, we must have r — s, as desired. --JMH Since all bases for a given subspace must have the same number of vectors, we can attach a name to this number. DSfillltiOII If S is a subspace of R", then the number of vectors in a basis for S is called the dimension of S, denoted dim S. HQBiark The zero vector 0 by itself is always a subspace of R". (Why?) Yet any set containing the zero vector (and, in particular, {0}) is linearly dependent, so {0} cannot have a basis. We define dim {0} to be 0. Since the standard basis for R" has n vectors, dim R" = n. (Note that this result agrees with our intuitive understanding of dimension for n S 3.) tmm^vi 3.IS In Examples 3.45 through 3.48, we found that row(A) has a basis with three vectors, col (A) has a basis with three vectors, and null (A) has a basis with two vectors. Hence, dim(row(A)) = 3, dim(col(A)) = 3, and dim(null(A)) = 2. A single example is not enough on which to speculate, but the fact that the row and column spaces in Example 3.50 have the same dimension is no accident. Nor is the fact that the sum of dim(col(A)) and dim(null(A)) is 5, the number of columns of A. We now prove that these relationships are true in general. 204 Chapter 3 Matrices - TllCOreill 3.24 The row and column spaces of a matrix A have the same dimension. PrOOl Let jR be the reduced row echelon form of A. By Theorem 3.20, row(A) = row(.R), so dim(row(A)) = dim(row(R)) = number of nonzero rows of R = number of leading Is of R Let this number be called r. Now col(A) col(#), but the columns of A and R have the same dependence relationships. Therefore, dim(col(A)) = dim(col(.R)). Since there are r leading Is, R has r columns that are standard unit vectors, eu e2,..., er. (These will be vectors in Rffl if A and R are m X n matrices.) 
These r vectors are linearly independent, and the remaining columns of R are linear combinations of them. Thus, dim(col(JR)) = r. It follows that dim(row(A)) = r = dim(coKA)), as we wished to prove. _Si The rank of a matrix was first defined in 1878 by Georg Frobenius (1849-1917), although he defined it using determinants and not as we have done here. (See Chapter 4.) Frobenius was a German mathematician who received his doctorate from and later taught at the University of Berlin. Best known for his contributions to group theory, Frobenius used matrices in his work on group representations. Definition The rank of a matrix A is the dimension of its row and column spaces and is denoted by rank(A). For Example 3.50, we can thus write rank(A) = 3. Remarks • The preceding definition agrees with the more informal definition of rank that was introduced in Chapter 2. The advantage of our new definition is that it is much more flexible. • The rank of a matrix simultaneously gives us information about linear dependence among the row vectors of the matrix and among its column vectors. In particular, it tells us the number of rows and columns that are linearly independent (and this number is the same in each case!). Since the row vectors of A are the column vectors of A , Theorem 3.24 has the following immediate corollary. Theorem 3.25 For any matrix A, rank(Ar) = rank(A) Proof We have rank(Ar) = dim (col(Ar)) = dim (row(A)) = rank(A) Definition The nutUty of a matrix A is the dimension of its null space and is denoted by nullity(A). Section 3.5 Subspaces, Basis, Dimension, and Rank 205 In other words, nullity(A) is the dimension of the solution space of Ax — 0, which is the same as the number of free variables in the solution. We can now revisit the Rank Theorem (Theorem 2.2), rephrasing it in terms of our new definitions. Theorem 3.26 The Rank Theorem If A is an m x n matrix, then rank(A) + nullity(A) = n Proof Let R be the reduced row echelon form of A, and suppose that rank(A) = r. Then R has r leading Is, so there are r leading variables and n — r free variables in the solution to Ax = 0. Since dim(null(A)) = n - r, we have rank(A) + nullity(A) = r + (« = n r) Often, when we need to know the nullity of a matrix, we do not need to know the actual solution of Ax = 0. The Rank Theorem is extremely useful in such situations, as the following example illustrates. Example 3.51 Find the nullity of each of the following matrices: "2 3" 1 5 M = and 4 7 _3 6_ "2 1 -2 -1 N = 4 4 -3 1 2 7 1 8 Solution Since the two columns of M are clearly linearly independent, rank(M) = 2. Thus, by the Rank Theorem, nullity(M) = 2 - rank(M) = 2-2 = 0. There is no obvious dependence among the rows or columns of N, so we apply row operations to reduce it to "2 1 -2 -l" 0 2 13 .0 0 0 0_ We have reduced the matrix far enough (we do not need reduced row echelon form here, since we are not looking for a basis for the null space). We see that there are only two nonzero rows, so rank(N) = 2. Hence, nullity(jV) = 4 — rank(N) =4 — 2 = 2. 4 The results of this section allow us to extend the Fundamental Theorem of Invertible Matrices (Theorem 3.12). 206 Chapter 3 Matrices Theorem 3.27 The Fundamental Theorem of Invertible Matrices: Version 2 The nullity of a matrix was defined in 1884 by James Joseph Sylvester (1814-1887), who was interested in invariants—properties of matrices that do not change under certain types of transformations. Born in England, Sylvester became the second president of the London Mathematical Society. 
In 1878, while teaching at Johns Hopkins University in Baltimore, he founded the American Journal of Mathematics, the first mathematical journal in the United States. Let A be an n X n matrix. The following statements are equivalent: a. A is invertible. b. Ax = b has a unique solution for every b in 15". Ax = 0 has only the trivial solution. The reduced row echelon form of A is /„. A is a product of elementary matrices. rank(A) = n g. nullity (A) = 0 h. The column vectors of A are linearly independent. i. The column vectors of A span IR". j. The column vectors of A form a basis for IR". k. The row vectors of A are linearly independent. 1. The row vectors of A span IR". m. The row vectors of A form a basis for IR". Proof We have already established the equivalence of (a) through (e). It remains to be shown that statements (f) to (m) are equivalent to the first live statements. (f) (g) Since rank(A) + nullity(A) = n when A is an n X n matrix, it follows from the Rank Theorem that rank(A) = n if and only if nullity(A) = 0. (f) =*■ (d) => (c) => (h) If rank(A) = n, then the reduced row echelon form of A has n leading Is and so is In. From (d) => (c) we know that Ax = 0 has only the trivial solution, which implies that the column vectors of A are linearly independent, since Ax is just a linear combination of the column vectors of A. (h) =*■ (i) If the column vectors of A are linearly independent, then Ax = 0 has only the trivial solution. Thus, by (c) (b), Ax = b has a unique solution for every b in IR". This means that every vector b in IR" can be written as a linear combination of the column vectors of A, establishing (i). (i) => (j) If the column vectors of A span IR", then col (A) = IR" by definition, so rank(A) = dim(col(A)) = n. This is (f), and we have already established that (f) (h). We conclude that the column vectors of A are linearly independent and so form a basis for IR", since, by assumption, they also span IR". (i) (f) If the column vectors of A form a basis for IR", then, in particular, they are linearly independent. It follows that the reduced row echelon form of A contains n leading Is, and thus rank (A) = n. The above discussion shows that (f) (d) (c) (h) =■*• (i) (j) =^ (f) (g). Now recall that, by Theorem 3.25, rank(Ar) = rank(A), so what we have just proved gives us the corresponding results about the column vectors of A r. These are then results about the row vectors of A, bringing (k), (1), and (m) into the network of equivalences and completing the proof. Theorems such as the Fundamental Theorem are not merely of theoretical interest. They are tremendous labor-saving devices as well. The Fundamental Theorem has already allowed us to cut in half the work needed to check that two square matrices are inverses. It also simplifies the task of showing that certain sets of vectors are bases for IR". Indeed, when we have a set of n vectors in W, that set will be a basis for IR" if either of the necessary properties of linear independence or spanning set is true. The next example shows how easy the calculations can be. Section 3.5 Subspaces, Basis, Dimension, and Rank 207 Example 3.52 Show that the vectors "l" "4" 2 3_ • 0 1 , and 9 7. form a basis for R . v: According to the Fundamental Theorem, the vectors will form a basis for R? if and only if a matrix with these vectors as its columns (or rows) has rank 3. 
We perform just enough row operations to determine this: 1 -1 4 1 -1 4 A = 2 0 9 > 0 2 1 .3 1 7_ 0 0 -7 We see that A has rank 3, so the given vectors are a basis for R3 by the equivalence of (f) and (j). The next theorem is an application of both the Rank Theorem and the Fundamental Theorem. We will require this result in Chapters 5 and 7. Theorem 3.28 Let A be an m X n matrix. Then: a. rank(ATA) = rank(A) b. The n X n matrix A7A is invertible if and only if rank(A) Pr« il (a) Since ATA is n X n, it has the same number of columns as A. The Rank Theorem then tells us that rank(A) + nullity(A) = n ~ rank(ArA) + nullity(ArA) Hence, to show that rank(A) = rank(ATA), it is enough to show that nullity(A) = nullity(ArA). We will do so by establishing that the null spaces of A and ATA are the same. To this end, let x be in null (A) so that Ax = 0. Then ATAx = AJ0 = 0, and thus x is in null(ArA). Conversely, let x be in null(ArA). Then ArAx = 0, so xlATAx = xT0 = 0. But then (Ax) • (Ax) = (Ax)T{Ax) = xTATAx = 0 and hence Ax = 0, by Theorem 1.2(d). Therefore, x is in null(A), so null(A) = null (A A), as required. (b) By the Fundamental Theorem, the n X n matrix A1 A is invertible if and only if rank(ArA) = n. But, by (a) this is so if and only if rank(A) = n. _JWKM Coordinates We now return to one of the questions posed at the very beginning of this section: How should we view vectors in R3 that live in a plane through the origin? Are they two-dimensional or three-dimensional? The notions of basis and dimension will help clarify things. 208 Chapter 3 Matrices A plane through the origin is a two-dimensional subspace of W, with any set of two direction vectors serving as a basis. Basis vectors locate coordinate axes in the plane/subspace, in turn allowing us to view the plane as a "copy" of Uz. Before we illustrate this approach, we prove a theorem guaranteeing that "coordinates" that arise in this way are unique. Theorem 3.29 Let S be a subspace of U" and let B = {v1; v2,..., yj be a basis for S. For every vector v in S, there is exactly one way to write v as a linear combination of the basis vectors in B: V = CjV, + c2v2 + ---+ckvk Proof Since B is a basis, it spans S, so v can be written in at least one way as a linear combination of v1; v2,..., Vj.. Let one of these linear combinations be V = dTj + C2V2 +■■ ■ + ckvk Our task is to show that this is the only way to write v as a linear combination of Vi, v2,..., vk. To this end, suppose that we also have v = dlyl + d2v2 + ■ • • + dk\k Then clv1 + c2v2 + • • • + ck\k = d{vx + d2v2 + ■ ■ ■ + dkvk Rearranging (using properties of vector algebra), we obtain (ci - dx)vx + (c2 - d2)y2 + ■ • ■ + (ck- dk)vk = 0 Since B is a basis, Vj, v2,..., Yk are linearly independent. Therefore, (C[ - dx) = (c2 ~ d2) = • • • = (ck - dk) = 0 In other words, cx = dx, c2 = d2,. . . , ck = dk> and the two linear combinations are actually the same. Thus, there is exactly one way to write v as a linear combination of the basis vectors in B. -^■■1 Definition Let 5 be & subspace of W and letB = {vj, v2,..., yj beabasisfor S. Let v be a vector in S, and write v = cl\l + c2v2 + ■ ■ ■ + ckvk. Then cu c2,...,ck axe called the coordinates ofv with respect to B, and the column vector is called the coordinate vector ofv with respect to B. Example 3.53 Let £ = {e,, e2, e3} be the standard basis for K3. Find the coordinate vector of v with respect to £. 
Section 3.5 Subspaces, Basis, Dimension, and Rank Solution Since v = 2e, + 7e2 + 4e3 Example 3.54 It should be clear that the coordinate vector of every (column) vector in W with respect to the standard basis is just the vector itself. In Example 3.44, we saw that u = 3" n2" o" -1 ,v = 1 , and w = -5 5. .3, . 1. are three vec- T tors in the same subspace (plane through the origin) S of R3 and that B = {u, v} is a basis for S. Since w = 2u - 3v, we have 2" -3 See Figure 3.3. Figure 3.3 The coordinates of a vector with respect to a basis Exercises 3.5 In Exercises 1-4, let S be the collection of vectors that satisfy the given property. In each case, either prove that S forms a subspace ofU2 or give a counterexample to show that it does not. l.x = 0 3. y = 2x 2. x > 0,y > 0 4. xy > 0 In Exercises 5-8, let S be the collection of vectors 7.x-y + z= l 8. \x — y\ = \y — z \ 9, Prove that every line through the origin in R3 is a sub-space of R3. 10. Suppose S consists of all points in IR2 that are on the x-axis or the 7-axis (or both). (S is called the union of the two axes.) Is S a subspace of R2? Why or why not? In Exercises 11 and 12, determine whether b is in col(A) and whether wis in row(A), as in Example 3.41. in that satisfy the given property. In each case, either prove that S forms a subspace q/R3 or give a counterexample to show that it does not. 5. x — y = z 6. z = 2x, y = 0 11. A 12. A = ,b 1 1 -3 0 2 1 1 -1 -4 ,b = w 1 1 0 -1 1 1] , w [2 4 -5] 210 Chapter 3 Matrices 13. In Exercise 11, determine whether w is in row(A), using the method described in the Remark following Example 3.41. 14. In Exercise 12, determine whether w is in row(A), using the method described in the Remark following Example 3.41. •l" 15. If A is the matrix in Exercise 11, is v 16. If A is the matrix in Exercise 12, is v = in null (A)? in null (A)? In Exercises 17-20, give bases for row(A), col(A), and null(A). 17. A = 1 0 1 1 18. A = 1 -3 2 1 1 -4 1 1 0 1 19. A = 0 1 -1 1 .0 1 -1 -1_ 2 -4 0 2 1 20. A = I 2 1 2 3 1 -2 1 4 4 In Exercises 21-24, find bases for row(A) and col(A) in the given exercises using A1. 21. Exercise 17 22. Exercise 18 23. Exercise 19 24. Exercise 20 25. Explain carefully why your answers to Exercises 17 and 21 are both correct even though there appear to be differences. 26. Explain carefully why your answers to Exercises 18 and 22 are both correct even though there appear to be differences. In Exercises 27-30, find a basis for the span of the given vectors. 1 -1 0 1 1 0 2 27. -1 0 , 1 28. -1 > 2 1 1 0 1 -1 1 0 1 2 29. [2 - -3 1]. 1 - 1 0] .[4 - 4 1 ] 30. [0 1 -2 1], [3 1 -1 0], [2 15 1] For Exercises 31 and 32, find bases for the spans of the vectors in the given exercises from among the vectors themselves. 31. Exercise 29 32. Exercise 30 33. Prove that if R is a matrix in echelon form, then a basis for row(R) consists of the nonzero rows of R. 34. Prove that if the columns of A are linearly independent, then they must form a basis for col(A). For Exercises 35-38, give the rank and the nullity of the matrices in the given exercises. 35. Exercise 17 36. Exercise 18 37. Exercise 19 38. Exercise 20 39. If A is a 3 X 5 matrix, explain why the columns of A must be linearly dependent. 40. If A is a 4 X 2 matrix, explain why the rows of A must be linearly dependent. 41. If A is a 3 X 5 matrix, what are the possible values of nullity(A)? 42. If A is a 4 X 2 matrix, what are the possible values of nullity(A)? In Exercises 43 and 44, find all possible values ofrank(A) as a varies. 43. 
A 1 2 a -2 4a 2 a -2 1 44. A = a 2—1 3 3-2 — 2—1 a Answer Exercises 45-48 by considering the matrix with the given vectors as its columns. 45. Do 1 1 0 1 0 1 0 1 1 form a basis for R ? 46. Do 47. Do 1 -1 3 -1 1 5 -3 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 form a basis for R3? form a basis for R ? Section 3.6 Introduction to Linear Transformations 211 48. Do 1 u u -1 -1 1 0 0 0 0 -1 > 1 0 -1 1 0 form a basis for R4? 1 0 1 49. Do 1 s 1 0 form a basis for Z2? 0 1 1 1 0 1 50. Do 1 1 0 form a basis for Zj? 0 1 1 In Exercises 51 and 52, show that w is in span(B) and find the coordinate vector [w]B. 51. B = 52. B V 2 * .0. f *3~ J 1 I .4. ] V ,w = 6 J 2 "5* 1 "l" 1 ,w = 3 .6. J A. In Exercises 53-56, compute the rank and nullity of the given matrices over the indicated Zp. 53. 55. 56. 1 1 0 1 1 0 1 over Z, 54. 1 1 2 1 2 2 0 0 over Z, 1 3 2 3 4 1 0 0 0 1 5 1 0 2 2 5 1 1 1 over Zc over 57. If A is m X n, prove that every vector in null (A) is orthogonal to every vector in row(A). 58. If A and B are n X n matrices of rank n, prove that AB has rank n. 59. (a) Prove that rank(AB) < rank(B). [Hint: Review Exercise 29 in Section 3.1.] (b) Give an example in which rank(AB) < rank(B). 60. (a) Prove that rank(AB) < rank(A). [Hint: Review Exercise 30 in Section 3.1 or use transposes and Exercise 59(a).] (b) Give an example in which rank(AB) < rank(A). 61. (a) Prove that if U is invertible, then rank(LA) = rank(A). [Hint: A = IT1 (LA).] (b) Prove that if Vis invertible, then rank(A7) = rank(A). 62. Prove that anmXn matrix A has rank 1 if and only if A can be written as the outer product uvT of a vector u in!RmandvinBF. 63. If an m X n matrix A has rank r, prove that A can be written as the sum of r matrices, each of which has rank 1. [Hint: Find a way to use Exercise 62.] 64. Prove that, for m X n matrices A and B, rank (A + B) £ rank(A) + rank(B). 65. Let A be an n X n matrix such that A2 = O. Prove that rank(A) "■" n/2. [Hint: Show that col(A) C null(A) and use the Rank Theorem.] 66. Let A be a skew-symmetric n X n matrix. (See page 162). (a) Prove that xT Ax = 0 for all x in IR". (b) Prove that I + A is invertible. [Hint: Show that null(/ + A) = {0}.] Introduction to Linear Transformations In this section, we begin to explore one of the themes from the introduction to this chapter. There we saw that matrices can be used to transform vectors, acting as a type of "function" of the form w = T(v), where the independent variable v and the dependent variable w are vectors. We will make this notion more precise now and look at several examples of such matrix transformations, leading to the concept of a linear transformation—a powerful idea that we will encounter repeatedly from here on. We begin by recalling some of the basic concepts associated with functions. You will be familiar with most of these ideas from other courses in which you encountered functions of the form/: R —> R [such as/(x) = x2] that transform real numbers into real numbers. What is new here is that vectors are involved and we are interested only in functions that are "compatible" with the vector operations of addition and scalar multiplication. Consider an example. Let ,4 and v = Then Av 1 -1 This shows that A transforms v into w = We can describe this transformation more generally. The matrix equation "l o" X X 2 -1 2x — y .3 4. J- _3x + 4y_ vector x 2x — y 3x + 4y. gives a formula that shows how A transforms an arbitrary vector in R3. 
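Before giving this transformation a name, it may help to see the matrix "act" numerically. The sketch below (an illustration only, assuming NumPy is available) applies the 3 × 2 matrix A above, first to the example vector v = (1, −1) and then to a second, arbitrarily chosen vector:

import numpy as np

A = np.array([[1,  0],
              [2, -1],
              [3,  4]])

v = np.array([1, -1])
print(A @ v)                  # [ 1  3 -1]

x, y = 2, 5                   # an arbitrary choice of x and y
print(A @ np.array([x, y]))   # [ 2 -1 26], that is, (x, 2x - y, 3x + 4y)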
We denote this transformation by TA and write in R2 into the x 2x — y 3x + Ay (Although technically sloppy, omitting the parentheses in definitions such as this one is a common convention that saves some writing. The description of TA becomes x x 2x — y 3x + Ay with this convention.) With this example in mind, we now consider some terminology. A transformation (or mapping or function) T from R" to R'" is a rule that assigns to each vector v in R" a unique vector T(v) in Rm. The domain of Tis R", and the codomain of Tis Um. We indicate this by writing T: R" —> Rm. For a vector v in the domain of T, the vector T(v) in the codomain is called the image of v under (the action of) T. The set of all possible images T(v) (as v varies throughout the domain of T) is called the range of T. In our example, the domain of TA is R2 and its codomain is R3, so we write f 1 is w = rA(v) = 53. The image of v = -1 3 -1 . What is the range of Section 3.6 Introduction to Linear Transformations T4? It consists of all vectors in the codomain R3 that are of the form T, x X V o" 2x — y = X 2 + y -1 3x + Ay, „3. 4. which describes the set of all linear combinations of the column vectors o" and -1 4 of A. In other words, the range of T is the column space of A\ (We will have more to say about this later—for now we'll simply note it as an interesting observation.) Geometrically, this shows that the range of TA is the plane through the origin in R3 with direction vectors given by the column vectors of A. Notice that the range of TA is strictly smaller than the codomain of TA. Linear Transformations The example TA above is a special case of a more general type of transformation called a linear transformation. We will consider the general definition in Chapter 6, but the essence of it is that these are the transformations that "preserve" the vector operations of addition and scalar multiplication. Definition A transformation T:! is called a linear transformation if 1. T(u + v) = T(u) + T(v) for all u and v in R" and 2. T(c\) = cT(\) for all v in W and all scalars c. Example 3.55 Consider once again the transformation T: M2 —» R3 defined by r i f x X T - 2x - y y '~ L3x + 4y. Let's check that Tis a linear transformation. To verify (1), we let Then T(u + v) = I + x UK! J and v x X\ + x2 2xx + 2x2 - yx - y2 3x{ + 3>x2 + 4yx + 4y2 x{ + x2 2{xl + x2) - (yi + y2) 3(x, + x2) + 4(y, + y2) xl + x2 (2Xi - yx) + C2.Y- - y2) _(3xi + 4yj) + (3x2 + 4y2), 2*i - >'i 3xj + 4y, x2 2x2 - y2 3x2 + 4y2_ = T Xi + T x2 Jl- Jt. r(u) + rw 214 Chapter 3 Matrices To show (2), we let v and let c be a scalar. Then Ttcv) = T c ( X \ f cx \ c = T V .y. / V cy. J cx 2{cx) — icy) _3(cx) + 4(c)') _ x 2x — y 3x + 4y. = cT cx c(2x — y) c(3x + Ay) = cTtv) Thus, T is a linear transformation. Remark The definition of a linear transformation can be streamlined by combining (1) and (2) as shown below. T: W -» Um is a linear transformation if T(c1v1 + c2v2) = CjT^) + c2T(v2) for all v2 in U" and scalars ct, c. In Exercise 53, you will be asked to show that the statement above is equivalent to the original definition. In practice, this equivalent formulation can save some writing—try it! Although the linear transformation T in Example 3.55 originally arose as a matrix transformation TA, it is a simple matter to recover the matrix A from the definition of T given in the example. We observe that x iyi X V cf ~1 (f 2x - y = X 2 + y -l = 2 -1 _3x + 4y_ .3. 4. .3 4. so T = TA, where A = . (Notice that when the variables x and y are lined 1 0 2 -1 .3 4. 
up, the matrix A is just their coefficient matrix.) Recognizing that a transformation is a matrix transformation is important, since, as the next theorem shows, all matrix transformations are linear transformations. Th80f6HI 3.30 Let A be an w X n matrix. Then the matrix transformation TA: I T.(x) = Ax (for x in IR") is a linear transformation. defined by Section 3.6 Introduction to Linear Transformations 215 Proof Let u and v be vectors in W and let c be a scalar. Then TA(u + v) = A(tt + v) = Au + Av = TA(u) + TA(v) and TA(cv) = A(cv) = c(Av) = cTA(v) Hence, TA is a linear transformation. Example 3.56 v (1.2) i (x, -y) (1.-2) figure 3.4 Reflection in the x-axis Let F : IR2 —*■ U2 be the transformation that sends each point to its reflection in the x-axis. Show that F is a linear transformation. T Solution From Figure 3.4, it is clear that F sends the point (x, y) to the point (x, —y). Thus, we may write X X J. -y. We could proceed to check that F is linear, as in Example 3.55 (this one is even easier to check!), but it is faster to observe that X 1 0 1 0 = X + y 0 -1 -y 0 -1 Therefore, F X X = A J- y. , where A = 1 0 0 -1 so F is a matrix transformation. It now follows, by Theorem 3.30, that F is a linear transformation. Example 3.57 i ! \ i \ (x,y) -9()° > ^ i i ! 1 [ b 1 -y ! P x Figure 3.5 A 90° rotation Let : U2 —> R2 be the transformation that rotates each point 90° counterclockwise about the origin. Show that JR is a linear transformation. Solution As Figure 3.5 shows, R sends the point (x, y) to the point (~y, x). Thus, we have R X -y 0 + y -i 0 -1 = X — J- X 1 0 1 0 Hence, JR is a matrix transformation and is therefore linear. 4 Observe that if we multiply a matrix by standard basis vectors, we obtain the columns of the matrix. For example, a b c d e f. and a b c d le f. d If. We can use this observation to show that every linear transformation from IR" to : arises as a matrix transformation. Chapter 3 Matrices Theorem 3.31 Let T: U" —» Rm be a linear transformation. Then T is a matrix transformation. More specifically, T = TA, where A is the m X « matrix A = [T(ei) T(e2).--- T(e„)] Proof Let e„ e2, . . . , e„ be the standard basis vectors in R" and let x be a vector in R". We can write x = xxex + x2e2 +• ■ • + xnen (where the x-s are the components of x). We also know that T(ex), T(e2),..., T(e„) are (column) vectors in Rm. Let A = [T(e1): T(e2): ■ • •: T(e„)] be the m X n matrix with these vectors as its columns. Then T(x) = T(xxex + x2e2 + ■ • ■ + x„e„) = x,r(e,) + x2T(e2) + • • • + xnT(en) "x, [T(ei)\ne2)\----.T(eH)} x2 = Ax as required. The matrix A in Theorem 3.31 is called the standard matrix of the linear transformation T. ~——.„,„„—,.,,,,„,,..,]-_________-p. Example 3.58 Show that a rotation about the origin through an angle 6 defines a linear transformation from R2 to R2 and find its standard matrix. t Solution Let R0 be the rotation. We will give a geometric argument to establish the fact that Rg is linear. Let u and v be vectors in R2. If they are not parallel, then Figure 3.6(a) shows the parallelogram rule that determines u + v. If we now applying, the entire parallelogram is rotated through the angle d, as shown in Figure 3.6(b). But the diagonal of this parallelogram must be Rg(u) + Re(v), again by tire parallelogram rule. H—► Hence, Re(u + v) = Rg(u) + Kg(v), (What happens if u and v are parallel?) y A u + v Re(u + v) y ,4 '1 I / / (a) (b) Figure 3.6 Similarly, if we apply Re to v and cv, we obtain Rs(v) and R§(cv), as shown in Figure 3.7. 
But since the rotation does not affect lengths, we must then have H * Re(cv) = cR(j(v), as required. (Draw diagrams for the cases 0 < c < 1, — l R2 that projects a point onto the x-axis is a linear transformation and find its standard matrix. (b) More generally, if € is a line through the origin in R2, show that the transforma- r tion P, that projects a point onto € is a linear transformation and find its standard matrix. Solution (a) As Figure 3.11 shows, P sends the point (x, y) to the point (x, 0). Thus, X X 1 + y 0 1 0 = X 33 V 0 0 0 0 0 It follows that P is a matrix transformation (and hence a linear transformation) with standard matrix 1 0 0 0 (fa) Let the line f, have direction vector d and let v be an arbitrary vector. Then P( is given by projd(v), the projection of v onto d, which you'll recall from Section 1.2 has the formula d • v d-d projdW = Thus, to show that P( is linear, we proceed as follows: , ,'d-(u + v)s P((u + v) d-d d • u + d-v d-d d-u d•v \ d-d + d-dj ° ' \l I ~~r Id = Pe(u) + P((v) d-d d-d Similarly, Pe(cv) = cPe(v) for any scalar c (Exercise 52). Hence, Pe is a linear transformation. Section 3.6 Introduction to Linear Transformations 219 To find the standard matrix of P(, we apply Theorem 3.31, If we let d then dd (™~lc dx 1 ' d\ ' d\ + d\ d2_ d'i + d\ d\d2_ d2 d\ 1 dyd2 d{ + dj d2_ d\ + d\ .d\_ Thus, the standard matrix of the projection is A 1 d\ dxd2 ' d\lid\ + d\) dxd2/{dl + d\ dj + d\ _dxd2 d\ _dxd2/(dx2 + d22) d\l{d\ + d2) As a check, note that in part (a) we could take d = ex as a direction vector for the x-axis. Therefore, dx = 1 and d2 = 0, and we obtain A = 1 0 0 0 , as before. New Linear Transformations from Old If T: U'n -» U" and S : U" -» W are linear transformations, then we may follow T by S to form the composition of the two transformations, denoted S ° T. Notice that, in order for S ° T to make sense, the codomain of T and the domain of 5 must match (in this case, they are both R") and the resulting composite transformation S°T goes from the domain of Tto the codomain of S (in this case, S°T: U'n —> W). Figure 3.12 shows schematically how this composition works. The formal definition of composition of transformations is taken directly from this figure and is the same as the corresponding definition of composition of ordinary functions: (S ° T)(v) = S(T(v)) Of course, we would like S 0 T to be a linear transformation too, and happily we find that it is. We can demonstrate this by showing that S°T satisfies the definition of a linear transformation (which we will do in Chapter 6), but, since for the time being we are assuming that linear transformations and matrix transformations are the same thing, it is enough to show that S ° Tis a matrix transformation. We will use the notation [T] for the standard matrix of a linear transformation T. Figure 3.12 The composition of transformations H Chapter 3 Matrices Theorem 3.32 Let T:U'n-> W and S: W -> W be linear transformations. Then S ° T: Um ~> W is a linear transformation. Moreover, their standard matrices are related by [S'T)= [S][T] Proof Let [S] = A and [T] = B. (Notice that A is p X « and Bis n X m.) Ifv is a vector in I im, then we simply compute (S o r)(v) = S(T(v)) = S(Bv) A(Bv) = (AB)v (Notice here that the dimensions of A and B guarantee that the product AB makes sense.) Thus, we see that the effect of S0 T is to multiply vectors by AB, from which it follows immediately that 5 0 T is a matrix (hence, linear) transformation with [S°T}= [S][T]. - Isn't this a great result? 
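The theorem is also easy to test numerically before putting it into words. The sketch below (an illustration only, assuming NumPy is available; the matrices A and B are arbitrary choices, not taken from the text) compares S(T(x)) with (AB)x for a random vector x:

import numpy as np

rng = np.random.default_rng(1)

A = rng.integers(-5, 6, size=(4, 3))   # standard matrix of S : R^3 -> R^4
B = rng.integers(-5, 6, size=(3, 2))   # standard matrix of T : R^2 -> R^3

S = lambda y: A @ y
T = lambda x: B @ x

x = rng.integers(-5, 6, size=2)
print(np.array_equal(S(T(x)), (A @ B) @ x))   # True: [S o T] = [S][T]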
Say it in words: "The matrix of the composite is the product of the matrices." What a lovely formula! Example 3.80 Consider the linear transformation T: R2 —> U3 from Example 3.55, defined by and the linear transformation S : L73 %2 _3*i + 4x2 !4 denned by 2jVi + }'3 tyi - n y\+ yi + y3 FindS°T: M2-*^. Solution We see that the standard matrices are [S] = so Theorem 3.32 gives [SoT] = [S][T] It follows that (So T) '2 0 1" 0 3 -1 1 -1 0 1 1 1 and IT] = 5xl + 4x2 3x{ — 7x2 L6x1 + 3x2j Section 3.6 Introduction to Linear Transformations 221 (In Exercise 29, you will be asked to check this result by setting V xt -X2. Xy Jl = T - ^3xi + 4x2. and substituting these values into the definition of S, thereby calculating (S ° T) directly.) Example 3.61 Find the standard matrix of the transformation that first rotates a point 90° counterclockwise about the origin and then reflects the result in the x-axis. The rotation R and the reflection F were discussed in Examples 3.57 and 3.56, respectively, where we found their standard matrices to be [R] = 1 0" 0 1 1 0 It follows that the composition F ° R has for its matrix [FoR] = [F][R] = "l °1 "o -l" r o -ii _0 -lj 1 0. ~ L-i o_ (Check that this result is correct by considering the effect of F 0 R on the standard basis vectors e] and e2. Note the importance of the order of the transformations: R is performed before F, but we write F ° R. In this case, R 0 F also makes sense. Is R°F = F°R?) Inverses of Linear Transformations Consider the effect of a 90° counterclockwise rotation about the origin followed by a 90° clockwise rotation about the origin. Clearly this leaves every point in 1R2 unchanged. If we denote these transformations by R90 and R^90 (remember that a negative angle measure corresponds to clockwise direction), then we may express this as (R90 0 R-go) (v) = v for every v in U2. Note that, in this case, if we perform the transformations in the other order, we get the same end result: (R-90 0 R9q)(v) = v for every v in 1R2. Thus, R90 ° R~90 (and R-90 0 R90 too) is a linear transformation that leaves every vector in R2 unchanged. Such a transformation is called an identity transformation. Generally, we have one such transformation for every W—namely, I: U" —> U" such that I (v) = v for every vin W. (If it is important to keep track of the dimension of the space, we might write I„ for clarity.) So, with this notation, we have R90 ° R-90 = I = R-90° Rgo- A pair of transformations that are related to each other in this way are called inverse transformations. Definition Let S and Tbe linear transformations from W to IR". Then S and T are inverse transformationsifS°T=In and T° S = I„. 222 Chapter 3 Matrices Remark Since this definition is symmetric with respect to S and T, we will say that, when this situation occurs, S is the inverse of T and Tis the inverse of S. Furthermore, we will say that S and T are invertible. In terms of matrices, we see immediately that if S and T are inverse transforma-tions, then [S][T] = [S 0 T] = [I] = I, where the last lis the identity matrix. (Why is the standard matrix of the identity transformation the identity matrix?) We must also have [T][S] = [T° S] = [I] = I. This shows that [S] and [T] are inverse matrices. It shows something more: If a linear transformation T is invertible, then its standard matrix [T] must be invertible, and since matrix inverses are unique, this means that the inverse of T is also unique. Therefore, we can unambiguously use the notation T~l to refer to the inverse of T. 
Thus, we can rewrite the above equations as [THT""1] = I = [T~1][T], showing that the matrix of T^1 is the inverse matrix of [T]. We have just proved the following theorem. Til60r6ill 3.33 Let T : U" —> U" be an invertible linear transformation. Then its standard matrix [T] is an invertible matrix, and [T1] = [T] ~' Remark Say this one in words too: "The matrix of the inverse is the inverse of the matrix." Fabulous! Example 3.62 Find the standard matrix of a 60° clockwise rotation about the origin in I Solution Earlier we computed the matrix of a 60° counterclockwise rotation about the origin to be 1/2 -V3/2" .V3/2 1/2. Since a 60° clockwise rotation is the inverse of a 60° counterclockwise rotation, we can apply Theorem 3.33 to obtain [*-„] = [ÍRJ-1] 1/2 -V3/2 V3/2 1/2. 1/2 V3/2 -V3/2 1/2. B *• (Check the calculation of the matrix inverse. The fastest way is to use the 2X2 shortcut from Theorem 3.8. Also, check that the resulting matrix has the right effect on the standard basis in R by drawing a diagram.) 4 Example 3.63 Determine whether projection onto the x-axis is an invertible transformation, and if it is, find its inverse. 1 0 .0 0 since its determinant is 0. Hence, P is not invertible either. Solution The standard matrix of this projection P is , which is not invertible Section 3.6 Introduction to Linear Transformations 223 T (a. b) I f(a,b') * (a, 0) *(a,b") Figure 3.13 Projections are not invertible Remark Figure 3.13 gives some idea why Pin Example 3.63 is not invertible. The projection "collapses" R2 onto the x-axis. For P to be invertible, we would have to have a way of "undoing" it, to recover the point (a, b) we started with. However, there are infinitely many candidates for the image of (a, 0) under such a hypothetical "inverse." Which one should we use? We cannot simply say that P~l must send [a, 0) to (a, b), since this cannot be a definition when we have no way of knowing what b should be. (See Exercise 42.) Associativity Theorem 3.3(a) in Section 3.2 stated the associativity property for matrix multiplication: A(BC) = (AB)C. (If you didn't try to prove it then, do so now. Even with all matrices restricted 2X2, you will get some feeling for the notational complexity involved in an "elementwise" proof, which should make you appreciate the proof we are about to give.) Our approach to the proof is via linear transformations. We have seen that every m X n matrix A gives rise to a linear transformation TA : W —> M'n; conversely, every linear transformation T : R" —*■ Rm has a corresponding m X n matrix [T]. The two correspondences are inversely related; that is, given A, [TA] = A, and given T, T[T] = T, Let R — TA, S = TB, and T = Tc. Then, by Theorem 3.32, A{BC) = (AB)C if and only if R ° (S ° T) = (R ° S) ° T We now prove the latter identity. Let x be in the domain of T [and hence in the domain of both R°(S°T) and (R°S)° T— why?]. To prove that R ° (S ° T) = (R° S) ° T, it is enough to prove that they have the same effect on x. By repeated application of the definition of composition, we have (R o (S o D)(x) R((S o t)(x)) r(s(t(x))) CR ° s)(r(x)) = ((r o s) o r)(x) as required. (Carefully check how the definition of composition has been used four times.) This section has served as an introduction to linear transformations. In Chapter 6, we will take a more detailed and more general look at these transformations. The exercises that follow also contain some additional explorations of this important concept. Exercises 3.6 1. Let TA : I2 be the matrix transformation corre- 2. 
Let TA : U2 -» R3 be the matrix transformation corre- sponding to A where u = 2 -1 3 4 . Find TA(u) and TA(v), "1" 3~ andv = .2. _-2_ sponding to A = Ta(v), where u = Find TA(u) and Y 3" and v = .2. _-2_ : ." Chapters Matrices In Exercises 3-6, prove that the given transformation is a linear transformation, using the definition (or the Remark following Example 3.55). 3. T 5. T X x + y .y. x — y_ X y x — 2x + 4. T x ~y x + 2y 3x - 4y - X X + z z 6. T y = y + z .x + y. In Exercises 7-10, give a counterexample to show that the given transformation is not a linear transformation. 7. r 9. T x V / J. x\ X xy J- x + y_ 8. T 10. T x J. - x .y. \x \y\\ X + 1 y- 1 In Exercises 11-14, find the standard matrix of the linear transformation in the given exercise. 11. Exercise 3 12. Exercise 4 13. Exercise 5 14. Exercise 6 In Exercises 15-18, show that the given transformation from R2 to IR2 is linear by showing that it is a matrix transformation. 15. F reflects a vector in the y-axis, 16. R rotates a vector 45° counterclockwise about the origin. 17. D stretches a vector by a factor of 2 in the x-component and a factor of 3 in the y-component. 18. P projects a vector onto the line y — x. 19. The three types of elementary matrices give rise to five types of 2 X 2 matrices with one of the following forms: 'k 0~ 1 0" or 0 1. „0 k 0 1 1 0 "1 fc" 1 0' or k 1. _0 1. Each of these elementary matrices corresponds to a linear transformation from IRJ to R2. Draw pictures to illustrate the effect of each one on the unit square with vertices at (0, 0), (1,0), (0,1), and (1,1). In Exercises 20-25, find the standard matrix of the given linear transformation from R2 to R2. 20. Counterclockwise rotation through 120° about the origin 21. Clockwise rotation through 30° about the origin 22. Projection onto the line y = 2x 23. Projection onto the liney = — x 24. Reflection in the line y — x 25. Reflection in the liney = — x 26. Let € be a line through the origin in R2, Pe the linear transformation that projects a vector onto €, and Fe the transformation that reflects a vector in (. (a) Draw diagrams to show that Ff is linear. (b) Figure 3.14 suggests a way to find the matrix of F(, using the fact that the diagonals of a parallelogram bisect each other. Prove that F((x) = 2P( (x) — x, and use this result to show that the standard matrix of F( is 1 d\ + d\ d\ - d\ 2d,d~, 2dxdj -d2 + d\ (where the direction vector of t is d = (c) If the angle between 6 and the positive x-axis is 6, show that the matrix of Fe is cos 20 sin 20 sin 20 -cos 20 Figure 3.14 In Exercises 27 and 28, apply part (b) or (c) of Exercise 26 to find the standard matrix of the transformation. 27. Reflection in the line y = 2x Section 3.6 Introduction to Linear Transformations 225 28. Reflection in the line y = V3x 29. Check the formula for S0 Tin Example 3.60, by performing the suggested direct substitution. In Exercises 30-35, verify Theorem 3.32 by finding the matrix ofS °T(a)by direct substitution and (b) by matrix multiplication of [S] [T]. 30. T 31. T 32. T 33. T 34. T *2. xx x2 -x, = X] - X2 ,5 *i + 2x2 - — 3x + *2- ■ x2 .~*1„ V = ,s 71 J2. 2yi "TV 7i + 3y2 71 7i + 3y2 2vi + y2 x\ "4" ^2 ^3 ~~ ^2 ~l" X3 ,s }'l 72 J 4Vi - 2y2 ->'i + y2 71 L72J 7i - 72 7l + 72 -yi + yi Xj r X2 71 "71-72 35. r *2 = x2 + X3 ,s yi = 72 - 73 -*3- L*l + *3- --7l + 73- In Exercises 36-39, find the standard matrix of the composite transformation from R2 to R2. 36. Counterclockwise rotation through 60°, followed by reflection in the line y = x 37. 
Reflection in the y-axis, followed by clockwise rotation through 30° 38. Clockwise rotation through 45°, followed by projection onto the y-axis, followed by clockwise rotation through 45° 39. Reflection in the line y = x, followed by counterclockwise rotation through 30°, followed by reflection in the liney = — x In Exercises 40-43, use matrices to prove the given statements about transformations from R2 to U2. 40. If Re denotes a rotation (about the origin) through the angle 8, thenRaoR/3 = Ra+p. 41. If 8 is the angle between lines € and m (through the origin), then Fm ° F( = R+26. (See Exercise 26.) 42. (a) If P is a projection, then P 0 P = P. (b) The matrix of a projection can never be invertible. 43. If (, m, and n are three lines through the origin, then F„ 0 Fm ° F( is also a reflection in a line through the origin. 44. Let T be a linear transformation from R2 to R2 (or from R3 to R3). Prove that T maps a straight line to a straight line or a point. [Hint: Use the vector form of the equation of a line.] 45. Let T be a linear transformation from R2 to R2 (or from R3 to R3). Prove that Tmaps parallel lines to parallel lines, a single line, a pair of points, or a single point. In Exercises 46-51, let ABCD be the square with vertices (-1,1), (1,1), (1, -1), and (-1, -1). Use the results in Exercises 44 and 45 to find and draw the image of ABCD under the given transformation. 46. T in Exercise 3 47. D in Exercise 17 48. Pin Exercise 18 49. The projection in Exercise 22 50. Tin Exercise 31 51. The transformation in Exercise 37 52. Prove that P((cv) = cPf(v) for any scalar c [Example 3.59(b)]. 53. Prove that T: R" -> Rm is a linear transformation if and only if r(clVl + c2v2) = c,r(Vl) + c2T(v2) for all Vn v2 in R" and scalars cly c2. 54. Prove that (as noted at the beginning of this section) the range of a linear transformation T: R" —> Rm is the column space of its matrix [T]. 55. If A is an invertible 2X2 matrix, what does the Fundamental Theorem of Invertible Matrices assert about the corresponding linear transformation TA in light of Exercise 19? Vignette Robotics In 1981, the U.S. Space Shuttle Columbia blasted off equipped with a device called the Shuttle Remote Manipulator System (SRMS). This robotic arm, known as Canadarm, has proved to be a vital tool in all subsequent space shuttle missions, providing strong, yet precise and delicate handling of its payloads (see Figure 3.15). Canadarm has been used to place satellites into their proper orbit and to retrieve malfunctioning ones for repair, and it has also performed critical repairs to the shuttle itself. Notably, the robotic arm was instrumental in the successful repair of the Hubble Space Telescope. Since 1998, Canadarm has played an important role in the assembly and operation of the International Space Station. -I'll" H TtHnTtnWMr $58Sk laEßmüBL m, Figure 3.15 Canadarm A robotic arm consists of a series of links of fixed length connected at joints where they can rotate. Each link can therefore rotate in space, or (through the effect of the other links) be translated parallel to itself, or move by a combination (composition) of rotations and translations. Before we can design a mathematical model for a robotic arm, we need to understand how rotations and translations work in composition. To simplify matters, we will assume that our arm is in IK2. 226 In Section 3.6, we saw that the matrix of a rotation R about the origin through an ["cosff — sin(? angle 9 is a linear transformation with matrix a sva.9 cost (Figure 3.16(a)). 
If v = [a b]^T, then a translation along v is the transformation T(x) = x + v or, equivalently,
$$T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + a \\ y + b \end{bmatrix}$$
(Figure 3.16(b)).

Figure 3.16 (a) Rotation (b) Translation

Unfortunately, translation is not a linear transformation, because T(0) ≠ 0. However, there is a trick that will get us around this problem. We can represent the vector x = [x y]^T as the vector [x y 1]^T in R^3. This is called representing x in homogeneous coordinates. Then the matrix multiplication
$$\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x + a \\ y + b \\ 1 \end{bmatrix}$$
represents the translated vector T(x) in homogeneous coordinates. We can treat rotations in homogeneous coordinates too. The matrix multiplication
$$\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \\ 1 \end{bmatrix}$$
represents the rotated vector R(x) in homogeneous coordinates. The composition T ∘ R that gives the rotation R followed by the translation T is now represented by the product
$$\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & a \\ \sin\theta & \cos\theta & b \\ 0 & 0 & 1 \end{bmatrix}$$
[Note that R ∘ T ≠ T ∘ R.]

To model a robotic arm, we give each link its own coordinate system (called a frame) and examine how one link moves in relation to those to which it is directly connected. To be specific, we let the coordinate axes for the link A_i be x_i and y_i, with the x_i-axis aligned with the link. The length of A_i is denoted by a_i, and the angle
The methods used in this example generalize to robotic arms in three dimensions, although in R3 there are more degrees of freedom and hence more variables. The method of homogeneous coordinates is also useful in other applications, notably computer graphics. 229 231 Chapter 3 Matrices Applications Andrei A. Markov (1856-1922) was a Russian mathematician who studied and later taught at the University of St. Petersburg. He was interested in number theory, analysis, and the theory of continued fractions, a recently developed field that Markov applied to probability theory. Markov was also interested in poetry, and one of the uses to which he put Markov chains was the analysis of patterns in poems and other literarv texts. Markov Chains A market research team is conducting a controlled survey to determine peoples preferences in toothpaste. The sample consists of 200 people, each of whom is asked to try two brands of toothpaste over a period of several months. Based on the responses to the survey, the research team compiles the following statistics about toothpaste preferences. Of those using Brand A in any month, 70% continue to use it the following month, while 30% switch to Brand B; of those using Brand B in any month, 80% continue to use it the following month, while 20% switch to Brand A. These findings are summarized in Figure 3.19, in which the percentages have been converted into decimals; we will think of them as probabilities. 0.30 Figure 3.19 Figure 3.19 is a simple example of a (finite) Markov chain. It represents an evolving process consisting of a finite number of states. At each step or point in time, the process may be in any one of the states; at the next step, the process can remain in its present state or switch to one of the other states. The state to which the process moves at the next step and the probability of its doing so depend only on the present state and not on the past history of the process. These probabilities are called transition probabilities and are assumed to be constants (that is, the probability of moving from state i to state; is always the same). Example 3.64 In the toothpaste survey described above, there are just two states—using Brand A and using Brand B—and the transition probabilities are those indicated in Figure 3.19. Suppose that, when the survey begins, 120 people are using Brand A and 80 people are using Brand B. How many people will be using each brand 1 month later? 2 months later? Solution The number of Brand A users after 1 month will be 70% of those initially using Brand A (those who remain loyal to Brand A) plus 20% of the Brand B users (those who switch from B to A): 0.70(120) + 0.20(80) = 100 Similarly, the number of Brand B users after 1 month will be a combination of those who switch to Brand B and those who continue to use it: 0.30(120) + 0.80(80) = 100 Section 3.7 Applications 231 We can summarize these two equations in a single matrix equation: "0.70 0.20" 120" "lOO" .0.30 0.80_ . 80. .100. Let's call the matrix P and label the vectors x,, = "120" and x, = "lOO" . 80_ .100. (Note that the components of each vector are the numbers of Brand A and Brand B users, in that order, after the number of months indicated by the subscript.) Thus, we have x, = Px0. Extending the notation, let xk be the vector whose components record the distribution of toothpaste users after k months. To determine the number of users of each brand after 2 months have elapsed, we simply apply the same reasoning, starting with x, instead of xn. 
We obtain "0.70 0.20" "100" " 90" X2 = PX! = = .0.30 0.80. 100_ 110 from which we see that there are now 90 Brand A users and 110 Brand B users. The vectors xk in Example 3.64 are called the state vectors of the Markov chain, and the matrix P is called its transition matrix. We have just seen that a Markov chain satisfies the relation xk |, = Pxk for A; = 0, 1,2,... From this result it follows that we can compute an arbitrary state vector iteratively once we know x0 and P. In other words, a Markov chain is completely determined by its transition probabilities and its initial state. Remarks 3 Suppose, in Example 3.64, we wanted to keep track of not the actual numbers of toothpaste users but, rather, the relative numbers using each brand. We could convert the data into percentages or fractions by dividing by 200, the total number of users. Thus, we would start with 0.60 0.40 to reflect the fact that, initially, the Brand A-Brand B split is 60%-40%. Check by r0.50" direct calculation that Pxa = 0.50 , which can then be taken as Xj (in agreement with the 50-50 split we computed above). Vectors such as these, with nonnegative components that add up to 1, are called probability vectors. * Observe how the transition probabilities are arranged within the transition matrix P. We can think of the columns as being labeled with the present states and the rows as being labeled with the next states: Present A B A f 0.70 0.20" B 0.30 0.80 Next Chapter 3 Matrices The word is derived from the Greek adjective stokhastikos, meaning "capable of aiming" (or guessing). It has come to be applied to anything that is governed by the laws of probability in the sense that probability makes predictions about the likelihood of things happening. In probability theory, "stochastic processes" form a generalization of Markov chains. Note also that the columns of P are probability vectors; any square matrix with this property is called a stochastic matrix. We can realize the deterministic nature of Markov chains in another way. Note that we can write and, in general, = Px1 = P(Px0) = P\, xk = PkxQ fork = 0, 1,2,... This leads us to examine the powers of a transition matrix. In Example 3.64, we have p2 = 0.70 0.30 0.20 0.80 0.70 0.30 0.20 0.80 0.55 0.45 0.30 0.70 A 0.49 B 0.24* Figure 3.20 What are we to make of the entries of this matrix? The first thing to observe is that P2 is another stochastic matrix, since its columns sum to 1. (You are asked to prove this in Exercise 14.) Could it be that P2 is also a transition matrix of some kind? Consider one of its entries—say, (P2)21 — 0.45. The tree diagram in Figure 3.20 clarifies where this entry came from. There are four possible state changes that can occur over 2 months, and these correspond to the four branches (or paths) of length 2 in the tree. Someone who initially is using Brand A can end up using Brand B 2 months later in two different ways (marked * in the figure): The person can continue to use A after 1 month and then switch to B (with probability 0.7(0.3) = 0.21), or the person can switch to B after 1 month and then stay with B (with probability 0.3(0.8) = 0.24). The sum of these probabilities gives an overall probability of 0.45. Observe that these calculations are exactly what we do when we compute (P2)2i- It follows that (P2)21 = 0.45 represents the probability of moving from state 1 (Brand A) to state 2 (Brand B) in two transitions. (Note that the order of the subscripts is the reverse of what you might have guessed.) 
The argument can be generalized to show that (P^k)_{ij} is the probability of moving from state j to state i in k transitions.

In Example 3.64, what will happen to the distribution of toothpaste users in the long run? Let's work with probability vectors as state vectors. Continuing our calculations (rounding to three decimal places), we find

$$x_0 = \begin{bmatrix} 0.60 \\ 0.40 \end{bmatrix},\quad x_1 = Px_0 = \begin{bmatrix} 0.50 \\ 0.50 \end{bmatrix},\quad x_2 = Px_1 = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix},\quad x_3 = \begin{bmatrix} 0.425 \\ 0.575 \end{bmatrix},\quad x_4 = \begin{bmatrix} 0.412 \\ 0.588 \end{bmatrix},$$

$$x_5 = \begin{bmatrix} 0.406 \\ 0.594 \end{bmatrix},\quad x_6 = \begin{bmatrix} 0.403 \\ 0.597 \end{bmatrix},\quad x_7 = \begin{bmatrix} 0.402 \\ 0.598 \end{bmatrix},\quad x_8 = \begin{bmatrix} 0.401 \\ 0.599 \end{bmatrix},\quad x_9 = \begin{bmatrix} 0.400 \\ 0.600 \end{bmatrix},\quad x_{10} = \begin{bmatrix} 0.400 \\ 0.600 \end{bmatrix}$$

and so on. It appears that the state vectors approach (or converge to) the vector [0.4, 0.6]^T, implying that eventually 40% of the toothpaste users in the survey will be using Brand A and 60% will be using Brand B. Indeed, it is easy to check that, once this distribution is reached, it will never change. We simply compute

$$\begin{bmatrix} 0.70 & 0.20 \\ 0.30 & 0.80 \end{bmatrix} \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$$

A state vector x with the property that Px = x is called a steady state vector. In Chapter 4, we will prove that every Markov chain has a unique steady state vector. For now, let's accept this as a fact and see how we can find such a vector without doing any iterations at all. We begin by rewriting the matrix equation Px = x as Px = Ix, which can in turn be rewritten as (I - P)x = 0. Now this is just a homogeneous system of linear equations with coefficient matrix I - P, so the augmented matrix is [I - P | 0]. In Example 3.64, we have

$$[\,I - P \mid 0\,] = \begin{bmatrix} 1 - 0.70 & -0.20 & 0 \\ -0.30 & 1 - 0.80 & 0 \end{bmatrix} = \begin{bmatrix} 0.30 & -0.20 & 0 \\ -0.30 & 0.20 & 0 \end{bmatrix}$$

which reduces to

$$\begin{bmatrix} 1 & -\tfrac{2}{3} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

So, if our steady state vector is x = [x_1, x_2]^T, then x_2 = t is a free variable and the parametric solution is x_1 = (2/3)t, x_2 = t. If we require x to be a probability vector, then we must have

1 = x_1 + x_2 = (2/3)t + t = (5/3)t

Therefore, x_2 = t = 3/5 = 0.6 and x_1 = 2/5 = 0.4, so x = [0.4, 0.6]^T, in agreement with our iterative calculations above. (If we require x to contain the actual distribution, then in this example we must have x_1 + x_2 = 200, from which it follows that x = [80, 120]^T.)

Example 3.65

A psychologist places a rat in a cage with three compartments, as shown in Figure 3.21. The rat has been trained to select a door at random whenever a bell is rung and to move through it into the next compartment.

(a) If the rat is initially in compartment 1, what is the probability that it will be in compartment 2 after the bell has rung twice? three times?
(b) In the long run, what proportion of its time will the rat spend in each compartment?

[Figure 3.21: a cage with three compartments; one door joins compartments 1 and 2, one door joins compartments 1 and 3, and two doors join compartments 2 and 3]

Solution  Let P = [p_ij] be the transition matrix for this Markov chain. Then p_21 = p_31 = 1/2, p_12 = p_13 = 1/3, p_32 = p_23 = 2/3, and p_11 = p_22 = p_33 = 0. (Why? Remember that p_ij is the probability of moving from compartment j to compartment i.) Therefore,

$$P = \begin{bmatrix} 0 & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{2} & 0 & \tfrac{2}{3} \\ \tfrac{1}{2} & \tfrac{2}{3} & 0 \end{bmatrix}$$

and the initial state vector is x_0 = [1, 0, 0]^T.

(a) After one ring of the bell, we have

$$x_1 = Px_0 = \begin{bmatrix} 0 & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{2} & 0 & \tfrac{2}{3} \\ \tfrac{1}{2} & \tfrac{2}{3} & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0.5 \\ 0.5 \end{bmatrix}$$

Continuing (rounding to three decimal places), we find

$$x_2 = Px_1 = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{1}{3} \\ \tfrac{1}{3} \end{bmatrix} \approx \begin{bmatrix} 0.333 \\ 0.333 \\ 0.333 \end{bmatrix} \quad\text{and}\quad x_3 = Px_2 = \begin{bmatrix} \tfrac{2}{9} \\ \tfrac{7}{18} \\ \tfrac{7}{18} \end{bmatrix} \approx \begin{bmatrix} 0.222 \\ 0.389 \\ 0.389 \end{bmatrix}$$

Therefore, after two rings, the probability that the rat is in compartment 2 is 1/3 ≈ 0.333, and after three rings, the probability that the rat is in compartment 2 is 7/18 ≈ 0.389. [Note that these questions could also be answered by computing (P^2)_{21} and (P^3)_{21}.]

(b) This question is asking for the steady state vector x as a probability vector. As we saw above, x must be in the null space of I - P, so we proceed to solve the system

$$[\,I - P \mid 0\,] = \begin{bmatrix} 1 & -\tfrac{1}{3} & -\tfrac{1}{3} & 0 \\ -\tfrac{1}{2} & 1 & -\tfrac{2}{3} & 0 \\ -\tfrac{1}{2} & -\tfrac{2}{3} & 1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & -\tfrac{2}{3} & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Hence, if x = [x_1, x_2, x_3]^T, then x_3 = t is free and x_1 = (2/3)t, x_2 = t. Since x must be a probability vector, we need 1 = x_1 + x_2 + x_3 = (8/3)t. Thus, t = 3/8 and

$$x = \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{3}{8} \\ \tfrac{3}{8} \end{bmatrix}$$

which tells us that, in the long run, the rat spends 1/4 of its time in compartment 1 and 3/8 of its time in each of the other two compartments.
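A steady state vector can also be found numerically. The sketch below (Python with NumPy again; the helper name steady_state is ours, purely for illustration) solves (I - P)x = 0 together with the normalization x_1 + ... + x_n = 1 as a least-squares problem, and reproduces the answers found above for the toothpaste survey and for the rat.

```python
import numpy as np

def steady_state(P):
    """Probability vector x with Px = x, found by solving (I - P)x = 0
    together with the normalizing condition x_1 + ... + x_n = 1."""
    n = P.shape[0]
    A = np.vstack([np.eye(n) - P, np.ones(n)])   # append the row of ones
    b = np.append(np.zeros(n), 1.0)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

P_toothpaste = np.array([[0.70, 0.20],
                         [0.30, 0.80]])
P_rat = np.array([[0,   1/3, 1/3],
                  [1/2, 0,   2/3],
                  [1/2, 2/3, 0  ]])

print(steady_state(P_toothpaste))   # [0.4, 0.6]
print(steady_state(P_rat))          # [0.25, 0.375, 0.375], i.e., [1/4, 3/8, 3/8]
```

Row reducing [I - P | 0] by hand, as in the examples, shows where these numbers come from; appending the row of ones is just a convenient way to impose the normalizing condition in a single step.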
Linear Economic Models

We now revisit the economic models that we first encountered in Section 2.4 and recast these models in terms of matrices. Example 2.33 illustrated the Leontief closed model. The system of equations we needed to solve was

$$\begin{aligned} \tfrac{1}{4}x_1 + \tfrac{1}{3}x_2 + \tfrac{1}{2}x_3 &= x_1 \\ \tfrac{1}{4}x_1 + \tfrac{1}{3}x_2 + \tfrac{1}{4}x_3 &= x_2 \\ \tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 + \tfrac{1}{4}x_3 &= x_3 \end{aligned}$$

In matrix form, this is the equation Ex = x, where

$$E = \begin{bmatrix} 1/4 & 1/3 & 1/2 \\ 1/4 & 1/3 & 1/4 \\ 1/2 & 1/3 & 1/4 \end{bmatrix} \quad\text{and}\quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

The matrix E is called an exchange matrix and the vector x is called a price vector. In general, if E = [e_ij], then e_ij represents the fraction (or percentage) of industry j's output that is consumed by industry i, and x_i is the price charged by industry i for its output. In a closed economy, the sum of each column of E is 1. Since the entries of E are also nonnegative, E is a stochastic matrix, and the problem of finding a solution to the equation

Ex = x     (1)

is precisely the same as the problem of finding the steady state vector of a Markov chain! Thus, to find a price vector x that satisfies Ex = x, we solve the equivalent homogeneous equation (I - E)x = 0. There will always be infinitely many solutions; we seek a solution where the prices are all nonnegative and at least one price is positive.

The Leontief open model is more interesting. In Example 2.34, we needed to solve the system

$$\begin{aligned} x_1 &= 0.2x_1 + 0.5x_2 + 0.1x_3 + 10 \\ x_2 &= 0.4x_1 + 0.2x_2 + 0.2x_3 + 10 \\ x_3 &= 0.1x_1 + 0.3x_2 + 0.3x_3 + 30 \end{aligned}$$

In matrix form, we have x = Cx + d, or

(I - C)x = d     (2)

where

$$C = \begin{bmatrix} 0.2 & 0.5 & 0.1 \\ 0.4 & 0.2 & 0.2 \\ 0.1 & 0.3 & 0.3 \end{bmatrix},\quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\quad d = \begin{bmatrix} 10 \\ 10 \\ 30 \end{bmatrix}$$

The matrix C is called the consumption matrix, x is the production vector, and d is the demand vector. In general, if C = [c_ij], x = [x_i], and d = [d_i], then c_ij represents the dollar value of industry i's output that is needed to produce one dollar's worth of industry j's output, x_i is the dollar value (price) of industry i's output, and d_i is the dollar value of the external demand for industry i's output. Once again, we are interested in finding a production vector x with nonnegative entries such that at least one entry is positive. We call such a vector x a feasible solution.
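Solving Equation (2) amounts to a single linear solve. Here is a minimal sketch (Python with NumPy assumed, as before) for the consumption matrix and demand vector of Example 2.34; it only illustrates the computation and is not part of the text's own solution.

```python
import numpy as np

# Consumption matrix and external demand for the open model above.
C = np.array([[0.2, 0.5, 0.1],
              [0.4, 0.2, 0.2],
              [0.1, 0.3, 0.3]])
d = np.array([10.0, 10.0, 30.0])

I = np.eye(3)
x = np.linalg.solve(I - C, d)   # production vector satisfying (I - C)x = d
print(x)                        # all entries nonnegative, so x is a feasible solution
print((I - C) @ x)              # reproduces d, as a check
```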
Example 3.66

Determine whether there is a solution to the Leontief open model determined by the following consumption matrices:

$$\text{(a) } C = \begin{bmatrix} 1/4 & 1/3 \\ 1/2 & 1/3 \end{bmatrix} \qquad \text{(b) } C = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 2/3 \end{bmatrix}$$

Solution  (a) We have

$$I - C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 1/4 & 1/3 \\ 1/2 & 1/3 \end{bmatrix} = \begin{bmatrix} 3/4 & -1/3 \\ -1/2 & 2/3 \end{bmatrix}$$

so the equation (I - C)x = d becomes

$$\begin{bmatrix} 3/4 & -1/3 \\ -1/2 & 2/3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$$

In practice, we would row reduce the corresponding augmented matrix to determine a solution. However, in this case, it is instructive to notice that the coefficient matrix I - C is invertible and then to apply Theorem 3.7. We compute

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3/4 & -1/3 \\ -1/2 & 2/3 \end{bmatrix}^{-1} \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3/2 & 9/4 \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$$

Since d_1, d_2, and all entries of (I - C)^{-1} are nonnegative, so are x_1 and x_2. Thus, we can find a feasible solution for any nonzero demand vector.

(b) In this case,

$$I - C = \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1/3 \end{bmatrix} \quad\text{so that}\quad (I - C)^{-1} = \begin{bmatrix} -4 & -6 \\ -6 & -6 \end{bmatrix} \quad\text{and}\quad x = (I - C)^{-1}d = \begin{bmatrix} -4 & -6 \\ -6 & -6 \end{bmatrix} d$$

Since all entries of (I - C)^{-1} are negative, this will not produce a feasible solution for any nonzero demand vector d.

Motivated by Example 3.66, we have the following definition. (For two m × n matrices A = [a_ij] and B = [b_ij], we will write A ≥ B if a_ij ≥ b_ij for all i and j. Similarly, we may define A > B, A ≤ B, and so on. A matrix A is called nonnegative if A ≥ O and positive if A > O.)

Definition  A consumption matrix C is called productive if I - C is invertible and (I - C)^{-1} ≥ O.

We now give three results that give criteria for a consumption matrix to be productive.

Theorem 3.34  Let C be a consumption matrix. Then C is productive if and only if there exists a production vector x ≥ 0 such that x > Cx.

Proof  Assume that C is productive. Then I - C is invertible and (I - C)^{-1} ≥ O. Let j denote the column vector all of whose entries are 1. Then x = (I - C)^{-1} j ≥ 0 and (I - C)x = j > 0. Thus, x - Cx > 0 or, equivalently, x > Cx.

Conversely, assume that there exists a vector x ≥ 0 such that x > Cx. Since C ≥ O and C ≠ O, we have x > 0 by Exercise 35. Furthermore, there must exist a real number λ with 0 < λ < 1 such that Cx < λx. But then

C^2 x = C(Cx) ≤ C(λx) = λ(Cx) < λ(λx) = λ^2 x

By induction, it can be shown that 0 ≤ C^n x ≤ λ^n x for all n > 0. (Write out the details of this induction proof.) Since 0 < λ < 1, λ^n approaches 0 as n gets large. Therefore, as n → ∞, λ^n x → 0 and hence C^n x → 0. Since x > 0, we must have C^n → O as n → ∞. Now consider the matrix equation

(I - C)(I + C + C^2 + ··· + C^{n-1}) = I - C^n

As n → ∞, C^n → O, so we have

(I - C)(I + C + C^2 + ···) = I - O = I

Therefore, I - C is invertible, with its inverse given by the infinite matrix series I + C + C^2 + ···. Since all the terms in this series are nonnegative, we also have

(I - C)^{-1} = I + C + C^2 + ··· ≥ O

Hence, C is productive.

Remarks

• The infinite series I + C + C^2 + ··· is the matrix analogue of the geometric series 1 + x + x^2 + ···. You may be familiar with the fact that, for |x| < 1, 1 + x + x^2 + ··· = 1/(1 - x).

• Since the vector Cx represents the amounts consumed by each industry, the inequality x > Cx means that there is some level of production for which each industry is producing more than it consumes.

• For an alternative approach to the first part of the proof of Theorem 3.34, see Exercise 42 in Section 4.6.

Corollary 3.35  Let C be a consumption matrix. If the sum of each row of C is less than 1, then C is productive.

The word corollary comes from the Latin word corollarium, which refers to a garland given as a reward. Thus, a corollary is a little extra reward that follows from a theorem.

Proof  If x is the column vector all of whose entries are 1, then Cx is a vector consisting of the row sums of C. If each row sum of C is less than 1, then the condition x > Cx is satisfied. Hence, C is productive.

Corollary 3.36  Let C be a consumption matrix. If the sum of each column of C is less than 1, then C is productive.

Proof  If each column sum of C is less than 1, then each row sum of C^T is less than 1. Hence, C^T is productive, by Corollary 3.35. Therefore, by Theorems 3.9(d) and 3.4,

((I - C)^{-1})^T = ((I - C)^T)^{-1} = (I^T - C^T)^{-1} = (I - C^T)^{-1} ≥ O

It follows that (I - C)^{-1} ≥ O too and, thus, C is productive.

You are asked to give alternative proofs of Corollaries 3.35 and 3.36 in Exercise 52 of Section 7.2.

It follows from the definition of a consumption matrix that the sum of column j of C is the total dollar value of all the inputs needed to produce one dollar's worth of industry j's output. If this column sum is less than 1, then industry j's income exceeds its expenditures; we say that such an industry is profitable. Corollary 3.36 can therefore be rephrased to state that a consumption matrix is productive if all industries are profitable.
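The definition of productivity is easy to test numerically. The following sketch (Python with NumPy; the helper name is_productive is ours, introduced only for illustration) checks the two consumption matrices of Example 3.66 and also verifies, for the productive one, that the partial sums of the series I + C + C^2 + ··· from the proof of Theorem 3.34 do converge to (I - C)^{-1}.

```python
import numpy as np

def is_productive(C, tol=1e-12):
    """C is productive if I - C is invertible and (I - C)^(-1) >= O entrywise."""
    I = np.eye(C.shape[0])
    try:
        M = np.linalg.inv(I - C)
    except np.linalg.LinAlgError:
        return False                # I - C is not invertible
    return bool((M >= -tol).all())

# The two consumption matrices of Example 3.66.
Ca = np.array([[1/4, 1/3],
               [1/2, 1/3]])
Cb = np.array([[1/2, 1/2],
               [1/2, 2/3]])
print(is_productive(Ca))   # True:  (I - Ca)^(-1) = [[2, 1], [3/2, 9/4]]
print(is_productive(Cb))   # False: (I - Cb)^(-1) = [[-4, -6], [-6, -6]]

# For a productive C, the series I + C + C^2 + ... converges to (I - C)^(-1).
S = sum(np.linalg.matrix_power(Ca, k) for k in range(200))
print(np.allclose(S, np.linalg.inv(np.eye(2) - Ca)))   # True
```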
Population Growth

One of the most popular models of population growth is a matrix-based model, first introduced by P. H. Leslie in 1945. [See P. H. Leslie, "On the Use of Matrices in Certain Population Mathematics," Biometrika, 33 (1945), pp. 183-212.] The Leslie model describes the growth of the female portion of a population, which is assumed to have a maximum lifespan. The females are divided into age classes, all of which span an equal number of years. Using data about the average birthrates and survival probabilities of each class, the model is then able to determine the growth of the population over time.

Example 3.67

A certain species of German beetle, the Vollmar-Wasserman beetle (or VW beetle, for short), lives for at most 3 years. We divide the female VW beetles into three age classes of 1 year each: youths (0-1 year), juveniles (1-2 years), and adults (2-3 years). The youths do not lay eggs; each juvenile produces an average of four female beetles; and each adult produces an average of three females. The survival rate for youths is 50% (that is, the probability of a youth's surviving to become a juvenile is 0.5), and the survival rate for juveniles is 25%. Suppose we begin with a population of 100 female VW beetles: 40 youths, 40 juveniles, and 20 adults. Predict the beetle population for each of the next 5 years.

Solution  After 1 year, the number of youths will be the number produced during that year:

40 × 4 + 20 × 3 = 220

The number of juveniles will simply be the number of youths that have survived:

40 × 0.5 = 20

Likewise, the number of adults will be the number of juveniles that have survived:

40 × 0.25 = 10

We can combine these into a single matrix equation

$$\begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix} \begin{bmatrix} 40 \\ 40 \\ 20 \end{bmatrix} = \begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix}$$

or Lx_0 = x_1, where x_0 = [40, 40, 20]^T is the initial population distribution vector and x_1 = [220, 20, 10]^T is the distribution after 1 year. We see that the structure of the equation is exactly the same as for Markov chains: x_{k+1} = Lx_k for k = 0, 1, 2, ... (although the interpretation is quite different). It follows that we can iteratively compute successive population distribution vectors. (It also follows that x_k = L^k x_0 for k = 0, 1, 2, ..., as for Markov chains, but we will not use this fact here.) We compute

$$x_2 = Lx_1 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix} \begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 110 \\ 110 \\ 5 \end{bmatrix} \qquad x_3 = Lx_2 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix} \begin{bmatrix} 110 \\ 110 \\ 5 \end{bmatrix} = \begin{bmatrix} 455 \\ 55 \\ 27.5 \end{bmatrix}$$

$$x_4 = Lx_3 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix} \begin{bmatrix} 455 \\ 55 \\ 27.5 \end{bmatrix} = \begin{bmatrix} 302.5 \\ 227.5 \\ 13.75 \end{bmatrix} \qquad x_5 = Lx_4 = \begin{bmatrix} 0 & 4 & 3 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix} \begin{bmatrix} 302.5 \\ 227.5 \\ 13.75 \end{bmatrix} \approx \begin{bmatrix} 951.2 \\ 151.2 \\ 56.88 \end{bmatrix}$$

Therefore, the model predicts that after 5 years there will be approximately 951 young female VW beetles, 151 juveniles, and 57 adults. (Note: You could argue that we should have rounded to the nearest integer at each step—for example, 28 adults after step 3—which would have affected the subsequent iterations. We elected not to do this, since the calculations are only approximations anyway and it is much easier to use a calculator or CAS if you do not round as you go.)

The matrix L in Example 3.67 is called a Leslie matrix.
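The iteration x_{k+1} = Lx_k is the same kind of computation as for a Markov chain, so the same sort of short program works. Here is a minimal sketch (Python with NumPy, as in the earlier sketches) that reproduces the five years of predictions for the VW beetle.

```python
import numpy as np

# Leslie matrix for the VW beetle and the initial distribution
# (youths, juveniles, adults).
L = np.array([[0.0, 4.0,  3.0],
              [0.5, 0.0,  0.0],
              [0.0, 0.25, 0.0]])
x = np.array([40.0, 40.0, 20.0])

for year in range(1, 6):
    x = L @ x
    print(year, x)
# year 1: [220., 20., 10.]  ...  year 5: [951.25, 151.25, 56.875]
```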
In general, if we have a population with n age classes of equal duration, L will be an n × n matrix with the following structure:

$$L = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_{n-1} & b_n \\ s_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & s_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & s_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & s_{n-1} & 0 \end{bmatrix}$$

Here, b_1, b_2, ... are the birth parameters (b_i is the average number of females produced by each female in class i) and s_1, s_2, ... are the survival probabilities (s_i is the probability that a female in class i survives into class i + 1).

What are we to make of our calculations? Overall, the beetle population appears to be increasing, although there are some fluctuations, such as a decrease from 250 to 225 from year 1 to year 2. Figure 3.22 shows the change in the population in each of the three age classes and clearly shows the growth, with fluctuations.

[Figure 3.22: the population of each of the three age classes plotted against time (in years)]

If, instead of plotting the actual population, we plot the relative population in each class, a different pattern emerges. To do this, we need to compute the fraction of the population in each age class in each year; that is, we need to divide each distribution vector by the sum of its components. For example, after 1 year, we have

$$\frac{1}{250}x_1 = \frac{1}{250}\begin{bmatrix} 220 \\ 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 0.88 \\ 0.08 \\ 0.04 \end{bmatrix}$$

which tells us that 88% of the population consists of youths, 8% is juveniles, and 4% is adults. If we plot this type of data over time, we get a graph like the one in Figure 3.23, which shows clearly that the proportion of the population in each class is approaching a steady state.

[Figure 3.23: the relative proportion of each age class plotted against time (in years)]

It turns out that the steady state vector in this example is

$$\begin{bmatrix} 0.72 \\ 0.24 \\ 0.04 \end{bmatrix}$$

That is, in the long run, 72% of the population will be youths, 24% juveniles, and 4% adults. (In other words, the population is distributed among the three age classes in the ratio 18:6:1.) We will see how to determine this ratio exactly in Chapter 4.

Graphs and Digraphs

There are many situations in which it is important to be able to model the interrelationships among a finite set of objects. For example, we might wish to describe various types of networks (roads connecting towns, airline routes connecting cities, communication links connecting satellites, etc.) or relationships among groups or individuals (friendship relationships in a society, predator-prey relationships in an ecosystem, dominance relationships in a sport, etc.). Graphs are ideally suited to modeling such networks and relationships, and it turns out that matrices are a useful tool in their study.

A graph consists of a finite set of points (called vertices) and a finite set of edges, each of which connects two (not necessarily distinct) vertices. We say that two vertices are adjacent if they are the endpoints of an edge. Figure 3.24 shows an example of the same graph drawn in two different ways. The graphs are the "same" in the sense that all we care about are the adjacency relationships that identify the edges.

[Figure 3.24: two representations of the same graph]

We can record the essential information about a graph in a matrix and use matrix algebra to help us answer certain questions about the graph. This is particularly useful if the graphs are large, since computers can handle the calculations very quickly.

Definition  If G is a graph with n vertices, then its adjacency matrix is the n × n matrix A [or A(G)] defined by

$$a_{ij} = \begin{cases} 1 & \text{if there is an edge between vertices } i \text{ and } j \\ 0 & \text{otherwise} \end{cases}$$

The term vertex (vertices is the plural) comes from the Latin verb vertere, which means "to turn." In the context of graphs (and geometry), a vertex is a corner—a point where an edge "turns" into a different edge.
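To make the definition concrete, here is a minimal sketch (Python with NumPy again; the edge list is the only input data) that assembles an adjacency matrix from a list of edges. The edges used are those of the graph in Figure 3.25, discussed next; the pair (2, 2) records the loop at vertex 2.

```python
import numpy as np

# Edges of a small graph on vertices 1, 2, 3, 4 (the graph of Figure 3.25).
edges = [(1, 2), (1, 3), (1, 4), (2, 2), (2, 3)]

n = 4
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i - 1, j - 1] = 1      # shift to 0-based indices
    A[j - 1, i - 1] = 1      # each edge {i, j} is recorded in both positions

print(A)
print(np.array_equal(A, A.T))   # True: the adjacency matrix of a graph is symmetric
```

Recording each edge {i, j} in both positions (i, j) and (j, i) is exactly why the adjacency matrix of a graph is symmetric.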
Figure 3.25 shows a graph and its associated adjacency matrix

$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$

[Figure 3.25: a graph on vertices v_1, v_2, v_3, v_4 with adjacency matrix A; there is a loop at v_2]

Remark  Observe that the adjacency matrix of a graph is necessarily a symmetric matrix. (Why?) Notice also that a diagonal entry a_ii of A is zero unless there is a loop at vertex i. In some situations, a graph may have more than one edge between a pair of vertices. In such cases, it may make sense to modify the definition of the adjacency matrix so that a_ij equals the number of edges between vertices i and j.

We define a path in a graph to be a sequence of edges that allows us to travel from one vertex to another continuously. The length of a path is the number of edges it contains, and we will refer to a path with k edges as a k-path. For example, in the graph of Figure 3.25, v_1 v_3 v_2 v_1 is a 3-path, and v_4 v_1 v_2 v_2 v_1 v_3 is a 5-path. Notice that the first of these is closed (it begins and ends at the same vertex); such a path is called a circuit. The second uses the edge between v_1 and v_2 twice; a path that does not include the same edge more than once is called a simple path.

We can use the powers of a graph's adjacency matrix to give us information about the paths of various lengths in the graph. Consider the square of the adjacency matrix in Figure 3.25:

$$A^2 = \begin{bmatrix} 3 & 2 & 1 & 0 \\ 2 & 3 & 2 & 1 \\ 1 & 2 & 2 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}$$

What do the entries of A^2 represent? Look at the (2, 3) entry. From the definition of matrix multiplication, we know that

(A^2)_{23} = a_{21}a_{13} + a_{22}a_{23} + a_{23}a_{33} + a_{24}a_{43}

The only way this expression can result in a nonzero number is if at least one of the products a_{2k}a_{k3} that make up the sum is nonzero. But a_{2k}a_{k3} is nonzero if and only if both a_{2k} and a_{k3} are nonzero, which means that there is an edge between v_2 and v_k as well as an edge between v_k and v_3. Thus, there will be a 2-path between vertices 2 and 3 (via vertex k). In our example, this happens for k = 1 and for k = 2, so

(A^2)_{23} = a_{21}a_{13} + a_{22}a_{23} + a_{23}a_{33} + a_{24}a_{43} = 1·1 + 1·1 + 1·0 + 0·0 = 2

which tells us that there are two 2-paths between vertices 2 and 3. (Check to see that the remaining entries of A^2 correctly give 2-paths in the graph.) The argument we have just given can be generalized to yield the following result, whose proof we leave as Exercise 72.

If A is the adjacency matrix of a graph G, then the (i, j) entry of A^k is equal to the number of k-paths between vertices i and j.

Example 3.68

How many 3-paths are there between v_1 and v_2 in Figure 3.25?

Solution  We need the (1, 2) entry of A^3, which is the dot product of row 1 of A^2 and column 2 of A. The calculation gives

(A^3)_{12} = 3·1 + 2·1 + 1·1 + 0·0 = 6

so there are six 3-paths between vertices 1 and 2, which can easily be checked.

In many applications that can be modeled by a graph, the vertices are ordered by some type of relation that imposes a direction on the edges. For example, directed edges might be used to represent one-way routes in a graph that models a transportation network or predator-prey relationships in a graph modeling an ecosystem. A graph with directed edges is called a digraph. Figure 3.26 shows an example.

[Figure 3.26: a digraph on four vertices]

An easy modification to the definition of adjacency matrices allows us to use them with digraphs.
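Before turning to digraphs, note that counting paths by hand gets tedious quickly, while the boxed result reduces the count to a matrix power. Here is a minimal sketch (Python with NumPy, 0-based indices) for the graph of Figure 3.25.

```python
import numpy as np

# Adjacency matrix of the graph in Figure 3.25 (note the loop at vertex 2).
A = np.array([[0, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

A2 = A @ A
A3 = A2 @ A
print(A2[1, 2])   # 2: two 2-paths between v2 and v3
print(A3[0, 1])   # 6: six 3-paths between v1 and v2, as in Example 3.68

# For a digraph, the same computation counts directed k-paths.
```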
244 Chapter 3 Matrices Definition If G is a digraph with n vertices, then its adjacency matrix is the tiX n matrix A [or A(G)] defined by 1 if there is an edge from vertex i to vertex; 0 otherwise Thus, the adjacency matrix for the digraph in Figure 3.26 is "0 10 1" 0 0 0 1 A = 10 0 0 .10 10. Not surprisingly, the adjacency matrix of a digraph is not symmetric in general. (When would it be?) You should have no difficulty seeing that Ak now contains the numbers of directed fc-paths between vertices, where we insist that all edges along a path flow in the same direction. (See Exercise 72.) The next example gives an application of this idea. 4- Example 3.69 Figure 3.27 A tournament Five tennis players (Djokovic, Federer, Nadal, Roddick, and Safin) compete in a round-robin tournament in which each player plays every other player once. The digraph in Figure 3.27 summarizes the results. A directed edge from vertex i to vertex ;' means that player i defeated player ;". (A digraph in which there is exactly one directed edge between every pair of vertices is called a tournament.) The adjacency matrix for the digraph in Figure 3.27 is 0 10 11 0 0 111 10 0 10 0 0 0 0 1 0 0 10 0 where the order of the vertices (and hence the rows and columns of A) is determined alphabetically. Thus, Federer corresponds to row 2 and column 2, for example. Suppose we wish to rank the five players, based on the results of their matches. One way to do this might be to count the number of wins for each player. Observe that the number of wins each player had is just the sum of the entries in die corresponding row; equivalently, the vector containing all the row sums is given by the product A), where Section 3.7 Applications 245 In our case, we have Exercises 3.7 "o 1 0 1 1 1 3 0 0 1 1 1 1 3 1 0 0 1 0 1 = 2 0 0 0 0 1 1 1 0 0 1 0 0 1 1 Aj = which produces the following ranking: First: Djokovic, Federer (tie) Second: Nadal Third: Roddick, Safin (tie) Are the players who tied in this ranking equally strong? Djokovic might argue that since he defeated Federer, he deserves first place. Roddick would use the same type of argument to break the tie with Safin. However, Safin could argue that he has two "indirect" victories because he beat Nadal, who defeated two others; furthermore, he might note that Roddick has only one indirect victory (over Safin, who then defeated Nadal). Since in a group of ties there may not be a player who defeated all the others in the group, the notion of indirect wins seems more useful. Moreover, an indirect victory corresponds to a 2-path in the digraph, so we can use the square of the adjacency matrix. To compute both wins and indirect wins for each player, we need the row sums of the matrix A + A2, which are given by (A + A2)j / "o 1 0 1 l" "o 0 2 1 1 \ "l" 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 0 + 0 1 0 1 2 1 0 0 0 0 1 0 0 1 0 0 1 \ _0 0 1 0 0_ 1 0 0 1 0_ 1 _1_ "8" 1 7 1 = 6 1 2 .1. _3 Thus, we would rank the players as follows: Djokovic, Federer, Nadal, Safin, Roddick. Unfortunately, this approach is not guaranteed to break all ties. Markov Chains In Exercises 1-4, let P 0.5 0.3 0.5 0.7 trix for a Markov chain with two states. Let x0 be the transition ma-0.5' 0.5 the initial state vector for the population. 1. Compute Xj and x2. 2. What proportion of the state 1 population will be in state 2 after two steps? 3. What proportion of the state 2 population will be in state 2 after two steps? 4. Find the steady state vector. 
246 Chapter 3 Matrices In Exercises 5-8, let P = be the transition ma- 120 180 90 be trixfor a Markov chain with three states. Let x0 the initial state vector for the population. 5. Compute xx and x2. 6. What proportion of the state 1 population will be in state 1 after two steps? 7. What proportion of the state 2 population will be in state 3 after two steps? 8. Find the steady state vector. 9. Suppose that the weather in a particular region behaves according to a Markov chain. Specifically, suppose that the probability that tomorrow will be a wet day is 0.662 if today is wet and 0.250 if today is dry. The probability that tomorrow will be a dry day is 0.750 if today is dry and 0.338 if today is wet. [This exercise is based on an actual study of rainfall in Tel Aviv over a 27-year period. See K. R. Gabriel and J. Neumann, "A Markov Chain Model for Daily Rainfall Occurrence at Tel Aviv," Quarterly Journal of the Royal Meteorological Society, 88 (1962), pp. 90-95.] (a) Write down the transition matrix for this Markov chain. (b) If Monday is a dry day, what is the probability that Wednesday will be wet? (c) In the long run, what will the distribution of wet and dry days be? 10. Data have been accumulated on the heights of children relative to their parents. Suppose that the probabilities that a tall parent will have a tall, medium-height, or short child are 0.6, 0.2, and 0.2, respectively; the probabilities that a medium-height parent will have a tall, medium-height, or short child are 0.1, 0.7, and 0.2, respectively; and the probabilities that a short parent will have a tall, medium-height, or short child are 0.2, 0.4, and 0.4, respectively. (a) Write down the transition matrix for this Markov chain. (b) What is the probability that a short person will have a tall grandchild? (c) If 20% of the current population is tall, 50% is of medium height, and 30% is short, what will the distribution be in three generations? (d) What proportion of the population will be tall, of medium height, and short in the long run? 11. A study of pinon (pine) nut crops in the American southwest from 1940 to 1947 hypothesized that nut production followed a Markov chain. [See D. H. Thomas, "A Computer Simulation Model of Great Basin Shoshonean Subsistence and Settlement Patterns," in D. L. Clarke, ed., Models in Archaeology (London: Methuen, 1972).] The data suggested that if one years crop was good, then the probabilities that the following year's crop would be good, fair, or poor were 0.08, 0.07, and 0.85, respectively; if one year's crop was fair, then the probabilities that the following years crop would be good, fair, or poor were 0.09, 0.11, and 0.80, respectively; if one year's crop was poor, then the probabilities that the following year's crop would be good, fair, or poor were 0.11,0.05, and 0.84, respectively. (a) Write down the transition matrix for this Markov chain. (b) If the pinon nut crop was good in 1940, find the probabilities of a good crop in the years 1941 through 1945. (c) In the long run, what proportion of the crops will be good, fair, and poor? 12. Robots have been programmed to traverse the maze shown in Figure 3.28 and at each junction randomly choose which way to go. Figure 3.28 (a) Construct the transition matrix for the Markov chain that models this situation. (b) Suppose we start with 15 robots at each junction. Find the steady state distribution of robots. (Assume that it takes each robot the same amount of time to travel between two adjacent junctions.) 13. 
Let j denote a row vector consisting entirely of Is. Prove that a nonnegative matrix P is a stochastic matrix if and only if jP = j. Section 3.7 Applications 247 14. (a) Show that the product of two 2X2 stochastic matrices is also a stochastic matrix. (b) Prove that the product of two n X n stochastic matrices is also a stochastic matrix. (c) If a 2 X 2 stochastic matrix P is invertible, prove that P~l is also a stochastic matrix. 29. 0.35 0.25 0 0.15 0.55 0.35 0.45 0.30 0.60 30. 0.2 0.4 0.3 0.2 0 0.4 0.5 0 0.1 0.2 0.5 0.2 0.4' 0.1 0.3 0.2 Suppose we want to know the average (or expected) number of steps it will take to go from state i to state) in a Markov chain. It can be shown that the following computation answers this question: Delete thejth row and thejth column of the transition matrix P to get a new matrix Q. (Keep the rows and columns ofQ labeled as they were in P.) The expected number of steps from state i to state) is given by the sum of the entries in the column of (I — Q)""1 labeled i. 15. In Exercise 9, if Monday is a dry day, what is the expected number of days until a wet day? 16. In Exercise 10, what is the expected number of generations until a short person has a tall descendant? 17. In Exercise 11, if the pinon nut crop is fair one year, what is the expected number of years until a good crop occurs? CAS 34. C = 18. In Exercise 12, starting from each of the other junctions, what is the expected number of moves until a robot reaches junction 4? In Exercises 31-34, a consumption matrix C and a demand vector d are given. In each case, find a feasible production vector x that satisfies Equation (2). 31. C 32. C = 33. C = 1/2 1/4' 1/2 1/2 T ,d = _3_ 0.1 0.3 0.4-0.2 0.5 0.2 0 0.4 0 0 0 0 0.1 2 0.1 0 0.3 0.4 0.2 0.2 3 2 4. 1.1 3.5 2.0 35. Let A be an n X n matrix, A 2 O, Suppose that Ax < x for some x in R", x > 0. Prove that x > 0. Linear Economic Models In Exercises 19-26, determine which of the matrices are exchange matrices. For those that are exchange matrices, find a nonnegative price vector that satisfies Equation (1). 19. 21. 23. 25. 1/2 1/2 0.4 0.6 '1/3 1/3 .1/3 '0.3 0.3 0.4 1/4 3/4 0.7 0.4 0 3/2 -1/2 0 0.2 0.5 0.3 0.5 0.5 20. 22. 24. 26. 1/3 1/2 0.1 0.9 '1/2 0 .1/2 0.50 0.25 0.25 2/3 1/2J 0.6 0.4 0 1/3 2/3 0.70 0.35 0.30 0.25 0 0.40 36. Let A, B, C, and Dben X n matrices and x and y vectors in W. Prove the following inequalities: (a) If A > B > O and C > D > O, then AC > BD > O. (b) If A > B and x > 0, x # 0, then Ax > Bx. Population Growth 37. A population with two age classes has a Leslie matrix L = 2 5 0.6 0 10 . If the initial population vector is compute x,, x2, and x3. In Exercises 27-30, determine whether the given consump Hon matrix is productive. ^0.20 0.10 0.10 38. A population with three age classes has a Leslie matrix 0 1 2 , If the initial population L 27. 0.2 0.5 0.3 0.6 28. 0.30 0.15 0.15 0.30 0.45 0.50 0.2 0 0 0 0.5 0 10 vector is xn = 4 3 , compute Xj, x2, and x3. Chapter 3 Matrices 39. A population with three age classes; has a Leslie matrix 1 1 3~ L = 0.7 0 0 . If the initial population vector is 0.7 0 0 0 0.5 0 100 100 100 , compute X!, x2, and x3. 40. A population with four age classes has a Leslie matrix '0 12 5" 0.5 0 0 0 0 0.7 0 0 0 0 0.3 0 L = If the initial population vector is xn 10' 10 10 10 , compute xu x2, and x3 41. A certain species with two age classes of 1 year's duration has a survival probability of 80% from class 1 to class 2. Empirical evidence shows that, on average, each female gives birth to five females per year. 
Thus, two possible Leslie matrices are and Li = 4 0.8 , compute Xj,..., x10 in (a) Starting with x0 each case. (b) For each case, plot the relative size of each age class over time (as in Figure 3.23). What do your graphs suggest? 42. Suppose the Leslie matrix for the VW beetle is L = '0 0 20" 0.10 0 . Starting with an arbitrary x0, deter-.0 0.5 0_ mine the behavior of this population. 43. Suppose the Leslie matrix for the VW beetle is ~0 0 20" L = s 0 0 _0 0.5 0. the survival probability s of the young beetles. cas Woodland caribou are found primarily in the western provinces of Canada and the American northwest. The average lifespan of a female is about 14 years. The birth and survival rates for each age bracket are given in Table 3.4, which shows that caribou cows do not give birth at all during their first 2 years and give Investigate the effect of varying birth to about one calf per year during their middle years. The mortality rate for young calves is very high. lilii 3.4 Survival Rate Age (years) Birth Rate 0-2 0.0 0.3 2-4 0.4 0.7 4-6 1.8 0.9 6-8 1.8 0.9 8-10 1.8 0,9 10-12 1.6 0.6 12-14 0.6 0.0 The numbers of woodland caribou reported in Jasper National Park in Alberta in 1990 are shown in Table 3.5. Using a CAS, predict the caribou population for 1992 and 1994. Then project the population for the years 2010 and 2020. What do you conclude? (What assumptions does this model make, and how could it be improved?) fitili i.l Woodland Caribou Population in fasper National Park. 1990 Age (years) Number 0-2 10 2-4 2 4-6 8 6-8 5 8-10 12 10-12 0 12-14 1 Source: World Wildlife Fund Canada Section 3.7 Applications In Exercises 49-52, draw a graph that has the given adjacency matrix. "0 1 1 11 [0 1 0 1' 1 0 0 0 1 1 1 1 50. 1 0 0 0 0 1 0 1 1 0 0 0 1 1 1 0 49. 51. 0 0 110 0 0 0 1 1 1 0 0 0 1 110 0 0 0 110 0 52. 0 0 0 1 0 0 0 1 0 0 0 1 1 1 1 1 1 0 0 1 0 0 56. v, In Exercises 57-60, draw a digraph that has the given adjacency matrix. 57. 0 10 0 10 0 1 0 10 0 10 11 58. 0 10 0 0 0 0 1 10 0 0 0 0 10 250 Chapter 3 Matrices 59. 0 0 10 1 10 0 10 0 0 0 0 1 10 10 0 0 10 10 60. 0 10 0 1 0 0 0 1 0 10 0 11 10 10 0 110 0 0 In Exercises 61-68, use powers of adjacency matrices to determine the number of paths of the specified length between the given vertices. 61. Exercise 50, length 2, vx and v2 62. Exercise 52, length 2, Vj and v2 63. Exercise 50, length 3, vx and v3 64. Exercise 52, length 4, v2 and v2 65. Exercise 57, length 2, vt to v3 66. Exercise 57, length 3, v4 to vx 67. Exercise 60, length 3, v4 to v, 68. Exercise 60, length 4, vx to v4 69. Let A be the adjacency matrix of a graph G. (a) If row i of A is all zeros, what does this imply about G? (b) If column j of A is all zeros, what does this imply about G? 70. Let A be the adjacency matrix of a digraph D. (a) If row i of A2 is all zeros, what does this imply about D? (b) If column j of A" is all zeros, what does this imply about D'< 71. Figure 3.29 is the digraph of a tournament with six players, P, to P6. Using adjacency matrices, rank the players first by determining wins only and then by using the notion of combined wins and indirect wins, as in Example 3.69. fe—-►-yl \ 1 -«--3 f>5 Figure 3.29 72. Figure 3.30 is a digraph representing a food web in a small ecosystem. A directed edge from ato b indicates that a has b as a source of food. Construct the adjacency matrix A for this digraph and use it to answer the following questions. Rodent Bear Fox Plant Insect Fish Bird Figure 3.30 (a) Which species has the most direct sources of food? 
How does A show this? (b) Which species is a direct source of food for the most other species? How does A show this? (c) If a eats b and b eats c, we say that a has c as an indirect source of food. How can we use A to determine which species has the most indirect food sources? Which species has the most direct and indirect food sources combined? (d) Suppose that pollutants kill the plants in this food web, and we want to determine the effect this change will have on the ecosystem. Construct a new adjacency matrix A* from A by deleting the row and column corresponding to plants. Repeat parts (a) to (c) and determine which species are the most and least affected by the change. (e) What will the long-term effect of the pollution be? What matrix calculations will show this? 73. Five people are all connected by e-mail. Whenever one of them hears a juicy piece of gossip, he or she passes it along by e-mailing it to someone else in the group according to Table 3.6. (a) Draw the digraph that models this "gossip network" and find its adjacency matrix A. Sender Recipients Ann Carla, Ehaz Bert Carla, Dana Carla Ehaz Dana Ann, Carla Ehaz Bert Chapter Review /- (b) Define a step as the time it takes a person to e-mail everyone on his or her list. (Thus, in one step, gossip gets from Ann to both Carla and Ehaz.) If Bert hears a rumor, how many steps will it take for everyone else to hear the rumor? What matrix calculation reveals this? (c) If Ann hears a rumor, how many steps will it take for everyone else to hear the rumor? What matrix calculation reveals this? (d) In general, if A is the adjacency matrix of a digraph, how can we tell if vertex i is connected to vertex j by a path (of some length)? [The gossip network in this exercise is reminiscent of the notion of "six degrees of separation" (found in the play and film by that name), which suggests that any two people are connected by a path of acquaintances whose length is at most 6. The game "Six Degrees of Kevin Bacon" more frivolously asserts that all actors are connected to the actor Kevin Bacon in such a way] 74. Let A be the adjacency matrix of a graph G. (a) By induction, prove that for all n & 1, the (f, j) entry of A" is equal to the number of n-paths between vertices i and j. (b) How do the statement and proof in part (a) have to be modified if G is a digraph? 75. If A is the adjacency matrix of a digraph G, what does the (i,j) entry of AAT represent if i # ;? A graph is called bipartite if its vertices can be subdivided into two sets U and V such that every edge has one endpoint in U and the other endpoint in V. For example, the graph in Exercise 48 is bipartite with U = {vj, v2, v3} and V = {v4, v5}. In Exercises 76-79, determine whether a graph with the given adjacency matrix is bipartite. 76. The adjacency matrix in Exercise 49 77. The adjacency matrix in Exercise 52 78. The adjacency matrix in Exercise 51 79. "o 0 1 0 1 r 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 1 1 1 1 0 1 0 0 .1 1 0 1 0 0. 80. (a) Prove that a graph is bipartite if and only if its vertices can be labeled so that its adjacency matrix can be partitioned as '0 B BT 0 (b) Using the result in part (a), prove that a bipartite graph has no circuits of odd length. 
Chapter Review Key Definitions and Concepts basis, 198 Basis Theorem, 202 column matrix (vector), 138 column space of a matrix, 195 composition of linear transformations, 219 coordinate vector with respect to a basis, 208 diagonal matrix, 139 dimension, 203 elementary matrix, 170 Fundamental Theorem of Invertible Matrices, 172,206 identity matrix, 139 inverse of a square matrix, 163 inverse of a linear transformation, 221 linear combination of matrices, 154 linear dependence/independence of matrices, 157 linear transformation, 213 LU factorization, 181 matrix, 138 matrix addition, 140 matrix factorization, 180 matrix multiplication, 141 matrix powers, 149 negative of a matrix, 140 null space of a matrix, 197 nullity of a matrix, 204 outer product, 147 partitioned matrices (block multiplication), 145, 148 permutation matrix, 187 Chapter 3 Matrices properties of matrix algebra, 154, 158, 159, 167 rank of a matrix, 204 Rank Theorem, 205 representations of matrix products, 146-148 row matrix (vector), 138 row space of a matrix, 195 scalar matrix, 139 scalar multiple of a matrix, 140 span of a set of matrices, 156 square matrix, 139 standard matrix of a linear transformation, 216 subspace, 192 symmetric matrix, 151 transpose of a matrix, 151 zero matrix, 141 Review Questions 1. Mark each of the following statements true or false: (a) For any matrix A, both AAT and ATA are defined. (b) If A and B are matrices such that AB = O and A ¥= O, then 5 = 0. (c) If A, B, and X are invertible matrices such that XA = B, then X = A~>B. (d) The inverse of an elementary matrix is an elementary matrix. (e) The transpose of an elementary matrix is an elementary matrix. (f) The product of two elementary matrices is an elementary matrix. (g) If A is an m X n matrix, then the null space of A is a subspace of R". (h) Every plane in R3 is a two-dimensional subspace ofR3. (i) The transformation T: R2 -» R2 defined by T(x) = — x is a linear transformation. (j) If T: R4 —► R5 is a linear transformation, then there is a 4 X 5 matrix A such that T(x) = Ax for all x in the domain of T. In Exercises 2-7, let A = "l 2 [2 0 -l" anaB = .3 5. L3 -3 4. Compute the indicated matrices, if possible. 2. A2B 3. A2B2 4. BTA"lB 5. (BB1)"1 6. (BTByl 7. The outer product expansion of AAT 8. If A is a matrix such that A = 1/2 .-3/2 find A. 9. If A AX = 1 0 2 3 0 1 -1 5 3 -1 -1 1 -3 0 -2 and X is a matrix such that , find X. 10. If possible, express the matrix A uct of elementary matrices. 1 2 4 6 as a prod- 11. If A is a square matrix such that A3 = O, show that (I - A) = I + A + A . 12. Find an LU factorization of A = 1 1 1 3 1 1 .2 -1 1. 13. Find bases for the row space, column space, and null "2 -4 5 8 5" space of A = 1 —2 2 3 1 _4 -8 3 2 6_ 14. Suppose matrices A and B are row equivalent. Do they have the same row space? Why or why not? Do A and B have the same column space? Why or why not? 15. If A is an invertible matrix, explain why A and Ar must have the same null space. Is this true if A is a nonin-vertible square matrix? Explain. 16. If A is a square matrix whose rows add up to the zero vector, explain why A cannot be invertible. 17. Let A be an m X n matrix with linearly independent columns. Explain why ATA must be an invertible matrix. Must AA1 also be invertible? Explain. 18. Find a linear transformation T: T such that Y "2" I '°1 = and T _i_ 3. _5J 19. 
Find the standard matrix of the linear transformation T: U2 —> R2 that corresponds to a counterclockwise rotation of 45° about the origin followed by a projection onto the line y = —2x. 20. Suppose that T: R" —> R" is a linear transformation and suppose that v is a vector such that T(v) # 0 but T2(v) = 0 (where T2 = T ° T). Prove that v and 7Tv) are linearly independent.