LINEAR ALGEBR
GEORGIE. SHILOV
Revised English Edition Translated & Edited by Richard A. Silverman
LINEAR ALGEBRA
GEORGI E. SHILOV
Professor of Mathematics Moscow University
Revised English Edition Translated and Edited by
Richard A. Silverman
DOVER PUBLICATIONS, INC., NEW YORK
Copyright © 1977 by Dover Publications, Inc. Copyright © 1971 by Richard A. Silverman. All rights reserved under  Pan American and International Copyright Conventions.
This Dover edition, first published in 1977, is an unabridged and corrected republication of the English translation originally published by Prentice-Hall, Inc., in 1971.
International Standard Book Number: 0-486-63518-X Library of Congress Catalog Card Number-.77-075267
Manufactured in the United States of America Dover Publications, Inc. 31 East 2nd Street, Mineola, N.Y. 11501
PREFACE
This book is intended as a text for undergraduate students majoring in mathematics and physics. It presents the material ordinarily covered in a course on linear algebra and subsequently drawn upon in various branches of mathematical analysis. However, it should be noted that the term "linear algebra" has for some time ceased to describe the actual content of the course, representing as it does a synthesis of various ideas from algebra, geometry and analysis. And although analysis in the strict sense of the term (i.e., the branch of mathematics concerned with liuiits, differentiation, integration, etc.) plays only a background role in this Vook, it is in fact the actual organizing principle of the course, since the problems of "linear algebra" can be regarded both as "finite-dimensional projections" and as the "support" for the basic problems of analysis.
The text stems in part from my previous book An Introduction to the Theory of Linear Spaces (Prentice-Hall, 1961), henceforth denoted by LS. Briefly, the differences between LS and the present book are the following: LS is entirely concerned with real spaces, while this book considers spaces over an arbitrary number field, with the real and complex spaces being considered as closely related special cases of the general theory. A chapter has been introduced on the Jordan canonical form of the matrix of a linear operator in a real or complex space. Moreover, we also study the canonical form of the matrix of a normal operator in a complex space equipped with a scalar product, deducing as special cases the canonical forms of the matrices of Hermitian, anti-Hermitian and unitary operators and their real analogues.
Vi PREFACE
The final lengthy chapter in LS on the geometry of infinite-dimensional Hilbert space has been omitted, since a more systematic treatment of this topic (in a functional analysis context) is available in a number of other books. Instead, further new material bearing directly on the basic content of the course has been added, namely Chapter 11 on the structure of matrix algebras (written at my request by A. Y. Khelemski) and an appendix on the structure of matrix categories, based on my article with I. M. Gelfand (Vestnik MGU, Ser. Mat. Mekh., No. 4 (1963), pp. 27-48). Chapter 11 and the appendix, although completely elementary in method, are nevertheless somewhat higher in level than the rest of the book (as indicated by the asterisks) and represent advanced developments in the theory of linear algebra.
Each chapter is equipped with a set of problems, and hints and answers to these problems appear at the end of the book. To a certain extent, the problems help to develop necessary technical skill, but they are primarily intended to illustrate and amplify the material in the text. Certain groups of problems can serve as the basis for seminar discussions. The same is true of Chapter 11 and the appendix, as well as of the starred sections (the latter contain ancillary material that can be omitted on first reading).
It is my pleasant duty to acknowledge the painstaking efforts of M. S. Agranovich, the editor of the book, and to thank him for a number of valuable suggestions. I also wish to thank I. Y. Dorfman for checking the solutions to all the problems.
CONTENTS
chapter I
DETERMINANTS I
1.1. Number Fields 1
1.2. Problems of the Theory of Systems of Linear Equations 3
1.3. Determinants of Order n 5
1.4. Properties of Determinants 8
1.5. Cofactors and Minors 12
1.6. Practical Evaluation of Determinants 16
1.7. Cramer's Rule 18
1.8. Minors of Arbitrary Order. Laplace's Theorem 20
1.9. Linear Dependence between Columns 23 Problems 28
vi i
VIIP CONTENTS
chapter 2
LINEAR SPACES 31
2.1. Definitions 31
2.2. Linear Dependence 36
2.3. Bases, Components, Dimension 38
2.4. Subspaces 42
2.5. Linear Manifolds 49
2.6. Hyperplanes 51
2.7. Morphisms of Linear Spaces 53 Problems 56
chapter 3
SYSTEMS OF LINEAR EQUATIONS 58
3.1. More on the Rank of a Matrix 58
3.2. Nontrivial Compatibility of a Homogeneous Linear System 60
3.3. The Compatibility Condition for a General Linear System 61
3.4. The General Solution of a Linear System 63
3.5. Geometric Properties of the Solution Space 65
3.6. Methods for Calculating the Rank of a Matrix 67 Problems 71
chapter 4
LINEAR FUNCTIONS OF A VECTOR ARGUMENT 75
4.1. Linear Forms 75
4.2. Linear Operators 77
4.3. Sums and Products of Linear Operators 82
4.4. Corresponding Operations on Matrices 84
4.5. Further Properties of Matrix Multiplication 88
4.6. The Range and Null Space of a Linear Operator 93
4.7. Linear Operators Mapping a Space Kn into Itself 98
4.8. Invariant Subspaces 106
4.9. Eigenvectors and Eigenvalues 108 Problems 113
CONTENTS ix
chapter 5
COORDINATE TRANSFORMATIONS 118
5.1. Transformation to a New Basis 118
5.2. Consecutive Transformations 120
5.3. Transformation of the Components of a Vector 121
5.4. Transformation of the Coefficients of a Linear Form 123
5.5. Transformation of the Matrix of a Linear Operator 124 *5.6.   Tensors 126
Problems 131
chapter 6
THE CANONICAL FORM OF THE MATRIX
OF A LINEAR OPERATOR 133
6.1. Canonical Form of the Matrix of a Nilpotent Operator 133
6.2. Algebras. The Algebra of Polynomials 136
6.3. Canonical Form of the Matrix of an Arbitrary Operator 142
6.4. Elementary Divisors 147
6.5. Further Implications 153
6.6. The Real Jordan Canonical Form 155 *6.7. Spectra, Jets and Polynomials 160 *6.8.   Operator Functions and Their Matrices 169
Problems 176
chapter 7
BILINEAR AND QUADRATIC FORMS 179
7.1. Bilinear Forms 179
7.2. Quadratic Forms 183
7.3. Reduction of a Quadratic Form to Canonical Form 185
7.4. The Canonical Basis of a Bilinear Form 190
7.5. Construction of a Canonical Basis by Jacobi's Method 192
7.6. Adjoint Linear Operators 196
7.7. Isomorphism of Spaces Equipped with a Bilinear Form 199 *7.8.   Multilinear Forms 202
7.9.   Bilinear and Quadratic Forms in a Real Space 204
Problems 210
X CONTENTS
chapter 8
EUCLIDEAN SPACES 214
8.1. Introduction 214
8.2. Definition of a Euclidean Space 215
8.3. Basic Metric Concepts 216
8.4. Orthogonal Bases 222
8.5. Perpendiculars 223
8.6. The Orthogonalization Theorem 226
8.7. The Gram Determinant 230
8.8. Incompatible Systems and the Method of Least Squares 234
8.9. Adjoint Operators and Isometry 237 Problems 241
chapter 9
UNITARY SPACES 247
9.1. Hermitian Forms 247
9.2. The Scalar Product in a Complex Space 254
9.3. Normal Operators 259
9.4. Applications to Operator Theory in Euclidean Space 263 Problems 271
chapter 10
QUADRATIC FORMS IN EUCLIDEAN
AND UNITARY SPACES 273
10.1. Basic Theorem on Quadratic Forms in a Euclidean Space 273
10.2. Extremal Properties of a Quadratic Form 276
10.3. Simultaneous Reduction of Two Quadratic Forms 283
10.4. Reduction of the General Equation of a Quadric Surface 287
10.5. Geometric Properties of a Quadric Surface 289 *10.6.   Analysis of a Quadric Surface from Its General Equation 300
10.7.   Hermitian Quadratic Forms 308
Problems 310
CONTENTS Xi
♦chapter 11
FINITE-DIMENSIONAL ALGEBRAS
AND THEIR REPRESENTATIONS 312
11.1. More on Algebras 312
11.2. Representations of Abstract Algebras 313
11.3. Irreducible Representations and Schur's Lemma 314
11.4. Basic Types of Finite-Dimensional Algebras 315
11.5. The Left Regular Representation of a Simple Algebra 318
11.6. Structure of Simple Algebras 320
11.7. Structure of Semisimple Algebras 323
11.8. Representations of Simple and Semisimple Algebras 327
11.9. Some Further Results 331 Problems 332
♦Appendix
CATEGORIES OF FINITE-DIMENSIONAL SPACES 335
A.l.   Introduction 335
A.2.   The Case of Complete Algebras 338
A.3.   The Case of One-Dimensional Algebras 340
A.4.   The Case of Simple Algebras 345
A.5.  The Case of Complete Algebras of Diagonal Matrices 353
A.6.   Categories and Direct Sums 357
HINTS AND ANSWERS 361
BIBLIOGRAPHY 379
INDEX
381
chapter I
DETERMINANTS
I.I. Number Fields
1.11. Like most of mathematics, linear algebra makes use of number systems (number fields). By a number field we mean any set K of objects, called "numbers," which, when subjected to the four arithmetic operations again give elements of K. More exactly, these operations have the following properties (field axioms):
a. To every pair of numbers a and fl in K there corresponds a (unique) number a + |3 in K, called the sum of a and (3, where
1) a + |3 = [3 + oc for every a and (3 in K {addition is commutative);
2) (oc + p) + y ^ a + (P + Y) f°r every a, (3, y in K {addition is associative);
3) There exists a number 0 {zero) in K such that 0 + a — a for every a
in
4) For every a in /if there exists a number {negative element) y in K such that a + y = 0.
The solvability of the equation <x + y = 0 for every a allows us to carry out the operation of subtraction, by defining the difference $ — a as the sum of the number (3 and the solution y of the equation a + y = 0.
b. To every pair of numbers a and |3 in K there corresponds a (unique) number a • £1 (or a(3) in K, called the product of a and fi, where
5) <x(3 = |3a for every a and (3 in A" (multiplication is commutative);
6) (a(3)Y = a(PY) for every a, (3, y 'n ^ {multiplication is associative);
I
2 DETERMINANTS
CHAP. 1
7) There exists a number 1 (^ 0) in K such that 1 • a = a for every a in AT;
8) For every a ^ 0 in K there exists a number {reciprocal element) y in AT such that ay = 1.
c. Multiplication is distributive over addition, i.e.,
9) a(|3 + y) = a(3 + ay for every a, |3, y in A'.f
The solvability of the equation ay = 1 for every a^O allows us to carry out the operation of division, by defining the quotient (3/a as the product of the number fi and the solution y of the equation ay = 1.
The numbers 1,1 + 1=2, 2+1=3, etc. are said to be natural; it is assumed that none of these numbers is zero.J By the integers in a field A" we mean the set of all natural numbers together with their negatives and the number zero. By the rational numbers in a field K we mean the set of all quotients pjq, where p and q are integers and q ^ 0.
Two fields K and K' are said to be isomorphic if we can set up a one-to-one correspondence between K and A" such that the number associated with every sum (or product) of numbers in K is the sum (or product) of the corresponding numbers in K'. The number associated with every difference (or quotient) of numbers in AT will then be the difference (or quotient) of the corresponding numbers in A".
1.12. The most commonly encountered concrete examples of number fields are the following:
a. The field of rational numbers, i.e., of quotients pjq where p and q ^ 0 are the ordinary integers subject to the ordinary operations of arithmetic. (It should be noted that the integers by themselves do not form a field, since they do not satisfy axiom 8).) It follows from the foregoing that every field K has a subset (subfield) isomorphic to the field of rational numbers.
b. The field of real numbers, having the set of all points of the real line as its geometric counterpart. An axiomatic treatment of the field of real numbers is achieved by supplementing axioms l)-9) with the axioms of order and the least upper bound axiom.§
t Note that axioms 5) and 9) also imply (a + P>Y = ay + fJy.
% Given two elements N and E, say, we can construct a field by the rules N + N = N, N -r E — E, E -r E = N, N-N~N,N-E=^N, E-E = E. Then, in keeping with our notation, we should write N ~ 0, E — \ and hence 2 = 1 + 1 = 0. To exclude such number systems, we require that all natural field elements be nonzero.
§ For a detailed treatment of real numbers, see, for example, G. H. Hardy, Pure Mathematics, ninth edition, The Macmillan Co., New York (1945), Chap. 1.
SEC. 1.2
PROBLEMS OF THE THEORY OF SYSTEMS OF LINEAR EQUATIONS 3
c. The field of complex numbers of the form a + ib, where a and b are real numbers (i is not a real number), equipped with the following operations of addition and multiplication (Hardy, op. cit., Chap. 3):
(ax + ibx) + (a2 + ib2) = (ax + a2) + i(bx + b2),
(ax + ibx){a2 + ib2) = (axa2 — bxb2) + i{axb2 + a2*i)-
For numbers of the form a + /0, these operations reduce to the corresponding operations for real numbers; briefly we write a + iO = a and call complex numbers of this form real. Thus it can be said that the field of complex numbers has a subset (subfield) isomorphic to the field of real numbers. Complex numbers of the form 0 + ib are said to be (purely) imaginary and are designated briefly by ib. It follows from the multiplication rule that
;2 = /-/=(0 + /l)(0 + /l) = -l.
1.13. Henceforth we will designate the field of real numbers by R and the field of complex numbers by C. According to the "fundamental theorem of algebra" (Hardy, op. cit., Appendix II, p. 492), we can not only carry out the four arithmetic operations in C but also solve any algebraic equation
zn + axzn~x H-----h an = 0.
The field R of real numbers does not have this property. For example, the equation z2 + 1 = 0 has no solutions in the field R.
Many of the subsequent considerations are valid for any number field. In what follows, we will use the letter K to denote an arbitrary number field. If some property is true for the field K, then it is automatically true for the field R and the field C, which are special cases of the general field K.
1.2. Problems of the Theory of Systems of Linear Equations
In this and the next two chapters, we shall study systems of linear equations. In the most general case, such a system has the form
axxxx + ax2x2 + • • • + aXnxn = bx,
a2lxx + a22x2 + • • • + a2nxn = b2, ^
+ 0*2*2 H----+ akn*n = V
Here xx, x2,. . . , xn denote the unknowns (elements of the field K) which are to be determined. (Note that we do not necessarily assume that the number of unknowns equals the number of equations.) The quantities au> ai2y ■ ■ • > ahni taken from the field K, are called the coefficients of the
4 DETERMINANTS
CHAP. 1
system. The first index of a coefficient indicates the number of the equation in which the coefficient appears, while the second index indicates the number of the unknown with which the coefficient is associated, f The quantities bx, b2,. . . , bk appearing in the right-hand side of (1), taken from the same field Ky are called the constant terms of the system; like the coefficients, they are assumed to be known. By a solution of the system (1) we mean any set of numbers cXy c2, •. . , cn from the same field K which, when substituted for the unknowns xu x2,. . . , xn turns all the equations of the system into identities.!
Not every system of linear equations of the form (1) has a solution. For example, the system
2xx + 3>x2 ~ 5, ^ 2xx + 3x2 = 6
obviously has no solution at all. Indeed, whatever numbers cx, c2 we substitute in place of the unknowns xx, x2, the left-hand sides of the equations of the system (2) are the same, while the right-hand sides are different. Therefore no such substitution can simultaneously convert both equations of the system into identities.
A system of equations of the form (1) which has at least one solution is called compatible', a system which does not have solutions is called incompatible. A compatible system can have one solution or several solutions. In the latter case, we distinguish the solutions by indicating the number of the solution by a superscript in parentheses; for example, the first solution will be denoted by cxx), c\x\ . .. , c[1}, the second solution by c[2\ c[2\ . . . , ci2), and so on. The solutions c[1}, c[1],. .. , c(nx> and c{2), c{2\ . . . , c{*] are regarded as distinct if at least one of the numbers cj1' does not coincide with the corresponding numbers c|.2> (i = 1,2, . .. ,«). For example, the system
2xx -\- 3x2 — 0, 4xx + 6x2 = 0
has the distinct solutions
c«> = c?» = 0  and  c[2) = 3, c22) = -2
(and also infinitely many other solutions). If a compatible system has a unique solution, the system is called determinate; if a compatible system has at least two different solutions, it is called indeterminate.
t Thus, for example, the symbol ait should be read as "a three four" and not as "<a thirty-four."
+ We emphasize that the set of numbers cx, c2,.. ., c„ represents one solution of the system and not n solutions.
SEC. 1.3
DETERMINANTS OF ORDER fl 5
We can now formulate the basic problems which arise in studying the system (1):
a) To ascertain whether the system (1) is compatible or incompatible;
b) If the system (I) is compatible, to ascertain whether it is determinate;
c) If the system (1) is compatible and determinate, to find its unique solution;
d) If the system (1) is compatible and indeterminate, to describe the set of all its solutions.
The basic mathematical tool for studying linear systems is the theory of determinants, which we consider next.
1.3. Determinants of Order n
1.31* Suppose we are given a square matrix, i.e., an array of n2 numbers aa      = 1, 2,...,«), all elements of a field K:
a
xx
a
12
a
Xn
#21 #22
a
2n
a
Ml
a
n2
a,
(4)
The number of rows and columns of the matrix (4) is called its order. The numbers ait are called the elements of the matrix. The first index indicates the row and the second index the column in which ai} appears. The elements an, a22,.. • , ann form the principal diagonal of the matrix.
Consider any product of n elements which appear in different rows and different columns of the matrix (4), i.e., a product containing/!/.?* one element from each row and each column. Such a product can be written in the form
««.n- (5)
Actually, for the first factor we can always choose the element appearing in the first column of the matrix (4); then, if we denote by ax the number of the row in which the element appears, the indices of the element will be a1? 1. Similarly, for the second factor we can choose the element appearing in the second column; then its indices will be a2, 2, where a2 is the number of the row in which the element appears, and so on. Thus, the indices al5 o^, ... , att are the numbers of the rows in which the factors of the product (5) appear, when we agree to write the column indices in increasing order. Since, by hypothesis, the elements aail, aaa2, • • • , #<xnn appear in different rows of the matrix (4), one from each row, then the numbers oc2,. . . , an are all different and represent some permutation of the numbers 1,2,.. . ,n. By an inversion in the sequence ax, a2,. . . , an, we mean an arrangement
6 DETERMINANTS
CHAP. 1
of two indices such that the larger index comes before the smaller index. The total number of inversions will be denoted by JV(«i. a2,... , a„). For example, in the permutation 2, 1, 4, 3, there are two inversions (2 before 1, 4 before 3), so that
N(2, 1,4, 3) = 2.
In the permutation 4, 3, 1,2, there are five inversions (4 before 3, 4 before 1, 4 before 2, 3 before 1, 3 before 2), so that
#(4,3,1,2)== 5.
If the number of inversions in the sequence <xx, a2, . . . , an is even, we put a plus sign before the product (5); if the number is odd, we put a minus sign before the product. In other words, we agree to write in front of each product of the form (5) the sign determined by the expression
£_^yV(ai.a2,...,an)
The total number of products of the form (5) which can be formed from the elements of a given matrix of order « is equal to the total number of permutations of the numbers 1,2,...,«. As is well known, this number is equal
to «!.
We now introduce the following definition:
By the determinant D of the matrix (4) is meant the algebraic sum of the n! products of the form (5), each preceded by the sign determined by the rule just given, i.e.,
D = I (-l)Mai-'*aXi<W ' " ' #*„„• (6) Henceforth, the products of the form (5) will be called the terms of the determinant D. The elements ai} of the matrix (4) will be called the elements of D, and the order of (4) will be called the order of D. We denote the determinant D corresponding to the matrix (4) by one of the following symbols:
D =
a
li
a
12
a
a
21
a
22
In
a.
= det \\atj\
(?)
For example, we obtain the following expressions for the determinants of orders two and three:
#21 #22
a12 a13
#21     #22 #23
#31    #32 fl33
— aUa22 #2l#12'
— #U#22#33 4" #21#32#13 + #31fl12#23
— #31#22#13 — #21#12fl33 #11#32#23-
SEC. 1.3
DETERMINANTS OF ORDER n 7
We now indicate the role of determinants in solving systems of linear equations, by considering the example of a system of two equations in two unknowns:
«11^1 ~\~ #12^2 = *1> «21^1       «22^2 ~ *2-
Eliminating one of the unknowns in the usual way, we can easily obtain the formulas
*1#22 " *2«12
i
aUa22 ~ «21«l2
«11*2 - «2 A «11«22 «21«12
assuming that these ratios have nonvanishing denominators. The numerators and denominators of the ratios can be represented by the second-order determinants
«11«22 ~" «2l«12 —
22
*2«12 ~
«11	«12
«21	«22
	«12
	«22
«11	*l
«21	b2
«U*2 — «21*1 =
It turns out that similar formulas hold for the solutions of systems with an arbitrary number of unknowns (see Sec. 1.7).
1.32. The rule for determining the sign of a given term of a determinant can be formulated somewhat differently, in geometric terms. Corresponding to the enumeration of elements in the matrix (4), we can distinguish two natural positive directions: from left to right along the rows, and from top to bottom along the columns. Moreover, the slanting lines joining any two elements of the matrix can be furnished with a direction: we shall say that the line segment joining the element a„- with the element akl has positive slope if its right endpoint lies lower than its left endpoint, and that it has negative slope if its right endpoint lies higher than its left endpoint.f Now imagine that in the matrix (4) we draw all the segments with negative slope joining pairs of elements aail, aa^,. . . , aan7l of the product (5). Then we put a plus sign before the product (5) if the number of all such segments is even, and a minus sign if the number is odd.
f This definition of "slope1' is not to be confused with the geometric notion with the same name. In fact, the sign convention adopted here is the opposite of that used in geometry.
8 DETERMINANTS
CHAP. 1
For example, in the case of a fourth-order matrix, a plus sign must be put before the product a2ial2aiSaSi, since there are two segments of negative slope joining the elements of this product:
a23 #24
a.%% #32
#41 #42
However, a minus sign must be put before the product #u#32#i3#24> since in the matrix there are five segments of negative slope joining these elements:
In these examples, the number of segments of negative slope joining the elements of a given term equals the number of inversions in the order of the first indices of the elements appearing in the term. In the first example, the sequence 2, 1, 4, 3 of first indices has two inversions; in the second example, the sequence 4, 3, 1, 2 of first indices has five inversions.
We now show that the second definition of the sign of a term in a determinant is equivalent to the first. To show this, it suffices to prove that the number of inversions in the sequence of first indices of a given term (with the second indices in natural order) is always equal to the number of segments of negative slope joining the elements of the given term in the matrix. But this is almost obvious, since the presence of a segment of negative slope joining the elements #a.i and aXjj means that ocf > for / < j, i.e., there is an inversion in the order of the first indices.
1.4. Properties of Determinants
1.41. The transposition operation. The determinant
an   a2l   ' • • a
nl
#12    #22     * #
ra2
aln    a2n    ' ' ' ann
(8)
SEC. 1.4
PROPERTIES OF DETERMINANTS 9
obtained from the determinant (7) by interchanging rows and columns with the same indices is said to be the transpose of the determinant (7). We now show that the transpose of a determinant has the same value as the original determinant. In fact, the determinants (7) and (§) obviously consist of the same terms; therefore it is enough for us to show that identical terms in the determinants (7) and (8) have identical signs. Transposition of the matrix of a determinant is clearly the result of rotating it (in space) through 180° about the principal diagonal an, a22, ■ ■ . , ann. As a result of this rotation, every segment with negative slope (e.g., making an angle a < 90° with the rows of the matrix) again becomes a segment with negative slope (i.e., making the angle 90° — a with the rows of the matrix). Therefore the number of segments with negative slope joining the elements of a given term does not change after transposition. Consequently the sign of the term does not change either. Thus the signs of all the terms are preserved, which means that the value of the determinant remains unchanged.
The property just proved establishes the equivalence of the rows and columns of a determinant. Therefore further properties of determinants will be stated and proved only for columns.
1.42. The antisymmetry property. By the property of being antisymmetric with respect to columns, we mean the fact that a determinant changes sign when two of its columns are interchanged. We consider first the case where two adjacent columns are interchanged, for example columns j and j + \. The determinant which is obtained after these columns are interchanged obviously still consists of the same terms as the original determinant. Consider any of the terms of the original determinant. Such a term contains an element of the yth column and an element of the {j + l)th column. If the segment joining these two elements originally had negative slope, then after the interchange of columns, its slope becomes positive, and conversely. As for the other segments joining pairs of elements of the term in question, each of these segments does not change the character of its slope after the column interchange. Consequently the number of segments with negative slope joining the elements of the given term changes by one when the two columns are interchanged; therefore each term of the determinant, and hence the determinant itself, changes sign when the columns are interchanged.
Suppose now that two nonadjacent columns are interchanged, e.g., column j and column k {j < k), where there are m other columns between. This interchange can be accomplished by successive interchanges of adjacent columns as follows: First column j is interchanged with column j + 1, then with columns j + 2, j + 3,. . . , k. Then the column k — 1 so obtained (which was formerly column k) is interchanged with columns k — 2, k — 3, . . . J. In all, m + 1 + m = 2m + 1 interchanges of adjacent columns are required, each of which, according to what has just been proved, changes the
10 DETERMINANTS
CHAP. 1
sign of the determinant. Therefore, at the end of the process, the determinant will have a sign opposite to its original sign (since for any integer m, the number 2m + 1 is odd).
1.43. Corollary. A determinant with two identical columns vanishes.
Proof. Interchanging the columns does not change the determinant D. On the other hand, as just proved, the determinant must change its sign. Thus D = — D, which implies that D = 0. |f
1.44. The linear property of determinants. This property can be formulated as follows:
a. Theorem. If all the elements of the jth column of a determinant D are "linear combinations^ of two columns of numbers, i.e., if
ati — Ibi + yxt      0* = 1, 2,. . . , n)
where X and jx are fixed numbers, then D is equal to a linear combination of two determinants:
D — XZ)X + [aD2. (9)
Here both determinants Dx and D2 have the same columns as the determinant D except for the jth column; the jth column of Dx consists of the numbers b±, while the jth column of D2 consists of the numbers c{.
Proof. Every term of the determinant D can be represented in the form «aii««22 * • ' a9jl • • • a^n = a^a^ * • •        + ^ • • • a^n
Adding up all the first terms (with the signs which the corresponding terms have in the original determinant), we clearly obtain the determinant Dx, multiplied by the number X. Similarly, adding up all the second terms, we obtain the determinant D2, multiplied by the number fx. |
It is convenient to write this formula in a somewhat different form. Let D be an arbitrary fixed determinant. Denote by Dj(pt) the determinant which is obtained by replacing the elements of the yth column of D by the numbers pt (i — 1, 2,. . . , n). Then (9) takes the form
b. The linear property of determinants can easily be extended to the case where every element of the y'th column is a linear combination not of two terms but of any other number of terms, i.e.
atJ = X^ + jxc, +----h ift.
t The symbol | means Q.E.D. and indicates the end of a proof.
SEC. 1.4
PROPERTIES OF DETERMINANTS      I I
In this case,
= IDfa) + v,Dt{cd + ■ ■ • + vDt(fd. (10)
1.45. Corollary. Any common factor of a column of a determinant can be factored out of the determinant.
Proof If a^ = ~kbt, then by (10) we have
D,(a„) = D,(kbs) = lDtQ>J. |
1.46. Corollary. If a column of a determinant consists entirely of zeros, then the determinant vanishes.
Proof Since 0 is a common factor of the elements of one of the columns, we can factor it out of the determinant, obtaining
D,(0) = Dt(0 ■ 1) = 0 • D,(l) = 0. |
1.47. Addition of an arbitrary multiple of one column to another column.
a. Theorem. The value of a determinant is not changed by adding the elements of one column multiplied by an arbitrary number to the corresponding elements of another column.
Proof Suppose we add the kth column multiplied by the number X to the yth column (k j). The yth column of the resulting determinant consists of elements of the form ai} + ~kaik (/ = 1, 2,. . . , n). By (9) we have
Dt(au +        = DM + -kDM-
Theyth column of the second determinant consists of the elements aik, and hence is identical with the Ath column. It follows from Corollary 1.43f that J>i(aik) = 0, so that
£>}(ai}- + X«ifc) = Ds{ati). |
b. Naturally, Theorem 1.47a can be formulated in the following more general form: The value of a determinant is not changed by adding to the elements of its jth column first the corresponding elements of the kth column multiplied by X, next the elements of the Ith column multiplied by (x, etc., and finally the elements of the pth column multiplied by t (k ^ j, I ^ j, . . . , p ^ j).
1.48. Because of the invariance of determinants under transposition (Sec. 1.41), all the properties of determinants proved in this section for columns remain valid for rows as well.
t Corollary 1.43 refers to the (unique) corollary in Sec. 1.43, Theorem 1.47a to the theorem in Sec. 1.47a, etc.
12 DETERMINANTS
CHAP. 1
1.5. Cofactors and Minors
1.51. Consider any column, the jth say, of the determinant D. Let aif be any element of this column. Add up all the terms containing the element a{} appearing in the right-hand side of equation (6)
and then factor out the element au. The quantity which remains, denoted by Aijt is called the cofactor of the element au of the determinant D.
Since every term of the determinant D contains an element from the yth column, (6) can be written in the form
D = auAu + a2iA2i + ■ • • + aniAni, (11)
called the expansion of the determinant D with respect to the {elements of the) jth column. Naturally, we can write a similar formula for any row of the determinant D. For example, for the ith row we have the formula
D = aaAa + ai2Ai2 + ■ • • + ainAin. (12)
This gives the following
Theorem. The sum of all the products of the elements of any column (or row) of the determinant D with the corresponding cofactors is equal to the determinant D itself
Equations (11) and (12) can be used to calculate determinants, but first we must know how to calculate cofactors. We will show how this is done in Sec. 1.53.
1.52. Next we note a consequence of (11) and (12) which will be useful later. Equation (11) is an identity in the quantities au, a2j,.. . , anj. Therefore it remains valid if we replace a ti (i = 1, 2,...,«) by any other quantities. The quantities AXj, A2i,. . . , Ani remain unchanged when such a replacement is made, since they do not depend on the elements au. Suppose that in the right and left-hand sides of the equality (11) we replace the elements au, a2j, . .. , ani by the corresponding elements of any other column, say the A:th. Then the determinant in the left-hand side of (11) will have two identical columns and will therefore vanish, according to Corollary 1.43. Thus we obtain the relation
OikAu + a2kA2i + • • ■ + anliAn} = 0 (13) for k ^ j. Similarly, from (12) we obtain
aaAa + al2Ai2 + ■ • - + alnAin = 0
(14)
SEC. 1.5
COFACTORS AND MINORS 13
for / ^ i. Thus we have proved the following
Theorem. The sum of all the products of the elements of a column (or row) of the determinant D with the cofactors of the corresponding elements of another column (or row) is equal to zero.
1.53. If we delete a row and a column from a matrix of order n, then, of course, the remaining elements form a matrix of order n — 1. The determinant of this matrix is called a minor of the original wth-order matrix (and also a minor of its determinant D). If we delete the ith row and theyth column of D, then the minor so obtained is denoted by Mu or MU(D).
We now show that the relation
^ = (-Di+'A^ (15)
holds, so that the calculation of cofactors reduces to the calculation of the corresponding minors. First we prove (15) for the case i = 1,/— 1. We add up all the terms in the right-hand side of (6) which contain the element an, and consider one of these terms. It is clear that the product of all the elements of this term except an gives a term c of the minor Mn. Since in the matrix of the determinant D, there are no segments of negative slope joining the element au with the other elements of the term selected, the sign ascribed to the term anc of the determinant D is the same as the sign ascribed to the term c in the minor Mn. Moreover, by suitably choosing a term of the determinant D containing au and then deleting an, we can obtain any term of the minor Mn. Thus the algebraic sum of all the terms of the determinant D containing au, with axx deleted, equals the product Mxl. But according to Sec. 1.51, this sum is equal to the product An. Therefore, An — Mn as required.
Now we prove (15) for arbitrary / and j, making essential use of the fact that the formula is valid for i=J~ 1. Consider the element ati = a, appearing in the z'th row and the yth column of the determinant D. By successively interchanging adjacent rows and columns, we can move the element a over to the upper left-hand corner of the matrix; to do this, we need
/-i+y-i-i+y-2
interchanges. As a result, we obtain the determinant Dx with the same terms as those of the original determinant D multiplied by
(_l)*+J-2 ^ (-1)'+*.
The minor Mn(Dx) of the determinant Dx is clearly identical with the minor Mti(D) of the determinant D. By what has been proved already, the sum of the terms of the determinant Dx which contain the element a, with a deleted, is equal to Mn(Dt). Therefore the sum of the terms of the
14 DETERMINANTS
CHAP. 1
original determinant D which contain the element «„ = a, with « deleted, is equal to
(-D'+WuCDJ - (-1)»WW(D). According to Sec. 1.51, this sum is equal to An. Consequently
which completes the proof of (15).
1.54. Formulas (11) and (12) can now be written in the following commonly used variants:
d = {-\)™aiimu + (-i)2+'#2,m2, + • • • + (-ir+x,mtt,, an
D = (- l)i+1«aMil + {~\)^ai2Mi2 + • ■ • + (~l)i+nainMin. (12') 1.55. Examples
a. A third-order determinant has six distinct expansions, three with respect to rows and three with respect to columns. For example, the expansion with respect to the first row is
«11 «12
a
13
«21    «22 «23
a
31
#32 «:
33
	«22	«23		«21	«23		«21	«22
			— «12			+ «13		
	«32	«33		«31	«33		«31	«32
b. An Hth-order determinant of the form
axx    0 0
«21    «22 0 «31     «32 «33
a
nl    an2 «n3
0 0 0
a,
is called triangular. Expanding Dn with respect to the first row, we find that Dn equals the product of the element an with the triangular determinant
D
n-l
«22 0 «32 «33
0 0
«
n2 «n3
«*
of order n ~ 1. Again expanding Dn_x with respect to the first row, we find that
Dn-l — «22^«-2>
SEC. 1.5
COFACTORS AND MINORS 15
where Dn_2 is a triangular determinant of order n — 2. Continuing in this way, we finally obtain
D = ana22' • ' ann,
i.e., a triangular determinant equals the product of the elements appearing along its principal diagonal.
c. Calculate the Vandermonde determinant
	1	1	1
			Xn
	x\	•> xZ	■ x2
	vn—l xl	vn—l x2	v"—
Solution. W(xu ... , xn) is a polynomial of degree w — 1 in xn, with coefficients depending on xlt . . . , xn_v This polynomial vanishes if xn takes any of the values jq, x2, . . . , xn_x, since then the determinant has two identical columns. Hence, by a familiar theorem of elementary algebra, the polynomial W(xu . . . , xn) is divisible by the product (xn — xt) • * * (xn — xn_t), so that
n-l
W(xx, . . . , xn) = a(xx, . . . , xn_j) IX 0« -
The quantity aOq, .. . , xn_j) is the leading coefficient of the polynomial W(xu . . . , xn). Expanding the Vandermonde determinant with respect to the last column, we see that this coefficient is just W(xx, . . . , xn_x). It follows that
n-l
• • •, x„) = w(xlt..., xn_t) n o« - xky
Similarly,
n-2
W(xx, . . . , X„_j) — W(XX, . . . , X„_2) IX (Xn~l ~ Xj\
3=1
W(xlt x2) = W(Xl)(x2 -
and obviously
W^(xI)= 1.
Multiplying all these equalities together, we get the desired result
W(xx, . . . , xn) =    IX   (xm - xd.
In particular, if the quantities xx, . . . , xtl are all distinct, then
W(xx,. . . , xj ^ 0.
16 determinants
chap. 1
1.6. Practical Evaluation of Determinants
1.61. Formula (12) takes a particularly simple form when all the elements of the fth row vanish except one element, say aik. In this case
D = aikAik, (16)
and the calculation of the determinant D of order n reduces at once to the calculation of a determinant of order n — 1. If in addition to aik, there is another nonzero element a{j in the ith row, then multiplying the kth column by X = fl,-,-/flifc and subtracting it from the z'th column, we obtain a determinant which is equal to the original one (cf. Sec. 1.47) but which now has a zero in the ith row and y'th column. By a sequence of similar operations, we change any determinant with a nonzero element aik in the ith row into a determinant in which all the elements of the ith row equal zero except aik. This new determinant can then be evaluated by (16). Of course, similar operations can also be performed on the columns of a determinant.
1.62. Example. Calculate the following determinant of order five:
-2     5     0   -1 3
10     3 7-2 D=     3-1      0 5-5 2 6-412
0-3-1      2 3
Solution. There are already two zeros in the third column of this determinant. In order to obtain two more zeros in this column, we multiply the fifth row by 3 and add it to the second row and then multiply the fifth row by 4 and subtract it from the fourth row. After performing these operations and expanding the determinant with respect to the third column, we obtain
D :
-2
1 3
2 0
5
-9 -1
18
^3
0 0 0 0
-1
-1
13
5
-7 2
3 7 -5 10 3
= (-l)3+5(-l)
-2
1 3
2
5
-9 -1 18
-1 13 5
3 3
-5
-7 -10
2	5	-1	3
1	-9	13	7
3	-1	5	-5
2	18	-1	-10
sec. 1.6
practical evaluation of determinants 17
The simplest thing to do now is to produce three zeros in the first column; to do this, we add twice the second row to the first row, subtract three times the second row from the third row and subtract twice the second row from the fourth row:
D = -
-2	5 -1	3			0	-13	25	17
1 -	-9 13	7			1	-9	13	7
3 -	-1 5	-5			0	26	-34	-26
2	18 -7	-10			0	36	-33	-24
	-13	25	17					
„1)1+2	26 -	-34 -	-26	*				
	36 -	■33 -	-24					
To simplify the calculation of the third-order determinant just obtained, we try to decrease the absolute values of its elements. To do this, we factor the common factor 2 out of the second row, add the second row to the first and subtract twice the second row from the third row:
	—	13	25	17		0	8	4
0 = 2		13	-17	-13	= 2	13	„17	-13
		36	-33	-24		10	1	2
		0	2	1				
- 2	> 4	13	-17	-13	-			
		10	1	2				
There is already one zero in the first row. To obtain still another zero, we subtract twice the third column from the second column. After this, the evaluation of the determinant is easily completed.
	0		2	1		0	0	1	
D = 8	13		17	-13	= 8	13	9	-13	= 8(-l)1+3
	10		1	2		0	-3	2	
= 8	3	13 10	3 -1	= 8	■3(-	13 -	30)	= -8	■ 3 • 43 = -
13 10
18 determinants
chap. 1
1.7. Cramer's Rule
1.71. We are now in a position to solve systems of linear equations. First we consider a system of the special form
anxi + «12*2 + * * ' + alnxn = bx,
«21*1 ~T" #22*2        ' ' ' ~\~ «2n*« ~ ^*2?
*    *    >   *   *   4 *
(17)
««1*1 "I- an2X2 + * ' " + annXn
i.e., a system which has the same number of unknowns and equations. The coefficients ati = 1,2,..., n) form the coefficient matrix of the system; we assume that the determinant of this matrix is different from zero. We now show that such a system is always compatible and determinate, and we obtain a formula which gives the unique solution of the system.
We begin by assuming that cx, c2, .. . , c„ is a solution of (17), so that
«11^1 + al2c2 + • ■ • + alncn = bx, a2Xcx + a22c2 + • • • + a2ncn = b2,
anlCl "T" an2C2 4" " " ' + annCn =
(18)
We multiply the first of the equations (18) by the cofactor Axx of the element axx in the coefficient matrix, then we multiply the second equation by A2l, the third by A3X, and so on, and finally the last equation by Anl. Then we add all the equations so obtained. The result is
(«u^u + «21^21 + " " " + anXAnX)cx + («12^11 + «22^21 + ' " + an2Anl)c2 + ' ' • (19) + (alnAu + a2nA21 + ■ • ■ + annAnl)cn = bxAxx + M21 + ■ • ■ + bnAnX.
By Theorem 1.51, the coefficient of ct in (19) equals the determinant D itself. By Theorem 1.52, the coefficients of all the other Cj (j ^ 1) vanish. The expression in the right-hand side of (19) is the expansion of the determinant
bx al2 b
a
22
bn ««2
a2n
with respect to its first column. Therefore (19) can now be written in the form
Dcx = Dx,
sec. 1.7
cramer's rule 19
so that
c, —
D
In a completely analogous way, we can obtain the expression
Cj, = ~     (7=1,2,..., «),
(20)
where
a2X ^22
anl an2
al,t-l alJ+l a2J-l     &2 a2J+l
an.i-l    bn an,j+\
a
In
a
2«
a.
= DAK)
is the determinant obtained from the determinant D by replacing its y'th column by the numbers bx, b2, . .. , bn. Thus we obtain the following result: If a solution of the system (17) exists, then (20) expresses the solution in terms of the coefficients of the system and the numbers in the right-hand side of (17). In particular, we find that if a solution of the system (17) exists, it is unique.
1.72. We must still show that a solution of the system (17) always exists. Consider the quantities
, n),
and substitute them into the system (17) in place of the unknowns xx, x2, . . . , xn. Then this reduces all the equations of the system (17) to identities. In fact, for the rth equation we obtain
aacx + ai2c2
D,
■ t aincn = an —~ + ai2 —~ + ■ ■ • + ain —-
D
D
1
= — [aiX(bxAlx + b2A2l + ■ + ai2{bxAX2 + b2A22 + •
+ aiJJ>lAXn + b2A2n +
= ^ [bi(anAu + anAi* +
+ K(onA21 + ai2A22 + + bn(aaAKl + ai2An2 +
D
+ bnAn2) + • • • " + bnAnn)]
' + ainAln) + • • ■
+ ainA2n) + " + ainAxn)].
20 determinants
chap. 1
By Theorems 1.51 and 1.52, only one of the coefficients of the quantities bx, b2,. . . , bn is different from zero, namely the coefficient of bx, which is equal to the determinant D itself. Consequently, the above expression reduces to
-biD = bi, D
i.e., is identical with the right-hand side of the rth equation of the system.
1.73. Thus the quantities Cj (j — 1, . . . , «) actually constitute a solution of the system (17), and we have found the following prescription (Cramer's rule) for obtaining solutions of (17):
If the determinant of the system (17) is different from zero, then (17) has a unique solution, namely, for the value of the unknown xj (j = 1, . . . , n) we take the fraction whose denominator is the determinant D of (17) and whose numerator is the determinant obtained by replacing the jth column of D by the column consisting of the constant terms of (17), i.e., the numbers in the right-hand sides of the system.
Thus finding the solution of the system (17) reduces to calculating determinants. Ways of solving more general systems (with vanishing determinants, or with a number of equations different from the number of unknowns) will be given in the next two chapters.
1.74. Remark. One sometimes encounters systems of linear equations whose constant terms are not numbers but vectors, e.g., in analytic geometry or in mechanics. Cramer's rule and its proof remain valid in this case as well; one must only bear in mind that the values of the unknowns xx, x2, .. . , xn will then be vectors rather than numbers. For example, the system
*i + *2 = i — 3j, xx — x2 = i + 5j
has the unique solution
Ci = i 4- j,     c2= —4j.
1.8. Minors of Arbitrary Order. Laplace's Theorem
1.81. Theorem 1.54 on the expansion of a determinant with respect to a row or a column is a special case of a more general theorem on the expansion of a determinant with respect to a whole set of rows or columns. Before formulating this general theorem (Laplace's theorem), we introduce some new notation.
Suppose that in a square matrix of order n we specify any k < n different rows and the same number of different columns. The elements appearing
sec. 1.8
minors of arbitrary order. laplace's theorem 21
at the intersections of these rows and columns form a square matrix of order k. The determinant of this matrix is called a minor of order k of the original matrix of order n (also a minor of order k of the determinant d); it is denoted by
where i\, i2, . . . , ik are the numbers of the deleted rows, and jx,j2> - - - ,jk are the numbers of the deleted columns.
If in the original matrix we delete the rows and columns which make up the minor M, then the remaining elements again form a square matrix, this time of order n — k. The determinant of this matrix is called the complementary minor of the minor M, and is denoted by the symbol
In particular, if the original minor is of order 1, i.e., is just some element ai} of the determinant D, then the complementary minor is the same as the minor Mti discussed in Sec. 1.53. Consider now the minor
formed from the first k rows and the first k columns of the determinant D; its complementary minor is
m, = mx = Mt±:;:i.
In the right-hand side of equation (6), p. 6 group together all the terms of the determinant whose first k elements belong to the minor Mx (and thus whose remaining n — k elements belong to the minor M2). Let one of these terms be denoted by c; we now wish to determine the sign which must be ascribed to c. The first k elements of c belong to a term cx of the minor Mx. If we denote by Nx the number of segments of negative slope corresponding to these elements, then the sign which must be put in front of the term cx in the minor Mx is (—l)Al. The remaining /; — k elements of c belong to a term c2 of the minor Mz; the sign which must be put in front of this term in the minor M2 is (— 1)-Va, where /V2 is the number of segments of negative slope corresponding to the n — k elements of c2. Since in the matrix of the determinant d there is not a single segment with negative slope joining an element of the minor Mx with an element of the minor M2, the total number of segments of negative slope joining elements of the term c equals the sum Nx + A^. Therefore the sign which must be put in front of the term c is given by the expression (— 1)Vl *-V2, and hence is equal to the product of the signs of the terms cx and c2 in the minors Mx and A/2. Moreover, we note that the product of any term of the minor Mx and any term of the minor M2 gives us one of the terms of the determinant D that
22 determinants
chap. 1
have been grouped together. It follows that the sum of all the terms that we have grouped together from the expression for the determinant D given by (6) is equal to the product of the minors Mx and M2.
Next we solve the analogous problem for an arbitrary minor
ivix — JviJlJ2.....j]e,
with complementary minor M2. By successively interchanging adjacent rows and columns, we can move the minor Mx over to the upper left-hand corner of the determinant D; to do so, we need a total of
& - 1) + ('i - 2) + ■ • • + (ik - k)
+ Oi - i) + (y. - 2) h— + (jk - k)
interchanges. As a result, we obtain a determinant Dx with the same terms as in the original determinant but multiplied by (—l)i+J", where
' = 'i + h + " ' + 4.     j=ji +J2 + ' " + h-
By what has just been proved, the sum of all the terms in the determinant Dx whose first k elemeflts appear in the minor Mx is equal to the product MXM2. It follows from this that the sum of the corresponding terms of the determinant D is equal to the product
(-l)i+iMlM2=.MlA2,
where the quantity
is called the cofactor of the minor Mx in the determinant D. Sometimes one uses the notation
a _ /ilil!.....«
^2 — A h.ii.....«'
where the indices indicate the numbers of the deleted rows and columns.
Finally, let the rows of the determinant D with indices ix, i2, . , , , ik be fixed; some elements from these rows appear in every term of Z). We group together all the terms of D such that the elements from the fixed rows ix, i2,. .. , ik belong to the columns with indices jx,J2, ■ • • Then, by what has just been proved, the sum of all these terms equals the product of the minor
with the corresponding cofactor. In this way, all the terms of D can be divided into groups, each of which is characterized by specifying k columns. The sum of the terms in each group is equal to the product of the corresponding minor and its cofactor. Therefore the entire determinant can be represented as the sum
(21)
sec. 1.9
linear dependence between columns 23
where the indices i\, i2,. . . , ik (the indices selected above) are fixed, and the sum is over all possible values of the column indices ji,j2, . . . ,jk (1 < jx < j2 < • ■ • < jk < «). The expansion of D given by (21) is called Laplace's theorem. Clearly, Laplace's theorem constitutes a generalization of the formula for expanding a determinant with respect to one of its rows (derived in Sec. 1.54). There is an analogous formula for expanding the determinant D with respect to a fixed set of columns.
1.82. Example. The determinant of the form
D -
a
li
a
21
a
kk
ak+l,l
a
Hi
a
Ik
a
2k
a
kk
0 0
*
0
ak+l.k ak-rl,k+l
a
nk
a
0 0
0
ak+l,n
^ 7) t>
such that all the elements appearing in both the first k rows and the last n — k columns vanish, is called quasi-triangular. To calculate the determinant, we expand it with respect to the first k rows by using Laplace's theorem. Only one term survives in the sum (21), and we obtain
D =
a
li
a
Ik
a
kl
a
kk
X
a
fc+i.ft+i
a
a
n.k+l
k+l.n
a„
1.9. Linear Dependence between Columns
1.91. Suppose we are given m columns of numbers with n numbers in each:
" a,
A,
a
li
a
21
a
nl
12 «22
a
n2
a
lm
ar
We multiply every element of the first column by some number Xx, every element of the second column by X2, etc., and finally every element of the last (wth) column by Xm; we then add corresponding elements of the columns.
24 determinants
chap. 1
As a result, we get a new column of numbers, whose elements we denote by cx, c2, . . . , cn. We can represent all these operations schematically as follows;
a
11
a
21
a
nl
+ x2
a
21
a
22
a
«2
4- X
lm
2m
or more briefly as
\Al + X2/l2 + • • • + ~kmAm = C,
where C denotes the column whose elements are cx, c2, . . . , cH. The column C is called a linear combination of the columns Ax, A2, .. . , Am, and the numbers Xx, X2,. . . , Xm are called the coefficients of the linear combination. As special cases of the linear combination C, we have the sum of the columns if Xt = X2 = * • • = XOT = 1 and the product of a column by a number if m = 1.
Suppose now that our columns are not chosen independently, but rather make up a determinant D of order n. Then we have the following
Theorem. If one of the columns of the determinant D is a linear combination of the other columns, then D = 0.
Proof Suppose, for example, that the ^th column of the determinant D is a linear combination of the yth, kth, . . . , pth columns of D, with coefficients Xj, Xfc, . . . , Xj,, respectively. Then, according to Sec. 1.47, by subtracting from the <yth column first the fth column multiplied by Xi? then the kth column multiplied by X^, etc., and finally the pth column multiplied by X^, we do not change the value of the determinant D. However, as a result, the <yth column consists of zeros only, from which it follows that D = 0. |
It is remarkable that the converse is also true, i.e., if a given determinant D is equal to zero, then (at least) one of its columns is a linear combination of the other columns. The proof of this theorem requires some preliminary considerations, to which we now turn.
1.92. Again suppose we have m columns of numbers with n elements in each. We can write them in the form of a matrix
A -
«11 «12
«21 «2?
- • a
lm
a
2m
a,
sec. 1.9
linear dependence between columns 25
with n rows and m columns. If k columns and k rows of this matrix are held fixed, then the elements appearing at the intersections of these columns and rows form a square matrix of order k, whose determinant is a minor of order k of the original matrix A (see p. 21); this determinant may either be vanishing or nonvanishing. If, as we shall always assume, not all of the aik are zero, then we can always find an integer r which has the following two properties:
1) The matrix A has a minor of order r which does not vanish;
2) Every minor of the matrix A of order r + 1 and higher (if such actually exist) vanishes.
The number r which has these properties is called the rank of the matrix A. If all the aik vanish, then the rank of the matrix A is considered to be zero (r = 0). Henceforth we shall assume that r > 0. The minor of order r which is different from zero is called the basis minor of the matrix A. (Of course, A can have several basis minors, but they all have the same order r.) The columns which contain the basis minor are called the basis columns.
1.93. Concerning the basis columns, we have the following important
Theorem (Basis minor theorem). Any column of the matrix A is a linear combination of its basis columns.
Proof. To be explicit, we assume that the basis minor of the matrix is located in the first r rows and first r columns of A. Let s be any integer from 1 to m, let k be any integer from 1 to n, and consider the determinant
«11	«12 '		
«21	«22 '	■ a2r	
arl	ar2  ■ ■	■ arT	ars
flax	ak2   ■ ■	' akr	
of order r -j- 1. If k < r, the determinant D is obviously zero, since it then has two identical rows. Similarly, D = 0 for s < r. If k > r and s > r, then the determinant D is also equal to zero, since it is then a minor of order r + 1 of a matrix of rank r. Consequently D = 0 for any values of k and s.
We now expand D with respect to its last row, obtaining the relation
akiAki + ak2Ak2 + • • ' + akrAkr + ah9Aks = 0, (22)
where the numbers Akl, Ak2, . . . , Akr, Aks denote the cofactors of the elements akl, ak2, .. . , akr, aki appearing in the last row of D. These cofactors
26 determinants
chap. 1
do not depend on the number k, since they are formed by using elements atj with i < r. Therefore we can introduce the notation
Ak% = cx, Ak2 — c2, • . . , Akr — cr, Aks = cs.
Substituting the values k — 1,2, ... ,« in turn into (22), we obtain the system of equations
Wii + c2al2 + • • • + cralr + csals = 0,
Cjflai + c2a22 + " • ' + cr«2j. + ^a2a = °> ^3)
Cifl»i +        + • ■ ■ + cranr + c^ns = 0.
The number cs = v4fts is different from zero, since Aks is a basis minor of the matrix A. Dividing each of the equations (23) by cs, transposing all the terms except the last to the right-hand side, and denoting — Cj/c3 by ^(j= 1,2,..., /•), we obtain
<*u = Mn + M12 + * * * + \alr, a2s = Xxa21 + X2a22 + • • • + Xra2r,
ans = Mnl + X2fl«2 H----+ Kanr-
These equations show that the sth column of the matrix A is a linear combination of the first r columns of the matrix (with coefficients Xx, X2,. . . , Xr). The proof of the theorem is now complete, since s can be any number from 1 to m. |
1.94. We are now in a position to prove the converse of Theorem 1.91 (already mentioned at the end of Sec. 1.91);
Theorem. If the determinant D vanishes, then it has at least one column which is a linear combination of the other columns.
Proof. Consider the matrix of the determinant D. Since D — 0, the basis minor of this matrix is of order r < n. Therefore, after specifying the r basis columns, we can still find at least one column which is not one of the basis columns. By the basis minor theorem, this column is a linear combination of the basis columns. Thus we have found a column of the determinant D which is a linear combination of the other columns. |
Note that we can include all the remaining columns of the determinant D in this linear combination by assigning them zero coefficients (say).
1.95. The results just obtained can be formulated in a somewhat more symmetric way. If the coefficients \, X2,. . . , Xm of a linear combination
sec. 1.9
linear dependence between columns 27
of m columns Ax, A2,.. .,Am (see Sec. 1.91) are equal to zero, then obviously the linear combination is just the zero column, i.e., the column consisting entirely of zeros. But it may also be possible to obtain the zero column from the given columns by using coefficients Xx, X2,. . . , Xm which are not all equal to zero. In this case, the given columns Ax, A2,. . . , Am are called linearly dependent. For example, the columns
1		2		1
2	,      A2 —	4	i      A3 —	1
3		6		1
4		8		1
are linearly dependent, since the zero column can be obtained as the linear combination
2-Ax- 1-^2 + 0-^3.
A more detailed statement of the definition of linear dependence is the following: The columns
		«n			«12		«ml
		«21			«22		«m2
		«„1		i       A2 —	««2	A — 5 * * * ? m	««m
are called linearly dependent if there exist numbers Xx, X2, . . . , Xm, not all equal to zero, such that the system of equation
Mil + Mli + • • ' + K<*lm = 0, M21 + M22 + * * ' + *m«2m = 0»
Mnl + M„2 + • • ■ + *m««m = 0
is satisfied, or equivalently such that
^1^1 + ^2^2 4" " " " + ^m-^m = 0,
where the symbol 0 on the right-hand side denotes the zero column. If one of the columns Ax, A2,. . . , Am, (e.g., the last column) is a linear combination of the others, i.e.,
Am = Xx^x + \2A2 + • • • + "km-iAm_ly (25)
28 determinants
chap. 1
then the columns Au A2, . . . , Am are linearly dependent. In fact, (25) is equivalent to the relation
Ml + Mi +----h K-lAm-l ~ Am = °-
Consequently, there exists a linear combination of the columns Ax, A2, . . . , Am, whose coefficients are not equal to zero (e.g., with the last coefficient equal to — 1) whose sum is the zero column; this just means that the columns Ax, A2, . . . , Am are linearly dependent.
Conversely, if the columns A x, A2, . . . , Am are linearly dependent, then (at least) one of the columns is a linear combination of the other columns. In fact, suppose that in the relation
\AX + \2A2 + • • • + Xm_1^m_1 + lmAm = 0 (26)
expressing the linear dependence of the columns Ax, A2, . .. , Am, the coefficient Xm, say, is nonzero. Then (26) is equivalent to the relation
A   - -h^a - h a hn=} a
n m —       y        1 2 _   - ^m-U
which shows that the column Am is a. linear combination of the columns Ax, A2, . . . , Am_x. Thus, finally, the columns Ax, A2, . . . , Am are linearly dependent if and only if one of the columns is a linear combination of the other columns.
1.96. Theorems 1.91 and 1.94 show that the determinant D vanishes if and only if one of its columns is a linear combination of the other columns. Using the results obtained in Sec. 1.95, we have the following
Theorem. The determinant D vanishes if and only if there is linear dependence between its columns.
1.97. Since the value of a determinant does not change when it is transposed (see Sec. 1.41), and since transposition changes columns to rows, we can change columns to rows in all the statements made above. In particular, the determinant D vanishes if and only if there is linear dependence between its rows.
PROBLEMS
1. With what sign do the terms a) a^xaA^aX4p^
appear in the determinant of order 6?
2. Write down all the terms appearing in the determinant of order four which have a minus sign and contain the factor aiZ.
problems 29
3. With what sign does the term alnaZtn^l - • • anl appear in the determinant of order nl
4. Show that of the n\ terms of a determinant of order n, exactly half (w!/2) have a plus sign according to the definition of Sec. 1.3, while the other half have a minus sign.
5. Use the linear property of determinants (Sec. 1.44) to calculate
am 4- bp an + bq cm f dp   cn + dq
A -.=
6. The numbers 20604, 53227, 25755, 20927 and 78421 are divisible by 17. Show that the determinant
	0	6	0	4
5	3	2	2	7
2	5	7	5	5
2	0	9	2	7
7	8	4	2	1
is also divisible by 17.
7. Calculate the determinants
246   427 327 1014   543 443 -342   721 621
8. Calculate the determinant
P(x) =-
1
2 - x2 3 3
A., =
1 3 1 1
2 2 1 1
3 3 5
9 - x2
1 1
4 1 1
1 1
1 1
1 1
5 1
1 6
9. Calculate the /ith-order determinant
	X	a	a •	■ ■ a
	a	X	a ■	■ a
A -	a	a	x ■	• a
	a	a	a	• X
30 determinants
chap. i
10. Prove that 1
,n-2
,«-2
A2
,n—2 vn-2
,n—\ vn—2
n—2 n
n-l n
X
11. Solve the system of equations
+ 2x2 + 3x3 + 4jc4 + 5x5 = 13, 2xx + x2 + 2x3 + 3jc4 4- 4x5 = 10, 2a:x 4- 2a;2 + x3 4 2jc4 4 3x5 = 11, 2xx 4 2x2 4 2jc3 + Xt + 2x5 = 6, 2atx -r 2*2 + 2;c3 + 2;t4 4 x5 = 3.
12. Formulate and prove the theorem which bears the same relation to Laplace's theorem as Theorem 1.52 bears to Theorem 1.51.
13. Construct four linearly independent columns of four numbers each.
14. Show that if the rows of a determinant of order n are linearly dependent, then its columns are also linearly dependent.
chapter 2
LINEAR SPACES
2.1. Definitions
2.11. In analytic geometry and mechanics one uses vectors (directed Hne segments) subject to certain suitably defined operations. The reader is undoubtedly already familiar with the meaning of the sum of two vectors and the product of a vector and a real number, operations obeying the usual laws of arithmeticf
The concept of a linear space generalizes that of the set of all vectors. The generalization consists first in getting away from the concrete nature of the objects involved (directed line segments) without changing the properties of the operations on the objects, and secondly in getting away from the concrete nature of the admissible numerical factors (real numbers). This leads to the following definition: A set K is called a linear (or affine) space over a field K if
a) Given any two elements xjeK, there is a rule (the addition rule) leading to a (unique) element x 4- V e K, called the sum of x and y;%
t For the time being, we are not concerned with the other vector operations, namely scalar and vector products. In any event, these two products cannot play as basic a role as that played by the product of a vector and a real number. In fact, the scalar product of two vectors is no longer a vector, while the operation of forming a vector product, although leading to a vector, is noncommutative.
+ Here and subsequently, we use some notation from set theory. By a e A we mean that the element a belongs to the set A; by B <= A we mean that the set B is a subset of the set A (B may coincide with A). The two relations B c A and A <= B are equivalent to the assertion that the sets A and B coincide. The symbols e and <= are called inclusion relations. The fact that a e A (or A c B) is sometimes written A 3 a (or B => A). By a $ A we mean that the element a does not belong to the set A.
31
32     linear spaces
chap. 2
b) Given any element x e K and any number X e K, there is a rule (the rule for multiplication by a number) leading to a (unique) element Xx e K, called the product of the element x and the number X;
c) These two rules obey the axioms listed below in Sees. 2.12 and 2.13.
The elements of a linear space will be called rectors, regardless of the fact that their concrete nature may be quite unlike the more familiar directed line segments. The geometric notions associated with the term "vector" will help us explain and often anticipate important results, as well as find a direct geometric interpretation (which would otherwise not be obvious) of various facts from algebra and analysis. In particular, in the next chapter we will obtain a simple geometric characterization of all the solutions of a homogeneous or nonhomogeneous system of linear equations.
2.12. The addition rule has the following properties:
1) xi/^/i-x for every x, y e K;
2) (jc + y) -f z = x + (y -f z) for every x, y, z e K;
3) There exists an element OeK (the zero vector) such that x + 0 = .v for every xeK;
4) For every x eK there exists an element / eK (thenegative element) such that x -r y ~ 0.
2.13. The rule for multiplication by a number has the following properties:
5) 1 • x — x for every xeK;
6) a(f}x) = (ol$)x for every x eK and every a, (3 e K\
7) (a + $)x = ax + fix for every x e K and every a, {3 e K;
8) 7.(x -j- y) = olx -f- &y for every x, y e K and every ael
2.14. Axioms l)-8) have a number of simple implications:
a. Theorem. The zero vector in a linear space is unique.
Proof. The existence of at least one zero vector is asserted in axiom 3). Suppose there are two zero vectors 0X and 02 in the space K. Setting x = 0lt 0 — 02 in axiom 3), we obtain
Oi + o2 = ox.
Setting x — 02, 0 = 0X in the same axiom, we obtain
02 -f 0X = 0
sec. 2.1
definitions 33
Comparing the first of these relations with the second and using axiom 1), we find that 0X = 02. |
b. Theorem. Every element in a linear space has a unique negative.
Proof. The existence of at least one negative element is asserted in axiom 4). Suppose an element x has two negatives y] and j>2. Adding y2 to both sides of the equation x+y\ = 0 and using axioms 1) — 3), we get
y* + (x + yx) = (y2 + x) + yx = 0 + yx = yx, y2 + (x + yx) = y2 + 0 - y2,
whence yx — y2. |
c. Theorem. The relation
0 • x = 0
holds for every element x in a linear space.f
Proof. Consider the element 0 • x + 1 • x. Using axioms 7) and 5), we
get
0 ■ x + 1 ■ jc = (0 + \) ■ x = I ■ x = x, 0-jc+1-jc = 0*jc+jc,
whence
x = 0 * x + x.
Let y be the negative of jc, and add y to both sides of the last equation. Then
0 = x + ^=(0-jc+x) + >' = 0- jc + (jc+>')=0-jc + 0 = 0-x, whence
0 = 0 • x. |
d. Theorem. Given any element x of a linear space, the element
y=(-\)-x
serves as the negative of x.
Proof Form the sum Using the axioms and Theorem 2.14c,
we find that
jc+7=1-jc+(-1)-jc=(1 — 1)-jc = 0-x=0. |
f In the right-hand side of the equation, 0 denotes the zero vector, and in the left-hand side the number 0.
34     linear spaces
chap. 2
e. The negative of a given element x will now be denoted by — x, since Theorem 2.14d makes this a natural notation. The presence of a negative allows us to introduce the operation of subtraction, i.e., the difference x—y is defined as the sum of x and -y. This definition agrees with the definition of subtraction in arithmetic.
2.15. A linear space over the field R of real numbers will be called real and denoted by the symbol R. A linear space over the field C of complex numbers will be called complex and denoted by the symbol C. If the nature of the elements x, y, z, .. . and the rules for operating on them are specified (where axioms l)-8) must be satisfied), then we call the linear space concrete. As a rule, such spaces will be denoted by their own special symbols.
The following four kinds of concrete spaces will be of particular importance later:
a. The space K3. The elements of this space are the free vectors studied in three-dimensional analytic geometry. Each vector is characterized by a length and a direction (with the exception of the zero vector, whose length is zero and whose direction is arbitrary). Addition of vectors is defined in the usual way by the parallelogram rule. Multiplication of a vector by a number X is also defined in the usual way, i.e., the length of the vector is multiplied by |Xj, while its direction remains unchanged if X > 0 and is reversed if X < 0. It is easily verified that all the axioms l)-8) are satisfied in this case. We denote the analogous sets of two-dimensional and one-dimensional vectors, which are also linear spaces, by V2 and Vx, respectively; Vt, V2 and V3 are linear spaces over the field R of real numbers.
b. The space Kn. An element of this space is any ordered w-tuple
of n numbers from the field K. The numbers £x, £2, • • • > are called the components of the element x. The operations of addition and multiplication by a number X e K are specified by the following rules:
(Zu £2, . . . , £„) + (?h, Y}2, . . . , >}„) = (£j + KJj, \2 + *)2, - • • , \n + >)») 0) M£l> ^2> • • • > £n) = (X£l> X^2> ■ • ■ j ^£n)- (?)
It is easily verified that axioms l)-8) are satisfied. In particular, the element 0 is the w-tuple consisting of n zeros:
0 = (0, 0,... , 0).
Actually, we dealt with elements of this space in Sec. 1.9, except that we wrote them there in the form of columns of numbers rather than rows of numbers. If K is the field R of real numbers, we write Rn instead of Kn, while if AT is the field C of complex numbers, we write Cn instead of Kn.
sec. 2.1
definitions 35
c. The space R(a, b). An element of this space is any continuous real function x = x(t) defined on the interval a < / < b. The operations of addition of functions and multiplication of functions by real numbers are defined by the usual rules of analysis, and it is obvious that axioms l)-8) are satisfied. In this case, the element 0 is the function which is identically zero. The space R(a, b) is a linear space over the field R of real numbers.
d. Correspondingly, the space C(a, b) is the space of all continuous complex-valued functions on the interval a < / < b. This space is a linear space over the field C of complex numbers.
2.16. We note that all the properties of elements of concrete spaces (e.g., the vectors of the space V3) which are based only on axioms l)-8) are also valid for the elements of an arbitrary linear space. For example, analyzing the proof of Cramer's rule for solving the system of linear equations
«11*1 + #12*2 + * * * + olnxn = bx,
#21*1        #22*2        ' ' '        #2«*n = b2,
««1*1 + #«2*2 + ' ' ' + #««*« — bn,
we observe that insofar as the quantities bx, b2,. . . , bn are concerned, the proof is based only on axioms l)-8) and the fact that these quantities can be added and multiplied by numbers in K. As has already been pointed out in Sec, 1.74, this permits us to generalize Cramer's rule to systems in which the quantities bx, b2,... , bn are vectors (elements of the space V3). Furthermore, this permits us to assert that Cramer's rule is also valid for systems in which the elements bx, b2,. . . , bn are elements of any linear space K. We note only that then the values of the unknowns xx, x2,. . . , xn are also elements of the space K, and in fact can be expressed linearly in terms of the quantities bx, b2,... , bn.
2.17. Remark. In analytic geometry, it is sometimes convenient to consider vectors which are not free but have their initial points attached to the origin of coordinates. The convenience of this approach is that every vector is then associated with a point of space, namely its end point, and every point of space can be specified by giving the corresponding vector, called the radius vector of the point. With this picture in mind, we sometimes call the elements of a linear space points instead of vectors.! Of course, this change in terminology is not accompanied by any change whatsoever in the definitions, and merely appeals to our geometric intuition.
t We then talk of the "coordinates" of a point, rather than of the "components" of a vector.
36     linear spaces
chap. 2
2.2. Linear Dependence
2.21. Let Xi, x2,. . . , xk be vectors of the linear space K over a field K, and let a19 a2,. . . , afc be numbers from X. Then the vector
/ =        + a2*2 + ' ■ ■ + *tXk
is called a //wear combination of the vectors Ji,x2)...,% and the numbers a1( a2,. . . , afc are called the coefficients of the linear combination.
If ax = a2 — • ■ ■ = afc = 0, then y = 0 by Theorem 2.14c. However, there may exist a linear combination of the vectors xlt x2, . . . , xk which equals the zero vector, even though its coefficients are not all zero. In this case, the vectors xlt x2, .. . , xk are called linearly dependent. In other words, the vectors xlt x2,. . . , xk are said to be linearly dependent if there exist numbers not all equal to zero, such that
at!*! + oc2x2 + ■ ■ • + a*** = 0. (3)
If (3) is possible only in the case where
ax = a2 = " ■ * = afc = 0,
the vectors jc19 ;t2,. .. , jc^ are said to be linearly independent (over X).
2.22. Examples
a. In the linear space K3, linear dependence of two vectors means that they are parallel to the same straight line. Linear dependence of three vectors means that they are parallel to the same plane. Any four vectors are linearly dependent.
b. We now explain what is meant by linear dependence of the vectors xly x2,. . . ,xk of the linear space Kn. Let the vector xt have components 3*1°, . . . , £(0 (/ = 1,2,...,/:). Then the linear dependence expressed by
+ *2x2 + " " " + ap:A. = 0
means that the n equations
+   + - - • + <dk) = o,
+     + ■ ■ ■ +      = o, (4)
hold, where the constants ax, a2, . . . , a*, are not all equal to zero. This is the same definition of linear dependence as that given in Sec. 1.95 for columns of numbers.
sec. 2.2
linear dependence 37
Thus the problem of whether or not the vectors jc1s x2,.. . , xk are linearly dependent reduces in the general case to the problem of whether or not there exists a nontrivial solution of the homogeneous system of equations (4),f with coefficients equal to the corresponding components of the given vectors. This problem will be solved completely in Sec. 3.21, where we will find a rule allowing us to decide whether or not given vectors in the space Kn are linearly dependent from an examination of their components.
c. In some cases, however, we can even now decide whether or not a given system of vectors is linearly dependent. For example, consider the n vectors
^ = (1,0, 0, ...,0), e2 = (0, 1,0,... ,0),
en= (0,0, 0, ... , 1)
in the space Kn. For these vectors, the system (4) has the form
<*i * 1 + a2 ' 0 4- &3 " 0 + • • ■ + art ■ 0 = 0, OH ■ 0 + a2 ■ 1 + a8 ■ 0 + • - ■ + an ■ 0 = 0,
ai ■ 0 + a2 ■ 0 + a3 - 0 + ■ ■ ■ + a„ " 1 = 0, and obviously has the unique solution
ai = a2 = " " " = a„ = 0. Thus the vectors elt e2,. . . , en in the space Kn are linearly independent.
d. Linear dependence of the vectors
Xx = Xi(t), x2 = x2('1, • - • , xk — xk(t)
in the space R(a, b) (or C(a, b)) means that the functions x^t), x2(f), . . . , xk(t) satisfy a relation of the form
Ki*i(0 + aaX2(0 + ' ' ■ + a*x*(0 = 0,
where the constants ax, a2,. . . , afc are not all equal to zero. For example, the functions
xy{t) = cos2 /,     x2(t) = sin2 /,     xz(t) = 1 are linearly dependent, since the relation
*i(0 + x2(t) - x9(t) = 0
t Concerning the terms "homogeneous" and "nontrivial," see Sec. 2.42e.
38     linear spaces
chap. 2
holds. On the other hand, as we now show, the functions 1, t, t2,. .. , tk are linearly independent. In fact, suppose there exists a relation
oc0 • 1 + ai/ + - - • + *ktk = 0. (5)
Then, by successively differentiating (5) k times, we obtain a system of k + 1 equations in the quantities <x0, ax,. . . , ocfc, with a determinant which is clearly different from zero (recall Sec. 1.55b). Solving this system by Cramer's rule (Sec. 1.75), we find that
a0 = ai = • • ■ = a* = 0.
Consequently, the functions 1, t, t2,. . . , tk are linearly independent in the space R(a, b), as asserted.
2.23. Next we note two simple properties of systems of vectors, both involving the notion of linear dependence.
a. Lemma. If some of the vectors jcl5 x2,. . . , xk are linearly dependent, then the whole system xlt x2, ■ . . , xk is also linearly dependent.
Proof. Without loss of generality, we can assume that the vectors xu x2,.. . , x}- (j < k) are linearly dependent. Thus there is a relation
where at least one of the constants a1( a2,. . . , a, is different from zero. By Theorem 2.14c and axiom 3) of Sec. 2.12, we have
ct.1x1 + a2x2 + • • • + oLjXj + 0 • xi+l + • • • + 0 • xk = 0.
But then the vectors xx, x2,. .. ,xk are also linearly dependent, since at least one of the constants 0,. . . , 0 is different from zero. |
b. Lemma. The vectors xt, x2,. . . ,xk are linearly dependent if and only if one of the vectors can be expressed as a linear combination of the others.
Proof. A similar statement has already been encountered; in fact, it was proved for columns of numbers in Sec. 1.95. Inspecting the proof given there, we see that it is based only on the possibility of performing on columns the operations of addition and multiplication by real numbers. Hence the proof can be carried through for the elements of any linear space, i.e., our lemma is valid for any linear space. |
2.3. Bases, Components, Dimension
2.31. By definition, a system of linearly independent vectors elt e2,.. . , en in a linear space K over a field AT is called a basis for K if, given any a- e K,
sec. 2.3
bases, components, dimension 39
there exists an expansion
X = + 2,2*2 + • ■ •  + lyfin (£j £Jf,/= 1,2,..., «). (6)
It is easy to see that under these conditions the coefficients in the expansion (6) are uniquely determined. In fact, if we can write two expansions
X = ^l^i -|- ^2^2 ""t" " " " ~~\~
x = *h*i + v]2e2 + • • • + r[nen for a vector jc, then, subtracting them term by term, we obtain the relation
0 = Hi ~ *h)«i + (£2 - *)2>?2 + •■■ + (£„- r\n)en,
from which, by the assumption that the vectors eu e2,. . . , en are linearly independent, we find that
The uniquely defined numbers £1, ..-■>£« , are called the components of the vector x with respect to the basis ex, e2,. . . , en.
2.32. Examples
a. A familiar basis in the space Vz is formed by the three orthogonal unit vectors i, j, k. The components £l9 J*2, £3 of a vector x with respect to this basis are the projections of x along the coordinate axes.
b. An example of a basis in the space Kn is the system of vectors
«i = (l,0.....0),
e2 = (0,1,..., 0),
eB= (0,0,... , 1), already considered in Sec. 2.22c. Indeed it is obvious that the relation x = ^(1, 0,. . . , 0) + 5>(0, 1,. . . , 0) + • • • + uo, 0,. . . , 1) holds for every vector
X = (£1, ^2) • • ■ > ^ Kn.
This fact, together with the linear independence of the vectors eu e2, . . . , en already proved, shows that these vectors form a basis in the space Kn. In particular, we see that the numbers £2,. . . , are just the components of the vector x with respect to the basis elt e2,. . . , en.
c. In the space R(a, b) there does not exist a basis in the sense defined here. The proof of this statement will be given in Sec. 2.36c.
2.33. The fundamental significance of the concept of a basis for a linear space consists in the fact that when a basis is specified, the originally abstract linear operations in the space become ordinary linear operations with
40      linear spaces
chap. 2
numbers, i.e., the components of the vectors with respect to the given basis. In fact, we have the following
Theorem. When two vectors of a linear space K are added, their components (with respect to any basis) are added. When a vector is multiplied by a number X, all its components are multiplied by X.
Proof Let
x = L^i + lie2 + ■ ■ ■ + lneHt y = 'Wi + ri2e2 + - - ■ 4- v\neH.
Then
x 4- y = (5i +        + (I* + ri2)e2 + - ■ ■ 4- (ln 4- t\n)ent Xx = X^1e1 + \\2e2 4- ' • • + *£nett> by the axioms of Sees. 2.12 and 2.13. |
2.34. If in a linear space K we can find n linearly independent vectors while every n 4- 1 vectors of the space are linearly dependent, then the number n is called the dimension of the space K and the space K itself is called n-dimensional. A linear space in which we can find an arbitrarily large number of linearly independent vectors is called infinite-dimensional.
Theorem. In a space K of dimension n there exists a basis consisting of n vectors. Moreover, any set of n linearly independent vectors of the space K is a basis for the space.
Proof Let elf e2, . . . , en be a system of n linearly independent vectors of the given w-dimensional space K. If x is any vector of the space, then the set of n 4- 1 vectors
x, e±, e2,... , en is linearly dependent, i.e., there exists a relation of the form
a0x +        + a2f?2 + ' ' " 4- a.nev = 0, (7)
where at least one of the coefficients oc0, a1(. .. , aw is different from zero. Clearly a0 is different from zero, since otherwise the vectors eu e2> ■ ■ ■ , en would be linearly dependent, contrary to hypothesis. Thus, in the usual way, i.e., by dividing (7) by <x0 and transposing all the other terms to the other side, we find that x can be expressed as a linear combination of the vectors ex, e2,. .. , en. Since x is an arbitrary vector of the space K, we have shown that the vectors eu et,. . . ,e„ form a basis for the space. |
2.35. The preceding theorem has the following converse:
Theorem. If there is a basis in the space K, then the dimension ofK equals the number of basis vectors.
sec. 2.3
bases, components, dimension 41
Proof. Let the vectors ex, e2, . . . , en be a basis for K. By the definition of a basis, the vectors ex, e2, - - - , en are linearly independent; thus we already have n linearly independent vectors. We now show that any n + 1 vectors of the space K are linearly dependent. —
Suppose we are given n + 1 vectors of the space K:
xx =        +        -f ■ ■ ■ + l{n]en, =        + lfe2 + • • • + lfent
xn+1 = l^e, + i;iB+l)*i + ■ ■ • + ^n+l>^.
Writing the components of each of these vectors as a column of numbers, we form the matrix
A =		Si" ■ I? •	
	j;(i>	l(2)    ■ ■	
with « rows and «4-1 columns. The basis minor of the matrix A (see Sec. 1,92) is of order r < n. If r = 0, the linear dependence is obvious. Let r > 0. After specifying the r basis columns, we can still find at least one column which is not one of the basis columns. But then, according to the basis minor theorem, this column is a linear combination of the basis columns. Thus the corresponding vector of the space K is a linear combination of some other vectors among the given xx, x%,. .. , jc„+1. But in this case, according to Lemma 2.23b, the vectors xu x2,. . . , xn+1 are linearly dependent. |
a. The space K3 is three-dimensional, since it has a basis consisting of the three vectors i, j, k (see Example 2.32a). Similarly, V2 is two-dimensional and Vx is one-dimensional.
b. The space Kn is w-dimensional, since it contains a basis consisting of the n vectors ex, e.^ . . . , en (see Example 2.32b).
c. In each of the spaces R(a, b) and C(a, b), there is an arbitrarily large number of linearly independent vectors (see Example 2.22d), and hence these spaces are infinite-dimensional. Therefore neither space has a basis, for the presence of a basis would contradict Theorem 2.35.
d. Every complex linear space C is obviously a real space as well, since the domain of complex numbers contains the domain of real numbers. However, the dimension of C as a complex space does not coincide with that of C as a real space. In fact, if the vectors elf. .. , en are linearly independent in C regarded as a complex space, then the vectors ex, iex,. .. , en, ien are
42     unear spaces
chap. 2
linearly independent in C regarded as a real space. Hence the dimension of C regarded as a real space is twice as large as that of C regarded as a complex space (provided the dimension is finite).
2.4. Subspaces
2.41. Suppose that a set L of elements of a linear space K has the following properties:
a) If x e L, y e L, then x + y e L;
b) If x e L and A is an element of the field K, then Axe L.
Thus L is a set of elements with linear operations defined on them. We now show that this set is also a linear space. To do so, we must verify that the set L with the operations a) and b) satisfies the axioms of Sees. 2.12 and 2.13, Axioms 1), 2) and 5)-8) are satisfied, since they hold quite generally for all elements of the space K. It remains to verify axioms 3) and 4). Let x be any element of L. Then, by hypothesis, Ax e L for every A e K. First we choose A = 0. Then, since 0 • x = 0 by Theorem 2.14c, the zero vector belongs to the set L, i.e., axiom 3) is satisfied. Next we choose A = — 1. Then, by Theorem 2.]4d, (— \ )x is the negative of the element x. Thus, if an element x belongs to the set x, so does the negative of x. This means that axiom 4) is also satisfied, so that L is a linear space, as asserted. Consequently, every set LcK with properties a) and b) is called a linear subspace (or simply a subspace) of the space K.
2.42. Examples
a. The set whose only element is the zero vector of the space K is obviously the smallest possible subspace of K.
b. The whole space K is the largest possible subspace of K.
These two subspaces of K, the whole space and the set {0} consisting of the zero vector alone, are sometimes called trivial subspaces. All the other subspaces of K are then said to be nontrivial.
c. Let Lx and L2 be two subspaces of the same linear space K. Then the set of all vectors x eK belonging to both and L2 forms a subspace called the intersection of the subspaces 1^ and L2. The set of all vectors of the form y + z> where y e L1? z e L2 forms a subspace, denoted by Lx + L3 and called the sum of the subspaces Lx and L2.
d. All the vectors in the space V3 parallel to a plane (or a line) form a subspace. If we talk about points rather than about vectors, as in Sec. 2.17, then the subspaces of Vz are the sets of points lying on some plane (or line) passing through the origin of coordinates.
sec. 2.4
subspaces 43
e. Consider the set L of all vectors      £2,. .. , £n) in the space Kn whose coordinates satisfy a system of linear equations of the form
«11*1 + «12*2 + ' ' * + «ln*n = °> #21*1 + «22*2 + ' ' * + «2**n = 0,
....................... (8)
«fcl*l + «fc2*2 + * ' ■ + «fc»*n = 0,
with coefficients in the field K and constant terms equal to zero. Such a system is called a homogeneous linear system. A homogeneous linear system is always compatible, since it obviously has the "trivial" solution
*1 = *2 = = *n 0-
Let c[u, cl21},.. . , and c[2), c(22c{2) be two solutions of this system, and form the numbers
r   _ M) _|_ J2)   r   _ Jl)    ,    J2) _    (1)    , <2>
cl — i   Li  » c2 — <-2    "T~ L2  > • ■ ■ » Ln — Ln   ~T~ Lw •
Then clearly c1? c2, . . . , cn is again a solution of the system (8). In fact, substituting these numbers into the ith equation of the system, we obtain
#nCi + ai2c2 + - ■ • + aincn
= Mrf11 + c?) + «i2(41> + 42>) + ■ ' " + ain(c^ + <42>)
= (aac[U + ai2c[» + • • • + ainc{»)
+ (a(1ci2) + a,2c22) + • • • + aincf) = 0,
as asserted; this solution will be called the sum of the solutions c[1], c(2l),. , . , c{v and c^2), c{2\. . , , c^2). Similarly, if cx, c2, , . . , cn is an arbitrary solution of the system (8), then the numbers Xc1? Xc2,. . . , \cn also form a solution of (8) for every fixed X e K; this solution will be called the product of the solution c1? c2,. . . , cn and the number X. Thus solutions of a homogeneous linear system (8) with coefficients and constant terms in a given field K can be added to one another and multiplied by numbers from the same field Ky with the result still a solution of (8). In other words, the set L is a subspace of the space Kn, and hence a linear space in its own right. We will call L the solution space of the system (8). In Sec. 3.41 we will calculate the dimension of this space and construct a basis for it.
2.43. We now consider some properties of subspaces which are related to the definitions of Sees. 2.2 and 2.3. First of all, we note that every linear relation which connects the vectors x,y,. . . , z in a subspace L is also valid in the whole space K, and conversely. In particular, the fact that the vectors xj,...,zeL are linearly dependent holds true simultaneously in the subspace L and in the space K. For example, if every set of w + 1 vectors is
44     linear spaces
chap. 2
linearly dependent in the space K, then this fact is true a fortiori in the sub-space L. It follows that the dimension of any subspace L of an n-dimensional space K does not exceed the number n. According to Theorem 2.34, in any subspace L <= K there exists a basis with the same number of vectors as the dimension o/L. Of course, if a basis elf e2, . . . , en is chosen in K, then in the general case we cannot choose the basis vectors of the subspace L from the vectors elt e2, . . . , en, because none of these vectors may belong to L. However, it can be asserted that if a basis fx,f2, ■ ■ • >fi is chosen in the subspace L (which, to be explicit, is assumed to have dimension I < n), then additional vectorscan always be chosen in the whole space K such that the system fx,f2, . . .      . . . ,fn is a basis for all ofK.
To prove this, we argue as follows: In the space K there are vectors which cannot be expressed as linear combinations of fi,f2,.. . Indeed, if there were no such vectors, then the vectors fx,f2,. . . ,f, which are linearly independent by hypothesis, would constitute a basis for the space K, and then by Theorem 2.35 the dimension of K would be / rather than n. Let fl+l be any of the vectors that cannot be expressed as a linear combination of /i,/2» - - -      Then the system./i,/2,. .. is linearly independent. In
fact, suppose there were a relation of the form
+ «3/2 +----^1/1 + v-i+ifi+i = 0.
Then if aI+1 ^ 0, the vector fl+1 could be expressed as a linear combination °f/i>/2> • ■ ■ while if ai+1 = 0, the vectorsfx,f2,. . . ,fx would be linearly dependent. But both these results contradict the construction. If now every vector of the space K can be expressed as a linear combination of fx,f2, . . . , ft,fl+l, then the system/^ f2,. . . forms a basis for K (and / + 1 = w),
which concludes our construction. If / + 1 < n, then there is a vector /I+2 which cannot be expressed as a linear combination offlff2) . . . and hence we can continue the construction. Eventually, after n — / steps, we obtain a basis for the space K.
2.44. We say that the vectors glf. . . ,gk are linearly independent over the subspace L <= K if the relation
+ " " " + *kgk E L      («1» . . . » a* e A)
imp] ies
a, = ■ ■ • = a^. — 0.
If L is the subspace consisting of the zero vector alone, then linear independence over L means ordinary linear independence. Linear dependence of the vectors gi,. . . ,gk over the subspace L means that there exists a linear combination -4- ■ • • -f- &kgk belonging to L, where at least one of the coefficients a1} , .. , <x,k is nonzero.
sec. 2.4
subspaces 45
The largest possible number of vectors of the space K which are linearly independent over the subspace L <= K is called the dimension ofK over L.
If the vectors glt . . . ,gk are linearly independent over the space L <= K and if the vectors /1( . . . ,ft are linearly independent in the subspace L, then the vectors g1} . . . , gk,f> . . . >fi are linearly independent in the whole space K. In fact, if there were a relation of the form
«1/1 +----h y-ifi + Ptfi + •' • + %gk = 0,
or equivalently
+ ■ ' " +       = -(«1/1 +----r a,/,) G L,
then
Pi = ■ ■ ■ = P* = 0,
by the assumed linear independence of the vectors glf , . . , gk over L. It follows that 0^ = ■ • • = otj = 0, by the linear independence of the vectors
fi> ■ • • ifi'
The vectors/H1, . . . ,/n constructed in Sec. 2.43 are linearly independent over the subspace L. In fact, if there were a relation of the form
with at least one of the numbers am, .. . , an not equal to zero, then the vectors/!,. . . ,/n would be linearly dependent, contrary to the construction. Hence the dimension of the space K over L is no less than n — I. On the other hand, this dimension cannot be greater than n — 1, since if n — I + 1 vectors hu . . . , hn__l+lt say, were linearly independent over L, then the vectors hlt . . . , rtn_i+i,/i» of which there are more than n, would be
linearly independent in K. Therefore the dimension of K over L is precisely n - /.
2.45. The direct sum. We say that a linear space L is the direct sum of given subspaces Lls. . . , Lm <= L if
a) For every xeL, there exists an expansion
X = Xi ~r ' ' ' ~\~ Xm,
where xx e Ll5 ,..,xmeLm;
b) This expansion is unique, i.e., if
x = *i + ' • • + xm = y\ H----—ym
where x^ e L;, yf e L3 (j = 1, . . . , m), then
^1 = J^l? ■ • * j = J^'m-
46     linear spaces
chap. 2
However, the validity of condition b) is a consequence of the following simpler condition: b') If
0 = zx + • • • + zm where z1eL1,...,zm£Lm, then
In fact, given two expansions a: — xx + ■ • - + xm, x = yx + ■ - ■ + ym, suppose b') holds. Then subtracting the second expansion from the first, we get
0 = (*i - yd + • • • + (xm - ym),
and hence xx = yu . . . , xm = ym, because of b'). Conversely, b') follows from b) if we set x = 0, xx = ■ • • = xm = 0.
It follows from condition b) that every pair of subspaces Li,. . . , Lm has only the element 0 in common. In fact, if z e L,- and z g Lfc, then using b) and comparing the two expansions
2 = 2 + 0,      zeLj,     0 e Lfc,
z = 0 + z,     0 e Lj-,     z e Lfc,
we find that z = 0.
Thus an n-dimensional space Kn is the direct sum of the n one-dimensional subspaces determined by any n linearly independent vectors. Moreover, the space Kn can be represented in various ways as a direct sum of subspaces not all of dimension 1.
2.46. Let L be a fixed subspace of an «-dimensional space Kn. Then there always exists a Subspace M <= Kn such that the whole space K„ is the direct sum ofL andM. To prove this, we use the vectors fl+1,... ,/„ constructed in Sec. 2.43, which are linearly independent over the subspace L. Let M be the subspace consisting of all linear combinations of the vectors fl+li. .. ,fn. Then M satisfies the stipulated requirement. In fact, since the vectors /i, - .. ,fn form a basis in Kn (see Sec. 2.43), every vector x eL has an expansion of the form
X = a,/i + • • ■ + ctifi + ai+i/i+i + " " ■ + an/n = y + z,
where
y =       + * • • + aj/i e L>
z = aI+1/i+1 + ' ■ ' + an/„ e M.
Moreover x = 0 implies ax = • ■ • = a„ = 0, since the vectors /i,.. - ,/„ are linearly independent. Therefore conditions a)-b') of Sec. 2.45 are satisfied, so that Kn is the direct sum of L and M.
sec. 2.4
subspaces 47
2.47. a. If the dimension of the space Lfc equals rk {k = 1,.. . , m) and if rk linearly independent vectors fkl, . . . ,fkr]e are selected in each space Lfc, then every vector x of the sum L = Lx + " " " + Lfc can be expressed as a linear combination of these vectors. Hence the dimension of the sum of the spaces Lx,. . . , Lfc does not exceed the sum of the dimensions of the separate spaces. If the sum Lx + • • • + hk is direct, then the vectors fn,. . . ,. . . , Ai, • • • .Afv • ■ ■ - ■ • ' /w„ are aH linearly independent, so that in this case the dimension of the sum is precisely the sum of the dimensions.
b. In the general case, the dimension of the sum is related to the dimensions of the summands in a more complicated way. Here we consider only the problem of determining the dimension of the sum of two finite-dimensional subspaces P and Q of the space K, of dimensions p and q, respectively. Let L be the intersection of the subspaces P and Q, and let L have dimension /. First we choose a basis ex,e2,... ,el in L. Then, using the argument of Sec. 2.43, we augment the basis ex, e2,.. . , et by the vectors ft+x,fi+2,.. . ,/„ to make a basis for the whole subspace P and by the vectors gl+1,gl+2,. .. , gq to make a basis for the whole subspace Q. By definition, every vector in the sum P + Q is the sum of a vector from P and a vector from Q, and hence can be expressed as a linear combination of the vectors
eXy - * * J *i,fi+i, . . . ,fv, gi+ii • • • » gQ. (9)
We now show that these vectors form a basis for the subspace P + Q. To show this, it remains to verify their linear independence. Assume that there exists a linear relation of the form
*xex + ' 1 " +       + Pi+i/i+i + ■ ■ *
where at least one of the coefficients ax,. . . , y, is different from zero. We can then assert that at least one of the numbers ... , y<?1S different from zero, since otherwise the vectors
eXi ■ ■ ■ ■> el,fl+l> • - - tfp
would be linearly dependent, which is impossible in view of the fact that they form a basis for the subspace P. Consequently the vector
x = Yi+tfi+i + ■ ' ' +       ^ 0, (11)
for otherwise the vectors gi+x,.. . , gQ would be linearly dependent. But it follows from (10) that
—x = tx.1e1 + ■ ■ ■ + $PfP e P,
while (11) shows that x e Q. Thus x belongs to both P and Q, and hence belongs to the subspace L. But then
x = Yi+tfu-i H----+ Y«£« = Vi +----1" Vi>
48     linear spaces
chap. 2
and since the vectors
ex, . . .
are linearly independent, we have
Yh-i
= • ■ • = Y„
= 0.
This contradiction shows that the vectors (9) are actually linearly independent, and hence form a basis for the subspace P + Q. It follows from Theorem 2.35 that the dimension of P + Q equals the number of basis vectors (9). But this number equals p + q — /. Thus, finally, the dimension of the sum of two subspaces is equal to the sum of their dimensions minus the dimension of their intersection.
c. Corollary. Let RJ} and RQ be two subspaces of dimensions p and q, respectively, of an n-dimensional space Rft, and suppose p -\- q > n. Then the intersection of Rp and R9 is of dimension no less than p -f q — n.
2.48. Factor spaces
a. Given a subspace L of a linear space K, an element x e K is said to be comparable with an element y eK (more exactly, comparable relative to L) if x — y e L. Obviously, if x is comparable with y, then y is comparable with x, so that the relation of comparability is symmetric. Every element x e K is comparable with itself. Moreover, if x is comparable with y and y is comparable with z, then \ is comparable with z, since
b. The set of all elements y eK comparable with a given element x e K is called a class, and is denoted by X. As just shown, a class X contains the element x itself, and every pair of elements y e X, z e X are comparable with each other. Moreover, if u $ X, then u is not comparable with any element of X. Therefore two classes either have no elements in common or else coincide completely. The subspace L itself is a class. This class is denoted by 0, since it contains the zero element of the space K.
c. The whole space K can be partitioned into a set of nonintersecting classes X, Y,. . . . This set of classes will be denoted by K/L. We now introduce linear operations in K/L as follows: Given two classes X, Y and two elements a, (} of the field K, we wish to define the class ocX 4- fJY. To do this, we choose arbitrary elements xeX,jeY and find the class Z containing the element z = y.x + $y. This class is then denoted by aX + fJY. Clearly, aX + fiY is uniquely defined. In fact, suppose we choose another element xx of the class X and another element yx of the class Y. Then
x — z = (x — y) + (y — z) e L.
(qui -f ß>'!) - (7.x + ß>0 = a(xx - x) + ßCvx - y)
sec. 2.5
linear manifolds 49
belongs to the space L, since xx — x and — y both belong to L. It follows that a*! 4- fJyj belongs to the same class as ax +
In particular, the above prescription defines addition of two classes X and Y, as well as multiplication of a class by a number a g K. We now show that these operations obey the axioms of a linear space, enumerated in Sees. 2.12 and 2.13. In fact, the validity of axioms 1) and 2) of Sec. 2.12 and axioms 5)-8) of Sec. 2.13 for classes follows at once from their validity for elements of the space K. Moreover, the zero element of the space K/L is the class 0 (consisting of all elements of the subspace L), while the inverse of the class X is the class consisting of all inverses of elements of the class X. Thus axioms 3) and 4) of Sec. 2.12 are also satisfied for the set of classes K/L. The resulting linear space K/L is called the factor space of the space K with respect to the subspace L.
2.49. Theorem. Let K = K„ be an n-dimensional linear space over the field K, and let L = Lj <= K be an l-dimensional subspace of K. Then the factor space K/L is of dimension n — I.
Proof. Choose any basis/ls. . . ,/; e L, and augment it, as in Sec. 2.43, by vectors fH1, ...,/„ to make a basis for the whole space K. Then the classes Xz+1 a ft+1,. . . , X„ sfn form a basis in the space K/L. To see this, we note that given any x e K, there is a representation
6=1
and hence a representation
n
x = 2 afcxfc
for the class X 3 x. Moreover, the classes X;a l5 .. . , Xn are linearly independent. In fact, if
amXM + --, + «BX„=0eK/L for any a^1? .. . , a„ in K, then, in particular, there would be a relation
ai_i/I+i 4- • • ■ 4- a„/„ e L.
But fl+1, .. . ,/„ are linearly independent over L (see Sec. 2.44), and hence a!+i = - - - = a„ = 0, as required. Thus the n — / classes X?+1,. . . , Xn form a basis in K/L. It follows from Theorem 2.35 that K/L is of dimension it-/. |
2.5. Linear Manifolds
2.51. An important way of constructing subspaces is to form the linear manifold spanned by a given system of vectors. Let .v, y, z, . .. be a system
50     linear spaces
chap. 2
of vectors of a linear space K. Then by the linear manifold spanned by x,y, z,. .. is meant the set of all (finite) linear combinations
with coefficients %, fj, y,. . . in the field K. It is easily verified that this set has properties a) and b) of Sec. 2.41. Therefore the linear manifold spanned by a system x, y, z,. .. is a subspace of the space K. Obviously, every subspace containing the vectors x,y, z, .. . also contains all their linear combinations (12). Consequently, the linear manifold spanned by the vectors x,y, z,. . . is the smallest subspace containing these vectors. The linear manifold spanned by the vectors x, y, z,... is denoted by L(x, y, z,. . .).
2.52. Examples
a. The linear manifold spanned by the basis vectors elf e2, . . . , en of a space K is obviously the whole space K.
b. The linear manifold spanned by two (noncollinear) vectors of the space Vz consists of all the vectors parallel to the plane determined by the two vectors.
c. The linear manifold spanned by the system of functions 1, t, t2t.. . , tk of the space K(a, b) (K is R or C) consists of the set of all polynomials in t of degree no higher than k. The linear manifold spanned by the infinite system of functions 1, t2,. . . consists of all polynomials (of any degree) in the variable t with coefficients in the field K.
2.53. We now note two simple properties of linear manifolds.
a. Lemma. If the vectors x', y', .. . belong to the linear manifold spanned by the vectors x,y,. .. , then the linear manifold L(x, y, . . .) contains the whole linear manifold L,(x',y', .. .).
Proof. Since the vectors x',y',. .. belong to the subspace L(x,y,. ..) then all their linear combinations, whose totality constitutes the linear manifold L(x', y',. ..), also belong to the subspace L(jc, y,.. .). |
b. Lemma. Every vector of the system x,y,. . . which is linearly dependent on the other vectors of the system can be eliminated without changing the linear manifold spanned by x, y,. . ..
Proof. If the vector x, say, is linearly dependent on the vectors^, z,.. . , this means that x e L(j>, z,. . .). It follows from Lemma 2.53a that
ax + ß>- + yz + • • ■
(12)
L(x,7, z, . . .) c L(j>, z
5
On the other hand, obviously
L(>% z,. ..) <= L(x,j>, z,. ..).
sec. 2.6
hyperplanes 51
Together these two relations imply
LO, z, . ..) = L(x,y, z, . . .). |
2,54. We now pose the problem of constructing a basis for a linear manifold and determining the dimension of a linear manifold. In solving this problem, we will assume that the number of vectors x,y, . . . spanning the linear manifold L(jc, y,. ..) is finite, although some of our conclusions do not actually require this assumption.
Suppose that among the vectors x, y,. . . spanning the linear manifold L(x,y, . ..) we can find r linearly independent vectors xx, x2, .. . , xr, say, such that every vector of the system x,y,. . . is a linear combination of xx, x2, .. . , xr. Then the vectors xx, x2, .. . , xr form a basis for the space ~L(x,y,. . .). Indeed, by the very definition of a linear manifold, every vector z e~L(x,y,. . .) can be expressed as a linear combination of a finite number of vectors of the system x,y,. . . . But, by hypothesis, each of these vectors can be expressed as a linear combination of x1( x2, ... , xr. Thus eventually the vector z can also be expressed as a linear combination of the vectors xx, x2,... , xr. This, together with the assumption that the vectors xx, x2,. . . , xr are linearly independent, shows that xx, x2,... , xr indeed form a basis, as asserted.
According to Theorem 2.35, the dimension of the space ~L(x,y,.. .) is equal to the number r. Since there can be no more than r linearly independent vectors in an r-dimensional space, we can draw the following conclusions:
a. If the number of vectors x,y,. . . spanning ~L(x,y,. ..) is larger than the number r, then the vectors x,y, . .. are linearly dependent. If the number of these vectors equals r, then the vectors are linearly independent.
b. Every set ofr + 1 vectors from the system x, y,. .. is linearly dependent.
c. The dimension of the space L,(x, y,. . .) can be defined as the maximum number of linearly independent vectors in the system x, y, . . . .
2.6. Hyperplanes
2.61. As already noted in Sec. 2.42d, if we adopt the "point" rather than the "vector" interpretation in the space K3, then the geometric entity corresponding to the notion of a subspace is a plane (or a straight line) passing through the origin of coordinates. But it is also desirable to include in our scheme of things planes and straight lines which do not pass through the origin of coordinates. Noting that such planes and straight lines are obtained from planes and straight lines passing through the origin of coordinates by means of a parallel displacement in space, i.e., by a shift, we are led in a natural way to the following genera] construction:
52     linear spaces
chap. 2
Let L be a subspace of a linear space K, and let x0 e K be a fixed vector which in general does not belong to L. Consider the set H of all vectors of the form
x = Xo + y
where the vector y ranges over the whole subspace L. Then H is called a hyperplane^ more specifically, the result of shifting the subspace L by the vector x0. We note that in general a hyperplane is itself not a linear space.
2.62. Examples
a. In the space K3 the set of all vectors starting from the origin of coordinates and terminating on a plane y forms a hyperplane. It is easily verified that this hyperplane is a subspace if and only if the plane y passes through the origin of coordinates.
b. In the space Kn consider the set H consisting of the vectors x = <;2, • • • > <U) whose components satisfy the compatible nonhomogeneous
system of linear equations
tfii-Vi + 012*2 + • • • t- alnx„ = bu
a21\\ -f fl22.v2 -J • • • + a2nxn = b2i ^3
tfi-i*i + tf*2-v-2 + • * * + aknxn = bk,
and the set L consisting of the vectors y = (r;i, yj2,... , vjn) whose components satisfy the homogeneous system of linear equations with the same coefficients:
«ii>'i + «i2>'2 H-----i- fli«3'n = 0»
«2i>'i + ^22^2 + " ' t a2nyn = 0, ,
Wi + aksy* -+'■■ + ak„yn = 0.
As we already know from Example 2.42e, the set L is a subspace of the space Kn. Let a-0 = If,. . . , If) be a solution of the system (13). Then the set H is identical with the set of all sums a"0 -f y where y ranges over the whole subspace L. In fact, if y = (rlt, -/]2, . . . , rh) is a solution of the system (13'), then the vector
x = a-o + y = ttf - 7]!. If + r,2, • • ■ , 5?» + rin)
is obviously a solution of the system (13), i.e., belongs to the set H. Conversely, if a" is any vector of the set H, then the difference y = x — x0 certainly satisfies the system (13'), i.e., the vector y belongs to the subspace
sec. 2.7
morph1sms of linear spaces 53
L. In view of the definition given above, the set H is a hyperplane, namely the result of shifting the space L by the vector jc0.
2.63. We can assign a dimension to every hyperplane, even if it is not a subspace, i.e., we consider the dimension of the hyperplane H to be equal to the dimension of the subspace L from which H was obtained by shifting. For this definition to be suitable, we must show that the given hyperplane H can be obtained as a shift of only one subspace. To prove this, suppose H is both the result of shifting the subspace L by the vector x0 and the result of shifting the subspace L' by the vector x$. Then for any z e H we have both 2 = *0 + y where y e L and z = x$ + y where y' e L'. It follows that L' is the set of vectors of the form y = (x0 — x$) + y where y is an arbitrary vector in L, i.e., the subspace L' is the result of shifting the subspace L by the vector xx = x0 — x$. Clearly xx belongs to the subspace L. In fact, the zero vector, just like any other element of the space L', can be represented in the form jq + yx whereyx e L (since L' is the subspace L shifted by the vector Jtx). Therefore xx ~ —yx, so that xx e L, as asserted. But then every vector y' e L' also belongs to the subspace L, since y' is the sum of a vector x^L and a vector^ e L. It follows that V <= L. Because of the complete symmetry of the hypothesis, we can prove similarly that L <= L'. Together with L' <= L, this implies L = L', as required.
In what follows, hyperplanes of dimension 1 will be called straight lines, and hyperplanes of dimension 2 will be called planes.
2.7. Morphisms of Linear Spaces
2.71. Let co be a rule which' assigns to every given vector x' of a linear space K' a vector ,y" in a linear space K". Then co is called a morphism (or linear operator)^ if the following two conditions hold:
a) co(x' + /) = w(.y') + «(/)for every   y'E K'>"
b) co(a.y') = aco(jc') for every x' e K' and every cue K.
A morphism co mapping the space K' onto the whole space K" is called an epimorphism. A morphism co mapping K' onto part (or all) of K" in a one-to-one fashion (so that x'^ y' implies co(x') to(j/)) is called a mono-morphism. A morphism co mapping K' onto all of K" in a one-to-one fashion (i.e., a morphism which is both an epimorphism and a monomorphism) is called an isomorphism, and the spaces K' and K" themselves are said to be isomorphic (more exactly, K-isomorphic). The usual notation for a morphism is
co:K'^K".
t More exactly, a morphism of K' into K" (or a linear operator mapping K' into K").
54     linear spaces
chap. 2
2.72. Examples
a. Let L be a subspace of a space K. Then the mapping co which assigns to every vector xeL the same vector x e K is a morphism of L into K, and in fact a monomorphism (but not an epimorphism if L 7^ K). This morphism is said to embed L in K.
b. Let L be a subspace of a space K, and let K/L be the factor space of K with respect to L (see Sec. 2.48). Then the mapping co which assigns to every vector x e K the class X e K/L containing x is a morphism of co into K/L, and in fact an epimorphism (but not a monomorphism if L 0). This morphism co is called the canonical mapping ofK onto K/L.
2.73. a. Let the space K' be ^-dimensional with basis e'v . .. , e'n, and choose n arbitrary vectors e\, .... e"n in K". With every given vector
*'=i 5*;
fc=i
in K' we associate the vector
^(x') = x" = 2lA
in K" with the same components £fc (k = 1,. . . , «). Then the mapping co(a-') = x" is a morphism of the space K' into the space K". In fact, given any two vectors
n n k=l fc=l
in K', it follows from Theorem 2.33 that
x' + / =%(Zk + v\h)e'k.
But
Jt=l fc=l by the definition of the mapping co, and moreover
+ /) = 2(5* + >kK =2^4 + Iwl =      + cooo, fc=i fc=i fc=i
so that condition a) of Sec. 2.71 is satisfied. Similarly,
co(ax') = co^aj^e^j - w||a^ij
= 2a^ = a2^4 - aco(x')
sec. 2.7
morphisms of linear spaces 55
for every a e K, so that condition b) is also satisfied. Therefore w is a morphism of K' into K", as asserted.
b. Obviously, the morphism co just described is an epimorphism if and only if every vector x" e K" can be represented in the form
k=l
i.e., if and only if K" coincides with the linear manifold spanned by the vectors e" e"
c. Similarly, our morphism co is a tnonomorphism if and only if every pair of vectors
n n k=l fc=l
differing in at least one component (i.e., such that J-fc ^ vjfc for at least one value of k) are distinct vectors of K". But this is equivalent to linear independence of the vectors e",. . . , e"n. Therefore the morphism co is a monomorphism if and only if the vectors e"v . . . , e"n are linearly independent.
d. It follows that the morphism co described above is an isomorphism if and only if the vectors e'[, .. . , e"n are linearly independent and the linear manifold spanned by them coincides with the whole space K". In other words, the morphism co is an isomorphism if and only if the vectors e"v . . . , e"n form a basis in the space K".
2.74. Theorem. Any two n-dimenshnal spaces K' and K" (over the same field K) are K-isomorphic.
Proof. Let e[, . . . , e'n be a basis in the space K' and e"lt. . . , e"n a basis in the space K", and use these two systems of vectors to construct a morphism co of K' into K" in the way described in Sec. 2.73a. Then co is an isomorphism, by Sec. 2.73d. |
2.75. Corollary. Every n-dimensional linear space over a field K is K-isomorphic to the space Kn of Sec. 2.15b. In particular, every n-dimensional complex space is C-isomorphic to the space Cn, and every n-dimensional real space is R-isomorphic to the space Rn.
2.76. We now discuss further properties of epimorphisms and mono-morphisms.
a. Given a morphism co:K'—vK", consider the set L" of all vectors co(x') e K" such that x' e K'. The set L", which is obviously a subspace of K", is called the range of the morphism co. It is clear that the mapping co
56     linear spaces
chap. 2
of K' into L" is an epimorphism. If the morphism co :K' K" is a monomor-phism, then the morphism co :K'    V is an isomorphism.
b. Given a morphism co:K'-*-K", consider the set V of all vectors x' e K' such that co(x') = 0. The set V, which is obviously a subspace of K', is called the null space (or kernel) of the morphism co.
We now construct the factor space K'jV (see Sec. 2.48). All the elements x' belonging to the same class X' e K'jV are carried by the morphism co into the same element of the space K". In fact, given two such elements x' and y', we have x'-/ = z'eL', and hence
co(jc') - co(/) = co(z') -■ 0,      co(jc') = co(/).
Suppose that with every class X' e K'jV we associate the element x" — co(jc') e K" where x is an arbitrary element of X' (as just shown x" is uniquely determined). Let x" = O(X'). Then it is easy to see that Q is a morphism of K'jV into K". Moreover Q is a monomorphism, since it follows from X' ^ Y', x'eX',y'eY' that
O(X') - Q(Y') = co(x') - co(/) = co(x' - /) ^ 0.
Thus any morphism co :K' —> K" generates a monomorphism Q:K'/L' —*~ K". If the morphism co is an epimorphism, then, obviously, the monomorphism CI is also an epimorphism, so that the epimorphism co:K'—^K" generates an isomorphism Cl:K'jV K".
We will continue the study of morphisms in Chapter 4.
PROBLEMS
1. Consider the set of vectors in the plane whose initial points are located at the origin of coordinates and whose final points lie within the first quadrant. Does this set form a linear space (with the usual operations)?
2. Consider the set of all vectors in the plane with the exception of the vectors which are parallel to a given straight line. Does this set form a linear space?
3. Consider the set P consisting of the positive real numbers only. We introduce operations according to the following rules: By the "sum" of two numbers we mean their product (in the usual sense), and by the "product" of an element r e P and a real number X we mean r raised to the power X (in the usual sense). If P a linear space (with these operations)?
4. Show that a criterion for the linear independence of n given vectors in the space Kn is that the determinant formed from the coordinates of the vectors does not vanish.
5. Show that the functions tr>>, fl,. . . , fk are linearly independent in the space K(a, b), where 0 < a < b and rlt r2,... , rk are distinct real numbers.
problems 57
6. The following is known about a system of vectors el5 e2, . . . , en in a linear space k:
a) Every vector xeK has an expansion of the form
x = £1*1 + Z2e2 + ■ ■ ■ +
b) This expansion is unique for some fixed vector x0 e k. Show that the system elt e2,. . . ,en forms a basis in k.
7. Does there exist a basis in the space P of Problem 3 ?
8. What is the dimension of the space P of Problem 3 ?
9. Find the intersection and sum of two distinct two-dimensional subspaces of the space Vz (two distinct planes passing through the origin of coordinates).
10. Prove that if the dimension of the subspace l <= k is the same as that of the space k, then l = k.
11. Is the shift vector x0 figuring in the construction of a hyperplane uniquely determined by the hyperplane itself?
12. Show that every hyperplane h c k has the following property: If x e h, y e h, then ouc + (1 — a)y e h for every element of the field K. Conversely, show that if a subset h c k has this property, then h is a hyperplane. What geometric characteristic of a hyperplane is expressed by this property?
13. The hyperplanes Hx and h2 have dimensionsp and q, respectively. What is the (smallest) dimension which the hyperplane h3 must have in order to be sure to contain both Hx and h2?
14. Solve the analogous problem for three hyperplanes h1( h2 and h3, with dimensions p, q and r, respectively.
15. According to Theorem 2.74, the one-dimensional spaces 7?x and P (see Problem 3) are isomorphic. How can one establish this isomorphism in practice ?
chapter 3
SYSTEMS OF LINEAR EQUATIONS
3.1. More on the Rank of a Matrix
3.11. We have already touched upon the subject of matrices several times. In this section we will study in more detail those properties of matrices which are connected with the concept of rank (see Sec. 1.9). This will allow us to give a genera] solution of the basic problems of the theory of systems of linear equations, posed in Sec. 1.2.
We begin by recalling some basic definitions from Sec. 1.9. Suppose we have a matrix
all     a12     ' alk a21     aZ2     ' ' '     a2k ,
anl    an2    ' ' ' ank
with n rows and k columns, consisting of the numbers from the field K> where i is the row index ranging from 1 to n and j is the column index ranging from 1 to kf[ If we choose any m rows and m columns of this matrix, then the elements which appear at the intersections of these rows and columns
t Sometimes the indices of an element of the matrix A will be written? differently, i.e., sometimes we will denote the element appearing in the ith row andy'th column of A by the symbol a\.
58
SEC. 3.1
MORE ON THE RANK OF A MATRIX 59
form a square matrix of order m. The determinant of this matrix is called a minor of order m of the matrix A. The integer m is said to be the rank of the matrix A if A has a nonvanishing minor of order r and all its minors of order r + 1 and higher vanish. If the matrix A has rank r > 0, then each of its nonvanishing minors of order r is called a basis minor. The columns and rows of the matrix which intersect at the elements of the basis minor are called the basis columns and basis rows.
The considerations that follow are based on the possibility of regarding any column of numbers as a geometric object, i.e., as a vector in the n-dimensional space Kn of Sec. 2.15b. With this geometric interpretation, the matrix A itself corresponds to a certain set of k vectors of the space Kn. Let xj, (/ — 1, . . . , k) denote the vector corresponding to theyth column of A. Then any linear relation between the columns of A can be interpreted as the same linear relation between the corresponding vectors (see Sec. 2.22b).
Let Lfa, x2, . . . , xk) be the linear manifold spanned by the vectors Xi, x>, . . . , xk of Kn (see Sec. 2.51). We now prove that the vectors corresponding to the basis columns of the matrix A form a basis for this linear manifold. To be explicit, suppose that the first r columns of A are basis columns. Then, to prove our assertion, it suffices to show first that the vectors xx, x2, . . . , xr are linearly independent, and secondly that any of the other vectors xr+li>. . . , xn is a linear combination of the first r vectors (see Sec. 2.54). To prove the first assertion, suppose that the vectors jc1s x2, . . . , xr are linearly dependent, or equivalently, that the first r columns of A are linearly dependent. Then, by Theorem 1.96, any determinant of order r constructed from these columns and any r rows of A would vanish. In particular, the basis minor of A would vanish, contrary to its very definition. This contradiction establishes the first assertion. The second assertion, as applied to columns of the matrix A, has already been proved in Sec. 1.93 under the guise of the "basis minor theorem." This completes the proof that the vectors xls x2,. .. , xr form a basis for the space L(xlt x2,. . . , xk). According to Theorem 2.35, the dimension of this space equals the number r, i.e., the rank of the matrix A. Thus we have established the following important
Theorem. The dimension of the linear manifold spanned by the vectors corresponding to the columns of the matrix A equals the rank of A. Moreover, the vectors corresponding to the basis columns of A form a basis for this linear manifold.
3.12. The following propositions are obvious consequences of conclusions a)-c) of Sec. 2.54:
a. Theorem. If the rank of the matrix A is less than the number of columns in A (r < k), then the columns of A are linearly dependent. If the rank of A
60     systems of linear equations
chap. 3
equals the number of columns in A (r = k), then the columns of A are linearly independent.
b. Theorem. Any r + 1 columns of the matrix A are linearly dependent.
c. Theorem. The rank of any matrix A equals the maximum number of linearly independent columns in A.
This last theorem is of fundamental importance, since it constitutes a new definition of the rank of a matrix.
3.13. Suppose we transpose the matrix A, i.e., suppose we go over to the matrix A' whose rows are the columns of A (cf. Sec. 1.41). Clearly, the rank of the transposed matrix A' is the same as the rank of A. But according to Theorem 3.12c, the rank of A' equals the maximum number of linearly independent columns in A', or equivalently, the maximum number of linearly independent rows in A. Thus we arrive at the following somewhat unexpected conclusion:
Theorem. The maximum number of linearly independent rows in a matrix A is the same as the maximum number cf linearly independent columns in A.
We note that this theorem is not trivial. In fact, any direct proof of the theorem would require a chain of reasoning equivalent to the proof of Theorems 1.93 and 3.11.
3.14. Finally we note the following result, which is a consequence of Theorem 3.11 and Lemma 2.53b:
Theorem. Any column of the matrix A which is a linear combination of the other columns can be deleted without changing the rank of A.
3.2. Nontrivial Compatibility of a Homogeneous Linear System
3.21. Suppose we have a homogeneous linear system
"n^i ~\~ a\2.x% H- ' " ' t aXnxn = 0,
a21X! -f «22^2 + " ■ " + o2nxn = 0, (2)
akixi ~\~ aK2x2 + ' " ' t aknxn — 0. As we know, this system is always compatible, since it has the trivial solution
SEC. 3.3 THE COMPATIBILITY CONDITION FOR A GENERAL LINEAR SYSTEM 61
The basic problem encountered in studying homogeneous linear systems is the following: Under what conditions is a homogeneous linear system "non-trivially compatible" i.e., under what conditions does such a system have solutions other than the trivial solution1} The results of Sec. 3.1 allow us to solve this problem immediately. In fact, as we have seen in Sec. 2.22b, the existence of a nontrivial solution of the system (2) is equivalent to the columns of the matrix
au   a]2   ■ ■ ■ aln
^ _ #21     #22     " " " #2«
#fcl     #Jfc2 " akn
being linearly dependent. But, according to Theorem 3.12a, this occurs if and only if the rank of the matrix A is less than the number of columns in A. Thus we obtain the following
Theorem. The system (2) is nontrivially compatible, i.e., has nontrivial solutions if and only if the rank of the matrix A is less than n. If the rank of the matrix A equals n, the system (2) has no nontrivial solutions.
3.22. In particular, if the number of equations in the system (2) is less than the number of unknowns (k < n), the rank of the matrix A is certainly less than n, and in this case nontrivial solutions always exist. If k — n, the question of whether or not nontrivial solutions exist depends on the value of det A. If det A 7^ 0, there are no nontrivial solutions (r = n), while if det A = 0, there are nontrivial solutions (r < n). If k > n, we have to examine all possible determinants of order n which are obtained by fixing any n rows of the matrix A. If all these determinants vanish, then r < n and nontrivial solutions exist. If at least one of these determinants isnonvanishing, then r = n and there is only the trivial solution.
3.3. The Compatibility Condition for a General Linear System
3.31. Suppose we have a general (i.e., nonhomogeneous) system of linear equations
#11*1 ~\~ #12*2 ~~\~ ' ' ' ~\~ alnXn ~ #21*1 ~\~ #22-^2 ~H " ' " H~ #2n*n ~ ^2?
#fcl*l + #&2*2 + * * ' + aknXn ~ ^Jfc-
62     SYSTEMS OF LINEAR EQUATIONS
With this system we associate two matrices, the matrix
CHAP. 3
«11 «12 * * " «ln «21    «22    * * " «2n
«fcl    «Jfc2    " " " akn
called the coefficient matrix of the system (3), and the matrix
«11 «12
«21 «22
«fcl ak2
«1« h
a
2n
akn h
called the augmented matrix of the system (3). Regarding the compatibility of the system (3), we then have the following basic
Theorem {Kronecker-Capelli). The system (3) is compatible if and only if the rank of the augmented matrix of the system equals the rank of the coefficient matrix.
Proof. Assume first that the system (3) is compatible. Then if cu c2, . .. , cn is a solution of the system, we have the equations
«u<a + a12c2 + ■ • • + aXncn = 6i,
«21^1 ~\~ «22^2 ~\~ ' ' ' ~\~ «2«^n ~ &2>
«*i<a + «^2 + ■ ■ * + akncn — bk.
These equations imply that the last column of Ax is a linear combination of the other columns of Ax (with coefficients clt c2,. . . , cn). By Theorem 3.14, we can delete the last column of Ax without changing its rank. But when the last column of Ax is deleted, it becomes just A. Hence if the system (3) is compatible, the matrices A and Ax have the same rank.
We now assume that the matrices A and Ax have the same rank, and show that the system (3) is compatible. Let r be the rank of the matrix A (and consequently also of the matrix Ax). Consider r basis columns of A; they will also be basis columns of Ax. By Theorem 1.93, the last column of AY can be written as a linear combination of the basis columns, and hence it can be written as a linear combination of all the columns of A. If we
sec. 3.4
the general solution of a linear system 63
denote the coefficients of this linear combination by clt c2f. . . , cn, we find that the equations
a\\C\ + a12c2 + ■ • ■ + alncn = blt a2iCi + «22^2 + • • • + a2ncn = b2,
«jtici + ak2c% + ■ ■ • 4- akncn = bk are satisfied. Thus the values
-^1 " ^i, X2 = C2, ■ ■ . , Xn ' — cn
satisfy the system (3), which is therefore compatible. |
3.4. The General Solution of a Linear System
3.41. The Kronecker-Capelli theorem, which gives the general condition for the compatibility of a linear system, does not give a method for solving the system. We now derive a formula which constitutes a general solution of a linear system.
By a general solution of the system (3) we mean a set of expressions
*i = /i(flu» ■ - • > akn, bi, ■ ■ ■ > h, qi, ■ ■ ■ , q,)      (/ = 1, • ■ ■ , n),
where the right-hand sides are functions depending on the coefficients ati of the system (3), the constant terms bi of (3) and certain undetermined parameters qx, . . . , qs, such that
1) The quantities x^ cj (j = 1, . . . , n) obtained for arbitrary fixed values of the parameters qt, . . . , qs (from the field A") constitute a solution of the system (3);
2) Any given solution of the system (3) can be obtained in this way by suitably choosing the values of the parameters q1} . . . ,qs in K.
As shown in Sec. 2.62b, the set of all sums of the form x0 4- y, where x0 is any ("particular") solution of the system (3) and y ranges over the set of all solutions of the corresponding homogeneous system, is just the set of all solutions of (3). This fact can now be expressed as follows: The general solution of the nonhomogeneous system (3) is the sum of any particular solution of (3) and the general solution of the corresponding homogeneous system (2).
Suppose we have a compatible linear system (3) with a coefficient matrix A — of rank r. It can be assumed that the basis minor M of the matrix A appears in its upper left-hand corner; otherwise, we can achieve this configuration by interchanging rows and columns of A, which corresponds
64     SYSTEMS OF LINEAR EQUATIONS
CHAP. 3
to renumbering some of the equations and unknowns in the system (3). We take the first r equations of the system (3) and rewrite them in the form
«11*1 + «12*2 ~----+ «lr*r = *1 ~ «l,r+l*r+l----- «l»*n,
«21*1       «22*2 ~T " " " ~T «2r*r —* ^2       «2 r+l*r+l       " «2n*n?
(4)
«rl*l ^   «r2*2 "T " " ' "T «rTXr — br      dr r+1Xr+1       * * - arnXn.
Next we assign the unknowns xT+1, . . . , xn completely arbitrary values cT+1, . . . , cn. Then (4) becomes a system of r equations in the r unknowns xu x2, . .. , xr, with a determinant M which is nonvanishing (a basis minor of the matrix A). This system can be solved by using Cramer's rule (see Sec. 1.73). Hence there exist numbers cl5 c2,. . . , cn which, when substituted for the unknowns x1? x2, . . . , xn of the system (4), reduce all the equations of the system to identities. We now show that these values c1? c2, . . . , cn satisfy all the other equations of the system (3) as well.
The first r rows of the augmented matrix Ax of the system (3) are basis rows of this matrix, since by the compatibility condition, the rank of the augmented matrix is r, while by construction, the nonvanishing minor M appears in the first r rows of A±. By Theorem 1.93 (applied to rows), each of the last n — r rows of Ax is a linear combination of the first r rows. This means that every equation of the system (3) beginning with the (r + l)st equation is a linear combination of the first r equations of the system. Therefore, if the values
Xi ~ Ci,. .. , yn = cn
satisfy the first r equations of the system (3), they also satisfy all the other equations of (3).
3.42. To write an explicit formula for the solution of the system (3) just constructed, let Mfaj) denote the determinant obtained from the basis minor
M — det ||aw||      (i,j = 1, 2,. . . , r)
by replacing its y'th column by the column consisting of the quantities a1( a2, . . . , ar. Then, using Cramer's rule to write the solution of the system (4), we obtain
c, = ~~ Mjibi ~ aiT+1cr+1 ----- aincn) M
^ ~ WAK) ~ Cr+iM^i.r+i) ----- c„M3.(#in)}      (y = 1, 2, . . ., r).
(5)
sec. 3.5
geometric properties of the solution space 65
These formulas express the values of the unknowns x, = c3(j = 1, 2, . . . , r) in terms of the coefficients of the system, the constant terms and the arbitrary quantities (parameters)
Finally, we show that (5) comprises any solution of the system (3). In fact, let c[0), c20>,.. . , cj.0>, cj°>r . . . , c^0' be an arbitrary solution of the system (3). Obviously, it is also a solution of the system (4). But, using Cramer's rule to solve the system (4), we obtain unique expressions for the quantities c[°\ c20),.. . , c(r0) in terms of the quantities c^,. .. , c^\ namely the formulas (5). Thus, choosing
c       _ -(0) c .(0)
in (5), we get just the solution c[0), c20), .. . , c(°(, as asserted. Thus (5) is the genera] solution of the system (3).
3.5. Geometric Properties of the Solution Space
3.51. Consider first the case of the homogeneous linear system (2). As we have already seen (Sec. 2.42e), the set of all solutions of this system forms a linear "solution space," which we denote by L. We now calculate the dimension of L and construct a basis for L.
For a homogeneous system, the equations (5) become
-Mci --- cr+1M3(alir+1) + ■ ■ ■ + cnM3-(ain)      (j = 1, 2, . . . , r), (6)
since Mjibt) = Mj(0) = 0. With every solution cl5 c2, ■ ■ . , cr, cr+1, . . . , cn of the system (2) we associate a vector (cr+1, . . . , cn) of the space Kn^r (see Sec. 2.15b). Since the numbers cr+1,. .. , cn can be chosen arbitrarily and since they uniquely define a solution of the system (2), the correspondence between the space of solutions of the system (2) and the space Kn_T is one-to-one. This correspondence is an isomorphism, since it preserves linear operations, as is easily verified. Thus the space L of solutions of a homogeneous system of linear equations in n unknowns with a coefficient matrix of rank r is isomorphic to the space Kn_r. In particular, the dimension of the space L is n — r.
3.52. Any system of n — r linearly independent solutions of a homogeneous linear system of equations (which, by Theorem 2.34, forms a basis in the space of all solutions) is called a fundamental system of solutions. To construct a fundamental system of solutions, we can use any basis of the
66     SYSTEMS OF LINEAR EQUATIONS
CHAP. 3
space Kn_r. Then, because of the isomorphism, the corresponding solutions of the system (2) will form a basis in the space of all solutions of the system. The simplest basis of the space Kn_r consists of the vectors
et = (1,0, ... ,0), e2 = (0, 1,...,0),
(see Sec. 2.32c). For example, to obtain the solution of the system (2) corresponding to the vector elf we set cT+1 = 1, cr+2 = ■ ■ ■ = cn = 0 in the formulas (6) and determine the corresponding values
c. = cji>      (," = 1, 2, ... , n).
Similarly, we construct the solution corresponding to any other basis vector ei (j — 2, . . . , n — r). The set of solutions of the system (2) constructed in this way is called a normal fundamental system of solutions. If we denote these solutions by xa), x{2), . . . , x(n~T), then by the definition of a basis, any solution x is given by the formula
x = ai;ca> + <x2jc(2> + ■ ■ ■ + an_Px(w-r). (7)
Since any solution of the system (2) is a special case of (7), this formula gives the general solution of (2).
3.53. Consider now the general case of a nonhomogeneous system (3). As shown in Sec. 2.62b, the geometric object H corresponding to the set of all solutions of a nonhomogeneous system is a hyperplane in the «-dimensional space Kn. This hyperplane is obtained by shifting the subspace L of all solutions of the corresponding homogeneous system (L has been shown to be isomorphic to the space Kn_r) by a vector x0 which is an arbitrary particular solution of the nonhomogeneous system. From this we conclude that the dimension of the hyperplane H is the same as the dimension of the subspace L. Moreover, if r is the rank of the coefficient matrix of the system (3), then any vector y of the subspace L can be represented as a sum
y = aa/1' + a2/2> + ■ ■ ■ + an_ry<" r>,
where ya], y(2\ . . . ,y{n~r) are basis vectors of the space L (a fundamental system of solutions). Consequently, any vector x of the hyperplane H can be represented as a sum
x=Xo+y = Xo + aiy0 + ^ym + ■ ■ ■ 4- ^n_ry{n~r\
In the language appropriate to solutions of the systems (2) and (3), this agrees with the prescription established in Sec. 3.41, i.e., the general solution
sec. 3.6
methods for calculating the rank of a matrix 67
of the nonhomogeneous system (3) is the sum of any particular solution of (3) and the general solution of the corresponding homogeneous system (2).
3.6. Methods for Calculating the Rank of a Matrix
3.61. To make practical use of the methods for solving systems of linear equations developed in the preceding sections, one must be able to calculate the rank of a matrix and find its basis minor. Obviously, the definition of the rank of a matrix given in Sec. 1.92 cannot serve per se as a reasonable practical means of calculating the rank. For example, a square matrix of order five contains one minor of order five, 25 minors of order four, 100 minors of order three, and 100 minors of order two. Clearly, it would be a very laborious task to find the rank of such a matrix by direct calculation of all its minors. In this section, we will give simple methods for calculating the rank of a matrix and determining its basis minor. These methods are based on a study of certain operations on rows and columns of a matrix which do not change its rank; these operations will be called elementary operations. Since, as already noted, the rank of a matrix does not change when it is transposed, we will define these operations only for the columns of a matrix. In keeping with this, our proofs will make use of the geometric interpretation of a matrix with n rows and k columns as the matrix formed from the components of a system of k vectors xlt x2, . . . , xk in the n-dimensional (real) space Rn. We will also make use of Theorem 3.11, which asserts that the rank of this matrix equals the dimension of the linear manifold spanned by the vectors xu x2> . . . , xk.
We now study the following elementary operations:
a. Permutation of columns. Suppose the columns of the matrix A are permuted in any way. This operation does not change the rank of A. In fact, the dimension of the linear manifold spanned by the vectors xx, x2, . . . , xk does not depend on the order in which they are written, and hence the rank of the matrix does not depend on the order of its columns.
b. Dividing out a nonzero common factor of the elements of a column. Suppose the number X ^ 0 being divided out is a common factor of the elements of the first column of the matrix A. This operation is equivalent to replacing the system of vectors Xjq, x2,. . . , xk by the system jq, x2,. . . , xk. But obviously the linear manifolds spanned by these two systems have the same dimension (since the linear manifolds themselves are the same). Therefore the rank of the matrix A does not change as a result of this elementary operation.
c. Adding an arbitrary multiple of one column to another column. Suppose we multiply the with column of the matrix A by the number X and add it to
68     SYSTEMS of LINEAR EQUATIONS
CHAP. 3
they'th column. This means that the system of vectors xlt . . . , xh . . . , xm, . . . , xk has been replaced by the system
xi, ■ ■ ■ » Xj 4- Xxm, . . . , xm,    . , xk.
We have to show that the linear manifolds L± and L2 spanned by these two systems are the same. In the first place, all the vectors of the second system lie in the linear manifold spanned by the vectors of the first system. Hence, by Lemma 2.53a, we have L2 <= Lx. On the other hand, the equation
Xj = (Xj + hxm) — Xxm
shows that the vector x, lies in the linear manifold spanned by the vectors of the second system. Since all the other vectors of the first system obviously belong to this linear manifold, we have Lx <= L2. It follows that Lx ~ L2. Therefore the rank of A does not change as a result of this elementary operation.
d. Deletion of a column consisting entirely of zeros. A column consisting entirely of zeros corresponds to the zero vector of the space Rn. Obviously, eliminating the zero vector from the system xx, x2, . . . ,xk does not change the linear manifold L(x±, x2>.. . , xk) and hence does not change the rank of the matrix A.
e. Deletion of a column which is a linear combination of the other columns. The legitimacy of this elementary operation was proved in Theorem 3.14.
3.62. Calculation of the rank of a matrix and determination of a basis minor. We now show how to calculate the rank and find a basis minor of a given matrix A by using the elementary operations just enumerated. If the matrix A consists only of zeros, then its rank is obviously zero. Suppose A contains a nonzero element. Then, by suitably permuting the rows and columns, we can bring this element over to the upper left-hand corner of the matrix. Then, subtracting from every column the first column multiplied by a suitable coefficient, we can make all the other elements of the first row vanish. We shall make no further changes in the first row and first column (except for the rearrangements described below). If there are no nonzero elements among the remaining elements (i.e., the elements which do not belong to the first row and the first column), then the rank of the matrix A is obviously 1. If there is a nonzero element among the remaining elements, then by suitably rearranging rows and columns, we can bring this element over to the intersection of the second row and the second column and then make all the elements following it in the second row vanish, just as before. (We note that these operations do not affect the first row and the first column.)
SEC. 3.6
METHODS FOR CALCULATING THE RANK OF A MATRIX 69
Continuing in this fashion, and assuming that the number of columns in A does not exceed the number of rows in A (this can always be achieved by transposition), we reduce A to one of the following two forms:
«1	0		0	■ ■ •	0	0	... o
C21	a	2	0	■    > #	0	0	... o
C31	C32		<*3	■ ■ •	0	0	... o
Ckl	Ck2		Ck3	. . .	*k	0	... o
ck+ l.l			Ck+1,3		Ck+l,k	0	... o
cnl				- ■ -	Cnk	0	... o
		ai	0	0		0	
		C21		0		0	
		C31	C32	a3		0	
A2		-		-		-	-
		Cml	Cm2				
			Cn2	CnZ		c	
Here the numbers a1? a2, etc. are nonzero. In the first case, the rank of Ax equals k and its basis minor (in the transformed matrix) stands in the upper left-hand corner. In the second case, the rank of A2 equals m (the number of columns) and its basis minor (in the transformed matrix) appears in the first m rows. This determines the rank of A. The location of the basis minor of A is easily found by following back in reverse order all the operations performed on A.
70     SYSTEMS OF LINEAR EQUATIONS
CHAP. 3
As an example, consider the following matrix with five columns and six rows:
A =
1	2	6	-2	-1
-2	-1	0	-5	-1
3	1	-1	8	1
-1	0	2	-4	-1
-1	__2	~1	3	2
-2	_2	-5	-1	1
There is one zero in the second row of A; by using the general method described above, we can produce three more zeros in this row. However, for convenience, we first interchange the first and second rows. Then, interchanging the first and second columns (so that an element —1 with the smallest nonzero absolute value again appears in the upper left-hand corner), we obtain*
A ~
-2	-1	0	-5	-1		-1	-2	0	-5	-1
1	2	6	-2	-1		2	1	6	-2	-1
3	1	-1	8	1		1	3	-1	8	1
-1	0	2	-4	-1		0	-1	2	-4	-1
-1	-2	-7	3	2		-2	-1	-7	3	2
-2	-2	-5	-1	1		~2	-2	-5	-1	1
To obtain three more zeros in the first row, we multiply the first column by 2, 5, and 1, and subtract the results from the second, fourth, and fifth columns, respectively. This gives
A ^
-1	0	0	0	0
2	-3	6	-12	-3
1	1	-1	3	0
0	-1	2	-4	-1
-2	3	-7	13	4
-2	2	-5	9	3
The simplest thing to do next is to produce additional zeros in the third row. First we interchange this row with the second row. Then we multiply
f Here the symbol ~ written between two matrices means that they have the same rank.
problems 71
the second column by 1 and —3 and add the results to the third and fourth columns, respectively. Thus we have
-1	0	0	0	0		-1	0	0	0	0
1	1	-1	3	0		1	1	0	0	0
2	-3	6	-12	-3		2	„3	3	-3	„3
0	-1	2	-4	„1		0	-1	1	-1	-1
-2	3	„7	13	4		-2	3	-4	4	4
-2	2	-5	9	3		-2	2	-3	3	3
The fourth and fifth columns of the matrix Ax are proportional to the third column and can be deleted. The matrix which is left obviously has rank 3, so that the original matrix A also has rank 3. Moreover, Ax has a basis minor in its first three rows and first three columns. By reversing the successive transformations which led from A to Ax, we can easily verify that none of the transformations which were carried out has any effect on the absolute value of this minor. Therefore the minor appearing in the first three rows and the first three columns of the original matrix is also a basis minor.
PROBLEMS
1. Prove the following theorem: A necessary and sufficient condition for a matrix \au || of order m to have rank r < 1 is that there exist numbers ax, a2,.. . , am and blt b2,.. . , bm such that
au = a^j      (/,;' = 1, 2, . . . , m).
2. Let xx, *2, ■ .. , xk be k linearly independent vectors in an w-dimensional space K^, and let A = ||flj;>|| be the matrix made up of the components of the vectors xlt *2,. . . , xk with respect to some basis elt e2,. . . , en. Show that the linear manifold L(jcx, jc2, . .. , xk) is uniquely determined, provided one knows the values of all the minors of A of order k.
3. Show that when k = «, the system (2), p. 60 has the solution
cx = AjX, c2 = Af2,. . . , cn — Ain     (1 < / < w),
where Ati is the cofactor of the element ati (/ fixed), provided that the rank of the matrix A is less than n.
4. Solve the system of equations
xl     X2 + X3 + x^ + x§ = 7, 3xx ~\- 2.x2 ~\~ xz -f" x^    3jfg — 2, x2 + 2jc3 + 2*4 + 6x5 = 23, 5xx + 4*2 + 3*3 + 3*4 — *6 = 12.
72     SYSTEMS OF LINEAR EQUATIONS
CHAP. 3
5. Study the solutions of the system
+ y + z = 1, X +      + z = X,
x + y + Xz = X2
as a function of X.
6. What is the condition for the three straight lines
axx + bxy + cx = 0,     a2x + 62j + c2 = 0,     fl3* + 63j + c3 = 0 to pass through one point ?
7. What is the condition for the n straight lines
axx + b-^y + cx = 0,     a2x + b2y + c2 = 0,. . . , a„Ar + 6„j + cn = 0 to pass through one point ?
8. Find the normal fundamental system of solutions for the system of equations
xl + X2 + XZ + xi + *5 = 0,
3x± + 2x2 + x3 + x4 — 3xs = 0, x2 + 2x3 + 2x4 + 6x5 = 0, 4- 4jc2 + 3jt3 + 3jt4 — jc5 = 0.
9. Write down the general solution of the system given in Problem 4, using the normal fundamental system of solutions of the corresponding homogeneous system (found in Problem 8).
10. Determine the rank and basis minor of the following matrices:
	1	-2	3	-1	-1	-2		1	0	1	0	0
	2	-1	1	0	-2	-2		1	1	0	0	0
	-2	-5	8	-4	3	-1	, A2	0	1	1	0	0
	6	0	-1	2	-7	-5		0	0	1	1	0
	-1	— I	1	-1	2	1		0	1	0	1	1
11. Suppose the matrix A has a nonvanishing minor Mof order r, while every minor of order r + 1 containing all the elements of M vanishes. Prove that A has rank r.
12. Construct a matrix
au  a12 au
A =
a21  a22 a2Z
such that the minors
«11	«12		«11	«13		«12	«13
							
«21	«22		«2]	«23		«22	«23
- R
have the indicated values P, Q and R.
problems 73
13. For the system of equations
n
2 ***** = bi      (j = l,...,n)
(8)
with a square coefficient matrix, prove "Fredholm's alternative," which asserts that (8) either has a unique solution for arbitrary blt... ,bnor else the corresponding homogeneous system
n
2 ***** = 0   0" = i, • • ■»«)
has a nontrivial solution.
14. Prove that the system of equations
*n*i + ■ " " + alnxn =
^ril*] + " " " + #nn*n *m+1,1*] + ' ' ' + an^-l,nxn
subject to the condition
a„   - - ■ a
in
*nl
is solvable if and only if
a
li
*n+l,l       '      an+l,n ^n+l
15 (Elimination of unknowns). Prove that the system
0.
«n*]
-'r al7lxn = bnyx + ■ ■ ■ -i blky,c -j- ca,
«,a*i -'-■■■-!- annxn ■= 6^ + ■ ■ ■ + bnkyk + cn,
an+lmlx\ + " " " + «n+l,n*n = ^n+l.ljl + ' ' ' + ^n+l^k + cn+l
containing the parametersylt ■ ■. ,yk, subject to the condition
a
li
a
lw
0
a
nl
SYSTEMS OF LINEAR EQUATIONS
CHAP.
is solvable if and only if the parameters ylt... ,yk satisfy the equation
a
li
nl
a
in
"nn Kn
^TH-l.l     "   " fln+l,l
a.
nl
On
+1,1
1« ulk
ann b
nk
an+i,n bn+lk
+
a
li
a,
nl
fln+l,l     ' ' '    an+l.n cn+l
chapter 4
LINEAR FUNCTIONS OF A VECTOR ARGUMENT
In courses on mathematical analysis one studies functions of one or more real variables. Such functions can be regarded as functions of a vector argument. For example, a function of three variables can be regarded as a function whose argument is a vector of the space V3. This suggests studying functions whose arguments are vectors from an arbitrary linear space. In making this study, we will for the time being restrict ourselves to the simplest functions of this kind, namely linear functions. We will study both linear numerical functions of a vector argument, i.e., functions whose values are numbers, and linear vector functions of a vector argument, i.e., functions whose values are vectors. Linear vector functions, otherwise known as linear operators, are of great importance in linear algebra and its applications.
4.1. Linear Forms
4.11. A numerical function L(x) of a vector argument x, defined on a linear space K over a number field K, is called a linear form if it satisfies the following conditions:
a) L(x        = L(x) + L(y) for every .v^yeK;
b) L(a.x) = cn.L(x) for every x e K and every cue K.
In other words, a linear form L(x) is a morphism of the linear space K into the one-dimensional space Ky~K (cf. Sec. 2.71). By using induction, we easily verify that conditions a) and b) imply the formula
Lfaxx + ■ ■ ■ + u.kxk) = axLCxx) + • ■ ■ + v.kL(xk), (1)
75
76     LINEAR FUNCTIONS OF A VECTOR ARGUMENT
CHAP. 4
where xlt.. . , xk are arbitrary vectors in K and oq,...,^ are arbitrary numbers in K.
4.12. Examples
a. Suppose a basis is chosen in an w-dimensional space K, so that every vector xeK can be specified by its components £a, £2,.. . , £n. Then
— ^ (the first component) is obviously a linear form in x.
b. A more general linear form in the same space is given by the expression
n
L{x) = 2,lkZ,k, *:=i
with arbitrary fixed coefficients /ls L,. . . , ln.
c. An example of a linear form in the space K(a, b) (where K is R or C)t is the expression
L{x) = x(*0), where f0 is a fixed point of the interval a < / < b.
d. In the same space we can study the linear form
L(x) = fV)*(0 dt,
where l(t) is a fixed continuous function.
e. In the space Vz the scalar product (x, x0) of the vector x with a fixed vector x0 e V3 is a linear form in x.
Linear forms defined on infinite-dimensional spaces are usually called linear functionate.
4.13. We now find the general representation of a linear form L(x) defined on an «-dimensional space Kn. Let elt e2, . - - , en be an arbitrary basis of the space K„, and denote the quantity L(ek) by lk (k = 1, 2, . . . , n). Then, by (1), given any
i'
x ~ 2 ^lcek> k=l
we have
LOO = L(i^kek) = i^L(eJ - 2/^,
\fc=l       /       fc=l jfc=i
i.e., the value of the linear form L(x) is a linear combination of the components of the vector .v, with the fixed coefficients llt /2, . . . , ln. Thus the
t Recall Sees. 2.15c and 2.15d.
sec. 4.2
linear operators 77
most general representation of a linear form in an h-dimensional linear space has already been encountered in Example 4.12b.
4.14. In a complex linear space C we can also consider another type of linear form, called a linear form of the second kind (in this context, the linear form defined in Sec. 4.11 is called a linear form of the first kind). A numerical function L{x) of a vector argument x, defined on a complex linear space C, is called a linear form of the second kind if it satisfies the following two conditions:
a') L(x +y)=* L(x) + L(y) for every x,y gC;
b') L(tx.x) — &L(x) for every x e C and every complex number a — + /a2 (here a = aa — /a2 is the complex conjugate of a).
For a linear form of the second kind, the analogue of formula (1) becomes
L(v.1x1 + • ■ ■ + a.kxk) = + • • • + v.kL(xk), (1')
valid for arbitrary     . . . , xk in C and arbitrary complex numbers ol1, . . . , &k.
4.15. An example of a linear form of the second kind in an w-dimensional complex space Cn with basis eu . . . , en is given by the function
n _
L(x) =2'*5*»
where lx, . . . ,ln are arbitrary fixed complex numbers and £l5. .. , \n are the components of the vector x with respect to the basis els .. . , en. Moreover, this formula gives the general representation of a linear form of the second kind defined on the space Cn. In fact, let L(x) be an arbitrary linear form of the second kind, and let /a = L(ex), . . . , ln = L(en). Then, given any x e C„, it follows from (1') that
L(x) = L(i^fc) =ilkL(ek) =i/*L
as required.
4.2. Linear Operators
4.21. As just shown, a linear form L(x) defined on a linear space K is just a morphism of K into the one-dimensional space Kx. More generally, we now consider a morphism A = A(x) of a linear space X into another linear space Y over the same field K (X and Y may coincide). As already noted in Sec. 2.71, A(x) is also called a linear operator, mapping X into Y. Instead of A(x), we will often write simply Ax. By the definition of a morphism, A(x) satisfies the following conditions:
a) A(x + y) = Ax + Ay for every x, y eX;
b) A(a*) = vlAx for every x e X and every a e K.
78     LINEAR FUNCTIONS OF A VECTOR ARGUMENT
CHAP. 4
Just as for linear forms, conditions a) and b) imply the more general formula
M*ixi + • ■ " + akxk) = ct.1Axl + • • ■ + afcA*fc for arbitrary xlt . . . , xk in X and arbitrary a1} . . . , ak in K.
4.22. Examples
a. The operator^ associating the zero vector of the space Y with every vector x of the space X is obviously a linear operator. This operator is called the zero operator, denoted by 0.
b. Given any linear operator A mapping the space X into the space Y,
let
Bx =  — Ax.
It is easy to see that the operator B so defined is also a linear operator mapping X into Y. This operator is called the negative of the operator A.
c. Let ex,. . . , en be a basis in the space X, and let vectors flt . . . ,/n in the space Y be associated with the vectors elt . . . , en in an arbitrary way. Then there exists a unique linear operator A mapping X into Y and carrying every vector ek into the corresponding vector fk (k = 1, . . . , n). In fact, if such an operator A exists, then, given any vector
x=2^fceX, (2)
we have
(n \ n n
2   ) = 2 £*Af?* = 2 5*/*» /:=1        /      k=l k=l
thereby proving the uniqueness of A. On the other hand, given any vector (2), we can set
n
^x — 2 %>kfk> Jfc=l
by definition. The resulting operator, as is easily verified, is linear, maps X into Y, and at the same time carries every vector ek into the corresponding vector fk (k = 1, . . . , n).
d. Suppose that with every vector x of the space X we associate the same vector x, thereby obtaining a linear operator E, mapping X into itself. Then E is called the identity operator or unit operator.
4.23. Matrix representation of linear operators. Let A be a linear operator mapping a space X of dimension n into a space Y of dimension m. Let
t Here we use the term operator as a synonym Tot function (mapping one linear space into another).
sec. 4.2
linear operators 79
elt . . . , en be a fixed basis in X and fu. ,fm sl fixed basis in Y. The vector ex is mapped by A into some vector Aex of the space Y, which, like every vector of Y, has an expansion
Aet = a[% + a?ft + ■ ■ ■ + a™fm
with respect to the basis vectors /i, . . . ,/m. The operator A has a similar effect on the other basis vectors:
Ae2 = a[% + a?%
+ amf
Aen = flJ»|/i + fl<*>/2 + ■ ■ ■ + at]fm. These formulas can be written more concisely as
A^ = 2^J)/i     (/ = 1,2,...,«).
(3)
The coefficients a(/] {i = 1
A — A{e f) —
m;J= 1,
, «) define an m x w matrixf
called the matrix of the operator A relative to the bases {e} = {ex, and {/} = {fu . . . ,fm}. The components of the vectors Aeu Ae2, ■ with respect to the basis {/} serve as the columns of this matrix.! Now, given any vector
, Ae„
let
x = 2 £ a- g x> j=i
^ = Ax = 2W*-
With a view to expressing the components rji,. . . , v)m of the vector y in terms of the components £l5. . . , <;„ of the vector    we observe that
2 Vifi = Ax = A( 2 Sa) = 2 ^Aei
t I.e., a matrix with    rows and n columns.
X Note the distinction between the symbol A (boldface Roman) for an operator and the corresponding symbol A (Hghtface Italic) for the matrix of A.
80     linear functions of a vector argument
chap. 4
Comparing coefficients of the vector/;, we find that
•1*=iX%      0=1.2-----m), (4)
or, in expanded form
■Vx = + a[2% + ■ • • + a[^n,
......................... (5)
1« - ali% + + ' ' ' +
Therefore from a knowledge of the matrix of the operator A relative to the basis eu e2, . . . , en we can determine the result of applying A to any vector
n
of the space X. In fact, the equations (5) express the components of the vector y = Ax as linear combinations of the components of x. Note that the coefficient matrix of the system of the equations (5) is just the matrix A{e f).
Next let t>e an arbitrary m x n matrix, where the superscript is
the column number and the subscript is the row number. Given any vector
n
y-l
we construct the vector
m
y = 2 fiifi
with components rj2, . . . , rjm determined by (5). It is easy to see that the operator A effecting this mapping of the vector x into the vector y is a linear operator. We now construct the matrix of the operator A relative to the basis ely e2i . . . , en. Since the vector ex has components £t = 1, £2 = 0, ...,£„ = 0, it follows from (5) that the components of the vector Aex will be the numbers a^, a2x\ . . . , a{£\ so that
M - a?fx +        + ' ■ ■ + a^fm.
Similarly,
Ae, = a[% +        + • • • + al»fm      (J = 1, 2, . . . , n).
Therefore the matrix of the operator A coincides with the original matrix ||«^ II- Thus every m x n matrix is the matrix of a linear operator A mapping an n-dimensional space X into an m-dimensional space Y, with fixed bases elt. . . , en in X and f, . . . ,fm in Y. Thus (3), or equivalently (4), establishes a one-to-one correspondence between linear operators mapping a space X
sec. 4.2
linear operators 81
(with basis elt. . . , en) into a space Y (with basis fu . . . ,fm) and m x n matrices made up of numbers from the field K. In particular, identical operators A and B (i.e., operators such that Ax = Bx for every x e X) have identical matrices.
Finally we note that (5) can be used to construct the operator A directly (and uniquely) from the matrix A = ||^.J)||. In fact, A is just the coefficient matrix of the system (5).
4.24. Examples
a. Clearly, the matrix of the zero operator (see Example 4.22a) relative to any basis in the space X and any basis in the space Y consists entirely of zeros.
b. If \\a\j)\\ is the matrix of A, then the matrix of the negative operator (see Example 4.22b) is obviously just — ||a|J>||.
c. Let m > n and suppose the operator A carries the vectors of the basis ex,.. . , en of the space X into linearly independent vectors/i, ...,/„ of the space Y. We augment the vectors/i, ...,/„ by the vectors fn+i,.. • ,fm to make a basis for the whole space Y. Then the matrix of the operator A relative to the bases eu . . . , en and fu . . . ,fm is clearly of the form
n
0
0
1
0 0
d. In particular, the matrix ot the identity operator E (see Example 4.22d) relative to the basis els .. . , en of the space X (the domain of E) and the basis eu . . . , en of the same space (the range of E) is just
1   0   ■■• 0
0   1- 0
0 0-1
A matrix of this form is called the unit matrix or identity matrix of order n.
0
0
n\
ml
0 0 0 0
0 0
82     LINEAR FUNCTIONS OF A VECTOR ARGUMENT
CHAP. 4
4.3. Sums and Products of Operators
We now consider addition of operators and multiplication of operators both by numbers and by other operators. First we note that two operators A and B mapping a space X into a space Y are said to be equal (written A = B) if Ax — Bx for every x e X.
4.31. Addition of operators. Given two linear operators A and B mapping a space X into a space Y, the operator C = A + B is defined by the formula
Cx = (A + B)a- = Aa- + Bx. (6)
Obviously, C also maps the space X into the space Y. To verify that C is again a linear operator, let x =       + a2x2. Then
C(a1a1 + ot2x2) = A(a1a1 + a2x2) + B(a1a*1 + <x2x2) = at-iAXi + a2Ax2 + oojBa*! + a2Bx2 = (^(Aa*! + BXj) + ix2(Ajc2 + Bx2) = oLiCXi + oc2Ca*2,
so that both conditions a) and b) of Sec. 4.21 are satisfied. The linear operator C defined by (6) is called the sum of the operators A and B. It is easily verified that
A + B = B + A, (A + B) + C = A + (B + C),
A + 0 = A, U A + (-A) = 0,
where A, B and C are arbitrary linear operators, 0 is the zero operator (see Example 4.22a), and —A is the negative of the operator A (see Example 4.22b), i.e., the operator carrying the vector xeX into the vector — Aa.
4.32. Multiplication of an operator by a number. Let A be a linear operator mapping a space X into a space Y, and let X be a number from the field K. Then the operator B = XA, called the product of the operator A and the number X, is defined by the formula
B\ - (XA)a = X(Ax).
It is easily verified (just as in Sec. 4.31) that this operator is linear, and moreover that
Xt(X2A) - (X^A, 1 ■ A = A,
(7 )
(Xi + X2)A = XiA + X2A, X(A + B) = XA + XB.
sec. 4.3
sums and products of operators 83
The relations (7) and (7') show that the set of all linear operators mapping a linear space X into a linear space Y is itself a linear space.
4.33. Multiplication of operators. Let A be a linear operator mapping the space X into the space Y and B a linear operator mapping the space Y into the space Z (where all the spaces are over the same number field K). Then the operator P = BA, called the product of the operator B and the operator A (in that order), is defined as the operator mapping X into Z such that
Px = (BA)x = B(A.v)
(note that first the operator A acts on the vector x and then the operator B acts on the resulting vector in the space Y). The operator P is again linear, since
P(a1x1 + <x2jc2) = B[A(axx! + a2x2)] = B(a1Ax1 + a2Ax2) = ci.1BAxl + a2BAx2 = o^P;^ + a2Px2.
4.34. The following relations are easily verified:
a) X(BA) = (XB)A for every number X e K and arbitrary operators A mapping the space X into the space Y and B mapping the space Y into the space Z;
b) (A + B)C = AC + BC for arbitrary operators A and B mapping the space Y into the space Z and C mapping the space X into the space Y;
c) A(B + C) = AB + AC for arbitrary operators B and C mapping the space X into the space Y and A mapping the space Y into the space Z;
d) (AB)C = A(BC) for arbitrary operators C mapping the space X into the space Y, B mapping the space Y into the space Z, and C mapping the space Z into the space W.j
For example, to verify d), according to the definition of operator equality we must prove the identity
[A(BC);c] - [(AB)C]x
for every x e X. But by the very definition of the operator product, we have
[A(BC)x] - A[(BC).v] - A[B(Cjc)], [(AB)C]jc - (AB)(Cx) - A[B(Cjc)],
which implies the required formula. The other formulas are proved similarly.
t The associative law for operator multiplication is expressed by d), and the distributive law by b) and c).
84     linear functions of a vector argument
chap. 4
4.4. Corresponding Operations on Matrices
We now study the matrix analogues of the algebraic operations on linear operators described in Sec. 4.4.
4.41. Addition of operators. Let A and B be two linear operators mapping a space X with basis ex,. .. , en into a space Y with basis f,. .. ,fm. Moreover, let A = Ha^ll be the matrix of the operator A and B = \\b{i]\\ the matrix of the operator B, relative to these bases. Then
771 771
A*, = I a\j%     Be, = £ 0 = 1,2_____«),
and hence
m
(A + B)«, = Ae, + Be, = J (*{'» +
*=i
It follows that the matrix corresponding to the operator A + B is just Waii} + bii]W- Tn's matrix is called the sum of the matrices \\a[i}\\ and Thus the sum A + B is defined for every pair of matrices A and B with the same number of rows and the same number of columns.
4.42. Multiplication of an operator by a number. With the same notation as before, we have
(XA)*, = X(A£,) = 2
It follows that the matrix corresponding to the operator XA is just the matrix ||X^.J)||, obtained by multiplying all the elements of the matrix \)a{/]\\ by the number X. This matrix is called the product of the matrix ||#|;>|| and the number X.
Since there is a one-to-one correspondence between m x n matrices and linear operators mapping an w-dimensional space into an w-dimensional space (see Sec. 4.22), there is a one-to-one correspondence between algebraic operations involving operators and the analogous operations involving matrices. Hence, since operators obey the rules (7) and (7'), the same is also true of matrices (of course, this can easily be verified directly). Thus we see that the set of all m x n matrices is itself a linear space, which, by its very construction, is isomorphic to the linear space of all linear operators mapping an w-dimensional space X into an w-dimensional space Y.
4.43. Multiplication of operators. Let X, Y and Z be linear spaces, and let elt. . . , en be a basis in X, f,. . . ,fm a basis in Y, and gu . . . , gQ a
sec. 4.4
corresponding operations on matrices 85
basis in Z. Let B be a linear operator mapping X into Y with m X n matrix II, so that
m
i=l
and let A be a linear operator mapping Y into Z with q x m matrix so that
A/i = ifli"g*      0"= 1,...,«).
*:=i
Then for the product P = AB we have
Tfl 171
(AB)*, = A(Bej) = A 2 *<m/« = I b^Af
i=l i=l m Q Q   i m \
t=i     fc=i fc=i\t=i /
Hence the elements pkf) of the matrix P of the operator P = AB are given by
m
P*} = I «W      (j - 1, ■ ■ • , n; k = 1, . . . ,q). (8)
This is the desired result, which can be expressed as follows: The element of the matrix P belonging to the kth row and jth column equals the sum of the products of the elements of the kth row of the matrix A with the corresponding elements of the jth column of the matrix B. The matrix P = \\plkj)\\ which is obtained from the matrices A = \\aki}\\ and B = \\bW\l in accordance with formula (8) is called the product of the matrices A and B (in that order).
It should be noted that for the product P = AB to make sense, the number of columns in A must equal the number of rows in B. Then P will have the same number of rows as A and the same number of columns as B. This fact can be expressed more strikingly in the "m x n notation," namely, the product AB of a q x / matrix A and an m x n matrix B is defined if / = m, in which case AB is a q X n matrix. Both products AB and BA are defined if / = m and q = «, in which case AB is a square n x n matrix while BA is a square m x m matrix. Moreover, if / = m = q = n, i.e., if both matrices A and B are square n X n matrices, then AB and BA are also n x n matrices. However, these products need not be equal. For example.
0	1		1	0		0	0
1	0		0	0		1	0
1	0		0	1		0	1
0	0		1	0		0	0
86     linear functions of a vector argument
chap. 4
Thus multiplication of square matrices is in general noncommutative. As for the associative and distributive laws, the situation is more favorable. In fact, as shown in Sec. 4.34, operator multiplication obeys the associative and distributive laws, and hence we can assert that the same is true of matrix multiplication, since there is a one-to-one correspondence between operators and matrices associating sums and products of operators with the sums and products of the corresponding matrices.
4.44. Examples
In the following examples, we write both indices of matrix elements as subscripts, so that the element aik of the matrix A ~ \\ajk\\ belongs to the yth row and the A:th column. In this notation, formula (8) for the matrix product P = AB takes the form
m
Pki = 2akibij     (j = 1, . . . , n; k = 1, . . . , q). (8')
a. Suppose we multiply an m x n matrix A = \\ajk\\ from the left by an m x m matrix Brs — \\bik\\ with all its elements bjk equal to zero except the single element brs = 1. Then by (8') we get the m X n matrix
BrsA = (r)
= 00
				«11      «12 ■		* «17,
■ ■ ■	1 ■ ■ ■			«S1      «S2 '		«37J
0	0			«ml 0	am2	
«si	as2   ■ ■			«Stt		
0	0    ■ ■			0		
so that the rth row of the matrix BrsA consists of the elements of the 5th row of the matrix A while all other elements of BrsA vanish.
b. Suppose we multiply an/»x« matrix A = \\ajk\\ on the right by an n x n matrix Cpq = \\cjk\\ with all its elements cjk equal to zero except the
sec. 4.4
single element cVQ =
corresponding operations on matrices 87
1. Then by (8') we get the m x n matrix
(?)
ACvq -
«11
«21
a
id
a
In
a
2p
a
2n
a
ml
a
mv
a,
(P)
(?)
o o
0
a
a
2p
a
0 0
0
so that the ^th column of the matrix ACfiq consists of the elements of thepth column of the matrix A while all other elements of A Cvq vanish.
c. With the same matrices Brg, A and CPQ we have
		(?)	
	0 ■	■    0 ■	■ 0
BrsACvQ^(r)	0 ■		■ ■ 0
	0 ■	■ ■    0 ■	■ • 0
Thus BrsACpq is an m x n matrix all of whose elements vanish with the (possible) exception of the single element, equal to asp, appearing in the rth row and ^th column.
d. By what m X m matrix D must we multiply an m X n matrix A from the left to make the matrix DA coincide with the matrix obtained from A by interchanging its Kh and jth rows?
Solution. Example 4.44a shows that the matrix whose rth row is the 5th row of the matrix A is obtained by multiplying A on the left by the m x m matrix Brs. But the other rows of the resulting matrix vanish. It is
88     linear functions of a vector argument
chap. 4
now clear that to get the required matrix, we must multiply A from the left by the m x m matrix
(0 (s)
0
3
0
1
e. By what n x n matrix G must we multiply an m x n matrix A from the right to make the matrix AG coincide with the matrix obtained from A by interchanging its pth and qth columns?
Solution. By an argument like that in Example 4.44d, we have
^ = Cue + Cqv + ^ Ckk.
k*v
f. By what m x m matrix F must we multiply an m x n matrix A from the left to make the matrix FA coincide with the matrix obtained from A by adding a times its 5th row to its /-th row?
Solution. Using Example 4.44a, we obviously have F = E + ~kBrs where E is the unit matrix of order m.
g. By what n x n matrix H must we multiply an m X n matrix A from the right to make the matrix AH coincide with the matrix obtained from A by adding X times its pth column to its ^th column?
Solution. Clearly, H = E + y.CPQ where £ is the unit matrix of order n.
4.5. Further Properties of Matrix Multiplication
4.51. Multiplication of block matrices. In multiplying matrices, it is sometimes convenient to partition the matrices into blocks and afterwards
sec. 4.5
further properties of matrix multiplication 89
deal with the blocks as separate entities. Suppose we are given an m x n matrix A and an n x p matrix B, partitioned into blocks as follows:
P
Suppose further that every "block-row" of the matrix A contains the same number of blocks as every "block-column" of the matrix B, and that the "width" of every block Aik of the matrix A coincides with the "height" of every block Bks of the matrix B. Then the products AikBks all make sense, and in fact are rectangular matrices of size depending on the indices j and s (but not on the index k). We then have the following multiplication rule: The product matrix AB is made up of blocks constructed from the blocks of the matrices A and B in the same way as the elements of AB are constructed from the elements of A and B, i.e.,
AB =
AnBn + Al2B2l H----	^11^12 + ^12^22 + ■ * *
A21BU + A22B2l + * • *	-^21-^12 + ^22-^22 + ' ' *
(9)
• * * *
To prove (9), let / be the index of a block-row of A containing the kth ordinary row of A, and let j be the index of a block-column of B containing the qth ordinary column of B. By the general rule of Sec. 4.43, the elements of the product matrix P = AB are of the form
Pkq = akAQ + ■ ' ■ + aknbm
= KA« H-----1- %A«) + —I- (fl*A« H-----1- akJ>m)i
where parentheses are inserted in keeping with the widths of blocks of A (and heights of blocks of B). But the first term in parentheses is the element in the &th row and ^th column of the block AiXBljt the second term in parentheses (not written) is the element in the kth row and ^th column of the block AaB2j, and so on. Thus p^ is the element in the kth row and ^th column of the block AaBXi + • • * + AirBrj, itself the block in the ith row and yth column of the matrix P = AB regarded as a block matrix. The proof of (9) is now complete.
90     linear functions of a vector argument
chap. 4
4.52. Multiplication of quasi-diagonal matrices. A matrix is said to be quasi-diagonal if it is of the form
A =
	
	A 22
where the "off-diagonal" blocks consist entirely of zeros. Suppose the block Akk is an mk x nk matrix (k = 1, . . . , s), and consider the quasi-diagonal matrix
B =
	
	B22
B
where the block Bkk is an nk x pk matrix (k = \,. . . , s). Then, using the rule of Sec. 4.51 to multiply the matrices A and B, we immediately get
AB
	
	A 22-^22
A R
Thus in this case the matrix AB is again a quasi-diagonal matrix, where the block AkkBkk has mk rows andpk columns.
4.53. Multiplication of transposed matrices. Given an m X n matrix A ~ by the transpose of A (cf. Sec. 1.41) is meant the n X m matrix
A' = ||a^J such that
a'ik = aki U' = U ■ ■ ■ ,n;k = 1, . . . , m).
Let A be an m x n matrix and B an n x p matrix. Then the product P = AB is defined and is an m x p matrix. Moreover, the product B'A' of the transposed matrices A' and B' is also defined and is a p x m matrix. We now show that
B'A' = (AB)'. (10)
sec. 4.5
further properties of matrix multiplication 91
Let the elements of the matrices A, B, P ~ AB, A', B' and P' be denoted by aa> bay Pa, a'a = aHt b'u = bH, p'i}. = pH. Then, by the rule for matrix multiplication,
n n n
Pik = Pki = 2°«*« = ^a',Hbkj = ^b'kia'H,
j=l j = \ j=l
where the summation is over the index j with the indices i and j held'fixed. Thus to form the element p'ki of the matrix P', the elements of the A:th row of B' are multiplied by the corresponding elements of the /th column of A' and then added. In other words, using the rule for matrix multiplication once again, we see that P' is the product of B' and A' (in that order), thereby proving (10).
4.54. Minors of the product of two matrices. Given an/nXK matrix A = \\aik\\ and an n x p matrix B = ||£,J, we construct the m X p matrix P = AB = ||/7jfc||. Fixing the rows with indices o^, . . . , afc (o^ < ■ ■ ■ < afc) and the columns with indices {Jl5 . . . , $k (^ < ■ ■ ■ < fjfc), where k < m, < /», we now consider the problem of calculating the minor
Mi\.....RAB) =
a
a
+ a^inbnQlc
(U)
formed from these rows and columns. To make this calculation, we use the linear property of determinants (Sec. 1.44). The vth column of the minor (11) is the sum of k "elementary columns1' with elements of the form ayib^v (where the column indices i and v are fixed, and the row index j varies from 1 to k). Hence the whole minor (11) is the sum of kk "elementary determinants" consisting only of elementary columns. Since in each elementary column the factor b^v does not change as we go down the column, it can be factored out of the elementary determinant. After this, each elementary determinant takes the form
a
a
a
an
a.
«2* a
a
a
a
a
(12)
where i\, i2, . . ■ , ik are certain numbers from 1 to n. If some of these numbers are the same, then clearly the corresponding elementary determinant vanishes. Moreover, this is always the case if k > n. Therefore // the matrix AB has minors of order k > w, they must all vanish.
92     linear functions of a vector argument
chap. 4
Returning to the case k < n, we note that it is only necessary to consider elementary determinants for which the indices ilf i2, .. . , ik are all different. In this case, the determinant
««1*1 ««i»2
«agj'i «a2l"2
is the same (except possibly for sign) as the minor M*l';;;f*(A) where the indices jx, . . . ,jk (j\ < ■ ■ - < jk) are the indices ix, . . . , ik rearranged in increasing order. To find the sign which must be ascribed to (13) to get M*l','.'.'.'.f£(A), we successively interchange adjacent columns of (13) until we arrive at the normal arrangement of the columns, i.e., the arrangement they have in the matrix A itself. At each interchange of two adjacent columns, the determinant (13) changes sign and the number of inversions in the permutation ib i2.....ik changes by unity. Since in the final arrangement of
the columns, the subscripts are in natural order (i.e., without inversions), the number of successive changes of sign is equal to the number of inversions in the permutation ilt i2, . . . , ifc.f Let N(i) denote the number of sign changes. Then the expression (12) takes the form
(-irXpAfe ■ ■ • bilSkM:;::::;%(A). (14)
To obtain (11), we must now add up all the expressions of the form (14).
First we add up all the expressions with the same set of indices j\,, . . ,jk, taking out the common factors M^'[[['^{A). The remaining expression is then
(~OiY(t)£ti3A-202' ' ' bikflk>
where the summation is over all distinct sets of indices ix, i2,. . . , ik (these indices range from 1 to n). But this expression is just the minor M^-yjiB) Thus finally we get the formula
Mg:::::S:04B) = 2 Mj;:::::^)Af j;::::;fe(B), (is)
where the summation is over all distinct sets of indices juj2, .. . ,jk (1 < ji < j2 < • * ■ < jk < n). The total number of terms in the sum (15) is just the binomial coefficient
k\{n~k)\'
t It is assumed that the change in the indices i\, i2,. . . , h produced by every column interchange causes a smaller index to appear before a larger index, with the result that the total number of inversions changes by exactly one.
a
«i*s
a,,
a
(13)
sec. 4.6
the range and null space of a linear operator 93
Our result can be summarized in the following
Theorem. Every minor of order k < n of the matrix AB can be expressed in terms of the minors of the same order of the matrices A and B, in the way given by formula (15).
4.6. The Range and Null Space of a Linear Operator
4.61. Let A be a linear operator mapping a linear space X into a linear space Y (in the notation of Sec, 2.71, this is expressed by writing A;X —>• Y). Let n be the dimension of X and m the dimension of Y, and choose an arbitrary basis eu . . . , en in X and fu .. . ,fm in Y. Then, by the method of Sec. 4.23, we can associate the operator A with an m x n matrix
A = Wa^W      (i = 1, . . . ,m;j = 1, . . , ,«).
Let T(A) be the range of A, i.e., the set of all vectors y = Ax, x e X. We now consider the problem of finding the dimension of the subspace T(A) from a knowledge of the matrix A, Writing
we have
n
y = Ax = 2 lkkek, k=i
Hence the range of the operator A coincides with the linear manifold spanned by the vectors Aet,.. . , Aen. As noted on p. 51, the dimension of this linear manifold L^Ae^ . . . , Aen) equals the maximum number of linearly independent vectors in the system Aex,. .. , Aen. We know from Sec, 4.23 that the columns of the matrix of the operator A consist of the components of the vectors Aeu . . . , Aen with respect to the basis ex, . . . , en, and hence the problem of finding the maximum number of linearly independent vectors in the system Aex, . . . , Aen reduces at once to that of finding the maximum number of linearly independent columns of the matrix A. But by Theorem 3.12c, the latter quantity is just the rank of the matrix of the operator A. Thus the dimension of the range of a linear operator A mapping an n-dimensional space X into an m-dimensional space Y equals the rank of the matrix of A relative to any basis {e} in X and any basis {/} in Y.
We note that the choice of bases does not matter here. Therefore the rank of the matrix of an operator A does not depend on the choice of bases, i.e,, depends only on the operator A itself. In what follows, the rank of the matrix of the operator A (relative to any bases) will simply be called the rank of the operator A, denoted by rA.
94     linear functions of a vector argument
chap, 4
4.62. Next let N(A) be the null space of the operator A, i.e., the set of all vectors xeX such that Ax = 0, and as before let A — ||aj«|| be the matrix of A. We now consider the problem of finding the dimension of the subspace N(A) from a knowledge of the matrix A. Let
x = 2     e N(A). »=i
Then the system (5), p. 80 takes the form
a{x% + «i% + ■ • ' + a[n)ln = 0,
a?% + afl2 + ■ ■ ■ + a?% = 0,
........................ (16)
<*l% + + ■ ■ ■ + <!'»»$„ = 0.
Moreover, it is obvious that, conversely, every vector x eX whose components satisfy (16) belongs to the null space of the operator A. Thus the problem of finding the dimension of the null space of the operator A is equivalent to the problem of finding the dimension of the subspace of X consisting of all solutions of the system (16). But according to Sec. 3.51, the dimension nA of this subspace equals n — r, where r is the rank of the coefficient matrix of the system, or equivalently, the rank of the operator A. It follows that nA = n — rA. Thus the dimension of the null space of the operator A equals the rank of the space X {on which A acts) minus the rank of the operator A.
4.63. In particular, if the morphism A:X—>- Y is an epimorphism, then T(A) = Y and hence rA = m. If the morphism A: X Y is a monomorphism, then N(A) = {0} and hence rA — n. The converse assertions are also true; If the rank of the matrix A equals the number m of its rows, then the dimension of T(A) coincides with the dimension of the whole space Y and hence T(A) = Y. Therefore the morphism A is an epimorphism if and only ifrA = m. If the rank of the matrix A equals the number of its columns, then the vectors fx = Aex, .,.,/„ = Aen are linearly independent and hence the operator A is a monomorphism (see Sec. 2.73c). Therefore the morphism A is a monomorphism if and only if rA = n.
4.64. The following proposition is the converse of the results of Sees. 4.61 and 4.62:
Theorem. Let X be an n-dimensional linear space and Y an arbitrary linear space. Then, given any subspaces Nc X and T c Y the sum of whose dimensions equals «, there exists a linear operator A:X—s-Y such that N(A) = N, T(A) - T.
SEC. 4.6
THE RANGE AND NULL SPACE OF A LINEAR OPERATOR 95
Proof. Let the dimensions of N and T be k and m = n — k, respectively. Moreover, let fuf2, . . . ,fm be m linearly independent vectors in the subspace T, and let ex, e2,... ,en be any basis in the space X whose first k vectors lie in the subspace N (see Sec. 2.43). Defining an operator A by the conditions
Aet = 0      (/= 1,2, ...,*), (17) Aei+k=fi      ('= 1,2, . . . , m),
we now show that A satisfies the requirements of the theorem. First of all, it is obvious that T(A) is the linear manifold spanned by the vectors/i,/2, . . . , fm and hence coincides with the subspace T. Moreover, by (17), every vector of the subspace N belongs to N(A), and it remains to show only that every vector of N(A) belongs to N. Suppose Ax = 0 for some
n
x = 2
Then, by (17),
0 = Ax = A(lxex + "■.■ + inen) = ^fc+i/i + ■ ■ ■ + infm,
and hence = • • ■ = \n = 0 since fx> . . . ,fm are linearly independent. But then
x = \xex + ■ ■ • + lkek e N. |
4.65. The following theorem on the rank of the product of two matrices is a consequence of the geometric notions just introduced:
Theorem. The rank of the product AB of two matrices A and B does not exceed the rank of each of the factors.
Proof. Naturally, we must assume that the number of columns of the matrix A coincides with the number of rows of the matrix B, since otherwise the product AB could not be formed. Thus let A be an m x n matrix and B a n x p matrix, and introduce linear spaces X, Y and Z with dimensions n, m and p, respectively. Choose a basis ex, . . . , en in the space X, a basis fi, ■ ■ ■ ,fm m tne space Y and a basis glf . . . ,gp in the space Z. Using these bases, we associate a linear operator A:X —>- Y with the matrix A and a linear operator B:Z —>- X wiht the matrix B (see Sec. 4.23). Then the product operator AB:Z^Y corresponds to the product matrix AB. The range of the operator AB is contained in the range of the operator A, by the very definition of AB. Since by Sec. 4.61 the dimension of the range of any operator equals the rank of its matrix, we find that the rank of the product of two matrices does not exceed the rank of the first factor. To prove that it also does not exceed the rank of the second factor, we go over to transposed matrices. Using equation (10), p. 90, we find that
rank AB = rank (AB)' = rank B'A' < rank B' = rank B. |
96     linear functions of a vector argument
chap. 4
4.66. The rank of the product of two matrices can actually be less than the rank of each factor. For example, the matrices
	0	1		1	0
A =			B =		
	0	0		0	0
both have rank one, but their product
0	0
0	0
has rank zero. Therefore the following theorem, which gives a lower bound rather than an upper bound for the rank of the product of two matrices, is of interest:
Theorem. Let A be an m X n matrix of rank rA and B an n x p matrix of rank rn. Then the rank of them x p matrix AB is no less than rA 4- rB — n.
Proof. First we show that any operator A:X —>■ Y of rank r carries every A:-dimensional subspace X' c: X into a subspace Y' c Y of dimension no less than r — (n — k). Choose a basis elf e2, . . . , en in the space X such that the first k basis vectors lie in the subspace X' (see Sec, 2.43). The components of the vectors Aelt Ae2, . . . , Aek generating the space Y' occupy the first k columns of the matrix of the operator A, By hypothesis, there are r linearly independent columns in the matrix of A. We divide these columns into two groups, the first consisting of columns whose numbers lie in the range 1 to k, the second consisting of columns whose numbers lie in the range k 4- 1 to n. The second group contains no more than n — k columns, and hence the first group contains no more than r — (n — k) columns. Thus the subspace Y' has no more than r — (« — k) linearly independent vectors, as asserted.
Now let A:X —► Y and B:Z —>- X be linear operators corresponding to the matrices A and B. By Sec. 4.61, the rank of the matrix of the operator AB is just the dimension of the range of AB. The operator B maps the whole space Z into the subspace T(B) <=■ X of dimension rB. But as shown above, the operator A maps the subspace T(B) into a subspace of dimension no less than rA — (n — rn) = rA + rB — n. Thus the range of the operator AB, and hence the rank of the matrix of AB, is no less than rA-\- rB — n. |
4.67. Corollary, Let A be an m x n matrix and B an n x p matrix, and suppose the rank of one of these matrices equals n. Then the rank of AB equals the rank of the other matrix.
Proof. In this case, the upper and lower bounds for the rank of AB, given by Theorems 4.65 and 4.66, have the same value, equal to the rank of the other matrix. |
sec. 4.6
the range and null space of a linear operator 97
4.68. Let A be a linear operator mapping a linear space X into a linear space Y. A linear operator B mapping Y into X is called a left inverse of the operator A if
BA = E
is the unit operator in the space X. The operator A is then called a right inverse of the operator B. The following theorem gives conditions under which the operator A (or B) has a left (or right) inverse:
Theorem. The operator A: X —>- Y has a left inverse if and only if A is a monomorphism. The operator B: Y —v X has a right inverse if and only if B is an epimorphism.
Proof. Let A be a monomorphism with range T(A) <=■ Y. Then for every y e T(A) there is an x e X such that Ax = y, where x is uniquely determined by y since A is a monomorphism by hypothesis. Let Q <=■ Y be the subspace whose direct sum with T(A) is the whole space Y (see Sec. 2.46). We now define an operator B: Y X by the following rule: For y e T(A) we set By equal to the (unique) vector x for which Ax = yy while otherwise we set
By = 0     if j^eQ,
By = Byx   if y = yx + y2, yx e T(A), y2 e Q.
Then it is easy to see that the operator B is linear and that BAx ~ x, for every x e X, so that B is the left inverse of A. However, if A is not a monomorphism, there exists a nonzero vector x eX such that Ax = 0. Then for any B:Y-*-X we have (BA)x = B(Ax) = B(0) = 0, so that A indeed fails to have a left inverse.
Next let B:Y—vX be an epimorphism and let N(B) c Y be the null space of B, while Q Y is the subspace whose direct sum with N(B), denoted by N(B) + Q, is the whole space Y. Since
X = B(Y) = B(N(B) + Q) = B(Q),
the mapping B:Q —> X is also an epimorphism and in fact an isomorphism, since no nonzero element y e Q is mapped into zero by the operator B. We now define an operator A:X —>- Y by the following rule: Given any x eX, we set Ax equal to the (unique) vector y e Q for which By = x. Then it is easy to see that the operator A is linear and that BAx = x for every xeX, so that A is the right inverse of B. However, if B: Y X is not an epimorphism, then BAx ^ x for any operator A: X —> Y and any vector x eX such that .y   T(B), so that B has no right inverse. |
4.69. a. As we know, the result of multiplying an n x m matrix P by an m x n matrix A is a square n x n matrix
S = PA.
98     LINEAR functions of a vector argument
chap. 4
If S is the unit n x n matrix (see Example 4.24d), we call P the left inverse of the matrix A. Similarly, the result of multiplying an m x n matrix A by an n X m matrix Q is a square in x m matrix
T=AQt
and if 7" is the unit m x m matrix, we call Q the right inverse of the matrix A.
b. Using the results of Sec. 4.63, we can now formulate Theorem 4.68 in terms of the rank of a matrix:
Theorem. An m X n matrix A has a left inverse if and only if its rank equals n and a right inverse if and only if its rank equals m.
4.7. Linear Operators Mapping a Space Kn into Itself
4.71. Let A be a linear operator mapping the space X into itself (this corresponds to setting Y = X in Sec. 4.21). Such an operator is said to be an operator (acting) in the space X.
Suppose the operator A acts in an n-dimensional space X = K„. Choosing a basis elt. . . , en in the space X, we use the same basis in Y = X to construct the matrix of the operator A. Then formula (3), p. 79 becomes
Aei = Za<% (18)
(after setting= e^), so that the coefficients now form a square n x n matrix A, called the matrix of the operator A in (or relative to) the basis {e} = {el5 . . . , We will sometimes denote this matrix by A(e). The corresponding formula relating the components of the vectors x and y, where
y = Ax,      x = 2 ljes,      y = 2rl^j
is
yit=I°l% 09)
(cf. formula (4), p. 80). For a fixed basis {e} = {ex, .. . , en}> we get a one-to-one correspondence between all linear operators acting in the space Kn (i.e., mapping K„ into itself) and all square n x n matrices made up of elements of the underlying field K.
4.12. Examples
a. The operator associating the zero vector with every vector of the space X is obviously linear. As in Example 4.22a, this operator is called the zero operator. It is clear that the matrix of the zero operator relative to any basis consists entirely of zeros.
sec. 4.7
linear operators mapping a space Kn into itself 99
b. The identity (or unit) operator E, associating the vector x itself with every vector xeX, has already been considered in Example 4.22d. Its matrix is the unit (or identity) matrix of the form
1   0   ■ ■ - 0
p      0   1   ■■■ 0 E -
0  0   ■ ■ ■ 1
(cf. Example 4.24d).
c. The operator A which carries every vector x e X into X.v, where X is a fixed number from the field K, is obviously linear. This operator is called the similarity operator (with ratio of similitude X). As in the preceding example, the similarity operator has the matrix
X 0 • * • 0 0   X   ••• 0
0  0   *  • X
in any basis.
d. We can specify a vector in the Euclidean plane V2 by giving its polar coordinates p and 9. The operator A carrying the vector x = (p, 9) into Ax = (p, 9 4- 90), where 90 is a fixed angle, is linear (as can easily be verified by drawing a figure). This operator is called the rotation operator through the angle 90.
To construct the matrix of A, we choose a basis in V2 consisting of two orthogonal unit vectors ex and e2. Drawing a figure, we easily see that after rotation through the angle 90 the vector ex goes into the vector ex cos 90 4-e2 sin 9„, while the vector e2 goes into —ex sin 90 4- e2 cos 90. Hence the matrix of the rotation operator A has the form
cos 90 —sin 90 sin 90       cos 90
in the basis ex, e2.
e. Let ex, e2,... , en be a basis in an w-dimensional space K„, and suppose that with the vector
n k=l
100     linear functions of a vector argument
chap. 4
we associate the vector
Px = ^lkek
where m < n. Then P is a linear operator, called the projection operator onto the subspace Km spanned by the vectors eu e2, . . . , em.
To construct the matrix of P, we note that it carries the vectors eu e2,... , em into themselves and the vectors . .. , en into the zero vector. Hence the matrix of the projection operator P in the basis eu e2y . . . , en is just
(m)
1 0
0 1
0 0
0 0
0 0
0 0
0 0
1 0 0 0
0 0
0 0
0 0
0
f. Let eu e2,. .. , en be a basis in an /2-dimensional space Kn, and let Xx, X2,. .. , Xn be n fixed numbers. Defining an operator A for the basis vectors by the conditions
Aex = \xeXi     Ae2 — \e2,... ,     Aen — Xnen,
we then of course use linearity to define A for any other vector
by the condition
Ax =2XÄ^-
The resulting operator A is said to be diagonal relative to the basis ex, e2,... , en; we also call A a diagonalizable operator.
The matrix of an operator which is diagonal relative to the basis ex, e2> ... , en is of the form
Xx   0    ••• 0
0
0 0
0
in the same basis. Such a matrix, which can have nonzero elements only on its principal diagonal, is said to be diagonal (hence the corresponding
sec. 4.7
linear operators mapping a space K„ into itself 101
terminology for the operator itself). It should be noted that the matrix of an operator which is diagonal relative to the basis eu e2,. . • , en will in general not be diagonal in another basis fi,f2, • • • ,/„.
4.73. a. Using the rules of Sees. 4.31 and 4.32 to add linear operators acting in a space X and multiply them by numbers, we again get linear operators acting in X. The rules (7) and (7'), p. 82 show that the set of all linear operators acting in a space X (equipped with the indicated operations of addition and multiplication by numbers) is again a linear space over the same field K. Moreover, the operation of multiplication described in Sec. 4.33 can always be defined for operators acting in a space X, and the result is again an operator acting in X. In particular, we can define the powers of a given operator A by the rules
A1 = A, A2 - AA,
A8 - A2A - (AA)A - A(AA) - A (A2),
A" - A""lA - AA""1. We then have the formula
Am+n = AmA"      (m, n = 1, 2,. . .), (20) which can easily be proved by induction. Next we define
A0 - E,
where E is the identity operator, and show that (20) remains valid in the case where one of the indices is zero. In fact, if B is any operator, we have
(BE)jc = B(Ejc) = B\ = E(Bjc),
so that
BE = EB = B.
Setting B = A", we obtain
A"E = EA" = A",
as required.
b. Let X = K„ be a finite-dimensional space, and let ex,... , en be an arbitrary basis in X. Then with every linear operator A acting in the space X we can associate the matrix of A in the basis ex,.. . , en. Just like the operators themselves, the corresponding matrices can be added, multiplied and raised to powers in accordance with the rules of Sees. 4.41-4.43. The dimension of the linear space of all matrices of order n can easily be found. In fact, let Etj be the matrix whose elements are all zero except for the
102      LINEAR FUNCTIONS OF A VECTOR ARGUMENT
CHAP. 4
element in the z'th row andyth column, which, to be explicit, we choose to be 1. Then the matrices Eu (i,j = 1,. . . ,n) are obviously linearly independent. On the other hand, every matrix of order n is a linear combination of the matrices EH. Hence the matrices Eu form a basis in the space of all matrices of order w. Since the number of matrices Ei} is «2, the dimension of the space of all matrices of order n is just n2 (see Sec. 2.35). The space of all linear operators acting in X = Kn obviously has the same dimension n2.
474. Examples
a. Multiplication by the complex number <x> = a + ifi is a linear transformation in the xy-plane, which can be described by a real matrix of order two. It follows from the multiplication formula
(a + + iy) = (olx - fly) +        + ay)
that this matrix is of the form
a	-ß
ß	a
This rule establishes a one-to-one correspondence between complex numbers co = a + /f$ and real matrices w of order two, where (as is easily verified) the sum (or product) of two numbers goes into the sum (or product) of the corresponding matrices. This is described by saying that the matrices -m form an exact representation of the field of complex numbers (see Sec. 11.21).
b. Let Bfc (k > 0) denote the operator which "lowers indices by k," i.e., the operator carrying each basis vector em (m = 1,. . . , h) into the basis vector em_k if m — k > 0 and into 0 if m — k < 0. Obviously
B0 = E,     BkBr = Bfc+r,
and, in particular,
The matrix of the operator Bx is
0 10-0 0  0   1   *** 0
0 0 o ... 1 ooo ... 0
sec. 4.7
linear operators mapping a space K„ into itself 103
while that of the operator Bk (k < n) is
(* + 1)
0 0
0
0
1 0
0 1
0 0
0 0
0
0
1
0
4.75. The determinant of the product of two matrices. Let A = ||aifcj| and
B — ll^jfcll De any two n x n matrices, and let C = AB be their product. Applying Theorem 4.54 to the minor M\"-^{AB), which is just the determinant of the matrix AB, we get
det AB = det A det B. Thus we have proved the following
(21)
Theorem. The determinant of the product of two n x n matrices equals the product of the determinants of the matrices.
There also exist direct proofs of this theorem, i.e., proofs which do not rest on a proposition like Theorem 4.54. Here is one such proof. Consider the determinant
D =
		-1	0 •	0
	• b2n	0	-1	0
bn\ '		0	0 ••	•   - 1
0 •	•• 0	an		
0 •	■• 0	a2i	* *	a2n
0 •	• 0		a«2 '	a "nil
of order 2n. By Sec. 1.32, the determinant D equals the product of the determinants of the matrices
A =
a
ii
a
In
a
a.
so that
B =
* ^1«
• b
D = det A det B.
(22)
104     linear functions of a vector argument
chap. 4
But there is another way of evaluating D. Using the elements —1 in the first n rows and last n columns of D, we can make all the elements in the last n rows and last n columns of D vanish. This is done by adding to the (« + l)st row of D the first row multiplied by an, the second row multiplied by aVi, the «th row multiplied by aln, then adding to the (n + 2)nd row of D the first row multiplied by a21, the second row multiplied by #22,. .. , the «th row multiplied by #2„, and so on, until we finally arrive at the last (2«th) row. This gives
d =
6n
bnün + bnüit + ■ ■ • + bnlaln biiCtzi + &2itfe2 + * * * + bnla2n
bnanl + bzlani + * * • + bnlann
bm
bin
1 0
0 -1
bnn 0 0
&m#ii + * * * + bnnaln 0 0
bi„a21 + " * 4- bnnain 0 0
bi*anl + •• • + bnnann 0 0
and hence, by Laplace's theorem Sec. (1.81)
-1    0  *•*    0 blla11 + '■■ + bnlaln 0-1   " •    0 btlazl + • * • + bnla2n
• blnan + • - - + b„naln ' blna21 + • • • + bnnain
0 0 • * • -1 6n*7„i + * • * + bnlann * • * blnani + • * ■ + bnna„ *3ii6n + • • • + alnbnl   ■ • •  üubin + • — + alnbnn
041*11 + " * + *W>«1    * ' *    #21*1« + * *' + *W>„«
#„Ai + * *' + annb„i
anlbin + - - + annbr,
- det (AB). (23)
Comparing (22) and (23), we get (21), thereby proving the theorem.
A square matrix A is said to be nonsingular if det A y= 0 and singular if det A = 0. It follows from (21) that if the matrices A and B are nonsingular, then so is the product matrix AB, while if at least one of the matrices A and B is singular, then so is AB. These conclusions can also be deduced from Theorem 4.65 and Corollary 4.67.
4.76. The inverse operator. In keeping with the definition given in Sec. 4.68, an operator B acting in a space X is called a left inverse of the operator A acting in the same space X if
BA ^ E,
where E is the identity operator. The operator A is then called a right inverse of the operator B.
sec. 4.7
linear operators mapping a space K„ into itself 105
a. It is possible for an operator A to have many left inverses and no right inverses at all (see Problems 25 and 26) or, conversely, many right inverses and no left inverses at all. However, suppose A has both a left inverse P and a right inverse Q, so that
P = pe = P(AQ) = (PA)Q = EQ = Q.
Fixing Q, we see that every left inverse coincides with P and hence is uniquely determined. In just the same way, the right inverse Q is uniquely determined under these circumstances. The uniquely determined operator P = Q, which is simultaneously both a left and a right inverse of the operator A, is called the inverse of the operator A and is denoted by A-1. The operator A itself, with the inverse A-1, is said to be invertible (or nonsingular).
b. Let A be an operator acting in an ^-dimensional space X = Kn, and let A be the matrix of A in some fixed basis e1}... , en. Then either det A ^ 0 or det A = 0. In the first case, the rank of the matrix A equals n and it follows from Theorem 4.69b that A has both a left and a right inverse. Correspondingly, the operator A then has both a left and a right inverse, and hence is invertible. However, if det A — 0, then, by Theorem 4.69b again, the matrix A has neither a left nor a right inverse, and hence the operator A acting in Kn has neither a left nor a right inverse.
4.77. The matrix of the inverse operator. Let A be an invertible operator acting in an w-dimensional space X, and let B = A-1 be its inverse. Choosing a basis ely.. . , e„, let A — ||ajJ>|| and B — \\b[i}\\ be the matrices of the operators A and B in this basis.
We now find an explicit formula for the elements b\j) in terms of the elements a{/K Fixing the row number i, we use formula (8), p. 85 to write down expressions for the elements of the zth row of the matrix BA = E:
b?>a™ + b?>a?> + • • • + b™a{» = 0, W + by a? + • • • + = 1,
blval*> + bl2V2n) + * * * + b[n]a(nn) - 0.
The unknowns . .. , b\n) can be determined from this system of equations by using Cramer's rule (Sec. 1.73), since det A 0 by hypothesis. Expanding the determinant in the numerator of the resulting expression for b^ with respect to the yth column, we get
A{i) det A
where Aiil) is the cofactor of the element a[i] in the matrix A. In words, the element b[i] of the inverse matrix A"1 equals the ratio of the cofactor of the
106     linear functions of a vector argument
chap. 4
element a\i] of the original matrix A to the determinant of A. Thus we have proved the following.
Theorem. Every nonsingular matrix A = ||^.y,|| has a unique inverse matrix B = \\b^]\\ such that
AB = BA = E. The elements of the matrix B are given by formula (24).
4.78. Let A~l be the inverse of the operator A, as in Sec. 4.76a. Then by A~fc we mean the operator (A-1)*. It is easily proved by induction that formula (20) continues to hold for negative powers. Powers of the inverse matrix are defined in just the same way, and then the validity of the formula
Am+n = AmAn      (#w,n = 1,2, . . .)
for negative powers of matrices is an immediate consequence of the validity of (20) for negative powers of operators.
4.8. Invariant Subspaces
4.81. Given a linear operator A acting in a linear space K, we say that a subspace K' K is invariant with respect to (or under) A if x e K' implies AxeK'. In particular, the trivial subspaces, i.e., the whole space and the space whose only element is the zero vector, are invariant with respect to every linear operator. Naturally, we will be interested only in nontrivial invariant subspaces.
4.82. The linear operators given in the examples of Sec. 4.72 will now be examined from this point of view.
a-c. Every subspace is invariant with respect to the operators of Examples 4.72a-c (the zero operator, the identity operator, and the similarity operator).
d. The rotation operator in the plane (Example 4.72d) has no nontrivial invariant subspaces, unless the angle of rotation equals mn where m is an integer (in which case, every one-dimensional subspace is invariant).
e. The projection operator (Example 4.72e) has the following invariant subspaces (among others): The subspace K' of vectors
m
* = 2 lkek
which remain unchanged and the subspace K" of vectors
n
k—m\-l
which are carried into zero.
sec. 4.8
invariant subspaces 107
f. Every subs pace spanned by some of the basis vectors eu e2, . . . , en is invariant under a diagonal operator (Example 4.72f).
4.83. Suppose an operator A acting in an n-dimensional space K„ has an invariant m-dimensional subspace Km. Choose a basis el,. . . , en for K„ such that the first m vectors ex,. . . , em lie in Km. Then
Aex = a[l)ex + • • * + a^em,
Aem - a[m)ex + • • • + a^em, and hence the matrix of the operator A is of the form
A =
4l)
a
0
0
a[m)
"l
a
(rn)
a
(m+l)
0
0
a
(m+l)
a
(n)
(n) m+l
a
(m+l)
(25)
in the given basis. Note that all the elements in the first m columns of this matrix vanish if they appear in rows m+l through n. Conversely, if the matrix of an operator A is of the form (25), then the subspace spanned by the vectors ex, ,
, em is invariant under A.
4.84. Suppose the space K„ can be represented as a direct sum of in-
variant subs paces E, F, such that the vectors
, H (see Sec. 2.45), and choose a basis for K,
ex, . .. , er lie in E, fx, . . . ,fs lie in F,
hi, ... , hu lie in H. Then the matrix of the operator A has the quasi-diagonal form
(26)
108     linear functions of a vector argument
chap. 4
where the square matrices A(e), A(f), . .. , Am along the diagonal are made up of elements a(.j), b\j),. . . , d[j) in accordance with the formulasf
Ae, ^ 2 al%,
t=l
Ah, - 2 dl%,
while all the elements outside the matrices Aie), A(f), . . . , Am vanish. Conversely, if the matrix of an operator A is of the form (26) in some basis, then the space Kn can be represented as the direct sum of the invariant subspaces spanned by the corresponding groups of basis vectors.
4.9. Eigenvectors and Eigenvalues
4.91. A special role is played by the one-dimensional invariant subspaces of a given operator A; they are also called invariant directions (or eigenrays). Every (nonzero) vector belonging to a one-dimensional invariant subspace of the operator A is called an eigenvector of A. In other words, a vector x 0 is called an eigenvector of the operator A if A carries x into a collinear vector, i.e., if
Ax = Xjc.
The number X appearing in (27) is called the eigenvalue (or characteristic value) of the operator A, corresponding to the eigenvector x.
4.92. We now reexamine the examples of Sec. 4.72 from this standpoint.
a-c. In Examples 4.72a-c, every nonzero vector of the space is an eigenvector and the corresponding eigenvalues 0, 1, X.
d. The rotation operator (Example 4.72d) has no eigenvectors unless the angle of rotation equals miz where m is an integer.
e. The projection operator (Example 4.72e) has eigenvectors of the form
m
and
y = 2
f Cf. formula (18), p. 98.
sec. 4.9
eigenvectors and eigenvalues 109
with corresponding eigenvalues 1 and 0. It can be verified that the projection operator has no other eigenvectors.
f. The diagonal operator (Example 4.72f) by its very definition has the eigenvectors ex, e2,. . . , en with corresponding eigenvalues X1} X2,. . . , XM.
4.93. Next we prove two simple properties of eigenvectors.
a. Lemma. Given an operator A with eigenvectors xx, x2,. . ., xm and corresponding eigenvalues Xl5 X2, . . . , Xm, suppose Xf ^ X3- whenever i ^ j. Then the eigenvectors xx, x2, . . . , xm are linearly independent.
Proof. We prove this assertion by induction on the integer m. Obviously, the lemma is true for m — 1. Assuming that the lemma is true for any m — 1 eigenvectors of the operator A, we now show that it remains true for any m eigenvectors of A. In fact, assume to the contrary that xx, x2, . .. , xm are linearly dependent, so that there is a linear relation
aiXx + a2x2 + * * * + <xmxm = 0
between the eigenvectors xx, x2,. . . , xm, with ocx ^ 0, say. Applying the operator A to this relation, we get
txl'\lx1 + a2X2x2 -r • • • + v-m^mXm — 0-
Multiplying the first equation by Xm and then subtracting it from the second equation, we find that
which by the induction hypothesis implies that all the coefficients
<Xi(^i — ^m), «2(^2 — \»), • • • > am_i(^m_i — Xm) vanish, in particular that
ai(Xx ~ a j = 0,
contrary to the assumption that a.x 0, Xi 7^ Xm. This contradiction shows that the eigenvectors xx> x2, .. . , xm must be linearly independent. |
In particular, a linear operator A acting in an n-dimensional space cannot have more than n eigenvectors with distinct eigenvalues.
b. Lemma. The eigenvectors of a linear operator A corresponding to a given eigenvalue X span a subspace K(A) K.
Proof. If
Axx = Xjy"i,      Ax2 :== X^2,
then
A(a^! + $x2)     v.Axx + f$Ax2 = aXxx + fJXXj = X(axx + f}x2). |
110     linear functions of a vector argument
chap. 4
The subspace K(x> is called the eigenspace (or characteristic space) of the operator A, corresponding to the eigenvalue X.
4.94. Next we show how to calculate the components of the eigenvectors of an operator A, where A is specified by its matrix in some basis eu e2,... , en of the space Kn. Suppose the vector
n
x = 2 5***
*:=i
is an eigenvector of A, so that
Ax = Xjc (27)
for some X. Using (5), p. 80, we can write (27) in component form as
a[l% + a[2% + • • • + a[n)ln = X^, a2l% + afl2 + • • • + a^ln = X£2,
or
a{nl% + a«% + • • • + a{:Xn = ^n
(a[l) - X)5x + + * * * + Ä - 0,
a(2% + (42> - X)5a + • • • + a2n%n = 0,
(28)
(2)J
= 0.
This homogeneous system of equations in the unknowns £x, £2, .. . , ^ has a nontrivial solution if and only if its determinant vanishes (see Sec. 3.22);
A(X)^
a?" - X
42)
fl22>
(2)
o.
(29)
The polynomial A(X) of degree n in X is called the characteristic polynomial of the matrix A."\ To each of its roots X0 e K there corresponds an eigenvector of the operator A obtained by substituting X0 for X in (28) and then solving the resulting compatible system for the quantities . .. , £M. Moreover,
X0 is obviously the eigenvalue corresponding to this eigenvector. In particular, it follows that although the matrix of the operator A depends on the choice of the basis ex, e2,. . . , en, the roots of the characteristic polynomial of the
t Correspondingly, equation (29) itself is called the characteristic equation of A.
sec. 4.9
eigenvectors and eigenvalues      I 11
= 0
matrix no longer depend on the choice of basis. We will discuss this matter further in Sec. 5.53.
4.95. We now study the various possibilities which can occur in solving the characteristic equation (29).
a. The case of no roots in the field K. If equation (29) has no roots at all in the field K, then the linear operator A has no eigenvectors in the space Kn. For example, as already noted, the rotation operator in the plane V% corresponding to rotation through an angle
y0^miz      (m =0, ±1, ±2, .. .) (30)
has no eigenvectors. This fact, which is geometrically obvious, is easily proved algebraically. Indeed, for the rotation operator, equation (29) takes the form
cos cp0 — X     —sin 90
sin 90      cos 90 — X
(see Example 4.72d), which becomes
1 — 2X cos 90 + X2 = 0
after calculating the determinant. But this equation has no real roots if (30) holds.
b. If K = C is the field of complex numbers, then by the fundamental theorem of algebra, equation (29) always has a root X0 e K. Thus in the space Cn every linear operator has at least one eigenvector.
c. The case of n distinct roots. If all n roots of equation (29) lie in the field K and are distinct, we can find n distinct eigenvectors of the operator A in the space Kn by solving the system (28) for X = X1? X2, . . . , X„ in turn. By Lemma 4.93a, the eigenvectors f,f2, • - • ,fn so obtained are linearly independent. Choosing them as a new basis, we can construct the matrix of the operator A in this basis. Since
A/2 = ^2/2?
^nfn
the matrix Aif) has the form
Xx 0 0 X,
0 0
(31)
0 0
112     LINEAR FUNCTIONS OF A VECTOR ARGUMENT
CHAP. 4
Recalling the definition of a diagonalizable operator (see Example 4.72f), we can formulate this result as follows: Let A be an operator in the space Kn, whose matrix {in any basis) has a characteristic polynomial with n distinct roots in the field K. Then A is diagonalizable. The matrix of A in the basis consisting of its eigenvectors is diagonal, with diagonal elements equal to the eigenvalues of A.
d. On the other hand, if the operator A has a diagonal matrix of the form (31) in some basis fi,f2, • • • ,fn of the space K„ with arbitrary, not necessarily distinct numbers Xx, X2,.. . , Xn along the diagonal, then the vectors fx,f2, •••,/„ are eigenvectors of A and the numbers Xx, X2,. .. , Xn are the corresponding eigenvalues.
To see that A has no eigenvalues other than Xl7 X2,. .. , Xrt, suppose X is an eigenvalue of A corresponding to the eigenvector
/ = I Mi,
i=i
so that Af = X/. Then, comparing coefficients off in the equations
(71 71 71
i=i   i i=i i=i
we get
X& - \& (i = 1, 2,. . . , n). (32)
But at least one of the numbers pl5 (32, . . . , (3n is nonzero, say (3X ^ 0. Thus, choosing i = 1 in (32), we find that X = Xx, i.e., X is already one of the numbers Xx, X2,. .. , X„.
e. The case of multiple roots. Let X = X0 be a root of multiplicity r > 1 of the characteristic equation (29). The following question then arises: What is the dimension of the corresponding eigenspace K(Xo>, or in other words, how many linearly independent solutions does the system (28) have for X = X0? This question can be answered exactly from a knowledge of the rank of the matrix of the system (28), but we would like an answer which involves only the multiplicity r of the root X0.
In Examples 4.72a-c and 4.72e, it is easily verified that the dimension of each eigenspace K(Xo) is the same as the multiplicity of X0 as a root of the characteristic equation of the given operator. However, this is not true in general. For example, let A be the operator in R2 with matrix
A -
*0	0
V-	
problems      I I 3
where jjl^O is arbitrary. Here the characteristic polynomial is (X0 — X)2 and has a double root X = X0. Correspondingly, the system (28) takes the form
\i ■ Ix + o • u = 0, which, to within a numerical factor, has the unique solution
£i = 0,      l2 = 1.
Thus the eigenspace of the operator A corresponding to the eigenvalue X0 has dimension 1, which is less than the multiplicity of the root V
It can be shown that in the general case the dimension of the eigenspace K(x«) does not exceed the multiplicity of the root \ (see Chapter 5, Problem 7). A complete solution to the problem of finding the dimension of the space K(*0> for the case K ~ C will be given in Chapter 6, after showing how to determine the "canonical form" of the matrix of the given operator.
PROBLEMS
1. After defining in a natural way addition of linear forms and multiplication of a linear form by a real number, construct a new linear space K* consisting of all the linear forms defined on some linear space K. If the dimension of the space K is n, what is the dimension of the space K* ?
2. Which of the following vector functions defined on the space V3 are linear operators:
a) Ax = x + a (a is a fixed nonzero vector);
b) Ax = a;
c) Ax = (a, x)a;f
d) Ax = (a, x)x;
e) Ax = (l\t £2 + £3, $3), where x =      £2, £3);
f) Ax = (sin £x, cos £2, °);
g) Ax = (2^ - 5,, £2 + 5», $0?
3. Consider the following operations in the space of all polynomials in t\
a) Multiplication by t;
b) Multiplication by t2;
c) Differentiation.
Are these linear operators?
4. Suppose the operator A defined on V3 carries the vectors
^=(0,0,1),     x2 = (0,1,1),     x,-(1,1,1)
t Here (a, x) denotes the usual scalar product of the vectors a and x, i.e., the number equal to the product of the lengths of the vectors and the cosine of the angle between them.
114     linear functions of a vector argument
chap. 4
into the vectors
y, = (2,3,5),      y2 = (1,0,0),     y3 = (0, 1, -1).
Form the matrix of A in the following bases:
a) ex = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1);
b) xXf x2, x%.
5. In three-dimensional space let A denote the operator corresponding to rotation through 90° about the axis OX (taking O Y into OZ), let B denote the operator corresponding to rotation through 90° about the axis OY (taking OZ into OX), and let C denote the operator corresponding to rotation through 90° about OZ (taking OX into O Y). Show that
A4 = B4 = C4 = E,     AB ^ BA,     A2B2 = B2A2.
Is the relation ABAB - A2B2 valid?
6. In the space of all polynomials in t, let A denote the differentiation operator and let B denote the operator corresponding to multiplication by the independent variable t, so that
AP(t) = P'{t),     hP(t) = tP(t). Is the relation AB = BA valid? Find the operator AB — BA.
7. Assuming that AB = BA, prove the formulas
(A + B)2 - A2 + 2AB + B2,
(A + B)3 = A3 + 3A2B + 3AB2 + B3.
How must these formulas be changed if AB BA?
8. Assuming that AB — BA = E, prove the formula
AmB - BAm = mAm~l      (m = 1, 2, ...).
9. Find the dimension of the linear space K™ of all linear operators mapping an ^-dimensional space Kn into an m-dimensional space Km, and construct a basis for K™
10. Find the product AB of the matrices A and B, where
	1	2	3		-1	-2	-4
A -	2	4	6	,      B =	-1	-2	-4
	3	6	9		1	2	4
11. Raise the following matrices to the nth power:
	1 1		cos 9	-sin 9
A =		,      B =		
	0 1		sin 9	cos 9
12. Find all matrices A of order two satisfying the condition
0	0
0	0
problems      I 15
13. Calculate AB - BA where
		1 2	2				4 1	1
a)	A =	2 1	2		5 =		-4 2	0
		1 2	3				1 2	1
		2	1	0			3	1
b)	A =	1	1	2			3	-2
		-1	2	1			-3	5
14. The sum axl + • *' + of the diagonal elements of a matrix A = \\aik\\ is called the trace of ^, denoted by tr A. Prove that
tr (A + B) = tr A + tr B, tr        = tr (BA).
15. Prove that the formula AB — BA = E is impossible for operators A and B acting on an /i-dimensional space K„.
Comment. The result of Problem 6 shows that the assumption that the space Kn is finite-dimensional plays an essential role here.
16. Given a square matrix Cof order two such that tr C = 0 (cf. Problem 14), show that C can be represented in the form
C = AB BA
where A and B are (unknown) matrices of order two.
17. Let
*j = t%i}et      (./ = l,2,...,m)
be m linearly independent vectors in an w-dimensional space, and let A be the operator defined on the linear manifold L(xl5 x2,.. • , xm) such that
m
yi - Axj = 2 ak)xk      (j = l,2,...,m).
Show that every minor of order m of the matrix made up of the components of y>j (with respect to the basis ex, e2,. .. , en) equals the product of det llaj^ll with the corresponding minor of the matrix made up of the components of the vectors x$.
18. Show that if the basis minor of a matrix of rank r appears in the upper left-hand corner, then the ratio of any minor M of order r to the minor appearing in the same columns as M but in the first r rows depends only on the column indices of the minor M.
I 16     linear functions of a vector argument
chap. 4
19. Show that if A is a matrix of rank r, then any second-order determinant of the form
jl^ii.ts.•--.»>     fyfii-iz,.-At il.is.....ir ki.ki.....kt
\jfk\.k%....,kf i\jfki,k2,...,kT JVIii.i2.....if lvtki.k2.....kT
consisting of minors of order r of the matrix A, vanishes.
20. Show that every minor of order k of the matrix ABC equals a sum of products of certain minors of order k of the matrices A, B and C.
21. Find the inverses of the following matrices:
			1	2	-3
	1 2				
A =		B =	0	1	2
	2 5				
			0	0	1
C -
22. Prove that
for any nonsingular matrix A.
1
2
1
2
l
-I
(A'r1 = oo'
23. Find all solutions of the equation XA = 0, where ,4 is a given second-order matrix, X is an unknown second-order matrix and 0 is the zero matrix (the matrix all of whose elements vanish).
24. Let A = llaj" II be any square matrix of order n, and let A\3) be the cofactor of the element in the determinant of A. The matrix Ä = \\A{^ || is called the ad jugate of the matrix A. Prove that
ÄA = AÄ = (det A)E.
25. In the space of all polynomials in the variable t, consider the operators A and B defined by the relations
fn—i
n+l
A[a0 + a*J + * * * + antn] = ax + a2t + * • • + anV B[a0 + axt + - ■ - + antn] = a0t + axt2 + ■ • •
Show that A and B are linear operators and that
AB = E,      BA ^ E.
Does the operator A have an inverse?
26. Show that the operator B of Problem 25 has infinitely many left inverses.
27. Prove that if A is a nonsingular linear operator acting in an /i-dimensional linear space, then every subspace invariant under A is also invariant under A"1.
28. Prove that if the linear operators A and B commute (i.e., if AB = BA), then every eigenspace of the operator A is an invariant subspace of the operator B.
problems 117
29. Prove that if a direct sum (Sec. 2.45) of eigenspaces of an operator A coincides with the whole space k and if each eigenspace of the operator A is invariant under an operator B, then A and B commute.
30. Let x and y be eigenvectors of the operator A corresponding to distinct eigenvalues. Show that ax + $y (a ^ 0, (3 ^ 0) cannot be an eigenvector of A.
31. Prove that if every vector of the space k is an eigenvector of the operator A, then A = XE (X e K).
32. Prove that if the linear operator A commutes with all linear operators acting in the given space, then A = XE.
33. Let the linear operator A have the eigenvector e0, with eigenvalue X0. Show that eQ is also an eigenvector of the operator A2, with eigenvalue X2.
34. Even if a linear operator A has no eigenvectors, the operator A2 may have eigenvectors (e.g., the operator corresponding to rotation through 90° in the plane). Show that if the operator A2 has an eigenvector with a nonnegative eigenvalue x = [i.2, then the operator A also has an eigenvector.
35. Find the eigenvalues and eigenvectors of the operators given by the following matrices:
c)
	2		1	—	1				1		-2		2
a)	0		1		0	; b)			0		1		0
	0		2		1			0	0	0	0	1	1
2	—	1		0				-1		0		1	
0		1		-1		d)		0		0		0	
0		1		3				0		0		0	
36. Verify the following facts:
a) The relation N(A)    T(A) is necessary and sufficient for the equality A2 = 0 to hold;
b) N(A) <= N(A2) «= N(A3) c ■ ■ ■ for any operator A;
c) T(A) => T(A2) => T(A3) => ■ - ■ for any operator A;
d) If T(A*) = N(Am), then
T(A) c NtA'"^'-1),     T(Am+*-1) «= N(A).
37. Show that every linear operator A of rank r can be represented as the sum of r linear operators of rank one.
38. Find all the invariant subspaces of a diagonal operator with n distinct diagonal elements, and show that there are 2" such subspaces.
chapter 5
COORDINATE TRANSFORMATIONS
As is well known, in solving geometric problems by the methods of analytic geometry a very important role is played by the proper choice of a coordinate system. Proper choice of a coordinate system also plays a very important role in a much wider class of problems connected with the geometry of w-dimensional linear spaces. This chapter is devoted to a study of the rules governing coordinate transformations in /i-dimensional spaces. In particular, the results obtained here are fundamental for the classification of quadratic forms which will be made in Chapter 7.
5.1. Transformation to a New Basis
be another basis in the same space. The vectors of the system {/} are uniquely determined by their expansions in terms of the vectors of the original basis:
5.11. Let
fx - P?^ + pil}e2 + • • • h = P?'* 4- Pi2)e2 + * • •
(1)
L = Pi^i + p[nie2
_i_
118
sec. 5.1 transformation to a new basis 119
or, more concisely,
fj = Ipli)ei     0=1,2,...,«). (2) The coefficients p\j)      = 1, 2,...,«) in (1) and (2) define a matrix
P = \\P(A\ =
called the matrix of the transformation from the basis {e} to the basis {/}. As was done previously in similar cases (Sec. 4.2 ff.), we write the components of the vectors(with respect to the basis {e}) as the columns of the matrix P. By the same token, the formulas (1) together with the matrix P specify a corresponding linear operator P, defined by the relations f = Pe» 0* = 1, 2,.. . , n) and called the operator of the transformation from the basis {e} to the basis {/}.
The determinant D of the matrix P is nonvanishing, since otherwise the columns of P, and hence the vectors fx,f2,... ,fn, would be linearly dependent (Sec. 3.12a). A matrix with a nonvanishing determinant is said to be non-singular (recall Sec. 4.75). Thus the transformation from one basis of the n-dimensional space Kn to another basis is always accomplished by using a nonsingular matrix.
5.12. Conversely, let {e} — {ex, e2, . . . , en} be a given basis of the space Kw, and let P = \pKp\ be a nonsingular matrix of order n. Using the equations (1), construct the system of vectors fx, f2, . . . ,fn. It is clear that these vectors are linearly independent, since the columns of every non-singular matrix are linearly independent (Sec. 3.12a). Consequently, the vectors fltfz, form a new basis for the space Kw. Thus every non-singular matrix P = Wp^W determines via (1) a transformation from one basis of the n-dimensional space Kn to another basis.
5.13. Next we note a particular case of a transformation to a new basis, i.e., the case where every vector/^ is just the corresponding vector ek multiplied by a number Xk ^ 0 (k = 1, 2,. . . , n). Then the equations (1) take the form
/i = Mi,
ft = ^-2e2->
fn ' "^rfini
Pil) A"' P?» Pi"'
p[1) pT
p[n)
P^
120     coordinate transformations
chap. 5
and the matrix P has the diagonal form
Xx 0
P =
0 X,
0 0
0 0
(3)
In particular, for Xx = X2 = • ■ • = XM = 1, we obtain the matrix of the identity transformation, namely the unit matrix
1 0 0 1
0 0-1
(the original basis is not changed by the identity transformation).
5.2. Consecutive Transformations
5.21. Let P ~ \\p[n\\ be the matrix of the transformation from the basis
{e} = {elt e2i. . . , en)
to the basis
if) = {/1./2, • ■ ■
and let Q = ||^.fc)|| be the matrix of the transformation from the basis {/} to the basis
ig) = igl*       • • • > gn)-
We now determine the matrix of the transformation from the basis {e} directly to the basis {g}. By (2), the formula for transforming from the basis {e} to the basis {/} is
/i = 2A 0-1,2,...,«),
while that for transforming from the basis {/} to the basis {g} is
**=iy*y, (/c-i,2,...,n).
Substituting (4) into (5), we obtain
= 2(2^rU (*=1,2,...,„).
(4)
(5)
(6)
sec. 5.3
transformation of the components of a vector 121
On the other hand, if T = ||^*'|| denotes the matrix of the transformation from the basis {e} to the basis {g}, we can write
S* = 2'!*^      (*=l,2,...,n). (7) Comparison of (6) and (7) gives
tf-lp^q?'     (i,*=l,2f...,«). (8)
Recalling formula (8), p. 85 (where the choice of indices is somewhat different, but not their role), we find that the desired matrix T is the product PQ of the matrices P and Q.
5.22. Consider the following special case of consecutive transformations. Since the matrix P is nonsingular, the system of equations (1) can be solved for the vectors ex, e3, . . . , en. The resulting system of equations
*i = <7'i7i + qFfz + ■ • ■ +
e, = q[2)A + q^h + ■ ■ ■ + q(n2)L,
........................ (9)
en - qin)A + q\n)h + • • ■ + q(:%
obviously determines the transformation from the basis {f} to the basis {e}. The consecutive transformation from the basis {e} to the basis {/} by using the matrix P and then from the basis {/} to the basis {e} by using the matrix Q — \W}k) II is equivalent to the transformation from the basis {e} to itself, i.e., to the identity transformation with unit matrix (3).
5.3. Transformation of the Components of a Vector
5.31. Let {e} = {ex, e2i . . . , en} and {/} = {/i,/2, ...,/„} be two bases in an ^-dimensional linear space K,,. Any vector xeK„ has the expansions
x = lxex -f      H-----1- ^nen = vjifi -\- f]%f2 H-----V v)„/w, (10)
where • ■ ■ -     are the components of the vector x with respect to the
basis {e} and y\x, -/j2,. . . , rjn are its components with respect to the basis {/}. We now show how to calculate the components of the vector x with respect to the basis {/} in terms of its components with respect to the basis
{el
Suppose we are given the matrix P = Wp^W of the transformation from the basis {e} to the basis {/}. Then the vectors {e} are given in terms of the
122     coordinate transformations
chap. 5
vectors {/} by (9) or, more briefly, by
«y=itf}a      (k=l,2,...,n), (11) fc=i
where the matrix Q ~ \\qkj)\\ is the inverse of the matrix/*. Substituting (11) into the expansion (10), we get
3 = 1 fc=l J=l      \fc=l / fc=l\j = l /
It follows by the uniqueness of the expansion of the vector x with respect to the basis {/} that
^■=14%      (*= l,2,...,n), (12)
>=i
or, in expanded form
1i = 9i 5i + <7i £2 + • ■ ■ + <7i
yj2 = q?% + ^2 + ■ ■ ■ + #>5„,
i„ =     + v(:% + ■ ■ ■ + sir's,.
Thus components of the vector x with respect to the basis {/} are linear combinations of the components of the vector x with respect to the basis {e}; the coefficients of these linear combinations form a matrix which is the transpose of the matrix of the transformation from the basis {/} to the basis {e}, i.e., the transpose of the inverse of the matrix P. Denoting the inverse of the matrix P by P_1 and the transpose of a matrix by a prime, we find that the matrix S describing the transformation from the components ^, £2, . . . , £„ to the components ra, v)2, . . . , yj„ is given by
S = (P-1)'. 5.32. The converse proposition is also valid:
Theorem. Let ^, £2, . . . , \n be the components of an arbitrary vector x with respect to the basis {e} = e2, . . . , e^} of the n-dimensional space Kn, and let the quantities 7)1? rj2, . . . , rin be defined by the formulas
ril — Sll%il + ^12^2 + ' ' " + slti^n> ^2 ~ S2l*il ~\~ S22^s2 + * ' ' + S2n%in?
f\n       Snl^l "T" sn2^2 + ' ' ' + Snr&n,
where det ||jiJtH ^ 0. Then a new basis {/} = {/i,/2,. . . ,/„} can be found in the space K„ such that the numbers ra, r\2, ■ . . , f}n are the components of the vector x with respect to the basis {/}.
SEC. 5.4 TRANSFORMATION OF THE COEFFICIENTS OF A LINEAR FORM 123
Proof. Introduce the matrix S = \\sjk\\ and the matrix P = (5') 1 with elements denoted by p[jK Substituting these elements into the formulas (1), we get a new basis {/} = {/i,/2> • • ■ >/«}■ We assert that this is the desired basis. In fact, consider the transformation formulas (12), which give the components of the vector x with respect to the new basis. As we have seen, these formulas can be written in terms of the matrix CP"1)'. But in the present case, (P~1)' coincides with S, since
(p-1)' = ([(st1!-1)' - (s'y = s.
Hence, given any vector x, the quantities yj^ y]2, . . . , rj„ are just the components of x with respect to the basis {/}. |
5.33. Just as in Sec. 5.21, we can construct the matrix corresponding to consecutive transformations of the components of a vector. Let £ls £2, . . . , £n be the components of the vector x with respect to the basis {e}, and let the quantities     yj2, . . . , v\n and tx, t2, . . . , t„ be defined by the equations
n
t\i = 2 Pali      0* = 1, 2, . . . , n),
n
Tit =2m,-     (k=l,2,..., n),
3 = 1
respectively, where the matrices P = \\pH\\ and Q = \\qkj\\ are nonsingular. Then, just as before, we can express the quantities tl5 r2, . . . , t„ directly in terms of the quantities     £2, . . . ,    by the formulas
TO     /TO \ TO
Tfc = 2 ( HQmPh): = 2 = !» 2, • . ■ , «),
»=l / *=l
where the quantities (/*,& = 1,2,...,«) form a matrix T equal to the product QP of the matrices 2 and ^
5.4. Transformation of the Coefficients of a Linear Form
Let L(.y) be a linear form defined on a space K„. As we saw in Sec. 4.1, if a basis {<?} = {<?l5 e2, - • • > is chosen in KB, then the values of L(x) can be calculated from the formula
L(x)=i/^,
A=l
where Z,k {k = 1, 2, ...,«) are the components of the vector x with respect to the basis {<?}, and the coefficients lk are given by
lk = L{eJ (*=l,2,...,n).
124     coordinate transformations
chap. 5
The coefficients lk obviously depend on the choice of the basis {e}. We now derive the rule governing the transformation of the coefficients of a linear form when we go over to a new basis. Suppose the formulas
ft = 2 Pi\     U = 1, 2, ... , n) (13)
give the transformation from the basis {e} to the new basis {/}. We wish to find the coefficients of the linear form L(x) in the basis {/}. These coefficients are the numbers X, = L(f), which can easily be found by using (13):
h = Ufs) = I Pif)UeJ = 2 p[%.
i=i i=i
Thus the coefficients of a linear form transform in the same way as the basis vectors themselves.
5.5. Transformation of the Matrix of a Linear Operator
5.51. Given a linear operator A in an n-dimensional space K„, let A(e) = be the matrix of A in the basis {e} = {eu e2, . .. , en}, while A(f) = Wa^W is its matrix in the basis {/} = {/ls/2, ■ ■ ■ »/«}■ Moreover, suppose the transformation formulas from the basis {e} to the basis {/} have the form
/*=2>?W     (*=l,2,...,n), (14)
j=i
and let P denote the matrix ||/»J.fc,||. We now find the relation between the matrices Ale), A(f) and P.
The matrix A(e) is defined by the system of equations
Aet = 2 a[%     (y - 1, 2,... , n), (15)
i=i
and the matrix A{f) by the system of equations
A/m=i«im)A (m=l,2,...,ii).
fc=i
In the last equation, we use (14) to replace the vectors fk by their expressions in terms of the vectors e}. The result is
A/m =2«^ 2 Pl% = 2 (lP?Km))e,
fc=l        i=l i=l \fc=l /
after changing the index of summation from j to i. Next we apply the operator
sec. 5.5
transformation of the matrix of a linear operator 125
A to both sides of (14), changing k to m and using the expansion for Ae3 given by (15):
; = 1 t=l j=l\j=l /
Comparing coefficients of ei in the last two expansions, we find that
k=i y=»i
or
PAt„=Au)P (16)
in matrix form. This is the desired relation between the matrices Aie), Aif) and P. Multiplying on the left by the matrix P-1, we get the following expression for the matrix A(f):
Aif) ~ P~1A(e)P-
5.52. It follows from (16) and the theorem on the determinant of a product of two matrices (Sec. 4.75) that
det P det A{f) = det A(e) det P,
or, since det.P ^ 0,
det Aie) = det A(f).
Thus the determinant of the matrix of an operator does not depend on the choice of a basis in the space. Therefore we can talk about the determinant of an operator, meaning thereby the determinant of the matrix of the operator in any basis.
5.53. Besides the determinant, there exist other functions of the matrix elements of an operator which remain unchanged under transformation to a new basis. To construct such functions, consider the operator A — XE, where X is a parameter. This operator obviously has the matrices A(e) — XE and Alf) — X£ in the bases {e} and {/}. By what was just proved, we have
det (A(e) - XE) = det (Am - XE)
for any X. Both sides of this equation are polynomials of degree n in X. Since these polynomials are identically equal, they have the same coefficients for any power of X. Hence these coefficients, which are functions of the matrix elements of the operator, are invariant under changes of basis.
126     coordinate transformations
chap. 5
We now examine the nature of these functions. The determinant of the matrix Au) — XE has the form
a[l) - X
"2
a[2)
"2
a2n)
a
a
(2)
a
in)
= (-i)nXn + A^1 + ■ ■ ■ + A^X + A„.
It is an easy consequence of the definition of a determinant that the coefficient Ax of X*-1 equals the sum
a
a{2)
"2
of the diagonal elements, taken with the sign (—l)*"1^ The coefficient A2 of X"~2 is the sum of all the principal minors of order 2, taken with the sign (— \)n~2-% Similarly, the coefficient Afc of X"~fc is the sum of all the principal minors of order k, taken with the sign (— l)"-'1". Finally, the coefficient An of X°, i.e., the constant term, is obviously equal to just the determinant of the operator. The polynomial det (Aie) — X£), which, as we have just seen, is independent of the choice of basis, is called the characteristic polynomial of the operator A.
*5.6. Tensors
5.61. The components of a vector, the coefficients of a linear form, the elements of the matrix of a linear operator, these are all examples of a general class of geometric objects called tensors. Before giving the definition of a tensor, we first revise and "rationalize" our notation somewhat- The basis vectors of an ^-dimensional space K„ will be denoted, as before, by the symbols er, e2, .. . , en (with subscripts). The components of vectors, e.g., x andj^, will be denoted by - - ■ > V1 and vj1, yf, . . . , vf (with superscripts). The coefficients
of a linear form L(x) will be denoted by /l9 /2, -. . , /„ (with subscripts). The matrix elements of a linear operator will be denoted by a3., where the superscript designates the row number and the subscript designates the column number (in contradistinction to the notation adopted in Sec. 4.23). The convenience of this arrangement of indices is determined by the following summation convention: If we have a sum of terms such that the summation index t (say) occurs twice in the general term, once as a superscript and once
f The sum a[l) + a[2) -f ■ ■ ■ +      is called the trace of the operator A (cf. Problem 14, p. U5).
X The minor ^)\\)l['..,tjl is said to be a principal minor if i\ — ju i2 ~ jit . . . , i,c ~ jk.
sec. 5.6
tensors      I27
as a subscript, then we will omit the summation sign. For example, with our convention, the expansion of the vector x with respect to the basis {el} e2, . . . , en] takes the form
x =
(although the summation sign is omitted, summation over i is implied). The expression for a linear form L(x) in terms of the components of the vector x and the coefficients of the form becomes
Ux) = 1,1*
(summation over i is implied). The result of applying the operator A to the basis vector et takes the form
Aef = ale}
(summation over j is implied). The components if of the vector Ax are expressed in terms of the components of the vector x as follows:
rt   — ui->
(summation over / is implied).
We will denote quantities pertaining to a new coordinate system by the same symbols as in the old coordinate system but with primes on the indices. Thus we denote new basis vectors by er, e2>, . . . , en>, new components of a vector x by £2', . . . , etc. The elements of the matrix of a transformation from the basis ei to the basis ev will be denoted by p\,, so that
et. = pie, (17)
(summation over / is implied). The elements of the matrix of the inverse transformation will be denoted by q\ , i.e.,
et = qUi- (18)
(summation over /' is implied). The matrix q\ is the inverse of the matrix p\> \ this can be expressed by writing
fO for i ^ U for i =j,
or
.  ., [Ofor/'^y,
PU\ = I , (20)
U for j' = /.
To make the notation more concise, let denote the quantity which depends on the indices i and j in such a way that it equals 0 when the indices are different and 1 when the indices are the same. Then we can write (19) in the form
pw; = Vi (20
128     coordinate transformations chap. 5
and (20) in the form
Mi = (22)
5.62. To show the advantages of using our new notation, we derive once again the formulas by which the components of a vector, the coefficients of a linear form and the matrix elements of an operator transform in going over to a new basis. Thus suppose we have a vector
x = 1% = 1%.. Using (18) to replace et byqle^ , we obtain
which implies
V = qtV, (23)
since the ef> form a basis. This is just the transformation formula for the components of a vector.
Next suppose we have a linear form L(x). The numbers /i} are defined as usual by the relations lv = L(«v)- Using (17) to substitute the expression p\,et for eVi we obtain
lv = L(p{*<) = p},L{ei) = p{''<.
so that
h- = &l» (24)
which is the desired formula.
Finally suppose we have an operator A. The elements of its matrix in the new basis are defined by the relations
Aev = a^ey.
Using (17) to substitute pfe* and p\>e} for the quantities et> and ey, we get
p\,Aei = afol.ej. But Aet = a\e^ so that the result is
Since the ei are basis vectors, we have
To get aK on the right, we multiply both sides by qf and sum over the index j. Using the relation (22), we obtain
By the definition of the quantity      the sum over / reduces to the single
sec. 5.6
tensors 129
term corresponding to the value / = k!. Then S*> = 1 (no summation implied) and we get
a% = p\.q)'a{y (25)
which is the desired formula.
It is not hard to verify that the three transformation formulas just derived are the same as those derived earlier in the ordinary way (see Sees. 5.3-5.5). Formulas (23)-(25) have much in common. In the first place, these formulas are linear in the transformed quantities. Secondly, the coefficients in these formulas are elements of the matrix transforming the old basis into the new basis or elements of the matrix of the inverse transformation or, finally, elements of both matrices.
5.63. We are now in a position to give the definition of a tensor. Tensors are divided into three classes, covariant, contravariant and mixed. Moreover, every tensor has a definite order. We begin by defining a covariant tensor, which, to be explicit, we take to have order three. Suppose there is a rule which in every coordinate system of an n-dimensional space K„ allows us to construct «3 numbers (components) Tiik, each of which is specified by giving the indices /, J, k definite values from 1 to n. By definition, these numbers Tm form a covariant tensor of order three if in going to a new basis, the quantities Tiik transform according to the formula
i'i'k' — Pi'Pj'Pk'iiik'
A covariant tensor of any other order is defined similarly; a tensor of order m has nm components instead of «3 components, and in the transformation formula there appear m factors of the form p\. instead of three factors. In particular, the coefficients of a linear form, which transform by formula (24), constitute a covariant tensor of order one.
Next we define a contravariant tensor of order three. Suppose we have a rule which in every coordinate system allows us to construct n3 numbers Tiik, each of which is specified by giving the indices i,j, k definite values from 1 to «. By definition, these numbers Ti,k, form a contravariant tensor of order three if in going to a new basis, the quantities Tm transform according to the formula
Tn'k' = qfq^Tiik.
A contravariant tensor of any other order is defined similarly. In particular, the components of a vector form a contravariant tensor of order one.
The terms "covariant" and "contravariant," which have just been introduced, are very simply explained. "Covariant" means "transforming in the same way" as the basis vectors, i.e., by using the coefficients p\>. "Contravariant" means "transforming in the opposite direction," i.e., by using the coefficients q\*.
130     coordinate transformations
chap. 5
There is still the case of mixed tensors to consider. For example, numbers Tf., specified in every coordinate system, form a mixed tensor of order three, with two covariant indices and one contravariant index, if in going to a new basis, the quantities Tf. transform according to the formula
A mixed tensor with / covariant indices and m contravariant indices is defined similarly. In particular, the elements of the matrix of a linear operator form a mixed tensor of order two, with one covariant index and one contravariant index. Note the convenience of our arrangement of indices, which has been deliberately chosen to indicate the character of any tensor at a glance.
5.64. Operations on tensors. We can define the operation of addition for two tensors of the same structure, e.g., for two tensors Tf. and S* (with two covariant indices and one contravariant index). In this case, the sum is a tensor of the same structure, defined as follows: In every coordinate system, the component of with fixed indices ij, k is the sum of the corresponding components of T*. and The fact that the quantities Qf. actually form a tensor, and indeed one of the same structure as Tf. and St, is implied by the following equality:
~*      r,k'fTk   I   ck\ i „3'„k's\k
= Pi'Pi'ak\Ta + su) = Pi'PyQkQu-The operation of multiplication is applicable to tensors of any structure. For example, let us multiply a tensor 7*i; by a tensor S[,. The result is a tensor Q\ik of order four. In any coordinate system its component with fixed indices i,j,k, I is defined as equal to the product of the corresponding components of the factors 7~, and S}r The tensor character of Qlm can be verified as follows:
Qi-i'k' = Ti'3'sk' = Pi'PyTijpWiSk = Pi-PrPk-ql Tnsl = Pi-PyPWiQ\n-
Next we consider still another operation called contraction. This operation can be applied to tensors which have at least one covariant index and one contravariant index. For example, suppose we have a tensor 7?.. To contract T\. with respect to the superscript and the first subscript means to form the quantity
in every coordinate system. Here summation over the index / is implied; as a result, the quantity T} = 7**. depends only on the index /'. Contraction of a tensor yields another tensor, whose order is two less than the order of the original tensor. We verify this for the present example. We have
problems     |3 I
Here the summation over k reduces to only one term, corresponding to the value k = i. Since    = 1 (no summation implied), we obtain
as required.
What is the result of contracting a mixed tensor t\ of order'two with respect to its two indices? The quantity t = T\ no longer has even a single index, i.e., in every coordinate system it consists of just one number. This number is the same in every coordinate system, since
t' = t? = pW\t\ = s;rj = t\= t.
Such a scalar quantity, which does not depend on the coordinate system, is called an invariant. Thus, by contracting tensors, we can obtain invariants of the tensors.
For example, if we contract the tensor a\ corresponding to the linear operator A, the invariant a\ so obtained is the trace of the matrix of A, i.e., the sum of its diagonal elements. The invariance of this quantity has already been proved in a different way in Sec. 5.53. As another example, the matrix c) of the product of two operators with matrices a\. and b1., respectively, is the mixed second-order tensor obtained by contracting the fourth-order tensor a^b) with respect to the indices k and /.
PROBLEMS
1. A vector x e K„ has components £1; £2,. . . , ln with respect to a basis elt e2,.. . , en. How does one construct a new basis in Kn such that the components of x with respect to this basis equal 1, 0, . . . , 0?
2. A basis elt e2,.. ■ , en is chosen in an n-dimensional space K„. Show that every subspace K' K„ can be specified as the set of all vectors x e Kn whose components (with respect to the basis el5 e2,.. . , en) satisfy a system of equations of the form
2 atili =0      (/ = 1,2,. . .,k).
3 (Continuation). Show that every hyperplane H ^ K„ can be specified as the set of all vectors x e Kn whose components (with respect to the basis e1( e2, . . . , en) satisfy a system of equations of the form
n
2*»^ = bi      (' = 1, 2, . . . ,k).
3-1
4. Let the components of a vector in the plane be £lt ^2 witri respect to one basis, vi2 with respect to another basis, and tl, t2 with respect to a third basis.
132     coordinate transformations
Suppose that
Ti = ^ii^i + 612^2, A = ||fl«H,
chap. 5
12 = «215i + «22^2, T2 = 621^i + 622£2,
Express the components t1s t2 in terms of the components £ls £2.
5. Given a linear form L{x) ^ 0 in the space K„, find a basis/i,/2, such that the relation
holds for every vector
* = Z *»*/*■
6. Let the operator A acting in an n-dimensional space R have a A:-dimensional invariant subspace R'. Then, temporarily regarding A as defined only in the subspace R', we can construct the characteristic polynomial of degree k for A. Show that this polynomial is a factor of the characteristic polynomial of the operator A acting in the whole space R.
7. Let X = X„ be an r-fold root of the equation det ^Aie) — = 0. Show that the dimension m of the eigenspace R(x»> of A corresponding to the root \ does not exceed r.
8. Show that the quantity 8| is a second-order tensor, with one covariant index and one contravariant index.
9. A set of quantities is defined in every coordinate system as the solution of the system of equations
TikSti = 8*,
where Tik is a contravariant tensor of order two and det || Tik\\ 0. Show that Si}- is a covariant tensor of order two.
chapter 6
THE CANONICAL FORM OF THE MATRIX OF A LINEAR OPERATOR
Two operators A and B acting in an n-dimensional space K„ are said to be equivalent if there exist two bases in Kn such that the matrix of the operator A in the first basis coincides with the matrix of the operator B in the second basis. Clearly, the "linear transformations" in Kn corresponding to equivalent operators have identical properties. But how can we decide whether or not the operators A and B are equivalent by examining their matrices in the same basis?
In this chapter, starting from a given linear operator A in an w-dimensional (real or complex) space, we will find a basis in which the matrix A of the operator A has "canonical form," i.e., a form which is the simplest possible in a certain sense. This canonical form can be obtained directly from the elements of the matrix of the operator A in any basis. Moreover, it turns out that if the operators A and B are equivalent, then their matrices have the same canonical form. Thus a necessary and sufficient condition for two operators to be equivalent is that their canonical matrices coincide.
We begin our considerations by studying a special class of operators (Sec. 6.1). The general case will be studied in Sec. 6.3.
6.1. Canonical Form of the Matrix of a Nilpotent Operator
6.11. A linear operator B acting in an n-dimensional space K„ is said to be nilpotent if Br = 0 (i.e., if Brx = 0 for every x eKn) for some positive
133
134     the canonical form of the matrix of a linear operator chap. 6
integer r. Given a nilpotent operator B such that Br = 0, we will assume that B^1 ^ 0, i.e., that there are vectors x eKn such that Brlx ^ 0. By the height of a vector x e K„, we mean the smallest positive integer m for which Bmx = 0. By hypothesis, every vector x e K„ is of height </•, and there are vectors of height equal to r. Given any k < r, let Hk denote the set of all vectors of height <k. Obviously, Hfc is a subspace of Kn. In fact, if x, y e Hfc, then Bkx = 0, Bky = 0 and hence Bfc(ax + fiy) = 0 for arbitrary a, (3 e A", so that the height of the vector olx + fiy does not exceed k, i.e., ax + (3_y e Hfc. Moreover, it is obvious that Hr = Kn and thatf
{0} = H0 <= H, c • • • c Hr^ <= Hr = K„.
Let mk denote the dimension of Hfc, so that
0 — m0 < tnl < • • • < wr = «.
Next we construct a basis in the space Kn as follows: As we have seen, Hr_! does not coincide with the whole space Kn = Hr. Therefore we can find vectors /1} . . . tfv lying in Hr and linearly independent over H^, where pY = mr — mr_x (see Sec. 2.44). The vectors B/i, . . . , Bf^ lie in Hr_! and are linearly independent over Hr_2. In fact, if we had
a.B/i + • ■ ■ + xPBfPi =ge Hr_2      (g ^ 0),
then application of the operator Br~2 would give
aiBr-yi + • • ■ + <x.pBr~lfPi = 0,
or equivalently
H-----1- E Hr-l>
which is impossible, by construction. It follows that the dimension — mr_2 °f the space over Hr_2 (again see Sec. 2.44) is equal to or greater than the dimension mr — of the space Hr over H^. We now supplement the vectors B/l5 . . . , BfVi with vectors fVl+1, . . . ,fP2 in to make the largest system which is linearly independent over tL._2 {pz = mr_t — wir_2). Applying the operator B to all these vectors, we get vectors
B2/i» ■ ■ ■ , B2/Pi, B/„i+1, . . . , B/Pj
lying in Hr_2 and linearly independent over Hr_3 (this is proved in the same way as before). It follows that wr„2 — mr^ > /wr-1 — mr_2, and we can construct vectors fPi+i, . . . ,/„a in Hr_2 which together with the preceding system form a "full system'1 of vectors linearly independent over Hr„3. Continuing this construction in the subspaces Hr_3, . . . , H0 = {0}, we finally
f {0} denotes the set whose only element is the zero vector.
sec. 6.1
canonical form of the matrix of a nilpotent operator 135
get a full system of n linearly independent vectors. This system can be written in the form of a table
■>fvr
where the vectors in the first row are of height r, those in the second row are of height r — 1, and so on, with the vectors in the last row being of height 1 (so that the operator B carries them all into the zero vector).
6.12. Every column of the above table determines an invariant subspace of the operator B. The first px invariant subspaces all have dimension /•, the next p2 — pi invariant subspaces all have dimension r — 1, and so on, with the last pr — /V-i single-element columns determining one-dimensional invariant subspaces. The whole space K„ is the direct sum of these pT invariant subspaces.
6.13. Next we write the matrix of the operator B in the subspace determined by the vectors of the first column. For a basis we choose the vectors Br-1/i, Br-2/i, . . . , B/i,/i, arranged in order of increasing height. With this arrangement, the operator B carries the first vector of the basis into the zero vector, the second vector into the first vector, etc., and finally the rth vector into the (r — l)st vector. Therefore, according to Sec. 4.23, the matrix of the operator B has r rows and r columns, and is of the form
0
0
0  0 1
0 0 0 0  0 0
0 0 0 0
0 1 0 0
(1)
with zeros everywhere except for the elements (equal to 1) along the diagonal just above the principal diagonal. The matrix of the operator B takes a similar form in the other invariant subspaces, corresponding to the remaining columns of the table, and in fact can differ from the matrix (1) only by having a different number of rows and columns.
6.14. Thus the matrix of the operator B in the whole space K„ is quasi-diagonal (see Sec. 4.84), with blocks of the form (1) along the principal
136     the canonical form of the matrix of a linear operator chap. 6
diagonal:
0	1	0 •	•• 0	0
0	0	1 •	•• 0	0
0	0	0 •	* * # •• 0	4 1
0	0	0 •	0	0
0 0
0 0
* «
0 1
0 0
0 1	
0 0	
	0
0
(2)
The number of blocks of size r equals px, the number of blocks of size r — 1 equals p2 —/>i, - - - , the number of blocks of size (2) equals pT_Y — pr_2, and the number of blocks of size (1) equals pT — pT_^ Naturally, if /V-h-i = Pr-i for some /, then the matrix (2) contains no blocks of size/.
6.2. Algebras. The Algebra of Polynomials
6.21. We begin with some definitions. A linear space K over a number field K is called an algebra (more exactly, an algebra over K) if there is defined on the elements x, y, .. . of K an operation of multiplication, denoted by x • y (or xy)t which satisfies the following conditions:
1) &(xy) = (<zx)y = x(ay) for every x, y in K and every a in K;
2) (xy)z = x(yz) for every x, y, z in K (the associative law);
3) (x + y)z = xz + yz for every x, y, z in K (the distributive law).
0 1 0
0 0 1
« • •
0 0 0
0 0 0
sec. 6.2
algebras. the algebra of polynomials 137
In general, multiplication may not be commutative, i.e., we may have xy ^ yx. If multiplication is commutative, i.e., if
4) xy — yx for every x, y in K,
then the algebra K is said to be commutative.
An element e eK is called a left unit if ex — x for every xeK, a right unit if xe = x for every xeK, and a two-sided unit or simply a unit (in K) if ex = xe = x for every xeK.
An element xeK is called a /e/if inverse of the element >> e K if xy is the unit of the algebra K; in this case, y is called a right inverse of x. If an element z has both a left and a right inverse, then the two inverses are unique and in fact coincide (cf. Sec. 4.76a). The element 2 is then said to be invertible, and its inverse is denoted by z"1.
The product 2u of an invertible element 2 and an invertible element u is an invertible element with inverse u~l2~1. If the element u is invertible, then the equation ux = v has the solution x = urxv. This solution is unique, being obtained by multiplying the equation ux = v on the left by w-1. In the commutative case, we write x — v\u or x = v:u, calling the element x the quotient of the elements v and u.
The ordinary rules of arithmetic are valid for quotients, i.e.,
£1     Vi     V1U2 -r UjV2
~T~ -
«1       «2 «l«2
Ui    U2        U ill2 V1.V2 ViU2
(if Ui and «2 are invertible), (if Ui and w2 are invertible), (if «!, w2, and i>2 are invertible).
«!  W2 UiV2
The proof of these facts is left to the reader.
An algebra K is said to have dimension n if K has dimension n regarded as a linear space.
6.22. Examples
a. Given any linear space K, suppose we set x ' y = 0 for every xjeK. This gives an algebra, called the. trivial algebra,
b. An example of a nontrivial commutative algebra over a field K is given by the set n of all polynomials
m
Pfr)=ZakXk
fc=0
with coefficients in equipped with the usual operations of addition and multiplication. This "polynomial algebra" has a unit, namely the polynomial e(X) with a0 = 1 and all other coefficients equal to 0.
138     the canonical form of the matrix of a linear operator chap. 6
c. The linear spaceM(X„) of all matrices of order n with elements in K, with the usual definition of matrix multiplication, is an example of a finite-dimensional noncommutative algebra of dimension n2 (see Sec. 4.73b).
d. A more genera] example of a noncommutative algebra with a unit is the linear space of all linear operators acting in a linear space K, with the usual definition of operator multiplication (see Sec, 4.33).
6.23. a. A subspace L <= K is called a subalgebra of the algebra K if x e L, y g L implies xy eL. A subspace L <= K is called a right ideal in K if x g L, y eK implies xy e~L and a left ideal in K if x g L, y eK implies yx e L. An ideal which is both a left and a right ideal is called a two-sided ideal. In a commutative algebra there is no distinction between left, right and two-sided ideals. There are two obvious two-sided ideals in every algebra K, i.e., the algebra K itself and the ideal {0} consisting of the zero element alone.f All other one-sided and two-sided ideals are called proper ideals. Every ideal is a subalgebra, but the converse is in general false. Thus the set of all polynomials P(X) satisfying the condition P(0) = P(l) is a subalgebra of the algebra II which is not an ideal, while the set of all polynomials P(X) satisfying the condition P(0) ~ 0 is a proper ideal of the algebra IT.
b. Let L <= K be a subspace of the algebra K, and consider the factor space K/L (Sec. 2.48), i.e., the linear space consisting of the classes X of elements x e K which are comparable relative to L. If L is a two-sided ideal in K, then, besides linear operations, we can introduce an operation of multiplication for the classes X e K/L. In fact, given two classes X and Y, choose arbitrary elements x e X, y e Y and interpret X • Y as the class containing the product xy. This uniquely defines X ■ Y, since if x'eX, y e Y, then
xy' — xy = x(y' — y) + (x' — x)y,
and hence xy — xy belongs to L together with y' — y and x' — x. Moreover, since conditions l)-3), p. 136 hold in K, the analogous conditions hold for the classes X e K/L. Therefore the factor space K/L equipped with the above operation of multiplication, is also an algebra, called the factor algebra of the algebra K with respect to the two-sided ideal L. If the algebra K is commutative, then obviously so is the factor algebra K/L.
6.24. Let K' and K" be two algebras over a field K. Then a morphism co of the space K' into the space K" (Sec. 2.71) is called a morphism of the algebra K' into the algebra K" if besides satisfying the two conditions
a) co(x' + /) = co(x') + «(/) for every x', / g K',
b) to(ax') = ato(x') = aco(x') for every x e K' and every v. e K
t As in Theorem 2.14c, 0 ■ x = 0 for every xE K.
sec. 6.2
algebras. the algebra of polynomials 139
for the morphism of two spaces (see p. 53), it also satisfies the condition c) co(x'/) = co(x')to(/) for every x', / e K'.
A morphism co which is an epimorphism, monomorphism or isomorphism of the space K' into the space K", as defined in Sec. 2.71, is called an epimorphism, monomorphism or isomorphism of the algebra K' into the algebra K", provided condition c) is satisfied.
6.25. Examples
a. Let L be a subalgebra of an algebra K. Then the mapping co which assigns to every vector xeL the same vector x e K is a morphism of the algebra L into the algebra K, and in fact a monomorphism. As in Example 2.72a, this monomorphism is said to embed L in K.
b. Let L be a two-sided ideal of an algebra K, and let K/L be the corresponding factor algebra (Sec. 6.23 b). Then the mapping co which assigns to every vector x e K the class X e K/L containing x is a morphism of the algebra K into the algebra K/L, and in fact an epimorphism. As in Example 2.72b, this epimorphism is called the canonical mapping ofK onto K/L.
c. Let co be a monomorphism of an algebra K' into an algebra K". Then the set of all vectors co(x') e K" is a subalgebra L" <= K", and the monomorphism co is an isomorphism of the algebra K' onto the algebra V.
d. Let co be a morphism of an algebra K' into an algebra K". Then the
set L' of all vectors x' e K' such that co(x') = 0, which is obviously a subspace
of K' (cf. Sec. 2.76b), is a two-sided ideal of the algebra K'. In fact, if x' e V,
/ e K,then , , ,
co(x /) --- co(x )a)(y ) = 0,
so that x'y' e L', and similarly y'x' e L', i.e., L' is a two-sided ideal of Kf, as asserted. As in Sec. 2.76b, let CI be the monomorphism of the space K'jV into the space Kff which assigns to each class X' e K'jV the (unique) element co(x'), x' e X'. Then Q is a monomorphism of the algebra K'jV into the algebra K". In fact, choosing x' e X', / e Y', we have x'y' eX'Y'
and Q(XfY') = co(x'/) = co(x>(/) = Q(X')G(Y').
If the morphism co is an epimorphism of the algebra K' into the algebra K", then the morphism D, is an isomorphism of the algebra Kr/L' onto the algebra K".
e. Let A be a linear operator acting in a space K over a field K. Since addition and multiplication by constants in A! are defined for linear operators acting in K, with every polynomial
140     the canonical form of the matrix of a linear operator chap. 6
(ak e K) we can associate an operator
m
k=0
acting in the same space K as A itself. Then the rule associating P(X) with P(A) has the three properties figuring in Sec. 6.24. In fact, if
m m m
P(X) = PifX) + P2(X) = JakAk = 2K + 6fc)Xfc,
k=0 fc=0 fc=0
then clearly
Tn mm
P(A) = 2(ak + bk)Ak = %akAk + %bkAk = PX(A) + P2(A),
k—0 k—0 fc=0
and similarly for property b), while if
mm mm
3=0        fc=0 3=0 fc=0
then
mm mm
0(A) = 2 2«AA>+* = 2^2^Afc = P^P^A),
j=0fc=0 j=0 k=0
by the distributive law for operators (Sec. 4.34). Note that the operators Px(A) and P2(A) always commute with each other, regardless of the choice of the polynomials PX(X) and P2(X). The resulting morphism of the algebra II of polynomials (Example 6.22b) into the algebra B(K) of linear operators acting in K (Example 6.22d) is in general not an epimorphism, if only because operators of the form P(A) commute with each other, while the whole algebra B(K) is noncommutative.f
f. There exists an isomorphism between the algebra L(A^) of all linear operators acting in the ^-dimensional space Kn and the algebra M(Kn) of all matrices of order n with elements from the field K. This isomorphism is established by fixing a basis ex, . . . , en in the space Kn and assigning every operator A eL(Kn) its matrix in this basis. Both algebras L(£K) andM(Kn) have the same dimension k2.
6.26. The set of all polynomials of the form P(X)0o(X), where Q0(l) is a fixed polynomial and P(X) an arbitrary polynomial, is obviously an ideal in the commutative algebra II of all polynomials P(X) with coefficients in a field K (Example 6.22b). Conversely, we now show that every ideal I ^ {0} of the algebra II is of this structure, i.e., is obtained from some polynomial (2oO0 by multiplication by an arbitrary polynomial P(X). To this end, we
t Except in the trivial case where K is one-dimensional.
sec. 6.2
algebras. the algebra of polynomials l4l
find the nonzero polynomial of lowest degree, say q, in the ideal /, and denote it by Q0(k). We then assert that every polynomial Q(k) e I is of the form P(X)G0(X), where P(X)eU. In fact, as is familiar from elementary algebra,
£(X) = i>(X)£0(X) + *(X), (3)
where P(X) is the quotient obtained by dividing (?(X) by G0(X) and R(k) is the remainder, of degree less than the divisor G0(X), i.e., less than the number q. But the polynomials Q(X) and G0(X) belong to the ideal /, and hence, as is apparent from (3), so does the remainder R(X). Since the degree of R(X) is less than q and since Q0(ty has the lowest degree, namely q> of all nonzero polynomials in /, it follows that R(X) = 0, and the italicized assertion is proved.
The polynomial Q0(X) is said to generate the ideal /.
6.27. The polynomial G0(X) is uniquely determined by the ideal I to within a numerical factor. In fact, if the polynomial (^(X) has the same property as the polynomial (?0(X), then, as just shown,
Q9(X) = Po(*)Gi00-
It follows that the degrees of the polynomials Gi(X) and G0(X) coincide and that /i(X) and P0(X) do not contain X and hence are numbers, as asserted.
6.28. Given polynomials Gi(X), .. . , Gm(X) not all equal to zero and with no common divisors of degree > 1, we now show that there exist polynomials /»J(X),. . . , /» (X) such that
P°i(X)Q&) + • • • + i* (X)Gm(X) =1- (4) In fact, let / be the set of all polynomials of the form
PMQifr) + • • • + PMQM
with arbitrary /^(X), . . . , Pm(X) in II. Then / is obviously an ideal in IT. By Sec. 6.26, the ideal / is generated by some polynomial
m
QM = IPXtiQM. (5)
In particular,
Gi(x) = sx(X)G0(X),..., Gm(x) = sjx)G0(x),
where Si(X), . . . , Sm(X) are certain polynomials, from which it follows that GoPO lS a common divisor of the polynomials GiO)> . . . , Gm(X). But, by
142     the canonical form of the matrix of a linear operator chap. 6
hypothesis, the degree of £?0(X) is zero, and hence Q0(k) is a constant a0, where a0 ^ 0 since otherwise / = {0}. Multiplying (5) by l/a0 and writing P°(X) = P°(X)/a0, we get (4), as required.
6.3. Canonical Form of the Matrix of an Arbitrary Operator
6.31. Let A denote an arbitrary linear operator acting in an «-dimen-sional space K„. Since the operations of addition and multiplication are defined for such operators (Sees. 4.31-4.33), with every polynomial
m
poo = 2«***
fc=0
we can associate an operator
p(a)=2%a*
acting in the same space Kn (cf. Example 6.25e), where addition and multiplication of polynomials corresponds to addition and multiplication of the associated operators in the sense of Sec. 4.4. In fact, if
m
then
P(X) = PX(X) + P2(X) = + 2 V* = 2(fl* + bjk"
jfc=0 k=0 k=0
P(A) = 2K + bk)Ak =2«^* +IbkAk = P,(A) + P2(A).
fc=0 fc=0 *-=0
Similarly, if
then
Q{k) = Pa(X)P2(X) = 2«***2M' = 2 2>*M*"
t=0       j=0 (-=0 3=0
<2(a) = 2 2«*M*JJ = 2«^*2M' = ^i(a)p2(a),
fc=0j=0 fc=0 3=0
by the distributive law for operator multiplication (Sec. 4.34). In particular, the operators Pi(A) and P2(A) always commute.
Thus the mapping co(P(X)) = P(A) is an epimorphism (Sec. 6.24) of the algebra IT of all polynomials with coefficients in the field K into the algebra IIA of all linear operators of the form P(A) acting in the space K„. By Sec. 6.25d, the algebra IIA is isomorphic to the factor algebra n//A, where /A is the ideal consisting of all polynomials P(X) such that
co(P(X)) = P(A) = 0.
We now analyze the structure of this ideal.
sec. 6.3 canonical form of the matrix of an arbitrary operator 143
6.32. As noted in Example 6.25f, the set of all linear operators acting in a space Kn is an algebra of dimension w2 over the field K. Hence, given any operator A, it follows that the first n2 + 1 terms of the sequence
A0 = E, A, A2, . . . , Am, .. .
must be linearly dependent. Suppose that
m
ZakAk = 0      (w < n2).
Then, by the correspondence between polynomials and operators established in Sec. 6.31, the polynomial
k=0
must correspond to the zero operator. Every polynomial Q(k) for which the operator Q(A) is the zero operator is called an annihilating polynomial of the operator A. Thus we have just shown that every operator A has an annihilating polynomial of degree < n2.
6.33. The set of all annihilating polynomials of the operator A is an ideal in the algebra n. By Sees. 6.26-6.27 there is a polynomial Q0(k) uniquely determined to within a numerical factor such that all annihilating polynomials are of the form /*(A)(?o(A) where P(X) is an arbitrary polynomial in n. In particular, (?0(A) is the annihilating polynomial of lowest degree among all annihilating polynomials of the operator A. Hence £?0(X) is called the minimal annihilating polynomial of the operator A,
6.34. Theorem. Let Q(X) be an annihilating polynomial of the operator A, and suppose that
(2(a) - GxMftcx),
where the factors £?i(A) and Q2(k) are relatively prime. Then the space K„ can be represented as the direct sum
Kn = Tx + T2
of two subspaces Tx and T2 both invariant with respect to the operator A,f where
Q1(A)x2 = 0,      02(A)*i = 0
for arbitrary xx e Tl5 x2 e T2, so that d(X) and Q2(X) are annihilating polynomials for the operator A acting in the subspaces T2 and T1} respectively.
t Thus ^! e T, implies A*i E Tt and similarly x2 e T2 implies \x2 G T».
144     the canonical form of the matrix of a linear operator chap. 6
Proof By Sec. 6.28 there exist polynomials P^X) and P2(X) such that and hence
P1(A)(21(A) + P2(A)(22(A) = E.
Let Tfc (k — 1,2) denote the range of the operator Qk(A), i.e., the set of all vectors of the form Qk{A)x, xeKn (see Sec. 4.61). Then obviously y = Qk{A)x e Tfc implies Ay = Qk{A)Ax e Jk, so that the subspace Tfc is invariant with respect to the operator A. Given any xx e T1; there is a vector y EKn such that
Q2(A)Xl = Q2(A)Q1(A)y = Q(A)y = 0, and similarly, given any x2 e T2, there is a vector z e Kn such that
<2i(A)*2 = G!(A)Ga(A)z = <2(A)z = 0. Moreover, given any x e Kn, we have
x = g1(A)P1(A)x + £2(A)P2(A)x = x1 + x2,
where
*fc - Qk(A)Pk(A)x eTk     (* = 1, 2).
It follows that Kn is the sum of the subspaces Ta and T2. If x0 e Tx Q T2, then Q1(A)x0 = Q2(A)x0 = 0, and hence
x0 = P1(A)Ql(A)x0 + P2(A)Q2(A)x0 = 0.
Therefore T1f)T2 = {0}, and the sum Kn = Tx + T2 is direct.f |
6.35. Remark, By construction, the operator Q^A) annihilates the subspace T2, while the operator Q2(A) annihilates the subspace Tv We now show that every vector x annihilated by the operator Qi(A) belongs to T2, while every vector x annihilated by the operator Q2(A) belongs to Tx. In fact, suppose Qx(A)x = 0. We have x — xx + x2 where x± e Tl5 x2 e T2, and hence 0i(A)*i = Qx(A)x — Qx(A)x2 = 0 since Qx(A)x2 — 0. But Q^AjXy = 0 as well, since x± e Tx. It follows that
*i = i5i(A)Gi(A)x1 + P2(A)g2(A)x1 - 0,      x = x2 e T2.
Similarly, Q2(A)x = 0 implies x e Tlt and our assertion is proved.
6.36. Representing the polynomials 2i(X) and Q2{X) themselves as products of further prime factors, we can decompose the space Kn into smaller subspaces invariant with respect to the operator A and annihilated by the
f Naturally, the possibility is not excluded that one of the subspaces Tx and T2 consists of the zero vector alone.
sec. 6.3 canonical form of the matrix of an arbitrary operator      145
appropriate factors of d(X) and (?2(X). Suppose the annihilating polynomial Q(X) has a factorization of the form
m
= n - w (6)
fc=l
where Xx, . . . , Xm are all the (distinct) roots of g(X) and rk is the multiplicity of Xfc. For example, such a factorization is always possible (to within a numerical factor) in the field C of complex numbers. Then we have the following
Theorem. Suppose the operator A has an annihilating polynomial of the form (6). Then the space Kn can be represented as the direct sum
Kn = Tx + * • • + Tm
of m subspaces Tl5 . . . , TOT, all invariant with respect to A, where the subspace Tk is annihilated by Brkk, the rkth power of the operator
Bk = A - XfcE
Proof. Apply Theorem 6.34 repeatedly to the factorization (6) of Q(k) into m relatively prime factors of the form (X — X_,)r>. |
6.37. By construction, the operator Bk is nilpotent in the subspace Tk. Hence, by Sec. 6.14, in every subspace Tk (t^{0}) we can choose a basis in which the matrix of Bk takes the canonical form (2). In this basis, the matrix of the operator A = Bk + XfcE takes the form
Xft   1    0-0 0 0    X,,.   1          0 0 0    0   0   - • •   Xfc 1 0   0   0-    0 xfc	
	lk   1    0-0 0 0   Xfc   1   •••   0 0 0   0    0    • •   Xfc 1 0   0   0   •••   0 xfc
146     the canonical form of the matrix of a linear operator chap. 6
Hence the matrix of the operator A in the whole space K„ — Tx + • • • + Tm takes the form
X, 1   • ■ ■ 0 0   Xx ■ ■ • 0 0 0   ■ ■ ■ 1 0   0     ■ • Xx	
	Xx 1   • ■ • 0 0   X, • • • 0 0 0   ■ ■ - 1 0 o ■ ■ ■ xt
Xm	1 • ■	■ 0
0	Xm ■	■ 0
0	0   • •	■ 1
0	o ■ ■	' xm
x,„
_
in the basis obtained by combining all the canonical bases constructed in the spaces Tl9 . . . , Tm. Thus finally we have the following
Theorem. Given any operator A in an n-dimensional space Kn with an annihilating polynomial of the form (6) (in particular, any operator A in an n-dimensional complex space C„), there exists a basis, called a Jordan basis, in which the matrix of A takes the form (8), called the Jordan canonical form 0/A.f
In the case Kn = Cn the complex numbers Xl5 . . . , X„ can be arranged in
t Synonymously, the Jordan normal form of A.
sec. 6.4
elementary divisors      147
accordance with any rule, e.g., in order of increasing absolute value/f The representation (8) is not always possible in the case of an operator A acting in a space Kn ^ C„. In Sec. 6.6 we will consider the canonical form of the matrix of an operator A acting in a real space Kn = R„.
6.4. Elementary Divisors
6.41. The matrix (8) can be specified by a table
^1 ■ «1 , • • • , "n X   -M(2) »(2)
(fc)
> «
(fc)
>
(9)
1    . „(m)
which for each diagonal element \k indicates the sizes n[k), . . . , n{rk) of the corresponding "elementary Jordan blocks" of the form
n
1 0
0   X, 1
0 0 0 0   0 0
0 0
(10)
appearing in the matrix (8). We now show how to construct the table (9) and thereby determine the form- of the matrix /(A) of the operator A, from a knowledge of the matrix A of the operator A in any basis of the space K„.
6.42. As shown in Sec. 5.53, the characteristic polynomial of the operator A does not depend on the choice of a basis. Forming this polynomial for the Jordan basis, we get
det {A - X£) = det (J(A) - X£) = IT (x* - x)Kl +""+"rt,      0l)
since every element below the principal diagonal in (8) is zero. Thus the numbers Xfc (k = 1, . . . , m) are the roots of the characteristic polynomial, and the numbers rk = nik) + •••-(- «(rfc) are the multiplicities of these roots.
f Or in order of increasing argument 0 (varying in the interval 0 < 9 < 2tt), in the case of identical absolute values.
148     the canonical form of the matrix of a linear operator chap. 6
Hence, by calculating the characteristic polynomial (which can be done by using the matrix A) and finding its roots, we can determine the quantities Xfc and rk = «<*> + • • • + n™ in the table (9).
6.43. Next (here and in Sec. 6.44) we show how to use the matrix A of the operator A in the original basis to calculate the numbers themselves. Since /(A) and A are matrices of the same operator A in different bases, it follows from Sec. 5.51 that
The minors of a fixed order, say p, of the matrix A — IE are certain polynomials in X of degree </». Let IP(A) be the ideal in the algebra IT generated by all these minors, and let fp(J(A)) have the analogous meaning. Then the two ideals IV{A) and fv(J(A)) coincide. In fact, according to Sec. 4.54, every minor of order p of the matrix /(A) — IE is a sum of products of minors of order p of the matrices A — XE, T and T~l. But the elements of T and T_1 are numbers. Thus every minor of order p of the matrix/(A) — \E is simply a linear combination of minors of order p of the matrix A — XE, and hence belongs to the ideal IV(A). By symmetry, every minor of order p of the matrix A — IE belongs to the ideal IP(J(A)). It follows that the ideals IV(A) and IJJ(A)) coincide, as asserted.
Now let DP(K) be the polynomial generating this ideal. According to Sec. 6.26, £>j,(X) is just the greatest common divisor of the polynomials generating IV(A). Thus the greatest common divisor of the minors of order p of the matrix /(A) — ~kE is the same as the greatest common divisor of the minors of order p of the matrix A — ~kE, and hence can be regarded as known. The greatest common divisor of the minors of order p of the matrix /(A) — ~kE can be calculated directly as follows: Instead of the matrix /(A) — ~kE, we can again consider a matrix of the form S(J(A) — ~kE)T, where S and 7* are invertible numerical matrices (not containing X). The operations of interchanging rows (or columns) and adding an arbitrary multiple of one row (or column) to another lead to matrices of just this kind (see Examples 4.44d-4.44g). We now assert that the elementary block
/(A) = T~lAT,
where T is a nonsingular matrix, and hence that
/(A) - X£ = T~\A - X£)7*.
Xfc — X
1
0
0
0
Xfc-X 1
0
0
0
0
1
0
0
0
sec. 6.4
elementary divisors 149
can be reduced to the form
n
<fc>,
1 0 0 1
0 0
0 0
(x* - x)"'
(12)
by operations of the indicated type. In fact, to get (12) we first subtract the first row multiplied by Xfc — X from the second row, then the second row multiplied by Xfc — X from the third row, and so on. This gives the matrix
(x* - x)2
1 0 0 1
0 0
(-O-^-x)-1 oo-i (-i)^(xfc - X)«    0 0- 0
where q = njfc>. Then from the first column we subtract the second column multiplied by Xfc — X, the third column multiplied by —(Xfc — X)2, etc., and finally the (q — l)th column multiplied by ( — i)Q~2(Xk — X)0^1. This gives the matrix
0 1   0   • ■ 0
0
0
0 1
0 0
(-D-XXfc-x)« o o
0
0
from which the matrix (12) can be obtained by interchanging columns .f
We now calculate the greatest common divisor DP(X) of the minors of order p of the matrix J(k) with blocks of the form (12) along its principal diagonal. Since all nondiagonal elements of /(X) vanish, the only minors of J(k) which can be nonzero are those with the same set of row and column indices, and such a minor is simply equal to the product of its diagonal elements. Among the elements along the principal diagonal of the matrix /(X), a certain number, say N, are binomials of the form (kk — X)n, while the other n — JV elements are all equal to 1. The number N is just the total number of Jordan blocks in the matrix /(A), i.e., N = rx 4- • • • 4- rm. Clearly Dv(k) 1 if p < n — N, since some of the minors of /(X) of order p < n — ./V are certainly equal to 1. Suppose we replace the matrix /(X) by
t Except possibly for the sign of the element (Xt — >.)«, which is irrelevant to the subsequent determination of DP(X).
150     the canonical form of the matrjx of a linear operator chap. 6
the diagonal matrix
(Xt - X)
,tl\
1
n~ N
which obviously has the same polynomial Dv(k) as /(X). The greatest common divisor of the minors of order p of the matrix /(X) are clearly of the form
Dfr) = Tlfr*-*fk > (13)
with nonnegative exponents [ik(p). The exponents in (13) are easily found. For example, to determine }*.x(p), we note that \ix(p) is the smallest exponent with which X], — X appears in all minors of /(X) of order p. If p < n — rx, then there is a minor of order p which does not contain \x — X at all, so that \ix(p) = 0. However, if p = n — rx -\- \, then, bearing in mind that the
exponents n\
(l)
rv^ are arranged in decreasing order, we have
|XiO») = <).
Moreover, each timep is increased further by 1, the exponent \ix(p) increases, first by n'^, then by «^_2> and so on> until finally we get \t.x(p) = w^11 + ■ + n[1] for p = n. Similarly,
'0 if p < « — /■*,
,(ft>
rj—1
if p = « — rfc + 1,
if p = « — rk + 2,
(ft)
,<ft>
Note that
+ n\*>      if p
— Mn — 1)
ttk(« - 1) ~ M> - 2)
— n.
— n
(ft)
„(ft)
«2 >
|Afc(« - rfc + 1) — jxfc(« - rk) = n™,
so that
ja« - 7 + 1) - 1^(11 - y) = »?>      O = 1, 2, . . . , n - 1) (14) (we set #i<*> = 0 if y > rfc).
sec. 6.4
elementary divisors 151
6.44. The ratio
EM =
JWi(a) dm
is called an elementary divisor of the operator A. The elementary divisors, like the polynomials dP(X) themselves, do not depend on the choice of a basis and hence can be calculated from the matrix of A in any basis. It follows from (13) that
n ex* - a)^+i> m em=-=n (a*
a)
n (a, - xr"'
or equivalently,
m
En-M = n (x* - xr"1^1»-^"-'"' k=i
Using (14), we get
m
En-m = n (x* - xr.1
(p=l,2,...,«- 1)
a = 1,2,1).
0 = 1,2,...,«- 1),
where the roots of ii„_,-(X) have multiplicities equal to the sizes of certain Jordan blocks in the matrix /(A). Thus by calculating the elementary divisors of A, we can find the numbers n{k), thereby finally solving the problem of constructing the table (9).
6.45. Examples
a. The "Jordan matrix"
0	1	0
0	1	1
0	0	1
152     the canonical form of the matrix of a linear operator chap. 6
of order ten has three blocks of sizes 3, 2 and 1 corresponding to the root X1 = 1, and two blocks of sizes 2 and 2 corresponding to the root X2 = 2. Hence the elementary divisors are
em = (1 - X)3(2 - X)2, £8(X) - (1 - X)2(2 - X)2, £7(X) = 1 - X, £6(X)= •• ■ = £1(X)= 1.
b. Suppose a given matrix A = \\aik\\ of order ten has elementary divisors
em = (3 - X)2(4 - X)3,
£8(X) = (3 - X)2(4 - X),
£7(X) = 4 - X,
£,(X) = 4 - X,
£.(X) = • • • = £,(X) = 1
(calculated from the minors of the matrix A — X£", as in Sees. 6.43-6.44). Then, according to Sec. 6.44, the Jordan matrix/(A) has two blocks of sizes 2 and 2 corresponding to the root Xx = 3, and four blocks of sizes 3,1,1 and 1 corresponding to the root X2 = 4. It follows that
/(A) =
6.46. Thus from a knowledge of the elementary divisors of an operator A, we can determine all the numbers n\k) and hence the structure of the Jordan canonical form of A. In particular, we see that the Jordan canonical form of an operator A is uniquely determined by A.
sec. 6.5
further implications 153
On the other hand, since the elementary divisors of an operator A are determined by the minors of the matrix A — \E in any basis, two equivalent operators A and B, i.e., two operators with the same matrix in two (distinct) bases, have the same Jordan canonical form. Conversely, it is obvious that if two operators have the same Jordan canonical form, then they are equivalent. This completely solves the problem of the equivalence of linear operators (in a complex space), posed at the beginning of the chapter.
6.5. Further Implications
6.51. If it is known that the operator A can be reduced to diagonal form, i.e., that its matrix has the form
X,
in some basis, then A is just the Jordan matrix of the operator A (all the Jordan blocks are of size 1). In particular, the elementary divisors all have simple roots. Conversely, if all the elementary divisors of an operator A have only simple roots, then the Jordan matrix J(A) has blocks of size 1 only and hence is diagonal.
6.52. Given the Jordan canonical form of an operator A, we can easily find its minimal annihilating polynomial. Suppose the operator B has the matrix
0	1	0 •	• 0
0	0	1 •	■ 0
0	0	0 •	• • 1
0	0	0 •	• 0
154     the canonical form of the matrix of a linear operator chap. 6
in the basis el9
Then for every
, ep, so that
Bex = 0, Be2 = elt . . . , Bev — ev_x. B x = 0
fc=i
Thus X15 is an annihilating polynomial of the operator B. The minimal annihilating polynomial is a divisor of X* (see Sec. 6.33), and hence must be of the form Xm, m < p. But B^-1^ = ex ^ 0, so that X* is in fact the minimal annihilating polynomial of B.
Now suppose the operator A has the matrix
^   1 0-0
0    X0 1
0
0  o  o ... 1 0   0   0 ••• x0
in the same basis elf. . . , ev, so that A = B + X0E. As just shown,
(A - X0E)*> = B* = 0.
and hence (X0 — X)* is an annihilating polynomial of A, in fact the minimal annihilating polynomial, by the same argument as before. Next suppose the operator A has the quasi-diagonal matrix
\	1	0 •	• 0
0	*o	1 •	• 0
0	0	0 •	• 1
0	0	0 •	• *0
	1	0 •	• • 0
0	*o	1 •	• • 0
0	0	0 •	• 1
0	0	0 •	• *0
sec. 0.6
the real jordan canonical form 155
where the blocks along the diagonal have sizes px > p2 > • • • > pr. Then a polynomial Q(k) annihilating the operator A must annihilate each block separately. Clearly the polynomial (Xq — Xf1 has this property (cf. Sec. 4.52), and in fact is the minimal annihilating polynomial, by the same argument as before.
Finally, in the general case where the operator A has the Jordan matrix described by the table (9), the polynomial
eoo = n (** - wi
k=l
is clearly an annihilating polynomial of A, in fact the minimal annihilating polynomial, since none of the exponents n{*] can be lowered, for the reasons given above.
Thus the polynomial Q(X) is the minimal annihilating polynomial of the operator A. The degree of (2(a), equal to + • • * + n[m), is the sum of the sizes of the largest Jordan blocks, each corresponding to a root of the characteristic polynomial. Note that this number cannot exceed the order of the matrix A, i.e., the dimension n of the space in which the operator A acts. The characteristic polynomial
det(A - XE) = JI (Xjt - X)"i*,+"'+n£
1
of the operator A (see Sec. 6.42) contains £?(X) as a factor, and hence is also an annihilating polynomial (a result known as the Hamilton-Cayley theorem). However, the characteristic polynomial is in general not the minimal annihilating polynomial of A. Clearly, the characteristic polynomial coincides with the minimal annihilating polynomial of A if and only if each root of the characteristic polynomial figures in only one Jordan block, of size equal to the multiplicity of the root.
6.6. The Real Jordan Canonical Form
6.61. Let A be a linear operator acting in a real M-dimensional space Rn. Then in general there is no canonical basis in which the matrix of A takes the Jordan form (8), if only because the characteristic polynomial of A can have imaginary roots. Nevertheless, we can still find a modification of the Jordan matrix (8) suitable for the case of a real space.
Let A = ||aJ-fc)H be the matrix of the operator A in some basis elt. .. , en of the space RK, and consider the complex w-dimensional space Cn consisting of the vectors
x = ajCj + ' • ' + aMen,
156     the canonical form of the matrix of a linear operator chap. 6
where alt. .. , aB are arbitrary complex numbers. The matrix A specifies a linear operator A in the space Cn in accordance with the formula
Ax = J «*Acft = 2 «* 2 «j >
the same formula specifying the operator A itself for vectors x with real components afc.
6.62. First we consider the case of an operator A with an annihilating polynomial of the special form
P(X) = (X2 + t2)*,
where t is a positive number. For the operator A it makes sense to talk about polynomials Q(A) with complex coefficients, in particular, the polynomials (A + /t)» and (A — h)v. The polynomial P(X) = (X2 + t2)v is also an annihilating polynomial of the operator A. According to Theorem 6.34, the factorization
(X2 + t2)33 = (X - rr)»(X + H)v
corresponds to a decomposition of the space Cn into a direct sum of two subspaces C* and C2, both invariant with respect to A, in which A has annihilating polynomials (X — H)v and (X -+- H)p, respectively. Moreover, if the subspace     consists of the vectors
x = oc^x -f- ■ - • + *mem
with arbitrary complex coefficients <x)rt, then the subspace C2
consists of the vectors
where afr is the complex conjugate of cnk(k — 1, . . . , w). In fact, if
(A - ztEV'jc = 0, (15)
then, taking complex conjugates in both factors of the left-hand side, we get
(A + HE)vx = 0, (15')
and conversely.! In particular, it follows that n is even, i.e., n — 2m where m is the dimension of each of the subspaces C\ and C2.
f The subspaces C* and Q are uniquely determined by (15) and (15'), respectively (see Sec. 6.35).
sec. 6.6
the real jordan canonical form 157
Now let/* be the Jordan basis of the operator A in the space C*, as in Sec. 6.37. According to (7), the matrix of A in this basis is of the form
IT	1.	0
0	I*T	1
0	0	IT
IT	1	0
0	IT	1
0	0	IT
Hence the action of A on the basis vectors is described by the formulas A/1 = iTfl - . . , V? = irfl
A/flj =fni-l "T" itfni* ■
■ ) A/""   — fn»-l + 'T/mo-
The action of A on the complex conjugate vectors/* in C2 is described by the complex conjugates of these formulas:
A/i=-iT/i,_ A/WJ-/T/1
A/? = -/t/I, _
- A
Thus we see that the vectors/^ form a Jordan basis for the operator A in the space C2. Hence all the vectors taken together form a Jordan basis
for the operator A in the whole space C„.
We now construct a basis in the real space RK, by replacing each pair of
complex vectors ff and    by a pair of real vectors
?h|(/?+/i)>    a? = (/}-/*)■
2 2/
(16)
It follows from the formulas
V? = /!Li + n/5
m=fU-*fk (fi=/i=o)
158     the canonical form of the matrix of a linear operator chap. 6
that
a   (/! +/*)} - Ag* = gU - ta;,
C/5 -      = AAJ = hU + TgJ     (g$ - AJ = 0).
Thus the action of the operator A on the vectors gf and hk is described by the formulas
Ag? = -Thl AgJ = gj -tA*
A/?2 = h\ + Tg*,
2j
(17)
Ag„\ = A«4 =
g*
Sit"
—rh
hk. is
Moreover, (16) implies
f) = gk + iA},     /J = g? - iAj. Therefore the (complex) linear manifold spanned by all the vectors gj? the same as the linear manifold spanned by all the vectors /*, f). But the
number of vectors gk, hk is the same as the number of vectors/*,/*. Hence the vectors gk, hf are linearly independent over the field C, just like the vectors
/*»/?• Tnus> a fortiori, the vectors gk, hk are linearly independent over the field R, i.e., in the real space Rw.
It follows from the formulas (17) that the matrix of the operator A in the basis g*, hk is a quasi-diagonal matrix, made up of blocks of the form
0   T 10
— T 0
0 1
0 T -T 0
1 0
0 1
0 T
— T 0
0 T -T 0
1 0
0 1
0 T
-T 0
(18)
of sizes 2n1, . . . , 2nQt respectively.
sec. 6.6
the real jordan canonical form 159
6.63. We now consider the general case. Let A be a linear operator in a real w-dimensional space Rn, and let P(X) be an annihilating polynomial of P(X). Then P(k) has a factorization of the form
to = im - w n - <^)2 + -a
fc=l
(to within a numerical factor) in the real domain, where Xk (k = 1,. . . , n) are the distinct real roots of P(X) and CTj + /t, = jx,, «Tj — ztj = jx, are the distinct imaginary roots of P(l). According to the general theory (Sec. 6.36), the space Rn can be represented as a direct sum
R.  =   2   E, + J F,
J=l
of subspaces invariant with respect to A, where (Xft — X)r* is an annihilating polynomial of the operator A in the subspace Efc, while (ct, — X)2 + t2 is an annihilating polynomial of A in the subspace F,. In the subspace Ek the operator A can be reduced to the Jordan canonical form (7). As for the subspace Fj, letBj = A — ct;E. Then(X2 + t2)351 is an annihilating polynomial for the operator B; in F,, and hence, by Sec. 6.62, there is a basis in which the matrix of B, is of the form (18), with t replaced by t,. In this same basis the matrix of the operator A = B; -f ct,E is quasi-diagonal, made up of blocks of the form
-T7 °I
1 0 0 1
1 0 0 1
■t, ct;
1 0 0 1
ct, t,
(19)
of sizes 2«!,. .. , 2nQt respectively. Thus we can choose a basis in the space R„ in which the matrix of the operator A consists of diagonal blocks of the form (10) and (19). This "real Jordan matrix" will be denoted by JR(A).
6.64. As in Sec. 6.4, the structure of the matrix JR(A) can be deduced from the elementary divisors of the operator A, which in turn can be calculated
160     the canonical form of the matrix of a linear operator        chap. 6
from the minors of the matrix A — ~kE'm the original basis. Since the polynomials Dv(k) and EPQC) are obtained from the minors A — ~KE by rational operations, the polynomials Ev(k) have real coefficients and hence are of the form
£„-A) = n (h - jo**' n - °if + -or1"  u = i, ^ ■ ■ •,*-1)
fc=l !=1
(cf. Sec. 6.44). To every exponent n\k) there corresponds a Jordan block of size njfe), and to every exponent pf] a block of the form (19) of size 2pl/K
6.65. The above results can be summarized in the form of the following
Theorem. Given any operator A in a real n-dimensional space R„, there exist*, a basis in which the matrix of A is quasi-diagonal, made up of blocks of the form (10) and (19), where \ (k = 1,. . . , m) are the real roots and al ± z't, (/ — 1, .. . , s) the complex roots of the characteristic polynomial of A. The sizes of the blocks are uniquely determined by the elementary divisors of A in the way indicated in Sec. 6.64.
6.66. Corollary. Every linear operator A in a real n-dimensional space Rn has an invariant subspace of dimension 2.
Proof. The basis vectors g\ and h\ obviously generate a two-dimensional invariant subspace of A (see (17)). |
The number of distinct two-dimensional subspaces of A can always be estimated (from below). In fact, there are at least as many such subspaces as there are distinct diagonal blocks (19) of size >2 in /#(A).
*6.7. Spectra, Jets and Polynomials
In many problems of algebra and analysis, the need arises to calculate various functions (in particular, polynomials) of given linear operators acting in a finite-dimensional space. Such functions, which have a number of special properties, will be investigated in the next two sections. A natural arithmetic model for functions of a single operator is the algebra of jets, with which we begin our discussion.
6.71. By a spectrum, denoted by 5, we mean any set of points Xls. .. , Xfe, where it is assumed that each point Xfc is assigned a "multiplicity," i.e., a positive integer rk {k = 1,. . . , m), a fact indicated by writing
sec. 6.7
spectra, jets and polynomials 161
Moreover, we assume that each point ~kk is assigned a set of rk numbers from the field K, denoted by
Such a set of numbers will be called a jet f defined on S.
We now introduce the following algebraic operations in <f{S), the set of all jets on a given spectrum S:
a. Addition of jets. By the sum f + g of two jets / = {f{j)(kk)} and g = {g0)(Xfc)} we mean the jet defined by the set of numbers
(/+/l)",(X*)=/tf»(Xt)+^(Xk) (k = 1, . . . , m\j = 0, 1, . . . , rfc — 1).
b. Multiplication of a jet by a number. By the product of of a jet f = {/0)(Xfc)} and a number a e K we mean the jet defined by the set of numbers
These two operations obviously convert the set ^(S) into a linear space, whose zero element is the jet 0 whose "components" are all zero.
c. Multiplication of jets. By the product fg of two jets f = {/0)(XA.)} and g — {gU)Q>k)} we mean the jet defined byf
(fg)(\)=fMsM,
= 1, . . . , m;j = 0, 1, . .. , rk — 1), where C\ is the binomial coefficient
Cj =-—
/!(;-/)!
It is easily verified that this operation is commutative and satisfies conditions l)-3) of Sec. 6.21. Therefore f{S) is a commutative algebra over the field K. This algebra has a unit, i.e., a jet e such that ef = f for every / e ^(S). In fact, we need only choose
f 1   if y = 0,
e{i) (K) =
t0   if  0 <y<rt
t These formulas are formally identical with Leibniz's rule for repeated differentiation of the product of two functions /and g.
162     the canonical form of the matrix of a linear operator chap. 6
In what follows, we will set up a correspondence between the algebra ý(S) and the algebra of all polynomials with coefficients in the field K, for the case where the points Xls. .. , Xm all belong to K.
6.72. It will be assumed that the field K has infinitely many distinct elements. Making this assumption, we first show how to "reconstruct" the coefficients of a polynomial from a knowledge of its values.
a. Let
be a polynomial with coefficients in the field K, whose argument X also takes values in K. Then the coefficients a0, alf. . . , av of P(X) are uniquely determined by the values of P(X). In fact, let X0, X1?. . . , ~kv be distinct elements of K, and consider the equations
a0 + fli^o + * * * + avK = fPo), a0 -f a^ -f • • ■ 4- av~k{ = fPi)>
«o + ai^v + * ' ' + av\vv = P(XP),
which can be regarded as a system of p 4- 1 equations in the unknowns The system has a nonvanishing determinant (see Example 1.55c), and hence, as asserted, has a unique solution by Cramer's rule (Sec. 1.73).
b. In particular, it follows that if two polynomials
V
k
P(X) = 2>***, <2(*)=2M;
coincide for every value X e K, then
ak = bk     (k = 0, 1, . . . ,p).
6.73. We will subsequently need the concept of the derivative of a polynomial P(X), and the notions of higher derivatives and Taylor's formula as well. In analysis these concepts are introduced for the case of polynomials which are functions of a real (or complex) argument, but here we are concerned with polynomials P(X) whose argument X varies in an arbitrary field K. We must therefore introduce the corresponding definitions independently, i.e., without recourse to the notion of a limit which may not exist in the field K.
a. Fixing a point \i e K, we write the formula
1 akXk = 2 ak[ii + (X - (x)]* = 2 ^ (X - fxf, (20) k=o fc=o fc=o k\
sec. 6.7
spectra, jets and polynomials 163
where the quantities
(k = o, l,..., p)
k\
are the polynomials in fx obtained after expanding [|x + (X — jx)]fc in powers of [x and X — jx and collecting similar terms. The polynomials bk([i) are then given the following names:
£0(jx) = = jP(h-)» tne polynomial p(}x) itself,
fc=0
&i((x) = 2^fljt|x*-1 = P'fa), they*rtf derivative of p((x), fc=i
62(jx) =lk(k — l>fc(Afc-2 = the 5efO«£/ derivative of p(fx),
fc=2
6p((x) = />(p — 1) • • • 1 • flj, = P<J,>(|x), the     derivative of P((x).
For a polynomial of degree/>, we set P(*>(}x) = 0 if q > p. In the new notation, formula (20) takes the form
JW = I 7: Pm(^ - tf, (200
k=0 kl
known as Taylor's formula for the polynomial P(X).
b. In particular, for the polynomial
P(X) = (X-a)» (aeK),
we have
P(a) = P'(«) = •' ' = P^-1^) = 0,
P<»>(X) = />!,
P<*>(X) - 0      («7 > />).
c. More generally, if
P(X) - (X - a)*(2(X),
we have
qoo=i w - a)\   m = I   - «)fc+J,s
and hence
P(a) = P\d) = • • • = PO-^a) = 0. (21)
d. Conversely, if it is known that (21) holds, then
= 2 7; PaV)(x - «)*
= (X - af2 - P,fc>(a)(X - af-» = (X - ar<2(X), where Q(k) is a new polynomial.
164     the canonical form of the matrix of a linear operator chap. 6
6.74. It should be noted that the representation of the polynomial P(X) in the form
ia*x* = p(x) =2^^
where the &fc(jx) are polynomials in fx, is necessarily unique. In fact, suppose we fix jx = }x0 and give X the distinct values X0, Xls . . . , \v in turn. Then t = X — (x takes the distinct values X0 — fx0, ~kx — jx0, .. . , Xp — jx0, and the values of the polynomial
k=0
are known for these values of t, being equal to P(X0), PfXj), . . . , P(Xp). But then the quantities bk(\i.0) are uniquely determined, by Sec. 6.72a. Since this is true for arbitrary fx = (x0 e Kt the polynomials bkQj.) (k = 0, 1, . .. , p) are themselves uniquely determined.
6.75. a. Given two polynomials P(X) and Q (X), we now verify the formulas
(P + QfV) = P,H(|x) + Qm(\i), (22)
k
(PQ)ik\\i) = 2 CJP"^'^'^) (23) (k = 0, 1, 2, . . .), where
In fact, by definition,
(P + C)(a) = I 7: (P + Qt\^ ~ tf,
A-=0 A:!
P(X) + Q(X) =27- [P(fc,((x) + <2(J%)](x - |x)*,
&=o A:!
so that (22) follows from the uniqueness theorem of Sec. 6.74. Similarly, (PQ)(X) =27; (PQf V)(X ~ H)\
sec. 6.7 spectra, jets and polynomials 165
while on the other hand,
m = i t. pu^)& - n)',   <m = 27: Q,Mo*)(x - (x)',
mm =2 Ittt: ^0'^)<2(n((x)(x -
,=oj=o./! /!
*-oU-oj! (A: — ;)! J Thus the uniqueness theorem of Sec. 6.74 implies
A:! j=o;!(A:—;)!
which is equivalent to (23).
b. In particular, formula (23) implies the following important Theorem. If
P<*>(|x) = 0      (k = 0, 1, . . . , m),
then
WW = 0      (* = 0,l,...,m) ybr a«ypolynomial Q(k).
6.76. Now suppose we are given a spectrum
S = {Xi1, . . . , X^1}      (Xj- £ A^)
and the corresponding algebra £(S) of jets on S (see Sec. 6.71). Then with every polynomial p(X) we associate the jet p e ^(5) which assigns to X, the numbers
where the P(j>(Xfc) are the derivatives of the polynomial P(X), as defined in Sec. 6.73. It follows from formulas (22) and (23) that the operations on jets defined in Sec. 6.71 correspond to the usual operations of addition and multiplication of polynomials. Thus the mapping p(k) —>p is a morphism (Sec. 6.24) of the algebra of polynomials II into the algebra of jets #{S\ As we now show, this morphism is an epimorphism, i.e., given any jet/, we can find a polynomial P(X) such that
p(K) =/(x*), P'(xfc) =f'M,.. ■, P"*-1'^) =/,r-1|W
(k — 1, . . . , m).
To prove the assertion, it is enough to consider the case where all the numbers fU)(kk) vanish except one, corresponding to any given value k = kx. In fact,
166     the canonical form of the matrix of a linear operator chap. 6
having solved the problem for this case, we need only construct a polynomial Pk(k) for each k = 1, . .. , m satisfying the conditions
PkM =f(\),. ■ •, PlTk-1]M = f'^Xh), (24)
PI-XK) =0     {s ^ k; j = 0, 1,.. . , rs - 1), (25) and the solution will then be given by the formula
P(X) = P,(X) + • • • + PTtt(X).
Thus we must find a polynomial Pfc(X) satisfying the conditions (24) and
(25) . To this end, we look for Pfc(X) in the form
P&) = QMRM, (26) where Qk(X) is a new polynomial and
= IT (* - KP- (27)
By Sec. 6.73c, we have
R*X\)=0     (s^k;j^0, 1, ...,r,-l), and hence, by Theorem 6.75b,
Pj/'fX,) =0     (5 ^ *;./ = 0, 1,. . . , r, - 1)
for any polynomial 0fc(X). Hence the condition (25) is clearly satisfied. We must still subject the polynomial Pfc(X) to the condition (24). Since
**(Xk) = IT (h ~ \Y° * 0,
the condition
ffrt) = PM = GtM^CXfc) uniquely determines £?fc(Xfc). Moreover, once Qk(kk) is known, the condition
f(K) = PIM = Gi(x*)^(x*) + Q*(xk^k)
uniquely determines Qk(\). Continuing in this way, we are able to uniquely determine all the numbers Qkfyk)> Q'ki}-k), • • • , C^*-1^)- But once these numbers are known, we can determine the desired polynomial Qk(k) by using Taylor's formula
QM -I ~Gi"(xk)(x ~KY- (28)
3=0 ;!
Reasoning backwards, we see that the polynomial Pfc(X) defined by formulas
(26) -(28) satisfies the stipulated conditions (24) and (25).
6.77. Next, applying Sec. 6.52d, we find that the algebra #{S) of all jets defined on the given spectrum S is isomorphic to the factor algebra n//,
sec. 6.7
spectra, jets and polynomials 167
where I is the ideal in IT consisting of all polynomials for which PliKK) = 0      (A: = 1.....m;/ = 0, 1.....rfc — 1).
It follows from Sec. 6.73d that every polynomial P(k) e / is divisible by the polynomial
m
T(a) = IT (X - KY\ (29)
fc=i
and from Sec. 6.73c that every polynomial divisible by 7(A) belongs to I. The ideal I, li ke every ideal in the algebra IT, is generated by the polynomial in I of lowest degree (see Sec. 6.26), and this polynomial is just 7(A) itself. Hence the algebra ^(S) is isomorphic to the factor algebra Ujl, where I is the ideal generated by the polynomial 7(A).
6.78. We now use the result of Sec. 6.77 to solve the problem of describing all invertible elements (Sec. 6.21) of the algebra
Obviously, a jet/ for which /(Xfc) = 0 for at least one value of k cannot be invertible, since then
(fg)M =f(K)g(h) = 0^1= e(kk)
for every jet g. Thus let/be a jet such that
/(Afc)^0 {k=l,...fm)t
and let P(A) be the polynomial for which
- /(x*), ■ • •, P(r*-1]M = f^M    (* = l,..., in)
(see Sec. 6.76). This polynomial clearly has no factors in common with the polynomial 7(A) defined by (29), and hence, by Sec. 6.28, there are polynomials Q(X) and S(A) such that
P(A)e(A)+r(A)S(A)^l. (30)
Let q be the jet corresponding to the polynomial (2(A). Applying the epimor-phism II —»- f{8) constructed in Sec. 6.76 to equation (30), and using the fact that this epimorphism carries the polynomial 7(A) into 0, we find that
/* = 1.
i.e., the jet f e ^{S) is invertible.
Let u be any invertible jet. Then, as we know from Sec. 6.21, the equation
ux = v
where x is an unknown jet and v any given jet, has the unique solution x = vju. We can find an explicit expression for the ratio v\u by successively
"68     the canonical form of the matrix of a linear operator chap. 6
solving the equations
2 cy^)*'^.) = ^,j>(xfc)
(A: = 1, . . . , m\j = 0, 1, . . . , rk — 1).
6.79. a. A spectrum S = {X^1,. . . , X^1} with complex Xls . . . , Xm is said to be symmetric if whenever S contains an imaginary number Xfc = ak + frk, it also contains the complex conjugate number Xfc = — fak with the same multiplicity rk. A jet/ = {/(i)(^fc)} defined on a symmetric spectrum S is said to be symmetric if the numbers f(i)(}.k) and fij}(hk) are complex conjugates (y = 0, 1,. . . , rk — 1). If P(X) is a polynomial with real coefficients, then the jet defined on a symmetric spectrum by the numbers
i*>>(Xfc)      (k = 1, . . . , mj = 0, 1, . . . , rk - 1)
is symmetric, since the derivatives PU)(k) also have real coefficients and hence
PmM = P{HK)- (31)
Conversely, given a symmetric jet / = {fU)Q<k)} on a symmetric spectrum 5 — {X^1,. . . , X^1}, we can always find a polynomial P0(A) with real co" efficients such that
Pi"(**) = (A; = 1, .. . , m; j = 0, 1,. . . , rfc — 1).
In fact, by Sec. 6.76, we can construct a polynomial P(X) with complex coefficients satisfying the same conditions. Let P(k) denote the polynomial whose coefficients are the complex conjugates of those of P(X). Then it follows from (31) that
i[P<>>(Xfc) + P<>->(Xfc)} = ~[P^(h) + Pm(W] = P°>M = /<J>(M, i.e., the polynomial
with real coefficients satisfies the required conditions.
b. The set of all symmetric jets / on a symmetric spectrum S obviously forms an algebra over the field of real numbers. According to Sec. 6.25d, this algebra is isomorphic to the factor algebra IT//, where IT is the algebra of all polynomials with real coefficients and I <= IT is the ideal consisting of all
sec. 6-8
operator functions and their matrices 169
polynomials P(X) e n for which
Pm(K) = 0      (k - 1, . . . , m-J = 0, 1, . . . , rk - 1), i.e., the ideal generated by the (real) polynomial
m
r(x) = n & -
fr-1
*6.8. Operator Functions and Their Matrices
In this section we investigate functions of operators, finding matrices (and corresponding rules of operation) for polynomials of the form P(A) and rational functions of the form P(A)jQ(A), where A is any linear operator acting in an n-dimensional space Cn (or R„). In Sec. 6.89 we will extend the "calculus of operators" to the case of analytic functions of operators.
6*.81. Given an operator A acting in an «-dimensional space K^, let IIA be the algebra of all operators of the form P(A), where P(X) is some polynomial. Then ITA is isomorphic to the factor algebra UjIA, where IT is the algebra of all polynomials and IA is the ideal generated by the minimal annihilating polynomial T(k) of the operator A (see Sees. 6.31-6.33). Suppose it is known that the polynomial r(X) has the factorization
m
r(x) = n(^-^*r (32)
in the field K. Then, by Sec. 6.77, the factor algebra II//A is isomorphic to the algebra f(S) of all jets defined on the spectrum
S = SA  = (Xi1,.... a^}
(called the spectrum of the operator A). Hence the algebra IIA is itself isomorphic to the algebra ^(S). The explicit form of this isomorphism is the following: To every jet/6 J?(S) there corresponds the class of polynomials P(X) e II such that
P{J)M = fU)M      (k = 1, . . . , m-j = 0, 1, . . . , rk_,), (33)
and to each of these polynomials there corresponds the same uniquely defined polynomial operator P(A), which we denote by f(A).
Below we will investigate the explicit form of the matrix of the operator P(A) for a given minimal annihilating polynomial (32), in the case where the matrix of A is in Jordan canonical form.
170     THE CANONICAL FORM OF THE MATRIX OF A LINEAR OPERATOR CHAP. 6
6.82. First suppose the operator A has a matrix (of order n) of the special form
(34)
	1	• 0
0	\ ■	■• 0
0	0 •	■ • 1
0	0 •	
in some basis of the space Kn. Then A is of the form T^E + B, where the operator B has the matrix
0 1
0 1
0 0
0 0
0
0
1
0
According to Example 4.74b, the matrix of Bk is
(* + l)
0 0
0
0
1 0 0 1
0 0
(35)
where the diagonal consisting entirely of ones has moved over k steps to the right from the principal diagonal. If P(X) is an arbitrary polynomial of degree p, then
k=o k\
by Taylor's formula (20'). Replacing X by the operator A, we get the identity
jP(A) = 2 f P^XoXA - XoE)* =£ y- P(fc)(Ao)B*-fc=o k! fc=o k!
Then, taking account of the expression (35) for the matrix of Bk, we find that
sec. 6.8
operator functions and their matrices 171
P(A) has the matrix
P(X0)   P'(Xo) V'(Xo)
0      TOo) P'(Xo)
---i><n-1,(A0)
in ~ 1)!
(«-2)!
P(n-2)(Xo)
0
0
0
(36)
Note that to construct the matrix of P(A) from the polynomial P(X), we only need the n values P(X0), P'(Xo),. . . , P(n_1'(X0)J where n is the order of the matrix of A.
6.83. Next suppose the operator A has a quasi-diagonal matrix of order n, made up of blocks of the form (34), where X0 takes the values X1?. .. , Xm with corresponding block sizes nlt. . . , nm. By the rules for operating on quasidiagonal matrices (Sec. 4.52), each block of the matrix of the operator P(A) can be calculated independently. Applying Sec. 6.82, we find that the matrix of P(A) is obtained by replacing each block (34) of the matrix of A by the block (36). Thus to construct the matrix of P(A), we now need the values
P<»(X*)      (k = 1, . . . , m\j = 0, 1, . . . , nk - 1).
6.84. Let A be any operator acting in an w-dimensional complex space Cn. Then, as on pp. 146-147, there exists a basis in which the matrix of A is quasi-diagonal, made up of blocks of the form
n
k 1 0 Aj,
0
1
0 0 0 0   0 0
0
0
1
(k — 1,. . . , m\) — 1,
r»), (37)
where the numbers rk and n{k) are those figuring in table (9). Correspondingly, the spectrum of the operator A is
If
S — SA   — {Xi1, ... j X^1}. /= {fmM)      ik = 1, - . . , m;j = 0, 1, . . . , rk ~ 1)
is any jet defined on S, then, by Sees. 6.81-6.83, the corresponding operator /"(A) has a quasi-diagonal matrix, in which each block of the form (37) is
172     the canonical form of the matrix of a linear operator chap. 6
replaced by the block
0    /(xte) /'(x,)
{n{}k)~ 1)!
1        / <fc> ,i
(n<fc) - 2)!
1 fln^2)M
0 0
0
(38)
The isomorphism between the algebras UA and £{S) has now been made perfectly explicit.
6.85. a. Next we consider functions of an operator A which has a matrix of order 2m of the form
<j     t 10 — t     g 0 1
g t
— t g
(39)
g t
— t g
where g and t are elements of the field K. Introducing the 2x2 matrices
E =
1 0	a _	g t
0 1	>     A —	•—-t g
we can write the matrix of A as the following block matrix of order m: A  E   0   •••   0 0
0   A E
0 0 0 0   0 0
•   0 0
A E
0 A
A  0 0
0   A 0
0   0 0
0   0 0
0 0
0 0
A 0
0 A
+
0 E 0
0 0 E
0 0 0
0 0 0
0 0
0 0
0 E
0 0
sec. 6.8
operator functions and their matrices 173
Therefore it follows from Sec. 6.82 and the rule for multiplication of block matrices (Sec. 4.51) that the matrix of P(A) can be written in the form of the block matrix
P(A)  P'(A) ~P"(A)
0     P(A) P'(A)
1
(m - 1)! 1
(m - 2)!
Plm-1}(A) P(m-2,(A)
0
0
0
P(A)
(40)
b. If the matrix of A is quasi-diagonal, made up of blocks of the form (34) and (39), then, just as in Sec. 6.83, we deduce that the matrix of P(A) is obtained by replacing each block by the corresponding block of the form (36) or (40).
c. In the case where K = R, so that the numbers a, t and the polynomial P(X) are real, we can find the explicit form of the matrices P(fc)(A) figuring in (40). In fact, introducing the matrix
0 1
■1 0
we easily verify that P = ~E, so that the algebra of real matrices
Re X   Im X
A = aE + tI ■■
-Im X   Re X
(X = a + (*t)
is isomorphic to the ordinary algebra of complex numbers (cf. Example 4.74a). Hence for any polynomial P(X) with real coefficients we have
P(A) = P(aE + tA) = and correspondingly
ReP(X) ImP(X) -lmP(X) ReP(X)
(X = a + *t),
ReP<*>(X) ImP'fc,(X) -lmP<fr>(X) ReP(fc>(X)
6.86. Let K = R and = R„. Then, given any operator A acting in K„, the minimal annihilating polynomial T(X) has real coefficients and hence has a symmetric spectrum SA (see Sec. 6.79a). The algebra IIA of operators of the form P(A) is isomorphic to the facior algebra UjIA, where II is the algebra of polynomials with real coefficients and TA is the ideal generated by the minimal annihilating polynomial of the operator A. According to Sec. 6.79b,
174     the canonical form of the matrix of a linear operator chap. 6
this factor algebra is isomorphic to the algebra of symmetric jets on the spectrum SA. On the other hand, there is a basis in which the matrix of A is quasi-diagonal, with diagonal blocks of the form (34) and (39). Let/be any symmetric jet on the spectrum SA. Then it follows from the above considerations that the corresponding matrix /(A) is obtained by replacing every block (34) by a block (38) and every block (39) of size 2m by the block matrix
/(A) /'(A)  ... _l_/«-it(A)
/(A)
(m - 1)!' 1
(m - 2)!
/(m"2>(A)
0       0     • • • /(A) of order m, where the /(ft>(A) are 2x2 matrices of the form
Re/(*>(X) lm/<*>(X)
/<*>(A) =
-Im/<*>(X) Re/(A>(X)
6.87. Given a linear operator A acting in a space Cn, suppose A has the Jordan canonical form (8) specified by the table (9), as on pp. 146-147. We now look for all invertible operators of the form P(A), where P(X) is a polynomial. It is clear from the form of the operator of the matrix of P(A) in the Jordan basis of the operator A that the determinant of this matrix is just
ntWr*,      Irk = n
JL-1 k=l
(cf. Example 1.55b). Therefore the operator P(A) is invertible in the algebra L(Cm) of all linear operators acting in the space Cn if and only if
P(Xfc)^0      (*=l,...,m). (41)
Moreover, if the condition (41) is satisfied, then the inverse operator [/'(A)]-1 already belongs to the algebra IIA. In fact, in this case the jet p corresponding to the polynomial P(k) in the algebra of jets ^(SA), i.e., the jet consisting of the numbers
i**>(Xfc)      (*= 1,. ..,m;/ = 0, 1,. ...r* - 1),
is invertible in the algebra f{SA), by Sec. 6.78. But then the operator P(A) is invertible in the algebra ITA, by the isomorphism between the algebras f(SA) and nA.
sec. 6.8
operator functions and their matrices 175
Again using the isomorphism between the algebras </(5A) and nA, we see that if P(A) is invertible, then the equation
P(A)*(A) = 0(A),
where X(k) is an unknown polynomial and (?(X) any given polynomial, has the unique solution X(A) = Q(A)jP(A). Let x, p and q be the jets corresponding to the polynomials X(k), P(X) and £?(X), respectively, so that in particular/?* = q, x ~ qjp. Then, according to Sees. 6.78 and 6.84, the matrix of the operator A^A) in the Jordan basis of the operator A is obtained by replacing every block of the form (36) by a block of the form
P(h)
0
0
\p(a)A-> o
2\p(X)A-xt \/>(X)/>.=xt
(42)
6.88. The above result can be interpreted somewhat differently. Given a spectrum 5 — {X^1,... , X^1} in the complex plane, let91(5) denote the set of all complex rational functions
G(X)
m = m'
where P(X) and (?(X) are polynomials, and P(X) has no roots at the points of the Set 5. In the set9l(5) we define the operations of addition of two functions, multiplication of a function by a complex number, and multiplication of two functions in accordance with the usual rules, thereby making91(5) into an algebra over the field C. Moreover, we note that every function /(X) e91(5) has derivatives /'(X),/"(X),... in the usual sense of analysis. Assigning to each function /(X) e§l(5) the jet
f-{fuKK)}     (k = \,
m
;j = o, l,
1),
where fU)(kk) denotes the usualyth derivative of /(X), we get a morphism of the algebra 91(5) of rational functions into the algebra f(S) of jets on the spectrum 5, in fact an epimorphism, since by Sec. 6.76 the jets corresponding to just the polynomials Q(k) already fill the whole algebra ^(5).
Now let 5 = 5A be the spectrum of some operator A acting in the space C„. Then the algebra IIA of operators P(A) is isomorphic to the algebra of jets tfiS^), and we can extend the given epimorphism 9l(5A) —»- ^(5^ to an epimorphism 9l(5A) —»- IIA. In other words, we can assign to each
176     the canonical form of the matrix of a linear operator chap. 6
rational function /(X)e9t(.S) a linear operator /(A)eiTA such that the correspondence /(X)-»-/(A) is again an epimorphism, where the matrix of the operator /(A) is given by the rule (42).
6.89. Instead of the algebra of rational functions, we can consider the algebra of analytic functions. Thus let ^(S) be the set of all functions /(X) analytic at the points Xl5. . . , Xm, i.e., analytic in a neighborhood of each of the points Xx,. . . , Xm. Then the set ^(5) equipped with the usual operations of addition and multiplication is again an algebra over the field C, in fact an algebra containing the algebra ^1(5). Analytic functions also have derivatives of all orders (in the usual sense of analysis), and using them, we can extend the epimorphism9l(5A) —»- IIA constructed in Sec. 6.88 to an epimorphism ^(SA) —>• nA. An important feature of this new epimorphism is that it now involves many transcendental functions of analysis, like etX, cos tl, sin ?X,etc. If/(A) denotes the operator corresponding to the function /(X) e ^(SA), then its matrix in the Jordan basis of the operator A is calculated by the same rule (38) as before. We note in particular that the operator formula
is an immediate consequence of the identity
and the fact that the mapping ^(SA) —*• FIA is an epimorphism.
The results of Sees. 6.87-6.89, pertaining to linear operators in a complex space, can be carried over to linear operators in a real space, by using the real Jordan canonical form and the method of Sees. 6.85-6.86. We leave the details of this extension to the reader, since no new ideas are involved.
PROBLEMS
1. The matrix of an operator A is of the form
X	0	0	. . . o	0
1	>.	0	• • • 0	0
0	l	X	. . . o	0
0	0	0	. . . x	0
0	0	0	... 1	X
in a basis eu e.2,. .. , en. In what basis does it have Jordan canonical form?
problems 177
2. Prove that the matrix A and the matrix A' (obtained by transposing A) are equivalent.
3. Find the Jordan canonical form of the matrix
2	-1	-1	3	2
4	1	-1	3	2
I	1	0	-3	-2
4	-2	-1	5	1
1	1	1	-3	0
4. Are the operators specified by the matrices
	1	1	0		4	1	-1
A -	0	1	0	,      B =	-6	-1	2
	0	0	2		2	1	1
equivalent?
5. Find the elementary divisors of the following matrices of order n
1   2 3
1 1 0 1
0 0
A* -
0 1 2 0  0 I
0   0 0
n
n - I
0
1
	n	n - 1	n —	2 •	• 1		1	1	1 • •
	0	n	n —	1 •	■ 2		0	2	2   • •
	0	0	n		• 3		0	0	3   • •
	0	0	0		• n		0	0	0   • •
6. Show that all matrices of the form
0    0 0
a	a12	«13 *	* äl*
0	OL	^23 *	a2.n
0	0	a	
1
2
3
n
178     the canonical form of the matrix of a linear operator chap. 6
with arbitrary elements a12, alz,... are equivalent if the elements a12, a23, • • •, dn^n are nonzero.
7. Find the Jordan canonical form of the matrix A satisfying the equation P(A) = 0, where the polynomial      has no multiple roots.
8. Find the Jordan canonical form of the matrix A satisfying the equation P(A) = 0, where the polynomial J°(X) is an arbitrary polynomial.
9. Prove that if the annihilating polynomial of an operator A acting in the space Rn is of degree 2, then every vector x lies in a plane or line invariant with respect to A.
10. Find all matrices commuting with the m x m matrix
Am(a) =
a	1	0 •	• 0	0
0	a	I   • •	• 0	0
0	0	0 -	' a	1
0	0	0  • •	• 0	a
11. Find all m x n matrices B satisfying the condition
BAn(a) = Am(a)B.
12. Find all matrices commuting with quasi-diagonal matrices of the form
Ami(a)      0 0
0
0
0
0
13. Find all matrices commuting with quasi-diagonal matrices of the form
0
Am.(a2)
0 0
0 0
where the numbers ax, a2,.. . , ak are all distinct.
14. Find all matrices commuting with the general Jordan matrix (8).
15. Under what conditions is every matrix commuting with a given matrix A a polynomial in A ?
chapter 7
BILINEAR AND QUADRATIC FORMS
In this chapter, we shall study linear numerical functions of two vector arguments. Unlike the theory of linear numerical functions of one vector argument, the theory of linear numerical functions of two vector arguments (such functions are called bilinear forms) has rich geometric content. Setting the second argument equal to the first in the expression for a bilinear form, we get an important new kind of function of one variable, called a quadratic form, which is no longer linear.
The considerations of Sees. 7.1-7.8 pertain to a linear space K over an arbitrary number field K, while those of Sec. 7.9 pertain to a real linear space.
7.1. Bilinear Forms
7.11. A numerical function A(x, y) of two vector arguments x and y in a linear space K is called a bilinear form (or a bilinear function) if it is a linear function of x for every fixed value of y and a linear function of y for every fixed value of x. In other words, A(x, y) is a bilinear form in x and y if and only if the following relations hold for any x, y and z:
A(x -f z, y) A(coc,y)
A(x, y + z) A(x, ay)
A(x, y) + A(z, y), aA(x, y),
A(x, y) + A(x, z), ctA(x, y).
(1)
179
180     bilinear and quadratic forms
chap. 7
The first two equations mean that A(x, y) is linear in its first argument, and the last two equations that A(x, y) is linear in its second argument. Using induction and the relations (1), we easily obtain the general formula
(k m \ k m
2 2 hyt 1=22 xfiiMXi, M (2) « = 1 f=l / !=lj=l
where xu . . . , xk, yu . .. , ym are arbitrary vectors of the space K and a1? . . . , afc, f}l5 . . . , pm are arbitrary numbers from the field K.
Bilinear forms defined on infinite-dimensional spaces are usually called bilinear functional.
7.12. Examples
a. If Lx(x) and L2(x) are linear forms, then A(x, y) — L1(x)L2(y) is obviously a bilinear form in x and y.
b. An example of a bilinear form in an M-dimensional linear space with a fixed basis elt e2, . . . , en is the function
n n
M*>y) = 2 2afl&Y)t»
where
x = 2       y = 2 ^*
are arbitrary vectors and the aa. (U k ~ 1,2,...,«) are fixed numbers.
7.13. The general representation of a bilinear form in an n-dimensional linear space. Suppose we have a bilinear form A(x,y) in an ^-dimensional linear space K„. Choose an arbitrary basis elt e2, . ■ ■ , en in K^, and write
A(A> **) = aik      (/, k = 1, 2, . . . , «). Then for any two vectors
w n
■v = 2       y = 2*)*«*»
it follows from (2) that
2^2^) = 2 2^%A(^>^)
i=l        k=l        I        * = lfr=l n n
= 2 2«^*- <3)
Thus the most general representation of a bilinear form in an w-dimensional linear space has already been encountered in Example 7.12b.
sec. 7.1
biljnear forms 181
The coefficients aik form a square matrix
an   a12   - ■ • a
A — A{e) —
a21 ^22
a
a
n2
In G2n
a„„
= a
ik\\
which we will call the matrix of the bilinear form A(x, y) in (or relative to) the basis {e} = {ex, e2, ... , en}.
7.14. Symmetric bilinear forms. A bilinear form is called symmetric if
A(x, y) = AO, x)
for arbitrary vectors x and y. If the bilinear form A(x, y) is symmetric, then
atk ~ A(eit ek) ~ A(ek, et) = aki,
so that the matrix A{e) of a symmetric bilinear form in any basis eu e2,.. . , en of the space K„ equals its own transpose A[e). It is easily verified that the converse is also true, i.e., if A[e) = A{e) in any basis elt e2,... , eni then the form A(x, y) is symmetric. In fact, we have
n n n
My,x) = 2 Vikn&k = 2 aMk = 2 ai£if\k = M*, y),
i.k=l i.k=l i.k=l
as required. In particular, we have the following result: If the matrix of the bilinear form A(x, y) calculated in any basis equals its own transpose, then the matrix of the form calculated in any other basis also equals its own transpose. A matrix which equals its own transpose will henceforth be called symmetric.
7.15. Transformation of the matrix of a bilinear form when the basis is changed.
a. Of course, if we transform to a new basis, the matrix of a bilinear form changes according to a certain transformation law. We now find this law. Let AU) ^= ||aifc|| be the matrix of the bilinear form A(x, y) in the basis
{e} = {els e2, ■ ■ ■ , en},
and let A{f) — ||&Jfc|| be the matrix of the same form in the basis
{/} = ■ ■ • >/»>
(i, k = 1,2,..., ri). Assuming that the transformation from one basis to the other is described by the formula
fi-lpfe,     (i = l,2,...,/i)
182      bilinear and quadratic forms
chap. 7
with the transformation matrix P = \\p{}i]\\, we have
bik = \(fufh) = A(i>«% 2 p™e\
\j=i      1=1 /
= | p?»PlwA(^ *,) = 2 p< W
This formula can be written in the form
bi*=22p\j)'anp?\ (4) *=-i/=-i
where p\i]' = p{/] is an element of the matrix P' which is the transpose of P. Equation (4) corresponds to the following relation between matrices (see Sec. 4.43):
A(f)=P'AMP. (5)
b. Since the matrices P and P' are nonsingular, it follows from Corollary 4.67 that the rank of the matrix A[f) equals the rank of the matrix AU), i.e., the rank of the matrix of a bilinear form is independent of the choice of a basis. Hence it makes sense to talk about the rank of a bilinear form. A bilinear form A(jc, y) is said to be nonsingular if its rank equals the dimension n of the space K„.
c. Let A(x, y) be a nonsingular bilinear form. Then, as we now show, given any vector x0 ^ 0, there exists a vector yQ e Kn such that A(;c0, y0) ^ 0. Suppose to the contrary that A(jc0, y) = 0 for every y e Kn, and construct a basis elt e2, . . . , en in the space K„ such that eY = x0. Then the matrix of the form A(x, y) in this basis is such that
aim — A(e1( em) = A(x0, em) = 0,
so that the whole first row of the matrix consists of zeros. But then the rank of the matrix is less than w, contrary to the hypothesis that A(x, y) is non-singular. This contradiction proves our assertion.
d. Note that a form A(x, y) which is nonsingular in the whole space K may be singular in a subspace K' <= K. For example, the form
A(x, y) *= li^i — It^ii
is nonsingular in the space R2, where x — (£ls £2), y — ^2)- However, it vanishes identically in the subspace R'2 <= R2 where ^ = £2 (and     — 7)2).
e. It follows from (5) and Theorem 4.75 on the determinant of the product of two matrices that
det Aif) — det AU) (det Pf.
(6)
sec. 7.2
quadratic forms 183
7.2. Quadratic Forms
One of the basic problems of plane analytic geometry is to reduce the general equation of a second-degree curve to canonical form by transforming to a new coordinate system. The equation of a second-degree curve with center at the origin x = 0, y — 0, has the familiar form
Ax2 + 2Bxy + Cf = D. (7)
A coordinate transformation is described by the formulas
x = alxx' + «12/, y = a21x' + a22y',
where au, a12, a2u a22 are certain numbers (usually sines and cosines of the angle through which the axes are rotated). As a result of this coordinate transformation, (7) takes the simpler form
A'x'2 + B'y'2 = D.
An analogous problem can be stated for a space with any number of dimensions. The solution of this and related problems is the fundamental aim of the theory of quadratic forms, which we now present.
7.21. We begin with the following definition;
A quadratic form defined on a linear space K is a function A(jc, x) of one vector argument x eK obtained by changing y to x in any bilinear form A(x,y) defined on K.
According to (3), in an w-dimensional space Kn with a fixed basis {e} = {elt e2t. . . , en}, every quadratic form can be written as
n n
i=lJfc=Tl
where £l9 £2> • • • » are components of the vector x with respect to the basis {e}. Conversely, every function A(x, x) of the vector x defined in the basis {e} byformula (8) is a quadratic form in x. 1 n fact, we need only introduce the bilinear form
n n
B(x, y) = 2
i=lfc=l
where t)2, .. . , v\n are the components of the vector y with respect to the basis {e}. Then the function A(x, jc) is obviously just the quadratic form B(x, x).
184     bilinear and quadratic forms
chap. 7
7.22. We can write the double sum (8) somewhat differently by combining similar terms. Let bu — aH and bik = aik + aki (i ^ k). Then, since
aik^i^k + aki^>k^i — (aik 4~ aki)£,i%>k =
the double sum (8) can be written as
n
A(x,x) -2 2>fl&£*» and has fewer terms. It follows that two different bilinear forms
n n
Mx, y) = 2 aik^k,   C(x, >o = 2 c«^it
i,k=l i.k=l
can reduce to the same quadratic form after y is replaced by x. All that is necessary is that aik + ahi — cik + cki for arbitrary / and k.
Thus, in general, we cannot reconstruct uniquely the bilinear form generating a given quadratic form. However, in the case where it is known that the original bilinear form is symmetric, it can be reconstructed. In fact, if aik = aki, then the relation aik + aki = bik (/ ^ A:) uniquely determines the coefficients aik, i.e.,
out = aki = \ (i ^ k), (9)
while for / = k we have
«« = (9')
so that the bilinear form itself is uniquely determined. This assertion can be proved without recourse to bases and components. In fact, we have
A(x + y, x + y) = A{x, x) + A(x, 7) + A<>, x) + A(y, y)
by the definition of a bilinear form, and
>>) = i[A(x, y) + AO, x)] = ^[A(x + >% * + y) - A(x, x) - AO, >>)]
by the assumption that A(x, y) is symmetric. Hence the value of the bilinear form A(x, y) for any pair of vectors x, y is uniquely determined by the values of the corresponding quadratic form for the vectors x, y and x + y.
On the other hand, to obtain all possible quadratic forms, we need only use symmetric bilinear forms. In fact, if A(x, y) is an arbitrary bilinear form, then
Ai(x, y) = i [A(x, y) + AO, x)] is a symmetric bilinear form, and
Ai(x, x) = ^ [A(x, x) + A(x, x)] = A(x, x), i.e., the quadratic forms Ax(x, x) and AO, *) coincide.
sec. 7.3
reduction of a quadratic form to canonical form 185
7.23. These considerations show that in using bilinear forms to study the properties of quadratic forms, we need only consider symmetric bilinear forms, with corresponding symmetric matrices \\aik\\, aik = aki. By the matrix of the quadratic form A(x, x), we mean the symmetric matrix A — \aik\ of the symmetric bilinear form A(x, y) corresponding to A(x, x). When the basis is changed, the matrix A of the quadratic form A(x, x) transforms just like the matrix of the corresponding symmetric bilinear form A(;t, y), i.e.,
A{f) — P'Aie)P,
where P is the matrix of the transformation from the basis {e} to the basis {/}. In particular, the rank of the matrix of a quadratic form does not depend on the choice of a basis. Therefore we can talk about the rank of a quadratic form A(x, jc), understanding it to mean the rank of the matrix of A(x, x) in any basis of the space K„. A quadratic form whose rank equals the dimension n of the space K„ is said to be nonsingular.
7.3. Reduction of a Quadratic Form to Canonical Form
7.31. Suppose we are given an arbitrary quadratic form A(jc, x) defined on an n-dimensional linear space K„. We now show that there exists a basis {/} = {fx,fi, . . • ,/„} in Kn such that given any vector
n
x = 2*)*/*»
the value of the quadratic form A(x, x) is given by
A(x, x) = X^2 + Xftj + • • • + Xny]2„, (10)
where Xl5 X2, . . . , X„ are certain fixed numbers. Every basis with this property will be called a canonical basis of A(jc, x), and the expression (10) will be called a canonical form of A(x, x). In particular, the numbers Xl5 X2, . . . , Xn will be called canonical coefficients of A(x, x).
Let {e1} e2, . . . , en) be an arbitrary basis of the space Kn. If
t)
X = 2 %>kek>
then, as we have already seen, A(x, x) can be written in the form
Hx,x) =i (ii)
186     bilinear and quadratic forms
chap. 7
According to Sec. 5.22, our assertion will be proved if we can write a system
*h =^11^1+^12^2 + • • * + Pmln,
f\2 =/>2i5i + ^22^2 + * * * +P2nln, fin = A»l£l +^n2^2 H----+ PnnZn
with a nonsingular matrix P— such that expressing the variables "»In appearing in (11) in terms of Z,u £2,.. . , £n has the effect of transforming (11) into the form (10). We will carry out the proof by induction on the number of variables £f actually appearing in (11), i.e., those which have nonzero coefficients, assuming that every form containing m — 1 variables £1( £2> • • • > sav> can De reduced to the canonical form (10) with n = m — 1, by making a transformation (12) also with n = m — 1.
If (11) actually contains only one variable ^, say, i.e., if (11) has the form
A(x, x) = bug,
then the induction hypothesis is satisfied for any choice of pxl ^ 0. Consider a form (11) which actually contains m variables ^, £2, . . . , £m. First we assume that one of the numbers biu b22, • . , bmmt say bmm, is nonzero, and we group together all the terms in (11) which contain the variable £m. This group of terms can be written in the form
^lm^fli ~\~ ^2771^2^7» ~\~    '    ~\~ b7h—1, tti'St?i—1 Sm ~\~ ^mm^m
= 5i +        S2 + ■ • • + b-f~* ^-i + ÜV Ax(x, x), (13)
-^-^771771 771771 '
where Ax(x, x) denotes a quadratic form which depends only on the variables £l5 £2,.. . , Now consider the coordinate transformation
fi = £l,
T2 = ^2s
771—1 ^>m—1>
x    —   ^lm   JT     I JT   _i_ . . . _i_ bm-l.m k , k
771771 771771 771771
The matrix of this transformation is nonsingular (its determinant is actually 1). In the new coordinate system, A(jc, x) clearly has the form
A(x, x) = B(x, x) + bmmx
,2
7715
sec. 7.3
reduction of a quadratic form to canonical form 187
where the quadratic form B(x, x) depends only on the variables t1? t2, . . . , tm-1. By the induction hypothesis, there exists a new transformation
7)i = + p12T2 + ' • • + A.fl.-l^m-l.
t]2 = /"21t1 + P22r2 + * * ' + p2.m-lTm-l>
(12')
^m-l — Pm-1,1T1 ~\~ Pm-1,2T2 + " " " + Pm-l,m-lTm-l>
with a nonsingular matrix P = Wp^h which carries B(x, x) into the canonical form
B(x, x) = Itfl + X27)^ + • ■ • + X^y]2^.
If we supplement the system of equations (12') with the additional equation f\m = Tm, we obtain a nonsingular transformation of the variables t1} t2, .. . , Tm into the variables y^, yj2, . .. , yjw, which carries A(x, x) into the canonical form
A(X, X) = B(x, x) + bmmT2m = Xtfl + 3^7)1 + • • • + ^m-lVU-l +
According to Sec. 5.33, the direct transformation from the variables {£,} to the variables {yj} is accomplished by using the matrix equal to the product of the matrix of the transformation from {t} to {yj} and the matrix of the transformation from {£} to {t}.| Since both of these matrices are nonsingular, the product of the matrices is also nonsingular.
We must still consider the case of a quadratic form A(jc, x) in m variables £u £2» • • • > £m which has all the numbers bu, b22, .. . , bmm equal to zero. Consider one of the terms b^^ with a nonzero coefficient, say b12 ^ 0. Then carry out the following coordinate transformation, where for convenience we write the transformation from the new variables to the old variables:
£,2 — £>'l £>2>
p (14)
The determinant of the matrix of the transformation (14) equals —2, and hence this transformation is again nonsingular. The term bl2^^2 is transformed into
^12^1^2 = ^12^12 ^12^2^
so that two squared terms with nonzero coefficients are produced simultaneously in the new form. (Clearly these terms cannot cancel any of the other
f {£} is shorthand for the set {lu £m}, {?)} for the set      tj2, . .., rjm}, etc.
188     bilinear and quadratic forms
chap. 7
terms, since all the other terms contain a variable 3 with i > 2.) We can now apply our inductive method to the quadratic form (11) written in the new variables
Thus, finally, we have proved our theorem for any integer m = 1,2,.... In particular, the case m = n suffices to prove the theorem for an arbitrary quadratic form in an w-dimensional space.
The idea of our proof, i.e., consecutive splitting off of complete squares, can be used as a practical method for reducing a given quadratic form to canonical form. However, in Sec. 7.5 we will describe another method, which permits us to obtain directly both the canonical form and the vectors of the canonical basis.
7.32. Example. To reduce the quadratic form
A(x, x) = 3 +        + 53 -        - UUz + 43 - 4^4 -        - 3
to canonical form, we first complete the square in the group of terms containing £l5 writing
*)i = lr + 3£2 ~ 2^3-Then the form is transformed into
A(x, x) = 4 - 4^ - 4^4 - 8^4 ~ 3-Next we complete the square in the group of terms containing £2, writing
7)2 = 2^ + li-
This reduces the form to
A(X, X) = 7)2 — 7)2 — 8£3£4.
There are no squares of the variables £3 and £4. Hence we write
^3 — ^3 "~ r/4» £4 = *]3 + *)4,
so that ^3^4 = 7)2 — 7j|. Thus the form A(x, x) is reduced to the canonical form
A(x, x) = 7)2 — -nS — H + 8^)4
by the transformation
*ii = £1 + 3£2 — 2^,
7)2 = 2^2 + £4, ?)3 = 2^3 + ¥^4i ^4 ^       1^3 "I- 2^4-
It is apparent from the construction that this transformation is nonsingular, a fact which is easily verified directly.
sec. 7.3
reduction of a quadratic form to canonical form 189
7.33. a. Neither the canonical basis nor the canonical form of a quadratic form is uniquely determined. For example, any permutation of the vectors of a canonical basis gives another canonical basis. In Sec. 7.5 it will be shown, among other things, that with a few rare exceptions a canonical basis for a given quadratic form can be constructed by choosing an arbitrary vector of the space as the first vector of the basis. Moreover, if A(x, x) is written in the canonical form
A(X, X) — XjY)J + X27)2 + * ' • + Xn7)*,
where Y)i, 7)2,... , t]n are the components of the vector x, then the transformation
7)2 = <x2t2, ""In anTTi
(where al5 a2, ... , an are fixed numbers all different from zero and tx, t2, . . . , tn are new components) carries A(x, x) into the new form
A(x, x) = (X^tJ + (XaoM + * * * + (Xna^)t^,
which is also canonical but has different coefficients. Hence there still remains the problem of describing all the canonical forms to which a given quadratic form can be reduced. This problem can be made more precise if we restrict the definition of a canonical form (as for example will be done in Sec. 7.93 for the case of a real space) or if we restrict the class of admissible coordinate transformations (as for example will be done in Sec. 10.1 for the case of a Euclidean space).
b. It should be noted that in the preceding example the number of nonzero coefficients remains unchanged when we transform from the variables (y)} to the variables {t}. In general, the number of nonzero canonical coefficients is obviously the rank of the matrix of the quadratic form in the corresponding canonical basis. Since the rank of the matrix of a quadratic form does not depend on the choice of a basis (Sec. 7.23), the number of nonzero canonical coefficients of a quadratic form does not depend on the choice of a canonical basis. Moreover, this number obviously coincides with the rank of the quadratic form (Sec. 7.23). Thus from a knowledge of a quadratic form A{x, x) in any basis {e}, we can predict the number of nonzero canonical coefficients of A(x, x) in any canonical basis, namely the rank of A(x, x). In particular, the canonical coefficients of a nonsingular quadratic form are all nonzero in any canonical basis.
190     bilinear and quadratic forms chap. 7
7.4. The Canonical Basis of a Bilinear Form
7.41. a. The vector xY is said to be conjugate to the vector yx with respect to the bilinear form A(xf y) if
A(*i> yd = o.
In this case, yy is also said to be conjugate to xv
b. Let \\aik\\ be the matrix of the form A(x, y) in any basis elt e2, . . ■ , en. Then, if
71 71
the condition for xx and ^ to be conjugate can be written in the form
71
A(*i, yx) = 2 = °-
c. If the vectors xlt x2, ■ ■ ■ , xk are all conjugate to the vector yx, then every vector of the linear manifold L(xx, x2j ■ ■ ■ , xk) spanned by xx> x2, . . . , x^ is also conjugate to yx. In fact, it follows from the properties of a bilinear form that
M*iXi + a2*2 + * * ' + afcXjfc, yj)
= c^Afo, yx) + a2A(x2, yd + ' " + a*A(**> ^1) = °-
A vector yy conjugate to every vector of a subspace K' <= K is said to be conjugate to the subspace K'.
d. The set K" of all vectors yx e K conjugate to the subspace K' is obviously a subspace of the space K. This subspace K" is said to be conjugate to K'.
7.42. A basis ex, e2, . . . , en of the w-dimensional space K„ is called a canonical basis of the bilinear form A(x, y) if the basis vectors are conjugate to each other, i.e., if
Afc.O-O  for i^k.
For example, in the space K3 let the bilinear form A(x, y) be the scalar product of the vectors x and y. Then to say that x and y are conjugate with respect to A(x, y) means that x and y are orthogonal. In this case, any orthogonal basis of the space K3 is a canonical basis.
7.43. The matrix of a bilinear form relative to a canonical basis is diagonal, since
aik = AO,., ek) = 0   for   i ^ k.
sec. 7.4
the canonical basis of a bilinear form 191
Since a diagonal matrix coincides with its own transpose, a bilinear form which has a canonical basis must be symmetric. (We recall from Sec. 7.14 that whether or not the matrix of a bilinear form is symmetric does not depend on the choice of a basis.) Conversely, we now prove that every symmetric bilinear form A(x,y) has a canonical basis. To set this, consider the quadratic form A(x, x) corresponding to the given bilinear form A(x, y). We know that there exists a basis ex, e2, ■ ■ ■ , en in the space K„ in which A(x, x) can be written in the canonical form
A(x, x) = 2 h&
It follows from formulas (9) and (9'), p. 184 that the corresponding symmetric bilinear form A(x, y) takes the canonical form
n
Mx, y) = 2 (15)
i=l
in this basis, where
and hence its matrix is diagonal. But this just means that the basis ey, e2, ■ - - , en is canonical for the form A(jc, y), and our assertion is proved.
7.44. In analytic geometry it is shown that the locus of the midpoints of the chords of a second-degree curve which are parallel to a given vector is a straight line. We now prove this theorem. A second-degree curve in the jc^-plane has an equation of the form
«11*1 + 2a12xxx2 + a22x\ + b^ + b2x2 + c = 0
or
A(x, x) + L(x) + c = 0,
where
A(x, x) = a1±xl + 2al2x1x2 + ^22*2 is a quadratic form and
L(x) =       + b2x2
is a linear form in the vector x = (xu x2). Let x be the vector giving the position of the midpoint of a chord parallel to a fixed vector e. This means that the equations
A(x + te, x + te) + L(x + te) + c = 0, A(x    te, x — te) + L(x — te) + c 0
(16)
192     bilinear and quadratic forms
chap. 7
are satisfied for some t ^ 0. Let A{x, y) be the symmetric bilinear form corresponding to the quadratic form A(x, x). Then we can write (16) as
A(x, x) + 2tA(x, e) + t2A(e, e) 4- L(x) + tL(e) + c = 0, A(x, x) - 2tA(x, e) + t2A(et e) 4- L(x) — tL{e) + c = 0. Subtracting the second equation from the first and dividing by 2t, we get
This equation is linear in x and hence determines a straight line in the Xj_x2-plane, thereby proving the theorem.
Let x' be another point of the same line, so that
Then subtracting (18) from (17), we get
A(;c — xr, e) = 0,
i.e., the vector e and the vector x — x' determining the direction of the straight line in question are conjugate with respect to the bilinear form A(x, y), in the sense of Sec. 7.41.
7.45. Let e1; . . . , ek be a canonical basis of the form A(jc, y) in a k-dimensional subspace K' <= K, and let zr, . . . , efc be the corresponding canonical coefficients. Expressing the numbers A(x, et) in terms of the components of a vector x e K', we get
so that the numbers A(x, are uniquely determined by the components of the vector x. If the form A(x, y) is nonsingular in the subspace K', then the numbers zt are all nonzero. In this case, the converse is also true, i.e., the values A(x, et) of the form A(x, y) uniquely determine the components of the vector x.
7.5. Construction of a Canonical Basis by Jacobi's Method
7.51. The construction of a canonical basis given in Sec. 7.31 has the drawback that the components of the vectors of a canonical basis and the corresponding canonical coefficients \ cannot be determined directly from a knowledge of the elements of the matrix A[f) of the symmetric bilinear form A(x, y) in a given basis {/} = {/i,/*2> ■ ■ ■ ,fn)- Jacobi's method, which will now be presented, does allow us to do just this. However, we must now impose the following supplementary condition on the matrix Aif): The
2A(jc, e) + L(e) = 0-
(17)
2A(x',e) 4- L(e) = 0.
(18)
sec. 7.5
CONSTRUCTION OF A CANONICAL BASIS BY JACOBl'S METHOD 193
descending principal minors of A{f) of order up to and including n the principal minors of the form
a12
°1 = alU °2 =
1, i.e.,
021 «22
<*n-l =
must all be nonvanishing.
«21
«12 «22
an-l.l an-l,2
altn-l a2.n-l
flTl-l.Tl-]
(19)
7.52. The vectors elf e2, . . . , en are constructed by the formulas
e3=«i2)/l+«22>/2+/3>
(20)
ť« ai
where the coefficients aj*> (z = 1, 2, . . . , k; k — 1, 2, . . . , n — 1) are still to be determined. First of all, we note that the transformation from the vectors f1>f2, ■ ■ ■ ,/*. to the vectors ex, e2, . . . , ek is accomplished by using the matrix
1
ai
0
1
0 0
ax       aa a3
0 0
0 0
a
(*-l> A—1
whose determinant is unity. Hence for A: = 1,2,..., n the vectors/],/2, • • ■ > /j. can be expressed as linear combinations of et, e2, . . . , ek, so that the linear manifold L(/],/2, ... ,fk) coincides with the linear manifold Lfo, e2,.. . ,ek).
We now subject the coefficients a|.fc) (z = 1, 2, . . . , k) to the condition that the vector ek+l be conjugate to the subspace L(ely e.2, . . . , ek). A necessary and sufficient condition for this is that the relations
A(W/,) = 0, AO^/i) = 0, . . . , A(ek+Ufk) = 0
(21)
194     bilinear and quadratic forms
chap. 7
be satisfied. In fact, it follows from (21) that the vector ek^x is conjugate to the linear manifold spanned by the vectors/i,/2, - - ■ which, as we have just proved, coincides with the linear manifold spanned by the vectors £], e2,. ■ ■ , ek. Conversely, if the vector is conjugate to the subspace L(ei> e2, ■ ■ ■ ? e*)» it is conjugate to every vector in the subspace, in particular, to the vectors/x,/2, ■ • •      so that the conditions (21) are satisfied.
Substituting the expression (20) for into (21) and using the definition of a bilinear form, we obtain the following system of equations in the quantities a|*' (i = 1, 2, . . . , k):
AOW,) = af'At/i.A) + ^AC/2,/1) + ■ • ■ + «iWA(/ft J,) + AC/WO = °> A(<W2) = ^HfuU) + aS^AC/i J») + ■' ■ + 4fc)A(A,/2) + A(/i+1,,/2) = 0,
AK+1, A) = afTMfuA) + «SwA(/.,/») + ■' ■ + «iwA(/fcJ4) + A(/fc+1,/A) = 0.
(22)
By hypothesis, this nonhomogeneous system of equations with coefficients
A(/i,/j) = a»v     O'J = 1. 2> • ■ • > k)
has a nonvanishing determinant, and hence can be solved uniquely. Therefore we can determine the quantities a[.*} and thereby construct the desired vector ek+v To determine all the coefficients a[.*' and all the vectors ek, we must solve the appropriate system (22) for every k. Thus, in all, we must solve n — 1 systems of linear equations.
Let £1( £2, ■ ■ • > £n denote the components of the vector x and 7)lf v)2, . .. , Y)n the components of the vector y with respect to the basis ex, e2, . . . , en just constructed. Then the bilinear form A(jc, y) becomes.
A(x, y) = f -kilim (23)
in this basis.
7.53. To calculate the coefficients \, we argue as follows: Consider the bilinear form A(x, y) only in the subspace Lm = L^, e2,. . . , em) where m < n. The form A(x, y) clearly has the matrix
a2\     a22      " " ' a2w
^wl    ^m2     ' ' "     & mm
SEC. 7,5
CONSTRUCTION OF A CANONICAL BASIS BY JACOBl'S METHOD 195
in the basis/i,/a,
,fm of the subspace Lm and the matrix \  0    • • ■ 0 0    X2   ■■■ 0
* t at* *
|o  o   ■■■ xffl
in the basis ex, ea, .
(20) from the basis fx,f2i • • • ,/mto the basis eX) e2, 1. Hence by equation (6), p. 182 we must have
em. As we have seen, the matrix of the transformation
. , em has determinant
det
axx ax2 a2x a22
or, in the notation (19),
a2m
^«1 +M
~ det
Xj 0
0 x2 0 0
0 0
&m = \'k2" " • Xm      (m = 1,2, (Sn = det A{f)). It follows immediately that
X,-^-*^,   X2=^,   X3=^-\ K=~- (24)
Using (24), we can find the coefficients of the bilinear form A(jc, y) in a canonical basis without calculating the basis itself.
7.54. Consider once again the kth equation in the system (20), which we write in the form
fk+l —     ai*!/l       ' ' "       a*fc'/* + ek+l — 8k + ek+l>
where gk lies in the subspace L(/i, . . . ,fk) and ek+x is conjugate to this subspace. The coefficients oc^*',. . . , aj.*' are uniquely determined by the system (22) subject to the condition that det || A(/,/^)|| ^ 0 or, equivalently, that the form A(x, y) be nonsingular in the subspace L(/x, . . . ,fk). Since the vectorfk+x is arbitrary in this construction, then, writing
h = e
L(/1? . . . Jk) = K' c K,
f —fk+l>        S ~~ ?>*■>
we arrive at the following
Theorem. Suppose the bilinear form A(x, y) is nonsingular in a subspace K' <= K, and suppose the vector f does not belong to K'. Then there exists a
196     bilinear and quadratic forms
chap. 7
unique expansion
f=g + "> (25) where g e K' and h is conjugate to the space K'.
7.55. Let K" denote the subspace conjugate to the subspace K' with respect to the form A(x, y). Then the existence and uniqueness of the expansion (25) shows that the whole space K is the direct sum of the subspaces K' and K" (see Sec. 2.45). Thus, given a subspace K' <= K in which a bilinear form A(jc, y) defined on the whole space K is nonsingular, K can be written as the direct sum
K -— K ~-[— K j
where K" is conjugate to K' with respect to the form A(x, y).
7.6. Adjoint Linear Operators
7.61. Let (x, y) denote a fixed nonsingular symmetric bilinear form in the space K„. Let A and B be linear operators acting in K„, and use the formulas
A(x, y) = (Ax, y)y     B(x, y) = (x, By)
to define functions A(x, y) and B(x, y) of two vector arguments x and y. Then A(x,y) and B(x,y) are bilinear forms. In fact, it follows from the definition of a linear operator (Sec. 4.21) and the definition of a bilinear form (Sec. 7.11) that
M*i + *2, y) = (A(jtj + x2)f y) ^ (Axi + Ax2, y)
= (A*!, y) + (Ax2, y) = A(xlt y) + A{x& y),
A(ouc, y) — (A(ouc), y) = (aAx, y) = a(Ax, y) = a.A(x, y)t
which shows that A(x, y) is linear in its first argument. Similarly, the linearity of A(x, y) in its second argument is a consequence of the linearity of (x, y) in y. Then A(x, y) is a bilinear form, and similarly so is B(x, y). Next let £],, . . , en be a canonical basis of the form (x,y), so that
(eJf ek) = 0   if j^k,
(<?m, O = em 6 em ^ 0.
We now compare the matrix of the operator A with that of the form A(x, y) in this basis. The matrix Wa^W of the operator A is defined by the formula
Aei =2ak)eK      (7 = 1, ...,n),
sec. 7.6
adjoint linear operators 197
where here (in contradistinction to the notation adopted in Sec. 4.23) the superscript indicates the row number and the subscript the column number. The matrix \\ajk\\ of the form A(x, y), where the first subscript indicates the row number and the second the column number, is defined by the formula
a,-m = A0,> em) = (A*J( em) = ^akj)ek, e™j = a{i\em, em) = emalJ!. (26)
Hence the rath column of the matrix \\aim\\ is obtained (for every m = 1, . . . , n) by multiplying the mth column of the matrix llaj^H by the canonical coefficient zm of the form (x, y). Similarly, for the matrix Wb^ || of the operator B (in the same basis ex, . . . , en) and the matrix \\bjk\\ of the form B(x, y), we get
bim = B(e}, em) = {eh Bem) = (e^b^e^ = b^Xe,, et) = e,*V"\ (27)
i.e., theyth row of the matrix \\bim\\ is obtained (for every j = 1, . . . , ri) by multiplying the7th column of the matrix of the operator B by the corresponding canonical coefficient z}.
7.62, Conversely, given two bilinear forms A(x, y) and B(x, y) in the space K„, we assert that there exist unique linear operators A and B such that
A(x, y) = (Ax, y),     B(x, y) - (x, By). (28)
To show this, we specify A and B in the same basis ex,. . . , en by the matrices with elements
a\m) = 1 A(et, em),     b\m) - ^- B(e,, em),
respectively. We then use these operators to construct the forms Ax(x, y) = (Ax, y) and Bx(x, y) — (x, By). It follows from Sec. 7.61 that the matrix of the form A^x, y) coincides with the matrix of the form A(x, y) in the basis £],..., en, while the matrix of the form B3(x,.y) coincides with the matrix of the form B(x, y). But then
(Ax,y) = M(x,y) = A(x,y),      (x, By) - B,(x,y) = B(x,y)
for arbitrary x, y e Kn (recall Sec. 7.13), so that the operators A and B satisfy (28). To prove the uniqueness, we need only verify that if an operator A satisfies the condition
(Ax, y) — 0 for arbitrary x, y e K„, (29)
then Ax = 0 for every x e K„, so that A is the zero operator. Suppose Ax0 ^ 0 for some x0eKn. Then, since the form (x,>0 is nonsingular, it follows from Sec. 7.15c that there is a vector y0 e K„ such that (Ax0, y0) 7^ 0.
198     bilinear and quadratic forms
chap. 7
This contradicts (29) and establishes the required uniqueness of A. The uniqueness of B is proved similarly.
7.63. We now prove the following important
Theorem. Let (x, y) be a nonsingular symmetric bilinear form in the space Kn. Then, given any linear operator A acting in Kn, there exists a unique linear operator A' acting in K„ such that
(Ax, y) = (Xy A'y)
for arbitrary x, y e Kn. The matrix of the operator A' in any canonical basis of the form (x, y) is obtained from the matrix of A by transposition, followed by multiplication of the mth row by the canonical coefficient em and division of the jth column by the canonical coefficient    (j, m — 1, . . . , n).
Proof We use the given operator A to construct the form A(x, y) = (Ax, y), and then we define the operator A' by the formula
(Ax, y) = A(x, y) = (x, A'y).
The existence and uniqueness of A' follow from Sec. 7.62. In any canonical basis of the form (x,y), the matrix llaj^ll of the operator A, the matrix \\aim\\ of the form A(x,y) and the matrix of the operator A' are
related by formulas (26) and (27):
It follows that
a'3[m) =^ - ^a^.   | (30)
The operator A' is called the adjoint (or conjugate) of the operator A with respect to the form (x, y).
7.64. The operation leading from an operator A to its adjoint A' has the following properties:
1) (A')' = A for every operator A;
2) (A + B)' = A' + B' for every pair of operators A and B;
3) (XA)' = XA' for every operator A and every number X g K;
4) (AB)' = B'A' for every pair of operators A and B.
To prove property 1), we use the formula
(x,(A')'y) = (A'x,y) = (x,Ay)
implied by the definition of (A')', together with the uniqueness of the operator
sec. 7.7
isomorphism of spaces equipped with a bilinear form 199
defined by a bilinear form (Sec. 7.62). The remaining properties are proved similarly. Thus
(x, (A + B)'y) = ((A + B)x, y) - (Ax, y) + (Bx, y)
= (x, A'y) + (x, B» = (x, (A' + B»
implies property 2).
(x, (XA)» = (XAx, >>) = X(Ax,j>) = X(x, A'y) - (x, XA» implies property 3), and
(x, (AB)» = (ABx, y) - (Bx, A» = (x, B'A» implies property 4).
7.65. We point out another connection between the operators A and A'. Suppose the subspace K' <= K„ is invariant under the operator A. According to Sec. 4.81, this means that the operator A carries every vector x e K' into another vector of the same subspace K'. Let K* be the subspace conjugate to K' (Sec. 7.55). Then K" is invariant under the adjoint operator A'. In fact, suppose y e K", so that (y, x) = 0 for every x e K'. Then (A'y, x) = (y, Ax) ~ 0, since x e K' implies Ax e K'. But this means that the vector A'y is conjugate to every vector x e K' and hence belongs to K", as required.
7.7. Isomorphism of Spaces Equipped with a Bilinear Form
7.71. Definition. Let K' and K" be two linear spaces over the same number field K. Suppose K' is equipped with a nonsingular symmetric bilinear form A(x',/), while K" is equipped with a nonsingular symmetric bilinear form A(x", y"). Then K' and K" are said to be A-isomorphic if
1) They are isomorphic regarded as linear spaces over the field K (see Sec. 2.71), i.e., there exists a one-to-one mapping (morphism) cox' = x" preserving linear operations;
2) The values of the forms A(x', y') and A(x", y") coincide for all corresponding pairs of elements x', y' and x" = cox', y" = coy', i.e.,
A(x',/) = A(xff,/).
7.72. Theorem. Given two finite-dimensional linear spaces K' and K", suppose K' is equipped with a nonsingular symmetric bilinear form A(x',_y'), while K" is equipped with a nonsingular symmetric bilinear form A(x", y"). Then K' and K" are A-isomorphic if and only if
a) They have the same dimension n;
b) There exists a canonical basis for A(x', /) in K' and a canonical basis for A(x", y") in K" relative to which the two forms have the same set of canonical coefficients zx, . . . , zn.
200     bilinear and quadratic forms
chap. 7
Proof. Suppose K' and K" are A-isomorphic. Then they are isomorphic as linear spaces and hence have the same dimension, say n (see Sec. 2.73d). If e[,. . . , e'n is a canonical basis for the form A(jc\ /) in the. space K', then
f0   if i^j, Uf if i
Let e'[,. . . , e"n be the vectors in K" corresponding to the vectors e[,. . . , e'n in K' under the given A-isomorphism. By hypothesis,
f0   if i^j,
A(a*J, coe,') = A«, e'\) =
U* if i —j.
Thus e", . . , , e"n is a canonical basis for A(x", y") in the space K*. Moreover, A(x"fy") has the same canonical coefficients £!,...,£„ in the basis e", . . . , e"n as A(jc', /) has in the basis e[,. . . , e'n.
Conversely, suppose K' and K" have the same dimension n, and let e' ... , e' eK' and e" . . . , e" e K" be canonical bases with the same canonical coefficients e1?. . . , e„, so that
f0   if i^j,
A{e\, e'A = A«, =
Ut-  if  i =j.
Given any vector
in K', let
x» = W(x') = 2
1=1
(with the same components ^,. . . , £J be the corresponding vector in K*. This correspondence defines an isomorphism co of the spaces K' and K" (see Sec. 2.73d). Moreover, if
y'=2       f = "(/) = 2 v7,
then
AfV, /) = 2 zMi - A(x", /), »=i
so that the isomorphism co is an A-isomorphism. |
7.73. Given an w-dimensional space K„ equipped with a nonsingular symmetric bilinear form A(x, y), consider an A-isomorphism of K„, i.e., an invertible linear mapping y — Qx which does not change the form A(x,y) in the sense that
A(Qx, Qy) ~ A(x, y). (31)
sec. 7.7
isomorphism of spaces equipped with a bilinear form 201
We will henceforth denote A(x, y) simply by (x, y). If Q' is the adjoint of the operator Q with respect to the form (x, y), then
(Qx, Qy) = (Q'Qx, y). (32)
It follows from (31) and (32) that
Q'Q = E, (33)
and hence that Q' is the inverse of the operator Q (since Q is nonsingular, so is Q).
Conversely, (33) implies (32) and then (31), so that the condition (33) completely specifies the class of operators which do not change the form (x, y). These operators are said to be invariant with respect to the form (x, y).
7.74. If Q is invariant, then so is the inverse operator Q_1 = Q', since
(Q>, Q» - (QQ'x, y) - (x, y)
for every x and y. The product of two invariant operators Q and T is also an invariant operator, since
(QTx, QT» = (Tx, T» = (x, y)
for every x and y.
7.75. Let elt . . . , en be a canonical basis of the form (x, y), with canonical coefficients e1s . . . , £„. Then, applying an invariant operator Q to the vectors el5 . . . , en, we get the vectors
/i = Qe1; ...,/„ = Qen, (34)
where
f e,   if y =
C/i./*)-(Q^Q«*) = (^^= A   r . ,
(0   if j ^ k.
Thus /l9 . . . ,/n is also a canonical basis of the form (jc, _v), with the same canonical coefficients slf. . . , e„.
Conversely, if,/„ is a canonical basis of the form (x, y) with the same canonical coefficients £],...,£„ as the basis el7. . . , then the operator Q defined by (34) is invariant. In fact,
[ tj  if j = k,
10   if k,
and hence (31) holds for any pair of basis vectors. But then, by the linearity, (31) holds for arbitrary vectors x, y e K„, as required.
Thus an invariant operator Q is characterized by the fact that it carries every canonical basis of the space Kn (with respect to the form (x,y)) into another canonical basis with the same canonical coefficients.
202     bilinear and quadratic forms
chap. 7
7.76. We now find conditions characterizing the matrix of an invariant operator Q in a canonical basis of the form (x, y). Let elf . . . , en be such a basis, and let S],...,En be the corresponding canonical coefficients. Moreover, let Q = \\q\n\\ be the matrix of Q in the basis ey, . . . , en. Then, according to Sec. 7.63, the matrix of the adjoint operator has the form
0! = \\d:{% q'^^qf. In terms of matrix elements, we can write equation (33) as
i ^v=i v,v=*?» = I1 If J = K
Ef 10   if j ^ A:.
In other words,
t=l £,•
if y*=*.
Equation (35) is equivalent to (33), and can also serve as the definition of an invariant operator Q.
Thus an invariant matrix, i.e., the matrix of an invariant operator in any canonical basis of the form (x, y), is characterized by the fact that the sum of the squares of the elements of its yth column taken with coefficients
e"1, . . . , e"1 equals the number zj1 (j — 1.....n), while the sum of the
products of the corresponding elements of two different columns also taken with the coefficients s"1, . . . , e"1 equals zero. Since (33) also implies QQ' = E, we also have the relations
or
n Uj   if  j — m,
swiM-=r t . 05')
k=i [0   if j zfc m.
This gives another characterization of an invariant matrix, namely the sum of the squares of the elements of itsyth row taken with coefficients £],... , sn equals the number (j = 1,. . . , n), while the sum of the products of the corresponding elements of two different rows also taken with the coefficients £!,...,£„ equals zero.
*7.8. Multilinear Forms
7.81. By analogy with bilinear forms we can consider linear functions of a larger number of vectors (three, four or more). All such functions are called multilinear forms.
sec. 7.8
multilinear forms 203
Definition. A function A(xls. . . , xk) of k vector arguments xlt . . . , xk varying in a linear space K is called a multilinear (more exactly, a k-linear) form if it is linear in each argument xf (j — \ ,. . . , k) for fixed values of the remaining arguments xlt . . . , x^, xj+1> . . . , xk, A multilinear form A(jc1? . . . , xk) is called symmetric if it does not change when any two of its arguments are interchanged, and antisymmetric if it changes sign when any two of its arguments are interchanged.
An example of an antisymmetric multilinear form in three vectors x, y and z (a trilinear form) of the space V3 is the mixed triple product of x, y and z.f An example of an antisymmetric multilinear form in n vectors
xx — (#n, a12i • - • > aln),
x2 — (#21? a22> . . . , a2n),
Xn — (an\-> an2? • • ■ > #nn)
of an h-dimensional linear space Kn| is the determinant
#11  «i2   • • ■ a
A(x1? x2)
#21 #22
In «2n
(36)
A somewhat more general example is the product of the determinant (36) with a fixed number le K.
7.82. We now show that every antisymmetric multilinear form
A(xl7 x2,. . . , xn)
in n vectors xlt x2,. . . , xn of an n-dimensional linear space Kn with a fixed basis eyi e2> . . . , en equals the determinant (36) multiplied by some constant XeK.
Let X denote the quantity A(els e2,. . . , en). Then we can easily calculate the quantity A(^i? eti, . . . , ein) where il7 i2, . . . , in are arbitrary integers from 1 to n. If two of these numbers are equal, then A(eh, eiz> . . . , ein) vanishes, since on the one hand it does not change when the arguments corresponding to these numbers are interchanged, while on the other hand it must change sign because of the antisymmetry property. If all the numbers 71? i2, ...,/„ are different, then by making the same number of interchanges
t I>e., (x,y x z) where (,) denotes the scalar product and x the vector product. X By x,_ = (alu am . .. , aln) we mean x = a,,^ + a12e2 + - ■ ■ + alnenj where eu e%, . , en is a fixed basis in K„, and so on.
204     bilinear and quadratic forms
chap. 7
of adjacent arguments as there are inversions in the sequence of indices 'i> k> ■ • ■ > we can cause the arguments to be arranged in normal orderf; let the required number of interchanges be N. Then we have
A(etl, eh,. . ., ein) - (-1)-VX.
Now let
xi = ^Laaei     (i = 1, 2,.. ., n)
be an arbitrary system of n vectors of the space Kn, and consider the multilinear form
(n n                                 n \
2 2 a2i2^i2, ■ ■ - , 2 flmA
»1=1 i2=l                        in=*l /
n
=     2     «1^021," " " oninA(eilt e,2,. . ., ein)
*l-*2.....«n=l
n
= x   2    (-i)^]*^*, ■ ■ ■ <w
Since in each term of the last sum, iV denotes the number of inversions in the arrangement of the second subscripts of the elements aif when the first subscripts are in normal order, it follows that each term is one of the terms in the determinant (36) with the appropriate sign. Hence the sum of all the terms is just the determinant (36), and our assertion is proved.
In particular, this shows that the mixed triple product of three vectors jc, y and z of the space K3 in any basis can be written as the third-order determinant made up of the components of jc, y and z, taken with a coefficient equal to the triple product of the basis vectors.
7.9. Bilinear and Quadratic Forms in a Real Space
7.91. Every real number has a definite sign (+ or —), and hence the theory of bilinear and quadratic forms in a real space can be carried somewhat further than in a space over an arbitrary field K. According to the general theory of Sec. 7.31, a quadratic form A(x, x) can be reduced in some basis to the canonical form
A(X, X) = X^2 + X27)^ + • ■ ■ + An7)2,
where the number of nonzero coefficients X]f X2, - . . , X„, equal to the rank of the form A(x, x) (Sec. 7.33b), does not change when the canonical basis is changed. These coefficients are either positive or negative. It turns out
t Cf. the proof of Theorem 4.54.
sec. 7.9
bilinear and quadratic forms in a real space 205
that changing the canonical basis also has no effect on the total number of positive coefficients and the total number of negative coefficients:
Theorem (Law of inertia for quadratic forms). If a quadratic form A(x, x) in a real space is written in canonical form, the total number of positive coefficients and the total number of negative coefficients are invariants of the form, i.e., do not depend on the choice of the canonical basis.
Proof. Suppose A(x, x) has the form
n
A(x, x) = 2
i.Jfc=l
in the basis {e} = {elt e2, . . . , en}, where £1? . . . ,\n are the components of the vector x with respect to {e}. Suppose A(x, x) has two canonical bases
{/} = -{/i»./a» ■ ■ ■ >/Jand(£}= igi>g2, ■ ■ ■ ,£„}■ Let 7)!, 7)2----, Y)ndenote
the components of x with respect to the basis {/}, and let t1; t2, . . . , in denote the components of x with respect to the basis {g}. Let the corresponding transformation formulas be
fii = *u^i +        + ■ ■ ■ + bin%n,
7)2 = &2i£i + ^22^2 + ' ' " + ^2n£n>
(37)
In = *nl£l + ^«2^2 + * ' * + bnl&n
and
Tl = cll£l + ^12^2 + ' ' ' + c1n£>n-> T2 — C2i^! + f22^2 + ' ' * + c2n£r»>
(37')
Tn = Cnl£l H~ c«2^2 H~ ' ' " H~ Cnn^ni
where the matrices \\bik\ and \\cik\\ are nonsingular. In the basis {/}, A(;t, x) has the form
A(x, x) =       + ■ ■ • + afcY]J. - a^iUj — ■ ■ ■ — a„V)'^, (38) while in the basis [g] it has the form
A(x, x) = P,t2 + ■ ■ ■ + P,tJ - (W^ - ■ ■ ■ - (VrJ, (39)
where the numbers <x1? . . . , <xm, plt . . . , Pfl are assumed to be positive. We wish to show that k — p, m — q. Equating the right-hand sides of (38) and (39), and transposing negative terms to opposite sides of the equation, we obtain
+ ■ ■ ■ + aft7)J + p^t^, + ■ ■ ■ + P,tJ
=- *k+i4+i + ■ ■ " + "nrfn + P,t2 + ■ • ■ + pyrj. (40)
206     bilinear and quadratic forms
chap. 7
Now supposed < />, and consider the vectors x which satisfy the conditions
fli = 0, y)2 = 0,. . . , % = 0,
tp+1 = 0, . . . , tq = 0, Ta+1 = 0, . . . , T„ — 0.
There are clearly less than n of these conditions, since k < p. Using (37) and (37') to express 7)1?. . . , t^, tj,+1, .. . , t„ in terms of the variables £1? £2, . . . , ?m we obtain a homogeneous system of linear equations in the unknowns £i, £2> • • • , £n- The number of equations is less than the number of unknowns, and therefore this homogeneous system has a nontrivial solution x = (£lf £2, • • ■ , On the other hand, because of (40), every vector x satisfying the conditions (41) also satisfies the conditions
Tl = T2 = ' * * = T„ — 0.
However, since det ||cifc|| ^ 0, any vector x for which
Tl = T2 = ' " * = T*> = Vi = ' * ' ~ T„ = 0
must be the zero vector, with all its components ^, £2> • • • , \n equal to zero. Thus the assumption that k < p leads to a contradiction. Because of the complete symmetry of the role played by the numbers k and p in this problem, the assertion p < k also leads to a contradiction. It follows that k— p. Moreover, examining the conditions
Tj = 0, t2 = 0, . . . , tv = 0,
t)ah-i = 0, . = 0, t?+] = 0, . . . , tn = 0,
we can use the same argument to show that m < q is impossible and hence, by symmetry, that q < m. Thus we finally find that k = /», m = q. |
7.92. The total number of terms appearing in the canonical form of a quadratic form \(x, x), i.e., its rank (see Sec. 7.33b), is also called its index of inertia. The total number of positive terms is called the positive index of inertia, and the total number of negative terms is called the negative index of inertia. If the positive index of inertia equals the dimension of the space, the form is said to be positive definite. In other words, a quadratic form A(x, x) is positive definite if and only if all n of its canonical coefficients are positive. It follows that a positive definite quadratic form takes a positive value at every point of the space except the origin of coordinates.
Conversely, if a quadratic form defined on an «-dimensional real space takes positive values everywhere except at the origin, then its rank is n and its positive index of inertia is also n, i.e., the form is positive definite. In fact, for a form of rank less than n or with less than n positive canonical coefficients, it is easy to find points in the space other than the origin where
sec. 7.9
bilinear and quadratic forms in a real space 207
the form takes either the value 0 or negative values. For example, the quadratic form
A{x, x) = l\ + %\
of rank 2 in a three-dimensional space takes the value 0 for any nonzero vector with components ^ = 0, £2 ^ 0, £3 = 0. For these vectors the form
a(*, x) - n -1\ + a
of rank 3 in a three-dimensional space takes negative values. Clearly, these examples illustrate the full generality of the situation.
7.93. The law of inertia just proved for quadratic forms generalizes immediately to the case of symmetric bilinear forms, i.e., the total number of positive coefficients and the total number of negative coefficients in the canonical form (22) of a symmetric bilinear form A{x, y) is independent of the choice of a canonical basis. Thus the positive and negative indices of inertia are well-defined concepts for a symmetric bilinear form. The values of the positive and negative indices of inertia of the bilinear form A(x,y) and hence of the quadratic form A(x, x) can be determined from the signs of the descending principal minors of the matrix of the form in any basis (provided only that the minors are nonzero) by using the formulas (24), p. 195.
It should be noted that given any quadratic form A(x, x) in a real space Rn, a canonical basis can always be found such that the corresponding canonical coefficients can only take the values ± 1. In fact, having reduced A(x, x) to the form
A(x, x) = AjflJ +----h \vril - \w&+x ----- |x,t£+,
where the numbers Xx, . . . , X^, \ix,. . . , (xa are all positive, we make another coordinate transformation
thereby reducing A(x, x) to the form
A(X, X) — Tj + ' ■ • -|- tp — Tj,+ l       ' Tj>+(f
This shows that in a real space the numbers p and q are the only invariants^ of the quadratic form A(x, x) and the corresponding symmetric bilinear form A(x, y).
Theorem. Two finite-dimensional real spaces R' and R", equipped with nonsingular symmetric bilinear forms A(x', y') and a(x", y"), respectively, are A-isomorphic if and only if they have the same dimension and the indices of
t Apart from any function of p and q (like the rank r = p + q), which is obviously an invariant of A(jc, jc) and A(jc, y).
208     bilinear and quadratic forms
chap. 7
inertia p\q' of the form A(x\ /) coincide with the corresponding indices of inertia p\ q" of the form A(x", y").
Proof An immediate consequence of the above considerations and Theorem 7.72. |
7.94. Let A{x, y) be a symmetric bilinear form in a real space Rn. Then, as in Sec. 7.15b, A(x, y) is said to be nonsingular if its rank equals the dimension of the space, i.e., if all the coefficients Xl5 X2, . . . , Xn in the canonical form
A(x, y) = + X2£2t)2 + • • ■ + Xn£„y)n
(see Sec. 7.43) are nonzero. Suppose that in addition all the coefficients Xx, X2, . . . , Xn are positive, so that the corresponding quadratic form A(x, x) is positive definite (see Sec. 7.92). Then the bilinear form A(x, y) is said to be positive definite. Thus, according to Sec. 7.92, A(x, y) is positive definite if and only if the corresponding quadratic form A(x, x) takes a positive value for every nonzero vector x:
By its very definition, a positive definite form A{x, y) in a space Rn is nonsingular. But, because of the fact that A{x, x) > 0, a positive definite form A(x, y) remains positive definite in any subspace R' <= Rn. Hence a positive definite bilinear form, unlike the general bilinear form (see Sec. 7.15d), remains nonsingular in any subspace R' <= Rn. Thus, given any k linearly independent vectors f, . . . ,fk, the determinant
D
A(/fcJi)   ••■ A(/fc,/fc) must be nonzero. We will see in a moment that D must in fact be positive.
7.95. An important example of a symmetric positive definite bilinear form in the space Vz is given by the scalar product (x, y) of the vectors x and y. In fact, it follows at once from the definition of the scalar product that
(*, y) = (y, x),
(x, x) = \x\2 > 0  for  x ^ 0.
The first of these relations shows that the bilinear form (x, y) is symmetric, while the second shows that the corresponding quadratic form takes a positive value for every vector x ^ 0. Thus the bilinear form {x, y) is positive definite.
Positive definite bilinear forms will play a particularly important role below. In fact, by using such forms we will be able to introduce the concepts of the length of a vector and the angle between two vectors in a general linear space (Chap. 8).
sec. 7.9
bilinear and quadratic forms in a real space 209
7.96. The problem now arises of how to use the matrix of a symmetric bilinear form A(x, y) to determine whether or not A(x, y) is positive definite. The answer to this problem is given by the following
Theorem. A necessary and sufficient condition for the symmetric matrix A = \\atk\\ to define a positive definite bilinear form A(x, y) is that the descending principal minors
an,
a
li
ax2
a21 a22
Oil 021 «31
a
12
a
13
a22 a23
a32 a33
det
\a,
(42)
of the matrix \\aik\\ all be positive.
Proof If the principal minors (42) of the matrix A are all positive, then by the formulas (24), p. 195, all the canonical coefficients Xfc of the form A(a\ y) are also positive in some basis, i.e., A(x, y) is positive definite.
Conversely, suppose the form A(x, y) is positive definite. Then the descending principal minors (42) of the matrix \\aik\\ are positive. In fact, the principal minor
' all      a12       ' ■ • °i
M -
*lm
a
21
a
22
a
2m
a
ml
a
m2
a,
corresponds to the matrix ||aiA.|| (/, k = 1, 2, . . . , m) of the bilinear form A(x, y) in the subspace Lm spanned by the first m basis vectors. Since A(x, y) is positive definite in the subspace Lm (A(x, x) > 0 for x y= 0), there exists a canonical basis in Lm in which A(jc, y) can be written in canonical form with positive coefficients. In particular, the determinant of A(x, y) in this basis is positive, being equal to the product of the canonical coefficients. Bearing in mind the relation between determinants of a bilinear form in different bases (equation (6), p. 182), we see that the determinant of A(x, y) in the original basis of the subspace Lm is also positive. But the determinant of A(x, y) in the original basis of Lm is just the minor M, It follows that M > 0. |
Remark. In the second part of the proof, we could have taken M to be any principal minor instead of a descending principal minor, without changing the argument in any essential way. Thus every principal minor of the matrix of a positive definite bilinear form is positive.
7.97. For a positive definite form A(x, y) there always exists a canonical basis ely . . . , en in which all the canonical coefficients equal +1 (see Sec. 7.93). Hence two rt-dimensional real spaces       and R^ equipped with
210     bilinear and quadratic forms
chap. 7
positive definite forms A(V, /) and A(x", /'), respectively, are A-isomorphic, by Theorem 7.72.
7.98. The solution of the following problem is often needed in applications of linear algebra to analysis (i.e., in the theory of conditional extrema): Given the matrix A ^= \\aik\\ of a symmetric bilinear form A(x, y), determine whether the form is positive definite in the subspace specified by the system of k independent linear equations
2^ = 0     (/ = 1,2,..., *;*<«).
It turns out that a necessary and sufficient condition for this to be the case is that the descending principal minors of orders 2k + 1, 2k + 2, . . . , k + n of the matrix
0	o   . •	• 0	hi	^12     * *	" hn
0	0    • •	• 0	b2X	*22      ' *	' b2n
0	0    - •	- 0	hi	bk2   ' •	• hn
hi	b21   • •	- hi	«n	«12 *	■ «1«
bu	b22   • •		«21	«22	• «2„
hn	b2n ■	' bkn	«nl	«n2 -	«n«
be positive, under the assumption that the rank of the matrix Wb^W equals k and that the determinant made up of the first k columns of ||&^|| is non-vanishing.f
PROBLEMS
1. Do the elements of the matrix of a bilinear form constitute a tensor (Sec. 5.61), and if so, of what type?
2. Reduce the quadratic form
Si-=2   <   ^2 -3   ' ~3->l
to canonical form.
3. Let p be the positive index of inertia of a quadratic form A(jc, x) (defined on the space R„), and let q be its negative index of inertia. Moreover, let Xx, X2, . . . , ~kv be any p positive numbers and \xx, [l2, . . . , \iq any q negative numbers. Show that there exists a basis in which the form A(x, x) takes the form
A(x, x) = Xxt| + ■ ■ ■ +        + H^p+l + • • • + \J.q~l+q. t See the note by R. Y. Shostak, Uspekhi Mat. Nauk, vol, 9, no. 2 (1954), pp. 199-206.
problems     21 I
4. Show that the matrix of a quadratic form of rank r always has at least one nonvanishing principal minor of order r.
5. Reduce the bilinear form
\(x,y) =        + F,iTj2 + l2r\x + 2^2y}2 -r 2$2Tt3 + 2£3y)2 + 553^3 to canonical form,
6. Apply Jacobi's method to reduce the bilinear form
A(x,_y) = ^rjj - ^7)2 ~ £2t)x + cxt]3 + E3y)x + 2£2y]3 + 2£3y)2 + £3y)3 + + l2t)2 to canonical form,
7. State the conditions under which a symmetric matrix \\aik\\ defines a negative definite bilinear form.
8. Given a symmetric matrix A = \\aik\\ with the properties
show that aT
axx > 0, 0.
«21 °22
> 0, . . . , det Haull > 0,
9. Prove that an antisymmetric multilinear form in n + 1 vectors of an n-dimensional space Kn vanishes identically,
10. Let \(x1>. . . , x„_x) be an antisymmetric multilinear form in n — 1 vectors of an ^-dimensional space. Prove that A(jcx, ... , *n-x) can be written in any basis as a determinant whose first n - 1 rows consist of the components of the vector arguments and whose last (nth) row is fixed,
11. Prove that every antisymmetric bilinear form \(x,y) ^ 0 can be reduced to the canonical form
\(x, y) = axT2 - a2'x + <t3t4 ~ a4t3 + ■ • 12. Prove that a real quadratic form
n
A(X, X) =  J aikZiZk i,k=l
is nonnegative for all x e Rn if and only if all principal minors of the matrix A = \\aik\\ are nonnegative.
Comment. The descending principal minors 8X and 82 vanish for the matrix
0	0
0	-1
but the corresponding form fails to be nonnegative. Thus the conditions 8: > 0, 82 > 0 are not sufficient for nonnegativity of the form.
212     bilinear and quadratic forms
chap. 7
13. Let a(x, y) be a nonsingular symmetric bilinear form in an ^-dimensional space k„, and let k' <= kn be a subspace of dimension r. Prove that the space k" c k conjugate to k' with respect to A(x,y) is of dimension n — r.
14. Consider the symmetric bilinear form
(x,y) - lxriX - ?2r]2
in the space r2. Find the operator which is the adjoint with respect to this form of the rotation operator with matrix
A =
COS a sin a Sin a    COS a
15. Let (x,y) be a nonsingular quadratic form in the space k„, For the system
2>rt5*. 0 = 1,2,...,«) (43)
k=l
of n linear equations in n unknowns, prove Fredholrrfs theorem which asserts that the system (43) has a solution for precisely those vectors b = (bx,. . . , bn) which are conjugate to all the solutions of the homogeneous system
i>rt1*=0, (44)
k=l
where \\a'jk\\ is the matrix conjugate to \\ajk\\ with respect to the form (x,y). From this deduce that the number of independent linear conditions on the vector b which are necessary and sufficient for the system (43) to have a solution equals the dimension of the space of solutions of the homogeneous system
i««5ft=0      (_/ = l,2,...,n).
k=l
Comment, For a general system
2 ajk% =-b}      (y = 1, 2, . . . , m * «), (43')
the two quantities in question no longer coincide, and their difference, equal to m — n, is called the index of the system (43').
16. Prove that every nonnegative bilinear form of rank r in the space rn can be represented as a sum of r nonnegative bilinear forms of rank 1.
17. Prove that every bilinear form of rank 1 in the space Kn is of the form
where f(x) and g{y) are linear forms.
problems    213
18. Prove that if
71
and
n
B(x,>0 = 2 ^f1)*
t.k—l
are nonnegative bilinear forms in the space Rn, then the form
n
C(x,y) =   2 Ofl^ik^k j.k^i
is also nonnegative.
chapter 8
EUCLIDEAN SPACES
8.1. Introduction
The explanation of a large variety of geometric facts rests to a great extent on the possibility of making measurements, basically measurements of the lengths of straight line segments and the angles between them. So far, we are not in a position to make such measurements in a general linear space; of course, this has the effect of narrowing the scope of our investigations. A natural way to extend these "metric" methods to the case of general linear spaces is to begin with the definition of the scalar product of two vectors which is adopted in analytic geometry (and which is suitable as of now only for ordinary vectors, i.e., elements of the space V3 introduced in Sec. 2.15a). This definition reads as follows: The scalar product of two vectors is the product of the lengths of the vectors and the cosine of the angle between them. Thus the definition already rests on the possibility of measuring the lengths of vectors and the angles between them. On the other hand, if we know the scalar product for an arbitrary pair of vectors, we can deduce the lengths of vectors and the angles between them. In fact, the square of the length of avector equals the scalar product of thevectorwith itself, while the cosine of the angle between two vectors is just the ratio of their scalar product to the product of their lengths. Therefore the possibility of measuring lengths and angles (and with it, the whole field of geometry associated with measurements, so-called "metric geometry"), is already implicit in the concept of the scalar product. In the case of a general linear space, the
214
sec. 8.2
definition of a euclidean space 215
simplest approach is to introduce the concept of the scalar product of two vectors, and then use the scalar product (once it is available) to define lengths of vectors and angles between them.
We now look for properties of the ordinary scalar product which can be used to construct a similar quantity in a general linear space. For the time being, we restrict ourselves to the case of real spaces.
As already noted in Sec. 7.95, in the space V3 the scalar product (jc, y) is a symmetric positive definite bilinear form in the vectors jc and y. Quite generally, we can define such a form in any real linear space. Thus we are led to consider a fixed but arbitrary symmetric positive definite bilinear form A(jt, y) defined on a given real linear space, which we call the "scalar product" of the vectors x and y. We then use the scalar product to define the length of every vector and the angle between every pair of vectors by the same formulas as those used in the space K3. Of course, only further study will show how successful this definition is; however, in the course of this and subsequent chapters, it will become apparent that with this definition we can in fact extend the methods of metric geometry to general linear spaces, thereby greatly enhancing our technique for investigating various mathematical objects encountered in algebra and analysis.
At this point, it is important to note that the initial positive definite bilinear form can be chosen in a variety of different ways in the given linear space. The length of a vector x calculated by using one such form will be different from the length of the same vector calculated by using another form; a similar remark applies to the angle between two vectors. Thus the lengths of vectors and the angles between them are not uniquely defined. However, this lack of uniqueness should not disturb us, for there is certainly nothing very surprising about the fact that different numbers will be assigned as the length of the same line segment if we measure the segment in different units. In fact, we can say that the choice of the original symmetric positive definite bilinear form is analogous to the choice of a "unit" for measuring lengths of vectors and angles between them.
A real linear space equipped with a "unit" symmetric positive definite bilinear form will henceforth be called a Euclidean space, while a linear space without a "unit'' form will be called an affine space. The case of complex linear spaces will be considered in Chapter 9.
8.2. Definition of a Euclidean Space
8.21. A real linear space R is said to be Euclidean if there is a rule assigning to every pair of vectors xjeRa real number called the scalar product of the vectors x and y, denoted by (jc, y), such that
a) (jc, y) = (y, x) for every x, y e R (the commutative law);
b) (x, y + z) = (jc, y) + (jc, z) for every x, y, z eR (the distributive law);
216     euclidean spaces
chap. 8
c) Qoc, y) = X(x, y) for every x, y e R and every real number X;
d) (x, x) > 0 for every x # 0 and (x, x) = 0 for x = 0.
Taken together, these axioms imply that the scalar product of the vectors x and y is a bilinear form (axioms b) and c)), which is symmetric (axiom a)) and positive definite (axiom d)). Conversely, any bilinear form which is' symmetric and positive definite can be chosen as the scalar product.
Since the scalar product of the vectors x and y is a bilinear form, equation (2) of Sec. 7.1 holds, and in the present case becomes
where xx, . . . , xk, yx, . . . , ym are arbitrary vectors of the Euclidean space R, and <xx, . . . , afc, f4x, . . . , {3m are arbitrary real numbers.
8.22. Examples
a. In the space V% of free vectors (Sec. 2.15a), the scalar product is defined as in the beginning of Sec. 8.1, and axioms a)-d) express the familiar properties of the scalar product, proved in vector algebra.
b. In the rt-dimensional space Rn (Sec. 2.15b) we define the scalar product of the vectors x = (£x, £2> • • • , £n) and y ~ (^In 1)2, • ■ • > ?)n)    tne formula
This definition generalizes the familiar expression for the scalar product of three-dimensional vectors in terms of the components of the vectors with respect to an orthogonal coordinate system. The reader can easily verify that axioms a)-d) are satisfied in this case.
We note that formula (2) is not the only way of introducing a scalar product in Rn. A description of all possible ways of introducing a scalar product (i.e., a symmetric positive definite bilinear form) in the space Rn has essentially already been given in Sec. 7.96.
c. In the space R(a, b) of continuous real functions on the interval a < t < b (Sec. 2.15c), we define the scalar product of the functions x = x(t) and y = y(t) by the formula
Axioms a)-d) are then immediate consequences of the basic properties of the integral. Henceforth the space R(a, b), with the scalar product defined by (3), will be denoted by R2(a, b),
8.3. Basic Metric Concepts
Equipped with the scalar product, we now proceed to define the basic metric concepts, i.e., the length of a vector and the angle between two vectors.
(1)
(X, y) = £x7h + £27)2 H----+ £B7)n.
(2)
(x, y) =    x(t)y(t) dt.
(3)
SEC. 8.3
basic metric concepts 217
8.31. The length of a vector. By the length (or norm) of a vector x in a Euclidean space R we mean the quantity
|*1 = W(*, *)• (4)
Examples
a. In the space K3 our definition reduces to the usual definition of the length of a vector.
b. In the space Rn the length of the vector x = (£x, £a,. . . , £n) is given
by
1*1 = +v« + s + • • ■ + a.
c. In the space /?a(a, 6), the length of the vector x(t) turns out to be
|x| = +V(*, x) = -k
x\t) dt.
This quantity is sometimes written ||x(r)|| and is best called the norm of the function x(t) (in order to avoid misleading connotations connected with the phrase "length of a function").
8.32. It follows from axiom d) that every vector * of a Euclidean space R has a length; this length is positive if * ^ 0 and zero if * = 0 (i.e., if * is the zero vector). The formula
]Xx| = V(Xx, Xx) = Vx2(x, x) = IX] V(*, *) = IX] |x| (5)
shows that the length of a vector multiplied by a numerical factor X equals the absolute value of X times the length of x.
A vector x of length 1 is said to be a unit vector. Every nonzero vector x can be normalized, i.e., multiplied by a number X such that the result is a unit vector. In fact, solving the equation |Xx] ~ 1 for X, we see that X need only be such that
1*1
A set F <=■ R is said to be bounded if the lengths of all the vectors x e F are bounded by a fixed constant. The set of all vectors x e R such that |x| < 1 is a bounded set called the unit ball, while the set of all x e R such that |x] = 1 is a bounded set called the unit sphere.
8.33. The angle between two vectors. By the angle between two vectors x and y we mean the angle (lying between 0 and 180 degrees) whose cosine is the ratio
1*1 \y\'
218     euclidean spaces
chap. 8
For ordinary vectors (in the space K3) our definition agrees with the usual way of writing the angle between two vectors in terms of the scalar product. To apply this definition in a general Euclidean space, we must first prove that the ratio has an absolute value no greater than unity for any vectors x and y. To prove this, consider the vector ~kx — y, where X is a real number. By axiom d), we have
(X* - yt Xx - y) > 0 (6)
for any X. Using (1), we can write this inequality in the form
\\x, x) - 2X(*, y) + (y, y) > 0. (7)
The left-hand side of the inequality is a quadratic trinomial in X with positive coefficients, which cannot have distinct real roots, since then it would not have the same sign for all X. Therefore the discriminant (x, y)2 — (x, x)(y, y) of the trinomial cannot be positive, i.e.,
(*» yf < (*> *)0» >')• Taking the square root, we obtain
\(x,y)\ < \x\ \y\, (8)
as required. The inequality (8) is called the Schwarz inequality.^
8.34. We now examine when the inequality (8) reduces to an inequality. Suppose the vectors x and y are collinear, so that y — Xx, X e R, say. Then obviously
\{x,y)\ = |(*, Xx)| = |X| (*,*) = |X| \x\2 = |jc| \y\,
and (8) reduces to an equality.
Conversely, if the inequality (8) reduces to an equality for some pair of vectors x and y, then x and y are collinear. In fact, if
\(x,y)\ = \x\ \y\,
then the discriminant of (7) vanishes and hence (7) has a unique real root X0 (of multiplicity two). Therefore
X2(x, x) - 2X0(x, y) + (y, y) = (l0x - y, l0x - y) = 0,
whence it follows by axiom d) that X0x — y = 0 or y = X0x, i.e., the vectors x and y are collinear. Thus the absolute value of the scalar product of two vectors equals the product of their lengths if and only if the vectors are collinear.
Examples
a. In the space V3 the Schwarz inequality is an obvious consequence of the definition of the scalar product as the product of the lengths of two vectors and the cosine of the angle between them.
t Sometimes also associated with the names of Cauchy and Buniakovsky.
sec. 8.3
basic metric concepts 219
b. In the space R„ the Schwarz inequality takes the form
(9)
and is valid for any pair of vectors x = (£x, £2, . . . , £,n) and y = (t)x, t)2, . . . , 7)„), or equivalently, for any two sets of real numbers \x, £2, . . . , ?n and y)x, 7)2, ... , v)n.
c. In the space R2(a, b), the Schwarz inequality takes the form
f
x(t)y(t) dt
(10)
8.35. Orthogonality. Two vectors x and y are said to be orthogonal if (jc5 ^>) = 0. Thus the notion of orthogonality of the vectors x and y is the same as the notion of x and y being conjugate (Sec. 7,41a) with respect to the bilinear form (x, y). If x y= 0 and y ^ 0, then, by the general definition of the angle between two vectors, (x, >>) = 0 means that * and y make an angle of 90° with each other. The zero vector is orthogonal to every vector x eR.
Examples
a. In the space Rn the orthogonality condition for the vectors x = 52,. • • , £n) and 7 = (?h,     • • • , t)n)takes tne form
llf\l + £2y)2 + ' ' • -f ln'f\n = 0.
For example, the vectors
ex=(l,0,...,0), e2 = (0, 1,...,0),
en = (0, 0, . . . , 1)
are orthogonal (in pairs).
b. In the space R2(a, b) the orthogonality condition for the vectors x = x(t) and y = y(t) takes the form
x(t)y{t) dt = 0.
J a
The reader can easily verify, by calculating the appropriate integrals, that in the space R2(— 7c, 7c) any two vectors of the "trigonometric system"
1, cos /, sin /, cos 2/, sin 2t,. . . , cos nt, sin     ..-.
are orthogonal.
220     euclidean spaces
chap. 8
8.36. We now derive some simple propositions associated with the concept of orthogonality.
a. Lemma. If the nonzero vectors xx, x2,. . . , xk are orthogonal, then they are linearly independent.
Proof. Suppose the vectors are linearly dependent. Then a relation of the form
ai*i ~r a2*2 + ' ' ' + &kxk — 0
holds, where <xx ^ 0, say. Taking the scalar product of this equation with xx, we obtain a.l(xl, xx) = 0, since by hypothesis the vectors xx, x2, - • . , xk are orthogonal. It follows that (xx, xx) = 0 and hence that xx is the zero vector, contrary to hypothesis. |
The result of this lemma is often used in the following form: If a sum of orthogonal vectors is zero, then each term in the sum is zero.
b. Lemma, If the vectors yx, y2,. . . , yk are orthogonal to the vector x, then any linear combination ot.xyx + oc^ + ' ' ' +       is also orthogonal to x.
Proof We need merely note that
OlVi + «2>-2 H-----h a^, x)
= aiOi, x) + *2<J>2, x)-\----+ afcOfc, x) = 0. |
The set of all linear combinations oc^i + a2_y2 + • • • + <xkyk forms a subspace L = L^, y2,. . , , yk), namely the linear manifold spanned by the vectors yx, y2, . . . ,yk (Sec. 2.51). Therefore if x is orthogonal to the vectors 7i» )>2, • • • ■> yk-> it is orthogonal to every vector of the subspace L. In this case, we say that the vector x is orthogonal to the subspace L. In general, if F <= R is any set of vectors in a Euclidean space R, we say that the vector x is orthogonal to the set F if x is orthogonal to every vector in F. According to Lemma 8.36b, the set G of all vectors x orthogonal to a set F is itself a subspace of the space R. The most common situation is the case where F is a subspace. Then the subspace G is called the orthogonal complement of the subspace F.
8.37. The Pythagorean theorem and its generalization. Let the vectors x and y be orthogonal. Then, by analogy with elementary geometry, we can call the vector x + y the hypotenuse of the right triangle determined by the vectors x and y. Taking the scalar product of x + y with itself, and using the orthogonality of the vectors x and y, we obtain
\x +y\2 = (x+y,x + y) = (x, x) + 2(x,y) + (y,y)
= (x,x) + (y,y) = \x\* + \y\\
sec. 8.3
basic metric concepts 221
This proves the Pythagorean theorem in a general Euclidean space, i.e., the square of the hypotenuse equals the sum of the squares of the sides. It is easy to generalize this theorem to the case of any number of summands. In fact, let the vectors *lt x2, ■ ■ ■ » *jt be orthogonal and let
Z = Xi + x2 + • ■ ' + xk.
Then we have
|z|2 = (Xl + x2 H----+ xk, xt + x2A----+ xk)
= l*il2 + l^l2 + ■ • ■ + \*k\2. (11)
8.38. The triangle inequalities. If x and y are arbitrary vectors, then by analogy with elementary geometry, it is natural to call x + y the third side of the triangle determined by the vectors x and y. Usi ng the Schwarz inequality, we get
l*+>f = (x + y,x+y) = (*, x) + 2(x,y) + (y,y) ( <      + 2 M \y\ + \y\* = (1x| + \y\f,
[ > |*P - 2 |*| \y\ + \y\* = (|*1 - \y\f,
or
|* + ^1 < \x\+\y\, (12)
1* + >1 > |1*1 -|^||. (13)
The inequalities (12) and (13) are called the triangle inequalities. Geometrically, they mean that the length of any side of a triangle is no greater than the sum of the lengths of the two other sides and no less than the absolute value of the difference of the lengths of the two other sides.
8.39. We could now successively carry over all the theorems of elementary geometry to any Euclidean space. But there is no need to do so. Instead we introduce the concept of a Euclidean isomorphism between two Euclidean spaces, i.e., two Euclidean spaces R' and R" are said to be Euclidean-isomorphic if they are isomorphic regarded as real linear spaces (see Sec. 2.71) and if in addition
x , y) = (x , y )
whenever the vectors *", y" e R" correspond to the vectors x\ y' eR'. Then it is obvious that every geometric theorem (by which we mean any theorem based on the concepts of a linear space and a scalar product) proved for a space R' is also valid for any space R" which is Euclidean-isomorphic to R'. According to Sec. 7.97, any two Euclidean spaces with the same dimension n are Euclidean-isomorphic. Hence any geometric theorem valid in an «-dimensional Euclidean space R^ is also valid in any other rt-dimensional Euclidean space R^. In particular, the theorems of elementary geometry,
222     euclidean spaces
chap. 8
i.e., the geometric theorems in the space R3, remain valid in any three-dimensional subspace of any Euclidean space. In this sense, the theorems of elementary geometry are all valid in any Euclidean space.
8.4. Orthogonal Bases
8.41. Theorem. In any n-diniensional Euclidean space R„ there exists a basis consisting of n nonzero orthogonal vectors.
Proof There exists a canonical basis ex, e2>. . . , en for the bilinear form (x,y), just as for any other symmetric bilinear form in an w-dimensional space (see Sec. 7.43). The condition
(£>,., ek) = 0      (i # k)
satisfied by the vectors of the canonical basis is in this case just the condition for orthogonality of the vectors ei and ek. Thus the canonical basis ex, e2,..., en consists of n (pairwise) orthogonal vectors. |
In Sec. 8.6 we will consider a practical method for constructing such an orthogonal basis.
8.42. It is often convenient to normalize the vectors of an orthogonal basis by dividing each of them by its length. The resulting orthogonal basis in Rn is said to be orthonormai
Let ex, e2l . . . , en be an arbitrary orthonormai basis in an ra-dimensional Euclidean space R„. Then every vector x e Rn can be represented in the form
x = lxex + l2e2 + ■ ■ ■ + Znen, (14)
where £l5 c,2, . . . , cin are the components of the vector jc with respect to the basis ex,e2, . . . , en. We will also call these components Fourier coefficients of the vector x with respect to the orthonormai system ex, e2, . . . , en. Taking the scalar product of (14) with e{i we find that
5, = 0"=l,2,...,n). (15)
Let y — rixex + -q2e2 + ■ ■ • + 'r\nen be any other vector of the space Rn. Then it follows from (1) that
(x, y) = lxriX + l2ri2 + • • • + £„v (16)
Thus in an orthonormai basis the scalar product of two vectors equals the sum of the products of the components (Fourier coefficients) of the vectors. In particular, setting y — x, we obtain
i.y|2 = (x, x) = i\ +1\ + ■ • • + e
(17)
sec. 8.5
perpendiculars 223
8.5. Perpendiculars
8.51. Let R' be a finite-dimensional subspace of a Euclidean space R, and let /be a vector which is in general not an element of R'. We now pose the problem of representing/in the form
f=.g + K (18)
where the vector g belongs to the subspace R' and the vector h is orthogonal to R'. The vector g appearing in the expansion (18) is called the projection of f onto the subspace R', and the vector h is called the perpendicular dropped from the end of f onto the subspace R'. This terminology calls to mind certain familiar geometric associations, but it is not intended to do more than just suggest these associations.!
The solution of this problem has in effect already been given in Sec. 7.54 for any symmetric bilinear form which is nonsingular in the subspace R'. Since the positive definite form (xfy) is nonsingular in every subspace R' <= R (Sec. 7.94), the existence and uniqueness of a solution of our problem follows from Sec. 7.54. Moreover, as shown in Sec. 7.55, the existence of the expansion (18) shows that the whole space R is the direct sum of the subspace R' and its orthogonal complement R". A direct sum whose terms are orthogonal is called an orthogonal direct sum. Thus we have expanded the space R as an orthogonal direct sum of the subspaces R' and R". If R and R' have dimensions n and k, respectively, then the dimension of R" equals n — ky since the dimension of the direct sum is the sum of the dimensions of its terms (Sec. 2.47).
We note that the problem is also solved in the case where /lies in the subspace R', since then
/=/+o.
This solution is obviously unique. In fact, if
f=g + f* (geR',heR"), then h =f — g eR' which implies h = 0, g =/
8.52. Applying the Pythagorean theorem (Sec. 8.37) to the expansion (18), we obtain
l/l2 = \g\2 + \h\\ (19)
which implies the formula
0<|*|<|/|, (20)
t Since the concept of the "end of a vector" plays no role in our axiomatics, it is inappropriate to look for any logical content in this terminology.
224     euclidean spaces
chap, 8
expressing the geometric fact that the length of a perpendicular does not exceed the length of the line segment from which it is dropped. Consider the cases where one of the inequalities in (20) becomes an equality. The first equality sign holds if |A| = 0; this means that/ — g + 0, i.e.,/is an element of the subspace R'. The second equality sign holds if \h\ = |/|; according to (19), this means that g — 0 or / =0 + h, i.e., / is orthogonal to the subspace R'. Thus \h\ = 0 means that f belongs to R\ while \h\ = \ f\ means that f is orthogonal to R'. In any other configuration of /, the (inherently positive) length of h is less than that of /
Now let ex, e%,. . . , ek be an orthonormal basis in the subspace R', and
let
Then, by Sec. 8.42,
Substituting this value of \g\2 into (19), we get
l/l2 = l*l2+i4
In particular, for any (finite) orthonormal system ex, e%, . .. , ek and any vector/, we have the inequality
i>;<i/i',
known as BesseVs inequality. The geometric meaning of this inequality is clear: The square of the length of the vector f is no less than the sum of the squares of its projections onto any k mutually orthogonal directions.
8.53. In the applications, we sometimes need an explicit solution of the problem of dropping a perpendicular onto a subspace R', given some basis {b} — {bx, b2, . . . ,bk} in R' (in general, not an orthonormal basis). To solve this problem, we first expand the required vector g (the "foot of the perpendicular") with respect to the basis {b}, i.e., we write
g = ^bx + fi2b2 + ■ ■ • + $kbk.
*
We then impose on the vector h — / — g the condition that it be orthogonal to all the vectors bx, b2, . . . , bk, thereby obtaining the system of equations
(/>, bx) = (f-gt bx) = (/ bx) - Mt>i> b0 - Uh, h)-----UbM = o,
(h, b2) = b2) = (/ b2) - Mbi, b2) - Uh> b2)----- $k(bk, b2) = 0,
M = </-*. M = </. h) - ßx(*lf *,) - Uh, bk)----- ßÄ-,bk) = 0,
sec. 8.5
perpendiculars 225
with determinant
D
(bx,bx) (b2,bx) ih> h)   (b2> b2)
(h, h)
(bk, b2)
(t>i, bk)   (b2, bk)
But D is nonzero, being the determinant of the matrix of the positive definite form (x, y) in the basis bx, b2, . . . , bk (see Sec. 7.96). Hence we can solve the system by Cramer's rule, obtaining the following expression for the coefficients fa (j = 1, 2, . . . , k):
1 D
(bx>bx) (^A) (bx,b2) (b2ib2)
(bx,bk) (b2,bk)
(VxA) (f,bx) (bj+x>bx) (b^b,)   (ffb2)   (b^ 1SZ>.>)
(bk,bx) {b,M
(bk, bk)
8.54. The problem of dropping a perpendicular can be posed not only for a subspace, but also for a hyperplane, in which case the problem is formulated as follows: Suppose that in a Euclidean space R, we are given a vector/and a hyperplane R", generated by parallel displacement of a subspace R'. We wish to show that there exists a unique expansion
(21)
where the vector g belongs to the hyperplane R" and the vector h is orthogonal to the subspace R'.f The geometric meaning of the expansion (21) is illustrated in Figure 1(a). Note that the terms in the expansion (21) are in general no longer orthogonal.
The problem is now easily reduced to the problem of Sec. 8.51. In fact, if we fix any vector in the hyperplane R" and subtract it from both sides of (21), we obtain the problem of representing the vector/ — f0 as a sum of two vectors g —f0 and h, of which the first belongs to the subspace R' and the second is orthogonal to R' (see Figure 1(b)). By the result of Sec. 8.51, such a representation exists. Therefore the representation (21) also exists. It
t Saying that^ belongs to the hyperplane R* means geometrically that the end point of g lie in the hyperplane R", while its initial point is, as usual, at the origin of coordinates. One must not imagine that the whole vector g lies in the hyperplane R"!
226     euclidean spaces
chap. 8
(b)
Figure 1
remains only to prove the uniqueness of the representation (21). If there were two such representations
/= gi + hi = g2 + K,
then we would have
where gx — g2 belongs to the subspace R' and hx — h2 is orthogonal to R'. It follows that gx — g2 ~ hx — h2 = 0, as required.
8.6. The Orthogonalization Theorem
8.61. The following theorem is of fundamental importance in constructing orthogonal systems in a Euclidean space:
Theorem (Orthogonalization theorem). Let  xx, x2,. . ., xfc, . . .   be a finite or infinite sequence of vectors in a Euclidean space R, and let Lk = L(xx, x2, . . . , xk) be the linear manifold spanned by the first k of these vectors. Then there exists a system of vectors yx, y2, . . . , ykj . . . such that
1) The linear manifold = ~L(yx, y2l . . . , yk) spanned by the vectors 7i> y%i ■ * ■ » yk coincides with the linear manifold Lkfor every positive integer k;
2) The vector yk+x is orthogonal to l^kfor every positive integer k.
Proof. We will prove the theorem by induction, i.e., assuming that k vectors yx,y2, ■ ■ ■ ,yk have been constructed which satisfy the conditions
sec. 8.6
the 0RTHOG0NAL1ZATI0N theorem 227
of the theorem, we will construct a vector yk^x such that the vectors yr, y2,
• ■ ■ » A) }'k+l
also satisfy the conditions of the theorem. First let yx = xv Then the condition = Lx is obviously satisfied. The subspace Lfc is finite-dimensional, and hence by Sec! 8.51 there exists an expansion
*fc+i = gk + K (22)
where gk is an element of Lfc and is orthogonal to Lfc. Setting = /tfc, we now verify that the conditions of the theorem are satisfied for this choice of yk+1. By the induction hypothesis, the subspace Lfc contains the vectors y^y** ■ ■ ■ ■> yk-> and hence the larger subspace Lfc+1 also contains these vectors. Moreover, it follows from (22) that L^x contains the vector hk = yk+l. Therefore the subspace Lfc+1 contains all the vectors yx, y2, . . . , yk, yk+1, and hence also contains the linear manifold L^+1 spanned by these vectors. Conversely, the subspace L^+1 contains the vectors xly x2, . . . , xk, and moreover by (22), L^+1 contains the vector xk+1 as well. It follows that Lj.+1 contains the whole subspace Lfc+1. Therefore L^+1 = Lfc+1, and the first assertion of the theorem is proved. The second assertion is an obvious consequence of the construction of the vector yk+l = hk. This completes the induction, thereby proving the theorem. |
8.62. In the present case, the inequality (20) takes the form
0 < lA+il < (23)
As shown in Sec. 8.52, the equality \yk+x\ ~ 0 means that the vector xk+1 belongs to the subspace LA, and is therefore a linear combination of the vectors xx, x2, . . . , xk. The opposite equality \yk+1\ .= |xfc+1| means that the vector xkJrl is orthogonal to, the subspace Lfr, and hence is orthogonal to each of the vectors xlt x2, • .. , xk.
8.63. Remark. Every system of vectors zx, z2, . . . , zk,. . . satisfying the conditions of the orthogonalization theorem coincides to within numerical factors with the system yx, y2, . . . , yk, . „ . constructed in the proof of the theorem. In fact, the vector zR+1 must belong to the subspace Lfc+1, and at the same time zk+1 must be orthogonal to the subspace Lfc. The first of these conditions implies the existence of an expansion
Zk+i = c\y\ + W + ' " + ckyk + ck+1yk+1 = yk + ck+lyk+l,
where yk = cxyx + c2y2 H-----+- ckyk e L* and ck^yk+1 is orthogonal to Lk.
The second condition implies that yk — 0 and hence that
zi -i = ck+iy* •■ i»
as required.
228     euclidean spaces
chap. 8
*8.64. Legendre polynomials. Suppose we apply the orthogonalization theorem to the system of functions
jc0(0= 1,^(0 = x*(0
in the Euclidean space R2( — \, 1). Then the subspace Lk — L(\, tk) is obviously the set of all polynomials in t of degree n < k. The functions x0(t), xx(t)>... , xk(t) are linearly independent (see Sec. 2.22d), and hence the functions MO^MO*... obtained by the orthogonal ization method are all nonzero, by Sec. 8.62. By its very construction, yk(t) must be a polynomial in t of degree k. In particular, direct calculation by the orthog-onalization method gives
^o(0 = 1, M0 = ', M0 = ?2-i 73(0 = >3-!',-•• •
These polynomials were introduced in 1785 by the French mathematician Legendre, in connection with certain problems of potential theory. The general formula for the Legendre polynomials was found by Rodrigues in 1814, who showed that the polynomial yn(t) is given by
Pn(t) = ~ [(? - 1)"]     (n = 0, 1, 2, . . .) (24) dtn
to within a numerical factor. We now prove this formula, using the remark of Sec. 8.63, i.e., we will show that the polynomialpn(t) satisfies the conditions of the orthogonalization theorem, whence it will follows from the remark in question that pn(t) must equal cnyn(t) for every n, as required.
a. The linear manifold spanned by the vectors p0(t), px(t), . . . , pn(t) coincides with the set of all polynomials of degree no greater than n. In fact, it is obvious from (24) that the polynomial pk(t) is clearly a polynomial in t of degree k. In particular,
MO = aoo,
PlO) = «10 + <*uU
p2(t) = «20 + a2lt + a22t2,
M0 = ako + aklt H----+ akktk,
PJJ) = an0 + anlt + ■ ■ ■ + anktk +----h annt«,
where the leading coefficients am, an, . . . , ann are nonzero. Thus all the polynomials M0,M0» ■ • • >A»(0 are elements of the linear manifold
sec. 8.6
the orthogonalization theorem 229
spanned by the functions 1, t,. . . , tn, which is obviously just the set Ln of all polynomials in t of degree no greater than n. Conversely, the functions 1, t, .. . , /" can be expressed as linear combinations of p0(t),px{t),. . . , pn(t), since the matrix of the linear relations (25) has the nonvanishing determinant a90axx ''' ann- Hence the linear manifold L(p0(t),px(t),..., pn(t)) coincides with the linear manifold L{\, /,...,/") and therefore coincides with the set Ln, as required.
b. The vector pn(t) is orthogonal to the subspace Ln_x. It is sufficient to verify that the polynomial pn(t) is orthogonal in the sense of the space R2(—1, 1) to the functions 1, t,. . . , tn~x. To show this, we use the formula for integration by parts, familiar from elementary calculus, which in the case of polynomials involves derivatives of the type considered in Sec. 6.73c from a purely algebraic point of view. In particular, the derivatives of the polynomial
(t2 - 1)" = (/ - ])»(/ + 1)"
of orders 0, 1,...,« — 1 vanish for t — ±l.f Thus, calculating the scalar product of tk and pn(t) for k < n and integrating by parts, we obtain
= t*[(r - in
tm(ti—1)
+1- k f+l tk-x[{t2 - l)"]'"-1' dt, -1 J-l
where the first term on the right vanishes. Integrating the second term by parts again, and continuing this process until the exponent of t becomes zero, we get
0\ Pn(t)) = ~-ktk~l[(t2 - l)n]in-2)
= = ±k\ |+1 [{t2 - l)n]{n~k]dt = ±kl[(t2 - l)"]<»-*-l>
+1+ k(k - 1) f+l ^[{t2 - l)n]in~2) dt -I J-l
+1
= 0, -l
i.e., pn(t) is orthogonal to Lk_x, as asserted.
Thus, finally, we have proved that for every n the polynomial yn(t) is the same as the polynomial pjj) = [(t2 — l)"](n), except for a numerical factor.
t Cf. formula (21), p. 163.
230     euclidean spaces
chap. 8
We now calculatepn(\), by applying the formula for w-fold differentiation of a product to the function
(t2 - 1)" = (t +       - 1)".
The result is
pn(t) = k/ + mt - ir]M
= (/ + i)n[(? - i)n3U) + c?[(* + i)n]'[(/ - i)K]<"-]) + ■ • ■
= (/ + l)nn\ + C>(' + \)n-xn(n - 1) ■ • • 2(t - 1) H----,
where C£ == nlfk\(n — k)\. The substitution f — 1 makes all the terms of this sum vanish from the second term on, and we get
/>n(l)=2»ll!.
For numerical purposes, it is convenient to make the values of our orthogonal functions equal 1 for f = 1. To achieve this, we need only multiply pjj) by the factor \j2nn\. In fact, it is actually these normalized polynomials which are called the Legendre polynomials, i.e., the Legendre polynomial of degree n, denoted by PJj), is given by the formula
1
rt-iin)
8.7. The Gram Determinant
8.71. By a Gram determinant is meant a determinant of the form
(*l,*l) *ü)     •■•     Ol, Xk)
(x2, xx)   (x2, x2)   • • •   (x2, xk)
G(xx, x2, . . . , xk)
(xJt, *l)    (xki xz)
(xk, xk)
where xx, x2,.. . , xk are arbitrary vectors of a Euclidean space R. In Sec. 7.96 we saw that this determinant is positive in the case of linearly independent vectors xx, x2,. .. , xfc. To calculate the value of C7(xX) x2,. . . , xk), we apply the orthogonalization process to the vectors xx, x2, . . . , xk. Thus let_yx — xx and suppose the vector
72 = + X2
is orthogonal to yx. Replacing the vector xx by yx everywhere in the determinant G(xx, x2,. .. , xk), we multiply the first column of G(xx, x2,. . . , xfc) by ocx (associating qlx with the second factors of the scalar products) and add it to the second column. Then we multiply the first row of the determinant by ax (associating <xx with the first factors of the scalar products) and add it
sec. 8.7
the gram determinant 231
to the second row. As a result, the vector y2 appears at every place in the determinant where x2 appeared formerly. Next let
be orthogonal to yx and y2. Multiply the first column by f3x and the second column by %, and add them to the third column. Then carry out the same operations on the rows. As a result, x3 is replaced by y3 everywhere in G(xx, x2,... , xk). We can continue this process until we arrive at the last column (and row). Since these operations do not change the value of the determinant, we finally obtain
G(xx, x2, . . . , xfc)
o    Cv2, y2)
o o
(26)
0 0 {yk,yh)
= Oi, ri)<>2, yd • • • (>>*, yd-
Moreover, by the result of Sec. 8.62, we have the inequality
0 < G(xx, x2, . . . , xk) < (xx, *i)(x2, x2) * • • (xfc, xfc). (27)
Next we examine the conditions under which the quantity G(x1? x2,... , xfc) can take the values 0 or (xx, xx)(x2, x2) ■ • ■ (xfc, xfc). It follows from the form (26) of the Gram determinant that it vanishes if and only if one of the vectors yx, y2,. .. , yk vanishes. But according to Sec. 8.62, this implies that the vectors xx, x2, .. . , xk are linearly dependent. Moreover, according to (26) and Sec. 8.62, the second equality sign holds in the inequality (27) only in the case where the vectors xx, x2, .. . , xfe are already orthogonal. Thus we have proved the following
Theorem. The Gram determinant of the vectors x1; x2,. . . , xk vanishes if the vectors are linearly dependent and is positive if they are linearly independent. It equals the product of the squares of the lengths of the vectors xx, x2, . . . , xk if they are orthogonal and is less than this quantity otherwise.
8.72. The volume of a k-dimensional hyperparallelepiped. As is well known from elementary geometry, the area of a parallelogram equals the product of a base and the corresponding altitude. If the parallelogram is determined by two vectors xx and x2, then for the base we can take the length of the vector xx and for the altitude we can take the length of the perpendicular
232     euclidean spaces
chap. 8
dropped from the end of the vector x2 onto the line containing the vector xx. Similarly, the volume of the parallelepiped determined by the vectors xx, x2 and x3 equals the product of the area of a base and the corresponding altitude; for the area of the base we choose the area of the parallelogram determined by the vectors xx and x2, and for the altitude we take the length of the perpendicular dropped from the end of the vector x3 onto the plane of the vectors xx and x2.
These considerations make the following a very natural inductive definition of the volume of a /:-dimensional hyperparallelepiped in a Euclidean space: Given a system of vectors xx, x2,. .. , xk in a Euclidean space R, let hj denote the perpendicular dropped from the end of the vector xj+x onto the subspace
L(*i, x2,. . . , Xj)      (y — 1, 2, . .. , k 1),
and introduce the following notation:
Vx = \xx\ (a one-dimensional volume, i.e., the length of the
vector xx),
V2 — V\ \h\\       (a two-dimensional volume, i.e., the area of the parallelogram determined by the vectors xx, x2),
y% — ^2 |A2|        (a three-dimensional volume, i.e., the volume of the parallelepiped determined by the vectors xx, x2, x3),
Vk— Vk-i \nk-i\ (a fc-dimensional volume, i.e., the volume of the hyperparalleliped determined by the vectors xx, x2, ■ • • >
Obviously the volume Vk can be written in the form
Vk = V[xi, xa, . . . , xk] = |xx| \hx\ • • • \hk_x\.
Using equation (26), we can express the quantity Vk in terms of the vectors xx, x2,. . . , xk as follows:
(*1> C*l» -^2)    ' ' '    0^l> xk)
(x2, xx)  (x2, x2)   ' ■ *   (x2, xk)
■
«  •  • **■ ■■■ -■■
(xki Xl)    (Xki x2)    ' ' '    (xkf xk)
Thus the Gram determinant of the k vectors xx, x2, . . . , xk equals the square of the volume of the k-dimensional hyperparallelepiped determined by these vectors.
8.73. Let
Q'}     0 = 1,2.....fc;/= 1,2.....it)
sec. 8.7
the gram determinant 233
be the components of the vector xs with respect to an orthonormal basis ex, e2,.. . , en. Expressing the scalar products in terms of the components of the vectors involved, we obtain the following formula for V\\
?«)?(d + ... + ^gi) ... 1«»^) + ... +
■
• ■ •   •   • ■ ■   a   * a
w + ••■ + e'e ■•■ w + *■■ + e^r
We now use an argument similar to that used in Sec. 4.54. Every column of the determinant just written is the sum of n "elementary columns" with elements of the form £j>>i;i-a\ where the indices a and / are fixed in each elementary column, while/ ranges from 1 to k. Therefore the whole determinant equals the sum of nk "elementary determinants" consisting only of elementary columns. In each elementary column the factor £<a) is constant and hence can be factored out of the elementary determinant. As a result, each elementary determinant takes the form
jr(i)   . . .
K(2)     £<2)     . . . £<2)
hik ^
• * ■ • • •
where /x, i2,. . . , ik are numbers from 1 to «. If some of these numbers are the same, then the corresponding elementary determinant obviously vanishes. Thus we need only consider the case where il7 i2, . .. , ik are all different. In the entire sum we group together those terms of the form (28) which have the same indices ju i2,. . . , ik but arranged in different orders. Let
AHA,72» * ■ • »7*3
denote the sum of all such terms, where A>A, ■ ■ • j\ are the indices i\, /2, . . . , ik rearranged in increasing order. An argument similar to that used in Sec. 4.54 then leads to the following result: In the n x k matrix
(/ = l,2,...,*;7 = l,2,...,/0,
the quantity M2[jlfj2>.. . ,yfc] is the square of the minor of order k formed from the columns of this matrix with indices A»A» • • • >A- The sum of all the terms (28) equals the sum of the squares of all the minors of order k of the matrix H^'ll- Thus the square of the volume of the A>dimensional hyperparallelepiped determined by the vectors xlt x2,. . . , xk equals the sum of the squares of all the minors of order k in the matrix consisting of the components of the vectors xl7 x2,. . . , xk with respect to any orthonormal
hi   Sij hik
234     euclidean spaces
chap. 8
8.74. In the case k = n, the matrix || has only one minor of order k, equal to the determinant of the matrix. Hence the volume of the n-dimensional hyper parallelepiped determined by the vectors xx, x2,. . . , xn equals the absolute value of the determinant formed from the components of the vectors xi, x2, • - - > xn wjt" respect to any orthonormal basis.
8.75. Hadamard's inequality. Using the results of the preceding section, we can obtain an important estimate for the absolute value of an arbitrary determinant
£     £     . . . £
^21     ^-22     " " ' S2fc
D
In I
k.2
^kk
of order k. If we regard the numbers \i2,. . . , ^ik (7 = 1, 2,.. . , A:) as the components of a vector xt with respect to an orthonormal basis in a A>dimensional Euclidean space, then the result of Sec. 8.74 allows us to interpret the absolute value of the determinant D as the volume of the A>dimensional hyperparallelepiped determined by the vectors xx, x2, . . . , xk. Then, using the expression for this volume in terms of the Gram determinant, we have
D2 = G(xx, x2, . . . , xk). Applying Theorem 8.71, we obtain
k k
D2 < (xx, xx)(x2, x2) • • ■ (xk, xk) = TJ
an inequality known as Hadamard's inequality. Moreover, we note that according to Theorem 8.71, the equality holds if and only if the vectors xx, x2,. . . , xk are pairwise orthogonal.
The geometric meaning of Hadamard's inequality is clear, i.e., the volume of a hyperparalleled piped does not exceed the product of the lengths of its sides, and it equals this product if and only if its sides are orthogonal.
8.8. Incompatible Systems and the Method of Least Squares
8.81. Suppose we are given an incompatible system of linear equations
aXixx + ax2x2 + • ■ • + aXmxm = bx, a2Xxx 4- a22x2 4-----\- a2mxm = b2,
amxi + an2x2 + ' ■ ' 4- anmxm = bn.
sec. 8.8
incompatible systems and the method of least squares 235
Since the system is incompatible, it cannot be solved, i.e., we cannot find numbers cx, c2,. . . , cm which satisfy all the equations of the system when substituted for the unknowns xx, x2, . . . , xm. Thus if we substitute the numbers £2> ■ ■ ■ , Sm f°r the unknowns xx, x2,. . . , xm in the left-hand side of the system (29), we obtain numbers yx,y2, ■ - • ,yn which differ from the numbers bx, Z>2,. . . , bn. This suggests the following problem: Given real numbers aik and bk (j = 1, . . . , m; k — 1, . . . , ri) find the numbers £l5 £2,. . . , £m which when substituted into (29) give the numbers yx, y2, . . . , yn with the smallest possible mean square deviation
S2=i(Y,-^)2 (30)
from the numbers bx, b2, . . . , bn, and find the corresponding minimum value oft*.
An example of a situation where this problem arises in practice is the following: Suppose we want to determine the coefficients ^ in the linear relation
b = lxax + £2a2 +----h lmam
connecting the quantity b and the quantities ax, a2,. .. , ami given the results of measurements of the a,- (/ — 1, 2, . . . , m) and the corresponding values of b. If the /th measurement gives the value a„ for the quantity a} and the value bi for the quantity b, then clearly
liOn + ^2«,2 + ' ' ' + lmaim = b(. (31)
Thus n measurements lead to a system of n equations of the form (31), i.e., a system of the form (29). As a result of unavoidable measurement errors, this system will generally be incompatible, and then the problem of finding the coefficients \2, .. . , \m does not reduce to the problem of solving the system (29). This suggests determining the coefficients ^ in such a way that every equation is at least approximately valid and the total error is as small as possible. If we take as a measure of the error the mean square deviation of the quantities
m
from the known quantities biy i.e., if we take formula (30) as a measure of the error, then we arrive at the problem formulated at the beginning of this section. Moreover, in this case, it is also useful to know the quantity S2, since this helps to estimate the accuracy of the measurements.
8.82. We can immediately solve the problem just stated, if we interpret it geometrically in the real space Rn. Consider the m vectors ax, a2, . . . , am
236     euclidean spaces
chap. 8
whose components form the columns of the system (29), i.e.,
a2 = (aX2i a2i,. . . , an%),
Forming the linear combinations ^xax + £2a2 + • ■ • + £,mam, we obtain the vector y = (yx, y2, • * • > Tn)- Our problem is to determine the numbers ^2» • ■ ■ > ?m m sucn a wav that the vector y has the smallest possible deviation in norm from the given vector b = (bXj b2, . . . , bn). Now the set of all linear combinations of the vectors ax, a2, . . . , am forms a subspace L ~ L(ax, a2,. . . , a,n), and the projection of the vector b onto the subspace L is the vector in L which is the closest to b. Therefore the numbers ^, £2, . . . , £m must be chosen in such a way that the linear combination
lxax + Sgfla +----h lmam
reduces to the projection of b onto L. But, as we know, the solution of this problem is given by the last equation in Sec. 8-53, i.e.,
I D
(«i, ax)
(aj_x, ax)    (b, ax)    (aJ+x, ax)
(ax, am)   • • •   (a,-.!, am)   (b, am)   (ai+x, am) where D is the Gram determinant G(ax, a2, . . . , am).
{am, ax)
8.83. The results of Sec. 8.72 also allow us to evaluate the deviation S itself. In fact, S is just the altitude of the {m + l)-dimensional hyper-parallelepiped determined by the vectors ax, a2,. . . , am, b, and hence is equal to the ratio of volumes
V[ax, a2,..., am, b] V[ax, a2, . . . , am]
Using the Gram determinant to write each of these volumes, we finally obtain
g2 ^ G(au a2, . . . , am, b) G(ax, a2, . . . , am)
Thus the problem posed in Sec. 8.81 is now completely solved.
sec. 8.9
adjoint operators and 1sometry 237
8.84. In numerical analysis the following problem is often encountered {interpolation with the least mean square error): Given a function f0(t) defined in the interval a < t < b, find the polynomial P(t) of degree k (k < n) for which the mean square deviation from the function f0(t), defined by
is the smallest. Here t0, tx,. . . , tn are certain fixed points of the interval a < / < b. Using geometric considerations, M. A. Krasnosyelski has given the following simple solution of the problem: Introduce a Euclidean space R consisting of functions f{t) considered only at the points t0, tx,. . . , /„, and define the scalar product by
a*) =ifOt)m-
Then the problem reduces to finding the projection of the vector f0(t) onto the subspace of all polynomials of degree not exceeding k. The coefficients of the desired polynomial
P(t) - $0 + lxt + • • • + lkt* are given by the same formulas as in the problem analyzed previously, i.e., (1,1)   (r,l)   *••   (*'-\l)   (/0,1)   (/'+1,1)   •■• (t\l)
(1,0   (*,/)  •••  (f'-\o   (/o,0 0i+\t)
1
D
(\,tl) (/,/*)
(tj~\tk)   (f0,tk) (ti+\tk)
(r, 0
(t\ tk)
where D is the Gram determinant G(l, t1'). The least square deviation itself is given by the formula
0(1, t,...,t\ P) G(l,t,...,tk)
8.9. Adjoint Operators and Isometry
8.91. Adjoint operators with respect to the form (x, y). We now apply the results of Sec. 7.6 on the connection between linear operators and bilinear forms to the case where the fixed form (x, y) is the scalar product of
238     euclidean spaces
chap. 8
the vectors x and y. Let A and B be linear operators in a Euclidean space R„, and use the formulas
A(x, y) = (Ax, y),      B(x, y) = (x, By) (32)
to construct bilinear forms A(x, y) and B(x, y). Since any orthogonal basis is a canonical basis of the form (x, y), and since the canonical coefficients of (x,y) all equal 1 in any such basis, it follows from Sec. 7.61 that the matrix ||ayfc|| of the form A(x, y) in any orthonormal basis coincides with the matrix \\aki]\\ of the operator A, while the matrix ||^J of the form B(x,>>) is the transpose of the matrix \\tyk)\\ of the operator B. Conversely, given bilinear forms A(x, y) and B(x, y) in the space R„, there exist unique linear operators A and B such that the formulas (32) hold (see Sec. 7.62). Moreover, applying Theorem 7.63 to the form (x,y), we get the following
Theorem. Given any linear operator A acting in an n-dimensional Euclidean space Rn, there exists a unique linear operator A' (the adjoint of A) acting in Rn such that
(Ax,^)-(x,A»
for arbitrary x, y e Rn. The matrix of the operator A' in any orthonormal basis of the space R„ is the transpose of the matrix of the operator A.
8.92. Using the operation of taking the adjoint in a Euclidean space, we now introduce the following classes of operators:
a. Symmetric operators, defined by the relation
A' = A.
A symmetric operator is characterized by the fact that transposition does not change its matrix in any orthonormal basis.
b. Antisymmetric operators, defined by the relation
A' = —A.
An antisymmetric operator is characterized by the fact that transposition changes the sign of its matrix in any orthonormal basis.
c. Normal operators, defined by the relation
A'A = AA\
The class of normal operators obviously contains the class of symmetric operators and the class of antisymmetric operators. The study of these classes of operators will be pursued in Sees. 9.3-9.4.
8.93. We now formulate the results of Sees. 7.73-7.76 on invariant operators for the case of a Euclidean space Rn. Consider a linear invertible
sec. 8.9
adjoint operators and isometry 239
mapping y — Qx of the space Rn into itself which does not change the scalar product: (Q*,qw = (*.,).
A mapping of this kind, which in Sec. 7.73 was said to be invariant with respect to the form (xfy), will now be called isometric. Thus an isometric operator Q is characterized by the relation
Q'Q = E
(cf. formula (33), p. 201), where E is the unit operator and Q' is the operator adjoint to Q with respect to the form (x, y), i.e., the operator adjoint to Q in the sense of Sec. 8.91. The inverse Q_1 — Q' of an isometric operator is itself isometric, and so is the product of two isometric operators (see Sec. 7.74).
According to Sec. 7.75, an isometric operator Q is characterized by the fact that it carries every orthonormal basis ex,. .. , en into another ortho-normal basis fx = Qex,... ,/„ = Qen. The matrix Q = of an isometric operator Q in any orthonormal basis is called an orthogonal matrix. An orthonormal matrix is characterized by the conditions (35), p. 202, which in the present case take the form
f 1    if   y = /c,
*=i 10   if k,
or by the conditions (35'), p. 202, which take the form
'l   if j ~ m,
in -f
10  if j ^ m,
i.e., the sum of the squares of the elements of any row (or column) equals 1, while the sum of the products of the corresponding elements of two different rows (or columns) equals 0.
8.94. It follows from the relation Q~l = Q' that the formulas
fi = <li]ei + • • • + q™e„
(33)
for the transformation from one orthonormal basis e±, . . . , en to another orthonormal basis fx, . . . ,/n (such a transformation is called an orthogonal transformation) are "inverted" by the formulas
ex = qi% + * ' ' + q\n)L,
..................... (34)
240     euclidean spaces
chap. 8
By Sec. 5.31, the components y]k of a vector x with respect to the basis/x,. . . , /„ are expressed in terms of the components ^ of the same vector with respect to the basis ex, ... , en by the formulas
Ii = Ä + • • • +
Tin = q[n\ + • ' • +
with inverse formulas
(35)
In = i«X + ■ * • + «SV
(36)
8.95. Given m < n rows of numbers q\,] (i — 1,. . . ,n;j~ 1,.. . , m) satisfying the conditions
fl if ) = k,
I «w =
*-i 10   if j ^k,
consider the problem of finding n — m more rows of numbers qlt}) (j = m + 1,. . . , n) such that the n x n matrix ||^<»!! (/,_/ = 1,...,«) is orthogonal. This problem is easily solved by using a geometrical argument. Suppose the given rows are interpreted as components of m vectors in a Euclidean space R„ with scalar product
• • •,S«),    • ■ •, >]»)) = 2 ^
J-l
(recall Example 8.22b). Then our problem consists of augmenting m given orthonormal vectors qx, . .. ,qm with further vectors to make an orthonormal basis for the space Rn. With this geometrical interpretation, the problem is obviously solvable. For example, we can augment qx, ■ . . ,qm with any other vectors qm+x,. . . , qn such that the resulting system of n vectors is linearly independent, and then use Theorem 8.61 to make the whole system of n vectors orthonormal.
8.96. We now consider some further properties of symmetric operators.
a. If the subspace R' c R « invariant under the operator A, then, by Sec. 7.65, the orthogonal complement of\L' is invariant under the adjoint operator A'. Therefore, in the case of a symmetric operator A, if the subspace R' is invariant under A, then so is the orthogonal complement of R'.
b. Theorem. Every symmetric operator in the plane (n = 2) has an eigenvector.
problems 241
Proof. In this case, the equation determining the eigenvectors is just
a
n
X
= 0.
a
21
a
22
— X
The discriminant of this quadratic equation is (axx + a22f — 4(axxai2 — a2lal2) = (aix
a22f + 4a2,
> o,
and hence has real roots. |
c. From these considerations and the fact that every operator in a real space has an invariant plane (see Sec. 6.66), it follows that every symmetric operator in the space Rn has an orthogonal basis consisting of eigenvectors. In Sec. 9.45 we will deduce this result in a more general way, without recourse to the real Jordan canonical form.
1. Suppose we define the scalar product of two vectors of the space K3 as the product of the lengths of the vectors. Is the resulting space Euclidean?
2. Answer the same question if the scalar product is defined as the product of the lengths of the vectors and the cube of the cosine of the angle between them.
3. Answer the same question if the scalar product is defined as twice the usual scalar product.
4. Find the angle between opposite edges of a regular tetrahedron.
5. Find the angles of the "triangle" formed in the space R2( — 1, 1) by the vectors xx(t) = 1, x2(t) = /, x3(t) = 1 — t.
6. Write the triangle inequalities in the space R2(a, b).
7. Find the cosines of the angles between the line \x = £2 = ■ • • = \n and the coordinate axes in the space Rn.
8. In the space if4 expand the vector f as the sum of two vectors, a vector g lying in the linear manifold Spanned by the vectors bt and a vector h orthogonal to this subspace:
a) /- (5, 2, -2,2),  Ax -(2, 1,1,-1),   ^ = (1,1,3,0);
b) /= (-3,5,9,3),   ^ = (1,1,1,1),   bt = (2, - 1, 1, 1),
b3 =(2, -7, -1, -1).
9. Prove that of all the vectors in the subspace R', the vector g of Sec. 8.51 (the projection of/onto R') makes the smallest angle with f.
10. Show that if the vector g0 in the space R' is orthogonal to g (the projection of / onto R'), then^0 is orthogonal to /itself.
PROBLEMS
242     euclidean spaces
chap. 8
11. Show that the perpendicular dropped from the origin of coordinates onto a hyperplane H has the smallest length of all the vectors joining the origin with H.
12. Given the system of vectors xx = i, x2 = 2i, x3 = 3i, x4 = 4i    2j, xs = —i + 10j,x6 = i + } + 5k in the space V3 with basis     k, construct the vectors yuy~2, • • • .^6 figuring in the orthogonalization theorem.
13. Using the method of the orthogonalization theorem, construct an orthogonal basis in the three-dimensional subspace of the space R4 spanned by the vectors (1,2,1,3), (4,1,1,1), (3,1,1,0).
14. Given two subspaces R' and R" of a Euclidean space R, let m(R\ R") denote the maximum length of the perpendiculars dropped onto R" from the ends of the unit vectors e e R', and define the quantity w(R", R') similarly. Then the quantity
6 = max {m(R\ R"), m(R", R')}
is called the spread of the subspaces R' and R". Show that the subspaces R' and R" have the same dimension if 8 < 1. (M. A. Krasnosyelski and M. G. Krein)
15. Find the leading coefficient An of the Legendre polynomial Pn(t).
16. Show that Pn(t) is an even function for even n and an odd function for odd n. In particular, find Pn( — \).
17. Show that if the polynomial tPn_x(t) is expanded in terms of the Legendre polynomials, so that
then the coefficients a0, ax,. .. , «„_3 and a„_x are zero.
18. Find the coefficients an_2 and an of the expansion of the polynomial tPn_x(t) given in the preceding problem, thereby obtaining the recurrence formula
nPn(t) = (2n     l)/JVi(0 - (« - OJVaCO-
19. Find the polynomial
Q(t) = tn + bxtn~x + ■ ■ ■ + bn„xt + bn for which the integral
Q\t) tit
J-i
has the smallest value.
20. Find the norm of the Legendre polynomial Pn(t).
21. Let A be any linear operator acting in an «-dimensional Euclidean space Rn. Show that the ratio
k^fo^     F[Ayx, Ay2,. . . , Ax,,]
^[xx, X2, . . . , Xft\
is a constant (i.e., is independent of the choice of the vectors xx, x2i. .. , xn), and find the value of k(A) (the "distortion coefficient").
problems 243
22. Show that fc(ab) = k(A)k(B) for any two linear operators a and b.
23. Let xux2,... ,xk, y, z be vectors in a Euclidean space r. Prove the inequality
V[xi* x2i• • • > xkiyi zl ^ VJxii xz* • • • ) xk* z] ^ jc2, ... , Xjt,y]        V[xx, x2i • .. , Xj.]
24. Let xu x2,.. . , xm be vectors in a Euclidean space r. Prove the inequality
m
V[xlt xa,... , xm] < PJ {y[xu • • • > , • • • . xJ}l/<m"x)- (38)
What is the geometric meaning of this inequality?
25 (Continuation). Prove the following inequalities, which sharpen Hadamard's inequality:
^[^X» x2i * * * * ^wil
m
< IT { Vixi, • • * > xk-l> xk+l> ■ • • » -^J}1
m
< ■ ■ ■ <    n       xS2,..., *^}i-*-<«-')/<™-i)<™-2)--
_ m
< • ■ - < n {^-i. *.8]}i/<m-i) < n w.
(M. K. Faguet)
26. If |aiJfc| < M, then
det ||flttH <Mnnn'2,
by Hadamard's inequality. Show that this estimate cannot be improved for
n = 2m.
27. Show that if n(a) and t(a) are the null space and range, respectively, of the operator a, then the orthogonal complements of these subspaces are the range and null space, respectively, of the adjoint operator a'.
28. Let A be an orthogonal matrix. Show that Aik = aik det A is the cofactor of the element aik of A.
29. Show that the sum of the squares of all the minors of order k appearing in k fixed rows of an orthogonal matrix equals 1. Show that the sum of the products of all the minors of order k appearing in one group of k rows with the corresponding minors in another group of k rows equals 0.
30. A linear operator q preserves the length oi' every vector. Show that q is isometric.
244     euclidean spaces
chap. 8
31. An operator A which preserves the orthogonality of any pair of vectors x and y, i.e., such that (x,y) = 0 implies (Ajc, Ay) = 0, is called an isogonal operator. Isometric operators and similarity operators (A* = X* for every x) are isogonal, and so is the product of any similarity operator and any isometric operator. Show that every isogonal operator is the product of i similarity operator and an isometric operator.
32. Let Q be a linear operator acting in an ^-dimensional Euclidean space R„ (n > 3). Suppose Q does not change the area of any parallelogram, so that
V[x,y] = V[Qx,Qyl
Show that Q is an isometric operator.
33. Let Q be a linear operator acting in an w-dimensional Euclidean space R„, and suppose Q does not change the volume of any ^-dimensional hyperparallele-piped (k < n). Show that Q is isometric. (M. A. Krasnosyelski)
Comment. For k = n the assertion of Problem 33 fails to be valid, since then every operator Q with det Q = ±1 will satisfy the condition of the problem.
34. Let F = {xu x2i..., xk] and G = {yx,y^ ... ,yk} be two finite systems of vectors in a Euclidean space R„. Show that a necessary and sufficient condition for the existence of an isometric operator Q taking every vector xt into the corresponding vector^,- (/* = 1, 2,..., k) is that the relations
xd = tyu yd     (Uj = 1,2,...,*)
hold.
35 (The angles between two subspaces). Let R' and R" be two subspaces of a Euclidean space R. Let the unit vector e vary over the unit sphere of the subspace R', and let the unit vector e" vary (independently of e) over the unit sphere of the subspace Rw. For some pair of vectors e = e[, e" = e"x, the angle between e' and e" achieves a minimum, which we denote by <px. Now let e' vary over its unit sphere while remaining orthogonal to e[, and let e" vary over its unit sphere while remaining orthogonal to e[. With these constraints, the angle between e' and e" achieves a minimum <p2 > <px for some pair e' = e'2, e" = e"2. Then let e vary over its unit sphere while remaining orthogonal to e[ and e2, and let e" vary over its unit sphere while remaining orthogonal to e"x and e^. In this way, we get a new minimum angle <p3 > <p2 and a new pair e'3 and e%. Continuing this process, we obtain a set of angles <px, <p2,.. ., <pfc, the number of which equals the smaller of the dimensions of R' and R". The angles <px, <p2,. .. , <pfc are called the angles between the subspaces R' and R". Prove the following facts;
a) The angles <px, <p2,.. ., <pft are uniquely defined and do not depend on the choice of the vectors e[, e[\ e'2, e2,... if these vectors are not uniquely defined by the construction;
b) The angles <px, <p2,... , <pfc determine the subspaces R' and R" to within their spatial orientation, i.e., if there are two pairs of subspaces R', R" and S', S" such that the angles between R 'and R" are the same as those
PROBLEMS 245
between S' and S", then there exists an isometric operator which simultaneously carries S' into R' and S" into R";
c) Given any preassigned angles <px < <p2 < ■ ■ ■ < <pfc < tt/2, we can construct a pair of spaces R' and R* such that <plt <p2,... , <pk are the angles between R' and R*.
36. Let yx, y2,...,ym be the projections of the vectors xx, x2;... , xm onto some subspace. Show that the volume of the hyperparallelepiped determined by the vectors yx,y2,... ,ym does not exceed the volume of the hyperparallelepiped determined by the vectors xx, x2, ... , xm.
37 (Continuation). In Problem 36 suppose that both the vectors xx, x2,... , xm and the vectors yx,y2,..., ym are linearly independent. Show that the formula
^1*^2» • * • ' y™\ = Vtxl> xl-> • • • > xm\ COS ax COS a2 • • ■ COS am
holds, where ax, a2,.. . , am are the angles between the subspaces Lx = L(xx, x2,..., xm) and L2 = ~L(yx,y2,.. . ,ym) (see Prob. 35).
38. A set of k vectors in a Euclidean space R will be called a k-vector, and we will say that two A:-vectors {xx, x2,... , xk) and {yx,y2> ■ ■ ■ >yk] are equal if
1) The volume V[xx, x2,.. . , xk] equals the volume V[yx,y2,... ,yk];
2) The linear manifold L(jcx, x2,. . . ,xk) coincides with the linear manifold
3) The systems xx, x2,... ,xk and yx,y2, .. . , yk have the same orientation, i.e., the operator in the space L(;cx,jc2, . .. ,xk) carrying the system xx, x2,.. . , xk into the system yx, y2,. . . , yk has a positive determinant.
Show that a Ar-vector {xx,x2,. . . ,xk) in an ^-dimensional space R„ is uniquely determined if we know the values of all the minors of order k of the n x k matrix
(i= 1,2,... ,n;j = l,2,...,k)
formed from the components of the vectors xx, x2,..., xk with respect to any orthonormal basis ex,e2, ■ • ■ ,en of the space R„.
39. If the ^-vector {xx, x2,.. ., xk) equals the fc-vector {yx,y2,... ,yk} (Prob. 38), show that the minors of order k of the matrix formed from the components of the vectors xx, x2,... , xk equal the corresponding minors of the matrix formed from the components of the vectorsyx,y^ ■ ■ ■ >yk-
40. By the angles between two fc-vectors {xx, x2,. . . , xk] and {yx,y2,. ■. ,yk} we mean the angles between the subspaces Lx = L,(xx,x2,... ,xk) and L2 = ~L(yx,y2, • ■ • ,yk) (see Prob. 35) subject, however, to the supplementary condition that the vectors ex, e2,. .. , ek chosen in the subspace Lx (when constructing the angles) have the same orientation as the vectors xx, x2,.. . , xk (this condition plays a role only in constructing the last vector ek), and similarly for the subspace L2. Show that the angles Px, P2> ■ ■ • , fa between the A>vectors and the angles «x, a2,... , afc between the corresponding subspaces are connected by the following relations:
«, = h     (j < k), <*k = fa   or   ct.k = n - 0B.
246     euclidean spaces
chap. 8
41. By the scalar product of two ^-vectors X = {xx,x2,- ■. ,xk} and Y = {yx,y2, ■ • • ,Vk}> specified by the matrices JVand Y made up of the components of the vectors x{ and yt with respect to some orthonormal basis of the space Rn, we mean the sum of all the products of the minors of order k of the matrix X with the corresponding minors of the matrix Y. Show that this scalar product equals
V[xx, x2,. . . j xk] V[yx,y2,. . . , yk] cos (3X cos p2 ■ • ■ cos pfc,
where j3x, |32,. . . , $k are the angles between the ^-vectors A'and Y.
42. Show that the scalar product of the two A>vectors X = {jcx,;c2,
and Y = {yi,y2> ■ ■ • >yk) can be written in the form
(xvyO   (xlty2)   ... (xx,yk)
(X2>yd (x2>y2) •■■ (x2>yk)
■> xk]
{X, Y)
(xk,yi) (xk*y2>
(xk,yk)
43. Show that if the polynomial [P(t)]k is an annihilating polynomial of the isometric operator a, then so is the polynomial P(t).
chapter 9
UNITARY SPACES
9.1. Hermitian Forms
9.11. A numerical function A(x, y) of two arguments x and y in a complex space C is called a Hermitian bilinear form or simply a Hermitian form if it is a linear form of the first kind in x for every fixed value of y and a linear form of the second kind (Sec. 4.14) in y for every fixed value of a\ In other words, A(jc,^) is said to be a Hermitian form in x and y if the following conditions are satisfied for arbitrary x, y, z in C and arbitrary complex a:f
A(x + z, y) = A(x, y) + A(z, y),
A(cLX,y) = aA(x, j'), A(x, j> + z) — A(x, y) + A(x, z),
A(x, ay) = aA(x, 7).
Using induction and (1), we easily obtain the general formula
2 ai*» 2 Pi.Vj 1=22 a;(^A(*» (2) i=l     j=l     /    1=1 J=l
where jq, . . . , xfc, >'x,. . . ,ym are arbitrary vectors of the space C and <xx, . . . , v.k, f$lt . . . , {Jm are arbitrary complex numbers.
t As usual, the overbar denotes the complex conjugate.
247
248     unitary spaces
chap. 9
9.12. Examples
a. If Lx(x) is a linear form of the first kind and L2(x) is a linear form of the second kind (Sec. 4.14), then A(x, y) = Lx(x)L2(y) is a Hermitian form.
b. An example of a Hermitian form in an rt-dimensional space Cn with a fixed basis ex, e2,. . . , en is the function
A(*,>0 = £ £fl«&^, (3)
where
71 71
are arbitrary vectors and aik (i, & = 1,2,...,«) are fixed complex numbers. In fact, (3) is the general representation of a Hermitian form in an n-dimensional complex space. This is proved in the same way as the analogous proposition for bilinear forms in a space Kn (see Sec. 7.13).
9.13. A Hermitian form A{xy y) is said to be Hermitian-symmetric (or simply symmetric) if
At* x) = A(x, y) (4)
for arbitrary vectors x and y. Given a symmetric Hermitian form A(x, y) in an w-dimensional complex space C„, suppose we use (3) to write A(x, y) in terms of the components of the vectors x and y with respect to the basis ex,. . . , en. Then
aik = Afo, <?*) = A(efc,     = ow, (5)
i.e., the matrix of the form A(x,y) in the basis ex, . . . , en is carried into itself by transposing the matrix and replacing all its elements by their complex conjugates. Conversely, if the coefficients of a Hermitian form A(xfy) satisfy the condition (5), then A(x, y) is symmetric, since
My,x) == 2       == 2 aJZtfi = 2 aki^i = Mx, y)-
i.k=l i.k=l i.k=l
A matrix \\aik\\ such that aik = aki (/, k = 1, . . . , n) will henceforth be called Hermitian-symmetric (or simply Hermitian).
9.14. a. Suppose the Hermitian form A(x,y) has the matrix A(e) = \\aik\\ in the basis ex,. . . , en of the space C and the matrix A(f) = in the basis/i,. . . ,/„, where the relation between the two bases is given by
fi-2pf^ (i=l,...,»).
sec. 9.1
herm1tian forms 249
Then, reasoning as in Sec. 7.15, we find that the relation between the matrices A[9) and A{f) is given by the formula
A{f) — P*A{e)P, (6)
where P = Wp^W is the matrix of the transformation from the basis ex, . . . , en to the basis fXi. . . and P* is the matrix obtained from P by transposing and then replacing elements by their complex conjugates. Writing P* = \\p*li)\\t we have _
P?"=pP (U = !,•••,»)■ b. Just as in Sec. 7.23, it follows from (6) that the rank of the matrix A(t) of the Hermitian form A(jc, y) is independent of the choice of the basis {e}. The form A(x,y) is said to be nonsingular if its rank (i.e., the rank of the matrix A{e) in any basis {e}) equals the dimension n of the space Cn. If the form A(x, y) is nonsingular, then, given any vector x0 0, there is a vector y0 e Cn such that A(x0, y0) ^ 0 (cf. Sec. 7.15c).
9.15. a. By a Hermitian quadratic form in a complex space C we mean the function of one variable x eC obtained by changing y to x in any Hermitian bilinear form A(x, y). It follows from Sec. 9.12b that in an n-dimensional complex space Cn with basis ex, . . . , en, a Hermitian quadratic form can be expanded in terms of the components ■ ■ ■ , of the vector x by the formula
n _
A(x, x) = 2 a*Mk (7)
with complex coefficients aik. Conversely, a function A(x, x) of the form (7) is the Hermitian quadratic form obtained by changingy to x in the Hermitian bilinear form
71
A(x, >0 = 2 flft^ft-».ft=l
b. If a Hermitian bilinear form A(x, 7) is symmetric, so that aik == aw, then the corresponding Hermitian quadratic form A(x, x) is also said to be symmetric. A symmetric Hermitian quadratic form A(x, x) can only take real values, since it follows from (4) that
A(xf x) — A(x, x).
Unlike the situation in Sec. 7.22, there is a unique Hermitian bilinear form A(xy y) corresponding to a given Hermitian quadratic form A(x, x). In fact,
A(x + y, x + y) = A(x, x) + A(x, y) + A(y, x) + A(y, y), A(x + iy, x + iy) — A(x, x) — iA(x, y) + iA(y, x) + A(yt y).
250     unitary spaces
chap. 9
Multiplying the first equation by i and then subtracting the second equation from the first, we easily find that
A(*> y) = ^ [M* + y,x + y) + i A(x + iy, x + iy)]
- [A(x, x) + A(y, y)],
so that A(x, 7) is uniquely determined in terms of the values A(x, x), A(_y, .y), A(x + y, x -r y) and A(x + iy, x + iy) of the given Hermitian quadratic form.
If the Hermitian quadratic form A(x, x) has the representation
n _
A(x, x) = 2 aiklHk
i.k=l
in some basis ex, . . . , en, then the Hermitian bilinear form
A(x, >>) = 2 an&Mk
obviously reduces to A(x, x) if we make the substitution y — x. Moreover, as just shown, this is the unique Hermitian bilinear form reducing to A(x, x) under this substitution.
9.16. a. Given a symmetric Hermitian quadratic form A(x, x) in an n-dimensional complex space Cn, there exists a basis in C„ in which A(x, x) can be written in the canonical form
A(x, x) =2Afc^fc =ZA*
(8)
with real coefficients Xx, X2, . . . , Xrt.
The proof of this proposition is analogous to that of Theorem 7.31. Instead of equation (13), p. 186, we have
= 6,
ff     I   ^äm ff   _i_ 1   b,n-\,m ff iff
Sx ~r       c2 -f- ■ ■ • -f- —       i,m_i -j-
+ A^x, x)
(bmm 7^ 0), where Ax(x, x) is a symmetric Hermitian quadratic form in the variables     £2,. . . , £m_x. Instead of the transformation (14), p. 187, we
sec. 9.1
herm1tian forms 251
now have the transformation
=   + &
^3 ~= ^3*
which carries the sum al2^^2 + «12^x^2 («12 # 0) into the expression
(a12 + «12)^ifi ™ '(«12 - «12)^2^2 + ■ ' ■ ,
where at least one of the two (real) coefficients al2 + ^12 and i(al2 — al2) is nonzero.
b. The law of inertia (Theorem 7.91) continues to hold for a symmetric Hermitian quadratic form A(x, x) in a complex space, i.e., the total numberp of positive coefficients and the total number q of negative coefficients among the numbers Xx, X2, . . . , Xn do not depend on the choice of the canonical basis. The proof of this proposition is the exact analogue of that of Theorem 7.91. As in the real case, the number p is called the positive index of inertia and the number q the negative index of inertia of the form A(x, x).
It should be noted that the law of inertia does not hold for quadratic (as opposed to Hermitian quadratic) forms in a complex space Cn. For example, the quadratic form
a(x, x) = g + g
is transformed into
A(x, X) = 1)1 - 7]1
by the coordinate transformation
c. Given a symmetric Hermitian quadratic form A(x, x) in a space C„, a canonical basis can always be found such that the corresponding canonical coefficients can only take the values ±1. In fact, having reduced the form A(jc, x) to the form
A(x, x) = Xx Kl2 H----+ X„ \t]v\2 - (xx h^J2 - ■ ■ ■ - (x„ hP+0|2,
where the numbers Xx, . . . , Xs, (xx, . . . , \iQ are all positive, we make another coordinate transformation
252     unitary spaces
chap. 9
thereby reducing A(x, x) to the form
A(x, x) = |Tl|' + ■ - ■ + |Tj)|» - It^J»-----|t„J»
(cf. Sec. 7.93).
9.17. a. The vector x1 is said to be conjugate to the vector yx with respect to the Hermitian bilinear form A{x, y) if
7i) = 0.
If the vectors xx, x2, . . . , xk are all conjugate to the vector yly then every vector of the linear manifold L(xx, x2, • . . , xk) spanned by xly x2, . . . ,xk is also conjugate to yx (cf. Sec. 7.42c). In general, a vector yx conjugate to every vector of a subspace C C is said to be conjugate to the subspace C. The set C of all vectors ^eC conjugate to the subspace C is obviously a subspace of the space C. This subspace C is said to be conjugate to C.
A basis ex, e2,. . . , en of the space Cn is said to be a canonical basis of the form A(x, y) if
A(et, ek) = 0   for  i ^ k.
Every symmetric Hermitian bilinear form A(x, y) has a canonical basis. In fact, let er, e2,. . . , en be a basis in which the corresponding quadratic form A(x, x) can be written in the canonical form
n _
A(x, x) = J hlili,
where
n
*=]
Then, by Sec. 9.15b, the bilinear form A(x, y) takes the canonical form
= 2 x^f-
n
X*   if   i = 0   if l^k.
b. Suppose the principal descending minors o\, S2, ■ ■ ■ > ^n_x of the matrix \\aik\\ of a symmetric Hermitian quadratic form A(x;, x) are all nonvanishing. Then, just as in Sec. 7.52, we can use Jacobi's method to construct a canonical
A(x, y) =
in this basis, where
y = \
and hence
sec. 9.1
herm1tian forms 253
basis for A(x, x), and the canonical coefficients of A(x, x) are given by the same formulas
(Sn = det ||flrt||) as on p. 195.
•
c. A symmetric Hermitian bilinear form A(x, y) is said to be positive definite if A(jc, x) > 0 for every x^O. Just as in the real case (Sec. 7.94), an equivalent condition is that all the canonical coefficients of A(x, x) be positive, or alternatively, that p = w, where p is the positive index of inertia of the form A(x, x).
Just as in Theorem 7.96, a necessary and sufficient condition for the form A(x, y) to be positive definite is that
\ > 0, S2 > 0,. . . , Sn > 0
(Sylvester's conditions). The proof given on p. 209 carries over without change to the complex case.
9.18. a. Given a nonsingular symmetric Hermitian bilinear form (x, y), we can introduce the concept of the adjoint of a linear operator (with respect to the form (x, y)), just as in Sec. 7.6. First we note that if A and B are linear operators in the space Cn, then the forms
A(x, y) = (Ax, y),      B(x, y) = (x, By)
are Hermitian bilinear forms, whose matrices are related to the matrices of the operators A and B (in any canonical basis of the form (x, y) with canonical coefficients z^) by the formulas
n       __ _    „()) U        __ _ him'
ujm rnurn J u}m }uj
(the notation is the same as in Sec. 7.61). Conversely, given two Hermitian bilinear forms A(x,y) and B(x,y), then, just as in Sec. 7.62, there exist unique linear operators A and B such that
A(x, y) = (Ax, y),      B(x, y) = (x, By).
b. It follows, just as in Sec. 7.63, that given any linear operator A acting in the space C„, there exists a unique linear operator A* acting in Cn such that
(Ax,y) = (x,A*y)
-for arbitrary x, yeCn. The matrices \\a^\\ and of the operators A
and A* in any canonical basis of the form (x, y) with canonical coefficients Zj are related by the formula
ai     — — am .
254     unitary spaces
chap. 9
The operator A* is called the adjoint (or Hermitian conjugate) of the operator A with respect to the form (x, y).
c. The operation leading from an operator A to its adjoint A* has the following properties (cf. Sec. 7.64):
1) (A*)* = A for every operator A;
2) (A + B)* = A* + B* for every pair of operators A and B;
3) (XA)* = XA* for every operator A and every number X e C;
4) (AB)* = B*A* for every pair of operators A and B.
9.19. a. As in Sec. 7.71, two complex spaces C' and C" equipped with nonsingular symmetric Hermitian bilinear forms A(x',y') and A(x",y"), respectively, are said to be A-isomorphic if the spaces C'and C are isomorphic regarded as linear spaces over the field C (see Sec. 2.71) and if
A(x',/) = A(x",y")
for all corresponding pairs of elements x', /eC and x", y" e C".
b. Theorem. Two finite-dimensional complex spaces C' and C, equipped with nonsingular symmetric Hermitian bilinear forms A(x',y') and A(x'\y")y respectively, are A-isomorphic if and only if they have the same dimension and the indices of inertia p', q' of the form A(x', y') coincide with the corresponding indices of inertia p", q of the form A(x", y").
Proof Precisely the same as that of the analogous proposition for real spaces (Theorem 7.93). |
c. In particular, two ^-dimensional complex spaces and C^, equipped with positive definite forms A(x',y') and A(x", y")y respectively, are always A-isomorphic (cf. Sec. 7.97).
9.2. The Scalar Product in a Complex Space
9.21. It will be recalled from Sec. 8.21 that the scalar product of two vectors x and y in a real space is taken to be any fixed symmetric positive definite bilinear form (x, y). The corresponding quadratic form (x, x) is then positive for every nonzero vector x, and can be used to define the length of x (see Sec. 8.31). In a complex space, any symmetric positive definite Hermitian bilinear form has the analogous property (see Sec. 9.17c). This leads to the following definition: A complex linear space C is said to be a unitary space if it is equipped with a symmetric positive definite Hermitian bilinear form (x, y), called the (complex) scalar product of the vectors x and
sec. 9.2
the scalar product in a complex space 255
y, i.e., if there is a rule assigning to every pair of vectors x^eCa complex number (x, y) such that
a) (y, x) = (x, y) for every x, y eC;
t>) (x, y + z) = (x, y) + (x, z) for every x,y,ze C;
c) (Xx, y) = X(x, y) for every x, y e C and every complex number X;
d) (jc, x) > 0 for every x^O and (x, x) = 0 for x = 0.
Axioms a)-c) imply the general formula
(k m \ km _
1=1 ■ 3 = 1 / »'=lj = l
where xx, . . . , xk, yx, . . . , ym are arbitrary vectors of the space C and ax, . . . , afc, (31( . . . , f$m are arbitrary complex numbers.
9.22. Examples
a. In the ^-dimensional space Cn (Sec. 2.15b) we define the scalar product of the vectors x =      £2,. . . , £n) and y = (t^, tj2, ■ ■ ■ , tq„) by the formula
(X, y) = ^7)X + £2^2 H----+ IrJln-
The reader can easily verify that axioms a)-d) are satisfied in this case.
b. In the space C(a, b) of all continuous complex-valued functions on the interval a < t < b (Sec. 2.15d) we define the scalar product of the functions x = x(t) and y — y(t) by the formula
(x, y) = | x(t)y(t) dt.
Axioms a)-d) are then immediate consequences of the basic properties of the integral.
9.23. Basic metric concepts. Next we introduce various metric concepts in a unitary space C, just as was done in the case of a real Euclidean space (Sec. 8.3).
a. The length of a vector. As in the real case, by the length (or norm) of a vector x in a unitary space C we mean the quantity
1*1 — +v (*, *).
Every nonzero vector has a positive length, and the length of the zero vector equals 0. For any complex X, we have the equality
|Xjc| = \JQoc, lx) = XX(jc, x) = |X| v(*, *) = \M |*|»
256     unitary spaces
chap. 9
which shows that the length of a vector x multiplied by a numerical factor X equals the absolute value of X times the length of x.
A vector x of length 1 is said to be a unit vector. Every nonzero vector can be normalized, i.e., multiplied by a number X such that the result is a unit vector. In fact, we need only choose X such that
just as on p. 217.
The set of all vectors x e C such that \x\ < 1 is called the unit ball in C, while the set of all x e C such that |jc| — 1 is called the unit sphere.
b. The Schwarz inequality. The inequality
\{x,y)\< \x\\y\ (9)
holds for every pair of vectors x and y in C. The idea of the proof is the same as in the real case (Sec. 8.33), except that we must now be careful about complex numbers. The inequality (9) is obvious if (x, y) = 0. Thus let {x,y)J=0. Clearly,
(Xjc — y, Xx — y) > 0
for arbitrary complex X. Expanding the left-hand side, we get
|X|* (x, x) - X(x, y) - x (*77) + (y, y) > 0. (10)
Let y be the line in the complex plane determined by the origin and the complex number (x, y), and let y' be the line symmetric to y with respect to the real axis. Suppose X varies over the line y', so that X = tzQ, where t is real and
, -KA
l(*, y)\
is the unit vector determining the direction of y'. Then
Hx,y) = *\(x,y)\
is real, and hence
X (x, y) = X(x, y),
so that the inequality (10) becomes
t*(x,x) -2t\(x,y)\ + (y,y)> 0. (11)
The same argument as in Sec. 8.33 now leads to the desired inequality (9).
If equality holds in (9), then the trinomial in the left-hand side of (11) has a unique real root t0 (of multiplicity two). Replacing tz0 by X, we find that the trinomial in the left-hand side of (10) has the root X0 = t^0. Therefore
(X0x - y, Xo* - y) = 0
and hence y — \pz, so that the vectors x and y differ only by a (complex) numerical factor.
sec. 9.2
the scalar product in a complex space 257
c. Orthogonality. Although the concept of the angle between two vectors is not introduced in a unitary space, we still consider the case where two vectors x and y are orthogonal, which means, just as in the real case, that
(x, y) = 0.
If x and y are orthogonal, then obviously
(y>x) = '(*> y) = flit is easily verified that the analogues of Lemmas 8.36a-b and the Pythagorean theorem (Sec. 8.37) remain valid for orthogonal vectors in a unitary space. Moreover, the analogue of the expansion theorem of Sec. 8.51 also holds, i.e., given a finite-dimensional subspace C' c C and a vector/ which is in general not an element of C, there exists a unique representation
/=* + *.
where geC and h is orthogonal to C. The set of all vectors h orthogonal to the subspace C is itself a subspace, which we call the orthogonal complement of the subspace C and denote by C". Just as in Sec. 8.51, we see that the original space C is the direct sum of the subspace C and its orthogonal complement C".
d. The triangle inequalities. If x and y are two vectors in a unitary space C, then, by Schwarz's inequality (9),
\x + y\2 = (x + y,x + y) = (x, x) + (x,y) + (*77) + (y,y) (<{x,x) + 2 \(x,y)\ + (y,y) < (|x| + \y\)\ [ >(x, x) - 2 \(x,y)\ + (y,y) > (\x\ -
or
|x + ^|< |x| + \y\, (12) \x + y\> \\x\ (13) As in the real case, these inequalities are called the triangle inequalities.
9.24. Orthogonal bases in an n-dimensional unitary space Cn. According to Sec. 9.16a, the symmetric Hermitian bilinear form (x, y) has a canonical basis ex, e2,. . . , en in the w-dimensional space Cn, and in this case the condition
(eu ek) = 0     (i^ k)
for the basis to be canonical reduces to the orthogonality condition. Moreover, the orthogonal basis vectors ex, e2,. . . , en can be regarded as normalized, so that
258     unitary spaces
chap. 9
Let
n n k=l k=l
be any two vectors in Cn, with components ^,k, t]k (k = \, . . . , n) with respect to the basis ex, e2f . . . , en. We then get the following formula for the scalar product (x, y) in terms of the components of x and y:
n
(X y)
;.=i
9.25. a. As shown in Sec. 9.18a, the formula
A(x, 7) = (Ax, >>)
establishes a one-to-one correspondence between Hermitian bilinear forms A(x, y) and linear operators A acting in the space Cn. In any orthonormal basis ex, e2,. . . , en of the space Cn, the matrix ||a,-m|| of the form A(x, y) and the matrix \\ajm)\\ of the operator A, where
ajm = ^(ej> em)i
n
are related by the formula
a    = ali)
b. Let A be any linear operator acting in the space Cn. Then, as shown in Sec. 9.18b, there is a unique operator A*, the adjoint of A with respect to the scalar product (x, y), such that
(Ax,y) = {x,A*y)
for arbitrary x, y e Cn. Since any orthonormal basis is a canonical basis for the form (x,y), with canonical coefficients zi — 1, the matrices \aS^\ and of the operators A and A* are related by the formula
In other words, the matrix of the operator A* is obtained from that of the operator A by "Hermitian transposition," i.e., by transposition followed by replacing all elements of the matrix by their complex conjugates. Correspondingly, we call the matrix of A* the Hermitian conjugate (or adjoint) of that of A.
c. As in Sec. 8.96a, if the subspace C c C is invariant under the operator A, then the orthogonal complement of C is invariant under the adjoint operator A*.
sec. 9.3
normal operators 259
9.26. A coordinate transformation in an w-dimensional unitary space C„ leading from one orthonormal basis to another is called a unitary transformation. Unitary transformations are analogous to orthogonal transformations in a Euclidean space (see Sec. 8.94). If eu . . . , en and fx, . . . ,fn are orthonormal bases in Cn and if U = \\uki]\\ is the matrix of the corresponding unitary transformation, so that
fc=l
then obviously
n    ~r~    (l   if  * = j<,
7f=l 10     if      i ^ j.
Conversely, if the numbers uki] satisfy the conditions (14), then the matrix IItt^'II is a unitary matrix, i.e., the matrix of a unitary transformation.
The linear operator U corresponding to a unitary matrix is called a unitary operator. Just like an isometric operator in a real space, a unitary operator in a complex space does not "change the metric." In other words, if
n »
x = 2 5a,    r = 27)^>>
then
(Uv, Uy) = 2 ZfiitVet, Vet) = 2 5*],(/«,/,) = 2 ^ = (*, 7)-
The matrix V of the inverse transformation from the basis flt... ,fnto the basis ex,. . . , en is also unitary. Moreover, if K = ||t>[f,||, we have
»r=(Aa  4° = (^A) = wf.
Thus the inverse of a unitary matrix is obtained by first transposing and then going over to complex conjugate elements. Therefore
u-1 - U*
for a unitary operator U, or equivalently,
U*U = UU* = E.
9.3. Normal Operators
9.31. Definition. An operator A acting in an w-dimensional unitary space C„ is said to be normal if it commutes with its own adjoint, i.e., if
A*A = AA* (15)
260     unitary spaces
chap. 9
(cf. Sec. 8.92c). An example of a normal operator is given by any operator A whose eigenvectors ex, . . . , en, satisfying the relation
Ae} = \fr      (j = 1,. . . , w),
form an orthogonal basis in Cn. In fact, the matrix of the operator A in the basis ex,. . . , en is then of the form
Xx 0 ... 0 0    X2   ... 0
0   0    ... X,
But, by Sec. 9.25, the matrix of the operator A* in the same basis ex, is just
lx   0    ... 0
(16)
0
0
(17)
0   0   ... xn
from which is is obvious that the operators A and A* commute.
9.32. Theorem. Every eigenvector xofa normal operator A with eigenvalue X is an eigenvector of the operator A* with eigenvalue X.
Proof. Let P Cn be the subspace consisting of all eigenvectors of the operator A with eigenvalue X. If x e P, then
AA*x = A*Ax = A*(Xx) = XA*x,
which implies A*xeP. Hence P is invariant under the operator A*. Moreover,
(A*x, y) = (x, Ay) = (x, Xy) = (Xx, y)
for arbitrary x, y e P, and hence
A*x — Xx. |
9.33. a. Theorem. Given any normal operator A acting in a unitary space Cn, there exists an orthonormal basis ex, . . . , en in Cn consisting of eigenvectors of A.
Proof. The normal operator A, like every linear operator in the space C„, has an eigenvector (see Sec. 4.95b). Let ex be an eigenvector of A with eigenvalue X, and let P Cn be the subspace consisting of all eigenvectors of A with this eigenvalue X. If P is the whole space C„, then we need only arbitrarily augment ex with vectors e2,. . . , en to make an orthonormal basis
sec. 9.3
normal operators 261
for Cn, thereby proving the theorem. Thus suppose P ^ Cn, and let Q be the orthogonal complement of P in Cn. The subspace P is invariant under the operator A*, as in the proof of Theorem 9.32 (in fact, A* carries every vector x e P into the vector Xx). It follows that Q is invariant under the operator A itself, because of Sec. 9.25c and the fact that (A*)* = A (see Sec. 9.18c). We can now prove the theorem by induction. In fact, suppose the theorem is true for every space Cn of dimension n < k. Then it is also true for Cft+1, since to get an orthonormal basis for Cfc+1 consisting of eigenvectors of A, we need only choose such a basis in the subspace Q (such exists by the induction hypothesis, since the dimension of Q is </c) and then augment this basis by any orthonormal basis in P. The proof is now complete, since the theorem is obviously true for the one-dimensional space Cx. |
b. It follows from Theorem 9.33a that every normal operator A is diagonalizable (see Sec. 4.72f). In fact, A has the diagonal matrix
A ~
\ 0 0 A,
0 0
0 0
in the orthonormal basis constructed in Theorem 9.33a, consisting of eigenvectors of A. The eigenvalues of A lie on the principal diagonal of this matrix, each appearing a number of times equal to the dimension of the corresponding characteristic subspace (cf. p. 110). Hence the characteristic polynomial det \\A — Xis|| of the operator A, which as we know is independent of the choice of basis (see Sec. 5.53), has the form
det IIA ~ X£||
II
X)f\
(18)
where Xx,. .. , Xm are the distinct eigenvalues of the operator A and rx, . . . , rm are the dimensions of the corresponding characteristic subspaces.
c. On the other hand, suppose it is known that a normal operator A has a characteristic polynomial of the form
det M-XEII -IT(^-X)
Vk
Jfc=l
2>*
(19)
where fjtx,. . . , (xs are distinct complex numbers and px,... ,pk are certain positive integers (multiplicities). Then it can be asserted that the operator A has an orthonormal basis consisting of eigenvectors with eigenvalues (xx, . . . , (x„ where the dimension of the characteristic subspace corresponding to the eigenvalue u., is just p,. In fact, the polynomials (18) and (19) must coincide, by the uniqueness of the characteristic polynomial. But then our assertion
262     unitary spaces
chap. 9
follows from the familiar theorem on the uniqueness of the factorization of a polynomial.
9.34. Self-adjoint operators. An operator A acting in a unitary space C is said to be self-adjoint if A* = A, i.e., if
(Ax, y) = (x, Ay) (20)
for arbitrary vectors x, y eC. Note that A is self-adjoint if and only if the bilinear form (Ax, y) corresponding to A is Hermitian-symmetricf According to Sec. 9.25, the matrix of a self-adjoint operator A in any orthonormal basis coincides with its own Hermitian conjugate, i.e., with the matrix obtained from that of A by transposition followed by taking complex conjugates of all elements. Conversely, every operator A with a Hermitian-symmetric matrix (i.e., a matrix equal to its own Hermitian conjugate) in some orthonormal basis is self-adjoint.
Since a self-adjoint operator A is obviously normal, it follows from Theorem 9.33a that there exists an orthonormal basis ex, . . . , en in the space C„ in which the matrix of the operator A takes the form (16) and that of A* takes the form (17). Hence X, = X,- (j = 1,. . . , n), since A* = A, i.e., the numbers X^ are all real. This proves the following
Theorem. Given any self-adjoint operator A in a unitary space Cn, there exists an orthonormal basis ex, . . . , en consisting of eigenvectors of A with eigenvalues that are all real.
Conversely, every linear operator A in the space C„ with the indicated property is self-adjoint. In fact, A is normal by Sec. 9.31, and comparing (16) and (17) we find that A* — A, since the numbers X,- are all real.
9.35. Antiself-adjoint operators. An operator A acting in a unitary space C„ is said to be antiself-adjoint if A* = —A. The matrix of an antiself-adjoint operator A in any orthonormal basis ex,. . . , en has the following characteristic property:
aik — (A^, ek) = (eit A*ek) = (eit —Aek) = —(Aek, e() = —aki
(i, k = 1,. . . , n).
An antiself-adjoint operator A is obviously normal. Applying Theorem 9.33a, we find that there exists an orthonormal basis ex,. . . , en in the space Cn in which the matrix of the operator A takes the form (16) and that of A* takes the form (17). Hence X^ = —X,- (j = \, . . . , n), since A* = —A,
f In fact, the condition (Ay, x) = (\x,y) is equivalent to (20). For this reason, a self-adjoint operator might also be called Hermitian-symmetric.
sec. 9.4
applications to operator theory in euclidean space 263
i.e., the numbers X, are all purely imaginary. This proves the following
Theorem. Given any antiself-adjoint operator A in a unitary space Cn, there exists an orthonormal basis ex, . . . , en consisting of eigenvectors of A with eigenvalues that are all purely imaginary.
Conversely, every linear operator A in the space C„ with the indicated property is antiself-adjoint.
9.36. As in Sec. 9.26, an operator U acting in a unitary space C„ is said to be unitary if U*U = UU* = E. In particular, every unitary operator is normal. Applying Theorem 9.33a, we find that there exists an orthonormal basis ex, . . . , en in the space Cn in which the matrix of the operator U takes the form (16) and that of U* takes the form (17). Hence X.,X,- = 1 (j — 1, .. ., n), since U*U = E, or equivalently,
|X,| - 1      (j= \,...,n).
This proves the following
Theorem. Given any unitary operator U in a unitary space Cn, there exists an orthonormal basis ex,. . . , en consisting of eigenvectors of the operator U with eigenvalues that are all of absolute value 1.
Conversely, every linear operator U in the space Cn with the indicated property is unitary.
9.4. Applications to Operator Theory in Euclidean Space
9.41. Embedding of a Euclidean space in a unitary space. As in Sec. 8.21, let R be a (real) Euclidean space with scalar product (x, y). Consider the complex space C consisting of the formal sums x + iy where x, y e R, with the following natural operations of addition and multiplication by arbitrary complex numbers:
(*i + i>i) + (*2 + W = (*i + *2) + i(yi + yd, (a + i$(x + iy) = (ax - (3y) + /(ay + $x).
Then it is easily verified that C has all the properties of a complex linear space.
We now identify the vectors jc + iO with the vectors x e R, calling them real vectors of the space C. The vectors 0 + iy will be denoted simply by iy and called purely imaginary vectors. By the complex conjugate of the vector x + iy, written x + iy, we mean the vector x — iy.
Next we introduce a scalar product in the space C, defined by the formula
(*i + '>i> *2 + iyd = [(*!, x2) + (yx, y2)] + i[(yx, x2) - (xx, y2)].
264     unitary spaces
chap. 9
It is easily verified that this scalar product satisfies axioms a)-d) of Sec. 9.21. In particular,
(x + iy, x + i» = (x, x) 4- (y, y).
Thus the space C contains the space R as a subset, equipped with the same scalar product, and subject to the same operations of addition and multiplication by real numbers. Note that every orthonormal system (or basis) eu . . . , en in the space R is also an orthonormal system (or basis) in the space C.
9.42. Every linear operator A specified in the space R can be extended into the space C by the formula
A(x + iy) = Ax + iAy, (21)
where the operator A is obviously a linear operator in the space C. The matrix of the operator A in the space C relative to a basis ex,. . . , en eR coincides with the matrix of the operator A in the space R relative to the same basis, since, according to (21),
Aes — Aei     (j = 1,. .. ,«).
This extension from A to A preserves algebraic relations between linear operators, i.e., if A + B = D in the space R, then A + B = D in the space C, while if AB = D in the space R, then AB = D in the space C. This follows for example from the fact that matrices are preserved under the extension from A to A.
9.43. Let A' be the adjoint of the operator A in the real space R (see Sec. 8.91). Then the extension A' of the operator A' into the space C is just the operator A* adjoint to the extension A of A. In fact, given arbitrary vectors z = x + iy, w = u + iv e C, we have
(A'(jc + iy), u + iv) - (A'x, u) + /(A>, u) - i(A'x, v) + (A>, v)
= (x, Au) + i(y, Aw) — i(x, Av) + (y, Av) = (x + iy, A(u + iv)),
as required.
In particular, the extension of a symmetric operator (A' = A) is a self-adjoint operator (A* = A), the extension of an antisymmetric operator (A' = —A) is an antiself-adjoint operator (A* = —A), and the extension of an isometric operator (U' = U_1) is a unitary operator (U* = U_1). Finally, the extension of a normal operator (A'A = AA') is again a normal operator (A* A = AA*).
sec. 9.4
applications to operator theory in euclidean space 265
9.44. Structure of a real normal operator. Let ct and t be real numbers. Then the easily verified matrix equality
ct2 + ta 0
ct t
-t ct
CT —t t «7
CT     —t
t a
ct t — t ct
0
ct2 + t2
(22)
shows that the matrix
— t ct
commutes with its own transpose (and hence a fortiori with its own adjoint); more generally, the same is true of the quasi-diagonal (real) matrix
°1 Tl -tx ctx	
	— ^2 CT2
rn	7ft		
	m		
			
			
(23)
of order 2m + r- m = /«|r.
Theorem. CP/yen any normal operator A z'« a real Euclidean space Rn, there exists an orthonormal basis flt. . . yfn gR„ in which the matrix of A is of the form (23), with m + r — w, w/tere f/re numbers X, = ct,- + re, (y — 1, . . . , w) awf/ \„+i> . . . , Xr are uniquely determined by A. yac7, f/r&ye numbers are the roots of the characteristic equation
det \\A - X£|| = 0, (24)
and each root of (24) appears in the matrix (23) a number of times equal to its multiplicity.
Proof. As in Sec. 9.41, we construct the unitary space Cn whose scalar product is the extension of the scalar product (x, y) defined in the space Rn.
266     unitary spaces
chap. 9
We then use (21) to extend the operators A and A' into the space Cn. As shown in Sec. 9.43, the extensions of A and A' are the normal operator A and its adjoint A*. Let \\aik\\ denote the matrix of the operator A relative to any orthonormal basis ex, . . . , en in the space Rn (the numbers aik are real).f Then the operator A has the same matrix relative to the basis ex, . . . , en in the whole space C„. Since the characteristic equation (24) has real coefficients, if is an imaginary root of (24), then so is the complex conjugate X,. Bearing this in mind, we write the sequence of distinct roots of (24) in the form
Xx, Xj, . . - , Xj,, Xp, Xp+1, ..., x?,
where the roots Xx,. . . , X„ are imaginary and the roots Xp+1>. . . , \ are real. Then, by Sec. 9.33b, the space Cn can be represented as a direct sum of orthogonal subspaces
A^, Aj,. .. , Ap, Aj,, Ap^j,.. . , A0,
where A:- consists of all eigenvectors of the operator A corresponding to the eigenvalue X, and A^ consists of all eigenvectors of A corresponding to the eigenvalue X,, while
Ap+l = A-p+X) . . . , A5 — Aq. If z = x + iy e A}, then the equation Az = \p becomes
n k=X
in component form (with respect to the original basis ex> . . . , en), where
Taking the complex conjugate and recalling that the numbers ajk are real, we get
n       _        _ _
2aJk^>k = \^>j k=X
This means that the vector z — (£x, . . . , £„) is also an eigenvector of the operator A with eigenvalue X,. It follows that the operation of taking the complex conjugate carries the space A3 into the space A^.
Now let Xx = gx + hx> where tx 0 since Xx Xx, and let gx be any unit vector in Ax, so that gx e Ax. Moreover, let
fx = -j= (gx + gx)>      h =        (gx ~ §i)>
so that
gx = At (fx + 1/2)* (fi ~
x/2 V2
t In the course of the proof, we will construct a new orthonormal basis flr. . . ,/„ for R„ in which A has a matrix of the form (23).
sec. 9.4
applications to operator theory in euclidean space 267
where the vectors fx and f2 are obviously real, and moreover orthonormal, since it follows from
gi) = tel. 81) = 1 >      tei» Si) = 0
that
(A, A) = (A,A) = \ i(gv gi) + tel. £)] = i,
(A> A) = - z tei + ft. gi - so = - ^ ttei. gi) - tel. gi)l = o.
2i 2i
Since
AA = AA = j= (Agl + Agx) = -±r (X,gx + lxgx)
— ~ K°i + 'Ti)(A + (A) + Oi — «Ti)(A — 'A)] = aifi — TiA, AA - AA = -j=-. (Agx - Ag\) =      (Xxgx - lxgx) = txA + <hA,
we see that the operator A transforms the plane of the vectors fx, f2 into itself and has the matrix
°i Ti
-Ti <*i
(25)
in the basis fx,f2. If the dimension of Ax is greater than 1, we choose another unit vector g2 e Ax orthogonal to gx, with complex conjugate g2 e Ax (the latter is automatically orthogonal to g2). Repeating the above construction for g2 and g2, we get a new pair of real vectors/3,/4 which are linear combinations of g2, g2 and hence orthogonal to the vectors fx, f2 (themselves linear combinations of gx, gx). Clearly A transforms the plane of the vectors fi,fi into itself and has the same matrix (25). Continuing this construction, we eventually get 2m orthonormal real vectors /i,/2, . - . >/2m-i>.Am. where m is the sum of the dimensions of the subspaces Ax,. . . , Av and A transforms the plane of the vectors f2j-X,f2j into itself, with either the same matrix (25) or the analogous matrix obtained by replacing gx, ~x by ak, tk (k = 2t... ,p).
Next consider the subspace Ap+1 corresponding to the real root Xj,+l = Xp+1. The operation of taking the complex conjugate obviously carries the subspace A^ into itself. Letg be any vector in A„+1, and let g be its complex conjugate. There are just two possibilities, namely, the vectors g and g are either linearly independent (in C„) or linearly dependent. If g and g are linearly independent, then so are the real vectors
/-^;(g + g).   f = -jr.ig - g)-V2 V21
Likeg and g, these vectors belong to Ap+1, and hence are eigenvectors of the
268     unitary spaces
chap. 9
operator A with the same eigenvalue Xp+1. On the other hand, if g and g are linearly dependent, then
g = e2i*g     (0 < 9 < 7r), since g and g have the same length. Therefore
so that the vector
/ = e*g
is real. Moreover, since/belongs to A^, like ^ itself,/is an eigenvector of A with the same eigenvalue X^. Thus, in any event (continuing this construction if necessary), we can always find a basis in AJ)+1 consisting of real vectors. Applying the orthogonalization theorem (Theorem 8.61) to this basis, we finally get first an orthogonal and then an orthonormal basis in A^j. Clearly the operator A transforms Am into itself and has the diagonal matrix
0
0
0
0
0
0
110+1
(26)
in the orthonormal basis. Repeating this construction for the remaining subspaces A^, . . . , A„, we eventually obtain a set of orthonormal vectors /2m+i)/2m+2. ■ ■ ■ >/«> which together with the previously constructed vectors fi->/2-> ■ ■ ■ ,/2m f°rm a full orthonormal basis for R„. To complete the proof, we need only take account of the special form of the typical blocks (25) and (26), compensating for the somewhat different indices in (23) which refer to roots which are not necessarily distinct. |
The geometric meaning of a normal operator can be deduced from this theorem. First we observe that the operator with matrix
-T a
in the basis fltf2 can be interpreted as a rotation accompanied by an expansion in the plane of the vectors fi,f2. In fact, we need only note that
a t
— t ct
V*2 +
Va2 + t2     Jo2 +
/   2   i     2       /   2   I 2
= M
cos a sin a —sin a   cos a
sec. 9.4
applications to operator theory in euclidean space 269
where the effect of the matrix
cos a   sin a —sin oc   cos a
is to rotate every vector in the fx,f2 plane through the angle a, while M is clearly the expansion coefficient. Recalling (23), we now see that the total effect of the normal operator A is to produce rotations accompanied by expansions in m mutually orthogonal planes and expansions only (by factors of
Am+i' ■ ■ ■ ' \> respectively) in the r — m directions orthogonal to these planes and to each other.f
9.45. The structure of a real symmetric operator. Let A be a symmetric operator acting in a real space Rn) so that A' =• A. Then the extension A of the operator A into the unitary space Cn is self-adjoint, i.e., A* — A. The eigenvalues Xx, . . . , X„ of a self-adjoint operator are all real (see Sec. 9.34). Hence there are no blocks of the form (25) in the representation (23), and all that remain are diagonal elements. This proves the following
Theorem. Given any symmetric operator A in a real Euclidean space Rn, there exists an orthonormal basis in R„ consisting of eigenvectors of A.
Geometrically, a symmetric operator produces expansions (by factors of Xx,. . . , X„, respectively) along each of n orthogonal directions. The numbers Xx,. . . , Xn are the roots of the characteristic equation (24). Hence the characteristic equation corresponding to a symmetric matrix A — \\aik\\ must havew (not necessarily distinct) real roots and no imaginary roots at all.
9.46. The structure of a real antisymmetric operator. If A is an antisymmetric operator acting in R„, so that A' = —A, then the extension A of the operator A into the space C„ is antiself-adjoint, i.e., A* = —A. The eigenvalues Xx, . . . , X„ of an antiself-adjoint operator are all purely imaginary (see Sec. 9.35). Hence the blocks (25) in the representation (23) take the special form
0
(/ = 1, 2,. . . , m),
—Tj 0
while the numbers Xm+1, Xm+2,... , Xr must all be 0. This proves the following
t The expansion is actually a contraction if 0 < Vo2 + t2 < 1 or if 0 < X* < 1. Moreover, expansion by a factor >,k < 0 is actually an expansion accompanied by a reflection.
270     unitary spaces
chap. 9
Theorem. Given any antisymmetric operator A in a real Euclidean space Rn, there exists an orthonormal basis in R„ in which the matrix of A takes the quasi-diagonal form
0 tx -tx 0	
	0 t2 -t2 0
o -m		
0		
	0	
		0
0
Conversely, if the matrix of an operator A is of the form (27) in some orthonormal basis, then A is antisymmetric (Sec. 8.92b).
Geometrically, an antisymmetric operator produces rotations through 90° followed by expansions (by factors of tx, . . . , t,„, respectively) in m mutually orthogonal planes, while mapping into 0 all vectors orthogonal to these planes.
9.47. The structure of a real isometric operator. If A is an isometric operator acting in Rn, so that A' = A-1, then the extension A of the operator A into the space C„ is unitary, i.e., A* = A~'. The eigenvalues Xl5 . . . , X„ of a unitary operator are all of absolute value 1 (see Sec. 9.36). Hence the blocks (25) in the representation (23) take the special form
cos a, sin —sin otj cos
and the numbers Xm+1, . . . , Xr must all be ± 1. This proves the following
Theorem. Given any isometric operator A in a real Euclidean space R„, there exists an orthonormal basis in R„ in which the matrix of A takes the
problems 271
quasidiagonal form
cos «i   sin «i — sin ax   cos at	
	cos oc2   sin oc2 —sin a2   cos <xt
COS 3t,„	sin am		
— sin am	cos a„,		
		±1	
			±1
±1
Geometrically, an isometric operator A produces a rotation through a certain angle (with no accompanying expansion) in each of m mutually orthogonal planes, and acts like the operator E or —E in each of the r-m directions orthogonal to these planes and to each other. However, we can combine every pair of such directions with identical expansion coefficients (both +1 or both —1) into a plane in which the operator A also produces a rotation (through 0° or 180°). Making all such combinations, we find that if n is odd, then some last direction has the coefficient 4-1 or —1, while if*n is even, there may be two ungrouped directions with coefficients +1 and — 1. The presence of —1 among these remaining coefficients shows that besides the indicated rotations there is an additional reflection with respect to some coordinate plane, for example, the plane orthogonal to the basis vector en. We then have det A = — 1, whereas det A ~ +1 if there is no such reflection.
PROBLEMS
1. A self-adjoint operator acting in a unitary space Cn is said to be nonnegative (or positive) if all its eigenvalues \,.. . X„ are nonnegative (or positive). Show that the square of every symmetric operator is nonnegative.
2. Show that given any self-adjoint nonnegative (or positive) operator A, we can find a unique nonnegative (or positive) operator B, the "square root of the operator A," such that B2 = A.
272     unitary spaces
chap. 9
3. Take the square root of the operator A specified by the matrix
13 14
4
A =
14  24 18
4   18 29
in an orthonormal basis ex, e2, ez.
4. Let A be an arbitrary linear operator acting in a unitary space Cn, and let A* be its adjoint. Prove that A*A is a nonnegative operator. Prove that A'A is a positive operator if A is nonsingular.
5. Given that a linear operator A is the product SQ of a self-adjoint operator
5 and a unitary operator Q, prove that S2 = AA*.
6. Show that every nonsingular linear operator A can be represented as the product SQ of a self-adjoint operator and a unitary operator.
7- Prove that the representation of the operator A as a product SQ in Problem
6 is unique.
8. A linear operator V acting in Cn is said to be nonexpanding if | Vjc| < |jc| for every x. Prove that every linear operator A can be represented as the product of a self-adjoint operator and a nonexpanding operator.
9. Show that two self-adjoint operators A and B commute if and only if they have a common system of n mutually orthogonal eigenvectors.
10. Given a linear operator A acting in the space C„, find an orthonormal basis in which the matrix of A has the triangular form
a\
i
a
n
0 a*
,(2)
a
(2)
A -
n
0
0
a"
n
chapter 10
QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
10.1. Basic Theorem on Quadratic Forms in a Euclidean Space
10.11. We begin with the following theorem concerning symmetric bilinear forms in a Euclidean space:
Theorem. Every symmetric bilinear form \(x,y) in an n-dimensional Euclidean space RTi has a canonical basis consisting of orthogonal vectors.
Proof Consider the linear operator A corresponding to the given symmetric bilinear form (see Sec. 8.91). The operator A is also symmetric. According to the theorem on symmetric operators (Theorem 9.45), the space R„ has an orthonormal basis consisting of the eigenvectors of the operator A, and the matrix of A is diagonal in this basis. Since this matrix is also the matrix of the bilinear form A(jc, y), the orthonormal basis just found is a canonical basis of A(jc, y). |
10.12. We now apply this result to the study of quadratic forms. Given a quadratic form
n
Mx> *) = 2 ai££x     (aik = aki), (1)
we will regard the numbers £ls £2> as the components of a vector x
in an «-dimensional Euclidean space Rn, with a scalar product defined by the formula
n
(x, y) = 2 Stf*
273
274     quadratic forms in euclidean and unitary spaces
chap. 10
where y = (7]x, 7)2, . . . , t)^. The basis
ex = (1,0, . . . ,0), e2 = (0, 1,...,0),
ew = (0,0,...,l) is an orthonormal basis in Rn, and clearly
n n
x = y = S7)'^-
! = 1 » = 1
Now consider the bilinear form
A(x, .y) = 2
corresponding to the quadratic form (1). By Theorem 10.11, this form has an orthonormal basis/ls/2,. . . ,/„. If the components of the vectors x and y are tx, t2, . . . , t„ and 6X, 02, . . . , 0„, respectively, in this basis, then we can write the bilinear form A(jc, y) as
n
A(x, y) = 2 XfT^i and the quadratic form A(jc, jc) as
A(x, x) = 2 X^, (2)
The transformation from the basis ex> e2, . . . , en to the basis/i,/2s . . . ,/„ is given by
h^lqfe,     0 = 1, 2,...,, 7),
where 0 = ll^f'll is an orthogonal matrix (Sec. 8.93). According to the formulas (36), p. 240, the relation between the components tx, t2, . . . , t„ and £x, £2, . . . ,    is given by the system of equations
0 = 1,2,...,*), (3)
involving the transposed matrix Thus we have proved the following important
Theorem. Every quadratic form (1) in an n-dimensional Euclidean space R„ can be reduced to the canonical form (2) by making an isometric coordinate transformation (3).
sec. 10.1 basic theorem on quadratic forms in a euclidean space 275
10.13. The sequence of operations which must be performed in order to construct the coordinate transformation (3) and the canonical form (2) of the quadratic form (1) can be deduced from the results of Sees. 4.94 and 9.45. We now give this sequence of operations in final form:
a) Use the quadratic form (1) to construct the symmetric matrix A = .
b) Form the characteristic polynomial A(X) = det (A ~~ X£) and find its roots. By Sec. 9.45, this polynomial has n (not necessarily distinct) real roots.
c) From a knowledge of the roots of the polynomial A(X), we can already write the quadratic form (1) in canonical form (2); in particular, we can determine its positive and negative indices of inertia.
d) Substitute the root Xx into the system (28), p. 110. For the given root Xx, the system must have a number of linearly independent solutions equal to the multiplicity of the root Xx. Find these linearly independent solutions by using the rules for solving homogeneous systems of linear equations.
e) If the multiplicity of the root \ is greater than unity, orthogonalize the resulting linearly independent solutions by using the method of Sec. 8.61.
f) Carrying out the indicated operations for every root, we finally obtain a system of n orthogonal vectors. We then normalize them by dividing each vector by its length. The resulting vectors
J\ ~\H\ > Ht '••■>"« h
f   _ (nW   „<2> „(2K
Jn " Wl   i H2   i • • • » Hn )
form an orthonormal system.
g) Using the numbers qlj}, we can write the coordinate transformation (3).
h) To express the new components tx, t2, . . . , t„ in terms of the old components     £2> • • • >     we write
T^i^i 0-1,2,...,«),
>'=i
recalling that the inverse of the orthogonal matrix Q is the transposed matrix Q'.
10.14. In Sec, 7.33a we saw that neither the canonical form nor the canonical basis of a quadratic form is uniquely defined in an affine space; in general, any preassigned vector can be included in the canonical basis of the quadratic form. The situation is quite different in a Euclidean space, provided that only orthonormal bases are considered. The point is that the matrix of the quadratic form and the matrix of the corresponding symmetric linear operator transform in the same way, as already noted in Sec. 8.91.
276     quadratic forms in euclidean and unitary spaces
chap. 10
Thus a canonical basis for the quadratic form is at the same time a basis consisting of the eigenvectors of the symmetric operator, and the coefficients of the quadratic form relative to the canonical basis (the "canonical coefficients") coincide with the eigenvalues of the operator. But the eigenvalues of the operator A are the roots of the equation det (A — XE) ~ 0, an equation which does not depend on the choice of a basis and is an invariant of the operator A. Hence the set of canonical coefficients of the form (Ax, x) is uniquely defined. As for the canonical basis of the quadratic form (Ax, x), it is defined with the same arbitrariness as in the definition of a complete orthonormal system of eigenvectors of the operator A, i.e., apart from permutations of the eigenvectors, we can multiply any of them by —1, or more generally, we can subject them to any isometric transformation in the characteristic subs pace corresponding to a fixed eigenvalue X.
10.2. Extremal Properties of a Quadratic Form
10.21. Next, given a quadratic form A(x, x) in a Euclidean space R„, we examine the values of A(x, x) on the unit sphere (x, x) = 1 of the space Rn, and inquire at what points of the unit sphere the values of A(x, x) are stationary. It will be recalled that by definition a differentiable numerical function f(x), defined at the points of a surface U, takes a stationary value at the point x0 e U if the derivative of the function f(x) along any direction on the surface U vanishes at the point x0. In particular, the function f(x) is stationary at the points where it has a maximum or a minimum.
The problem of determining the stationary values of a quadratic form on the unit sphere is a problem involving conditional extrema. One method of solving the problem is to use Lagrange's method,f as follows: We construct an orthonormal basis in the space R„ and denote the components of the vector x in this basis by c,x, £2,. . . , In this coordinate system, our quadratic form becomes
A(x, x) = 2 atj&tZfr and the condition (x, x) = 1 becomes
• n
Using Lagrange's method, we construct the function
• ■ •»In) = 2 aikl&* -x 2 55»
+ See e.g., R. Courant, Differential and Integral Calculus, Vol. II (translated by E. J. McShane), Interscience Publishers, Inc., New York (1956), p. 190.
sec.  10.2 extremal properties of a quadratic form 277
and equate to zero its partial derivatives with respect to %t (/' = 1,2,..., n), recalling that aik = aki:
2 iflfl&-2X5,-0     (i = l,2,...,«).
After dividing by 2, we obtain the familiar system
(au — X)£x + «X252 + • • • + aXn\n = 0,
«21^1 + («22 — *)^2 + ' • ' + a2n%n = 0, «nl5l + »»2^2 +----H (flBB ~ X)£n = 0
(cf. p. 110), which serves to define the eigenvectors of the symmetric operator corresponding to the quadratic form A(x, x). It follows that the quadratic form A(x,x) takes stationary values at those vectors of the unit sphere which are eigenvectors of the symmetric operator A corresponding to the form A(x, x).
10.22. We now calculate the values which the form takes at its stationary points. To do this, we introduce the corresponding symmetric operator A and write the quadratic form as
A(x, x) = (Ax, x).
Suppose that A(x, x) takes a stationary value at the vector ev Since we have just shown that ei is an eigenvector of the operator A, i.e., Ae{ = X^,, we have
A(ef, e%) = (Aeif et) = X^, et) = X,.
Hence the stationary value of the form A(x, x) at x = et equals the corresponding eigenvalue of the operator A. Since the eigenvalues of the operator A are the same as the canonical coefficients of the form A(x, x), we can conclude that the stationary values of the form A(x, x) coincide with its canonical coefficients. In particular, the maximum of the form A(x, x) on the unit sphere is equal to its largest canonical coefficient, and the minimum of A(x, x) on the unit sphere is equal to its smallest canonical coefficient.
10.23. Quadratic forms and bilinear forms can both be considered not only on the whole «-dimensional space R„, but also on a ^-dimensional subspace Rfc c RM, and we can then look for an orthonormal canonical basis in Rfc. Let the quadratic form A(x, x) have the canonical form
A(x, x) = -kxl\ + X2y + • " • + X„S
(4)
278     QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
chap. 10
in the whole space R„, and the canonical form
in the subspace Rk. We now find the relation between the coefficients (Xj, (x2, . . . , and the coefficients Xx, X2, . . . , X„. For convenience, we assume that the canonical coefficients are arranged in decreasing order, i.e., that
Xx > X2 > • • • > X„,      jxjl > ja2 > • • • > (xfc.
As we know, the quantity Xx is the maximum value of the quadratic form A(x,x) on the unit sphere of the space Rn; similarly, (xx is the maximum value of A(x, x) on the unit sphere of the subspace Rk. This implies that fxx < Xx. Moreover, we also have [xx > X„_fc+1. To see this, let elt e2, . . . , en be the canonical basis in which A(x, x) takes the form (4). Consider the (n — k + l)-dimensional subspace R' spanned by the vectors eu e2i. . . , en_k+v Since k + (n — k + 1) > n, then, by Corollary 2.47c, the subspaces R' and Rft have at least one nonzero vector in common. Let this vector be
■^0 = (£l '>■•■> ^>n—k+li 0> ■ ■ ■ » 0),
and assume that x0 is normalized, i.e., that |x0| = 1. According to (4), we have
A(.Xo, xQ) = Xx(<;x0))2 + ■ • ■ + X„_fr+1(^°!fc+x)2
> Xn_fc+1(EX0))   + " * " + X.„_fc+i(^«—fc+l)   = ^n-k+V
This implies that fxx, the maximum value of the quadratic form A(x, x) on the unit sphere of the subspace Rk, cannot be less than X„_fc+X, as asserted. Thus the quantity fxx satisfies the inequalities
Al > ^1 > An-fc+l- (5)
10.24. Naturally, the quantity fxx takes different values for different A>dimensional subspaces. We now show that there exist k-dimensional subspaces for which the equality signs hold in (5). Let R' be the subspace spanned by the first k vectors ex, e2i. .. , ek of the canonical basis of the form A(x, x). Then A(x, x) is just
A(x, x) = Xxa + X2£22 + • • • + Kll in the basis ex, e2, . . . , ek of R'. In particular,
A(ex, ex) = Xx = max A(x, x).
a-eR-
Thus the quantity
ij,x = (Ai(Ra.) :  max A(x, x) takes its maximum value Xx for Rk = R'.
sec. 10.2
extremal properties of a quadratic form 279
Next let R" be the subspace spanned by the last k vectors en_k+l, en-k+2->-' ■ > en °f the canonical basis of the form A(x, x). Then A(x, x) is just
A(x, x) = Xn_A.+i^„_A.+i + ■ * ■ + X^J;2, in the basis en_k+l, .. . , en of R". In particular,
Afcv-k+i, en_k+l) = Xw_m = max A(x, x),
l*l=i
JT6R"
and, just as before, we conclude that \ix takes its minimum value Xn_t+1 for Rj. = R". Thus we obtain the following new definition of the coefficient X„_fc+1: The coefficient Xn_fc+1 in the canonical representation of the quadratic form A(jc, x) equals the smallest value of the maximum of A(x, x) on the unit spheres of all possible k-dimensional subspaces of the space Rn.
10.25. Using this result, we can estimate the other canonical coefficients of the quadratic form A(;t, x) on the subspace Rfc. For example, if the sub-space Rfc is fixed, then (x2 is the smallest value of the maximum of A(x, x) on the unit spheres of all the (k ~ l)-dimensional subspaces of Rk, while X„_fc+2 is the smallest value of the maximum of A(x, x) on the unit spheres of all the (k — l)-dimensional subspaces of the whole space R„. Hence we have (x2 > \-k+2> and similarly
On the other hand, X2 is the smallest value of the maximum of the quadratic form A(jc, x) on the unit spheres of all the (n — l)-dimensional subspaces of the whole space R„. But, according to Corollary 2.47c, the intersection of every (n — l)-dimensional subspace with the subspace Rfc is a subspace of no less than (« — l) + /c — n ~ k — 1 dimensions, so that X2 is no less than the smallest value of the maximum of A(x, x) on the unit spheres of all such subspaces; in particular, X2 is no less than jx2, the smallest value of the maximum of A(x, x) on the unit spheres of all the (k — l)-dimensional subspaces of Rfc. Therefore we have X2 > (x2, and similarly X3 > u.3,. . . , Xfc >      Thus the canonical coefficients     jx2, . . . , fxfc satisfy the inequalities
Xx > fxx > Xn_fe+1,
X2 > }x2 > X„_fc+2, (6)
\> m > X„.
For k ~ n — 1, the inequalities (6) become
X2 > (x2 > X3, (7)
280     quadratic forms in euclidean and unitary spaces
chap. 10
*10.26. Consider the behavior of the quadratic form
Mx, x) =2x^2
in the (n — l)-dimensional subspace       specified by the equation
ai£i + <*2£2 + ' * ' +        =0      (a2 + a| + • ■ ■ + a2 = 1). (8)
Assuming that all the coefficients Xx, X2, . . . , X„ are different, we can calculate the coefficients u.x, (x2, . . . , u.„_x by using a method due to M. G. Krein. At least one of the coefficients ax, oc2, . . . , a„ is nonzero. For example, suppose ocn ^ 0. Then (8) implies
j  n—1
= - — 5>&-
a„ *=i
Substituting this expression for into A(x, jc), we find that A(jc, jc) has the form
in the subspace R„_x, in terms of the variables £2, . . . , £„_r The canonical coefficients of this quadratic form are the same as its stationary values on the unit sphere of the subspace R^ (Sec. 10.22). In the variables £x> £2). . .   £n_x this sphere has the equation
B(x, x) = l\ + H + • • • + g_x + ^f£*^V = 1-
Just as before, we determine these stationary values by using Lagrange's method. Thus we form the function
n—X _ -\ /«—1
A(x, x) - XB(x, x) = 2 (** ~ ^ + -JLT-~ 2
and equate to zero its partial derivatives with respect to £fc (A: = 1, 2, . . . . w — 1), obtaining
- X) + (I        a, = 0. (9)
The required coefficients fxx, (x2,. . . , (x„_x are the roots of the equation obtained by equating to zero the determinant D(X) of the system of linear equations (9). The coefficient matrix of this system is clearly the sum of two
sec. 10.2
extremal properties of a quadratic form     zoi
matrices; the first matrix is diagonal with the numbers Xfc — X (fc = 1, 2,. . . , n — 1) along the diagonal, while the second matrix has the form
axa2 oc2oc2
axan_x a2a„_x
By the linear property of determinants (Sec. 1.44), the determinant Z>(X) is the sum of the determinant of the first matrix and all the determinants obtained by replacing one or more columns of the determinant of the first matrix by the corresponding columns of the second matrix and taking account of the factor (X„ — X)/o^. Since any two columns of the second matrix are proportional, we need only consider the case where one of the columns of the determinant of the first matrix is replaced by the corresponding column of the second matrix.
In particular, if the A:th column of the first matrix is replaced by the kth column of the second matrix, the resulting determinant has the form
	Xx-X	0	0		0	0
	0	x3 - X •	0		0	0
x„-x	0	0	Afc-1 ' A		0	0
2 a*	0	0	0	txktxk	0	0
	0	0	0	*fcafc+l	x,^-x ■	0
	0	0	0		0	Aw-1 -
2 n (h - a)
an      Afr ~ A
Denote the determinant of the first matrix by
F(X) = n (X, - X),
k=l
and let
G(X) - IT -
282     QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
CHAP. 10
Then the required determinant D(X) becomes
1        "Z.1 a
D(X) = F(k) + - G(X) J —*- - (10)
art X,. - X
Solving the equation D(k) = 0, we find the quantities \iu u,2, . . . , (xn_x in which we are interested. Note that these quantities depend on the squares of the numbers cnk rather than on the numbers afc themselves. Thus changing the sign of one or more coefficients in (8) does not change the canonical coefficients of the form A(x, x) in the subspace Rn_x.
*10.27. Equation (10) is of particular interest in that it allows us to construct from given numbers u,2, ■ • • , y-n-i satisfying the inequalities (7) a subspace R„_x in which the form A(x, x) has the canonical coefficients \iu \i2, .. . , (jLn_x. (Again it is assumed that the numbers Xl5 X2, . .. , Xn are distinct.) We now show how this is done.
First we note that (10) can be written in the form
2D(a)     2f(X)   y  4      » a;
G(X)        G(X)    kti lk - X    t-i \. - X
Thus the numbers &\, . . . , cn2n are proportional to the coefficients obtained when we expand the rational function D(X)/G(X) in partial fractions. Now suppose we are given numbers     (x3, . . . , (jl„^.x satisfying the inequalities
Al > Hi > A2,
X, > (x2 > X3, (12)
An-l > \t-n-l > An-
Let
Dx(X)= II((x, -X), and expand the rational function Z>X(X)/G(X) in partial fractions
5&> = _£L_ + _&_ + .+ (13)
G(X)     Xx - X     X2 — X X„ - X
The coefficients cx, c2,. . . , cn are given by the familiar formulaf
_DM_= _^ Di(W
°k    (Xx - Xfc) • • - (X,_x - Xfc)(Xfc+1 - X,) • • • (Xn - Xfc) G'(Xfc) '
t See e.g., R. A. Silverman, Modern Calculus and Analytic Geometry, The Macmiltan Co., New York (1969), p. 861.
sec. 10.3
simultaneous reduction of two quadratic forms 283
and all have the same sign. To see this, we note that the numbers Dx(kx)t Dx(k2), ■ • • » Di(K) alternate in sign, since, by hypothesis, the roots of the polynomial Dx(k) alternate with the roots of the polynomial G(X). Thus the numbers Dx(kk)lG'(kk), and hence the coefficients ck {k = 1,... , «), all have the same sign. By supplying an extra factor, we can assume that the ck are all positive and add up to 1. We can then define the numbers by the formulas
<H = ci>      af = c2, . . . , o£ = cn, (14)
where each <xfc can have either sign.
Finally we show that the subspace Rn_i defined by the equation
+ a2^2 + ' ' ' + Ct^„ = 0
is the required subspace, in which the quadratic form A(x, x) has the canonical coefficients \xx, \x2i. . . »|a„-si. In fact, as proved above, the polynomial Z)(X) whose roots are the canonical coefficients of A(jc, x) in the subspace R„_x is given by formula (10) or the equivalent formula (11). Comparing (11) with (13) and using (14), we find that the polynomial Z>(X) differs only by a numerical factor from the polynomial Dx{k) just constructed. But then the roots of D(X) coincide with the numbers (xls (x2, . . . , \xn^x, as required.
Remark. It can be shown that the numbers cnx,..., <xn depend continuously on the numbers Xlt. . . , Xn, \xx, . .. , \in^x. Using this fact, we can verify that the problem can still be solved if the numbers jxx,. . . , (xn_x satisfy the inequalities (7) instead of (12) or if the numbers Xx, . . . , X„ are no longer distinct.
10.3. Simultaneous Reduction of Two Quadratic Forms
10.31. The following question plays an important role in certain problems of mathematics and physics: Given two quadratic forms A(x, x) and B(x, x) defined in an n-dimensional affine space R„, how does one find a basis in which bothA(x, x) andB(x, x) are reduced to canonicalform (i.e., to sums of squares of the components of x with certain coefficients)? The following example in the plane (« = 2) shows that this problem does not always have a solution:
Consider the two forms
A(x, x) = l\ - H B(x, x) = Z&2.
Finding a common canonical basis for these two forms is the same as finding a common pair of conjugate vectors for the hyperbolas A(x, x) — 1 and B(x, x) = 1 (see Sec. 7.42). Since these are equilateral hyperbolas, we know from analytic geometry that the conjugate directions of the hyperbolas are
284     quadratic forms in euclidean and unitary spaces
chap. 10
symmetric with respect to their asymptotes. Therefore the polar angles cpx and 9a corresponding to the pair of conjugate directions satisfy the relation
?1 + ?2 = ~
for the first hyperbola and the relation
9i + <p2 = 0
for the second hyperbola (both relations hold only to within an integral multiple of n). Since the two relations are mutually exclusive, there does not exist a common pair of conjugate vectors in this case.
It turns out that the problem of simultaneous reduction of two quadratic forms does have a solution if we make the supplementary assumption that one of the forms, say B(x, x), is positive definite, i.e., that B(x, x) > 0 for x ^ 0. In this case, the existence of a solution is easily proved as follows: Let B(x, y) be the symmetric bilinear form corresponding to the quadratic form B(jc, x), and introduce a Euclidean metric in the affine space R„ by writing
(x,y)=^B(x,y).
The fact that B(x, y) is symmetric and positive definite guarantees that (x, y) satisfies the axioms for a scalar product. By Sec. 10.11 there exists an ortho-normal basis (with respect to this metric) in which A(jc, x) takes the canonical
f°rm
A(x, x) = X^2 + X^2. + ■ ■ ■ + XnS, (15)
where £1} £2> ■ ■ ■ > \n denote the components of the vector x in the basis just found. In the same basis, the second quadratic form B(jc, x) becomes
B(x, x) = (x, x) = ti\ + fit + ■ ■ • + rfn,
by formula (17), p. 222. Hence, as asserted, there exists a basis in which both A(x, x) and B(jc, x) have canonical form.
10.32. To construct the components of the vectors elt . . . , en of the basis which is simultaneously canonical for both quadratic forms, we use the extremal properties of quadratic forms. As shown in Sec. 10.21, the vectors ex,. . . , en of the required basis are the vectors obeying the condition
(x, x) — B(x, x) = 1
for which the form A(x, x) takes stationary values. Suppose A(x, x) and B(jc, x) are given by
n
M.x, x) = 2 aq&&k>
B(x,x)=- f bikUk
i,k=l
sec. 10.3
simultaneous reduction of two quadratic forms 285
in the original basis. Using Lagrange's method, we form the function
n n Fill, 12, - ■ ■ , In) = 2 aiklilk ~ (x 2 bi*&&k>
and then equate to zero its partial derivatives with respect to all the ^:
Z«*4 - n2>**£* = 0     0=1,2,..., n). (16)
k=l
The resulting system of homogeneous equations
(au - jx^i)^ + (an - ]xbn)l2 H-----h (fli„ - (x6ln)^n = 0,
(«21 ~ + («22 - ^22)^2 H-----h («2„ - ^2n)£« = 0,
(«Bi - H*Bi)5i + (««2 - (A2)£2 H-----r- («„„ - |a6bb)5„ = 0
has a nontrivial solution if and only if its determinant vanishes:
«11 - \*t>Vl     «12 - (a*12     * ' *     Bin" V&ln
(17)
«21 — ^21     «22 — V-b
22
«nl — V^nX    ««2 — V-bn2
«2B — V-hn
ann —
0.
(18)
Solving (18), we find n solutions \x = (A: — 1,2,...,«). Then substituting (xfc into the system (17), we find the components £2fc),. .. , £nfc) of the corresponding basis vector ek. The results of Sec. 10.31 guarantee that (18) has « real roots and that every root of multiplicity r corresponds to r linearly independent solutions of the system (17).
10.33. Turning to the calculation of the canonical coefficients, we now show that the coefficients Xx, X2,. . . , X„ in the canonical representation (15) of the form A(x,x) coincide with the corresponding roots fjtx, u,2,... , fx„ of the determinant (18). We could use an argument like that given in Sec. 10.22, but we prefer to carry out a direct calculation. Given the root \xm, we multiply the 2th equation of the system (16) by Qm) (the ith component of the solution corresponding to (xm) for i — 1,2,...,« and then add all the resulting equations, obtaining
Mem, O - 2 aikl{r%m) = ^ 2 bik^%m) = V-JSiem, ej = \Lm, (19)
i.k=i
i,k=l
since B(em, em) = 1. On the other hand, if r)<m), 7)<m),. . . , yj^1' are the canonical components of the vector em, then obviously 7)<m) = 0 if i' ^= m
286     quadratic forms in euclidean and unitary spaces
chap. 10
while y)^' = 1, and hence
A(c«,0 =lHr&m)? -Xm. (20)
i=l
Comparing (19) and (20), we get (xm = Xm, as asserted. This result allows us to write A(x, jc) in canonical form, without calculating the canonical basis.
10.34. The problem posed in Sec. 10.31 of simultaneously reducing two quadratic forms A(x, x) and B(x, x) to canonical form, where one of the forms, say B(x, x), is positive definite, was solved in a rather strong form, i.e., we reduced B(x, x) to a sum of squares with coefficients equal to 1. In general, this is not required, and hence the coefficients of the canonical forms are not uniquely determined. Nevertheless, as we now show, the ratios of the corresponding canonical coefficients are still independent of the means used to simultaneously reduce A(x, x) and B{x, x) to canonical form.
Suppose that A(x, x) and B(x, x) have been simultaneously reduced to canonical form in two different ways, i.e., suppose that in the variables li, £2, ■ ■ * , In we have
A(x, x) = 2 B(x, x) = 2
while in the variables y)1( v]2, .. . , y]b we have
n n
A(x, x) = 2 P^5,     B(x, x) = 2 1=1 1=1
Since the form B(x, x) is positive definite, the numbers and (/ = 1, 2, . . . , n) are all positive. Consider the new coordinate transformation
Then the forms A(x, x) and B(x, x) become
A(x,x)=2^!?,      Bixtx) = ih
in the variables L and
A(x, x) = 2 " 1?,      B(x, x) = 2 ft
1 = 1 t2 2 = 1
in the variables rti. Let elt e2,. . . , en be the basis corresponding to the
variables ^, and let fx,f2, ■ ■ ■ ,fn he the basis corresponding to the variables rti. Both these bases are orthonormal in the metric determined by the form B(x, x). Moreover, according to Sec. 10.14, the set of canonical coefficients of the quadratic form A(x, x) is uniquely determined. Hence the two sets of
sec. 10.4        reduction of the general equation of a quadric surface 287
numbers Xjv^ X2/v2, . .. , X„/v„ and Pi/t^, p2/-r2, . . . , p„/t„ must coincide, except possibly for order, and our assertion is proved.
10.4. Reduction of the General Equation of a Quadric Surface
10.41. In this and subsequent sections, we will call the elements of the rt-dimensional linear space R„ points rather than vectors (cf. Sec. 2.17), which is more in keeping with the geometry of the situation. By a quadric (or second-degree) surface in Rn is meant the locus of the points x =
£2» • ■ ■ » £n) G Rn which satisfy an equation of the form
I aikU* +22 b& + e = 0 (21)
i.fc=l 1=1
or
A(x, x) + 2L(x) + c = 0, A(x, x) = 2
ij a quadratic form in the components of the radius vector of the point x,
L(x) - J bit*
is a linear form, and c is a constant.^
We will assume that the space R„ is Euclidean and that the numbers £i, £2>- ■ ■ > are the coordinates ofthe points with respect to an orthonormal basis. The problem of this section is then to choose a new orthonormal basis in Rn such that our quadric surface is specified by a particularly simple equation, called the canonical equation of the surface. Subsequently, we will use the canonical equation to study the properties of the surface.
10.42. First of all, as in Sec. 10.12, we make an orthogonal coordinate transformation
Zt^llWi     0' = 1,2.....n) (22)
in R„, reducing the quadratic form A(x, x) to the canonical form
n
A(xf x) = 2 Mi*
t In the case n = 2, the geometric object defined by (21) is called a second-degree curve. However, we will henceforth always use the word "surface," despite the fact that, strictly speaking, it should be changed to "curve" whenever n = 2.
288     QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
chap. 10
Substituting (22) into (21), we get
2      + 22 hra + c - 0, (23)
where the lt (i ~ 1, 2, . . . , ri) are the new coefficients of the linear form
L(x).
If X; z£ 0 for some i in (23), we can eliminate the corresponding linear term by appropriately shifting the origin of coordinates. For example, if \ # 0, we have
\      xx/ kx
We then set
Ai
which is equivalent to shifting the origin to the point
("t0-0.....°)-
As a result of this substitution, the pair of terms \i)2 + 21^1 is changed to
I2
m;2 - f2,
Xi
i.e., the quadratic term has the same coefficient as before, the linear term disappears, and /J/XJ is subtracted from the constant term. After making all such transformations, the equation of the surface becomes
W + Mi + ' ■ ■ + V)2 + 2/r+1y]r+1 + ■ ■ • + 2/„7)„ + c = 0.
Here, for simplicity, we have dropped the primes on the variables f\'., and we have renumbered the variables in such a way that the variables appearing in the quadratic form come first, i.e., Xx, X2, . . . , Xr are nonzero and X^. = 0 for k > r. If r = n or if the numbers lr+1, /r+2, ...,/„ all turn out to be zero, we obtain the equation
Art! + W + ■ • • + Kflr + C = 0, (24)
called the canonical equation of a central surface. A quadric surface is said to be nondegenerate if all n variables appear in its canonical equation, and degenerate if less than n variables appear in its canonical equation. A nondegenerate central surface, with canonical equation
+ + ' ■ ■ + Kt\n + C = 0 (25)
(i.e., such that r = «), is said to be a proper central surface if c y= 0 and a conical surface if c — 0. The meaning of this terminology will be apparent later.
sec. 10.5
geometric properties of a quadric surface 289
Now suppose at least one of the numbers L+1, lr+2,... , ln is nonzero, and carry out a new orthogonal coordinate transformation by using the formulas
tx = v)!, T2 ~ 752»
tr - 7)r, (26) M
where M is a positive factor guaranteeing the orthogonality of the transformation matrix. Since the sum of the squares of the elements of every row of an orthogonal matrix must equal 1, we have
M2 = /2+1 + /2+2 +■■■ + /!.
The remaining rows (i.e., rows r + 2, r + 3, . . . , n) can be arbitrary, provided only that the resulting matrix is orthogonal (see Sec. 8.95). As a result of the transformation (26), the equation of the surface takes the form
A1T1 + ' ' * + ArTr ~ 2MTr+l — C.
If c ^ 0, another shift of the origin given by the formula
Tr+1 " Tr+1    2M '
or
2M<+X = 2M-r+1 - c,
allows us to eliminate the constant term. Then, dropping the prime on tjlfx, we obtain the equation
Xrf + ■ ■ ■ + Xrt2 = 2Mrr+1, (27) called the canonical equation of a noncentral surface.
10.5. Geometric Properties of a Quadric Surface
10.51. The center of a surface. By a center of a surface is meant a point
with the following property: If the point
290     quadratic forms in euclidean and unitary spaces
chap. 10
lies on the surface, then the point
(Si       Sl> Sj?      Sa> ■ ■ ■ j Sn ^n)*
which is symmetric with respect to x0, also lies on the surface. A surface with the canonical equation (24) has at least one center, since every point for which
% = t)2 - • ■ • = 7)r = 0 (28)
is obviously a center. This explains why such surfaces are called central surfaces.
We now show that a surface with the canonical equation (24) has no centers other than the point (28), a fact that will be used later. To see this, let (£°, S2> • ■ • > S°) ^e a center of the surface. Then the relation
MS? + Si)2 + ^2° + £2)2 + ■ ■ • + MVr + lTf + C = 0
implies
U% - Si)2 + MS2 - S2)2 + ■ ■ ■ + *,(S? - Sr)2 + e = 0. Subtracting the first equation from the first, we obtain the equation
X^Ri + a2^2 + ■ • ■ + = 0, (29)
satisfied for arbitrary £1} £2, . . . , corresponding to points on the surface (24). If the point (£{ + Q + £2, . . . , £° + £„) lies on the surface (24), then so does the point        - ^, £° + £2, . .. , £° + But
-a - 5i = s + (-25? - so,
and hence we have
MR-25? - Si) + U^2 + ■ ■ ■ + Kl%T = 0, (29')
as well as (29). Subtracting (29') from (29), we get
i\ll(li + 3) = 0,
which implies Si = —SJ ^ ^=1 ^ ^- ®ut s'nce Si can De replaced by —Si, we also have — Si = —SJ- This, together with £T = — contradicts the assumption that £° 7^ 0, thereby proving that = 0. Similarly, we find that
« = •-.=8=0,
as required.
10.52. Proper central surfaces. Consider a proper central surface, i.e., a surface with canonical equation (25), where c ^0. Dividing by c, we transform (25) into the form
sec. 10.5
geometric properties of a quadr1c surface 291
where the numbers at are defined by
ai = +J~     (i = 1,2,.. .,n),
v Aj
and are called the semiaxes of the surface. Renumbering the coordinates in such a way that the positive terms appear first, we get
2 2
3» + * +
2   1      2 ^
a1 a2
2     - 2
, TL* _ 3*±!
.2 „2
fc+i
0*
= 1.
(30)
It is natural to exclude the case k = 0 from consideration, since there are no real values v)1} 7)2, .. • , v\n satisfying (30) if k = 0. (In this case, one sometimes says that (30) defines an "imaginary" surface.) This leaves n different types of proper central surfaces, corresponding to the values k — 1,2,.
n.
a. In the two-dimensional case {n = 2), we have k = 1, k equation (30) leads to the two curves
2, and
(*= 1)
(* = 2)
?!
4-i
«2
2 2
2i + 3-2 = l
«1 tf2
familiar from analytic geometry.
b. For n = 3 we have k = 1, A; = 2, A:
(a hyperbola), (an ellipse),
3, and the corresponding proper
central surfaces in three-dimensional space are given by the equations
{k= 1) (k - 2) (/c = 3)
5Í a?
íl a2,
2 2
Hi , ^
3?
5?
2 2 2
1,
1.
a*,
a;
We now remind the reader of the construction of each of these three surfaces.
Consider the sections of each of the surfaces made by the horizontal planes y)3 = Caz (— oo < C < oo). These sections are respectively hyperbolas
(* = 1)
292     quadratic forms in euclidean and unitary spaces
chap. 10
with the 7^-axis as transverse axis, ellipses
\ = 1 + C2
defined for all values of C, and ellipses
jU i j* = i _ r2
defined only for \C\ < 1. To locate the vertices of these sections, we construct the sections of each surface made by the coordinate planes t\x — 0, 7)2 = 0. In the case k — 1, only the coordinate plane t\t = 0 gives a real section, i.e., the hyperbola
The vertices of the hyperbola formed by the horizontal sections lie on this curve, and as a result of the construction we obtain the surface shown in Figure 2, called a hyperboloid of two sheets.
In the case k = 2, the sections made by both planes tjj = 0 and 7)2 = 0
with the 7)3-axis as transverse axis. The set of ellipses formed by the horizontal sections have vertices lying on these hyperbolas, and form the surface shown in Figure 3, called a hyper boloid of one sheet. Finally, in the case k — 3, the sections made by the coordinate planes 7)x = 0, r\2 = 0 are ellipses. Drawing the ellipses made by the horizontal sections, we obtain an ellipsoid (see Figure 4).
Figure 2
are hyperbolas
^42479
46892856
SEC. 10.5
GEOMETRIC PROPERTIES OF A QUADR1C SURFACE 293
Figure 3
c. Quadric surfaces in spaces of more than three dimensions are not easily visualized. Nevertheless, even in the multidimensional case, we can show essential differences between the types of proper central surfaces corresponding to the different values k — 1,2,...,«. We begin by pointing out differences which are geometrically obvious in three dimensions. On the hyperboloid of two sheets (k = 1), there exists a pair of points which cannot be made to coincide by a continuous displacement of the points along the surface; to obtain such a pair of points, we need only take the first point on one sheet and the second point on the other sheet. On the hyperboloid of one sheet (k = 2), any two points can be made to coincide by means of a continuous displacement along the surface; however, there exists a closed curve, e.g., a curve going around the "throat" of the hyperboloid, which cannot be continuously deformed into a point. On the ellipsoid, {k — 3), any closed curve can be deformed into a point. These facts can
Figure 4
294     quadratic forms in euclidean and unitary spaces
chap. 10
serve as the starting point for classifying the geometric differences between proper central surfaces in an «-dimensional space, as we now show.
We introduce the following definitions: A geometric figure A is said to be homeomorphic to a figure B if there exists a one-to-one, bicontinuousf mapping of the points of the figure A into the points of the figure B. A figure A lying on a surface S is said to be homotopic to a figure B lying on the same surface if the figure A can be mapped into the figure B by means of a continuous deformation, during which the figure A always remains on the surface S.
Using these definitions, we can formulate the geometric differences between the proper central surfaces corresponding to different values of k as follows: For k = 1 we can find a pair of points on the surface which are not homotopic to each other. For k = 2 every point on the surface is homotopic to every other point, but there exists a curve which is homeomorphic to a circle and not homotopic to a point. For k — 3 every curve which is homeomorphic to a circle is homotopic to a point, but there exists a part of the surface which is homeomorphic to a sphere (in three-dimensional space) and not homotopic to a point. Continuing in this way, we can formulate the following distinguishing property of the proper central surface corresponding to a given value of k: Every part of the surface which is homeomorphic to a sphere in (k — l)-dimensional space is homotopic to a point, but there exists a part of the surface which is homeomorphic to a sphere in ^-dimensional space and not homotopic to a point. In particular, this implies that the proper central surfaces in n-dimensional space (which are obviously homeomorphic to each other for equal values of k) are not homeomorphic to each other for distinct values of k. The proof of these facts will not be given here, and can be found in a course on elementary topology.
10.53. Conical surfaces. Next we consider a conical surface, i.e., a surface with canonical equation (25), where c = 0. In this case, equation (25) becomes homogeneous, i.e., if the point (y)x> y]2, ... , ?)„) satisfies (25), then so does the point (f7)1( tf\2,. . . , f7)„) for any t. This means that the surface is made up of straight lines going through the origin of coordinates.£ Just as before, we can write the canonical equation of a conical surface in the form
t Equivalently, continuous in both directions, i.eM continuous with a continuous inverse.
% Except when all the terms in (25) have the same sign, in which case (25) defines a single point, namely the origin.
ak+i
(31)
sec. 10.5
geometric properties of a quadric surface 295
We now find the number of different types of conical surfaces corresponding to a given value of n. If the number of negative terms m — n — k in the canonical equation (31) is greater than w/2, then, multiplying the equation by — 1, we obtain an equation describing the same surface but which now has a number of negative terms less than «/2. Therefore it is sufficient to consider the cases corresponding to the values m < «/2. If m is even, then, excluding the case of a point (m = 0), we obtain n\2 different types of conical surfaces, corresponding to the values m = 1,2,..., «/2. If n is odd, there are (n — l)/2 different types of conical surfaces, i.e., those corresponding to the values m = 1, 2, ...,(« — l)/2.
a. In the plane (n = 2), besides a point, there is only one other type of conical surface (w = 1), with the canonical equation
2 2
3!-2L' = o
The corresponding geometric figure is a pair of intersecting straight lines with the equations
*5i = ±
ax a2
In three-dimensional space (« = 3), besides a point, there is also only one other type of conical surface, corresponding to
n- 1    3-1 « m---- =- = 1,
with canonical equation
2 2 2
Hi + Jls _ ^ = o
ax «2
The corresponding geometric object is a cone. In the particular case where ax — a2f this is a right circular cone (see Figure 5).
b. To visualize the form of a conical surface in the general case, we consider its intersection with the hyperplane
y\n = Can      (~co<C<co). (32)
Substituting (32) into (31), we get
2 2 2 2
^L1 _1_ . . . _1_ -I* _ _ . . . „ "Ow-l „ ^ 2
«1 a*.    ak4l an_x
This is the equation of a proper central surface in an (w — l)-dimensional space. The surfaces corresponding to different values of C are all similar to
296     quadratic forms in euclidean and unitary spaces
chap. 10
Figure 5
each other, with semiaxes proportional to the value of C. Thus every conical surface in the n-dimensional space RM can be obtained from a central surface in the (n — 1)-dimensional space RM_i by displacing the central surface along a perpendicular to R„_i and at the same time proportionately stretching the surface in all directions. Moreover, to obtain all possible types of conical surfaces in this way, we need only use the central surfaces in Rn_x for which the number of negative terms in the canonical equation does not exceed (n - l)/2.
10.54. Nondegenerate noncentral surfaces (paraboloids). Just as in Sec. 10.52, we can reduce the canonical equation of a nondegenerate noncentral surface to the form
aj+-"+3!-4±!—4=*-*].- (33)
ai 0)C      <7fc+i an-l
We now find the number of different types of nondegenerate noncentral surfaces. If the number of negative terms in the left-hand side of (33) is greater than (n — l)/2, then, multiplying (33) by —1, we obtain the equation of the same surface, but with a number of negative terms in the left-hand side which is less than (w — l)/2 and with a change of sign of the right hand side. The sign of the right-hand side is restored by the mirror reflection f\'n = —f\n. Thus, if we do not count surfaces obtained from each other by mirror reflections as being of different types, the number of different types
sec. 10.5 geometric properties of a quadric surface 297
of nondegenerate noncentral surfaces is equal to the number of integers satisfying the inequality 0 < m < (n — l)/2. This number equals n\2 if n is even and (n + l)/2 if n is odd.
a. In the plane (n — 2) there is only one nondegenerate noncentral curve, i.e., the parabola with canonical equation
fil = 2oh)a .    (m = 0).t
b. In three dimensions there are two nondegenerate noncentral surfaces
(.-3,«-±i-i±i-2)
~2 + ~2 = 27]3     (m = 0),
al a2
— - — = 2yi3     (m ----- 1). a\ at
In the first case (m = 0), the sections of the surface made by the plane 7)3 = C > 0 is an ellipse. To find the position of the vertices of this ellipse, we construct the sections of the surface made by the coordinate planes
= 0 and y)2 = 0. Each of these sections is a parabola, and the intersections of these parabolas with the plane yj3 = C locate the vertices of the ellipse. The resulting surface, shown in Figure 6, is called an elliptic paraboloid (a circular paraboloid in the special case where ax = a2).
In the second case (m = 1), the section of the surface made by the plane 7]3 = C > 0 is a hyperbola with the y)x-axis as its transverse axis. To find
r>3
Figure 6
t Note that now m = n — 1 — k.
298     quadratic forms in euclidean and unitary spaces
chap. 10
Figure 7
the position of the vertices, we note that the section of the surface made by the coordinate plane 7]2 = 0 is the parabola
whose intersection with the plane yj3 = C gives the position of the vertices of the hyperbola. The section made by the plane % = C < 0 is a hyperbola with the 7)2-axis as its transverse axis. The vertices of this hyperbola lie on the parabola
rjj = -2a\^
in the plane y]x = 0. The section made by the plane v)3 = 0 is a pair of straight lines, which serve as asymptotes for the projections on the plane y)3 = 0 of all the hyperbolas lying in horizontal sections of the surface. The surface itself is called a hyperbolic paraboloid (see Figure 7).
c. To visualize the form of the surface (33) in the general case, we investigate the way the sections made by the hyperplanes yj„ = C change when C varies from 0 to + oo. Every such section is a central surface in n — 1 dimensions. All these surfaces are similar to each other, and their semiaxes (unlike the case of conical surfaces) vary according to a parabolic law, i.e., are proportional to the square root of C. For C = 0 the central surface
SEC. 10.5
GEOMETRIC PROPERTIES OF A quadr1c SURFACE 299
becomes conical. For C < 0 the central surface goes into the conjugate surface, i.e., the positive and negative terms in the canonical equation exchange their roles. In the special case where the terms of (33) have the same sign, which, to be explicit, we take to be positive, the surface exists only in the half-space t\n > 0.
d. The reason for calling this class of nondegenerate surfaces noncentral is that such surfaces actually have no centers. For n = 3 this is obvious from Figures 6 and 7. To prove the assertion in the general case, assume the contrary, i.e., suppose that the surface (33) has a center (tjJ1, y)°, . . . , rj°). Since, in particular, this center must be a center of symmetry for the section y)„ = y)°, which represents a nondegenerate central surface in n — 1 dimensions, we must have
(cf. Sec 10.51). Thus the center must lie on the 7)n-axis. Now if we go from an arbitrary point (t^, .. . , + S) lying on the surface to the sym-
metric point ( —*h, • • • , —*)„_i, *)° — S), equation (33) must still be satisfied. But the left-hand side of (33) remains the same when we make this transition, and hence its right-hand side cannot change. It follows that S = 0, and hence that there are no points on the surface for which t\n ^ yfn. But (33) obviously has solutions t)x, 7)2,. .. , r\n with t\n ^ rfa. This contradiction shows that our surface cannot have a center.
10.55. Degenerate surfaces. As in Sec. 10.42, by a degenerate surface we mean a surface whose canonical equation contains less than n coordinates. For example, suppose that the coordinate 7)„ is absent in the canonical equation. Then all the sections of the surface made by the (« — l)-dimensional hyperplanes t\n — C (— oo < C < oo) give the same surface in n — 1 dimensions. Therefore every degenerate surface in the n-dimensional space R„ is generated by translating a quadric surface in the (n — X)-dimensional space R„„x along a perpendicular to Rn_x.
a. We now find the appropriate curves in the plane (n = 2). In this case, the canonical equation contains only one coordinate and hence is just
For C > 0 we obtain a pair of parallel lines, for C = 0 a pair of coincident lines, and for C < 0 an "imaginary curve."
b. To construct degenerate surfaces in three-dimensional space (n = 3), we must translate all the second-degree curves in the y)1r)2-plane along the /]3-axis. When this is done, ellipses, hyperbolas and parabolas give elliptic, hyperbolic and parabolic cylinders, respectively (see Figure 8), while pairs
300     QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
CHAP. 10
Figure 8
of intersecting, parallel and coincident lines lead to intersecting, parallel and coincident planes (see Figure 9).
10.6. Analysis of a Quadric Surface from Its General Equation
10.61. We have just described all possible types of quadric surfaces in an H-dimensional Euclidean space, where the type of the surface was determined from its canonical equation. However, the surface is often specified by its general equation (21) rather than by its canonical equation, and it is sometimes important to determine the type of the surface, i.e., construct its
Figure 9
SEC. 10.6
ANALYSIS OF A QUADR1C SURFACE FROM ITS GENERAL EQUATION 301
canonical equation, without carrying out all the transformations described in Sec. 10.42. It turns out that to write down the canonical equation of the surface specified by equation (21), we need only know the following two quantities:
a) The roots of the polynomial
A(X) =
a
21
a
nX
«12 «22 X
««2
of degree «;
b) The coefficients of the polynomial
a
In
a
2n
ann - X
AX(X) =
of degree «.
«11 - A
a
21
a
nl
bx
a
12
«22 ^
a
n2
a
Xn
bx
«2n b2
ann — X bn
To obtain explicit expressions for the coefficients of AX(X), we use the linear property of determinants (Sec. 1.44). Every column of the determinant AX(X), except the last one, can be written as a sum of two columns, the first consisting of the numbers (i = 1,2,...,«;/ fixed) and the number bu the second consisting of n zeros and the number —X. As a result, the determinant AX(X) can be written as a sum of determinants, each of which is obtained by replacing certain columns (except the last one) in the matrix
«11 «12
(hn a,
22
anX «n2
bx b2
°Xn h «2« h
ann bn
(34)
by columns consisting of « zeros and the single element —X, with the number —X appearing on the principal diagonal of the matrix. After expansion with respect to the columns containing the number —X, each of these determinants becomes
302     QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
CHAP. 10
where k is the number of columns containing the element —X, and Mn+l_k is a minor of order n + 1 — k of the matrix Ax. This minor is characterized by the fact that if it uses the fth row (/ = 1, 2,.. . , ri) of Aiy it also uses the rth column, and moreover, it must use the last row and column of Ax. Minors with this property will be called bordered minors. It is obvious that every bordered minor of the matrix Ax appears in the expansion of the determinant AX(X). From this we immediately conclude that the coefficient °f (~^)* in the expansion of the determinant AX(X) in powers of — X equals the sum of all the bordered minors of order n + 1 — k. It is convenient to write the expansion of AX(X) in the form
- an+1 - a„X + a„_xX2----+ ax(~X)",
where the coefficient <xfc is the sum of all the Ath-order bordered minors of the matrix Ax.
10.62. As we already know, the roots of the characteristic polynomial A(X) give us the coefficients of the squared variables in the canonical equation. To find the remaining term, which is of degree 0 if the canonical equation has the form (24) and of degree 1 if it has the form (27), we must examine the behavior of the polynomial AX(X) under coordinate transformations.
Thus consider the quadratic form
Ax(x, x) = 2 aij££k + 22 b£&n+i + c%
2
n+1
(35)
in the (n + ])-dimensional Euclidean space R„+i, where £x, £2, • • • , are the components of the vector x e Rn+1 with respect to some orthonormal basis eue2,'.. , en, en+x. The operator corresponding to (35) is the symmetric operator Ax which has the matrix (34) in the basis ex, e2,. . . , en, en+x; we will also denote this matrix by A(e). Besides this operator, consider the operator Ex defined by the relations
Ei^fc = ek     (k < «),
This operator has the matrix
ien+x = 0.		
0  0   • • •	0	0
1   0   • • •	0	0
0   1   • • •	0	0
0 0
0 0 0 0  0 0
1 0 0 0
(36)
SEC. 10.6   ANALYSIS OF A QUADR1C SURFACE FROM ITS GENERAL EQUATION 303
in the same basis elt e2,.. . , en, en+x. Let R„ denote the subspace with the vectors ex, e2,. . . ,en as a basis. Then the operator Ex is obviously the identity operator in this subspace.
Now suppose we are given an isometric operator Q in the space R„ Then Q carries the orthonormal basis elf e2,. . . , en into another ortho-normal basis fi,f2, • • • ,/„. We construct a new isometric operator Qx in the space Rn+1 by setting
=/*      (* < w)> If the matrix of the operator Q has the form
°21 °22
°2n
qni q»2 ' ' ' qnn
in the space R„, then the matrix of the operator Qx just constructed has form
Qx
qn in
°2l °22
qm qn2
0 0
qm o
q2n o
qnn o
o 1
in the space R„+x. This matrix corresponds to the following coordinate transformation (see Sec. 8.94):
Sx = ?n>h + ^21^)2 H-----h qm-rin,
I2 = q^l + ?22?)2 + ' * • + qn2f\n, In =?ln*)l +q*nf\2 H----+ WU.
(37)
In the new basis fx,f2,. .. ,fn,fn+i the operator A has the matrix
(see Sec. 5.51), while the operator Ex has the same matrix (36) as before. Moreover, according to Sec. 5.52,
det (A{f) — 'kEy) = det (A{e) — X£x).
304     QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
CHAP. 10
We now assume that (37) is the transformation (see Sec. 10.42) which reduces the quadratic form
to the canonical form
A(x, x) — 2 aa&fo
A(x, x) = 2 ^itfi-
It follows from (37) that Qx transforms the quadratic form (35) in n + 1 variables into
2    +2 2       + ^i+i- (38)
After this transformation, the matrix of the operator Ax, which, as we know, transforms in the same way as the matrix of the corresponding quadratic form, becomes
A{f) —
\ 0
0 x2
+ *
0 0
0 0
0 0
0 0
0 0
xr 0
0 0
0 0
0 lx 0 h
0 lr
0 /,
0
L
T+l
and the polynomial AX(X) = det (A{f) — \EX) equals the determinant
\ — X 0
0 0
0
0
x2 - X
0 0
0
0 0
Xr - X
0 0
0 0
0
-X
0
/
r+1
0 0
0 lt
0 /,
r+l
c
The coefficients of this polynomial can be calculated, by using the bordered minors of the matrix A(f), just as they were calculated before by using the
bordered minors of the matrix A
Av
SEC. 10.6   ANALYSIS OF A QUADR1C SURFACE FROM ITS GENERAL EQUATION 305
We note that for r < n all the bordered minors of the matrix A{f) which are of order higher than r + 2 must vanish, since they contain two proportional columns. Thus for r < n the coefficients <xr+3, <xr+4,. . . , ocn+1 vanish. Moreover, for r < n the nonvanishing minors of order r + 2 must use the first r rows and first r columns of the matrix A{f). In general, the bordered minors of order r -+- 1 need not use these r rows and columns. However, we note the following two cases where a bordered minor of order r + 1 must in fact use the first r rows and columns:
1) r — n, in which case it is obvious that the matrix A{f) has only one minor of order r + 1 (i.e., of order n + 1), namely its determinant, made up of all the rows and columns of A(n;
2) r < w, /r+x = lT+2 = • • • = /„ = 0, in which case there is only one nonvanishing bordered minor of order r + 1, made up of elements from the rows and columns with numbers 1, 2, . .. , r, n + 1.
10.63. Next we show how the next step in the transformation of equation (38), made with the aim of eliminating the quantities lx, l2,. .. , lr, affects the matrix of the operator Ax. First consider the transformation
/ ,  h >
*ll = ■*]! + ~ *)n+l»
Vk^Vk      (A: = 2, 3, ...,« + 1), carrying the matrix A(f) into the matrix
\	0 •	• • 0	0 •	• 0	0
0	x2 •	• 0	0    • ■	• 0	/.
0	0 •		0 •	* 0	K
0	0 •	• 0	0 •	• 0	
0	0 •	• • 0	0 •	• 0	K
0	h •		h+1 *		c---
This operation on Am can be described as follows: The first column is multiplied by /X/Xx and subtracted from the last column, and then the first row is also multiplied by /X/Xx and subtracted from the last row. The subsequent transformations required to eliminate the quantities l2, /3,. . . , lr
306     QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
CHAP. 10
can be described similarly. As a result of all these transformations, the matrix Am goes into the matrix
Ar)
	0 •	• 0	0 •	• 0	0
0		• 0	0 •	• 0	0
0	0 •	K	0 •	• • 0	0
0	0 •	■ • 0	0 •	• 0	
0	0 •	• 0	0 •	• 0	In
0	0 •	• 0		• K	c'
Moreover, these transformations do not change the values of the bordered minors of the matrix A{f) which use the first r rows and columns of A(f). Next consider the polynomial
det (A%\ - XEJ
Xt — X 0
0 0
0 0
0
x2 — X
0 0
0 0
0 0
0 0
Xr - X 0
0
0 0
0
I.
r+1
+ ai(-X)",
0 0
0 0
0 0 0 /r+1
-X L
where we have dropped the prime on c'. The coefficients of this polynomial are calculated by using the bordered minors of the matrix Afy in just the same way as the coefficients of the polynomial At(X) are calculated by using the bordered minors of the matrix A{f). Since the bordered minors of order r 4- 2 (where r < n) are invariant under the transformation leading from A(f) to Afy, as shown above, we find that <x^+2 = ar+2. tn the same way, we have tx'r+l ~ ar+1 in the two special cases noted above.
10.64. First we consider the special case r = n. Here the coefficient a'i+1 of the polynomial A[r,(X) is obviously equal to the product XXX2 • • • Xnc,
SEC. 10.6    ANALYSIS OF A QUADR1C SURFACE FROM ITS GENERAL EQUATION 307
so that the quantity c in the canonical equation (25), p. 288 is just
c _        a«+l ___an+l
X^ • • • X„     XtX2 • • • X.
11
10.65. Next suppose that r < n. Then we must determine the coefficient ocr+2 of the polynomial A{r)(X), which will be needed in a moment.f The nonvanishing bordered minors of Aff\ of order r + 2 have the form
\ 0 • • • 0 0 0 0   X2   • • •   0    0 0
0   0   • • •   Xr   0 0 0   0   •••   0   0 lm 0   0   •••   0   lm c and their sum, which equals the coefficient <x^+2 = ar+2, is given by
XXX2 • * • XT(/r+1 + /r+2 + ••* + /„).
We recall that the condition for reducing equation (21) to the canonical form (27) is that at least one of the coefficients l^, lr+2, ...,/„ be nonvanishing. We can now formulate this condition equivalently in the form of the inequality
ar+2 7^ 0,
and at the same time give the following formula for calculating the coefficient M of the canonical form (27):
= — XiXa • ' * \l2n      (m = r + 1, . . . , n),
However, if ar+2 *= 0, then /r+1 = /r+2 = *•* — /„ = 0, and (21) reduces to the canonical form (24). Thus we have arrived at another special case. In this case, the coefficient <x'r+l = <xr+1 is obviously equal to the product XXX2 • • • Xrc, so that the coefficient c of the canonical form (24) is just
a,
T+l
a.
r+l
XXX2
>.xx2
t It is easily verified that in this case all the coefficients am of the polynomial Ajr>(X) with m > r 4- 2 vanish.
308     QUADRATIC FORMS IN EUCLIDEAN AND UNITARY SPACES
CHAP. 10
10.66. We now summarize these results in the form of a table. As before, we agree to arrange the roots Xx, X2,... , X„ of the characteristic polynomial A(X) in such a way that the nonzero roots Xt, X2,. . . , Xr come first, denoting the product XXX2 • • * Xr by Ar.
Data
K # o
"■n-l
o
^w-2
* 0
"■»+1
0
an+l = 0
X, = 0\     a, ^ 0
\ ^ 01 a3=0
Canonical Equation
Xx1jJ + XjjY]* +
Ml + hril + • • • + Xw_1,,2_i + 2
A.
■n-l
Ml + X2Tj| + • • • + X^l)^ +
a.
A
= 0
n-l
XjTjJ + X2Y)2 +
ln-2
Ml + + • * * + Xr,~2^«~2 +
a.
n-l
A.
0
n-2
fl2 = 0
a,
-.2 "2
= o
10.7. Hermitian Quadratic Forms
10.71. Many of the theorems of the preceding sections carry over to the case of quadratic forms in a complex space. We begin with the following basic
Theorem. Every symmetric Hermitian bilinear form A(x, y) in an n-dimensional unitary space Cn has a canonical basis consisting of n orthogonal vectors.
Proof. According to Sec. 9.34, the linear operator A associated with the form A(x, y) by the formula A(x, y) = (Ax, y) is self-adjoint. Hence by Theorem 9.34, there is an orthonormal basis ex, . .. , en in the space Cw consisting of eigenvectors of the operator A. The matrix of the operator A is diagonal in this basis, and hence so is the matrix of the form A(x, y), since the operator and the form have the same matrix in any orthonormal
sec. 10.7
HERMIT1AN QUADRATIC FORMS 309
basis of the space Cn. Therefore ex, . . . , en is a canonical basis of the form A(x,y). |
10.72. It follows from this theorem that every symmetric Hermitian quadratic form A(x, x) can be reduced to the canonical form
A(x, x)=2h\U
by a unitary transformation. The sequence of operations leading to determination of the coefficients X3- and the components of the vectors of the canonical basis is the same as in the real case (see Sec. 10.13).
10.73. Next we look for the stationary values of a symmetric Hermitian quadratic form A(x, x) on the unit sphere
iiy2 = i
7 = 1
in Cn, recalling from Sec. 9.15b that A(x, x) takes only real values. Let ex,. . . , en be an orthonormal basis of the form A(x, x). Then in this basis we have
A(x, x) = 2 X,       = 2 X,(g2 + tJ),
(*, x)=iiy2 = i>2 + T2)
(£j — a, -+- hj). Using Lagrange's method, we equate to zero the partial derivatives of the function A(x, x) — X(x, x) with respect to each of the 2n real variables <?3-,    (j — 1, . . . , w). This gives
2XJ-crJ- — 2XtT; = 0,      2Xitj- — 2Xt^ = 0      (/ = 1, . . . , «).
These equations are satisfied for a vector x with jxj ~ 1 if and only if X coincides with one of the numbers Xl5. . . , Xn. Suppose X = Xfc. Then a solution of the equations is given by the vector x with components ^ = a. -\- ii. f= 0 for j ^ k and j^fcj = 1. Hence, just as in the real case (Sec. 10.21), the Hermitian quadratic form A(x, x) takes stationary values at those vectors of the unit sphere which belong to its canonical basis ex, . . . , en, in other words at the eigenvectors of the corresponding self-adjoint operator A. The values of the form at these points coincide with the corresponding canonical coefficients. In particular, the maximum of the form A(x, x) is the largest of the coefficients Xi( and the minimum of A(x, x) is the smallest of these coefficients.
10.74. Next consider the problem of the simultaneous reduction to canonical form of two symmetric Hermitian quadratic forms A(x, x) and
310     QUADRATIC FORMS [N EUCLIDEAN AND UNITARY SPACES
CHAP. 10
B(jc, x), one of which, say B(x, x), is positive definite. To solve this problem, we choose the Hermitian bilinear form B(jc, y) as the scalar product. Then, by Sec. 10.72, there exists an orthonormal canonical basis for the form A(x, x), in the sense of the given scalar product. In this basis we have
A(x,.x)=ix,iy2, B(x,x)=iiy2,
,~1 ,-=1
as required.
The calculation of the coefficients X^ and the components of the vectors of the canonical basis (with respect to an arbitrary original basis) is carried out in the same way as in the real case (Sec. 10.32), after first writing the forms A(x, x) and B(x, x) as real functions of the real variables a,-, t, (y = 1, . . . , w), where = + ru\,-. We leave the details as an exercise for the reader.
PROBLEMS
1. Use an orthogonal coordinate transformation to transform each of the following quadratic forms to canonical form:
a) 25* + 5! - 45x5, - 45,5.;
b) 2% + 5% + 5^ + Alxl2 - 4^ - 8£2V,
c) 2% + 2% + 2%- 4^2 + 2lx\x 4- 2$2$3 - 4^;
d) 2^2 + 2^3 - 2SiJJ4 - 21^ + 2\2lx + 25.5*.
2. What are the stationary values of the quadratic form
A(x, x) = x\ + \x\ + \x\
on the sphere |jcj = 1, where x = (xx, x2, x3), and of what type are they (minimum, maximum, etc.)?
3. Show that each of the quantities [ax, jjl2, ... , ^ can actually attain the upper and lower bounds indicated in formula (6), p. 279.
4. Two quadratic forms A(x, x) and B(x, x) in Rn are said to be comparable if the inequality A(x, x) < B(x, x) holds for every x e Rw. Let xx > x2 > ■ • • > \ be the canonical coefficients of the form A(x, x), and let \ix > [x2 > ■ ■ ■ > \in be those of the form B(x, x). Show that the inequality
holds for every k = 1,2,...,«. (This is obvious in the case where A(x, x) and B(x, x) have a common canonical basis.)
5. Find a common pair of conjugate directions for the curves
x^ v2
PROBLEMS     31 I
6. Construct the linear transformation which reduces both quadratic forms
A(x, x) = l\ +        + 25» - 2^5, + 35*.
B(x, x) =   + 2i^2 + n2iz - ii^ + eq
to canonical form. What are the corresponding canonical forms?
7. Show that the basis in which the quadratic forms A(x, x) and B(x, x) both take canonical form, with canonical coefficients \, x2,..., xw and vu v2, • • • > vn, respectively, is uniquely determined to within numerical factors, provided that the ratios
are distinct.
8. Prove that the midpoints of the chords of a quadric surface parallel to the vector y = t)2, . . • , 7)n) lie on an (« - l)-dimensional hyperplane (the diametral plane conjugate to the vector^).
9. What quadric surfaces in three-dimensional space (with coordinates x, y, z) are represented by the following equations-.
X2      V2      Z2 X2      V2      z2 „
a>4-J + T = 1J b)4--?-T=-1; c)*=/ + ^
d) y = x2 + z2 + 1;   e) y = xzl
10. Simplify the following equations of quadric surfaces in three-dimensional space, and give the corresponding coordinate transformations:
a) 5x2 + 6y2 + lz2 - 4xy + 4yz - lOx + 8y + Hz - 6 = 0;
b) x2 + 2y2 - z2 H- \2xy - 4xz - Syz + I4x + 16y - 12z - 3 = 0;
c) 4x2 + y2 + 4z2 - 4xy + 8xz - 4yz - \2x - \2y + 6z = 0.
11. Show that the intersection of an ellipsoid with semiaxes ax > a2 > • * • > an with a ^-dimensional hyperplane going through the center of the ellipsoid is another ellipsoid with semiaxes bx > b2 > • • > bk, where
ax > bx > fln-fc+i, o2 > b2 > an_k+2y
ak > bk> an.
chapter I I
FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS
I I.I. More on Algebras
11.11. The concept of an algebra was introduced in Sec. 6.21, this being the name given to a linear space (over a field K) equipped with a (commutative or noncommutative) operation of multiplication of elements, obeying axioms l)-3), p- 136. The algebras considered in Chapter 6 were for the most part commutative, but, in passing, we mentioned an important example of a noncommutative finite-dimensional algebra, namely, the algebra B(Kn) of all linear operators acting in an w-dimensional space K„. This chapter is devoted to the study of B(K„) and its subalgebras. But first we will find it convenient to consider abstract finite-dimensional algebras.
11.12. Not every algebra has a unit, as shown by the example of the trivial algebra, i.e., any algebra such that xy = 0 for all elements x and y (Example 6.22a). Nevertheless, every algebra can be extended to an algebra with a unit in the following standard way:
Given any algebra A, let A+ be the set of all formal sums a + X, where a e A and X is a number from the field K. Then A+ is obviously a linear space with operations
(a + X) + (b + (x) = (a + b) + (X + jx)
and
[x(a + X) = [xa + X(x
312
SEC. 11.2
REPRESENTATIONS OF ABSTRACT ALGEBRAS 313
(a, b e A; X, u. e K). Moreover, A+ is an algebra with respect to the multiplication operation
(a + X)(Z> + n) = (ab + XZ> + jjui) + Xjx.
The algebra A+ certainly has a unit, i.e., the formal sum of the zero element of A and the number 1. We now need only note that the original algebra A can be regarded as a subset of A+ by simply identifying each element a e A with the formal sum a + 0 e A+.
11.2. Representations of Abstract Algebras
11.21. Let A be an abstract algebra over a field K, and let B(K) be the algebra of all linear operators acting in a linear space K over the same field K. We now consider morphisms of the algebra A into the algebra B(K), henceforth indicated by notation of the form T: A —> B(K).
a. Definition. A morphism T:A—>-B(K) is called a representation of the algebra A in the space K. A representation is called trivial if la = 0 for every a e A and exact (or faithful) if T is a monomorphism, i.e., if the operators Ta and T& corresponding to distinct elements a and b of the algebra A are themselves distinct elements of the algebra B(K).
The set of all elements a e A which are carried into the zero operator by the representation T is called the kernel of the representation T. The kernel of the trivial representation is the whole algebra A, while the kernel of an exact representation consists of a single element, namely the zero element of the algebra. In the general case> the kernel of any representation is a two-sided ideal of the algebra A (see Example 6.25d).
b. Definition. Two representations T':A-»*B(K') and T":A^B(K") of an algebra A are said to be equivalent if there is an isomorphism U: K' —>- K" between the linear spaces K' and K" such that
UT^ = T^U
for every a e A. Obviously, in the case of finite-dimensional spaces K' and K", equivalence of the representations T' and T" means that the operators T^ and (a e A) have identical matrices in suitable bases of the spaces K' and K".
c. Let T;A—>-B(K) be a representation of the algebra A. A subspace K' c K is called an invariant subspace of the representation T if it is invariant with respect to all operators Ta, a e A. By considering the operators T0 only on the space K\ we obviously get a new representation TK: A —>■ B(K'), called the restriction of the representation T onto K'.
314     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
d. Finally let T: A —> B(K) be a representation of the algebra A such that K is the direct sum of subspaces Kk (1 < k < n) invariant with respect to the representation T, and let Tfc denote the restriction of the representation T onto Kk (1 <&<«). Then we say that the representation T is the direct sum of the representations Tk (1 </:<«).
11.22. To every algebra A we can assign in a natural way a representation T:A-* B(A) in the linear space A itself which associates with each element
a e A the operator of left multiplication by a, i.e., the operator Ta defined by the formula Tab — ab for every b e A. This representation is called the left regular representation of the algebra A. The invariant subspaces of the left regular representation are obviously left ideals in A (Sec. 6.23a). Using this concept, we can establish the following important
Theorem. Every algebra is isomorphic to a subalgebra of the algebra B(K), for a suitable choice ofK.
Proof. It is easy to see that the theorem is equivalent to the assertion that every algebra has an exact representation. Let A be the given algebra. As shown in Sec. 11.12, there exists an algebra A+ with a unit e which has A
as a subalgebra. Let T: A+ —>■ B(A+) be the left regular representation of this
algebra. Then T is exact, since Tae = ae = a ^ 0 for every aeA+ a ^0.
Hence the restriction of the morphism T onto the subalgebra A <=■ A+ is an exact representation of the algebra A in the space K = A+. |
11.3. Irreducible Representations and Schur's Lemma
11.31. Among all representations of a given algebra we now distinguish those with the simplest structure in a certain sense. Every representation T:A—s-B(K) of an algebra A has at least two invariant subspaces, K itself and the subspace {0} consisting of the zero element alone. Any other invariant subspace is said to be proper. Proper invariant subspaces which contain no other such subspaces are called minimal invariant subspaces of the representation T.
Definition. A nontrivial representation T:A—vB(K) is said to be irreducible if it has no proper invariant subspaces.
11.32. Given any vector zeK, it is easy to see that the set Kx — {Taz eK:a 6 A} is an invariant subspace of the representation T. A vector z eK is said to be cyclic (with respect to the representation T) if Kz = K. This definition, together with the definition of irreducibility, immediately implies the following
Theorem. A representation acting in the space K is irreducible if and only if every nonzero vector z eK is cyclic.
SEC. 11.4
BASIC TYPES OF FINITE-DIMENSIONAL ALGEBRAS 315
Despite its simplicity, this result will subsequently be found very useful.
11.33. The irreducible representations of algebras over the field C of complex numbers have the following important property:
Theorem (Schur's lemma). Let T:A—*-B(C) be an irreducible representation of the algebra A over the field C. Then every operator in C which commutes with all the operators Ta, a e A, is a multiple of the identity operator E.
Proof. Let S be an operator which commutes with all Ta, a e A, and let x be an eigenvector of S (Sec. 4.9). Then Sx = ax for some complex X, and hence STax — TaSx = XTax for every a e A. But the representation T is irreducible, and hence, by Theorem 11.32, every vector y e C can be represented in the form y ~ Tax, a e A. It follows that S = XE. |
It should be noted that the proof makes essential use of the fact that every linear operator in a (finite-dimensional) complex linear space has an eigenvector (see. Sec. 4.95b). In view of the decisive role of Schur's lemma, we will henceforth confine ourselves to a consideration of linear spaces and algebras over the field of complex numbers.
11.4. Basic Types of Finite-Dimensional Algebras
Beginning with this section, unless the contrary is explicitly stated, we will consider only finite-dimensional algebras (i.e., algebras which are finite-dimensional regarded as linear spaces) over the field C of complex numbers.
What is the structure of finite-dimensional algebras and their representations? Most of this chapter will be devoted to results along just these lines. In particular, we will distinguish some classes of algebras whose structure can be studied completely, i.e., we will succeed in describing all such algebras (to within an isomorphism) and all their representations. We refer to the classes of simple and semisimple algebras.
The various classes of algebras arise when we consider specific properties of their ideals and representations.
11.41. Definition. A nontrivial algebra is called simple if it contains no proper two-sided ideals (Sec. 6.23a). An example of a simple algebra is-the algebra B(C„) of all linear operators in a finite-dimensional space. In fact, let J be a two-sided ideal in the algebra B(C„), and let A — \aih\ e J be a nonzero matrix such that aTS ^ 0, say. Then, as shown in Sec. 4.44, by multiplying the matrix A from the right and from the left by certain matrices, i.e., by performing operations that do not leave the ideal J, we can get a matrix Ers whose only nonzero element 1 appears in the rth row and sth
316     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
column. Moreover, by further multiplying E„ from the right and from the left by certain matrices, we can get any matrix Eik (/, k = 1,. . . , n) without leaving the ideal J. But linear combinations of the matrices Ejk give the matrix of any operator in B(C„), and hence J = B(CJ. As we will see later (Sec. 11.64), this example is unique in the class of all finite-dimensional algebras over the complex number field.
Theorem. Every simple algebra has an exact irreducible representation.
Proof. Let A be a simple algebra, and consider its left regular representation T:A—^B(A). It follows at once from the fact that A is finite-dimensional that among the invariant subspaces of the representation T
there is a minima] subspace A'. The restriction T of the representation T onto A' is nontrivial. To show this, we need only prove that for every be A', the set Ab = {ab:aeA} =£{0}, resorting to the following simple proof (due to A. S. Nemirovski): Suppose, to the contrary, that Ab = {0}. Then, as is easily seen, the set bA = {b:a e A} is a two-sided ideal in A, and hence, since A is simple, either bA = A or bA = {0}. But if bA = A, then Ab — {0} implies that every product in A equals zero, while if bA = {0}, the set {kb:X e C} is a two-sided idea] in A since Ab = {0}, and hence must coincide with the whole algebra since A is simple. Thus, in both cases, the algebra A turns out to be trivial, and hence cannot be simple.
Thus the representation T: A —>- B(A') is nontrivial. But then, on the one hand, it is irreducible, by the minimality of A', while on the other hand, its kernel, being a two-sided ideal distinct from the whole simple algebra A, consists of the zero element alone. Therefore T (like any irreducible representation of A) is at the same time exact. |
It turns out that the converse theorem is also true, i.e., every finite-dimensional algebra with an exact irreducible representation is simple. This will be shown at the end of Sec. 11.64.
11.42. An arbitrary algebra may not have exact irreducible representations. But it is natural to single out those algebras whose properties can be described in terms of their irreducible representations. This leads to the following wider class of algebras:
Definition. An algebra A is called semisimple if, given any nonzero element a e A, there exists an irreducible representation mapping a into a nonzero operator. In other words, the intersection of the kernels of all the irreducible representations of a semisimple algebra consists of the zero element alone.
It follows from Theorem 11.41 that every simple algebra is also semi-simple. On the other hand, consider the w-dimensional (n > 1) algebra C„,
SEC. 11.4
BASIC TYPES OF FlNITE-DlMENSlONAL ALGEBRAS 317
consisting of the elements a = (<xt,. . . , <x„) where a,- e C, with multiplication component by component^ This algebra is obviously commutative. Moreover, the set of all a — (alt. . . , <xn) such that <xt = 0, say, is a two-sided ideal in Cn, so that the algebra Cn is not simple. Suppose that with every
element a = (at.....an) we associate the complex number <xfc (1 < k < n),
or equivalently the operator of multiplication by <xfc in the one-dimensional space Ct. Then we get an irreducible representation of the algebra Cn which maps every element of Cn with afc ^ 0 into an operator distinct from zero. Since every nonzero element ae Cn has at least one nonzero component, there exists an irreducible representation mapping a into a nonzero operator. Therefore the algebra Cn is semisimple.
In this example, Cn is a direct sum of simple (one-dimensional) algebras. The example can easily be generalized by considering a direct sum of simple noncommutative algebras. Then, as will be shown in Sec. 11.77, we get the general form of a finite-dimensional semisimple algebra over the field of complex numbers.
11.43. Next we introduce algebras whose properties are, in a certain sense, the opposite of those of a semisimple algebra:
Definition. An algebra A is called a radical algebra if every nontrivial representation of A has a proper invariant subspace. In other words, a radical algebra has no irreducible representations at all.
As an example, consider the algebra A of polynomials P{z) = cxz + * • • + cnzn with the usual operations but subject to the condition zn+l = 0. Then every element of the algebra A vanishes when raised to the (« + l)th power, so that no element of A has an inverse. The algebra A has no non-trivial one-dimensional representations, since every nonzero linear operator in a one-dimensional space is invertible. Let T be a nontrivial (and hence multidimensional) representation of the algebra A, and let Z be the operator corresponding to the element z. Since Z (like z itself) is noninvertible, there exists a vector e 0 such that Ze = 0. But then P(Z)e = 0 for every P(z) e A. Thus we have found a nontrivial invariant subspace (the straight line determined by the vector e) of the representation T. It follows that A is a radical algebra.
.11.44. Definition. By the radical of an algebra A is meant the intersection of the kernels of all irreducible representations of A if such representations exist, or the whole algebra A if no such representations exist.
•f I.e., if tar = (a,, . . . , a„), b = (fV . . . , [3„), then ab = (aj(31} . . . , anf3„).
318     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
Since the kernel of every representation is a two-sided ideal of the algebra A (see Sec. 11.21a), the radical of A, being an intersection of two-sided ideals of A, is itself a two-sided ideal of A.
The study of algebras with nontrivial radicals (in particular, radical algebras) involves substantial difficulties, with results that, as a rule, are not in definitive form (some of these results will be found at the end of this chapter). On the other hand, semisimple algebras and their representations can be studied in complete detail. In fact, as we will see below, the study of semisimple algebras reduces to that of simple algebras.
We now turn to the detailed study of simple algebras and their representations.
11.5. The Left Regular Representation of a Simple Algebra
11.51. Thus let A be a* simple algebra, and let T;A-^B(X) be a fixed exact irreducible representation of A (the existence of T follows from Theorem 11.41). This representation will henceforth be called standard.
Theorem. Let T: A —>- B(A) be the left regular representation of a simple algebra A, and let I be a minimal invariant subspace ofT. Then
a) The restriction T1 of the representation T onto I is equivalent to T;
b) The subspace I, regarded as a subalgebra of A, has a right unit.
Proof. First we fix an element a e I, a 0. Since the representation T is exact, Tax 0 for some x e X. Consider the linear operator u:I —v X defined by the formula Vb = Tbx for every b el. It is easy to see that the kernel of the operator u is a left ideal in A (or equivalently an invariant
subspace of the representation T) contained in I but not coinciding with I. Hence the kernel of u consists of the zero element alone. On the other hand, the image of u is obviously a nonzero invariant subspace of the irreducible representation T, and hence coincides with the whole space X. Thus u is an isomorphism of I onto X. Moreover, for arbitrary b el and c e A,
IH> = V(cb) = Tcbx = Tc(T6x) = TcVb,
and hence
ut; = tcu,
which shows that the representations T1 and T are equivalent (see Sec. 11.21 b). Furthermore, since u maps I onto all of X, there exists an element eel such that Ve = Tex = x. It follows that
Vibe) = Jbex = T,(T» = T6x = Vb
SEC. 11.5 THE LEFT REGULAR REPRESENTATION OF A SIMPLE ALGEBRA 319
for every be I. But U is a one-to-one mapping, and hence be = b. Thus e is a right unit in the algebra. |
It should be noted that any exact irreducible representation of a simple algebra can be chosen as the standard representation. Therefore an automatic consequence of this theorem is the fact that all exact irreducible representations of a simple algebra are equivalent.
11.52. Lemma. Given an arbitrary algebra A, let Ix and I2 be left ideals of A with right units ex and e2, respectively, where aex = 0 for every a el2 Then there exists a right unit e2 in I2 such that be2 = Ofor every b e Ix.
Proof Let e2 = e'2 — exe2. Then for every a e I2 we have
ae2 = ae2 — aexe2 = a, since ae'2 = a and aex = 0. Moreover,
be2 = be2 — bexe2 — be2 — be2 = 0
for every b e lx. |
11.53. Theorem. The left regular representation of a simple algebra A is the direct sum of its irreducible representations.
Proof. We will construct the desired set of minimal invariant subspaces
of the representation T:A—^B(A) by induction, proving at each step that, as an algebra, the direct sum of the subspaces already found has a right unit. For the first subspace we take any minimal invariant subspace Ix of the
representation T. According to Theorem 11.51, lx has a right unit ex. Suppose we have already found minimal invariant subspaces Ix, . . . , lk such that the left idea] J£ = Ix + • • • + Ifc has a right unit ek. If Jk = A, we have succeeded in constructing the desired invariant subspaces. Otherwise, let
rk = {aeA:aek=0}.
Then it is easy to see that J'k is an invariant subspace of the representation T, whose intersection with is empty. Moreover, since every element aeA can be represented in the form a — aek + (a — aek), where aek e Jk and a — aek e J'k, the algebra A is the direct sum of    and J£.
The finite-dimensional invariant subspace 3"k contains a minima] invariant subspace, which we denote by Ifc+1. According to Theorem 11.51, Ifc+1 contains a right unit e'k+l, where aek = 0 for every a e Iw since Ifc+1 <= j£. It follows from Lemma 11.52 that Ifc+X contains a right unit e"k such that be"k = 0 for every b e 3k. Let ek+x = ek + e"k. Then, as is easily verified, ek+x is a right unit in the ideal
J£+i = Ii H-----h I* 4- W
320     FlNlTE-DlMfcNanjNAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
This proves the legitimacy of making the induction from k to k + 1. The algebra A is finite-dimensional, and hence at some stage we get the set of minimal invariant subspaces I1( . . . , lm of the representation T whose direct sum is the whole algebra A. Hence the left regular representation of A is the direct sum of its irreducible representations. |
11.54. We note that it was shown in the course of the proof that every simple algebra has a right unit. Actually, we have the following stronger
Theorem. Every simple algebra has a unit.
Proof. Let A be a simple algebra, and let e be a right unit of A. Consider the operator Tf in the standard representation T:A -^B(X). Then
Ta(Tex —x) = Jaex - Tax = 0
for every x eX and a e A. Since T is irreducible, every nonzero vector must be cyclic (Theorem 11.32). It follows that Jex — x = 0. In other words, Te is the identity operator in the space X. But then
TaTe = JJa = Ta
for every a e A, and hence ae — ea = a by the exactness of the representation T. Therefore e is a unit in A. |
11.6. Structure of Simple Algebras
At the end of this section we will solve the problem of the structure of simple algebras. In so doing, we will find the following concept very useful:
11.61. Let X be a linear space, and let A0 be a subalgebra of B(X). The subset of B(X) consisting of the operators which commute with all operators in A0 will be called the commutator of the algebra A0, denoted by A0.
It is easy to see that A0 is itself a subalgebra of B(X). The commutator of this new subalgebra, denoted by A0, will be called the second commutator of the algebra A0. Obviously we have A0 A0.
11.62. Given any algebra A, every element ae A defines two operators in B(A), the operator of left multiplication Ta, specified by the formula Tab = ab, and the operator of right multiplication Ra, specified by the formula ~Rab = ba. It is easy to see that the set of all operators of left multiplication and the set of all operators of right multiplication form subalgebras in B(A), which we denote by A* and Ar0, respectively.
SEC. H.6
STRUCTURE OF SIMPLE ALGEBRAS 321
Lemma. If the algebra A has a unit, then Aj, = Ar0 and ATQ = A^.
Proof If S g A^, then
S(ab) = STab = JaSb = aSb.
Setting b = e, where e is the unit in A, we get Sa — aSe. Therefore S is the operator of right multiplication by the element Se e A, i.e., S e Ar0. It follows
that Al0    AJ, and hence that Al0 = Ar0, since obviously ArQ <=■ AlQ. The
formula AT0 = Al0 is proved in just the same way. |
11.63. Theorem. Given a simple algebra A with standard representation T: A —>■ B(X), let A0 be the algebra of operators ofT. Then A0 = A0.
Proof. The algebra A0 defined above can obviously be regarded as the
algebra of operators of the left regular representation T:A—>-B(A) of the algebra A. According to Theorem 11.53, this representation is the direct
sum of certain irreducible representations T1': A —>-B(LJ (1 < i < m), where, by Theorem 11.51, each representation is equivalent to the standard representation. This means the following: We can find a basis xlt . . . , xn in the space X and a basis f^\ . . . , f^] ineachofthesubspacesl; (1 < i < m) such that for every a e A, the matrix of the operator Ta in the basis /j*1', f2a), . . . , f^m) of the whole space A has the quasi-diagonal form
f -
(1)
where each block along the principal diagonal is the matrix of the operator Ta in the basis xlt . . . , xn and the "off-diagonal" blocks consist entirely of zeros. It follows from the rule for multiplication of block matrices (Sec. 4.51) that every matrix commuting with all matrices of the form (1) is a matrix of the form
Si,
(2)'
where each block S{i is an n x n matrix commuting with all the matrices Ta, a e A.
322     FINITE-DIMENS(ONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
Now let P be an operator in A0, and let P be its matrix in the basis . . . , xn. Then the quasi-diagonal matrix
E
obviously commutes with all matrices of the form (2), and hence determines in the basis f[l), /2(1>,. . . , f^m) of the space A an operator belonging to the second commutator of the algebra A0. By Theorem 11.54, every simple algebra has a unit, and hence, by Lemma 11.62,
A1 — Ar
which means that the matrix P determines in the basis fxa\ /2(1),. . . , f^m) an operator P, equal to T6 for some be A. But then P = T^ for the same b, and hence P belongs to the algebra A0. The proof is now complete, since P
is an arbitrary element of A0. |
11.64. We are now in a position to prove the basic theorem on simple algebras:
Theorem {First structure theorem). Every simple algebra is isomorphic to the algebra of all linear operators acting in some finite-dimensional space X.
Proof. Let A be a simple algebra, and let T: A-v B(X) be the standard representation of A. It is sufficient to prove that the algebra A0 of operators of the representation T coincides with B(X). Since the representation T is irreducible, it follows at once from Schur's lemma (Theorem 11.33) that the
commutator A0 of the algebra A0 consists of just those operators which are
multiples of the identity operator. But then the second commutator A0
coincides with the whole algebra B(X). At the same time A0 = A0, by Theorem 11.63, and hence A0 = B(X). |
It should be noted that behind all the considerations leading to the first structure theorem lies the fact that every simple algebra has an exact irreducible representation. Hence we have incidentally proved that every algebra with an exact irreducible representation is isomorphic to the algebra B(X). It follows at once that the converse of Theorem 11.41 holds: Every algebra with an exact irreducible representation is simple.
SEC. 11.7
STRUCTURE OF SEMISIMPLE ALGEBRAS 323
11.7. Structure of Semisimple Algebras
11.71. In this section we will show that the problem of the structure of a semisimple algebra reduces completely to the problem of the structure of a simple algebra (already studied above). To this end, we will find it useful to introduce some new concepts.
Definition. By a normal series of an algebra A is meant a chain of algebrasf
A = l03l12--oln2l^ = {0}
in which each algebra is a two-sided ideal of the preceding algebra. By a composition series of an algebra A is meant a normal series of A in which each ideal is maximal (i.e., is not contained in any larger two-sided ideal) and I„ contains no proper two-sided ideals.
It is easy to see that every finite-dimensional algebra has a composition series. In fact, among the (proper) two-sided ideals of a finite-dimensional algebra A there is a maxima] ideal Il5 say. Similarly, the algebra Ix contains a maximal two-sided ideal I2, I2 contains a maximal two-sided ideal I3, and so on. Since the original algebra A is finite-dimensional, after a finite number of steps we finally arrive at an algebra which contains no further proper ideals. The chain of algebras
A = I0 3 l% z> • • • => In => 1^ = {0}
so obtained is obviously a composition series of the algebra A.
11.72. Before turning to the special properties of normal and composition series of semisimple algebras, we-prove the following
Lemma. Given any element a of a semisimple algebra A, there exists an element b e A such that every power of the element ba is nonzero.
Proof By the definition of a semisimple algebra, there exists an irreducible representation T: A ^ B(X) such that Ta ^ 0. Then for some x e X, x ^ 0, the vector y = Tax is nonzero and therefore, by Theorem 11.32, is a cyclic vector of the irreducible representation T. Hence there is an element b e A such that Tby = x, i.e., such that
Tba* = Tb(Ta*) = Tb7 = x.
It follows that every power of the operator Tbo, and hence every power of the element ba e A, is nonzero. |
t Here and in the rest of this section (only) we write A ^ B (equivalently, B 5 A) to mean that A is a subset of B, reserving the notation A <= B (equivalently, B ~=> A) to mean that A is a proper subset of B (i.e., A £ B but A ^ B).
324     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
11.73. Theorem. A normal series of a semisimple algebra cannot contain nonzero trivial algebras.
Proof. Let A be a semisimple algebra, and let
A = L2l12-2L2 In+1 = {0}
be a normal series of A. It can be assumed without loss of generality that the algebra In contains an element a distinct from zero. Obviously, to prove the theorem, we need only find an element c e I„ such that ca ^ 0.
By Lemma 11.72, there exists an element be X such that every power of ba is nonzero.
ck = (baf^b      (* = 0, l,...,n-l).
Then induction on k shows that ck e Ik+X. In fact, for k = 0 we have c0 = bab e lx since a elx, and the possibility of carrying out the induction follows at once from the obvious relation ck+x = ckack and the fact that a e Ifc+2. Thus we see that the element c = cn_x belongs to the algebra ln, and moreover
ca - (bafn-lba = (bafn ^ 0,
as required. |
11.74. Next we prove three simple propositions:
Lemma. Let A 2 ^ 2 I2 D {0} be a normal series of an algebra A, where the algebra I2 is simple. Then I2 is a two-sided ideal in A.
Proof By Theorem 11.54, the algebra I2 has a unit e. Since eelx, the elements ae and ea belong to Ix for every aeA, But then
ab = a(eb) = (ae)b e I2,
ba = (be)a = b(ea) e I2
for every b el2. |
11.75. Lemma, Let A be an arbitrary algebra, and let I be a two-sided ideal of A with a unit. Then A has a two-sided ideal J such that A is the direct sum of I and J.
Proof. Let J = {a e A:ae = 0}, where e is the unit of the algebra I. Then obviously J is a left ideal in A. Moreover, A is the direct sum of I and J, since b ~ be + {b — be) and b — bee J.
We must still prove that J is a right ideal in A. Clearly
ab = abe + a(b — be) for arbitrary ae J and b e A. But be = ebe since be e I, and hence
abe = (ae)be = 0
SEC. 11.7
STRUCTURE OF SEM1S1MPLE ALGEBRAS 325
since ae = 0. Therefore ab = a(b — be), so that ab is the product of two elements of J. It follows that ab e J. |
11.76. Lemma. Let I and J be two-sided ideals of an algebra A, and suppose A is the direct sum of I and J, with I the maximal two-sided ideal in A. Then the algebra J contains no proper two-sided ideals-
Proof Let J' be a two-sided ideal of J which does not coincide with J. Then the algebra J" = I + J' is a two-sided ideal in A. But I is maxima], and hence J" = I. It follows that J' = {0}. |
11.77. We are now at last in a position to prove the basic theorem on the structure of semisimple algebras:
Theorem {Second structure theorem). Every semisimple algebra A is a direct sum of two-sided ideals of A, each of which is a simple algebra.
Proof As shown in Sec. 11.71, we can construct a composition series
A - I0 => I, => • • • z> In => = {0}
for A. Our theorem is then obviously a special case of the following
Assertion. For every k (0 < k < n) the algebra is a direct sum of two-sided ideals ofIn_k, each a simple algebra, and moreover       has a unit.
We now prove this assertion by induction on k. The algebra In has no proper two-sided ideals, and moreover is nontrivial, by Theorem 11.73. Hence the algebra I„ is simple and, in particular, has a unit (by Theorem 11.54). This proves the assertion for k — 0.
Suppose now that the assertion is true for some k (0 < k < n — 1). This means, in particular, that the algebra ln_k has a unit, and hence, by Lemma 11.75, I„^fc_1 is a direct sum In_fc -f- J where J is a two-sided ideal in I„_a;_i. Since        is a maxima] two-sided ideal in it follows from
Lemma 11.76 that the algebra J contains no proper two-sided ideals. At the same time, applying Theorem 11.73 to the normal series
A = I„ => Ix => ■ • • =» IN_fc_x => J => {0},
we find that J is nontrivial and hence simple. By the induction hypothesis, the algebra ln_k is a direct sum of two-sided ideals of        each a simple algebra. Being simple, each of these subalgebras is also a two-sided ideal in In-fc_i, by Lemma 11.74. It follows at once from this fact and the relation —       -J- J that I„_fc_i is also a direct sum of two-sided ideals of eacn a simple algebra. We must still show that the algebra I„_fc_i has a unit. Let ex be the unit of the algebra In_k (which exists by the induction hypothesis), and let e2 be
326     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
the unit of the simple algebra J. Then, since ab = ba = 0 for arbitrary a 6 I„_j.., b e J, it is easy to see that the element e = ex + e2 is a unit in the whole algebra
Thus we have justified the induction on k, thereby proving the italicized assertion. But, as already noted, our theorem is a special case of this assertion (corresponding to k = n). |
It should be noted that we have incidentally proved that every semisimple algebra has a unit.
The two-sided ideals found in the theorem, whose direct sum is the given semisimple algebra A, will henceforth be called the simple components of the algebra A.
11.78. It was shown in Sec. 11.64 that every simple algebra is isomorphic to the algebra B(X) for some finite-dimensional space X or, equivalently, to the algebra of all square matrices of a certain order. Now let Xx, . . . , Xn be a set of finite-dimensional spaces, and let B(XX, . . . , X J be the set of all rows of the form
a — (ax, . . . , flj,
where ak is an operator from the algebra B(Xfc) (or, if convenient, a matrix of the appropriate order). Obviously B(XX,. . . , X„) is an algebra with respect to the "component-by-component" operations defined by the formulas
a + b = (ax + bx,.. . , an + bn),
\a — (Xaj, . . . , ~kan),
ab = (axbx,. . . , anbn),
where a, be B(XX, . . . , Xn), a = (ax,. .. , an), b = (bXy ... , bn), and X is a complex number. It follows from these considerations that Theorem 11.77 has the following equivalent form:
Every semisimple algebra is isomorphic to the algebra B(XX, . . . , Xn) for some set of spaces Xx, . . . , X„.
We note further that the simple components of the algebra B(Xx,.. . , X„) obviously consist of rows of the form (0, . .. , 0, ak, 0, . .. , 0), where the kth entry ranges over the whole algebra B(Xfc) and the remaining entries are all zero. We will identify each such component with the corresponding algebra B(Xfr).
11.79. We conclude this section by finding all two-sided ideals of a semisimple algebra:
Theorem. Every two-sided ideal of a semisimple algebra A is the direct
SEC. 11.8
REPRESENTATIONS OF SIMPLE AND SEM1S1MPLE ALGEBRAS 327
sum of a certain number of simple components of A.
Proof According to Sec. 11.78, the semisimple algebra A is isomorphic to some algebra of the form B(XX, . . . , XJ with simple components B(XA;), 1 < k < n. Let I be a two-sided ideal in B(XX, . . . , Xn), and let Ik be the intersection of I with B(Xfc). If I contains the element
a = (alt . . . , ak_u ak> ak+l, . . . , an),
then I also contains the element
aek = (0, . . . , 0, ak> 0, . . . , 0),
where ek is the unit in B(Xfc). It follows that I can be written as the direct sum
I = Ix + ■ ' ' + I„.
But it is easily seen that Ifc is a two-sided ideal in the simple algebra B(Xfc) for every k (1 < k < n). Hence either Ifc = {0} or Ik coincides with the whole algebra B(Xfc). |
11.8. Representations of Simple and Semisimple Algebras
From a knowledge of the structure of simple and semisimple algebras, we can without particular difficulty find all their representations to within an equivalence.
11.81. Let A be a semisimple algebra. Then, by Sec. 11.78, we can identify A with the algebra B(Xl5. . . , X„) for some set of spaces Xk (1 < k < n). Therefore, besides the given algebra A, we are led in a natural way to consider n representations T*: A —* B(Xfc), 1 < k < n of A, defined by the formula
T0fc = ak e BfXfc)
for every a = (alt . . . , ak,. . . , an) e A. Since the image of the representation Tfc is the whole algebra B(X^), these representations are all irreducible.
Theorem. Every irreducible representation of a semisimple algebra A is equivalent to one of the representations Tfc (1 < k < n).
Proof Let A = B(Xl9 . . . , X„) be a semisimple algebra, with an irreducible representation T:A—^B(X), and let Z(T) be the kernel of the representation T. Since Z(T) is a two-sided ideal in A (Sec. 11.21a), it follows from Theorem 11.79 that Z(T) is the direct sum of certain simple components of A. Let At denote the direct sum of the remaining simple components of A which do not figure in Z(T), and let T(1): At — B(X) be the restriction onto Ax of the original representation T. The new representation T(1) is now exact,
328     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
and moreover irreducible since the images of the representations T(1> and T obviously coincide. The algebra Ax, having an exact irreducible representation, must be simple (see Sec. 11.64). Hence Ax reduces to a single simple component, i.e., Ax coincides with B(Xt) for some k (1 < k < «). But then, as is easily seen,
T0 = Tj»      ak e B(X,)
for every a = (au . . . , ak, . . . , an) e A.
Now, according to Sec. 11.51, all exact irreducible representations of a simple algebra are equivalent. In particular, the representation T(l) :B(Xfc) —>-B(X) and the identity representation T(2> :B(XJ —>- B(Xfc) are equivalent. This means that there exists an isomorphism U:X Xk such that UT^1' = Tj£>U for every ak e B(Xk). But Ta = T^> for every a e A, as just shown, while on the other hand it follows from the definition of the representation Tk that Jka = Tg>. Therefore UTa = T*U for every a e A, which proves the equivalence of the representations T and T'. |
11.82. Next we consider arbitrary representations of simple and semi-simple algebras. In this regard, the following general proposition will be found useful:
Lemma. Given an arbitrary algebra A, let T:A—v B(X) be any representation of A, and let Xx, . . . , X„ be minimal invariant subspaces of T spanning a linear manifold which coincides with X.f Then X is the direct sum of certain of the subspaces Xx, . . . , Xn.
Proof. An intersection of invariant subspaces of a representation is itself an invariant subspace. Therefore it follows from the minimality of the given subspaces that for any k, the intersection of the subspace Xfc+X with the linear manifold spanned by the subspaces Xx, . . . , Xk is either empty or Xfc+1 itself. Hence by consecutively choosing those of the subspaces Xx, .. . , Xn which are not contained in the linear manifold spanned by the preceding subspaces, we get the subspaces whose direct sum is the whole linear manifold spanned by Xx, . . . , Xn, namely the whole space X. |
11.83. According to the second structure theorem, every semisimple algebra A is isomorphic to an algebra of the form B(XX,. .. , Xn). In what follows, we will find it convenient to consider the realization of B(XX,..., X„) in the form of an algebra of rows, each made up of n matrices of the appropriate orders. The number appearing in the "//th" place in the Arth matrix of the row corresponding to the element aeA will be denoted by X^*(a). Moreover, we will use e[k) to denote the element of the algebra A such that
t By the linear manifold spanned by ihe spaces X1( .. . , X„ we mean the set of all linear combinations of the form       + • • • + a.nxn where xke Xt (cf. Sec. 2.51).
SEC. 11.8
REPRESENTATIONS OF SIMPLE AND SEMlSlMPLE ALGEBRAS 329
H^iep) = 1 wn'le a11 other elements in the matrices of the corresponding row equal zero. It should be noted that
2 *»' = e, (3)
where e is the unit of the algebra A.
Lemma. Let T:A—^B(X) be a representation of a semisimple algebra A and suppose the vector y — Te<*tx is nonzero for some x e X and certain
indices i and k. Then y belongs to some minimal invariant subspace of the representation T.
Proof Let Y = {Tay.a £ A}. Then, since y = T^ix, it follows from the rule for matrix multiplication that every element zx £ Y is of the form Zj = Tbx, where b is some linear combination of the elements ef> (with **and k fixed). It is sufficient to show that if zx ^ 0, then zx is a cyclic vector with respect to the restriction of the representation T onto Y.
Now let z2 £ Y, so that z2 = Tcx, where c is another linear combination of the same elements e[k). Using the realization of the algebra A as an algebra of matrix rows, we find an element a £ A such that c — ab. But then z2 = Tcx = Ta(T6;c) = Tazlt Hence the vector zx is cyclic, as asserted. |
11.84. Theorem. Every representation of a semisimple algebra A is a direct sum of irreducible representations and the trivial representation.
Proof Given any representation T°:A—^ B(X°), consider the operator T^ where e is the unit in A. Then the formula
x = T>ex + (jc - Tex)
obviously defines an expansion of X° as a direct sum of subspaces X and X0 invariant with respect to T°, where the restriction of T° onto X0 is the trivial representation. We must still show that the representation T:A—► B(X), the restriction of T° onto X, is a direct sum of irreducible representations.
Let ... , xm be a basis in X. Then Te is the identity operator in X, and hence, because of (3), the linear manifold spanned by the vectors of the type Tf<j->jc for all possible indices i,j and k coincides with the whole space X.
By Lemma 11.83, every nonzero vector of this type lies in some minimal irreducible subspace of the representation T. Thus the conditions of Lemma 11.82 are in force. But then the space X is the direct sum of certain minimal invariant subspaces of the representation T, so that T is a direct sum of irreducible representations. |
11.85. Theorems 11.81 and 11.84 together describe to within an equivalence all representations of semisimple (including simple) algebras. In
330     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
particular, we see that the operators of a given representation of a simple algebra (singling out this case for greater clarity) are described in some basis by quasi-diagonal matrices of the form
M
M
(4)
0
where M ranges over the whole set of matrices of the appropriate order and 0 denotes the zero matrix. In the more general case of asemisimple algebra, the corresponding matrices are quasi-diagonal matrices of the form
Mi,
Mi
0
(5)
where each of the matrices Mls . .. , Mk appearing in the indicated larger blocks ranges independently over the whole set of matrices of the appropriate order (in general different for different matrices).
11.86. Incidentally we have described all simple and semisimple matrix algebras (i.e., algebras which themselves consist of matrices). In fact, by merely assigning each matrix of such an algebra its operator (in any basis), we get an exact representation of the algebra. This and the preceding considerations immediately imply the following assertion:
SEC. 11.9
SOME FURTHER RESULTS 331
Every simple (or semisimple) matrix algebra consists of all matrices of the form P~XLP, where P is a fixed nonsingular matrix and L ranges over the set of all matrices of the form (4) (or of the form (5)).
For algebras containing the unit matrix, we get a somewhat different result:
Every simple matrix algebra containing the unit matrix consists of all matrices of the form PlLP, where P is a fixed nonsingular matrix, L ranges over the set of all quasi-diagonal matrices of the form
M	
	M
M
(6)
and M ranges over the set of all matrices of the appropriate order. Every semisimple algebra containing the unit matrix consists of all matrices of the form P~LLP, where P is a fixed nonsingular matrix, L ranges over the set of all quasi-diagonal matrices of the form
My
My
and each of the matrices My,. . . , Mk ranges independently over the whole set of matrices of the appropriate order.
11.9. Some Further Results
Thus we have completed the description of simple and semisimple finite-dimensional algebras, as well as their representations. Further investiga tion of finite-dimensional algebras lies beyond the scope of this chapter.
332     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
Nevertheless, to give perspective, we now cite some well-known results along these lines.
11.91. Wedderburn's theorem. Every finite-dimensional algebra is the direct sum (regarded as a linear space) of its radical and some semisimple algebra^
11.92. The radical of a finite-dimensional algebra consists only of nil-potent elements. Moreover, for every such algebra there exists a positive integer n such that the product of any n elements of its radical equals zero. X
11.93. Every representation of a radical algebra is described in some basis by matrices with zeros on and below the principal diagonal.§
PROBLEMS
1. Prove that every left ideal of the algebra B(Kn) is the set of all operators whose null spaces contain some subspace K' <= K„.
2. Prove that every right ideal of the algebra B(Kn) is the set of all operators whose ranges are contained in some subspace K' <= KM.
3. Find all maximal left and right ideals of the algebra B(KW).
4. Given any semisimple algebra B of linear operators over a space C„, introduce a scalar product (x, y) in Cn such that A e B implies A* e B.
5 (Converse of Problem 4). Given any algebra B of linear operators over a space Cn, prove that if there exists a scalar product (*, y) in Cn such that A e B implies A* e B, then the algebra B is semisimple.
6. Suppose the conditions of Problem 5 are satisfied. Prove that B is a simple algebra if the intersection of the commutator B (Sec. 11.61) and the algebra B itself consists only of operators which are multiples of the identity operator.
7. Let B be the simple algebra consisting of all matrices of the form (6) made up of m2 blocks:
M 0 • 0 0   M  • ■ ' 0
0    0   ■ • • M
t See e.g., N. Jacobson, The Theory of Rings, American Mathematical Society, New York (1943), p. 116.
% See e.g., N. G. Chebotarev, Introduction to the Theory of Algebras (in Russian), Gostekhizdat, Moscow (1949), Sec. 8.
§ Here, of course, it is not asserted that the matrices of the operators of the representation range over the whole set of matrices of this type. See e.g., A, Y. Khelemeski, On algebras of nilpotent operators and related categories (in Russian), Vestnik MGU, Ser. Mat. Mekh., no. 4 (1963), pp. 49-55.
PROBLEMS 333
Show that the commutator of B can be represented (in the same basis) by all matrices of the form
\XE \*E • • ' \mE X21£    XjgZ;    ■ • • X2m£"
Xwi£  Xm;,£  * ■ • XmmE
where the \k (j, k = 1, . . . , m) are arbitrary complex numbers. In particular, show that the intersection of B and B consists only of matrices which are multiples of the unit matrix.
8. For what semisimple matrix algebra B does the commutator B coincide with B itself?
9. Describe every semisimple commutative algebra B (B <= B).
10. Describe every semisimple matrix algebra B for which B c B.
11. Prove that 5 = B for every semisimple algebra B.
12. Let B be the algebra consisting of all polynomials in a single operator A (hence B is commutative, so that B => B). Under what conditions does B = B?
13." Show that if the algebra B {0} consists only of nilpotent elements (i.e., if \k = 0 for some k = k(A) for every A e B), then the equality CB = B cannot hold for any CeB.
14. An algebra B is said to be nilpotent if there exists a number p such that the product of any p elements of B equals zero. Show that an algebra B equal to the direct sum Bx + ■ ■ • + Btn of its right ideals is nilpotent if each ideal B3 (/ = 1,. . . , m) is nilpotent.
15. Prove that if a finite-dimensional algebra B consists only of nilpotent elements, then B itself is nilpotent.
16. Given a nilpotent algebra B of operators in the space K„, let Mx Kn be the intersection of all null spaces of all the operators A e B, let M2 <= K„ be the intersection of all subspaces carried into Mj by the operators A e B, let M3 c KM be the intersection of all subspaces carried into M2 by the operators A E B, and so on. Show that
{0} c Ml c M2 <^ ■ • ■ c M/, = K.n,
where each set is a proper subset of the next and p is the index of nilpotency of B, i.e., the smallest number p such that the product of any p operators in B equals zero.
334     FINITE-DIMENSIONAL ALGEBRAS AND THEIR REPRESENTATIONS CHAP. 11
17. Prove that for every nilpotent algebra B of operators in a space K^, there exists a basis in which every operator A e B is specified by a matrix of the form
0	A12	-^13	
0	0	^23 •	
0	0	0 •	
0	0	0 ■	0
where p is the index of nilpotency of B. (A. Y. Khelemski)
*appendix
CATEGORIES OF
FINITE-DIMENSIONAL
SPACES
A.I. Introduction
A. 11. Recently the concept of a category and certain related ideas have begun to play an important role in various branches of mathematics.! An example of a category is a collection of sets together with mappings of the sets into one another. A collection of linear spaces or algebras together with their morphisms is another example of a category.
The exact definition of a category is as follows: Let s& be a set of indices a, and let be a set of elements Xa (a e jtf) called objects of the category Jf. Suppose that for every pair of objects X& and Xa there is a set of other elements Apo( called mappings of the object Xa into the object X& such that the product of the mappings Arg and is defined for arbitrary a, (3, y and belongs to ^?Ya, where multiplication is associative, i.e.,
A8T(ATpA|jx) = (ASTATp)Apa;
for arbitrary a, (3, y, S. In particular, the set &ao, of mappings of the objects Xa into themselves is defined, and (associative) multiplication of mappings is defined in M^. Finally, it is required that the set ^?aa contain the unit
t See e.g., H. Cartan and S. Eilenberg, Homologkal Algebra, Princeton University Press, Princeton, N.J. (1956); Séminaire A. Grothendieck, Algebře Homoiogique, Secretariat Mathémaťtque, Paris (1958); A. G. Kurosh et al., Elements of the theory of categories (in Russian), Uspekht Mat. Nauk, vol. 15, no. 6 (1960), pp. 3-52.
335
336 Appendix
element la, which has the property that
for arbitrary a, (3 and y. Instead of ^?aa we will usually write simply ^?a.
A set of objects Xa and mappings Apa with the properties just enumerated is called a category. A category is called linear if in the set ^a of mappings Apa (with arbitrary fixed a and (3) there are defined operations of addition of mappings and multiplication of mappings by numbers (from the field AT). This makes the set 88^ into a linear space over the field K. Thus in a linear category the set 88^ becomes an algebra with a unit (over the field K).
A.12. In this appendix we will consider linear categories whose elements are finite-dimensional linear spaces (of dimension >1) over the field C of complex numbers, while the mappings are linear mappings (morphisms) of one such space into another.
Thus we start with the following definition: Let Xa (a e stf) be a set of finite-dimensional complex linear spaces, and for every a let &a be an algebra of linear operators carrying Xa into itself. Moreover, suppose that for every pair of indices a and (3 there is a set of linear operators Apa carrying Xa into Xp such that 1) if contains the operators A&(X and Bpa, then 08^ contains the operator sum Apa + B^, and 2) if 08^ contains the operator Apa, then 38^ contains the product XA^ where X is an arbitrary complex number. A family of linear operators with these two properties will be called a linear family. In particular, the linear family 38^ coincides with the algebra 88 %. It is also assumed that
#*#3x <= (1) for arbitrary <y., (3 and y, i.e., that every product
A3*       (ay3 6^yP» A3» e^3«)
belongs to Such a set of spaces Xa together with algebras <#a and linear families 38^ will be called a category of finite-dimensional spaces or simply a category, and will be denoted by JT.
If we choose a basis in every space Xa, then the algebras 38^ and linear families 38^ can be identified with the algebras and linear families of the corresponding matrices, a fact which will henceforth be exploited systematically.
In what follows, we will find the categories of linear spaces corresponding to given algebras 38a, confining ourselves to the case where the 38^ are semisimple algebras containing the unit matrix. According to Sec. 11.86, for such an algebra the space Xa can be decomposed into a direct sum of subspaces Xaj invariant under all the operators A^, where in each subspace Xxj the algebra 38 % is a simple algebra containing the unit matrix, i.e., is
appendix 337
described in some basis by the set of all quasi-diagonal matrices of the form
where C ranges over the set of all matrices of the appropriate order.
We begin with an analysis of some special cases for which general results can afterwards be stated. Thus in Sec. A.2 we consider the case where every algebra is complete, i.e., is the algebra of all linear operators acting in the space Xa. The opposite case where each 08 ^ is an algebra of operators of the form XE (multiples of the identity operator E) is considered in Sec. A.3. The results of Sec. A.4 pertain to the case of simple algebras this being a natural generalization of the case of the algebras {XE}. In Sec. A.5 we consider the case where each algebra 08^ is an algebra of all diagonal matrices of a given order, while in Sec. A.6 the general category with semi-simple algebras 08 ^ is reduced to the categories considered in the preceding sections.
A.13. We now recall the notation and rules of operation governing matrices of linear operators mapping a linear space X into a linear space Y (see Sees. 4.41-4.43). Let X be an w-dimensional space with basis eu . . . , en, and let Y be an m-dimensional space with basis flf. . . ,/m. Then with every linear operator A mapping X into Y we associate an m x n matrix
a\\    a\2     ' ' ' am
^ __     fl21      a22      * ' * a2n "ml    ®m2     * * * "ma
(with m rows and n columns), where the numbers ah, a2j,.. . , ami in the y'th column are the coefficients of the expansion of the vector Ae, e Y with respect to the basis flf. . . ,fm. Moreover, let Z be a /c-dimensional space with basis glt .. . ,gk. Then with every operator B mapping the space Y into the space Z we associate a k x m matrix
Z>n   b12   * • • blm
D _     ^21     ^22     * ' * b2m
'k2
km
338 appendix The operator C =
BA maps X into Y and has the k X n matrix
C =
-11
*?21 ^22
Ckl Ck2
■In
'2n
kn
obtained by multiplying the matrices B and A in accordance with the formula
ci>q = 2bp>aiQ      (P = U • • • , k\ q ~ 1, . . . , «).
A.14. The following fact, slightly generalizing Examples 4.44a-b (and proved in the same way), will often be found useful:
Lemma. Given an m X n matrix A = \\a.M\\, suppose A is multiplied from the left by a k X m matrix B = |j6„|| with all its elements equal to zero except the single element br s =1. Then the result is a k X n matrix BA whose rQth row consists of the elements of the sQth row of the matrix A while all other elements of BA vanish. On the other hand, if the matrix A is multiplied from the right by an n X I matrix C = jjc„jj with all its elements equal to zero except the single element criSi, the result is an m X I matrix AC whose sxth column consists of the elements of the rxth column of the matrix A while all other elements of AC vanish.
A.15. It follows from the lemma that if an m X n matrix A is multiplied from the left by a k X m matrix B and from the right by an n X I matrix C, where B and C have the indicated properties, then the result is a k X I matrix BAC all of whose elements vanish with the (possible)exception of the single element, equal to as , appearing in the roth row and ^th column (cf. Example 4.44c).
A.2. The Case of Complete Algebras
A.21. Suppose the category consists of finite-dimensional linear spaces Xa, where for every a the algebra of operators acting in Xa is complete, i.e., is the algebra of all linear operators in X^. Fixing arbitrary bases elf . . . , en in the space Xx and /-.,... ,fa in the space X2, we can identify the operators in the sets .^u, MX2, :M21, z$22 with the corresponding matrices.
Let n be the dimension of the space Xx and m the dimension of the space X2. Suppose the family i^21 contains a nonzero operator A, so that the corresponding m X n matrix A = has at least one nonzero element,
say aPoQo. We can assume without loss of generality that       = I. It follows
APPENDIX 339
from the condition (1) and the assumption that 39 x and 89 2 are complete matrix algebras that the product of A from the left by an m x m matrix and from the right by an n x n matrix is itself a matrix in the family 39^. But, according to Sec. A. 15, there is always an operation of this kind leading to an m X n matrix with a unique nonzero element equal to 1 in any preassigned position. Hence, since any m X n matrix is a linear combination of such matrices, we see that 8§21 contains all m X n matrices, i.e., 8$^ is a complete family of operators mapping Xx into X2.
A.22. As we will see below, the category Jf just described can be related to a certain partially ordered set.
Definition. A set S is said to be partially ordered if for every pair of elements A, BeS there is a relation, denoted by the symbol < (and read "less than or equal") satisfying the following axioms:
a) If A < B and B < A, then A = B;
b) If A < B and B < C, then A < C;
c) A < A for every A.
A somewhat more general concept is that of a prepartially ordered set, by which we mean a set S with a relation < satisfying only axioms b) and c). In this case, if A < B and B < A, we call A and B equivalent and write A ~ B. Then A ~ B and B ~ C together imply A ~ C. In fact, by axiom b), it follows from C that ,4 < C and from C < B, B < v4 that
C <i A. But ,4 < C and C < v4 together imply A ~ C. Therefore the relation < allows us to partition the whole set S into (equivalence) classes $4,89 ,. . . , where each class contains all elements equivalent to A as well as a given element A, while elements A and 5 belonging to distinct classes are nonequivalent.
Next we introduce the relation < for the classes j& and 39 themselves, writing $0 < 39 if there exist elements A estf, B e 39 such that A < B. This definition is independent of the choice of the elements A e <s/, Be 39. In fact, suppose Ay e jrf, Bx e3S, so that A ~ Alf B ~ i^. Then /4X < /4 < B < i?! and hence < as required. The fact that axioms b) and c) for a partially ordered set hold for the classes jtf,3§,... now follows from the fact that they hold for the elements A, B, . . . . To show that axiom a) also holds for the classes 39, . . . , let < 89, 89 < s& and choose arbitrary elements A e j/, 5e J1. Then v4 < 5 and £ < ^4, so that ^ and £ are equivalent. But then jrf and 89 coincide, ie.,     — 88, as required.
Thus by introducing an equivalence relation in a prepartially ordered set S, in the way indicated, we arrive at a partially ordered set of classes of equivalent elements.
340 APPENDIX
A.23. We now resume our study of the category It follows from Sec. A.21 that given any pair of spaces Xx and X2, there are just four possibilities:
a) ^?12 and ^?21 are both complete sets of operators;
b) ^?12 is a complete set and &21 consists of the zero element alone;
c) &2\'s a complete set and 08x% consists of the zero element alone;
d) ^?12 and ^?21 both consist of the zero element alone.
If ^?12 is a complete set and no assumptions at all are made about ^?21, we write Xx < X2 (the relation X2 < Xx has the analogous meaning).
As we now show, the relation < makes the category into a pre-partially ordered set. In fact, &u is a complete set of operators for the given space Xl5 by hypothesis, and hence Xx < Xx. Moreover, if Xx < X2 and X2 < X3, then ^?12 and ^?23 are complete sets of linear operators mapping Xx into X2 and X2 into X3, respectively. Since all our spaces have dimension > 1, there is obviously a nonzero operator in the set ^?13. In fact, let ex e Xx, e2 e X2, e3e X3 be fixed nonzero vectors. Then such an operator can be obtained as the product AB, where the operator A e 88 x% carries e2 into ex and the operator B e ^?23 carries ez into ez. By Sec. A.21, 88xz is a complete set of operators carrying X3 into X1? so that Xx < X3. Thus axioms b) and c) are satisfied, and the category Jf" has been made into a prepartially ordered set.
A.24. In accordance with Sec. A.22, we now introduce an equivalence relation in JT, writing Xx <—< X2 if X2 < Xx and Xx < X2, i.e., if both 88X2 and 882X are complete sets of the corresponding linear operators. Then the set of spaces Xa decomposes into classes of equivalent spaces, and the set of all such classes becomes a partially ordered set when equipped with a relation as in Sec. A.22.
Conversely, every partially ordered set of classes 9£a of finite-dimensional spaces defines a category of the type under consideration. In fact, for spaces Xx and X2 belonging to the same class we specify 88X2 and ^?21 as complete sets of operators, while for spaces Xx and X3 belonging to classes 9EX and 3C% such that #*! < #3 (i.e., such that SCX < 3CZ but 3CX 7^ &3)', we specify 88 xz as a complete set and 882x as the set consisting of the zero element alone. Moreover, if Xx and X4 belong to noncomparable classes 9EX and 5T4, we specify that 88xi and 88^ both consist of the zero element alone.
The description of categories of the indicated type is now complete.
A.3. The Case of One-Dimensional Algebras
A.31. Turning to the case where the given algebras are all one-dimensional, we consider two simple examples:
a. Let the category JTj consist of two spaces Xx and X2 of the same dimension, and let the set 882x consist of an operator A mapping Xx onto X2
APPENDIX 341
in a one-to-one fashion together with all its multiples XA, XeC, while the set ^?12 consists of the operator B which is the inverse of A together with all its multiples jaB, fx e C. Then obviously
^12^21 = {*E},      ^21^12 = {XE}.
b. Let the category Jf2 consist of two arbitrary spaces Xx and X2 with fixed subspaces X{ <= X! and X2 <= X2, and let the set ^?21 consist of all operators carrying X! into X2 with X{ going into {0}, while the set ^?12 consists of all operators carrying X2 into X{ with X2 going into {0}. Then obviously
#u#ai = {0},      ^21^12 = {0}.
It will now be shown that the categories and JT2 essentially exhaust all categories consisting of two spaces with ^i..— {XE} (j — 1, 2), i.e., that the following alternative holds for any such category Jf: Either &l2&2l — {0}, in which case 8$2X@$X% — {0} also and the category is contained in a category of the type JT2, or the spaces Xj and X2 have the same dimension and Jf is a category of the type J€~x.
A.32. Thus let Jf be a category consisting of two spaces Xx and X2 subject to the condition &x — {XE}, 382 = {XE}. Let 1^ <=■ XL be the intersection of the null spaces (Sec. 4.62) of all operators A21 e 08%Xt and let N2 <= X2 be the intersection of the null spaces of all operators A12 e ^?12. If M2XXX <= N2 and ^12X2 <= Nl5 we are dealing with a subcategory of a category of the type Jf2 in which X^ = N1? X'2 = N2. Therefore we assume that SS2XXX is not contained in N2, say, and hence that there is a vector xx e Xx and an operator A21 e 8d2X such that A2Xxx = x2 does not belong to N2.
Every operator Bai e 0§2X carries xx into a vector collinear with jc2, and every operator Qx% e ^?12 carries x2 into a vector collinear with xx. In fact, let A2lxx = jc2, B2Xxx — y2, and consider an operator C°2 e ^?12 such that C*2x2 7^ 0. Then, by the basic condition, C°2x2 = C°2A21x1 = Xx1? where X 7^ 0. Replacing C°2 by a multiple of C°2, we can assume that X= 1. Moreover B12C°2x2 = Bzlxx = y2f while at the same time B21Cj2x2 = \ix2, and hence y2 = (ut2. Since, conversely, xx = C°2jc2 and xx$\Sx by the definition of xx, we have analogously C12x2 = y.xx for every C12 e 8$xv
Moreover, in the given case, Ni and N2 reduce to the set {0} consisting of the zero vector alone. In fact, if z2 eNj, then A2i(:x;i + zx) = A21xx — xz> i.e., the vector xx in the above construction can be replaced by xx + zx. But then C^2x2 is a multiple of both xx and xx + zx, so that xx and zx are collinear. Therefore zx = 0, since xx e N,. It follows that Nx = {0}. Similarly, starting with x2, we find that N2 = {0}.
We now see that xx can be chosen to be any nonzero vector of the space Xx, since there is always an operator A21 e ^?21 carrying xx into a nonzero
342 APPENDIX
vector. Hence the operators of the set ^?2i establish a one-to-one correspondence between all the straight lines of the space Xx and some set of straight lines of the space X2, in fact the set of all straight lines of the space X2 by the symmetry of our construction.
Next we prove that the whole set ^?21 reduces to the set of multiples of a single operator. Let Xi 7^ 0 be an arbitrary vector of the space Xlf and let x2 be a nonzero vector determining the straight line in the space Xa corresponding to xt. As we know, there is an operator A^ e &2i carrying xx into precisely xa. Every other operator A21 e &2i carries xx into Xx2 for some X. First suppose A21Xi = Xx2, where X^O. Then the operator
B21 = -■ A21 A
carries xx into precisely x:2. Moreover, B21 coincides with A°x everywhere. In fact, suppose to the contrary that A%1y1 =/2, B^x = z2 ^ y2. This can happen only if y2 ^ 0, z2 =»>2, u. ^ 1 or if/2 = 0, z2 =j£ 0. Let zx = axL + P/x be a nonzero vector with a y= 0, fi ^ 0. Then the vectors A°1z1 and B21Z! are collinear, as proved above. But this is impossible in our case, since
A2i(ax! + (JyO = ax2 + $y2,      Bai(axi + $y{) — ocx2 + (3|Ay2 if y2 ^£ 0, while
A21(ax! + (3^i) = ax2,      B^aXi + P.v1) = ax2 + (3z2
if y2 = 0. This contradiction shows that if A21X! = Xx2, X 9^ 0, then A21 = XA°r Now suppose A^Xj = 0. Then, as just proved, A*x + A21 = A^ and he nee A21 — 0. Thus ^*2l reduces to the set of multiples of a fixed operator A2i, and similarly 0§x2 reduces to the set of multiples of a fixed operator B°2. The products A^BJ and B° Aj^ are nonzero and, by the basic assumption, give operators which are multiples of the identity operator. Hence the operators A^ and B°2 are inverses of each other (apart from a numerical factor). But this is possible only if the spaces Xx and X2 have the same dimension. Thus, finally, we have proved that every category Jf of the indicated type which is not a subcategory of a category of the type Jf2, is a category of the type JTX.
A.33. The categories of the types Jf^ and Jf2 are not tne onty possible categories with two spaces X1; X2 and algebras 08 x = {XE}, 08 2 = {XE}. In fact, suppose that in the set 082X of a category of the type JT2 we choose a linear subset without increasing Nx or decreasing N2 (for example, by imposing a suitable extra linear homogeneous condition on the elements of the matrices of the operators A21). Then we get a category Jf' which satisfies the given conditions but does not coincide with Jf. In the set of all categories with      = {XE} (y = 1,2), partially ordered with respect to set inclusion,
APPENDIX 343
the categories of the type JT2 are characterized by the fact that they are maximal, in the sense that no category of the type Jf2, except for singular cases where X( = {0} or X^= {0}, can be enlarged while preserving the properties of a category and the conditions ^ = {XE}. In fact, suppose that a category of the type Jf2 can be enlarged by including an operator taking a valuey2 $ Xa for some xL e Xi — X[, where X( = {0}. Let B°2 e &l2 be an operator carryingy2 into a nonzero vector x[ e X'v Then B^2A°1x1 = x[, contrary to hypothesis.
Moreover, suppose that in the category we include an operator A21 carrying a vector xx e X[ into a nonzero vector^ e X2. Then clearly X'2 ^ X2, since otherwise X^ = ^12X2 = ^12X2 = {0}, and there cannot exist a vector xx mapped into a nonzero vector. Hence there is an operator B12 e 0SX2 carrying a vector y2 e X2 — Xg into xx. But then Ai2B21/2 ~ y'2-> contrary to hypothesis.
Similarly, assuming that X^ ^ {0}, we find that it is impossible to include a single extra operator in the set 08\2. Thus our category $f of the type Jf"2 is indeed maximal, under the assumption that X[ jtz {0}, X'2 jtz {0}.
A.34. The singular cases must be considered separately. For example, suppose X[ = {0}, so that 88 X2 consists of the zero operator alone. Then, if X^ ^ X2, the category is nonmaximal, and we can enlarge the set ^?21 to include all operators mapping X! into X2 without dropping the conditions &j = {XE} (j=l,2). This gives a "trivial" maximal category, where ^12 = W and ^?21 is a complete set of operators mapping Xx into X2. There is an analogous maximal category with ^?21 = {0} and 88X2 a complete set. Thus, finally, we find that the general category of the type Jt2 is maximal under the following conditions: 1) X[^ {0}, Xa ^ {0}; 2) X( - {0}, X'2 = X2; 3) X; = Xlf X'2 « {0}.
A.35. We now turn to the general case of a category with an arbitrary number N < oo of spaces Xa, ote«s/, Here we have the following analogue of the alternative proved in Sec. A.31:
Theorem. If' 08x = • • • = 8§k = {XE}, then either the product &\k3Skk_y * • * ^32^21 vanishes, or the spaces XL, . . . ,Xk all have the same dimension and ^a = {XA?} where the Kti are fixed invertible operators such that
■^lfc^fc.fc-l * * * ^32^21 = E.
Proof. Suppose the product ^lk^kk^ * * * ^32^21 contains a nonzero operator, which is therefore equal to XE with X ^ 0, and let rj be the dimension of the space X,- (j = 1, .. . , k). Consider the category JT0 made up of the two spaces XL, X2 and the following sets of operators 8§\v
~ m ... m — ^
v0v> — ^'l]c<J®k.k-l ""32> 21 — ^21
344 APPENDIX
(0$\2 is the linear manifold spanned by the corresponding operator products AijtAjt^i * * * A32, each mapping X2 into X^. Since clearly ^2^21 ^ {0}, it follows from Sees. A.31-A.32 that Xx and X2 have the same dimension fi = r2> while 38\x = ^21 = {AA°J where A^ is an invertible operator with inverse (A^)-1 and 0$\2 = {^(A^)-1}. Similarly, applying the same argument to the category Jf0' made up of the two spaces X2, X3 and the linear manifolds ^°3, 0^% spanned by the operators of the form
(AaO^AifcAjt^! ' 1 * A43
and A32, respectively, we find that r2 ~ r3 and ^?32 = {aA°2} where A^2 is an invertible matrix. Continuing in this way, we arrive at the desired conclusion after k steps. |
A.36. In this section and the next, when considering a category made up of N spaces, we will assume that all the cyclic products 08^08^ * * * 08 m vanish. Otherwise, we would simply identify the corresponding spaces which are all of the same dimension.
First consider the following concrete category, which we denote by Jff: Let X12,. .. , X1Ar be N — 1 arbitrary subspaces of the space Xls and for distinctj, k, I, . . . let
Xijfc = x^ n xlk, = x^ n xlfc n xu,...,
where we successively form intersections of the spaces Xu two at a time, three at a time, and so on. If N is finite, the last intersection will be X^,..^, the intersection of all N — 1 of the selected subspaces, while if TV is infinite there will be no last intersection. Let the same construction be carried out in all the remaining spaces X2, X3,. . . , where the index of the whole space is always the first of the indices appearing in the symbol used to denote any of its subspaces. Thus to any set of distinct indices j, k,.. . (in that order) there corresponds a unique subspace of Xf. As for the sets 0$&Oi, we define 082X as the set of all operators mapping Xx into X2 such that every subspace Xljt>>fc goes into X2Xitttk if the sequence jt. . . , k does not contain the index 2 and into the set {0} otherwise, with the other sets 08^ being defined similarly.
We now prove that Jf2y is in fact a category. Given operators A21 e 082X and B32 e ^?32, consider the operator b22a21 carrying the space Xx into the space X3. The operator Aax carries the subspace XXj-._-ft into X2Xim_k and then B32 carries Xa,..iJt into X321i.„k c X3W,.iJfc, Hence B^A^ e ^31, as required. Moreover, if there is a sequence of operators A^, . . . , Afcl mapping the space Xx into itself, then the resulting operator carries Xx into XXj _kX = {0}, in keeping with the requirement that 08 Xi ■ ■ • 08kX = (0},
APPENDIX 345
A.37. Next we show that every category} made up of N < oo spaces Xx, X2, . . . with 08i = {XE} is contained in a category of the type Let Xik be the total image in the space X, of the space Xk under the action of all operators in &jk, and let XikUiSm be the total image in the space X, of the space Xm under the action of all operators of the form AjkAkl ■ ■ ■ Asm (in that order). Then Xjkl_^m is contained in the intersection of Xjk, Xn,. .. , Xim. In fact, if z e X,fcI...,m, then
z     Z. ^ik^kl        ^n'/v/r ^jm^m'
a
where z* eXM) or equivalently,
2 e 2, AikAkl ' ' ' ajj?3'(?»
where
— 2* Aqr ' ' ' Asm2m b Ag-a
But A^A^ - ■ ■ A^ e 08jqi and hence z e Xiv, as required. Note also that Au carries X,...m into X„...m.
It is now clear that our category is contained in a category of the type JT*V with defining subspaces Xjk. In particular, all the maximal categories must be of the type JT^. However, it is not clear what conditions on the subspaces X,fc make a category of the type Jf^ maximal. (Recall that in the case of two spaces Xx and X2, a necessary and sufficient condition for maximality of a category of the type JT2 is that the spaces XX2 and X2X either be both different from {0}, or else that one of the spaces be the whole space while the other is the space {0}.)
A.4. The Case of Simple Algebras
If the given algebras 08^ are all simple, then, by Sec. 11.86, each consists of the set of all quasi-diagonal matrices of the form
(2)
in some basis, where C ranges over the set of all square matrices of order ra.
t Of the special type under consideration.
346 APPENDIX
A.41. First we consider a category with just two spaces, a space X! of dimension nr with kx blocks of size mx (so that nx = kxmx) and a space X2 of dimension n2 with k2 blocks of size m2 (so that n2 = k2m2). Then every matrix %2X e£%t2X can be partitioned into blocks as follows:
m.
		^12
%zX —	A2X	
		■
\
7
Similarly, every matrix 9312 e ^12 can De written in the form
m9
mx\
	Bx2
	
•	
Bkli	
B
\
B
7
Theorem. Either %X^X2 = 0 (for arbitrary %x e ^21, ©we^iaO- or = /c2 matrices Ajk are all multiples of an (arbitrary) fixed matrix
A and the matrices Bih are all multiples of an (arbitrary) fixed matrix M, with the constants of proportionality making up a pair of mutually inverse matrices A and B of order kx = k2."f
Proof If the matrices %tX and 5312 belong to the category JT, then so does their product (from the appropriate sides) by matrices Cx and C2 of the form (2). Therefore, along with the equality
?t2193i2 = C2,
we also have
9t2iCrBia = C2
f A category of the second type will be denoted by Jf3.
APPENDIX 347
for an arbitrary matrix Cx of the form (2). Recalling the rule for multiplication of block matrices (Sec 4.51), we have
AllCBll + Al2CB2l 4- ■ • • + AlkCBkil
— A2xCBl2 -\- A22CB22 + * *' + A2kiCBki2
™ ' ' ' ~ AktlCBlk> + Akt2CB2kt + ■ • • + AkakiCBkikt,
AuCBl2 + Al2CB22 + ■ ■ ■ + AlkCBki2 = 0,
(3)
Let C be the matrix with a single nonzero element, equal to 1, appearing in the rth row and 5th column (r < mlf s < my). In general, if A is any m2 X mx matrix and B any mx X m2 matrix, then ACB is amy X m2 matrix of rank 1, with the element a^Jb^ appearing in the pt\\ row and ^th column. With this choice of C, the formulas (3) become
avrbs<t t aj>r°SQ + + avr °sq
= anb12 + a22b22 4-----h a2klbkl2
upr"*q   i   upr"sq   ' '     vr "sq
= • • • = alfbl? + akPfb2kt + ■ ■ ■ + (4) a^h12 + a12i>22 4-----h alklbkl2 = 0
uvrusq   i   uprusq   i i   uvr   so u>
where the superscripts denote the indices of the corresponding matrices. We can regard (4) as a single matrix equation
AvrBm
a
a
n
vr
.21
vr
a
a
12
vr
.22
pr
a
pr
a
to 2 pr
X 0 0 x
a
a
vr
2k i pr
a
ktki vr
J11
21
' sq
12
'sq
22
'sq
bkil bkl2
"sq "sq
uSq
b2ki
"sq
Ukikt
"sq
348 APPENDIX
Similarly, we have
11 i21
uSq \L 0
0 fJL
1,12
°sq
uSq
SQ
blk* b2k*
"sq
$q
a
a
.11
vr
21 pr
a
a
.12 pr
22
vr
a
a
vr
2k i vr
a
vr
vr
a
ktkt
vr
0 0
0 0
V-
Thus we see that the matrices Avr and (with parameters p, r, s, q) form a category connecting the space Xx of dimension kx with the space X2 of dimension k2, subject to the conditions
mx = {xe},    m2 = {[jlE}.
We can now apply the alternative proved in Sec. 12.32. Namely, if kx •=£ k2, then in fact x = 0, u. = 0, while if x ^ 0 (or if fx y= 0) for at least one set of indices p, q, r, s, then kx = k2 and the matrices Avr are all multiples of a single invertible matrix A, while the matrices Ssg are all multiples of the inverse matrix B = A'1:
Apr — ~A7rrA,        BSQ = \J.gqB.
The matrix Apr consists of the elements of the matrices Ajk appearing in the pth. row and rth column. Hence
ik
where the aik are the elements of the matrix A of order kx — k2\ It follows that the matrices Aik are all multiples with coefficients a*' of a fixed matrix A = HX^JI, and similarly for the matrices Sik, Moreover, the matrices A and B are inverses of each other, as already noted, |
A.42. Thus if ?l2i®i2 0, then kx = k2 and the category is of the form
	a^A
a*lA	■
■	
akllA	■
58i2 =
BllM	B^M   ■ ■ ■	frklM
S21M		*
bkllM		
a*lklA
(5)
APPENDIX 349
where A = is an m2 x m1 matrix and M = is an mx X m2 matrix. Among the matrices A figuring in the given category there must be a nonzero matrix (since ^21^12 ^ {0})> and hence any m2 X m1 matrix must be a matrix A since we can get any nonzero matrix by multiplying A^ from the right by C2 and from the left by Cx. Hence if ^21^12 ^ {0}, the set ^21 consists of all matrices of the form (5), where A = 115**11 is a fixed invertible matrix and A ranges over the set of all m2 X mx matrices. The situation is similar if &X2&2X 7^ {0}. It is now clear that the inequalities &xZ$2X ^ {0} and ^21^12 7^ {0} either both hold or both fail to hold.
A.43. The above results can be formulated in terms of tensor products, an approach which allows us to explain some further facts as well. Thus we begin with the following definition:
Given a A>dimensional linear space X with a basis ex,. . . , ek and an /w-dimensional linear space Y with a basis fx, , . . ,fmi by the tensor product X x Y = Z of the spaces X and Y we mean the set of all finite formal sums
P
I^vX yVi
where xv eX,j/ve Y. Here it is assumed that
[xi Xj»] + [x2 X y] = [(xx + x2) X y], [x X yx] + [x X y2] = [x X (yx + y2)],
P P P
Jxvxv x yv =][xv x M\ =2Av[xv x yv].
V—1 v=\ v=l
It follows that Z is a linear space of dimension <Atw, where all the vectors of Z can be expressed, in terms of vectors of the form e. x fi (2 = 1,. .. , k; j = 1,. .. , m). It is further assumed that the vectors e{ X fj are linearly independent and hence form a basis for the space Z, so that the coefficients ci} in the expansion
k m
g = 2I^« xji] (6)
can be uniquely determined. We can write (6) somewhat differently by summing over the index i. This gives
m  J k \ m
i=i \i=i     / j=i
where the
k
*j = 2 CUei
1=1
are arbitrary vectors of the space X (no longer necessarily basis vectors).
350 APPENDIX
A.44. Let A be an operator mapping a space Xx of dimension kx into a space X2 of dimension k2, and let B be an operator mapping a space Yj of dimension ml into a space Y2 of dimension tnz. Then by the tensor product C = A x B of the operators A and B we mean the operator mapping the space Zx = Xx x Yx into the space Z2 = X2 x Ya in accordance with the formula
C[e\ X ff] - Ae\ X Be) (7) (the superscript is the index of the space). If
X=l ix=l
then (7) takes the form
C[e\xf}]=2 2aabiv[el X f*].
X=l |A=*1
Next we find the structure of the matrix C of the operator C with respect to the bases e\ X ff and e*Xff arranged in Xx x Yx in the order
e\ X f\, e\ Xfl,..., elkl X /}, e\ X f\, e\ X f\,. . . , 4 X f\, . . .,
^1 X /mi? e2 X ■ ■ - > ^fci X /mi,
and similarly in the space X2 x Y2. According to Sec. A. 13, the matrix C has the form
<*lAl «12*11
a21bu a22&u
a^Ai aje^bn
«2fc b
ii
^l^n^l   ak-fibm{L     ' ' ' ak1k1bm1l
«21*1^2 fl22*lm2
tffcj^lmz flA-12*l7«2 ak1lbmim2 ak12b7n1mi
alk2blmt a2k2bXm2
^fcifcj^lm,
fllfcl*m1m1
^■k^k^b mlmi
or
11
^*m.l
^1,
when written as a block matrix.
APPENDIX     351
A.45. Applying Sees. A.41 and A.44, we see that the operators of the algebra 08x considered above are the operators in the tensor product of an mx-dimensional space XT and a k^dimensional space Yx which are tensor products of an arbitrary operator C e ^(X^ and the unit operator E£ 0${YX). Moreover, the operators of the set 0§2X are the tensor products of an arbitrary operator A e ^!?(X2, Xx) and a fixed invertible operator A e ^?(Y2, Yx), while the operators of the set ^?ia are the tensor products of an arbitrary operator M e ^(X1? X2) and the inverse operator A-1,
A.46. The following formula obviously holds for products of tensor products of operators:
(A x B)(C x D) = (AC) x (BD).
Hence, multiplying the operators A21 e ^21 and A12 e 08X2, we find that
(A x A)(M x A"1) = (AM) x (AA_1) = (AM) xEel2)
as must be the case for a category
A.47. Next we find the invariant subspaces of the algebra 0$ — {C x E} of operators acting in the space Z = X x Y. These subspaces are tensor products of the form X x Y0, where Y0 is an arbitrary subspace of Y, since
(C x E)(X x Y0) = CX x EY0 e X X Y0.
To see that Z has no other invariant subspaces, let
z = 2 xi x y*
be any vector in Z (it can be assumed that the vectors xt are linearly independent), and suppose C carries the vectors xt into given vectors xt e X. Then
(Cx E)2>, xyt = 2*.- *yi> and hence any subspace invariant under all the operators C x E which contains the vector 2 xi X yj also contains every vector ]T xi X y%- This proves the italicized assertion.
If we apply every operator Ax A of the category to an invariant subspace Xx x Yxo <= Xx x Yx, then, since the matrix A is arbitrary, the resulting image in the space Z2 = X2 x Y2 is the subspace AXX x AY10 = X2 x Y20- Hence the operators of the category establish a one-to-one correspondence between the invariant subspaces of the spaces Zx and Z2, at the same time establishing a one-to-one correspondence between the ordinary subspaces of the spaces Yx and Y2.
A.48. Everything said above is valid under the condition &21&12 ^ {0} (or equivalently #ia#ai ^ {0}). If ^12^21 = ^21^i2 = {0}, the above
352 APPENDIX
scheme does not work, and the matrices of the category do not in general consist of blocks which are multiples of a fixed matrix A. The situation is then the same as in Sec. A.32, and we can apply the result proved there, i.e., our category JT is contained in some category of the type JT2 (just which one to be explained below).
A.49. We now turn to the case of a category made up of an arbitrary number of spaces Za (a e sf)\ and simple algebras 88^. Two spaces Zx and Z2 will be called cognate if 88l2882X =j£ {0}, so that the matrices of 88 21 are of the form (5). It is clear that the relation of being cognate is transitive. In fact, if Zx is cognate to Z2 and Z2 is cognate to Z3, then Zx is cognate to Z3, since, by the arbitrariness of the matrices A, there are nonzero matrices in the product ^?32^?21. Hence we can partition the whole set of spaces Za into nonintersecting classes of cognate spaces. If Zx and Z2 belong to distinct classes, then 88l28821 = 882l88l2 = {0}.
We can now repeat the scheme of Sec. A.36 with certain modifications. Suppose our set of spaces Za is partitioned into various classes Glt . . . , Gr, . . . of cognate spaces, where the spaces belonging to the class Gr are of the form Xri x Yr, and Yr denotes essentially one space in which invertible operators act. We first consider the spaces Yr by themselves, and construct for them a category Jf^ just as in Sec. A.36 (satisfying the condition ^is&si — {0}) by choosing arbitrary subspaces Yr(Jl and then forming their intersections Yr(Jlv, Yr(AVT,.... This category consists of the operators Ai3 mapping Ys into Yt and at the same time carrying the subspaces YSVlV, ...cY, into the subspaces Yig(Jlv,. . . <= Y^. Then for the spaces Za we construct the following category, denoted by Jf^: If Z. and Zk are cognate, the operators Aik e 88 ik are those previously constructed, while if Z, — x Y^ and Zk = Xk x Yk belong to distinct classes, the operators Aik are arbitrary operators mapping Zk into Zi and at the same time carrying every invariant subspace Xk x YkViV   into an invariant subspace X^ x YifclAVi .
We now verify that every category with simple algebras ^ is contained in a category of the type JQV. Suppose Zj = Xs x Y, and Zk = Xk x Yk belong to distinct classes of cognate spaces. Let Zik be the total image in the space Z, of the space Zk under the action of all operators in 88ik. Then Zik is obviously an invariant subspace of Z^, and hence is of the form X,- x Yjk where Yik is some subspace of Y^. Similarly, let Zm.sm be the total image in the space Zi of the space Zm under the action of all operators of the form AikAkl • • • ASm. Then Zm_$m is also an invariant subspace, which is easily seen to be contained in the intersection of Zik, Zn,. . . , ZJm, by an argument like that given in Sec. A.37. It follows that our category is contained in a category of the type Jf^, as asserted.
t We temporarily denote each space by Za instead of Xa, reserving Xa for the first factor in the tensor product Za = Xa x Ytx (cf. Sec. A.45).
APPENDIX 353
A.5. The Case of Complete Algebras of Diagonal Matrices
Suppose the given algebras are all complete algebras of diagonal matrices. Then in each space Xa there is a fixed basis in which the matrices of the operators Aa e are all diagonal. Relative to these bases, the operators A0a e are also specified by certain (rectangular) matrices, so that our problem can be stated as a problem in matrix theory.
A.51. First consider a category with two spaces Xx and X2, and let Ax2 be the matrix of any operator in SSX2. Then, by the definition of a category, the product
Bx2 = AXAX2B2, (8)
where Ax and B2 are suitable diagonal matrices, is also the matrix of an operator in ^?12. Suppose Ax is the (diagonal) matrix whose only nonzero element, equal to 1, appears in the y'th row and jth column, while B2 is the matrix whose only nonzero element, again equal to 1, appears in the Ath row and kth column. Then, by Lemma A-14, all the elements of the matrix Bl2 vanish with the (possible) exception of the single element appearing in the jth row and Ath column, and this element is just the element a^k of the matrix Al2. Thus the operation (8) replaces every element of the matrix A12 by zero, except the element ajk which it leaves unchanged.
This leads to the following conclusion about the structure of the family 0SX2\ The family 8&\2 consists of all matrices with arbitrary elements at a fixed set of positions and zeros everywhere else.
A.52. Let S12 denote the fixed set of positions in the matrices of the family ^?12 at which arbitrary elements are allowed. We now explain the connection between the sets S12 and S2X. Let Al2 e ^?12 be a matrix whose only nonzero element, equal to 1, appears in the^th row and A^th column,! so that (jlt kx) e S12, and let B12 e 3§2X be any matrix with arbitrary nonzero elements at the positions of S21. Then the products Cx = AX2B2X and D2 — B21AX2 are diagonal matrices, by hypothesis. On the other hand, by Lemma A. 14, the jxth row of Cx consists of the elements of the A^th row of B2l while all the other elements of Cx vanish. Since Cx must be diagonal, we see that all the elements of the A^th row of B2l vanish with the (possible) exception of the element in the fxth column. Similarly, the A^th column of D2 consists of the elements of the jxth column of the matrix B2X while all other elements vanish, and since D2 must be diagonal, all the elements of the jxth column of B2X vanish with the (possible) exception of the element in
t For simplicity, if A12 is an operator in and A12 is its matrix, we write Al2 £ ^ as well as A12 e ^1S, and similarly for ^2i, etc.
354 APPENDIX
the kjth row. Thus, if (ju kx) e S12, all the elements of the j\th column and kxth row of an arbitrary matrix of &21 vanish with the {possible) exception of the element at the intersection of this row and column.
We are now able to determine the structure of the set S21 from a knowledge of the set S12. By suitably interchanging rows and columns of the matrices of 2SX2 (which is equivalent to interchanging elements in the bases of the spaces Xx and X2), we can see to it that the rows and columns appearing first in the matrices of SSX2 contain no positions in the set S12, while the rows and columns with only one position each in Sl2 come next and the rows and columns with at least two positions each in 512 come last. Thus a matrix
112
'12
has the form
0 0
Y
0
0
n
a
A12 —
rn
0 0 0 0
0
0 •
0 •
1 1 1 • 1 •
(9)
where the positions corresponding to the set S12 are occupied by ones and all other positions are occupied by zeros.
Next we construct the general matrix B21 e &21, with n rows and m columns:
	1	a			m
1	•		0 ■-	-	0 ---
T	•		-	-	
	0		1 ••	■	0 ---
S	0			• 1	0 ---
	0		■	•	0 ---
n	0		0 •	■	0 ---
APPENDIX 355
Since the matrix A12 has a one in row a + l and column Y + 1 the matrix B2l can have a one in column a + 1 and row y + l but, in any event the remaining elements of this row and column must vanish. The same is true of all the rows from y 1 to S and columns from oc + 1 to p. If the matrix A12 has two ones in column S + 1, then all the elements of the corresponding row of the matrix B21 vanish, and the same is true of all columns from S + 1 to n (which contain at least two ones). However, if a column of the matrix A12 contains only a single one, then there are two ones in some suitable row with index >|3, and this causes the column of the matrix B21 with the same index to vanish. As a result, the whole lower right-hand corner of the matrix B2l is occupied by zeros. In fact, let (j, k) be any position in this corner, and consider the corresponding position (k,j) in the matrix A12. Then the kth row oryth column of Al2 has at least two ones, since otherwise we would have put this row or column in an "earlier" position. This means that the kth column or jth row of B2l consists entirely of zeros, so that in any event there must be a zero at the position (y, k). The lower left-hand corner of the matrix B21 also consists entirely of zeros, In fact, if a one appeared anywhere in the lower left-hand corner of B21, say at the position (j, k), then, by the symmetry of the construction, all the elements in the yth column of A12 except possibly the element in the kth row (i.e., in the upper right-hand corner of A12) would have to vanish, which is impossible since this column must have a one in the lower right-hand corner. A similar argument shows that the upper right-hand corner of B21 also consists entirely of zeros. As for the elements in the upper left-hand corner of B21, they can be arbitrary.
Thus it is clear that our category can be enlarged by including all elements of the lower right-hand corner of the matrix (9) in the set S12 (provided Sl2 does not already contain all these elements) and including all elements of the upper left-hand corner of the matrix (10) in the set S2]. The category then becomes maximal, since it is no longer possible to enlarge Sl2 without making S21 smaller. In geometric language, the maximal category made up of two spaces Xj and X2 is constructed as follows: The space Xx is the direct sum of three subspaces XJ, X\, X2 and the space X2 is the direct sum of three subspaces X°, X*, X2, where X\ and X\ have the same dimension. The effect of an operator Al2 is such that X" is mapped into {0}, X\ is mapped into X\ by a diagonal matrix and X2 is mapped into X2 in an arbitrary way, while the effect of an operator B21 is such that X° is mapped into XJ in an arbitrary way, X* is mapped into XJ by a diagonal matrix and X| is mapped into {0}. An arbitrary (nonmaximal) category differs from the maximal category in that the operators mapping X2 into X2 are not arbitrary, but rather correspond to matrices with zeros in certain fixed positions, while the same is true of the operators mapping X* into X* (there is no connection whatsoever between the positions occupied by these zeros).
356 APPENDIX
A.53. Next we consider a category JT involving arbitrarily many spaces Xa (a e sf). First of all, it is clear that every subcategory of the category Jf made up of a pair of spaces Xa, Xp and corresponding families &&a, 88^ is constructed in the way just described, i.e., 88^ is the family of all matrices with arbitrary elements at some prescribed set of positions 5Pac, while 88,^ is the family of all matrices with arbitrary elements at some other prescribed set of positions Sa&. In this regard, we introduce the following notation: If S is any set of positions in an m X n matrix, then 8§mn(S) is the set of all m X n matrices with arbitrary elements at the positions S and zeros everywhere else.
Now let Sx be a set of positions in an m X n matrix and S2 a set of positions in an n X p matrix. Suppose S is the product SXS2, defined as the set of all positions in an m X p matrix at which one can get nonzero elements in the product 88 mn{Sx)88 nv{S2). In other words, a position («", k) belongs to the set SXS2 if and only if there exists an index j such that belongs to Sx and (y, k) belongs to S2. Let Sn, . . . , Slv be a collection of such sets of positions for an m X n matrix, and let 521,...,De an analogous collection for an n X p matrix. Then the general formula
l)suUs2j= U Usus2j (ii)
i=l ;=1 ( = 1 j=l
is an easy consequence of the definition of a product of S-sets.
In terms of products of S-sets, we can write the conditions for our category in the form
^atg^pa *— SapiS(jY c: Say, (12)
where D is the set of all positions along the principal diagonal of the appropriate square matrix.
A.54. We now construct a family of concrete categories of a certain type. To specify a category means to specify all the families ^?ap, or equivalently in the present case, to specify all the sets S^, Choosing £21 arbitrarily, we then choose 512 in such a way that S2i512 <= D, S12S2l <= D (we have already described how this is done in Sec. A.52). Suppose Sik has been constructed for all j and k less than n, in such a way that the conditions (12) for a category are satisfied. Then Sjn and Sni (j < n) are constructed as follows: Snl is chosen arbitrarily, and SXn is chosen to satisfy the conditions SlnSnl <= D, SnlSln <= D. Suppose Sjn and Snj are chosen for all j < k in such a way that (12) holds. The required sets Snk and Skn must satisfy the following conditions implied by (12):
a) SnkSkn <= D, SknSnk <= D;
D) S}kSkn    Sin> SknSni     Skj, SinSnk    Sik, SnkSki Sni't
c) Skn ~* SkiSin* $nk ~* ^nj^jk-
APPENDIX 357
The conditions a) and b) represent "upper bounds" and condition c) "lower bounds" for the sets Snk and Skn. We now show that these conditions are compatible. Suppose, for example, that
n-l n-1
$kn — U SkiSin,      Snk = U SnjSjk. (13)
4=1 j=l
Then, by formula (11) and the induction hypothesis,
»—1 n—1 n—1 n~l
^nk^kn ~ U SnjSjk U S^S^n = U   U SnjS Sk^tci^ in j=l i=l 3=1 i=l
j=l 1=1
n—1 n—1 «—1
SjkSkn —      U SkiSin = (J SjkSkiSin    (J SuSin Sin.
»"=1 i=l t=l
This proves the first of the relations a) and the first of the relations b), and it is clear that the remaining relations can be proved by similar arguments. Thus the induction is justified and our construction is correct.
It is possible, of course, to construct a category by using arbitrary Snk and Skn satisfying the conditions a)-c), and not just sets of the special type (13) used to prove the compatibility of these conditions. In this way we obtain a large family of concrete categories, in each of which only the sets Snl are arbitrary, while the remaining sets 5a3 satisfy the extra conditions a)-c).
A.55. We now see that every category JT such that 08^08^ <= 08(D) belongs to the family just constructed. In fact, the sets Snl and Sln are defined in Jf for every «, while the remaining sets Snk and Skn must satisfy the conditions a)-c). But then Jf is a category of the family described in Sec. A. 54.
It would be interesting to describe the form of the maximal categories of this family.
A.6. Categories and Direct Sums
A.61. Given a category with basic spaces X*, algebras £2* and families of operators 08™, where p = 1,. . . , kt and q = 1,. . . , kQ, we now show how to construct a new category whose basic spaces are direct sums of the spaces Xf and whose basic algebras are the corresponding direct sums of the algebras 08v.
358 APPENDIX
Thus let X( be the direct sum of the spaces XJ,. .. , X*», and let 0it be the direct sum of the corresponding algebras 0S\,. .. , 08k.i (i.e., in the space Xf an operator A £ 381 acts like any operator in the algebra To specify an operator Ai? e 08H, we use the block matrix
AH —
A11	A a
	.22
/tji
4klZ
AH
(14)
where the block Aq? corresponds to an arbitrary operator of mapping the space Xf into the space X* (p = 1,.. . , k^j = 1,. . . , k}). To show that this gives a category, we note that if
then
Btl =
	Bn	■ ■ ■	Dil
		. . .	
Bkil
fcj2
l 12d21
Dki
kiki
lkj f>kil
AflBtl =
where each sum of products again belongs to the appropriate family of operators, by the definition of the category JT. Thus our rule leads to a new
category Jf, which we call an extension of the category JT.
A.62. It turns out that the converse is also true, i.e., if the basic spaces Xf figuring in a category are direct sums of certain spaces Xf (p = 1,. . . , kt) and if the corresponding algebras 061 are direct sums of algebras 08* (p = 1, .. . , kj) of operators acting in X^, then the whole category is an
extension in the above sense of a category JT, constructed from the spaces X^ and algebras 06vv In fact, let JT' be a category of the indicated
APPENDIX 359
type. Then in an appropriate basis chosen in the subspaces Xf, every matrix At of an operator of the algebra ^ has the quasi-diagonal form
A)
iJct
where A? is a square matrix of order rf(p=\t...t AjJ. Every matrix AH of an operator of the algebra 0&ti is a block matrix of the form (14), where A^f is a rectangular matrix with r? rows and rp columns. With each block A]f we can associate in a natural way an operator A?f mapping the space Xf into the space X?. Using all such operators, we construct a new category Jf]f with basic spaces Xf, algebras SSf and families 0F» of operators AJf specified by the matrices A™. We now show that this collection of objects does in fact define a category.
Let A?f be an operator mapping Xf into X*, and let A** be an operator mapping X* into X{. Then the product A\f = A™A*» belongs to the family @lf. In fact, the category JT' contains the matrix with its qpth block equal to A™ and all other blocks equal to zero, as well as the matrix with its r^th block equal to ArJ and all other blocks equal to zero. The product of these two matrices, which belongs to the category Jf", is a matrix with its /ith block equal to A™ and all other blocks equal to zero. Therefore A\f e ^?[f, as asserted.
Thus all the conditions for a category are satisfied. It is true that the operators mapping the space Xf into the spaces X? with the same subscript have not yet been defined. However, all such operators can be set equal to zero without destroying the requirements for a category.
A.63. Since every semisimple algebra of operators acting in a space Xt allows us to decompose the space Xi into a direct sum of spaces X^ in which the algebra now acts as a simple algebra, we see that the structure of a general category with semisimple algebras reduces to that of a category with simple algebras (this problem was considered in Sec. A.4). The matrix of every operator Aj{ e 3§H of the category is of the form (14) in an appropriate basis, where each block A*f is the matrix of an operator in the family &°f of some category Jf™ with basic spaces X«, Xf and simple algebras @f. Some blocks of the matrix A}i may be identically zero for all the Ait. If we
360 APPENDIX
denote the set of all vanishing blocks by SH, the question arises of how the sets SH are related for various indices / and/ A similar problem was considered in Sec. A.5 for the case of one-dimensional blocks. The method used there is also applicable to the present case, and leads to the following result: If the category determined by the intersection of the jth block row and ith block column of the matrix Al2e &l2 is of the type Jf1 or JT3 (involving invertible matrices)^ then all the blocks in the ith block row and jth block column of the matrix A2l determine zero categories, with the (possible) exception of the block at the intersection of this row and column. If the category in question is of the type Jf2, then matrices of a category of the type JT2 appear in the indicated blocks and have zero products with the given matrix.
We can now determine the structure of the general category, as in Sec. A. 5.
Remark. A. Y. Khelemski (loc. cit.) has found the categories corresponding to nilpotent algebras 88^.
t See Sec. A.31 and the footnote on p. 353.
HINTS AND ANSWERS
Chapter I
1. Ans. a) +; b) + .
2. Ans. ana32a23fl44> ^41^12^23^34, ^31^42^23^44-
3. Ans. (_l)»("-D/a.
4. Hint. Consider the determinant all of whose elements equal 1.
5. Ans. A = (mq — np)(ad — be).
6. Hint. Multiply the first column by 10*, the second by 103, the third by 102, the fourth by 101, and add them to the last column. Then use Corollary 1.45.
7. Ans. Ax =   -29,400,000, A2 = 394.
8. Hint. P(x) is obviously a polynomial of degree 4. We first find its leading coefficient, and then determine its roots by making rows of the determinant coincide.
Ans. P(x) = -3(jc2 - l)(x2 - 4).
9. Hint. Add all the columns to the first. Ans. A - [x + (n - l)a](x - a)"-1.
10. Hint. The determinant on the left is a polynomial of degree n in xn with roots xu. ..,        and hence can be represented in the form
n-l
(A + Bxn) TJ (*» - **)■
Jfc=l
361
362     HINTS AND ANSWERS
Another representation of the same determinant in the form of a polynomial of degree n in xn can be obtained by expanding it with respect to its last column. Equating the coefficients of    and those of jc*~2, find A and B.
11. Am. cx = 0, c2 = 2, c3 = —2, c4 = 0, c5 = 3.
12. a« 2^;;*:;;;:Mi±::::^ = °> where  < t2 <•••</* and< ^ <
• ■ • < ik are fixed, and at least one of the ia differs from the corresponding Vg, (Cauchy).
13. Hint. It is sufficient for the corresponding fourth-order determinant to be nonzero.
14. Hint. Use the results of Sees. 1.96-1.97. Chapter 2
1. Ans. No, since we cannot multiply by —1 and stay within the set.
2. Ans. No, since we cannot add two vectors which are symmetric with respect to the given line and still stay within the set.
3. Ans. Yes. In particular, the number 1 eP serves as the "zero vector" of the space P.
4. Hint. See Sec. 1.96.
5. Hint. Assuming linear dependence of the form
ri + a2/r* + ■ ■ ■ + aktr* = 0, divide by tr* and differentiate. Then use induction in k.
6. Hint. Show that the zero vector also has a unique expansion with respect to the system elt e2,. . . , en. From this deduce the linear independence of the vectors of the system.
7. Ans. Yes, consisting of a single vector, i.e., any element xeP different from 1.
8. Ans. 1.
9. Ans. The intersection is the line of intersection (in the usual sense) of the two planes, while the sum is the whole space.
10. Hint. See Sec. 2.34.
11. Ans. No. It can be replaced by any other vector of the hyperplane.
12. Ans. With the "point" interpretation, the property means that every hyperplane contains the line passing through any two of its points.
13. Ans. In general p + q + 1, if this number does not exceed the dimension of the whole space.
HINTS AND ANSWERS 363
14. Arts, p + q + r + 2 if this number does not exceed the dimension of the whole space.
15. Ans. With each positive number associate its logarithm. Chapter 3
1. Hint. In a matrix of rank 1 the columns are proportional.
2. Hint. We have to write the conditions for a vector^ to belong to the subspace L in such a way that they involve only minors of A of order k. But^> e L if and only if the matrix B obtained by adding to A the column consisting of the components of the vector y has rank k, or equivalently, if and only if every minor of B of order k + 1 vanishes. Expanding every minor of B of order k + 1 with respect to elements of the last column, we obtain a system of equations in the components of^>, with coefficients which are minors of A of order k.
3. Hint. See Sees. 1.51-1.52.
4. Ans. x = (c1} c2, c3, c4), where cx = ~ 16 + cz + c4 4- 5cB, c2 = 23 — 2c3 -2c4 — 6c5.
5. Ans. If (X - 1)(X + 2) * 0, then
x + 1 1 (X + l)2
X ~     X + 2 '      -V~X + 2'      2 ~  X + 2 '
If X = 1, the system has solutions depending on two parameters. If X = —2, the system is incompatible.
6. Ans. The matrices
«1			<*1	*i	Cl
	b2	and	*2		
«3	h		*3	^3	c3
must have the same rank. 7.        The matrices
	K		«1		
(*2	b2		«2	b2	
•	•	and	•	•	
	bn			bn	
must have the same rank.
8. Ans.xW = (1, -2,1,0,0), x(2> = (1, -2,0, 1,0), x™ = (5, -6,0,0, 1).
364     HINTS AND ANSWERS
9. Arts.
-16		1		1		5
23		-2		-2		-6
0	+ «1	1	+ «2	0	+ «3	0
0		0		1		0
0		0		0		1
for example. Here the first column consists of the components of a vector x0 which is a particular solution of the nonhomogeneous system, while the other columns consist of the components of the vectors ya), y{2), y(3) forming a normal fundamental system of solutions of the corresponding homogeneous system.
10. Ans. The rank of Ax is 3, and there is a basis minor in the upper left-hand corner (for example). The rank of A2 is 5, and the basis minor is the same as the determinant of the matrix.
11. Hint. Move the minor M into the upper left-hand corner and then, by using the procedure of Sec. 3.62, show that all the columns of A starting with the {r + l)st can be made into zero columns.
12. Hint. If P ^ 0, look for A in the form
P 0 x 0   1 y
13. Hint. The rank of the matrix \\aik\\ is either equal to n or less than n.
14. Hint. Use the Kronecker-Capelli theorem.
15. Hint. Use the result of Prob. 14.
Chapter 4
1. Ans. Also n.
2. Ans. c) and g).
3. Ans. Yes.
4. Ans.
	-1	-1	2		2	0	-2
a) AM =	1	-3	3	;     b) Alx) =	1	-1	1
	-1	-5	5		2	1	0
5. Ans. ABAB * A2B2.
6. Ans. AB - BA = E.
7. Hint. (A + B)2 = A2 + AB + BA + B2,
(A + B)3 = A3 + A2B + ABA + AB2 + BA2 + BAB + B2A + B3.
HINTS AND ANSWERS 365
8. Hint. Use induction.
9. Ans. The dimension of the space is nm. For basis operators we can take those corresponding to the matrices A^ (i = 1,. . . , n;j = 1,. . . , m), where Ai3- is any matrix whose elements are all zero except for the element in the /th row and jth column.
10. Ans.
11. Ans.
	0	0	0
AB =	0	0	0
	0	0	0
	1 n		COS /j<p	—sin n<p
An =				
	0 1		sin ny	cos ny
12. Ans. A —
13. Ans.
a	b
c	—a
, where be = —a2
	-9	-2	-10		0	0	0
a)	6	14	8	; b)	0	0	0
	-7	5	-5		0	0	0
15. Hint. Use Prob. 14.
16. Hint, The three equations for the unknown dements of the matrices A and B lead to equations for three minors of an unknown 2x3 matrix. Now see Chap, 3, Prob, 12,
17. Hint, See Sec, 4,54.
18. Hint, Express the elements of the minor M in terms of the elements appearing in the first r rows, and then use Theorem 4,75,
19. Hint, Use the solution to Prob, 18,
20. Hint, See Sec. 4,54,
21. Ans,
c) C-1 = C.
			l	-2	7
	5 -2				
a) A-1 =		;    b) ß-1 =	0	1	-2
	-1 2		0	0	1
23. Ans. HA is the zero matrix, then Xis arbitrary, If det A ^ 0, then Xis the zero matrix. If det A =0 and A is not the zero matrix, then its rows are
366     HINTS AND ANSWERS
proportional. Let a/p be the ratio of the corresponding elements of the first and second rows of the matrix A. Then
X =
	ap
	
for any p and q.
24. Hint, See Sees, 1,51-1,52,
25. Ans, No,
26. Hint. Consider the operator Ax such that
27. ///«/. The operator A carries linearly independent vectors into vectors that are again linearly independent,
28. Hint. Apply the equality AB = BA to an eigenvector of the operator A.
31. Hint. Use the result of Prob. 30,
32. Hint. Suitably choosing an operator B and using Prob, 28, reduce the solution to Prob. 31,
34. Hint. Use the factorization of the operator A2 — n2E.
35. Ans. a)X1=2,/1 = (l,0f0); \ = l,/2 = (1, 0, 1); X3 = - l,/3 = (0, 1,-1);
b) X1= -1,/! = (1,0,0);     X2 = X3 = l,/2 = (1,0, l)f/8= (0, 1, -1);
c) >»i =2,/, = (1,0,0); d)Xx = 1,/i = (1,0,0, -1);X2 = 0, f2 = (0, 1, 0, 0).
36. Hint, The relation T(A*) <= N(Am) is necessary and sufficient for the equality Afc+m = 0 to hold,
37. Hint, Let/1} , , , ,/r be a basis for the range of the operator A, so that
r
Ax = J,ai(x)ft
for every x e K„, Now let
Atx = Qi(x) fi      (i = 1, , , , , r).
Chapter 5
1. Hint, The first vector of the new basis is x,
2. Hint. Choose a new basis f\,fz, , - - ,fn whose last n — k vectors form a basis for the space K', Write the condition xeK' in the form of a system of equations involving the components of x in the new basis. Use the transformation formulas to construct the corresponding system of equations involving the components of x in the original basis.
HINTS AND ANSWERS 367
3. Hint, Use Prob. 2 and the definition of a hyperplane,
4. Ans, The matrix of the desired transformation is C = BA"1*
5. Hint. Let eu e2, . . . , en be an arbitrary basis in Kn, and let
L(x) =24£,,
where 51} £2i , . , , are the components of the vector jc. Begin the formulas for the coordinate transformation with the equation
n
6. Hint. Use Sec, 4,83 and the invariance of the characteristic polynomial (Sec. 5.53),
7. Hint, Choose a basis whose first m vectors lie in the subspace R(x°\ Show that for this basis the polynomial det \\A[f) — \E\\ has the factor (X — \)m. Now use the invariance of the characteristic polynomial (Sec, 5,53),
Chapter 6
1. Ans. In the basis enf ett„lf, . , ,ex.
2. Hint, See Sec, 6,44,
3. Ans,
1	1	0	0	0
0	-1	0	0	0
0	0	2	1	0
0	0	0	2	0
0	0	0	0	2
4. Ans. No. E2(A) = (X - 2)(X - l)2, E2(B) = (X - 1)(X2 - 5X - 2),
5. Ans. En^(Ax) = En_y(A2) = (1 - X)», £n^2(A) = = U
= n - u^j -1.
6. ff/itf. £„_!(^) = (a - Xf, £„„2^) = 1.
7. Ans. A diagonal matrix with some of the roots of the polynomial J°(X) along its principal diagonal,
368     HINTS AND ANSWERS
8. Ans, Some of the roots of the polynomial P(k) lie along the principal diagonals of the Jordan blocks, and the sizes of the blocks do not exceed the multiplicities of the corresponding roots,
9. Hint, The vectors x, Ax and Ajc2 are linearly dependent.
10. Ans, Polynomials in Am(d).
11. Ans, Matrices of the form bx  b2   bz   >>> bm 0    bx  b2   "•   bm^   • • •
0    0 0
(n > m)
or
R = "tim
*1	h •	bn
0		•• bK
0	0 •	I*
0	0 •	• ■ 0
0	0 •	• • 0
'n-l
(n < m).
12. Ans. Matrices of the form
B„
with the blocks Bm,m. given in the answer to Prob, 11, 13. Ans, Matrices of the form
Br
1 1
0 B
0
0
0
5
0 0
t
mkmk
14. To every group of Jordan blocks with the same root of the characteristic polynomial, there corresponds a block of the kind given in the answer to Prob, 12. The remaining elements are all zero,
15. Ans. If the multiplicity of each root of the characteristic polynomial equals the size of the corresponding Jordan block, or if the characteristic polynomial coincides with the minimal annihilating polynomial, or if all the elementary divisors except the one with highest index equal 1,
HINTS AND ANSWERS 369
Chapter 7
1. Ans, A tensor of order two, with two covariant indices,
2. Ans, For example,
1i - '4 ~ 4'
where
1l = i£l + 1^2 + S3. ^2 = 2^1-^2» 13 = £3-
3. /fl/tf. See Sec. 7.93.
4. Hint, See Sees, 4,54 and 7,15,
5. .4/15, For example,
where a,- and ~i (i = 1, 2, 3) are the new components of the vectors x and y. The transformation formulas to the new basis are
°i = ?i + £2,       a2 = £2 + 2£3,       cr3 = £3.
6. ///«/. First renumber the variables in such a way that the matrix of the bilinear form A(x, y) is transformed into a form to which Jacobi's method is applicable,
7. Hint, II —(7^11 must be the matrix of a positive definite form, Ans,
all a\2
a21 a22
>0,(-lfdet \\aik\\ >0,
8. Hint, See the remark to Sec, 7,96.
9. Hint, Consider the form on the basis vectors,
10. Hint, The last row of the determinant consists of the elements
4'° = (-l)k~1M^i,*k_i, *jh-i» ...,*«)      (h = 1, 2,, .. , n).
11. Use the equation A(elt e2) = 1 to find the first pair of basis vectors, Then construct the subspace L defined by the equations
\(elt x) = 0,      A(<?2, x) = 0,
If the form A.(x,y) does not vanish identically in this subspace, find vectors ez, c4 e L such that A(e3, et) = 1, and so on,
12. Hint. Consider the form
A(x, x) + e J l\      (e > 0),
1=1
and apply the criterion of Sec, 7.96,
13. Hint, Let
370     HINTS AND ANSWERS
be a basis of the subspace K', Then K" consists of the vectors y = (?h> - - - » *)n) satisfying the system
a(*<>V) = 2 (i>tt5j'»W =0      (/ = l,...,r). *=i \*=i /
The matrix of the coefficients of the system is the product of the nonsingular matrix of the form \(x,y) and the matrix ||£('MI of rank r. Now use
Corollary 4,67,
14. Ans, a' = a,
15. Hint, If y = (%, , , ,, rJn) is a solution of the system (44), then
(b3y) = (Ax,y) = (x,A'y) =0,
Conversely, the system (44) is the condition for the vectors y and a}- = (ailt , , , , a]n) to be conjugate, If (b,y) = 0 for all such yf then x lies in the linear manifold spanned by the vectors ax.....an,
16. Hint, See Chap, 4, Prob. 37.
17. Hint, See Chap, 3, Prob. 1,
18. Hint, First consider the case of nonnegative forms of rank 1, using Prob. 17 and then Prob, 16,
Chapter 8
1. Ans, No, since axiom b) fails, and so does axiom c) (for X = —1).
2. Ans, No, since axiom b) fails.
3. Ans. Yes, The new definition of the scalar product merely corresponds to a change of units along the coordinate axes,
4. Hint. Let elt e2i ez denote the vectors directed along three edges of the tetrahedron drawn from a common vertex, and express the other edges of the tetrahedron as vectors.
Ans. 90°.
5. Ans. 90°, 60°, 30°.
6. Ans,
<
•b
(*(/) + y(t)f dt
x\t) dt + J   /(/) du
1
7. Ans, cos <p = ——
8. Ans. a) g = (3, 1, -1, -2), h = (2, 1, -1,4); b) g = (1,7,3,3), h = (-4, -2,6,0).
HINTS AND ANSWERS 371
9. Hint, Use the definition of angle (Sec, 8,33) and the orthogonality of the vector h to all vectors of the subspace R',
10. Hint. Take the scalar product of equation (18), p, 223 with the vector g0.
11. Hint, See Sec, 8,52,
12. Ans. yx = 5, y2 = yz = 0, y^ = -2j, y5 = 0, v6 = 5k.
13. Ans. (1,2, 1, 3), (10, -1, 1, -3), (19, -87, -61, 72).
14. Hint. Assuming that the dimension of R" is greater than the dimension of R', consider the vector e" e R" which is orthogonal to the projection of R' onto R". Then use Prob. 10,
15. Ans. An -2„(wj)2-
16. Ans. Pn(-l) = (-l)tt.
17. Hint. Express the coefficients as scalar products,
18. Hint. Use the results of Probs, 15 and 16.
19. Hint. Expand Q(t) in Legendre polynomials.
1
Ans. Q(t)= — Pn(t).
20. Ans. \Pn(t)\\* = —^TT •
2n + 1
21. Ans. k(\) = \detA\.
22. Hint. See Sec. 4,75.
23. Hint. This is a question of comparing the altitudes of two hyperparallel-epipeds,
24. Hint. The inequalities
V\xXy x2, . . . , xm] V[xi, x2-> • ■ • » xk\ /t        .   ~ .
77P-; < -rr{-;     (/: = 1, 2,.. . , m)
are easily obtained from the inequality (37), Multiply them all together for k = 1,2,..., /w, make appropriate cancellations, and then take the (m — l)th root. The geometric meaning of the inequality is the following: The volume of an An-dimensional hyperparallelepiped does not exceed the product of the (m — l)th roots of the volumes of its (m — l)-dimensional "faces."
25. Hint. Write the inequality (38) for jcsi, xh, , , . , x$r, and then multiply these inequalities together for all permissible values of slt s2, ■ . • , V
26. Hint. We must construct a hyperparallelepiped in a 2m-dimensional space such that the projections of its edges onto each axis have absolute values no greater than M and such that its volume is exactly Mnnn/2. For M = 1, the
372     HINTS AND ANSWERS
matrix Am of the components of the 2m-dimensional vectors determining this hyperparallelepiped are given by the following recurrence formula:
Am —
lm— 1
tn— 1
A,n—i        -^m- l
	1 1
	
	1 -1
Comment. For «    2™, the estimate can be improved,
27. ////tf. Given any subspaceG c R, let G-1 denote the orthogonal complement of G, For every x e N(A) and every z e R,
(A'z, x) = (z, A*) = 0,
and hence A'zeN^A), i.e., T(A') c^(A), T1 (A') = N(A). For every x e T-L(A) and every y e R,
(A'x,y) = (x, Ay) = 0,
and hence A'x = 0, i.e., xeN(A'), so that T-L(A) c N(A'), T-L(A') c N(A). It follows that N(A) = T^(A'), N^(A) = T(A'). The other assertion is proved similarly,
28. Hint, See Sec. 4.77.
29. Hint. See Sec. 4.54.
30. Hint. The angles of a triangle are uniquely determined by its sides. Alternatively, the symmetric bilinear form (Qx, Qy) is uniquely determined by the quadratic form (Qx, Qx).
31. Hint. A given isogonal operator A transforms the orthonormal basis eue2, . . . ,en into an orthogonal basis f[ = a^, f2' = a2/2,. .. ,f'n =
where fltf2...../„ are unit vectors. Let Q be the isometric operator carrying
the vectors/j,/2, ...,/„ into eu e2, . . . , en. Then the matrix of the isogonal operator QA is diagonal. Show that the condition at a, allows one to construct a pair of orthogonal vectors which are carried into nonorthogonal vectors by the operator QA.
32. Hint. It is sufficient to show that Q is an isogonal operator (see Prob, 31). Assuming that there is a right angle which is not transformed into a right angle, construct a parallelogram whose area changes .as a result of applying the operator Q.
33. Hint. Generalize the construction of Prob. 32.
34. Hint. Applying the orthogonalization process to the given systems, obtain orthonormal systems eu e2, . , , and/3,/2,.... Using Sec. 8,53, show that the formulas expressing the vectors Xj, x2,. . . , xk in terms of elt e2,. , , are the same as those expressing the vectors yity2,. . , ,yk in terms of /],/2,. - , ■ Then define Q as the operator which maps the system elt e2, . , . into the system
/l»/2» ■ - ■ ■
HINTS AND ANSWERS 373
35. Hint. Consider the finite systems e[, e\, e'2, e\,.. . , e'k, ek and f[ff^f2, f2,. ■ - ,f'kyfl obtained in determining the angles between the subspaces R', R" and the subspaces S', S". By construction,
= (fiJ") = c°s ft      (/ = 1, 2,... , k),
K O = (/;./",) = o,   «, e\) =       = o (/
Show further that (e\, = (/-,/p = 0 (using Prob. 9). Then use the result of Prob. 34.
36. Hint. Use Prob. 11.
37. Hint. In the subspaces L3 and L2, let eu e2,. . . , em and /i,/2,.. . ,/m be the bases obtained in constructing the angles al5 a2,... , am. In the space R construct a basis ex, e2,... , em, ^TO+1,. .. ,en which begins with the vectors obtained by orthogonalizing the vectors elf e2,. .. , em,flyf2,. . . ,fm. Expand the vectors xly x2,... , xm, yly y2,. . . , ym with respect to this basis. Show that the matrices of these expansions each have only one minor of order m, if we disregard minors which are known to vanish. Then use the expression for the volume of a hyperparallelepiped in terms of the minors of the corresponding matrix.
38. Hint. See Chap. 3, Prob. 2 and Chap. 4, Prob. 17.
39. Hint. Verify the assertion in the special basis whose first k vectors belong to the subspace ~L(xx, x2,. .. , xk). To go over to the general case, use Chap. 4, Prob. 17, showing that det        = 1.
40. Hint. First consider the case k = 2.
41. Hint. Choose a basis in the space R like that chosen in Prob. 37, and verify that the formula is valid in this basis. Then go over to the general case in the same way as in Prob. 39-
42. Hint. See Sec. 4.54.
43. Hint. Consider the orthogonal complement Z of the invariant (with respect to A) subspace H of all vectors x such that P(A)x = 0. The subspace Z is also invariant with respect to the operator A, and hence with respect to [P(A)]fc"1. But if z e Z, then [/"(A)]*--^ e H, so that [Z'(A)]'c~1z = 0. From this, deduce that [Pit)]**1 is an annihilating polynomial of the operator A.
Chapter 9
1. Hint. Use Sec. 9.45.
2. Hint. The operator B has a basis consisting of eigenvectors elt. . . , en with positive eigenvalues i^,. . . , Hence B2^ = y.2eif and a necessary condition for B2 = A is that the et be eigenvectors of the operator A and that the numbers [x\ coincide with the     But this is also sufficient for B2 = A.
374     HINTS AND ANSWERS
3. Hint. First transform the basis in such a way as to diagonalize the matrix of the given operator.
Ans.
2 4
2
4. Hint. The operator A'A is symmetric, and the expression (A'Ajc, x) = (Ax, Ax) is nonnegative for arbitrary x e R„. If A is nonsingular, this expression is positive for arbitrary x e R„.
5. Hint. Q' = Qr1.
6. Hint. The operator A'A is symmetric and positive (Prob. 4), and hence we can find a symmetric positive operator S such that S2 = AA'. Then construct an operator Q such that Q = S~*A and show that Q is isometric.
7. Hint. Use Probs. 2 and 5.
8. Hint. Let R' R„ be the subspace spanned by the eigenvectors of the operator A'A with nonzero eigenvalues, and let R" be the orthogonal complement
of R'. On R' let V equal the isometric component of A (so that ^/A'A \x =Ajc), and on R" let \x = 0.
9. Hint. Use Chap. 4, Probs. 28-29.
10. Hint. Apply the orthogonalization process to the vectors of the Jordan basis of A (Sec. 6.37).
Chapter 10
2       2 1 1. Ans. a) 4rj2 + tj2 - 2y)2; tjj = - ^ - - S2 + - S3,
2        1 2
^2 = 3 Si + 3 S2 — J S2,
1        2 2
% = 3 Si + 3 S2 + 3 ^ai
1 2 2
b) 10rj2 + Y)2 + Y)2; tjj = - Si + j S2 ~ 3 S3,
2 1
^2 = ~7r Si 7=- ^2>
HINTS AND ANSWERS 375
c) vi - 'fit + 3V)2; + 5y)2;      = i 5i + ^2 ± ^3 ± ~ ^,
_ 1       1 r     1 1
7i2 ~~ 2 ^ ~*~ 2 ^2 ~~ 2 ^3 ~ 2
_lr      1        1        1 r
^3 ~ 2 i'1 ~~ 2 ^2    2 ^3 ~~ 2
1        1        1 1
■^4 = 2 ^i ~" 2 ^2 ~~ 2 ^3    2 ^*'
d) vj2 + r)2 + r,2 - 3r)2;     ^ ^ ^ + ^? 5af
V2" c       V2 c
1 „      1 1 1 p
2 ?1 ~ 2 ^2 + 2 ^3 ~ 2 "4
1111
*)4 = 2 *»i ~ 2 ^2    2 ^3 2
2. /4ns. A maximum for x = (±1, 0, 0) where A(*, jc) = 1. A minimum for x = (0,0, ±1) where \(x, x) = A minimax for x = (0, ±1,0) where A(jc, at) = x, i.e., the function A(x, x) increases if we go along the unit sphere in one direction from the point x and decreases if we go in the other direction.
3. Hint. Namely, on the subspace spanned by the corresponding canonical basis vectors.
4. Hint. The coefficient Xk equals the smallest of the maxima of the form \(x, x) on a system of subspaces, and the coefficient [xfc equals the smallest of the maxima of the form B(jc, x) on the same system of subspaces.
5. Ans. yjx = ±£.
6. Ans. \(x, x) =     + Tit + Y]2, B(x, x) = rj2 ± 2rj2 + 3tj|,    = ^ - tj2 ± 2ri3,
^2 = ^2 — ^3? ^3 = ^3-
7. if/'*/. The problem reduces to the uniqueness of the canonical basis of a symmetric operator with distinct eigenvalues.
8. Hint. Generalize Sec. 7.44.
9. Ans. a) A hyperboloid of one sheet with its axis along the ^-axis; b) A hyperboloid of one sheet with its axis along the .xr-axis; c) A circular paraboloid with its axis along the x-axis; d) A circular paraboloid with its axis along the j>-axis, displaced one unit along this axis; e) A hyperbolic paraboloid.
10. Ans. a) x\ + 2y\ + 3z2 = 6;     3(x - 1) = -xx + 2yx + 2zlt
3y - 2xx - y1 + 2zlt 3(z ± 1) = 2xx + 2yx - zx\
376     HINTS AND ANSWERS
b) x\ + 2yl - lz\ = 6;     3(x + 1) = -xx + 2yx + 2zlt
3(y + 1) = 2xx ~yx + 2z1} 3z = 2xx + 2_vx - zx;
c) y\ = 2xx\ 3(x - m) = 2xx + 2yx + zlt
3(y + 2m) = 2xx — yx — 2zlt 3(z + 2m) = -xx + 2yy - 2zx (m arbitrary).
11. Hint. The semiaxes of the ellipsoid are determined from the canonical coefficients of the corresponding quadratic form. Use the results of Sec. 10.25.
Chapter 11
1. Hint. Let K' be the intersection of the null spaces of all operators belonging to a left ideal J <= B(K„), and let r be the dimension of K'. Choose a basis in K„ whose first r basis vectors lie in K'. Then the first r columns of the matrix of every operator A e J consists entirely of zeros. Let m be the dimension of J, and let Alt. . . , Am be linearly independent operators in J. Consider the matrix with
n — r columns and mn rows obtained by writing all the matrices Ax.....Am on
top of each other and omitting the first r (zero) columns. The rank of this matrix is n — r, and hence it has n — r basis rows. The linear combinations of these rows give all possible rows consisting of n — r elements. Now use Sec. 4.44.
2. Hint. Introducing a nonsingular bilinear form (x,y), consider the set J* of all operators A* conjugate to the operators A g J. This set is a left ideal. Now use Prob. 1.
3. Ans. A maximal left ideal of the algebra B(Kn) is the set of all operators carrying a fixed vector of the space Ktt into zero. A minimal left ideal is the set of all operators carrying a fixed (n — l)-dimensional subspace of K„ into zero. A maximal right ideal is the set of all operators carrying the whole space K„ into a fixed (n — l)-dimensional subspace. A minimal right ideal is the set of all operators carrying the whole space Kn into a fixed straight line.
4. Hint. Let
(x, y) = 2 ty,      I * = 2 -A> y = 2 )
in the basis elt. . . , en in which the matrix of the operator A e B takes the form indicated in Sec. 11.85.
5. Hint. If a subspace C <= Cn is invariant (with respect to the algebra B), then so is its orthogonal complement. Expand Cn as an orthogonal direct sum of irreducible invariant subspaces. Every operator A ^ 0 (of the algebra B) acts as a nonzero operator in at least one of these subspaces.
HINTS AND ANSWERS 377
6. Hint. Deduce from the representation of Sec. 11.85 that the commutator of a semisimple but nonsimple matrix algebra B intersects B in matrices other than multiples of the matrix corresponding to the identity operator.
7. Hint. Write the desired matrices as block matrices consisting of m2 blocks. Then write the commutativity condition and use Schur's lemma.
8. Ans. For the algebra B of all diagonal matrices
\ 0 " ••• 0 0    X2   ••• 0
* ■ • . ■ *
0   0   ••■ x„
where Xlt X2,... , X„ are arbitrary complex numbers. Every matrix algebra B = B reduces to this form in some basis.
9. Ans. Let B be the algebra of all operators under which a given system of subspaces, whose direct sum is the whole space Cn, remain characteristic subspaces. Then B c B. Every algebra with B^B reduces to this form.
10. Ans. The space Cn is a direct sum of subspaces C(1),.. . , C{k), and the algebra B consists of all operators invariant in each Clj) (y = 1,. .. , k). The commutator B consists of all operators which are multiples of the identity operator in each CU) (J = 1.....k).
11. Hint. If B is a direct sum B(1) + • • • + B<*>, then B - B(1> + ■ • ■ + B(fc>.
12. Ans. If the multiplicity of each root of the characteristic equation of the operator equals the size of the corresponding Jordan block (see Chap. 6, Prob. 15).
13. Hint. If CB = B, then CA = C for some A e B. It follows that C = CA = C(CA) = C2A = OA = ■ ■ ■ .
15. Hint. Let Alt. . . , Am be a basis of the algebra B. Then, if B is not nil-potent, one of the right ideals A1B,.. . , AmB, say A^, is not nilpotent (Prob. 14). Moreover AXB ^ 0 (Prob. 13), and the problem reduces to the analogous problem for an algebra of smaller dimension.
16. Hint. If M( = Mi+1, then for every vector xeMj, there is an operator AxeB such that A^^M^ = Mi+1. Moreover, there is an operator A2eB such that A^x e Miy and so on. If Mv ^ K„, then for every x e Mp+1 — Mp there is an operator A„£B such that Avxe Mv — M^, then an operator A^e Bsuch that A^A^ £        — and so on, somatA^ ■ • • Apx ^ 0.
17. Hint. Use the subspaces M1}. . . , M^, of Prob. 16.
BIBLIOGRAPHY
Bellman, R., Introduction to Matrix Algebra, McGraw-Hill Book Co., Inc., New York (1960).
Gantmakher, F. R., The Theory of Matrices, 2 vols., translated by K. A. Hirsch,
Chelsea Publishing Co., New York (1959). Gelfand, I. M., Lectures on Linear Algebra, translated by A. Shenitzer, Interscience
Publishers, Inc., New York (1961). Haimos, P. R., Finite-Dimensional Vector Spaces, second edition, D. Van Nostrand
Co., Inc., Princeton, N.J. (1958). Hamburger, H. L. and M. E- Grimshaw, Linear Transformations, Cambridge
University Press, New York (1951). Hoffman, K. and R. Kunze, Linear Algebra, Prentice-Hall, Inc., Englewood
Cliffs, N.J. (1961).
Jacobson, N., Lectures in Abstract Algebra, Vol. 2, Linear Algebra, D. Van Nostrand
Co., Inc., Princeton, N.J. (1953). Mirsky, L., An Introduction to Linear Algebra, Oxford University Press, New York
(1955).
Noble, B., Applied Linear Algebra, Prentice-Hall, Inc., Englewood Cliffs, N.J. (1969).
Per lis, S., The Theory of Matrices, Addison-Wesley Publishing Co., Reading, Mass. (1952).
Shilov, G. E., An Introduction to the Theory of Linear Spaces, translated by R. A.
Silverman, Prentice-Hall, Inc., Englewood Cliffs, N.J. (1961). Thrall, R. M. and L. Tornheim, Vector Spaces and Matrices, John Wiley and Sons,
Inc., New York (1957).
379
INDEX
Adjoint matrix, 258
Adjoint operator, 198, 238, 254, 258 Adjugate matrix, 116 Affine space, 31, 215 A-isomorphism, 199, 254 Algebra(s), i36ff, 3l2ff
of analytic functions, 176
commutator of, 320 second, 320
commutative, 137
complete, 337, 338-340
of diagonal matrices, 353-357
composition series of, 323
of dimension n, 137
factor, 138
finite-dimensional, 3l5ff ideals in, 138 of jets, 161
morphism of, 138-139, 313
nilpotent, 333
normal series of, 323
one-dimensional, 340-345
of operators, 169
of polynomials, 137ff
radical, 317
radical of, 317
of rational functions, 175
representations of, 313ff
semisimple, 316
structure of, 323-327 simple, 315, 345-352
structure of, 320-322 simple components of, 326 subalgebra of, 138
Algebra(s) (cont.):
trivial, 137, 312
unit in, 137, 312 left, 137 right, 137 two-sided, 137 Angle (s):
between /c-vectors, 245
between subspaces, 244
between vectors, 217 Annihilating polynomial, 143
minimal, 143 Antiself-adjoint operator, 262 Antisymmetric operator, 238
real, structure of, 269 Antisymmetry property of determinants, 9 Associativity, 1, 2, 83, 86, 136
Basis, 38ff
components of a vector with respect to, 39
orthogonal, 222, 257
orthonormal, 222, 258 Basis columns, 25, 59 Basis minor, 25, 59 Basis minor theorem, 25, 59 Basis rows, 59 Bessel's inequality, 224 Bicontinuous mapping, 294 Bilinear form(s), 179ff
canonical basis of, 190
canonical coefficients of, 192
canonical form of, 191
general representation of, 180
Hermitian, 247
381
382 INDEX
Bilinear form(s) (cont.):
matrix of, 181
transformation of, 181-182
nonsingular, 182, 208
positive definite, 208
rank of, 182
symmetric, 181 in a Euclidean space, 273 Bilinear function (see Bilinear form) Bilinear functional, 180 Bordered minors, 302 Bounded set, 217 Buniakovsky, V. Y., 218
Canonical basis: of a bilinear form, 190 construction of, by Jacobi's method, 192-196
of a Hermitian form, 252
of a quadratic form, 185 Canonical coefficients, 185, 192 Canonical equation:
of a central surface, 288
of a noncentral surface, 289 Canonical form, 133ff
of a bilinear form, 191
of a Hermitian form, 252
Jordan, 146
of the matrix of an arbitrary operator, 146
of the matrix of a nilpotent operator, 136
of a quadratic form, 185 Canonical mapping, 54, 139 Cartan, H-, 335 Category, 335ff
extension of, 358
of finite-dimensional spaces, 336
linear, 336
maximal, 343
objects of, 335 mappings of, 335 Cauchy, A. L., 218, 362 Center (of a surface), 289, 299 Central surface, 288, 290
canonical equation of, 288
proper, 288
in n dimensions, 293-294 semi axes of, 291 Characteristic equation, 110 Characteristic polynomial:
of a matrix, 110
of an operator, 126 Characteristic space (see Eigenspace) Characteristic value (see Eigenvalue) Chebotarev, N. G., 332 Circular paraboloid, 297 Class (of comparable elements), 48 Cof actor:
of an element, 12
of a minor, 22 Cognate spaces, 352 Columns of numbers:
linear combination of, 10, 24 coefficients of, 24
linearly dependent, 27
product of, with a number, 24
Columns of numbers (cont.):
sum of, 24 Commutativity, 1, 137 Commutator, 320
second, 320 Comparable elements (of a subspace), 48 Complementary minor, 21 Complex numbers, field of, 3 Components, 34
simple, of an algebra, 326
of a vector, with respect to a basis, 39 Composition series, 323 Conical surface, 288, 294-296 Conjugate operator (see Adjoint operator) Conjugate subspace, 190, 252 Conjugate surface, 299 Conjugate vector:
to a subspace, 190, 252
to another vector, 190, 252 Coordinate transformation(s), 118ff
consecutive, 120
matrix of, 119
operator of, 119
orthogonal, 239
unitary, 259 Courant, R., 276 Cramer's rule, 20, 35
Derivatives of a polynomial, 163
Descending principal minors, 193 Determinant (s), 6ff
antisymmetry property of, 9
column operations on, 11
elements of, 6
evaluation of, 16-17
expansion of:
with respect to a column, 12 with respect to a row, 12
Gram, 230
linear property of, 10
of a matrix, 6
order of, 6
product of, 103
of a product of matrices, 103
quasi-triangular, 23
terms of, 6
transpose of, 9
triangular, 14
Vandermonde, 15 Diagonal matrix, 100 Diagonal operator, 100 Diagonalizable operator, 100 Dimension:
of a hyperplane, 53
of a linear manifold, 51
of a linear space, 40 over a subspace, 45
of an algebra, 137
of the null space of an operator, 94
of the range of an operator, 93
of a sum of subspaces, 47 Direct sum, 45, 314
orthogonal, 223 Directed line segment, 31 Distortion coefficient, 242
INDEX 383
Distributivity, 2, 83, 86, 136
Eigenray (see Invariant direction)
Eigenspace, 110
Eigenvalue, I08ff
Eigenvector, 108ff
Eilenberg, S., 335
Elementary divisor, 151
Elementary operations, 67
Ellipsoid, 292
Elliptic paraboloid, 297
Embedding, 54, 139
Epimorphism, 53
of an algebra, 139 Equivalence classes, 339 Equivalent elements, 339 Euclidean isomorphism, 221 Euclidean space(s), 2l5ff
embedding of, in a unitary space, 263ff
Euclidean-isomorphic, 221
Factor algebra, 138
Factor space, 49 Faguet, M. K., 243 Field(s), Iff axioms, 1
of complex numbers, 3
isomorphic, 2
of rational numbers, 2
of real numbers, 2 First structure theorem, 322 Fourier coefficients, 222 Fredholm's alternative, 73 Fredholm's theorem, 212 Fundamental system of solutions, 65
normal, 66 Fundamental theorem of algebra, 3
General sointion, 63, 66
Gram determinant, 230
Hadamard inequality, 234
Hamilton-Cayley theorem, 155 Hardy, G. H., 2, 3 Hermitian (bilinear) form, 247 canonical basis of, 252 canonical form of, 252 (Hermitian-)symmetric, 248 nonsingular, 249 positive definite, 253 rank of, 249 Hermitian conjugate matrix, 258 Hermitian conjugate operator (see Adjoint
operator) Hermitian matrix, 248 Hermitian quadratic form(s), 249, 308-310 symmetric, 249
canonical form of, 309 simultaneous reduction of two, 310 stationary values of, 309 Hermitian-symmetric matrix (see Hermitian matrix)
Hermitian-symmetric operator (see Self-adjoint operator) Homeomorphic figures, 294
Homogeneous linear system, 43 Homotopic figures, 294 Hyperbolic paraboloid, 298 Hyperboloid of one sheet, 292 Hyperboloid of two sheets, 292 Hyperparallelepiped, volume of, 232 Hyperplane, 52
dimension of, 53 Hypotenuse (in a Euclidean space), 220
Ideal:
left, 138
proper, 138
right, 138
two-sided, 138 Identity matrix (see Unit matrix) Identity operator, 78, 99 Identity transformation, 120 Imaginary numbers, 3 Inclusion relations, 31
Incompatible system of linear equations, 4, 234
Index of inertia, 206
negative, 206, 251
positive, 206, 251 Index of nilpotency, 333 Integers (in a field), 2
Interpolation with least mean square error, 237
Invariant, 131 Invariant direction, 108 Invariant matrix, 202 Invariant operator, 201 Invariant subspace, 106, 313 Inverse element, 32 uniqueness of, 33 Inverse matrix, 105 Inverse operator, 105 Inversion, 5
Invertible element (of an algebra), 137 Isogonal operator, 244 Isometric operator, 239
real, structure of, 270 Isomorphism:
of algebras, 139
of fields, 2
of linear spaces, 53
Jacobi's method, 192-196, 252
Jacobson, N., 332 Jet(s):
addition of, 161
algebra of, 161
invertibility of, 167
multiplication of, 161
product of:
with another jet, 161 with a number, 161
sum of, 161
symmetric, 168 Jordan basis, 146 Jordan block, 147 Jordan canonical form, 146
real, 159
384 INDEX
Jordan normal form (see Jordan canonical form)
Kernel, 56, 313
Khelemski, A. Y., 332, 334, 360 Jt-linear form, 203 Krasnosyelski, M. A, 237, 242, 244 Krein, M. G., 242, 280 Kronecker-Capelli theorem, 62 Kurosh, A. G., 335 A:-vectors, 245
angles between, 245
equal, 245
scalar product of, 246
Lagrange's method, 276, 280, 285, 309
Laplace's theorem, 23
Law of inertia, 205, 207, 251
Left ideal, 138
Left inverse, 97, 98, 104, 137
Left unit, 137
Legendre, A. M., 228
Legendre polynomials, 228-230
Length of a vector, 217, 255
Linear combination:
of columns, 10, 24
of vectors, 36 Linear dependence:
of columns, 27
of vectors, 36 Linear family (of operators), 336 Linear form, 75
coefficients of, 76
transformation of, 123-124
of the first kind, 77
of the second kind, 77 Linear functional, 76 Linear independence of vectors, 36
over a subspace, 44 Linear manifold (spanned by spaces), 328 Linear manifold (spanned by vectors), 50 Linear operator (see Operator) Linear space(s), 3Iff
A-isomorphic, 199, 254
basis for, 38
cognate, 352
complex, 34
concrete, 34
dimension of, 40 over a subspace, 45
direct sum of, 45
infinite-dimensional, 40
(K-)isomorphic, 53
n-dimensional, 40
real, 34
subspace of, 42
tensor product of, 349 Linear subspace (see Subspace) Linear vector function (see Linear operator)
Matrices;
block, 89
multiplication of, 89 determinant of a product of, 103
Matrices (cont.):
minors of a product of, 91 multiplication of, 85
noncommutativity of, 85-86 quasi-diagonal, 90
multiplication of, 90 rank of a product of, 95 sum of, 84 transposed, 90
multiplication of, 90 Matrix, 5ff adjoint of, 258 ad jugate of, 116 augmented, 62 of a bilinear form, 181 block, 89
characteristic polynomial of, 110 coefficient, 18, 62 determinant of, 6 diagonal, 100 elements of, 5
Hermitian conjugate of, 258
Herrnitian (-symmetric), 248
identity, 81, 99
invariant, 202
inverse, 105
left inverse of, 98
minor of, 13, 21, 59
of a nilpotent operator, 136
nonsingular, 104, 119
of an operator, 79, 98
order of, 5
orthogonal, 239
principal diagonal of, 5
product of:
with a number, 84 with another matrix, 85
of a quadratic form, 185
quasi-diagonal, 90
rank of, 25, 59, 60, 67-71
right inverse of, 98
singular, 104
symmetric, 181
trace of, 115
transpose of, 60, 90
transposed, 60
unit, 81, 99
unitary, 259 Matrix algebra, 330
semisimple, 330
simple, 330 McShane, E. J., 276 Mean square deviation, 235 Method of least squares, 235-236 Metric geometry, 214 Minor, 13ff
basis, 25, 59
bordered, 302
complementary, 21
of order k, 21
of a product of matrices, 91 principal, 126 descending, 193 Monomorphism, 53 of an algebra, 139
INDEX 385
Morphism, 53
of an algebra, 138
kernel of, 56
null space of, 56
range of, 55 Multilinear form, 203
antisymmetric, 203
symmetric, 203
Natural numbers, 2
Negative element, 1 Nemirovski, A. S., 316 Nilpotent operator, 133
matrix of, 136 Noncentral surface, 289
canonical equation of, 289
nondegenerate, 296-299 Nonnegative operator, 271 Norm, 217, 255 Normal operator, 238, 259
geometric meaning of, 268-269
real, structure of, 265-269 Normal series, 323 Null space, 56, 94 Number field {see Field)
Operator(s), 53, 75, 77«
acting in a space, 98 addition of, 82, 84 adjoint of, 198, 238, 254, 258 annihilating polynomial of, 143
minimal, 143 antiself-adjoint, 262 antisymmetric, 238 characteristic polynomial of, 126 characteristic space of, 110 conjugate of, 198 determinant of, 125 diagonal, 100 diagonalizable, 100 eigenspace of, 110 eigenvalue of, 108 eigenvector of, 108 elementary divisor of, 151 equality of, 82 equivalent, 133, 153
extension of, from a real to a complex
space, 264 Hermitian conjugate of, 254 Hermitian-symmetric, 262 identity, 78, 99 invariant, 201 inverse of, 105
matrix of, 105 invertible, 105 isogonal, 244 isometric, 239
Jordan canonical form of, 146 left inverse of, 97, 104 mapping a space K„ into itself, 98ff matrix of, 79, 98
transformation of, 124 multiplication of, 82-83 negative of, 78 nilpotent, 133
Operator(s) {cont.); nonexpanding, 272 nonnegative, 271 nonsingular, 105 normal, 238, 259
real, 265 null space of, 94 positive, 271 powers of, 101 product of:
with a number, 82
with another operator, 83 projection, 100 range of, 93 rank of, 93
right inverse of, 97, 104
rotation, 99
self-adjoint, 262
similarity, 99
spectrum of, 169
sum of, 82
symmetric, 238
tensor product of, 350
trace of, 126
unit, 78, 99
unitary, 259, 263
zero, 78, 98 Operator functions, 169-176
matrices of, 171-176 Order:
of a determinant, 6
of a matrix, 5 Orthogonal basis, 222, 257 Orthogonal complement, 220, 257 Orthogonal direct sum, 223 Orthogonal matrix, 239 Orthogonal transformation, 239 Orthogonal vectors, 219, 257 Orthogonality of a vector:
to a set, 220
to a subspace, 220 Orthogonalization theorem, 226 Orthonormal basis, 222, 258
Paraboloid, 296-299
circular, 297
elliptic, 297
hyperbolic, 298 Partially ordered set, 339 Particular solution, 63
Perpendicular (dropped onto a subspace), 223
foot of, 224 Planes (in a linear space), 53 Polynomial algebras, 137ff Positive definite bilinear form, 208 Positive definite Hermitian form, 253 Positive definite quadratic form, 206 Positive operator, 271 Prepartially ordered set, 339 Principal minor, 126
descending, 193 Product:
of jets, 161
of matrices, 85
386 INDEX
Product (cont.):
of numbers, 1
of operators, 82-83
of vectors with numbers, 32 Projection (of a vector), 223 Projection operator, 100 Pythagorean theorem, 221, 257
Quadratic form(s), 179, 183«
canonical basis of, 185 canonical coefficients of, 185 canonical form of, 185 comparable, 310 in a Euclidean space, 273ff extremal properties of, 276-283 Hermitian, 249
in a unitary space, 308-310 matrix of, 185 nonsingular, 185 positive definite, 206 rank of, 185, 189
reduction of, to canonical form, 185-189 simultaneous reduction of two, 283-287 Quadric surface(s), 287-308
analysis of, from general equation, 300-308
canonical equation of, 287 central, 288, 290 degenerate, 288, 299 noncentral, 289 nondegenerate, 288 Quotient: of elements of an algebra, 137 of numbers, 2
Radical (of an algebra), 317
Radical algebra, 317 Radius vector, 35 Range, 55, 93 Rank:
of a bilinear form, 182
of a Hermitian form, 249
of a matrix, 25, 59, 60, 67-71
of an operator, 93
of a product of matrices, 95
of a quadratic form, 185, 189 Ratio of similitude, 99 Rational numbers:
in a field, 2
field of, 2 Real numbers, field of, 2 Reciprocal element, 2 Representation^), 3l3ff
direct sum of, 314
equivalent, 313
exact, 313
faithful, 313
invariant subspace of, 313
minimal, 314
proper, 314 irreducible, 314 kernel of, 313 left regular, 314, 318-320 restriction of, 313 standard, 318
Representation(s) (cont.):
trivial, 313 Right ideal, 138 Right inverse, 97, 98, 104, 137 Right unit, 137 Rodrigues, J. M., 228 Rotation operator, 99
Scalar product, 214, 215«
complex, 254ff
of /c-vectors, 246 Scalar quantity, 131 Schur's lemma, 315 Schwarz inequality, 218, 256 Second-degree curve, 287 Second-degree surface {see Quadric surface) Second structure theorem, 325 Self-adjoint operator, 262 Semiaxes, 291 Semisimple algebra, 316
representations of, 327-330
structure of, 323-327 Shostak, R. Y., 210 Silverman, R. A., 282 Similarity operator, 99 Simple algebra, 315
representations of, 327-330
structure of, 320-322 Slope (of segment joining matrix elements):
negative, 7
positive, 7 Solution space of a linear system, 43 Space:
C(a, b), 35
Cn, 34
Kn, 34
R{a, b), 35 34
V„ 34
V„ 34
V3, 34 Spectrum, 160
multiplicity of, 160
symmetric, 168 Spread of subspaces, 242 5-sets, 353
product of, 356 Stationary value:
of a function, 276
of a quadratic form, 276 Straight lines (in a linear space), 53 Subalgebra, 138 Subspace(s), 42ff
angles between, 244
comparable^elements of, 48
conjugate, 190, 252
direct sum of, 45 orthogonal, 223
intersection of, 42
invariant, 106, 313
nontrivial, 42
orthogonal complement of, 220, 257 spread of, 242 sum of, 42 trivial, 42
INDEX 387
Sum^
of jets, 161
of matrices, 84
of numbers, 1
of operators, 82
of vectors, 31 Summation convention, 126 Sylvester's conditions, 253 Symmetric operator, 238
real, structure of, 269 System of linear equations, 3ff
augmented matrix of, 62
coefficient matrix of, 18, 62
coefficients of, 3
compatible, 4 nontrivially, 61
compatibility of, 4, 61 condition for, 62 nontrivial, 61
constant terms of, 4
determinate, 4
homogeneous, 43
incompatible, 4, 234
indeterminate, 4
index of, 212
solution(s) of, 4 distinct, 4
fundamental system of, 65
normal, 66 general, 63, 66
product of, with a number, 43 sum of, 43 trivial, 43 solution space of, 43
Taylor's formula (for a polynomial), 163
Tensor(s), 126-131
addition of, 130
contraction of, 130
contravariant, 129
covariant, 129
invariants of, 131
mixed, 130
multiplication of, 130
order of, 129 Tensor product:
of linear spaces, 349
of operators, 350 Trace, 115, 126, 131 Transpose:
of a determinant, 9
of a matrix, 60, 90
Triangle inequalities, 221, 257 Trivial solution, 43 Two-sided ideal, 138
Unit (two-sided), 137
Unit ball, 217, 256
Unit matrix, 81, 99
Unit operator (see Identity operator)
Unit sphere, 217, 256
Unit vector, 217, 256
Unitary matrix, 259
Unitary operator, 259, 263
Unitary space, 254ff
Unitary transformation, 259
Vandermonde determinant, 15
Vector(s), 3 Iff angle between, 217 complex conjugate of, 263 components of, 39
transformation of, 121 conjugate:
to a subspace, 190, 252
to another vector, 190, 252 cyclic, 314 difference of, 34 height of, 134 length of, 217, 255 linear combination of, 36
coefficients of, 36 linearly dependent, 36 linearly independent, 36 norm of, 217, 255 normalization of, 217, 256 orthogonal:
to a subspace, 220, 257
to another vector, 219, 257 perpendicular dropped from the end of, 223
product of, with a number, 32 projection of, onto a subspace, 223 purely imaginary, 263 real, 263 sum of, 31 unit, 217, 256
Wedderbnrn's theorem, 332
Zero, 1
Zero column, 27 Zero operator, 78, 98 Zero vector, 32 uniqueness of, 32