Brisk guide to Mathematics

Jan Slovák and Martin Panák, Michal Bulant, Vladimir Ejov, Ray Booth

Brno, Adelaide, 2020

Authors: Ray Booth, Michal Bulant, Vladimir Ejov, Martin Panák, Jan Slovák
With further help of: Aleš Návrat, Michal Veselý
Graphics and illustrations: Petra Rychlá

2018 Masaryk University, Flinders University

Contents - practice

Chapter 1. Initial warmup 4
A. Numbers and functions 4
B. Difference equations 9
C. Combinatorics 13
D. Probability 18
E. Plane geometry 28
F. Relations and mappings 41
G. Additional exercises for the whole chapter 47

Chapter 2. Elementary linear algebra 68
A. Vectors and matrices 68
B. Determinants 83
C. Vector spaces and linear mappings 92
D. Properties of linear maps 112
E. Additional exercises for the whole chapter 124

Chapter 3. Linear models and matrix calculus 140
A. Linear optimization 140
B. Difference equations 147
C. Population models 155
D. Markov processes 162
E. Unitary spaces 168
F. Matrix decompositions 172
G. Additional exercises for the whole chapter 203

Chapter 4. Analytic geometry 230
A. Affine geometry 230
B. Euclidean geometry 239
C. Geometry of quadratic forms 255
D. Further exercise on this chapter 269

Chapter 5. Establishing the ZOO 277
A. Polynomial interpolation 277
B. Topology of real numbers and their subsets 286
C. Limits 288
D. Continuity of functions 305
E. Derivatives 308
F. Extremal problems 314
G. L'Hospital's rule 328
H. Infinite series 334
I. Power series 340
J. Additional exercises for the whole chapter 345

Chapter 6. Differential and integral calculus 371
A. Derivatives of higher orders 371
B. Integration 391
C. Power series 425
D. Extra examples for the whole chapter 439

Chapter 7. Continuous tools for modelling 449
A. Orthogonal systems of functions 449
B. Fourier series 453
C. Convolution and Fourier Transform 469
D. Laplace Transform 482
E. Metric spaces 484
F. Convergence 491
G. Topology 496
H. Additional exercises to the whole chapter 501

Chapter 8. Calculus with more variables 509
A. Multivariate functions 509
B. The topology of En 512
C. Limits and continuity of multivariate functions 514
D. Tangent lines, tangent planes, graphs of multivariate functions 516
E. Taylor polynomials 525
F. Extrema of multivariate functions 526
G. Implicitly given functions and mappings 531
H. Constrained optimization 533
I. Volumes, areas, centroids of solids 548
J. First-order differential equations 566
K. Practical problems leading to differential equations 577
L. Higher-order differential equations 579
M. Applications of the Laplace transform 587
N. Numerical solution of differential equations 590
O. Additional exercises to the whole chapter 595

Chapter 9. Continuous models - further selected topics 606
A. Exterior differential calculus 606
B. Applications of Stokes' theorem 606
C. Equation of heat conduction 612
D. Variational Calculus 614
E. Complex analytic functions 614

Chapter 10. Statistics and probability methods 700
A. Dots, lines, rectangles 700
B. Visualization of multidimensional data 709
C. Classical and conditional probability 712
D. What is probability? 720
E. Random variables, density, distribution function 723
F. Expected value, correlation 734
G. Transformations of random variables 739
H. Inequalities and limit theorems 741
I. Testing samples from the normal distribution 747
J. Linear regression 758
K. Bayesian data analysis 760
L. Processing of multidimensional data 765

Chapter 11. Number theory 777
A. Basic properties of divisibility 777
B. Congruences 786
C. Solving congruences 799
D. Diophantine equations 818
E. Primality tests 821
F. Encryption 825
G. Additional exercises to the whole chapter 836

Chapter 12. Algebraic structures 841
A. Boolean algebras and lattices 841
B. Rings 855
C. Polynomial rings 857
D. Rings of multivariate polynomials 863
E. Algebraic structures 868
F. Groups 871
G. Burnside's lemma 894
H. Codes 897
I. Extension of the stereographic projection 905
J. Elliptic curves 906
K. Gröbner bases 910

Chapter 13. Combinatorial methods, graphs, and algorithms 923
A. Fundamental concepts 923
B. Fundamental algorithms 934
C. Minimum spanning tree 944
D. Flow networks 946
E. Classical probability and combinatorics 950
F. More advanced problems from combinatorics 956
G. Probability in combinatorics 958
H. Combinatorial games 966
I. Generating functions 969
J. Additional exercises to the whole chapter 1003

Index 1010

Contents - theory

Chapter 1. Initial warmup 4
1. Numbers and functions 4
2. Difference equations 10
3. Combinatorics 14
4. Probability 18
5. Plane geometry 27
6. Relations and mappings 41

Chapter 2. Elementary linear algebra 68
1. Vectors and matrices 68
2. Determinants 81
3. Vector spaces and linear mappings 92
4. Properties of linear mappings 112

Chapter 3. Linear models and matrix calculus 140
1. Linear optimization 140
2. Difference equations 149
3. Iterated linear processes 158
4. More matrix calculus 168
5. Decompositions of the matrices and pseudoinversions 192

Chapter 4. Analytic geometry 230
1. Affine and Euclidean geometry 230
2. Geometry of quadratic forms 252
3. Projective geometry 259

Chapter 5. Establishing the ZOO 277
1. Polynomial interpolation 277
2. Real numbers and limit processes 288
3. Derivatives 311
4. Infinite sums and power series 325

Chapter 6. Differential and integral calculus 371
1. Differentiation 371
2. Integration 391
3. Sequences, series and limit processes 416

Chapter 7. Continuous tools for modelling 449
1. Fourier series 449
2. Integral operators 471
3. Metric spaces 482

Chapter 8. Calculus with more variables 509
1. Functions and mappings on Rn 509
2. Integration for the second time 547
3. Differential equations 561

Chapter 9. Continuous models - further selected topics 606
1. Exterior differential calculus and integration 606
2. Remarks on Partial Differential Equations 630
3. Remarks on Variational Calculus 661
4. Complex Analytic Functions 676

Chapter 10. Statistics and probability theory 700
1. Descriptive statistics 700
2. Probability 712
3. Mathematical statistics 758

Chapter 11. Elementary number theory 777
1. Fundamental concepts 777
2. Primes 781
3. Congruences and basic theorems 786
4. Solving congruences and systems of them 799
5. Diophantine equations 814
6. Applications - calculation with large integers, cryptography 818

Chapter 12. Algebraic structures 841
1. Posets and Boolean algebras 841
2. Polynomial rings 856
3. Groups 871
4. Coding theory 891
5. Systems of polynomial equations 899

Chapter 13. Combinatorial methods, graphs, and algorithms 923
1. Elements of Graph theory 923
2. A few graph algorithms 949
3. Remarks on Computational Geometry 971
4. Remarks on more advanced combinatorial calculations 989

Preface

The motivation for this textbook came from many years of lecturing Mathematics at the Faculty of Informatics at the Masaryk University in Brno. The programme requires an introduction to genuine mathematical thinking and precision. The endeavor has been undertaken by Jan Slovák and Martin Panák since 2004, with further collaborators joining later.
Our goal was to cover, seriously but quickly, about as much of the mathematical methods as is usually seen in bigger courses in the classical Science and Technology programmes. At the same time, we did not want to give up the completeness and correctness of the mathematical exposition. We wanted to introduce and explain the more demanding parts of Mathematics together with elementary explicit examples of how to use the concepts and results in practice. But we did not want to decide how much theory or practice the reader should enjoy, and in which order. All these requirements have led us to the two-column format of the textbook, where the theoretical explanation on one side and the practical procedures and exercises on the other side are split. This way, we want to encourage and help the readers to find their own way: either to go through the examples and algorithms first and then come to the explanations of why things work, or the other way round. We also hope to overcome the usual stress of readers horrified by the amount of material. With our text, they are not supposed to read through the book in a linear order. On the contrary, the readers should enjoy browsing through the text and finding their own thrilling paths through the new mathematical landscapes.

In both columns, we intend to present a rather standard exposition of basic Mathematics, focusing on the essence of the concepts and their relations. The exercises address simple mathematical problems, but we also try to show the exploitation of mathematical models in practice as much as possible. We are aware that the text is written in a very compact and non-homogeneous way. A lot of details are left to the readers, in particular in the more difficult paragraphs, while we try to provide a lot of simple intuitive explanation when introducing new concepts or formulating important theorems. Similarly, the examples display a variety from very simple ones to those requiring independent thinking.

We would very much like to help the reader:
• to formulate precise definitions of basic concepts and to prove simple mathematical results;
• to perceive the meaning of roughly formulated properties, relations and outlooks for exploring mathematical tools;
• to understand the instructions and algorithms underlying mathematical models and to appreciate their usage.

These goals are ambitious and there are no simple paths to reaching them without failures on the way. This is one of the reasons why we come back to basic ideas and concepts several times with growing complexity and width of the discussions. Of course, this might also look chaotic, but we very much hope that this approach gives a better chance to those who persist in their efforts. We also hope this textbook can be a perfect beginning and help for everybody who is ready to think and who is ready to return to earlier parts again and again. To make the task simpler and more enjoyable, we have added what we call "emotive icons". We hope they will enliven the dry mathematical text and indicate which parts should be read more carefully, or better left out in the first round. The usage of the icons follows the feelings of the authors, though we tried to use them in a systematic way. We hope the readers will assign meaning to the icons individually.
Roughly speaking, we are using icons to indicate complexity, difficulty, etc. Further icons indicate unpleasant technicality and the need for patience, or possible entertainment and pleasure. Similarly, we use various icons in the practical column.

The practical column with the solved problems and exercises should be readable nearly independently of the theory. Without the ambition to know the deeper reasons why the algorithms work, it should be possible to read mainly just this column. In order to help such readers, some definitions and descriptions in the theoretical text are marked so as to catch the eye easily when reading the exercises. The exercises and theory are partly coordinated to allow jumping back and forth, but the links are not tight. The numbering in the two columns is distinguished by the different numbering of sections: labels like 1.2.1 belong to the theoretical column, while 1.B.4 points to the practical column. The equations are numbered within subsections, and references to them include the subsection numbers if necessary.

In general, our approach stresses the fact that the methods of so-called discrete Mathematics seem to be more important for mathematical models nowadays. They also seem simpler to perceive and grasp. However, the continuous methods are strictly necessary too. First of all, the classical continuous mathematical analysis is essential for understanding the convergence and robustness of computations. It is hard to imagine how to deal with error estimates and the computational complexity of numerical processes without it. Moreover, the continuous models are often the efficient and effectively computable approximations to discrete problems coming from practice.

As usual with textbooks, there are numerous figures completing the exposition. We very much advise the readers to draw their own pictures whenever necessary, in particular in the later chapters, where we provide only a few. The rough structure of the book and the dependencies between its chapters are depicted in the diagram below. The darker the color is, the more demanding is the particular chapter (or at least its essential parts). In particular, chapters 7 and 9 include a lot of material which would perhaps not be covered in the regular course activities or required at exams in great detail. The solid arrows mean strong dependencies, while the dashed links indicate only partial dependencies. In particular, the textbook could support courses starting with any of the white boxes, i.e. aiming at standard linear algebra and geometry (chapters 2 through 4), discrete chapters of mathematics (11 through 13), and the rudiments of Calculus (5, 6, 8).

[Diagram: dependency graph of chapters 1-13, with solid arrows for strong dependencies and dashed links for partial ones.]

All topics covered in the book are now included (with more or less detail) in our teaching of large four-semester courses within our Mathematics minor programme, complemented by numerical seminars. In our teaching, the first semester covers chapters 1 and 2 and selected topics from chapters 3 and 4. The second semester essentially includes chapters 5, 6, and 7.
The third semester is now split into two parts. The first one is covered by chapter 8 (with only a few glimpses towards the more advanced topics from chapter 9), while the rest of the semester is devoted to the rudiments of graph theory in chapter 13. The last semester provides large parts of chapters 11 through 13. Actually, the second semester could be offered in parallel with the first one, while the fourth semester could follow immediately after the first one. Probability and statistics (chapter 10) are offered as a separate course in parallel.

CHAPTER 1

Initial warmup

"value, difference, position" - what is it and how to comprehend it?

A. Numbers and functions

We can already work with natural, integer, rational and real numbers. We explain why the rational numbers are not sufficient for us (although computers are actually not able to work with any others), and we recall the complex numbers (because even the real numbers are not adequate for some calculations).

1.A.1. Show that the integer 2 does not have a rational square root.

Solution. Already the ancient Greeks knew that if we prescribe the area of a square as $a^2 = 2$, then we cannot find a rational $a$ to satisfy it. Why? Assume we know that $(p/q)^2 = 2$ for natural numbers $p$ and $q$ that do not have common divisors greater than 1 (otherwise we can further reduce the fraction $p/q$). Then $p^2 = 2q^2$ is an even number. Thus the left-hand side $p^2$ is even, and therefore so is $p$, because the alternative that $p$ is odd would imply the contradiction that $p^2$ is odd. Hence $p$ is even, and so $p^2$ is divisible by 4. But then $2q^2$ is divisible by 4, so $q^2$ is even, and thus $q$ must be even too. This implies that $p$ and $q$ both have 2 as a common factor, which is a contradiction. □

The goal of this first chapter is to introduce the reader to the fascinating world of mathematical thinking. The name of this chapter can also be understood as an encouragement for patience. Even the simplest tasks and ideas are easy only for those who have already seen similar ones. A full knowledge of mathematical thinking can be reached only through a long and complicated course of study.

We start with the simplest thing: numbers. They will also serve as the first example of how mathematical objects and theories are built. The entire first chapter will be a quick tour through various mathematical landscapes (including germs of analysis, combinatorics, probability, geometry). Sometimes our definitions and ideas may look too complicated and not practical enough. The simpler the objects and tasks are, the more difficult the mastering of the depth and all the nuances of the relevant tools and procedures might be. We shall come back to all of the notions again and again in the further chapters, and hopefully this will be the crucial step towards the ultimate understanding. Thus the advice: do not worry if you find some particular part of the exposition too formal or otherwise difficult - come back later for another look.

1. Numbers and functions

Since the dawn of time, people have wanted to know "how much" of something they have, or "how much" something is worth, "how long" a particular task will take, etc. The answer to such questions is usually some kind of number. We consider something to be a number if it behaves according to the usual rules - either according to all the rules we accept, or maybe only to some of them. For instance, the result of multiplication does not depend on the order of the multiplicands. We have the number zero whose addition to another number does not change the result.
We have the number one whose product with another number does not change the result. And so on.

The simplest example of numbers are the positive integers, which we denote $\mathbb{Z}^+ = \{1, 2, 3, \dots\}$. The natural numbers consist of either just the positive integers, or the positive integers together with the number zero. The number zero is a kind of "number" which is either considered to be a natural number, as is usual in computer science, or not a natural number, as is usual in some other contexts. Thus the set of natural numbers is either $\mathbb{Z}^+$, or the set $\mathbb{N} = \{0, 1, 2, 3, \dots\}$. To count "one, two, three, ..." is learned already by children at their pre-school age. Later on, we meet all the integers $\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$ and finally we get used to floating-point numbers. We know what a 1.19-multiple of the price means if we have a 19% tax.

1.A.2. Remark. It can be proved that for all positive natural numbers $n$ and $x$, the $n$-th root $\sqrt[n]{x}$ of $x$ is either natural or not rational, see 1.G.1.

[Illustration: all diagonals of squares are irrational.]

Next, we work out some examples with complex numbers. If you are not familiar with the basic concepts and properties of complex numbers, consult the paragraphs 1.1.3 through 1.1.4 in the other column.

1.A.3. Calculate $z_1 + z_2$, $z_1 \cdot z_2$, $\bar{z}_1$, $|z_2|$, $\frac{z_1}{z_2}$, for
a) $z_1 = 1 - 2i$, $z_2 = 4i - 3$;
b) $z_1 = 2$, $z_2 = i$.

Solution.
a) $z_1 + z_2 = 1 - 3 - 2i + 4i = -2 + 2i$,
$z_1 \cdot z_2 = 1\cdot(-3) - 8i^2 + 6i + 4i = 5 + 10i$,
$\bar{z}_1 = 1 + 2i$,
$|z_2| = \sqrt{4^2 + (-3)^2} = \sqrt{25} = 5$,
$\frac{z_1}{z_2} = \frac{1\cdot(-3) + 8i^2 + 6i - 4i}{25} = \frac{-11 + 2i}{25}$;
b) $z_1 + z_2 = 2 + i$, $z_1 \cdot z_2 = 2i$, $\bar{z}_1 = 2$, $|z_2| = 1$, $\frac{z_1}{z_2} = -2i$. □

1.A.4. Determine the absolute value of the number $\dfrac{(2+3i)(1+i\sqrt{3})}{1-i\sqrt{3}}$.

1.1.1. Properties of numbers. In order to be able to work properly with numbers, we need to be careful with their definition and properties. In mathematics, the basic statements about properties of objects, whose validity is assumed without the need to prove them, are called axioms. We list the basic properties of the operations of addition and multiplication for our calculations with numbers, which we denote by letters $a, b, c, \dots$. Both operations work by taking two numbers $a, b$; by applying addition or multiplication we obtain the resulting values $a + b$ and $a \cdot b$.

Properties of numbers

Properties of addition:
(CG1) $(a + b) + c = a + (b + c)$, for all $a, b, c$
(CG2) $a + b = b + a$, for all $a, b$
(CG3) there exists $0$ such that for all $a$, $a + 0 = a$
(CG4) for all $a$ there exists $b$ such that $a + b = 0$.

The properties (CG1)-(CG4) are called the properties of a commutative group. They are called, respectively, associativity, commutativity, the existence of a neutral element (when speaking of addition we usually say zero element), and the existence of an inverse element (when speaking of addition we also say the negative of $a$ and denote it by $-a$).

Properties of multiplication:
(R1) $(a \cdot b) \cdot c = a \cdot (b \cdot c)$, for all $a, b, c$
(R2) $a \cdot b = b \cdot a$, for all $a, b$
(R3) there exists $1$ such that for all $a$, $1 \cdot a = a$
(R4) $a \cdot (b + c) = a \cdot b + a \cdot c$, for all $a, b, c$.

The properties (R1)-(R4) are called, respectively, associativity, commutativity, the existence of a unit element, and distributivity of addition with respect to multiplication. The sets with operations $+$, $\cdot$ that satisfy the properties (CG1)-(CG4) and (R1)-(R4) are called commutative rings.

Two further properties of multiplication are:
(F) for every $a \neq 0$ there exists $b$ such that $a \cdot b = 1$.
(ID) if $a \cdot b = 0$, then either $a = 0$ or $b = 0$ or both.

The property (F) is called the existence of an inverse element with respect to multiplication (this element is then denoted by $a^{-1}$).
For normal arithmetic, this is called the reciprocal of $a$, the same as $1/a$.

Solution. Since the absolute value of the product (ratio) of any two complex numbers is the product (ratio) of their absolute values, and every complex number has the same absolute value as its complex conjugate, we have that
$$\left|\frac{(2+3i)(1+i\sqrt{3})}{1-i\sqrt{3}}\right| = \frac{|2+3i|\cdot|1+i\sqrt{3}|}{|1-i\sqrt{3}|} = |2+3i| = \sqrt{2^2+3^2} = \sqrt{13}. \qquad \square$$

1.A.5. Simplify the expression $(5\sqrt{3} + 5i)^n$ for $n = 2$ and $n = 12$.

Solution. Using the binomial theorem, for $n = 2$ we get
$$(5\sqrt{3} + 5i)^2 = 75 + 2\cdot 5\sqrt{3}\cdot 5i - 25 = 50 + 50\sqrt{3}\,i.$$
Taking powers one by one, or expanding via the binomial theorem, would be too time-consuming in the case $n = 12$. Let us rather write the number in polar form,
$$5\sqrt{3} + 5i = 10\left(\tfrac{\sqrt{3}}{2} + \tfrac{1}{2}i\right) = 10\left(\cos\tfrac{\pi}{6} + i\sin\tfrac{\pi}{6}\right),$$
and using de Moivre's theorem we easily obtain
$$(5\sqrt{3} + 5i)^{12} = 10^{12}\left(\cos 2\pi + i\sin 2\pi\right) = 10^{12}. \qquad \square$$

1.A.6. Determine the distance $d$ between the numbers $z$ and $\bar{z}$ in the complex plane for $z = \frac{\sqrt{3}}{2} - \frac{3}{2}i$.

Solution. It is not difficult to realize that complex conjugates are symmetric in the complex plane with respect to the x-axis, and the distance of a complex number from the x-axis equals the absolute value of its imaginary part. That gives $d = 2\cdot\frac{3}{2} = 3$. □

1.A.7. Express the number $z_1 = 2 + 3i$ in polar form. Express the number $z_2 = 3(\cos(\pi/3) + i\sin(\pi/3))$ in algebraic form.

Solution. The absolute value of $z_1$ (the distance of the point with Cartesian coordinates $[2,3]$ in the plane from the origin) is $\sqrt{2^2 + 3^2} = \sqrt{13}$. From the right triangle in the diagram we compute $\sin\varphi = 3/\sqrt{13}$, $\cos\varphi = 2/\sqrt{13}$. Thus $\varphi = \arcsin(3/\sqrt{13}) = \arccos(2/\sqrt{13}) \doteq 56.3°$. In total,
$$z_1 = \sqrt{13}\left(\cos\left(\arccos\tfrac{2}{\sqrt{13}}\right) + i\sin\left(\arcsin\tfrac{3}{\sqrt{13}}\right)\right).$$
The transition from polar form to algebraic form is even simpler:
$$z_2 = 3\left(\cos\tfrac{\pi}{3} + i\sin\tfrac{\pi}{3}\right) = \tfrac{3}{2} + \tfrac{3\sqrt{3}}{2}i. \qquad \square$$

The property (ID) then says that there exist no "divisors of zero". A divisor of zero is a number $a \neq 0$ such that there is a number $b \neq 0$ with $ab = 0$.

1.1.2. Remarks. The integers $\mathbb{Z}$ are a good example of a commutative group. The natural numbers are not such an example, since they do not satisfy (CG4) (and possibly do not even contain the neutral element, if one does not consider zero to be a natural number). If a commutative ring also satisfies the property (F), we speak of a field (often also about a commutative field). The last stated property (ID) is automatically satisfied if (F) holds. However, the converse statement is false. Thus we say that the property (ID) is weaker than (F). For example, the ring of integers $\mathbb{Z}$ does not satisfy (F) but does satisfy (ID). In such a case we use the term integral domain.

Notice that the set of all non-zero elements in a field, along with the operation of multiplication, satisfies (R1), (R2), (R3), (F), and thus is also a commutative group. However, in this case, instead of addition we speak of multiplication. As an example, the set of all non-zero real numbers forms a commutative group under multiplication.

The elements of some set with operations $+$ and $\cdot$ satisfying (not necessarily all of) the stated properties (for example, a commutative field, an integral domain) may be called scalars. To denote them we usually use lowercase Latin letters, either from the beginning or from the end of the alphabet. We will use only these properties of scalars, and thus our results will hold for any objects with such properties. This is the true power of mathematical theories - they do not hold just for a specific solved example. Quite the opposite: when we build ideas in a rational way, they are always universal.
We will try to emphasise this aspect, although our ambitions are modest due to the limited size of this book. Before coming to any use of scalars, we should make a short formal detour and pay attention to their existence. We shall come back to this at the very end of this chapter, when we deal with the formal language of Mathematics in general, cf. the constructions starting in 1.6.5. There we indicate how to get the natural numbers $\mathbb{N}$, the integers $\mathbb{Z}$, and the rational numbers $\mathbb{Q}$, while the real numbers $\mathbb{R}$ will be treated much later, in chapter 5. At this point, let us just remark that it is not enough to pose the axioms of objects. We have to be sure that the given conditions are not in conflict and that such objects might exist. We suppose the readers are sure about the existence of the domains $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$ and can handle them easily. The real numbers are usually understood as a dense and better version of $\mathbb{Q}$, but what about the domain of complex numbers? As is usual in mathematics, we will use variables (letters of the alphabet or other symbols) to denote numbers, and it does not matter whether we know their value beforehand or not.

1.A.8. Express $z = \cos 0 + \cos\frac{\pi}{3} + i\sin\frac{\pi}{3}$ in polar form.

Solution. To express the number $z$ in polar form, we need to find its absolute value and argument. First we calculate the absolute value:
$$|z| = \sqrt{\left(\cos 0 + \cos\tfrac{\pi}{3}\right)^2 + \sin^2\tfrac{\pi}{3}} = \sqrt{\tfrac{9}{4} + \tfrac{3}{4}} = \sqrt{3}.$$
For the argument $\varphi$, we have
$$\cos\varphi = \frac{\mathrm{Re}(z)}{|z|} = \frac{3/2}{\sqrt{3}} = \frac{\sqrt{3}}{2}, \qquad \sin\varphi = \frac{\mathrm{Im}(z)}{|z|} = \frac{\sqrt{3}/2}{\sqrt{3}} = \frac{1}{2},$$
therefore $\varphi = \pi/6$. Thus
$$z = \sqrt{3}\left(\cos\tfrac{\pi}{6} + i\sin\tfrac{\pi}{6}\right). \qquad \square$$

1.A.9. Using de Moivre's theorem, calculate $\left(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6}\right)^{31}$.

Solution. We obtain
$$\left(\cos\tfrac{\pi}{6} + i\sin\tfrac{\pi}{6}\right)^{31} = \cos\tfrac{31\pi}{6} + i\sin\tfrac{31\pi}{6} = \cos\tfrac{7\pi}{6} + i\sin\tfrac{7\pi}{6} = -\tfrac{\sqrt{3}}{2} - \tfrac{1}{2}i. \qquad \square$$

1.A.10. Is the "square root" a well-defined function on the complex numbers?

Solution. No, it is only defined as a function with domain the non-negative real numbers and image the same set. In the complex domain, for any complex number $z$ (except zero) there are two complex numbers whose square equals $z$. Both can be called a square root, and they differ by sign (a square root of $-1$ is, according to this definition, $i$ as well as $-i$). □

1.1.3. Complex numbers. We are forced to extend the domain of real numbers as soon as we want to see solutions of equations like $x^2 = b$ for all real numbers $b$. We know that this equation always has a solution $x$ in the domain of real numbers whenever $b$ is non-negative. If $b < 0$, then such a real $x$ cannot exist. Thus we need to find a larger domain, where this equation has a solution. The crucial idea is to add the new number $i$ to the real numbers, the imaginary unit, for which we require $i^2 = -1$. Next we try to extend the definitions of addition and multiplication in order to preserve the usual behaviour of numbers (as summarised in 1.1.1). Clearly we need to be able to multiply the new number $i$ by real numbers and sum it with real numbers. Therefore we need to work in our newly defined domain of complex numbers $\mathbb{C}$ with formal expressions of the form $z = a + ib$, called the algebraic form of $z$. The real number $a$ is called the real part of the complex number $z$, the real number $b$ is called the imaginary part of the complex number $z$, and we write $\mathrm{Re}(z) = a$, $\mathrm{Im}(z) = b$. It should be noted that if $z = a + ib$ and $w = c + id$, then $z = w$ implies both $a = c$ and $b = d$. In other words, we can equate both the real and the imaginary parts. For positive $x$ we then get $(i\cdot x)^2 = -1\cdot x^2$, and thus we can solve the equations as requested.
In order to satisfy all the properties of associativity and distributivity, we define the addition so that we add independently the real parts and the imaginary parts. Similarly, we want the multiplication to behave as if we multiply the pairs of real numbers, with the additional rule that $i^2 = -1$, thus
$$(a + ib) + (c + id) = (a + c) + i(b + d),$$
$$(a + ib)\cdot(c + id) = (ac - bd) + i(bc + ad).$$
Next, we have to verify all the properties (CG1-4), (R1-4) and (F) of scalars from 1.1.1. But this is an easy exercise: zero is the number $0 + i0$, one is the number $1 + i0$; both these numbers are for simplicity denoted as before, that is, $0$ and $1$. For non-zero $z = a + ib$ we easily check that $z^{-1} = (a^2 + b^2)^{-1}(a - ib)$. All other properties are obtained by direct calculations.

1.1.4. The complex plane and polar form. A complex number is given by a pair of real numbers, therefore it corresponds to a point in the real plane $\mathbb{R}^2$. Our algebraic form of the complex numbers $z = x + iy$ corresponds in this picture to understanding the x-coordinate axis as the real part, while the y-coordinate axis is the imaginary part of the number. The absolute value of the complex number $z$ is defined as its distance from the origin, thus $|z| = \sqrt{x^2 + y^2}$. The reflection with respect to the real axis then corresponds to changing the sign of the imaginary part. We call this operation $z \mapsto \bar{z} = x - iy$ the complex conjugation. Let us now consider complex numbers of the form $z = \cos\varphi + i\sin\varphi$, where $\varphi$ is a real parameter giving the angle between the real axis and the line from the origin to $z$ (measured in the positive, i.e. counter-clockwise, sense). These numbers describe all the points on the unit circle in the complex plane.

1.A.11. Complex numbers are not just a tool to obtain "weird" solutions to quadratic equations. They are necessary to determine the solutions of cubic equations, even if these solutions are real. How can we express the solutions of the cubic equation
$$x^3 + ax^2 + bx + c = 0$$
with real coefficients $a, b, c$? We show a method developed in the sixteenth century by Ferro, Cardano, Tartaglia and possibly others.

Substitute $x := t - a/3$ (to remove the quadratic part from the equation) to obtain the equation
$$t^3 + pt + q = 0,$$
where $p = b - a^2/3$ and $q = c + (2a^3 - 9ab)/27$. Now introduce unknowns $u$, $v$ satisfying the conditions $u + v = t$ and $3uv + p = 0$. Substitute the first condition into the previous equation to obtain
$$u^3 + v^3 + (3uv + p)(u + v) + q = 0.$$
Now use the second equation to eliminate $v$. This yields
$$u^6 + qu^3 - \frac{p^3}{27} = 0,$$
which is a quadratic equation in the unknown $s = u^3$. Thus
$$u^3 = -\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}.$$
By back substitution, we obtain $x = -p/(3u) + u - a/3$.

In the expression for $u$ there is a cube root. In order to obtain all three solutions we need to work with complex roots. The equation $x^3 = a$, $a \neq 0$, with the unknown $x$ has exactly three solutions in the domain of complex numbers (the fundamental theorem of algebra, see (12.2.8) on page 866). All these three solutions are called cube roots of $a$. Therefore the expression $\sqrt[3]{a}$ has three meanings in the complex domain. If we want a single meaning for that expression, we usually consider it to be the solution with the smallest argument.

1.A.12. Show that the roots $\xi_1, \dots, \xi_n$ of the equation $x^n = 1$ form the vertices of a regular $n$-gon in the plane of the complex numbers.

Solution.
The argument of the roots is given by de Moivre's theorem: the argument multiplied by $n$ has to be a multiple of $2\pi$, and the absolute value has to be one, so the roots are $\xi_k = \cos\left(\frac{2k\pi}{n}\right) + i\sin\left(\frac{2k\pi}{n}\right)$, $k = 1, \dots, n$, which are indeed the vertices of a regular polygon. □
A function (or mapping) $f : A \to B$ assigns to each element $x$ in the domain set $A$ the value $f(x)$ in the codomain set $B$. The set of all images $f(x) \in B$ is called the range of $f$.

The sets $A$ and $B$ can be sets of numbers, but there is nothing to stop them being sets of other objects. The mapping $f$, however it is described, must unambiguously determine a unique member of $B$ for each member of $A$.

In another terminology, the member $x \in A$ is often called the independent variable, and $y = f(x) \in B$ is called the dependent variable. We also say that the value $y = f(x)$ is a function of the independent variable $x$ in the domain of $f$.

For now, we shall restrict ourselves to the case where the codomain $B$ is a subset of scalars, and we shall talk about scalar functions.
1.A.13. Show that the roots $\xi_1, \xi_2, \dots, \xi_n$ of the equation $x^n = 1$ satisfy
$$\sum_{i=1}^{n} \xi_i = 0.$$
Solution. Let $\xi_1$ be the root with the smallest positive argument. The other roots satisfy $\xi_i = \xi_1^i$ (see the previous example), thus
$$\sum_{i=1}^{n} \xi_i = \sum_{i=1}^{n} \xi_1^i = \xi_1\,\frac{\xi_1^n - 1}{\xi_1 - 1} = 0,$$
where we have summed up the geometric sequence $\xi_1^i$ and used $\xi_1^n = 1$.
□
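The statement is also easy to check numerically. A small Python sketch of our own (the function name and the tolerance are arbitrary choices, not part of the text) generates the roots as $\xi_k = \cos(2k\pi/n) + i\sin(2k\pi/n)$ and verifies that they sum to zero:

```python
import cmath

def roots_of_unity(n):
    """The n-th roots of unity, xi_k = cos(2*pi*k/n) + i*sin(2*pi*k/n)."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(1, n + 1)]

for n in range(2, 10):
    total = sum(roots_of_unity(n))
    assert abs(total) < 1e-12, (n, total)  # the sum vanishes for every n >= 2
```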
More examples about complex numbers can be found at the end of the chapter, starting at 1.G.2.
1.A.14. Solve the equation
$$x^3 + x^2 - 2x - 1 = 0.$$
Solution. This equation has no rational roots (methods to determine rational roots will be introduced later, see (??)). Substitution into the formulas obtained in 1.A.11 yields $p = b - a^2/3 = -7/3$, $q = -7/27$. It follows that
$$u = \frac{\sqrt[3]{28 \pm 84\sqrt{3}\,i}}{6}.$$
We can theoretically choose up to six possibilities for $u$ (two for the choice of the sign and three independent choices of the cubic root), but we obtain only three distinct values for $x$. By substitution into the formulas, one of the roots is of the form
$$x = \frac{14}{3\sqrt[3]{28 - 84\sqrt{3}\,i}} + \frac{\sqrt[3]{28 - 84\sqrt{3}\,i}}{6} - \frac{1}{3} \doteq 1.247,$$
similarly for the other two (approximately $-0.445$ and $-1.802$). As noted before, we see that even though we have used complex numbers during the computation, all the solutions are real. □
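The whole procedure from 1.A.11 is mechanical, so it is easy to hand over to a computer. Here is a minimal Python sketch of the method (the function name and the output rounding are our own choices); it works in complex arithmetic throughout, even though the final roots happen to be real:

```python
import cmath

def cardano_roots(a, b, c):
    """Roots of x^3 + a x^2 + b x + c = 0 via the substitution x = t - a/3."""
    p = b - a * a / 3
    q = c + (2 * a**3 - 9 * a * b) / 27
    s = -q / 2 + cmath.sqrt(q * q / 4 + p**3 / 27)   # one choice of u^3 (assumes s != 0)
    roots = []
    for k in range(3):
        u = s ** (1 / 3) * cmath.exp(2j * cmath.pi * k / 3)  # the three cube roots of s
        roots.append(-p / (3 * u) + u - a / 3)
    return roots

print([round(x.real, 3) for x in cardano_roots(1, -2, -1)])
# -> [1.247, -1.802, -0.445], the three real roots found in 1.A.14
```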
B. Difference equations
Difference equations (also called recurrence relations) are relations between the elements of a sequence, where an element of the sequence depends on previous elements. To solve a difference equation means to find an explicit formula for the $n$-th (that is, arbitrary) element of the sequence.
The simplest way to define a function appears if $A$ is a finite set. Then we can describe the function $f$ by a table or a listing showing the image of each member of $A$. We have certainly seen many examples of such functions:

Let $f$ denote the pay of a worker in some company in a certain year. The values of the independent variable, that is, the domain of the function, are the individual workers $x$ from the set of all considered workers. The value $f(x)$ is their pay for the given year. Similarly we can talk about the age of students or their teachers in years, the litres of beer and wine consumed by individuals from a given group, etc.

Another example is a food dispensing machine. The domain of the function $f$ would be the button pushed together with the money inserted, which determine the selection; the value of $f$ would be the dispensed food item.

Let $A = \{1, 2, 3\} = B$. The set of equalities $f(1) = 1$, $f(2) = 3$, $f(3) = 3$ defines a function $f : A \to B$. Generally, as there are 3 possible values for each of $f(1)$, $f(2)$ and $f(3)$, there are 27 possible functions from $A$ into $B$ in total.

But there are other ways to define a function than as a table. For example, the function $f$ can denote the area of a planar region. Here, the domain consists of subsets of the plane (e.g. all triangles, circles or other planar regions with a defined area). The range of $f$ consists of the respective areas of the regions. Rather than providing a list of areas for a finite number of regions, we hope for a formula allowing us to compute the functional value $f(P)$ for any given planar region $P$ from a suitable class.
Of course, there are many simple functions given by formulas, like the function $f(x) = 3x + 7$ with $A = B = \mathbb{R}$ or $A = B = \mathbb{N}$.

Not all functions can be given by a formula or a list. For example, let $f(t)$ denote the speed of a car at time $t$. For any given car and time $t$, there is a functional value $f(t)$ denoting its speed, which can of course be measured approximately, but usually not given by a formula.

Another example: let $f(n)$ be the $n$-th digit in the decimal expansion of $\pi = 3.1415\ldots$, so for example $f(4) = 5$. The value of $f(n)$ is defined, but unknown if $n$ is large enough.

The mathematical approach to modelling real problems often starts from the indication of certain dependencies between some quantities and aims at explicit formulas for the functions which describe them. Often a full formula is not available, but we may obtain the values $f(x)$ at least for some instances of the independent variable $x$, or we may be able to find a suitable approximation.

We shall see all of the following types of expressions of the requested function $f$ in this book:

• exact finite expression (like the function $f(x) = 3x + 7$ above);
• infinite expression (we shall come to that only much later in chapter 5 when introducing the limit processes);
• description of how the function's values change under a given change of the independent variable (this behaviour
If an element of the sequence is determined only by the previous element, we call the relation a first order difference equation. Such equations describe common real world problems, for instance when we want to find out how long the repayment of a loan will take for a fixed monthly repayment, or how much we should pay per month if we want to repay a loan in a fixed time.
1.B.1. Michael wants to buy a new car. The car costs €30 000. Michael wants to take out a loan and repay it with a fixed monthly repayment. The car company offers him a loan to buy the car with a yearly interest rate of 6%. The repayment starts at the end of the first month of the loan. Michael would like to finish repaying the loan in three years. How much should he pay per month?
Solution. Let $P$ denote the sum Michael has to pay per month. After the first month Michael repays $P$; part of it repays the loan, part of it pays the interest. Let $d_k$ stand for the outstanding debt after $k$ months, write $C = 30\,000$ for the price of the car, and $u = \frac{0.06}{12} = 0.005$ for the monthly interest rate. We know $d_0 = C = 30\,000$, and after the first month there is
$$d_1 = C - P + u\cdot C.$$
In general, after the $k$-th month we have
$$(1)\qquad d_k = d_{k-1} - P + u\,d_{k-1} = (1 + u)\,d_{k-1} - P.$$
Using the relation (1) from paragraph 1.2.3 we obtain $d_k$ given by (we write $a = 1 + u$)
$$d_k = d_0\,a^k - P\,\frac{a^k - 1}{a - 1}.$$
Repaying the loan in three years means $d_{36} = 0$, thus
$$P = 30\,000\,\frac{(1+u)^{36}\,u}{(1+u)^{36} - 1} = 30\,000\left(\frac{(12.06/12)^{36}\cdot 0.005}{(12.06/12)^{36} - 1}\right) \doteq 912.7. \qquad \square$$
Note that the recurrence relation (1) can be used for our case only as long as all the $d_k$ are positive, that is, as long as Michael still has to repay something.
1.B.2. Consider the case from the previous example. For how long would Michael have to pay if he repays €500 per month?

Solution. Setting as before $a = 1 + \frac{0.06}{12} = 1.005$ and $C = 30\,000$, the condition $d_k = 0$ gives the equation
$$a^k = \frac{200P}{200P - C}.$$
will be displayed under the name difference equation in a moment and under different circumstances later on);
• approximation of an incomputable function by a known one (usually including some error estimates - this could be the case with the car above: say we know it goes at some known speed at time $t = 0$, we brake as much as possible on a known surface, and we compute the decrease of speed with the help of some mathematical model);
• finding only the probability of possible values of the function, for example for the function giving the length of life of the members of a given group of still living people, in dependence on some health related parameters.
1.1.6. Functions defined explicitly. Let us start with the most desirable case, when the function values are defined by a computable finite formula. Of course, we shall be interested also in the efficiency of the formulas, i.e. how fast the evaluations would be. In principle, real computations can involve only a finite number of summations and multiplications of numbers. This is how we define the polynomials, i.e. functions of the form
$$f(x) = a_n x^n + \cdots + a_1 x + a_0,$$
where $a_0, \dots, a_n$ are known scalars and $x$ is the variable whose value we can insert. Here $x^n = 1\cdot x \cdots x$ means the $n$-times repeated multiplication of the unit by $x$ (in particular, $x^0 = 1$), and $f(x)$ is the value of the indicated sum of products. This is a fairly well computable formula for each $n \in \mathbb{N}$. The choice $n = 0$ provides the constant $a_0$.
The next example is more complicated.
Factorial function
Let $A = \mathbb{Z}^+$ be the set of positive integers. For each $n \in \mathbb{Z}^+$, define the factorial function by
$$n! = n(n-1)(n-2)\cdots 3\cdot 2\cdot 1.$$
For convenience we also define $0! = 1$. (We will see why this is sensible later on.) It is easy to see that $n! = n\cdot(n-1)!$ for all $n \ge 1$.
So $1! = 1$, $2! = 2\cdot 1 = 2$, $3! = 3\cdot 2\cdot 1 = 6$, $6! = 720$, etc.
The latter example deserves more attention. Notice that we could have defined the factorial by setting $A = B = \mathbb{N}$ and giving the equation $f(n) = n\cdot f(n-1)$ for all $n \ge 1$. This does not yet define $f$, but for each $n$ it does determine what $f(n)$ is in terms of its predecessor $f(n-1)$. This is sometimes called a recurrence relation. After choosing $f(0) = 1$, the recurrence now determines $f(1)$ and hence successively $f(2)$, etc., and so a function is defined. It is the factorial function as described above.
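As a quick illustration (a tiny sketch of our own, not part of the text), the recurrence together with the choice $f(0) = 1$ translates directly into code:

```python
def f(n):
    """Factorial defined by the recurrence f(n) = n * f(n-1), with f(0) = 1."""
    return 1 if n == 0 else n * f(n - 1)

assert [f(n) for n in range(7)] == [1, 1, 2, 6, 24, 120, 720]
```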
2. Difference equations
The factorial function is one example of a function which can be defined on the natural numbers by means of a recurrence relation.
By taking logarithms of both sides, we obtain
$$k = \frac{\ln(200P) - \ln(200P - C)}{\ln a},$$
which for $P = 500$ gives approximately $k = 71.5$; thus Michael would be paying for 72 months (the last repayment would be less than €500). □
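Both answers are easy to confirm by simulating the recurrence (1) directly, without the closed formula. A minimal Python sketch of our own (names and the printed values refer to the two examples above):

```python
C, u = 30_000, 0.06 / 12          # price of the car and monthly interest rate

def months_to_repay(P):
    """Iterate d_k = (1+u) d_{k-1} - P until the debt is gone; return k."""
    d, k = C, 0
    while d > 0:
        d = (1 + u) * d - P
        k += 1
    return k

print(months_to_repay(912.70))    # -> 36, matching example 1.B.1
print(months_to_repay(500))       # -> 72, matching example 1.B.2
```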
1.B.3. Determine the sequence $\{y_n\}_{n=1}^{\infty}$ which satisfies the following recurrence relation:
$$y_{n+1} = \frac{y_n}{q} + 1, \quad n \ge 1, \qquad y_1 = 1.$$
Linear recurrences can naturally appear in geometric problems:
1.B.4. Suppose $n$ lines divide the plane into regions. What is the maximum number of regions that can be formed in this way?

Solution. Let the number of regions be $p_n$. If there is no line in the plane, then the whole plane is one region, thus $p_0 = 1$. If there are $n$ lines, then adding an $(n+1)$-st line increases the number of regions by the number of regions this new line crosses. If no lines are parallel and no three lines intersect at the same point, the number of regions the $(n+1)$-st line crosses is one plus the number of its intersections with the previous lines (each crossed region is divided into two, thus the total number increases by one at every crossing).

The new line has at most $n$ intersections with the already-present $n$ lines. Each segment of the line between two intersections crosses exactly one region, thus the new line crosses at most $n+1$ regions.
Thus we obtain the recurrence relation
$$p_{n+1} = p_n + (n + 1).$$
Such a situation can often be seen when formulating mathematical models that describe real systems in economics, biology, etc. We will observe here only a few simple examples and return to this topic in chapter 3.
1.2.1. Linear difference equations of first order. A general difference equation of the first order (or first order recurrence) is an expression of the form
$$f(n+1) = F(n, f(n)),$$
where $F$ is a known function with two arguments (independent variables). If we know the "initial" value $f(0)$, we can compute $f(1) = F(0, f(0))$, then $f(2) = F(1, f(1))$ and so on. Using this, we can compute the value $f(n)$ for arbitrary $n \in \mathbb{N}$.
An example of such an equation is provided by the factorial function $f(n) = n!$, where
$$(n+1)! = (n+1)\cdot n!$$
In this way, the value of $f(n+1)$ depends on both $n$ and the value of $f(n)$; formally, we would express this recurrence in the form $F(x, y) = (x + 1)\,y$.
A very simple example is $f(n) = C$ for some fixed scalar $C$ and all $n$. Another example is the linear difference equation of first order
$$(1)\qquad f(n+1) = a\cdot f(n) + b,$$
where $a \neq 0$ and $b$ are fixed numbers.

Such a difference equation is easy to solve if $b = 0$. Then it is the well-known recurrent definition of the geometric progression. We have
$$f(1) = a f(0), \quad f(2) = a f(1) = a^2 f(0),$$
and so on. Hence for all $n$ we have
$$f(n) = a^n f(0).$$
This is also the relation for the Malthusian population growth model. This is based on the assumption that population size grows with a constant rate when measured at a sequence of fixed time intervals.
We will prove a general result for first order equations with variable coefficients, namely:
$$(2)\qquad f(n+1) = a_n\cdot f(n) + b_n.$$
We use the usual notation $\sum$ for sums and the similar notation $\prod$ for products. We also use the convention that when the index set is empty, the sum is zero and the product is one.
1.2.2. Proposition. The general solution of the first order difference equation (2) from the previous paragraph, with the initial condition $f(0) = y_0$, is for $n \in \mathbb{N}$ given by the formula
$$(1)\qquad f(n) = \Big(\prod_{i=0}^{n-1} a_i\Big)\, y_0 + \sum_{j=0}^{n-2}\Big(\prod_{i=j+1}^{n-1} a_i\Big)\, b_j + b_{n-1}.$$
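To see the formula in action, here is a small Python check (entirely our own sketch; all names are arbitrary) comparing the closed form against direct iteration of $f(k+1) = a_k f(k) + b_k$ with random integer coefficients:

```python
import random

def closed_form(a, b, y0, n):
    # f(n) = (a_0 ... a_{n-1}) y0 + sum_{j} (a_{j+1} ... a_{n-1}) b_j;
    # the j = n-1 summand is the lone b_{n-1}, since empty products equal 1
    total = y0
    for i in range(n):
        total *= a[i]
    for j in range(n):
        term = b[j]
        for i in range(j + 1, n):
            term *= a[i]
        total += term
    return total

def iterate(a, b, y0, n):
    f = y0
    for k in range(n):
        f = a[k] * f + b[k]   # f(k+1) = a_k f(k) + b_k
    return f

random.seed(1)
a = [random.randint(1, 5) for _ in range(10)]
b = [random.randint(-3, 3) for _ in range(10)]
assert closed_form(a, b, 2, 10) == iterate(a, b, 2, 10)
```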
for which $p_0 = 1$. We obtain an explicit formula for $p_n$ either by applying the formula in 1.2.2 or directly:
$$p_n = p_{n-1} + n = p_{n-2} + (n-1) + n = p_{n-3} + (n-2) + (n-1) + n = \cdots = p_0 + \frac{n(n+1)}{2} = \frac{n^2 + n + 2}{2}. \qquad \square$$
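The closed form is easy to double-check against the recurrence; a two-line sketch of our own:

```python
p = 1                                  # p_0: no lines, one region
for n in range(1, 51):
    p = p + n                          # the recurrence p_n = p_{n-1} + n
    assert p == (n * n + n + 2) // 2   # closed form p_n = (n^2 + n + 2)/2
```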
Recurrence relations can be more complex than those of first order. We show an example of a combinatorial problem whose solution uses such a recurrence relation.
1.B.5. How many words of length 12 are there that consist only of the letters A and B, but do not contain the sub-word BBB?

Solution. Let $a_n$ denote the number of words of length $n$ consisting of the letters A and B but without BBB as a sub-word. Then for $n > 3$ the following recurrence holds:
$$a_n = a_{n-1} + a_{n-2} + a_{n-3},$$
since the words of length $n$ that satisfy the given condition end either with an A, or with an AB, or with an ABB. There are $a_{n-1}$ words ending with an A (preceding the last A there can be an arbitrary word of length $n-1$ satisfying the condition), and analogously for the two remaining groups. Further, it is easily shown that $a_1 = 2$, $a_2 = 4$, and $a_3 = 7$. Using the recurrence relation we can then compute
$$a_{12} = 1705.$$
We could also derive an explicit formula for the $n$-th element of the sequence using the theory which we will develop in chapter 3. □
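The computation of $a_{12}$ is immediate to replay in code; a minimal Python sketch of our own:

```python
a = {1: 2, 2: 4, 3: 7}                     # words of length 1, 2, 3 without BBB
for n in range(4, 13):
    a[n] = a[n - 1] + a[n - 2] + a[n - 3]  # last letters: ...A, ...AB, ...ABB
print(a[12])                               # -> 1705
```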
1.B.6. Partial difference equations. The recurrence relation in the next problem has a more complex form than those we have dealt with in our theory, so we cannot express an arbitrary member $P_{(k,l)}$ of our doubly indexed sequence explicitly; we can only evaluate it by successive computation from the previous elements. Such an equation is called a partial difference equation, since the terms of the equation are indexed by two independent variables $(k, l)$.
The score of a basketball match between the teams of the Czech Republic and Russia after the first quarter is 12 : 9 for the Russian team. In how many ways could the score have developed?
Proof. We use mathematical induction. The result clearly holds for $n = 1$, since $f(1) = a_0 y_0 + b_0$. Assuming that the statement holds for some fixed $n$, we compute:
$$f(n+1) = a_n\left(\Big(\prod_{i=0}^{n-1} a_i\Big)\, y_0 + \sum_{j=0}^{n-2}\Big(\prod_{i=j+1}^{n-1} a_i\Big)\, b_j + b_{n-1}\right) + b_n = \Big(\prod_{i=0}^{n} a_i\Big)\, y_0 + \sum_{j=0}^{n-1}\Big(\prod_{i=j+1}^{n} a_i\Big)\, b_j + b_n,$$
as can be seen directly by multiplying out. □
Note that for the proof, we did not use anything about the numbers except for the properties of a commutative ring.
1.2.3. Corollary. The general solution of the linear difference equation (1) from 1.2.1, with $a \neq 1$ and initial condition $f(0) = y_0$, is
$$(1)\qquad f(n) = a^n y_0 + \frac{1 - a^n}{1 - a}\, b.$$
Proof. If we set $a_i$ and $b_i$ to be constants and use the general formula 1.2.2(1), we obtain
$$f(n) = a^n y_0 + b\Big(1 + \sum_{j=0}^{n-2} a^{n-j-1}\Big).$$
We observe that the expression in the bracket is $(1 + a + \cdots + a^{n-1})$. The sum of this geometric progression follows from
$$1 - a^n = (1 - a)(1 + a + \cdots + a^{n-1}). \qquad \square$$
The proof of the former proposition is a good example of a mathematical result, where the verification is quite easy, as soon as someone tells us the theorem. Mathematical induction is a natural method of proof.
Note that for calculating the sum of a geometric progression we required the existence of the inverse element for non-zero scalars. We could not do that with integers only. Thus the last result holds for fields of scalars, and we can use it for linear difference equations where the coefficients $a$, $b$ and the initial condition $f(0) = y_0$ are rational, real or complex numbers. This last result also holds in the ring of remainder classes $\mathbb{Z}_k$ with $k$ prime (we will define remainder classes in paragraph 1.6.7).
It is noteworthy that the formula (1) is valid with integer coefficients and integer initial conditions. Here, we know in advance that each $f(n)$ is an integer, and the integers are a subset of the rational numbers. Thus our formula necessarily gives correct integer solutions.
Observing the proof in more detail, we see that $1 - a^n$ is always divisible by $1 - a$, thus the last paragraph should not have surprised us. However, it can be seen that with scalars from $\mathbb{Z}_4$ and, say, $a = 3$, we fail, since $1 - a = 2$ is a divisor of zero and as such does not have an inverse in $\mathbb{Z}_4$.
Solution. We can divide all possible evolutions of the quarter with the final score $k : l$ into six mutually exclusive possibilities, according to which team scored and how many points the score was worth (1, 2 or 3 points). If we denote by $P_{(k,l)}$ the number of ways in which the score could have developed for a quarter that ended $k : l$, then for $k, l \ge 3$ the following recurrence relation holds:
$$P_{(k,l)} = P_{(k-3,l)} + P_{(k-2,l)} + P_{(k-1,l)} + P_{(k,l-1)} + P_{(k,l-2)} + P_{(k,l-3)}.$$
Using the symmetry of the problem, $P_{(k,l)} = P_{(l,k)}$. Further, for $k \ge 3$:
$$P_{(k,2)} = P_{(k-3,2)} + P_{(k-2,2)} + P_{(k-1,2)} + P_{(k,1)} + P_{(k,0)},$$
$$P_{(k,1)} = P_{(k-3,1)} + P_{(k-2,1)} + P_{(k-1,1)} + P_{(k,0)},$$
$$P_{(k,0)} = P_{(k-3,0)} + P_{(k-2,0)} + P_{(k-1,0)},$$
which, along with the initial conditions $P_{(0,0)} = 1$, $P_{(1,0)} = 1$, $P_{(2,0)} = 2$, $P_{(3,0)} = 4$, $P_{(1,1)} = 2$, $P_{(2,1)} = P_{(1,1)} + P_{(0,1)} + P_{(2,0)} = 5$, $P_{(2,2)} = P_{(0,2)} + P_{(1,2)} + P_{(2,1)} + P_{(2,0)} = 14$, determines all the values. Hence, by repeatedly using the above equations, we eventually obtain
$$P_{(12,9)} = 497\,178\,513. \qquad \square$$
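The value is tedious to obtain by hand, but the recurrences above translate directly into a short memoized recursion. The following Python sketch is our own (the function name is arbitrary); it encodes exactly the equations of the solution, with the value 0 for negative scores:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def P(k, l):
    """Number of score evolutions ending k : l (each basket is worth 1, 2 or 3)."""
    if k < 0 or l < 0:
        return 0
    if k < l:
        return P(l, k)       # symmetry P(k,l) = P(l,k)
    if k == 0 and l == 0:
        return 1
    # the last scoring event: one of the teams scored 1, 2 or 3 points
    return (P(k - 1, l) + P(k - 2, l) + P(k - 3, l)
            + P(k, l - 1) + P(k, l - 2) + P(k, l - 3))

print(P(12, 9))              # -> 497178513
```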
We will discuss recurrence formulas (difference equations) of higher order with constant coefficients in chapter 3.
C. Combinatorics
In this section we use natural numbers to describe some indivisible items located in real life space, and deal with questions such as how to compute the number of their (pre)orderings, choices, and so on. In many of these problems, "common sense" is sufficient; we just need to use the rules of product and sum in the right way, as we show in the following examples:
The linear difference equation 1.2.1(1) can be neatly interpreted as a mathematical model for finance, e.g. savings or a loan payoff with a fixed interest rate $a$ and fixed repayment $b$. (The cases of savings and loans differ only in the sign of $b$.) With varying parameters $a$ and $b$ we obtain a similar model with varying interest rate and repayment. We can imagine for instance that $n$ is the number of months, $a_n$ is the interest rate in the $n$-th month, and $b_n$ the repayment in the $n$-th month.
1.2.4. A nonlinear example. When discussing linear difference equations, we mentioned a very primitive population growth model which depends directly on the momentary population size $p$. At first sight, it is clear that such a model with $a > 1$ leads to a very rapid and unbounded growth.

A more realistic model has such a population change $\Delta p(n) = p(n+1) - p(n)$ only for small values of $p$, that is, $\Delta p/p \simeq r > 0$. Thus if we want to let the population grow by 5% over a time interval only for small $p$, we choose $r = 0.05$. For some limiting value $p = K > 0$ the population may not grow. For even greater values it may even decrease, for instance if the resources for the feeding of the population are limited, or if individuals in a large population are obstacles to each other, etc.

Assume that the values $y_n = \Delta p(n)/p(n)$ change linearly in $p(n)$. Graphically, we can imagine this dependence as a line in the plane of the variables $p$ and $y$. This line passes through the point $[0, r]$, so that $y = r$ when $p = 0$. It also passes through $[K, 0]$, since this gives the second condition, namely that when $p = K$ the population does not change. Thus we set
$$y = -\frac{r}{K}\,p + r.$$
By setting $y = y_n = \Delta p(n)/p(n)$ and $p = p(n)$, we obtain
$$\frac{p(n+1) - p(n)}{p(n)} = -\frac{r}{K}\,p(n) + r.$$
By multiplying out, we obtain a difference equation of first order in which $p(n)$ is present in both the first and the second power:
$$(1)\qquad p(n+1) = p(n)\Big(1 - \frac{r}{K}\,p(n) + r\Big).$$
1.C.1. A mother wants to give John and Mary five pears and six apples. In how many ways can she divide the fruit between them? (We consider the pears to be indistinguishable, and likewise the apples. The possibility that one of the children gets nothing is not excluded.)
Try to think through the behaviour of this model for various values of $r$ and $K$. In the diagram we can see the results for the parameters $r = 0.05$ (that is, five percent growth in the ideal state) and $K = 100$ (the resources limit the population to the size 100), with $p(0) = 2$, i.e. initially two individuals.
Solution. The five pears can be divided in six ways (the division is determined by the number of pears given to John; the rest goes to Mary). The six apples can be divided in seven ways. These divisions are independent. Using the rule of product, the total number is $6\cdot 7 = 42$. □
1.C.2. Determine the number of four-digit numbers which either start with the digit 1 and do not end with the digit 2, or end with the digit 2 but do not start with the digit 1 (of course, the first digit must not be zero).

Solution. The set of numbers described in the statement consists of two disjoint sets. The total count is then obtained by summing the counts for these two sets. In the first set there are numbers of the form "1XXY", where X is an arbitrary digit and Y is any digit except 2. Thus we can choose the second digit in ten ways, independently of that the third digit in ten ways, and again independently the fourth digit in nine ways. These three choices then uniquely determine a number. By multiplication, there are $10\cdot 10\cdot 9 = 900$ such numbers. Similarly, in the second set we have $8\cdot 10\cdot 10 = 800$ numbers of the form "YXX2" (for the first digit we have only eight choices, since the number can start neither with zero nor with one). By addition, the solution is $900 + 800 = 1700$ numbers. □

In the following examples we will use the notions of combinations and permutations (possibly with repetitions).
1.C.3. During a conference, 8 speakers are scheduled. Determine the number of all possible orderings in which two given speakers do not speak one right after the other.

Solution. Denote the two given speakers by A and B. If B speaks directly after A, we can consider the pair as a speech by a single speaker AB. The number of all orderings where B speaks directly after A is therefore $7!$, the number of permutations of seven elements. By symmetry, the number of all orderings where A speaks directly after B is also $7!$. Since the number of all possible orderings of eight speakers is $8!$, the solution is $8! - 2\cdot 7!$. □
1.C.4. How many rearrangements of the letters of the word PROBLEM are there such that
a) the letters B and R are next to each other,
b) the letters B and R are not next to each other.
Solution. a) The pair of letters B and R can be treated as a single indivisible "double-letter". In total we then have six
[Diagram: non-linear dependence - the population size $p(n)$ of the discrete logistic model (1), climbing from $p(0) = 2$ towards the limit $K = 100$.]
Note that the original almost exponential growth slows down later, and the population size approaches the desired limit of 100 individuals. For $p$ close to one and $K$ much greater than $r$, the right-hand side of the equation (1) is approximately $p(n)(1 + r)$; that is, the behaviour is similar to that of the Malthusian model. On the other hand, if $p$ is almost equal to $K$, the right-hand side of the equation is approximately $p(n)$. For an initial value of $p$ greater than $K$, the population size will decrease; for an initial value of $p$ less than $K$, it will increase.¹
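The behaviour shown in the diagram is easy to reproduce. A minimal Python sketch of equation (1) with the parameters quoted above (our own code, not part of the text):

```python
r, K = 0.05, 100      # ideal growth rate and limiting population size
p = 2.0               # initial population p(0)
for n in range(300):
    p = p * (1 - (r / K) * p + r)   # one step of equation (1)
print(round(p, 2))    # very close to the limit value K = 100
```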
3. Combinatorics
A typical "combinatorial" problem is to count in how many ways something can happen. For instance, in how many ways can we choose two different sandwiches from the daily offering in a grocery shop?
In this situation we need first to decide what we mean by different. Do we then allow the choice of two "identical" sandwiches? Many such questions occur in the context of card games and other games.
The solution of particular problems usually involves either some multiplication of partial results (if the individual possibilities are independent) or some addition (if their occurrences are disjoint). This is demonstrated in many examples in the problem column (cf. several problems starting with 1.C.1).
1.3.1. Permutations. Suppose we have a set of $n$ (distinguishable) objects, and we wish to arrange them in some order. We can choose the first object in $n$ ways, then the second in $n - 1$ ways, the third in $n - 2$ ways, and so on, until we choose the last object, for which there is only one choice. The total number of possible arrangements is the product of these; hence there are exactly
$$n! = n(n-1)(n-2)\cdots 3\cdot 2\cdot 1$$
distinct orders of the objects. Each ordering of the elements of a set $S$ is called a permutation of the elements of $S$. The number of permutations on a set with $n$ elements is $n!$.
¹ This model is called the discrete logistic model. Its continuous version was introduced already in 1845 by Pierre François Verhulst. Depending on the proportions of the parameters r, K and p(0), the behaviour can be very diverse, including chaotic dynamics. There is much literature on this model.
distinct letters and there are 6! words of six indivisible letters. We have to multiply this by two, since the double-letter can be either BR or RB. Thus the solution is 2 · 6!.
b) The events in b) form the complement of part a) in the set of all rearrangements of the seven letters. The solution is therefore 7! − 2 · 6!. □
1.C.5. In how many ways can an athlete place 10 distinct cups on 5 shelves, given that all 10 cups fit on any shelf? Solution. Add 4 indistinguishable items, say separators, to the cups. The number of all distinct orderings of cups and separators is 14!/4! (the separators are indistinguishable). Each placement of cups into shelves corresponds to exactly one ordering of cups and separators. It is enough to say that the cups before the first separator in the ordering are placed on the first shelf (preserving the order), the cups between the first and the second separator on the second shelf, and so on. Thus the required number is 14!/4!. □
1.C.6. Determine the number of four-digit numbers with exactly two distinct digits. (Recall that the first digit must not be 0.)
Solution. First solution. If 0 is one of the digits, then there are 9 choices for the other digit, which must also be the first digit. There are three numbers with a single 0, three numbers with two 0's, and just one number with three 0's. Thus there are 9(3 + 3 + 1) = 63 numbers which contain the digit 0. Otherwise, choose the first digit, for which there are 9 choices. There are then 8 choices for the other digit and 3 + 3 + 1 numbers for each choice, making 9 · 8 · (3 + 3 + 1) = 504 numbers which do not contain the digit 0. The solution is 504 + 63 = 567 numbers.
Second solution. The two distinct digits used for the number can be chosen in $\binom{10}{2}$ ways. From the two chosen digits we can compose $2^4 - 2$ distinct four-digit strings (we subtract the 2 strings which use only one of the chosen digits). In total we have $\binom{10}{2}(2^4 - 2) = 630$ numbers. But in this way, we have also counted the numbers that start with zero. Of these there are $9\,(2^3 - 1) = 63$. Thus the solution is 630 − 63 = 567 numbers. □
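Both solutions can be confirmed by a direct enumeration; the following short Python check (our sketch, not from the original text) counts the four-digit numbers with exactly two distinct digits:

# Brute-force check of 1.C.6: there are 567 such numbers.
print(sum(1 for n in range(1000, 10000) if len(set(str(n))) == 2))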
1.C.7. There are 677 people at a concert. Do some of them have the same (ordered) pair of name initials? Solution. There are 26 letters in the alphabet. Thus the number of all possible pairs of name initials is 26² = 676. Since 677 > 676, at least two people have the same initials. □
We can identify the elements in S by numbering them, that is, we identify S with the set S = {1, …, n} of the first n natural numbers. Then the permutations correspond to the possible orderings of the numbers from one to n. Thus we have an example of a simple mathematical theorem, and this discussion can be considered to be its proof.
Number of permutations
Proposition. The number p(n) of distinct orderings of a finite set with n elements is given by the factorial function:
(1) p(n) = n!
Suppose S is a set with n elements. Suppose we wish to choose and arrange in order just k of the members of S, where 1 ≤ k ≤ n. This is called a k-permutation without repetition of the n elements. The same reasoning as above shows that this can be done in
$$v(n, k) = n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!}$$
ways. The right side of this result also makes sense for k = 0 (there is just one way of choosing nothing) and for k = n, since 0! = 1.
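The formula is easy to check mechanically; the following Python sketch (the function name v is ours) compares it with a direct enumeration:

from itertools import permutations
from math import factorial

def v(n, k):
    # Number of k-permutations (ordered selections) of n elements.
    return factorial(n) // factorial(n - k)

n, k = 5, 3
print(v(n, k))                               # 60
print(len(list(permutations(range(n), k))))  # 60, agrees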
Now we modify the problem, this time where the order of selection is immaterial.
1.3.2. Combinations. Consider a set S with n elements. A k-combination of the elements of S is a selection of k elements of S, 0 ≤ k ≤ n, when order does not matter.
For k ≥ 1, the number of possible results of successively choosing our k elements is n(n − 1)(n − 2) ⋯ (n − k + 1) (a k-permutation). We obtain the same k-tuple in k! distinct orders. Hence the number of k-combinations is
$$\frac{n(n-1)(n-2)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}.$$
If k = 0, the same formula is still true, since 0! = 1 and there is just one way to select nothing (similarly, for k = n there is just one way to select all n elements).
Combinations
Proposition. The number c(n, k) of combinations of k-th degree among n elements, where 0 ≤ k ≤ n, is
$$(1)\qquad c(n,k) = \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \frac{n!}{(n-k)!\,k!}.$$
We pronounce the binomial coefficient $\binom{n}{k}$ as "n over k" or "n choose k". The name stems from the binomial expansion, which is the expansion of (a + b)ⁿ. If we expand (a + b)ⁿ, the coefficient of $a^k b^{n-k}$ is the number of ways to choose a
1.C.8. New players meet in a volleyball team (6 people). How many handshakes are there when everybody shakes hands once with everybody else? How many handshakes are there if everybody shakes hands once with each opponent after playing a match?
Solution. Each pair of players shakes hands at the introduction. The number of handshakes is then the number of combinations $c(6,2) = \binom{6}{2} = 15$. After a match each of the six players shakes hands six times (with each of the six opponents). Thus the required number is 6² = 36. □
l.C.9. In how many ways can five people be seated in a car for five people, if only two of them have a driving licence? In how many ways can 20 passengers and two drivers be seated in a bus for 25 people?
Solution. For the driver's seat we have two choices and the other places are then arbitrary, that is, for the second seat we have four choices, for the third three choices, then two and then one. That makes 2 · 4! = 48 ways. Similarly in the bus we have two choices for the driver, and then the other driver plus the passengers can be seated among the 24 remaining seats arbitrarily. First choose the 21 seats to be occupied, which can be done in $\binom{24}{21}$ ways. Among these seats the people can be seated in 21! ways. The solution is $2\binom{24}{21}\,21! = 2\cdot\frac{24!}{3!}$ ways. □
1.C.10. Determine the number of distinct arrangements which can arise by permuting the letters in each individual word in the sentence "Pull up if I pull up" (the arising arrangements and words do not have to make any sense).
Solution. Let us first compute the number of rearrangements of letters in the individual words. From each word "pull" we obtain 4!/2 = 12 distinct anagrams (a permutation with repetition, P(1, 1, 2)); similarly, "up" and "if" each yield two. Therefore, using the rule of product, we have (4!/2) · 2 · 2 · 1 · (4!/2) · 2 = 1152. Notice that if the resulting arrangement were required to be a palindrome again, there would be only four possibilities. □
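Again, the count is small enough to enumerate; this Python sketch (ours, not part of the original solution) recomputes the product word by word:

from itertools import permutations

def anagrams(word):
    # Number of distinct rearrangements of the letters of the word.
    return len(set(permutations(word.lower())))

total = 1
for w in "Pull up if I pull up".split():
    total *= anagrams(w)
print(total)  # 12 * 2 * 2 * 1 * 12 * 2 = 1152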
1.C.11. In how many ways can we insert five golf balls into five holes (one ball into every hole), if we have four identical white balls, four identical blue balls and three identical red balls?
Solution. First solve the problem in the case that we have five balls of every colour. In this case it amounts to a free choice of five elements from three possibilities (there is a choice out of
k-tuple from the n parentheses in the product (from these parentheses we take a, from the others we take b). Therefore we have
$$(2)\qquad (a+b)^n = \sum_{k=0}^{n} \binom{n}{k}\, a^k b^{n-k}.$$
Note that only distributivity, commutativity and associativity of multiplication and addition were necessary. The formula (2) therefore holds in every commutative ring.
We present a few simple propositions about binomial coefficients as another simple example of a mathematical proof. If needed, we define $\binom{n}{k} = 0$ whenever k < 0 or k > n.
1.3.3. Proposition. For all non-negative integers n and k, we have
(1) $\binom{n}{k} = \binom{n}{n-k}$
[…]
…$\mathbb{K}^n \to \mathbb{K}^m \to \mathbb{K}^k$ on the bottom, and we calculate directly (we write A for the matrix of f and B for the matrix of g in the chosen bases):
$$g_{v,w} \circ f_{u,v}(x) = w \circ g \circ v^{-1} \circ v \circ f \circ u^{-1}(x) = B\cdot(A\cdot x) = (B\cdot A)\cdot x = (g\circ f)_{u,w}(x)$$
for every x ∈ 𝕂ⁿ. By the associativity of matrix multiplication, the composition of mappings corresponds to the multiplication of the corresponding matrices. Note that the isomorphisms correspond exactly to invertible matrices and that the matrix of the inverse mapping is the inverse matrix.
The same approach shows how the matrix of a linear mapping changes, if we change the coordinates on both the domain and the codomain:
(Diagram: the mapping f : V → W expressed in the bases u, v and in the new bases u′, v′, with the coordinate transitions T and S⁻¹ on the sides.)
where T is the coordinate transition matrix from u′ to u and S is the coordinate transition matrix from v′ to v. If A is the original matrix of the mapping, then the matrix of the new mapping is given by A′ = S⁻¹AT.
In the special case of a linear mapping f : V → V, that is, when the domain and the codomain are the same space V, we usually express f in terms of a single basis u of the space V. Then the change from the old basis to the new basis u′ with the coordinate transition matrix T leads to the new matrix A′ = T⁻¹AT.
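Numerically, the rule A′ = T⁻¹AT is a one-liner; the following sketch (ours; it assumes the numpy library) illustrates it on a small example and checks that the eigenvalues, which do not depend on the basis, stay the same:

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])   # matrix of f in the old basis u
T = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # columns: coordinates of u' in u

A_new = np.linalg.inv(T) @ A @ T   # matrix of f in the new basis u'

print(np.linalg.eigvals(A))        # [2. 3.]
print(np.linalg.eigvals(A_new))    # [2. 3.], the same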
2.3.17. Linear forms. A simple but very important case of linear mappings on an arbitrary vector space V over the scalars 𝕂 appears with the codomain being the scalars themselves, i.e. mappings f : V → 𝕂. We call them linear forms.
If we are given the coordinates on V, the assignment of the single i-th coordinate to vectors is an example of a linear form. More precisely, for every choice of basis v = (v₁, …, vₙ), there are the linear forms vᵢ* : V → 𝕂 such that vᵢ*(vⱼ) = δᵢⱼ, that is, vᵢ*(vⱼ) = 1 when i = j, and vᵢ*(vⱼ) = 0 when i ≠ j.
The vector space of all linear forms on V is denoted by V* and we call it the dual space of the vector space V. Let us now assume that the vector space V has finite dimension n. The basis of V*, v* = (v₁*, …, vₙ*), composed of the assignments of individual coordinates as above, is called the dual basis to v. Clearly this is a basis of the space V*, because these forms are evidently linearly independent (prove
as the change of the coordinate system of the observer). First we have to understand what happens with the coordinates of vectors. The key to all this is the transition matrix (see 2.3.15). We will further write e for the standard basis, that is, the vectors ((1,0,0), (0,1,0), (0,0,1)) (these could be any three linearly independent vectors in a vector space; by naming them as we did, we identified the vector space with ℝ³).
2.C.21. A vector has coordinates (1,2,3) in the standard basis e. What are its coordinates in the basis
u = ((1,1,0), (1,−1,2), (3,1,5))?
Solution. We first write the transition matrix T from u to the standard basis. We just write the coordinates of the vectors which form the basis u into the columns:
$$T = \begin{pmatrix} 1 & 1 & 3 \\ 1 & -1 & 1 \\ 0 & 2 & 5 \end{pmatrix}.$$
For expressing the sought coordinates, however, we need the transition matrix from the standard basis to u. No problem, it is just T⁻¹ (see 2.3.15 if you have not done so yet). We already know how to compute the inverse matrix (see 2.1.10).
Finally, the sought coordinates are
$$T^{-1}\cdot(1,2,3)^T = \left(\tfrac12, -1, \tfrac12\right)^T.$$
□
Similarly we work with the matrix of a linear mapping.
2.C.22. We are given a linear mapping ℝ³ → ℝ³ in the standard basis by the following matrix:
$$\begin{pmatrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ 2 & 0 & 0 \end{pmatrix}$$
Write down the matrix of this mapping in the basis (f₁, f₂, f₃) = ((1,1,0), (−1,1,1), (2,0,1)). Solution. Again the transition matrix T for changing the basis from the basis f = (f₁, f₂, f₃) to the standard basis e can be obtained by writing down the coordinates of the vectors f₁, f₂, f₃ in the standard basis as the columns of the matrix T. Thus we have
$$T = \begin{pmatrix} 1 & -1 & 2 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$
it!) and if α ∈ V* is an arbitrary form, then for every vector u = x₁v₁ + ⋯ + xₙvₙ
$$\alpha(u) = x_1\alpha(v_1) + \cdots + x_n\alpha(v_n) = \alpha(v_1)v_1^*(u) + \cdots + \alpha(v_n)v_n^*(u),$$
and thus the linear form α is a linear combination of the forms v₁*, …, vₙ*.
Taking into account the standard basis {1} on the one-dimensional space of scalars 𝕂, any choice of a basis v on V identifies the linear forms α with matrices of the type 1/n, that is, with rows y. The components of these rows are the coordinates of the general linear forms α in the dual basis v*. Evaluating such a form on a vector is then given by multiplying the corresponding row vector y with the column of the coordinates x of the vector u ∈ V in the basis v:
$$\alpha(u) = y\cdot x = y_1x_1 + \cdots + y_nx_n.$$
Thus we can see that for every finitely dimensional space V, the dual space V* is isomorphic to the space V. The choice of the dual basis provides such an isomorphism.
In this context we meet again the scalar product of a row of n scalars with a column of n scalars. We have worked with it already in the paragraph 2.1.3 on the page 70.
The situation is different for infinitely dimensional spaces. For instance, the simplest example, the space of all polynomials 𝕂[x] in one variable, is a vector space with a countable basis with elements vᵢ = xⁱ. As before, we can define the linearly independent forms vᵢ*. Every formal infinite sum $\sum_{i=0}^{\infty} a_i v_i^*$ is now a well-defined linear form on 𝕂[x], because it will be evaluated only on finite linear combinations of the basis polynomials xⁱ, i = 0, 1, 2, ….
The countable set of all vᵢ* is thus not a basis. Actually, it can be proved that this dual space cannot have a countable basis.
2.3.18. The length of vectors and scalar product. When dealing with the geometry of the plane ℝ² in the first chapter we also needed the concept of the length of vectors and their angles, see 1.5.7. For defining these concepts we used the scalar product of two vectors u = (x, y) and v = (x′, y′) in the form u · v = xx′ + yy′.
Indeed, the expression for the length of v = (x, y) is given by
$$\|v\| = \sqrt{x^2 + y^2} = \sqrt{v\cdot v},$$
while the (oriented) angle φ of two vectors u = (x, y) and v = (x′, y′) is in the planar geometry given by the formula
$$\cos\varphi = \frac{xx' + yy'}{\|u\|\,\|v\|}.$$
Note that this scalar product is linear in each of its arguments, and we denote it by u · v or by ⟨u, v⟩. The scalar product defined in such a way is symmetric in its arguments and of
The transition matrix for changing the basis from the standard basis to the basis / is then the inverse of T:
$$T^{-1} = \begin{pmatrix} \tfrac14 & \tfrac34 & -\tfrac12 \\ -\tfrac14 & \tfrac14 & \tfrac12 \\ \tfrac14 & -\tfrac14 & \tfrac12 \end{pmatrix}.$$
The matrix of the mapping in the basis f is then given by T⁻¹AT (see 2.2.11).
□
2.C.23. Consider the vector space of polynomials in one variable of degree at most 2 with real coefficients. In this space, consider the basis 1, x, x². Write down the matrix of the derivative mapping in this basis and also in the basis
f = (1 + x², x, x + x²).
Solution. First we have to determine the matrix of the derivative mapping (let us denote the mapping by d, its matrix by D). We chose the basis (1, x, x²) as a standard basis e, so we have the coordinates 1 ∼ (1,0,0), x ∼ (0,1,0) and x² ∼ (0,0,1). We look at the images of the basis vectors: d(1) = 0 ∼ (0,0,0), d(x) = 1 ∼ (1,0,0) and d(x²) = 2x ∼ (0,2,0). Now we write the images as columns into the matrix D:
$$D = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$
Now we write the coordinates of the basis vectors of the basis f into the columns of the matrix
$$T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}$$
to get the transition matrix from f to e. As in the previous example we get the matrix of d in the basis f as
$$T^{-1}DT = \begin{pmatrix} 0 & 1 & 1 \\ 2 & 1 & 3 \\ 0 & -1 & -1 \end{pmatrix}, \quad\text{where we had to compute}\quad T^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & -1 \\ -1 & 0 & 1 \end{pmatrix}.$$
□
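The computation in 2.C.23 can be cross-checked numerically; the sketch below (ours, assuming numpy) reproduces the matrix T⁻¹DT:

import numpy as np

D = np.array([[0, 1, 0],
              [0, 0, 2],
              [0, 0, 0]], dtype=float)  # derivative in the basis (1, x, x^2)
T = np.array([[1, 0, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)  # columns: f_i in the standard basis

print(np.linalg.inv(T) @ D @ T)
# [[ 0.  1.  1.]
#  [ 2.  1.  3.]
#  [ 0. -1. -1.]]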
course ‖v‖ = 0 if and only if v = 0. We also see immediately that two vectors in the Euclidean plane are perpendicular whenever their scalar product is zero.
Now we shall mimic this approach for higher dimensions. First, observe that the angle between two vectors is always a two-dimensional concept (we want the angle to be the same in the two-dimensional space containing the two vectors u and v). In the subsequent paragraphs, we shall consider only finitely dimensional vector spaces over real scalars R.
Scalar product and orthogonality
A scalar product on a vector space V over the real numbers is a mapping ⟨ , ⟩ : V × V → ℝ which is symmetric in its arguments, linear in each of them, and such that ⟨v, v⟩ ≥ 0, and ‖v‖² = ⟨v, v⟩ = 0 if and only if v = 0.
The number ‖v‖ = √⟨v, v⟩ is called the length of the vector v.
Vectors v, w ∈ V are called orthogonal or perpendicular whenever ⟨v, w⟩ = 0. We also write v ⊥ w. The vector v is called normalised whenever ‖v‖ = 1.
The basis of the space V composed exclusively of mutually orthogonal vectors is called an orthogonal basis. If the vectors in such a basis are all normalised, we call the basis orthonormal.
A scalar product is very often denoted by the common dot, that is, ⟨u, v⟩ = u · v. It is then necessary to recognize from the context whether the dot means the product of two vectors (the result is a scalar) or something different (e.g. we often denote the product of matrices and the product of scalars in the same way).
Because the scalar product is linear in each of its arguments, it is completely determined by its values on pairs of basis vectors. Indeed, choose a basis u = (u₁, …, uₙ) of the space V and denote
$$s_{ij} = \langle u_i, u_j\rangle.$$
Then from the symmetry of the scalar product we know sᵢⱼ = sⱼᵢ, and from the linearity of the product in each of its arguments we get, for u = x₁u₁ + ⋯ + xₙuₙ and v = y₁u₁ + ⋯ + yₙuₙ,
$$\langle u, v\rangle = \sum_{i,j} x_i y_j \langle u_i, u_j\rangle = \sum_{i,j} s_{ij}\, x_i y_j.$$
If the basis is orthonormal, the matrix S is the unit matrix. This proves the following useful claim:
Scalar product in coordinates
Proposition. For every orthonormal basis, the scalar product is given by the coordinate expression
$$\langle x, y\rangle = y^T\cdot x.$$
For each basis of the space V there is the symmetric matrix S such that the coordinate expression of the scalar product is
$$\langle x, y\rangle = y^T\cdot S\cdot x.$$
2.C.24. In the standard basis in R3, determine the matrix of the rotation through the angle 90° in the positive sense about the line (t, t, t), t e R, oriented in the direction of the vector (1,1,1). Further, find the matrix of this rotation in the basis
g = ((1,1,0), (1,0,-1), (0,1,1)).
Solution. We can easily determine the matrix of the given rotation in a suitable basis, that is, in a basis given by the directional vector of the line and by two mutually perpendicular vectors in the plane x + y + z = 0, that is, in the plane of vectors perpendicular to the vector (1,1,1). We note that the matrix of the rotation in the positive sense through 90° in an
orthonormal basis in ℝ² is $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. In an orthogonal basis whose vectors have the lengths k and l respectively, it is $\begin{pmatrix} 0 & -l/k \\ k/l & 0 \end{pmatrix}$. If we choose the perpendicular vectors (1, −1, 0) and (1, 1, −2) in the plane x + y + z = 0, with lengths √2 and √6, then in the basis f = ((1,1,1), (1,−1,0), (1,1,−2)) the rotation we are looking for has the matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -\sqrt3 \\ 0 & 1/\sqrt3 & 0 \end{pmatrix}.$$
In order to obtain the matrix of the rotation in the standard basis, it
is enough to change the basis. The transition matrix T for changing the basis from the basis f to the standard basis is obtained by writing the coordinates (in the standard basis) of the vectors of the basis f as the columns of the matrix T:
$$T = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 0 & -2 \end{pmatrix}.$$
Finally, for the desired matrix R,
we have
$$R = T\cdot\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -\sqrt3 \\ 0 & 1/\sqrt3 & 0 \end{pmatrix}\cdot T^{-1} = \begin{pmatrix} 1/3 & 1/3 - \sqrt3/3 & 1/3 + \sqrt3/3 \\ 1/3 + \sqrt3/3 & 1/3 & 1/3 - \sqrt3/3 \\ 1/3 - \sqrt3/3 & 1/3 + \sqrt3/3 & 1/3 \end{pmatrix}.$$
This result can be checked by substituting into the matrix of general rotation (2.C.20). By normalizing the vector (1,1,1) we obtain the vector
(x, y, z) = (1/√3, 1/√3, 1/√3), cos(φ) = 0, sin(φ) = 1. □
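The same check can be done by machine via the Rodrigues formula R = cos(φ)E + sin(φ)[n]ₓ + (1 − cos(φ))nnᵀ; the following sketch (ours, assuming numpy) confirms the matrix above:

import numpy as np

n = np.ones(3) / np.sqrt(3)     # unit vector along the axis (1,1,1)
phi = np.pi / 2
N = np.array([[0.0, -n[2], n[1]],
              [n[2], 0.0, -n[0]],
              [-n[1], n[0], 0.0]])   # the cross-product matrix [n]_x

R = np.cos(phi) * np.eye(3) + np.sin(phi) * N \
    + (1 - np.cos(phi)) * np.outer(n, n)

expected = 1/3 + np.sqrt(3)/3 * np.array([[0.0, -1.0, 1.0],
                                          [1.0, 0.0, -1.0],
                                          [-1.0, 1.0, 0.0]])
print(np.allclose(R, expected))  # True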
2.C.25. Matrix of general rotation revisited. We derive the matrix of the (general) rotation from (2.C.20) through the angle φ […]

2.3.19. Projections. A linear mapping f : V → V on any vector space is called a projection, if we have
$$f \circ f = f.$$
In such a case, we can write, for every vector v ∈ V,
$$v = f(v) + (v - f(v)) \in \operatorname{Im}(f) + \operatorname{Ker}(f) = V$$
and if v ∈ Im(f) and f(v) = 0, then also v = 0. Thus the above sum of the subspaces is direct. We say that f is a projection to the subspace W = Im(f) along the subspace U = Ker(f). In words, the projection can be described naturally as follows: we decompose the given vector into a component in W and a component in U, and forget the second one.
If V has a scalar product, we say that the projection is orthogonal if the kernel is orthogonal to the image.
Every subspace W ≠ V thus defines an orthogonal projection to W. It is the projection to W along W^⊥, given by the unique decomposition of every vector u into components u_W ∈ W and u_{W^⊥} ∈ W^⊥, that is, the linear mapping which maps u_W + u_{W^⊥} to u_W.
2.3.20. Existence of orthonormal bases. It is easy to see that on every finite dimensional real vector space there exist scalar products. Just choose any basis and declare it orthonormal, i.e. define the scalar product so that the basis vectors have unit length and are mutually perpendicular. Immediately we have a scalar product, and in this basis it is computed as in the formula in 2.3.18.
More often we are given a scalar product on a vector space V, and we want to find an appropriate orthonormal basis for it. We present an algorithm which uses suitable orthogonal projections in order to transform any basis into an orthogonal one. It is called the Gram-Schmidt orthogonalization process.
f = ((x, y, z), (−y, x, 0), (zx, zy, z² − 1)), that is, in the orthogonal basis composed of the directional vector of the axis of rotation and of two mutually perpendicular vectors of length √(1 − z²) lying in the plane perpendicular to the axis of rotation, the matrix corresponding to the rotation is
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.$$
[…] 1 ↦ 1, i ↦ −i, written in the coordinates (1,0) ↦ (1,0) and (0,1) ↦ (0,−1). By writing
the images into the columns we obtain the matrix
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
In the basis f the conjugation interchanges the basis vectors, that is, (1,0) ↦ (0,1) and (0,1) ↦ (1,0), and the matrix of
The point of this procedure is to transform a given sequence of independent generators u₁, …, u_k of a finite dimensional space V into an orthogonal set of independent generators of V.
Gram-Schmidt orthogonalization
Proposition. Let (u₁, …, u_k) be a linearly independent k-tuple of vectors of a space V with a scalar product. Then there exists an orthogonal system of vectors (v₁, …, v_k) such that vᵢ ∈ span{u₁, …, uᵢ} and span{u₁, …, uᵢ} = span{v₁, …, vᵢ}, for all i = 1, …, k. We obtain it by the following procedure:
• The independence of the vectors uᵢ ensures that u₁ ≠ 0; we choose v₁ = u₁.
• If we have already constructed the vectors v₁, …, v_ℓ with the required properties and if ℓ < k, we choose v_{ℓ+1} = u_{ℓ+1} + a₁v₁ + ⋯ + a_ℓ v_ℓ, where aᵢ = −⟨u_{ℓ+1}, vᵢ⟩ / ‖vᵢ‖².
conjugation under this basis is
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Proof. We begin with the first (nonzero) vector v₁ and calculate the orthogonal projection of u₂ to
$$(\operatorname{span}\{v_1\})^\perp \subset \operatorname{span}\{v_1, u_2\}.$$
The result is nonzero if and only if u₂ is independent of v₁. All other steps are similar:
In step ℓ, ℓ > 1, we seek the vector v_{ℓ+1} = u_{ℓ+1} + a₁v₁ + ⋯ + a_ℓ v_ℓ satisfying ⟨v_{ℓ+1}, vᵢ⟩ = 0 for all i = 1, …, ℓ. This implies
$$0 = \langle u_{\ell+1} + a_1v_1 + \cdots + a_\ell v_\ell,\; v_i\rangle = \langle u_{\ell+1}, v_i\rangle + a_i\langle v_i, v_i\rangle,$$
and we can see that the vectors with the desired properties are determined uniquely up to a scalar multiple. □
Whenever we have an orthogonal basis of a vector space V, we just normalise the vectors in order to obtain an orthonormal basis. Thus, starting the Gram-Schmidt orthogonalization with any basis of V, we have proven:
Corollary. On every finite dimensional real vector space with scalar product there exists an orthonormal basis.
In an orthonormal basis, the coordinates and orthogonal projections are very easy to calculate. Indeed, suppose we have an orthonormal basis (e₁, …, eₙ) for a space V. Then every vector v = x₁e₁ + ⋯ + xₙeₙ satisfies
$$\langle e_i, v\rangle = \langle e_i,\; x_1e_1 + \cdots + x_ne_n\rangle = x_i,$$
and so we can always express
$$(1)\qquad v = \langle e_1, v\rangle e_1 + \cdots + \langle e_n, v\rangle e_n.$$
If we are given a subspace W ⊂ V and its orthonormal basis (e₁, …, e_k), then we can extend it to an orthonormal basis (e₁, …, eₙ) for V. The orthogonal projection of a general vector v ∈ V to W is then given by the expression
$$v \mapsto \langle e_1, v\rangle e_1 + \cdots + \langle e_k, v\rangle e_k.$$
b) For the basis (1, i) we obtain 1 ↦ 2 + i, i ↦ 2i − 1, that is, (1,0) ↦ (2,1), (0,1) ↦ (−1,2). Thus the matrix of
multiplication by the number 2 + i under the basis (1, i) is:
$$\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}.$$
We determine the matrix in the basis f. Multiplication by (2 + i) gives us: (1 − i) ↦ (1 − i)(2 + i) = 3 − i, (1 + i) ↦ (1 + 3i). The coordinates (a, b)_f of the vector 3 − i in the basis f are given, as we know, by the equation a·(1 − i) + b·(1 + i) = 3 − i, that is, (3 − i)_f = (2, 1). Analogously (1 + 3i)_f = (−1, 2).
Altogether, we obtain the matrix
$$\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}.$$
Think about the following: why is the matrix of multiplication by 2 + i the same in both bases? Would the two matrices in these bases be the same for multiplication by any complex number? □
2.C.27. Determine the matrix A which, under the standard basis of the space ℝ³, gives the orthogonal projection on the vector subspace generated by the vectors u₁ = (−1, 1, 0) and u₂ = (−1, 0, 1).
Solution. Note first that the given subspace is a plane containing the origin with normal vector u3 = (1,1,1). The ordered triple (1,1,1) is clearly a solution to the system
—Xi + X2 =0,
-xi + x3 = 0,
that is, the vector u3 is perpendicular to the vectors u1,u2.
Under the given projection the vectors u₁ and u₂ map to themselves and the vector u₃ to the zero vector. In the basis composed of u₁, u₂, u₃ (in this order), the matrix of the projection is thus
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
Using the transition matrix
$$T = \begin{pmatrix} -1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix},\qquad T^{-1} = \frac13\begin{pmatrix} -1 & 2 & -1 \\ -1 & -1 & 2 \\ 1 & 1 & 1 \end{pmatrix},$$
for changing the basis from (u₁, u₂, u₃) to the standard basis, and from the standard basis to (u₁, u₂, u₃), respectively, we obtain
$$A = \begin{pmatrix} -1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}T^{-1} = \begin{pmatrix} \tfrac23 & -\tfrac13 & -\tfrac13 \\ -\tfrac13 & \tfrac23 & -\tfrac13 \\ -\tfrac13 & -\tfrac13 & \tfrac23 \end{pmatrix}. \qquad\square$$
In particular, we need only consider an orthonormal basis of the subspace W in order to write the orthogonal projection to W explicitly.
Note that in general the projection f to the subspace W along U and the projection g to U along W are constrained by the equality g = id_V − f. Thus, when dealing with orthogonal projections to a given subspace W, it is always more efficient to calculate the orthonormal basis of whichever of the spaces W and W^⊥ has the smaller dimension.
Note also that the existence of an orthonormal basis guarantees that for every real space V of dimension n with a scalar product, there exists a linear mapping which is an isomorphism between V and the space ℝⁿ with the standard scalar product (i.e. respecting the scalar products as well). We saw already in 2.3.18 that the desired isomorphism is exactly the coordinate assignment. In words: in every orthonormal basis the scalar product is computed by the same formula as the standard scalar product in ℝⁿ.
We shall return to the questions of the length of a vector and to projections in the following chapter in a more general context.
2.3.21. Angle between two vectors. As we have already noted, the angle between two linearly independent vectors in the space must be the same as when we consider them in the two-dimensional subspace they generate. Basically, this is the reason why the notion of angle is independent of the dimension of the original space. If we choose an orthogonal basis such that its first two vectors generate the same subspace as the two given vectors u and v (whose angle we are measuring), we can simply take the definition from the planar geometry. Independently of the choice of coordinates we can formulate the definition as follows:
Angle between two vectors
The angle φ between two vectors v and w in a vector space with a scalar product is given by the relation
$$\cos\varphi = \frac{\langle v, w\rangle}{\|v\|\,\|w\|}.$$
□
The angle defined in this way does not depend on the order of the vectors v, w, and it is chosen in the interval 0 ≤ φ ≤ π.
We shall return to scalar products and angles between vectors in further chapters.
2.3.22. Multilinear forms. The scalar product was given as a mapping from the product of two copies of a vector space V into the space of scalars, which was linear in each of its arguments. Similarly, we shall work with mappings from the product of k copies of a vector space V into the scalars, which are linear in each of their k arguments. We speak of k-linear forms.
D. Properties of linear maps
2.D.1. Write down the matrix of the mapping of the orthogonal projection on the plane passing through the origin and perpendicular to the vector (1, 1, 1).
Solution. The image of an arbitrary point (vector) x = (x₁, x₂, x₃) ∈ ℝ³ under the considered mapping can be obtained by subtracting from the given vector its orthogonal projection onto the direction normal to the considered plane, that is, onto the direction (1, 1, 1). This projection p is given (see (1) in 2.3.20) as
$$p = \frac{\langle x, (1,1,1)\rangle}{\|(1,1,1)\|^2}\,(1,1,1) = \left(\frac{x_1+x_2+x_3}{3},\ \frac{x_1+x_2+x_3}{3},\ \frac{x_1+x_2+x_3}{3}\right).$$
The resulting mapping is thus x ↦ x − p, with the matrix
$$\frac13\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}.$$
We have (correctly) obtained the same matrix as in the exercise 2.C.27. □
2.D.2. In R3 write down the matrix of the mirror symmetry with respect to the plane containing the origin and (1,1,1) being its normal vector.
Solution. As in 2.D.1 we get the image of an arbitrary vector x = (x₁, x₂, x₃) ∈ ℝ³ with the help of the orthogonal projection p onto the direction (1, 1, 1). Unlike in the previous example, we need to subtract the projection twice. Thus we get:
$$x - 2p = \left(\frac{x_1}{3} - \frac{2(x_2+x_3)}{3},\ \frac{x_2}{3} - \frac{2(x_1+x_3)}{3},\ \frac{x_3}{3} - \frac{2(x_1+x_2)}{3}\right),$$
with the matrix
$$\frac13\begin{pmatrix} 1 & -2 & -2 \\ -2 & 1 & -2 \\ -2 & -2 & 1 \end{pmatrix}.$$
Second solution. The normed normal vector of the mirror plane is n = (1/√3)(1, 1, 1). We can express the mirror image of v under the mirror symmetry Z as follows: Z(v) = v − 2⟨v, n⟩n = v − 2n·(nᵀ·v) = v − 2(n·nᵀ)·v = (E − 2n·nᵀ)v (where we have used ⟨v, n⟩ = nᵀ·v for the standard scalar product and the associativity of the matrix
Most often we will meet bilinear forms, that is, the case α : V × V → 𝕂, where for any four vectors u, v, w, z and scalars a, b, c and d we have
$$\alpha(au + bv,\; cw + dz) = ac\,\alpha(u, w) + ad\,\alpha(u, z) + bc\,\alpha(v, w) + bd\,\alpha(v, z).$$
If additionally we always have
$$\alpha(u, w) = \alpha(w, u),$$
then we speak of a symmetric bilinear form. If interchanging the arguments leads to a change of sign, we speak of an antisymmetric bilinear form.
Already in planar geometry we defined the determinant as an antisymmetric bilinear form α, that is, α(u, w) = −α(w, u). In general, due to Theorem 2.2.5, we know that the determinant in dimension n can be seen as an n-linear antisymmetric form.
As with linear mappings, it is clear that every k-linear form is completely determined by its values on all k-tuples of basis elements in a fixed basis. In analogy to linear mappings, we can see these values as k-dimensional analogues to matrices. We show this in the case k = 2, where it corresponds to matrices as we have defined them.
Matrix of a bilinear form
If we choose a basis u on V and define, for a given bilinear form α, the scalars a_{ij} = α(uᵢ, uⱼ), then we obtain for vectors v, w with coordinates x and y (as columns of coordinates)
$$\alpha(v, w) = \sum_{i,j} a_{ij}\, x_i y_j = x^T\cdot A\cdot y,\quad\text{where } A = (a_{ij}).$$
Directly from the definition of the matrix of a bilinear form we see that the form is symmetric or antisymmetric if and only if the corresponding matrix has this property.
Every bilinear form α on a vector space V defines a mapping V → V*, v ↦ α(v, ·). That is, by placing a fixed vector in the first argument we obtain a linear form which is the image of this vector. If we choose a fixed basis on a finitely dimensional space V and the dual basis on V*, then we have the mapping
$$x \mapsto (y \mapsto x^T\cdot A\cdot y).$$
All this is a matter of convention. Also we may fix the second vector and get a linear form again.
4. Properties of linear mappings
In order to exploit vector spaces and linear mappings in modelling real processes and systems in other sciences, we need a more detailed analysis of properties of diverse types of linear mappings.
multiplication). We get the same matrix:
$$E - 2n\cdot n^T = \frac13\begin{pmatrix} 1 & -2 & -2 \\ -2 & 1 & -2 \\ -2 & -2 & 1 \end{pmatrix}. \qquad\square$$
2.D.3. Consider R3, with the standard coordinate system. In the plane z = 0 there is a mirror and at the point [4,3,5] there is a candle. The observer at the point [1,2, 3] is not aware of the mirror, but sees in it the reflection of the candle. Where does he think the candle is?
Solution. Independently of our position, we see the mirror image of the scene in the mirror (that is why it is called a mirror image). The mirror image is given by reflecting the scene (space) in the plane of the mirror, the plane z = 0. The reflection with respect to this plane changes the sign of the z-coordinate. That is, we see the candle at the point [4, 3, −5]. □ By using the inner product we can determine the (angular) deflection of vectors:
2.D.4. Determine the deflection of the roots of the polynomial x² − i, considered as vectors in the complex plane.
Solution. The roots of the given polynomial are the square roots of i. The arguments of the two square roots of any complex number differ, according to de Moivre's theorem, by π. Their deflection is thus always π. □
2.D.5. Determine the cosine of the deflection of the lines p, q in R3 given by the equations
p : −2x + y + z = 1,  x + 3y − 4z = 5;
q : x − y = −2,  z = 6.
○
2.D.6. Using the Gram-Schmidt orthogonalisation, obtain an orthogonal basis of the subspace
$$U = \{(x_1, x_2, x_3, x_4)^T \in \mathbb{R}^4;\ x_1 + x_2 + x_3 + x_4 = 0\}$$
of the space ℝ⁴.
2.4.1. We begin with four examples in the lowest dimension of interest. With the standard basis of the plane ℝ² and with the standard scalar product, we consider the following matrices of mappings f : ℝ² → ℝ²:
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},\quad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},\quad C = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix},\quad D = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
The matrix A describes the orthogonal projection along the subspace
$$W = \{(0, a);\ a \in \mathbb{R}\} \subset \mathbb{R}^2$$
to the subspace
$$V = \{(a, 0);\ a \in \mathbb{R}\} \subset \mathbb{R}^2,$$
that is, the projection to the x-axis along the y-axis. Evidently for this f : ℝ² → ℝ² we have f ∘ f = f, and thus the restriction f|_V of the given mapping to its image is the identity mapping. The kernel of f is exactly the subspace W.
The matrix B has the property B² = 0, therefore the same holds for the corresponding mapping f. We can envision it as the differentiation of polynomials ℝ₁[x] of degree at most one in the basis (1, x) (we shall come to differentiation in chapter five, see 5.1.6).
The matrix C gives a mapping f which rescales the first vector of the basis a-times, and the second one b-times. Therefore the whole plane divides into two subspaces which are preserved under the mapping, and on which it is only a homothety, that is, scaling by a scalar multiple (the first case was a special case with a = 1, b = 0). For instance, the choice a = 1, b = −1 corresponds to the axial symmetry (mirror symmetry) with respect to the x-axis, which is the same as complex conjugation x + iy ↦ x − iy on the two-dimensional real space ℝ² ≃ ℂ in the basis (1, i). This is a linear mapping of the two-dimensional real vector space ℂ, but not of the one-dimensional complex space ℂ.
The matrix D is the matrix of the rotation by 90 degrees (the angle π/2) centered at the origin in the standard basis. We can see at first glance that no one-dimensional subspace is preserved under this mapping.
Such a rotation is a bijection of the plane onto itself, therefore we can surely find distinct bases in the domain and codomain, where its matrix will be the unit matrix E. We simply take any basis of the domain and its image in the codomain. But we are not able to do this with the same basis for both the domain and the codomain.
Consider the matrix D as the matrix of a mapping g : ℂ² → ℂ² in the standard basis of the complex vector space ℂ². Then we can find the vectors u = (i, 1), v = (−i, 1), for which we have
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} i \\ 1 \end{pmatrix} = i\begin{pmatrix} i \\ 1 \end{pmatrix},\qquad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} -i \\ 1 \end{pmatrix} = -i\begin{pmatrix} -i \\ 1 \end{pmatrix}.$$
Solution. The set of solutions of the given homogeneous linear equation is clearly a vector space with the basis
$$u_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix},\quad u_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix},\quad u_3 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}.$$
Denote by v₁, v₂, v₃ the vectors of the orthogonal basis obtained by the Gram-Schmidt orthogonalisation process.
First set v₁ = u₁. Then let
$$v_2 = u_2 - \frac{\langle u_2, v_1\rangle}{\|v_1\|^2}\,v_1 = u_2 - \frac12 v_1 = \left(\tfrac12, \tfrac12, -1, 0\right)^T,$$
that is, choose a multiple v₂ = (−1, −1, 2, 0)ᵀ. Then let
$$v_3 = u_3 - \frac{\langle u_3, v_1\rangle}{\|v_1\|^2}\,v_1 - \frac{\langle u_3, v_2\rangle}{\|v_2\|^2}\,v_2 = \left(\tfrac13, \tfrac13, \tfrac13, -1\right)^T.$$
Altogether we have the orthogonal basis
$$v_1 = (1, -1, 0, 0)^T,\quad v_2 = (-1, -1, 2, 0)^T,\quad v_3 = \left(\tfrac13, \tfrac13, \tfrac13, -1\right)^T.$$
Due to the simplicity of the exercise we can also immediately give an orthogonal basis, for instance the vectors
(1, −1, 0, 0)ᵀ, (0, 0, 1, −1)ᵀ, (1, 1, −1, −1)ᵀ, or (−1, 1, 1, −1)ᵀ, (1, −1, 1, −1)ᵀ, (−1, −1, 1, 1)ᵀ. □
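The orthogonalization procedure is also easy to implement; the following minimal Gram-Schmidt sketch (our implementation, assuming numpy) reproduces the vectors v₁, v₂, v₃ up to the multiples chosen above:

import numpy as np

def gram_schmidt(vectors):
    # Orthogonalize a list of linearly independent vectors.
    ortho = []
    for u in vectors:
        v = u.astype(float)
        for w in ortho:
            v = v - (w @ v) / (w @ w) * w  # subtract the projection onto w
        ortho.append(v)
    return ortho

U = [np.array([1, -1, 0, 0]),
     np.array([1, 0, -1, 0]),
     np.array([1, 0, 0, -1])]

for v in gram_schmidt(U):
    print(v)
# [ 1. -1.  0.  0.]
# [ 0.5  0.5 -1.   0. ]
# [ 0.33...  0.33...  0.33... -1.  ]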
2.D.7. Write down a basis of the real vector space of 3 × 3 matrices over ℝ with zero trace. (The trace of a matrix is the sum of its diagonal elements.) Write down the coordinates of the matrix
$$\begin{pmatrix} 1 & 2 & 0 \\ 0 & 2 & 0 \\ & -2 & -3 \end{pmatrix}$$
in this basis.
2.D.8. Find the orthogonal complement U^⊥ of the subspace
$$U = \{(x_1, x_2, x_3, x_4);\ x_1 = x_3,\ x_2 = x_3 + 6x_4\} \subset \mathbb{R}^4.$$
Solution. The orthogonal complement U^⊥ consists of just those vectors that are perpendicular to every solution of the system
x₁ − x₃ = 0,
x₂ − x₃ − 6x₄ = 0.
That means that in the basis (u, v) on C2, the mapping g has the matrix
$$K = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.$$
Notice that by extending the scalars to ℂ, we arrive at an analogy to the matrix C, with the diagonal elements a = cos(½π) + i sin(½π) and its complex conjugate ā. In other words, the argument of the number a in polar form provides the angle of the rotation.
This is easy to understand if we denote the real and imaginary parts of the vector u as follows:
$$u = x_u + i\,y_u,\qquad x_u = \operatorname{Re} u,\quad y_u = \operatorname{Im} u.$$
The vector v is the complex conjugate of u. We are interested in the restriction of the mapping g to the real vector subspace
V = ℝ² ∩ span_ℂ{u, ū} ⊂ ℂ². Evidently,
$$V = \operatorname{span}_{\mathbb{R}}\{u + \bar u,\ i(u - \bar u)\} = \operatorname{span}_{\mathbb{R}}\{x_u, -y_u\}$$
is the whole plane ℝ². The restriction of g to this plane is exactly the original mapping given by the matrix D (notice this matrix is real, thus it preserves this real subspace). It is immediately seen that this is the rotation through the angle ½π in the positive sense with respect to the chosen basis x_u, −y_u. Verify this yourself by a direct calculation. Note also why exchanging the order of the vectors u and v leads to the same result, although in a different real basis!
2.4.2. Eigenvalues and eigenvectors of mappings. A key to the description of the mappings in the previous examples was the answer to the question "what are the vectors satisfying the equation f(u) = a · u for some suitable scalar a?".
We consider this question for any linear mapping f : V → V on a vector space of dimension n over scalars 𝕂. If we imagine such an equality written in coordinates, i.e. using the matrix of the mapping A in some basis, we obtain a system of linear equations
$$A\cdot x - a\cdot x = (A - a\cdot E)\cdot x = 0$$
with an unknown parameter a. We know already that such a system of equations has only the solution x = 0 if the matrix A − aE is invertible. Thus we want to find the values a ∈ 𝕂 for which A − aE is not invertible, and for that, the necessary and sufficient condition reads (see Theorem 2.2.11)
$$(1)\qquad \det(A - a\cdot E) = 0.$$
If we consider λ = a as a variable in the previous scalar equation, we are actually looking for the roots of a polynomial of degree n. As we have seen in the case of the matrix D, the roots may exist in an extension of our field of scalars, if they are not in 𝕂.
A vector is a solution of this system if and only if it is perpendicular to both vectors (1, 0, −1, 0), (0, 1, −1, −6). Thus we have
$$U^\perp = \{a\cdot(1, 0, -1, 0) + b\cdot(0, 1, -1, -6);\ a, b \in \mathbb{R}\}.$$
□
2.D.9. Find an orthonormal basis of the subspace V ⊂ ℝ⁴, where
$$V = \{(x_1, x_2, x_3, x_4) \in \mathbb{R}^4;\ x_1 + x_2 + x_3 = 0\}.$$
Solution. The fourth coordinate does not appear in the restriction defining the subspace, thus it seems reasonable to select (0, 0, 0, 1) as one of the vectors of the orthonormal basis and reduce the problem to the subspace ℝ³. If we set the second coordinate equal to zero, then the investigated space contains the vectors with opposite first and third coordinates, notably the unit vector (1/√2, 0, −1/√2, 0). This vector is perpendicular to any vector whose first coordinate equals the third one. In order to get into the investigated subspace, we choose the second coordinate equal to the negative of the sum of the first and the third coordinates, and then normalise. Thus we choose the vector (1/√6, −2/√6, 1/√6, 0) and we are finished. □
2.D.10. Find the eigenvalues and the associated subspaces of eigenvectors of the matrix
$$A = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 3 & 0 \\ 2 & -2 & 2 \end{pmatrix}.$$
Solution. First we find the characteristic polynomial of the matrix:
$$|A - \lambda E| = \begin{vmatrix} -1-\lambda & 1 & 0 \\ -1 & 3-\lambda & 0 \\ 2 & -2 & 2-\lambda \end{vmatrix} = -\lambda^3 + 4\lambda^2 - 2\lambda - 4.$$
This polynomial has the roots 2, 1 + √3, 1 − √3, which are the eigenvalues of the matrix. Their algebraic multiplicity is one (they are simple roots of the polynomial), thus each has only one (up to a non-zero multiple) associated eigenvector. Otherwise stated, the geometric multiplicity of each eigenvalue is one (see 3.4.10).
We determine the eigenvector associated with the eigenvalue 2. It is a solution of the homogeneous linear system with the matrix A - 2E:
−3x₁ + x₂ = 0,
−x₁ + x₂ = 0,
2x₁ − 2x₂ = 0.
Eigenvalues and eigenvectors
Scalars λ ∈ 𝕂 satisfying the equation f(u) = λ·u for some nonzero vector u ∈ V are called the eigenvalues of the mapping f. The corresponding nonzero vectors u are called the eigenvectors of the mapping f.
If u, v are eigenvectors associated with the same eigenvalue λ, then for every linear combination of u and v,
$$f(au + bv) = af(u) + bf(v) = \lambda(au + bv).$$
Therefore the eigenvectors associated with the same eigenvalue λ, together with the zero vector, form a nontrivial vector subspace V_λ ⊂ V. We call it the eigenspace associated with λ. For instance, if λ = 0 is an eigenvalue, the kernel Ker f is the eigenspace V₀.
We have seen how to compute the eigenvalues in coordinates. The independence of the eigenvalues from the choice of coordinates is clear from their definition. But let us look explicitly at what happens if we change the basis. As a direct corollary of the transformation properties from paragraph 2.3.16 and the Cauchy theorem 2.2.7 for the determinant of a product, the matrix A′ in the new coordinates will be A′ = P⁻¹AP with an invertible matrix P. Thus
$$|P^{-1}AP - \lambda E| = |P^{-1}AP - P^{-1}\lambda E P| = |P^{-1}(A - \lambda E)P| = |P^{-1}|\,|A - \lambda E|\,|P| = |A - \lambda E|,$$
because scalar multiplication is commutative and we know that |P⁻¹| = |P|⁻¹.
For these reasons we use the same terminology for matrices and mappings:
Characteristic polynomials
For a matrix A of dimension n over 𝕂 we call the polynomial |A − λE| ∈ 𝕂ₙ[λ] the characteristic polynomial of the matrix A.
The roots of this polynomial are the eigenvalues of the matrix A. If A is the matrix of the mapping f : V → V in a certain basis, then |A − λE| is also called the characteristic polynomial of the mapping f.
Because the characteristic polynomial of a linear mapping f : V → V is independent of the choice of the basis of V, the coefficients of the individual powers of the variable λ are scalars expressing some properties of f. In particular, they cannot depend on the choice of the basis. Suppose dim V = n and A = (aᵢⱼ) is the matrix of the mapping in some basis. Then
$$|A - \lambda\cdot E| = (-1)^n\lambda^n + (-1)^{n-1}(a_{11} + \cdots + a_{nn})\lambda^{n-1} + \cdots + |A|\,\lambda^0.$$
The coefficient at the highest power says whether the dimension of the space V is even or odd.
The system has the solution x₁ = x₂ = 0, x₃ ∈ ℝ arbitrary. The eigenvector associated with the eigenvalue 2 is then the vector (0, 0, 1) (or any multiple of it).
Similarly we determine the remaining two eigenvectors, as solutions of the system [A − (1 + √3)E]x = 0. The solution of the system
(−2 − √3)x₁ + x₂ = 0,
−x₁ + (2 − √3)x₂ = 0,
2x₁ − 2x₂ + (1 − √3)x₃ = 0
is the space {t(2 − √3, 1, −2); t ∈ ℝ}.
That is the space of eigenvectors associated with the eigenvalue 1 + √3.
Similarly we obtain that the space of eigenvectors associated with the eigenvalue 1 − √3 is {t(2 + √3, 1, −2); t ∈ ℝ}. □
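A quick numerical cross-check of 2.D.10 (our sketch, assuming numpy):

import numpy as np

A = np.array([[-1.0, 1.0, 0.0],
              [-1.0, 3.0, 0.0],
              [2.0, -2.0, 2.0]])

print(np.sort(np.linalg.eigvals(A)))   # approx [1-sqrt(3), 2, 1+sqrt(3)]
print(1 - np.sqrt(3), 2.0, 1 + np.sqrt(3))

# The eigenvector for the eigenvalue 2 is a multiple of (0, 0, 1):
print(A @ np.array([0.0, 0.0, 1.0]))   # [0. 0. 2.]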
2.D.11. Determine the eigenvalues and eigenvectors of the matrix
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 1 & 2 & 1 \end{pmatrix}.$$
Describe the geometric interpretation of this mapping and write down its matrix in the basis:
e₁ = (1, −1, 1), e₂ = (1, 2, 0), e₃ = (0, 1, 1).
The most interesting coefficient is the sum of the diagonal elements of the matrix. We have just proved that it does not depend on the choice of the basis, and we call it the trace of the matrix A, denoted by Tr A. The trace of the mapping f is defined as the trace of its matrix in an arbitrary basis.
In fact, this is not so surprising once we notice that the trace is actually the linear approximation of the determinant in the neighbourhood of the unit matrix in the direction A. We shall deal with such concepts only in Chapter 8. But since the determinant is a polynomial, we may easily see that the only term in det(E + tA) which is linear in the real parameter t is just t·Tr A. We shall see the relation to the matrix exponential later in Chapter 8.
The coefficient at λ⁰ is the determinant |A|, and we shall see later that it describes the rescaling of volumes by the mapping.
2.4.3. Basis of eigenvectors. We discuss a few important properties of eigenspaces now.
Theorem. Eigenvectors of linear mappings f : V —> V associated to different eigenvalues are linearly independent.
Proof. Let a₁, …, a_k be distinct eigenvalues of the mapping f, and u₁, …, u_k eigenvectors with these eigenvalues. The proof is by induction on the number of linearly independent vectors among the chosen ones.
Assume that u₁, …, u_ℓ are linearly independent and u_{ℓ+1} = Σᵢ cᵢuᵢ is their linear combination. We can choose ℓ ≥ 1, because the eigenvectors are nonzero. But then
$$f(u_{\ell+1}) = a_{\ell+1}\cdot u_{\ell+1} = \sum_{i=1}^{\ell} a_{\ell+1}\cdot c_i\cdot u_i,\ \text{that is,}\quad f(u_{\ell+1}) = \sum_{i=1}^{\ell} c_i\cdot f(u_i) = \sum_{i=1}^{\ell} a_i\cdot c_i\cdot u_i.$$
Solution. The characteristic polynomial of the matrix A is
$$|A - \lambda E| = \begin{vmatrix} 1-\lambda & 1 & 0 \\ 1 & 2-\lambda & 1 \\ 1 & 2 & 1-\lambda \end{vmatrix} = -\lambda^3 + 4\lambda^2 - 2\lambda = -\lambda(\lambda^2 - 4\lambda + 2).$$
The roots of this polynomial are the eigenvalues, thus the eigenvalues are 0, 2 + √2, 2 − √2. We compute the eigenvectors associated with the particular eigenvalues:
• 0: We solve the system
$$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 1 & 2 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0.$$
Its solutions form a one-dimensional vector space of eigenvectors: span{(1, −1, 1)}.
By subtracting the second and the fourth expressions in these equalities we obtain $0 = \sum_{i=1}^{\ell}(a_{\ell+1} - a_i)\,c_i\,u_i$. All the differences between the eigenvalues are nonzero and at least one coefficient cᵢ is nonzero. This is a contradiction with the assumed linear independence of u₁, …, u_ℓ, therefore also the vector u_{ℓ+1} must be linearly independent of the others. □
The latter theorem can be seen as a decomposition of a linear mapping f into a sum of much simpler mappings. If there are n = dim V distinct eigenvalues λᵢ, we obtain the entire V as a direct sum of the one-dimensional eigenspaces V_{λᵢ}. Each of them then determines a projection onto this invariant one-dimensional subspace, on which the mapping is given just as multiplication by the eigenvalue λᵢ.
Furthermore, this decomposition can be easily calculated:
• 2 + √2: We solve the system
$$\begin{pmatrix} -(1+\sqrt2) & 1 & 0 \\ 1 & -\sqrt2 & 1 \\ 1 & 2 & -(1+\sqrt2) \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0.$$
The solutions form the one-dimensional space span{(1, 1 + √2, 1 + √2)}.
• 2 − √2: We solve the system
$$\begin{pmatrix} \sqrt2 - 1 & 1 & 0 \\ 1 & \sqrt2 & 1 \\ 1 & 2 & \sqrt2 - 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0.$$
Its solutions form the space of eigenvectors span{(1, 1 − √2, 1 − √2)}.
Hence the given matrix has the eigenvalues 0, 2 + √2 and 2 − √2, with the associated one-dimensional spaces of eigenvectors span{(1, −1, 1)}, span{(1, 1 + √2, 1 + √2)} and span{(1, 1 − √2, 1 − √2)} respectively.
The mapping can thus be interpreted as the projection along the vector (1, −1, 1) onto the plane given by the vectors (1, 1 + √2, 1 + √2) and (1, 1 − √2, 1 − √2), composed with the linear mapping given by "stretching" by the factors corresponding to the eigenvalues in the directions of the associated eigenvectors.
Now we express the mapping in the given basis. For this we need the matrix T for changing the basis from the standard basis to the new basis. This could be obtained by writing the coordinates of the vectors of the original basis under the new basis into the columns of T. But we shall do it in a different way: we first obtain the matrix for changing the basis from the new one to the original one, that is, the matrix T⁻¹. We just write the coordinates of the vectors of the new basis into the columns:
$$T^{-1} = \begin{pmatrix} 1 & 1 & 0 \\ -1 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix}.$$
Then
$$T = (T^{-1})^{-1} = \frac14\begin{pmatrix} 2 & -1 & 1 \\ 2 & 1 & -1 \\ -2 & 1 & 3 \end{pmatrix},$$
and for the matrix B of the mapping under the new basis we have (see 2.3.16)
$$B = TAT^{-1} = \frac14\begin{pmatrix} 0 & 6 & 2 \\ 0 & 6 & 2 \\ 0 & 14 & 10 \end{pmatrix}.$$
□
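The change of basis in 2.D.11 can be verified numerically as well (our sketch, assuming numpy):

import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [1.0, 2.0, 1.0]])
T_inv = np.array([[1.0, 1.0, 0.0],    # columns: the new basis vectors
                  [-1.0, 2.0, 1.0],
                  [1.0, 0.0, 1.0]])
T = np.linalg.inv(T_inv)

B = T @ A @ T_inv
print(np.round(4 * B))   # 4B = [[0, 6, 2], [0, 6, 2], [0, 14, 10]]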
You can find more exercises on computing with eigenvalues and eigenvectors on the page 133.
Basis of eigenvectors
Corollary. If there exist n mutually distinct roots λᵢ of the characteristic polynomial of the mapping f : V → V on the n-dimensional space V, then there exists a decomposition of V into a direct sum of eigenspaces of dimension one. This means that there exists a basis for V consisting only of eigenvectors, and in this basis the matrix of f is the diagonal matrix with the eigenvalues on the diagonal. This basis is uniquely determined up to the order of the elements and the scaling of the vectors.
The corresponding basis (expressed in the coordinates with respect to an arbitrary basis of V) is obtained by solving n systems of homogeneous linear equations in n variables with the matrices (A − λᵢ·E), where A is the matrix of f in the chosen basis.
2.4.4. Invariant subspaces. We have seen that every eigenvector v of the mapping f : V → V generates a subspace span{v} ⊂ V which is preserved by the mapping f.
More generally, we say that a vector subspace W ⊂ V is an invariant subspace for a linear mapping f, if f(W) ⊂ W.
If V is a finite dimensional vector space and we choose some basis (u₁, …, u_k) of a subspace W, we can always extend it to a basis (u₁, …, u_k, u_{k+1}, …, uₙ) of the whole space V. For every such basis, the mapping will have a matrix A of the form
$$(1)\qquad A = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix},$$
where B is a square matrix of dimension k, D is a square matrix of dimension n − k, and C is a matrix of the type k/(n − k). On the other hand, if for some basis (u₁, …, uₙ) the matrix of the mapping f is of the form (1), then W = span{u₁, …, u_k} is invariant under the mapping f.
By the same argument, the mapping with the matrix A as in (1) leaves the subspace span{u_{k+1}, …, uₙ} invariant if and only if the submatrix C is zero.
From this point of view the eigenspaces of the mapping are special cases of invariant subspaces. Our next task is to find some conditions under which there are invariant complements of invariant subspaces.
2.4.5. We illustrate some typical properties of mappings on the spaces R3 and R2 in terms of eigenvalues and eigenvectors.
(1) Consider the mapping given in the standard basis by the matrix
$$A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$
In the case of a 3 x 3 matrix, you can use this special formula to find its characteristic polynomial:
2.D.12. For any n × n matrix A, its characteristic polynomial |A − λE| is of degree n, that is, it is of the form
$$|A - \lambda E| = c_n\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0,\quad c_n \neq 0,$$
where
$$c_n = (-1)^n,\quad c_{n-1} = (-1)^{n-1}\operatorname{tr} A,\quad c_0 = |A|.$$
If the matrix A is three-dimensional, we obtain
$$|A - \lambda E| = -\lambda^3 + (\operatorname{tr} A)\,\lambda^2 + c_1\lambda + |A|.$$
By choosing λ = 1 we obtain
$$|A - E| = -1 + \operatorname{tr} A + c_1 + |A|.$$
From there we obtain
$$|A - \lambda E| = -\lambda^3 + (\operatorname{tr} A)\,\lambda^2 + (|A - E| + 1 - \operatorname{tr} A - |A|)\,\lambda + |A|.$$
Use this expression for determining the characteristic polynomial and the eigenvalues of the matrix
$$\begin{pmatrix} 32 & -67 & 47 \\ 7 & -14 & 13 \\ -7 & 15 & -6 \end{pmatrix}.$$
○
2.D.13. Find the orthogonal complement of the vector space spanned by the vectors (2, 1, 3), (3, 16, 7), (3, 5, 4), (−7, 7, −10).
Solution. In fact the task consists of solving the system 2.A.3, which we have done already. □
2.D.14. Pauli matrices. In physics, the state of a particle with spin ½ is described by the Pauli matrices. They are the following 2 × 2 matrices over the complex numbers:
$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
For square matrices we define their commutator (denoted by square brackets) as [σ₁, σ₂] := σ₁σ₂ − σ₂σ₁.
Show that [σ₁, σ₂] = 2iσ₃ and similarly [σ₃, σ₁] = 2iσ₂ and [σ₂, σ₃] = 2iσ₁. Furthermore, show that σ₁² = σ₂² = σ₃² = 1 and that the eigenvalues of the matrices σ₁, σ₂, σ₃ are ±1.
Show that for the matrices describing the state of a particle with spin 1, namely
$$S_1 = \frac{1}{\sqrt2}\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},\quad S_2 = \frac{1}{\sqrt2}\begin{pmatrix} 0 & -i & 0 \\ i & 0 & -i \\ 0 & i & 0 \end{pmatrix},\quad S_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$
We compute
$$|A - \lambda E| = \begin{vmatrix} -\lambda & 0 & 1 \\ 0 & 1-\lambda & 0 \\ 1 & 0 & -\lambda \end{vmatrix} = -\lambda^3 + \lambda^2 + \lambda - 1,$$
with the roots λ₁ = 1, λ₂ = 1, λ₃ = −1. The eigenvectors with the eigenvalue λ = 1 can be computed:
$$A - E = \begin{pmatrix} -1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & -1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
with the basis of the space of solutions, that is, of all eigenvectors with this eigenvalue
u₁ = (0, 1, 0), u₂ = (1, 0, 1).
Similarly for λ = −1 we obtain the third independent eigenvector:
$$A + E = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix},\qquad u_3 = (-1, 0, 1).$$
Under the basis u₁, u₂, u₃ (note that u₃ must be linearly independent of the remaining two because of the previous theorem, and u₁, u₂ were obtained as two independent solutions), f has the diagonal matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$
The whole space ℝ³ is the direct sum of eigenspaces, ℝ³ = V₁ ⊕ V₂, with dim V₁ = 2 and dim V₂ = 1. This decomposition is uniquely determined and says much about the geometric properties of the mapping f. The eigenspace V₁ is furthermore a direct sum of one-dimensional eigenspaces, which can be selected in many ways (thus such a finer decomposition has no further geometrical meaning).
(2) Consider the linear mapping f : ℝ₂[x] → ℝ₂[x] defined by polynomial differentiation, that is, f(1) = 0, f(x) = 1, f(x²) = 2x. The mapping f thus has, in the usual basis (1, x, x²), the matrix
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$
The characteristic polynomial is |A − λ·E| = −λ³, so it has only one eigenvalue, λ = 0. We compute the eigenvectors:
$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \sim \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$
The space of the eigenvectors is thus one-dimensional, generated by the constant polynomial 1.
The striking property of this mapping is that there is no basis in which its matrix would be diagonal. There is the "chain" of vectors ½x² ↦ x ↦ 1 ↦ 0, which builds a sequence of subspaces without invariant complements.
, the commuting relations are the same as in the case of Pauli matrices.
Equivalently, it can be shown that under the notation 1 := E, I := iσ₃, J := iσ₂, K := iσ₁, the vector space with basis (1, I, J, K) forms an algebra of quaternions (an algebra is a vector space with a bilinear operation of multiplication; in this case the multiplication is given by matrix multiplication). In order for the vector space to be an algebra of quaternions it is necessary and sufficient to show the following properties: I² = J² = K² = −1 and IJ = −JI = K, JK = −KJ = I, and KI = −IK = J.
2.D.15. Can the matrix
$$B = \begin{pmatrix} 5 & 6 \\ 6 & 5 \end{pmatrix}$$
be expressed in the form of the product B = P⁻¹·D·P for some diagonal matrix D and invertible matrix P? If possible, give an example of such matrices D, P, and find out how many such pairs there are.
Solution. The matrix B has two distinct eigenvalues, and thus such an expression exists. For instance it holds that
$$\begin{pmatrix} 5 & 6 \\ 6 & 5 \end{pmatrix} = \frac12\begin{pmatrix} \sqrt2 & -\sqrt2 \\ \sqrt2 & \sqrt2 \end{pmatrix}\begin{pmatrix} 11 & 0 \\ 0 & -1 \end{pmatrix}\cdot\frac12\begin{pmatrix} \sqrt2 & \sqrt2 \\ -\sqrt2 & \sqrt2 \end{pmatrix}.$$
There exist exactly two such diagonal matrices D:
$$\begin{pmatrix} 11 & 0 \\ 0 & -1 \end{pmatrix},\qquad \begin{pmatrix} -1 & 0 \\ 0 & 11 \end{pmatrix},$$
but the columns of the matrix P-1 can be substituted with their arbitrary non-zero scalar multiples, thus there are infinitely many pairs D, P. □
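In numerical practice such a decomposition is obtained from the eigendecomposition; the sketch below (ours, assuming numpy) produces one admissible pair D, P for the matrix B:

import numpy as np

B = np.array([[5.0, 6.0],
              [6.0, 5.0]])

w, P_inv = np.linalg.eig(B)   # columns of P_inv are eigenvectors of B
D = np.diag(w)
P = np.linalg.inv(P_inv)

print(w)                                # [11. -1.]
print(np.allclose(P_inv @ D @ P, B))    # True: B = P^{-1} D P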
As we have already seen in 2.D.11, based on the eigenvalues and eigenvectors of a given 3 × 3 matrix, we can often interpret geometrically the mapping it induces in ℝ³. In particular, we can do so in the following situations: If the matrix has 0 as an eigenvalue and 1 as an eigenvalue with geometric multiplicity 2, then it is a projection in the direction of the eigenvector associated with the eigenvalue 0 onto the plane given by the eigenspace of the eigenvalue 1. If the eigenvector associated with 0 is perpendicular to that plane, then the mapping is an orthogonal projection.
If the matrix has eigenvalue —1 with the eigenvector perpendicular to the plane of the eigenvectors associated with the eigenvalue 1, then it is a mirror symmetry through the plane of the eigenvectors associated with 1.
2.4.6. Orthogonal mappings. We consider the special case of a mapping f : V → W between spaces with scalar products which preserves lengths for all vectors u ∈ V.
Orthogonal mappings
A linear mapping f : V → W between spaces with scalar products is called an orthogonal mapping, if for all u ∈ V
$$\langle f(u), f(u)\rangle = \langle u, u\rangle.$$
The linearity of f and the symmetry of the scalar product imply that for all pairs of vectors the following equality holds:
$$\langle f(u+v), f(u+v)\rangle = \langle f(u), f(u)\rangle + \langle f(v), f(v)\rangle + 2\langle f(u), f(v)\rangle.$$
Therefore all orthogonal mappings also satisfy the seemingly stronger condition, for all vectors u, v ∈ V:
$$\langle f(u), f(v)\rangle = \langle u, v\rangle,$$
i.e. the mapping f leaves the scalar product invariant if and only if it leaves invariant the lengths of vectors. (Notice that this is true over every field of scalars where 1 + 1 ≠ 0, but it does not hold over ℤ₂.)
In the initial discussion about the geometry in the plane, we proved in Theorem 1.5.10 that a linear mapping ℝ² → ℝ² preserves lengths of vectors if and only if its matrix in the standard basis (which is orthonormal with respect to the standard scalar product) satisfies Aᵀ·A = E, that is, A⁻¹ = Aᵀ.
In general, orthogonal mappings f : V → W must always be injective, because the condition ⟨f(u), f(u)⟩ = 0 implies ⟨u, u⟩ = 0 and thus u = 0. In such a case, the dimension of the range is always at least as large as the dimension of the domain of f. But then both dimensions are equal and f : V → Im f is a bijection. If Im f ≠ W, we extend the orthonormal basis of the image of f to an orthonormal basis of the range space, and the matrix of the mapping then contains a square regular submatrix A together with zero rows so that it has the required number of rows. Without loss of generality we can assume that W = V.
Our condition for the matrix of an orthogonal mapping in any orthonormal basis requires that for all vectors x and y in the space K":
(A ■ xf ■ (A ■ y) = xT ■ (AT ■ A) ■ y = xT ■ y.
Special choice of the standard basis vectors for x and y yields directly AT ■ A = E, that is, the same result as for dimension two. Thus we have proved the following theorem:
Matrix of orthogonal mappings
Theorem. Let V be a real vector space with scalar product and let f : V —> V be a linear mapping. Then f is orthogonal if and only if in some orthogonal basis (and then consequently in all of them) its matrix A satisfies AT = A-1.
119
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
If the matrix has eigenvalue 1 with an eigenvector perpendicular to plane of the eigenvectors associated with the eigenvalue —1, then it is an axial symmetry (in space) through the axis given by the eigenvector associated with 1.
2.D.16. Determine what linear mapping R3 by the matrix
1 is given
_I _2x 3 3 \
_1 _8 3 3 I
1 1/
Solution. The matrix has a double eigenvalue —1, its associated eigenspace is span{(2,0,1), (1,1,0)}. Further, the matrix has 0 as the eigenvalue, with eigenvector (1,4, —3). The mapping given by this matrix under the standard basis is then an axial symmetry through the line given by the last vector composed with the projection on the plane perpendicular to the last vector, that is, given by the equation x + Ay — 3z = 0.
□
2.D.17. The theorem 2.4.7 gives us tools for recognising a matrix of a rotation in R3. It is orthogonal (rows orthogonal to each other equivalently the same for the columns). It has three distinct eigenvalues with absolute value 1. One of them is the number 1 (its associated eigenvector is the axis of the rotation). The argument of the remaining two, which are necessarily complex conjugates, gives the angle of the rotation in the positive sense in the plane given by the basis u\ + u\, i{u\ -«!)■
2.D.18. Determine what linear mapping is given by the matrix
3 16 -12
5 25 25
-16 93 24
25 125 125
12 24 107
25 125 125
Í). 3
3>' 5
5., §2); § - |j,w3 = (1, §2). All three eigenval-
Solution. First we notice, that the matrix is orthogonal (rows are mutually orhogonal, and equivalently the same with columns). The matrix has the following eigenvalues and corresponding eigenvectors: 1, v1 = (0,1,
ues have absolute value one, which together with the observation of orthogonality tells us that the matrix is a matrix of rotation. Its axis is given by the eigenvector corresponding to the eigenvalue 1, that is the vector (0,1, |). The plane of rotation is the real plane in R3, which is given by the intersection of two dimensional complex space in C3 generated by the remaining eigenvectors with R3. It is the plane
Proof. Indeed, if / preserves lengths, it must have the claimed property in every orthonormal basis. On the other hand, the previous calculations show that this property for the matrix in one such basis ensures length preservation. □
Square matrices which satisfy the equality AT = A-1 are called orthogonal matrices.
The shape of the coordinate transition matrices between orthonormal bases is a direct corollary of the above theorem. Each such matrix must provide a mapping K" —> K" which preserves lengths and thus satisfies the condition S*-1 = . When changing from one orthonormal basis to another one, the matrix of any linear mapping changes according to the relation
A' = STAS.
2.4.7. Decomposition of an orthogonal mapping. We take a more detailed look at eigenvectors and eigenvalues of orthogonal mappings on a real vector space V with scalar product.
Consider a fixed orthogonal mapping / : V —> V with the matrix A in some orthonormal basis. We continue as with the matrix D of rotation in 2.4.1.
We think first about invariant subspaces of orthogonal mappings and their orthogonal complements. Namely, given any subspace W C V invariant with respect to an orthogonal mapping / : V —> V, then for all v e W1- and nelfwe immediately see
(f(v),w) = (f(v), /o/-») = <«,/-»} = 0
since f~1(w) G W, too. But this means that also f(W±) C W1- and we have proved a simple but very important proposition:
Proposition. The orthogonal complement of a subspace invariant with respect to an orthogonal mapping is also invariant.
If all eigenvalues of an orthogonal mapping are real, this jji 11 claim ensures that there always exists a basis of V composed of eigenvectors. Indeed, the restriction of ~X f to the orthogonal complement of an invariant sub-space is again an orthogonal mapping, therefore we can add one eigenvector to the basis after another, until we obtain the whole decomposition of V. However, mostly the eigenvalues of orthogonal mappings are not real. We need to deviate into complex vector spaces. We formulate the result right away:
120
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
span{ (1,0, 0), (0, —4, 3)} (the first generator is the (real multiple of) v2 + v3, the other one is the (real multiple of) i(v2 — v3), see 2.4.7). We can determine the rotation angle in this plane, It is a rotation by the angle arccos(|) = 0,2957T, which is the argument of the eigenvalue | +1 i (or minus that number, if we would choose the other eigenvalue).
It remains to determine the direction of the rotation. First, recall that the meaning of the direction of the rotation changes when we change the orientation of the axis (it has no meaning to speak of the direction of the rotation if we do not have an orientation of the axis). Using the ideas from the proof of the theorem 2.4.7, we see that the given matrix acts by rotating by arccos(|)) in the positive sense in the plane given by the basis ((1,0,0), (0, §)). The first vector of the basis is the imaginary part of the eigenvector associated with the eigenvalue | + |i, the second is then the (common) real part of the eigenvectors associated with the complex eigenvalues. The order of the vectors in the basis is important (by changing their order the meaning of the direction changes). The axis of rotation is perpendicular to the plane. If we orient using the right-hand rule (the perpendicular direction is obtained by taking the product of the vectors in the basis) then the direction of the rotation agrees with the direction of rotation in the plane with the given basis. In our case we obtain by the vector product (0,1,-1) x (1,1,-1) = (0,-1,-1). It is thus a rotation through arccos(|) in the positive sense about the vector (0, —1, —1), that is, a rotation through arccos(|) in the negative sense about the vector (0,1,1). □
2.D.19. Determine what linear mapping is given by the matrix
-1 3 -1
5 5 5
-8 9 2
5 5 5
8 -4 3
5 5 5
Solution. By already known method we find out that the matrix has the following eigenvalues and corresponding eigenvectors: 1, (1,2,0); | + fa, 1, (1,1 + - i); § -|i,(l,l — 2,-1 + i). Though all three eigenvectors have absolute value 1, they are not orthogonal to each other, thus the matrix is not orthogonal. Consequently it is not a matrix of rotation. Nevertheless, it is a linear mapping which is "close" to a rotation. It is a rotation in the plane given by two complex eigenvectors (but this plane is not orthogonal to the vector (1,2,0), but it is preserved by the map). It remains to
Orthogonal mapping decomposition
Theorem. Let f : V —> V be an orthogonal mapping on a real vector space V with scalar product. Then all the (in general complex) roots of the characteristic polynomial f have length one. There exists the decomposition of V into one-dimensional eigenspaces corresponding to the real eigenvalues A = ±1 and two-dimensional subspaces Px^x with A £ C\R, where f acts by the rotation by the angle equal to the argument of the complex number A in the positive sense. All these subspaces are mutually orthogonal.
Proof. Without loss of generality we can work with the space V = Rm with the standard scalar product. The mapping is thus given by an orthogonal matrix A which can be equally well seen as the matrix of a (complex) linear mapping on the complex space Cm (which just happens to have all of its coefficients real).
There exist exactly m (complex) roots of the characteristic polynomial of A, counting their algebraic multiplicities (see the fundamental theorem of algebra, 12.2.8). Furthermore, because the characteristic polynomial of the mapping has only real coefficients, the roots are either real or there are a pair of roots which are complex conjugates A and A. The associated eigenvectors in Cm for such pairs of complex conjugates are actually solutions of two systems of linear homogeneous equations which are also complex conjugate to each other - the corresponding matrices of the systems have real components, except for the eigenvalues A. Therefore the solutions of this systems are also complex conjugates (check this!).
Next, we exploit the fact that for every invariant sub-space its orthogonal complement is also invariant. First we find the eigenspaces V±i associated with the real eigenvalues, and restrict the mapping to the orthogonal complement of their sum. Without loss of generality we can thus assume that our orthogonal mapping has no real eigenvalues and that dim V = 2n > 0.
Now choose an eigenvalue A and let u\ be the eigenvector in C2n associated to the eigenvalue A = a + i(3, (3 =^ 0. Analogously to the case of rotation in the plane discussed in paragraph 2.4.1 in terms of the matrix D, we are interested in the real part of the sum of two one-dimensional (complex) subspaces W = span{uA} ffi span{uA}, where u\ is the eigenvector associated to the conjugated eigenvalue A.
Now we want the intersection of the 2-dimensional com-
plex subspace W with the real subspace ]
which
is clearly generated (over R) by the vectors u\ + u\ and i(u\ —u\). We call this real 2-dimensional subspace Px x C R2n and notice, this subspace is generated by the basis given by the real and imaginary part of u\
xx = Reux, -y\ = -ImuA.
121
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
determine the direction of the rotation. First, we should recall that the meaning of the direction of the rotation changes when we change the orientation of the axis (it has no meaning to speak of the direction of the rotation if we do not have an orientation of the axis).
Using the same ideas as in the previous example, we see that the given matrix acts by rotating by arccos (|)) in the positive sense in the plane given by the basis ((1,1,-1), (0,1,). The first vector of the basis is the imaginary part of the eigenvector associated with the eigenvalue | + |i, the second is then the (common) real part of the eigenvectors associated with the complex eigenvalues. The order of the vectors in the basis is important (by changing their order the meaning of the direction changes). The "axis" of rotation is not perpendicular to the plane, but we can orient the vectors lying in the whole half-plane using the right-hand rule (the perpendicular direction is obtained by taking the product of the vectors in the basis) then the direction of the rotation agrees with the direction of rotation in the plane with the given basis. In our case we obtain by the vector product (0,1,-1) x (1,1,-1) = (0,-1,-1). It is thus a rotation through arccos(|) in the positive sense about the vector (0,-1,-1), that is, a rotation through arccos(|) in the negative sense about the vector (0,1,1). □
2.D.20. Without any written computation determine the spectrum of the linear mapping / : R3 —> R3 given by
(xi,x2,x3) l-> (xi +x3,x2,x1 +x3). O
2.D.21. Find the dimension of the eigenspaces of the eigenvalues Aj of the matrix
/4 0 0 0\
1 5
0 0
3/
Because A ■ (u\ + u\) = \u\ + \u\ and similarly with the second basis vector, it is clearly an invariant subspace with respect to multiplication by the matrix A and we obtain
A ■ xx = axx + py\, A-yx = -ayx + fixx.
Because our mapping preserves lengths, the absolute value of the eigenvalue A must equal one. But that means that the restriction of our mapping to Px x is the rotation by the argument of the eigenvalue A. Note that the choice of the eigenvalue A instead of A leads to the same subspace with the same rotation, we would just have expressed it in the basis x\, y\, that is, the same rotation will in these coordinates go by the same angle, but with the opposite sign, as expected.
The proof of the whole theorem is completed by restricting the mapping to the orthogonal complement and finding another 2-dimensional subspace, until we get the required decomposition. □
We return to the ideas in this proof once again in chapter three, where we study complex extensions of the Euclidean vector spaces, see 3.4.4.
Remark. The previous theorem is very powerful in dimen-sion three. Here at least one eigenvalue must be real ±1, since three is odd. But then the associated eigenspace is an axis of the rotation of the three-dimensional space through the angle given by the argument of the other eigenvalues. Try to think how to detect in which direction the space is rotated. Note also that the eigenvalue —1 means an additional reflection through the plane perpendicular to the axis of the rotation.
O
./img/0163b.jpg
We shall return to the discussion of such properties of matrices and linear mappings in more details at the end of the next chapter, after illustrating the power of the matrix calculus in several practical applications. We close this section with a general quite widely used definition:
122
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
Spectrum of linear mapping
2.4.8. Definition. The spectrum of a linear mapping f : V —> V, or the spectrum of a square matrix A, is a sequence of roots of the characteristic polynomial / or A, along with their multiplicities, respectively. The algebraic multiplicity of an eigenvalue means the multiplicity of the root of the characteristic polynomial, while the geometric multiplicity of the eigenvalue is the dimension of the associated subspace of eigenvectors.
The spectral diameter of a linear mapping (or matrix) is the greatest of the absolute values of the eigenvalues.
In this terminology, our results about orthogonal mappings can be formulated as follows: the spectrum of an orthogonal mapping is always a subset of the unit circle in the complex plane. Thus only the values ±1 may appear in the real part of the spectrum and their algebraic and geometric multiplicities are always the same. Complex values of the spectrum then correspond to rotations in suitable two-dimensional sub-spaces which are mutually perpendicular.
123
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
E. Additional exercises for the whole chapter
2.E.I. Kirchhoff's Circuit Laws. We consider an application of Linear Algebra to analysis of electric circuits, using Ohm's law and Kirchhoff's voltage and current laws.
Consider an electric circuit as in the figure and write down the values of the currents there if you know the values
Vx = 20, 14 = 120, 14 = 50, i?i = 10, R2 = 30, R3 = 4, RA = 5, R5 = 10,
Notice that the quantities J, denote the electric currents, while Rj are resistances, and 14 are voltages.
Solution. There are two closed loops, namely ABEF and EBCD and two branching vertices B and E of degree no less than 3. On every segment of the circuit, bounded by branching points, the electric current is constant. Set it to be I\ on the segment EFAB, I2 on EB, and I3 on BCDE.
Applying Kirchhoff's current law to branching points B and E we obtain: I\ + I2 = I3 and I3 — I\ = I2, which are, of course the same equations. In case there are many branching vertices, we write all Kirchhhoff's Current Law equations to the system, having at least one of those equations redundant.
Choose the counter clockwise orientations of the loops ABEF and EBCD. Applying Kirchhoff Voltage Law and Ohm's Law to the loop ABEF we obtain the equation:
14 + hR3 - I2R5 + V3 + hRi + hRi = 0.
Similarly, the loop EBCD implies
-14 + I3R2 -V3 + R5I2 = 0.
By combining all equations, we obtain the system
h + h - I3 = 0, (Rs + Ri + R^h - R5I2 + = -14-14,
Rsh + R2I3 = 14 + 14-Substituing the prescribed values we obtain the linear system
h + h - I3 = 0, 19/i - 10/2 + = -70,
IO/2 + 30/3 = 170.
This has solutions h =-§§ « -1.509, I2 = ^ « 4.132, 73 = ^^ 2.623. □
2.E.2. The general case. In general, the method for electrical circuit analysis can be formulated along the following steps:
i) Identify all branching vertices of the circuit, i.e vertices of degree no less than 3;
ii) Identify all closed loops of the circuit;
iii) Introduce variables Ik, denoting oriented currents on each segment of the circuit between two branching vertices;
124
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
iv) Write down Kirchhoff's current conservation law for each branching vertex. The total incoming current equals the total outgoing current;
v) Choose an orientation on every closed loop of the circuit and write down Kirchhoff's voltage conservation law according to the chosen orientation. If you find an electric charge of voltage Vj and you go from the short bar to the long bar, the contribution of this charge is Vj. It is — Vj if you go from the long bar to the short one. If you go in the positive direction of a current I and find a resistor with resistance Rj, the contribution is —Rjl, and it is Rjl if the orientation of the loop is opposite to the direction of the current I. The total voltage change along each closed loop must be zero.
vi) Compose the system of linear equations collecting all equations, representing Kirchhoff's current and voltage laws and solve it with respect to the variables, representing currents. Notice that some equations may be redundant, however, the solution should be unique.
To illustrate this general approach, consider the circuit example in the diagram.
Solution.
i) The set of branching vertices is {B, C, F, G, H}.
ii) The set of closed loops is {ABHG, FHBC, GHF, CDEF}.
iii) Let I\ be the current on the segment GAB, I2 on the segment GH, I3 on the segment HB, J4 on the segment BC, I5 on the segment FC, on the segment FH, I? on GF, and Is on CDEF.
iv) Write Kirchhoff's current conservation laws for the branching vertices:
• vertex B: I\ + I3 = I4
• vertex C: I4 + 15 = J§
• vertex F: I8 = I5 + I6 — I7
• vertex G: — I7 = I\ +I2
• vertex H: I2 + Iq = I3
v) Write Kirchhoff's voltage conservation for each of the closed loops traversed counter-clockwise:
• loop ABHG: -RJ2 + V3 + R2h - V2 = 0
• loop FHBC: V4 + R3h - V3 = 0
• loop GHF: RJ2 - V1 = 0
• loop CDEF: R4IS - V4 = 0
Set the parameters: Rx =4, R2 = 7, R3 = 9, RA = 12, Vx = 10, V2 = 20, , V3 = 60, , VA = 120, to obtain the system
h+h-h=Q
h + k- hi = 0
h+h-h-h=Q
h+h+h=Q
125
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
12 - I3 + I6 = 0
77i - 4J2 = -40 9J4 = -60 4J2 = 10 12 J8 = 120
with the solution set h = J2 = f, J3 = J4 = J5 = f, 4 = ^, ^ = f, J8 = 10.
□
2.E.3. Solve the system of equations
xx + x2 + x3 + x4 — 2x5 = 3,
2a;2 + 2x3 + 2x4 - 4x5 = 5,
—xi — x2 — x3 + x4 + 2x5 = 0,
—2xi + 3x2 + 3x3 — 6x5 = 2.
Solution. The extended matrix of the system is
/ 1 1 1 1 -2 3 \
0 2 2 2 -4 5
-1 -1 -1 1 2 0
V -2 3 3 0 -6 2 /
Adding the first row to the third, adding its 2-multiple to the fourth, and adding the (—5/2)-multiple of the second to the fourth we obtain
/ 1 1 1 1 -2 3 \ / 1 1 1 1 -2 3 \
0 2 2 2 -4 5 0 2 2 2 -4 5
0 0 0 2 0 3 0 0 0 2 0 3
V 0 5 5 2 -10 8 / V 0 0 0 -3 0 -9/2 J
The last row is clearly a multiple of the previous, and thus we can omit it. The pivots are located in the first, second and fourth. Thus the free variables are X3 and X5 which we substitute by the real parameters t and s. Thus we consider the system
xi + X2 + t + X4 — 2s = 3, 2x2 + 2t + 2x4 - 4s = 5, 2x4 = 3.
We see that X4 = 3/2. The second equation gives
2x2 + 2f + 3 - 4s = 5, that is, x2 = l- t + 2s.
From the first we have xi + l-i + 2s + i + 3/2-2s = 3, tj. xx = 1/2. Altogether,
(xi, x2, x3, x4, x5) = (1/2, 1— t+2s, t, 3/2, s), t,s G R.
Alternatively, we can consider the extended matrix and transform it using the row transformations into the row echelon form. We arrange it so that the first non-zero number in every row is 1, and the remaining numbers in the column containing this 1 are 0. We omit the fourth equation, which is a combination of the first three. Sequentially, multiplying the second and
126
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
1111-2
the third row by the number 1/2, subtracting the third row from the second and from the first and by subtracting the second row from the first we obtain
5/2 ) ~
3/2j
3/2 J
0 2 2 0 0 0
1 1 1
0 110 V 0 0 0 1 0 If we choose again 23
1111-2 0 111-2 0 0 0 1 0
10 0 0 0 110 0 0 0 1
0
0
s (t,s e K), we obtain the general solution (2.E.3) as above.
□
2.E.4. Find the solution of the system of linear equations given by the extended matrix
/ 3 3 2
2 1 1
0 5-4
\ 5 3 3
3\
4
1
5 )
3 \ / 3 3 2 1 3 \
4 0 -3 -1 -2 6
1 0 5 -4 3 1
5 / V 0 6 1 14 0 /
Solution. We transform the given extended matrix into the row echelon form. We first copy the first three rows and into the
last row we write the sum of the (2)-multiple of the first and of the (—3)-multiple of the last row. By this we obtain
/ 3 3 2 1 2 11 0 0 5-43 \ 5 3 3 -3
Copying the first two rows and adding a 5-multiple of the second row to the 3-multiple of the third and its 2-multiple to the
fourth gives
/ 3 3 2 0 -3 -1 0 5-4 \ 0 6 1
Copying the first, second and fourth row, and adding the fourth to the third, yields
1 3 \ / 3 3 2 1 3 \
-2 6 0 -3 -1 -2 6
3 1 0 0 -17 -1 33
14 0 / V 0 0 -1 10 12 /
(3 3 2 1 3 \ (3 3 2 1 3 \
0 -3 -1 -2 6 0 -3 -1 -2 6
0 0 -17 -1 33 0 0 -18 9 45
V 0 0 -1 10 12 / V 0 0 -1 10 12 /
With three more row transformations, we arrive at
/ 3 3 2 1 3 ^ { 3 3 2 1 3 ^
0 -3 -1 -2 6 0 -3 -1 -2 6
0 0 -18 9 45 0 0 2 -1 -5
V 0 0 -1 10 12 j 0 0 1 -10 -12 j
/ 3 3 2 1 3 (3 3 2 1 3 \
0 -3 -1 -2 6 0 -3 -1 -2 6
0 0 1 -10 -12 0 0 1 -10 -12
V 0 0 2 -1 -5 ) ^ 0 0 0 19 19 /
The system has exactly 1 solution. We determine it by backwards elimination
(3 3 2 1 3 N \ í 3 3 2 0 2 ^
0 -3 -1 -2 6 0 -3 -1 0 8
0 0 1 -10 -12 0 0 1 0 -2
V 0 0 0 1 1 / { 0 0 0 1 1 )
3 0 0 6 / ' 1 1 0 0 2 ^ /I 0 0 0 4 \
0 -3 0 0 6 0 1 0 0 -2 0 1 0 0 -2
0 0 1 0 -2 0 0 1 0 -2 0 0 1 0 -2
V 0 0 0 1 1 ) v 0 0 0 1 1 ) V 0 0 0 1 1 /
127
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
The solution is
x1 = 4, x2 =-2, x3 =-2, x4 = 1.
□
2.E.5. Find all the solutions of the homogeneous system
x + y = 2z + v, z + Au + v = 0, — 3u = 0, z = —v of four linear equations with 5 variables x, y, z, u, v.
Solution. We rewrite the system into a matrix such that in the first column there are coefficients of x, in the second there are coefficients of y, and so on. We put all the variables in equations to the left side. By this, we obtain the matrix
/l 1 -2 0 0 1 0 0 0 \0 0 1
We add (4/3)-multiple of the third row to the second and subtract then the second row from the fourth to obtain
/l 1 —2 0 -1\ /l 1 —2 0 -1\
0 0 1 4 1 0 0 1 0 1
000 -3 0 ~000 -3 0'
\0 0 1 0 1/ \0 0 0 0 0/
0 -1\ 4 1 -3 0 0 1/
We multiply the third row by the number —1/3 and add the 2-multiple of the second row to the first, which gives
/l 1 -2 0 0 1 0 0 0
\0 0
0
0 -1\ 0 1
-3 0
0 0 /
/l 1 0 0 1\
0 0 10 1
0 0 0 1 0
\0 0 0 0 0/
From the last matrix, we get immediately (reading from bottom to top) u = 0, z + v = 0, x + y + v = 0. Letting v = s and y = t, the complete solution is
(x, y, z, u, v) = (-i - s, i, -s, 0, s).
which can be rewritten as fx\
y
t,s e
z u
w
= t
i
o o
Vo/
+ s
o
-1 o
V 1 /
t,s e'.
Notice that the second and the fifth column of the matrix together form a basis for the solutions. These are the columns which do not contain a leading 1 in any of its entries. □
2.E.6. Determine the number of solutions for the systems (a)
12x1 +
x\
\/5x2
+ +
Ux3 5x3 2x3
-9, -9, -7;
(b)
4xi + 2x2 - 12x3 = 0,
5a;i + 2x2 ^3 = 0,
2xi - x2 + 6X3 = 4;
128
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
(c)
4xi + 2x2 - 12x3 = 0,
5xi + 2x2 — x3 = 1,
—2xi — X2 + 6x3 = 0.
Solution. The vectors (1,0, —5), (1,0,2) are clearly linearly independent, (they are not multiples of each other) and the vector (12, \/5,11) cannot be their linear combination (its second coordinate is non-zero). Therefore the matrix whose rows are these three linearly independent vectors (from the left side) is invertible. Thus the system for case (a) has exactly one solution.
For cases (b) and (c), it is enough to note that
(4,2,-12) = -2(-2,-1,6).
In case (b) adding the first equation to the third multiplied by two gives 0 = 8, hence there is no solution for the system. In case (c) the third equation is a multiple of the first, so the system has infinitely many distinct solutions. □
2.E.7. Find a linear system, whose set of solutions is exactly
{(t + 1, 2t, 3t, At); t e R}.
Solution. Such a system is for instance
2xi — X2 = 2, 2x2 — X4 = 0, 4x3 — 3x4 = 0.
These solutions are satisfied for every t e R. The vectors
(2,-1,0,0), (0,2,0,-1), (0,0,4,-3)
giving the left-hand sides of the equations are linearly independent (the set of solutions contains a single parameter). □
2.E.8. Solve the system of homogeneous linear equations given by the matrix
/0 V2 V3 V& 0 \
2 2^-2 -y^
0 2^ 2V3 -V3 ' \3 3 -3 0 /
o
2.E.9. Determine all solutions of the system
X2 + X4 = 1,
2x2 - 3x3 + 4x4 = -2,
x2 X3 + X4 = 2,
- £3 = 1.
O
2.E.10. Solve
3x - 5y + 2u + Az = 2,
5x + 7y - Au - 6z = 3,
7x - Ay + + 3z = 4,
X + 6y - 2u - 5z = 2
o
2.E.11. Determine whether or not the system of linear equations
3xi + 3x2 + x3 = 1,
2xi + 3x2 — x3 = 8,
2xi — 3x2 + x3 = 4,
3xi — 2x2 + 23 = 6
of three variables x1, x2, x3 has a solution. O
129
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
2.E.12. Determine the number of solutions of the system of 5 linear equations
AT-x = (1,2,3,4,5)T,
where
/3 1 7 5 0\ x = (Xl,x2,x3)T and A= 0 0 0 0 1 .
\2 1 4 3 0/
Repeat the question for the system
AT-x = (1,1,1,1,1)T
2.E.13. Depending on the parameter aeR, determine the solution of the system of linear equations
ax i + 4x2 +2 x3 = 0, 2xi + 3x2 — x3 = 0.
2.E.14. Depending on the parameter aeR, determine the number of solutions of the system
(2\
5
3
V-3/
O
o
í4 1 4 a\ M
2 3 6 8 X2
3 2 5 4
\6 -1 2 -8/ \xA)
o
2.E.15. Decide whether or not there is a system of homogeneous linear equations of three variables whose set of solutions is exactly
(a) {(0,0,0)};
(b) {(0,1,0), (0,0,0), (1,1,0)};
(c) {(x, 1,0); x e R};
(d) {(x,y,2y); x,y G R}.
O
2.E.16. Solve the system of linear equations, depending on the real parameters a, b.
x + 2y + bz = a x — y + 2z = 1 3a; — y = 1.
2.E.17. Using the inverse matrix, compute the solution of the system
%i + x2 + x3 + x4 = 2, x\ + x2 — x3 — xA = 3, x\ — x2 + x3 — x4 = 3,
X\ — X2 — X3 + X4 = 5.
o
o
130
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
2.E.18. For what values of parameters a.iel has the system of linear equations
x\ — ax2 — 2x3 = b,
x\ + (1 — a)X2 = b — 3,
x\ + (1 — a)X2 + ax3 = 2b — 1
(a) exactly one solution;
(b) no solution;
(c) at least 2 solutions? (i.e. infinitely many solutions)
Solution. We rewrite it, as usual, in the extended matrix, and transform:
At the first step we subtract the first row from the second and the third; and at the second step we subtract the second from the third. We see that the system has a unique solution (determined by backward elimination) if and only if a ^ 0. If a = 0 and b = —2, we have a zero row in the extended matrix. Choosing x3 e K as a parameter then gives infinitely many distinct solutions. For a = 0 and b ^ —2 the last equation a = b + 2 cannot be satisfied and the system has no solution. Note that for a = 0, b = —2 the solutions are
(xi, x2, x3) = (-2 + 2t, -3 - 2t, t), feR
and for a =^ 0 the unique solution is the triple
'-3a2-a&-4a + 2& + 4 2& + 3a + 4 b + 2\
' ' J ■
a a a J
□
& =
Find real numbers bi,b2,b3 such that the system of linear equations A - x = b has:
(a) infinitely many solutions;
(b) unique solution;
(c) no solution;
(d) exactly four solutions.
Solution. Since the first row is the sum of the other two, it is enough to choose b\ = b2 + b3 in case a) and b1 ^ b2 + b3 in case c). Variant b) cannot occur, since the matrix A is not invertible. As long as we work over reals, there cannot be any finite number of solutions, except zero or one. Thus d) is impossible. □
2.E.20. Factor the following permutations into a product of transpositions:
. (1 2 3 4 5 6 7^
l) \7 6 5 4 3 2 1
. (I 2 3 4 5 6 7 8
U> \6 4 1 2 5 8 3 7
A 2 3 4 56789 10
m> U 6 1 10 2 5 9 8 3 7
131
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
O
i)
ii)
iii)
2.E.21. Determine the parity of the given permutations:
'\ 2 3 4 5 6 7^
7 5 6 4 1 2 3,
1 2 3 4 5 6 7 8^
6 7 1 2 3 8 4 5, 123 4 56789 10
971 10 25493 6
O
2.E.22. Find the algebraically adjoint matrix F* for
fa ß 0\ F = 7 5 0 , q, ß,j,S £ R. \0 0 1/
2.E.23. Calculate the algebraically adjoint matrix for the matrices
/3 -2 0 -1\
0 2 2 1 (b)
O
(a) 1 -2 -3 -2 \0 1 2 1 /
where 2 denotes the imaginary unit.
1 + 2 22
3-2i 6
o
2.E.24. Is the set V = {(1, x); a; £ R} with operations
(B:VxV^V, (l,y) ffi (l,z) = (l,z + y) for all ©:RxV^K « ©(l,y) = (l,y«) for all -
a vector space? O
2.E.25. Express the vector (5,1,11) as a linear combination of the vectors (3,2,2), (2,3,1), (1,1,3), that is, find numbers p,q,r £ R, for which
(5,1,11) = p (3, 2, 2) + g (2, 3,1) + r (1,1, 3).
O
2.E.26. In R3, determine the matrix of rotation through the angle 120° in the positive sense about the vector (1, 0,1) O
2.E.27. In the vector space R3, determine the matrix of the orthogonal projection onto the plane x + y — 2z = 0. O
2.E.28. In the vector space R3, determine the matrix of the orthogonal projection on the plane 2x — y + 2z = 0. O
2.E.29. Determine whether the subspaces U = ((2,1, 2,2)) and V = ((-1,0, -1,2), (-1,0,1,0), (0,0,1, -1)) of the space R4 are orthogonal. If they are, is R4 = U ffi V, that is, is U± = V1 O
2.E.30. Let p be a given line:
p: [1,1] + (4,l)i, t £ R
Determine the parametric expression of all lines q that pass through the origin and have deflection 60° with the line p. Q
132
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
2.E.31. Depending on the parameter ( 6 8, determine the dimension of the subspace U of the vector space R3, if U is generated by the vectors
(a) u± = (1,1,1), u2 = (l,t,l), u3 = (2,2,t);
(b) wi = (t, t, t), u2 = (-At,-At, At), u3 = (-2, -2, -2).
o
2.E.32. Construct an orthogonal basis of the subspace
((1,1,1,1), (1,1,1,-1), (-1,1,1,1)) of the space R4. O
2.E.33. In the space R4, find an orthogonal basis of the subspace of all linear combinations of the vectors (1, 0,1,0),
(0,1,0,-7), (4,-2,4,14).
Find an orthogonal basis of the subspace generated by the vectors (1, 2,2, —1), (1,1, —5, 3), (3,2,8, —7). O 2.E.34. For what values of the parameters a, b G R are the vectors
(1,1,2,0,0), (1,-1,0,1, a), (l,&,2,3,-2) in the space R5 pairwise orthogonal? O 2.E.35. In the space R5, consider the subspace generated by the vectors
(1,1,-1,-1,0), (1, —1, —1,0, —1), (1,1,0,1,1), (—1,0, —1,1,1). Find a basis for its orthogonal complement. O
2.E.36. Describe the orthogonal complement of the subspace V of the space R4, if V is generated by the vectors (—1,2,0,1), (3,1, -2, 4), (-4,1, 2, -4), (2, 3, -2, 5). O
2.E.37. In the space R5, determine the orthogonal complement W1- of the subspace W, if
(a) W = {(r + s + t, -r + t, r + s, -t, s + t); r, s, t G R};
(b) W is the set of the solutions of the system of equations x\ — x3 = 0, x\ — x2 + x3 — x4 + x5 = 0.
o
2.E.38. In the space R4, let
(1,-2,2,1), (1,3,2,1)
be given vectors. Extend these two vectors into an orthogonal basis of the whole R4. (You can do this in any way you wish, for instance by using the Gram-Schmidt orthogonalization process.) O
2.E.39. Define an inner product on the vector space of the matrices from the previous exercise. Compute the norm of the matrix from the previous exercise, induced by the product you have denned. O
2.E.40. Find a basis for the vector space of all antisymmetric real square matrices of the type 4x4. Consider the standard inner product in this basis and using this inner product, express the size of the matrix
/ 0 3 1 0\
-3012 -1-10 2
\0 -2-2 0/
O
133
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
2.E.41. Find the eigenvalues and the associated eigenspaces of eigenvectors of the matrix:
Solution. The characteristic polynomial of the matrix is A3 — 6A2 + 12A — 8, which is (A — 2)3. The number 2 is thus an eigenvalue with algebraic multiplicity three. Its geometric multiplicity is either one, two or three. We determine the vectors associated to this eigenvalue as the solutions of the system
-x1 +x2 = 0, (A - 2£)x = -xx +x2 = 0, 2xi -2x2 = 0.
Its solutions form the two-dimensional space ((1, —1, 0), (0,0,1)}. Thus the eigenvalue 2 has algebraic multiplicity 3 and geometric multiplicity 2.
□
2.E.42. Determine the eigenvalues of the matrix
(-13 0
-30 V-12
5 4 2\_
-10 0
12 9 5
4 1/
6
O
2.E.43. Given that the numbers 1, —1 are eigenvalues of the matrix
/-ll 5 4
A =
1\ 0
-3 0 1 -21 11 8 2 \-9 5 3 1/
find all solutions of the characteristic equation A — A E =0. Hint: if you denote all the roots of the polynomial A — A E\
by Ai,A2,A3,A4,then
^4 | = Ai ■ a2 ■ a3 ■ a4, and trA = Ai + a2 + a3 + a4.
o
2.E.44. Find a four-dimensional matrix with eigenvalues Ai = 6 and A2 = 7 such that the multiplicity of A2 as a root of the characteristic polynomial is three, and that
(a) the dimension of the subspace of eigenvectors of \2 is 3;
(b) the dimension of the subspace of eigenvectors of A2 is 2
(c) the dimension of the subspace of eigenvectors of A2 is 1
O
2E.45. Find the eigenvalues and the eigenvectors of the matrix:
2.E.46. Determine the characteristic polynomial A — A E |, eigenvalues and eigenvectors of the matrix
134
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
4 -1 6
2 1 6
2 -1 8
respectively.
O
135
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
Solutions to the exercises
2 A.ll. There is only one such matrix X, and it is
-32 -8
1 10 -4
1 12 -5
0 5 -2
Ab =
A-3 =
'14 13 -13> 13 14 13 v 0 0 27 y
2.A./5.
2A.16. CT
í2 -3 0 0
-5 8 0 0 0
0 0 -1 0 0
0 0 0 -5 2
\o 0 0 3 -V
0
1 -1 0
\1 -1 -1
2A.17. In the first case we have 1
-1 0
yi-1 =
in the second
2.C.9. (2 + ^,2-^).
2.C.10. The vectors are dependent whenever at least one of the conditions
a = b = 1, a = c = 1, b = c = 1
is satisfied.
2.C.11. Vectors are linearly independent.
2.C.12. It suffices to add for instance the polynomial x.
2D.5. cos =
2.D./2. Je I ^ - A E \ = —A3 + 12A2 - 47A + 60,. Ai = 3, A2 = 4, As = 5.
2D.20. The solution is the sequence 0,1, 2.
2D.21. The dimension is 1 for Ai = 4 and 2 for A2 = 3.
2.E.8. The solutions are all scalar multiples of the vector
(l + v^, -V3, 0, 1, 0)
2.E.9. XI = 1 + t, X2 = |, X3 = t, : 2.E.10. The system has no solution. 2.E.11. The system has a solution, because
re:
2 2
w
r3^
3
-3 V-2/
-1 1
\1/
8 4
w
136
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
+ 2x3 = 1,
+ X3 = 2,
+ 4x3 = 3,
+ 3X3 = 4,
= 5
2.E.12. The system of linear equations
3xi xi 7xi 5xi
X2
has no solution, while the system
3xi + 2x3 = 1,
XI + X3 = 1,
7x1 + 4x3 = 1,
5xi + 3x3 = 1,
X2 =1
has a unique solution xi = —1, x2 = 1, x3 = 2. 2.E.13. The set of all solutions is given by
{(-10t, (a + 4)t, (3a - 8)t) ; t e R}.
2.E.14. For a = 0, the system has no solution. For a ^ 0 the system has infinitely many solutions. 2.E.15. The correct answers are „yes", ,410", „no" and „yes" respectively.
2.E.16. i) If b ± -7, then x = z = (2 + a)/(b + 7), y = (3a - b - l)/(b + 7). ii) If b = -7 and a ± -2, then there is no solution, ill)
If a = — 2 and b = —7 then the solution is x = z = t, y = 3t — 1, for any t.
2.E.17.
(1 1
1 1 1\
1 -1 -1 1-11-1
\1 -1-1 1/
We can then easily obtain
13 3
XI = -, X2 =--, X3 =--, X4 = —.
4 4 4 4
2.E.20. i) (1,7)(2,6)(5,3), ii) (1, 6)(6, 8)(8,7)(7, 3)(2,4), ill) (1,4)(4,10)(10, 7)(7, 9)(9, 3)(2, 6)(6, 5) 2.E.21. i) 17 inversions, odd, ii) 12 inversions, even iii) 25 inversions, odd 2.E.22. From the knowledge of the inverse matrix F~x we obtain
1 1 1\
1 1 1 -1 -1
1 -1 1 -1
V -1 -1 1 /
3 1
F*
(aS - ßi) F'1
-ß
for any a, j3, 7, 8 e R. 2.E.23. The matrices are
(a)
ŕ1
0
-4\ -1 6
(b)
6
-3 + 2i
-2i 1+i
\ 2 1-6 -10/
2.E.24. It is easy to check that it is a vector space. The first coordinate does not affect the results of the operations - it is just the vector space (R, +, •) written in a different way. 2.E.25. There is a unique solution
p = 2, q = -2, r = 3.
2.E.26.
2.E.27.
1 1/4 3/4 >
VE/A -1/2
y 3/4 V6/A 1/4 j
I 5/6 -1/6 l/3\
-1/6 5/6 1/3 .
V 1/3 1/3 1/3/
137
CHAPTER 2. ELEMENTARY LINEAR ALGEBRA
2.E.28.
I 5/9 2/9 -4/9\
2/9 8/9 2/9 \-4/9 2/9 5/9 /
2.E.29. The vector that determines the subspace U is perpendicular to each of the three vectors that generate V. The subspaces are thus orthogonal. But it is not true that R4 = U ffi V. The subspace V is only two-dimensional, because
(-1,0, -1, 2) = (-1,0,1,0) - 2 (0,0,1, -1).
2.E.30.
qi : (2 -
V3
2 ,2^3+ ±)t, g2: (2 + ^,-273 + 1)*.
2.E.31. In the first case we have dim U = 2 for t e {1, 2}, otherwise we have dim U = 3. In the second case we have dim U = 2 for t # 0 and dim U = 1 for t = 0.
2.E.32. Using the Gram-Schmidt orthogonalization process we can obtain the result
((1,1,1,1), (1,1,1,-3), (-2,1,1,0)).
2.E.33. We have for instance the orthogonal bases
((1,0,1,0), (0,1,0,-7))
for the first part, and
((1, 2, 2, -1), (2, 3, -3, 2), (2, -1, -1, -2)).
for the second part.
2.E.34. The solution is a = 9/2, b = —5, because
1 + 6 + 4 + 0 + 0 = 0, 1-6 + 0 + 3- 2a = 0. 2.E.35. The basis must contain a single vector. It is
(3,-7,1,-5,9).
(or any non-zero scalar multiple thereof)
2.E.36. The orthogonal complement V± is the set of all scalar multiples of the vector (4, 2, 7,0). 2.E.37.
(a) W1- = <(1,0, -1,1,0), (1, 3,2,1, -3));
(b) Wx = {(1,0,-1,0,0), (1,-1,1,-1,1)).
2.E.38. There are infinitely many possible extensions, of course. A very simple one is
(1,-2,2,1), (1,3,2,1), (1,0,0,-1), (1,0,-1,1).
2.E.39. For instance, one can use the inner product that follows from the isomorphism of the space of all real 3x3 matrices with the space R9. If we use the product from R9, we obtain an inner product that assigns to two matrices the sum of products of two corresponding elements. For the given matrix we obtain
= Vl2 + 22 + O2 + O2 + 22 + O2 + l2 + (-2)2 + (-3)2 = V23.
2.E.40.
2.E.42. The matrix has only one eigenvalue, namely —1, since the characteristic polynomial is (A + l)4 2.E.43. The root —1 of the polynomial | A — A E \ has multiplicity three. 2.E.44. Possible examples are,
(a)
0 0 °\ /6 0 0
0 7 0 0 (b) 0 7 1 0
0 0 7 0 0 0 7 0
\o 0 0 V 0 0 V
0 0 °\
(c) 0 7 1 0
0 0 0 7 0 1 V
138
_CHAPTER 2. ELEMENTARY LINEAR ALGEBRA_
2.E.45. There is a triple eigenvalue —1. The corresponding eigenspace is {(1,0,0), (0, 2,1)).
2.E.46. The characteristic polynomial is — (A — 2)2 (A — 9), that is, the eigenvalues are 2 and 9 with associated eigenvectors
(1,2,0), (-3,0,1) a (1,1,1)
139
CHAPTER 3
Linear models and matrix calculus
where are the matrices useful? - basically almost everywhere..
A. Linear optimization
Let us start with an example of a very simple problem:
3.A.I. A company manufactures bolts and nuts. Nuts and bolts are moulded - moulding a box of bolts takes one minute, a box of nuts is moulded for 2 minutes. Preparing the box itself takes one minute for bolts, 4 minutes for nuts. The company has at its disposal two hours for moulding and three hours for box preparation. Demand says that it is necessary to manufacture at least 90 boxes of bolts more than boxes of nuts. Due to technical reasons it is not possible to manufacture more than 110 boxes of bolts. The profit from one box of bolts is $4 and the profit from one box of nuts is $6. The company has no trouble with selling. How many boxes of nuts and bolts should be manufactured in order to have maximal profit?
Solution. Write the given data into a table:
Bolts 1 box Nuts 1 box Capacity
Mould 1 min ./box 2 min ./box 2 hours
Box 1 min ./box 4 min ./box 3 hours
Profit $4/box $6/box
We have already developed a useful package of tools and it is time to show some applications of matrix calculus. The first three parts of this chapter are independent and the readers more interested in the theory might skip any them and continue with the fourth part straight ahead.
It might seem that the assumption of linearity of relations between quantities is too restrictive. But this is often not so. In real problems, linear relations may appear directly. A problem may be solved as a result of an iteration of many linear steps. If this is not the case, we may still use this approach at least to approximate real non-linear processes.
We should also like to compute with matrices (and linear mappings) as easily as we can compute with scalars. In order to do that, we prepare the necessary tools in the second part of this chapter. We also present a useful application of matrix decompositions to the pseudoinverse matrices, which are needed for numerical mastery of matrix calculus.
We try to illustrate all the phenomena with rather easy problems. Still some parts of this chapter are perhaps difficult for first reading. This in particular concerns the very first part providing some glimpses towards the linear optimization (linear programming), and the third part devoted to iterated processes (the Frobenius-Perron theory).
The rest of the chapter comes back to some more advanced parts of the matrix calculus (the Jodan canonical form, decompositions, and pseudo-inverses of matrices). The reader should feel free to move forward if getting lost.
1. Linear optimization
The simplest linear processes are given by linear mappings ip : V —> W on vector spaces. As we can surely imagine, the vector v e V can represent the state of some system we are observing, while ip(v) gives the result after some process is realized.
If we want to reach a given result b e W of such a process, we solve the problem
\u ■ ei|2 +
+ \u ■ ek\
This property is called the Bessel inequality.
(4) If(ei,...,ek) is an orthonormal system of vectors, then u £ span{ei,..., ek} if and only if
II l|2 I 2 i i I |2
=|u-ei| + ■ ■ ■ + \u ■ ek\ ■
This is called the Parseval equality.
(5) If(e±,..., ej;) is an orthonormal system of vectors and u £ V, then the vector
w=(u- e1)e1 H-----h (u ■ ek)ek
is the only vector which minimizes the norm \\u — v\\ among all v £ span{ei,.. ., ek}.
Proof. The verifications are all based on direct computations:
(2): The result is obvious if v = 0. Otherwise, define the vector w = u — ^v, that is, w + v and compute
2 (TFv)
m\r - tarnu -v)-^(vu)+ -,+1,4-1+11
(u-v)(u-v)
ll^ll4
Hwll2!^!)2 = ||u||2||w||2 - 2(u ■ v)(vTv) + (u ■ v)(v~~v)
These are non-negative real values and thus, ||w||2|M|2 > \u ■ v\2 and the equality holds if and only if w = 0, that is, whenever u and v are linearly dependent. (1): It suffices to compute
||u + w||2 = ||u||2 + ||w||2 + U'W + W'U = ||u||2 + \\v\\2 + 2Re(u ■ v)
< \\u\\2 + ||w||2 + 2|u-'y| < ||u||2 + |H|2 + 2||u|||H|
= (\H\ + \M)2
Since we deal with squares of non-negative real numbers, this means that ||u + w|| < ||u|| + ||w||. Furthermore, equality implies that in all previous inequalities equality also holds. This is equivalent to the condition that u and v are linearly dependent (using the previous part).
(3), (4): Let (e1,..., ek) be an orthonormal system of vectors. We extend it to an orthonormal basis (ei,..., en) (that is always possible by the previous theorem). Then, again using the previous theorem, we have for every vector u £ V
n n k
\\u\\2 = ^2(u ■ et)(u ■ et) = ^2 \u ' ei\2 > \u ' ei\2
i=l i=l i=l
But that is the Bessel inequality. Furthermore, equality holds if and only if u ■ e{ = 0 for all i > k, which proves the Parseval equality.
(5): Choose an arbitrary v £ sp&n{e1,... ,ek} and extend the given orthonormal system to the orthonormal basis
(ei,... ,e„). Let («i,... ,un) and (xu ...,xk, 0,..., 0) be
172
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
U, where X is a lower triangular matrix given by the Gaussian reduction, and U upper triangular. From this equality A = X~x U, which is the desired decomposition. (Thus we have to compute the inverse of X). □
3.F.3. Find the LU-decomposition of the matrix
C -!1 !)■ 0
3.F.4. Ray-tracing. In computer 3D-graphics the image is very often displayed using the Ray-tracing algorithm. The basis of this algorithm is an approximation of the light waves by a ray (line) and an approximation of the displayed objects by polyhedrons. These are bounded by planes and it is necessary to compute where exactly the light rays are reflected from these planes. From physics we know how the rays are reflected - the angle of impact equals the angle of reflection. We have already met this topic in the exercise I.E.10.
The ray of light in the direction v = (1,2,3) hits the plane given by the equation x + y + z = 1.1n what direction is it reflected?
Solution. The unit normal vector to the plane is
n = ^5= (1,1,1). The vector that gives the direction of the reflected ray vr lies in the plane given by the vectors v, n. We can express it as a linear combination of these vectors. Furthermore, the rule for the angle of reflection says that (ii,n) = —(vf>,n). From there we obtain a quadratic equation for the coefficient of the linear combination.
This exercise can be solved in an easier, more geometric way. From the diagram we can derive directly that
Vfj = v — 2(i>, n)n In our case, vr = (—3, —2, —1).
□
3.F.5. Singular decomposition, polar decomposition, pseudoinverse. Compute the singular decomposition of the matrix
A ■
0 0
-10 0 V 0 0 0 /
. Then compute its polar decomposition and find its pseudoinverse.
Solution. First compute ATA:
A1 A:
0 °\
= 0 0 0
Vo 0 \
coordinates of u and v under this basis. Then
||u—u||2 = \u1-x1\2-\-----\-\uk-xk\2 + \uk+1\2-\-----\-\un\2
and this expression is clearly minimized when choosing the individual vectors to be x\ = u\,..., xk = uk. □
3.4.4. Unitary and orthogonal mappings. The properties fit °f ormogonal mappings have direct analogues in the complex domain. We can easily formu-SS^issES late them and prove together:
Proposition. Consider the linear mapping (endomorphism) p : V —> V on the (real or complex) space with scalar product. Then the following conditions are equivalent.
(1) ip is unitary or orthogonal transformation,
(2) ip is linear isomorphism and for every u, v G V
(2): The mapping p is injective, therefore it must be onto. Also p(u) ■ v = p(u) ■ pip'1^)) = u ■ p'1^). (2) => (3): The standard scalar product is in K". It is given for columns x, y of scalars by the expression x ■ y = yTE x = yx, where E is the unit matrix. Property (2) thus means that the matrix A of the mapping p is invertible and yT Ax
(yTA
(A-^yf
(A-^yf 0 for all
x. This means that By substituting the
complex conjugate of the expression in the parentheses for x we find that equality is possible only when AT = A-1. (We may also rewrite the expression as yT(A— (A~1)T)x and see the conclusion by substituting the basis vectors for x and y.)
(3) => (4): This is an obvious implication.
(4) => (5) In the relevant basis, the claim is expressed via the matrix A of the mapping p as the equation AAT = E, which is ensured by (4).
(5) => (6): We have\ATA\ = \E\ = \AAT\ = \A\~\A\ = 1, there exists the inverse matrix A-1. But we also have AATA = A, therefore also ATA = E which is expressed exactly by (6).
(6) => (1): In the chosen orthonormal basis
p(u) ■ p(v) = (Ay) Ax = yTATAx = yTEx =
-T
y x
where x and y are columns of coordinates of the vectors u and v. That ensures that the scalar product is preserved. □
173
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
to obtain a diagonal matrix. We need to find an orthonormal basis under which the matrix is diagonal and the zero row is the last one. This can be obtained by rotating about the x-axis through a right angle. The y-coordinate then goes to z
and z goes to —y. This rotation is an orthogonal transforma-
/I 0 0\
tion given by the matrix V = 10 0 1 . By this, we
\0 -1 Oj
have found the decomposition AT A = VBVT. here, B is diagonal with eigenvalues (1,^,0) on the diagonal. Because B = (AV)T(AV), the columns of the matrix
0 0 -\\ /l 0 0\
AV = -1 0 0 V 0 0 0
0 0 1 ^0 -1 0;
o \ o^
-10 0 0 0 0y
form an orthogonal system of vectors, which we normalise and extend to a basis. That is then of the form (0,-1,0), (1,0,0), (0,0,1). The transition matrix of changing from this basis to the standard one is then
U
Finally, we obtain the decomposition A = UVBV
'0 0 -I\ -10 0 = , 0 0 0 /
' 0 1 0\ I 0 0\ I 0 -1 0 00 \ 0 0 0 , 0 0 1/ \0 0 0/ \0 1
The geometrical interpretation of decomposition is the following: first, everything is rotated through a right angle by the x-axis, then follows a projection to the xy plane such that the unit ball is mapped on the ellipse with major half-axes 1 and \. The result is then rotated through a right angle about the z-axis.
The polar decomposition A = P ■ W can be obtained from the singular one: P :=UVBUt and W := UVT, that is,
1 0 0
0 0 0
'\ 0 °^
0 1 0
,0 0 0;
Characterizations from the previous theorem deserve J.',, some notes. The matrices A e Mat„(K) with the property A-1 = AT are called unitary matrices for | complex scalars (in the case R we have already used the name orthogonal matrices for them). The definition itself immediately implies that a product of unitary (orthogonal) matrices is again unitary (orthogonal). The same is true for inverses. Unitary matrices thus form a subgroup U(n) c G1„(C) in the group of all invertible complex matrices with the product operation. Orthogonal matrices form a subgroup 0(n) c G1„(R) in the group of real invertible matrices. We speak of a unitary group and of an orthogonal group.
The simple calculation
1 = det E = det(AAT) = det Adet A = | det A\2
shows that the determinant of a unitary matrix has norm equal to one. For real scalars the determinant is ±1. Furthermore, if Ax = \x for a unitary or orthogonal matrix, then (Ax) ■ (Ax) = x ■ x = | A |2 (a; ■ x). Therefore the real eigenvalues of orthogonal matrices in the real domain are ±1. The eigenvalues of unitary matrices are always complex units in the complex plane.
The same argument as we have seen with the orthogonal mappings imply that orthogonal complements of invariant subspaces with respect to unitary mappings ip : V —> V are also invariant. Indeed, if V be a unitary mapping of complex vector spaces. Then V is an orthogonal sum of one-dimensional eigensubspaces.
Proof. There exists at least one eigenvector v e V, since complex eigenvalues always exist. Then the restriction of p to the invariant subspace (w)± is again unitary and also has an eigenvector. After n such steps we obtain the desired orthogonal basis of eigenvectors. After normalising the vectors we obtain an orthonormal basis. □
Now it is possible to understand the details of the proof of the spectral decomposition of the orthogonal mapping from 2.4.7 at the end of the second chapter. The real matrix of an orthogonal mapping is interpreted as a matrix of a unitary mapping on a complex extension of Euclidean space. We observe the corollaries of the structure of the roots of the real characteristic polynomial over the complex domain. Automatically we obtain invariant two-dimensional subspaces given by pairs of complex conjugated eigenvalues and hence the corresponding rotation for restricted original real mapping.
p(v) ■ p(p 1(u)) = vp 1(u).
174
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
\ 2 0
= 0 1 0
/ Vo 0 o
From this it follows that
/0 0 -I -10 0 \ 0 0 0
The pseudoinverse matrix is then given by the expression
/l 0 0\
vl(-i) := VS UT, where ff = 0 2 0 . Thus,
\0 0 0/
=
A 0 °\ 0 -i °\
0 2 0 1 0 0
Vo 0 o Vo 0 1
□
3.F.6. QR decomposition. The QR decomposition of a matrix A is very useful when we are given a system of linear equations Ax = & which has no solution, but an approximation as good as possible is needed. That is, we want to minimize \\Ax — b\\. According to the Pythagorean theorem, \\Ax — b\\2 = \\Ax — &u ||2 + ||&±||2, where b is decomposed into &n which belongs to the range of the linear transformation A, and into b±, which is perpendicular to this range. The projection on the range of A can be written in the form QQT for a suitable matrix Q. Specifically for this matrix we obtain it by the Gram-Schmidt orthonormalisation of the columns of the matrix A. Then Ax - &y = Q(QTAx - QTb). The system in the parentheses has a solution, for which \\Ax — b\\ = ||&±||, which is the minimal value. Furthermore, the matrix R := QT A is upper triangular and therefore the approximate solution can be found easily.
Find an approximate solution of the system
x + 2y = 1 2x + Ay = 4
Solution. Consider the system Ax = b with A =
and b -
which evidently has no solution. We orthonor-
malise the columns of A. We take the first of them and divide
it by its norm. This yields the first vector of the orthonormal
A
basis
But the second is twice the first and thus it will be after orthonormalisation. Therefore Q = ( ^ i.
The projector on the range of A is then
3.4.5. Dual and adjoint mappings. When discussing vec-J.i,, tor spaces and linear mappings in the second chapter, we mentioned briefly the dual vector space V* of all | linear forms over the vector space V, see 2.3.17. This duality extends to mappings:
Dual mappings For any linear mapping > •: I • U , the expression
(1) {v,il>*(a)) = {il>(v),a),
where ( , } denotes the evaluation of the linear forms (the second argument) on the vectors (the first argument), while v e V and a e W* are arbitrary, defines the mapping ip* : W* —> V* called the dual mapping to ip.
Choose bases vmV,wmW and write A for the matrix of the mapping t/j in these bases. Then we compute the matrix of the mapping t/j* in the corresponding dual bases in the dual spaces. Indeed, the definition says that if we represent the vectors from W* in the coordinates as rows of scalars, then the mapping t/j* is given by the same matrix as t/j, if we multiply by it the row vectors from the right:
(ip(v),a) = (qi, ... ,a„) ■ A ■
This means that the matrix of the dual mapping ip* is the transpose AT, because a ■ A = (AT ■ aT)T.
Assume further that we have a vector space with scalar product. Then we can naturally identify V and V* using the scalar product. Indeed, choosing one fixed vector w e V, we substitute this vector into the second argument in the scalar product in order to obtain the identification V ~ V* = Hom(V; K)
V 3 w i-> (v i-> (v,w)) e V*.
The non-degeneracy condition on the scalar product ensures that this mapping is a bijection. Notice it is important to use w as the fixed second argument in the case K = C in order to obtain linear forms. Since factorizing complex multiples in the second argument yields complex conjugated scalars, the identification V ~ V* is linear over real scalars only.
It is clear that the vectors of an orthonormal basis are mapped to forms that constitute the dual basis, i.e. the orthonormal basis are selfdual under our identification. Moreover, every vector is automatically understood as a linear form, by means of the scalar product.
How does the above dual mapping W* —> V* look in terms of our identification? We use the same notation ip* '■ W —> V for the resulting mapping, which is uniquely given as follows:
175
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
Next,
and
Adjoint mapping
1
71
_9_
(5 9)
The approximate solution then satisfies Rx = QTb, and here that means 5x + 9y = 9. (The approximate solution is not unique). The QR decomposition of the matrix A is then
□
/2 -1 -1\ 3.F.7. Minimise &|| for A = -1 2 -1 and
V-l -1 2y/
b = 10 . Hence write down the QR decomposition of the
w
matrix A
Solution. The normalised first column of the matrix A is ei = J — 1 J. From the second column, subtract its com-
-1,
ponent in the direction e1. Then
By this we have created an orthogonal vector, which we nor-
malise to obtain e2 = -4? I 1 . The third column of the
matrix A is already linearly dependent (verify this by computing the determinant, or otherwise). The desired column-orthogonal matrix is then
, / 2 0 \
V6
-1 y/3 v-l -V3J
Next,
R =
and
Ve ^0 3^ -3^/
For every linear mapping %p : V —> between spaces with scalar products, there is the adjoint mapping tp* uniquely determined by the formula
(2) (yj(u),v) = (u,r(v))-
The parentheses means the scalar products on W or V, respectively.
Notice that the use of the same parenthesis for evaluation of one-forms and scalar products (which reflects the identification above) makes the denning formulae of dual and adjoint mappings look the same.
Equivalently we can understand the relation (2) to be the definition of the adjoint mapping tp*. By substituting all pairs of vectors from an orthonormal basis for the vectors u and v we obtain directly all the values of the matrix of the mapping tp*. Using the coordinate expression for the scalar product, the formula (2) reveals the coordinate expression of the adjoint mapping:
(i(j(v),w) = (»1,
(v,ip*(w)}.
It follows that if A is the matrix of the mapping t/j in an orthonormal basis, then the matrix of the adjoint mapping tp* is the transposed and conjugated matrix A - we denote this by
A* = AT.
The matrix A* is called the adjoint matrix of the matrix A. Note that the adjoint matrix is well denned for any rectangular matrix. We should not confuse them with algebraic adjoints, which we used for square matrices when working with determinants.
We can summarise. For any linear mapping t/j: V —> W between unitary spaces, with matrix A in some bases on V and W, its dual mapping has the matrix AT in the dual basis. If there are scalar products on V and W, we identify them (via the scalar products) with their duals. Then the dual mapping coincides with the adjoint mapping tp* : W —> V, which has the matrix A*. The distinction between the matrix of the dual mapping and the matrix of the adjoint mapping is thus in the additional conjugation. This is of course a consequence of the fact that our identification of the unitary space with its dual is not a linear mapping over complex scalars.
3.4.6. Self-adjoint mappings. Those linear mappings which coincide with their adjoints: tp* = t/j, are of particular interest. They are called 1 self-adjoint mappings. Equivalently we can say that they are the mappings whose matrix A satisfies A = A* in some (and thus in all) orthonormal basis.
176
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
The solution of the equation Rx = QTb is x = y = z. Thus, multiples of the vector (1,1,1) minimize \\Ax — b\\.
The mapping given by the matrix A is a projection on the plane with normal vector (1,1,1).
□
3.F.8. Linear regression. The knowledge obtained in this chapter can be successfully used in practice for solving problems with linear regression. It is about finding the best approximation of some functional dependence using a linear function.
Given a functional dependence for some points that is, f{a\,. .., a}n) = yi,..., f(ak, a\,. .., akn) = yk, k > n (we have thus more equations than unknowns) and we wish to find the "best possible" approximation of this dependency using a linear function. That is, we want to express the value of the property as a linear function f(xi,..., xn) = b\x\ + b2x2 + ■ ■ ■ + bnxn + c. We choose to define "best possible" by the minimisation of
k I n \ 2
i=l \ j=\ j
with regard to the real constants bi,... ,bn, c. The goal is to find such a linear combination of the columns of the matrix A = (aj) (with coefficients &i,..., bn\ that is closest to the vector (yi,..., yk) in Rfe. Thus it is about finding an orthogonal projection of the vector (yi,..., yk) on the sub-space generated by the columns of the matrix A. Using the theorem 3.5.7 this projection is the vector (pi,..., bn)T =
3.F.9. Using the least squares method, solve the system
2x + y + 2z = 1 x + y + 3z = 2 2x + y + z = 0 x + z = —1
Solution. The system has no solution, since its matrix has rank 3, and the extended matrix has rank 4. The best approximation of the vector b = (1,2, 0, —1) can thus be obtained using the theorem 3.5.7 by the vector A^^b. AA^^b is then the best approximation - the perpendicular projection
In the case of Euclidean spaces the self-adjoint mappings are those with symmetric matrices (in orthonormal basis). They are often called symmetric mappings.
In the complex domain the matrices that satisfy A = A* are called Hermitian matrices or also Hermitian symmetric matrices. Sometimes they are also called self-adjoint matrices. Note that Hermitian matrices form a real vector subspace in the space of all complex matrices, but it is not a vector sub-space in the complex domain.
Remark. The next observation is of special interest. If we multiply a Hermitian matrix A by the imaginary unit, we obtain the matrix B = i A, which has the property
B* =iAT = -B.
Such matrices are called anti-Hermitian or Hermitian skew-symmetric. Every real matrix can be written as a sum of its symmetric part and its anti-symmetric part,
A=\(A + AT) + \{A-AT). In the complex domain we have analogously
A=±(A + A*)+iji(A-A*).
In particular, we may express every complex matrix in a unique way as a sum
A = B + iC
with Hermitian symmetric matrices B = ^(A + A*) and C = ^[(A — A*). This is an analogy of the decomposition of a complex number into its real and purely imaginary component and in the literature we often encounter the notation
B = reA= ^(A + A*), C = imyl= j(A-A*).
In the language of linear mappings this means that every complex linear automorphism can be uniquely expressed by means of two self-adjoint mappings playing the role of the real and imaginary parts of the original mapping.
3.4.7. Spectral decomposition. Consider a self-adjoint mapping t/j : V —> V with the matrix A in some orthonormal basis. Proceed similarly as in 2.4.7 when we diagonalized the matrix of orthogonal mappings. Again, consider arbitrary invariant subspaces of self-adjoint mappings and their orthogonal complements. If a self-adjoint mapping t/j : V —> V leaves a subspace W C V invariant, i.e. ip(W) C W, then for every v e W±, w e W
(tp(v),w) = (v,ip(w)} = 0.
Thus also, ^(W/±) C W±.
Next, consider the matrix A of a self-adjoint mapping in an orthonormal basis and an eigenvector x e Cn, i.e. A ■ x = \x. We obtain
\{x, x) = {Ax, x) = {x, Ax) = {x, Xx) = \{x, x).
Ill
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
of the vector b on the space generated by the columns of the matrix A.
Because the columns of the matrix A are linearly independent, its pseudoinverse is given by the relation (ATA)~1AT. Hence
The desired x is
A^b = (-6/5,7/3,l/3)T.
The projection (the best possible approximation to the column of the right side) is then the vector
(3/5,32/15,4/15,-13/15). □
The positive real number (x,x) can be cancelled on both sides and thus A = A, and we see that eigenvalues of Hermitian matrices are always real.
The characteristic polynomial det(vl — XE) has as many complex roots as is the dimension of the square matrix A (including multiplicities), and all of them are actually real. Thus we have proved the important general result:
Proposition. The orthogonal complements of invariant sub -spaces of self-adjoint mappings are also invariant. Furthermore, the eigenvalues of a Hermitian matrix A are always real.
The very definition ensures that restriction of a self-adjoint mapping to an invariant subspace is again self-adjoint. Thus the latter proposition implies that there always exists an orthonormal basis of V composed of eigenvectors. Indeed, start with any eigenvector vi, normalize it, consider its linear hull Vi and restrict the mapping to . Consider next another eigenvector v1 e V2±, take V2 = span(VI U {v2}), which is again invariant. Continue and construct the sequence of invariant subspaces V\ C V2 C ... Vn = V, building the orthonormal basis of eigenvectors, as expected.
Actually, it is easy to see directly that eigenvectors associated with different eigenvalues are perpendicular to each other. Indeed, if ip(u) = Aw, i/j(v) = fiv then we obtain
A(u, v) = (4>(u),v) = (u, ip{v)) = [i{u, v) = n(u, v).
Usually this result is formulated using projections onto eigensubspaces. Recall the properties of projections along subspaces, as discussed in 2.3.19. A projection P : V —> V is a linear mapping satisfying P2 = P. This means that the restriction of P to its image is the identity and the projector is completely determined by choosing the subspaces Im P and KerP.
A projection P : V —> V is called orthogonal if Im P _L Ker P. Two orthogonal projections P, Q are called mutually perpendicular if Im P _L Im Q.
Spectral decomposition of self-adjoint mappings
Theorem (Spectral decomposition). For every self-adjoint mapping ip : V —> V on a vector space with scalar product there exists an orthonormal basis composed of eigenvectors. If Ai,..., Afc are all distinct eigenvalues of ip and if Pi,..., Pk are the corresponding orthogonal and mutually perpendicular projectors onto the eigenspaces corresponding to the eigenvalues, then
V> = XiPi + ■ ■ ■ + xkPk-
The dimensions of the images of these projections Pi equal the algebraic multiplicities of the eigenvalues Xi.
178
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
3.4.8. Orthogonal diagonalization. Linear mappings which allow for orthonormal bases as in the i, latter theorem on spectral decomposition are called orthogonally diagonalizable. Of course, they are exactly the mappings for which we can find an orthonormal basis in which the matrix of the mapping is diagonal. We ask what they look like.
In the Euclidean case, this is simple: diagonal matrices are first of all symmetric, thus they are the self-adjoint mappings. As a corollary we note that an orthogonal mapping of an Euclidean space into itself is orthogonally diagonalizable if and only if it is self-adjoint.They are exactly the self-adjoint mappings with eigenvalues ±1.
The situation is much more interesting on unitary spaces. Consider any linear mapping p : V —> V on a unitary space. Let p = -p + if] be the (unique) decomposition of p into its Hermitian and anti-Hermitian part. If p has diagonal matrix D in a suitable orthonormal basis, then D = Re D + i Im D, where the real and the imaginary parts are exactly the matrices of -p and 7]. This follows from the uniqueness of the decomposition. Knowing this in the particular coordinates, we conclude the following computation relations at the level of mappings ip o r\ = r\ o ip (i.e. the real and imaginary parts of p commute), and pop* = p* o p (since this clearly holds for all diagonal metrices). The mappings p : V —> V with the latter property are called the normal mappings.
A detailed characterization is given by the following theorem (stated in the notation of this paragraph):
Theorem. The following conditions on a mapping p : V —> V on a unitary space V are equivalent:
(1) p is orthogonally diagonalizable,
(2) p* o p = p o p* (p is a normal mapping),
(3) ip o rj = rj o ip (the Hermitian and anti-Hermitian parts commute),
(4) ifA= (ciij) is the matrix ofp in some orthonormal basis, and Xi are the m = dim V eigenvalues of A, then
m m i,j = l i=l
Proof. The implication (1) => (2) was discussed above. (2) (3): it suffices to calculate
ip o p* = (-p + ir])(%p — irf) = -p2 + rf + i(rj-p — -prf)
ip* o p = (-p — iT])(t/j + irf) = -p2 + rf + i(-prj — rj-p)
Subtraction of the two lines yields
pp* — p*p = 2i(rj-p — iprf).
(2) => (1): If p is normal, then
(p(u),p(u)} = (p*p(u),u) = (pp*(u),u)
= (p*(u),p*(u))
thus \p(u) \ = \p*(u)\.
Next, notice (p - A id V)* = (p* - A id V). Thus, if p is normal, then (p — A id V) is normal too.
179
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
If ip(u) = Aw, then u is in the kernel of p — A idy. Thus the latter equality of norms of values for normal mappings and their adjoints ensures that u is also in the kernel of ip* — A idy. It follows that p* (u) = Aw. We have proved, under the assumption (2), that p and p* have the same eigenvectors and that they are associated to conjugated eigenvalues.
Similarly to our procedure with self-adjoint mappings, we now prove orthogonal diagonalizability. The latter procedure is based on the fact that the orthogonal complements to sums of eigenspaces are invariant subspaces.
Consider an eigenvector u e V with eigenvalue A, and any v e (u)±. We have
(p(v),u) = (v, p*(u)} = (v, Am) = X(u, v) = 0.
Thus p(y) e (u)±. The same occurs if u is replaced by a sum of eigenvectors instead.
(1) => (4): the expression J2i j \aij\2 *s the trace of the matrix AA*, which is the matrix of the mapping pop*. Therefore its value does not depend on the choice of the or-thonormal basis. Thus if p is diagonalizable, this expression equals exactly J2i 12 ■
(4) => (1): This part of the proof is a direct corollary of the Schur theorem on unitary triangulation of an arbitrary linear mapping V —> V, which we prove later in 3.4.15. This theorem says that for every linear mapping p : V —> V there exists an orthonormal basis under which p has an upper triangular matrix. Then all the eigenvalues of p appear on its diagonal. Since we have already shown that the expression J2i j \aij\2 does not depend on the choice of the orthonormal bases, all elements in the upper triangular matrix, which are not on the diagonal must be zero. □
Remark. We can rephrase the main statement of the latter theorem in terms of matrices. A mapping is normal if and only if its matrix A satisfies A A* = A* A in some orthonormal basis (and equivalently in any orthonormal basis). Such matrices are called normal. Moreover, we can consider the last theorem as a generalization of standard calculations with complex numbers. The linear mappings appear similar to complex numbers in their algebraic form. The role of real numbers is played by self-adjoint mappings, and the unitary mappings play the role ofthe complex units cos t+i sin t e C. The following consequence of the theorem shows the link to the property cos21 + sin2 t = l.
Corollary. The unitary mappings on a unitary space V are exactly those normal mappings p on V for which the unique decomposition p = ip + if] into Hermitian and anti-Hermitian parts satisfies ip2 + rj2 = idy.
Proof. If p is unitary, then pp* = idy = p*p and thus pp* = (ip + if])(ip — if]) = ip2 + 0 + rj2 = idy. On the other hand, if p is normal, we can read the latter computation backwards which proves the other implication. □
180
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
3.4.9. Roots of matrices. Non-negative real numbers are exactly those which are squares of real numbers (and thus we may find their square roots). At the same time, their positive square roots are uniquely denned. Now we observe a similar behaviour of matrices of the form B = A* A. Of course, these are the matrices of the compositions of mappings p with their adjoints.
By definition,
(1) (Bx,x) = (A*Ax,x) = (Ax, Ax) > 0
for all vectors x. Furthermore, we clearly have
B* = (A* A)* = A* A = B.
Hermitian matrices B with the property (Bx,x) > 0 for all x are called positive semidefinite matrices. If the zero value is attained only for x = 0, they are called positive definite. Analogously, we speak of positive definite and positive semi-definite (self-adjoint) mappings ip : V —> V.
For every mapping p : V —> V we can define its square root as a mapping ip such that ip oip = p. The next theorem completely describes the situation when restricting to positive semidefinite mappings.
Positive semidefinite square roots
Theorem. For each positive semidefinite square matrix B, there is the uniquely defined positive semidefinite square root \/~B.
If P is any matrix such that P~1BP = D is diagonal, then \/B = P\/rDP~1, where D has got the (non-negative) eigenvalues of B on its diagonal and \/D is the matrix with the positive square roots of these values on its diagonal.
Proof. Since B is a matrix of a self-adjoint mapping p,
there is even an orthonormal P as in the theorem (cf. Theorem 3.4.7) with all eigenvalues in the diagonal of D non-negative. Consider C = \fB as defined in the second claim and notice that in-
deed
C2 = PVIĎP^PVIĎP-1 = PDP'1 = B.
Thus the mapping ip given by C must have the same eigenvectors as p and thus these two mappings share the decompositions of K" into mutually orthogonal eigenspaces. In particular, both of them will share the bases in which they have diagonal matrices and thus the definition of VT5 must be unique in each such basis. This proves that the definition of \fB does not depend on our particular choice of the diag-onalization of p. □
Notice there could be a lot of different roots, if we relax the positivity condition on \fB, see ??.
181
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
3.4.10. Spectra and nilpotent mappings. We return to the behavior of linear mappings in full generality. 'f We continue to work with real or complex vector
spaces, but without necessarily fixing a scalar product there.
Recall that the spectrum of a linear mapping f : V —> V is a sequence of roots of the characteristic polynomial of the mapping /, counting multiplicities. The algebraic multiplicity of an eigenvalue is its multiplicity as a root of the characteristic polynomial. The geometric multiplicity of an eigenvalue is the dimension of the corresponding subspace of eigenvectors.
A linear mapping / : V —> V is called nilpotent, if there exists an integer k > 1 such that the iterated mapping fk is identically zero. The smallest k with such a property is called the degree of nilpotency of the mapping /. The mapping / : V —> V is called cyclic, if there exists the basis ..., un) of the space V such that /(ui) = 0 and = ui_1 for all i = 2,..., n. In other words, the matrix of / in this basis is of the form
A-
(Q 1 0 0 0 1
If f(v) = a v, then jk(y) = ak ■ v for every natural k. Note that, the spectrum of nilpotent mapping can contain only the zero scalar (and this is always present).
By the definition, every cyclic mapping is nilpotent. Moreover, its degree of nilpotency equals the dimension of the space V. The derivative operator on polynomials, D(xk) = kxk~x, is an example of a cyclic mapping on the spaces K„ [x] of all polynomials of degree at most n over the scalars K.
Perhaps surprisingly, this is also true the other way round - every nilpotent mapping is a direct sum of cyclic mappings. A proof of this claim takes much work. So we formulate first the results we are aiming at, and only then come back to the technical work.
In the resulting theorem describing the Jordan decomposition, the crucial role is played by vector (sub)spaces and linear mappings with a single eigenvalue a given by the matrix
(1) j =
fx 1 0 ... 0\ 0 a 1 ... 0
\0 0 0 ... Xj
These matrices (and the corresponding invariant subspaces) are called Jordan blocks.4
Camille Jordan was a famous French Mathematician working in Analysis and Algebra at the end of the 19th and the beginning of the 20th centuries.
182
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
Jordan canonical form
Theorem. Let Vbe a real or complex vector space of dimension n. Let f : V —> V be a linear mapping with n eigenvalues (in the chosen domain of scalars), counting algebraic multiplicities. Then there exists a unique decomposition of the space V into the direct sum of subspaces
V = Vi ffi ■ ■ ■ ffi Vk
where not only f(Vi) C Vi, but the restriction of f to each Vi has a single eigenvalue A, and the restriction f — \ idys on Vi is either cyclic or is the zero mapping. In particular, there is a suitable basis in which f has a block-diagonal matrix J with Jordan blocks along the diagonal.
We say that the matrix J from the theorem is in Jordan canonical form. In the language of matrices, we can rephrase the theorem as follows:
Corollary. For each square matrix A over complex scalars, there is an invertible matrix P such that A = P~x J P and J is in canonical Jordan form.
The matrix P is the transition matrix to the basis from the theorem above. Notice that the total number of ones over the diagonal in J equals the difference between the total algebraic and geometric multiplicity of the eigenvalues. The ordering of the blocks in the matrix corresponds to the chosen ordering of the subspaces Vi in the direct sum. Thus, the uniqueness of the matrix J is true up to the ordering of the Jordan blocks. There is therefore freedom in the choice of the basis for such a Jordan canonical form.
3.4.11. Remarks. The existence of the Jordan canonical form is clear for the cases when all eigenvalues are either distinct or when the geometric and algebraic multiplicities of the eigenvalues are the same. In particular, this is the case for all unitary and self-adjoint mappings on unitary vector spaces, while the definition of normal mappings requires eaxactly this behavior. In particular, the Jordan canonical form of a mapping is diagonal if and only if the mapping is normal.
A consequence of the Jordan canonical form theorem is that for every linear mapping /, every eigenvalue of / uniquely determines an invariant subspace that corresponds to all Jordan blocks with this particular eigenvalue. We shall call this subspace the root subspace corresponding to the given eigenvalue.
We mention one useful corollary of the Jordan theorem (which is already used in the discussion about the behavior of Markov chains). Assume that the eigenvalues of our mapping / are all of absolute value less than one. Then repeated application of the linear mapping on every vector v e V leads to a decrease of all coordinates of fh(v) towards zero, without bounds.
Indeed, assume / has only one eigenvalue A on all the complex space V and that / — A idy is cyclic (that is, we consider only one Jordan block separately). Let v1,..., vi be
183
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
the corresponding basis. Then the theorem says that /(i^) = Xv2 + vi, f2{v2) = X2v2 + A«i + \vi = \2v2 + 2Xvi, and similarly for other v{'s and higher powers. In any case, the iteration of / results in higher and higher powers of A for all non-zero components. The smallest of them can differ from the largest one only by less than the dimension of V. The coefficients are bounded too.
This proves the claim. The same argument can be used to prove that for the mapping with all eigenvalues with absolute value strictly greater than one leads to unbounded growth of all coordinates for the iterations jk(y).
The remainder of this part of the third chapter is devoted to the proof of the Jordan theorem and a few necessary lemmas. It is much more difficult than anything so far. The reader can skip it, until the beginning of the fifth part of this chapter in case of any problems with reading it.
3.4.12. Root spaces. We have already seen by explicit examples that the eigensubspaces completely describe geometric properties for some linear mappings only. Thus we now introduce a more subtle tool, the root subspaces.
Definition. A non-zero vector u £ Vis called a root vector of the linear mapping p : V —> V, if there exists an a e K and an integer k > 0 such that (p — a idy)fe(w) = 0. This means that the fc-th iteration of the given mapping sends u to zero. The set of all root vectors corresponding to a fixed scalar A along with the zero vector is called the root subspace associated with the scalar A e K. We denote it by 7l\.
If u is a root vector and the integer k from the definition is chosen as the smallest possible one for u, then (p — a idy)fe_1(w) is an eigenvector with the eigenvalue a. Thus we have 7l\ = {0} for all scalars A which are not in the spectrum of the mapping p.
Proposition. Let p : V —> V be a linear mapping. Then
(1) TZ\ C V is a vector subspace for every A G K,
(2) for every A, \i G K, the subspace TZ\ is invariant with respect to the linear mapping (p — fi idy). Inparticular TZ\ is invariant with respect to p,
(3) if fi 7^ A, then (p — \i idy)|-R.A is invertible,
(4) the mapping (p — A idy)|-R.A is nilpotent.
Proof. (1) Checking the properties of the vector vector subspace is easy and is left to the reader.
(2) Assume that (p — A idy)fe(u) = 0 and put v = (p — [i idy)(u). Then
(p-\idv)k(v) =
= (p - A idv)k((p - A idy) + (A - fi) idy)(u)
= (p - A idy)fe+1(u) + (A - fi) ■ (p - A idy)fe(u) = 0
(3) If u e Ker(p — \i idy) |-^A, then
( V/U, V. □
Combining the latter theorem with the triangulation result from Corollary 3.4.14, we can formulate:
Corollary. Consider a linear mapping p : V —> V on a vector space V over scalars K, whose entire spectrum is in K. Then V = 7Z\1 ffi ■ ■ ■ ffi 7Z\n is the direct sum of the root subspaces. If we choose suitable bases for these subspaces, then under this basis p has block-diagonal form with upper triangular matrices in the blocks and eigenvalues Xi on the diagonal.
3.4.17. Nilpotent and cyclic mappings. Now almost everything is prepared for the discussion about canonical forms of matrices. It only remains to clear the relation between cyclic and nilpotent mappings and combine already proved results.
Theorem. Let p : V —> V be a nilpotent linear mapping. Then there exists a decomposition of V into a direct sum of subspaces V = V\ ffi ■ ■ ■ ffi Vk such that the restriction of p to each summand Vi is cyclic.
188
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
Proof. We provide a straightforward construction of a basis of the space V such that the action of the mapping p on the basis vectors directly shows the de-composition into the cyclic mappings, 'i Let k be the degree of nilpotency of the mapping p and write Pi = lm(pl), i = 0,..., k. Thus,
{0} = Pk c Pfc_i c ■ ■ ■ c Pi c P0 = V.
Choose a basis ek~1,..., ek~^ of the space Pk-i, where pk-i > 0 is the dimension of Pk-i. By definition, Pk-i C Ker p, i.e. p(ek~1) = 0 for all j.
Assume that Pk-i ^ V. Since Pk-i = p(Pk-2), there necessarily exist the vectors ek~2, j = 1,... ,pk-i in Pk-2, such that p(ek~2) = e^-1. Assume
aiei_1 + ' ■■ + aPk_1ekp;_\ + b1e\~2 + -- ■ + bPk_1ek;2i = 0.
Applying p on this linear combination yields frie^-1 + • • • + bPkl ep~\ = 0. This is a linear combination of independent vectors, therefore all bj = 0. But then also a, = 0. Thus the linear independence of all 2pk_1 chosen vectors is established. Next, extend them to a basis
(1) 1 '•••'e^-
k 2 A: 2 A: 2 A: 2
el ' ' ' ' ' ePfc-l' Pfc-l+1'' ' ' ' Pfc-2
of the space Pk-2- The images of the added basis vectors are in Pk-\. Necessarily they must be linear combinations of the basis elements ek_1,..., ek~\ ■ We can thus adjust the chosen vectors ek~2i+1,..., ek~22 by adding the appropriate
linear combinations of the vectors ek~2,..., ek~2i with the result that they are in the kernel of p. Thus we may assume our choice in the scheme (1) has this property.
Assume that we have already constructed a basis of the subspace Pk-i such that we can directly arrange it into the scheme similar to (1)
pk-l
el '•••'ePfc-i
pk—2 pk—2 pk—2 pk—2
el ' ' ' ' ' ePk-l' epfc_i + l' ' ' ' ' ePfc-2
A—3 k—3 k—3 k—3 pA:—3 k—3
cl ' ' ' ' ' cpfc-i' cpfc-i + l' ' ' ' ' cpfc-2' cpfc-2 + l' ' ' ' ' cPfc-3
k-e k-e k-e k-e k-e k-e
where the value of the mapping p on any basis vector is located above it. The value is zero if there is nothing above that basis vector.
If Pk-i ^ V, then again there must exist vectors
ek-e-\e^;1 which map to ek~e,ek~^. We can
extend them to a basis Pk-i-±, say, by the vectors
k-e-i k-e-i Pk-i+i>'''' Pk-i-i•
Again, exactly as when adjusting (1) above, we choose the additional basis vectors from the kernel of p. and analogically as before we verify that we indeed obtain a basis for
Pk-e-i-
189
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
After k steps we obtain a basis for the whole V, which has the properties given for the basis of the subspace Pk-i. Individual columns of the resulting scheme then generate the subspaces Vi. Additionally we have found the bases of these subspaces which show that corresponding restrictions of p are cyclic mappings. □
3.4.18. Proof of the Jordan theorem. Let Ai,..., A^ be all
*v\\ the distinct eigenvalues of the mapping p. From the assumptions of the Jordan theorem it follows thatK = ftAl (B---(BTZXk. W The mappings p{ = {p\nK -\ id-RAi) are nilpotent and thus each of the root spaces is a direct sum
Tlx, =Pi,Xi Pj„x,
of spaces on which the restriction of the mapping p — A, idy is cyclic. Matrices of these restricted mappings on PTtS are Jordan blocks corresponding to the zero eigenvalue, the restricted mapping p\pTi3 has thus for its matrix the Jordan block with the eigenvalue A,.
For the proof of Jordan theorem it remains to verify the claim about uniqueness (up to reordering the blocks). Because the diagonal values A, are given as roots of the characteristic polynomial, their uniqueness is immediate. The decomposition to root spaces is unique as well. Thus, without loss of generality we may assume that there is just one eigenvalue A and we are going to express the dimensions of individual Jordan blocks using the ranks rk of the mapping (p — A idy )k. This will show that the blocks are uniquely determined (up to their order). On the other hand, changing the order of the blocks corresponds to renumbering the vectors of basis, thus we can obtain them in any order.
If ip is a cyclic operator on an m-dimensional space, then the defect of the iterated mapping i\jk is k for 0 < k < m, while the defect is m for all fc > m. This implies that if our matrix J of the mapping p on the n-dimensional space V (remind we assume V = Tlx) contains dk Jordan blocks of the order k, then the defect De = n — of the matrix (J — \E)e is
Df = d1 + 2d2 H-----\-£de+ £de+1 H----.
Now, taking the combination 2Dk — Dk-i — Dk+i we cancel all those terms in the latter expression which coincide for £ = k — 1, k, k + 1 and we are left with
2-Dfc — P>k-\ — Dk+i = dk-Substituting for De's, we finally arrive at
dk = 2n-2rk-n + rk_1-n + rk+1 = rfc_i -2rk + rk+1.
This is the requested expression for the sizes of the Jordan blocks and the theorem is proved.
190
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
3.4.19. Remarks. The proof of the theorem about the exis-*Sw tence of the lordan canonical form was construe-/>i>~v.--p. tive> but it does not give an efficient algorithmic ''^jr"-- approach for the construction. Now we show .. how our results can be used for explicit computation of the basis in which the given mapping p : V —> V has its matrix in the canonical lordan form.5
(1) Find the roots of the characteristic polynomial.
(2) If there are less than n = dim V roots (counting multiplicities), then there is no canonical form.
(3) If there are n linearly independent eigenvectors, there is a basis of V composed of eigenvectors under which ip has diagonal matrix.
(4) Let A be the eigenvalue with geometric multiplicity strictly smaller than the algebraic multiplicity and vi,...,Vk be the corresponding eigenvectors. They should be the vectors on the upper boundary of the scheme from the proof of the theorem 3.4.17. We need to complete the basis by application of iterations p — A idy. By doing this we also find in which row the vectors should be located. Hence we find the linearly independent solutions w{ of the equations (p — X id)(wi) = Vi from the rows below it. Repeat the procedure iteratively (that is, for w{ and so on). In this way, we find the "chains" of basis vectors that give invariant subspaces, where p — X id is cyclic (the columns from the scheme in the proof).
The procedure is practical for matrices when the multiplicities of the eigenvalues are small, or at least when the degrees of nilpotency are small. For instance, for the matrix
A =
we obtain the two-dimensional subspace of eigenvectors
span{(l,0,0)T,(0,l,0)T},
but we still do not know, which of them are the "ends of the chains". We need to solve the equations (A — 2E)x = (a, b, 0)T for (yet unknown) constants a, b. This system is solvable if and only if a = b, and one of the possible solutions is x = (0,0,1)T, a = b = 1. The entire basis is then composed of (1,1,0)T, (0,0,1)T, (1,0,0)T. Note that we have free choices on the way and thus there are many such bases.
There is a beautiful purely algebraic approach to compute the Jordan canonical form efficiently, but it does not give any direct information about the right basis. This algebraic approach is based on polynomial matrices and Weierstrass divisors. We shall not go into details in this textbook.
191
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
5. Decompositions of the matrices and pseudoinversions
Previously we concentrated on the geometric description
matrix calculus in general.
Even when computing effectively with real numbers we use decompositions into products. The simplest one is the unique expression of every real number in the form
that is, as a product of the sign and the absolute value. Proceeding in the same way with complex numbers, we obtain their polar form. That is, we write z = (cosp + i sinip)\z\. Here the complex unit plays the role of the sign and the other factor is a non-negative real multiple.
In the following paragraphs we list briefly some useful decompositions for distinct types of matrices. Remind, we met suitable decompositions earlier, for instance for positive semidefinite matrices in paragraph 3.4.9 when finding the square roots. We shall start with similar simple examples.
3.5.1. LU-decomposition. In paragraphs 2.1.7 and 2.1.8 we
transformed matrices over scalars from any field into row echelon form. For this we used elementary row transformations, based on successive multiplication of our matrix by invertible lower
triangular matrices Pi. In this way we added multiples of the rows above the currently transformed one.
Sometimes we also interchanged the rows, which corresponded to multiplication by a permutation matrix. That is a square matrix in which all elements are zero except exactly one value 1 in each row and column. To imagine why, consider a matrix with just one non-zero element in the first column but not in the first row. When we used the backwards elimination to transform the matrix into the blockwise form
(remind Eh stays for the unit matrix of rank h) then we potentially needed to interchange columns as well. This was achieved by multiplying by a permutation matrix from the right hand side.
For simplicity, assume we have a square matrix A of size m and that Gaussian elimination does not force a row interchange. Thus all matrices Pi can be lower triangular with ones on diagonal. Finally we note that inverses of such Pi are again lower triangular with ones on the diagonal (either remember the algorithm 2.1.10 or the formula in 2.2.11). We obtain
of the structure of a linear mapping. Now we translate our results into the language of matrix decomposition. This is an important topic for numerical methods and
a = sgn(a) ■ a
U = P ■ A = Pk ■ ■ ■ P1 ■ A where U is an upper triangular matrix. Thus
A = L-U
192
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
where L is lower triangular matrix with ones on diagonal and U is upper triangular. This decomposition is called LU-decomposition of the matrix A. We can also absorb the diagonal values of U into a diagonal matrix D and obtain the LDU-decomposition where both U and L have just ones along the diagonal, A = L D U.
For a general matrix A, we need to add the potential permutations of rows during Gaussian elimination. Then we obtain the general result. (Think why we can always put the necessary permutation matrices to the most left and most right positions!)
LU-decomposition
Let A be any square matrix of size m over a field of scalars. Then we can find lower triangular matrix L with ones on its diagonal, upper triangular matrix U and permutation matrices P and Q, all of size m, such that
A = P-L-U-Q.
3.5.2. Remarks. As one direct corollary of the Gaussian tT r^< elimination we can observe that, up to a choice of suitable bases on the domain and codomain, CmJ*dr::_ every linear mapping / : V —> W is given by a matrix in block-diagonal form with unit matrix of the size equal to the dimension of the image of /, and with zero blocks all around. This can be reformulated as follows: every matrix A of the type m/n over a field of scalars K can be decomposed into the product
where P and Q are suitable invertible matrices.
Previously (in 3.4.10) we discussed properties of linear mappings / : V —> V over complex vector spaces. We showed that every square matrix A of dimension m can be decomposed into the product
A = P- J-P~\
where J is a block-diagonal with Jordan blocks associated with the eigenvalues of A on the diagonal. Indeed, this is just a reformulation of the Jordan theorem, because multiplying by the matrix P and by its inverse from the other side corresponds in this case just to the change of the basis on the vector space V (with transition matrix P). The quoted theorem says that every mapping has Jordan canonical form in a suitable basis.
Analogously, when discussing the self-adjoint mappings we proved that for real symmetric matrices or for complex Hermitian matrices there exists a decomposition into the product
A = P ■ D ■ P*,
where D is the diagonal matrix with all (always real) eigenvalues on the diagonal, counting multiplicities. Indeed, we
193
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
proved that there is an orthonormal basis consisting of eigenvectors. Thus the transition matrix P reflecting the appropriate change of the basis must be orthogonal. In particular,
p~1 — p*
For real orthogonal mappings we derived analogous expression as for the symmetric ones, i.e. A = P ■ B ■ P*. But in this case the matrix B is block-diagonal with blocks of size two or one, expressing rotations, mirror symmetry and identities with respect to the corresponding subspaces.
3.5.3. Singular decomposition theorem. We return to general linear mappings / : V —> W between vector spaces (generally distinct). We assume that scalar products are defined on both spaces and we restrict ourselves to orthonormal bases only. If we want a similar decomposition result as above, we must proceed in a more refined way than in the case of arbitrary bases. But the result is surprisingly similar and strong:
Singular decomposition
Theorem. Let Abe a matrix of the type m/n over real or complex scalars. Then there exist square unitary matrices U and V of dimensions m and n, and a real diagonal matrix D with non-negative elements of dimension r,r < min{m, n}, such that
A = USV*, S=(^ °qJ
and r is the rank of the matrix AA*.
The matrix S is determined uniquely up to the order of the diagonal elements in D. Moreover, are the square roots of the positive eigenvalues of the matrix A A*.
If A is a real matrix, then the matrices U and V are orthogonal.
Proof. Assume first that m < n. Denote by if : I" —> Km the mapping between i, real or complex spaces with standard scalar products, given by the matrix A in the standard bases.
We can reformulate the statement of the theorem as follows: there exists orthonormal bases on K" and Km in which the mapping f is given by the matrix S from the statement of the theorem.
As noted before, the matrix A* A is positive semidefinite. Therefore it has only real non-negative eigenvalues and there exists an orthonormal basis w of K" in which the corresponding mapping f* o f is given by a diagonal matrix with eigenvalues on the diagonal. In other words, there exists a unitary matrix V such that A* A = V B V* for a real diagonal matrix B with non-negative eigenvalues (di, d2,..., dr, 0,..., 0) on the diagonal, d{ ^ 0 for alii = 1,..., r. Thus
B = \ A A \ (AV)*(AV).
194
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
This is equivalent to the claim that the first r columns of the matrix AV are orthogonal, while the remaining columns vanish because they have zero norm.
Next, we denote the first r columns of AV as vi,...,vr £ Km. Thus, (vi,Vi) = di, i = l,...,r, and the normalized vectors u{ = ~^Vi form an orthonormal system of non-zero vectors. Extend them to an orthonormal basis u = ui,..., um for the entire Km. Expressing the original mapping p in the bases w of I" and u of Km, yields the matrix VB. The transformations from the standard bases to the newly chosen ones correspond to the multiplication from the left by a unitary (orthogonal) matrix U and from the right by V~x = V*. This is the claim of the theorem.
If m > n, we can apply the previous part of the proof to the matrix A* which implies the desired result.
All the previous steps in the proof are also valid in the real domain with real scalars. □
This proof of the theorem about singular decomposition is constructive and we can indeed use it for computing the unitary (orthogonal) matrices U and V and the non-zero diagonal elements of the matrix S.
The diagonal values of the matrix D from the previous theorem are called singular values of the matrix A.
3.5.4. Further comments. When dealing with real scalars, the singular values of a linear mapping ip : Rn —> Rm have a simple geometric meaning:
Let K G W1 be the unit ball in the standard scalar product. The image 11 that for every mapping ip : K" —> Km with the ma-"!y>3^ trix A in the standard bases we can choose a new or-As^F thonormal basis on Km for which p has upper trian-?w gular matrix.
Consider the images ABA = A
and we obtain
-(I SW? SWT I
Consequently
B
' D~x P 2 R
for suitable matrices P, Q and R. Next,
BA-
'D~x P\(P> 0\ _ f E 0^ 3 R) \0 OJ ~ \QD 0y
is Hermitian. Thus QD = 0 which implies Q = 0 (the matrix D is diagonal and invertible). Analogously, the assumption that AB is Hermitian implies that P is zero. Finally, we compute
B = BAB =
D-1 0\ (D 0\ (D-1 0 0 R J I 0 01 \ 0 R
On the right side in the right-lower corner there is zero, and thus also R = 0 and the claim is proved.
(4): Consider the mapping p :Kn ^ Km, x i-> Ax, and direct sums I" = (Keri^)±ffiKeri^,Km = Imi^ffi(Imi^)± of the orthogonal complements. The restricted mapping V? := V^Ke-rip)1- '■ (Keri^)± —> Imp is a linear isomorphism. If we choose suitable orthonormal bases on (Ker p)1-and Im p and extend them to orthonormal bases on whole spaces, the mapping p will have matrix S and p the matrix D from the theorem about the singular decomposition. In the next section, we shall discuss in detail that for any given b e Km, there is the unique vector which minimizes the distance ||& — 211 among all z e Imp (in analytic geometry we shall say that the point z realises the distance of b from the affine subspace Im p), see 4.1.16). The properties of the norm proved in theorem 3.4.3 directly imply that this is exactly the component z = b\ of the decomposition b = b\ +b2, &i £ Imp, &2 £ (Imp)±.
Now, in our choice of bases, the mapping p^ is given by the matrix 5^ from the singular decomposition theorem. In
199
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
particular, ip^(lmip) = (Kerip)±, D 1 is the matrix of the restriction m^ ^[(im^)-1- *s zer0, I^eed,
if o n.
For instance, an experiment gives many measured real values bj, j = 1,..., m. We want to find a linear combination of only a few fixed functions fi, i = 1,..., n which approximates the values bj as good as possible. The actual values of the fixed functions at the relevant points y.j e R define the matrix a{j = fi(yj). The columns of the matrix are given by values of the individual functions fi at the considered points. The goal is to determine the coefficients x{ e R
200
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
so that the sum of the squares of the deviations from the actual values
j=l i=l j=1 i=l
is minimized. By the previous theorem, the optimal coefficients are A^b.
As an example, consider just three functions fo(y) = 1, /i(y) = V< h{y) — y2- Assume that the "measured values" of their unknown combination g(y) = x0 + x\y + x2y2 in integral values for y between 1 and 5 are bT = (1,12,6,27,33). (This vector b arose by computing the values 1 + y + y2 at the given points adjusted by random integral values in the range ±10.) This leads in our case to the matrix A = (bj{)
/l 1 1 1 1\ AT= 1 2 3 4 5 . \1 4 9 16 25/
The requested optimal coefficients for the combination are
A^ -b
9 5 0
37 23
35 70
1 1
7 14
0.600\
0.614
1.214/
_4 5
6
7 _ 1
7
_3 5 37 70 _ J_ 14
A\
12 6
27 \33/
The resulting approximation can be seen in the picture, where the given values b are shown by the diamonds, while the dashed curve stays for the resulting approximation g(y) = xi + x2y + x3y2.
/
/
V
/
♦ /
The computation was produced in Maple and taking 15 values yi = I + i + i2, with a random vector of deviations from the same range added produced the following picture:
201
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
4.
202
CHAPTER 3. LINEAR MODELS AND MATRIX CALCULUS
G. Additional exercises for the whole chapter
3.G.I. Solve the following LP problem
minimize {7x — 5y + 3z} 0 B. It is sufficient to express the image f(A0) of the origin of the coordinate system on A in the coordinate system on B. In other words, the vector / (A0) — B0 with the basis v is expressed as a column of coordinates yo- Everything else is then given by multiplying by the matrix of the map ip in the chosen bases and by adding the outcome. Each affine map therefore has the following form in coordinates:
x y0 + Y ■ x,
where yo is as above, and Y is the matrix of the map p.
As in the case of linear maps, the transformation of affine coordinates corresponds to the expression of the identity map in the chosen affine frames. The change of coordinate expression of an affine map caused by a change of the basis is computed by multiplying and adding matrices and vectors.
Let
x = w + M ■ x',
describe a change of basis on the domain by a translation w and a matrix M. Let
y' = Z + N ■ y
describe a change of basis on the range space by a translation z and a matrix N. Then
y' = z + N ■ y = z + N ■ (y0 + Y ■ x)
= (z + N ■ y0 + N ■ Y ■ w) + (N ■ Y ■ M) ■ x'.
Hence the affine map in the new bases is given by the translation vector z + N-y0 + N- Y- w and matrix N ■ Y ■ M.
241
CHAPTER 4. ANALYTIC GEOMETRY
with the only solution a = 3, b = —4. Hence
PA-B = 3 (4, -2, -3, -2) - 4 (2, -1, -2, -2) = (4,-2,-1,2),
where
\\PA-b\\ = \/42 + (-2)2 + (-l)2 + 22 = 5.
Hence the distance between A and U equals 11 Pa-b \ = 5. □
4.B.5. In the vector space R4, compute the distance v between the point [0,0, 6,0] and the vector subspace
U : [0, 0, 0, 0]+ti (1, 0,1, l)+t2 (2,1,1, 0)+t3 (1, -1, 2, 3),
t\,t2, £368
Solution. We solve the problem by the least squares method. Write the generating vectors of U as the columns of the matrix
(1 2 1\
0 1-1 A~ 1 1 2 V 0 3/
Substitute the point [0,0, 6,0] by the corresponding vector b = (0,0,6,0)T. Now solve A ■ x = b. This is the linear equation system
xi + 2x2 +
x2 -
xi + x2 +
X\ +
X-3
2x3 3x3
0, 0, 6, 0,
by the least squares method. (Note that the system does not have a solution - the distance would be 0 otherwise.) Multiply A ■ x = b by the matrix AT from the left-hand side. Then the augmented matrix AT ■ A - x = AT ■ b is
By elementary row operations, transform the matrix to the
normal form
/ 3 3 6 3 6 3 I 6 3 15
Continue with backward elimination
/1 1 2 2 \ /1 0 3
0 1 -1 0 . - 0 1 -1
Vo 0 0 Vo 0 0
4.1.15. Euclidean point spaces. So far, we do not need the notions of distance and length for geometric considerations. But the length of vectors and the angle between vectors, as denned in the second chapter (see 2.3.18 and elsewhere), play a significant role in many practical problems.
Euclidean spaces
The standard Euclidean point space £n is the affine space An whose difference space is the standard Euclidean space R™ with the scalar product
(x,y) = yT ■ x.
The Cartesian coordinate system is the affine coordinate system (A0;u) with the orthonormal basis u.
The Euclidean distance between two points A, B V on
a Euclidean space V, det p equals the (oriented) volume of the image of the parallelepiped determined by vectors of an orthonormal basis. More generally, the image of the parallelepiped V, determined by arbitrary dim V vectors, has a volume equal to det p-multiple of the former volume.
4.1.24. Outer product and cross product of vectors. The
previous considerations are closely related to the tensor product of vectors. We do not go further in this technically more complicated topic. But we do mention the outer product n = dim V of vectors ui,..., un e V.
249
CHAPTER 4. ANALYTIC GEOMETRY
closest). Find such points in the case of planes g±, g2. Denote
ui = (1,0,-1,0,0), u2 = (0,1,0,0,-1), «i = (1,1,1,0,1), v2 = (0,-2, 0,0,3).
Points X1 e Qi, X2 e g2, which are the "closest" (as commented above), are
Xx = [7,2,7,-1,1] + slU2,
X2 = [2,4,7,-4,2] +t2Vl + s2v2,
so
Xx-X2 = [7,2,7,-l,l]-[2,4,7,-4,2]
+ txUx + SiU2 - t2v1 - S2V2
= (5,-2,0, 3,-1)
+ txUx + SiU2 - t2v1 - S2V2.
The dot products
(X1-X2,u1) = 0, (X1-X2,u2) = 0, (X1-X2,v1) = 0, (X1-X2,v2) = 0 then lead to the linear equation system
2ti = -5,
2si + 5s2 = 1,
-4f2 - s2 = -2,
—5si — t2 — 13s2 = —1
with the unique solution t\ = —5/2, si = 41/2, t2 = 5/2,
s2 = —8. We obtained
Let (uij,.
)T be coordinate expressions of vectors
Xx
X2
5 41 [7,2,7,-l,l]--wi + yw2
^9 45 19 39" 2'~2~'~2~'~ '~T
[2,4,7, -4, 2] + -Vl-8v2 '9 45 19 39"
2'T'T'~ '~T
The distance between the points X\, X2 equals the distance between the planes g\, g2) both of which are given by
\\X1 -X2 || = || (0,0,0,3,0) || = 3. □
4.B.20. Find the intersection of the plane passing through the point A = [1, 2,3,4] £ R4 and orthogonal to the plane
g: [1,0,1,0] + (1,2,-1, -2)s+ (1,0,0, l)t, s,teR.
Solution. Find the plane orthogonal to g. Its direction is orthogonal to the direction of g, for vectors (a, b, c, d) within its direction we get linear equation system
(a, b, c, d) ■ (1,2,-1,-2) = 0 = a + 2&-c-2d = 0 (a, b, c, d) ■ (1,0,0,1) = 0 = a + d = 0.
Uj in a chosen orthonormal basis V. Let M be a matrix with elements (tiy). Then the determinant |M| does not depend on the choice of the basis. Its value is called the outer product of the vectors «i,...,un, and is denoted by [u1,..., un]. Hence the outer product is the oriented product of the corresponding parallelepiped, see 4.1.22.
Several useful properties of the outer product follow directly from the definition
(1) The map (ui,..., un) i-> [u±,..., un] is an antisymmetric n-linear map. It is linear in all arguments, and the interchange of any two arguments causes a change of sign.
(2) The outer product is zero if and only if the vectors ui,..., un are linearly dependent.
(3) The vectors u1,... ,un form a positive basis if and only if the outer product is positive.
Consider a Euclidean vector space V of dimension n > 2 and vectors ui,..., un-i £ V. If these n — 1 vectors are substituted into the first n — 1 arguments of the n-linear map denned by the volume determinant as above, then there is one argument left over. This defines a linear form on V. Since the scalar product is available, each linear form corresponds to exactly one vector. This vector v £ V is called the cross product of the vectors u\,..., un-i. For each vector w £ V
(v,w) = [uu . . .,u„_i,wj-
We denote the cross product by v = ui x ... x .
If the coordinates of the vectors in an orthonormal basis are v = (y1,..., yn)T, w = (x1:..., xn)T and u,- =
)T, then the definition can be expressed as
yizi H-----h ynxn
"11
Ml(n-l) Xl
Hence the vector v is determined uniquely. Its coordinates are calculated by the formal expansion of this determinant along the last column. The following properties of the cross product are direct consequences of the definition:
Theorem. For the cross product v = ui x ... x
(1) v £ (ui,... ,U„_l}±
(2) v is nonzero if and only if the vectors ui,..., un-i are linearly independent,
(3) the length \\v\\ of the cross product equals the absolute value of the volume of parallelepiped V(0;u!,... ,un_i),
(4) (ui,..., un-i, v) is a compatible basis of the oriented Euclidean space V.
Proof. The first claim follows directly from the defining 1 formula for v. Substituting an arbitrary ^ISLJLY/ vector Uj for w gives the scalar product v ■ uj on the left and the determinant with two equal columns on the right.
250
CHAPTER 4. ANALYTIC GEOMETRY
The solution is the two-dimensional vector space ((0,1,2,0), (-1,0,-3,1)}. The plane r orthogonal to q passing through a has parametric equation
r : [1, 2, 3,4] + (0,1, 2, 0)u + (-1, 0, -3, l)v, u,v eR.
We can obtain the intersection of the planes from both parametric equations. It is given by the linear equation system
l+s+t = 1-v
2s = 2 + u
1 — s = 3 + 2w — 3i>
-2s + t = A + v,
which has the unique solution (it must be so as matrix columns are linearly independent) s = —8/19, t = 34/19, u = —54/19, v = —26/19. Substitute the parameter values s and t into the parametric form of the plane g, to obtain the intersection [45/19,-16/19,11/19,18/19]. (Needless to say, the same solution is obtained by substituting the values into r). □
4.B.21. Find a line passing through point [1,2] e R2 so that the angle between this line and the line
p: [0,1] + 1)
is 30°.
Solution. The angle between two lines is the angle between their direction vectors. It is sufficient to find the direction vector v of the line. One way to do so is to rotate the direction vector of p by 30°. The rotation matrix for the angle 30° is
Ax>s30° -sin30°^ _ (^ -\
~ 1 \/3 2 2
^sin 30° cos 30' The desired vector v is therefore
v^s 1
v^s l
We could perform the backward rotation as well. The line (one of two possible) has parametric equation
1 ' J ^ 2 2' 2 2y
□
4.B.22. An octahedron has eight faces consisting of equilateral triangles. Determine cos a, where a is the angle between two adjacent faces of a regular octahedron. Solution. An octahedron is symmetric, therefore it does not matter which two faces are selected. By suitable
The rank of the matrix with n — 1 columns Uj is given by the maximal size of a non-zero minor. The minors which define coordinates of the cross product are of degree n — 1 and thus claim (2) is proved.
If the vectors «i,..., are linearly dependent, then (3) also holds. Suppose the vectors are linearly independent. Let v be their cross product, and choose an orthonormal basis (ei,..., e„_i) of the space («i,..., u„_i}. It follows from what is proved that there exists a multiple (l/a)v, 0 ^ a e R, such that (ei,..., ek, (l/a)v) is an orthonormal basis of V. The coordinates of the vectors in this basis are
uj = (uij,---,u{n-i)j,0)T, v = (0, ...,0,q)t.
So the outer product [ui,..., , v] equals (see the definition of cross product)
0
"1,
■,Un-!,v] =
"11
^(n-l)l
0
Ml(n-l) M(n-l)(n-l) 0
0 a
= (v, v) = a2. By expanding the determinant along the last column,
a2 = a VolT-^O;un_i).
Both the remaining two claims follow from the proposition below. □
In technical applications in R3, the cross product is often used. It assigns a vector to any pair of vectors.
4.1.25. Aftine and Euclidean properties. Now we can consider which properties are related to the affine structure of the space and which properties we really need in the difference space.
All Euclidean transformations, (bijective affine maps) which preserve the distance between points, preserve also all objects we have studied. Moreover they pre-I serve unoriented angles, unoriented volumes, angle between subspaces etc. If we want them to preserve also oriented angles, cross products, volumes, then we must also assume that the transformations preserve the orientation.
We ask: Which concepts are preserved under affine transformations?
Recall first that an affine transformation on an n-dimensional space A is uniquely denned by mapping n + l points in general position, that is, by mapping a one n-dimensional simplex. In the plane, this means choosing the image of any nondegenerate triangle. Preserved properties are properties related to subspaces and ratios. In particular, incidence properties of the type "a line passing through a point" or "a plane contains a line" etc. are preserved. Moreover, the collinearity of vectors is preserved. For every two collinear vectors, the ratio of their lengths is preserved independently of the scalar product denning the length. Similarly, the ratio of the volumes of two n-dimensional parallelepipeds is preserved under the
251
CHAPTER 4. ANALYTIC GEOMETRY
[0,0,^]
■^,0,0],
scaling, the octahedron has edge length 1 and is placed in the standard Cartesian coordinate system R3 so that its centroid is at [0,0,0]. Its vertices then are located at the points A = [^,0,0], B = [0,^2,0], C D = [0,-^,0], E = [0,0,-^2] and,F :
We compute the angle between the faces CDF and BCF. We need to find vectors orthogonal to their intersection and lying within respective faces, which means orthogonal to CF. They are altitudes from D and F to the edge CF in the triangles CDF and BCF respectively. The altitudes in an equilateral triangle are the same segments as ythe medians, so they are SD and SB, where S is midpoint of CF. Because the coordinates of points C and F are known, S has coordinates [— -^2,0, -^2] and the vectors areS£> = (^2,-^2,-^ gether
4 jand^ = (^,f,-^).To-
(s/2 V 4 ' %/2 2 ' %/2\ 4 0 (s/2 V 4 ' s/2 2 ' -f) 1
ll(f , \/2 2 ' -%\ \\\{% \/2 2 ' -^)ll 3
Therefore a = 132°.
□
4.B.23. In Euclidean space R5 determine the angle ip between subspaces U, V, where
(a) U : [3, 5,1, 7, 2] + t (1, 0, 2, -2,1), i G R,
V : [0,1, 0, 0, 0] + s (2, 0, -2,1, -1), s£ R;
(b) [/ : [4,1,1, 0,1] + t (2, 0, 0, 2,1), te R,
V : xi + x2 + 23 + x5 = 7;
(c) U :2xi-x2 + 2a;3 + x5 = 3,
V : xi + 2x2 + 223 + x5 = -1;
(d) [/ : [0,1,1, 0, 0] + t (0, 0, 0,1, -I), te R,
K : [1, 0,1,1,1] + r (1,-1, 2,1, 0) + s (0,1, 3, 2, 0) + p (1,0,0,1,0)+? (1,3,1,0,0),
r, s,p,qe R;
(e) [/ : [0, 2, 5, 0, 0] + t (2,1, 3, 5, 3) + s (0, 3,1,4, -2)
+ r (1,2,4, 0,3), t,s,r e R, V: [0,0, 0,0,0] + p (-1,1,1,-5,0)
+ q (1,5,1,13, -4), p,qeR;
(f) t/: [l,l,l,l,l]+t(l,0,1,1,1)
+ s (1,0, 0,1,1), (,s£l, [1,1,1, l,l]+p(l, 1,1,1,1)(1,1, 0,1,1) + r(l,l,0,l,0),p, q,r G R.
Solution. Recall that the angle between affine subspaces is the same as the angle between vector spaces associated to
transformations, since the determinant of the corresponding matrix changes by the same multiple.
These affine properties can be used in the plane to prove geometric statements. For instance, to prove the fact that the medians of a triangle intersect in a single point, and in one third of their lengths, it is sufficient to verify this only in the case of an isosceles right-angled triangle or only in the case of an equilateral triangle. Then this property holds for all triangles. Think about this argument!
2. Geometry of quadratic forms
After straight lines, the simplest objects in the analytic geometry of the plane are the conic sections. These are given by quadratic equations in Cartesian coordinates. A conic is distinguished as a circle, ellipse, parabola or hyperbola by examining the coefficients. There are two degenerate cases, namely a pair of lines or a point. We cannot distinguish a circle from an ellipse in affine geometry, therefore we begin with Euclidean geometry.
4.2.1. Quadrics in En. In analogy with the equations of conic sections in the plane, we start with objects in Euclidean point spaces which are defined, in a given orthonormal basis, by quadratic equations. They are known as quadrics.
Choose a fixed Cartesian coordinate system in En, that is, a point and an orthonormal basis of the difference space. Consider a general quadratic equation for the coordinates (x1, ..., xn)^T of a point A ∈ En:

(1)  ∑_{i,j=1}^n aij·xi·xj + ∑_{i=1}^n 2ai·xi + a = 0,
where it may be assumed by symmetry that aij = aji without loss of generality. This equation can be written as
f(u) + g(u) + a = 0
for a quadratic form f (i.e. the restriction of a symmetric bilinear form F to pairs of equal arguments), a linear form g, and a scalar a ∈ R. We assume that at least one coefficient aij is nonzero. Otherwise the equation is linear and describes a Euclidean subspace.
Notice that the equation (1) keeps the same shape under every Euclidean (or affine) coordinate transformation, i.e., it splits again into quadratic, linear and constant parts.
4.2.2. Quadratic forms. Begin the discussion of equation (1) with its quadratic part, i.e. the symmetric bilinear form F : Rⁿ × Rⁿ → R. Similarly, think of a general symmetric bilinear form on an arbitrary vector space.
For an arbitrary basis of this vector space, the value f(x) on a vector x = x1e1 + ··· + xnen is given by the equation

(1)  f(x) = F(x, x) = ∑_{i,j} xi·xj·F(ei, ej) = x^T·A·x,
them. Therefore the translation caused by the point addition can be omitted.
Case (a). Since U and V are one-dimensional subspaces, the angle φ ∈ [0, π/2] is given by the formula

cos φ = |(1,0,2,-2,1)·(2,0,-2,1,-1)| / (‖(1,0,2,-2,1)‖·‖(2,0,-2,1,-1)‖) = 5/(√10·√10).

Therefore cos φ = 1/2 and φ = π/3.
Case (b). The subspace U has the direction vector (2,0,0,2,1) and the subspace V has the normal vector (1,1,1,0,1). The angle ψ = π/3 between these two vectors is derived from the formula

cos ψ = (2,0,0,2,1)·(1,1,1,0,1) / (‖(2,0,0,2,1)‖·‖(1,1,1,0,1)‖) = 3/(3·2) = 1/2.

Notice that φ = π/2 - ψ = π/6, because φ is the complement of ψ.
Case (c). The hyperplanes U and V are defined by the normal vectors u = (2,-1,2,0,1) and v = (1,2,2,0,1). The angle φ equals the angle between these vectors, so (see (a))

cos φ = (2,-1,2,0,1)·(1,2,2,0,1) / (‖(2,-1,2,0,1)‖·‖(1,2,2,0,1)‖) = 1/2, i.e. φ = π/3.
Case (d). Denote
u = (0,0,0,1,-1), v1 = (1,-1,2,1,0), v2 = (0,1,3,2,0), v3 = (1,0,0,1,0), v4 = (1,3,1,0,0),
and denote the orthogonal projection of u onto the vector subspace of V (the subspace generated by v1, v2, v3, v4) by pu. Now pu = a·v1 + b·v2 + c·v3 + d·v4 for some a, b, c, d ∈ R, and
⟨pu - u, v1⟩ = 0, ⟨pu - u, v2⟩ = 0, ⟨pu - u, v3⟩ = 0, ⟨pu - u, v4⟩ = 0.
Substituting for pu gives the linear equation system
7a + 7b + 2c = 1,
7a + 14b + 2c + 6d = 2,
2a + 2b + 2c + d = 1,
6b + c + 11d = 0.
The solution is (a, b, c, d) = (-8/19, 7/19, 13/19, -5/19).
Hence

pu = -(8/19)v1 + (7/19)v2 + (13/19)v3 - (5/19)v4 = (0, 0, 0, 1, 0),

and so

(1)  cos φ = ‖pu‖ / ‖u‖ = ‖(0,0,0,1,0)‖ / ‖(0,0,0,1,-1)‖ = 1/√2 = √2/2.

Hence φ = π/4.
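As a numerical cross-check of case (d), the following Python sketch (an illustration only, not part of the original text; it assumes numpy) solves the normal equations for the projection and recomputes the angle:

import numpy as np

u = np.array([0., 0., 0., 1., -1.])
V = np.array([[1., -1., 2., 1., 0.],
              [0., 1., 3., 2., 0.],
              [1., 0., 0., 1., 0.],
              [1., 3., 1., 0., 0.]])   # rows span the direction space of V

# Solve the normal equations G c = b with G_ij = <v_i, v_j>, b_i = <u, v_i>.
G = V @ V.T
b = V @ u
c = np.linalg.solve(G, b)
pu = c @ V                             # orthogonal projection of u

print(c * 19)    # [-8.  7. 13. -5.], i.e. -8/19, 7/19, 13/19, -5/19
print(pu)        # [0. 0. 0. 1. 0.]
cos_phi = np.linalg.norm(pu) / np.linalg.norm(u)
print(np.arccos(cos_phi), np.pi / 4)   # both approximately 0.7854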
Case (e). Determine the intersection of the vector subspaces associated with the given affine subspaces. The
where A = (aij) is a symmetric matrix with the elements aij = F(ei, ej). We call such maps f quadratic forms, and the formula above for the value of the form in terms of the chosen coordinates is called the analytic formula for the form.
In general, by a quadratic form is meant the restriction f(x) of a symmetric bilinear form F(x, y) to arguments of the type (x, x). Evidently, the whole bilinear form F can be reconstructed from the values f(x), since
f(x + y) = F(x + y, x + y) = f(x) + f(y) + 2F(x, y).
If we change the basis to a different basis e′1, ..., e′n, we get different coordinates x = S·x′ for the same vector (here S is the corresponding transformation matrix), and so
f(x) = (S·x′)^T·A·(S·x′) = (x′)^T·(S^T·A·S)·x′.
Assume now that the vector space is equipped with a scalar product. Then the previous computation can be formulated as follows. The matrix of the bilinear form F, which is the same as the matrix of f, transforms under a change of coordinates in such a way that for orthogonal changes it coincides with the transformation of the matrix of a linear map (i.e., when S⁻¹ = S^T). This result can be interpreted as the following observation:
Proposition. Let V be a real vector space with a scalar product. Then the formula

φ ↦ F,  F(u, u) = ⟨φ(u), u⟩

defines a bijection between symmetric linear maps and quadratic forms on V.
Proof. Each bilinear form F with a fixed second argument u becomes a linear form F( · , u). In the presence of a scalar product, it is given by the formula F(v, u) = ⟨v, w⟩ for a suitable vector w. Put φ(u) = w. Directly from the coordinate expression (1) displayed above, φ is the linear map with the symmetric matrix A. Hence it is selfadjoint.
On the other hand, each symmetric map φ defines a symmetric bilinear form F by the formula F(u, v) = ⟨φ(u), v⟩ = ⟨u, φ(v)⟩. □

4.C.1. Find the polar basis of the quadratic form f : R³ → R,
f(x1, x2, x3) = 3x1² + 2x1x2 + x2² + 4x2x3 + 6x3².
Solution. Its matrix is

A = ( 3 1 0 )
    ( 1 1 2 )
    ( 0 2 6 ).
According to step (1) of the Lagrange algorithm (see Theorem 4.2.5), perform the following operations:

f(x1, x2, x3) = (1/3)(3x1 + x2)² + (2/3)x2² + 4x2x3 + 6x3²
             = (1/3)y1² + (3/2)((2/3)y2 + 2y3)²
             = (1/3)z1² + (3/2)z2².

The form has rank 2. The matrix changing the basis to the polar one is obtained by composing the transformations z1 = y1 = 3x1 + x2, z2 = (2/3)y2 + 2y3 = (2/3)x2 + 2x3 and z3 = y3 = x3, so the matrix of this change of coordinates is

T = ( 3  1  0 )
    ( 0 2/3 2 )
    ( 0  0  1 ).
The origin of the Cartesian coordinates is the centre of the conic in question. The new orthonormal basis of the difference space gives the directions of the semiaxes. The final coefficients a, b then give the lengths of the semiaxes in the nondegenerate directions.
4.2.5. Affine point of view. In the previous two paragraphs, we searched for essential properties and standardized analytic descriptions of objects defined in Euclidean spaces by quadratic equations. We sought the simplest equations which can be obtained by a suitable choice of coordinates. A geometric formulation of the result is that for two quadrics, given in different Cartesian coordinates, there exists a Euclidean transformation of En (that is, a bijective affine map preserving lengths) mapping one to the other if and only if the above algorithm leads to the same analytic formulas, up to the order of the coordinates. Moreover, the Cartesian coordinates in which the objects are given by the resulting canonical formulas can be obtained directly. Hence the explicit expression of the corresponding coordinate transformation is also obtained. It is always a composition of a translation, a rotation and a reflection with respect to a hyperplane.
Of course, we may ask to what extent we can do the same in affine spaces, where we can choose any coordinate system. For example, in the plane we cannot distinguish the circle from the ellipse, but we can distinguish the ellipse from the hyperbola, and between all the other types of conics. In particular, all hyperbolas merge into one, etc. We postpone the discussion of this issue to the third part of this chapter, except for the case of quadratic forms.
Consider a quadratic form f on a vector space V. Then for a vector u = x1u1 + ··· + xnun, the form can be written as f(u) = x^T·A·x with respect to the chosen basis.
It has already been shown that A is diagonal for a suitable choice of basis, in other words that F(ui, uj) = 0 for i ≠ j for the corresponding symmetric form F. Each such basis is called a polar basis of the quadratic form f. A scalar product can always be chosen for this purpose. Nevertheless, without the use of a scalar product, there is a much simpler algorithm for finding a polar basis among all the bases. At the same time, it yields relevant information about the affine properties of the quadratic form. The algorithmic procedure in the proof of the next theorem is known as the Lagrange algorithm.
Theorem. Let V be a real vector space of dimension n and f : V → R a quadratic form. Then there exists a polar basis for f on V.
We computed the polar coordinates, expressed them in the standard basis, and wrote them as the rows of the matrix T (the columns of this matrix express the vectors of the standard basis in the polar basis). The coordinates of the polar basis vectors are the columns of the inverse matrix

T⁻¹ = ( 1/3 -1/2  1 )
      (  0   3/2 -3 )
      (  0    0   1 ).

The polar basis is therefore ((1/3, 0, 0), (-1/2, 3/2, 0), (1, -3, 1)).
□
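One may verify that the computed basis is indeed polar: in the matrix language of 4.2.2, the columns of T⁻¹ must bring A to a diagonal shape by congruence. A minimal Python sketch (an illustration, not part of the original text; it assumes sympy):

import sympy as sp

A = sp.Matrix([[3, 1, 0], [1, 1, 2], [0, 2, 6]])   # matrix of f from 4.C.1
T = sp.Matrix([[3, 1, 0], [0, sp.Rational(2, 3), 2], [0, 0, 1]])
P = T.inv()          # columns of P are the polar basis vectors

print(P)             # columns (1/3,0,0), (-1/2,3/2,0), (1,-3,1)
print(P.T * A * P)   # diag(1/3, 3/2, 0): rank 2, as computed above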
4.C.2. Determine a polar basis of the form f : R³ → R,

f(x1, x2, x3) = 2x1x3 + x2².
Solution. The matrix of the form is

A = ( 0 0 1 )
    ( 0 1 0 )
    ( 1 0 0 ).

Change the order of the variables: y1 = x2, y2 = x1, y3 = x3. Step (1) of the Lagrange algorithm is then trivial (there are no mixed terms containing y1). However, for the next step, case (4) sets in: introduce the transformation z1 = y1, z2 = y2, z3 = y3 - y2. Then

f(x1, x2, x3) = z1² + 2z2(z3 + z2) = z1² + (1/2)(2z2 + z3)² - (1/2)z3².

Together, the final coordinates are w1 = z1 = x2, w2 = 2z2 + z3 = x1 + x3, w3 = z3 = x3 - x1, and the form becomes w1² + (1/2)w2² - (1/2)w3². The matrix T of this change of coordinates and its inverse are

T = (  0 1 0 )        T⁻¹ = ( 0 1/2 -1/2 )
    (  1 0 1 )              ( 1  0    0  )
    ( -1 0 1 ),             ( 0 1/2  1/2 ).

The polar basis is therefore ((0, 1, 0), (1/2, 0, 1/2), (-1/2, 0, 1/2)). □
4.C.3. Find the polar basis of the quadratic form f : R³ → R, which in the standard basis is defined as
Proof. (1) Let A be the matrix of f in a basis u = (u1, ..., un) on V, and assume a11 ≠ 0. Then we may write

f(x1, ..., xn) = a11x1² + 2a12x1x2 + ··· + a22x2² + ...
             = a11⁻¹(a11x1 + a12x2 + ··· + a1nxn)² + terms not containing x1.

Hence we can transform the coordinates (i.e. change the basis) so that in the new coordinates

x′1 = a11x1 + a12x2 + ··· + a1nxn, x′2 = x2, ..., x′n = xn.

This corresponds to the new basis

v1 = a11⁻¹u1, v2 = u2 - a11⁻¹a12u1, ..., vn = un - a11⁻¹a1nu1.
f(x1, x2, x3) = 2x1x2 + x2x3.
(As an exercise, compute the transformation matrix.) In the new basis, the corresponding symmetric bilinear form satisfies g(v1, vi) = 0 for all i > 1 (compute it!). Thus f has the form a11⁻¹x′1² + h in the new coordinates, where h is a quadratic form independent of the variable x′1.
It is often easier to choose v1 = u1 for the new basis. Then f = f1 + h, where f1 depends only on x′1, while x′1 does not appear in h, and g(v1, v1) = a11.
(2) Assume that after step (1), the matrix of h has rank less than n, with a nonzero coefficient at x′2². Then the same procedure can be repeated to obtain the expression f = f1 + f2 + h, where h contains only the variables with index greater than two. Proceed in this way until a diagonal form is obtained after n - 1 steps, or until in (say) the i-th step the element aii is zero.
(3) If the latter possibility occurs, and there exists some other element ajj ≠ 0 with j > i, then it suffices to exchange the i-th and the j-th vectors of the basis. Then continue according to the previous procedure.
(4) Assume the situation ajj = 0 for all j > i. If there is no element ajk ≠ 0 with j > i, k > i, then we are finished, since the matrix is already diagonal. If ajk ≠ 0, then we use the transformation vj = uj + uk and keep the other vectors of the basis unchanged (i.e. x′k = xk - xj, while the other coordinates remain unchanged). Then h(vj, vj) = h(uj, uj) + h(uk, uk) + 2h(uj, uk) = 2ajk ≠ 0, and we can continue as in case (1). □
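The proof is constructive, and steps (1)-(4) translate directly into matrix congruences A ↦ E^T·A·E. The following Python sketch (an illustration only, not the book's code; it assumes sympy, and all names are ours) implements the algorithm in this matrix form. Note that a polar basis is by no means unique:

import sympy as sp

def lagrange_polar_basis(A):
    """Return (P, D) with P.T * A * P = D diagonal; the columns of P
    form a polar basis. Each completed square is one congruence step."""
    A = sp.Matrix(A)
    n = A.shape[0]
    P = sp.eye(n)
    for i in range(n):
        if A[i, i] == 0:
            j = next((j for j in range(i + 1, n) if A[j, j] != 0), None)
            if j is not None:
                # case (3): swap the i-th and j-th basis vectors
                E = sp.eye(n)
                E[i, i] = E[j, j] = 0
                E[i, j] = E[j, i] = 1
            else:
                # case (4): replace u_i by u_i + u_k with a_ik != 0
                k = next((k for k in range(i + 1, n) if A[i, k] != 0), None)
                if k is None:
                    continue          # this row is already zeroed out
                E = sp.eye(n)
                E[k, i] = 1
            A = E.T * A * E
            P = P * E
        # cases (1)/(2): clear the i-th row and column by one square
        E = sp.eye(n)
        for j in range(i + 1, n):
            E[i, j] = -A[i, j] / A[i, i]
        A = E.T * A * E
        P = P * E
    return P, A

P, D = lagrange_polar_basis([[0, 0, 1], [0, 1, 0], [1, 0, 0]])  # form of 4.C.2
print(P, D)   # a (different but equally valid) polar basis, D diagonal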
4.2.6. Affine classification of quadratic forms. The vectors of the basis obtained from the Lagrange algorithm can be rescaled by scalars so that the coefficients of the squares of the variables are only the scalars 1, -1 and 0. Moreover, the following law of inertia says that the numbers of 1's and -1's do not depend on the choices made in the course of the algorithm. These numbers are called the signature of the quadratic form. As before, this gives a complete description of quadratic forms, in the sense that two such forms may be transformed into each other by an affine transformation if and only if they have the same signature.
Theorem. For each nonzero quadratic form of rank r on a real vector space V there exists a natural number p, and r
Solution. By an application of the Lagrange algorithm:

f(x1, x2, x3) = 2x1x2 + x2x3; substitute y2 = x2 - x1, y1 = x1, y3 = x3:
= 2x1(x1 + y2) + (x1 + y2)x3 = 2x1² + 2x1y2 + x1x3 + y2x3
= (1/2)(2x1 + y2 + (1/2)x3)² - (1/2)y2² - (1/8)x3² + (1/2)y2x3; substitute ŷ1 = 2x1 + y2 + (1/2)x3:
= (1/2)ŷ1² - (1/2)(y2 - (1/2)x3)²; substitute y3 = (1/2)y2 - (1/4)x3:
= (1/2)ŷ1² - 2y3².

In the coordinates ŷ1, y3, x3, the quadratic form has a diagonal shape, which means that the basis associated with these coordinates is a polar basis of the form. If we want to express this basis, we need the matrix of the change from the polar basis to the standard one. By the definition of the change-of-basis matrix, its columns are the polar basis vectors. Either we express the old variables (x1, x2, x3) by the new ones (ŷ1, y3, x3), or, equivalently, we express the new ones by the old ones (which is easier) and then compute the inverse matrix:

ŷ1 = 2x1 + y2 + (1/2)x3 = 2x1 + (x2 - x1) + (1/2)x3 = x1 + x2 + (1/2)x3,
y3 = (1/2)y2 - (1/4)x3 = -(1/2)x1 + (1/2)x2 - (1/4)x3.

The matrix changing the basis from the polar basis to the standard one is therefore

T = (  1    1   1/2 )
    ( -1/2 1/2 -1/4 )
    (  0    0    1  ),

and the inverse matrix is

T⁻¹ = ( 1/2 -1 -1/2 )
      ( 1/2  1   0  )
      (  0   0   1  ).

Hence one of the polar bases of the given quadratic form is given by the columns of this matrix,

{(1/2, 1/2, 0), (-1, 1, 0), (-1/2, 0, 1)}. □
4.C.4. Determine the type of the conic section defined by

3x1² - 3x1x2 + x2 - 1 = 0.
independent linear forms φ1, ..., φr ∈ V* such that 0 ≤ p ≤ r and

f(u) = (φ1(u))² + ··· + (φp(u))² - (φp+1(u))² - ··· - (φr(u))².

Otherwise put, there exists a polar basis in which f has the analytic formula

f(x1, ..., xn) = x1² + ··· + xp² - xp+1² - ··· - xr².
The number p of positive diagonal coefficients in the matrix of the given quadratic form (and thus the number r — p of negative coefficients) does not depend on the choice of polar basis.
Two symmetric matrices A, B of dimension n are matrices of the same quadratic form in different bases if and only if they have the same rank and the same number of positive coefficients in the polar basis.
Proof. By completing the squares, f(x1, ..., xn) = λ1x1² + ··· + λrxr², λi ≠ 0, in a suitable basis on V. Assume moreover that the first p coefficients λi are positive. Then the transformation y1 = √λ1·x1, ..., yp = √λp·xp, yp+1 = √(-λp+1)·xp+1, ..., yr = √(-λr)·xr, yr+1 = xr+1, ..., yn = xn yields the desired formula. The forms φi are exactly the forms of the dual basis in V* to the obtained polar basis.
It remains to prove that p does not depend on the procedure. Assume that there are formulas for the same form f in two polar bases u, v, i.e.

f(x1, ..., xn) = x1² + ··· + xp² - xp+1² - ··· - xr²,
f(y1, ..., yn) = y1² + ··· + yq² - yq+1² - ··· - yr².

Denote the subspace generated by the first p vectors of the first basis by P = ⟨u1, ..., up⟩, and similarly Q = ⟨vq+1, ..., vn⟩. Then f(u) > 0 for each nonzero u ∈ P, while f(v) ≤ 0 for each v ∈ Q. Hence necessarily P ∩ Q = {0}, and therefore dim P + dim Q ≤ n. Hence p + (n - q) ≤ n, so that p ≤ q. By interchanging the roles of the two subspaces, q ≤ p, and so p = q.
Thus p is independent of the choice of the polar basis. Consequently, for two matrices with the same rank and the same number of positive coefficients in the diagonal form of the corresponding quadratic form, the analytic formulas are the same. □
While discussing symmetric maps, we talked about definite and semidefinite maps. The same discussion has an obvious meaning also for symmetric bilinear forms and quadratic forms. A quadratic form f on a real vector space V is called
(1) positive definite if f(u) > 0 for all vectors u ≠ 0,
(2) positive semidefinite if f(u) ≥ 0 for all vectors u ∈ V,
(3) negative definite if f(u) < 0 for all vectors u ≠ 0,
(4) negative semidefinite if f(u) ≤ 0 for all vectors u ∈ V,
(5) indefinite if f(u) > 0 and f(v) < 0 for two suitable vectors u, v ∈ V.
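Since an orthonormal eigenbasis of a symmetric matrix is a polar basis, the law of inertia lets one read the signature, and hence the definiteness, off the signs of the eigenvalues. A small Python sketch (an illustration, not part of the original text; it assumes numpy):

import numpy as np

def signature(A, tol=1e-12):
    """(p, m): the numbers of positive and negative squares in a polar basis."""
    w = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    return int((w > tol).sum()), int((w < -tol).sum())

def classify(A):
    p, m = signature(A)
    n = np.asarray(A).shape[0]
    if p == n: return "positive definite"
    if m == n: return "negative definite"
    if p > 0 and m > 0: return "indefinite"
    return "positive semidefinite" if m == 0 else "negative semidefinite"

# The form of 4.C.1 has rank 2 and signature (2, 0):
print(classify([[3, 1, 0], [1, 1, 2], [0, 2, 6]]))   # positive semidefinite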
Solution. Complete the squares:

3x1² - 3x1x2 + x2 - 1 = (1/3)(3x1 - (3/2)x2)² - (3/4)x2² + x2 - 1
                      = (1/3)y1² - (3/4)(x2 - 2/3)² + 1/3 - 1
                      = (1/3)y1² - (3/4)y2² - 2/3.

According to the list in 4.2.4, the given conic section is a hyperbola. □
4.C.5. By completing the squares, express the quadric

-x² + 3y² + z² + 6xy - 4z = 0

in such a way that its type can be determined.
Solution. Complete the squares, dealing first with all the terms involving x. We obtain the equation

-(x - 3y)² + 9y² + 3y² + z² - 4z = 0.

There are no "unwanted" terms containing y, so repeat the procedure for z. This gives

-(x - 3y)² + 12y² + (z - 2)² - 4 = 0.

Conclude that there is a transformation of variables that leads to the equation (we may divide by 4 if desired)

-x̃² + ỹ² + z̃² - 1 = 0.

□
We can tell the type of a conic section without transforming its equation to one of the forms listed in 4.2.4. Every conic section can be expressed as

a11x² + 2a12xy + a22y² + 2a13x + 2a23y + a33 = 0.
The determinants

Δ = det A = | a11 a12 a13 |
            | a12 a22 a23 |
            | a13 a23 a33 |

and

δ = | a11 a12 |
    | a12 a22 |

are invariants of the conic section, which means that they are not changed by Euclidean transformations (rotations and translations). Furthermore, the different types of conic sections have different signs of these determinants:
• Δ ≠ 0 for non-degenerate conic sections: an ellipse for δ > 0, a hyperbola for δ < 0 and a parabola for δ = 0. For a real ellipse (not an imaginary one), it is necessary that (a11 + a22)Δ < 0.
• Δ = 0 for degenerate conic sections, i.e. pairs of lines.
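These rules are easy to mechanize. The following Python sketch (an illustration, not part of the original text; it assumes numpy) classifies a conic from its six coefficients; applied to the exercises 4.C.6 and 4.C.7 below, it returns a real ellipse and a hyperbola respectively:

import numpy as np

def conic_type(a11, a12, a22, a13, a23, a33):
    """Type of a11 x^2 + 2 a12 xy + a22 y^2 + 2 a13 x + 2 a23 y + a33 = 0."""
    A = np.array([[a11, a12, a13],
                  [a12, a22, a23],
                  [a13, a23, a33]], dtype=float)
    Delta = np.linalg.det(A)          # the big invariant
    delta = a11 * a22 - a12 ** 2      # the small invariant
    if abs(Delta) < 1e-12:
        return "degenerate (pair of lines)"
    if delta > 0:
        return "real ellipse" if (a11 + a22) * Delta < 0 else "imaginary ellipse"
    return "hyperbola" if delta < 0 else "parabola"

print(conic_type(2, -1, 3, -0.5, 0.5, -1))   # 4.C.6: real ellipse
print(conic_type(1, -2, -5, 1, 2, 3))        # 4.C.7: hyperbola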
The signs (or the vanishing) of these determinants are indeed invariant under the coordinate transformations. Denote X = (x, y, 1)^T
The same names are used for symmetric matrices corresponding to quadratic forms. By the signature of a symmetric matrix is meant the signature of the corresponding quadratic form.
4.2.7. Theorem (Sylvester criterion). A symmetric real matrix A is positive definite if and only if all its leading principal minors are positive.
A symmetric real matrix A is negative definite if and only if (-1)^i·|Ai| > 0 for all leading principal submatrices Ai.
Proof. The claim about negative definite forms follows immediately from the first part of the theorem: just observe that A is positive definite if and only if -A is negative definite.
Suppose that the form f is positive definite. Then A = P^T·E·P = P^T·P for a suitable regular matrix P. Hence |A| = |P|² > 0. Let u be a chosen basis in which the form f has the matrix A. The restrictions of f to the subspaces Vk = ⟨u1, ..., uk⟩ are positive definite forms fk again, and the corresponding matrices in the bases u1, ..., uk are the leading principal submatrices Ak. Thus |Ak| > 0, too.
In order to prove the other implication, analyse in detail the form of the transformations used in completing the squares in the Lagrange algorithm. The transformation used in the first step always has an upper triangular matrix T, and by rescaling (see 4.2.5) the matrix has 1's on the diagonal:

T = ( 1 a12/a11 ... a1n/a11 )
    ( 0    1    ...    0    )
    ( .         ...    .    )
    ( 0    0    ...    1    ).
Such a matrix of the transformation from the basis u to the basis v has several useful properties. In particular, its leading principal submatrices Tk, formed by the first k rows and columns, are the matrices of the transformation of the subspace Pk = ⟨u1, ..., uk⟩ from the basis (u1, ..., uk) to the basis (v1, ..., vk). The leading principal submatrices Ak of the matrix A of the form f are the matrices of the restrictions of f to Pk. Therefore, the matrices Ak and A′k of the restrictions to Pk in the bases u and v respectively satisfy A′k = Tk^T·Ak·Tk. The inverse of an upper triangular matrix with 1's on the diagonal is again an upper triangular matrix with 1's on the diagonal, so we may similarly express Ak in terms of A′k. Thus the determinants of the matrices Ak and A′k are equal, by the Cauchy formula. Thus we have proved:
Claim. Let f be a quadratic form on V, dim V = n, and let u be a basis of V such that steps (3) and (4) of the Lagrange algorithm are not needed while finding the polar basis. Then the analytic formula

f(x1, ..., xn) = λ1x1² + λ2x2² + ··· + λrxr²

is obtained, where r is the rank of the form f, λ1, ..., λr ≠ 0, and the leading principal submatrices of the (former) matrix A of the quadratic form f satisfy |Ak| = λ1λ2···λk, k ≤ r.
and denote by A the matrix of the quadratic form. Then the corresponding conic section has the equation X^T·A·X = 0. The standard form is obtained by a rotation and a translation, that is, by a transformation to new coordinates x′, y′ satisfying

x = x′ cos α - y′ sin α + c1,
y = x′ sin α + y′ cos α + c2,

or, in matrix form, with the new coordinates X′ = (x′, y′, 1)^T,

(1)  X = M·X′,  M = ( cos α -sin α c1 )
                    ( sin α  cos α c2 )
                    (   0      0    1 ).

Put X = M·X′ into the equation of the conic section to obtain the equation in the new coordinates:

X^T·A·X = 0  =>  (M·X′)^T·A·(M·X′) = 0  =>  X′^T·M^T·A·M·X′ = 0.
After each step of this procedure, the resulting matrix contains zeros under the diagonal in the already processed columns, and all the leading principal minors remain the same. Consequently, if the leading principal minors are nonzero, then the next diagonal term in A is nonzero, and we do not need steps other than completing the squares. Moreover, λi = |Ai|/|Ai-1|. This proves the following:
Corollary (Jacobi theorem). Let f be a quadratic form of rank r on a vector space V with the matrix A in the basis u. Steps other than completing the squares are not required if and only if the leading principal submatrices of A satisfy |A1| ≠ 0, ..., |Ar| ≠ 0. Then there exists a polar basis in which f has the analytic formula

f(x1, ..., xn) = |A1|x1² + (|A2|/|A1|)x2² + ··· + (|Ar|/|Ar-1|)xr².
Hence if all leading principal minors are positive, then / is positive definite by the Jacobi theorem and the Sylvester criterion is proved. □
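Both the Sylvester criterion and the Jacobi formula are straightforward to check numerically. A Python sketch (an illustration, not part of the original text; it assumes numpy, and the test matrix is our own example):

import numpy as np

def positive_definite_by_sylvester(A):
    """Check positive definiteness via the leading principal minors."""
    A = np.asarray(A, dtype=float)
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

def jacobi_coefficients(A):
    """lambda_k = |A_k| / |A_{k-1}| from the Jacobi theorem (assumes all
    leading principal minors are nonzero)."""
    A = np.asarray(A, dtype=float)
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
    return [minors[0]] + [minors[k] / minors[k - 1] for k in range(1, len(minors))]

A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
print(positive_definite_by_sylvester(A))   # True: the minors are 2, 3, 4
print(jacobi_coefficients(A))              # [2.0, 1.5, 1.333...]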
Denote by A′ the matrix of the quadratic form in the new coordinates. Then A′ = M^T·A·M, where the matrix

M = ( cos α -sin α c1 )
    ( sin α  cos α c2 )
    (   0      0    1 )

has unit determinant, so

det A′ = det M^T · det A · det M = det A = Δ.

Further, the subdeterminant A33, which is the algebraic complement of a33, is also invariant under these coordinate transformations. For a rotation only, the matrix is

M = ( cos α -sin α 0 )
    ( sin α  cos α 0 )
    (   0      0   1 ),

and det A′33 = det A33. For a translation only,

M = ( 1 0 c1 )
    ( 0 1 c2 )
    ( 0 0 1 ),

and this subdeterminant remains unchanged as well. Hence det A′33 = det A33 = δ.
3. Projective geometry
In many elementary texts on analytic geometry, the authors finish with the affine and Euclidean objects described above. The affine and Euclidean geometries are sufficient for many practical problems, but not for all of them.
For instance, in processing an image from a camera, angles are not preserved and parallel lines may (but need not) intersect.
Moreover, it is often difficult to distinguish very small angles from zero angles, and thus it would be convenient to have tools which do not need such distinguishing.
The basic idea of projective geometry is to extend affine spaces by points at infinity. This permits an easy way to deal with linear objects such as points, lines, planes, projections, etc.
4.C.6. Determine the type of the conic section

2x² - 2xy + 3y² - x + y - 1 = 0.
Solution. The determinant

Δ = |  2   -1  -1/2 |
    | -1    3   1/2 |
    | -1/2 1/2  -1  | = -23/4 ≠ 0,

hence it is a non-degenerate conic section. Moreover,
δ = 5 > 0, therefore it is an ellipse. Furthermore,
(a11 + a22)Δ = (2 + 3)·(-23/4) < 0, so it is a real ellipse. □
4.3.1. Projective extension of the affine plane. We begin with the simplest interesting case, namely the geometry in the plane. If we imagine the points of the plane A2 as the plane z = 1 in R³, then each point P of the affine plane is represented by a vector u = (x, y, 1) ∈ R³, hence also by a one-dimensional subspace ⟨u⟩ ⊂ R³. On the other hand, almost every one-dimensional subspace in R³ intersects the plane in exactly one point P, and the vectors of such a subspace are given by the coordinates (x, y, z) uniquely, up to a common scalar multiple. Only the subspaces corresponding to the vectors (x, y, 0) do not intersect the plane.
4.C.7. Determine the type of the conic section x² - 4xy - 5y² + 2x + 4y + 3 = 0.
Solution. The determinant

Δ = |  1 -2 1 |
    | -2 -5 2 |
    |  1  2 3 | = -34 ≠ 0,

furthermore

δ = |  1 -2 |
    | -2 -5 | = -9 < 0,

it is therefore a hyperbola. □
4.C.8. Determine the equation and the type of the conic section passing through the points

[-2,-4], [8,-4], [0,-2], [0,-6], [6,-2].

Solution. Input the coordinates of the points into the general equation of a conic section

a11x² + a22y² + 2a12xy + a1x + a2y + a = 0.

This yields the linear equation system

4a11 + 16a22 + 16a12 - 2a1 - 4a2 + a = 0,
64a11 + 16a22 - 64a12 + 8a1 - 4a2 + a = 0,
4a22 - 2a2 + a = 0,
36a22 - 6a2 + a = 0,
36a11 + 4a22 - 24a12 + 6a1 - 2a2 + a = 0.
In matrix form (the columns corresponding to a11, a22, a12, a1, a2, a), Gaussian elimination of

(  4 16  16 -2 -4 1 )
( 64 16 -64  8 -4 1 )
(  0  4   0  0 -2 1 )
(  0 36   0  0 -6 1 )
( 36  4 -24  6 -2 1 )

gives a one-dimensional solution space, generated (after a suitable scaling) by

a11 = 1, a22 = 4, a12 = 0, a1 = -6, a2 = 32, a = 48.

The conic section thus has the equation

x² + 4y² - 6x + 32y + 48 = 0.
Complete the terms x² - 6x and 4y² + 32y to squares. The result is

(x - 3)² + 4(y + 4)² - 25 = 0,
Projective plane

Definition. The projective plane P2 is the set of all one-dimensional subspaces in R³. The homogeneous coordinates of a point P = (x : y : z) in the projective plane are triples of real numbers given up to a common scalar multiple, at least one of which must be nonzero. A straight line in the projective plane is defined as the set of one-dimensional subspaces (i.e. points in P2) which generate a two-dimensional subspace (i.e. a plane) in R³.
For a concrete example, consider two parallel lines in the affine plane R²,

L1 : y - x - 1 = 0, L2 : y - x + 1 = 0.

If we view the points of the lines L1 and L2 as finite points in the projective space P2, then their homogeneous coordinates (x : y : z) satisfy the equations

L1 : y - x - z = 0, L2 : y - x + z = 0.

The intersection L1 ∩ L2 is the point (1 : 1 : 0) ∈ P2 in this context. It is the point at infinity corresponding to the common direction vector of the two lines.
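In homogeneous coordinates this computation is mechanical: a line ax + by + cz = 0 is encoded by the triple (a, b, c), and the intersection of two lines is represented by the cross product of their triples (a standard fact which the text does not spell out here). A short Python sketch, assuming numpy:

import numpy as np

L1 = np.array([-1, 1, -1])   # y - x - z = 0
L2 = np.array([-1, 1,  1])   # y - x + z = 0
P = np.cross(L1, L2)
print(P)                     # [2 2 0] ~ (1 : 1 : 0), the point at infinity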
4.3.2. Affine coordinates in the projective plane. If we begin with the projective plane and want to see the affine plane as its "finite" part, then instead of the plane z = 1 we may take another plane σ in R³ which does not pass through the origin 0 ∈ R³. Then the finite points are those one-dimensional subspaces which have a nonempty intersection with the plane σ.
Consider the two parallel lines from the previous paragraph, and choose the plane y = 1 to obtain two lines in the affine plane:

L′1 : 1 - x - z = 0, L′2 : 1 - x + z = 0.

The "infinite" points of the former affine plane are now given by z = 0. The lines L′1 and L′2 intersect at the "finite" point (x, z) = (1, 0). This corresponds to the geometric idea that two parallel lines L1, L2 in the affine plane meet at infinity, at the point (1 : 1 : 0), but this point becomes finite in the other affine coordinates.
4.3.3. Projective spaces and transformations. In a natural way, one can generalize this procedure from the affine plane to any finite dimension.
By choosing an arbitrary affine hyperplane An in the vector space Rⁿ⁺¹ which does not pass through the origin, we may identify the points P ∈ An with the one-dimensional subspaces generated by these points. The remaining one-dimensional subspaces fill in a hyperplane parallel to An. They are called the infinite points of the projective extension Pn of the affine space An.
The set of infinite points in Pn is always a projective space of dimension one less. An affine straight line has exactly one infinite point in its projective extension (both ends of the line "intersect" at infinity, and thus the projective line looks
or rather

(x - 3)²/25 + (y + 4)²/(25/4) - 1 = 0.

The conic section is an ellipse with centre at [3, -4]. □
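Conditions of this kind are linear in the six coefficients, so the conic through five points spans the nullspace of a 5×6 matrix. A Python sketch of the computation (an illustration, not part of the original text; it assumes sympy):

import sympy as sp

# Rows: (x^2, y^2, 2xy, x, y, 1) for each of the five given points; the
# coefficient vector (a11, a22, a12, a1, a2, a) spans the nullspace.
pts = [(-2, -4), (8, -4), (0, -2), (0, -6), (6, -2)]
M = sp.Matrix([[x**2, y**2, 2*x*y, x, y, 1] for x, y in pts])
(coeffs,) = M.nullspace()
print((coeffs / coeffs[0]).T)   # [1, 4, 0, -6, 32, 48]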
4.C.9. Other characteristics and concepts of conic sections. The axis of a conic section is a line of reflection symmetry for the conic section. From the canonical form of a conic section in polar basis (4.2.4) it can be shown that an ellipse and a hyperbola both have two axes (x = 0 and y = 0). A parabola has one axis (x = 0). The intersection of a conic section and its axis is called a conic section vertex.
The numbers a, b from the canonical form of a conic section (which express the distance between vertices and the origin) are called the length of semi-axes. In the case of an ellipse and hyperbola, the axes intersect at the origin. This is a point of central symmetry for the conic section, called the centre of the conic section.
For practical problems involving conic sections, it is often easiest to describe them in parametric form. Often, this avoids contending with messy square roots.
Every point P on the parabola y² = 4ax, a > 0, can be described by P = (x, y) = (at², 2at), for real t. The standard parametric form for the parabola is the pair of equations

x = at², y = 2at.

(Note that the roles of x and y are interchanged, so that the axis of symmetry is the line y = 0.) The tangent line at (at², 2at) has slope 1/t and the equation t(y - 2at) = x - at². The point F = (a, 0) on the axis is called the focus of the parabola, and the line x = -a is called the directrix. Each point of the parabola is equidistant from the focus and the directrix. This property can be used to define a parabola.
Every point P on the ellipse x²/a² + y²/b² = 1 can be described by P = (x, y) = (a cos θ, b sin θ), where 0 < b ≤ a. The standard parametric form for the ellipse is the pair of equations

x = a cos θ, y = b sin θ.

The tangent line at P has slope -(b cos θ)/(a sin θ) and consequently has the equation (a sin θ)(y - b sin θ) = -(b cos θ)(x - a cos θ). The positive number e, defined by b² = a²(1 - e²), is called the eccentricity of the ellipse. If e = 0, the ellipse becomes a circle of radius a = b. Otherwise 0 < e < 1. The two points F1 = (ae, 0) and F2 = (-ae, 0) are the foci of the ellipse, and the lines x = ±a/e are the directrices.
like a circle). The projective plane has a projective line of infinite points, the three-dimensional projective space has a projective plane of infinite points etc.
More generally, we can define the projectivization of a vector space: for an arbitrary vector space V of dimension n + 1, we define P(V) as the set of all one-dimensional vector subspaces in V. For four distinct collinear points A = ⟨x⟩, B = ⟨y⟩, C = ⟨w⟩, D = ⟨z⟩, with w = t1·x + t2·y and z = s1·x + s2·y, the cross-ratio ρ = (A, B; C, D) is defined as

ρ = (s1/t1)·(t2/s2).

The definition is valid, since although the vectors x and y are determined only up to scalar multiples, these multiples cancel out in the definition.
Similarly, each projective transformation preserves cross-ratios. Indeed, if the transformation is given in arithmetic coordinates by a matrix A, we have the images A·w = t1·A·x + t2·A·y, and similarly for A·z. Therefore the four images have the same cross-ratio.
We now discuss the characterization of projective transformations: these are exactly the maps which preserve cross-ratios. But this is not a very practical characterization, since it implicitly contains the claim that these maps map projective lines to projective lines.
One can prove a much stronger statement. A map of an arbitrarily small open region in the affine space Rⁿ (e.g. a ball without its boundary) into the same affine space which maps lines to lines is in fact the restriction of a uniquely determined projective transformation of the projective extension PRⁿ⁺¹ of the affine space Rⁿ. Thus such maps also preserve cross-ratios.
x0²/a⁴ + y0²/b⁴.

If we substitute a²e² = a² - b² and x0²/a² = 1 - y0²/b² (the point X lies on the ellipse), we find that the previous term equals b². □
4.C.13. Projective approach to conic sections. Projective space allows us to approach conic sections from a new perspective (compare with 4.3.11). We can understand a conic section in E2, defined by the quadratic form

f(x, y) = a11x² + 2a12xy + a22y² + 2a13x + 2a23y + a33,

as the set of points of the projective plane P2 with homogeneous coordinates (x : y : z) which are the zero points of the homogeneous quadratic form

f(x, y, z) = a11x² + 2a12xy + a22y² + 2a13xz + 2a23yz + a33z²,

or rather f(v) = v^T·A·v, where v is the column vector with the coordinates (x, y, z) and A is the symmetric matrix (aij). By theorem 4.2.6, there exists a basis in which this quadratic form has one of the following equations:

f(x, y, z) = x² + y² + z², f(x, y, z) = x² + y² - z².

In the former case, the only solution of f(x, y, z) = 0 is the trivial one, and therefore the original form does not represent a real conic section. The second quadratic form represents a cone in R³. We obtain the corresponding conic section by moving back to the inhomogeneous coordinates, that means, by intersecting the
4.3.8. Duality. The projective hyperplanes in the n-dimensional projective space P(V) are defined as the projectivizations of the n-dimensional vector subspaces of the vector space V. Hence in homogeneous coordinates, they are defined as the kernels of linear forms α ∈ V*, which in turn are determined up to a scalar multiple.
Thus in a chosen arithmetic basis, a projective hyperplane is given by a row vector α = (α0, ..., αn). The forms α are given uniquely up to a scalar multiple, so each hyperplane in P(V) is identified with exactly one geometric point in the projectivization of the dual space P(V*). We call this space the dual projective space, and we talk about the duality between points and hyperplanes.
On the forms, the linear map defining a given collineation acts by the multiplication of row vectors by the same matrix from the right:

α = (α0, ..., αn) ↦ α·A.
The matrix of the dual map is A^T. But the dual map maps the forms in the opposite direction, from the "target space" to the "initial" one. Therefore, in order to study the effect of a regular collineation on points and their dual hyperplanes, the inverse of the collineation f is required. The inverse is given by the matrix A⁻¹. Hence the matrix of the action of the corresponding collineation on forms is (A^T)⁻¹. Since the inverse matrix equals the algebraically adjoint matrix A*alg, up to multiplication by the inverse of the determinant (see equation (1) on page 91), we can work directly with the projective transformation of the space P(V*) given by the matrix (A*alg)^T (or without transposing, if we multiply row vectors from the right).
cone with the plane which has the equation z = 1 in the original basis. We immediately obtain the classification of conic sections from 4.2.9, which corresponds to intersecting the cone in R³ with different planes. Non-degenerate sections are depicted; degenerate sections are those which pass through the vertex of the cone.
We define the following useful terms for a conic section f in the projective plane:
Points P, Q ∈ P2 corresponding to the one-dimensional subspaces ⟨p⟩, ⟨q⟩ (generated by vectors p, q ∈ R³) are called polar conjugate with respect to the conic section f if F(p, q) = 0, or rather p^T·A·q = 0.
A point P = ⟨p⟩ is called a singular point of the conic section f if it is polar conjugate with respect to f with all the points of the plane, that is, F(p, x) = 0 for all x ∈ R³. In other words, A·p = 0. Then the matrix A of the conic section does not have maximal rank, and therefore it defines a degenerate conic section. Non-degenerate conic sections do not contain singular points.
The set of all points X = ⟨x⟩ which are polar conjugate with P = ⟨p⟩ is called the polar of the point P with respect to the conic section f. It is therefore the set of points for which F(p, x) = p^T·A·x = 0. Because the polar is given by a linear equation in the coordinates, it is always (in the non-singular case) a line. The following explains the geometric interpretation of the polar.
4.C.14. Polar characterization. Consider a non-degenerate conic section f. The polar of a point P ∈ f with respect to f is the tangent to f with the point of tangency P. The polar of a point P ∉ f is the line defined by the points of tangency of the tangents to f passing through P.
Solution. First consider P ∈ f. Suppose that the polar of P, defined by F(p, x) = 0, intersects f in a point Q = ⟨q⟩ ≠ P. Then F(p, q) = 0 and f(q) = F(q, q) = 0. An arbitrary point X = ⟨x⟩ of the line through P and Q satisfies x = αp + βq for some α, β ∈ R.
The projective point X belongs to the hyperplane α if the arithmetic coordinates satisfy α·x = 0. This relation is preserved by an arbitrary collineation, since

(α·A⁻¹)·(A·x) = α·x = 0.
4.3.9. Fixed points, centers and axes. Consider a regular collineation f given in an arithmetic basis of the projective space P(V) by a matrix A. By a fixed point of the collineation f we mean a point A which is mapped to itself, that is, f(A) = A. By a fixed hyperplane of the collineation f we mean a hyperplane α which is mapped to itself, that is, f(α) ⊂ α.
Hence the arithmetic representatives of fixed points are exactly the eigenvectors of the matrix A.
In the geometry of the plane, we meet many types of collineations: reflection through a point, reflection across a line, translation, homothety etc. Perhaps we also remember some types of projections, e.g. the projection of a plane in R³ to another plane from a center S ∈ R³.
Note also that in all such cases of affine maps, fixed lines appear next to the fixed points. For example, the reflection through a point preserves also all the lines passing through this point. In the case of a translation, the infinite points behave similarly.
Now we discuss this phenomenon in an arbitrary dimension. First, we define a classical notion related to the incidence of points and hyperplanes.
The set of hyperplanes passing through a point A ∈ P(V) is the set of all hyperplanes which contain the point A. For each point A, the corresponding set of hyperplanes is itself a hyperplane in the dual space P(V*): it is given by one homogeneous linear equation in arithmetic coordinates.
For a collineation f : P(V) → P(V), a point S ∈ P(V) is called a center of the collineation f if all the hyperplanes of the set determined by S are fixed hyperplanes. A hyperplane α is called an axis of the collineation f if all its points are fixed points.
It follows that an axis of a collineation is a center of the dual collineation, while the set of hyperplanes defining a center of the collineation is an axis of the dual collineation.
Since the matrices of a collineation on the former and the dual space differ only by the transposition, their eigenvalues coincide (the eigenvectors are column vectors, respectively row vectors corresponding to the same eigenvalues). For example in the projective plane (and for the same reason in each real projective space of even dimension) each collineation has at least one fixed point, since the characteristic polynomials of corresponding linear maps are of odd degree. Hence they have at least one real root.
Instead of discussing the general theory, we illustrate its usefulness in several results for projective planes.
Proposition. A projective transformation other than the identity has either exactly one center and exactly one axis, or it has neither a center nor an axis.
Because of the bilinearity and the symmetry of F,

f(x) = F(x, x) = α²F(p, p) + 2αβF(p, q) + β²F(q, q) = 0.

So every point X of the line lies on the conic section f. However, when a conic section contains a line, it has to be degenerate, which is a contradiction.
The claim for P ∉ f follows from the symmetry of the bilinear form F: when Q lies on the polar of P, then P lies on the polar of Q.
□
Using polar conjugates, we can find the axes and the centre of a conic section without using the Lagrange algorithm.
Consider the matrix of the conic section as the block matrix

Ā = ( A  a )
    ( a^T α ),

where A = (aij) for i, j = 1, 2, a is the vector (a13, a23)^T and α = a33. This means that the conic section is defined by the equation

u^T·A·u + 2a^T·u + α = 0

for the vector u = (x, y)^T. Now we show:
4.C.15. The axes of a conic section are the polars of the points at infinity determined by the eigenvectors of the matrix A.
Solution. Because of the symmetry of A, it has the diagonal shape D = diag(λ, μ) in the basis of its eigenvectors, where λ, μ ∈ R, and this basis is orthogonal. Denote by U the matrix changing the basis to the basis of eigenvectors (written in columns); then the matrix of the conic section transforms as

( U^T 0 ) ( A  a ) ( U 0 )  =  (   D    U^T·a )
(  0  1 ) ( a^T α ) ( 0 1 )     ( a^T·U    α   ).
Proof. Consider a collineation f on PR³, and assume that it has two distinct centers A and B. Denote by ℓ the line given by these two centers, and choose a point X in the projective plane outside of ℓ. If p and q are the lines passing through the pairs of points (A, X) and (B, X) respectively, then f(p) = p and f(q) = q. In particular, X, as the intersection of p and q, is fixed. But then all the points of the plane outside of ℓ are fixed. Hence each line different from ℓ has all its points outside of ℓ fixed, and thus also its intersection with ℓ is fixed. It follows that f is the identity mapping. So it is proved that every projective transformation other than the identity has at most one center. The same argument for the dual projective plane proves that there is at most one axis.
If f has a center A, then all the lines passing through A are fixed. They correspond therefore to a two-dimensional subspace of row eigenvectors of the matrix corresponding to the transformation f. Therefore, there exists a two-dimensional subspace of column eigenvectors for the same eigenvalue. This represents exactly a line of fixed points, hence it represents an axis. The same consideration in the reversed order proves the opposite statement: if a projective transformation of the plane has an axis, then it also has a center. □
For practical problems it is useful to work with complex projective extensions, also in the case of a real plane. Then the geometric behaviour can be easily read off the potential existence of real or imaginary centers and axes.
4.3.10. Pappus theorem. The following result, known as the Pappus theorem, is a classical result of projective geometry.
Proposition. Let two triples of distinct collinear points {A, B, C} and {A′, B′, C′} lie on two distinct lines that meet at a point T, the points being ordered so that A and A′ are the closest to T, respectively. Define the points Q, R and S as

Q = [AB′] ∩ [BA′], R = [AC′] ∩ [CA′], S = [BC′] ∩ [CB′].

Then {Q, R, S} are also collinear.
Proof. Without loss of generality, consider the plane passing through {T, A, B, C, A′, B′, C′} as the plane defined by z = 1 in the homogeneous coordinates (x : y : z) on P2.
The points {T, A, B, C, A′, B′, C′} may be considered as objects in P2, representing lines through the origin in R³ with the directional vectors {t, a, b, c, a′, b′, c′} respectively. These can be chosen up to a real nonzero factor. The condition {z = 1} identifies those points in R³ uniquely, regardless of the choice of {t, a, b, c, a′, b′, c′}. Since {T, A, B, C} are collinear points (they lie in the same two-dimensional linear subspace of R³), we may assume that this plane is generated by t and a. Choose

b = t + a, c = λt + a,

and analogously, for {T, A′, B′, C′},

b′ = t + a′, c′ = λ′t + a′
So in this basis, the canonical form is determined by the vector U^T·a (up to a translation). Specifically, denote the eigenvectors by vλ, vμ. Then

λ(x + (a^T·vλ)/λ)² + μ(y + (a^T·vμ)/μ)² = (a^T·vλ)²/λ + (a^T·vμ)²/μ - α.
This means that the eigenvectors are the direction vectors of the axes of the conic section (the main directions). The equations of the axes in this basis are x = -(a^T·vλ)/λ and y = -(a^T·vμ)/μ. In the standard basis, the points u of the axes therefore satisfy vλ^T·(λu + a) = 0 and vμ^T·(μu + a) = 0, respectively. These equations are equivalent to the equations vλ^T·(A·u + a) = 0 and vμ^T·(A·u + a) = 0, which are precisely the polar equations of the points at infinity defined by the vectors vλ and vμ. □
4.C.16. Remark. A corollary of the previous claim is that the centre of a conic section is polar conjugate with all the points at infinity. The coordinates s of the centre then satisfy the equation A·s + a = 0.
This equation for the centre coordinates has exactly one solution if δ = det(A) ≠ 0, and no solution if δ = 0. That means that, among the non-degenerate conic sections, the ellipse and the hyperbola have exactly one centre each, while the parabola has no proper centre (its centre is a point at infinity).
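Both observations are easy to use in computations: the centre comes from the linear system A·s = -a, and each axis is the polar of the point at infinity of an eigenvector of A. A Python sketch (an illustration, not part of the original text; it assumes numpy), applied to the conic of exercise 4.D.17 below:

import numpy as np

A = np.array([[0., 3.], [3., 8.]])   # quadratic part of 6xy + 8y^2 + 2x + 4y - 13
a = np.array([1., 2.])               # half of the linear part
centre = np.linalg.solve(A, -a)
print(centre)                         # [0.2222..., -0.3333...] = [2/9, -1/3]

# Each axis is the polar of the point at infinity of an eigenvector v,
# i.e. the line (v^T A)·u + v^T a = 0 (printed up to scale).
w, V = np.linalg.eigh(A)
for v in V.T:
    print(v @ A, v @ a)               # normal vector and constant term of an axis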
4.C.17. Prove that the angle between the tangent to a parabola (with an arbitrary point of tangency) and the axis of the parabola is the same as the angle between the tangent and the line connecting the focus and the point of tangency.
Solution. The polar (i.e. the tangent) of a point X = [x0, y0] of the parabola defined by the canonical equation x² = 2py is the line

(x0, y0, 1)·A·(x, y, 1)^T = x0·x - p·y - p·y0 = 0, where A = ( 1  0  0 )
                                                             ( 0  0 -p )
                                                             ( 0 -p  0 ).

The cosine of the angle between the tangent and the axis of the parabola (x = 0) is given by the dot product of the corresponding unit direction vectors. The unit direction vector of the tangent is

(1/√(p² + x0²))·(p, x0),

and therefore

cos φ = (p, x0)·(0, 1) / √(p² + x0²) = x0/√(p² + x0²).

Now we show that this is the same as the cosine of the angle between the tangent and the line connecting the focus F = [0, p/2]
for some real constants λ and λ′. It remains to show that the vectors q, r, s, representing Q, R, S in P2, generate a two-dimensional subspace in R³. Since

(t + a) + a′ = a + (t + a′),

q = t + a + a′ represents Q. Since

λλ′t + λ′a + λa′ = λ(λ′t + a′) + λ′a = λ′(λt + a) + λa′,

r = λλ′t + λ′a + λa′ represents R. Finally,

s = q - r = t + a + a′ - λλ′t - λ′a - λa′
  = (1 - λ′)(t + a) + (1 - λ)(λ′t + a′)
  = (1 - λ)(t + a′) + (1 - λ′)(λt + a)

represents the point S. Thus the points {Q, R, S} lie in the two-dimensional subspace generated by the vectors q and r. Since Q, R, S also belong to the plane {z = 1}, these points are collinear. □
4.3.11. Projective classification of quadrics. To end this section, we return to conics and quadrics. A quadric Q in the n-dimensional affine space Rⁿ is defined by a general quadratic equation (1), see page 252. By viewing the affine space Rⁿ as the affine coordinates in the projective space PRⁿ⁺¹, we may wish to describe the set Q in homogeneous coordinates. The formula in these coordinates should contain only terms of second order, since only a homogeneous formula is independent of the choice of the multiple of the homogeneous coordinates (x0, x1, ..., xn) of a point. Hence we search for a homogeneous formula whose restriction to the affine coordinates (that is, the substitution x0 = 1) gives the original formula (1).
But this is especially easy: simply add the right number of x0's to all the terms, nothing to the quadratic terms, one to the linear terms and x0² to the constant term of the original affine equation for Q.
We obtain a well defined quadratic form f = ∑_{i,j=0}^n aij·xi·xj on the vector space Rⁿ⁺¹, whose zero set correctly defines the projective quadric Q̃.
The intersection of the "cone" Q̃ ⊂ Rⁿ⁺¹ of zeros of this form with the affine plane x0 = 1 is the original quadric Q, whose points are called the proper points of the quadric. The remaining points Q̃ \ Q in the projective extension are its infinite points.
The classification of real or complex projective quadrics, up to projective transformations, is a problem we have already considered: it amounts to finding the canonical polar basis, see paragraph 4.2.6. From this classification, given by the signature of the form in the real case and by the rank only in the complex case, we can also deduce the classification of affine quadrics. We show the essential part of the procedure in the case of conics in the affine and projective plane.
The projective classification gives the following possibilities, described in the homogeneous coordinates (x : y : z) of the projective plane PR³:
• the imaginary regular conic given by x² + y² + z² = 0
and the point of tangency X. The unit direction vector of the connecting line is

(1/√(x0² + (y0 - p/2)²))·(x0, y0 - p/2).

For the cosine of the angle,

cos ψ = (p, x0)·(x0, y0 - p/2) / (√(p² + x0²)·√(x0² + (y0 - p/2)²))
      = x0·(y0 + p/2) / (√(p² + x0²)·√(x0² + (y0 - p/2)²)).

Substitute y0 = x0²/(2p) to obtain cos ψ = x0/√(p² + x0²), as before.
This example shows that light rays arriving parallel to the axis of a parabolic mirror are reflected into the focus and, vice versa, light rays coming from the focus are reflected in the direction parallel to the axis of the parabola. This is the principle of many devices, such as parabolic reflectors. □
Solution. (Alternative.) At the point P = (at², 2at) of the parabola, the tangent line has slope 1/t and the focus is at (a, 0). So the line joining P to the focus F has slope (2at - 0)/(at² - a) = 2t/(t² - 1). If θ is the angle between the tangent line and the x-axis, then tan θ = 1/t, so
tan 2θ = 2 tan θ / (1 - tan²θ) = (2/t) / (1 - 1/t²) = 2t/(t² - 1).
By subtraction, the angle between the tangent line and the line joining P to the focus is also θ.
Note that the tangent line meets the x-axis at the point Q = (-at², 0). The result also follows from showing that |FP| = |FQ|, and hence the triangle QFP is isosceles. □
You can find many more examples on quadrics on D
• the real regular conic given by x² + y² - z² = 0
• a pair of imaginary lines given by x² + y² = 0
• a pair of real lines given by x² - y² = 0
• one double line x² = 0.
We consider this classification as the real one, that is, the classification of the quadratic forms is given not only by the rank but also by the signature. Nevertheless, the points of the quadrics are considered also in the complex extension, and the names stated above should be understood in this way. For example, the imaginary conic does not have any real points.
4.3.12. Affine classification of quadrics. For the affine classification, we must restrict the projective transformations to those which preserve the line of infinite points. This can also be seen by the converse procedure: for a fixed projective type of the conic Q, that is, for its cone Q̃ ⊂ R³, we choose different affine planes σ ⊂ R³ which do not pass through the origin, and we observe the changes of the set of points Q̃ ∩ σ, which are the proper points of Q in the affine coordinates realized by the plane σ.
Hence in the case of a regular conic, there is the real cone Q̃ given by the equation z² = x² + y². As the planes σ we may, for instance, choose the tangent planes to the unit sphere. If we begin with the plane z = 1, the intersection consists only of finite points, forming the unit circle Q. By a gradual change of the slope of σ we obtain more and more stretched ellipses, until we reach the slope for which σ is parallel to one of the lines of the cone. At that moment, there appears one (double) infinite point of the conic, whose finite points still form one connected component: we have a parabola. A further change of the slope gives rise to two infinite points; the set of finite points is no longer connected, and so we obtain the last regular quadric of the affine classification, a hyperbola.
The method just introduced enables us to continue the classification in higher dimensions. In particular, notice that the intersection of the conic with the projective line of infinite points is always a quadric in dimension one less: on a projective line it is either the empty set, or a double point, or two points. Moreover, an affine transformation mapping one possible realization of a fixed projective type to another exists only if the corresponding quadrics in the infinite line are projectively equivalent. In this way, it is possible to continue the classification of quadrics in dimension three and above.
D. Further exercise on this chapter
4.D.1. Find a parametric equation for the intersection of the following planes in R³:

σ : 2x + 3y - z + 1 = 0 and ρ : x - 2y + 5 = 0.
4.D.2. Find a common perpendicular for the skew lines

p : [1, 1, 1] + t·(2, 1, 0), q : [2, 2, 0] + s·(1, 1, 1).
o
4.D.3. Jarda is standing at [-1, 1, 0] and has a stick of length 4. Can he simultaneously touch the lines p and q, where

p : [0, -1, 0] + t·(1, 2, 1), q : [3, 4, 8] + s·(2, 1, 3)?

(The stick must pass through [-1, 1, 0].) O
4.D.4. A cube ABCDEFGH is given. The point T lies on the edge BF, with |BT| = (1/3)|BF|. Compute the cosine of the angle between ATC and BDE. O
4.D.5. A cube ABCDEFGH is given. The point T lies on the edge AE, with |AT| = (1/3)|AE|. S is the midpoint of AD. Compute the cosine of the angle between BDT and SCH. O
4.D.6. A cube ABCDEFGH is given. The point T lies on the edge BF, with |BT| = (1/3)|BF|. Compute the cosine of the angle between ATC and BDE. O
4.D.7. What are the lengths of the semi-axes of an ellipse, if the sum of their lengths and the distance between the foci both equal 1?
Solution. It is given that a + b = 1 and 2ae = 1. Also b² = a²(1 - e²). Eliminating e gives b² = a² - 1/4. So 1/4 = a² - b² = (a - b)(a + b) = a - b. Hence a = 5/8 and b = 3/8. □
Solution. (Alternative.) Solve the system

a + b = 1, 2√(a² - b²) = 1,

and find the solution a = 5/8, b = 3/8. □
4.D.8. For what slopes k are the lines passing through [-4, 2] secant and tangent lines of the ellipse defined by
x²/9 + y²/4 = 1?
Solution. The direction vector of such a line is (1, k), so its parametric equations are x = -4 + t, y = 2 + kt. The intersections with the ellipse satisfy

(-4 + t)²/9 + (2 + kt)²/4 = 1.

This quadratic equation in t has the discriminant

D = -144·k·(7k + 16).

This implies that for k ∈ (-16/7, 0) there are two solutions, and the line is a secant. For k = -16/7 and k = 0 there is only one solution, and the line is a tangent to the ellipse. □
4.D.9. Find all lines tangent to the ellipse 3x² + 7y² = 30 whose distance from the centre of the ellipse equals 3.
Solution. All lines at distance 3 from the origin are tangent to the circle centred at [0, 0] with radius 3, so each of them has an equation x·cos θ + y·sin θ = 3 for some θ. Such a line meets the standard ellipse x²/a² + y²/b² = 1 where

x²/a² + (3 - x·cos θ)²/(b²·sin²θ) = 1,

or

x²(a²cos²θ + b²sin²θ) - 6a²x·cos θ - a²(b²sin²θ - 9) = 0.

It is a tangent line if this equation has a double root in x. Thus it is required that

36a⁴cos²θ = 4a²(a²cos²θ + b²sin²θ)(9 - b²sin²θ).

This simplifies to the requirement

a²cos²θ + b²sin²θ = 9,

which implies

cos²θ = (9 - b²)/(a² - b²), sin²θ = (a² - 9)/(a² - b²).

For the given problem a² = 10 and b² = 30/7, whence cos²θ = 33/40 and sin²θ = 7/40. The solution is x√33 + y√7 = 3√40. □
Solution. (Alternative.) The tangent at the point (a·cos θ, b·sin θ) of the ellipse is (x·cos θ)/a + (y·sin θ)/b = 1. Its distance from the origin equals 3 if and only if

9(cos²θ/a² + sin²θ/b²) = 1.

With a² = 10, b² = 30/7, this reads 9cos²θ + 21sin²θ = 10, i.e. 12sin²θ = 1, which leads to the same tangents as above. □
Solution. (Alternative.) The centre of the ellipse is at the origin. The distance d between the line ax + by + c = 0 and the origin is d = |c|/√(a² + b²). The tangent therefore satisfies a² + b² = c²/9. The equation of the tangent with the point of tangency [xT, yT] is 3x·xT + 7y·yT - 30 = 0, so for the coordinates of the point of tangency,

(3xT)² + (7yT)² = 100,
3xT² + 7yT² = 30.

The solution is xT = ±√(55/6), yT = ±√(5/14). Considering the symmetry of the ellipse, there are four solutions

±3√(55/6)·x ± 7√(5/14)·y - 30 = 0. □
4.D.10. A hyperbola x² - y² = 2 is given. Find the equation of the hyperbola which has the same foci and passes through the point [-2, 3].
Solution. The given hyperbola has a² = b² = 2, so a²e² = a² + b² = 4, and the foci are at (±2, 0). So the desired hyperbola has the equation

√((x - 2)² + y²) - √((x + 2)² + y²) = ±k

for some constant k. Since the hyperbola passes through [-2, 3], k = 2. Squaring

√((x - 2)² + y²) = √((x + 2)² + y²) + 2

gives

(x - 2)² + y² = (x + 2)² + y² + 4√((x + 2)² + y²) + 4,

so -2x - 1 = √((x + 2)² + y²), and after squaring once more,

(-2x - 1)² = (x + 2)² + y², or 3x² = y² + 3,

which is the required hyperbola. □
Solution. (Alternative.) The equation of the desired hyperbola is x²/a² - y²/b² = 1, with its eccentricity e satisfying a²e² = a² + b² = 4, since the foci are at [±ae, 0] = [±2, 0]. The point [-2, 3] lies on the hyperbola, so 4/a² - 9/b² = 1. It follows that a² = 1, b² = 3. The desired hyperbola is x² - y²/3 = 1. □
4.D.11. Determine the equations of the tangent lines to the hyperbola 4x² - 9y² = 1 which are perpendicular to the line

x - 2y + 7 = 0.

Solution. All lines perpendicular to the given line have an equation 2x + y + c = 0 for some c. Such a line is a tangent if its intersection with the given hyperbola is a double root, i.e. the equation 4x² - 9(-2x - c)² = 1 has a double root. Hence (36c)² - 4·32·(9c² + 1) = 0, and c = ±2√2/3. □
4.D.12. Determine the tangents to the ellipse x²/16 + y²/9 = 1 which are parallel to the line x + y - 7 = 0.
Solution. The lines parallel to the given line intersect it at the point at infinity (1 : -1 : 0). Construct the tangents to the given ellipse passing through this point. The point of tangency T = (t1 : t2 : t3) lies on its polar, and therefore satisfies t1/16 - t2/9 = 0, so t2 = (9/16)t1. Substituting into the equation of the ellipse, we get t1 = ±16/5. The points of tangency of the desired tangents are [16/5, 9/5] and [-16/5, -9/5]. The tangents are the polars of those points; they have the equations x + y = 5 and x + y = -5. □
Solution. (Alternative.) The given line has slope -1. The tangent line at (4cos θ, 3sin θ) has slope -(3cos θ)/(4sin θ), so it is required that tan θ = 3/4. The tangent line has the equation (y - 3sin θ) = -(x - 4cos θ), where either sin θ = 3/5 and cos θ = 4/5, or sin θ = -3/5 and cos θ = -4/5. The two solutions are x + y = ±5. □
4.D.13. Determine the points at infinity and the asymptotes of the conic section
2x2 + Axy + 2y2 - y + 1 = 0
Solution. The equation for the points at infinity of 2x2 + Axy + 2y2 = 0 or rather 2(x + y)2 = 0 has a solution x = —y. The only point at infinity therefore is (1 : —1 : 0), so the conic section is a parabola. The asymptote is a polar of this point, specifically the line at infinity z = 0. □
4.D.14. Prove that the product of the distances between an arbitrary point on a hyperbola and both of its asymptotes is constant. Find its value.
Solution. Denote the point lying on the hyperbola by P. The asymptote equation of the hyperbola in canonical form is bx±ay = 0. Their normals are (b,±a) and from here we determine the projections Pi, P2 of point P to asymptotes. For the distance between point P and asymptotes we get \PPi,2\ = ^p^fl- The product is therefore equal to a „2^^ = ' because P lies on hyperbola. □
4.D.15. Compute the angle between the asymptotes of the hyperbola 3x2 — y2 = 3.
12_ 2
Solution. For the cosine of the angle between the asymptotes of the hyperbola in canonical form, cos a = §2+^2 ■ In this case the angle is 60 degrees. □
4.D.16. Locate the centers of the conic sections
(a) 9a;2 + 6xy - 2y - 2 = 0
(b) x2 + 2xy + y2 + 2x + y + 2 = 0
(c) x2 - Axy + Ay2 + 2x - Ay - 3 = 0
(d) ^ + ^ = 1
Solution, (a) The system As + a = 0 for computing centers is
9Sl + 3s2 = 0 3si - 2 = 0 '
Solve it to obtain the center at [|, —2].
271
CHAPTER 4. ANALYTIC GEOMETRY
(b) In this case,
S1 + S2 + i = 0 si + s2 + \ = 0.
Therefore there is no proper center (the conic section is a parabola). Moving to homogeneous coordinates we can obtain the center at infinity (1 : —1 : 0).
(c) The coordinates of the center in this case satisfy
Sl-2s2 + l = 0 -2si+4s2-2 = 0.
The solution is the line of centers. This is so because the conic section is degenerate: it is a pair of parallel lines.
(d) The center is at (a, 0). The coordinates of the center therefore give the translation of the coordinate system to the frame in which the ellipse has its basic form.
□
4.D.17. Find the equations of the axes of the conic section Qxy + 8y2 + Ay + 2x — 13 = 0.
Solution. The major and minor axes of the conic section are in the direction of the eigenvectors of matrix ^jj ^ . The
characteristic equation has the form A2 — 8A — 9 = 0. The eigenvalues are therefore Ai = —1, A2 = 9. The corresponding eigenvectors are then (3, —1) and (1, —3). The axes arethepolars of points at infinity denned by those directions. For (3, —1), the axis equation is —3a; + y + 1 = 0. For (1, —3) it is —9a; — 21y — 5 = 0. □
4.D.18. Determine the equations of the axes of the conic section 4a;2 + 4a;y + y2 + 2x + Qy + 5 = 0.
/4 2\
Solution. The eigenvalues of the matrix I ^ ^ I are Ai = 0, A2 = 5 and the corresponding eigenvectors are (—1,2) and (2,1). There is one axis 2a; + y + 1 = 0, and the conic section is a parabola. □
4.D.19. The equation
a;2 + 3a;y — y2+a; + y + l = 0. defines a conic section. Determine its center, axes, asymptotes and foci.
4.D.20. Find the equation of the tangent at P=[l, 1] to the conic section
4a;2 + 5y2 - 8a;y + 2y - 3 = 0
Solution. By projecting, this is a conic section defined by the quadratic form (x, y, z)A(x, y, z)T with matrix
Using the previous theorem, the tangent is a polar of P, which has homogenenous coordinates (1 : 1 : 1). It is given by equation (1,1, l)A(x, y, z)T = 0, which in this case gives
2y - 2z = 0
Moving back to inhomogeneous coordinates, the tangent line equation is y = 1.
□
272
CHAPTER 4. ANALYTIC GEOMETRY
4.D.21. Find the coordinates of the intersection of the y axis and the conic section denned by
5x2 + 2xy + y2 - 8x = 0
Solution. The y axis, is the line x = 0. It is the polar of the point P with homogeneous coordinates (p) = (p± : p2 : p3). That means that the equation x = 0 is equivalent to the polar equation F(p, v) = pTAv = 0, where v = (x, y, z)T. This is satisfied when Ap = (a, 0, 0)T for some aeR, This condition gives the conic section matrix
equation system
5pi + P2 - 4p3 = aj P1+P2 = 0 -4pi = 0
We can find the coordinates of P by the inverse matrix, p = A-1 (a, 0, 0)T, or solve the system directly by backward substitution. In this case we can easily obtain solution p = (0, 0, — \a). So the y axis touches the conic section at the origin.
□
4.D.22. Find a touch point of the line x = 2 with the conic section from the previous exercise.
Solution. The line has equation x — 2z = 0 in its projective extension and therefore we get the condition Ap = (a, 0, — 2a) for the touch point P, which gives
5pi + P2 - 4p3 = a P1+P2 = 0 -4pi = -2a
Its solution is p = (Iq, — |a, ^a). These homogeneous coordinates are equivalent to (2, —2,1) and hence the touch point has coordinates [2,—2]. □
4.D.23. Find equations of the tangents passing through P= [3,4] to the conic defined by
2x2 - Axy + y2 - 2x + 6y - 3 = 0.
Solution. Suppose that the point of tangency T has homogeneous coordinates given by a multiple of the vector t= (ti,t2,t3). The condition that T lies on the conic section is tT At = 0, which gives
2t\ - Ufa + t22- 2*1*3 + 6*2*3 - 3*3 = 0
The condition that P lies on the polar of T is pT At = 0, where p = (3,4,1) are the homogeneous coordinates of point P. In this case, the equation gives
/ 2 2 -1\ /*A (3,4,1) -2 1 3 *2 =-3*i+*2 +6*3 = 0 V-l 3 -3j \t3J
Now we can substitute t2 = 3*i — 6*3 to the previous quadratic equation. Then
-t\ + 4*1*3 - 3*3 = 0
Because the equation is not satisfied for *3 = 0, we move to inhomogeneous coordinates (|^, 1), for which we get
-(|)2 + 4(|)-3 = 0 a |=3(|)-6,
tj. = 1 a || = —3, nebo |^ = 3 a || = 3. So the touch points have homogeneous coordinates (1 : —3 : 1) and (3:3:1). The tangent equations are the polars of those points 7x — 2y — 13 = 0 and x = —3. □
273
CHAPTER 4. ANALYTIC GEOMETRY
4.D.24. Find an equation of the tangent passing through the origin to the circle
x2 + y2 - Wx - Ay + 25 = 0
Solution. The touch point (ti : t2 : t3) satisfies
/ 1 0 -5\ (tl\ (0,0,1) 0 1 -2 f2 =-5fi-2f2 + 25 = 0 \-5 -2 25/ \t3J
From here we eliminate t2 and substitute into circle equation, which (ti : t2 : t3) has to be satisfied as well. We obtain the quadratic equation 29t2 — 250«i + 525 = 0, with solutions 11 = 5 and t\ = We compute the coordinate t2 and get touch points [5,0] and The tangents are polars of those points with equations y = 0 and 20a; — 21y = 0. □
4.D.25. Find tangents equations to circle x2 + y2 = 5 which are parallel with 2a; + y + 2 = 0.
Solution. In the projective extension, these tangents intersect at the point at infinity satisfying 2a; + y + z = 0, so in point with homogeneous coordinates (1 : — 2 : 0). They are tangents from this point to the circle. We can use the same method as in previous exercise. The conic section matrix is diagonal with the diagonal (1,1, —5) and therefore the touch point (ti : 12 : t3) of the tangents satisfies t\ — 2t2 = 0. Substitute into the circle equation to get 5t2 = 5. Since t2 = ±1, the touch points are
[2,1] and [-2,-1]. □
Solution. Alternative. The point P = v^cos 8, sin 6) lies on the circle for all 6. The tangent line at P is x cos 6 + y sin 6 = y/5. This has slope -(cos 0)/(sin 6») which is -2 provided tan 9 = 1 /2. It follows that P is at either [2,1] or [-2, -1]. □ A tangent line touching the conic section at infinity is called an asymptote. The number of asymptotes of a conic section equals the number of intersections between the conic section and the line at infinity. So the ellipse has no real asymptotes, the parabola has one (which is however a line at infinity) and the hyperbola has two.
4.D.26. Find the points at infinity and the asymptotes of the conic section denned by
4a;2 - 8xy + 3y2 - 2y - 5 = 0
Solution. First, rewrite the conic section in homogeneous coordinates.
4a;2 - 8xy + 3y2 - 2yz - hz2 = 0 the homogeneous coordinates (x : y : 0) satisfying this equation, which means
4a;2 - 8xy + 3y2 = 0.
It follows that either: ^ = — h 01 y ~ ~ I ■ The conic section is therefore a hyperbola with points at infinity P= (—1:2:0) aQ= (-3 : 2 : 0).
/ 4 -4 0\ M (-1,2,0) í-4 3 -ll í y\ =-12a; + 10y-2 = 0
a
/ 4 -4 0\ M (-3,2,0) -4 3 -1 y =-20x + 18y-2 = 0 V 0 -1 -5/ W
□
There are further exercises on conic sections on the page 269.
4.D.27. Harmonic cross-ratio. If the cross-ratio of four points lying on the line equals —1, we talk about a harmonic quadruple. Let ABCD be a quadrilateral. Denote by K the intersection of the lines AB and CD, by M the intersection of
274
CHAPTER 4. ANALYTIC GEOMETRY
the lines AD and BC. Further let L, N be the intersection of KM and AC, BD respectively. Show that the points K, L, M, N axe, a harmonic quadruple. O
275
CHAPTER 4. ANALYTIC GEOMETRY
Exercise solution
4A.9. 2, 3, 4, 6, 7, 8. Find planes the positions of which correspond to each of those numbers.
4.B.12. For the normal vector (a, b, c) of such planes ax + by + cz = d, a + b = 0 since (a,b,c) must be orthogonal to the direction of p. a = d since the plane contains (1,0,0). So the plane is ax — ay + cz = a. If a = 0, then the plane is z = 0, The angle condition requires
cos60° = \ = v^f^Vf
and by choosing a = — b = 1 (vector (0,0,1) does not satisfy the conditions, so by certain multiplication we can get a = — b = 1) we then get, using the angle condition, | v^v/c2+e21 = \ • altogether, the sought equations are x — y ± \/6 — 1 = 0.
4.B.17. (-1,3,2).
4.D.I. Line (2t, t, It) + [-5,0, -9].
4.D.2. [3,2,l][8/3,8/3,2/3].
4D.3. The transversal [1,1,1][—3,1, —1] is of length V20, so the stick is not long enough. 4.D.4. ^ 4.D.5. ^.
276
CHAPTER 5
Establishing the ZOO
which functions do we need for our models? - a thorough menagerie
A. Polynomial interpolation
Let us start with some examples which will hopefully make us more comfortable with polynomials.
5.A.I. Determine the sum of coefficients of the polynomial
(1 — 2x + 3x2 — x3)r, where r is your age in years.
Solution. The sum of coefficients of a polynomial is equal it's value in 1. Therefore the sum is (1 — 2 + 3 — l)r = V = 1.
□
In this chapter, we start using tools allowing us to model dependencies which are neither linear, nor discrete. Such models are often needed when dealing with time dependent systems. We try to describe them not only at discrete moments of time, but "continuously". Sometimes this is advantageous, for instance in physical models of classical mechanics and engineering. It might also be appropriate and computationally effective to employ an approximation of discrete models in economics, chemistry, or biology. In particular such ideas may be appropriate in relation to stochastic models, as we shall see in Chapter 10.
The key concept is that of a function, also called a "signal" in practical applications. The larger the class of functions used, the more difficult is the development of effective tools. On the other hand, if there are only a few simple types of functions available, it may be that some real situations cannot be modelled at all.
The objective of the following two chapters is thus to introduce explicitly the most elementary functions of real variables. It is also to describe implicitly many more functions, and to build the standard tools to use them. This is the differential and integral calculus of one variable. While the focus has been mainly on the part of mathematics called algebra, the emphasis will now be on mathematical analysis. The link between the two is provided by a "geometric approach". If possible, this means building concepts and intuition independently of any choice of coordinates. Often this leads to a discrete (finite) description of the objects of interest. This is immediate when working with polynomials now.
5.A.2. Determine the coefficient by a;120 of the polynomial
P(x) = (l-a; + a;2-a;3 + -
Solution.
-x4i))(l + x + xz + -
„ioi-\
P(x)
1 + x 1 — X
(1
(l-a;50)(l + a;2+a;4 + .
+ x48 - x102
1-x2
100\
+ x
= \ + xz + The coefficient by a;120 is —1.
— ■ ■ ■ — x
150
□
1. Polynomial interpolation
In the previous chapters, we often worked with sequences of real or complex numbers, i.e. with scalar functions N —> K or Z —> K, where K is a given set of numbers. We also worked with sequences of vectors over real or complex numbers.
Recall the discussion from paragraph 1.1.6, about dealing with scalar functions. This discussion is adequate to work with functions R —> R {real-valuedfunctions of one real variable), or R —> C {complex-valued functions of one real variable), or sometimes more generally the vector-valued functions of one real variable R —> V. The results can usually be
CHAPTER 5. ESTABLISHING THE ZOO
5.A.3. Prove that any real solution x0 of the equation a;3 +
px+q = 0(p,q G R) safisfies the inequality 4qx0 <
Solution. Note that xq is the solution of the quadratic equation x0x2 + px + q = 0, therefore its discriminant
p — 4:x0p is non-negative.
□
5. A.4. Let P (x) be a polynomial of degree at most n, n > 1, such that
for k = 0,1,..., n. Find P(n + 1).
Solution. LetQ(x) = (x + l)P(x) - (n + 1 - x). Note that Q(x) has degree n + 1 and the condition from the problem statement now sais that (n + 1) numbers 0,1,..., n are roots of the polynomial, that is Q(x) = K ■ x ■ (x — 1) ■ ■ ■ (x — n). Now we use the two expressions of Q(x) to determine K. On one hand Q(-l) = -(n + 2), but Q(-l) = K ■ (-l)n+1 ■ (n + 1)! as well. Thus
K =
(-!)"(« + 2)
(n + 1)!
thatisQ(n+l) = (-l)n(n+2). On the other hand from our definition of Q(x) we get Q(n + 1) = (n + 2)P(n + 1). All together P(n + 1) = (-1)™. □
5.A.5. Let P(x) be a polynomial with real non-negative coefficients. Prove, that if P(^)P(x) > 1 for x = 1 than the same inequality holds for every positive x.
Solution. Let P(x) = anxn + an_i2+_1 + ■ ■ ■ a±x + a®. From the problem statement we have P(l)2 > 1. Further
P(x)P ^-^ = (anxn + an_i2+_1 H----aix + a0)
■ (ans~n + an-ix~(n~lS> + ■ ■ ■ aix-1 + ag
n
= ai + aiaj ^xi~l+
i=0 i ■
(5)
h + Jh2 +i2 i '
i. e. ■
h+^h2+l2 h+Vh?
2h I ■
Thus we have shown that for the initial speed from (4), the player is able to score.
During the free throw, supposing the player lets the ball go at the height of 2 m, we have
h = 1.05 m, 1 = 4.225 m, g = 9.80665 m ■ s"2, and so the minimal initial speed of the ball is
v0 = y^9.80665 [l.05 + ^(1.05)2 + (4.225)2Jm ■ s"1 =
7.28 m- s"1. The corresponding angle is then
* = ^8 9.806 61.4.225 = °-907 fad « 52 °-
Let us think for a while about the obtained value of the angle p for the initial speed vq. According to the picture, we have
2/3 + (it — a) = tt and a + 7 whence it follows that
2'
5.3.7. Derivatives of the elementary functions. Consider the exponential function f(x) = ax for any fixed real a > 0. If the derivative of ax exists for all x, then
f\x) = lim
5x-s-0 Sx
= ax lim
adx - 1
<5x-s-0 Sx
= f'(0)ax
On the other hand, if the derivative at zero exists, then this formula guarantees the existence of the derivative at any point of the domain and also determines its value. At the same time, the validity of this formula for one-sided derivatives is also verified.
Unfortunately, it takes some time to verify that the derivatives of exponential functions indeed exist (see 5.4.2, i, and 6.3.7).
There is an especially important base e, sometimes known as Euler's number, for which the derivative at zero equals one.
Remember the formula (ex)' = ex for a while and draw on its consequences.
For the general exponential function, (using standard rules of differentiation),
(axY = (e111^)' = ln(a)(eln(a^) = ln(a) ■ ax.
Thus exponential functions are special since their derivatives are proportional to their values.
Next, we determine the derivative (lne(a;))'. The definition of the natural logarithm as the inverse to ex,
1 1 1
(ex)' ex y
So it holds that
allows the calculation:
(1) (ln)'(y) = (In) V)
The formula
(2) (xa)' = ax11'1
for differentiating a general power function can also be derived using the derivatives of the exponential and logarithmic functions:
(xa)' = (ealnxY = ealnx(alnx)' = a— = axa
x
5.3.8. Mean value theorems. Before continuing the journey of finding new interesting functions, we derive several simple statements about derivatives. The meaning of all of them is intuitively clear from the diagrams. The proofs follow the visual imagination.
319
CHAPTER 5. ESTABLISHING THE ZOO
V = 5-0 = 5 + i = Hf +7) = Hi +arctgf). We have obtained that the elevation angle corresponding to the throw with minimal energy is the arithmetic mean of the right angle and the angle at which the rim is seen (from the ball's position).
Rolle's theorem
The problem of finding the minimal speed of the thrown ball was actually solved by Edmond Halley as early as in 1686, when he determined the minimal amount of gunpowder necessary for a cannonball to hit a target which lies at greater height (beyond a rampart, for instance). Halley proved (the so-called Halley's calibration rule) that to hit a target at the point [/, h] (shooting from [0,0]) one needs the same minimal amount of gunpowder as when hitting a horizontal target at distance h + \/h2 + I2 (at the angle p = 45 °). Halley also demonstrated that the value of ip is stable with regard to small difference of the amount of used gunpowder and insignificant errors in estimating the target's distance. □
5.F.8. A bullet is shot at angle p from a point at height h A01// above ground at initial speed v0. It will fall on the
fr ground at distance R from the point of shot (see the picture). Determine the angle p for which the value of R is maximal.
Solution. We will express the bullet's position in time by the points [x(t), y(t)]. We assume that it was shot at time t = 0 from the point [0,0] and it will fall on the ground at the point [R,—h] at certain time t = t0, i. e. x(0) = 0, y(0) = 0, x(t0) = R, y(t0) = —h. Similarly to Halley's problem, we will consider the equations
x1 (t) = vq cos p, y1 (t) = vq sin p — gt, £ £ (0, to)
for the horizontal and vertical speeds of the bullet, where g is the gravity of Earth.
We can continue as when solving the previous problem: by integrating these equations (taking x(0) = y(0) =0 into consideration), we get
Theorem. Assume that the function f : R —> R is continuous on a closed bounded interval [a, b] and differentiable inside this interval. If f(a) = f(b), then there is a number c £ (a, 6) such that /'(c) = 0.
t-0LLt!& TttEDZBM
Proof. Since the function / is continuous on the closed interval (i.e. on a compact set), it attains its maximum and its minimum there. Either its maximum value is greater than f(a) = f(b), or the minimum value is less than f(a) = f(b), or / is constant. If the third case applies, the derivative is zero at all points of the interval (a, 6). If the second case applies, then the first case applies to the function —/.If the first case applies, it occurs at an interior point c. If /'(c) ^ 0 then the function / would be either increasing or decreasing at c (see 5.3.2), implying the existence of larger values than /(c) in a neighbourhood of c, contradicting that /(c) is a maximum value. □
5.3.9. The latter result immediately implies the following corollary.
Lagrange's mean value theorem
Theorem. Assume the function f : R —> R is continuous on an interval [a, b] and differentiable at all points inside this interval. Then there is a number c G (a, 6) such that
fib) - f(a)
f(c) =
M£AN VMMC- TtteoZEH
1 ^The French mathematician Michel Rolle (1652-1719) proved this theorem only for polynomials. The principle was perhaps known much earlier, but the rigorous proof comes from the 19th century only.
320
CHAPTER 5. ESTABLISHING THE ZOO
x(t) = Vgt COS p, y(t) = Vgt sill if — | gt2, i 6
((Mo),
and from the conditions limt^to_ x(t) = x(t0) = R, limty(t) = y(t0) = —h, we then have that
R = i>nin cos p, — h = i>oin sin 1,2 — ^ gig.
From the first equation, it follows that
to =
r
Vq cos if '
so we can express the previous two equations by the single equation
gR2
(1)
-h = R\na.p- 2
2vq cosz p
where p G (0,7r/2).
Unlike with Halley's problem, the value of v0 is given and R is variable (dependent on p). So, actually, there is a function R = R(p) (in variable p) which must satisfy (1) (it is determined by the equation (1)). Thus, this function is given implicitly. The equation (1) can be written as (R is substituted by R(p))
R(p) tan p ■ 2vg cos2 p — gR2(p) + h ■ 2v$ cos2 p = 0. Using the relation
2 tan p cos2 p = sin 2p, we can transform (1) into the form
(2) R( - gR2(p) + 2hv\ cos2 p = 0.
Differentiating with respect to p now gives
R1 (p)vl sin2p + 2R{ip)vlcos2p - 2gR(p)R'(p) -2IiVq (2 cos p sin p) = 0,
i. e.
R'(p) [v2sm2p-2gR(p)] = —2R(p)vq cos 2p + 2hvg sin 2p. Thus we have calculated that
It suffices to verify that sin2i,s — 2gR(p) ^ 0 for every p G (0,7r/2). Let us suppose the contrary and substitute
R
into (1), obtaining
_ vQ sin 2(p _ v0 sin (p cos
29
9
_^ _ vl sin ip cos y _ fffo sin2 ip cos2 y
g ^ 2g2t;2 cos2 (y9
Simple rearrangements lead to
, _ vl sin2 tp
which cannot happen (the left side is surely negative while the right one is positive).
Proof. The proof is a simple statement of the geomet-itySS^ r'ca' meaning of the theorem: The secant line ^Bf// between the points [a, f(a)] and [b, /(&)] has a tangent line which is parallel to it (have a look at the diagram). The equation of the secant line is
y = 9(x) = f(a) +-^ZTa-(x ~ a>-
The difference h(x) = f(x) — g(x) determines the (vertical) distance of the graph and the secant line (in the values of y). Surely h(a) = h(b) and
h'(x) = f\x) -
f(b) - fa) b — a
By the previous theorem, there is a point c at which h'(c) = 0. □
The mean value theorem can also be written in the form:
(1) f(b) = f(a) + f(c)(b-a).
In the case of a parametrically given curve in the plane, i.e. a pair of functions y = j(t), x = g(t), the same result about the existence of a tangent line parallel to the secant line going through the boundary points is described by Cauchy's mean value theorem:
Cauchy's mean value theorem
Corollary. Let functions y = f(t) and x = g(t) be continuous on an interval [a, b] and differentiate inside this interval, and further let g'(t) =/ 0 for all t G (a,b). Then there is a point c G (a, 6) such that
fb)-f(a) =f(c) g(b)-g(a) g'(c)-
Proof. Put
h(t) = (/(b) - fa))g(t) - (g(b) - g(a))ft).
Now h(a) = f(b)g(a) - f(a)g(b), h(b) = f(b)g(a) -f(a)g(b), so by Rolle's theorem, there is a number c G (a, b) such that ti(c) = 0.
Finally, the function g is either strictly increasing or decreasing on [a, b] and thus g(b) =/ g(a). Moreover, g'(c) ^ 0 and the desired formula follows. □
5.3.10. A reasoning similar to the one in the above proof leads to a supremely useful tool for calculating limits of quotients of functions.
321
CHAPTER 5. ESTABLISHING THE ZOO
So we were able to determine R'( 0+ and for <£> —> 7r/2— the value of R decreases) and is differentiable at every point of this interval, it has its maximum at the point where its derivative is zero. This means that R(p) can be maximal only if
(3) R(p) = hum2ip.
Let us thus substitute (3) into (2). We obtain
h tan 2p Vq sin 2p — gh2 tan2 2p + 2hvg cos2 p = 0,
and let us transform this equation:
tan 2p v2, sin 2p + 2vq cos2 p = gh tan2 2p,
«o2 + uo (cos 2p + 1) = gh
v2, sin2 2p + vl cos2 2p + cos 2p = gh cos 2 Uo2 + Uo2cos2^ = 5/l1^^, «2 (1 + cos 2p) = gh (1-cos2c^(21y+cos2y), i>q cos 23 = g/i (1 — cos 2p), and cos 2p = vv+gh •
L'Hopital's rule12
cos2 2(y9 '
= „?, S'n2 2lP
However, by this we have uniquely determined the point
0 - 2'
i arccos
at which i? is highest. Since sin 2p0 = \/l — cos2 2y>o
r--ip- Jv%+2ghv%
i/l — , 9 , ^ = —7-r-u—, we have
\/"o+2g'"'o
R (po) = h tan 2y>o = /i ■
^ug+2gto2
Let, for instance, javelin thrower Barbora Spotakova give ajavelin the speed v0 = 27.778 m/s = 100 km/hat the height /i = 1.8 m (with g = 9.806 65 m ■ s~2). Then the javelin can fly up to the distance
R (Vo) = 92806765 V27.7782+2 ■ 9.806 65 ■ 1.8 m= 80.46m. This distance was achieved for
9.806 65-1.8
0.774 2 rad « 44.36c
However, the world record of Barbora Spotakova does not even approach 80 m although the impact of other phenomena (air resistance, for example) can be neglected. Still we must not forget that from 1 April 1999, the center of gravity of the women's javelin was moved towards its tip upon the decision of IAAF (International Association of Athletics Federation). This reduced the flight distance by around 10 %.
Theorem. Suppose f and g are functions differentiable on some neighbourhood of a point x0 G R, yet not necessarily at x0 itself. Suppose
lim f(x) = 0, lim g(x) = 0.
X—YXq X^Xq
If the limit
exists, then the limit
lim
x-*x0 g'yx)
lim
/(*)
x^x0 g(x)
also exists, and the two limits are equal.
i t
j ' ' l'HosrmL mifi-11 / ' -
Proof. Without loss of generality, the functions / and g are zero at the point x0. The quotient of the values then corresponds to the slope of the secant '^--^ line between the points [0,0] and [j(x),g(x)]. At the same time, the quotient of the derivatives corresponds to the slope of the tangent line at the given point. Thus it is necessary to verify that the limit of the slopes of the secant lines exists from the fact that the limit of the slopes of the tangent lines exists.
Technically, we can use the mean value theorem in Cauchy's parametric form. First of all, the existence of the expression /' (x) jg' (x) on some neighbourhood of the point x0 (excluding xq itself) is implicitly assumed. Thus especially for points c sufficiently close to x0, g'(c) 4 0.13 By the mean value theorem,
lim
lim
f(x) - f(x0)
lim
f'(Cx)
x^xo g(x) x^x0 g(x) - g(x0) x^x0 g>(cx)
12Guillaume Francois Antoine, Marquis de l'Hopital, (1661-1704) became famous for his textbook on Calculus. This rule was first published there, perhaps originally proved by one of the famous Bernoulli brothers.
13This is not always necessary for the existence of the limit in a general sense. Nevertheless, for the statement of l'Hospital's rule, it is. A thorough discussion can be found (googled) in the popular article 'R. P. Boas, Counterexamples to L'Hospital's Rule, The American Mathematical Monthly, October 1986, Volume 93, Number 8, pp. 644-645.'
322
CHAPTER 5. ESTABLISHING THE ZOO
The original record (with "correctly balanced" javelin) was 80.00 m.
The performed reasoning and the obtained result can be applied to other athletic disciplines and sports. In golf, for instance, h is close to 0, and thus it is just the angle
Po = lim i arccos , = A arccos 0=4 rad = 45 °
at which the ball falls at the greatest distance
R = hl™0\ f ^° + 2gh = I1' Let us realize that our calculation cannot be used for h = 0 (ip0 = 7r/4) since then we would get the undefined expression tan (7r/2) for the distance R. However, we have solved the problem for any h > 0, and therefore we could get a helping hand form the corresponding one-sided limit. □
5.F.9. Regiomontanus' problem, 1471.
In the museum, there is a painting on the wall. Its lower -0T> edge is a meters above ground and its upper edge b meters, then (its height thus equals b — a). A tourist is looking at the painting, her eyes being at height h < a meters above ground. (The reason for the inequality h < a can, for instance, be to allow more equally tall visitors to view the painting simultaneously in several rows.) How far from the wall should the tourist stand if she wants to maximize her angle of view at the painting?
Solution. Let us denote by x the distance (in meters) of the tourist from the wall and by p her angle of view at the painting. Further, let us set (see the picture) the angles a, /3 e (0,7r/2) by
tana = ^, tan/3 =
Our task is to maximize ip = a — (3. Let us add that for h > b, one can proceed analogously and for h e [a, b], the angle ip increases as x decreases (p = it for x = 0 and h e (a, &)).
where cx is a number lying between x0 and x, dependent on x. From the existence of the limit
lim -77-^,
x-s-x0 g'{x)
it follows that this value will be shared by the limit of any sequence created by substituting the values x = xn approaching xq into f'(x)/g'(x) (cf. the convergence test 5.2.15). Especially, we can substitute any sequence cXn for xn —> x0, and thus the limit
x^xo g'(cx)
exist, and the last two limits are equal. Hence the desired limit exists and has the same value. □
From the proof of the theorem, it is true for one-sided limits as well.
5.3.11. Corollaries. L'Hospital's rule can easily be extended for limits at the improper points ±00 and for the case of infinite values of the limits. If, for instance, we have
lim f(x) = 0, lim g(x) = 0,
x—soo x—soo
then limx^0+ f0-/x) = 0 and limx^0+ 90-/x) = 0.
At the same time, from existence of the limit of the quotient of the derivatives at infinity,
lim Wm - lim
x-0+ (5(1/2;))' x-o+ g'(l/x)(-l/x2)
lim
f'0-M
lim
x-s-o+ g'(\/x) x-s-oo g'(x) Applying the previous theorem, the limit
lim
/(*)
lim
fO-M
lim
x-s-oo g(x) X-S-0+ g(\/x) x-s-oo g'(x)
exists in this case as well.
The limit calculation is even simpler in the case when
lim f(x) = ±00, lim g(x) = ±00.
X—YXq X—YXq
Then it suffices to write
lim
lim
x-s-xo g(x) x-s-xo l//(a;)'
which is already the case of usage of l'Hospital's rule from the previous theorem. It can be proved that l'Hospital's rule has the same form for infinite limits as well:
Theorem. Let f and g be functions differentiable on some neighbourhood of a point xq G R, not necessarily at xq itself. Further, let the limits Ym\x^Xo f(x) = ±00 and ]imx-tXo g(x) = ±00 exist. If the limit
/'(*)
exists, then the limit
lim
x->x0 g'(x)
ii„ M
x-s-xo g(x)
also exists and they equal each other.
323
CHAPTER 5. ESTABLISHING THE ZOO
From the condition h < a it follows that the angle 0 for a e (O, y(b- h)(a- /i)) , f'(x)<0 for x e (V(&-/i)(a-/i),+oo) .
Hence the function / has its global maximum at the point
x0 = y7'(6 — h)(a — h) (let us remind the inequalities h < a < b).
The point x0 can, of course, be determined by other means. For instance, we can (instead of looking for the maximum of the positive function / on the interval (0, +oo)) try to find the global minimum of the function
g(x)
+
_ 1 _ x2 + (b-h)(a-h) _
f(x) x(b—a) (b-h)(a-h)
x(b—a)
x e (o, +oo)
with the help of the so-called AM-GM inequality (between the arithmetic and geometric means)
Proof. Apply the mean value theorem. The key step is to express the quotient in a form where the derivative arises:
f(x) _ f(x) /(s) - f(y) g(s) - g(y)
g(x) f(x)-f(y) g(x)-g(y) g(x)
where y is fixed, from a selected neighbourhood of xq and x is approaching x0. Since the limits of / and g at x0 are infinite, we can surely assume that the differences of the values of both functions at x and y, having fixed y, are non-zero.
Using the mean value theorem, replace the fraction in the middle with the quotient of the derivatives at an appropriate point c between x and y. The expression of the examined limit thus gets the form
g(x)
i -
g(y) g(x)
1 -
9'(cY
where c depends on both x and y. As x approaches x0, the former fraction converges to one. If y is simultaneously moved towards x0, the latter fraction becomes arbitrarily close to the limit value of the quotient of the derivatives. □
5.3.12. Example. By making suitable modifications of the examined expressions, one can also apply l'Hopital's rule on forms of the types oo — oo, 1°°, 0 ■ oo, and so on. Often one simply rearranges the expressions or uses some continuous function, for instance the exponential one.
For an illustration of such a procedure, we show the connection between the arithmetic and geometric means of n non-negative values Xi. The arithmetic mean
M (x1,...,xn) =-
n
is a special case of the power mean with exponent r, also known as the generalized mean:
xr,+
+ xi
Mr(Xl,...,xn) =
The special value M-1 is called the harmonic mean. Calculate the limit value of Mr for r approaching zero. For this purpose, determine the limit by l'Hopital's rule (we treat it as an expression of the form 0/0 and differentiate with respect to r, with x{ as constant parameters).
The following calculation uses the chain rule and knowledge of the derivative of the power function, must be read in reverse. The existence of the last limit implies the existence of the last-but-one, and so on.
lim \n(Mr(Xl,... , aj) = lim m^K + '■■ + <))
In X±-\-----\~xn ln xn
= lim r " r-
r—>0 rxn
n
lnai + ■ ■ ■ + lna„
In ýxx.....xn.
Hence
lim Mr(xi, ...,i„) = Á/ai ...xr,
v—so
324
CHAPTER 5. ESTABLISHING THE ZOO
^^VvTm, 2/1,2/2 >0, where the equality occurs iff yi = y2. The choice
2/1 (z) = bf^, V2{x) = (b~'ll'La~)h)
then gives
d{x) = yi(x) + y2(x) > 2 \Jyi{x) y2(x) = bhV(b-h) (a-h). Therefore, if there is a number x > 0 for which y± (x) = y2(x), then the function g has the global minimum at x. The equation
yi(x)=y2(x), i.e. ^=(-^1,
has a unique positive solution x0 = \J(6 — h)(a — h).
We have determined the ideal distance of the tourist from the wall in two different ways. The angle corresponding to x0 is
*a(b-a) _ „„„t„„ _b-a
0, ve(o,2), f'(p) R™ and a k-linearform & : R™ x ... x R™ on the space R™. Then the derivative of the composed map
ri,
■,rk-i,
-, x e V.
drk^
dt ^v dt -1 • • -v->-■■>■*—> dt )■
(3) The previous statement remains valid even if & also has values in the vector space, and is linear in all its k arguments.
Proof. (1) The linear maps are given by a constant matrix of scalars A = (a^) so that
\p o r(t) = I ^aHrj(i),..., 'S^amiri(t) j.
386
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
The points at which f'(x) =0 are clearly the solutions of the equation sin x = 0, x £ V, i.e. the derivative is zero at points x3 = Q,x4 = it. The inequalities
2 cos2 x+1 > cos2 2x > 0, sinx > 0, xG79n(0,7r)
imply that / is increasing at every inner point of the set V, thus / is increasing on every subinterval of V. The even parity of / then implies that it's decreasing at every point x £ (—7r, 0), x ^ —3ir/4, x ^ — 7r/4. Hence the function has strict local extremes exactly at the points
Xk = ItTT, k £ Z.
With respect to periodicity of /, we uniquely describe these extremes by stating that for x$ = xq = 0, we get a local minimum (recall the value of the function / (0) = 1) and for x4 = x1 = it, a local maximum with the value / (it) = —1. Let's compute the second derivative
/.// / \ [4 cos x( — sin x) sin x-\- (2 cos2 x+l) cos x] cos2 2x
t (x) = J-----v=-'-1-
J \ / cos4 2x
4 cos 2x{— sin 2x) (2 cos2 x+l) sin x cos4 2x
[10 sin2 x cos2 x+2 cos4 x+cos2 x+4 sin4 x+7 sin2 x] cos x cos3 2x '
x e V. Note that after a few simplifications, we can also express
put \ (3+4 cos2 x sin2 x+8 sin2 x) cos x / (z) = ^-c^Hte-'-. x E D
or
/./// \ f 11— 4 cos4 x—4 cos2 x) cos x _ ,~
f (x) = 1-J-;;-'-, x £ V.
J \ / cos3 2x '
Since
10 sin2 x cos2 a; + 2 cos4 a; + cos2 x + 4 sin4 a; + 7 sin2 a; > 0, x £ R,
or
3 + 4 cos2 a; sin2 a; + 8 sin2 x = 11 — 4 cos4 a; — 4 cos2 a; >
3, a; £ R
respectively, we have f"(x) = 0 for certain x e V if and only if cos x = 0. But that's satisfied only by x5 = tt/2 £ 2?. It's clear that /" changes its sign at this point, i.e. it's a point of inflection. No other points of inflection exist (the second derivative /" is continuous on V). Other changes of the sign of /" occur at zero points of the denominator, which we have already determined as discontinuities x\ = tt/4 and x2 = 3tt/4. Hence the sign changes exactly at points x\, x2, x5, thus the inequality
Carry out the differentiation separately for individual coordinates of the result. However, the derivative acts linearly with respect to scalar linear combinations, see Theorem 5.3.4. That is why the derivative is obtained simply by evaluating the original linear map 0 pro x -> 0+
implies that / is convex on the interval [0,7r/4), concave on (7r/4,7r/2], convex on [tt/2, 3tt/4) and concave on (37r/4, it]. The convexity and concavity of / on other subintervals is given by its periodicity and a simple observation: if a function is even and convex on an interval (a, b), where 0 < a < b, then it's also convex on (—b, —a).
All that's left is computing the derivative (to estimate the speed of the growrth of the function) at the point of inflection, yielding /' (tt/2) = 1. Based on all previous results, it's now easy to plot the graph of function /. □
6.A.36. Determine the course of the function
ln(x)
and plot its graph.
Solution, i) First we'll determine the domain of the function:
R+\{1}.
ii) We'll find the intervals of monotonicity of the function: first we'll find zero points of the derivative:
ln(a;) - 1 In2(x)
= 0
The root of this equation is e. Next we can see that f'(x) is negative on both intervals (0,1) and (1, e), hence j(x) is decreasing on both intervals (0,1) and (1, e). Additionally, j'(x) is positive on the interval (e, oo), thus f(x) is increasing here. That means the function / has the only extreme at point e, being the minimum, (we can also decide this using the sign of the second derivative of the function / at point e, because/(2)(e) > 0).
iil) We'll find the points of inflection:
/(2)(-) =
ln(x)
And(x)
= 0
The root of this equation is e2, so it must be a point of inflection (it cannot be an extreme with regard to the ptrevious point).
iv) The asymptotes. The line x = 1 is an asymptote of the function. Next, let's look for asymptotes with a finite slope k:
k = lim r
ln(x)
x-i-oo ln(x)
This corresponds to the idea that after the choice of a parametrization with a derivative of constant length, the second derivative in the direction of the movement vanishes. The second derivative lies in the plane orthogonal to the tangent vector.
^
re> i/( kmc.
If the second derivative is nonzero, the normed vector
1
n(s)
Tr"(s)
0.
\\r"(s)\\
is the (principal) normal of the curve r(s). The scalar function k(s) satisfying (at the points where r"(s) ^ 0)
r"(s) = k(s)tj(s)
is called the curvature of the curve r(s). At the zero points of the second derivative k(s) is defined as 0.
At the nonzero points of the curvature, the unit vector b(s) = r' (s) x n(s) is well defined and is called the binormal of the curve r(s). By direct computation
0 = ±(b(s),r'(s)) = (bf(s),r>(s)) + (b(s),r"(s))
= (b'(s,r'(s)) + K(S)(b(s),n(s)) = {b'(s), r'(s)),
which shows that the derivative of the binormal is orthogonal to r'(s). b'(s) is also orthogonal to b(s) (for the same reason as with r' above). Therefore it is a multiple of the principal normal n(s). We write
b'(s) = -t(s)ti(s).
The scalar function r (s) is called the torsion of the curve r(s).
In the case of plane curves, the definitions of binormal and torsion do not make sense.
We have not yet computed the rate of change of the principal normal, which can be written as n(s) = b(s) x r'(s):
n'(s) = b'(s) x r'(s) + n(s)b(s) x n(s)
= —t(s)n(s) x r'(s) + k(s)(—r'(s))
= r(s)&(s) — «(s)r'(s).
Successively, for all points with nonzero second derivative of the curve r(s) parametrized by the arc length, there is derived the important basis (r'(s),n(s), b(s)), called the Frenet frame in the classical literature. At the same time, this
388
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
If the asymptote exists, its slope must be 0. Let's continue the computation
x-s-oo \n(x)
0 ■ x = lim \n(x) = oo,
and because the limit isn't finite, an asymptote with a finite slope doesn't exist.
The course of the function:
10 -
y 5-
10 15 20
x
-10 -
□
Now move from determining the course of functions onto other subjects connected to derivatives of functions. First we'll demonstrate the concept of curvature and the osculating circle on an ellipse
6.A.37. Determine the curvature of the ellipse x2 +2y2 = 2 at its vertices (4.C.9). Also determine the equations of the circles of osculation at these vertices.
Solution. Because the ellipse is already in the basic form at the given coordinates (there are no mixed or linear terms), the given basis is already a polar basis. Its axes are the coordinate axes x and y, its vertices are the points [\f2, 0], [— \/2, 0], [0,1] and [0, —1]. Let's first compute the curvature at vertice [0,1]. If we consider the coordinate y as a function of the coordinate x (determined uniquely in a neighbourhood of [0,1] ), then differentiating the equation of the ellipse with respect to the variable x yields 2x + 4yy' = 0, hence y' = — ^ (y' denotes the derivative of function y(x) with respect to the variable x; in fact it's nothing else than expressing the derivative of a function given implicitly, see ??). Differentiating this
basis is used in order to express the derivatives of its components in the form of the Frenet-Serret formulas
^-(s) = K{s)n(s), ^-(s) = T(s)b(s) - K(s)r'(s)
db
ds
(s) = -T(s)n(s).
The following theorem tells how crucial the curvature and torsion are. Notice that if the curve r (s) lies in one plane, then the torsion is identically zero. In fact, the converse is true as well. We shall not provide the proofs here.
Theorem. Two curves in a space parametrized by the length of their arc can be mapped to each other by an Euclidean transformation if and only if their curvature functions and torsion functions coincide except for a constant shift of the parameter. Moreover, for every choice of smooth functions k a t there exists a smooth curve with these parameters.
By a straightforward computation we can check that the curvature of the graph of the function y = j(x) in plane and the curvature k of this curve defined in this paragraph coincide. Indeed, comparing the differentials of the length of the arc for the graph of a function (as a curve with coordinates x(t), y(t) = /(*(*))):
dt=(l + (fx)2y/2dx, dx = (l + (fx)2)-y2dt
(here we write /i = £) we obtain the following equality for the unit tangent vector of the graph of a curve
r'(s) = (x'(s),y'(s)) = ((l+a)2)"1/2 /,(l + (/,)2)"1/2).
A messy, but very similar computation for the second derivative and its length leads to
«2 = lk"f = (0)2(i + (/,)2)-3
as expected. If we write r = (x,y), y' = fxx', x' = (1 + J2)"1/2, then
2^ )' ^fxfxx-^ (*£ ) fxfxx
y" = fxx{x'f + fxX" = fxx{x')2 - fxxfx{x')\ Hence
(*"? + (y"f = fUx'f (fx + (i + /2)2 + ft
-2/2(l + /2))
= /L(i + /2)-4(/2 + i)
= slx(} + slr3-
6.1.16. The numerical derivatives. In the begining of this textbook we discussed how to describe the values in a sequence if its immediate differences ftV are known, (c.f. paragraphs 1.1.5, 1.2.1). Be-'"^ov^^p^— fore proceeding the same way with the derivatives we clarify the connections between derivatives and differences. The key to this is the Taylor expansion with remainder.
389
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
equation with respect to x than yields y" = — 5 (^ — -pr) ■ At point [1,0], we obtain y' = 0 and y" = — \ (we'd receive the same results if we explicitly expressed y = | V2 — x2 from the equation of the ellipse and performed differentiation; the computation would be only a little more complicated, as the reader can surely verify). According to 6.1.12, the radius of the osculation circle will be
(i + (yr
(y"
= -2,
or 2, respectively, and the sign tells us the circle will be "below" the graph of the function. The ideas in 6.1.12 and 6.1.15 imply that its center will be in the direction opposite to the normal line of this curve, i.e. on the y axis (the function y as a function of variable x has a derivative at point [0,1], thus the tangent line to its graph at this point will be parallel to the x axis, and because the normal is perpendicular to the tangent, it must be the y axis at this point). The radius is 2, so the center will be at point [0,1 — 2] = [0, —1]. In total, the equation of the osculation circle of the ellipse x2 + 2y2 = 2 at point [0,1] will be x2 + (y + l)2 = 4. Analogously, we can determine the equation of the osculation circle at point [0, —1]: x2 +(y—1)2 = 4. The curvatures of the ellipse (as a curve) at these points then equal \ (the absolute value of the curvature of the graph of the function).
For determining the osculation circle at point W/2, 0], we'll consider the equation of the ellipse as a formula for the variable x depending on the variable y, i.e. a; as a function of y (in a neighbourhood of point W/2,0], the variable y as a function of x isn't determined uniquely, so we cannot use the previous procedure - technically it would end up by diving by zero). Sequentially, we obtain: 2xx' + Ay = 0, thus x' = -2|, and x" = -2(± - ^). Hence at point W/2,0], we have x' = 0 and x" = — \/2 and the radius of the circle of osculation is p = — = ^ according to 6.1.12. The normal line is heading to —oo along the x axis at point W/2, 0], thus the center of the osculation circle will be on the x axis on the other side at distance hence at the point [y/2 - 0] = [^,0]. In total, the equation of the circle of osculation at vertice W/2,0] will be (x — -^2)2 + y2 = \. The curvature at both of these vertices equals \/2.
Suppose that for some (sufficiently) differentiable function f(x) defined on the interval [a, 6], the values fi = f(xi) at the points x0 = a, x1, x2, ■ ■ ■, xn = b, are given while x{ — Xi-i = h for some constant h > 0 and all indices i = 1,..., n. Write the Taylor expansion of function / in the form
f(xt ±h) = fi± hf(xt) + y/"(zi) ± y/(3)(^) + • • •
Suppose the expansion is terminated at the term containing hk which is of order k in h. Then the actual error is bounded by
hk+i
(fc + l)!l; [)l
on the interval [xi — h,Xi + h]. If the (k + l)th derivative / is continuous, it can be approximated by a constant. Then for small h, the error of the approximation by the Taylor polynomial of order k acts like hk+1 except for a constant multiple. Such an estimation is called an asymptotic estimation.
Asymptotic estimates
Definition. The expression G(h) is asymptotically equal to
F(h) for /i -5- 0. Write G(h) = 0(F(h)), if the finite limit
G(h) lim ttttt- = a e R
h^o F(h)
exists.
Similarly, compare the expressions for h —> oo and use the same notation.
Denote the values of the derivatives of j(x) at the points
a)
Xi as ft'. Write the Taylor expansion as:
£11 £111 h±l=h±f[h+J-j-h2±^-hi + ...
I D
Considering combinations of the two expansions and /, itself, we can express the derivative as follows
fi+i - fi-i = ,i h?_ A3) 2h h 3!h
fi + l — fi _ rl . h rll .
h ~ h + 2\Jt
fi — fi — 1 _ rl h „
h ~ Jt 2\Jt This suggests a basic numerical approximation for derivatives:
Central, forward, and backward differences
The central difference is defined as f[ = ^'+12lf' 1, the/or-— ^'+1h~^', and the backward differ-
ward difference is f[
zA
h
■ £l fi-fi-
ence is f[ — ———
If we use the Taylor expansions with remainder of the appropriate order, we obtain an expression of the error of the
390
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
approximation by the central difference in the form
□
6.A.38. Remark. The vertices of an ellipse (more generally the vertices of a closed smooth curve in plane) can be defined as the points at which the function of curvature has an extreme. The ellipse having four vertices isn't a coincidence. The so called "Four vertices theorem" states that a closed curve of the class C3 has at least four vertices. (A curve of the class C6 is locally given parametrically by points [f(t),g(t)] e R2, t e (a, b) c R, where / and g are functions of the class C3 (R).) Thus the curvature of the ellipse at its any point is between its curvatures at its vertices, i.e. between \ and \/2.
B. Integration
We start with an example testing the understanding the concept of Riemannian integration.
6.B.I. Let y = x | on the interval I = [—1,1] and let
-1,
,0,
,1
be a partition of the interval I for arbitrary n e N. Determine Ssn , SUp and Ssn , m (the upper and lower Riemann sum corresponding to the given partition).
Based on this result decide if the function y = x | on [—1,1] is integrable (in Riemann sense). O
And now some easy examples that everyone should handle.
6.B.2. Using integration "by heart", express
(a) J zTx dx,
(b) I-
dx, x e (-2,2);
(c) / dx,
(d) f^kdx,x^-l.
Solution. We can easily obtain
3!
ft2(/(3)(^+^)-/(3)(^-^))
Here, 0 < £, r\ < 1 are the values from the remainder expression of fi+i and fi-i, respectively. The error of the second derivative in the other two cases is obtained similarly. Thus, under the assumption of bounded derivatives of third or second order, the asymptotic estimates are computed:.
Theorem. The asymptotic estimate of the error of the central difference is 0(h2'). The errors of the backward and forward differences are 0(h).
Surprisingly, the central difference is one order better than the other two. But of course, the constants in the asymptotic estimates are important, too. In the case of the central difference, the bound on the third derivative appears, while in the two other cases second derivatives show up instead.
We proceed the same way when approximating the second derivative. To compute f"(xi) from a suitable combination of the Taylor polynomials, we cancel both the first derivative and the value at x{. The simplest combination cancels all the odd derivatives as well:
/z+1'2/t{z + /z-1=/i2) + g/(4)(^) + .-..
This is called the second order difference. Just as in the central first order difference, the asymptotic estimate of the error is
f(2) = fi+l^±l_±+0(h2).
Notice that the actual bound depends on the fourth derivative of/.
2. Integration
6.2.1. Indefinite integral. Now, we reverse the procedure of differention. We want to reconstruct the actual values of a function using its immedi-ate changes. If we consider the given function j(x) as the (say continuous) derivative of an unknown function F(x), then at the level of differentials we can write dF = f(x) dx.
We call the function F the primitive function or the indefinite integral of the function /. Traditionally we write
F(x) = J f(x)dx.
Lemma. The primitive function F(x) to the function f(x) is determined uniquely on each interval [a, b] up to an additive constant.
Proof. The statement follows immediately from Lagrange's mean value theorem, see 5.3.9. Indeed, if F'(x) = G'(x) = j(x) on the whole interval [a, b], then the derivative of the function (F — G) (x) vanishes at all points c of the interval [a, b]. The mean value theorem implies that for all points x in this interval,
F(x) - G{x) = F(a) - G(a) + 0-(x-a).
391
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
■ dx
(a) J e x dx = — J —e x dx = — e x + C;
(b) f ,} , dx = f . 2 dx = arcsin § + C;
^arctg^+C;
(d) / ^3+3x+2 da; = ln I X'3 + 3x + 2 I + >
where we used the formula J 4^ dx = ln | f(x) | + C.
□
6.B.3. Compute the indefinite integral
j(jx +4e* - + 9sin5a; + 2cos§ -for x 3, a; -| + fezr, fc G Z.
Solution. Only by combining the earlier derived formulas, we obtain
Thus the difference of the values of the functions F and G is constant on the interval [a,b]. □
The previous lemma supports another notation for the indefinite integral:
+
cos2 x 3—x
dx
J[7X + 4eT - ± + 9sin5a; + 2cosf
cos2 x 3—x
+ T.
dx
F(x) = J f(x) dx + C with an unknown constant C.
6.2.2. Newton integral. We consider the value of a real function j(x) as an immediate increment of the region bounded by the graph of the function / and the x axis and try to find the area of this region between boundary values a a & of some interval. We relate this idea with the indefinite integral.
Suppose we are given a real function / and its indefinite integral F(x), i.e. F'(x) = j(x) on the interval [a, b].
Divide the interval [a, b] into n parts by choosing the points
ln7- + 6e31 + 2^Tn2
| cos 5x + 4 sin | — 3 tg x -
X0 < X\ <
<^ xn
ln | 3 - x | + C.
□
For expressing the following integrals, we'll use the method of integration by parts (see 6.2.3).
6.B.4. Compute J a; cos a; dx, x G Rand Jin a; da;,
x > 0;
Solution.
u = lax u' = -
X
v' = 1 V = X
Approximate the values of the derivatives at the points Xi by the forward differences. That is, by the expressions
f(xi)=F'(xi)
■F(Xi)
Xi-\-\ Xi
Finally the sum over all the intervals of our partition yields the approximation of the area:
ln x dx ■
i=0
= a;lna; — / 1 da; = a; ln a; — x + C.
x cos x dx =
u = x u1 = 1 v1 = cos a; v = sin a;
sin x dx = x sin x + cos x + C.
□
-£f(Xi)(Xi+1-Xi) ^E^^'H^+i-.
i=0
n-1
^(F(xi+1)-F(Xi)) = F(b) - F(a).
Therefore we expect that for "nice enough" functions f(x), the area of the region bounded by the graph of the function and the x axis (including the signs) can be calculated as a difference of the values of the primitive function at the boundary points of the interval. This procedure is called the Newton integration?
6.B.5. Using integration by parts, compute
(a) / (x2 + 1) e~x dx, x G R,
(b) J(2x - l)lnx dx, x > 0,
(c) Jarctga; dx, x G R,
(d) / ex sm(fJx) dx, x,/3eR,
Isaac Newton (1642-1726) was a phenomenal English physicist and mathematician. The principles of integration and differentiation were formulated independently by him and Gottfried Leibniz in the late 17th century. It took nearly another two centuries before Bernhard Riemann introduced the completely rigorous modern version of the integration process.
392
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
Solution. First emphasise that by integration by parts, we can compute every integral in the form of
J P(x) abx dx, J P(x) sin (bx) dx, J P(x) cos (bx) dx,
fP(x)lognaxdx, J P(x) arcsin (bx) dx, J P(x) arctg (bx) dx,
f xb log™ (kx) dx, J P(x) arccos (bx) dx, J P(x) arccotg (bx) dx,
l (cx) dx, J i
! (cx) dx,
where P is an arbitrary polynomial and
a £ (0,1) U (1, +oo), 6,c£R\{0}, n £ N, k > 0. Thus we know that (a)
F(x) G'(x)
J (x2 + 1) e~x dx = F'(x) = 2x G(x) = -e"
x2 + l = e~x
(b)
- (x2 + 1) &~x + J2xe~x dx = F(x) = 2x F'(x) = 2 G'(x)=e~x G(x) = -e~x x2 + 1) &~x - 2x &~x + J 2 &~x dx x2 + 1) &~x - 2x &~x - 2 &~x + C --t~x (x2 + 2x + 3) + C;
J(2x — 1) lna; dx =
F(x) = lna; G'(x) = 2x-\
F'(x) = 1/a G(x) = x2 -
(x2 — x) In x — f dx = (x2 — x) lna; + f 1 — x dx = (x2 — x) lna;+a; —^- + C;
(c)
/ arctg x dx
F(x) = arctg a; G'(x) = 1
F'(x) = G(x) = x
; arctg x - J jf^; dx = x arctg x - |J dx = x arctg x — \ In (l + x2) + C;
(d)
J ex sin(ßx) dx = F(x) =ex I F'(x) =ex \_
G'(x) = sm.(ßx) J G(x) = -i cos ßx | ~
— jex cos(ßx) + i J ex cos(ßx) dx = F(x) = &x I F'(x) =ex \_
G'(x) = cos(ßx) I G(x) = i sin(ßx) | ~ — j&x cos(ßx) + j2-ex smQßx) — J ex sm(ßx) dx,
which implies
J ex sin a; dx = 11ß2 ex (sm(ßx) — ß cos(ßx)) + C.
□
' 1, >
^ >o, Ki. yCj, io
Newton integral
If F is the primitive function to the function / on the interval [a, b], then we write
^ f(x)dx=[F(x)]ba = F(b)-F(a)
and call it the Newton (definite) integral with the bounds a and b.
We prove later that for all continuous functions / £ C° (a, b) the Newton integral exists and computes the area as expected. This is one of the fascinating theorems in elementary calculus. Before going into this, we discuss how to compute these integrals.
The primitive functions are well denned for complex functions /, where the real and the imaginary part of the indefinite integrals are real primitive functions to the real and the imaginary parts of /. Thus, with no loss of generality, we work only with real functions in sequel.
6.2.3. Integration "by heart". We show several procedures 1 for computing the Newton integral. We exploit the knowledge of differentiation, and look for primitive functions. The easiest case is the one where the given function is known as a derivative. To learn such cases, it suffices to read the tables for function derivatives in the menagerie the other way round. Hence:
393
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
For expressing the following integrals, it's convenient to use the substitution method (see 6.2.5).
6.B.6. Using a suitable substitution, determine
Integration table
(a) J ^2x - 5 dx, x > §;
(b) dx,x>0;
Solution. We have
(a)
(b)
(c)
(d)
J \l2x — 5 dx =
t = 2x-5 dt = 2dx
= W Vidt
= IfT +C = ^(2x-5)3 + C;
t = 7 + In x dt = - dx
X
_ (7+ln x '
= ft7dt=^+C
+ C;
In
(1+sin x)2
dx —
t = 1 + sin x dt = cos x dx
_ r dt - J t2
1+sin x
r cos x ^x .
\/l+sin2 x
t = sin x dt = cos x dx
u = t + Vl + t2 > 0
du = 1 +
dt
/l+t2 1 -dt
J ^du = \nu + C =
t+Vl+t2 Vl+t2
In (t + vT+l2") + C
= In (sinx + \J\ + sin2x ) + C.
For arbitrary nonzero a, b e R and neZ,n/-l: a dx = ax + C
axn dx = -^r[Xn+1 + C ■ e~~ +C
— dx = a In x + C
x
acos(bx) dx = ^ sin(&a) + C
asin(&a) dx = — ^ cos(bx) + C
acos(bx) sinn(&a) dx = b,na+i) sm™+1(&a;) + C
i(bx) cosn(bx) dx = -
atg(&a) dx = — — ln(cos(&a)) + C
■ dx = arctg (-) + C
(bx) + C
a1 + x'
dx = arccos (-) + C
■. dx = arcsin (-) + C.
In all the above formulae, it is necessary to clarify the domain on which the indefinite integral is well defined. We leave this to the reader.
Further rules can be added by observations of suitable structure of the given functions. For example,
/(*)
dx = ln|/(x)| +C
□
for all continuously differentiable functions / on intervals where they are nonzero.
Of course, the rules for differentiating a sum of differentiable functions and constant multiples of differentiable functions yield analogous rules for the indefinite integral. So the sum of two indefinite integral is the indefinite integral of the sum of the integrated functions, up to the freedom in the chosen constant, etc.
6.B.7. Determine the integrals a)
/ J sin
b) Jx2^/2x + Ida.
6.2.4. Integration by parts. The Leibniz rule for deriva-
can be interpreted in the realm of the primitive ■■■'fMkh^t-. functions. This observation leads to the follow-
ing very useful practical procedure. It also has theoretical consequences.
394
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
Solution. For computing the first integral, we'll choose the substitution t = tg x, which can be often used with an advantage.
da;
J(x) — cos2 (a;)
substitution t = tgx
dt = 1^ dx = (l + tg2(x)) dx = (l + t2) Ax
sin2(x) - _^!M
\x) =
l+tg2(x)
t2
— dt =
1 2
l+tg2(x)
1 f 1
i+t2 l
i+t2
t - 1
Integration by parts The formula for computing the integral on the left hand side
is called integration by parts.
The above formula is useful if we can compute G and at the same time compute the integral on the right hand side. The principle is best shown on an example. Compute
I = a; sin a; dx.
t + 1
1
In this case the choice F(x) = x, G'(x) = sin a; will help. Then G(x) = — cos x and therefore
tg+1
Now we'll compute the second integral:
x2y/2x + Ida;
u = x2 u = 2x
v' = y/2x + 1 v = \{2x + \)
I = x{— cos x) — J — cos x dx = —x cos a; + sin a; + C.
Some integrals can be dealt with by inserting the factor 1, so that G'(x) = 1:
= -x2(2x + l)f - - f a;2v/2a7TTda; - -(2a; + 1)1 + C, 3 3 J 9
which can be thought of as an equation, when the variable is
the integral. By putting it on one side,
a;V2a; + Ida;
In x dx = / 1 ■ In x dx
= a;lna;— / — x dx = x lax — x + C.
x
x2(2x + l)i
2
~ 7 v! = 1
xV2x+T
U = x
v1 = y/2x + 1 v = ±^2x + 1
6.2.5. Integration by substitution. Another useful procedure is derived from the chain rule for differentiating composite functions. If
F'(y) = f(y), y = p(x),
where p is a differentiable function with nonzero derivative, then
dF(p(x))
1 32/1 If 3
-a;2(2a; + l)2 - - ( -aV2a; + 1 - - (2x + l)ä da;
dx
F'(y) ■ p'(x)
x2(2x + l)i
2 a;v'2a; + l + -^-(2a; + l)t
21 2
105 1 2
and thus F(y) + C = J f(y) dy can be computed as
F( 0. Thus we should consider the values C\ and C2. For the sake of simplicity though, we'll use the notation without indices and stating the corresponding intervals. Furthermore, we'll help ourselves by letting aC = C for a e R \ {0} and C + b = C for b e R, based on the fact that
{€■ C G R} = {aC; C G R} = {C + b; C G R} = R.
We could then obtain an entirely correct expression for example by substitutions C = aC, C = C + b. These simplifications will prove their usefulness when computing more complicated problems, because they make the procedures and the simplifications more lucid.
Case (b). Sequential simplifications of the integrated function lead to
■ dx =
/ ^¥ dx-Jldx = tgx-x + C, where we helped ourselves by the knowledge of the derivative (tgaO'^d^ x=£% + kir,keZ. Case (c). It suffices to realize that this is a special case of the formula
jj$dx = ]n\f(x)\+C, which can be verified directly by differentiation
(In I f(x) \+C)' = (In [±f(x)])' + (€)' = %^ =
±/(*) _ /M
±m f(x) ■
Hence
/T^^ = ta(l + sina;) + C.
Case (d). Because the integral of a sum is the sum of integrals (if the seperate integrals are sensible) and a nonzero constant can be factored out of the integral at any time, we have
J 6 sin 5x + cos § + 2 dx = -| cos5x + 2 sin § + 3eT + C.
6.B.9. Determine
(a) / JoS^ dx, x ^ z + kir, k e Z;
(b) / x2 e~3x dx,
(c) J cos2 a; dx, x G R.
Solution. Case (a). Using integration by parts, we obtain
As an illustration, we verify the last but one integral in the list in 6.2.3 using this method. To compute
1
I =
vT
: dx.
Choose the substitution x = sin t. Then dx = cos tdt. So
I =
1
VT - sin2 í = / dt = t + C.
cos tdt =
1
Vcos2 Í
cos í di
By substitution f = arcsin a; into the result, I = arcsin x+C.
While substituting, the actual existence of the inverse function to y = ip(x) is required. To evaluate a definite Newton integral, it is needed to correctly recalculate the bounds of integration. Problems with the domains of the inverse functions can sometimes be avoided by dividing the integration into several intervals. We return to this point later.
6.2.6. Integration by reduction to recurences. Often the jjfi 1, use of substitutions and integrating by parts leads to recurent relations, from which desired integrals can be evaluated. We illustrate by an example. Integrat-
ing by parts, to evaluate
cosm x dx
' x cos x dx
= cos™1 a; sin a; — (m — 1) J cos™1 x(— sin x) sin x dx
= cos^xsinx + {m-l)Jcos--2xsin2xdx.
Using the formula sin2 x = 1 — cos2 x,
mlm = cosm-1 a;sina; + (m — l)Jm_2 The initial values are
To = x, 1\ = sin a;.
Integrals in which the integrated function depends on expressions of the form (a;2 + 1) can be reduced to these types of integrals using the substitution x = tgf. For example, to compute
_ f dx k J {x2 + lf the latter substitution yields (notice that dx = cos-21 dt)
Jk —
dt
s2t( ^44 + l)
\ COS^ t J
= I cos2k~2 t dt
□ For k = 2, the result is
J2 = -(costsmt + t) = - ^^—^ + t
After the reverse substitution t = arctg x 1 / x
j2
2 V 1 + x-
+ arctg x ) + C.
When evaluating definite integrals, we can compute the whole recurrence after evaluating with the given bounds. For
396
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
■ dx
/ = x G'(x^ - 1
COS^ X
sin x
F'(x) = 1 G(x) =tgx
= X tg x — cos x I + C.
J tg x dx = x tg x + J -^f dx = x tg x + In
Case (b). This time we are clearly integrating a product of two functions. By applying the method of integration by parts, we reduce the integral to another integral in a way that we differentiate one function and integrate the second. We can integrate both of them (we can differentiate all elementary functions). Thus we must decide which of the two variants of the method we'll use (whether we'll integrate the function y = x2, or y = e~3x). Notice that we can use integration bz parts repeatedlz and that the n-th derivative of a polynomial of degree n e N is a constant polynomial. That gives us a way to compute
I-
' dx
F(x) G'(x) x2 e~3x + § /:
_ ~ —3x
F'(x) = 2x G(x) = -ie-
-3x
-3x
dx
and furthermore
x e
_ I,
3 -
~3x dx —
F(x) = . G'{x) =
-3x
F'(x) = 1 G(x)- 1
: dx
-\x& 3x
-3x
3 " 1 ~—3x 9 e
+ c.
In total, we have
x e
~3x dx = -|s'e'
-3*
C2, -3x
~3x - § xe
3x__2_ „-
27 C
3x
+ c =
[x2 + 1X + D+C.
Note that a repeated use of integration by parts within the scope of computing one integral is common (just like when computeing limits by the l'Hospital rule).
Case (c). Again we apply integration by parts using
J cos2 x dx = J cos x ■ cos xdx =
F(x) = cos x G' (x) = cos x
F' (x) = — sin a; G(x) = sin a;
cos x ■ sin x + J sin2 xdx = cos x ■ sin x + J 1 — cos2 xdx = cosx ■ sinx + f 1 dx — f cos2 xdx = cos a; ■ sin a; + a; — j cos2 a; da;.
Although the return to the given integral might make the reader cast some doubts on it, the equality
j cos2 xdx = cos x ■ sin x + x — j cos2 x dx
implies
2 j cos2 xdx = cos a; ■ sin x + x + C,
i.e.
1
(1)
cos2 xdx = - (x + sin x ■ cos a;) + C.
It suffices to remember that we put C/2 = C and that the indefinite integral (as an infinite set) can be represented by one specific function and its translations.
example while integrating over the interval [0, 2ir], the integrals have these values:
Jo = / dx = [x\l* = 2tt Jo
r2ir
I\ = I cos xdx = [sinx]2^ = 0
0
2tt
0 for even m
Im-2
Thus for even m = 2n, the result is
Im = I cosm x dx Jo
2tt
2t? 7
cos x dx -
(2n-l)(2n-3)...3-l
2tt.
Jo 2n(2n - 2) ... 2
For odd m it is zero (as could be guessed from the graph of the function cos x).
6.2.7. Integration of rational functions. The next goal is the integration of the quotients of two polynomials j(x)/g(x). There are several simplifications to start with.
If the degree of the polynomial / in the numerator is greater or equal to the degree of the polynomial g in the denominator, carry out the division with remainder (see the paragraph 5.1.2). This reduces the integration to a sum of two integrals. The division provides
f = q-g + h,
f
- = q+-.
9 9
Thus, / f(x)/g(x) dx = f qdx + f h(x)/g(x) dx where the first integral is easy and the second one is again an expression of the type h(x)/g(x), but with degree of g(x) strictly larger than the degree of h(x) (such functions are called proper rational functions).
Thus we can assume that the degree of g is strictly larger than the degree of /. We introduce the procedure to integrate proper rational functions by a simple example.
Observe that we can integrate (a + x)~n,n > 1, and
-dx = In I a + x I + C.
a + x
Summing such simple fractions yields more complicated ones:
-2 6 4x + 2
+
x + 1 x + 2 a;2 + 3a; + 2 which can be integrated directly:
4a;+ 2
a;2 + 3a; + 2
- dx ■
-21n Ire + II + 6 In la; + 2 +C.
This suggests looking for a procedure to express proper rational functions as a sum of simple ones. In the example, it is straightforward to compute the unknown coefficients A and B, once the roots of the denominator are known:
4a;+ 2 _ 4a;+ 2 _ A B x2 + 3x + 2 ~ (x + l)(x + 2) ~ x + 1 + x + 2'
397
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
We emphasise that usually suitable simplifications or substitutions lead to the result faster than integration by parts. For example, by using the identity
cos2 x = i (1 + cos 2x) , i£l we easily obtain
J cos2 x dx = J \dx + j'\cos2xdx = l + + C =
- +
2 1 J 2 ^ 2 1 4
2sinlcosI+C= \{x + smx-cosx)+C.
□
6.B.10. Integrate
(a) J cos5 x ■ sin a; dx,
(b) f cos5 x ■ sin2 x dx,
(c) /Ä^^e(-f,f);
(d) i-^+v^ ^ a. > o.
V xb-\-x
Solution. Case (a). This is a simple problem for the so called first substitution method, whose essence is writing the integral in the form of
(1)
f(p(x)) 0.
Solution. This problem can illustrate the possibilities of combining the substitution method and integration by parts (in the sscope of one problem. First we'll use the substitution y = y/x to get rid of the root from the argument of the exponential function. That leads to the integral
J e^ dx
V2 = x
■2/ye»dy.
2y dy = dx
Now by using integration by parts, we'll compute
fye.y dy =
F(y) = y G'(y) =
F'(y) G(y) =
= 1
y&y - $ &y dy = y&y - &y + c.
Thus in total, we have
Je^dx = 2y&y - 2 & + C = 2 e^ (y/x - 1) + C.
PIEMAN NUV \NT£GKA'l
% % %H
RlEMANN INTEGRAL4
Definition. The Riemann integral of the function / on the interval [a, b] exists, if for every sequence of partitions with representatives (Zk)kLo wim norms of the partitions Sk approaching zero, the limit
lim Ssk = S
A:—>-oo
exists and its value does not depend on the choice of the sequence of partitions and their representatives. Then we write
S = / f(x) dx.
This definition does not look very practical, but nonetheless it allows us to formulate and prove several simple properties of the Riemann integral:
Theorem. (1) Suppose f is a bounded real function defined on the interval [a, b], and c G [a, b] is an inner point of this interval. Then the integral f(x) dx exists if and only if
both of the integrals f(x) dx and J"6 f(x) dx exist. In that case
fb re rb
f(x)dx= I f(x)dx+ I f(x)dx.
J a J c
(2) Suppose f and g are two real functions defined on the interval [a, b], and that both of the integrals J f(x) dx and
g(x) dx exist. Then the integral of their sum also exists and
b rb rb
(f(x) + g{x)) dx= f(x) dx+ g{x) dx.
J a J a
(3) Suppose f is a real function defined on the interval [a, b], C £ R is a constant, and the integral J f(x) dx exists. Then the integral C ■ f(x) dx also exists and
) rb
C ■ f(x)dx = C ■ / f(x)dx.
4Bernhard Riemann (1826-1866) was an extremely influential German mathematician with many contributions to infinitesimal analysis, differential geometry, and in particular complex analysis and analytic number theory.
400
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
6.B.14. Prove that
1 . „
— sin x ■
2
113
-- cos(2a;) H--cos(4a;) H--.
4 v ' 16 V ' 16
Solution. Easier than to compare the given expressions directly is to show that the functions on the right and left hand side have the same derivatives. We have LI = 2 cos x sin3 x = sin(2a;) sin2 x,
P' = \ sin(2a;) + \ sin(4a;) = sin 2x(\ + \ cos(2a;)) = sin(2a;) sin2 x. Hence the left and the right hand side differ by a constant. This constant can be determined by comparing the values at one point, for example 0. Both functions are zero at zero, thus they are equal. □ Integration of rational functions.
The key to integration of rational functions lies in decomposition of a rational function as a sum of a simple rational functions, which we know, how to integrate. Let us decompose some rational functions:
6.B.15. Carry out the suggested division of polynomials
2xb-x4+3x2-x+1 x2-2x+4
for x e
6.B.16. Express the function
y:
3xl+2x3-x2 + l 3x+2
as a sum of a polynomial and a rational function. 6.B.17. Decompose the rational expression
( \ 4x2 + l3x-2 . W x3+3x2-4x-12>
(b)
2x°+5xJ-x^+2x-l xe+2x4+x2
into partial fractions. 6.B.18. Express the function
,, _ 2x3+6x2+3j:-6
y ~ x*-2x3
in the form of partial fractions. 6.B.19. Decompose the expression
7x2-10x+37
Proof. (1) First suppose that the integral over the whole □ interval exists. When computing it, we can limit ourselves to limits of the Riemann sums whose partitions have the point c among their partitioning points. Each such sum can be obtained as a sum of two partial Riemann sums. If these two partial sums would depend on the chosen partitions and representatives in the limit, then the total sums could not be independent on the choices in limit. (It suffices to keep the sequence of partitions of the subinterval the same, and change the other so that the limit would change).
Conversely, if both Riemann integrals on both subinter-vals exists, they can be approximated with arbitrary precision by the Riemann sums, and moreover independently on their choice. If a partitioning point c is added to any sequence of Riemann sums over the whole interval [a, b], the value of the whole sum is changed. Also the values of the partial sums over the intervals belonging to [a, c] and [c, b] change at most by a multiple of the norm of the partition and possible differences of the bounded function / on all of [a, b]. This is a number arbitrarily close to zero for a decreasing norm of the partition. Necessarily the partial Riemann sums of the function over the two parts of the interval also converge to the limits, whose sum is the Riemann integral over [a, b].
(2) In every Riemann sum, the sum of the functions manifests as the sum of the values in the chosen representatives. Because multiplication of real numbers is distributive, each Riemann sum becomes the sum of the two Riemann sums with the same representatives for the two functions. The statement follows from the elementary properties of limits.
(3) Each of the Riemann sums is multiplied by the constant C. So the claim follows from the elementary properties of limits. □
6.2.9. The fundamental theorem. The following result is crucial for understanding the relation between the integral and the derivative. The complete proof of this theorem is somewhat longer, so it is broken into several subsections.
O Fundamental theorem of integral calculus
o
o
x3-3x2+9x+13
into partial fractions.
6.B.20. Express the rational function
_ -5x+2 y ~ xi-x3+2x2
in the form of a sum of partial fractions. 6.B.21. Decompose the function
y= x3(x+i)
o
o
Theorem. For every continuous function f on a finite interval [a, b] there exists its Riemann integral Ja f(x)dx. Moreover, the function F(x) given on the interval [a,b] by the Riemann integral
F(x) = / f(t)dt
J a
is a primitive function to f on this interval.
6.2.10. Upper and lower Riemann integral. In the first Ji„ step for proving the existence of the integral, we use nrj? an alternative definition, in which the choice of rep-
r\£jnf' resentatives and the corresponding value /(&) is re-W-1 placed by the suprema Mi of the values f(x) in the
corresponding subintervals x{], or by the infima m, of
401
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
into partial fractions.
o
6.B.22. Determine the form of the decomposition of the rational function
_ 2x2-U4 y ~ (x-2)x2 (3x2+x+4)2
into partial fractions. Don't compute the undetermined coefficients! O
the function f(x) in the same subintervals, respectively. We speak of upper and lower Riemann sums, respectively (in literature, this process is also called the Darboux integral).
Because the function is continuous, it is bounded on a closed interval, hence all the above considered suprema and infima exist and are finite. Then the upper Riemann sum corresponding to the partition E = (x0,..., xn) is given by the expression
6.B.23. Express the function
_ xA+&x2+x-2
y ~ x4-2x3
as a sum of a polynomial and a proper rational function Q. Then express the obtained function Q in the form of a sum of partial fractions. O
6.B.24. Write the primitive function to the rational function
(a) y ■
(b) y-
x^2;
>-2);
x ^ 2.
S~,sup = SUp f(Q)(Xi - Xi_l)
n
The lewer Riemann sum is
",inf = E( illf f(0)(Xi - Xi~l)
i=i x*-i<£ 1.
To illustrate, we apply this procedure to the three polynomials 1, x, x2 on the interval [—1,1]. Put gx = 1, and generate the sequence
51 = 1
g2 = x g-i = x2
\\9i\
x ■ ldx \ ■ gx = x — 0 = x
\\9i\\- \J-i fi
x2 ■ ldx \ ■ gx
\\92\
x2 ■ x dx J ■ g2 = x2 —
The corresponding orthogonal basis of the space R2 [x] of all polynomials of degree less than three on the interval [—1,1] is 1, x, x2 — 1/3. Rescaling by appropriate numbers so that
452
CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING
This is the recurrent definition of Chebyshev polynomials. That all Tk(x) are polynomials now follows by induction.
□
7A.4. Show that the choice of the weight function ui(x) = e~x and the interval I = [0, oo) in the previous example leads to an inner product for which the Laguerre polynomials
kj
the basis elements all have length 1, yields the orthonormal basis
Ln(x) = £
k=0
form an orthonormal system.
k\
o
7A.5. Check that the orthonormal systems obtained in the previous two examples coincide with the result of the corresponding Gram-Schmidt orthogonalisation procedure applied to the system 1, x, x2,..., xn,..., using the inner products (, )u], possibly only up to signs. O Given a finite-dimensional vector (sub)space of functions, calculate first the orthogonal (or orthonormal) basis of this subspace by the Gram-Schmidt orthogonaliza-tion process (see 2.3.20). Then determine the orthogonal projection as before. See the formula (1) at page 110.
7A.6. Given the vector subspace (sm(x),x) of the space of real-valued functions on the interval [0, it] , complete the function a; to an orthogonal basis of the subspace and determine the orthogonal projection of the function \ sin(a;) onto it. O
7A.7. Given the vector subspace (cos(a;), x) of the space of real-valued functions on the interval [0, it] , complete the function cos (a;) to an orthogonal basis of the subspace and determine the orthogonal projection of the function sin(a;) onto it.
o
B. Fourier series
Having a countable system of orthogonal functions Tk, k = 0,1,..., as in the examples above, we may sequentially project a given function / to 11 the subspaces 14 = (T0,..., Tk). If the limit of these projections exists, this determines a series built on linear combinations of Tk. Under additional conditions, this should allow us to differentiate or integrate the function / in a similar way, as we did with the power series.
We consider one particular orthogonal system of periodic functions, namely that of J.B.J. Fourier.
The periodic functions are those describing periodic processes, i.e. f(t+T) = f(t) for some positive constant T e K,
hi
h2
2' V 2
For example, hi = and
-l
llSif '
x, h3
Ydx ■
(3a;2-1).
We could easily continue this procedure in order to find orthonormal generators of B4 [x]. The resulting polynomials are called Legendre polynomials.
Considering all Legendre polynomials h{, i = 0,..., we have an infinite orthonormal set of generators such that polynomials of all degrees are uniquely expressed as their finite linear combinations.
7.1.4. Orthogonal systems of functions. Generalizing the latter example, suppose we have three polynon-ials hi, h2, h3 forming an orthonormal set. For any polynomial h, we can put
H = (h, hi)hi + (h, h2)h2 + (h, h3)h3.
We claim that H is the (unique) polynomial which minimizes the L2-distance \\h - H\\. See 3.4.3.
The coefficients for the best approximation of a given function by a function from a selected subspace are obtained by the integration introduced in the definition of the inner product.
This example of computing the best approximation of H by a linear combination of the given orthonormal generators suggests the following generalization:
Orthogonal systems of functions
Every (at most) countable system of linearly independent functions in 5^ [a, b] such that the inner product of each pair of distinct functions is zero is called an orthogonal system of functions. If all the functions /„ in the sequence are pair-wise orthogonal, and if for all n, the norm ||/n||2 = 1, we talk about an orthonormal system of functions.
Consider an orthogonal system of functions fn £ 5° [a, b] and suppose that for (real or complex) constants c„, the series
oo
F(X) = ^Cnfn{x)
71 = 0
converges uniformly on a finite interval [a, b]. Notice that the limit function F(x) does not need to belong to 5° [a, b], but this is not our concern now.
By uniform convergence, the inner product (F, fn) can be expressed in terms of the particular summands (see the corollary 6.3.7), obtaining
oo j-b
(F,fn) = E Cm / fm(x)fn(x) dx = Cn\\fn\\22, m=0 Ja
453
CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING
called the period of /, and all t G R. One of the fundamental periodic processes which occur in applications is a general simple harmonic oscillation in mechanics. The function f(t) which describes the position of the point mass on the line in the time t is of the form
(1)
f(t) = asin(o;i + b)
for certain constants a, a; > 0, & G R. In the diagram on the left, f(t) = sin(i + 1) and on the right, f(t) = sin(4i + 4):
Applying the standard trigonometric formula
sin(a + ß) = cos a sin ß + sin a cos ß, with a, ß G R, we write the function f(t) alternatively as
(2) f(t) = ccos(o;i) + dsin(cdt),
where c = a sin b, d= a cos b.
7.B.I. Show that the system of functions 1, sin(a;), cos(a;), ..., sin(na;), cos(na;),... is orthogonal with respect to the L2 inner product on the interval I = [—tt, it] . O
Building an orthogonal system of periodic functions sin(na;) and sin(na; + 7r/2) = cos(na;) leads to the classical Fourier series. 1
In application problems, we often meet the superposition of different harmonic oscillations. The superposition of finitely many harmonic oscillations is expressed by sums of functions of the form
fn(x) = an cos(ncjx) + bn sin(na;a;) for n G {0,1,..., m}. These particular functions have prime period 27r/(neu). Therefore, their sum
(3)
2
+ £^ (a„ cos(na;a;) + bn sin(nüjx))
is a periodic function with period T = 2tt/uj.
^The Fourier series are named in honour of the French mathematician and physicist Jean B. J. Fourier, who was the first to apply the Fourier series in practice in his work from 1822 devoted to the issue of heat conduction (he began to deal with this issue in 1804-1811). He introduced mathematical methods which even nowadays lie at the core of theoretical physics. He did not pay much attention to physics himself.
since each term in the sum is 0 except when m = n. Exactly as in the example above, each finite sum J2n=o cnfn(x) *s the best approximation of the function F(x) among the linear combinations of the first k + 1 functions /„ in the orthogonal system.
Actually, we can generalize the definition further to any vector space of functions with an inner product. See the exercise 7.A.3 for such an example. For the sake of simplicity we confine ourselves to the L2 distance, but the reader can check that the proofs work in general.
We extend our results from finite-dimensional spaces to infinite dimensional ones. Instead of finite linear combinations of base vectors, we have infinite series of pairwise orthogonal functions. The following theo-f ^ rem gives us a transparent and very general answer to the question as to how well the partial sums of such a series can approximate a given function:
7.1.5. Theorem. Let fn, n = 1, 2,..., be an orthogonal sequence of (real or complex) functions in 5° [a,b] and let g G 5° [a, b] be an arbitrary function. Put
cn = H/nll 2 / g(x)fn(x) dx. J a
Then
(1) For any fixed n G N, the expression which has the least L2-distance from g among all linear combinations of functions fi ,...,/„ is
n
hn = y^Cjfj(x).
i=l
(2) The series lcn|2||/nl|2 always converges, and moreover
oo
£m2II.U2<:NI2-
71=1
(3) The equality X^i cn\\fn\\2 = ||ff||2 holds if and only if
Hindoo \\g - hn\\2 = 0.
Before presenting the proof, we consider the meaning of the individual statements of this theorem. Since we are working with an arbitrarily chosen orthogonal system of functions, we cannot expect that all functions can be approximated by linear combinations of the functions /,.
For instance, if we consider the case of Legendre orthogonal polynomials on the interval [—1,1] and restrict ourselves to even degrees only, surely we can approximate only even functions in a reasonable way. Nevertheless, the first statement of the theorem says that the best approximation possible (in the L2-distance), is by the partial sums as described.
The second and third statements can be perceived as an analogy to the orthogonal projections onto subspaces in terms of Cartesian coordinates. Indeed, if for a given function g, the series F(x) = J2^=i cnfn(x) converges pointwise, then the function F(x) is, in a certain sense, the orthogonal projection of g into the vector subspace of all such series.
454
CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING
7.B.2. Show that the system of functions 1, sin(no;a;), cos(no;a;), for all positive integers n is orthogonal with respect to the L2 inner product on the interval [—ir/ui,ir/cu].
o
When projecting a given function orthogonally to the subspace of functions (3), the key concept is the set of Fourier coefficients an and bn, n6N,
7.B.3. Find the Fourier series for the periodic extension of the function
(a) g(x) = 0, x £ [—7r, 0), g(x) = sinx, x £ [0,7r);
(b) g(x) = \x\, x £ [-7T, tt);
(c) g{x) = 0, x £ [-1, 0), s(a;) = x + 1, ir £ [0,1).
Solution. Before starting the computations, consider the illustrations of the resulting approximation of the given functions. The first two display the finite approximation in the cases a) and b) up to n = 5, while the third illustration for the case c) goes up to n = 20. Clearly the approximation of the discontinuous function is much slower and it also demonstrates the Gibbs phenomenon. This is the overshooting in the jumps, which is proportional to the magnitudes of the jumps.
The case (a). Direct calculation gives (using formulae from 7.1.6)
7t 0 7t
ao =\ J 9{x)&x = ^ f Odai+i/sinaida;
— tv — tv 0
1 [ itt 2
= - — COS Xin. = —,
tv l ju tv '
The second statement is called Bessel's inequality and it is an analogy of the finite-dimensional proposition that the size of the orthogonal projection of a vector cannot be larger than the original vector. The equality from the third statement is called Parseval's theorem and it says that if a given vector does not decrease in length by the orthogonal projection onto a given subspace, then it belongs to this subspace.
On the other hand, the theorem does not claim that the partial sums of the considered series need to converge point-wise to some function. There is no analogy to this phenomenon in the finite-dimensional world. In general, the series F(x) need not be convergent for any given x, even under the assumption of the equality in (3). However, if the
series J2n=
converges to a finite value, and if all the
functions /„ are bounded uniformly on I, then, the series F(x) — J2n°=i cnfn(x) converges at every point x. Yet it need not converge to the function g everywhere. We return to this problem later.
The proof of all of the three statements of the theorem is similar to the case of finite-dimensional Euclidean spaces. That is to be expected since the bounds for the distances of g from the partial sum / are constructed in the finite-dimensional linear hull of the functions concerned:
Proof of theorem 7.1.5. Choose any linear combination / = 2~Zn=i anfn and calculate its distance from g. We obtain
k ~b k
^2anfn\\2= g(x)-^2anfn
77=1 J 11 77=1
(x)
dx
b
g(x)\2 dx - / y~]g(x)anfn(x) dx-ja 77=1
(x)g(x)dx+ / y~] anfn( -- 77=1 Ja 77=1
dx
l~Y^ anCn\\fn\\2-^ancn\\fn\\2 +^a2n\\fn\
77=1
k
+ ll/™H2((C™ ~ fl77)(c77 - an) - \Cn[
77=1
k
^ ^ 11 /t7 II (I ^77 ^77 I I ^77
Since we are free to choose an as we please, we minimize the last expression by choosing an = cn, for each n. This completes the proof of the first statement. With this choice of an, we obtain Bessel's identity
k k
ii5-Ec^ii2 = ii5ii2-Eic"i2ii/™ii2-
n=l n=l
Since the left-hand side is non-negative, it follows that:
Ec2ii/«ii2^
455
CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING
in J g{x)cos(nx) Ax
— tt
0 tt
= ^/0da;+iJsinxcos(nx) Ax
^ J sin((l + n)x) + sin((l — n)x) A x j_
2ir
2ir
cos((l+n)x) cos((l—n)x)
1+n 1-n
cos((l+n)7r) cos((l — 77)77)
1+n 1-n
i 1
+
1+77 1—n 1
2ir ^ 1 + 77
1 ^(-1)" + 1 1-772
1-77
1
2 '
ai =57 Jsm(2a;) da; = 0
77 0 77
&i = ^ f g(a;) sin a; d a; = i J Oda; + iJ sin2 a; d x
— tt — tt 0
= iJl-cos(2a:)da: = i[a:--^
0 L
7T
bn = ^ / g{x)svci(nx) Ax
— tt
0 tt
= ^:/0da; + dJsmxsin(na;) dx
-tt 0 tt
~ ~krr S cos((l — n)x) — cos((l + n)x) A:
_ 1 sin((l—77)x) sin((l+77)x)
_ 2tF [ 1-77 1 + 77
for n G N \ {1}. Thus, we arrive at the Fourier series
1 sin a; 1 v-^ / i — - + — +
77 = 2 V
tt
-— cos(na;J
Since (—1)™ + 1 = 0 when n is odd, and = 2 when n is even, we can put n = 2m to obtain the series
1 sin a; 2 ^-^
77 ^ 9~ '77 ^
ä(2r
cosl^ma;]
tt z—' 4m2 — 1
777=1
The case (b). The given function is of a sawtooth-shaped oscillation. Its expression as a Fourier series is very important in practice. Since the function g is even on (—tt, tt), it is immediate that bn = 0 for all n e N. Therefore, it suffices to determine an for n£N:
ao = ^ ] g{x)Ax = l]xAx = l H\ = ^-
-tt 0 L JO
For other neN, use integration by parts, to get
tt tt
an = — J g{x) cos(na;) d x = -fx cos(na;) d x
-tt 0 tt
= - \- sm(nx)]^--— f sinfna;) da;
tt L 77 ^ ' \ 0 77tt j ^ '
= ^F[cos(na;)]J=^F0((-l)n-l)-So an = — t4- for n odd, an = 0 for n even.
Let k —> oo. Since every non-decreasing sequence of real numbers which is bounded from above has a limit, it follows that
OO 77=1
which is Bessel's inequality.
If equality occurs in Bessel's inequality, then statement (3) follows straight from the definitions and the Bessel's identity proved above. □
An orthogonal system of functions is called a complete orthogonal system on an interval I = [a,b] for some space of functions on I if and only if Parseval's equality holds for every function g in this space.
7.1.6. Fourier series. The coefficients Cn from the previous theorem are called the Fourier coefficients of a given function in the (abstract) Fourier series. The previous theorem indicates that we are able to work with countable orthogonal systems of functions /„ in much the same way as with finite orthogonal bases of vector spaces.
There are, however, essential differences: • It is not easy to decide what the set of convergent or uniformly convergent series
F(x) = E °nfn
looks like.
• For a given integrable function, we can find only the "best approximation possible" by such a series F(x) in the sense of L2-distance.
In the case when we have an orthonormal system of functions /„, the formulae mentioned in the theorem are simpler, but still there is no further improvement in the approximations.
The choice of an orthogonal system of functions for use in practice should address the purpose for which the approximations are needed. The name "Fourier series" itself refers to the following choice of a system of real-valued functions:
Fourier's orthogonal system
1, sin a;, cos a;, sin 2a;, cos 2a;,
smni, cosna;,
An elementary exercise on integration by parts shows that this is an orthogonal system of functions on the interval
[-7T, TT].
These functions are periodic with common period 27r (see the definition below). "Fourier analysis", which builds upon this orthogonal system, allows us to work with all piece-wise continuous periodic functions with extraordinary efficiency. Since many physical, chemical, and biological data are perceived, received, or measured, in fact, by frequencies of the signals (the measured quantities), it is really an essential mathematical tool. Biologists and engineers often use the word "signal" in the sense of "function".
456
CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING
This determines the Fourier series of a function of sawtooth-shaped oscillation as
TT 2^/(-l)"-l . ,
- + -> -5-cosfna;)
2 tt ^ \ n2 '
tt 4 cos((2ti — l)a;)
Periodic functions
tt —' (2ti-1)2
77=1 v !
tt 4 / cos(3a;) cos(5a;)
---cos x H---—- H--+ •
2 f 32 52
This series could have been found by an easier means, namely by integrating the Fourier series of Heaviside's function (see "square wave function" in 7.1.9 ).
The case (c). The period for this function is T = 2, and so cj = 2tt/T = tt. Use the more general formulae from 7.1.6, namely
xo+T 1
ao = T I d(x) da; = J g(x) Ax
x0 — 1
0 1
= J Odx + j(x + l) dx = §, -1 0
x0+T 1
an = ij; J g{x) cos{nux) dx = J g(x) cos(mrx) dx
x0 — 1
0 1
= J" 0 d a; + J (a; + 1) cos(n7ra;) d x = I . -1 0
x0+T 1
bn - 2
ir2 '
_ l-2(-l)
777T
T J g(x) sin(ncjx) dx = J g(x) sm(mrx) dx
xo —1 0 1
= J0da; + J(a; + 1) sin(n7ra;) d x -1 0
The calculation of a0 was simple and needs no further comment. As for determining the integrals at an and bn, it sufficed to use integration by parts once. Thus, the desired Fourier series is
I + g (izir^i cos{n7TX) + iz2tn
4 \ 71 7T V / 777T
\(nTTX)
Some refinements of the expression are available. For instance, for n G N,
«77 = — ^f2" f°r n °dd, «77 = 0 for 71 even,
and, similarly,
K = ^ for 71 odd, bn = - ^ for 71 even.
□
A real or complex valued function / defined on R is called a periodic function with period T > 0 if f(x + T) = j(x) for every 16I,
It is evident that sums and products of periodic functions with the same period are again periodic functions with the same period.
We note that the integral f*°+T f(x) dx of a periodic function / on an interval whose length equals the period T is independent of the choice of a;0 G R. To prove it, it is enough to suppose 0 < a;0 < T, using a translation by a suitable multiple of T. Then,
rxo+T rT rxo+T
f(x) dx = I f(x) dx + I f(x) dx
l xo J xo JT
rT rxo rT
f(x)dx+ / f(x)dx= / f(x)dx Jo Jo
Fourier series The series of functions
F[x) = — + 'Sy^j (an cos{nx) + bn sin(na;))
2 n=l
from the theorem 7.1.5, with coefficients
1 rxo+2-k
TT
J Xo
1 rXo+2-k
TT J xo
In the next exercise we show that the calculation of the Fourier series does not always require an integration. Es- which have values pecially in the case when the function g is a sum of products (powers) of functions y = sin(ma;), y = cos(nx) for a™ 771,71 G N, one can rewrite g as a finite linear combination of basic functions.
g(x) cos(nx) dx, g(x) sin(7ia;) dx,
is called the Fourier series of a function g on the interval
[a;0,a;o + 2tt}.
The coefficients an and bn are called Fourier coefficients of the function g.
If T is the time taken for one revolution of an object moving round the unit circle at constant speed, then that constant speed is a; = 27r/T. In practice, we often want to work with Fourier series with an arbitrary primary period T of the functions, not just 271. Then we should employ the functions cos(o;7ia;), sm(uinx), where cu = 2f. By substitution t = cux, we can verify the orthogonality of the new system of functions and recalculate the coefficients in the Fourier series F(x) of a function g on the interval [x0, x0 + T]:
F{x) = + (an cos(nux) + bn sm(nujx)^,
by, =
n- =1
2 rxo+T
T . Ixo
2 rxo+T
T . Ixo
g(x) cos(ncjx) dx, g(x) sin(7io;a;) dx.
457
CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING
7.B.4. Determine the Fourier coefficients of the function
(a) g(x) = sin(2a) cos(3a), a £ [—tt, 7t];
(b) g(x) = cos4 x, x £ [—7r,7r].
Solution. Case (a). Using suitable trigonometric identities,
sin(2a) cos(3a) = |(sin(2a + 3a;) + sin(2a — 3a)) = ^ sin(5a) — i sin a.
It follows that the Fourier coefficients are all zero except
for&i = -1/2, b5 = 1/2. Case (b). Similarly, from
cos a = (cos a) = ( 2y—-)
= \ (1 + 2 cos(2a) + cos2(2a))
7.1.7. The complex Fourier coefficients. Parametrize the unit circle in the form:
piut _ CQSUJt _|_ j smCl;f.
For all integers m, n with m ^ n,
ptmx e-.m dx= I et(m-a)x ^
|(l + 2cos(2a) +
l+cos(4x) >
= § + \ cos(2a) + \ cos(4a), a £ R. Hence a0 = 3/4, a2 = 1/2, a4 = 1/8, and the other coefficients are all zero. □
7.B.5. Given the Fourier series of a function / on the interval [—7T, 7r] with coefficients am, &„, m £ N, £ Z+, prove the following statements:
(a) If /(a) = /(a + tt), a £ [-tt,0], then a2fc-i = &2fc-i = 0 for every A; £ N. (b) If /(a) = -/(a + 7r), a £ [-7T, 0], then a0 = a2k = b2k = 0 for every k £ N.
Solution. The case (a). For any k £ N, the statement can be proved directly by calculations, but we provide here a conceptual explanation.
The definition of the function / ensures it is periodic with period -k. Thus we may write its Fourier series on the shorter interval [—-|, f] as follows
oo
f{t) =--h (än cos(2ní) + bn sin(2ní)).
71=1
Clearly this must be also the Fourier series of the same function / over the interval [—tt, tt] and so the claim is proved. Alternatively, if
oo
/(a) = a0/2 + ^( an cos(na) + bn sin(na))
77=1
then
oo
/(a+7r) = ag/2+(a„ cos(na+?i7r)+?)Tl sin(na+?i7r))
77=1
oo
= ag/2 + (—1)™ (an cos(na) + bn sin(na))
77=1
The two series are the same only when the odd coefficients are zero.
_ 1 \ei(m-n)xY _ q
Thus for m^n, the integral {etmx, emx) = 0.
Fourier's complex orthogonal system
—nut — cut -I cut J2ut nut
6 ,...,6 ; , c , • • •
Note that if m = n, then
etmx e~tmx dx = I dx = 2tt.
The orthogonality of this system can be easily used to recover the orthogonality of the real Fourier's system: Rewrite the above result as
(cos ma + i sin ma) (cos nx — i sin nx) dx = 0
By expanding and separating into real and imaginary parts we get both
(cos ma cos nx + sin ma sin nx) dx = 0
(sin ma cos nx — cos ma sin nx) dx = 0 By replacing n with —n, we have also
(cos ma cos nx — sin ma sin nx) dx = 0
(sin ma cos nx + cos ma sin nx) dx = 0
r
and hence, with m ^ n,
cos ma cos nx dx = 0
sin ma sin nx dx = 0
T
sin ma cos nx dx = 0
r
which proves again the orthogonality of the real valued Fourier system.
Note the case m = n > 0, when
cos2 nx dx = || cos(na) ||| = tt,
sin2 nx dx = \\ sin(na)||| = tt,
lín = 0, then ||1||| = 2tt.
458
CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING
The case (b). Similarly if
oo
an cos(nx) + bn sin(na;))
n=l
then — f(x + tt)
oo
= — ao/2 — (an cos(nx + mr) + bn sin(na; —\-mr))
n=l
oo
= — ciq/2 + (—(an cos(nx) + bn sin(na;)).
n=l
The two series are the same only when the even coefficients are zero. □
Complex Fourier series. It is sometimes convenient (and of-N\^_Lv ten easier) to express the Fourier series using the "Af^o complex coefficients cn instead of the real coeffi-W cients an and bn. This is a straightforward consequence of the facts
emu)x _ cos(jlulx) _|_ j sin(na;a;) or, vice versa
cos(no;a;) = (e«™* + e-«™*)
sin(no;a;) = ^(e"™* -e~m"x).
The resulting series for a real or complex valued function g on the interval [—tt, tt] is F(x) = X^L-oo cn elnx with
cn = ^ J\-^g(x)dx.
See the explanation in 7.1.7. We need just one formula for Cn, rather than one for an and another one for bn.
7.B.6. Compute the complex version of the Fourier series F(x) of the 27r-periodic function g(x) denned by g(x) = 0,
if — tt < x < 0, while g(x) = 1 if 0 < x < tt.
Solution. We have for n^O,
while cq = ^ JJ1 d x = 1/2. So
For a (real or complex) function f(t) with —T/2 < i < T/2, and all integers n, we can define, in this context, its complex Fourier coefficients by the complex numbers
_ 1
T/2
T/2
me-
at.
The relation between the coefficients an and bn of the Fourier series (after recalculating the formulae for these coefficients for functions with a general period of length T) and these complex coefficients cn follow from the definitions, cq = ao/2, and for natural numbers n, we have
Cn — 2 (^71 ibn), C—n — 2 (^n + ibn) •
If the function / is real valued, cn and c_„ are complex conjugates of each other.
We note here that for real valued functions with period 27r, the Bessel inequality in this notation becomes
°° -1 fit
(l/2)|a0|2 + E(l«n|2 + l^|2) y —i---—2"T-■-~
The function <£> has another useful feature. Namely we can obtain the unit constant function by adding all its integer translations
£ X1 and i2 : X —> X2, into two completions of the space X, and denote the corresponding metrics by d, di, and d2, respectively. The mapping
■X-
■X,
p: Ll(X) -
is well-defined on the dense subset i\ (X) c X1. Its image is the dense subset i2(X) c X2 and, moreover, this mapping is clearly an isometry. The dual mapping i\ oi2_1 works in the same way.
Every isometric mapping maps, of course, Cauchy sequences to Cauchy sequences. At the same time, such Cauchy sequences converge to the same element in the completion if and only if this holds for their images under the isometry p. Thus if such a mapping p is denned on a dense subset X of a metric space X\, then it has a unique extension to the whole Xi with values lying in the closure of the image p(X), i. e. X2.
By using the previous ideas, there is a unique extension of p to the mapping p : X\ —> X2 which is both a bijection and an isometry. Thus, the completions X\ and X2 are indeed identical in this sense.
Thus it is proved:
Theorem. Let X be a metric space with metric d which is not complete. Then the completion X of X with metric d is unique up to bijective isometries.
In the following three paragraphs, we introduce three theorems about complete metric spaces. They are highly applicable in both mathematical analysis and verifying convergence of numerical methods.
7.3.9. Banach's contraction principle. A mapping F
X —> X on a metric space X with metric d
___is called a contraction mapping if and only if
there is a real constant 0 < C < 1 such that for all elements x, y in X,
d(F(x),F(y)) y + t0))-Another application of the mean value theorem yields
0 guarantees the desired equality
fxy(x,y) = fyx(x,y)
at all points (x,y).
8.1.12. The same procedure for functions of n variables proves the following fundamental result:
commutativity of partial derivatives
Theorem. Let f : En —> R be a k-times differentiable function with continuous partial derivatives up to order k (inclusive) in a neighbourhood of a point x g Rn. Then all partial derivatives of the function f at the point x up to order k (inclusive) are independent of the order of differentiation.
Proof. The proof for the second order is illustrated
above in the special case when n = 2. In fact, it yields the general case as well.
Indeed, notice that for every fixed choice of
a pair of coordinates xt and Xj, the discussion of their interchanging takes place in a two-dimensional affine subspace, (all the other variables are considered to be constant and do not affect in the discussion). So neighbouring partial derivatives may interchanged. This solves the problem in order two.
In the case of higher-order derivatives, the proof can be completed by induction on the order. Every order of the indices ii,...,ik can be obtained from a fixed one by several interchanges of adjacent pairs of indices. □
523
CHAPTER 8. CALCULUS WITH MORE VARIABLES
f'Jz=xt \n2x-£ + x^ hix-%
f" = \x^ + ^-Hnx ■ ljx>
E^-Hnx ■ =%,f" ^x^lrfx-^+x* Inx ■
+
□
8.D.25. Find all first and second order partial derivatives of z = f(x, y) in [1, \/2,2] defined in a neighborhood of the
point by x2 + y2 + z2 — xz — \/2yz = 1.
o
8.D.26. Find all first and second order partial derivatives of z = f(x, y) in [—2, 0,1] defined in a neighborhood of the
point by 2x2 + 2y2 + z2 + 8xz -2 + 8 = 0. O
8.D.27. Determine all second partial derivatives of the function / given by f(x, y, z) = y'xylaz. Solution. First, we determine the domain of the given function: the argument of the square root must be non-negative, and the argument of the natural logarithm must be positive. Therefore, Df = {(x, y, z) e R3, (z > lk(xy > 0)) V (0 < z < \)k{xy < 0)}.
Now, we calculate the first partial derivatives with respect to each of the three variables:
fx
yln(z)
f' fy
x 111(2)
= , fz
xy
2^/xy 111(2) JV 2^/xy 111(2) J 2z^/xy 111(2) Each of these three partial derivatives is again a function of three variables, so we can consider (first) partial derivatives of these functions. Those are the second partial derivatives of the function /. We will write the variable with respect to which we differentiate as a subscript of the function /.
fXX
fxy fxz
fyy
fyz fzz
2 1 2
y In 2 4(xy In z)'i
xy In2 2 In 2
4(xylnz)i 2yjxy\nz'
xy2 In 2
+
Az(xy\az)i 2z^xy In 2
21 2 x In 2
4(xy In z)'i
x2yIn 2
+
4z(xy\nz)2 2zy/xy In 2' xy
x2y2
422(xyln2)f 2z2^xy In 2
8.1.13. Hessian. The differential was introduced as the linear form df(x) which approximates the function / at a point x in the best possible way.
Similarly, a quadratic approximation of a function / : En -+ R is possible.
Hessian
Definition. If / : R™ -+ R is a twice differentiable function, the symmetric matrix of functions
/ a2/ (x) a2/ (x)\
V 9 f (x) 9 f (x) I
is called the Hessian of the function /.
It is already seen from the previous reasonings that the vanishing of the differential at a point (x, y) e E2 guarantees stationary behaviour along all curves going through this point. The Hessian
Hf(x, y)=( -H^'y] fxy(ix'y}] JK ,U> \fyx(x,y) fyy(x,y))
plays the role of the second derivative. For every parametrized straight line
c(t) = (x(t),y(t)) = (x0 + y0 + -qt),
the derivative of the univariate function a(t) = j(x(t),y(t)) can be computed by means of the formula -^f(t) = fx(x(t),y(t))x'(t) + Jy(x(t),y(t))y'(t) (derived in 8.1.8) and so the function
df df P{t) = f(x0,yo) +tg--(x0,y0)£ + t — (x0,y0)Ti
+
fxx(x0,yo)£2 + 2fxy(x0,y0)£ri + fyy(x0,y0)i]:
shares the same derivatives up to the second order (inclusive) at the point t = 0 (calculate this on your own!). The function (3 can be written in terms of vectors as
= f(x0,yo) + df(x0,y0)(tv) + ^Hf(x0,y0)(tv,tv),
where v = (£, 77) is the increment given by the derivative of the curve c(t), and the Hessian is used as a symmetric 2-form.
This is an expression which looks like Taylor's theorem for univariate functions, namely the quadratic approximation of a function by Taylor's polynomial of degree two. The following illustration shows both the tangent plane and this quadratic approximation for two distinct points and the function f(x,y) = sin(a;) cos(y). popisobrazku
524
CHAPTER 8. CALCULUS WITH MORE VARIABLES
By the theorem about interchangeability of partial derivatives (see 8.1.12), we know that fxy = fyx, fxz = fzx, fyz = fzy. Therefore, it suffices to compute the mixed partial derivatives (the word "mixed" means that we differentiate with respect to more than one variable) just for one order of differentiation. □
£. Taylor polynomials
8.E.I. Write the second-order Taylor expansion of the function / : R2 -> R, f{x,y) = \n(x2 + y2 + 1) at the point
[1.1]-
Solution. First, we compute the first partial derivatives:
fx
2x
-J,
2y
x2 + y2 + VJy x2+y2 + V
then the Hessian:
Hf(x,y) =
4Xy
02+j/2 + l)2 (x2+j/2 + l)2
4Xy
' (x2+j/2 + l)2 (x2+j/2 + l)2
The value of the Hessian at the point [1,1] is
2 _4
\ 25 9 9
Altogether, we get that the second-order Taylor expansion at the point [1,1] is
T2(x,y) =/(l, 1) + fx(l, l)(x - 1) + fy(l, l)(y - 1)
+ \(x-l,y-l)Hf(l,l) ^yZ\
= m(3) + |(z-l) + |(y-l) + ±(z-l)2
=\ix2 + V2 + 8x + % - AxV - 14) + ln(3)-9
□
Remark. In particular, we can see that the second-order Taylor expansion of an arbitrary differentiable function at a given point is a second-order polynomial.
8.E.2. Determine the second-order Taylor polynomial of the function / : R2 —> R2, f(x, y) = xy cosy at the point [tt, 7t] . Decide whether the tangent plane to the graph of this function at the point [7r,7r, /(tt, tt)] goes through the point
[0,7T,0].
Solution. As in the above exercises, we find out that
TV "i 1 2 2 3 , 1 4
T(xi y) = 277 y ^xy^7ry+27r'
8.1.14. Taylor's expansion. The multidimensional version of Taylor's theorem is an example of a mathe-'■*ScTS^" matical statement where the most difficult part is finding the right formulation. The proof is then quite simple. The discussion on the Hessians continues. Write Dkf for the fc-th order approximations of the function / : En —> R™. It is always a fc-linear expressions in the increments.
The differential D1f = df (the first order) and the Hessian D2 f = Hf (the second order) are already discussed. For functions / : En —> R, points x = (x1,..., xn) G En, and increments v = (£i,..., £„), set
Dkf(x)(v)=
l 0 (the composition as well as the sum of increasing functions is again an increasing function). Therefore, it has a unique externum, and that is a minimum at the point x = 0. Similarly, for any fixed value of x, f is a shift of the function f2, and f2 has a minimum at the point y = 0, which is its only externum. We have thus proved that / can have a local externum only at the origin. Since
/(0,0) = 0, f(x,y)>0, [x,y] gR2x{[0,0]},
Multi-indices
A multi-index a of length n is an n-tuple of non-negative integers (qi, ..., an). The integer \a\ = ol\ + ■ ■ ■ + an is called the size of the multi-index a.
Monomials are written shortly as xa instead of a"1 £22 • • • xnn- Real polynomials in n variables can be symbolically expressed in a similar way as univariate polynomials:
/ = E flQa;Q' g = E bl3X'3 e Kta;i''''' H (r cos ip, r sin R2 (for instance, on the domain of all points in the first quadrant except for the points having x = 0):
= \Jx2 + y2, p = arctan
Consider now the function gt : E2 —> R, with free parameter
t e R,
g(r, p, t) = sin(r - t)
534
CHAPTER 8. CALCULUS WITH MORE VARIABLES
S. Therefore, we are likely to get 6 local extrema. Further, inside every eighth of the sphere given by the coordinate planes, there may or may not be another extremum. The particular quadrants can be easily parametrized, and the function h (considered a function of two parameters) can be analyzed by standard means (or we can have it drawn in Maple, for example).
Actually, solving the system (no matter whether algebraically or in Maple again) leads to a great deal of stationary points. Besides the six points we have already talked about (two of the coordinates equal to zero and the other to ±1) and which have A = ±|, there are also the points
P±
Vs Vs Vs
~3~' ~3~' ~3~
for example, where a local extremum indeed occurs.
If we restrict our interest to the points of the circle K, we must give another function G another free parameter 77 representing the gradient coefficient. This leads to the bigger system
in polar coordinates. Such a function can approximate the waves on a water surface after a point impulse in the origin at the time t, see the illustration (there, t = —tt/2). While it was easy to define the function in polar coordinates, it would have been much harder to guess with Cartesian coordinates.
Compute the derivative of this function in Cartesian coordinates. Using the theorem,
dg , . dg , . dr dg , . dp , .
-ix,y,t) = -(r,p)-(x,y) + -(r,p)-(x,y)
= cos(\/x2 + y2 — i)
3a;2 -2\x - V,
3y2 -2\y -v,
3z2 - 2Xz - V, and, similarly,
\A2 + :
+ 0
x2 + y2 + z2 -1, x + y + z.
However, since a circle is also a compact set, h must have both a global minimum and maximum on it. Further analysis is left to the reader. □
dg , . dg , -.dr dg , ,3(3, .
-ix,y,t) = -(r,p)-(x,y) + -(r,p)-(x,y)
i(^x2 +y2 -t)
V
\A2 + :
8.H.3. Determine whether the function / : R3 -> R, f(x, y, z) = x2y has any extrema on the surface 2a;2 + 2y2 + z2 = 1. If so, find these extrema and determine their types.
Solution. Since we are interested in extrema of a continuous function on a compact set (ellipsoid) - it is both closed and bounded in R3 - the given function must have both a minimum and maximum on it. Moreover, since the constraint is given by a continuously differentiable function and the examined function is differentiable, the extrema must occur at stationary points of the function in question on the given set. We can build the following system for the stationary points:
2a;y = 4fca;, a;2 = 4fcy, 0 = 2kz.
8.1.22. The inverse mapping theorem. If the first derivative of a differentiable univariate function is non-zero, its sign determines whether the func-S'iuNte tion is increasing or decreasing. Then, the function has this property in a neighbourhood of the point in question, and so an inverse function exists in the selected neighbourhood. The derivative of the inverse function /_1 is then the reciprocal value of the derivative of the function / (i.e. the inverse with respect to multiplication of real numbers).
Interpreting this situation for a mapping E\ —> E\ and linear mappings R —> R as their differentials, the nonvanish-ing is a necessary and sufficient condition for the differential to be invertible as a linear mapping. In this way, a statement is obtained which is valid for all finite-dimensional spaces in general:
535
CHAPTER 8. CALCULUS WITH MORE VARIABLES
This system is satisfied by the points [± , , 0] and [±-^=, —-^g, 0]. The function takes on only two values at these four stationary points. Ir follows from the above that the first and second stationary points are maxima of the function on the given ellipsoid, while the other two are minima. □ Remark. Note that we have used the variable k instead of A from the theorem 8.1.28.
8.H.4. Decide whether the function / : R3 -> R, f(x, y,z) = z — xy2 has any minima and maxima on the sphere
x2 + y2 + z2 = 1. If so, determine them.
Solution. We are looking for solutions of the system
kx = -y2, ky = -2xy, kz = 1.
The second equation implies that either y = 0ora; = —■§. The first possibility leads to the points [0,0,1], [0, 0, -1]. The second one cannot be satisfied. Note that because of the third equation k =^ 0 and substituting into the equation of the sphere, we get the equation
k2 k2 1
T + T + F=1'
which has no solution in real numbers (it is a quadratic equation in k2 with the negative discriminant). The function has a maximum and minimum, respectively, at the two computed points on the given sphere. □
8.H.5. Determine whether the function / : R3 -> R, f(x, y, z) = xyz, has any externa on the ellipsoid given by the equation
g(x,y, z) = kx2 + ly2 + z2 = 1, k, / G R+.
If so, calculate them.
Solution. First, we build the equations which must be satisfied by the stationary points of the given function on the ellipsoid:
dg ,df
—— = A— : yz = ZXkx, ox ox
dg ,df
— = A— : xz = 2Xly, dy dy
dg xdf
—— = A— : xy = 2\z.
Oz Oz
The inverse mapping theorem
Theorem. Let F : En —> En be a differentiable mapping on a neighbourhood of a point x0 G En, and let the Jacobi matrix D1F(x0) be invertible.
Then in some neighbourhood of x0, the inverse mapping F-1 exists, it is differentiable, and its differential at the point F(x0) is the inverse mapping to the differential D^F(x0).
Hence, D1 (F~v)(F(x0)) is given by the inverse matrix to the Jacobi matrix of the mapping F at the point x0.
Proof. First, verify that the theorem makes sense and , is as expected. If it is supposed that the in-;/ verse mapping exists and is differentiable at F(x0), then differentiating the composite mapping F-1 o F enforces the formula
id«
D1(F~1 o F)(x0) = D\F~1) o D^ixo),
which verifies the formula at the conclusion of the theorem. Therefore, it is known at the beginning which differential for F-1 to find.
Next, suppose that the inverse mapping _F_1 exists in a neighbourhood of the point F(x0) and that it is continuous. Since F is differentiable in a neighbourhood of x0, it follows that
(1) F(x) - F(x0) - D^Fix^ix - x0) = a(x - x0)
with function a : R™ —> 0 satisfying lim^o 'p]fQ(1') = 0. To verify the approximation properties of the linear mapping (_D1_F(a;o))~1, it suffices to calculate the following limit for y = F(x) approaching y0 = F(x0):
lim 1 (f-1(y)-P-1(yo)-(fJ1P(a:0))-1(y-y0)).
y->y° \\y-yo\\
Substituting (1) for y — yo into the latter equality yields
1
lim
x — Xq
y^yo \\y - y0\\
- (z)1^))-1^1^)^ - x0) + a(x - xq))^
-1
lim
y^yo \\y-y0\
= (D^ixo))-1 lim
■(D'FixoJr^aix-xo))
(-1)
■(a(x - x0)),
y^yo \\y - y0\\
where the last equality follows from the fact that linear mappings between finite-dimensional spaces are always continuous. Hence performing this linear mapping commutes with the limit process.
The proof is almost finished. The limit at the end of the expression is, using the properties of a, zero if the values \\F(x)—F(x0)\\ are greater than C\\x—x0\\ for some constant
536
CHAPTER 8. CALCULUS WITH MORE VARIABLES
We can easily see that the equation can only be satisfied by a triple of non-zero numbers. Dividing pairs of equations and substituting into the ellipse's equation, we get eight solutions, namely the stationary points x = ±y|p V = ^^TJp z = ±-j^. However, the function / takes on only two distinct values at these eight points. Since it is continuous and the given ellipsoid is compact, / must have both a maximum and minimum on it. Moreover, since both / and g are continuously differentiable, these externa must occur at stationary points. Therefore, it must be that four of the computed stationary points are local maxima of the function (of value 3J3kl) and the other four are minima (of value — 3J3kl). □
8.H.6. Determine the global extrema of the function
f(x, y) = x² − 2y² + 4xy − 6x − 1
on the set of points [x, y] that satisfy the inequalities
(1) x ≥ 0, y ≥ 0, y ≤ −x + 3.
Solution. We are given a polynomial with continuous partial derivatives on a compact (i. e. closed and bounded) set. Such a function necessarily has both a minimum and a maximum on this set, and these can occur only at stationary points or on the boundary. Therefore, it suffices to find the stationary points inside the set and the ones on the finitely many open (or singleton) parts of the boundary, then evaluate f at these points and choose the least and the greatest values. Notice that the set of points determined by the inequalities (1) is the triangle with vertices at [0, 0], [3, 0], [0, 3].
Let us determine the stationary points inside this triangle as the solution of the equations f_x = 0, f_y = 0. Since
f_x(x, y) = 2x + 4y − 6,  f_y(x, y) = 4x − 4y,
these equations are satisfied only by the point [1, 1]. The boundary suggests itself to be expressed as the union of three line segments given by the choice of pairs of vertices. First, we consider x = 0, y ∈ [0, 3], when f(x, y) = −2y² − 1. We know the graph of this (univariate) function on the interval [0, 3], so it is not difficult to find the points at which its global extrema occur: they are the marginal points [0, 0], [0, 3]. Similarly, we can consider y = 0, x ∈ [0, 3], also obtaining the marginal points [0, 0], [3, 0]. Finally, we get to the line segment y = −x + 3, x ∈ [0, 3]. Making some rearrangements, we get
f(x, y) = f(x, −x + 3) = −5x² + 18x − 19,  x ∈ [0, 3].
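The comparison of the candidate values can also be delegated to a brute-force scan over the triangle. The following sketch (not part of the original solution) samples the region on a fine grid and reports the extreme values, which should match the evaluation of f at the candidates found above; the point (9/5, 6/5) is the vertex of the parabola on the third segment.

```python
import numpy as np

def f(x, y):
    return x**2 - 2*y**2 + 4*x*y - 6*x - 1

# Sample the closed triangle x >= 0, y >= 0, y <= 3 - x on a fine grid.
xs = np.linspace(0.0, 3.0, 601)
X, Y = np.meshgrid(xs, xs)
mask = Y <= 3.0 - X

vals = f(X, Y)[mask]
print("max ~", vals.max())   # close to f(0, 0) = -1
print("min ~", vals.min())   # close to f(0, 3) = -19

# Candidates: interior stationary point and boundary points.
for p in [(1, 1), (0, 0), (3, 0), (0, 3), (9/5, 6/5)]:
    print(p, f(*p))
```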
C > 0. This can be translated in terms of the inverse as
C‖F⁻¹(y) − F⁻¹(y₀)‖ ≤ ‖y − y₀‖, i.e.
‖F⁻¹(y) − F⁻¹(y₀)‖ ≤ (1/C)‖y − y₀‖ …

8.2.8. Theorem. Let G : ℝⁿ → ℝⁿ be a continuously differentiable and invertible mapping, and write
t = (t₁, …, tₙ),  x = (x₁, …, xₙ) = G(t₁, …, tₙ).
Further, let M = G(N) be a Riemann measurable set, and let f : M → ℝ be a continuous function. Then N is also Riemann measurable, and
∫_M f(x) dx₁ ⋯ dxₙ = ∫_N f(G(t)) |det(D¹G(t))| dt₁ ⋯ dtₙ.
8.2.9. The invariance of the integral. The first thing to be verified is the coincidence of the two definitions of the volume of parallelepipeds (taken for granted in the above intuitive explanation of the latter theorem). Volumes and similar concepts were dealt with in chapter 4, and a crucial property was the invariance of the concepts with respect to the choice of Euclidean frames of ℝⁿ, cf. 4.1.22 on page 247, which followed directly from the expression of the volumes in terms of determinants. It is needed to show that the same result holds in terms of the Riemann integration as defined above. It turns out that it is easier to deal with invariance with respect to general invertible linear mappings Ψ : ℝⁿ → ℝⁿ.
Proposition. Let Ψ : ℝⁿ → ℝⁿ be an invertible linear mapping and I ⊂ ℝⁿ a multidimensional interval. Consider a function f such that f ∘ Ψ is integrable on I. Then M = Ψ(I) is Riemann measurable, f is Riemann integrable on M, and
∫_M f dx₁ ⋯ dxₙ = |det Ψ| ∫_I (f ∘ Ψ)(y₁, …, yₙ) dy₁ ⋯ dyₙ.
Proof. Each linear mapping is a composition of elementary transformations of three types (see the discussion in chapter 2, in particular paragraphs 2.1.7 and 2.1.9). The first one is the multiplication of one of the coordinates by a constant: …

x = (r/√2) cos(φ), y = r sin(φ), z = z, with Jacobian J = r/√2. The equation of the paraboloid in these coordinates is z = r², so the volume of the solid is equal to
V = 4 ∫₀^{π/2} ∫₀^{√2} ∫_{r²}^{2} (r/√2) dz dr dφ
= 2√2 ∫₀^{π/2} ∫₀^{√2} (2r − r³) dr dφ = 2√2 ∫₀^{π/2} dφ
= √2 π.
□
8.1.10. Calculate the volume of the ellipsoid x² + 2y² + 3z² = 1.
Solution. We will consider the coordinates x = r cos(φ), …

… r + 2 = 8 − r², i. e., r = 2; here z = r + 2 for the former surface and z = 8 − r² for the latter. Altogether, the projection of the given solid onto the coordinate φ is equal to the interval [0, 2π]. Having fixed a φ₀ ∈ [0, 2π], the projection of the intersection of the solid and the plane φ = φ₀ onto the coordinate r equals (independently of φ₀) the interval [0, 2]. Having fixed both r₀ and φ₀, the projection of the intersection of the solid and the line r = r₀, φ = φ₀, onto the coordinate z is equal to the interval [r₀ + 2, 8 − r₀²]. The Jacobian of the considered transformation is J = r/√2, so we can write
V = ∫₀^{2π} ∫₀^{2} ∫_{r+2}^{8−r²} (r/√2) dz dr dφ = (16√2/3)π.
□
8.1.15. Find the volume of the solid which lies inside the cylinder y² + z² = 4 and the half-space x ≥ 0 and is bounded by the surface y² + z² + 2x = 16.
Solution. In cylindric coordinates,
V = ∫₀^{2π} ∫₀^{2} ∫₀^{8−r²/2} r dx dr dφ = 28π.
□
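The value is easy to cross-check numerically; the following scipy sketch (an illustration, not part of the original solution) evaluates the same iterated integral in the coordinates y = r cos φ, z = r sin φ, x = x.

```python
import numpy as np
from scipy.integrate import tplquad

# V = ∫_0^{2π} ∫_0^2 ∫_0^{8 - r²/2} r dx dr dφ  (Jacobian r).
V, err = tplquad(
    lambda x, r, phi: r,             # integrand, innermost variable first
    0.0, 2*np.pi,                    # φ range
    0.0, 2.0,                        # r range
    0.0, lambda phi, r: 8 - r**2/2,  # x range depends on r
)
print(V, 28*np.pi)                   # both ≈ 87.96
```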
8.1.16. The centroid of a solid. The coordinates (x_T, y_T, z_T) of the centroid of a (homogeneous) solid T with volume V in ℝ³ are given by the following integrals:
x_T = (1/V) ∭_T x dx dy dz,
y_T = (1/V) ∭_T y dx dy dz,
z_T = (1/V) ∭_T z dx dy dz.
The centroid of a figure in ℝ² or other dimensions can be computed analogously.
8.1.17. Find the centroid of the part of the ellipse 3x² + 2y² = 1 which lies in the first quadrant of the plane ℝ².
can be guaranteed. The entire image of J lies inside a slightly enlarged linear image of J by the derivative. Now, the outer measure a of the image G(J) satisfies
a ≤ (1 + ε)ⁿ volₙ R = (1 + ε)ⁿ |det D¹G(t)| volₙ J.
If μ is the upper Riemann sum for the measure of ∂M corresponding to the chosen partition, the outer measure of G(∂M) must be bounded by (1 + ε)ⁿ max_{t∈∂M} |det D¹G(t)| μ. Finally, for the same ε, the norm of the partition is bounded, so that μ < ε, too. But then the outer measure is bounded by a constant multiple of (1 + ε)ⁿ ε, with the universal constant max_{t∈∂M} |det D¹G(t)|. So the outer measure is zero, as required. □
A slightly extended argumentation as in the proof above leads to the understanding that the Riemann integrable functions are exactly those bounded functions with compact support whose set of discontinuity points has (Riemann) measure zero.
8.2.11. Proof of Theorem 8.2.8. A continuous function f and a differentiable change of coordinates are under consideration. So the inverse G⁻¹ is continuously differentiable, and the image G⁻¹(M) = N is Riemann measurable. Hence the integrals on both sides of the equality exist, and it remains to prove that their values are equal.
Denote the composite continuous function by
g(t₁, …, tₙ) = f(G(t₁, …, tₙ)),
and choose a sufficiently large n-dimensional interval I containing N, together with a partition Ξ of it. The entire proof is nothing more than a more exact writing of the discussion presented before the formulation of the theorem.
Repeat the estimates on the volumes of images from the previous paragraph on Riemann measurability. It is already known that the images G(I_{i₁…iₙ}) of the intervals from the partition are again Riemann measurable sets. For each small part I_{i₁…iₙ} of the partition Ξ, the integral of f over J_{i₁…iₙ} = G(I_{i₁…iₙ}) certainly exists, too.
Further, if the center t_{i₁…iₙ} of the interval I_{i₁…iₙ} is fixed, then the linear image of this interval,
R_{i₁…iₙ} = G(t_{i₁…iₙ}) + D¹G(t_{i₁…iₙ})(I_{i₁…iₙ} − t_{i₁…iₙ}),
is obtained. This is an n-dimensional parallelepiped (note that the interval is shifted to the origin, transformed by the linear mapping given by the Jacobi matrix, and the result is then added to the image of the center).
If the partition is very fine, this parallelepiped differs only a little from the image J_{i₁…iₙ}. By the uniform continuity of the mapping G, there is, for an arbitrarily small ε > 0, a norm of the partition such that for all finer partitions,
G(t_{i₁…iₙ}) + (1 + ε) D¹G(t_{i₁…iₙ})(I_{i₁…iₙ} − t_{i₁…iₙ}) ⊃ J_{i₁…iₙ}.
Solution. First, let us calculate the area of the given part of the ellipse. The transformation x = (1/√3)x′, y = (1/√2)y′ with Jacobian 1/√6 leads to
S = ∫∫ dy dx = (1/√6) ∫₀^{1} ∫₀^{√(1−x′²)} dy′ dx′ = π/(4√6).
The other integrals we need can be computed directly in the Cartesian coordinates x and y:
∫∫ x dy dx = ∫₀^{1/√3} x √((1 − 3x²)/2) dx = √2/18,
∫∫ y dy dx = ∫₀^{1/√3} (1/4)(1 − 3x²) dx = √3/18.
Therefore, the coordinates of the centroid are [4√3/(9π), 2√2/(3π)]. □
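The integrals are easily double-checked with a computer algebra system. The sympy sketch below (illustrative, not from the text) recomputes the area and both moments directly in Cartesian coordinates.

```python
import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
top = sp.sqrt((1 - 3*x**2) / 2)        # upper boundary of the region
xmax = 1 / sp.sqrt(3)

S  = sp.integrate(1, (y, 0, top), (x, 0, xmax))   # area, equal to pi/(4*sqrt(6))
Mx = sp.integrate(x, (y, 0, top), (x, 0, xmax))   # sqrt(2)/18
My = sp.integrate(y, (y, 0, top), (x, 0, xmax))   # sqrt(3)/18

print(sp.simplify(S - sp.pi/(4*sp.sqrt(6))))      # 0
print(sp.simplify(Mx/S), sp.simplify(My/S))       # 4*sqrt(3)/(9*pi), 2*sqrt(2)/(3*pi)
```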
8.1.18. Find the volume and the centroid of a homogeneous cone of height h and circular base with radius r.
Solution. Positioning the cone so that the vertex is at the origin and points downwards, we have in cylindric coordinates that
V = 4 ∫₀^{π/2} ∫₀^{r} ∫_{(h/r)ρ}^{h} ρ dz dρ dφ = (1/3)πhr².
Apparently, the centroid lies on the z-axis. For the z-coordinate, we get
z_T = (1/V) ∭_cone z dV = (4/V) ∫₀^{π/2} ∫₀^{r} ∫_{(h/r)ρ}^{h} zρ dz dρ dφ = (3/4)h.
Thus, the centroid lies at the distance (1/4)h from the center of the cone's base.
□
8.1.19. Find the centroid of the solid which is bounded by the paraboloid 2x² + 2y² = z, the cylinder (x + 1)² + y² = 1, and the plane z = 0.
Solution. First, we compute the volume of the given solid. Again, we use the cylindric coordinates (x = r cos φ, y = r sin φ, z = z), where the equation of the paraboloid is z = 2r² and the equation of the cylinder reads r = −2 cos(φ). Moreover, taking into account the fact that the plane x = 0 is tangent to the given cylinder, we can easily
However, then the n-dimensional volumes also satisfy
volₙ(J_{i₁…iₙ}) ≤ (1 + ε)ⁿ volₙ(R_{i₁…iₙ}) = (1 + ε)ⁿ |det D¹G(t_{i₁…iₙ})| volₙ(I_{i₁…iₙ}).
Now, it is possible to estimate the entire integral:
∫_M f(x₁, …, xₙ) dx₁ ⋯ dxₙ = Σ_{i₁…iₙ} ∫_{J_{i₁…iₙ}} f(x₁, …, xₙ) dx₁ ⋯ dxₙ
≤ Σ_{i₁…iₙ} (sup_{t∈I_{i₁…iₙ}} g(t)) volₙ(J_{i₁…iₙ})
≤ (1 + ε)ⁿ Σ_{i₁…iₙ} (sup_{t∈I_{i₁…iₙ}} g(t)) |det D¹G(t_{i₁…iₙ})| volₙ(I_{i₁…iₙ}).
If ε approaches zero, then the norms of the partitions approach zero too; the left-hand value of the integral remains the same, while on the right-hand side, the Riemann integral of g(t)|det D¹G(t)| is obtained. Instead of the desired equality, the inequality
∫_M f(x) dx₁ ⋯ dxₙ ≤ ∫_N f(G(t)) |det(D¹G(t))| dt₁ ⋯ dtₙ
is obtained.
The same reasoning can be repeated after interchanging G and G⁻¹, the integration domains M and N, and the functions f and g. The reverse inequality is immediately obtained:
∫_N g(t) |det(D¹G(t))| dt₁ ⋯ dtₙ ≤ ∫_M f(x) |det(D¹G(G⁻¹(x)))| |det(D¹G⁻¹(x))| dx₁ ⋯ dxₙ = ∫_M f(x) dx₁ ⋯ dxₙ.
The proof is complete.
8.2.12. An example in two dimensions. The coordinate transformations are quite transparent for the integral of a continuous function f(x, y) of two variables. Consider the differentiable transformation G(s, t) = (x(s, t), y(s, t)). Denoting g(s, t) = f(x(s, t), y(s, t)),
∫_{G(N)} f(x, y) dx dy = ∫_N g(s, t) |∂x/∂s · ∂y/∂t − ∂x/∂t · ∂y/∂s| ds dt
is obtained.
As a truly simple example, calculate the integral of the indicator function of a disc with radius R (i.e. its area) and the integral of the function f(r, θ) = cos(r), defined in polar coordinates inside the circle with radius π/2 (i.e. the volume hidden under such a "cap" placed above the origin, see the illustration).
determine the bounds of the integral that corresponds to the volume of the examined solid:
V = ∫_{π/2}^{3π/2} ∫₀^{−2 cos φ} ∫₀^{2r²} r dz dr dφ = ∫_{π/2}^{3π/2} ∫₀^{−2 cos φ} 2r³ dr dφ = ∫_{π/2}^{3π/2} 8 cos⁴φ dφ = 3π,
where the last integral can be computed using the method of recurrence from 6.2.6.
Now, let us find the centroid. Since the solid is symmetric with respect to the plane y = 0, the y-coordinate of the centroid must be zero. Then, the remaining coordinates x_T and z_T of the centroid can be computed by the following integrals:
x_T = (1/V) ∭_B x dx dy dz = (1/V) ∫_{π/2}^{3π/2} ∫₀^{−2 cos φ} ∫₀^{2r²} r² cos φ dz dr dφ
= (1/V) ∫_{π/2}^{3π/2} ∫₀^{−2 cos φ} 2r⁴ cos φ dr dφ = −(1/V) ∫_{π/2}^{3π/2} (64/5) cos⁶φ dφ = −4/3,
where the last integral was computed by 6.2.6 again. Analogously for the z-coordinate of the centroid:
z_T = (1/V) ∫_{π/2}^{3π/2} ∫₀^{−2 cos φ} ∫₀^{2r²} z r dz dr dφ = 20/9.
The coordinates of the centroid are thus [−4/3, 0, 20/9].
□
8.1.20. Find the centroid of the homogeneous solid in ℝ³ which lies between the planes z = 0 and z = 2 and is bounded by the cones x² + y² = z² and x² + y² = 2z².
Solution. The problem can be solved in the same way as the previous ones; it would be advantageous to work in cylindric coordinates. However, we can notice that the solid in question is an "annular cone": it is formed by cutting a cone K₁ with base radius 2 out of a cone K₂ with base radius 2√2, of common height 2.
The centroid of the examined solid can be determined by the "rule of lever": the centroid of a system of two solids is the weighted arithmetic mean of the particular solids' centroids, weighted by the masses of the solids. We found out
First, determine the Jacobi matrix of the transformation x = r cos θ, y = r sin θ:
D¹G = ( cos θ   −r sin θ
        sin θ    r cos θ ).
Hence, the determinant of this matrix is equal to
det D¹G(r, θ) = r(sin²θ + cos²θ) = r.
Therefore, the calculation can be done directly for the disc S, which is the image of the rectangle T = [0, R] × [0, 2π] in the coordinates (r, θ). In this way, the area of the disc is obtained:
∫_S dx dy = ∫₀^{2π} ∫₀^{R} r dr dθ = ∫₀^{R} 2πr dr = πR².
The integration of the function f is very similar, using multiple integration and integration by parts:
∫_S f(x, y) dx dy = ∫₀^{2π} ∫₀^{π/2} r cos(r) dr dθ = π² − 2π.
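Both results can be confirmed in a few lines of sympy (an illustrative sketch, not part of the text):

```python
import sympy as sp

r, theta, R = sp.symbols('r theta R', positive=True)

area = sp.integrate(r, (r, 0, R), (theta, 0, 2*sp.pi))           # pi*R**2
cap  = sp.integrate(r*sp.cos(r), (r, 0, sp.pi/2), (theta, 0, 2*sp.pi))
print(area, sp.simplify(cap - (sp.pi**2 - 2*sp.pi)))              # pi*R**2, 0
```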
In many real-life applications, a much more general approach to integration is needed, which allows for dealing with objects over curves, surfaces, and their higher-dimensional analogues. For many simple cases, such tools can be built now with the help of parametrizations of such k-dimensional surfaces, employing the latter theorem to show the independence of the result of such a parametrization. These topics are postponed to the beginning of the next chapter, where a more general and geometric approach is discussed.
3. Differential equations
In this section, we return to (vector) functions of one variable, defined and examined in terms of their instantaneous changes.
8.3.1. Linear and non-linear difference models. The concept of the derivative was introduced in order to work with the instantaneous changes of the examined quantities. In the introductory chapter, difference equations were discussed, built on similar concepts in relation to sequences of scalars. As a motivating introduction to equations containing derivatives of unknown functions, recall first the difference equations.
in exercise 8.1.18 that the centroid of a homogeneous cone is situated at a quarter of its height. Therefore, the centroids of both cones lie at the same point, and this point thus must be the centroid of the examined solid as well. Hence, the coordinates of the wanted centroid are [0, 0, 3/2]. □
8.1.21. Find the volume of the solid in ℝ³ which is bounded by the cone part x² + y² = (z − 2)² and the paraboloid x² + y² = 4 − z.
Solution. We build the corresponding integral in cylindric coordinates, which evaluates as follows:
V = ∫₀^{2π} ∫₀^{1} ∫_{r+2}^{4−r²} r dz dr dφ = (5/6)π.
□
8.1.22. Find the volume of the solid in ℝ³ which lies under the cone x² + y² = (z − 2)², z ≤ 2, and over the paraboloid x² + y² = z.
Solution.
V = ∫₀^{2π} ∫₀^{1} ∫_{r²}^{2−r} r dz dr dφ = (5/6)π.
Note that the considered solid is symmetric with the solid from the previous exercise 8.1.21 (the center of the symmetry is the point [0, 0, 2]). Therefore, it must have the same volume. □
8.1.23. Find the centroid of the figure bounded by the parabola y = 4 − x² and the line y = 0. ○
8.1.24. Find the centroid of the circular sector corresponding to the angle of 60° that was cut out of a disc with radius 1. ○
8.1.25. Find the centroid of the semidisc x² + y² ≤ 1, y ≥ 0. ○
8.1.26. Find the centroid of the circular sector corresponding to the angle of 120° that was cut out of a disc with radius 1. ○
8.1.27. Find the volume of the solid in ℝ³ which is given by the inequalities z ≥ 0, z − x ≤ 0, and (x − 1)² + y² ≤ 1. ○
8.1.28. Find the volume of the solid in ℝ³ which is given by the inequalities z ≥ 0, z − y ≤ 0. ○
8.1.29. Find the volume of the solid bounded by the surface
3x² + 2y² + 3z² + 2xy − 2yz − 4xz = 1. ○
The simplest difference equations are formulated as y_{n+1} = F(y_n, n), with a function F of two variables. For example, the model describing interests of deposits or loans (this included the Malthusian model of populations) was considered. The increment was proportional to the value, y_{n+1} = a y_n, see 1.2.2. Growth by 5% is represented by a = 1.05. Considering continuous modelling, the same request leads to an equation connecting the derivative y′(t) of a function with its value,
(1) y′(t) = r y(t),
with the proportionality constant r. Here, the instantaneous growth by 5% corresponds to r = 0.05.
It is easy to guess the solution of the latter equation, i.e. a function y(t) which satisfies the equality identically:
y(t) = C e^{rt}
with an arbitrary constant C. This constant can be determined uniquely by choosing the initial value y₀ = y(t₀) at some point t₀. If a part of the increment in a model should be given as a constant independent of the value y or t (like bank charges or the natural decrease of a stock population as a result of sending some part of it to slaughterhouses), an equation can be used with a constant s on the right-hand side:
(2) y′(t) = r y(t) + s.
The solution of this equation is the function
y(t) = C e^{rt} − s/r.
It is a straightforward matter to produce this solution once it is realized that the set of all solutions of the equation (1) is a one-dimensional vector space, while the solutions of the equation (2) are obtained by adding any single one of its solutions to the solutions of the previous equation. The constant solution y(t) = k, for k = −s/r, is easily found.
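Both formulas are reproduced by any computer algebra system; a minimal sympy sketch (illustrative, not part of the text):

```python
import sympy as sp

t = sp.symbols('t')
r, s = sp.symbols('r s', positive=True)
y = sp.Function('y')

print(sp.dsolve(sp.Eq(y(t).diff(t), r*y(t))))      # equivalent to y(t) = C1*exp(r*t)
print(sp.dsolve(sp.Eq(y(t).diff(t), r*y(t) + s)))  # equivalent to y(t) = C1*exp(r*t) - s/r
```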
Similarly, in paragraph 1.4.1, the logistic model of population growth was created, based on the assumption that the ratio of the change of the population size, p(n + 1) − p(n), and its size p(n) is affine with respect to the population size itself. The model behaves similarly to the Malthusian one for small values of the population size and ceases growing when reaching a limit value K. Now, the same relation for the continuous model can be formulated for a population p(t) dependent on time t by the equality
(3) p′(t) = p(t) (−(r/K) p(t) + r).
At the value p(t) = K for a (large) constant K, the instantaneous increment of the function p is zero, while for p(t) > 0 near zero, the ratio of the rate of increment of the population and its size is close to r, which is the (small) number expressing the rate of increment of the population in good conditions (e.g. r = 0.05 would again mean immediate growth by 5%).
It is not easy to solve such an equation without knowing any theory (although this type of equations will be dealt with in a moment). However, as an exercise on differentiation, it
8.1.30. Find the volume of the part of ℝ³ lying inside the ellipsoid 2x² + y² + z² = 6 and in the half-space x ≥ 1. ○
8.1.31. The area of the graph of a real-valued function f(x, y) in variables x and y. The area of the graph of a function of two variables over an area S in the plane xy is given by the integral
∫∫_S √(1 + f_x² + f_y²) dx dy.
Considering the cone x² + y² = z², find the area of the part of its lateral surface which lies above the plane z = 0 and inside the cylinder x² + y² = y.
Solution. The wanted area can be calculated as the area of the graph of the function z = √(x² + y²) over the disc K : x² + (y − 1/2)² ≤ 1/4. We can easily see that
f_x = x/√(x² + y²),  f_y = y/√(x² + y²),
so the area is expressed by the integral
∫∫_K √(1 + f_x² + f_y²) dx dy = ∫∫_K √2 dx dy = √2 ∫₀^{π} ∫₀^{sin φ} r dr dφ = (√2/2) ∫₀^{π} sin²φ dφ = (√2/4)π.
□
8.1.32. Find the area of the part of the paraboloid z = x² + y² over the disc x² + y² ≤ 4. ○
8.1.33. Find the area of the part of the plane x + 2y + z = 10 that lies over the figure given by (x − 1)² + y² ≤ 1 and y ≥ x. ○
In the following exercise, we will also apply our knowledge of the theory of Fourier transforms from the previous chapter.
8.1.34. Fourier transform and diffraction. Light intensity is a physical quantity which expresses the transmission of energy by waves. The intensity of a general light wave is defined as the time-averaged magnitude of the Poynting vector, which is the vector product of the mutually orthogonal vectors of the electric and magnetic fields. For a monochromatic plane wave spreading in the direction of the y-axis, it satisfies
I = cε₀ (1/T) ∫₀^{T} E_y² dt,
where c is the speed of light and ε₀ is the vacuum permittivity. The monochromatic wave is described by the harmonic function E_y = ψ(x, t) = A cos(ωt − kx). The number A is the
is easily verified that the following function is a solution for every constant C:
p(t) = K / (1 + CK e^{−rt}).
For the continuous and the discrete versions of the logistic models, the values K = 100, r = 0.05, and C = 1 are chosen in the left-hand illustration. The right-hand illustration shows the discrete model from 1.4.1 (i.e. with a = 1.05 and p₁ = 1), with the same result, as expected. The choice C = 1 yields p(0) = K/(1 + K), which is very close to 1 if K is large enough.
In particular, both versions of this logistic model yield quite similar results. For example, the left-hand illustration also contains the dashed line of the graph of the solution of the equation (1) with the same constant r and initial condition (i.e. the Malthusian model of growth).
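The similarity of the two models is easy to observe numerically. The sketch below iterates a discrete logistic model and samples the continuous solution with the constants used above; the discrete update rule is assumed here in the natural form p_{n+1} = p_n + r·p_n(1 − p_n/K).

```python
import numpy as np

K, r = 100.0, 0.05

def p_cont(t):
    # Continuous solution p(t) = K / (1 + C*K*exp(-r*t)) with C = 1.
    return K / (1 + K*np.exp(-r*t))

# Discrete analogue: p_{n+1} - p_n = r * p_n * (1 - p_n/K), starting at p_1 = 1.
p = 1.0
for n in range(1, 201):
    if n % 50 == 0:
        print(n, round(p, 2), round(p_cont(n), 2))  # the two sequences stay close
    p += r * p * (1 - p/K)
```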
8.3.2. First-order differential equations. By an (ordinary) first-order differential equation is usually meant the relation between the derivative y′(t) of a function with respect to the variable t, its value y(t), and the variable itself, which can be written in terms of some real-valued function F : ℝ³ → ℝ as the equality
F(y′(t), y(t), t) = 0.
This equation resembles the implicitly defined functions y(t); however, this time there is a dependency on the derivative of the function y(t). We also often suppress the dependence of y = y(t) on the other variable t and write F(y′, y, t) = 0 instead.
If the implicit equation is solved at least explicitly with regard to the derivative, i.e.,
y′ = f(t, y)
for some function f : ℝ² → ℝ, it is clear graphically what this equation defines. For every value (t, y) in the plane, consider the arrow corresponding to the vector (1, f(t, y)). That is the velocity with which the point of the graph of the solution moves through the plane, depending on the free parameter t.
For instance, the equation (3) from the previous subsection determines the direction field shown in the illustration (together with the solution for the initial condition as above).
maximal amplitude of the wave, ω is the angular frequency, and, for any fixed t, the so-called wavelength λ is the prime period. The number k = 2π/λ is then the wave number. We have
I = cε₀ (1/T) ∫₀^{T} E_y² dt = cε₀ (1/T) ∫₀^{T} A² cos²(ωt − kx) dt
= cε₀ A² (1/T) ∫₀^{T} (1 + cos(2(ωt − kx)))/2 dt
= (1/2) cε₀ A² (1/T) [t + sin(2(ωt − kx))/(2ω)]₀^{T}
= (1/2) cε₀ A² (1 + (sin(2(ωT − kx)) − sin(−2kx))/(2ωT))
≈ (1/2) cε₀ A².
The second term in the parentheses can be neglected, since its magnitude is at most 1/(ωT), which is smaller than 10⁻⁶ for real detectors of light, so it is much inferior to 1. The light intensity is thus directly proportional to the squared amplitude.
A diffraction is such a deviation from the straight-line propagation of light which cannot be explained as the result of refraction or reflection (or of the change of the ray's direction in a medium with a continuously varying refractive index). Diffraction can be observed when a light beam propagates through a bounded space. The diffraction phenomena are strongest and easiest to see if the light goes through openings or obstacles whose size is roughly the wavelength of the light. In the case of the Fraunhofer diffraction, with which we deal in the following example, a monochromatic plane wave goes through a very thin rectangular opening and projects on a distant surface. For instance, we can highlight a spot on the wall with a laser pointer. The image we get is the Fourier transform of the function describing the permeability of the shade (the opening).
Let us choose the plane of the diffraction shade as the coordinate plane z = 0. Let a plane wave A exp(ikz) (independent of the point (x, y) of landing on the shade) hit this plane perpendicularly. Let s(x, y) denote the function of the permeability of the shade; then the resulting wave falling onto the projection surface at a point (ξ, η) can be described as the integral sum of the waves (the Huygens-Fresnel principle) which have gone through the shade and propagate through the medium from all the points (x, y, 0) (as spherical waves) into the point (ξ, η, z):
Such illustrations should evoke the idea that differential equations define a "flow" in the plane, and each choice of the initial value (t₀, y(t₀)) should correspond to a unique flow-line expressing the movement of the initial point in time t. It can be anticipated intuitively that for reasonably behaved functions f(t, y) in the equations y′ = f(t, y), there is a unique solution for all initial conditions.
8.3.3. Integration of differential equations. Before examining the conditions for the existence and uniqueness of the solutions, we present a truly elementary method for finding the solutions. The idea, mentioned briefly already in 6.2.14 on page 406, is to transform the problem to ordinary integration, which usually leads to an implicit description of the solution.
Equations with separated variables

Consider a differential equation in the form
(1) y′ = f(t) · g(y)
for two continuous functions of a real variable, f and g.
The solution of this equation can be obtained by integration, finding the antiderivatives
G(y) = ∫ dy/g(y),  F(t) = ∫ f(t) dt.
This procedure reliably finds solutions y(t) which satisfy g(y(t)) ≠ 0, given implicitly by the formula
(2) F(t) + C = G(y)
with an arbitrary constant C.
Differentiating the latter equation (2), using the chain rule for the composite function G(y(t)), leads to (1/g(y)) y′(t) = f(t), as required.
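The boxed recipe translates directly into a few lines of sympy. The helper below is a hypothetical illustration of the method, and it is applied to the example y′ = t·y solved next; it returns the implicit relation G(y) = F(t) + C.

```python
import sympy as sp

t, y, C = sp.symbols('t y C')

def separate(f, g):
    """Implicit solution G(y) = F(t) + C of y' = f(t)*g(y), away from zeros of g."""
    G = sp.integrate(1/g, y)      # antiderivative of 1/g(y)
    F = sp.integrate(f, t)        # antiderivative of f(t)
    return sp.Eq(G, F + C)

print(separate(t, y))             # Eq(log(y), t**2/2 + C)
```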
As an example, find the solution of the equation
y′ = t y.
Direct calculation gives ln |y(t)| = (1/2)t² + C with an arbitrary constant C. Hence it looks (at least for positive values of y) as
y(t) = e^{t²/2 + C} = D e^{t²/2},
where D is an arbitrary positive constant. It is helpful to examine the resulting formula and the signs thoroughly. The constant solution y(t) = 0 also satisfies the equation. For negative values of y, the same solution can be used with negative
ψ(ξ, η) = A ∫∫_{ℝ²} s(x, y) e^{−ik(ξx + ηy)} dx dy.
For the rectangular opening with sides p and q,
ψ(ξ, η) = A ∫_{−p/2}^{p/2} ∫_{−q/2}^{q/2} e^{−ik(ξx + ηy)} dy dx
= A (∫_{−p/2}^{p/2} e^{−ikξx} dx) (∫_{−q/2}^{q/2} e^{−ikηy} dy)
= A [e^{−ikξx}/(−ikξ)]_{−p/2}^{p/2} [e^{−ikηy}/(−ikη)]_{−q/2}^{q/2}
= A · (2 sin(kξp/2)/(kξ)) · (2 sin(kηq/2)/(kη))
= Apq · (sin(kξp/2)/(kξp/2)) · (sin(kηq/2)/(kηq/2)).
The graph of the function f(x) = sin(x)/x looks as follows:
The graph of the function g(x, y) = (sin x/x)(sin y/y) then looks as follows:
constants D. In fact, the constant D can be arbitrary, and a solution is found satisfying any initial value.
The illustration shows two solutions which demonstrate the instability of the equation with regard to the initial values: for every t₀, if we change a small y₀ from a negative value to a positive one, then the behaviour of the resulting solution changes dramatically. Notice also the constant solution y(t) = 0, which satisfies the initial condition y(t₀) = 0.
Using separation of variables, the non-linear equation from the previous paragraph, which describes the logistic population model, is easily solved as well. Try this as an exercise.
8.3.4. First-order linear equations. In the first chapter, we paid much attention to linear difference equations. Their general solution was determined in paragraph 1.2.2 on page 11. Although it is clear beforehand that the set of solutions is a one-dimensional affine space of sequences, the formula is a hardly transparent sum, because all the changing coefficients need to be taken into account.
Consequently, this can be used as a source of inspiration for the following construction of the solution of a general first-order linear equation
(1) y′ = a(t) y + b(t)
with continuous coefficients a(t) and b(t).
First, find the solution of the homogeneous equation y′ = a(t) y. This can be computed easily by separation of variables, obtaining the solution with y(t₀) = y₀:
y(t) = y₀ F(t, t₀),  F(t, s) = e^{∫_s^t a(x) dx}.
In the case of difference equations, the solution of the general non-homogeneous equation was "guessed", and then it was proved by induction that it was correct. It is even simpler now, since it suffices to differentiate the proposed solution to verify the statement, once we are told what the right result is:

The solution of first-order linear equations

The solution of the equation (1) with the initial value y(t₀) = y₀ is (locally in a neighbourhood of t₀) given by the formula
y(t) = y₀ F(t, t₀) + ∫_{t₀}^{t} F(t, s) b(s) ds,
where F(t, s) = e^{∫_s^t a(x) dx}.
And the diffraction we are describing:
Verify the correctness of the solution by yourselves (pay proper attention to the differentiation of the integral where t is
Since lim_{x→0} sin(x)/x = 1, the intensity at the middle of the image is directly proportional to I₀ = A²p²q². The Fourier transform can easily be seen in practice if we aim a laser pointer through a subtle opening between the thumb and the index finger; the image is the Fourier transform of the function of its permeability. The image from the last picture can be seen if we create a good rectangular opening by, for instance, gluing together some stickers with sharp edges.
J. First-order differential equations

8.J.1. Find all solutions of the differential equation
y′ = ((1 + cos²x) √(1 − y²)) / cos²x.
Solution. We are given an ordinary first-order differential equation in the form y′ = f(x, y), which is called the explicit form of the equation. Moreover, we can write it as y′ = f₁(x) · f₂(y) for continuous univariate functions f₁ and f₂ (on certain open intervals), i. e., it is a differential equation with separated variables.
First, we replace y′ with dy/dx and rewrite the differential equation in the form
dy/√(1 − y²) = ((1 + cos²x)/cos²x) dx.
Since
∫ (1 + cos²x)/cos²x dx = ∫ (1/cos²x + 1) dx,
we can integrate using the basic formulae, thereby obtaining
(1) arcsin y = tg x + x + C,  C ∈ ℝ.
However, we must keep in mind that the division by the expression √(1 − y²) is valid only if it is non-zero, i. e., only for y ≠ ±1. Substituting the constant functions y = 1, y = −1 into the given differential equation, we can immediately see that they satisfy it. We have thus obtained two more solutions,
both in the upper bound and a free parameter in the integrand, cf. 6.3.14).
In fact, there is a general method, called variation of constants, which directly yields this solution; see e.g. the problem 8.1.9. It consists in taking the solution of the homogeneous equation in the form y(t) = cF(t, t₀) and considering instead an ansatz for a solution of the non-homogeneous equation in the form y(t) = c(t)F(t, t₀) with an unknown function c(t). Differentiating yields the equation c′(t) = e^{−∫_{t₀}^t a(x) dx} b(t), and integrating this leads to c(t) = ∫_{t₀}^t e^{−∫_{t₀}^s a(x) dx} b(s) ds, i.e. y(t) = c(t) e^{∫_{t₀}^t a(x) dx}, as in the above formula. Check the details!
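As a concrete sanity check, the boxed formula can be verified by differentiation. The sketch below uses the illustrative choices a(t) = 2t and b(t) = t, which are not from the text, and confirms that the formula solves the equation and matches the initial value.

```python
import sympy as sp

t, s, x, t0, y0 = sp.symbols('t s x t0 y0')
a = lambda u: 2*u                 # illustrative coefficient a(t) = 2t
b = lambda u: u                   # illustrative coefficient b(t) = t

F = lambda u, v: sp.exp(sp.integrate(a(x), (x, v, u)))   # F(t, s) = exp(∫_s^t a(x) dx)

y = y0*F(t, t0) + sp.integrate(F(t, s)*b(s), (s, t0, t)) # the boxed solution formula
print(sp.simplify(sp.diff(y, t) - a(t)*y - b(t)))        # 0: the ODE holds
print(sp.simplify(y.subs(t, t0) - y0))                   # 0: the initial condition holds
```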
Notice also the similarity to the solution for the equations with constant coefficients explicitly computed in the form of convolution in ?? on the page ??, which could serve as inspiration, too.
As an example, the equation
y′ = 1 − xy
can be treated directly, this time encountering stable behaviour, visible in the following illustration.
8.3.5. Transformation of coordinates. The illustrations suggest that differential equations can be perceived as geometric objects (the "directional field of arrows"), so the solutions might be found by means of conveniently chosen coordinates. We return to this point of view later. Here are three simple examples of typical tricks, as seen from the explicit form of the equations in coordinates.
We begin with the homogeneous equations of the form
y′ = f(y/t).
Considering the transformation z = y/t and assuming that t ≠ 0, the chain rule yields
z′(t) = (1/t²)(t y′(t) − y(t)) = (1/t)(f(z) − z),
which is an equation with separated variables.
Other examples are the Bernoulli differential equations, which are of the form
y′(t) = f(t) y(t) + g(t) y(t)ⁿ,
which are called singular. We do not have to pay attention to the case cos x = 0, since this only loses points of the domains (but not any solutions).
Now, we comment on several parts of the computation. The expression y′ = dy/dx allows us to make many symbolic manipulations. For instance, we have
(dz/dy)(dy/dx) = dz/dx,  (dy/dx)⁻¹ = dx/dy.
The validity of these two formulae is actually guaranteed by the chain rule theorem and the theorem for differentiating an inverse function, respectively. It was just the facility of these manipulations that inspired G. W. Leibniz to introduce this notation, which has been in use up to now. Further, we should realize why we have not written the general solution (1) in the suggesting form
(2) y = sin(tg x + x + C),  C ∈ ℝ.
As we will not mention the domains of differential equations (i. e., for which values of x the expressions are well-defined), we will not change them by "redundant" simplifications, either. It is apparent that the function y from (2) is defined for all x ∈ (0, π) \ {π/2}. However, for the values of x which are close to π/2 (having fixed C), there is no y satisfying (1). In general, the solutions of differential equations are curves which may not be expressible as graphs of elementary functions (on the whole intervals where we consider them). Therefore, we will not even try to do that. □
8.J.2. Find the general solution of the equation
y′ = (2 − y) tg x.
Solution. Again, we are given a differential equation with separated variables. We have
dy/dx = (2 − y) tg x,
dy/(y − 2) = −(sin x/cos x) dx,
ln |y − 2| = ln |cos x| + ln |C|,  C ≠ 0.
Here, the constant obtained from the integration has been expressed as ln |C|, which is very advantageous (bearing in mind what we want to do next), especially in those cases when we obtain a logarithm on both sides of the equation. Further, we have
where n ≠ 0, 1. The choice of the transformation z = y^{1−n} leads to the equation
z′(t) = (1 − n) y(t)^{−n} (f(t) y(t) + g(t) y(t)ⁿ) = (1 − n) f(t) z(t) + (1 − n) g(t),
which is a linear equation, easily integrated.
We conclude with the extraordinarily important Riccati equation. It is a form of the Bernoulli equation with n = 2, extended by an absolute term:
y′(t) = f(t) y(t) + g(t) y(t)² + h(t).
This equation can also be transformed to a linear equation, provided that a particular solution x(t) can be guessed. Then, use the transformation
z(t) = 1/(y(t) − x(t)).
Verify by yourselves that this transformation leads to the equation
z′(t) = −(f(t) + 2x(t) g(t)) z(t) − g(t).
As seen in the case of integration of functions (the simplest type of equations with separated variables), the equations usually do not have a solution expressible explicitly in terms of elementary functions.
As with the standard engineering tables of values of special functions, books listing the solutions of basic equations are compiled as well.⁶ Today, the wisdom concealed in them is essentially transferred to software systems like Maple or Mathematica. Here, any task about ordinary differential equations can be entered, with results obtained in surprisingly many cases. Yet, explicit solutions are not available for most problems.
8.3.6. Existence and uniqueness. The way out of this is numerical methods, which try only to approximate the solutions. However, to be able to use them, good theoretical starting points are still needed regarding existence, uniqueness, and stability of the solutions.
We begin with the Picard-Lindeldf theorem:
Existence and uniqueness of the solutions of ODEs
Theorem. Consider a function f(t, y) : ℝ² → ℝ with continuous partial derivatives on an open set U. Then for every point (t₀, y₀) ∈ U ⊂ ℝ², there exists a maximal interval I = [t₀ − a, t₀ + b], with positive a, b ∈ ℝ, and a unique function y(t) : I → ℝ which is a solution of the equation y′ = f(t, y) on the interval I.
⁶ For example, the famous book Differentialgleichungen reeller Funktionen, Akademische Verlagsgesellschaft, Leipzig 1930, by E. Kamke, a German mathematician, contains many hundreds of solved equations. It appeared in many editions in the last century.
ln |y − 2| = ln |C cos x|,  C ≠ 0,
|y − 2| = |C cos x|,  C ≠ 0,
y − 2 = C cos x,  C ≠ 0,
where we should write ±C (after removing the absolute value). However, since we consider all non-zero values of C, it makes no difference whether we write +C or −C. We should pay attention to the fact that we have made a division by the expression y − 2. Therefore, we must examine the case y = 2 separately. The derivative of a constant function is zero, so we have found another solution, y = 2. However, this solution is not singular, since it is contained in the general solution as the case C = 0. Thus, the correct result is
y = 2 + C cos x,  C ∈ ℝ. □
8.J.3. Find the solution of the differential equation
(1 + eˣ) y y′ = eˣ
which satisfies the initial condition y(0) = 1.
Solution. If the functions f : (a, b) → ℝ and g : (c, d) → ℝ are continuous and g(y) ≠ 0 for y ∈ (c, d), then the initial problem
y′ = f(x) g(y),  y(x₀) = y₀
has a unique solution for any x₀ ∈ (a, b), y₀ ∈ (c, d). This solution is determined implicitly as
∫_{y₀}^{y} ds/g(s) = ∫_{x₀}^{x} f(s) ds.
In practical problems, we first find all solutions of the equation and then select the one which satisfies the initial condition.
Let us compute:
(1 + eˣ) y (dy/dx) = eˣ,
y dy = eˣ/(1 + eˣ) dx,
y²/2 = ln(1 + eˣ) + ln |C|,  C ≠ 0,
y²/2 = ln(C [1 + eˣ]),  C > 0.
The substitution y = 1, x = 0 then gives
1/2 = ln(2C),  i. e.  C = √e/2.
We have thus found the solution
y²/2 = ln((√e/2) [1 + eˣ]),
i. e.,
y = √(2 ln((√e/2) [1 + eˣ])). □
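The implicit computation can be confirmed symbolically. The sketch below (illustrative, not part of the original solution) checks that the found function satisfies both the equation and the initial condition.

```python
import sympy as sp

x = sp.symbols('x')
y = sp.sqrt(2*sp.log(sp.sqrt(sp.E)/2 * (1 + sp.exp(x))))   # the solution found above

lhs = (1 + sp.exp(x)) * y * sp.diff(y, x)                  # (1 + e^x) * y * y'
print(sp.simplify(lhs - sp.exp(x)))                        # 0: the equation holds
print(sp.simplify(y.subs(x, 0)))                           # 1: the initial condition holds
```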
Proof. If a differentiable function y(t) is a solution of the equation satisfying the initial condition y(t₀) = y₀, then it also satisfies
y(t) = y₀ + ∫_{t₀}^{t} y′(s) ds = y₀ + ∫_{t₀}^{t} f(s, y(s)) ds,
where the Riemann integrals exist due to the continuity of f, and hence also of y′. However, the right-hand side of this expression is the integral operator
L(y)(t) = y₀ + ∫_{t₀}^{t} f(s, y(s)) ds
acting on functions y. Solving first-order differential equations is equivalent to finding the fixed points of this operator L, that is, to finding a function y = y(t) satisfying L(y) = y.
On the other hand, if a Riemann-integrable function y(t) is a fixed point of the operator L, then it immediately follows from the fundamental theorem of calculus that y(t) satisfies the given differential equation, including the initial conditions.
It is easy to estimate how much the values L(y) and L(z) differ for various functions y(t) and z(t). Since both partial derivatives of f are continuous, f is itself locally Lipschitz. This means that, restricting the values (t, y) to a neighbourhood U of the point (t₀, y₀) with compact closure, there is the estimate
|f(t, y) − f(t, z)| ≤ C|y − z| …
8.K.9. A 100-gram body hung on a spring lengthens it by 5 cm. Express the dependency of its position on time t, provided the speed of the body is 10 cm/s when going through the equilibrium point. ○
Further practical problems that lead to differential equations can be found on page 595.
L. Higher-order differential equations
8.L.1. Underdamped oscillation. Now, we describe a simple model for the movement of a solid object attached to a point by a strong spring. If y(t) is the deviation of our object from the point y₀ = y(0) = 0, then we can assume that the acceleration y″(t) at time t is proportional to the magnitude of the deviation, yet with the opposite sign. The proportionality constant k is called the spring constant. Considering the case k = 1, we get the so-called oscillation equation
y″(t) = −y(t).
This equation corresponds to the system of equations
x'(t) = -y(t), y'(t) = x(t)
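Numerically, the system traces circles in the (x, y)-plane. A small scipy sketch (illustrative, not from the text) integrates it and checks that x² + y² is conserved over one period.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, u):
    x, y = u
    return [-y, x]                      # x' = -y, y' = x

sol = solve_ivp(rhs, (0, 2*np.pi), [1.0, 0.0], rtol=1e-9, atol=1e-9)
radius = np.hypot(sol.y[0], sol.y[1])
print(radius.min(), radius.max())       # both ≈ 1: motion stays on the unit circle
print(sol.y[0][-1], sol.y[1][-1])       # ≈ (1, 0): the period is 2π
```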
Differentiability of the solutions
Theorem. Consider an open subset U ⊂ ℝ^{n+k} and a mapping f : U → ℝⁿ with continuous first derivatives. Then the system of differential equations dependent on a parameter λ ∈ ℝᵏ, with the initial condition at a point x,
y′ = f(y, λ),  y(0) = x,
has a unique solution y(t, x, λ), which is a mapping with continuous first derivatives with respect to each variable.
Proof. Consider a general system dependent on parameters, viewed as an ordinary autonomous system with no parameters. More explicitly, consider the parameters to be additional space variables, and add the (vector) conditions λ′(t) = 0 and λ(0) = λ. Therefore, it suffices to prove the theorem for autonomous systems with no further parameters, taking care of the dependency on the initial conditions.
Just as in the proof of the fundamental existence theorem 8.3.6, build on the expression of the solutions as fixed points of the integral operators, and prove that the expected derivative, as discussed above, enjoys the properties of the differential. Fix a point x₀ as the initial condition, together with a small neighbourhood x₀ ∈ V which, if necessary, can be further decreased during the following estimates, so that
‖f(y) − f(z)‖ ≤ C‖y − z‖
holds there. … For every ε > 0, there is a bound ‖h‖ < δ for which the remainder R satisfies
‖R(y(t, x₀ + h), y(t, x₀))‖ ≤ ε‖y(t, x₀ + h) − y(t, x₀)‖ ≤ ‖h‖ ε e^{CT}.
Therefore, the estimate on G(t, h) can be improved as follows: G(t, h) ≤ ‖h‖ ε e^{CT} T. This implies that lim_{h→0} (1/‖h‖) G(t, h) = 0, as requested. □
In the same way, it can be proved that continuous differentiability of the right-hand side up to order k (inclusive) guarantees the same order of differentiability of the solutions in all input parameters.
8.3.13. The analytic case. Let us pay additional attention to the case when the right-hand side f of the system of equations
(1) y′ = f(y),  y(t₀) = y₀
of the spring, and other factors) which is initiated by an outer force.
The function f(t) can be written as a linear combination of Heaviside's function u(t) and its shift, i. e.,
f(t) = cos(2t)(u(t) − u_π(t)).
Since
𝓛(y″)(s) = s²𝓛(y) − s y(0) − y′(0) = s²𝓛(y) + 1,
we get, applying the results of the above exercises 7 and 8 to the Laplace transform of the right-hand side,
s²𝓛(y) + 1 + 4𝓛(y) = 𝓛(cos(2t)(u(t) − u_π(t)))
= 𝓛(cos(2t) · u(t)) − 𝓛(cos(2t) · u_π(t))
= 𝓛(cos(2t)) − e^{−πs}𝓛(cos(2(t + π)))
= (1 − e^{−πs}) s/(s² + 4).
Hence,
𝓛(y) = −1/(s² + 4) + (1 − e^{−πs}) s/(s² + 4)².
Performing the inverse transform, we obtain the solution in the form
y(t) = −(1/2) sin(2t) + (1/4) t sin(2t) − 𝓛⁻¹(e^{−πs} s/(s² + 4)²).
However, by formula (1), we have
𝓛⁻¹(e^{−πs} s/(s² + 4)²) = (1/4)(t − π) sin(2(t − π)) · u_π(t).
Since Heaviside's function is zero for t < π and equal to 1 for t > π, we get the solution in the form
y(t) = −(1/2) sin(2t) + (1/4) t sin(2t)  for 0 ≤ t < π,
y(t) = ((π − 2)/4) sin(2t)  for t ≥ π.
□
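Both branches are easy to verify symbolically: the first must satisfy y″ + 4y = cos(2t) with y(0) = 0, y′(0) = −1, the second the homogeneous equation, and they must agree at t = π. A sympy sketch (not from the text):

```python
import sympy as sp

t = sp.symbols('t')
y1 = -sp.sin(2*t)/2 + t*sp.sin(2*t)/4          # branch for 0 <= t < pi
y2 = (sp.pi - 2)/4 * sp.sin(2*t)               # branch for t >= pi

print(sp.simplify(sp.diff(y1, t, 2) + 4*y1 - sp.cos(2*t)))  # 0
print(sp.simplify(sp.diff(y2, t, 2) + 4*y2))                # 0
print(y1.subs(t, 0), sp.diff(y1, t).subs(t, 0))             # 0, -1
print(sp.simplify((y1 - y2).subs(t, sp.pi)))                # 0: continuity at pi
```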
8.L.3. Find the general solution of the equation
y‴ − 5y″ − 8y′ + 48y = 0.
Solution. This is a third-order linear differential equation with constant coefficients, since it is of the form
y⁽ⁿ⁾ + a₁y⁽ⁿ⁻¹⁾ + a₂y⁽ⁿ⁻²⁾ + ⋯ + a_{n−1}y′ + aₙy = f(x)
for certain constants a₁, …, aₙ ∈ ℝ. Moreover, we have f(x) = 0, i. e., the equation is homogeneous.
First of all, we find the roots of the so-called characteristic polynomial
λⁿ + a₁λⁿ⁻¹ + a₂λⁿ⁻² + ⋯ + a_{n−1}λ + aₙ.
Each real root λ with multiplicity k corresponds to the k solutions
e^{λx}, x e^{λx}, …, x^{k−1} e^{λx},
is analytic in all its arguments (i.e., f is given by a convergent multidimensional power series f(y) = Σ_{|α|≥0} c_α y^α). Exactly as in the previous discussion, we may hide the time variable t, as well as further parameters, in the variables.
The famous theorem below says that the solution of the most general system with an analytic right-hand side is analytic in all the parameters as well (including the initial conditions).

ODE version of the Cauchy-Kovalevskaya theorem

Theorem. Assume f(y) is a real analytic vector-valued function on a domain in ℝⁿ, and consider the differential equation (1). Then the unique solution of this initial problem is real analytic, including the dependency on the initial condition.
Proof. The idea of the proof is identical to the simple one-dimensional case in 6.2.15. As we saw in the beginning of the previous paragraph, there are universal (multidimensional) polynomial expressions for all derivatives of the vector function y(t) in terms of the partial derivatives of the vector function f. If we expand them in terms of the individual partial derivatives of the mapping f, all of their coefficients are obviously non-negative. Let us write again
y⁽ᵏ⁾(0) = P_k(f(y(0)), …, ∂_β f(y(0)), …)
for these multivariate vector-valued polynomials (the multi-indices β in the arguments are all of size up to k − 1).
Without loss of generality, we may consider the initial condition t₀ = 0, y(0) = 0. Indeed, constant shifts of the variables (say z = y − y₀, x = t − t₀) transform the general case to this one. Once we know that the components of the solution are power series, the transformed quantities will be analytic too, including the dependency on the values of the initial conditions.
In order to prove that the solution to the problem y′ = f(y), y(0) = 0 is analytic on a neighborhood of the origin, we shall again look for a majorant g for the vector equation y′ = f(y), i.e. we want an analytic function on a neighborhood of the origin 0 ∈ ℝⁿ with ∂_α g(0) ≥ |∂_α f(0)| for all multi-indices α. Then, by the universal computations of all the coefficients of the power series y(t) = Σ_{k=0}^{∞} (1/k!) y⁽ᵏ⁾(0) tᵏ potentially solving our problem, and similarly for z′ = g(z), the convergence of the series for z implies the same for y:
z⁽ᵏ⁾(0) = P_k(g(0), …, ∂_β g(0), …) ≥ P_k(|f(0)|, …, |∂_β f(0)|, …) ≥ |y⁽ᵏ⁾(0)|.
As usual, knowing already how to find a majorant in a simpler case, we try to apply a straightforward modification.
By the analyticity of f, for r > 0 small enough, there is a constant C such that |(1/α!) ∂_α f_i(0) r^{|α|}| ≤ C for all i = 1, …, n and multi-indices α. This means |∂_α f_i(0)| ≤ C α!/r^{|α|}. In the 1-dimensional case, we considered the multiple of the geometric series g(z) = Cr/(r − z) with the right
and every pair of complex roots λ = a ± iβ with multiplicity k corresponds to the k pairs of solutions
e^{ax} cos(βx), x e^{ax} cos(βx), …, x^{k−1} e^{ax} cos(βx),
e^{ax} sin(βx), x e^{ax} sin(βx), …, x^{k−1} e^{ax} sin(βx).
Then, the general solution corresponds to all the linear combinations of the above solutions.
Therefore, let us consider the polynomial
λ³ − 5λ² − 8λ + 48,
with roots λ₁ = λ₂ = 4, λ₃ = −3. Since we know the roots, we can deduce the general solution as well:
y = C₁e^{4x} + C₂x e^{4x} + C₃e^{−3x},  C₁, C₂, C₃ ∈ ℝ. □
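Both steps, the roots of the characteristic polynomial and the resulting general solution, can be cross-checked in a couple of lines (a sketch, not from the text):

```python
import numpy as np
import sympy as sp

print(np.roots([1, -5, -8, 48]))      # ≈ [4, 4, -3]: the double root 4 and the root -3

x = sp.symbols('x')
y = sp.Function('y')
ode = sp.Eq(y(x).diff(x, 3) - 5*y(x).diff(x, 2) - 8*y(x).diff(x) + 48*y(x), 0)
print(sp.dsolve(ode))                 # equivalent to (C1 + C2*x)*exp(4x) + C3*exp(-3x)
```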
8.L.4. Solve
y‴ + y″ + 9y′ + 9y = eˣ + 10 cos(3x).
Solution. First, we solve the corresponding homogeneous equation. The characteristic polynomial is equal to
λ³ + λ² + 9λ + 9,
with roots λ₁ = −1, λ₂ = 3i, λ₃ = −3i. The general solution of the corresponding homogeneous equation is thus
y = C₁e^{−x} + C₂cos(3x) + C₃sin(3x),  C₁, C₂, C₃ ∈ ℝ.
The solution of the non-homogeneous equation is of the form
y = C₁e^{−x} + C₂cos(3x) + C₃sin(3x) + y_p,  C₁, C₂, C₃ ∈ ℝ,
for a particular solution y_p of the non-homogeneous equation.
The right-hand side of the given equation is of a special form. In general, if the non-homogeneous part is given by a function
Pₙ(x) e^{αx},
where Pₙ is a polynomial of degree n, then there is a particular solution of the form
y_p = xᵏ Rₙ(x) e^{αx},
where k is the multiplicity of α as a root of the characteristic polynomial and Rₙ is a polynomial of degree at most n. More generally, if the non-homogeneous part is of the form
e^{αx} [Pₘ(x) cos(βx) + Sₙ(x) sin(βx)],
where Pₘ is a polynomial of degree m and Sₙ is a polynomial of degree n, there exists a particular solution of the form
y_p = xᵏ e^{αx} [R_l(x) cos(βx) + T_l(x) sin(βx)],
derivatives g⁽ᵏ⁾(0) = C k!/rᵏ. Now the most similar mapping is
g(z₁, …, zₙ) = (g₁(z₁, …, zₙ), …, gₙ(z₁, …, zₙ)),
with all the components gᵢ equal to
h(z₁, …, zₙ) = C r/(r − z₁ − ⋯ − zₙ).
Then the values of all the partial derivatives with |α| = k at z = 0 are
∂_α h(0) = C r k! (r − z₁ − ⋯ − zₙ)^{−k−1}|_{z=0} = C k!/rᵏ,
exactly as suitable. (Check the latter simple computation yourself!)
So it remains to prove that the majorant system z′ = g(z) has a converging power series solution z. Obviously, by the symmetry of g (all components equal to the same h, and h symmetric in the variables zᵢ), the solution z with z(0) = 0 must have all components equal (the system does not see any permutation of the variables zᵢ at all). Let us write zᵢ(t) = u(t) for the common solution components. With this ansatz,
u′(t) = h(u(t), …, u(t)) = C r/(r − n u(t)).
This is nearly exactly the same equation as the one in 6.2.15, and we can easily see its solution with u(0) = 0:
u(t) = (r/n)(1 − √(1 − 2nCt/r)).
Clearly, this is an analytic solution, and the proof is finished.
□
8.3.14. Vector fields and their flows. Before going on to higher-order equations, pause to consider systems of first-order equations from the geometrical point of view. When drawing illustrations of solutions earlier, we already viewed the right-hand side of an autonomous system as a "field of vectors" f(x) ∈ ℝⁿ. This shows how fast and in which direction the solution should move in time.
This can be formalized. A tangent vector with a footpoint x ∈ ℝⁿ is a couple (x, v) ∈ ℝⁿ × ℝⁿ. The set of all vectors with footpoints in an open set U ⊂ ℝⁿ is called the tangent bundle TU, with the footpoint projection p : (x, v) ↦ x. A vector field X defined on an open set U ⊂ ℝⁿ is a mapping X : U → TU which is a section of the projection p, i.e., p ∘ X = id_U. The derivative in the direction of the vector field X is defined for all differentiable functions g on U by
X(g) : U → ℝ,  X(g)(x) = d_{X(x)}g = dg(x)(X(x)).
So the vector field X is a first-order linear differential operator mapping functions to functions. Applying pointwise the properties of the directional derivative, we obtain the derivative rule (also called the Leibniz rule) for products of functions:
(1) X(gh) = h X(g) + g X(h).
In fixed coordinates, X(x) = (X₁(x), …, Xₙ(x)) and
X(g)(x) = X₁(x) ∂g/∂x₁(x) + ⋯ + Xₙ(x) ∂g/∂xₙ(x).
where k is the multiplicity of α + iβ as a root of the characteristic polynomial, and R_l, T_l are polynomials of degree at most l = max{m, n}.
In our problem, the non-homogeneous part is a sum of two functions of the special form (see above). Therefore, we look for the (two) corresponding particular solutions using the method of undetermined coefficients, and then we add up these solutions. This gives a particular solution of the original equation (as well as the general solution, then). Let us begin with the function y = eˣ, which corresponds to a particular solution y_{p1}(x) = A eˣ for some A ∈ ℝ. Since
y_{p1}(x) = y′_{p1}(x) = y″_{p1}(x) = y‴_{p1}(x) = A eˣ,
substitution into the original equation, whose right-hand side contains only the function y = eˣ, leads to 20A eˣ = eˣ, i. e., A = 1/20.
For the right-hand side with the function y = 10 cos(3x), we look for a particular solution in the form
y_{p2}(x) = x [B cos(3x) + C sin(3x)].
Recall that the number λ = 3i was obtained as a root of the characteristic polynomial. We can easily compute the derivatives
y′_{p2}(x) = B cos(3x) + C sin(3x) + x [−3B sin(3x) + 3C cos(3x)],
y″_{p2}(x) = 2 [−3B sin(3x) + 3C cos(3x)] + x [−9B cos(3x) − 9C sin(3x)],
y‴_{p2}(x) = 3 [−9B cos(3x) − 9C sin(3x)] + x [27B sin(3x) − 27C cos(3x)].
Substituting them into the equation, whose right-hand side contains the function y = 10 cos(3x), we get
−18B cos(3x) − 18C sin(3x) − 6B sin(3x) + 6C cos(3x) = 10 cos(3x).
Comparing the coefficients leads to the system of linear equations
−18B + 6C = 10,  −18C − 6B = 0,
with the only solution B = −1/2 and C = 1/6, i. e.,
y_{p2}(x) = x [−(1/2) cos(3x) + (1/6) sin(3x)].
Altogether, the general solution is
y = C₁e^{−x} + C₂cos(3x) + C₃sin(3x) + (1/20)eˣ − (1/2)x cos(3x) + (1/6)x sin(3x),  C₁, C₂, C₃ ∈ ℝ.
Clearly, there are special vector fields with all coordinate functions equal to zero except one function Xᵢ which is identically one. Such a field then corresponds to the partial derivative with respect to the variable xᵢ. This is also matched by the common notation ∂/∂xᵢ for such vector fields, and in general,
X(x) = X₁(x) ∂/∂x₁ + ⋯ + Xₙ(x) ∂/∂xₙ.
Remark. Actually, each derivative on functions, i.e., a linear operator D satisfying (1), is given by a unique vector field. This may be seen as follows. First, D(1) = D(1 · 1) = 2D(1), and thus D(c) = 0 for constant functions. Next, each function f(x) can be written on a neighborhood of a point q ∈ ℝⁿ as
f(x) = f(q) + ∫₀¹ (d/dt) f(q + t(x − q)) dt = f(q) + Σᵢ₌₁ⁿ (∫₀¹ ∂f/∂xᵢ(q + t(x − q)) dt)(xᵢ − qᵢ) = f(q) + Σᵢ₌₁ⁿ aᵢ(x)(xᵢ − qᵢ).
Thus, D(f) = 0 + Σᵢ₌₁ⁿ D(aᵢ)(qᵢ − qᵢ) + Σᵢ₌₁ⁿ aᵢ(q) D(xᵢ) = Σᵢ₌₁ⁿ aᵢ(q) D(xᵢ). Defining the components Xᵢ = D(xᵢ) of the vector field X, we obtain D as the derivative in the direction of X.
We shall write 𝔛(U) for the set of all smooth vector fields on U, i.e. those with all components Xᵢ smooth. The vector fields ∂/∂xᵢ can be perceived as generators of 𝔛(U), admitting smooth functions as the coefficients in linear combinations.
We return to the problem of finding the solution of a system of equations. Rephrase it equivalently as finding a curve x(t) which satisfies
x′(t) = X(x(t))
for each value x(t) in the domain of the vector field X. In words: the tangent vector of the curve is given, at each of its points, by the vector field X. Such a curve is called an integral curve of the vector field X, and the mapping
Fl^X_t : ℝⁿ → ℝⁿ,
defined at a point x₀ as the value of the integral curve x(t) satisfying x(0) = x₀, is called the flow of the vector field X. The theorem about the existence and uniqueness of the solutions of the systems of equations (cf. 8.3.6) says that for every continuously differentiable vector field X, its flow exists at every point x₀ of the domain for sufficiently small values of t. The uniqueness guarantees that
Fl^X_{t+s}(x) = Fl^X_t ∘ Fl^X_s(x)
whenever both sides exist. In particular, the mappings Fl^X_t and Fl^X_s always commute.
Moreover, the mapping Flf (x) with a fixed parameter t is differentiable at all points x where it is defined, cf. 8.3.12.
If a vector field X is defined on all of ℝⁿ, and if its support is compact (i.e., X(x) = 0 off a compact set K ⊂ ℝⁿ), then its flow clearly exists at all points and for all values of t. Vector fields whose flows exist for all t ∈ ℝ are called complete. The flow of a complete vector field consists of (mutually commuting) diffeomorphisms Fl^X_t : ℝⁿ → ℝⁿ with inverse diffeomorphisms Fl^X_{−t}.
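The flow property Fl^X_{t+s} = Fl^X_t ∘ Fl^X_s is easy to observe numerically. The sketch below (illustrative, not from the text; the helper `flow` is ours) does so for the rotation field X(x, y) = (−y, x) from the oscillation system above.

```python
import numpy as np
from scipy.integrate import solve_ivp

def flow(t, p):
    """Approximate Fl^X_t(p) for the field X(x, y) = (-y, x)."""
    sol = solve_ivp(lambda _, u: [-u[1], u[0]], (0, t), p,
                    rtol=1e-10, atol=1e-10)
    return sol.y[:, -1]

p = np.array([1.0, 2.0])
t, s = 0.7, 0.5
print(flow(t + s, p))          # Fl_{t+s}(p)
print(flow(t, flow(s, p)))     # Fl_t(Fl_s(p)): the same point
```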
□
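The computed general solution of 8.L.4 agrees with what a computer algebra system returns; a quick illustrative check verifies the residual of the equation:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
ode = sp.Eq(y(x).diff(x, 3) + y(x).diff(x, 2) + 9*y(x).diff(x) + 9*y(x),
            sp.exp(x) + 10*sp.cos(3*x))
sol = sp.dsolve(ode).rhs
print(sp.simplify(sol.diff(x, 3) + sol.diff(x, 2) + 9*sol.diff(x) + 9*sol
                  - sp.exp(x) - 10*sp.cos(3*x)))   # 0
```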
8.L.5. Determine the general solution of the equation
y″ + 3y′ + 2y = e^{−2x}.
Solution. The given equation is a second-order (the highest derivative of the wanted function is of order two) linear (all derivatives occur in the first power) differential equation with constant coefficients. First, we solve the homogenized equation
y″ + 3y′ + 2y = 0.
Its characteristic polynomial is
x² + 3x + 2 = (x + 1)(x + 2),
with roots x₁ = −1 and x₂ = −2. Hence, the general solution of the homogenized equation is
c₁e^{−x} + c₂e^{−2x},
where c₁, c₂ are arbitrary real constants.
Now, using the method of undetermined coefficients, we find a particular solution of the original non-homogeneous equation. According to the form of the non-homogeneity, and since −2 is a root of the characteristic polynomial of the given equation, we look for the solution in the form y₀ = a x e^{−2x} for a ∈ ℝ. Substituting into the original equation, we obtain
a[−4e^{−2x} + 4x e^{−2x} + 3(e^{−2x} − 2x e^{−2x}) + 2x e^{−2x}] = e^{−2x},
hence a = −1. We have thus found the function −x e^{−2x} as a particular solution of the given equation. Hence, the general solution is
c₁e^{−x} + c₂e^{−2x} − x e^{−2x},  c₁, c₂ ∈ ℝ. □
8.L.6. Determine the general solution of the equation
y" + y' = l.
Solution. The characteristic polynomial of the given equation is x2 + x, with roots 0 and —1. Therefore, the general solution of the homogenized equation is c1 + c2e~x, where ci,c2 G R.
We are looking for a particular solution in the form ax, a£l (since zero is a root of the characteristic polynomial). Substituting into the original equation, we get a = 1. The general solution of the given non-homogeneous equation is
ci + c2e~x + x, ci, c2 G R. □
A simple example of a complete vector field is the field X(x) = ∂/∂x₁. Its flow is given by
Fl^X_t(x₁, …, xₙ) = (x₁ + t, x₂, …, xₙ).
On the other hand, the vector field X(t) = t² d/dt on the one-dimensional space ℝ is not complete, since the solutions x(t) of the corresponding equation x′ = x² are of the form x(t) = 1/(c − t), except for the initial condition x(0) = 0; so they "run away" towards infinite values in finite time.
The points x₀ in the domain of a vector field X : U ⊂ ℝⁿ → ℝⁿ where X(x₀) = 0 are called singular points of the vector field X. Clearly Fl^X_t(x₀) = x₀ for all t at all singular points.
8.3.15. Local qualitative description. The description of vector fields as assigning the tangent vector in the modelling space to each point of the Euclidean space is independent of the coordinates. It follows that the flows exhibit a geometric concept which must be coordinate-free.
It is necessary to know what happens to the fields and their flows when the coordinates are transformed. Suppose y = F(x) is such a transformation with F : ℝ^n → ℝ^n (or on some smaller domain there). Then the solutions x(t) to a system x′ = X(x) satisfy x′(t) = X(x(t)), and in the transformed coordinates this reads
y′(t) = (F(x(t)))′(t) = D¹F(x(t)) · x′(t) = D¹F(x(t)) · X(x(t)).
This means that the "transformed field" Y in the new coordinates is Y(F(x)) = D¹F(x) · X(x). At the same time, the flows of these vector fields are related as follows:
Fl^Y_t ∘ F(x) = F ∘ Fl^X_t(x).
Indeed, fixing x = x_0 and writing x(t) = Fl^X_t(x_0), the curve F(x(t)), i.e. the right-hand side, is the unique solution of the system of equations y′ = Y(y) with the initial condition y_0 = F(x_0), which is exactly the left-hand side.
The following theorem offers a geometric local qualitative description of all solutions of systems of first order ordinary differential equations in a neighbourhood of each point x which is not singular.
The flowbox theorem
Theorem. If X is a differentiable vector field defined on a neighbourhood of a point x_0 ∈ ℝ^n and X(x_0) ≠ 0, then there exists a transformation of coordinates F such that in the new coordinates y = F(x), the vector field X is given as the constant field ∂/∂y_1.
Proof. Construct a diffeomorphism F with the required properties, step by step. Geometrically, the essence of the proof can be summarized as follows: first select a hypersurface which goes through the point x_0 and is complementary to the directions X(x) near x_0. Then fix the coordinates on it, and finally, extend them
to some neighbourhood of the point x_0 using the flow of the field X.
Without loss of generality, move the point x_0 to the origin by a translation. Then by a suitable linear transformation on ℝ^n, set X(0) = ∂/∂x_1(0).
With such coordinates, write the flow of the field X going through the point (x_1, …, x_n) at the time t = 0 as x_i(t) = ψ_i(t, x_1, …, x_n), i = 1, …, n. Next, define the components (f_1, …, f_n) of F as
f_i(x_1, …, x_n) = ψ_i(x_1, 0, x_2, …, x_n).
This follows the strategy. Since X(0, …, 0) = ∂/∂x_1(0),
∂F/∂x_1(0, …, 0) = d/dt (ψ_1(t, 0, …, 0), …, ψ_n(t, 0, …, 0))|_{t=0} = (1, 0, …, 0),
while the flow Fl^X_t at the time t = 0 yields the identity mapping.
8.L.7. Determine the general solution of the equation
y″ + 5y′ + 6y = e^{-2x}.
Solution. The characteristic polynomial of the equation is x² + 5x + 6 = (x + 2)(x + 3); its roots are -2 and -3. The general solution of the homogenized equation is thus c_1 e^{-2x} + c_2 e^{-3x}, c_1, c_2 ∈ ℝ. We look for a particular solution in the form a x e^{-2x} (-2 is a root of the characteristic polynomial), a ∈ ℝ, using the method of undetermined coefficients. Substitution into the original equation yields a = 1. Hence, the general solution of the given equation is
c_1 e^{-2x} + c_2 e^{-3x} + x e^{-2x}. □
8.L.8. Determine the general solution of the equation
y″ - y′ = 5.
With the projection p : TU → U assigning the foot points to the tangent vectors, we write T_xU for the vector space of all vectors X with p(X) = x at a point x ∈ U, and we use the notation X(U) for the set of all smooth vector fields on the open subset U.
The linear combinations of the special vector fields ∂/∂x_i, admitting smooth functions as the coefficients, generate the entire X(U). Thus we write general vector fields as
X(x) = X_1(x) ∂/∂x_1 + ⋯ + X_n(x) ∂/∂x_n.
2 ∬_D dx dy = 2 [x]_{-2}^{2} [y]_{-2}^{2} = 32.
□
9.B.2. Compute
∮_c x⁴ dx + xy dy,
where c is the positively oriented curve going through the vertices A = [0, 0], B = [1, 0], C = [0, 1].
Solution. The curve c is the boundary of the triangle ABC. The integrated functions are continuously differentiable on the whole ℝ², so we can use Green's theorem:
∮_c x⁴ dx + xy dy = ∬_Δ y dx dy = ∫_0^1 ∫_0^{-x+1} y dy dx = ∫_0^1 (x² - 2x + 1)/2 dx = [x³/6 - x²/2 + x/2]_0^1 = 1/6.
□
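The value 1/6 can be cross-checked numerically against the definition of the line integral; a sketch in Python (numpy; the helper edge is our own), parametrizing the three edges of the triangle:

    import numpy as np

    P = lambda x, y: x**4
    Q = lambda x, y: x*y

    def edge(a, b, n=100001):
        # numerical line integral of P dx + Q dy over the segment from a to b
        t = np.linspace(0.0, 1.0, n)
        x = a[0] + (b[0] - a[0])*t
        y = a[1] + (b[1] - a[1])*t
        return np.trapz(P(x, y)*(b[0] - a[0]) + Q(x, y)*(b[1] - a[1]), t)

    A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
    print(edge(A, B) + edge(B, C) + edge(C, A))  # approx. 0.16666 = 1/6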
9.B.3. Calculate
∮_c (xy + x + y) dx + (xy + x - y) dy,
where c is the circle with radius 1 centered at the origin.
Solution. Again, the prerequisites of Green's theorem are satisfied, so we can use Green's theorem, which now gives
∮_c (xy + x + y) dx + (xy + x - y) dy = ∬_D (y + 1 - x - 1) dx dy
= ∫_0^1 ∫_0^{2π} r² (sin φ - cos φ) dφ dr
= ∫_0^1 r² dr ∫_0^{2π} (sin φ - cos φ) dφ
= (1/3) [-cos φ - sin φ]_0^{2π} = 0.
□
Every differentiable mapping F : U → V between two open sets U ⊂ ℝ^n, V ⊂ ℝ^m defines the mapping F_* : TU → TV by applying the differential D¹F to the individual tangent vectors. Thus if y = F(x) = (f_1(x), …, f_m(x)), then F_* : T_xU → T_{F(x)}V,
F_*(X)(y) = Σ_{j=1}^{m} ( Σ_{i=1}^{n} (∂f_j(x)/∂x_i) X_i(x) ) ∂/∂y_j.
When we studied the vector spaces in chapter two, we came across the useful concept of linear forms. They were defined in paragraph 2.3.17 on page 106. This idea extends naturally now. A scalar valued linear mapping defined on the tangent space T_xU is such a linear form at the foot point x. The vector space of all such forms T_x*U = (T_xU)* is thus naturally isomorphic to ℝ^{n*}, and the collection T*U of these spaces comes equipped with the projection to the foot points, let us denote it again by p. Having a mapping η : U ⊂ ℝ^n → T*U with values η(x) ∈ T_x*U on an open subset U, i.e., p ∘ η = id_U, we talk about a differential form η on U, or a linear form.
Every differentiable function f on an open subset U ⊂ ℝ^n defines the differential form df on U (cf. 8.1.7). We use the notation Ω¹(U) for the set of all smooth linear differential forms on the open set U.
In the chosen coordinates (x_1, …, x_n) we can use the differentials of the particular coordinate functions to express every linear form η as
η(x) = η_1(x) dx_1 + ⋯ + η_n(x) dx_n,
where the η_i(x) are uniquely determined functions. Such a form η evaluates on a vector field X(x) = X_1(x) ∂/∂x_1 + ⋯ + X_n(x) ∂/∂x_n as
η(X)(x) = η(x)(X(x)) = η_1(x) X_1(x) + ⋯ + η_n(x) X_n(x).
If the form η is the differential of a function f, we just get back the expression
X(f)(x) = df(X(x)) = (∂f/∂x_1) X_1(x) + ⋯ + (∂f/∂x_n) X_n(x)
for the derivative of f in the direction of the vector field X.
9.1.2. Exterior differential forms. As we discussed already in chapters 1 and 4, the volume of k-dimensional parallelepipeds S, as a quantity depending on the k vectors spanning S, is an antisymmetric k-linear form in these vectors, see 2.3.22 on page 111. Remember also the computation of the volume of parallelepipeds in terms of determinants in 4.1.22 on page 247.
Thus, if we want to talk about the (linearized) volume of k-dimensional objects, we need a concept which is linear in k distinct tangent vector arguments and assigns a scalar quantity to them. Moreover, we require that interchanging any pair of arguments swaps the sign, in accordance with the orientations.
9.B.4. Compute ∮_c (2e^{2x} sin y - 3y³) dx + (e^{2x} cos y + (4/3)x³) dy, where c is the positively oriented ellipse
4x² + 9y² = 36.
Solution. We will use Green's theorem, choosing the linear deformation of polar coordinates
x = 3r cos φ, φ ∈ [0, 2π],
y = 2r sin φ, r ∈ [0, 1],
with the Jacobian of the transformation equal to 6r. This leads to
∮_c (2e^{2x} sin y - 3y³) dx + (e^{2x} cos y + (4/3)x³) dy
= ∬_D (2e^{2x} cos y + 4x² - (2e^{2x} cos y - 9y²)) dx dy
= ∫_0^1 ∫_0^{2π} 6r (4(3r cos φ)² + 9(2r sin φ)²) dφ dr
= 216 ∫_0^1 r³ dr ∫_0^{2π} dφ = 216 · [r⁴/4]_0^1 · 2π = 108π.
□
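The result 108π is easy to confirm numerically from the definition, integrating along the standard parametrization of the ellipse (a Python/numpy sketch):

    import numpy as np

    t = np.linspace(0.0, 2*np.pi, 400001)
    x, y = 3*np.cos(t), 2*np.sin(t)        # the ellipse 4x^2 + 9y^2 = 36
    dx, dy = -3*np.sin(t), 2*np.cos(t)     # derivatives of the parametrization
    P = 2*np.exp(2*x)*np.sin(y) - 3*y**3
    Q = np.exp(2*x)*np.cos(y) + 4*x**3/3
    print(np.trapz(P*dx + Q*dy, t), 108*np.pi)  # both approx. 339.292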
9.B.5. Compute
∮_c (e^x ln y - y²x) dx + (e^x/y - (1/2)x²y) dy,
where c is the positively oriented circle (x - 2)² + (y - 2)² = 1.
Solution.
∮_c (e^x ln y - y²x) dx + (e^x/y - (1/2)x²y) dy
= ∬_D (e^x/y - xy - (e^x/y - 2xy)) dx dy
= ∫_0^1 ∫_0^{2π} r (r cos φ + 2)(r sin φ + 2) dφ dr
= ∫_0^1 ∫_0^{2π} (r³ sin φ cos φ + 2r² (sin φ + cos φ) + 4r) dφ dr
= (1/4) [sin²φ/2]_0^{2π} + (2/3) [-cos φ + sin φ]_0^{2π} + 4π = 4π.
□
Exterior differential forms
Definition. The vector space of all k-linear antisymmetric forms on a tangent space T_xU, U ⊂ ℝ^n, will be denoted by Λ^k(T_xU)*. We talk about exterior k-forms at the point x ∈ U.
The assignment of a k-form η(x) ∈ Λ^k(T_xU)* to every point x ∈ U from an open subset in ℝ^n defines an exterior differential k-form on U. The set of smooth exterior k-forms on U is denoted Ω^k(U).
Next, let us consider a smooth mapping G : V → U between two open sets V ⊂ ℝ^m and U ⊂ ℝ^n, an exterior k-form η(G(x)) ∈ Λ^k(T_{G(x)}U)*, and choose arbitrarily k vectors X_1(x), …, X_k(x) in the tangent space T_xV. Just like in the case of linear forms, we can evaluate the form η at the images of the vectors X_i using the mapping y = G(x) = (g_1(x), …, g_n(x)). This operation is called the pullback of the form η by G:
G*(η(G(x)))(X_1(x), …, X_k(x)) = η(G(x))(G_*(X_1(x)), …, G_*(X_k(x))),
which is an exterior form in Λ^k(T_xV)*. In the case of linear forms, this is the dual mapping to the differential D¹G. We can compute directly from the definition that, for instance,
G*(dy_i)(∂/∂x_j) = ∂g_i/∂x_j,
and so
(1) G*(dy_i) = (∂g_i/∂x_1) dx_1 + ⋯ + (∂g_i/∂x_m) dx_m,
which extends to the linear combinations of all dy_i over functions.
Another immediate consequence of the definition is the formula for pullbacks of arbitrary k-forms by composing two diffeomorphisms:
(2) (G ∘ F)*α = F*(G*α).
Indeed, as a mapping on k-tuples of vectors,
(G ∘ F)*α = α ∘ ((D¹G ∘ D¹F) × ⋯ × (D¹G ∘ D¹F)) = G*(α) ∘ (D¹F × ⋯ × D¹F) = F*(G*α),
as expected.
9.1.3. Wedge product of exterior forms. Given a k-form α ∈ Λ^k ℝ^{n*} and an ℓ-form β ∈ Λ^ℓ ℝ^{n*}, we can create a (k + ℓ)-form α ∧ β using all possible permutations σ of the arguments. We just have to alternate the arguments in all possible orders and take the right sign each time:
(α ∧ β)(X_1, …, X_{k+ℓ}) = (1/(k! ℓ!)) Σ_{σ ∈ Σ_{k+ℓ}} sgn(σ) α(X_{σ(1)}, …, X_{σ(k)}) β(X_{σ(k+1)}, …, X_{σ(k+ℓ)}).
9.B.6. Calculate the integral
∮_c (e^x sin y - xy²) dx + (e^x cos y - …) dy,
where c is the positively oriented circle x² + y² + 4x + 4y + 7 = 0. ○
9.B.7. Compute
∮_c (3y - e^{sin x}) dx + (7x + √(y⁴ + 1)) dy,
where c is the positively oriented circle x² + y² = 9. ○
Compute the integral
∮_c (…/x + 2xy - y³/3) dx + (…/y + x² + x³/3) dy,
where c is the positively oriented boundary of the set D = {(x, y) ∈ ℝ² : 4 ≤ …}.

… for a mapping ψ : U → V ⊂ ℝ^k, we can easily compute the result, following the same definition. Let us denote
ψ*(ω)(u) = f(u) du_1 ∧ ⋯ ∧ du_k.
Invoking the relation 9.1.2(2) for the pullback of a form by a composite mapping, we get …

… the projection p : U → Q forgetting the first coordinate.
On the submanifold Q, there is the (n - 1)-dimensional involutive distribution D̄ generated by the fields Y_i|_Q, i = 2, …, n (notice we again use the argument from the beginning of the proof about the brackets of restricted fields). Now, our assumption says we find suitable coordinates (q_2, …, q_m) on Q around the point x ∈ Q, so that for all small constants b_{n+1}, …, b_m, the integral submanifolds of D̄ are defined by q_{n+1} = b_{n+1}, …, q_m = b_m.
Finally, we need to adjust the original coordinate functions y_i all over the neighborhood U of x. The obvious idea is to use the flow of X_1 = Y_1 to extend the latter coordinates on Q. Thus we define the coordinate functions in all y ∈ U using the projection p,
x_1(y) = y_1(y), x_2(y) = q_2(p(y)), …, x_m(y) = q_m(p(y)).
The hope is that all submanifolds N given by equations x_{n+1} = b_{n+1}, …, x_m = b_m (for small b_j) will be tangent to all fields Y_1, …, Y_n. Technically, this means Y_i(x_j) = 0 for all i = 1, …, n, j = n + 1, …, m. By our definition, this is obvious for the restriction to Q, and obviously Y_1(x_j) = 0 in all other points, too.
Let us look closely on what is happening with one of our functions Yi(x,j) along the flows of the field X1. We easily compute with the help of the definition of the Lie bracket
■^(Yiixj)) = Y1(Yi(xj)) = YMfa)) + [Y^fa)
m
= Yi{Xi{xA) + cmY^x/) + y^/clikYk(xj)
k=2
m
= y^cukYkjxj).
k=2
This is a system of linear ODEs for the unknown functions Y_i(x_j) in the one variable x_1 along the flow lines of Y_1. The initial condition at the points in Q is zero, and thus this constant zero value has to propagate along the flow lines, as requested. The induction step is complete. □
9.1.19. Formulation via exterior forms. As we know from linear algebra, a vector subspace of codimension k is defined by k independent linear forms. Thus, every smooth n-dimensional distribution D ⊂ TM on a manifold M can be (at least) locally defined by m - n linear forms ω_j on M.
A direct computation in coordinates reveals that the differential of a linear form ω evaluates on two vector fields as follows:
(1) dω(X, Y) = X(ω(Y)) - Y(ω(X)) - ω([X, Y]).
Indeed, if X = Σ_i X_i ∂/∂x_i, Y = Σ_i Y_i ∂/∂x_i, ω = Σ_i ω_i dx_i, then
X(ω(Y)) - Y(ω(X)) = Σ_{i,j} (X_i ∂(ω_j Y_j)/∂x_i - Y_i ∂(ω_j X_j)/∂x_i) = dω(X, Y) + ω([X, Y]).
Thus, the involutivity of a distribution defined by linear forms ω_{n+1}, …, ω_m should be closely linked to properties of the differentials on the common kernel. Indeed, there is the following version of the latter theorem:
Frobenius' theorem
Theorem. The distribution D defined on an m-dimensional manifold M by (m - n) independent smooth linear forms ω_{n+1}, …, ω_m is integrable if and only if there are linear forms α_{iℓ} such that dω_i = Σ_ℓ α_{iℓ} ∧ ω_ℓ.
Proof. Let us write ω = (ω_{n+1}, …, ω_m) for the ℝ^{m-n}-valued form. The distribution is D = ker ω. Now, the formula (1) (applied to all components of ω) implies that the involutivity of D is equivalent to dω|_D = 0.
If the assumption of the theorem on the forms holds true, dω_i clearly vanishes on the kernel of ω and therefore D is involutive, and one of the implications of the theorem is proved.
Next, assume D is integrable. By the stronger claim proved in the latter Frobenius theorem, for each point x ∈ M there are coordinates (x_1, …, x_m) such that D is the common kernel of all dx_{n+1}, …, dx_m. In particular, our forms ω_i are linear combinations (over functions) of the latter (m - n) differentials. Moreover, there must be smooth invertible matrices of functions A = (a_{kℓ}) such that
dx_k = Σ_ℓ a_{kℓ} ω_ℓ, k, ℓ = n + 1, …, m.
Finally, dω_i includes only terms dx_i ∧ dx_j with j > n, and all dx_j can be expressed via our forms ω_ℓ from the previous equation. Thus the differentials have the requested form. □
2. Remarks on Partial Differential Equations
The aim of our excursion into the landscape of differential equations is modest. We do not have space in this rather elementary guide to come close enough to this subtle, beautiful, and extremely useful part of mathematics. Still we mention a few issues.
First, the simplest method reducing the problem to already mastered ordinary differential equations is explained, based on the so called characteristics. Then we show a few more simple methods for obtaining some families of solutions.
Next, we present a more complicated theoretical approach dealing with the formal solvability of even higher order systems of differential equations and its convergence - the
famous Cauchy-Kovalevskaya theorem. This is the only instance of a general existence and uniqueness theorem for differential equations involving partial derivatives. Unfortunately, it does not cover many of the interesting problems of practical importance.
Finally, we display a few classical methods to solve boundary problems involving some of the most common equations of second order.
9.2.1. Initial observations. In practical problems, we often meet equations relating unknown functions of more variables and their derivatives. We already handled the very special case where the relations concerned functions x(t) of just one variable t. More explicitly, we dealt with vector equations
x^{(k)} = F(t, x, ẋ, ẍ, …, x^{(k-1)}), F : ℝ^{nk+1} → ℝ^n,
where the dots over x ∈ ℝ^n mean the (iterated) derivatives of x(t) = (x_1(t), …, x_n(t)), up to the order k. The goal was to find a (vector) curve x(t) in ℝ^n which makes this equation valid.
Two more comments are due: 1) we can omit the explicit appearance of t at the cost of adding one more variable and the equation ẋ_0 = 1; and 2) giving new names to the iterated derivatives x_j = x^{(j)} and adding the equations ẋ_j = x_{j+1}, j = 1, …, k - 1, we always reduce the problem to a first order system of equations (on a much bigger space).
Thus, we should like to work similarly with the equations
F(x, y, u, u_x, u_y, u_{xx}, u_{xy}, u_{yy}, …) = 0,
where u is an unknown function (possibly vector valued) of two variables x and y (or even more variables) and, as usual, the indices denote the partial derivatives. Even if we expect the implicit equation to be solved in some sense with respect to some of the highest partial derivatives, we cannot hope for a general existence and uniqueness result similar to the ODE case.
Let us start with the simplest example illustrating the general problem related to the choice of the initial conditions.
9.2.2. The simplest linear case. Consider one real function u = u(x, y), subject to the linear homogeneous equation
(1) a(x, y) u_x + b(x, y) u_y = 0,
where a and b are known functions of two variables defined for x, y in a domain Ω ⊂ ℝ². We consider the equation in the tubular domain Ω × ℝ ⊂ ℝ³. Usually, Ω is an open set together with a nice boundary, a curve ∂Ω in our case.
An obvious simple idea suggests to write Ω as a union of non-intersecting curves and look for u constant along those curves. Moreover, if those curves were transversal to the boundary ∂Ω, then initial conditions along the boundary should extend inside Ω. Thus, consider such a potentially existing curve c(t) = (x(t), y(t)) and write
0 = (d/dt) u(c(t)) = u_x(c(t)) ẋ(t) + u_y(c(t)) ẏ(t).
This yields the conditions for the requested curves:
(2) ẋ = a(x, y), ẏ = b(x, y).
Since u is considered constant along the curve, we obtain a unique possibility for the function u along the curves for all initial conditions x(0), y(0), and u(x(0), y(0)), if the coefficients a and b are at least Lipschitz in x and y.
The latter curves are called the characteristics of the first order partial differential equation (1), and they are solutions of its characteristic equations (2). If the coefficients are differentiable in all variables, then also the solution u will be differentiable for differentiable choices of initial conditions on a curve transversal to the characteristics, and we might have solved the problem (1) locally. Still it might fail.
Let us look at the homogeneous linear problem
(3) y u_x - x u_y = 0, u(x, 0) = x.
We saw already the solutions to the characteristic equations
ẋ = y, ẏ = -x,
and the characteristics are circles with centers in the origin, x(t) = R sin t, y(t) = R cos t. If we choose any even differentiable function φ(x) = u(x, 0) for the initial conditions at the points (x, 0), we are lucky to see that the solution will work. But for odd functions, e.g. our choice φ(x) = x, there will be no solution of our problem in any neighbourhood of the origin. Clearly, this failure is linked to the fact that the origin is a singular point of the characteristic equations.
9.2.3. The quasi-linear case. The situation seems to get more tricky once we add a nontrivial right-hand value f(x, y, u) to the equation (1), i.e. we try to solve the problem (allowing a and b to depend on u)
(1) a(x, y, u) u_x + b(x, y, u) u_y = f(x, y, u).
But in fact, the very same idea leads to characteristic equations on ℝ³, writing z = u(x, y) for the unknown function along the characteristics. Geometrically, we seek a vector field tangent to all graphs of solutions in the tubular domain Ω × ℝ. Recall that z = u(x, y), restricted to a curve in the graph, implies ż = u_x ẋ + u_y ẏ, and thus we may set ż = f(x, y, z), ẋ = a(x, y, z), ẏ = b(x, y, z) in order to get such a characteristic vector field.
Characteristic equations and integrals
The characteristic equations of the equation (1) are
(2) ẋ = a(x, y, z), ẏ = b(x, y, z), ż = f(x, y, z).
This autonomous system of three equations is uniquely solvable for each initial condition if a, b, and f are Lipschitz.
A function ψ on Ω × ℝ which is constant on each flow line of the characteristic vector field, i.e., ψ(x(t), y(t), z(t)) = const for all solutions of (2), is called an integral of the equation (1). If ψ_z ≠ 0, then the implicit function theorem guarantees the unique existence of the function z = u(x, y) satisfying the chosen initial conditions.
Check yourself that the latter functions u are solutions to our problem. This approach covers the homogeneous case as well; we just consider the autonomous characteristic equations with ż = 0 added.
Let us come back to our simple equation 9.2.2(3) and choose f(x, y, u) = y for the right-hand side. The characteristic equations yield x = R sin t, y = R cos t as before, while ż = y = R cos t and hence z = R sin t + z(0). Thus, we may choose ψ(x, y, z) = z - x as an integral of the equation, and we get the solutions u(x, y) = x + C with any constant C.
Notice, there will be plenty of solutions here, since we may add any solution of the homogeneous problem, i.e. all functions of the form
(3) u(x, y) = h(x² + y²)
with any differentiable function h. Thus, the general solution u(x, y) = x + h(x² + y²) depends on one function of one variable (the above constant C is a special case of h).
We may also conclude that for "reasonable" curves ∂Ω ⊂ ℝ² (those transversal to the circles centred at the origin and not containing the origin) and "reasonable" initial values u|_{∂Ω} (we have to watch the multiple intersections of the circles with ∂Ω!) there will be, at least locally, a unique solution extending the initial values to an open neighborhood of ∂Ω.
Of course, we may similarly use characteristics and integrals for any finite number of variables x = (x_1, …, x_n) and equations of the form
a_1(x, u) ∂u/∂x_1 + ⋯ + a_n(x, u) ∂u/∂x_n = f(x, u)
with the unknown function u = u(x_1, …, x_n). As we shall see later, typically we obtain generic solutions dependent on one function of n - 1 variables, similarly to the above example.
9.2.4. Systems of equations. Let us look at what happens if we add more equations. There are two quite different ways how to couple the equations.
We may seek an unknown vector valued function u = (u_1, …, u_m) : ℝ^n → ℝ^m, subject to m equations
(1) A_i(x, u) · ∇u_i = f_i(x, u), i = 1, …, m,
where the left hand side means the scalar product of a vector valued function A_i : ℝ^{m+n} → ℝ^n and the gradient vector of the function u_i. Such systems behave similarly to the scalar ones and we shall come back to them later.
The other option leads to the so called overdetermined systems of equations. Actually we shall not pay more attention to this case in the sequel and so the reader might jump to 9.2.6 if getting lost.
Consider a (scalar) function u on a domain Ω ⊂ ℝ^n and its gradient vector ∇u. For each matrix A = (a_{ij}) with m rows and n columns, with differentiable functions a_{ij}(x, u) on Ω × ℝ, and the right hand value function F(x, u) : Ω × ℝ → ℝ^m, we can consider the system of equations
(2) A(x, u) · ∇u = F(x, u).
Of course, in both cases we have got m individual equations of the type from the previous paragraph, and we could apply the same idea of characteristic vector fields to all of them. The problem consists in the coupling of the equations and obtaining possibly inconsistent necessary conditions from the individual characteristic fields.
Let us look at the overdetermined case now. We can get closest to the situation with the ordinary differential equations if A is invertible and we move it to the right hand side, arriving at the system of equations
(3) ∇u = A^{-1}(x, u) · F(x, u) = G(x, u).
The simplest non-trivial case consists of two equations in two variables:
u_x = f(x, y, u), u_y = g(x, y, u).
Geometrically, we describe the graph of the solution as a surface in ℝ³ by prescribing its tangent plane through each point. An obvious condition for the existence of such u is obtained by differentiating the equations and employing the symmetry of the higher order partial derivatives, i.e. the condition u_{xy} = u_{yx}. Indeed,
u_{xy} = f_y + f_u g = g_x + g_u f = u_{yx},
where we substituted the original equations after applying the chain rule. We shall see in a moment that this condition is also sufficient for the existence of the solutions. Moreover, if the solutions exist, then they are determined by their values in one point, similarly to the ordinary differential equations.
9.2.5. Frobenius' theorem again. Similarly, we can deal with the gradient ∇u of an m-dimensional vector valued function u. For example, if m = 2 and n = 2, we are describing the tangent planes to the two-dimensional graph of the solution u. In general, we face mn equations
(1) ∂u_p/∂x_i = F_{pi}(x, u), i = 1, …, n, p = 1, …, m.
The necessary conditions imposed by the symmetry of higher order derivatives then read
(2) ∂²u_p/∂x_i∂x_j = ∂F_{pi}/∂x_j + Σ_q (∂F_{pi}/∂u_q) F_{qj} = ∂F_{pj}/∂x_i + Σ_q (∂F_{pj}/∂u_q) F_{qi}
for all i, j and p.
Let us reconsider our problem from the geometric point of view now. We are seeking the graph of the mapping u : ℝ^n → ℝ^m. The equations (1) describe the n-dimensional distribution D on ℝ^{m+n}, and the graphs of possible solutions u = (u_1, …, u_m) are just the integral manifolds of D. The distribution D is clearly defined by the m linear forms
ω_p = du_p - Σ_i F_{pi} dx_i, p = 1, …, m,
while the vector fields generating the common kernel of all ω_p can be chosen as
X_i = ∂/∂x_i + Σ_q F_{qi} ∂/∂u_q.
Now we compute the differentials dω_p and evaluate them on the fields X_i:
-dω_p = Σ_{i,j} (∂F_{pi}/∂x_j) dx_j ∧ dx_i + Σ_{i,q} (∂F_{pi}/∂u_q) du_q ∧ dx_i,
-dω_p(X_j, X_i) = ∂F_{pi}/∂x_j + Σ_q (∂F_{pi}/∂u_q) F_{qj} - (∂F_{pj}/∂x_i + Σ_q (∂F_{pj}/∂u_q) F_{qi}).
Thus, the vanishing of the differentials on the common kernel is equivalent to the necessary conditions deduced above, and the Frobenius theorem says that the latter conditions are sufficient, too. We have proved the following:
Theorem. The system of equations (1) admits solutions if and only if the conditions (2) are satisfied. Then the solutions are determined uniquely locally around x ∈ Ω by the initial conditions u(x) ∈ ℝ^m.
Remark. The Frobenius theory deals with the so called overdetermined systems of PDEs, i.e. we have got too many equations and this causes obstructions to their integrability. Although the case in the last paragraph sounds very special, the actual use of the theory consists in considering differential consequences of a given system until we reach a point where the special theorem applies and gives not only further obstructions but also the sufficient conditions.
9.2.6. General solutions to PDEs. In a moment, we shall deal with diverse boundary conditions for the solutions of PDEs. In most cases we shall be happy to have good families of simple "guessed" solutions which are not subject to any further conditions. We talk about general solutions in this context. Unlike the situation with ODEs, we should not hope to get a universal expression for all possible solutions this way (although we can come close to that in some cases, cf. 9.2.3(3)). Instead, we often try to find the right superpositions (i.e. linear combinations) or integrals built from suitable general solutions.
Let us look at the simplest linear second order equations in two variables, homogeneous with constant coefficients:
(1) A u_{xx} + 2B u_{xy} + C u_{yy} + D u_x + E u_y + F u = 0,
where A, B, C, D, E, F are real constants and at least one of A, B, C is non-zero.
Similarly to the method of characteristics, we try to reduce the problem to ODEs. Let us again assume solution
in the form u = f(p), where f is an unknown function of p, and p(x, y) should be nice enough to get close to solutions. The necessary derivatives are u_x = f′p_x, u_y = f′p_y,
u_{xx} = f″p_x p_x + f′p_{xx}, u_{xy} = f″p_x p_y + f′p_{xy}, u_{yy} = f″p_y p_y + f′p_{yy}.
Thus (1) becomes too complicated in general, but restricting to affine p(x, y) = αx + βy with constants α, β, we arrive at
(2) (Aα² + 2Bαβ + Cβ²) f″ + (Dα + Eβ) f′ + F f = 0.
This is a nice ODE as soon as we fix the values of α and β. Let us look at several simple cases of special importance.
Assume D = E = F = 0, A ≠ 0. Then, after dividing by α², we solve the equation (A + 2Bλ + Cλ²) f″ = 0, and the right choice of the ratio λ = β/α ≠ 0 kills the entire coefficient at f″. Thus, (2) will hold true for any (twice differentiable) function f and we arrive at the general solution u(x, y) = f(p(x, y)), with p(x, y) = x + λy. Of course, the behavior will very much depend on the number of real roots of the quadratic equation
A + 2Bλ + Cλ² = 0.
The wave equation. Put A = 1, B = 0, C = -1/c²; thus our equation is u_{xx} = (1/c²) u_{yy}, the wave equation in dimension 1. Then the equation 1 - (1/c²)λ² = 0 has got two real roots λ = ±c, and we obtain p = x ± cy, leading to the general solution
u(x, y) = f(x - cy) + g(x + cy)
with two arbitrary twice differentiable functions of one variable f and g.
In Physics, the equation models one-dimensional wave development in the space parametrized by x, while y stands for the time. Notice c corresponds to the speed of the wave u(x, 0) = f(x) + g(x) initiated at the time y = 0, and while the f part moves forwards, the other part moves backwards. Indeed, imagine u(x, y) = f(x - cy) describes the displacement of a string at the point x at the time y. This remains constant along the lines x - cy = const. Thus, a stationary observer sees the initial displacement u(x, 0) moving along the x-axis with the speed c.
In particular, we see that the initial condition along a line in the plane is not enough to determine the solution, unless we request that the solution moves only in one of the possible directions (i.e. we posit either f or g to be zero).
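A symbolic verification of this general solution, with both f and g kept arbitrary (Python, sympy):

    import sympy as sp

    x, y, c = sp.symbols('x y c')
    f, g = sp.Function('f'), sp.Function('g')
    u = f(x - c*y) + g(x + c*y)

    # residual of u_xx - (1/c^2) u_yy; it simplifies to 0 for any f, g
    print(sp.simplify(sp.diff(u, x, 2) - sp.diff(u, y, 2)/c**2))  # prints 0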
The Laplace equation. Now we consider A = C = 1, B = 0, i.e. the equation u_{xx} + u_{yy} = 0. This is the Laplace equation in two dimensions, and its solutions are called harmonic functions.
Proceeding as before, we obtain two imaginary solutions to the equation λ² + 1 = 0, and our method produces p = x ± iy, a complex valued function instead of the expected real one. This looks ridiculous, but we could consider f to be a mapping f : ℂ → ℂ viewed as a mapping on the complex plane. Recall that some of such mappings have got differentials D¹f(p) which actually are multiplications by complex numbers at each point, cf. ??. This is in particular true for
any polynomial or converging power series. We may request that this property holds true for all iterated derivatives of this kind. In general, we call such functions on ℂ holomorphic and we discuss them in the last part of this chapter. The reader is advised to come back to this exposition on general solutions to the Laplace equation after reading through the beginning of the part on complex analytic functions below, starting in 9.4.1.
Now, assuming f is holomorphic, we can repeat the above computation and arrive again at
(λ² + 1) f″(p) = 0,
independently of the choice of f (here f′(p) means the complex number given by the differential D¹f, and f″(p) is the iteration of this kind of derivative). Moreover, the derivatives of vector valued functions are computed for the components separately, and thus both the real and the imaginary parts of the general solution f(x + iy) + g(x - iy) will be real general solutions.
For example, consider f(p) = p², leading to
u(x, y) = (x + iy)² = (x² - y²) + i·2xy,
and a simple check shows that both terms satisfy the equation separately. Notice the two solutions x² - y² and xy provide a basis of the 2-dimensional vector space of harmonic homogeneous polynomials of degree two.
The diffusion equation. Next assume A = κ, B = C = D = F = 0, and add the first order term with E = -1. This provides the equation
u_y = κ u_{xx},
the diffusion equation in dimension one.
Applying the same method again, we arrive at the ODE
κα² f″ - β f′ = 0,
which is easy to solve. We know the solutions are found in the form f(p) = e^{νp} with ν satisfying the condition κα²ν² - βν = 0. The zero solution is not interesting, thus we are left with the general solution to our problem by substituting
p(x, y) = αx + βy and ν = β/(κα²).
Again, a simple check reveals that this is a solution. But it is not very "general" - it depends just on two scalars α and β. We have to find much better ways of finding solutions of such equations.
9.2.7. Nonhomogeneous equations. As always with linear equations, the space of solutions to the homogeneous linear equation is a real vector space (or complex, if we deal with complex valued solutions).
Let us write the equation as Lu = 0, where L is the differential operator on the left hand side. For instance,
L = A ∂²/∂x² + 2B ∂²/∂x∂y + C ∂²/∂y² + D ∂/∂x + E ∂/∂y + F
in the case of the linear equation 9.2.6(1).
The solutions of the corresponding non-homogeneous equation Lu = f with a given function f on the right hand side form an affine space. Indeed, if Lu_1 = f, Lu_2 = f, Lu_3 = 0, then clearly L(u_1 - u_2) = 0, while L(u_1 + u_3) = f. Thus, if we succeed in finding a single solution to Lu = f, then we can add any general solution of the homogeneous equation to obtain a general solution.
Let us illustrate our observation on some of our basic examples. The non-homogeneous wave equation u_{xx} - u_{yy} = x + y has got the general solution
u(x, y) = (1/6)(x³ - y³) + f(x - y) + g(x + y),
depending on two twice differentiable functions.
The non-homogeneous Laplace equation is called the Poisson equation. A general complex valued solution of the Poisson equation u_{xx} + u_{yy} = x + y is
u(x, y) = (1/6)(x³ + y³) + f(x - iy) + g(x + iy),
depending on two holomorphic functions f and g.
9.2.8. Separation of variables. As we have experienced, a straightforward attempt to get solutions is to expect them in a particularly simple form. The method of separation of variables is based on the assumption that the solution appears as a product of single variable functions in all the variables in question. Let us apply this method to our three special examples.
The diffusion equation. We expect to find a general solution of κu_{xx} = u_t in the form u(x, t) = X(x)T(t). Thus the equation says κX″(x)T(t) = T′(t)X(x). Assume further u ≠ 0 and divide this equation by κu = κXT:
X″(x)/X(x) = T′(t)/(κT(t)).
Now the crucial observation comes. Notice the terms on the left and on the right are functions of different variables, and thus the equation may be satisfied only if both sides are constant. We shall have to distinguish the signs of this separation constant, so let us write it as -α² (choosing the negative option). Thus we have to solve two independent ODEs
X″ + α²X = 0, T′ + α²κT = 0.
The general solutions are
X(x) = A cos αx + B sin αx, T(t) = C e^{-α²κt}
with free real constants A, B, C. When combining these solutions in the product, we may absorb the constant C into the other ones, and thus we arrive at the general solution
u(x, t) = (A cos αx + B sin αx) e^{-α²κt}.
This solution depends on three real constants.
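The promised simple check can again be done symbolically (Python, sympy):

    import sympy as sp

    x, t, A, B, alpha, kappa = sp.symbols('x t A B alpha kappa')
    u = (A*sp.cos(alpha*x) + B*sp.sin(alpha*x))*sp.exp(-alpha**2*kappa*t)

    # residual of kappa*u_xx - u_t; it simplifies to 0
    print(sp.simplify(kappa*sp.diff(u, x, 2) - sp.diff(u, t)))  # prints 0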
If we choose a positive separation constant instead, i.e. α², there will be a sign change in our equations and the resulting general solution is
u(x, t) = (A cosh αx + B sinh αx) e^{α²κt}.
If the separation constant vanishes, then we obtain just u(x, t) = A + Bx, independent of t.
The Laplace equation. Assume u(x, y) = X(x)Y(y) satisfies the equation u_{xx} + u_{yy} = 0 and proceed exactly as above. Thus, X″Y + Y″X = 0, and dividing by XY and choosing the separation constant α², we arrive at
X″ = α²X, Y″ = -α²Y.
The general solution depends on four real constants A, B, C, D,
u(x, y) = (A cosh αx + B sinh αx)(C cos αy + D sin αy).
If the separation constant is negative, i.e. -α², the roles of x and y swap.
The wave equation. Let us look at how the method works if there are more variables. Consider a solution
u(x, y, z, t) = X(x)Y(y)Z(z)T(t)
of the 3D wave equation
(1/c²) u_{tt} = u_{xx} + u_{yy} + u_{zz}.
Playing the same game again, we arrive at the equation
(1/c²) T″XYZ = X″YZT + Y″XZT + Z″XYT.
Dividing by u ≠ 0,
(1/c²) T″/T = X″/X + Y″/Y + Z″/Z,
and since all the individual terms depend on different single variables, they have to be constant. Again, we shall have to pay attention to the signs of the separation constants. For instance, let us choose all constants negative and look at the individual four ODEs
T″ = -c²α²T, X″ = -β²X, Y″ = -γ²Y, Z″ = -δ²Z,
with the constants satisfying -α² = -β² - γ² - δ². The general solution is u(x, y, z, t) = X(x)Y(y)Z(z)T(t) with the linear combinations
T(t) = A cos cαt + B sin cαt,
X(x) = C cos βx + D sin βx,
Y(y) = E cos γy + F sin γy,
Z(z) = G cos δz + H sin δz,
with eight real constants A through H.
If we choose any of the separation constants positive, the corresponding component in the product displays hyperbolic sine and cosine instead. Of course, the relation between the constants reflects the signs as well.
We can also work with complex valued solutions and choose the exponentials as our building blocks (i.e. X(x) = e^{±iβx} or X(x) = e^{±βx}, etc.). For instance, take one of the solutions with all the separation constants negative:
u(x, y, z, t) = e^{iβx} e^{iγy} e^{iδz} e^{-icαt} = e^{i(βx + γy + δz - cαt)}.
Similarly to the 1D situation, we can again see a "plane wave" propagating along the direction (β, γ, δ) with the angular frequency cα.
9.2.9. Boundary conditions. We continue with our examples of second order equations and discuss the three most common boundary conditions for them. Let us consider a domain Ω ⊂ ℝ^n, bounded or unbounded, and a differential operator L defined on (real or complex valued) functions on Ω. We write ∂Ω for the boundary of Ω and assume this is a smooth manifold.
Locally, such a submanifold in ℝ^n is given by one implicit function p : ℝ^n → ℝ, and the unit normal vector ν(x), x ∈ ∂Ω, to the hypersurface ∂Ω is given by the normalized gradient
ν(x) = (1/‖∇p(x)‖) ∇p(x).
We say that a function u is differentiable on Ω if it is differentiable on its interior and the directional derivatives D¹_ν u(x) exist in all points of the boundary. Typically, we write ∂u/∂ν for the derivative in the normal direction.
For simplicity, let us restrict ourselves to L of the form … and look at the equation Lu = F(x, y, u, ∇u).
Cauchy boundary problem
At each point x ∈ ∂Ω of the boundary we prescribe both the value ρ(x) = u(x) and the derivative ψ(x) = ∂u/∂ν(x) in the unit normal direction.
The Cauchy problem is to solve the equation Lu = F on Ω, subject to u = ρ and ∂u/∂ν = ψ on ∂Ω.
We shall see that the Cauchy problems very often lead locally to unique solutions, subject to certain geometric conditions on the boundary ∂Ω. At the same time, it is often not the convenient setup for practical problems. We shall illustrate this phenomenon on the 2D Laplace equation in the next but one paragraph.
An even simpler possibility is to request only the condition on the values of u on the boundary ∂Ω. Another possibility, often needed in direct applications, is to prescribe the derivatives only. We shall see that this is reasonable for the Laplace and Poisson equations.
Dirichlet and Neumann boundary problems
At each point x ∈ ∂Ω of the boundary we prescribe the value ρ(x) = u(x) or the derivative ψ(x) = ∂u/∂ν(x) in the unit normal direction.
The Dirichlet problem is to solve the equation Lu = F on Ω, subject to the condition u = ρ on ∂Ω.
The Neumann problem is to solve the equation Lu = F on Ω, subject to the condition ∂u/∂ν = ψ on ∂Ω.
9.2.10. Uniqueness for Poisson equations. Because the proof of the next theorem works in all dimensions n ≥ 2, we shall formulate it for the general Poisson equation
(1) Δu = F.
Theorem. Assume u is a twice differentiable solution of the Poisson equation (1) on a domain Ω ⊂ ℝ^n. If u satisfies the Dirichlet condition u = ρ on ∂Ω, then u is the only solution of the Dirichlet problem.
If u satisfies the Neumann condition ∂u/∂ν = ψ on ∂Ω, then u is the unique solution of the Neumann problem, up to an additive constant.
The proof of this theorem relies on a straightforward consequence of the divergence theorem. Recall 9.1.14, saying that for each vector field X on a domain Ω ⊂ ℝ^n with hypersurface boundary ∂Ω,
(2) ∫_Ω div X dx_1 … dx_n = ∫_{∂Ω} X · ν d∂Ω,
where ν is the oriented (outward) unit normal to ∂Ω and d∂Ω stands for the volume inherited from ℝ^n on ∂Ω.
1st and 2nd Green's identities
Lemma. Let M ⊂ ℝ^n be an n-dimensional manifold with the boundary hypersurface S, and consider two differentiable functions ρ and ψ. Then
(3) ∫_M (ρΔψ + ∇ρ · ∇ψ) dx_1 … dx_n = ∫_S ρ ∇ψ · ν dS.
This version of the divergence theorem is called the 1st Green's identity.
Next, let us consider one more differentiable function μ and X = μρ∇ψ - μψ∇ρ. Then the divergence theorem yields the so called 2nd Green's identity
(4) ∫_M (ρ(∇·(μ∇))ψ - ψ(∇·(μ∇))ρ) dx_1 … dx_n = ∫_S μ(ρ∇ψ - ψ∇ρ) · ν dS,
where ∇·(μ∇) means the formal scalar product of the two vector valued differential operators.
Proof of the Green's identities. The first claim follows by applying (2) to X = ρ∇ψ, where ρ and ψ are differentiable functions and ∇ψ is the gradient of ψ. Indeed,
X · ν dS = ρ(∇ψ · ν) dS, div X = ρΔψ + ∇ρ · ∇ψ,
where the dot in the second term denotes the scalar product of the two gradients. Let us also notice that the scalar product ∇ψ · ν is just the derivative of ψ in the direction of the oriented unit normal ν.
The second identity is computed in the same way, and the two terms with the scalar products of two gradients cancel each other. The reader should check the details. □
Remark. A special case of the 2nd Green's identity is worth mentioning. Namely, if μ = 1 and both ψ and ρ vanish on the boundary ∂Ω, we obtain
∫_Ω (ρΔψ - ψΔρ) dx_1 … dx_n = 0.
This means that the Laplace operator is self-adjoint with respect to the L² scalar product on such functions.
Proof of the uniqueness. Assume u_1 and u_2 are solutions of the Poisson equation on Ω; thus u = u_1 - u_2 is a solution of the homogeneous Laplace equation,
Δu = Δu_1 - Δu_2 = F - F = 0.
At the same time, either u = u_1 - u_2 = 0 on ∂Ω, or ∂u/∂ν = 0 on ∂Ω.
Now we exploit the first Green's identity (3) with ρ = ψ = u,
∫_Ω (uΔu + ∇u · ∇u) dx_1 … dx_n = ∫_{∂Ω} u (∂u/∂ν) dS.
In both problems, Dirichlet or Neumann, the right hand side vanishes. The first term in the left hand integrand vanishes, too. We conclude
∫_Ω ‖∇u‖² dx_1 … dx_n = 0,
but this is possible only if ∇u = 0, since the integrand is continuous. Thus, u = u_1 - u_2 is constant. But if we solve a Dirichlet problem, then u_1 and u_2 coincide on the boundary and thus they are equal. □
9.2.11. Well posed problems. Consider the Cauchy boundary problem for u_{xx} + u_{yy} = 0, ∂Ω given by y = 0, and
ρ(x) = u(x, 0) = A_α sin αx, ψ(x) = u_y(x, 0) = B_α sin αx,
with the scalar coefficients A_α and B_α depending on the chosen frequency α. A simple inspection reveals that we can find such a solution within the results from the separation method:
u(x, y) = (A_α cosh αy + (1/α) B_α sinh αy) sin αx.
Now, choose B_α = 0 and A_α = 1/α, i.e.
u(x, y) = (1/α) cosh αy sin αx.
Obviously, when moving α towards infinity, the Cauchy boundary conditions can become arbitrarily small, and still a small change of the data causes an arbitrarily big increase of the values of u in any close vicinity of the line y = 0.
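The numbers illustrate the instability drastically: at the fixed height y = 0.1, boundary data of size 1/α produce values of u of size cosh(α/10)/α (a plain Python sketch with sample values of our own):

    import math

    y = 0.1
    for a in [10, 50, 100, 200]:
        data_size = 1.0/a              # size of the boundary values
        value = math.cosh(a*y)/a       # size of u at the height y = 0.1
        print(a, data_size, value)
    # for a = 200 the data are of size 0.005 while u is of size about 10**6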
Imagine the equation describes some physical process and the boundary conditions reflect some measurements, including some periodic small errors. The results will be horribly unstable with respect to these errors in the derivatives. We should admit that the problem is in some sense ill-posed, even locally. This motivates the following definition.
Well-posed and ill-posed boundary problems
The problem Lu = F on the domain Ω with boundary conditions on ∂Ω is called well-posed if all three conditions hold true:
(1) the boundary problem has got a solution u (a classical solution means u is twice continuously differentiable);
(2) the solution u is unique;
(3) the solution is stable with respect to the initial data, i.e. a "small" change of the boundary conditions results in a "small" change of the solution.
The problem is called ill-posed if any of the above conditions fails.
Usually, the stability in the third condition means that the solution is continuously dependent on the boundary conditions in a suitable topology on the chosen space of functions.
Also the uniqueness required in the second condition has to be taken reasonably. For instance, only uniqueness up to an additive constant makes sense for the Neumann problems.
9.2.12. Quasilinear equations. Now we exploit our experience and focus on the (local) Cauchy type problems for equations of arbitrary order. Similarly to the ODEs, we shall deal with problems where the highest order derivatives are prescribed (more or less) explicitly and the initial conditions are given on a hypersurface up to the order k - 1.
Some notation will be useful. We shall use the multi-indices to express multivariate polynomials and derivatives, cf. 8.1.15. Further, we shall write ∇^k u = {∂^α u; |α| = k} for the vector of all derivatives of order k. In particular, ∇u means again the gradient vector of u.
Quasi-linear PDEs
For an unknown scalar function u on a domain Ω ⊂ ℝ^n we prescribe its derivatives
(1) Σ_{|α|=k} a_α(x, u, …, ∇^{k-1}u) ∂^α u = b(x, u, …, ∇^{k-1}u),
where b and the a_α are functions on the tubular domain Ω × ℝ^N accommodating all the derivatives, with at least one of the a_α nonzero. We call such equations the (scalar) quasi-linear partial differential equations (PDEs) of order k.
We call (1) semi-linear if all a_α do not depend on u and its derivatives (thus all the non-linearity hides in b).
The principal symbol of a semi-linear PDE of order k is the symmetric k-linear form P on Ω,
P(x) : (ℝ^{n*})^k → ℝ, P(x, ξ, …, ξ) = Σ_{|α|=k} a_α(x) ξ^α.
For instance, the Poisson equation Δu = f(x, y, u, ∇u) on ℝ² is a semi-linear equation and its principal symbol is the positive definite quadratic form P(ξ, η) = ξ² + η², independent of (x, y).
The diffusion equation ∂u/∂t = Δu on ℝ³ has got the symbol P(τ, ξ, η) = ξ² + η², i.e. a positive semi-definite quadratic form, while the wave equation □u = u_{tt} - u_{xx} = 0 has got the indefinite symbol P(τ, ξ) = τ² - ξ² on ℝ².
We shall focus on the scalar equations and reduce the problem to a special situation which allows a further reduction to a system of first order equations (quite similarly to the ODE theory). Thus we extend the previous definition to systems of equations. Notice, these are systems of the first kind mentioned in 9.2.4.
Systems of quasi-linear PDEs
A system of quasi-linear PDEs determines a vector valued function u : Ω ⊂ ℝ^n → ℝ^m, subject to the vector equation
(2) A(x, u, …, ∇^{k-1}u) · ∇^k u = b(x, u, …, ∇^{k-1}u).
Here A is a matrix of the type m × M with functions a_{jα} : Ω × ℝ^N → ℝ as entries, M = C(n+k-1, k) is the number of k-combinations with repetition from n objects, ∇^k u is the vector of the vectors of all the kth-order derivatives of the components of u, b : Ω × ℝ^N → ℝ^m, and · means the scalar products of the individual rows in A with the vectors ∇^k u_i of the individual components of u, matching the individual components in b.
9.2.13. Cauchy data. Next, we have to clarify the boundary condition data. Let us consider a domain U ⊂ ℝ^n and a smooth hypersurface Γ ⊂ U, e.g. Γ given locally by an implicit equation f(x_1, …, x_n) = 0. Consider the unit normal vector ν(x) at each point x ∈ Γ (i.e. ν = ∇f/‖∇f‖ if Γ is given implicitly). We would like to find minimal data along Γ determining a solution of 9.2.12(1), at least locally around a given point.
To make things easy, let us first assume that Γ is prescribed by x_n = 0. Then ν(x) = (0, …, 0, 1) at all x ∈ Γ, and knowing the restriction of u to Γ, we also know all the derivatives ∂^α u with α = (α_1, …, α_{n-1}, 0), 0 < |α|. Thus, we have to choose reasonably differentiable functions c_j on Γ, j = 0, …, k - 1, and posit for all j
∂^α u(x) = c_j(x), α = (0, …, 0, j), x ∈ Γ.
All the other derivatives ∂^α u on Γ, 0 < |α| < ∞ with α_n < k, are computed inductively by the symmetry of partial derivatives.
Moreover, if a_{(0,…,0,k)} ≠ 0, we can establish the remaining kth order derivative by means of the equation 9.2.12(1) and hope to be able to continue inductively. Indeed, writing a = a_{(0,…,0,k)}(x, u, …, ∇^{k-1}u) ≠ 0 (and similarly leaving out the arguments of the other functions a_α), the equation 9.2.12(1) can be rewritten as
(1) ∂^k u/∂x_n^k = (1/a) ( - Σ_{|α|=k, α_n≠k} a_α ∂^α u + b(x, u, …, ∇^{k-1}u) ).
Now, on Γ we can use the already known derivatives to compute directly all the ∂^α u with α_n < k + 1. But differentiating
the latter equation by ∂/∂x_n, we obtain the missing derivative of order k + 1 from the known quantities on the right-hand side. By induction, we obtain all the derivatives, as requested.
In the general situation, we can iterate the derivative D¹_{ν(x)}u of u in the direction of the unit normal vector ν to the hypersurface Γ:
Cauchy data for scalar PDE
The (smooth or analytic) Cauchy data for the kth order quasi-linear PDE 9.2.12(1) consist of a hypersurface Γ ⊂ U and k (smooth or analytic) functions c_j, 0 ≤ j ≤ k - 1, prescribing the derivatives in the normal directions to Γ:
(2) (D¹_{ν(x)})^j u(x) = c_j(x), x ∈ Γ.
A normal direction ν(x), x ∈ Γ, is called characteristic for the given Cauchy data, if
(3) Σ_{|α|=k} a_α(x, u, …, ∇^{k-1}u) ν(x)^α = 0.
The Cauchy data are called non-characteristic if there are no characteristic normals to Γ.
Notice the situation simplifies for the semi-linear equations. Then the characteristic directions do not depend on the chosen functions c_j from the Cauchy data, and they are directly related to the properties of the principal symbol of the equation. In the case of the hyperplane Γ = {x_n = 0} treated above, the Cauchy data are non-characteristic if and only if
a_{(0,…,0,k)} ≠ 0.
For instance, semi-linear equations of first order always admit characteristic directions, since their principal symbols are linear forms and so they must have non-trivial kernels (hyperplanes of characteristic directions). In the three second order examples of the Laplace equation, the diffusion equation, and the wave equation, very different phenomena occur. Since the symbol of the Laplace equation is a positive definite quadratic form, characteristic directions can never appear, independently of our choice of Γ. On the contrary, there are always non-trivial characteristic directions in the other two cases.
Characteristic cones of semi-linear PDEs
The characteristic directions of a semi-linear PDE on a domain Ω ⊂ ℝ^n generate the characteristic cone C(x) ⊂ T_xΩ in the tangent bundle,
C(x) = {ξ ∈ T_xΩ; P(x)(ξ, …, ξ) = 0}.
The Cauchy data on a hypersurface Γ are non-characteristic if and only if (TΓ)⊥ ∩ C = {0}, i.e. the orthogonal complements to the tangent spaces to Γ with respect to the standard scalar product on ℝ^n never meet the characteristic cone.
Notice, cones for linear forms are hyperplanes in the tangent space, quadratic cones appear with second order, etc. The tangent vectors to characteristics of the first order quasi-linear equations (as introduced in 9.2.2) are orthogonal to the characteristic normals. We have learned that the first order equations propagate the solutions along the characteristic
lines and so we are not free to prescribe the Cauchy data for the solution in such a case.
9.2.14. Cauchy-Kovalevskaya Theorem. As seen so many times already, the analytic mappings are very rigid, and most questions related to them boil down to some estimates and smart combinatorial ideas. It is time to recall what happens for analytic equations and Cauchy data in the very special case of the ODEs.
For a single scalar autonomous ODE of first order, the Cauchy data consist of a single point "hypersurface" Γ = {x} in Ω ⊂ ℝ and the value u(x). In particular, the Cauchy data are always non-characteristic in dimension one. Already in 6.2.15 we gave a complete proof that the induced derivatives of u provide a converging power series and thus the only solution, on a certain neighborhood of x. In 8.3.13 we extended the same proof to autonomous systems of ODEs, which verified the same phenomenon for general systems of ODEs of any order k. Here the Cauchy data again consist of the only point in Γ and all the derivatives of u of orders less than k (and again, they are always non-characteristic).
In subsequent paragraphs we shall comment on how to extend the ODE proof to the following very famous theorem. In particular, the statement says that we have to expect general solutions to fcth order scalar equations in n variables to depend on k independent functions of n — 1 variables. This is in accordance with our experience from simple examples.
Cauchy-Kovalevskaya theorem
Theorem. The analytic Cauchy problem consisting of the quasi-linear equation 9.2.12(1) with analytic coefficients and right hand side, and of analytic non-characteristic Cauchy data 9.2.13(2), has got a unique analytic solution on a neighborhood of each point in Γ.
Notice that we have computed explicitly the formal power series for the solution (by an inductive procedure) for the special case when Γ is defined by x_n = 0. In this case, the theorem claims that this formal series always converges with a non-trivial radius of convergence.
The full proof is very technical and we do not have space to bother the readers with all the details. In the next paragraphs, we shall provide indications of the steps in the proof. If the track (or interest) is lost, the reader should rather jump to 9.2.18.
9.2.15. Flattening the Cauchy data. The first step in the proof is to transform the non-characteristic data to the "flat" hypersurface Γ discussed in the beginning of 9.2.13. Recall that for such Γ the non-characteristic condition in 9.2.13(3) reads a_{(0,…,0,k)} ≠ 0.
Let us start with the general equation and its analytic Cauchy data on an analytic Γ (we omit the arguments
of all the functions, and l = 0, …, k - 1):
(1) Σ_{|α|=k} a_α ∂^α u = b, (D¹_ν)^l u(x) = c_l(x), x ∈ Γ.
We shall work locally around some unspecified fixed point in Γ. Since Γ is an analytic hypersurface in ℝ^n, there are new local coordinates y = Ψ(x) such that
Γ = {x; Ψ_n(x) = 0}.
Moreover, …

… where Δ is the Laplace operator Δ = ∂²/∂x_1² + ⋯ + ∂²/∂x_n², k > 0 a real constant. The operator L lives on domains in ℝ^{n+1}.
Let us first return to the 2D wave equation u_{tt} = …
… with k > 0, the diffusion equation is considered on domains in ℝ^n × ℝ.
Again, let us have a look at the simplest 1D diffusion equation u_t = k u_{xx}. It describes the diffusion process in a one-dimensional object with the diffusivity k (assumed to be constant here) in time. First of all, let us notice that the usual boundary value prescription of the state at the time t = 0 does not match the assumptions of the Cauchy-Kovalevskaya theorem. Indeed, taking Γ = {t = 0}, the normal direction vector ∂/∂t is characteristic.
The intuition related to the expectations on diffusion problems suggests that the Dirichlet boundary data should suffice (we just need the initial state and the diffusion then does the rest), or we can combine them with some Neumann data (if we supply heat at some parts of the boundary). Moreover, the process should not be reversible in time, so we should not expect that the solution would extend across the line t = 0.
Let us look at a classical example considered already by Kovalevskaya. Posit
u(0, x) = g(x) = 1/(1 + x²)
on a neighborhood of the origin (perfect analytic boundary data and equation), and expect u is a solution of u_t = u_{xx} in the form u(t, x) = Σ_{k,ℓ≥0} c_{kℓ} t^k x^ℓ. …
… the limit for t → 0+ is exactly the function ψ, as expected.
We shall come back to such convolution based principles a few pages later, after investigating simpler methods.
9.2.22. Superposition of the solutions. A general idea to solve boundary value problems is to take a good supply of general solutions and try to take linear combinations of even infinitely many of them. This means we consider the solution in the form of a series. The type of the series is governed by the available solutions.
Let us illustrate the method on the diffusion equation discussed above. Imagine we want to model the temperature of a homogeneous bar of the length d. Initially, at the time t = 0, the temperature at all points x is zero. At one of its ends we keep the temperature zero, while the other end is heated with some constant intensity. Set the bar as the interval x ∈ [0, d] ⊂ ℝ, and the domain Ω = [0, d] × [0, ∞). Our boundary problem is
(1) u_t = κ u_{xx}, u(x, 0) = 0, u(0, t) = 0, u_x(d, t) = p,
where p is a constant representing the effect of the heating. The idea is to exploit the general solutions
u(x, t) = (A cos αx + B sin αx) e^{-α²κt}
from 9.2.8 with the free parameters α, A, and B. We want to consider a superposition of such solutions with properly chosen parameters and get the solution to our boundary problem in a form combining Fourier series terms with the exponentials. This approach is often called the Fourier method.
The condition $u(0,t) = 0$ suggests to restrict ourselves to $A = 0$. Then $u_x(x,t) = B\alpha\cos(\alpha x)\,e^{-\alpha^2\kappa t}$. It seems to be difficult now to guess how to combine such solutions to get something constant in time, as the Neumann part of the boundary conditions requests. But we can help ourselves with a small trick.
There are some further obvious solutions to the equation: those with $u$ depending on the space coordinate only. We may consider
$$v(x,t) = px$$
and seek our solution in the form $u(x,t) + v(x)$. Then $u$ must again be a solution of the same diffusion equation (1), but the boundary conditions change to $u(x,0) = -px$, $u(0,t) = 0$, $\frac{\partial}{\partial x}u(d,t) = 0$. Now, we want
$$u_x(d,t) = B\alpha\cos(\alpha d)\,e^{-\alpha^2\kappa t} = 0,$$
i.e. we should restrict to the frequencies $\alpha = \frac{n\pi}{2d}$, with odd positive integers $n$. This settles the second of the boundary conditions. The remaining one is $u(x,0) = -px$, which sets the condition on the coefficients $B$ in the superposition
$$\sum_{k\ge 0} B_{2k+1}\,\sin\Big(\frac{(2k+1)\pi x}{2d}\Big) = -px$$
on the interval $x \in [0,d]$. This is a simple task of finding the Fourier series of the function $x$, which we handled in 7.1.10. Combining all this, we get the requested solution $u(x,t)$ to our problem:
$$u(x,t) = px - \frac{8pd}{\pi^2}\sum_{k\ge 0}\frac{(-1)^k}{(2k+1)^2}\,\sin\Big(\frac{(2k+1)\pi x}{2d}\Big)\,e^{-\frac{(2k+1)^2\pi^2}{4d^2}\kappa t}.$$
Even though our supply of general solutions was not big, superposing countably many of them helped us to solve our problem. Notice the behavior at the heated end. If $t \to \infty$, then all the exponential terms in the sum vanish (the higher terms faster than the very first one), the sine terms are bounded, and thus the entire component with the sum vanishes quite fast. Thus, for big $t$, the solution approaches the steady profile $px$, increasing linearly with the slope $p$ towards the heated end.
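A quick numerical sanity check of this solution is easy to set up. The following sketch (Python; the values of $\kappa$, $d$, $p$ are illustrative choices, not part of the problem) evaluates the truncated series:

import numpy as np

# Truncated Fourier-method solution of u_t = kappa*u_xx with
# u(x,0) = 0, u(0,t) = 0, u_x(d,t) = p (illustrative parameters).
kappa, d, p = 1.0, 1.0, 2.0

def u(x, t, terms=200):
    k = np.arange(terms)
    lam = (2 * k + 1) * np.pi / (2 * d)      # the admissible frequencies
    B = -(8 * p * d / np.pi**2) * (-1.0)**k / (2 * k + 1)**2
    return p * x + np.sum(B * np.sin(lam * x) * np.exp(-lam**2 * kappa * t))

print(u(1.0, 0.0))            # initial condition: close to 0
print(u(0.5, 5.0), p * 0.5)   # for large t the profile approaches p*x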
9.2.23. Separation in transformed coordinates. As we have seen several times, it is very useful to view a given equation as an independent object expressed in some particular coordinates. Practical problems mostly include some symmetries, and then we should like to find suitable coordinates in order to see the equation in some simple form.
As an example, let us look at the Laplace operator $\Delta$ in the polar coordinates in the plane, and in the cylindrical or spherical coordinates in the space. Writing as usual $x = r\cos\varphi$, $y = r\sin\varphi$ for the polar transformation, the Laplace operator gets the neat form
$$(1)\qquad \Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\varphi^2} = \frac{1}{r}\frac{\partial}{\partial r}\Big(r\frac{\partial}{\partial r}\Big) + \frac{1}{r^2}\frac{\partial^2}{\partial\varphi^2}.$$
The reader should perform the tedious but straightforward computation. Similarly,
$$(2)\qquad \Delta = \frac{1}{r}\frac{\partial}{\partial r}\Big(r\frac{\partial}{\partial r}\Big) + \frac{1}{r^2}\frac{\partial^2}{\partial\varphi^2} + \frac{\partial^2}{\partial z^2},$$
$$(3)\qquad \Delta = \frac{1}{r^2}\frac{\partial}{\partial r}\Big(r^2\frac{\partial}{\partial r}\Big) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\varphi^2} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\Big(\sin\theta\,\frac{\partial}{\partial\theta}\Big)$$
in the cylindrical and spherical coordinates, respectively.
Let us illustrate the use on the following problem. Imagine a twisted circular drum, whose rim suffers a small vertical displacement. We should model the stabilized position of the drumskin.
Intuitively, we should describe the drumskin position by the 2D wave equation, but since we are interested in the stabilized state with vanishing time derivatives, we actually take $u$ as the vertical displacement in the interior of the unit circle, $\Omega = \{x^2 + y^2 < 1\} \subset \mathbb{R}^2$, and request $\Delta u = 0$, subject to the Dirichlet boundary condition prescribing the vertical displacement $u(x,y) = f(x,y)$ of the rim.
Obviously, we want to consider the problem in the polar coordinates, where the boundary condition gets the neat form $u(1,\varphi) = f(\varphi)$.
We shall apply the separation of variables method to these data. Expecting the solution in the form $u(r,\varphi) = R(r)\Phi(\varphi)$ (and allowing also for the more general eigenvalue problem $\Delta u + \alpha^2 u = 0$, $\alpha \ge 0$), we arrive at
$$\Phi'' + \beta^2\Phi = 0,\qquad r^2R'' + rR' + (\alpha^2r^2 - \beta^2)R = 0.$$
The angular component equation has the obvious solutions $A\cos\beta\varphi + B\sin\beta\varphi$, and again we have to restrict $\beta$ to integers in order to get single-valued solutions. With $\beta = m$, the radial equation is the well known Bessel's ODE of order $m$ (notice our equation gets the form we had in ?? once we substitute $z = \alpha r$), with the general solution
$$R(r) = C\,J_m(\alpha r) + D\,Y_m(\alpha r),$$
where $J_m$ and $Y_m$ are the Bessel functions of the first and second kinds.
We have obtained a general solution which is very useful in practical problems, cf. ??.
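For numerical work with these radial solutions, the Bessel functions are available in the standard libraries. A minimal sketch (SciPy's jv and yv, with an illustrative order $m$ and frequency $\alpha$) also verifies Bessel's ODE by finite differences:

import numpy as np
from scipy.special import jv, yv

m, alpha = 2, 3.5        # illustrative order and frequency
C, D = 1.0, 0.0          # D = 0 keeps the solution bounded at r = 0

def R(r):
    # radial factor R(r) = C*J_m(alpha*r) + D*Y_m(alpha*r)
    return C * jv(m, alpha * r) + D * yv(m, alpha * r)

# check r^2 R'' + r R' + (alpha^2 r^2 - m^2) R = 0 at a sample point
r, h = 0.7, 1e-5
R1 = (R(r + h) - R(r - h)) / (2 * h)
R2 = (R(r + h) - 2 * R(r) + R(r - h)) / h**2
print(r**2 * R2 + r * R1 + (alpha**2 * r**2 - m**2) * R(r))   # ~ 0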
Non-homogeneous equations. Finally, we add a few comments on the non-homogeneous linear PDEs. Although we provide arguments for the claims, we shall not go into the technical details of proofs because of the lack of space. Still, we hope this limited insight will motivate the reader to seek further sources and learn more.
As always, facing a problem $Lu = f$, we have to find a single particular solution, and we may then add all solutions to the homogeneous problem $Lu = 0$. Thus, if we have to match, say, Dirichlet conditions $u = g$ on the boundary $\partial\Omega$ of a domain $\Omega$, and we know some solution $w$, i.e. $Lw = f$ (not taking care of the boundary conditions), then we should find a solution $v$ to the homogeneous problem with the Dirichlet boundary condition $g - w|_{\partial\Omega}$. Clearly the sum $u = v + w$ will solve our problem.
In principle, we may always consider superpositions of known solutions as in the Fourier method above. We shall now briefly indicate a more conceptual and general approach.
Let us come back to the 1D diffusion equation and our solution of a homogeneous problem by means of the Fourier transform in 9.2.21. The solution of $u_t = \kappa u_{xx}$ with $u(x,0) = \varphi$ is the convolution of the boundary values $u(x,0)$ with the heat kernel
$$(1)\qquad Q(x,t) = \frac{1}{\sqrt{4\pi\kappa t}}\,e^{-\frac{x^2}{4\kappa t}}.$$
Now, the crucial observation is that $u(x,t) = Q(x,t)$ is a solution to $L(u) = u_t - \kappa u_{xx} = 0$ for all $x$ and $t > 0$, while on a neighborhood of the origin it behaves as the Dirac delta function in the variable $x$. (The first part is a matter of direct computation, the second one was revealed in 9.2.21 already.)
The latter observation suggests how to find particular solutions to a non-homogeneous problem. Consider the integral of the convolution
$$(2)\qquad u(x,t) = \int_0^t\Big(\int_{\mathbb{R}} Q(x-y,\,t-s)\,f(y,s)\,dy\Big)\,ds.$$
The derivative $u_t$ will have two terms. In the first one, we differentiate with respect to the upper limit of the outer integral, while the other one is the derivative inside the integrals. The derivatives with respect to $x$ are evaluated inside the integrals. Thus, in the evaluation of $L = \frac{\partial}{\partial t} - \kappa\frac{\partial^2}{\partial x^2}$, the terms inside of the integral cancel each other (remember $Q$ is a solution for all $x$ and $t > 0$) and only the first term of $u_t$ survives.
It seems obvious that this term is the evaluation of the integrand at $s = t$. Although these values are not properly defined, we may verify this claim by taking the limit $(t-s) \to 0_+$. This leads to
$$\lim_{s\to t_-}\int_{\mathbb{R}} Q(x-y,\,t-s)\,f(y,s)\,dy = f(x,t).$$
Thus, (2) is a particular solution and clearly u(x, 0) = 0.
The solution of the general Dirichlet problem $L(u) = f$, $u(x,0) = \varphi$ on $\Omega = \mathbb{R}\times[0,\infty)$ is
$$(3)\qquad u(x,t) = \int_{\mathbb{R}} Q(x-y,t)\,\varphi(y)\,dy + \int_0^t\Big(\int_{\mathbb{R}} Q(x-y,\,t-s)\,f(y,s)\,dy\Big)\,ds.$$
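A brief numerical check of the homogeneous part of (3) is instructive. The sketch below (Python, assuming $\kappa = 1$ and an arbitrarily chosen initial datum $\varphi$) verifies that the convolution with the heat kernel reproduces $\varphi$ as $t \to 0_+$:

import numpy as np

kappa = 1.0
phi = lambda x: np.exp(-x**2)      # an arbitrary initial datum

def Q(x, t):
    # the 1D heat kernel (1)
    return np.exp(-x**2 / (4 * kappa * t)) / np.sqrt(4 * np.pi * kappa * t)

def u(x, t, L=20.0, n=4001):
    # homogeneous part of (3): convolution of Q with phi
    y = np.linspace(-L, L, n)
    return np.trapz(Q(x - y, t) * phi(y), y)

for t in (1.0, 0.1, 0.001):
    print(t, u(0.3, t), phi(0.3))   # u(0.3, t) -> phi(0.3) as t -> 0+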
Let us summarize the achievements and try to generalize them to higher dimensions.
First, we can generalize the heat kernel function $Q$, writing its $n$D variant depending on the distance from the origin only. Consider the formula with $x \in \mathbb{R}^n$ as the product of the 1D heat kernels for each of the variables in $x$:
$$(4)\qquad Q(x,t) = \frac{1}{\sqrt{(4\pi\kappa t)^n}}\,e^{-\frac{\|x\|^2}{4\kappa t}}.$$
Then taking the $n$-dimensional (iterated) convolution of $Q$ with the boundary condition $\varphi$ on the hyperplane $t = 0$ provides the solution candidate
$$(5)\qquad u(x,t) = \int_{\mathbb{R}^n} Q(x-y,t)\,\varphi(y)\,dy_1\cdots dy_n.$$
Indeed, a straightforward (but tedious) computation reveals that $Q$ is a solution to $L(u) = 0$ at all points $(x,t)$ with $t > 0$, and $Q$ behaves again as the Dirac delta at the origin. In particular, (5) is a solution to the Dirichlet problem $L(u) = 0$, $u(x,0) = \varphi$, and we can also obtain the non-homogeneous solutions similarly to the 1D case.
9.2.26. The Green's functions. The solutions to the (non-homogeneous) diffusion equation constructed in the last paragraph are built on a very simple idea: we find a solution $G$ to our equation which is defined everywhere except in the origin and blows up in the origin at the speed making it into a Dirac delta function there. A convolution with such a kernel $G$ is then a good candidate for solutions. Let us try to mimic this approach for the Laplace and Poisson equations now.
Actually, we shall modify the strategy by requesting
about 1 page to be finished: the spherically symmetric solution to Laplace => Green's function => solution to Poisson, similarly to the diffusion.
3. Remarks on Variational Calculus
Many practical problems look for minima or maxima of real functions $J : S \to \mathbb{R}$ defined on some spaces of functions. In particular, many laws of nature can be expressed as a certain "minimum principle" concerning some space of mappings.
The basic idea is exactly the same as in the elementary differential calculus: we aim at finding the best linear approximations of $J$ at fixed arguments $u \in S$, we recognize the critical points (those with vanishing linearization), and then we perhaps look at the quadratic approximations at the critical points. However, all these steps are far more intricate, need a lot of care, and may provide nasty surprises.
9.3.1. Simple examples first. If we know the sizes of tangent vectors to curves, we may ask what is the shortest distance between two points. In the plane $\mathbb{R}^2$, this means we have got a quadratic form $g(x) = (g_{ij}(x))$, $1 \le i,j \le 2$, at each $x \in \mathbb{R}^2$ and we want to integrate (the dots mean derivatives in time $t$, and $u(t) = (u_1(t), u_2(t))$ are differentiable paths)
$$(1)\qquad J(u) = \int_{t_1}^{t_2}\sqrt{g(u(t))(\dot u(t))}\,dt$$
to get the distance between the two given points $u(t_1) = (u_1(t_1), u_2(t_1)) = A$ and $u(t_2) = (u_1(t_2), u_2(t_2)) = B$. If the size of the vectors is just the Euclidean one, and we consider curves $u(t) = (t, v(t))$, i.e., graphs of functions of one variable, the length (1) becomes the well known formula
$$(2)\qquad J(u) = \int_{t_1}^{t_2}\sqrt{1 + \dot v(t)^2}\,dt.$$
Quite certainly we all believe that the minimum for fixed boundary values $v(t_1)$ and $v(t_2)$ must be a straight line. But so far, we have not formulated the problem itself. What is the space of curves we deal with? If we allowed non-continuous ones, then shorter paths would be available! So we should aim at proving that the lines are the minimal curves among the continuous ones. Do we need them to be differentiable? In some sense we do, since the derivative appears in our formula for $J$, but we need to have the integrand defined only almost everywhere. For example, this will be true for all Lipschitz curves.
In general, $g(u)(\dot u) = g_{11}(u)\dot u_1^2 + 2g_{12}(u)\dot u_1\dot u_2 + g_{22}(u)\dot u_2^2$. Such lengths of vectors are automatically inherited from the ambient Euclidean $\mathbb{R}^3$ on every hypersurface in the space. Thus, finding the minimum of $J$ means finding the shortest track in a real terrain (with hills and valleys).
If we choose a positive function $\alpha$ on $\mathbb{R}^2$ and consider $g(x) = \alpha(x)^2\,\mathrm{id}_{\mathbb{R}^2}$, i.e., the Euclidean size of vectors scaled by $\alpha(x) > 0$ at each point $x \in \mathbb{R}^2$, we obtain
$$(3)\qquad J(u) = \int_{t_1}^{t_2}\alpha(t, v(t))\,\sqrt{1 + \dot v(t)^2}\,dt.$$
We can imagine that the speed $1/\alpha$ of a moving particle (or light) in the plane depends on the values of $\alpha$ (the smaller is $\alpha$, the
bigger is the speed) and our problem will be to find the shortest path in terms of the time necessary to pass from A to B.
As a warm up, consider $\alpha = 1$ in the entire plane, except the vertical strip $V = \{(t,y);\ t \in [a, a+b]\}$ where $\alpha = N$, and take $A = (0,0)$, $B = (a+b, c)$, $a, b, c > 0$. We can imagine $V$ is a lake; you have to get from $A$ to $B$ by running and swimming, and you swim $N$ times slower than you run. If we believe that the straight lines are the minimizers for constant $\alpha$, then it is clear that we have to find the optimal point $P = (a, p)$ on the bank of the lake where we start swimming. The total time $T(p)$ will then be ($s$ is our actual speed when running straight)
$$T(p) = \frac{|AP|}{s} + \frac{|PB|}{s/N} = \frac{1}{s}\Big(\sqrt{p^2 + a^2} + N\sqrt{(c-p)^2 + b^2}\Big)$$
and we want to find the minimum of T(p). The critical point is given by
$$\frac{p}{\sqrt{p^2 + a^2}} = N\,\frac{c-p}{\sqrt{(c-p)^2 + b^2}} \quad\Longrightarrow\quad \sin\varphi = N\sin\psi,$$
where $\varphi$ is the angle between our running track and the normal to the boundary of $V$, while $\psi$ is the angle between our swimming track and the normal to the boundary (draw a picture yourself!). Thus we have recovered the famous Snell law of light refraction, saying that the proportion of the sine values of the angles is equal to the proportion of the speeds. (Of course, to finish the solution of the problem, the reader should find the solution $p$ of the quartic equation and check that it is a minimum.)
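The computation is easily confirmed numerically. The following sketch (Python, with illustrative values of $a$, $b$, $c$, $N$, $s$) minimizes $T(p)$ and checks the Snell ratio:

import numpy as np
from scipy.optimize import minimize_scalar

a, b, c, N, s = 3.0, 1.0, 2.0, 4.0, 1.0   # illustrative data

def T(p):
    # running time from A to P plus swimming time from P to B
    return (np.sqrt(p**2 + a**2) + N * np.sqrt((c - p)**2 + b**2)) / s

p = minimize_scalar(T, bounds=(0.0, c), method='bounded').x

sin_phi = p / np.sqrt(p**2 + a**2)               # running track angle
sin_psi = (c - p) / np.sqrt((c - p)**2 + b**2)   # swimming track angle
print(sin_phi / sin_psi, N)                      # the ratio equals N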
9.3.2. Variational problems. We shall restrict our attention to the following class of problems.
General first order variational problems
Consider an open Riemann measurable set $\Omega \subset \mathbb{R}^n$, the space $C^1(\Omega)$ of all differentiable mappings $u : \Omega \to \mathbb{R}^m$, and a $C^2$ function $F = F(x,y,p) : \mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}^{nm} \to \mathbb{R}$, and set the functional
$$(1)\qquad J(u) = \int_\Omega F\big(x,\,u(x),\,D^1u(x)\big)\,dx,$$
i.e., $J(u)$ is computed as the ordinary integral of a Riemann integrable function $j(x) = F(x, u(x), D^1u(x))$, where $D^1u$ is the Jacobi matrix (the differential) of $u$. The function $F$ is called the Lagrangian of the variational problem, and our task is to find the minimum of $J$ and the corresponding minimizer $u$ with prescribed boundary values of $u$ on the boundary $\partial\Omega$ (and perhaps some further conditions restricting $u$).
Mostly we shall restrict ourselves to the case $n = m = 1$, like in the previous paragraph, where $u$ is a real differentiable function defined on an interval $(t_1, t_2)$ and the function $F = F(t,y,p) : \mathbb{R}^3 \to \mathbb{R}$,
$$(2)\qquad J(u) = \int_{t_1}^{t_2} F\big(t, u(t), \dot u(t)\big)\,dt.$$
We saw $F = \sqrt{1+p^2}$, $F = \alpha(t)\sqrt{1+p^2}$ in the previous paragraph. If we take $F = y\sqrt{1+p^2}$, the functional $J$ computes the area of the rotational surface given by the graph of the function $u$ (up to a constant multiple). In all cases we may set the boundary values $u(t_1)$ and $u(t_2)$.
Actually, our differentiability assumptions are too strict, as we saw already in our last example above, where $F$ was differentiable except on the boundary of the lake $V$. We can easily extend our space of functions to piecewise differentiable $u$ and request $F(t, u(t), \dot u(t))$ to be piecewise differentiable for all such $u$'s (as always, piecewise differentiable means the one-sided derivatives exist at all points).
A maybe shocking example is the following functional:
$$(3)\qquad J(u) = \int_0^1\big(\dot u(t)^2 - 1\big)^2\,dt$$
on piecewise differentiable functions on $[0,1]$ (i.e. $F$ is the neat polynomial $(p^2-1)^2$). Clearly, $J(u) \ge 0$ for all $u$, and if we set $u(0) = u(1) = 0$, then any zig-zag piecewise linear function $u$ with derivatives $\pm 1$ satisfying the boundary conditions achieves the zero minimum. At the same time, there is no minimum among the differentiable functions $u$ (find a quick proof of that!), but we can approximate any of the zig-zag minima by smooth ones at any precision.
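The phenomenon can be seen numerically as well. The sketch below (Python, finite differences on a uniform grid) compares a zig-zag minimizer with a smooth competitor:

import numpy as np

def J(u, t):
    # J(u) = int (u'^2 - 1)^2 dt, by finite differences and the trapezoid rule
    du = np.diff(u) / np.diff(t)
    tm = (t[:-1] + t[1:]) / 2
    return np.trapz((du**2 - 1)**2, tm)

t = np.linspace(0.0, 1.0, 2001)
zigzag = np.minimum(t, 1 - t)         # derivatives +-1, u(0) = u(1) = 0
smooth = np.sin(np.pi * t) / np.pi    # a smooth competitor, u(0) = u(1) = 0

print(J(zigzag, t))   # essentially 0
print(J(smooth, t))   # strictly positive (3/8 for this choice)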
9.3.3. More examples. Let us develop a general method to find the analogue of the critical points from the elementary calculus here. We shall find the necessary steps by dealing with a specific set of problems in this paragraph. Let us work with the Lagrangian generalizing the previous examples:
$$(1)\qquad F(t,y,p) = y^r\sqrt{1+p^2}$$
with a real parameter $r$, and write $F_t$, $F_y$, $F_p$, etc., for the corresponding partial derivatives. Consider the variational problem on an interval $I = (t_1, t_2)$ with fixed boundary conditions $u(t_1)$ and $u(t_2)$, and assume $u \in C^2(I)$, $u(t) > 0$. Let us consider any differentiable $v$ on $I$ with $v(t_1) = v(t_2) = 0$ (or even better, $v$ with compact support inside of $I$). Then $u + \delta v$ fulfills the boundary conditions for all small real $\delta$'s, and we consider
$$J(u + \delta v) = \int_{t_1}^{t_2} F\big(t,\,u(t) + \delta v(t),\,\dot u(t) + \delta\dot v(t)\big)\,dt.$$
Of course, the necessary condition for $u$ being a critical point must be $\frac{d}{d\delta}\big|_0 J(u+\delta v) = 0$, i.e. (recall that the derivative with respect to a parameter can be swapped with the integration),
$$(2)\qquad 0 = \int_{t_1}^{t_2}\Big(F_y\big(t,u(t),\dot u(t)\big)\,v(t) + F_p\big(t,u(t),\dot u(t)\big)\,\dot v(t)\Big)\,dt.$$
Integrating the second term in (2) per partes immediately yields (remember $v(t_1) = v(t_2) = 0$)
$$0 = \int_{t_1}^{t_2}\Big(F_y\big(t,u(t),\dot u(t)\big) - \frac{d}{dt}F_p\big(t,u(t),\dot u(t)\big)\Big)\,v(t)\,dt.$$
This condition will certainly be satisfied if the so-called Euler equation
$$(3)\qquad \frac{d}{dt}F_p\big(t,u(t),\dot u(t)\big) = F_y\big(t,u(t),\dot u(t)\big)$$
holds true for $u$ (we prove that this is a necessary condition in lemma 9.3.6).
An equivalent form of this equation for $\dot u(t) \neq 0$ is (we omit the arguments $t$ of $u$ and $\dot u$)
$$(4)\qquad F_t(t,u,\dot u) = \frac{d}{dt}\Big(F(t,u,\dot u) - \dot u\,F_p(t,u,\dot u)\Big).$$
In our case of $F(t,y,p) = y^r(1+p^2)^{1/2}$, $F_t$ vanishes identically, $F_p = y^r p\,(1+p^2)^{-1/2}$, and thus, if we further assume $r \neq 0$, $u > 0$, the term in the bracket has to be a positive constant $C$:
$$C = u^r(1+\dot u^2)^{1/2} - \dot u\,u^r\dot u\,(1+\dot u^2)^{-1/2} = u^r(1+\dot u^2)^{-1/2}.$$
We have arrived at the differential equation (renaming the constant $C^{1/r}$ back to $C$)
$$(5)\qquad u = C\,(1+\dot u^2)^{1/(2r)},$$
which we are going to solve.
Consider the transformation $\dot u = \tan\tau$, i.e.,
$$u = C\big(1 + \tan^2\tau\big)^{1/(2r)} = C(\cos\tau)^{-1/r},$$
and so $du = \frac{C}{r}(\cos\tau)^{-1/r}\tan\tau\,d\tau$. Consequently, $dt = \frac{1}{\dot u}\,du = \frac{C}{r}(\cos\tau)^{-1/r}\,d\tau$, and by integration we arrive at the very useful parametrization of the solutions by the parameter $\tau$ (which is actually the slope of the tangent to the solution graph):
$$(6)\qquad t = t_0 + \frac{C}{r}\int_0^\tau(\cos s)^{-1/r}\,ds,\qquad u = C(\cos\tau)^{-1/r}.$$
Now, we can summarize the result for several interesting values of $r$. First, if $r = 0$ (which we excluded on the way), then the Euler equation (3) reads
$$\ddot u\,(1+\dot u^2)^{-3/2} = 0,$$
which implies $\ddot u = 0$, and thus the potential minimizers should be straight lines, as expected. (Notice that we have not proved yet that the Euler equation is indeed a necessary condition; we shall come to that in the next paragraphs.)
For general $r \neq 0$, the Euler equation (3) tells (a straightforward computation!)
$$\ddot u = r\,\frac{1 + \dot u^2}{u},$$
and thus the sign of the second derivative coincides with the sign of $r$ (remember $u > 0$). In particular, the potential minimizers are always concave functions (if $r < 0$) or convex (if $r > 0$).
If $r = -1$, the parametrization (6) leads to (an easy integration!)
$$(7)\qquad t = t_0 - C\sin\tau,\qquad u = C\cos\tau,$$
thus for $\tau \in [-\pi/2, \pi/2]$ our solutions are half-circles with radius $C$ in the upper halfplane, centred at $(t_0, 0)$. For $r = -1/2$, the solution is
$$(8)\qquad t = t_0 - \frac{C}{2}\big(2\tau + \sin 2\tau\big),\qquad u = \frac{C}{2}\big(1 + \cos 2\tau\big),$$
which is a parametric description of a fixed point on a circle with diameter $C$ rolling along the $t$ axis, the so-called cycloid. Now, $\tau \in [-\pi/2, \pi/2]$ provides $t$ running from $t_0 + \frac{1}{2}C\pi$ to $t_0 - \frac{1}{2}C\pi$, while $u$ is zero at the points $t_0 \pm \frac{1}{2}C\pi$ and reaches the highest point at $t = t_0$. (Draw pictures!)
Next, look at $r = 1/2$. Another quick integration reveals $t = t_0 + 2C\tan\tau = t_0 + 2C\dot u$, and we can compute $\dot u$ and substitute into (5) to obtain
$$u = C + \frac{1}{4C}(t - t_0)^2.$$
Thus the potential minimizers are parabolas with the axis of symmetry $t = t_0$. If we fix $A = (0,1)$ and a $t_0$, there are two relevant choices $C = \frac{1}{2}\big(1 \pm \sqrt{1 - t_0^2}\big)$ whenever $|t_0| \le 1$ (and no options for $|t_0| > 1$). The two parabolas have two points of intersection, $A$ and another point $B$. Clearly only one of them should be the actual minimizer. Moreover, the reader could try to prove that the parabola $u = \frac{1}{4}t^2$ touches all of them and has them all on one side (it is the so-called envelope of the family of parabolas). Thus, there will be no potential minimizer joining the point $A = (0,1)$ to an arbitrary point on the other side of the parabola $u = \frac{1}{4}t^2$.
The last case we come to is $r = 1$, i.e., the case of the area of the surface of the rotational body drawn by the graph. Here we better use another parametrization of the slope of the tangent: we set $\dot u = \sinh\tau$. A very similar computation as above then immediately leads to $t = t_0 + C\tau$, and we arrive at the result (the catenary)
$$(9)\qquad u(t) = C\cosh\frac{t - t_0}{C}.$$
9.3.4. Critical points of functionals. Now we shall develop a bit of theory verifying that the steps done in the previous examples really provided necessary conditions for solutions of the variational problems. In order to underline the essential features, we shall first introduce the basic tools in the realm of general normed vector spaces, see 7.3.1. The spaces of piecewise differentiable functions on an interval with the $L_p$ norms can serve as typical examples. We shall deal with mappings $\mathcal{F} : S \to \mathbb{R}$ called (real) functionals.
The first differential
Let $S$ be a vector space equipped with a norm $\|\ \|$. A continuous linear mapping $L : S \to \mathbb{R}$ is called a continuous linear functional.
A functional $\mathcal{F} : S \to \mathbb{R}$ is said to have the differential $D\mathcal{F}(u)$ at a point $u \in S$ if there is a continuous linear functional $L$ such that
$$(1)\qquad \lim_{v\to 0}\frac{\mathcal{F}(u+v) - \mathcal{F}(u) - L(v)}{\|v\|} = 0.$$
Some more details on the set of examples of this paragraph can be found in the article "Elementary Introduction to the Calculus of Variations" by Magnus R. Hestenes, Mathematics Magazine, Vol. 23, No. 5 (May-Jun. 1950), pp. 249-267.
In the very special case of the Euclidean $S = \mathbb{R}^n$, we have recovered the standard definition of the differential, cf. 8.1.7 (just notice that all linear functionals are continuous on a finite-dimensional vector space). Again, the differential is computed via the directional derivatives. Indeed, if (1) holds true, then for each fixed $v \in S$,
$$(2)\qquad \delta\mathcal{F}(u)(v) = \lim_{t\to 0}\frac{\mathcal{F}(u+tv) - \mathcal{F}(u)}{t} = \frac{d}{dt}\Big|_0\,\mathcal{F}(u+tv)$$
exists and $L(v) = \delta\mathcal{F}(u)(v)$. We call $\delta\mathcal{F}(u)$ the variation of the functional $\mathcal{F}$ at $u$.
A point $u \in S$ is called a critical point if $\delta\mathcal{F}(u) = 0$. We say that $\mathcal{F}$ has got a local minimum at $u$ if there is an open neighborhood $U$ of $u$ such that $\mathcal{F}(w) \ge \mathcal{F}(u)$ for all $w \in U$. Similarly, we define local maxima and talk about local extrema.
If $u$ is an extreme of $\mathcal{F}$, then in particular $t = 0$ must be an extreme of the function $\mathcal{F}(u+tv)$ of one real variable $t$, where $v$ is arbitrary. Thus the extremes have to be at critical points, if the variations exist.
Next, let us assume the variations exist at all points in a neighborhood of a critical point $u \in S$. Then, again exactly as in the elementary calculus, considering two increments $v, w \in S$, we consider the limit
$$(3)\qquad \delta^2\mathcal{F}(u)(v,w) = \lim_{t\to 0}\frac{\delta\mathcal{F}(u+tw)(v) - \delta\mathcal{F}(u)(v)}{t}.$$
If the limits exist for all $v, w$, then clearly $\delta^2\mathcal{F}(u)$ is a bilinear mapping. Then $\delta^2\mathcal{F}(u)(w,w)$ is a quadratic form which we can consider as a second order approximation of $\mathcal{F}$ at $u$. We call it the second variation of $\mathcal{F}$. Moreover, again as in the elementary calculus, $\delta^2\mathcal{F}(u)(w,w) = \frac{d^2}{dt^2}\big|_0\,\mathcal{F}(u+tw)$, if the second variation exists. We may summarize:
Theorem. Let $\mathcal{F} : S \to \mathbb{R}$ be a functional with a local extreme at $u \in S$. If the variation $\delta\mathcal{F}(u)$ exists, then it has to vanish. If the second variation $\delta^2\mathcal{F}(u)$ exists (thus, in particular, it exists on a neighborhood of $u$), then $\delta^2\mathcal{F}(u)(w,w) \ge 0$ for a minimum, while $\delta^2\mathcal{F}(u)(w,w) \le 0$ for a maximum.
Proof. Assume $\mathcal{F}$ has got a local minimum at $u$. We have already seen that $f(t) = \mathcal{F}(u+tv)$ has to achieve a local minimum for each $v$ at $t = 0$. Thus $f'(0) = 0$ if $f(t)$ is differentiable, and so $\delta\mathcal{F}(u)$ vanishes.
Now assume $\delta^2\mathcal{F}(u)(w,w) = f''(0) = r < 0$ for some $w$. Then the mean value theorem implies
$$f(t) - f(0) = f'(c)\,t = \frac{f'(c) - f'(0)}{c}\,c\,t$$
for some $t > c > 0$, and the fraction tends to $f''(0) = r < 0$ as $t \to 0_+$. Thus, for $t$ small enough, $f(t) - f(0) < 0$, which contradicts $f(0)$ being a local minimum.
The claim for the maximum follows analogously (or we may apply the already proved result to the functional $-\mathcal{F}$). □
In functional analysis, this directional derivative is usually called the Gâteaux differential, while the continuous functional $L$ satisfying (1) is usually called the Fréchet differential, going back to two of the founders of functional analysis from the beginning of the 20th century.
Corollary. On top of all the assumptions of the above theorem, suppose $\mathcal{F}(v+tw)$ is twice differentiable at $t = 0$ and $\delta^2\mathcal{F}(v)(w,w) \ge 0$ for all $v$ in a neighborhood of the critical point $u$ and all $w \in S$. Then $\mathcal{F}$ has got a minimum at $u$.
Proof. As before, we consider $f(t) = \mathcal{F}(u+tw)$, $w = z - u$. Thus, for some $0 < c < 1$,
$$\mathcal{F}(z) - \mathcal{F}(u) = f(1) - f(0) = f'(0) + \tfrac{1}{2}f''(c) = \tfrac{1}{2}\,\delta^2\mathcal{F}(u + cw)(w,w) \ge 0. \qquad\square$$
Remark. Actually, the condition from the corollary is far too strong in infinite-dimensional spaces. It is possible to replace it by the condition that $\delta^2\mathcal{F}$ is continuous at $u$ and $\delta^2\mathcal{F}(u)(w,w) \ge C\|w\|^2$ for some real constant $C > 0$ just at the critical point $u$. In the finite-dimensional case, this is equivalent to the requirement that $\delta^2\mathcal{F}$ is continuous and positive definite.
9.3.5. Back to variational problems. As we already noticed, the answer to a variational problem minimizing a functional (we omit the arguments $t$ of the unknown function $u$)
$$(1)\qquad J(u) = \int_{t_1}^{t_2}F(t, u, \dot u)\,dt$$
depends very much on the boundary conditions and the space of functions we deal with. If we posit $u(t_1) = A$, $u(t_2) = B$ with arbitrary $A, B \in \mathbb{R}$, we may deal with spaces of differentiable or piecewise differentiable functions satisfying these boundary conditions. But these subspaces will not be vector spaces any more. Thus, strictly speaking, we cannot apply the concepts from the previous paragraph here.
However, we may fix any differentiable function $v$ on $[t_1, t_2]$ satisfying $v(t_1) = A$, $v(t_2) = B$, e.g. $v(t) = A + (B-A)\frac{t - t_1}{t_2 - t_1}$, and replace the functional $J$ by
$$\tilde J(u) = J(u + v) = \int_{t_1}^{t_2}F(t,\,u+v,\,\dot u+\dot v)\,dt.$$
Now, the initial problem transforms to one with the boundary conditions $u(t_1) = u(t_2) = 0$, and computing the variations $\frac{d}{d\delta}\tilde J(u + \delta w) = \frac{d}{d\delta}J(u + v + \delta w)$ does not change anything, i.e. we have to request $w(t_1) = w(t_2) = 0$ and we differentiate in a vector space.
Essentially, we just exploit the natural affine structures on the subspaces of functions defined by the general boundary conditions and thus the derivatives have to live in their modeling vector subspaces.
The first and second variations
Corollary. Let $F(t,y,p)$ be a twice differentiable Lagrangian and consider the variational problem of finding a minimum of the functional (1) on the space of differentiable functions $u \in C^1[t_1,t_2]$ with boundary conditions $u(t_1) = A$, $u(t_2) = B$. Then the first and second variations exist and can be computed for all $v \in S = \{v \in C^1[t_1,t_2];\ v(t_1) = v(t_2) = 0\}$ as follows:
$$(2)\qquad \delta J(u)(v) = \int_{t_1}^{t_2}\big(F_y(t,u,\dot u)\,v + F_p(t,u,\dot u)\,\dot v\big)\,dt,$$
$$(3)\qquad \delta^2 J(u)(v,v) = \int_{t_1}^{t_2}\big(F_{yy}(t,u,\dot u)\,v^2 + 2F_{yp}(t,u,\dot u)\,v\dot v + F_{pp}(t,u,\dot u)\,\dot v^2\big)\,dt.$$
If $u$ is a local minimum of the variational problem, then $\delta J(u)(v) = 0$ for all $v \in S$, while $\delta^2 J(u)(v,v) \ge 0$ for all $v$ in a neighborhood of the origin in $S$.
Proof. Thanks to our strong assumptions on the differentiability of $F$, $u$, and $v$, we may differentiate the real function $f(t) = J(u + tv)$ at $t = 0$, swapping the integral and the derivative. This immediately provides both formulae.
The remaining two claims are straightforward consequences of the theorem and corollary in the previous paragraph 9.3.4. □
9.3.6. Euler-Lagrange equations. We are following the path which we already tried when discussing our first bunch of examples in 9.3.3. Our next step was to guess the consequences of the vanishing of the first variation in terms of a differential equation. Now we complete the arguments. We start with a simple result called the fundamental lemma of the calculus of variations.
Lemma. Assume $u$ is a continuous function on the interval $[t_1,t_2]$ and
$$\int_{t_1}^{t_2}u(t)\,\varphi(t)\,dt = 0$$
for all compactly supported smooth $\varphi \in C^\infty_c(t_1,t_2)$. Then $u = 0$ on $[t_1,t_2]$.

Proof. Assume $u(c) \neq 0$ for some $c$, say $u(c) > 0$. Due to the continuity, $u(t) > u(c)/2 > 0$ on a neighborhood $(c-s, c+s) \subset (t_1,t_2)$, $s > 0$. Next, recall the smooth variants of indicator functions constructed in 6.1.6. For every pair of positive numbers $0 < \varepsilon < r$, we constructed a smooth function $\varphi$ with values in $[0,1]$, equal to $1$ on $(c-\varepsilon, c+\varepsilon)$ and vanishing outside of $(c-r, c+r)$. Choosing $r = s$, we get
$$\int_{t_1}^{t_2}u(t)\,\varphi(t)\,dt \ge 2\varepsilon\,\frac{u(c)}{2} > 0.$$
This is a contradiction and the proof is complete. □
9.3.8. Special cases. Very often the Lagrangians do not depend on all the variables, and then the variations and the Euler-Lagrange equations get special forms. The following summary is a straightforward consequence of the general equation 9.3.6(1), whose equivalent form we saw already in 9.3.3(4).
Special forms of Lagrangians
Case 1. If the Lagrangian is $F(t,y)$, i.e., it does not depend on the derivatives, then the Euler-Lagrange equation says
$$(1)\qquad F_y(t,u) = 0,$$
which is an implicit equation for $u(t)$. Moreover, the second variation is $\delta^2J(u)(v,v) = \int_{t_1}^{t_2}F_{yy}(t,u)\,v^2\,dt$.
Case 2. If the Lagrangian is $F(t,p)$, then the Euler-Lagrange equation is
$$(2)\qquad \frac{d}{dt}F_p(t,\dot u) = 0,$$
and its solutions are given by the first order differential equation $F_p(t,\dot u) = C$ with a constant parameter $C$. Moreover, the second variation is $\delta^2J(u)(v,v) = \int_{t_1}^{t_2}F_{pp}(t,\dot u)\,\dot v^2\,dt$.
Case 3. If the Lagrangian is $F(y,p)$, then there is a consequence of the Euler-Lagrange equation (for $\dot u \neq 0$)
$$(3)\qquad \frac{d}{dt}\Big(F(u,\dot u) - \dot u\,F_p(u,\dot u)\Big) = 0,$$
which again reduces the equation to the first order, including a free constant parameter.
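These reductions are easy to reproduce with a computer algebra system. The following sketch (SymPy, using the Lagrangian $F = y\sqrt{1+p^2}$ of 9.3.3 with $r = 1$ and the catenary solution (9)) checks both the Euler-Lagrange equation and the first integral of Case 3:

import sympy as sp

t = sp.symbols('t', real=True)
C, t0 = sp.symbols('C t0', positive=True)
y, p = sp.symbols('y p')

F = y * sp.sqrt(1 + p**2)          # the Lagrangian F(y, p) with r = 1

u = C * sp.cosh((t - t0) / C)      # the catenary candidate (9)
du = sp.diff(u, t)

Fy = sp.diff(F, y).subs({y: u, p: du})
Fp = sp.diff(F, p).subs({y: u, p: du})

EL = sp.diff(Fp, t) - Fy               # Euler-Lagrange: d/dt F_p - F_y
I1 = F.subs({y: u, p: du}) - du * Fp   # first integral of Case 3

print(sp.simplify(EL.rewrite(sp.exp)))   # expect 0
print(sp.simplify(I1.rewrite(sp.exp)))   # expect the constant C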
9.3.9. Remarks on higher dimensional problems.
9.3.10. Problems with free boundary conditions.
9.3.11. Constrained and isoperimetric problems.
4. Complex Analytic Functions
In the rest of the chapter, we shall look at the (single complex variable) functions defined on the complex plane $\mathbb{C} = \mathbb{R}^2$. On many occasions we saw how helpful it was to extend objects from the real line into the complex plane. We provide a few glimpses into the rich classical theory, and we hope the readers will enjoy the use of it in the practical column.
9.4.1. Complex derivative. An open and connected subset $\Omega \subset \mathbb{C}$ is called a region, or a domain. A mapping $f : \Omega \to \mathbb{C}$ is called a complex function of a single complex variable. Working with complex numbers, we may repeat the definition of the derivative:
Complex derivative
We say that a complex function $f : \Omega \to \mathbb{C}$ has the complex derivative $f'(a)$ at a point $a \in \Omega$ if the complex limit
$$f'(a) = \lim_{z\to a}\frac{f(z) - f(a)}{z - a}$$
exists. We say that $f$ is differentiable in the complex sense, or holomorphic, on $\Omega$, if its complex derivative $f'(z)$ exists at each $z \in \Omega$.
Clearly, this definition restricts itself to the definition of the derivative of functions of one real variable along $\mathbb{R} \subset \mathbb{C}$, when restricting the defining limit to real $z$ and $a$. We shall see that the existence of a complex derivative is much more restrictive than in the real case.
The simplest example of a differentiable complex function is $z \mapsto z^n$, $n \in \mathbb{N}$. Indeed, exactly as with the real polynomials, we compute $(z+h)^n - z^n = h\big(nz^{n-1} + \tfrac{1}{2}n(n-1)z^{n-2}h + \cdots + h^{n-1}\big)$, and thus for all $z \in \mathbb{C}$ we obtain the limit
$$(z^n)' = \lim_{h\to 0}\frac{(z+h)^n - z^n}{h} = n z^{n-1}.$$
By the very definition, the mapping $f \mapsto f'$ is linear over the complex scalars, and thus all polynomials $f(z)$ are differentiable this way:
$$f(z) = \sum_{k=0}^{n}a_k z^k \;\mapsto\; f'(z) = \sum_{k=0}^{n-1}(k+1)\,a_{k+1}\,z^k.$$
9.4.2. Analytic functions. A complex function $f : \Omega \to \mathbb{C}$ is called analytic in the region $\Omega$ if for each $a \in \Omega$ there is an open disc $D = \{|z - a| < R\} \subset \Omega$ on which $f$ is given by a convergent power series $f(z) = \sum_{n=0}^{\infty}c_n(z-a)^n$. $\dots$ with the coefficients
$$c_n = \frac{1}{2\pi i}\oint_{|\zeta - a| = \rho}\frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta.$$
If $|\zeta - a| = r'$, then $|\zeta - a| < |z - a|$ and, similarly to the above, the expansion
$$\frac{f(\zeta)}{\zeta - z} = -\frac{f(\zeta)}{z - a}\cdot\frac{1}{1 - \frac{\zeta - a}{z - a}} = -\sum_{n=0}^{\infty}\frac{f(\zeta)\,(\zeta - a)^n}{(z - a)^{n+1}}$$
leads to the equality (via term by term integration over $|\zeta - a| = r'$) for the coefficients $c_n$ with $n < 0$.
Thus, we have obtained the Laurent series representation
$$f(z) = \sum_{n=-\infty}^{\infty}c_n(z-a)^n,$$
as requested.
On the other hand, if we are given a Laurent series (1), then fixing an arbitrary $n \in \mathbb{Z}$ and multiplying this formula by $(z-a)^{-(n+1)}$, we can integrate over $|z-a| = \rho$ in order to obtain
$$\oint_{|z-a|=\rho}\frac{f(z)}{(z-a)^{n+1}}\,dz = 2\pi i\,c_n.$$
The circle $\{|z-a| = \rho\}$ with $r < \rho < R$ was chosen arbitrarily, and in particular we see that the integrals $\oint_{|z-a|=\rho}\frac{f(z)}{(z-a)^{n+1}}\,dz$ cannot depend on $\rho$. □
9.4.19. Remarks on convergence. Given a Laurent series, its regular part $\sum_{n=0}^{\infty}c_n(z-a)^n$ represents a power series that converges absolutely and uniformly on compact sets in its disc of convergence $\{|z-a| < R\}$ with $\frac{1}{R} = \limsup_{n\to\infty}|c_n|^{1/n}$, see the Cauchy-Hadamard formula in Theorem 9.4.2.
The principal part $\sum_{n=-\infty}^{-1}c_n(z-a)^n$ becomes a power series $\sum_{n=1}^{\infty}c_{-n}w^n$ after the coordinate change $w = \frac{1}{z-a}$, and this series converges for $|w| < \frac{1}{r}$ with $r = \limsup_{n\to\infty}|c_{-n}|^{1/n}$. Thus we have verified:
Proposition. For any set of coefficients $\{c_n,\ n \in \mathbb{Z}\}$, set
$$\frac{1}{R} = \limsup_{n\to\infty}|c_n|^{1/n},\qquad r = \limsup_{n\to\infty}|c_{-n}|^{1/n}.$$
Then the Laurent series
$$f(z) = \sum_{n=-\infty}^{\infty}c_n(z-a)^n$$
converges absolutely and uniformly on any compact set in the annulus $A = \{r < |z-a| < R\}$. It is analytic in $A$; if $|z-a| < r$, then the principal part $\sum_{n=-\infty}^{-1}c_n(z-a)^n$ diverges, while the regular part $\sum_{n=0}^{\infty}c_n(z-a)^n$ diverges for $|z-a| > R$.
9.4.20. Link to Fourier series. There is a very interesting link between Laurent and Fourier series. If $f$ is analytic in $A = \{1-\rho < |z| < 1+\rho\}$ for some $\rho > 0$, then its $n$-th Laurent series coefficient (centered at $a = 0$) is
$$c_n = \frac{1}{2\pi i}\oint_{|z|=1}\frac{f(z)}{z^{n+1}}\,dz = \frac{1}{2\pi}\int_0^{2\pi}f(e^{it})\,e^{-int}\,dt.$$
Therefore, $c_n$ represents the $n$-th Fourier coefficient of the function $t \mapsto f(e^{it})$.

9.4.21. The Cauchy inequalities and the Liouville theorem. If $|f| \le M$ on the circle $|z-a| = \rho$, the coefficient formula yields the Cauchy inequalities
$$(1)\qquad |c_n| \le M\,\rho^{-n},\quad n \in \mathbb{Z}.$$
In particular, if $f$ is analytic and bounded on all of $\mathbb{C}$, then $|c_n| \le M R^{-n} \to 0$ for any $R > 0$. Thus, $c_n = 0$ for any $n \ge 1$ and consequently $f(z) = c_0$: every bounded analytic function on $\mathbb{C}$ is constant. □
9.4.22. Isolated singularities. We look at typical examples of analytic functions around "suspicious" points. Consider the fraction $f(z) = \frac{\sin z}{z}$. The origin is a zero point of both $\sin z$ and $z$, and since they behave very similarly for small $z$, we see $\lim_{z\to 0}f(z) = 1$.
On the other hand, $f(z) = \frac{1}{z}$ grows towards infinity, $\lim_{z\to 0}\frac{1}{z} = \infty$, in the sense of the extended complex plane $\bar{\mathbb{C}} = \mathbb{C}\cup\{\infty\}$ (also called the Riemann sphere). We can imagine $\bar{\mathbb{C}}$ as the sphere with the stereographic projection onto the plane $\mathbb{C}$, see the picture. Then clearly $\lim_{z\to a}f(z) = \infty$ if and only if $\lim_{z\to a}|f(z)| = \infty$ in the sense of standard analysis in real variables.
It might easily happen that the limit does not exist at all, see the theorem below. For example, take $f(z) = e^{1/z}$ around the point $z = 0$. It is given by the infinite principal part of a Laurent series, $f(z) = \sum_{k=0}^{\infty}\frac{1}{k!}z^{-k}$. In general, we talk about isolated singular points:
Isolated singularities
If $f(z)$ is analytic in a punctured neighbourhood $V = \{0 < |z-a| < \rho\}$, then $a$ is called an isolated singular point for $f(z)$. We say that the singular point is
• removable, if there is a finite limit $\lim_{z\to a}f(z) = b \in \mathbb{C}$;
• a pole, if $\lim_{z\to a}f(z) = \infty$;
• an essential singularity, if $\lim_{z\to a}f(z)$ does not exist in $\bar{\mathbb{C}}$.
A function $f(z)$ with only isolated singularities in a domain $\Omega \subset \mathbb{C}$ and without any essential singularities is called a meromorphic function in $\Omega$.
The function $f(z) = \tan\frac{1}{z}$ provides an example of a non-isolated singularity at $a = 0$, as $0$ is the limit of the poles $\big(\frac{\pi}{2} + n\pi\big)^{-1}$, $n \in \mathbb{Z}$, of $f(z)$.
On the other hand, all rational functions f(z)/g(z) are meromorphic in C.
The following theorem classifies isolated singularities and poles in terms of Laurent series.
Theorem. The following properties are equivalent:
• the point $z = a$ is a removable singularity for $f(z)$;
• $|f(z)|$ is bounded in some punctured neighbourhood $V = \{0 < |z-a| < \rho\}$;
• the Laurent series of $f(z)$ in $V = \{0 < |z-a| < \rho\}$ is the Taylor series $f(z) = \sum_{n=0}^{\infty}c_n(z-a)^n$, i.e. the principal part vanishes;
• $f(a)$ can be defined so that $f(z)$ becomes analytic in $\{|z-a| < \rho\}$.
Further, the point $z = a$ is a pole for $f(z)$ if and only if the principal part of the Laurent series of $f(z)$ in $\{0 < |z-a| < \rho\}$ contains only finitely many terms, i.e. $f(z) = \sum_{n=-N}^{\infty}c_n(z-a)^n$ with $c_{-N} \neq 0$ for some integer $N$ (the smallest $N$ with this property is called the order of the pole at $z = a$).
Finally, the Laurent series $f(z) = \sum_{n=-\infty}^{\infty}c_n(z-a)^n$ in the punctured neighbourhood of $a$ contains infinitely many terms with non-zero coefficients $c_n$, $n < 0$, if and only if $z = a$ is an essential singularity for $f(z)$.
Proof. If $|f(z)| \le M$ for $0 < |z-a| < \rho$, then by the Cauchy inequalities 9.4.21(1), $|c_{-n}| \le M\varepsilon^n$, $n > 0$, for all $0 < \varepsilon < \rho$. Therefore, all coefficients with negative indices vanish, and
$$f(z) = \sum_{n=0}^{\infty}c_n(z-a)^n.$$
Define $f(a) = c_0$ to obtain a power series that converges in the entire disc $\{|z-a| < \rho\}$.
This implies the equivalence of the four conditions in the first part of the theorem.
By the definition of a pole, $f(z) \neq 0$ in some punctured disc $D = \{0 < |z-a| < \rho' \le \rho\}$ around $a$, as $\lim_{z\to a}f(z) = \infty$. Therefore $g(z) = \frac{1}{f(z)}$ is also analytic in $D$ and $\lim_{z\to a}g(z) = 0$. Hence, $g(z)$ is analytic in $D\cup\{a\}$ assuming $g(a) = 0$ and, therefore, $g(z) = (z-a)^N h(z)$ for some integer $N$ and an analytic function $h(z)$ with $h(a) \neq 0$. Thus, $\frac{1}{h(z)}$ is also analytic on a neighborhood of $a$ and, therefore, $f(z) = (z-a)^{-N}\frac{1}{h(z)}$ has a pole of order $N$ at $a$.
Conversely, if $f(z) = \frac{h(z)}{(z-a)^N}$, where $h(a) \neq 0$, then $\lim_{z\to a}f(z) = \infty$ and $a$ is a pole for $f$.
Finally, an isolated singularity of $f(z)$ is neither removable nor a pole if and only if the principal part of its Laurent series is infinite, and this observation finishes the proof. □
9.4.23. Some consequences. There are several straightforward corollaries of our classification of isolated singularities. In particular, if $\lim_{z\to a}f(z)$ does not exist, then $f(z)$ has really chaotic behaviour:
Theorem. If $a \in \mathbb{C}$ is an essential singularity of $f(z)$, then for any $w \in \bar{\mathbb{C}}$ there is a sequence $z_n \to a$ such that $\lim_{n\to\infty}f(z_n) = w$.
Proof. Let $w = \infty$. Since the singularity $z = a$ is not removable, $f(z)$ cannot be bounded in any punctured neighbourhood of $a$. So there exists a sequence $z_n \to a$ such that $\lim_{n\to\infty}f(z_n) = \infty$.
For $w \in \mathbb{C}$, if in any punctured neighbourhood of $a$ there is a point $z$ such that $f(z) = w$, then by collecting those points we obtain a sequence $z_n \to a$ such that $f(z_n) = w$, as required. If there is a punctured neighbourhood of $a$ where $f(z) \neq w$, then $g(z) = \frac{1}{f(z)-w}$ also has an isolated singularity at $z = a$, which cannot be a pole or a removable one, as otherwise $f(z) = w + \frac{1}{g(z)}$ would have a limit as $z \to a$. Therefore $z = a$ is an essential singularity for $g(z)$ and, thus, there is a sequence $z_n \to a$ such that $\lim_{n\to\infty}g(z_n) = \infty$, which implies that $\lim_{n\to\infty}f(z_n) = w$. □
We say that $\infty \in \bar{\mathbb{C}}$ is an isolated singularity of $f(z)$ if $f(z)$ is analytic in $\{|z| > R\}$ for some $R > 0$. The following are straightforward consequences of the Liouville theorem if $\infty$ is the only singularity of $f(z)$:
Corollary. If $f(z)$ is analytic in $\mathbb{C}$ and $z = \infty$ is a removable singularity for $f(z)$, then $f(z)$ is a constant.
If $f(z)$ is analytic in $\mathbb{C}$ and $z = \infty$ is a pole, then $f(z)$ is a polynomial $f(z) = \sum_{j=0}^{N}c_j z^j$.
Proof. The first claim is a simple reformulation of the Liouville theorem, cf. 9.4.21.
To deal with the other claim, consider $g(w) = f(\frac{1}{w})$. Then $w = 0$ is a pole for $g(w)$. Let $P(w) = \sum_{j=1}^{N}c_j w^{-j}$ be the principal part of the Laurent series for $g(w)$. Thus, $h(w) = g(w) - P(w)$ is analytic in $\mathbb{C}$ with a removable singularity at $w = 0$. Moreover, $\lim_{w\to\infty}h(w) = \lim_{w\to\infty}g(w) = f(0)$. Thus, $|h(w)|$ is bounded, and by the Liouville theorem $h(w) = \mathrm{const} = f(0) = c_0$. Hence $f(z) = g(z^{-1}) = \sum_{j=0}^{N}c_j z^j$, which is a polynomial in $z$. □
9.4.24. Residues. Next we return to the Cauchy integral theorem, with our knowledge of isolated singularities.
A residue of an analytic function $f(z)$ at its isolated singular point $a \in \mathbb{C}$ is defined as
$$\operatorname{res}_a f = \frac{1}{2\pi i}\oint_{|z-a|=r}f(z)\,dz,$$
where $0 < r < \rho$. Obviously, the definition does not depend on the choice of $r$.
Residue Theorem
Theorem. If $f(z)$ is represented by the Laurent series $\sum_{n=-\infty}^{\infty}c_n(z-a)^n$, then $\operatorname{res}_a f = c_{-1}$.
Further, consider a domain $D \subset \mathbb{C}$ and a function $f(z)$ analytic in $D\setminus\{a_1,\dots,a_n\}$, where $a_j \in D$, $j = 1,\dots,n$. Then
$$\oint_{\partial D}f(z)\,dz = 2\pi i\sum_{j=1}^{n}\operatorname{res}_{a_j}f.$$
Proof. Integrating the Laurent series $\sum_{n=-\infty}^{\infty}c_n(z-a)^n$ term by term and using the fact that $\oint_{|z-a|=\rho}(z-a)^n\,dz = 0$ unless $n = -1$, while $\oint_{|z-a|=\rho}(z-a)^{-1}\,dz = 2\pi i$, we obtain $\operatorname{res}_a f = c_{-1}$.
Next, choose $\rho > 0$ such that the open discs $D_j = \{|z-a_j| < \rho\}$, $j = 1,\dots,n$, have pairwise empty intersections and their closures $\bar D_j$ belong to $D$. Then the Cauchy integral theorem 9.4.15, applied to $D_\rho = D\setminus\bigcup_{j=1}^{n}\bar D_j$, yields
$$\oint_{\partial D}f(z)\,dz = \sum_{j=1}^{n}\oint_{|z-a_j|=\rho}f(z)\,dz = 2\pi i\sum_{j=1}^{n}\operatorname{res}_{a_j}f. \qquad\square$$
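Computer algebra systems implement residues directly. A small sketch (SymPy, with an arbitrary rational function as the example) illustrates the theorem:

import sympy as sp

z = sp.symbols('z')
f = 1 / (z * (z - 1)**2)        # poles at 0 (simple) and 1 (of order 2)

r0 = sp.residue(f, z, 0)        # 1
r1 = sp.residue(f, z, 1)        # -1
print(r0, r1)

# integral over a contour enclosing both poles, by the residue theorem
print(2 * sp.pi * sp.I * (r0 + r1))   # 0 for this example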
9.4.25. Residues at infinity. Recall that when integrating along the circle $|z-a| = R$, we always assume the counterclockwise orientation of the circle. Thus we use the minus sign in the definition:
If $f(z)$ is analytic in the closure of the exterior of a disc, $\{|z| \ge R\}$, then
$$\operatorname{res}_\infty f = -\frac{1}{2\pi i}\oint_{|z|=R}f(z)\,dz.$$
In terms of the Laurent series $f(z) = \sum_{n=-\infty}^{\infty}c_n z^n$, valid in $\{|z| > R\}$, we have $\operatorname{res}_\infty f = -c_{-1}$.
Note that if $f(z)$ is analytic in $\mathbb{C}\setminus\{a_1,\dots,a_n\}$, then
$$\operatorname{res}_\infty f + \sum_{j=1}^{n}\operatorname{res}_{a_j}f = 0.$$
Indeed, by taking a disc $\{|z| \le R\}$ of sufficiently large radius that contains all the singularities and has none on its boundary, we conclude that
$$\frac{1}{2\pi i}\oint_{|z|=R}f(z)\,dz = \sum_{j=1}^{n}\operatorname{res}_{a_j}f.$$
9.4.26. Examples of applications. Residues of analytic functions are used for the evaluation of improper integrals in real analysis. The following lemma turns out to be very useful for such purposes. We shall write $M(R)$ for the maximum of $|f(z)|$ over the upper half of the circle with radius $R$, i.e. $M(R) = \max_{|z|=R,\ \operatorname{Im}z\ge 0}|f(z)|$.
Jordan's lemma
Lemma. Consider a function $f(z)$ continuous on $\{\operatorname{Im}z \ge 0,\ |z| = R\}$. Then, for each positive real parameter $t$,
$$\Big|\int_{|z|=R,\ \operatorname{Im}z\ge 0}f(z)\,e^{itz}\,dz\Big| \le \frac{\pi}{t}\,M(R).$$
Consequently, if $f(z)$ is continuous on $\{\operatorname{Im}z \ge 0,\ |z| \ge R_0\}$ and $\lim_{R\to\infty}M(R) = 0$, then
$$\lim_{R\to\infty}\int_{|z|=R,\ \operatorname{Im}z\ge 0}f(z)\,e^{itz}\,dz = 0.$$
Proof. We estimate the integral from the lemma:
$$\Big|\int_0^{\pi}f(Re^{i\theta})\,e^{-tR\sin\theta + itR\cos\theta}\,iRe^{i\theta}\,d\theta\Big| \le R\,M(R)\int_0^{\pi}e^{-tR\sin\theta}\,d\theta.$$
To evaluate the latter integral, we observe that $\sin\theta \ge \frac{2}{\pi}\theta$ for $0 \le \theta \le \frac{\pi}{2}$. Thus, using $t > 0$ and the symmetry of $\sin\theta$ around $\theta = \frac{\pi}{2}$, we arrive at
$$R\,M(R)\int_0^{\pi}e^{-tR\sin\theta}\,d\theta = 2R\,M(R)\int_0^{\pi/2}e^{-tR\sin\theta}\,d\theta \le 2R\,M(R)\int_0^{\pi/2}e^{-\frac{2tR}{\pi}\theta}\,d\theta = \frac{\pi}{t}\,M(R)\big(1 - e^{-Rt}\big) \le \frac{\pi}{t}\,M(R).$$
The consequence for $R\to\infty$ is obvious. □
Typically, the Jordan lemma is used to compute improper integrals of real analytic (complex valued) functions $g(x) = f(x)\,e^{itx}$ along the entire real line (or rather the real or imaginary parts of such integrals). If the corresponding complex analytic function $f(z)$ has only a finite number of poles $a_k$ in the upper half plane and $\lim_{R\to\infty}M(R) = 0$, then we may compute the real integral
$$\int_{-\infty}^{\infty}f(x)\,e^{itx}\,dx = \lim_{R\to\infty}\int_{\gamma_R}f(z)\,e^{itz}\,dz = 2\pi i\sum_k\operatorname{res}_{a_k}\big(f(z)\,e^{itz}\big),$$
where $\gamma_R$ is the path composed of the interval $[-R,R]$ and the upper half circle of radius $R$. See the diagram and the examples in the other column.
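As an illustration, the classical integral of $e^{ix}/(1+x^2)$ over the real line can be checked against its residue value. The sketch below (Python; $f(z) = 1/(1+z^2)$, $t = 1$, with the single pole $z = i$ in the upper half plane) compares plain quadrature with $2\pi i$ times the residue:

import numpy as np
from scipy.integrate import quad

# residue of e^{iz}/(1+z^2) at z = i is e^{-1}/(2i), so the integral is pi/e
residue_value = 2 * np.pi * 1j * (np.exp(-1) / 2j)

re, _ = quad(lambda x: np.cos(x) / (1 + x**2), -np.inf, np.inf)
im, _ = quad(lambda x: np.sin(x) / (1 + x**2), -np.inf, np.inf)

print(re + 1j * im)      # approximately pi/e + 0j
print(residue_value)     # pi/e ~ 1.1557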
9.4.27. Concluding remarks. Of course, we have not touched on many important issues in this short introduction. These include the conformal properties of analytic functions (they preserve the angles of curves at points where the derivative does not vanish), and the richness of analytic functions, which allows the mapping of any simply connected region $\Omega$ other than $\mathbb{C}$ itself bijectively onto the open unit disc so that both the map and its inverse are analytic (the Riemann mapping theorem). The proper setup for analytic extensions are the Riemann surfaces with their fascinating topological properties. Also, we only commented on the possibility of proving the Cauchy integral theorem for triangles just assuming the existence of the complex derivative. The analyticity of all holomorphic functions then follows from the Cauchy integral formula. Moreover, we have not mentioned the functions of several complex variables at all!
We hope that all of these interesting issues will challenge the readers to go on to a further, more detailed study in the relevant literature.
CHAPTER 10
Statistics and probability methods
Is statistics a part of mathematics?
- whenever it is so, we need much of mathematics there...!
A. Dots, lines, rectangles
Data obtained from reality can be displayed in many ways. Let us illustrate some of them.
10.A.1. Presenting the collected data. 20 mathematicians were asked about the number of members of their household. The following table displays the frequency of each number of members.
Number of members 1 2 3 4 5 6
Number of households 5 5 1 6 2 1
Create the frequency distribution table. Find the mean, median and mode of the number of members. Build a column diagram of the data.
Solution. Let us begin with the frequency distribution table. There, we write not only the frequencies, but also the cumulative frequencies and relative frequencies (i.e., the probability that there is a given number of members in a randomly picked household). Let us denote the number of members by $x_i$, the corresponding frequency by $n_i$, the relative frequency by $p_i$ $\big(= n_i/\sum_{j=1}^{6}n_j = n_i/20\big)$, the cumulative frequency by $N_i$ $\big(= \sum_{j=1}^{i}n_j\big)$, and the relative cumulative frequency by $F_i$
Roughly speaking, statistics is any processing of numerical or other type of data about a population of objects and their presentation. In this context, we talk about descriptive statistics. Its objective is thus to process and comprehensibly represent data about objects of a given "population" — for instance, the annual income of all citizens obtained from the complete data of revenue authorities, or the quality of hotel accommodation in some region. In order to achieve this, we focus on simple numerical characterization and visualization of the data.
Mathematical statistics uses mathematical methods to derive conclusions valid for the whole (potentially infinite) population of objects, based on a "small" sample. For instance, we might want to find out how much a certain disease is spread in the population by collecting data about a few randomly chosen people, but we interpret the results with regard to the entire population. In other words, mathematical statistics makes conclusions about a large population of objects based on the study of a small (usually randomly selected) sample collection. It also estimates the reliability of the resulting conclusions.
Mathematical statistics is based on the tools of probability theory, which is very useful (and amazing) in itself. Therefore, probability theory is discussed first.
This chapter provides an elementary introduction to the methods of probability theory, which should be sufficient for correct comprehension of ordinary statistical information all around us. However, for a serious understanding of a mathematical statistician's work, one must look for other resources.
1. Descriptive statistics
Descriptive statistics alone is not a mathematical discipline although it uses many manipulations with numbers and sometimes even very sophisticated methods. However, it is a good opportunity for illustrating the mathematical approach to building generally useful tools.
At the same time, it should serve as a motivation for studying probability theory because of later applications in statistics.
many pictures missing!!
$\big(= N_i/20 = \sum_{j=1}^{i}p_j\big)$:
x_i  n_i  p_i    N_i  F_i
1    5    1/4    5    1/4
2    5    1/4    10   1/2
3    1    1/20   11   11/20
4    6    3/10   17   17/20
5    2    1/10   19   19/20
6    1    1/20   20   1
Now, we can easily construct the wanted (column) graphs of (relative, cumulative) frequencies:
The mean number of members of a household is:
$$\bar x = \frac{5\cdot 1 + 5\cdot 2 + 1\cdot 3 + 6\cdot 4 + 2\cdot 5 + 1\cdot 6}{20} = 2.9.$$
The median is the arithmetic mean of the tenth and eleventh values (after sorting), which are respectively 2 and 3, i.e., $\tilde x = 2.5$.
The mode is the most frequent value, i.e., $\hat x = 4$.
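The same characteristics are obtained immediately by machine. A minimal sketch (plain Python, with the data of the table above) follows:

from statistics import mean, median, multimode

members = [1, 2, 3, 4, 5, 6]
households = [5, 5, 1, 6, 2, 1]

# expand the frequency table back into the raw data set of size 20
data = [m for m, n in zip(members, households) for _ in range(n)]

print(mean(data))       # 2.9
print(median(data))     # 2.5
print(multimode(data))  # [4]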
The collected data can also be presented using a box plot:
The upper and lower sides of the "box" correspond respectively to the first (lower) and the third (upper) quartile, so its height is equal to the interquartile range. The thick horizontal line is drawn at the median level; the lower and upper horizontal lines correspond respectively to the minimum and maximum elements of the data set, or to the value that is 1.5 times the interquartile range less than the lower side of the box (and greater than the upper side, respectively). The data outside this range would be shown as circles.
We can also build the histogram of the data:
In our brief introduction, we first introduce the concepts allowing us to measure the positions of data values and the variability of the data values (means, percentiles, etc.). We touch on the problem of how to visualize or otherwise present the data sets (diagrams). Then we deal with the potential relations between several data sets (covariance and principal components) and, finally, we deal with data without numerical values, relying just on their frequencies of appearance (entropy).
10.1.1. Probability, or statistics? It is not by accident that we return to a part of the motivating hints from the first chapter, as soon as we have managed to gather enough mathematical tools, both discrete and continuous. Nowadays, many communications are of a statistical nature, be it in media, politics, or science. Nevertheless, in order to properly understand the meaning of such a communication and to use particular statistical methods and concepts, one must have a broad knowledge of miscellaneous parts of mathematics. In this subsection, we move away from the mathematical theory and think about the following steps and our objectives.
As an example of a population of objects, consider the students of a given basic course. Then, the examined numerical data can be:
• the "mean number of points" obtained during the course in the previous semester and the "variance" of these values,
• the "mean marks" for the examination of this and other courses and the "correlation" (i.e. mutual dependence) of these results,
• the "correlation" of data about the past results of given students,
• the "correlation" of the number of failed exams of a given student and the number of hours spent in a temporary job,
• ...
With regard to the first item, the arithmetic mean itself does not carry enough information about the quality of the lecture or of the lecturer, nor about the results of particular students. Maybe the value which is "in the middle" of the population, or the number of points achieved by the student who was just better than half of the students is of more concern. Similarly, the first quarter, the last quarter, the first tenth, etc. maybe of interest. Such data are called statistics of the population. Such statistics are interesting for the students in question as well, and it is quite easy to define, compute, and communicate them.
From general experience or as a theoretical result outside mathematics, a reasonable assessment should be "normally" distributed. This is a concept of probability theory, and it requires quite advanced mathematics to be properly defined. Comparing the collected data about even a small random population of students to theoretical results can serve in two ways: we can estimate the parameters of the distribution, as well as draw a conclusion whether the assessment is reasonable.
(Histogram of x.)
Note that the frequencies of one- and two-member households were merged into a single rectangle. This is used in order to make the data "easier to read" - there exist (various and ambiguous) rules for the merging.
We simply mention this fact without presenting an exact procedure (the conventions vary). □
10.A.2. Given a data set $x = (x_1, x_2, \dots, x_n)$, find the mean and variance of the centered values $x_i - \bar x$ and the standardized values $\frac{x_i - \bar x}{s_x}$.
Solution. The mean of the centered values can be found directly using the definition of the arithmetic mean:
$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x) = \frac{1}{n}\sum_{i=1}^{n}x_i - \bar x\cdot\frac{1}{n}\sum_{i=1}^{n}1 = \bar x - \bar x = 0.$$
The variance of the centered values is clearly the same as for the original ones ($s_x^2$). For the standardized values, the mean is equal to zero again, and the variance is
$$\frac{1}{n}\sum_{i=1}^{n}\Big(\frac{x_i - \bar x}{s_x}\Big)^2 = \frac{s_x^2}{s_x^2} = 1. \qquad\square$$
10.A.3. Prove that the variance satisfies $s_x^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar x^2$.
Solution. Using the definitions of variance and arithmetic mean, we get:
$$s_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2 = \frac{1}{n}\sum_{i=1}^{n}\big(x_i^2 - 2\bar x\,x_i + \bar x^2\big) = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\bar x\cdot\frac{1}{n}\sum_{i=1}^{n}x_i + \bar x^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar x^2. \qquad\square$$
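The identity is also easy to test on random data. A minimal sketch (NumPy, with an arbitrary random sample) follows:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

lhs = np.mean((x - x.mean())**2)     # the definition of the variance
rhs = np.mean(x**2) - x.mean()**2    # the identity just proved

print(np.isclose(lhs, rhs))          # True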
At the same time, the numerical values of statistics for a given population can yield qualitative description of the likelihood of our conclusions. We can compute statistics which reflect the variability of the examined values, rather than where these values are positioned within a given population. For instance, if the assessment does not show enough variability, it may be concluded that it is badly designed, because the students' skills are of course different. The same applies if the collected data seem completely random.
In the above paragraph, it is assumed that the examined data is reliable. This is not always the case in practice. On the contrary, the data is often perturbed with errors due to construction of the experiment and the data collection itself.
In many cases, not much is known about the type of the data distribution. Then, methods of non-parametric statistics are often used (to be mentioned at the end of this chapter). Very interesting conclusions can be found if we compare the statistics for different quantities and then derive information about their relations. For example, if there is no evident relation between the history of previous studies and the results in a given course, then it may be that the course is managed wrongly.
These ideas can be summarized as follows:
• In descriptive statistics, there are tools which allow the understanding of the structure and nature of even a huge collection of data;
• in mathematics, one works with an abstract mathematical description of probability, which can be used for analysis of given data. Especially, this is when there is a theoretical model to which the data should correspond;
• conclusions of statistical investigation of samples of particular data sets can be given by mathematical statistics;
• mathematical statistics can also estimate how adequate such a description is for a given data set.
10.1.2. Terminology. Statisticians have introduced a great many concepts which need mastering. The fundamental concept is that of a statistical population, which is an exactly defined set of basic statistical units. These can be given by enumeration or by some rules, in the case of a larger population.
On every statistical unit, statistical data is measured, with the "measurement" perceived very broadly.
For instance, the population can consist of all students of a given university. Then, each of the students is a statistical unit and much data can be gathered about these units - the numerical values obtainable from the information system, what is their favorite colour, what they had for dinner before their last test, etc.
The basic object for examining particular pieces of data is a data set. It usually consists of ordered values. The ordering can be either natural (when the data values are real numbers, for example) or we can define it (for instance, when we observe colours, we can express them in the RGB format
10.A.4. The following values have been collected:
10; 7; 7; 8; 8; 9; 10; 9; 4; 9; 10; 9; 11; 9; 7; 8; 3; 9; 8; 7.
Find the arithmetic mean, median, quartiles, variance, and the corresponding box diagram.
Solution. Denoting the individual values by $a_i$ and their frequencies by $n_i$, we can arrange the given data set into the following table.
a_i  3  4  7  8  9  10  11
n_i  1  1  4  4  6  3   1
From the definition of the arithmetic mean, we have
$$\bar x = \frac{3 + 4 + 4\cdot 7 + 4\cdot 8 + 6\cdot 9 + 3\cdot 10 + 11}{1+1+4+4+6+3+1} = \frac{162}{20} = 8.1.$$
Since the tenth least collected value is $x_{(10)} = 8$ and the eleventh one is $x_{(11)} = 9$, the median is equal to $\tilde x = \frac{8+9}{2} = 8.5$. The first quartile is $x_{0.25} = \frac{x_{(5)} + x_{(6)}}{2} = 7$, and the third quartile is $x_{0.75} = \frac{x_{(15)} + x_{(16)}}{2} = 9$. From the definition of the variance, we get $s_x^2$:
$$s_x^2 = \frac{5.1^2 + 4.1^2 + 4\cdot 1.1^2 + 4\cdot 0.1^2 + 6\cdot 0.9^2 + 3\cdot 1.9^2 + 2.9^2}{1+1+4+4+6+3+1} = 3.59.$$
The histogram and box diagram are shown in the following pictures,
(Histogram of x; box diagram.)
where we have used a "statistics" method to make the histogram "nice" and "clear". You can find a lot of these conventions in books on statistics, but if you do not know them,
and order them with respect to it). We can also work with unordered values.
Since statistical description aims at telling comprehensible information about the entire population, we should be able to compare and take ratios of the data values. Therefore, we need to have a measurement scale at our disposal. In most cases, the data values are expressed as numbers. However, the meaning of the data can be quantified variously, and thus we distinguish between the following types of data measurement scales.
Types of data measurement scales The data values are called:
• nominal if there is no relation between particular values; they are just qualitative names, i.e. possible values (for instance, political parties or lecturers at a university when surveying how popular they are);
• ordinal the same as above, but with an ordering (for example, number of stars for hotels in guidebooks);
• interval if the values are numbers which serve for comparisons but do not correspond to any absolute value (for example, when expressing temperature in Celsius or Fahrenheit degrees, the position of zero is only conventional);
• ratio if the scale and the position of zero are fixed (most physical and economical quantities).
With nominal types, we can interpret only equalities $x_1 = x_2$; with ordinal types, we can also interpret inequalities $x_1 < x_2$ (or $x_1 > x_2$); with interval types, we can also interpret differences $x_1 - x_2$. Finally, with ratio types, we have also ratios $x_1/x_2$ available.
10.1.3. Data sorting. In this subsection, we work with a data set $x_1, x_2, \dots, x_n$, whose values can be ordered (thus, their type is not nominal) and which have been obtained through measurement on n statistical units. These values are sorted in a sorted data set
(1) $\qquad x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}.$
The integer n is called the size of the data set.
When working with large data sets where only a few values occur, the simplest way to represent the data set is to enumerate the values' frequencies.
For instance, when surveying the political party preference or when presenting the quality of a hotel, write only the number of occurrences of each value.
If there are many possible values (or there can even be continuously distributed real values), divide them into a suitable number of intervals and then observe the frequencies in the given intervals. The intervals are also called classes and the frequencies are called class frequencies. We also use cumulative frequencies and cumulative class frequencies which correspond to the sum of frequencies of values not exceeding a given one.
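A minimal sketch in Python (our own helper names, for illustration only) of these notions, computing the frequencies, class frequencies, and cumulative class frequencies of the data set from 10.A.4:

from collections import Counter

def frequencies(data):
    # the distinct values together with their frequencies n_i
    return sorted(Counter(data).items())

def class_frequencies(data, edges):
    # count the values falling into each class [edges[j], edges[j+1])
    counts = [0] * (len(edges) - 1)
    for x in data:
        for j in range(len(edges) - 1):
            if edges[j] <= x < edges[j + 1]:
                counts[j] += 1
                break
    return counts

def cumulative(counts):
    # cumulative class frequencies: sums of frequencies up to each class
    total, out = 0, []
    for c in counts:
        total += c
        out.append(total)
    return out

data = [10, 7, 7, 8, 8, 9, 10, 9, 4, 9, 10, 9, 11, 9, 7, 8, 3, 9, 8, 7]
print(frequencies(data))                        # [(3, 1), (4, 1), (7, 4), ...]
print(class_frequencies(data, [0, 4, 8, 12]))   # [1, 5, 14]
print(cumulative(class_frequencies(data, [0, 4, 8, 12])))  # [1, 6, 20]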
you are lost. This is the default setting of the R program. For example, if you replace just the value 3 by 2, you get a quite different looking histogram:
(Figure: "Histogram of x" for the modified data set.)
□
Most often, the mean $a_i$ of a given class is considered to be its representative, and the value $a_i n_i$ (where $n_i$ is the frequency of the class) is the total contribution of the class. Relative frequencies $n_i/n$ and relative cumulative frequencies can also be considered.
A graph which has the intervals of particular classes on one axis and rectangles above them with height corresponding to the frequency is called a histogram. Cumulative frequency is represented similarly.
The following diagram shows histograms of data sets of size n = 500 which were randomly generated with various standard distributions (called normal and $\chi^2$, respectively).
10.1.4. Measures of the position of statistical values. If the magnitude of the values around which the collected data gather is to be expressed, then the concepts of the definition below can be used. There, we work with ratio or interval types of scales.
Consider an (unsorted) data set $(x_1, \dots, x_n)$ of the values for all examined statistical units and let $n_1, \dots, n_m$ be the class frequencies of the m distinct values $a_1, \dots, a_m$ that occur in this set.
Means
10.A.5. 425 carps were fished, and each one was weighed. Then, mass intervals were set, resulting in the following frequency distribution table:
Weight (kg) 0-1 1-2 2-3 3-4 4-5 5-6 6-7
Class midpoint 0.5 1.5 2.5 3.5 4.5 5.5 6.5
Frequency 75 90 97 63 48 42 10
Draw a histogram, find the arithmetic, geometric, and harmonic means of the carps' weights. Furthermore, find the median, quartiles, mode, variance, standard deviation, coefficient of variation, and draw a box plot.
Solution. The histogram looks as follows:
Definition. The arithmetic mean (often only mean) is given as
$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}\sum_{j=1}^m n_j a_j.$
The geometric mean is given as
$\bar{x}_G = \sqrt[n]{x_1 x_2 \cdots x_n}$
and makes sense for positive values $x_i$ only. The harmonic mean is given as
$\bar{x}_H = \left(\frac{1}{n}\sum_{i=1}^n \frac{1}{x_i}\right)^{-1}$
and is also used for positive values $x_i$ only.
The arithmetic mean is the only one of the three above which is invariant with respect to affine transformations. For all scalars a, b,
$\overline{a + b\cdot x} = \frac{1}{n}\sum_{i=1}^n (a + b x_i) = a + b\,\frac{1}{n}\sum_{i=1}^n x_i = a + b\cdot\bar{x}.$
(Figure to 10.A.5: histogram of the carp weights, with the expected normal curve overlaid.)
Therefore, the arithmetic mean is especially suitable for interval types.
The logarithm of the geometric mean is the arithmetic mean of the logarithms of the values. It is especially suitable for those quantities which cumulate multiplicatively, e.g. interest. If the interest rate for the i-th time period is $x_i\,\%$, then the final result is the same as if the interest rate had the constant value $\bar{x}_G\,\%$. See 10.A.9 for an example where the harmonic mean is appropriate.
In subsection 8.1.29 (page 547), the methods developed there are used to prove that the geometric mean never exceeds the arithmetic mean. The harmonic mean never exceeds the geometric mean, and so
$\bar{x}_H \le \bar{x}_G \le \bar{x}.$
From the definitions of the corresponding concepts in subsection 10.1.4, we can directly compute that the arithmetic mean is $\bar{x} = 2.7$ kg, the geometric mean is $\bar{x}_G = 2.1$ kg, and the harmonic mean is $\bar{x}_H = 1.5$ kg. By the definitions of subsection 10.1.5, the median is equal to $\tilde{x} = x_{0.5} = 2.5$ kg, the lower quartile to $x_{0.25} = 1.5$ kg, the upper quartile to $x_{0.75} = 3.5$ kg, and the mode is $\hat{x} = 2.5$ kg. From the definitions of subsection 10.1.6, we compute the variance of the weights, which is $s^2 = 2.7$ kg², whence it follows that the standard deviation is $s_x = 1.7$ kg, and the coefficient of variation is $V_x = 0.6$. □
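The computation can be verified mechanically; the following Python sketch (our own code, representing each class by its midpoint) recomputes the characteristics of 10.A.5 from the frequency table:

from math import prod, sqrt

a = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]   # class midpoints (kg)
n = [75, 90, 97, 63, 48, 42, 10]          # class frequencies
N = sum(n)                                # 425 carps in total

mean = sum(ni * ai for ai, ni in zip(a, n)) / N
geom = prod(ai ** (ni / N) for ai, ni in zip(a, n))
harm = N / sum(ni / ai for ai, ni in zip(a, n))
var  = sum(ni * (ai - mean) ** 2 for ai, ni in zip(a, n)) / N
std  = sqrt(var)

print(round(mean, 1), round(geom, 1), round(harm, 1))      # 2.7 2.1 1.5
print(round(var, 1), round(std, 1), round(std / mean, 1))  # 2.7 1.7 0.6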
10.A.6. Prove that the entropy is maximal if the nominal values are distributed uniformly, i.e., the relative frequency of each class is $p_i = \frac{1}{n}$.
Solution. By the definition of entropy (see 10.1.11), we are looking for the maximum of the function $H_x = -\sum_{i=1}^n p_i \ln p_i$ with respect to the unknown relative frequencies $p_i = \frac{n_i}{n}$, which satisfy $\sum_{i=1}^n p_i = 1$. Therefore, this is a typical example of finding constrained extrema, which can be solved using Lagrange multipliers. The corresponding Lagrange function is
$L(p_1, \dots, p_n, \lambda) = -\sum_i p_i \ln p_i + \lambda\left(\sum_i p_i - 1\right).$
The partial derivatives are $\frac{\partial L}{\partial p_i} = -\ln p_i - 1 + \lambda$; hence its stationary point is determined by the equations $p_i = e^{\lambda - 1}$ for all $i = 1, \dots, n$. Moreover, we know that the sum of the relative frequencies $p_i$ is equal to one. This means that $n\,e^{\lambda - 1} = 1$, whence we get $\lambda = 1 - \ln n$. Substitution then
10.1.5. Median, quartile, decile, percentile, ... Another way of expressing the position or distribution of the values is to find, for a number α between zero and one, such a value $x_\alpha$ that 100α% of the values from the set are at most $x_\alpha$ and the remaining ones are greater than $x_\alpha$. If such a value is not unique, one can choose the mean of the two nearest possibilities.
The number $x_\alpha$ is called the α-quantile. Thus, if the result of a contestant puts him into $x_{1.00}$, it does not mean that he is better than anyone else yet. However, there is surely no one better than him.
The most common values of α are the following:
• The median (also sample median) is defined by
$\tilde{x} = x_{0.50} = \begin{cases} x_{((n+1)/2)} & \text{for odd } n,\\ \frac{1}{2}\bigl(x_{(n/2)} + x_{(n/2+1)}\bigr) & \text{for even } n,\end{cases}$
yields $p_i = \frac{1}{n}$ for every i.
□
where $x_{(i)}$ corresponds to the value in the sorted data set 10.1.3(1).
• The first and third quartile are $Q_1 = x_{0.25}$ and $Q_3 = x_{0.75}$, respectively.
• The p-th quantile (also sample quantile or percentile) $x_p$, where $0 < p < 1$ (usually rounded to two decimal places).
One can also meet the mode, which is the value $\hat{x}$ that is most frequent in the data set x.
The arithmetic mean, median (with ratio types), and mode (with ordinal or nominal types) correspond to the "anticipated" values.
Note that all a-quantiles with interval scales are invariant with respect to affine transformations of the values (check this yourselves!).
10.1.6. Measures of the variability. Surely any measure of the variability of a data set $x \in \mathbb{R}^n$ should be invariant with respect to constant translations. In the Euclidean space $\mathbb{R}^n$, both the standard distance and the sample mean have this property. Therefore, choose the following:
10.A.7. The following graphs depict the frequencies of particular amounts of points obtained by students of the MB 104 lecture at the Faculty of Informatics of Masaryk University in 2012. The axes of the cumulative graph are "swapped", as opposed to the previous example.
The frequencies of particular amounts of points are enumerated in the following table:
# of points # of students
20.5 1
20 1
19 2
18.5 1
18 2
17.5 3
17 2
16.5 4
16 3
15.5 5
15 7
14.5 6
14 14
13.5 21
13 21
12.5 19
12 17
11.5 18
11 31
10.5 22
10 53
# of points # of students
9.5 9
9 9
8.5 13
8 8
7.5 13
7 4
6.5 7
6 4
5.5 8
5 7
4.5 9
4 5
3.5 7
3 8
2.5 8
2 14
1.5 8
1 2
0.5 6
0 9
The corresponding histogram looks as follows:
The histogram was obtained from the Information System of Masaryk University. We can see that the data are shown in a somewhat unusual way: individual amounts of points correspond to "double rectangles". It is a matter of taste how to represent the data (it is possible to merge some
Variance and standard deviation
Definition. The variance of a data set x is defined by
$s_x^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$
The standard deviation sx is defined to be the square root of the variance.
As requested, the variability of statistical values is independent of constant translation of all values. Indeed, the unsorted data set
$y = (x_1 + c, x_2 + c, \dots, x_n + c)$
has the same variance, $s_y^2 = s_x^2$.
Sometimes, the sample variance is used, where there is (n — 1) in the denominator instead of n. The reason will be clear later, cf. 10.3.2.
In the case of class frequencies $n_j$ of values $a_j$ for m classes, this expression leads to the value
$s_x^2 = \frac{1}{n}\sum_{j=1}^m n_j (a_j - \bar{x})^2$
of the variance. In practice, it is recommended to use Sheppard's correction, which decreases $s_x^2$ by $h^2/12$, where h is the width of the intervals that define the classes. Further, one can encounter the data-set range
$R = x_{(n)} - x_{(1)}$
and the interquartile range
$Q = Q_3 - Q_1.$
The mean deviation is defined as the mean distance of the values from the median:
$d_x = \frac{1}{n}\sum_{i=1}^n |x_i - \tilde{x}|.$
The following theorem clarifies why these measures of variability are chosen:
Theorem. The function $S(t) = \frac{1}{n}\sum_{i=1}^n (x_i - t)^2$ has the minimum value at $t = \bar{x}$, i.e., at the sample mean.
The function $d(t) = \frac{1}{n}\sum_{i=1}^n |x_i - t|$ has the minimum value at $t = \tilde{x}$, i.e., at the median.
Proof. The minimum of the quadratic polynomial $f(t) = \sum_{i=1}^n (x_i - t)^2$ is at the only root of its derivative:
$f'(t) = -2\sum_{i=1}^n (x_i - t).$
Since the sum of the deviations of all values from the sample mean is zero, $t = \bar{x}$ is the requested root, and the first proposition is proved.
As for the second proposition, return to the definition of the median. For this purpose, rearrange the sum so that the first and the last summand are added, then the second and the last-but-one summand, etc. In the first case, this leads to the
values, thereby decreasing the number of rectangles, or to use thinner rectangles).
□
(Figure: graph of the points of the MB 104 students; vertical axis the number of points, horizontal axis the rank of the students (pořadí studentů).)
We can notice that the mode of the values is 10, which, incidentally, was also the number of points necessary to pass the course. The mean of the obtained points is 9.48.
10.A.8. Here, we present column diagrams of the amounts of points of MB 101 students in autumn 2010 (the very first semester of their studies). The first one corresponds to all students of the course; the second one to those who (3 years later) successfully finished their studies and got the bachelor's degree.
expression $|x_{(1)} - t| + |x_{(n)} - t|$, and this is equal to the distance $x_{(n)} - x_{(1)}$ provided t lies inside the range, and it is even greater otherwise. Similarly, the next pair in the sum gives $x_{(n-1)} - x_{(2)}$ if $x_{(2)} \le t \le x_{(n-1)}$, and it is greater otherwise. Therefore, the minimality assumption leads to $t = \tilde{x}$. □
In practice, it is required to compare the variability of data sets of different statistical populations. For this purpose, it is convenient to relativize the scale, and so use the coefficient of variation of a data set x:
$V_x = \frac{s_x}{|\bar{x}|}.$
This relative measure of variability can be expressed as a percentage deviation with respect to the sample mean $\bar{x}$.
10.1.7. Skewness of a data set. If the values of a data set are distributed symmetrically around the mean value, then
$\bar{x} = \tilde{x}.$
However, there are distributions where
$\bar{x} > \tilde{x}.$
This is common, for instance, with the distribution of salaries in a population where the mean is driven up by a few very large incomes, while much of the population is below the average.
A useful characteristic concerning this is the Pearson coefficient, given by
$\beta = 3\,\frac{\bar{x} - \tilde{x}}{s_x}.$
It estimates the relative measure (the absolute value of $\beta$) and the direction of the skewness (the sign). In particular, note that the standard deviation is always positive, so it is already the sign of $\bar{x} - \tilde{x}$ which shows the direction of the skewness.
QUANTILE COEFFICIENTS OF SKEWNESS
More detailed information can be obtained from the quantile coefficients of skewness
$\beta_p = \frac{x_{1-p} + x_p - 2\tilde{x}}{x_{1-p} - x_p},$
for each $0 < p < 1/2$. Their meaning is clear when the numerator is expressed as $(x_{1-p} - \tilde{x}) - (\tilde{x} - x_p)$.
In particular, the quartile coefficient of skewness is obtained when selecting p = 0.25.
Again, the results can be depicted in an alternative way:
10.1.8. Diagrams. People's eyes are well suited for perceiving information with a complicated structure. That is why there exist many standardized tools for displaying statistical data or their correlations. One of them is the box diagram.
(Figure: column diagram of the points of all MB 101 students; horizontal axis the rank of the students (pořadí studentů).)
And these are the graphs of amounts of points obtained by those students who continued their studies:
(Figures: column diagram of the points (body) and the corresponding graph against the rank of the students (pořadí studentů).)
We can see that in the former case, the mode is equal to 0, while in the latter case, it is 10 again. The frequency distribution is close to the one of the MB 104 course, which is recommended for the fourth semester.
BOX DIAGRAM
The diagram illustrates a histogram and a box diagram of the same data set (normal distribution with mean equal to 10 and variance equal to 3, n = 500).
The middle line is the median; the edges of the box are the quartiles; the "paws" show 1.5 of the interquartile range, but not more than the edges of the sample range. Potential outliers are indicated too.
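The quantities drawn in a box diagram are easily computed. A Python sketch (our own helper, using the quantile convention of 10.1.5 and the data of 10.A.4):

def quantile(xs, alpha):
    # xs must be sorted; if alpha*n falls exactly between two order
    # statistics, take the mean of the two nearest possibilities (10.1.5)
    n = len(xs)
    pos = alpha * n
    if pos == int(pos) and 0 < int(pos) < n:
        i = int(pos)
        return (xs[i - 1] + xs[i]) / 2
    return xs[min(int(pos), n - 1)]

xs = sorted([10, 7, 7, 8, 8, 9, 10, 9, 4, 9, 10, 9, 11, 9, 7, 8, 3, 9, 8, 7])
q1, med, q3 = (quantile(xs, a) for a in (0.25, 0.5, 0.75))
iqr = q3 - q1
paws = (max(xs[0], q1 - 1.5 * iqr), min(xs[-1], q3 + 1.5 * iqr))
outliers = [x for x in xs if x < paws[0] or x > paws[1]]
print(med, (q1, q3), paws, outliers)   # 8.5 (7, 9) (4, 11) [3]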
Common displaying tools allow us to view potential dependencies of two data sets. For instance, in the left-hand diagram below, the coordinates are chosen as the values of two independent normal distributions with mean equal to 10 and variance equal to 3. In the right-hand illustration, the first coordinate is from the same data set, and the second coordinate is given by the formula y = 3x + 4. It is also perturbed with a small error.
10.1.9. Covariance matrix. Actually, the dependencies between several data sets associated to the same statistical units are at the core of our interest in many real-world problems. When defining the variance in 10.1.6 above, we employed the Euclidean distance, i.e., we evaluated the scalar product of the vector of deviations from the mean with itself. Thus, having two vectors of data sets, we may define
10.A.9. A car was traveling from Brno to Prague at 160 km/h, and then back from Prague to Brno at 120 km/h. What was its average speed?
Solution. This is an example where one might think of using the arithmetic mean, which is incorrect. The arithmetic mean would be the correct result if the car spent the same period of time going at each speed. However, in this case, it traveled the same distance, not time, at each speed. Denoting by d the distance between Brno and Prague and by $v_p$ the average speed, we obtain
$\frac{d}{160} + \frac{d}{120} = \frac{2d}{v_p},$
whence
$v_p = \frac{2}{\frac{1}{160} + \frac{1}{120}} = \frac{960}{7} \approx 137\ \text{km/h}.$
Therefore, the average speed is the harmonic mean (see 10.1.4) of the two speeds.
□
B. Visualization of multidimensional data
The above examples were devoted to displaying one numerical characteristic measured for several objects (the number of points obtained by individual students, for example). Graphical visualization of data helps us understand them better. However, how can we depict the data if we measure p different characteristics, p > 3, of n objects? Such measurements cannot be displayed using the graphs we have met so far.
10.B.1. One of the possible methods is the so-called principal component analysis. In this method, we use eigenvectors and eigenvalues (see 2.4.2) of the sample covariance matrix (see 10.2.35). We will use the following notation:
• the random vectors of the measurements
$x_i = (x_{i1}, x_{i2}, \dots, x_{ip}),\quad i = 1, \dots, n,$
• the mean of the j-th component
$m_j = \frac{1}{n}\sum_{i=1}^n x_{ij},\quad j = 1, \dots, p,$
• the sample variance of the j-th component
$s_j^2 = \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - m_j)^2,$
• the vector of means $m = (m_1, \dots, m_p)$,
• the sample covariance matrix
$S = \frac{1}{n-1}\sum_{i=1}^n (x_i - m)(x_i - m)^T$
(note that each summand is a p-by-p matrix).
The covariance matrix is symmetric, hence all its eigenvalues are real and its eigenvectors are pairwise orthogonal. Moreover, considering the eigenvectors of unit length, we
Covariance and covariance matrix
Consider two data sets $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$, and their means $\bar{x}$, $\bar{y}$. We define their covariance by the formula
$\operatorname{cov}(x, y) = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}).$
If there are k data sets $x^{(1)} = (x_1^{(1)}, \dots, x_n^{(1)})$, ..., $x^{(k)} = (x_1^{(k)}, \dots, x_n^{(k)})$, then their covariance matrix is the symmetric matrix $C = (c_{ij})$ with $c_{ij} = \operatorname{cov}(x^{(i)}, x^{(j)})$.
Again, the sample covariance and sample covariance matrix are defined by the same formulae with n replaced by (n - 1).
Clearly, the covariance matrix has the variances of the individual data sets on its diagonal.
In order to imagine what the covariance should say, consider the two possible behaviours of two data sets: (a) they deviate from their means in a very similar way (comparing individually $x_i$ and $y_i$); (b) they behave very independently. In the first case, we should assume that the signs of the deviations mostly coincide, and thus the sum in the definition leads to a quite big positive number. In the other case, the signs should be rather independent, and thus the positive and negative contributions should effectively cancel each other in the covariance sum.
Thus we expect the covariance of data sets expressing independent features to be close to zero, while the covariance of dependent sets should be far from zero. The sign of the covariance shows the character of the dependence. For example, the two sets of data depicted in the left-hand diagram above had covariance about -0.11, while the covariance of the data from the right-hand picture was about 25.9.
Similarly to the variance, we are often interested in normalized values. The correlation coefficient takes the covariance and divides it by the product of the standard deviations of the two data sets. In our two latter cases, the correlation coefficients are about -0.01 and 0.99. As expected, they very clearly indicate which of the data are correlated.
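These formulas translate directly into code. A Python sketch (our own code; the simulated data imitate the right-hand diagram described above):

import random
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # covariance as defined above (denominator n)
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    # correlation: covariance normalized by both standard deviations
    return cov(xs, ys) / sqrt(cov(xs, xs) * cov(ys, ys))

random.seed(1)
xs = [random.gauss(10, sqrt(3)) for _ in range(500)]
ys = [3 * x + 4 + random.gauss(0, 1) for x in xs]   # y depends on x linearly
print(cov(xs, ys), corr(xs, ys))                    # correlation close to 1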
10.1.10. Principal components analysis. If we deal with statistics involving many parameters and we need to decide quickly about their similarity (correlation) with some given patterns, we might use a simple idea from linear algebra.
Assume we have k data sets $x^{(i)}$. Since their covariance matrix C is symmetric, there is an orthonormal basis e in $\mathbb{R}^k$ such that in this basis, the quadratic form given by C has a diagonal matrix. The relevant basis e consists of the real eigenvectors $e_i \in \mathbb{R}^k$ for the eigenvalues $\lambda_i$. The bigger the absolute value $|\lambda_i|$, the bigger the variation of the orthogonal projection x of all the k data sets onto the one-dimensional subspace spanned by $e_i$.
Thus we may restrict ourselves to just this one data set x and consider the statistics concerning this one set as representing the multi-parametric data sets $x^{(i)}$. Similarly, we may
can see that the eigenvalue corresponding to an eigenvector of the covariance matrix yields the variance of (the size of) the projection of the data onto this direction (the projection takes place in the p-dimensional space). The goal of this method is to find the direction (in the p-dimensional space of the measured characteristics) for which the variance of the projections is as great as possible. Thus, this direction corresponds to the eigenvector of the covariance matrix whose eigenvalue is the greatest one. The linear combination given by the components of this vector is called the first principal component. The size of the projection onto this direction estimates the data quite well (the principal component can be viewed as a characteristic which substitutes for the p characteristics, i. e., it is a random vector with n components). If we subtract this projection from the data and consider the direction of the greatest variance again, we get the second principal component. Repeating this procedure further, we obtain the other principal components. The directions of the principal components correspond to the eigenvectors of the covariance matrix in decreasing order with respect to the size of the corresponding eigenvalues.
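In practice, the whole procedure is a few lines of linear algebra. A Python sketch with numpy (our own code; it uses the data of exercise 10.B.2 below):

import numpy as np

X = np.array([[9, 7.5, 186],
              [11, 8, 187],
              [8, 6, 173],
              [8, 6, 174],
              [8, 6.5, 167]], dtype=float)

m = X.mean(axis=0)                       # vector of means
S = (X - m).T @ (X - m) / (len(X) - 1)   # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # eigh, since S is symmetric
v = eigvecs[:, np.argmax(eigvals)]       # direction of the largest variance
if v[-1] < 0:
    v = -v                               # fix the sign convention
print(np.round(v, 3))      # approximately (0.122, 0.088, 0.989)
print(np.round(X @ v, 1))  # first principal component, close to the heights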
10.B.2. Find the first principal component of the following simple data and the vector which substitutes for them: Five people had their height, little finger length, and index finger length measured. The collected data are shown in the following table (in centimeters).
Solution.
          Martin | Michael | Matthew | John | Peggy
index f.  9      | 11      | 8       | 8    | 8
little f. 7.5    | 8       | 6       | 6    | 6.5
height    186    | 187     | 173     | 174  | 167
The vectors of the collected data are: $x_1 = (9; 7.5; 186)$, $x_2 = (11; 8; 187)$, $x_3 = (8; 6; 173)$, $x_4 = (8; 6; 174)$, $x_5 = (8; 6.5; 167)$. The covariance matrices of these vectors are:
$\begin{pmatrix} 0.04 & 0.14 & 1.72\\ 0.14 & 0.49 & 6.02\\ 1.72 & 6.02 & 73.96\end{pmatrix},\quad \begin{pmatrix} 4.84 & 2.64 & 21.12\\ 2.64 & 1.44 & 11.52\\ 21.12 & 11.52 & 92.16\end{pmatrix},\quad \begin{pmatrix} 0.64 & 0.64 & 3.52\\ 0.64 & 0.64 & 3.52\\ 3.52 & 3.52 & 19.36\end{pmatrix},$
$\begin{pmatrix} 0.64 & 0.64 & 2.72\\ 0.64 & 0.64 & 2.72\\ 2.72 & 2.72 & 11.56\end{pmatrix},\quad \begin{pmatrix} 0.64 & 0.24 & 8.32\\ 0.24 & 0.09 & 3.12\\ 8.32 & 3.12 & 108.16\end{pmatrix}.$
also use several biggest eigenvalues instead of one and reduce the dimension of our parameter space in this way. Finally, considering the unit length eigenvector $(\alpha_1, \dots, \alpha_k)$ corresponding to the chosen eigenvalue λ, the values $\alpha_j$ provide the right coefficients in the orthogonal projection $(x^{(1)}, \dots, x^{(k)}) \mapsto x = \alpha_1 x^{(1)} + \dots + \alpha_k x^{(k)}$.
See the exercise 10.B.2 for an illustration, together with another description of how to proceed with the data in 10.B.1.
The latter approach is called the principal component analysis.
10.1.11. Entropy. We also need to describe the variability of data sets even with nominal types, for instance in statistical physics or information theory. The only thing at our disposal is the class frequencies, so the principle of classical probability can be used (see the fourth part of chapter one). There, the relative frequency of the i-th class, $p_i = \frac{n_i}{n}$, is understood to be the probability that a random object belongs to this class.
The variance of ratio-type values with class frequencies $n_j$ was given by the formula (see 10.1.6)
$s_x^2 = \sum_{j=1}^m p_j (a_j - \bar{x})^2,$
where $p_j = \frac{n_j}{n}$ denotes the (classical) probability that the value is in the j-th class. Therefore, it is a weighted mean of the adjusted values where the weight of the term $(a_j - \bar{x})^2$ is $p_j$.
The variability of nominal values is expressed similarly (denote it by $H_x$). Even though there are no numerical values $a_j$ for the indices j, we can be interested in functions F that depend on the relative frequencies $p_j$. For a data set x, we can define
$H_x = \sum_j p_j F(p_j),$
where F is an unknown function with some reasonable properties.
If the data set has only one value, i.e., $p_k = 1$ for some k and otherwise $p_j = 0$, then we agree that the variability is zero, and so $F(1) = 0$.
Moreover, Hx is required to have the following property: If a data set Z consists of pairs of values from data sets X and Y (for example, one can observe eye colour and hair colour of people - statistical units), it is reasonable that the variability of Z be the sum of the variabilities, that is, Hz = Hx + Hy.
The relative class frequencies $p_i$ for the values of the data set X and $q_j$ for those of Y are known. The relative class frequencies for Z are then
$\frac{n_i m_j}{nm} = p_i q_j,$
so we demand the equality (the ranges of the sums are clear from the context)
$\sum_{i,j} p_i q_j F(p_i q_j) = \sum_i p_i F(p_i) + \sum_j q_j F(q_j).$
The sample covariance matrix is then a quarter of their sum, i.e.,
$S = \begin{pmatrix} 1.70 & 1.075 & 9.35\\ 1.075 & 0.825 & 6.725\\ 9.35 & 6.725 & 76.30\end{pmatrix}.$
The eigenvalues of S are approximately 78.1, 0.7, and 0.1. The unit eigenvector corresponding to the greatest one is approximately (0.122; 0.09; 0.989). Thus, the first principal component is (185.5; 186.8; 172.4; 173.4; 166.5), which is not far from the people's heights. □
10.B.3. The students of a class had the following marks in various subjects (see the table below). Find the first principal component of the data and the vector which substitutes for them.
Solution. The vectors of observation are $x_1 = (1, 1, 2, 2, 1)$, ..., $x_{10} = (2, 3, 1, 2, 1)$. The corresponding covariance matrices are:
$\begin{pmatrix} 1.21 & 1.10 & -0.33 & -0.33 & 0.11\\ 1.10 & 1.00 & -0.30 & -0.30 & 0.10\\ -0.33 & -0.30 & 0.09 & 0.09 & -0.03\\ -0.33 & -0.30 & 0.09 & 0.09 & -0.03\\ 0.11 & 0.10 & -0.03 & -0.03 & 0.01\end{pmatrix},$
$\begin{pmatrix} 0.01 & -0.10 & 0.07 & -0.03 & 0.01\\ -0.10 & 1.00 & -0.70 & 0.30 & -0.10\\ 0.07 & -0.70 & 0.49 & -0.21 & 0.07\\ -0.03 & 0.30 & -0.21 & 0.09 & -0.03\\ 0.01 & -0.10 & 0.07 & -0.03 & 0.01\end{pmatrix}.$
The sample covariance matrix is
$S = \begin{pmatrix} 0.99 & 0.44 & -0.078 & 0.26 & -0.01\\ 0.44 & 0.89 & -0.22 & 0.22 & -0.11\\ -0.078 & -0.22 & 0.45 & 0.23 & 0.033\\ 0.26 & 0.22 & 0.23 & 0.45 & -0.078\\ -0.01 & -0.11 & 0.033 & -0.078 & 0.10\end{pmatrix}.$
Its dominant eigenvalue is about 1.5, and the corresponding unit eigenvector is approximately (0.70; 0.65; -0.13; 0.28; -0.07). Therefore, the principal
Student id | Maths | Physics | History | English | PE
1          | 1     | 1       | 2       | 2       | 1
2          | 1     | 3       | 1       | 1       | 1
3          | 2     | 1       | 1       | 1       | 1
4          | 2     | 2       | 2       | 2       | 1
5          | 1     | 1       | 3       | 2       | 1
6          | 2     | 1       | 2       | 1       | 2
7          | 3     | 3       | 2       | 2       | 1
8          | 3     | 2       | 1       | 1       | 1
9          | 4     | 3       | 2       | 3       | 1
10         | 2     | 3       | 1       | 2       | 1
Since $p_i$ and $q_j$ are relative frequencies, they sum to 1. So the right-hand side of the equality can be rewritten, leading to
$\sum_{i,j} p_i q_j F(p_i q_j) = \sum_{i,j} p_i q_j \bigl(F(p_i) + F(q_j)\bigr).$
This is satisfied by any constant multiple of a logarithm of any fixed base a > 1. It can be shown that no other continuous solution F exists.
Since $p_i < 1$, we have $\log_a p_i < 0$. The variability must be non-negative, so F is chosen to be a logarithmic function multiplied by -1. Such a choice also satisfies $F(1) = 0$, as desired.
Entropy
The measure of variability of nominal values is expressed in terms of entropy. It is given by
$H_x = -\sum_{i=1}^k \frac{n_i}{n}\,\ln\left(\frac{n_i}{n}\right),$
where k is the number of sample classes. Sometimes (especially in information theory), the binary logarithm is used instead of the natural logarithm.
One often works with the quantity
$2^{H_x}$
(or with another logarithm base).
In this form, for a data set X with k equal class frequencies, compute
$2^{H_x} = 2^{\log_2 k} = k,$
which is independent of the sample size. The next illustration shows the 2-based entropy y for the numbers of occurrences of letters a, b in 10-letter words consisting of these characters, where x is the number of occurrences of b.
Note that the maximum entropy 1 occurs for the same number of a's and b's, and indeed $2^1 = 2$ as computed above.
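A Python sketch of these computations (our own code, with the binary logarithm):

from collections import Counter
from math import log2

def entropy(s):
    # H = - sum p_i * log2(p_i) over the relative letter frequencies p_i
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in Counter(s).values())

print(entropy("ababababab"))          # 1.0: equally many a's and b's
print(entropy("aaaaaaaaab"))          # about 0.47, far below the maximum
print(2 ** entropy("abcdabcdabcd"))   # 4.0: k equal classes give 2**H = k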
The following illustration displays the entropy of 11 randomly chosen strings of length 10 made of 8 characters. The values are all much less than the theoretical maximal value of 3. This reflects the fact that the numbers of occurrences of the individual 8 characters cannot all be equal (or it could happen only with a very small probability if the length of the string were a multiple of 8). If the same is done with, say, strings of length 10000, we would get very close to 3 (typically the difference would be of the order of $10^{-3}$, if the random string generator was good enough).

component is (1.58; 2.73; 2.13; 2.93; 1.45; 1.93; 4.28; 3.48; 5.26; 3.71).
□
Another possible method of visualization of multidimensional data is the so-called cluster analysis, but we will not go into further details here. ♦
C. Classical and conditional probability
In the first chapter, we met the so-called classical probability, see 1.4.1. Just to recall it, let us try to solve the following (a bit more complicated) problem:
10.C.1. Ales wants to buy a new bike, which costs 5100 crowns. He has 2500 crowns left from organizing a camp. Ales is no dope: he took 50 more crowns from his pocket money and went to the casino to play the roulette. Ales always bets on red. This means that the probability of winning is 18/37 and the amount he wins is equal to the amount he has bet. His betting strategy is as follows: The first time, he bets 10 crowns. Each time he has lost, he bets twice the previous bet (if he does not have enough money to make this bet, he leaves the casino, deeply depressed). Each time he has won, he bets 10 crowns again. What is the probability that, using this strategy, he wins the desired 2550 more crowns? (As soon as this happens, he immediately runs to buy the bike.)
Solution. First of all, we calculate how many times Ales can lose in a row. If he bets 10 crowns the first time, then in order to bet n times, he needs
$10 + 20 + \dots + 10\cdot 2^{n-1} = 10\,\frac{2^n - 1}{2 - 1} = 10\,(2^n - 1)$
crowns. As we can see, the number 2550 is of the form $10(2^n - 1)$ for n = 8. This means that Ales can bet eight times in a row regardless of the odds. He can never bet nine times in a row, because for that he would have to have $10(2^9 - 1) = 5110$ crowns, which he will never reach (he stops betting as soon as he has 5100 crowns). Therefore, Ales loses the whole game if and only if he loses eight consecutive bets. The probability of losing one bet is 19/37; hence, the probability of losing eight consecutive (independent) bets is $(19/37)^8$. Thus, the probability that he wins 10 crowns (using his strategy) is $1 - (19/37)^8$. In order to win 2550 crowns, he must win 255 times, and the probability of this is
$\left(1 - \left(\tfrac{19}{37}\right)^8\right)^{255} \approx 0.29.$
2. Probability
Before further reading, the reader is advised to go through the fourth part of chapter one (the subsection beginning on page 18). Back then, we worked mainly with classical finite probability. We defined the basics of a formalism which we extend now. The main extension is that the sample space Ω can be infinite, even uncountable. Recall that when we talked about geometric probability at the end of the fourth part of chapter one, the sample space for the description of an event was a part of the Euclidean space, and events were suitable subsets of it. All of those sets were uncountable.
Begin with a simple (infinite, yet still discrete) example, to which we return from time to time throughout this section.
10.2.1. Why infinite sets of events? Imagine an experiment where a coin is repeatedly tossed until it comes up heads. There are many questions to be asked about this experiment: What is the probability of tossing the coin at least 3 times? (or exactly k times, or at most 10 times, etc.) The outcomes of this experiment can be considered in the form $k \in \mathbb{N}_{\ge 1} \cup \{\infty\}$, which could be read as "the coin comes up heads for the first time in the k-th toss". Note that $k = \infty$ is included, since the possibility that the coin always comes up tails must be allowed, too.
This problem is solved if the classical probability 1/2 of the coin coming up heads in one toss is used (and the same for tails). In the abstract model, the total number of tosses cannot be bounded by any natural number N. On the other hand, the probability that the coin comes up tails in the first (k - 1) tosses and heads in the k-th one, out of the total number of $n \ge k$ tosses, is given by the fraction
$\frac{2^{n-k}}{2^n} = 2^{-k},$
where in the numerator, there is the number of favorable possibilities out of n independent tosses (i.e., the number of possibilities how to distribute the two values into the n - k remaining positions), while in the denominator, there is the number of
Therefore, the probability of winning using his strategy is much lower than if he bet everything on red straightaway.
□
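The result can also be checked by simulation; a Monte Carlo sketch in Python (our own code, implementing the rules stated in 10.C.1):

import random

def reaches_goal(capital=2550, goal=5100, p_red=18/37):
    bet = 10
    while 0 < capital < goal:
        if bet > capital:        # cannot make the doubled bet: he leaves
            return False
        if random.random() < p_red:
            capital += bet       # won: start over with a bet of 10 crowns
            bet = 10
        else:
            capital -= bet       # lost: double the bet
            bet *= 2
    return capital >= goal

trials = 100_000
wins = sum(reaches_goal() for _ in range(trials))
print(wins / trials, (1 - (19/37)**8)**255)   # both approximately 0.29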
10.C.2. You could try to solve a slight modification of the above problem: he stops playing only if he loses all his money; if he still has some money, but not enough to bet twice the previous bet, he bets 10 crowns again.
We also met the conditional probability in the first chapter, see 1.4.8.
10.C.3. Let A, B be two events such that B is a disjoint union of events $B_1, B_2, \dots, B_n$. Using the definition of conditional probability (see 10.2.6), prove that
(1) $\qquad P(A|B) = \sum_{i=1}^n P(A|B_i)\,P(B_i|B).$
Solution. First, note that the events $A\cap B_1, A\cap B_2, \dots, A\cap B_n$ are also disjoint. Therefore, we can write
$P(A|B_1\cup\dots\cup B_n) = \frac{P(A\cap(B_1\cup\dots\cup B_n))}{P(B_1\cup\dots\cup B_n)} = \frac{P\bigl((A\cap B_1)\cup(A\cap B_2)\cup\dots\cup(A\cap B_n)\bigr)}{P(B)} = \frac{\sum_{i=1}^n P(A\cap B_i)}{P(B)} = \sum_{i=1}^n \frac{P(A\cap B_i)}{P(B_i)}\cdot\frac{P(B_i)}{P(B)} = \sum_{i=1}^n P(A|B_i)\,P(B_i|B).$
□
10.C.4. We have four bags with balls: In the first bag, there are four white balls. In the second bag, there are three white balls and one black ball. In the third bag, there are two white and two black balls. Finally, in the fourth bag, there are four black balls. We randomly pick a bag and take two balls out of it (without putting the first one back). Find the probability that
a) the balls are of different colors;
b) the second ball is white provided the first ball was white.
Solution. Since there is the same number of balls in each of the bags, any ball has the same probability of being taken (similarly for any pair of balls lying in the same bag). Therefore, we can solve this problem using classical probability.
a) Altogether, there are 24 pairs of balls that can be taken. Out of them, 7 consist of balls of different colors. Therefore, the wanted probability is 7/24.
all possible outcomes. As expected, this probability is independent of the chosen n, and $\sum_{k=1}^\infty 2^{-k} = 1$. Therefore, the probability of tossing only tails is zero.
Thus we can define probability on the sample space Ω with sample points (outcomes) $\omega_k$, whose probability is $2^{-k}$. This leads to a probability according to the definitions below.
We return to this example throughout this section.
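The model is easy to explore numerically. A Python sketch (our own code), comparing the empirical frequencies of the outcomes with the values $2^{-k}$:

import random

def first_heads():
    k = 1
    while random.random() >= 0.5:   # tails: toss again
        k += 1
    return k

trials = 100_000
counts = {}
for _ in range(trials):
    k = first_heads()
    counts[k] = counts.get(k, 0) + 1
for k in range(1, 6):
    print(k, counts.get(k, 0) / trials, 2.0 ** (-k))  # empirical vs. 2^(-k)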
10.2.2. σ-algebras. Work with a fixed non-empty set Ω, which contains the possible outcomes of the experiment and which is called the sample space. The possible outcomes $\omega \in \Omega$ are also called sample points. In probability models, not all subsets of outcomes need be admitted. In particular, the singletons $\{\omega\}$ need not be considered. Those subsets whose probability we want to measure are required to satisfy the axioms of the so-called σ-algebras.
The axioms listed below are chosen from a larger collection of natural requirements in a minimal form. The first one is based on the assumption that the universal event should be a measurable set. The second one is forced by the assumption that events can be negated. The third one reflects the necessity to examine the event of the occurrence of at least one event from a countably infinite collection. (For instance, in the example from the previous subsection, the coin is tossed only finitely many times, but there is no upper bound on the number of tosses.).
σ-algebras of subsets
A collection $\mathcal{A}$ of subsets of the sample space is called a σ-algebra or σ-field, and its elements are called events or measurable sets, if and only if
• $\Omega \in \mathcal{A}$, i.e., the sample space is an event;
• if $A, B \in \mathcal{A}$, then $A \setminus B \in \mathcal{A}$, i.e., the set difference of two events is also an event;
• if $A_i \in \mathcal{A}$, $i \in I$, is a countable collection of events, then their union is also an event, i.e., $\bigcup_{i\in I} A_i \in \mathcal{A}$.
As usual, the basic axioms imply simple corollaries which describe further (intuitively required) properties in the form of mathematical theorems. The reader should check carefully that both following properties hold.
• The complement $A^c = \Omega \setminus A$ of an event A is again an event.
• The intersection of two events is again an event, since for any two subsets $A, B \subseteq \Omega$,
$A \setminus (\Omega \setminus B) = A \cap B.$
Actually, for any countable system of events $A_i$, $i \in I$, the event
$\Omega \setminus \bigcup_{i\in I} A_i^c = \bigcap_{i\in I} A_i$
is also in the σ-algebra $\mathcal{A}$.
Altogether, a σ-algebra is a collection of subsets of the sample space which is closed with respect to set differences, countable unions, and countable intersections.
b) Let A denote the event that the first ball is white and B denote the event that the second ball is white. Then, $P(B\cap A)$ is the probability that both balls are white, and this is equal to $\frac{10}{24} = \frac{5}{12}$ since there are 10 such pairs. Again, we can use classical probability to calculate P(A): there are 16 balls in total, and 9 of them are white. Altogether, we have
$P(B|A) = \frac{P(B\cap A)}{P(A)} = \frac{\frac{5}{12}}{\frac{9}{16}} = \frac{20}{27}.$
Another solution. The event A can be viewed as the union of three mutually exclusive events $A_1, A_2, A_3$ that we took a white ball from the first, second, and third bag, respectively. Since there is the same number of balls in each of the bags, the probability of taking any (white) ball is also the same (independent of which ball it is), so we get $P(A) = \frac{9}{16}$ and $P(A_1|A) = \frac{4}{9}$, $P(A_2|A) = \frac{3}{9} = \frac{1}{3}$,
10.2.3. Probability space. Now introduce probability in the mathematical model, recalling the concepts used already in the first chapter.
Elementary concepts
Use the following terminology in connection with events:
• the entire sample space Ω is called the universal event; the empty set $\emptyset \in \mathcal{A}$ is called the null event;
• the singletons $\{\omega\}$, $\omega \in \Omega$, are called elementary events (note that $\{\omega\}$ may not even be an event in $\mathcal{A}$);
• the intersection of events $\bigcap_{i\in I} A_i$ corresponds to the simultaneous occurrence of all the events $A_i$, $i \in I$;
• the union of events $\bigcup_{i\in I} A_i$ corresponds to the occurrence of at least one of the events $A_i$, $i \in I$;
• if $A \cap B = \emptyset$, then $A, B \in \mathcal{A}$ are called exclusive events or disjoint events;
• if $A \subseteq B$, then the event A implies the event B;
• if $A \in \mathcal{A}$, then the event $B = \Omega \setminus A$ is called the complementary event to A and denoted $B = A^c$.
$P(A_3|A) = \frac{2}{9}$. Applying 10.C.3(1), we obtain
$P(B|A) = P(B|A_1)\,P(A_1|A) + P(B|A_2)\,P(A_2|A) + P(B|A_3)\,P(A_3|A) = 1\cdot\frac{4}{9} + \frac{2}{3}\cdot\frac{1}{3} + \frac{1}{3}\cdot\frac{2}{9} = \frac{20}{27}.$
□

10.C.5. We have four bags with balls: In the first bag, there are four white balls. In the second bag, there are three white balls and one black ball. In the third bag, there are two white and two black balls. Finally, in the fourth bag, there are one white and three black balls. We randomly pick a bag and take a ball out of it, finding out that it is black. Then we throw away this bag, pick another one and take a ball out of it. What is the probability that it is white?

Solution. Similarly as in the above exercise, let A denote the event that the very first ball is black. This event can be viewed as the union of mutually exclusive events $A_i$, $i = 2, 3, 4$, where $A_i$ is the event of picking the i-th bag and taking a black ball from there. Again, the probability of picking any (black) ball is the same. Hence, $P(A_2|A) = \frac{1}{6}$, $P(A_3|A) = \frac{2}{6} = \frac{1}{3}$, and $P(A_4|A) = \frac{3}{6} = \frac{1}{2}$. Let B denote the event that the second ball is white. If the thrown bag is the second one, then there are a total of 7 white balls remaining, so the probability of taking one of them is $P(B|A_2) = \frac{7}{12}$ (we can use classical probability again because each of the bags contains the same number of balls, so any ball has the same probability of being taken). Similarly, $P(B|A_3) = \frac{8}{12}$

We have seen an example of probability defined on an infinite sample space in 10.2.1 above. In general, probability is defined as follows:

Probability
Definition. A probability space is the σ-algebra $\mathcal{A}$ of subsets of the sample space Ω on which there is a scalar function $P : \mathcal{A} \to \mathbb{R}$ with the following properties:
• P is non-negative, i.e., $P(A) \ge 0$ for all events A;
• P is countably additive, i.e.,
$P\left(\bigcup_{i\in I} A_i\right) = \sum_{i\in I} P(A_i)$
for every countable collection of mutually exclusive events;
• the probability of the universal event is 1.
The function P is called the probability function on $(\Omega, \mathcal{A})$.
Immediately from the definition, the complementary event satisfies
$P(A^c) = 1 - P(A).$
In chapter one, theorems on addition of probabilities were derived. Although dealing with finite sample spaces, the arguments remain the same now. In particular, the inclusion and exclusion principle says for any finite collection of k events Ai that
$P\left(\bigcup_{i=1}^k A_i\right) = \sum_{i=1}^k P(A_i) - \sum_{i=1}^{k-1}\sum_{j=i+1}^k P(A_i\cap A_j) + \sum_{i=1}^{k-2}\sum_{j=i+1}^{k-1}\sum_{\ell=j+1}^k P(A_i\cap A_j\cap A_\ell) - \dots + (-1)^{k-1}\,P(A_1\cap A_2\cap\dots\cap A_k).$
and $P(B|A_4) = \frac{9}{12}$. Applying 10.C.3(1), we get that the wanted probability is
$P(B|A) = P(B|A_2)\,P(A_2|A) + P(B|A_3)\,P(A_3|A) + P(B|A_4)\,P(A_4|A) = \frac{7}{12}\cdot\frac{1}{6} + \frac{8}{12}\cdot\frac{1}{3} + \frac{9}{12}\cdot\frac{1}{2} = \frac{25}{36}.$
□
10.C.6. We have four bags with balls: In the first bag, there are a white ball and a black ball. In the second bag, there are three white balls and one black ball. In the third bag, there are one white and two black balls. Finally, in the fourth bag, there are one white and three black balls. We randomly pick a bag and take a ball out of it, finding out that it is white. Then we throw away this bag, pick another one and take a ball out of it. What is the probability that it is white?
Solution. Similarly as in the above exercise, we view the event A of the first ball being white as the union of four mutually exclusive events $A_1, A_2, A_3$, and $A_4$ that we take a white ball from the first, second, third, and fourth bag, respectively. The probability of taking a white ball out of the first bag is $P(A_1) = \frac{1}{4}\cdot\frac{1}{2}$ (the probability of $A_1$ is the product of the probability that we pick the first bag and the probability that we take a white ball from there); similarly, $P(A_2) = \frac{1}{4}\cdot\frac{3}{4}$, $P(A_3) = \frac{1}{4}\cdot\frac{1}{3}$, and $P(A_4) = \frac{1}{4}\cdot\frac{1}{4}$. Hence,
$P(A) = P(A_1) + P(A_2) + P(A_3) + P(A_4) = \frac{11}{24}.$
Note that the probability P(A) cannot be calculated classically, i.e., by simply dividing the number of white balls by the total number of the balls, because, for instance, the probability of taking a white ball from the first bag is twice as great as from the fourth bag. As for the conditional probabilities, we have $P(A_1|A) = P(A_1)/P(A) = \frac{3}{11}$, $P(A_2|A) = \frac{9}{22}$, $P(A_3|A) = \frac{2}{11}$, $P(A_4|A) = \frac{3}{22}$. Now, let B denote the event that we take another white ball after we have thrown away the first bag. We want to apply 10.C.3(1) again. It remains to compute $P(B|A_i)$, $i = 1, \dots, 4$. The probability $P(B|A_1)$ can be computed as the sum of the probabilities of the mutually exclusive events $B_2, B_3, B_4$ (given $A_1$) that the second white ball comes from the second, third, fourth bag, respectively. Altogether, we have
$P(B|A_1) = P(B_2|A_1) + P(B_3|A_1) + P(B_4|A_1) = \frac{1}{3}\cdot\frac{3}{4} + \frac{1}{3}\cdot\frac{1}{3} + \frac{1}{3}\cdot\frac{1}{4} = \frac{4}{9}.$
Similarly,
The reader should look back at 1.4.5 and think about the details.
10.2.4. Independent events. The definition of stochastically independent events also remains unchanged. It reflects the intuition that the probability of the simultaneous occurrence of independent events is equal to the product of the particular probabilities.
Stochastic independence
Events A, B are said to be stochastically independent if and only if
$P(A \cap B) = P(A)\,P(B).$
Of course, the universal event and the null event are stochastically independent of any event.
Recall that replacing an event $A_i$ with the complementary event $A_i^c$ in a collection of stochastically independent events $A_1, A_2, \dots$ again results in a collection of stochastically independent events, and (see 1.4.7, page 23)
$P(A_1 \cup \dots \cup A_k) = 1 - P(A_1^c \cap \dots \cap A_k^c) = 1 - (1 - P(A_1))\cdots(1 - P(A_k)).$
Classical finite probability remains the fundamental example of probability, used as the inspiration during the creation of the mathematical model. Recall that in this case, Ω is a finite set, the σ-algebra $\mathcal{A}$ is the collection of all subsets of Ω, and the classical probability is the probability space $(\Omega, \mathcal{A}, P)$ with the probability function $P : \mathcal{A} \to \mathbb{R}$,
$P(A) = \frac{|A|}{|\Omega|}.$
This corresponds precisely to the intuition about the relative frequency $p_A$ of an event A when drawing a random element from the sample set Ω.
This definition of probability guarantees reasonable behaviour of monotone sequences of events:
10.2.5. Theorem. Consider a probability space $(\Omega, \mathcal{A}, P)$ and a non-decreasing sequence of events $A_1 \subseteq A_2 \subseteq \dots$. Then,
$P\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{i\to\infty} P(A_i).$
Similarly, if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \dots$, then
$P\left(\bigcap_{i=1}^\infty A_i\right) = \lim_{i\to\infty} P(A_i).$
$P(B|A_2) = \frac{1}{3}\left(\frac{1}{2} + \frac{1}{3} + \frac{1}{4}\right) = \frac{13}{36},\qquad P(B|A_3) = \frac{1}{3}\left(\frac{1}{2} + \frac{3}{4} + \frac{1}{4}\right) = \frac{1}{2},$
and
Proof. The considered union $A = \bigcup_{i=1}^\infty A_i$ can be written in terms of mutually exclusive events
$\tilde{A}_i = A_i \setminus A_{i-1},$
defined for all $i = 2, 3, \dots$. Set $\tilde{A}_1 = A_1$. Then,
$P(A) = P\left(\bigcup_{i=1}^\infty \tilde{A}_i\right) = \sum_{i=1}^\infty P(\tilde{A}_i) = \lim_{k\to\infty}\sum_{i=1}^k P(\tilde{A}_i).$
$P(B|A_4) = \frac{1}{3}\left(\frac{1}{2} + \frac{3}{4} + \frac{1}{3}\right) = \frac{19}{36}.$ Altogether, we get
For the finite sums,
$\sum_{i=1}^k P(\tilde{A}_i) = P(A_1) + \sum_{i=2}^k \bigl(P(A_i) - P(A_{i-1})\bigr) = P(A_k).$
$P(B|A) = P(B|A_1)\,P(A_1|A) + P(B|A_2)\,P(A_2|A) + P(B|A_3)\,P(A_3|A) + P(B|A_4)\,P(A_4|A) = \frac{4}{9}\cdot\frac{3}{11} + \frac{13}{36}\cdot\frac{9}{22} + \frac{1}{2}\cdot\frac{2}{11} + \frac{19}{36}\cdot\frac{3}{22} = \frac{19}{44}.$
□
Hence $P(A) = \lim_{k\to\infty} P(A_k)$, which proves the first part of the theorem.
In the second part, consider the complements $B_i = A_i^c$ instead of the events $A_i$. They satisfy the assumptions of the first part of this theorem. Then, the complement of the considered intersection is
10.C.7. Two shooters shoot at a target, each makes two shots. Their respective accuracies are 80 % and 60 %. We have found two hits in the target. What is the probability that they belong to the first shooter?
Solution. The probability of hitting the target is 4/5 for the first shooter, and 3/5 for the second one. Consider the events: A... there are two hits in the target, both of the first shooter,
B ... there are two hits in the target.
Our task is to find P(A|B). We can divide the event B into six disjoint events according to which shot(s) of each shooter was/were successful. We enumerate the events in a table and, for each of them, we compute its probability. This is easy, as each of the events is the intersection of four independent events (the results of the four shots). A hit is denoted by 1, a miss by 0.
        Shooter 1 | Shooter 2 | probability
$B_1$   0 1       | 0 1       | $\frac{1}{5}\cdot\frac{4}{5}\cdot\frac{2}{5}\cdot\frac{3}{5} = \frac{24}{625}$
$B_2$   0 1       | 1 0       | $\frac{24}{625}$
$B_3$   1 0       | 1 0       | $\frac{24}{625}$
$B_4$   1 0       | 0 1       | $\frac{24}{625}$
$B_5$   1 1       | 0 0       | $\frac{4}{5}\cdot\frac{4}{5}\cdot\frac{2}{5}\cdot\frac{2}{5} = \frac{64}{625}$
$B_6$   0 0       | 1 1       | $\frac{1}{5}\cdot\frac{1}{5}\cdot\frac{3}{5}\cdot\frac{3}{5} = \frac{9}{625}$
Adding up the probabilities of these disjoint events, we get:
$P(B) = \sum_{i=1}^6 P(B_i) = \frac{169}{625}.$
Now, we can compute the conditional probability, using the formula of subsection 10.2.6:
$P(A|B) = \frac{P(A\cap B)}{P(B)} = \frac{P(B_5)}{P(B)} = \frac{64}{169} \approx 0.38.$
□
$B = A^c = \left(\bigcap_{i=1}^\infty A_i\right)^c = \bigcup_{i=1}^\infty B_i.$
The desired statement follows from the fact that
$P(A) = 1 - P(B) = 1 - \lim_{i\to\infty} P(B_i) = \lim_{i\to\infty}\bigl(1 - P(B_i)\bigr) = \lim_{i\to\infty} P(A_i),$
which completes the proof. □
10.2.6. Conditional probability. Consider the following problem: On average, 40% of students succeed in course X and 80% of students succeed in course Y. A random student enrolled in both these courses says that he has passed one of them (but we do not catch which one). What is the probability that he meant course X?
As mentioned in subsection 1.4.8 (page 24), such problems can be formalized in the way described below. (We shall come back to the solution of the latter problem in 10.3.12.)
Conditional probability
Definition. Let H be an event with non-zero probability in the σ-algebra $\mathcal{A}$ of a probability space $(\Omega, \mathcal{A}, P)$. The conditional probability $P(A|H)$ of an event $A \in \mathcal{A}$ with respect to the hypothesis H is defined as
$P(A|H) = \frac{P(A\cap H)}{P(H)}.$
The definition corresponds to the intuition from classical probability that the probability of the event A occurring, provided the event H has occurred, is $P(A\cap H)/P(H)$.
Directly from the definition, the hypothesis H and the event A are independent if and only if P(A) = P(A\H).
At first sight, it may seem that introducing conditional probability does not add anything new. Actually, it is a very important type of approach which is needed in statistics as well. The hypothesis can play the role of the prior probability (i.e., the prior belief assumed beforehand), and the resulting probability is said to be posterior (i.e., it is considered to be a consequence of the assumption). This is the core of the Bayesian approach to statistics, as is seen later.
The definition also implies the following result.
10.C.8. We toss a coin. If it comes up heads, we put a white ball into an (initially empty) bag; otherwise we put a black ball there. This is repeated n times. Then, we take a ball randomly from the bag (without replacement). Suppose it is white. What is the probability that another ball we take randomly from the bag is black?
Solution. We will solve the problem for a general (possibly biased) coin. In particular, we assume that the individual tosses are independent and that there exists a fixed probability of the coin coming up heads, which we denote p. The event "a ball in the bag is white" corresponds to the event "the coin came up heads in the corresponding toss". Since the first ball was white, we deduce that p > 0. We can also see that the probability space "taking a random ball from the bag" is isomorphic to the probability space "tossing a coin". Since we assume that the individual tosses are independent, we also get the independence of the colors of the selected balls. This leads to the conclusion that the probability in question is 1 - p.
Is this reasoning correct? Do we not expect the probability of taking a black ball to be greater than 1 - p? See, there were approximately np white and n(1 - p) black balls in the bag, so if we had removed one white ball, the probability of selecting a black one should increase, shouldn't it? Before reading further, try to figure out which (if any) of these two presented reasonings is correct, and whether the probability also depends on n (the number of balls in the bag before any were removed).
Now, we select a more sophisticated approach to the problem. Let $B_i$ denote the event "there were i white balls in the bag" (before any were removed), $i \in \{0, 1, 2, \dots, n\}$. Further, let A denote the event "the first ball is white" and C denote the event "the second ball is black". Actually, the event $B_i$ says that the coin came up heads i times out of n; hence, its probability is
$P(B_i) = \binom{n}{i}\,p^i (1-p)^{n-i}.$
The conditional probability of taking a white ball provided there are exactly i white balls in the bag is equal to
$P(A|B_i) = \frac{i}{n}.$
We are interested in the probability of C, knowing that A has occurred, i.e., we want to know $P(C|A)$. Since the events $B_i$ are pairwise disjoint, this is also true for the events $C \cap B_i$. Since C can be decomposed as the disjoint union

Lemma. Let an event B be the union of mutually exclusive events $B_1, B_2, \dots, B_n$. Then,
(1) $\qquad P(A|B) = \sum_{i=1}^n P(A|B_i)\,P(B_i|B).$
Proof. The events $A\cap B_1, A\cap B_2, \dots, A\cap B_n$ are also mutually exclusive. Therefore,
$P(A|B) = \frac{P\bigl((A\cap B_1)\cup(A\cap B_2)\cup\dots\cup(A\cap B_n)\bigr)}{P(B)} = \sum_{i=1}^n \frac{P(A\cap B_i)}{P(B_i)}\cdot\frac{P(B_i)}{P(B)} = \sum_{i=1}^n P(A|B_i)\,P(B_i|B).$
□
Consider the special case B = Ω. Then, the events $B_i$ can be considered the "possible states of the universe", $P(A|B_i)$ expresses the probability of A provided the universe is in its i-th state, and $P(B_i|\Omega) = P(B_i)$ is the probability of the universe being in its i-th state. By the above lemma,
$P(A) = P(A|\Omega) = \sum_{i=1}^n P(A|B_i)\,P(B_i).$
This formula is called the law of total probability.
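As a small illustration, a Python sketch (our own code) applying the law of total probability to the bags of exercise 10.C.4:

# the four bags as (white, black) counts; B_i = "the i-th bag is picked"
bags = [(4, 0), (3, 1), (2, 2), (0, 4)]
P_B = 1 / len(bags)                # each bag is picked equally likely

# A = "the first ball drawn is white": P(A) = sum_i P(A|B_i) P(B_i)
P_A = sum((w / (w + b)) * P_B for w, b in bags)
print(P_A)   # 0.5625 = 9/16, as computed in 10.C.4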
10.2.7. Bayes' theorem. Simple rearrangement of the conditional probability formula leads to
$P(A\cap B) = P(B\cap A) = P(A)\,P(B|A) = P(B)\,P(A|B).$
There are two important corollaries:
Bayes' rules
Theorem. The probabilities of events A and B satisfy
(1) $\qquad P(A|B) = \frac{P(A)\,P(B|A)}{P(B)},$
(2) $\qquad P(A|B) = \frac{P(A)\,P(B|A)}{P(A)\,P(B|A) + P(A^c)\,P(B|A^c)}.$
The first proposition is called the inverse probability formula. The second proposition is called the first Bayes' formula.
Proof. The first statement is a mere rearrangement of the formula above the theorem. To obtain the second statement, note that
$P(B) = P(B\cap A) + P(B\cap A^c).$
Applying the law of total probability, $P(B) = P(A)\,P(B|A) + P(A^c)\,P(B|A^c)$ can be substituted into the inverse probability formula, thereby obtaining the second statement of the theorem. □
$\bigcup_{i=0}^n (C\cap B_i)$, we can write
$P(C|A) = \frac{1}{P(A)}\,P\Bigl(\bigcup_{i=0}^n (C\cap B_i)\cap A\Bigr) = \frac{1}{P(A)}\sum_{i=0}^n P(A\cap B_i)\,P(C|A\cap B_i) = \frac{1}{P(A)}\sum_{i=0}^n P(B_i)\,P(A|B_i)\,P(C|A\cap B_i).$
We use the law of total probability and substitute for P(A), which leads to
(1) $\qquad P(C|A) = \frac{\sum_{i=0}^n P(B_i)\,P(A|B_i)\,P(C|A\cap B_i)}{\sum_{i=0}^n P(B_i)\,P(A|B_i)}.$
This formula is sometimes called the second Bayes' formula; it holds in general provided the space Ω is a disjoint union of the events $B_i$.
Since we tossed the coin at least once, we have $n \ge 1$. Now, we can calculate:
$\sum_{i=0}^n P(B_i)\,P(A|B_i) = \sum_{i=0}^n \binom{n}{i}\,p^i(1-p)^{n-i}\cdot\frac{i}{n} = \sum_{i=1}^{n}\frac{(n-1)!}{(i-1)!\,(n-i)!}\,p^i(1-p)^{n-i} = p\sum_{j=0}^{n-1}\binom{n-1}{j}\,p^j(1-p)^{n-1-j} = p\,\bigl(p + (1-p)\bigr)^{n-1} = p,$
and, using $P(C|A\cap B_i) = \frac{n-i}{n-1}$ for $n > 1$,
$\sum_{i=0}^n P(B_i)\,P(A|B_i)\,P(C|A\cap B_i) = \sum_{i=1}^{n}\frac{(n-1)!}{(i-1)!\,(n-i)!}\,p^i(1-p)^{n-i}\cdot\frac{n-i}{n-1}$
Bayes' rule is sometimes formulated in a somewhat more general form, proved similarly as in (2):
Let the sample space Ω be the union of mutually exclusive events $A_1, \dots, A_n$. Then, for any $i \in \{1, \dots, n\}$,
(3) $\qquad P(A_i|B) = \frac{P(B|A_i)\,P(A_i)}{\sum_{k=1}^n P(B|A_k)\,P(A_k)}.$
10.2.8. Example and remarks. Now, the introductory question from 10.2.6 can be dealt with easily. Consider the event A which corresponds to "the student having passed an exam" and the event B which corresponds to "the exam in question concerning course X". Assume that the probabilities of the exam concerning either course are the same, i.e., $P(B) = P(B^c) = 0.5$. While the wanted probability $P(B|A)$ is unclear, the probability $P(A|B) = 0.4$ is given, as well as $P(A|B^c) = 0.8$.
This is a typical application of Bayes' formula 10.2.7(2). There is no need to calculate P(A) at all:
$P(B|A) = \frac{P(B)\,P(A|B)}{P(B)\,P(A|B) + P(B^c)\,P(A|B^c)} = \frac{0.5\cdot 0.4}{0.5\cdot 0.4 + 0.5\cdot 0.8} = \frac{1}{3}.$
$= \sum_{i=1}^{n-1}\frac{(n-2)!}{(i-1)!\,(n-i-1)!}\,p^i(1-p)^{n-i}$
In order to better understand the role of the prior probability hypothesis, here is another example.
Consider a university using entrance exams with the following reliability: 99% of intelligent people pass them, while concerning non-intelligent people, only 0.5% are able to pass. It is desired to find the probability that a random student (accepted applicant) of the university is intelligent.
Thus, let A be the event "a random person is intelligent" and B be the event "the person passed the exams successfully". Using Bayes' formula, the probability that A occurs provided B has occurred can be computed. It is only necessary to supply the general probability p = P(A) that a random applicant is intelligent.
$P(A|B) = \frac{p\cdot 0.99}{p\cdot 0.99 + (1 - p)\cdot 0.005}.$
The following table presents the result for various values of p. The first column corresponds to the case that every other applicant is intelligent, etc.
p      | 0.5  | 0.1  | 0.05 | 0.01 | 0.001 | 0.0001
P(A|B) | 0.99 | 0.96 | 0.91 | 0.67 | 0.17  | 0.02
Therefore, if every other applicant is intelligent, then 99% of the students are intelligent. If only 1% of the population meets an expectation of "intelligence" and the applicants form a good random sample, then only about two thirds of the students are intelligent, etc.
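The table is reproduced directly from formula (2); a Python sketch (our own code and parameter names):

def posterior(p, sens=0.99, false_pos=0.005):
    # P(A|B) for the prior p = P(A), with P(B|A) = sens, P(B|A^c) = false_pos
    return p * sens / (p * sens + (1 - p) * false_pos)

for p in (0.5, 0.1, 0.05, 0.01, 0.001, 0.0001):
    print(p, round(posterior(p), 2))   # 0.99, 0.96, 0.91, 0.67, 0.17, 0.02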
Consider similar tests for the occurrence of a disease, say HIV. Suppose there is a test with the same reliability as the one above, and it is used to test all students present at the university. In this case, assume that the parameter p is close to the one for the entire population (say 1 out of 10000 people is infected, on average), which corresponds to the last column
$= p(1-p)\sum_{i=1}^{n-1}\binom{n-2}{i-1}\,p^{i-1}(1-p)^{n-1-i} = \begin{cases} p(1-p), & n > 1,\\ 0, & n = 1.\end{cases}$
Substituting this into the second Bayes' formula, we obtain the wanted probability
$P(C|A) = \begin{cases} 0, & n = 1,\\ 1 - p, & n > 1.\end{cases}$
Thus, the simple reasoning about the probability spaces being isomorphic led to the correct result. The second reasoning was wrong because it omitted the fact that since the first ball was white, the expected number of white balls in the bag (before removing the first one) was greater than np. The calculation highlights the singular case n = 1. □
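The conclusion can be checked by simulation; a Monte Carlo sketch in Python (our own code, with the illustrative choice n = 10, p = 0.4):

import random

def draw_two(n=10, p=0.4):
    # fill the bag by n independent tosses, then draw two balls at random
    balls = ['W' if random.random() < p else 'B' for _ in range(n)]
    random.shuffle(balls)
    return balls[0], balls[1]

first_white = second_black = 0
for _ in range(200_000):
    a, b = draw_two()
    if a == 'W':
        first_white += 1
        second_black += (b == 'B')
print(second_black / first_white, 1 - 0.4)   # both approximately 0.6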
10.C.9. Once upon a time, there was a quiz where the first prize was a Ferrari 599 GTB Fiorano. The contestant who won the final round was taken into a room with three identical doors. Behind two of them, there were goats, while behind the third one there was the car. In order to win the car, the contestant had to guess the correct door. First, the contestant pointed at one of the three doors. Then, an assistant opened one of the other two doors behind which there was a goat. Now, the contestant is given the option to change his guess. Should he do so?
Solution. Of course, we assume that the contestant wants to win the car. First of all, try to examine your intuition for random events. For example, you can reason as follows: "One of the two remaining doors contains the car, each with the same probability. Therefore, it does not matter which door we choose." Or: "The probability of choosing the correct door at the beginning is 1/3. The shown goat changes nothing, so the probability that the guess is wrong is 2/3. Therefore, we should change the door, thereby winning with probability 2/3."
Apparently, it is wise to change the door only if the probability of the car being behind that door is greater than behind the initially chosen one. We consider the following events: H stands for "the initial guess is correct", A stands for "we have changed the door", and C for "we have won". We are thus interested in the probabilities $P(C|A)$ and $P(C|A^c)$.
First, we choose one of three doors, and the Ferrari is behind one of them, so $P(H) = \frac{1}{3}$.
of the table above. Clearly, the result of the test is catastrophically unreliable. Only about 2% of the students who are tested positive are really infected!
Note that the problem with both tests is the same one. It is clear that real entrance exams require good selectivity and reliability. So the university marketing must ensure that the actual applicants do not provide a good random sample of the population. Perhaps the university should try to discourage "non-intelligent" people from applying and thus secure a sufficiently low number of such applicants. With diseases, even the very rare occurrence of healthy people tested positive can be devastating. If the test is improved so that it is 100% reliable for positive people, it would have almost no impact on the resulting probabilities in the table.
Thus, if a person is tested positive when diagnosing a rare disease, it is necessary to make further tests. Then, the result $P(A\mid B)$ of the first test plays the role of the prior probability $P(A)$ during the second test, etc. This approach allows one to "cumulate the experience".
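This cumulation of experience is easy to simulate. The following minimal sketch iterates Bayes' formula, feeding the posterior of each positive test back in as the new prior; the sensitivity and specificity values of 0.99 are illustrative assumptions, not the exact numbers of the tests discussed above.

    # Repeated Bayesian updating for a diagnostic test; the 0.99 values
    # are illustrative assumptions, not taken from the table above.
    def update(prior, sensitivity, specificity):
        """Posterior probability of infection after one positive test."""
        num = sensitivity * prior
        den = sensitivity * prior + (1 - specificity) * (1 - prior)
        return num / den

    p = 1 / 10000                    # prevalence: 1 out of 10000 infected
    for test in range(1, 4):
        p = update(p, sensitivity=0.99, specificity=0.99)
        print(f"posterior after positive test {test}: {p:.4f}")
    # The first positive test alone leaves the posterior near 1%;
    # further tests are needed, exactly as argued in the text.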
10.2.9. Borel sets. In practice, one is often interested in the probability of events which are expressed by asking whether some numerical quantity falls into a given interval. We illustrate this on the example dealing with the results of students in a given course, measured for instance by the number of points in a written exam (cf. 10.1.1).
On one hand, there are only finitely many students, and there are only finitely many possible results (say, the numbers of points in the written exam can be the integers 0 through 20). On the other hand, imagining the results of the students as an analogy to independent rolls of a regular die is inappropriate. Even if a regular 21-hedron existed (it cannot, see chapter 13), such a model would be rather strange.
Thus it is better to focus on the assessing function $X : \Omega \to \mathbb{R}$ on the sample space $\Omega$ of all students and model the probability that its value falls into a fixed interval when a random student is picked. For instance, if the table transferring points into marks A through F is fixed, the probability that the student obtains an A or a B can be modelled.
In the case of a reasonable course, we should expect that the most probable results are somewhere in the middle of the "interval of success", while the ideal result of the full number of points is not very probable. Similarly, if many values of $X$ lie in the interval of failure, this may be perceived at most universities as a significant failure of the lecturer. This is a typical example of the random variables or random vectors defined below (it depends on whether the result of just one or of several students is chosen randomly).
One way to proceed is to model the behaviour of $X$ as a probability defined for all intervals. This requires the following $\sigma$-algebra:
In this connection, we also talk about the $\sigma$-algebra of Borel-measurable sets on $\mathbb{R}^k$, and then the following definition says that random variables are Borel-measurable functions.
We assume that the event of changing the door is independent of the original guess, hence
$$P(A\mid H) = P(A\mid H^c) = P(A), \qquad P(A^c\mid H) = P(A^c\mid H^c) = P(A^c).$$
If the original guess is correct and it is changed, then we surely lose; while if it is originally wrong and then it is changed, then we surely win. Therefore, we have
$$P(C\mid A\cap H) = 0 = P(C\mid A^c\cap H^c),$$
$$P(C\mid A^c\cap H) = 1 = P(C\mid A\cap H^c).$$
It follows from the second Bayes' formula (1) that
$$P(C\mid A) = \frac{P(H)\,P(A\mid H)\,P(C\mid A\cap H) + P(H^c)\,P(A\mid H^c)\,P(C\mid A\cap H^c)}{P(A)} = P(H^c) = \frac{2}{3}$$
and, analogously,
$$P(C\mid A^c) = \frac{P(H)\,P(A^c\mid H)\,P(C\mid A^c\cap H) + P(H^c)\,P(A^c\mid H^c)\,P(C\mid A^c\cap H^c)}{P(A^c)} = P(H) = \frac{1}{3}.$$

Borel sets

The Borel sets in $\mathbb{R}$ are all those subsets that can be obtained from intervals using complements, countable unions, and countable intersections.
More generally, on the sample space $\Omega = \mathbb{R}^k$, one considers the smallest $\sigma$-algebra $\mathcal{B}$ which contains all $k$-dimensional intervals. The sets in $\mathcal{B}$ are called the Borel sets on $\mathbb{R}^k$.

10.2.10. Random variables. The probabilities of the individual intervals in the Borel algebra are usually given as follows. Consider a numerical quantity $X$ on any sample space, that is, a function $X : \Omega \to \mathbb{R}$. Since it is desired to work with the probability of $X$ taking on values from any fixed interval, the probability space and the properties of the function $X$ have to allow this. Notice that, working with finite probability spaces where all subsets are events, every function $X : \Omega \to \mathbb{R}$ is a random variable in the following sense.
We have thus obtained $P(C\mid A) > P(C\mid A^c)$, which means that it is wise to change the door.
Note that the solution is based upon the assumption that the assistant deliberately opens a door behind which there is a goat. If the contestant believes it was an accident or if instead, say, he happens to see (or hear) a goat behind one of the two not chosen doors, then the first reasoning is correct and the
Random variables and vectors

Definition. A random variable $X$ on a probability space $(\Omega, \mathcal{A}, P)$ is a function $X : \Omega \to \mathbb{R}$ such that the inverse image $X^{-1}(B)$ lies in $\mathcal{A}$ for every Borel set $B \in \mathcal{B}$ on $\mathbb{R}$. The real-valued function $P_X(B) = P(X^{-1}(B))$ defined on all intervals $B \subset \mathbb{R}$ is called the (probability) distribution of the random variable $X$.
A random vector $X = (X_1, \dots, X_k)$ on $(\Omega, \mathcal{A}, P)$ is a $k$-tuple of random variables $X_i : \Omega \to \mathbb{R}$ defined on the same probability space $(\Omega, \mathcal{A}, P)$.
probability remains $\frac{1}{2}$.
□
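The answer is easy to confirm empirically. The following minimal simulation (the helper play and the trial count are our own choices, not part of the original text) estimates the winning probability of both strategies; the assistant's choice between two goat doors, when there is one, does not affect the probabilities.

    import random

    # Monte Carlo check of the Monty Hall answer: switching wins with
    # probability about 2/3, staying with probability about 1/3.
    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            car = random.randrange(3)
            guess = random.randrange(3)
            # the assistant opens a goat door different from the guess
            opened = next(d for d in range(3) if d != guess and d != car)
            if switch:
                guess = next(d for d in range(3) if d != guess and d != opened)
            wins += (guess == car)
        return wins / trials

    print("stay:  ", play(switch=False))   # about 1/3
    print("switch:", play(switch=True))    # about 2/3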
10.C.10. We have two bags. The first one contains two white and two black balls, while the second one contains one white and two black balls. We randomly select one of the bags and take two balls out of it (without replacement). What is the probability that the second ball is black provided the first one is white? O
D. What is probability?
First of all, recall the geometric probability, which was introduced in ??.
10.D.1. Buffon's needle. A plane is covered with parallel lines, creating bands of width $l$. Then, a needle of length $l$ is thrown onto the plane. What is the probability that the needle crosses one of the lines?
If intervals $I_1, \dots, I_k$ in $\mathbb{R}$ are chosen, then the probability of simultaneous occurrence of all of the $k$ events $X_i \in I_i$ must exist. Thus, as in the scalar case, there is a real-valued function defined on the $k$-dimensional intervals $B = I_1 \times \cdots \times I_k$, $P_X(B) = P(X^{-1}(B))$ (and thus also for all Borel sets $B \subset \mathbb{R}^k$). It is called the probability distribution of the random vector $X$.
10.2.11. Distribution function. The distribution of random variables is usually given by a rule which shows how the probability grows as the interval B is extended.
In particular, consider the intervals $I$ with endpoints $a$, $b$, $-\infty \le a < b \le \infty$. Denote $P(a < X < b)$ the probability of $X$ lying in $I = (a, b)$, or $P(X < b)$ if $a = -\infty$; and analogously for other types of intervals. In the special case of a singleton, write $P(X = a)$.
In the case of a random vector $X = (X_1, \dots, X_k)$, write $P(a_1 < X_1 < b_1, \dots, a_k < X_k < b_k)$ for the probability of simultaneous occurrence of the events where the values of $X_i$ fall into the corresponding intervals (which may also be closed, unbounded, etc.).
Solution. The position of the needle is given by two independent parameters: the distance $d$ of the needle's centre from the closest line ($d \in [0, l/2]$) and the angle $\alpha$ ($\alpha \in [0, \pi/2]$) between the lines and the needle's direction. The needle crosses one of the lines if and only if $\frac{l}{2}\sin\alpha \ge d$. The space of all events $(\alpha, d)$ is a rectangle $\pi/2 \times l/2$. The favourable events $(\alpha, d)$ (i.e., those for which $\frac{l}{2}\sin\alpha \ge d$) correspond to those points of the rectangle which lie under the curve $\frac{l}{2}\sin\alpha$ ($\alpha$ being the variable of the $x$-axis). By 6.2.21, the area of this figure is
$$\int_0^{\pi/2} \frac{l}{2}\sin\alpha \, d\alpha = \frac{l}{2}.$$
Thus, the wanted probability is (see ??)
$$\frac{l/2}{\frac{\pi}{2}\cdot\frac{l}{2}} = \frac{2}{\pi}.$$
□
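A quick Monte Carlo check of the value $2/\pi$; the sampling of the centre distance and the angle below mirrors the parametrization of the solution (the trial count is an arbitrary choice).

    import math, random

    # Monte Carlo estimate of Buffon's needle probability for needle
    # length equal to the band width; the theoretical value is 2/pi.
    def buffon(trials=1_000_000):
        hits = 0
        for _ in range(trials):
            d = random.uniform(0, 0.5)              # centre distance, in units of l
            alpha = random.uniform(0, math.pi / 2)  # angle to the lines
            hits += (0.5 * math.sin(alpha) >= d)
        return hits / trials

    print(buffon(), "vs", 2 / math.pi)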
The following well-known problem, which also deals with geometric probability, illustrates that we must be cautious about what is assumed to be "clear".
10.D.2. Bertrand's paradox. What is the probability that a random chord of a given circle is longer than the side of an equilateral triangle inscribed in the circle?
Solution. We show three ways to find "this" probability.
1) Every chord is determined by its center. Thus, a random choice of the chord is given by a random choice of the center. The chord is longer than the side of the inscribed equilateral triangle if and only if its center lies inside the concentric circle with half the radius. The center is chosen "randomly" from the whole inside of the circle. Therefore, the probability that it will lie in the inner disc is given by the ratio of the areas of these discs, which is $\frac{1}{4}$.
2) Unlike above, we claim that the wanted probability does not change if the direction of the chord is fixed. Then, the centers of such chords lie on a fixed diameter of the circle. The favorable centers are those which lie inside the inner circle (see 1)), i.e., on a fixed diameter of the inner circle. The ratio of the diameters is $1 : 2$, hence the wanted probability is $\frac{1}{2}$.
3) Now, we observe that a chord is determined by its endpoints (which must lie on the circle). Let us fix one of the endpoints (call it $A$); thanks to the apparent symmetry, this should not affect the resulting probability. Then, the chord satisfies the given condition if and only if the other endpoint lies on the third of the circle opposite to $A$, so the wanted probability is $\frac{1}{3}$.
Distribution function
Definition. The distribution function or cumulative distribution function of a random variable $X$ is the function $F_X : \mathbb{R} \to [0, 1]$ defined for all $x \in \mathbb{R}$ by
$$F_X(x) = P(X \le x).$$
Let $\varepsilon > 0$. Then, by Chebyshev's inequality (10.2.32),
$$P(|X - \operatorname{E}X| \ge \varepsilon) \le \frac{\operatorname{var}X}{\varepsilon^2}.$$
We learn that, writing $Y_n = \frac{1}{n}X_n$ for the relative frequency of successes in $n$ independent trials (so that $X_n \sim \mathrm{Bi}(n, p)$), for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P\left(\left|Y_n - p\right| > \varepsilon\right) = 0.$$
This result is known as Bernoulli's theorem (one of many).
This type of limit behaviour is called convergence in probability. Thus it is proved (as a corollary of Chebyshev's inequality) that the random variables Yn converge in probability to the constant random variable p.
10.H.5. At the Faculty of Informatics, 10 % of students have a grade average below 1.2 (let us call them successful). How many students must we meet if the probability that there are 8–12 % successful ones among them is to be at least 0.95? Solve this problem using Chebyshev's inequality, and then using the de Moivre-Laplace theorem.
Solution. Let $X$ denote the random variable that corresponds to the number of successful students among the $n$ students we meet. Since a randomly met student has probability 10 % of being successful, when meeting $n$ students, we have $X \sim \mathrm{Bi}(n, \frac{1}{10})$. By F, we have $\operatorname{E}X = 0.1n$ and $\operatorname{var}X = 0.09n$. By Chebyshev's inequality 10.2.32, the wanted probability satisfies
$$P(|X - 0.1n| < 0.02n) = 1 - P(|X - 0.1n| \ge 0.02n) \ge 1 - \frac{0.1\cdot 0.9\,n}{(0.02n)^2} = 1 - \frac{225}{n}.$$
The inequality $1 - \frac{225}{n} \ge 0.95$, and hence
$$P(|X - 0.1n| < 0.02n) \ge 0.95,$$
holds for n > 4500. The exact value of the probability is given in terms of the distribution function Fx of the binomial distribution:
$$P(0.08n \le X \le 0.12n) = F_X(0.12n) - F_X(0.08n).$$
Using the de Moivre-Laplace theorem (see 10.2.40), we can approximate the standardized random variable $Z = \frac{X - 0.1n}{\sqrt{0.09n}}$ by the standard normal distribution; the requirement $2\Phi\left(\frac{0.02n}{\sqrt{0.09n}}\right) - 1 \ge 0.95$ then leads to $\frac{0.02\sqrt{n}}{0.3} \ge z(0.975) \approx 1.96$, i.e., to $n \ge 865$, a much smaller number than the bound from Chebyshev's inequality.
□
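The two bounds can be compared numerically. In the following sketch (the function names are ours; $\Phi$ is computed from the error function of the standard library), the least $n$ satisfying the requirement under the normal approximation is found by search and contrasted with the Chebyshev bound $n \ge 4500$.

    import math

    def phi(x):                      # standard normal distribution function
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    def prob_normal(n):              # P(0.08 n <= X <= 0.12 n), X ~ Bi(n, 0.1)
        c = 0.02 * n / math.sqrt(0.09 * n)
        return 2 * phi(c) - 1

    n_cheb = math.ceil(225 / 0.05)   # from 1 - 225/n >= 0.95
    n_clt = next(n for n in range(1, 10_000) if prob_normal(n) >= 0.95)
    print(n_cheb, n_clt)             # 4500 versus about 865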
10.H.8. Using the distribution function of the standard normal distribution, find the probability that the absolute difference between the heads and the tails in 3600 tosses of a coin is at most 66.
Solution. Let $X$ denote the random variable that corresponds to the number of times the coin came up heads. Then $X$ has binomial distribution $\mathrm{Bi}(3600, 1/2)$ (with expected value 1800 and standard deviation 30), so for a large value of $n = 3600$,
The other propositions about the variance are quite simple corollaries:
$$\operatorname{var}(X+Y) = \operatorname{E}\big((X+Y) - \operatorname{E}(X+Y)\big)^2 = \operatorname{E}\big((X - \operatorname{E}X) + (Y - \operatorname{E}Y)\big)^2$$
$$= \operatorname{E}(X - \operatorname{E}X)^2 + 2\operatorname{E}(X - \operatorname{E}X)(Y - \operatorname{E}Y) + \operatorname{E}(Y - \operatorname{E}Y)^2 = \operatorname{var}X + 2\operatorname{cov}(X, Y) + \operatorname{var}Y.$$
Furthermore, if $X$ and $Y$ are independent, then $\operatorname{E}(XY) = \operatorname{E}X\,\operatorname{E}Y$, and hence their covariance is zero. □
Directly from the definition, $\operatorname{var}(X) = \operatorname{cov}(X, X)$.
The latter theorem claims that covariance is a symmetric bilinear form on the real vector space of random variables whose variance exists. The variance is the corresponding quadratic form. The covariance can be computed from the variance of the particular random variables and of their sum, as seen in linear algebra, see the property (5).
Notice that the random variable equal to the sum of $n$ independent and identically distributed random variables $Y_i$ behaves very differently from the multiple $nY$. In fact,
$$\operatorname{var}(Y_1 + \cdots + Y_n) = n \operatorname{var} Y, \qquad \operatorname{var}(nY) = n^2 \operatorname{var} Y.$$
10.2.34. Correlation of random variables. To a certain extent, covariance corresponds to dependency between the random variables. Its relative version is called the correlation of random variables and, similarly as for the standard deviation, the following concept is defined:
Correlation coefficient
The correlation coefficient of random variables $X$ and $Y$ whose variances are finite and non-zero is defined as
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var} X}\,\sqrt{\operatorname{var} Y}}.$$
As seen from theorem 10.2.33, the correlation coefficient of random variables equals the covariance of the standardized variables $\frac{1}{\sqrt{\operatorname{var}X}}(X - \operatorname{E}X)$ and $\frac{1}{\sqrt{\operatorname{var}Y}}(Y - \operatorname{E}Y)$.
The following equalities hold (here, $a, b, c, d$ are real constants, $bd \ne 0$, and $X$, $Y$ are random variables with finite
the distribution function of the variable $\frac{X - 1800}{30}$ can be approximated, by the de Moivre-Laplace theorem, with the distribution function $\Phi$ of the standard normal distribution. The wanted probability is thus approximately $\Phi(1.1) - \Phi(-1.1) \approx 0.7287$.
□
variances):
$$\rho_{a+bX,\,c+dY} = \operatorname{sgn}(bd)\,\rho_{X,Y}, \qquad \rho_{X,X} = 1.$$
Moreover, if $X$ and $Y$ are independent, then $\rho_{X,Y} = 0$.
Note that if the variance of a random variable $X$ is zero, then it assumes the value $\operatorname{E}X$ with probability 1. If the value of $X$ falls into an interval $I$ not containing $\operatorname{E}X$ with probability $p \ne 0$, then the expression $\operatorname{var}X = \operatorname{E}(X - \operatorname{E}X)^2$ is positive. Stochastically, random variables with zero variance behave as constants.
10.H.9. The probability that a seed will grow is 0.9. How many seeds must we plant if we require that, with probability at least 0.995, the relative number of grown items differs from 0.9 by at most 0.034?
Solution. The random variable $X$ that corresponds to the number of grown seeds, out of $n$ planted ones, has binomial distribution $X \sim \mathrm{Bi}(n, \frac{9}{10})$. By F, we have $\operatorname{E}X = 0.9n$ and $\operatorname{var}X = 0.09n$, so the standardized variable is $Z = \frac{X - 0.9n}{\sqrt{0.09n}}$.
The condition in question can be written as
$$P(|X - 0.9n| \le 0.034n) = P\left(|Z| \le \frac{0.034n}{\sqrt{0.09n}}\right) = P\left(|Z| \le \frac{0.34}{3}\sqrt{n}\right) \ge 0.995.$$
By the de Moivre-Laplace theorem, for large $n$, the distribution function of $Z$ can be approximated by the distribution function $\Phi$ of the standard normal distribution.
Altogether, we get the condition
$$2\Phi\left(\frac{0.34}{3}\sqrt{n}\right) - 1 \ge 0.995.$$
From this, we compute
$$n \ge \left(\frac{3\,z(0.9975)}{0.34}\right)^2 \approx 615.$$
□
10.H.10. The service life (in hours) of a certain kind of gadget has exponential distribution with parameter $\lambda = \frac{1}{10}$. Using the central limit theorem, estimate the probability that the total service life of 100 such gadgets lies between 900 and 1050 hours.
Solution. In exercise 10.F.5, we computed that the expected value and variance of a random variable $X_i$ with exponential distribution are $\operatorname{E}X_i = \frac{1}{\lambda}$ and $\operatorname{var}X_i = \frac{1}{\lambda^2}$, respectively. Thus, the expected service life of each gadget is $\operatorname{E}X_i = \mu = 10$ hours, with variance $\operatorname{var}X_i = \sigma^2 = 100$ hours$^2$. By the central limit theorem, the distribution
If the covariance were a positive-definite symmetric bilinear form, then it would follow from the Cauchy-Schwarz inequality (see 3.4.3) that
$$(1)\qquad |\rho_{X,Y}| \le 1.$$
The following theorem claims more. It shows that full correlation or anti-correlation, i.e. $\rho_{X,Y} = \pm 1$, of random variables $X$ and $Y$ says that they are bound by an affine relation $Y = kX + c$, where the sign of $k$ corresponds to the sign in $\rho_{X,Y} = \pm 1$. On the other hand, a zero correlation coefficient says that the (potential) dependency between the variables is very far from any affine relation of the mentioned type. (Note, however, that this does not mean that the variables must be independent.)
For instance, consider random variables $Z \sim N(0, 1)$ and $Z^2$. Then $\operatorname{cov}(Z, Z^2) = \operatorname{E}Z^3 = 0$, since the density of $Z$ is an even function, and so the expected value of any odd power of $Z$ is zero, if it exists.
Theorem. If the correlation coefficient $\rho_{X,Y}$ is defined, then $|\rho_{X,Y}| \le 1$. Equality holds if and only if there are constants $k$, $c$ such that $P(Y = kX + c) = 1$.
Proof. A stochastic affine relation between Y and X with nonzero coefficient at Y is sought. This is equivalent to Y + sX ~ D(c) for some fixed value of the parameter s and constant c. In such a case the variance vanishes. Thus one considers the following non-negative quadratic expression:
$$0 \le \operatorname{var}\left(\frac{Y - \operatorname{E}Y}{\sqrt{\operatorname{var}Y}} + t\,\frac{X - \operatorname{E}X}{\sqrt{\operatorname{var}X}}\right) = 1 + 2t\rho_{X,Y} + t^2.$$
of the transformed random variable $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_i - 10}{10}$ approaches the standard normal distribution as $n$ tends to infinity. Thus, the wanted probability for the service life of 100 gadgets,
$$P\left(900 \le \sum_{i=1}^{100} X_i \le 1050\right) = P\left(-1 \le \frac{1}{10}\sum_{i=1}^{100}\frac{X_i - 10}{10} \le 0.5\right),$$
can be approximated with the distribution function of the normal distribution:
The right-hand quadratic expression does not have two distinct real roots; hence its discriminant cannot be positive. So $(2\rho_{X,Y})^2 - 4 \le 0$. Hence the desired inequality is obtained; moreover, the discriminant vanishes if $\rho_{X,Y} = \pm 1$. For the only (double) root $t_0$, the corresponding random variable has zero variance; thus it assumes a fixed value with probability 1. This yields the affine relation as expected. □
10.2.35. Covariance matrix. The variability of a random vector must be considered. This suggests considering the covariances of all pairs of components. The following definition and theorem show that this leads to an analogy of the variance for vectors, including the behaviour of the variance under affine transformations of the random variables.
$$P\left(900 \le \sum X_i \le 1050\right) \approx \Phi(0.5) - \Phi(-1) \approx 0.533.$$
Covariance matrix
□
10.H.11. We keep putting items into a chest. The expected mass of an item is 3 kg and the standard deviation is 0.8 kg. What is the maximum number of items that we can put into the chest so that with probability at least 99%, the total mass does not exceed one ton?
Solution. Let $X_i$ denote the random variable that corresponds to the mass of the $i$-th item. Then, we have $\mu = \operatorname{E}X_i = 3$ and $\sigma = \sqrt{\operatorname{var}X_i} = 0.8$ (in kilograms), and we want to have
$$P\left(\sum_{i=1}^{n} X_i \le 1000\right) \ge 0.99.$$
By the central limit theorem 10.2.40, the distribution of the random variable
$$S_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_i - 3}{0.8} = \frac{1}{0.8\sqrt{n}}\sum_{i=1}^{n}X_i - \frac{3\sqrt{n}}{0.8}$$
can be approximated by the standard normal distribution. Thus, we get
$$P\left(\sum X_i \le 1000\right) = P\left(S_n \le \frac{1000}{0.8\sqrt{n}} - \frac{3\sqrt{n}}{0.8}\right) \approx \Phi\left(\frac{1000}{0.8\sqrt{n}} - \frac{3\sqrt{n}}{0.8}\right).$$
We learn that $z(0.99) \approx 2.326$, so the wanted $n$ satisfies the quadratic equation
$$\frac{1000}{0.8\sqrt{n}} - \frac{3\sqrt{n}}{0.8} = 2.326,$$
whence we get $n \approx 322$. □
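Instead of solving the quadratic equation by hand, one can search for the largest admissible $n$ directly. A minimal sketch under the same normal approximation (the helper phi is ours):

    import math

    # Find the largest n with P(S_n <= 1000) ~ Phi((1000 - 3n)/(0.8 sqrt(n))) >= 0.99.
    def phi(x):
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    n = max(n for n in range(1, 400)
            if phi((1000 - 3 * n) / (0.8 * math.sqrt(n))) >= 0.99)
    print(n)   # 322, in agreement with the quadratic equation above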
I. Testing samples from the normal distribution
In subsection 10.3.4, we introduced the so-called two-sided interval estimate of an unknown parameter $\mu$ of the normal distribution $N(\mu, \sigma^2)$. In some cases, we may be interested only in an upper or lower estimate, i.e., a statistic $U$ or $L$ for which $P(\mu < U)$, respectively $P(L < \mu)$, equals the prescribed confidence level $1 - \alpha$. Then, we talk about a one-sided confidence interval $(-\infty, U)$ or $(L, \infty)$. The formula for these intervals can be derived similarly as for the two-sided interval. Now, we have for the random variable $Z = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} \sim N(0, 1)$ that
$$1 - \alpha = \Phi(z(1-\alpha)) = P(Z < z(1-\alpha)).$$
Hence it immediately follows that
$$1 - \alpha = P\left(\bar{X} - \tfrac{\sigma}{\sqrt{n}}\,z(1-\alpha) < \mu\right),$$
Consider a random vector $X = (X_1, \dots, X_n)^T$ all of whose components have finite variances.
The covariance matrix of the random vector $X$ is defined in terms of the expected value as (notice the vector $X$ is viewed as a column of random variables now)
$$\operatorname{var}X = \operatorname{E}(X - \operatorname{E}X)(X - \operatorname{E}X)^T.$$
Using the definition of the expected value of a vector and expanding the matrix multiplication, it is immediate that the covariance matrix var X is the symmetric matrix
$$\operatorname{var}X = \begin{pmatrix} \operatorname{var}X_1 & \operatorname{cov}(X_1, X_2) & \cdots & \operatorname{cov}(X_1, X_n) \\ \operatorname{cov}(X_2, X_1) & \operatorname{var}X_2 & \cdots & \operatorname{cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{cov}(X_n, X_1) & \operatorname{cov}(X_n, X_2) & \cdots & \operatorname{var}X_n \end{pmatrix}.$$
Theorem. Consider a random vector $X = (X_1, \dots, X_n)^T$ all of whose components have finite variances. Further, consider the transformed random vector $Y = BX + c$, where $B$ is an $m$-by-$n$ matrix of real constants and $c \in \mathbb{R}^m$ is a vector of constants. Then,
$$\operatorname{var}(Y) = \operatorname{var}(BX + c) = B(\operatorname{var}X)B^T.$$
Proof. The claim follows from direct computation, using the properties of the expected value:
$$\operatorname{var}(Y) = \operatorname{E}\big((BX+c) - \operatorname{E}(BX+c)\big)\big((BX+c) - \operatorname{E}(BX+c)\big)^T$$
$$= \operatorname{E}\big(B(X - \operatorname{E}X)\big)\big(B(X - \operatorname{E}X)\big)^T = B\,\operatorname{E}(X - \operatorname{E}X)(X - \operatorname{E}X)^T B^T = B(\operatorname{var}X)B^T. \qquad \square$$
The constant part of the transformation has no impact, while with respect to the linear part of the transformation, the covariance matrix behaves as the matrix of a quadratic form.
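The transformation rule is easy to illustrate numerically. In the following sketch (the matrix $B$, the shift $c$, and the sample are arbitrary choices of ours), the sample covariance matrix of $Y = BX + c$ is compared with $B\,\operatorname{var}(X)\,B^T$:

    import numpy as np

    # Empirical illustration of var(BX + c) = B var(X) B^T.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(3, 100_000))      # three rows = components of X
    X[1] += 0.5 * X[0]                     # introduce some correlation
    B = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, -1.0]])
    c = np.array([[5.0], [7.0]])
    Y = B @ X + c

    print(np.cov(Y))                       # sample covariance of Y
    print(B @ np.cov(X) @ B.T)             # B var(X) B^T; nearly equal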
10.2.36. Moments and moment function. The expected value and variance reflect the square of the deviation of values of a random variable from the average. In descriptive statistics, one also examines the skewness of the data, and it is natural to examine the variability of random variables in terms of higher powers of the given random variable X.
The characteristic $\operatorname{E}(X^k)$ is called the $k$-th moment; the characteristic $\mu_k = \operatorname{E}\big((X - \operatorname{E}X)^k\big)$ is called the $k$-th central moment of a random variable $X$. What also comes in handy is the $k$-th absolute moment, given by $\operatorname{E}|X|^k$.
From the definition it follows that for a continuous random variable $X$,
$$\operatorname{E}X^k = \int_{-\infty}^{\infty} x^k f_X(x)\,dx.$$
so $L = \bar{X} - \frac{\sigma}{\sqrt{n}}\,z(1-\alpha)$. Similarly, we find $U = \bar{X} + \frac{\sigma}{\sqrt{n}}\,z(1-\alpha)$, and for a distribution with unknown variance, $\mu > \bar{X} - \frac{S}{\sqrt{n}}\,t_{n-1}(1-\alpha)$ and $\mu < \bar{X} + \frac{S}{\sqrt{n}}\,t_{n-1}(1-\alpha)$.
If we want to estimate the variance $\sigma^2$ of a normal distribution, then we use theorem 10.3.3, similarly as when we derived the interval for the expected value. This time, we use the second part of the theorem, by which the random variable $\frac{n-1}{\sigma^2}S^2$ has distribution $\chi^2_{n-1}$. Then, we can immediately see that
$$1 - \alpha = P\left(\chi^2_{n-1}(\alpha/2) \le \frac{n-1}{\sigma^2}S^2 \le \chi^2_{n-1}(1 - \alpha/2)\right).$$
Thus, the two-sided $100(1-\alpha)\%$ confidence interval for the variance is
$$\left(\frac{(n-1)S^2}{\chi^2_{n-1}(1-\alpha/2)},\ \frac{(n-1)S^2}{\chi^2_{n-1}(\alpha/2)}\right),$$
and similarly for the one-sided upper and lower estimates, we get
$$\sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1}(\alpha)}, \quad \text{resp.} \quad \frac{(n-1)S^2}{\chi^2_{n-1}(1-\alpha)} < \sigma^2.$$
10.1.1. We roll a die 600 times, obtaining only 45 sixes. Is it possible to say that the die is ideal at level $\alpha = 0.01$?
Solution. For an ideal die, the probability of rolling a six is always $p = \frac{1}{6}$. The number of sixes in 600 rolls is given by a random variable $X$ with binomial distribution $X \sim \mathrm{Bi}(600, \frac{1}{6})$. By 10.2.40, this distribution can be approximated by the distribution $N(100, \frac{250}{3})$. The measured value $X = 45$ can be considered a random sample consisting of one item. Assuming that the variance is known and applying 10.3.4, we get that the 99% (two-sided) confidence interval for the expected value $\mu$ equals $\left(45 - \sqrt{\tfrac{250}{3}}\,z(0.995),\ 45 + \sqrt{\tfrac{250}{3}}\,z(0.995)\right)$. We learn that the quantile is approximately $z(0.995) \approx 2.58$, which gives the interval $(21, 69)$. However, for an ideal die, we clearly have $\mu = 100$, so our die is not ideal at level $\alpha = 0.01$. □
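The interval is easy to recompute; the following snippet (with the quantile value hard-coded from the tables) mirrors the calculation above.

    import math

    # 10.1.1 revisited: X ~ Bi(600, 1/6) approximated by N(100, 250/3);
    # observed count 45, quantile z(0.995) ~ 2.576.
    sigma = math.sqrt(600 * (1 / 6) * (5 / 6))   # sqrt(250/3), about 9.13
    z = 2.576
    low, high = 45 - z * sigma, 45 + z * sigma
    print(f"99% interval: ({low:.1f}, {high:.1f})")  # about (21, 69); 100 lies outside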
10.1.2. Suppose the height of 10-year-old boys has normal distribution $N(\mu, \sigma^2)$ with unknown expected value $\mu$ and variance $\sigma^2 = 39.112$. Taking the height of 15 boys, we get the sample mean $\bar{X} = 139.13$. Find
i) the 99% two-sided confidence interval for the parameter $\mu$;
ii) the lower estimate for $\mu$ at significance level 95 %.
Similarly, for a discrete random variable $X$ whose probability is concentrated into points $x_i$,
$$\operatorname{E}X^k = \sum_i x_i^k f_X(x_i).$$
The next theorem shows that all the moments completely describe the distribution of the random variable, as a rule.
For the sake of computations, it is advantageous to work with a power series in which the moments appear in the coefficients. Since the coefficients of the Taylor series of a function at a given point can be obtained using differentiation, it is easy to guess the right choice of such a function:
Moment generating function
Given a random variable $X$, consider the function $M_X : \mathbb{R} \to \mathbb{R}$ defined by
$$M_X(t) = \operatorname{E} e^{tX} = \begin{cases} \sum_i e^{t x_i} f_X(x_i) & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx & \text{if } X \text{ is continuous.} \end{cases}$$
If this expected value exists, the moment generating function of the random variable X can be discussed.
It is clear that this function $M_X(t)$ is always analytic in the case of discrete random variables with finitely many values.
Theorem. Let $X$ be a random variable such that its moment generating function exists and is analytic on an interval $(-a, a)$. Then, $M_X(t)$ is given on this interval by the absolutely convergent series
$$M_X(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!}\,\operatorname{E}X^k.$$
If two random variables $X$ and $Y$ share their moment generating functions over a nontrivial interval $(-a, a)$, then their distribution functions coincide.
Proof. The verification of the first statement is a simple exercise on the techniques of differential and integral calculus. In the case of discrete variables, there are either finite sums or absolutely and uniformly converging series. In the case of continuous variables, there are absolutely converging integrals. Thus, the limit process and the differentiation can
be interchanged. Since $\frac{d}{dt}e^{tx} = x\,e^{tx}$, it is immediate that
$$\frac{d^k}{dt^k}\Big|_{t=0} M_X(t) = \operatorname{E}X^k,$$
as expected.
The second claim is obvious for two discrete variables $X$ and $Y$ with only a finite number of values $x_1, \dots, x_k$ for which either $f_X(x_i) \ne 0$ or $f_Y(x_i) \ne 0$. Indeed, the functions $e^{t x_i}$ are linearly independent, and thus their coefficients in the common moment function
$$M(t) = e^{t x_1} f(x_1) + \cdots + e^{t x_k} f(x_k)$$
must be the shared probability function values for both random variables $X$ and $Y$.
Solution. i) By 10.3.4, the $100(1-\alpha)\%$ two-sided confidence interval for the unknown expected value $\mu$ of the normal distribution is
$$(1)\qquad \mu \in \left(\bar{X} - \tfrac{\sigma}{\sqrt{n}}\,z(1-\alpha/2),\ \bar{X} + \tfrac{\sigma}{\sqrt{n}}\,z(1-\alpha/2)\right),$$
where $\bar{X}$ is the sample mean of $n$ items, $\sigma^2$ is the known variance, and $z(1-\alpha/2)$ is the corresponding quantile. Substituting the given values $n = 15$, $\sigma \approx 6.254$, and the learned $z(0.995) \approx 2.576$, we get $\tfrac{\sigma}{\sqrt{n}}\,z(1-\alpha/2) \approx 4.16$, i.e., $\mu \in (134.97, 143.29)$.
ii) The lower estimate $L$ for the parameter $\mu$ at significance level 95 % is given by the expression $L = \bar{X} - \tfrac{\sigma}{\sqrt{n}}\,z(0.95)$. We learn that $z(0.95) \approx 1.645$, and direct substitution leads to $\mu \in (136.474, \infty)$. □
10.1.3. A customer tests the quality of bought products by examining 21 randomly chosen ones. He will accept the delivery if the sample standard deviation does not exceed 0.2 mm. We know that the pursued property of the products has normal distribution of the form N(10 mm; 0.0734 mm2). Using statistical tables, find the probability that the delivery will be accepted. How does the answer change if the customer, in order to save expenses, tests only 4 products?
Solution. The problem asks for the probability $P(S < 0.2)$. By theorem 10.3.3, when sampling $n$ products, the random variable $\frac{n-1}{\sigma^2}S^2$ has distribution $\chi^2_{n-1}$. In our case, $n = 21$ and $\sigma^2 = 0.0734$, so
In the case of continuous variables $X$ and $Y$ sharing their generating function $M(t)$, the argument is more involved and only an indication is provided. Notice that $M(t)$ is analytic and thus it is defined for all complex numbers $t$, $|t| < a$. In particular,
$$M(it) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx,$$
which is the inverse Fourier transform of $f(x)$, up to the constant multiple $\sqrt{2\pi}$, see 7.2.5 (on page 474). If this works for all $t$, then clearly $f$ is obtained by the Fourier transform of $(\sqrt{2\pi})^{-1}M(it)$ and thus must be the same for both $X$ and $Y$. Further details, in particular covering general random variables, would need much more input from measure theory and Fourier analysis, and thus they are not provided here. □
It can also be shown that the assumptions of the theorem are true whenever both $M_X(-a) < \infty$ and $M_X(a) < \infty$.
10.2.37. Properties of the moment function. By the properties of the exponential functions, it is easy to compute the behaviour of the moment function under affine transformations and sums of independent random variables.
Proposition. Let $a, b \in \mathbb{R}$ and let $X$, $Y$ be independent random variables with moment generating functions $M_X(t)$ and $M_Y(t)$, respectively. Then, the moment generating functions of the random variables $V = a + bX$ and $W = X + Y$ are
$$M_{a+bX}(t) = e^{at} M_X(bt), \qquad M_{X+Y}(t) = M_X(t)\,M_Y(t).$$
Proof. The first formula can be computed directly from the definition:
$$M_V(t) = \operatorname{E} e^{(a+bX)t} = \operatorname{E} e^{at}\,e^{tbX} = e^{at} M_X(bt).$$
As for the second formula, recall that $e^{tX}$ and $e^{tY}$ are independent variables. Use the fact that the expected value of the product of independent random variables equals the product of the expected values:
$$M_W(t) = \operatorname{E} e^{t(X+Y)} = \operatorname{E} e^{tX} e^{tY} = \operatorname{E} e^{tX}\, \operatorname{E} e^{tY} = M_X(t)\,M_Y(t). \qquad \square$$

$$P(S < 0.2) = P\left(\frac{20}{0.0734}\,S^2 < \frac{20\cdot 0.2^2}{0.0734}\right) = F_{\chi^2_{20}}\left(\frac{20\cdot 0.2^2}{0.0734}\right).$$
The expression in the argument of the distribution function is approximately 10.9, and we can learn from the table of the $\chi^2$ distribution that $F_{\chi^2_{20}}(10.9) \approx 0.05$. Thus, the probability that the delivery will be accepted is only 5 %. We could have expected the probability to be low: indeed, $\operatorname{E} S^2 = \sigma^2 = 0.0734 > 0.2^2$. If the customer tests only 4 products, then the probability of acceptance is given by the expression $F_{\chi^2_3}\left(\frac{3\cdot 0.2^2}{0.0734}\right) = F_{\chi^2_3}(1.63)$. The value of the distribution function of $\chi^2_3$ at this argument cannot be found in most tables. Therefore, we estimate it using linear interpolation. For instance, if the nearest known points are $F_{\chi^2_3}(0.58) = 0.1$ and $F_{\chi^2_3}(6.25) = 0.9$, then
$$F_{\chi^2_3}(1.63) \approx (1.63 - 0.58)\cdot\frac{0.9 - 0.1}{6.25 - 0.58} + 0.1 \approx 0.24.$$
10.2.38. Normal and binomial distributions. As an illustrating example, compute the moment function of two random variables $X \sim N(\mu, \sigma)$ and $X \sim \mathrm{Bi}(n, p)$.
Moment generating function for $N(\mu, \sigma)$

Proposition. If $X \sim N(\mu, \sigma)$, then
$$M_X(t) = e^{\mu t}\,e^{\frac{1}{2}\sigma^2 t^2}.$$
In particular, it is an analytic function on all of $\mathbb{R}$.
Although this result is only an estimate, we can be sure that the probability of acceptance is much greater than when testing 21 products. □
10.1.4. From a population with distribution $N(\mu, \sigma^2)$, where $\sigma^2 = 0.06$, we have sampled the values 1.3; 1.8; 1.4; 1.2; 0.9; 1.5; 1.7. Find the two-sided 95% confidence interval for the unknown expected value.
Solution. We have a random sample of size $n = 7$ from the normal distribution with known variance $\sigma^2 = 0.06$. The sample mean is
$$\bar{X} = \tfrac{1}{7}(1.3 + 1.8 + 1.4 + 1.2 + 0.9 + 1.5 + 1.7) = 1.4,$$
and we can learn for the given confidence level $\alpha = 0.05$ that $z(1 - \alpha/2) = z(0.975) \approx 1.96$. Substituting into (1), we immediately obtain the wanted interval $(1.22, 1.58)$. □
10.1.5. Let $X_1, \dots, X_n$ be a random sample from the distribution $N(\mu, 0.04)$. Find the least number of measurements that are necessary so that the length of the 95% confidence interval for $\mu$ would not exceed 0.16.
Solution. Since we have a normal distribution with known variance, we know from (1) that the length of the $(1-\alpha)$ confidence interval is $\frac{2\sigma}{\sqrt{n}}\,z(1-\alpha/2)$. Substituting the given values, we get that the number $n$ of measurements satisfies the inequality
$$\frac{2\cdot 0.2}{\sqrt{n}}\,z(0.975) \le 0.16.$$
Since $z(0.975) \approx 1.96$, we obtain $n \ge 24.01$. Thus, at least 25 measurements are necessary. □
10.1.6. Consider a random variable $X$ with distribution $N(\mu, \sigma^2)$, where $\mu$, $\sigma^2$ are unknown. The following table shows the frequencies of individual values of this random variable:
$x_i$:   8  11  12  14  15  16  17  18  20  21
$n_i$:   1   2   3   4   7   5   4   3   2   1
Calculate the sample mean, sample variance, sample standard deviation, and find the 99% confidence interval for the expected value [i.
Solution. The sample mean is given by the expression $\bar{X} = \sum n_i x_i / \sum n_i$. Substituting the given values, we get $\bar{X} = 490/32 \approx 15.3$. By definition, the sample variance is $S^2 = \sum n_i (x_i - \bar{X})^2 / (\sum n_i - 1)$. Substituting the given values, we get $S^2 = 1943/248 \approx 7.8$, so the sample standard deviation is $S \approx 2.8$. The formula for the two-sided $(1-\alpha)$ confidence interval for the expected value $\mu$, when the variance
Proof. Suppose $Z \sim N(0, 1)$. Then
$$M_Z(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx}\,e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(x^2 - 2tx + t^2 - t^2)}\,dx = e^{\frac{t^2}{2}}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{(x-t)^2}{2}}\,dx = e^{\frac{t^2}{2}},$$
where use is made of the fact that in the last-but-one expression, for every fixed t, the density of a continuous random variable is integrated; hence this integral equals one.
Substitute into the formula for the moment generating function $M_{\mu + \sigma Z}$, to obtain for $X \sim N(\mu, \sigma)$ that
$$M_X(t) = e^{\mu t}\,e^{\frac{1}{2}\sigma^2 t^2},$$
again a function analytic over the entire $\mathbb{R}$.
□
In particular, the moments of $Z$ of all orders exist. Substitute $\frac{1}{2}t^2$ into the power series for the exponential function, and calculate them all:
$$M_Z(t) = \sum_{k=0}^{\infty}\frac{1}{k!}\left(\frac{t^2}{2}\right)^k = \sum_{k=0}^{\infty}\frac{1}{2^k\,k!}\,t^{2k} = 1 + 0\cdot t + \frac{1}{2}\,t^2 + 0\cdot t^3 + \frac{3}{4!}\,t^4 + \cdots$$
In particular, the expected value of $Z$ is $\operatorname{E}Z = 0$, and its variance is $\operatorname{var}Z = \operatorname{E}Z^2 - (\operatorname{E}Z)^2 = 1$. Further, all moments of odd orders vanish, $\operatorname{E}Z^4 = 3$, etc.
Hence the sum of independent normal distributions $X \sim N(\mu, \sigma)$ and $Y \sim N(\mu', \sigma')$ has again a normal distribution, with expected value $\mu + \mu'$ and variance $\sigma^2 + \sigma'^2$.
Similarly, considering the discrete random variable $X \sim \mathrm{Bi}(n, p)$,
$$M_X(t) = \operatorname{E}e^{tX} = \sum_{k=0}^{n}(p\,e^t)^k\binom{n}{k}(1-p)^{n-k} = \left(p\,e^t + (1-p)\right)^n = \left(p(e^t - 1) + 1\right)^n = 1 + npt + \tfrac{1}{2}\left(n(n-1)p^2 + np\right)t^2 + \cdots$$
is computed. Of course, the same can be computed even easier using the proposition 10.2.37 since X is the sum of n independent variables Y ~ A(p) with the Bernoulli distribution. Therefore,
$$\operatorname{E}e^{tX} = (\operatorname{E}e^{tY})^n = (p\,e^t + (1-p))^n.$$
Hence all the moments of the variable $Y$ equal $p$. Therefore, $\operatorname{E}Y = p$, while $\operatorname{var}Y = p(1-p)$. From the moment function $M_X(t)$, $\operatorname{E}X = np$ and $\operatorname{var}X = \operatorname{E}X^2 - (\operatorname{E}X)^2 = np(1-p)$.
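The moments read off from the power series of $M_X(t)$ can be cross-checked against the probability function directly; a minimal sketch (the values $n = 10$, $p = 0.3$ are arbitrary choices):

    import math

    # Checking E X = np and var X = np(1-p), the coefficients read off the
    # series of M_X(t) = (p e^t + 1 - p)^n, against the pmf of Bi(n, p).
    n, p = 10, 0.3
    pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

    EX = sum(k * pmf[k] for k in range(n + 1))
    EX2 = sum(k**2 * pmf[k] for k in range(n + 1))
    print(EX, n * p)                      # both about 3.0
    print(EX2 - EX**2, n * p * (1 - p))   # both about 2.1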
is unknown, was derived at the end of subsection 10.3.4:
$$\mu \in \left(\bar{X} - \tfrac{S}{\sqrt{n}}\,t_{n-1}(1-\alpha/2),\ \bar{X} + \tfrac{S}{\sqrt{n}}\,t_{n-1}(1-\alpha/2)\right).$$
Substitution yields $\bar{X} = 15.3$, $n = 32$, $S \approx 2.8$, $\alpha = 0.01$, and we learn $t_{31}(0.995) \approx 2.75$. Thus, the 99% confidence interval is $\mu \in (14.0, 16.7)$. □
10.1.7. Using the following table of the distribution function of the normal distribution, find the probability that the absolute difference between the heads and the tails in 3600 tosses of a coin is greater than 90.
Standard Normal Distribution Table
Solution. Let $X$ denote the random variable that corresponds to the number of heads. Then, $X$ has binomial distribution $\mathrm{Bi}(3600, 1/2)$ (with expected value 1800 and standard deviation 30), so by the de Moivre-Laplace theorem, for large values of $n$, the distribution function of the variable $\frac{X - 1800}{30}$ can be approximated by the distribution function $\Phi$ of the standard normal distribution.
For every $\varepsilon > 0$,
$$\lim_{n\to\infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n}X_i - \mu\right| < \varepsilon\right) = 1.$$
Proof. Use Chebyshev's inequality just as at the end of subsection 10.2.32:
The wanted probability is
$$P(|X - 1800| > 45) = P\left(\left|\tfrac{X - 1800}{30}\right| > 1.5\right) \approx 2\Phi(-1.5) = 0.1336,$$
where the last value was learned from the table. □
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n}X_i - \mu\right| \ge \varepsilon\right) \le \frac{\operatorname{var}\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)}{\varepsilon^2} = \frac{\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}X_i}{\varepsilon^2} \le \frac{C}{n\varepsilon^2}.$$
10.1.8. The probability that a newborn baby is a boy is 0.515. Find the probability that there are at least the same number of girls as boys among ten thousand babies.
Solution. Let $X \sim \mathrm{Bi}(10000, 0.515)$ denote the number of boys; then $\operatorname{E}X = 5150$ and $\operatorname{var}X = 5150\cdot 0.485$. There are at least as many girls as boys if and only if $X \le 5000$, so
$$P(X \le 5000) = P\left(\frac{X - 5150}{\sqrt{5150\cdot 0.485}} \le \frac{-150}{\sqrt{5150\cdot 0.485}}\right) \approx \Phi(-3) = 0.00135.$$
□
10.1.9. Using the distribution function of the standard normal distribution, find the probability that we get at least 3100 sixes out of 18000 rolls of a six-sided die.
Solution. We proceed similarly as in the exercises above. $X$ has binomial distribution $\mathrm{Bi}(18000, 1/6)$. We find the expected value $\frac{1}{6}\cdot 18000 = 3000$ as well as the standard deviation $\sqrt{\frac{1}{6}\cdot\frac{5}{6}\cdot 18000} = 50$. Therefore, the distribution function of the variable $\frac{X - 3000}{50}$ can be approximated with the distribution function $\Phi$ of the standard normal distribution:
$$P(X \ge 3100) = P\left(\frac{X - 3000}{50} \ge \frac{3100 - 3000}{50}\right) = P\left(\frac{X - 3000}{50} \ge 2\right) \approx 1 - \Phi(2) = 0.0228. \qquad \square$$
10.1.10. A public opinion agency organizes a survey of preferences of five political parties. How many randomly selected respondents must answer so that the probability that for each party, the survey result differs from the actual preference by no more than 2% is at least 0.95?
Solution. Let $p_i$, $i = 1, \dots, 5$, be the actual relative frequency of voters of the $i$-th political party in the population, and let $X_i$ denote the number of voters of this party among $n$ randomly chosen people. Note that given any five intervals, the events corresponding to $X_i/n$ falling into the corresponding interval may be dependent. If we choose $n$ so that for each $i$, $X_i/n$ falls into the given interval with probability at least $1 - \frac{1 - 0.95}{5} = 0.99$, then the desired condition is sure to hold even in spite of the dependencies. Thus, let us look for $n$ such that $P\left[\left|\frac{X}{n} - p\right| \le 0.02\right] \ge 0.99$. First of all, we
Thus, the probability $P$ is bounded from below by
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n}X_i - \mu\right| < \varepsilon\right) \ge 1 - \frac{C}{n\varepsilon^2},$$
which proves the proposition. □
Thus, existence and uniform boundedness of variances suffices for the means of pairwise uncorrelated variables Xi with zero expected value to converge (in probability) to zero.
10.2.41. Central limit theorem. The next goal is more ambitious. In addition to the law of large numbers, the stochastic properties of the fluctuation of the means $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n}X_i$ around the expected value $\mu$ need to be understood. We focus first on the simplest case of sequences of independent and identically distributed random variables $X_i$. Then we formulate a more general version of the theorem and provide only comments on the proofs.
Move to a sequence of normalized random variables $X_i$. Assume $\operatorname{E}X_i = 0$ and $\operatorname{var}X_i = 1$. Assume further that the moment generating function $M_X(t)$ exists and is shared by all the variables $X_i$.
The arithmetic means $\frac{1}{n}\sum_{i=1}^{n}X_i$ are, of course, random variables with zero expected value, yet their variances are $\frac{n}{n^2} = \frac{1}{n}$. Thus, it is reasonable to renormalize them to
$$S_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_i,$$
which are again standardized random variables. Their moment generating functions are (see proposition 10.2.37)
$$M_{S_n}(t) = \operatorname{E}e^{\frac{t}{\sqrt{n}}\sum_i X_i} = \left(M_X\left(\frac{t}{\sqrt{n}}\right)\right)^n.$$
Since it is assumed that the variables $X_i$ are standardized,
$$M_X\left(\frac{t}{\sqrt{n}}\right) = 1 + 0\cdot\frac{t}{\sqrt{n}} + \frac{1}{2}\cdot\frac{t^2}{n} + o\left(\frac{1}{n}\right),$$
where again $o(G(n))$ is written for expressions which, when divided by $G(n)$, approach zero as $n \to \infty$, see subsection 6.1.16.
Thus, in the limit,
$$\lim_{n\to\infty} M_{S_n}(t) = \lim_{n\to\infty}\left(1 + \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right)^n = e^{\frac{t^2}{2}}.$$
This is just the moment generating function of the normal distribution $Z \sim N(0, 1)$, see the end of subsection 10.2.38. Thus, the standardized variables $S_n$ asymptotically have the standard normal distribution.
We have thus proved a special version of the following fundamental theorem. Although the calculation is merely a manipulation of moment generating functions, many special cases were proved in different ways, providing explicit estimates for the speed of convergence, which of course is useful information in practice.
Notice that the following theorem does not require the probability distributions of the variables $X_i$ to coincide!
rearrange the expression:
$$P\left[\left|\frac{X}{n} - p\right| \le 0.02\right] = P\left[-0.02 \le \frac{X}{n} - p \le 0.02\right] = P\left[-0.02\,n \le X - pn \le 0.02\,n\right]$$
$$= P\left[\frac{-0.02\,n}{\sqrt{np(1-p)}} \le \frac{X - pn}{\sqrt{np(1-p)}} \le \frac{0.02\,n}{\sqrt{np(1-p)}}\right] \approx 2\Phi\left(\frac{0.02\,n}{\sqrt{np(1-p)}}\right) - 1 \ge 0.99,$$
i.e.,
$$\Phi\left(\frac{0.02\,n}{\sqrt{np(1-p)}}\right) \ge 0.995.$$
Since the distribution function is increasing, the last condition is equivalent to
$$\frac{0.02\,n}{\sqrt{np(1-p)}} \ge z(0.995) \approx 2.576,$$
that is,
$$\sqrt{n} \ge 50\cdot 2.576\cdot\sqrt{p(1-p)},$$
and in the worst case $p(1-p) = \frac{1}{4}$,
$$n \ge (25\cdot 2.576)^2 \approx 4147.$$
Here, we used the fact that the maximum of the function $p(1-p)$ is $\frac{1}{4}$, and it is reached at $p = \frac{1}{2}$. We can see that if e.g. $p = 0.1$, then $\sqrt{p(1-p)} = 0.3$ and the value of the least $n$ is lower. This accords with our expectations: for less popular parties, it suffices to have fewer respondents (if the agency estimates the gain of such a party to be around 2 % without asking anybody, then the wanted precision is almost guaranteed).
□
10.1.11. Two-choice test. Consider random vectors $Y_1$ and $Y_2$ all of whose components are pairwise independent random variables with normal distribution, and suppose that the components of the vector $Y_i$ have expected value $\mu_i$, while the variance $\sigma^2$ is the same for all the components of both vectors.
Central limit theorem
Theorem. Consider a sequence of independent random variables $X_i$ which have the same expected value $\operatorname{E}X_i = \mu$, variance $\operatorname{var}X_i = \sigma^2 > 0$, and uniformly bounded third absolute moment $\operatorname{E}|X_i|^3 \le C$. Then, the distribution of the random variable
$$S_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_i - \mu}{\sigma}$$
satisfies
$$\lim_{n\to\infty} P(S_n \le x) = \Phi(x)$$
for every $x \in \mathbb{R}$, where $\Phi$ denotes the distribution function of $N(0, 1)$.
$\varphi(72) = 72\cdot\left(1 - \frac{1}{2}\right)\cdot\left(1 - \frac{1}{3}\right) = 24$. Alternatively, every $n > 2$ is either divisible by an odd prime $p$ (then $\varphi(n)$ is divisible by $p - 1$, which is an even integer) or $n$ is a (higher-than-first) power of two (and then $\varphi(2^\alpha) = 2^{\alpha-1}$ is even
Corollary. Let $a, b \in \mathbb{N}$, $(a, b) = 1$. Then
$$\varphi(a\cdot b) = \varphi(a)\cdot\varphi(b).$$
Now, we will demonstrate the use of the Möbius inversion formula on a more complex example from the theory of finite fields. Let us consider a $p$-element field $\mathbb{F}_p$ (i.e., the ring of residue classes modulo a prime $p$) and examine the number $N_d$ of monic irreducible polynomials of a given degree $d$ over this field. Let $S_d(x)$ denote the product of all such polynomials. Now, we borrow a (not very hard) theorem from the theory of finite fields which states that for all $n \in \mathbb{N}$, we have
$$x^{p^n} - x = \prod_{d\mid n} S_d(x).$$
Comparing the degrees of the polynomials on both sides yields
$$p^n = \sum_{d\mid n} d\,N_d,$$
as well). Altogether, we have found out that $\varphi(n)$ is odd only for $n = 1, 2$.
ii) The integer $2n+1$ is odd, so $(2, 2n+1) = 1$, and hence
$$\varphi(4n+2) = \varphi(2\cdot(2n+1)) = \varphi(2)\cdot\varphi(2n+1) = \varphi(2n+1). \qquad \square$$
11.B.13. Find all natural numbers m for which:
i) $\varphi(m) = 30$,
ii) $\varphi(m) = 34$,
iii) $\varphi(m) = 20$.
Solution. i) $\varphi(3) = 2$, $\varphi(3^2) = \varphi(7) = 6$, $\varphi(11) = 10$ are all integers which divide 30 with an odd quotient greater than 1. Therefore, if we had, for instance, $m = 7\cdot m_1$, where $7 \nmid m_1$, then we would also have $\varphi(m_1) = 5$, which is impossible, as we know from the previous exercise.
We thus get $\beta = \gamma = \delta = 0$ and $m = 2^\alpha\cdot 31^\varepsilon$, whence we can easily obtain the solutions $m \in \{31, 62\}$.
By the Möbius inversion formula, it follows that
$$N_n = \frac{1}{n}\sum_{d\mid n}\mu(d)\,p^{n/d}.$$
In particular, we can see that for any $n \in \mathbb{N}$, it holds that $N_n = \frac{1}{n}\left(p^n - \cdots + \mu(n)\,p\right) \ne 0$, since the expression in the parentheses is a sum of distinct powers of $p$ multiplied by coefficients $\pm 1$, so it cannot be equal to 0. Therefore, there exist irreducible polynomials over $\mathbb{F}_p$ of an arbitrary degree $n$, so there are finite fields $\mathbb{F}_{p^n}$ (having $p^n$ elements) for any prime $p$ and natural number $n$ (in the theory of field extensions, such a field is constructed as the quotient ring $\mathbb{F}_p[x]/(f)$ of the ring of polynomials over $\mathbb{F}_p$ modulo the ideal generated by an irreducible polynomial $f \in \mathbb{F}_p[x]$ of degree $n$, whose existence has just been proved).
11.3.3. Example. By the formula we have proved, the number of (monic) irreducible polynomials over $\mathbb{F}_2$ of degree 5 is equal to
$$N_5 = \frac{1}{5}\sum_{d\mid 5}\mu(d)\,2^{5/d} = \frac{1}{5}\left(\mu(1)\cdot 2^5 + \mu(5)\cdot 2\right) = 6.$$
The number of monic irreducible polynomials over $\mathbb{F}_3$ of degree four is then
$$N_4 = \frac{1}{4}\sum_{d\mid 4}\mu(d)\,3^{4/d} = \frac{1}{4}\left(\mu(1)\cdot 3^4 + \mu(2)\cdot 3^2 + \mu(4)\cdot 3\right) = \frac{1}{4}(81 - 9) = 18.$$
11.3.4. Fermat's little theorem, Euler's theorem. These theorems belong to the most important results of elementary number theory, and they will often be applied in further theoretical as well as practical problems.
Theorem (Fermat's little). Let $a$ be an integer and $p$ a prime, $p \nmid a$. Then,
$$a^{p-1} \equiv 1 \pmod{p}.$$
Proof. The statement will follow as a simple consequence of Euler's theorem (and together with this one, it is a consequence of more general Lagrange's theorem 12.3.10). However, it can be proved directly (by mathematical induction or a combinatorial means, as mentioned in exercise 11.B.15). □
Sometimes, Fermat's little theorem is presented in the following form, which is apparently equivalent to the original statement.
Corollary. Let $a$ be an integer and $p$ a prime. Then,
$$a^p \equiv a \pmod{p}.$$
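This form of the statement can be spot-checked mechanically with fast modular exponentiation (Python's built-in three-argument pow):

    # Numerical confirmation of a^p = a (mod p) for a few primes.
    for p in (5, 11, 41):
        assert all(pow(a, p, p) == a % p for a in range(-10, 100))
    print("a^p = a (mod p) confirmed for p = 5, 11, 41")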
Before formulating and proving Euler's theorem, we introduce a few useful concepts.
ii) Similarly to above, only primes $p \in \{2, 3\}$ can divide $m$, and the prime 3 can divide $m$ only in the first power. However, since $34/2 = 17$ is odd, the prime 3 cannot divide $m$ at all. The remaining possibility, $m = 2^\alpha$, leads to $34 = 2^{\alpha-1}$, which is also impossible. Therefore, there is no such number $m$.
iii) Now, every prime $p$ dividing $m$ must satisfy $p - 1 \mid 20$, so $p - 1 \in \{1, 2, 4, 5, 10, 20\}$, which is satisfied by the primes $p \in \{2, 3, 5, 11\}$, and only 2 and 5 of those can divide $m$ in a higher power. We thus have
$$m = 2^\alpha\, 3^\beta\, 5^\gamma\, 11^\delta,$$
where $\alpha \in \{0, 1, 2, 3\}$, $\gamma \in \{0, 1, 2\}$, $\beta, \delta \in \{0, 1\}$.
First, consider $\delta = 1$. Then, $\varphi(2^\alpha 3^\beta 5^\gamma) = 2$, whence we easily get that $\gamma = 0$ and $(\alpha, \beta) \in \{(2, 0), (1, 1), (0, 1)\}$, which gives three solutions: $m \in \{44, 66, 33\}$.
Further, let us have $\delta = 0$. If $\gamma = 2$, then $\varphi(2^\alpha 3^\beta) = 1$, whence $(\alpha, \beta) \in \{(1, 0), (0, 0)\}$. We thus obtain two more solutions: $m \in \{50, 25\}$.
If $\gamma = 1$, then we get $\varphi(2^\alpha 3^\beta) = 5$, similarly to the above item. This is an odd integer greater than 1, so we get no solutions in this case. This is also the case for $\gamma = 0$, since the equation $\varphi(2^\alpha 3^\beta) = 20$ has no solution either.
Altogether, there are five satisfactory values $m \in \{25, 33, 44, 50, 66\}$.
iv) This problem is of a different kind than the previous ones, so we must approach it otherwise.
11.B.14. Find all two-digit numbers $n$ for which $9 \mid \varphi(n)$. ○
Residue systems
A complete residue system modulo $m$ is an arbitrary $m$-tuple of integers which are pairwise incongruent modulo $m$ (the most commonly used $m$-tuple is $0, 1, \dots, m-1$ or, for odd $m$, its "symmetric" variation $-\frac{m-1}{2}, \dots, -1, 0, 1, \dots, \frac{m-1}{2}$).
A reduced residue system modulo $m$ is an arbitrary $\varphi(m)$-tuple of integers which are coprime to $m$ and pairwise incongruent modulo $m$.

11.B.15. Prove Fermat's little theorem: for any integer $a$ and any prime $p$ which does not divide $a$, it holds that $a^{p-1} \equiv 1 \pmod{p}$.
Solution. First, we prove (by induction on $a$) that an apparently equivalent statement, $a^p \equiv a \pmod{p}$, holds for any $a \in \mathbb{Z}$ and prime $p$. For $a = 1$, there is nothing to prove. Further, let us assume that the proposition holds for $a$ and prove its validity for $a + 1$. It follows from the induction hypothesis and exercise 11.B.6 that
$$(a+1)^p \equiv a^p + 1^p \equiv a + 1 \pmod{p},$$
which is what we were to prove.
The statement holds trivially for $a = 0$ as well as in the case $a < 0$, $p = 2$. The validity for $a < 0$ and $p$ odd can be obtained easily from the above: since $-a$ is a positive integer, we get $-a^p = (-a)^p \equiv -a \pmod{p}$, whence $a^p \equiv a \pmod{p}$.
The combinatorial proof is a somewhat "cunning" one: similarly to problems using Burnside's lemma (see exercise 12.G.1), we are to determine how many necklaces can be created by stringing a given number of beads, of which there is a given number of types. Having $a$ types of beads, there are clearly $a^p$ necklaces of length $p$, $a$ of which consist of a single bead type. From now on, we will be interested only in the other ones, of which there are thus $a^p - a$. Apparently, each necklace is transformed into itself by rotating by $p$ beads. In general, a necklace can be transformed into itself by rotating by another number of beads, but this number can never be coprime to $p$ (for instance, considering $p = 8$ and the necklace ABABABAB, rotations by 2, 4, or 6 beads leave it unchanged). However, if $p$ is a prime, it follows that all rotations lead to different necklaces. Therefore, if we do not distinguish necklaces which differ in rotation only (i.e., in the position of the "knot"), there are exactly
$$\frac{a^p - a}{p}$$
of them, which especially means that $p \mid a^p - a$.
As an example, let us consider the case $a = 2$, $p = 5$, i.e., necklaces of length 5 consisting of 2 bead types (A, B). There are $2^5 = 32$ necklaces in total, 2 of which consist of a single bead type (AAAAA, BBBBB). Leaving them aside, the remaining 30 necklaces split into $\frac{30}{5} = 6$ rotation classes.
modulus is surely not greater than $\varphi(m)$. Suppose that $a$ has order $r$ modulo $m$ and $t \ge s$. Dividing the integer $t - s$ by $r$ with remainder, we get $t - s = q\cdot r + z$, where $q, z \in \mathbb{N}_0$, $0 \le z < r$.
" -<= " Since t = s (mod r), we have 2 = 0, hence at-s _ aqr _ ^ary = yi (moci my Multiplying both
sides of the congruence by the integer as leads to the wanted statement.
" => " It follows from a* = as (mod m) that as ■ aqr+z = as (moc[ my Since ar = 1 (mod m), we also have aqr+z = az (mod m). Altogether, after dividing both sides of the first congruence by the integer as (which is co-prime to the modulus), we get az = 1 (mod m). Since z < r, it follows from the definition of the order that 2 = 0, hence r | t — s. □
The above theorem and Euler's theorem apparently lead to the following corollary (whose second part is only a reformulation of Lagrange's theorem 12.3.10 for our situation):
Corollary. Let $m \in \mathbb{N}$, $a \in \mathbb{Z}$, $(a, m) = 1$, and let $r$ be the order of $a$ modulo $m$.
(1) For any $n \in \mathbb{N}\cup\{0\}$, it holds that
$$a^n \equiv 1 \pmod{m} \iff r \mid n.$$
(2) $r \mid \varphi(m)$. □
The last statement of this series connects the orders of two integers to the order of their product:
Lemma. Let $m \in \mathbb{N}$, $a, b \in \mathbb{Z}$, $(a, m) = (b, m) = 1$. If $a$ has order $r$ and $b$ has order $s$ modulo $m$, where $(r, s) = 1$, then the integer $a\cdot b$ has order $r\cdot s$ modulo $m$.
Proof. Let $\delta$ denote the order of $a\cdot b$. Then, $(ab)^\delta \equiv 1 \pmod{m}$. Raising both sides of this congruence to the $r$-th power leads to $a^{r\delta}b^{r\delta} \equiv 1 \pmod{m}$. Since $r$ is the order of $a$, we have $a^{r\delta} \equiv 1 \pmod{m}$, i.e., $b^{r\delta} \equiv 1 \pmod{m}$, and so $s \mid r\delta$. From $r$ being coprime to $s$, we get $s \mid \delta$. Analogously, we can get $r \mid \delta$, so (again utilizing that $r$, $s$ are coprime) $r\cdot s \mid \delta$. On the other hand, we clearly have $(ab)^{rs} \equiv 1 \pmod{m}$, hence $\delta \mid rs$. Altogether, $\delta = rs$. □
11.3.7. Primitive roots. Among the integers coprime to a modulus $m$ (i.e., the elements of a reduced residue system modulo $m$), the most important ones are those whose order is equal to $\varphi(m)$; such an integer is called a primitive root modulo $m$. For a fixed primitive root $g$, the exponent $x$ with $g^x \equiv a \pmod{m}$ is called the discrete logarithm or index of the integer $a$ (with respect to the given modulus $m$ and the primitive root $g$), and it gives a bijection between the sets $\{a \in \mathbb{Z};\ (a, m) = 1,\ 0 \le a < m\}$ and $\{x \in \mathbb{Z};\ 0 \le x < \varphi(m)\}$.

11.3.8. Theorem. Let $m \in \mathbb{N}$, $m > 1$. The modulus $m$ has primitive roots if and only if at least one of the following conditions holds:
• m = 2 or m = 4,
• $m$ is a power of an odd prime,
• m is twice a power of an odd prime.
The proof of this theorem will be done in several steps. We can easily see that 1 is a primitive root modulo 2 and 3 is a primitive root modulo 4. Further, we will show that primitive roots exist modulo any odd prime (in algebraic words, this is another proof of the fact that the group $(\mathbb{Z}_m^*, \cdot)$ of invertible residue classes modulo a prime $m$ is cyclic; see also 12.3.8).
Proposition. Let p be an odd prime. Then there are primitive roots modulo p.
Proof. Let $r_1, r_2, \dots, r_{p-1}$ be the orders of the integers $1, 2, \dots, p-1$ modulo $p$. Let $S = [r_1, r_2, \dots, r_{p-1}]$ be the least common multiple of these orders. We will show that there is an integer of order $S$ among $1, 2, \dots, p-1$ and that $S = p - 1$.
Let $S = q_1^{\alpha_1}\cdots q_k^{\alpha_k}$ be the factorization of $S$ into primes. For every $s \in \{1, \dots, k\}$, there is a $c \in \{1, \dots, p-1\}$ such that $q_s^{\alpha_s} \mid r_c$ (otherwise, there would be a common multiple of the integers $r_1, r_2, \dots, r_{p-1}$ less than $S$). Therefore, there exists an integer $b$ such that $r_c = b\cdot q_s^{\alpha_s}$. Since $c$ has order $r_c$, the order of the integer $g_s = c^b$ is equal to $q_s^{\alpha_s}$ (by the
is thus sufficient to determine the remainder of the exponent modulo 5. We have
$$14^{13} \equiv (-1)^{13} \equiv -1 \equiv 4 \pmod{5},$$
so the wanted remainder is $4^4 = 2^8 = 256 \equiv 3 \pmod{11}$. Alternatively, we could have finished the calculation as follows: $4^4 \equiv 4^{-1} \equiv 3 \pmod{11}$.
theorem 11.3.6 on orders of powers).
Reasoning analogously for any $s \in \{1, \dots, k\}$, we get integers $g_1, \dots, g_k$, and we can set $g := g_1\cdots g_k$. From the properties of the order of a product, we get that the order of $g$ is equal to the product of the orders of the integers $g_1, \dots, g_k$, i.e., to $q_1^{\alpha_1}\cdots q_k^{\alpha_k} = S$.
Now, we prove that $S = p - 1$. Since the orders of the integers $1, 2, \dots, p-1$ divide $S$, we get the congruence $x^S \equiv 1 \pmod{p}$ for any $x \in \{1, 2, \dots, p-1\}$. By theorem 11.4.8, there are at most $S$ solutions to a congruence of degree $S$ modulo a prime $p$ (in algebraic words, we are actually looking for roots of a polynomial over a field, and there cannot be more of them than the degree of the polynomial, as we will see in part 12.2.4). On the other hand, we have already shown that this congruence has $p - 1$ solutions, so necessarily $S \ge p - 1$. Still, $S$ is (being the order of $g$) a divisor of $p - 1$, whence we finally get the wanted equality $S = p - 1$. □
Now, we show that there are primitive roots modulo powers of odd primes. First, we prove two helping lemmas.
Lemma. Let $p$ be an odd prime, $\ell \ge 2$ arbitrary. Then, it holds for any $a \in \mathbb{Z}$ that
$$(1 + ap)^{p^{\ell-2}} \equiv 1 + ap^{\ell-1} \pmod{p^\ell}.$$
Proof. This will follow easily from the binomial theorem using mathematical induction on $\ell$.
I. The statement is clearly true for $\ell = 2$.
II. Let the statement be true for $\ell$, and let us prove it for $\ell + 1$.
□
11.B.25. Determine the last two digits of the decimal expansion of the number $14^{14^{14}}$.
Solution. We are interested in the remainder of the number $a = 14^{14^{14}}$ upon division by 100. However, since $(14, 100) > 1$, we cannot consider the order of 14 modulo 100. Instead, we can factor the modulus into coprime integers: $100 = 4\cdot 25$. Apparently, $4 \mid a$, so it remains to find the
remainder of $a$ modulo 25. By Euler's theorem, we have
$$14^{\varphi(25)} = 14^{20} \equiv 1 \pmod{25},$$
so we are interested in the remainder of $14^{14}$ upon division by $20 = 4\cdot 5$. Again, we clearly have $4 \mid 14^{14}$, and further $14^{14} \equiv (-1)^{14} \equiv 1 \pmod{5}$, so $14^{14} \equiv 16 \pmod{20}$. Altogether,
M14" = M16 = 216 . 716 (mod25).
We can simplify the computation to come a lot if we realize that
$$7^2 \equiv -1 \pmod{25}, \quad \text{and} \quad 2^5 \equiv 7 \pmod{25}.$$
Then,
$$14^{14^{14}} \equiv 2^{16}\cdot 7^{16} = (2^5)^3\cdot 2\cdot 7^{16} \equiv 7^3\cdot 2\cdot 7^{16} = 2\cdot 7^{19} = 2\cdot(-1)^9\cdot 7 \equiv 11 \pmod{25}.$$
We are thus looking for a non-negative integer which is less than 100, is a multiple of 4, and leaves remainder 11 when divided by 25; the only such number is clearly 36. □
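The whole manual computation can be verified in one line, since built-in modular exponentiation handles the exponent tower directly, without any splitting of the modulus:

    # Direct verification of 11.B.25.
    exponent = 14 ** 14
    print(pow(14, exponent, 100))     # 36, matching the hand computation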
11.B.26. Determine the last three digits of the number 121C|11.
o
11.B.27. Find all natural numbers $n$ for which the integer $5^n - 4^n - 3^n$ is divisible by eleven.
Solution. The orders of all of the numbers 3, 4, and 5 modulo 11 are equal to five, so it suffices to examine $n \in \{0, 1, 2, 3, 4\}$. It can be seen from the following table
$n$:            0  1  2  3  4
$5^n \bmod 11$: 1  5  3  4  9
$4^n \bmod 11$: 1  4  5  9  3
$3^n \bmod 11$: 1  3  9  5  4
that only the case $n \equiv 2 \pmod{5}$ yields $3 - 5 - 9 \equiv 0 \pmod{11}$.
The problem is thus satisfied by exactly those natural numbers $n$ which satisfy $n \equiv 2 \pmod{5}$. □
Invoking exercise 11.B.7 and raising the statement for $\ell$ to the $p$-th power, we obtain
$$(1 + ap)^{p^{\ell-1}} \equiv (1 + ap^{\ell-1})^p \pmod{p^{\ell+1}}.$$
It follows from the binomial theorem that
$$(1 + ap^{\ell-1})^p = 1 + p\cdot a\cdot p^{\ell-1} + \sum_{k=2}^{p}\binom{p}{k}a^k p^{(\ell-1)k},$$
and since we have $p \mid \binom{p}{k}$ for $1 < k < p$ (by exercise 11.B.6), it suffices to show that $p^{\ell+1} \mid p^{1 + (\ell-1)k}$, which is equivalent to $1 \le (k-1)(\ell-1)$. Thanks to the assumption $\ell \ge 2$, we get that $p^{\ell+1} \mid p^{(\ell-1)p}$ for $k = p$ as well. □
Lemma. Let $p$ be an odd prime, $\ell \ge 2$ arbitrary. Then, it holds for any integer $a$ satisfying $p \nmid a$ that the order of $1 + ap$ modulo $p^\ell$ equals $p^{\ell-1}$.
Proof. By the previous lemma, we have
$$(1 + ap)^{p^{\ell-1}} \equiv 1 + ap^{\ell} \pmod{p^{\ell+1}},$$
and considering this congruence modulo $p^\ell$, we get $(1 + ap)^{p^{\ell-1}} \equiv 1 \pmod{p^\ell}$. At the same time, it follows directly from the previous lemma and $p$ not being a divisor of $a$ that $(1 + ap)^{p^{\ell-2}} \not\equiv 1 \pmod{p^\ell}$, which gives the wanted proposition. □
Proposition. Let $p$ be an odd prime. Then, for every $\ell \in \mathbb{N}$, there is a primitive root modulo $p^\ell$.
Proof. Let $g$ be a primitive root modulo $p$. We will show that if $g^{p-1} \not\equiv 1 \pmod{p^2}$, then $g$ is a primitive root even modulo $p^\ell$ for any $\ell \in \mathbb{N}$. (If we had $g^{p-1} \equiv 1 \pmod{p^2}$, then $(g+p)^{p-1} \equiv g^{p-1} + (p-1)g^{p-2}p \equiv 1 + (p-1)g^{p-2}p \not\equiv 1 \pmod{p^2}$, so we could choose $g + p$ for the original primitive root instead of the congruent integer $g$.)
Let $g$ satisfy $g^{p-1} \not\equiv 1 \pmod{p^2}$. Then, there is an $a \in \mathbb{Z}$, $p \nmid a$, such that $g^{p-1} = 1 + p\cdot a$. We will show that the order of $g$ modulo $p^\ell$ is $\varphi(p^\ell) = (p-1)p^{\ell-1}$. Let $n$ be the least natural number which satisfies $g^n \equiv 1 \pmod{p^\ell}$. By the previous lemma, the order of $g^{p-1} = 1 + p\cdot a$ modulo $p^\ell$ is $p^{\ell-1}$. However, then it follows from the corollary of 11.3.5 that
$$(g^{p-1})^n = (g^n)^{p-1} \equiv 1 \pmod{p^\ell} \implies p^{\ell-1} \mid n.$$
At the same time, the congruence $g^n \equiv 1 \pmod{p}$ implies that $p - 1 \mid n$. From $p - 1$ and $p^{\ell-1}$ being coprime, we get that $(p-1)p^{\ell-1} \mid n$. Therefore, $n = (p-1)p^{\ell-1}$. □
Now, we will verify whether 2 is a primitive root modulo 41. Since the order of any integer coprime to 41 divides $\varphi(41) = 40 = 2^3\cdot 5$, it holds that an integer $g$ coprime to 41 is a primitive root modulo 41 if and only if
$$g^{20} \not\equiv 1 \pmod{41} \quad\text{and}\quad g^8 \not\equiv 1 \pmod{41}.$$
Now, we will go through the potential primitive roots in ascending order:
$g = 2$: $2^8 = 2^5\cdot 2^3 \equiv (-9)\cdot 8 \equiv 10 \pmod{41}$, $2^{20} = (2^5)^4 \equiv (-9)^4 = 81^2 \equiv (-1)^2 = 1 \pmod{41}$;
$g = 3$: $3^8 = (3^4)^2 \equiv (-1)^2 = 1 \pmod{41}$;
$g = 4$: the order of $4 = 2^2$ always divides the order of 2;
$g = 5$: $5^8 = (5^2)^4 \equiv (-2^4)^4 = (2^8)^2 \equiv 10^2 \equiv 18 \pmod{41}$, $5^{20} = (5^2)^{10} \equiv (-2^4)^{10} = 2^{40} = (2^{20})^2 \equiv 1 \pmod{41}$;
$g = 6$: $6^8 = 2^8\cdot 3^8 \equiv 10\cdot 1 = 10 \pmod{41}$, $6^{20} = 2^{20}\cdot 3^{20} = 2^{20}\cdot(3^8)^2\cdot 3^4 \equiv 1\cdot 1\cdot(-1) = -1 \pmod{41}$.
We have thus proved that 6 is the least positive primitive root modulo 41 (if we were interested in other primitive roots modulo 41 as well, we would get them as the powers of 6 with exponent taking on values from the range 1 to 40 which are coprime to 40; there are exactly $\varphi(40) = \varphi(2^3\cdot 5) = 16$ of them, and the resulting remainders modulo 41 are $\pm 6, \pm 7, \pm 11, \pm 12, \pm 13, \pm 15, \pm 17, \pm 19$).
Now, if we prove that $6^{40} \not\equiv 1 \pmod{41^2}$, we will know that 6 is a primitive root modulo any power of 41 (if we had "bad luck" and found out that $6^{40} \equiv 1 \pmod{41^2}$, then a
The subsequent proposition describes the case of powers of two. We will use similar helping lemmas as in the case of odd primes.
Lemma. Let $\ell \in \mathbb{N}$, $\ell \ge 3$. Then
$$5^{2^{\ell-3}} \equiv 1 + 2^{\ell-1} \pmod{2^\ell}.$$
Proof. Similarly as in the case of odd primes above. □
Lemma. Let $\ell \in \mathbb{N}$, $\ell \ge 3$. Then the order of the integer 5 modulo $2^\ell$ is $2^{\ell-2}$.
Proof. Easily from the above lemma. □
Proposition. Let $\ell \in \mathbb{N}$. There are primitive roots modulo $2^\ell$ if and only if $\ell \le 2$.
Proof. Let $\ell \ge 3$. Then the set
$$S = \{(-1)^a\cdot 5^b;\ a \in \{0, 1\},\ 0 \le b < 2^{\ell-2},\ b \in \mathbb{Z}\}$$
forms a reduced residue system modulo $2^\ell$: it has $\varphi(2^\ell)$ elements, and it can be easily verified that they are pairwise incongruent modulo $2^\ell$.
At the same time (utilizing the previous lemma), the order of every element of $S$ apparently divides $2^{\ell-2}$. Therefore, this reduced system cannot (and nor can any other) contain an element of order $\varphi(2^\ell) = 2^{\ell-1}$. □
The last piece of the jigsaw puzzle of propositions which collectively prove theorem 11.3.8 is the statement about the nonexistence of primitive roots for composite numbers which are neither a power of a prime nor twice such a power.
Proposition. Let $m \in \mathbb{N}$ be divisible by at least two primes, and let it not be twice a power of an odd prime. Then, there are no primitive roots modulo $m$.
Proof. Let $m$ factor into primes as $2^\alpha p_1^{\alpha_1}\cdots p_k^{\alpha_k}$, where $\alpha \in \mathbb{N}_0$, $\alpha_i \in \mathbb{N}$, $2 \nmid p_i$, and $k \ge 2$, or both $k \ge 1$ and $\alpha \ge 2$. Denoting $S = [\varphi(2^\alpha), \varphi(p_1^{\alpha_1}), \dots, \varphi(p_k^{\alpha_k})]$, we can easily see that $S < \varphi(2^\alpha)\cdot\varphi(p_1^{\alpha_1})\cdots\varphi(p_k^{\alpha_k}) = \varphi(m)$ and that for any $a \in \mathbb{Z}$, $(a, m) = 1$, we have $a^S \equiv 1 \pmod{m}$. Therefore, there are no primitive roots modulo $m$. □
In general, it is computationally very hard to find a primitive root for a given modulus. The following theorem describes a necessary and sufficient 5 condition for the examined integer to be a primitive root.
11.3.9. Theorem. Let m be such an integer that there are primitive roots modulo m. Let us write p(m) = q"1 ■ ■ ■ q^k, where qi,..., qu are primes and Qi,..., G N. Then, for every j 6 Z, (g, m) = 1, it holds that g is a primitive root modulo m, if and only if neither of the following congruences holds:
j 'i =1 (mod m),
., g "k =1 (mod m).
Proof. If either of the congruences were true, it would mean that the order of g is less than p(ra).
798
CHAPTER 11. ELEMENTARY NUMBER THEORY
primitive root modulo 412 would be 47 = 6 + 41). To avoid manipulating huge numbers when verifying the condition, we will use several tricks (the so-called residue number system).
First of all, we calculate the remainder of 68 upon division by 412; this problem can be further reduced to computing the remainders of the integers 28 and 38:
28 = 256 = 6-41 + 10 (mod412),
38 = (34)2 = (2 ■ 41 - l)2 = -4 ■ 41 + 1 (mod 412).
Then,
68 = 28 ■ 38 = (6 ■ 41 + 10)(-4 -41 + 1) = -34 ■ 41 + 10 = 7 ■ 41 + 10 (mod 412)
and
64o = (-68)5 = (7 . 41 + 10)5 = (io5 + 5 ■ 7 ■ 41 ■ 104) = 104(10 + 35 ■ 41) = (-2 ■ 41 - 4)(-6 ■ 41 + 10) = (4-41 -40) = 124 ^ 1 (mod412).
In the calculation, we made use of the fact that 104 = 6 ■ 412 - 86, i.e., 104 = -2 ■ 41 - 4 (mod 412).
Therefore, 6 is a primitive root modulo 412, and since it is an even integer, we can see that 1687 = 6 + 412isa primitive root modulo 2 ■ 412 (while the least positive primitive root modulo 2 ■ 412 is the integer 7). □
C. Solving congruences
Linear congruences. The following exercise illustrates that a /3 {m)/d = gbV(m)/d (mod m) bip{m
is true if and only if that any quadratic congruence can be transit »•^-'7' formed to the (possibly system of congruences of) binomial form x2 = a (mod p), and then we can decide about the solvability using the Legendre symbol. Let us illustrate it on several examples.
11.C.23. Determine the number of solutions of the congruence 13a;2 + 7x + 1 = 0 (mod 37). Solution. First, we need to normalize the polynomial on the left-hand side, i.e. we have to find the inverse of 13 modulo 37. Using the Eucliean algorithm we find that the inverse is 20, and after multiplication of both sides of the congruence by it and reducing modulo 37 we obtain the congruence a;2 +29a;+ 20 = 0 (mod 37). Now we complete the square (an odd coefficient 29 does not cause any trouble as it can be replaced by -8) and we obtain (x - A)2 + A = 0 (mod 37). After substitution of y for x — A we finally obtain the congruence in a binomial form
y2 = -4 (mod 37).
The fact that this congruence is solvable can be established either using theorem 11.4.10, or with use of the Legendre symbol. The former approach leads to the calculation d =
(2,^.(37)) = 2, and
(-4)^=1 (mod 37), while the latter one gives
37) ~ ( 37 ) ' (37) ~ 1 by the corollary after theorem 11.4.13 (as 37 = 1 (mod 4)). In any way we have obtained that the given congruence has d = 2 solutions. □
11.C.24. Solve the congruence 6a;2 +x—1 = 0 (mod 29). Solution. Although we have not presented any special method for finding solutions of quadratic congruence yet (apart from the general method for binomial congruences or going through the complete residue system) we will see that in some case the set of solutions can be easily established. Let us first proceed in the usual way: multiplying the congruence by 5 (it is the inverse of 6 modulo 29) we obtain a;2 + 5x — 5 = 0 (mod 29), and after completing the square we have
(a;-12)2 = 4 (mod 29).
Law of quadratic reciprocity
11.4.13. Theorem. Let p, q be odd primes. Then,
(1) (f) = (-1) V,
(2) (2) = (-1)^,
(3) (§) = (§)■(-!)"•
The theorem is put this way mainly because we can calculate the value (a/p) for any integer a using these three formulae and the basic rules for the Legendre symbol.
Example. Let us calculate the value (79/101) using the properties of the Legendre symbol.
79 \ _ /101
Toy ~ \~79
22 79
79 J \79 11
since 101 is congruent to 1 modulo A
79
= (-1)
since 79 is congruent to — 1 modulo 8
since 11 = 79 = 3 (mod 4)
(-!)( — ) =1 since II = 3 (mod 8).
Many proofs of the the quadratic reciprocity law can be found in literature6. However, many of them (especially the shorter ones) usually make use of deeper knowledge from algebraic number theory. We will 1 present an elementary proof of this theorem here.
Let S denote the reduced residue system of the least residues (in absolute value) modulo p, i.e.,
g _ r p-t p-3 _i 1 p-3 p-i 1
l 2 ' 2 ' " " " ' 2' 2 J "
Further, for a e Z, p \ a, let nv(a) denote the number of negative least residues (in absolute value) of the integers
1 ■ a, 2 ■ a,..., —— ■ a,
i.e., we decide for each of these integers to which integer from the set S it is congruent and count the number of the negative ones. If it is clear from context which values a,p we mean, we will usually omit the parameters and write only fi instead
of fip(a).
Example. We determine nP(a) for the prime p = 11 and the integer a = 3.
In 2000, F. Lemmermeyer stated 233 proofs - see F. Lemmermeyer, Reciprocity laws. From Euler to Eisenstein, Springer. 2000
810
CHAPTER 11. ELEMENTARY NUMBER THEORY
We immediately see that this congruence is solvable with the pairof solutionsx—12 = ±2 (mod 29), andfhusa; = 10,14 (mod 29).
We could have also seen almost immediately that the given polynomial can be factored as Qx2 + x — 1 = (3a; — 1) (2x+1), and thus the prime modulus 29 has to divide either 3a; — 1 or 2a; + 1. The obtained linear congruences 3a; = 1 (mod 29) and 2a; = — 1 (mod 29) easily yield the same solutions x = 10 (mod 29) and x = 14 (mod 29) as above. □
11.C.25. Find all integers which satisfy the congruence
x2 = 7 (mod 43).
Solution. The Legendre symbol evaluates to
,43)= ~ {t) = ~ G,
Hence it follows that 7 is a quadratic nonresidue modulo 43, so there is no solution of the given congruence. □
11.C.26. Find all integers a for which the congruence
Now, the reduced residue system we are interested in is S = {—5,..., —1,1,,..., 5}, and for a = 3, we calculate
3
-5 -2 1
4
(mod 11) (mod 11) (mod 11) (mod 11) (mod 11),
whence /in (3) = 2.
We will show in the following statement that this integer is tightly connected to the Legendre symbol - the value of the symbol (3/11) can be determined in terms of the p function
as (-l)Mn(3) = (_1)2 = L
11.4.14. Lemma (Gauss). Ifp is an odd prime, a G Z, p \ a, then the value of the Legendre symbol satisfies
= -1.
= a (mod 43)
is solvable.
Proof. For each integer i G {1,2,..., 1~-}, we set a value mi G {1,2,..., ^5^} so that i ■ a = ±m, (mod p). We can easily see that if fc, / G {1,2,..., Pj^L} are different, then the values m^, mj are also different (the equality mk = mi would imply that k ■ a = ±1 ■ a (mod p), and hence k = ±1 (mod p), which cannot be satisfied unless k = I).
Therefore, the sets {1,2,...,^} and
{mi, m2,..., rriv-i } coincide, which is also illustrated
2
by the above example. Multiplying the congruences
p-1
leads to
Solution. This exercise is a follow-up to the above one, from which we can see that the integer 7 does not meet the requirement. We can test all the remainders modulo 43 in the same way, but there is a simpler method. The congruence is surely solvable if a is a multiple of 43 (then, it has a unique solution); and if not, it must be a quadratic residue modulo 43. The quadratic residues can be most simply enumerated by calculating the squares of all elements of a reduced residue system modulo 43.
The quadratic residues are thus the integers congruent to (±1)2, (±2)2, (±3)2,..., (±21)2 modulo43; so the problem is satisfied by exactly those integers a which are congruent to 11.4.12, whence (a/p) any one of 1,4, 6, 9,10,11,13,14,15,16,17,21,23,24, 25, 31,35,36,38,40,41. □
1 ■ a = ±mi
2 ■ a = ±?7i2
a = ±mP
(mod p), (mod p),
(mod p)
£fi! ■ = (-1)" ■ £fl! (mod p),
since there are exactly p negative values on the right-hand sides of the congruences. Dividing both sides by the integer !, we get the wanted statement, making use of lemma
(mod p).
□
Now, with the help of Gauss's lemma, we will prove the law of quadratic reciprocity.
Using law of quadratic reciprocity 11.4.13 we can calculate the value (a/p) for any integer a and an odd prime p. Moreover, evaluation of the Legendre symbol is fast enough even for high arguments, therefore using it is favourable to verifying criteria of the theorem 11.4.10.
Proof of the law of quadratic reciprocity. The first part has been already proven; for the rest, we first derive a lemma which will be utilized in the proof of both of the remaining parts.
Let a e Z, p \ a, k e N and let [x] and (x) denote the integer part (i.e. floor) and the fractional part, respectively, of
811
CHAPTER 11. ELEMENTARY NUMBER THEORY
11.C.27. Here, we recall the statement of the Law in the slightly modified way which is more suitable for direct calculations.
i) —1 is a quadratic residue for primes p which satisfy p = 1 (mod 4) and it is a quadratic nonresidue for primes p satisfying p = 3 (mod 4).
ii) 2 is a quadratic residue for primes p which satisfy p = ±1 (mod 8) and it is a quadratic nonresidue for primes p satisfying p = ±3 (mod 8).
iii) If p = 1 (mod 4) or q = 1 (mod 4), then (p/q) = (q/p); for other odd p, q, we have (p/q) = —(q/p).
Solution. We simply apply law of quadratic reciprocity in the appropriate cases.
i) The integer 1~- is even iff 4 | p — 1.
ii) We need to know for which odd primes p the exponent is
2—i
^-g— is even. Odd primes are congruent to ±1 or ±3 modulo 8, so we have (by 11.B.7) that either p2 = 1 (mod 16) orp2 = 9 (mod 16).
iii) This is clear from the law of quadratic reciprocity. D
11.C.28. Derive by straight calculation from Gauss's lemma 11.4.14 once again the so-called supplementary laws of quadratic reciprocity:
'2N
a real number x. Then,
-1\ p-1
and
p2-i
= (-!) — .
Solution. To evaluate (—1/p) in the former case, we should realize that fi tells the number of least (in absolute value) negative remainders of integers in the set
{-1,-2,...,-*=!}.
However, those are exactly the desired remainders and they
are all negative; hence we have /i = 1~- and (—1/p) =
p-i (-1) —.
In the latter case, we need to express the number of least (in absolute value) negative remainders of integers in the set
{1-2, 2-2, 3-2..., *=!■ 2}.
For any k e {1,2,..., *=! }, the integer 2 k leaves a negative
remainder if and only if 2k > *=!, i.e., iff k > *=!. Now, it
remains to determine the number of such integers k.
If p = 1 (mod 4), then this number is equal to 1~- —
£—! = 2—! so 4 4 >b"
(—1\ p-i p-i p+1 p2-i
[—) = (-iT = (-i)~ = = (-i)~
~2ak ak n/ak\ ak n/ak\
2 — + 2( —) = 2 — + 2( — >
. P . . P . \P / . P . \P /
This expression is odd if and only if (^} > \, which is iff the least residue (in absolute value) of the integer ak modulo p is negative (a watchful reader should notice the return from the calculations of (ostensibly) irrelevant expressions back to the conditions close to the Legendre symbol). The integer pp(a) thus has the same parity (is congruent to, modulo 2) as
E.
k=l I
], whence (thanks to Gauss's lemma) we get that
(-l)Ma) = (_i)
Furthermore, if a is odd, then a + p is even and we get
2 /a+p\
2a\ _ (2a + 2p P
A_a+p * 2
P J \ P
= (-l)^fc=i v~~r-
= (-i)E^! Lit J . (-i)E,=2! k.
Since the sum of the arithmetic series J2k=i ^ *s \ 2=! 2±! = 2^!, we get (for a odd) the relation
= (-1)
v 2
(-i)J
\p) (jv
which, for a = 1, gives the wanted statement of item 2.
By part (2), which we have already proved, and the previous equality, we now get for odd integers a that
(1)
Now, let us consider, for given primes p/g, the set T = {q ■ x; x e Z, 1 < x < (p - l)/2} x x {p-y; y G Z, 1 < y < (q - l)/2}.
We apparently have \T\ = *=! ■ We will show that we also have
(-i)m = (-1)2-5: [f] .(-i)ESf[^],
which will be sufficient thanks to the above.
Since the equality ga; = can happen for no pair of x, y from the permissible domain, the set T can be partitioned to (disjoint) subsets Tx and T2 so that Tx = T n
{(u,u);u,u G Z,u < u}, T2 = T\ Ti. Clearly, |Ti| is the number of pairs (qx,py) for which x < Ey. Since
812
CHAPTER 11. ELEMENTARY NUMBER THEORY
since £±1 is odd in this case.
Similarly, for p = 3 (mod 4), the number of such inte-
gers k equals
p— 1 p—3 _ p+1
—1\ P+1 P+1 P-1 P -1
—J = (-1)— = (-I) — '— = (-I)";
since 1~- is °dd in this case as well.
□
11.C.29. Solve the congruence x2 - 23 = 0 (mod 77). Solution. Factoring the modulus, we get the system
x2-l = 0 (mod 11), x2 -2 = 0 (mod 7).
Clearly, 1 is a quadratic residue modulo 11, so the first congruence of the system has (exactly) two solutions: x = ±1 (mod 11). Further, (2/7) = (9/7) = 1, and it should not take much effort to notice the solution: x = ±3 (mod 7).
We have thus obtained four simple systems of two linear congruences each. Solving them, we will get that the original congruence has the following four solutions: x = 10, 32, 45 or 67 (mod 77). □
11.C.30. Solve the congruence 7a;2 + 112a; + 42 = 0 (mod 473). O
Jacobi symbol. Jacobi symbol (a/b) is a generalization of the Legendre symbol to the case where the "lower" argument b need not be a prime, but any odd positive integer. It is defined as the product of the Legendre symbols corresponding to the prime factors of b: if b = p"1 ■ ■ ■ p^k, then
ybj \PlJ \Pk/
The primary motivation for introducing the Jacobi symbol is the necessity to evaluate the Legendre symbol (and thus to decide the solvability of quadratic congruences) without having to factor integers to primes. We will illustrate such calculation on an example having in mind that Jacobi symbol shares with the Legendre one not only the notation but also almost all of the (computational) properties.
11.C.31. Decide whether the congruence a;2 = 219 (mod 383) is solvable.
\y ^ \ ■ ^r- < %>we have [fy] ^ ^f1- For a fixed y>
in Ti, there are thus exactly those pairs (qx,py) for which 1 < x < [jy]; hence |Ti| = J2y=i)/2 [*v] ■ Analogously,
|T2| = ££i1)/2 [**].
By (1), we thus have (|) = (-l)|Tl1 and g) = (-1)IT2I, which finishes the proof of the law of quadratic reciprocity.
□
The evaluation of the Legendre symbol (as we saw in j|| the example above) allows us to only use the law of quadratic reciprocity for primes, so it forces us to factor integers to primes, which is a very *rz±i hard operation from the computational point of view. This can be mended by extending the definition of the Legendre symbol to the so-called Jacobi symbol with similar properties.
Definition. Let a e Z, b e N, 2 \ b. Let b factor as b = P1P2 ■ ■ ■ Pk to (odd) primes (here, we exceptionally do not group the same primes to a power of the prime, rather we
write each one explicitly, e.g. symbol
'a\ (a
J>J VPi. is called the Jacobi symbol.
135 = 3 ■ 3 ■ 3 ■ 5). The
P2
Pk
We show in the practical column that the Jacobi symbol has similar properties as the Legendre one. However, there is a substantial aberration-it is not generally true that (a/b) = 1 implies that the congruence x2 = a (mod 6) is solvable.
(-1) •(-!) = !,
Example.
J5) = (3) ' (5, but the congruence
x2 = 2 (mod 15)
has no solution (the congruence a;2 = 2 is solvable neither modulo 3 nor modulo 5).
Theorem (Law of quadratic reciprocity for the Jacobi symbol). Let a,b £N be odd integers. Then,
(1) (^) = (-1)^,
(2) (D=(-l)^.
(3) (§) = (£)■(-!)".
Proof. The proof is simple, utilizing the law of quadratic reciprocity for the Legendre symbol. See exercise 11.C.35. □
There is another application of the law of quadratic reci-jjj 1 procity in a certain sense - we can consider the question: For which primes is a given integer a a quadratic residue? We are already able to answer this question for a = 2, for example. The first step in answering this question is to do so for primes since the answer
813
CHAPTER 11. ELEMENTARY NUMBER THEORY
(Jacobi) as 383 = 219 = 3 (mod 4) 164 = 22 ■ 41 (Jacobi) as41 = 1 (mod 4)
Solution. Since 383 is a prime, the congruence will be solvable if the Legendre symbol will satisfy (219/383) = 1.
'219\ _ //383^ ,383) _ \219y
'164 \ _ / 41 K219j _ \219, '219> ~AA
v4l) = "fe) fe,
v41y '41N
-1N T
□
Now, we introduce several exercises proving that the Jacobi symbol has properties similar to the Legendre one, which relieves us of the necessity to factor the integers that appear when working purely with the Legendre symbol.
11.C.32. Prove that all odd positive numbers b, b' and all integers a, ai, a2 satisfy (the symbols used here are always the Jacobi ones):
i) if ai = a2 (mod 6), then (f) = (f),
ii) (flTl) = (f)(f).
iii)
1
as 41 = 1 (mod 8) as41 = 1 (mod 4) as 7 = 3 (mod 4).
Kbb>,
o
11.C.33. Prove that if a, b are odd natural numbers, then
q-l 1 b-1
2 + 2
Ü)
s 1 s
(mod 2), (mod 2).
Solution.
i) Since the integer (a — 1) (6 — 1) = (ab — 1) — (a — 1) — (b— 1) is amultipleof 4, we get (ab—1) = (a—l)+(b— 1) (mod 4), which gives what we want when divided by two.
ii) Similarly to above, (a2 - l)(b2 - 1) = (a2b2 - 1) -— (a2 — 1) — (b2 — 1) is a multiple of 16. Therefore, (a2b2 - 1) = (a2 - 1) + (b2 - 1) (mod 16), which gives the wanted statement when divided by eight (see also exercise 11.A.2). □
for composite values of a depends on the factorization of the integer a.
Theorem. Let q be an odd prime.
• If q = 1 (mod 4), then q is a quadratic residue modulo those primes p which satisfy p = r (mod q), where r is a quadratic residue modulo q.
• If q = 3 (mod 4), then q is a quadratic residue modulo those primes p which satisfy p = ±b2 (mod Aq), where b is odd and coprime to q.
Proof. The first theorem follows trivially from the law of quadratic reciprocity. Let us consider q = 3 (mod 4), i.e., (q/p) = (—^^(p/q). First of all, letp = +b2 (mod Aq), where b is odd, and hence b2 = 1 (mod 4). Then, p = b2 = 1 (mod 4) andp = b2 (mod q). Therefore, (—1)R2~ = 1 and (p/q) — 1. whence (q/p) = 1. Now, if p = —b2 (mod Aq), then we similarly get that p = —b2 = 3 (mod 4) and p = —b2 (mod q). Therefore, (—\)^~ = —\ and (p/q) = —1, whence we get again that (q/p) = 1.
For the opposite way, suppose that (q/p) = 1. There are two possibilities - either (—l)1^ = 1 and (p/q) = 1, or (—1)^2- = — land (p/q) = —1. In the former case, we have p = 1 (mod 4) and there is a b such that p = b2 (mod q). Further, we can assume without loss of generality that b is odd (if not, we could have taken b + q instead). However, then we get b2 = 1 = p (mod 4), and altogether p = b2 (mod Aq).
In the latter case, wehavep = 3 (mod 4) and (—p/q) = (-l/q)(p/q) = (-1)(-1) = 1. Therefore, there is a b (which can also be chosen so that it is odd) such that —p = b2 (mod q). We thus get —b2 = 3 = p (mod 4), and altogether p = —b2 (mod Aq). □
5. Diophantine equations
It is as early as in the third century AD when Diophantus of Alexandria dealt with miscellaneous equations while admitting only integers as solutions. And there is no wonder -in many practical problems that lead to equations, non-integer solutions may fail to have a meaningful interpretation. As an example, we can consider the problem of how to pay an exact amount of money with coins of given values.
In honor of Diophantus, equations for which we are interested in integer solutions only are called Diophantine equations.
Another nice example of a Diophantine equation is Eu-ler's relation
v-e+f=2
from graph theory, connecting the number of vertices, edges, and faces of a planar graph. Furthermore, if we restrict ourselves to regular graphs only, we get to the problem about existence of the so-called Platonic solids, which can be smartly described just as a solution of this Diophantine equation - for more information, see 13.1.22.
Unfortunately, there is no universal method for solving this kind of equations. There is even no method (algorithm)
814
CHAPTER 11. ELEMENTARY NUMBER THEORY
11.C.34. Prove that if a1:..., are odd natural numbers, then
i) "H^"1 = Eti ^ (mod 2),
ii)
re=i«i-i
= Eti ^ (mod 2).
O
11.C.35. Prove the law of quadratic reciprocity for the Ja-cobi symbol, i.e., prove that if a, b are odd natural numbers, then
i) (^) = (-l)^. © = (-1)^.
i") (t) = (IM-i)"-
Solution. Let (just like in the definition of the Jacobi symbol) a factor to (odd) primes as p±p2 ■ ■ - pk-i) The properties of the Legendre symbol and the aforementioned statement imply that
-1\ /-A /-l^
Pi
P2
Pk
If we
PI — 1 PL. — 1
= (-l)^-...(-l)^- = = (-1)^ ^ =
= (-i)sH-i = (-i)^.
ii) Analogously to above.
iii) Further, let b factor to (odd) primes as q± q2 ■ have pi = qj for some i and j, then the symbols on both sides of the equality are equal to zero. Otherwise, the law of quadratic reciprocity for the Legendre symbol implies that for all pairs (pi, qj), we have
W W
Therefore,
k e
'Pi
Pi -1 Qj-1
w =nn\„
i=ij=i w
k e
■nnv
i=lj=l v'
=ik-d
2 2 —
_1 qj~1
— 1) 2 2-^j = l 2
ik-d
Pi-i nj=i gj'1 -pT /gj
n^) — nvp
i=i j=i ^
to decide whether a given polynomial Diophantine equation has a solution. This question is well-known as Hubert's tenth problem, and the proof of algorithmic unsolvability of this problem was given by fOpnii MairiHceBira (Yuri Matiya-sevich) in 1970.7
However, there are cases in which we are able to find the solution of a Diophantine equation, or - at least - to reduce the problem to solving congruences, which is besides the already mentioned applications another motivation for studying them. Now, we will describe several such types of Diophantine equations.
Linear Diophantine equation. A linear Diophantine equa-tion is an equation of the form
a1x1 + a2x2 H-----h anxn = b,
wriere-s=r,..., xn are unknowns and ai,..., an, b are given non-zero integers.
We can see that the ability to solve Diophantine equations is sometimes important in "practical" life as well, as is proved by Bruce Willis and Samuel Jackson in Die Hard with a Vengeance, where they have to do away with a bomb using 4 gallons of water, having only 3- and 5-gallon containers at their disposal. A mathematician would say that the gentlemen were to find a solution of the Diophantine equation 3x+5y = 4.
One can use congruences in order to solve these equations. Apparently, it is necessary for the equation to be solvable that the integer d = (ai,..., an) divides b. Provided that, dividing both sides of the equation by the number d leads to an equivalent equation
a[x1 + a2x2 H----+ a'„x„, = b'
where a\ = cii/d for i = 1, have
-\- cirixn n and b' = b/d. Here, we
d- (a[,.
J = (da[,... , da'n) = (au ...,an) = d,
See the elementary text M. Davis, Hubert's Tenth Problem is Unsolv-able, The American Mathematical Monthly 80(3): 233-269. 1973.
815
CHAPTER 11. ELEMENTARY NUMBER THEORY
(-u-E?=^nn(f
i=ij=i ^
= (-1) , , (-
We utilized the result of part (i) of the previous exercise in the calculations. □
11.C.36. Determine whether the congruence x2 = 38
(mod 165) is solvable.
Solution. The Jacobi symbol is equal to
' 38 'i , 165 y
1165/ \165i '19\ {19)
(§)■(§)
,11,
■(f)-(f)-(rf) = (-i)3(l)-(^)-(ir)3 = i-
This result does not answer the question of the existence of a solution. However, if we split the congruence to a system of congruences according to the factors of the modulus, we obtain
x2 = -1 (mod 3),
x2 = 3 (mod 5),
x2 = 5 (mod 11), whence we can easily see that the first and second congruences have no solution. In particular,
= and (|) = (|) = (|) = -1 ■
Therefore, neither the original congruence has a solution. □
11.C.37. Find all primes p such that the integer below is a N\\+v quadratic residue modulo p:
">Sy/o i) 3, ii) — 3, iii) 6.
Solution.
i) We are looking for primes p / 3 such that x2 = 3 (mod p) is solvable. Since p = 2 satisfies the above, we will consider only odd primes p / 3 from now on. For p = 1 (mod 4), it follows from the law of quadratic reciprocity that 1 = (3/p) = (p/3), which occurs if and only if p = 1 (mod 3). On the other hand, if p = — 1 (mod 4), then 1 = (3/p) = —(p/3), which holds for p = —1 (mod 3). Putting the conditions of both cases together, we arrive at p = ±1 (mod 12), which, together with p = 2, completes the set of all primes satisfying the given condition.
ii) The condition 1 = (—3/p) = (—l/p)(3/p) is satisfied if either (-1/p) = (3/p) = lor(-l/p) = (3/p) = -1. In the former case (using the result of the previous item), this means that p = 1 (mod 4) andp = ±1 (mod 12). In the latter case, we must have p = — 1 (mod 4) and p = ±5 (mod 12), at the same time - we can take,
so «,...,<) = 1.
Further, we will show that the equation
a1x1 + a2x2 H-----h anxn = b,
where a1, a2,..., an, b are integers such that (a1,..., an) =
I, always have a solution in integers and all such solutions can be described in terms of n — 1 integer parameters.
We will prove this proposition by mathematic induction on n, the number of unknowns. The situation is trivial for 7i = 1 - there is a unique solution (which does not depend on any parameters). Further, let n > 2 and suppose that the statement holds for equations having n — 1 unknowns. Denoting d= (ai,..., a„_i), any n-tuple x1,... ,xn that satisfies the equation must also satisfy the congruence
aiXi+a2X2-\-----h anxn = b (mod d).
Since d is the greatest common divisor of the integers ai,..., a„_i, this congruence is of the form
anxn = b (mod d),
which (since (d, an) = (ai,..., an) = 1) has a unique solution
xn = c (mod d), where c is a suitable integer, i.e., xn = c+d-t, where t e Z is arbitrary. Substituting into the original equation and refining it leads to the equation
aixi + ■ ■ ■ + an-ixn-i = b — anc — andt
with Ti—l unknowns and one parameter, t. However, the number (b — anc) jd is an integer, so we can divide the equation by d. This leads to
a[xi H----+ a'n_1xn-1 = b',
where a\ = ai/d for i = 1,... ,ti — 1 and b' = ((& — anc)/d) — ant, satisfying
(a[,...,a/n_1) = (da'1,...,da'n_1)-\ = (au ..., a„_i)' \ = 1-
By the induction hypothesis, this equation has, for any ieZ, a solution which can be described in terms of n—2 integer parameters (different from t), which together with the condition xn = c + dt gives what we wanted.
II. 5.1. Pythagorean equation. In this section, we will deal with enumeration of all right triangles with integer side lengths. This is a Diophantine equation where we will only seldom use the methods described above; nevertheless, we will look at it in
detail.
The task is to solve the equation
2,2 2
x + y = z
in integers.
Solution. Clearly, we can assume that (x, y,z) = 1 (otherwise, we simply divide both sides of the equation by the integer d = (x,y,z)).
Further, we can show that the integers x,y,z are pair-wise coprime: if there were a prime p dividing two of them,
816
CHAPTER 11. ELEMENTARY NUMBER THEORY
for instance, the set {—5, —1,1,5} for a reduced residue system modulo 12, and since (3/p) = 1 for p = ±1 (mod 12), we surely have (3/p) = — 1 whenever p = ±5 (mod 12). We have thus obtained four systems of two congruences each. Two of them have no solution, and the remaining two are satisfied by p = 1 (mod 12) and p = — 5 (mod 12), respectively, iii) In this case, (6/p) = (2/p)(3/p) and once again, there are two possibilities: either (2/p) = (3/p) = 1 or (2/p) = (3/p) = —1. The former case occurs if p satisfies p = ±1 (mod 8) as well as p = ±1 (mod 12). Solving the corresponding systems of linear congruences leads to the condition p = ±1 (mod 24). In the latter case, we get p = ±3 (mod 8) as well as p = ±5 (mod 12), which together gives p = ±5 (mod 24). Let us remark that thanks to Dirichlet's theorem 11.2.5, the number of primes we were interested in is infinite in each of the three problems. □
11.C.38. The following exercise illustrates that if the mod-\\N i'/, ulus of a quadratic congruence is a prime p satisfy-
fr ing p = 3 (mod 4), then we are able not only to decide the solvability of the congruence, but also to describe all of its solutions in a simple way.
Consider a prime p = 3 (mod 4) and an integer a such that (a/p) = 1. Prove that the solution of the congruence x2 = a (mod p) is
±a 4
(mod p).
Solution. It can be easily verified (using lemma 11.4.12) that
(a2^)2 = = a • (^) = a (mod p) . □
11.C.39. Determine whether the congruence
x2 = 3 (mod 59) is solvable. If so, find all of its solutions.
-(-1) = 1.
Solution. Calculating the Legendre symbol
,59) = =~(sy
we find out that the congruence has two solutions. Thanks to the statement above, we can immediately see (59 = 3 (mod 4)) that the congruence is satisfied by
x :
±3"
,15
(35
±3X
= ±73 = ±343 = Til (mod 59), since 35 = 243 = 7 (mod 59).
then we can easily see that it would have to divide the third one as well, which it may not according to our assumption. Therefore, at most one of the integers x, y is even. If neither of them were, we would get
z2 = x2 + y2 = 1 + 1 (mod 8),
which is impossible (see exercise 11.A.2). Altogether, we get that exactly one of the integers x, y is even. However, since the roles of these integers in the equation are symmetric, we can, without loss of generality, select x to be even and set x = 2r, r e N. Hence, we have
so
□
4r2 = zl - yl,
2 z+y z-y
r = -■-.
2 2
Now, let us denote u = \ (z + y), v = \ (z — y) (then, the inverse substitution is z = u + v, y = u — v). Since y is coprime to z, so is u to v (if there were a prime p dividing both u and v, then it would divide their sum as well as their difference, i.e., the integers y and z). It follows from
r2 = u ■ v
that there are coprime positive integers a, b such that u = a2, v = b2. Moreover, since u > v, we must have a > b. Altogether, we get
x = 2r = 2ab,
y = u — v = (a2 — b2),
z = u + v = (a2 + b2),
which indeed satisfies the given equation for any coprime a, b e N with a > b. Further solutions can be obtained by interchanging x and y. Finally, relinquishing the condition (x, y, z) = 1, each solution will yield infinitely many more if we multiply each of its component by a fixed positive integer d. □
11.5.2. Fermat's Last Theorem for n = 4. Thanks to the
§. parametrization of Pythagorean triples, we will be able to prove that the famous Fermat's Last Theorem
xn + yn = zn
has no solution for n = 4 in integers. For this task it is sufficient to prove that the equation x4 + y4 = z2 has no solution inN.
Solution. We will use the so-called method of infinite descent, which was introduced by Pierre de Fermat. This method utilizes the fact that every non-empty set of natural numbers has a least element (in other words, N is a well-ordered set).
Therefore, suppose that the set of solutions of the equation x4 + y4 = z2 is non-empty and let (x, y, z) denote (any) solution with z as small as possible. The integers x, y, z are thus pairwise distinct. Since the equation can be written in the form
(x2)2 + (y2)2 = z2,
817
CHAPTER 11. ELEMENTARY NUMBER THEORY
D. Diophantine equations
Here, we limit ourselves only to the small class of equations which can be solved using divisibility or can be reduced to solving congruences.
ll.D.l. Linear Diophantine equations. Decide whether it is possible to use a balance scale to weigh 50 grams of given goods provided we have only (an arbitrary number of) three kinds of masses; their weights are 770, 630, and 330 grams, respectively. If so, how to do that?
Solution. Our task is to solve the equation
770a; + 630y + 330z = 50,
where x,y,z G Z (a negative value in the solution would mean that we put the corresponding masses on the other scale). Dividing both sides of the equation by (770,630,330) = 10, we get an equivalent equation
77a; + 63y + 33z = 5.
Considering this equation modulo (77,63) following linear congruence:
7, we get the
33z = 5 5z = 5 z = 1
(mod 7), (mod 7), (mod 7).
This congruence is thus satisfied by those integers z of the form z = 1 + 74, where t is an integer parameter.
Substituting the form of z into the original equation, we
get
77a;+ 63y = 5 - 33(1 +74), llx + 9y = -4 - 334.
Now, we consider this (parametrized) equation modulo 11:
9y = -4 - 334 (mod 11), -2y = -4 (mod 11), y = 2 (mod 11).
Therefore, this congruence is satisfied by integers y = 2+lis for any s e Z. Now, it only remains to calculate x:
Ux = -4 - 334 - 9(2 + lis), llx = -22 - 334 - 9- lis, x = -2-3t- 9s.
it follows from the previous exercise that there exist r, s e N such that
2 o 2 2 2 2 i 2
x = 2rs, y = r — s , z = r +s.
Hence, y2 + s2 = r2, where (y, s) = 1 (if there were a prime p dividing both y and s, then it would divide x as well as z, which contradicts that they are coprime). Making the Pythagorean substitution once again, we get natural numbers a, b with (y is odd)
y = a2-b2, s = 2ab, r = a2 + b2.
The inverse substitution leads to
a;2 = 2rs = 2-2a&(a2 + &2),
and since x is even, we get
^y = ab(a2 + b2).
The integers a, b, a2 + b2 are pairwise coprime (which can be derived easily from the fact that y is coprime to s). Therefore, each of them is a square of a natural number:
a = c2, b = d2, a2 + b2 = e2,
whence c4 + d4 = e2, and since e < a2 + b2 = r < z, we get a contradiction with the minimality of z. □
6. Applications - calculation with large integers, cryptography
11.6.1. Computational aspects of number theory. In
many practical problems which utilize the re-
.'•**rf~-1?F"1 suits of number theory, it is necessary to execute '<^^5r_' one or more of the following computations fast:
• common arithmetic operations (sum, product, modulo) on integers;
• to determine the remainder of a (natural) n-th power of an integer a when divided by a given m;
• to determine the multiplicative inverse of an integer a modulo mei;
• to determine the greatest common divisor of two integers (and the coefficients of corresponding Bezout's identity);
• to decide whether a given integer is a prime or composite number.
• to factor a given integer to primes.
Basic arithmetic operations are usually executed on large integers in the same way as we were taught at primary school, i.e., we add in linear time and multiply and divide with remainder in quadratic time. The multiplication, which is a base for many other operations, can be performed asymptotically more efficiently (there exist algorithms of the type divide and conquer) - for instance, the Karatsuba algorithm (1960), running in time & (nlog2 3) or the Schonhage-Strassen algorithm (1971), which runs in 0(n log nlog log n) and uses Fast Fourier Transforms - see also 7.2.5. Although it is asymptotically much better, in practice, it becomes advantageous for integers of at least ten thousand digits (it is thus
818
CHAPTER 11. ELEMENTARY NUMBER THEORY
We have found out that the equation is satisfied if and only if
(x, y, z) is in the set
{(-2-3f-9s,2 + lls,l + 7f);s,f £ Z}.
Particular solutions can be obtained by evaluating the triple at concrete values of t, s. For instance, setting t = s = 0 gives the triple (—2,2,1); putting t = —4, s = 1 leads to
(1,13,-27).
Of course, the unknowns can be eliminated in any order -the result may seem "syntactically" different, but it must still describe the same set of solutions (that is given by a particular coset of an appropriate subgroup (in our case, it is the subgroup (2, 2,1) + (3,0,7)Z+ (-9,11,0)Z) in the commutative group Z3, which is an apparent analog to the fact that the solution of such an equation over a field forms an affine subspace of the corresponding vector space). □
Other types of Diophantine equations reducible to congruences. Some Diophantine equations are such that one of the unknowns can be expressed explicitly as a function of the other ones. In this case, it makes sense to examine for which integer arguments it holds that the value of the function is also an integer.
For instance, having an equation of the form
mxn = f(x1,... ,xn-i),
where m is a natural number and f(xi,...,xn-i) e Z[xi,..., xn-i] is a polynomial with integer coefficients, an n-tuple of integers x\,..., xn is a solution of it if and only if
f{xi, ■ ■ ■ ,Zn-i) = 0 (mod m).
11.D.2. Solve the Diophantine equation x (x + 3) = Ay — 1.
Solution. The equation can be rewritten as Ay = x2 + 3x +1. Now, we will solve the congruence
x2 + 3x + l = 0 (mod 4).
This congruence has no solution since for any integer x, the polynomial x2 + 3x + 1 evaluates to an odd integer (the fact that the congruence is not solvable can also be established by trying out all four possible remainders modulo 4 into it).
□
11.D.3. Solve the following equation in integers:
379a; + 314y + 183y2 = 210
used, for example, when looking for large primes in the GIMPS project).
11.6.2. Greatest common divisor and modular inverses.
As we have already shown, the computation of the solution of the congruence a ■ x = 1 (mod m) in variable x can be easily reduced (thanks to Bezout's identity) to the computation of the greatest common divisor of the integers a and m and looking for the coefficients k, I in Bezout's identity k-a+l-m = 1 (the integer k is then the wanted inverse of a modulo m).
function extended_gcd (a ,m) if m == 0:
return (1,0)
else
(q,r) := divide (a,m) (k,l) := extended_gcd(m, r) return (1,k — q*1)
A thorough analysis shows that the problem of computing the greatest common divisor has quadratic time complexity.
11.6.3. Modular exponentiation. The algorithm for modular exponentiation is based on the idea that when computing, for instance 264 mod 1000, one need not calculate 264 and then divide it with remainder by 1000, but that it is better to multiply the 2's gradually and reduce the temporary result modulo 1000 whenever it exceeds this value. More importantly, there is no need to perform such a huge number of multiplications: in this case, 63 naive multiplications can be
replaced with six squarings, as
264 = am2)2)2)2)2 2.
function modular_pow (base , exp , mod)
result := 1
while exp > 0
if (exp % 2 == 1):
result := (result * base) % mod
exp := exp >> 1
base := (base * base) % mod
return result
The algorithm squares the base modulo n for every binary digit of the exponent (which can be done in quadratic time in the worst case) and it performs a multiplication for every one in the binary representation of the exponent. Altogether, we are able to do modular exponentiation in cubic time in the worst case. We can also notice that the complexity is a good deal dependent on the binary appearance of the exponent.
See, for example, D. Knuth, Artof'ComputerProgramming, Volume!: Seminumerical Algorithms, Addison-Wesley 1997 or Wikipedia, Euclidean algorithm, http : //en .wikipedia . org/wiki/Euclidean_ algorithm (as of July 29, 2017).
CHAPTER 11. ELEMENTARY NUMBER THEORY
Solution. The equation is linear in x, so the other unknown, Example. Let us compute 2560 (mod 561).
y, must satisfy the congruence
183y2 + 314y- 210 = 0 (mod 379).
Now, we can complete the left-hand polynomial to square in order to get rid of the linear term. First of all, we must find a f £ Z such that 183 ■ t = 1 (mod 379). (In other words, we need to determine the inverse of the integer 183 modulo 379). For this purpose, we will use the Euclidean algorithm:
379 = 2 ■ 183 + 13, 183 = 14-13 + 1,
whence
1 = 183 - 14 ■ 13 = 183 - 14 ■ (379 - 2 ■ 183) = = 29 ■ 183 - 14 ■ 379.
Therefore, we can take, for instance, the integer 29 to be our t. Now, multiplying bith sides of the congruence by t = 29 and rearranging it, we get an equivalent congruence:
y2 + lOy - 26 = 0 (mod 379)
Now, we can complete the left-hand polynomial to square, which leads to (substituting z = y + 5)
(y + 5)2
26:
2 .
379 \379 \ 3 J v ' \ 17
= (-1)
■(+1) (+1) =
0 (mod 379), 51 (mod 379).
Invoking the law of quadratic reciprocity, we calculate the Legendre symbol (51/379):
51
379
|) ■(-!>■ (£)-<«■<-" ■(£
whence it follows that the congruence is solvable, and, in particular, it has two solutions modulo 379.
The proposition of exercise 11.C.38 implies that the solutions are of the form
i C1 ISA
z = ±51 4 ,
where 513 = 1 (mod 379), whence 5195 = (513)31 ■ 512 = -52 (mod 379). The solution is thus z = ±52 (mod 379), which gives for the original unknown that
y = 47 (mod 379), y = -57 (mod 379).
Therefore, the given Diophantine equation is satisfied by those pairs (x,y) withy G {47 + 379 ■ k; k G Z} U {-57 +
Since 560 gives
(1000110000)2, the mentioned algorithm
exp base result last digit exp
560 2 1 0
280 4 1 0
140 16 1 0
70 256 1 0
35 460 1 1
17 103 460 1
8 511 256 0
4 256 256 0
2 460 256 0
1 103 256 1
0 511 1 0
Therefore, 25
1 (mod 561).
11.6.4. Primality testing. Although we have the Fundamental theorem of arithmetic, which guarantees that every natural number can be uniquely factored to a product of primes, this operation is very hard from the computational point of view.
In practice, it is usually done in the following steps:
(1) finding all divisors below a given threshold (by trying all primes up to the threshold, which is usually somewhere around 106);
(2) testing the remaining factor for compositeness (deciding whether some necessary condition for primality holds);
(a) if the compositeness test did not find the integer to be composite, i.e., it is likely to be a prime, then we test it for primality to verify that it is indeed a prime;
(b) if the compositeness test proved that the integer was composite, then we try to find a non-trivial divisor.
~~ The mentioned steps are executed in this order because the corresponding algorithms are gradually (and strongly) increasing in time complexity. In 2002, Agrawal, Kayal, and Saxena published an algorithm for primality testing in polynomial time, but it is still more efficient to use the above procedure in practice.
11.6.5. Compositeness tests - how to recognize composite numbers with certainty? The so-called compositeness tests check for some necessary condition for primality. The easiest of such conditions is Fermat's little theorem.
Proposition (Fermat's test). Let N be a natural number. If
there is an a ^ 0 (mod N) such that aN_1 ^ 1 (mod N), then N is not a prime.
Unfortunately, having a composite N, it still may not be easy to find such an integer a which reveals the compositeness of N. There are even such exceptional integers N for which the only integers a with the mentioned property are those which are not coprime to N. To find them is thus equivalent to finding a divisor, and thus to factoring N to primes.
820
CHAPTER 11. ELEMENTARY NUMBER THEORY
379 -k; k e 1} and x = ^ ■ (210 - 314y - 183y2); e. g. (-1105,47) or (-1521, -57) (which are the only solutions with \x\ < 105). □
11.D.4. Solve the equation 2X = 1 + 3^ in integers.
Solution. If y < 0, then 1 < 1 + 3y < 2, whence 0 < x < 1, so x could not be an integer. Therefore, y > 0, hence 2X = 1 + 3^ > 2 and x > 1. We will show that we also must have x < 2. If not (i.e., if x > 3), then we would have
1 + 3^ = 2X = 0 (mod 8),
whence it follows that
3V = -1 (mod 8).
However, this impossible since the order of 3 modulo 8 equals 2, so the powers of three are congruent to 3 and 1 only. Now, it remains to examine the possibilities x = 1 and x = 2. For x = 1, we get
3» = 21 - 1 = 1,
hence y = 0. If x = 2, we have
3» = 22 - 1 = 3,
whence y = 1. Thus, the equations has two solutions: a; = 1,
y = 0; and a; = 2, y = 1. □
E. Primality tests
11.E.1. Mersenne primes. The following problems are in <2Jig3> deep connection with testing Mersenne num-
bers for primality.
For any q e N, consider the integer Mq = 2? — 1 and prove:
i) If q is composite, then so is Mq.
ii) If q is a prime, g = 3 (mod 4), then 2q + 1 divides Mg if and only if 2q + 1 is a prime (hence it follows that if
There are indeed such ugly (or extremely nice?) composite numbers N for which every integer a which is co-prime to N satisfies a""1 = 1 (mod N). These are called Carmichael numbers, the least of which9 is 561 = 3 1117, and it was no sooner than in 1992 that it was proved10 that there are even infinitely many of them.
Example. We will prove that 561 is a Carmichael number, i.e., that it holds for every a e N which is coprimeto 3T1-17 that a560 = 1 (mod 561).
Thanks to the properties of congruences, we know that it suffices to prove this congruence modulo 3,11, and 17. However, this can be obtained straight from Fermat's little theorem since such an integer a satisfies a2 = 1 (mod 3), a10 = 1 (mod 11), a16 = 1 (mod 17), where all of 2, 10, and 16 divide 560, hence a560 = 1 modulo 3, 11 as well as 17 for all integers a coprime to 561 (see also Korselt's criterion mentioned below).
11.6.6. Proposition (Korselt's criterion). A composite number n is a Carmichael number if and only if both of the following conditions hold
• n is square-free (divisible by the square of no prime),
• p — 1 | n — 1 holds for all primes p which divide n.
Proof. " <= " We will show that if n satisfies the above two conditions and it is composite, then every a e Z which is coprime to n satisfies an_1 = 1 (mod n). Let us thus factor n to the product of distinct odd primes: n = pi ■ ■ ■ pk, where Pi — 1 n — 1 for alii G {1,,..., k}. Since (a, pi) = 1, we get from Fermat's little theorem that aPl_1 = 1 (mod pi), whence (thanks to the condition pi — 1 | n — 1) it also follows that an_1 = 1 (mod pi). This is true for all indices i, hence an-i = ^ (mod n), so n is indeed a Carmichael number.
" => " A Carmichael number n cannot be even since then we would get for a = — 1 that an_1 = — 1 (mod n), which would (since an_1 = 1 (mod n)) mean that n is equal to 2 (and thus is not composite). Therefore, let n factor as 7i = p"1 ■ ■ ■pk"k, where pi are distinct odd primes and a{ e N. Thanks to theorem 11.3.8, we can choose for every i a primitive root gi modulo pi', and the Chinese remainder theorem then yields an integer a which satisfies a = g{ (mod p"') for all i and which is apparently coprime to n. Further, we know from the assumption that an_1 = 1 (mod n), so this holds modulo pi', and thus g™-1 = 1 (mod pi') as well. Since gi is a primitive root modulo p°', the integer n — 1 must be a multiple of its order, i.e. amultiple of 1.
ii) Let n = 2g+l be a divisor of Mg. We will show that n is aprime invoking Lucas' theorem 11.6.10, Since n — 1 = 2g has only two prime divisors, it suffices to find com-positeness witnesses for the integers 2 and q. We have
2^ = 22 ^ 1 (mod 7i), (-2)^ = -2« =. - 1 ^ 1 (mod ti), thanks to the assumption 7i | Mg = 29 — 1. Further, since (-2)™-1 = 2™-1 = 22q - 1 = (2« + l)Mq = 0 (mod ti), it follows from Lucas'theorem that ti is a prime.
Now, let p = 2g + 1 = —1 (mod 8) be a prime. Since (2/p) = 1, there exists an m such that 2 = m2 (mod p). Hence, 2q = 2^ = 77ip_1 = 1 (mod p), so p\2q -1 = = Mg.
iii) If p | Mg = 2q — 1, then the order of 2 modulo p must divide the prime g, hence it equals g. Therefore, g | p—1, and there exists a k e Z such that 2gfc = p — 1. Altogether, we get
(2/p) = 2^ = 2qk = 1 (mod p),
i.e., p = ±1 (mod 8).
11.E.2.
□
For each of the following Mersenne numbers, determine whether it is prime or composite:
211 _ 1)215 _ li223 _ li229 _ ^
and283 - 1.
Solution. In the case of the integer 215 — 1, the exponent is composite; therefore, the whole integer is composite as well (we even know that it is divisible by 23 — 1 and 25 — 1). In the other cases, the exponent is always a prime. We can notice that these primes, namely q = 11,23,29, and 83, are even Sophie Germain primes (i.e., 2g + 1 is also a prime). It thus follows from part (ii) of the previous exercise that 23 | 211 — 1, 47 I 223 - 1, and 167 I 283 - 1.
Fermat's primality test can be slightly improved to Eu-ler's test or even more with the help of the lacobi symbol, yet this still does not mend the presented problem completely.
Proposition (Euler's test). Let N be an odd natural number. If there is an integer a ^ 0 (mod N) such that a~^~ ^ ±1 (mod N), then N is not a prime.
Proof. This follows directly from Fermat's theorem and
the fact the for N odd, we have a
!)•
JV-l
(^
-l)(a-
□
Proposition (Euler-lacobi test). Let N be an odd natural number. If there is an integer a ^ 0 (mod N) such that a~^~ ^ (^) (mod N), then N is not a prime.
Proof. This follows immediately from lemma 11.4.12.
□
Example. Let us consider N = 561 = 3 ■ 11 ■ 17 as before and let a = 5. Then, we have 5280 = 1 (mod 3) and 5280 = 1 (mod 10), but 5280 = -1 (mod 17), so surely 5280 ^ ±1 (mod 561). Here, it did not hold that a^"1)/2 = ±1 (mod N), so we even did not need to check the value of the lacobi symbol (5/561). However, the Euler-lacobi test can often reveal a composite number even in the case when this power is equal to ±1.
Example. Euler's test cannot detect the compositeness of the integer iV = 1729 = 7-13-19 since the integer = 864 = 25 ■ 33 is divisible by 6, 12, and 18, and so it follows from Fermat's theorem that a^-1^2 = 1 (mod N) holds for all integers a coprime to N. On the other hand, we get already for a = 11 that (11 /172 9) = -1, so the Euler-lacobi is able to recognize the integer 1729 as composite.
Let us notice that the value of the Legendre or lacobi symbol (a/n) can be computed very efficiently thanks to the law of quadratic reciprocity11, namely in time
0((l0ga)(l0g7l)).
pseudoprimes
A composite number ti is called a pseudoprime if it passes the corresponding test of compositeness without being revealed. We thus have
(1) Fermat pseudoprimes to base a,
(2) Euler (or Euler-lacobi) pseudoprimes to base a,
(3) strong pseudoprimes to base a, which are composite numbers which pass the following compositeness test:
The subsequent test is simple, yet (as shown in theorem 11.6.8) very efficient. It is a further specification of Fermat's test, which we have introduced at the beginning.
11.6.7. Theorem. Let p be an odd prime. Let us write p—1 = 2* ■ q, where t is a natural number and q is odd. Then, every integer a which is not a multiple ofp satisfies aq = 1 (mod p)
See Wikipedia, Sophie Germain prime, http : //en . wikipedia. org/wiki/Sophie_Germain_prime (as of July 28, 2013, 14:43 GMT).
See H. Cohen, A Course in Computational Algebraic Number The-
ory, Springer, 1993.
822
CHAPTER 11. ELEMENTARY NUMBER THEORY
We cannot use this proposition for the last case since 29 ^ 3 (mod 4) and, indeed, 59 \ 229 - 1. Now, however, it follows from part (iii) of the above exercise that if there is a prime p which divides 289 — 1, then it must satisfy
p = ±l (mod 8) p= 1 (mod 29),
i.e., p = 1 (mod 232) or p = 175 (mod 232). If we are looking for a prime divisor of the integer n = 229 — 1 = 536 870 911, then it suffices to check the primes (of the above form) up to y^n « 23170. There are 50 of them, so we are able to decide whether n is a prime quite easily (even with paper and pencil). In this case, fortunately, n is divisible already by the least prime, 233. □
11.E.3. Show that the integer 341 is a Fermat pseudoprime to base 2, yet it is not a Euler-lacobi pseudoprime to base 2. Further, prove that the ^ integer 561 is a Euler-lacobi pseudoprime to base 2, but not to base 3. Prove that, on the other hand, the integer 121 is a Euler-lacobi pseudoprime to base 3, but not to base 2.
Solution. The integer 341 is a Fermat pseudoprime to base 2 since 210 = 1 => 2340 = 1 (mod 341). It is not a Euler-lacobi pseudoprime since 2170 = 1 (mod 341), but (gfj-) = -1, which follows from the fact that 341 = -3 (mod 8). For the integer 561, we have 2280 = 1 (mod 561) and (gfj) = 1, since 561 = 1 (mod 8). Therefore, it is a Euler-lacobi pseudoprime to base 2. But not to base 3, since 3 | 561. On the other hand, the integer 121 satisfies 35 = 1 (mod 121) => 360 = 1 (mod 121) and (I|T) = 1, but 260 = 89 ^ 1 (mod 121). □
11.E.4. Prove that the integers 2465, 2821, and 6601 are Carmichael numbers, i.e., denoting any of them as n, then every integer a co-prime to n satisfies
a71'1 = 1 (mod n).
Solution. We have 2465 = 5-17-29, 2821 = 7-13-31, 6601 = 7-23-41, and the proposition follows from Ko-rselt's criterion 11.6.6 since all of the integers 4,16,28 divide 2464 = 25 ■ 7 ■ 11, all of the integers 6,12,30 divide
2820 = 22-3-5-47, and 6,22,40 divide 6600 = 23-3-52-ll.
□
or there exists an e G {0,1, — 1} such that a2 q = — 1 (mod p).
Proof. It follows from Fermat's little theorem that
p | a?-1 - 1 = (a^ - lXa2^ + 1) =
= (a^-l)(a^+l)(a^ + l) =
= (aq - l)(aq + l)(a2q + 1) ■ ■ ■ (a2"1" + 1), whence the statement follows easily since p is a prime. □
Proposition (Miller-Rabin compositeness test). Let N, t, q
be natural numbers such that N is odd and N — 1 = 2* ■ q, 2 { q. If there is an integer a ^ 0 (mod N) such that
aq iL 1 (mod N)
a2'"^-l (mod JV) for e £ {0,1, — 1},
then N is not a prime.
Proof. The correctness of the test follows directly from the previous theorem. □
Miscellaneous types of pseudoprimes
In practice, this easy test rapidly increases the ability to recognize composite numbers. The least strong pseudoprime to base 2 is 2047 (while the least Fermat pseudoprime to base 2 was already 341), and considering the bases 2, 3, and 5, the least strong pseudoprime is 25326001. In other words, if we are to test integers below 2-107, then it is sufficient to execute this compositeness test already for the bases 2,3, and 5. If the tested integer is not revealed to be composite, then it is surely a prime. On the other hand, it has been proved that no finite basis is sufficient for testing all natural numbers.
The Miller-Rabin test is a practical application of the previous statement, and we are even able to bound the probability of failure thanks to the following theorem, which we present without a proof12.
11.6.8. Theorem. Let N > 10 be an odd composite number. Let us write N — 1 = 2* ■ q, where t is a natural number and q is odd. Then, at most a quarter of the integers from the
12Schoof, René (2004), "Four primality testing algorithms" (PDF), Algorithmic Number Theory: Lattices, Number Fields, Curves and Cryptography, Cambridge University Press, ISBN 978-0-521-80854-5
823
CHAPTER 11. ELEMENTARY NUMBER THEORY
11.E.5. Prove that the integer 2047 is a strong pseudoprime to base 2, but not to base 3. Further, prove that the integer 1905 is a Euler-Jacobi pseu-S^. doprime to base 2 but not a strong pseudo-prime to this base.
Solution. In order to verify whether 2047 is a strong pseudo-prime to base 2, we factor
(22046 -l) = (21023 - 1)(21023 + 1).
Since 21023 = 1 (mod 2047), the statement is true. However, it is not a strong pseudoprime to base 3 as
31023 = 1565 ^ ±1 (mod 2047).
Notice that for the integer 2047, the strong pseudoprimality test is identical to the Euler one (this is because the integer 2046 is not divisible by four).
The integer 1905 is a Euler-Jacobi pseudoprime to base 2 since 21904/2 = 1 (mod 1905) and the Jacobi symbol (2/1905) is equal to 1. Since 1904 = 24 ■ 7 ■ 17, 1905 will be a strong pseudoprime to base 2 only if at least one of the following congruences holds:
2952 = _x 2476 = _x 2238 = _-l
2119 = ±1
(mod 1905), (mod 1905), (mod 1905), (mod 1905).
However, 2952 = 2476 = 1 (mod 1905), 2238 = 1144 (mod 1905), and 2119 = 128 (mod 1905). Therefore, 1905 is not a strong pseudoprime to base 2. □
11.E.6. Applying Pocklington-Lehmer test 11.6.11, show that 1321 is a prime.
Solution. Let us set N = 1321, then N -l = 1320 = 23-3-5-ll. For the sake of simplicity, we will assume that the trial division is executed only for primes below 10, then F = 23 ■ 3 ■ 5 = 120, U = 11, where (F, U) = (120,11) = 1.
In order to prove the primality of 1321 by the Pocklington-Lehmer test, we need to find a primality witness ap for each p e {2,3, 5}.
Since (2— - l, 1321J = 1 and (2— - l, 1321J = 1, we can lay a3 = a5 = 2. However, for p = 2, we have ^21^a — 1,1321^ = 1321, so we have to look for another primality witness. We can take a2 = 7 since (7W1 - l, I32l) = 1. In both cases, we have 21320 =
set {a £ Z; 1 < a < N, (a, N) = 1} satisfies the following condition:
aq = 1 (mod N) or there is an e G {0,1,..., t — 1} satisfying
a¥q = -\ (mod N).
In practical implementations, one usually tests about 20 random bases (or the least prime bases). In this case, the above theorem states that the probability of failing to reveal a composite number is less than 2-40.
The time complexity of the algorithm is same as in the case of modular exponentiation, i.e. cubic in the worst case. However, we should realize that the test is non-deterministic and the reliability of its deterministic version depends on the so-called generalized Riemann hypothesis (GRH ).
11.6.9. Primality tests. Primality tests are usually applied when the used compositeness test claims that the examined integer is likely to be a prime, or they are executed straightaway for special types of integers. Let us first give a list of the most known tests, which includes historical tests as well as very modern ones.
(1) AKS - a general polynomial primality test discovered by Indian mathematicians Agrawal, Kayal, and Saxena in 2002.
(2) Pocklington-Lehmer test - primality test of subexponen-tial complexity.
(3) Lucas-Lehmer test - primality test for Mersenne numbers.
(4) Pepin's test - primality test for Fermat numbers from 1877.
(5) ECPP - primality test based on the so-called elliptic curves.
Now, we will introduce a standard primality test for Mersenne numbers.
Proposition (Lucas-Lehmer test). Let $q \neq 2$ be a prime, and let the sequence $(s_n)_{n=0}^{\infty}$ be defined recursively by
$s_0 = 4, \quad s_{n+1} = s_n^2 - 2.$
Then, the integer $M_q = 2^q - 1$ is a prime if and only if $M_q$ divides $s_{q-2}$.
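Before turning to the proof, note that the test itself is a trivial loop; a minimal Python sketch (the function name is ours):

def lucas_lehmer(q):
    # Test M_q = 2^q - 1 for an odd prime q: M_q is prime iff
    # s_{q-2} = 0 (mod M_q), where s_0 = 4 and s_{n+1} = s_n^2 - 2.
    m = 2**q - 1
    s = 4
    for _ in range(q - 2):
        s = (s * s - 2) % m
    return s == 0

For instance, the primes q in (3, 5, 7, 13, 17, 19) pass the test, while lucas_lehmer(11) returns False, in accordance with $2^{11} - 1 = 2047 = 23 \cdot 89$ from exercise 11.E.5.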
Proof. We will be working in the ring $R = \mathbb{Z}[\sqrt 3] = \{a + b\sqrt 3;\ a, b \in \mathbb{Z}\}$, where division with remainder behaves similarly as in the integers (see also 12.2.5). Let us set $\alpha = 2 + \sqrt 3$, $\beta = 2 - \sqrt 3$ and note that $\alpha + \beta = 4$, $\alpha\beta = 1$.
First, we prove by induction that for all $n \in \mathbb{N}_0$,
(1) $\quad s_n = \alpha^{2^n} + \beta^{2^n}.$
The statement is true for $n = 0$, since $s_0 = 4 = \alpha + \beta$. Now, let us suppose that it is true for $n-1$; then $s_n = s_{n-1}^2 - 2$ is, by the induction hypothesis, equal to
$\left(\alpha^{2^{n-1}} + \beta^{2^{n-1}}\right)^2 - 2 = \alpha^{2^n} + \beta^{2^n} + 2(\alpha\beta)^{2^{n-1}} - 2 = \alpha^{2^n} + \beta^{2^n},$
since $\alpha\beta = 1$.
Wikipedia, Riemann hypothesis, http://en.wikipedia.org/wiki/Riemann_hypothesis (as of July 29, 2017).
$7^{1320} \equiv 1 \pmod{1321}$. The primality witnesses of the integer 1321 are thus $a_2 = 7$, $a_3 = a_5 = 2$. Instead, we could also have chosen for all primes $p$ the same number (e.g. 13), which is a primitive root modulo 1321. □
11.E.7. Factor the integer 221 to primes by Pollard's ρ-method. Use the function $f(x) = x^2 + 1$ with initial value $x_0 = 2$.
Solution. Let us set $x = y = 2$. The procedure from 11.6.14 gives:

x := f(x)   y := f(f(y))   (x - y, 221)
5           26             1
26          197            1
14          104            1
197         145            13

We have thus found a non-trivial divisor, so now it is easy to calculate $221 = 13 \cdot 17$. □
11.E.8. Find a non-trivial divisor of the integer 455459.
Solution. Consider the function $f(x) = x^2 + 1$ (we silently assume that this function behaves randomly modulo an unknown prime divisor $p$ of the integer $n$ and has the required properties). In the particular iterations, we compute $a \leftarrow f(a) \pmod n$, $b \leftarrow f(f(b)) \pmod n$, while evaluating $d = (a - b, n)$.
a b d
5 26 1
26 2871 1
677 179685 1
2871 155260 1
44380 416250 1
179685 43670 1
121634 164403 1
155260 247944 1
44567 68343 743
We have found a divisor 743, and now we can easily compute that 455459 = 613-743. □
F. Encryption
11.F.1. RSA. We have overheard that the integers 29, 7, 21 were sent by means of RSA with public key (7,33). Try to break the cipher and find the messages (integers) that were originally sent.
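For such small parameters the attack is immediate: factor the modulus by trial division, recover $\varphi(n)$ and the private exponent, and decrypt. A Python sketch (names are ours; pow(e, -1, phi) needs Python 3.8+):

def break_rsa(ciphertexts, e, n):
    # Trial-divide the (tiny) public modulus n = p*q,
    # then d = e^(-1) mod phi(n) decrypts each ciphertext.
    p = next(k for k in range(2, n) if n % k == 0)
    q = n // p
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)
    return [pow(c, d, n) for c in ciphertexts]

Here break_rsa([29, 7, 21], 7, 33) recovers the three original integers.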
Further, since $M_q \equiv -1 \pmod 8$, we have $\left(\tfrac{2}{M_q}\right) = 1$, and it follows from the law of quadratic reciprocity that
$\left(\tfrac{3}{M_q}\right) = -\left(\tfrac{M_q}{3}\right) = -\left(\tfrac{2^q - 1}{3}\right) = -1,$
since we have $2^q - 1 \equiv 1 \pmod 3$ for $q$ odd. Both of these expressions are valid even if $M_q$ is not a prime (in this case, they are Jacobi symbols).
Let us note that in the last part of the proof, we will use the extension of the congruence relation to the elements of the domain $\mathbb{Z}[\sqrt 3] = \{a + b\sqrt 3;\ a, b \in \mathbb{Z}\}$; just like in the case of the integers, we write for $\alpha, \beta \in \mathbb{Z}[\sqrt 3]$ that $\alpha \equiv \beta \pmod p$ if $p \mid \alpha - \beta$. Further, an analogue of proposition (ii) from 11.B.6 holds as well - if $p$ is a prime, then $(\alpha + \beta)^p \equiv \alpha^p + \beta^p \pmod p$ (the proof is identical to the one for the integers).
" ==> " Suppose that Mq is a prime. We will prove that
Mn
= —1 (mod Mg), which will imply (thanks to 1) that Sq-2. Since 2 1. Let p be a prime which divides N — 1. Further, let us suppose that there is an integer ap such that
= 1 (mod N) and
N-l . P
-l,N\= 1.
Then
Let pa>? be the highest power of p which divides N — 1 every positive divisor d of the integer N satisfies
d=l (mod pa").
Proof of the Pocklington-Lehmer theorem. Every positive divisor $d$ of the integer $N$ is a product of prime divisors of $N$, so it suffices to prove the theorem for prime values of $d$. The condition $a_p^{N-1} \equiv 1 \pmod N$ implies that the integers $a_p$, $N$ are coprime (any divisor they have in common must divide the right-hand side of the congruence as well). Then $(a_p, d) = 1$ as well, and we have $a_p^{d-1} \equiv 1 \pmod d$ by Fermat's theorem. Since $\left(a_p^{(N-1)/p} - 1, N\right) = 1$, we get $a_p^{(N-1)/p} \not\equiv 1 \pmod d$.
Let $e$ denote the order of $a_p$ modulo $d$. Then $e \mid d-1$, $e \mid N-1$, and $e \nmid (N-1)/p$.
If $p^{\alpha_p} \nmid e$, then $e \mid N-1$ would imply that $e \mid \frac{N-1}{p}$, which is a contradiction. Therefore, $p^{\alpha_p} \mid e$, and so $p^{\alpha_p} \mid d-1$. □
and $(-9)^{-1} \equiv 9 \pmod{41}$. Therefore, the decrypted message is the integer
$M \equiv 9 \cdot 6 \equiv 13 \pmod{41}.$ □
11.F.6. Rabin cryptosystem. Alice has chosen $p = 23$, $q = 31$ as her private key in the Rabin cryptosystem. The public key is then $n = pq = 713$. Encrypt the message $M = 327$ for Alice and show how Alice will decrypt it.
Solution. We compute $C \equiv 327^2 \equiv 692 \pmod{713}$ and send this cipher to Alice. According to the decryption procedure, we determine
$r \equiv C^{(p+1)/4} = 692^6 \equiv 18 \pmod{23},$
$s \equiv C^{(q+1)/4} = 692^8 \equiv 14 \pmod{31},$
and further the coefficients $a, b$ in Bezout's identity $23a + 31b = 1$ (using the Euclidean algorithm). We get $a = -4$, $b = 3$; the candidates for the original message are thus the integers $\pm 4 \cdot 23 \cdot 14 \pm 3 \cdot 31 \cdot 18 \pmod{713}$. We thus know that one of the integers
386, 603, 110, 327
is the message that was sent. □
11.F.7. Show how to encrypt and decrypt the message $M = 321$ in the Rabin cryptosystem with $n = 437$.
Solution. The encrypted text is obtained as the square modulo $n$: $C \equiv 321^2 \equiv (-116)^2 = 13456 \equiv 346 \pmod{437}$. On the other hand, when decrypting, we use the factorization (whose knowledge is the private key of the message receiver) $n = 437 = 19 \cdot 23$, and we compute $r \equiv 346^{\frac{19+1}{4}} = 346^5 \equiv 17 \equiv -2 \pmod{19}$ and $s \equiv 346^{\frac{23+1}{4}} = 346^6 \equiv 1 \pmod{23}$. Applying the Euclidean algorithm to the pair $(19, 23) = 1$, we determine the coefficients in Bezout's identity
$19 \cdot (-6) + 23 \cdot 5 = 1.$
Then, the message is one of the integers $\pm 6 \cdot 19 \cdot 1 \pm 5 \cdot 23 \cdot (-2) \pmod{437}$, i.e., $M = \pm 116$ or $M = \pm 344$. Indeed, $M \equiv -116 \equiv 321 \pmod{437}$. □
11.6.12. Theorem. Let $N \in \mathbb{N}$, $N > 1$. Suppose that we can write $N - 1 = F \cdot U$, where $(F, U) = 1$ and $F > \sqrt N$, and that we are familiar with the prime factorization of $F$. Then:
• if we can find for every prime $p \mid F$ an integer $a_p \in \mathbb{Z}$ as in the above theorem, then $N$ is a prime;
• if $N$ is a prime, then for every prime $p \mid N - 1$, there is an integer $a_p \in \mathbb{Z}$ with the desired properties.
Proof. By theorem 11.6.11, any divisor $d > 1$ of the integer $N$ satisfies $d \equiv 1 \pmod{p^{\alpha_p}}$ for all prime factors $p$ of $F$, hence $d \equiv 1 \pmod F$, and so $d > \sqrt N$. If $N$ has no non-trivial divisor less than or equal to $\sqrt N$, then it is necessarily a prime. On the other hand, it suffices to choose for $a_p$ a primitive root modulo the prime $N$ (independently of $p$). Then, it follows from Fermat's theorem that $a_p^{N-1} \equiv 1 \pmod N$, and since $a_p$ is a primitive root, we get $a_p^{(N-1)/p} \not\equiv 1 \pmod N$ for any $p \mid N - 1$.
The integers $a_p$ are again called primality witnesses for the integer $N$. □
Remark. The previous test also contains Pepin's test as a special case (here, for $N = F_n$, we have $p = 2$, which is satisfied by the primality witness $a_p = 3$).
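The certificate of theorem 11.6.12 is also easy to verify mechanically; a Python sketch (assuming the prime divisors of the factored part F of N - 1 are supplied, together with a witness for each):

from math import gcd, isqrt

def pocklington_certificate(n, f_primes, witness):
    # Check F > sqrt(n) for the part of n - 1 built from f_primes, and
    # verify the witness conditions of 11.6.11 for every prime p | F.
    F = 1
    for p in set(f_primes):
        while (n - 1) % (F * p) == 0:
            F *= p
    if F <= isqrt(n):
        return False
    return all(pow(witness[p], n - 1, n) == 1 and
               gcd(pow(witness[p], (n - 1) // p, n) - 1, n) == 1
               for p in set(f_primes))

With the witnesses found in exercise 11.E.6, pocklington_certificate(1321, [2, 3, 5], {2: 7, 3: 2, 5: 2}) returns True.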
11.6.13. The polynomial test. (Author's note: see Radan (ATC2014) for details and, very briefly, McAndrew (Crypto in Sage). A description of the AKS algorithm should be added here, possibly including its proof - the algebraic chapter should be checked for the necessary prerequisites. Optionally, the proof of the claim on the Rabin-Miller test from Schoof's article cited above could be added.)
11.6.14. Looking for divisors. If one of the compositeness tests verifies that a given integer is indeed composite, we usually want to find one of its non-trivial divisors. However, this task is much more difficult than merely revealing that the integer is composite - let us recall that the compositeness tests can guarantee the compositeness, yet they provide us with no divisors (which is, on the other hand, advantageous for RSA and similar cryptographic protocols). Therefore, we present here only a short summary of the methods used in practice and one sample for inspiration.
(1) Trial division
(2) Pollard's ρ-algorithm
(3) Pollard's p - 1 algorithm
(4) Elliptic curve method (ECM)
(5) Quadratic sieve (QS)
(6) Number field sieve (NFS)
For illustration, we demonstrate in the exercises (11.E.8) one of these algorithms - Pollard's ρ-method - on a concrete instance. This algorithm is especially suitable for finding relatively small divisors (since its expected complexity depends on the size of these divisors), and it is based on the following idea: given a random function $f : S \to S$, where $S$ is a finite set with $n$ elements, the sequence $(x_n)_{n=0}^{\infty}$, where $x_{n+1} = f(x_n)$, must eventually loop. The expected length of the tail, as well as of the period, is then $\sqrt{\pi n / 8}$.
The algorithm described below is again a straightforward implementation of the mentioned reasoning.
Algorithm (Pollard's ρ-method):

Input: n - the integer to be factored, and an appropriate function f(x)

a := 2; b := 2; d := 1
While d = 1 do
    a := f(a) mod n
    b := f(f(b)) mod n
    d := gcd(a - b, n)
If d = n, return FAILURE.
Else return d.
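A direct Python transcription of the pseudocode (with the function $f(x) = x^2 + 1$ of the exercises as default):

from math import gcd

def pollard_rho(n, f=lambda x: x * x + 1):
    # Floyd's cycle detection on x_{k+1} = f(x_k) mod n; a collision
    # modulo an unknown prime divisor of n shows up in gcd(a - b, n).
    a, b, d = 2, 2, 1
    while d == 1:
        a = f(a) % n
        b = f(f(b) % n) % n
        d = gcd(a - b, n)
    return None if d == n else d    # None signals FAILURE

Indeed, pollard_rho(221) returns 13 and pollard_rho(455459) returns 743, matching exercises 11.E.7 and 11.E.8.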
11.6.15. Public-key cryptography. In present-day practice, the most important application of number theory is so-called public-key cryptography. Its main objectives are to provide
• encryption; the message encrypted with the public key of the receiver can be decrypted by no one else (to be precise, by no one who does not know his private key);
• signature; the integrity of the message signed with the private key of the sender can be verified by anyone with access to his public key.
The most basic and most often used protocols in public-key cryptography are:
• RSA (encryption) and the derived system for signing messages,
• Digital Signature Algorithm - DSA and its variant based on elliptic curves (ECDSA),
• the Rabin cryptosystem (and signature scheme),
• the ElGamal cryptosystem (and signature scheme),
• elliptic curve cryptography (ECC),
• Diffie-Hellman key exchange protocol (DH).
11.6.16. Encryption - RSA. First, we describe the best-known public-key cipher - RSA. The principle of the RSA protocol is as follows:
• Every participant A needs a pair of keys - a public one ($V_A$) and a private one ($S_A$).
• Key generation: the user selects two large primes $p, q$ and calculates $n = pq$, $\varphi(n) = (p-1)(q-1)$. The integer $n$ is public; the idea is that it is too hard to compute $\varphi(n)$ without knowing the factorization of $n$.
• Then, the user chooses a public key $e$ and verifies that $(e, \varphi(n)) = 1$.

11.G.9. Let a function $f : \mathbb{N} \to \mathbb{N}$ satisfy $(f(a), f(b)) = (f(a), f(|a - b|))$ for all $a, b \in \mathbb{N}$. Prove that $(f(a), f(b)) = f((a, b))$. Show that this implies the result of exercise 11.A.6 as well as the fact that $(F_a, F_b) = F_{(a,b)}$, where $F_a$ denotes the $a$-th term of the Fibonacci sequence. ○
11.G.10. Let the RSA parameters be $n = 143 = 11 \cdot 13$, $e = 7$, $d = 103$. Sign the message $m = 8$ and verify this signature. Decide whether $s = 42$ is the signature of the message $m = 26$. ○
Key to the exercises
9.B.6. $4\pi$.
9.B.7. $36\pi$.
9.B.8. ff.
10. C.10. | • | + | • 1 = |.
10.E.17. Simply, $a = \frac{3}{8}$. Thus, the distribution function of the random variable $X$ is $F_X(t) = \frac{1}{8}t^3$ for $t \in (0, 2)$, zero for smaller values of $t$, and one for greater. Let $Z = X^3$ denote the random variable corresponding to the volume of the considered cube. It lies in the interval $(0, 8)$. Thus, for $t \in (0, 8)$, the distribution function $F_Z$ of the random variable $Z$ satisfies $F_Z(t) = P[Z < t] = P[X^3 < t] = P[X < \sqrt[3]{t}] = F_X(\sqrt[3]{t}) = \frac{t}{8}$. Then, the density is $f_Z(t) = \frac{1}{8}$ on the interval $(0, 8)$ and zero elsewhere. Since this is the uniform distribution on the given interval, the expected value is equal to 4.
10.F.9. $EU = 1 \cdot 0.6 + 2 \cdot 0.4 = 1.4$, $EU^2 = 0.4 + 4 \cdot 0.6 = 2.8$, $EV = 0.4 + 0.6 + 1.2 = 2.1$, $EV^2 = 0.3 + 1.2 + 3.6 = 5.1$, $E(UV) = 2.8$, $\operatorname{var}(U) = 2.8 - 1.4^2 = 2.8 - 1.96 = 0.84$, $\operatorname{var}(V) = 5.1 - 4.41 = 0.69$, $\operatorname{cov}(U, V) = 2.8 - 1.4 \cdot 2.1 = -0.14$,
$\rho_{U,V} = \frac{-0.14}{\sqrt{0.84 \cdot 0.69}}.$
10.F.10. $EX = 1/3$, $\operatorname{var} X = 4/45$.
10.F.11. $\rho_{X,Y} = -1$.
10.F.12. $\rho_{U,V} = -0.421$.
11.B.14. Let us consider the factorization of the number $n$ to primes. If $n = p_1^{\alpha_1} \cdots p_k^{\alpha_k}$, then $\varphi(n) = (p_1 - 1)p_1^{\alpha_1 - 1} \cdots (p_k - 1)p_k^{\alpha_k - 1}$. If we want to have $9 \mid \varphi(n)$, one of the following conditions must hold:
i) $p_i \equiv 1 \pmod 9$ for some $i \in \{1, \dots, k\}$,
ii) $p_i = 3$ and $\alpha_i \ge 3$ for some $i \in \{1, \dots, k\}$,
iii) $p_i = 3$, $\alpha_i = 2$, and $p_j \equiv 1 \pmod 3$ for some distinct $i, j \in \{1, \dots, k\}$,
iv) $p_i \equiv 1 \pmod 3$ and $p_j \equiv 1 \pmod 3$ for some distinct $i, j \in \{1, \dots, k\}$.
If we restrict our attention (as the statement of the problem asks) to numbers $n < 100$, then the condition
i) is satisfied by the primes 19, 37, and 73 (together with their multiples 38, 57, 76, 95, and 74),
ii) is satisfied by $3^3 = 27$ and $3^4 = 81$ (together with the multiple 54),
iii) is matched by the number $3^2 \cdot 7 = 63$,
iv) is matched by the number $7 \cdot 13 = 91$.
11.B.22.
i) The integer 3 has order 4 modulo 10, so it suffices to determine the remainder of the exponent when divided by 4. This remainder is equal to 1, so the last digit is $3^1 = 3$.
ii) $37 \equiv -3 \pmod{10}$ is of order 4. Again, it suffices to compute the remainder of the exponent upon division by 4. However, we apparently have $37 \equiv 1 \pmod 4$, so the wanted remainder upon division by 10 equals $(-3)^1 \equiv 7 \pmod{10}$, and the last digit is thus 7.
iii) Since $(12, 10) > 1$, it makes no sense to talk about the order of 12 modulo 10. However, the examined integer is clearly even, so it suffices to find its remainder upon division by 5. The order of $12 \equiv 2 \pmod 5$ is 4, and the exponent satisfies $13^{14} \equiv 1^{14} = 1 \pmod 4$. We thus have $12^{13^{14}} \equiv 2^1 = 2 \pmod 5$, and since the examined integer is even, 2 is the wanted digit as well.
11.B.23. Since $\varphi(n) < n$, we surely have $\varphi(n) \mid n!$, whence the statement already follows, as odd positive integers $n$ satisfy $2^{\varphi(n)} \equiv 1 \pmod n$.
11.B.26. Similarly to the above exercise, we examine the remainders upon division by the coprime integers 125 and 8. We know that $(12, 125) = 1$ and $\varphi(125) = 100$, so
$12^{10^{10}} = 12^{100 \cdot 10^8} = \left(12^{100}\right)^{10^8} \equiv 1 \pmod{125}.$
Since $4 \mid 12$, the number $12^{10^{10}}$ is divisible even by $4^{10^{10}}$, so in particular by 8; hence $12^{10^{10}} \equiv 0 \pmod 8$. The Chinese remainder theorem states that exactly one of the integers $0, 1, \dots, 999$ leaves remainder 1 upon division by 125 and is divisible by 8. This integer is 376 (it can be found, for instance, by going through the multiples of 125 increased by 1 and examining their divisibility by 8). Therefore, the last three digits of the number $12^{10^{10}}$ are 376.
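The result is immediate to double-check with fast modular exponentiation, e.g. in Python:

# modular exponentiation confirms the last three digits computed above
assert pow(12, 10**10, 1000) == 376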
11.C.6. i) The greatest common divisor of the moduli is 3, and $1 \not\equiv -1 \pmod 3$, so the system has no solution.
ii) The condition for solvability of linear congruences, $(8, 12345678910111213) = 1$, is clearly true, so this congruence has a unique solution.
iii) The moduli are coprime, so by the Chinese remainder theorem, there is a unique solution modulo $29 \cdot 47$.
11.C.8. We have $208 = 2^4 \cdot 13$ and $(3446, 208) = 2 \mid 8642$. Therefore, the congruence has two solutions modulo 208, and it is equivalent to the system
$3x \equiv 1 \pmod 8, \quad x \equiv 10 \pmod{13}.$
The solutions of this system are $x \equiv 75$ and $x \equiv 179 \pmod{208}$.
11.C.9. Since 2 is a primitive root modulo both 5 and 13, we get that
2" = 3 (mod 5) 2" = 23 (mod 5) n = 3 (mod 4)
and
2" = 3 (mod 13) 2" = 24 (mod 13) n = 4 (mod 12).
This apparently implies the infinitude of the multiples of both 5 and 13 among the integers 2" — 3 in question. On the other hand, we can see that none of them can be a multiple of 5 and 13 simultaneously since the system of congruences n = 3 (mod 4), n = 4 (mod 12) has no solution.
11.C.15. Since the modulus can be written as $105 = 3 \cdot 5 \cdot 7$, where the factors are pairwise coprime, the congruence in question is equivalent to the following system:
$x^3 - 3x + 5 \equiv 0 \pmod 3,$
$x^3 - 3x + 5 \equiv 0 \pmod 5,$
$x^3 - 3x + 5 \equiv 0 \pmod 7.$
Clearly, the first congruence is equivalent to $x^3 \equiv 1 \pmod 3$, which is in turn equivalent to $x \equiv 1 \pmod 3$, as it follows from Fermat's little theorem that $x^3 \equiv x \pmod 3$ holds for all integers $x$.
The second congruence is equivalent to $x(x^2 - 3) \equiv 0 \pmod 5$, which is satisfied iff $x \equiv 0 \pmod 5$ or $x^2 \equiv 3 \pmod 5$. However, since 3 is a quadratic nonresidue modulo 5 (the Legendre symbol $(3/5)$ is equal to $-1$), we get that $x \equiv 0 \pmod 5$ is the only solution of the second congruence of the system.
The third congruence can be transformed to the form $x^3 - 3x - 2 \equiv 0 \pmod 7$, which is satisfied iff $x \equiv -1 \pmod 7$ or $x \equiv 2 \pmod 7$ (since the left-hand side factors as $x^3 - 3x - 2 = (x-2)(x+1)^2$). Of course, this can also be found out by examining all possibilities modulo 7. Altogether, there are two solutions of the original congruence modulo 105: $x \equiv 55$ and $x \equiv 100$.
11.C.30. The modulus 473 factors as $11 \cdot 43$, thus we have to solve a system of two congruences. The first one leads to $(x - 3)^2 \equiv 3 \pmod{11}$, with two solutions $x - 3 \equiv \pm 5 \pmod{11}$. The second one can be transformed to $(x + 8)^2 \equiv 15 \pmod{43}$. Now we can proceed either by noticing that $15 \equiv 144 \pmod{43}$, or by calculating (using the result of 11.C.38) $\pm(x + 8) \equiv 15^{\frac{43+1}{4}} \equiv 31 \pmod{43}$; in both cases we get the result $x \equiv 4, 23 \pmod{43}$. Now we combine the solutions of both congruences of the system and we obtain
$x \equiv 152, 195, 262, 305 \pmod{473}.$
11.C.32. All of the results can be proved directly from the definition of the Jacobi symbol and the multiplicativity of the Legendre symbol.
11.C.34. In light of the previous exercise, both statements can be proved easily by mathematical induction.
11.G.1. Let $n = \overline{abc}$, where $a \neq 0$. Then $11 \mid n \iff 11 \mid c + a - b \iff c + a - b \in \{0, 11\}$.
We should have $100a + 10b + c = 11(a^2 + b^2 + c^2)$. If $c + a - b = 0$, i.e. $b = a + c$, then
$100a + 10(a + c) + c = 11(a^2 + (a + c)^2 + c^2)$
$110a + 11c = 11(2a^2 + 2ac + 2c^2)$
$10a + c = 2a^2 + 2ac + 2c^2$
$2a^2 + 2ac - 10a + 2c^2 - c = 0$
$a^2 + (c - 5)a + c^2 - \tfrac{c}{2} = 0.$
The discriminant of this quadratic equation is $-3c^2 - 8c + 25 > 0 \iff c \in \{0, 1\}$. For $c = 0$: $a^2 - 5a = 0 \implies a = 5$, $b = 5$. Thus the first solution is $n = 550$. For $c = 1$: $a, b \notin \mathbb{N}$.
If $c + a - b = 11$, i.e. $b = a + c - 11$, then
$100a + 10(a + c - 11) + c = 11(a^2 + (a + c - 11)^2 + c^2)$
$110a + 11c - 110 = 11(2a^2 + 2ac + 2c^2 - 22a - 22c + 121)$
$10a + c - 10 = 2a^2 + 2ac + 2c^2 - 22a - 22c + 121$
$2a^2 + 2ac - 32a + 2c^2 - 23c + 131 = 0$
$a^2 + (c - 16)a + c^2 - \tfrac{23}{2}c + \tfrac{131}{2} = 0.$
Now, the discriminant is $-3c^2 + 14c - 6 > 0 \iff c \in \{1, 2, 3, 4\}$.
For $c = 1$, $c = 2$, or $c = 4$: $a, b \notin \mathbb{N}$.
For $c = 3$: $a^2 - 13a + 40 = 0 \implies a = 8$, $b = 0$. The other solution is therefore $n = 803$.
11.G.2. Fermat's little theorem states that $a^p \equiv a \pmod p$, which together with the requirement $a^p \equiv 1 \pmod p$ gives the condition $a \equiv 1 \pmod p$ for the pairs $(a, p)$.
For $a \le 15$, we get the following pairs:

a | 1   | 2 | 3 | 4 | 5 | 6 | 7   | 8 | 9 | 10 | 11  | 12 | 13  | 14 | 15
p | any | - | 2 | 3 | 2 | 5 | 2,3 | 7 | 2 | 3  | 2,5 | 11 | 2,3 | 13 | 2,7
11.G.3. We prove by induction that $5^n \equiv 5^{n+4} \pmod{10000}$ for $n \ge 4$. For $n = 4$, we have $5^8 = 390625 \equiv 0625 = 5^4 \pmod{10000}$. Induction step: $5^{(k+1)+4} = 5^{k+4} \cdot 5 \equiv 5^k \cdot 5 = 5^{k+1} \pmod{10000}$.
For all $a \in \mathbb{N}$ we therefore have $5^{4a} \equiv 625$, $5^{4a+1} \equiv 3125$, $5^{4a+2} \equiv 5625$, and $5^{4a+3} \equiv 8125 \pmod{10000}$; the last four digits of $5^n$ thus form a periodic sequence with period 4.
11.G.4. If $n$ is an arbitrary natural number, then $2^{2^n} \equiv 1 \pmod 3$, so it suffices to choose for $k$ odd positive integers with $k \equiv 2 \pmod 3$. There are surely infinitely many of them - they are exactly those satisfying $k \equiv 5 \pmod 6$. For these values of $k$, the number $2^{2^n} + k$ is always a multiple of 3 greater than 3, so it is composite.
11.G.5. Let us fix an integer $k \in \mathbb{Z} \setminus \{1\}$ and an arbitrary $a \in \mathbb{N}$. We will show that we can find a positive integer $n$ such that $2^{2^n} + k$ is composite and greater than $a$. That will complete the proof.
Let us fix $s \in \mathbb{N}_0$, $h \in \mathbb{Z}$ so that $k - 1 = 2^s \cdot h$, $2 \nmid h$, and $m \in \mathbb{N}$ satisfying $2^{2^m} > a - k$. Now, let $\ell$ satisfy $\ell > s$ and $\ell > m$. If the integer $2^{2^\ell} + k$ is composite, then we are done, since $2^{2^\ell} + k > 2^{2^m} + k > a$. Therefore, let us assume that the integer $2^{2^\ell} + k$ is a prime and denote it by $p$. With help of Euler's theorem, we can find an integer of the desired form which is a multiple of $p$. We have
$p - 1 = 2^{2^\ell} + 2^s \cdot h = 2^s \cdot h_1$, where $h_1 \in \mathbb{N}$ is odd. We thus have $2^{\varphi(h_1)} \equiv 1 \pmod{h_1}$, whence $2^{s + \varphi(h_1)} \equiv 2^s \pmod{p - 1}$, and since $\ell > s$, we also have
$2^{\ell + \varphi(h_1)} \equiv 2^{\ell} \pmod{p - 1}$. Therefore, for $n = \ell + \varphi(h_1)$, we get $2^{2^n} \equiv 2^{2^\ell} \pmod p$, i.e. $p \mid 2^{2^n} + k$, and since $n > \ell$, we also have $2^{2^n} + k > 2^{2^\ell} + k = p > a$. We have thus found a composite number which is of the wanted form and greater than an arbitrarily large value of $a$.
Let us mention that the case of $k = 1$ is a well-known open problem, examining the infinitude of Fermat primes.
11.G.6. We can easily see that $2 \mid a_1 = 10$ and $3 \mid a_2 = 48$. Further, we can show that $p \mid a_{p-2}$ holds for any prime $p > 3$. By Fermat's theorem, we have $2^{p-1} \equiv 3^{p-1} \equiv 6^{p-1} \equiv 1 \pmod p$. Therefore,
$6a_{p-2} = 3 \cdot 2^{p-1} + 2 \cdot 3^{p-1} + 6^{p-1} - 6 \equiv 3 + 2 + 1 - 6 = 0 \pmod p.$
Let us remark that knowledge of algebra allows us to proceed more directly: for $p > 3$, we can consider the $p$-element field $\mathbb{F}_p$, which contains multiplicative inverses of the elements 2, 3, and 6, and their sum is $\frac{1}{2} + \frac{1}{3} + \frac{1}{6} = 1$.
11.G.7. We could reason about the factorization of $n$ to primes, which is a bit complicated. Instead, we use a little trick. Suppose that there is an $n$ satisfying the conditions $n \mid 2^n - 1$, $n > 1$, and let us select the least one. Surely, $n$ is odd, hence $n \mid 2^{\varphi(n)} - 1$. Utilizing the result of exercise 11.A.6, we get that $n \mid 2^d - 1$, where $d = (n, \varphi(n))$ (which especially implies that $2^d - 1 > 1$ and $d > 1$). At the same time, $d \le \varphi(n) < n$ and $d \mid n$, whence it follows that $d \mid 2^d - 1$, which contradicts the assumption that our $n$ is the least one meeting the conditions.
11.G.8. Since $2^{p-1} \equiv 1 \pmod p$, it suffices to choose appropriate multiples of $p - 1$ for $n$, i.e., to find $k$ so that $n = k(p-1)$ satisfies the condition $n \cdot 2^n \equiv -1 \pmod p$. However, thanks to $p - 1 \mid n$, this is equivalent to $k \equiv 1 \pmod p$, and there are clearly infinitely many such values of $k$.
11.G.9. Analyze the Euclidean algorithm for computing the greatest common divisor.
11.G.10. The signature is $8^{103} \equiv 83 \pmod{143}$, which can be verified by $83^7 \equiv 8 \pmod{143}$. Finally, 42 is not a valid signature, as
$42^7 \equiv 81 \not\equiv 26 \pmod{143}.$
CHAPTER 12
Algebraic structures
The more abstraction, the more chaos? - no, it is often the other way round...
A. Boolean algebras and lattices
12.A.1. Find the (complete) disjunctive normal form of the proposition
$(B' \Rightarrow C) \wedge [(A \vee C) \wedge B]'.$
Solution. If the propositional formula contains only a few variables (in our case, three), the most advantageous procedure is to build the truth table of the formula and read the disjunctive normal form off from it. The table consists of $2^3 = 8$ rows. The examined formula is denoted $\varphi$.
In this chapter, we begin a seemingly very formal study. But the concepts reflect many properties of things and phenomena surrounding us. This is one of the parts of the book which is not in the prerequisites of any other chapter. Large parts serve as a quick illustration of interesting uses of mathematical tools and models.
The simplest properties of real objects are used for encoding in terms of algebraic operations. Thus, "algebra" considers algorithmic manipulations with letters which usually correspond to computations or descriptions of processes.
Strictly speaking, this chapter builds only on the first and sixth parts of chapter one, where abstract views on numbers and relations between objects are introduced. But it is a focal point for abstract versions of many concepts already met.
The first two sections aim at direct generalizations of the familiar algebraic structure of numbers. This leads to a discussion of rings of polynomials. Only then we provide an introduction to group theory, for which there is only a single operation.
The last two sections provide some glimpses of direct applications. The construction of (self-correcting) codes often used in data transfer is considered. The last section explains the elementary foundations of computer algebra. This includes solving polynomial equations and algorithmic methods for manipulation and calculations with formal expressions.
1. Posets and Boolean algebras
Familiarity with the properties of addition and multiplication of scalars and matrices is assumed. Likewise, the binary operations of set intersection and union in elementary set theory, as indicated in the end of the first chapter. We proceed to work with symbols which stand for miscellaneous objects resulting in the universal applicability of the results.
This allows relating the basic set operations to propositional logic, which formalizes methods for expressing propositions and evaluating truth values.
12.1.1. Algebraic operations. For any set M, there is a set K = 2M consisting of all subsets of M, together with the operations of union V : K x K —> K and intersection A :
CHAPTER 12. ALGEBRAIC STRUCTURES
A B C | B' ⇒ C | [(A ∨ C) ∧ B]' | φ
0 0 0 |   0    |       1        | 0
0 0 1 |   1    |       1        | 1
0 1 0 |   1    |       1        | 1
0 1 1 |   1    |       0        | 0
1 0 0 |   0    |       1        | 0
1 0 1 |   1    |       1        | 1
1 1 0 |   1    |       0        | 0
1 1 1 |   1    |       0        | 0
The resulting complete disjunctive normal form is the disjunction of the clauses that correspond to the rows with 1 in the last column (i.e., the valuations of the atomic propositions for which the formula is true). Each such row corresponds to the conjunction of the variables (if the corresponding value is 1) or their negations (if it is 0). In our case, it is the disjunction of the conjunctions corresponding to the second, third, and sixth rows, i.e., the result is
$(A' \wedge B' \wedge C) \vee (A' \wedge B \wedge C') \vee (A \wedge B' \wedge C).$
We can also rewrite the formula by expanding the connective $\Rightarrow$ in terms of $\wedge$ and $\vee$, using the De Morgan laws and distributivity:
$(B' \Rightarrow C) \wedge [(A \vee C) \wedge B]'$
$\iff (B \vee C) \wedge [(A \vee C)' \vee B']$
$\iff (B \vee C) \wedge [(A' \wedge C') \vee B']$
$\iff [(B \vee C) \wedge (A' \wedge C')] \vee [(B \vee C) \wedge B']$
$\iff [(B \wedge A' \wedge C') \vee (C \wedge A' \wedge C')] \vee [(B \wedge B') \vee (C \wedge B')]$
$\iff (B \wedge A' \wedge C') \vee (C \wedge B'),$
which is an (incomplete) disjunctive normal form of the given formula. Clearly, it is equivalent to our result above (the word "complete" means that each disjunct (called a clause in this context) contains each of the three variables or their negations (these are called literals)). □
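The truth-table procedure is mechanical and easy to automate; a small Python sketch (entirely our own illustration) that reproduces the complete disjunctive normal form of the formula above:

from itertools import product

def complete_dnf(formula, names):
    # Collect one conjunctive clause for every satisfying valuation.
    clauses = []
    for values in product((0, 1), repeat=len(names)):
        if formula(*values):
            clauses.append(" & ".join(n if v else n + "'"
                                      for n, v in zip(names, values)))
    return " v ".join("(" + c + ")" for c in clauses)

# the formula of 12.A.1: (B' => C) & ((A v C) & B)'
phi = lambda A, B, C: (B or C) and not ((A or C) and B)
print(complete_dnf(phi, "ABC"))
# prints (A' & B' & C) v (A' & B & C') v (A & B' & C)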
12.A.2. Find a disjunctive normal form of the formula
$((A \wedge B) \vee C)' \wedge (A \vee (B \wedge C \wedge D)).$
O
We know several logical connectives: $\wedge$, $\vee$, $\Rightarrow$, $\equiv$, and the unary $'$. Any propositional formula with these connectives can be equivalently written using only some of them, for instance $\vee$ and $'$. There are also connectives which alone suffice to express any propositional formula. Among binary connectives, these are NAND and NOR ($A$ NAND $B = (A \wedge B)'$,
$K \times K \to K$. This is an instance of an algebraic structure on the set $K$ with two binary operations. In general, we write $(K, \vee, \wedge)$. In the special case of sets, these binary operations are usually denoted by $\cup$ and $\cap$, respectively.
To every set $A \in K$, its complement $A' = M \setminus A$ can be assigned. This is another operation $' : K \to K$ with only one argument. Such operations are called unary operations.
In general, there are algebraic structures with $k$ operations $\mu_1, \dots, \mu_k$, each of them a mapping
$\mu_j : K \times \cdots \times K \to K$ ($i_j$ times)
with $i_j$ arguments, and we write $(K, \mu_1, \dots, \mu_k)$ for such a structure. The number $i_j$ of arguments is called the arity of the operation ("unary", "binary", etc.). If $i_j = 0$, then the operation has no arguments, which means it is a distinguished element of $K$.
With subsets in $K = 2^M$, there is the unique "greatest object", i.e. the entire set $M$, which is neutral for the $\wedge$ operation. Similarly, the empty set $\emptyset \in K$ is the only neutral element for $\vee$. Notice that if $M$ is empty, then $K$ contains the only element $\emptyset$.
12.1.2. Set algebra. View the algebraic structure on the set $K = 2^M$ from the previous paragraph as $(K, \vee, \wedge, {}', 1, 0)$, with two binary operations, one unary operation (the complement), and two special elements $1 = M$, $0 = \emptyset$.
It is easily verified that all elements A, B, C e K satisfy the following properties:
Axioms of Boolean algebras
(1) $A \wedge (B \wedge C) = (A \wedge B) \wedge C$,
(2) $A \vee (B \vee C) = (A \vee B) \vee C$,
(3) $A \wedge B = B \wedge A$, $A \vee B = B \vee A$,
(4) $A \wedge (B \vee C) = (A \wedge B) \vee (A \wedge C)$,
(5) $A \vee (B \wedge C) = (A \vee B) \wedge (A \vee C)$,
(6) there is a $0 \in K$ such that $A \vee 0 = A$,
(7) there is a $1 \in K$ such that $A \wedge 1 = A$,
(8) $A \wedge A' = 0$, $A \vee A' = 1$.
Compare these properties with those of the scalars $(\mathbb{K}, +, \cdot, 0, 1)$: Properties (1) and (2) say that both the operations $\wedge$ and $\vee$ are associative. Property (3) says that both operations are also commutative. So far, this is the same as for the addition and multiplication of scalars, where there are also neutral elements for both operations.
However, the properties (4) and (5) are stronger now: they require the distributivity of A over V as well as V over A. Of course, this cannot be the case for addition and multiplication of numbers. In the case of numbers, multiplication distributes over addition but not vice versa.
$A$ NOR $B = (A \vee B)'$). Try to express each of the known connectives using only NAND, and then only NOR. These connectives are implemented in electric circuits as so-called "gates".
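The functional completeness of NAND can be checked exhaustively, e.g. in Python (a small illustration of ours):

nand = lambda a, b: 1 - (a & b)
for a in (0, 1):
    for b in (0, 1):
        assert nand(a, a) == 1 - a                      # negation A'
        assert nand(nand(a, b), nand(a, b)) == (a & b)  # conjunction
        assert nand(nand(a, a), nand(b, b)) == (a | b)  # disjunction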
12.A.3. Express the propositional formula $A \Rightarrow B$ using only NAND gates. ○
12.A.4. Write down the truth table of the Boolean proposition
$((A \wedge B) \vee C)'.$
Solution. Using De Morgan's and distributive laws, we express
$((A \wedge B) \vee C)' = (A \wedge B)' \wedge C' = (A' \vee B') \wedge C' = (A' \wedge C') \vee (B' \wedge C').$
Setting 1 for the value True and 0 for False for $A$, $B$, $C$, we obtain the table.

on M, 518
integral operators, 438
interior, 447
interior point, 264
interior point of a subset, 456
interpolation polynomial, 251
interval $[x_0, x_1]$, 257
invariant subspace, 112
inverse, 77
inverse Fourier transform, 439
inverse function, 282
inverse matrix to the rotation matrix, 32
inverse relation, 39
inversion in permutation σ, 85
invertible matrix, 77
isolated point, 265, 456
isometry, 453
isomorphism, 625
Jacobi symbol, 600
Jacobi theorem, 228
Jacobian matrix of the mapping, 488
Jarník's algorithm, 652
Jordan blocks, 164
Jordan curve theorem, 645
Jordan decomposition, 164
Jordan measure, 374
k-combinations, 15
k-combinations with repetitions, 15
kernel of linear mapping, 99
kernel of the integral operator, 438
Kronecker delta, 75
Kruskal's algorithm, 651
Kuratowski theorem, 646
l'Hospital's rule, 285
Lagrange algorithm, 226
Lagrange interpolation polynomial, 253
Lagrange's mean value theorem, 284
Laplace expansion, 90
Laplace transform, 443
law of cosines, 215
law of inertia, 227
law of quadratic reciprocity, 597, 598 leading principal minors, 89
leading principal submatrices, 89 leaf, 640
least common multiple, 567
left-sided limit, 268
Legendre polynomials, 418
Legendre symbol, 595
Leibniz criterion, 294
Leibniz rule, 280
length of a curve, 374
Leslie model for population growth, 146
level sets, 498
limes superior, 294
limit, 261, 267
limit point, 456
limit point of the set A, 263
limit points of a subset A C X, 447
line segment, 208
Linear algebra, 24
linear approximation, 255
linear combinations, 82
linear combinations of vectors, 24
linear difference equation of first order, 9
linear form η on U, 514
linear forms, 103
linear functionals, 435
linear mapping, 28
linear mapping (homomorphism), 99
linear programming problem, 134
linear restrictions, 134
linearly dependent, 82
linearly independent, 82
Lipschitz continuous, 459
Lipschitz continuity, 493
local parametrization of the manifold, 515
locally finite cover by parametrizations, 518
logarithmic function with base a, 276
logarithmic order of magnitude, 547
loop, 622
low pass filter, 428
lower bound, 259
lower Riemann sum, 366
Lucas's test, 609
Möbius function, 579
Möbius inversion formula, 579
Malthusian population growth, 9
mapping, 7
mapping from a set A to the set B, 37
Markov chain, 151
Markov process, 151
mathematical analysis, 249
mathematical induction, 10,14
matrices, 28
maximum, 485
mean value, 374
member of the determinant, 84
Menger's theorem, 634
Mersenne primes, 573
method of Lagrange multipliers, 501
metric, 445
metric on the graph, 635 metric space, 445 minimum, 485
minimum excluded value, 661 minimum spanning tree, 651 Minkowski inequality, 449
minor, 89
minor complement, 89 modules over rings, 73 Monte Carlo methods, 24 morphism, 625
multidimensional interval, 503 multiple, 566
multiplicative function, 580 mutually perpendicular, 161
natural logarithm, 276 natural spline, 257 negative definite, 228 negative semidefinite, 228 negatively definite, 487 negatively semidefinite, 487 neighborhood of a point, 264 Newton integral, 359 nilpotent, 163 Nim, 658
non-homogeneous linear difference equations, 143
Norm, 148
norm, 155, 445
norm of the partition, 365
normal space, 500
normal vector, 498
normalised, 104, 153
normalized vectors, 31
normed vector space, 445
nowhere dense, 451
number π, 297
number of solutions of a congruence, 588
objective function, 134 odd, 85
One-sided derivatives, 277 one-to-one, 38 onto, 37 open, 446
open e-neighbourhood, 446
open cover, 265,456
open intervals, 264
open set, 264
order of a modulo m, 582
order of an integer modulo m, 582
order of magnitude, 547
ordered field, 259
ordered trees, 643
ordering, 39
orientation, 36,218
orientation of the manifold, 517
oriented euclidean (point) space, 218
oriented manifold with boundary, 522
oriented manifolds, 518
oriented vector space, 218
origin of the affine coordinate system, 204
orthogonal, 104, 153
orthogonal basis, 104
orthogonal complement, 105, 154
orthogonal group, 157
orthogonal mapping, 112
orthogonal matrices, 113,157
orthogonal mother wavelet, 427
orthogonal system of functions, 419
orthogonally diagonalisable, 161
orthonormal basis, 153
orthonormal system of functions, 419
orthonormalised basis, 104 oscillates, 291 osculating circle, 353 outdegree deg_ v, 627 outer product, 220 outgoing, 622
pairwise coprime, 570 parametric description, 26, 205 parametrized by the length, 356 parent, 642
Parity of permutation, 85
Parseval equality, 156
Parseval's theorem, 420
partial derivatives of order k, 481
partial derivatives of the function f, 476
partial sums, 291
particular solution, 133
partition, 40
Pascal triangle, 15
path, 623, 625
path graph of length n, 623
path of length n, 625
perfect numbers, 573
periodic, 298
periodic function, 422
permutation, 12
permutation of the set X, 84
permutation with repetitions, 15
perpendicular, 31,104,161
perpendicular projection, 105
Perron-Frobenius theory, 147
Petersen graph, 624
phase frequency, 425
Picard's approximation, 532
planar graph, 645
plane trees, 643
Pocklington-Lehmer, 609
points, 203
polar basis, 226
polynomial order of magnitude, 547
polynomials, 8
positive definite, 227
positive direction, 32
positive matrix, 147
positive semidefinite, 228
positively definite, 163,487
positively semidefinite, 163, 487
power function xa, 275
power mean with exponent r, 288
power residue, 594
power series, 295
power series centered at x0, 300
predecessor, 642
preimage, 38
primality witness, 609
prime, 570
Primitive matrix, 147
primitive root, 584
principal matrices, 89
principal minors, 89
principle of inclusion-exclusion, 20
private key, 612, 614
probability function, 17
projection, 105
projective maps, 231
Projective plane V2, 229
projective quadric, 235
projective transformations, 231
projectivization of a vector space, 230
proper, 267, 277
proper rational functions, 363
pseudoinverse matrix, 175
pseudoprime, 605
public key, 612, 614
pullback of the form η by φ, 515
QR decomposition, 175 quadratic forms, 223 quadrics, 223
Rabin cryptosystem, 613 radius of convergence, 295 range, 7
rank of the matrix, 82 rank of the quadratic form, 224 ratio of points, 212 rational functions, 274 rays, 208
real-valued functions of a real variable, 249 recurrence relation, 9 reduced residue system, 581 reflection through a line, 33 reflexive, 39
regular collineations, 231
regular square matrix, 77
residual capacity, 655
Riccati equation, 529
Riemann integral, 365
Riemann measurable, 504
Riemann measure of the set, 373
Riemann sum, 365
Riemann-Stieltjes integral sum, 386
right-continuous or left-continuous, 272
right-sided limit, 268
Rolle's theorem, 284
root of the polynomial, 251
root of the tree, 641
root vector, 165
rooted trees, 641
rotation or curl of the vector field, 524 rows of the matrix, 74 RSA, 612
Sarrus rule, 84
sample space, 17
sampling interval τ, 443
scalar functions, 7
Scalar product, 104
scalar product, 31, 74, 153
scale, 36
second-order partial derivatives, 481
self-adjoint, 160
self-adjoint matrices, 160
semiaxes, 225
semipath, 655
separated variables, 370
sequence an converges to a, 268
sequentially continuous, 369
series of functions, 295
set of solutions, 588
shift of the plane, 25
signature of a quadratic form, 227
simplex, 208, 209
Simpson's rule, 386
sine Fourier series, 427
singular values of the matrix, 173
sink, 654
size of a flow, 654
size of the vector, 153
smooth, 342
solution, 525
solvable, 136
source, 654
spanning subgraph, 626
spanning tree, 649
spectral radius of matrix A, 148
Spectrum of linear mapping, 115
spectrum of linear mapping, 163
Sprague-Grundy function, 661
Sprague-Grundy theorem, 662
square matrix, 75
square wave function, 425
Standard affine space $\mathcal{A}_n$, 202
standard basis of K", 98
standard maximalisation problem, 134
standard minimisation problem, 134
standard unitary space, 153
stationary point of the function, 485
stationary points, 344, 501
Steinitz exchange theorem, 97
Steinitz's theorem, 648
stochastic matrices, 152
stochastically independent, 20, 21
strategy, 658
strict extremum, 485
subdeterminant, 89
subgraphs, 626
submatrix of the matrix A, 89
subspaces, 208
successor, 642
sum of impartial games, 661
sum of subspaces, 95
supremum, 259
surface and volume of a solid of revolution, 376
surjective, 37
symmetric, 39, 86
symmetric bilinear form, 108
symmetric mappings, 160
symmetric matrices, 160
symmetrization, 622
tail of the edge, 622
tangent hyperplane, 480
tangent line, 255
tangent line to the curve c, 476
tangent plane, 480
tangent space, 500
tangent space TU, 514
tangent vector, 476, 513
Taylor expansion with a remainder, 345
Taylor polynomial of fc-th degree, 345
the backward difference, 358
the central difference, 358
the class of functions $C^k(A)$, 342
the curvature of the curve, 356
the differential of function /, 352
the differention of the second order, 358
The domain of the relation, 37
The Euclidean plane, 30
the existence of a neutral element, 5
the existence of a unit element, 5
the existence of an inverse element, 5
the forward difference, 358
the Frenet frame, 356
The fundamental theorem of algebra, 343
the graph of a function, 38
the indefinite integral, 358
the integral mean value theorem, 369
the lower Riemann integral, 369
the main normal, 356
the primitive function, 358
the second derivative, 342
the sources and sinks of the network, 653
the uniform continuity, 369
the upper Riemann integral, 369
the Weierstrass test, 382
topology, 264, 447
topology of the complex plane, 264
topology of the metric spaces, 447
topology of the real line, 264
torsion of the curve, 356
totally bounded, 457
Trace of mapping, 111
trace of matrix, 111
trail, 625
transformation, 489 transient, 152 transitive, 39 translation, 25, 203 transpose, 86 transposition, 84 trapezoidal rule, 385 tree, 640 triangle, 208, 623 triangle inequality, 155 trigonometric functions, 297
unbounded, 264 undirected graph, 622 uniform continuity, 368 uniformly bounded, 460 uniformly Cauchy, 380
unit decomposition subordinate to a locally finite cover, 519
unit matrix, 32, 75
unitary group, 157
Unitary isomorphism, 154
unitary mapping, 154
unitary matrices, 157
Unitary space, 153
universal formula, 667
unsaturated, 655
upper bound, 259
upper Riemann sum, 366
Vandermonde determinant, 252 variation, 387 vector, 72 vector field, 541 vector field X, 513
vector field X along the curve M, 513
vector functions, 354
vector functions of one real variable, 354
vector of restrictions, 134
Vector space, 92
vector subspace, 94
vectors, 24 vertices, 622
walk, 625
walk of length n, 625 wavelet mother function, 427 weak connectedness, 633 weakly connected, 639 weight, 636
zero curvature, 352 zero matrix, 74 zero measure, 388 zero vector, 24
Based on the earlier textbook: Matematika drsně a svižně Jan Slovák, Martin Panák, Michal Bulant a kolektiv
published by Masarykova univerzita in 2013; 1st edition, 2013, 500 copies
Typography, LaTeX and more: Tomáš Janoušek
Print: Tiskárna Knopp, Černčice 24, 549 01 Nové Město nad Metují

(2) is more demanding. So assume $X$ is complete and totally bounded, but $X$ does not satisfy (2).
Then there is an open covering $U_\alpha$, $\alpha \in I$, of $X$ which does not contain any finite subcovering. Choose a sequence of positive real numbers $\varepsilon_k \to 0$ and consider the finite $\varepsilon_k$-nets from the definition of total boundedness. Further, for each $k$, consider the system $A_k$ of closed balls with centres in the points of the $\varepsilon_k$-net and diameters $2\varepsilon_k$. Clearly each such system $A_k$ covers the entire space $X$. Altogether, there must be at least one closed ball $C$ in the system $A_1$ which is not covered by a finite number of the sets $U_\alpha$. Call it $C_1$ and notice that $\operatorname{diam} C_1 \le 2\varepsilon_1$.
Next, consider the sets $C_1 \cap C$, with balls $C \in A_2$, which cover the entire set $C_1$. Again, at least one of them cannot be covered by a finite number of the $U_\alpha$; call it $C_2$. This way, we inductively construct a sequence of sets $C_k$ satisfying $C_{k+1} \subset C_k$, $\operatorname{diam} C_k \le 2\varepsilon_k$, $\varepsilon_k \to 0$, and none of them can be covered by a finite number of the open sets $U_\alpha$.
Finally, we choose one point $x_k \in C_k$ in each of these sets. By construction, this must be a Cauchy sequence. Consequently, this sequence of points has a limit $x$, since $X$ is complete. Thus there is some $U_{\alpha_0}$ containing $x$, and it contains also some $\delta$-neighbourhood $B_\delta(x)$. But now, if $\operatorname{diam} C_k \le 2\varepsilon_k < \delta$, then $C_k \subset B_\delta(x) \subset U_{\alpha_0}$, which is a contradiction.
The remaining step is to show the implication (2) $\Rightarrow$ (1). Assume (2), and considering any sequence of points $x_i \in X$, set $C_n = \overline{\{x_k;\ k \ge n\}}$. The intersection of these sets must be non-empty by the following general lemma:
Lemma. Let $X$ be a metric space such that property (2) in the theorem holds. Consider a system of closed sets $D_\alpha$, $\alpha \in I$, such that each of its finite subsystems $D_{\alpha_1}, \dots, D_{\alpha_k}$ has non-empty intersection. Then also $\bigcap_{\alpha \in I} D_\alpha \neq \emptyset$.
This simple lemma is proved by contradiction, again. If the latter intersection is empty, then
$X = X \setminus \bigl(\bigcap_{\alpha \in I} D_\alpha\bigr) = \bigcup_{\alpha \in I} (X \setminus D_\alpha) = \bigcup_{\alpha \in I} V_\alpha,$
where $V_\alpha = X \setminus D_\alpha$ are open sets. Thus, there must be a finite number of them, $V_{\alpha_1}, \dots, V_{\alpha_n}$, covering $X$ too. Thus, we obtain
$X = \bigcup_{i=1}^{n} V_{\alpha_i} = \bigcup_{i=1}^{n} (X \setminus D_{\alpha_i}) = X \setminus \bigl(\bigcap_{i=1}^{n} D_{\alpha_i}\bigr).$
This is a contradiction with our assumptions on $D_\alpha$, and the lemma is proved.
Now, let $x \in \bigcap_{n=1}^{\infty} C_n$. By construction, there is a subsequence $x_{n_k}$ of our sequence of points $x_n \in X$ such that $d(x_{n_k}, x) < 1/k$. This is a converging subsequence, and so the proof is complete. □
Solution. That the requirement (1) cannot be omitted is probably contrary to many readers' expectations. For a counterexample, consider the set $X = \mathbb{N}$ with the metric
$d(m, n) = 1 + \tfrac{1}{m+n}$ for $m \neq n$, $\quad d(m, n) = 0$ for $m = n$.
It is indeed a metric. The first and second properties are clearly satisfied. To prove the triangle inequality, it suffices to observe that $d(m, n) \in (1, 4/3]$ if $m \neq n$. Hence the only Cauchy sequences are those which are constant from some index on. These sequences are constant except for finitely many terms, sometimes called almost stationary sequences. Thus, every Cauchy sequence is convergent, so the metric space is complete. Define
$A_n := \{m \in \mathbb{N};\ d(m, n) \le 1 + \tfrac{1}{2n}\}, \quad n \in \mathbb{N}.$
As the inequality in their definition is not strict, it is guaranteed that they are closed sets. Since $A_n = \{n, n+1, \dots\}$, the sets $A_n$ are nested, but with empty intersection. Of course, in this case the condition (1) is not met, as
$\lim_{n \to \infty} \sup\{d(x, y);\ x, y \in A_n\} = \lim_{n \to \infty}\Bigl(1 + \frac{1}{2n+1}\Bigr) = 1 \neq 0.$
□
7.G.4. Determine whether the set (known as the Hilbert cube)
$A = \{(x_n)_{n \in \mathbb{N}} \in \ell_2;\ |x_n| \le \tfrac{1}{n},\ n \in \mathbb{N}\}$
is compact in $\ell_2$. Then determine the compactness of the set
$B = \{(x_n)_{n \in \mathbb{N}} \in \ell_\infty;\ |x_n| < \tfrac{1}{n},\ n \in \mathbb{N}\}$
in the space $\ell_\infty$.
Solution. The space $\ell_2$ is complete (see 7.F.5). Every closed subset of a complete metric space is itself a complete metric space. The set $A$ is evidently closed in $\ell_2$, so it suffices to show that it is totally bounded; by theorem 7.3.13(3), it is then compact. To do that, we construct an $\varepsilon$-net of $A$ for any given $\varepsilon > 0$. Begin with the well-known series
$\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$
(see (1)). For every $\varepsilon > 0$, there is an $n(\varepsilon) \in \mathbb{N}$ satisfying
$\sum_{k=n(\varepsilon)+1}^{\infty} \frac{1}{k^2} < \frac{\varepsilon^2}{4}.$
As an immediate corollary of the latter theorem, each closed subset of a compact metric space is again compact: subsets of a totally bounded set are totally bounded, and closed subsets of a complete metric space are complete.
Another consequence is an alternative proof that a subset $K \subset \mathbb{R}^n$ is compact if and only if it is closed and bounded.
Notice also that while the conditions (1) and (3) are given in terms of the metric, the equivalent condition (2) is purely topological.
7.3.14. Continuous functions. We revisit the questions related to continuity of mappings between metric spaces. In fact, many ideas understood for functions of one real variable generalize naturally.
In particular, every continuous function $f : X \to \mathbb{R}$ on a compact set $X$ is bounded and achieves its maximum and minimum. Indeed, consider the open intervals $U_n = (n-1, n+1) \subset \mathbb{R}$, $n \in \mathbb{Z}$, covering $\mathbb{R}$. Their preimages $f^{-1}(U_n)$ cover $X$, so a finite number of them covers $X$ as well. Thus $f$ is bounded, and the supremum and infimum of its values exist. Consider sequences $f(x_n)$ and $f(y_n)$ converging to the supremum and infimum, respectively. Then there must be convergent subsequences of the points $x_n$ and $y_n$ in $X$, and their limits $x$ and $y$ are in $X$ too. But then $f(x)$ and $f(y)$ are the supremum and infimum of the values of $f$, since $f$ is continuous and thus respects convergence.
It is also instructive to see the differences between the "purely topological" concepts, such as continuity (possibly defined merely by means of open sets), and the stronger concepts discussed next, which are "metric" properties.
Uniformly continuous mappings
A mapping $f : X \to Y$ between metric spaces is called uniformly continuous if for each $\varepsilon > 0$ there is a $\delta > 0$ such that $d_Y(f(x), f(y)) < \varepsilon$ for all $x, y \in X$ with $d_X(x, y) < \delta$.
Notice that this requirement on the uniform continuity of $f$ is equivalent to the condition that for each pair of sequences $x_k$ and $y_k$ in $X$, $d_X(x_k, y_k) \to 0$ implies $d_Y(f(x_k), f(y_k)) \to 0$.
This observation leads to the following generalization of the behavior of real functions:
Lemma. Each continuous mapping f : X —> Y on a compact metric space X is uniformly continuous.
Proof. Assume $f$ is a continuous function. Consider any two sequences $x_k$ and $y_k$ with $d_X(x_k, y_k) \to 0$.
Since $X$ is compact, there is a subsequence of $x_k$ converging to some point $x \in X$, and so we may assume $x_k \to x$ without loss of generality. Now, $d_X(x, y_k) \le d_X(x, x_k) + d_X(x_k, y_k) \to 0$, and so $\lim_{k \to \infty} y_k = x$, too.
From each of the intervals $[-1/n, 1/n]$ for $n \in \{1, \dots, n(\varepsilon)\}$, choose finitely many points $x_1^n, \dots, x_{m(n)}^n$ so that for any $x \in [-1/n, 1/n]$,
$\min_{j \in \{1, \dots, m(n)\}} |x - x_j^n| < \frac{\varepsilon}{\sqrt{5^n}}.$
Consider those sequences $(y_n)_{n \in \mathbb{N}}$ from $\ell_2$ whose terms with indices $n > n(\varepsilon)$ are zero, and at the same time,
$y_1 \in \{x_1^1, \dots, x_{m(1)}^1\}, \ \dots, \ y_{n(\varepsilon)} \in \{x_1^{n(\varepsilon)}, \dots, x_{m(n(\varepsilon))}^{n(\varepsilon)}\}.$
There are only finitely many such sequences, and they create the desired $\varepsilon$-net for $A$: let $x = (x_n) \in A$ be arbitrary. According to our choice of the sequences, there is a $y = (y_n)$ as above such that
$d(x, y) = \sqrt{\sum_{k=1}^{\infty}(x_k - y_k)^2} \le \sqrt{\sum_{k=1}^{n(\varepsilon)}(x_k - y_k)^2} + \sqrt{\sum_{k=n(\varepsilon)+1}^{\infty} x_k^2} < \sqrt{\sum_{k=1}^{n(\varepsilon)} \frac{\varepsilon^2}{5^k}} + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$
Since $\varepsilon > 0$ is arbitrary, the set $A$ is totally bounded, which implies compactness.
The closure of the set $B$ is
$\bar B = \{(x_n)_{n \in \mathbb{N}} \in \ell_\infty;\ |x_n| \le \tfrac{1}{n},\ n \in \mathbb{N}\}.$
Hence $B$ is not closed, and so it is not compact. The set $\bar B$ is compact. The proof of this fact is much simpler than for the set $A$, thus we leave it as an exercise for the reader. □
7.G.5. Prove that on each metric space $X$, the given metric $d$ is a continuous function $X \times X \to \mathbb{R}$. ○
7.G.6. Show that if $F$ is a continuous mapping on a compact metric space $X$ satisfying the inequality $d(F(x), F(y)) < d(x, y)$ for all $x \neq y$, then $F$ has a unique fixed point. ○

(assume $a > 0$ without loss of generality):
$|\det W| \int f(a y_1, y_2, \dots, y_n)\, dy_1 \cdots dy_n = a \int \cdots \Bigl(\int f(a y_1, y_2, \dots, y_n)\, dy_1\Bigr) \cdots dy_n = a\, a^{-1} \int \cdots \Bigl(\int f(x_1, x_2, \dots, x_n)\, dx_1\Bigr) \cdots dx_n = \int f(x_1, x_2, \dots, x_n)\, dx_1 \cdots dx_n.$
The second case is even easier, since the order of integration does not matter due to the Fubini theorem. The third case is similar to the first one:
$|\det W| \int f(y_1 + y_2, y_2, \dots, y_n)\, dy_1 \cdots dy_n = \int_{a_n}^{b_n} \cdots \Bigl(\int_{a_1}^{b_1} f(y_1 + y_2, y_2, \dots, y_n)\, dy_1\Bigr) \cdots dy_n = \int_{a_n}^{b_n} \cdots \Bigl(\int_{a_1 + x_2}^{b_1 + x_2} f(x_1, x_2, \dots, x_n)\, dx_1\Bigr) \cdots dx_n = \int f(x_1, x_2, \dots, x_n)\, dx_1 \cdots dx_n.$
The reader should check the details that the last multiple integral describes the image. □
As a direct corollary of the proposition, the Riemann integral is invariant with respect to the Euclidean affine mappings. That is, the integral cannot depend on the choice of the orthogonal frame in the Euclidean $\mathbb{R}^n$.
8.2.10. Riemann measurable sets. It is necessary to understand how to recognize Riemann measurable domains M.
When defining the Riemann integral, a strict analogy of the lower and upper Riemann integrals for univariate functions can be considered. This means taking infima or suprema of the integrated function over the corresponding multidimensional intervals instead of the function values at the representatives in the Riemann sums. For bounded functions, there are
changed proportionally to the change of the volume of an infinitesimal volume element, which is given by the Jacobian. Therefore, if we consider the volume of the ball with a given radius $r$ to be known (in this case, $r = 1$), we can infer directly that the volume of the ellipsoid with semiaxes $a, b, c$ is $V = abc \cdot \frac{4}{3}\pi$.
8.1.12. Find the volume of the solid which is bounded by the paraboloid $2x^2 + 5y^2 = z$ and the plane $z = 1$.
Solution. We choose the coordinates
$x = \tfrac{1}{\sqrt 2}\, r \cos\varphi, \quad y = \tfrac{1}{\sqrt 5}\, r \sin\varphi, \quad z = z.$
The Jacobian determinant is $\frac{r}{\sqrt{10}}$, so the volume is
$V = \int_0^{2\pi}\!\!\int_0^1\!\!\int_{r^2}^{1} \frac{r}{\sqrt{10}}\, dz\, dr\, d\varphi = \frac{\pi}{2\sqrt{10}}.$
□
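Results of this kind are easy to sanity-check numerically, for instance by Monte Carlo sampling over a box containing the solid (a rough Python sketch, entirely ours):

import random

def mc_volume(trials=10**6):
    # Box [-1/sqrt2, 1/sqrt2] x [-1/sqrt5, 1/sqrt5] x [0, 1] contains the
    # solid; count sampled points with 2x^2 + 5y^2 < z < 1.
    ax, ay = 2 ** -0.5, 5 ** -0.5
    hits = 0
    for _ in range(trials):
        x, y, z = random.uniform(-ax, ax), random.uniform(-ay, ay), random.random()
        hits += 2 * x * x + 5 * y * y < z
    return hits / trials * (2 * ax) * (2 * ay)

The returned value approaches $\pi / (2\sqrt{10}) \approx 0.4967$.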
8.1.13. Find the volume of the solid which lies in the first octant and is bounded by the surfaces $y^2 + z^2 = 9$ and $y^2 = 3x$.
Solution. In cylindrical coordinates,
$V = \int_0^{\pi/2}\!\!\int_0^3\!\!\int_0^{\frac{r^2 \cos^2\varphi}{3}} r\, dx\, dr\, d\varphi = \frac{27}{16}\pi.$
□
8.1.14. Find the volume of the solid in $\mathbb{R}^3$ which is bounded by the cone part $2x^2 + y^2 = (z-2)^2$, $z \ge 2$, and the paraboloid $2x^2 + y^2 = 8 - z$.
well-defined values of the upper and lower integrals found in this way. If this is done for the indicator function $\chi_M$ of a fixed set $M$, the inner and outer Riemann measure of the set $M$ is obtained. Evidently, the inner measure is the supremum of the areas given by the (finite) sums of the volumes of all multidimensional intervals from the partitions which lie inside $M$; on the other hand, the outer measure is the infimum of the (finite) sums of the volumes of intervals covering $M$. It follows directly from the definition that a set $M$ is Riemann measurable if and only if its inner and outer measures are equal.
The sets whose outer measure is zero are, of course, Riemann measurable. They are called measure zero sets or null sets. The finite additivity of the Riemann integral makes the measure finitely additive. Hence, a disjoint union of finitely many measurable sets is again a measurable set, and its measure is given by the sum of the measures of the individual sets in the union.
Consider the measurability of any given set $M \subset I \subset \mathbb{R}^n$ inside a sufficiently large multidimensional interval $I$. Consider the boundary $\partial M$, i.e. the set of all boundary points of $M$. For any partition $\Xi$ of $I$ from the definition of the Riemann integral of $\chi_M$, each of the intervals with non-trivial intersection with $\partial M$ contributes to the upper integral but might not contribute to the lower integral. On the contrary, for every point in the interior $M^\circ \subset M$, its interval $I_{i_1 \dots i_n}$ contributes to both in the same way as soon as the norm of the partition is small enough. This observation leads to the first part of the following claim:
Proposition. A bounded set $M \subset \mathbb{R}^n$ is Riemann measurable if and only if its boundary is of Riemann measure zero.
If $M$ is a Riemann measurable set and $G : M \subset \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable and invertible mapping, then $G(M)$ is again Riemann measurable.
Proof. The first claim is already verified. Since both $G$ and $G^{-1}$ are continuous, $G$ maps internal points of $M$ to internal points of $G(M)$. To finish the proof, it must be verified that $G$ maps the boundary $\partial M$, which is a set of measure zero, again to a set of measure zero.
Since every Riemann measurable set $M$ is bounded, its closure $\bar M$ must be compact. It follows that $G$ and all partial derivatives of its components are uniformly continuous on $\bar M$, and in particular on the boundary $\partial M$.
Next, consider a partition \Xi of an interval I containing \partial M and a fixed tiny interval J in the partition containing a point t \in \partial M. Write R = G(t) + D^1G(t)(J - t): the interval J is first shifted to the origin by translation, then the derivative of G is applied, obtaining a parallelepiped, and this is shifted back to be positioned around G(t). By the uniform continuity of G and D^1G, for each \varepsilon > 0 there is a bound \delta for the norm of the partition for which

G(J) \subset G(t) + (1 + \varepsilon)D^1G(t)(J - t)
Solution. First of all, we find the intersection of the given surfaces:

(z - 2)^2 = -z + 8, \quad z > 2;

therefore z^2 - 3z - 4 = 0, so z = 4, and the equation of the intersection is 2x^2 + y^2 = 4. The substitution x = \frac{1}{\sqrt{2}}\, r\cos(\varphi), y = r\sin(\varphi) then turns 2x^2 + y^2 into r^2, and the volume can be computed as in the previous problems.

(Recall that every continuously differentiable mapping F : \mathbb{R}^n \to \mathbb{R}^m is Lipschitz continuous over convex compact sets.)
8.1.20. Differential of composite mappings. The following theorem formulates a very useful generalization of the chain rule for univariate functions. Except for the concept of the differential itself, which is mildly complicated, it is actually the same as the one already seen in the case of one variable.
The Jacobi matrix for univariate functions is a single number, namely the derivative of the function at a given point, so the multiplication of Jacobi matrices is simply the multiplication of the derivatives of the outer and inner components of the function. There is, of course, another special case: the formula derived and used several times for the derivative of a composition of multivariate functions with curves. There,
whence

z_x(\pi, 1) = -\frac{F_x(\pi,1,0)}{F_z(\pi,1,0)} = \frac{1}{\pi + 1}, \qquad z_y(\pi, 1) = -\frac{F_y(\pi,1,0)}{F_z(\pi,1,0)} = \frac{\pi}{\pi + 1}.

□
8.G.4. Consider the mapping F : \mathbb{R}^3 \to \mathbb{R}^2, F(x, y, z) = (f(x,y,z), g(x,y,z)) = (e^{x \sin y} - 1, \, xyz - \pi). Show that the equation F(x, c_1(x), c_2(x)) = (0, 0) defines a curve c : \mathbb{R} \to \mathbb{R}^2 on a neighborhood of the point [1, \pi, 1], and determine the tangent vector to this curve at the point x = 1.

Solution. We calculate the square matrix of the partial derivatives of the mapping F with respect to y and z:

H(x,y,z) = \begin{pmatrix} f_y & f_z \\ g_y & g_z \end{pmatrix} = \begin{pmatrix} x\cos(y)\,e^{x\sin y} & 0 \\ xz & xy \end{pmatrix}.

Hence,

H(1,\pi,1) = \begin{pmatrix} -1 & 0 \\ 1 & \pi \end{pmatrix}

and \det H(1,\pi,1) = -\pi \neq 0. Now, it follows from the implicit mapping theorem (see 8.1.24) that the equation F(x, c_1(x), c_2(x)) = (0, 0) determines, on a neighborhood of the point [1, \pi, 1], a curve (c_1(x), c_2(x)) defined on a neighborhood of the point x = 1. In order to find its tangent vector at this point, we need the (column) vector (f_x, g_x)^T there:

\begin{pmatrix} f_x \\ g_x \end{pmatrix} = \begin{pmatrix} \sin(y)\,e^{x\sin y} \\ yz \end{pmatrix}, \qquad \begin{pmatrix} f_x(1,\pi,1) \\ g_x(1,\pi,1) \end{pmatrix} = \begin{pmatrix} 0 \\ \pi \end{pmatrix}.
The wanted tangent vector is thus

\begin{pmatrix} (c_1)_x(1) \\ (c_2)_x(1) \end{pmatrix} = -\begin{pmatrix} f_y(1,\pi,1) & f_z(1,\pi,1) \\ g_y(1,\pi,1) & g_z(1,\pi,1) \end{pmatrix}^{-1} \begin{pmatrix} f_x(1,\pi,1) \\ g_x(1,\pi,1) \end{pmatrix} = -\begin{pmatrix} -1 & 0 \\ 1 & \pi \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ \pi \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix}.
□
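The final step is a small linear solve, which can be verified numerically; a minimal sketch with numpy (the matrix and vector entries are the values computed above):

    import numpy as np

    H = np.array([[-1.0, 0.0],      # (f_y, f_z) at (1, pi, 1)
                  [ 1.0, np.pi]])   # (g_y, g_z) at (1, pi, 1)
    b = np.array([0.0, np.pi])      # (f_x, g_x) at (1, pi, 1)
    print(-np.linalg.solve(H, b))   # [ 0. -1.], the tangent vector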
H. Constrained optimization
We will begin with a somewhat atypical optimization problem.
8.H.1. A betting office accepts bets on the outcome of a tennis match. Let the odds laid against player A winning be a : 1 (i.e., if a bettor bets x dollars on the event that player A wins and this really happens, then the bettor wins ax dollars) and, similarly, let the odds laid against player B winning be b : 1 (fees are neglected). What is the necessary and sufficient condition on the (positive real) numbers a and b so that a bettor cannot guarantee a profit regardless of the actual outcome of the match? (For instance, if the odds were laid 1.5 : 1 against
the differential is the one-form expressed via the partial derivatives of the outer components, evaluated on the vector of the derivative of the inner component, again given by the product of one row (the form) and one column (the vector).
The chain rule
Theorem. Let F : E_n \to E_m and G : E_m \to E_r be two differentiable mappings, where the domain of G contains the whole image of F. Then the composite mapping G \circ F is also differentiable, and its differential at any point x in the domain of F is given by the composition of differentials
D^1(G \circ F)(x) = D^1G(F(x)) \circ D^1F(x).
The Jacobi matrix on the left hand side is the product of the corresponding Jacobi matrices on the right hand side.
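The statement about Jacobi matrices can be tested on any concrete pair of mappings. A minimal symbolic sketch, assuming sympy; the mappings F and G below are our own illustrative choices:

    import sympy as sp

    x, y, u, v = sp.symbols('x y u v')
    F = sp.Matrix([x*y, x + y])            # F : R^2 -> R^2
    G = sp.Matrix([sp.sin(u) + v, u*v])    # G : R^2 -> R^2
    comp = G.subs({u: F[0], v: F[1]})      # the composition G o F
    lhs = comp.jacobian([x, y])
    rhs = G.jacobian([u, v]).subs({u: F[0], v: F[1]}) * F.jacobian([x, y])
    print(sp.simplify(lhs - rhs))          # the zero matrix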
Proof. In paragraph 8.1.6 and in the proof of Taylor's theorem, it was derived how the differentiation of mappings composed of functions and curves behaves. This proves the theorem in the special case n = 1. The general case can be proved analogously; one just has to work with more vectors.
Fix an arbitrary increment v and calculate the directional derivative for the composition G \circ F at a point x \in E_n. This means determining the differentials of the particular coordinate functions of the mapping G composed with F. To simplify, write g \circ F for any one of them.
d_v(g \circ F)(x) = \lim_{t \to 0} \frac{1}{t}\bigl(g(F(x + tv)) - g(F(x))\bigr).
The expression in parentheses can, from the definition of the differential of g, be expressed as

g(F(x + tv)) - g(F(x)) = dg(F(x))(F(x + tv) - F(x)) + \alpha(F(x + tv) - F(x)),

where \alpha is a function defined on a neighbourhood of the point F(x) which is continuous and satisfies \lim_{v \to 0} \frac{1}{\|v\|}\,\alpha(v) = 0. Substitution into the equality for the directional derivative yields
d_v(g \circ F)(x) = \lim_{t \to 0} \frac{1}{t}\Bigl(dg(F(x))\bigl(F(x + tv) - F(x)\bigr) + \alpha\bigl(F(x + tv) - F(x)\bigr)\Bigr)

= dg(F(x))\Bigl(\lim_{t \to 0} \frac{1}{t}\bigl(F(x + tv) - F(x)\bigr)\Bigr) + \lim_{t \to 0} \frac{1}{t}\,\alpha\bigl(F(x + tv) - F(x)\bigr)

= dg(F(x)) \circ D^1F(x)(v) + 0.
Here, the fact that linear mappings between finite-dimensional spaces are always continuous was used. In the last step, the Lipschitz continuity of F, i.e. \|F(x + tv) - F(x)\| \le C\|v\|\,t, was exploited, together with the properties of the function \alpha.
So the theorem is proved for the particular coordinate functions g_1, \ldots, g_r of the mapping G. The theorem in general now follows
the win of A and 5 : 1 against the win of B, then the bettor could bet 3 dollars on B winning and 7 dollars on A winning and profit from this bet in either case).
Solution. Let the bettor have P dollars. The bet amount can be divided into kP and (1 - k)P dollars, where k \in (0,1). The payoff is then akP dollars (if player A wins) or b(1 - k)P dollars (if B does). The bettor is always guaranteed to win the lesser of these two amounts; the total profit (or loss) is then obtained by subtracting the stake P. Since each of a, b, P is a positive real number, the function akP is increasing and the function b(1 - k)P is decreasing with respect to k. For k = 0, b(1 - k)P is greater; for k = 1, akP is. The minimum of the two numbers akP and b(1 - k)P is thus maximal exactly for the value k_0 \in (0,1) which satisfies ak_0P = b(1 - k_0)P, whence k_0 = \frac{b}{a+b}. Therefore, the betting office must choose a, b so that ak_0P = b(1 - k_0)P \le P, which is equivalent to ak_0 \le 1, i.e., ab \le a + b. □

We managed to solve this constrained optimization problem even without using the differential calculus. However, we will not be able to do so in the following problems.
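The optimal split k_0 and the no-guaranteed-profit condition are easy to explore numerically; a small sketch with hypothetical odds (our own illustration):

    def guaranteed_profit(a, b, P=1.0):
        # the split k0 = b/(a+b) equalizes the two possible payoffs
        k0 = b / (a + b)
        return a * k0 * P - P    # guaranteed payoff minus the stake

    print(guaranteed_profit(1.5, 5.0))   # 0.15... > 0, since ab > a + b
    print(guaranteed_profit(1.5, 2.0))   # -0.14... <= 0, since ab <= a + b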
8.H.2. Find the extremal values of the function

h(x, y, z) = x^3 + y^3 + z^3

on the unit sphere S in \mathbb{R}^3 given by the equation

F(x, y, z) = x^2 + y^2 + z^2 - 1 = 0,

as well as on the circle which is the intersection of this sphere with the plane

G(x, y, z) = x + y + z = 0.
Solution. First, we look for the stationary points of the function h on the sphere S. Computing the corresponding gradients (for instance, \operatorname{grad} h(x, y, z) = (3x^2, 3y^2, 3z^2)), we get the system

0 = 3x^2 - 2\lambda x, \quad 0 = 3y^2 - 2\lambda y, \quad 0 = 3z^2 - 2\lambda z, \quad 0 = x^2 + y^2 + z^2 - 1,
consisting of four equations in four variables. Before trying to solve this system, we can estimate how many local constrained extrema we should anticipate the function to have. Surely, |h(P)| is at most 1, and this value is attained at all intersection points of the coordinate axes with
from the definition of matrix multiplication and its links to linear mappings. □
8.1.21. Transformation of coordinates. A mapping F : E_n \to E_n which has an inverse mapping G : E_n \to E_n defined on the entire image of F is called a transformation. Such a mapping can be perceived as a change of coordinates. It is usually required that both F and G be (continuously) differentiable mappings.
Just as in the case of vector spaces, the choice of the "point of view", i.e. the choice of coordinates, can simplify or complicate comprehension of the examined object. The change of coordinates is now being discussed in a much more general form than in the case of affine mappings in the fourth chapter. Sometimes, the term "curvilinear coordinates" is used in this general sense. An illustrative example is the change of the most usual coordinates in the plane to polar coordinates: the position of a point P is given by its distance r = \sqrt{x^2 + y^2} from the origin and the angle \varphi = \arctan(y/x) between the ray from the origin to it and the x-axis (if x \neq 0).
The illustration shows the "line" r = 0.
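In code, the change to polar coordinates and back might look as follows (a sketch, assuming numpy; arctan2 removes the restriction x \neq 0 by taking the quadrant into account):

    import numpy as np

    def to_polar(x, y):
        return np.hypot(x, y), np.arctan2(y, x)    # r, phi

    def to_cartesian(r, phi):
        return r*np.cos(phi), r*np.sin(phi)

    r, phi = to_polar(1.0, 2.0)
    print(np.allclose(to_cartesian(r, phi), (1.0, 2.0)))   # True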
8.J.4. Find the solution of the differential equation

y' = \frac{y^2 + 1}{x + 1}

which satisfies y(0) = 1.
Solution. Similarly to the previous example, we get

\frac{dy}{y^2 + 1} = \frac{dx}{x + 1},

\arctan y = \ln|x + 1| + C, \quad C \in \mathbb{R}.

The initial condition (i.e., the substitution x = 0 and y = 1) gives \arctan 1 = \ln|1| + C, i.e., C = \frac{\pi}{4}. Therefore, the solution of the given initial value problem is the function

y(x) = \tan\left(\ln(x + 1) + \frac{\pi}{4}\right)

on a neighborhood of the point [0,1]. □
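That this function solves the initial value problem can be confirmed by direct substitution; a minimal symbolic sketch, assuming sympy:

    import sympy as sp

    x = sp.symbols('x')
    y = sp.tan(sp.log(x + 1) + sp.pi/4)
    print(sp.simplify(sp.diff(y, x) - (y**2 + 1)/(x + 1)))  # 0
    print(y.subs(x, 0))                                     # 1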
8.J.5. Solve

(1) \quad y' = \frac{x + y + 1}{2x + 2y - 1}.
Solution. Let a function f : (a, b) \times (c, d) \to \mathbb{R} have continuous second-order partial derivatives and satisfy f(x,y) \neq 0 for all x \in (a,b), y \in (c, d). Then the differential equation y' = f(x,y) can be transformed to an equation with separated variables if and only if

f(x,y) \cdot f''_{xy}(x,y) = f'_x(x,y) \cdot f'_y(x,y).

With a bit of effort, it can be shown that a differential equation of the form y' = f(ax + by + c) can be transformed to an equation with separated variables, and this can be done by the substitution z = ax + by + c. Let us emphasize that the variable z replaces y.
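As an aside, the separability criterion just stated is easy to test symbolically; a minimal sketch, assuming sympy (the two test functions are our own examples):

    import sympy as sp

    x, y = sp.symbols('x y')

    def separable(f):
        # criterion: f * f''_xy == f'_x * f'_y
        return sp.simplify(f*sp.diff(f, x, y) - sp.diff(f, x)*sp.diff(f, y)) == 0

    print(separable((x + 1)*(y**2 + 1)))   # True: a product of functions of x and y
    print(separable(x + y))                # False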
We thus set z = x + y, which gives z' = 1 + y'. Substitution into (1) yields

z' - 1 = \frac{z + 1}{2z - 1},

\frac{dz}{dx} = \frac{z + 1}{2z - 1} + 1 = \frac{3z}{2z - 1},

\frac{2z - 1}{3z}\, dz = dx,

\frac{2}{3}z - \frac{1}{3}\ln|z| = x + C, \quad C \in \mathbb{R},
must leave the chosen space of functions y invariant, i.e. the images L(y) are also there.
To begin, choose \varepsilon > 0 and \delta > 0, both small enough so that [t_0 - \delta, t_0 + \delta] \times [y_0 - \varepsilon, y_0 + \varepsilon] = V \subset U, and consider only those functions y(t) which satisfy, for J = [t_0 - \delta, t_0 + \delta], the estimate \max_{t \in J} |y(t) - y_0| \le \varepsilon. The uniform continuity of f(t, y) on V ensures that fixing \varepsilon and further shrinking \delta implies
\max_{t \in J} |L(y)(t) - y_0| \le \varepsilon.
Finally, the above estimate for \|L(y) - L(z)\| shows that if \delta is decreased sufficiently further, then the latter constant D becomes smaller than one, as required for a contraction. At the same time, L maps the above space of functions into itself.
However, the assumptions of the Banach contraction theorem, which guarantees the uniquely determined fixed point, also require completeness of the space X of functions on which the operator L operates.
Since the mapping f(t, y) is continuous, there follows a uniform bound for all of the functions y(t) considered above and all values t > s in their domain:

|L(y)(t) - L(y)(s)| \le \int_s^t |f(r, y(r))|\, dr \le A\,|t - s|,

with a universal constant A > 0. Besides the conditions mentioned above, this is a restriction to the subset of all equicontinuous functions in the sense of Definition 7.3.15. According to the Arzelà–Ascoli theorem proved in the same paragraph on page 499, this set of continuous functions is compact, hence it is a complete set of continuous functions on the interval.
Therefore, there exists a unique fixed point y(t) of this contraction L by Theorem 7.3.9. This is the solution of the equation.
It remains to show the existence of a maximal interval I = (t_0 - a, t_0 + b). Suppose that a solution y(t) is found on an interval (t_0, t_1), and, at the same time, the one-sided limit y_1 = \lim_{t \to t_1^-} y(t) exists and is finite.
It follows from the already proven result that there exists a solution with the initial condition (t_1, y_1) in some neighbourhood of the point t_1. Clearly, it must coincide with the discussed solution y(t) on the left-hand side of t_1. Therefore, the solution y(t) can be extended to the right of t_1.
There are only two possibilities when the extension of the solution beyond t_1 does not exist: either there is no finite left limit of y(t) at t_1, or the limit y_1 exists, yet the point (t_1, y_1) is on the boundary of the domain of the function f. In both cases, the maximal extension of the solution to the right of t_0 is found.
The argument for the maximal solution to the left of t_0 is analogous. □
\frac{2}{3}z - \frac{1}{3}\ln|Cz| = x, \quad C \neq 0.
Now, we must get back to the original variable y in one of these forms. The general solution can be written as

\frac{2}{3}(x + y) - \frac{1}{3}\ln|x + y| = x + C, \quad C \in \mathbb{R},

i.e.,

x - 2y + \ln|x + y| = C, \quad C \in \mathbb{R}.
At the same time, we have the singular solution y = -x, which follows from the constraint z \neq 0 of the operations we have made (we have divided by the value 3z). □
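The general solution can be confirmed by implicit differentiation; a quick symbolic sketch, assuming sympy:

    import sympy as sp

    x, C = sp.symbols('x C')
    y = sp.Function('y')
    rel = x - 2*y(x) + sp.log(x + y(x)) - C      # the general solution above
    yprime = sp.solve(sp.diff(rel, x), sp.diff(y(x), x))[0]
    print(sp.simplify(yprime - (x + y(x) + 1)/(2*x + 2*y(x) - 1)))   # 0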
8.J.6. Solve the differential equation

x y' + y \ln x = y \ln y.
Solution. Using the substitution u = y/x, every homogeneous differential equation y' = f(y/x) can be transformed to an equation with separated variables,

u' = \frac{1}{x}\bigl(f(u) - u\bigr), \quad \text{i.e.,} \quad u'x + u = f(u).
The name of this type of differential equation comes from the following definition: a function f of two variables is called homogeneous of degree k iff f(tx, ty) = t^k f(x,y). A differential equation of the form

P(x, y)\, dx + Q(x, y)\, dy = 0

is then called a homogeneous differential equation iff the functions P and Q are homogeneous of the same degree k.
For instance, we can discover that the given equation

x\, dy + (y \ln x - y \ln y)\, dx = 0

is homogeneous. Of course, it is not difficult to write it explicitly in the form

y' = \frac{y}{x} \ln\frac{y}{x}.
The substitution u = y/x then leads to

u'x + u = u \ln u, \qquad x\,\frac{du}{dx} = u(\ln u - 1),
8.3.7. Iterative approximations of solutions. The proof of the previous theorem can be reformulated as an iterative procedure which provides approximate solutions using step-by-step integration. Moreover, an explicit estimate for the constant C from the proof yields bounds for the errors.
Think this out as an exercise (see the proof of the Banach fixed-point theorem in paragraph 7.3.9). It can then be shown easily and directly that the iterates form a uniformly convergent sequence of continuous functions, so the limit is again a continuous function (without invoking the complicated theorems from the seventh chapter).
Picard's approximations
Theorem. The unique solution of the equation
y' = f(t,y)
whose right-hand side f has continuous partial derivatives can be expressed, on a sufficiently small interval, as the limit of step-by-step iterations beginning with the constant function (Picard's approximation):
y_0(t) = y_0, \quad y_{n+1}(t) = L(y_n), \quad n = 0, 1, \ldots.
It is a uniformly converging sequence of differentiable functions with differentiable limit y(t).
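The iterations are straightforward to carry out symbolically. For y' = y with y(0) = 1, they reproduce the Taylor polynomials of e^t; a minimal sketch, assuming sympy, with L the integral operator y \mapsto y_0 + \int_{t_0}^t f(s, y(s))\,ds used in the proof above:

    import sympy as sp

    t, s = sp.symbols('t s')
    f = lambda s, y: y          # right-hand side of y' = y
    t0, y0 = 0, sp.Integer(1)

    yn = y0
    for _ in range(5):
        # one Picard step: y_{n+1}(t) = L(y_n)(t)
        yn = y0 + sp.integrate(f(s, yn.subs(t, s)), (s, t0, t))
    print(sp.expand(yn))        # 1 + t + t**2/2 + ... + t**5/120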
\frac{du}{u(\ln u - 1)} = \frac{dx}{x},
Only the Lipschitz condition is needed for the function f, so the latter two theorems remain true with this weaker assumption as well. It is seen in the next paragraph that continuity of the function f alone guarantees the existence of a solution; yet it is insufficient for uniqueness.
8.3.8. Ambiguity of solutions. We begin with a simple example. Consider the equation

y' = \sqrt{|y|}.
Separating the variables, the solution is

y(t) = \frac{1}{4}(t + C)^2

for positive values of y, with an arbitrary constant C and t + C > 0. For initial values (t_0, y_0) with y_0 > 0, this is an assignment matching the previous theorem, so there is locally exactly one solution. The solution must apparently remain non-decreasing; hence, for negative values y_0, the solution is the same, only with the opposite sign and with t + C < 0.
However, for the initial condition (t_0, y_0) = (t_0, 0), there is not only the already discussed solution continuing to the left and to the right of t_0, but also the identically zero solution y(t) = 0. Therefore, these two branches can be glued together arbitrarily (see the diagram, where the thick solution can be continued along the t axis and branch along the parabola at any value of t).
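Both branches are immediate to verify for t \ge 0; a small symbolic check, assuming sympy:

    import sympy as sp

    t = sp.symbols('t', nonnegative=True)
    for y in (sp.S.Zero, t**2/4):
        # both functions satisfy y' = sqrt(|y|) with y(0) = 0
        print(sp.simplify(sp.diff(y, t) - sp.sqrt(sp.Abs(y))))   # 0 and 0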
where u(\ln u - 1) \neq 0. Using another substitution, namely t = \ln u - 1 (so that dt = \frac{du}{u}), we can integrate:

\int \frac{du}{u(\ln u - 1)} = \int \frac{dx}{x},

\int \frac{dt}{t} = \int \frac{dx}{x},

\ln|t| = \ln|x| + \ln|C|, \quad C \neq 0,

\ln|\ln u - 1| = \ln|Cx|, \quad C \neq 0,

\ln u - 1 = Cx, \quad C \neq 0,

\ln\frac{y}{x} = Cx + 1, \quad C \neq 0,

y = x\,e^{Cx + 1}, \quad C \neq 0.
The excluded cases u = 0 and \ln u = 1 do not lead to two more solutions: u = 0 implies y = 0, which cannot be put into the original equation, while \ln u = 1 gives y/x = e, and the function y = ex is clearly a solution (it corresponds to the value C = 0). Therefore, the general solution is

y = x\,e^{Cx + 1}, \quad C \in \mathbb{R}.
□
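Once more, the general solution can be verified by direct substitution; a sketch assuming sympy, with positivity assumptions keeping the logarithms well defined:

    import sympy as sp

    x, C = sp.symbols('x C', positive=True)
    y = x*sp.exp(C*x + 1)
    expr = x*sp.diff(y, x) + y*sp.log(x) - y*sp.log(y)
    print(sp.simplify(sp.expand_log(expr, force=True)))   # 0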
8.J.7. Compute

y' = \frac{4x + 3y + 1}{3x + 2y + 1}.
Solution. In general, we are able to solve every equation of the form

(1) \quad y' = f\!\left(\frac{ax + by + c}{Ax + By + C}\right).

If the system of linear equations

(2) \quad ax + by + c = 0, \quad Ax + By + C = 0

has a unique solution x_0, y_0, then the substitution u = x - x_0, v = y - y_0 transforms the equation (1) to the homogeneous equation

\frac{dv}{du} = f\!\left(\frac{au + bv}{Au + Bv}\right).

If the system (2) has no solution or has infinitely many solutions, the substitution z = ax + by transforms the equation (1) to an equation with separated variables (often, the original equation is already such).
In this problem, the corresponding system of equations

4x + 3y + 1 = 0, \quad 3x + 2y + 1 = 0

has a unique solution x_0 = -1, y_0 = 1. The substitution u = x + 1, v = y - 1 then leads to the homogeneous equation
Nevertheless, the existence of a solution is guaranteed by the following theorem, known as the Peano existence theorem:

Theorem. Consider a function f(t,y) : \mathbb{R}^2 \to \mathbb{R} which is continuous on an open set U. Then for every point (t_0, y_0) \in U \subset \mathbb{R}^2, there exists a solution of the equation

y' = f(t,y)

locally in some neighbourhood of t_0.
Proof. The proof is presented only roughly, with the details left to the reader.
We construct a solution to the right of the
\frac{dv}{du} = \frac{4u + 3v}{3u + 2v}.
initial point t_0. For this purpose, select a small step h > 0 and label the points

t_k = t_0 + kh, \quad k = 1, 2, \ldots.
The value of the derivative f(t_0, y_0) of the corresponding curve of the solution (t, y(t)) is defined at the initial point (t_0, y_0), so a parametrized line with the same derivative can be substituted:

y_{(0)}(t) = y_0 + f(t_0, y_0)(t - t_0).
Label y_1 = y_{(0)}(t_1). Construct inductively the functions and points

y_{(k)}(t) = y_k + f(t_k, y_k)(t - t_k), \qquad y_{k+1} = y_{(k)}(t_{k+1}).

Now, define y_h(t) by gluing the particular linear parts, i.e.,

y_h(t) = y_{(k)}(t) \quad \text{if } t \in [t_k, t_{k+1}].
This is a continuous function, called the Euler approximation of the solution.
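Euler's approximation translates directly into a few lines of code; a minimal sketch (our own illustration, applied to y' = y, whose exact solution at t = 1 is e):

    def euler(f, t0, y0, h, steps):
        # glue line segments with slopes f(t_k, y_k), as in the construction above
        t, y = t0, y0
        for _ in range(steps):
            y += h * f(t, y)
            t += h
        return y

    for h in (0.1, 0.01, 0.001):
        print(h, euler(lambda t, y: y, 0.0, 1.0, h, round(1/h)))
    # the printed values approach e = 2.71828... as h -> 0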
It "only" remains to prove that the limit of the functions yh for h approaching zero exists and is a solution. For this, one must observe (as done already in the proof of the theorem on uniqueness and existence of the solution) that f(t,y) is uniformly continuous on a sufficiently small neighbourhood U where the solution is sought. For any selected e > 0, a sufficiently small S such that \ f(t,y) — f(s,z)\ < e, exists whenever \\(t-s,y-z)\\