MASARYK UNIVERSITY

Brisk guide to maths

Jan Slovák, Martin Panák, Michal Bulant et al

Brno 2013

The work on the textbook has been supported by the project CZ.1.07/2.2.00/15.0203. INVESTMENTS IN EDUCATION DEVELOPMENT

Authors: Mgr. Michal Bulant, Ph.D., Mgr. Aleš Návrat, Dr. rer. nat., Mgr. Martin Panák, Ph.D., prof. RNDr. Jan Slovák, DrSc., RNDr. Michal Veselý, Ph.D.
Graphics and illustrations: Mgr. Petra Rychlá

© 2013 Masaryk University

Contents

Chapter 1. Initial warmup 3
1. Numbers and functions 3
2. Combinatorics 7
3. Difference equations 11
4. Probability 15
5. Plane geometry 23
6. Relations and mappings 36

Chapter 2. Elementary linear algebra 72
1. Vectors and matrices 72
2. Determinants 83
3. Vector spaces and linear mappings 92
4. Properties of linear mappings 108

Chapter 3. Linear models and matrix calculus 137
1. Linear processes 137
2. Difference equations 143
3. Iterated linear processes 150
4. More matrix calculus 157
5. Decompositions of the matrices and pseudoinversions 176

Chapter 4. Analytic geometry 207
1. Affine and euclidean geometry 207
2. Geometry of quadratic forms 227
3. Projective geometry 234

Chapter 5. Establishing the ZOO 254
1. Polynomial interpolation 254
2. Real numbers and limit processes 263
3. Derivatives 281
4. Power series 293

Chapter 6. Differential and integral calculus 347
1. Differentiation 347
2. Integration 364
3. Infinite series 382

Chapter 7. Continuous models 419
1. Fourier series 420
2. Metric spaces 432
3. Integral operators 448
4. Discrete transforms 455

Chapter 8. Continuous models with more variables 463
1. Functions and mappings on Rn 463
2. Integration for the second time 494
3. Differential equations 516
4. Notes about numerical methods 539

Preface

This textbook follows years of lecturing on Mathematics at the Faculty of Informatics at Masaryk University in Brno. The programme requires an introduction to genuine mathematical thinking and precision, but there is not much time dedicated to it. Thus, we want to cover seriously, but quickly, about as much of the mathematical methods as is usual in bigger courses in the classical Science and Technology programmes. At the same time, we do not want to give up the completeness and correctness of the mathematical exposition. We want to introduce and explain the more demanding parts of Mathematics, together with elementary explicit examples of how to use the results. But we do not want to decide for the reader how much theory or practice to enjoy, and in which order. These requirements have led us to the two-column format, where the rather theoretical explanations and the practical examples are split. This way, we want to please and help the readers to find their own way. Either to go through the examples and algorithms first, and then to come to the explanations of why things work, or the other way round. We also hope to overcome the usual stress of readers horrified by the amount of material. With our text, they are not supposed to read through everything in linear order. On the contrary, the readers should enjoy browsing through the text and finding their own paths.

In both columns, we intend to present a rather standard exposition of basic Mathematics, but focusing on the essence of the concepts and their relations. The examples solve simple mathematical problems, but we also try to show their use in mathematical models in practice as much as possible. We are aware that the theoretical text is written in a very compact way. A lot of details are left to the readers, in particular in the more difficult paragraphs.
Similarly, the examples display a variety ranging from very simple ones to those requesting some deeper thought. We would very much like to help the reader

• to formulate precise definitions of basic concepts and to prove simple mathematical results,
• to perceive the meaning of roughly formulated properties, relations and outlooks for exploring mathematical tools,
• to understand the instructions and algorithms creating mathematical models and to appreciate their usage.

The goals are ambitious, and nearly everyone needs his or her own path, including failures. This is one of the reasons why we come back to basic ideas and concepts several times with growing complexity and width of the discussions. Of course, this might also look chaotic, but we very much hope that this approach gives a much better chance to those who persist in their effort. Clearly this textbook cannot be the only source for everybody. Actually, the only really good procedure is to combine several sources and to think about their differences on the way. But we hope it should be a good beginning and help for everybody who is ready to return to the individual parts again and again.

To make this task simpler, we have added emotive icons. We hope they will not only enliven the dry mathematical text but also indicate which parts should rather be read carefully, or better jumped over in the first round. The usage of the icons follows the feelings of the authors, and we have tried to use them in a systematic way. Roughly speaking, there are icons warning of complexity and difficulty, icons indicating unpleasant technicality and the need for patience, and icons showing the joy of the game.

The practical column with the examples should be readable nearly independently of the theory. Without the ambition to know the deeper reasons why the algorithms work, it should be possible to read just this column. Some definitions and descriptions in the theoretical text are marked so that they can be caught easily when reading the examples, too. The examples and the theory are partly coordinated to allow jumping back and forth, but the links are not tight.

CHAPTER 1

Initial warmup

"value, difference, position" - what are they and how to comprehend them?

A. Numbers and functions

We can already work with natural, integer, rational and real numbers. We argue why rational numbers are not sufficient for us (although computers are actually not able to work with any other), and we recall the so-called complex numbers (because even the reals are not enough for some calculations).

1.1. Find some real number which is not rational.

Solution. One among many possible answers is √2. Already the ancient Greeks knew that if we prescribe the area of a square to be a² = 2, then we cannot find a rational a which satisfies it. Why? Assume that (p/q)² = 2 holds for natural numbers p and q that have no common divisor different from 1 (otherwise we could further reduce the fraction p/q). Then p² = 2q² is an even number, and so p is even as well, say p = 2r. But then p² = 4r² is divisible by 4, hence q² = 2r² is even, and so q must be even too. Thus p and q have 2 as a common factor, which is a contradiction. □

1.2. Remark. It can even be proven that the n-th root of a natural number, where n is natural, is either natural or is not rational (see ||G||).

The goal of the first chapter is to introduce the reader to the fascinating world of mathematical thinking.
For that we choose our examples of mathematical modelling of real situations using abstract objects and connections to be as specific as possible. We also go through a few topics and mechanisms to which we will subsequently return in the rest of the book, and at the end of the chapter we speak about the language of mathematics itself (which we will mostly use in an intuitive way).

The easier the objects and settings we work with are, the more difficult it is to understand in depth the nuances of the use of particular tools and mechanisms. Mostly it is possible to reach the core ideas only through their connections to others. Therefore we introduce them from many points of view at once. Changing the topics very often might be confusing, but it will surely get better when we return to specific ideas and notions in later chapters. The name of this chapter can also be understood as an encouragement to patience. Even the simplest tasks and ideas are easy only for those who have already seen similar ones. Full knowledge and mathematical thinking can be reached only through a long and complicated journey. Let us start with the simplest thing: common numbers.

1. Numbers and functions

Since the dawn of ages people wanted to know "how much" of something they had, or "how much" something is worth, "how long" a particular task will take, etc. The result of such ideas is usually some "number". We consider something to be a number if we can multiply it and add it, and it behaves according to the usual rules - either according to all the rules we expect, or only to some. For instance, the result of multiplication does not depend on the order of multiplicands, we have the number zero whose adding does not change the result, we have the number one which behaves in a similar manner with respect to multiplication, and so on. The simplest example are the so-called natural numbers, which we denote N = {0, 1, 2, 3, ...}. Note that we consider zero to be a natural number, as is usual especially in computer science. To count "one, two, three, ..." is learned already by little children at their pre-school age. Some time later we meet the integers Z = {..., -2, -1, 0, 1, 2, ...} and finally we get used to floating-point numbers, and we know what a 1.19-multiple of the price means thanks to the 19% tax.

1.3. Find all solutions to the equation x² = b for any real number b.

Solution. We know that this equation always has a solution x in the domain of real numbers whenever b is non-negative. If b < 0, then no such real x can exist. Thus we need to find a bigger domain where this equation has a solution. First we add to the real numbers a new number i, the so-called imaginary unit, and try to extend the definitions of addition and multiplication in order to preserve the usual behaviour of numbers (as summarised in 1.1). Clearly we need to be able to multiply the new number i by real numbers and to add it to real numbers. Therefore we need to work in our newly defined domain of complex numbers C with formal expressions of the form z = a + ib. In order to satisfy all the properties of associativity and distributivity, we define the addition so that we add the real parts and the imaginary parts independently. Similarly, we want the multiplication to behave as if we multiply tuples of real numbers, with the additional rule that i² = -1, that is,

(a + ib) + (c + id) = (a + c) + i(b + d),
(a + ib) · (c + id) = (ac - bd) + i(bc + ad).
□

The real number a is called the real part of the complex number z, the real number b is called the imaginary part of the complex number z, and we write re(z) = a, im(z) = b.

1.4. Verify that all the properties (KG1)-(KG4), (O1)-(O4) and (P) of scalars from 1.1 hold.

Solution. Zero is the number 0 + i0, one is the number 1 + i0; both these numbers are for simplicity denoted as before, that is, 0 and 1. All the properties are obtained by direct calculation. □

A complex number is given by a tuple of real numbers, therefore it is a point in the real plane R².

1.5. Show that the distance of the complex number z = a + ib from the origin (we denote it by |z|) is given by the expression √(z z̄), where z̄, the complex conjugate, is a - ib.

Solution. The product

z z̄ = (a² + b²) + i(-ab + ba) = a² + b²

is always a real number and indeed gives us the square of the distance from the number z to the origin. Thus it holds that |z|² = z z̄. □

1.1. Properties of numbers. In order to be able to work properly with numbers, we need to be more careful with their definition and properties. In mathematics, the basic statements about properties of objects, whose validity is assumed without the need to prove them, are called axioms. A good choice of axioms determines both the range of the theory they give rise to and its usability in mathematical models of reality. Let us now list the basic properties of the operations of addition and multiplication for our calculations with numbers, which we denote by a, b, c, .... Both operations take two numbers a, b, and by applying addition or multiplication we obtain the resulting values a + b and a · b.

Properties of scalars

Properties of numbers:
(KG1) (a + b) + c = a + (b + c), for all a, b, c
(KG2) a + b = b + a, for all a, b
(KG3) there exists 0 such that for all a it holds that a + 0 = a
(KG4) for all a there exists b such that a + b = 0

The properties (KG1)-(KG4) are called the properties of a commutative group. They are called associativity, commutativity, existence of the neutral element (when speaking of addition we usually say zero element), and existence of the inverse element (when speaking of addition we also say the negative of a and denote it by -a), respectively.

Properties of multiplication:
(O1) (a · b) · c = a · (b · c), for all a, b, c
(O2) a · b = b · a, for all a, b
(O3) there exists 1 such that for all a it holds that 1 · a = a
(O4) a · (b + c) = a · b + a · c, for all a, b, c.

The properties (O1)-(O4) are called associativity, commutativity, existence of the unit element, and distributivity of addition with respect to multiplication, respectively.

Sets with operations +, · that satisfy the properties (KG1)-(KG4) and (O1)-(O4) are called commutative rings.

Further properties of multiplication:
(P) for every a ≠ 0 there exists b such that a · b = 1,
(OI) if a · b = 0, then either a = 0 or b = 0.

The property (P) is called the existence of the inverse element with respect to multiplication (this element is then denoted by a⁻¹), and the property (OI) says that there exist no "divisors of zero".

The properties of the operations of addition and multiplication will be used very often, even when we do not know which objects we are really working with. In this way we obtain very general mathematical tools. However, it is good to have some idea of typical examples of the objects we work with.
The integers Z are a good example of a commutative group; the natural numbers are not, since they do not satisfy (KG4) (and possibly do not even contain the neutral element, if one does not consider zero to be natural). If a commutative ring also satisfies the property (P), we speak of a field (often also of a commutative field).

1.6. Remark. The distance |z| is also called the absolute value of the complex number z.

1.7. Polar form of complex numbers. Let us first consider complex numbers of the form z = cos φ + i sin φ.
Clearly we have n(n - 1) ⋯ (n - k + 1) possible results of a sequential choice of our k elements, but we obtain the same k-tuple in k! distinct orders. If we want to choose the items along with an ordering, we speak of a variation of k-th degree.
As we have just checked, the numbers of combinations and variations are given by the following formulas, which are not very effective for calculations with large k and n, since they contain factorials.
Combinations and variations

Proposition. For the number c(n, k) of combinations of k-th degree among n elements, where 0 ≤ k ≤ n, it holds that

(1.3)    c(n, k) = \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \frac{n!}{(n-k)!\,k!}.

For the number v(n, k) of variations it holds that

(1.4)    v(n, k) = n(n-1)\cdots(n-k+1)

for all 0 ≤ k ≤ n (and zero otherwise).
We pronounce the binomial coefficient \binom{n}{k} as "n over k". The name stems from the so-called binomial expansion, which is the expansion of (a + b)^n. If we expand (a + b)^n, the coefficient at a^k b^{n-k} equals, for every 0 ≤ k ≤ n, exactly the number of ways to choose a k-tuple from the n parentheses in the product (from these parentheses we take a, from the others we take b). Therefore we have

(1.5)    (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k},
and note that for the derivation only distributivity, commutativity and associativity of multiplication and addition were necessary. The formula (1.5) therefore holds in every commutative ring.
Let us present another simple example of a mathematical proof - a few simple propositions about binomial coefficients. To simplify the formulations we define \binom{n}{k} = 0 whenever k < 0 or k > n.
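The formulas (1.3) and (1.4) are easy to check experimentally. Here is a minimal Python sketch (the function names c and v merely mirror the notation above):

```python
from math import comb, factorial

def c(n, k):
    """c(n, k) from (1.3): combinations of k-th degree among n elements."""
    return comb(n, k)

def v(n, k):
    """v(n, k) from (1.4): n(n-1)...(n-k+1), the number of variations."""
    return factorial(n) // factorial(n - k)

# An ordered choice is an unordered one together with one of k! orderings.
assert v(10, 3) == c(10, 3) * factorial(3) == 720
```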
1.7. Proposition. For all natural numbers k and n we have
speaker AB. The number of all orderings where B speaks right after A is then equal to the number of permutations of seven elements. Clearly, the same is the number of all orderings where A speaks right after B. Since the number of all possible orderings of eight speakers is 8!, the result is 8! - 2 · 7!. □
1.18. How many anagrams of the word PROBLEM are there, such that
a) the letters B and R are next to each other,
b) the letters B and R are not next to each other.
Solution. a) The pair of letters B and R can be treated as a single indivisible "double-letter". In total we then have six distinct letters, and there are 6! words of six indivisible letters. In our case we have to multiply this by two, since the double-letter can be either BR or RB. Thus the result is 2 · 6!.

b) 7! - 2 · 6!, the complement of part a) with respect to the number of all seven-letter words of distinct letters. □
1.19. In how many ways can an athlete place 10 distinct cups on 5 shelves, if all 10 cups fit on any single shelf?
Solution. Let us add 4 indistinguishable items, say separators, to the cups. The number of all distinct orderings of cups and separators is clearly 14!/4! (the separators are indistinguishable). Every placement of cups into shelves corresponds to exactly one ordering of cups and separators. It is enough to say that the cups before the first separator in the ordering are placed in the first shelf (preserving the order), the cups between the first and the second separator in the second shelf, and so on. Thus the number 14!/4! is the result. □
1.20. Determine the number of four-digit numbers with exactly two distinct digits.

Solution. The two distinct digits used in the number can be chosen in \binom{10}{2} ways; from the two chosen digits we can compose 2^4 - 2 distinct four-digit strings (we subtract the 2 strings that use only one of the digits). In total we have \binom{10}{2}(2^4 - 2) = 630 numbers. But in this way we have also counted the strings that start with zero. Of these there are \binom{9}{1}(2^3 - 1) = 63. Thus we have 630 - 63 = 567 numbers. □
1.21. Determine the number of even four-digit numbers composed of exactly two distinct digits.
Solution. Analogously to the previous example, let us first ignore the peculiarities of the digit zero. We thus obtain \binom{5}{2}(2^4 - 2) + 5 · 5(2^3 - 1) numbers (the first summand counts the numbers consisting only of even digits, the second summand gives the number of even four-digit numbers
(1) \binom{n}{k} = \binom{n}{n-k}
(2) \binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}
(3) \sum_{k=0}^{n} \binom{n}{k} = 2^n
(4) \sum_{k=0}^{n} k \binom{n}{k} = n\, 2^{n-1}.
Proof. The first claim follows directly from the formula (1.3). If we expand the right-hand side of (2), we obtain

\frac{n!}{k!(n-k)!} + \frac{n!}{(k+1)!(n-k-1)!} = \frac{(k+1)\,n! + (n-k)\,n!}{(k+1)!(n-k)!} = \frac{(n+1)!}{(k+1)!(n-k)!},

which is the left-hand side of (2).
In order to prove (3), we use so-called mathematical induction. This tool is very suitable for statements saying that something should hold for every natural number n. Mathematical induction consists of two steps. In the first, base step, we assert the claim for n = 0 (in general, for the smallest n for which the claim should hold). In the second, inductive step, we assume that the claim holds for some n (and all smaller numbers) and using this we prove the claim for n + 1. Putting it together, we obtain that the claim holds for every n.
The claim (3) clearly holds for n = 0, since \binom{0}{0} = 1 = 2^0. (It is similarly easy for n = 1.) Now let us assume that the claim holds for some n and calculate the corresponding sum for n + 1, using the claims (2) and (3). We obtain
\sum_{k=0}^{n+1} \binom{n+1}{k} = \sum_{k=0}^{n+1} \left( \binom{n}{k-1} + \binom{n}{k} \right) = \sum_{k=-1}^{n} \binom{n}{k} + \sum_{k=0}^{n+1} \binom{n}{k} = 2^n + 2^n = 2^{n+1}.
Note that the formula (3) gives the number of all subsets of an n-element set, since \binom{n}{k} is the number of all subsets of size k. Note also that (3) follows from (1.5) by choosing a = b = 1.
To prove (4) we again employ induction, as in (3). For n = 0 the claim clearly holds. The inductive assumption says that (4) holds for some n. Let us now calculate the corresponding sum for n + 1 using (2) and the inductive assumption. We obtain
\sum_{k=0}^{n+1} k \binom{n+1}{k} = \sum_{k=0}^{n+1} k \left( \binom{n}{k-1} + \binom{n}{k} \right) = \sum_{k=-1}^{n} (k+1) \binom{n}{k} + \sum_{k=0}^{n+1} k \binom{n}{k}

= \sum_{k=0}^{n} \binom{n}{k} + \sum_{k=0}^{n} k \binom{n}{k} + \sum_{k=0}^{n} k \binom{n}{k} = 2^n + n\,2^{n-1} + n\,2^{n-1} = (n+1)\,2^n.
This completes the inductive step and the claim is proven for all natural n. □
with one digit even and one digit odd). Again we have to subtract the numbers that start with zero; of these there are (2^3 - 1) · 4 + (2^2 - 1) · 5. The final number is thus

\binom{5}{2}(2^4 - 2) + 5 \cdot 5(2^3 - 1) - (2^3 - 1) \cdot 4 - (2^2 - 1) \cdot 5 = 272.
□
1.22. There are 677 people at a concert. Do some of them have the same name initials?
Solution. There are 26 letters in the alphabet. Thus the number of all possible name initials is 26² = 676. Hence at least two people have the same initials. □
1.23. New players meet in a volleyball team (6 people). How many times do they shake hands when introducing to each other (everybody shakes with everybody)? How many times do they shake hands with the opponent after playing a match?
Solution. Every pair of players shakes hands at the introduction. The number of handshakes is then equal to the number of combinations c(6, 2) = \binom{6}{2} = 15. After a match, each of the six players shakes hands six times (once with each of the six opponents). Thus the number is 6² = 36. □
1.24. In how many ways can five people be seated in a car for five people, if only two of them have a driving licence? In how many ways can 20 passengers and two drivers be seated in a bus with 25 seats?
Solution. For the driver's seat we have two choices, and the other places are then arbitrary: for the second seat we have four choices, for the third three, then two, and then one. That makes 2 · 4! = 48 ways. Similarly, in the bus we have two choices for the driver, and then the second driver plus the passengers can be seated among the 24 remaining seats arbitrarily. Let us first choose the 21 seats to be occupied, that is, \binom{24}{21} ways; on these seats, the people can be placed in 21! ways. That makes 2 · \binom{24}{21} · 21! = 2 · 24!/3! ways. □
1.25. In how many ways can we insert into three distinct envelopes five identical 10-bills and five identical 100-bills such that no envelope stays empty?
Solution. Let us first compute the number of insertions ignoring the non-emptiness condition. Using the rule of product (we insert the 10-bills and the 100-bills independently), this is \binom{7}{2}^2, since for each denomination the number of insertions is the number of combinations with repetitions C(3, 5) = \binom{7}{2} = 21. Now we subtract the insertions where exactly one envelope is empty, and then the insertions where two envelopes are empty. We obtain

\binom{7}{2}^2 - 3\left(\binom{6}{1}^2 - 2\right) - 3 = 21^2 - 3(6^2 - 2) - 3 = 336.

□
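A brute-force verification of this count is straightforward; in the following Python sketch the helper splits is ours, purely for illustration:

```python
# Brute-force check of 1.25: five identical 10-bills and five identical
# 100-bills into three distinct envelopes, with no envelope left empty.
def splits(total):
    """All ordered ways to write `total` as a sum of three non-negative parts."""
    return [(i, j, total - i - j)
            for i in range(total + 1) for j in range(total + 1 - i)]

count = sum(1 for tens in splits(5) for hundreds in splits(5)
            if all(t + h > 0 for t, h in zip(tens, hundreds)))
assert count == 336
```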
The second property from our claim allows us to arrange all the binomial coefficients into the so-called Pascal triangle, where every number is obtained as the sum of the two coefficients situated right "above" it:
n = 0:                 1
n = 1:               1   1
n = 2:             1   2   1
n = 3:           1   3   3   1
n = 4:         1   4   6   4   1
n = 5:       1   5  10  10   5   1
Note that in the individual rows we have exactly the coefficients at the individual powers in the expansion (1.5); for instance, the last row given says

(a + b)^5 = a^5 + 5a^4 b + 10a^3 b^2 + 10a^2 b^3 + 5a b^4 + b^5.
1.8. Choice with repetitions. An ordering of n elements, some of which are indistinguishable, is called a permutation with repetitions.

Let there be, among the n given elements, p₁ elements of the first kind, p₂ elements of the second kind, ..., p_k elements of the k-th kind, where p₁ + p₂ + ⋯ + p_k = n. Then the number of permutations with repetitions of these elements is denoted P(p₁, ..., p_k).

As with permutations and combinations without repetitions: for the choice of the first element we have n possibilities, for the second n - 1, and so on, until the last element, for which only one choice remains. But we consider orderings that differ only in the order of indistinguishable elements to be identical. Since the elements of each kind can be reordered in pᵢ! ways, we have
Permutations with repetitions

P(p_1, \dots, p_k) = \frac{n!}{p_1! \cdots p_k!}
A free choice of k elements out of n, where order matters, is called a variation of k-th degree with repetitions; their number is denoted V(n, k). Free choice here means that we assume that for every choice we have the same number of possibilities - for instance, when we return the elements back before the next choice, when we repeatedly throw the same dice, and so on. The following clearly holds:
Variations with repetitions

V(n, k) = n^k
If we are interested in a choice without taking care of order, we speak of combinations with repetitions, and for their number we write C(n, k). At first sight, it does not seem easy to determine this number. The proof of the following theorem is typical for mathematics - we reduce the problem to another problem we have already dealt with. In our case it is a reduction to standard combinations without repetitions:
Combinations with repetitions

Theorem. The number of combinations with repetitions of k-th order from n elements equals, for every k ≥ 0 and n ≥ 1,

C(n, k) = \binom{n + k - 1}{k}.
1.26. Determine the number of distinct sentences which can arise by permuting letters in the individual words in the sentence "Skokan na koks" (the arising sentences and words do not have to make any sense).
Solution. Let us first compute the number of anagrams of the individual words. From the word "skokan" we obtain 6!/2 distinct anagrams (permutations with repetitions P(1, 1, 1, 1, 2)), "na" yields two, and "koks" yields 4!/2. Therefore, using the rule of product, we have (6!/2) · 2 · (4!/2) = 8640 sentences. □
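The product rule above is easy to confirm by enumerating the anagrams directly; a small Python sketch:

```python
from itertools import permutations

def anagrams(word):
    """Number of distinct orderings of the letters (a multiset permutation count)."""
    return len(set(permutations(word)))

assert anagrams("skokan") == 360 and anagrams("na") == 2 and anagrams("koks") == 12
assert anagrams("skokan") * anagrams("na") * anagrams("koks") == 8640
```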
1.27. How many distinct anagrams of the word "krakatit" are there, such that between the two letters "k" there is exactly one other letter?

Solution. In the considered anagrams there are exactly six possible placements of the group of the two "k", since the first of the two "k" can be at any of the positions 1-6. Once the spots for the two "k" are fixed, the remaining letters can be placed arbitrarily, that is, in P(1, 1, 2, 2) ways. Using the rule of product, we have

6 \cdot P(1, 1, 2, 2) = 6 \cdot \frac{6!}{2 \cdot 2} = 1080.
□
1.28. In how many ways can we insert five golf balls into five holes (into every hole one ball), if we have four white balls, four blue balls and three red balls?
Solution. Let us first solve the problem in the case where we have five balls of every colour. Then it amounts to a free choice of five elements from three possibilities (there is a choice among three colours for every hole), that is, variations with repetitions (see above). We have

V(3, 5) = 3^5.
Now let us subtract the configurations where there are either balls of only one colour (there are three such configurations) or exactly four red balls (there are 2 · 5 = 10 of those: we first choose the colour of the non-red ball in two ways, and then the hole it is in, in five ways). Thus we can do it in

3^5 - 3 - 10 = 230

ways.
□
1.29. In how many ways could the English Premier League have finished, if we know that no two of the three teams Newcastle United, Fulham and Tottenham Hotspur are "adjacent" in the final table? (There are 20 teams in the league.)
Solution. First approach. We use the inclusion-exclusion principle. From the number of all possible resulting tables we subtract the tables
Proof. The proof is based on a trick (a simple one, as soon as we understand it). We show two different approaches.
Assume first that we are drawing cards from a deck of n different cards, and in order to make it possible to draw some card multiple times, we add to the deck k - 1 different jokers (we definitely want to draw at least one of the original cards). Say that we have drawn r original cards and s jokers, so that r + s = k. It seems that we should devise a method of assigning "substitute" jokers to original cards, so that we know how many times we have drawn each original card. But we actually need to discuss only the number of ways to do that.

For that we can use mathematical induction and assume that the claim holds for all smaller arguments n and k. We need the number of combinations with repetitions of s-th order from the r drawn original cards, which gives \binom{s + r - 1}{s} = \binom{k - 1}{s}, and this is exactly the number of combinations without repetitions of s-th order from all the k - 1 jokers. Thus the theorem is proven.
Alternative approach (induction-free): Over the set

S = {a₁, ..., aₙ},

from which we choose the combination, we fix an ordering of the elements, and for our choices of elements of S we prepare n boxes into which we place (in the fixed order) the elements of S (one element into every box).
The individual chosen elements xᵢ ∈ S are then placed into the box which already contains this element. Now let us realise that in order to recover the original combination, we just need to know how many elements there are in the individual boxes. For instance, the arrangement

a | b b b | c c | d
∗ | ∗ ∗ ∗ | ∗ ∗ | ∗

determines the choice b, b, c from the set S = {a, b, c, d}.
In the general case of a choice of k elements from n possible ones, we have a chain of n + k symbols, and the number C(n, k) equals the number of possible placements of the box walls | among the individual elements. This amounts to the choice of n - 1 positions from the n + k - 1 possible ones. Since

\binom{n+k-1}{n-1} = \binom{n+k-1}{k},

the theorem is proven (for the second time).
□
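The statement of the theorem can be checked against a direct enumeration, e.g. with Python's standard library:

```python
from itertools import combinations_with_replacement
from math import comb

# C(n, k) from the theorem versus a direct enumeration of multisets.
n, k = 4, 3
count = sum(1 for _ in combinations_with_replacement(range(n), k))
assert count == comb(n + k - 1, k) == 20
```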
3. Difference equations
In the previous paragraphs we saw formulas which determined the value of a scalar function defined on the natural numbers (the factorial) or on tuples of natural numbers (the binomial coefficients) using already defined values. In the paragraph 1.5 the binomial coefficients are defined by a directly computable formula, but we can also understand them through the relationship exhibited in 1.8: instead of the value of the function we give the difference corresponding to a change of the variable.

Such an approach appears very often when formulating mathematical models that describe real systems in economy, biology, etc. We will look at only a few simple examples and will return to this topic later.
where some two of the three teams are adjacent, and then add back the tables where all three teams are adjacent. The number is then

20! - 3 \cdot 2! \cdot 19! + 3! \cdot 18! = 1741445647958016000.
Second approach. Let us consider the three teams as "separators". The remaining 17 teams have to be distributed so that between any two separators there is at least one team. The remaining teams can be arbitrarily permuted, as can the separators. Thus we have

\binom{18}{3} \cdot 17! \cdot 3! = 1741445647958016000

ways.
□
1.30. For any fixed n ∈ N determine the number of all solutions to the equation

x₁ + x₂ + ⋯ + x_k = n

on the set of non-negative integers.
Solution. Every solution (r₁, ..., r_k), \sum_{i=1}^{k} r_i = n, can be uniquely encoded as a sequence of ones and separators, where we first write r₁ ones, then a separator, then r₂ ones, then another separator, and so on. Such a sequence then clearly contains n ones and k - 1 separators. Conversely, every such sequence clearly determines some solution of the given equation. Thus there are exactly as many solutions as there are such sequences, that is,

\binom{n + k - 1}{k - 1}.

□
C. Difference equations
Difference equations (also called recurrence relations) are relations between the elements of some sequence, where an element of the sequence depends on the previous elements. To solve a difference equation means to find an explicit formula for the n-th (that is, an arbitrary) element of the sequence. The recurrence relation alone allows us to compute the n-th element only by computing all the previous elements.

If an element of the sequence is determined only by the immediately preceding element, we speak of a first order difference equation. These appear in the real world, for instance when we want to find out how long the repayment of a loan will take for a fixed monthly repayment, or how much we shall pay per month if we want to repay a loan within a fixed time.
1.31. Michael wants to buy a new car. The car costs €30,000. Michael wants to take out a loan and repay it with fixed monthly repayments. The car company offers him a loan with a yearly interest rate of 6%. Michael would like to finish repaying the loan in three years. How much should he pay per month?
1.9. Linear difference equations of first order. A general difference equation of first order is an expression of the form

f(n + 1) = F(n, f(n)),

where F is a known scalar function of two arguments. If we know the "initial" value f(0), we can compute f(1) = F(0, f(0)), then f(2) = F(1, f(1)), and so on. Using this approach we can compute the value f(n) for an arbitrary n ∈ N. Note that this idea resembles the construction of the natural numbers from the empty set, or the principle of mathematical induction.
An example of such equation is the definition of the factorial function:
(n + 1)! = (n + 1) · n!

We see that the value f(n + 1) depends on both n and the value f(n).
Another very simple example is f(n) = C for some fixed scalar C and all n, and the so-called linear difference equation of first order

(1.6)    f(n + 1) = a · f(n) + b,

where a ≠ 0 and b are known scalars.
Such a difference equation is easy to solve if b = 0. Then it is the well-known recurrent definition of the geometric progression, and it holds that

f(1) = a f(0), f(2) = a f(1) = a² f(0), and so on.

Thus for all n we have

f(n) = aⁿ f(0).
This is also the relation for the so-called Malthusian population growth model, which is based on the assumption that during a given time interval the population grows with a constant ratio a relative to its state before the interval.
We will prove a general result for first order equations which are similar to the linear ones but allow varying coefficients a and b,

(1.7)    f(n + 1) = aₙ · f(n) + bₙ.
First let us think about what such equations can describe.
The linear difference equation (1.6) can be nicely interpreted as a mathematical model in finance, e.g. for savings or a loan repayment with a fixed interest rate a and a fixed payment b (the cases of savings and loans differ only in the sign of b).
With varying parameters a and b we obtain a similar model with varying interest rate and repayment. We can imagine for instance that n is the number of months,
aₙ is the interest rate in the n-th month, and bₙ the repayment in the n-th month. Do not be afraid of the seemingly difficult calculations in the following result. It is a typical example of a technical mathematical statement for which it is hard to "guess" precisely how it should be formulated. On the other hand, it is then a simple exercise on the properties of scalars and mathematical induction to prove it. Really interesting are the corollaries, see 1.11 later.
In the formulation we use, along with the usual notation Σ for sums, the similar notation Π for products. In the rest of the text we will also use the convention that when the index set is empty, the sum is zero and the product is one.
Solution. Let s denote the amount Michael has to pay per month. After the first month Michael repays s; part of it repays the loan, part of it pays the interest. Let d_k stand for the debt after k months. After the first month the debt is

d₁ = 30000 - s + (0.06/12) · 30000.

In general, after the k-th month we have

(1.1)    d_k = d_{k-1} - s + (0.06/12) · d_{k-1}.
Using the relation (1.9), d_k is given by

d_k = \left(1 + \frac{0.06}{12}\right)^k \cdot 30000 - \left(\left(1 + \frac{0.06}{12}\right)^k - 1\right) \cdot \frac{12 s}{0.06}.
Repaying the loan in three years means d₃₆ = 0; thus we obtain

(1.2)    s = 30000 \cdot \frac{0.06/12}{1 - (1 + 0.06/12)^{-36}} \approx 912.7.

□
Note that the formula (1.9) from 1.11 can be used in our case only as long as all the values d_k are positive, that is, as long as Michael still has to repay something.
1.32. Consider the situation from the previous example. How long would Michael have to pay if he repaid €500 per month?
Solution. Setting q = 1 + 0.06/12 = 1.005 and c = 30000, the condition d_k = 0 gives the equation

q^k = \frac{200 s}{200 s - c};

by taking logarithms of both sides we obtain

k = \frac{\ln(200 s) - \ln(200 s - c)}{\ln q},

which for s = 500 gives approximately k ≈ 71.5. Thus Michael would be paying for six years (and the last repayment would be less than €500). □
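Both loan examples can be replayed by iterating the recurrence (1.1) directly; a minimal Python sketch (the function name months_to_repay is ours):

```python
# Iterating the recurrence (1.1): d_k = d_{k-1} - s + (0.06/12) * d_{k-1}.
def months_to_repay(s, debt=30000.0, monthly_rate=0.06 / 12):
    k = 0
    while debt > 0:
        debt = debt * (1 + monthly_rate) - s
        k += 1
    return k

assert months_to_repay(912.7) == 36  # the fixed payment computed in (1.2)
assert months_to_repay(500.0) == 72  # k = 71.5: the 72nd payment is partial
```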
1.33. Determine the sequence {yₙ}ₙ₌₁^∞ which satisfies the following recurrence relation:

y_{n+1} = \frac{3 y_n}{8} + 1,  n ≥ 1,  y₁ = 1.
Linear recurrence can appear for instance in geometric problems:
1.10. Proposition. The general solution of the first order difference equation (1.7) with the initial condition f(0) = y₀ is given by the formula

(1.8)    f(n) = \left(\prod_{i=0}^{n-1} a_i\right) y_0 + \sum_{j=0}^{n-2} \left(\prod_{i=j+1}^{n-1} a_i\right) b_j + b_{n-1}.
Proof. We will prove the proposition using mathematical induction. The claim clearly holds for n = 1, where it amounts directly to the definition f(1) = a₀ y₀ + b₀.

Assuming that the statement holds for some fixed n, we can easily compute

f(n+1) = a_n \left( \left(\prod_{i=0}^{n-1} a_i\right) y_0 + \sum_{j=0}^{n-2} \left(\prod_{i=j+1}^{n-1} a_i\right) b_j + b_{n-1} \right) + b_n = \left(\prod_{i=0}^{n} a_i\right) y_0 + \sum_{j=0}^{n-1} \left(\prod_{i=j+1}^{n} a_i\right) b_j + b_n,

as can be seen directly by multiplying out. □
Let us again note that for the proof we did not need anything about the scalars except the properties of a commutative ring.
1.11. Corollary. The general solution of the linear difference equation (1.6) with a ≠ 1 and the initial condition f(0) = y₀ is

(1.9)    f(n) = a^n y_0 + \frac{1 - a^n}{1 - a}\, b.
Proof. If we set aᵢ = a and bᵢ = b constant and use the general formula (1.8), we obtain

f(n) = a^n y_0 + b\left(1 + a + \cdots + a^{n-1}\right).

The sum of this geometric progression can be computed using the formula 1 - a^n = (1 - a)(1 + a + ⋯ + a^{n-1}), and that yields the required result. □
Note that for calculating the sum of the geometric progression we required the existence of the inverse element for non-zero scalars. We could not do that with integers only. Thus the last result holds for fields of scalars, and we can use it for linear difference equations where the coefficients a, b and the initial condition f(0) = y₀ are rational, real or complex numbers; and also in the ring of remainder classes Z_k with k prime (we will define remainder classes in paragraph 1.41).
It is noteworthy that the formula (1.9) actually holds even with integer coefficients and an integer initial condition. Then we know in advance that all f(n) are integers, and the integers are a subset of the rational numbers. Thus our formula necessarily gives the correct integer solutions.
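A quick numerical sanity check of the closed form (1.9) against the iteration of (1.6), sketched in Python:

```python
# The closed form (1.9) versus direct iteration of f(n + 1) = a f(n) + b.
a, b, y0 = 0.5, 2.0, 1.0
f = y0
for n in range(1, 11):
    f = a * f + b
    assert abs(f - (a**n * y0 + (1 - a**n) / (1 - a) * b)) < 1e-12
```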
Observing the proof in more detail, we see that 1 — a" is always divisible by 1 — a, thus the last paragraph should not have surprised
1.34. Suppose n lines divide the plane into areas. What is the maximal number of areas that can arise this way?

Solution. Let the number of areas be pₙ. If there is no line in the plane, then the whole plane is one area, thus p₀ = 1. If there are n lines, then adding the (n + 1)-st line increases the number of areas by the number of areas this new line intersects. If no two lines are parallel and no three lines intersect at the same point, the number of areas the (n + 1)-st line crosses equals one plus the number of its intersections with the previous lines (each crossed area is divided into two, so the total number increases by one at every crossing). The new line has at most n intersections with the already present n lines, and the segment of the line between two intersections crosses exactly one area; thus the new line crosses at most n + 1 areas. Before adding the line there were at most pₙ areas (by the definition of pₙ).
Thus we obtain the recurrence relation
p_{n+1} = p_n + (n + 1),

from which we obtain an explicit formula for pₙ either by applying Proposition 1.10 or directly:
p_n = p_{n-1} + n = p_{n-2} + (n-1) + n = p_{n-3} + (n-2) + (n-1) + n = \cdots = p_0 + 1 + 2 + \cdots + n = 1 + \frac{n(n+1)}{2} = \frac{n^2 + n + 2}{2}.
□
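The recurrence and the closed formula are easily compared by a short loop; a Python sketch:

```python
# p_{n+1} = p_n + (n + 1) versus the closed form (n^2 + n + 2) / 2.
p = 1                              # p_0: no lines, one area
for n in range(1, 21):
    p += n
    assert p == (n * n + n + 2) // 2
```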
Recurrence relations can be of higher order than the first. Let us present examples of combinatorial problems in whose solutions recurrence relations can be used.
us. However, it can be seen that with scalars from Z₄ and, say, a = 3 we fail, since 1 - a = 2 is a divisor of zero there.
1.12. Nonlinear example. Let us return for a while to the first order equation (1.6), which we used for a very primitive model of population growth directly proportional to the momentary population size p. At first sight it is clear that such a model with a > 1 leads to a very rapid and unbounded growth.

A more realistic model has such a population change Δp(n) = p(n + 1) - p(n) only for small values of p, that is, Δp/p ≈ r > 0. Thus, if we want the population to grow by 5% per time interval when p is small, we choose r = 0.05. For some limit value p = K > 0 the population does not grow any more, and for even greater values it decreases (since, for instance, the resources for feeding the population are limited, individuals in a big population are obstacles to each other, etc.).
Let us assume that the values yₙ = Δp(n)/p(n) change linearly in p(n). Graphically, we can imagine this dependence as a line in the plane of the variables p and y which passes through the points [0, r] (when p = 0 we have y = r) and [K, 0] (the second condition: when p = K the population does not change). Thus we set

y = -\frac{r}{K}\, p + r.

Setting y = yₙ = \frac{p(n+1) - p(n)}{p(n)}, we obtain

\frac{p(n+1) - p(n)}{p(n)} = -\frac{r}{K}\, p(n) + r,

that is, by multiplying out, a difference equation of first order in which the value p(n) appears in both the first and the second power:

(1.10)    p(n + 1) = p(n) \left(1 - \frac{r}{K}\, p(n) + r\right).
Try to think through the behaviour of this model for various values of r and K. In the picture we can see the values for the parameters r = 0.05 (that is, five percent growth in the ideal state), K = 100 (the resources limit the population to the size 100), and p(0) = 2 individuals.
Note that the originally almost exponential growth slows down later, and the values approach the desired limit of 100 individuals. For p close to zero and K much greater than r, the right side of the equation (1.10) is approximately p(n)(1 + r), that is, the behaviour is similar to that of the Malthusian model. On the other hand, for p almost equal to K the right side of the equation is approximately p(n). For an initial value of p greater than K the values will decrease, for a smaller one they will grow; thus the system will basically oscillate about the value K.
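The described behaviour can be reproduced by iterating (1.10) directly; a minimal Python sketch with the parameters used above:

```python
# Iterating (1.10) with r = 0.05, K = 100 and p(0) = 2, as in the picture.
r, K, p = 0.05, 100.0, 2.0
for _ in range(1000):
    p = p * (1 - (r / K) * p + r)
assert abs(p - K) < 1e-6  # the population settles at the limit value K
```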
1.35. How many words of length 12 are there that consist only of the letters A and B but do not contain the sub-word BBB?
Solution. Let aₙ denote the number of words of length n consisting of the letters A and B but without BBB as a sub-word. Then for aₙ (n > 3) the following recurrence holds:

a_n = a_{n-1} + a_{n-2} + a_{n-3},

since the words of length n satisfying the condition end either with an A, or with AB, or with ABB. There are a_{n-1} admissible words ending with an A (preceding the last A there can be an arbitrary word of length n - 1 satisfying the condition), and analogously for the two remaining groups. Further, we can easily compute a₁ = 2, a₂ = 4, a₃ = 7. Using the recurrence relation we can then compute

a₁₂ = 1705.
We could also derive an explicit formula for the n-th element of the sequence using the theory we have developed. The characteristic polynomial of the recurrence relation is x³ - x² - x - 1, with one real and two complex roots, which we can express using the relations (1.12).
□
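Both the recurrence and the value a₁₂ = 1705 can be confirmed by brute force over all 2¹² words; a Python sketch:

```python
from itertools import product

# Brute force versus the recurrence a_n = a_{n-1} + a_{n-2} + a_{n-3}.
def count_words(n):
    return sum(1 for w in product("AB", repeat=n) if "BBB" not in "".join(w))

a = [2, 4, 7]                      # a_1, a_2, a_3
for _ in range(4, 13):             # extend up to a_12
    a.append(a[-1] + a[-2] + a[-3])
assert a[-1] == count_words(12) == 1705
```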
1.36. The score of a basketball match between the teams of the Czech Republic and Russia is 12 : 9 for the Russian team after the first quarter. In how many ways could the score have developed?
Solution. If we denote by P_{(k,l)} the number of ways in which the score of a quarter ending k : l could have developed, then for k, l ≥ 3 the following recurrence relation holds:

P_{(k,l)} = P_{(k-3,l)} + P_{(k-2,l)} + P_{(k-1,l)} + P_{(k,l-1)} + P_{(k,l-2)} + P_{(k,l-3)}.

(We can divide all possible evolutions of a quarter with the final score k : l into six mutually exclusive possibilities, according to which team scored last and how much the score was worth: 1, 2 or 3 points.) By the symmetry of the problem, it clearly holds that P_{(k,l)} = P_{(l,k)}. Further
4. Probability
Let us have a look at another frequent example of scalar-valued functions - observed values that are often known neither explicitly by a formula nor implicitly by some description. They are the result of some randomness, and we try to describe the probability of some outcome happening.
1.13. What is probability? As a simple example we can use common six-sided dice throwing, with sides labelled as
1, 2, 3, 4, 5, 6.
If we describe a mathematical model of such throwing with a "fair" dice, we expect and thus also require that every side occurs with the same frequency. In words, we say that "every side chosen in advance occurs with the probability 1/6".
But if you try to manufacture such a dice from wood with a knife, you will probably observe that the relative frequencies of the sides are not the same. In such a situation we can, after a large number of tries, count the relative frequencies of the individual labels and set these to be the probabilities in our mathematical description. But no matter how large the number of tries is, we cannot exclude the possibility that all the tries were some unlikely combination of results, and thus that our model is not well chosen.
In the following part we will work with an abstract mathematical description of probability in the simplest approach. The question of how accurate or adequate it is for a specific real-world problem lies outside the realm of mathematics. But that does not mean that such questions are not for mathematicians, quite the opposite (most likely in cooperation with experts in the given area). Later we will return to probability and see it as a theory describing the behaviour of random processes, or of fully deterministic processes where not all the determining parameters are known.
Mathematical statistics allows us to say how much we can expect a given model to correspond to reality, or allows us to determine the parameters of the model in such a way that the correspondence with the observations is high, while at the same time estimating the reliability of the chosen model.
Both probability and statistics require a complex mathematical theory, which we will build over the course of a few semesters.
Using the example of our dice we can imagine it as follows: in probability theory we work with the parameters pᵢ for the probabilities of the individual sides, and we only require that these probabilities are non-negative and that their sum is

p₁ + p₂ + p₃ + p₄ + p₅ + p₆ = 1.
When choosing specific values pᵢ for a specific dice, mathematical statistics then lets us estimate the reliability of our mathematical model of the dice.
Our humble goal for now is just to indicate how to capture probabilistic considerations abstractly in formal mathematical objects. The following paragraphs are thus basically just exercises in simple operations with sets and in combinatorics (that is, in counting the numbers of possibilities satisfying given conditions in finite sets).
we have for k ≥ 3:

P_{(k,2)} = P_{(k-3,2)} + P_{(k-2,2)} + P_{(k-1,2)} + P_{(k,1)} + P_{(k,0)},
P_{(k,1)} = P_{(k-3,1)} + P_{(k-2,1)} + P_{(k-1,1)} + P_{(k,0)},
P_{(k,0)} = P_{(k-3,0)} + P_{(k-2,0)} + P_{(k-1,0)},

which along with the initial conditions P_{(0,0)} = 1, P_{(1,0)} = 1, P_{(2,0)} = 2, P_{(3,0)} = 4, P_{(1,1)} = 2, P_{(2,1)} = P_{(1,1)} + P_{(0,1)} + P_{(2,0)} = 5, P_{(2,2)} = P_{(0,2)} + P_{(1,2)} + P_{(2,1)} + P_{(2,0)} = 14 gives, step by step,

P_{(12,9)} = 497178513.
□
Remark. We see that the recurrence relation in this problem has a more complex form than those we have dealt with in our theory, and thus we cannot evaluate an arbitrary P_{(k,l)} explicitly; we can evaluate it only by successive computation from the previous elements. Such an equation is called a partial difference equation, since the elements of the equation are indexed by two independent variables (k, l).
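Such successive computation is exactly what memoisation automates; a Python sketch evaluating the relation above (with negative indices treated as zero, which reproduces the initial conditions):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def P(k, l):
    if k < 0 or l < 0:
        return 0
    if k == 0 and l == 0:
        return 1
    return (P(k - 3, l) + P(k - 2, l) + P(k - 1, l)
            + P(k, l - 1) + P(k, l - 2) + P(k, l - 3))

assert P(12, 9) == 497178513       # the value computed above
```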
We will talk more about recurrent formulas (difference equations) of higher orders with constant coefficients in chapter 3.
D. Probability
Let us state a few simple exercises on classical probability, where we deal with an experiment with only a finite number of outcomes ("all cases") and we are interested in whether the outcome of the experiment belongs to a given subset of possible outcomes ("favourable cases"). The probability we are trying to determine then equals the number of favourable cases divided by the total number of all cases. Classical probability can be used when we assume (know) that each of the possible outcomes has the same probability of happening (for instance, in fair dice throwing).
1.37. What is the probability that a roll of a dice results in a number greater than 4?
Solution. There are six possible outcomes (the set {1, 2, 3, 4, 5, 6}) of which two are favourable ({5, 6}). Thus the probability is 2/6 = 1/3.
□
1.38. We randomly choose a group of five people from a group of eight men and four women. What is the probability that there are at least three women in the chosen group?
Solution. We compute the probability as the quotient of the number of favourable cases and the number of all cases. We divide the number of five-member groups containing at least three women by the number of all five-member groups.
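For a problem of this size, the favourable and all cases can simply be enumerated; a Python sketch:

```python
from itertools import combinations

# All 5-member groups from 8 men ("M") and 4 women ("W"); favourable: >= 3 women.
people = ["M"] * 8 + ["W"] * 4
groups = list(combinations(range(12), 5))
favourable = [g for g in groups if sum(people[i] == "W" for i in g) >= 3]
assert (len(favourable), len(groups)) == (120, 792)  # probability 120/792 = 5/33
```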
1.14. Random events. We work with a non-empty, fixed set Ω of all possible outcomes, which we call the sample space. For simplicity, the set Ω is finite, with elements ω₁, ..., ωₙ corresponding to the individual possible outcomes. Every subset A ⊂ Ω represents a possible event. A set 𝒜 of subsets of the sample space is called the set of events if

• Ω ∈ 𝒜 (the sample space is an event),
• if A, B ∈ 𝒜, then A \ B ∈ 𝒜 (that is, for every two events their set difference is also an event),
• if A, B ∈ 𝒜, then A ∪ B ∈ 𝒜 (that is, for every two events their union is also an event).
Clearly also the complement Aᶜ = Ω \ A of an event A is an event, which we call the opposite event to the event A. The intersection of two events is again an event, since for every two subsets A, B ⊂ Ω we have A ∩ B = A \ (Ω \ B).

The rotation around a point w = (w_x, w_y) by the angle ψ maps a vector v = (x, y) to

R_ψ · (v - w) + w = (cos ψ (x - w_x) - sin ψ (y - w_y) + w_x, sin ψ (x - w_x) + cos ψ (y - w_y) + w_y).
1.32. Reflection. Another well-known example of a mapping which preserves lengths is the so-called reflection through a line. Again, it suffices to describe reflections through lines that pass through the origin O; all other reflections can be derived using shifts and rotations.

Let us look for the matrix of the reflection with respect to the line whose direction is given by a unit vector v such that the angle between v and the vector (1, 0) is ψ. Let us first realise that
Z_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

is the matrix of the reflection with respect to the line with direction (1, 0).
In general, we can rotate any line so that it has the direction (1, 0), and thus we can write a general reflection matrix as

Z_ψ = R_ψ · Z_0 · R_{-ψ},

where we first rotate via the matrix R_{-ψ} so that the line is in the "zero" position, reflect with the matrix Z₀, and return back with the rotation R_ψ.
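The composition Z_ψ = R_ψ Z₀ R_{-ψ} can be checked numerically; a Python sketch using numpy (our choice of tool, purely for illustration):

```python
import numpy as np

def R(psi):
    """Rotation of the plane by the angle psi."""
    return np.array([[np.cos(psi), -np.sin(psi)],
                     [np.sin(psi),  np.cos(psi)]])

Z0 = np.array([[1.0, 0.0],
               [0.0, -1.0]])
psi = 0.7                      # an arbitrary test angle
Z = R(psi) @ Z0 @ R(-psi)      # reflection across the line at angle psi
v = np.array([np.cos(psi), np.sin(psi)])
assert np.allclose(Z @ v, v)             # the line's direction stays fixed
assert np.allclose(Z @ Z, np.eye(2))     # reflecting twice gives the identity
```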
(or vice versa). The vectors in case (a) are thus perpendicular, that is, their scalar product is zero,

which means that the linear mapping given by this matrix is the projection onto the x axis. Similarly we can see that the matrix A₂ determines the reflection with respect to the y axis. The matrix A₃ can be expressed in the form of a rotation matrix. The determinant det A
satisfies all the three conditions we wanted. How many such mappings could there possibly be? Every vector can be expressed using the two basis vectors e₁ = (1, 0) and e₂ = (0, 1), and by linearity every possibility for vol A is uniquely determined by its values on this pair of vectors. Since for the area, in the same way as for the determinant, clearly vol A(e₁, e₁) = vol A(e₂, e₂) = 0 (due to the required antisymmetry), every such scalar function is necessarily determined by its value on the single tuple of arguments (e₁, e₂). Therefore all the possibilities are equal up to a scalar multiple, which can be determined by the condition

vol A(e₁, e₂) = 1/2,

that is, we choose the orientation and the scale through the choice of the basis vectors, and we want the unit square to have area equal to one.

Thus we see that the determinant gives the area of the parallelogram determined by the columns of the matrix A, and the area of the triangle is one half of that.
1.35. Visibility in the plane. The previous description of the value of the oriented area gives us an elegant tool for determining the position of a point relative to oriented line segments. By an oriented line segment we mean two points in the plane R² with a fixed order. We can imagine it as an arrow from one point to the other. Such an oriented line segment divides the plane into two half-planes; let us call them "left" and "right". We want to be able to tell whether a given point is in the left or in the right half-plane.

Such tasks are often met in computer graphics when dealing with the visibility of objects. For simplicity, we can imagine that a line segment can be "seen" from the points to the right of it and cannot be seen from the points to the left of it (this corresponds to the notion that an object bounded by line segments oriented counterclockwise has its interior to the left of the segments, and the segments cannot be seen through from the interior).
Solution. Expanding both sides of the equation in the coordinates u = (u₁, u₂), v = (v₁, v₂) yields:

2(‖u‖² + ‖v‖²) = 2(u₁² + u₂² + v₁² + v₂²)
= u₁² + 2u₁v₁ + v₁² + u₂² + 2u₂v₂ + v₂² + u₁² - 2u₁v₁ + v₁² + u₂² - 2u₂v₂ + v₂²
= (u₁ + v₁)² + (u₂ + v₂)² + (u₁ - v₁)² + (u₂ - v₂)²
= ‖u + v‖² + ‖u - v‖².
□
1.81. Show that the composition of an odd number of point reflections in the plane is again a point reflection.

Solution. The point reflection in the plane across the point S is given by the formula X ↦ S - (X - S), that is, X ↦ 2S - X. (The image of the point X in this reflection is obtained by adding to S the vector opposite to X - S.) Successive application of three point reflections across the points S, T and U thus yields

X ↦ 2S - X ↦ 2T - (2S - X) ↦ 2U - (2T - (2S - X)) = 2(U - T + S) - X,

that is, X ↦ 2(U - T + S) - X, which is the point reflection across the point S - T + U. A composition of any odd number of point reflections can thus be reduced step by step to a composition of three point reflections, hence it is a point reflection (in principle this is a proof by mathematical induction; try to formulate it by yourself). □
1.82. Construct a (2n + 1)-gon, given the middle points of all its sides.

Solution. We use the fact that the composition of an odd number of point reflections is again a point reflection (see the previous exercise). Denote the vertices of the (2n + 1)-gon we are looking for by A₁, A₂, ..., A_{2n+1}, and the middle points of its sides (starting from the middle point of A₁A₂) by S₁, S₂, ..., S_{2n+1}. If we carry out the point reflections across the middle points in this order, then clearly the point A₁ is a fixed point of the resulting point reflection, and thus it is its centre. In order to find it, it is enough to carry out this composed point reflection with an arbitrary point X of the plane: the point A₁ then lies in the middle of the line segment XX′, where X′ is the image of X. The remaining vertices A₂, ..., A_{2n+1} can be obtained by successively mapping the point A₁ by the point reflections across the points S₁, ..., S_{2n}. □
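The construction can be turned into a computation: the centre A₁ is the alternating sum S₁ - S₂ + ⋯ + S_{2n+1}. A Python sketch with a made-up pentagon of midpoints:

```python
import numpy as np

def vertices_from_midpoints(S):
    """Recover the vertices of a (2n+1)-gon from the midpoints of its sides."""
    signs = [(-1) ** i for i in range(len(S))]
    A = [sum(s * P for s, P in zip(signs, S))]   # A_1 = S_1 - S_2 + ... + S_(2n+1)
    for Si in S[:-1]:
        A.append(2 * Si - A[-1])                 # reflect the last vertex across S_i
    return A

S = [np.array(p, float) for p in [(0, 0), (2, 0), (3, 2), (1, 3), (-1, 1)]]
A = vertices_from_midpoints(S)
for i in range(len(S)):                          # each S_i is the midpoint of a side
    assert np.allclose((A[i] + A[(i + 1) % len(S)]) / 2, S[i])
```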
From our considerations it can be seen that in the Euclidean plane two vectors are perpendicular whenever their scalar product is zero.

In the case of a real vector space of any dimension we shall try a similar approach, because the concept of the angle between two vectors is always a two-dimensional one (we want the angle to be the same in the two-dimensional subspace containing u and v). In the following paragraphs we shall consider only finite-dimensional vector spaces over the real scalars R.
Scalar product and perpendicularity

A scalar product on a vector space V over the real numbers is a mapping ( , ) : V × V → R which is symmetric in its arguments, linear in each of its arguments, and such that (v, v) ≥ 0, with ‖v‖² = (v, v) = 0 if and only if v = 0.

The number ‖v‖ = √(v, v) is called the size of the vector v.

Vectors v and w ∈ V are called orthogonal or perpendicular whenever (v, w) = 0; we also write v ⊥ w. The vector v is called normalised whenever ‖v‖ = 1.

A basis of the space V composed of orthogonal vectors only is called an orthogonal basis. If the vectors are additionally normalised, it is an orthonormal basis.
the sum of the images of the vectors, and the image of a multiple of a vector is the multiple of the image of the vector. These properties are shared by the mappings stated at the start of this paragraph. Such a mapping is then uniquely determined by its behaviour on the vectors of a basis (in the plane, by the images of two vectors not lying on the same line; in space, by the images of three vectors not lying in the same plane).

And how do we write down a linear mapping f on a vector space V? Let us start for simplicity with the plane R²: assume that the image of the point (vector) (1, 0) is (a, b) and the image of the point (vector) (0, 1) is (c, d). This uniquely determines the image of an arbitrary point with coordinates (u, v):

f((u, v)) = f(u(1, 0) + v(0, 1)) = u f((1, 0)) + v f((0, 1)) = (ua, ub) + (vc, vd) = (au + cv, bu + dv),

which can be efficiently written down as follows:

\begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} au + cv \\ bu + dv \end{pmatrix}.
A linear mapping is thus uniquely determined by a matrix. Furthermore, when we have another linear mapping $g$ given by the matrix $\begin{pmatrix} e & f \\ g & h \end{pmatrix}$, then we can easily compute (an interested reader can fill in the details by himself) that their composition $g \circ f$ is given by the matrix
$$\begin{pmatrix} e & f \\ g & h \end{pmatrix} \cdot \begin{pmatrix} a & c \\ b & d \end{pmatrix} = \begin{pmatrix} ea + fb & ec + fd \\ ga + hb & gc + hd \end{pmatrix}.$$
This leads us to defining matrix multiplication in exactly this way: we want the application of a mapping to a vector to be given by the multiplication of the matrix of the mapping with the given vector, and the composition of mappings to be given by the product of the corresponding matrices. It works analogously in spaces of higher dimension. Furthermore, this again shows what has already been proven in (2.5), namely that matrix multiplication is associative but not commutative, because this is so for the composition of mappings. That is another motivation for investigating vector spaces.
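As a quick numerical sanity check of this convention (a numpy sketch of ours, not part of the exposition), applying $g \circ f$ agrees with multiplying by the product of the two matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))   # matrix of f
B = rng.standard_normal((2, 2))   # matrix of g
x = rng.standard_normal(2)

# composition g(f(x)) versus the product matrix (B @ A) applied to x
assert np.allclose(B @ (A @ x), (B @ A) @ x)
# non-commutativity: B @ A and A @ B differ in general
print(np.allclose(A @ B, B @ A))  # almost surely False
```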
Let us now recall that already in the first chapter we worked with matrices of some linear mappings in the plane $\mathbb{R}^2$, notably with the rotation around a point and with axial symmetry (see 1.31 and 1.32).
Let us now try to write down matrices of linear mappings from $\mathbb{R}^3$ to $\mathbb{R}^3$. What does the matrix of a rotation in three dimensions look like? Let us begin with the special (easier to describe) rotations about the coordinate axes:
2.64. Matrices of rotations about the coordinate axes in $\mathbb{R}^3$. Write down the matrices of the rotations by the angle $\varphi$ about the three coordinate axes.
For a general basis $(u_1, \dots, u_n)$ and vectors $x = \sum_i x_i u_i$, $y = \sum_j y_j u_j$ we obtain, by bilinearity,
$$\langle x, y\rangle = \Big\langle \sum_i x_i u_i, \sum_j y_j u_j \Big\rangle = \sum_{i,j} x_i y_j \langle u_i, u_j\rangle = \sum_{i,j} s_{ij}\, x_i y_j,$$
where $S = (s_{ij})$ is the symmetric matrix with entries $s_{ij} = \langle u_i, u_j\rangle$.
If the basis is orthonormal, the matrix S is the unit matrix. This proves the following useful claim:
Scalar product and orthonormal basis
Proposition. In every orthonormal basis the scalar product is given in coordinates by the expression
$$\langle x, y\rangle = x^T \cdot y.$$
For every general basis of the space $V$ there is a symmetric matrix $S$ such that the coordinate expression of the scalar product is
$$\langle x, y\rangle = x^T \cdot S \cdot y.$$
2.41. Orthogonal complements and projections. For every fixed subspace $W \subseteq V$ in a space with scalar product we define its orthogonal complement as
$$W^\perp = \{u \in V;\ u \perp v \text{ for all } v \in W\}.$$
Directly from the definition it is clear that $W^\perp$ is a vector subspace. If $W \subseteq V$ has a basis $(u_1, \dots, u_k)$, the condition for $W^\perp$ is given as $k$ homogeneous equations for $n$ variables. Thus $W^\perp$ will have dimension at least $n - k$. Also, $u \in W \cap W^\perp$ means $\langle u, u\rangle = 0$ and thus also $u = 0$ by the definition of the scalar product. Clearly then the whole space $V$ is the direct sum
$$V = W \oplus W^\perp.$$
A linear mapping $f : V \to V$ on any vector space is called a projection if
$$f \circ f = f.$$
In such a case, for every vector $v \in V$ we have
$$v = f(v) + \big(v - f(v)\big) \in \mathrm{Im}(f) + \mathrm{Ker}(f) = V,$$
and if $v \in \mathrm{Im}(f)$ and $f(v) = 0$ then also $v = 0$. The above sum of subspaces is therefore direct. We say that $f$ is a projection on the subspace $W = \mathrm{Im}(f)$ along the subspace $U = \mathrm{Ker}(f)$. In words, the projection can be described naturally as follows: we decompose the given vector into its components in $W$ and in $U$ and forget the second one.
If V has a scalar product, we say that the projection is perpendicular if the kernel is perpendicular to the image.
Solution. When rotating any particular point about a given coordinate axis (say $x$), the corresponding coordinate ($x$) does not change, and the remaining two coordinates transform by the rotation in the plane which we already know (a matrix of type $2 \times 2$).
Thus we gradually obtain the following matrices. Rotation about the axis $z$:
$$\begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
rotation about the axis $y$:
$$\begin{pmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{pmatrix},$$
rotation about the axis $x$:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.$$
The sign at $\varphi$ in the matrix for the rotation about $y$ is different. We want, as with any other rotation, the rotation about the $y$ axis to be in the positive sense, that is, when we look against the direction of the $y$ axis, the world turns anti-clockwise. The signs in the matrices depend on the orientation of our coordinate system. Usually, in the 3-dimensional space the so-called right-handed coordinate system is chosen: if we place our right hand on the $x$ axis such that the fingers point in the direction of the axis and such that we can rotate the $x$ axis in the $xy$ plane so that it coincides with the $y$ axis and they point in the same direction, then the thumb points in the direction of the $z$ axis. In such a system the rotation about $y$ in the positive sense is a rotation in the negative sense in the plane $xz$ (that is, the axis $z$ turns in the direction towards $x$). Think about the positive and negative sense of the rotations about all three axes. □ The knowledge of these matrices allows us to write down the matrix of the rotation about any (oriented) axis. Let us start with a specific example:
2.65. Find the matrix of the rotation in the positive sense through the angle $\pi/3$ about the line passing through the origin with the oriented directional vector $(1, 1, 0)$, under the standard basis of $\mathbb{R}^3$.
Solution. The given rotation can be obtained by composing the following three mappings:
• rotation through the angle $\pi/4$ in the negative sense about the axis $z$ (the axis of the rotation goes over to the $x$ axis);
• rotation through the angle $\pi/3$ in the positive sense about the $x$ axis;
• rotation through the angle $\pi/4$ in the positive sense about the $z$ axis (the $x$ axis goes back to the axis of the rotation).
Every subspace $W \subseteq V$ thus defines a perpendicular projection onto $W$. It is the projection on $W$ along $W^\perp$, given by the unique decomposition of every vector $u$ into components $u_W \in W$ and $u_{W^\perp} \in W^\perp$; that is, the linear mapping which maps $u_W + u_{W^\perp}$ to $u_W$.
2.42. Existence of orthonormal bases. Note that on every finite-dimensional real vector space there definitely exist scalar products: just pick any basis, declare it orthonormal, and we immediately have a scalar product. In this basis the scalar products are computed as in the formula of the Theorem 2.40.
But we can also go the other way. If we are given a scalar product on a vector space $V$, we can easily use suitable perpendicular projections to transform any basis into an orthonormal one. This is the Gram-Schmidt orthogonalisation process. The point of this procedure is to transform a given sequence of nonzero generators $v_1, \dots, v_k$ of a finite-dimensional space $V$ into an orthogonal set of nonzero generators of $V$.
Gram-Schmidt orthogonalisation
Proposition. Let $(u_1, \dots, u_k)$ be a linearly independent $k$-tuple of vectors of a space $V$ with scalar product. Then there exists an orthogonal system of vectors $(v_1, \dots, v_k)$ such that $v_i \in \langle u_1, \dots, u_i\rangle$, $i = 1, \dots, k$. We obtain it by the following procedure:
• The independence of the vectors $u_i$ ensures that $u_1 \neq 0$; we choose $v_1 = u_1$.
• If we have already constructed the vectors $v_1, \dots, v_\ell$ of the required properties, we choose $v_{\ell+1} = u_{\ell+1} + a_1 v_1 + \cdots + a_\ell v_\ell$, where $a_i = -\dfrac{\langle u_{\ell+1}, v_i\rangle}{\|v_i\|^2}$.
Proof. Let us begin with the first (nonzero) vector $v_1$ and calculate the perpendicular projection $v_2$ of $u_2$ onto
$$\langle v_1\rangle^\perp \subseteq \langle v_1, u_2\rangle.$$
The result is nonzero if and only if $u_2$ is independent of $v_1$. In all further steps we work similarly.
In the $\ell$-th step we want that for $v_{\ell+1} = u_{\ell+1} + a_1 v_1 + \cdots + a_\ell v_\ell$ we have $\langle v_{\ell+1}, v_i\rangle = 0$ for all $i = 1, \dots, \ell$. That implies
$$0 = \langle u_{\ell+1} + a_1 v_1 + \cdots + a_\ell v_\ell,\ v_i\rangle = \langle u_{\ell+1}, v_i\rangle + a_i \langle v_i, v_i\rangle,$$
and we can see that the vectors with the desired properties are determined uniquely up to scalar multiples. □
Whenever we have an orthogonal basis of a vector space $V$, we just normalise the vectors in order to obtain an orthonormal basis. Thus we have proven:
Corollary. On every finite-dimensional real vector space with scalar product there exists an orthonormal basis.
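The procedure from the proposition is directly implementable. Here is a minimal numpy sketch (ours; it assumes the input vectors are linearly independent) which also performs the final normalisation:

```python
import numpy as np

def gram_schmidt(U):
    """Columns of U are linearly independent vectors u_1, ..., u_k.
    Returns a matrix whose columns form an orthonormal basis of span(U)."""
    V = []
    for u in U.T:
        v = u.copy()
        for w in V:                      # subtract projections on previous vectors
            v -= (u @ w) * w             # w is already normalised
        V.append(v / np.linalg.norm(v))  # normalise to get an orthonormal basis
    return np.column_stack(V)

Q = gram_schmidt(np.array([[1., 1.], [1., 0.], [0., 1.]]))
print(np.round(Q.T @ Q, 10))  # identity matrix: the columns are orthonormal
```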
In an orthonormal basis, coordinates and perpendicular projections are very easy to calculate. Indeed, let us have an orthonormal basis $(e_1, \dots, e_n)$ of a space $V$. Then every vector $v = x_1 e_1 + \cdots + x_n e_n$ satisfies
$$\langle e_i, v\rangle = \langle e_i,\ x_1 e_1 + \cdots + x_n e_n\rangle = x_i,$$
and it always holds that
$$(2.3)\qquad v = \langle e_1, v\rangle e_1 + \cdots + \langle e_n, v\rangle e_n.$$
The matrix of the resulting rotation is the product of the matrices corresponding to the three given mappings, where the order of the matrices is given by the order in which the mappings are applied: the mapping applied first is the rightmost one in the product. Thus we obtain the desired matrix
$$
\begin{pmatrix} \frac{\sqrt2}{2} & -\frac{\sqrt2}{2} & 0 \\ \frac{\sqrt2}{2} & \frac{\sqrt2}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac12 & -\frac{\sqrt3}{2} \\ 0 & \frac{\sqrt3}{2} & \frac12 \end{pmatrix}
\begin{pmatrix} \frac{\sqrt2}{2} & \frac{\sqrt2}{2} & 0 \\ -\frac{\sqrt2}{2} & \frac{\sqrt2}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} \frac34 & \frac14 & \frac{\sqrt6}{4} \\ \frac14 & \frac34 & -\frac{\sqrt6}{4} \\ -\frac{\sqrt6}{4} & \frac{\sqrt6}{4} & \frac12 \end{pmatrix}.
$$
Note that the resulting rotation could also be obtained, for instance, as the composition of the following three mappings:
• rotation through the angle $\pi/4$ in the positive sense about the axis $z$ (the axis of rotation goes over to the axis $y$);
• rotation through the angle $\pi/3$ in the positive sense about the axis $y$;
• rotation through the angle $\pi/4$ in the negative sense about the axis $z$ (the axis $y$ goes back to the axis of rotation).
Analogously we obtain
$$
\begin{pmatrix} \frac{\sqrt2}{2} & \frac{\sqrt2}{2} & 0 \\ -\frac{\sqrt2}{2} & \frac{\sqrt2}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \frac12 & 0 & \frac{\sqrt3}{2} \\ 0 & 1 & 0 \\ -\frac{\sqrt3}{2} & 0 & \frac12 \end{pmatrix}
\begin{pmatrix} \frac{\sqrt2}{2} & -\frac{\sqrt2}{2} & 0 \\ \frac{\sqrt2}{2} & \frac{\sqrt2}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} \frac34 & \frac14 & \frac{\sqrt6}{4} \\ \frac14 & \frac34 & -\frac{\sqrt6}{4} \\ -\frac{\sqrt6}{4} & \frac{\sqrt6}{4} & \frac12 \end{pmatrix}.
$$
□
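The whole computation is easy to check numerically. A small sketch (ours, in numpy; the functions implement the three rotation matrices from 2.64):

```python
import numpy as np

def Rz(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def Rx(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

M1 = Rz(np.pi/4) @ Rx(np.pi/3) @ Rz(-np.pi/4)
M2 = Rz(-np.pi/4) @ Ry(np.pi/3) @ Rz(np.pi/4)
print(np.allclose(M1, M2))                    # True: both compositions agree
axis = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
print(np.allclose(M1 @ axis, axis))           # True: the axis stays fixed
```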
2.66. Matrix of a general rotation in $\mathbb{R}^3$. Derive the matrix of a general rotation in $\mathbb{R}^3$.
Solution. We can proceed as in the previous example, now with general values. Consider an arbitrary unit vector $(x, y, z)$ and compose the following mappings:
i) rotation $\mathcal{R}_1$ in the positive sense about the $z$ axis which takes the line with the directional vector $(x, y, z)$ into the plane $xz$; its matrix is
$$R_1 = \begin{pmatrix} \frac{x}{\sqrt{1-z^2}} & \frac{y}{\sqrt{1-z^2}} & 0 \\ -\frac{y}{\sqrt{1-z^2}} & \frac{x}{\sqrt{1-z^2}} & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
mapping the vector $(x, y, z)$ to $(\sqrt{1-z^2}, 0, z)$;
If we are given a subspace $W \subseteq V$ and its orthonormal basis $(e_1, \dots, e_k)$, we can surely extend it to an orthonormal basis $(e_1, \dots, e_n)$ of the whole $V$. The perpendicular projection of a general vector $v \in V$ on $W$ is then given by the relation
$$v \mapsto \langle e_1, v\rangle e_1 + \cdots + \langle e_k, v\rangle e_k.$$
Thus for the perpendicular projection it is enough to know an orthonormal basis of the subspace $W$ on which we are projecting.
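A sketch of this projection formula (ours, in numpy; the orthonormal basis is obtained here by a QR decomposition rather than an explicit Gram-Schmidt run):

```python
import numpy as np

def project(v, W):
    """Perpendicular projection of v onto the column span of W."""
    E, _ = np.linalg.qr(W)     # orthonormal basis e_1, ..., e_k of the span
    return E @ (E.T @ v)       # sum of <e_i, v> e_i

W = np.array([[1., 1.], [1., 0.], [0., 1.]])
v = np.array([1., 2., 3.])
p = project(v, W)
print(np.allclose(W.T @ (v - p), 0))   # True: v - p is perpendicular to W
```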
Let us also note that, in general, the projection $f$ on a subspace $W$ along $U$ and the projection $g$ on $U$ along $W$ are tied by the relation $g = \mathrm{id}_V - f$. When dealing with perpendicular projections on a given subspace $W$, it is therefore always more efficient to calculate an orthonormal basis of whichever of $W$ and $W^\perp$ has the smaller dimension.
Let us also note that the existence of an orthonormal basis ensures that for every real space $V$ of dimension $n$ with scalar product there exists a linear mapping which is an isomorphism between $V$ and the space $\mathbb{R}^n$ with the standard scalar product. This has in essence been shown already in the Theorem 2.40, where the desired isomorphism is exactly the coordinate assignment. In words: in an orthonormal basis the scalar product is computed in coordinates by the same formula as the standard scalar product in $\mathbb{R}^n$.
We shall return to the questions of the size of a vector and of projections in the following chapters in a more general context.
2.43. Angle of two vectors. As we have already noted, the angle of two linearly independent vectors in space must be the same as when we consider them in the two-dimensional subspace they generate. Basically, this is the reason why the notion of angle is independent of the dimension of the ambient space: if we choose an orthogonal basis whose first two vectors generate the same subspace as the two given vectors $u$ and $v$ (whose angle we are measuring), we can simply take the definition from planar geometry. Even without choosing the basis it must hold that:
Angle of two vectors
The angle $\varphi$ of two vectors $u$ and $v$ is given by the relation
$$\cos\varphi = \frac{\langle u, v\rangle}{\|u\|\,\|v\|}.$$
A scalar product is a special case of a bilinear mapping $\alpha : V \times V \to \mathbb{K}$, where for any four vectors $u, v, w, z$ and scalars $a, b, c, d$ we have
$$\alpha(au + bv,\ cw + dz) = ac\,\alpha(u, w) + ad\,\alpha(u, z) + bc\,\alpha(v, w) + bd\,\alpha(v, z).$$
ii) rotation $\mathcal{R}_2$ in the positive sense about the $y$ axis through the angle with cosine $\sqrt{1-z^2}$ and sine $z$, under which the line with the directional vector $(\sqrt{1-z^2}, 0, z)$ goes over to the line with the directional vector $(1, 0, 0)$; the matrix of this rotation is
$$R_2 = \begin{pmatrix} \sqrt{1-z^2} & 0 & z \\ 0 & 1 & 0 \\ -z & 0 & \sqrt{1-z^2} \end{pmatrix};$$
iii) rotation $\mathcal{R}_3$ in the positive sense about the $x$ axis through the angle $\varphi$, with the matrix $R_3$ from 2.64. The matrix of the rotation through the angle $\varphi$ about the (oriented) axis $(x, y, z)$ is then the product
$$R = R_1^{-1} \cdot R_2^{-1} \cdot R_3 \cdot R_2 \cdot R_1.$$
Plugging a fixed vector into the second argument of such a form $\alpha$ we obtain a linear form $\alpha(\,\cdot\,, v) \in V^*$, which is the image of this vector under the mapping $V \to V^*$, $v \mapsto \alpha(\,\cdot\,, v)$. If we choose a fixed basis of a finite-dimensional space $V$ and the dual basis of $V^*$, then this mapping is given in coordinates by a matrix $A$:
$$x \mapsto \big(y \mapsto y^T \cdot A \cdot x\big).$$
4. Properties of linear mappings
A more detailed analysis of the properties of various types of linear mappings will now lead us to a better understanding of the tools which vector spaces offer for the modelling of linear processes and systems.
2.45. Let us begin with four examples in the lowest interesting dimension. In the standard basis of the plane $\mathbb{R}^2$ with the standard scalar product we consider the following matrices of mappings $f : \mathbb{R}^2 \to \mathbb{R}^2$:
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad C = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \quad D = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
The matrix $A$ gives the perpendicular projection along the subspace
$$W = \{(0, a);\ a \in \mathbb{R}\} \subseteq \mathbb{R}^2$$
The transition matrix for changing the basis from the standard basis to the basis $f$ is then given by
on the subspace
The matrix of the mapping in the basis $f$ is then given by
$$T^{-1} A\, T.$$
□
2.68. Consider the vector space of polynomials in one variable of degree at most 2 with real coefficients. In this space, consider the basis $1, x, x^2$. Write down the matrix of the derivative mapping in this basis and also in the basis $1 + x^2,\ x,\ x + x^2$.
Solution. $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 1 & 1 \\ 2 & 1 & 3 \\ 0 & -1 & -1 \end{pmatrix}$. □
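The second matrix can be checked by a change of basis: if the columns of $T$ hold the coordinates of the new basis vectors, the matrix in the new basis is $T^{-1} D\, T$. A numpy sketch of ours:

```python
import numpy as np

D = np.array([[0., 1., 0.],   # derivative in the basis (1, x, x^2)
              [0., 0., 2.],
              [0., 0., 0.]])
# columns: coordinates of 1 + x^2, x, x + x^2 in the basis (1, x, x^2)
T = np.array([[1., 0., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])
print(np.linalg.inv(T) @ D @ T)
# [[ 0.  1.  1.]
#  [ 2.  1.  3.]
#  [ 0. -1. -1.]]
```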
2.69. In the standard basis of $\mathbb{R}^3$ determine the matrix of the rotation through the angle $90°$ in the positive sense about the line $(t, t, t)$, $t \in \mathbb{R}$, oriented in the direction of the vector $(1, 1, 1)$. Further, give the matrix of this rotation in the basis
$$g = \big((1, 1, 0),\ (1, 0, -1),\ (0, 1, 1)\big).$$
Solution. We can easily determine the matrix of the given rotation in a suitable basis, namely in the basis given by the directional vector of the line and by two mutually perpendicular vectors in the plane $x + y + z = 0$, that is, in the plane of vectors perpendicular to $(1, 1, 1)$. Note that the matrix of the rotation in the positive sense through $90°$ in some orthonormal basis of $\mathbb{R}^2$ is
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
In an orthogonal basis whose vectors have sizes $k$ and $l$ it is
$$\begin{pmatrix} 0 & -l/k \\ k/l & 0 \end{pmatrix}.$$
If we choose the perpendicular vectors $(1, -1, 0)$ and $(1, 1, -2)$ in the plane $x + y + z = 0$, with sizes $\sqrt2$ and $\sqrt6$, then in the basis $f = \big((1, 1, 1), (1, -1, 0), (1, 1, -2)\big)$ the rotation we are looking for has the matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -\sqrt3 \\ 0 & \frac{1}{\sqrt3} & 0 \end{pmatrix}.$$
In order to obtain the matrix of the rotation in the standard basis, it is enough to change the basis. The transition matrix $T$ for changing the basis from the basis $f$ to the standard basis is obtained by writing the coordinates (with respect to the standard basis) of the vectors of the basis $f$ as the columns of $T$:
$$T = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 0 & -2 \end{pmatrix}.$$
Finally, for the desired matrix $R$ we have
$$V = \{(a, 0);\ a \in \mathbb{R}\} \subseteq \mathbb{R}^2,$$
that is, the projection on the $x$-axis along the $y$-axis. Evidently for this mapping $f : \mathbb{R}^2 \to \mathbb{R}^2$ it holds that $f \circ f = f$, and thus the restriction $f|_V$ of the given mapping to its image is the identity mapping. The kernel of $f$ is exactly the subspace $W$.
The matrix $B$ has the property $B^2 = 0$, therefore the same holds for the corresponding mapping $f$. We can view it as the differentiation mapping on the polynomials $\mathbb{R}_1[x]$ of degree at most one in the basis $(1, x)$ (we shall deal with differentiation in chapter five, see ??).
The matrix $C$ gives a mapping $f$ which stretches the first vector of the basis $a$-times and the second $b$-times. Therefore the whole plane splits into two subspaces which are preserved by the mapping and on which the mapping acts as a homothety, that is, as scaling by a scalar multiple (the first case was the special case $a = 1$, $b = 0$). For instance, the choice $a = 1$, $b = -1$ corresponds to the axial symmetry (mirror symmetry) across the $x$-axis, which is the same as complex conjugation $x + iy \mapsto x - iy$ on the two-dimensional real space $\mathbb{R}^2 \simeq \mathbb{C}$ in the basis $(1, i)$. This is a linear mapping of the two-dimensional real vector space $\mathbb{C}$, but not of the one-dimensional complex space $\mathbb{C}$.
The matrix $D$ is the matrix of the rotation through the right angle in the standard basis, and at first sight we can see that no one-dimensional subspace is preserved by this mapping.
Such a rotation is a bijection of the plane onto itself, therefore we can surely find distinct bases in the domain and codomain in which its matrix is the unit matrix $E$ (we simply take any basis of the domain and its image in the codomain). But we are not able to do this with the same basis for both the domain and the codomain. Let us view the matrix $D$ as the matrix of a mapping $g : \mathbb{C}^2 \to \mathbb{C}^2$ in the standard basis of the complex vector space $\mathbb{C}^2$. Then we can find vectors $u = (i, 1)$, $v = (-i, 1)$, for which we have
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} i \\ 1 \end{pmatrix} = i \cdot u, \qquad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} -i \\ 1 \end{pmatrix} = -i \cdot v.$$
That means that in the basis $(u, v)$ on $\mathbb{C}^2$ the mapping $g$ has the matrix
$$\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix},$$
and we note that this complex analogy to the case of the matrix $C$ has on the diagonal the elements $\lambda = \cos(\frac{\pi}{2}) + i\sin(\frac{\pi}{2})$ and its complex conjugate $\bar\lambda$. In other words, the argument of this number in polar form gives the angle of the rotation.
This is easy to understand if we denote the real and imaginary parts of the vector $u$, that is,
$$u = \mathrm{Re}\,u + i\,\mathrm{Im}\,u = x_u + i\,y_u.$$
The vector $v$ is the complex conjugate of $u$. We are interested in the restriction of the mapping $g$ to the real vector space $V = \mathbb{R}^2 \cap \langle u, v\rangle_{\mathbb{C}}$. Evidently,
$$V = \langle u + \bar u,\ i(u - \bar u)\rangle = \langle x_u,\ y_u\rangle.$$
$$R = T \cdot \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -\sqrt3 \\ 0 & \frac{1}{\sqrt3} & 0 \end{pmatrix} \cdot T^{-1} = \begin{pmatrix} \frac13 & \frac13 - \frac{\sqrt3}{3} & \frac13 + \frac{\sqrt3}{3} \\ \frac13 + \frac{\sqrt3}{3} & \frac13 & \frac13 - \frac{\sqrt3}{3} \\ \frac13 - \frac{\sqrt3}{3} & \frac13 + \frac{\sqrt3}{3} & \frac13 \end{pmatrix}.$$
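A direct numerical check of this change-of-basis computation (a numpy sketch of ours):

```python
import numpy as np

T = np.array([[1., 1., 1.],
              [1., -1., 1.],
              [1., 0., -2.]])
Mf = np.array([[1., 0., 0.],
               [0., 0., -np.sqrt(3)],
               [0., 1/np.sqrt(3), 0.]])
R = T @ Mf @ np.linalg.inv(T)
print(np.round(R, 3))
# the axis stays fixed, and R is a rotation (determinant 1):
print(np.allclose(R @ np.ones(3), np.ones(3)),
      np.isclose(np.linalg.det(R), 1.0))
```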
This result can be checked by plugging into the matrix of the general rotation (2.66): by normalising the vector $(1, 1, 1)$ we obtain $(x, y, z) = (1/\sqrt3, 1/\sqrt3, 1/\sqrt3)$, with $\cos\varphi = 0$ and $\sin\varphi = 1$.
2.46. Eigenvalues and eigenvectors. Let us now look at equations of the form $f(u) = a \cdot u$ for a linear mapping $f : V \to V$ on a vector space of dimension $n$ over the scalars $\mathbb{K}$. If we imagine such an equality written in coordinates, that is, using the matrix $A$ of the mapping in some basis, it is an expression
$$A \cdot x = a \cdot x, \qquad\text{that is,}\qquad (A - a \cdot E) \cdot x = 0.$$
From the previous discussion we know that such a system of equations has the only solution $x = 0$ whenever the matrix $A - aE$ is invertible. Thus we want to find values $a \in \mathbb{K}$ for which $A - aE$ is not invertible; the necessary and sufficient condition for this is (see Theorem 2.23)
$$(2.4)\qquad \det(A - a \cdot E) = 0.$$
If we consider $\lambda = a$ as a variable in this scalar equation, we are actually looking for the roots of a polynomial of degree $n$. As we have seen in the case of the matrix $D$, the roots may, but need not, exist, depending on the field of scalars $\mathbb{K}$ we work over.
Eigenvalues and eigenvectors
Scalars $\lambda$ satisfying the equation $f(u) = \lambda \cdot u$ for some nonzero vector $u \in V$ are called eigenvalues of the mapping $f$; the corresponding nonzero vectors $u$ are eigenvectors of the mapping $f$.
If $u, v$ are eigenvectors associated with the same eigenvalue $\lambda$, then for every linear combination of $u$ and $v$ it holds that
$$f(au + bv) = a f(u) + b f(v) = \lambda(au + bv).$$
Therefore the eigenvectors associated with the same eigenvalue $\lambda$ form, together with the zero vector, a nontrivial vector subspace $V_\lambda$, the so-called eigenspace associated with $\lambda$. For instance, if $\lambda = 0$ is an eigenvalue, the kernel $\mathrm{Ker}\,f$ is the eigenspace $V_0$.
From the definition of eigenvalues it is clear that their computation cannot depend on the choice of basis, and hence of the matrix of the mapping $f$. Indeed, as a direct corollary of the transformation properties from paragraph 2.38 and the Cauchy theorem 2.19 on the determinant of a product, by choosing different coordinates we obtain the matrix $A' = P^{-1} A P$ with an invertible matrix $P$, and
$$|P^{-1} A P - \lambda E| = |P^{-1} A P - P^{-1} \lambda E\, P| = |P^{-1}(A - \lambda E) P| = |P^{-1}|\,|A - \lambda E|\,|P| = |A - \lambda E|,$$
because scalar multiplication is commutative and $|P^{-1}| = |P|^{-1}$.
For these reasons we use the same terminology for matrices and mappings:
2.72. Consider the complex numbers as a real vector space and choose $1$ and $i$ for its basis. Determine in this basis the matrix of the following linear mappings:
a) conjugation,
b) multiplication by the number $(2 + i)$.
Determine the matrices of these mappings in the basis $f = \big((1 - i), (1 + i)\big)$.
Solution. In order to determine the matrix of a linear mapping in some basis, it is enough to determine the images of the basis vectors.
a) For the conjugation we have $1 \mapsto 1$, $i \mapsto -i$; written in coordinates, $(1, 0) \mapsto (1, 0)$ and $(0, 1) \mapsto (0, -1)$. By writing the images into the columns we obtain the matrix $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. In the basis $f$ the conjugation swaps the basis vectors, that is, $(1, 0) \mapsto (0, 1)$ and $(0, 1) \mapsto (1, 0)$, and the matrix of the conjugation in this basis is
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
b) For the basis $(1, i)$ we obtain $1 \mapsto 2 + i$, $i \mapsto 2i - 1$, that is, $(1, 0) \mapsto (2, 1)$ and $(0, 1) \mapsto (-1, 2)$. Thus the matrix of the multiplication by the number $2 + i$ in the basis $(1, i)$ is $\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}$.
Now let us determine the matrix in the basis $f$. Multiplication by $(2 + i)$ gives us: $(1 - i) \mapsto (1 - i)(2 + i) = 3 - i$ and $(1 + i) \mapsto (1 + i)(2 + i) = 1 + 3i$. The coordinates $(a, b)_f$ of the vector $3 - i$ in the basis $f$ are given, as we know, by the equation $a \cdot (1 - i) + b \cdot (1 + i) = 3 - i$, that is, $(3 - i)_f = (2, 1)$. Analogously $(1 + 3i)_f = (-1, 2)$. Altogether, we have obtained the matrix $\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}$.
Think about the following: why is the matrix of the multiplication by $2 + i$ the same in both bases? Would the two matrices be the same for multiplication by an arbitrary complex number? □
2.73. Determine the matrix $A$ which, in the standard basis of the space $\mathbb{R}^3$, gives the orthogonal projection on the vector subspace generated by the vectors $u_1 = (-1, 1, 0)$ and $u_2 = (-1, 0, 1)$.
Solution. Let us first note that the given subspace is a plane passing through the origin with the normal vector $u_3 = (1, 1, 1)$. The ordered triple $(1, 1, 1)$ is clearly a solution of the system
$$-x_1 + x_2 = 0, \qquad -x_1 + x_3 = 0,$$
that is, the vector $u_3$ is perpendicular to the vectors $u_1, u_2$.
Under the given projection the vectors $u_1$ and $u_2$ must map to themselves and the vector $u_3$ to the zero vector. In the basis composed of
Characteristic polynomial of a matrix and of a mapping
For a matrix $A$ of dimension $n$ over $\mathbb{K}$ we call the polynomial $|A - \lambda E| \in \mathbb{K}_n[\lambda]$ the characteristic polynomial of the matrix $A$.
Roots of this polynomial are the eigenvalues of the matrix $A$. If $A$ is the matrix of a mapping $f : V \to V$ in a certain basis, then $|A - \lambda E|$ is also called the characteristic polynomial of the mapping $f$.
Because the characteristic polynomial of a linear mapping $f : V \to V$ is independent of the choice of the basis of $V$, its coefficients at the individual powers of the variable $\lambda$ are scalars expressing properties of $f$; that is, they cannot depend on the choice of the basis. Notably, as a simple exercise in calculating determinants, we can express the coefficients at the highest and lowest powers (we assume $\dim V = n$ and the matrix of the mapping to be $A = (a_{ij})$ in a certain basis):
$$|A - \lambda \cdot E| = (-1)^n\lambda^n + (-1)^{n-1}(a_{11} + \cdots + a_{nn}) \cdot \lambda^{n-1} + \cdots + |A| \cdot \lambda^0.$$
The coefficient at the highest power says only whether the dimension of the space $V$ is even or odd. We have already noted that the determinant of the matrix of a mapping expresses how the given linear mapping scales volumes.
It is interesting that the sum of the diagonal elements of the matrix of a mapping does not depend on the choice of basis. We call it the trace of the matrix and denote it by $\mathrm{Tr}\,A$; the trace of a mapping is defined as the trace of its matrix in an arbitrary basis. This is actually not so surprising: in chapter eight we illustrate with methods of differential calculus that the trace is the linear approximation of the determinant in the neighbourhood of the unit matrix, see ??.
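Both observations are easy to test numerically (a numpy sketch of ours): the eigenvalues multiply to the determinant and sum to the trace.

```python
import numpy as np

A = np.array([[2., 1.], [1., 2.]])
lam = np.linalg.eigvals(A)                       # roots of |A - lambda E|
print(np.isclose(lam.prod(), np.linalg.det(A)))  # product of roots = |A|
print(np.isclose(lam.sum(), np.trace(A)))        # sum of roots = Tr A
```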
In the following we show a few important properties of eigenspaces.
2.47. Theorem. Eigenvectors of a linear mapping $f : V \to V$ associated with distinct eigenvalues are linearly independent.
Proof. Let $a_1, \dots, a_k$ be distinct eigenvalues of the mapping $f$ and $u_1, \dots, u_k$ eigenvectors associated with them. We proceed by induction on the number of linearly independent vectors among them. Suppose $u_1, \dots, u_\ell$ are linearly independent while $u_{\ell+1} = \sum_{i=1}^{\ell} c_i u_i$ is a linear combination of them. At least $\ell = 1$ can be chosen, because the eigenvectors are nonzero. Then
$$f(u_{\ell+1}) = a_{\ell+1} \cdot u_{\ell+1} = \sum_{i=1}^{\ell} a_{\ell+1}\, c_i\, u_i, \qquad f(u_{\ell+1}) = f\Big(\sum_{i=1}^{\ell} c_i u_i\Big) = \sum_{i=1}^{\ell} a_i\, c_i\, u_i.$$
By subtracting these two expressions we obtain $0 = \sum_{i=1}^{\ell}(a_{\ell+1} - a_i)\, c_i\, u_i$. All the differences between the eigenvalues are nonzero and at least one coefficient $c_i$ is nonzero. That is a contradiction with the assumed linear independence of $u_1, \dots, u_\ell$, therefore the vector $u_{\ell+1}$ must be linearly independent of the others. □
$u_1, u_2, u_3$ (in this order), the matrix of this projection is thus
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
Using the transition matrix
$$T = \begin{pmatrix} -1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad T^{-1} = \begin{pmatrix} -\frac13 & \frac23 & -\frac13 \\ -\frac13 & -\frac13 & \frac23 \\ \frac13 & \frac13 & \frac13 \end{pmatrix},$$
for changing the basis from the basis $(u_1, u_2, u_3)$ to the standard basis, and back from the standard basis to the basis $(u_1, u_2, u_3)$, we obtain
$$A = \begin{pmatrix} -1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} -\frac13 & \frac23 & -\frac13 \\ -\frac13 & -\frac13 & \frac23 \\ \frac13 & \frac13 & \frac13 \end{pmatrix} = \begin{pmatrix} \frac23 & -\frac13 & -\frac13 \\ -\frac13 & \frac23 & -\frac13 \\ -\frac13 & -\frac13 & \frac23 \end{pmatrix}.$$
The theorem just proved can be seen as a decomposition of a linear mapping $f$ into a sum of simple mappings: for $n$ distinct eigenvalues $\lambda_i$ of the characteristic polynomial we obtain one-dimensional eigenspaces $V_{\lambda_i}$. Each of them defines a projection onto this invariant one-dimensional subspace, on which the mapping acts simply as multiplication by the eigenvalue $\lambda_i$. The whole space $V$ then decomposes into the direct sum of the individual eigenspaces. Furthermore, this decomposition can be easily calculated:
Basis of eigenvectors
□
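The projection matrix from 2.73 can be verified numerically (a numpy sketch of ours):

```python
import numpy as np

A = np.array([[ 2., -1., -1.],
              [-1.,  2., -1.],
              [-1., -1.,  2.]]) / 3
u1 = np.array([-1., 1., 0.])
u2 = np.array([-1., 0., 1.])
u3 = np.ones(3)
print(np.allclose(A @ u1, u1), np.allclose(A @ u2, u2), np.allclose(A @ u3, 0))
print(np.allclose(A @ A, A))   # A is a projection: A^2 = A
```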
2.74. In the vector space $\mathbb{R}^3$ determine the matrix of the orthogonal projection onto the plane $x + y - 2z = 0$. ○
2.75. In the vector space $\mathbb{R}^3$ determine the matrix of the orthogonal projection onto the plane $2x - y + 2z = 0$. ○
I. Bases and inner products
Using the inner product we can solve in a different (better?) way problems which we were already able to solve using changes of coordinates.
2.76. Write down the matrix of the mapping of the orthogonal projection onto the plane passing through the origin and perpendicular to the vector $(1, 1, 1)$.
Solution. The image of an arbitrary point (vector) $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ under the considered mapping can be obtained by subtracting from the given vector its orthogonal projection onto the direction normal to the considered plane, that is, onto the direction $(1, 1, 1)$. This projection $p$ is given (see (2.3)) as
$$p = \frac{\langle x, (1, 1, 1)\rangle}{\|(1, 1, 1)\|^2}\,(1, 1, 1) = \Big(\frac{x_1 + x_2 + x_3}{3},\ \frac{x_1 + x_2 + x_3}{3},\ \frac{x_1 + x_2 + x_3}{3}\Big).$$
The resulting mapping is thus
$$x \mapsto x - p = \Big(\frac{2x_1}{3} - \frac{x_2 + x_3}{3},\ \frac{2x_2}{3} - \frac{x_1 + x_3}{3},\ \frac{2x_3}{3} - \frac{x_1 + x_2}{3}\Big),$$
Corollary. If there exist $n$ mutually distinct roots $\lambda_i$ of the characteristic polynomial of the mapping $f : V \to V$ on the $n$-dimensional space $V$, then $V$ decomposes into a direct sum of eigenspaces of dimension 1. This means that there exists a basis of $V$ composed only of eigenvectors, and in this basis $f$ has a diagonal matrix. This basis is uniquely determined up to the order of its elements and scalar multiples of its vectors.
The corresponding basis (expressed in coordinates with respect to an arbitrarily chosen basis of $V$) is obtained by solving the $n$ systems of homogeneous linear equations in $n$ variables with the matrices $(A - \lambda_i \cdot E)$, where $A$ is the matrix of $f$ in the chosen basis.
with the matrix
$$\begin{pmatrix} \frac23 & -\frac13 & -\frac13 \\ -\frac13 & \frac23 & -\frac13 \\ -\frac13 & -\frac13 & \frac23 \end{pmatrix}.$$
We have (correctly) obtained the same matrix as in the exercise 2.73.
□
2.48. Invariant subspaces. We have seen that every eigenvector $v$ of a mapping $f : V \to V$ generates a subspace $\langle v\rangle \subseteq V$ which is preserved by the mapping $f$.
More generally, we say that a vector subspace $W \subseteq V$ is an invariant subspace for a linear mapping $f$ if $f(W) \subseteq W$.
Consider a linear mapping $\varphi : V \to W$ between vector spaces. As we can surely imagine, the vector $v \in V$ can represent the state of some system we are observing, while $\varphi(v)$ gives the result after some process has been realised.
If we want to reach a given result $b \in W$ of such a process, we solve the problem
$$\varphi(x) = b$$
for some unknown vector $x$ and a known vector $b$.
In fixed coordinates we then have the matrix $A$ of the mapping and we solve the corresponding system of linear equations.
$$\dots, \qquad x_2 \le 90, \qquad x_1 \le 110.$$
The objective function (the function that gives the profit for given numbers of manufactured nuts and bolts) is $40x_1 + 60x_2$. The previous system of inequalities determines a certain area in $\mathbb{R}^2$, and optimising the profit means finding in this area the point (or points) in which the objective function attains its maximum value, that is, finding the largest $k$ such that the line $40x_1 + 60x_2 = k$ has a non-empty intersection with the given area. Graphically, we can find the solution for example by placing into the plane the line $p$ satisfying the equation $40x_1 + 60x_2 = 0$ and moving it "upwards" as long as it still intersects the area. It is clear that the last intersection is either a point or a boundary segment of the area (which must then be parallel to $p$). Thus we obtain (see the figure) the point $x_1 = 110$ and $x_2 = 5$. The maximum possible income is thus $40 \cdot 110 + 60 \cdot 5 = 4700$ Kč. □
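Such small linear programmes are routinely solved numerically. Since part of the restriction system above was lost, the following scipy sketch (ours) uses purely hypothetical restrictions and only illustrates the mechanics:

```python
from scipy.optimize import linprog

# maximise 40 x1 + 60 x2 subject to hypothetical restrictions
# x1 + x2 <= 115, x1 <= 110, x1, x2 >= 0 (illustrative data only)
res = linprog(c=[-40, -60],                 # linprog minimises, so negate
              A_ub=[[1, 1], [1, 0]],
              b_ub=[115, 110],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal point and maximal profit
```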
3.2. Minimisation of the costs of feeding. A stable in Nišovice u Volyně buys fodder for the winter: hay and oats. The nutritional values of the fodder and the required daily portions for one foal are given in the table:

g/kg                             Hay     Oats    Requirements
Dry basis                        841     860     at least 6300 g
Digestible nitrogen substances   53      123     at most 1150 g
Starch                           0.348   0.868   at most 5.35 g
Calcium                          6       1.6     at least 30 g
Phosphate                        2.8     3.5     at most 44 g
Sodium                           0.2     1.4     approximately 7 g
Cost (Kč per kg)                 1.80    1.60

Every foal must get at least 2 kg of oats in its daily meal. The average cost (including transportation) is 1.80 Kč per 1 kg of hay and 1.60 Kč per 1 kg of oats. Compose a daily diet for one foal with minimal costs.
3.3. Optimal distribution of material. The inner wooden panelling of a cottage requires
• at most 120 planks of length 35 cm,
• from 180 to 330 planks of length 120 cm,
• at least 30 planks of length 95 cm.
When solving the system of equations we are thus left with exactly $n - k$ free parameters, and by setting one of them to one and the others to zero we obtain exactly $n - k$ linearly independent solutions. All solutions are then given by all the linear combinations of these $n - k$ solutions. Every such $(n-k)$-tuple of solutions is called a fundamental system of solutions of the given homogeneous system of equations. We have proved:
Theorem. The set of all solutions of the homogeneous system of equations
$$A \cdot x = 0$$
for $n$ variables with the matrix $A$ of rank $k$ is a vector subspace of $\mathbb{K}^n$ of dimension $n - k$. Every basis of this subspace forms a fundamental system of solutions of the given homogeneous system.
3.2. Non-homogeneous systems of equations. Consider now the general system of equations
$$A \cdot x = b.$$
Let us realise once again that the columns of the matrix $A$ are actually the images of the vectors of the standard basis of $\mathbb{K}^n$ under the corresponding linear mapping.
In the standard problems of linear programming we optimise the value of the objective function under restrictions of the form described below, and the variables are non-negative.
It is easy to see that every general linear programming problem can be transformed into a standard one of either type. Aside from sign changes, we can decompose the variables that have no sign restriction into a difference of two non-negative ones. Without loss of generality we shall further work only with the standard maximisation problem.
How do we solve such a problem? We seek the maximum of a linear form $h$ over subsets $M$ of a vector space given by linear inequalities, that is, in the plane by an intersection of half-planes (in general we shall speak about half-spaces in the next chapter). Note that every linear form $h : V \to \mathbb{R}$ on a real vector space (that is, an arbitrary linear scalar function) is monotone in every chosen direction; along the direction it either only grows, or only decreases, or is constant. More precisely, if we choose a fixed starting vector $u \in V$ and a "directional" vector $v \in V$, then the composition of our form $h$ with this parametrisation yields
$$t \mapsto h(u + t\,v) = h(u) + t\,h(v).$$
This expression is, with growing parameter $t$, indeed either increasing, or decreasing, or constant (depending on whether $h(v)$ is positive, negative, or zero).
Thus we must expect that problems similar to the one with the painter are either unsatisfiable (if the given set of restrictions is empty), or the profit is unbounded (if the restrictions determine an unbounded part of the space and the form $h$ is non-zero in some of the unbounded directions), or they attain a maximal solution
in at least one of the "vertices" of the set $M$ (while usually that is the case for a single vertex only, it can also happen that the maximum is attained on a whole part of the boundary of the area $M$).
3.5. Formulation using linear equations. Finding an optimum is not always as simple as in the previous case. The problem can contain many variables and restrictions, and even deciding whether the set $M$ of satisfiable points is non-empty can be a problem. We do not have space here for the complete theory, but we mention at least two directions of ideas which show that the solution can actually always be found in a way similar to the previous paragraph.
Let us begin with a comparison with systems of linear equations, because we understand those well. Let us write the inequalities (3.1)-(3.3) in the general form:
$$A \cdot x \le b,$$
where $x$ is now an $n$-dimensional vector, $b$ an $m$-dimensional vector, and $A$ the corresponding matrix. By an inequality between vectors we mean the individual inequalities between all their coordinates. We want to maximise the product $c \cdot x$ for a given row vector $c$ of the coefficients of the linear form $h$. If we add a new auxiliary variable for every inequality, and one more variable $z$ for the value of the linear form $h$, we can rewrite the whole system as a system of linear equations
$$\begin{pmatrix} 1 & -c & 0 \\ 0 & A & E \end{pmatrix} \cdot \begin{pmatrix} z \\ x \\ x_s \end{pmatrix} = \begin{pmatrix} 0 \\ b \end{pmatrix},$$
where the matrix is composed of blocks with $1 + n + m$ columns and $1 + m$ rows, and the vectors have the corresponding components. Additionally we require that all the coordinates of $x$ and $x_s$ be non-negative.
If the given system of equations has a solution, then within this set of solutions we seek values of the variables $z$, $x$ and $x_s$ such that all coordinates of $x$ and $x_s$ are non-negative and $z$ is maximised. In paragraph 4.11 on page 215 we will discuss this situation from the viewpoint of affine geometry.
Specifically, in our problem of the black and white painter the system of linear equations looks as follows:
$$\begin{pmatrix} 1 & -c_1 & -c_2 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & w_1 & w_2 & 0 & 1 & 0 \\ 0 & b_1 & b_2 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} z \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 0 \\ L \\ W \\ B \end{pmatrix}.$$
3.6. Duality of linear programming. Consider a real matrix $A$ with $m$ rows and $n$ columns, a vector of restrictions $b$, and a row vector $c$ giving the objective function. From these data we can compose two problems of linear programming, for $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$.
Maximisation problem: maximise $c \cdot x$ under the conditions $A \cdot x \le b$ and $x \ge 0$.
Minimisation problem: minimise $y^T \cdot b$ under the conditions $y^T \cdot A \ge c$ and $y \ge 0$.
We say that these two problems are mutually dual. For deriving further properties of linear programming we first introduce some terminology.
We say that a problem is solvable if there is an admissible vector $x$ which meets all the restrictions. A solvable maximisation (minimisation) problem is bounded if its objective function is bounded from above (below) over the set of admissible vectors.
Lemma. If $x \in \mathbb{R}^n$ is an admissible vector for the standard maximisation problem and $y \in \mathbb{R}^m$ is an admissible vector for the dual minimisation problem, then for the objective functions we have
$$c \cdot x \le y^T \cdot b.$$
Proof. It is actually a simple observation: $x \ge 0$ and $c \le y^T \cdot A$, but also $y \ge 0$ and $A \cdot x \le b$, thus it must also hold that
$$c \cdot x \le y^T \cdot A \cdot x \le y^T \cdot b,$$
which is what we wanted to prove. □
From here we immediately see that if both dual problems are solvable, then they must be bounded. Even more interesting is the following corollary, which is directly implied by the inequality in the previous proof.
Corollary. If there exist admissible vectors $x$ and $y$ of the dual linear problems such that the objective functions satisfy $c \cdot x = y^T \cdot b$, then both are optimal solutions of the corresponding problems.
3.7. Theorem (On duality). If a standard problem of linear programming is solvable and bounded, then its dual is also solvable and bounded, there exists an optimal solution for each of the problems, and the optimal values of the corresponding objective functions are equal.
Proof. One direction was already proved in the previous corollary. It remains to prove the existence of an optimal solution. That can be done by constructing an algorithm, which we will not do in great detail now. We will return to the missing part of the proof in the part about affine geometry on page 215. □
Let us note yet another corollary of the just formulated duality theorem:
Corollary (Equilibrium theorem). Consider admissible vectors $x$ and $y$ for the standard maximisation problem and its dual problem from the definition in 3.6. Then both these vectors are optimal if and only if $y_i = 0$ for all coordinates with index $i$ for which $\sum_{j=1}^{n} a_{ij} x_j < b_i$, and simultaneously $x_j = 0$ for all coordinates with index $j$ for which $\sum_{i=1}^{m} y_i a_{ij} > c_j$.
Proof. Suppose that both relations regarding the zeroes
among the $x_j$ and $y_i$ hold. Then in the following computation we can calculate with equalities, because the summands with strict inequality have zero coefficients anyway:
$$\sum_{i=1}^{m} y_i b_i = \sum_{i=1}^{m} y_i \Big(\sum_{j=1}^{n} a_{ij} x_j\Big) = \sum_{i=1}^{m}\sum_{j=1}^{n} y_i a_{ij} x_j,$$
and for the same reason also
$$\sum_{j=1}^{n} c_j x_j = \sum_{j=1}^{n}\Big(\sum_{i=1}^{m} y_i a_{ij}\Big) x_j = \sum_{i=1}^{m}\sum_{j=1}^{n} y_i a_{ij} x_j.$$
This shows one implication, thanks to the duality theorem.
Suppose now that both $x$ and $y$ are optimal vectors. We thus know that
$$\sum_{i=1}^{m} y_i b_i \ \ge\ \sum_{i=1}^{m}\sum_{j=1}^{n} y_i a_{ij} x_j \ \ge\ \sum_{j=1}^{n} c_j x_j,$$
but simultaneously the left-hand and right-hand sides are equal, so equality holds everywhere. If we rewrite the first equality as
$$\sum_{i=1}^{m} y_i \Big(b_i - \sum_{j=1}^{n} a_{ij} x_j\Big) = 0,$$
we see that it can be satisfied only if the relation from the statement holds, because it is a sum of non-negative numbers equal to zero. From the second equality we similarly derive the second part, and the proof is finished. □
The duality theorem and the equilibrium theorem are useful when solving linear programming problems, because they show the relation between the zeroes among the additional variables and the satisfaction of the restrictions.
3.8. Notes about linear models in economy. Our very schematic problem of the black and white painter from paragraph 3.4 can serve to illustrate one of the typical economic models, the so-called model of production planning. The model tries to capture the problem completely, that is, to capture both external and internal relations. The left-hand sides of the inequalities (3.1), (3.2), (3.3) and of the objective function $h(x_1, x_2)$ express various production relations. Depending on the character of the problem, we have on the right-hand sides either exact values (and then we solve equations) or capacity restrictions and goal optimisation (and then we obtain linear programming problems).
We can thus in general solve the problem of resource allocation with supplier restrictions, and either minimise costs or maximise income. We can also interpret duality from this point of view. If our painter wanted to set prices $y_L$ for his work, $y_W$ for the white colour and $y_B$ for the black colour, he would minimise the objective function
$$L \cdot y_L + W \cdot y_W + B \cdot y_B$$
with the restrictions
$$y_L + w_1 y_W + b_1 y_B \ge c_1, \qquad y_L + w_2 y_W + b_2 y_B \ge c_2.$$
But that is exactly the dual problem to the original one, and the theorem 3.7 says that the optimal state is the one where the objective functions have the same value.
Among economic models we can find many modifications. One of them are the problems of financial planning, connected to the optimisation of a portfolio: we set the volumes of investments into individual investment opportunities with the goal of meeting given restrictions on risk factors while maximising the profit, or dually of minimising the risk at a given volume.
Another common model is the marketing application, for instance the allocation of costs for advertisement in various media or the placing of advertisements into time slots. The restrictions are in this case determined by the budget, the target population, etc.
Very common are models of nutrition, that is, setting up how much of different kinds of food should be eaten in order to meet the required total amounts of specific components, e.g. minerals and vitamins.
Problems of linear programming also arise in personnel tasks, where workers with specific qualifications and other properties are distributed into working shifts. Common are also problems of merging, problems of splitting, and problems of goods distribution.
2. Difference equations
We have already met difference equations in the first chapter, albeit briefly and only of first order. Now we present a more general theory for linear equations with constant coefficients, which provides not only very practical tools but also a nice illustration of the concepts of vector spaces and linear mappings.
Homogeneous linear difference equation of order k
3.9. Definition. A homogeneous linear difference equation of order $k$ is given by the expression
$$a_0 x_n + a_1 x_{n-1} + \cdots + a_k x_{n-k} = 0, \qquad a_0 \neq 0,\ a_k \neq 0,$$
where the coefficients $a_i$ are scalars which may possibly also depend on $n$.
We also say that such an equality defines a homogeneous linear recurrence of order $k$, and we usually write the sequence in question as a function
$$x_n = f(n) = -\frac{a_1}{a_0} f(n-1) - \cdots - \frac{a_k}{a_0} f(n-k).$$
A solution of this equation is a sequence of scalars $x_i$, for all $i \in \mathbb{N}$ (or $i \in \mathbb{Z}$), which satisfies the equation for any fixed $n$.
By specifying any $k$ consecutive values $x_i$, all the other values are determined uniquely. Indeed, we work over a field of scalars, so the values $a_0$ and $a_k$ are invertible, and thus using the defining relation any $x_n$ can be computed uniquely, and similarly any $x_{n-k}$. Induction thus immediately proves that all the remaining values are determined uniquely.
The space of all infinite sequences $x_i$ forms a vector space, where addition and multiplication by scalars work coordinate-wise. Directly from the definition it is immediate that a sum of two solutions of a homogeneous linear equation, or a multiple of a solution, is again a solution. Analogously as with homogeneous systems of linear equations we see that the set of all solutions forms a subspace.
An initial condition on the values of a solution is given as a $k$-dimensional vector in $\mathbb{K}^k$. A sum of initial conditions determines the sum of the corresponding solutions, and similarly for scalar multiples. Note also that plugging zeroes and ones into the initial $k$ values immediately yields $k$ linearly independent solutions of the equation. Thus, although the vectors are infinite sequences, the set of all solutions has finite dimension; we know that its dimension equals the order $k$ of the equation, and we can easily obtain a basis of the space of solutions. Again we speak of a fundamental system of solutions, and all other solutions are its linear combinations.
As we have already checked, if we choose $k$ consecutive indices $i, i+1, \dots, i+k-1$, the homogeneous linear difference equation gives a linear mapping $\mathbb{K}^k \to \mathbb{K}^\infty$ assigning to the $k$-dimensional vector of initial values the infinite sequence of the same scalars. The independence of such solutions is equivalent to the
independence of the initial values, which can be easily checked using a determinant. If we have a $k$-tuple of solutions $(x_n^{[1]}, \dots, x_n^{[k]})$, it is independent if and only if the following determinant, the so-called Casoratian, is non-zero for one $n$ (which then implies it is non-zero for all $n$):
$$C\big(x_n^{[1]}, \dots, x_n^{[k]}\big) = \begin{vmatrix} x_n^{[1]} & \cdots & x_n^{[k]} \\ x_{n+1}^{[1]} & \cdots & x_{n+1}^{[k]} \\ \vdots & & \vdots \\ x_{n+k-1}^{[1]} & \cdots & x_{n+k-1}^{[k]} \end{vmatrix}.$$
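For experiments, the Casoratian is easy to evaluate (a numpy sketch of ours; the helper name is illustrative):

```python
import numpy as np

def casoratian(solutions, n):
    """solutions: list of k indexable sequences; returns the determinant of
    the k x k matrix with entry (i, j) = solutions[j][n + i]."""
    k = len(solutions)
    M = np.array([[solutions[j][n + i] for j in range(k)] for i in range(k)])
    return np.linalg.det(M)

# two solutions of x_{n+2} = x_{n+1} + 2 x_n: 2^n and (-1)^n
a = [2**n for n in range(10)]
b = [(-1)**n for n in range(10)]
print(casoratian([a, b], 0))   # -3.0, non-zero: the solutions are independent
```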
3.10. Solution of homogeneous recurrences with constant coefficients. It is hard to find a universal mechanism for finding the solution of a general homogeneous linear difference equation, that is, a directly computable expression for the general solution $x_n$. In practical models one very often meets equations where the coefficients are constant. In this case it is possible to guess a suitable form of the solution, and indeed we will be able to find $k$ linearly independent solutions. This solves the problem completely, as all other solutions are linear combinations of these.
For simplicity let us start with equations of second order. These are very often encountered in practical problems where relations are based on the two previous values. A linear difference equation (recurrence) of second order with constant coefficients is thus for us of the form
$$(3.4)\qquad f(n+2) = a \cdot f(n+1) + b \cdot f(n) + c,$$
where $a, b, c$ are known scalar coefficients.
For instance, in population models we can assume that the individuals in a population mature and start breeding two seasons later (that is, they add a multiple $b \cdot f(n)$ with $b > 1$ to the value $f(n+2)$), while the immature individuals tire and destroy a part of the mature population (that is, the coefficient $a$ is negative). Furthermore, it might be that somebody destroys (uses, eats) a fixed amount $c$ every season.
A special case with $c = 0$ is for instance the Fibonacci sequence of numbers $y_0, y_1, \dots$, where $y_{n+2} = y_{n+1} + y_n$.
If, when solving a mathematical problem, we do not have any new idea, we can always try where a known solution of a similar problem leads. Let us try to plug into the equation (3.4) with the coefficient $c = 0$ a solution similar to that of the linear equations of first order, that is, $f(n) = \lambda^n$ for some scalar $\lambda$. By plugging in we obtain
$$\lambda^{n+2} - a\lambda^{n+1} - b\lambda^n = \lambda^n\big(\lambda^2 - a\lambda - b\big) = 0.$$
This relation will hold either for $\lambda = 0$ or for the choice of the values
$$\lambda_1 = \tfrac12\big(a + \sqrt{a^2 + 4b}\big), \qquad \lambda_2 = \tfrac12\big(a - \sqrt{a^2 + 4b}\big).$$
We have thus determined when such solutions indeed work; we just have to choose the scalar $\lambda$ suitably. But this is not enough for us, since we need to find a solution for any two initial values $f(0)$ and $f(1)$, and so far we have found only two specific sequences satisfying the given equation (or possibly only one sequence, if $\lambda_2 = \lambda_1$).
As we have already derived for even very general linear recurrences, a sum of two solutions $f_1(n)$ and $f_2(n)$ of our equation $f(n+2) - a \cdot f(n+1) - b \cdot f(n) = 0$ is clearly again a solution
B. Recurrent equations
Various linear recurrences can be a good tool for describing models of growth. Let us begin with a very popular population model that uses a linear difference equation of second order:
3.4. Fibonacci sequence. In the beginning of spring, a stork brought two newborn rabbits, a male and a female, onto a meadow. The female is, after being two months old, able to deliver two newborns, a male and a female. The newborns can then start delivering after one month, and then every month. Every female is pregnant for one month and then delivers. How many pairs of rabbits will there be after nine months (if none of them dies and none "moves in")?
Solution. After one month, there is still one pair, but the female is already pregnant. After two months, the first newborns are delivered, so there are two pairs. Every following month there are as many new pairs as there were pregnant females one month before, which equals the number of at-least-one-month-old pairs, which in turn equals the number of pairs that were there two months ago. The total number of pairs $p_n$ after $n$ months is thus the sum of the numbers of pairs in the two preceding months. For the number of pairs we thus have the following homogeneous linear recurrence
$$(3.1)\qquad p_{n+2} = p_{n+1} + p_n, \qquad n = 1, 2, \dots,$$
which, along with the initial conditions $p_1 = 1$ and $p_2 = 1$, uniquely determines the number of pairs of rabbits on the meadow in the individual months. Linearity of the formula means that all members of the sequence $(p_n)$ appear in the first power; the meaning of the word recurrence is hopefully clear, and homogeneity means that the absolute term is missing in the formula (see below for a non-homogeneous formula). For the value of the $n$-th member we can derive an explicit formula. In searching for it we can use the observation that, for a certain $r$, the function $r^n$ is a solution of the difference equation without initial conditions. This $r$ can be obtained by plugging into the recurrence:
$$r^{n+2} = r^{n+1} + r^n,$$
and after dividing by $r^n$ we obtain
$$r^2 = r + 1,$$
which is the so-called characteristic equation of the given recurrence. Our equation thus has the roots $\frac{1-\sqrt5}{2}$ and $\frac{1+\sqrt5}{2}$, and the sequences
of the same equation, and the same holds for scalar multiples of a solution. Our two specific solutions thus yield even more general solutions
$$f(n) = C_1 \lambda_1^n + C_2 \lambda_2^n$$
for arbitrary scalars $C_1$ and $C_2$, and for the unique solution of a specific problem with given initial values $f(0)$ and $f(1)$ it remains just to find the corresponding scalars $C_1$ and $C_2$. (And we also need to check whether this is possible for any two initial values.)
3.11. Choice of scalars. Let us show how this can work on at least one example. Let us focus on the problem that the roots of the characteristic polynomial in general do not lie in the same field of scalars as the coefficients of the equation. Thus we solve the problem:
$$(3.5)\qquad y_{n+2} = y_{n+1} + \tfrac12\, y_n, \qquad y_0 = 2,\ y_1 = 0.$$
$a_n = \big(\frac{1+\sqrt5}{2}\big)^n$ and $b_n = \big(\frac{1-\sqrt5}{2}\big)^n$, $n \ge 1$, satisfy the given relation. The relation is also satisfied by any linear combination, that is, any
In our case we thus have $\lambda_{1,2} = \frac12\big(1 \pm \sqrt3\big)$, and clearly
$$y_0 = C_1 + C_2 = 2, \qquad y_1 = \tfrac12 C_1\big(1 + \sqrt3\big) + \tfrac12 C_2\big(1 - \sqrt3\big) = 0$$
is satisfied for exactly one choice of these constants. A direct calculation yields $C_1 = 1 - \frac{\sqrt3}{3}$, $C_2 = 1 + \frac{\sqrt3}{3}$, and our problem has the unique solution
$$f(n) = \Big(1 - \frac{\sqrt3}{3}\Big)\frac{(1 + \sqrt3)^n}{2^n} + \Big(1 + \frac{\sqrt3}{3}\Big)\frac{(1 - \sqrt3)^n}{2^n}.$$
Note that even though the solutions found for an equation with rational coefficients look complicated and are expressed with irrational (or possibly complex) numbers, we know a priori that the solution itself is again rational. Without this "step aside" into a bigger field of scalars, however, we would not be able to describe the general solution.
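One may convince oneself numerically (a sketch of ours) that the closed form reproduces the recurrence:

```python
from math import sqrt

def closed(n):
    s3 = sqrt(3)
    return (1 - s3/3) * ((1 + s3)/2)**n + (1 + s3/3) * ((1 - s3)/2)**n

y = [2.0, 0.0]
for n in range(2, 8):
    y.append(y[-1] + 0.5 * y[-2])   # y_{n+2} = y_{n+1} + y_n / 2
print([round(v, 10) for v in y])
print([round(closed(n), 10) for n in range(8)])  # the same numbers
```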
We will meet similar situations very often. The general solution also allows us to discuss, without directly computing the constants, the qualitative behaviour of the sequence of numbers $f(n)$, that is, whether the values approach some fixed value with growing $n$, or oscillate in some interval, or are unbounded.
3.12. General case of homogeneous recurrences. Let us now try, similarly to the case of second order, to plug the choice $x_n = \lambda^n$ for some (yet unknown) scalar $\lambda$ into the general homogeneous equation from the definition 3.9. For every $n$ we obtain the condition
$$\lambda^{n-k}\big(a_0\lambda^k + a_1\lambda^{k-1} + \cdots + a_k\big) = 0,$$
which means that either $\lambda = 0$ or $\lambda$ is a root of the so-called characteristic polynomial in the parentheses. The characteristic polynomial is independent of $n$.
Assume that the characteristic polynomial has $k$ distinct roots $\lambda_1, \dots, \lambda_k$. For this purpose we may extend the field of scalars we are working in, for instance $\mathbb{Q}$ into $\mathbb{R}$ or $\mathbb{R}$ into $\mathbb{C}$, because the results of the calculations will again be solutions staying in the original field, thanks to the equation itself. Each of the roots gives us a single possible solution
$$x_n = (\lambda_i)^n.$$
In order to be happy, we require $k$ linearly independent solutions.
sequence $c_n = s\,a_n + t\,b_n$, $s, t \in \mathbb{R}$. The numbers $s$ and $t$ can be chosen so that the resulting combination satisfies the initial conditions, in our case $c_1 = 1$, $c_2 = 1$. For simplicity it is convenient to define the zeroth member of the sequence as $c_0 = 0$ and compute $s$ and $t$ from the equations for $c_0$ and $c_1$. We find out that $s = \frac{1}{\sqrt5}$, $t = -\frac{1}{\sqrt5}$, and thus
$$(3.2)\qquad p_n = \frac{(1+\sqrt5)^n - (1-\sqrt5)^n}{2^n \sqrt5}.$$
Such a sequence satisfies the given recurrence and also the initial conditions $c_0 = 0$, $c_1 = 1$, thus it is the unique sequence given by these requirements. Note that the value of the formula (3.2) is an integer for any natural $n$ (it gives the integer Fibonacci sequence), although it might not seem so at first glance. □
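A quick check of the formula (3.2) against the recurrence (a sketch of ours):

```python
from math import sqrt

def binet(n):
    s5 = sqrt(5)
    return ((1 + s5)**n - (1 - s5)**n) / (2**n * s5)

fib = [0, 1]
for _ in range(10):
    fib.append(fib[-1] + fib[-2])
print(fib)
print([round(binet(n)) for n in range(12)])   # matches the list above
```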
3.5. A simplified model of the behaviour of the gross domestic product.
Consider the difference equation
$$(3.3)\qquad y_{k+2} - a(1+b)\,y_{k+1} + ab\,y_k = 1,$$
where $y_k$ is the gross domestic product in the year $k$. The constant $a$ is the so-called consumption tendency, a macroeconomic factor giving the fraction of money that people spend (from what they have at their disposal), and the constant $b$ describes the dependence of the measure of investment of the private sector on the consumption tendency.
We further assume that the size of the domestic product is normalised such that on the right-hand side of the equation the result is 1.
Compute the values $y_n$ for $a = \frac34$, $b = \frac13$, $y_0 = 1$, $y_1 = 1$.
Solution. Let us first look for the solution of the homogeneous equation (with zero right-hand side) in the form $r^k$. The number $r$ must be a solution of the characteristic equation
$$x^2 - a(1+b)x + ab = 0, \qquad\text{that is,}\qquad x^2 - x + \tfrac14 = 0,$$
which has the double root $\frac12$. All the solutions of the homogeneous equation are then of the form $a\big(\frac12\big)^n + b\,n\big(\frac12\big)^n$.
Let us also note that if we find some solution of the non-homogeneous equation (a so-called particular solution), then adding to it any solution of the homogeneous equation yields another solution of the non-homogeneous equation. It can be shown that in this way we obtain all the solutions of the non-homogeneous equation.
In our case (that is, when all the coefficients and the non-homogeneous term are constant) a particular solution is the constant
In order to do this, it suffices to verify the independence by plugging the $k$ values $n = 0, \dots, k-1$ for the $k$ choices of $\lambda_i$ into the Casoratian (see 3.9). We thus obtain the so-called Vandermonde matrix, and it is a nice (but not entirely trivial) exercise to compute that for every $k$ and any $k$-tuple of distinct $\lambda_i$ the determinant of such a matrix is non-zero, see (2.24) on page 87. But that means that the chosen solutions are linearly independent.
We have thus found the fundamental system of solutions of the homogeneous difference equation in the case that all the roots of its characteristic polynomial are distinct.
Consider now a multiple root $\lambda$ and plug into the defining relation the assumed solution $x_n = n\lambda^n$. We obtain the condition
$$a_0 n\lambda^n + a_1(n-1)\lambda^{n-1} + \cdots + a_k(n-k)\lambda^{n-k} = 0.$$
This condition can be rewritten using the so-called derivative of a polynomial (see ?? on page ??), which we denote by an apostrophe:
$$\lambda\big(a_0\lambda^n + \cdots + a_k\lambda^{n-k}\big)' = 0,$$
and right at the beginning of the fifth chapter we shall see that a root of a polynomial $f$ has multiplicity greater than one if and only if it is a root of $f'$. Our condition is thus satisfied.
With a greater multiplicity $\ell$ of a root of the characteristic polynomial we can proceed similarly and use the fact that a root with multiplicity $\ell$ is a root of all derivatives of the polynomial up to order $\ell - 1$ (inclusively). The derivatives look as follows:
$$f(\lambda) = a_0\lambda^n + \cdots + a_k\lambda^{n-k},$$
$$f'(\lambda) = a_0 n\lambda^{n-1} + \cdots + a_k(n-k)\lambda^{n-k-1},$$
$$f''(\lambda) = a_0 n(n-1)\lambda^{n-2} + \cdots + a_k(n-k)(n-k-1)\lambda^{n-k-2},$$
$$\vdots$$
$$f^{(\ell+1)}(\lambda) = a_0 n(n-1)\cdots(n-\ell)\,\lambda^{n-\ell-1} + \cdots + a_k(n-k)\cdots(n-k-\ell)\,\lambda^{n-k-\ell-1}.$$
Let us look at the case of a triple root $\lambda$ and try to find a solution in the form $n^2\lambda^n$. Plugging into the defining relation we obtain the equation
$$a_0 n^2\lambda^n + \cdots + a_k(n-k)^2\lambda^{n-k} = 0.$$
Clearly the left-hand side equals the expression $\lambda^2 f''(\lambda) + \lambda f'(\lambda)$, and because $\lambda$ is a root of both derivatives, the condition is satisfied.
Using induction we can easily prove that even the general condition for a solution of the form $x_n = n^\ell\lambda^n$,
$$a_0 n^\ell\lambda^n + \cdots + a_k(n-k)^\ell\lambda^{n-k} = 0,$$
is satisfied, since the left-hand side can be written as a linear combination of the expressions $\lambda^j f^{(j)}(\lambda)$ with $j \le \ell$, the leading one being $\lambda^\ell f^{(\ell)}(\lambda)$. We have thus come close to the complete proof of the following:
Theorem. Every homogeneous linear difference equation of order $k$ over any field of scalars $\mathbb{K}$ contained in the complex numbers $\mathbb{C}$ has as the set of all its solutions a $k$-dimensional vector space generated by the sequences $x_n = n^\ell\lambda^n$, where $\lambda$ are the (complex) roots of the characteristic polynomial and the powers $\ell$ run over all
$y_n = c$. By plugging into the equation we obtain $c - c + \frac14 c = 1$, that is, $c = 4$. All solutions of the difference equation
$$y_{k+2} - y_{k+1} + \tfrac14\, y_k = 1$$
are thus of the form $4 + a\big(\frac12\big)^n + b\,n\big(\frac12\big)^n$. We require $y_0 = y_1 = 1$, and these two equations give $a = b = -3$; thus the solution of this non-homogeneous equation is
$$y_n = 4 - \frac{3}{2^n} - \frac{3n}{2^n}.$$
Again, since we know that the sequence given by this formula satisfies the given difference equation as well as the given initial conditions, it is indeed the only sequence characterised by these properties. □ In the previous case we used the so-called method of indeterminate coefficients. It is based on the following: on the basis of the non-homogeneous term of the given difference equation we "guess" the form of a particular solution. The forms of particular solutions are known for many types of non-homogeneous terms. For instance, the equation
$$(3.4)\qquad y_{n+k} + a_1 y_{n+k-1} + \cdots + a_k y_n = P_m(n),$$
where $P_m(n)$ is a polynomial of degree $m$ and the corresponding characteristic equation has real roots, has (almost always) a particular solution of the form $Q_m(n)$, where $Q_m(n)$ is a polynomial of degree $m$.
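Returning to the solved example above, a quick numerical check of its closed form against the recurrence (a sketch of ours):

```python
def closed(n):
    return 4 - 3 / 2**n - 3 * n / 2**n

y = [1.0, 1.0]
for k in range(10):
    y.append(1 + y[-1] - 0.25 * y[-2])   # y_{k+2} = 1 + y_{k+1} - y_k / 4
print(all(abs(y[n] - closed(n)) < 1e-12 for n in range(len(y))))  # True
```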
Another possible way to solve such equations is the so-called variation of constants method: we first find the solution
$$y(n) = \sum_{i=1}^{k} c_i f_i(n)$$
of the homogenised equation, then consider the constants $c_i$ as functions $c_i(n)$ of the variable $n$, and look for a particular solution of the given equation in the form
$$y(n) = \sum_{i=1}^{k} c_i(n) f_i(n).$$
Let us show in a picture the values of $f(n)$ for $n \le 35$ for the equation
$$f(n) = 9 f(n-1) - f(n-2) + 1, \qquad f(0) = f(1) = 1.$$
natural numbers between zero and the multiplicity of the corresponding root $\lambda$.
Proof. The aforementioned relations between the multiplicity of a root and the derivatives of the polynomial will be proven later, and we won't prove the fact that every complex polynomial has exactly as many roots (counting multiplicities) as its degree. Thus it just remains to prove that the k-tuple of solutions found is linearly independent. Even in this case we can inductively prove that the corresponding Casoratian is non-zero, as we already did in the case of the Vandermonde determinant before.
For illustration of our approach, let us show how the calculation looks in the case of a root λ₁ with multiplicity one and a root λ₂ with multiplicity two:
$$C(\lambda_1^n, \lambda_2^n, n\lambda_2^n) = \begin{vmatrix} \lambda_1^n & \lambda_2^n & n\lambda_2^n \\ \lambda_1^{n+1} & \lambda_2^{n+1} & (n+1)\lambda_2^{n+1} \\ \lambda_1^{n+2} & \lambda_2^{n+2} & (n+2)\lambda_2^{n+2} \end{vmatrix} = \lambda_1^n\lambda_2^{2n}\begin{vmatrix} 1 & 1 & n \\ \lambda_1 & \lambda_2 & (n+1)\lambda_2 \\ \lambda_1^2 & \lambda_2^2 & (n+2)\lambda_2^2 \end{vmatrix}$$
$$= \lambda_1^n\lambda_2^{2n}\begin{vmatrix} 1 & 1 & 0 \\ \lambda_1 & \lambda_2 & \lambda_2 \\ \lambda_1^2 & \lambda_2^2 & 2\lambda_2^2 \end{vmatrix} = \lambda_1^n\lambda_2^{2n+1}(\lambda_1 - \lambda_2)^2 \neq 0.$$
In the general case the proof can be carried out in a completely similar way, inductively. □
3.13. Real basis of the solutions. For equations with real coefficients, real initial conditions always lead to real solutions. Still, the corresponding fundamental solutions derived from the just proven theorem might exist only in the complex domain.
Let us therefore try to find other generators, which will be more convenient for us. Because the coefficients of the characteristic polynomial are real, each of its roots is either real or the roots come in complex conjugate pairs.
If we describe a root in the polar form as λ = |λ|(cos φ + i sin φ), then λ^n = |λ|^n(cos nφ + i sin nφ), and for a conjugate pair of roots the real and imaginary parts |λ|^n cos(nφ) and |λ|^n sin(nφ) yield two real generators of the solution space.
The unevenness of the curves is a consequence of imprecise plotting; both signals are of course smooth sinusoidal curves.
Solution. The characteristic polynomial of the given equation is x⁴ − x³ − x + 1. Looking for its roots, we solve the reciprocal equation
$$x^4 - x^3 - x + 1 = 0.$$
The standard procedure is to divide the equation by x² and then use the substitution t = x + 1/x, that is, t² = x² + 1/x² + 2. We obtain the equation
$$t^2 - t - 2 = 0$$
with roots t₁ = −1, t₂ = 2. For each of these values of the indeterminate t we separately solve the equation given by the substitution:
$$x + \frac{1}{x} = -1.$$
It has two complex roots: x₁ = −½ + i√3/2 = cos(2π/3) + i sin(2π/3) and x₂ = −½ − i√3/2 = cos(2π/3) − i sin(2π/3).
For the second value of the indeterminate t we obtain the equation
$$x + \frac{1}{x} = 2$$
Note that in the areas where the resulting signal is roughly as strong as the original there is a dramatic shift in the phase. Cheap equalisers indeed work in such a bad way.
with double root 1. Thus the basis of the vector space of solutions of the difference equation in question is the following quadruple of sequences: $\{(-\tfrac12 + i\tfrac{\sqrt3}{2})^n\}_{n=1}^{\infty}$, $\{(-\tfrac12 - i\tfrac{\sqrt3}{2})^n\}_{n=1}^{\infty}$, $\{1\}_{n=1}^{\infty}$ (the constant sequence) and $\{n\}_{n=1}^{\infty}$. If we are looking for a real basis, we must replace the two complex generators by sequences that are real. As these generators are geometric sequences whose members are complex conjugates of each other, it suffices to take as new generators the sequences given by half of their sum and by half of the i-multiple of their difference. This yields the following real basis of the solution space: $\{1\}_{n=1}^{\infty}$ (constant sequence), $\{n\}_{n=1}^{\infty}$, $\{\cos(n\cdot 2\pi/3)\}_{n=1}^{\infty}$, $\{\sin(n\cdot 2\pi/3)\}_{n=1}^{\infty}$. □
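The four real generators can be tested directly; a small plain-Python check (the recurrence is rewritten from the characteristic polynomial x⁴ − x³ − x + 1):

```python
from math import cos, sin, pi

# x^4 - x^3 - x + 1 = 0 corresponds to x_{n+4} = x_{n+3} + x_{n+1} - x_n.
basis = [
    lambda n: 1.0,                    # constant sequence (root 1)
    lambda n: float(n),               # from the double root 1
    lambda n: cos(2 * pi * n / 3),    # real part of the complex pair
    lambda n: sin(2 * pi * n / 3),    # imaginary part of the complex pair
]
for f in basis:
    for n in range(20):
        assert abs(f(n + 4) - f(n + 3) - f(n + 1) + f(n)) < 1e-12
print("all four real sequences solve the recurrence")
```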
3.8. Find a sequence that satisfies the given non-homogeneous difference equation with the initial conditions:
$$x_{n+2} = x_{n+1} + 2x_n + 1, \qquad x_1 = 2,\ x_2 = 2.$$
Solution. The general solution of the homogenised equation is of the form a(−1)^n + b·2^n. A particular solution is the constant −1/2. The general solution of the given non-homogeneous equation without initial conditions is thus
$$a(-1)^n + b\,2^n - \tfrac{1}{2}.$$
Plugging in the initial conditions gives the constants a = −5/6, b = 5/6. The given difference equation with initial conditions is thus satisfied by the sequence
$$x_n = -\tfrac{5}{6}(-1)^n + \tfrac{5}{3}\,2^{n-1} - \tfrac{1}{2}.$$
□
3.9. Determine the sequence of real numbers that satisfies the following non-homogeneous difference equation with initial conditions:
$$2x_{n+2} = -x_{n+1} + x_n + 2, \qquad x_1 = 2,\ x_2 = 3.$$
Solution. The general solution of the homogenised equation is of the form a(−1)^n + b(1/2)^n. A particular solution is the constant 1. The general solution of the non-homogeneous equation without initial conditions is thus
$$a(-1)^n + b\left(\tfrac{1}{2}\right)^n + 1.$$
Plugging in the initial conditions we obtain the constants a = 1, b = 4. The given equation with initial conditions is thus satisfied by the sequence
$$(-1)^n + 4\left(\tfrac{1}{2}\right)^n + 1.$$
3. Iterated linear processes
3.17. Iterated processes. In practical models we very often encounter the situation where the evolution of a system in a given time interval is given by a linear process, and we are interested in the behaviour of the system after many iterations. Very often the linear process remains the same; from the mathematical point of view we are thus dealing with repeated multiplication of the state vector by the same matrix.
While for solving systems of linear equations we needed only minimal knowledge of the properties of linear mappings, in order to understand the behaviour of an iterated system we need to know the properties of eigenvalues and eigenvectors and further structural results.
In a sense we are in the same setting as with linear recurrences, and actually our description of filters in the previous paragraphs can be phrased this way. Imagine that we are working with sound and keep track, via the state vector
$$Y_n = (x_n, x_{n-1}, \dots, x_{n-k+1})^T,$$
of all values from the actual one back to the last one that is still being processed in our linear filter. In one time interval (for the frequency of an audio signal a very short one) we then move to the state vector
$$Y_{n+1} = (x_{n+1}, x_n, \dots, x_{n-k+2})^T,$$
where the first value x_{n+1} = a₁x_n + ⋯ + a_k x_{n−k+1} is computed as with homogeneous difference equations, the other entries are just shifted by one position, and the last one is forgotten. The corresponding square matrix of order k satisfying Y_{n+1} = A·Y_n looks as follows:
$$A = \begin{pmatrix} a_1 & a_2 & \cdots & a_{k-1} & a_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}.$$
For such a simple matrix we have derived an explicit procedure for the complete formula of the solution. In general it won't be so easy, even for very similar systems. One of the typical cases is the study of the dynamics of a population in various biological systems.
Note also that the matrix A has (understandably) the characteristic polynomial
$$p(\lambda) = \lambda^k - a_1\lambda^{k-1} - \cdots - a_k,$$
as can be easily derived by expansion along the last column and recurrence. This can be explained also directly: the solution x_n = λ^n, λ ≠ 0, basically means that multiplication by the matrix A takes the vector (λ^k, …, λ)^T to its λ-multiple. Thus such λ must be an eigenvalue of the matrix A.
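A short sketch, assuming NumPy (the coefficients a₁, a₂, a₃ below are arbitrary sample values, not from the text), showing the matrix A built from a recurrence and its characteristic polynomial:

```python
import numpy as np

a = [0.5, 0.3, 0.2]            # a1, a2, a3 of x_{n+1} = a1 x_n + a2 x_{n-1} + a3 x_{n-2}
k = len(a)
A = np.zeros((k, k))
A[0, :] = a                    # the first row computes the new value
A[1:, :-1] = np.eye(k - 1)     # the other rows just shift the state by one

# np.poly returns the coefficients of the characteristic polynomial:
print(np.poly(A))              # -> [1, -0.5, -0.3, -0.2], i.e. l^3 - a1 l^2 - a2 l - a3
```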
3.18. Leslie model for population growth. Imagine that we are dealing with a system of individuals (cattle, insects, cell cultures, etc.) divided into m groups (according to their age, evolution stage, etc.). The state X_n is thus given by the vector
$$X_n = (u_1, \dots, u_m)^T$$
depending on the time t_n at which we observe the system. A linear model of the evolution of such a system is then given by a matrix A of dimension m, which gives the change of the vector X_n
□
3.10. Solve the following difference equation:
$$x_{n+4} = x_{n+3} - x_{n+2} + x_{n+1} - x_n.$$
Solution. From the theory we know that the space of solutions of this difference equation is a four-dimensional vector space whose generators can be obtained from the roots of the characteristic polynomial of the given equation. The characteristic equation is
$$x^4 - x^3 + x^2 - x + 1 = 0.$$
It is a reciprocal equation (that means that the coefficients of the k-th and the (n−k)-th powers of x are equal). Thus we use the substitution u = x + 1/x. After dividing the equation by x² (zero cannot be a root) and substituting (note that x² + 1/x² = u² − 2) we obtain
$$u^2 - u - 1 = 0.$$
Thus we obtain u_{1,2} = (1 ± √5)/2. From there, via the equation x² − ux + 1 = 0, we determine the four roots
$$x_{1,2,3,4} = \frac{1 \pm \sqrt{5} \pm \sqrt{-10 \pm 2\sqrt{5}}}{4}.$$
Now we note that the roots of the characteristic equation could have been "guessed" right away: it is
$$x^5 + 1 = (x + 1)(x^4 - x^3 + x^2 - x + 1),$$
and thus the roots of the polynomial x⁴ − x³ + x² − x + 1 are also roots of the polynomial x⁵ + 1, which are exactly the fifth roots of −1. By this we obtain that the roots of the characteristic polynomial are the numbers x_{1,2} = cos(π/5) ± i sin(π/5) and x_{3,4} = cos(3π/5) ± i sin(3π/5). Thus a real basis of the space of solutions of the given difference equation is, for instance, the basis of sequences cos(nπ/5), sin(nπ/5), cos(3nπ/5) and sin(3nπ/5), which are cosines and sines of the arguments of the corresponding powers of the roots of the characteristic polynomial.
Note that along the way we have derived the algebraic expressions
$$\cos\tfrac{\pi}{5} = \tfrac{1+\sqrt5}{4},\quad \sin\tfrac{\pi}{5} = \tfrac{\sqrt{10-2\sqrt5}}{4},\quad \cos\tfrac{3\pi}{5} = \tfrac{1-\sqrt5}{4},\quad \sin\tfrac{3\pi}{5} = \tfrac{\sqrt{10+2\sqrt5}}{4}$$
(because all the roots of the equation have absolute value 1, these are the real (imaginary) parts of the corresponding roots). □
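A numeric confirmation of these closed forms (plain Python):

```python
from math import cos, sin, pi, sqrt

assert abs(cos(pi / 5) - (1 + sqrt(5)) / 4) < 1e-12
assert abs(sin(pi / 5) - sqrt(10 - 2 * sqrt(5)) / 4) < 1e-12
assert abs(cos(3 * pi / 5) - (1 - sqrt(5)) / 4) < 1e-12
assert abs(sin(3 * pi / 5) - sqrt(10 + 2 * sqrt(5)) / 4) < 1e-12
print("the closed forms at pi/5 and 3*pi/5 check out")
```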
3.11. Determine the explicit expression of the sequence satisfying the difference equation x_{n+2} = 2x_{n+1} − 2x_n with members x₁ = 2, x₂ = 2.
Solution. The roots of the characteristic polynomial x² − 2x + 2 are 1 + i and 1 − i. The basis of the (complex) vector space of solutions
to
$$X_{n+1} = A\cdot X_n$$
when the time changes from t_n to t_{n+1}.
Let us show as an example the so-called Leslie model for population growth, which is given by the matrix
$$A = \begin{pmatrix} f_1 & f_2 & f_3 & \cdots & f_{m-1} & f_m \\ \tau_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \tau_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \tau_3 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \tau_{m-1} & 0 \end{pmatrix},$$
whose parameters are tied to the evolution of a population divided into m age groups: f_i denotes the relative fertility of the corresponding age group (in the observed time shift, N individuals in the i-th group give rise to f_i·N new ones, which belong to the first group), while τ_i is the relative survival rate of the i-th group in one time interval (the fraction that moves on to the next group). Clearly such a model can be used with any number of age groups.
All coefficients are thus non-negative real numbers and the numbers τ_i are between zero and one. Note that when all τ_i equal one, it is actually a linear recurrence with constant coefficients, and thus we get either exponential growth/decay (for real roots λ of the characteristic polynomial) or oscillation connected with potential growth/decay (for complex roots).
Before we introduce a more general theory, let us play for a while with this specific model.
Direct computation with the Laplace expansion along the last column yields the characteristic polynomial p_m(λ) of the matrix A for the model with m groups:
$$p_m(\lambda) = |A - \lambda E| = -\lambda\, p_{m-1}(\lambda) + (-1)^{m-1} f_m \tau_1 \cdots \tau_{m-1}.$$
Easily by induction we derive that this characteristic polynomial is of the form
$$p_m(\lambda) = (-1)^m\bigl(\lambda^m - a_1\lambda^{m-1} - \cdots - a_{m-1}\lambda - a_m\bigr)$$
with non-negative coefficients a₁, …, a_m whenever all the parameters τ_i and f_i are positive. For instance, it is always
$$a_m = f_m \tau_1 \cdots \tau_{m-1}.$$
Let us qualitatively estimate the distribution of the roots of the polynomial p_m. Sadly, the details of this procedure can be properly explained only later, after understanding some parts of so-called mathematical analysis in chapter five and later; however, it should all be intuitively clear even now. We express the characteristic polynomial in the form
$$p_m(\lambda) = \pm\lambda^m\bigl(1 - q(\lambda)\bigr),$$
where q(λ) = a₁λ^{−1} + ⋯ + a_mλ^{−m} is a strictly decreasing non-negative function for λ > 0. Evidently there exists exactly one positive λ for which q(λ) = 1 and thus also p_m(λ) = 0. In other words, for every Leslie matrix there exists exactly one positive real eigenvalue.
For actual Leslie models of populations all coefficients τ_i and f_j are between zero and one, and a typical situation is that the only positive real eigenvalue λ₁ is greater than or equal to one, while the absolute values of the other eigenvalues are strictly smaller than one.
is thus formed by the sequences y_n = (1+i)^n and z_n = (1−i)^n. The sequence in question can thus be expressed as a linear combination of these sequences (with complex coefficients), x_n = a·y_n + b·z_n, where a = a₁ + ia₂, b = b₁ + ib₂. From the recurrence relation we compute x₀ = ½(2x₁ − x₂) = 1, and by substituting n = 0 and n = 1 into the expression for x_n we obtain
$$1 = x_0 = a_1 + ia_2 + b_1 + ib_2,$$
$$2 = x_1 = (a_1 + ia_2)(1 + i) + (b_1 + ib_2)(1 - i).$$
Comparing the real and the imaginary parts of both equations, we obtain the linear system of four equations
$$a_1 + b_1 = 1,\qquad a_2 + b_2 = 0,$$
$$a_1 - a_2 + b_1 + b_2 = 2,\qquad a_1 + a_2 - b_1 + b_2 = 0$$
with solution a₁ = b₁ = b₂ = ½, a₂ = −½. Thus we can express the sequence in question as
$$x_n = \left(\tfrac{1}{2} - \tfrac{1}{2}i\right)(1+i)^n + \left(\tfrac{1}{2} + \tfrac{1}{2}i\right)(1-i)^n.$$
The sequence can also be expressed using the real basis of the (complex) vector space of solutions, that is, using the sequences u_n = ½(y_n + z_n) = (√2)^n cos(nπ/4) and v_n = ½ i(z_n − y_n) = (√2)^n sin(nπ/4). The transition matrix for changing the basis from the complex one to the real one and its inverse are
$$T = \begin{pmatrix} \tfrac12 & -\tfrac{i}{2} \\ \tfrac12 & \tfrac{i}{2} \end{pmatrix}, \qquad T^{-1} = \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix};$$
for expressing the sequence x_n in the real basis, that is, for the coordinates (c, d) of the sequence x_n under the basis {u_n, v_n}, we thus have
If we begin with any state vector X which is given as a sum of eigenvectors,
$$X = x_1 + \cdots + x_m,$$
with eigenvalues λ_i, then iterations yield
$$A^k\cdot X = \lambda_1^k x_1 + \cdots + \lambda_m^k x_m;$$
thus, under the assumption that |λ_i| < 1 for all i ≥ 2, all components in the eigensubspaces decrease very fast, except for the component λ₁^k x₁.
The distribution of the population among the age groups thus approaches very fast the ratios of the components of the eigenvector associated with the dominant eigenvalue λ₁.
For example, for the matrix (let us recall the meaning of the individual coefficients; they are taken from a model for sheep breeding, that is, the values τ_i contain both natural deaths and the activities of the breeders)
$$A = \begin{pmatrix} 0 & 0.2 & 0.8 & 0.6 & 0 \\ 0.95 & 0 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 & 0 \\ 0 & 0 & 0.7 & 0 & 0 \\ 0 & 0 & 0 & 0.6 & 0 \end{pmatrix}$$
the eigenvalues are approximately
$$1.03,\quad 0,\quad -0.5,\quad -0.27 + 0.74i,\quad -0.27 - 0.74i$$
with absolute values 1.03, 0, 0.5, 0.78, 0.78, and the eigenvector corresponding to the dominant eigenvalue is approximately
$$x^T = (30\ \ 27\ \ 21\ \ 14\ \ 8).$$
We have directly chosen the eigenvector whose coordinates sum to 100; it thus gives the percentage distribution of the population.
If, instead of a three-percent total growth of the population, we rather wanted a constant size and said that we will consume sheep from the second group, we would be asking how much we should decrease τ₂ so that the dominant eigenvalue becomes one.
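The numbers quoted above are easy to reproduce; a minimal sketch assuming NumPy (the rounding of the dominant eigenvalue may differ slightly in the last displayed digit):

```python
import numpy as np

A = np.array([
    [0,    0.2, 0.8, 0.6, 0],
    [0.95, 0,   0,   0,   0],
    [0,    0.8, 0,   0,   0],
    [0,    0,   0.7, 0,   0],
    [0,    0,   0,   0.6, 0],
])
w, V = np.linalg.eig(A)
i = np.argmax(w.real)               # the unique positive (dominant) eigenvalue
v = V[:, i].real
print(w[i].real)                    # close to 1.03
print(np.round(100 * v / v.sum()))  # proportions close to (30, 27, 21, 14, 8)
```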
3.19. Matrices with non-negative elements. Real matrices with no negative elements have very special properties. They are also very often present in practical models. We shall thus introduce the so-called Perron-Frobenius theory, which deals with such matrices.
Let us begin with the definition of some notions, so that we can formulate our ideas.
Positive and primitive matrices
Definition. By a positive matrix we understand a square matrix A all of whose elements a_ij are real and strictly positive. A primitive matrix is a square matrix A such that some power A^k is positive.
thus we have again an alternative expression of the sequence x_n in which no complex numbers appear (though there are square roots):
$$x_n = (\sqrt2)^n\cos\tfrac{n\pi}{4} + (\sqrt2)^n\sin\tfrac{n\pi}{4},$$
which we could also have obtained by solving two linear equations in the two variables c, d, namely 1 = x₀ = c·u₀ + d·v₀ = c and 2 = x₁ = c·u₁ + d·v₁ = c + d. □
Let us recall that the spectral radius of a matrix A is the maximum of the absolute values of all (complex) eigenvalues of A. The spectral radius of a linear mapping on a (finite-dimensional) vector space is the spectral radius of its matrix under some basis.
By the norm of a matrix A ∈ ℝ^{n²} or of a vector x ∈ ℝ^n we mean the sum of the absolute values of all its elements. For a vector x we write |x| for its norm.
The following result is very useful and hopefully also well understandable. Its proof is, in its difficulty, quite atypical for this
3.12. Determine the explicit expression of the sequence satisfying the difference equation x_{n+2} = 3x_{n+1} + 3x_n with members x₁ = 1 and x₂ = 3. ○
3.13. Determine the explicit formula for the n-th member of the unique solution {x_n}_{n=1}^∞ satisfying the following conditions:
$$x_{n+2} = x_{n+1} - x_n, \qquad x_1 = 1,\ x_2 = 5.$$ ○
3.14. Determine the explicit formula for the n-th member of the unique solution {x_n}_{n=1}^∞ satisfying the following conditions:
$$-x_{n+3} = 2x_{n+2} + 2x_{n+1} + x_n, \qquad x_1 = 1,\ x_2 = 1,\ x_3 = 1.$$ ○
3.15. Determine the explicit formula for the n-th member of the unique solution {x_n}_{n=1}^∞ satisfying the following conditions:
$$-x_{n+3} = 3x_{n+2} + 3x_{n+1} + x_n, \qquad x_1 = 1,\ x_2 = 1,\ x_3 = 1.$$ ○
C. Population models
The population models we are about to deal with lead to recurrence relations in vector spaces. The unknown in this case is not a sequence of numbers but a sequence of vectors; the role of the coefficients is played by matrices. We begin with a simple (two-dimensional) case.
3.16. Savings. With a friend we are saving for a holiday together by monthly payments in the following way. At the beginning I give 10 € and he gives 20 €. Every consecutive month each of us pays as much as the month before plus one half of what the other one paid the month before. How much will we have after one year? How much money will I pay in the twelfth month?
Solution. Denote the amount I pay in the n-th month by x_n and the amount my friend pays by y_n; counting in tens of euros, the first month we thus give x₁ = 1, y₁ = 2. For the following payments we can write down the recurrence relations
$$x_{n+1} = x_n + \tfrac12 y_n,$$
$$y_{n+1} = y_n + \tfrac12 x_n.$$
If we denote the common savings by z_n = x_n + y_n, then by summing the equations we obtain z_{n+1} = z_n + ½z_n = (3/2)z_n. That is a geometric sequence, and we obtain z_n = 3·(3/2)^{n−1}. Over the year we will thus save z₁ + z₂ + ⋯ + z₁₂. This partial sum is easy to compute:
textbook, so we give at least a vague idea of how to do it. If the reader has problems with smooth reading, we suggest skipping the proof immediately.
Theorem (Perron). If A is a primitive matrix with spectral radius λ ∈ ℝ, then λ is a simple root of the characteristic polynomial of the matrix A which is strictly greater than the absolute value of any other eigenvalue of A. Furthermore, there exists an eigenvector x associated with λ such that all the elements x_i of x are positive.
Vague idea of the proof. In the proof we shall rely on intuition from elementary geometry. Some of the concepts used will be made more precise in the analytic geometry of the fourth chapter, some analytical aspects will be studied in more detail in the fifth chapter and later, and some claims won't be proven in this textbook at all. Hopefully the presented ideas will not only illuminate the theorem but also motivate a deeper study of geometry and analysis by themselves. Let us begin with an understandable auxiliary lemma:
Lemma. Consider any polyhedron P containing the origin 0 ∈ ℝ^n. If some iteration of a linear mapping ψ : ℝ^n → ℝ^n maps P into its interior, then the spectral radius of the mapping ψ is strictly smaller than one.
Consider the matrix A of the mapping ψ under the standard basis. Because the eigenvalues of A^k are the k-th powers of the eigenvalues of the matrix A, we can without loss of generality assume that ψ itself already maps P into its interior. Clearly ψ cannot have any eigenvalue with absolute value greater than one.
Let us argue by contradiction. Assume that there exists an eigenvalue λ with |λ| = 1. There are then two possibilities: either λ^k = 1 for a suitable k, or there is no such k.
The image of P is a closed set (that means that whenever the points of the image cluster around some point y in ℝ^n, the point y also lies in the image), and the boundary of P is not intersected by the image at all. Thus ψ cannot have a fixed point on the boundary, and there cannot even be a point on the boundary to which points of the image would converge. The first argument excludes the case that some power of λ is one, because such a fixed point on the boundary of P would then exist. In the remaining case there would be a two-dimensional subspace W ⊂ ℝ^n on which the restriction of ψ acts as a rotation by an irrational multiple of 2π, and thus there definitely exists a point y in the intersection of W with the boundary of P. But then the point y could be approached arbitrarily closely by points from the set of iterates ψ^n(y), and thus y would have to be in the image as well. That is a contradiction, and thus the lemma is proven.
Now let us prove the Perron theorem. Our first step is ensuring the existence of an eigenvector with all elements positive. Let us consider the so-called standard simplex
$$S = \{x = (x_1, \dots, x_n)^T;\ |x| = 1,\ x_i \ge 0,\ i = 1, \dots, n\}.$$
Because all elements of the matrix A are non-negative, the image A·x has all coordinates non-negative whenever x does, and at least one of them is always non-zero. The mapping
$$x \mapsto \frac{1}{|A\cdot x|}\,(A\cdot x)$$
therefore
$$3\left(1 + \tfrac32 + \cdots + \left(\tfrac32\right)^{11}\right) = 3\cdot\frac{\left(\tfrac32\right)^{12} - 1}{\tfrac32 - 1} \approx 772.5.$$
maps S to itself. This mapping S → S satisfies all the assumptions of the so-called Brouwer fixed point theorem, and thus there exists a vector y ∈ S which is mapped by this mapping to itself.
In a year we will thus have saved over 772 tens of euros, that is, more than 7720 €.
The recurrence system of equations describing the savings can be written in matrix form as follows:
$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} 1 & \tfrac12 \\ \tfrac12 & 1 \end{pmatrix}\cdot\begin{pmatrix} x_n \\ y_n \end{pmatrix}.$$
It is thus again a geometric sequence; its elements are now vectors, and the quotient is not a scalar but a matrix. The solution can be found analogously:
The power of the matrix acting on the vector (x₁, y₁) can be found by expressing this vector in a basis of eigenvectors. The characteristic polynomial of the matrix is (1 − λ)² − ¼ = 0, and thus the eigenvalues are λ_{1,2} = 3/2, 1/2. The corresponding eigenvectors are (1, 1) and (1, −1). For the initial vector (x₁, y₁) = (1, 2) we compute
$$\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac32\begin{pmatrix} 1 \\ 1 \end{pmatrix} - \frac12\begin{pmatrix} 1 \\ -1 \end{pmatrix},$$
and thus
$$\begin{pmatrix} x_n \\ y_n \end{pmatrix} = \frac32\left(\frac32\right)^{n-1}\begin{pmatrix} 1 \\ 1 \end{pmatrix} - \frac12\left(\frac12\right)^{n-1}\begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$
That means that in the 12th month I pay
$$x_{12} = \left(\frac32\right)^{12} - \left(\frac12\right)^{12} \approx 130$$
tens of euros, that is, about 1300 €, and my friend pays essentially the same amount. □
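Direct iteration confirms both numbers; a plain-Python sketch (amounts in tens of euros, as in the computation above):

```python
x, y, total = 1.0, 2.0, 0.0           # x_1 = 1, y_1 = 2 (tens of euros)
for month in range(1, 13):
    total += x + y
    if month == 12:
        print(round(x, 1))            # ~129.7, the ~130 paid in the 12th month
    x, y = x + y / 2, y + x / 2       # simultaneous update of both payments
print(round(total, 1))                # ~772.5 saved within the year
```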
Remark. The previous example can also be solved without matrices, by rewriting the recurrence: x_{n+1} = x_n + ½y_n = ½x_n + ½z_n, where z_n is known explicitly.
The previous example was in fact a model of growth (in this case of the growth of saved money). Let us now turn to growth models describing primarily the growth of some population. The Leslie model of population growth, with which we dealt in great detail in the theoretical part, describes very well not only populations of sheep (for which it was developed) but can also be applied to modelling of populations such as the following:
3.17. Rabbits for the second time. Let us show how the Leslie model can describe the population of rabbits on a meadow, which we considered in exercise (‖3.4‖). Assume now that the rabbits die after reaching the ninth month of age (in the original model the rabbits were immortal). Denoting the numbers of rabbits according to their age in months at time t (in months) by x₁(t), x₂(t), …, x₉(t), the numbers of rabbits in the individual categories after one month are described by the formulas
That means that
$$A\cdot y = \lambda\, y, \qquad \lambda = |A\cdot y|,$$
and we have found an eigenvector that lies in S. Because some power A^k has, by our assumption, all elements positive, and of course A^k·y = λ^k y, all elements of the vector y are strictly positive (that is, y lies inside S) and λ > 0.
In order to prove the rest of the theorem, we consider the mapping given by the matrix A in a more suitable basis, and furthermore multiply it by the constant λ^{−1}:
$$B = \lambda^{-1}\,(Y^{-1}\cdot A\cdot Y),$$
where Y is the diagonal matrix with the coordinates y_i of the just-found eigenvector y on the diagonal. Evidently B is also a primitive matrix, and furthermore the vector z = (1, …, 1)^T is its eigenvector, because clearly Y·z = y.
If we now prove that μ = 1 is a simple root of the characteristic polynomial of the matrix B and that all other roots have absolute value strictly smaller than one, the proof will be finished.
To do that, we use the auxiliary lemma. Consider the matrix B as the matrix of a linear mapping acting on row vectors,
$$u = (u_1, \dots, u_n) \mapsto u\cdot B = v,$$
that is, by multiplication from the right. Thanks to the fact that z = (1, …, 1)^T is an eigenvector of the matrix B, the sum of the coordinates of the row vector v is
$$\sum_{i=1}^n v_i = \sum_{i,j=1}^n u_j b_{ji} = \sum_{j=1}^n u_j$$
whenever u ∈ S. Therefore the mapping maps the simplex S to itself, and thus it has in S a (row) eigenvector w with eigenvalue one (a fixed point, thanks to the theorem of Brouwer). Because some power B^k contains only strictly positive elements, the image of the simplex S under the k-th iteration of the mapping given by B lies inside S. We are getting close to using the lemma prepared for this proof.
We shall still work with row vectors. Denote by P the shift of the simplex S to the origin by the just-found eigenvector w, that is, P = −w + S. Evidently P is a polyhedron containing the origin, and the vector subspace V ⊂ ℝ^n generated by P is invariant under the action of the matrix B by multiplication of row vectors from the right. The restriction of our mapping to P thus satisfies the assumptions of the auxiliary lemma, and therefore all its eigenvalues are strictly smaller than one in absolute value.
We have yet to deal with the problem that the mapping just considered is given by multiplying row vectors by the matrix B from the right, while originally we were interested in the mapping given by the matrix B and multiplication of column vectors from the left. But that is equivalent to multiplying the transposed column vectors by the transposed matrix B^T in the usual way, from the left. Thus we have proven the claim about the eigenvalues for the transpose of B; but transposing does not change eigenvalues.
The dimension of the space V is n − 1, which completes the proof. □
$$x_1(t+1) = x_2(t) + x_3(t) + \cdots + x_9(t), \qquad x_i(t+1) = x_{i-1}(t)\ \text{ for } i = 2, 3, \dots, 9,$$
or
$$\begin{pmatrix} x_1(t+1) \\ x_2(t+1) \\ x_3(t+1) \\ x_4(t+1) \\ x_5(t+1) \\ x_6(t+1) \\ x_7(t+1) \\ x_8(t+1) \\ x_9(t+1) \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}\cdot\begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \\ x_4(t) \\ x_5(t) \\ x_6(t) \\ x_7(t) \\ x_8(t) \\ x_9(t) \end{pmatrix}.$$
The characteristic polynomial of this matrix is λ⁹ − λ⁷ − λ⁶ − λ⁵ − λ⁴ − λ³ − λ² − λ − 1. The roots of this polynomial are hard to express explicitly, but we can estimate one of them very well: λ₁ ≈ 1.608 (why must it be smaller than (√5 + 1)/2?). Thus, according to this model, the population grows approximately as the geometric sequence 1.608^t.
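The estimate λ₁ ≈ 1.608 can be verified with a few lines, assuming NumPy:

```python
import numpy as np

A = np.zeros((9, 9))
A[0, 1:] = 1                               # all groups but the youngest reproduce
A[np.arange(1, 9), np.arange(0, 8)] = 1    # ageing by one month
lam = max(np.linalg.eigvals(A).real)
print(lam)                                 # ~1.608, indeed below (1 + sqrt(5)) / 2
```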
3.18. Pond. Let us have a simple model of a pond inhabited by a population of white fish (roach, bleak, vimba, nase, etc.). We assume that 20 % of the babies survive their second year, and from that age on they are able to reproduce. Of these young fish, approximately 60 % survive their third year, and in the following years their mortality can be ignored. Furthermore we assume that the birth rate is three times the number of fish able to reproduce.
Such a population would clearly fill the pond very quickly. We thus want to maintain balance using a predator, for instance the esox. Assume that one esox eats about 500 mature white fish per year. How many esox should be put into the pond in order for the population to stagnate?
Solution. If we denote by p the number of babies, by m the number of young fish and by r the number of adult fish, then the state of the population in the following year is given by
$$(p, m, r) \mapsto (3m + 3r,\ 0.2p,\ 0.6m + \tau r),$$
where 1 − τ is the relative mortality of the adult fish caused by the esox. The corresponding matrix describing this model is then
$$\begin{pmatrix} 0 & 3 & 3 \\ 0.2 & 0 & 0 \\ 0 & 0.6 & \tau \end{pmatrix}.$$
If the population is to stagnate, this matrix must have eigenvalue 1; in other words, one must be a root of its characteristic polynomial, which is of the form λ²(τ − λ) + 0.36 − 0.6(τ − λ) = 0. That means that τ must satisfy
$$(\tau - 1) + 0.36 - 0.6(\tau - 1) = 0, \qquad 0.4\,\tau - 0.04 = 0,$$
that is, τ = 0.1.
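With τ = 0.1 the matrix indeed has the eigenvalue 1, as a quick check (assuming NumPy) confirms:

```python
import numpy as np

tau = 0.1
M = np.array([[0, 3, 3], [0.2, 0, 0], [0, 0.6, tau]])
print(np.round(np.linalg.eigvals(M), 3))   # one of the eigenvalues equals 1
```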
3.20. Simple corollaries. With the knowledge of the Perron theorem, the following very useful claim has a surprisingly simple proof, and it shows how strong the property of being a primitive matrix is.
Corollary. If A = (a_ij) is a primitive matrix and x ∈ ℝ^n is its eigenvector with all coordinates non-negative and eigenvalue λ, then λ > 0 is the spectral radius of A. Furthermore,
$$\min_{j}\sum_{i=1}^{n} a_{ij} \;\le\; \lambda \;\le\; \max_{j}\sum_{i=1}^{n} a_{ij}.$$
Proof. From the theorem of Perron we know that the spectral radius μ is an eigenvalue; choose an eigenvector y associated with μ such that the difference x − y has only strictly positive coordinates. Then necessarily for all powers n we have
$$0 \le A^n\cdot(x - y) = \lambda^n x - \mu^n y,$$
but we also have λ ≤ μ. From there we directly get λ = μ.
It remains to estimate the spectral radius by the minimum and the maximum of the sums of the individual columns of the matrix. We denote these by b_min and b_max, scale x so that the sum of its coordinates equals one, and count:
$$\lambda = \sum_{i=1}^{n}\lambda x_i = \sum_{i,j=1}^{n} a_{ij}x_j = \sum_{j=1}^{n}\Bigl(\sum_{i=1}^{n} a_{ij}\Bigr)x_j \le \sum_{j=1}^{n} b_{\max}\,x_j = b_{\max},$$
and similarly
$$\lambda = \sum_{j=1}^{n}\Bigl(\sum_{i=1}^{n} a_{ij}\Bigr)x_j \ge \sum_{j=1}^{n} b_{\min}\,x_j = b_{\min}.$$
□
Note that, for instance, all Leslie matrices from 3.18 in which all the coefficients f_i and τ_j are strictly positive are primitive, so we can apply the just derived results to them.
The Perron-Frobenius theorem is a generalisation of the Perron theorem to more general matrices; we won't give it here. More information can be found, for instance, in ??.
3.21. Markov chains. A very frequent and interesting case of linear processes with only non-negative elements in the matrix is the mathematical model of a system which can be in one of m states with various probabilities. At a given point of time, the system is in state i with probability x_i, and the transition from the state j to the state i happens with probability t_ij.
We can write the process as follows: at time n the system is described by the probability vector
$$x_n = (u_1(n), \dots, u_m(n))^T.$$
That means that all components of the vector x are real non-negative numbers and their sum equals one. The components give the distribution of the probability of the individual possibilities for the state of the system. The distribution of probabilities at time
In the following year only 10 % of the adult fish are allowed to survive, and the rest should be eaten by the esox. If we denote the desired number of esox by x, then together they eat 500x white fish, which according to the previous computation should be 0.9r. The ratio of the number of white fish to the number of esox should thus be r/x = 500/0.9 ≈ 556. That is approximately one esox per 556 white fish. □ In general, we can work with the previous model as follows:
3.19. Let the relation between the numbers of predators D_k and prey K_k in a given month and the following month (k ∈ ℕ ∪ {0}) in a predator-prey population model be given by the linear system
(a) D_{k+1} = 0.6 D_k + 0.5 K_k, K_{k+1} = −0.16 D_k + 1.2 K_k;
(b) D_{k+1} = 0.6 D_k + 0.5 K_k, K_{k+1} = −0.175 D_k + 1.2 K_k;
(c) D_{k+1} = 0.6 D_k + 0.5 K_k, K_{k+1} = −0.135 D_k + 1.2 K_k.
Let us analyse the behaviour of this model after a very long time.
Solution. Note that the individual variants differ from each other only in the value of the coefficient at D_k in the second equation. We can thus express all three cases as
$$\begin{pmatrix} D_k \\ K_k \end{pmatrix} = \begin{pmatrix} 0.6 & 0.5 \\ -a & 1.2 \end{pmatrix}\cdot\begin{pmatrix} D_{k-1} \\ K_{k-1} \end{pmatrix},$$
where we successively set a = 0.16, a = 0.175, a = 0.135. The value of the coefficient a represents here the average number of prey killed by one (clearly "humble") predator per month. Denoting
$$T = \begin{pmatrix} 0.6 & 0.5 \\ -a & 1.2 \end{pmatrix},$$
we immediately obtain
$$\begin{pmatrix} D_k \\ K_k \end{pmatrix} = T^k\cdot\begin{pmatrix} D_0 \\ K_0 \end{pmatrix}, \qquad k \in \mathbb{N}.$$
Using the powers of the matrix T we can determine the evolution of the populations of predators and prey after a very long time. We easily compute the eigenvalues
(a) λ₁ = 1, λ₂ = 0.8;
(b) λ₁ = 0.95, λ₂ = 0.85;
(c) λ₁ = 1.05, λ₂ = 0.75,
and the respective eigenvectors
(a) (5, 4)^T, (5, 2)^T;
(b) (10, 7)^T, (2, 1)^T;
(c) (10, 9)^T, (10, 3)^T.
n + 1 is given by multiplication by the probabilistic transition matrix T = (t_ij), that is,
$$x_{n+1} = T\cdot x_n.$$
Because we assume that the vector x captures all possible states, and the system again transits with total probability one into one of the states, all columns of T are also probabilistic vectors. Such a process is called a (discrete) Markov process, and the resulting sequence of vectors x₀, x₁, … is called a Markov chain.
Note that every probabilistic vector x is indeed mapped by a Markov process to a vector with the sum of coordinates equal to one:
$$\sum_{i,j} t_{ij}x_j = \sum_{j}\Bigl(\sum_{i} t_{ij}\Bigr)x_j = \sum_{j} x_j = 1.$$
Now we can use the Perron-Frobenius theory in its full power. Because the sum of the rows of the matrix T always equals the vector (1, …, 1), the matrix T − E is singular, and thus one is surely an eigenvalue of the matrix T.
If, furthermore, T is a primitive matrix (for instance when all its elements are non-zero), we know from the corollary in 3.20 that one is a simple root of the characteristic polynomial and all other roots have absolute value strictly smaller than one.
Theorem. Markov processes with a matrix which has no zero element, or some power of which has this property, satisfy:
• there exists a unique probabilistic eigenvector x∞ for the eigenvalue 1;
• the iterations T^k x₀ approach the vector x∞ for any initial probabilistic vector x₀.
Proof. The first claim follows directly from the positivity of the coordinates of the eigenvector derived in the Perron theorem.
Assume first that the algebraic and geometric multiplicities of the eigenvalues of the matrix T coincide. Then every probabilistic vector x₀ can be written (in the complex extension ℂ^n) as a linear combination
$$x_0 = c_1 x_\infty + c_2 u_2 + \cdots + c_n u_n,$$
where u₂, …, u_n extend x∞ to a basis of eigenvectors. The k-th iteration then again gives a probabilistic vector
$$x_k = T^k\cdot x_0 = c_1 x_\infty + \lambda_2^k c_2 u_2 + \cdots + \lambda_n^k c_n u_n.$$
Because all the eigenvalues λ₂, …, λ_n are strictly smaller than one in absolute value, all components of the vector x_k except the first one approach zero (in norm) very rapidly. But x_k remains probabilistic, so it must be that c₁ = 1, and the second claim is proven.
In reality, even with distinct algebraic and geometric multiplicities of eigenvalues we reach the same conclusion by a more detailed study of the so-called root subspaces of the matrix T, which we carry out in connection with the so-called Jordan matrix decomposition later in this chapter; see the note 3.33.
Even in the general case there is, besides the eigensubspace ⟨x∞⟩, a uniquely determined invariant (n−1)-dimensional complement, on which all eigenvalues are smaller than one in absolute value, and thus the corresponding components of x_k approach zero as before. □
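The convergence claimed by the theorem is easy to watch numerically; a minimal sketch assuming NumPy, with an illustrative 2×2 primitive column-stochastic matrix of our own choosing:

```python
import numpy as np

T = np.array([[0.9, 0.3],
              [0.1, 0.7]])     # columns sum to one, all entries positive
x = np.array([1.0, 0.0])       # an arbitrary probabilistic starting vector
for _ in range(50):
    x = T @ x
print(x)                       # -> [0.75 0.25], the probabilistic eigenvector for 1
```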
For k ∈ ℕ we thus have
(a) $$T^k = \begin{pmatrix} 5 & 5 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0.8^k \end{pmatrix}\begin{pmatrix} 5 & 5 \\ 4 & 2 \end{pmatrix}^{-1};$$
(b) $$T^k = \begin{pmatrix} 10 & 2 \\ 7 & 1 \end{pmatrix}\begin{pmatrix} 0.95^k & 0 \\ 0 & 0.85^k \end{pmatrix}\begin{pmatrix} 10 & 2 \\ 7 & 1 \end{pmatrix}^{-1};$$
(c) $$T^k = \begin{pmatrix} 10 & 10 \\ 9 & 3 \end{pmatrix}\begin{pmatrix} 1.05^k & 0 \\ 0 & 0.75^k \end{pmatrix}\begin{pmatrix} 10 & 10 \\ 9 & 3 \end{pmatrix}^{-1}.$$
From there we further have, for big k ∈ ℕ,
(a) $$T^k \approx \begin{pmatrix} 5 & 5 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 5 & 5 \\ 4 & 2 \end{pmatrix}^{-1} = \frac{1}{10}\begin{pmatrix} -10 & 25 \\ -8 & 20 \end{pmatrix},$$
(b) $$T^k \approx \begin{pmatrix} 10 & 2 \\ 7 & 1 \end{pmatrix}\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 10 & 2 \\ 7 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
(c) $$T^k \approx \begin{pmatrix} 10 & 10 \\ 9 & 3 \end{pmatrix}\begin{pmatrix} 1.05^k & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 10 & 10 \\ 9 & 3 \end{pmatrix}^{-1} = \frac{1.05^k}{60}\begin{pmatrix} -30 & 100 \\ -27 & 90 \end{pmatrix},$$
because exactly for big k ∈ ℕ we can set
(a) $$\begin{pmatrix} 1 & 0 \\ 0 & 0.8^k \end{pmatrix} \approx \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix};\quad \text{(b)}\ \begin{pmatrix} 0.95^k & 0 \\ 0 & 0.85^k \end{pmatrix} \approx \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix};\quad \text{(c)}\ \begin{pmatrix} 1.05^k & 0 \\ 0 & 0.75^k \end{pmatrix} \approx \begin{pmatrix} 1.05^k & 0 \\ 0 & 0 \end{pmatrix}.$$
Let us note that in the variant (b), that is, for a = 0.175, it was not even necessary to compute the eigenvectors. Thus we have obtained
(a) $$\begin{pmatrix} D_k \\ K_k \end{pmatrix} \approx \frac{1}{10}\begin{pmatrix} -10 & 25 \\ -8 & 20 \end{pmatrix}\begin{pmatrix} D_0 \\ K_0 \end{pmatrix} = \frac{1}{10}\begin{pmatrix} 5\,(-2D_0 + 5K_0) \\ 4\,(-2D_0 + 5K_0) \end{pmatrix},$$
3.22. Iteration of stochastic matrices. Matrices of Markov chains, that is, matrices whose columns sum to one, are called stochastic matrices. Standard problems connected with Markov processes ask, for example, for the expected time between transitions from one state to another, and so on. Right now we are not equipped for solving these problems, but we will return to this topic later.
We reformulate the previous theorem into a simple but surprising result. By convergence to a limit matrix in the following theorem we mean the following: if we bound the allowed error ε > 0, then we can find a bound on the number of iterations k after which all the components of the matrix differ from the limit ones by less than ε.
Corollary. Let T be a primitive stochastic matrix of a Markov process and let x∞ be the stochastic eigenvector for the dominant eigenvalue 1 (as in the theorem above). Then the iterations T^k converge to the limit matrix T∞ whose columns all equal x∞.
Proof. The columns of the matrix T^k are the images of the vectors of the standard basis under the corresponding iterated linear mapping. But these are images of probabilistic vectors, and thus all of them converge to x∞. □
Now, as a short goodbye to Markov processes, let us think about the question whether there exist states into which the system tends to get and then stay in them.
We say that a state is transient if the system stays in it with probability strictly smaller than one. A state is absorbing if the system, once there, stays in it with probability one, and if the system can get into it with non-zero probability from any of the transient states. Finally, a Markov chain is absorbing if all its states are either absorbing or transient.
If in an absorbing Markov chain the first r states of the system are absorbing, then this means that the stochastic matrix T of the system decomposes into the block-wise upper triangular form
$$T = \begin{pmatrix} E & R \\ 0 & Q \end{pmatrix},$$
where E is the unit matrix whose dimension is given by the number of absorbing states, while R is a positive matrix and Q a non-negative one. Iterations of this matrix yield a matrix with the same block of zero values in the bottom-left corner, so T is not primitive; for instance
$$T^2 = \begin{pmatrix} E & R + R\cdot Q \\ 0 & Q^2 \end{pmatrix}.$$
Even about such matrices we can obtain much information using the full Perron-Frobenius theory, and with some knowledge of probability and statistics we can also estimate the expected time after which the system gets into one of the absorbing states.
4. More matrix calculus
We have seen in quite practical examples that understanding the inner structure of matrices and their properties is a strong tool for specific computations and analyses. This is even more true for the efficiency of numerical calculations with matrices. Therefore we will now deal with abstract theory for a while.
(b) $$\begin{pmatrix} D_k \\ K_k \end{pmatrix} \approx \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} D_0 \\ K_0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
(c) $$\begin{pmatrix} D_k \\ K_k \end{pmatrix} \approx \frac{1.05^k}{60}\begin{pmatrix} -30 & 100 \\ -27 & 90 \end{pmatrix}\begin{pmatrix} D_0 \\ K_0 \end{pmatrix} = \frac{1.05^k}{60}\begin{pmatrix} 10\,(-3D_0 + 10K_0) \\ 9\,(-3D_0 + 10K_0) \end{pmatrix}.$$
These results can be interpreted as follows:
(a) If 2D₀ < 5K₀, the sizes of both populations stabilise at non-zero values (we say that they are stable); if 2D₀ > 5K₀, both populations die out.
(b) Both populations die out.
(c) For 3D₀ < 10K₀ a population boom of both species begins; for 3D₀ > 10K₀ both populations die out.
Even a tiny change in the value of a can thus lead to a completely different result. This is caused by the constancy of the value of a: it does not depend on the sizes of the populations. Note that this restriction (that is, assuming a to be constant) is not realistic. Still, we obtain an estimate of the values of a for which the populations are stable.
□
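The three regimes can be read off mechanically from the dominant eigenvalues; a short check assuming NumPy:

```python
import numpy as np

for a in (0.16, 0.175, 0.135):
    T = np.array([[0.6, 0.5], [-a, 1.2]])
    print(a, np.round(np.linalg.eigvals(T), 3))
# a = 0.16  -> 1.0,  0.8   (populations can stabilise)
# a = 0.175 -> 0.95, 0.85  (both die out)
# a = 0.135 -> 1.05, 0.75  (boom or extinction, depending on the initial state)
```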
3.20. Remark. Another model for populations of predators and prey is the model of Lotka and Volterra, which describes the relation between the populations by a system of two ordinary differential equations. In that model both populations oscillate, which is in accord with observations.
In linear models an important role is played by the primitive matrices (3.19).
3.21. Which of the matrices
$$A = \begin{pmatrix} 0 & \tfrac17 \\ 1 & \tfrac67 \end{pmatrix},\qquad B = \begin{pmatrix} \tfrac12 & 0 & \tfrac13 \\ 0 & 1 & \tfrac12 \\ \tfrac12 & 0 & \tfrac16 \end{pmatrix},\qquad C = \begin{pmatrix} 0 & 1 & 0 \\ \tfrac14 & 0 & \tfrac12 \\ \tfrac34 & 0 & \tfrac12 \end{pmatrix},$$
$$D = \begin{pmatrix} \tfrac13 & \tfrac12 & 0 & 0 \\ \tfrac12 & \tfrac13 & 0 & 0 \\ 0 & \tfrac16 & \tfrac16 & \tfrac13 \\ \tfrac16 & 0 & \tfrac56 & \tfrac23 \end{pmatrix},\qquad E = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
are primitive?
Solution. Because
$$A^2 = \begin{pmatrix} \tfrac17 & \tfrac{6}{49} \\ \tfrac67 & \tfrac{43}{49} \end{pmatrix}, \qquad C^3 = \begin{pmatrix} \tfrac38 & \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac38 & \tfrac14 \\ \tfrac38 & \tfrac38 & \tfrac12 \end{pmatrix},$$
We will further investigate some special types of linear mappings on vector spaces, as well as the general case, where the structure is described by the so-called Jordan theorem.
3.23. Unitary spaces and mappings. We are already used to the fact that it is efficient to work in the domain of complex numbers, even when we are interested only in real objects. Furthermore, in many areas complex vector spaces are a necessary component of the problem. For instance, take so-called quantum computing, which has become a very active area of theoretical computer science, although quantum computers have not been constructed yet (in a usable form).
Therefore we extend what we know about orthogonal mappings from the end of the second chapter with the following definitions:
Unitary spaces
Definition. A unitary space is a complex vector space V together with a mapping V × V → ℂ, (u, v) ↦ u·v, which satisfies for all vectors u, v, w ∈ V and scalars a ∈ ℂ:
(1) u·v = $\overline{v\cdot u}$ (the bar stands for complex conjugation),
(2) (au)·v = a(u·v),
(3) (u + v)·w = u·w + v·w,
(4) if u ≠ 0, then u·u > 0 (note that the expression is real by (1)).
Such a mapping is called a scalar product on V.
The real number √(v·v) is called the size of the vector v, and a vector is normalised if its size equals one. Vectors u and v are called orthogonal if their scalar product is zero; a basis composed of mutually orthogonal and normalised vectors is called an orthonormal basis of V.
At first sight this is an extension of the definition of Euclidean vector spaces into the complex domain. We will keep on using the alternative notation ⟨u, v⟩ for the scalar product of vectors u and v. Identically to the real domain, we obtain immediately from the definition the following simple properties of the scalar product, for all vectors in V and scalars in ℂ:
$$u\cdot u \in \mathbb{R},$$
$$u\cdot u = 0 \text{ if and only if } u = 0,$$
$$u\cdot(av) = \bar a\,(u\cdot v),$$
$$u\cdot(v + w) = u\cdot v + u\cdot w,$$
$$u\cdot 0 = 0\cdot u = 0,$$
$$\Bigl(\sum_i a_i u_i\Bigr)\cdot\Bigl(\sum_j b_j v_j\Bigr) = \sum_{i,j} a_i\bar b_j\,(u_i\cdot v_j),$$
where the last equality holds for all finite linear combinations. It is a simple exercise to prove everything formally; for instance, the first property follows from the defining property (1).
The standard example of a scalar product on a complex vector space is
$$(x_1, \dots, x_n)^T\cdot(y_1, \dots, y_n)^T = x_1\bar y_1 + \cdots + x_n\bar y_n.$$
Thanks to the conjugation of the coordinates of the second argument, this mapping satisfies all the required properties. The space ℂ^n with this scalar product is called the standard unitary space of dimension n. In matrix notation we can write this scalar product as x·y = ȳ^T·x.
the matrices A and C are primitive. On the other hand, the middle column of the matrix B^n is always (for n ∈ ℕ) the vector (0, 1, 0)^T; hence the matrix B cannot be primitive. The product
$$D\cdot\begin{pmatrix} 0 \\ 0 \\ a \\ b \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ a/6 + b/3 \\ 5a/6 + 2b/3 \end{pmatrix}, \qquad a, b \in \mathbb{R},$$
implies that the matrix D² has a zero two-dimensional (square) sub-matrix in its upper-right corner. By repeating this implication we obtain that the same property is shared by the matrices D³ = D·D², D⁴ = D·D³, …, D^n = D·D^{n−1}; thus the matrix D is not primitive. The matrix E is a permutation matrix (in every row and every column there is exactly one non-zero element, namely 1). It is not difficult to see that the powers of a permutation matrix are again permutation matrices, so the matrix E is also not primitive. This can be verified directly by calculating the powers E², E³, E⁴; the matrix E⁴ is the unit matrix. □
A mechanical way to run such primitivity checks is sketched below; after that we show a more robust model.
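A minimal primitivity test, assuming NumPy; it relies on Wielandt's bound, by which a primitive n × n matrix already has A^k positive for some k ≤ (n−1)² + 1:

```python
import numpy as np

def is_primitive(A):
    n = A.shape[0]
    P = np.eye(n)
    for _ in range((n - 1) ** 2 + 1):      # Wielandt's bound on the exponent
        P = P @ A
        if (P > 0).all():
            return True
    return False

A = np.array([[0, 1/7], [1, 6/7]])
B = np.array([[1/2, 0, 1/3], [0, 1, 1/2], [1/2, 0, 1/6]])
C = np.array([[0, 1, 0], [1/4, 0, 1/2], [3/4, 0, 1/2]])
print(is_primitive(A), is_primitive(B), is_primitive(C))   # True False True
```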
3.22. Model of spreading of annual plants. We consider the plants that at the beginning of the summer blossom, at the peak of the summer produce seeds and die. Some of the seeds burst into flowers at the end of the autumn, some survive the winter in the ground and burst into flowers at the start of the spring. The flowers that burst out in autumn and survive the winter are usually bigger in the spring and usually produce more seeds. After this, the whole cycle repeats.
The year is thus divided into four parts and in each of these parts we distinguish between some "forms" of the flower:
Part: Stage
beginning of the spring: small and big seedlings
beginning of the summer: small, medium and big blossoming flowers
peak of the summer: seeds
autumn: seedlings and seeds
We denote by x₁(t) and x₂(t) the numbers of small and big seedlings, respectively, at the start of the spring of the year t, and by y₁(t), y₂(t) and y₃(t) the numbers of small, medium and big flowers, respectively, in the summer of that year. From a small seedling either a small or a medium flower grows; from a big seedling either a medium or a big flower grows. Each seedling can of course die (weather, being eaten by a cow, etc.), and then nothing grows out of it. Denote by b_ij the probability that a seedling of the j-th size, j = 1, 2, grows into a flower of the
Completely analogously to the Euclidean spaces and orthogonal mappings, great importance is carried by those mappings that respect the scalar product.
Unitary mappings
A linear mapping φ : V → W between unitary spaces is called a unitary mapping if for all vectors u, v ∈ V we have
$$u\cdot v = \varphi(u)\cdot\varphi(v).$$
A unitary isomorphism is a bijective unitary mapping.
3.24. Properties of spaces with scalar product. In the brief discussion of Euclidean spaces in the previous chapter we have already derived some simple properties of spaces with a scalar product. The proofs for the complex case are very similar.
In the following we shall work with real and complex spaces simultaneously and write K for ℝ or ℂ; in the real case the conjugation is just the identity mapping (being the restriction of the conjugation in the complex plane to the real line). Similarly to the real case, we define in general, for an arbitrary vector subspace U ⊂ V of a space with scalar product, its orthogonal complement
$$U^\perp = \{v \in V;\ u\cdot v = 0 \text{ for all } u \in U\},$$
which is clearly also a vector subspace of V.
In the following paragraphs we work exclusively with finite-dimensional unitary or Euclidean spaces. However, many of our results have a natural generalisation to so-called Hilbert spaces, which are certain infinite-dimensional spaces with scalar products; we return to them later, albeit briefly.
Proposition. For every finite-dimensional space V of dimension n with scalar product we have:
(1) In V there exists an orthonormal basis.
(2) Every system of non-zero orthogonal vectors in V is linearly independent and can be extended to an orthogonal basis.
(3) For every system of linearly independent vectors (u₁, …, u_k) there exists an orthonormal basis (v₁, …, v_n) such that its vectors successively generate the same subspaces as the vectors u_j, that is, ⟨v₁, …, v_i⟩ = ⟨u₁, …, u_i⟩, 1 ≤ i ≤ k.
(4) If (u₁, …, u_n) is an orthonormal basis of V, then the coordinates of every vector u ∈ V are expressed via
$$u = (u\cdot u_1)u_1 + \cdots + (u\cdot u_n)u_n.$$
(5) In any orthonormal basis the scalar product has the coordinate form
$$u\cdot v = x\cdot y = x_1\bar y_1 + \cdots + x_n\bar y_n,$$
where x and y are the columns of coordinates of the vectors u and v in the chosen basis. In particular, every n-dimensional space with scalar product is isomorphic to the standard Euclidean ℝ^n or the standard unitary ℂ^n.
(6) The orthogonal sum of unitary subspaces V₁ + ⋯ + V_k in V is always a direct sum.
(7) If A ⊂ V is an arbitrary subset, then A^⊥ ⊂ V is a vector (and thus also unitary) subspace and (A^⊥)^⊥ ⊂ V is exactly the subspace generated by A. Furthermore, V = ⟨A⟩ ⊕ A^⊥.
(8) V is the orthogonal sum of n one-dimensional unitary subspaces.
i-th size, i = 1, 2, 3. Then we have
$$0 < b_{11} < 1,\quad b_{12} = 0,\quad 0 < b_{21} < 1,\quad 0 < b_{22} < 1,\quad b_{31} = 0,\quad 0 < b_{32} < 1,$$
$$b_{11} + b_{21} \le 1,\qquad b_{22} + b_{32} \le 1$$
(think in detail about what each of these inequalities expresses). Using the classical probability, we can compute b₁₁ as the ratio of the positive outcomes (a small seedling grew into a small flower) to all possible outcomes (the number of small seedlings), that is, b₁₁ = y₁(t)/x₁(t). From there
$$y_1(t) = b_{11}x_1(t).$$
Analogously we obtain the equality
$$y_3(t) = b_{32}x_2(t).$$
If we denote for a while by y_{2,1}(t) and y_{2,2}(t) the numbers of medium flowers that grew out of small and big seedlings respectively, we have y₂(t) = y_{2,1}(t) + y_{2,2}(t) and b₂₁ = y_{2,1}(t)/x₁(t), b₂₂ = y_{2,2}(t)/x₂(t), and thus
$$y_2(t) = b_{21}x_1(t) + b_{22}x_2(t).$$
If we now collect the values into the vectors and matrix
$$x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix},\qquad y(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix},\qquad B = \begin{pmatrix} b_{11} & 0 \\ b_{21} & b_{22} \\ 0 & b_{32} \end{pmatrix},$$
we can rewrite the previous equations in the matrix notation
$$y(t) = B\,x(t).$$
Denoting by c₁₁, c₁₂ and c₁₃ the numbers of seeds produced by a small, medium and big flower respectively, and by z(t) the total number of seeds produced in the summer of the year t, we have
$$z(t) = c_{11}y_1(t) + c_{12}y_2(t) + c_{13}y_3(t),$$
or in matrix notation
$$z(t) = C\,y(t)$$
with
$$C = (c_{11}\ \ c_{12}\ \ c_{13}).$$
If the matrix C is to describe the modelled reality, we assume that the inequalities
$$0 < c_{11} < c_{12} < c_{13}$$
hold.
Finally, denote by w₁(t) and w₂(t) the number of seeds that burst in the autumn and the number of seeds that stay in the ground over the winter, respectively; by d₁₁ and d₂₁ the probabilities that a seed bursts out in the autumn and that it does not, respectively; and by f₁₁ and f₂₂ the probabilities that the seedling and the seed, respectively, do
Proof. (1), (2), (3): We first extend the given system of vectors to any basis (u₁, …, u_n) of the space V and then run the Gram-Schmidt orthogonalisation from 2.42. This yields an orthogonal basis with the properties required in (3). From the Gram-Schmidt orthogonalisation algorithm it is clear that if the first k vectors already formed an orthogonal system, then they are not changed during the process. Thus we have also proved (2) and (1).
(4): If u = a₁u₁ + ⋯ + a_nu_n, then
$$u\cdot u_i = a_1(u_1\cdot u_i) + \cdots + a_n(u_n\cdot u_i) = a_i\,\|u_i\|^2 = a_i.$$
(5): Similarly we compute, for arbitrary vectors u = x₁u₁ + ⋯ + x_nu_n and v = y₁u₁ + ⋯ + y_nu_n,
$$u\cdot v = (x_1u_1 + \cdots + x_nu_n)\cdot(y_1u_1 + \cdots + y_nu_n) = x_1\bar y_1 + \cdots + x_n\bar y_n.$$
(6): We need to show that for any pair V_i, V_j of the given subspaces, their intersection is trivial. If u ∈ V_i and simultaneously u ∈ V_j, then u ⊥ u, that is, u·u = 0. That is possible only for the zero vector u ∈ V.
(7): Let u, v ∈ A^⊥. Then (au + bv)·w = 0 for all w ∈ A and a, b ∈ K (by the distributivity of the scalar product). We have thus checked that A^⊥ is a unitary subspace of V. Let (v₁, …, v_k) be some basis of ⟨A⟩ chosen among the elements of A, and let (u₁, …, u_k) be the orthonormal basis obtained by the Gram-Schmidt orthogonalisation of the vectors (v₁, …, v_k). We extend it to an orthonormal basis of the whole V (both exist thanks to the already proven parts of this proposition). Because it is an orthogonal basis, necessarily ⟨u_{k+1}, …, u_n⟩ = ⟨u₁, …, u_k⟩^⊥ = A^⊥ and A ⊂ ⟨u_{k+1}, …, u_n⟩^⊥ (this follows from expressing the coordinates under the orthonormal basis). If u ⊥ ⟨u_{k+1}, …, u_n⟩, then u is necessarily a linear combination of the vectors u₁, …, u_k, but that is the case exactly when it is a linear combination of the vectors v₁, …, v_k, which is equivalent to u lying in ⟨A⟩.
(8): This is equivalent to the existence of an orthonormal basis. □
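The Gram-Schmidt procedure used in this proof is easy to carry out numerically; a minimal sketch assuming NumPy, with the convention ⟨u, v⟩ = Σ uᵢv̄ᵢ from this chapter (the two sample vectors are our own):

```python
import numpy as np

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=complex)
        for e in basis:
            w = w - np.vdot(e, w) * e        # subtract the projection <w, e> e
        basis.append(w / np.linalg.norm(w))  # normalise to size one
    return basis

e1, e2 = gram_schmidt([[1, 1j, 0], [1, 0, 1j]])
print(abs(np.vdot(e1, e2)))                  # ~0, the vectors are orthogonal
```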
3.25. Important properties of the size. Now we have everything prepared for the basic properties connected with our definition of the size of vectors. We speak also of the norm defined by the scalar product. Note that all the claims consider only finite sets of vectors, and their validity does not depend on the dimension of the space V in which it all takes place.
Theorem. For any vectors u, v in a space V with scalar product we have:
(1) ‖u + v‖ ≤ ‖u‖ + ‖v‖, with equality if and only if u and v are linearly dependent (triangle inequality);
(2) |u·v| ≤ ‖u‖ ‖v‖, with equality if and only if u and v are linearly dependent (Cauchy inequality);
(3) for every orthonormal system of vectors (e₁, …, e_k),
$$\|u\|^2 \ge |u\cdot e_1|^2 + \cdots + |u\cdot e_k|^2$$
(Bessel inequality);
not die during the winter, respectively. The probabilities d₁₁, d₂₁ clearly must satisfy
$$0 < d_{11},\qquad 0 < d_{21},\qquad d_{11} + d_{21} = 1,$$
and because a seedling dies during the winter more easily than a seed hidden in the ground, we assume about f₁₁, f₂₂ that
(4) For an orthonormal system of vectors (e₁, …, e_k), the vector u belongs to the subspace ⟨e₁, …, e_k⟩ if and only if
$$\|u\|^2 = |u\cdot e_1|^2 + \cdots + |u\cdot e_k|^2$$
$$0 < f_{11} < f_{22} < 1.$$
When denoting
$$D = \begin{pmatrix} d_{11} \\ d_{21} \end{pmatrix},\qquad F = \begin{pmatrix} f_{11} & 0 \\ 0 & f_{22} \end{pmatrix},\qquad w(t) = \begin{pmatrix} w_1(t) \\ w_2(t) \end{pmatrix},$$
we obtain, with similar reasoning as before, the equalities
$$w(t) = D\,z(t),\qquad x(t+1) = F\,w(t).$$
Because matrix multiplication is associative, we can compose from the previous equalities the recurrence formulas for the numbers in the individual stages of the flowers in the following year:
$$x(t+1) = Fw(t) = F(Dz(t)) = (FD)(Cy(t)) = (FDC)(Bx(t)) = (FDCB)\,x(t),$$
$$y(t+1) = Bx(t+1) = (BF)w(t) = (BFD)z(t) = (BFDC)\,y(t),$$
$$z(t+1) = Cy(t+1) = (CB)x(t+1) = (CBF)w(t) = (CBFD)\,z(t),$$
$$w(t+1) = Dz(t+1) = (DC)y(t+1) = (DCB)x(t+1) = (DCBF)\,w(t).$$
Using the notation
$$A_x = FDCB,\qquad A_y = BFDC,\qquad A_z = CBFD,\qquad A_w = DCBF,$$
we simplify them to
$$x(t+1) = A_x x(t),\quad y(t+1) = A_y y(t),\quad z(t+1) = A_z z(t),\quad w(t+1) = A_w w(t).$$
From these formulas we can compute the distribution of the population of the flowers in any part of any year, provided we know the starting distribution of the population (say, in the year zero).
For instance, let the distribution of the population be known in the summer, that is, let z(0) be the number of seeds. The distribution of the population at the beginning of the spring in the t-th year is
$$x(t) = A_x x(t-1) = A_x^2 x(t-2) = \cdots = A_x^{t-1}x(1) = A_x^{t-1}Fw(0) = A_x^{t-1}FDz(0).$$
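The composition of the four stages can be carried out mechanically; a sketch assuming NumPy, where all the parameter values are made up for illustration only, chosen merely to respect the inequalities stated above:

```python
import numpy as np

B = np.array([[0.3, 0.0],             # b11, b12 = 0
              [0.2, 0.4],             # b21, b22
              [0.0, 0.5]])            # b31 = 0, b32
C = np.array([[10.0, 50.0, 100.0]])   # c11 < c12 < c13
D = np.array([[0.3], [0.7]])          # d11 + d21 = 1
F = np.diag([0.1, 0.4])               # f11 < f22

Ax = F @ D @ C @ B                    # one whole year, spring to spring
lam = (C @ B @ F @ D).item()          # the scalar A_z = CBFD
print(lam, np.linalg.eigvals(Ax).real.max())   # the two growth rates agree
```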
(Parseval equality);
(5) for an orthonormal system of vectors (e₁, …, e_k) and a vector u ∈ V, the vector
$$w = (u\cdot e_1)e_1 + \cdots + (u\cdot e_k)e_k$$
is the unique vector minimising the size ‖u − v‖ over all v ∈ ⟨e₁, …, e_k⟩.
Proof. All the proofs rely on direct computations.
(2): Define the vector w := u − ((u·v)/‖v‖²)v, so that w ⊥ v, and compute
$$0 \le \|w\|^2\|v\|^2 = \Bigl(\|u\|^2 - \frac{(u\cdot v)\overline{(u\cdot v)}}{\|v\|^2}\Bigr)\|v\|^2 = \|u\|^2\|v\|^2 - |u\cdot v|^2.$$
From there it directly follows that ‖u‖²‖v‖² ≥ |u·v|², and equality holds if and only if w = 0, that is, exactly when u and v are linearly dependent.
(1): Again it suffices to compute
$$\|u + v\|^2 = \|u\|^2 + u\cdot v + v\cdot u + \|v\|^2 = \|u\|^2 + 2\,\mathrm{Re}(u\cdot v) + \|v\|^2$$
$$\le \|u\|^2 + 2|u\cdot v| + \|v\|^2 \le \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = \bigl(\|u\| + \|v\|\bigr)^2.$$
Because these are non-negative real numbers, indeed ‖u + v‖ ≤ ‖u‖ + ‖v‖. Furthermore, in the case of equality all the previous inequalities must be equalities as well, and that is equivalent (using the previous part) to u and v being linearly dependent.
(3), (4): Let (e₁, …, e_k) be an orthonormal system of vectors. We extend it to an orthonormal basis (e₁, …, e_n) (that is always possible by the previous theorem). Then, again by the previous theorem, for every vector u ∈ V
$$\|u\|^2 = \sum_{i=1}^n (u\cdot e_i)\overline{(u\cdot e_i)} = \sum_{i=1}^n |u\cdot e_i|^2 \ge \sum_{i=1}^k |u\cdot e_i|^2.$$
But that is the Bessel inequality. Moreover, equality holds if and only if u·e_i = 0 for all i > k, which proves the Parseval equality.
(5): Choose an arbitrary v ∈ ⟨e₁, …, e_k⟩ and extend the given orthonormal system to an orthonormal basis (e₁, …, e_n). Let (u₁, …, u_n) and (x₁, …, x_k, 0, …, 0) be the coordinates of u and v under this basis. Then
$$\|u - v\|^2 = |u_1 - x_1|^2 + \cdots + |u_k - x_k|^2 + |u_{k+1}|^2 + \cdots + |u_n|^2,$$
and this expression is clearly minimised by choosing x₁ = u₁, …, x_k = u_k. □
3.26. Properties of unitary spaces. The properties of orthogonal mappings have a direct analogue in the complex domain. We can easily formulate and prove them together:
Proposition. Consider a linear mapping (endomorphism) φ : V → V on a space with scalar product. Then the following conditions are equivalent.
Note that the matrix A_z = CBFD is of type 1 × 1; that is, it is not really a matrix but just a scalar. Denoting it by λ = A_z, we compute
$$\lambda = CBFD = (c_{11}\ \ c_{12}\ \ c_{13})\begin{pmatrix} b_{11} & 0 \\ b_{21} & b_{22} \\ 0 & b_{32} \end{pmatrix}\begin{pmatrix} f_{11} & 0 \\ 0 & f_{22} \end{pmatrix}\begin{pmatrix} d_{11} \\ d_{21} \end{pmatrix}$$
$$= (c_{11}b_{11} + c_{12}b_{21}\ \ \ c_{12}b_{22} + c_{13}b_{32})\begin{pmatrix} f_{11}d_{11} \\ f_{22}d_{21} \end{pmatrix}$$
$$(3.5)\qquad = b_{11}c_{11}d_{11}f_{11} + b_{21}c_{12}d_{11}f_{11} + b_{22}c_{12}d_{21}f_{22} + b_{32}c_{13}d_{21}f_{22}$$
and order the previous computation into a suitable form:
$$x(t) = (FDCB)^{t-1}FDz(0) = FD(CBFD)^{t-1}z(0) = \lambda^{t-1}FDz(0).$$
(3): The standard scalar product in K^n is always given, for columns of scalars x, y, by the expression x·y = ȳ^T E x, where E is the unit matrix. The property (2) thus means that the matrix A of the mapping … (5): The claim is expressed via the matrix A of the mapping … (6): Because for the determinant we have |Ā^T A| = |E| = |A Ā^T| = |A||Ā| = 1, there exists the inverse matrix A^{−1}. But we also have A Ā^T A = A, therefore also Ā^T A = E, which is expressed exactly by (6).
(6) ⇒ (1): In the chosen orthonormal basis we have
From it we can see that the increment k is mostly influenced by the number of the seed that overwinter (parameter ^21) and their survivability (parameter ^22)- This revelation is not surprising, the farmers are aware of this fact since the times of neolithic times. The result shows that the mathematical model indeed adequately describes the reality.
Other interesting and well-described models of growth can be found in the collection of exercises after this chapter.
3.23. Consider the following Leslie model: a farmer breeds sheep. The birth-rate of sheep depends only on their age and is on average 2 lambs per sheep between one and two years of age, 5 lambs per sheep between two and three years of age and 2 lambs per sheep between three and four years of age. Younger sheep do not deliver any lambs. Every year, half of the sheep die, uniformly distributed among all age groups. Every sheep older than four years is sent to the butchery. The farmer would like to sell (living) lambs younger than one year for their skin. What part of the lambs can be sold every year such that the
eigenvalues of unitary matrices are always complex units in the complex plane.
As with the orthogonal mappings we can easily check that the orthogonal complements of invariant subspaces with respect to a unitary φ : V → V are always invariant too. Indeed, if φ(U) ⊂ U, u ∈ U and v ∈ U⊥ are arbitrary, then u = φ(u′) for a suitable u′ ∈ U, and therefore ⟨φ(v), u⟩ = ⟨φ(v), φ(u′)⟩ = ⟨v, u′⟩ = 0.

Proposition. Let φ : V → V be a unitary mapping of complex vector spaces. Then V is the orthogonal sum of one-dimensional eigen-subspaces.

Proof. There surely exists at least one eigenvector v ∈ V. Then the restriction of φ to the orthogonal complement ⟨v⟩⊥ is again unitary, and the claim follows by induction on the dimension. □

For any linear mapping ψ : V → W we can naturally define its dual mapping ψ* : W* → V* by the relation
(3.6) ⟨v, ψ*(α)⟩ = ⟨ψ(v), α⟩,
where ⟨ , ⟩ denotes the evaluation of the form (the second argument) on the vector (the first argument), and v ∈ V and α ∈ W* are arbitrary.
Let us choose a basis v of V and a basis w of W, and let us write A for the matrix of the mapping ψ under these bases. Then we can easily compute the matrix of the mapping ψ* in the corresponding dual bases of the dual spaces. Indeed, the definition says that if we represent the forms from W* in coordinates as rows of scalars, then the mapping ψ* is given by the same matrix as ψ, if we multiply the row vectors by it from the right:

⟨ψ(v), α⟩ = (α₁, …, α_n) · A · (v₁, …, v_n)ᵀ = ⟨v, ψ*(α)⟩.

That means that the matrix of the dual mapping ψ* is the transpose Aᵀ, because α·A = (Aᵀ·αᵀ)ᵀ.
size of the herd remains the same? In what ratio will then the sheep be distributed among individual age categories?
Solution. The matrix of the model (without the action of the farmer) is

L = ( 0   2   5   2 )
    ( 1/2 0   0   0 )
    ( 0   1/2 0   0 )
    ( 0   0   1/2 0 )
The farmer can influence how many sheep younger than one year stay in his herd to the next year, that is, he can influence the element l₂₁ of the matrix L. Thus we are dealing with the model

( 0 2   5   2 )
( a 0   0   0 )
( 0 1/2 0   0 )
( 0 0   1/2 0 )
and we are looking for an a such that the matrix has the eigenvalue 1 (we know that it has only one real positive eigenvalue). The characteristic polynomial of this matrix is

λ⁴ − 2aλ² − (5/2)aλ − (1/2)a,

and if we require it to have 1 as a root, it must be that a = 1/5 (we substitute λ = 1 and set the polynomial equal to zero). Without intervention, half of the lambs would survive to the next year, so the farmer can sell 3/10 of the lambs that are born each year. The corresponding eigenvector for the eigenvalue 1 of the given matrix is (20, 4, 2, 1)ᵀ, and in these ratios the population stabilises. □
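A quick numerical cross-check of this answer (a sketch in Python with numpy; the code is ours and not part of the exercise):

import numpy as np

a = 1/5                                   # fraction of new-born lambs kept
L = np.array([[0, 2,   5,   2],
              [a, 0,   0,   0],
              [0, 0.5, 0,   0],
              [0, 0,   0.5, 0]])

lam, vecs = np.linalg.eig(L)
i = np.argmin(abs(lam - 1))               # the eigenvalue closest to 1
print(lam[i].real)                        # -> 1.0: the herd is stationary
v = vecs[:, i].real
print(v / v[-1])                          # -> [20. 4. 2. 1.]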
3.24. Consider the Leslie population growth model for the population of rats, divided into three groups according to age: younger than one year, between one year and two years and between two years and three years. Assume that there exists no rat older than three years. The average birth-rate of one rat in individual age categories is the following: in the first group it is zero, in the second and in the third it is 2 rats. The mortality in the second group is zero, that is, the rats that survive their first year die after three years of life. Determine the mortality in the first group, if you know that the population stagnates (the total number of rats does not change). O
D. Markov processes
3.25. Sweet-toothed gambler. A gambler bets on a coin flip - whether it results in a head or in a tail. At the start of the game he has three sweets. On every flip he bets one sweet: if he wins, he gains one additional sweet; if he loses, he loses the sweet. The game ends when he loses all his sweets or has at least five sweets. What is the probability that the game has not ended after four bets?
Let us further assume that we are in a vector space with scalar product. Substituting a fixed vector v ∈ V into the second argument of the scalar product gives us the mapping

V → V* = Hom(V, K),   v ↦ (w ↦ ⟨w, v⟩ ∈ K).
The non-degeneracy condition of the scalar product ensures that this mapping is a bijection. Each of the values is indeed a linear form over complex or real scalars, because the second argument of the scalar product is the fixed one. At first sight it is clear that the vectors of an orthonormal basis are mapped onto the forms constituting the dual basis, and every vector can thus be understood via the scalar product as a linear form.
In the case of vector spaces with scalar product, our identification of a vector space with its dual also takes the dual mapping ψ* to the mapping ψ* : W → V given by the formula

(3.7) ⟨ψ(v), w⟩ = ⟨v, ψ*(w)⟩,

where by the same notation of parentheses as in the definition (3.6) we now mean the scalar product. This mapping is called the adjoint mapping to ψ.
Equivalently, we can understand the relation (3.7) to be the definition of the adjoint mapping ψ*; for instance, by substituting all the tuples of vectors of an orthonormal basis for the vectors v and w we directly obtain all the values of the matrix of the mapping ψ*. The previous calculation for the dual mapping in coordinates can now be repeated, we just have to keep in mind that in orthonormal bases of unitary spaces the coordinates of the second argument are conjugated:
⟨ψ(v), w⟩ = (w̄₁, …, w̄_n) · A · (v₁, …, v_n)ᵀ = ⟨v, ψ*(w)⟩.

Therefore we see that if A is the matrix of the mapping ψ in an orthonormal basis, then the matrix of the adjoint mapping ψ* is the transposed and conjugated matrix Ā; we denote this by A* = Āᵀ.
The matrix A* is called the adjoint matrix of the matrix A. Note that adjoint matrices are well defined for any rectangular matrix. We should not confuse them with the algebraic adjoints, which we used for square matrices when working with determinants.
We can thus summarise: for any linear mapping ψ : V → W between unitary spaces with the matrix A under orthonormal bases, its dual mapping has the matrix Aᵀ in the dual bases. If we also identify the vector spaces with their duals using the scalar product, then the dual mapping corresponds to the adjoint mapping ψ* : W → V (it is a custom to denote this mapping in the same way as the dual mapping), which has the matrix A*. The distinction between the matrix of the dual mapping and that of the adjoint mapping is thus in the additional conjugation, which is of course a corollary of the fact that the identification of a unitary space with its dual is not a complex linear mapping (since the scalars are brought out of the second argument of the scalar product conjugated).
Solution. Before the j-th round we can describe the state of the player by the random vector X_j = (p₀(j), p₁(j), p₂(j), p₃(j), p₄(j), p₅(j)), where p_i is the probability that the player has i sweets. If the player has i sweets (i = 1, 2, 3, 4) before the j-th bet, then after the bet he has (i − 1) sweets with probability 1/2 and (i + 1) sweets with probability 1/2. If he attains five sweets or loses them all, the number of sweets does not change any more. The vector X_{j+1} is then obtained from the vector X_j by multiplying it with the matrix

A = ( 1 0.5 0   0   0   0 )
    ( 0 0   0.5 0   0   0 )
    ( 0 0.5 0   0.5 0   0 )
    ( 0 0   0.5 0   0.5 0 )
    ( 0 0   0   0.5 0   0 )
    ( 0 0   0   0   0.5 1 )
At the start we have

X₁ = (0, 0, 0, 1, 0, 0)ᵀ,

and after four bets the situation is described by the vector

X₅ = A⁴X₁ = (1/8, 3/16, 0, 5/16, 0, 3/8)ᵀ,
that is, the probability that the game ends in the fourth bet or sooner is one half.
Note also that the matrix A describing the evolution of the probabilistic vector X is itself probabilistic, that is, in each column the sum is one. But it does not have the property required by the Perron-Frobenius theorem, and by a simple computation you can check (or see directly without any computation) that there exist two linearly independent eigenvectors corresponding to the eigenvalue 1: the case that the player has no sweets, that is x = (1, 0, 0, 0, 0, 0)ᵀ, and the case when the player has 5 sweets and the game thus ends with him keeping all the sweets, that is x = (0, 0, 0, 0, 0, 1)ᵀ. All the other eigenvalues (approximately 0.8, 0.3, −0.8, −0.3) are in absolute value strictly smaller than one. Thus the components in the corresponding eigensubspaces vanish with the iteration of the process for an arbitrary initial distribution, and the process approaches the limiting probabilistic vector of the form (a, 0, 0, 0, 0, 1 − a)ᵀ, where the value a
3.28. Self-adjoint mappings. A special case of linear mappings is given by those that are identical with their adjoint mappings: ψ* = ψ. Such mappings are called self-adjoint. Equivalently we can say that they are those mappings whose matrix A under one (and thus under every) orthonormal basis satisfies A = A*.
In the case of Euclidean spaces the self-adjoint mappings are those that have a symmetric matrix (under an orthonormal basis). They are often called symmetric mappings, with symmetric matrices.
In the complex domain the matrices that satisfy A = A* are called Hermitian matrices. Sometimes they are also called self-adjoint matrices. Note that the Hermitian matrices form a real vector subspace in the space of all complex matrices, but not a complex subspace.
Remark. Especially interesting in this connection is the following observation. If we multiply a Hermitian matrix A by the imaginary unit, we obtain the matrix B = iA with the property B* = −iA* = −B. Such matrices are called anti-Hermitian. Just as every real matrix is a sum of its symmetric and anti-symmetric parts,

A = (1/2)(A + Aᵀ) + (1/2)(A − Aᵀ),

in the complex domain we analogously have

A = (1/2)(A + A*) + i·(1/2i)(A − A*),

and we can thus express every complex matrix in exactly one way as a sum

A = B + iC

with Hermitian matrices B and C. It is an analogy of the decomposition of a complex number into its real and purely imaginary components, and in the literature we often encounter the notation

B = re A = (1/2)(A + A*),   C = im A = (1/2i)(A − A*).
In the language of linear mappings this means that every complex linear automorphism can be uniquely expressed using two self-adjoint mappings.
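The decomposition is easy to try out numerically; here is a small sketch in Python with numpy (our own illustration, with an arbitrary random matrix):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

B = (A + A.conj().T) / 2          # Hermitian part,  B* = B
C = (A - A.conj().T) / 2j         # also Hermitian,  A = B + iC

assert np.allclose(B, B.conj().T)
assert np.allclose(C, C.conj().T)
assert np.allclose(A, B + 1j * C)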
3.29. Spectral decomposition. We consider a self-adjoint mapping ψ : V → V with the matrix A under some orthonormal basis, and we try to proceed similarly as in 2.50. Again, we first look in general at the invariant subspaces of self-adjoint mappings and at their orthogonal complements. If for some subspace W ⊂ V and a self-adjoint mapping ψ : V → V we have ψ(W) ⊂ W, then for every v ∈ W⊥, w ∈ W

⟨ψ(v), w⟩ = ⟨v, ψ(w)⟩ = 0.

That means that also ψ(W⊥) ⊂ W⊥.

Consider now the matrix A of a self-adjoint mapping under some orthonormal basis and A·x = λx for some eigenvector x ∈ Cⁿ. We obtain

λ⟨x, x⟩ = ⟨Ax, x⟩ = ⟨x, Ax⟩ = ⟨x, λx⟩ = λ̄⟨x, x⟩.

The positive real number ⟨x, x⟩ can be cancelled out, and thus it must be that λ = λ̄, that is, the eigenvalues are always real.

The characteristic polynomial det(A − λE) has as many complex roots as the dimension of the square matrix A (counting multiplicities), and all of them are actually real. Thus we have proved an important general result:
depends on the initial number of sweets. In our case it is a = 0.4; if there were 4 sweets at the start, it would be a = 0.2, and so on. □
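The whole evolution can be checked by iterating the matrix; the following sketch in Python with numpy (our own, not part of the exercise) reproduces both the answer 1/2 and the limit value a = 0.4:

import numpy as np

# columns are the states 0..5 (number of sweets); 0 and 5 are absorbing
A = np.zeros((6, 6))
A[0, 0] = A[5, 5] = 1.0
for i in range(1, 5):
    A[i - 1, i] = A[i + 1, i] = 0.5

x = np.zeros(6); x[3] = 1.0                  # start with three sweets
x5 = np.linalg.matrix_power(A, 4) @ x
print(x5)                                    # [1/8, 3/16, 0, 5/16, 0, 3/8]
print(1 - x5[0] - x5[5])                     # 0.5: the game is still running

x_inf = np.linalg.matrix_power(A, 200) @ x
print(np.round(x_inf, 6))                    # -> (0.4, 0, 0, 0, 0, 0.6)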
3.26. Car rental. A company rents cars on a weekly basis and has two branches, one in Prague and one in Brno. A car rented in Brno can be returned in Prague and vice versa. After some time it was discovered that roughly 80 % of the cars rented in Prague and 90 % of the cars rented in Brno are returned to Prague. How should the cars be distributed among the branches so that at the start of each week both have the same number of cars as the week before? How will the situation look after a long time, if the cars are initially distributed in a random way?
Solution. Let us denote by x_B and x_P the components of the vector in question, that is, the initial numbers of cars in Brno and in Prague, respectively. The distribution of the cars between the branches is then described by the vector x = (x_B, x_P)ᵀ. If we take such a multiple of the vector x that the sum of its components is 1, then its components give the percentage distribution of the cars. According to the statement, the state at the end of the week is described by the vector

A x = ( 0.1 0.2 ) ( x_B )
      ( 0.9 0.8 ) ( x_P )

The matrix A thus describes our (linear) system of car rental. If at the end of the week the branches should have the same number of cars as at the beginning, we are looking for a vector x for which Ax = x holds. That means that we are looking for an eigenvector of the matrix A associated with the eigenvalue 1.

The characteristic polynomial of the matrix A is (0.1 − λ)(0.8 − λ) − 0.9·0.2 = (λ − 1)(λ + 0.1), and 1 is indeed an eigenvalue of the matrix A. The corresponding eigenvector x = (x_B, x_P)ᵀ satisfies the equation

( −0.9  0.2 ) ( x_B ) = 0.
(  0.9 −0.2 ) ( x_P )

It is thus a multiple of the vector (0.2, 0.9)ᵀ. For determining the percentage distribution we are looking for a multiple such that x_B + x_P = 1. That is satisfied by the vector

(1/1.1) (0.2, 0.9)ᵀ ≈ (0.18, 0.82)ᵀ.

The suitable distribution of the cars between Prague and Brno is thus such that 18 % of the cars are in Brno and 82 % of the cars are in Prague.

If we choose the initial state x = (x_B, x_P)ᵀ arbitrarily, then the state after n weeks is described by the vector x_n = Aⁿx. Now it is useful to express the initial vector x in the basis of the eigenvectors of A. The eigenvector for the eigenvalue 1 has already been found, and similarly we find an eigenvector for the eigenvalue −0.1; that is for instance the vector (−1, 1)ᵀ.
Proposition. The orthogonal complement of an invariant subspace for a self-adjoint mapping is also invariant. Furthermore, all the eigenvalues of a Hermitian matrix A are always real.

From the definition itself it is clear that the restriction of a self-adjoint mapping to an invariant subspace is again self-adjoint. The previous claim thus ensures that there always exists a basis of V composed of eigenvectors. Indeed, the restriction of ψ to the orthogonal complement of an invariant subspace is again a self-adjoint mapping, thus we can add into the basis one eigenvector after another, until we obtain the whole decomposition of V. Eigenvectors associated with distinct eigenvalues are perpendicular, because from the equations ψ(u) = λu, ψ(v) = μv we have (the eigenvalue μ being real)

λ⟨u, v⟩ = ⟨ψ(u), v⟩ = ⟨u, ψ(v)⟩ = μ⟨u, v⟩.

Usually our result is formulated using projections onto the eigensubspaces. We say that a projector P : V → V is perpendicular if Im P ⊥ Ker P. Two perpendicular projectors P, Q are mutually perpendicular if Im P ⊥ Im Q.

Theorem (On the spectral decomposition). For every self-adjoint mapping ψ : V → V on a vector space with scalar product there exists an orthonormal basis composed of eigenvectors. If λ₁, …, λ_k are all the distinct eigenvalues of ψ and P₁, …, P_k are the corresponding perpendicular and mutually perpendicular projectors onto the eigenspaces corresponding to the eigenvalues, then

ψ = λ₁P₁ + ⋯ + λ_kP_k.

The dimension of the image of each of these projectors equals the algebraic multiplicity of the eigenvalue λᵢ.
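A compact numerical illustration of the theorem (a sketch in Python with numpy; the sample matrix is ours) builds the perpendicular projectors from an orthonormal basis of eigenvectors and reassembles the mapping:

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])            # symmetric, hence self-adjoint

lam, Q = np.linalg.eigh(A)                 # orthonormal eigenvectors as columns

# perpendicular projectors P_i = q_i q_i^T onto the eigenspaces
P = [np.outer(Q[:, i], Q[:, i]) for i in range(3)]
assert np.allclose(A, sum(l * Pi for l, Pi in zip(lam, P)))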
3.30. Orthogonal diagonalisation. Mappings for which we can find an orthonormal basis as in the previous theorem about the spectral decomposition are called orthogonally diagonalisable. They are of course exactly those mappings for which we can find an orthonormal basis under which their matrix is diagonal. Let us think for a while about what they can look like.
For the Euclidean case it is simple: diagonal matrices are first of all symmetric, so the orthogonally diagonalisable mappings are exactly the self-adjoint ones. As a corollary we obtain the result that an orthogonal mapping of a Euclidean space into itself is orthogonally diagonalisable if and only if it is self-adjoint (these are exactly the self-adjoint mappings with eigenvalues ±1).
For complex unitary spaces the situation is more complicated. Consider an arbitrary linear mapping φ : V → V of a unitary space and let φ = ψ + iη be the (uniquely given) decomposition of φ into its Hermitian and anti-Hermitian parts. If φ has a diagonal matrix D under a suitable orthonormal basis, then D = re D + i im D, where the real and the imaginary parts are exactly the matrices of ψ and η (this follows from the uniqueness of the decomposition). Thus it also holds that ψ∘η = η∘ψ and φ∘φ* = φ*∘φ. The mappings φ : V → V with the last listed property are called normal.
Mutual connections are shown in the following proposition (we follow the notation of this paragraph):
Proposition. The following conditions are equivalent:
(1) φ is orthogonally diagonalisable,
(2) φ*∘φ = φ∘φ* (that is, φ is a normal mapping),
(3) ψ∘η = η∘ψ,
The initial vector can thus be expressed as the linear combination

x = a (0.2, 0.9)ᵀ + b (−1, 1)ᵀ.

The state after n weeks is then

x_n = Aⁿx = a (0.2, 0.9)ᵀ + b (−0.1)ⁿ (−1, 1)ᵀ.
(4) for the matrix A = (a_ij) of the mapping φ under an arbitrary orthonormal basis, the sum Σ_{i,j} |a_ij|² equals the sum of the squared absolute values of the eigenvalues of φ.

Proof. (2) ⟺ (3): it suffices to do a direct calculation

φφ* = (ψ + iη)(ψ − iη) = ψ² + η² + i(ηψ − ψη),
φ*φ = (ψ − iη)(ψ + iη) = ψ² + η² + i(ψη − ηψ).

Subtraction yields 2i(ηψ − ψη).

(2) ⟹ (1): let u ∈ V be an eigenvector of the normal mapping φ. Then

φ(u)·φ(u) = ⟨φ*φ(u), u⟩ = ⟨φφ*(u), u⟩ = φ*(u)·φ*(u),

thus also ‖φ(u)‖ = ‖φ*(u)‖. If φ is normal, then (φ − λ idV)* = (φ* − λ̄ idV), and thus (φ − λ idV) is also a normal mapping. From the previous equation it follows that if φ(u) = λu, then φ*(u) = λ̄u. That means that the orthogonal complement of an eigensubspace is again invariant, and we can proceed as with the self-adjoint mappings.

(4): the expression Σ_{i,j} |a_ij|² is the trace of the matrix AA*, which is the matrix of the mapping φ∘φ*, and the trace is independent of the choice of the orthonormal basis; for a diagonal matrix it equals exactly the sum of the squared absolute values of the eigenvalues. The remaining implication relies on the triangular form of the mapping φ : V → V, which we prove later in 3.37. The theorem says that for every linear mapping φ : V → V there exists an orthonormal basis under which φ has an upper triangular matrix.
and we easily determine

A^∞ = ( 1 a+ab+ab² a+ab a 0 )
      ( 0 0        0    0 0 )
      ( 0 0        0    0 0 )
      ( 0 0        0    0 0 )
      ( 0 b³       b²   b 1 )

so the game ends with the probability a + ab + ab² ≈ 0.885 as a loss, and with the probability roughly 0.115
Proof. For a unitary mapping φ we have φφ* = idV = φ*φ, and thus φφ* = (ψ + iη)(ψ − iη) = ψ² + η² = idV. On the other hand, for a normal mapping the last calculation shows that the other implication holds too. □
3.31. Non-negative mappings and roots. Non-negative real numbers are exactly those that can be written as squares. A generalisation of this behaviour to matrices and mappings can be seen in the products of matrices B = A*·A (that is, compositions of mappings ψ*∘ψ):

⟨B·x, x⟩ = ⟨A*·A·x, x⟩ = ⟨A·x, A·x⟩ ≥ 0

for all vectors x. Furthermore, we clearly have

B* = (A*·A)* = A*·A = B.

Hermitian matrices B with this property are called positively semidefinite, and if the zero value is attained only for x = 0, they are called positively definite. Analogously, we speak of positively definite and positively semidefinite mappings ψ : V → V.

For every positively semidefinite mapping ψ : V → V we can find its root, that is, a mapping η such that η∘η = ψ. It is simplest to see this under an orthonormal basis where ψ has a diagonal matrix. Such a basis exists (as we have already proven), and the matrix A of the mapping ψ has only non-negative real numbers on the diagonal, namely the eigenvalues of ψ. If some of them were negative, the condition of non-negativity would already fail for one of the basis vectors. It then suffices to define the mapping η using the matrix B with the square roots of the corresponding eigenvalues on the diagonal.
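As a small numerical sketch (Python with numpy; the example matrix is ours), the root of a positively definite matrix via its diagonalisation:

import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric positively definite

lam, Q = np.linalg.eigh(B)               # B = Q diag(lam) Q^T, lam >= 0
root = Q @ np.diag(np.sqrt(lam)) @ Q.T   # the mapping eta
assert np.allclose(root @ root, B)       # eta o eta = psi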
3.32. Spectra and nilpotent mappings. At the end of this section we return to the question about the behaviour of linear mappings in full generality. We still work with real or complex vector spaces.
Let us recall that the spectrum of a linear mapping f : V → V is the sequence of roots of the characteristic polynomial of the mapping f, counted with multiplicities. The algebraic multiplicity of an eigenvalue is its multiplicity as a root of the characteristic polynomial; the geometric multiplicity of an eigenvalue is the dimension of the corresponding subspace of eigenvectors.
A linear mapping f : V → V is called nilpotent if there exists an integer k ≥ 1 such that the iterated mapping fᵏ is identically zero. The smallest k with this property is called the degree of nilpotency of the mapping f. The mapping f : V → V is called cyclic if there exists a basis (u₁, …, u_n) of the space V such that f(u₁) = 0 and f(u_i) = u_{i−1} for all i = 2, …, n. In other words, the matrix of f under this basis is of the form
A = ( 0 1 0 … 0 )
    ( 0 0 1 … 0 )
    ( ⋮       ⋮ )
    ( 0 0 0 … 0 )
If f(v) = a·v, then for every natural k we have fᵏ(v) = aᵏ·v. Notably, the spectrum of a nilpotent mapping can contain only the zero scalar (and that is always present).
Directly from the definition it follows that every cyclic mapping is nilpotent; furthermore, its degree of nilpotency is equal to the
as a win of €80. (We multiply the initial vector (0, 1, 0, 0, 0)ᵀ by the matrix A^∞ and obtain the vector (a + ab + ab², 0, 0, 0, b³)ᵀ.) □
3.30. Consider the situation from the previous case and assume that the probabilities of a win and of a loss are both 1/2. Denote by A the matrix of the process. Without using any computational software, determine A¹⁰⁰. O
3.31. Absent-minded professor. Consider the following situation: an absent-minded professor carries an umbrella with him, but with probability 1/2 he forgets it at the place he is leaving from. In the morning he leaves for work. From work he goes for lunch to a restaurant and then back. After he finishes his work, he leaves for home. Consider for simplicity that he goes nowhere else and that in the restaurant the umbrella stays at his favourite spot, from where he can take it the next day (if he does not forget it there). Consider this situation as a Markov process and write down its matrix. What is the probability that after many days the umbrella is located in the restaurant in the morning? (It is useful to take one day as the time unit, from morning to morning.)
Solution.
T = ( 11/16 3/8 1/4 )
    (  3/16 3/8 1/4 )
    (  1/8  1/4 1/2 )

(ordering the states as: umbrella at home, at work, in the restaurant).
We compute for instance the element a₁₁, that is, the probability
that the umbrella starts its day at home and stays there (that is, will be
there the next day in the morning) - there are three disjoint ways for
the umbrella:
D: the professor forgets the umbrella at home in the morning: p₁ = 1/2;
DPD: the professor takes it to work, there he forgets to take it to the lunch, and in the evening he takes it home: p₂ = (1/2)·(1/2)·(1/2) = 1/8;
DPRPD: the professor takes the umbrella with him all the time and does not forget it anywhere: p₃ = (1/2)·(1/2)·(1/2)·(1/2) = 1/16.
In total, a₁₁ = p₁ + p₂ + p₃ = 11/16.
The eigenvector of this matrix corresponding to the dominant eigenvalue 1 is (2, 1, 1), and thus the desired probability is 1/(2 + 1 + 1) = 1/4. □
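The stationary vector can also be found by simply iterating the chain; a sketch in Python with numpy (ours, with the states ordered home, work, restaurant):

import numpy as np

T = np.array([[11/16, 3/8, 1/4],
              [ 3/16, 3/8, 1/4],
              [ 1/8,  1/4, 1/2]])

x = np.array([1.0, 0.0, 0.0])       # the umbrella starts at home
for _ in range(60):                 # the iterations converge quickly
    x = T @ x
print(x)                            # -> [0.5, 0.25, 0.25], i.e. (2, 1, 1)/4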
3.32. Algorithm for determining the importance of pages. Internet search engines can find (almost) all pages containing a given word or phrase. But how should the pages be sorted so that the user receives a list ordered by the relevance of the pages? One of the possibilities is the following algorithm: the collection of all the found pages is considered to be a system, with each of the found
dimension of the space V. The operator of derivation on polynomials, D(xᵏ) = k·x^{k−1}, is an example of a cyclic mapping on the spaces K_n[x] of all polynomials of degree at most n over the scalars K.
Surprisingly, this also holds the other way - every nilpotent mapping is a direct sum of cyclic mappings. A proof of this claim takes a lot of work, thus we first formulate the results we are aiming at, and then gradually start with the technical work. In the resulting theorem about Jordan decomposition appear vector (sub)spaces and linear mappings on them with a single eigenvalue X and a matrix
J = ( λ 1 0 … 0 )
    ( 0 λ 1 … 0 )
    ( ⋮       ⋮ )
    ( 0 0 0 … λ )
Theorem (Jordan theorem about canonical form). Let V be a vector space of the dimension n and f : V -> V be a linear mapping with n eigenvalues, counting algebraic multiplicities. Then there exists a unique decomposition of the space V into a direct sum of subspaces
V = V₁ ⊕ ⋯ ⊕ V_k

such that f(V_i) ⊂ V_i, the restriction of f to each V_i has a single eigenvalue λ_i, and the restriction of f − λ_i·id to V_i is either cyclic or the zero mapping.
The theorem thus says that for a suitable basis every linear mapping has block-diagonal form with Jordan blocks along the diagonal. The total number of ones over the diagonal in such form equals the difference between total algebraic and geometric multiplicity of the eigenvalues.
3.33. Notes. Note that we have already proven the Jordan theorem for the cases when all the eigenvalues are distinct or when the geometric and algebraic multiplicities of all the eigenvalues coincide. In particular, we have proven it for unitary, normal and self-adjoint mappings.
Another useful observation is that for every linear mapping f, every eigenvalue of f has a uniquely determined invariant subspace that corresponds to the Jordan block in the matrix.
We should also mention one very useful corollary of the Jordan theorem (which we have already used in the discussion about the behaviour of Markov chains). Assume that the eigenvalues of our mapping f are all smaller than one in absolute value. Then the repeated application of the linear mapping to any vector v ∈ V makes all the coordinates of fᵏ(v) decrease quickly below any bound. Indeed, assume for simplicity that f has only one eigenvalue λ on the whole of V and that f − λ idV is cyclic (that is, we consider only one Jordan block), and let v₁, …, v_l be the corresponding basis. Then the condition from the theorem says f(v₂) = λv₂ + v₁, f²(v₂) = λ²v₂ + 2λv₁, and similarly for the other v_i and higher powers. In any case, the iteration results in higher and higher powers of λ at all the non-zero components, and the smallest of these powers can be at most the degree of nilpotency lower than the number of iterations.
This proves the claim (and the same argument can be used to prove that a mapping with all eigenvalues of absolute
pages as one of its states. We describe a random walk on these pages as a Markov process. The probabilities of transitions between pages are given by the hyperlinks: each link, say from page A to page B, determines the probability 1/(total number of links from the page A) with which the process moves from the page A to the page B. If no links lead from some page, we treat it as a page with links to every other page. This gives us a probabilistic matrix m (the element m_ij corresponds to the probability with which we move from the j-th page to the i-th page). Thus if one randomly clicks on the links in the found pages (and from a linkless page one just moves to a randomly chosen page), the probability that at a given point in time (distant enough from the beginning) one is located at the i-th page corresponds to the i-th component of the unit eigenvector of the matrix m corresponding to the eigenvalue 1. Looking at the sizes of these probabilities, we define the importance of the individual pages.
This algorithm can be modified by assuming that the user stops clicking from link to link after a certain time and starts again on a random page. Say that with probability d he chooses a new page at random and with probability (1 − d) keeps clicking. In such a situation the probability of transition between any two pages S_i and S_j is non-zero: it is d/n + (1 − d)/(total number of links at the page S_i) if a link from S_i to S_j exists, and d/n otherwise (if there are no links at S_i, then it is 1/n). According to the Perron-Frobenius theorem, the eigenvalue 1 has multiplicity one and is dominant, and thus the corresponding eigenvector is unique (if we chose the transition probabilities only as described in the previous paragraph, this would not have to be the case).
For illustration, consider the pages A, B, C and D. The links lead from A to B and to C, from B to C, and from C to A; from D no links lead. Let us say that the probability that the user chooses a random new page is 1/5. Then the matrix m looks as follows:

m = ( 1/20 1/20  17/20 1/4 )
    ( 9/20 1/20  1/20  1/4 )
    ( 9/20 17/20 1/20  1/4 )
    ( 1/20 1/20  1/20  1/4 )

The eigenvector corresponding to the eigenvalue 1 is (305/53, 175/53, 315/53, 1)ᵀ; the importance of the pages is thus given by the order of the sizes of the corresponding components, that is, C > A > B > D.
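A short sketch of this computation in Python with numpy (the link table is taken from the example above; the code itself is our own illustration):

import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: []}    # A=0, B=1, C=2, D=3
n, d = 4, 1/5

m = np.full((n, n), d / n)                    # the random-jump part
for j, out in links.items():
    if out:
        for i in out:                         # spread (1-d) over the links
            m[i, j] += (1 - d) / len(out)
    else:
        m[:, j] = 1 / n                       # a linkless page

val, vec = np.linalg.eig(m)
v = vec[:, np.argmin(abs(val - 1))].real
print(v / v[3])                               # -> [305/53, 175/53, 315/53, 1]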
3.33. Based on the temperature at 14:00, the days are divided into warm, average and cold. From the all-year statistics, after a warm day the next day is warm in 50 % of the cases and average in 30 % of the cases; after an average day the next day is average in 40 % of the cases and cold in 30 % of the cases; and after
value strictly greater than one leads to unbounded growth of all the coordinates of the iterations fᵏ(v)).
The rest of this part of the third chapter is devoted to the proof of the Jordan theorem and some necessary lemmata. It is way more difficult than anything so far and the reader can skip it, until the beginning of the fifth part of this chapter.
3.34. Root spaces. We have already seen on examples that eigensubspaces describe the geometric properties of only some linear mappings. Thus we now introduce a more subtle tool, the so-called root subspaces.
Definition. A non-zero vector u ∈ V is called a root vector of a linear mapping φ : V → V if there exist a ∈ K and an integer k > 0 such that (φ − a·idV)ᵏ(u) = 0, that is, the k-th iteration of the given mapping maps u to zero. The set of all the root vectors corresponding to a fixed scalar λ, together with the zero vector, is called the root subspace associated to the scalar λ ∈ K, and is denoted by R_λ.
If u is a root vector and the k from the definition is chosen smallest possible, then (φ − a·idV)^{k−1}(u) is an eigenvector with the eigenvalue a. Thus we have R_λ = {0} for all scalars λ which are not in the spectrum of the mapping φ.
Proposition. For a linear mapping φ : V → V we have:
(1) for every λ ∈ K, R_λ ⊂ V is a vector subspace,
(2) for all λ, μ ∈ K, R_λ is invariant with respect to the linear mapping (φ − μ·idV); in particular R_λ is invariant with respect to φ,
(3) an eigenvector of φ with an eigenvalue μ ≠ λ cannot lie in R_λ,
(4) the restriction of (φ − λ·idV) to R_λ is nilpotent.
Proof. The first two claims are easily checked directly from the definition. (3): If u ∈ R_λ is an eigenvector with the eigenvalue μ, then 0 = (φ − λ·idV)ᵏ(u) = (μ − λ)ᵏ·u, and thus also u = 0 for λ ≠ μ.
(4) Choose a basis e₁, …, e_p of the subspace R_λ. Because according to the definition there exist numbers k_i such that (φ − λ·idV)^{k_i}(e_i) = 0, the whole mapping (φ − λ·idV)|_{R_λ} is nilpotent. □
3.35. Factor subspaces. Our next aim is to show that the dimension of a root space always equals the algebraic multiplicity of the corresponding eigenvalue. Let us first introduce some useful technical tools.
Definition. Let U ⊂ V be a vector subspace. On the set of all the vectors in V we define an equivalence relation as follows: v₁ ∼ v₂ if and only if v₁ − v₂ ∈ U. The axioms of an equivalence are easy to check. The set V/U of the classes of this equivalence, together with the operations defined using representatives, that is, [v] + [w] =
a cold day the next day is cold in 50 % of the cases and average in 30 % of the cases. Without any further information, derive how many warm, cold and average days can be expected in a year.
Solution. For each day, exactly one of the states „warm day", „average day", „cold day" occurs. If the vector x_n has as its components the probabilities that a certain (n-th) day is warm, average and cold (respectively), then the components of the vector
x_{n+1} = ( 0.5 0.3 0.2 ) x_n
          ( 0.3 0.4 0.3 )
          ( 0.2 0.3 0.5 )

give the probabilities that the next day is warm, average and cold, respectively. For verification it suffices to substitute

x_n = (1, 0, 0)ᵀ,  x_n = (0, 1, 0)ᵀ,  x_n = (0, 0, 1)ᵀ;

for instance, for the third choice we must obtain the probabilities that after a cold day there follows a warm, average and cold day (respectively). We see that the problem is a Markov chain with the probabilistic transition matrix

T = ( 0.5 0.3 0.2 )
    ( 0.3 0.4 0.3 )
    ( 0.2 0.3 0.5 )

Because all the elements of this matrix are positive, there exists a probabilistic vector

x_∞ = (x_∞^1, x_∞^2, x_∞^3)ᵀ,

to which the vector x_n approaches as n grows, independently of the initial vector. Furthermore, thanks to the corollary of the Perron-Frobenius theorem, x_∞ is the eigenvector of the matrix T for the eigenvalue 1. Thus it must hold that

x_∞^1 = 0.5 x_∞^1 + 0.3 x_∞^2 + 0.2 x_∞^3,
x_∞^2 = 0.3 x_∞^1 + 0.4 x_∞^2 + 0.3 x_∞^3,
1 = x_∞^1 + x_∞^2 + x_∞^3,

where the last condition means that the vector x_∞ is probabilistic. It is easy to compute that this system has the single solution

x_∞^1 = x_∞^2 = x_∞^3 = 1/3.

Thus we can expect roughly the same number of warm, average and cold days.
Let us emphasise that the sum of the numbers in any column of the matrix T had to equal 1 (otherwise it would not be a Markov process). Because Tᵀ = T (the matrix is symmetric), the sum of all the numbers in any row also equals 1. We say that a matrix with non-negative elements and with the property that the sum of the numbers
[v + w], a·[u] = [a·u], forms a vector space which we call the factor vector space of the space V by the subspace U.
Check the correctness of the definition of the operations and that all the axioms of a vector space hold!
The classes (vectors) in the factor space V/U will often be denoted as a formal sum of one representative with all the vectors of the subspace U, for instance u + U ∈ V/U, u ∈ V. The zero vector in V/U is exactly the class 0 + U, that is, the vector u ∈ V represents the zero element of V/U if and only if u ∈ U.
As simple examples, think about V/{0} ≅ V, V/V ≅ {0}, and about the factor space of the plane R² by any one-dimensional subspace (here every one-dimensional subspace U ⊂ R² is a line passing through the origin), where the classes of the equivalence are the lines parallel to it.
Proposition. Let U ⊂ V be a vector subspace and let (u₁, …, u_n) be a basis of V such that (u₁, …, u_k) is a basis of U. Then dim V/U = n − k and the vectors

u_{k+1} + U, …, u_n + U

form a basis of V/U.

Proof. Because V = ⟨u₁, …, u_n⟩, also V/U = ⟨u₁ + U, …, u_n + U⟩. But the first k generators are zero, thus V/U = ⟨u_{k+1} + U, …, u_n + U⟩. Assume that a_{k+1}·(u_{k+1} + U) + ⋯ + a_n·(u_n + U) = (a_{k+1}·u_{k+1} + ⋯ + a_n·u_n) + U = 0 ∈ V/U. That is equivalent to the linear combination of the vectors u_{k+1}, …, u_n belonging to the subspace U. Because U is generated by the remaining vectors, the combination is necessarily zero, that is, all the coefficients a_i are zero. □
3.36. Induced mappings on factor spaces. Assume that U ⊂ V is an invariant subspace with respect to a linear mapping φ : V → V and choose a basis u₁, …, u_n of the space V such that the first k vectors of this basis form a basis of U. In this basis φ has a matrix in block upper triangular form and induces a well-defined linear mapping on the factor space V/U. This leads to the following statement: let φ : V → V be a linear mapping whose spectrum contains n elements (that is, all the roots of the characteristic polynomial lie in K, counting their multiplicities). Then there exists a sequence
in any column equals one, and analogously for the rows, doubly stochastic. An important property of every doubly stochastic primitive matrix (of any dimension, that is, number of states) is that the corresponding vector x_∞ has all its components identical, that is, after sufficiently many iterations all the states of the corresponding Markov chain are attained with the same frequency. □
3.34. John is used to going running every evening. He has three tracks: short, medium and long. Whenever he chooses the short track, the next day he feels bad about it and chooses uniformly between the long and the medium one. Whenever he chooses the long track, the next day he chooses arbitrarily among all three. Whenever he chooses the medium track, the next day he feels good about it and again chooses uniformly between the medium and the long one. Assume that he has been running like this for a very long time. How often does he choose the short track and how often the long one? What is the probability that he chooses the long one, given that he picked it a week before?
Solution. Clearly it is a Markov process with three possible states: the choices of the short, the medium and the long track. This order of the states gives the probabilistic transition matrix

T = ( 0   0   1/3 )
    ( 1/2 1/2 1/3 )
    ( 1/2 1/2 1/3 )

It suffices to realise that, for instance, the second column corresponds to the choice of the medium track on the previous day, which means that with probability 1/2 the medium track is chosen again (the second row) and with probability 1/2 the long track is chosen (the third row). Because we have

T² = ( 1/6  1/6  1/9 )
     ( 5/12 5/12 4/9 )
     ( 5/12 5/12 4/9 )

we can use the corollary of the Perron-Frobenius theorem for Markov chains. It is not difficult to compute the eigenvector corresponding to the eigenvalue 1 which is a probabilistic vector, namely

(1/7, 3/7, 3/7)ᵀ.
The values 1/7, 3/7, 3/7 give, respectively, the probabilities that on a randomly chosen day he chooses the short, the medium and the long track.
Let John at a certain day (that is, in time n e N) choose a long track. This corresponds to the probabilistic vector
x_n = (0, 0, 1)ᵀ,
of invariant subspaces {0} = V₀ ⊂ V₁ ⊂ ⋯ ⊂ V_n = V with dimensions dim V_i = i. Under a basis u₁, …, u_n of the space V such that V_i = ⟨u₁, …, u_i⟩, the mapping φ has an upper triangular matrix.
and after seven days we have

x_{n+7} = T⁷ · (0, 0, 1)ᵀ.

The enumeration gives as the components of x_{n+7} the values

0.142861225…, 0.428569387…, 0.428569387…
Thus the probability that he chooses the long track under the condition that he chose it seven days ago is roughly 0.428569, very close to the limit value 3/7 ≈ 0.428571. □
3.35. The production line is not reliable: individual products differ in quality in a non-negligible way. Moreover, a certain worker tries to improve the quality of the products and intervenes in the process. The products are distributed into the classes I, II, III according to their quality, and a report found out that after a product of the class I the next product is of the class I in 80 % of the cases and of the class II in 10 % of the cases; after a product of the class II the next product is of the class II in 60 % of the cases and of the class I in 20 % of the cases; and after a product of the class III the next product is of the class III in 50 % of the cases and of the class II in 25 % of the cases. Compute the probability that the 18-th product is of the class I, if the 16-th product is of the class III.
Solution. Let us first solve the problem without using a Markov chain. Given that the 16-th product is of the class III, the event in question is satisfied by the following cases:
• 17-th product is of the class I and 18-th product is of class I;
• 17-th product is of the class II and 18-th product is of class I;
• 17-th product is of the class III and 18-th product is of class I,
with probabilities respectively
• 0.25 · 0.8 = 0.2;
• 0.25 · 0.2 = 0.05;
• 0.5 · 0.25 = 0.125.
Thus we easily obtain the result
0.375 = 0.2 + 0.05 + 0.125.
Now let us view the problem as a Markov process. From the statement we have that to the order of the possible states „product is of
(φ − λ·idV)ʲ(w) = 0 for a suitable j. We have thus derived that for a linear mapping φ : V → V whose whole spectrum is in K, V = R_{λ₁} ⊕ ⋯ ⊕ R_{λ_n} is the direct sum of the root subspaces, and if we choose suitable bases for these subspaces, the matrix of φ is block-diagonal.
Theorem. Let φ : V → V be a nilpotent linear mapping. Then there exists a decomposition of V into a direct sum of subspaces V = V₁ ⊕ ⋯ ⊕ V_k such that the restriction of φ to any of them is cyclic.
Proof. Verifying this is quite straightforward and consists of the construction of a basis of the space V on which the action of the mapping φ directly shows the decomposition into the cyclic mappings. But taking care of the details will take some time. Let k be the degree of nilpotency of the mapping φ and denote P_i = im(φ^i), i = 0, …, k, that is,

{0} = P_k ⊂ P_{k−1} ⊂ ⋯ ⊂ P₁ ⊂ P₀ = V.

Choose an arbitrary basis e₁^{k−1}, …, e_{p_{k−1}}^{k−1} of the space P_{k−1}, where p_{k−1} > 0 is the dimension of P_{k−1}. From the definition it follows that P_{k−1} ⊂ Ker φ, that is, always φ(e_j^{k−1}) = 0.

Assume that P_{k−1} ≠ V. Because P_{k−1} = φ(P_{k−2}), there necessarily exist in P_{k−2} vectors e_j^{k−2}, j = 1, …, p_{k−1}, such that φ(e_j^{k−2}) = e_j^{k−1}. Assume

a₁e₁^{k−1} + ⋯ + a_{p_{k−1}}e_{p_{k−1}}^{k−1} + b₁e₁^{k−2} + ⋯ + b_{p_{k−1}}e_{p_{k−1}}^{k−2} = 0.

An application of the mapping φ to this linear combination yields b₁e₁^{k−1} + ⋯ + b_{p_{k−1}}e_{p_{k−1}}^{k−1} = 0, therefore all b_j = 0. But then also all a_j = 0, because these are combinations of basis vectors. Thus we have verified the linear independence of all the 2p_{k−1} chosen vectors. We extend them to a basis

(1)  e₁^{k−1}, …, e_{p_{k−1}}^{k−1}, e₁^{k−2}, …, e_{p_{k−2}}^{k−2}

of the space P_{k−2}. Furthermore, the images of the added basis vectors lie in P_{k−1}, so they must be linear combinations of the basis elements e₁^{k−1}, …, e_{p_{k−1}}^{k−1}. We can therefore exchange the chosen vectors e_{p_{k−1}+1}^{k−2}, …, e_{p_{k−2}}^{k−2} for their differences with suitable linear combinations of the vectors e₁^{k−2}, …, e_{p_{k−1}}^{k−2}. This ensures that the vectors added to the basis of P_{k−2} belong to the kernel of the mapping φ. Let us thus assume that this already holds for the chosen basis (1).

Let us assume further that we have already constructed a basis of the subspace P_{k−l} which can be arranged into the scheme

e₁^{k−1}, …, e_{p_{k−1}}^{k−1}
e₁^{k−2}, …, e_{p_{k−1}}^{k−2}, e_{p_{k−1}+1}^{k−2}, …, e_{p_{k−2}}^{k−2}
⋮
e₁^{k−l}, …, e_{p_{k−1}}^{k−l}, e_{p_{k−1}+1}^{k−l}, …, e_{p_{k−2}}^{k−l}, …, e_{p_{k−l}}^{k−l}

where the value of the mapping φ on any basis vector is located directly above it, or equals zero if there is nothing above that basis vector. If P_{k−l} ≠ V, then again there must exist vectors e₁^{k−l−1}, …, e_{p_{k−l}}^{k−l−1} which map onto e₁^{k−l}, …, e_{p_{k−l}}^{k−l}, and we can extend them to a basis of P_{k−l−1}, say by vectors e_{p_{k−l}+1}^{k−l−1}, …, e_{p_{k−l−1}}^{k−l−1}. By gradually subtracting the values of the iterations of the mapping φ on these vectors we achieve that the vectors added to the basis of
Also, for n ∈ N we can directly determine

Tⁿ = ( (1/6)ⁿ          0                0                0                0           0 )
     ( (2/6)ⁿ − (1/6)ⁿ (2/6)ⁿ           0                0                0           0 )
     ( (3/6)ⁿ − (2/6)ⁿ (3/6)ⁿ − (2/6)ⁿ  (3/6)ⁿ           0                0           0 )
     ( (4/6)ⁿ − (3/6)ⁿ (4/6)ⁿ − (3/6)ⁿ  (4/6)ⁿ − (3/6)ⁿ  (4/6)ⁿ           0           0 )
     ( (5/6)ⁿ − (4/6)ⁿ (5/6)ⁿ − (4/6)ⁿ  (5/6)ⁿ − (4/6)ⁿ  (5/6)ⁿ − (4/6)ⁿ  (5/6)ⁿ      0 )
     ( 1 − (5/6)ⁿ      1 − (5/6)ⁿ       1 − (5/6)ⁿ       1 − (5/6)ⁿ       1 − (5/6)ⁿ  1 )
The values in the first column correspond gradually to the probabilities that n times in a row the result is 1; that n times in a row the result is 1 or 2 with at least one 2 (therefore we subtract the probability given in the first row); that n times in a row the result is 1, 2 or 3 with at least one 3; and so on, up to the last row, where there is the probability that at least once during the n throws the result is 6 (this can be easily derived from the probability of the complementary event). Similarly, in the fourth column there are the non-zero probabilities of the events „n times in a row the result is 1, 2, 3 or 4", „n times in a row the result is 1, 2, 3, 4 or 5 and at least once it is 5" and „at least once during the n attempts the result is 6". The interpretation of the matrix T as the probabilistic transition matrix of a Markov process thus allows for a quick expression of the powers Tⁿ, n ∈ N. □
3.37. In this problem we deal with a certain property of an animal species which is determined not by the sex but only by a certain gene, a tuple of alleles. Every individual gains one allele from each of its parents, randomly and independently. The forms of the gene are given by the two alleles a, A; they form the three possible states aa, aA = Aa and AA of the property.
(a) Assume that each individual of a certain population mates only with an individual of another population, in which only the property given by the tuple aA appears. Exactly one of their offspring (a randomly chosen one) is left on the spot, and he will also mate only with an individual of that specific population, and so on. Determine the probabilities of the appearance of aa, aA, AA in the considered population after a certain time.
(b) Solve the problem given in the case (a), if the other population is composed only of individuals with the tuple AA.
(c) Two randomly chosen individuals of opposite sex are bred. From their progeny, again two individuals of opposite sex are randomly chosen and bred. If this carries on for a long time, compute the probability that both of the bred individuals have the tuple of alleles AA, or both aa (then the process of breeding ends).
P_{k−l−1} lie in the kernel of φ. This implies that if the matrix J of the mapping φ is in the Jordan canonical form, with d_l(λ) denoting the number of Jordan blocks of size l with the eigenvalue λ and r_l(λ) denoting the rank of the matrix (J − λE)ˡ, then

n − r_l(λ) = d₁(λ) + 2d₂(λ) + ⋯ + l·d_l(λ) + l·d_{l+1}(λ) + ⋯

From here we calculate

d_l(λ) = r_{l−1}(λ) − 2r_l(λ) + r_{l+1}(λ)

(where the last row arises by combining the previous one for the values l − 1, l, l + 1).
3.41. Note. The proof of the theorem about the existence of the Jordan canonical form was constructive, but it does not give a perfect algorithmic approach for the construction. Let us now summarise the approach already derived for the explicit computation of a basis under which a given mapping φ : V → V has a matrix in the canonical Jordan form.
(1) We find the roots of the characteristic polynomial.
(2) If there are fewer than n = dim V of them (counting multiplicities), there is no canonical form.
(3) If there are n linearly independent eigenvectors, we obtain a basis of V composed of eigenvectors, under which φ has a diagonal matrix.

T = ( 1 1/2 0 )
    ( 0 1/2 1 )
    ( 0 0   0 )

We immediately see all the eigenvalues 1, 1/2 and 0 (if we subtract any of them from the diagonal, the rank of the resulting matrix is not 3, that is, the homogeneous system given by this matrix has a non-trivial solution). To these eigenvalues then respectively correspond the eigenvectors

(1, 0, 0)ᵀ, (−1, 1, 0)ᵀ, (1, −2, 1)ᵀ.

(4) Otherwise the eigenvectors correspond to the upper border of the scheme from the proof of the theorem 3.39, and a suitable basis must be found by applying the iterations of φ − λ·idV.

From there for an arbitrary n ∈ N it follows that

Tⁿ = ( 1 −1  1 ) ( 1 0   0 ) ( 1 1 1 )
     ( 0  1 −2 ) ( 0 2⁻ⁿ 0 ) ( 0 1 2 )
     ( 0  0  1 ) ( 0 0   0 ) ( 0 0 1 )

Clearly for big n ∈ N we can substitute 0 for 2⁻ⁿ, which implies

Tⁿ ≈ ( 1 −1  1 ) ( 1 0 0 ) ( 1 1 1 )   ( 1 1 1 )
     ( 0  1 −2 ) ( 0 0 0 ) ( 0 1 2 ) = ( 0 0 0 )
     ( 0  0  1 ) ( 0 0 0 ) ( 0 0 1 )   ( 0 0 0 )
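A numerical cross-check of this diagonalisation (a sketch in Python with numpy; ours, not part of the solution):

import numpy as np

T = np.array([[1, 0.5, 0],
              [0, 0.5, 1],
              [0, 0,   0]])
P = np.array([[1., -1.,  1.],
              [0.,  1., -2.],
              [0.,  0.,  1.]])             # columns are the eigenvectors

n = 40
Tn = P @ np.diag([1.0, 0.5 ** n, 0.0]) @ np.linalg.inv(P)
assert np.allclose(Tn, np.linalg.matrix_power(T, n))
print(np.round(Tn, 9))                     # -> rows (1,1,1), (0,0,0), (0,0,0)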
Thus if the individuals of the original population procreate exclusively with the members of the specific population (the one that has only AA), then after a sufficient number of breedings the tuples aA and aa are necessarily eliminated completely (and it does not matter what their original distribution was).
The case (c). Now we have 6 possible states (in this order)
AA, AA; aA,AA; aa, AA;
aA,aA; aa,aA; aa,aa,
while these states are given by the genotypes of the parents. The matrix of the corresponding Markov chain is
T = ( 1 1/4 0 1/16 0   0 )
    ( 0 1/2 0 1/4  0   0 )
    ( 0 0   0 1/8  0   0 )
    ( 0 1/4 1 1/4  1/4 0 )
    ( 0 0   0 1/4  1/2 0 )
    ( 0 0   0 1/16 1/4 1 )
If we consider for instance the situation (the second column) where one of the parents has the tuple AA and the other one has aA, then clearly each of the four cases (we are talking about the tuples of alleles of two randomly chosen offspring)

AA, AA; AA, aA; aA, AA; aA, aA

occurs with the same probability. The probability of staying in the second state is thus 1/2, the probability of a transition from the second state to the first is 1/4, and to the fourth state also 1/4.
Now we should again determine the powers T" for big n e N. Considering the form of the first and of the last column we immediately
where U is an upper triangular matrix, and thus

A = L·U,

where L is a lower triangular matrix with ones on the diagonal and U is upper triangular. This decomposition is called the LU-decomposition of the matrix A.
For a general matrix, the Gaussian elimination to the row echelon form may need some additional row permutations, sometimes even column permutations. Then we obtain the more general decomposition

A = P·L·U·Q,

where P and Q are suitable permutation matrices.
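For instance, scipy provides the variant with a row permutation; a small sketch (assuming scipy is available; scipy.linalg.lu returns the factorisation A = P·L·U):

import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 2.0],
              [3.0, 4.0]])        # the elimination needs a row swap

P, L, U = lu(A)                   # A = P @ L @ U
assert np.allclose(A, P @ L @ U)
print(L)                          # lower triangular, ones on the diagonal
print(U)                          # upper triangular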
3.43. Notes. A direct corollary of the Gaussian elimination is also the realisation that, up to a choice of suitable bases on the domain and the codomain, every mapping f : V → W is given by a matrix in block-diagonal form, with a unit matrix of the size given by the dimension of the image of f and with zero blocks all around. This can be reformulated as follows: every matrix A of the type m/n over a field of scalars K can be decomposed into the product

A = P ( E 0 ) Q,
      ( 0 0 )

where P and Q are suitable invertible matrices.
For square matrices we have shown in 3.32, when discussing the properties of linear mappings f : V → V over complex vector spaces, that every square matrix A of the dimension m can be decomposed into the product
A = P·B·P⁻¹,
where B is block-diagonal with Jordan blocks associated to eigenvalues on the diagonal. Indeed, it is just a reformulation of the Jordan theorem, because multiplying by the matrix P and by its inverse from the other side corresponds in this case just to a change of the basis on the vector space V and the cited theorem says that in a suitable basis every mapping has Jordan canonical form.
Analogously, when discussing the self-adjoint mappings we proved that for real symmetric matrices and for complex Hermitian matrices there always exists a decomposition into the product

A = P·B·P*,

where B is a diagonal matrix with all the (always real) eigenvalues on the diagonal, counting multiplicities. Indeed, the matrices P and P* again stand for a change of the basis, but we allow only changes between orthonormal bases, so the matrix P of the change must be orthogonal (unitary). From there we have P⁻¹ = P*.
For real orthogonal mappings we derived an expression analogous to the symmetric case, only with B block-diagonal with blocks of size two or one, expressing either a rotation, a mirror symmetry or the identity with respect to the corresponding subspaces.
3.44. Singular decomposition theorem. Let us return to general linear mappings between vector spaces (in general distinct ones). If a scalar product is defined on them and we restrict ourselves to orthonormal bases only, we must proceed in a more refined way than in the case of arbitrary bases.
find out that 1 is an eigenvalue of the matrix T. It is very easy to find the corresponding eigenvectors

(1, 0, 0, 0, 0, 0)ᵀ,  (0, 0, 0, 0, 0, 1)ᵀ.

By considering only the four-dimensional submatrix of the matrix T (omitting the first and the sixth rows and columns) we find the remaining eigenvalues

1/2, 1/4, (1 − √5)/4, (1 + √5)/4.
If we recall the solution of the exercise called Sweet-toothed gambler, we do not have to compute Tⁿ. In that exercise we obtained the same eigenvectors corresponding to the eigenvalue 1, and the other eigenvalues also had their absolute values strictly smaller than 1 (the exact values were not used). Thus we obtain the identical conclusion: the process approaches the probabilistic vector

(a, 0, 0, 0, 0, 1 − a)ᵀ,

where a ∈ [0, 1] is given by the initial state. Because a non-zero number can occur only at the first and the sixth position of the resulting vector, the states

aA, AA;  aa, AA;  aA, aA;  aa, aA

disappear after many breedings. Let us further realise (this also follows from the exercise Sweet-toothed gambler) that the probability that the process ends with AA, AA equals the relative ratio of the occurrence of the allele A in the initial state.
The case (d). Let the values a, b, c ∈ [0, 1] give, in this order, the relative ratios of the occurrence of the tuples AA, aA, aa in the given population. We want to express the relative ratios of the tuples AA, aA, aa in the offspring of the population. If the choice of the tuples for breeding is random, then for a suitably big population it can be expected that the relative ratio of breedings where both individuals have AA is a², the relative ratio for the tuple aA with AA is 2ab, the relative ratio for aA with aA is b², and so on. The offspring of parents with the tuples AA, AA must inherit AA. The probability that the offspring of parents with the tuples AA, aA has AA is clearly 1/2, and the probability that the offspring of parents with the tuples aA, aA has AA is 1/4. There are no other cases giving an offspring with the tuple AA (if one of the parents has the tuple aa, then the offspring cannot have AA). The relative frequency of AA in the progeny is thus

a² · 1 + 2ab · (1/2) + b² · (1/4) = a² + ab + b²/4.
Theorem. Let A be any matrix of the type m/n over real or complex scalars. Then there exist square unitary matrices U and V of the dimensions m and n, and a real diagonal matrix D with non-negative elements of the dimension r, r ≤ min{m, n}, such that

A = USV*,   S = ( D 0 )
                ( 0 0 )

and r is the rank of the matrix AA*. Furthermore, S is determined uniquely up to the order of its elements, and the elements of the diagonal matrix D are the square roots of the eigenvalues d_i of the matrix AA*. If A is a real matrix, then the matrices U and V are orthogonal.
Proof. Assume first that m ≤ n and denote by φ : Kⁿ → Kᵐ the mapping between the real or complex spaces with standard scalar products, given by the matrix A under the standard bases.

We can reformulate the statement of the theorem as follows: there exist orthonormal bases on Kⁿ and Kᵐ under which the mapping φ has the matrix S. The matrix A*A is positively semidefinite and Hermitian, so there exists a unitary matrix V such that B = V*A*AV is diagonal, with the (real and non-negative) non-zero eigenvalues d₁, …, d_r placed first on the diagonal. From there

B = V*A*AV = (AV)*(AV).

That is equivalent to the claim that the first r columns of the matrix AV are orthogonal and the remaining ones are zero, because they have zero size.

Let us now denote the first r columns by v₁, …, v_r ∈ Kᵐ. It thus holds that ⟨v_i, v_i⟩ = d_i, i = 1, …, r, and the normalised vectors u_i = (1/√d_i)·v_i form an orthonormal system of non-zero vectors. Let us extend them to an orthonormal basis u = u₁, …, u_m of the whole Kᵐ. Expressing our original mapping under the bases given by the columns of V and by u, we obtain exactly the matrix S. If m > n, we can apply the previous part of the proof to the matrix A*. From there we directly obtain the desired claim.

If we work over real scalars, all the previous steps of the proof are realised in the real domain as well. □
3.45. Geometric interpretation. Diagonal values of the matrix D from the previous theorem are called singular values of the matrix A. Let us reformulate this theorem in the real case more geometrically.
Analogously we gradually determine the relative frequencies of the tuples aA and aa in the progeny:

ab + bc + 2ac + b²/2   and   c² + bc + b²/4.

This process can be viewed as a mapping T that transforms the vector (a, b, c)ᵀ. It holds that

T : (a, b, c)ᵀ ↦ (a² + ab + b²/4, ab + bc + 2ac + b²/2, c² + bc + b²/4)ᵀ.

Let us mention that the domain (and also the codomain) of T consists of just the vectors (a, b, c)ᵀ, where a, b, c ∈ [0, 1], a + b + c = 1.

We would like to express the operation T by multiplying the vector by some constant matrix. But that is clearly not possible (the mapping T is not linear). It is thus not a Markov process, and the determination of what happens after a long time cannot be simplified as in the previous cases. But we can compute what happens if we apply the mapping T twice in a row. In the second step we obtain

T : (a² + ab + b²/4, ab + bc + 2ac + b²/2, c² + bc + b²/4)ᵀ ↦ (t₁, t₂, t₃)ᵀ, where

t₁ = (a² + ab + b²/4)² + (a² + ab + b²/4)(ab + bc + 2ac + b²/2) + (1/4)(ab + bc + 2ac + b²/2)²,

t₂ = (a² + ab + b²/4)(ab + bc + 2ac + b²/2) + (ab + bc + 2ac + b²/2)(c² + bc + b²/4) + 2(a² + ab + b²/4)(c² + bc + b²/4) + (1/2)(ab + bc + 2ac + b²/2)²,

t₃ = (c² + bc + b²/4)² + (ab + bc + 2ac + b²/2)(c² + bc + b²/4) + (1/4)(ab + bc + 2ac + b²/2)².

It can be shown (using a + b + c = 1) that

t₁ = a² + ab + b²/4,   t₂ = ab + bc + 2ac + b²/2,   t₃ = c² + bc + b²/4,
For the corresponding linear mappings φ : Rⁿ → Rᵐ the singular values have indeed a simple geometric meaning: let K ⊂ Rⁿ be the unit ball for the standard scalar product. The image φ(K) is then always an m-dimensional ellipsoid (possibly degenerate). The singular values of the matrix A are the sizes of its main half-axes, and the theorem further says that the original ball always allows mutually perpendicular diameters whose images are exactly all the half-axes of this ellipsoid.
For square matrices it can be seen that A is invertible if and only if all the singular values are non-zero. The ratio of the greatest to the smallest singular value is an important parameter for the robustness of numerical computations with matrices, for instance for the computation of the inverse matrix. Let us also note that there exist fast methods of computation (approximation) of the eigenvalues, thus the singular decomposition is very effective to work with.
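A small numerical sketch in Python with numpy (the example matrix is ours): the singular decomposition, the half-axes and the ratio just mentioned,

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 0.0]])                # a real 3x2 matrix

U, s, Vt = np.linalg.svd(A)               # A = U S V^T
S = np.zeros((3, 2)); S[:2, :2] = np.diag(s)
assert np.allclose(A, U @ S @ Vt)

print(s)                                  # singular values -> [4. 2.]
print(s[0] / s[-1])                       # ratio of the half-axes: 2.0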
3.46. Polar decomposition theorem. The singular decomposi-f% tion theorem is a starting point for many very useful tools. Let us now think about some direct corollaries (which by themselves are quite non-trivial). The statement of the theorem says that for any matrix A, real or complex, A — U SW* with S diagonal with non-negative real numbers on the diagonal and U and W unitary. But then also A — USU*UW* and let us call the matrices P = USU*, V = UW*. First of them, P, is Hermitian (in real case symmetric) and positively semidefinite, because it regards just how to write down the mapping with real diagonal matrix S in another orthonormal basis, while V is a product of two unitary matrices and thus again unitary (in the real case orthogonal). Furthermore A* — WSU* and thus AA* — USSU* — P2 and our matrix P is actually the square root of the easily computable Hermitian matrix A A*.
Assume that A = PV = QU are two such decompositions of the matrix A into the product of a positive semidefinite Hermitian and a unitary matrix, and assume that A is invertible. But then

AA* = PVV*P = P² = QUU*Q = Q²

is positive definite, and thus the matrices Q = P = √(AA*) are uniquely determined and invertible. But then also U = V = P^{−1}A.
We have thus completely derived a very useful analogy of the decomposition of a real number into a sign (orthogonal matrices in dimension one are exactly ±1) and the absolute value (the matrix P, for which we can compute the square root).
Theorem (Polar decomposition theorem). Every square complex matrix A of dimension n can be expressed in the form A = P·V, where P is a Hermitian positive semidefinite square matrix of the same dimension and V is unitary. We have P = √(AA*). If A is invertible, the decomposition is unique and V = (√(AA*))^{−1}A.

If we work over real scalars, P is symmetric and V orthogonal.
If we apply the same theorem to A* instead of A, we obtain the same result, but with the order of the Hermitian and unitary matrices reversed. The matrices in the corresponding right and left decompositions will of course be in general distinct.
that is,

T : (a^2 + ab + b^2/4, ab + bc + 2ac + b^2/2, c^2 + bc + b^2/4)^T ↦ (a^2 + ab + b^2/4, ab + bc + 2ac + b^2/2, c^2 + bc + b^2/4)^T.
We have obtained a surprising result: further applications of the transform T do not change the vector obtained in the first step. That means that the distribution of the considered tuples is, after an arbitrarily long time, the same as in the first generation of offspring. For a big population we have thus proven that the evolution takes place during the first generation (unless there are some mutations or selection). □
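The stabilisation after the first generation can also be verified symbolically; the sketch below assumes the sympy library and encodes the mapping T from the text.

    import sympy as sp

    a, b, c = sp.symbols('a b c', nonnegative=True)

    def T(a, b, c):
        # the non-linear mapping from the text
        return (a**2 + a*b + b**2/4,
                a*b + b*c + 2*a*c + b**2/2,
                c**2 + b*c + b**2/4)

    once = T(a, b, c)
    twice = T(*once)
    # on the set a + b + c = 1 the second application changes nothing:
    print([sp.expand((t2 - t1).subs(c, 1 - a - b)) for t1, t2 in zip(once, twice)])
    # -> [0, 0, 0]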
3.38. Let there be two boxes which together contain n white and n black balls. At regular time intervals a ball is taken from each box and moved to the other one, so the number of balls in each of the boxes is at the beginning (and thus for all the time) equal to n. Give for this Markov process its probabilistic transition matrix T.
Solution. This case is often used in physics as a model of blending two incompressible liquids (introduced by D. Bernoulli already in the year 1769) or, analogously, as a model of diffusion of gases. The states 0, 1, ..., n correspond, for instance, to the number of white balls in the first box. This information already says how many black balls are in the first box (and the remaining balls are then in the second box). If in a certain step the state changes from j ∈ {1, ..., n} to j − 1, it means that a white ball was drawn from the first box and a black ball from the second box. That happens with probability
(j/n) · (j/n) = j²/n².
Transition from the state j ∈ {0, ..., n − 1} to the state j + 1 corresponds to drawing a black ball from the first box and a white ball from the second box, with probability
((n − j)/n) · ((n − j)/n) = (n − j)²/n².
The system stays in the state j ∈ {1, ..., n − 1} if balls of the same colour are drawn from both boxes, which happens with probability

(j/n) · ((n − j)/n) + ((n − j)/n) · (j/n) = 2j(n − j)/n².
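A small sketch (assuming numpy) which assembles the matrix T from the three probabilities just computed; the columns are indexed by the current state j and sum to one, as they should for a Markov process.

    import numpy as np

    def transition_matrix(n):
        # states 0, 1, ..., n = the number of white balls in the first box
        T = np.zeros((n + 1, n + 1))
        for j in range(n + 1):
            if j >= 1:
                T[j - 1, j] = (j / n) ** 2             # white out of box 1, black in
            if j <= n - 1:
                T[j + 1, j] = ((n - j) / n) ** 2       # black out of box 1, white in
            T[j, j] = 2 * j * (n - j) / n ** 2         # balls of the same colour drawn
        return T

    T = transition_matrix(4)
    print(T.sum(axis=0))                               # all ones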
In the complex case the analogy with the decomposition of numbers is even more remarkable: the positive semidefinite P again plays the role of the absolute value of a complex number, and the unitary matrix V has a unique expression as a sum V = re V + i im V with Hermitian real and imaginary parts and with the property (re V)² + (im V)² = E; that is, we obtain a full analogy of the polar form of complex numbers (see the final remark in 3.30). Note however that in higher dimensions it is important in which order this „polar form" of the matrix is written; it is possible both ways, but the results are in general distinct.
For many practical applications it is faster to use the so-called QR decomposition of matrices, which is an analogy of the Schur orthogonal triangulation theorem:
3.47. Theorem. For every complex matrix A of the type m/n there exist a unitary matrix Q and an upper triangular matrix R such that A = Q·R.

If we work over real scalars, both Q and R are real.
Proof. In the geometric formulation we need to prove that for every mapping φ : K^n → K^m with the matrix A under the standard bases we can choose a new orthonormal basis on K^m under which the mapping has an upper triangular matrix; such a basis is provided by the Gram-Schmidt orthonormalisation of the columns of the matrix A. □

In the pair of orthonormal bases coming from the theorem about the singular decomposition, the matrix A takes the block form

A = ( D  0 )
    ( 0  0 )

with D diagonal and invertible, and a candidate matrix B for the pseudoinverse can be written in the block form

B = ( D^{−1}  P )
    (   Q     R )

for suitable matrices P, Q and R. But now

BA = ( D^{−1}  P ) ( D  0 )   ( E   0 )
     (   Q     R ) ( 0  0 ) = ( QD  0 )
suffices for him to have €346 if p = 0.495 (or €1727 if p = 0.499). Therefore it is possible in big casinos that „passionate" players play almost fair games. □
3.40. In a certain company there are two competing departments. The management has decided to measure every week the relative incomes (with respect to the number of employees) attained by these two departments. Two employees of the less successful department are then moved to the more successful one. This process goes on as long as both departments have some employees. You have gained a position in this company and you can choose one of the two departments to work in. You want to choose the department which won't be cancelled due to the employee movement. What will be your choice, if one of the departments has 40 employees, the other 10, and you estimate that the smaller one will have a relatively greater income in 54 % of the cases? O

Further applications of Markov chains can be found in the additional exercises after this chapter.
E. Unitary spaces
Already in the previous chapter we defined the scalar product on real vector spaces (2.40); in this chapter we extend the definition to complex spaces too (3.23).
3.41. Groups O(n) and U(n). Consider all linear mappings from R^n to R^n which preserve the given scalar product, that is (with respect to the definitions of the lengths of vectors and the deviations of two vectors), all linear mappings that preserve lengths and angles. These mappings form a group with respect to the operation of composition (see 1.1): the composition of two such mappings is, by the definition, also a mapping that preserves lengths and angles, the unit element of the group is the identity mapping, and the inverse element for a given mapping is its inverse mapping, which exists thanks to the condition on the preservation of lengths. The matrices of such mappings thus form a group with the operation of matrix multiplication; it is called the orthogonal group and is denoted by O(n). It is a subgroup of the group of all invertible mappings from R^n to R^n.
If we additionally require that the matrices have determinant one, we speak of the special orthogonal group SO(n) (in general the determinant of a matrix in O(n) can be either 1 or −1).
Similarly we define the unitary group U(n) as the group of all (complex) matrices that correspond to the complex linear mappings from
should be Hermitian, thus QD = O, and then also Q = O (the matrix D is diagonal and invertible). Analogously, the assumption that AB is Hermitian implies that P is zero. Additionally, we have

B = BAB = ( D^{−1}  0 ) ( D  0 ) ( D^{−1}  0 )   ( D^{−1}  0 )
          (   0     R ) ( 0  0 ) (   0     R ) = (   0     0 ).

On the right-hand side there is zero in the lower right corner, and thus also R = O and the claim is proven.
(4): Consider the mapping φ : K^n → K^m, x ↦ Ax, and the direct sums K^n = (Ker φ)⊥ ⊕ Ker φ, K^m = Im φ ⊕ (Im φ)⊥. The restricted mapping φ̃ := φ|_(Ker φ)⊥ : (Ker φ)⊥ → Im φ is a linear isomorphism. If we choose suitable orthonormal bases on (Ker φ)⊥ and Im φ and extend them to orthonormal bases of the whole spaces, the mapping φ has the matrix S and φ̃ the matrix D from the theorem about the singular decomposition. For a given b ∈ K^m, the point z ∈ Im φ which minimises the distance ‖b − z‖ (that is, the point which realises the distance ρ(b, Im φ), see the next chapter) is exactly the component z = b1 of the decomposition b = b1 + b2, b1 ∈ Im φ, b2 ∈ (Im φ)⊥. In the suitably chosen bases, the mapping given under the standard bases by the pseudoinverse A^(−1) maps b1 onto φ̃^{−1}(b1) ∈ (Ker φ)⊥ and annihilates all of (Im φ)⊥.

From the point (4) of the previous theorem we obtain that the matrix AA^(−1) is the matrix, in the vector space R^n (n being the number of the rows of the matrix A), of the perpendicular projection onto the subspace generated by the columns of the matrix A (this interpretation is of course meaningful only for matrices with more rows than columns).

Furthermore, for matrices A whose columns are independent vectors, the expression (A^T A)^{−1} A^T makes sense, and it is not hard to verify that this matrix satisfies all the properties from (1) and (2) of the previous theorem; thus it is the pseudoinverse of the matrix A.
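Both descriptions of the pseudoinverse can be checked against each other numerically; the following sketch assumes the numpy library, and the matrix A is an arbitrary illustration with independent columns, not one from the text.

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 1.],
                  [2., 3.]])                      # independent columns

    pinv_svd = np.linalg.pinv(A)                  # computed via the singular decomposition
    pinv_formula = np.linalg.inv(A.T @ A) @ A.T   # the formula (A^T A)^{-1} A^T
    print(np.allclose(pinv_svd, pinv_formula))    # True
    print(np.allclose(A @ pinv_svd @ A, A))       # one of the defining properties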
C" to C" that preserve a given scalar product in a unitary space. Analogously, SU(n) denotes the subgroup of matrices in U(n) with determinant one (in general, determinant of matrix in U(n) can be any complex unit).
3.42. Consider the vector space V of functions R → C. Determine whether the mapping

From the definition of exp one can show that exp(A + B) = exp(A)·exp(B) holds for commuting matrices, as we are used to with the exponential in the domain of numbers. Because in general (u + v)* = u* + v* and (cv)* = c̄v*, we obtain

U* = ( Σ_{n=0}^∞ (1/n!) (iH)^n )* = Σ_{n=0}^∞ (1/n!) (−iH*)^n,

and because H* = H, then

U* = Σ_{n=0}^∞ (1/n!) (−iH)^n = exp(−iH),

and thus

U*U = exp(iH) exp(−iH) = exp(0) = E.

Moreover, det(U) = e^{trace(iH)}. □
3.44. Hermitian matrices A, B, C satisfy [A, C] = [B, C] = 0 and [A, B] ≠ 0, where [·,·] is the commutator of matrices defined by the relation [A, B] = AB − BA. Show that at least one eigensubspace of the matrix C must have dimension > 1.
Solution. We prove it by contradiction. Assume that all eigensubspaces of the operator C have dimension 1. Then any vector u can be written as u = Σ_k c_k u_k, where the u_k are linearly independent eigenvectors of the operator C associated with the eigenvalues λ_k (and c_k = ⟨u, u_k⟩). For these eigenvectors it clearly holds that

0 = [A, C]u_k = ACu_k − CAu_k = λ_k Au_k − C(Au_k).
3.50. Linear regression. The approximation property (3) from the previous theorem is very useful in cases where we are to find the best possible approximation of the (non-existent) solution of a given system Ax = b, where A is a real matrix of the type m/n and m > n.
For instance, an experiment gives us many measured real values b_i, and we want to find a linear combination of some functions f_j which approximates the values b_i. The actual values of the chosen functions at the points y_i ∈ R give a matrix a_ij = f_j(y_i), whose columns are given by the values of the individual functions f_j at the considered points, and our goal is to determine the coefficients x_j ∈ R so that the sum of the squares of the deviations from the actual values,

Σ_i ( b_i − Σ_j x_j f_j(y_i) )²,

is minimised. In other words, we seek a linear combination of the functions f_j which interpolates the given values b_i "well". Thanks to the previous theorem, the optimal coefficients are

x = A^(−1) b.
In order to have a more specific idea, consider just two functions f_1(x) = x, f_2(x) = x² and assume that the "measured values" of their unknown combination g(x) = y_1 x + y_2 x² at the integral values of x between 1 and 10 are

b^T = (1.44, 10.64, 4.48, 14.56, 31.12, 39.20, 54.88, 71.28, 85.92, 104.16).

This vector arose by computing the values of x + x² at the given points, shifted by random values in the range ±8. The matrix A = (a_ij) is in our case equal to
A^T = ( 1  2  3   4   5   6   7   8   9   10  )
      ( 1  4  9  16  25  36  49  64  81  100 )
and the coefficients in the combination are

x = A^(−1) b = (0.61, 0.99)^T.
The resulting interpolation can be seen in the picture, where the given values b are interpolated with a green polygonal chain, while the red graph corresponds to the combination g. The computations were done in the system Maple using the command leastsqrs(B,b). If you are familiar with Maple (or some other similar software), try to do some experiments with similar tasks.
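If Maple is not at hand, the same experiment can be carried out, for instance, in Python; the sketch below assumes the numpy library and uses the vector b printed above.

    import numpy as np

    x = np.arange(1, 11)
    A = np.column_stack([x, x**2])            # columns: values of f1(x) = x and f2(x) = x^2
    b = np.array([1.44, 10.64, 4.48, 14.56, 31.12,
                  39.20, 54.88, 71.28, 85.92, 104.16])

    coef, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimises ||Ax - b||
    print(coef)                                    # approximately (0.61, 0.99)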
From there we see that Au_k is an eigenvector of the matrix C with the eigenvalue λ_k. But that means that Au_k = λ_A u_k for some number λ_A. Similarly we derive Bu_k = λ_B u_k for some number λ_B. For the commutator of the matrices A and B we then obtain

[A, B]u_k = ABu_k − BAu_k = λ_A λ_B u_k − λ_B λ_A u_k = 0.
But that means that

[A, B]u = [A, B] Σ_k c_k u_k = Σ_k c_k [A, B]u_k = 0,

and because u was arbitrary, [A, B] = 0, which is a contradiction. □
3.45. Applications in quantum physics. In quantum physics we do not assign to a quantity a numerical value, as in classical physics, but a Hermitian operator. That is nothing but a Hermitian mapping, which however can (and often does) act on a unitary space of infinite dimension (we can imagine it as a matrix of infinite dimension). Vectors in this unitary space then represent the states of the given physical system. When measuring a given physical quantity we obtain only values that are eigenvalues of the corresponding operator.

For instance, instead of the coordinate x we have the operator x̂ of the coordinate, which acts as multiplication by x. If the state of the system is described by the vector v, then x̂(v) = xv, that is, the vector is multiplied by the real number x. At first glance this Hermitian operator is different from our cases of finite dimension: evidently every real number is an eigenvalue (x̂ has the so-called continuous spectrum). Similarly, in place of speed (more precisely, momentum) we have the operator p̂ = −i d/dx. The eigenvectors are the solutions of the differential equation −i dv/dx = λv. Even in this case the spectrum is continuous. That expresses the fact that the corresponding physical quantity is continuous (it can attain any real value). On the other hand, we have physical quantities, for instance energy, that can attain only discrete values (energy exists in quanta). The corresponding operators are then really similar to Hermitian matrices, they just have infinitely many eigenvalues.
3.46. Show that x̂ and p̂ are Hermitian and that

[x̂, p̂] = i.
Solution. For any vector v it holds that

[x̂, p̂]v = x̂p̂v − p̂x̂v = x(−i dv/dx) + i d(xv)/dx = iv,

and from there we directly have our claim. □
3.47. Show that

[x̂ − p̂, x̂ + p̂] = 2i.

Solution. Evidently [x̂, x̂] = 0 and [p̂, p̂] = 0, and the rest follows from the linearity of the commutator and the previous exercise. □
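Both commutator identities can be reproduced symbolically; the following sketch assumes the sympy library, with v standing for an arbitrary differentiable state.

    import sympy as sp

    x = sp.symbols('x', real=True)
    v = sp.Function('v')(x)

    p = lambda f: -sp.I * sp.diff(f, x)       # the momentum operator p = -i d/dx
    xop = lambda f: x * f                     # the coordinate operator

    comm = lambda A, B, f: A(B(f)) - B(A(f))
    print(sp.simplify(comm(xop, p, v)))       # -> I*v(x), i.e. [x, p] = i
    print(sp.simplify(comm(lambda f: xop(f) - p(f),
                           lambda f: xop(f) + p(f), v)))   # -> 2*I*v(x)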
3.48. Jordan form. Find the Jordan form of the matrices i) A = (−1 1; −6 4), ii) A = (−1 1; −4 3). What is the geometric interpretation of these decompositions?
Solution. i) We first compute the characteristic polynomial of the matrix A:

|A − λE| = |−1−λ  1; −6  4−λ| = λ² − 3λ + 2.

The eigenvalues of A are the roots of this polynomial, that is, λ_{1,2} = 1, 2. Because the matrix is of order two and has two distinct eigenvalues, its Jordan form is the diagonal matrix J = (1 0; 0 2). The eigenvector (x, y) associated with the eigenvalue 1 satisfies 0 = (A − E)x, that is, −2x + y = 0, which holds exactly for the multiples of the vector (1, 2). Similarly we find that the eigenvector associated with the eigenvalue 2 is (1, 3). The matrix P is then obtained by writing these eigenvectors into the columns, that is, P = (1 1; 2 3). For the matrix A we then have A = P·J·P^{−1}. The inverse of P is P^{−1} = (3 −1; −2 1), and we obtain

(−1 1; −6 4) = (1 1; 2 3) (1 0; 0 2) (3 −1; −2 1).

This decomposition tells us that the matrix A determines the linear mapping which has the aforementioned diagonal form in the basis of the eigenvectors (1, 2), (1, 3). That means that in the direction (1, 2) nothing changes and in the direction (1, 3) every vector is stretched to double.
ii) The characteristic polynomial of the matrix A is in this case

|A − λE| = |−1−λ  1; −4  3−λ| = λ² − 2λ + 1.

We obtain the double root λ = 1, and the corresponding eigenvector (x, y) satisfies

0 = (A − E)x = (−2 1; −4 2) (x; y).

The solutions are, as in the previous case, the multiples of the vector (1, 2). The fact that the system does not have two linearly independent vectors as a
solution says that the Jordan form in this case is not diagonal; it will be the matrix J = (1 1; 0 1). The basis for which A has this form consists of the eigenvector (1, 2) and a vector that is mapped onto this eigenvector by the mapping A − E; it is thus a solution of the system of equations

(−2 1; −4 2) (x; y) = (1; 2).

One solution is, for instance, the vector (1, 3). We obtain the same basis as in the previous case and we can write

(−1 1; −4 3) = (1 1; 2 3) (1 1; 0 1) (3 −1; −2 1).

The mapping now acts on a vector as follows: the component in the direction (1, 3) stays the same, while the new component in the direction (1, 2) is the sum of the original components in the directions (1, 2) and (1, 3). □
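Computations of this kind are easy to verify, for instance with the sympy library (a minimal sketch; the matrices are the two from this exercise):

    import sympy as sp

    for A in (sp.Matrix([[-1, 1], [-6, 4]]),
              sp.Matrix([[-1, 1], [-4, 3]])):
        P, J = A.jordan_form()                    # A = P * J * P**(-1)
        print(J)                                  # diag(1, 2), resp. [[1, 1], [0, 1]]
        print(sp.simplify(P * J * P.inv() - A))   # the zero matrix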
3.49. Find the Jordan forms of the matrices

A1 = (1/3) (5 −1; −2 4)   and   A2 = (1/3) (5 −1; 4 1),

write down the decompositions, give their geometric interpretation and draw how the vectors v = (3, 0), A1v and A2v decompose with respect to the basis of the eigenvectors of the matrices A1, A2.

Solution. The matrices have the same Jordan forms as the matrices in the previous exercise, now both with respect to the basis of the vectors (1, 2) and (1, −1), that is,

(1/3) (5 −1; −2 4) = (1 1; 2 −1) (1 0; 0 2) · (1/3) (1 1; 2 −1)

and

(1/3) (5 −1; 4 1) = (1 1; 2 −1) (1 1; 0 1) · (1/3) (1 1; 2 −1).

For the vector v = (3, 0) we obtain v = (1, 2) + 2(1, −1), and for its images A1v = (5, −2) = (1, 2) + 2·2·(1, −1) and A2v = (5, 4) = (2 + 1)·(1, 2) + 2·(1, −1). □
F. Matrix decompositions
3.50. Prove or disprove:
• Let A be a square matrix n x n. Then the matrix AT A is symmetric.
• Let A be a square matrix with only real positive eigenvalues. Then A is symmetric.
3.51. Find an LU-decomposition of the following matrix:
( −2  1   0 )
( −4  4   2 )
( −6  1  −1 ).
Solution. The Gaussian elimination corresponds to multiplications by lower triangular matrices; carrying it out, we obtain for the original matrix A the relation X·A = U, where X is the lower triangular matrix given by the Gaussian reduction and U is upper triangular. From this equality we have A = X^{−1}·U, which is the desired decomposition (we thus have to compute the inverse of X):

( −2  1   0 )   ( 1   0  0 ) ( −2  1  0 )
( −4  4   2 ) = ( 2   1  0 ) (  0  2  2 )
( −6  1  −1 )   ( 3  −1  1 ) (  0  0  1 ). □
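The elimination described above is a few lines of code; the sketch below assumes numpy and performs the LU-decomposition without row exchanges (which suffices for this matrix).

    import numpy as np

    def lu_nopivot(A):
        # LU-decomposition without pivoting; assumes non-zero pivots appear
        A = A.astype(float)
        n = A.shape[0]
        L, U = np.eye(n), A.copy()
        for k in range(n - 1):
            for i in range(k + 1, n):
                L[i, k] = U[i, k] / U[k, k]        # the elimination multiplier
                U[i, :] -= L[i, k] * U[k, :]       # subtract a multiple of the pivot row
        return L, U

    A = np.array([[-2, 1, 0], [-4, 4, 2], [-6, 1, -1]])
    L, U = lu_nopivot(A)
    print(L)               # [[1, 0, 0], [2, 1, 0], [3, -1, 1]]
    print(U)               # [[-2, 1, 0], [0, 2, 2], [0, 0, 1]]
    print(L @ U - A)       # the zero matrix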
3.52. Find the LU-decomposition of the matrix

(  1   1   0 )
(  1  −1   2 )
( −1   1  −1 ). O
3.53. Ray-tracing. In computer 3D-graphics the image is very often rendered by the ray-tracing algorithm. The basis of this algorithm is the approximation of light waves by rays (lines) and the approximation of the displayed objects by polyhedra. These are bounded by planes, and it is necessary to compute where exactly the light rays are reflected from these planes. From physics we know how rays are reflected: the angle of incidence equals the angle of reflection. We have already met this topic in exercise 1.64.
A ray of light travelling in the direction v = (1, 2, 3) hits the plane given by the equation x + y + z = 1. In what direction is it reflected?

Solution. The unit normal vector to the plane is n = (1/√3)(1, 1, 1). The vector v_R that gives the direction of the reflected ray lies in the plane spanned by the vectors v and n. We can express it as a linear combination of these vectors. Furthermore, the rule for the angle of reflection says that ⟨v, n⟩ = −⟨v_R, n⟩. From there we obtain a quadratic equation for the coefficients of the linear combination.
The exercise can also be solved in an easier, more geometric way. From the picture we can directly derive that

v_R = v − 2⟨v, n⟩n,

and in our case we obtain v_R = (−3, −2, −1). □
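The reflection formula translates directly into code (a sketch assuming numpy):

    import numpy as np

    def reflect(v, normal):
        # reflect the direction v in a plane with the given normal vector
        n = normal / np.linalg.norm(normal)
        return v - 2 * np.dot(v, n) * n

    print(reflect(np.array([1., 2., 3.]), np.array([1., 1., 1.])))   # [-3. -2. -1.]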
3.54. Singular decomposition, polar decomposition, pseudoinverse. Compute the singular decomposition of the matrix

A = (  0  0  −1/2 )
    ( −1  0   0   )
    (  0  0   0   ).

Then compute its polar decomposition and find its pseudoinverse.
Solution. We first compute A^T A:

A^T A = (  0   −1  0 ) (  0  0  −1/2 )   ( 1  0   0  )
        (  0    0  0 ) ( −1  0   0   ) = ( 0  0   0  )
        ( −1/2  0  0 ) (  0  0   0   )   ( 0  0  1/4 ),

and obtain a diagonal matrix. But we need to find an orthonormal basis under which the matrix is diagonal and the zero row is the last one.
This can clearly be obtained by rotating through the right angle about the x-axis (the y-coordinate then goes to z, and z goes to −y). This rotation is an orthogonal transformation given by the matrix

V = ( 1   0  0 )
    ( 0   0  1 )
    ( 0  −1  0 ).

By this we have (without much computation) found the decomposition A^T A = V B V^T, where B is diagonal with the eigenvalues (1, 1/4, 0) on the diagonal. Because now B = (AV)^T (AV), the columns of the matrix

AV = (  0  1/2  0 )
     ( −1   0   0 )
     (  0   0   0 )

form an orthogonal system of vectors, which we normalise and extend to a basis: (0, −1, 0), (1, 0, 0), (0, 0, 1). The transition matrix from this basis to the standard one is then

U = (  0  1  0 )
    ( −1  0  0 )
    (  0  0  1 ).

Finally, we obtain the decomposition

A = U √B V^T = (  0  1  0 ) ( 1   0   0 ) ( 1  0   0 )
               ( −1  0  0 ) ( 0  1/2  0 ) ( 0  0  −1 )
               (  0  0  1 ) ( 0   0   0 ) ( 0  1   0 ).

The geometric interpretation of the decomposition is the following: first everything is rotated through the right angle about the x-axis, then follows the projection onto the xy-plane under which the unit ball maps onto the ellipse with half-axes 1 and 1/2, and the result is rotated through the right angle about the z-axis.
The polar decomposition A = P·W can be simply obtained from the singular one: P := U √B U^T and W := U V^T, that is,

P = ( 1/2  0  0 )            W = (  0  0  −1 )
    (  0   1  0 )    and         ( −1  0   0 )
    (  0   0  0 )                (  0  1   0 ),

and indeed

P·W = (  0  0  −1/2 )
      ( −1  0   0   ) = A.
      (  0  0   0   )

The pseudoinverse matrix is then given by the expression A^(−1) := V S U^T, where

S = ( 1  0  0 )
    ( 0  2  0 )
    ( 0  0  0 ).

Thus we have

A^(−1) = (  0  −1  0 )
         (  0   0  0 )
         ( −2   0  0 ).
□
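The whole exercise can be cross-checked numerically; the sketch assumes the numpy and scipy libraries. (The unitary factors computed by the software may differ from ours in signs and in the directions belonging to the kernel; the factor P and the pseudoinverse are unique.)

    import numpy as np
    from scipy.linalg import polar

    A = np.array([[ 0., 0., -0.5],
                  [-1., 0.,  0. ],
                  [ 0., 0.,  0. ]])

    U, s, Vt = np.linalg.svd(A)
    print(s)                          # [1.  0.5 0. ]

    W, P = polar(A, side='left')      # returns (unitary, Hermitian) with A = P @ W
    print(np.round(P, 12))            # diag(1/2, 1, 0)
    print(np.allclose(P @ W, A))      # True

    print(np.linalg.pinv(A))          # [[ 0. -1.  0.], [ 0.  0.  0.], [-2.  0.  0.]]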
3.55. QR decomposition. The QR decomposition of a matrix A is very useful in the case when we are given a system of linear equations Ax = b which has no solution, but we need to find as good an approximation as possible, that is, we want to minimise ‖Ax − b‖. According to the Pythagorean theorem we have ‖Ax − b‖² = ‖Ax − b∥‖² + ‖b⊥‖², where b was decomposed into b∥, which belongs to the range of the linear transformation corresponding to the matrix A, and b⊥, which is perpendicular to this range. The projection onto the range of A can be written in the form QQ^T for a suitable matrix Q; specifically, we obtain this matrix through the Gram-Schmidt orthonormalisation of the columns of the matrix A. Then we have Ax − b∥ = Q(Q^T Ax − Q^T b). The system in the parentheses has a solution, for which we obtain ‖Ax − b‖ = ‖b⊥‖, which is the minimal value. Furthermore, the matrix R := Q^T A is upper triangular, and therefore the approximate solution can be found very easily.
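A sketch of this procedure (assuming numpy; the overdetermined system is the one solved via the pseudoinverse in exercise 3.58 below, so the two answers can be compared):

    import numpy as np

    A = np.array([[2., 1., 2.],
                  [1., 1., 3.],
                  [2., 1., 1.],
                  [1., 0., 1.]])
    b = np.array([1., 2., 0., -1.])

    Q, R = np.linalg.qr(A)                 # orthonormalised columns and R = Q^T A
    x = np.linalg.solve(R, Q.T @ b)        # back substitution in R x = Q^T b
    print(x)                               # approximately (-6/5, 7/3, 1/3)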
Find an approximate solution of the system
x + 2y = 1 2x + 4y = 4
Solution. We have the system Ax = b with A = (1 2; 2 4) and b = (1; 4), which evidently has no solution. We thus orthonormalise the columns of A. We take the first of them and divide it by its size; this yields the first vector of the orthonormal basis, (1/√5)(1; 2). But the second column is twice the first, and thus after orthonormalisation it becomes zero. Therefore we have Q = (1/√5)(1; 2). The projector onto the range of A is then QQ^T = (1/5)(1 2; 2 4). Next we compute

R = Q^T A = (√5, 2√5)   and   Q^T b = 9/√5.

The approximate solution then satisfies Rx = Q^T b, which in our case means 5x + 10y = 9 (the approximate solution is thus not unique). The QR decomposition of the matrix A is then

(1 2; 2 4) = (1/√5)(1; 2) · (√5, 2√5).
□
3.56. Minimise ‖Ax − b‖ for

A = (  2  −1  −1 )
    ( −1   2  −1 )
    ( −1  −1   2 )

and b = (1, 1, 1)^T, and write down the QR decomposition of the matrix A.

Solution. The normalised first column of the matrix A is e1 = (1/√6)(2, −1, −1)^T. From the second column we subtract its component in the direction e1:

(−1, 2, −1)^T − ⟨(−1, 2, −1)^T, e1⟩ e1 = (−1, 2, −1)^T + (1/2)(2, −1, −1)^T = (0, 3/2, −3/2)^T.

By this we have created an orthogonal vector, which we normalise and obtain e2 = (1/√2)(0, 1, −1)^T. The third column of the matrix A is already linearly dependent on the first two (we can verify this by computing the determinant). The desired column-orthogonal matrix is then

Q = (  2/√6    0   )
    ( −1/√6   1/√2 )
    ( −1/√6  −1/√2 ).

Next we compute

R = Q^T A = ( √6  −3/√6  −3/√6 )
            (  0   3/√2  −3/√2 )

and Q^T b = (0, 0)^T. The solution of the equation Rx = Q^T b is x = y = z. The multiples of the vector (1, 1, 1) thus minimise ‖Ax − b‖.

(The mapping given by the matrix A is three times the orthogonal projection onto the plane with the normal vector (1, 1, 1).)
□
3.57. Linear regression. The knowledge we have obtained in this chapter can be successfully used in practice for solving problems with linear regression, that is, for finding the best approximation of some functional dependence by a linear function.

We are given values of a functional dependence at some points (for instance, we investigate the value of the property owned by people depending on their intelligence, the value of the property of their parents, the number of mutual friends with Mr. Williams, ...), that is, f(a_1^1, ..., a_n^1) = y_1, ..., f(a_1^k, ..., a_n^k) = y_k, with k > n (we thus have more equations than unknowns), and we want the "best possible" approximation of this dependence by a linear function, that is, we want to express the value of the property as a linear function f(x_1, ..., x_n) = b_1 x_1 + b_2 x_2 + ... + b_n x_n + c. We define "best possible" as minimising

Σ_{i=1}^k ( y_i − ( Σ_{j=1}^n b_j a_j^i + c ) )²

with regard to the real constants b_1, ..., b_n, c. Our goal is thus to find the linear combination of the columns of the matrix A = (a_j^i) (with coefficients b_1, ..., b_n) which has the smallest distance from the vector (y_1, ..., y_k) in R^k; in other words, to find the orthogonal projection of the vector (y_1, ..., y_k) onto the subspace generated by the columns of the matrix A. By the theorem 3.49, the optimal coefficients are

(b_1, ..., b_n)^T = A^(−1) (y_1, ..., y_k)^T.
3.58. Using the least squares method, solve the system
2x + y + 2z = 1
x + y + 3z = 2
2x + y + z = 0
x + z = −1
Solution. Our system has no solution, since its matrix has rank 3 while the extended matrix has rank 4. The best approximation of the vector b = (1, 2, 0, −1)^T formed by the right-hand sides of the equations is then obtained, by the theorem 3.49, with the vector A^(−1)b (AA^(−1)b is then the best approximation itself, the perpendicular projection of the vector b onto the space generated by the columns of the matrix A).

Because the columns of the matrix A are linearly independent, its pseudoinverse is given by the relation (A^T A)^{−1} A^T. Thus we have

A^(−1) = (A^T A)^{−1} A^T =

( 3/5   −1     0   ) ( 2  1  2  1 )   ( 1/5  −2/5   1/5   3/5 )
( −1   10/3  −2/3 ) ( 1  1  1  0 ) = (  0    1/3   2/3  −5/3 )
(  0   −2/3   1/3 ) ( 2  3  1  1 )   (  0    1/3  −1/3   1/3 ).

The desired x equals

A^(−1)b = (−6/5, 7/3, 1/3)^T.

The projection (the best possible approximation of the column of the right-hand sides) is then the vector (3/5, 32/15, 4/15, −13/15)^T. □
G. Additional exercises for the whole chapter
3.59. Model of evolution of a whale population. For the evolution of a population, the females are important, and for them the important factor is not age but fertility. From this point of view we can divide the females into newborns (juveniles), that is, females which are not yet fertile; young fertile females; adult females with the highest fertility; and postclimacterial females, which are no longer fertile (but are still important for taking care of the newborns and for food gathering).
We model the evolution of such a population in time. For the time unit we choose the time it takes to reach adulthood. A newborn female which survives this interval becomes fertile. The evolution of a young female to full fertility and to the postclimacterial state depends on the environment; that is, the transition to the next category is a random event. Analogously, the death of an individual is also a random event. A young fertile female has fewer offspring per unit interval than an adult female. Let us formalise these statements.
Denote by x1(t), x2(t), x3(t), x4(t) the number of juvenile, young, adult and postclimacterial females at time t, respectively. The amount can be expressed as a number of individuals, but also as a number of individuals relative to a unit area (the so-called population density), as a total biomass, and similarly. Further denote by p1 the probability that a juvenile female survives the unit time interval and becomes fertile, and by p2 and p3 the respective probabilities that a young female becomes adult and that an adult female becomes old. Another random event is the dying (positively formulated: the survival) of females that do not move to the next category; we denote the respective probabilities by q2, q3 and q4 for young, adult and old females. Each of the numbers p1, p2, p3, q2, q3, q4 is, being a probability, from the interval [0, 1]. A young female can survive, reach adulthood or die; these events are mutually exclusive, together they form a sure event, and death cannot be excluded. Thus we have p2 + q2 < 1. For similar reasons we have p3 + q3 < 1. Finally, we denote by f2 and f3 the average number of daughters of a young and of an adult female, respectively. These parameters satisfy 0 < f2 < f3.
The expected number of newborn females in the next time interval is the sum of the daughters of the young and of the adult females, that is,

x1(t + 1) = f2 x2(t) + f3 x3(t).
Let us denote for a while by x2,1(t + 1) the number of young females at time t + 1 which were juvenile in the previous time interval, that is, at time t, and by x2,2(t + 1) the number of young females that were already fertile at time t, survived that time interval but did not move into adulthood. The probability p1 that a juvenile female survives the interval can be expressed by the classical probability, that is, by the ratio x2,1(t + 1)/x1(t), and similarly we can express the probability q2 as the ratio x2,2(t + 1)/x2(t). Because the young females at time t + 1 are exactly those that survived the juvenile stage and those that were already fertile, survived and did not evolve further, it holds that

x2(t + 1) = x2,1(t + 1) + x2,2(t + 1) = p1 x1(t) + q2 x2(t).
Analogously we derive the expected number of fully fertile females,

x3(t + 1) = p2 x2(t) + q3 x3(t),

and the expected number of postclimacterial females,

x4(t + 1) = p3 x3(t) + q4 x4(t).
Figure 1. Evolution of a population of orca whales. On the horizontal axis the time is in years, on the vertical axis the size of the population. The individual areas depict the numbers of juvenile, young, adult and old females, respectively, from below.
Now we can denote

A = ( 0   f2  f3  0  )
    ( p1  q2  0   0  )
    ( 0   p2  q3  0  )
    ( 0   0   p3  q4 ),        x(t) = (x1(t), x2(t), x3(t), x4(t))^T,

and rewrite the previous recurrent formulas in the matrix form

x(t + 1) = A x(t).
Using this matrix difference equation we can easily compute the expected numbers of whale females in the individual categories, if we know the distribution of the population at some initial time.
Specifically, for the population of orca whales the following parameters were observed: p1 = 0.9775, q2 = 0.9111, f2 = 0.0043, p2 = 0.0736, q3 = 0.9534, f3 = 0.1132, p3 = 0.0452, q4 = 0.9804. The time interval is in this case one year.
If we start at the time t = 0 with a unit measure of young females in some unoccupied area, that is, with the vector x(0) = (0, 1, 0, 0)^T, we can compute
x(1) = ( 0       0.0043  0.1132  0      ) ( 0 )   ( 0.0043 )
       ( 0.9775  0.9111  0       0      ) ( 1 ) = ( 0.9111 )
       ( 0       0.0736  0.9534  0      ) ( 0 )   ( 0.0736 )
       ( 0       0       0.0452  0.9804 ) ( 0 )   ( 0      ),

x(2) = ( 0       0.0043  0.1132  0      ) ( 0.0043 )   ( 0.01224925 )
       ( 0.9775  0.9111  0       0      ) ( 0.9111 ) = ( 0.83430646 )
       ( 0       0.0736  0.9534  0      ) ( 0.0736 )   ( 0.13722720 )
       ( 0       0       0.0452  0.9804 ) ( 0      )   ( 0.00332672 )
and we can carry on. The results of the computation can also be expressed graphically, see Figure 1. Try by yourself a computation and a graphical depiction of the results for a different initial
distribution of the population. The result should be the observation that the total population grows exponentially, while the ratios of the sizes of the individual groups gradually stabilise at constant values. The matrix A has the eigenvalues

λ1 = 1.025441326,  λ2 = 0.980400000,  λ3 = 0.834222976,  λ4 = 0.004835698;

the eigenvector associated with the largest eigenvalue λ1 is

w = (0.03697187, 0.31607121, 0.32290968, 0.32404724)^T;

this vector is normed such that the sum of its components equals 1.
Compare the evolution of the size of the population with the exponential function F(t) = λ1^t x0, where x0 is the total size of the initial population. Compute also the relative distribution into the individual categories after a certain time of evolution and compare it with the components of the eigenvector w. They will turn out to be very close; this is caused by the fact that A has a single eigenvalue of the greatest absolute value and by the fact that the vector space generated by the eigenvectors associated with the eigenvalues λ2, λ3, λ4 intersects the non-negative orthant only in the zero vector. The structure of the matrix A alone does not ensure such an easily predictable evolution, because it is a so-called reducible matrix (see ??).
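The suggested experiments are easy to carry out; the following sketch assumes numpy and uses the parameters of the orca population given above.

    import numpy as np

    A = np.array([[0.    , 0.0043, 0.1132, 0.    ],
                  [0.9775, 0.9111, 0.    , 0.    ],
                  [0.    , 0.0736, 0.9534, 0.    ],
                  [0.    , 0.    , 0.0452, 0.9804]])

    x = np.array([0., 1., 0., 0.])            # one unit of young females at t = 0
    for _ in range(50):
        x = A @ x                             # x(t + 1) = A x(t)

    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)               # the dominant eigenvalue, ~1.02544
    w = eigvecs[:, k].real
    w = w / w.sum()                           # normed so that the components sum to 1
    print(eigvals[k].real, w)
    print(x / x.sum())                        # after 50 years: close to w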
3.60. Model of growth of a population of teasels Dipsacus sylvestris. This plant can be seen in four stages: either as a blossoming plant, or as a rosette of leaves, the rosettes coming in three sizes, small, medium and large. The life cycle of this monocarpic perennial plant can be described as follows.
A blossoming plant produces some number of seeds in late summer and dies. Of the seeds, some sprout already in that year into a rosette of leaves, usually of medium size. Other seeds spend the winter in the ground. Some of the seeds in the ground sprout in the spring into a rosette, but because they were weakened during the winter, the size is usually small. After three or more winters the „sleeping" (formally, dormant) seeds die, as they lose the ability to sprout. Depending on the environment of the plant, a small or medium rosette can grow during the year, and any rosette can stay in its category or die (wither, be eaten by insects, etc.). A medium or large rosette can burst into flower in the next year. The blossoming flower then produces seeds and the cycle repeats.
In order to be able to predict the spreading of the population of the teasels, we need to quantify the described events. Botanists discovered that a blossoming plant produces on average 431 seeds. The probabilities that a seed sprouts and that a rosette grows or bursts into flower are summarised in the following table:
event                                                          probability
seed produced by a flower dies                                 0.172
seed sprouts into a small rosette in the current year          0.008
seed sprouts into a medium rosette in the current year         0.070
seed sprouts into a large rosette in the current year          0.002
seed sprouts into a small rosette after spending the winter    0.013
seed sprouts into a medium rosette after spending the winter   0.007
seed sprouts into a large rosette after spending the winter    0.001
seed sprouts into a small rosette after spending two winters   0.010
seed dies after spending one winter                            0.013
small rosette survives but does not grow                       0.125
medium rosette survives but does not grow                      0.238
large rosette survives but does not grow                       0.167
small rosette grows into a medium one                          0.125
small rosette grows into a large one                           0.036
medium rosette grows into a large one                          0.245
medium rosette bursts into a flower                            0.023
large rosette bursts into a flower                             0.750
Note that all the relevant events in the life cycle have their probabilities given and that the events are mutually exclusive.
Let us imagine that we always observe the population at the beginning of the vegetative year, say in March, and that all the considered events take place in the rest of the year, say from April to February. The population consists of blossoming flowers, rosettes of three sizes, produced seeds and seeds that have been dormant for one or two years. This could lead us to a division of the population into seven classes: just-produced seeds, seeds dormant for one year, seeds dormant for two years, small, medium and large rosettes, and blossoming flowers. But the just-produced seeds either change into rosettes in the same year or spend the winter; thus they do not form a separate category. Let us thus denote:
x1(t) — the number of seeds dormant for one year in the spring of the year t
x2(t) — the number of seeds dormant for two years in the spring of the year t
x3(t) — the number of small rosettes in the spring of the year t
x4(t) — the number of medium rosettes in the spring of the year t
x5(t) — the number of large rosettes in the spring of the year t
x6(t) — the number of blossoming flowers in the spring of the year t

The number of seeds produced in the year t is 431 x6(t). The probability that a seed stays dormant for the first year equals the probability that the seed does not sprout into any rosette and does not die, that is, 1 − (0.008 + 0.070 + 0.002 + 0.172) = 0.748. The expected number of seeds dormant for the first winter in the next year is thus

x1(t + 1) = 0.748 · 431 x6(t) = 322.388 x6(t).

The probability that a seed which has been dormant for one year stays dormant for the second year equals the probability that the dormant seed does not sprout into any rosette and does not die, that is, 1 − 0.013 − 0.007 − 0.001 − 0.013 = 0.966. The expected number of seeds dormant for two winters is thus

x2(t + 1) = 0.966 x1(t).

A small rosette can sprout from a seed immediately, from a seed dormant for one year, or from a seed dormant for two years. The expected number of small rosettes sprouted from the non-dormant seeds in the year t equals 0.008 · 431 x6(t) = 3.448 x6(t). The expected number of small rosettes sprouted
from the seeds dormant for one and two years is 0.013 x1(t) and 0.010 x2(t), respectively. Besides these newly sprouted small rosettes, the population contains also the older small rosettes that have not grown yet; of those there are 0.125 x3(t). The total expected number of small rosettes is thus

x3(t + 1) = 0.013 x1(t) + 0.010 x2(t) + 0.125 x3(t) + 3.448 x6(t).
Analogously we determine the expected numbers of medium and large rosettes,

x4(t + 1) = 0.007 x1(t) + 0.125 x3(t) + 0.238 x4(t) + 0.070 · 431 x6(t)
          = 0.007 x1(t) + 0.125 x3(t) + 0.238 x4(t) + 30.170 x6(t),

x5(t + 1) = 0.245 x4(t) + 0.167 x5(t) + 0.002 · 431 x6(t)
          = 0.245 x4(t) + 0.167 x5(t) + 0.862 x6(t).

A blossoming flower can arise either from a medium or from a large rosette. The expected number of blossoming flowers is thus

x6(t + 1) = 0.023 x4(t) + 0.750 x5(t).
We have thus obtained six recurrent formulas for the individual components of the investigated population. We now denote

A = ( 0      0      0      0      0      322.388 )
    ( 0.966  0      0      0      0      0       )
    ( 0.013  0.010  0.125  0      0      3.448   )
    ( 0.007  0      0.125  0.238  0      30.170  )
    ( 0.008  0      0.038  0.245  0.167  0.862   )
    ( 0      0      0      0.023  0.750  0       ),

x(t) = (x1(t), x2(t), x3(t), x4(t), x5(t), x6(t))^T,

and write the previous equalities in the matrix form suitable for the computation,

x(t + 1) = A x(t).
If we know the distribution of the individual components of the population in some initial year t = 0, we can compute the expected numbers of flowers and seeds in the following years. We can also compute the total number of individuals n(t) at the time t, n(t) = Σ_{i=1}^6 xi(t), the relative distribution of the individual components xi(t)/n(t), i = 1, 2, 3, 4, 5, 6, and the yearly relative change of the size of the population n(t + 1)/n(t). The results of such calculations for fifteen years, in the case that we have put one blossoming flower into some locality, are given in Table 1. Unlike the whale population, a graphical image would not be very clear here, as the numbers of flowers are negligible compared to the numbers of seeds (the individual areas for flowers would merge in the picture).
The matrix A has the eigenvalues

λ1 = 2.3339,
λ2 = −0.9569 + 1.4942 i,
λ3 = −0.9569 − 1.4942 i,
λ4 = 0.1187 + 0.1953 i,
λ5 = 0.1187 − 0.1953 i,
λ6 = −0.1274.

The eigenvector associated with the eigenvalue λ1 is

w = (0.6377, 0.2640, 0.0122, 0.0693, 0.0122, 0.0046);
this vector is normed such that the sum of its components is equal to one. We see that with increasing time t the relative increment of the size of the population approaches the eigenvalue λ1, and the relative distribution of the components in the population approaches the components of the normed eigenvector associated with λ1.
t    x1           x2          x3         x4          x5         x6         n(t)
0    0.00         0.00        0.00       0.00        0.00       1.00       1.00
1    322.39       0.00        3.45       30.17       0.86       0.00       356.87
2    0.00         311.43      4.62       9.87        10.25      1.34       337.50
3    432.13       0.00        8.31       43.37       5.46       7.91       497.18
4    2550.50      417.44      33.93      253.07      22.13      5.09       3282.16
5    1641.69      2463.78     59.13      235.96      91.78      22.42      4514.76
6    7227.10      1585.88     130.67     751.37      107.84     74.26      9877.12
7    23941.29     6981.37     382.20     2486.25     328.89     98.16      34218.17
8    31646.56     23127.29    767.29     3768.67     954.73     303.85     60568.39
9    97958.56     30570.58    1786.27    10381.63    1627.01    802.72     143126.78
10   258788.42    94627.97    4570.24    27597.99    4358.70    1459.04    391402.36
11   470376.19    249989.61   9912.57    52970.28    10991.08   3903.78    798143.52
12   1258532.41   454383.40   23314.10   134915.73   22317.98   9461.62    1902925.24
13   3050314.29   1215742.31  56442.70   329291.15   55891.57   19841.54   4727523.56
14   6396675.73   2946603.60  127280.49  705398.22   133660.97  49492.37   10359111.38
15   15955747.76  6179188.75  299182.59  1721756.52  293816.44  116469.89  24566161.94

t    x1(t)/n(t)  x2(t)/n(t)  x3(t)/n(t)  x4(t)/n(t)  x5(t)/n(t)  x6(t)/n(t)  n(t+1)/n(t)
0    0.000       0.000       0.000       0.000       0.000       1.000       356.868
1    0.903       0.000       0.010       0.085       0.002       0.000       0.946
2    0.000       0.923       0.014       0.029       0.030       0.004       1.473
3    0.869       0.000       0.017       0.087       0.011       0.016       6.602
4    0.777       0.127       0.010       0.077       0.007       0.002       1.376
5    0.364       0.546       0.013       0.052       0.020       0.005       2.188
6    0.732       0.161       0.013       0.076       0.011       0.008       3.464
7    0.700       0.204       0.011       0.073       0.010       0.003       1.770
8    0.522       0.382       0.013       0.062       0.016       0.005       2.363
9    0.684       0.214       0.012       0.073       0.011       0.006       2.735
10   0.661       0.242       0.012       0.071       0.011       0.004       2.039
11   0.589       0.313       0.012       0.066       0.014       0.005       2.384
12   0.661       0.239       0.012       0.071       0.012       0.005       2.484
13   0.645       0.257       0.012       0.070       0.012       0.004       2.191
14   0.617       0.284       0.012       0.068       0.013       0.005       2.371
15   0.650       0.252       0.012       0.070       0.012       0.005

Table 1. Modelled evolution of the population of teasels Dipsacus sylvestris: sizes of the individual components of the population, the total size of the population, the relative distribution of the individual components, and the relative increments of the size.
Every non-negative matrix that has non-zero elements at the same positions as A is primitive. The evolution of the population thus necessarily approaches a stable structure.
3.61. Nonlinear model of a population. Investigate in detail the evolution of the population for the non-linear model (1.12) from the textbook with K = 1 and

i) the rate of growth r = 1 and the initial state p(1) = 0.2,
ii) the rate of growth r = 1 and the initial state p(1) = 2,
iii) the rate of growth r = 1 and the initial state p(1) = 3,
iv) the rate of growth r = 2.2 and the initial state p(1) = 0.2,
v) the rate of growth r = 3 and the initial state p(1) = 0.2.

Compute the first few members and predict the future evolution of the population.
Solution.

i) The first few members of the sequence p(n) are in the following table. From there we can see that the size of the population converges to the value 1.

n   p(n)
1   0.2
2   0.36
3   0.5904
4   0.83222784
5   0.971852502
6   0.999207718
7   0.999999372

(Graph of the evolution of the population for r = 1 and p(1) = 0.2.)
ii) For the initial value p(1) = 2 we obtain p(2) = 0, and after that the population does not change.
iii) For p(1) = 3 we obtain

n   p(n)
1   3
2   −3
3   −15
4   −255
5   −65535

and from there we see that the population decreases beyond all bounds.

iv) For the rate of growth r = 2.2 and the initial state p(1) = 0.2 we obtain
n    p(n)
1    0.2
2    0.552
3    1.0960512
4    0.864441727
5    1.122242628
6    0.820433675
7    1.144542647
8    0.780585155
9    1.157383491
10   0.756646772
11   1.161738128
12   0.748363958
13   1.162657716
14   0.74660417
We see that instead of convergence we obtain in this case an oscillation: after some time the population jumps between the values 1.16 and 0.74. (Graph of the evolution of the population for r = 2.2 and p(1) = 0.2.)

v) For the rate of growth r = 3 and the initial state p(1) = 0.2 we obtain
n    p(n)
1    0.2
2    0.68
3    1.3328
4    0.00213248
5    0.008516278
6    0.033847529
7    0.131953152
8    0.475577705
9    1.223788359
10   0.402179593
11   1.123473097
12   0.707316989
13   1.328375987
14   0.019755658
15   0.077851775
16   0.293224403
17   0.91495596
18   1.148390614
19   0.63715945
20   1.330721306
21   0.010427642
22   0.041384361
23   0.160399447
In this case the situation is more complicated: the population starts oscillating between more values. In order to see between which values, we would need to compute more members of the sequence. (Graph of the members from the table.)
□
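All the tables above can be reproduced with a few lines of Python; the recurrence p(n + 1) = p(n) + r·p(n)(1 − p(n)) used below is the stated nonlinear model with K = 1, and it matches the tabulated values.

    def population(r, p1, steps):
        # iterate p(n+1) = p(n) + r*p(n)*(1 - p(n))
        p = [p1]
        for _ in range(steps - 1):
            p.append(p[-1] + r * p[-1] * (1 - p[-1]))
        return p

    print(population(1, 0.2, 7))       # converges to 1
    print(population(2.2, 0.2, 14))    # oscillates between roughly 1.16 and 0.75
    print(population(3, 0.2, 23))      # a more complicated oscillation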
3.62. In a laboratory, an experiment with the same probability of success and failure is carried out. If the experiment succeeds, the probability of success of the second experiment is 0.7. If the first experiment fails, the probability of success of the second experiment is only 0.6.

This process then goes on: if the previous experiment was successful, the probability of the next success is 0.7, and if the previous experiment was a failure, the probability of the next success is 0.6. For arbitrary n ∈ N, determine the probability that the n-th experiment is successful.
Solution. Let us introduce the probabilistic vector

x_n = (x_n^1, x_n^2)^T, n ∈ N,

where x_n^1 is the probability of the success of the n-th experiment and x_n^2 = 1 − x_n^1 is the probability of its failure. According to the statement,

x_1 = (1/2, 1/2)^T,

and clearly also

x_2 = ( 0.7  0.6 ) ( 1/2 )   ( 13/20 )
      ( 0.3  0.4 ) ( 1/2 ) = (  7/20 ).

Using the notation

T = ( 7/10  3/5 )
    ( 3/10  2/5 ),
it holds that
(3.7) x_{n+1} = T · x_n, n ∈ N,

because the probabilistic vector x_{n+1} depends only on x_n, and this dependence is the same for all n. From the relation (3.7) we directly have

(3.8) x_{n+1} = T · T · x_{n−1} = ... = T^n · x_1, n ≥ 2, n ∈ N.
Therefore we express T^n, n ∈ N. This is a Markov process, and thus 1 is an eigenvalue of the matrix T. The second eigenvalue, 0.1, follows for instance from the fact that the trace (the sum of the elements on the diagonal) equals the sum of the eigenvalues (every eigenvalue counted with its algebraic multiplicity). To these eigenvalues correspond the eigenvectors (2, 1)^T and (1, −1)^T. We thus obtain

T = ( 2   1 ) ( 1   0    ) ( 2   1 )^{−1}
    ( 1  −1 ) ( 0  1/10  ) ( 1  −1 ),

that is, for n ∈ N we have

T^n = ( 2   1 ) ( 1   0       ) ( 2   1 )^{−1}
      ( 1  −1 ) ( 0  10^{−n}  ) ( 1  −1 ).

The substitution

( 2   1 )^{−1}         ( 1   1 )
( 1  −1 )       = (1/3)( 1  −2 )

and multiplication yields

T^n = (1/3) ( 2 + 10^{−n}   2 − 2·10^{−n} )
            ( 1 − 10^{−n}   1 + 2·10^{−n} ).

From there, from (3.7) and from (3.8) it follows that

x_{n+1} = ( 2/3 − 1/(6·10^n), 1/3 + 1/(6·10^n) )^T, n ∈ N.

In particular, we see that for big n the probability of success of the n-th experiment is close to 2/3. □
3.63. A student living in a student dormitory is very „socially tired" (as a result, he is not able to fully perceive the universe around him and to coordinate his movements). In this state he decides to invite to the party in progress his friend who lives at one end of the hall. But at the other end of the hall lives somebody he definitely does not want to invite. He is so „tired", though, that he realises his decision to make a step in the desired direction only in 53 out of 100 attempts (in the remaining 47 he makes a step in exactly the opposite direction). Assuming that he starts in the middle of the hall and that the distance to each of the doors at the ends corresponds to twenty of his awkward steps, determine the probability that he first reaches the desired door. O
3.64. Let n ∈ N persons play the so-called „silent post". For simplicity, assume that the first person whispers to the second person exactly one (arbitrarily chosen) of the words „yes", „no". The second person then whispers to the third the one of the words „yes", „no" that the second person thinks was whispered by the first person. This then continues up to the n-th person. If the probability that the word changes (on purpose or accidentally) to the other word during one transmission equals p ∈ (0, 1), determine, for big n ∈ N, the probability that the n-th person correctly receives the word transmitted by the first person.
Solution. We can view this problem as a Markov chain with two states called Yes and No, saying that the process is in the state Yes at the time m ∈ N if the m-th person thinks that the received word is „yes". For the order of the states Yes, No, the probabilistic transition matrix is

T = ( 1−p   p  )
    (  p   1−p ).

The product of the matrix T^{m−1} and the probabilistic vector of the initial choice of the first person then gives the probabilities of what the m-th person thinks. We do not even have to compute the powers of this matrix: all the elements of the matrix T are positive numbers and, moreover, this matrix is doubly stochastic. Thus we know that for big n ∈ N the probabilistic vector is close to the vector (1/2, 1/2)^T. The probability that the n-th person says „yes" is thus approximately the same as the probability that the n-th person says „no", independently of the initial word. For a big number of participants it thus holds that roughly half of them hear „yes" (we repeat that this does not depend on the initial word).
For completeness, let us determine what the result would be if we assumed that the probability of a change from „yes" to „no" is for every person equal to p ∈ (0, 1) and the probability of a change from „no" to „yes" is equal to a (in general distinct) q ∈ (0, 1). In this case, for the same order of the states, we obtain the probabilistic matrix

T = ( 1−p   q  )
    (  p   1−q ),

which leads (for big n ∈ N) to a probabilistic vector close to the vector

( q/(p + q), p/(p + q) )^T;

this follows, for instance, from the expression of the matrix T^n. Again, with sufficiently many people, it does not depend on the initial choice of the word. Simply speaking, in this model the people themselves decide what the transmitted information is; more precisely, the people themselves determine the frequency of occurrence of „yes" and „no", if there are enough of them (and there is no checking present).

Let us further add that the obtained result was experimentally confirmed. In a psychological experiment, an individual was repeatedly exposed to an event that could be interpreted in two ways, in time intervals that ensured that the subject still remembered the previous event. See for instance „T. Havránek et al.: Matematika pro biologické a lékařské vědy, Praha, Academia 1981", where an experiment is described in which an ambiguous object (say, a drawing of a cube which can be perceived both from the bottom and from the top) is lighted on in fixed time intervals. Such a process is a Markov chain with the transition matrix

T = ( 1−p   q  )
    (  p   1−q ),

where p, q ∈ (0, 1). □
3.65. In a certain game you can choose one of two opponents. The probability that you beat the better one is 1/4, while the probability that you beat the worse one is 1/2. But the opponents cannot be distinguished, so you do not know which one is the better. A big number of games awaits you (and for each of them you can choose a different opponent), and of course you want to reach as high a ratio of wins as possible. Consider these two strategies:

1. For the first game choose the opponent randomly. If you win a game, carry on with the same opponent; if you lose it, change the opponent.
2. For the first two games, choose one opponent randomly. Then, for the next two games, change the opponent if you lost both previous games, otherwise stay with the same one.

Which of the strategies is better?
Solution. Both strategies lead to a Markov chain. For simplicity, denote the worse opponent by A and the better one by B. In the first case, for the states „game with A" and „game with B" (in this order) we obtain the probabilistic transition matrix

T = ( 1/2  3/4 )
    ( 1/2  1/4 ).

This matrix has all elements positive, and thus it suffices to find the probabilistic vector x∞ associated with the eigenvalue 1. It holds that

x∞ = (3/5, 2/5)^T.

Its components correspond to the probabilities that, after a long row of games, the opponent is the player A or the player B. Thus we can expect that 60 % of the games will be played against the worse of the opponents. Because

2/5 = (3/5)·(1/2) + (2/5)·(1/4),

roughly 40 % of the games will be won.
For the second strategy, let us use the states „two games in a row with A" and „two games in a row with B", which lead to the probabilistic transition matrix

T = ( 3/4  9/16 )
    ( 1/4  7/16 ).

We easily determine that now

x∞ = (9/13, 4/13)^T.

Against the worse opponent we would then play (9/4)-times more frequently than against the better one. Recall that for the first strategy it was only (3/2)-times. The second strategy is thus better. Let us also note that with the second strategy roughly 42.3 % of the games are won; it suffices to evaluate

0.423 ≐ 11/26 = (9/13)·(1/2) + (4/13)·(1/4).
□
3.66. Petr regularly meets his friend. He is, however, „well-known" for his bad timekeeping. He is trying to change, so it holds that if he was late for the last meeting, then in half of the cases he comes on time and in one tenth of the cases even sooner than he should. If he was on time or sooner for the last meeting, he returns to his „carelessness" and comes late with probability 0.8, and on time with probability only 0.2. What is the probability that he comes late to the 20th meeting, if he was on time on the eleventh?
Solution. Clearly it is a Markov process with the states „Petr came late", „Petr came on time", „Petr came sooner", with the probabilistic transition matrix (for the given order of states)

T = ( 0.4  0.8  0.8 )
    ( 0.5  0.2  0.2 )
    ( 0.1  0    0   ).

The eleventh meeting is described by the probabilistic vector (0, 1, 0)^T (we know for sure that Petr came on time). To the twentieth meeting corresponds the vector

T^9 (0, 1, 0)^T = (0.571578368, 0.371316224, 0.057105408)^T.

The desired probability is thus 0.571578368 (exactly). Let us add that

T^9 = ( 0.571316224  0.571578368  0.571578368 )
      ( 0.371512832  0.371316224  0.371316224 )
      ( 0.057170944  0.057105408  0.057105408 );

from there we see that it really does not matter whether he came to the eleventh meeting late (first column), on time (second) or sooner (third). □
3.67. Two students, A and B, spend every Monday morning playing a certain computer game. The one who wins then pays for both of them in the evening in the restaurant. The game can also end in a draw; then each pays half. The result of the previous game partially determines the next one. If a week ago student A won, then he wins again with probability 3/4 and with probability 1/4 the game is a draw. A draw is repeated with probability 2/3, and with probability 1/3 the next game is won by B. If student B won a game, then he wins again with probability 1/2 and with probability 1/4 student A is the winner of the next game. Determine the probability that today each of them pays half of the costs, if the first game, played a long time ago, was won by A.
Solution. We are given a Markov process with the states „student A wins", „the game ends in a draw", „student B wins" (in this order), with the probabilistic transition matrix

T = ( 3/4   0    1/4 )
    ( 1/4  2/3   1/4 )
    (  0   1/3   1/2 ).

We want to find the probability of the transition from the first state to the second one after a big number n ∈ N of steps (weeks). The matrix T is primitive, because

T² = (  9/16   1/12   5/16 )
     ( 17/48  19/36  17/48 )
     (  1/12   7/18   1/3  ).
It thus suffices to find the probabilistic eigenvector of the matrix T associated with the eigenvalue 1. It is easy to compute that

x∞ = (2/7, 3/7, 2/7)^T.

We know that for big n the probabilistic vector differs only very slightly from x∞ and does not depend on the initial state; that is, for big n ∈ N we can set

T^n ≈ ( 2/7  2/7  2/7 )
      ( 3/7  3/7  3/7 )
      ( 2/7  2/7  2/7 ).

The desired probability is the element of this matrix in the second row and the first column (the second component of the vector x∞). Thus we have (quite quickly) found the result 3/7. □
Solutions of the exercises
3.2. The daily diet should contain 3.9 kg of hay and 4.3 kg of oats. The costs per foal are then 13.82 Kč.
3.3. 1
3.12. 1
3.13. x_n = 2√3 sin(n·π/6) − 4 cos(n·π/6).
3.14. x_n = −3(−1)^n − 2 cos(n·2π/3) − 2√3 sin(n·2π/3).
3.15. x_n = (−1)^n(−2n² + 8n − 7).
3.24. The Leslie matrix of the given model is (the mortality of the first group being denoted by a)

( 0  2  2 )
( a  0  0 )
( 0  1  0 ).

The stagnation condition corresponds to the fact that the matrix has 1 as an eigenvalue, that is, that the polynomial λ³ − 2aλ − 2a has 1 as its root; hence a = 1/4.
3.27. The matrix of the process has the dominant eigenvalue 1, the corresponding eigenvector being (6/5, 1). Because the eigenvalue is dominant, the ratio of the viewers stabilises at 6 : 5.
3.30. As in (3.29), the game ends after three bets. Thus all the powers of A, starting with A³, are identical:

A³ = ( 1  7/8  3/4  1/2  0 )
     ( 0   0    0    0   0 )
     ( 0   0    0    0   0 )
     ( 0   0    0    0   0 )
     ( 0  1/8  1/4  1/2  1 ).
3.40. We can use the result of the exercise called Ruining of the player. The probability that the first department is cancelled is according to this exercise equals to
1 _ / 0-46 \5 1 \ 1-0.46/
.100
,25
= 0.56.
1 _ / 0-46 y 1 \ 1-0.46/
It was enough to plug in p — 1 — 0.54, y — 10/2 and x — 40/2 to (||3.6||). It is thus more clever to choose
the smaller department.
3.50.
• The claim holds. (B := AT A, btj — (i-th row of AT) ■ (j-th column of A)= bjt AT) ■ (;'-th column of A)=(j-th column of A) • (;'-th row of AT)
A 1N
(j-th row of
The claim does not hold. Consider for instance A
0 1
3.52.
1
0
3.63. Again it is a special case of the Ruining of the player. It suffices to reformulate the statement accordingly. For p = 0, 47, y = 20 and x = 20 from (||3.6||) follows the result
1
0,917 =
V 1-0,47 /
/_047_\
V 1-0,47 /
206
CHAPTER 4
Analytic geometry
position, incidence, projection
- and we return to matrices again...
A. Affine geometry
4.1. Find the parametric equation for a line in R3 given by equations
x - 2y + z = 2, 2x + y - z = 5.
Solution. It is obviously sufficient to solve the equation system. However we can use different approach. We need to find non-zero direction vector ortogonal to normal vectors (1, —2, 1), (2, 1,-1). Cross product
(1, -2, 1) x (2, 1, -1) = (1,3,5) gives us such vector. We can notice that triple (x, y, z) = (2,-l,-2) satisfies the respective system and we obtain the solution
[2,-1,-2]+ t (1,3, 5), t eR.
□
4.2. Plane in R4 is given by its parametric equation
q : [0, 3, 2, 5] + t (1, 0, 1, 0) + s (2, -1, -2, 2), t, s e Find its implicit equation.
Now we come back to our view on geometry that we had when we studied positions of points in the plane in the 5th part of the first chapter, c.f. 1.23. First we will be interested in properties of objects in the Euclidean space, delimited by points, straight fines, planes etc. The essential point will be to clarify how their properties are related to the notion of vectors and whether they depend on the notion of length of vectors.
In the next part, we will use linear algebra for the study of objects which are defined in a nonlinear way. To do this we will need a little bit more from the theory of matrices again. The results will be important later on, while discussing the technique for optimalization, i.e. searching for extrema of functions.
At the end of this chapter we show how the projectivization of affine spaces help us to get a simplification and stability of algorithms typical for computer graphics.
1. Affine and euclidean geometry
While we were clarifying the structure of solutions of linear equations in the first part of the previous chapter we found out in paragraph 3.1 that all solutions of non-H homogeneous systems of linear equations does not form vector spaces but always arise in such a way that to an one particular solution we add the vector space of solutions of the corresponding homogeneous system. On the other hand, the difference of any two solutions of the nonhomogeneous system is always a solution of the homogeneous system. This behaviour is similar to the behaviour of linear difference equations, as we have seen in paragraph 3.14 already.
4.1. Affine spaces. A direction how to deal with the theory is given already in the discussion about the geometry of the plane, c.f. paragraph 1.25 and further. There we described straight lines and points as sets of solutions of systems of linear equations. Any line was considered as a one-dimensional subspace, although its points were described by two coordinates. Parametrically, the fine was defined by the sum of a single point (i.e. to a pair of coordinates) and multiples of a fixed direction vector. Now we will proceed in the same way in arbitrary dimension.
___J Standard affine space J___
Standard affine space A„ is a set of all points in M" = A„ together with an operation which to a point A — (a\,..., an) e An and a vector i; — (v\,... ,v„) e Rn — V assigns the point
a + v — (ai + di, .
v„) e
An •
CHAPTER 4. ANALYTIC GEOMETRY
Solution. Our task is to find a system of equations with 4 variables x, y, z, u (because dimension of the space is 4) which are satisfied by the coordinates of precisely those points which he in the plane. Note that sought system must contain 2 = 4 — 2 linearly independent equations. We solve the problem by so called elimination of parameters. Points [x,y,z,u] € q satisfy
x = t + 2s,
y = 3 — s,
z = 2 + t - 2s,
u = 5 + 2s,
where t,s sR. We can express the system as matrix
/1 2 -1 0 0 0 0 \
0 -1 0 -1 0 0 3
1 -2 0 0 -1 0 2
2 0 0 0 -1 5/
where the first two columns are direction vectors of the plane, followed by negative identity matrix and finally the last column is vector of coordinates of point [0, 3, 2, 5]. We expressed the system in such a way so that it is a system in t, s, x, y, z, u and we move all the unknown variables to the one side of the equations. We transform obtained matrix using elementary row operations in order to get as much zero-rows on the left-hand side of the first vertical line. Adding (—1)-times the first row and (—4)-times the second row to the third row and adding twice the second row to the first row we obtain
/1 2 -1 0 0 0 0 \
0 -1 0 -1 0 0 3
1 -2 0 0 -1 0 2
\° 2 0 0 0 -1 5 )
(1 2 -1 0 0 0 0 \
0 -1 0 -1 0 0 3
0 0 1 4 -1 0 -10
0 0 -2 0 -1 11 /
Which implies result
+
4y -2y
- 10 u + 11
0, 0.
Coefficients on the right-hand side of the first vertical line, respective to the rows which are zero-rows on the left-hand side of that line, are the coefficients of general equations of a planes. Note that if we expressed the original system as a matrix
/1 0 0 0 1 2 0 \
0 1 0 0 0 -1 3
0 0 1 0 1 -2 2
0 0 1 0 2 5/
This operation satisfies the following three properties:
(1) A + 0 — A for all points A e A„ and the null vector 0 e V,
(2) A + (v + w) — (A + v) + w for all vectors v,weV and points A e A„,
(3) for every two points A, B e A„ there exists exactly one vector v e V such that A + v — B. This vector is denoted hy v — B — A, sometimes also AB.
The underlying vector space M" is called the difference space of the standard affine space A„.
We notice a danger of several formal ambiguities. We are
'i§t# using the same symbol "+" for two different oper-'''jfxj/k ations: for adding a vector from the difference space t0 a p0int in the affine space, and for for summing vectors in the difference space V — Rn. Also we do not introduce specific letters for the set of points in the affine space, i.e. A„ denotes both this set of points and also the whole structure defining the affine space.
Why do we actually want to distinguish between the set of points in the affine space A„ and its difference space V when both spaces can be viewed as M" ? It is going on fundamental formal step to understanding the geometry in M": The thing is that the geometric objects like straight lines, points, planes etc. do not depend directly on the vector space structure of the set M", and do not depend at all on the fact that we are working with n-tuples of scalars. We only need to know what it means to move "straight in a given direction". For instance, we consider the affine plane as an unbounded board without chosen coordinates but with the possibility to move about a given vector. When we switch to such abstract view, we will be able to discuss the "plane geometry" for two-dimensional subspaces, i.e. planes in higher-dimensional spaces, the geometry of "Euclidean space" for three-dimensional subspaces etc., without the need to work with ^-tuples of coordinates.
This point of view is present in the following definition:
4.2. Definition. The affine space A with the difference space V is a set of points V, together with the map
V
(A, v)
where V is a vector space and our map satisfies the properties (l)-(3) from the definition of the standard affine space.
So for a fixed vector i; e V we get a translation rv : A -> A as the restricted map
tv : V ~ V x {v} -* V, A\-^ A + v.
By the dimension of an affine space A, we mean the dimension of its difference space.
In sequel we do not distinguish accurately between denoting the set of points A and the set of vectors V, we talk about points and vectors of the affine space A instead.
It follows immediately form the axioms that for arbitrary points A, B, C in the affine space A
(4.1)
(4.2)
(4.3)
A - A = 0 e V B — A — -(A - B) (C - B) + (B - A) = C - A.
Indeed, (4.1) follows from the fact that A+0 — 0 and that such vector is unique (the first and the third defining property). By adding
208
CHAPTER 4. ANALYTIC GEOMETRY
where x,y,z,u remains on the left-hand side of the equations, similar
transformation
/ 1 0 0 0
0 10 0
0 0 10
V 0 0 0 1
gives us the result
1 2 0 \ / 1 0 0 0 1 2 0 \
0 -1 3 0 1 0 0 0 -1 3
1 -2 2 -l -4 1 0 0 0 -10
0 2 5 ) V 0 2 0 1 0 0 11 /
4y 2y
+
+ u
10,
11.
When expressing system as a matrix, it is important to take into consideration whether the vertical line separates left-hand side from right-hand side. As we saw in this exercise, parameter eUmination method can be long-winded and it is not difficult to make a mistake along the way.
Another solution All we wanted to obtain in fact, are two linearly independent normal vectors, i.e. vectors perpendicular to (1, 0, 1, 0), (2,-1, —2, 2). If we "guessed" that these vectors could be for example (0, 2, 0, 1), (-1, 0, 1, 2), inputting x = 0, y = 3, z = 2, u = 5 to the equations
2y + u = a,
—x + z + 2u = b
we get a = 11, b = 12, and the sought implicit expression is
2y + u = 11,
+ 2u = 12.
+ z
u 2u
□
4.3. Find a parametric equation of the plane passing through points
A = [2, 1, 1], £ = [3,4, 5], C = [4, -2, 3].
Then find a parametric equation of the open half-plane containing the point C and bounded by line going through the points A, B.
Solution. We need one point and two (linearly independent) vectors lying in this plane for the parametric equation of the plane. It is enough to choose the point A and vectors 5 — A = (1,3, 4) and C — A = (2, —3, 2), which are obviously independent. A point [x, y, z] lies in the plain if and only if there exist numbers t, s € R so that
x =2 + 1-t+2-s, y = l + 3- t - 3-s, z = 1 + 4 ■ t + 2 ■ s;
which means the parametric equation is
[2, 1, 1] + t (1, 3, 4) + s (2, -3, 2), t, s e R.
Setting s = 0 gives us a line passing through points A, B. For t = 0, s > 0 we get a ray with initial point A and passing through C. Particular but arbitrarily choosen t e R and variable s > 0 gives us a ray initiated on the border line and going through the half-plane in
successively B — A and A — B to A, according to the second defining property we obtain obviously A again. So we added the null vector which proves (4.2). Similarly, (4.3) follows from the defining property 4.1 (2) and the uniqueness.
Let us remark that the choice of one fixed point Aq e A determines a bijection between V and A. So for a fixed basis u in V we get for every point A e A a unique expression
A = Aq + x\u\ + ■ ■ ■ + x„u„.
We talk about an affine coordinate system (Aq; u\,..., un) given by the origin of the affine coordinate system Aq and the basis u of the corresponding difference space, or also about an affine frame
(Aq, w).
We can summarize the situation as follows: Affine coordinates of a point A in the frame (Aq, u) are the coordinates of the vector A — Aq in the basis u of the difference space V.
The choice of an affine coordinate system identifies each n-dimensional affine space A with the standard affine space A„.
4.3. Affine subspaces. If we choose only such points in A which have some of in advance chosen coordinates equal to zero (for instance the last one), we obtain again a set which behaves as an affine space. Indeed, this is the spirit of the following definition of the so called affine
Ma subspaces.
Subspaces of an affine space
Definition. The nonempty subset Q c A of an affine space A with a difference space V is called an affine subspace in A if the subset W = {B — A; A, B e Q} c Visa vector subspace and for any A e Q, v e W we have A + v e Q.
It is important to include both of the conditions in the definition since it is easy to find examples of sets which satisfy the first condition but not the second one. Have a think about a straight line in the plane with one removed point.
For an arbitrary set of points M c A in an affine space with a difference space V, we define the vector space
Z(M) = ({B — A; B, A e M}) C V
of all vectors generated by the differences of points in M.
In particular, V = Z(A) and every affine subspace Q c A itself satisfies the axioms for an affine space with the difference space Z(Q).
Directly from the definitions we also get that the intersection of any set of affine subspaces is either an affine subspace or the empty set.
The affine subspace (M) in A generated by a nonempty set M c A is the intersection of all affine subspaces which contain all points of the subset M.
„ Affine hull and parametric description of a subspace \^
The affine subspaces can be nicely described by their difference spaces if we choose a point Aq e M in a generating set M. Indeed, we get (M) = {Aq + v; v e Z(M) c Z(A)}, i.e. to generate the affine subspace we take the vector subspace Z(M) in the difference space generated by all differences of points in M, and we add this vector space to an arbitrary point in M. We talk also about the affine hull of the set of points M in A.
209
CHAPTER 4. ANALYTIC GEOMETRY
which point C lies. That means that the sought open half-plane can be expressed parametrically as
□
[2, 1,1] + ? (1, 3, 4) + s (2, -3, 2), «el,s> 0.
4.4. Determine relative position of lines
p : [1,0, 3] + t (2,-1,-3), (el,
q : [1,1, 3] + s (1,-1,-2), s el.
Solution. We will find common points of given lines (subspaces intersection). We get a system
1 + 2t = 1 + s, 0 - t = 1 - s, 3 - 3t = 3 - 2s.
>From the first two equations we get that t = 1, s = 2. However, this does not satisfy the third equation. The system does not have a solution. Direction vector (2, —1, —3) of the line p is not a multiple of direction vector (1, — 1, —2) of the line q which means that the lines are not parallel. Hence, they are skew lines. □
4.5. Find all numbers a elso that lines
p : [4, -4, 8] +t (2, 1, -4), (el, q : [a, 6, -5] + s (1, -3,3), s € R
are intersecting.
Solution. Lines are intersecting if and only if the system
+
4 -4
+ 2t + t - At
a
s,
6 - 3s, 5 + 3s
has exactly one solution. Expressing the system as a matrix (the first column corresponding to t, the second to s), we solve
1 2 -1 a - 4 \
1 3 10
V-4 -3 -13 )
3 10
-1 a - A
-3 -13
\ 10
1 a-24
3
1
We see that the system has exactly one solution if and only if the second row is a multiple of the third row. This property is satisfied only for a = 3. Let us add that the point of intersection of the lines is [6, —3, 4].
□
On the other hand, whenever we choose a subspace U in the difference space Z(A) and a fixed point A e A the subset A + U, created by all possible sums of the point A and all vectors in U, is an affine subspace. This approach leads to the notion of parametrization of subspaces:
Let Q = A + Z(Q) is an affine subspace in A„ and («i,... ,uk) is a basis of Z(Q) c Rn. Then the expression of the subspace
Q — {A + hui +■■■ + tkuk; t\, ... ,tk Z(B) between their difference spaces such that for all A e A, v e Z(A) the following holds
f(A + v) = f(A) + cp(v).
The maps / and P(A,C)
(4) In each cartesian coordinate system (Ao; e), the distance of the points A — Aq + a\e\ + • • • + anen, B — Ao + b\e\ +
----\-b„en is yjj2"=i(ai ~ bi)2-
(5) Given a point A and a subspace Q in £„, there exists a point P e Q which minimalizes the distance between A and the points in Q. The distance between A and P is equal to the length of the orthogonal projection of the vector A — B into Z(Q)-1 for an arbitrary B e Q.
(6) More generally, for subspaces Q and 1Z in £„ there exist points P e Q and Q e 1Z which minimalize the distances of points B € Q and A € 1Z. The distance between the points P and Q is equal to the length of the orthogonal projection of the vector A — B into Z(Q)1- for arbitrary points B e Q and A e 7Z.
Proof. The first three properties follow directly from the ^ properties of length of vectors in spaces with a scalar product, the fourth one follows directly from the expression of the scalar product in an orthonormal basis.
218
CHAPTER 4. ANALYTIC GEOMETRY
4.21. In vector space R4 compute distance i; between point [0, 0, 6, 0] and vector subspace
U : [0, 0, 0, 0] + h (1, 0, 1, 1) + h (2, 1, 1, 0) + h (1, -1, 2, 3),
Solution. We will solve the problem by the least squares method. Let U's generating vectors be columns of matrix
/l 2 1 \
A =
■1
0 1
1 1 2 \1 0 3/
and we substitute point [0, 0, 6, 0] by corresponding vector b =
(0, 0, 6, 0)T. We will solve A ■ x = b, i.e. linear equation system
X\ + 2x2 + x3 = 0,
x2 — xj, = 0,
x\ + x2 + 2x3 = 6,
x\ + 3x3 = 0,
by least squares method. (Note that the system does not have a solution
- the distance would be 0 otherwise.) Let's multiply A ■ x = b by
matrix AT from the left-hand side. Augmented matrix AT ■ A ■ x =
AT ■ b then is
3 3 6 3 6 3 6 3 15 12
By elementary row operations we transform the matrix to the normal form
3 3 3 6 6 3
3 15
12 j \ 0 -3 3 We continue with backward eUmination 1 1 2 0 1 -1 0 0 0
and see the solution
■1
0
x = (2 - 3t, t, t) T , t e R.
Note that the existence of infinitely many solutions is caused by third vector generating U, which is redundat because
3 (1, 0, 1, 1) - (2, 1, 1, 0) = (1, -1, 2, 3).
Arbitrary (t e R) linear combination
(2 - 3f) (1, 0, 1, 1) + t (2, 1, 1,0) + ? (1, -1, 2, 3) = (2, 0, 2, 2)
corresponds to a point [2, 0, 2, 2] in subspace U, which is the nearest point to [0, 0, 6, 0]. The distance is therefore
u = || [2, 0, 2, 2] - [0, 0, 6, 0] || = V22 + 0 + (-4)2 + 22 = 2^6.
□
Let us look at the relation for the minimal distances p(A, B) for B e Q. The vector A — B decomposes uniquely as A — B = u\ +u2, where u\ e Z(Q), u2 e Z(Q)-1. The component u2 does not depend on the choice ofB € Q since any potential change of the point B would show by adding a vector from Z(Q). Now let us choose P = A + (—u2) = B + u\ e Q. We get
BH2 =
ll"2ll > ll"2ll — IIA ■
From here we see that the minimal possible distance is reached exactly for our point P and its value is ||«2 II indeed.
We get the general result in a similar way. For the choice of arbitrary points A e 1Z and B e Q their difference is given as a sum of vectors u\ e Z(K) + Z(Q) and u2 e (Z(K) + Z(Q))J-, where the component u2 does not depend on the choice of the points. Adding suitable vectors from the difference spaces of 1Z and Q we obviously obtain points A' and B' whose distance is exactly
II"2||. □
Now we extend our brief overview of elementary problems in the affine geometry.
4.17. Examples of standard problems. (1) To find the distance from the point A € £„ to the subspace Q C £„:
A method of solving such problem is given in the proposition 4.16.
(2) In £2 to construct the straight line q through a given point A which form a given angle with a given line p:
Let us remind that we have worked with angles between vectors in the plane geometry already (see e.g. 2.43). We find a vector u € M2 lying in the difference space of the line q, and we choose a vector i; having the prescribed angle with u. The desired line is given by the point A and the difference space (v). The problem has either two solutions or only one solution.
(3) To find the perpendicular from a point to a given line:
The procedure is introduced in the poof of the last but one item of the proposition 4.16.
(4) In £3 to determine the distance of two lines p, q:
We choose arbitrarily one point from each of the lines, A e p, B e q. The component of the vector A—B lying in the orthogonal complement (Z(p) + Z(q))1- has the length equal to the distance between p and q.
(5) In £3 to find the axis of two skew lines p a q:
By the axis we mean the crossbar which realizes the minimal possible distance of the given skew lines in terms of the points of intersection. Again, the procedure can be derived from the proof of the proposition 4.16 (the last item). Let rj is the subspace generated by a single point A e p and the sum Z(p) + (Z(p) + Z(q))-1. Provided that the lines p and q are not parallel, it is going to be a plane. Then the intersection rjllq together with the difference space (Z(p)+Z(q))-L give the parametric expression of the desired axis. If the lines are parallel, then the problem has an infinite number of solutions.
4.18. Angles. Various geometric notions like angles, orientation, volume etc. in the point spaces £„ are defined in
f ~-±Z% terms of suitable notions from the vector euclidean spaces just as the notion of the distance. Let us remind that we defined the angle between two vectors at the end of the third part of the second chapter, see 2.43.
219
CHAPTER 4. ANALYTIC GEOMETRY
4.22. Compute volume of parallelepiped in R3 with base in plane z, = 0 and with edges given by pairs of vertices [0, 0, 0], [-2, 3, 0]; [0, 0, 0], [4, 1, 0] a [0, 0, 0], [5, 7, 3]. Solution. Parallelepiped is given by vectors (4,1,0), (—2,3,0), (5,7,3). We know that its volume is defined as determinant 4-2 5
Indeed, from Cauchy inequality follows 0
< 1, and
3 0
3 • 14 = 42.
Note that if we modified the order of vectors, we would get result ±42, because determinant gives us oriented volume of parallelepiped. Further note that the volume would not change if the third vector was [a,b,3] for arbitrary a, b € R. Its surface obviously depends only on ortogonal distance of planes of its upper and lower base and their area
4 -2
1
14.
□
4.23. Let points [0, 0, 1], [2, 1, 1], [3, 3, 1], [1, 2, 1] define a paral-leloid. Determine point X lying on line p : [0, 0, 1] + (1, 1, l)t so that parallelepiped defined by given paralleloid and point X has volume of 1.
Solution. We will form a determinant which gives us volume of a parallelepiped with X moving along line p:
t t
1 0
2 0
Volume should be 1 which introduces condition t = 1/3.
□
4.24. Let A BCD EF GHbea cube (with common notation, i.e. vectors E — A, F — B, G — C, H — D are orthogonal to the plane defined by vertices A, B, C, D) in Euclidean space R3. Compute angle cp between vectors F — A a H — A.
Solution. We have solved this problem using formula for angle between vectors. Let's think about the problem further. Vertices A, F, H are vertices of a triangle with all sides of the same length, it is hence equilateral triangle and therefore cp = jt/3. □
4.25. Let 5 be a midpoint of edge AB of cube ABCDEFGH (with common labelling). Compute cosine of angle between lines £"5 and BG.
Solution. Dilatation (homotethy) is similar mapping, hence it preserves angles. We can therefore asume that the cube edge has length 1. Further, we can place the point A to the origin of coordinate system and points B and E to points [1, 0, 0] and [0, 0, 1] respectively. Other coordinates are then given: S = [1/2, 0, 0], G = [1,1,1], vector
MINI
so it has sense to define the angle cp(u, v) between vectors u, v e V in a real vector space with a scalar product given by the equation
u ■ v
cos cp(u, v) — -, 0 < cp(u, v) < 2tt.
I"l III'll
This is completely in accordance with the situation in the two-dimensional euclidean space R2 and with our philosophy that the notion related to the two vectors is the issue of the plane geometry in fact.
In the euclidean plane, we used also the geometric functions cos and sin which we defined by a pure geometric consideration. We will come back to this in the beginning of the fifth chapter, when we will be able to check precisely the geometric opinion that the function cos is decreasing in the interval [0, it]. Therefore, the angle between two vectors in higher-dimensional spaces is measured in the plane which is generated by these two vectors (or it is zero), and our defining relation corresponds to the conventions in all dimensions.
In an arbitrary real vector space with a scalar product, it follows directly from definitions that
v\\2 =
■ + |MI - 2(« • v)
2 2
— \\u\\ + ||«11 — 2||w|| ||u|| coscp(u, v).
This is evidently the well known law of cosines from the plane geometry.
Next, the following relation holds for each orthonormal basis e of the difference space V and a non-zero vector u e V
\\u\\2 — ^2 \u • ei\2-
i
By dividing this equation by the number ||» ||2 we get
1 = ^(cos^(w, e,))2,
which is the law of directional cosines cp(u, e{) of the vector u.
Now we can derive reasonable definitions for angles between general subspaces in an euclidean vector space from the definitions of angles between vectors. Concurrently we must decide how to deal with cases, where the subspaces have a nontrivial intersection. As the angle between two lines, we want to take the smaller one from the two possible angles, in the case of two nonparallel planes in R3 we do not want to say that the angle is zero since they intersect and have one direction in common:
Angles between subspaces [__<
4.19. Definition. Let us consider finite-dimensional subspaces Ui, U2 in an euclidean vector space V of an arbitrary dimension.
The angle between vector subspaces U\, U2 is the real number a — cp(Ui, U2) e [0, j] satisfying: (1) If dimt/i = dimt/2 = 1, U\ = (u), U2 = (u),then
|w.u|
cos a = -.
(2) If the dimensions of U\, U2 positive and U\ n U2 — {0}, then the angle is the minimum of all angles between one-dimensional subspaces
a — min{ From this system we can see that y = — z, and x = 2z. Vector
(2, —1, 1) is therefore direction vector of p; in other words, we have
(p is obviously passing through the origin)
p : [0, 0, 0] + t (2,-1, 1), t eR.
For angle l)) —
Proof. According to the Cauchy inequality, for all vectors u e U we have
\u ■ v\ \u ■ (v\ + v2)\ \u ■ V\\
\\u\\ \\v\ \u\\ IIu 1II \\u\\ III'll
\v\ I
\v\ I
\v\\ \\v\ I
\v\ ■ v\ \\v\\ \\v\ II
This implies
cos(p((v), («)) < cos(p((v), (v\))
\vi I
+ 3b2 + 2V3ab.
and thus the vector v\, which we have found, represents the largest possible value of the cosine of angles between all choices of vectors inU. But since the function cos is decreasing on the interval [0, j], we get the smallest possible angle in this way, and so the claim is proved. □
4.21. Calculating angles. The procedure in the previous lemma ;i can be understood as follows. We take the orthogonal projection of the one-dimensional subspace generated by 1; t into the subspace U, and we look at the ratio between 1; 1 ^ ' and its image. A similar procedure is used in the higher dimension too. However, the problem is to recognize the directions whose projections give the desired (minimal) angle. We can see this in our previous example if we project the bigger space U into one-dimensional (1;) first, and then orthogonally back to U. We find out that the desired angle corresponds to the direction of
221
CHAPTER 4. ANALYTIC GEOMETRY
If we use a2 + b2 = 1, we get
0 = 2b2 + 2V3ab, tj. 0 = b (b + V3aj .
Together (remember that c = 3a and a2 + b2 = 1)
1 73
a = ±1, & = 0, c = ±3; a = ±-, & = T—, c
2 2
We can easily check that lines determined by those coefficients
x + 3 = 0, satisfy all the conditions.
1
- x
2
73 3 — V + - = 0 2^2
3
±-.
2
□
4.28. Determine general equation of all planes so that angle between every such plane and plane x + y + z — 1 = 0is 60°, and further, they contain line p : [1, 0, 0] + t (1, 1, 0). O
4.29. Determine angles between planes
a: [1,0, 2] + (1,-1, l)t + (0,1,-2)5 p: [3, 3, 3]+ (1,-2,0)? + (0,1,1)5
Solution. Line of intersection between planes has direction vector (1,-1,1), plane ortogonal to this vector has intersection with given planes generated by vectors vektory (1,0, —1) a (0, 1, 1). Angle between these one-dimensional subspaces is 60°. □
4.30. Cube ABCDA'B' C D' (in standard notation, i.e. ABCD and A'B' C D' are faces and A A' is an edge). Compute angle between AB' and AD'.
Solution. Consider cube of side 1 and place it in M3 in such way that
vertex A has coordinates [0, 0, 0], vertex B coordinates [1, 0, 0] and
vertex C coordinates [1, 1, 0]. Then vertex B! has coordinates [1, 0, 1]
and vertex D' coordinates [0, 1, 1]. We can determine vectors AB' =
B' — A = [1,0, l]-[0, 0, 0] = (1,0, 1), AD' = D'-A = [0, 1, 1]-
[0, 0, 0] = (0, 1, 1). By definition of angle U2 as before. Similarly, let i/> : U2 -> U\ be the map which has arisen from the orthogonal projection onU\. In the bases (e\, ek) and (e'j,..., e'}), these maps have matrices
A =
ek ■ ex
B =
\e\-e\ ... ek-e'J
l -e\
erei
\e[ ■ ek ... e'r ekJ
Since we are regarding scalar products on a real vector space, et ■ e'j — e'j -et holds for all indices i, j, in particular we have B — A T.
The composition of maps fofi : U\ —>• U\ has therefore a symmetric positive semidefinite matrix AT A, and i/> is an adjoint map to i ) + pi t>i ) +qi (u2, vt )
- p2(v1,v1) - q2(v2,v1) = 0,
((-3,2, -5, -7, -3),v2) + pi (uuv2) +qx (u2,v2)
- p2 (vu v2) - q2 (v2, v2) = 0. By computing those dot products we get linear equation system
tj-
We define the absolute value of the volume of a parallelepiped inductively such that we fulfil the idea that it is the product of the volume of the "base" and the "altitude":
\Vol\Tk(A;Ul,
I Vol\Vi(A; «0 = ||«i II
■ ,«*)= Ik* II I Vol |7Vi(A; «i,
■, w*-i).
If u\,..., un is a basis agreeing with the orientation of V, we define the (oriented) volume of the parallelepiped by
VolP*(A; «i,..., u„) — | Vol\Vk(A; uu..., u„),
in the case of a nonagreeing basis we set
VolP*(A; wi,..., un) — —| Vol\Vk(A; uu ..., un).
The following claim clarifies our former comments that the determinant expresses the volume in a sense. The thing is that the first claim says exactly that we get the volume of the parallelepiped in a ^-dimensional space, which is stretched on k vectors, such that we write down their coordinates (in an orthonormal basis) into columns of a matrix and we calculate the determinant.
The formula in the second claim is called Gramm determinant. Its advantage is that it is independent on the choice of basis and, therefore, it is better to handle in the case that k is lower then the dimension of the whole space.
Theorem. Let Q c £n be an euclidean subspace, and let (e\,..., ek) be its orthonormal basis. Then for arbitrary vectors u\, ... ,uk € Z(Q) and A € Q the . following holds
(1) VolTk(A; ux
Uk) —
(2) (VolTk(A;Ul,...,Uk)y
Proof. The matrix
u\ ■ e\ u\ ■ u\
u\ ■ Uk
Uk ■ e\ Uk ■ ek
.. Uk ■ U\ .. Uk ■ Uk
A = :
\u\-ek ..
has the coordinates of vectors u\,. columns, and
|A|2= \A\\A\ = \A-u\ ■ u\
U\ ■ Uk
Uk ■ ei\
Uk ■ ek) . ,uk in the chosen basis in
|A Uk ■ u\
Uk ■ Uk
\ATA\
Hence we see that if (1) holds, then also (2) holds.
The unoriented volume is directly form the definition equal to the product
\Vol\Vk(A;ui,. where ui — u\, v2 — u2
uk) — IN Illk2ll •• , vk — uk
a2vi
Nil, ■ ak v\
of _,vk-i is the result of the Gramm-Schmidt orthogonalization.
224
CHAPTER 4. ANALYTIC GEOMETRY
7,
6pi - \q\ - 9p2 - 3q2
-4pi + 6qi + 6q2 = 6,
9pi - 33p2 - q2 = 31,
3pi - 6qi - p2 - 9q2 = -11,
which we solve by forming matrix and performing elementary row operations.
7 \ / 1
/
V
9
3
0
-9 0
-33 -1
0 0 0
1 0 0 1 0
V 0 0 0 1
0
0 0
o \
-1 -1
2
31
-11 J
The solutions is (pi, q\, p2, q2) = (0, —1, — 1, 2). We have found
Xt-X2 = (-3, 2, -5, -7, -3)-u2+vx-2v2 = (-3, 4, -2, -4, 2).
The size of vector (—3,4,—2,—4, 2) and at the same time distance between planes a\, a2 is hence
7 = V(-3)2 + 42 + (-2)2 + (-4)2 + 22.
We determined distance between q\ and £2 differently than the distance between a\ and a2. We could have used both methods in both cases. Let's try the former method for the case of a\, a2. Let's find ortogonal complement of vector subspace generated by
(2,1,0,0,1), (-2,0,1,1,0), (2,2,4,0,3), (2,0,0,-2,-1)
We get / 2
V
1 0 0 1
2 4 0 0
0
1 0
1 \ 0
3
-1 /
/ 1 0 0 0
0 10 0
0 0 10
V 0 0 0 1
3/2 \ -2
1
2
The ortogonal complement is ((—3/2,2,-1,-2,1)), or rather ((3, —4, 2, 4, —2)). Note that distance between a\ and a2 equals the size of ortogonal projection of vector (difference of arbitrary point in o\ and arbitrary point in a2)
u = (3, -2, 5, 7, 3) = [3, -1, 7, 7, 3] - [0, 1, 2, 0, 0]
to this ortogonal complement. Denote the ortogonal projection of u as pu and choose 1; = (3, —4, 2, 4, —2). Obviously pu = a ■ v for some del and it holds
( u — pu, v ) = 0, tj. ( u, v ) — a { v, v ) = 0.
Computing gives 49 — a ■ 49 = 0. Therefore pu = 1 ■ v = v and the distance between planes a\ and a2 is equal
\\Pu\\ = 732 + (-4)2 + 22 + 42 + (-2)2 = 7.
Method of computing distance using ortogonal complement of sum of vector spaces has proven to be „faster way to the solution". With no doubt, it will be the same for planes q\ a q2. The second method however reveals points where the distance can be measure (pair
Thus we have
(yo\Vk(A;uu...,uk)Y =
VI ■ VI
vi ■ vk
v\ ■ v\ 0 0 0
. . Vk- VI
■ • vk- vk
vk ■ vk
Let us denote by B the matrix whose columns are formed jji:, by the coordinates of vectors v\,..., vk in the orthonor-
mal basis e. Since 1;
vk have arisen from u\
|t as images under a linear transformation with an upper-triangular matrix C with ones on the diagonal, we have B = CAand|B| = \C\\A\ = \A\. Butthen|A|2 = \B\2 = \A\\A\, and thus ~VolVk(A; u\,..., uk) — ±|A|. The resulting volume is zero if the vectors u\,... ,uk are dependent. Provided that they are independent, the sign of the determinant is positive if and only if the basis u\,... ,uk defines the same orientation as the basis e. □
We can formulate the following important geometric consequence:
4.23. Corollary. For each linear map • V on an euclidean
space V, det - [u\,..., un] is antisymmetric n-linear map. It means, it is linear in all arguments, and the interchange of any two arguments causes the change of sign of the result.
(2) The outer product is zero if and only if the vectors u\,... ,un are linearly dependent.
(3) Vectors u\,..., un form a positive basis if and only if their outer product is positive.
In technical applications in the space R3, we often use a closely related operation, so called cross product, which assigns a vector to any pair of vectors.
Let us consider an arbitrary euclidean vector space V of dimension n > 2 and vectors u\,..., u„-\ e V. If we substitute
tors u\,
225
CHAPTER 4. ANALYTIC GEOMETRY
of points in which the planes are the closest). Let's find such points in the case of planes q\, q2. Denote
ui = (1, 0, -1, 0, 0), u2 = (0, 1, 0, 0, -1), ui = (1,1, 1,0, 1), v2 = (0, -2, 0, 0, 3).
Points Xi € q\, X2 € Q2, which are „the closest" (as commented above), are
Xi = [7, 2, 7, -1,1] + hui + siu2, X2 = [2, 4, 7, -4, 2] + t2vi + s2v2,
so
X-i — Xo
[7,2,7, -1, 1] - [2,4,7, -4,2] +hui + SiU2 - t2vt - s2v2 (5, -2, 0, 3, -1) + hui + s\u2 - t2v\ - s2v2.
Dot products
\XX -X2,Ul) [Xl-X2,vl)
o,
0,
[Xi-X2,u2) Xl-X2,v2)
o,
0
then lead to linear equation system
2h = -5,
2s i + 5^2 = 1,
-4t2 - s2 = -2,
-5si - t2 - I3s2 = -1
with only solution t\ = —5/2, s\ = 41/2, t2 = 5/2, s2
obtained
"9 45 19
5 41
Xl = [7,2,7,-1, l]--«l +yM2
X2 = [2, 4, 7, -4, 2] + -vi - Sv2
2 2
-1,
-8. We
39
9 45 19 39 2' T' T' ~ ' ~T
Now we can easily see that the distance between points x\,x2 (and, at the same time, distance between planes q\, q2) je || x\ — x2 \ \ = ||(0,0,0,3,0)||=3. □
4.35. Find intersection of plane passing through point A = [l,2,3,4]eK4 and ortogonal to plane
q : [1, 0, 1, 0] + (1, 2, -1, -2)s + (1, 0, 0, l)f, s,(eR.
Solution. First, let's find plane ortogonal to q. Its direction will be ortogonal to direction of q, for vectors (a,b,c,d) within its direction we get linear equation system
(a,b, c,d) ■ (1,2,-1, -2) = 0 = a+2b-c-2d = 0 (a,b,c,d) ■ (1,0, 0, 1) =0 = a+d = 0.
these n — 1 vectors into the first n — 1 arguments of the n-linear map defined by the volume determinant as above, then we are given one argument left, i.e. a linear form on V. Since we have the scalar product at disposal, each linear form corresponds to exactly one vector. We call this vector u € y the cross product of vectors u\,..., w„_i, i.e. the following holds for each vector w e V
{v, w)
[Ml, ... , W„_l, HI J.
We denote the cross product byw = «i x...x«„_i.
If the coordinates of our vectors in an orthonormal basis are v = (yi,..., yn)T, w = (xi, ... ,xn)T and Uj = (u\j,. ..unj)T, then our definition can be expressed as
y\x\
' ynXn
«11
Ul(n-l) Xi
Mnl • • • Mn(n — 1)
We see from here that the vector i; is given uniquely and its coordinates are calculated by the formal expansion of this determinant along the last column. At the same time, the following properties of the cross product are direct consequences of the definition:
Theorem. For the cross product v — u\ x ... x w„_i we have
(1) v e ..., Un-i)1-
(2) v is nonzero if and only if the vectors u \ ,...,«„_ i are linearly independent,
(3) the length \\v\\ of the cross product is equal to the absolute value of the volume of parallelepiped V(0; u\, ..., w„_i),
(4) («i,..., w„_i, u) is an agreeing basis of the oriented eu-clidean space V.
Proof. The first claim follows directly from the defining formula for i; since substituting an arbitrary vec-WL JkY/ tor uj for w we get the scalar product v ■ uj on the left and the determinant with two equal columns on the right.
The rank of the matrix with n — 1 columns uj is given by the maximal size of a non-zero minor. The minors which define coordinates of the cross product are of degree n — 1 and thus the claim (2) is proved.
If the vectors u\,... ,u„-i are dependent, then also (3) holds. Therefore, let us consider that the vectors are independent, let i; be their cross product, and let us choose an orthonormal basis (ei,..., e„_i) of the space (ui,..., u„-\). It follows from what we have proved that there exists a multiple (l/a)v, 0 ^ a e R, such that (e i,..., ek, (1 /a) v) is an orthonormal basis of the whole space V. The coordinates of our vectors in this basis are
uj — (u\j,u(n-i)j, 0)T, v — (0, ..., 0, a)T.
So the outer product [u\,..., w„_i, i;] is equal (see the definition of cross product)
0
|«i, ..., w„_i, v\ —
«11
«i(«-i)
«(«-i)i ••• «(«-i)(«-i) 0 0 ... 0 a
— (v, v) — a . Expanding the determinant along the last column we get
a2 = a VolP(0; uu ..., ;„_i).
226
CHAPTER 4. ANALYTIC GEOMETRY
Solution is two-dimensional vector space ((0, 1,2, 0), (—1,0, —3, 1)). Plane r ortogonal to q passing through a has parametric equation
r : [1, 2, 3, 4] + (0, 1, 2, 0)w + (-1,0, -3, l)v, u, v e R.
We can obtain intersection of planes from both parametric equations. We get linear equation system
1 + s + t = l-v
2s = 2 + u
1 — s = 3 + 2u — 3v
-2s + t = 4 + v,
which has only solution (it must be so as matrix columns are linearly independent) s = -8/19, t = 34/19, u = -54/19, v = -26/19. Inputting parameter values s and t into parametric form of plane q, we obtain sought intersection [45/19, -16/19, 11/19, 18/19] (needless to say, we get the same solution by inputting the values into r). □
4.36. Find a line passing through point [1,2] e R2 so that angle between this line and line
p : [0, l] + f(l,l)
is 30°.
Solution. Angle between two lines is angle between their direction vectors. It is sufficient to find direction vector v of the line. One way to do so is to rotate direction vector of p by 30°. Rotation matrix for the angle 30° is
cos 30° - sin 30c sin 30° cos 30°
Sought vector v is therefore
We could perform the backward rotation as well. The line (one of two possible) has parametric equation
/V3 1 73 l\ [1'21 + (--2- + 2J'-
□
4.37. Determine cos a, where a is angle between two adjacent faces of regular octahedron (octahedron has eight equilateral triangles as faces).
Solution. Octahedron is symetric, therefore it does not matter which two faces we choose. Further, without loss of generality, asume octahedron of edge length 1 and place it into standard Cartesian coordinate
Both the remaining two claims from the proposition follow From here. □
4.25. Affine and euclidean properties. Now we can have a think about which properties are related to the affine structure of the space and for which properties we really need the scalar product in the difference space.
It is obvious that all euclidean transformations, i.e. bijective affine maps between euclidean spaces, which preserve the distance between points preserve also all objects we have studied. I.e. next to the distances they preserve also un-oriented angles, unoriented volumes, angle between sub-spaces etc. If we want them to preserve also oriented angles, cross products, volumes, then we must assume in addition that our transformations preserve the orientation too.
We may formulate our problem also as follows: Which concepts of euclidean geometry are preserved under affine transformations?
First let us remind that an affine transformation on a n-dimensional space A is uniquely defined by mapping n + 1 points in a general position, i.e. by mapping one n-dimensional simplex. In the plane, it means to choose the image of one (nondegenerate) triangle, which may be an arbitrary (nondegenerate) triangle. The preserved properties will be the properties related to subspaces in particular, i.e. the properties of the type "a line passing through a point" or "a plane contains a line" etc. At the same time, the col-inearity of vectors is preserved, and for every two colinear vectors, the ratio of their lengths is preserved (independently on the scalar product defining the length). Similarly, we have already seen that the ratio of volumes of two n-dimensional parallelepipeds is preserved under transformations (since the determinant of the corresponding matrix changes about the same multiple).
These affine properties can be used smartly in the plane to prove geometric claims. For instance, to prove the fact that the medians of a triangle intersect in a single point and in one third of their lengths, it is sufficient to verify this only in the case of an isosceles right-angled triangle or only in the case of an equilateral triangle, and then this property holds for all triangles. Think this argumentation over!
2. Geometry of quadratic forms
After straight lines, the simplest objects in the analytic geom-_ etry of plane are so called conic sections. They are given by quadratic equations in cartesian coordinates, and by coefficients we recognize //// ■ that the conic is a circle, ellipse, parabola or hyperbola, potentially it may be also a pair of lines or a point (the degenerate cases).
We will see that our tools enable us to classify effectively these objects in all finite dimensions and to work with them. It is also obvious that we cannot distinguish a circle from an ellipse in affine geometry, therefore we begin in the euclidean geometry.
4.26. Quadrics in £„. In analogy with equations of conic sections in plane, we start with objects in euclidean point spaces which are defined in a given orthonormal basis by quadratic equations, we talk about quadrics.
227
CHAPTER 4. ANALYTIC GEOMETRY
system R3 so that its centroid lies in [0, 0, 0]. Its vertices then are located in points A = [^,0,0], B = [0,^,0], C = [-^,0,0], D = [0, 0], £ = [0,0,-|]aF = [0, 0, ^].
We will compute angle between faces CDF and BCF. We have to find vectors ortogonal to their intersection and lying within respective faces, which means ortogonal to CF. They are altitudes from D and F to edge CF in triangles CDF and BCF respectively. Altitudes in equilateral triangle are the same segments as medians, so they are SD and SB, where S is midpoint of CF. Because we know coordinates of points C and F, the point S has coordinates [—0, ^-] and vectors are SD = (^, ,-% a SB = (^, f,Together
cos a
4 ' 2
/ V2 72 _V_2\ /V2 72 V2x ^ 4 ' 2' 4 ^ ' ^ 4 ' 2 ' 4 ^
Therefore a = 132°
□
4.38. In Euclidean space spaces U,V, where
determine angle • R. Similarly, we may think of a general symmetric bilinear form on an arbitrary vector space.
For an arbitrary basis on this vector space, the value fix) on vector x = x\e\ +----h xnen is given by the equation
f(x) = Fix, x) = ^^XiXjFiei, ej) = xT ■ A ■ x
'J
where A = (a^) is a symmetric matrix with elements atj = F(et, ef). We call such maps / quadratic forms, and the formula from above for the value of the form in terms of the chosen coordinates is called the analytic formula for the form.
In general, by a quadratic form we mean the restriction / (x) of a symmetric bilinear form Fix, y) to arguments of the type (x, x). Evidently, we can reconstruct the whole bilinear form F from the values fix) since
fix + y) = Fix + y, x + y) = fix) + fiy) + 2F(x, y).
If we change the basis to a different basis e[,..., e'n, we get different coordinates x = S ■ x* for the same vector (here S is the corresponding transformation matrix), and so
fix) = iS-x')T
A ■ iS-x') = ix'Y ■ iS ■ A ■ S) ■ x'.
Now let us assume again that our vector space is equipped with a scalar product. Then the previous computation can be formulated as follows. The matrix of bilinear form F, which is the same as the matrix of /, transforms under a change of coordinates in such a way that for orthogonal changes it coincides with the transformation of a matrix of a linear map (indeed, then we have S-1 = ST). We can interpret this result also as the following observation:
Proposition. Let V be a real vector space with a scalar product. Then formula
3
19 19 19 I (0,0,0,1,0) 11
-V4 = (0,0, 0, 1,0),
..(0,0,0,1,-1)11 J2 2 Hence cp = jt/4.
Case (e). Let's determine intersection of vector subspaces associated with given affine subspaces. Vector {x\, x2, x3, X4, x$) is in vector subspace of U, if and only if
(X\, X2, X3, X4, x5) =
t (2, 1, 3, 5, 3) + s (0, 3, 1, 4, -2) + r (1, 2, 4, 0, 3)
Proof. Indeed, each bilinear form with a fixed second argument becomes a linear form au( ) = F( , u), and in the presence of a scalar product, it must be given by formula a(u)(v) = v ■ w for a suitable vector w. We set = w. One show directly from the coordinate expression displayed above that cp is a linear map with matrix A. Hence it is selfadjoint.
On the other hand, each symmetric map cp defines a symmetric bilinear form F by formula F(u, v) = (cp(u), v) = (u, cp(v)), and thus also a quadratic form by restriction. □
We get immediately the following consequence of this proposition. For each quadratic form / there exists an orthonormal basis of the difference space in which / has a diagonal matrix (and the values on the diagonal are determined uniquely up to their order).
Due to the identification of quadratic forms with linear maps, we can also define correctly the rank of the quadratic form as the rank of its matrix in any basis (i.e. the rank is equal to the dimension of the image of the corresponding map cp).
4.28. Classification of quadrics. Let us come back to our equation (4.4). Our results on quadratic forms enable us to rewrite this equation as follows
Y^^iA + J2biXi + b = o.
Hence we may assume directly that the quadric is given in this form. In the next step, we do completing the squares for the coordinates x, with A, ^ 0, which "absorbs" the squares together with the linear terms in the same variable (so called Lagrange algorithm, will be discussed in detail later). So we are left only with linear terms corresponding to variables for which the coefficient at the quadratic term was zero, and we get
r = l
pi)2+ E
j satisfying Xj
bjXj + c
= 0
0.
This corresponds to a translation of the origin about the vector with coordinates pt and to such a choice of basis of the difference space that we get the desired diagonal form in the quadratic part. In the identification of quadratic forms with linear maps derived above, it means that cp is diagonal on the orthogonal complement of its kernel. If we are left with some linear terms, we may adjust the orthonormal basis of the difference space for the kernel of cp such that the corresponding linear form is a multiple of the first term of the dual basis. Hence we can already reach the final formula
where k is the rank of matrix of quadratic form /. If b / 0, we can make the constant c in the equation to be zero by a next change of the origin.
Hence we see that the linear term may (but does not have to) appear only in the case that the rank of / is less than n, c e R may be nonzero only if b = 0. The resulting equations are called the canonical analytic formulas for quadrics.
229
CHAPTER 4. ANALYTIC GEOMETRY
for some t,s,r el, and, at the same time, (x\, x2, x3, x4, x5) e V if and only if
(jci, x2, x3, x4, x5) = p (-1, 1, 1, -5, 0) + q (1, 5, 1, 13, -4) for some p, q el. Let's find such t,s,r, p,q el,so that
t (2, 1, 3, 5, 3) + s (0, 3, 1, 4, -2) + r (1, 2, 4, 0, 3) = p(-l, 1, 1, -5,0) +q (1,5, 1, 13, -4).
It is a homogeneous linear equation system. We will solve it in matrix form (order of variables is t, s, r, p, q)
í2 0 1 1 -1 / 1 3 2 -1 -5 \
1 3 2 -1 -5 0 2 1 -1 -3
3 1 4 -1 -1 ~ ... ~ 0 0 1 -1 1
5 4 0 5 -13 0 0 0 0 0
\3 -2 3 0 4 ) \0 0 0 0
4.29. The case of £2. As an example of the previous procedure, let us go through the whole discussion in the simplest ; / case of a nontrivial dimension, i.e. dimension two. ^~z_ The original equation has the form
an
x2 + a22y2 + 2ai2xy + a\x + 02y ■
0.
By a suitable choice of a basis of difference space and the subsequent completing the squares we reach the form (we use the same notation x, y for the new coordinates):
„2
a\\x
■ a22
y2 + a\x + a2y + <
0
where a, may be nonzero only in the case that an is zero. By the last step of the general procedure, i.e. in dimension n = 2 only by a choice of a translation, we reach exactly one of the following equations:
It has showed that vectors defining V are linear combination of U's vectors. That means V is subset of U, and hence M a quadratic form. Then there exist a polar basis for f on V.
Proof. (1) Let A be the matrix of / in basis u — (u\, ...,u„) on V, and let us assume an / 0. Then we may write
f(x\, ..., x„) — a\\x\ + 2ai2XiX2 H-----h A22-4 + • • •
— flfj^flnXi + Ö12X2 H-----h fll«X„)2
+ terms not containing x\.
Hence we transform the coordinates (i.e. we change the basis) such that in new coordinates we have
x\ — a\\x\ + fli2X2 H-----h a\nxn, x'2 — X2, ..., x'n — x„.
It corresponds to the new basis (as an exercise, compute the transformation matrix)
V\ = üyyU\, 1)2 —- U2 — Ü\2U\, . . . , V„ — U„ — üyy a\nu\
and so, as we may expect, in the new basis the corresponding symmetric bilinear form satisfies g(v\,v{) — 0 for all i > 0 (compute!). Thus / has the form a^x'l2 + h in the new coordinates, where h is a quadratic form independent on the variable x\.
Due to technical reasons, it is mostly better to choose v\ — u\ in the new basis. Then we have the expression f — f\ + h, where f\ depend only on x'1, while x'1 does not appear in h, but g(vu vi) - an.
(2) Let us assume that after doing the step (1), we get for h a matrix (of rank less about one) with a nonzero coefficient at x7,2. Then we may repeat exactly the same procedure and we get the expression f — f\ + fi + h, where h contains only the variables with index greater than two. We may proceed in this way as long until we get a diagonal form after n — 1 steps or in a step, say ;-th step, the element an is zero.
(3) If the last possibility happens, but in the same time there exists some other element ajj / 0 with j > i, then it suffices to switch the ;-th and the j-th vector of the basis and to continue according the the previous procedure.
(4) Let us assume now that we come to the situation ajj — 0 for all j > i. If there is no element aß ^ 0 with j > i,k > i, then we are done since we have got a diagonal matrix. If aß / 0, then we use transformation vj — uj + uk + we keep the other vector of basis constant (i.e. x?k — xk — xj, the other remain constant). Then h(vj, Vj) = h(uj, Uj) + h(uk, WjO + 2h(uk, Uj) — 2aj\ ^0 and we can continue according to (1). □
231
CHAPTER 4. ANALYTIC GEOMETRY
'I zlL
o \
,0 0 1
1 3
-3 ,
polar basis is therefore 0, 0), §, 0), (1, -3, 1)). □
3. /(xi, x2, X3)
4.40. Determine polar basis of form / : M3
2*1X3 -(- X2.
Solution. Matrix of the form is
/0 0 1N A = J 0 1 0
V 0 °v
We can switch the order of variables: yi = x2, y2 = xi, y3 = x3. It is then trivial to apply step (1) of Lagrange algorithm (there are no common terms), however for the next step, case (4) sets in. We introduce transformation zi = yi,z2 = y2, zj, = yj, — y2. Pak
f(xl,x2,x3) = zj + 2z2(z3 + z2) = zj + ^(2z2 + z3f - X-z\.
Together we get z,\=y\= x2, z2 = y2= xu z3 = y3 - y2 = x3 - xx. Matrix T for change to polar basis is
/ 0 1 0\ /0 1 0>
T = 1 0 0 and T~l = 1 0 0
V-i o i) \o 1 1,
polar basis is therefore ((0, 1, 0), (1, 0, 1) (0, 1, 1)). □
4.41. Find polar basis of quadratic form / standard basis defined as
I, which is in
f(xi,x2, x3) = xix2 + dissolution. By application of Lagrange algorithm we get:
f(X\, X2, X3) = 2xiX2 + X2X3
we perform substitution according to step (4) of the algorithm y2 — x2—x\, = 2xi (xi + yi) + (*i + y2)*3 = 2x\ + 2x\y2 + X1X3 + y2x3 =
1 1 2 1 2 1 2
= 2^2xi +y2 + 2xi) ~ y2 ~ 8X3 + y2X3 =
substitution yi — 2x\ + y2 + 5X3
1 2 ' 2 ^ 2 , 1 2 1 \2 , ^ 2
= ~ 2^ ~ 8X3 ^ = 23'1 ~ 2yi ~ 2Xs) 8X3 = substitution V3 — \y2 — \x3
4.31. Affine classification of quadratic forms. We can improve the Lagrange algorithm for computing polar basis by
____multiplying the vectors from basis by a scalar such
that the coefficients at squares of variables in the corresponding analytic formula for our form will be only scalars 1,-1 and 0. Moreover, the following law of inertia says that the number of one's and minus one's does not depend on our choices in the course of the algorithm. These numbers are called the signature of a quadratic form. As before, we get a complete description of quadratic forms in the sense that two such forms may be transformed each one into the other by an affine transformation if and only if they have the same signature.
Theorem. For each nonzero quadratic form of rank r on a real vector space V there exists a natural number 0 < p < r and r independent linear forms 0 while for*; e Q we have f(v) < 0. Hence necessarily P n Q — {0} holds, and therefore dim P + dim Q < n. From here we conclude p + (n — q) < n, i.e. p < q. However, we get also q < p by the opposite choice of subspaces.
Thus p is independent on the choice of the polar basis. But then for two matrices with the same rank and the same number of positive coefficients in the diagonal form of the corresponding quadratic form, we get the same analytic formulas. □
While we discussed symmetric maps we talked about definite and semidefinite maps. The same discussion has an obvious meaning also for symmetric bilinear forms and quadratic forms. A quadratic form / on a real vector space V is called (1) positive definite if f(u) > 0 for all vectors u / 0,
232
CHAPTER 4. ANALYTIC GEOMETRY
We get change of basis matrix by either expressing the old variables (xi, x2, x3) by new variables (yi, y3, x3), or equivalently expressing the new ones by the old ones (which is easier), we however need to compute inverse matrix in the latter case.
We have y\ = 2x\ + y2 + \x3 = 2x\ + (x2 — x{) + \x3 and
\x3. Matrix changing basis from
1 V 9X3
1 X\ + \x3
^3 — 2->-z 2^ ~~ 2-"1 1 2
polar basis to standard basis is
Inverse matrix is
'1 _2 _
45 : 3 3 : v0 0
One of polar bases of the given quadratic forms is hence for example basis (see the columns of matrix
{(1/3, 1/3, 0), (-2/3, 4/3, 0), (-1/2, 1/2, 1)}. □
4.42. Determine the type of conic section defined by
3xf
3x\x2 + x2 — 1 = 0.
Solution. We complete the squares:
1
3.Xi — 3x\X2 ~h X2 — 1
:(3*i
:x2y
1
-x\ + x2 1
1
yl 3 4 T 3 1
-,y\
2
2-71 rA 3"
According to list 4.29, the given conic section is hyperbola.
□
4.43. By completing the squares express quadric
-x2 + 3y2 + z2 + 6xy - 4z, = 0
in such way that one can determine its type from it.
Solution. We move all terms containing x to — x2 and complete the
square. We get equation
-(x - 3y)2 + 9/ + 3y2 + z2 - 4z = 0.
There are no „unwanted" terms containing y , so we repeat the procedure for z, which gives us
-(x - 3y)2 + 12/ + (z - 2)2 - 4 = 0.
Now we can conclude that there is a transformation of variables that leads to equation (we can divide by 4 first)
+ y2+z2
1 =0.
(2) positive semidefinite if f(u) > 0 for all vectors u e V,
(3) negative definite if f(u) < 0 for all vectors u ^ 0,
(4) negative semidefinite if f(u) < 0 for all vectors u e V,
(5) indefinite if f(u) > 0 and f(v) < 0 for two vectors u, v e V.
We use the same names also for symmetric matrices corresponding to quadratic forms. By a signature of a symmetric matrix we mean the signature of the corresponding quadratic form.
4.32. Theorem (Sylvester criterion). A symmetric real matrix A is positive definite if and only if all its leading principal minors are positive.
A symmetric real matrix A is negative definite if and only if (— 1)' | A, | > Ofor all leading principal submatrices Ai.
Proof. We must analyse in detail the form of the transformations used in the Lagrange algorithm for constructing the polar basis. The transformation used in the first step of this algorithm always has an upper triangular matrix T, and if we use the technical modification mentioned in the proof of proposition 4.30, the matrix moreover has ones on the diagonal:
T = ( 1  t₁₂  ⋯  t₁ₙ )
    ( 0  1   ⋯  t₂ₙ )
    ( ⋮        ⋱  ⋮ )
    ( 0  0   ⋯  1  )
Such a matrix of the transformation from the basis u to the basis v has several nice properties. In particular, its leading principal submatrices T_k formed by the first k rows and columns are the transformation matrices of the subspace P_k = ⟨u₁, ..., u_k⟩ from the basis (u₁, ..., u_k) to the basis (v₁, ..., v_k). The leading principal submatrices A_k of the matrix A of the form f are the matrices of the restrictions of f to P_k. Therefore, the matrices A_k and A′_k of the restrictions to P_k in the bases u and v, respectively, satisfy A′_k = T_k^T A_k T_k. The inverse matrix to an upper triangular matrix with ones on the diagonal is an upper triangular matrix with ones on the diagonal again, hence we may similarly express A_k in terms of A′_k. Thus the determinants of the matrices A_k and A′_k are equal by the Cauchy formula. So we proved a useful statement:
Let f be a quadratic form on V, dim V = n, and let u be a basis of V such that we never need the steps (3) and (4) of the Lagrange algorithm while finding the polar basis. Then as the result we get the analytic formula
f(x₁, ..., xₙ) = λ₁x₁² + λ₂x₂² + ⋯ + λᵣxᵣ²,
where r is the rank of the form f, λ₁, ..., λᵣ ≠ 0, and the leading principal submatrices of the (former) matrix A of the quadratic form f satisfy |A_k| = λ₁λ₂⋯λ_k, k ≤ r.
In our procedure, each successive transformation creates zeros under the diagonal in the next column. From here it is obvious that if the leading principal minors are nonzero, then the next diagonal term in A is nonzero as well. By this consideration we proved the so-called Jacobi theorem:
Corollary. Let f be a quadratic form of rank r on a vector space V with matrix A in a basis u. No steps of the Lagrange algorithm other than completing squares are needed if and only if the leading principal submatrices of A satisfy |A₁| ≠ 0, ..., |Aᵣ| ≠ 0. Then
We can tell the type of the conic section without transforming its equation to the form listed in 4.29. As we know, we can express every conic section as
a₁₁x² + 2a₁₂xy + a₂₂y² + 2a₁₃x + 2a₂₃y + a₃₃ = 0.
there exists a polar basis (which we get by the above algorithm) in which f has the analytic formula
The determinants
Δ = det( a₁₁ a₁₂ a₁₃ ; a₁₂ a₂₂ a₂₃ ; a₁₃ a₂₃ a₃₃ )  and  δ = det( a₁₁ a₁₂ ; a₁₂ a₂₂ )
are so-called invariants of the conic section, which means that they are not changed by Euclidean transformations (rotations and translations). Furthermore, different types of conic sections have different signs of those determinants.
• Δ ≠ 0: non-degenerate conic sections;
an ellipse for δ > 0, a hyperbola for δ < 0 and a parabola for δ = 0. Furthermore, for a real (not imaginary) ellipse, (a₁₁ + a₂₂)Δ < 0 must hold.
• Δ = 0: degenerate conic sections, i.e. lines.
We can easily check that the signs (or the vanishing) of the determinants are really invariant under coordinate transformations. Denote X = (x, y, 1)^T and let A be the matrix of the quadratic form, so that the corresponding conic section has the equation X^T A X = 0. We get the standard form by a rotation and a translation, i.e. by a transformation to new coordinates x′, y′ satisfying
x = x′ cos α − y′ sin α + c₁,
y = x′ sin α + y′ cos α + c₂,
or, in matrix form, X = M X′ for the new coordinates X′ = (x′, y′, 1)^T, where
M = ( cos α  −sin α  c₁ ; sin α  cos α  c₂ ; 0  0  1 ).
Substituting X = MX′ into the conic section equation we get the equation in the new coordinates
X^T A X = 0  ⟺  (MX′)^T A (MX′) = 0  ⟺  X′^T M^T A M X′ = 0.
Denote by A′ = M^T A M the matrix of the quadratic form in the new coordinates. The matrix M has unit determinant, so
det A′ = det M^T det A det M = det A = Δ.
f(x₁, ..., xₙ) = |A₁|x₁² + (|A₂|/|A₁|)x₂² + ⋯ + (|Aᵣ|/|Aᵣ₋₁|)xᵣ².
Hence if all leading principal minors are positive, then f is positive definite by the Jacobi theorem.
On the other hand, let us suppose that the form f is positive definite. Then for a suitable regular matrix P we have A = P^T E P = P^T P, and so |A| = |P|² > 0. Let u be a chosen basis in which the form f has the matrix A. The restrictions of f to the subspaces V_k = ⟨u₁, ..., u_k⟩ are positive definite forms f_k again, and the corresponding matrices in the bases u₁, ..., u_k are the leading principal submatrices A_k. Thus |A_k| > 0 according to the previous part of the proof.
The claim about negative definite forms follows by observing the fact that A is positive definite if and only if —A is negative definite. □
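The criterion translates directly into a numerical test. A minimal sketch in Python (numpy assumed; determinants of the leading blocks are computed directly, so this is meant for small, well-conditioned matrices only):

    import numpy as np

    def is_positive_definite(A, tol=1e-12):
        """Sylvester criterion: all leading principal minors positive."""
        A = np.asarray(A, dtype=float)
        return all(np.linalg.det(A[:k, :k]) > tol
                   for k in range(1, A.shape[0] + 1))

    def is_negative_definite(A, tol=1e-12):
        # A is negative definite iff -A is positive definite,
        # i.e. iff (-1)^i |A_i| > 0 for all leading principal minors.
        return is_positive_definite(-np.asarray(A, dtype=float), tol)

    A = np.array([[2.0, -1.0], [-1.0, 3.0]])
    print(is_positive_definite(A))    # True: the minors are 2 and 5
    print(is_negative_definite(-A))   # True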
3. Projective geometry
In many elementary texts on analytic geometry, the authors finish with the affine and euclidean objects described above. The affine and euclidean geometries are sufficient for many practical problems, but not for all of them.
For instance, in processing an image from a camera, angles are not preserved and parallel lines may (but do not have to) intersect. Another reason for finding a more general framework for geometric problems and considerations is the wish to deal only with simple numerical operations like matrix multiplication. Moreover, it is difficult to distinguish very small angles from zero angles, and thus it is preferable to have tools which do not need such a distinction.
The basic idea of projective geometry is to extend affine spaces by points at infinity in a way that allows easy work with linear objects like points, lines, planes, projections, etc.
4.33. Projective extension of the affine plane. We begin with the simplest interesting case, the geometry in a plane. If we imagine the points of the plane A₂ as the plane z = 1 in ℝ³, then each point P in our affine plane is represented by a vector u = (x, y, 1) ∈ ℝ³, and so it is represented also by a one-dimensional subspace ⟨u⟩ ⊂ ℝ³. On the other hand, almost every one-dimensional subspace in ℝ³ intersects our plane in exactly one point P, and the vectors of such a subspace are given by coordinates (x, y, z) uniquely up to a common scalar multiple. Only the subspaces corresponding to vectors (x, y, 0) have no intersection with our plane.
___| Projective plane |___
Definition. The projective plane 𝒫₂ is the set of all one-dimensional subspaces in ℝ³. Homogeneous coordinates of a point P = (x : y : z) in the projective plane are triples of real numbers given up to a common scalar multiple, at least one of which must be nonzero. A straight line in the projective plane is defined as the set of one-dimensional subspaces (i.e. points in 𝒫₂) which generate a two-dimensional subspace (i.e. a plane) in ℝ³.
Necessarily, the determinant A₃₃, the algebraic complement of a₃₃, is also invariant under these coordinate transformations, although in general only det A′ = det M^T det A det M holds for it. For a pure rotation the matrix is
M = ( cos α  −sin α  0 ; sin α  cos α  0 ; 0  0  1 )
and det A′₃₃ = det A₃₃ = δ, while for a pure translation
M = ( 1  0  c₁ ; 0  1  c₂ ; 0  0  1 )
and this subdeterminant remains unchanged as well.
4.44. Determine the type of the conic section 2x² − 2xy + 3y² − x + y − 1 = 0.
Solution. The determinant
Δ = det( 2 −1 −1/2 ; −1 3 1/2 ; −1/2 1/2 −1 ) = −23/4 ≠ 0,
hence it is a non-degenerate conic section. Moreover δ = 5 > 0, therefore it is an ellipse. Furthermore (a₁₁ + a₂₂)Δ = (2 + 3)·(−23/4) < 0, so it is a real ellipse.
□
4.45. Determine the type of the conic section x² − 4xy − 5y² + 2x + 4y + 3 = 0.
Solution. The determinant
Δ = det( 1 −2 1 ; −2 −5 2 ; 1 2 3 ) = −34 ≠ 0,
and furthermore δ = det( 1 −2 ; −2 −5 ) = −9 < 0; it is therefore a hyperbola. □
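The two exercises above follow a fixed recipe, so the invariant test is easy to automate. A small sketch in Python (numpy assumed), taking the six coefficients of the general equation:

    import numpy as np

    def classify_conic(a11, a12, a22, a13, a23, a33):
        """Type of a11 x^2 + 2 a12 xy + a22 y^2 + 2 a13 x + 2 a23 y + a33 = 0."""
        A = np.array([[a11, a12, a13],
                      [a12, a22, a23],
                      [a13, a23, a33]])
        Delta = np.linalg.det(A)
        delta = np.linalg.det(A[:2, :2])
        if abs(Delta) < 1e-12:
            return "degenerate conic section"
        if delta > 0:
            # a real (non-imaginary) ellipse requires (a11 + a22) * Delta < 0
            return "real ellipse" if (a11 + a22) * Delta < 0 else "imaginary ellipse"
        return "hyperbola" if delta < 0 else "parabola"

    print(classify_conic(2, -1, 3, -0.5, 0.5, -1))  # 4.44: real ellipse
    print(classify_conic(1, -2, -5, 1, 2, 3))       # 4.45: hyperbola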
4.46. Determine the equation and type of the conic section passing through the points
[−2, −4], [8, −4], [0, −2], [0, −6], [6, −2].
Solution. We input the coordinates of the points into the general conic section equation
a₁₁x² + a₂₂y² + 2a₁₂xy + a₁x + a₂y + a = 0.
We get the linear equation system
4a₁₁ + 16a₂₂ + 16a₁₂ − 2a₁ − 4a₂ + a = 0,
64a₁₁ + 16a₂₂ − 64a₁₂ + 8a₁ − 4a₂ + a = 0,
4a₂₂ − 2a₂ + a = 0,
36a₂₂ − 6a₂ + a = 0,
36a₁₁ + 4a₂₂ − 24a₁₂ + 6a₁ − 2a₂ + a = 0.
In matrix form we perform row operations:
( 4  16  16  −2  −4  1 )
( 64 16 −64   8  −4  1 )
( 0   4   0   0  −2  1 )
( 0  36   0   0  −6  1 )
( 36  4 −24   6  −2  1 )
In order to have a concrete example, let us look at two parallel lines in the affine plane ℝ²:
L₁ : y − x − 1 = 0,  L₂ : y − x + 1 = 0.
If we view the points of the lines L₁ and L₂ as finite points in the projective space 𝒫₂, their homogeneous coordinates (x : y : z) obviously satisfy the equations
L₁ : y − x − z = 0,  L₂ : y − x + z = 0.
It is easy to see that the intersection L₁ ∩ L₂ is the point (−1 : 1 : 0) ∈ 𝒫₂ in this context, i.e. the point at infinity corresponding to the common direction vector of the lines.
4.34. Affine coordinates in the projective plane. Conversely, if we begin with the projective plane and want to see the affine plane as its "finite" part, then instead of the plane z = 1 we may take another plane α in ℝ³ which does not pass through the origin 0 ∈ ℝ³. Then the finite points are those one-dimensional subspaces which have a nonzero intersection with the plane α.
Let us proceed further in our example of two parallel lines from the previous paragraph, and let us see what their equations look like in the coordinates of the affine plane given by y = 1. To get them, it suffices to substitute y = 1 into the previous equations:
L′₁ : 1 − x − z = 0,  L′₂ : 1 − x + z = 0.
Now the "infinite" points of our former affine plane are given by z = 0, and we see that our lines L′₁ and L′₂ intersect in the point (1, 1, 0). This corresponds to the geometric vision that two parallel lines L₁, L₂ in the affine plane intersect at infinity, precisely in the point (1 : 1 : 0).
4.35. Projective spaces and transformations. One can generalize our procedure from the affine plane in a natural way to each finite dimension.
Choosing an arbitrary affine hyperplane Aₙ in the vector space ℝⁿ⁺¹ which does not pass through the origin, we may identify the points P ∈ Aₙ with the one-dimensional subspaces generated by these points. The remaining one-dimensional subspaces fill a hyperplane parallel to Aₙ, and we call them the infinite points of the projective extension 𝒫ₙ of the affine space Aₙ.
Obviously the set of infinite points in V„ is always a projective space of dimension one less. An affine straight line has only one infinite point in its projective extension (both ends of the line "intersect" in infinity and thus the projective line looks like a circle), the projective plane has a projective line of infinite points, the three-dimensional projective space has a projective plane of infinite points etc.
More generally, we define the projectivization of a vector space: for an arbitrary vector space V of dimension n + 1 we define
𝒫(V) = {P ⊂ V ; P a vector subspace, dim P = 1}.
Choosing a basis u in V we get so-called homogeneous coordinates on 𝒫(V): for a point P ∈ 𝒫(V) we take an arbitrary nonzero vector u ∈ P and the coordinates of this vector in the basis u. The points of the projective space 𝒫(V) are called geometric points, while their generators in V are called arithmetic representatives.
In the chosen projective coordinates, we can fix one of them to be one (i.e. we exclude all points of the projective extension which have this coordinate equal to zero), and so we get an embedding
( 4 16 16 −2  −4   1 )
( 0  4  0  0  −2   1 )
( 0  0 64 −8  12  −9 )
( 0  0  0 24 −36  27 )
( 0  0  0  0   3  −2 )
and further
( 48  0  0  0  0 −1 )
( 0  12  0  0  0 −1 )
( 0   0 64  0  0  0 )
( 0   0  0 24  0  3 )
( 0   0  0  0  3 −2 )
We can choose the value of a. If we choose a = 48, we get
a₁₁ = 1, a₂₂ = 4, a₁₂ = 0, a₁ = −6, a₂ = 32.
The conic section has the equation
x² + 4y² − 6x + 32y + 48 = 0.
We complete x² − 6x and 4y² + 32y to squares, which gives us
(x − 3)² + 4(y + 4)² − 25 = 0,
or rather
(x − 3)²/5² + (y + 4)²/(5/2)² = 1.
We can see it is an ellipse with center at [3, −4].
□
4.47. Other characteristics of conic sections. Let us take a further look at some terms related to conic sections. An axis of a conic section is a line of reflection symmetry of the conic section. From the canonical forms of conic sections in a polar basis (4.29) it can be derived that an ellipse has two axes (x = 0 and y = 0), a parabola has one axis (x = 0) and a hyperbola has two axes (x = 0 and y = 0). An intersection of an axis with the conic section itself is called a vertex of the conic section. The numbers a, b from the canonical form of a conic section (which express the distances between the vertices and the origin) are called the semi-axes lengths. In the case of the ellipse and the hyperbola, the axes intersect in the origin. This point is a point of central symmetry of the conic section and is called the center of the conic section. Besides vertices and centers there are other interesting points lying on the axes of a conic section. For an ellipse we have the foci E, F, characterized by the property |EX| + |FX| = 2a for an arbitrary X lying on the ellipse. The following example shows that such points E and F really exist.
4.48. Existence of foci. For an ellipse with semi-axes lengths a > b, the points E = [−e, 0] and F = [e, 0], where e = √(a² − b²), are its foci (in the polar basis).
Solution. Consider points X = [x, y] which satisfy the property |EX| + |FX| = 2a; we show that these are exactly the points of the ellipse.
of the n-dimensional affine space Aₙ ⊂ 𝒫(V). It is precisely the construction which we used in our example of the projective plane.
4.36. Perspective projection. The advantages of projective geometry show up nicely in the case of the perspective projection ℝ³ → ℝ². Let us imagine that an observer sitting in the origin observes "one half of the world", i.e. the points (X, Y, Z) ∈ ℝ³ with Z > 0, and sees the image "projected" on the screen given by the plane Z = f > 0.
Thus a point (X, Y, Z) in the "real world" projects to a point (x, y) on the screen as follows:
x = f X/Z,  y = f Y/Z.
This formula is not only nonlinear, but the accuracy of calculations will also be problematic when Z is small.
Extending this transformation to a map 𝒫₃ → 𝒫₂ we get (X : Y : Z : W) ↦ (x : y : z) = (fX : fY : Z), i.e. a map described by the simple linear formula
(x, y, z)^T = ( f 0 0 0 ; 0 f 0 0 ; 0 0 1 0 ) (X, Y, Z, W)^T.
This simple expression defines the perspective projection for finite points in ℝ³ ⊂ 𝒫₃, which we substitute as points with W = 1. In this way we have eliminated problems with points whose image runs to infinity. Indeed, if the Z-coordinate of a real point is close to zero, then the value of the third homogeneous coordinate of the image is close to zero, i.e. it corresponds to a point close to infinity.
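The computation is easy to experiment with. A minimal sketch in Python (numpy assumed; the screen distance f and the sample points are arbitrary choices):

    import numpy as np

    f = 1.0  # distance of the screen plane Z = f

    # Matrix of the perspective projection in homogeneous coordinates.
    P = np.array([[f, 0.0, 0.0, 0.0],
                  [0.0, f, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])

    def project(X, Y, Z):
        """Image of the finite point (X : Y : Z : 1) on the screen."""
        x, y, z = P @ np.array([X, Y, Z, 1.0])
        # (x : y : z) is a finite point of the image plane iff z != 0.
        return (x / z, y / z) if abs(z) > 1e-12 else None

    print(project(2.0, 1.0, 4.0))  # (0.5, 0.25), i.e. (f X/Z, f Y/Z)
    print(project(2.0, 1.0, 0.0))  # None: the image lies at infinity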
4.37. Affine and projective transformations. Obviously, each injective linear map φ : V₁ → V₂ between vector spaces maps one-dimensional subspaces to one-dimensional subspaces. Therefore, we get a map on the projectivizations T : 𝒫(V₁) → 𝒫(V₂). Such maps are called projective maps; in the literature one also uses the notion collineation if this map is invertible.
Otherwise put, the projective map is a map between projective spaces such that in each system of homogeneous coordinates on domain and image it is given by multiplication by a matrix. More generally if our auxiliary linear map is not injective, then we define the projective map only outside of its kernel, i.e. on points whose homogeneous coordinates do not map to zero.
Since injective maps V —>• V of a vector space to itself are invertible, all projective maps of projective space V„ to itself are invertible too. They are also called regular collineations or projective transformations. In homogeneous coordinates, they correspond to invertible matrices of dimension n+1. Two such matrices define the same projective transformation if and only if they differ by a constant multiple.
If we choose the first coordinate as the one whose vanishing defines the infinite points, then the transformations preserving infinite points are given by matrices whose first row vanishes up to its first element. If we want to switch to affine coordinates of the finite points, i.e. we fix the first coordinate to be one, the first element in the first row must also be equal to one. Hence the matrices of collineations
Coordinate-wise, this equation reads
√((x + e)² + y²) + √((x − e)² + y²) = 2a.
By squaring and performing some operations we get
(a² − e²)x² + a²y² = a²(a² − e²).
Substituting e² = a² − b² and dividing by a²b² we get
x²/a² + y²/b² = 1.
□
Remark. The number e from the previous example is called the eccentricity of the ellipse. Similarly, the foci of a hyperbola are points E, F which satisfy ||EX| − |FX|| = 2a for an arbitrary X on the hyperbola. You can check that there are two points satisfying this condition, [−e, 0] and [e, 0] (in the polar basis), where e = √(a² + b²). The focus of a parabola is the point F = [0, p/2], characterized by the fact that the distance between this point and an arbitrary X on the parabola equals the distance between X and the line y = −p/2.
4.49. Find the foci of the ellipse x² + 2y² = 2.
Solution. We can see from the equation that the semi-axes lengths are a = √2 and b = 1. We easily compute (see 4.48): e = √(a² − b²) = 1; the foci then are [−1, 0] and [1, 0]. □
preserving finite points of our affine space have the form:
( 1   0   ⋯  0   )
( b₁  a₁₁ ⋯  a₁ₙ )
( ⋮           ⋮  )
( bₙ  aₙ₁ ⋯  aₙₙ )
where b = (b₁, ..., bₙ)^T ∈ ℝⁿ and A = (aᵢⱼ) is an invertible matrix of dimension n. The action of such a matrix on the vector (1, x₁, ..., xₙ) is exactly a general affine transformation, where b is the translation and A its linear part. Thus the affine maps are exactly those collineations which preserve the hyperplane of infinite points.
4.38. Determining collineations. In order to define an affine map, it is necessary and sufficient to define an image of the affine frame. In the above description of affine transformations as special cases of projective maps it corresponds to a suitable choice of an image of a suitable arithmetic basis of the vector space V.
But it does not hold in general that the image of an arithmetic basis of V determines the collineation uniquely. We show the core of the problem on a simple example of the affine plane. If we choose four points A, B, C, D in the plane such that each three of them are in general position (i.e. no three of them lie on a line), then we may choose their images in the collineation as follows:
Let us choose arbitrarily their four images A′, B′, C′, D′ with the same property, and let us choose their homogeneous coordinates u, v, w, z, u′, v′, w′, z′ in ℝ³. Obviously the vectors z and z′ can be written as linear combinations
z = c₁u + c₂v + c₃w,
z′ = c′₁u′ + c′₂v′ + c′₃w′,
where all six coefficients must be nonzero, since otherwise there would exist three of our points not in general position.
Now we choose new arithmetic representatives ũ = c₁u, ṽ = c₂v and w̃ = c₃w of the points A, B and C respectively, and similarly ũ′ = c′₁u′, ṽ′ = c′₂v′ and w̃′ = c′₃w′ for the points A′, B′ and C′. This choice defines a unique linear map φ mapping ũ, ṽ, w̃ to ũ′, ṽ′, w̃′ (and hence z to z′).

Given a collineation f : 𝒫(V) → 𝒫(V), we call a point S ∈ 𝒫(V) the center of the collineation f if all hyperplanes in the bunch determined by S are fixed hyperplanes. A hyperplane α is called the axis of the collineation f if all its points are fixed points.
It follows directly from the definition that the axis of a collineation is the center of the dual collineation, while the bunch of hyperplanes defining the center of collineation is the axis of the dual collineation.
Since the matrices of a collineation on the former and the dual space differ only by the transposition, their eigenvalues coincide (eigenvectors are column respectively row vectors corresponding
from 4.29, which corresponds to intersecting a cone in ℝ³ with different planes. Non-degenerate sections are depicted. Degenerate sections are those which pass through the vertex of the cone.
We define the following useful terms for a conic section in the projective plane:
Points P, Q ∈ 𝒫₂ corresponding to one-dimensional subspaces ⟨p⟩, ⟨q⟩ (generated by vectors p, q ∈ ℝ³) are called polar conjugate with respect to the conic section f if F(p, q) = 0, or rather p^T A q = 0, holds.
A point P = ⟨p⟩ is called a singular point of the conic section f when it is polar conjugate with respect to f with all points of the plane, so F(p, x) = 0 for all x ∈ 𝒫₂; in other words, Ap = 0. Hence the matrix A of the conic section does not have maximal rank and therefore defines a degenerate conic section. Non-degenerate conic sections do not contain singular points.
We call the set of all points X = ⟨x⟩ polar conjugate with P = ⟨p⟩ the polar of the point P with respect to the conic section f. It is therefore the set of points for which F(p, x) = p^T A x = 0. Because the polar is given by a linear equation in the coordinates, it is always (in the non-singular case) a line. The following part explains the geometric interpretation of the polar.
4.57. Polar characterization. Consider a non-degenerate conic section f. The polar of a point P ∈ f with respect to f is the tangent to f with the touch point P. The polar of a point P ∉ f is the line defined by the touch points of the tangents to f passing through P.
Solution. We first consider P ∈ f and show by contradiction that the polar has exactly one common point with the conic section (the touch point). Suppose that the polar of P, defined by F(p, x) = 0, intersects f in Q = ⟨q⟩ ≠ P. Then obviously F(p, q) = 0, f(p) = F(p, p) = 0 and f(q) = F(q, q) = 0. For an arbitrary point X = ⟨x⟩ lying on the line through P and Q we then have x = αp + βq for some α, β ∈ ℝ. Because of the bilinearity and symmetry of F we get
f(x) = F(x, x) = α²F(p, p) + 2αβF(p, q) + β²F(q, q) = 0,
to the same eigenvalues). For example, in the projective plane (and for the same reason in each real projective space of even dimension) each collineation has at least one fixed point, since the characteristic polynomials of the corresponding linear maps are of odd degree and so have at least one real root.
Instead of discussing a general theory, we now briefly illustrate its usefulness on several results for projective planes.
Proposition. A projective transformation different from the identity has either exactly one center and exactly one axis, or it has neither a center nor an axis.
Proof. Let us consider a collineation f on 𝒫ℝ³ and let us assume that it has two distinct centers A and B. Let us denote by ℓ the line given by these two centers, and let us choose a point X in the projective plane outside of ℓ. If p and q are the lines passing through the pairs of points (A, X) and (B, X) respectively, then also f(p) = p and f(q) = q, and in particular the point X is fixed. But this means that all points of the plane outside of ℓ are fixed. Hence each line different from ℓ has all points out of ℓ fixed, and thus also its intersection with ℓ is fixed. It means that f is the identity mapping, and so we have proved that every projective transformation different from the identity may have at most one center. The same consideration for the dual projective plane gives the result about at most one axis.
If f has a center A, then all lines passing through A are fixed and correspond therefore to a two-dimensional subspace of row eigenvectors of the matrix corresponding to the transformation f. Therefore, there exists a two-dimensional subspace of column eigenvectors to the same eigenvalue, and this one represents exactly the line of fixed points, hence the axis. The same consideration in the reversed order proves the opposite statement: if a projective transformation of the plane has an axis, then it also has a center. □
For practical problems it is useful to work with complex projective extensions also in the case of a real plane. Then the geometric behaviour can be easily read off the potential existence of real or imaginary centers and axes.
4.42. Projective classification of quadrics. At the end of this section we come back to conics and quadrics. A quadric Q in the n-dimensional affine space ℝⁿ is defined by the general quadratic equation (4.4), see page 228. Viewing the affine space ℝⁿ as affine coordinates in the projective space 𝒫ℝⁿ⁺¹, we may aim to describe the set Q by homogeneous coordinates in the projective space. The formula in these coordinates should contain only terms of second order, since only a homogeneous formula is independent of the choice of the multiple of homogeneous coordinates (x₀, x₁, ..., xₙ) of a point. Hence we are searching for a homogeneous formula whose restriction to affine coordinates, i.e. the substitution x₀ = 1, gives the original formula (4.4).
But this is especially easy: we simply add enough powers of x₀ to all terms, namely none to the quadratic terms, one x₀ to the linear terms and x₀² to the constant term in the original affine equation for Q.
So we get a well defined quadratic form f on the vector space ℝⁿ⁺¹ whose zero set correctly defines the so-called projective quadric Q̄.
The intersection of the "cone" Q̄ ⊂ ℝⁿ⁺¹, i.e. of the zero set of this form, with the affine plane x₀ = 1 is the original quadric Q, whose points are called proper points of the quadric, while the other points Q̄ \ Q in the projective extension are the infinite points.
which means that every point x of the line lies on the conic section f. However, when a conic section contains a line, it has to be degenerate, which is a contradiction. We can also see that in the case of a degenerate conic section the polar is a line of the conic section itself.
The claim for P ∉ f follows from the symmetry of the bilinear form F: when a point Q lies on the polar of P, then the point P lies on the polar of Q.
□
Using polar conjugates we can find the axes and the center of a conic section without the Lagrange algorithm.
Write the matrix of the conic section as a block matrix
( A  a ; a^T  α ),
where A = (aᵢⱼ) for i, j = 1, 2, a is the vector (a₁₃, a₂₃)^T and α = a₃₃. That means the conic section is defined by the equation
u^T A u + 2a^T u + α = 0
for the vector u = (x, y)^T. Now we show that
4.58. The axes of a conic section are the polars of the points at infinity determined by the eigenvectors of the matrix A.
Solution. Because of the symmetry of A, it has in the basis of its eigenvectors the diagonal shape D = ( λ 0 ; 0 μ ), where λ, μ ∈ ℝ, and this basis is orthogonal. Denote by U the matrix changing the basis to the eigenbasis (its columns are the eigenvectors); then the matrix of the conic section in the eigenbasis is
( U^T 0 ; 0 1 )( A a ; a^T α )( U 0 ; 0 1 ) = ( D  U^T a ; a^T U  α ).
So in this basis we have the canonical form defined by the vector U^T a (up to a translation). Specifically, denoting by v_λ, v_μ the eigenvectors, we have
λ(x + (a^T v_λ)/λ)² + μ(y + (a^T v_μ)/μ)² = (a^T v_λ)²/λ + (a^T v_μ)²/μ − α.
It means that the eigenvectors are the direction vectors of the axes of the conic section (the so-called main directions), and the equations of the axes in this basis are x = −(a^T v_λ)/λ and y = −(a^T v_μ)/μ. The coordinates u_λ and u_μ of the points of the axes in the standard basis satisfy v_λ^T u_λ = −(a^T v_λ)/λ and v_μ^T u_μ = −(a^T v_μ)/μ, because v_λ^T(λu_λ +
The classification of real or complex projective quadrics, up to projective transformations, is a problem which we have already managed: it is all about finding the canonical polar basis, see paragraph 4.29. From this classification, which is given by the signature of the form in the real case and only by the rank in the complex case, we can deduce relatively easily also the classification of affine quadrics. We show the core of the procedure on the case of conics in the affine and projective plane.
The projective classification gives the following possibilities, described by homogeneous coordinates (x : y : z) in the projective plane 𝒫ℝ³:
• imaginary regular conic given by x² + y² + z² = 0
• real regular conic given by x² + y² − z² = 0
• pair of imaginary lines given by x² + y² = 0
• pair of real lines given by x² − y² = 0
• one double line x² = 0.
We consider this classification as real, i.e. the classification of the quadratic forms is given not only by the rank but also by the signature. Nevertheless, the points of the quadric are considered also in the complex extension. This is how the stated names should be understood; e.g. the imaginary conic does not have any real points.
4.43. Affine classification of quadrics. For the affine classification we must restrict the projective transformations to those which preserve the line of infinite points. We can realize this also by an opposite procedure: for a fixed projective type of conic Q, i.e. its cone Q̄ ⊂ ℝ³, we choose different affine planes α ⊂ ℝ³ which do not pass through the origin, and we observe how the set of points Q̄ ∩ α, the proper points of Q in the affine coordinates realized by the plane α, changes.
Hence in the case of a regular conic we have a true cone Q̄ given by the equation z² = x² + y², and as the planes α we may take, for instance, the tangent planes to the unit sphere. If we begin with the plane z = 1, the intersection consists only of finite points forming a unit circle Q. By successively tilting α we get more and more stretched ellipses, until we reach a slope at which α is parallel with one of the lines of the cone. At that moment there appears one (double) infinite point of our conic, whose finite points still form one connected component, and so we get a parabola. Continuing the tilting gives rise to two infinite points, the set of finite points is no longer connected, and so we get the last regular quadric in the affine classification, a hyperbola.
The method just introduced enables us to continue the classification in higher dimensions. In particular, let us notice that the intersection of our conic with the projective line of infinite points is always a quadric in dimension one less; in our case it is either an empty set, or a double point, or two points, as the types of quadrics on a projective line. Moreover, an affine transformation transforming one possible realization of a fixed projective type to another one exists only if the corresponding quadrics in the infinite line are projectively equivalent. In this way it is possible to continue the classification of quadrics in dimension three and higher.
a) = 0 and v_μ^T(μu_μ + a) = 0. These equations are equivalent to the equations v_λ^T(Au_λ + a) = 0 and v_μ^T(Au_μ + a) = 0, which are the polar equations of the points defined by the vectors v_λ and v_μ. □
4.59. Remark. A corollary of the previous claim is the fact that the center of the conic section is polar conjugate with all points at infinity. The coordinates s of the center then satisfy the equation As + a = 0.
The equation As + a = 0 for the center coordinates has exactly one solution for δ = det(A) ≠ 0 and no solution for δ = 0. That means that, regarding non-degenerate conic sections, the ellipse and the hyperbola have exactly one center, and the parabola does not have any (its center is a point at infinity).
4.60. Prove that the angle between a tangent to a parabola (at an arbitrary touch point) and the axis of the parabola is the same as the angle between the tangent and the line connecting the focus with the touch point.
Solution. The polar (i.e. the tangent) at a point X = [x₀, y₀] of the parabola given by the canonical equation x² = 2py in the polar basis is the line
x₀x − py − py₀ = 0.
The cosine of the angle between the tangent and the parabola axis (x = 0) is given by the dot product of the corresponding unit direction vectors. The unit direction vector of the tangent is (1/√(p² + x₀²))(p, x₀), and therefore for the cosine we have
(1/√(p² + x₀²))(p, x₀)·(0, 1) = x₀/√(p² + x₀²).
Now we show that the cosine of the angle between the tangent and the line connecting the focus F = [0, p/2] with the touch point X is the same. The unit direction vector of the connecting line is
(1/√(x₀² + (y₀ − p/2)²))(x₀, y₀ − p/2).
For the cosine of the angle we get
(1/√(p² + x₀²))·(1/√(x₀² + (y₀ − p/2)²))·(x₀y₀ + px₀/2).
Substituting y₀ = x₀²/(2p) we again obtain x₀/√(p² + x₀²).
This example shows that light rays striking parallel to the axis of a parabolic mirror are reflected to the focus and, vice versa, light rays going through the focus are reflected in the direction parallel to the axis of the parabola. This is the principle of many devices, such as the parabolic reflector. □
4.61. Find the equation of the tangent at P = [1, 1] to the conic section
4x² + 5y² − 8xy + 2y − 3 = 0.
Solution. By projectivizing we get the conic section defined by the quadratic form (x, y, z)A(x, y, z)^T with the matrix
A = ( 4 −4 0 ; −4 5 1 ; 0 1 −3 ).
Using the previous theorem, the tangent is the polar of P, which has homogeneous coordinates (1 : 1 : 1). It is given by the equation (1, 1, 1)A(x, y, z)^T = 0, which in this case gives
2y − 2z = 0.
Moving back to inhomogeneous coordinates we get the tangent equation y = 1.
□
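All the tangent and polar computations in these exercises reduce to one matrix-vector product. A minimal sketch in Python (numpy assumed), repeating exercise 4.61:

    import numpy as np

    # Matrix of the conic 4x^2 + 5y^2 - 8xy + 2y - 3 = 0.
    A = np.array([[ 4.0, -4.0,  0.0],
                  [-4.0,  5.0,  1.0],
                  [ 0.0,  1.0, -3.0]])

    def polar(A, p):
        """Coefficients (u, v, w) of the polar line ux + vy + wz = 0 of p."""
        return np.asarray(p, dtype=float) @ A

    print(polar(A, (1, 1, 1)))  # [0. 2. -2.], i.e. 2y - 2z = 0, or y = 1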
4.62. Find the coordinates of the intersection of the y axis with the conic section defined by
5x² + 2xy + y² − 8x = 0.
Solution. The y axis, i.e. the line x = 0, is the polar of the sought point P with homogeneous coordinates (p) = (p₁ : p₂ : p₃). That means that the equation x = 0 is equivalent to the polar equation F(p, v) = p^T A v = 0, where v = (x, y, z)^T. This is satisfied when Ap = (a, 0, 0)^T for some a ∈ ℝ. For the conic section matrix
A = ( 5 1 −4 ; 1 1 0 ; −4 0 0 )
this condition gives the equation system
5p₁ + p₂ − 4p₃ = a,
p₁ + p₂ = 0,
−4p₁ = 0.
We can find the coordinates of the point P by the inverse matrix, p = A⁻¹(a, 0, 0)^T, or solve the system directly by backward substitution. In this case we easily obtain the solution p = (0, 0, −a/4). So the y axis touches the conic section at the origin. □
4.63. Find the touch point of the line x = 2 with the conic section from the previous exercise.
Solution. The line has the equation x − 2z = 0 in the projective extension, and therefore we get the condition Ap = (a, 0, −2a)^T for the touch point P, which gives us
5p₁ + p₂ − 4p₃ = a,
p₁ + p₂ = 0,
−4p₁ = −2a.
Its solution is p = (a/2, −a/2, a/4). These homogeneous coordinates are equivalent to (2 : −2 : 1), and hence the touch point has coordinates [2, −2]. □
4.64. Find the equations of the tangents passing through P = [3, 4] to the conic section defined by
2x² − 4xy + y² − 2x + 6y − 3 = 0.
Solution. Suppose that the touch point T has homogeneous coordinates given by a multiple of the vector t = (t₁, t₂, t₃). The condition that T lies on the conic section is t^T A t = 0, which gives
2t₁² − 4t₁t₂ + t₂² − 2t₁t₃ + 6t₂t₃ − 3t₃² = 0.
The condition that P lies on the polar of T is p^T A t = 0, where p = (3, 4, 1) are homogeneous coordinates of the point P. In this case, the equation gives us
(3, 4, 1)( 2 −2 −1 ; −2 1 3 ; −1 3 −3 )(t₁, t₂, t₃)^T = −3t₁ + t₂ + 6t₃ = 0.
Now we can substitute t₂ = 3t₁ − 6t₃ into the previous quadratic equation. Then
−t₁² + 4t₁t₃ − 3t₃² = 0.
Because the equation is not satisfied for t₃ = 0, we can move to the inhomogeneous coordinates (t₁/t₃, t₂/t₃, 1), for which we get
−(t₁/t₃)² + 4(t₁/t₃) − 3 = 0 and t₂/t₃ = 3(t₁/t₃) − 6,
i.e. t₁/t₃ = 1 and t₂/t₃ = −3, or t₁/t₃ = 3 and t₂/t₃ = 3. So the touch points have homogeneous coordinates (1 : −3 : 1) and (3 : 3 : 1). The tangent equations are the polars of those points: 7x − 2y − 13 = 0 and x = 3. □
4.65. Find the equations of the tangents passing through the origin to the circle
x² + y² − 10x − 4y + 25 = 0.
Solution. The touch point (t₁ : t₂ : t₃) satisfies
(0, 0, 1)( 1 0 −5 ; 0 1 −2 ; −5 −2 25 )(t₁, t₂, t₃)^T = −5t₁ − 2t₂ + 25t₃ = 0.
From here we express, for example, t₂ and substitute it into the circle equation, which (t₁ : t₂ : t₃) has to satisfy as well. We get the quadratic equation 29t₁² − 250t₁ + 525 = 0 (for t₃ = 1), which has the solutions t₁ = 5 and t₁ = 105/29. We compute the coordinate t₂ and get the touch points [5, 0] and [105/29, 100/29]. The tangents are the polars of those points, with equations y = 0 and 20x − 21y = 0. □
4.66. Find the equations of the tangents to the circle x² + y² = 5 which are parallel with the line 2x + y + 2 = 0.
Solution. In the projective extension, these tangents intersect in the point at infinity satisfying 2x + y + 2z = 0 and z = 0, so in the point with homogeneous coordinates (1 : −2 : 0). They are the tangents from this point to the circle. We can use the same method as in the previous exercise. The conic section matrix is diagonal with the diagonal (1, 1, −5), and therefore the touch point (t₁ : t₂ : t₃) of the tangents satisfies t₁ − 2t₂ = 0. Substituting into the circle equation we get 5t₂² = 5. Hence t₂ = ±1, the touch points are [2, 1] and [−2, −1], and the tangents are their polars 2x + y − 5 = 0 and 2x + y + 5 = 0. □
A tangent touching the conic section at a point at infinity is called an asymptote of the conic section. The number of asymptotes of a conic section equals the number of intersections of the conic section with the line at infinity: an ellipse has no real asymptote, a parabola has one (which is, however, the line at infinity itself) and a hyperbola has two.
4.67. Find the points at infinity and the asymptotes of the conic section defined by
4x² − 8xy + 3y² − 2y − 5 = 0.
Solution. First, we rewrite the conic section in homogeneous coordinates:
4x² − 8xy + 3y² − 2yz − 5z² = 0.
The points at infinity are then the points with homogeneous coordinates (x : y : 0) satisfying this equation, which means
4x² − 8xy + 3y² = 0.
For the fraction x/y we get two solutions: x/y = 1/2 and x/y = 3/2. The conic section is therefore a hyperbola with the points at infinity P = (1 : 2 : 0) and Q = (3 : 2 : 0). The asymptotes are then the polars of the points P and Q, i.e.
(1, 2, 0)( 4 −4 0 ; −4 3 −1 ; 0 −1 −5 )(x, y, z)^T = −4x + 2y − 2z = 0,
(3, 2, 0)( 4 −4 0 ; −4 3 −1 ; 0 −1 −5 )(x, y, z)^T = 4x − 6y − 2z = 0,
i.e. the lines 2x − y + 1 = 0 and 2x − 3y − 1 = 0.
□
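The same computation can be checked by a few lines of Python (numpy assumed); the printed coefficient vectors are determined only up to a nonzero multiple:

    import numpy as np

    # Conic 4x^2 - 8xy + 3y^2 - 2y - 5 = 0 in homogeneous form.
    A = np.array([[ 4.0, -4.0,  0.0],
                  [-4.0,  3.0, -1.0],
                  [ 0.0, -1.0, -5.0]])

    # Points at infinity: roots of 4t^2 - 8t + 3 = 0 with t = x/y.
    for t in np.roots([4.0, -8.0, 3.0]):      # 1.5 and 0.5
        p = np.array([t, 1.0, 0.0])           # the point (t : 1 : 0)
        print(p @ A)                          # asymptote coefficients
    # Output (up to scale): 2x - 3y - z = 0 and -2x + y - z = 0.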
Further exercises on conic sections can be found on page 250.
4.68. Harmonic cross-ratio. If the cross-ratio of four points lying on a line equals −1, we talk about a harmonic quadruple. Let ABCD be a quadrilateral. Denote by K the intersection of the lines AB and CD, and by M the intersection of the lines AD and BC. Further, let L and N be the intersections of the line KM with AC and BD, respectively. Show that the points K, L, M, N form a harmonic quadruple. O
D. Further exercises on this chapter
4.69. Find the parametric equation of the intersection of the planes in ℝ³:
α : 2x + 3y − z + 1 = 0 and ρ : x − 2y + 5 = 0.
o
4.70. Find the common perpendicular of the skew lines
p : [1, 1, 1] + t(2, 1, 0),  q : [2, 2, 0] + s(1, 1, 1).
o
4.71. Jarda is standing at the point [−1, 1, 0] and has a stick of length 4. Can he simultaneously touch the lines p and q, where
p : [0, −1, 0] + t(1, 2, 1),  q : [3, 4, 8] + s(2, 1, 3)?
O (The stick has to pass through [−1, 1, 0].)
4.72. A cube ABCDEFGH is given. Let the point T lie on the edge BF, |BT| = ½|BF|. Compute the cosine of the angle between the planes ATC and BDE. Q
4.73. A cube ABCDEFGH is given. Let the point T lie on the edge AE, |AT| = ½|AE|, and let S be the midpoint of the edge AD. Compute the cosine of the angle between the planes BDT and SCH. Q
4.74. A cube ABCDEFGH is given. Let the point T lie on the edge BF, |BT| = ½|BF|. Compute the cosine of the angle between the planes ATC and BDE. O
4.75. Determine the tangents to the ellipse x²/20 + y²/5 = 1 parallel with the line x + y − 7 = 0.
Solution. Lines parallel with the given line intersect it in the point at infinity (1 : −1 : 0). We construct the tangents to the given ellipse passing through this point. The touch point T = (t₁ : t₂ : t₃) lies on its polar and therefore satisfies t₁/20 − t₂/5 = 0, so t₂ = t₁/4. Substituting into the ellipse equation we get t₁ = ±4. The touch points of the sought tangents are therefore [4, 1] and [−4, −1], and the tangents are the polars of those points, with equations x + y = 5 and x + y = −5. □
4.76. Determine the points at infinity and the asymptotes of the conic section
2x² + 4xy + 2y² − y + 1 = 0.
Solution. The equation for the points at infinity, 2x² + 4xy + 2y² = 0, or rather 2(x + y)² = 0, has the solution x = −y. The only point at infinity therefore is (1 : −1 : 0) (the conic section is a parabola). The asymptote is the polar of this point, namely the line at infinity z = 0. □
4.77. Prove that the product of the distances between an arbitrary point of a hyperbola and its asymptotes is constant, and find the value of this constant.
Solution. Denote the point lying on the hyperbola by P = [x_P, y_P]. The equations of the asymptotes of a hyperbola in canonical form are bx ± ay = 0. Their normals are (b, ±a), and from here we determine the projections P₁, P₂ of the point P to the asymptotes. For the distances between the point P and the asymptotes we get
|PP₁,₂| = |bx_P ± ay_P|/√(a² + b²).
The product is therefore equal to
(b²x_P² − a²y_P²)/(a² + b²) = a²b²/(a² + b²),
because P lies on the hyperbola. □
4.78. Compute the angle between the asymptotes of the hyperbola 3x² − y² = 3.
Solution. For the cosine of the angle between the asymptotes of a hyperbola in canonical form we get cos α = |a² − b²|/(a² + b²). In our case (a² = 1, b² = 3) the angle equals 60°. □
4.79. Compute the centers of the conic sections
(a) 9x² + 6xy − 2y − 2 = 0,
(b) x² + 2xy + y² + 2x + y + 2 = 0,
(c) x² − 4xy + 4y² + 2x − 4y − 3 = 0,
(d) (x − α)²/a² + (y − β)²/b² = 1.
Solution. (a) The system As + a = 0 for computing proper centers is
9s₁ + 3s₂ = 0,
3s₁ − 1 = 0,
and, solving it, we obtain the center [1/3, −1].
(b) In this case we have
s₁ + s₂ + 1 = 0,
s₁ + s₂ + 1/2 = 0,
and therefore there is no proper center (the conic section is a parabola). Moving to homogeneous coordinates we can obtain the center at infinity (1 : −1 : 0).
(c) The coordinates of the center in this case satisfy
s₁ − 2s₂ + 1 = 0,
−2s₁ + 4s₂ − 2 = 0,
and the solution is a whole line of centers. This is because the conic section degenerates to a pair of parallel lines.
(d) From the equations for the center computation we immediately get that the center is (α, β). The coordinates of the center therefore give the translation of the coordinate system origin to the frame in which the ellipse has the basic form.
□
4.80. Find the equations of the axes of the conic section 6xy + 8y² + 4y + 2x − 13 = 0.
Solution. The main directions of the conic section (the direction vectors of the axes) are the eigenvectors of the matrix
( 0 3 ; 3 8 ).
The characteristic equation has the form λ² − 8λ − 9 = 0, so the eigenvalues are λ₁ = −1, λ₂ = 9. The corresponding eigenvectors are then (3, −1) and (1, 3). The axes are the polars of the points at infinity defined by those directions. For (3, −1) we get the axis equation −3x + y + 1 = 0, and for (1, 3) the axis 9x + 27y + 7 = 0. □
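The center and the axes can again be computed mechanically. A short sketch in Python (numpy assumed), repeating exercise 4.80; the printed line coefficients are determined up to scale:

    import numpy as np

    # Conic 6xy + 8y^2 + 4y + 2x - 13 = 0.
    A = np.array([[0.0, 3.0], [3.0, 8.0]])   # quadratic part
    a = np.array([1.0, 2.0])                 # linear part (a13, a23)

    print(np.linalg.solve(A, -a))            # center: solves As + a = 0

    Abar = np.array([[0.0, 3.0,  1.0],
                     [3.0, 8.0,  2.0],
                     [1.0, 2.0, -13.0]])
    w, V = np.linalg.eigh(A)
    for i in range(2):
        v = np.append(V[:, i], 0.0)          # point at infinity (v1 : v2 : 0)
        print(w[i], v @ Abar)                # eigenvalue, axis coefficients
    # Up to scale: -3x + y + 1 = 0 and 9x + 27y + 7 = 0.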
4.81. Determine the equations of the axes of the conic section 4x² + 4xy + y² + 2x + 6y + 5 = 0.
Solution. The eigenvalues of the matrix ( 4 2 ; 2 1 ) are λ₁ = 0, λ₂ = 5, and the corresponding eigenvectors are (−1, 2) and (2, 1). We get the axes 5 = 0 and 2x + y + 1 = 0. The former equation is obviously satisfied by no point. Hence there is only one axis (the conic section is a parabola). □
4.82. The equation
x² + 3xy − y² + x + y + 1 = 0
defines a conic section. Determine its center, axes, asymptotes and foci.
Exercise solutions
4.9. 2, 3, 4, 6, 7, 8. Try to find positions of the planes which correspond to each of those numbers on your own.

If the inverse function f⁻¹ : B → ℝ exists (do not confuse this notation with the function x ↦ (f(x))⁻¹), then it is uniquely determined by either of the following identities:
f⁻¹ ∘ f = id_ℝ,  f ∘ f⁻¹ = id_B,
and the other one then holds as well. If f is defined on a set A ⊂ ℝ and f(A) = B, the existence of f⁻¹ is conditioned by the same statements with the identity mappings id_A and id_B, respectively, on the right-hand sides. As we can see from the picture, the graph of the inverse function is obtained simply by interchanging the axes of the input and output variables.
If we knew that the inverse y = f⁻¹(x) of a differentiable function x = f(y) is also differentiable, then the chain rule would immediately yield
1 = (id)′(x) = (f ∘ f⁻¹)′(x) = f′(y)·(f⁻¹)′(x),
so we obtain the formula (apparently, f′(y) must be non-zero in this case)
___| Derivative of an inverse function |___
(5.6)  (f⁻¹)′(x) = 1/f′(y).
This corresponds to the intuitive idea that for y = f(x) the value of f′ is approximately Δy/Δx, while for x = f⁻¹(y) it is approximately (f⁻¹)′(y) ≈ Δx/Δy. And this indeed is the way we can calculate the derivatives of inverse functions:
Theorem. If f is a real-valued function differentiable at a point x₀ and such that f′(x₀) ≠ 0, then there is a function f⁻¹ defined on some neighborhood of the point f(x₀) such that (5.6) holds.
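The formula (5.6) is easy to test numerically, for instance for f(y) = e^y with f⁻¹(x) = ln x. A minimal sketch in Python (standard library only):

    import math

    def f(y):
        return math.exp(y)

    def f_inv(x):
        return math.log(x)

    x0 = 2.0
    y0 = f_inv(x0)
    h = 1e-6
    numeric = (f_inv(x0 + h) - f_inv(x0 - h)) / (2 * h)  # central difference
    print(numeric)        # approx. 0.5
    print(1.0 / f(y0))    # 1/f'(y0) = 1/x0 = 0.5, in accordance with (5.6)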
positive x-axis in the "positive sense of rotation", i.e. counterclockwise.) O
Solution. π/4.
5.90. Determine the equations of the tangent and normal line to the curve given by the equation
x³ + y³ − 2xy = 0
at the point [1, 1].
Solution. y = 2 − x; y = x.
5.91. Prove the following inequalities:
x/(1 + x) < ln(1 + x) < x for all x > 0.
Solution. The inequalities follow, for instance, from the mean value theorem (attributed to Lagrange) applied to the function y = ln(1 + t), t ∈ [0, x].
F. Extremal problems
The simple observation 5.32 about the geometrical meaning of the derivative also tells us that a differentiable real-valued function of a real variable can have extrema only at the points where its derivative is zero. We can utilize this mere fact when solving miscellaneous practical problems.
5.92. Consider the parabola y = x². Determine the x-coordinate x_A of the parabola's point which is nearest to the point A = [1, 2].
Solution. It is not difficult to realize that there is a unique solution to this problem and that we are actually looking for the absolute minimum of the function
f(x) = √((x − 1)² + (x² − 2)²),  x ∈ ℝ.
Apparently, the function f takes the least value at the same point where the function
g(x) = (x − 1)² + (x² − 2)²,  x ∈ ℝ,
does. Since
g′(x) = 4x³ − 6x − 2,  x ∈ ℝ,
by solving the equation 0 = 2x³ − 3x − 1, we first get the stationary point x = −1, and after dividing the polynomial 2x³ − 3x − 1 by the polynomial x + 1, we obtain the remaining two stationary points
(1 + √3)/2 and (1 − √3)/2.
As the function g is a polynomial (differentiable on the whole domain), from the geometrical sense of the problem, we get
Proof. First, let us realize that the requirement that the derivative at x₀ be non-zero means that our function f is either increasing or decreasing on some neighborhood of the point; see the corollary 5.32. Thus there exists an inverse function defined on some neighborhood. Since a continuous monotone function maps a closed bounded interval onto a closed interval, the image f(U) of any open set U contained in the domain of f is open as well. Then, by the definition of continuity, the inverse function is continuous, too.
To prove our proposition, it now suffices to carefully read through the proof of the fourth statement of the theorem 5.33. We only choose f for h and f⁻¹ for f, and instead of supposing the existence of the derivatives of both functions, we know that the composite function is differentiable (and that the composite is the identity function). Indeed, by the lemma 5.31, there is a function φ continuous at the point y₀ such that
f(y) − f(y₀) = φ(y)(y − y₀)
on some neighborhood of y₀. Further, it satisfies φ(y₀) = f′(y₀). However, then the substitution y = f⁻¹(x) gives
x − x₀ = φ(f⁻¹(x))(f⁻¹(x) − f⁻¹(x₀))
for all x lying in some neighborhood O(x₀) of the point x₀. Further, we have f⁻¹(x₀) = y₀, and since f is either strictly increasing or strictly decreasing, we get that φ(f⁻¹(x)) ≠ 0 for all x ∈ O(x₀) \ {x₀}. Thus we can write
(f⁻¹(x) − f⁻¹(x₀))/(x − x₀) = 1/φ(f⁻¹(x)),
and the continuity of φ at y₀ yields (f⁻¹)′(x₀) = 1/f′(y₀). □
the function f must take the greatest value on I at its only stationary point x₀ = b/4. Thus the sides of the wanted rectangle are b/2 long (twice x₀, considering the original problem) and h/2 (which can be obtained by substituting b/4 for x into the expression h − 2hx/b). Hence we get S = hb/4. □
5.94. Among the rectangles such that two of their vertices lie on the x-axis and the other two have positive y-coordinates and lie on the parabola y = 8 − 2x², find the one which has the greatest area.
Solution. The base of the largest rectangle is 4/√3 long, and the rectangle's height is then 16/3. This result can be obtained by finding the absolute maximum of the function
S(x) = 2x(8 − 2x²)
on the interval I = [0, 2]. Since this function is non-negative on I, takes zero at I's boundary points, is differentiable on the whole of I and its derivative is zero at a unique point of I, namely x = 2/√3, it has the maximum there. □
5.95. Let the ellipse 3x² + y² = 2 be given. Write the equation of its tangent line which forms the smallest triangle possible in the first quadrant and determine the triangle's area.
Solution. The line corresponding to the equation ax + by + c = 0 intersects the axes at the points [−c/a, 0], [0, −c/b], and the area of the triangle whose vertices are these two points and the origin is
S = c²/(2|ab|).
The line which touches the ellipse at [x_T, y_T] has the equation 3x_Tx + y_Ty − 2 = 0. The area of the triangle corresponding to it is
Now, let us focus on the derivative of the exponential function f(x) = aˣ. If the derivative of aˣ exists at all points x, it will surely hold that
f′(x) = lim_{Δx→0} (a^{x+Δx} − aˣ)/Δx = aˣ · lim_{Δx→0} (a^{Δx} − 1)/Δx.
On the other hand, if the derivative at zero exists, then this formula guarantees the existence of the derivative at any point of the domain and also determines its value. At the same time, we verified the validity of the formula for the one-sided derivatives.
Unfortunately, it will take us some time to verify (see 5.43 and 6.43) that the derivatives of exponential functions indeed exist. We will also see that there is an especially important base e, the so-called Euler's number, for which the derivative at zero equals one. What we can do now is to notice that the exponential functions are special in the way that their derivatives are proportional (with a constant coefficient) to their values:
f′(x) = f′(0)·aˣ,
and once we know that (eˣ)′ = eˣ, we also get
(aˣ)′ = (e^{ln(a)x})′ = ln(a)·e^{ln(a)x} = ln(a)·aˣ.
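The proportionality of the derivative to the value of the exponential function can be observed directly with a central difference. A minimal sketch in Python (standard library only; the base a = 3 is an arbitrary choice):

    import math

    a = 3.0
    h = 1e-7
    for x in (0.0, 1.0, 2.5):
        numeric = (a**(x + h) - a**(x - h)) / (2 * h)
        print(numeric / a**x)   # approx. ln(3) = 1.0986... at every point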
5.37. Mean value theorems. Before we continue our journey of building new functions, we derive several simple statements about derivatives. The meaning of all of them is intuitively clear from the pictures, and the proofs only follow the visual imagination.
Theorem. Let a function f : ℝ → ℝ be continuous on a closed bounded interval [a, b] and differentiable inside this interval. If f(a) = f(b), then there is a number c ∈ (a, b) such that f′(c) = 0.
Proof. Since the function f is continuous on the closed interval (i.e. on a compact set), it reaches both a maximum and a minimum there. If the maximum and the minimum shared the value f(a) = f(b), it would mean that the function f is constant, and thus its derivative is zero at all points of the interval (a, b). Therefore, let us suppose that at least one of the maximum and the minimum is different and occurs at an interior point c. Then it is impossible to have f′(c) ≠ 0, because then the function f would be either increasing or decreasing at the point c (see 5.32), and so it would take both lower and higher values than f(c) in any neighborhood of the point c. □
The above theorem is called Rolle's theorem. It immediately implies the following corollary, known as Lagrange's mean value theorem.
5.38. Theorem. Let a function f : ℝ → ℝ be continuous on an interval [a, b] and differentiable at all points inside this interval.
thus S = 2/(3x_T y_T). Further, in the first quadrant we have x_T, y_T > 0. To minimize this area means to maximize the product x_T y_T, which is (in the first quadrant) the same as to maximize
(x_T y_T)² = x_T²(2 − 3x_T²) = −3(x_T² − 1/3)² + 1/3.
Hence the wanted minimum of the area is at x_T = 1/√3. The tangent's equation is √3x + y = 2, and the triangle's area is S_min = 2√3/3.
□
5.96. At the time t = 0, three points P, Q, R began moving in the plane as follows: the point P is moving from the point [−2, 1] in the direction (3, 1) at the constant speed √10 m/s, the point Q is moving from [0, 0] in the direction (−1, 1) with the constant acceleration 2√2 m/s² (beginning at zero speed), and the point R is going from [0, 1] in the direction (1, 0) at the constant speed 2 m/s. At which time will the area of the triangle PQR be minimal?
Solution. The positions of the points P, Q, R at time t are
P = [−2, 1] + t(3, 1),
Q = [0, 0] + t²(−1, 1),
R = [0, 1] + t(2, 0).
The area of the triangle PQR is determined, for instance, by half the absolute value of the determinant whose rows are the coordinates of the vectors PQ and QR (see 1.34). So we minimize the determinant:
det( −t² − 3t + 2   t² − t − 1 ; t² + 2t   1 − t² ) = 2t³ − t + 2.
The derivative is 6t² − 1, so the extrema occur at t = ±1/√6. Since we consider non-negative time only, we are interested in t = 1/√6. The second derivative of the considered function is positive at this point, thus the function has a local minimum there. Further, its value at this point is positive and less than the value at the point 0 (the boundary point of the interval where we are looking for the extremum), so this point is the wanted global minimum. □
5.97. At 9 o'clock in the morning, the old wolf left his den D and as a part of his everyday warm-up, he began running counterclockwise around his favorite stump S at the constant speed 4 kph (not very quick, is he), keeping the constant distance of 1 km from it. At the same time, Little Red Riding Hood set out from her house H straight to her Grandma's cottage C at the constant speed 4 kph. When will they be closest to each other and what will their distance be at that time? The coordinates (in kilometers) are: D = [2, 3], S = [2, 2], H = [0, 0], C = [5, 5].
Solution. The wolf is moving along a unit circle, so his angular speed equals his absolute speed and his position in time can be described by
Then there is a number c e (a, b) such that
fib) - f(a)
fie) =
b — a
VETA 0 S-rttpW tfWOTE.
Proof. The proof is a simple record of the geometrical mean-ing of the theorem: The secant line between the points [a, f(a)] and [b, fib)] has a tangent line which is parallel to it (have a look at the picture). The equation of our secant line is
y — g(x) — f(a) + ^—^-^^(x - a).
b — a
The difference h (x) — f(x) — g(x) determines the distance of the graph and the secant line (in the values of y). Surely h(a) — h(b) and
it, , ft, , fib) - f(a)
h(x) = f (x)--■-.
b — a
By the previous theorem, there is a point c at which h' (c) — 0. □
The mean value theorem can also be written in the form:
(5.9) f(b) = f(a) + f'(c)(b-a).
In the case of a parametrically given curve in the plane, i. e. a pair of functions y — fit), x — git), the same result about existence of a tangent line parallel to the secant line going through the marginal points is described by the so-called Cauchy's mean value theorem:
Corollary. Let functions y = fit) andx = git) be continuous on an interval [a, b] and differentiable inside this interval, and further let g'it) ^ 0 for all t € (a, b). Then there is a point c € (a, b) such that
fjb) - fja) _ f'jc) gib) - gia) g'ic) '
Proof. Again, we rely on Rolle's theorem. Thus we set
hit) = ifib) - fia))git) - igib) - gia)) fit).
NowMa) = fib)gia)-fia)gib),hib) = f{b)g{a)-f{a)g{p), so there is a number c e (a,b) such that h'(c) — 0. Since g'ic) / Owe get just the desired formula. □
A reasoning similar to the one in the above proof leads to a supremely useful tool for calculating limits of quotients of functions. The theorem is known as I'Hospital's rule:
the following parametric equations:
x(t) = 2 − cos(4t),  y(t) = 2 − sin(4t).
Little Red Riding Hood is then moving along the line
x(t) = 2√2 t,  y(t) = 2√2 t.
Let us find the extrema of the (squared) distance ρ of their positions in time:
ρ(t) = [2 − cos(4t) − 2√2t]² + [2 − sin(4t) − 2√2t]²,
ρ′(t) = 16(cos(4t) − sin(4t))(√2t − 1) + 32t + 4√2(cos(4t) + sin(4t)) − 16√2.
It is impossible to solve the equation ρ′(t) = 0 algebraically; we can only find the solution numerically (using some computational software). Apparently, there will be infinitely many local extrema: every round, the wolf's direction is at some moment parallel to that of Little Red Riding Hood, so their distance decreases for some period; however, Little Red Riding Hood is moving away from the wolf's favorite stump around which he is running. We find out that the first local minimum occurs at t ≈ 0.31 and the next one at t ≈ 0.97, when the distance of our heroes is approximately 70 meters. Clearly this second one is the global minimum as well.
The situation when we cannot solve a given problem explicitly is quite common in practice and the use of numerical methods is of great importance. □
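A grid scan of ρ is enough here. A minimal sketch in Python (numpy assumed), using the parametrizations above:

    import numpy as np

    c = 2 * np.sqrt(2)

    def rho(t):
        # Squared distance between the wolf and Little Red Riding Hood.
        return ((2 - np.cos(4 * t) - c * t)**2
                + (2 - np.sin(4 * t) - c * t)**2)

    ts = np.linspace(0.0, 2.0, 2_000_001)
    d2 = rho(ts)
    print(ts[np.argmin(d2)], np.sqrt(d2.min()))  # t approx. 0.97, about 0.07 km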
5.98. Halley's problem, 1686. A basketball player is standing in front of a basket, at distance l from its rim, which is at height h above the throwing point. Determine the minimal initial speed v₀ which the player must give to the ball in order to score, and the angle φ corresponding to this v₀. See the picture.
Solution. Once again, we will omit units of measurement: we can assume that distances are given in meters and times in seconds (and speeds in meters per second then). Suppose the player throws the ball at time t = 0 and it goes through the rim at time t₀ > 0. We will express the ball's position (while flying) by the points [x(t), y(t)] for t ∈ [0, t₀], and we require that x(0) = 0, y(0) = 0, x(t₀) = l, y(t₀) = h. Apparently,
x′(t) = v₀ cos φ,  y′(t) = v₀ sin φ − gt
for t ∈ (0, t₀), where g is the gravity of Earth, since the values x′(t) and y′(t) are, respectively, the horizontal and vertical speed of the ball. By integrating these equations, we get
x(t) = v₀t cos φ + c₁,  y(t) = v₀t sin φ − ½gt² + c₂
5.39. Theorem. Let us suppose that f and g are functions differentiable on some neighborhood of a point x₀ ∈ ℝ, yet not necessarily at the point x₀ itself. Moreover, let the limits
lim_{x→x₀} f(x) = 0,  lim_{x→x₀} g(x) = 0
exist. If the limit
lim_{x→x₀} f′(x)/g′(x)
exists, then the limit
lim_{x→x₀} f(x)/g(x)
exists as well, and these two limits are equal.
Proof. Without loss of generality, we can assume that both the functions f and g take the value zero at the point x₀. Again, we can illustrate the statement by a picture. Let us consider the points [g(x), f(x)] ∈ ℝ² parametrized by the variable x. The quotient of the values then corresponds to the slope of the secant line between the points [0, 0] and [g(x), f(x)]. At the same time, we know that the quotient of the derivatives corresponds to the slope of the tangent line at the given point. Thus we want to derive the existence of the limit of the slopes of the secant lines from the fact that the limit of the slopes of the tangent lines exists.
Technically, we can make use of the mean value theorem in a parametric form. First of all, let us realize that the existence of the expression f′(x)/g′(x) on some neighborhood of the point x₀ (excluding x₀ itself) is implicitly assumed; thus, especially for points c sufficiently close to x₀, we will have g′(c) ≠ 0.⁴ Thanks to the mean value theorem, we now have
lim_{x→x₀} f(x)/g(x) = lim_{x→x₀} (f(x) − f(x₀))/(g(x) − g(x₀)) = lim_{x→x₀} f′(c_x)/g′(c_x),
where c_x is a number lying between x₀ and x, dependent on x. From the existence of the limit
lim_{x→x₀} f′(x)/g′(x),
it follows that this value will be shared by the limit of any sequence created by substituting the values x = xₙ approaching x₀ into
This is not always necessary for the existence of the limit in a general sense. Nevertheless, for the statement of l'Hospital's rule, it is. A thorough discussion can be found (googled) in the popular article 'R. P. Boas, Counterexamples to L'Hospital's Rule, The American Mathematical Monthly, October 1986, Volume 93, Number 8, pp. 644-645.'
291
CHAPTER 5. ESTABLISHING THE ZOO
for t € (0, to) and c\, c2 € R. From the initial conditions
lim x(t) = jc(0) = 0, lim y(t) = y(0) = 0,
t^0+ t^0+
it follows that c\ = c2 = 0. Substituting the remaining conditions
lim x(t) = x(to) = I, lim y(t) = y(t0) = h
t-Hg- t-Hg-
then gives
/ = voto cos cp, h = voto sin cp — j gt^.
According to the first equation, we have that
I
(5.1)
to
Vq cos cp
and thus we get only one equation
gl2
(5.2)
h = I tan cp
2f2 cos2 cp
where v0 e (0, +oo), cp e (0, tt/2).
Let us remind that our task is to determine the minimal v0 and the corresponding cp which satisfies this equation. To be more comprehensible, we want to find the minimal value of v0 for which there is an angle cp satisfying (||5.2||). Since
l+tan2 From the last equation (quadratic equation in p = tan cp), it follows that
tan 0.
Once again, a suitable substitution (this time q = v^) allows us to reduce the left side to a quadratic expression and subsequently to get
(v2 -g[h + Vh^TfJj (v2 -g[h- Vh^TpJj > 0. As h < \Jh2 + P, it must be that
h + V/i2 +12
l. e. v0
>
h + Jh2 + P
The least value (5.4)
h + y/h2 + P
f'(x)/g'(x). Especially, we can substitute any sequence cXn for xn -> xo, and thus the limit
,. f'(cx) lim -
■*^*o g'(cx)
will exist, and the last two limits will be equal. Thus we have proved that the wanted limit exists and also has the same value. □
>From the proof of the theorem, it is apparent that it holds for one-sided limits as well.
5.40. Corollaries. L'Hospital's rule can easily be extended for limits at the improper points ±oo and for the case of infinite values of the limits. If, for instance, we have
lim f{x) = 0, lim g(x) = 0,
x^oo x^oo
then limJC_>0+ fiX/x) = 0 and limJC_>0+ g(\/x) = 0.
At the same time, from existence of the limit of the quotient of the derivatives at infinity, we get
(/(!/*))' /'(l/xX-l/x2) lim -= lim--—
x^0+ (g(l/x))> x^0+ g'(l/x)(-l/x2)
,. /'(I/*) ,. fix) = lim -= lim -.
X^0+ g'(l/x) X^QO g'(X)
Applying the previous theorem, we get that the limit
f(x) f(l/x) f'(x) lim -= lim -= lim -
x^oo g(X) x^0+ g(l/x) x^oo g'(x)
will exist in this case as well.
The limit calculation is even simpler in the case when
lim f(x) = ±oo, lim g(x) = ±oo.
X^XQ X^Xq
Then it suffices to write
f(x) l/g(x) lim -= lim -,
*->*0 g(x) x^x0 l/f(x)
which is already the case of usage of l'Hospital's rule from the previous theorem. It can be proved that l'Hospital's rule holds in the same form for infinite limits as well:
Theorem. Let f and g be functions differentiable on some neighborhood of a point xo e M, yet not necessarily at the point xo itself Further, let the limits lim*-^ f(x) = ±oo and lim*-^ g(x) = ±oo exist. If the limit
fix)
exists, then the limit
lim
*->*0 g'(x)
,. fix) lim
x^x0 g(x)
exists as well and they equal each other.
Proof. Once again, we can apply the mean value theorem. The key step is to express the quotient in a form where the derivative arises:
/(*) _ /(*) f(x)-f(y) g(x)-g(y) g(x) f(x)-f(y) g(x)-g(y) g(x) where y is fixed, from a selected neighborhood of xo and x is approaching xq. Since the limits of / and g at xq are infinite, we can surely assume that the differences of the values of both functions at x and y, having fixed y, are non-zero.
292
CHAPTER 5. ESTABLISHING THE ZOO
is then matched (see (||5.3||)) by (5.5)
v2 h + Vh2 + P tancp = — = —
i. e. cp = arctg
h + Vh2 + P
I I The previous calculation was based upon the conditions x(t0) = I, y(t0) = h only. However, these only talk about the position of the ball at the time t0, but the ball could get through the rim from below. Therefore, let us add the condition / (t0) < 0 which says that the ball was falling at the time, and let us prove that it holds for vq from (|| 5.41|) and cp from (||5.5||).
Let us remind that we have (see (||5.1||), (||5.2||))
to
vq cos cp
0 2(1 tan cp—h) cos2
Using this, from we get
lim y(t)
t^to-
Vo sin Xfi) —
xx
is a special case of the so-called power mean with exponent r, also known as generalized mean:
Mr(xi,
■xi
The special value M~l is called harmonic mean. Now, let us calculate the limit value of Af for r approaching zero. For this purpose, we will determine the limit by l'Hospital's rule (it it an expression of the form 0/0 and we differentiate with respect to r, while x, are constant parameters).
The following calculation, in which we apply the chain rule and our knowledge of the derivative of the power function, must be read from the back. Existence of the last limit implies the existence of the last-but-one, and so on.
lim ln(Mr(xi,
r^0
, x„)) — lim
r^0
O)
x\ \\iX\-\-----Yxrn \nxn
lim-
r^0
lnxi + •
lnx„
= ln^/xl
Hence we can immediately see that
lim Mr (xi, ..., x„) — Zjx\ ... x„,
r^O
which is a value known as geometric mean.
4. Power series
5.42. The calculation of ex. Besides addition and multiplication, \j, we can also manipulate with limits of sequences. ~i. Thus it might be a good idea to approximate non-polynomial functions by sequences of values that can be calculated.
293
CHAPTER 5. ESTABLISHING THE ZOO
We have obtained that the elevation angle corresponding to the throw with minimal energy is the arithmetic mean of the right angle and the angle at which the rim is seen (from the ball's position).
The problem of finding the minimal speed of the thrown ball was actually solved by Edmond Halley as early as in 1686, when he determined the minimal amount of gunpowder necessary for a cannonball to hit a target which lies at greater height (beyond a rampart, for instance). Halley proved (the so-called Halley's calibration rule) that to hit a target at the point [I, h] (shooting from [0,0]) one needs the same minimal amount of gunpowder as when hitting a horizontal target at distance h + ~Jh2 + P (at the angle — 1, b ^ 0, and a natural number n > 2, it is true that (1 + b)n > 1 + nb.
Proof. For n — 2, we get
(1 +b)2 = 1 +2b-
■b2 > 1
2b.
From now on, we proceed by induction on n, supposing b > —I. Let us assume that the proposition holds for some k > 2 and let us calculate:
(1 + b)k+l = (1 + b)k(\ +b)> (1+ kb)(\ + b)
= l + (k+l)b + kb2 > l + (k+ \)b The statement is, of course, true for b — — 1 as well.
□
Now we can bound the quotient of adjacent terms a„ of out sequence
(i + ^r)-1
i
(n2 - \)nn n2n(n - 1)
1 \" n nL I n —
> (1--)-
1.
n n — 1
Thus we have proved that our sequence is indeed increasing.
The following, very similar, calculation (applying Bernoulli's inequality once again) verifies that the sequence of numbers
b„
1
1 + -
n
n + l
is decreasing. Surely b„ > an.
b„
b„
+1
n
n + 1 n
n+2
1
1
1 + -
n
n
n + 1
n+2
1
1 + -
n
2n + l
2n
n+2
n(n ■+ 2)
n
1
n+2
= 1.
n(n+2)/
Thus the sequence a„ is increasing and bounded from above, so the set of its terms has a supremum which equals the limit of the
294
CHAPTER 5. ESTABLISHING THE ZOO
R = t>0f0 cos cp, —h = v$to sin cp — -\ gt^.
From the first equation, it follows that
R
to
VQ cos -2gR(q>)
R
Vq sin 2 Af. However, so great indeces j satisfy Cj+\ < < 2~^~N+1^ cN. This means that the parial sums of the first n terms of our formal sum are bounded from above by the sums
N-l j j n-N j D„ < >^ —x-7 H--x" >^ —7.
295
CHAPTER 5. ESTABLISHING THE ZOO
for cp -» 7t/2— the value of R decreases) and is differentiable at every point of this interval, it has its maximum at the point where its derivative is zero. This means that R (cp) can be maximal only if
(5.8)
R(cp) = h tan 2cp.
Let us thus substitute (||5.8||) into (||5.7||). We obtain
h tan 2cp v2, sin 2cp — gh2 tan2 2cp + 2hv^ cos2 cp = 0. This equation can be transformed to
tan 2cp v2, sin 2cp + 2v2) cos2 cp = gh tan2 2cp,
2 sin2 2cp
+ vl (cos2 )(l+cos 2 N for some fixed N (very great) and choose a fixed number k < N (quite small). Then for sufficiently large N, we can approximate the sum of the first k terms in the expression of un in (5.10) by i;* with arbitrary precision. Since this part of the sum of un is strictly less than un itself, the sequence u„ must converge to the same limit as the sequence v„. Thus we have proved
__ [ The power series for ex [__,
Theorem. The exponential function is, for every number x e expressed as the limit of the partial sums in the expression
1 9
1+xH--x2
2!
1
y-x".
«=0
5.44. Number series. When deriving the previous important theorems about the function ex, we have accidentally worked with several extraordinarily useful concepts and tools. Now, we will formulate
them in general:
Infinite number series
Definition. An infinite series of numbers is an expression
E
«=0
an = aß -\- a\ -\- a2 -
at
Let, for instance, javelin thrower Barbora Spotakova give a javelin the speed i>o = 27.778 m/s = 100 km/h at the height h = 1.8 m (with g = 9.806 65 m • s~2). Then the javelin can fly up to the distance
R ( 0. It thus makes sense to analyze the function (see (||5.11||) and (||5.12||))
a(y) = 4 arcsin ^ — 2arcsin j, y € [0, R].
By selecting the appropriate unit of length (for which R = 1) we can turn to the function
a(x) = 4arcsin^ — 2arcsinx, x e [0, 1].
Having calculated the derivative
a'(x) = —jL= - 2, x e (0, 1),
we can easily determine that the equation a'(x) =0 has a unique solution
324
CHAPTER 5. ESTABLISHING THE ZOO
x0 = y^e(0, 1), if «2eE(l,4). Let us set n = 4/3 (which is approximately the refractive index of water). Further,
a'(x) > 0, x e (0, xo), c/(x) < 0, x e (xo, 1). We have found that at the point
x0 = /^P- = I Ti = 0.86, the function a has a global maximum
o-(xo) = 4 arcsin -4? - 2 arcsin ^ = 0.734 rad ^ 42 °.
y v/ 2V3 3V3
Although it is amazing that the peak of the rainbow cannot be above the level of approximately 42 ° with regard to the observer, what is even more amazing are the values
a (0.14) = 39.4°, a (0.94) = 39.2 °, a(0.8) = 41.2°, a (0.9) = 41.5 °.
Those imply (the function a is increasing on the interval [0, x0] and decreasing on the interval [x0, 1]) that more than 20 % of the values a lie in the band from around 39 ° to around 42 °, and 10 % lie in a band thinner than 1 °. Furthermore, if we consider
a(0.84) =41.9°, a (0.88) = 41.9 °,
we can see that the rays for which a is close to 42 ° have the greatest intensity. Let us emphasize that this is an instance of the so-called principle of minimum deviation: the highest concentration of the diffused light happens to be at the rays with minimum deviation since the total angle deviation of the ray equals the angle 8 = it — a.
The droplets from which the rays creating the rainbow for the observer come lie on the surface of a cone having the central angle equal to 2a (x0). The part of this cone which is above ground then appears as the rainbow arc to the observer (see the picture). Thus when the sun is setting, the rainbow has the shape of a semicircle. Let us remark that the rainbow exists only with regard to its observer - it is not anchored in the space. Eventually, let us add that the circular shape of the rainbow was examined as early as 1635-1637 by René Descartes. □
5.202. L'Hospital's pulley.
A rope of length r is tied to the ceiling at point A. A pulley is attached to its other end. Another
§,, rope of length I > \fcP- + r2, going through the pulley, is tied to the ceiling at point B which is at distance d from the point A. A weight is attached to this rope. In what position will the x weight stabilize (the system will be in a stationary position)? Omit the mass and the size of the ropes and the pulley. See the picture.
Solution. The system will be in a stationary position if its potential energy is minimized, i. e. the distance f(x) of the weight from the ceiling is maximal. However, this means that for r > d, the pulley only moves under the point B. Further on we will thus suppose that r < d. By the Pythagorean theorem, the distance of the pulley from the ceiling is Vr2 — x2 and from the weight then I — y/(d — x)2 + r2 — x2 , which gives
fix) = Vr2 - x2 + I - y/(d - x)2 + r2 - x2 .
The state of the system is fully given by the value x e [0, r] (see the picture), so it suffices to find the global maximum of the function / on the interval [0, r]. First, we calculate the derivative
325
CHAPTER 5. ESTABLISHING THE ZOO
f W = Jfl -x2 ~ J(d-x)2+fl -x2 = Jr2 -x2 + J(d-x)2+fl -x2 ' X ^ ('0' r-*"
Exponentiating the equatino f'(x) = 0 for x e (0, r) leads to
x1 = d2
r2 —x2 {d—x)2+r2 —x2
Multiplying both sides by (r2 — x2) {(d — x)2 + r2 — x2) then leads to
2dx3 - (2d2 +r2)x2 + d2?2 = 0, x e (0, r).
If we notice that one of the roots of the left-hand polynomial is x = d, we can easily transform the last equation into the form
(x-d) (2dx2 - r2x - dr2) = 0, x e (0, r),
or (we have a formula for the quadratic equation)
2d(x-d)(x- d+f&M.) (x - ^g^) =0, xe (0, r).
Hence we can see that the equation f'(x) = 0 has at most one solution on the interval (0, r). (Since r < d and \Jr2 + Sd2 > r, there are surely not two roots of the considered polynomial in x in the interval (0, r).) It remains to determine whether
i2 W r2 +%& 1 X0 - -4d- - 4 r
G (0, r).
Realizing that r,d > 0 and r < d,we get
0 < x0 < ^ r
l+V^T
r.
As the function /' is continuous on the interval (0, r), it can change sign only at the point x0. From the limits
lim f'(x) = -jf=, lim f'(x) = -oo,
x^0+ -Jdl+rl x^r-
it follows that
fix) > 0, x e (0, x0), /'(jc) < 0, x e (x0, r). Thus the function / has the global maximum on the interval [0, r] at the point x0. □
5.203. A nameless mail company can only transport parcels whose length does not exceed 108 inches and whose sum of length and maximal perimeter is at most 165 inches. Find the largest (i. e. having the greatest volume) parcel which can be transported by this company.
Solution. Let M denote the value 165 (inches) and x the parcel's length (in inches as well). Apparently, the wanted parcel has such a shape that for any t e (0, x), its cross section has a constant perimeter (the maximal one). We will denote this perimeter by p (in inches, again). We want the parcel to have the greatest volume so that the cross section of a given perimeter has the greatest area possible. It is not difficult to realize that the largest planar figure of a given perimeter is a disc. Thus we have derived that the desired parcel has the shape of a cylinder with height equal to x and radius r = p/2n.
Its volume is
V =jtr2x =
and it must be that p + x < M and x < 108. Thus we consider the parcel for which p + x = M. Its volume is
V(X) = i-M^2L = *3-2Mx2+M2x where x e (Q) 10g] _
Having calculated the derivative
326
CHAPTER 5. ESTABLISHING THE ZOO
v,(x) = ^-amx+m2 = 3^-^-?)^ x e (Q) 10g) j
we easily find out the the function V is increasing on the interval (0, 55] = (0, M/3] and decreasing on the interval [55, 108] = [M/3, min{108, A/}]. The greatest volume is thus obtained for x = M/3, where
v (f) = m = 0.011789 M3 ^ 0.867 8 m3. If the company also required that the parcel have the shape of a rectangular cuboid (or more generally a right prism of a given number of faces), we can repeat the previous reasoning for a given cross section of area S without specifying what the cross section looks like. It suffices to realize that necessarily S = kp2 for some k > 0 which is determined by the shape of the cross section. (If we change only the size of the sides of the polygon which is the cross section, then its perimeter will change by the same ratio. However, its area will change by square of the ratio.) Thus the parcel's volume is the function
V(x) = Sx = kp2x = k (M — x)2x, x e (0, 108].
The constant k does not affect the point of the global maximum of the function V, so the maximum is again at the point x = M/3. For instance, for the largest right prism having a square base, we have p = M — x = 2M/3, i. e. the length of the square's sides is a = M/6 and the volume is then
V =a2x = = 0.009 259 M3 ^0.681 6 m3.
For a parcel in the shape of a ball (when x is the diameter), the condition p + x < M can immediately be expressed as nx + x < M, i. e. x < M/{jt + 1) < 108. Thus for x = M/{jt + 1), we get the maximal volume
V = \it (f)3 = -^-3 = 0.007 370 M3 ^0.542 6 m3.
Similarly, for a parcel in the shape of a cube (when x is the length of the cube's edges), the condition p + x < M means x < M/5 < 108. Thus for x = M/5 we get the maximal volume
V =x3 = (f)3 = 0.008 M3 « 0.588 9 m3.
Let us add that the length of the edges of the cube which has the same volume as the found cylinder
is
a = -M-= 0.227 595 M « 0.953 849 m.
Let us realize its length and perimeter sum to 5a = 1.138 M, i. e. more than the company's limit by around 14 %. □
5.204. A large military area (further denoted by MA) having the shape of a square and area of 100 km2 is bounded along its perimeter by a narrow path. From the starting point in one corner of MA, one can get to the target point inside MA by going 5 km along the path and then 2 km perpendicularly to it. However, one can also go along the path at 5 kph for any time period and then askew through the MA at 3 kph. What distance do you have to travel along the path if you want to get there as soon as possible?
Solution. To travel x km along the path (where x e [0, 5]), we need x/5 hours. Our way through MA will then be
V22 + (5 - x)2 = Vx2 - lOx + 29 kilometers long and we will cover it in \Jx2 — lOx + 29/3 hours. Altogether, our journey will take
327
CHAPTER 5. ESTABLISHING THE ZOO
f(x) = \x + \y/x2 - lOx + 29 hours (let us remind that x e [0, 5]). The only zero point of the function
fix) = \ +
1 , 1 jt-5
5 3 v/jc2-10jc+29
is x = 7/2. Since the derivative /' exists at every point of the interval [0, 5] and since
/(D = ?|(5) = f (0) = ^, the function / has its absolute minimum at the point x = 7/2 Thus we should go 3.5 km along the path. □
5.205. You find yourself in a boat on a lake at distance d km from the shore. You want to get to a given place on the shore whose straight-line distance is \JS + Z2 from you (see the picture). What path will you take if you want to be there as soon as possible, supposing you can row at t>i kph and run along the shore at v2 kph? How long will the journey take?
Solution. The optimal strategy is apparently given by first rowing straight to the shore at some point [0, x] for x € [0,1] and then running along the shore to the target point [0,1] (see the picture), so the trajectory consists of two line segments (or only one segment, in the case when x = I). The voyage to the point [0, x] on the shore will take
hours
and the final run then
l—x
hours.
«2
We want to minimize the total time, i. e. the function
on the interval [0, /]. Further, we can assume that t>i < v2. (Clearly for t>i > v2 the optimal strategy is to row straight to the target point, which corresponds to x = I.) First, we calculate the first derivative
and then the second derivative
f(x) = —rJ==, jce(0,Z)-
/ (d2 +x
Further, we solve the equation
t' (x) = 0, i. e. Exponentiating this equation gives
v-2
l +x2 V2
Simple rearrangements lead to
2 \v? / • v7
xA = v 7 2, i. e. ^ —--
-(5)
Let us realize that we consider only x e (0,1). Thus we are interested in whether
^- d
-2- < I, * " ^--'-
If this inequaUty holds, then also v\ < v2 and the function i changes sign only at the point
X0 = G (0,/),
328
CHAPTER 5. ESTABLISHING THE ZOO
and this change is from negative to positive (consider limx^0+ f (x) < 0 and t" (x) > 0, x e (0, /)). This means that in this case, at the point x0 there is the global minimum of the function t on the interval [0, /]. However, if the inequality (||5.205||) is false, then we have f (x) < 0 for all x e (0,1) whence it follows that the global minimum of the function t on [0,1] is at the right-hand marginal point (the function t is decreasing on its domain). The fastest journey will take (in hours)
t (x0)
d2 +*l l-x0 1
V2 Vi
a d
V2
dv2+ivifi^f-id dV2(i-(a)2)+hlyi-(^)2 dV2fi^f+iv
"2 V
i d
supposing (|| 5.2051|), and if (115.20511) does not hold.
ViV2
t (I) = hours
□
5.206. A company is looking for a rectangular patch of land with sides of lengths 5a and b. The company wants to enclose it with a fence and then split it into 5 equal parts (each being a rectangle with sides a, b) by further fences. For which values of a, b will the area S = Sab of the patch be maximal if the total length of the used fences is to equal 2 400 m?
Solution. Let us reformulate the statement of the problem: We want to maximize the product Sab while satisfying the condition
(5.13) 6b + 10a = 2400, a,b>0.
It can easily be shown that the function
a h-> 5a
2 400-lOa
defined for a e [0, 240] takes the maximal value at the point a = 120. Hence the result is
a = 120 m, b = 200 m. Let us add that the mentioned value of b immediately follows from (||5.13||).
□
5.207. A rectangle is inscribed into an equilateral triangle with sides of length a so that one of its sides lies on one of the triangle's sides and the other two of the rectangle's vertices lie on the remaining sides of the triangle. What is the maximum possible area of the rectangle?
5.208. Choose the dimensions of an (open) swimming pool whose volume is 32 m3 and whose bottom has the shape of a square, so that one would spare the least amount of paint possible to prime its bottom and walls. O
5.209. Express the number 28 as a sum of two non-negative numbers such that the sum of the first summand squared and the second summand cubed is as small as possible. O
329
CHAPTER 5. ESTABLISHING THE ZOO
5.210. With the help of the first derivative, find the real number a > 0 for which the sum a + l/a is minimal. Then solve this problem without using the differential calculus. O
5.211. Inscribe a rectangle with the greatest perimeter possible into a semidisc with radius r. Determine the rectangle's perimeter. O
5.212. Among the rectangles with perimeter Ac, find the one having the greatest area (if such one exists) and determine the lengths of its sides. O
5.213. Find the height h and the radius r of the largest (i. e. having the greatest volume) cone which fits into a ball of radius R. Q
5.214. From the triangles with a given perimeter p, select the one with the greatest area. O
5.215. On the parabola given by the equation 2x2 — 2y = 9, find the points which are closest to the origin of the coordinate system. O
5.216. Your task is to create a one-liter tin having the "usual" shape of a cylinder so that the minimal amount of material would be used. Determine the proper ratio between its height h and radius r. O
5.217. Determine the distance of the point [3,-l]el2 from the parabola y = x2 — x + \. Q
5.218. Determine the distance of the point [—4, —2] e R2 from the parabola y = x2 + x + 1. O
5.219. At the time t = 0, a car left the point A = [5, 0] at the speed of 4 units per second in the direction (—1, 0). At the same time, another car left the point B = [—2, —1] at the speed of 2 units per second in the direction (0, 1). When will the cars be closest to each other and what will their distance be at that moment? O
5.220. At the time t = 0, a car left the point A = [0, 0] at 2 units per second in the direction (1,0). At the same time, another car left the point B = [1, — 1] at 3 units per second in the direction (0, 1). When will they be closest to each other and what will the distance be? O
5.221. Determine the maximum possible volume of a cone with surface area 3tv cm2 (the surface area of its base is included as well). The area of a cone is P = 7tr(r + h), its volume then V = \Ttr2h, where r is the radius of its base and h is its height. O
5.222. A 13 feet long ladder is leaned against a house. Suddenly the base of the ladder slips off and the ladder begins to go down (still touching the house at its other end). When the base of the ladder is 12 feet from the house, it is moving at 5 feet per second from it. At this moment:
(a) What is the speed of the top of the ladder?
(b) What is the rate of change of the triangle dehmited by the house, the ladder, and ground?
(c) What is the rate of change of the angle enclosed by the ladder and the ground?
o
5.223. Suppose you own an excess of funds without the possibility to invest outside your own factory which acts at a regulated market with a nearly unhmited demand and a hmited access to some key raw materials, which allows you to produce at most 10 000 products per day. You know that the raw profit p and the expenses e, as functions of a variable x which determines the average number of products per day, satisfy
330
CHAPTER 5. ESTABLISHING THE ZOO
v(x) = 9x, n(x) = x3 - 6x2 + 15x, x e [0, 10]. At what production will you profit the most from your factory? O
5.224. Determine
lim ( cotx--
x^O \ X
Solution. If we realize that
1
lim cotx = +oo, lim — = +oo,
x^0+ x^0+ X
1
lim cotx = — oo, lim — = — oo,
jt-»0- x^O- X
we can see that both one-sided limits are of the type oo — oo. We can thus consider the (two-sided) limit.
We will write the cotangent function as the ratio of the cosine and the sine and convert the fractions
to a common denominator, i. e.
1 \ x cos x — sin x lim cotx--= lim-.
x^o \ x) x^o xsmx
Thus we have obtained an expression of the type 0/0 for which we get (by 1'Hospital's rule)
xcosx —sinx cosx — x sinx — cosx —xsinx
lim-= lim-= lim
x^o xsinx x^o sinx + xcosx sinx + x cosx
By one more use of 1'Hospital's rule for the type 0/0, we then get
—xsinx —sinx—xcosx 0 — 0
lim-= lim-=-= 0.
x^o sinx + x cosx cosx + cosx — x sinx 1 + 1 — 0
5.225. Determine the limit
7TX
5.226. Calculate
lim (1 — x) tan ■ 2
lim (— — xtan x).
^f-V2 /
5.227. Using l'Hospital's rule, determine
j^((3"-2"W
5.228. Calculate
. 1 1
lim
l \ 2 In x x2 — 1
5.229. By l'Hospital's rule, calculate the limit
^2
2
lim cos
x^+oo \ x
□
o
o
o
o
o
331
CHAPTER 5. ESTABLISHING THE ZOO
5.230. Determine
lim (1 — cosx)s
o
5.231. Determine the following Umits
lim xtaT, Hm xisz,
where a e M is arbitrary. O
5.232. By any means, verify that
ex - 1 lim-= 1.
x^O X
o
5.233. By applying the ratio test (also called D'Alembert's criterion; see 5.46), determine whether the infinite series
(a) E
n = l oo
(b) E ff;
«=i
(c) E „".„,
converges
« = 1
Solution. Since (a„ > 0 for all n)
2*+1-(«+2)3-3* -• ■v-a-Ti3 3"+1-2"-(« + l)
(a) um 22±i = lim ll[<»?rf; = lim = lim ^ = f < 1;
v y „^oo an 3»+1-2»-(« + D3 3(« + l)3 „^^ 3«3 3
(b) lim 22±i = Km f^l- • = Urn 4r = 0 < 1;
(C) Km 2s±l = lim (, i^r'n, • = I™ t^tt • lim ^ = lim 4 •
v y „^oo an \(« + l)2-(n + 1)! «" / „^00 (« + l)2 „^00 «" „^00 «2
lim (l + i)" = 1 e > 1, the series (a) converges; (b) converges; (c) does not converge (it diverges to +00). □
5.234. By applying the root test (Cauchy's criterion), determine whether the infinite series
(a) E ln"(« + l);
(b) E
«=1
00
(c) Earcsin"! «=1
converges.
Solution. Once again we consider series with non-negative terms only, where
(a) lim ^fa~n = lim = 0 < 1;
(«±±y iim (i+iy
(b) lim 4a~n = lim ^TT = 7°°V L = f < 1;
(c) lim ^/öJJ" = lim arcsin = arcsin 0 = 0 < 1.
This means that all of the examined series converge. □
332
CHAPTER 5. ESTABLISHING THE ZOO
5.235. Determine whether the series
oo
(a) £(-D" ln(l + £);
n = \
oo 2
(b) E ^ ■
.i!
« = 1 oo
(c) v (~3)" «=i
converges.
Solution. The case (a). By l'Hospital's rule, we have
r K1+^) r r 1 1
lim v ! ' = lim 2 ,- = lim —l— = 1,
hoo 2* x^*+oo (2^) x^*+oo 1 + 2*
hence
0 < In (1 + < £
for all sufficiently large neN. However, we know that the series E^i *s convergent. So it must be that
00
£ln(l + £) < +00,
«=i
i. e. the examined series converges (absolutely). The case (b). The ratio test gives
lim
i- 2("+1)2-«l i- 22"+1 i- 2-4"
hm--= lim ^—r = lim ^-V = +00.
«^00 (« + l)!-2«2 «^00 " + 1
Thus the series does not converge.
The case (c). Now we will use the general version of the root test
lim sup y\an I = lim sup 6+3_1)n = f < 1, whence it follows that the series is (absolutely) convergent. □
5.236. By any means, determine whether the following alternating series converge:
(a) E(-l)
n « +3«-l .
(3«-2)2 n = \
/u\ —1 3n4-3n3+9n-
yO) *■) (5«3-2)4"
_±\h-1 3n4-3n3+9n-l
n = l
Solution. The case (a). Since we have that
„2
lim = lim = i y^ 0,
„^oo (3«-2)2 9«2 9 7-'
it immediately follows that the limit
does not exist. Therefore, the series does not converge (a necessary condition for the convergence is not satisfied).
The case (b). We have seen that when applying the ratio (or root) test, the polynomials neither in the numerator nor in the denominator affect the value of the examined limit. Let us thus consider the series
00
4«
n = l
for which we have
lim
"n+\
4- )J
0, pe(0,2), /a»<0, 0 there exists a norm of the partition S > 0 such that for all partitions S with norm lesser than S, we have
15S - /| < e.
For example, if we choose g(x) on interval [0, 1] as a sequentially constant function with finitely many discontinuities ci,... ,Ck and "jumps"
at = lim g(x) - lim g(x),
x—y ci _|_ x—yc[ —
then the Riemann-Stieltjes integral exists for every continuous fix) and equals
»1 k
/ f(x)dg(x) = y2aif(ck). Jo ~i
By the same technique we used for the Riemann integral, we can now define upper and lower sums and uppoer and lower Riemann-Stieltjes integral, which have the advantage that for bounded functions they always exist and their values coincide if and only if the Riemann-Stieltjes integral in the above sense exists.
We already encountered problems with Riemann integration of functions that were "too jumpy". Technically, for function g(x) on a finite interval [a, b] we define its variation by
sup Igte)
S r = l
■ g(Xi-l)\,
where we take the supremum over all partitions S of the interval [a, b]. If the supremum is infinite, we say g(x) has an unbounded variation on [a, b], otherwise we say g is a function with a bounded variation on [a, b].
Similarly to the procedure for the Riemann integral, we can quite easily derive the following:
Theorem. Let fix) and g(x) be real functions on a finite interval [a, b].
(1) Ifgix) is decreasing and continuously differentiable, then the Riemann integral on the left side and the Riemann-Stieltjes integral on the right side both exist simultaneously and their values are equal
fb pb
f(x)dg(x)
f f(x)g\x)dx= f
Ja Ja
(2) If fix) is continuous and g(x) is a nondecreasing function
rb
with a finite variation, then the integral Ja f(x)dg(x) exists.
6.49. Kurzweil integral. The last stop will be a modification of \^ the Riemann integral, which fixes the unfortunate be-\ havior at the third point in the paragraph 6.37, i.e. the limits of the nondecreasing sequences of integrable functions will again be integrable. Then we will be able to interchange the order of the limit process and integration in these cases, just like with uniform convergence.
First notice what's the essence of the problem. Intuitively we should assume that very small sets must have a zero size, and thus the changes of values of the functions on such sets shouldn't influence the integration. Moreover, a countable union of such "negli-_gihlq for Jhe purpose of integration" sets should have a zero size again. Surely we would expect that for example the set of rational
392
CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS
Of course, the convergence of the series at points ±1 can be veri-
fied directly. It's even possible to directly deduce that E ^
«=i
(n + D
(by writing out ^
(n + D
n + 1"
1
□
6.86. Sum of a series. Using theorem 6.41 "about the interchange of a limit and an integral of a sequence of uniformly convergent functions", we'll now add the number series
oo 1
T —
n = \
We'll use the fact that f
xn+l
1
«2" '
Solution. On interval (2, oo), the series of functions E^Li ^rr converges uniformly. That is implied for example by the Weierstrass test: each of the function is decreasing on interval (2, oo), thus their
values are at most ^n-; the series Y^T=
:1 2" + !
is convergent though
2«+l '
(it's a geometric series with quotient -]). Hence according to the Weierstrass test, the series of functions E^Li ^rr converges uniformly. We can even write the resulting function explicitly. Its value at any x € (2, oo) is the value of the geometric series with quotient 7, so if we denote the limit by fix), we have
fix)
Y —
z—i Yn + 1
« = 1
1 1
X2 1 -
1
x(x — 1)
By using (6.43) (3), we get
Ax
'2
~ 1 _ ™ r°° dx
nln ~ ^ J2 xn+1
n=\ n=\ 2
/•oo / 00 j
A
\«=i
-jf
dx
x(x
-<5
dx
f 1 1
lim /---
<5^oo J2 X — 1 X
dx
lim [(ln(<5 - 1) - ln(<5) - ln(l) + In2]
lim
• X2 between metric spaces with metrics d\ and d2, respectively, is called an isometry iff all elements x,y e X satisfy d2(cp(x), cp(y)) — d\(x, y).
Of course, every isometry is a bijection onto its image (this follows from the properly that the distance of distinct elements is non-zero) and the corresponding inverse mapping is an isometry as well.
Now, let us consider two inclusions of a dense subset, i\ : X —>• Xi and i2 : X —>• X2, into two completions of the space X, and let us denote the corresponding metrics by d, d\, and d2, respectively. Apparently, the mapping
X
X2
438
CHAPTER 7. CONTINUOUS MODELS
Hence we can see that the sequence {x„} cannot be a Cauchy sequence. Thus we have found the answer for the usual metric. However, we could have utilized the fact that the sequence {xn} is not convergent by (||7.281|) and that we find ourselves in a complete metric space, where Cauchy sequences and convergent sequences coincide.
For the metric d, it suffices to realize that the mapping / introduced in (||7.271|) is a continuous bijection between the sets [0, oo) and [0, 1), having the property that /(0) = 0. Thus, any sequence is convergent "in the original meaning" if and only if it converges in the metric space R with metric d. It holds as well that a sequence is a Cauchy sequence in R with respect to the usual metric if and only if it is a Cauchy sequence with respect to d. □
7.26. Is the metric space 0,
12 0
12 0 -24 > 0, 0 2 0 = 6 > 0, -3 0 1
imply that the matrix Hf (2, 1, 4) is positive definite - there is a strict local minimum at the point [2,1,4]. □
8.30. Find the local extrema of the function
z = (x2 - l) (l - x4 - y2) , x, y e R.
Solution. Once again, we calculate the partial derivatives zx, zy and set them equal to zero. This leads to the equations
-6x5 + 4x3 + 2x - Ixy2 = 0, (x2 - l) (-2y) = 0,
whose solutions [x, y] = [0, 0], [x, y] = [1, 0], [x, y] = [-1, 0]. (In order to find these solutions, it suffices to find the real roots 1, — 1 of the polynomial — 6x4 + Ax2 + 2 using the substitution u = x2. Now, we compute the second-order partial derivatives
-30x4 + 12x2 + 2-2/, z
xy
~yx
-4xy, zyy = -2 (x2 - l)
Hz (0, 0)
and evaluate the Hessian at the stationary points:
^ °), //z(l,0) = //z(-l,0) = (-016 £
We can see that the first matrix is positive definite, so the function has a strict local minimum at the origin.
However, the second and third matrices are negative semidefinite. Therefore, the knowledge of second partial derivatives in insufficient for deciding whether there is an extremum at the points [1,0] and [—1,0]. On the other hand, we can examine the function values near these points. We have
z (1,0) =z (-1,0) = 0, z(x,0)<0 forxe(-l,l). Further, consider y dependent on x e (—1, 1) by the formula y = ^2 (l — x4), so that y -» 0 for x -» ± 1. For this choice, we get
z (x, ^2(1 -x4)) = (x2 - 1) (x4 - 1) > 0, x e (-1, 1).
We have thus shown that in arbitrarily small neighborhoods of the points [1, 0] and [—1,0], the function z, takes on both higher and lower values than the function value at the corresponding point. Therefore, these are not extrema. □
8.31. Decide whether the polynomial
p(x, y)
x6 + y8 + y4x4
has a local extremum at the stationary point [0, 0]. Solution. We can easily verify that the partial derivatives px and py are indeed zero at the origin. However, each of the partial derivatives Pxx, Pxy, Pyy is also equal to zero at the point [0, 0]. The Hessian Hp (0, 0) is thus both positive and negative semidefinite at the same time. However, a simple idea can lead us to the result: We can notice that p(0,0) = Oand
p(x, y) = x6 (1 - y5) + y8 + y4x4 > 0
for [x, y] € R x (—1, 1) \ {[0, 0]}. Therefore, the given polynomial has a local minimum at the origin. □
8.32. Determine local extrema of the function / : R3 -» R,
f(x,y,z)=x2y + y2z+x-zon.R3. O
Now, gy (x, y) — fy (x + t, y) — fy (x, y), so we can write • 0 must guarantee the wanted equality
fxy(x, y) = fyx(x, y)
at all points (x, y).
The same procedure for functions of n variables proves the following fundamental result:
I Interchangeability of partial derivatives
8.10. Theorem. Let f : En —>• M be a k-times differentiable function with continuous partial derivatives up to order k (inclusive) in a neighborhood of a point x € M". Then all partial derivatives of the function f at the point x up to order k (inclusive) are independent of the order of differentiation.
Proof. The proof for the second order was illustrated above in , the special case when n — 2 . The procedure works
similarly for the general case as well. " Formally, the proof can be led in the following way: we may assume that for every fixed choice of a pair of coordinates Xi and xj, the whole discussion of their interchanging takes place in a two-dimensional afline subspace, i. e., all the other variables are considered to be constant and take no effect in the reasonings.
In the case of higher-order derivatives, the proof can be finished by induction on the order. Indeed, every order of the indeces i\, ...,ik can be obtained from a fixed one by several swaps of adjacent pairs of indeces. □
8.11.
Hessian. In the case of first-order derivatives, we introduced the differential, being the linear form df(x) which approximates a function fata point x in the best way. Similarly, we will now want to understand the quadratic approximation of a function / : E„ —>•
474
CHAPTER 8. CONTINUOUS MODELS WITH MORE VARIABLES
8.33.
Determine the local extrema of the function /
9
x y
y2 z + 4x +z
on
8.34. Determine the local extrema of the function /
fix, y, z) = xz2 + y2 z - x + y on R3.
8.35. Determine the local extrema of the function /
fix, y, z) = y2z - xz2 + x + 4y on R3.
8.36. Determine the local extrema of the function /
fix, y) = x2y + x2 + ly2 + y on R2
8.37. Determine the local extrema of the function /
fix, y) = x2y + 2/ + 2y on R2.
8.38. Determine the local extrema of the function /
fix, y) = x2 + xy + 2y2 + y on R2.
8.39. Determine the local extrema of the function /
fix, y) = x2 + xy - 2y2 + y on R2.
O
► I
O
► I
o o o o o
F. Implicitly given functions and mappings
8.40. Let F : R2 -» R be a function, F(x,y) = xysin^xy2). Show that the equality Fix,y) = 1 implicitly defines a function / : U -» R on a neighborhood U of the point [1,1] so that Fix, fix)) = 1 forx € U. Determine f'il).
Solution. The function is differentiable on the whole R2, so it is such on any neighborhood of the point [1, 1]. Let us evaluate Fy at [1, 1]:
Fyix, y) = x sin
+ Ttx2 y2 cos
so Fy(l, 1) = 1 7^ 0. Therefore, it follows from theorem 8.18 that the equation Fix,y) = 1 implicitly determines on a neighborhood of the point (1, 1) a function / : U -» R defined on a neighborhood of the point (number) 1. Moreover, we have
Fxix, y)=y sin (^y2) + ijxy3 cos (^y2) ,
so the derivative of the function / at the point 1 satisfies ^(1, 1) 1
Fy(h 1)
1
□
Remark. Notice that although we are unable to explicitly define the function / from the equation Fix, fix)) = 1, we are able to determine its derivative at the point 1.
8.41. Considering the function F : R2 -
Fix, y) = ex sin(y) + y
77-/2 - 1
, show that the equation F{x,y) = 0 implicitly defines the variable y to be a function of x, y = fix), on a neighborhood of the point [0,77-/2]. Compute /'(0).
Solution. The function is differentiable in a neighborhood of the point [0,7T/2]; moreover, Fy = ex cosy + 1, F(0, jt/2) = 1 ^ 0, so the equation indeed defines a function / : U -» R on a neighborhood of the point [0, tt/2]. Further, we have Fx = ex siny, 7^(0, jt/2) = 1, and its derivative at the point 0 satisfies:
Fx(0, tt/2) 1 f'iO) = - _.' =-- = -1. □
7% (0,77-/2)
1
Hessian
Definition. If / : M" —>• M is a twice differentiable function, we call the symmetric matrix of functions
H fix) =
d2f dxi dxj
(H-ix)
ix) =
a2/
dx\dxn
ix)\
a2/
dxn dx\
ix)
■Al—ix)!
ÓXndXn v ' /
the Hessian of the function / at the point x.
We have already seen from the previous reasonings that zeroing the differential at a point (x, y) e E2 guarantees stationary behavior along all curves going through this point. The Hessian
H fix, y)
fxx ix, y) fxy ix, y)
fxyix,y) /vv(V. V)
plays the role of the second derivative. For every parametrized straight line
cit) = (x(0, yit)) the univariate functions
a(0 = /M0,X0)
m = fixo,yo) + ^-(xo,yo)^ ox
(x0 + %t, y0 + nt),
of
t-(*o, yo)ri
dy
fxxixo, yo)t + 2fxyix0, yo)ŠV + fyyixo, yo)v'
will share the same derivatives up to the second order (inclusive) at the point t — 0 (calculate this on your own!). The function f3 can be written in terms of vectors as
ßit) — fixo, yo) + dfixQ, y0) •
l-iH v)-Hfixo,yo)-(Í
or Pit) = fix0, yo) + dfix0, yo)iv) + jHfixo, yo)iv, v), where i; — (§, n) is the increase given by the derivative of the curve c(f), and the Hessian is used as a symmetric 2-form.
This is an expression which looks like Taylor's theorem for univariate functions, namely the quadratic approximation of a function by Taylor's polynomial of degree two. The following picture shows both the tangent plane and this quadratic approximation for two distinct points and the function fix, y) — sin(x) cos(y).
8.12. Taylor's theorem. The multidimensional version of Tay-lor's theorem is once again an example of a mathemat-ical statement where the most difficult part is finding ' ] '\MXJ the right formulation. The proof is quite simple then.
475
CHAPTER 8. CONTINUOUS MODELS WITH MORE VARIABLES
8.42. Let
F(x, y, z) = sin(xy) + sin(yz) + sin(xz).
Show that the equation F{x, y,z) =0 implicitly defines a function z(x, y) : R2 -» R on a neighborhood of the point [tt, 1,0] e I3 so that F(x, y, zix, y)) = 0.
Determine zxin, 1) and zy(7t, 1).
Solution. We will calculate Fz = y cos(yz)+x cos(xz), Fz(tc, 1, 0) = tt + 1 7^ 0, and the function z(x, y) is defined by the equation F(x, y, z(x, y)) = 0 on a neighborhood of the point [tt, 1, 0]. In order to find the values of the wanted partial derivatives, we first need to calculate the values of the remaining partial derivatives of the function F at the point [tt, 1,0].
Fx(x, y, z) Fy(x, y, z)
y cos(xy) + zcos(xz) Fx(jv, 1, 0) x cos(xy) + z cos(yz) Fy(jv, 1, 0)
-1,
-TT,
odkud
zxin, 1)
Zy(Tt, 1)
1
F An, 1,0) Fz(it, 1,0)
FyJTt, 1,0) _ _
Fz(it, 1,0) ~ tt + ť
Tt + 1
7t
□
8.43. Having the mapping F : R3 ^ R2, F(x,y,z) = (f(x,y,z),g(x,y,z)) = (exsmy,xyz), show that the equation F(x, ci(x), c2(x)) = (0,0) defines a curve c : R -» M2 on a neighborhood of the point [1, Tt, 1]. Determine the tangent vector to this curve at the point 1.
Solution. We will calculate the square matrix of the partial derivatives of the mapping F with respect to y and z;.
H(x,y,z) = [fy fz
8y 8z
Hence, H(l,jt, 1)
-1
1
x cos y ex sin y 0
xz, xy
and det#(l, tt, 1)
-TT
ŕ o.
Now, it follows from the implicit mapping theorem (see 8.18) that the equation Fix, c\{x), c2(x)) = (0, 0) on a neighborhood of the point [1, tt, 1] determines a curve (ci(x), c2(x)) defined on a neighborhood of the point [1, tt]. In order to find its tangent vector at this point, we need to determine the )column) vector (fx, gx) at this point:
fx
sin y e
yz
.x sin y
fxihTT, 1)
8 Ahn, 1)
The wanted tangent vector is thus
(cúAh (c2)Ah)
fy(hn,l) fz(hn,l) ^(l,7r,l) gz(hn,\)
-1 0
1 TT
TT
1 0
fAhn,\ 8x(\,n, 1)
0
□
We will proceed in the direction mentioned above, and we will introduce a notation for the particular parts of Dkf approximations of higher orders for functions /:£„—>• R". It will alwyas be ^-linear expressions in the increases, and we will be interested only in their enumeration at a &-tuple of same values.
We have already discussed the differential D1 f — df (the first order) and the Hessian D2f — Hf (the second order). In general, for functions /:£„—>• R, points x — (x\,..., x2) e En, and increases i; — (§i, ...,§„), we set
Dkf(x)(v)
E
dkf
l• R be a k-times differentiable function in a neighborhood Og(x) of a point x € En. For every increase v € W of size || v || < ^, exzsta a number 6, 0 < 0 < 1, swcft
/(x + u) = /(x) + Dlf(x)(v) + ±D2f(x)(v)+
1
(*- 1)!
2! k!
Proof. For an increase i; e M", we consider the parametrized jijf'straight line c(f) — x + tv in E„, and we examine the function • R defined by the composition (pit) — |■ / o c(t). Taylor's theorem for univariate functions claims that (see Theorem 6.4)
(Pit) = (piO) + /(xq), respectively. If equality holds for no x / xo in the previous inequalities, we talk about a strict extrémům.
For the sake of simplicity, we will suppose that our function / has continuous both first-order and second-order partial derivatives on its domain. A necessary condition for existence of an extrémům at a point xq is that the differential be zero at this point, i. e., df(xo) — 0. Indeed, if df(xo) / 0, then there is a direction v in which we have dvf(xo) / 0. However, then the function value is increasing at one side of the point xo along the line xo + tv and it is decreasing on the other side, see (5.32).
An interior point x e En of the domain of a function / at which the differential df(x) is zero is called a stationary point of the function f.
All
CHAPTER 8. CONTINUOUS MODELS WITH MORE VARIABLES
extrema. Further, inside every eighth of the sphere given by the coordinate planes, there may or may not be another extremum. The particular quadrants can be easily parametrized, and the function h (considered a function of two parameters) can be analyzed by standard means (or we can have it drawn in Maple, for example).
Actually, solving the system (no matter whether algebraically or in Maple again) leads to a great deal of stationary points. Besides the six points we have already talked about (two of the coordinates equal to zero and the other to ±1) and which have a = ±|, there are also the points
y 3 3 3 J
for example, where a local extremum indeed occurs.
If we restrict our interest to the points of the circle K, we must give another function G another free parameter rj representing the gradient coefficient. This leads to the bigger system
0 0 0
0 = x2 + y2 +z2- L 0 = x+y + z.
3x2 — 2Xx - V,
3/ - 2ky - V,
3z2 -2Xz - V,
However, since a circle is also a compact set, h must have both a global minimum and maximum on it. Further analysis is left to the reader. □
f(x,y,z) = 1. If so, find
8.46. Determine whether the function / : R3 -» x2 y has any extrema on the surface 2x2 + 2^ + z2 these extrema and determine their types.
Solution. Since we are interested in extrema of a continuous function on a compact set (ellipsoid) - it is both closed and bounded in R3 - the given function must have both a minimum and maximum on it. Moreover, since the constraint is given by a continuously differentiable function and the examined function is differentiable, the extrema must occur at stationary points of the function in question on the given set. We can build the following system for the stationary points:
2xy x2 0
4kx, 4ky, 2kz.
This system is satisfied by the points [± , , 0] and [± , — , 0]. The function takes on only two values at these four stationary points. Ir follows from the above that the first and second stationary points are maxima of the function on the given ellipsoid, while the other two are minima. □
8.47. Decide whether the function / : R3 -» R, f(x, y, z) = z -xy2 has any minima and maxima on the sphere
2,2,2
x +y +z
1.
We will again, for a while, work with a simple function in E2 in order to illustrate our conclusions directly. Let us consider the function fix, y) — sin(x) cos(y) which has been discussed and caught in many pictures, namely in paragraphs 8.9 a 8.8.
The shape of this function resembles well-known egg plates, so it is apparent that we can find a lot of extrema, but also many more stationary point which, in fact, will not be extrema (the little "saddles" noticeable in the picture).
Therefore, let is calculate the first derivatives, and then the necessary second-order ones:
fxix, y) — cos(x) cos(y), fy(x, y) — - sin(x) sin(y),
and both derivatives will be zero for two sets of points
(1) cos(x) — 0, sin(y) — 0, that is (x, y) — i^^-n, £n), for any t,leZ
(2) cos(y) — 0, sin(x) — 0, that is (x, y) — (kit, 2^-n), for any t,leZ.
The second partial derivatives are
Hfix,y) =(f" ffxy)ix,y)
\Jxy Jyy/
- sin(x) cos(y) — cos(x) sin(y)
- cos(x) sin(y) — sin(x) cos(y)
We thus get the following Hessians in our two sets of stationary points:
If so, determine them.
(1) Hfikn + j, £n) — ± ^ ^j, where the sign — occurs
when k and £ have the same parity (remainder upon division by two), and the sign + occurs in the other case;
(2) Hfikn, In + j) — ± ^ ^j, where, again, the sign — occurs occurs when k and £ have the same parity, and the sign + occurs in the other case;
Now, if we look at the proposition of Taylor's theorem for order k — 2, we get, in a neighborhood of one of the stationary points
(*o, yo),
fix, y) = f(x0, yo)+ 1
+ 2Hf(x° + °(x ~ x°)' y° + e(y ~ yo))(x - xo, y - yo),
where Hf is now considered a quadratic form evaluated at the increase (x — xo, y — yo). Since the Hessian of our function is continuous (i. e., continuous partial derivatives up to order two, inclusive) and the matrices of the Hessian are non-degenerate, the local maximum occurs if and only if our point (xo, yo) belongs to the former group with k and £ of the same parity. On the other
478
CHAPTER 8. CONTINUOUS MODELS WITH MORE VARIABLES
Solution. We are looking for solutions of the system
x = —ky2, y = —2kxy, z = k.
2j. The first
The second equation implies that either y = 0 or x = possibility leads to the points [0, 0, 1], [0, 0, —1]. The second one cannot be satisfied (substituting into the equation of the sphere, we get the equation
1 1 ,
777 + -r, +k = L 4k2 2k2
which has no solution. The function has a maximum and minimum, respectively, at the two computed points on the given sphere. □
8.48. Determine whether the function / : R3 -» R, f(x, y, z) = xyz, has any extrema on the ellipsoid given by the equation
g(x,y,z) = kx2 + lf + z2 = 1, k, I e R+.
If so, calculate them.
Solution. First, we build the equations which must be satisfied by the stationary points of the given function on the ellipsoid:
dx dx
dy
dy
yz
xz.
xy
2Xkx,
2Xly,
2Xz.
dz dz
We can easily see that the equation can only be satisfied by a triple of non-zero numbers. Dividing pairs of equations and substituting into the ellipse's equation, we get eight solutions, namely the stationary points x = ±77^, y = ±-^j, z = ±7^5- However, the function / takes on only two distinct values at these eight points. Since it is continuous and the given ellipsoid is compact, / must have both a maximum and minimum on it. Moreover, since both / and g are continuously differentiable, these extrema must occur at stationary points. Therefore, it must be that four of the computed stationary points are local maxima of the function (of value 777=) and the other four are
minima (of value
3V3«
□
8.49. Determine the global extrema of the function
f(x, y) = x2 - 2y2 + 4xy - 6x - 1 on the set of points [x, y] that satisfy the inequalities (8.1) x>0, y>0, y<-x+3.
Solution. We are given a polynomial with continuous partial derivatives on a compact (i. e. closed and bounded) set. Such a function necessarily has both a minimum and a maximum on this set, and this can happen only at stationary points or on the boundary. Therefore, it suffices to find stationary points inside the set and the ones on a finite number of open (or singleton) parts of the boundary, then evaluate / at these points and choose the least and the greatest values. Notice that the set of points determined by the inequalities (||8.11|) is clearly a triangle with vertices at [0, 0], [3, 0], [0, 3].
hand, if the parities are different, then the point from the former group happens to be a point of a local minimum.
On the other hand, the Hessian of the latter group of points is always positive at some increases and negative at other ones. Therefore, the entire function / behaves in this manner in a small neighborhood of the given point.
In order to formulate the general statement about the Hessian and the local extrema at stationary points, we have to remember the discussion about quadratic forms from the paragraphs 4.31^1.32 in the chapter on affine geometry. There, we introduced the following attributes for a quadratic form h : En -> M:
• positively definite iff h (w) > 0 for all u ^ 0
• positively semidefinite iff h(u) > 0 for all u e V
• negatively definite iff h(u) < 0 for all u ^ 0
• negatively semidefinite iff h(u) < 0 for all u e V
• indefinite iff h(u) > 0 and f(v)<0 for appropriate u,v e V.
We also invented some methods which allow us to find out whether a given form has any of these properties.
The Taylor expansion with remainder immediately yields the following proposition:
Theorem. Let f : En -> Rbe a twice continuously differentiable function and x e En be a stationary point of the function f. Then
(1) f has a strict local minimum at x if Hf(x) is positively definite,
(2) f has a strict local minimum at x if H fix) is negatively definite,
(3) f does not have an extremum at x if H fix) is indefinite.
Proof. The Taylor second-order expansion with remainder applied to out function f(x\,..., x„), an arbitrary point x — (xi,..., x„), and any increase 1; — (vi,..., v„), such that both x and x + v lie in the domain of the function /, says that
f(x + v) = f(x) + df ix)iv) + \nfix + 0 ■ v)iv)
for an appropriate real number 6, 0 < 6 < 1. Since we suppose that the differential is zero, we get
fix + v) = fix) + l-Hfix + 6 ■ v)iv).
By our assumption, the quadratic form Hf(x) is continuously dependent on the point x, and the definiteness or indefiniteness of quadratic forms can be determined by the sign of the major subde-terminants of the matrix Hf, see Sylvester's criterion in paragraph 4.32. However, the determinant itself is a polynomial expression in the coefficients of the matrix, hence a continuous function. Therefore, the non-zeroness and signs of the examined determinants are the same in a sufficiently small neighborhood of the point x as at the point x itself.
In particular, for positively definite Hf(x), we have guaranteed that, at a stationary point x, f(x + v) > f(x) for sufficiently small 1;, so this is a sharp minimum of the function / at the point x. The case of negative definiteness is analogous. If Hf(x) is indefinite, then there are directions 1;, w in which fix + v) > fix) and fix + w) < fix), so there is no extremum at the stationary point in question. □
479
CHAPTER 8. CONTINUOUS MODELS WITH MORE VARIABLES
Let us determine the stationary points inside this triangle as the solution of the equations fx = 0, fy = 0. Since
fx(x, y) = 2x+Ay- 6, fy(x, y) = Ax - Ay, these equations are satisfied only by the point [1, 1]. The boundary suggests itself to be expressed as the union of three line segments given by the choice of pairs of vertices. First, we consider x = 0, y € [0, 3], when fix, y) = —2y2 — 1. However, we know the graph of this (univariate) function on the interval [0, 3] It is thus not difficult to find the points at which global extrema occur. They are the marginal points [0, 0], [0, 3]. Similarly, we can consider y = 0, x € [0, 3], also obtaining the marginal points [0, 0], [3,0]. Finally, we get to the line segment y = — x + 3, x e [0, 3]. Making some rearrangements, we get
f(x, y) = f{x, -x + 3) = -5x2 + 18* - 19, x e [0, 3]. We thus need to find the stationary points of the polynomial p(x) = —5x2 + 18* — 19 from the interval [0, 3]. The equation p'ix) = 0, i. e., — 10* + 18 = 0, is satisfied by x = 9/5. This means that in the last case, we obtained one more point (besides the marginal points), namely [9/5, 6/5], where a global extremum may occur. Altogether, we have these points as "suspects":
[1,1], [0,0], [0,3], [3,0], [f,f] with function values
-A, -1, -19, -10, respectively. We can see that the function / takes on the greatest value -1 at the point [0, 0] and the least value -19 at the point [0, 3]. □
8.50. Determine whether the function / : R3 -» R, fix, y, z) = fz has any extrema on the line segment given by the equations
2x + y + z = 1,
x — y + 2z, = 0 and the constraint x e [—1,2]. If so, find these extrema and determine their types. Justify all of your decisions.
Solution. We are looking for the extrema of a continuous function on a compact set. Therefore, the function must have both a minimum and a maximum on this set, and this will happen either at the marginal points of the segment or at those where the gradient of the examined function is a linear combination of the gradients of the functions that give the constraints. First, let us look for the points which satisfy the gradient condition; together with the constraints, this yields the system
0 = 2k + l,  2yz = k - l,  y² = k + 2l,
2x + y + z = 1,  x - y + 2z = 0.
The solutions are [x, y, z] = [2/3, 0, -1/3] and [x, y, z] = [4/9, 2/9, -1/9] (of course, the variables k and l can also be computed, but we are not interested in them). The marginal points of the given line segment are [-1, 5/3, 4/3] and [2, -4/3, -5/3]. Considering these four points, the function takes on the greatest value at the first marginal point (f(x, y, z) = 100/27), which is its maximum on the given segment, and it
Let us notice that the theorem yields no result if the Hessian of the examined function is degenerate, yet not indefinite, at the point in question. The reason is again the same as in the case of univariate functions. In these cases, there are directions in which both the first and second derivatives vanish, so at this level of approximation, we cannot determine whether the function behaves like t³ or like ±t⁴ until we calculate the higher-order derivatives in the necessary directions at least.
At the same time, we can notice that even at those points where the differential is non-zero, the definiteness of the Hessian Hf(x) has similar consequences as the non-vanishing of the second derivative of a univariate function. Indeed, for a function f : Rⁿ → R, the expression
z(x + v) = f(x) + df(x)(v)
defines a tangent hyperplane to the graph of the function f in the space Rⁿ⁺¹, so Taylor's theorem of order two with remainder, as used in the proof, shows that when the Hessian is positively definite, all the values of the function f in a sufficiently small neighborhood of the point x lie above the values of the tangent hyperplane, i. e., the whole graph is above the tangent hyperplane in a sufficiently small neighborhood. In the case of negative definiteness, it is the other way round. Finally, when the Hessian is indefinite, the graph of the function goes from one side of the hyperplane to the other, but this happens, in general, along objects of lower dimension in the tangent hyperplane, so we have no straightforward generalization of inflexion points.
8.14. The differential of mappings. The concepts of a derivative and a differential can be easily extended to mappings F : Eₙ → Eₘ. Having selected the Cartesian coordinate system on both sides, this mapping is an ordinary m-tuple
F(x₁, …, xₙ) = (f₁(x₁, …, xₙ), …, fₘ(x₁, …, xₙ))
of functions fᵢ : Eₙ → R. We say that F is a differentiable or k-times differentiable mapping iff the corresponding property is shared by all the functions f₁, …, fₘ.
Differential and Jacobian matrix
The differentials dfᵢ(x) of the particular functions fᵢ give a linear approximation of the increases of their values for the mapping
F(x₁, …, xₙ) = (f₁(x₁, …, xₙ), …, fₘ(x₁, …, xₙ)).
Therefore, we can expect that they will also give a coordinate expression of the linear mapping D¹F(x) : Rⁿ → Rᵐ between the direction spaces which linearly approximates the increases of our mapping. The resulting matrix, whose rows are the differentials of the particular coordinate functions,
D¹F(x) = (df₁(x); df₂(x); …; dfₘ(x)) = (∂fᵢ/∂xⱼ(x)), i = 1, …, m, j = 1, …, n,
is called the Jacobian matrix of the mapping F at the point x. The linear mapping D¹F(x) defined on the increases v = (v₁, …, vₙ) by the identically denoted Jacobian matrix is called the differential of the mapping F at the point x.
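As a small illustration (our own addition, not part of the text), the Jacobian matrix of a concrete mapping can be computed symbolically; we assume Python with sympy and choose the mapping (r, φ) ↦ (r cos φ, r sin φ):

import sympy as sp

r, phi = sp.symbols('r phi', positive=True)
F = sp.Matrix([r*sp.cos(phi), r*sp.sin(phi)])   # F : R^2 -> R^2
J = F.jacobian([r, phi])                        # the matrix D^1 F
print(J)                # [[cos(phi), -r*sin(phi)], [sin(phi), r*cos(phi)]]
print(sp.simplify(J.det()))                     # r

The determinant r is the familiar Jacobian of the polar transformation used in the examples of this chapter.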
takes the least value f(x, y, z) = -80/27 at the second marginal point, which is thus its minimum there. □
8.51. Find the maximal and minimal values of the polynomial
p(x, y) = 4x³ - 3x - 4y³ + 9y
on the set
M = {[x, y] ∈ R²; x² + y² ≤ 1}.
Solution. This is again the case of a polynomial on a compact set; therefore, we can restrict our attention to stationary points inside M and to suitable points on the boundary of M. However, the only solutions of the equations
px(x, y) = 12x² - 3 = 0,  py(x, y) = -12y² + 9 = 0
are the points
[1/2, √3/2], [1/2, -√3/2], [-1/2, √3/2], [-1/2, -√3/2],
which all lie on the boundary of M. This means that p has no extremum inside M. Now, it suffices to find the maximum and minimum of p on the unit circle k : x² + y² = 1. The circle k can be expressed parametrically as
x = cos t, y = sin t, t ∈ [-π, π].
Thus, instead of looking for the extrema of p on M, we are now seeking the extrema of the function
f(t) := p(cos t, sin t) = 4cos³t - 3cos t - 4sin³t + 9sin t
on the interval [-π, π]. For t ∈ [-π, π], we have
f'(t) = -12cos²t sin t + 3sin t - 12sin²t cos t + 9cos t.
In order to determine the stationary points, we must express the function f' in a form from which we can find the intersections of its graph with the horizontal axis. To this purpose, we use the identity
1/cos²t = 1 + tan²t,
which is valid provided both sides are well-defined. We thus obtain
f'(t) = cos³t [-12tan t + 3(tan t + tan³t) - 12tan²t + 9(1 + tan²t)]
for t ∈ [-π, π] with cos t ≠ 0. However, this condition does not exclude any stationary points since sin t ≠ 0 whenever cos t = 0. Therefore, the stationary points of f are those points t ∈ [-π, π] for which
-4tan t + tan t + tan³t - 4tan²t + 3 + 3tan²t = 0.
The substitution s = tan t leads to
s³ - s² - 3s + 3 = 0, i. e., (s - 1)(s - √3)(s + √3) = 0.
Then, the values s = 1, s = √3, s = -√3 respectively correspond to
t ∈ {-3π/4, π/4},  t ∈ {-2π/3, π/3},  t ∈ {-π/3, 2π/3}.
Now, we evaluate the function f at each of these points as well as at the marginal points t = -π, t = π. Sorting them, we get
f(-π/3) = -1 - 3√3 < f(-3π/4) = -3√2 < f(-2π/3) = 1 - 3√3 < f(±π) = 1 < f(π/3) = -1 + 3√3 < f(π/4) = 3√2 < f(2π/3) = 1 + 3√3.
Therefore, the polynomial p takes its maximal value 1 + 3√3 on M at the point [-1/2, √3/2] and its minimal value -1 - 3√3 at the point [1/2, -√3/2]. □

Composing with an increasing function does not change the points of extrema (of course, it can change the extremal values). However, we know that the function g gives the distance of a point [x, y] from the point [2, 0]. Since the set M is clearly a square with vertices [1, 0], [0, 1], [-1, 0], [0, -1], the point of M that is closest to [2, 0] is the vertex [1, 0], while the most distant one is [-1, 0]. Altogether, we have obtained that the minimal value of f occurs at the point [1, 0] and the maximal one at [-1, 0]. □
8.53. Compute the local extrema of the function y = f(x) given implicitly by the equation
3x² + 2xy + x = y² + 3y + 5/4,  [x, y] ∈ R² \ {[x, x - 3/2]; x ∈ R}.
Solution. In accordance with the theoretical part (see 8.18), let us denote
F(x, y) = 3x² + 2xy + x - y² - 3y - 5/4,  [x, y] ∈ R² \ {[x, x - 3/2]; x ∈ R},
and calculate the derivative
f'(x) = -Fx/Fy = -(6x + 2y + 1)/(2x - 2y - 3).
We can see that this derivative is continuous on the whole set in question. In particular, the function f is defined implicitly on this set (the denominator is non-zero there).
A local extremum may occur only for those x, y which satisfy f' = 0, i. e., 6x + 2y + 1 = 0. Substituting y = -3x - 1/2 into the equation F(x, y) = 0, we obtain -12x² + 6x = 0, which leads to
[x, y] = [0, -1/2],  [x, y] = [1/2, -2].
We can also easily compute that
y'' = (y')' = -[(6 + 2y')(2x - 2y - 3) - (6x + 2y + 1)(2 - 2y')] / (2x - 2y - 3)².
Substituting x = 0, y = -1/2, y' = 0 and x = 1/2, y = -2, y' = 0, we obtain
y'' = -6·(-2)/(-2)² = 3 > 0 for [x, y] = [0, -1/2]
and
y'' = -6·2/2² = -3 < 0 for [x, y] = [1/2, -2],
mapping would exist. The Cartesian image of lines with constant coordinate r or φ in polar coordinates consists of circles centered at the origin and of rays emanating from it, respectively.

The chain rule
Theorem. Let F : Eₙ → Eₘ and G : Eₘ → Eᵣ be two differentiable mappings, where the domain of G contains the whole image of F. Then, the composite mapping G ∘ F is also differentiable, and its differential at any point from the domain of F is given by the composition of differentials,
D¹(G ∘ F)(x) = D¹G(F(x)) ∘ D¹F(x).
The corresponding Jacobian matrix is given by the product of the corresponding Jacobian matrices.
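The statement can be tested symbolically. The sketch below (our own addition, assuming Python with sympy; the mappings F and G are arbitrary hypothetical choices) verifies that the Jacobian matrix of G ∘ F equals the product of the Jacobian matrices:

import sympy as sp

x, y, u, v = sp.symbols('x y u v')
F = sp.Matrix([x*y, x + y**2])            # F : R^2 -> R^2
G = sp.Matrix([sp.sin(u) + v, u*v])       # G : R^2 -> R^2

JF = F.jacobian([x, y])
JG_at_F = G.jacobian([u, v]).subs([(u, F[0]), (v, F[1])])
GF = G.subs([(u, F[0]), (v, F[1])])       # the composition G o F
print(sp.simplify(GF.jacobian([x, y]) - JG_at_F*JF))   # zero matrix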
Proof. In paragraph 8.5 and in the proof of Taylor's theorem, we derived how the differentiation of mappings composed from functions and curves behaves. This proves the special cases of this theorem for n = r = 1. The general case can be proved analogously; we just have to work with more vectors now.
Let us fix an arbitrary increase v and calculate the directional derivative of the composition G ∘ F at a point x ∈ Eₙ. This actually means determining the differentials of the particular coordinate functions of the mapping G composed with F. For the sake of simplicity, we will write g ∘ F for any one of them.
d_v(g ∘ F)(x) = lim_{t→0} (1/t)(g(F(x + tv)) - g(F(x))).
The expression in parentheses can, from the definition of the differential of g, be expressed as
g(F(x + tv)) - g(F(x)) = dg(F(x))(F(x + tv) - F(x)) + α(F(x + tv) - F(x)),
where α is a function defined on a neighborhood of the point F(x) which is continuous and satisfies lim_{v→0} (1/‖v‖)α(v) = 0. Substitution into the equality for the directional derivative yields
d_v(g ∘ F)(x) = lim_{t→0} (1/t)(dg(F(x))(F(x + tv) - F(x)) + α(F(x + tv) - F(x)))
= dg(F(x))( lim_{t→0} (1/t)(F(x + tv) - F(x)) ) + lim_{t→0} (1/t) α(F(x + tv) - F(x))
= dg(F(x)) ∘ D¹F(x)(v) + 0,
We have thus proved that the implicitly given function has a strict local minimum at the point x = 0 and a strict local maximum at x = 1/2.
□
8.54. Find the local extrema of the function z = f(x, y) given on the maximum possible set by the equation
(8.2)  x² + y² + z² - xz - yz + 2x + 2y + 2z - 2 = 0.
Solution. Differentiating (8.2) with respect to x and y gives
2x + 2z·zx - z - x·zx - y·zx + 2 + 2zx = 0,
2y + 2z·zy - x·zy - z - y·zy + 2 + 2zy = 0.
Hence we get that
(8.3)  zx = fx(x, y) = (z - 2x - 2)/(2z - x - y + 2),  zy = fy(x, y) = (z - 2y - 2)/(2z - x - y + 2).
We can notice that the partial derivatives are continuous at all points where the function f is defined. This implies that the local extrema can occur only at stationary points. These points satisfy
zx = 0, i. e., z - 2x - 2 = 0,
zy = 0, i. e., z - 2y - 2 = 0.
We thus have two equations, which allow us to express the dependency of x and y on z. Substituting into (8.2), we obtain the points
[x, y, z] = [-3 + √6, -3 + √6, -4 + 2√6],
[x, y, z] = [-3 - √6, -3 - √6, -4 - 2√6].
Now, we need the second derivatives in order to decide whether local extrema really occur at these points. Differentiating zx in (8.3) with respect to x, we obtain
zxx = fxx(x, y) = [(zx - 2)(2z - x - y + 2) - (z - 2x - 2)(2zx - 1)] / (2z - x - y + 2)²,
and with respect to y,
zxy = fxy(x, y) = [zy(2z - x - y + 2) - (z - 2x - 2)(2zy - 1)] / (2z - x - y + 2)².
We need not calculate zyy since the variables x and y are interchangeable in (8.2) (if we swap x and y, the equation is left unchanged). Moreover, the x- and y-coordinates of the considered points are the same; hence fyy = fxx there. Now, we evaluate these derivatives at the stationary points:
fxx(-3 + √6, -3 + √6) = fyy(-3 + √6, -3 + √6) = -1/√6,
fxy(-3 + √6, -3 + √6) = fyx(-3 + √6, -3 + √6) = 0,
fxx(-3 - √6, -3 - √6) = fyy(-3 - √6, -3 - √6) = 1/√6,
fxy(-3 - √6, -3 - √6) = fyx(-3 - √6, -3 - √6) = 0.
As for the Hessian, we have
Hf(-3 + √6, -3 + √6) = ( -1/√6, 0 ; 0, -1/√6 ),  Hf(-3 - √6, -3 - √6) = ( 1/√6, 0 ; 0, 1/√6 ),
so the first Hessian is negative definite and the second one is positive definite: the implicitly given function has a strict local maximum at the former point and a strict local minimum at the latter one.
where we made use of the properties of the function α and the fact that linear mappings between finite-dimensional spaces are always continuous.
Thus, we have proved the theorem for the particular coordinate functions g₁, …, gᵣ of the mapping G. The whole theorem now follows from matrix multiplication. □
Now, we can illustrate, by a simple example, the usage of our concept of transformations and the theorem about differentiation of composite mappings. We have seen that the polar coordinates are obtained from the Cartesian ones by the transformation F : R² → R² which, in coordinates (x, y) and (r, φ), is written as follows (for instance, on the domain of all points in the first quadrant except for the points having x = 0):
r = √(x² + y²),  φ = arctan(y/x).
Consider a function g : E₂ → R which can be expressed in the polar coordinates as g(r, φ).
In fact, we are considering a one-dimensional quadratic form whose positive (negative) definiteness at a stationary point means that there is a minimum (maximum) at that point. Realizing that the stationary points had x = 2k, y = 2k, mere substitution yields
d²L(√2/2, √2/2) = -4√2 dx²,  d²L(-√2/2, -√2/2) = 4√2 dx²,
which means that there is a strict local maximum of the function f at the point [√2/2, √2/2], while at the point [-√2/2, -√2/2], there is a strict local minimum. The corresponding values are
(8.5)  f(√2/2, √2/2) = 2√2,  f(-√2/2, -√2/2) = -2√2.
Now, we will demonstrate a quicker way to obtain the result. We know (or can easily calculate) the second partial derivatives of the function L, i. e., the Hessian with respect to the variables x and y:
HL(x, y) = ( 2/x³ - 6k/x⁴, 0 ; 0, 2/y³ - 6k/y⁴ ).
inverse function is then the multiplicative inverse of the derivative of the original function.
Interpreting this situation for a mapping E₁ → E₁ and linear mappings R → R as their differentials, the non-vanishing is a necessary and sufficient condition for the differential to be invertible. In this way, we obtain a statement which is valid for finite-dimensional spaces in general:
The inverse mapping theorem
Theorem. Let F : Eₙ → Eₙ be a differentiable mapping on a neighborhood of a point x₀ ∈ Eₙ, and let the Jacobian matrix D¹F(x₀) be invertible. Then in some neighborhood of the point x₀, the inverse mapping F⁻¹ exists, and its differential at the point F(x₀) is the inverse mapping to the differential D¹F(x₀), i. e., it is given by the inverse matrix to the Jacobian matrix of the mapping F at the point x₀.
Proof. First, we should try to verify that the theorem makes sense and is expectable. If we supposed that the inverse mapping existed and was differentiable at the point F(x₀), the differentiation of composite functions would enforce the formula
id_{Rⁿ} = D¹(F⁻¹ ∘ F)(x₀) = D¹(F⁻¹)(F(x₀)) ∘ D¹F(x₀),
which verifies the formula at the end of the theorem. Therefore, we know right from the beginning which differential of F⁻¹ to look for.
In the next step, we will suppose that the inverse mapping F⁻¹ exists in a neighborhood of the point F(x₀) and that it is continuous. We are to verify the existence of the differential. Since F is differentiable in a neighborhood of x₀, it follows that
F(x) - F(x₀) - D¹F(x₀)(x - x₀) = α(x - x₀)
with a function α satisfying lim_{v→0} (1/‖v‖)α(v) = 0. To verify the approximation properties of the linear mapping (D¹F(x₀))⁻¹, it suffices to calculate the following limit for y = F(x) approaching y₀ = F(x₀):
lim_{y→y₀} (1/‖y - y₀‖)(F⁻¹(y) - F⁻¹(y₀) - (D¹F(x₀))⁻¹(y - y₀)).
Substitution into the previous equality gives
lim_{y→y₀} (1/‖y - y₀‖)(x - x₀ - (D¹F(x₀))⁻¹(D¹F(x₀)(x - x₀) + α(x - x₀)))
= -lim_{y→y₀} (1/‖y - y₀‖)(D¹F(x₀))⁻¹(α(x - x₀))
= -(D¹F(x₀))⁻¹( lim_{y→y₀} (1/‖y - y₀‖) α(x - x₀) ),
where the last equality follows from the fact that linear mappings between finite-dimensional spaces are always continuous and, thanks to the invertibility of the differential, performing it before the limit process has no impact upon the existence of the limit.
The evaluation
HL(√2/2, √2/2) = ( -2√2, 0 ; 0, -2√2 ),  HL(-√2/2, -√2/2) = ( 2√2, 0 ; 0, 2√2 )
then tells us that the quadratic form is negative definite at the former stationary point (there is a strict local maximum) and positive definite at the latter one (there is a strict local minimum).
We should be aware of a potential trap in this "quicker" method in the case when we obtain an indefinite form (matrix). Then, we cannot conclude that there is no extremum at that point: since we have not included the constraint (which we did when computing d²L), we are considering a more general situation. The restriction of the function f to the given set is a curve, which can be viewed as a univariate function, and this must correspond to a one-dimensional quadratic form. □
8.56. Find the global extrema of the function
f(x, y) = 1/x + 1/y,  x ≠ 0, y ≠ 0,
on the set of points that satisfy the equation 1/x² + 1/y² = 4.
Solution. This exercise illustrates that looking for global extrema may be much easier than looking for local ones (cf. the above exercise), even in the case when the function values are considered on an unbounded set. First, we determine the stationary points (8.4) and the values (8.5) the same way as above. Let us emphasize that we are looking for the function's extrema on a set that is not compact, so we cannot make do with merely evaluating the function at the stationary points. The reason is that the function f might have no extremum on the considered set at all; its range might be an open interval. However, we will show that this is not the case here.
Let us thus consider |x| ≥ 10. From the constraint we always have 1/y² ≤ 4, so the equation 1/x² + 1/y² = 4 can be satisfied only by those values y for which |y| ≥ 1/2. We have thus obtained the bounds
-2√2 < -1/10 - 2 ≤ f(x, y) ≤ 1/10 + 2 < 2√2, if |x| ≥ 10.
At the same time (interchanging x and y leads to the same task),
-2√2 < -1/10 - 2 ≤ f(x, y) ≤ 1/10 + 2 < 2√2, if |y| ≥ 10.
Hence we can see that the function f must have global extrema on the considered set, and this must happen inside the square ABCD with vertices A = [-10, -10], B = [10, -10], C = [10, 10], D = [-10, 10].
The intersection of the "hundred times reduced" square with vertices A' = [-1/10, -1/10], B' = [1/10, -1/10], C' = [1/10, 1/10], D' = [-1/10, 1/10] with the given set is clearly empty. Therefore, the global extrema occur at points of the compact set bounded by these two squares. Since f is continuously differentiable on this set, the global extrema can occur only at stationary points. We thus must have
f_max = f(√2/2, √2/2) = 2√2,  f_min = f(-√2/2, -√2/2) = -2√2.
Let us notice that we are almost done with the proof. The limit at the end of our expression is, thanks to the properties of the function α, zero whenever the magnitudes ‖F(x) - F(x₀)‖ are greater than C‖x - x₀‖ for some constant C > 0. This is a bit stronger property than F⁻¹ being continuous; in the literature, this property of a function is called Lipschitz continuity. So, now it remains "merely" to prove the existence of a Lipschitz-continuous inverse mapping to the mapping F.
To simplify the reasonings to come, we transform the general case to a slightly simpler statement. Especially, we can achieve x₀ = 0 ∈ Rⁿ and y₀ = F(x₀) = 0 ∈ Rⁿ by a convenient choice of Cartesian coordinates, without loss of generality.
Composing the mapping F with any linear mapping G yields a differentiable mapping again, and we know how this changes the differential. The choice G(x) = (D¹F(0))⁻¹(x) gives D¹(G ∘ F)(0) = id_{Rⁿ}. Therefore, we can assume that
D¹F(0) = id_{Rⁿ}.
Now, having these assumptions, let us consider the mapping K(x) = F(x) - x. This mapping is differentiable, too, and its differential at 0 is apparently zero.
By the Taylor expansion with remainder of the particular coordinate functions Kᵢ and the definition of the Euclidean distance, we get, for any continuously differentiable mapping K in a neighborhood of the origin of Rⁿ, the bound
‖K(x) - K(y)‖ ≤ C√n ‖x - y‖,
where C is bounded by the maximum of the absolute values of the partial derivatives in the Jacobian matrix of the mapping K in the neighborhood in question.
Since the differential of the mapping K at the point x₀ = 0 is zero in our case, we can, selecting a sufficiently small neighborhood U of the origin, achieve the bound
‖K(x) - K(y)‖ ≤ (1/2)‖x - y‖.
Further, substituting the definition K(x) = F(x) - x and invoking the triangle inequality ‖(u - v) + v‖ ≤ ‖u - v‖ + ‖v‖, i. e., ‖u‖ - ‖v‖ ≤ ‖u - v‖, we get
‖y - x‖ - ‖F(x) - F(y)‖ ≤ ‖F(x) - F(y) + y - x‖ = ‖K(x) - K(y)‖ ≤ (1/2)‖x - y‖.
Hence, finally,
(1/2)‖x - y‖ ≤ ‖F(x) - F(y)‖.
With this bound, we have reached a great advancement: if x ≠ y in our neighborhood U of the origin, then we also must have F(x) ≠ F(y). Therefore, our mapping is bijective onto its image. Let us write F⁻¹ for its inverse defined on the image of U. For this mapping, our bound says that
‖F⁻¹(x) - F⁻¹(y)‖ ≤ 2‖x - y‖,
so this mapping is not only continuous but also Lipschitz-continuous, as we needed in the previous part of the proof.
It immediately follows from this reasoning that a function which has continuous partial derivatives on a compact set is Lipschitz-continuous on it as well.
□
8.57. Determine the maximal and minimal values of the function f(x, y, z) = xyz on the set M given by the conditions
x² + y² + z² = 1,  x + y + z = 0.
Solution. It is not hard to realize that M is a circle. However, for our problem, it is sufficient to know that M is compact, i. e., bounded (by the first condition, it lies on the unit sphere) and closed (the set of solutions of the given equations is closed since if the equations are satisfied by all terms of a converging sequence, then they are satisfied by its limit as well). The function f as well as the constraint functions F(x, y, z) = x² + y² + z² - 1, G(x, y, z) = x + y + z have continuous partial derivatives of all orders (they are polynomials). The Jacobi constraint matrix is
( Fx(x, y, z), Fy(x, y, z), Fz(x, y, z) ; Gx(x, y, z), Gy(x, y, z), Gz(x, y, z) ) = ( 2x, 2y, 2z ; 1, 1, 1 ).
Its rank is reduced (less than 2) if and only if the vector (2x, 2y, 2z) is a multiple of the vector (1, 1, 1), which gives x = y = z, and thus x = y = z = 0 (by the second constraint). However, the set M does not contain the origin. Therefore, we may look for the stationary points using the method of Lagrange multipliers. For
L(x, y, z, λ₁, λ₂) = xyz - λ₁(x² + y² + z² - 1) - λ₂(x + y + z), the equations Lx = 0, Ly = 0, Lz = 0 give
yz - 2λ₁x - λ₂ = 0,
xz - 2λ₁y - λ₂ = 0,
xy - 2λ₁z - λ₂ = 0,
respectively. Subtracting the first equation from the second one and from the third one leads to
xz - yz - 2λ₁y + 2λ₁x = 0,
xy - yz - 2λ₁z + 2λ₁x = 0,
i. e.,
(x - y)(z + 2λ₁) = 0,  (x - z)(y + 2λ₁) = 0.
The last equations are satisfied in these four cases:
x = y, x = z;  x = y, y = -2λ₁;  z = -2λ₁, x = z;  z = -2λ₁, y = -2λ₁;
thus (including the constraint G = 0)
x = y = z = 0;  x = y = -2λ₁, z = 4λ₁;  x = z = -2λ₁, y = 4λ₁;  x = 4λ₁, y = z = -2λ₁.
Except for the first case (which clearly cannot happen), including the constraint F = 0 yields
(4λ₁)² + (-2λ₁)² + (-2λ₁)² = 1,  i. e.,  λ₁ = ±1/(2√6).
Altogether, we get the points
[1/√6, 1/√6, -2/√6], [-1/√6, -1/√6, 2/√6],
[1/√6, -2/√6, 1/√6], [-1/√6, 2/√6, -1/√6],
[-2/√6, 1/√6, 1/√6], [2/√6, -1/√6, -1/√6].
It could seem that we are done with the proof, yet it is not so. To finish the proof completely, we still have to show that the mapping F restricted to a sufficiently small neighborhood is not only bijective, but also that it maps open neighborhoods of zero onto open neighborhoods of zero.
Let us choose δ > 0 so small that the neighborhood V = O_δ(0) lies in U together with its boundary and, at the same time, the Jacobian matrix of the mapping F is invertible on the whole of the closure of V. This surely can be done since the determinant is a continuous mapping. Let B denote the boundary of the set V (i. e., the corresponding sphere). Since B is compact and F is continuous, the function
ρ(x) = ‖F(x)‖
has both a maximum and a minimum on B. Let us denote a = (1/2) min_{x∈B} ρ(x) and consider any y ∈ O_a(0). Of course, a > 0. We want to show that there is at least one x ∈ V such that y = F(x), which will prove the whole inverse mapping theorem.
To this purpose, consider the function (y is our fixed point)
h(x) = ‖F(x) - y‖².
Again, the image of the closure of V under h must have a minimum. First, we show that this minimum cannot occur for x ∈ B. Indeed, we have F(0) = 0, hence h(0) = ‖y‖² < a². At the same time, by our definition of a, the distance of y from F(x) for x ∈ B is at least a for y ∈ O_a(0) (since a was selected to be half the minimum of the magnitude of F(x) on the boundary), so h(x) ≥ a² there. Therefore, the minimum occurs inside V, and it must be at a stationary point z of the function h. However, this means that for all j = 1, …, n, we have
∂h/∂xⱼ(z) = Σ_{i=1}^{n} 2(fᵢ(z) - yᵢ) ∂fᵢ/∂xⱼ(z) = 0.
This system of equations can be considered a system of linear equations with unknowns ξᵢ = fᵢ(z) - yᵢ and coefficients given by twice the Jacobian matrix D¹F(z). For every z ∈ V, such a system has a unique solution, and that is zero, since we suppose that the Jacobian matrix is invertible.
Thus, we have found the wanted point x = z ∈ V satisfying, for all i = 1, …, n, the equality fᵢ(z) = yᵢ, i. e., F(z) = y. □
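The formula for the differential of the inverse can also be checked numerically. The following sketch (our own addition, assuming Python with numpy; the mapping F is a hypothetical example with invertible Jacobian near x₀) approximates D¹(F⁻¹)(F(x₀)) by finite differences of a Newton-based inverse and compares it with the inverse Jacobian matrix:

import numpy as np

def F(p):
    x, y = p
    return np.array([x + y**3, x**3 + y])

def JF(p):
    x, y = p
    return np.array([[1.0, 3*y**2], [3*x**2, 1.0]])

def F_inv(q, guess):
    p = guess.copy()
    for _ in range(50):                   # Newton iteration for F(p) = q
        p = p - np.linalg.solve(JF(p), F(p) - q)
    return p

x0 = np.array([0.5, 0.2])
y0 = F(x0)
h = 1e-6
cols = [(F_inv(y0 + h*e, x0) - F_inv(y0 - h*e, x0))/(2*h) for e in np.eye(2)]
print(np.column_stack(cols))              # numerical D^1(F^{-1})(y0)
print(np.linalg.inv(JF(x0)))              # (D^1 F(x0))^{-1}, should agree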
8.18. The implicit function theorem. Our next goal is to apply the inverse mapping theorem to work with implicitly defined functions. For the beginning, let us consider a differentiable function F(x, y) defined in the plane E₂, and let us look for those points (x, y) at which F(x, y) = 0.
An example of this is the usual (implicit) definition of straight lines and circles:
F(x, y) = ax + by + c = 0,
F(x, y) = (x - s)² + (y - t)² - r² = 0, r > 0.
While in the first case, the dependency given by the first formula is (for b ≠ 0)
y = f(x) = -(a/b)x - c/b
for all x; in the other case, for any point (x₀, y₀) satisfying the equation of the circle and such that y₀ ≠ t (these are the marginal
In literature, there are many examples of mappings which, for instance, continuously and bijectively map a line segment onto a square.
We will not verify that these really are stationary points. The only important thing is that all stationary points are among these six.
We are looking for the global maximum and minimum of the continuous function / on the compact set M. However, the global extrema (we know they exist) can occur only at points of local extrema with respect to M. And the local extrema can occur only at the aforementioned points. Therefore, it suffices to evaluate the function / at these points. Thus we find out that the wanted maximum is
f(-1/√6, -1/√6, 2/√6) = f(-1/√6, 2/√6, -1/√6) = f(2/√6, -1/√6, -1/√6) = 1/(3√6),
while the minimum is
f(1/√6, 1/√6, -2/√6) = f(1/√6, -2/√6, 1/√6) = f(-2/√6, 1/√6, 1/√6) = -1/(3√6).
□
8.58. Find the extrema of the function f : R³ → R, f(x, y, z) = x² + y² + z², on the plane x + y - z = 1 and determine their types.
Solution. We can easily build the equations that describe the linear dependency between the gradient of the examined function and the normal to the constraint plane:
2x = k,  2y = k,  2z = -k,  k ∈ R.
Together with the constraint, the only solution is the point [1/3, 1/3, -1/3]. Further, we can notice that, along any direction lying in the constraint plane (for instance, (1, -1, 0)), the function increases as we move away from this point, since f is a positive definite quadratic form. Therefore, the examined function has a minimum at this point.
Another solution. We will reduce this problem to finding the extrema of a two-variable function on R². Since the constraint is linear, we can express z = x + y - 1. Substituting this into the given function then yields a real-valued function of two variables:
f(x, y) = x² + y² + (x + y - 1)² = 2x² + 2xy + 2y² - 2x - 2y + 1.
Setting both partial derivatives equal to zero, we get the linear equations
4x + 2y - 2 = 0,  4y + 2x - 2 = 0,
whose only solution is the point [1/3, 1/3]. Since f is a quadratic function whose quadratic part is positive definite, it has a (global) minimum at this stationary point. Then, we can get the corresponding point [1/3, 1/3, -1/3] in the constraint plane from the linear dependency of z. □
8.59. Find the extrema of the function f(x, y, z) = x + y on the circle given by the equations x + y + z = 1 and x² + y² + z² = 4.
Solution. The "suspects" are those points which satisfy
(1, 1, 0) = k·(1, 1, 1) + l·(x, y, z),  k, l ∈ R.
Clearly, l ≠ 0 (the vectors (1, 1, 0) and (1, 1, 1) are not parallel), and comparing the first two coordinates, x = y. Substituting this into the equations of the circle then leads to the two solutions
[x, y, z] = [1/3 ± √22/6, 1/3 ± √22/6, 1/3 ∓ √22/3].
points of the circle in the direction of the coordinate x), we can find a neighborhood of the point x₀ in which either
y = f(x) = t + √(r² - (x - s)²)  or  y = f(x) = t - √(r² - (x - s)²),
according to which semicircle the point (x₀, y₀) belongs to. Having drawn a picture of the situation, the reason is clear: we cannot describe both semicircles simultaneously by a single function y = f(x). The marginal points (s ± r, t) of the interval [s - r, s + r] are more intriguing. They also satisfy the equation of the circle, yet we have Fy(s ± r, t) = 0 at them, which reflects the fact that the tangent line to the circle at these points is parallel to the y-axis. Indeed, we cannot find neighborhoods of these points in which the circle could be described as a function y = f(x).
Moreover, the derivative of our function y = f(x) = t + √(r² - (x - s)²) at the points where it is defined can be expressed in terms of the partial derivatives of the function F:
f'(x) = -(x - s)/√(r² - (x - s)²) = -Fx(x, f(x))/Fy(x, f(x)).
If we interchange the roles of the variables x and y and look for a dependency x = f(y) such that F(f(y), y) = 0, then we will succeed in neighborhoods of the points (s ± r, t) with no problem. Let us notice that the partial derivative Fx is non-zero at these points.
Our observation thus (for our two examples) says: for a function F(x, y) and a point (a, b) ∈ E₂ such that F(a, b) = 0, there is a unique function y = f(x) satisfying F(x, f(x)) = 0 near a whenever Fy(a, b) ≠ 0. In this case, we can even compute f'(a) = -Fx(a, b)/Fy(a, b). We will prove that this proposition actually always holds true. The last statement about derivatives can be remembered easily (and is quite comprehensible if things are thoroughly understood) from the expression for the differential of the function g(x) = F(x, y(x)) with the differential dy = f'(x)dx substituted:
0 = dg = Fx dx + Fy dy = (Fx + Fy f'(x)) dx.
We could work analogously with implicit expressions F(x, y, z) = 0, where we can look for a function g(x, y) such that F(x, y, g(x, y)) = 0. As an example, consider the function f(x, y) = x² + y², whose graph is a circular paraboloid with vertex at the origin. It can be defined implicitly by the equation
0 = F(x, y, z) = z - x² - y².
Before formulating the result for the general situation, let us notice which dimensions could or should appear in the problem. If we wanted to find, for this function F, a curve c(x) = (c₁(x), c₂(x)) in the plane such that
F(x, c(x)) = F(x, c₁(x), c₂(x)) = 0,
then we would succeed as well (even for all initial conditions x = a), yet the result would not be unique for a given initial condition. In fact, it suffices to consider an arbitrary curve on the circular paraboloid whose projection onto the first coordinate has non-zero derivative. Then we consider x to be the parameter of the curve, and c(x) is chosen to be its projection onto the plane yz.
Since every circle is compact, it suffices to examine the function values at these two points. We find out that there is a maximum of the considered function on the given circle at the former point and a minimum at the latter one. □
8.60. Find the extrema of the function f : R³ → R, f(x, y, z) = x² + y² + z², on the plane 2x + y - z = 1 and determine their types. ○
8.61. Find the maximum of the function f : R² → R, f(x, y) = xy on the circle with radius 1 which is centered at the point [x₀, y₀] = [0, 1]. ○
8.62. Find the minimum of the function f : R² → R, f(x, y) = xy on the circle with radius 1 which is centered at the point [x₀, y₀] = [2, 0]. ○
8.63. Find the minimum of the function f : R² → R, f(x, y) = xy on the circle with radius 1 which is centered at the point [x₀, y₀] = [2, 0]. ○
8.64. Find the minimum of the function f : R² → R, f(x, y) = xy on the ellipse x² + 3y² = 1. ○
8.65. Find the minimum of the function f : R² → R, f(x, y) = x²y on the circle with radius 1 which is centered at the point [x₀, y₀] = [0, 0]. ○
8.66. Find the maximum of the function f : R² → R, f(x, y) = x³y on the circle x² + y² = 1. ○
8.67. Find the maximum of the function f : R² → R, f(x, y) = xy on the ellipse 2x² + 3y² = 1. ○
8.68. Find the maximum of the function f : R² → R, f(x, y) = xy on the ellipse x² + 2y² = 1. ○
H. Volumes, areas, centroids of solids
8.69. Find the volume of the solid which lies in the half-space z ≥ 0, inside the cylinder x² + y² ≤ 1, and in the half-space
a) z ≤ x,
b) x + y + z ≤ 0.
Therefore, we expect that one function of m + 1 variables implicitly defines a hypersurface in R^{m+1} which we want to express (at least locally) as the graph of a function of m variables. Similarly, we can anticipate that n functions of m + n variables will define an intersection of n hypersurfaces in R^{m+n}, which is, in "most" cases, an m-dimensional object.
Let us thus consider a differentiable mapping
F = (f₁, …, fₙ) : R^{m+n} → Rⁿ.
The Jacobian matrix of this mapping will have n rows and m + n columns, and we can write it symbolically as
D¹F = (D¹ₓF, D¹_yF),
where D¹ₓF = (∂fᵢ/∂xⱼ), i = 1, …, n, j = 1, …, m, is the matrix of n rows and the first m columns in the Jacobian matrix, while D¹_yF = (∂fᵢ/∂xⱼ), i = 1, …, n, j = m + 1, …, m + n, is the square matrix of order n with the remaining columns. Here, a point (x₁, …, x_{m+n}) ∈ R^{m+n} is written as (x, y) ∈ Rᵐ × Rⁿ. The multidimensional analogy to the previous reasoning with the non-zero partial derivative with respect to y is the condition that the matrix D¹_yF is invertible.
The implicit mapping theorem
Theorem. Let F : R^{m+n} → Rⁿ be a mapping differentiable in an open neighborhood of a point (a, b) ∈ Rᵐ × Rⁿ = R^{m+n} at which F(a, b) = 0 and det D¹_yF ≠ 0. Then there exists a differentiable mapping G : Rᵐ → Rⁿ defined on a neighborhood U of the point a ∈ Rᵐ, with G(a) = b, such that F(x, G(x)) = 0 for all x ∈ U.
Moreover, the Jacobian matrix D¹G of the mapping G is, on the neighborhood of the point a, given by the product of matrices
D¹G(x) = -(D¹_yF)⁻¹(x, G(x)) · D¹ₓF(x, G(x)).
Proof. For the sake of comprehensibility, we first show the proof for the simplest case of the equation F(x, y) = 0 with a function F of two variables. At first sight, it will look complicated, because it is presented in a way which extends to the general dimensions as stated in the theorem.
We extend the function F to
F̃ : R² → R², (x, y) ↦ (x, F(x, y)).
The Jacobian matrix of the mapping F̃ is
D¹F̃(x, y) = ( 1, 0 ; Fx(x, y), Fy(x, y) ).
It follows from the assumption Fy(a, b) ≠ 0 that the same holds in a neighborhood of the point (a, b) as well, so the mapping F̃ is invertible in this neighborhood, by the inverse mapping theorem. Therefore, let us take the uniquely defined differentiable inverse mapping F̃⁻¹ in a neighborhood of the point (a, 0).
Now, let us denote by π : R² → R the projection onto the second coordinate, and consider the function
f(x) = π ∘ F̃⁻¹(x, 0).
Solution. a) The volume can be calculated with ease using cylindric coordinates: the cylinder is determined by the inequality r ≤ 1, and the half-space z ≤ x by z ≤ r cos φ (so that necessarily cos φ ≥ 0, i. e., x ≥ 0).

8.70. Find the volume of the solid given by the inequalities x² + y² + z² ≤ 1, 3x² + 3y² ≥ z², x ≥ 0.
Solution. First, we should realize what the examined solid looks like: it is the part of a ball which lies outside a given cone (see the picture). The best way to determine the volume is probably to subtract half the volume of the sector given by the cone from half the ball's volume (note that the volume of the solid does not change if we replace the condition x ≥ 0 with z ≥ 0; the sector is cut either "horizontally" or "vertically", but always into halves). We will calculate in spherical coordinates
x = r cos φ sin ψ, y = r sin φ sin ψ, z = r cos ψ.
Again, we express the conditions in the spherical coordinates: r² ≤ 1 and 3sin²ψ ≥ cos²ψ, i. e., tan ψ ≥ 1/√3. Just like in the case of the ball, the variables occur independently in the inequalities, so the integration bounds of the variables will be independent of each other as well. The condition r² ≤ 1 implies r ∈ (0, 1]; from tan ψ ≥ 1/√3 together with z ≥ 0 we have ψ ∈ [π/6, π/2]; the variable φ runs through [0, 2π).

For the sphere of radius r > 0, centered at (a, b, c), i. e., given by the equation
F(x, y, z) = (x - a)² + (y - b)² + (z - c)² = r²,
we get the normal vectors at a point P = (x₀, y₀, z₀) as the non-zero multiples of the gradient, i. e., multiples of
D¹F = (2(x₀ - a), 2(y₀ - b), 2(z₀ - c)),
and the tangent vectors are exactly the vectors perpendicular to the gradient. Therefore, the tangent plane to the sphere at the point P can always be described implicitly in terms of the gradient by the equation
0 = (x₀ - a)(x - x₀) + (y₀ - b)(y - y₀) + (z₀ - c)(z - z₀).
This is a special case of the following general formula:
In cylindric coordinates
x = r cos φ, y = r sin φ, z = z,
the computation is analogous. □

Another alternative is to compute the volume as that of a solid of revolution, again splitting the solid into two parts as in the previous case (the part "under the cone" and the part "under the sphere"). However, these solids cannot be obtained by rotating around one of the axes. The volume of the former part can be calculated as the difference between the volumes of the cylinder x² + y² ≤ 1/4, 0 ≤ z ≤ √3/2, and the part of the cone 3x² + 3y² ≤ z², 0 ≤ z ≤ √3/2.

Tangent hyperplane to an implicitly given hypersurface
Consider a differentiable function f(x₁, …, x_{m+1}) and its level set M(b, f) = {x ∈ R^{m+1} : f(x) = b}. If P ∈ M(b, f) is a point at which the differential does not vanish, df(P) ≠ 0, then the tangent hyperplane to M(b, f) at the point P is given by the equation
0 = df(P)(x - P) = ∂f/∂x₁(P)(x₁ - P₁) + ⋯ + ∂f/∂x_{m+1}(P)(x_{m+1} - P_{m+1}),
and the normal line at P is generated by the gradient of f.

The same reasoning applies to the particular functions fᵢ of a differentiable mapping F = (f₁, …, fₙ) : R^{m+n} → Rⁿ.
For a fixed choice b = (b₁, …, bₙ), the set of all solutions is, of course, the intersection of all hypersurfaces M(bᵢ, fᵢ) corresponding to the particular functions fᵢ. The same must hold for tangent directions, while normal directions are generated by the particular gradients. Therefore, if D¹F is the Jacobian matrix of a mapping which implicitly defines a set M, and P ∈ M is a point at which D¹F(P) has maximal rank, then the tangent directions at P are exactly the solutions v of D¹F(P)·v = 0, while the normal space at P is generated by the rows of the matrix D¹F(P).
Solution. We will compute the integral in spherical coordinates. The segment can be perceived as a spherical sector without the cone (with vertex at the point [0, 0, 0] and with the circular base z = 1, x² + y² ≤ 1). In these coordinates, the sector is the product of the intervals [0, √2] × [0, 2π) × [0, π/4]. We thus integrate in the given bounds, in any order:
∫₀^{2π} ∫₀^{√2} ∫₀^{π/4} r² sin θ dθ dr dφ = (4/3)(√2 - 1)π.
In the end, we must subtract the volume of the cone. That is equal to (1/3)πR²H (where R is the radius of the cone's base and H is its height; both are equal to 1 in our case), so the total volume is
V_sector - V_cone = (4/3)π(√2 - 1) - (1/3)π = (1/3)π(4√2 - 5).
The volume of a general spherical segment of height h in a ball of radius R could be computed similarly:
V = V_sector - V_cone = ∫₀^{2π} ∫₀^{R} ∫₀^{θ₀} r² sin θ dθ dr dφ - (1/3)π(R - h)(R² - (R - h)²),  cos θ₀ = (R - h)/R. □

The solid of the next exercise is symmetric with respect to the coordinate planes, so it suffices to integrate over its part with x ≥ 0, y ≥ 0 and multiply by 8. Doing so, the volume of the whole solid comes out as
V = 128. □
Remark. Note that the projection of the considered solid onto both the plane y = 0 and the plane z = 0 is a circle with radius 4, yet the solid is not a ball.
8.73. Find the volume of the part of the cylinder x² + y² ≤ 4 bounded by the planes z = 0 and z = x + y + 2.
For every curve c(t) ⊂ M going through P = c(0), the composite function h(c(t)) must have an extremum at t = 0. Therefore, the derivative must satisfy
d/dt|_{t=0} h(c(t)) = dh(P)(c'(0)) = 0.
However, this means that the differential of the function h at the point P vanishes along all tangent increases to M at P. This property is equivalent to stating that the gradient of h lies in the normal subspace (more precisely, in its direction space). Such points P ∈ M are called stationary points of the function h with respect to the bindings given by F.
As we have seen in the previous paragraph, the normal space to our set M is generated by the rows of the Jacobian matrix of the mapping F, so the stationary points are determined equivalently by the following proposition:

Lagrange multipliers
Theorem. Let F = (f₁, …, fₙ) : R^{m+n} → Rⁿ be a mapping differentiable in a neighborhood of a point P, F(P) = 0. Further, let M be given implicitly by the equation F(x, y) = 0, and let the rank of the matrix D¹F at the point P be n. Then P is a stationary point of a continuously differentiable function h : R^{m+n} → R with respect to the conditions F if and only if there exist real parameters λ₁, …, λₙ such that
grad h = λ₁ grad f₁ + ⋯ + λₙ grad fₙ.
Let us notice that the method of Lagrange multipliers is an algorithmic one. Therefore, let us take a look at the numbers of unknowns and equations: the gradients are vectors of m + n coordinates, so the requirement of the theorem gives m + n equations. The variables are, on one side, the coordinates x₁, …, x_{m+n} of the wanted stationary points P with respect to the bindings, and, on the other hand, the n parameters λᵢ in the linear combination. Finally, the point P has to belong to the implicitly given set M, which gives n more equations. Altogether, we have 2n + m equations for 2n + m variables, so we can expect that the solution will be given by a discrete set of points P (i. e., each of them will be an isolated point).
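Since the method is algorithmic, it can be handed over to a computer algebra system. The following sketch (our own addition, assuming Python with sympy) sets up all 2n + m equations for the example 8.57 solved alongside, namely h = xyz with the bindings x² + y² + z² = 1 and x + y + z = 0:

import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2', real=True)
h = x*y*z
f1 = x**2 + y**2 + z**2 - 1
f2 = x + y + z

# grad h = lambda1 grad f1 + lambda2 grad f2, plus the binding equations
eqs = [sp.Eq(sp.diff(h, v), l1*sp.diff(f1, v) + l2*sp.diff(f2, v))
       for v in (x, y, z)]
eqs += [sp.Eq(f1, 0), sp.Eq(f2, 0)]
for sol in sp.solve(eqs, [x, y, z, l1, l2], dict=True):
    print(sol)    # the six stationary points found in exercise 8.57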
8.23. Arithmetic mean-geometric mean inequality. As an example of a practical application of the Lagrange multipliers, we will prove the inequality
(1/n)(x₁ + ⋯ + xₙ) ≥ (x₁ ⋯ xₙ)^{1/n}
for any n positive real numbers x₁, …, xₙ. Further, we will prove that equality holds if and only if all the xᵢ are equal.
Let us thus take the sum x₁ + ⋯ + xₙ = c to be the binding condition, for a (non-specified) non-negative constant c. We will look for the maxima and minima of the function
f(x₁, …, xₙ) = (x₁ ⋯ xₙ)^{1/n}
with respect to our binding condition and the assumption x₁ > 0, …, xₙ > 0.
The normal vector to the hyperplane defined by the condition is (1, …, 1). Therefore, the function f can have an extremum only at the points where its gradient is a multiple of this vector.
Solution. We will work in cylindric coordinates given by the equations x = r cos φ, y = r sin φ, z = z.

The value of the integral is thus unchanged whenever the derivative of the reparametrization is positive, i. e., if we keep the orientation of the curve, and it is the same value up to sign if the derivative of the transformation is negative.
Precisely speaking, we have learned to integrate the differential df of a function over curves. However, the connection with integration of functions may not be apparent: we clearly cannot get the length of the curve if we select the constant function with value one for f. We need a geometric point of view to explain this. The size of a vector is given by a quadratic form, rather than a linear one. However, if we take the square root of the values of a (positively definite) quadratic form, we get a linear form (up to sign, see above). We will get back to these connections shortly.
8.34. Vector fields and linear forms. In the previous paragraph, the parametrization of a curve was used to obtain a tangent vector c'(t) ∈ Rⁿ at every point of the image M of the curve. We thus have a mapping X : M → M × Rⁿ, c(t) ↦ (c(t), c'(t)). We talk about the vector field X along the curve M.
In general, we define a vector field X on an open set U ⊂ Rⁿ as an assignment of a vector X(x) ∈ Rⁿ in the direction space of the Euclidean space Rⁿ to each point x of the considered domain.
If a vector field X on an open set U ⊂ Rⁿ is given, then we can define, for every differentiable function f on U, its derivative in the direction of the vector field X in terms of the directional derivative, by the formula
X(f) : U → R,  X(f)(x) = d_{X(x)} f.
Therefore, if we have, in coordinates, X(x) = (X₁(x), …, Xₙ(x)), then
X(f)(x) = X₁(x) ∂f/∂x₁(x) + ⋯ + Xₙ(x) ∂f/∂xₙ(x).
The simplest vector fields have all coordinate functions equal to zero except for one function Xᵢ, which is constantly equal to one; these are just the partial derivative operators ∂/∂xᵢ.
y = 2r sin φ, r ∈ [0, 1],
leading to (the Jacobian of the transformation is 6r):
∫_c (2e^{2x} sin y - 3y³) dx + (e^{2x} cos y + (4/3)x³) dy = ∬_D (2e^{2x} cos y + 4x² - (2e^{2x} cos y - 9y²)) dx dy
= ∫₀¹ ∫₀^{2π} 6r (4(3r cos φ)² + 9(2r sin φ)²) dφ dr = ∫₀¹ ∫₀^{2π} 216 r³ dφ dr = 108π.

Given a mapping η assigning to each point of an open subset U ⊂ Rⁿ a linear form η(x) ∈ Rⁿ*, we talk about a linear form η on U.
Every differentiable function f on an open subset U ⊂ Rⁿ defines a linear form df on U. We use the notation Ω¹(U) for the set of all smooth linear forms on U.
It is apparent that, in the coordinates (x₁, …, xₙ), we can use the differentials of the particular coordinate functions to express every linear form η as
η(x) = η₁(x) dx₁ + ⋯ + ηₙ(x) dxₙ,
where ηᵢ(x) are uniquely determined functions. Such a form η is evaluated at a vector field X(x) = X₁(x) ∂/∂x₁ + ⋯ + Xₙ(x) ∂/∂xₙ by
η(X(x)) = η₁(x) X₁(x) + ⋯ + ηₙ(x) Xₙ(x).
If the form η is the differential of a function f, we get just the expression X(f)(x) = df(X(x)) used above.
Let us notice that we have actually defined the integral of any linear form η over (non-parametrized) curves M, in terms of an arbitrary parametrization c(t):
∫_M η = ∫_a^b η(c(t))(c'(t)) dt;
although we worked with the differential of a function back then, we actually verified that the value of the integral is independent of the choice of parametrization for any linear form.
We can also notice that we need not write any symbol denoting which concept of a volume we are integrating with respect to; it is given by the definition of a linear form.
8.35. k-dimensional surfaces and k-forms. Instead of parametrized curves, we will now work with differentiable mappings φ : V → Rⁿ defined on open subsets V ⊂ Rᵏ. A subset M ⊂ Rⁿ is called a k-dimensional manifold if each of its points has an open neighborhood Ũ ⊂ Rⁿ with a mapping ψ : Ṽ → Ũ, defined on an open subset Ṽ ⊂ Rᵏ × R^{n-k}, which is a diffeomorphism and ψ⁻¹(M ∩ Ũ) = V × {0}. This definition, which might seem complicated at first sight, is illustrated by a picture. Manifolds can typically be given by implicit mappings, see paragraph 8.18 and the discussion in 8.19.
where c is the positively oriented circle x² + y² = 9. ○
8.110. Compute the integral
∫_c (1/x + 2xy - y³/3) dx + (1/y + x² + x³/3) dy,
where c is the positively oriented boundary of the set D = {(x, y) ∈ R² : 4 ≤ x² + y² ≤ 9, … ≤ y ≤ √3 x}. ○
8.111. Remark. An important corollary of Green's theorem is the formula for the area of a region D bounded by a curve c:
m(D) = (1/2) ∮_c -y dx + x dy.
8.112. Compute the area of the region bounded by the ellipse x²/a² + y²/b² = 1.
Solution. Using the formula from 8.111 and the parametrization x = a cos t, y = b sin t, t ∈ [0, 2π], we get
m(D) = (1/2) ∮_c -y dx + x dy
= -(1/2) ∫₀^{2π} b sin t · (-a sin t) dt + (1/2) ∫₀^{2π} a cos t · b cos t dt
= (1/2) ab ∫₀^{2π} sin²t dt + (1/2) ab ∫₀^{2π} cos²t dt
= (1/2) ab ∫₀^{2π} (cos²t + sin²t) dt = (1/2) ab · 2π = πab,
which is indeed the well-known formula for the area of an ellipse with semi-axes a and b.
□
8.113. Find the area bounded by the cycloid which is given parametrically as ψ(t) = [a(t - sin t), a(1 - cos t)], for a > 0, t ∈ [0, 2π], and the x-axis.
Solution. Let the curves that bound the area be denoted by c₁ and c₂. For the area, we get
m(D) = (1/2) ∫_{c₁} -y dx + x dy + (1/2) ∫_{c₂} -y dx + x dy.
Now, we compute the mentioned integrals step by step. The parametric equation of the curve c₁ (a segment of the x-axis) is (t, 0), t ∈ [0, 2aπ], so we obtain for the first integral
(1/2) ∫_{c₁} -y dx + x dy = -(1/2) ∫₀^{2aπ} 0 · 1 dt + (1/2) ∫₀^{2aπ} t · 0 dt = 0.
The parametric equation of the curve c₂ is ψ(t) = (a(t - sin t), a(1 - cos t)), with t running from 2π to 0 (note the orientation). Hence
(1/2) ∫_{c₂} -y dx + x dy = (1/2) ∫_{2π}^{0} a²(-2 + 2cos t + t sin t) dt = (1/2) ∫₀^{2π} a²(2 - 2cos t - t sin t) dt = 3πa²,
so the area under one arch of the cycloid is m(D) = 3πa². □
An exterior k-form η on a manifold M assigns to every point x ∈ M a form η(x) ∈ Λᵏ(T_xM)* in such a way that the pullback of this form by any parametrization yields a smooth exterior k-form on V. We will use the notation Ωᵏ(M) for the set of all smooth exterior k-forms on M.
8.36. Outer product of exterior forms. Given a k-form α ∈ ΛᵏRⁿ* and an ℓ-form β ∈ ΛˡRⁿ*, we can create a (k + ℓ)-form α ∧ β by means of all possible permutations σ of the arguments. We just have to alternate the arguments in all possible orders and take the right sign each time:
(α ∧ β)(X₁, …, X_{k+ℓ}) = (1/(k! ℓ!)) Σ_σ sign(σ) α(X_{σ(1)}, …, X_{σ(k)}) β(X_{σ(k+1)}, …, X_{σ(k+ℓ)}).
It is clear from the definition that α ∧ β is indeed a (k + ℓ)-form. In the simplest case of 1-forms, the definition says that
(α ∧ β)(X, Y) = α(X)β(Y) - α(Y)β(X).
In the case of a 1-form α and a k-form β, we get
(α ∧ β)(X₀, X₁, …, X_k) = Σ_{j=0}^{k} (-1)ʲ α(X_j) β(X₀, …, X̂_j, …, X_k),
J. Applications of Stokes' theorem - the Gauss-Ostrogradsky theorem
8.114. Compute I = ∬_S x³ dy dz + y³ dx dz + z³ dx dy, where S is the sphere x² + y² + z² = 1.
Solution. It is advantageous to work in spherical coordinates
x = ρ sin ψ cos φ, y = ρ sin ψ sin φ, z = ρ cos ψ.

Every parametrization φ : V → U satisfies φ*(α ∧ β) = φ*α ∧ φ*β. In particular, consider the n-form ω_{Rⁿ} giving the standard n-dimensional volume of parallelograms; in the standard coordinates, we have
ω_{Rⁿ} = dx₁ ∧ ⋯ ∧ dxₙ.
If we want to integrate a function f(x) "in the old fashion", we consider the form ω = f ω_{Rⁿ} instead, i. e., ω will have the form (8.3) in the standard coordinates. We define
∫_U ω = ∫_U f(x) dx₁ ∧ ⋯ ∧ dxₙ = ∫_U f(x) dx₁ … dxₙ,
where the right-hand side is the Riemann integral of a function. We can notice that the n-form on the left-hand side is independent of the choice of coordinates.
If we want to express the form ω in different coordinates using a diffeomorphism φ : V → U, it means that we evaluate ω at a point φ(u) = x on the values of the vectors given by the columns of the Jacobian matrix D¹φ(u); hence
∫_V φ*ω = ∫_V f(φ(u)) det(D¹φ(u)) du₁ … duₙ,
which is, by the theorem on the transformation of variables from paragraph 8.31, the same value if the determinant of the Jacobian matrix keeps being positive, and the same value up to sign if it is negative.
Our new interpretation thus yields a geometrical sense for the integral of an n-form on Rⁿ, supposing the corresponding Riemann integral exists in some (hence any) coordinates. This integration takes into account the orientation of the area we are integrating over.
8.38. Integration of exterior forms on manifolds. Now, we are almost ready for the definition of the integral of a k-form ω on a k-dimensional oriented manifold M. For the sake of simplicity, we will examine smooth forms ω with compact support. First, let us assume that we are given a k-dimensional manifold M ⊂ Rⁿ, one of its local parametrizations φ : V → U ⊂ M ⊂ Rⁿ compatible with the orientation, and that the support of ω lies in U. Let us denote
φ*(ω)(u) = f(u) du₁ ∧ ⋯ ∧ du_k.
Invoking the relation (8.2) for the pullback of a form by a composite mapping, we get, for any other such parametrization ψ,
∫_U ω = ∫_V φ*(ω) = ∫_{ψ⁻¹(U)} ψ*(ω),
so the integral does not depend on the chosen parametrization.
Therefore, if we select a different cover and unit decomposition, we can do the above reasoning for a common refinement of these covers and verify that the expression we have defined is actually independent of all of our choices (think this out in detail!).
8.41. Exterior differential of exterior forms. As we have seen, the differential of a function can be interpreted as a mapping
d : Ω⁰(Rⁿ) → Ω¹(Rⁿ).
By means of parametrizations, this definition extends to functions on manifolds M, where the differential is a linear form on M. The following theorem extends this differential to arbitrary exterior forms on manifolds M ⊂ Rⁿ.

Exterior differential
Theorem. There is a unique mapping d : Ωᵏ(M) → Ω^{k+1}(M), for all manifolds M ⊂ Rⁿ and all k, such that
• d is linear with respect to multiplication by real numbers,
• for k = 0, it is the differential of functions,
• d(α ∧ β) = dα ∧ β + (-1)ᵏ α ∧ dβ for a k-form α,
• d ∘ d = 0.
With a bit of effort, it can be shown that a differential equation of the form y' = f(ax + by + c) can be transformed to an equation with separated variables, using the substitution z = ax + by + c. Let us emphasize that the new variable z replaces y.
We thus set z = x + y, which gives z' = 1 + y'. Substitution into (8.8) yields
dz/dx = (z + 1)/(2z - 1) + 1,  i. e.,  dz/dx = 3z/(2z - 1),
which separates as
(2/3 - 1/(3z)) dz = dx.
Stokes' theorem. For a compact oriented k-dimensional manifold M with boundary ∂M and any smooth (k - 1)-form ω,
∫_M dω = ∫_{∂M} ω.
Proof. Using an appropriate locally finite cover of the manifold M and a unit decomposition subordinate to it, we can express the integrals on both sides as the sum (even a finite one, since the support of the considered form ω is compact) of integrals of forms on Rᵏ or on the half-space.
We can thus assume, without loss of generality, that M is the half-space
M = {(x₁, …, x_k) ∈ Rᵏ : x₁ ≤ 0}
and that ω is a form with compact support on M. Then, ω is a sum of the forms
ω = ω_j(x) dx₁ ∧ ⋯ ∧ dx̂_j ∧ ⋯ ∧ dx_k,
where the hat indicates the omission of the corresponding linear form, and ω_j(x) is a smooth function with compact support. Its exterior differential is
dω = (-1)^{j-1} (∂ω_j/∂x_j) dx₁ ∧ ⋯ ∧ dx_k.

dv/du = -(4u + 3v)/(3u + 2v),
which can be solved by the further substitution z = v/u. We thus obtain
z'u + z = -(4 + 3z)/(3 + 2z),
dz/du · u = -(2z² + 6z + 4)/(3 + 2z),
(3 + 2z)/(2z² + 6z + 4) dz = -du/u,
provided z² + 3z + 2 ≠ 0. Integrating, we get
(1/2) ln|z² + 3z + 2| = -ln|u| + ln|C|, C ≠ 0,
ln|(z² + 3z + 2) u²| = ln C², C ≠ 0,
(z² + 3z + 2) u² = ±C², C ≠ 0.
We thus have
(z² + 3z + 2) u² = D, D ≠ 0,
and returning to the original variables (z = v/u, u = x + 1, v = y - 1),
v² + 3vu + 2u² = D, D ≠ 0,
(y - 1)² + 3(y - 1)(x + 1) + 2(x + 1)² = D, D ≠ 0.
Making simple rearrangements, the general solution can be expressed as
(x + y)(2x + y + 1) = D, D ≠ 0.
Now, let us return to the condition z² + 3z + 2 ≠ 0. It follows from z² + 3z + 2 = 0 that z = -1 or z = -2, i. e., v = -u or v = -2u. For v = -u, we have x = u - 1 and y = v + 1 = -u + 1, which means that y = -x. Similarly, for v = -2u, we have y = -2u + 1, hence y = -2x - 1. However, both functions y = -x and y = -2x - 1 satisfy the original differential equation, and they are included in the general solution for the choice D = 0. Therefore, every solution is given in the implicit form
(x + y)(2x + y + 1) = D, D ∈ R.
□
8.125. Find the general solution of the differential equation
(x² + y²) dx - 2xy dy = 0.
Solution. For y ≠ 0, simple rearrangements lead to
y' = (x² + y²)/(2xy) = (1 + (y/x)²)/(2 · (y/x)).
Using the substitution u = y/x, we get to the equation
u'x + u = (1 + u²)/(2u).
If j > 1, the form ω evaluates identically to zero on the boundary ∂M. At the same time, invoking the fundamental theorem about antiderivatives of univariate functions, we get
∫_M dω = (-1)^{j-1} ∫ ( ∫_{-∞}^{∞} (∂ω_j/∂x_j) dx_j ) dx₁ ⋯ dx̂_j ⋯ dx_k = 0,
since the function ω_j has compact support. So the theorem is true in this case. However, if j = 1, then we obtain
∫_M dω = ∫_{Rᵏ⁻¹} ( ∫_{-∞}^{0} (∂ω₁/∂x₁) dx₁ ) dx₂ ⋯ dx_k = ∫_{Rᵏ⁻¹} ω₁(0, x₂, …, x_k) dx₂ ⋯ dx_k = ∫_{∂M} ω.
This finishes the proof of Stokes's theorem. □
8.44. Notes about applications of Stokes' theorem. We have proved an extraordinarily strong result which covers several standard integral relations from classical vector analysis. For instance, we can notice that, by Stokes' theorem, the integral of the exterior differential dω of any (k - 1)-form over a compact manifold without boundary is always zero (for example, integrating a 2-form dω over the sphere S² ⊂ R³).
Let us look step by step at the cases of Stokes' theorem in lower dimensions.
The case n = 2, k = 1. We are thus examining a surface M in the plane, bounded by a curve C = ∂M. If we have ω(x, y) = f(x, y)dx + g(x, y)dy, then dω = (-∂f/∂y + ∂g/∂x) dx ∧ dy. Therefore, Stokes' theorem gives the formula
∫_C f(x, y) dx + g(x, y) dy = ∫_M (∂g/∂x - ∂f/∂y) dx ∧ dy,
which is one of the standard forms of the so-called Green's theorem.
Using the standard scalar product on R², we can identify the vector field X with a linear form ω_X such that ω_X(Y) = ⟨Y, X⟩. In the standard coordinates (x, y), this just means that the field X = f(x, y) ∂/∂x + g(x, y) ∂/∂y defines exactly the form ω given above. The integral of ω_X over a curve C has the physical interpretation of the work done by movement along this curve in the force field X. Green's theorem then says, besides others, that if ω_X = dF for some function F, then the work done along a closed curve is always zero. Such fields are called potential fields, and the function F is the potential of the field X.
With Green's theorem, we have verified once again that integrating the differential of a function along a curve depends solely on the initial and terminal points of the curve.
The case n = 3, k = 2. We are examining a region M in R³, bounded by a surface S. If ω = f(x, y, z) dy ∧ dz + g(x, y, z) dz ∧ dx + h(x, y, z) dx ∧ dy, we get dω = (∂f/∂x + ∂g/∂y + ∂h/∂z) dx ∧ dy ∧ dz, and Stokes' theorem says that
∬_S f dy ∧ dz + g dz ∧ dx + h dx ∧ dy = ∫_M (∂f/∂x + ∂g/∂y + ∂h/∂z) dx ∧ dy ∧ dz.
For u ≠ ±1 and D = -1/C, we have
u'x = (1 + u²)/(2u) - u = (1 - u²)/(2u),
2u/(1 - u²) du = dx/x,
-ln|1 - u²| = ln|x| + ln|C|, C ≠ 0,
ln(1/|1 - u²|) = ln|Cx|, C ≠ 0,
1 = Cx(1 - u²), C ≠ 0,
x² - y² = -Dx,  i. e.,  y² = x² + Dx, D ≠ 0.
v2,
±x. While y = 0 is —x are solutions and
The condition u = ± 1 corresponds to y not a solution, both the functions y = x and y can be obtained by the choice D = 0. The general solution is thus
y2 = x2 + Dx, Dei. □
8.126. Solve
y' = -2y/(x² - 1) + x.
Solution. The given equation is of the form y' = a(x)y + b(x), i. e., it is a non-homogeneous linear differential equation (the function b is not identically equal to zero). The general solution of such an equation can be obtained by the method of the integration factor (the non-homogeneous equation is multiplied by the expression e^{-∫a(x)dx}) or by the method of variation of the constant (the integration constant arising in the solution of the corresponding homogeneous equation is considered to be a function of the variable x). We will illustrate both of these methods on this problem.
As for the former method, we multiply the original equation by the expression
e^{-∫a(x)dx} = e^{∫ 2/(x²-1) dx} = (x - 1)/(x + 1),
where the corresponding integral is understood to stand for any antiderivative, and any non-zero multiple of the obtained function can be considered (that is why we could remove the absolute value). Thus, consider the equation
y' (x - 1)/(x + 1) + y · 2/(x + 1)² = x(x - 1)/(x + 1).
The core of the method of the integration factor is the fact that the expression on the left-hand side is the derivative of y(x - 1)/(x + 1). Integrating thus leads to
y (x - 1)/(x + 1) = ∫ x(x - 1)/(x + 1) dx = x²/2 - 2x + 2 ln|x + 1| + C, C ∈ R.
Therefore, the solutions are the functions
y = (x + 1)/(x - 1) · (x²/2 - 2x + 2 ln|x + 1| + C), C ∈ R.
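The result can be cross-checked with a computer algebra system; the sketch below (our own addition, assuming Python with sympy and the right-hand side of the equation as reconstructed above) should return an equivalent general solution:

import sympy as sp

x = sp.Symbol('x')
y = sp.Function('y')
ode = sp.Eq(y(x).diff(x), -2*y(x)/(x**2 - 1) + x)
print(sp.dsolve(ode, y(x)))
# equivalent to y = (x + 1)/(x - 1)*(x**2/2 - 2*x + 2*log|x + 1| + C)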
As for the latter method, we first solve the corresponding homogeneous equation
y' = -2y/(x² - 1),
This is the statement of the so-called Gauss-Ostrogradsky theorem.
This theorem also has a very illustrative physical interpretation. Every vector field X = f(x, y, z) ∂/∂x + g(x, y, z) ∂/∂y + h(x, y, z) ∂/∂z defines an exterior 2-form ω_X(x, y, z) = f(x, y, z) dy ∧ dz + g(x, y, z) dz ∧ dx + h(x, y, z) dx ∧ dy by substitution for the first argument in the standard form of volume. The integral of this form over a surface can be perceived so that the integrated 2-form infinitesimally contributes to the integral, at every point, the increase equal to the volume of the parallelepiped given by the field X and a little piece of the surface. If we consider the vector field to be the velocity of movement of the particular points of the space, this is the "flow rate" through the given surface. On the right-hand side of the integral, there is the expression d(ω_X) = (div X) dx ∧ dy ∧ dz. The Gauss-Ostrogradsky theorem says that if div X equals zero identically, then the total flow through the boundary surface of the region is zero as well. Such fields, with div X = 0, are called solenoidal vector fields.
The case n — 3, k — 1. In this case, we have a surface M in R3 bounded by a curve C. If the linear form co is the differential of some function, we find out that the integral over the surface depends on the boundary curve only. This is the classical Stokes' theorem. If we use the standard scalar product, just like in the plane,
to identify the vector field X = f∂/∂x + g∂/∂y + h∂/∂z with the form ω_X = f dx + g dy + h dz, we obtain

∫_C f dx + g dy + h dz = ∫_M dω_X,

where dω_X = (∂h/∂y − ∂g/∂z) dy∧dz + (∂f/∂z − ∂h/∂x) dz∧dx + (∂g/∂x − ∂f/∂y) dx∧dy. This 2-form can again be identified with a single vector field rot X, which yields dω_X by substitution into the standard form of volume. This field is called the rotation or curl of the vector field X. We can see that in the three-dimensional space, vector fields X having the property that ω_X = dF for some function F are given by the condition rot X = 0. They are called conservative (or potential) vector fields.
3. Differential equations
In this section, we will get back to (vector) functions of one variable, which will be given and examined in terms of their instantaneous changes. At the end, we will stop for a while to look at equations containing partial derivatives.
8.45. Linear and non-linear difference models. The concept of derivative was introduced in order to work with instantaneous changes of the examined quantities. In the introductory chapter, we once defined differences for the same reason, and it was just the relations between the values of the quantities and the changes of them or other quantities which led to the so-called difference equations. As a motivating introduction to equations containing derivatives of unknown functions, we will now return to the difference equations for a while.
The simplest model was the interest on deposits or loans (and the same for the so-called Malthusian model of populations). The increase was proportional to the value, see 1.10. In continuous modeling, the same requirement leads to an equation connecting
which is an equation with separated variables. We have
dy/dx = −2y/(x² − 1),
∫ dy/y = ∫ −2/(x² − 1) dx,
ln|y| = −ln|x − 1| + ln|x + 1| + ln|C|, C ≠ 0,
ln|y| = ln|C(x + 1)/(x − 1)|, C ≠ 0,
y = C(x + 1)/(x − 1), C ≠ 0,

where we had to exclude the case y = 0. However, the function y = 0 is always a solution of a homogeneous linear differential equation, and it can be included in the general solution. Therefore, the general solution of the corresponding homogeneous equation is

y = C(x + 1)/(x − 1), C ∈ R.
Now, we will consider the constant C to be a function C(x). Differentiating leads to

y′ = (C′(x)(x + 1)(x − 1) + C(x)(x − 1) − C(x)(x + 1)) / (x − 1)².

Substituting this into the original equation, we get

(C′(x)(x + 1)(x − 1) + C(x)(x − 1) − C(x)(x + 1)) / (x − 1)² = −2C(x)(x + 1)/((x − 1)(x² − 1)) + x.
It follows that

C′(x) = x(x − 1)/(x + 1),

i.e.,

C(x) = ∫ x(x − 1)/(x + 1) dx = x²/2 − 2x + 2 ln|x + 1| + C, C ∈ R.
Now, it suffices to substitute:

y = C(x) · (x + 1)/(x − 1) = (x + 1)/(x − 1) · (x²/2 − 2x + 2 ln|x + 1| + C), C ∈ R.
We can see that the result we have obtained here is of the same form as in the former case. This should not be surprising as the differences between the two methods are insignificant and the computed integrals are the same.
Finally, we can notice that the solution of an equation y′ = a(x)y can be found in the same way for any continuous function a. We thus always have

y = C e^{∫a(x)dx}, C ∈ R.

Similarly, the solution of an equation y′ = a(x)y + b(x) with an initial condition y(x₀) = y₀ can be determined explicitly as (provided the coefficients, i.e. the functions a and b, are continuous)

y = e^{∫_{x₀}^{x} a(t)dt} (y₀ + ∫_{x₀}^{x} b(t) e^{−∫_{x₀}^{t} a(s)ds} dt).

Let us remark that the linear equation has no singular solution, and the general solution contains a C ∈ R. □
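With the right-hand side of 8.126 read as above (y′ = −2y/(x² − 1) + x), the computed general solution can also be verified symbolically; a minimal sympy sketch:

from sympy import symbols, ln, simplify, diff

x, C = symbols('x C')
y = (x + 1)/(x - 1) * (x**2/2 - 2*x + 2*ln(x + 1) + C)
print(simplify(diff(y, x) + 2*y/(x**2 - 1) - x))  # prints 0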
8.127. Solve the linear equation
(y′ + 2xy) e^{x²} = cos x.
the derivative y′(t) of a function with its value:

(8.4) y′(t) = r·y(t),

with a proportionality constant r.
It is easy to guess the solution of this equation, i. e. a function y(t) which satisfies the equality identically,
y(t) = C e^{rt},
with an arbitrary constant C. This constant can be determined uniquely by choosing the so-called initial value y₀ = y(t₀) at some point t₀. If a part of the increase in our model were given by a constant action independent of the value y or t (like bank charges or the natural decrease of a population as a result of sending some part of it to slaughterhouses), we could use an equation with a constant s on the right-hand side:
(8.5) y′(t) = r·y(t) + s.

Apparently, the solution of this equation is the function

y(t) = C e^{rt} − s/r.
It is very easy to come across this solution if we realize that the set of all solutions of the equation (8.4) is a one-dimensional vector space, while the solutions of the equation (8.5) are obtained by adding any one of its solutions to the solutions of the previous equation. We can then easily find the constant solution y(t) = k for k = −s/r.
Similarly, in paragraph 1.13, we managed to create the so-called logistic model of population growth based upon the assumption that the ratio of the change of the population size p(n + 1) − p(n) and its size p(n) is affine with respect to the population size itself. We also wanted the model to behave similarly to the Malthusian one for small values of the population size and to cease growing when reaching a limit value K. Now, the same relation for the continuous model can be formulated for a population p(t) dependent on time t by the equality
(8.6) p′(t) = p(t) · (r − (r/K) · p(t)),

i.e., at the value p(t) = K for a large constant K, the instantaneous increase of the function p is indeed zero, while for p(t) near zero, the ratio of the rate of increase of the population and its size is close to r, which is often a small number (roughly hundredths) expressing the rate of increase of the population in good conditions.
It is surely not easy to solve such an equation without knowing the proper theory (although we will be able to deal with this type of equations presently). However, as an exercise on differentiation, we can easily verify that the following function is a solution for every constant C:
p(t) = K / (1 + C·K·e^{−rt}).
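That exercise on differentiation can also be delegated to a computer; a minimal sympy sketch checking that this function satisfies (8.6):

from sympy import symbols, exp, simplify, diff

t, r, K, C = symbols('t r K C', positive=True)
p = K / (1 + C*K*exp(-r*t))
print(simplify(diff(p, t) - p*(r - r/K*p)))  # prints 0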
Solution. If we used the method of integrating factor, we would only rewrite the equation trivially, since it is already of the desired form: the expression on the left-hand side is the derivative of y·e^{x²}. Thus, we can immediately calculate

(y·e^{x²})′ = cos x,
y·e^{x²} = ∫ cos x dx,
y·e^{x²} = sin x + C,
y = e^{−x²}(sin x + C), C ∈ R.
□
8.128.
Find all non-zero solutions of the Bernoulli equation
y′ − y/x = 3xy².
Solution. The Bernoulli equation

y′ = a(x)y + b(x)y^r, r ≠ 0, r ≠ 1, r ∈ R,

can be solved by first dividing by the term y^r and then using the substitution u = y^{1−r}, which leads to the linear differential equation

u′ = (1 − r)(a(x)u + b(x)).

In this very problem, the substitution u = y^{1−2} = 1/y gives

u′ + u/x = −3x.
Similarly to the previous exercise, we have

u = e^{−ln|x|} (∫ −3x·e^{ln|x|} dx),

where ln|x| was obtained as an (arbitrary) antiderivative of 1/x. Further,

∫ −3x·e^{ln|x|} dx = ∫ −3x·|x| dx.

The absolute value can be replaced with a sign that can be canceled, i.e., it suffices to consider

u = (1/x) (∫ −3x² dx) = (1/x)(−x³ + C), C ∈ R.

Returning to the original variable, we get

y = 1/u = x/(C − x³), C ∈ R.
The excluded case y = 0 is a singular solution (which, of course, is true for every Bernoulli equation with r positive). □
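Again, a one-line symbolic check of the obtained family of solutions (a sympy sketch):

from sympy import symbols, simplify, diff

x, C = symbols('x C')
y = x / (C - x**3)
print(simplify(diff(y, x) - y/x - 3*x*y**2))  # prints 0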
8.129. Interchanging the variables, solve the equation
y dx − (x + y² sin y) dy = 0.
Solution. When the variable x occurs only in the first power in the differential equation and y occurs in the arguments of elementary functions, we can apply the so-called method of variable interchange, where we look for the solution as a function x of the independent variable y.
First, we write the equation explicitly:
y′ = y/(x + y² sin y).
This equation is not of any of the previous types, so we rewrite it as follows:
Confronting the red graph (left-hand picture) of this function for the choice K = 100, r = 0.05, and C = 1 (the first two were used in 1.13 this way, the last one roughly corresponds to the initial value p(0) = 1) with the right-hand picture (the solution of the difference equation from 1.13 with the same values of the parameters), we can see that both approaches to population modeling indeed yield quite similar results. To compare the output, the left-hand picture also contains, in green, the graph of the solution of the equation (8.4) with the same constant r and initial condition.
8.46. First-order differential equations. By an (ordinary) first-order differential equation, we usually mean the relation between the derivative y′(t) of a function with respect to the variable t, its value y(t), and the variable itself, which can be written in terms of some real-valued function F : R³ → R as the equality

F(y′(t), y(t), t) = 0.

The writing resembles implicitly given functions y(t); however, this time, there is a dependency upon the derivative of the wanted function y(t).
If the equation is solved at least explicitly with regard to the derivative, i.e.,

y′(t) = f(t, y(t))

for some function f : R² → R, we can imagine graphically what this equation defines. For every value (t, y) in the plane, we can consider the arrow corresponding to the vector (1, f(t, y)), i.e., the velocity with which the point of the graph of the solution moves through the plane in dependence on the free parameter t.
For the equation (8.6), for instance, we get the following picture (illustrating the solution for the initial condition as above).
dy/dx = y/(x + y² sin y),
dx/dy = (y/(x + y² sin y))^{−1} = (x + y² sin y)/y,
x′ = x/y + y sin y.

We have thus obtained a linear differential equation. Now, we can easily compute its general solution

x = −y cos y + Cy, C ∈ R.
□
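Since x is expressed as a function of y here, a symbolic check differentiates with respect to y; a minimal sympy sketch:

from sympy import symbols, cos, sin, simplify, diff

y, C = symbols('y C')
x = -y*cos(y) + C*y
# the equation rewritten as dx/dy = (x + y^2 sin y)/y
print(simplify(diff(x, y) - (x + y**2*sin(y))/y))  # prints 0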
Further problems concerning first-order differential equations can be found on page ??.
L. Practical problems leading to differential equations
8.130. A water purification plant with volume 2000 m³ was contaminated with lead which is spread in the water with density 10 g/m³. Water is flowing in and out of the basin at 2 m³/s. In what time does the amount of lead in the basin decrease below 10 μg/m³ (which is the hygienic norm for the amount of lead in drinkable water by a regulation of the European Community), provided the water keeps being mixed uniformly?
Solution. Let us denote the volume of water in the basin by V (m³) and the rate of the water's flow by v (m³/s). In an infinitesimal (infinitely small) time interval dt, (m/V)·v·dt grams of lead runs out of the basin, so we can construct the differential equation

dm = −(m/V)·v·dt

for the change of the lead's mass in the basin. Separating the variables, we get the equation

dm/m = −(v/V)·dt.

Integrating both sides of the equation and getting rid of the logarithms, we get the solution in the form m(t) = m₀e^{−(v/V)t}, where m₀ is the lead's mass at time t = 0. Substituting the concrete values, we find out that t = 6 h 35 min. □
8.131. The speed of transmission of a message in a population consisting of P people is directly proportional to the number of people who have not heard the message yet. Determine the function f which describes the dependency of the number of people who have heard the message on time. Is it appropriate to use this model of message transmission for small or large values of P?
Solution. We construct a differential equation for f. The speed of the transmission f′(t) should be directly proportional to the number of people who have not heard the message yet, i.e. the value P − f(t). Altogether,

df/dt = k·(P − f(t)).
[Figure: the direction field of the logistic equation (8.6), with y(x) from 0 to 100 on the vertical axis and x from 0 to 200 on the horizontal axis, illustrating the solution for the initial condition as above.]
Considering these pictures, we can intuitively anticipate that for every initial condition, there will exist a unique solution of our equation. However, as we will see, this proposition holds only for sufficiently smooth functions f.
8.47. Integration of differential equations. Before examining existence of the solutions of the differential equations, we present at least one truly elementary method of solution. It transforms the solution to ordinary integration, which usually leads to an implicit description of the solution.
Equations with separated variables

Consider a differential equation in the form

(8.7) y′(t) = f(t)·g(y(t))

for two continuous functions of a real variable, f and g.

The solution of this equation can be obtained by integration, i.e., we find the antiderivatives

G(y) = ∫ dy/g(y), F(t) = ∫ f(t) dt.

This procedure reliably finds a solution satisfying g(y(t)) ≠ 0.

Then, computing the function y(t) from the implicitly given formula F(t) + C = G(y) with an arbitrary constant C leads to the solution, because differentiating this equation using the chain rule for the composite function G(y(t)) indeed leads to (1/g(y(t)))·y′(t) = f(t).
As an example, we can find the solution of the equation

y′(x) = x · y(x).

Direct calculation gives ln|y(x)| = x²/2 + C. Hence it looks (at least for positive values of y) as

y(x) = e^{x²/2 + C} = D · e^{x²/2},

where D is an arbitrary positive constant now. Let us stop for a while to examine the resulting formula and signs thoroughly. The
Separating the variables and introducing a constant K (the number of people who know the message at time t = 0 must be P − K), we get the solution

f(t) = P − K·e^{−kt},

where k is a positive real constant.
Apparently, this model makes sense for large values of P only. □
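A short symbolic check that this function indeed satisfies f′ = k(P − f) (a sympy sketch):

from sympy import symbols, exp, simplify, diff

t, P, K, k = symbols('t P K k', positive=True)
f = P - K*exp(-k*t)
print(simplify(diff(f, t) - k*(P - f)))  # prints 0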
8.132. The speed at which an epidemic spreads in a given closed population consisting of P people is directly proportional to the product of the number of people who have been infected and the number of people who have not. Determine the function fit) describing the number of infected people in time.
Solution. Just like in the previous problem, we construct a differential equation:

df/dt = k · f(t) · (P − f(t)).

Again, separating the variables and introducing suitable constants K and L, we obtain

f(t) = K / (1 + L·e^{−Kkt}).
□
8.133. The speed at which a given isotope of a given chemical element decays is directly proportional to the amount of the given isotope. The half-life of the plutonium isotope ²³⁹Pu is 24,100 years. In what time does a hundredth of a nuclear bomb whose active component is the mentioned isotope disappear?
Solution. Denoting the amount of plutonium by m, we can build a differential equation for the rate of the decay:

dm/dt = −k·m,

where k is an unknown constant. The solution is thus the function m(t) = m₀e^{−kt}. Substituting into the equation for the half-life (e^{−k·24100} = 1/2), we get the constant k ≈ 2.88 · 10⁻⁵. The wanted time is then approximately 349 years. □
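The two numbers can be recomputed directly from the half-life; a small numerical sketch:

import numpy as np

half_life = 24100.0                # years
k = np.log(2) / half_life          # ≈ 2.88e-5 per year
t = np.log(100/99) / k             # time until m(t) = 0.99 m_0
print(k, t)                        # ≈ 2.876e-05, ≈ 349.4 years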
8.134. The acceleration of an object falling in a constant gravitational field with a certain resistance of the environment is given by the formula

dv/dt = g − kv,

where k is a constant which expresses the resistance of the environment. An object was dropped in a gravitational field with g = 10 ms⁻² at the initial speed of 5 ms⁻¹; the resistance constant is k = 0.5 s⁻¹. What will the speed of the object be in three seconds?

Solution. Separating the variables in dv/dt = g − kv leads to

v(t) = g/k − (g/k − v₀)·e^{−kt},

whence
constant solution y(x) = 0 satisfies our equation as well, and for negative values of y, we can use the same solution with negative constants D. In fact, the constant D can be arbitrary, and we have found a solution satisfying any initial value.
[Figure: the direction field of the equation y′ = xy with two solutions illustrating the instability with respect to the initial values.]
The picture shows two solutions which demonstrate the instability of the equation with regard to the initial values: if, for any x₀, we change a tiny y₀ from a negative value to a positive one, then the behavior of the resulting solution changes dramatically. Moreover, we should notice the constant solution y(x) = 0, which satisfies the initial condition y(x₀) = 0.
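For comparison, a computer algebra system reproduces this one-line computation directly; a minimal sympy sketch:

from sympy import symbols, Function, Eq, dsolve, diff

x = symbols('x')
y = Function('y')
print(dsolve(Eq(diff(y(x), x), x*y(x))))
# Eq(y(x), C1*exp(x**2/2)), i.e. the family D*e^{x^2/2} found above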
Using separation of variables, we can easily solve the nonlinear equation from the previous paragraph which described a logistic population model. Try this as an exercise.
In the first chapter, we paid much attention to the so-called linear difference equations, and their general solution, looking quite awful, was determined in paragraph 1.10 on page 13. Although it was clear beforehand that it would be a one-dimensional affine space of satisfying sequences, it was a hardly transparent sum, because we needed to take into account all of the changing coefficients.
We can thus use this as a source of inspiration for the following construction of the solution of a general first-order linear equation

(8.8) y′(t) = a(t)y(t) + b(t)

with continuous coefficients a(t) and b(t).

First of all, let us find the solution of the homogenized equation y′(t) = a(t)y(t). This can be computed easily by separation of variables, obtaining

y(t) = y₀·F(t, t₀), F(t, s) = e^{∫_s^t a(x) dx}.
In the case of difference equations, we "guessed" the solution, and then we proved by induction that it was correct. It is even simpler now, as it suffices to differentiate the correct solution to verify the statement.
The solution of first-order linear equations

The solution of the equation (8.8) with initial value y(t₀) = y₀ is (locally in a neighborhood of t₀) given by the formula

y(t) = y₀·F(t, t₀) + ∫_{t₀}^{t} F(t, s)·b(s) ds,

where F(t, s) = e^{∫_s^t a(x) dx}.
v(3) = 20 − 15e^{−3/2} ms⁻¹ ≈ 16.7 ms⁻¹ after substitution.
□
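The numerical value is immediate; a small sketch with the given constants:

import numpy as np

g, k, v0 = 10.0, 0.5, 5.0
v = lambda t: g/k - (g/k - v0)*np.exp(-k*t)
print(v(3.0))  # ≈ 16.65 ms^-1, i.e. 20 - 15 e^{-3/2}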
Verify the correctness of the solution by yourselves (pay proper attention to the differentiation of the integral, where t appears both in the upper bound and as a free parameter in the integrand).
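One way to do the suggested verification is numerical: compare the formula with a standard ODE solver on a concrete choice of coefficients of ours, say a(t) = cos t and b(t) = e^{−t}; a scipy sketch:

import numpy as np
from scipy.integrate import quad, solve_ivp

a = lambda t: np.cos(t)
b = lambda t: np.exp(-t)
t0, y0 = 0.0, 1.0

def F(t, s):  # F(t, s) = exp(int_s^t a(x) dx)
    return np.exp(quad(a, s, t)[0])

def y_formula(t):
    integral = quad(lambda s: F(t, s) * b(s), t0, t)[0]
    return y0 * F(t, t0) + integral

sol = solve_ivp(lambda t, y: a(t)*y + b(t), (t0, 3.0), [y0],
                rtol=1e-10, atol=1e-12)
print(y_formula(3.0), sol.y[0, -1])  # the two values agree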
8.135. The rate of increase of a population of a certain type of bug is inversely proportional to its size. At time t = 0, the population had 100 bugs. In a month, the population doubled. What will the size of the population be in two months?
Solution. Let us consider a continuous approximation of the number of bugs, and let their amount be denoted by P. Then, we can build the following equation:

dP/dt = k/P, P(0) = 100,

whence P = √(2kt + c). Substituting the given values (c = 100² from P(0) = 100, and 2k = 30000 from P(1) = 200), we get P(2) = √70000 ≈ 265, which is an estimate of the actual number of bugs. □
Now, we can, for instance, directly solve the equation y′(x) = 1 − x·y(x), this time encountering stable behavior, visible in the following picture.

[Figure: the direction field of y′ = 1 − xy with solutions illustrating the stable behavior.]
8.136. Find the equation of the curve with the following properties: it lies in the first quadrant, goes through the point [1, 3/4], and its tangent at any point marks, on the positive half-axis y, a segment whose length is the same as the distance of that point from the origin. ○

8.137. Consider a chemical compound C isolated in a container. C is unstable, with the half-life of a molecule equal to q time units. If there were M moles of the compound C in the container at the beginning (i.e., at time t = 0), how many moles of it will be there at time t > 0? ○

8.138. A 100-gram body stretches a spring by 5 cm when hung on it. Express the dependency of its position on time t, provided the speed of the body is 10 cm/s when going through the equilibrium point. ○
Further practical problems that lead to differential equations can be found on page ??.
M. Higher-order differential equations
8.139. Underdamped oscillation. Now, we will describe a simple model for the movement of a solid object attached to a point with a strong spring. If y(t) is the deviation of our object from the point yo = y(0) = 0, then we can assume that the acceleration y" it) in time t is proportional to the magnitude of the deviation, yet with the other sign. The proportionality constant k is called the spring constant. Considering the case k = 1, we get the so-called oscillation equation
y"(t) = -y(t).
This equation corresponds to the system of equations
x′(t) = −y(t), y′(t) = x(t)

from 8.7. The solution of this system is given by

x(t) = R cos(t − τ), y(t) = R sin(t − τ)

with an arbitrary non-negative constant R, which determines the maximum amplitude, and a constant τ, which determines the initial phase.
Therefore, in order to determine a unique solution, we need to know not only the initial position yo, but also the speed of the motion at that moment. These two pieces of information uniquely determine both the amplitude and the initial phase.
Moreover, let us imagine that as a result of the properties of the spring material, there is another force which is directly proportional to the instantaneous speed of our object, with the other sign than the amplitude again. This is expressed by one more term with the first derivative, so our equation is now
8.48. Transformation of coordinates. Our pictures tend to indicate that differential equations can be perceived as geometric objects (the "directional field of the arrows"), so we should be able to look for the solution by conveniently chosen coordinates. We will get back to this point of view later; now, we will only show three simple and typical tricks as they seem from the explicit form of the equations in coordinates.
We begin with the so-called homogeneous equations of the form

y′(t) = f(y(t)/t).

Considering the transformation z(t) = y(t)/t, assuming that t ≠ 0, we get by the chain rule that

z′(t) = (1/t²)(t·y′(t) − y(t)) = (1/t)(f(z) − z),
which is an equation with separated variables.
Another example is the so-called Bernoulli differential equations, which are of the form

y′(t) = f(t)y(t) + g(t)y(t)ⁿ,

where n ≠ 0, 1. The choice of the transformation z = y^{1−n} leads to the equation

z′(t) = (1 − n)·y(t)^{−n}·(f(t)y(t) + g(t)y(t)ⁿ) = (1 − n)f(t)z(t) + (1 − n)g(t),

which is a linear equation, which we are able to integrate.
In the end, let us take a look at an extraordinarily important equation, the so-called Riccati equation. It is a form of the Bernoulli equation with n = 2, extended by an absolute term:

y′(t) = f(t)y(t) + g(t)y(t)² + h(t).

This equation can also be transformed to a linear equation provided that we are able to guess a particular solution x(t). Then, we can use the transformation

z(t) = 1/(y(t) − x(t)).

Verify by yourselves that this transformation leads to the equation

z′(t) = −(f(t) + 2x(t)g(t))·z(t) − g(t).
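As an illustration, take the hypothetical Riccati equation y′ = y² − 2/t² with the guessed particular solution x(t) = 1/t (so f = 0, g = 1, h = −2/t²); the transformed equation should then be z′ = −(2/t)z − 1, which sympy confirms:

from sympy import symbols, Function, simplify, diff

t = symbols('t', positive=True)
z = Function('z')
y = 1/t + 1/z(t)               # y = x(t) + 1/z with x(t) = 1/t
residual = diff(y, t) - (y**2 - 2/t**2)
# substituting z' = -(2/t) z - 1 must satisfy the equation identically
print(simplify(residual.subs(diff(z(t), t), -2*z(t)/t - 1)))  # prints 0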
y"(t) = -y(t)-ay'(t),
where a is a constant which expresses the magnitude of the damping. In the following picture, there are the so-called phase diagrams for solutions with two distinct initial conditions, namely with zero damping on the left, and for the value of the coefficient a = 0.3 on the right.
[Figure: two phase diagrams ("Tlumené oscilace", i.e. damped oscillations), with zero damping on the left and a = 0.3 on the right.]
The oscillations are expressed by the y-axis values; the x-axis values describe the speed of the motion.
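Phase diagrams like these can be produced numerically; a minimal scipy sketch for the damped case, with the value a = 0.3 from the text:

import numpy as np
from scipy.integrate import solve_ivp

a = 0.3  # damping coefficient

def rhs(t, u):
    x, y = u                  # x is the speed, y the deviation
    return [-y - a*x, x]      # x' = -y - a x, y' = x

sol = solve_ivp(rhs, (0.0, 40.0), [1.0, 0.0], dense_output=True)
ts = np.linspace(0.0, 40.0, 1000)
x, y = sol.sol(ts)
# the curve (x(t), y(t)) spirals into the origin for a > 0
print(abs(y).max(), abs(y[-1]))  # the amplitude decays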
8.140. Undamped oscillation. Find the function y(i) which satisfies the following differential equation and initial conditions:
y"(t) + 4y(t) = f(t), ;y(0) = 0, y'(0) = -1,
where the function f(t) is piecewise continuous:
1 cos(2?) for 0 < t < 7i, 10 for t > 7t.
fit)
Solution. This problem is a model of undamped oscillation of a spring (omitting friction, non-linearities in the toughness of the spring, and other factors) which is initiated by an outer force.
The function f(t) can be written as a linear combination of Heav-iside's function u(t) and its shift, i. e.,
Since
f(t) = cos(2í)(w(í) - M0)
£(y")(s) = s2C(y) - sy(0) - y'(0) = s2 C(y) + 1,
we get, applying the results of the above exercises 7 and 8 to the Laplace transform of the right-hand side
s2 C(y) + \ + 4C(y)
£(cos(2í)(«(0 - M0)) = £(cos(2i) • u(t)) - £(cos(2i) • M0)
£(cos(2i))
Hence,
C(y)
(1
1
'£(cos(2(í + 7C))
s2 +4
s2 +4
+ (1
(s2 + 4)2
Performing the inverse transform, we obtain the solution in the form
s
y(t)
sin(2í) + \t sin(2i) + C~l \e
(s2 + Af
Just as we saw in the case of integration of functions (which is, in fact, the simplest type of equations with separated variables), the equations usually do not have a solution expressible explicitly in terms of elementary functions.
Similarly as with standard engineering tables of values of special functions, books listing the solutions of basic equations were compiled as well.⁴ Today, the wisdom concealed in them is essentially transferred to software systems like Maple or Mathematica. There, we can assign any task on ordinary differential equations, and we will get the results in a surprisingly good deal of cases; yet, for most problems, it will still not be possible.
The way out of this is numerical methods, which try only to approximate the solutions. However, to be able to use them, we still need good theoretical starting points regarding existence, uniqueness, and stability of the solutions.
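The simplest of these numerical methods just follows the arrows (1, f(t, y)) of the directional field with a small step; a minimal sketch of the explicit Euler method (our illustration, not a method discussed in the text here):

import numpy as np

def euler(f, t0, y0, t1, n=1000):
    """Approximate the solution of y' = f(t, y), y(t0) = y0, at t1."""
    t, y, h = t0, y0, (t1 - t0) / n
    for _ in range(n):
        y += h * f(t, y)   # follow the arrow (1, f(t, y)) for a step h
        t += h
    return y

# sanity check on y' = y, y(0) = 1; the exact value at t = 1 is e
print(euler(lambda t, y: y, 0.0, 1.0, 1.0), np.e)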
We begin with the so-called Picard-Lindelof theorem:
Existence and uniqueness of the solutions of ODEs
8.49. Theorem. Let a function f(t, y) : R² → R have continuous partial derivatives on an open set U. Then for every point (t₀, y₀) ∈ U ⊂ R², there exists a maximal interval I = [t₀ − a, t₀ + b], with positive a, b ∈ R, and a unique function y(t) : I → R which is a solution of the equation

y′(t) = f(t, y(t))

on the interval I.
Proof. Notice that if a function y(t) is a solution of our equation satisfying the initial condition y(t₀) = y₀, then it also satisfies the equation

y(t) = y₀ + ∫_{t₀}^{t} y′(s) ds = y₀ + ∫_{t₀}^{t} f(s, y(s)) ds.
However, the right-hand side of this expression is, up to constant, the integral operator
L(y)(t) = y₀ + ∫_{t₀}^{t} f(s, y(s)) ds.
When solving our first-order differential equations, we are thus looking for a fixed point of this operator L, i.e., we want to find a function y = y(t) satisfying L(y) = y.
On the other hand, if a Riemann-integrable function y(t) is a fixed point of the operator L (y), then it immediately follows from the antiderivative theorem that y(t) indeed satisfies the given differential equation, including the initial conditions.
We can quite easily estimate how much the values L(y) and L(z) of the operator differ for various arguments y(t) and z(t). Indeed, thanks to the partial derivatives of the function f being continuous, we know that f is locally Lipschitz. This means that we have the bound

|f(t, y) − f(t, z)| ≤ C·|y − z|,

with a constant C, if we restrict the values (t, y) to a neighborhood of the point (t₀, y₀) with compact closure. We choose an ε > 0 and restrict the value of t to some interval J = [t₀ − a₀, t₀ + b₀] so
⁴ E.g., Kamke.
However, by formula (||7.36||), we have

L⁻¹(e^{−πs}·s/(s² + 4)²)(t) = (1/4)·L⁻¹(e^{−πs}·L(t sin(2t)))(t) = (1/4)(t − π)·sin(2(t − π))·H_π(t).

Since Heaviside's function H_π is zero for t < π and equal to 1 for t > π, we get the solution in the form

y(t) = −(1/2) sin(2t) + (1/4) t sin(2t) for 0 ≤ t < π,
y(t) = ((π − 2)/4) sin(2t) for t ≥ π.
□
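The key inverse transform used above, L⁻¹(s/(s² + 4)²) = (1/4) t sin(2t), can be reproduced with sympy (the result may carry an explicit Heaviside factor):

from sympy import symbols, inverse_laplace_transform

s, t = symbols('s t', positive=True)
print(inverse_laplace_transform(s/(s**2 + 4)**2, s, t))
# t*sin(2*t)/4 (possibly multiplied by Heaviside(t))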
8.141. Find the general solution of the equation
/" - 5/ - 8/ + 48y = 0.
Solution. This is a third-order linear differential equation with constant coefficients, since it is of the form

y⁽ⁿ⁾ + a₁y⁽ⁿ⁻¹⁾ + a₂y⁽ⁿ⁻²⁾ + ⋯ + a_{n−1}y′ + aₙy = f(x)

for certain constants a₁, …, aₙ ∈ R. Moreover, we have f(x) = 0, i.e., the equation is homogeneous.

First of all, we will find the roots of the so-called characteristic polynomial

λⁿ + a₁λⁿ⁻¹ + a₂λⁿ⁻² + ⋯ + a_{n−1}λ + aₙ.

Each real root λ with multiplicity k corresponds to the k solutions

e^{λx}, x·e^{λx}, …, x^{k−1}·e^{λx},

and every pair of complex roots λ = α ± iβ with multiplicity k corresponds to the k pairs of solutions

e^{αx} cos(βx), x·e^{αx} cos(βx), …, x^{k−1}·e^{αx} cos(βx),
e^{αx} sin(βx), x·e^{αx} sin(βx), …, x^{k−1}·e^{αx} sin(βx).
Then, the general solution corresponds to all linear combinations of the above solutions.
Therefore, let us consider the polynomial
k3 - 5k2 - Sk + 48
with roots ki = k2 = 4, k3 = —3. Since we know the roots, we can deduce the general solution as well:
y = de41 + C2x e4x + C3e"3\ d, C2, C3el. □
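The roots of the characteristic polynomial can of course also be found numerically; a one-line numpy check:

import numpy as np

print(np.roots([1, -5, -8, 48]))  # approximately [4, 4, -3]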
8.142. Compute
y‴ + y″ + 9y′ + 9y = eˣ + 10 cos(3x).
Solution. First, we will solve the corresponding homogeneous equation. The characteristic polynomial is equal to

λ³ + λ² + 9λ + 9,

with roots λ₁ = −1, λ₂ = 3i, λ₃ = −3i. The general solution of the corresponding homogeneous equation is thus

y = C₁e^{−x} + C₂ cos(3x) + C₃ sin(3x), C₁, C₂, C₃ ∈ R.

The solution of the non-homogeneous equation is of the form

y = C₁e^{−x} + C₂ cos(3x) + C₃ sin(3x) + y_p, C₁, C₂, C₃ ∈ R,
for a particular solution yp of the non-homogeneous equation.
The right-hand side of the given equation is of a special form. In general, if the non-homogeneous part is given by a function
that J × [y₀ − ε, y₀ + ε] ⊂ U, and we consider only those functions y(t) and z(t) which, for t ∈ J, satisfy

max |y(t) − y₀| ≤ ε, max |z(t) − y₀| ≤ ε.

Now, we obtain the bound

|(L(y) − L(z))(t)| = |∫_{t₀}^{t} (f(s, y(s)) − f(s, z(s))) ds| ≤ ∫_{t₀}^{t} |f(s, y(s)) − f(s, z(s))| ds.
Theorem. Let f : U × Rᵏ → Rⁿ be a mapping with continuous first derivatives, where U ⊂ Rⁿ is open. Then the system of differential equations dependent upon a parameter λ ∈ Rᵏ, with initial condition at a point x ∈ U,

y′(t) = f(y(t), λ), y(0) = x,

has a unique solution y(t, x, λ), which is a mapping with continuous first derivatives with respect to each variable.
Proof. First, we can notice that we can consider a system dependent on parameters to be an ordinary autonomous system with no parameters if we consider even the parameters to be space variables and add the (vector) conditions λ′(t) = 0 and λ(0) = λ. Therefore, without loss of generality, we can prove the theorem for autonomous systems with no further parameters and concentrate on the dependency upon the initial conditions.
Just like in the case of the fundamental existence theorem, we will build upon Picard's approximations of the solution using the integral operator:

y₀(t, x) = x, y_{k+1}(t, x) = x + ∫₀^{t} f(y_k(s, x)) ds.
By merely refining the proof of theorem 8.49, we can verify the uniform convergence of the approximations y_k(t, x) to the solution y(t, x), including the variable x.
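The Picard approximations are also a practical computational scheme; a minimal numerical sketch for the autonomous equation y′ = y, y(0) = x (an example of ours), whose iterates are the Taylor polynomials of x·eᵗ:

import numpy as np

def picard(f, x, ts, iterations=10):
    """Iterate y_{k+1}(t) = x + int_0^t f(y_k(s)) ds on the grid ts."""
    y = np.full_like(ts, x)  # y_0(t, x) = x
    for _ in range(iterations):
        g = f(y)
        # cumulative trapezoidal rule for the integral from 0 to t
        y = x + np.concatenate(([0.0],
            np.cumsum(0.5*(g[1:] + g[:-1])*np.diff(ts))))
    return y

ts = np.linspace(0.0, 1.0, 1001)
print(picard(lambda y: y, 1.0, ts)[-1], np.e)  # both ≈ 2.71828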
Now, for the initial condition, let us fix a point x₀ and a small neighborhood V of it, which, if need be, will be reduced during the following bounds, and let us write C for the constant which, thanks to the Lipschitzness of the function f, gives the bound
|f(y) − f(z)| ≤ C·|y − z| for all y, z in V.

Determine the isolated, limit, interior, and boundary points of the sets N, Q, and X = {x ∈ R; 0 ≤ x < 1} in R.
Solution. The set N. For any n ∈ N, we have that

O₁(n) ∩ N = (n − 1, n + 1) ∩ N = {n}.
Hence, there is a neighborhood of n ∈ N in R which contains only one natural number (the number n), therefore every point n ∈ N is isolated. There are thus no interior points (an isolated point cannot be interior). A point a ∈ R is a limit point of A if and only if every neighborhood of a contains infinitely many points of A. However, the set
O₁(a) ∩ N = (a − 1, a + 1) ∩ N, where a ∈ R,

is finite, hence N has no limit points. By finiteness of this set, we have that
δ_b := inf_{n ∈ O₁(b) ∩ N} |b − n| > 0 for b ∈ R ∖ N.

Therefore, O_{δ_b}(b) ∩ N = ∅, so no b ∈ R ∖ N is a boundary point of N. We also know that every point which belongs to a given set but is not its interior point is necessarily its boundary point. The set of N's boundary points thus contains N, and so it equals N.
The set Q. The rational numbers are a dense subset of the real numbers. This means that for every real number, there is a sequence of rational numbers converging to it. (We can, for instance, imagine the decimal representation of a real number and the corresponding sequence whose i-th term will be the representation truncated to the first i decimal digits. Furthermore, we can suppose that the terms of this sequence are pairwise distinct, for example by deliberately changing the last digit, or by taking the representation with recurring nines rather than zeros, i.e. 0.999… for the integer 1, and so on.) The set of Q's limit points is thus the whole R, and every point x ∈ R ∖ Q is a boundary point. Especially, we get that any δ-neighborhood
(p/q − δ, p/q + δ), where p, q ∈ Z, q ≠ 0,

of a rational number p/q contains infinitely many rational numbers, hence there are no isolated points. The number √2/10ⁿ is rational for no n ∈ N: supposing the contrary (again, p, q ∈ Z, q ≠ 0),

√2/10ⁿ = p/q, i.e. √2 = 10ⁿ·p/q,

we arrive at an immediate contradiction, as we know that the number √2 is not rational. Every neighborhood of a rational number p/q thus contains infinitely many real numbers p/q + √2/10ⁿ (n ∈ N)
The multiplication A·(y₀, y₁, y′₀, y′₁)ᵀ gives the vector (a₃, a₂, a₁, a₀)ᵀ of coefficients of the polynomial f, i.e.

f(x) = (2y₀ − 2y₁ + y′₀ + y′₁)x³ + (−3y₀ + 3y₁ − 2y′₀ − y′₁)x² + y′₀x + y₀.
5.9. Spline interpolation. Similarly, we can prescribe any finite number of derivatives at the particular points, and a convenient choice for the upper bound on the degree of the wanted polynomial leads to a unique interpolation. We will not delay ourselves with details here. Unfortunately, these interpolations do not solve the problems mentioned already in connection with the simple interpolation of values: complexity of the computations and instability. However, the usage of derivatives allows us to improve our methods:
As we have seen in the pictures demonstrating the instability of the interpolation by a single polynomial of sufficiently large degree, small local changes of the values dramatically affected the overall changes of the behavior of the resulting polynomial. Thus we may try to use small polynomial pieces of low degrees which we, however, must be able to link to one another properly.
The simplest case is to link each pair of adjacent points with a polynomial of degree at most one. This is also the most frequent way of displaying data. From the view of derivatives, this means that they will be constant on each of the segments and then will change in a leap.
A bit more sophisticated method is to prescribe the value and the derivative at each point, i.e. we will have four values for two points, which uniquely determines Hermite's polynomial of degree three, see above. This polynomial can then be used for all the values of the input variable between the marginal points x₀ < x₁, i.e., on the interval [x₀, x₁]. Such a piecewise polynomial approximation has the property that the first derivatives will be compatible.
However, in practice, mere compatibility of the first derivatives is insufficient (for instance, with railway tracks), and furthermore, the values of the first derivatives are not always at our disposal. Thus we get the idea of making use of the values at the given points, and on the other hand to require equality of the first and second derivatives between the adjacent pieces of the cubic polynomials. These conditions yield the same number of equations and unknowns, and so the problem will be similarly solvable:
Cubic splines

Let x₀ < x₁ < ⋯ < xₙ be real values at which the required values y₀, …, yₙ are given. A cubic interpolation spline for this assignment is a function S : R → R which satisfies the following conditions:

• the restriction of S on the interval [x_{i−1}, x_i] is a polynomial S_i of degree at most three, i = 1, …, n,
• S_i(x_{i−1}) = y_{i−1} and S_i(x_i) = y_i for all i = 1, …, n,
• S′_i(x_i) = S′_{i+1}(x_i) for all i = 1, …, n − 1,
• S″_i(x_i) = S″_{i+1}(x_i) for all i = 1, …, n − 1.
The cubic spline¹ for n + 1 points consists of n cubic polynomials, i.e. we have 4n free parameters (the first condition from the
The name comes from the meaning of a ruler used to draw smooth curves between points.
which are not rational (Q, as a field, is closed under subtraction). Therefore, every point p/q ∈ Q is a boundary point as well, and there are no interior points of the set Q.
The set X = [0, 1). Let a ∈ [0, 1) be an arbitrary number. Apparently, the sequences (a + 1/n) (for n large enough that a + 1/n < 1) and (1 − 1/n) consist of points of X and converge to a and 1, respectively. So we have easily shown that the set of X's limit points contains the interval [0, 1]. There are no other limit points: for any b ∉ [0, 1], there is δ > 0 such that O_δ(b) ∩ [0, 1] = ∅ (for b < 0 it suffices to take δ = −b, and for b > 1 we can choose δ = b − 1). Since every point of the interval [0, 1) is a limit point, there are no isolated points. For a ∈ (0, 1), let δ_a be the lesser of the two positive numbers a and 1 − a. Considering
O_{δ_a}(a) = (a − δ_a, a + δ_a) ⊂ (0, 1), a ∈ (0, 1),
we see that every point of the interval (0, 1) is an interior point of X. For every 8 e (0, 1), we have that
O_δ(0) ∩ [0, 1) = (−δ, δ) ∩ [0, 1) = [0, δ),
O_δ(1) ∩ [0, 1) = (1 − δ, 1 + δ) ∩ [0, 1) = (1 − δ, 1),
so every δ-neighborhood of the point 0 contains some points of the interval [0, 1) and some points of the interval (−δ, 0), and every δ-neighborhood of 1 has a non-empty intersection with the intervals [0, 1) and [1, 1 + δ). Therefore, 0 and 1 are boundary points. Altogether, we have found that the set of X's interior points is the interval (0, 1) and the set of X's boundary points is the two-element set {0, 1}, as we know that no point can be both interior and boundary and that a boundary point of X must be an interior or limit point of X. □
5.29. Determine the suprema and infima of the following sets in R:

A = (−3, 0] ∪ (1, π) ∪ {6}; B = {(−1)ⁿ/n ; n ∈ N} ⊂ R; C = (−9, …). ○

5.30. Find sup A and inf A for

A = {(n + (−1)ⁿ)/n ; n ∈ N}. ○

5.31. The following sets are given:

N = {1, 2, …, n, …}, M = {1/n ; n ∈ N}, J = (0, 2] ∪ [3, 5] ∖ {4}.

Determine inf N, sup M, inf J and sup J in R. ○
definition). The other conditions then yield 2n + (n − 1) + (n − 1) more equalities, i.e. two parameters remain free. In practice, we often prescribe the values of the derivatives at the marginal points explicitly (the so-called complete spline), or assume they equal zero (this case is called a natural spline).
Unfortunately, the computation of the whole spline is not as easy as with the independent computations of Hermite's cubic polynomials, because the data mingle between adjacent intervals. However, with an appropriate ordering, one can obtain a matrix of the system such that all of its non-zero elements appear on three diagonals only. These matrices are nice enough to be solved in time proportional to the number of points, using a suitable numerical method. For comparison, one can interpolate the same data as in the case of the Lagrange polynomial, now using splines; a computational sketch follows.
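In practice, such splines are readily available in numerical libraries; a minimal Python sketch (with made-up data) using scipy, where bc_type='natural' corresponds to the natural spline above:

import numpy as np
from scipy.interpolate import CubicSpline

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # nodes x_0 < ... < x_n
ys = np.array([1.0, 3.0, 2.0, 2.5, 0.0])   # prescribed values y_0, ..., y_n
S = CubicSpline(xs, ys, bc_type='natural')  # S'' = 0 at the marginal points
print(S(xs))           # reproduces ys
print(S(1.5), S(2.5))  # smooth values between the nodes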
2. Real number and limit processes
It is important to have a sufficiently large stock of functions with which we can express all usual dependencies. However, at the same time, the choice of the functions must be carefully restricted so that we would be able to build some universal and efficient tools for the work with them.
Actually, the first problem we have to solve is how to define the values of the functions at all. After all, all we can get with a finite number of multiplications and additions is polynomial functions, and efficient manipulation can be done with rational numbers only. However, we cannot make do with rational numbers even when looking for roots of quadratic polynomials as, for instance, √2 is not a rational number.
Thus our first step will be a thorough introduction of the so-called limit process, i.e. we will define precisely what it means that some values approach a certain value.
We can also notice that an important property of polynomials is the "continuous" dependency of their values on the input variable. Intuitively said, if we change x a little bit, the value of f(x) also changes only a little. On the other hand, this behavior is not possessed by piecewise constant functions f : R → R near the sudden "jumps". For instance, the so-called Heaviside function

f(x) = 0 for all x < 0, 1/2 for x = 0, 1 for all x > 0

has this type of "discontinuity" for x = 0.
Let us formalize these intuitive statements.
5.32. Find a set M ⊂ R which does not have an infimum in R but has a supremum there. Similarly, find a set N ⊂ R which does not have a supremum in R but has an infimum there. ○

5.33. Find a subset X of the set R such that sup X < inf X. ○

5.34. Find sets A, B, C ⊂ R such that

A ∩ B = ∅, A ∩ C = ∅, B ∩ C = ∅, sup A = inf B = inf C = sup C. ○
5.35. Mark the following sets in the complex plane:

i) {z ∈ C : |z − 1| = |z + 1|},
ii) {z ∈ C : 1 ≤ |z − i| ≤ 2},
iii) {z ∈ C : Re(z²) = 1},
iv) {z ∈ C : Re(1/z) < 1/2}.

Solution.
• the imaginary axis,
• the annulus around i,
• the hyperbola a² − b² = 1 (writing z = a + ib),
• the exterior of the unit disc centered at 1. □
C. Limits

In the subsequent exercises, we will deal with calculating limits of sequences, that is, with what the sequences "look like at infinity". Then, if we were to determine the n-th term of a given sequence for a very large n, the limit of the sequence (supposing it exists) can approximate it very well. We devote much space to the computation of limits of sequences (and limits of functions) in this exercise column; that is why they begin earlier (and end later) than in the part concerning the theory.

Let us begin with limits of sequences. The needful definitions can be found on page 266.
5.10. Real numbers. So far, we have made do with algebraic properties of real numbers which claimed that R is a field. However, we have also used the relation of the standard (total) order of the real numbers, denoted "≤" (see the paragraph 1.38). The properties (axioms) of the real numbers, including the connections between the relations and other operations, are enumerated in the following table. The bars indicate how the axioms gradually guarantee that the real numbers form an abelian (commutative) group with respect to addition, that R ∖ {0} is an abelian group with respect to multiplication, that R is a field, and that the set R together with the operations +, · and the order relation is a so-called ordered field. Finally, the last axiom can be perceived as claiming that R is "sufficiently dense", i.e. there are no points missing between any points (like, for instance, √2 is missing in the rational numbers).
Axioms of the real numbers

(R1) (a + b) + c = a + (b + c), for all a, b, c ∈ R
(R2) a + b = b + a, for all a, b ∈ R
(R3) there is an element 0 ∈ R such that for all a ∈ R, a + 0 = a
(R4) for all a ∈ R, there is an additive inverse (−a) ∈ R such that a + (−a) = 0
(R5) (a · b) · c = a · (b · c), for all a, b, c ∈ R
(R6) a · b = b · a, for all a, b ∈ R
(R7) there is an element 1 ∈ R, 1 ≠ 0, such that for all a ∈ R, 1 · a = a
(R8) for all a ∈ R, a ≠ 0, there is a multiplicative inverse a⁻¹ ∈ R such that a · a⁻¹ = 1
(R9) a · (b + c) = a · b + a · c, for all a, b, c ∈ R
(R10) the relation ≤ is a total order, i.e. reflexive, antisymmetric, transitive, and total on R
(R11) for all a, b, c ∈ R, a ≤ b implies a + c ≤ b + c
(R12) for all a, b ∈ R, a > 0 and b > 0 implies a · b > 0
(R13) every non-empty set A ⊂ R which has an upper bound has a least upper bound.
The concept of a least upper bound (also called supremum) must be thoroughly introduced. It makes sense for any partially ordered set, i.e. a set with a (not necessarily total) ordering relation. We will also meet it later in algebraic contexts. Let us remind ourselves that, at the general level, an ordering relation is any binary relation on a set which is reflexive, antisymmetric, and transitive; see the paragraph 1.38.

Supremum and infimum
Definition. Let us consider a subset A c B in a partially ordered set B. An upper bound of the set A is any element b e B such that b > a holds for all a e A. Dually, we define the concept of a lower bound of the set A as an element b e B such that b < a for all a € A.
The least upper bound of the set A, if it exists, is called its supremum and denoted by sup A. Dually, the greatest lower bound, if it exists, is called an infimum; we write inf A.
The last axiom of our table of properties of the real numbers thus claims that for every non-empty set A of real numbers, it is true that if there is a number a which is greater than or equal to all
5.36. Calculate the following limits of sequences:

i) lim_{n→∞} (2n² + 3n + 1)/(n + 1),
ii) lim_{n→∞} (2n² + 3n + 1)/(3n² + n + 1),
iii) lim_{n→∞} (n + 1)/(2n² + 3n + 1),
iv) lim_{n→∞} (2^n − 2^{−n})/(2^n + 2^{−n}),
v) lim_{n→∞} √(4n² + n)/n,
vi) lim_{n→∞} (√(4n² + n) − 2n).

Solution.

i) lim_{n→∞} (2n² + 3n + 1)/(n + 1) = lim_{n→∞} (2n + 3 + 1/n)/(1 + 1/n) = ∞,
ii) lim_{n→∞} (2n² + 3n + 1)/(3n² + n + 1) = lim_{n→∞} (2 + 3/n + 1/n²)/(3 + 1/n + 1/n²) = 2/3,
iii) lim_{n→∞} (n + 1)/(2n² + 3n + 1) = lim_{n→∞} (1/n + 1/n²)/(2 + 3/n + 1/n²) = 0,
iv) lim_{n→∞} (2^n − 2^{−n})/(2^n + 2^{−n}) = lim_{n→∞} (1 − 2^{−2n})/(1 + 2^{−2n}) = 1,
v) By the squeeze theorem (5.21): for all n ∈ N,

√(4n²) ≤ √(4n² + n) ≤ √(4n² + n + 1/16), i.e. 2n ≤ √(4n² + n) ≤ 2n + 1/4.

Since lim_{n→∞} 2n/n = 2 and lim_{n→∞} (2n + 1/4)/n = 2, we get lim_{n→∞} √(4n² + n)/n = 2 as well.
vi)

lim_{n→∞} (√(4n² + n) − 2n) = lim_{n→∞} ((√(4n² + n) − 2n)(√(4n² + n) + 2n))/(√(4n² + n) + 2n) = lim_{n→∞} n/(√(4n² + n) + 2n) = lim_{n→∞} 1/(√(4 + 1/n) + 2) = 1/4.

□
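Such limits are easy to check numerically by evaluating the sequence at a large index; for instance, for v) and vi):

import numpy as np

n = 1e8
print(np.sqrt(4*n**2 + n) / n)      # ≈ 2
print(np.sqrt(4*n**2 + n) - 2*n)    # ≈ 0.25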
5.37. Let c > 0 be a real number. Show that

lim_{n→∞} ⁿ√c = 1.
Solution. First, let us consider c > 1. The function ⁿ√c is decreasing (in n), yet all its values are greater than 1, hence the sequence ⁿ√c has a limit, and this limit is equal to the infimum of the sequence's terms. Let us suppose, for a while, that this limit is greater than 1, that is, 1 + ε for some ε > 0. Then, by the definition of a limit, all the sequence's terms will eventually (from some index m on) be less than 1 + ε + ε²/4; especially ᵐ√c < 1 + ε + ε²/4. But then we have that

²ᵐ√c ≤ √(1 + ε + ε²/4) = 1 + ε/2 < 1 + ε,
numbers x ∈ A, then there is a least number with this property. For instance, the choice A = {x ∈ Q, x² < 2} gives us the supremum sup A = √2.
An immediate consequence of this axiom is also the existence of infima for any non-empty set of real numbers bounded from below. (It suffices to realize that changing the sign of all the numbers interchanges suprema and infima).
For the formal construction of our theory, we need to know whether the properties we demand from the real numbers are realizable, i.e. whether there is such a set R with the operations and ordering relation which satisfies the thirteen axioms. So far, we have constructed correctly only the rational numbers, which form an ordered field, i.e. satisfy the axioms (R1)–(R12), which can easily be verified.
Actually, the real numbers can not only be constructed, but the construction is, up to isomorphism, unique. However, for our need, we will do with an intuitive idea of the real line. We will focus on the existence and uniqueness later on.
5.11. The complex plane. Let us remind ourselves that the complex numbers are given as pairs of real numbers. We usually write them as z = re z + i·im z. Therefore, the plane C = R² is a good image of the complex numbers. With addition and multiplication, the complex numbers satisfy the axioms (R1)–(R9) and thus form a field. There is, however, no natural ordering defined on them which would satisfy the axioms (R10)–(R13). Nevertheless, we will work with them, as we have already seen that extending some scalars to the complex numbers is highly advantageous for calculations, and sometimes even necessary.
There is an important operation on the complex numbers, the so-called conjugation. It is the reflection symmetry with respect to the line of real numbers, i. e. changing the sign of the imaginary part. We denote it by a bar over the number z e C:
z̄ = re z − i·im z.
Since for z = x + iy,
z·z̄ = (x + iy)(x − iy) = x² + y²,
this value expresses the squared distance of the complex number from the origin (zero). The square root of this non-negative real number is called the absolute value of the complex number z; we write

(5.3) |z| = √(z·z̄)

(a non-negative real number). The absolute value is also defined on any ordered field of scalars K; we just define the absolute value |a| as follows:

|a| = a if a ≥ 0, and |a| = −a if a < 0.
Of course, it is true that for any numbers a, b ∈ K,

(5.4) |a + b| ≤ |a| + |b|.
This property is called the triangle inequality. It also holds for the absolute value of the complex numbers, which was defined above.
Especially for the field of rational numbers and the field of real numbers, which are subfields of the complex numbers, both definitions of the absolute value coincide.
which contradicts our assumption that 1 + ε is the infimum of the considered sequence.

The theorem is trivial for c = 1, and for a number c ∈ (0, 1), it follows from the above if we invoke the theorem for the number 1/c. □
5.38. Determine

lim_{n→∞} ⁿ√n.

Solution. Apparently, we have ⁿ√n ≥ 1, n ∈ N. So we can set ⁿ√n = 1 + aₙ for certain numbers aₙ ≥ 0, n ∈ N. By the binomial theorem, we get that

n = (1 + aₙ)ⁿ = 1 + (n choose 1)·aₙ + (n choose 2)·aₙ² + ⋯ + aₙⁿ, n ≥ 2 (n ∈ N).

Hence we have the bound (all the numbers aₙ are non-negative)

n ≥ (n choose 2)·aₙ² = (n(n − 1)/2)·aₙ², n ≥ 2 (n ∈ N),

which leads to

0 ≤ aₙ ≤ √(2/(n − 1)), n ≥ 2 (n ∈ N).

By the squeeze theorem,

0 = lim_{n→∞} 0 ≤ lim_{n→∞} aₙ ≤ lim_{n→∞} √(2/(n − 1)) = 0.

Thus we have obtained the result

lim_{n→∞} ⁿ√n = lim_{n→∞} (1 + aₙ) = 1 + 0 = 1.

We can notice that by further application of the squeeze theorem, we get

1 = lim_{n→∞} 1 ≤ lim_{n→∞} ⁿ√c ≤ lim_{n→∞} ⁿ√n = 1

for every real number c > 1. □
5.39. Calculate the limit

lim_{n→∞} (√2 · ⁴√2 · ⁸√2 ⋯ ²ⁿ√2).

Solution. To determine the limit, it is sufficient to express the terms in the form

2^{1/2} · 2^{1/4} · 2^{1/8} ⋯ 2^{1/2ⁿ} = 2^{1/2 + 1/4 + 1/8 + ⋯ + 1/2ⁿ}.

Thus we get

lim_{n→∞} (√2 · ⁴√2 ⋯ ²ⁿ√2) = lim_{n→∞} 2^{1/2 + 1/4 + ⋯ + 1/2ⁿ} = 2^{Σ_{n=1}^{∞} (1/2)ⁿ}.
5.12. Convergence of a sequence. In the following paragraphs, we will work with one of the number sets K of rational, real, or complex numbers. The absolute value thus must be understood in the corresponding context, and we should also bear in mind that the triangle inequality holds in all these cases.

We would like to formalize the notion of a sequence of numbers approaching a limit. Therefore, the key object of our interest will be sequences of numbers aᵢ, where the index i usually goes throughout the natural numbers. We will denote the sequences either loosely as a₀, a₁, …, or as infinite vectors (a₀, a₁, …), or (similarly to the matrix notation) as (aᵢ)ᵢ₌₀^∞.
Cauchy sequences

Let us consider a sequence (a₀, a₁, …) of elements of K such that for any fixed positive number ε > 0, it holds for all but finitely many pairs of terms aᵢ, aⱼ of the sequence that

|aᵢ − aⱼ| < ε.

In other words, for any fixed ε > 0, there is an index N such that the above inequality holds for all i, j ≥ N; i.e. the elements of the sequence are eventually arbitrarily close to each other. Such a sequence is called a Cauchy sequence.
Intuitively, we feel that either all but finitely many of the sequence's terms are equal (then |aᵢ − aⱼ| = 0 will hold from some index N on), or they "approach" some value. This is easily imaginable in the complex plane: choosing an arbitrarily small disc (with radius equal to ε), then, supposing we have a Cauchy sequence, it must be possible to place it in the complex plane in such a way that it covers all but finitely many of the elements of the infinite sequence aᵢ. We can imagine that the disc gradually shrinks to a single value a; see the picture.
[Figure: a sequence of complex numbers covered by a shrinking disc.]

If such a value a ∈ K exists for a Cauchy sequence, we would expect the sequence to have the property of convergence:
Convergent sequences

We say that a sequence (aᵢ)ᵢ₌₀^∞ converges to a value a iff for any positive real number ε,

|aᵢ − a| < ε

holds for all but finitely many indices i (the set of those i for which the inequality does not hold may depend on ε). The number a is called the limit of the sequence (aᵢ)ᵢ₌₀^∞.

If a sequence aᵢ ∈ K, i = 0, 1, …, converges to a ∈ K, then for any fixed positive ε, we know that |aᵢ − a| < ε for all i greater than a certain N ∈ N. However, by the triangle inequality, we then get that for all pairs of indices i, j ≥ N, it is true that

|aᵢ − aⱼ| = |aᵢ − a + a − aⱼ| ≤ |aᵢ − a| + |a − aⱼ| < 2ε.
Thus we have proved:
Lemma. Every converging sequence is a Cauchy sequence.
By the well-known formula for the sum of a geometric series,

Σ_{n=1}^{∞} (1/2)ⁿ = 1,

whence it follows that

lim_{n→∞} (√2 · ⁴√2 · ⁸√2 ⋯ ²ⁿ√2) = 2¹ = 2. □
5.40. Determine

lim_{n→∞} (1/n² + 2/n² + ⋯ + (n − 2)/n² + (n − 1)/n²). ○

5.41. Calculate

lim_{n→∞} (√(n³ − 11n² + 2) + ⁷√(n⁷ − 2n⁵ − n³ − n + sin²n)) / (2 − ⁵√(5n⁴ + 2n³ + 5)). ○

5.42. Determine the limit

lim_{n→∞} (n! + (n − 2)! − (n − 4)!) / (n⁵⁰ + n! − (n − 1)!). ○

5.43. Find two sequences (let us denote their terms by xₙ and yₙ (n ∈ N), respectively) having infinite limits and such that

lim_{n→∞} (xₙ + yₙ) = 1, lim_{n→∞} (xₙ yₙ²) = +∞. ○

5.44. Determine the limit points of the sequence (aₙ) given by

aₙ = ((−1)ⁿ 2n)/√(4n² + 5n + 3), n ∈ N. ○

5.45. Calculate

lim sup aₙ and lim inf aₙ if aₙ = ((n² + 4n − 5)/(n² + 9)) sin(nπ/4), n ∈ N. ○

5.46. Determine

lim inf_{n→∞} ((−1)ⁿ (1 + 1/n) + sin(nπ/4)). ○
However, in the field of rational numbers, it can easily happen that the corresponding value a does not exist even for a Cauchy sequence. For instance, the number √2 can be approached by rational numbers aᵢ with arbitrary accuracy, thereby obtaining a Cauchy sequence converging to √2, whose limit is, however, not rational.

Ordered fields of scalars in which every Cauchy sequence converges are called complete. The following theorem proposes that the axiom (R13) guarantees that the real numbers are such a field:

Theorem. Every Cauchy sequence of real numbers aᵢ converges to a real value a ∈ R.
Proof. The terms of any Cauchy sequence form a bounded set, since any choice of ε bounds all but finitely many of them. Let us define B as the set of those real numbers x for which x < aⱼ holds for all but finitely many terms aⱼ of the sequence.

Apparently, B has an upper bound, and thus has a supremum as well, by (R13). Let us define a = sup B. Now, having fixed some ε > 0, we choose N so that |aᵢ − aⱼ| < ε for all i, j ≥ N. Especially, aⱼ > a_N − ε and aⱼ < a_N + ε for all indices j ≥ N, and so a_N − ε belongs to B, while a_N + ε does not. Altogether, we get that |a − a_N| ≤ ε, and thus

|a − aⱼ| ≤ |a − a_N| + |a_N − aⱼ| < 2ε

for all j ≥ N. However, this means that a is the limit of the considered sequence. □
Corollary. Every Cauchy sequence of complex numbers zᵢ converges to a complex number z.

Proof. Let us write zᵢ = aᵢ + i·bᵢ. Since |aᵢ − aⱼ|² ≤ |zᵢ − zⱼ|², and similarly for the values bᵢ, both sequences of real numbers aᵢ and bᵢ are Cauchy sequences. They converge to a and b, respectively, and we can easily verify that z = a + i·b is the limit of the sequence zᵢ. □
5.13. Remark. The previous discussion gives us a method for defining the real numbers. We proceed similarly to building the integers from the natural numbers (adding all additive inverses) and building the rational numbers from the integers (adding all multiplicative inverses of non-zero numbers). This time, we "complete" the rational numbers by all limits of Cauchy sequences.
It suggests itself to introduce a suitable equivalence relation on the set of all Cauchy sequences of rational numbers so that Cauchy sequences (aᵢ) and (bᵢ) are equivalent iff the distances |aᵢ − bᵢ| converge to zero (this is the same as the condition that merging these sequences into a single one, say with the terms of the first sequence at the odd positions and the terms of the second at the even positions, yields a Cauchy sequence as well). We will not verify in detail that this relation is an equivalence, neither will we define the operations and the ordering relation, nor will we prove that all of the axioms will indeed hold. Nevertheless, it is not difficult. Nor is proving the fact that the axioms (R1)–(R13) define the real numbers uniquely up to isomorphism (a bijective mapping preserving the algebraic operations as well as the ordering). We will return to these notes later.
5.47. Now let us proceed with limits of functions. The definition can be found on page 272. Determine

(a) lim_{x→π/3} sin x;
(b) lim_{x→2} (x² + x − 6)/(x² − 3x + 2);
(c) lim_{x→+∞} (arccos(1/(x + 1)))³;
(d) lim_{x→−∞} arctg(1/x), lim_{x→−∞} arctg x, lim_{x→−∞} arctg(sin x).

Solution. Exercise (a). Let us remind ourselves that a function f is, by definition, continuous at a given point x iff the limit of f at x is equal to the function value f(x). However, we know that the function y = sin x is continuous at every real number. Thus we get that

lim_{x→π/3} sin x = sin(π/3) = √3/2.

Exercise (b). The immediate substitution x = 2 leads to both a zero numerator and a zero denominator. Despite that, the problem can be solved very easily. The reduction

lim_{x→2} (x² + x − 6)/(x² − 3x + 2) = lim_{x→2} ((x − 2)(x + 3))/((x − 2)(x − 1)) = lim_{x→2} (x + 3)/(x − 1) = (2 + 3)/(2 − 1) = 5

leads to the correct result (thanks to the continuity of the obtained function at the point x₀ = 2). Let us realize that the limit of a function can be calculated from the function values in an arbitrarily small deleted neighborhood of a given point x₀ and that the limit does not depend on the function value at the point. We can thus make use of multiplying or reducing by factors which do not change the function values in an arbitrarily selected deleted neighborhood of the point x₀.
Exercise (c). By moving the limit inwards twice, the original limit transforms to

(arccos(lim_{x→+∞} 1/(x + 1)))³.

It can easily be shown that

lim_{x→+∞} 1/(x + 1) = 0.

As the function y = arccos x is continuous at the point 0 and takes the value π/2 there, and the function y = x³ is continuous at π/2, we get that

lim_{x→+∞} (arccos(1/(x + 1)))³ = (arccos(lim_{x→+∞} 1/(x + 1)))³ = (π/2)³.
5.14. Closed sets. For our further work with the real or complex numbers, we will need to thoroughly understand the notions of closeness, boundedness, convergence, and so on. For any subset A of points in K, we will be interested not only in the points belonging to A, but also in the ones which can be approached by limits of sequences.

Limit points of a set
Let us consider a set $A$ of points belonging to $\mathbb K$. A point $x\in\mathbb K$ is called a limit point of the set $A$ iff there is a sequence $a_0, a_1, \dots$ of elements of $A$ such that all its terms differ from $x$, yet its limit is $x$.
The limit points of a subset A of rational, real, or complex numbers are those numbers x which can be approached by such sequences of numbers lying in A which do not contain the point x itself. Let us notice that a limit point of a set may or may not belong to it.
For every non-empty set $A\subset\mathbb K$ and a fixed point $x\in\mathbb K$, the set of all distances $|x-a|$, $a\in A$, is a set of real numbers bounded from below, and so it has an infimum $d(x,A)$, which is called the distance of the point $x$ from the set $A$. Let us notice that $d(x,A)=0$ if and only if $x\in A$ or $x$ is a limit point of $A$. (We suggest that the reader prove this in detail from the definitions.)
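As a quick worked instance of these notions (our own example), take
$$A = \left\{\tfrac1n;\ n\in\mathbb N\right\}\subset\mathbb R,\qquad d(0,A) = \inf_{n\in\mathbb N}\left|0-\tfrac1n\right| = 0,\qquad 0\notin A,$$
so $0$ is a limit point of $A$, approached by the sequence $1,\tfrac12,\tfrac13,\dots$, and the closure introduced below is $\bar A = A\cup\{0\}$.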
Closed sets
The closure $\bar A$ of a set $A\subset\mathbb K$ is the set of those points which have zero distance from $A$ (note that the distance from the empty set of points is undefined, therefore $\bar\emptyset = \emptyset$).
A closed subset in $\mathbb K$ is a set which coincides with its closure. Thus closed sets are exactly those which contain all of their limit points as well. A typical example of a closed set is a closed interval
$$[a,b] = \{x\in\mathbb R;\ a\le x\le b\}$$
of real numbers, where $a$ and $b$ are fixed real numbers.
If either of the boundary values of the interval is missing, we write $a=-\infty$ (minus infinity) and similarly $b=+\infty$. The corresponding closed intervals are denoted by $(-\infty,b]$, $[a,\infty)$, and $(-\infty,\infty)$.
The closed sets are exactly those which contain all they can "converge to". A closed set may be formed by a sequence of real numbers without a limit point or a sequence with a finite number of limit points together with these points. The unit disc (including its boundary circle) in the complex plane is another example of a closed set.
We can easily verify that any intersection and any finite union of closed sets is again a closed set. Indeed, if all of the points of some sequence belong to the considered intersection of closed sets, then they belong to each of the sets, and so do all the limit points. However, if we wanted to say the same about an arbitrary union, we would get in trouble: singleton sets are closed, but a sequence of points created from them may not be. On the other hand, if we restrict our attention to finite unions and consider a limit point of some sequence lying in this union, then the limit point must also be the limit point of any subsequence, especially one lying in only one of the united sets. As this set is assumed to be closed, the limit point lies in it, and thus it lies in the whole union.
268
CHAPTER 5. ESTABLISHING THE ZOO
Exercise (d). The function $y = \operatorname{arctg} x$ has properties which are "useful when calculating limits": it is continuous and injective (increasing) on the whole domain. These properties always (with no further conditions or limitations) allow us to move the examined limit into the argument of such a function. Therefore, let us consider
$$\operatorname{arctg}\Big(\lim_{x\to-\infty}\frac1x\Big),\qquad \operatorname{arctg}\Big(\lim_{x\to-\infty}x^4\Big),\qquad \operatorname{arctg}\Big(\lim_{x\to-\infty}\sin x\Big).$$
Apparently,
$$\lim_{x\to-\infty}\frac1x = 0,\qquad \lim_{x\to-\infty}x^4 = +\infty,$$
and the limit $\lim_{x\to-\infty}\sin x$ does not exist, which implies
$$\lim_{x\to-\infty}\operatorname{arctg}\frac1x = \operatorname{arctg}0 = 0,\qquad \lim_{x\to-\infty}\operatorname{arctg}\left(x^4\right) = \lim_{y\to+\infty}\operatorname{arctg}y = \frac{\pi}{2},$$
and the last limit does not exist, either.
□
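A brief numerical sanity check of exercise (d) (a small sketch of ours in Python; math.atan plays the role of arctg):

```python
import math

for x in (-1e3, -1e6, -1e9):
    # arctg(1/x) tends to arctg(0) = 0 and arctg(x^4) tends to pi/2
    print(x, math.atan(1 / x), math.atan(x ** 4))
# arctg(sin(x)) keeps oscillating as x -> -oo, so its limit cannot exist
```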
5.48. Determine the limit
$$\lim_{x\to0}\frac{1-\cos x}{x^2\sin(x^2)}.$$
Solution.
$$\lim_{x\to0}\frac{1-\cos x}{x^2\sin(x^2)} = \lim_{x\to0}\frac{2\sin^2\frac x2}{x^2\sin(x^2)} = \lim_{x\to0}\frac{\frac12\sin^2\frac x2}{\left(\frac x2\right)^2\sin(x^2)} = \frac12\left(\lim_{x\to0}\frac{\sin\frac x2}{\frac x2}\right)^{2}\cdot\lim_{x\to0}\frac{1}{\sin(x^2)} = \frac12\cdot\infty = \infty.$$
The previous calculation must be considered "from the back". Since the limits on the right-hand side exist (no matter whether finite or infinite) and the expression $\frac12\cdot\infty$ is meaningful (see the note after theorem 5.22), the original limit exists as well. If we split the original limit into the product
$$\lim_{x\to0}(1-\cos x)\cdot\lim_{x\to0}\frac{1}{x^2\sin(x^2)},$$
we would get the $0\cdot\infty$ type, which is an indeterminate form, but this tells us nothing about the existence of the original limit. □
5.49. Determine the following limits:
i) $\lim_{x\to2}\dfrac{x-2}{\sqrt{x^2-4}}$;  ii) $\lim_{x\to0}\dfrac{\sin(\sin x)}{\sin x}$;  iii) $\lim_{x\to0}\dfrac{\sin^2 x}{x}$;  iv) $\lim_{x\to0}\mathrm e^{1/x}$.
Solution.
i)
$$\lim_{x\to2}\frac{x-2}{\sqrt{x^2-4}} = \lim_{x\to2}\frac{x-2}{\sqrt{(x-2)(x+2)}} = \lim_{x\to2}\frac{\sqrt{x-2}}{\sqrt{x+2}} = \frac{0}{2} = 0;$$
ii)
$$\lim_{x\to0}\frac{\sin(\sin x)}{\sin x} \overset{(5.27)}{=} \lim_{y\to0}\frac{\sin y}{y} = 1,$$
5.15. Open sets. There is another useful type of subsets of the real numbers: the open intervals
$$(a,b) = \{x\in\mathbb R;\ a<x<b\},$$
where, again, $a$ and $b$ are fixed real numbers or the infinite values $\pm\infty$. Every open interval is an open set in the following sense:
Open sets and neighborhoods of points
An open set in $\mathbb K$ is a set whose complement is a closed set. A neighborhood of a point $a\in\mathbb K$ is any open set $O$ which contains $a$. If the neighborhood is defined as
$$O_\delta(a) = \{x\in\mathbb K;\ |x-a|<\delta\}$$
for some positive number $\delta$, then we call it the $\delta$-neighborhood of the point $a$.
Let us notice that for any set $A$, $a\in\mathbb K$ is a limit point of $A$ if and only if every neighborhood of $a$ contains at least one more point $b\in A$, $b\ne a$.
Lemma. A set $A\subset\mathbb K$ of numbers is open if and only if with every point $a\in A$, an entire neighborhood of $a$ belongs to $A$.
Proof. Let $A$ be an open set and $a\in A$. If there were no neighborhood of the point $a$ inside $A$, there would be a sequence $a_n\notin A$ with $|a-a_n|<1/n$. But then the point $a\in A$ is a limit point of the set $\mathbb K\setminus A$, which is impossible since the complement of $A$ is closed.
Now let us suppose that every $a\in A$ has an entire neighborhood of its own lying in $A$. This naturally prevents a limit point $b$ of the set $\mathbb K\setminus A$ from lying in $A$. Thus the set $\mathbb K\setminus A$ is closed, and so $A$ is open. □
From this lemma, it immediately follows that any union of open sets results in an open set, and further that any finite intersection of open sets is also an open set.
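The finiteness assumption for intersections cannot be dropped; a standard example (ours) in $\mathbb R$ is
$$\bigcap_{n=1}^{\infty}\left(-\tfrac1n,\tfrac1n\right) = \{0\},$$
an infinite intersection of open intervals which is a singleton, hence closed and not open. Dually, the infinite union of the closed singletons $\{1/n\}$, $n\in\mathbb N$, is not closed, as it misses its limit point $0$.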
In the case of the real numbers, the $\delta$-neighborhood of a point $a$ is the open interval of length $2\delta$, centered at $a$. In the complex plane, it is the disc of radius $\delta$, also centered at $a$.
5.16. Bounded and compact number sets. The closed and open sets are the basic concepts of topology. Without going into deeper connections, we have just made ourselves familiar with the topology of the real line and the topology of the complex plane. The following concepts will be extremely useful:
Bounded and compact sets
A set $A$ of rational, real, or complex numbers is called bounded iff there is a positive real number $r$ such that $|z| < r$ for all numbers $z\in A$. Otherwise, the set is called unbounded.
A set which is both bounded and closed is called compact.
Closed bounded intervals of real numbers are a typical example of compact sets.
Let us add further topological concepts that will allow us to express ourselves efficiently:
An interior point of a set A of real or complex numbers is such a point that one of its neighborhoods is contained in A.
A boundary point of a set $A$ is a point whose every neighborhood has a non-empty intersection with both $A$ and its complement $\mathbb K\setminus A$. A boundary point of the set $A$ may or may not belong to it.
where we made use of the fact that $\lim_{x\to0}\sin x = 0$.
iii)
$$\lim_{x\to0}\frac{\sin^2 x}{x} = \lim_{x\to0}\sin x\cdot\lim_{x\to0}\frac{\sin x}{x} = 0\cdot1 = 0;$$
again, the original limit exists because both limits on the right-hand side exist and their product is well-defined.
iv) One must be cautious when calculating this limit. Both one-sided limits exist, but they are different, which implies that the examined limit does not exist:
$$\lim_{x\to0+}\mathrm e^{1/x} = +\infty,\qquad \lim_{x\to0-}\mathrm e^{1/x} = 0.$$
□
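The one-sided behaviour in iv) is easy to observe numerically (again a small sketch of ours):

```python
import math

for x in (0.5, 0.1, 0.01):
    # approaching 0 from the right and from the left, respectively
    print(x, math.exp(1 / x), math.exp(1 / -x))
# e^(1/x) blows up as x -> 0+ and tends to 0 as x -> 0-
```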
5.50. Calculate
(a) $\lim_{x\to2}\dfrac{x+2}{(x-2)^6}$;  (b) $\lim_{x\to2}\dfrac{x+2}{(x-2)^5}$;  (c) $\lim_{x\to+\infty}\dots$;  (d) $\dots$
Solution.
In this exercise, we will be concerned with so-called indeterminate forms. We recommend perceiving indeterminate forms as a helping concept which is only to facilitate the first approach to limit calculations, because an obtained indeterminate form only means that one "has found out nothing". We know that the limit of a sum is the sum of the limits, the limit of a product is the product of the limits, and the limit of a quotient is the quotient of the limits, supposing the particular limits exist and do not lead to one of the following expressions: $\infty-\infty$, $0\cdot\infty$, $0/0$, $\infty/\infty$, which are called indeterminate forms. For completeness, let us add that these rules can be combined and that an expression containing an indeterminate form is itself considered an indeterminate form. For instance, the forms
$$-\infty+\infty,\qquad \frac00\cdot\infty,\qquad 0\cdot(\infty-\infty),\qquad \frac{3+\infty}{(-\infty)^3}$$
are all indeterminate, but the forms
$$0-\infty,\qquad \frac{3}{3+\infty},\qquad \frac{0}{-\infty}$$
can be called "determinate" (one can immediately determine the limit: they correspond to the values $-\infty$, $0$, $0$, respectively).
In exercise (a), the quotient of the numerator and the denominator gives us $4/0$. Expressions containing division by zero are inappropriate (later, we will learn to avoid them). Yet it leads to the result; it is not an indeterminate form. We may notice that the denominator
An open cover of a set $A$ is a system of open sets $U_i$, $i\in I$, such that its union contains the whole of $A$.
An isolated point of a set $A$ is a point $a\in A$ such that there is a neighborhood $N$ of $a$ satisfying $N\cap A = \{a\}$.
5.17. Theorem. All subsets $A$ of the real numbers satisfy:
(1) a non-empty set $A$ is open iff it is a union of countably (or finitely) many open intervals,
(2) every point $a\in A$ is either interior or boundary,
(3) every boundary point of $A$ is either an isolated or a limit point of $A$,
(4) $A$ is compact iff every infinite sequence contained in it has a subsequence converging to a point in $A$,
(5) $A$ is compact iff each of its open covers contains a finite subcover.
Proof. (1) Apparently, every open set is a union of neighborhoods of its points, i.e. of open intervals. So the question that remains is whether it suffices to take countably many of them. Thus we may try to select intervals which are as "large" as possible. We will consider points $a,b\in A$ to be related iff the whole open interval $(\min\{a,b\},\max\{a,b\})$ is contained in $A$. Clearly, this relation is an equivalence (the open interval $(a,a)$ is the empty set, which is contained in any set; symmetry and transitivity are apparent). The classes of this equivalence relation are intervals which are pairwise disjoint. Each of these intervals surely contains a rational number, and the obtained rational numbers are pairwise distinct. However, there are only countably many rational numbers, so the statement is proved.
(2) It follows immediately from the definitions that no point can be both interior and boundary. Let $a\in A$ be a point that is not interior. Then there is a sequence of points $a_i\notin A$ with $a$ as its limit point. At the same time, $a$ belongs to each of its neighborhoods. Thus $a$ is boundary.
(3) Suppose that $a\in A$ is boundary but not isolated. Then, similarly to the reasoning from the previous paragraph, there are points $a_i$, this time inside $A$, whose limit point is $a$.
(4) Suppose that $A$ is a compact set, i.e. both closed and bounded. Let us consider an infinite sequence of points $a_i\in A$. This set surely has both a supremum $b$ and an infimum $a$ (we could have taken any upper and lower bounds of the set $A$ as well). Now let us cut the interval $[a,b]$ into halves: $[a,\frac12(a+b)]$ and $[\frac12(a+b),b]$. At least one of them contains infinitely many of the terms $a_i$. We select this half and one of the terms contained in it;
approaches zero from the right (for $x\ne2$ we have that $(x-2)^6>0$). We write this as $4/{+}0$. Thus the numerator and the denominator are both positive in some deleted neighborhood of the point $x_0=2$, and one can say that the denominator, at the limit point, is "infinitely times less" than the numerator, that is
$$\lim_{x\to2}\frac{x+2}{(x-2)^6} = +\infty,$$
which corresponds to setting $4/{+}0 = +\infty$ (similarly, we can set $4/{-}0 = -\infty$).
When calculating the limit of (b), one can proceed analogously. Since the one-sided limits
$$\lim_{x\to2+}\frac{x+2}{(x-2)^5} = +\infty \ne -\infty = \lim_{x\to2-}\frac{x+2}{(x-2)^5}$$
differ, the examined limit does not exist. We can write $4/{\pm}0$ (or, more generally, $a/{\pm}0$, $a\ne0$, $a\in\mathbb R^*$), which is a "determinate form". When thoroughly distinguishing the symbols $+0$ and $-0$ from $\pm0$, $a/{\pm}0$ for $a\ne0$ always means that the limit in question does not exist.
Exercises (c), (d). If $f(x)>0$ for all considered $x\in\mathbb R$, then
$$f(x)^{g(x)} = \mathrm e^{\ln\left(f(x)^{g(x)}\right)} = \mathrm e^{g(x)\cdot\ln f(x)}.$$
Making use of the fact that the exponential function is continuous and injective on the whole of its domain $\mathbb R$, we can replace the limit
$$\lim_{x\to x_0} f(x)^{g(x)}\qquad\text{with}\qquad \mathrm e^{\lim_{x\to x_0}\left(g(x)\cdot\ln f(x)\right)}.$$
Let us remind that either of these limits exists if and only if the other one exists. Further,
$$\lim_{x\to x_0}\left(g(x)\cdot\ln f(x)\right) = a\in\mathbb R \ \Longrightarrow\ \lim_{x\to x_0} f(x)^{g(x)} = \mathrm e^{a},$$
$$\lim_{x\to x_0}\left(g(x)\cdot\ln f(x)\right) = +\infty \ \Longrightarrow\ \lim_{x\to x_0} f(x)^{g(x)} = +\infty,$$
$$\lim_{x\to x_0}\left(g(x)\cdot\ln f(x)\right) = -\infty \ \Longrightarrow\ \lim_{x\to x_0} f(x)^{g(x)} = 0.$$
Thus we can write
$$\lim_{x\to x_0} f(x)^{g(x)} = \mathrm e^{\lim_{x\to x_0} g(x)\,\cdot\,\lim_{x\to x_0}\ln f(x)}$$
if both limits on the right-hand side exist and do not lead to the indeterminate form $0\cdot\infty$. It is not difficult to realize that this indeterminate form can only be obtained in three cases, corresponding to the remaining indeterminate forms $0^0$, $\infty^0$, $1^\infty$, when we have, respectively, that
$$\lim_{x\to x_0} f(x) = 0\quad\text{and}\quad \lim_{x\to x_0} g(x) = 0,$$
$$\lim_{x\to x_0} f(x) = +\infty\quad\text{and}\quad \lim_{x\to x_0} g(x) = 0,$$
$$\lim_{x\to x_0} f(x) = 1\quad\text{and}\quad \lim_{x\to x_0} g(x) = \pm\infty.$$
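A standard illustration of the $0^0$ case (an example of ours, not taken from the exercises): for $f(x)=g(x)=x$ and $x_0=0$ approached from the right,
$$\lim_{x\to0+} x^x = \mathrm e^{\lim_{x\to0+} x\ln x} = \mathrm e^{0} = 1,$$
even though the form $0^0$ itself carries no information.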
then we cut the selected interval into halves. Again, we select a half which contains infinitely many of the sequence's terms and select one of those points. By this procedure, we obtain a Cauchy sequence (you can prove this by yourselves; all you need is careful manipulation with the bounds, similarly as above). However, we know that Cauchy sequences have limit points or are constant up to finitely many exceptions. Thus there is a subsequence with the wanted limit. From the fact that $A$ is closed, it follows that the obtained point lies in $A$.
Now the other direction: if every infinite subset of $A$ has a limit point in $A$, then all limit points are in $A$, and so $A$ is closed. If $A$ were not bounded, we would be able to find an increasing or decreasing sequence whose adjacent terms differ by at least $1$, for instance. However, such a sequence of points in $A$ cannot have a limit point at all.
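The halving argument in the proof of part (4) can be mirrored in code. A minimal sketch of ours; since a program cannot test "infinitely many terms", the more populated half of a long finite sample stands in for that choice:

```python
import math

def limit_point(sample, lo, hi, steps=60):
    """Bisection from the proof of 5.17(4): repeatedly keep a half
    interval containing 'infinitely many' terms of the sequence."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = sum(1 for a in sample if lo <= a <= mid)
        right = sum(1 for a in sample if mid < a <= hi)
        if left >= right:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

sample = [math.sin(n) for n in range(10_000)]  # a bounded sequence in [-1, 1]
print(limit_point(sample, -1.0, 1.0))  # approximates one of its limit points
```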
(5) First, let us focus on the easier implication, i.e. let us suppose that every open cover contains a finite one, and prove that $A$ is both closed and bounded.
We say that a set $A\subset\mathbb R$ has zero measure iff for every $\varepsilon>0$ we can find a covering of the set $A$ by a countable system of open intervals $J_i$, $i=1,2,\dots$, such that
$$\sum_{i=1}^{\infty} m(J_i) < \varepsilon,$$
where $m(J_i)$ denotes the length of the interval $J_i$.
In the following, by the statement "a function $f$ has the given property on the set $B$ almost everywhere" we'll always mean that $f$ has this property at all points except for a subset $A\subset B$ of zero measure. For example, the characteristic function of the rational numbers is zero almost everywhere, a piecewise continuous function is continuous almost everywhere, etc.
We'd now like to modify the definition of the Riemann integral so that, when choosing the partitions and the corresponding Riemann sums, we could eliminate the ominous effect of the values of the integrated function on a set of zero measure known in advance. It also seems reasonable to guarantee that the segments of the partitions with representants are controllably small near the points of such a set.
A positive real function $\delta$ on a finite interval $[a,b]$ is called a calibre. We call a partition $\Xi$ of the interval $[a,b]$ with representants $\xi_i$ $\delta$-calibrated, if we have
$$\xi_i - \delta(\xi_i) < x_{i-1} \le \xi_i \le x_i < \xi_i + \delta(\xi_i)$$
for all $i$.
For the further procedure, it's essential to verify that for every calibre $\delta$, a $\delta$-calibrated partition with representants can be found. This statement is called Cousin's lemma and can be proven, for example, in the usual way based upon the properties of suprema. For a given calibre $\delta$ on $[a,b]$, we'll denote by $M$ the set of all points $x\in[a,b]$ such that a $\delta$-calibrated partition with representants can be found on $[a,x]$. Surely $M$ is nonempty and bounded, thus it has a supremum $s$. If $s\ne b$, then we could find a calibrated partition with a representant at $s$ reaching beyond $s$, which leads to a contradiction.
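Cousin's lemma itself is non-constructive, but for concrete calibres a greedy construction usually works. A minimal sketch of ours (the representant of each segment is its left endpoint, so the calibre condition above holds by construction; termination is guaranteed whenever $\delta$ is bounded away from zero):

```python
def calibrated_partition(a, b, delta):
    """Greedy delta-calibrated partition of [a, b] with representants.

    Tags xi_i sit at the left endpoints, so
    xi_i - delta(xi_i) < x_{i-1} = xi_i <= x_i < xi_i + delta(xi_i).
    """
    points, tags = [a], []
    x = a
    while x < b:
        tags.append(x)
        x = min(b, x + 0.9 * delta(x))  # stay strictly below xi + delta(xi)
        points.append(x)
    return points, tags

# example: a calibre forcing fine segments near the origin
points, tags = calibrated_partition(0.0, 1.0, lambda x: 0.05 + x / 10)
print(len(tags), points[:4])
```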
Now we can define a generalization of the Riemann integral in this way:
Definition. A function $f$ defined on a finite interval $[a,b]$ has the Kurzweil integral
$$I = \int_a^b f(x)\,dx,$$
if for every $\varepsilon>0$ there exists a calibre $\delta$ such that for every $\delta$-calibrated partition $\Xi$ with representants, the inequality $|S_\Xi - I| < \varepsilon$ holds for the corresponding Riemann sum $S_\Xi$.
6.50. Properties of the Kurzweil integral. First notice that when defining the Kurzweil integral, we only restricted the set of all partitions for which we take the Riemann sums into account. Hence if our function is Riemann integrable, then it must also have the Kurzweil integral, and these two integrals are equal.
which can be seen, for example, from the ratio test for convergence of series:
$$\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty}\frac{(n+1)\,2^{-(n+1)}}{n\,2^{-n}} = \lim_{n\to\infty}\frac{n+1}{2n} = \frac12.$$
In total, according to (6.43) (3), we have
$$\int_{\ln2}^{\ln3} f(x)\,dx = \int_{\ln2}^{\ln3}\sum_{n=1}^{\infty} n\,\mathrm e^{-nx}\,dx = \sum_{n=1}^{\infty}\int_{\ln2}^{\ln3} n\,\mathrm e^{-nx}\,dx = \sum_{n=1}^{\infty}\left(\frac{1}{2^n}-\frac{1}{3^n}\right) = 1-\frac12 = \frac12.$$
□
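A quick numerical cross-check of the value $1/2$ (a sketch of ours; the series is summed directly and the integral is approximated by the midpoint rule):

```python
import math

print(sum(2.0 ** -n - 3.0 ** -n for n in range(1, 60)))  # ~0.5

def f(x, terms=200):
    # partial sum of sum_n n e^{-nx}
    return sum(n * math.exp(-n * x) for n in range(1, terms))

a, b, steps = math.log(2), math.log(3), 10_000
h = (b - a) / steps
print(h * sum(f(a + (i + 0.5) * h) for i in range(steps)))  # ~0.5 as well
```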
6.88. Determine the following limit (give reasons for the procedure of computation):
$$\lim_{n\to\infty}\int_0^{\infty}\frac{\cos\frac xn}{\left(1+\frac xn\right)^n}\,dx.$$
Solution. First we'll determine $\lim_{n\to\infty}\frac{\cos(x/n)}{(1+x/n)^n}$. The sequence of these functions converges pointwise and we have
$$\lim_{n\to\infty}\frac{\cos\frac xn}{\left(1+\frac xn\right)^n} = \frac{1}{\lim_{n\to\infty}\left(1+\frac xn\right)^n} = \frac{1}{\mathrm e^x}.$$
It can be shown that the given sequence converges uniformly. Then, according to (6.41),
$$\lim_{n\to\infty}\int_0^{\infty}\frac{\cos\frac xn}{\left(1+\frac xn\right)^n}\,dx = \int_0^{\infty}\lim_{n\to\infty}\frac{\cos\frac xn}{\left(1+\frac xn\right)^n}\,dx = \int_0^{\infty}\frac{dx}{\mathrm e^x} = 1.$$
We leave the verification of uniform convergence to the reader (we only point out that the discussion is more complicated than in the previous cases).
□
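Numerically, the integrals indeed approach $1$ (a rough sketch of ours; the infinite interval is truncated where the integrand is negligible):

```python
import math

def integral(n, upper=50.0, steps=100_000):
    # midpoint rule for the integral of cos(x/n) / (1 + x/n)^n over [0, upper]
    h = upper / steps
    return h * sum(math.cos((i + 0.5) * h / n) / (1 + (i + 0.5) * h / n) ** n
                   for i in range(steps))

for n in (5, 50, 500):
    print(n, integral(n))  # the values tend to 1 as n grows
```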
For the same reason, we can repeat the argumentation of Theorem 6.24 about the simple properties of the Riemann integral and again verify that the Kurzweil integral behaves in the same way. In particular, a linear combination of integrable functions, $c f(x)+d g(x)$, is again integrable, and its integral is $c\int_a^b f(x)\,dx + d\int_a^b g(x)\,dx$, etc. For proving this, it suffices to think through small modifications when discussing the refined partitions, which moreover should be $\delta$-calibrated.
Analogously, as in the case of monotone sequences of pointwise convergent functions, we can extend the argumentation verifying that the limits of uniformly convergent sequences of integrable functions $f_n$ are again integrable, and the integral of the limit is the limit of the integrals of the $f_n$.
Finally, the Kurzweil integral behaves the way we would like it to even on sets of zero measure:
Theorem. Consider a function $f$ on an interval $[a,b]$ which is zero almost everywhere. Then the Kurzweil integral $\int_a^b f(x)\,dx$ exists and equals zero.
Proof. This is a nice illustration of the idea that we can get rid of the influence of values on a small set by a smart choice of the calibre. Denote by $M$ the corresponding set of zero measure, outside of which $f(x)=0$, and write $M_k\subset[a,b]$, $k=1,\dots$, for the subset of the points for which $k-1\le|f(x)|<k$. Because all the sets $M_k$ have zero measure, we can cover each of them by a countable system of pairwise disjoint open intervals $J_{k,j}$ whose sum of lengths is arbitrarily small. Now define the calibre $\delta(x)$ for $x\in J_{k,j}$ so that the whole interval $(x-\delta(x),\,x+\delta(x))$ is still contained in $J_{k,j}$. Outside of the set $M$, we define $\delta$ arbitrarily.
For a $\delta$-calibrated partition $\Xi$ of the interval $[a,b]$, we can then put a bound on the corresponding Riemann sum:
$$\left|\sum_{i=0}^{n-1} f(\xi_i)(x_{i+1}-x_i)\right| = \left|\sum_{i:\ \xi_i\in M} f(\xi_i)(x_{i+1}-x_i)\right| \le \sum_{k=1}^{\infty}\ \sum_{i:\ \xi_i\in M_k} k\,(x_{i+1}-x_i)$$
$$\max_{x\in[1,2]}\Big(x+\frac{2}{\sqrt{1+x^2}}\Big) = 2+\frac{2}{\sqrt{1+2^2}} = 2+\frac{2}{\sqrt5}.$$
An increasing function takes its maximum value at the right end point of a closed interval. □
7.25. Determine whether the sequence $\{x_n\}_{n\in\mathbb N}$, where
$$x_1 = 1,\qquad x_n = 1+\frac12+\cdots+\frac1n,\quad n\in\mathbb N\setminus\{1\},$$
is a Cauchy sequence. First, consider the usual metric given by the difference in absolute value (i.e., induced by the norm of absolute value). Then, consider the metric
Solution. Let us remind that
$$(7.28)\qquad \sum_{k=1}^{\infty}\frac1k = \infty.$$
Therefore,
$$\sum_{k=m}^{\infty}\frac1k = \infty,\quad m\in\mathbb N,$$
i.e.
$$\lim_{n\to\infty}|x_n - x_m| = \sum_{k=m+1}^{\infty}\frac1k = \infty,\quad m\in\mathbb N.$$
for such a function $f$. Let us notice that if we considered both the one-sided limits (which always exist by our definition) and the value of the function itself to be the value $f(x)$ at points of discontinuity, we could work with maxima instead of suprema. It is apparent again that it is a norm (except for the problems with the values at the discontinuity points).
7.17. Completion of metric spaces. Both the real numbers $\mathbb R$ and the complex numbers $\mathbb C$ are (with the metric given by the absolute value) complete metric spaces. Actually, this is contained in the axiom of the existence of suprema. Let us remind that the real numbers were created as a "completion" of the space of rational numbers, which is not complete itself. It is apparent that the closure of the set $\mathbb Q\subset\mathbb R$ is the whole $\mathbb R$.
Dense and nowhere-dense subsets
We say that a subset $A\subset X$ in a metric space $X$ is dense iff the closure of $A$ is the whole space $X$. A set $A$ is said to be nowhere dense in $X$ iff the set $X\setminus\bar A$ is dense.
Apparently, $A$ is dense in $X$ iff every non-empty open set in the whole space $X$ has a non-empty intersection with $A$.
In all the cases of norms on functions from the previous paragraph, we can easily see that the metric spaces defined in this way are not complete, since it can happen that the limit of a Cauchy sequence of functions from our vector space $S^0[a,b]$ is a function which does not belong to this space any more. Let us consider the interval $[0,1]$ as the domain of the functions $f_n$ which take zero on $[0,1/n)$ and are equal to $\sin(1/x)$ on $[1/n,1]$. Apparently, they converge to the function $\sin(1/x)$ in all $L_p$ norms, but this function does not lie in our spaces.
Completion of a metric space
Let $X$ be a metric space with the metric $d$ which is not complete. A metric space $\hat X$ with a metric $\hat d$ such that $X\subset\hat X$, $d$ is the restriction of $\hat d$ to the subset $X$, and the closure $\bar X$ is the whole space $\hat X$, is called a completion of the metric space $X$.
The following theorem says that the completion of an arbitrary (incomplete) metric space $X$ can be found in essentially the same way as the real numbers were created from the rationals. Before we get to the quite difficult proof of this extraordinarily important and useful result, let us notice that such a completion $\hat X$ of a space $X$ is unique, in a certain sense:
Is the metric space of continuous functions on $[-1,1]$ with the metric given by the norm (a) $\|f\|_1 = \int_{-1}^{1}|f(x)|\,dx$; (b) $\|f\|_\infty = \max\{|f(x)|;\ x\in[-1,1]\}$ complete?
Solution. The case (a). Let us, for every $n\in\mathbb N$, define a function
$$f_n(x) = 0,\ x\in[-1,0),\qquad f_n(x) = nx,\ x\in\left[0,\tfrac1n\right),\qquad f_n(x) = 1,\ x\in\left[\tfrac1n,1\right].$$
The obtained sequence $\{f_n\}_{n\in\mathbb N}$ is a Cauchy sequence with respect to the given norm. Let us focus on the potential limit of the sequence $\{f_n\}$. The continuous function $f$ would have to satisfy
$$f(x) = 0,\ x\in[-1,0],\qquad f(x) = 1,\ x\in[\varepsilon,1]$$
for an arbitrarily small $\varepsilon>0$. Thus, necessarily,
$$f(x) = 0,\ x\in[-1,0],\qquad f(x) = 1,\ x\in(0,1].$$
However, this function is not continuous on $[-1,1]$: it does not belong to the considered metric space. Therefore, the sequence $\{f_n\}$ does not have a limit in it, and the space with the norm (a) is not complete.
The case (b). Let $\{f_n\}$ be a Cauchy sequence; then for every $\varepsilon>0$ (or for every $\varepsilon/2$ if you want), there is an $n(\varepsilon)\in\mathbb N$ such that
$$(7.29)\qquad \max_{x\in[-1,1]}|f_m(x)-f_n(x)| < \frac\varepsilon2,\quad m,n>n(\varepsilon).$$
In particular, we get for every $x\in[-1,1]$ a Cauchy sequence $\{f_n(x)\}_{n\in\mathbb N}\subset\mathbb R$ of numbers. Since the metric space $\mathbb R$ with the usual metric is complete, every such sequence (for $x\in[-1,1]$) is convergent. Let us set
$$f(x) := \lim_{n\to\infty} f_n(x),\quad x\in[-1,1].$$
A mapping $i_2\circ i_1^{-1}$ is well-defined on the dense subset $i_1(X)\subset X_1$. Its image is the dense subset $i_2(X)\subset X_2$ and, moreover, this mapping is clearly an isometry. The inverse mapping $i_1\circ i_2^{-1}$ works in the same way.
Every isometric mapping maps, of course, Cauchy sequences to Cauchy sequences. At the same time, such Cauchy sequences converge to the same element in the completion if and only if this holds for their images under the isometry.
The first and second properties are clearly satisfied. To prove the triangle inequality, it suffices to realize that $d(m,n)\in(1,4/3]$ if $m\ne n$. All Cauchy sequences can be found equally easily: they are the so-called almost stationary sequences, constant from some index on (i.e., constant except for finitely many terms). Thus, every Cauchy sequence is convergent, so the metric space in question is complete. Let us introduce the sets
$$A_n := \left\{m\in\mathbb N;\ d(m,n)\le 1+\tfrac1n\right\},\quad n\in\mathbb N.$$
As the inequality in their definition is not strict, it is guaranteed that they are closed sets. Since $A_n = \{n,n+1,\dots\}$, (7.30) does not hold. If we omitted the requirement (7.31), it would mean that the metric space is not complete, which is not true. Finally, let us mention that
$$\lim_{n\to\infty}\sup\{d(x,y);\ x,y\in A_n\} = 1 \ne 0.$$
□
7.28. Prove that the metric space $\ell_2$ is complete.
Solution. Let us consider an arbitrary Cauchy sequence $\{x_n\}_{n\in\mathbb N}$ in the space $\ell_2$. However, every term of this sequence is again a sequence, i.e., $x_n = \{x_k^n\}_{k\in\mathbb N}$. Let us mention that, of course, the range of the indices does not matter: there is no difference whether $n,k\in\mathbb N$ or
elements $x$, $y$, $z$, and we easily get
$$d(x,z) = \lim_{i\to\infty} d(x_i,z_i) \le \lim_{i\to\infty} d(x_i,y_i) + \lim_{i\to\infty} d(y_i,z_i) = d(x,y) + d(y,z).$$
Apparently, the restriction of the metric d just defined to the original space X is identical to the original metric because the original points are represented by constant sequences.
It remains to prove that $X$ is dense in $\hat X$ and that the constructed metric space is complete. We want to prove that for any fixed Cauchy sequence $x=\{x_i\}$ and every (no matter how small) $\varepsilon>0$, we can find an element $y$ of the original space such that the distance of the constant sequence of $y$'s from the chosen sequence $x_i$ does not exceed $\varepsilon$. However, since the sequence $x_i$ is a Cauchy sequence, all pairs of its terms $x_n$, $x_m$ will eventually (i.e. for sufficiently large indices $m$ and $n$) become closer than $\varepsilon$ to each other. Then the choice $y=x_n$ for one of those indices necessarily gives that the elements $y$ and $x_m$ will be closer than $\varepsilon$, and so, from the limit point of view, it will hold that $\hat d(y,x)\le\varepsilon$.
Finally, it remains to prove that Cauchy sequences of points of the extended space $\hat X$ with respect to the metric $\hat d$ are necessarily convergent. In other words, we want to show that repeating the above procedure does not yield new points. This can be done by approaching the points of a Cauchy sequence $\hat x_k$ by points $y_k$ from the original space $X$ so that the resulting sequence $y=\{y_k\}$ would be the limit of the original sequence with respect to the metric $\hat d$.
Since we already know that $X$ is a dense subset in $\hat X$, we can choose, for every element $\hat x_k$ of our fixed sequence, an element $z_k\in X$ so that the corresponding constant sequence $\hat z_k$ would satisfy $\hat d(\hat z_k,\hat x_k)<1/k$. Now, let us consider the sequence $z=\{z_0,z_1,\dots\}$. The original sequence $\hat x$ is Cauchy, i.e. for a fixed real number $\varepsilon>0$, there is an index $n(\varepsilon)$ such that $\hat d(\hat x_n,\hat x_m)<\varepsilon/2$ whenever both $m$ and $n$ are greater than $n(\varepsilon)$. Without loss of generality, we can assume that our index $n(\varepsilon)$ is greater than or equal to $4/\varepsilon$. Now, for $m$ and $n$ greater than $n(\varepsilon)$, we get:
$$d(z_m,z_n) = \hat d(\hat z_m,\hat z_n) \le \hat d(\hat z_m,\hat x_m) + \hat d(\hat x_m,\hat x_n) + \hat d(\hat x_n,\hat z_n) < \frac1m + \frac\varepsilon2 + \frac1n \le 2\cdot\frac\varepsilon4 + \frac\varepsilon2 = \varepsilon.$$
Thus $z=\{z_i\}$ is a Cauchy sequence of elements in $X$, and so $\hat z\in\hat X$. Let us examine whether the distance $\hat d(\hat x_n,\hat z)$ approaches zero, which we tried to guarantee by the construction. From the triangle inequality,
$$\hat d(\hat z,\hat x_n) \le \hat d(\hat z,\hat z_n) + \hat d(\hat z_n,\hat x_n).$$
However, from our previous bounds, it follows that both the summands on the right-hand side converge to zero, thereby finishing the proof. □
In the following three paragraphs, we will introduce three quite simple theorems about complete metric spaces. They are highly applicable both in mathematical analysis and in verifying the convergence of numerical methods.
$n,k\in\mathbb N\cup\{0\}$. Let us introduce helping sequences $y_k$ for $k\in\mathbb N$ so that
$$y_k = \{y_k^n\}_{n\in\mathbb N} = \{x_k^n\}_{n\in\mathbb N}.$$
If $\{x_n\}$ is a Cauchy sequence in $\ell_2$, then each of the sequences $y_k$ is a Cauchy sequence in $\mathbb R$ (the sequences $y_k$ are sequences of real numbers). It follows from the completeness of $\mathbb R$ (with respect to the usual metric) that all of the sequences $y_k$ are convergent. Let us denote their limits by $z_k$, $k\in\mathbb N$.
It suffices to prove that $z=\{z_k\}_{k\in\mathbb N}\in\ell_2$ and that the sequence $\{x_n\}$ converges for $n\to\infty$ in $\ell_2$ just to the sequence $z$. The sequence $\{x_n\}_{n\in\mathbb N}\subset\ell_2$ is a Cauchy sequence; therefore, for every $\varepsilon>0$, there is an $n(\varepsilon)\in\mathbb N$ with the property that
$$\sum_{k=1}^{\infty}\left(x_k^m - x_k^n\right)^2 < \varepsilon^2,\quad m,n>n(\varepsilon),\ m,n\in\mathbb N.$$
In particular,
$$\sum_{k=1}^{l}\left(x_k^m - x_k^n\right)^2 < \varepsilon^2,\quad m,n>n(\varepsilon),\ m,n,l\in\mathbb N,$$
whence, letting $m\to\infty$, we can obtain
$$\sum_{k=1}^{l}\left(z_k - x_k^n\right)^2 \le \varepsilon^2,\quad n>n(\varepsilon),\ n,l\in\mathbb N,$$
i.e. (this time $l\to\infty$)
$$(7.32)\qquad \sum_{k=1}^{\infty}\left(z_k - x_k^n\right)^2 \le \varepsilon^2,\quad n>n(\varepsilon).$$

For every positive real number $\varepsilon$, we can find an index $n(\varepsilon)$ such that all the sets $A_i$ with indices $i>n(\varepsilon)$ have diameters less than $\varepsilon$. However, then for such large indices $i,j$, we will have $d(z_i,z_j)<\varepsilon$, and thus our sequence is a Cauchy sequence. Therefore, it has a limit point $z\in X$, which, of course, must be a limit point of all the sets $A_i$; thus it belongs to all of them (since they are all closed) and so to their intersection.
We have proved the existence of $z$. Now, it remains to prove its uniqueness. For that purpose, assume there are points $z$ and $y$, both belonging to the intersection of all the sets $A_i$. Then their distance is at most the diameter of the sets $A_i$, which converges to zero. This completes the proof. □
7.21. Theorem (Baire theorem). If $X$ is a complete metric space, then the intersection of every countable system of open dense sets $A_i$ is a dense set in the metric space $X$.
Proof. Let a system of dense open sets $A_i$, $i=1,2,\dots$, be given in $X$. We want to show that the set $A=\bigcap_{i=1}^{\infty} A_i$ has a non-empty intersection with any open set $U\subset X$. We will proceed inductively, invoking the previous theorem.
Surely there is a $z_1\in A_1\cap U$, but since the set $A_1$ is open, the closure of an $\varepsilon_1$-neighborhood $U_1$ (for sufficiently small $\varepsilon_1$) of the point $z_1$ is contained in $A_1$ as well. Let us denote the closure of this $\varepsilon_1$-ball $U_1$ by $B_1$. Further, let us suppose that the points $z_i$ and their open $\varepsilon_i$-neighborhoods $U_i$ are already chosen for $i=1,\dots,n$. Since the set $A_{n+1}$ is open and dense in $X$, there is a point $z_{n+1}\in A_{n+1}\cap U_n$; however, since $A_{n+1}\cap U_n$ is open, the point $z_{n+1}$ belongs to it together with a sufficiently small $\varepsilon_{n+1}$-neighborhood $U_{n+1}$. Then, the closures surely satisfy $B_{n+1}=\bar U_{n+1}\subset U_n$, and so the closed set $B_{n+1}$ is contained in $A_{n+1}\cap U_n$. Moreover, we can assume that $\varepsilon_n<1/n$.
If we proceed in this inductive way from the original point $z_1$ and the set $B_1$, we get a nested sequence of non-empty closed sets $B_n$ whose diameters approach zero. Therefore, there is a point $z$ common to all of these sets, i.e.,
$$z \in \bigcap_{i=1}^{\infty} U_i = \bigcap_{i=1}^{\infty} B_i \subset \Big(\bigcap_{i=1}^{\infty} A_i\Big)\cap U,$$
which is the statement to be proved. □
7.22. Bounded and compact sets. The following concepts facilitated the phrasing of our observations about the real numbers. They can be reformulated for general metric spaces with almost no changes:
An interior point of a subset $A$ in a metric space is an element of $A$ which belongs to it together with some of its $\varepsilon$-neighborhoods.
A boundary point of a set $A$ is an element $x\in X$ such that each of its neighborhoods has a non-empty intersection with both $A$ and the complement $X\setminus A$. A boundary point may or may not belong to the set $A$ itself.
An open cover of a set $A$ is a system of open sets $U_i\subset X$, $i\in I$, such that their union contains the whole of $A$.
An isolated point of a set $A$ is an element $a\in A$ such that one of its $\varepsilon$-neighborhoods in $X$ has the singleton intersection $\{a\}$ with $A$.
A set $A$ of elements of a metric space is called bounded iff its diameter is finite, i.e., there is a real number $r$ such that $d(x,y)<r$ for all elements $x,y\in A$. Otherwise, the set is said to be unbounded.
Therefore, for every $\varepsilon>0$, there is an $n(\varepsilon)\in\mathbb N$ satisfying
$$\sum_{n=n(\varepsilon)+1}^{\infty}\frac{1}{n^2} < \frac{\varepsilon^2}{4}.$$
From each of the intervals $[-1/n,1/n]$ for $n\in\{1,\dots,n(\varepsilon)\}$, we can choose finitely many points $x_1^n,\dots,x_{m(n)}^n$ so that we would have
$$\min_{j\in\{1,\dots,m(n)\}}\left|x-x_j^n\right| < \frac{\varepsilon}{2\sqrt{n(\varepsilon)}}$$
for any $x\in[-1/n,1/n]$. Let us consider those sequences $\{y_n\}_{n\in\mathbb N}$ from $\ell_2$ whose terms with indices $n>n(\varepsilon)$ are zero, and at the same time,
$$y_1\in\left\{x_1^1,\dots,x_{m(1)}^1\right\},\ \dots,\ y_{n(\varepsilon)}\in\left\{x_1^{n(\varepsilon)},\dots,x_{m(n(\varepsilon))}^{n(\varepsilon)}\right\}.$$
There are only finitely many such sequences, and they create an $\varepsilon$-net for $A$, since for every element of $A$ one of them is closer than $\varepsilon$. Since $\varepsilon>0$ is arbitrary, the set $A$ is totally bounded, which implies its compactness.
It is very simple to determine whether the set $B$ is compact. Every compact set must be closed, but the set $B$ is not. Its closure is
$$\bar B = \left\{\{x_n\}_{n\in\mathbb N}\in\ell_2;\ |x_n|\le\tfrac1n,\ n\in\mathbb N\right\}.$$
The set $\bar B$ is compact. The proof of this fact is much simpler than for the set $A$, thus we leave it as an exercise for the reader. □
D. Integral operators
The convolution is one of the tools for smoothing functions:
7.32. Determine the convolution $f_1 * f_2$, where
$$f_1(x) = \frac1x\ \ \text{for } x\ne0,\qquad f_2(x) = \begin{cases}\frac12 & \text{for } x\in[-1,1],\\ 0 & \text{otherwise.}\end{cases}$$
Solution. The value of the convolution at a point $t$ is given by the integral $\int_{-\infty}^{\infty} f_1(x)\,f_2(t-x)\,dx$. The integrated function is non-zero only where the second factor is non-zero, i.e., where $(t-x)\in[-1,1]$, i.e., $x\in[t-1,t+1]$. The value of the convolution at the point $t$ can therefore be interpreted as the integral mean of the function $f_1$ over the interval $(t-1,t+1)$. When integrating over this interval, we have to distinguish whether the number $0$ belongs to it. If the interval contains zero, the integral must be split into two improper integrals; however, the contribution of the smaller one cancels thanks to the function $\frac1x$ being odd, so only the integral of $\frac{1}{2x}$ over the remaining part survives (think out why the formula works for negative numbers $t$ as well). Thus, we get:
$$f_1 * f_2(t) = \int_{t-1}^{t+1}\frac{1}{2x}\,dx = \frac12\ln\left|\frac{t+1}{t-1}\right| \quad\text{for } t\in(-\infty,-1]\cup[1,\infty),$$
$$f_1 * f_2(t) = \frac12\ln\left|\frac{1+t}{1-t}\right| \quad\text{for } t\in[-1,1].$$
□
Now, let us try to calculate the convolution of two functions both of which have a finite support.
A metric space $X$ is called compact iff every sequence of terms $x_i\in X$ has a subsequence converging to some point $x\in X$.
In the case of the real numbers, we mentioned several characterizations of compactness. The concept of boundedness is a bit more complicated in the case of metric spaces. For any subsets $A,B\subset X$ in a metric space $X$ with metric $d$, we define their distance
$$\operatorname{dist}(A,B) = \inf_{x\in A,\ y\in B}\ d(x,y).$$
If $A=\{x\}$ is a singleton set, we talk about the distance $\operatorname{dist}(x,B)$ of the point $x$ from the set $B$. We say that a metric space $X$ is totally bounded iff for every positive real number $\varepsilon$, there is a finite set $A$ such that
$$\operatorname{dist}(x,A) < \varepsilon$$
for all points $x\in X$. Let us remind that a metric space is bounded iff the whole $X$ has a finite diameter.
We can immediately see that a totally bounded space is, in particular, bounded. Indeed, the diameter of a finite set is always finite, and if $A$ is the set corresponding to $\varepsilon$ from the definition of total boundedness, then the distance $d(x,y)$ of two points can always be bounded by the sum of $\operatorname{dist}(x,A)$, $\operatorname{dist}(y,A)$, and $\operatorname{diam} A$, which is a finite number. In the case of a metric on a subset of a finite-dimensional Euclidean space, these concepts coincide, since the boundedness of a set guarantees the boundedness of all the coordinates in a fixed orthonormal basis, and this implies the total boundedness. (Verify this in detail by yourselves!)
Theorem. The following statements about a metric space $X$ are equivalent:
(1) $X$ is compact,
(2) every open cover of $X$ contains a finite subcover,
(3) $X$ is complete and totally bounded.
Sketch of the proof. If the second statement of the theorem is satisfied, then we can easily see that the space $X$ must be totally bounded. Indeed, it suffices to choose the cover of $X$ consisting of all $\varepsilon$-balls centered at the points $x\in X$. We can choose a finite cover from it, and the set of the centers $x_i$ of the balls participating in this finite cover already satisfies the condition from the definition of total boundedness.
To prove the implication (2) $\Rightarrow$ (3), we also need to show the completeness. Let us consider a Cauchy sequence $x_i$.
□
7.23. Compactness on continuous functions. As an example of the fact that the behavior of compactness in spaces of functions may differ from that in Euclidean spaces, we mention a very useful theorem, known as the Arzelà-Ascoli theorem.
Theorem. A set $M\subset C[a,b]$ is compact if and only if it is bounded, closed, and equicontinuous.
7.33. Determine the convolution $f_1 * f_2$, where
$$f_1(x) = \begin{cases}1-x^2 & \text{for } x\in[-1,1],\\ 0 & \text{otherwise,}\end{cases}\qquad f_2(x) = \begin{cases}x & \text{for } x\in[0,1],\\ 0 & \text{otherwise.}\end{cases}$$
Solution. The value of the convolution $f_1 * f_2$ at a point $t$ is given by the integral over all real numbers of the product of the function $f_1(x)$ and the function $f_2(t-x)$ with respect to the variable $x$ (see 7.13). Thus, this value is zero if either of the values $f_1(x)$ and $f_2(t-x)$ is zero for every real $x$. On the other hand, the value of the convolution can be non-zero at a point $t$ only if there are numbers $x\in[-1,1]$ ($f_1(x)\ne0$) such that $(t-x)\in[0,1]$ ($f_2(t-x)\ne0$). Since the convolution is commutative, we may equally integrate the product $f_2(x)f_1(t-x)$; then $f_1*f_2(t)$ can be non-zero only if $[t-1,t+1]\cap[0,1]\ne\emptyset$. This happens for $t\in[-1,2]$. We integrate over $x$ belonging to the intersection of the intervals $[t-1,t+1]$ and $[0,1]$. Further, this intersection depends on $t\in[-1,2]$:
a) for $t\in[-1,0]$, we have $[t-1,t+1]\cap[0,1] = [0,t+1]$,
b) for $t\in[0,1]$, we have $[t-1,t+1]\cap[0,1] = [0,1]$,
c) for $t\in[1,2]$, we have $[t-1,t+1]\cap[0,1] = [t-1,1]$.
According to the intersection of these intervals, we then have:
a)
$$\int_{-\infty}^{\infty} f_2(x)f_1(t-x)\,dx = \int_0^{t+1} x\left(1-(t-x)^2\right)dx = -\frac{1}{12}t^4+\frac12 t^2+\frac23 t+\frac14,$$
b)
$$\int_{-\infty}^{\infty} f_2(x)f_1(t-x)\,dx = \int_0^{1} x\left(1-(t-x)^2\right)dx = -\frac12 t^2+\frac23 t+\frac14,$$
c)
$$\int_{-\infty}^{\infty} f_2(x)f_1(t-x)\,dx = \int_{t-1}^{1} x\left(1-(t-x)^2\right)dx = \frac{1}{12}t^4 - t^2 + \frac43 t.$$
Altogether, we get:
$$f_1*f_2(t) = \begin{cases} -\frac{1}{12}t^4+\frac12 t^2+\frac23 t+\frac14 & \text{for } t\in[-1,0],\\ -\frac12 t^2+\frac23 t+\frac14 & \text{for } t\in[0,1],\\ \frac{1}{12}t^4 - t^2 + \frac43 t & \text{for } t\in[1,2],\\ 0 & \text{otherwise.}\end{cases}$$
□
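The resulting piecewise polynomials can be checked numerically; a small sketch of ours discretizing the convolution integral:

```python
def f1(x):
    return 1 - x * x if -1 <= x <= 1 else 0.0

def f2(x):
    return x if 0 <= x <= 1 else 0.0

def conv(t, lo=-2.0, hi=3.0, steps=20_000):
    # Riemann sum for the integral of f1(x) f2(t - x) over the real line
    h = (hi - lo) / steps
    return h * sum(f1(lo + i * h) * f2(t - lo - i * h) for i in range(steps))

def exact(t):
    if -1 <= t <= 0:
        return -t**4 / 12 + t**2 / 2 + 2 * t / 3 + 0.25
    if 0 < t <= 1:
        return -t**2 / 2 + 2 * t / 3 + 0.25
    if 1 < t <= 2:
        return t**4 / 12 - t**2 + 4 * t / 3
    return 0.0

for t in (-0.5, 0.5, 1.5):
    print(t, conv(t), exact(t))  # agreement up to the discretization error
```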
7.24. Proof of theorem 7.8 about Fourier series. The general context of metrics and convergence now allows us to get back to the proof of the theorem in which we got a first idea about piecewise and other convergences of Fourier series. However, we do not care about necessary conditions for convergence, and many other formulations can be found in the literature. On the other hand, our theorem 7.8 was quite simple, and it covers a good deal of useful cases.
Firstly, it is good to realize how convergences may differ with respect to different $L_p$ norms. For the sake of simplicity, we will always work in the completion of the space $S^0$ or $S^1$ with respect to the corresponding norm, without thinking about what these spaces actually look like (even though they could be described quite easily with the help of the Kurzweil integral).
Hölder's inequality (applied to the function $f$ and the constant $1$) yields the first of the following bounds on $S^0[a,b]$:
$$\int_a^b |f(x)|\,dx \le |b-a|^{1/q}\left(\int_a^b |f(x)|^p\,dx\right)^{1/p},$$
$$\left(\int_a^b |f(x)|^p\,dx\right)^{1/p} \le C^{1/q}\left(\int_a^b |f(x)|\,dx\right)^{1/p},$$
where $p>1$ and $1/p+1/q=1$, and $C\ge|f(x)|$ on the whole interval $[a,b]$ (such a uniform bound by a constant always exists if $f\in S^0[a,b]$). The second bound follows immediately from the bound $|f(x)|^p \le C^{p-1}|f(x)|$ and the relation $1-1/p = 1/q$.
Thus it is apparent from the first bound that $L_p$-convergence $f_n\to f$ is, for any $p>1$, always stronger than $L_1$-convergence (and with a merely modified bound, we can derive an even stronger proposition, namely that $L_q$-convergence is stronger than $L_p$-convergence whenever $q>p$; try this by yourselves). However, to apply the second bound, we have to require uniform boundedness of the sequence of functions $f_n$, i.e. the bound for the functions $f_n$ by a constant $C$ must be independent of $n$. Then we can assert that $|f_n(x)-f(x)|\le 2C$, and our bound implies that $L_1$-convergence is stronger than $L_p$-convergence.
Therefore, all our $L_p$-norms on the space $S^0[a,b]$ are equivalent with regard to the convergence of uniformly bounded sequences of functions.
The most difficult (and most interesting) part is to prove the first statement of theorem 7.8, which is often referred to in the literature as the Dirichlet condition (it is deemed to have been derived as early as 1824). First, we show how this property of piecewise convergence implies the statements (2) and (3) of the theorem to be proved. Without loss of generality, we can assume that we are working on the interval $[-\pi,\pi]$, i.e. with period $T=2\pi$.
As the first step, we prepare simple bounds for the coefficients of the Fourier series. An obvious bound is
$$|a_n(f)| \le \frac1\pi\int_{-\pi}^{\pi}|f(x)|\,dx,$$
and the same for all the coefficients $b_n$, since both $\cos(nx)$ and $\sin(nx)$ are bounded by $1$ in absolute value. However, if $f$ is a continuous function in $S^1[a,b]$, we can integrate by parts, thus obtaining
$$a_n(f) = \frac1\pi\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx = -\frac{1}{n\pi}\int_{-\pi}^{\pi} f'(x)\sin(nx)\,dx = -\frac1n\,b_n(f').$$

$h_n\to0$ such that the corresponding sequence of functions $y_{h_n}$ converges uniformly to a continuous function $y(t)$. Further, let us write more simply $y_n(t) = y_{h_n}(t)\to y(t)$.
However, each of the continuous functions $y_n$ fails to be differentiable at only finitely many points of the interval $[t_0,t]$, so we can write
$$y_n(t) = y_0 + \int_{t_0}^{t} y_n'(s)\,ds.$$
On the other hand, the derivatives on the particular intervals are constant, so we can write (here, $k$ is the largest index such that $t_0+kh_n\le t$, while $y_j$ and $t_j$ are the points from the definition of the function $y_n$)
$$y_n(t) = y_0 + \sum_{j=0}^{k-1}\int_{t_j}^{t_{j+1}} f(t_j,y_j)\,ds + \int_{t_k}^{t} f(t_k,y_k)\,ds.$$
Instead, we would like to see
$$y_n(t) = y_0 + \int_{t_0}^{t} f(s,y_n(s))\,ds,$$
but the difference between this integral and the last two terms in the previous expression is bounded by the possible differences of the function $f(t,y)$ and the lengths of the intervals.
8.146. Determine the general solution of the equation
$$y'' - y' = 5.$$
Solution. The characteristic polynomial of the equation is $\lambda^2-\lambda$, with roots $1$, $0$. Therefore, the general solution of the homogenized equation is $c_1 + c_2\mathrm e^x$, where $c_1,c_2\in\mathbb R$. We are looking for a particular solution in the form $ax$, $a\in\mathbb R$, using the method of undetermined coefficients. The result is $a=-5$, and the general solution is of the form
$$y = c_1 + c_2\mathrm e^x - 5x.$$
□
8.147. Solve the equation
$$y'' - 2y' + y = \frac{\mathrm e^x}{x^2+1}.$$
Solution. We will solve this non-homogeneous equation using the method of variation of constants. We will thus obtain the solution in the form
$$y = C_1(x)\,y_1(x) + C_2(x)\,y_2(x) + \cdots + C_n(x)\,y_n(x),$$
where $y_1,\dots,y_n$ give the general solution of the corresponding homogeneous equation and the functions $C_1(x),\dots,C_n(x)$ can be obtained from the system
$$C_1'(x)\,y_1(x) + \cdots + C_n'(x)\,y_n(x) = 0,$$
$$C_1'(x)\,y_1'(x) + \cdots + C_n'(x)\,y_n'(x) = 0,$$
$$\vdots$$
$$C_1'(x)\,y_1^{(n-2)}(x) + \cdots + C_n'(x)\,y_n^{(n-2)}(x) = 0,$$
$$C_1'(x)\,y_1^{(n-1)}(x) + \cdots + C_n'(x)\,y_n^{(n-1)}(x) = f(x).$$
The roots of the characteristic polynomial $\lambda^2-2\lambda+1$ are $\lambda_1=\lambda_2=1$. Therefore, we are looking for the solution in the form
$$y = C_1(x)\,\mathrm e^x + C_2(x)\,x\mathrm e^x,$$
considering the system
$$C_1'(x)\,\mathrm e^x + C_2'(x)\,x\mathrm e^x = 0,$$
$$C_1'(x)\,\mathrm e^x + C_2'(x)\left[\mathrm e^x + x\mathrm e^x\right] = \frac{\mathrm e^x}{x^2+1}.$$
We can compute the unknowns $C_1'(x)$ and $C_2'(x)$ using Cramer's rule. It follows from
$$\begin{vmatrix} \mathrm e^x & x\mathrm e^x\\ \mathrm e^x & \mathrm e^x + x\mathrm e^x\end{vmatrix} = \mathrm e^{2x},\qquad \begin{vmatrix} 0 & x\mathrm e^x\\ \frac{\mathrm e^x}{x^2+1} & \mathrm e^x + x\mathrm e^x\end{vmatrix} = -\frac{x\,\mathrm e^{2x}}{x^2+1},\qquad \begin{vmatrix} \mathrm e^x & 0\\ \mathrm e^x & \frac{\mathrm e^x}{x^2+1}\end{vmatrix} = \frac{\mathrm e^{2x}}{x^2+1}$$
Thanks to our universal bound for $f(t,y)$ above, we can thus use just the last integral instead of the actual values in the limit process $\lim_{n\to\infty} y_n(t)$, thereby obtaining
$$y(t) = \lim_{n\to\infty}\left(y_0 + \int_{t_0}^{t} f(s,y_n(s))\,ds\right) = y_0 + \int_{t_0}^{t}\left(\lim_{n\to\infty} f(s,y_n(s))\right)ds = y_0 + \int_{t_0}^{t} f(s,y(s))\,ds,$$
where we used the uniform convergence $y_n(t)\to y(t)$. This proves the theorem.
□
8.52. Systems of first-order equations. The problem of finding the solution of the equation $y'(x)=f(x,y)$ can also be viewed as looking for a (parametrized) curve $(x(t),y(t))$ in the plane, where the parametrization of the variable $x(t)=t$ has been fixed beforehand. However, if we accept this point of view, then we can forget this fixed choice for one variable, and we can add an arbitrary number of variables.
In the plane, for instance, we can write such a system in the form
$$x'(t) = f(t,x(t),y(t)),\qquad y'(t) = g(t,x(t),y(t))$$
with two functions $f,g:\mathbb R^3\to\mathbb R$. Similarly for more variables.
A simple example in the plane might be the system of equations
$$x'(t) = -y(t),\qquad y'(t) = x(t).$$
It can be easily guessed (or verified at least) that there is a solution of this system,
$$x(t) = R\cos t,\qquad y(t) = R\sin t,$$
with an arbitrary non-negative constant $R$, and the curves of the solutions are exactly the parametrized circles with radius $R$.
In the general case, we will work with the vector notation of the system in the form
$$x'(t) = f(t,x(t))$$
for a vector function $x:\mathbb R\to\mathbb R^n$ and a mapping $f:\mathbb R^{n+1}\to\mathbb R^n$. We are able to extend the validity of the theorem on uniqueness and existence of solutions to such systems:
Existence and uniqueness for systems of ODEs
Theorem. Consider functions $f_i(t,x_1,\dots,x_n):\mathbb R^{n+1}\to\mathbb R$, $i=1,\dots,n$, with continuous partial derivatives. Then, for every point $(t_0,x_1,\dots,x_n)\in\mathbb R^{n+1}$, there exists a maximal interval $[t_0-a,\,t_0+b]$, with positive numbers $a,b\in\mathbb R$, and a unique function $x(t):[t_0-a,\,t_0+b]\to\mathbb R^n$ which is the solution of the system of equations
$$x_1'(t) = f_1(t,x_1(t),\dots,x_n(t)),$$
$$\vdots$$
$$x_n'(t) = f_n(t,x_1(t),\dots,x_n(t))$$
with the initial condition
$$x_1(t_0) = x_1,\ \dots,\ x_n(t_0) = x_n.$$
that
$$C_1(x) = \int\frac{-x}{x^2+1}\,dx = -\frac12\ln\left(x^2+1\right) + C_1,\quad C_1\in\mathbb R,$$
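Assuming the sympy library is available, the whole of exercise 8.147 can be cross-checked symbolically (a sketch of ours, not part of the text):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
ode = sp.Eq(y(x).diff(x, 2) - 2 * y(x).diff(x) + y(x), sp.exp(x) / (x**2 + 1))
print(sp.dsolve(ode))
# the solution should combine the homogeneous part (C1 + C2*x)*exp(x)
# with log(x**2 + 1) and atan(x) terms, matching C1(x) computed above
```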