113 Catalan Numbers I Iiohc include .........r.'in. |i illi 111 I In |ilnut . wi II formed sequences of |, ,,. ni In:;,:,, lull binary Liven, well i... i • 11111. i.. - I i.....I nets of variables, stack I.......ital.ions, and triiingullitioim of m convex polygon. After defining the Catalan numbers explicitly by formula, we will show by | . ombin&torial argument that they count nonnegative paths in the plane. The of each set of structures subsequently considered is shown to be a Catalan .....nber by establishing a one-to-one correspondence between that set and a set I 11 uc.tures shown earlier to be of that size. Using recurrence relations and generating functions, we will show by a . I liferent approach that the size of one of these sets (well-parenthesized products ni variables) is a Catalan number. All the sets considered are then enumerated by (Catalan numbers in view of the existence of the one-to-one correspondences previously established. The asymptotic behavior of the sequence is investigated, iBd we obtain the order of magnitude of the nth Catalan number. Author: versity. Thomas A. Dowling, Department of Mathematics, Ohio State Uni- Prerequisites: The prerequisites for this chapter are recursive definitions, basic counting principles, recurrence relations, rooted trees, and generating functions. See, for example, Sections 3.1-3.3, 4.1-4.4, 5.1, 5.2, 8.1-8.3, and Appendix 3 of Discrete Mathematics and Its Applications, Second Edition, by Kenneth H. Rosen. Introduction Sequences and arrays whose terms enumerate combinatorial structures have many applications in computer science. Knowledge (or estimation) of such integer-valued functions is, for example, needed in analyzing the complexity of an algorithm. Familiar examples are the polynomials in n, exponential functions (with an integer base) with exponent a polynomial in n, factorials, and the binomial coefficients. Less familiar are the Stirling numbers considered elsewhere in this book. The sequence of positive integers to be met here, called the Catalan* numbers, enumerate combinatorial structures of many different * Eugene Charles Catalan (1814-1894) was a prominent Belgian mathematician who had numerous publications on multiple integrals, general theory of surfaces, mathematical analysis, probability, geometry, and superior arithmetic. Paths and Catalan Numbers Suppose m = a + b votes were cast in an election, with candidate A receiving a Votes and candidate B receiving b votes. The ballots are counted individually in some random order, giving rise to a sequence of a As and b Bs. The number of possible ballot sequences is the number C(m, a) of a-element subsets of the set {1,2,... ,m}, since each such subset indicates which of the m ballots were cast for candidate A. Assuming all such sequences are equally likely, what is the probability that candidate A led throughout the counting of the ballots? Every sequence of m ballots can be represented by an ordered m-tuple (xi,x2,...,xm) with x,- = 1 if the ith ballot was a vote for A and x; = -1 if it was for B. Then after i ballots are counted, the ith partial sum s; = Xl 4. X2 + ... + Xi (with s0 = 0) represents ,4's "lead" over B. If we denote by P(a, 6) the number of sequences (xx , x2, • ■ •, xm) with the partial sums s,- > 0 for i = 1,2,..., m, then the probability that A led throughout is P(a, b)/C(m, a). The Catalan numbers arise in the case where a = b, which we shall now assume. Denote the common value of a and b by n, so that m = 2n. Suppose we seek the probability that A never trailed throughout the counting. There are C(2n, n) possible sequences of n l's and n -Is. We seek the number for which si > 0 for i = 1,2,... ,2n- 1. We can represent the sequence (x1;... ,x2n) by a path in the plane from the origin to the point (2n,0) whose steps are the line segments between (i -l,Sj_i) and (i,si), for i = 1,2,... ,2n. The ith step in this path then has slope Si - s,_i = x{ € {!,-!}• 112 114 Af'plit.ittow, nl I>f.i min (lm,y minus signs. Draw the corresponding paths. n 0 1 2 34 II 0 U) Sequence (i) (--,+,+,+,+,--,-,+) (ii) (+,+,-,+,-,-,+,+,-,-) (iii) (+,+,-,+,+,-,+,--,-) Partial sum sequence (0,-1,-2,-1,0,1,2,1,0,-1,0) (0,1,2,1,2,1,0,1,2,1,0) (0,1,2,1,2,3,2,3,2,1,0) Solution: The corresponding paths are shown in Figure 1. □ (i) (ii) (Iii) Figure 1. Some paths from (0,0) to (10,0). Note that in sequence (i) the partial sum sequence has both positive and negative entries, so the path lies both above and below the x-axis. In sequence (ii) all the partial sums are nonnegative, so the path does not go below the x-axis. But in sequence (iii), we have s,- > 0 for i = 1,2,..., In — 1. Hence, except for its two endpoints, the path lies entirely above the x-axis. We call a path from (0,0) to (2n, 0) nonnegative if all s,- > 0 and positive if S{ > 0 for i = 1,2,... ,2ri — 1. Thus, a nonnegative path corresponds to a ballot sequence in which candidate A and candidate B both received n votes, but candidate A never trailed. Similarly, a positive path represents a ballot sequence in which candidate A was leading all the way until the final ballot brought them even. We shall see that the number of nonnegative paths and positive paths from the origin to the point (2n,0) are both Catalan numbers. Before proceeding with this ballot problem, let us define the Catalan numbers. Definition 1 The Catalan number cn, for n > 0, is given by 1 C(2n,n). □ n + 1 c„ 1 1 2 5 14 42 132 429 1,430 4,862 16,796 Table 1. The lirst eleven Catalan numbers. I xnmple 2 Find all the nonnegative paths from the origin to (6,0). Which nl these are positive paths? Solution: There are C(4,2) = 6 slope sequences of length 6 that start with 1 itml end with —1. To check for nonnegativity, we compute their partial sums, iii'I we find the nonnegative paths have slope sequences (+,-,+,-,+,-), (+,-,+,+,-,-), (+,+,-,-,+.-), (+>+,-,+,-.-), (+,+,+,-,-,-)< with respective partial sum sequences (0,1,0,1,0,1,0), (0,1,0,1,2,1,0), (0,1,2,1,0,1,0), (0,1,2,1,2,1,0), (0,1,2,3,2,1,0). These paths are shown in Figure 2. □ ^ ^v, A Figure 2. The nonnegative paths from the origin to (6,0). show that the numbers of nonnegative and positive paths are The values of cn for n < 10 are given in Table 1. Let us now Catalan numbers. Theorem 1 The number of paths from the origin to (2n,0) that are (i) positive is the Catalan number c„_i, (ii) nonnegative is the Catalan number c„. Proof: We shall first establish a one-to-one correspondence between positive paths of length 2n and nonnegative paths of length 2n — 2. Let (s0,si,... ,«2n) be the partial sum sequence of a positive path P. Then s0 = s2n = 0 and Sj > 1 for i = 1,2, ...,2n - 1. Let x = (xi, x2,..., x2„) be the corresponding slope sequence of the steps, so x,- = s,- - s»_i G {1,-1}. Since si > 1 and s0 = 0, we must have si = 1. Hence the path P passes through the point (1,1). Similarly, since s2„_i > 1 and s2„ =0, we have s2„_i = 1. Therefore P passes through i ic> A,,,,l„ .,(/,,,-. oil',:, „,to Mif/m i /,,,,,(„, ' ( .-iM/d/i Niimhwii 117 the point. (2/i — 1,1). If we omit. Mir lirnt. nnil Inwl I..... I......I In I. .p.- sequence, we have a sequence x' that has n — 1 Is and n— I In hirMirr, Mir partial niihin for x' satisfy s'i = x'j + x'2 + ... + x\ = x2 + x3 + . . . + Xi+i = Si+i - 1 > 0 for 0 < i < 2n — 2. Thus the positive path P from the origin to (2n,0) corresponds to a nonnegative path P' from the origin to (2n — 2,0). Geometrically, the path P' is produced from P by taking the segment of P from (1,1) to (2n —1,1) relative to the new coordinate system obtained by translating the origin to the point (1,1). Since the first and last terms of P must be 1 and —1, respectively, the operation of deleting those terms is reversed by adding 1 before and —1 after P'. Thus the function that takes P to P' is invertible. Hence, if an is the number of positive paths and bn the number of nonnegative paths from the origin to (2n,0), then an = 6„_i. Part (ii) will then follow from (i) once we establish that an — c„_i, since we would then have bn = an+i = c„. A positive path P from the origin to (2n,0) is determined uniquely by its segment P' from (1,1) to (2n — 1,1), which lies entirely above the x-axis. But the number of such paths P' is the difference between C(2n — 2,n — 1), the total number of paths from (1,1) to (2n — 1,1), and the number of paths Q from (1,1) to (2n — 1,1) that meet the x-axis. Thus, we can determine the number of paths P' by counting the number of such paths Q. Suppose that Q first meets the x-axis at the point (k,0). Then k > 2 since Q begins at the point (1,1). If we reflect the segment Qi of Q from (1,1) to (k,0) about the x-axis, we obtain a path Q\ from (1,-1) to (k,0). Then if we adjoin to Q[ the segment Q2 of Q from (k,0) to (2n — 1,1), we obtain a path Q* = Q1Q2 from (1,-1) to (2n — 1,1). But every such path Q* must cross the x-axis, so we can obtain the path Q from Q* by reflecting the segment of Q* from (1,-1) to its first point on the x-axis. Hence we have a one-to-one correspondence between the paths Q and Q*, so the number of paths Q from (1,1) to (2n—1,1) that meet the x-axis is equal to the number of paths Q* from (1,-1) to (2n — 1,1). Every such path Q* must have two more increasing steps than decreasing steps, so it has n increasing steps and n — 2 decreasing steps. The number of such paths Q* is therefore C(2n — 2, n — 2), and this is equal to the number of such paths Q. The number, an, of positive paths P from the origin to (2n,0) can now be calculated. Since it is equal to the number of positive paths P' from (1,1) to (2n —1,1), and this is the difference between the total number C(2n — 2, n — 1) of paths from (1,1) to (2n —1,1) and the number C(2n — 2, n — 2) of paths Q lllilll (1,1) to (2/1 • 1,1) Mlllt iiu'll Mir J (i k in, wr Imvr an = C(2n - 2,n - 1) - C(2r» - 2,n - 2) (2t» - 2)1 (2n - 2)! (n-l)l(n-1)1 (n - 2)!n! _ (2n-2)! / 1 J\ (n-2)!(n-l)! \n - 1 n) (2n - 2)! ~ (n - 2)!(n - 1)! n(n - 1) 1 (2n - 2)! - n(n- l)!(n - 1)! - -C(2n-2,n- 1) n - Cn-1- Well-formed Sequences of Parentheses The Catalan numbers count several structures important in computer science. Lot us consider first the set of well-formed sequences of parentheses, which I .111 be defined recursively (see Section 3.3 in Discrete Mathematics and Its Applications, Second Edition, by Rosen). Definition 2 A sequence of parentheses is well-formed if and only if it can he derived by a finite sequence of the following rules: The empty sequence is well-formed. If A is well-formed, then (.4) is well-formed. If A and B are well-formed, then AB is well-formed. We say that the right parenthesis following A in the second rule closes the left parenthesis preceding A. □ Example 3 Find a sequence of n = 4 parentheses that is not well-formed and a sequence that is well-formed. Solution: The sequence (()))(() is not well-formed since only one of the third and fourth left parentheses can be closed by the single right parenthesis that follows them. But the sequence (())()() is well-formed since each left parenthesis is closed by the first right parenthesis following it that does not close a left parenthesis between them. □ Clearly each well-formed sequence of parentheses must have an equal number of left parentheses and right parentheses. Further, in any initial string con- lit) Applh .ilium, at I >ľ.< min I 119 sisting of the first i parent .hoses of a well I'm.....I ninn<<- of '2n parent I »*■.-.« there must be at least as many left, parentheses i n.lit parentlieses. Thus, if wo replace left parentheses by Is and right parentheses by — Is in a well-formed sequence of parentheses, we obtain a sequence (zi, Z2> • • • > x2n) G •{ 1, — 1} 1 with all partial sums Sj > 0, and hence a nonnegative path from (0,0) to (2n,0). Conversely, any nonnegative path from (0,0) to (2n,0) produces a well-formed sequence of parentheses. By Theorem 1 and this one-to-one correspondence, we have established the following result. Theorem 2 The number of well-formed sequences of parentheses of length 2n is the Catalan number c„. ■ Example 4 Find the well-formed sequences of parentheses of length 2n = 6. Solution: Since n = 3, by Theorem 2 there are exactly c3 = 5 such sequences. They are found to be 0 0 0, (MO), (())(), (()()), ((()))• □ Stack Permutations In Fundamental Algorithms, Volume 1, of his classic series of books, The Art of Computer Programming, Donald Knuth posed the problem of counting the number of permutations of a particular type that arise in the computer. A stack is a list which can only be changed by insertions or deletions at one distinguished end, called the top of the list. When characters are individually inserted and deleted from the stack, the last one inserted must be the first one deleted from the stack (lifo). An insertion to the top of the stack is called a push; a deletion from the top is called a pop. We can interpret a stack as a stack of plates, with a push representing the placement of a plate on top, and a pop corresponding to the removal of a plate from the top. A sequence of pushes and pops is admissible if the sequence has an equal number n of pushes and pops, and at each stage the sequence has at least as many pushes as pops. If we identify pushes with Is and pops with —Is, then an admissible sequence corresponds to a nonnegative path from the origin to the point (2n,0). Thus, by Theorem 1, the number of admissible sequences of pushes and pops of length In is the Catalan number c„. When applied in a computer, an admissible sequence of pushes and pops of length 2n transforms an input string of length n to an output string with the same symbols, but in a possibly different order. Suppose we have as an initial input string the standard permutation 123...n of the set TV = {1,2,...,n} mul mi it| the input hiniik to the top of the stack, Mini each pop transfers the element on lop ol the stark to the beginning of the .....ľni Hiring. After the n pushes and n pops have been performed, the output llrliiv, i'1 >'• permutation of N called a stuck permutation. Exiimple 5 Let n = 4 and consider the admissible sequence (+,+,+,-,-,+,-,-), 1 e a plus sign represents a push and a minus sign represents a pop. Find ilľ stack permutation produced by this sequence of pushes and pops. Solution: We will denote the result of each operation of the admissible se-quence of pushes and pops by a string of the form a[cr]/?, where a is the current Input string, /? the current output string, and [a] the current stack, with the In I element of cr being the top of the stack. Then the admissible sequence i.im 11 proceeds as follows to produce the stack permutation 4132: Sequence a[< I therefore by a unique nonnegative path. We can recover the admissible sequence, and hence the path, from the stack permutation as follows. Starting with the stack permutation, precede it by the empty input string a and the rinpty stack [........■........... i|in,| ......i upward, is (+,+,-,+,+,-,-,-). Q It follows from the foregoing that there is a one-to-one correspondence between the set of stack permutations of N and the set of nonnegative paths, or between the set of stack permutations of N and set of admissible sequences. By Theorem 1 we have the following theorem. Theorem 3 The number of stack permutations of an ra-element set is the Catalan number c„. II parenthesized w <|' I..i i In |.....Ill' i i i i ■ i 11 i »re found to be Well-parenthesized Products Consider an algebraic structure 5 with a binary operation,which we will refer to as multiplication. Then, as usual, we can denote the product of x, y G 5 by xy. Let us further assume that the operation is not commutative, so that xy ^ yx in general. The product X\x^ ■ ■ ■ xn+\ of n + 1 elements of 5 in that order is well-defined provided that multiplication is associative, i.e. that (xy)z = x(yz) for all x,y,z £ 5. But let us suppose that multiplication in 5 is not associative. Then a product X1X2 ■ ■ ■ £n+i is defined only after parentheses have been suitably inserted to determine recursively pairs of elements to be multiplied. However, we will refer to the sequence x\x% ■ ■ ■ x„+i without parentheses simply as a product. We shall determine the number of ways a product X1X2 ■ ■ ■ xn+i can be parenthesized. The assumption that the binary operation is noncommutative and nonassociative allows us to interpret this number as the maximum number of different elements of S that can be obtained by parenthesizing the product. However, we could instead consider the x,s to be real numbers and the binary operation to be ordinary multiplication. In this case we are seeking the number of ways that the product x 1X2 ■ ■ ■ xn+i can be computed by successive multiplications of exactly two numbers each time. This was Catalan's original formulation of the problem. Example 8 Find the distinct ways to parenthesize the product 2:1X22:32:4. Solution: Whenever a left parenthesis is closed by a right parenthesis, and we have carried out any products defined by closed pairs of parentheses nested between them, we must have a product of exactly two elements of 5. The ((»l(*3«8))»4)i (»t((*«»l)»0)i ((»l»a)(*3*4))i (j'l (.r,(r,J.,))), (((X1X2)X3)X4). □ Note that we used n = pairs of parentheses in each parenthesized prod-in I in Kxample 8. Although not necessary, it is convenient to include outer parentheses, where the left parenthesis is first and the right parenthesis last. We will now formally define what is meant by parenthesizing a product. Definition 3 A product is well-parenthesized if it can be obtained recur-ively by a finite sequence of the following rules: Each single term x £ S is well-parenthesized. If A and B are well-parenthesized, then (AB) is well-parenthesized. □ Note that a well-parenthesized product, other than a single element of 5, includes outer parentheses. Thus, (xy) is well-parenthesized, but xy is not. We can then use mathematical induction to prove that n pairs of parentheses must be added to 2:12:2 • • -2:0+1 to form a well-parenthesized product. If the n + 1 variables £1,2:2,. ■., 2:„+i are deleted from a well-parenthesized product, the n pairs of parentheses that remain must be a well-formed sequence of parentheses. But not every well-formed sequence of n pairs of parentheses can arise in this way. For example, (( )( )( )) is a well-formed sequence of n = 4 pairs of parentheses, but since the operation is binary, the outer pair of parentheses would call for the undefined product of the three elements of S that are to be computed within the inner parentheses. Example 9 Show by identifying A and B at each step that ((xl(x2x3))(xAx5)) is obtained from the product XiX22;32:42:5 by the recursive definition, and hence is a well-parenthesized product. Solution: X1X2X3X4X5 Xi(X2X3)x4X5 (xi(X2X3))XiX5 (xi(x2X3)){X4X5) {(xl(x2x3))(x4x5)) A = X2, B - x3 A = xi, B = (x2x3) A = 2:4, B = x5 A = (2:1(2:2*3)), B = (zuzs)- □ 124 Applications ot Disi intn Miithiini.itu We will now show that there is a one I......i ...... I......I. in . between well parenthesized products of £j, x2, • • ■, #n+i and mnegiil ivr | >.■ 11■ -. I'm >in the ori gin to the point (2n, 0). A well-parenthesized product forms 11 string p of length 3n + 1 with three types of characters: n left parentheses, n + 1 variables, and n right parentheses. But the slope sequence of the nonnegative paths from the origin to the point (2n,0) has just two different numbers, 1 and —1, with n of each. To obtain a sequence from p with a structure similar to that of the slope sequence, we form a string q — S(p) of length 2n by deleting the last variable x„+i and the n right parentheses. Example 10 Find the string q, if p = ((xi(x2x3))(x4a;5)) is the well-parenthesized product in Example 8. Solution-. Deleting x$ and the right parentheses from p gives q = ((*l(*2*3(*4- □ Let us examine the properties of this string q = 5(p) obtained from a well-parenthesized product p. We will then show that p can be obtained from a string q with these properties. By the way that q was defined, we first note the following. Lemma 1 The string q = S(p) has n left parentheses and the n variables a?i, X2,... ,xn in that order. ■ Since p cannot have a left parenthesis between xn and xn+i, the last character of the string q must be xn. This is in fact implied by a more general property that q satisfies, analogous to the property satisfied by nonnegative paths, which we state in the following lemma. Lemma 2 Let q = S(p), where p is a well-parenthesized product of the variables x\,x2,..., x„+i. For i < 2n, the number of left parentheses in the string ?i92 ' • • <7i>s at least as large as the number of variables. Proof: We will prove the lemma by induction on n, If n — 1, then we must have p = (X1X2), so q = (xi and the conclusion holds. Suppose n > 2 and that the conclusion holds whenever q is obtained from a well-parenthesized product p of k < n variables. Let p be a well-parenthesized product of n + 1 variables. From the recursive definition, p = (^45) is the product of two nonempty well-parenthesized products, A and B. The first left parenthesis in p is placed before AB, so the number of left parentheses up to each character of p preceding B is one more than in A. But if Xk is the 1 h ,,,<„, ■ 1 ,,t.,l.m Niiiiilmn. 125 I , i v Ii i; 11 > I < - 111 A, linn it dum mil n|i|>< hi in ,*>'( I), lull dorrt appear following i In q .s'(p). Thus iIn 'Iill. 1 • 11.1 between the number of left parentheses mm.I I lie number of variablen in <| exeeedn Hie corresponding difference in S(A) ii. until Xk is reached, where it becomes zero. The differences at each I. .1 11 ter <>f li arc then the same as they are at. that character in p. Since A is II parenthesized product of ifc < n variables, it then follows by the inductive 1.. p. .1 Insis that the number of left parentheses in qiq2 for i < 2n is at least . large ;us the number of variables. ■ We say that a string q satisfying Lemmas 1 and 2 is suitable. Theorem 4 There is a one-to-one correspondence between the set of suitable ■11 ings of length 2n and the set of well-parenthesized products of length 3n+ 1. / 'tool: We will show that the function S is a one-to-one correspondence. Let q I., a suitable string. We need to show that we can reconstruct the well-I'.iri'iithesized product p such that q = 5(p). First we adjoin x„+i to the 1 irlit of q and call the new string q' = qxn+i. Then by Lemma 1 we have the I. illnwing. Lemma 1' The string q' has n left parentheses and the n + 1 variables r 1, x'2,..., xn+i in that order. Since q and q' agree in the first 2n positions, we will denote the character of q' in position i by 2 and the theorem is true with n — 1 replacing n. By Lemmas 1 and 2 the last two characters of q are either xn-ixn or (xn, so the last three characters of q' are either a:n_ia:nxn+i or (xnxn+i- By Lemma 1', in either case q' will have three consecutive characters of the form (xjXj+i for some j > 1. Let j\ be the minimum such j. When right parentheses are inserted in q' to form a well-parenthesized product, a right parenthesis must immediately follow (xj^j^i, so p would contain (xj1Xj1+x). Replace (xjlXjl+\) by a new variable j/i in q' to form a string qj. Then qi has n— 1 left parentheses and n variables. Further, q^ satisfies the conclusion of Lemma 1' with n— 1 replacing n. Then by our inductive hypothesis, the well-parenthesized product pi can be recovered. Substituting (xjiXj1+\) for t/i in pj gives a well-parenthesized product p such that q = 5(p). Since S is one-to-one and can be inverted, it is a one-to-one correspondence. ■ 126 Applications ot Dmomto Mathoimtlim Example 11 Recover the will parenthesi/i • I i.....Iin I \< lioin the suitable string q = {xi((x2((x3x4x6 so that q = 5(p). Solution: We form q' = (xi((x2((x3x4x5x6 by adding x6 to the right of q. Then locate at each stage j the first occurrence of a left parenthesis immediately followed by two variables, and replace this string of length three by the new variable j/j equal to this string with a right parenthesis added on the right as a fourth character. ' = (xi((x2((x3X4X5X6 = (xi((x2{yix5xe - {x\((x2y2x6 = (xi(y3x6 = (xiVi = 2/5 2/1 = {x3x4) 2/2 = (2/1*5) 2/3 = (x2y2) 2/4 = (2/3*6) 2/5 = (xm) Then, on setting p = y5 and successively substituting for the 2/5,2/4, • • ■, 2/1 their values on the right we obtain new variables P = 2/5 = (z12/4) = (xi(y3x6)) - [xi({x2y2)xe)) = (xi{(x2(yixs))xs)) = (xxdx^xsx^x^xe)). D Corollary 1 The number of well-parenthesized products of n + 1 variables is the Catalan number c„. Proof: By Theorem 4, each well-parenthesized sequence from x\x2 ■ ■ ■ xn+i corresponds to a suitable string q of length 2n with n left parentheses and the n variables xi,x2,... ,xn. Define a sequence z — (z\,z2,..., z2n) by Zi={1 \-l ifg.i is a left parenthesis is a variable xj. Then it follows from Lemma 2 that the partial sums Si of z are nonnegative. LentwTT mg \°VS " TnegatiVG Path fr°m (°'°) to (2"'°)- Consequently, by Theorem 1, the number of well-parenthesized products is c„ ■ . /, .,.(,., ( .ü.tl.tn Niimbois 127 Full Binary Trees Urcn.ll (see Section H.I of Discrete MnUirinntics and Its Applications, Second I .lil.ion, by Rosen) tlial a full l>iunry tree is a rooted tree in which each internal vertex has exactly two children. Thus, a full binary tree with n in-liiiial vertices has In edges. Since a tree has one more vertex than it has rdges, a full binary tree T with n internal vertices has 2n + 1 vertices, and Urns n + 1 leaves. Suppose we label the leaves of T as they are encountered along a transversal (preorder, postorder, or inorder; see Section 8.1 of Discrete Mathematics and Its Applications) with x\, x2,..., xn+\. Then T recursively defines a well-parenthesized product of Xi,x2, ■ ■ ■ ,*n+i by the following rule. Labeling rule: If v is an internal vertex with left child a and right child 6, having labels A and B, respectively, then label v with (AB). The label on the root of the tree will be the well-parenthesized product. Conversely, given a well-parenthesized product of n + 1 variables x\, x2,..,, Bn+1, a labeled full binary tree is determined by first labeling the root with the well-parenthesized product, then moving from the outer parentheses inward by adding two children labeled A and B to each vertex v with label (AB). The haves of the tree will be labeled with the variables X\,x2,..., xn+i in the order encountered by a traversal. Consequently, there is a one-to-one correspondence between the well-parenthesized products of n + 1 variables and the full binary trees with n + 1 leaves and n internal vertices. By Theorem 4 we therefore have the following result. Theorem 5 The number of full binary trees with n internal vertices is the Catalan number cn. ■ Example 12 Draw and label the full binary tree defined by the well-parenthesized product ((1(23))(45)). Solution: The full binary tree is shown in Figure 4. ((1(23))(45)) Figure 4. Full binary tree obtained from ((1(23))(45)). Application.', ol Hi:, into Milllw atiiliin Ntimtwiit Triangulations of a Convex Polygon In this section we consider a geometric interpretation of the ('atalan numbers, An n-gon (n > 3) in the plane is a polygon P with n vertices and n sides. I-el vq, vi,..., i>„_i be the vertices (in counterclockwise order). Let us denote; by V{Vj the line segment joining Vi and Vj. Then the n sides of P are s< = for 1 < i < n — 1 and «o = vnvo- A diagonal of P is a line segment V{Vj joining two nonadjacent vertices of P. An n-gon P is convex if every diagonal lies wholly in the interior of P. Let D be a set of diagonals, no two of which meet in the interior of a convex n-gon P, that partitions the interior of P into triangles. The sides of the triangles are either diagonals in D or sides of P. The set T of triangles obtained in this way is called a triangulation of P. Three triangulations of a convex hexagon are shown in Figure 5. (a) (b) (c) Figure 5. Three triangulations of a convex hexagon. We shall next determine the number of diagonals needed to form a triangulation. Lemma 3 A triangulation of a convex n-gon has n — 2 triangles determined by n — 3 diagonals. Proof: We shall argue by induction on n. If n = 3 there is one triangle and no diagonals, while if n = 4 there are two triangles and one diagonal. Assume that n > 5 and that the conclusion holds for triangulations of a convex m-gon with 3 < m < n. Let P be a convex n-gon and let T be a triangulation of P with diagonal set D. Since n > 3 there must be at least one diagonal in D, say voVk, where 2 < k < n — 2. The diagonal voVk of P then serves jointly as a side of the convex (k + l)-gon Pi with vertices v0, vi,..., and the convex (n — k + l)-gon P^v/ith vertices vo,Vk,Vk+i ...,v„_i. Triangulations Tj and T2 of these two convex polygons are defined by subsets D\ and D2, respectively, of the set D of diagonals of P other than v^Vk- Since k+ 1 < n and n — + 1 < n, we may apply the inductive hypothesis. Then T\ and T2 Imvc * I and n — k — 1 triimgli'n ilellm I h\ k — 2 and n — Jb — 2 diagonals, li | u-i Lively. Adding Lhe miiiilierN ol 11 i.hikI'n nive« (ifc-l) + (n — k— l) = n — 2 •I.■-•! iii '/'. We add I (for e,,e( ) to Lhe Miliii (k — 2) + (n — k — 2) = n — 4 of 11« i.....ibera of diagonals to get n — 3 diagonals in D. ■ 11 i.s convenient now to assume that P is a convex (n + 2)-gon for n > 1, With vertices fo, fi> • • • > wn+i and sides »o.*i) • • • .*n+i- Then, by Lemma 3, • vi i v triangulation of P is defined by n — 1 diagonals and has n triangles. bet /,„ be the number of triangulations of a convex (n + 2)-gon P. Then h |y ti — 1 and t2 = 2. I xample 13 Draw all the triangulations of a convex pentagon to show that la = 5. Solution: Since n = 3, each triangulation has three triangles defined by two diagonals. If ViVj is one of the diagonals, the other diagonal must meet w< • i Thus each of the two diagonals must join the common endpoint to a Ronadjacent vertex. We find the five triangulations shown in Figure 6. □ Figure 6. The triangulations of a convex pentagon. Consider the string sis2 ■ ■ -sn+i formed by taking the n + 1 sides of P other than sq in order around P. We shall show that the product SiS2 ■ ■ • sn+i is well-parenthesized by a triangulation of P. One side, so, is excluded above in order that the remaining sides form an open path in the plane. The sides, taken in the order of the path, then form a sequence analogous to the product of n + 1 variables considered earlier. Let T be a triangulation of P defined by a set D of n — 1 diagonals, where n > 4. Each side s,- of P is a side of exactly one of the n triangles of T. There are two types of triangles in T that contain sides of P: (i) The three sides of an outer triangle are two adjacent sides s;,s,-+i of P and the diagonal t;,-_ii;i+i of D. (ii) The three sides of an inner triangle are a side Si s Vi-iVi of P and the two diagonals Vi„\Vj, V{Vj of D for some vertex Vj, with jf^i-2,1 + 1. For example, hexagons (a) and (b) in Figure 5 each have two outer tri- 130 Applications otDlacroto Mmhont.iii, . angles, while (c) has three. In order to eHUhlmh ....... ..............,,„„„|,.„,... between tnangulations and well-parenthesized producl.n, .....,, ,|„,w |.|.at any tnangulation has an outer triangle not having s0 as a side, Lemma 4 If n > 2, every triangulation T of a convex (n + 2)-gon P has at least two outer triangles. Proof: Suppose that at most one of the n triangles of T has two sides of P as sides. Let n; be the number of sides of P that are sides of the ith triangle. Then n,- = 1 for at least n — 1 triangles and n-i — 2 for at most one. When we sum these n numbers n,-, we conclude that P has at most n + 1 sides. But P has n + 2 sides, so we have a contradiction. Thus P must have at least two outer triangles. Theorem 6 The number of triangulations of a convex (n + 2)-gon is the Catalan number cn. Proof: By Corollary 1, it will suffice to establish a one-to-one correspondence between triangulations of a convex (n + 2)-gon P and well-parenthesized products of the n + 1 sides of P other than sq. Let n > 2 and assume such a one-to-one correspondence exists for (n + 1)-gons. Let T be a triangulation of the convex (n + 2)-gon P. Since every side of P is a side of exactly one triangle of T, and by Lemma 4 there are at least two outer triangles, there must be an outer triangle in T that does not have So as a side. This triangle has vertices Vi-\, Vi, i>j+i for some i, 1 < i < n, so has as sides the two sides Si,«i+i of P and the diagonal w,-_it;,-+i in D. Label this diagonal with If we delete vertex V{ and replace sides s,-,s,+i by the diagonal labeled (s,s,+i), we have a convex (n + l)-gon P'. By the inductive hypothesis we can establish a one-to-one correspondence between triangulations T' of P' and well-parenthesized products of the sides of P' other than sq. But one of the sides of P' is labeled with the well-parenthesized product (sjSj+i) of two sides of P. Thus the well-parenthesized product of the sides of P' represents a well-parenthesized product of the sides of P. Conversely, each innermost pair of parentheses in a well-parenthesized product of the sides of P other than So indicates that the two sides within that pair are in an outer triangle. Then the diagonal completing the outer triangle must be included in D. Each closing of parentheses acts in this way to add diagonals that complete outer triangles on the reduced polygon until a triangulation of P is obtained. 131 Summary of Objects Counted by the Catalan Numbers I he one-to-one correspondences we have established between the sets of trian-|ulations of a convex (n + 2)-gon, welI parenthesized products of n+1 variables, VVi II formed sequences of n pairs of parentheses, stack permutations of 12---n, mil nonnegative paths from the origin to the point (2n,0) are illustrated in I i run- 7 for the case n — 4. The side so in the hexagon that does not correspond to a variable in the ■' i responding well-parenthesized product of n+1 variables is shown as a dashed line segment. For clarity, each of the other sides, s,-, which corresponds to a Variable in the corresponding well-parenthesized product, is labeled simply i. The Generating Function of the Catalan Numbers We started by defining the Catalan number cn by means of a formula, and we then showed by a combinatorial argument that it enumerates nonnegative paths in the plane. We subsequently found one-to-one correspondences between several different types of combinatorial structures, starting with the nonnegative paths. It followed that the number of structures of each type must be equal to the number c„ of nonnegative paths. Having established the one-to-one correspondences, the same conclusion would follow if we showed (combinatorially or otherwise) that the number of structures of any particular type is given by 1 n+ 1 C(2n,n). (1) In this section we will obtain (1) as the number of well-parenthesized products of n+1 variables using recurrence relations and generating functions (see Chapter 5 and Appendix 3 of Discrete Mathematics and Its Applications). In the next section we will investigate the behavior of the sequence {c„} for large values of n. A sequence {an} = arj,ai,... satisfies a linear homogeneous recurrence relation of degree k with constant coefficients if each term an for n > k can be computed recursively from the previous k terms by means of an = C\an-\ + C2a„_2 -I-----h Ckan-k (2) for some constants Q, 1 < i < k, with Ck ^ 0. Any sequence satisfying (2) is completely determined by its k initial values. The generating function of a sequence {an} is the power series 00 A{x) = ^anxn. n = 0 132 Al))>ln .Hi,,,,: 1)1 Pit., Mill,,,,,,.,!,, : y\3 «12)(3(45))) (())()() 1243 «12X(34)5» (())(()) 2143 ((1(23))(45)) (()())() 1423 («12)3X45» ((()))() 1432 ((1(2(34)))5) (000) 4123 ((1((23)4))S) (0(0)) 4213 («12)(34))5) ((0)0) 4132 (((1(23))4)5) ((()())) 4312 ((((12)3)4)5) (((()))) 4321 Figure 7. One-to-one correspondences with n = 4 between the sor. of tnangulatzons of a convex hexagon, well-parentheszed^product nelatrpttr^8 °f — P~ions, Ld non-' l^rffiUenC! SaJiSfieS a,Hnear homo^ous recurrence relation with con-tunction of x. The polynom.al in the numerator depends only on the k initial values, while the polynomial in the denominator depends only on the "fur- • /i ■ ,i i/.im Niiinlmni 133 i. In..... Once the roots > • ! thi polj........id iii IIh di in >n una tor are found (••i < minted by an algorithm), I In val.....I iiny t'-r.....I the sequence can be ulitaiiied hy means of partial Inn I...... and known power series. Suppose we define a„ to he the iiumher of well-parenthesized products of n i.ihloH. We showed earlier tliat «„+i is the Catalan number c„ given by (1). Mill let us find a„ using recurrence relations and generating functions. To form ll-parenthesized product of n > 2 variables X\, x2, • • ■,xn by the recursive lli huition, the outer parentheses would be the last ones added to form (AB), Where A and B are well-parenthesized products, each having at least one of the v.u lables. The outer parentheses then would enclose AB, where A is one of the n, well-parenthesized products of the first i variables and B is one of the a„_,-well-parenthesized products of the last n — i variables, for some i satisfying 1 < i < n — 1. Then, by the product rule and the sum rule, we obtain n-l an = axan_i + a2a„_2 +----h an-i<*i = ^ a>an-i- (3) «=i Note that (3) is a homogeneous recurrence relation with constant coefficients. However, it is not linear, but quadratic. In addition, since the number of previous terms that determine an depends on n, and the degree of a recurrence relation is defined only when this number is some fixed integer k, the degree is not defined in (3). If we define ao = 0, then the terms aoa„ and anao can be added to the right-hand side of (3) without changing its value, and we obtain an = a0an + a,ian-\ +----1- anaQ = a,an_j. (4) >=o The right-hand side of (4) is zero for n < 1, but for n > 2 it is the coefficient of xn in the square of the generating function A(x) = Yli=oanxn ■ Since a\ = 1, it follows that A\x) =A(x)-x, which is a quadratic equation in A(x). By the quadratic formula, we obtain as solutions A(x) = -(1 ± vT^Ti). (5) Since A(0) = ao = 0, and when we set x = 0 the right-hand side of (5) with the plus sign is 1, we must take the minus sign. Thus the generating function of the sequence an is A(x)=1-(l-VT=4x~). (6) Using a generalization of the binomial theorem, it can be shown that VT=4x~ = (\ - Ax)1'2 = l + ^(-l)"ili!i4"x", (7) 71. n= 1 134 Applications ot Discrani Muthomatics 135 where (^)„ is the falling factorial function (x)„ = x(x— I) • • ■ (fl—n+1) evaluated at x — 1/2. The terms in the sum in (7) can be simplified after writing 4" as 2"2", carrying out the multiplication of = {\){\ — 1) ■ • • (3 — n + 1) by (-l)n2n termwise, writing the remaining factor 2n as 2(2n_1)(n - l)!/(n - I )t, and noting that 2n_1(n - 1)! = 2 • 4 • • • (2n - 2). We then obtain 00 1 VI - 4a; = 1 - 2 ]T - C(2n - 2, n - 1) n=i " From (6) and (8) we obtain the generating function 00 A{x) = 2J - C{2n - 2,n - 1)xn. («) (9) n = l Thus the coefficient of xn+1 in (9) is a„+i = ^ C(2n,n), which by (1) is the Catalan number c„. Similar methods could be used to show that the sizes of the other structures considered are Catalan numbers. It would suffice to show that the sequence satisfied the recurrence relation (4) and that the initial values agree, perhaps after a shift. Asymptotic Behavior of the Catalan Numbers Let us consider the behavior of the sequence {c„} of Catalan numbers for large values of n. In the previous section we let an be the number of well-parenthesized products of n variables and showed that the sequence {a„} satisfies the recurrence relation (3). Using the generating function A(x) of the sequence {an}, we found that a„+i is equal to the Catalan number c„ given by (1). If we substitute Cj for aJ+1 in (3) and adjust the range of the index variable i, we see that the sequence {c„} satisfies the recurrence relation n-l Cn = C0Cn_i + C!Cn_2 -\-----h C„_iC0 = 2_J cicn-i- (10) 1=0 with Co = 1. This is a quadratic homogeneous recurrence relation with constant coefficients, but with the degree undefined. The asymptotic behavior of the sequence {cn} can be more easily found by showing that it satisfies a second homogeneous recurrence relation, one that is linear of degree one but with a variable coefficient. The linear recurrence relation could have been found earlier using Definition 1 of the Catalan numbers, but we will find it using the solution given by (1) to the quadratic recurrence relation (10). A linear homogeneous recurrence relation of degree one with a constant coefficient has the form an = Can-\. On iterating this recurrence relation n— 1 ...... , we olilain the l'< hi 1 in In ii,, ( '"iin Suppose that instead of making our mill definition of c„, wlmli in the rmme as (1), we had defined cn as the ■ 1111111>>'r of well-parenthesized products of »1 + 1 variables. Then we would find 1 in I In- last section, that the sequence {c,,} satisfies the quadratic recurrence o I ition (10), and that the solution is given by (1). On using the formula for tin binomial coefficient, it can be then be shown (see Exercise 6) that 4n-2 n+ 1 c„-i, (11) I hat the constant coefficient C is replaced by a variable coefficient 4n - 2 6 C{n) = n + l = 4- n+ 1 ('learly C(n) increases to 4 as a limit as n —► 00. However, this does not imply that cn can be approximated by a constant multiple of 4". That would be the 1 use if fact C(n) was identically equal to the constant C — 4. However, on using the familiar expression for a binomial coefficient involving three factorials, and replacing each of those factorials by an approximate value, we can approximate c„, and use this approximation to find a function /(») with a relatively simple form such that c„ = 0(/(n)). The simplest form of Stirling's approximation s„ of n! is given by (12) Using this approximation, it can be shown (see Exercise 7) that cn = 0(n"3/24"). (l3) Suggested Readings 1. K. Bogart, Introductory Combinatorics, Harcourt, Brace, Jovanovich, New York, 1990. 2. L. Comptet, Advanced Combinatorics, D. Reidel, Boston, 1974. 3. W. Feller, An Introduction to Probability Theory and Its Applications, Second Edition, Wiley, 1961. 4. F. Roberts, Applied Combinatorics, Prentice-Hall, Englewood Cliffs, N.J., 1984. 136 Afipli, .ili,;r. „ll)i!., i„1,1 M.ilh,i,n.,l, Exercises In Exercises 1-4 find the structures that correspond to the given structural under the one-to-one correspondences established. 1. Given the sequence (+-|---1- —|----|-H---) of ±ls with nonnegative partial sums, find or draw the corresponding a) well-formed sequence of six pairs of parentheses b) nonnegative path from the origin to (12,0) c) stack permutation of 123456. 2. Given the sequence (+H---1----M---) of ±ls with nonnegative partial sums, find or draw the corresponding a) well-parenthesized product of six variables b) full binary tree with six leaves whose vertices are labeled with well-parenthesized products c) triangulation of a convex septagon. 3. Given the following triangulation of a convex octagon, find or draw the corresponding a) well-parenthesized product of seven variables b) sequence of twelve ±ls with nonnegative partial sums c) nonnegative path from the origin to (12,0) d) stack permutation of 123456. 4. Given the well-parenthesized product ((1(23))((45)6)) of six variables (with X{ denoted by i), find or draw the corresponding a) sequence of ten ±ls with nonnegative partial sums b) stack permutation of 12345 c) triangulation of a convex septagon. 5. Find the sequence of ten pushes (+) and pops (—) that produces the stack permutation 42135, and draw the corresponding nonnegative path. i h,,rl„i ■ i NtinihoiH 13/ i. i 'luve that the ('atalan iiuiiiIht» <•„ mil ml\ i li< i<■■ in reme relation (II). 7. Use the Stirling approximation (12) I'm ri! I.<> prove that the (Catalan number <■„ satisfies (13). In lixercises 8 10, suppose that '/' is a triangulation of a convex (n + 2)-gon /' with diagonal set D. Let vqVi ■ ■ -vn+i be the vertices of P in order, and |at Hi be the side for 1 < i < n + 1, s0 = v0v„+\. Denote by p the corresponding well-parenthesized product SiS2 ■ ■ ■ sn+i of its sides other I lian So- H. Prove that the diagonal ViVj is in the diagonal set D if and only if the product Sj+iSj+2 "'Sj is well-parenthesized in p. D, Let 1 < Jfc < n — 1. Prove that the nonnegative path corresponding to T meets the a;-axis at the point (2k, 0) if and only if D contains the diagonal VkVn + l. 1(1. Prove that the nonnegative path corresponding to T is positive if and only if D contains the diagonal vavn. Computer Projects 1. Given a positive integer n as input, find the Catalan number c„ using a) the recurrence relation (10). b) the recurrence relation (11). 2. Given positive integers n and N, find JV random sequences Xj = {x,j} of n Is and n —Is, compute the corresponding sequences Sj — of partial sums S{j — x\j + X2j + ■ ■ ■ + Xij, and let a, 6 be the number of positive, nonnegative sequences Sj, respectively. Compute the ratios a) a/C(2n, n). b) b/C(2n,n). c) a/b. 3. Given a positive integer n, find a random sequence x of n Is and n —Is that has a nonnegative sequence s of partial sums. Using s, produce the corresponding a) well-formed sequence of parentheses. b) stack permutation. c) well-parenthesized product. d) full binary tree (graphics). e) triangulation of a convex polygon (graphics).