Notes on the Catalan problem
Daniele Paolo Scarpazza
Politecnico di Milano <scarpazz@elet.polimi.it>
March 16th, 2004

An overview of Catalan problems

• Catalan numbers appear as the solution of a variety of problems;
• they were first described in the 18th century by Leonhard Euler (working on polygon triangulation);
• they are named after Eugène Catalan, a Belgian mathematician who found their expression (working on parenthesizations).

A Catalan Problem: Balanced Parentheses

"Determine the number of balanced strings of parentheses of length 2n."

A string of parentheses is an ordered sequence of symbols "(" and ")".
Balanced: same number of open and close parentheses, and every prefix of the string has at least as many open parentheses as close parentheses.
Example: ()(()()) is balanced; strings )(()) and (()() are not.

n   C(n)   balanced strings
0   1      (empty string)
1   1      ()
2   2      ()() (())
3   5      ()()() ((())) ()(()) (())() (()())
4   14     ()()()() ()((())) (()()()) ((()())) ()()(()) (())()() (()(()))
           (((()))) ()(())() (())(()) ((()))() ()(()()) (()())() ((())())
5   42     ()()()()() ()()((())) ()(()()()) ()((()())) (())(())() (()())(())
           (()(()))() ((()))()() ((())(())) ((()(()))) (((()()))) ()()()(())
           ()(())()() ()(()(())) ()(((()))) (())(()()) (()()())() (()(())())
           ((()))(()) ((()()))() (((())))() ((((())))) ()()(())() ()(())(())
           ()((()))() (())()()() (())((())) (()()()()) (()(()())) ((())())()
           ((()())()) (((()))()) ()()(()()) ()(()())() ()((())()) (())()(())
           (()())()() (()()(())) (()((()))) ((())()()) ((()()())) (((())()))

Another one: Mountain Ranges

"Determine the number of mountain landscapes which can be formed with n upstrokes and n downstrokes."

Mountain range: polyline of upstrokes "/" and downstrokes "\"; its extreme points lie on the same horizontal line, and no segments cross it.

n = 0, C(n) = 1:

    *

n = 1, C(n) = 1:

    /\

n = 2, C(n) = 2:

             /\
    /\/\    /  \

n = 3, C(n) = 5:

                                          /\
                /\     /\       /\/\     /  \
    /\/\/\   /\/  \   /  \/\   /    \   /    \

Another one: Diagonal-avoiding paths on a lattice

"Given an n × n lattice, determine the number of paths of length 2n which do not cross the diagonal."

In a finite lattice {(i, j) : 1 ≤ i ≤ n, 1 ≤ j ≤ n}, a path is a connected sequence of "west" or "south" segments from node (1, 1) to node (n, n).

[Figure: sample path in a 7 × 7 lattice, corresponding to string (())(((()))()).]

Another one: Multiplication precedence

"Determine the number of ways in which n + 1 factors can be multiplied together, according to the precedence of multiplications."

n   C(n)   products
0   1      a
1   1      a · b
2   2      (a · b) · c    a · (b · c)
3   5      ((a · b) · c) · d    (a · (b · c)) · d    a · (b · (c · d))
           (a · b) · (c · d)    a · ((b · c) · d)
4   14     (((a · b) · c) · d) · e    ((a · b) · (c · d)) · e    (a · b) · (c · (d · e))
           (a · (b · c)) · (d · e)    (a · (b · (c · d))) · e    a · ((b · c) · (d · e))
           a · (b · ((c · d) · e))    ((a · b) · c) · (d · e)    (a · b) · ((c · d) · e)
           ((a · (b · c)) · d) · e    (a · ((b · c) · d)) · e    a · (((b · c) · d) · e)
           a · ((b · (c · d)) · e)    a · (b · (c · (d · e)))

Another one: Convex polygon triangulation

"Determine the number of ways in which a convex polygon with n + 2 edges can be triangulated."

Another one: Handshakes across a table

"Determine the number of ways in which 2n people sitting around a table can shake hands without crossing their arms."

Another one: Binary rooted trees

"Determine the number of binary rooted trees with n internal nodes."

Each non-leaf is an internal node.
[Figure: binary rooted trees with n internal nodes, for n ranging from 0 to 3.]

Another one: Plane trees

"Determine the number of plane trees with n edges."

A plane tree is a tree that can be drawn on a plane with no edges crossing each other.

Another one: Skew Polyominos

"Determine the number of skew polyominos with perimeter 2n + 2."

Polyomino: figure composed of squares connected by their edges.
Skew polyomino: successive columns of squares from left to right increase in height: the bottom of the column to the left is always lower than or equal to the bottom of the column to the right. Similarly, the top of the column to the left is always lower than or equal to the top of the column to the right.

Derivations of the Catalan numbers

Catalan numbers derived with generating functions

Any string containing n > 0 pairs of parentheses can be decomposed as

    (A)B

where A and B are themselves balanced and, if A contains k pairs of parentheses, B must contain n − k − 1. All configurations of n parenthesis pairs are the ones where A is empty and B contains n − 1 pairs, plus the ones where A contains 1 pair and B contains n − 2, and so on:

\[
\begin{aligned}
C_1 &= C_0 C_0 \\
C_2 &= C_0 C_1 + C_1 C_0 \\
C_3 &= C_0 C_2 + C_1 C_1 + C_2 C_0 \\
C_4 &= C_0 C_3 + C_1 C_2 + C_2 C_1 + C_3 C_0 \\
&\;\;\vdots
\end{aligned}
\]

which can be rewritten in the form of a recurrence relation:

\[
C_0 = 1, \qquad C_1 = 1, \qquad C_n = \sum_{i=0}^{n-1} C_i C_{n-1-i}.
\]

We will now solve the above recurrence with the use of generating functions.

\[
C(x) = C_0 + C_1 x + C_2 x^2 + \dots = \sum_{i=0}^{+\infty} C_i x^i
\]

Let us now examine the expression of $[C(x)]^2 = C(x)C(x)$. By the recurrence, each coefficient of the square is the next Catalan number:

\[
[C(x)]^2 = \underbrace{C_0 C_0}_{C_1} + \underbrace{(C_0 C_1 + C_1 C_0)}_{C_2}\, x + \underbrace{(C_0 C_2 + C_1 C_1 + C_2 C_0)}_{C_3}\, x^2 + \dots
\]

This is still a generating function with Catalan coefficients, shifted one position left:

\[
[C(x)]^2 = C_1 + C_2 x + C_3 x^2 + \dots = \sum_{i=0}^{+\infty} C_{i+1} x^i .
\]

Therefore, if we multiply the whole series by x and add $C_0$, the original Catalan series is obtained:

\[
C(x) = C_0 + x [C(x)]^2 .
\]

This is a quadratic equation, which can be put into the more familiar form

\[
x C^2 - C + C_0 = 0,
\]

where C is the unknown and x, $C_0$ are constant coefficients. Replacing $C_0$ with its value (i.e., 1), the solution is given by:

\[
C = \frac{1 \pm \sqrt{1 - 4x}}{2x} .
\]

Only the − solution is acceptable, since $C(0) = C_0 = 1$ must be finite, while the + solution diverges as x → 0:

\[
C = \frac{1 - \sqrt{1 - 4x}}{2x} . \tag{1}
\]

The solution contains the power of a binomial with fractional exponent:

\[
\sqrt{1 - 4x} = (1 - 4x)^{1/2} = \sum_{n \ge 0} \binom{1/2}{n} (-4x)^n ,
\]

which can be expanded as:

\[
(1 - 4x)^{1/2} = 1 - \frac{1/2}{1}\, 4x + \frac{(1/2)(-1/2)}{2 \cdot 1} (4x)^2 - \frac{(1/2)(-1/2)(-3/2)}{3 \cdot 2 \cdot 1} (4x)^3 + \frac{(1/2)(-1/2)(-3/2)(-5/2)}{4 \cdot 3 \cdot 2 \cdot 1} (4x)^4 - \dots
\]

which can be simplified as follows (every term after the first is negative):

\[
(1 - 4x)^{1/2} = 1 - \frac{1}{1!}\, 2x - \frac{1}{2!}\, 4x^2 - \frac{3 \cdot 1}{3!}\, 8x^3 - \frac{5 \cdot 3 \cdot 1}{4!}\, 16x^4 - \dots
\]

Now, substituting into (1), we obtain:

\[
C(x) = 1 + \frac{1}{2!}\, 2x + \frac{3 \cdot 1}{3!}\, 4x^2 + \frac{5 \cdot 3 \cdot 1}{4!}\, 8x^3 + \frac{7 \cdot 5 \cdot 3 \cdot 1}{5!}\, 16x^4 + \dots
\]

We can get rid of terms like 7 · 5 · 3 · 1 (factorials missing the even factors), by considering that:

\[
\begin{aligned}
2^2 \cdot 2! &= 4 \cdot 2 \\
2^3 \cdot 3! &= 6 \cdot 4 \cdot 2 \\
2^4 \cdot 4! &= 8 \cdot 6 \cdot 4 \cdot 2 \\
&\;\;\vdots \\
2^n \cdot n! &= \prod_{i=1}^{n} 2i
\end{aligned}
\]

Consequently:

\[
C(x) = 1 + \frac{1}{2} \left( \frac{2!}{1!\,1!} \right) x + \frac{1}{3} \left( \frac{4!}{2!\,2!} \right) x^2 + \frac{1}{4} \left( \frac{6!}{3!\,3!} \right) x^3 + \dots = \sum_{i=0}^{+\infty} \frac{1}{1+i} \binom{2i}{i} x^i
\]

Therefore, the ith Catalan number is:

\[
C_i = \frac{1}{1+i} \binom{2i}{i} .
\]

The simplest proof

Solution based on considerations on the count of diagonal-avoiding paths on a lattice: determining $C_n$ equals counting the total paths through the grid and subtracting the number of invalid ones.
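As a sanity check on the recurrence and the closed form derived above, and on the "total minus invalid" counting idea, the following is a minimal Python sketch. The function names are hypothetical and chosen here for illustration. It compares three independent computations: the recurrence, the closed form, and a brute-force count of balanced strings among all strings with n open and n close parentheses.

```python
from itertools import combinations
from math import comb


def catalan_recurrence(limit):
    """Return [C_0, ..., C_limit] using C_n = sum_{i=0}^{n-1} C_i * C_{n-1-i}."""
    c = [1]
    for n in range(1, limit + 1):
        c.append(sum(c[i] * c[n - 1 - i] for i in range(n)))
    return c


def catalan_closed_form(n):
    """Closed form C_n = binom(2n, n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)


def count_balanced(n):
    """Brute force: count balanced strings among all strings with n '(' and n ')'."""
    count = 0
    # Choose which of the 2n positions hold an open parenthesis.
    for opens in combinations(range(2 * n), n):
        open_positions = set(opens)
        depth = 0
        for pos in range(2 * n):
            depth += 1 if pos in open_positions else -1
            if depth < 0:  # some prefix has more ')' than '(' -> invalid
                break
        else:
            count += 1
    return count


for n, c_n in enumerate(catalan_recurrence(6)):
    total = comb(2 * n, n)  # all strings with n '(' and n ')'
    valid = count_balanced(n)
    print(n, c_n, catalan_closed_form(n), valid, total - valid)
```

The last printed column is the number of invalid strings (equivalently, invalid paths) for each n, obtained here by exhaustive enumeration rather than by a formula.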
[Figure: example of an invalid path.]

P (i, i + 1): first illegal point reached.
Transformation: from point P on, replace S ↔ W segments.

[Figure: an invalid path and its transformed counterpart.]

The transformation starts at point (i, i + 1) and causes the n − i W segments to be replaced by S segments and the remaining n − i − 1 S segments to be replaced by W segments
⇒ new ending coordinates (i + (n − i − 1), (i + 1) + (n − i)) = (n − 1, n + 1).

By construction, every illegal path in an n × n lattice corresponds to exactly one unconstrained path in an (n − 1) × (n + 1) lattice.

Paths in an a × b lattice: $\binom{a+b}{a}$ ⇒ in an n × n lattice: $\binom{2n}{n}$.

Invalid paths in an n × n lattice = paths in an (n − 1) × (n + 1) lattice: $\binom{2n}{n+1}$.

\[
C_n = \binom{2n}{n} - \binom{2n}{n+1} = \binom{2n}{n} - \frac{n}{n+1} \binom{2n}{n} = \frac{1}{n+1} \binom{2n}{n} .
\]

A novel interpretation

A novel calculation of the Catalan numbers, inspired by formal language considerations.

Language of balanced parentheses (Dyck language).

    G = (Σ, N, S, R)
    Σ = {(, )}
    N = {S}
    R = {r1, r2}
    r1 : S → ε
    r2 : S → (S)S

Sentential form: sequence of terminal and nonterminal symbols which can be derived from the start symbol S. Strings are special sentential forms made of terminal symbols only.

Labelling: (n, t)-label for a sentential form containing n nonterminals and t terminals. Strings ∈ L(G) will be labeled (0, 2i), i ∈ N.

Theorem: the number of terminal symbols is even. Proof: the axiom contains no terminals, and both rules preserve parity.

Derivation step: a substitution replacing a single S symbol in a sentential form with the right-hand side of rule r1 or r2.
The form derived from an (n, t) form will have:
• one nonterminal less and the same number of terminals (rule r1 applied);
• one more nonterminal and two more terminals (rule r2 applied).

    (n, t) --r1--> (n−1, t)
    (n, t) --r2--> (n+1, t+2)

Thus, a given (n, t) sentential form derives either from:
• an (n + 1, t) form, through rule r1; or
• an (n − 1, t − 2) form, through rule r2.

    (n+1, t)   --r1--> (n, t)
    (n−1, t−2) --r2--> (n, t)

Through recursive application of the above scheme, all predecessors of a given sentential form can be determined, up to the axiom, which obviously has label (1, 0).

Theorem: the axiom can only have label (1, 0).
Theorem: label (1, 0) corresponds to the axiom only.

In general it is not true that each sentential form has exactly 2 predecessors:
• a (1, 0) form has no predecessor by definition, being the axiom;
• (0, t) and (1, t) forms can only have an (n + 1, t) predecessor. (Proof: by contradiction, the (n − 1, t − 2) predecessor would have zero or fewer nonterminals, therefore it could have no successors.)
• (n, t) forms with n > t do not exist, apart from the axiom (1, 0). (Proof: by induction. For each form (n, t), let δ = t − n. The axiom has δ = −1, and both rules increment δ.)

Each derivation tree starts with a label corresponding to a string, (0, 2i), and reaches leaf nodes which are either axioms or invalid nodes.

Number of axioms contained in the tree of a (0, 2i)-string
= number of different ways in which the grammar derives a (0, 2i)-string
= number of different strings of balanced parentheses of length 2i (since each derivation is unique).

Examples follow; axioms are marked with "!", invalid nodes with "×".

    (0,2)
      (1,2)
        (2,2)
          (1,0)!

Figure 1: Derivation tree for (0, 2), i.e., 2n = 2

    (0,4)
      (1,4)
        (2,4)
          (1,2)
            (2,2)
              (1,0)!
              (3,2)×
          (3,4)
            (2,2)
              (1,0)!
              (3,2)×
            (4,4)
              (3,2)×
              (5,4)×

Figure 2: Derivation tree for (0, 4), i.e., 2n = 4

The count of axiom nodes in each derivation tree is the ith Catalan number. According to the above rules, $C_i$ is the number of axioms in the derivation tree of a (0, t) form, with t = 2i, thus:

\[
R(n, t) =
\begin{cases}
1 & \text{if } n = 1 \wedge t = 0 \\
0 & \text{if } n > t \\
R(n + 1, t) & \text{if } n \le 1 \\
R(n - 1, t - 2) + R(n + 1, t) & \text{otherwise.}
\end{cases}
\]

The cases are evaluated in the order listed. By construction, $C_i = R(0, 2i)$.

The above relation R was implemented in a Tcl function and tested for correctness.

Values of R(n, t) for n = 0, ..., 13 (columns) and t = 0, ..., 24 (rows). Every entry with odd t is 0, so only rows with even t are listed:

t/n       0      1      2      3      4      5      6      7      8      9     10     11     12     13
  0       1      1      0      0      0      0      0      0      0      0      0      0      0      0
  2       1      1      1      0      0      0      0      0      0      0      0      0      0      0
  4       2      2      2      1      0      0      0      0      0      0      0      0      0      0
  6       5      5      5      3      1      0      0      0      0      0      0      0      0      0
  8      14     14     14      9      4      1      0      0      0      0      0      0      0      0
 10      42     42     42     28     14      5      1      0      0      0      0      0      0      0
 12     132    132    132     90     48     20      6      1      0      0      0      0      0      0
 14     429    429    429    297    165     75     27      7      1      0      0      0      0      0
 16    1430   1430   1430   1001    572    275    110     35      8      1      0      0      0      0
 18    4862   4862   4862   3432   2002   1001    429    154     44      9      1      0      0      0
 20   16796  16796  16796  11934   7072   3640   1638    637    208     54     10      1      0      0
 22   58786  58786  58786  41990  25194  13260   6188   2548    910    273     65     11      1      0
 24  208012 208012 208012 149226  90440  48450  23256   9996   3808   1260    350     77     12      1

[Figure: plot of the surface R(n, t). Points with odd values of t were not plotted.]
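The Tcl function mentioned above is not reproduced in these notes. As a minimal sketch, the same relation R can be written in Python. The use of `functools.lru_cache` for memoization and the helper `catalan` are choices made here for illustration, not taken from the original implementation.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def R(n, t):
    """Number of axioms in the predecessor tree of an (n, t) label.

    The cases are tested in the same order as in the definition of R(n, t).
    """
    if n == 1 and t == 0:
        return 1                      # the axiom itself
    if n > t:
        return 0                      # such forms do not exist
    if n <= 1:
        return R(n + 1, t)            # only the r1 predecessor is possible
    return R(n - 1, t - 2) + R(n + 1, t)  # r2 predecessor plus r1 predecessor


def catalan(i):
    """Closed form of the i-th Catalan number, used only for comparison."""
    return comb(2 * i, i) // (i + 1)


# Compare R(0, 2i) with the closed form for the range covered by the table.
for i in range(13):
    print(i, R(0, 2 * i), catalan(i))
```

Calling `R(n, t)` for other arguments, for instance `R(3, 8)`, gives values that can be compared entry by entry with the table.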
[Figure: plot of the surface log R(n, t). Points with odd values of t were not plotted.]

Future Developments

If D is the language defined by S → ε | (S)S, then D can be recursively written as

    D = {ε} + (D)D

where + denotes disjoint set union.

If an alphabet A = {a1, a2, ..., am} is considered, A∗ denotes the language of all the strings over alphabet A, and if α ∈ A∗ is a string over A, |α| denotes the length of α.

For a language A∗, a function $w : A^* \to \mathbb{Z}[[x]]$ can be defined:

\[
w(\alpha) = x^{|\alpha|}, \quad \text{with } \alpha \in A^*,
\]

and we set by convention that w(ε) = 1.

It is trivial to prove that function w exhibits the following property:

\[
\forall \alpha, \beta \in A^*, \quad w(\alpha \cdot \beta) = w(\alpha)\, w(\beta),
\]

where · denotes string concatenation. w can be extended to languages, by defining $w(L) = \sum_{\alpha \in L} w(\alpha)$. Therefore

\[
w(A^*) = \sum_{\alpha \in A^*} w(\alpha) = \sum_{\alpha \in A^*} x^{|\alpha|} = \sum_{n \ge 0} \Big( \sum_{|\alpha| = n} 1 \Big) x^n = \sum_{n \ge 0} m^n x^n = \frac{1}{1 - mx} .
\]

The following equation can be set for language D:

\[
w(D) = 1 + x^2\, w(D)^2
\]

which can be solved by replacing y = w(D), thus obtaining $y = 1 + x^2 y^2$, or in a more familiar form:

\[
x^2 y^2 - y + 1 = 0 .
\]

The solution is given by:

\[
y = \frac{1 - \sqrt{1 - 4x^2}}{2x^2}
\]

It can be shown that:

\[
D(x) = w(D) = D_0 + D_2 x^2 + D_4 x^4 + D_6 x^6 + \dots
\]

where $\forall i \in \mathbb{N}$, $D_{2i} = C_i$ and $D_{2i+1} = 0$, therefore

\[
D(x) = w(D) = C_0 + C_1 x^2 + C_2 x^4 + C_3 x^6 + \dots
\]

Our aim is to extend the above considerations to the language E of sentential forms of grammar G.

Each string in D corresponds to exactly one mountain range; each string in E corresponds to exactly one extended mountain range, which also contains horizontal strokes.
Example: (S)(S)((S)(S(S)))S, and its corresponding extended mountain range:

                 _
            _  _/ \
     _  _  / \/    \
    / \/ \/         \_

The language E of sentential forms of grammar G is a new language, which can be defined by a new grammar H, given as follows:

    H  = (Σ′, N, B, R′)
    Σ′ = {(, ), S}
    N  = {B}
    R′ = {r1, r2, r3}
    r1 : B → ε
    r2 : B → SB
    r3 : B → (B)B

Note: S is a terminal symbol for grammar H. The above language is called the Motzkin language.

Now let us consider the production B → ε | SB | (B)B, and give a recursive definition of E:

    E = {ε} + SE + (E)E.

It is then time to introduce a newer, more useful definition of w(α):

\[
w(\alpha) = x^{p(\alpha)}\, y^{o(\alpha)}\, z^{|\alpha|}
\]

where $p(\alpha) = |\alpha|_{(} + |\alpha|_{)}$ is the number of parentheses in α and $o(\alpha) = |\alpha|_S$ is the number of S symbols, therefore $p(\alpha) + o(\alpha) = |\alpha|$.

From the recursive definition of E, it is possible to set:

\[
w(E) = 1 + yz\, w(E) + x^2 z^2\, w(E)^2,
\]

which, replacing e = w(E), is:

\[
x^2 z^2 e^2 + (yz - 1)\, e + 1 = 0,
\]

which, solved for e, yields:

\[
e = \frac{1 - yz - \sqrt{1 - 2yz + y^2 z^2 - 4x^2 z^2}}{2x^2 z^2} .
\]

Thus e(x, y, z) can be written as a formal power series with coefficients $E_{i,j,k}$:

\[
\begin{aligned}
e(x, y, z) = {} & E_{0,0,0} \\
& + E_{1,0,0}\, x + E_{0,1,0}\, y + E_{0,0,1}\, z \\
& + E_{2,0,0}\, x^2 + E_{0,2,0}\, y^2 + E_{0,0,2}\, z^2 + E_{1,1,0}\, xy + E_{0,1,1}\, yz + E_{1,0,1}\, xz \\
& + \dots
\end{aligned}
\]

It is now evident that the number of sentential forms with n nonterminals and t terminals, previously called R(n, t), is given by:

\[
R(n, t) = E_{t,n,n+t} = [x^t y^n]\, e(x, y, 1),
\]

where the notation [...] has the following meaning:

\[
[x^n] f(x) = f_n \iff f(x) = \sum_{n \ge 0} f_n x^n,
\]

in particular $[x^i y^j z^k]\, e(x, y, z) = E_{i,j,k}$.

Furthermore, an expression of e(x, y, 1) can be obtained by restriction:

\[
e(x, y, 1) = e(x, y, z)\big|_{z=1} = \frac{1 - y - \sqrt{1 - 2y + y^2 - 4x^2}}{2x^2} .
\]

Incidentally, the i-th Catalan number, which was equal to R(0, 2i), can be obtained by setting y = 0, thus:

\[
e(x, 0, 1) = e(x, y, 1)\big|_{y=0} = \frac{1 - \sqrt{1 - 4x^2}}{2x^2},
\]

which is identical to the expression obtained for w(D) and admits the same solutions.

To obtain an expression of R(n, k), we can collect (1 − y) in the numerator and (1 − y)² in the denominator, thus obtaining:

\[
e = \frac{(1 - y) \left( 1 - \sqrt{1 - 4 \frac{x^2}{(1-y)^2}} \right)}{2x^2}
  = \frac{1}{1 - y} \cdot \left. \frac{1 - \sqrt{1 - 4q^2}}{2q^2} \right|_{q = \frac{x}{1-y}},
\]

which can be solved by comparison with the previous case, thus:

\[
e = \frac{1}{1 - y}\, D\!\left( \frac{x}{1 - y} \right)
\]

but since

\[
D(q) = \frac{1 - \sqrt{1 - 4q^2}}{2q^2} = \sum_{k \ge 0} D_k q^k
\]

then

\[
e = \frac{1}{1 - y} \sum_{k \ge 0} D_k \frac{x^k}{(1 - y)^k}
  = \sum_{k \ge 0} D_k \frac{x^k}{(1 - y)^{k+1}}
  = \sum_{k \ge 0} D_k x^k \sum_{n \ge 0} \binom{n + k}{k} y^n
  = \sum_{n, k \ge 0} \binom{n + k}{k} D_k\, x^k y^n
\]

therefore it should be true that:

\[
R(n, k) =
\begin{cases}
0 & \text{if } k \text{ odd} \\
\binom{n + k}{k}\, C_{k/2} & \text{if } k \text{ even}
\end{cases}
\]

Problem: the above equation is incorrect!

Cause: the real language E of sentential forms of G is smaller than L(H);

Proof: SS, SSS, ..., S(), S(S), ... ∈ L(H) − E.

Thus: the above work needs to be redone.

Our current efforts are devoted to finding a new correct and unambiguous grammar for language E, obtaining an appropriate recursive definition of E and a corresponding equation for w(E), which, once solved, would yield a formal power series for w(E) and thus a closed form for the R(n, t) numbers.
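The gap between L(H) and the sentential forms of G can be observed by brute force on small cases. The following Python sketch enumerates both sets directly from the two grammars; all function names are hypothetical and introduced here for illustration. It relies on the fact that the words of L(H) are exactly the balanced strings of parentheses with S symbols inserted anywhere.

```python
from itertools import product
from math import comb


def catalan(i):
    return comb(2 * i, i) // (i + 1)


def sentential_forms(max_terminals):
    """All sentential forms of G (S -> eps | (S)S) with at most max_terminals parentheses."""
    seen = {"S"}
    frontier = ["S"]
    while frontier:
        form = frontier.pop()
        for pos, symbol in enumerate(form):
            if symbol != "S":
                continue
            for rhs in ("", "(S)S"):  # rule r1, rule r2
                derived = form[:pos] + rhs + form[pos + 1:]
                terminals = len(derived) - derived.count("S")
                if terminals <= max_terminals and derived not in seen:
                    seen.add(derived)
                    frontier.append(derived)
    return seen


def words_of_H(max_length):
    """All words of L(H) (B -> eps | SB | (B)B) of length at most max_length."""
    words = set()
    for length in range(max_length + 1):
        for candidate in product("()S", repeat=length):
            depth = 0
            for symbol in candidate:
                depth += {"(": 1, ")": -1, "S": 0}[symbol]
                if depth < 0:
                    break
            else:
                if depth == 0:
                    words.add("".join(candidate))
    return words


def label(form):
    """(n, t) label: n occurrences of S, t parentheses."""
    n = form.count("S")
    return n, len(form) - n


forms = sentential_forms(4)
words = words_of_H(4)
only_in_H = sorted(w for w in words if w not in forms)
count_H = sum(1 for w in words if label(w) == (2, 2))
count_E = sum(1 for f in forms if label(f) == (2, 2))
print(only_in_H[:5])
print(count_H, comb(2 + 2, 2) * catalan(1), count_E)
```

For label (2, 2), six words of L(H) match, which agrees with the binomial expression, while the only sentential form of G with that label is (S)S.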
