# Scarpazza - Notes on the Catalan Problem - slides 2004-03 ```Notes on the Catalan problem
Daniele Paolo Scarpazza
Politecnico di Milano
<scarpazz@elet.polimi.it>
March 16th, 2004

An overview of Catalan problems
• Catalan numbers appear as the solution of a variety of problems;
• they were first described in the 18th century by Leonhard Euler (working
on polygon triangulation);
• they are named after Eugène Catalan, a Belgian mathematician who
found their expression (working on parenthesizations).

A Catalan Problem: Balanced Parentheses
“Determine the number of balanced strings of parentheses of length 2n”.
A string of parentheses is an ordered collection of symbols “(” and “)”.
Balanced: same number of open and close parentheses, and every prefix
of the string has at least as many open parentheses as close parentheses;
Example: ()(()()) is balanced; strings )(()) and (()() are not.
  n   C(n)   balanced strings of length 2n
  0      1   (empty string)
  1      1   ()
  2      2   ()()  (())
  3      5   ()()()  ((()))  ()(())  (())()  (()())
  4     14   ()()()()  ()((()))  (()()())  ((()()))  ()()(())
             (())()()  (()(()))  (((())))  ()(())()  (())(())
             ((()))()  ()(()())  (()())()  ((())())
  5     42   ()()()()()  ()()((()))  ()(()()())  ()((()()))  (())(())()  (()())(())
             (()(()))()  ((()))()()  ((())(()))  ((()(())))  (((()())))  ()()()(())
             ()(())()()  ()(()(()))  ()(((())))  (())(()())  (()()())()  (()(())())
             ((()))(())  ((()()))()  (((())))()  ((((()))))  ()()(())()  ()(())(())
             ()((()))()  (())()()()  (())((()))  (()()()())  (()(()()))  ((())())()
             ((()())())  (((()))())  ()()(()())  ()(()())()  ()((())())  (())()(())
             (()())()()  (()()(()))  (()((())))  ((())()())  ((()()()))  (((())()))

Another one: Mountain Ranges
“Determine the number of mountain landscapes which can be formed with
n upstrokes and n downstrokes.”
Mountain range: polyline of upstrokes “/” and downstrokes “\”; its
extreme points lie on the same horizontal line, and no segments cross it.
n = 0, C(0) = 1:
    *

n = 1, C(1) = 1:
    /\

n = 2, C(2) = 2:
             /\
    /\/\    /  \

n = 3, C(3) = 5:
                                              /\
                /\       /\        /\/\      /  \
    /\/\/\    /\/  \    /  \/\    /    \    /    \

Another one: Diagonal-avoiding paths on a lattice
“Given an n × n lattice, determine the number of paths of length 2n which
do not cross the diagonal.”
In a finite lattice {(i, j) : 1 ≤ i ≤ n, 1 ≤ j ≤ n}, a path is a connected
sequence of “west” or “south” segments from node (1, 1) to node (n, n).
[Figure] Sample path in a 7 × 7 lattice, corresponding to string (())(((()))()).

Another one: Multiplication precedence
“Determine the number of ways in which n + 1 factors can be multiplied
together, according to the precedence of multiplications.”
  n   C(n)   parenthesizations of n + 1 factors
  0      1   a
  1      1   a · b
  2      2   (a · b) · c              a · (b · c)
  3      5   ((a · b) · c) · d        (a · (b · c)) · d
             a · (b · (c · d))        (a · b) · (c · d)
             a · ((b · c) · d)
  4     14   (((a · b) · c) · d) · e  ((a · b) · (c · d)) · e
             (a · b) · (c · (d · e))  (a · (b · c)) · (d · e)
             (a · (b · (c · d))) · e  a · ((b · c) · (d · e))
             a · (b · ((c · d) · e))  ((a · b) · c) · (d · e)
             (a · b) · ((c · d) · e)  ((a · (b · c)) · d) · e
             (a · ((b · c) · d)) · e  a · (((b · c) · d) · e)
             a · ((b · (c · d)) · e)  a · (b · (c · (d · e)))

Another one: Convex polygon triangulation
“Determine the number of ways in which a convex polygon with n+2 edges
can be triangulated.”

Another one: Handshakes across a table
“Determine the number of ways in which 2n people sitting around a table
can shake hands without crossing their arms”.

Another one: Binary rooted trees
“Determine the number of binary rooted trees with n internal nodes.”
Each non-leaf is an internal node. Binary rooted trees with n internal
nodes and n ranging from 0 to 3:

Another one: Plane trees
“Determine the number of plane trees with n edges.” A plane tree is such
that it is possible to draw it on a plane with no edges crossing each other.

Another one: Skew Polyominos
“Determine the number of skew polyominos with perimeter 2n + 2.”
Polyomino: figure composed of squares connected along their edges.
Skew polyomino: successive columns of squares from left to right increase
in height: the bottom of the column to the left is always lower than or equal to the
bottom of the column to the right. Similarly, the top of the column to the left is
always lower than or equal to the top of the column to the right.
Derivations of the Catalan numbers


Catalan numbers derived with generating functions
Any balanced string containing n > 0 pairs of parentheses can be uniquely decomposed as:
(A)B
where A and B are balanced strings (possibly empty) and, if A contains k pairs
of parentheses, B must contain n − k − 1.
All configurations of n parenthesis pairs are the ones where A is empty
and B contains n − 1 pairs, plus the ones where A contains 1 pair and B
contains n − 2, and so on:
$C_1 = C_0 C_0$
$C_2 = C_0 C_1 + C_1 C_0$
$C_3 = C_0 C_2 + C_1 C_1 + C_2 C_0$
$C_4 = C_0 C_3 + C_1 C_2 + C_2 C_1 + C_3 C_0$
$\ldots$
which can be rewritten in the form of a recurrence relation:
$$C_0 = 1, \quad C_1 = 1, \quad C_n = \sum_{i=0}^{n-1} C_i C_{n-1-i}$$

We will now solve the above recurrences with the use of generating functions.
$$C(x) = C_0 + C_1 x + C_2 x^2 + \ldots = \sum_{i=0}^{+\infty} C_i x^i$$
Let’s now examine the expression of $[C(x)]^2 = C(x)C(x)$, as follows:
$$[C(x)]^2 = \underbrace{C_0 C_0}_{=\,C_1} + \underbrace{(C_0 C_1 + C_1 C_0)}_{=\,C_2}\, x + \underbrace{(C_0 C_2 + C_1 C_1 + C_2 C_0)}_{=\,C_3}\, x^2 + \ldots$$
still a generating function with Catalan coefficients, shifted one position left:
$$[C(x)]^2 = C_1 + C_2 x + C_3 x^2 + \ldots = \sum_{i=0}^{+\infty} C_{i+1} x^i.$$

Therefore if we multiply the whole series by x and add C0, the original Catalan
series is obtained:
$$C(x) = C_0 + x[C(x)]^2.$$
A quadratic equation, which could be put into the more familiar form:
$$xC^2 - C + C_0 = 0,$$
where C is the unknown and x, $C_0$ are constant coefficients. Replacing $C_0$
with its value (i.e., 1), the solution is trivially given by:
$$C = \frac{1 \pm \sqrt{1 - 4x}}{2x}.$$

Only the − solution is acceptable, since $C(0) = C_0 = 1$ must be finite
(the + solution diverges as x → 0):
$$C = \frac{1 - \sqrt{1 - 4x}}{2x}. \qquad (1)$$
The solution contains the power of a binomial with fractional exponent:
$$\sqrt{1 - 4x} = (1 - 4x)^{1/2} = \sum_{n \ge 0} \binom{1/2}{n} (-4x)^n,$$
which can be expanded as:
$$(1 - 4x)^{1/2} = 1 - \frac{1/2}{1}\,4x + \frac{(1/2)(-1/2)}{2 \cdot 1}(4x)^2 - \frac{(1/2)(-1/2)(-3/2)}{3 \cdot 2 \cdot 1}(4x)^3 + \frac{(1/2)(-1/2)(-3/2)(-5/2)}{4 \cdot 3 \cdot 2 \cdot 1}(4x)^4 - \ldots$$
which can be simplified as follows (every term after the first is negative):
$$(1 - 4x)^{1/2} = 1 - \frac{1}{1!}2x - \frac{1}{2!}4x^2 - \frac{3 \cdot 1}{3!}8x^3 - \frac{5 \cdot 3 \cdot 1}{4!}16x^4 - \ldots$$
Now, substituting into (1) we obtain:
$$C(x) = 1 + \frac{1}{2!}2x + \frac{3 \cdot 1}{3!}4x^2 + \frac{5 \cdot 3 \cdot 1}{4!}8x^3 + \frac{7 \cdot 5 \cdot 3 \cdot 1}{5!}16x^4 + \ldots$$
We can get rid of terms like 7 · 5 · 3 · 1 (factorials missing the even factors), by
considering that:
$2^2 \cdot 2! = 4 \cdot 2$
$2^3 \cdot 3! = 6 \cdot 4 \cdot 2$
$2^4 \cdot 4! = 8 \cdot 6 \cdot 4 \cdot 2$
$\ldots$
$$2^n \cdot n! = \prod_{i=1}^{n} 2i$$
Consequently:
$$C(x) = 1 + \frac{1}{2}\left(\frac{2!}{1!\,1!}\right)x + \frac{1}{3}\left(\frac{4!}{2!\,2!}\right)x^2 + \frac{1}{4}\left(\frac{6!}{3!\,3!}\right)x^3 + \ldots = \sum_{i=0}^{+\infty} \frac{1}{1+i}\binom{2i}{i} x^i$$
Therefore, the ith Catalan number is:
$$C_i = \frac{1}{1+i}\binom{2i}{i}.$$

The simplest proof
Solution based on considerations on the count of diagonal-avoiding paths
on a lattice: determining Cn equals counting the total paths through the grid
and subtracting the number of invalid ones.
[Figure: example of an invalid path.]

P = (i, i + 1): first illegal point reached.
Transformation: from point P on, swap S and W segments (S ↔ W).
The transformation starts at point (i, i + 1) and causes the n − i W segments
to be replaced by S segments and the remaining n − i − 1 S segments to be
replaced by W segments, so the new ending coordinates are
(i + (n − i − 1), (i + 1) + (n − i)) = (n − 1, n + 1).

By construction, every illegal path in an n × n lattice corresponds to exactly
one unconstrained path in an (n − 1) × (n + 1) lattice.
Paths in an a × b lattice: $\binom{a+b}{a}$ ⇒ in an n × n lattice: $\binom{2n}{n}$.
Invalid paths in an n × n lattice = paths in an (n − 1) × (n + 1) lattice: $\binom{2n}{n+1}$.
$$C_n = \binom{2n}{n} - \binom{2n}{n+1} = \binom{2n}{n} - \frac{n}{n+1}\binom{2n}{n} = \frac{1}{n+1}\binom{2n}{n}.$$

A novel interpretation
A novel calculation of the Catalan numbers, inspired by formal language
considerations.
Language of balanced parentheses (Dyck language).
G = (Σ, N, S, R)
Σ = {(, )}
N = {S}
R = {r1, r2}
r1 : S → ε
r2 : S → (S)S

Sentential form: sequence of terminal and nonterminal symbols which
can be derived from the start symbol S. Strings are special sentential forms
of terminal symbols only.
Labelling: (n, t)-label for a sentential form containing n nonterminals and
t terminals. Strings ∈ L(G) will be labeled (0, 2i), i ∈ N.
Theorem: the number of terminal symbols is even. Proof: the axiom
contains no terminals, rules preserve parity.
Derivation step: a substitution replacing a single S symbol in a sentential
form with the right-hand side of rule r1 or r2.
The form derived from an (n, t) form will have:
• one fewer nonterminal and the same number of terminals (rule r1 applied);
• one more nonterminal and two more terminals (rule r2 applied).
(n, t) --r1--> (n−1, t)
(n, t) --r2--> (n+1, t+2)
Thus, a given (n, t) sentential form derives either from:
• an (n + 1, t) form, through rule r1; or
• an (n − 1, t − 2) form, through rule r2.
(n+1, t)   --r1--> (n, t)
(n−1, t−2) --r2--> (n, t)

Through recursive application of the above scheme, all predecessors of
a given sentential form can be determined, up to the axiom, which
obviously has label (1, 0).
Theorem: the axiom can only have label (1, 0).
Theorem: label (1, 0) corresponds to the axiom only.
In general it is not true that each sentential form has exactly 2
predecessors:
• a (1, 0) form has no predecessor by definition, being the axiom;
• (0, t) and (1, t) forms can only have an (n + 1, t) predecessor. (Proof:
by contradiction, the (n − 1, t − 2) predecessor would have zero or fewer
nonterminals, therefore it could have no successors.)
• (n, t) forms with n > t do not exist, apart from the axiom (1, 0). (Proof: by
induction. For each form (n, t), let δ = t − n. The axiom has δ = −1, and both
rules increment δ.)
Each derivation tree starts with a label corresponding to a string, (0, 2i),
and reaches leaf nodes which are either axioms or invalid nodes.

Number of axioms contained in the tree of a (0, 2i)-string
=
number of different ways in which the grammar derives a (0, 2i)-string
=
number of different strings of balanced parentheses of length 2i (since each
derivation is unique).
Examples follow; axioms are marked with “!”, invalid nodes with “×”.
(0,2)
  (1,2)
    (2,2)
      (1,0) !
Figure 1: Derivation tree for (0, 2), i.e., 2n = 2

(0,4)
  (1,4)
    (2,4)
      (1,2)
        (2,2)
          (1,0) !
          (3,2) ×
      (3,4)
        (2,2)
          (1,0) !
          (3,2) ×
        (4,4)
          (3,2) ×
          (5,4) ×
Figure 2: Derivation tree for (0, 4), i.e., 2n = 4
The count of axiom nodes in each derivation tree is the ith Catalan number.
According to the above rules, $C_i$ is the number of axioms in the
derivation tree of a (0, t) form, with t = 2i, thus:
$$R(n, t) = \begin{cases} 1 & \text{if } n = 1 \wedge t = 0 \\ 0 & \text{if } n > t \\ R(n+1, t) & \text{if } n \le 1 \\ R(n-1, t-2) + R(n+1, t) & \text{otherwise.} \end{cases}$$
The cases are checked in the order listed, so the axiom (1, 0) falls under the first one.
By construction,
$$C_i = R(0, 2i).$$
The above relation R was implemented as a Tcl function and tested for
correctness.

Values of R(n, t) (rows: t, columns: n):
t\n       0      1      2      3      4      5      6      7      8      9     10     11     12     13
   0      1      1      0      0      0      0      0      0      0      0      0      0      0      0
   1      0      0      0      0      0      0      0      0      0      0      0      0      0      0
   2      1      1      1      0      0      0      0      0      0      0      0      0      0      0
   3      0      0      0      0      0      0      0      0      0      0      0      0      0      0
   4      2      2      2      1      0      0      0      0      0      0      0      0      0      0
   5      0      0      0      0      0      0      0      0      0      0      0      0      0      0
   6      5      5      5      3      1      0      0      0      0      0      0      0      0      0
   7      0      0      0      0      0      0      0      0      0      0      0      0      0      0
   8     14     14     14      9      4      1      0      0      0      0      0      0      0      0
   9      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  10     42     42     42     28     14      5      1      0      0      0      0      0      0      0
  11      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  12    132    132    132     90     48     20      6      1      0      0      0      0      0      0
  13      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  14    429    429    429    297    165     75     27      7      1      0      0      0      0      0
  15      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  16   1430   1430   1430   1001    572    275    110     35      8      1      0      0      0      0
  17      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  18   4862   4862   4862   3432   2002   1001    429    154     44      9      1      0      0      0
  19      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  20  16796  16796  16796  11934   7072   3640   1638    637    208     54     10      1      0      0
  21      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  22  58786  58786  58786  41990  25194  13260   6188   2548    910    273     65     11      1      0
  23      0      0      0      0      0      0      0      0      0      0      0      0      0      0
  24 208012 208012 208012 149226  90440  48450  23256   9996   3808   1260    350     77     12      1
Plot of the surface R(n, t). Points with odd values of t were not plotted.


Plot of the surface log R(n, t). Points with odd values of t were not plotted.

Future Developments
If D is the language defined by: S → ε | (S)S, then D can be recursively
written as: D = {ε} + (D)D where + denotes disjoint set union.
If an alphabet A = {a1, a2, ..., am} is considered, A∗ denotes the language
of all the strings over alphabet A, and if α ∈ A∗ is a string over A, |α| denotes
the length of α.
For a language A∗, a function w : A∗ → Z[[x]] can be defined:
$$w(\alpha) = x^{|\alpha|}, \quad \text{with } \alpha \in A^*$$
and we set by convention that w(ε) = 1.

It is trivial to prove that function w exhibits the following property:
$$\forall \alpha, \beta \in A^*, \quad w(\alpha \cdot \beta) = w(\alpha)\,w(\beta),$$
where · denotes string concatenation. w can be extended to languages, by
defining $w(L) = \sum_{\alpha \in L} w(\alpha)$. Therefore
$$w(A^*) = \sum_{\alpha \in A^*} w(\alpha) = \sum_{\alpha \in A^*} x^{|\alpha|} = \sum_{n \ge 0} \Big( \sum_{|\alpha| = n} 1 \Big) x^n = \sum_{n \ge 0} m^n x^n = \frac{1}{1 - mx}.$$
The following equation can be set for language D:
$$w(D) = 1 + x^2 w(D)^2$$
which can be solved by replacing y = w(D), thus obtaining $y = 1 + x^2 y^2$, or
in a more familiar form: $x^2 y^2 - y + 1 = 0$. The solution is given by:
$$y = \frac{1 - \sqrt{1 - 4x^2}}{2x^2}$$
It can be shown that:
$$D(x) = w(D) = D_0 + D_2 x^2 + D_4 x^4 + D_6 x^6 + \ldots$$
where ∀i ∈ N, $D_{2i} = C_i$ and $D_{2i+1} = 0$, therefore
$$D(x) = w(D) = C_0 + C_1 x^2 + C_2 x^4 + C_3 x^6 + \ldots$$

Our aim is to extend the above considerations to the language E of sentential
forms of grammar G.
Each string in D corresponds to exactly one mountain range; each string in
E corresponds to exactly one extended mountain range, which also has horizontal
strokes. Example: (S)(S)((S)(S(S)))S, and its corresponding extended
mountain range:
             _
        _  _/ \
 _  _  / \/    \
/ \/ \/         \_
The language E of sentential forms of grammar G is a new language, which can be
defined by a new grammar H, given as follows:
H = (Σ′, N, B, R′)
Σ′ = {(, ), S}
N = {B}
R′ = {r1, r2, r3}
r1 : B → ε
r2 : B → SB
r3 : B → (B)B
Note: S is a terminal symbol for grammar H. The above language is called
the Motzkin language.
Now let us consider the production:
B → ε | SB | (B)B,
from which we give a recursive definition of E:
E = {ε} + SE + (E)E.
It is then time to introduce a newer, more useful definition of w(α):
$$w(\alpha) = x^{p(\alpha)} y^{o(\alpha)} z^{|\alpha|}$$
where $p(\alpha) = |\alpha|_{(} + |\alpha|_{)}$ is the number of parentheses in α and
$o(\alpha) = |\alpha|_S$ is the number of S symbols, therefore p(α) + o(α) = |α|.
From the recursive definition of E, it is possible to set:
$$w(E) = 1 + yz\,w(E) + x^2 z^2\,w(E)^2,$$
which, replacing e = w(E), is:
$$x^2 z^2 e^2 + (yz - 1)e + 1 = 0,$$
which, solved for e, yields:
$$e = \frac{1 - yz - \sqrt{1 - 2yz + y^2 z^2 - 4x^2 z^2}}{2x^2 z^2}.$$
Thus e(x, y, z) can be written as a formal power series with coefficients $E_{i,j,k}$:
$$e(x, y, z) = E_{0,0,0} + E_{1,0,0}x + E_{0,1,0}y + E_{0,0,1}z + E_{2,0,0}x^2 + E_{0,2,0}y^2 + E_{0,0,2}z^2 + E_{1,1,0}xy + E_{0,1,1}yz + E_{1,0,1}xz + \ldots$$
It is now evident that the number of sentential forms with n nonterminals and
t terminals, previously called R(n, t), is given by:
$$R(n, t) = E_{t,n,n+t} = [x^t y^n]\,e(x, y, 1),$$

where the notation [...] has the following meaning:
$$[x^n] f(x) = f_n \iff f(x) = \sum_{n \ge 0} f_n x^n,$$
in particular
$$[x^i y^j z^k]\,e(x, y, z) = E_{i,j,k}.$$
Furthermore, an expression of e(x, y, 1) can be obtained by restriction:
$$e(x, y, 1) = e(x, y, z)|_{z=1} = \frac{1 - y - \sqrt{1 - 2y + y^2 - 4x^2}}{2x^2}.$$
Incidentally, the i-th Catalan number, which was equal to R(0, 2i), can be
obtained by setting y = 0, thus:
$$e(x, 0, 1) = e(x, y, 1)|_{y=0} = \frac{1 - \sqrt{1 - 4x^2}}{2x^2},$$
which is identical to a previous equation and admits the same solutions.
To obtain an expression of R(i, j), we can collect (1 − y) in the numerator
and (1 − y)² in the denominator, thus obtaining:
$$e = \frac{(1-y)\left(1 - \sqrt{1 - \frac{4x^2}{(1-y)^2}}\right)}{2x^2} = \frac{1}{1-y} \cdot \frac{1 - \sqrt{1 - 4q^2}}{2q^2}, \qquad q = \frac{x}{1-y},$$
which can be solved by comparison with the previous case, thus:
$$e = \frac{1}{1-y}\, D\!\left(\frac{x}{1-y}\right)$$
but since
$$D(q) = \frac{1 - \sqrt{1 - 4q^2}}{2q^2} = \sum_{k \ge 0} D_k q^k$$
then
$$e = \frac{1}{1-y} \sum_{k \ge 0} D_k \frac{x^k}{(1-y)^k} = \sum_{k \ge 0} D_k \frac{x^k}{(1-y)^{k+1}} = \sum_{k \ge 0} D_k x^k \sum_{n \ge 0} \binom{n+k}{k} y^n = \sum_{n,k \ge 0} \binom{n+k}{k} D_k x^k y^n$$
therefore it should be true that:
$$R(n, k) = \begin{cases} 0 & \text{if } k \text{ odd} \\ \binom{n+k}{k} C_{k/2} & \text{if } k \text{ even} \end{cases}$$

Problem: the above equation is incorrect!
Cause: the actual language E of sentential forms of G is smaller than L(H);
Proof: SS, SSS, ..., S(), S(S), ... ∈ L(H) − E.
Thus: the above work needs to be redone.
Our current efforts are devoted to finding a new correct and unambiguous
grammar for language E, obtaining an appropriate recursive definition of E
and a corresponding equation for w(E), which, solved, would yield a formal
power series for w(E) and thus a closed form for the R(n, t) numbers.
```
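The (A)B decomposition and the recurrence it leads to can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names `balanced`, `catalan_recurrence` and `catalan_closed` are illustrative. It builds the balanced strings exactly as the decomposition describes and compares their count with the recurrence and with the closed form.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def balanced(n):
    """Return all balanced strings with n pairs, built as (A)B."""
    if n == 0:
        return ("",)
    out = []
    for k in range(n):  # A holds k pairs, B holds n - 1 - k pairs
        for a in balanced(k):
            for b in balanced(n - 1 - k):
                out.append("(" + a + ")" + b)
    return tuple(out)


def catalan_recurrence(n_max):
    """C_0 .. C_n_max from C_n = sum_{i=0}^{n-1} C_i * C_{n-1-i}."""
    c = [1]
    for n in range(1, n_max + 1):
        c.append(sum(c[i] * c[n - 1 - i] for i in range(n)))
    return c


def catalan_closed(n):
    """C_n = binom(2n, n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    c = catalan_recurrence(6)
    for n in range(7):
        # columns: n, enumerated strings, recurrence, closed form
        print(n, len(balanced(n)), c[n], catalan_closed(n))
```

Because the decomposition is unique, no string is produced twice, which is why the enumeration and the recurrence agree.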
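The generating-function derivation can be replayed with exact rational arithmetic. The sketch below is illustrative only (the helper names `binom_half`, `sqrt_series` and `catalan_from_series` are assumptions, not taken from the slides). It expands (1 − 4x)^(1/2) with the generalized binomial coefficient and then forms (1 − √(1 − 4x)) / (2x), which makes the signs of the individual terms easy to inspect.

```python
from fractions import Fraction


def binom_half(n):
    """Generalized binomial coefficient (1/2 choose n) as an exact fraction."""
    result = Fraction(1)
    for j in range(n):
        result *= (Fraction(1, 2) - j) / (j + 1)
    return result


def sqrt_series(n_terms):
    """Coefficients of (1 - 4x)^(1/2) for x^0 .. x^(n_terms - 1)."""
    return [binom_half(n) * (-4) ** n for n in range(n_terms)]


def catalan_from_series(n_terms):
    """Coefficients of (1 - sqrt(1 - 4x)) / (2x)."""
    s = sqrt_series(n_terms + 1)
    # 1 - sqrt(...) has no constant term, so dividing by 2x shifts the series.
    return [-s[i + 1] / 2 for i in range(n_terms)]


if __name__ == "__main__":
    print([int(c) for c in sqrt_series(5)])
    print([int(c) for c in catalan_from_series(5)])
```

Every coefficient of the square-root series after the constant term is negative, so the coefficients of C(x) come out positive.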
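The reflection argument of "The simplest proof" can be tested by brute force for small n. This is a sketch under stated assumptions: a path is encoded as a string of 'S' and 'W' steps, and a path is taken to be invalid as soon as some prefix contains more 'W' than 'S' steps (which of the two directions plays which role is an assumption here, since the original figures are lost). The names `all_paths`, `first_violation`, `reflect` and `check` are illustrative.

```python
from itertools import combinations
from math import comb


def all_paths(n_s, n_w):
    """Yield every path with n_s 'S' steps and n_w 'W' steps as a string."""
    total = n_s + n_w
    for s_positions in combinations(range(total), n_s):
        chosen = set(s_positions)
        yield "".join("S" if i in chosen else "W" for i in range(total))


def first_violation(path):
    """Length of the shortest prefix with more 'W' than 'S' steps, or None."""
    balance = 0
    for i, step in enumerate(path):
        balance += 1 if step == "S" else -1
        if balance < 0:
            return i + 1
    return None


def reflect(path):
    """Swap S and W in the part of an invalid path after the first violation."""
    cut = first_violation(path)
    swap = {"S": "W", "W": "S"}
    return path[:cut] + "".join(swap[step] for step in path[cut:])


def check(n):
    """Return (number of invalid paths, whether reflection is a bijection)."""
    invalid = [p for p in all_paths(n, n) if first_violation(p) is not None]
    images = {reflect(p) for p in invalid}
    target = set(all_paths(n - 1, n + 1))
    bijective = len(images) == len(invalid) and images == target
    return len(invalid), bijective


if __name__ == "__main__":
    for n in range(1, 6):
        invalid, bijective = check(n)
        print(n, comb(2 * n, n) - invalid, bijective)
```

For each n the reflected paths are compared with the full set of paths having n − 1 steps of one kind and n + 1 of the other, which is the correspondence the slides use to count the invalid paths.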
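The relation R(n, t) can be transcribed almost literally. The slides mention a Tcl implementation, which is not shown; the following Python version is only a sketch of the same recurrence, with the cases tested in the order in which they are listed. The helper `conjectured` encodes the closed form proposed (and then rejected) in "Future Developments", so that the disagreement can be seen on small labels.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def R(n, t):
    """Number of axiom leaves in the predecessor tree of an (n, t) label."""
    if n == 1 and t == 0:  # the axiom itself
        return 1
    if n > t:  # no such form exists
        return 0
    if n <= 1:  # only the (n + 1, t) predecessor is possible
        return R(n + 1, t)
    return R(n - 1, t - 2) + R(n + 1, t)


def catalan(i):
    """Closed form C_i = binom(2i, i) / (i + 1)."""
    return comb(2 * i, i) // (i + 1)


def conjectured(n, k):
    """Closed form for R(n, k) conjectured in the slides, later found incorrect."""
    return 0 if k % 2 else comb(n + k, k) * catalan(k // 2)


if __name__ == "__main__":
    print([R(0, 2 * i) for i in range(8)])
    mismatches = [(n, t, R(n, t), conjectured(n, t))
                  for n in range(4) for t in (2, 4)
                  if R(n, t) != conjectured(n, t)]
    print(mismatches)
```

The first line printed is the start of the Catalan sequence; the second lists labels where the recurrence and the conjectured closed form differ, for example (1, 2), where R gives 1 and the conjecture gives 3.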
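The closing remark, that L(H) is strictly larger than the language E of sentential forms of G, can be illustrated by enumerating both languages for short words. This is a brute-force sketch with illustrative names (`language_H`, `sentential_forms_G`). It assumes that L(H) consists of the words over {(, ), S} whose parentheses are balanced, and that exploring intermediate forms up to three symbols longer than the target length is enough to reach every short sentential form.

```python
from itertools import product


def parentheses_balanced(word):
    """True if the parentheses of word are balanced; 'S' symbols are ignored."""
    depth = 0
    for ch in word:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def language_H(max_len):
    """Words of L(H) up to max_len (assumed: balanced parentheses plus S's)."""
    words = set()
    for length in range(max_len + 1):
        for letters in product("()S", repeat=length):
            word = "".join(letters)
            if parentheses_balanced(word):
                words.add(word)
    return words


def sentential_forms_G(max_len, slack=3):
    """Sentential forms of G (S -> empty | (S)S) of length <= max_len."""
    limit = max_len + slack  # intermediate forms may be longer than the result
    seen = {"S"}
    frontier = ["S"]
    while frontier:
        form = frontier.pop()
        for i, ch in enumerate(form):
            if ch != "S":
                continue
            for rhs in ("", "(S)S"):  # rules r1 and r2
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    frontier.append(new)
    return {f for f in seen if len(f) <= max_len}


if __name__ == "__main__":
    H = language_H(4)
    E = sentential_forms_G(4)
    print(sorted(E, key=lambda w: (len(w), w)))
    print(sorted(H - E, key=lambda w: (len(w), w)))
```

The second list contains words such as SS, S() and S(S), which belong to L(H) but are not sentential forms of G.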