On Certain Formal Properties of Grammars*

NOAM CHOMSKY

INFORMATION AND CONTROL 2, 137-167 (1959)

Massachusetts Institute of Technology, Cambridge, Massachusetts and The Institute
for Advanced Study, Princeton, New Jersey
A grammar can be regarded as a device that enumerates the sentences of a language. We study a sequence of restrictions that limit
grammars first to Turing machines, then to two types of system from
which a phrase structure description of the generated language can
be drawn, and finally to finite state Markov sources (finite automata). These restrictions are shown to be increasingly heavy in the
sense that the languages that can be generated by grammars meeting
a given restriction constitute a proper subset of those that can be
generated by grammars meeting the preceding restriction. Various
formulations of phrase structure description are considered, and the
source of their excess generative power over finite state sources is
investigated in greater detail.
SECTION 1
A language is a collection of sentences of finite length all constructed from a finite alphabet (or, where our concern is limited to syntax, a finite vocabulary) of symbols. Since any language L in which we are likely to be interested is an infinite set, we can investigate the structure of L only through the study of the finite devices (grammars) which are capable of enumerating its sentences. A grammar of L can be regarded as a function whose range is exactly L. Such devices have been called "sentence-generating grammars."¹ A theory of language will contain, then, a specification of the class F of functions from which grammars for particular languages may be drawn.

The weakest condition that can significantly be placed on grammars is that F be included in the class of general, unrestricted Turing machines. The strongest, most limiting condition that has been suggested is that each grammar be a finite Markovian source (finite automaton).² The latter condition is known to be too strong; if F is limited in this way it will not contain a grammar for English (Chomsky, 1956). The former condition, on the other hand, has no interest. We learn nothing about a natural language from the fact that its sentences can be effectively displayed, i.e., that they constitute a recursively enumerable set. The reason for this is clear. Along with a specification of the class F of grammars, a theory of language must also indicate how, in general, relevant structural information can be obtained for a particular sentence generated by a particular grammar. That is, the theory must specify a class Σ of "structural descriptions" and a functional Φ such that given f ∈ F and x in the range of f, Φ(f,x) ∈ Σ is a structural description of x (with respect to the grammar f) giving certain information which will facilitate and serve as the basis for an account of how x is used and understood by speakers of the language whose grammar is f; i.e., which will indicate whether x is ambiguous, to what other sentences it is structurally similar, etc. These empirical conditions that lead us to characterize F in one way or another are of critical importance. They will not be further discussed in this paper,³ but it is clear that we will not be able to develop an adequate formulation of Σ and Φ if the elements of F are specified only as such "unstructured" devices as general Turing machines.

* This work was supported in part by the U. S. Army (Signal Corps), the U. S. Air Force (Office of Scientific Research, Air Research and Development Command), and the U. S. Navy (Office of Naval Research). This work was also supported in part by the Transformations Project on Information Retrieval of the University of Pennsylvania. I am indebted to George A. Miller for several important observations about the systems under consideration here, and to R. B. Lees for material improvements in presentation.

¹ Following a familiar technical use of the term "generate," cf. Post (1944). This locution has, however, been misleading, since it has erroneously been interpreted as indicating that such sentence-generating grammars consider language from the point of view of the speaker rather than the hearer. Actually, such grammars take a completely neutral point of view. Compare Chomsky (1957, p. 48). We can consider a grammar of L to be a function mapping the integers onto L, order of enumeration being immaterial (and easily specifiable, in many ways) to this purely syntactic study, though the question of the particular "inputs" required to produce a particular sentence may be of great interest for other investigations which can build on syntactic work of this more restricted kind.

² Compare Definition 9, Sec. 5.

³ Except briefly in §2. In Chomsky (1956, 1957), an appropriate Σ and Φ (i.e., an appropriate method for determining structural information in a uniform manner from the grammar) are described informally for several types of grammar, including those that will be studied here. It is, incidentally, important to recognize that a grammar of a language that succeeds in enumerating the sentences will (although it is far from easy to obtain even this result) nevertheless be of quite limited interest unless the underlying principles of construction are such as to provide a useful structural description.
Interest in structural properties of natural language thus serves as an
empirical motivation for investigation of devices with more generative
power than finite automata, and more special structure than Turing
machines. This paper is concerned with the effects of a sequence of increasingly heavy restrictions on the class F which limit it first to Turing
machines and finally to finite automata and, in the intermediate stages,
to devices which have linguistic significance in that generation of a sentence automatically provides a meaningful structural description. We
shall find that these restrictions are increasingly heavy in the sense that
each limits more severely the set of languages that can be generated.
The intermediate systems are those that assign a phrase structure description to the resulting sentence. Given such a classification of special
kinds of Turing machines, the main problem of immediate relevance to
the theory of language is that of determining where in the hierarchy of
devices the grammars of natural languages lie. It would, for example, be
extremely interesting to know whether it is in principle possible to construct a phrase structure grammar for English (even though there is
good motivation of other kinds for not doing so). Before we can hope
to answer this, it will be necessary to discover the structural properties
that characterize the languages that can be enumerated by grammars of
these various types. If the classification of generating devices is reasonable (from the point of view of the empirical motivation), such purely
mathematical investigation may provide deeper insight into the formal
properties that distinguish natural languages, among all sets of finite
strings in a finite alphabet. Questions of this nature appear to be quite
difficult in the case of the special classes of Turing machines that have
the required linguistic significance.⁴ This paper is devoted to a preliminary study of the properties of such special devices, viewed as grammars.
It should be mentioned that there appears to be good evidence that
devices of the kinds studied here are not adequate for formulation of a
full grammar for a natural language (see Chomsky, 1956, §4; 1957,
Chapter 5). Left out of consideration here are what have elsewhere been
⁴ In Chomsky and Miller (1958), a structural characterization theorem is stated for languages that can be enumerated by finite automata, in terms of the cyclical structure of these automata. The basic characterization theorem for finite automata is proven in Kleene (1956).
called "grammatical transformations" (Harris, 1952a, b, 1957; Chomsky, 1956, 1957). These are complex operations that convert sentences
with a phrase structure description into other sentences with a phrase
structure description. Nevertheless, it appears that devices of the kind
studied in the following pages must function as essential components in
adequate grammars for natural languages. Hence investigation of these
devices is important as a preliminary to the far more difficult study of
the generative power of transformational grammars (as well as, negatively, for the information it should provide about what it is in natural
language that makes a transformational grammar necessary).
SECTION 2
A phrase structure grammar consists of a finite set of "rewriting rules" of the form φ → ψ, where φ and ψ are strings of symbols. It contains a special "initial" symbol S (standing for "sentence") and a boundary symbol # indicating the beginning and end of sentences. Some of the symbols of the grammar stand for words and morphemes (grammatically significant parts of words). These constitute the "terminal vocabulary." Other symbols stand for phrases, and constitute the "nonterminal vocabulary" (S is one of these, standing for the "longest" phrase). Given such a grammar, we generate a sentence by writing down the initial string #S#, applying one of the rewriting rules to form a new string #φ1# (that is, we might have applied the rule #S# → #φ1# or the rule S → φ1), applying another rule to form a new string #φ2#, and so on, until we reach a string #φt# which consists solely of terminal symbols and cannot be further rewritten. The sequence of strings constructed in this way will be called a "derivation" of #φt#.
Consider, for example, a grammar containing the rules: S → AB, A → C, CB → Cb, C → a, and hence providing the derivation D = (#S#, #AB#, #CB#, #Cb#, #ab#). We can represent D diagrammatically in the form

        S
       / \
      A   B
      |   |
      C   b        (1)
      |
      a
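The rewriting process just illustrated is mechanical enough to simulate directly. The following sketch (an illustration, not part of the paper; the greedy choice of the first applicable rule happens to reproduce D for this particular grammar, and would not be adequate for grammars in general) applies the rules above to the initial string #S#:

```python
# Rewriting with the example grammar S -> AB, A -> C, CB -> Cb, C -> a.
# Greedily applying the first applicable rule at its leftmost occurrence
# happens to reproduce the derivation D for this grammar.

RULES = [("S", "AB"), ("A", "C"), ("CB", "Cb"), ("C", "a")]

def one_step_rewrites(s):
    """All strings obtainable from s by one application of one rule."""
    out = []
    for lhs, rhs in RULES:
        i = s.find(lhs)
        while i != -1:
            out.append(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def a_derivation(start="#S#"):
    """Extend the derivation until no rule applies."""
    lines = [start]
    while one_step_rewrites(lines[-1]):
        lines.append(one_step_rewrites(lines[-1])[0])
    return lines

print(a_derivation())   # ['#S#', '#AB#', '#CB#', '#Cb#', '#ab#']
```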
If appropriate restrictions are placed on the form of the rules φ → ψ (in particular, the condition that ψ differ from φ by replacement of a single symbol of φ by a non-null string), it will always be possible to associate with a derivation a labeled tree in the same way. These trees can be taken as the structural descriptions discussed in Sec. 1, and the method of constructing them, given a derivation, will (when stated precisely) be a definition of the functional Φ. A substring x of the terminal string of a given derivation will be called a phrase of type A just in case it can be traced back to a point labeled A in the associated tree (thus, for example, the substring enclosed within the boundaries is a phrase of the type "sentence"). If in the example given above we interpret A as Noun Phrase, B as Verb Phrase, C as Singular Noun, a as John, and b as comes, we can regard D as a derivation of John comes providing the structural description (1), which indicates that John is a Singular Noun and a Noun Phrase, that comes is a Verb Phrase, and that John comes is a Sentence. Grammars containing rules formulated in such a way that trees can be associated with derivations will thus have a certain linguistic significance in that they provide a precise reconstruction of large parts of the traditional notion of "parsing" or, in its more modern version, immediate constituent analysis. (Cf. Chomsky (1956, 1957) for further discussion.)
The basic system of description that we shall consider is a system G of the following form: G is a semi-group under concatenation with strings in a finite set V of symbols as its elements, and I as the identity element. V is called the "vocabulary" of G. V = VT ∪ VN (VT, VN disjoint), where VT is the "terminal vocabulary" and VN the "nonterminal vocabulary." VT contains I and a "boundary" element #. VN contains an element S (sentence). A two-place relation → is defined on elements of G, read "can be rewritten as." This relation satisfies the following conditions:

AXIOM 1. → is irreflexive.
AXIOM 2. A ∈ VN if and only if there are φ, ψ, ω such that φAψ → φωψ.
AXIOM 3. There are no φ, ψ, ω such that φ → ψ#ω.
AXIOM 4. There is a finite set of pairs (χ1, ω1), ..., (χn, ωn) such that for all φ, ψ, φ → ψ if and only if there are φ1, φ2, and j ≤ n such that φ = φ1χjφ2 and ψ = φ1ωjφ2.

Thus the pairs (χj, ωj) whose existence is guaranteed by Axiom 4 give a finite specification of the relation →. In other words, we may think of the grammar as containing a finite number of rules χi → ωi which completely determine all possible derivations.
The presentation will be greatly facilitated by the adoption of the
following notational convention (which was in fact followed above).
CONVENTION 1: We shall use capital letters for strings in VN; small Latin letters for strings in VT; Greek letters for arbitrary strings; early letters of all alphabets for single symbols (members of V); late letters of all alphabets for arbitrary strings.

DEFINITION 1. (φ1, ..., φn) (n ≥ 1) is a ψ-derivation of φ if ψ = φ1, φ = φn, and φi → φi+1 (1 ≤ i < n).
DEFINITION 2. A φ-derivation is terminated if it is not a proper initial subsequence of any φ-derivation.⁵
DEFINITION 3. The terminal language LG generated by G is the set of strings x such that there is a terminated #S#-derivation of x.⁶
DEFINITION 4. G is equivalent to G* if LG = LG*.
DEFINITION 5. φ ⇒ ψ if there is a φ-derivation of ψ.

⇒ (which is the ordinary ancestral of →) is thus a partial ordering of strings in G. These notions appear, in slightly different form, in Chomsky (1956, 1957).
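Definitions 1-3 lend themselves to a direct computational sketch (an illustration, not from the paper; single-character symbols are assumed, with uppercase letters as VN and lowercase letters and # as VT):

```python
# Enumerating the terminal language L_G of Definitions 1-3 by breadth-first
# search over lines reachable from #S#. A string is collected when it heads
# no further line (its derivation is terminated) and lies wholly in V_T.
# Since L_G is infinite in general, the search is cut off at a length bound.

from collections import deque

def one_step(s, rules):
    """All psi with s -> psi: replace one occurrence of chi_j by omega_j (Axiom 4)."""
    out = []
    for chi, omega in rules:
        i = s.find(chi)
        while i != -1:
            out.append(s[:i] + omega + s[i + len(chi):])
            i = s.find(chi, i + 1)
    return out

def terminal_language(rules, max_len=8):
    found, seen, queue = set(), {"#S#"}, deque(["#S#"])
    while queue:
        s = queue.popleft()
        if not one_step(s, rules) and s.strip("#").islower():
            found.add(s)                      # terminated, and a string of V_T
        for t in one_step(s, rules):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return found

# A type 2 grammar for {a^n b^n : n >= 1}
print(sorted(terminal_language([("S", "aSb"), ("S", "ab")])))
# ['#aaabbb#', '#aabb#', '#ab#']
```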
This paper will be devoted to a study of the effect of imposing the
following additional restrictions on grammars of the type described
above.
RESTRICTION 1. If φ → ψ, then there are A, φ1, φ2, ω such that φ = φ1Aφ2, ψ = φ1ωφ2, and ω ≠ I.
RESTRICTION 2. If φ → ψ, then there are A, φ1, φ2, ω such that φ = φ1Aφ2, ψ = φ1ωφ2, ω ≠ I, and A → ω.
RESTRICTION 3. If φ → ψ, then there are A, φ1, φ2, ω, a, B such that φ = φ1Aφ2, ψ = φ1ωφ2, ω ≠ I, A → ω, and ω = aB or ω = a.
The nature of these restrictions is clarified by comparison with Axiom 4, above. Restriction 1 requires that the rules of the grammar [i.e., the minimal pairs (χj, ωj) of Axiom 4] all be of the form φ1Aφ2 → φ1ωφ2, where A is a single symbol and ω ≠ I. Such a rule asserts that A → ω in the context φ1–φ2 (which may be null). Restriction 2 requires that the limiting context indeed be null; that is, that the rules all be of the form A → ω, where A is a single symbol, and that each such rule may be applied independently of the context in which A appears. Restriction 3 limits the rules to the form A → aB or A → a (where A, B are single nonterminal symbols, and a is a single terminal symbol).

⁵ Note that a terminated derivation need not terminate in a string of VT (i.e., it may be "blocked" at a nonterminal string), and that a derivation ending with a string of VT need not be terminated (if, e.g., the grammar contains such rules as ab → cd).

⁶ Thus the terminal language LG consists only of those strings of VT which are derivable from #S# but which cannot head a derivation (of ≥ 2 lines).
DEFINITION 6. For i = 1, 2, 3, a type i grammar is one meeting Restriction i, and a type i language is one with a type i grammar. A type 0 grammar (language) is one that is unrestricted.

Type 0 grammars are essentially Turing machines; type 3 grammars, finite automata. Type 1 and 2 grammars can be interpreted as systems of phrase structure description.
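The three restrictions are syntactic conditions on the rule pairs themselves, so a grammar's type can be checked mechanically. The following rough classifier (an illustration, not from the paper) assumes grammars given as minimal rule pairs over single-character symbols, uppercase for VN and lowercase for VT:

```python
# Classify a grammar per Definition 6 by testing each minimal rule
# against Restrictions 1-3.

def meets_r1(lhs, rhs):
    """Restriction 1: lhs = p A q, rhs = p w q, A a single nonterminal, w nonempty."""
    for i, c in enumerate(lhs):
        if c.isupper():
            p, q = lhs[:i], lhs[i + 1:]
            w_len = len(rhs) - len(p) - len(q)
            if w_len >= 1 and rhs.startswith(p) and rhs.endswith(q):
                return True
    return False

def meets_r2(lhs, rhs):
    """Restriction 2: a single nonterminal rewritten, independently of context."""
    return len(lhs) == 1 and lhs.isupper() and len(rhs) >= 1

def meets_r3(lhs, rhs):
    """Restriction 3: only A -> a or A -> aB."""
    return meets_r2(lhs, rhs) and (
        (len(rhs) == 1 and rhs.islower()) or
        (len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper()))

def grammar_type(rules):
    """Largest i such that every rule meets Restriction i; 0 if unrestricted."""
    for i, ok in ((3, meets_r3), (2, meets_r2), (1, meets_r1)):
        if all(ok(l, r) for l, r in rules):
            return i
    return 0

print(grammar_type([("S", "aS"), ("S", "a")]),     # 3: finite automaton
      grammar_type([("S", "aSb"), ("S", "ab")]),   # 2: phrase structure
      grammar_type([("CB", "Cb")]),                # 1: context restricted to C-
      grammar_type([("ab", "cd")]))                # 0: unrestricted
```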
SECTION 3
Theorem 1 follows immediately from the definitions.

THEOREM 1. For both grammars and languages, type 0 ⊇ type 1 ⊇ type 2 ⊇ type 3.

The following is, furthermore, well known.

THEOREM 2. Every recursively enumerable set of strings is a type 0 language (and conversely).⁷

That is, a grammar of type 0 is a device with the generative power of a Turing machine. The theory of type 0 grammars and type 0 languages is thus part of a rapidly developing branch of mathematics (recursive function theory). Conceptually, at least, the theory of grammar can be viewed as a study of special classes of recursive functions.
THEOREM 3. Each type 1 language is a decidable set of strings.⁷ᵃ

That is, given a type 1 grammar G, there is an effective procedure for determining whether an arbitrary string x is in the language enumerated by G. This follows from the fact that if φi, φi+1 are successive lines of a derivation produced by a type 1 grammar, then φi+1 cannot contain fewer symbols than φi, since φi+1 is formed from φi by replacing a single symbol A of φi by a non-null string ω. Clearly any string x which has a #S#-derivation has a #S#-derivation in which no line repeats, since lines between repetitions can be deleted. Consequently, given a grammar G of type 1 and a string x, only a finite number of derivations (those with no repetitions and no lines longer than x) need be investigated to determine whether x ∈ LG.

⁷ See, for example, Davis (1958, Chap. 6, §2). It is easily shown that the further structure in type 0 grammars over the combinatorial systems there described does not affect this result.

⁷ᵃ But not conversely. For suppose we give an effective enumeration of type 1 grammars, thus enumerating type 1 languages as L1, L2, .... Let s1, s2, ... be an effective enumeration of all finite strings in what we can assume (without restriction) to be the common, finite alphabet of L1, L2, .... Given the index of a language in the enumeration L1, L2, ..., we have immediately a decision procedure for this language. Let M be the "diagonal" language containing just those strings si such that si ∉ Li. Then M is a decidable language not in the enumeration. I am indebted to Hilary Putnam for this observation.
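The decision procedure behind Theorem 3 can be sketched directly (an illustration, not from the paper): since a type 1 rule never shortens a line, x is in LG just in case x ends some repetition-free #S#-derivation whose lines never exceed the length of x, and the search over such derivations is finite.

```python
# Decide membership in L_G for a length-nondecreasing grammar by searching
# repetition-free derivations bounded in length by len(x).

def one_step(s, rules):
    out = []
    for chi, omega in rules:
        i = s.find(chi)
        while i != -1:
            out.append(s[:i] + omega + s[i + len(chi):])
            i = s.find(chi, i + 1)
    return out

def decide(x, rules):
    def search(line, seen):
        if line == x:
            return not one_step(line, rules)   # x itself must head no further line
        return any(search(t, seen | {t})
                   for t in one_step(line, rules)
                   if len(t) <= len(x) and t not in seen)
    return search("#S#", {"#S#"})

# A grammar for {#a^n b^n c^n#}; the permutation rule CB -> BC is of the
# kind shown admissible by Lemma 1 below.
RULES = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

print(decide("#abc#", RULES), decide("#aabbcc#", RULES), decide("#aabbc#", RULES))
# True True False
```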
We see, therefore, that Restriction 1 provides an essentially more
limited type of grammar than type 0.
The basic relation → of a type 1 grammar is specified completely by a finite set of pairs of the form (φ1Aφ2, φ1ωφ2). Suppose that ω = α1 ··· αm. We can then associate with this pair the element

           A
        /  |  \
      α1  α2 ··· αm        (2)

Corresponding to any derivation D we can construct a tree formed from the elements (2) associated with the transitions between successive lines of D, adding elements to the tree from the appropriate node as the derivation progresses.⁸ We can thus associate a labeled tree with each derivation as a structural description of the generated sentence. The restriction on the rules φ → ψ which leads to type 1 grammars thus has a certain linguistic significance since, as pointed out in Sec. 1, these grammars provide a precise reconstruction of much of what is traditionally called "parsing" or "immediate constituent analysis." Type 1 grammars are the phrase structure grammars considered in Chomsky (1957, Chap. 4).
SECTION 4
LEMMA 1. Suppose that G is a type 1 grammar, and X, B are particular strings of G. Let G' be the grammar formed by adding XB → BX to G. Then there is a type 1 grammar G* equivalent to G'.

PROOF. Suppose that X = A1 ··· An. Choose C1, ..., Cn+1 new and distinct. Let Q be the sequence of rules

    A1 ··· AnB → C1A2 ··· AnB
               → ··· → C1 ··· CnB
               → ··· → BC2 ··· Cn+1
               → ··· → BA1 ··· An

where the left-hand side of each rule is the right-hand side of the immediately preceding rule. Let G* be formed by adding the rules of Q to G. It is obvious that if there is a #S#-derivation of x in G* using rules of Q, then there is a #S#-derivation of x in G* in which the rules are applied only in the sequence Q, with no other rules interspersed (note that x is a terminal string). Consequently the only effect of adding the rules of Q to G is to permit a string φXBψ to be rewritten φBXψ, and LG* contains only sentences of LG'. It is clear that LG* contains all the sentences of LG', and that G* meets Restriction 1.
By a similar argument it can easily be shown that type 1 languages are those whose grammars meet the condition that if φ → ψ, then ψ is at least as long as φ. That is, weakening Restriction 1 to this extent will not increase the class of generated languages.
LEMMA 2. Let L be the language containing all and only the sentences of the form #aⁿbᵐaⁿbᵐccc# (m, n ≥ 1). Then L is a type 1 language.

PROOF. Consider the grammar G with VT = {a, b, c, I, #},

VN = {S, S1, S2, A, Ā, B, B̄, C, D, E, F},

and the following rules:

(I)   (a) S → CDS1S2F
      (b) S2 → S2S2
      (c) {S2F → BF, S2B → BB}
      (d) S1 → S1S1
      (e) {S1B → AB, S1A → AA}

(II)  (a) {CDA → CEĀA, CDB → CEB̄B}
      (b) CEᾱ → ᾱCE
      (c) Eαβ → βEα
      (d) Eα# → Dα#
      (e) αD → Dα

(III) CDFα → αCDF

(IV)  (a) {Ā → a, B̄ → b, A → a, B → b, CDF# → CDc#}
      (b) {CDc → Ccc, Cc → cc}

where α, β range over {A, B, F} and ᾱ over {Ā, B̄}.

It can now be determined that the only #S#-derivations of G that terminate in strings of VT are produced in the following manner:

(1) the rules of (I) are applied as follows: (a) once, (b) m - 1 times for some m ≥ 1, (c) m times, (d) n - 1 times for some n ≥ 1, and (e) n times, giving

    #CDα1 ··· αn+mF#

where αi = A for i ≤ n, αi = B for i > n;

(2) the rules of (II) are applied as follows: (a) once and (b) once, giving

    #ᾱ1CEα1 ··· αn+mF#⁹

(c) n + m times and (d) once, giving

    #ᾱ1Cα2 ··· αn+mFDα1#

(e) n + m times, giving

    #ᾱ1CDα2 ··· αn+mFα1#

(3) the rules of (II) are applied, as in (2), n + m - 1 more times, giving

    #ᾱ1 ··· ᾱn+mCDFα1 ··· αn+m#

(4) the rule (III) is applied n + m times, giving

    #ᾱ1 ··· ᾱn+mα1 ··· αn+mCDF#

(5) the rules of (IV) are applied, (a) 2(n + m) times, (b) once, giving

    #aⁿbᵐaⁿbᵐccc#

Any other sequence of rules (except for a certain freedom in point of application of [IVa]) will fail to produce a derivation terminating in a string of VT. Notice that the form of the terminal string is completely determined by step (1) above, where n and m are selected. Rules (II) and (III) are nothing but a copying device that carries any string of the form #CDXF# (where X is any string of A's and B's) to the corresponding string #X̄XCDF#, which is converted by (IV) into terminal form.

By Lemma 1, there is a type 1 grammar G* equivalent to G, as was to be proven.

⁹ Where here and henceforth, ᾱi = Ā if αi = A, ᾱi = B̄ if αi = B. Note that use of rules of the type of (II) (b), (c), (e), and (III) is justified by Lemma 1.
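The behavior of this copying device can be checked mechanically. The sketch below (an illustrative reconstruction; the nonterminals S1, S2, Ā, B̄ are renamed G, H, P, Q so that every symbol is a single character) searches all derivations up to a length bound and collects the terminal strings reached; within that bound the only terminal string should be the case n = m = 1, i.e. #ababccc#:

```python
# Exhaustive search over the Lemma 2 grammar, with strings bounded at 9
# symbols. Renamings (assumptions of this sketch): S1 -> G, S2 -> H,
# A-bar -> P, B-bar -> Q.

from collections import deque

RULES = [("S", "CDGHF"),
         ("H", "HH"), ("HF", "BF"), ("HB", "BB"),        # (I b, c)
         ("G", "GG"), ("GB", "AB"), ("GA", "AA"),        # (I d, e)
         ("CDA", "CEPA"), ("CDB", "CEQB"),               # (II a)
         ("CEP", "PCE"), ("CEQ", "QCE"),                 # (II b)
         ("EA#", "DA#"), ("EB#", "DB#"),                 # (II d)
         ("AD", "DA"), ("BD", "DB"), ("FD", "DF"),       # (II e)
         ("CDFA", "ACDF"), ("CDFB", "BCDF"),             # (III)
         ("P", "a"), ("Q", "b"), ("A", "a"), ("B", "b"), # (IV a)
         ("CDF#", "CDc#"), ("CDc", "Ccc"), ("Cc", "cc")] # (IV a, b)
RULES += [("E" + x + y, y + "E" + x) for x in "AB" for y in "ABF"]  # (II c)

def reachable_terminals(max_len=9):
    seen, out, queue = {"#S#"}, set(), deque(["#S#"])
    while queue:
        s = queue.popleft()
        blocked = True
        for lhs, rhs in RULES:
            i = s.find(lhs)
            while i != -1:
                blocked = False
                t = s[:i] + rhs + s[i + len(lhs):]
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    queue.append(t)
                i = s.find(lhs, i + 1)
        if blocked and s.strip("#").islower():   # terminated, and in V_T
            out.add(s)
    return out

print(reachable_terminals())
```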
THEOREM 4. There are type 1 languages which are not type 2 languages.

PROOF. We have seen that the language L consisting of all and only the strings #aⁿbᵐaⁿbᵐccc# is a type 1 language. Suppose that G is a type 2 grammar of L. We can assume for each A in the vocabulary of G that there are infinitely many x's such that A ⇒ x (otherwise A can be eliminated from G in favor of a finite number of rules of the form B → φ1zφ2 whenever G contains the rule B → φ1Aφ2 and A ⇒ z). L contains infinitely many sentences, but G contains only finitely many symbols. Therefore we can find an A such that for infinitely many sentences of L there is an #S#-derivation the next-to-last line of which is of the form xAy (i.e., A is its only nonterminal symbol). From among these, select a sentence s = #aⁿbᵐaⁿbᵐccc# such that m + n > r, where a1 ··· ar is the longest string z such that A → z (note that there must be a z such that A → z, since A appears in the next-to-last line of a derivation of a terminal string; and, by Axiom 4, there are only finitely many such z's). But now it is immediately clear that if (φ1, ···, φt+1) is a #S#-derivation of s for which φt = #xAy#, then no matter what x and y may be, (φ1, ···, φt) is the initial part of infinitely many derivations of terminal strings not in L. Hence G is not a grammar of L.
We see, therefore, that grammars meeting Restriction 2 are essentially less powerful than those meeting only Restriction 1. However, the extra power of grammars that do not meet Restriction 2 appears, from the above results, to be a defect of such grammars, with regard to the intended interpretation. The extra power of type 1 grammars comes (in part, at least) from the fact that even though only a single symbol is rewritten with each addition of a new line to a derivation, it is nevertheless possible in effect to incorporate a permutation such as AB → BA (Lemma 1). The purpose of permitting only a single symbol to be rewritten was to permit the construction of a tree (as in Sec. 2) as a structural description which specifies that a certain segment x of the generated sentence is an A (e.g., in the example in Sec. 2, John is a Noun Phrase). The tree associated with a derivation such as that in the proof of Lemma 1 will, where it incorporates a permutation AB → BA, specify that the segment derived ultimately from the B of ··· BA ··· is an A, and the segment derived from the A of ··· BA ··· is a B. For example, a type 1 grammar in which both John will come and will John come are derived from an earlier line Noun Phrase-Modal-Verb, where will John come is produced by a permutation, would specify that will in will John come is a Noun Phrase and John a Modal, contrary to intention. Thus the extra power of type 1 grammars is as much a defect as was the still greater power of unrestricted Turing machines (type 0 grammars).
A type 1 grammar may contain minimal rules of the form φ1Aφ2 → φ1ωφ2, whereas in a type 2 grammar, φ1 and φ2 must be null in this case. A rule of the type 1 form asserts, in effect, that A → ω in the context φ1–φ2. Contextual restrictions of this type are often found necessary in construction of phrase structure descriptions for natural languages. Consequently the extra flexibility permitted in type 1 grammars is important. It seems clear, then, that neither Restriction 1 nor Restriction 2 is exactly what is required for the complete reconstruction of immediate constituent analysis. It is not obvious what further qualification would be appropriate.

In type 2 grammars, the anomalies mentioned in footnote 5 are avoided. The final line of each terminated derivation is a string in VT, and no string in VT can head a derivation of more than one line.
SECTION 5
We consider now grammars meeting Restriction 2.

DEFINITION 7. A grammar is self-embedding (s.e.) if it contains an A such that for some φ, ψ (φ ≠ I ≠ ψ), A ⇒ φAψ.
DEFINITION 8. A grammar G is regular if it contains only rules of the form A → a or A → BC, where B ≠ C; and if whenever A → φ1Bφ2 and A → ψ1Bψ2 are rules of G, then φi = ψi (i = 1, 2).

THEOREM 5. If G is a type 2 grammar, there is a regular grammar G* which is equivalent to G and which, furthermore, is non-s.e. if G is non-s.e.
PROOF. Define L(φ) (i.e., the length of φ) to be m if φ = α1 ··· αm, where αi ≠ I.

Given a type 2 grammar G, consider all derivations D = (φ1, ···, φt) meeting the following four conditions:

(a) for some A, φ1 = A
(b) D contains no repeating lines
(c) L(φt-1) < 4
(d) L(φt) ≥ 4 or φt is terminal.

Clearly there is a finite number of such derivations. Let G1 be the grammar containing the minimal rule φ → ψ just in case for some such derivation D, φ = φ1 and ψ = φt. Clearly G1 is a type 2 grammar equivalent to G, and is non-s.e. if G is non-s.e., since φ → ψ in G1 only if φ ⇒ ψ in G.

Suppose that G1 contains rules R1 and R2:

    R1: A → φ1Bφ2 = ω1ω2 ··· ωn (ωi ≠ I)
    R2: A → ψ1Bψ2

where φ1 ≠ ψ1 or φ2 ≠ ψ2. Replace R1 by the three rules

    R1a: A → CD
    R1b: C → ω1
    R1c: D → ω2 ··· ωn

where C and D are new and distinct. Continuing in this way, always adding new symbols, form G2 equivalent to G1, non-s.e. if G1 is non-s.e., and meeting the second of the regularity conditions.

If G2 contains a rule A → α1 ··· αn (αi ≠ I, n > 2), replace it by the rules

    R1: A → α1 ··· αn-2B
    R2: B → αn-1αn

where B is new. Continuing in this way, form G3.

If G3 contains A → ab (a ≠ I ≠ b), replace it by A → BC, B → a, C → b, where B and C are new. If G3 contains A → aB, replace it by A → CB, C → a, where C is new. If it contains A → Ba, replace this by A → BC, C → a, where C is new. Continuing in this way form G4. G4 then is the grammar G* required for the theorem.
Theorem 5 asserts in particular that all type 2 languages can be generated by grammars which yield only trees with no more than two branches from each node. That is, from the point of view of generative power, we do not restrict grammars by requiring that each phrase have at most two immediate constituents (note that in a regular grammar, a "phrase" has one immediate constituent just in case it is interpreted as a word or morpheme class, i.e., a lowest level phrase; an immediate constituent in this case is a member of the class).
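The splitting of long rules can be sketched as follows (one standard way of carrying out that stage, offered as an illustration rather than the paper's exact construction): every type 2 rule with more than two right-hand symbols is broken up with fresh nonterminals, and terminals inside two-symbol rules are lifted out, until only rules of the forms A → BC and A → a remain.

```python
# Reduce type 2 rules to binary (regular-form) rules by introducing
# fresh nonterminals N0, N1, ...

from itertools import count

def binarize(rules):
    fresh = (f"N{i}" for i in count())          # new, distinct symbols
    work = [(l, list(r)) for l, r in rules]     # rhs as a list of symbols
    out = []
    while work:
        lhs, rhs = work.pop()
        if len(rhs) > 2:                        # A -> x1 x2 ... xn becomes
            b = next(fresh)                     # A -> x1 B,  B -> x2 ... xn
            work.append((lhs, [rhs[0], b]))
            work.append((b, rhs[1:]))
        elif len(rhs) == 2:                     # lift terminals out of A -> aB etc.
            pair = []
            for sym in rhs:
                if sym.islower():
                    t = next(fresh)
                    out.append((t, (sym,)))
                    pair.append(t)
                else:
                    pair.append(sym)
            out.append((lhs, tuple(pair)))
        else:
            out.append((lhs, tuple(rhs)))       # A -> a stays as it is
    return out

for rule in binarize([("S", "aSb")]):           # the grammar of L1 = {a^n b^n}
    print(rule)
```

Note that this sketch handles only the rule shapes; Definition 8's further conditions (B ≠ C, uniqueness of the context of each nonterminal) correspond to the other stages of the proof.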
DEFINITION 9. Suppose that Σ is a finite state Markov source with a symbol emitted at each inter-state transition; with a designated initial state S0 and a designated final state Sf; with # emitted on transition from S0 and from Sf to S0, and nowhere else; and with no transition from Sf except to S0. Define a sentence as a string of symbols emitted as the system moves from S0 to a first recurrence of S0. Then the set of sentences that can be emitted by Σ is a finite state language.¹⁰
Since Restriction 3 limits the rules to the form A → aB or A → a, we immediately conclude the following.

THEOREM 6. The type 3 languages are the finite state languages.

PROOF. Suppose that G is a type 3 grammar. We interpret the symbols of VN as designations of states and the symbols of VT as transition symbols. Then a rule of the form A → aB is interpreted as meaning that a is emitted on transition from A to B. An #S#-derivation of G can involve only one application of a rule of the form A → a. This can be interpreted as indicating transition from A to a final state with a emitted. The fact that # bounds each sentence of LG can be understood as indicating the presence of an initial state S0 with # emitted on transition from S0 to S, and as a requirement that the only transition from the final state is to S0, with # emitted. Thus G can be interpreted as a system of the type described in Definition 9. Similarly, each such system can be described as a type 3 grammar.
¹⁰ Alternatively, Σ can be considered as a finite automaton, and the generated finite state language, as the set of input sequences that carry it from S0 to a first recurrence of S0. Cf. Chomsky and Miller (1958) for a discussion of properties of finite state languages and systems that generate them from a point of view related to that of this paper. A finite state language is essentially what is called in Kleene (1956) a "regular event."
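The interpretation in the proof of Theorem 6 can be sketched as a recognizer (an illustration; the boundary # and the initial state S0 are omitted for brevity): the nonterminals of a type 3 grammar act as states, A → aB as "emit a and pass from A to B," and A → a as a transition to the final state, giving a nondeterministic finite automaton.

```python
# Run a type 3 grammar as a nondeterministic finite automaton.

def accepts(rules, s, start="S"):
    FINAL = object()                           # the single final state
    states = {start}
    for i, ch in enumerate(s):
        nxt = set()
        for lhs, rhs in rules:
            if lhs in states and rhs[0] == ch:
                if len(rhs) == 2:
                    nxt.add(rhs[1])            # A -> aB: move to state B
                elif i == len(s) - 1:
                    nxt.add(FINAL)             # A -> a: usable only at the end
        states = nxt
    return FINAL in states

# A type 3 grammar for the strings of a's and b's that end in b
RULES = [("S", "aS"), ("S", "bS"), ("S", "b")]
print(accepts(RULES, "aab"), accepts(RULES, "aba"))   # True False
```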
Restriction 3 limits the rules to the form A → aB or A → a. From Theorem 5 we see that Restriction 2 amounts to a limitation of the rules to the form A → aB, A → a, or A → BC (with the first type dispensable). Hence the fundamental feature distinguishing type 2 grammars (systems of phrase structure) from type 3 grammars (finite automata) is the possibility of rules of the form A → BC in the former. This leads to an important difference in generative power.
THEOREM 7. There exist type 2 languages that are not type 3 languages. (Cf. Chomsky, 1956, 1957.)
In Chomsky (1956), three examples of non-type 3 languages were presented. Let L1 be the language containing just the strings a^n b^n; L2, the language containing just the strings xy, where x is a string of a's and b's and y is the mirror image of x; L3, the language consisting of all strings xx where x is a string of a's and b's. Then L1, L2, and L3 are not type 3 languages. L1 and L2 are type 2 languages (cf. Chomsky, 1956). L3 is a type 1 language but not a type 2 language, as can be shown by proofs similar to those of Lemma 2 and Theorem 4.¹¹
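The three languages are easy to state as membership tests. The sketch below is merely our restatement of their definitions (the function names are ours, and we take n ≥ 1 in L1, a detail the text does not fix explicitly):

```python
def in_L1(s):
    """L1 = { a^n b^n : n >= 1 }."""
    n = len(s) // 2
    return len(s) == 2 * n and n > 0 and s == "a" * n + "b" * n

def in_L2(s):
    """L2 = { x y : x a string of a's and b's, y the mirror image of x }."""
    h = len(s) // 2
    return len(s) % 2 == 0 and set(s) <= {"a", "b"} and s[h:] == s[:h][::-1]

def in_L3(s):
    """L3 = { x x : x a string of a's and b's }."""
    h = len(s) // 2
    return len(s) % 2 == 0 and set(s) <= {"a", "b"} and s[h:] == s[:h]

print(in_L1("aabb"), in_L2("abba"), in_L3("abab"))  # True True True
```

Note the different kinds of memory each test uses: L1 needs only a count, L2 a last-in-first-out comparison, L3 a comparison of two halves in the same order — informally matching their positions in the summary below.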
Suppose that we extend the power of a finite automaton by equipping it with a finite number of counters, each of which can assume infinitely many positions. We permit each counter to shift position in a fixed way with each inter-state transition, and we permit the next transition to be determined by the present state and the present readings of the counters. A language generated (as in Definition 9) by a system of this sort (where each counter begins in a fixed position) will be called a counter language. Clearly L1, though not a finite state (type 3) language, is a counter language. Several different systems of this general type are studied by Schützenberger (1957), where the following, in particular, is proven.
THEOREM 8. L2 is not a counter language.
Thus there are type 2 languages that are not counter languages.¹² To summarize, L1 is a counter language and a type 2 language, but not a type 3 (finite state) language; L2 is a type 2 language but not a counter language (hence not a type 3 language); and L3 is a type 1 language but not a type 2 language.
¹¹ In Chomsky (1956, p. 119) and Chomsky (1957, p. 34), it was erroneously stated that L3 cannot be generated by a phrase structure system. This is true for a type 2, but not a type 1 phrase structure system.
¹² The further question whether all counter languages are type 2 languages (i.e., whether counter languages constitute a step between types 2 and 3 in the hierarchy being considered here) has not been investigated.
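For concreteness, a single counter already handles L1. The following sketch is our own illustration of a counter system in the sense just described, not a construction from the paper: the counter shifts on each transition, and the availability of a transition may depend on the current reading.

```python
def counter_accepts_L1(s):
    """One-counter recognizer for L1 = { a^n b^n : n >= 1 }.

    State 0: reading a's; state 1: reading b's. The counter shifts by a
    fixed amount with each transition, and the next transition depends
    on the present state and the present counter reading.
    """
    state, count = 0, 0
    for ch in s:
        if state == 0 and ch == "a":
            count += 1                    # push one step up per a
        elif state == 0 and ch == "b" and count > 0:
            state, count = 1, count - 1   # switch to b-reading phase
        elif state == 1 and ch == "b" and count > 0:
            count -= 1                    # step back down per b
        else:
            return False                  # no available transition
    return state == 1 and count == 0      # counter back in initial position

print([counter_accepts_L1(s) for s in ["ab", "aaabbb", "aabbb", "abab"]])
# [True, True, False, False]
```

L2, by contrast, requires remembering the *order* of the a's and b's, not just their number — which is the informal content of Theorem 8.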
From Theorems 2, 3, 4 and 7, we conclude:
THEOREM 9. Restrictions 1, 2 and 3 are increasingly heavy. That is, the inclusion in Theorem 1 is proper inclusion, both for grammars (trivially) and for languages.
The fact that L2 is a type 2 language but neither a type 3 nor a counter language is important, since English has the essential properties of L2 (Chomsky, 1956, 1957). We can conclude from this that finite automata (even with a finite number of infinite counters) that produce sentences from "left to right" in the manner of Definition 9 cannot constitute the class F (cf. Sec. 1) from which grammars are drawn; i.e., the devices that generate language cannot be of this character.
SECTION 6
The importance of gaining a better understanding of the difference in generative power between phrase structure grammars and finite state sources is clear from the considerations reviewed in Sec. 5. We shall now show that the source of the excess of power of type 2 grammars over type 3 grammars lies in the fact that the former may be self-embedding (Definition 7). Because of Theorem 5 we can restrict our attention to regular grammars.
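Self-embedding is a decidable property of a grammar. As a minimal sketch (our own, assuming rules in the regular form A → a or A → BC, and assuming every nonterminal derives at least one non-null terminal string), one can compute for each A whether A ⇒ φAψ with φ and ψ both non-null:

```python
def is_self_embedding(rules):
    """Decide whether a grammar with rules A -> a or A -> BC is
    self-embedding, i.e., some A derives phi A psi with phi and psi
    both non-null. Assumption (ours): every nonterminal derives at
    least one non-null terminal string, so a flanking nonterminal
    always contributes non-null material.

    rules: dict mapping A to a list of right-hand sides, each either
    a terminal string "a" or a pair ("B", "C") of nonterminals.
    """
    # reach[A] holds triples (B, left, right): A derives phi B psi,
    # where left/right record whether phi/psi are non-null.
    reach = {A: set() for A in rules}
    for A, rhss in rules.items():
        for rhs in rhss:
            if isinstance(rhs, tuple):
                B, C = rhs
                reach[A].add((B, False, True))  # in A => B C, C flanks B on the right
                reach[A].add((C, True, False))  # in A => B C, B flanks C on the left
    changed = True
    while changed:                              # transitive closure
        changed = False
        for A in rules:
            new = set()
            for (B, l1, r1) in reach[A]:
                for (C, l2, r2) in reach.get(B, ()):
                    new.add((C, l1 or l2, r1 or r2))
            if not new <= reach[A]:
                reach[A] |= new
                changed = True
    return any((A, True, True) in reach[A] for A in rules)

# S => aSb-style grammar in binary form: S -> AT | c, T -> SB, A -> a, B -> b
g = {"S": [("A", "T"), "c"], "T": [("S", "B")], "A": ["a"], "B": ["b"]}
print(is_self_embedding(g))  # True
```

The fixpoint terminates because there are only finitely many triples; the grammar shown is the standard a^n c b^n example written with binary rules.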
Construction: Let G be a non-s.e. regular (type 2) grammar. Let

K = {(A1, ..., Am) | m = 1 or, for 1 ≤ i < j ≤ m, Ai → φAi+1ψ and Ai ≠ Aj}.
We construct the grammar G' with each nonterminal symbol represented in the form [B1 ... Bn]i (i = 1, 2), where the Bj's are in turn nonterminal symbols of G, as follows:¹³
Suppose that (B1, ..., Bn) ∈ K.
(i) If Bn → a in G, then [B1 ... Bn]1 → a[B1 ... Bn]2.
(ii) If Bn → CD where C ≠ Bi ≠ D (i ≤ n), then
(a) [B1 ... Bn]1 → [B1 ... BnC]1
(b) [B1 ... BnC]2 → [B1 ... BnD]1
(c) [B1 ... BnD]2 → [B1 ... Bn]2.
¹³ Since the nonterminal symbols of G and G' are represented in different forms, we can use the symbols → and ⇒ for both G and G' without ambiguity.
(iii) If Bn → CD where Bi = D for some i ≤ n, then
(a) [B1 ... Bn]1 → [B1 ... BnC]1
(b) [B1 ... BnC]2 → [B1 ... Bi]1.
(iv) If Bn → CD where Bi = C for some i ≤ n, then
(a) [B1 ... Bi]2 → [B1 ... BnD]1
(b) [B1 ... BnD]2 → [B1 ... Bn]2.
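Cases (i)-(iv) can be transcribed mechanically. The sketch below is our transcription (the data-structure conventions are invented): a nonterminal [B1 ... Bn]i of G' is encoded as a pair of a tuple of nonterminals of G and an index i, and the rules of G' are generated by closure starting from the chain (S,).

```python
def construct_Gprime(rules, start="S"):
    """From a non-self-embedding regular grammar G (rules A -> a or
    A -> ("B","C")), generate the rules of G' per cases (i)-(iv).
    A nonterminal [B1 ... Bn]i of G' is encoded as the pair
    (("B1", ..., "Bn"), i). Rules of G' are triples
    (lhs, emitted-terminal-or-"", rhs)."""
    out, seen, todo = set(), set(), [(start,)]
    while todo:
        chain = todo.pop()
        if chain in seen:
            continue
        seen.add(chain)
        Bn = chain[-1]
        for rhs in rules.get(Bn, []):
            if not isinstance(rhs, tuple):          # case (i): Bn -> a
                out.add(((chain, 1), rhs, (chain, 2)))
                continue
            C, D = rhs
            if D in chain:                          # case (iii): D recurs
                i = chain.index(D) + 1
                out.add(((chain, 1), "", (chain + (C,), 1)))
                out.add(((chain + (C,), 2), "", (chain[:i], 1)))
                todo.append(chain + (C,))
            elif C in chain:                        # case (iv): C recurs
                i = chain.index(C) + 1
                out.add(((chain[:i], 2), "", (chain + (D,), 1)))
                out.add(((chain + (D,), 2), "", (chain, 2)))
                todo.append(chain + (D,))
            else:                                   # case (ii): no recurrence
                out.add(((chain, 1), "", (chain + (C,), 1)))
                out.add(((chain + (C,), 2), "", (chain + (D,), 1)))
                out.add(((chain + (D,), 2), "", (chain, 2)))
                todo.append(chain + (C,))
                todo.append(chain + (D,))
    return out

# The grammar of tree (4): S -> AB, A -> a, B -> b
g = {"S": [("A", "B")], "A": ["a"], "B": ["b"]}
print(len(construct_Gprime(g)))  # 5
```

Applied to the grammar of tree (4), this yields exactly the five rules used in derivation (5): the (iia), (iib), (iic) steps plus two applications of (i).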
We shall prove that G' is equivalent to G (when slightly modified).
The character of this construction can be clarified by consideration of the trees generated by a grammar (cf. Sec. 3). Since G is regular and non-s.e., we have to consider only the following configurations:
(3) [Tree configurations (a)-(d), not reproduced. Each consists of a chain of nodes B1, B2, ..., Bn: in (a), Bn dominates the terminal a; in (b), Bn dominates C and D; in (c), each Bj dominates Ej and Bj+1, and Bn dominates C and a recurrence of Bi; in (d), each Bj dominates Bj+1 and Ej, and Bn dominates a recurrence of Bi and D.]

where at most two of the branches proceeding from a given node are non-null; in case (b), no node dominated by Bn is labeled Bi (i ≤ n); and in each case, B1 = S.
(i) of the construction corresponds to case (3a), (ii) to (3b), (iii) to (3c), and (iv) to (3d). (3c) and (3d) are the only possible kinds of recursion. If we have a configuration of the type (3c), we can have substrings of the form (x1 ... xn-iy)^k (where Ej ⇒ xj, C ⇒ y) in the resulting terminal strings. In the case of (3d) we can have substrings of the form (yxn-i ... x1)^k (where D ⇒ y, Ej ⇒ xj). (iii) and (iv) accommodate these possibilities by permitting the appropriate cycles in G'. To the earliest (highest) occurrence of a particular nonterminal symbol
Bn in a particular branch of a tree, the construction associates two nonterminal symbols [B1 ... Bn]1 and [B1 ... Bn]2, where B1, ..., Bn-1 are the labels of the nodes dominating this occurrence of Bn. The derivation in G' corresponding to the given tree will contain a subsequence (z[B1 ... Bn]1, ..., zx[B1 ... Bn]2), where Bn ⇒ x and z is the string preceding this occurrence of x in the given derivation in G. For example, corresponding to a tree of the form
corresponding to a tree of the form
S
A
B
I
J
a
b
(4)
generated by a grammar G, the corresponding G' will generate the derivation (5) with the accompanying tree:
1. [S]1
2. [SA]1     (iia)
3. a[SA]2    (i)
4. a[SB]1    (iib)
5. ab[SB]2   (i)
6. ab[S]2    (iic)

(5) [accompanying tree, not reproduced: [S]1 dominates [SA]1; [SA]1 dominates a and [SA]2; [SA]2 dominates [SB]1; [SB]1 dominates b and [SB]2; [SB]2 dominates [S]2]
where the step of the construction permitting each line is indicated at
the right.
We now proceed to establish that the grammar G' given by this construction is actually equivalent (with slight modification) to the given grammar G. This result, which requires a long sequence of introductory lemmas, is stated in a following paragraph as Theorem 10. From this we will conclude that given any non-s.e. type 2 grammar, we can construct an equivalent type 3 grammar (with many vacuous transitions which are, however, eliminable; cf. Chomsky and Miller, 1958). From this follows the main result of the paper (Theorem 11), namely, that the extra power of phrase structure grammars over finite automata as language generators lies in the fact that phrase structure grammars may be self-embedding.
LEMMA 3. If (A1, ..., Am) ∈ K, where K is as in the construction, then Aj ⇒ φAkψ, for 1 ≤ j ≤ k ≤ m.
LEMMA 4. If [B1 ... Bn]i → x[B1 ... Bm]j, C ≠ Bk (k ≤ m, n), and C → φB1ψ, then [CB1 ... Bn]i → x[CB1 ... Bm]j.
Proofs are immediate.
LEMMA 5. If (B1, ..., Bn) ∈ K and 1 < m < n, then
(a) if Bm ⇒ φB1, it is not the case that Bi ⇒ Bmψ (i ≤ n; i ≠ m)
(b) if Bm ⇒ B1ψ, it is not the case that Bi ⇒ φBm (i ≤ n; i ≠ m)
(c) if Bm ⇒ φB1ψ, it is not the case that Bi ⇒ ω1Bmω2Bmω3 (i ≤ n)
PROOF. Suppose that Bm ⇒ φB1 and for some i ≠ m, Bi ⇒ Bmψ. ∴ φ ≠ I ≠ ψ. By Lemma 3, B1 ⇒ ψ1Biψ2. ∴ Bm ⇒ φB1 ⇒ φψ1Biψ2 ⇒ φψ1Bmψψ2. Contra., since now Bm is self-embedded. Similarly, case (b). Suppose Bm ⇒ φB1ψ and for some i, Bi ⇒ ω1Bmω2Bmω3. ∴ B1 ⇒ x1Bix2 ⇒ x1ω1Bmω2Bmω3x2 ⇒ ω4B1ω5B1ψω6 ⇒ ω7B1ω8B1ψB1ω6. Contra. (s.e.).
To facilitate proofs, we adopt the following notational convention:
CONVENTION 2. Suppose that (φ1, ..., φn) is a derivation in G' formed by construction. Then φi = a1 ... aiQi (where Qi is the unique nonterminal symbol that can appear in a derivation¹⁴), Qi → ai+1Qi+1.¹⁵ zn^m = am ... an; zn = zn^1.
LEMMA 6. Suppose that D = (φ1, ..., φr) is a derivation in G' where Qr = [B1]2. Then:
(I) if φ1 = [B1]1, (C1, ..., Cm+1) ∈ K, Ci → Ai+1Ci+1 (for 1 ≤ i ≤ m), and Cm+1 = B1, then there is a derivation
([C1 ... CmB1]1, ..., zr[C1]2)
in G'.
(II) if φ1 = [B1 ... Bn]1 and Bn ⇒ xB1, then there is a derivation
([Bn]1, ..., zr[Bn]2)
in G'.
PROOF. Proof is by simultaneous induction on the length of zr, i.e., the number of non-null symbols among a1, ..., ar.
Suppose that the length of zr is 1. ∴ there is one and only one i s.t. Qi = [...]1 and Qi+1 = [...]2.
(a) Suppose that i > 1. Then φi = ziQi is formed from φi-1 by a rule whose source is (iia) or (iiia), and φi+2 = zi+2Qi+2 is formed from φi+1 = zi+1Qi+1 by a rule whose source is (iic) or (ivb). But for some
¹⁴ Unless the initial line contained more than one nonterminal symbol, a case which will never arise below.
¹⁵ Note that ai+1 will always be I unless the step of the construction justifying φi → φi+1 is (i). a1 will generally be I in this sequence of theorems.
k, Qi-1 = [B1 ... Bk]1, Qi = [B1 ... Bk+1]1, Qi+1 = [B1 ... Bk+1]2, Qi+2 = [B1 ... Bk]2. ∴ Bk → Bk+1D for some D, and Bk → EBk+1 for some E, which contradicts the assumption that G is regular. ∴ i = 1.
(b) Consider now (I). Since i = 1, r = 2. ∴ B1 → z2. By assumption about the Ci's and m applications of Lemma 4, and (i) of the construction, [C1 ... CmB1]1 → z2[C1 ... CmB1]2. Since Ci → Ai+1Ci+1 (Ci ≠ Cj for 1 ≤ i < j ≤ m + 1, since (C1, ..., Cm+1) ∈ K by assumption), it follows that [C1 ... CmB1]2 → [C1 ... Cm]2 → [C1 ... Cm-1]2 → ... → [C1]2. ∴ ([C1 ... CmB1]1, z2[C1 ... CmB1]2, z2[C1 ... Cm]2, ..., z2[C1]2) is the required derivation.
(c) Consider now (II). Since i = 1, Bn → z2 and [Bn]1 → z2[Bn]2, by (i) of construction. ∴ ([Bn]1, z2[Bn]2) is the required derivation.
This proves the lemma for the case of zr of length 1.
Suppose it is true in all cases where zr is of length < t.
Consider (I). Let D be such that zr is of length t. If none of C1, ..., Cm appears in any of the Qi's in D, then the proof is just like (b), above. Suppose that φj is the earliest line in which one of C1, ..., Cm, say Ck, appears in Qj. j > 1, since C1, ..., Cm ≠ B1. By assumption of non-s.e., the rule Qj-1 → ajQj used to form φj can only have been introduced by (iib).¹⁶ ∴ Qj-1 = [B1 ... BnE]2, Qj = [B1 ... BnCk]1, Bn → ECk. But C1, ..., Cm do not occur in Q1, ..., Qj-1 and (C1, ..., Cm, B1) ∈ K. ∴, by Lemma 4,

([C1 ... CmB1]1, ..., zj-1[C1 ... CmB1 ... BnE]2)    (6)

is a derivation. Furthermore zj-1 is not null, since there is at least one transition from [...]1 to [...]2 in (6), which must therefore have been introduced by (i) of the construction. But Bn → ECk. ∴

[C1 ... CmB1 ... BnE]2 → [C1 ... Ck]1    (7)

[by (iiib)]. Furthermore we know that

([B1 ... BnCk]1, ..., zr^{j+1}[B1]2)    (8)
¹⁶ It can only have been introduced by (iia), (iib), (iiia), (iva), or Ck will appear in Qj-1. Suppose (iia). ∴ Qj-1 = [B1 ... Bq]1, Qj = [B1 ... BqCk]1, and Bq → CkD. But Ck ⇒ φB1. Contra. by Lemma 5(a). Suppose (iiia). Same. Suppose (iva). ∴ Qj-1 = [B1 ... Bi]2, Qj = [B1 ... Bi+q]1 (q ≥ 1), Bi+q-1 → BiBi+q, where Ck = Bi+s (1 ≤ s ≤ q). But Ck ⇒ φB1, φ ≠ I. ∴ Ck ⇒ φω1Bi+q-1ω2 → φω1BiBi+qω2 ⇒ φω3Ckω4Bi+qω2, contra. ∴ introduced by (iib).
must be a derivation, since [B1 ... BnCk]1 = Qj; i.e., (8) is just the tail end (φj, ..., φr) of D, with the initial segment zj deleted from each of φj, ..., φr. Since zj-1 is not null, zr^{j+1} is shorter than zr, hence is of length < t. Also, Ck ⇒ xB1, by assumption. ∴ by inductive hypothesis (II), there is a derivation

([Ck]1, ..., zr^{j+1}[Ck]2)    (9)

∴ by inductive hypothesis (I), there is a derivation

([C1 ... Ck]1, ..., zr^{j+1}[C1]2)    (10)

Combining (6), (7), (10), we have the required derivation.
Consider now (II). If n = 1, or there is no such derivation of length t, the proof is trivial. Assume n > 1.
Let φj contain the first Q of the form [B1 ... Bm]1 (j > 1, m ≤ n). Since Bn ⇒ xB1, it follows from Lemmas 3, 5 that Bm ⇒ yB1. Since m ≤ n, we see by checking through the possibilities in the construction that not all of Q1, ..., Qj-1 are of the form [...]1. ∴ there was at least one application of (i) in forming (φ1, ..., φj-1). ∴ zj-1 is not null. But

([B1 ... Bm]1, ..., zr^{j+1}[B1]2)    (11)

is, like (8), a derivation. ∴ by inductive hypothesis (II), there is a derivation

([Bm]1, ..., zr^{j+1}[Bm]2)    (12)

where zr^{j+1} is shorter than zr.
Let φk contain the first Q of the form [B1 ... Bm]2 (m ≤ n). As above, Bm ⇒ yB1. From Lemma 5 it follows that the rule used to form φk+1 must be justified by (iic) or (ivb) of the construction. In either case, Qk+1 = [B1 ... Bm-1]2. Similarly, we show that

([B1 ... Bm]2, ..., [B1]2)    (13)

is a derivation. ∴ zr = zk.
Let q = min(j, k). Then all of Q2, ..., Qq-1 are of the form [B1 ... Bn+v]i.
It is clear that we can construct ψ1, ..., ψq-1 s.t. for p < q, ψp = zpQp', where Qp' = [BnBn+1 ... Bn+v]i when Qp = [B1 ... Bn+v]i. Consequently

([Bn]1, ..., zq-1Q'q-1)    (14)

is a derivation.
Suppose q = j. ∴ Qq-1 = [B1 ... Bn+v]i → aj[B1 ... Bm]1 = Qj, where m ≤ n < n + v. ∴ this rule can only have been introduced by (iiib) of the construction. ∴ i = 2 and Bn+v-1 → Bn+vBm.
Case 1. Suppose m = n. ∴

[BnBn+1 ... Bn+v]2 → [Bn]1 = [Bm]1    (15)

Combining (14), (15), (12), we have the required derivation.
Case 2. Suppose m < n. ∴ Bm ≠ Bn, ..., Bn+v. ∴

[BnBn+1 ... Bn+v]2 → [BnBn+1 ... Bn+v-1Bm]1    (16)

by (iib). We have seen that Bm ⇒ yB1. ∴ Bn+v-1 ⇒ ωB1. ∴ for s ≤ v - 1, Bn+s → EsBn+s+1, by Lemma 5.
But (12) is a derivation where zr^{j+1} is of length < t. ∴ by inductive hypothesis (I) there is a derivation

([BnBn+1 ... Bn+v-1Bm]1, ..., zr^{j+1}[Bn]2)    (17)

Combining (14), (16), (17), we have the required derivation.
Suppose, finally, that q = k. We have seen that in this case zr = zk. But Qq-1 = [B1 ... Bn+v]i → ak[B1 ... Bm]2, where m ≤ n < n + v. ∴ this rule can only have been introduced by (iic) or (ivb). In either case, i = 2, m = n, v = 1, and Q'q-1 = [BnBn+1]2 → ak[Bn]2. Combining this with (14) we have the required derivation.
We have thus shown that the lemma is true in case zr is of length 1, and that it is true for zr of length t on the assumption that it holds for zr of length < t. Therefore it holds for every derivation D.
LEMMA 7. Suppose that D = (φ1, ..., φr) is a derivation in G' where Q1 = [B1]1. Then
(I) if φr = [B1]2, (C1, ..., Cm+1) ∈ K, Ci → Ci+1Ai+1 (1 ≤ i ≤ m), and Cm+1 = B1, then there is a derivation
([C1]1, ..., zr[C1 ... CmB1]2)
in G'.
(II) if φr = [B1 ... Bn]2 and Bn ⇒ B1x, then there is a derivation
([Bn]1, ..., zr[Bn]2)
in G'.
The proof is analogous to that of Lemma 6. In the inductive step, case (I), we take Qj as the last of the Q's in which one of C1, ..., Cm appears, and instead of (iiib) in (7), we form

[C1 ... Ck]2 → [C1 ... CmB1 ... BnE]1

by (iva). The proof goes through as above, with similar modifications throughout. In case (II) of the inductive step we let Qj be the last Q of the form [B1 ... Bm]2 (j < r, m ≤ n), and Qk the last Q of the form [B1 ... Bm]1 (m ≤ n). Taking q = max(j, k) [instead of min(j, k)], the proof is analogous throughout, with (iva) taking the place of (iiib).
In general, because of the symmetries in cases (iii), (iv) of the construction [reflecting the parallel possibilities (3c), (3d) for recursion], most of the results obtained come in symmetrical pairs, as above, where the proof of the second is analogous to the proof of the first. Only one of the pair of proofs will actually be presented.
We will require below only the following special case of (I) of Lemmas 6, 7 (which, however, could not be proved without the general case).
LEMMA 8. Suppose that D = ([B]1, ..., z[B]2) is a derivation in G' and that C ≠ B. Then
(a) if C → AB, there is a derivation
([CB]1, ..., z[C]2)
in G'
(b) if C → BA, there is a derivation
([C]1, ..., z[CB]2)
in G'.
DEFINITION 10. Suppose that G' is formed from G by the construction and D is an α-derivation of x in G. D will be said to be represented in G' if and only if α = a, or α = A and there is a derivation ([A]1, ..., x[A]2) in G'.
What we are now trying to prove is that every S-derivation of G is represented in G'.
DEFINITION 11. Let D1 = (φ1, ..., φm) and D2 = (ψ1, ..., ψn) be derivations in G. Then D1*D2 is the derivation
(φ1ψ1, φ2ψ1, ..., φmψ1, φmψ2, ..., φmψn).
LEMMA 9. Let D1 be an A-derivation of x and D2 a B-derivation of y in G. If D1 and D2 are represented in G' and C → AB, then
D3 = (C, ξ1, ..., ξm)
is represented in G', where (ξ1, ..., ξm) = D1*D2. (D3 is thus a C-derivation of xy.)
PROOF. By hypothesis, there are derivations

([A]1, ..., x[A]2)    (18)
([B]1, ..., y[B]2)    (19)

in G'.
Case 1. Suppose A ≠ C ≠ B. Then by Lemma 8, there are derivations

([C]1, ..., x[CA]2)    (20)
([CB]1, ..., y[C]2)    (21)

in G'. By (iib) of the construction,

[CA]2 → [CB]1    (22)

Combining (20), (22), and (21), we have the required derivation.
Case 2. C = A. ∴ C ≠ B by assumption of regularity of G. By Lemma 8, case (a), we have again the derivation (21). By (iva) of the construction,

[A]2 = [C]2 → [CB]1.    (23)

Combining (18), (23), (21) we have the required derivation.
Case 3. C = B. ∴ C ≠ A. By Lemma 8, case (b), we have (20). By (iiib),

[CA]2 → [C]1 = [B]1.    (24)

Combining (20), (24), (19), we have the required derivation.
Since C → CC is ruled out by assumption of regularity, these are the only possible cases.
LEMMA 10. If D1 = (φ1, ..., φr) is a χ1ω1-derivation, where χ1 ≠ I ≠ ω1, then there is a derivation D2 = D3*D4 = (ψ1, ..., ψr) such that ψr = φr, D3 is a χ1-derivation and D4 is an ω1-derivation.
PROOF. Since for i > 1, φi is formed from φi-1 by replacement of a single symbol of φi-1,¹⁷ we can clearly find χi, ωi s.t. φi = χiωi where either (a) χi = χi-1 and ωi-1 → ωi or (b) χi-1 → χi and ωi = ωi-1. Then D3 is the subsequence of (χ1, ..., χr) formed by dropping repetitions and D4 is the subsequence of (ω1, ..., ωr) formed by dropping repetitions.
LEMMA 11. If G' is formed from G by the construction, then every α-derivation D in G is represented in G'.
¹⁷ Which, however, may not be uniquely determined. Compare footnote 8.
PROOF. Obvious, in case D contains 2 lines. Suppose true for all derivations of fewer than r lines (r > 2). Let D = (φ1, ..., φr), where φ1 = α. Since r > 2, α = A, φ2 = BC. ∴ (φ2, ..., φr) is a BC-derivation. By Lemma 10, there is a D2 = D3*D4 = (ψ2, ..., ψr) s.t. D3 is a B-derivation, D4 a C-derivation, and ψr = φr. By inductive hypothesis, both D3 and D4 are represented in G'. By Lemma 9, D is represented in G'.
It remains to show that if ([A]1, ..., x[A]2) is a derivation in G', then there is a derivation (A, ..., x) in G.
LEMMA 12.¹⁸ Suppose that G' is formed by the construction from G, regular and non-s.e., and that
(a) D = (φ1, ..., φm1, ..., φq, ..., φm2, ..., φr) is a derivation in G', where Q1 = [A1], Qm1 = Qm2 = [A1 ... Ak]n, Qq = [A1 ... Aj]i,
(b) there is no u, v s.t. u ≠ v, Qu = Qv = [B1 ... Bs]t, and s < k,¹⁹
(c) for m1 < u < m2, if Qu = [A1 ... As]t, then s ≥ j.²⁰
Then it follows that
(A) if n = 2, there is an m0 < m1 such that Qm0 = [A1 ... Ak]1
(B) if n = 1, there is an m3 > m2 such that Qm3 = [A1 ... Ak]2
(C) j = k.²¹
PROOF. (A) Suppose n = 2. Assume φr̄ to be the earliest line to contain [A1 ... Ak]2. Clearly there is an r̄ ≤ m1 s.t. Qr̄ = [A1 ... Ak]2, Qr̄-1 = [A1 ... Ak+t]2 (t > 0). If there is no m0 < r̄ s.t.
Qm0 = [A1 ... Ak]1,
then there must be a u < r̄ s.t. Qu = [A1 ... As]2,
Qu+1 = [A0 ... Ak-1B0 ... Bm]1,
where φu+1 is formed by (iva) of the construction, A0 = I, B0 = Ak, m ≥ 1, and s < k (since (iva) gives the only possibility for increasing the length of Q by more than 1). ∴ Bm-1 → AsBm. ∴ As ⇒ ω1AsBmω2.
But Qu = [A1 ... As]2 cannot recur in any line following φr̄ [this would contradict assumption (b)]. Therefore, just as above, there must be a v > m2 s.t. Qv-1 = [A0 ... As-1C0 ... Cm']2, Qv = [A1 ... Ap]1, where φv is formed by (iiib) of the construction, A0 = I, C0 = As, m' ≥ 1, p < s (since (iiib) gives the only possibility for decreasing the
¹⁸ We continue to employ Convention 2, above.
¹⁹ That is, Qm1 = Qm2 is the shortest Q of D that repeats.
²⁰ That is, Qq is the shortest Q of this form between φm1 and φm2.
²¹ That is, Qq is not shorter than Qm1 = Qm2.
length of Q by more than 1). ∴ Cm'-1 → Cm'Ap. But Ap ⇒ ω1Cm'-1ω2 (Lemma 3). ∴ Ap ⇒ ω1Cm'Apω2 ⇒ ω1Cm'ω3Asω4 ⇒ ω1Cm'ω3ω5AsBmω6ω4. Contra., since G is assumed to be non-s.e.
∴ there is an m0 < r̄ ≤ m1 s.t. Qm0 = [A1 ... Ak]1.
(B) Suppose n = 1. Proof is analogous.
(C) (I). Suppose n = 2. Suppose j < k. Suppose i (in Qq) is 2. Clearly there must be a v > m1 s.t. either Qv = [A1 ... Aj]2 [which contradicts assumption (b)] or Qv-1 = [A0 ... Aj-1C0 ... Cm]2, Qv = [A1 ... Ap]1, where φv is formed by (iiib) of the construction, A0 = I, C0 = Aj, m ≥ 1, p < j [as in the second paragraph of the proof of (A)]. Suppose the latter. ∴ Cm-1 → CmAp. ∴ Aj ⇒ φCmApψ. Furthermore, since p < j, Ap ⇒ ω1CmApω2.
From assumption (c) and assumption of regularity of G, it follows that φq+1 can only have been formed by (iva) of the construction. ∴ Qq+1 = [A0 ... Aj-1D0 ... Dt]1, where A0 = I, D0 = Aj, t ≥ 1. ∴ Dt-1 → AjDt. ∴ Aj ⇒ x1AjDtx2. ∴ Aj ⇒ ω1Cmω2Ajx1Dtx2, and Aj is self-embedded, contrary to assumption.
Suppose that i (in Qq) is 1. By (A), there is an m0 < m1 s.t. Qm0 = [A1 ... Ak]1. ∴ there is a u < m0 s.t. either Qu = [A1 ... Aj]1 [which contradicts assumption (b)] or Qu = [A1 ... As]2,
Qu+1 = [A0 ... Aj-1B0 ... Bm]1,
where φu+1 is formed by (iva), A0 = I, B0 = Aj, m ≥ 1, s < j. Assuming the latter, we conclude that Aj ⇒ ω1Ajω2Bmω3, as above.
From assumption (c) and assumption of regularity of G, it follows that φq can only have been formed by (iiib). Contradiction follows as above.
(II) Suppose n = 1. Proof is analogous.
This completes the proof. From Lemma 12 it follows readily by the same kind of reasoning as above that
COROLLARY. Under the assumptions of Lemma 12,
(A) if n = 2, φm1+1 is formed by (iva) of the construction
(B) if n = 1, φm2 is formed by (iiib) of the construction
(C) Qu is of the form [A1 ... AkB0 ... Bs]t (s ≥ 0, B0 = I), for u such that: (a) where n = 2 and m0 is as in (A), Lemma 12, then m0 < u < m2; (b) where n = 1 and m3 is as in (B), Lemma 12, then m1 < u < m3.
Furthermore, for m1 < u < m2, s > 0 if t ≠ n.
DEFINITION 12. Let D = (φ1, ..., φr) be a derivation in G' formed by the construction from G. Then D' corresponds to D if D' is a derivation of zr²² in G and for each i, j, k (i < j) such that
(a) φi is the earliest line containing [A1 ... Ak]1
(b) φj is the latest line containing [A1 ... Ak]2
(c) there is no p, q s.t. i < p < j, q < k, and Qp = [A1 ... Aq]t,
there is a subsequence (ziAkψ, ..., zjψ) in D'.
LEMMA 13. Let D = (φ1, ..., φr) be a derivation in G' formed by the construction from a regular, non-s.e. G. Suppose that Q1 = [A1 ... As]1, Qr = [A1 ... As]2, and there is no p, q such that 1 < p < r, q < s,
Qp = [A1 ... Aq]i.
Then there is a derivation D' = (ψ1, ..., ψt) corresponding to D.
PROOF. Proof is by induction on the number of recurrences of symbols Qi in D (i.e., the number of cycles in the derivation).
Suppose that there are no recurrences of any Qi in D. It follows that there can have been no applications of (iva) in the construction of D, i.e., no pairs Qi = [A1 ... Aj]2, Qi+1 = [A1 ... Ak]1 where j < k. For suppose there were such a pair. ∴ Ak-1 → AjAk. Also, j > s, or Qi is repeated as Qr. Clearly there is an m > i + 1 s.t. Qm = [A1 ... Aj+n]2 (n > 0). ∴ there is a t > m s.t. either Qt = [A1 ... Aj]2 (contrary to assumption of no repetitions) or
Qt = [A1 ... Aj+u]2,  Qt+1 = [A1 ... Aj-v]1  (u, v ≥ 1),
where φt+1 is formed by (iiib). ∴ Aj is self-embedded, contrary to the assumption that G is non-s.e. Similarly, there can be no applications of (iiib) in the construction of D. But now the proof for this case follows immediately by induction on the length of D.
Suppose now that the lemma is true for every derivation containing < n occurrences of repeating Q's. Suppose that D contains n such occurrences.
I.
1. Suppose that the shortest recurring Q in D is [A1 ... Ak]2.
2. Select m1, m2 s.t. m1 < m2; Qm1 = [A1 ... Ak]2 = Qm2; there is no i, m1 < i < m2, s.t. Qi = [A1 ... Ak]2; there is no j > m2 s.t. Qj = [A1 ... Ak]2.
²² Compare Convention 2.
3. By Lemma 12 (A), we know that there is an m0 < m1 s.t. Qm0 = [A1 ... Ak]1. Select m0 as the earliest such (there is in fact only one). By the Corollary to Lemma 12, (C), and the inductive hypothesis, there is a derivation D1 = (zm0Ak, ..., zm1)²³ corresponding to (φm0, ..., φm1).
4. By Corollary (A), we know that
φm1+1 = zm1[A0 ... Ak-1B0 ... Bm]1,
where A0 = I, B0 = Ak, m ≥ 1, Bm-1 → AkBm. Obviously, there is a v (m1 < v < m2) s.t. either Qv = [A0 ... Ak-1B0 ... Bm]2 or
Qv = [A0 ... Ak-1B0 ... Bm+t]2,  Qv+1 = [A0 ... Ak-1B0 ... Bm-u]1,
where u, t ≥ 1 and φv+1 is formed by (iiib) [note Corollary (C)]. From the latter assumption we can deduce self-embedding, as above. ∴ we can select v as the largest integer < m2 s.t. Qv = [A0 ... Ak-1B0 ... Bm]2.
5. Let t be the largest integer (m1 + 1 ≤ t ≤ v) s.t.
Qt = [A0 ... Ak-1B0 ... Bm-u]i,  u > 0.
Suppose that i = 1. But φt+1 must be formed by (iia) or (iiia) of the construction. ∴ u = 1 and Bm-1 → BmC, contrary to assumption of regularity, since Bm-1 → AkBm.
∴ i = 2, and Qt+1 = [A0 ... Ak-1B0 ... Bm+n]1 (n ≥ 0), where φt+1 is formed by (iva) of the construction. ∴
Bm+n-1 → Bm-uBm+n ⇒ ω1Bm-1ω2Bm+n → ω1AkBmω2Bm+n.
Suppose n = 0. Then Bm-1 → Bm-uBm, so that, by regularity, Bm-u = Ak. ∴ Qt = [A1 ... Ak]2, contrary to assumption in step 2.
∴ n ≠ 0. ∴ Bm ⇒ ω3Bm+n-1ω4. ∴ Bm ⇒ ω3ω1AkBmω2Bm+nω4, contra. (s.e.).
6. ∴ there is no t such as that postulated in step 5. Consequently (φm1+1, ..., φv) meets the assumption of the inductive hypothesis²⁴ and there is a derivation D2 = (zm1+1Bm, ..., zv)²⁵ corresponding to (φm1+1, ..., φv).
7. Since v was selected in step 4 to be maximal, it follows that φv+1 cannot be formed by (iva), by reasoning similar to that involved in
²³ Recall that zm1 = zm0 zm1^{m0+1}; i.e., there is a derivation (Ak, ..., zm1^{m0+1}).
²⁴ From nonexistence of such a t it follows at once that for u such that m1 < u < v, Qu = [A0 ... Ak-1B0 ... BmC0 ... Cs]i (s ≥ 0, C0 = I).
²⁵ That is, there is a derivation (Bm, ..., zv^{m1+2}).
step 4. By regularity assumption, it cannot have been formed by (iib) or (iiib) of the construction, since Bm-1 → AkBm. ∴
Qv+1 = [A0 ... Ak-1B0 ... Bm-1]2.
8. Suppose m = 1, so that Qv+1 = [A1 ... Ak]2. ∴ v + 1 = m2, by assumption of step 2, and Ak → AkB1. Let D2' be the derivation formed from D2 (cf. step 6) by deleting initial zm1+1 from each line. Let
(ψ1, ..., ψp) = D1*D2'
(cf. Definition 11; D1 as in step 3). Clearly D3 = (zm0Ak, ψ1, ..., ψp) is a derivation corresponding to (φm0, ..., φm2).
9. Suppose m ≥ 2. By assumption that G is non-s.e., and that v is maximal (in step 4), we can show that φv+2 must be formed by (iib) of the construction (all other cases lead to contradiction). ∴
Qv+2 = [A0 ... Ak-1B0 ... Bm-2C]1,  Bm-2 → Bm-1C.
As above, we can find a v1 which is the largest integer < m2 s.t. Qv1 = [A0 ... Ak-1B0 ... Bm-2C]2 and s.t. (φv+2, ..., φv1) meets the inductive hypothesis. ∴ there is a derivation D4 = (zv+2C, ..., zv1) corresponding to (φv+2, ..., φv1).
10. Suppose m = 2. ∴
Bm-2 → B1C,  v1 + 1 = m2,  Qv1+1 = [A1 ... Ak]2
(as above). Let D4' be the derivation formed from D4 by deleting initial zv+2 from each line. Let (ψ1, ..., ψp) be as in step 8. Let (χ1, ..., χq) = (zm0B1, ψ1, ..., ψp)*D4'. Clearly D5 = (zm0Ak, χ1, ..., χq) is a derivation corresponding to (φm0, ..., φm2).
11. Similarly, whatever m is, we can find a derivation
Δ = (zm0Ak, ..., zm2)
corresponding to (φm0, ..., φm2).
12. Consider now the derivation D6 formed by deleting from the original D the lines φm1+1, ..., φm2 and the medial segment zm2^{m1+1} from each later line. That is, D6 = (ψ1, ..., ψt) (t = r - (m2 - m1)), where for i ≤ m1, ψi = φi, and for i > m2, ψi-m2+m1 = zm1 zi^{m2+1} Qi. By inductive hypothesis, there is a derivation D7 corresponding to D6.
In steps 2 and 3, m0, m1, m2 were chosen so that φm0 contains the earliest occurrence of Qm0 = [A1 ... Ak]1, and φm2 the latest occurrence of Qm1 = Qm2 = [A1 ... Ak]2, and so that no occurrences of Qm1 occur
166
CHOMSKY
between ~ 1 and ~ . . ' . in Ds, Cmo contains the earliest occurrence of
Q,~o and ¢/ml the latest occurrence of Q~I. Furthermore, by Corollary
(C), there is no Q shorter than Qm0 between ~m0 and ~ 1 . . ' . by inductive hypothesis and the definition of correspondence, it follows that D7
contains a subsequenee D7 = (z~oAk~b, . . . , z m ~ ) . But step 11 guarantees
us a derivation A = (z,~oAk , . . . , z ~ ) corresponding to ( ~ 0 , "" ", ~ ) .
We now construct Ds by replacing 1)7 in D7 by A = (zmoAk~, . . . , z ~ ) ,
formed by suffixing ~ to each line of 4, and inserting z~°+~ after z ~ in
all lines of D7 following the subsequenee/)7.
Clearly/)8 corresponds to D, which is the required result in case the
shortest recurring Q is of the form [...]~.
II.
An analogous proof can be given for the case in which the shortest recurring Q is of the form [...]_1.

We have shown that the lemma holds for derivations with no recursions, and that it holds of a derivation with n occurrences of recurring Q's on the assumption that it holds for all derivations with fewer than n such occurrences. ∴ it is true of all derivations.
A corollary follows immediately.
COROLLARY. If G' is formed from G by the construction and D' = ([A]_1, ..., x[A]_2) is a derivation in G', then there is a derivation D = (A, ..., x) in G.
From this result and Lemma 11, we draw the following conclusion.
THEOREM 10. If G' is formed from G by the construction, then there is a derivation (S, ..., z) in G if and only if there is a derivation ([S]_1, ..., z[S]_2) in G'.
That is, if [S]_1 in G' plays the role of S in G, then G and G' are equivalent if we emend the construction by adding the rule Q_1 → a wherever there are Q_1, ..., Q_n (n ≥ 2) such that Q_1 → aQ_2 and Q_2 → Q_3 → ... → Q_n, where Q_n = [S]_2, Q_i is of the form [...]_2 for 1 < i ≤ n, and Q_1 is of the form [...]_1.
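As a concrete sketch, the emendation amounts to a transitive-closure computation: collect every nonterminal that reaches [S]_2 through unit rules, then add Q_1 → a for each rule Q_1 → aQ_2 whose successor lies in that set. The encoding below is purely illustrative (the tuple representation of rules and the names `emend` and `final` are assumptions, not anything given in the text):

```python
def emend(productions, final):
    """Add (Q1, a, None) whenever Q1 -> a Q2 and Q2 derives `final`
    by unit rules alone.

    productions: set of (lhs, terminal, successor) triples, where a
    unit rule Q -> Q' is (Q, None, Q') and a terminating rule Q -> a
    is (Q, a, None).
    """
    # Nonterminals that derive `final` via zero or more unit rules.
    reaches = {final}
    changed = True
    while changed:
        changed = False
        for lhs, term, nxt in productions:
            if term is None and nxt in reaches and lhs not in reaches:
                reaches.add(lhs)
                changed = True
    # The emended rules Q1 -> a.
    added = {(lhs, term, None)
             for lhs, term, nxt in productions
             if term is not None and nxt in reaches}
    return productions | added

# A chain Q1 -> a Q2, Q2 -> Q3 -> S2 (S2 standing in for [S]_2):
P = {("Q1", "a", "Q2"), ("Q2", None, "Q3"),
     ("Q3", None, "S2"), ("S2", "b", None)}
assert ("Q1", "a", None) in emend(P, "S2")
```

The fixpoint loop handles chains of any length n ≥ 2; when n = 2 the successor is [S]_2 itself, which is in the closure from the start.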
But in the grammar thus formed all rules are of the form A → aB (where a is I unless the rule was formed by step (i) of the construction) or A → a. It is thus a type 3 grammar, and the language L_G generated by G could have been generated by a finite state Markov source (cf. Theorem 6) with many vacuous transitions. But for every such source there is an equivalent source with no identity transitions (cf. Chomsky and Miller, 1958). Therefore L_G could have been generated by a finite Markov source of the usual type. Obviously, every type 3 grammar is non-s.e. (the lines of its A-derivations are all of the form xB). Consequently:
THEOREM 11. If L is a type 2 language, then it is not a type 3 (finite
state) language if and only if all of its grammars are self-embedding.
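For illustration (an example not given in the text): the grammar with rules S → aSb and S → ab is self-embedding, since S derives aSb with nonempty strings on both sides of S, and it generates {a^n b^n : n ≥ 1}. Every grammar of this language is self-embedding, so by Theorem 11 it is a type 2 language that is not type 3. A minimal enumerator sketch, in Python, with an illustrative grammar encoding:

```python
from collections import deque

# Self-embedding type 2 grammar: S -> aSb | ab.
# Nonterminals are uppercase letters; terminals are lowercase.
RULES = {"S": ["aSb", "ab"]}

def sentences(max_len):
    """Enumerate all terminal strings of length <= max_len derivable from S."""
    out, queue = set(), deque(["S"])
    while queue:
        s = queue.popleft()
        if len(s) > max_len:          # prune: expansions only grow
            continue
        nt = next((c for c in s if c.isupper()), None)
        if nt is None:                # no nonterminal left: a sentence
            out.add(s)
            continue
        i = s.index(nt)               # rewrite the leftmost nonterminal
        for rhs in RULES[nt]:
            queue.append(s[:i] + rhs + s[i + 1:])
    return out

assert sentences(6) == {"ab", "aabb", "aaabbb"}
```

Every sentence has the form a^n b^n, which is exactly the kind of nested dependency that no finite state Markov source can produce.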
Among the many open questions in this area, it seems particularly important to try to arrive at some characterization of the languages of these²⁶ various types²⁷ and of the languages that belong to one type but not the next lower type in the classification. In particular, it would be interesting to determine a necessary and sufficient structural property that marks languages as being of type 2 but not type 3. Even given Theorem 11, it does not appear easy to arrive at such a structural characterization theorem for those type 2 languages that are beyond the bounds of type 3 description.
RECEIVED: October 28, 1958.
REFERENCES
CHOMSKY, N. (1956). Three models for the description of language. IRE Trans. on Inform. Theory IT-2, No. 3, 113-124.
CHOMSKY, N. (1957). "Syntactic Structures." Mouton and Co., The Hague.
CHOMSKY, N., and MILLER, G. A. (1958). Finite state languages. Inform. and Control 1, 91-112.
DAVIS, M. (1958). "Computability and Unsolvability." McGraw-Hill, New York.
HARRIS, Z. S. (1952a). Discourse analysis. Language 28, 1-30.
HARRIS, Z. S. (1952b). Discourse analysis: A sample text. Language 28, 474-494.
HARRIS, Z. S. (1957). Co-occurrence and transformation in linguistic structure. Language 33, 283-340.
KLEENE, S. C. (1956). Representation of events in nerve nets. In "Automata
Studies" (C. E. Shannon and J. McCarthy, eds.), pp. 3-40. Princeton Univ.
Press, Princeton, New Jersey.
POST, E. L. (1944). Recursively enumerable sets of positive integers and their
decision problems. Bull. Am. Math. Soc. 50, 284-316.
SCHÜTZENBERGER, M. P. (1957). Unpublished paper.
²⁶ And several other types. In particular, investigations of this kind will be of limited significance for natural languages until the results are extended to transformational grammars. This is a task of considerable difficulty for which investigations of the type presented here are a necessary prerequisite.
²⁷ As, for example, the results cited in footnote 4 characterize finite state languages.