INFORMATION AND CONTROL 2, 137-167 (1959)

On Certain Formal Properties of Grammars*

NOAM CHOMSKY

Massachusetts Institute of Technology, Cambridge, Massachusetts and The Institute for Advanced Study, Princeton, New Jersey

A grammar can be regarded as a device that enumerates the sentences of a language. We study a sequence of restrictions that limit grammars first to Turing machines, then to two types of system from which a phrase structure description of the generated language can be drawn, and finally to finite state Markov sources (finite automata). These restrictions are shown to be increasingly heavy in the sense that the languages that can be generated by grammars meeting a given restriction constitute a proper subset of those that can be generated by grammars meeting the preceding restriction. Various formulations of phrase structure description are considered, and the source of their excess generative power over finite state sources is investigated in greater detail.

* This work was supported in part by the U. S. Army (Signal Corps), the U. S. Air Force (Office of Scientific Research, Air Research and Development Command), and the U. S. Navy (Office of Naval Research). This work was also supported in part by the Transformations Project on Information Retrieval of the University of Pennsylvania. I am indebted to George A. Miller for several important observations about the systems under consideration here, and to R. B. Lees for material improvements in presentation.

SECTION 1

A language is a collection of sentences of finite length all constructed from a finite alphabet (or, where our concern is limited to syntax, a finite vocabulary) of symbols. Since any language $L$ in which we are likely to be interested is an infinite set, we can investigate the structure of $L$ only through the study of the finite devices (grammars) which are capable of enumerating its sentences. A grammar of $L$ can be regarded as a function whose range is exactly $L$. Such devices have been called "sentence-generating grammars."$^{1}$

$^{1}$ Following a familiar technical use of the term "generate," cf. Post (1944). This locution has, however, been misleading, since it has erroneously been interpreted as indicating that such sentence-generating grammars consider language from the point of view of the speaker rather than the hearer. Actually, such grammars take a completely neutral point of view. Compare Chomsky (1957, p. 48). We can consider a grammar of $L$ to be a function mapping the integers onto $L$, order of enumeration being immaterial (and easily specifiable, in many ways) to this purely syntactic study, though the question of the particular "inputs" required to produce a particular sentence may be of great interest for other investigations which can build on syntactic work of this more restricted kind.

A theory of language will contain, then, a specification of the class $F$ of functions from which grammars for particular languages may be drawn. The weakest condition that can significantly be placed on grammars is that $F$ be included in the class of general, unrestricted Turing machines. The strongest, most limiting condition that has been suggested is that each grammar be a finite Markovian source (finite automaton).$^{2}$

$^{2}$ Compare Definition 9, Sec. 5.

The latter condition is known to be too strong; if $F$ is limited in this way it will not contain a grammar for English (Chomsky, 1956). The former condition, on the other hand, has no interest. We learn nothing about a natural language from the fact that its sentences can be effectively displayed, i.e., that they constitute a recursively enumerable set. The reason for this is clear. Along with a specification of the class $F$ of grammars, a theory of language must also indicate how, in general, relevant structural information can be obtained for a particular sentence generated by a particular grammar. That is, the theory must specify a class $\Sigma$ of "structural descriptions" and a functional $\Phi$ such that given $f \in F$ and $x$ in the range of $f$, $\Phi(f, x) \in \Sigma$ is a structural description of $x$ (with respect to the grammar $f$) giving certain information which will facilitate and serve as the basis for an account of how $x$ is used and understood by speakers of the language whose grammar is $f$; i.e., which will indicate whether $x$ is ambiguous, to what other sentences it is structurally similar, etc.

These empirical conditions that lead us to characterize $F$ in one way or another are of critical importance. They will not be further discussed in this paper,$^{3}$ but it is clear that we will not be able to develop an adequate formulation of $\Phi$ and $\Sigma$ if the elements of $F$ are specified only as such "unstructured" devices as general Turing machines. Interest in structural properties of natural language thus serves as an empirical motivation for investigation of devices with more generative power than finite automata, and more special structure than Turing machines.

$^{3}$ Except briefly in §2. In Chomsky (1956, 1957), an appropriate $\Phi$ and $\Sigma$ (i.e., an appropriate method for determining structural information in a uniform manner from the grammar) are described informally for several types of grammar, including those that will be studied here. It is, incidentally, important to recognize that a grammar of a language that succeeds in enumerating the sentences will (although it is far from easy to obtain even this result) nevertheless be of quite limited interest unless the underlying principles of construction are such as to provide a useful structural description.

This paper is concerned with the effects of a sequence of increasingly heavy restrictions on the class $F$ which limit it first to Turing machines and finally to finite automata and, in the intermediate stages, to devices which have linguistic significance in that generation of a sentence automatically provides a meaningful structural description. We shall find that these restrictions are increasingly heavy in the sense that each limits more severely the set of languages that can be generated. The intermediate systems are those that assign a phrase structure description to the resulting sentence.

Given such a classification of special kinds of Turing machines, the main problem of immediate relevance to the theory of language is that of determining where in the hierarchy of devices the grammars of natural languages lie. It would, for example, be extremely interesting to know whether it is in principle possible to construct a phrase structure grammar for English (even though there is good motivation of other kinds for not doing so). Before we can hope to answer this, it will be necessary to discover the structural properties that characterize the languages that can be enumerated by grammars of these various types. If the classification of generating devices is reasonable (from the point of view of the empirical motivation), such purely mathematical investigation may provide deeper insight into the formal properties that distinguish natural languages, among all sets of finite strings in a finite alphabet.
Questions of this nature appear to be quite difficult in the case of the special classes of Turing machines that have the required linguistic significance.$^{4}$ This paper is devoted to a preliminary study of the properties of such special devices, viewed as grammars.

$^{4}$ In Chomsky and Miller (1958), a structural characterization theorem is stated for languages that can be enumerated by finite automata, in terms of the cyclical structure of these automata. The basic characterization theorem for finite automata is proven in Kleene (1956).

It should be mentioned that there appears to be good evidence that devices of the kinds studied here are not adequate for formulation of a full grammar for a natural language (see Chomsky, 1956, §4; 1957, Chapter 5). Left out of consideration here are what have elsewhere been called "grammatical transformations" (Harris, 1952a, b, 1957; Chomsky, 1956, 1957). These are complex operations that convert sentences with a phrase structure description into other sentences with a phrase structure description. Nevertheless, it appears that devices of the kind studied in the following pages must function as essential components in adequate grammars for natural languages. Hence investigation of these devices is important as a preliminary to the far more difficult study of the generative power of transformational grammars (as well as, negatively, for the information it should provide about what it is in natural language that makes a transformational grammar necessary).

SECTION 2

A phrase structure grammar consists of a finite set of "rewriting rules" of the form $\varphi \to \psi$, where $\varphi$ and $\psi$ are strings of symbols. It contains a special "initial" symbol $S$ (standing for "sentence") and a boundary symbol $\#$ indicating the beginning and end of sentences. Some of the symbols of the grammar stand for words and morphemes (grammatically significant parts of words).
These constitute the "terminal vocabulary." Other symbols stand for phrases, and constitute the "nonterminal vocabulary" ($S$ is one of these, standing for the "longest" phrase). Given such a grammar, we generate a sentence by writing down the initial string $\#S\#$, applying one of the rewriting rules to form a new string $\#\varphi_1\#$ (that is, we might have applied the rule $\#S\# \to \#\varphi_1\#$ or the rule $S \to \varphi_1$), applying another rule to form a new string $\#\varphi_2\#$, and so on, until we reach a string $\#\varphi_n\#$ which consists solely of terminal symbols and cannot be further rewritten. The sequence of strings constructed in this way will be called a "derivation" of $\#\varphi_n\#$.

Consider, for example, a grammar containing the rules: $S \to AB$, $A \to C$, $CB \to Cb$, $C \to a$, and hence providing the derivation $D = (\#S\#, \#AB\#, \#CB\#, \#Cb\#, \#ab\#)$. We can represent $D$ diagrammatically in the form

        S
       / \
      A   B
      |   |
      C   b
      |
      a            (1)

If appropriate restrictions are placed on the form of the rules $\varphi \to \psi$ (in particular, the condition that $\psi$ differ from $\varphi$ by replacement of a single symbol of $\varphi$ by a non-null string), it will always be possible to associate with a derivation a labeled tree in the same way. These trees can be taken as the structural descriptions discussed in Sec. 1, and the method of constructing them, given a derivation, will (when stated precisely) be a definition of the functional $\Phi$. A substring $x$ of the terminal string of a given derivation will be called a phrase of type $A$ just in case it can be traced back to a point labeled $A$ in the associated tree (thus, for example, the substring enclosed within the boundaries is a phrase of the type "sentence").
If in the example given above we interpret $A$ as Noun Phrase, $B$ as Verb Phrase, $C$ as Singular Noun, $a$ as John, and $b$ as comes, we can regard $D$ as a derivation of John comes providing the structural description (1), which indicates that John is a Singular Noun and a Noun Phrase, that comes is a Verb Phrase, and that John comes is a Sentence. Grammars containing rules formulated in such a way that trees can be associated with derivations will thus have a certain linguistic significance in that they provide a precise reconstruction of large parts of the traditional notion of "parsing" or, in its more modern version, immediate constituent analysis. (Cf. Chomsky (1956, 1957) for further discussion.)

The basic system of description that we shall consider is a system $G$ of the following form: $G$ is a semi-group under concatenation with strings in a finite set $V$ of symbols as its elements, and $I$ as the identity element. $V$ is called the "vocabulary" of $G$. $V = V_T \cup V_N$ ($V_T$, $V_N$ disjoint), where $V_T$ is the "terminal vocabulary" and $V_N$ the "nonterminal vocabulary." $V_T$ contains $I$ and a "boundary" element $\#$. $V_N$ contains an element $S$ (sentence). A two-place relation $\to$ is defined on elements of $G$, read "can be rewritten as." This relation satisfies the following conditions:

AXIOM 1. $\to$ is irreflexive.

AXIOM 2. $A \in V_N$ if and only if there are $\varphi$, $\psi$, $\omega$ such that $\varphi A \psi \to \varphi \omega \psi$.

AXIOM 3. There are no $\varphi$, $\psi$, $\omega$ such that $\varphi \to \psi \# \omega$.

AXIOM 4. There is a finite set of pairs $(\chi_1, \omega_1), \ldots, (\chi_n, \omega_n)$ such that for all $\varphi$, $\psi$, $\varphi \to \psi$ if and only if there are $\varphi_1$, $\varphi_2$, and $j \le n$ such that $\varphi = \varphi_1 \chi_j \varphi_2$ and $\psi = \varphi_1 \omega_j \varphi_2$.

Thus the pairs $(\chi_j, \omega_j)$ whose existence is guaranteed by Axiom 4 give a finite specification of the relation $\to$. In other words, we may think of the grammar as containing a finite number of rules $\chi_i \to \omega_i$ which completely determine all possible derivations.
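The way a finite rule set fixes the relation $\to$ (Axiom 4) can be illustrated with a small sketch. It assumes that every symbol is a single character, and the names `RULES`, `successors`, and `is_derivation` are hypothetical helpers introduced only for this illustration. The rules are those of the example grammar of this section, and the check confirms that each line of $D$ rewrites to the next.

```python
# A minimal sketch, assuming single-character symbols.
# RULES, successors and is_derivation are hypothetical names, not from the paper.
RULES = [("S", "AB"), ("A", "C"), ("CB", "Cb"), ("C", "a")]


def successors(line, rules=RULES):
    """Return every string psi such that line -> psi.

    Following Axiom 4, psi is obtained by replacing one occurrence
    of some chi_j in the line by the corresponding omega_j.
    """
    out = []
    for lhs, rhs in rules:
        start = line.find(lhs)
        while start != -1:
            out.append(line[:start] + rhs + line[start + len(lhs):])
            start = line.find(lhs, start + 1)
    return out


def is_derivation(lines, rules=RULES):
    """Check that each line can be rewritten as the line that follows it."""
    return all(b in successors(a, rules) for a, b in zip(lines, lines[1:]))


# The derivation D of the example grammar.
D = ["#S#", "#AB#", "#CB#", "#Cb#", "#ab#"]
print(is_derivation(D))
```

Because the last line `#ab#` contains no left-hand side of any rule, `successors("#ab#")` is empty, which is the sense in which that string "cannot be further rewritten."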
The presentation will be greatly facilitated by the adoption of the following notational convention (which was in fact followed above).

CONVENTION 1: We shall use capital letters for strings in $V_N$; small Latin letters for strings in $V_T$; Greek letters for arbitrary strings; early letters of all alphabets for single symbols (members of $V$); late letters of all alphabets for arbitrary strings.

DEFINITION 1. $(\varphi_1, \ldots, \varphi_n)$ $(n \ge 1)$ is a $\varphi$-derivation of $\psi$ if $\varphi = \varphi_1$, $\psi = \varphi_n$, and $\varphi_i \to \varphi_{i+1}$ $(1 \le i < n)$.

DEFINITION 2. A $\varphi$-derivation is terminated if it is not a proper initial subsequence of any $\varphi$-derivation.$^{5}$

DEFINITION 3. The terminal language $L_G$ generated by $G$ is the set of strings $x$ such that there is a terminated $\#S\#$-derivation of $x$.$^{6}$

DEFINITION 4. $G$ is equivalent to $G^*$ if $L_G = L_{G^*}$.

DEFINITION 5. $\varphi \Rightarrow \psi$ if there is a $\varphi$-derivation of $\psi$.

$\Rightarrow$ (which is the ordinary ancestral of $\to$) is thus a partial ordering of strings in $G$. These notions appear, in slightly different form, in Chomsky (1956, 1957).

This paper will be devoted to a study of the effect of imposing the following additional restrictions on grammars of the type described above.

RESTRICTION 1. If $\varphi \to \psi$, then there are $A$, $\varphi_1$, $\varphi_2$, $\omega$ such that $\varphi = \varphi_1 A \varphi_2$, $\psi = \varphi_1 \omega \varphi_2$, and $\omega \ne I$.

RESTRICTION 2. If $\varphi \to \psi$, then there are $A$, $\varphi_1$, $\varphi_2$, $\omega$ such that $\varphi = \varphi_1 A \varphi_2$, $\psi = \varphi_1 \omega \varphi_2$, $\omega \ne I$, but $A \to \omega$.

RESTRICTION 3. If $\varphi \to \psi$, then there are $A$, $\varphi_1$, $\varphi_2$, $\omega$, $a$, $B$ such that $\varphi = \varphi_1 A \varphi_2$, $\psi = \varphi_1 \omega \varphi_2$, $\omega \ne I$, $A \to \omega$, but $\omega = aB$ or $\omega = a$.

The nature of these restrictions is clarified by comparison with Axiom 4, above. Restriction 1 requires that the rules of the grammar [i.e., the minimal pairs $(\chi_j, \omega_j)$ of Axiom 4] all be of the form $\varphi_1 A \varphi_2 \to \varphi_1 \omega \varphi_2$, where $A$ is a single symbol and $\omega \ne I$. Such a rule asserts that $A \to \omega$ in the context $\varphi_1\,\_\_\,\varphi_2$ (which may be null).
Restriction 2 requires that the limiting context indeed be null; that is, that the rules all be of the form $A \to \omega$, where $A$ is a single symbol, and that each such rule may be applied independently of the context in which $A$ appears. Restriction 3 limits the rules to the form $A \to aB$ or $A \to a$ (where $A$, $B$ are single nonterminal symbols, and $a$ is a single terminal symbol).

$^{5}$ Note that a terminated derivation need not terminate in a string of $V_T$ (i.e., it may be "blocked" at a nonterminal string), and that a derivation ending with a string of $V_T$ need not be terminated (if, e.g., the grammar contains such rules as $ab \to cd$).

$^{6}$ Thus the terminal language $L_G$ consists only of those strings of $V_T$ which are derivable from $\#S\#$ but which cannot head a derivation (of $\ge 2$ lines).

DEFINITION 6. For $i = 1, 2, 3$, a type $i$ grammar is one meeting Restriction $i$, and a type $i$ language is one with a type $i$ grammar. A type 0 grammar (language) is one that is unrestricted.

Type 0 grammars are essentially Turing machines; type 3 grammars, finite automata. Type 1 and 2 grammars can be interpreted as systems of phrase structure description.

SECTION 3

Theorem 1 follows immediately from the definitions.

THEOREM 1. For both grammars and languages, type 0 $\supseteq$ type 1 $\supseteq$ type 2 $\supseteq$ type 3.

The following is, furthermore, well known.

THEOREM 2. Every recursively enumerable set of strings is a type 0 language (and conversely).$^{7}$

That is, a grammar of type 0 is a device with the generative power of a Turing machine. The theory of type 0 grammars and type 0 languages is thus part of a rapidly developing branch of mathematics (recursive function theory). Conceptually, at least, the theory of grammar can be viewed as a study of special classes of recursive functions.
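Restrictions 1 to 3 and Definition 6 can be read as a test on the form of each minimal rule. The sketch below is an illustration under stated assumptions, not part of the paper: symbols are single characters, capital letters are taken to be nonterminal and small letters terminal (in the spirit of Convention 1), and `rule_type` and `grammar_type` are hypothetical names. For each rule it reports the largest $i$ whose restriction the rule's form satisfies; a grammar then has the smallest value found among its rules.

```python
# A minimal sketch; rule_type and grammar_type are hypothetical helper names.
# Assumption: one character per symbol, capitals nonterminal, small letters terminal.

def is_nonterminal(sym):
    return sym.isupper()


def rule_type(lhs, rhs):
    """Largest i in {0, 1, 2, 3} such that lhs -> rhs has the form allowed by Restriction i."""
    if len(lhs) == 1 and is_nonterminal(lhs) and rhs:
        # Context is null, so the rule is at least of the type 2 form A -> omega.
        is_a = len(rhs) == 1 and rhs.islower()
        is_aB = len(rhs) == 2 and rhs[0].islower() and is_nonterminal(rhs[1])
        return 3 if (is_a or is_aB) else 2
    # Restriction 1: lhs = p + A + q and rhs = p + omega + q with omega non-null.
    for i, sym in enumerate(lhs):
        if not is_nonterminal(sym):
            continue
        p, q = lhs[:i], lhs[i + 1:]
        if len(rhs) > len(p) + len(q) and rhs.startswith(p) and rhs.endswith(q):
            return 1
    return 0


def grammar_type(rules):
    """A grammar meets Restriction i only if every one of its rules does."""
    return min(rule_type(lhs, rhs) for lhs, rhs in rules)


example = [("S", "AB"), ("A", "C"), ("CB", "Cb"), ("C", "a")]
print(grammar_type(example))
```

On the example grammar of Section 2 the rule $CB \to Cb$ rewrites $B$ only in the context of a preceding $C$, so the grammar as written is of type 1 but not of type 2. A permutation such as $AB \to BA$ has none of the three forms and is classified as type 0.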
THEOREM 3. Each type 1 language is a decidable set of strings.$^{7a}$

$^{7}$ See, for example, Davis (1958, Chap. 6, §2). It is easily shown that the further structure in type 0 grammars over the combinatorial systems there described does not affect this result.

$^{7a}$ But not conversely. For suppose we give an effective enumeration of type 1 grammars, thus enumerating type 1 languages as $L_1, L_2, \ldots$. Let $s_1, s_2, \ldots$ be an effective enumeration of all finite strings in what we can assume (without restriction) to be the common, finite alphabet of $L_1, L_2, \ldots$. Given the index of a language in the enumeration $L_1, L_2, \ldots$, we have immediately a decision procedure for this language. Let $M$ be the "diagonal" language containing just those strings $s_i$ such that $s_i \notin L_i$. Then $M$ is a decidable language not in the enumeration. I am indebted to Hilary Putnam for this observation.

That is, given a type 1 grammar $G$, there is an effective procedure for determining whether an arbitrary string $x$ is in the language enumerated by $G$. This follows from the fact that if $\varphi_i$, $\varphi_{i+1}$ are successive lines of a derivation produced by a type 1 grammar, then $\varphi_{i+1}$ cannot contain fewer symbols than $\varphi_i$, since $\varphi_{i+1}$ is formed from $\varphi_i$ by replacing a single symbol $A$ of $\varphi_i$ by a non-null string $\omega$. Clearly any string $x$ which has a $\#S\#$-derivation has a $\#S\#$-derivation in which no line repeats, since lines between repetitions can be deleted. Consequently, given a grammar $G$ of type 1 and a string $x$, only a finite number of derivations (those with no repetitions and no lines longer than $x$) need be investigated to determine whether $x \in L_G$. We see, therefore, that Restriction 1 provides an essentially more limited type of grammar than type 0.

The basic relation $\to$ of a type 1 grammar is specified completely by a finite set of pairs of the form $(\varphi_1 A \varphi_2, \varphi_1 \omega \varphi_2)$. Suppose that $\omega = \alpha_1 \cdots \alpha_m$.
We can then associate with this pair the element

                A
       /    /       \      \
    α_1   α_2  ...  α_{m-1}  α_m          (2)

Corresponding to any derivation $D$ we can construct a tree formed from the elements (2) associated with the transitions between successive lines of $D$, adding elements to the tree from the appropriate node as the derivation progresses.$^{8}$ We can thus associate a labeled tree with each derivation as a structural description of the generated sentence. The restriction on the rules $\varphi \to \psi$ which leads to type 1 grammars thus has a certain linguistic significance since, as pointed out in Sec. 1, these grammars provide a precise reconstruction of much of what is traditionally called "parsing" or "immediate constituent analysis." Type 1 grammars are the phrase structure grammars considered in Chomsky (1957, Chap. 4).

$^{8}$ This associated tree might not be unique, if, for example, there were a derivation containing the successive lines $\varphi_1 AB \varphi_2$, $\varphi_1 ACB \varphi_2$, since this step in the derivation might have used either of the rules $A \to AC$ or $B \to CB$. It is possible to add conditions on $G$ that guarantee uniqueness without affecting the set of generated languages.

SECTION 4

LEMMA 1. Suppose that $G$ is a type 1 grammar, and $X$, $B$ are particular strings of $G$. Let $G'$ be the grammar formed by adding $XB \to BX$ to $G$. Then there is a type 1 grammar $G^*$ equivalent to $G'$.

PROOF. Suppose that $X = A_1 \cdots A_n$. Choose $C_1, \ldots, C_{n+1}$ new and distinct. Let $Q$ be the sequence of rules

    $A_1 \cdots A_n B \to C_1 A_2 \cdots A_n B$
    $\vdots$
    $\to C_1 \cdots C_n B$
    $\to C_1 \cdots C_{n+1}$
    $\to B C_2 \cdots C_{n+1}$
    $\vdots$
    $\to B A_1 \cdots A_n$

where the left-hand side of each rule is the right-hand side of the immediately preceding rule. Let $G^*$ be formed by adding the rules of $Q$ to $G$. It is obvious that if there is a $\#S\#$-derivation of $x$ in $G^*$ using rules of $Q$, then there is a $\#S\#$-derivation of $x$ in $G^*$ in which the rules are applied only in the sequence $Q$, with no other rules interspersed (note that $x$ is a terminal string).
Consequently the only effect of adding the rules of $Q$ to $G$ is to permit a string $\varphi XB \psi$ to be rewritten $\varphi BX \psi$, and $L_{G^*}$ contains only sentences of $L_{G'}$. It is clear that $L_{G^*}$ contains all the sentences of $L_{G'}$, and that $G^*$ meets Restriction 1.

By a similar argument it can easily be shown that type 1 languages are those whose grammars meet the condition that if $\varphi \to \psi$, then $\psi$ is at least as long as $\varphi$. That is, weakening Restriction 1 to this extent will not increase the class of generated languages.

LEMMA 2. Let $L$ be the language containing all and only the sentences of the form $\#a^n b^m a^n b^m ccc\#$ $(m, n \ge 1)$. Then $L$ is a type 1 language.

PROOF. Consider the grammar $G$ with $V_T = \{a, b, c, I, \#\}$, $V_N = \{S, S_1, S_2, A, \bar{A}, B, \bar{B}, C, D, E, F\}$, and the following rules:

(I) (a) $S \to CDS_1S_2F$
    (b) $S_2 \to S_2S_2$
    (c) $S_2F \to BF$, $S_2B \to BB$
    (d) $S_1 \to S_1S_1$
    (e) $S_1B \to AB$, $S_1A \to AA$
(II) (a) $CDA \to CE\bar{A}A$, $CDB \to CE\bar{B}B$
    (b) $CE\bar{\alpha} \to \bar{\alpha}CE$
    (c) $E\alpha\beta \to \beta E\alpha$
    (d) $E\alpha\# \to D\alpha\#$
    (e) $\alpha D \to D\alpha$
(III) $CDF\alpha \to \alpha CDF$
(IV) (a) $A, \bar{A} \to a$; $B, \bar{B} \to b$
    (b) $CDF\# \to CDc\#$, $CDc \to Ccc$, $Cc \to cc$

where $\alpha$, $\beta$ range over $\{A, B, F\}$. It can now be determined that the only $\#S\#$-derivations of $G$ that terminate in strings of $V_T$ are produced in the following manner:

(1) the rules of (I) are applied as follows: (a) once, (b) $m - 1$ times for some $m \ge 1$, (c) $m$ times, (d) $n - 1$ times for some $n \ge 1$, and (e) $n$ times, giving $\#CD\alpha_1 \cdots \alpha_{n+m}F\#$, where $\alpha_i = A$ for $i \le n$, $\alpha_i = B$ for $i > n$

(2) the rules of (II) are applied as follows: (a) once and (b) once, giving $\#\bar{\alpha}_1 CE\alpha_1 \cdots \alpha_{n+m}F\#$;$^{9}$ (c) $n + m$ times and (d) once, giving $\#\bar{\alpha}_1 C\alpha_2 \cdots \alpha_{n+m}FD\alpha_1\#$; (e) $n + m$ times, giving $\#\bar{\alpha}_1 CD\alpha_2 \cdots \alpha_{n+m}F\alpha_1\#$

(3) the rules of (II) are applied, as in (2), $n + m - 1$ more times, giving $\#\bar{\alpha}_1 \cdots \bar{\alpha}_{n+m}CDF\alpha_1 \cdots \alpha_{n+m}\#$

$^{9}$ Where here and henceforth, $\bar{\alpha}_i = \bar{A}$ if $\alpha_i = A$, $\bar{\alpha}_i = \bar{B}$ if $\alpha_i = B$. Note that use of rules of the type of (II), (b), (c), (e), and (III) is justified by Lemma 1.
(4) the rule (III) is applied $n + m$ times, giving $\#\bar{\alpha}_1 \cdots \bar{\alpha}_{n+m}\alpha_1 \cdots \alpha_{n+m}CDF\#$

(5) the rules of (IV) are applied, (a) $2(n + m)$ times, (b) once, giving $\#a^n b^m a^n b^m ccc\#$

Any other sequence of rules (except for a certain freedom in point of application of [IVa]) will fail to produce a derivation terminating in a string of $V_T$. Notice that the form of the terminal string is completely determined by step (1) above, where $n$ and $m$ are selected. Rules (II) and (III) are nothing but a copying device that carries any string of the form $\#CDXF\#$ (where $X$ is any string of $A$'s and $B$'s) to the corresponding string $\#\bar{X}XCDF\#$ (where $\bar{X}$ is $X$ with each symbol barred), which is converted by (IV) into terminal form. By Lemma 1, there is a type 1 grammar $G^*$ equivalent to $G$, as was to be proven.

THEOREM 4. There are type 1 languages which are not type 2 languages.

PROOF. We have seen that the language $L$ consisting of all and only the strings $\#a^n b^m a^n b^m ccc\#$ is a type 1 language. Suppose that $G$ is a type 2 grammar of $L$. We can assume for each $A$ in the vocabulary of $G$ that there are infinitely many $x$'s such that $A \Rightarrow x$ (otherwise $A$ can be eliminated from $G$ in favor of a finite number of rules of the form $B \to \varphi_1 z \varphi_2$ whenever $G$ contains the rule $B \to \varphi_1 A \varphi_2$ and $A \Rightarrow z$). $L$ contains infinitely many sentences, but $G$ contains only finitely many symbols. Therefore we can find an $A$ such that for infinitely many sentences of $L$ there is an $\#S\#$-derivation the next-to-last line of which is of the form $xAy$ (i.e., $A$ is its only nonterminal symbol). From among these, select a sentence $s = \#a^n b^m a^n b^m ccc\#$ such that $m + n > r$, where $a_1 \cdots a_r$ is the longest string $z$ such that $A \to z$ (note that there must be a $z$ such that $A \to z$, since $A$ appears in the next-to-last line of a derivation of a terminal string; and, by Axiom 4, there are only finitely many such $z$'s). But now it is immediately clear that if $(\varphi_1, \ldots, \varphi_{t+1})$ is a $\#S\#$-derivation of $s$ for which $\varphi_t = \#xAy\#$, then no matter what $x$ and $y$ may be, $(\varphi_1, \ldots, \varphi_t)$ is the initial part of infinitely many derivations of terminal strings not in $L$. Hence $G$ is not a grammar of $L$.

We see, therefore, that grammars meeting Restriction 2 are essentially less powerful than those meeting only Restriction 1. However, the extra power of grammars that do not meet Restriction 2 appears, from the above results, to be a defect of such grammars, with regard to the intended interpretation. The extra power of type 1 grammars comes (in part, at least) from the fact that even though only a single symbol is rewritten with each addition of a new line to a derivation, it is nevertheless possible in effect to incorporate a permutation such as $AB \to BA$ (Lemma 1). The purpose of permitting only a single symbol to be rewritten was to permit the construction of a tree (as in Sec. 2) as a structural description which specifies that a certain segment $x$ of the generated sentence is an $A$ (e.g., in the example in Sec. 2, John is a Noun Phrase). The tree associated with a derivation such as that in the proof of Lemma 1 will, where it incorporates a permutation $AB \to BA$, specify that the segment derived ultimately from the $B$ of $\cdots BA \cdots$ is an $A$, and the segment derived from the $A$ of $\cdots BA \cdots$ is a $B$. For example, a type 1 grammar in which both John will come and will John come are derived from an earlier line Noun Phrase-Modal-Verb, where will John come is produced by a permutation, would specify that will in will John come is a Noun Phrase and John a Modal, contrary to intention. Thus the extra power of type 1 grammars is as much a defect as was the still greater power of unrestricted Turing machines (type 0 grammars).
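The way a permutation is smuggled in by single-symbol rewrites (Lemma 1) can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the paper's own formulation: $B$ is taken to be a single symbol, the fresh symbols are given the hypothetical names `C1`, `C2`, and so on (assumed not to occur elsewhere in the grammar), and `permutation_rules` and `single_symbol_rewrite` are hypothetical helper names. The function builds a chain of rules in which each rule changes exactly one symbol and whose net effect is $XB \to BX$.

```python
# A minimal sketch of a Lemma-1-style chain; all helper names are hypothetical.
# Assumption: b is a single symbol and "C1", "C2", ... are fresh symbols.

def permutation_rules(xs, b):
    """Return rules (lhs, rhs), each a tuple of symbols, chaining xs + [b] to [b] + xs."""
    n = len(xs)
    fresh = [f"C{i}" for i in range(1, n + 2)]   # C1 .. C(n+1)
    line = list(xs) + [b]
    lines = [tuple(line)]
    for i in range(n):              # A_i -> C_i, from left to right
        line[i] = fresh[i]
        lines.append(tuple(line))
    line[n] = fresh[n]              # B -> C_(n+1)
    lines.append(tuple(line))
    line[0] = b                     # C_1 -> B
    lines.append(tuple(line))
    for i in range(n):              # C_(i+2) -> A_(i+1)
        line[i + 1] = xs[i]
        lines.append(tuple(line))
    # Each rule's left-hand side is the right-hand side of the preceding rule.
    return list(zip(lines, lines[1:]))


def single_symbol_rewrite(lhs, rhs):
    """True if rhs differs from lhs in exactly one position (the form Restriction 1 requires)."""
    return len(lhs) == len(rhs) and sum(x != y for x, y in zip(lhs, rhs)) == 1


rules = permutation_rules(["A1", "A2"], "B")
for lhs, rhs in rules:
    print(" ".join(lhs), "->", " ".join(rhs))
```

For $n = 2$ the chain has $2n + 2 = 6$ rules. Reading the output makes the anomaly discussed above visible: the symbol that ends up as `B` in first position is rewritten from `C1`, which in turn came from `A1`, so a tree built from these steps traces the final `B` back to the original `A1`.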
A type 1 grammar may contain minimal rules of the form $\varphi_1 A \varphi_2 \to \varphi_1 \omega \varphi_2$, whereas in a type 2 grammar, $\varphi_1$ and $\varphi_2$ must be null in this case. A rule of the type 1 form asserts, in effect, that $A \to \omega$ in the context $\varphi_1\,\_\_\,\varphi_2$. Contextual restrictions of this type are often found necessary in construction of phrase structure descriptions for natural languages. Consequently the extra flexibility permitted in type 1 grammars is important. It seems clear, then, that neither Restriction 1 nor Restriction 2 is exactly what is required for the complete reconstruction of immediate constituent analysis. It is not obvious what further qualification would be appropriate.

In type 2 grammars, the anomalies mentioned in footnote 5 are avoided. The final line of each terminated derivation is a string in $V_T$, and no string in $V_T$ can head a derivation of more than one line.

SECTION 5

We consider now grammars meeting Restriction 2.

DEFINITION 7. A grammar is self-embedding (s.e.) if it contains an $A$ such that for some $\varphi$, $\psi$ $(\varphi \ne I \ne \psi)$, $A \Rightarrow \varphi A \psi$.

DEFINITION 8. A grammar $G$ is regular if it contains only rules of the form $A \to a$ or $A \to BC$, where $B \ne C$; and if whenever $A \to \varphi_1 B \varphi_2$ and $A \to \psi_1 B \psi_2$ are rules of $G$, then $\varphi_i = \psi_i$ $(i = 1, 2)$.

THEOREM 5. If $G$ is a type 2 grammar, there is a regular grammar $G^*$ which is equivalent to $G$ and which, furthermore, is non-s.e. if $G$ is non-s.e.

PROOF. Define $L(\varphi)$ (i.e., length of $\varphi$) to be $m$ if $\varphi = a_1 \cdots a_m$, where $a_i \ne I$. Given a type 2 grammar $G$, consider all derivations $D = (\varphi_1, \ldots, \varphi_t)$ meeting the following four conditions:

(a) for some $A$, $\varphi_1 = A$
(b) $D$ contains no repeating lines
(c) $L(\varphi_{t-1}) < 4$
(d) $L(\varphi_t) \ge 4$ or $\varphi_t$ is terminal.

Clearly there is a finite number of such derivations. Let $G_1$ be the grammar containing the minimal rule $\varphi \to \psi$ just in case for some such derivation $D$, $\varphi = \varphi_1$ and $\psi = \varphi_t$.
Clearly $G_1$ is a type 2 grammar equivalent to $G$, and is non-s.e. if $G$ is non-s.e., since $\varphi \to \psi$ in $G_1$ only if $\varphi \Rightarrow \psi$ in $G$.

Suppose that $G_1$ contains rules $R_1$ and $R_2$:

    $R_1$: $A \to \varphi_1 B \varphi_2 = \omega_1 \omega_2 \omega_3 \omega_4$ $(\omega_i \ne I)$
    $R_2$: $A \to \psi_1 B \psi_2$

where $\varphi_1 \ne \psi_1$ or $\varphi_2 \ne \psi_2$. Replace $R_1$ by the three rules

    $R_{11}$: $A \to CD$
    $R_{12}$: $C \to \omega_1 \omega_2$
    $R_{13}$: $D \to \omega_3 \omega_4$

where $C$ and $D$ are new and distinct. Continuing in this way, always adding new symbols, form $G_2$ equivalent to $G_1$, non-s.e. if $G_1$ is non-s.e., and meeting the second of the regularity conditions.

If $G_2$ contains a rule $A \to \alpha_1 \cdots \alpha_n$ $(\alpha_i \ne I, n > 2)$, replace it by the rules

    $R_1$: $A \to \alpha_1 \cdots \alpha_{n-2} B$
    $R_2$: $B \to \alpha_{n-1} \alpha_n$

where $B$ is new. Continuing in this way, form $G_3$.

If $G_3$ contains $A \to ab$ $(a \ne I \ne b)$, replace it by $A \to BC$, $B \to a$, $C \to b$, where $B$ and $C$ are new. If $G_3$ contains $A \to aB$, replace it by $A \to CB$, $C \to a$, where $C$ is new. If it contains $A \to Ba$, replace this by $A \to BC$, $C \to a$, where $C$ is new. Continuing in this way form $G_4$. $G_4$ then is the grammar $G^*$ required for the theorem.

Theorem 5 asserts in particular that all type 2 languages can be generated by grammars which yield only trees with no more than two branches from each node. That is, from the point of view of generative power, we do not restrict grammars by requiring that each phrase have at most two immediate constituents (note that in a regular grammar, a "phrase" has one immediate constituent just in case it is interpreted as a word or morpheme class, i.e., a lowest level phrase; an immediate constituent in this case is a member of the class).

DEFINITION 9. Suppose that $\Sigma$ is a finite state Markov source with a symbol emitted at each inter-state transition; with a designated initial state $S_0$ and a designated final state $S_f$; with $\#$ emitted on transition from $S_0$ and from $S_f$ to $S_0$, and nowhere else; and with no transition from $S_f$ except to $S_0$. Define a sentence as a string of symbols emitted as the system moves from $S_0$ to a first recurrence of $S_0$.
Then the set of sentences that can be emitted by $\Sigma$ is a finite state language.$^{10}$

$^{10}$ Alternatively, $\Sigma$ can be considered as a finite automaton, and the generated finite state language, as the set of input sequences that carry it from $S_0$ to a first recurrence of $S_0$. Cf. Chomsky and Miller (1958) for a discussion of properties of finite state languages and systems that generate them from a point of view related to that of this paper. A finite state language is essentially what is called in Kleene (1956) a "regular event."

Since Restriction 3 limits the rules to the form $A \to aB$ or $A \to a$, we immediately conclude the following.

THEOREM 6. The type 3 languages are the finite state languages.

PROOF. Suppose that $G$ is a type 3 grammar. We interpret the symbols of $V_N$ as designations of states and the symbols of $V_T$ as transition symbols. Then a rule of the form $A \to aB$ is interpreted as meaning that $a$ is emitted on transition from $A$ to $B$. An $\#S\#$-derivation of $G$ can involve only one application of a rule of the form $A \to a$. This can be interpreted as indicating transition from $A$ to a final state with $a$ emitted. The fact that $\#$ bounds each sentence of $L_G$ can be understood as indicating the presence of an initial state $S_0$ with $\#$ emitted on transition from $S_0$ to $S$, and as a requirement that the only transition from the final state is to $S_0$, with $\#$ emitted. Thus $G$ can be interpreted as a system of the type described in Definition 9. Similarly, each such system can be described as a type 3 grammar.

Restriction 3 limits the rules to the form $A \to aB$ or $A \to a$. From Theorem 5 we see that Restriction 2 amounts to a limitation of the rules to the form $A \to aB$, $A \to a$, or $A \to BC$ (with the first type dispensable). Hence the fundamental feature distinguishing type 2 grammars (systems of phrase structure) from type 3 grammars (finite automata) is the possibility of rules of the form $A \to BC$ in the former.
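The reading of a type 3 grammar as a finite state system, as in the proof of Theorem 6, can be sketched as follows. The encoding is an assumption made for illustration: a rule $A \to aB$ is written as the triple `(A, a, B)`, a rule $A \to a$ as `(A, a, None)`, the boundary symbol $\#$ is left implicit, and `accepts` and the sample grammar are hypothetical. Nonterminals serve as states, terminals as transition symbols, and a rule $A \to a$ leads to a final state with no outgoing transitions.

```python
# A minimal sketch; `accepts` and the sample rules are hypothetical illustrations.
# Encoding assumed here: (A, a, B) for A -> aB, (A, a, None) for A -> a.

def accepts(rules, start, word):
    """Decide whether the type 3 grammar can generate `word` from `start`."""
    final = object()                      # the final state reached by rules A -> a
    states = {start}
    for ch in word:
        # Follow every transition that emits `ch` from a currently possible state.
        states = {
            final if target is None else target
            for (source, emitted, target) in rules
            if source in states and emitted == ch
        }
    return final in states


# Sample grammar: S -> aS, S -> bT, T -> bT, T -> b
sample = [("S", "a", "S"), ("S", "b", "T"), ("T", "b", "T"), ("T", "b", None)]
print(accepts(sample, "S", "aabbb"))
```

Several states are tracked at once because two rules may share a left-hand symbol and an emitted terminal, as `T -> bT` and `T -> b` do here. The sample grammar generates the strings consisting of any number of a's followed by at least two b's.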
This leads to an important difference in generative power.

THEOREM 7. There exist type 2 languages that are not type 3 languages. (Cf. Chomsky, 1956, 1957.)

In Chomsky (1956), three examples of non-type 3 languages were presented. Let $L_1$ be the language containing just the strings $a^nb^n$; $L_2$, the language containing just the strings $xy$, where $x$ is a string of $a$'s and $b$'s and $y$ is the mirror image of $x$; $L_3$, the language consisting of all strings $xx$ where $x$ is a string of $a$'s and $b$'s. Then $L_1$, $L_2$, and $L_3$ are not type 3 languages. $L_1$ and $L_2$ are type 2 languages (cf. Chomsky, 1956). $L_3$ is a type 1 language but not a type 2 language, as can be shown by proofs similar to those of Lemma 2 and Theorem 4.$^{11}$

Suppose that we extend the power of a finite automaton by equipping it with a finite number of counters, each of which can assume infinitely many positions. We permit each counter to shift position in a fixed way with each inter-state transition, and we permit the next transition to be determined by the present state and the present readings of the counters. A language generated (as in Definition 9) by a system of this sort (where each counter begins in a fixed position) will be called a counter language. Clearly $L_1$, though not a finite state (type 3) language, is a counter language. Several different systems of this general type are studied by Schützenberger (1957), where the following, in particular, is proven.

THEOREM 8. $L_2$ is not a counter language.

Thus there are type 2 languages that are not counter languages.$^{12}$ To summarize, $L_1$ is a counter language and a type 2 language, but not a type 3 (finite state) language; $L_2$ is a type 2 language but not a counter language (hence not a type 3 language); and $L_3$ is a type 1 language but not a type 2 language.

$^{11}$ In Chomsky (1956, p. 119) and Chomsky (1957, p. 34), it was erroneously stated that $L_3$ cannot be generated by a phrase structure system.
This is true for a type 2, but not a type 1, phrase structure system.

$^{12}$ The further question whether all counter languages are type 2 languages (i.e., whether counter languages constitute a step between types 2 and 3 in the hierarchy being considered here) has not been investigated.

From Theorems 2, 3, 4 and 7, we conclude:

THEOREM 9. Restrictions 1, 2 and 3 are increasingly heavy.

That is, the inclusion in Theorem 1 is proper inclusion, both for grammars (trivially) and for languages.

The fact that $L_2$ is a type 2 language but neither a type 3 nor a counter language is important, since English has the essential properties of $L_2$ (Chomsky, 1956, 1957). We can conclude from this that finite automata (even with a finite number of infinite counters) that produce sentences from "left to right" in the manner of Definition 9 cannot constitute the class $F$ (cf. Sec. 1) from which grammars are drawn; i.e., the devices that generate language cannot be of this character.

SECTION 6

The importance of gaining a better understanding of the difference in generative power between phrase structure grammars and finite state sources is clear from the considerations reviewed in Sec. 5. We shall now show that the source of the excess of power of type 2 grammars over type 3 grammars lies in the fact that the former may be self-embedding (Definition 7). Because of Theorem 5 we can restrict our attention to regular grammars.

Construction: Let $G$ be a non-s.e. regular (type 2) grammar. Let

$K = \{(A_1, \ldots, A_m) \mid m = 1$ or, for $1 \le i < j \le m$, $A_i \to \varphi A_{i+1}\psi$ and $A_i \neq A_j\}$.

We construct the grammar $G'$ with each nonterminal symbol represented in the form $[B_1 \cdots B_n]_i$ $(i = 1, 2)$, where the $B_i$'s are in turn nonterminal symbols of $G$, as follows:$^{13}$

Suppose that $(B_1, \ldots, B_n) \in K$.
(i) If $B_n \to a$ in $G$, then $[B_1 \cdots B_n]_1 \to a[B_1 \cdots B_n]_2$.
(ii) If $B_n \to CD$ where $C \neq B_i \neq D$ $(i \le n)$, then
  (a) $[B_1 \cdots B_n]_1 \to [B_1 \cdots B_nC]_1$
  (b) $[B_1 \cdots B_nC]_2 \to [B_1 \cdots B_nD]_1$
  (c) $[B_1 \cdots B_nD]_2 \to [B_1 \cdots B_n]_2$.
(iii) If $B_n \to CD$ where $B_i = D$ for some $i \le n$, then
  (a) $[B_1 \cdots B_n]_1 \to [B_1 \cdots B_nC]_1$
  (b) $[B_1 \cdots B_nC]_2 \to [B_1 \cdots B_i]_1$.
(iv) If $B_n \to CD$ where $B_i = C$ for some $i \le n$, then
  (a) $[B_1 \cdots B_i]_2 \to [B_1 \cdots B_nD]_1$
  (b) $[B_1 \cdots B_nD]_2 \to [B_1 \cdots B_n]_2$.

$^{13}$ Since the nonterminal symbols of $G$ and $G'$ are represented in different forms, we can use the symbols $\to$ and $\Rightarrow$ for both $G$ and $G'$ without ambiguity.

We shall prove that $G'$ is equivalent to $G$ (when slightly modified).

The character of this construction can be clarified by consideration of the trees generated by a grammar (cf. Sec. 3). Since $G$ is regular and non-s.e., we have to consider only the following configurations:

[Figure (3): four tree configurations (a)-(d). In each, a path descends from $B_1$ through $B_2, \ldots, B_i, B_{i+1}, \ldots, B_n$, with side branches labeled $E_1, E_2, \ldots$. In (3a) $B_n$ dominates $a$; in (3b) $B_n$ dominates $C$ and $D$; in (3c) $B_n$ dominates $C$ and $B_i$; in (3d) $B_n$ dominates $B_i$ and $D$.]

where at most two of the branches proceeding from a given node are non-null; in case (b), no node dominated by $B_n$ is labeled $B_i$ $(i \le n)$; and in each case, $B_1 = S$. (i) of the construction corresponds to case (3a), (ii) to (3b), (iii) to (3c), and (iv) to (3d). (3c) and (3d) are the only possible kinds of recursion. If we have a configuration of the type (3c), we can have substrings of the form $(x_1 \cdots x_{n-1}y)^k$ (where $E_j \Rightarrow x_j$, $C \Rightarrow y$) in the resulting terminal strings. In the case of (3d) we can have substrings of the form $(yx_{n-1} \cdots x_1)^k$ (where $D \Rightarrow y$, $E_j \Rightarrow x_j$). (iii) and (iv) accommodate these possibilities by permitting the appropriate cycles in $G'$.

To the earliest (highest) occurrence of a particular nonterminal symbol $B_n$ in a particular branch of a tree, the construction associates two nonterminal symbols $[B_1 \cdots B_n]_1$ and $[B_1 \cdots B_n]_2$, where $B_1, \ldots, B_{n-1}$ are the labels of the nodes dominating this occurrence of $B_n$.
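Steps (i)-(iv) of the construction can be sketched as a short Python program. This is an illustration under stated assumptions, not part of the original argument: the function names, the encoding of a rule $A \to a$ as a one-character string and of $A \to CD$ as a pair, and the representation of $[B_1 \cdots B_n]_i$ as `((B1, ..., Bn), i)` are all ours. The input is assumed to be regular and non-s.e.; only tuples of $K$ beginning with the initial symbol are built, and the sketch does not check the assumptions. Sentences of $G'$ are read off as the strings emitted between $[S]_1$ and $[S]_2$.

```python
from collections import deque


def build_g_prime(rules, start="S"):
    """Return the rules of G' as triples (state, emitted, state); '' = vacuous."""
    transitions = set()
    seen = {(start,)}
    todo = deque([(start,)])

    def visit(chain):
        if chain not in seen:
            seen.add(chain)
            todo.append(chain)

    while todo:
        chain = todo.popleft()                     # (B1, ..., Bn) in K
        for body in rules[chain[-1]]:
            if isinstance(body, str):              # (i)   Bn -> a
                transitions.add(((chain, 1), body, (chain, 2)))
                continue
            c, d = body                            # Bn -> CD
            if d in chain:                         # (iii) D = Bi
                i = chain.index(d)
                transitions.add(((chain, 1), "", (chain + (c,), 1)))
                transitions.add(((chain + (c,), 2), "", (chain[:i + 1], 1)))
                visit(chain + (c,))
            elif c in chain:                       # (iv)  C = Bi
                i = chain.index(c)
                transitions.add(((chain[:i + 1], 2), "", (chain + (d,), 1)))
                transitions.add(((chain + (d,), 2), "", (chain, 2)))
                visit(chain + (d,))
            else:                                  # (ii)  no recursion
                transitions.add(((chain, 1), "", (chain + (c,), 1)))
                transitions.add(((chain + (c,), 2), "", (chain + (d,), 1)))
                transitions.add(((chain + (d,), 2), "", (chain, 2)))
                visit(chain + (c,))
                visit(chain + (d,))
    return transitions


def g_prime_language(transitions, start, max_len):
    """Strings z of length <= max_len with a derivation ([S]1, ..., z[S]2)."""
    out = {}
    for p, a, q in transitions:
        out.setdefault(p, []).append((a, q))
    first, last = ((start,), 1), ((start,), 2)
    seen = {(first, "")}
    todo = deque(seen)
    found = set()
    while todo:
        state, z = todo.popleft()
        if state == last:
            found.add(z)
        for a, q in out.get(state, []):
            if len(z + a) <= max_len and (q, z + a) not in seen:
                seen.add((q, z + a))
                todo.append((q, z + a))
    return found


def cfg_language(rules, start, max_len):
    """Terminal strings of length <= max_len derivable from `start` in G."""
    lang = {A: {n: set() for n in range(1, max_len + 1)} for A in rules}
    for n in range(1, max_len + 1):
        for A, bodies in rules.items():
            for body in bodies:
                if isinstance(body, str):
                    if len(body) == n:
                        lang[A][n].add(body)
                else:
                    c, d = body
                    for k in range(1, n):
                        lang[A][n] |= {x + y for x in lang[c][k] for y in lang[d][n - k]}
    return set().union(*lang[start].values())


# Assumed example: S -> AS, S -> b, A -> CD, C -> c, D -> d (right recursion only).
g = {"S": [("A", "S"), "b"], "A": [("C", "D")], "C": ["c"], "D": ["d"]}
print(sorted(g_prime_language(build_g_prime(g), "S", 7)))
print(sorted(cfg_language(g, "S", 7)))
```

Comparing the two printed lists for small length bounds gives an informal check of the equivalence claimed for $G$ and $G'$; it is of course no substitute for the proof that follows.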
The derivation in $G'$ corresponding to the given tree will contain a subsequence $(z[B_1 \cdots B_n]_1, \ldots, zx[B_1 \cdots B_n]_2)$, where $B_n \Rightarrow x$ and $z$ is the string preceding this occurrence of $x$ in the given derivation in $G$. For example, corresponding to a tree of the form

[Tree (4): $S$ dominates $A$ and $B$; $A$ dominates $a$; $B$ dominates $b$.]

generated by a grammar $G$, the corresponding $G'$ will generate the derivation (5) with the accompanying tree:

1. $[S]_1$
2. $[SA]_1$ (iia)
3. $a[SA]_2$ (i)
4. $a[SB]_1$ (iib)
5. $ab[SB]_2$ (i)
6. $ab[S]_2$ (iic)

[Tree (5): the right-branching tree of this derivation, in which $[S]_1$ dominates $[SA]_1$, which dominates $a$ and $[SA]_2$, which dominates $[SB]_1$, which dominates $b$ and $[SB]_2$, which dominates $[S]_2$.]

where the step of the construction permitting each line is indicated at the right.

We now proceed to establish that the grammar $G'$ given by this construction is actually equivalent (with slight modification) to the given grammar $G$. This result, which requires a long sequence of introductory lemmas, is stated in a following paragraph as Theorem 10. From this we will conclude that given any non-s.e. type 2 grammar, we can construct an equivalent type 3 grammar (with many vacuous transitions which are, however, eliminable; cf. Chomsky and Miller, 1958). From this follows the main result of the paper (Theorem 11), namely, that the extra power of phrase structure grammars over finite automata as language generators lies in the fact that phrase structure grammars may be self-embedding.

LEMMA 3. If $(A_1, \ldots, A_m) \in K$, where $K$ is as in the construction, then $A_j \Rightarrow \varphi A_k \psi$, for $1 \le j \le k \le m$.

LEMMA 4. If $[B_1 \cdots B_n]_i \to x[B_1 \cdots B_m]_j$, $C \neq B_k$ $(k \le m, n)$, and $C \to \alpha B_1 \beta$, then $[CB_1 \cdots B_n]_i \to x[CB_1 \cdots B_m]_j$.

Proofs are immediate.

LEMMA 5. If $(B_1, \ldots, B_n) \in K$ and $1 < m < n$, then
(a) if $B_m \Rightarrow \varphi B_1$, it is not the case that $B_i \Rightarrow B_m\psi$ $(i \le n;\ i \neq m)$
(b) if $B_m \Rightarrow B_1\varphi$, it is not the case that $B_i \Rightarrow \psi B_m$ $(i \le n;\ i \neq m)$
(c) if $B_m \Rightarrow \varphi B_1\psi$, it is not the case that $B_i \Rightarrow \omega_1 B_m \omega_2 B_m \omega_3$ $(i \le n)$

PROOF.
Suppose that $B_m \Rightarrow \varphi B_1$ and for some $i \neq m$, $B_i \Rightarrow B_m\psi$. ∴ $\varphi \neq I \neq \psi$. By Lemma 3, $B_1 \Rightarrow \omega_1 B_i \omega_2$. ∴ $B_m \Rightarrow \varphi\omega_1 B_i \omega_2 \Rightarrow \varphi\omega_1 B_m \psi\omega_2$. Contra., since now $B_m$ is self-embedded. Similarly, case (b).

Suppose $B_m \Rightarrow \varphi B_1\psi$ and for some $i$, $B_i \Rightarrow \omega_1 B_m \omega_2 B_m \omega_3$. ∴ $B_1 \Rightarrow \chi_1 B_i \chi_2 \Rightarrow \chi_1\omega_1 B_m \omega_2 B_m \omega_3\chi_2 \Rightarrow \omega_4 B_1 \omega_5 B_1 \omega_6 \Rightarrow \omega_7 B_1 \omega_8 B_1 \omega_9 B_1 \omega_{10}$. Contra. (s.e.).

To facilitate proofs, we adopt the following notational convention:

CONVENTION 2. Suppose that $(\varphi_1, \ldots, \varphi_r)$ is a derivation in $G'$ formed by the construction. Then $\varphi_i = a_1 \cdots a_i Q_i$ (where $Q_i$ is the unique nonterminal symbol that can appear in a derivation$^{14}$), $Q_i \to a_{i+1}Q_{i+1}$.$^{15}$ $z_n^m = a_m \cdots a_n$; $z_n = z_n^1$.

$^{14}$ Unless the initial line contained more than one nonterminal symbol, a case which will never arise below.

$^{15}$ Note that $a_{i+1}$ will always be $I$ unless the step of the construction justifying $\varphi_i \to \varphi_{i+1}$ is (i). $a_1$ will generally be $I$ in this sequence of theorems.

LEMMA 6. Suppose that $D = (\varphi_1, \ldots, \varphi_r)$ is a derivation in $G'$ where $Q_r = [B_1]_2$. Then:
(I) if $\varphi_1 = [B_1]_1$, $(C_1, \ldots, C_{m+1}) \in K$, $C_i \to A_{i+1}C_{i+1}$ (for $1 \le i \le m$), and $C_{m+1} = B_1$, then there is a derivation $([C_1 \cdots C_mB_1]_1, \ldots, z_r[C_1]_2)$ in $G'$.
(II) if $\varphi_1 = [B_1 \cdots B_n]_1$ and $B_n \Rightarrow xB_1$, then there is a derivation $([B_n]_1, \ldots, z_r[B_n]_2)$ in $G'$.

PROOF. Proof is by simultaneous induction on the length of $z_r$, i.e., the number of non-null symbols among $a_1, \ldots, a_r$.

Suppose that the length of $z_r$ is 1. ∴ there is one and only one $i$ s.t. $Q_i = [\cdots]_1$ and $Q_{i+1} = [\cdots]_2$.

(a) Suppose that $i > 1$. Then $\varphi_i = Q_i$ is formed from $Q_{i-1}$ by a rule whose source is (iia) or (iiia), and $\varphi_{i+2} = a_{i+1}Q_{i+2}$ is formed from $\varphi_{i+1} = a_{i+1}Q_{i+1}$ by a rule whose source is (iic) or (ivb). But for some $k$, $Q_{i-1} = [B_1 \cdots B_k]_1$, $Q_i = [B_1 \cdots B_{k+1}]_1$, $Q_{i+1} = [B_1 \cdots B_{k+1}]_2$, $Q_{i+2} = [B_1 \cdots B_k]_2$. ∴ $B_k \to B_{k+1}D$ for some $D$, $B_k \to EB_{k+1}$ for some $E$, which contradicts the assumption that $G$ is regular. ∴ $i = 1$.

(b) Consider now (I). Since $i = 1$, $r = 2$. ∴ $B_1 \to z_2$. By assumption about the $C_i$'s and $m$ applications of Lemma 4, and (i) of the construction, $[C_1 \cdots C_mB_1]_1 \to z_2[C_1 \cdots C_mB_1]_2$. Since $C_i \to A_{i+1}C_{i+1}$ ($C_i \neq C_j$ for $1 \le i < j \le m+1$, since $(C_1, \ldots, C_{m+1}) \in K$ by assumption), it follows that $[C_1 \cdots C_mB_1]_2 \to [C_1 \cdots C_m]_2 \to [C_1 \cdots C_{m-1}]_2 \to \cdots \to [C_1]_2$. ∴ $([C_1 \cdots C_mB_1]_1, z_2[C_1 \cdots C_mB_1]_2, z_2[C_1 \cdots C_m]_2, \ldots, z_2[C_1]_2)$ is the required derivation.

(c) Consider now (II). Since $i = 1$, $B_n \to z_2$ and $[B_n]_1 \to z_2[B_n]_2$, by (i) of the construction. ∴ $([B_n]_1, z_2[B_n]_2)$ is the required derivation.

This proves the lemma for the case of $z_r$ of length 1.

Suppose it is true in all cases where $z_r$ is of length $< t$. Consider (I). Let $D$ be such that $z_r$ is of length $t$. If none of $C_1, \ldots, C_m$ appears in any of the $Q_i$'s in $D$, then the proof is just like (a), above. Suppose that $\varphi_j$ is the earliest line in which one of $C_1, \ldots, C_m$, say $C_k$, appears in $Q_j$. $j > 1$, since $C_1, \ldots, C_m \neq B_1$. By assumption of non-s.e., the rule $Q_{j-1} \to a_jQ_j$ used to form $\varphi_j$ can only have been introduced by (iib).$^{16}$ ∴ $Q_{j-1} = [B_1 \cdots B_nE]_2$, $Q_j = [B_1 \cdots B_nC_k]_1$, $B_n \to EC_k$. But $C_1, \ldots, C_m$ do not occur in $Q_1, \ldots, Q_{j-1}$ and $(C_1, \ldots, C_m, B_1) \in K$. ∴, by Lemma 4,

$([C_1 \cdots C_mB_1]_1, \ldots, z_{j-1}[C_1 \cdots C_mB_1 \cdots B_nE]_2)$ (6)

is a derivation. Furthermore $z_{j-1}$ is not null, since there is at least one transition from $[\cdots]_1$ to $[\cdots]_2$ in (6), which must therefore have been introduced by (i) of the construction. But $B_n \to EC_k$. ∴

$[C_1 \cdots C_mB_1 \cdots B_nE]_2 \to [C_1 \cdots C_k]_1$ (7)

[by (iiib)]. Furthermore we know that

$([B_1 \cdots B_nC_k]_1, \ldots, z_r^{j+1}[B_1]_2)$ (8)

must be a derivation, since $[B_1 \cdots B_nC_k]_1 = Q_j$; i.e., (8) is just the tail end $(\varphi_j, \ldots, \varphi_r)$ of $D$, with the initial segment $z_j$ deleted from each of $\varphi_j, \ldots, \varphi_r$. Since $z_{j-1}$ is not null, $z_r^{j+1}$ is shorter than $z_r$, hence is of length $< t$. Also, $C_k \Rightarrow xB_1$, by assumption. ∴, by inductive hypothesis (II), there is a derivation

$([C_k]_1, \ldots, z_r^{j+1}[C_k]_2)$ (9)

∴ by inductive hypothesis (I), there is a derivation

$([C_1 \cdots C_k]_1, \ldots, z_r^{j+1}[C_1]_2)$ (10)

Combining (6), (7), (10), we have the required derivation.

$^{16}$ It can only have been introduced by (iia), (iib), (iiia), (iva), or $C_k$ will appear in $Q_{j-1}$. Suppose (iia). ∴ $Q_{j-1} = [B_1 \cdots B_q]_1$, $Q_j = [B_1 \cdots B_qC_k]_1$, and $B_q \to C_kD$. But $C_k \Rightarrow \varphi B_1$. Contra. by Lemma 5(a). Suppose (iiia). Same. Suppose (iva). ∴ $Q_{j-1} = [B_1 \cdots B_i]_2$, $Q_j = [B_1 \cdots B_{i+q}]_1$ $(q \ge 1)$, $B_{i+q-1} \to B_iB_{i+q}$, where $C_k = B_{i+s}$ $(1 \le s \le q)$. But $C_k \Rightarrow \varphi B_1$, $\varphi \neq I$. ∴ $C_k \Rightarrow \varphi\omega_1B_{i+q-1}\omega_2 \to \varphi\omega_1B_iB_{i+q}\omega_2 \Rightarrow \varphi\omega_3C_k\omega_4B_{i+q}\omega_5$, contra. ∴ introduced by (iib).

Consider now (II). If $n = 1$ or there is no such derivation of length $t$, the proof is trivial. Assume $n > 1$. Let $\varphi_j$ contain the first $Q$ of the form $[B_1 \cdots B_m]_1$ $(j > 1,\ m \le n)$. Since $B_n \Rightarrow xB_1$, it follows from Lemmas 3, 5 that $B_m \Rightarrow yB_1$. Since $m \le n$, we see by checking through the possibilities in the construction that not all of $Q_1, \ldots, Q_{j-1}$ are of the form $[\cdots]_1$. ∴ there was at least one application of (i) in forming $(\varphi_1, \ldots, \varphi_{j-1})$. ∴ $z_{j-1}$ is not null. But

$([B_1 \cdots B_m]_1, \ldots, z_r^{j+1}[B_1]_2)$ (11)

is, like (8), a derivation. ∴, by inductive hypothesis (II), there is a derivation

$([B_m]_1, \ldots, z_r^{j+1}[B_m]_2)$ (12)

where $z_r^{j+1}$ is shorter than $z_r$.

Let $\varphi_k$ contain the first $Q$ of the form $[B_1 \cdots B_m]_2$ $(m \le n)$. As above, $B_m \Rightarrow yB_1$. From Lemma 5 it follows that the rule used to form $\varphi_{k+1}$ must be justified by (iic) or (ivb) of the construction. In either case, $Q_{k+1} = [B_1 \cdots B_{m-1}]_2$. Similarly, we show that

$([B_1 \cdots B_m]_2, \ldots, [B_1]_2)$ (13)

is a derivation. ∴ $z_r = z_k$.

Let $q = \min(j,k)$. Then all of $Q_2, \ldots, Q_{q-1}$ are of the form $[B_1 \cdots B_{n+v}]_i$. It is clear that we can construct $\psi_1, \ldots, \psi_{q-1}$ s.t. for $p < q$, $\psi_p = z_pQ_p'$, where $Q_p' = [B_n \cdots B_{n+v}]_i$ when $Q_p = [B_1 \cdots B_{n+v}]_i$. Consequently

$([B_n]_1, \ldots, z_{q-1}Q'_{q-1})$ (14)

is a derivation.

Suppose $q = j$. ∴ $Q_{q-1} = [B_1 \cdots B_{n+v}]_i \to a_j[B_1 \cdots B_m]_1 = Q_j$, where $m \le n < n+v$. ∴ this rule can only have been introduced by (iiib) of the construction. ∴ $i = 2$ and $B_{n+v-1} \to B_{n+v}B_m$.

Case 1. Suppose $m = n$. ∴

$[B_n \cdots B_{n+v}]_2 \to [B_n]_1 = [B_m]_1$ (15)

Combining (14), (15), (12), we have the required derivation.

Case 2. Suppose $m < n$. ∴ $B_m \neq B_n, \ldots, B_{n+v}$. ∴

$[B_n \cdots B_{n+v}]_2 \to [B_n \cdots B_{n+v-1}B_m]_1$ (16)

by (iib). We have seen that $B_m \Rightarrow yB_1$. ∴ $B_{n+v-1} \Rightarrow wB_1$. ∴ for $s < v - 1$, $B_{n+s} \to E_sB_{n+s+1}$, by Lemma 5. But (12) is a derivation where $z_r^{j+1}$ is of length $< t$. ∴ by inductive hypothesis (I) there is a derivation

$([B_n \cdots B_{n+v-1}B_m]_1, \ldots, z_r^{j+1}[B_n]_2)$ (17)

Combining (14), (16), (17), we have the required derivation.

Suppose, finally, that $q = k$. We have seen that in this case $z_r = z_k$. But $Q_{q-1} = [B_1 \cdots B_{n+v}]_i \to a_k[B_1 \cdots B_m]_2$, where $m \le n < n+v$. ∴ this rule can only have been introduced by (iic) or (ivb).
In either case, $i = 2$, $m = n$, $v = 1$, and $Q'_{q-1} = [B_nB_{n+1}]_2 \to a_k[B_n]_2$. Combining this with (14) we have the required derivation.

We have thus shown that the lemma is true in case $z_r$ is of length 1, and that it is true for $z_r$ of length $t$ on the assumption that it holds for $z_r$ of length $< t$. Therefore it holds for every derivation $D$.

LEMMA 7. Suppose that $D = (\varphi_1, \ldots, \varphi_r)$ is a derivation in $G'$ where $Q_1 = [B_1]_1$. Then
(I) if $Q_r = [B_1]_2$, $(C_1, \ldots, C_{m+1}) \in K$, $C_i \to C_{i+1}A_{i+1}$ $(1 \le i \le m)$, and $C_{m+1} = B_1$, then there is a derivation $([C_1]_1, \ldots, z_r[C_1 \cdots C_mB_1]_2)$ in $G'$.
(II) if $Q_r = [B_1 \cdots B_n]_2$ and $B_n \Rightarrow B_1x$, then there is a derivation $([B_n]_1, \ldots, z_r[B_n]_2)$ in $G'$.

The proof is analogous to that of Lemma 6. In the inductive step, case (I), we take $Q_j$ as the last of the $Q$'s in which one of $C_1, \ldots, C_m$ appears, and instead of (iiib) in (7), we form $[C_1 \cdots C_k]_2 \to [C_1 \cdots C_mB_1 \cdots B_nE]_1$ by (iva). The proof goes through as above, with similar modifications throughout. In case (II) of the inductive step we let $Q_j$ be the last $Q$ of the form $[B_1 \cdots B_m]_2$ $(j < r,\ m \le n)$, and $Q_k$ the last $Q$ of the form $[B_1 \cdots B_m]_1$ $(m \le n)$. Taking $q = \max(j,k)$ [instead of $\min(j,k)$], the proof is analogous throughout, with (iva) taking the place of (iiib).

In general, because of the symmetries in case (iii), (iv) of the construction [reflecting the parallel possibilities (3c), (3d) for recursion], most of the results obtained come in symmetrical pairs, as above, where the proof of the second is analogous to the proof of the first. Only one of the pair of proofs will actually be presented.

We will require below only the following special case of (I) of Lemmas 6, 7 (which, however, could not be proved without the general case).

LEMMA 8. Suppose that $D = ([B]_1, \ldots, z[B]_2)$ is a derivation in $G'$ and that $C \neq B$. Then
(a) if $C \to AB$, there is a derivation $([CB]_1, \ldots, z[C]_2)$ in $G'$
(b) if $C \to BA$, there is a derivation $([C]_1, \ldots, z[CB]_2)$ in $G'$.

DEFINITION 10. Suppose that $G'$ is formed from $G$ by the construction and $D$ is an $\alpha$-derivation of $x$ in $G$. $D$ will be said to be represented in $G'$ if and only if $\alpha = a$ or $\alpha = A$ and there is a derivation $([A]_1, \ldots, x[A]_2)$ in $G'$.

What we are now trying to prove is that every $S$-derivation of $G$ is represented in $G'$.

DEFINITION 11. Let $D_1 = (\varphi_1, \ldots, \varphi_m)$ and $D_2 = (\psi_1, \ldots, \psi_n)$ be derivations in $G$. Then $D_1 * D_2$ is the derivation $(\varphi_1\psi_1, \varphi_2\psi_1, \ldots, \varphi_m\psi_1, \varphi_m\psi_2, \ldots, \varphi_m\psi_n)$.

LEMMA 9. Let $D_1$ be an $A$-derivation of $x$ and $D_2$ a $B$-derivation of $y$ in $G$. If $D_1$ and $D_2$ are represented in $G'$ and $C \to AB$, then $D_3 = (C, \psi_1, \ldots, \psi_n)$ is represented in $G'$, where $(\psi_1, \ldots, \psi_n) = D_1 * D_2$. ($D_3$ is thus a $C$-derivation of $xy$.)

PROOF. By hypothesis, there are derivations

$([A]_1, \ldots, x[A]_2)$ (18)
$([B]_1, \ldots, y[B]_2)$ (19)

in $G'$.

Case 1. Suppose $A \neq C \neq B$. Then by Lemma 8, there are derivations

$([C]_1, \ldots, x[CA]_2)$ (20)
$([CB]_1, \ldots, y[C]_2)$ (21)

in $G'$. By (iib) of the construction,

$[CA]_2 \to [CB]_1$ (22)

Combining (20), (22), and (21), we have the required derivation.

Case 2. $C = A$. ∴ $C \neq B$ by assumption of regularity of $G$. By Lemma 8, case (a), we have again the derivation (21). By (iva) of the construction,

$[A]_2 = [C]_2 \to [CB]_1$. (23)

Combining (18), (23), (21) we have the required derivation.

Case 3. $C = B$. ∴ $C \neq A$. By Lemma 8, case (b), we have (20). By (iiib),

$[CA]_2 \to [C]_1 = [B]_1$. (24)

Combining (20), (24), (19), we have the required derivation.

Since $C \to CC$ is ruled out by assumption of regularity, these are the only possible cases.

LEMMA 10. If $D_1 = (\varphi_1, \ldots, \varphi_r)$ is a $\chi_1\omega_1$-derivation, where $\chi_1 \neq I \neq \omega_1$, then there is a derivation $D_2 = D_3 * D_4 = (\psi_1, \ldots, \psi_r)$ such that $\psi_r = \varphi_r$, $D_3$ is a $\chi_1$-derivation and $D_4$ is an $\omega_1$-derivation.

PROOF.
Since for i > 1, q~ is formed from ~_~ by replacement of a single symbol of ~_~ ,~7 we can clearly find X~, ~ s.t. ~ = x ~ where either (a) xi = x~-i and w~-1 --~ ~ or (b) xi-~ --~ x~ and ~i = o~_~ (X~-~-~ = ~-~). Then D~ is the subsequence of (X~, "'" , X,) formed b y dropping repetitions and D4 is the subsequence of ( ~ , . . . , ~ ) formed b y dropping repetitions. LEY~MA 11. If G' is formed from G by the construction, then every a-derivation D in G is represented in G'. ~ Which, however, may not be uniquely determined. Compare footnote 8. ON CERTAIN FORMAL PROPERTIES OF GRAMMARS 161 PROOF. Obvious, in case D contains 2 lines. Suppose true for all derivations of fewer t h a n r lines (r > 2). Let D = (~i, " ' " , ~ ) , where ~1 = a. Since r > 2, a = A , ~2 = B C . . ' . ( ~ , . . • , ~ ) is a BC-derivation. B y L e l n m a 10, there is a Dz = D~*D~ = (¢~2, • "" , ~k~) s.t. D, is a B-derivation, D4 a C-derivation, and ~kr = ~r • B y inductive hypothesis, b o t h D3 and D4 are represented in G'. B y L e m m a 9, D is represented in G'. I t remains t o show t h a t if (JAIl, -. • , x[A]2) is a derivation in G', t h e n there is a derivation (A, . - . , x) in G. LEMMA 12. ~s Suppose t h a t G' is formed b y the construction f r o m G, regular a n d non-s.c., and t h a t (a) D = ( ~ l , - . . , ~ l , - - . , ~ q , - - - , ~ m 2 , . . . , ~ , ) is a derivation in G', where Q~ = [A1], Qml = Q ~ = [A1 - . . A~]n, Q~ = [A1 . . - Aj]~, (b) there is no u, v s.t. u ¢ v, Q~ = Q~ = [B1 . . - B~]t, and s < k 19 (c) for ml < u < m2, if Q~ = [A1 . . . A , ] t , t h e n s > f 0 T h e n it follows t h a t (A) if n -- 2, there is an m0 < m~ such t h a t Q~0 = [A~ . - . Ak]t (B) if n = 1, there is an m~ > m2 such t h a t Qm = [Aj . . . Ak]2 (C) j = ~ PROOF. (A) Suppose n = 2. Assume ~ to be the earliest line to cont a i n [A1 . . . A~]2. Clearly there is an ~ =< m~ s.t. Q~ --- [A1 . . . Ak+~]~, Q.a-~ = [A1 . . . Ak+t]~(t > 0). If there is no m0 < ~ s.t. Q~0 = [A1 . - . 
Ak]l, t h e n there m u s t be a u < r~ s.t. Q~ = [A~ • • • A~]:, Q~+I = [A0 . . " A ~ _ I B o . . . B,~]I, where ~+~ is f o r m e d b y ( i r a ) of the construction, A0 = I , B0 = A k , m = 1, and s < /c (since ( i r a ) gives the only possibility for increasing the length of Q b y more t h a n 1 ) . . ' . B,~_I --> A ~ B . . . . ". A~ ~ ~ A ~ B ~ ¢ . B u t Q~ = [A~ . . . A~]2 cannot recur in a n y line following f ~ [this would contradict assumption (b)]. Therefore, iust as above, there m u s t be a v > m2 s.t. Q~_~ = [A0 . . . A,_~Co . . . C,~,]~, Q, = [A~ . . . A~]~, where ~ is formed b y (iiib) of the construction, A0 = I , Co = A ~ , m ' > 1, p < s (since (iiib) gives the only possibility for decreasing the ~s W e c o n t i n u e t o e m p l o y C o n v e n t i o n 2, a b o v e . 19 T h a t is, Q~I = Q~: is t h e s h o r t e s t Q of D t h a t r e p e a t s . ~0 T h a t is, Qqis t h e s h o r t e s t Q of t h i s f o r m b e t w e e n ~ i a n d f ~ : . ~ T h a ~ is, Qqis n o t s h o r t e r t h a n Q~I = Q~e - 162 CttOMSK¥ length of Q b y more t h a n 1 ) . . ' . Cm,-1 ---> Cm,A~ . But A~ ~ ~1C,~,-1~2 ( L e m m a 3 ) . . ' . A~ ~ elC~,Ap~2 ~ ~ 1 C ~ , ~ A ~ 4 ~ ~ I C ~ , ~ 3 ~ A ~ B , ~ 4 . Contra., since G is assumed to be non-s.c. .'. there is an m0 < r~ _= m~ s.t. Q~0 = [At • .. Ak]~ (B) Suppose n = 1. Proof is analogous. (C) ( I ) . Suppose n = 2. S u p p o s e j < k. Suppose i (in Qq) is 2. Clearly there m u s t be a v > m~ s.t. either Q~ = [A~ - . . A~.]2 [which contradicts assumption (b)] or Q,_~ = [A0 . . . Aj-~Co . . . C~]~ , Q~, = [A1 . . . A~,]~ , where f~ is formed b y (iiib) of the construction, Ao .~- I , Co = A i , m => 1, p < j [as in the second paragraph of the proof of (A)]. Suppose the l a t t e r . . ' . C~_~ --* C m A ~ . .'. A j ~ ~ C m A , ¢ . Furthermore, since p < j, A ~~ ~C~A ~. F r o m assumption (c) and assumption of regularity of G, it follows t h a t ~q+~ can only have been formed b y (iva) of the construction..'. Qq+l = [A0 . . 
" Ai_~Do . . " Dt]:t, where A0 = I , Do = A~., t ~ 1..'. D t _ ~ - - - > A i D t . .'. A i ~ xaA ~Dt~, . .'. A i ~ o~fC~o~A i x ~ D t ~ , and A~. is self-embedded, contrary to assumption. Suppose t h a t i (in Qq) is 1. B y (A), there is an m0 < m~ s.t. Qm0 = [A~ . - . A , ] ~ . . ' . there is a u < m0 s.t. either Q~ = [A1 - . . A~]~ [which contradicts assumption (b)] or Q, = [A~ . . - A,]~, Q~+I = [A0 . - . Aj_1Bo . . . Bin]l, where ~+1 is formed b y (ira), Ao = I , Bo = A ~ , m >= 1, s < j . Assuming the latter, we conclude t h a t A j ~ ~IAjw2Bm¢~, as above. F r o m assumption (c) and assumption of regularity of G, it follows t h a t Cq can only have been formed b y (iiib). Contradiction follows as above. ( I I ) Suppose n = 1. Proof is analogous. This completes the proof. F r o m L e m m a 12 it follows readily b y the same kind of reasoning as above t h a t COROLLARY. Under the assumptions of L e m m a 12, (A) if n = 2, ~m1+1 is formed b y ( i r a ) of the construction (B) if n = 1, ~m2 is formed b y (iiib) of the construction (C) Q~ is of the form [A1 . . - AkBo . . . B~]t (s >-_ O, Bo =- I ) , for u such t h a t : (a) where n = 2 and m0is a s i n (A), L e m m a 12, then m0 < u < m2 ; (b) where n = 1 and m~ is as in (B), L e m m a 12, then ml < u < m3. Furthermore, for ml < u < m2, s > 0 if t # n. DEFINITION 12. Let D = (~i, ' • • , ~ ) be a derivation in G' formed ON CERTAIN FORMAL PROPERTIES OF GRAMMARS 163 b y the construction f r o m G. T h e n D ~ corresponds t o D if D ' is a derivation of z~22 in G and for each i, j, k ( i < j ) such t h a t (a) ~i is the earliest line containing [AI - . . Ak]l (b) ~j is the latest line containing [AI . . . Ak]2 (c) there is no p, q s.t. i < p < j, q < h, and Qp = [A1 . - . Aq]~, there is a subsequence (ziA~b, . . . , zj~b) in D'. LEMMA 13. Let D = (~1, • • ", ~r) be a derivation in G t f o r m e d by the construction f r o m a regular, non-s.e.G. Suppose t h a t Q1 = [A~ .. • A~]I, Qr = [A~ - . . 
A~]2, and there is no p, q such t h a t 1 < p < r, q < s, Qp = [ A 1 . . . Aq]~. T h e n there is a derivation D r = (~1 ~ , • • -, ~ , ) corresponding t o D. PnOOF. Proof is b y induction on the n u m b e r of recurrences of symbols Qi in D (i.e., the n u m b e r of cycles in the derivation). Suppose t h a t there are no recurrences of a n y Q~ in D. I t follows t h a t there can have been no applications of (iva) in the construction of D, i.e., no pairs Qi -- [A1 - - . Aj]2, Qi+~ = [A1 . . . A~]I where j < h. F o r suppose there were such a p a i r . . ' . Ak_~ ~ A ~ A k . Also, j > s, or Q~ is repeated as Q~. Clearly there is an m > i ~- 1 s.t. Q~ = [A~ . . - A~+~]~ (n > 0 ) . . ' . there is a t > m s.t. either Qt = [A1 . . . Aj]2 ( c o n t r a r y t o a s s u m p t i o n of no repetitions) or Qt = [A1 . . . As+~]2, Qt+l = [A1 . . . Aj-,,]I (u, v > 1), where ~+1 is f o r m e d b y ( i i i b ) . . ' . c o n t r a r y t o the assumption t h a t G is non-s.e. Similarly, there can be no applications of (iiib) in the construction of D. B u t now the proof for this case follows immediately b y induction on the length of D. Suppose now t h a t the l e m m a is true for every derivation containing < n occurrences of repeating Q's. Suppose t h a t D contains n such occurrences. I. 1. Suppose t h a t the shortest recurring Q in D is [A~ • • • Akin. 2. Select m l , m~ s.t. m~ < m~ ; Q ~ = [A~ . . . A~]~ = Q~: ; there is no i, m~ < i < m~, s.~. Qi = [A1 . . . A~]~ ; there is no j > m2 s.t. Qj = [A1 . . . A~]~. 22 Compare Convention 2. 164 CHOMSKY 3. B y L e m m a 12 ( A ) , we k n o w t h a t t h e r e is a n m0 < ml s.t. Qm0 = [A1 - . . A~]I. Select m0 as t h e earliest such ( t h e r e is in f a c t o n l y o n e ) . B y t h e C o r o l l a r y t o L e m m a 12, ( C ) , a n d t h e i n d u c t i v e h y p o t h e s i s , t h e r e is a d e r i v a t i o n D~ = (zmoAk , . . . , z m)58 c o r r e s p o n d i n g t o ( ~ o , "" ", ~ 1 ) . 4. 
B y C o r o l l a r y ( A ) , we k n o w t h a t ~ml+~ = z,~l[Ao " ' " A k - i B o " ' " B m ] l , w h e r e A o = I , Bo = A k , m >= 1, Bm-~ --~ A k B ~ . O b v i o u s l y , t h e r e is a v(m~ < v < m2) s.t. e i t h e r Q~ = [A0 - . - A~_IBo . . . Bm]2 or Qv = [A0 . . . A ~ - I B o . . . B~+t]2, Qv+~ = [Ao . . - A k - l B o "'" Bm-u]l, where u, t > 1 a n d ~+1 is f o r m e d b y (iiib) [note C o r o l l a r y (C)]. F r o m t h e l a t t e r a s s u m p t i o n we can d e d u c e self-embedding, as a b o v e . . ' , we can select v as t h e l a r g e s t i n t e g e r ( m 2 s.t. Q, = [A0 • • • Ak_~Bo • • • B ~ ] ~ . 5. L e t t b e t h e l a r g e s t i n t e g e r (ml ~- 1 ~ t ~ v) s.t. Qt = [A0 . . . Ak_~Bo . . . B m _ u ] i , u > O. S u p p o s e t h a t i = 1. B u t ~t+1 m u s t be f o r m e d b y (iia) or (ilia) of t h e c o n s t r u c t i o n . . ' , u = 1 a n d Bm_~ - ~ B m C , c o n t r a r y t o a s s u m p t i o n of r e g u l a r i t y , since B~_~ ~ A k B m . .'. i = 2, a n d Qt+l = [A0 . . . A k _ l B o . . . B~+.]1(n _-> 0), w h e r e ~t+l is f o r m e d b y ( i r a ) of t h e c o n s t r u c t i o n . . ' . B m + , - I - ~ B,,_~,B~+,~ ~ ~ B m - I ~ 2 B ~ , + ~ --~ ~IAkBm~2Bm+,~ . S u p p o s e n = 0. T h e n B,,_~ --+ B m - ~ B m , so t h a t , b y r e g u l a r i t y , B~_~ ---A k . . ' . Qt = [A~ • • • A~-]2, c o n t r a r y t o a s s u m p t i o n in s t e p 2. .'. n ~ O. .'. B m ~ ~ B , ~ + ~ _ ~ 4 . .'. B m ~ ~ 3 ~ A ~ B , ~ : B ~ + , ~ x 4 , c o n t r a . (s.e.). 6..'. there is no t such as that postulated in step 5. Consequently (~+i, • •., ~) meets the assumption of the inductive hypothesis e~ and there is a derivation D~ -- (z~+~Bm,. "',Zm~) ~ corresponding to (~m1+1 , " " " , ~t~v)" 7. Since v was selected in s t e p 4 t o be m a x i m a l , i t follows t h a t ~,+~ c a n n o t be f o r m e d b y ( i v a ) , b y reasoning similar t o t h a t i n v o l v e d in ~ Recall that z ~ ,~0+~., i.e., there is a derivation (A~, • • • , z,~o+h~. 
zm0z~ ~a From nonexistence of such a t it follows at once that for u such that mt u < v, Q~ = [A0 .-- A~_~Bo . . . B,~Co . . . C~]i ( ~ >= O, Co ~ I). ~ That is, there is a derivation (B,~ , .-. , z~+~). ON CERTAIN FORMAL PROPERTIES OF GRAMMARS 165 step 4. B y regularity assumption, it cannot h a v e been f o r m e d b y (lib) or (iiib) of the construction, since B,,-I ---+ A k B ~ . . ' . Q~+I -- [A0 . . - Ak-IBo . . . B,~-112. 8. Suppose m = 1, so t h a t Q~+I = [AI . . - A~]2..'. v -t- 1 = m2, b y a s s u m p t i o n of step 2, and AIo ~ AkB~. L e t D2' be the derivation f o r m e d f r o m D~ (cf. step 6) b y deleting initial z~1+~ f r o m each line. L e t (~1, " " , ~ ) = DI*D2' (cf. Definition 11; DI as in step 3). Clearly D3 = (z,~oAl~, ~ , . . . , ~ ) is a derivation corresponding to ( ~ 0 , "" ", ~ ) . 9. Suppose m _>- 2. B y a s s u m p t i o n t h a t G is non-s.e., a n d t h a t v is m a x i m a l (in step 4) we can show t h a t e~+~ m u s t be f o r m e d b y (iib) of the construction (all other cases lead to c o n t r a d i c t i o n ) . . ' . Q~+2 = [A0 • .. Ak-lBo . . . Bm-2C]l, B~_2 --> Bm_lC. As above, we can find a vl which is the largest integer < m 2 s.t. Q~ = [A0 . . . Ak-~Bo . . . B,~-2C]2 a n d s.t. (¢,+~, . . . , ~ 1 ) m e e t s the inductive h y p o t h e s i s . . ' , there is a derivation D4 = (z~+2C, . . . , z~) corresponding to ( ~ + ~ , • •., ~ ) . 10. Suppose m = 2 . . ' . B,~_2---+ B1C, vl + 1 = m2, Q~+I = lAx . . . Ak]~ (as a b o v e ) . L e t D4' be the derivation f o r m e d f r o m D4 b y deleting initial z~+2 f r o m each line. L e t (~b,, . . . , ~p) be as in step 8. L e t ( x l , • • ", xq) = (z~oB1, ~1, "" ", ~b~)*D4'. Clearly D5 = (zmoAk, Xl, " " , Xq) is a derivation corresponding to ( ~ 0 , • • -, ~ = ) . 11. Similarly, w h a t e v e r m is, we can find a derivation A = (Z,~oAk, . . - , z ~ ) corresponding to ( ~ 0 , . . . , e ~ ) . 12. 
Consider now the derivation $D_6$ formed by deleting from the original $D$ the lines $\varphi_{m_1+1}, \ldots, \varphi_{m_2}$ and the medial segment $z_{m_2}^{m_1+1}$ from each later line. That is, $D_6 = (\psi_1, \ldots, \psi_t)$ $(t = r - (m_2 - m_1))$, where for $i \le m_1$, $\psi_i = \varphi_i$, and for $i > m_1$, $\psi_i = z_{m_1} z_{i+m_2-m_1}^{m_2+1} Q_{i+m_2-m_1}$. By inductive hypothesis, there is a derivation $D_7$ corresponding to $D_6$.

In steps 2 and 3, $m_0$, $m_1$, $m_2$ were chosen so that $\varphi_{m_0}$ contains the earliest occurrence of $Q_{m_0} = [A_1 \cdots A_k]_1$, and $\varphi_{m_2}$ the latest occurrence of $Q_{m_1} = Q_{m_2} = [A_1 \cdots A_k]_2$, and so that no occurrences of $Q_{m_1}$ occur between $\varphi_{m_1}$ and $\varphi_{m_2}$. ∴ in $D_6$, $\psi_{m_0}$ contains the earliest occurrence of $Q_{m_0}$ and $\psi_{m_1}$ the latest occurrence of $Q_{m_1}$. Furthermore, by Corollary (C), there is no $Q$ shorter than $Q_{m_0}$ between $\varphi_{m_0}$ and $\varphi_{m_1}$. ∴ by inductive hypothesis and the definition of correspondence, it follows that $D_7$ contains a subsequence $\bar{D}_7 = (z_{m_0}A_k\psi, \ldots, z_{m_1}\psi)$.

But step 11 guarantees us a derivation $\Delta = (z_{m_0}A_k, \ldots, z_{m_2})$ corresponding to $(\varphi_{m_0}, \ldots, \varphi_{m_2})$. We now construct $D_8$ by replacing $\bar{D}_7$ in $D_7$ by $\bar{\Delta} = (z_{m_0}A_k\psi, \ldots, z_{m_2}\psi)$, formed by suffixing $\psi$ to each line of $\Delta$, and inserting $z_{m_2}^{m_1+1}$ after $z_{m_1}$ in all lines of $D_7$ following the subsequence $\bar{D}_7$. Clearly $D_8$ corresponds to $D$, which is the required result in case the shortest recurring $Q$ is of the form $[\cdots]_2$.

II. An analogous proof can be given for the case in which the shortest recurring $Q$ is of the form $[\cdots]_1$.

We have shown that the lemma holds for derivations with no recursions, and that it holds of a derivation with $n$ occurrences of recurring $Q$'s on the assumption that it holds for all derivations with $< n$ such occurrences. ∴ it is true of all derivations.

A corollary follows immediately.

COROLLARY. If $G'$ is formed from $G$ by the construction and $D' = ([A]_1, \ldots, x[A]_2)$ is a derivation in $G'$, then there is a derivation $D = (A, \ldots, x)$ in $G$.

From this result and Lemma 11, we draw the following conclusion.

THEOREM 10.
If $G'$ is formed from $G$ by the construction, then there is a derivation $(S, \ldots, z)$ in $G$ if and only if there is a derivation $([S]_1, \ldots, z[S]_2)$ in $G'$.

That is, if $[S]_1$ in $G'$ plays the role of $S$ in $G$, then $G$ and $G'$ are equivalent if we emend the construction by adding the rule $Q_1 \to a$ wherever there are $Q_1, \ldots, Q_n$ ($n \geq 2$) such that $Q_1 \to aQ_2$ and $Q_2 \to Q_3 \to \cdots \to Q_n$, where $Q_n = [S]_2$, $Q_i$ is of the form $[\ldots]_2$ for $1 < i \leq n$, and $Q_1$ is of the form $[\ldots]_1$. But in the grammar thus formed all rules are of the form $A \to aB$ (where $a$ is $I$ unless the rule was formed by step (i) of the construction) or $A \to a$. It is thus a type 3 grammar, and the language $L_G$ generated by $G$ could have been generated by a finite state Markov source (cf. Theorem 6) with many vacuous transitions. But for every such source, there is an equivalent source with no identity transitions (cf. Chomsky and Miller, 1958). Therefore $L_G$ could have been generated by a finite Markov source of the usual type. Obviously, every type 3 grammar is non-s.e. (the lines of its $A$-derivations are all of the form $xB$). Consequently:

THEOREM 11. If $L$ is a type 2 language, then it is not a type 3 (finite state) language if and only if all of its grammars are self-embedding.

Among the many open questions in this area, it seems particularly important to try to arrive at some characterization of the languages of these²⁸ various types²⁷ and of the languages that belong to one type but not the next lower type in the classification. In particular, it would be interesting to determine a necessary and sufficient structural property that marks languages as being of type 2 but not type 3. Even given Theorem 11, it does not appear easy to arrive at such a structural characterization theorem for those type 2 languages that are beyond the bounds of type 3 description.

RECEIVED: October 28, 1958.

REFERENCES

CHOMSKY, N. (1956). Three models for the description of language.
IRE Trans. on Inform. Theory IT-2, No. 3, 113-124.
CHOMSKY, N. (1957). "Syntactic Structures." Mouton and Co., The Hague.
CHOMSKY, N., and MILLER, G. A. (1958). Finite state languages. Inform. and Control 1, 91-112.
DAVIS, M. (1958). "Computability and Unsolvability." McGraw-Hill, New York.
HARRIS, Z. S. (1952a). Discourse analysis. Language 28, 1-30.
HARRIS, Z. S. (1952b). Discourse analysis: A sample text. Language 28, 474-494.
HARRIS, Z. S. (1957). Cooccurrence and transformation in linguistic structure. Language 33, 283-340.
KLEENE, S. C. (1956). Representation of events in nerve nets. In "Automata Studies" (C. E. Shannon and J. McCarthy, eds.), pp. 3-40. Princeton Univ. Press, Princeton, New Jersey.
POST, E. L. (1944). Recursively enumerable sets of positive integers and their decision problems. Bull. Am. Math. Soc. 50, 284-316.
SCHÜTZENBERGER, M. P. (1957). Unpublished paper.

²⁷ As, for example, the results cited in footnote 4 characterize finite state languages.

²⁸ And several other types. In particular, investigations of this kind will be of limited significance for natural languages until the results are extended to transformational grammars. This is a task of considerable difficulty for which investigations of the type presented here are a necessary prerequisite.
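As an illustration of the remark following Theorem 10, that a source with vacuous (identity) transitions has an equivalent source without them, the following is a minimal sketch in Python. It is not taken from the paper or from Chomsky and Miller (1958); it uses the familiar closure argument. The encoding is an assumption made for the example: a rule $A \to aB$ is stored as the pair `(a, "B")`, a rule $A \to a$ as `(a, None)`, and the identity $I$ as the empty string. The names `identity_closure`, `remove_identity_rules`, `sentences`, and `EXAMPLE` are hypothetical.

```python
from collections import deque

I = ""  # the identity element I of the text, modelled here as the empty string

# Assumed encoding: `rules` maps each nonterminal to a set of pairs.
# A rule A -> aB is the pair (a, "B"); a rule A -> a is the pair (a, None).


def identity_closure(rules, start):
    """Nonterminals reachable from `start` using only rules of the form A -> I B."""
    seen = {start}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for sym, nxt in rules.get(a, ()):
            if sym == I and nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def remove_identity_rules(rules):
    """Return a rule set generating the same strings with no rule A -> I B."""
    out = {}
    for a in rules:
        new = set()
        # A inherits every non-vacuous rule of each nonterminal it can
        # reach through vacuous rules alone.
        for b in identity_closure(rules, a):
            for sym, nxt in rules.get(b, ()):
                if sym != I or nxt is None:
                    new.add((sym, nxt))
        out[a] = new
    return out


def sentences(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    found = set()
    seen = {("", start)}
    queue = deque(seen)
    while queue:
        prefix, a = queue.popleft()
        for sym, nxt in rules.get(a, ()):
            s = prefix + sym
            if len(s) > max_len:
                continue
            if nxt is None:
                found.add(s)
            elif (s, nxt) not in seen:
                seen.add((s, nxt))
                queue.append((s, nxt))
    return found


EXAMPLE = {
    "S": {(I, "A")},              # S -> I A  (vacuous)
    "A": {("a", "A"), (I, "B")},  # A -> a A, A -> I B (vacuous)
    "B": {("b", None)},           # B -> b
}
```

For `EXAMPLE`, `sentences(EXAMPLE, "S", 4)` and `sentences(remove_identity_rules(EXAMPLE), "S", 4)` both give the strings b, ab, aab, aaab, and the second rule set contains no rule of the form $A \to IB$.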
