Top-Down Parsing

Having established that not every context-free language can be accepted by a deterministic pushdown automaton, let us now consider some of those that can.

Our overall goal for the remainder of this chapter is to study cases in which context-free grammars can be converted into deterministic pushdown automata that can actually be used for "industrial grade" language recognition. However, our style here is rather different from that of the rest of this book; there are fewer proofs, and we do not attempt to tie up all the loose ends of the ideas we introduce. We present some guidelines (what we call "heuristic rules") that will not be useful in all cases, and we do not even attempt to specify exactly when they will be useful. That is, we aim to introduce some suggestive applications of the theory developed earlier in this chapter, but this venture should not be taken as anything more than an introduction.

Let us begin with an example. The language $L = \{a^n b^n\}$ is generated by the context-free grammar $G = (\{a, b, S\}, \{a, b\}, R, S)$, where R contains the two rules S → aSb and S → e. We know how to construct a pushdown automaton that accepts L: just carry out the construction of Lemma 3.4.1 for the grammar G. The result is

$$M_1 = (\{p, q\}, \{a, b\}, \{a, b, S\}, \Delta_1, p, \{q\}),$$

where

$$\Delta_1 = \{((p, e, e), (q, S)),\ ((q, e, S), (q, aSb)),\ ((q, e, S), (q, e)),\ ((q, a, a), (q, e)),\ ((q, b, b), (q, e))\}.$$

Since M1 has two different transitions with identical first components (the ones corresponding to the two rules of G that have identical left-hand sides), it is not deterministic.
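The construction of Lemma 3.4.1 is mechanical, so a short Python sketch may help make the source of nondeterminism visible. It assumes, as a representation chosen here rather than taken from the text, that a grammar is a list of (left-hand side, right-hand side) pairs whose right-hand sides are strings of one-character symbols, with the empty string standing for e; the function names are hypothetical. The sketch builds the transitions for G and reports every first component shared by two or more transitions, which for G singles out exactly the pair discussed above.

```python
from collections import defaultdict

# Grammar G for {a^n b^n}: (left-hand side, right-hand side); "" plays the role of e.
RULES = [("S", "aSb"), ("S", "")]
TERMINALS = "ab"
START = "S"


def lemma_341_transitions(rules, terminals, start):
    """Transitions ((state, input, pop), (state, push)) of the Lemma 3.4.1 construction."""
    delta = [(("p", "", ""), ("q", start))]          # push the start symbol
    for lhs, rhs in rules:
        delta.append((("q", "", lhs), ("q", rhs)))    # expand a nonterminal on top of the stack
    for a in terminals:
        delta.append((("q", a, a), ("q", "")))        # match a terminal against the input
    return delta


def conflicting_first_components(delta):
    """First components shared by two or more transitions (one source of nondeterminism)."""
    groups = defaultdict(list)
    for first, second in delta:
        groups[first].append(second)
    return {first: seconds for first, seconds in groups.items() if len(seconds) > 1}


delta1 = lemma_341_transitions(RULES, TERMINALS, START)
print(len(delta1))                            # 5
print(conflicting_first_components(delta1))   # {('q', '', 'S'): [('q', 'aSb'), ('q', '')]}
```

Identical first components are only one way two transitions can be compatible, so an empty result from this check would not by itself prove determinism.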

Nevertheless, L is a deterministic context-free language, and M1 can be modified to become a deterministic pushdown automaton M2 that accepts L$.

Intuitively, all the information that M1 needs at each point in order to decide which of the two transitions to follow is the next input symbol. If that symbol is an a, then M1 should replace S by aSb on its stack if hope of an accepting computation is to be retained. On the other hand, if the next input symbol is a b, then the machine must pop S. M2 achieves this required anticipation or lookahead by consuming an input symbol ahead of time and incorporating that information into its state. Formally,

$$M_2 = (\{p, q, q_a, q_b, q_\$\}, \{a, b, \$\}, \{a, b, S\}, \Delta_2, p, \{q_\$\}),$$

where $\Delta_2$ contains the following transitions.

3.7: Determinism and Parsing

(1) ((p, e, e), (q, S))
(2) ((q, a, e), (q_a, e))
(3) ((q_a, e, a), (q, e))
(4) ((q, b, e), (q_b, e))
(5) ((q_b, e, b), (q, e))
(6) ((q, $, e), (q_$, e))
(7) ((q_a, e, S), (q_a, aSb))
(8) ((q_b, e, S), (q_b, e))

From state q, M2 reads one input symbol and, without changing the stack, enters one of the three new states q_a, q_b, or q_$. It then uses that information to differentiate between the two compatible transitions ((q, e, S), (q, aSb)) and ((q, e, S), (q, e)): the first transition is retained only from state q_a and the second only from state q_b. So M2 is deterministic. It accepts the input ab$ as follows.

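| Step | State | Unread input | Stack | Transition used | Rule of G |
|---|---|---|---|---|---|
| 0 | p | ab$ | e | | |
| 1 | q | ab$ | S | 1 | |
| 2 | q_a | b$ | S | 2 | |
| 3 | q_a | b$ | aSb | 7 | S → aSb |
| 4 | q | b$ | Sb | 3 | |
| 5 | q_b | $ | Sb | 4 | |
| 6 | q_b | $ | b | 8 | S → e |
| 7 | q | $ | e | 5 | |
| 8 | q_$ | e | e | 6 | |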

So M2 can serve as a deterministic device for recognizing strings of the form aⁿbⁿ. Moreover, by remembering which transitions of M2 were derived from which rules of the grammar (this is the last column of the table above), we can use a trace of the operation of M2 in order to reconstruct a leftmost derivation of the input string. Specifically, the steps in the computation where a nonterminal is replaced on top of the stack (Steps 3 and 6 in the example) correspond to the construction of a parse tree from the root towards the leaves (see Figure 3-11(a)).
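The lookahead idea behind M2 can be sketched in Python as a direct simulation of transitions (1) to (8). The sketch assumes the input is a string over {a, b} ending in the end marker $, and represents the stack as a Python string whose first character is the top; the function `run_m2` is hypothetical. Besides deciding acceptance, it records the rule of G behind each expansion step, so the returned list is the leftmost derivation read off the trace.

```python
def run_m2(w):
    """Simulate the deterministic PDA M2 on w (which should end in '$').

    Returns (accepted, rules), where rules lists the rules of G applied
    at the expansion steps (transitions 7 and 8), in order.
    """
    state, stack, i, rules = "q", "S", 0, []        # transition (1) already taken
    while True:
        if state == "q":
            if i == len(w):
                return False, rules                # input ended without $
            c = w[i]
            i += 1
            if c not in "ab$":
                return False, rules
            state = "q" + c                        # transitions (2), (4), (6): remember the symbol read
            if c == "$":
                # q_$ is final: accept only if the stack is empty and nothing follows $
                return stack == "" and i == len(w), rules
        else:
            look = state[1]                        # the symbol read ahead ('a' or 'b')
            if stack.startswith("S"):
                if look == "a":                    # transition (7): S -> aSb
                    stack = "aSb" + stack[1:]
                    rules.append("S -> aSb")
                else:                              # transition (8): S -> e
                    stack = stack[1:]
                    rules.append("S -> e")
            elif stack.startswith(look):           # transitions (3), (5): pop the matching terminal
                stack = stack[1:]
                state = "q"
            else:
                return False, rules                # no transition applies: M2 is stuck


print(run_m2("ab$"))      # (True, ['S -> aSb', 'S -> e'])
print(run_m2("aab$"))
```

The two entries returned for ab$ correspond to Steps 3 and 6 of the table above.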

Devices such as M2, which correctly decide whether a string belongs in a context-free language and, in the case of a positive answer, produce the corresponding parse tree, are called parsers. In particular, M2 is a top-down parser because tracing its operation at the steps where nonterminals are replaced on the stack reconstructs a parse tree in a top-down, left-to-right fashion (see Figure 3-11(b) for a suggestive way of representing how progress is made in a top-down parser). We shall see a more substantial example shortly.

Naturally, not all context-free languages have deterministic acceptors that can be derived from the standard nondeterministic one via the lookahead idea.

For example, we saw in the previous subsection that some context-free languages are not deterministic to begin with. Even for certain deterministic context-free languages, lookahead of just one symbol may not be sufficient to resolve all uncertainties. Some languages, however, are not directly amenable to parsing by lookahead for reasons that are superficial and can be removed by slightly modifying the grammar. We shall focus on these next.

Recall the grammar G that generates arithmetic expressions with operations + and * (Example 3.1.3). In fact, let us enrich this grammar by another rule,

F → id(E), (R7)

designed to allow function calls (such as sqrt(x * x + 1) and f(z)) to appear in our arithmetic expressions.

Let us try to construct a top-down parser for this grammar. Our construction of Section 3.4 would give the pushdown automaton

$$M_3 = (\{p, q\}, \Sigma, \Gamma, \Delta, p, \{q\}),$$

with

$$\Sigma = \{(, ), +, *, \mathrm{id}\}, \qquad \Gamma = \Sigma \cup \{E, T, F\},$$

and Δ as given below.

(0) ((p, e, e), (q, E))
(1) ((q, e, E), (q, E + T))
(2) ((q, e, E), (q, T))
(3) ((q, e, T), (q, T * F))
(4) ((q, e, T), (q, F))
(5) ((q, e, F), (q, (E)))
(6) ((q, e, F), (q, id))
(7) ((q, e, F), (q, id(E)))

Finally, ((q, a, a), (q, e)) ∈ Δ for all a ∈ Σ. The nondeterminism of M3 is manifested by the sets of transitions 1-2, 3-4, and 5-6-7 that have identical first components. What is worse, these decisions cannot be made based on the next input symbol. Let us examine more closely why this is so.

Transitions 6 and 7. Suppose that the configuration of M3 is (q, id, F). At this point M3 could act according to any one of transitions 5, 6, or 7. By looking at the next input symbol, id, M3 could exclude transition 5, since this transition requires that the next symbol be (. Still, M3 would not be able to decide between transitions 6 and 7, since they both produce a top of the stack that can be matched to the next input symbol, id. The problem arises because the rules F → id and F → id(E) of G have not only identical left-hand sides, but also the same first symbol on their right-hand sides.

There is a very simple way around this problem: just replace the rules F → id and F → id(E) in G by the rules F → idA, A → e, and A → (E), where A is a new nonterminal (A for argument). This has the effect of "procrastinating" on the decision between the rules F → id and F → id(E) until all needed information is available. A modified pushdown automaton M3' now results from this modified grammar, in which transitions 6 and 7 are replaced by the following.

(6') ((q, e, F), (q, idA))
(7') ((q, e, A), (q, e))
(8') ((q, e, A), (q, (E)))

Now looking one symbol ahead is enough to decide the correct action.

For example, configuration (q, id(id), F) would yield (q, id(id), idA), (q, (id), A), (q, (id), (E)), and so on.

This technique of avoiding nondeterminism is known as left factoring. It can be summarized as follows.

Heuristic Rule 1: Whenever A → αβ₁, A → αβ₂, ..., A → αβₙ are rules with α ≠ e and n ≥ 2, then replace them by the rules A → αA' and A' → βᵢ for i = 1, ..., n, where A' is a new nonterminal.

It is easy to see that applying Heuristic Rule 1 does not change the language generated by the grammar.
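Heuristic Rule 1 can be sketched in Python as follows. The sketch assumes a grammar is a dictionary from each nonterminal to a list of right-hand sides, each a tuple of symbols (the empty tuple playing the role of e); the function `left_factor` and the naming of new nonterminals by appending primes are hypothetical choices. It performs a single pass: the alternatives of each nonterminal are grouped by their first symbol, and every group with at least two members has its longest common prefix α factored out.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of tuples."""
    prefix = []
    for symbols in zip(*seqs):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return tuple(prefix)


def left_factor(grammar):
    """One pass of Heuristic Rule 1 over every nonterminal."""
    result = {}
    for a, alternatives in grammar.items():
        groups = {}                                   # first symbol -> alternatives starting with it
        for rhs in alternatives:
            groups.setdefault(rhs[:1], []).append(rhs)
        new_rhs = []
        for first, group in groups.items():
            if first == () or len(group) < 2:
                new_rhs.extend(group)                 # nothing to factor
                continue
            alpha = common_prefix(group)              # nonempty, since all share `first`
            fresh = a + "'"
            while fresh in grammar or fresh in result:
                fresh += "'"
            new_rhs.append(alpha + (fresh,))          # A -> alpha A'
            result[fresh] = [rhs[len(alpha):] for rhs in group]   # A' -> beta_i
        result[a] = new_rhs
    return result


# The F-rules after adding R7:  F -> (E) | id | id(E)
grammar = {"F": [("(", "E", ")"), ("id",), ("id", "(", "E", ")")]}
print(left_factor(grammar))
# {"F'": [(), ('(', 'E', ')')], 'F': [('(', 'E', ')'), ('id', "F'")]}
```

Here the new nonterminal F' plays the role of A in the text, giving F → idF', F' → e, and F' → (E).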

We now move to examining the second kind of anomaly that prevents us from transforming M3 into a deterministic parser.

Chapter 3: CONTEXT-FREE LANGUAGES

Transitions 1 and 2. These transitions present us with a more serious problem. If the automaton sees id as the next input symbol and the contents of the stack are just E, it could take a number of actions. It could perform transition 2, replacing E by T (this would be justified in case the input is, say, id). Or it could replace E by E + T (transition 1) and then the top E by T (this should be done if the input is id + id). Or it could perform transition 1 twice and transition 2 once (input id + id + id), and so on. It seems that there is no bound whatsoever on how far ahead the automaton must peek in order to decide on the right action. The culprit here is the rule E → E + T, in which the nonterminal on the left-hand side is repeated as the first symbol of the right-hand side. This phenomenon is called left recursion, and it can be removed by some further surgery on the grammar.

To remove left recursion from the rule E → E + T, we simply replace it by the rules E → TE', E' → +TE', and E' → e, where E' is a new nonterminal. It can be shown that such transformations do not change the language produced by the grammar. The same method must also be applied to the other left-recursive rule of G, namely T → T * F. We thus arrive at the grammar G' = (V', Σ, R', E), where V' = Σ ∪ {E, E', T, T', F, A}, and the rules are as follows.

(1) E → TE'
(2) E' → +TE'
(3) E' → e
(4) T → FT'
(5) T' → *FT'
(6) T' → e
(7) F → (E)
(8) F → idA
(9) A → e
(10) A → (E)

The above technique for removing left recursion from a context-free grammar can be expressed as follows.†

Heuristic Rule 2: Let A → Aα₁, ..., A → Aαₙ and A → β₁, ..., A → βₘ be all rules with A on the left-hand side, where the βᵢ's do not start with an A and n > 0 (that is, there is at least one left-recursive rule). Then replace these rules by A → β₁A', ..., A → βₘA' and A' → α₁A', ..., A' → αₙA', and A' → e, where A' is a new nonterminal.
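Heuristic Rule 2 admits an equally short Python sketch under the same assumed representation (a dictionary from nonterminals to lists of symbol tuples, with the empty tuple for e). The function `remove_left_recursion` is hypothetical; like the rule, it handles only immediate left recursion A → Aα, and it assumes there are no rules A → A.

```python
def remove_left_recursion(grammar):
    """Apply Heuristic Rule 2 to every immediately left-recursive nonterminal."""
    result = {}
    for a, alternatives in grammar.items():
        alphas = [rhs[1:] for rhs in alternatives if rhs[:1] == (a,)]   # A -> A alpha_i
        betas = [rhs for rhs in alternatives if rhs[:1] != (a,)]        # A -> beta_j
        if not alphas:
            result[a] = list(alternatives)            # no left recursion: keep the rules
            continue
        fresh = a + "'"
        while fresh in grammar or fresh in result:
            fresh += "'"
        result[a] = [beta + (fresh,) for beta in betas]                 # A -> beta_j A'
        result[fresh] = [alpha + (fresh,) for alpha in alphas] + [()]   # A' -> alpha_i A' | e
    return result


# The grammar of Example 3.1.3, without the function-call rule
grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
for lhs, alternatives in remove_left_recursion(grammar).items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "e" for rhs in alternatives))
```

The printed rules agree with rules (1) to (7) of G'; rules (8) to (10) come from the function-call rule and left factoring.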

† We assume here that there are no rules of the form A → A.

Still, the grammar G' of our example has rules with identical left-hand sides, only now all uncertainties can be resolved by looking ahead at the next input symbol. We can thus construct the following deterministic pushdown automaton M4 that accepts L(G)$:

$$M_4 = (\{p, q\} \cup \{q_a : a \in \Sigma \cup \{\$\}\}, \Sigma \cup \{\$\}, V', \Delta, p, \{q_\$\}),$$

where Δ is listed below.

((p, e, e), (q, E))
((q, a, e), (q_a, e)) for each a ∈ Σ ∪ {$}
((q_a, e, a), (q, e)) for each a ∈ Σ
((q_a, e, E), (q_a, TE')) for each a ∈ Σ ∪ {$}
((q_+, e, E'), (q_+, +TE'))
((q_a, e, E'), (q_a, e)) for each a ∈ {), $}
((q_a, e, T), (q_a, FT')) for each a ∈ Σ ∪ {$}
((q_*, e, T'), (q_*, *FT'))
((q_a, e, T'), (q_a, e)) for each a ∈ {+, ), $}
((q_(, e, F), (q_(, (E)))
((q_id, e, F), (q_id, idA))
((q_(, e, A), (q_(, (E)))
((q_a, e, A), (q_a, e)) for each a ∈ {+, *, ), $}

Then M4 is a parser for G'. For example, the input string id * (id)$ would be accepted as shown in the table that follows.

Here we have indicated the steps in the computation where a nonterminal has been replaced on the stack in accordance with a rule of G'. By applying these rules of G' in the last column of this table in sequence, we obtain a leftmost derivation of the input string:

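$$
\begin{aligned}
E &\Rightarrow TE' \\
&\Rightarrow FT'E' \\
&\Rightarrow \mathrm{id}AT'E' \\
&\Rightarrow \mathrm{id}T'E' \\
&\Rightarrow \mathrm{id} * FT'E' \\
&\Rightarrow \mathrm{id} * (E)T'E' \\
&\Rightarrow \mathrm{id} * (TE')T'E' \\
&\Rightarrow \mathrm{id} * (FT'E')T'E' \\
&\Rightarrow \mathrm{id} * (\mathrm{id}AT'E')T'E' \\
&\Rightarrow \mathrm{id} * (\mathrm{id}T'E')T'E' \\
&\Rightarrow \mathrm{id} * (\mathrm{id}E')T'E' \\
&\Rightarrow \mathrm{id} * (\mathrm{id})T'E' \\
&\Rightarrow \mathrm{id} * (\mathrm{id})E' \\
&\Rightarrow \mathrm{id} * (\mathrm{id})
\end{aligned}
$$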

In fact, a parse tree of the input can be reconstructed (see Figure 3-12; the step of the pushdown automaton corresponding to the expansion of each node of the parse tree is also shown next to the node). Notice that this parser constructs the parse tree of the input in a top-down, left-first manner, starting from E and repeatedly applying an appropriate rule to the leftmost nonterminal.

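| Step | State | Unread input | Stack | Rule of G' |
|---|---|---|---|---|
| 0 | p | id * (id)$ | e | |
| 1 | q | id * (id)$ | E | |
| 2 | q_id | * (id)$ | E | |
| 3 | q_id | * (id)$ | TE' | 1 |
| 4 | q_id | * (id)$ | FT'E' | 4 |
| 5 | q_id | * (id)$ | idAT'E' | 8 |
| 6 | q | * (id)$ | AT'E' | |
| 7 | q_* | (id)$ | AT'E' | |
| 8 | q_* | (id)$ | T'E' | 9 |
| 9 | q_* | (id)$ | *FT'E' | 5 |
| 10 | q | (id)$ | FT'E' | |
| 11 | q_( | id)$ | FT'E' | |
| 12 | q_( | id)$ | (E)T'E' | 7 |
| 13 | q | id)$ | E)T'E' | |
| 14 | q_id | )$ | E)T'E' | |
| 15 | q_id | )$ | TE')T'E' | 1 |
| 16 | q_id | )$ | FT'E')T'E' | 4 |
| 17 | q_id | )$ | idAT'E')T'E' | 8 |
| 18 | q | )$ | AT'E')T'E' | |
| 19 | q_) | $ | AT'E')T'E' | |
| 20 | q_) | $ | T'E')T'E' | 9 |
| 21 | q_) | $ | E')T'E' | 6 |
| 22 | q_) | $ | )T'E' | 3 |
| 23 | q | $ | T'E' | |
| 24 | q_$ | e | T'E' | |
| 25 | q_$ | e | E' | 6 |
| 26 | q_$ | e | e | 3 |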
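The behavior of M4 can be sketched as a table-driven predictive parser in Python. The sketch assumes the input has already been split into tokens (id, +, *, (, ), and the end marker $). M4 itself has no explicit table: the dictionary `TABLE` is our hypothetical encoding of its expansion transitions, mapping the nonterminal on top of the stack and the symbol read ahead (the subscript of M4's state) to the number of the rule of G' to apply. The function name `parse` is likewise an assumption, and the stack is a Python list whose last element is the top.

```python
# Rules of G' by number: left-hand side and right-hand side (empty tuple = e).
RULES = {
    1: ("E", ("T", "E'")),     2: ("E'", ("+", "T", "E'")), 3: ("E'", ()),
    4: ("T", ("F", "T'")),     5: ("T'", ("*", "F", "T'")), 6: ("T'", ()),
    7: ("F", ("(", "E", ")")), 8: ("F", ("id", "A")),
    9: ("A", ()),              10: ("A", ("(", "E", ")")),
}
TERMINALS = {"id", "+", "*", "(", ")"}

# (nonterminal on top of the stack, lookahead) -> rule number, read off M4's transitions.
TABLE = {}
for a in TERMINALS | {"$"}:
    TABLE["E", a] = 1
    TABLE["T", a] = 4
TABLE["E'", "+"] = 2
TABLE["E'", ")"] = TABLE["E'", "$"] = 3
TABLE["T'", "*"] = 5
for a in ("+", ")", "$"):
    TABLE["T'", a] = 6
TABLE["F", "("] = 7
TABLE["F", "id"] = 8
TABLE["A", "("] = 10
for a in ("+", "*", ")", "$"):
    TABLE["A", a] = 9


def parse(tokens):
    """Return the rule numbers of a leftmost derivation, or None if M4 would reject."""
    stack = ["E"]                     # top of the stack is the end of the list
    rules = []
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in TERMINALS:
            if top != look:
                return None           # terminal on the stack does not match the input
            pos += 1                  # like returning from q_a to q after popping a
        else:
            rule = TABLE.get((top, look))
            if rule is None:
                return None           # no transition of M4 applies
            rules.append(rule)
            stack.extend(reversed(RULES[rule][1]))   # leftmost symbol ends up on top
    # M4 accepts in state q_$ with an empty stack, i.e. when exactly $ remains
    return rules if tokens[pos:] == ["$"] else None


print(parse(["id", "*", "(", "id", ")", "$"]))
# [1, 4, 8, 9, 5, 7, 1, 4, 8, 9, 6, 3, 6, 3]
```

The printed sequence is the last column of the table above, read from top to bottom.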

In general, given a grammar G, one may try to construct a top-down parser for G as follows: eliminate left recursion in G by repeatedly applying Heuristic Rule 2 to all left-recursive nonterminals A of G. Apply Heuristic Rule 1 to left-factor G whenever necessary. Then examine whether the resulting grammar has the property that one can decide among rules with the same left-hand side by looking at the next input symbol. Grammars with this property are called LL(1).

Although we have not specified exactly how to determine whether a grammar is indeed LL(1), nor how to construct the corresponding deterministic parser if it is LL(1), there are systematic methods for doing so. In any case, inspection of the grammar and some experimentation will often be all that is needed.
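One such systematic method, which the text does not develop and which we name here as the standard FIRST/FOLLOW computation, can be sketched in Python. For each rule A → α it computes the set of next-input symbols that should select the rule: the terminals that can begin a string derived from α, together with the symbols that can follow A whenever α can derive e. The grammar passes the test when the rules sharing a left-hand side have pairwise disjoint sets. The representation (a list of (lhs, rhs) pairs with tuples of symbols) and all function names are assumptions of this sketch.

```python
from itertools import combinations

# Rules (1)-(10) of G' as (left-hand side, right-hand side); () stands for e.
RULES = [
    ("E", ("T", "E'")), ("E'", ("+", "T", "E'")), ("E'", ()),
    ("T", ("F", "T'")), ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F", ("(", "E", ")")), ("F", ("id", "A")),
    ("A", ()), ("A", ("(", "E", ")")),
]


def first_of(seq, first, nullable):
    """Terminals that can begin a string derived from seq, and whether seq =>* e."""
    out = set()
    for x in seq:
        if x not in first:            # x is a terminal
            out.add(x)
            return out, False
        out |= first[x]
        if x not in nullable:
            return out, False
    return out, True


def lookahead_sets(rules, start):
    """For each rule A -> alpha, the next-input symbols that should select it."""
    first = {lhs: set() for lhs, _ in rules}
    follow = {lhs: set() for lhs, _ in rules}
    follow[start].add("$")
    nullable = set()
    changed = True
    while changed:                    # iterate until FIRST, FOLLOW and nullable stabilize
        changed = False
        for lhs, rhs in rules:
            f, eps = first_of(rhs, first, nullable)
            if not f <= first[lhs] or (eps and lhs not in nullable):
                first[lhs] |= f
                if eps:
                    nullable.add(lhs)
                changed = True
            for i, x in enumerate(rhs):
                if x in first:        # x is a nonterminal: what can follow it here?
                    f, eps = first_of(rhs[i + 1:], first, nullable)
                    new = (f | follow[lhs]) if eps else f
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    sets = []
    for lhs, rhs in rules:
        f, eps = first_of(rhs, first, nullable)
        sets.append((f | follow[lhs]) if eps else f)
    return sets


def is_ll1(rules, start):
    """True when rules sharing a left-hand side have disjoint lookahead sets."""
    sets = lookahead_sets(rules, start)
    return all(
        not (sets[i] & sets[j])
        for i, j in combinations(range(len(rules)), 2)
        if rules[i][0] == rules[j][0]
    )


sets = lookahead_sets(RULES, "E")
print(sorted(sets[2]), sorted(sets[5]), sorted(sets[8]))
# ['$', ')'] ['$', ')', '+'] ['$', ')', '*', '+']
print(is_ll1(RULES, "E"))   # True
```

The three printed sets, for E' → e, T' → e, and A → e, are exactly the lookahead symbols for which M4 pops E', T', and A without replacement.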

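E (step 3)
├── T (step 4)
│   ├── F (step 5)
│   │   ├── id
│   │   └── A (step 8)
│   │       └── e
│   └── T' (step 9)
│       ├── *
│       ├── F (step 12)
│       │   ├── (
│       │   ├── E (step 15)
│       │   │   ├── T (step 16)
│       │   │   │   ├── F (step 17)
│       │   │   │   │   ├── id
│       │   │   │   │   └── A (step 20)
│       │   │   │   │       └── e
│       │   │   │   └── T' (step 21)
│       │   │   │       └── e
│       │   │   └── E' (step 22)
│       │   │       └── e
│       │   └── )
│       └── T' (step 25)
│           └── e
└── E' (step 26)
    └── e

Figure 3-12

Bottom-Up Parsing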

There is no one best way to parse context-free languages, and different methods are sometimes preferable for different grammars. We close this chapter by briefly considering methods quite dissimilar from those of top-down parsing. Nevertheless they, too, find their genesis in the construction of a pushdown automaton.

In addition to the construction of Lemma 3.4.1, there is a quite orthogonal way of constructing a pushdown automaton that accepts the language generated by a given context-free grammar. The automata of that construction (from which the top-down parsers studied in the last subsection are derived) operate by carrying out a leftmost derivation on the stack; as terminal symbols are generated, they are compared with the input string. In the construction given below, the automaton attempts to read the input first and, on the basis of the input actually read, deduce what derivation it should attempt to carry out. The general effect, as we shall see, is to reconstruct a parse tree from the leaves to the root, rather than the other way around, and so this class of methods is called bottom-up.

The bottom-up pushdown automaton is constructed as follows. Let G = (V, Σ, R, S) be any context-free grammar; then let M = (K, Σ, Γ, Δ, p, F), where K = {p, q}, Γ = V, F = {q}, and Δ contains the following.

(1) ((p, a, e), (p, a)) for each a ∈ Σ.

(2) ((p, e, αᴿ), (p, A)) for each rule A → α in R.

(3) ((p, e, S), (q, e)).

Before moving to the proof itself, compare these types of transitions with those of the automaton constructed in the proof of Lemma 3.4.1. Transitions of type 1 here move input symbols onto the stack; transitions of type 3 in Lemma 3.4.1 pop terminal symbols off the stack when they match input symbols.

Transitions of type 2 here replace the right-hand side of a rule on the stack by the corresponding left-hand side, the right-hand side being found reversed on the stack; those of type 2 of Lemma 3.4.1 replace the left-hand side of a rule on the stack by the corresponding right-hand side. Transitions of type 3 here end a computation by moving to the final state when only the start symbol remains on the stack; transitions of type 1 of Lemma 3.4.1 start off the computation by placing the start symbol on the initially empty stack. So the machine of this construction is in a sense perfectly orthogonal to the one of Lemma 3.4.1.

Lemma 3.7.1: Let G and M be as just presented. Then L(M) = L(G).

Proof: Any string in L(G) has a rightmost derivation from the start symbol. Therefore proof of the following claim suffices to establish the lemma.

Claim: For any x ∈ Σ* and γ ∈ Γ*,

$$(p, x, \gamma) \vdash_M^{*} (p, e, S) \quad \text{if and only if} \quad S \overset{R*}{\Rightarrow}_G \gamma^R x.$$

For if we let x be an input to M and γ = e, then since q is the only final state and it can be entered only via transition 3, the claim implies that M accepts x if and only if G generates x. The only if direction of the claim can be proved by an induction on the number of steps in the computation of M, whereas the if direction can be proved by an induction on the number of steps in the rightmost derivation of γᴿx from S. ∎

Let us consider again the grammar for arithmetic expressions (Example 3.1.3, without the rule F → id(E) of the previous subsection). The rules of this grammar are the following.

E → E + T (R1)
E → T (R2)
T → T * F (R3)
T → F (R4)
F → (E) (R5)
F → id (R6)

If our new construction is applied to this grammar, the following set of transitions is obtained.

(Δ0) ((p, a, e), (p, a)) for each a ∈ Σ
(Δ1) ((p, e, T + E), (p, E))
(Δ2) ((p, e, T), (p, E))
(Δ3) ((p, e, F * T), (p, T))
(Δ4) ((p, e, F), (p, T))
(Δ5) ((p, e, )E(), (p, F))
(Δ6) ((p, e, id), (p, F))
(Δ7) ((p, e, E), (q, e))

Let us call this pushdown automaton M. The input id * (id) is accepted by M as shown in the table below.

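| Step | State | Unread input | Stack | Transition used | Rule of G |
|---|---|---|---|---|---|
| 0 | p | id * (id) | e | | |
| 1 | p | * (id) | id | Δ0 | |
| 2 | p | * (id) | F | Δ6 | R6 |
| 3 | p | * (id) | T | Δ4 | R4 |
| 4 | p | (id) | *T | Δ0 | |
| 5 | p | id) | (*T | Δ0 | |
| 6 | p | ) | id(*T | Δ0 | |
| 7 | p | ) | F(*T | Δ6 | R6 |
| 8 | p | ) | T(*T | Δ4 | R4 |
| 9 | p | ) | E(*T | Δ2 | R2 |
| 10 | p | e | )E(*T | Δ0 | |
| 11 | p | e | F*T | Δ5 | R5 |
| 12 | p | e | T | Δ3 | R3 |
| 13 | p | e | E | Δ2 | R2 |
| 14 | q | e | e | Δ7 | |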
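Because M is nondeterministic, a program can only simulate it by exploring all of its computations. The Python sketch below does so by breadth-first search over configurations (unread input, stack), assuming the input is already tokenized. The stack is a tuple whose last element is the top, so a right-hand side α appears there in its original order, which is αᴿ when read from the top as in transitions of type 2. The search terminates because, in the absence of e-rules, the stack never holds more symbols than have been shifted, so only finitely many configurations are reachable; the function `accepts` is hypothetical.

```python
from collections import deque

# Rules R1-R6 as (left-hand side, right-hand side); no e-rules are assumed.
RULES = [
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
START = "E"


def accepts(tokens):
    """Breadth-first search for an accepting computation of the bottom-up PDA M."""
    start = (tuple(tokens), ())
    seen = {start}
    queue = deque([start])
    while queue:
        unread, stack = queue.popleft()
        if not unread and stack == (START,):
            return True                       # a type-3 transition leads to q with an empty stack
        successors = []
        if unread:                            # type 1: shift the next input symbol
            successors.append((unread[1:], stack + (unread[0],)))
        for lhs, rhs in RULES:                # type 2: reduce a right-hand side on top of the stack
            if len(stack) >= len(rhs) and stack[len(stack) - len(rhs):] == rhs:
                successors.append((unread, stack[:len(stack) - len(rhs)] + (lhs,)))
        for config in successors:
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False


print(accepts(["id", "*", "(", "id", ")"]))   # True
print(accepts(["id", "*", ")"]))              # False
```

Exhaustive search of this kind is exponential in the worst case, which is why the text goes on to look for a deterministic way of choosing between shifting and reducing.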

We see that M is certainly not deterministic: transitions of type Δ0 are compatible with all the transitions of types Δ1 through Δ7. Still, its overall "philosophy" of operation is suggestive. At any point, M may shift a terminal symbol from its input to the top of the stack (transitions of type Δ0, used in the sample computation at Steps 1, 4, 5, 6, and 10). On the other hand, it may occasionally recognize the top few symbols on the stack as (the reverse of) the right-hand side of a rule of G, and may then reduce this string to the corresponding left-hand side (transitions of types Δ2 through Δ6, used in the sample computation where a "rule of G" is indicated in the rightmost column).

The sequence of rules corresponding to the reduction steps turns out to mirror exactly, in reverse order, a rightmost derivation of the input string. In our example, the implied rightmost derivation is as follows.

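$$
\begin{aligned}
E &\Rightarrow T \\
&\Rightarrow T * F \\
&\Rightarrow T * (E) \\
&\Rightarrow T * (T) \\
&\Rightarrow T * (F) \\
&\Rightarrow T * (\mathrm{id}) \\
&\Rightarrow F * (\mathrm{id}) \\
&\Rightarrow \mathrm{id} * (\mathrm{id})
\end{aligned}
$$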

This derivation can be read from the computation by applying the rules mentioned in the right-hand column, from bottom to top, always to the rightmost nonterminal. Equivalently, this process can be thought of as a bottom-to-top, left-to-right reconstruction of a parse tree (that is, exactly orthogonal to Figure 3-11(b)).
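The reading just described is mechanical enough to sketch in Python. Assuming the reductions are recorded as a list of rule names in the order M performed them, the hypothetical function `rightmost_derivation` walks that list from the last entry to the first and each time rewrites the rightmost nonterminal of the current sentential form, checking that it is the left-hand side of the rule being applied.

```python
RULES = {
    "R1": ("E", ["E", "+", "T"]), "R2": ("E", ["T"]),
    "R3": ("T", ["T", "*", "F"]), "R4": ("T", ["F"]),
    "R5": ("F", ["(", "E", ")"]), "R6": ("F", ["id"]),
}
NONTERMINALS = {"E", "T", "F"}


def rightmost_derivation(reductions, start="E"):
    """Sentential forms obtained by applying the recorded reductions backwards."""
    form = [start]
    forms = [" ".join(form)]
    for name in reversed(reductions):
        lhs, rhs = RULES[name]
        # position of the rightmost nonterminal in the current sentential form
        i = max(k for k, x in enumerate(form) if x in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError("the reductions do not mirror a rightmost derivation")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms


# Rules of the reduction steps 2, 3, 7, 8, 9, 11, 12, 13 of the table above, in that order
steps = ["R6", "R4", "R6", "R4", "R2", "R5", "R3", "R2"]
for f in rightmost_derivation(steps):
    print(f)
```

The nine printed lines run from E through T * F and T * ( E ) down to id * ( id ), matching the rightmost derivation displayed above.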

In order to construct a practically useful parser for L(G), we must turn M into a deterministic device that accepts L(G)$. As in our treatment of top-down parsers, we shall not give a systematic procedure for doing so. Instead, we carry through the example of G, pointing out the basic heuristic principles that govern this construction.

First, we need a way of deciding between the two basic kinds of moves, namely, shifting the next input symbol to the stack and reducing the few top-most stack symbols to a single nonterminal according to a rule of the grammar.

One possible way of deciding this is by looking at two pieces of information: the next input symbol (call it b) and the top stack symbol (call it a). (The symbol a could be a nonterminal.) The decision between shifting and reducing is then made through a relation P ⊆ V × (Σ ∪ {$}), called a precedence relation. If (a, b) ∈ P, then we reduce; otherwise we shift b. The correct precedence relation for the grammar of our example is given in the table below. Intuitively, (a, b) ∈ P means that there exists a rightmost derivation of the form

$$S \overset{R*}{\Rightarrow}_G \beta A b x \overset{R}{\Rightarrow}_G \beta \gamma a b x.$$

Since we are reconstructing rightmost derivations backwards, it makes sense to undo the rule A → γa whenever we observe that a immediately precedes b. There are systematic ways to calculate precedence relations, as well as to find out when, as is the case in this example, a precedence relation suffices to decide between shifting and reducing; however, in many cases inspection and experimentation will lead to the right table.

| P | ( | ) | id | + | * | $ |
|---|---|---|---|---|---|---|
| ( | | | | | | |
| ) | | ✓ | | ✓ | ✓ | ✓ |
| id | | ✓ | | ✓ | ✓ | ✓ |
| + | | | | | | |
| * | | | | | | |
| T | | ✓ | | ✓ | | ✓ |
| F | | ✓ | | ✓ | ✓ | ✓ |

Now we must confront the other source of nondeterminism: when we decide to reduce, how do we choose which of the prefixes of the pushdown store to replace with a nonterminal? For example, if the pushdown store contains the string F*T+E and we must reduce, we have a choice between reducing F to T (Rule R4) or reducing F*T to T (Rule R3). For our grammar, the correct action is always to choose the longest prefix of the stack contents that matches the reverse of the right-hand side of a rule and reduce it to the left-hand side of that rule. Thus in the case above we should take the second option and reduce F*T to T.

With these two rules (reduce when the top of the stack and the next input symbol are related by P, otherwise shift; and, when reducing, reduce the longest possible string from the top of the stack) the operation of the pushdown automaton M becomes completely deterministic. In fact, we could design a deterministic pushdown automaton that "implements" these two rules (see Problem 3.7.9).
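The two rules combine into a short deterministic shift-reduce loop, sketched here in Python. The set `P` transcribes the precedence table above, and the stack is a list whose last element is the top, so the reverse of a right-hand side on the pushdown store appears as the right-hand side itself at the end of the list. The function name `precedence_parse` is hypothetical, and, as an assumption of the sketch, the loop accepts when only $ remains unread and the stack holds just E.

```python
# Rules R1-R6 as (name, left-hand side, right-hand side)
RULES = [
    ("R1", "E", ("E", "+", "T")), ("R2", "E", ("T",)),
    ("R3", "T", ("T", "*", "F")), ("R4", "T", ("F",)),
    ("R5", "F", ("(", "E", ")")), ("R6", "F", ("id",)),
]
# Precedence relation P: (top stack symbol, next input symbol) pairs on which to reduce.
P = {(a, b) for a in (")", "id", "F") for b in (")", "+", "*", "$")}
P |= {("T", b) for b in (")", "+", "$")}


def precedence_parse(tokens):
    """Return the rules used in the reductions, or None if the input is rejected."""
    tokens = list(tokens) + ["$"]
    stack, pos, used = [], 0, []
    while True:
        b = tokens[pos]
        if stack and (stack[-1], b) in P:
            # reduce the longest right-hand side found on top of the stack
            matches = [r for r in RULES if tuple(stack[-len(r[2]):]) == r[2]]
            if not matches:
                return None
            name, lhs, rhs = max(matches, key=lambda r: len(r[2]))
            del stack[-len(rhs):]
            stack.append(lhs)
            used.append(name)
        elif b == "$":
            return used if stack == ["E"] else None
        else:
            stack.append(b)                   # shift
            pos += 1


print(precedence_parse(["id", "*", "(", "id", ")"]))
# ['R6', 'R4', 'R6', 'R4', 'R2', 'R5', 'R3', 'R2']
```

The reductions come out in the same order as the "Rule of G" column of the sample computation of M, but now without any guessing.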

Once again, bear in mind that the two heuristic rules we have described (namely, (1) use a precedence relation to decide whether to shift or reduce, and (2) when in doubt, reduce the longest possible string) do not work in all situations. The grammars for which they do work are called weak precedence grammars; in practice, many grammars related to programming languages are, or can readily be converted into, weak precedence grammars. And there are many even more sophisticated methods for constructing bottom-up parsers, which work for larger classes of grammars.

Problems for Section 3.7

3.7.1. Show that the following languages are deterministic context-free.

(a) {aᵐbⁿ : m ≠ n}

(b) {wcwᴿ : w ∈ {a, b}*}

(d) {aᵐcbⁿ : m ≠ n} ∪ {aᵐdb²ᵐ : m ≥ 0}

3.7.2. Show that the class of deterministic context-free languages is not closed under homomorphism.

3.7.3. Show that if L is a deterministic context-free language, then L is not inherently ambiguous.

3.7.4. Show that the pushdown automaton M' constructed in Section 3.7.1 accepts the language L, given that M accepts L$.

3.7.5. Consider the following context-free grammar: G = (V, Σ, R, S), where V = {(, ), ., a, S, A}, Σ = {(, ), ., a}, and R = {S → (), S → a, S → (A), A → S, A → A.S}. (For the reader familiar with the programming language LISP, L(G) contains all atoms and lists, where the symbol a stands for any non-null atom.)

(a) Apply Heuristic Rules 1 and 2 to G. Let G' be the resulting grammar. Argue that G' is LL(1). Construct a deterministic pushdown automaton M accepting L(G)$. Study the computation of M on the string ((()).a).

(b) Repeat Part (a) for the grammar resulting from G if one replaces the first rule by A → e.

3.7.6. Consider again the grammar G of Problem 3.7.5. Argue that G is a weak precedence grammar, with the precedence relation shown below. Construct a deterministic pushdown automaton that accepts L(G)$.

| P | a | ( | ) | . | $ |
|---|---|---|---|---|---|
| a | | | ✓ | ✓ | ✓ |
| ( | | | | | |
| ) | | | ✓ | ✓ | ✓ |
| S | | | ✓ | ✓ | |

3.7.7. Let G' = (V, Σ, R', S) be the grammar with rules S → (A), S → a, A → S.A, A → e. Is G' weak precedence? If so, give an appropriate precedence relation; otherwise, explain why not.

3.7.8. Acceptance by final state is defined in Problem 3.3.3. Show that L is deterministic context-free if and only if L is accepted by final state by some deterministic pushdown automaton.

3.7.9. Give explicitly a deterministic pushdown automaton that accepts the language of arithmetic expressions, based on the nondeterministic pushdown automaton M and the precedence table P of the last subsection. Your automaton should look ahead in the input by absorbing the next input symbol, very much like the pushdown automaton M4 of the previous subsection.

3.7.10. Consider the following classes of languages:

(a) Regular (b) Context-free

Give a Venn diagram of these classes; that is, represent each class by a "bubble," so that inclusions, intersections, etc. of classes are reflected accurately.

Can you supply a language for each non-empty region of your diagram?

REFERENCES

Context-free grammars are a creation of Noam Chomsky; see

• N. Chomsky, "Three models for the description of language," IRE Transactions on Information Theory, 2, 3, pp. 113-124, 1956, and also

• N. Chomsky, "On certain formal properties of grammars," Information and Control, 2, pp. 137-167, 1959.

In the last paper, Chomsky normal form was also introduced. A closely related notation for the syntax of programming languages, called BNF (for Backus Normal Form or Backus-Naur Form), was also invented in the late 1950s; see

• P. Naur, ed., "Revised report on the algorithmic language Algol 60," Communications of the ACM, 6, 1, pp. 1-17, 1963, reprinted in S. Rosen, ed., Programming Systems and Languages, New York: McGraw-Hill, pp. 79-118, 1967.

Problem 3.1.9 on the equivalence of regular grammars and finite automata is from

• N. Chomsky, G. A. Miller, "Finite-state languages," Information and Control, 1, pp. 91-112, 1958.

The pushdown automaton was introduced in

• A. G. Oettinger, "Automatic syntactic analysis and the pushdown store," Proceedings of Symposia on Applied Mathematics, Vol. 12, Providence, R.I.: American Mathematical Society, 1961.

Theorem 3.4.1 on the equivalence of context-free languages and pushdown automata was proved independently by Schützenberger, Chomsky, and Evey.

• M. P. Schützenberger, "On context-free languages and pushdown automata," Information and Control, 6, 3, pp. 246-264, 1963.

• N. Chomsky, "Context-free grammars and pushdown storage," Quarterly Progress Report, 65, pp. 187-194, M.I.T. Research Laboratory of Electronics, Cambridge, Mass., 1962.

• J. Evey, "Application of pushdown store machines," Proceedings of the 1963 Fall Joint Computer Conference, pp. 215-217, Montreal: AFIPS Press, 1963.

The closure properties presented in subsection 3.5.1, along with many others, were pointed out in

• Y. Bar-Hillel, M. Perles, and E. Shamir, "On formal properties of simple phrase structure grammars," Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 14, pp. 143-172, 1961.

In the same paper one finds a stronger version of Theorem 3.5.3 (the Pumping Theorem for context-free grammars; see also Problem 3.5.7). An even stronger version of that theorem appears in

• W. G. Ogden, "A helpful result for proving inherent ambiguity," Mathematical Systems Theory, 2, pp. 191-194, 1968.

The dynamic programming algorithm for context-free language recognition was discovered by

• T. Kasami, "An efficient recognition and syntax algorithm for context-free languages," Report AFCRL-65-758 (1965), Air Force Cambridge Research Laboratory, Cambridge, Mass., and independently by

• D. H. Younger, "Recognition and parsing of context-free languages in time n³," Information and Control, 10, 2, pp. 189-208, 1967.

A variant of this algorithm is faster when the underlying grammar is unambiguous:

• J. Earley, "An efficient context-free parsing algorithm," Communications of the ACM, 13, pp. 94-102, 1970.

The most efficient general context-free recognition algorithm known is due to Valiant. It runs in time proportional to the time required for multiplying two n × n matrices, currently O(n^2.3...).

• L. G. Valiant, "General context-free recognition in less than cubic time," Journal of Computer and System Sciences, 10, 2, pp. 308-315, 1975.

LL(1) parsers were introduced in

• P. M. Lewis II, R. E. Stearns, "Syntax-directed transduction," Journal of the ACM, 15, 3, pp. 465-488, 1968, and also in

• D. E. Knuth, "Top-down syntax analysis," Acta Informatica, 1, 2, pp. 79-110, 1971.

Weak precedence parsers were proposed in

• J. D. Ichbiah and S. P. Morse, "A technique for generating almost optimal Floyd-Evans productions for precedence grammars," Communications of the ACM, 13, 8, pp. 501-508, 1970.

The following is a standard book on compilers

• A. V. Aho, R. Sethi, J. D. Ullman, Principles of Compiler Design, Reading, Mass.: Addison-Wesley, 1985.

Ambiguity and inherent ambiguity were first studied in

• N. Chomsky and M. P. Schützenberger, "The algebraic theory of context-free languages," in Computer Programming and Formal Systems (pp. 118-161), ed. P. Braffort, D. Hirschberg, Amsterdam: North Holland, 1963, and
