Having established that not every context-free language can be accepted by a deterministic pushdown automaton, let us now consider some of those that can.

Our overall goal for the remainder of this chapter is to study cases in which context-free grammars can be converted into deterministic pushdown automata that can actually be used for "industrial grade" language recognition. However, our style here is rather different from that of the rest of this book; there are fewer proofs, and we do not attempt to tie up all the loose ends of the ideas we introduce. We present some guidelines, what we call "heuristic rules," that will not be useful in all cases, and we do not even attempt to specify exactly when they will be useful. That is, we aim to introduce some suggestive applications of the theory developed earlier in this chapter, but this venture should not be taken as anything more than an introduction.

Let us begin with an example. The language L = {aⁿbⁿ} is generated by the context-free grammar G = ({a, b, S}, {a, b}, R, S), where R contains the two rules S → aSb and S → e. We know how to construct a pushdown automaton that accepts L: just carry out the construction of Lemma 3.4.1 for the grammar G. The result is

M1 = ({p, q}, {a, b}, {a, b, S}, Δ1, p, {q}),

where

Δ1 = {((p, e, e), (q, S)), ((q, e, S), (q, aSb)), ((q, e, S), (q, e)),
      ((q, a, a), (q, e)), ((q, b, b), (q, e))}.

Since M1 has two different transitions with identical first components (the ones corresponding to the two rules of G that have identical left-hand sides), it is not deterministic.

Nevertheless, L is a deterministic context-free language, and M1 can be modified to become a deterministic pushdown automaton M2 that accepts L$.

Intuitively, all the information that M1 needs at each point in order to decide which of the two transitions to follow is the next input symbol. If that symbol is an a, then M1 should replace S by aSb on its stack if hope of an accepting computation is to be retained. On the other hand, if the next input symbol is a b, then the machine must pop S. M2 achieves this required anticipation or lookahead by consuming an input symbol ahead of time and incorporating that information into its state. Formally,

M2 = ({p, q, qa, qb, q$}, {a, b, $}, {a, b, S}, Δ2, p, {q$}),

where Δ2 contains the following transitions.

3.7: Determinism and Parsing
(1) ((p, e, e), (q, S))
(2) ((q, a, e), (qa, e))
(3) ((qa, e, a), (q, e))
(4) ((q, b, e), (qb, e))
(5) ((qb, e, b), (q, e))
(6) ((q, $, e), (q$, e))
(7) ((qa, e, S), (qa, aSb))
(8) ((qb, e, S), (qb, e))

From state q, M2 reads one input symbol and, without changing the stack, enters one of the three new states qa, qb, or q$. It then uses that information to differentiate between the two compatible transitions ((q, e, S), (q, aSb)) and ((q, e, S), (q, e)): the first transition is retained only from state qa and the second only from state qb. So M2 is deterministic. It accepts the input ab$ as follows.

Step  State  Unread Input  Stack  Transition Used  Rule of G
0     p      ab$           e
1     q      ab$           S      1
2     qa     b$            S      2
3     qa     b$            aSb    7                S → aSb
4     q      b$            Sb     3
5     qb     $             Sb     4
6     qb     $             b      8                S → e
7     q      $             e      5
8     q$     e             e      6

So M2 can serve as a deterministic device for recognizing strings of the form aⁿbⁿ. Moreover, by remembering which transitions of M2 were derived from which rules of the grammar (this is the last column of the table above), we can use a trace of the operation of M2 in order to reconstruct a leftmost derivation of the input string. Specifically, the steps in the computation where a nonterminal is replaced on top of the stack (Steps 3 and 6 in the example) correspond to the construction of a parse tree from the root towards the leaves (see Figure 3-11(a)).
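The behavior of M2 is easy to simulate directly. The sketch below is ours (the function name run_m2 and the string encoding of the rules are not from the text): it follows the eight transitions listed above, with the stack kept as a Python list whose top is at the right, and it records the rule of G behind each stack replacement, so a successful run returns a leftmost derivation.

```python
# A sketch of a simulator for the deterministic automaton M2 above.
# The stack is a list with its top at the right end; `rules` records the
# rule of G corresponding to each replacement of S on the stack.

def run_m2(inp):
    """Return the rules of G applied (a leftmost derivation) or None."""
    stack, rules, i = ["S"], [], 0      # transition 1 has been applied
    state = "q"
    while True:
        if state == "q":
            if i >= len(inp):
                return None
            c, i = inp[i], i + 1        # transitions 2, 4, 6: absorb a symbol
            if c == "a":
                state = "qa"
            elif c == "b":
                state = "qb"
            elif c == "$":
                state = "q$"
            else:
                return None
        elif state == "qa":
            if stack and stack[-1] == "S":      # transition 7: S -> aSb
                stack[-1:] = ["b", "S", "a"]    # push aSb, a on top
                rules.append("S -> aSb")
            if stack and stack[-1] == "a":      # transition 3: match the a
                stack.pop()
                state = "q"
            else:
                return None
        elif state == "qb":
            if stack and stack[-1] == "S":      # transition 8: S -> e
                stack.pop()
                rules.append("S -> e")
            if stack and stack[-1] == "b":      # transition 5: match the b
                stack.pop()
                state = "q"
            else:
                return None
        else:                                   # state q$: accept or reject
            return rules if i == len(inp) and not stack else None
```

On ab$ this returns the rule sequence S → aSb, S → e, mirroring Steps 3 and 6 of the table above.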

Devices such as M2, which correctly decide whether a string belongs in a context-free language and, in the case of a positive answer, produce the corresponding parse tree, are called parsers. In particular, M2 is a top-down parser, because tracing its operation at the steps where nonterminals are replaced on the stack reconstructs a parse tree in a top-down, left-to-right fashion (see Figure 3-11(b) for a suggestive way of representing how progress is made in a top-down parser). We shall see a more substantial example shortly.

Naturally, not all context-free languages have deterministic acceptors that can be derived from the standard nondeterministic one via the lookahead idea. For example, we saw in the previous subsection that some context-free languages are not deterministic to begin with. Even for certain deterministic context-free languages, lookahead of just one symbol may not be sufficient to resolve all uncertainties. Some languages, however, are not directly amenable to parsing by lookahead for reasons that are superficial and can be removed by slightly modifying the grammar. We shall focus on these next.

Recall the grammar G that generates arithmetic expressions with operations + and * (Example 3.1.3). In fact, let us enrich this grammar by another rule,

F → id(E),    (R7)

designed to allow function calls, such as sqrt(x * x + 1) and f(z), to appear in our arithmetic expressions.
Let us try to construct a top-down parser for this grammar. Our construction of Section 3.4 would give the pushdown automaton

M3 = ({p, q}, Σ, Γ, Δ, p, {q}),

with

Σ = {(, ), +, *, id},  Γ = Σ ∪ {E, T, F},

and Δ as given below.

(0) ((p, e, e), (q, E))
(1) ((q, e, E), (q, E + T))
(2) ((q, e, E), (q, T))
(3) ((q, e, T), (q, T * F))
(4) ((q, e, T), (q, F))
(5) ((q, e, F), (q, (E)))
(6) ((q, e, F), (q, id))
(7) ((q, e, F), (q, id(E)))

Finally, ((q, a, a), (q, e)) ∈ Δ for all a ∈ Σ. The nondeterminism of M3 is manifested by the sets of transitions 1-2, 3-4, and 5-6-7 that have identical first components. What is worse, these decisions cannot be made based on the next input symbol. Let us examine more closely why this is so.

Transitions 6 and 7. Suppose that the configuration of M3 is (q, id, F). At this point M3 could act according to any one of transitions 5, 6, or 7. By looking at the next input symbol, id, M3 could exclude transition 5, since this transition requires that the next input symbol be (. Still, M3 would not be able to decide between transitions 6 and 7, since they both produce a top of the stack that can be matched to the next input symbol id. The problem arises because the rules F → id and F → id(E) of G have not only identical left-hand sides, but also the same first symbol on their right-hand sides.

There is a very simple way around this problem: just replace the rules F → id and F → id(E) in G by the rules F → idA, A → e, and A → (E), where A is a new nonterminal (A for argument). This has the effect of "procrastinating" on the decision between the rules F → id and F → id(E) until all needed information is available. A modified pushdown automaton M3' now results from this modified grammar, in which transitions 6 and 7 are replaced by the following.

(6') ((q, e, F), (q, idA))
(7') ((q, e, A), (q, e))
(8') ((q, e, A), (q, (E)))

Now looking one symbol ahead is enough to decide the correct action. For example, configuration (q, id(id), F) would yield (q, id(id), idA), (q, (id), A), (q, (id), (E)), and so on.

This technique of avoiding nondeterminism is known as left factoring. It can be summarized as follows.

Heuristic Rule 1: Whenever A → αβ1, A → αβ2, ..., A → αβn are rules with α ≠ e and n ≥ 2, then replace them by the rules A → αA' and A' → βi for i = 1, ..., n, where A' is a new nonterminal.

It is easy to see that applying Heuristic Rule 1 does not change the language generated by the grammar.
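Heuristic Rule 1 is mechanical enough to implement. In the sketch below (our own encoding, not the text's: a grammar is a list of (lhs, rhs) pairs with each right-hand side a tuple of symbols, and fresh nonterminals are named by appending primes), common prefixes are factored out one symbol at a time; repeating this to exhaustion has the same effect as factoring out a maximal α.

```python
# A sketch of Heuristic Rule 1 (left factoring).  A grammar is a list of
# (lhs, rhs) pairs; each rhs is a tuple of symbols; () is the empty string e.

def left_factor(rules):
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for lhs in {l for l, _ in rules}:
            group = [r for l, r in rules if l == lhs]
            firsts = {}                      # group rhs's by their first symbol
            for rhs in group:
                if rhs:
                    firsts.setdefault(rhs[0], []).append(rhs)
            for a, bodies in firsts.items():
                if len(bodies) >= 2:         # A -> a b1 | a b2 | ... with n >= 2
                    new = lhs + "'"
                    while any(l == new for l, _ in rules):
                        new += "'"           # invent a fresh nonterminal
                    rules = [(l, r) for l, r in rules
                             if not (l == lhs and r in bodies)]
                    rules.append((lhs, (a, new)))        # A  -> a A'
                    rules += [(new, rhs[1:]) for rhs in bodies]  # A' -> b_i
                    changed = True
                    break
            if changed:
                break
    return rules
```

Applied to the two offending rules F → id and F → id(E), it produces F → idF', F' → e, F' → (E), which is the transformation of the text up to the name of the new nonterminal.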

We now move on to examining the second kind of anomaly that prevents us from transforming M3 into a deterministic parser.

166 Chapter 3: CONTEXT-FREE LANGUAGES
Transitions 1 and 2. These transitions present us with a more serious problem. If the automaton sees id as the next input symbol and the contents of the stack are just E, it could take a number of actions. It could perform transition 2, replacing E by T (this would be justified in case the input is, say, id). Or it could replace E by E + T (transition 1) and then the top E by T (this should be done if the input is id + id). Or it could perform transition 1 twice and transition 2 once (input id + id + id), and so on. It seems that there is no bound whatsoever on how far ahead the automaton must peek in order to decide on the right action. The culprit here is the rule E → E + T, in which the nonterminal on the left-hand side is repeated as the first symbol of the right-hand side. This phenomenon is called left recursion, and it can be removed by some further surgery on the grammar.

To remove left recursion from the rule E → E + T, we simply replace it by the rules E → TE', E' → +TE', and E' → e, where E' is a new nonterminal. It can be shown that such transformations do not change the language generated by the grammar. The same method must also be applied to the other left-recursive rule of G, namely T → T * F. We thus arrive at the grammar G' = (V', Σ, R', E), where V' = Σ ∪ {E, E', T, T', F, A}, and the rules are as follows.

(1) E → TE'
(2) E' → +TE'
(3) E' → e
(4) T → FT'
(5) T' → *FT'
(6) T' → e
(7) F → (E)
(8) F → idA
(9) A → e
(10) A → (E)

The above technique for removing left recursion from a context-free grammar can be expressed as follows.

Heuristic Rule 2:† Let A → Aα1, ..., A → Aαn and A → β1, ..., A → βm be all the rules with A on the left-hand side, where the βi's do not start with an A and n > 0 (that is, there is at least one left-recursive rule). Then replace these rules by A → β1A', ..., A → βmA' and A' → α1A', ..., A' → αnA', and A' → e, where A' is a new nonterminal.
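Heuristic Rule 2 can be implemented just as directly. The sketch below is ours (rules are (lhs, rhs) pairs with tuple right-hand sides, as in the previous sketch); it eliminates the immediate left recursion on a single nonterminal.

```python
# A sketch of Heuristic Rule 2 applied to one nonterminal `a`.
# Assumes no rule A -> A (see the footnote below).

def remove_left_recursion(a, rules):
    """rules: list of (lhs, rhs) pairs; returns a new rule list in which the
    immediate left recursion on nonterminal `a` has been eliminated."""
    rec = [r[1:] for l, r in rules if l == a and r[:1] == (a,)]   # the alpha_i
    base = [r for l, r in rules if l == a and r[:1] != (a,)]      # the beta_j
    if not rec:                                   # n = 0: nothing to do
        return list(rules)
    new = a + "'"                                 # the fresh nonterminal A'
    out = [(l, r) for l, r in rules if l != a]
    out += [(a, b + (new,)) for b in base]        # A  -> beta_j A'
    out += [(new, alpha + (new,)) for alpha in rec]   # A' -> alpha_i A'
    out.append((new, ()))                         # A' -> e
    return out
```

Applied to E → E + T and E → T it yields exactly E → TE', E' → +TE', E' → e.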

†We assume here that there are no rules of the form A → A.

Still, the grammar G' of our example has rules with identical left-hand sides; only now all uncertainties can be resolved by looking ahead at the next input symbol. We can thus construct the following deterministic pushdown automaton M4 that accepts L(G)$:

M4 = (K, Σ ∪ {$}, V', Δ, p, {q$}),

where K = {p, q} ∪ {qa : a ∈ Σ ∪ {$}} and Δ is listed below.

((p, e, e), (q, E))
((q, a, e), (qa, e))        for each a ∈ Σ ∪ {$}
((qa, e, a), (q, e))        for each a ∈ Σ
((qa, e, E), (qa, TE'))     for each a ∈ Σ ∪ {$}
((q+, e, E'), (q+, +TE'))
((qa, e, E'), (qa, e))      for each a ∈ {), $}
((qa, e, T), (qa, FT'))     for each a ∈ Σ ∪ {$}
((q*, e, T'), (q*, *FT'))
((qa, e, T'), (qa, e))      for each a ∈ {+, ), $}
((q(, e, F), (q(, (E)))
((qid, e, F), (qid, idA))
((q(, e, A), (q(, (E)))
((qa, e, A), (qa, e))       for each a ∈ {+, *, ), $}

Then M4 is a parser for G'. For example, the input string id * (id)$ would be accepted as shown in the table below.

In the table we have indicated the steps in the computation where a nonterminal has been replaced on the stack in accordance with a rule of G'. By applying these rules of G' in the last column of the table in sequence, we obtain a leftmost derivation of the input string:

E ⇒ TE' ⇒ FT'E' ⇒ idAT'E' ⇒ idT'E' ⇒ id * FT'E' ⇒ id * (E)T'E'
  ⇒ id * (TE')T'E' ⇒ id * (FT'E')T'E' ⇒ id * (idAT'E')T'E' ⇒ id * (idT'E')T'E'
  ⇒ id * (idE')T'E' ⇒ id * (id)T'E' ⇒ id * (id)E' ⇒ id * (id).

In fact, a parse tree of the input can be reconstructed (see Figure 3-12; the step of the pushdown automaton corresponding to the expansion of each node of the parse tree is also shown next to the node). Notice that this parser constructs the parse tree of the input in a top-down, left-first manner, starting from E and repeatedly applying an appropriate rule to the leftmost nonterminal.


Step  State  Unread Input  Stack          Rule of G'
0     p      id * (id)$    e
1     q      id * (id)$    E
2     qid    * (id)$       E
3     qid    * (id)$       TE'            1
4     qid    * (id)$       FT'E'          4
5     qid    * (id)$       idAT'E'        8
6     q      * (id)$       AT'E'
7     q*     (id)$         AT'E'
8     q*     (id)$         T'E'           9
9     q*     (id)$         *FT'E'         5
10    q      (id)$         FT'E'
11    q(     id)$          FT'E'
12    q(     id)$          (E)T'E'        7
13    q      id)$          E)T'E'
14    qid    )$            E)T'E'
15    qid    )$            TE')T'E'       1
16    qid    )$            FT'E')T'E'     4
17    qid    )$            idAT'E')T'E'   8
18    q      )$            AT'E')T'E'
19    q)     $             AT'E')T'E'
20    q)     $             T'E')T'E'      9
21    q)     $             E')T'E'        6
22    q)     $             )T'E'          3
23    q      $             T'E'
24    q$     e             T'E'
25    q$     e             E'             6
26    q$     e             e              3

In general, given a grammar G, one may try to construct a top-down parser for G as follows: Eliminate left recursion in G by repeatedly applying Heuristic Rule 2 to all left-recursive nonterminals A of G. Apply Heuristic Rule 1 to left-factor G whenever necessary. Then examine whether the resulting grammar has the property that one can decide among rules with the same left-hand side by looking at the next input symbol. Grammars with this property are called LL(1). Although we have not specified exactly how to determine whether a grammar is indeed LL(1), nor how to construct the corresponding deterministic parser if it is LL(1), there are systematic methods for doing so. In any case, inspection of the grammar and some experimentation will often be all that is needed.
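For the grammar G' the decisions are small enough to write down by hand. The following sketch (our own encoding and function names, with the table read off from the transitions of M4 above) is one way to implement the resulting LL(1) parser; it returns the sequence of rule numbers applied, which is exactly a leftmost derivation.

```python
# A sketch of an LL(1) parser for G'.  TABLE[(X, b)] gives the rule to
# apply when the leftmost nonterminal is X and the next input symbol is b.

RULES = {
    1: ("E", ["T", "E'"]),     2: ("E'", ["+", "T", "E'"]),  3: ("E'", []),
    4: ("T", ["F", "T'"]),     5: ("T'", ["*", "F", "T'"]),  6: ("T'", []),
    7: ("F", ["(", "E", ")"]), 8: ("F", ["id", "A"]),
    9: ("A", []),              10: ("A", ["(", "E", ")"]),
}
NONTERMS = {lhs for lhs, _ in RULES.values()}

TABLE = {
    ("E", "("): 1, ("E", "id"): 1,
    ("E'", "+"): 2, ("E'", ")"): 3, ("E'", "$"): 3,
    ("T", "("): 4, ("T", "id"): 4,
    ("T'", "*"): 5, ("T'", "+"): 6, ("T'", ")"): 6, ("T'", "$"): 6,
    ("F", "("): 7, ("F", "id"): 8,
    ("A", "("): 10, ("A", "+"): 9, ("A", "*"): 9, ("A", ")"): 9, ("A", "$"): 9,
}

def ll1_parse(tokens):
    """Return the rule numbers applied (a leftmost derivation) or None."""
    stack, used, i = ["E"], [], 0           # stack top is the last element
    while stack:
        top = stack.pop()
        if top in NONTERMS:
            r = TABLE.get((top, tokens[i]))
            if r is None:
                return None                 # no applicable rule: reject
            used.append(r)
            stack.extend(reversed(RULES[r][1]))  # leftmost symbol on top
        elif i < len(tokens) and tokens[i] == top:
            i += 1                          # a terminal matched the input
        else:
            return None
    return used if tokens[i:] == ["$"] else None
```

On the token list id * ( id ) $ it returns [1, 4, 8, 9, 5, 7, 1, 4, 8, 9, 6, 3, 6, 3], the rule column of the table above.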

[Figure 3-12: the parse tree of id * (id), with the step of M4's computation at which each node was expanded shown next to the node (E at step 3, its children T at step 4 and E' at step 26, and so on down to the leaves).]

Bottom-Up Parsing

There is no one best way to parse context-free languages, and different methods are sometimes preferable for different grammars. We close this chapter by briefly considering methods quite dissimilar from those of top-down parsing. Nevertheless they, too, find their genesis in the construction of a pushdown automaton.

In addition to the construction of Lemma 3.4.1, there is a quite orthogonal way of constructing a pushdown automaton that accepts the language generated by a given context-free grammar. The automata of that construction (from which the top-down parsers studied in the last subsection are derived) operate by carrying out a leftmost derivation on the stack; as terminal symbols are generated, they are compared with the input string. In the construction given below, the automaton attempts to read the input first and, on the basis of the input actually read, deduce what derivation it should attempt to carry out. The general effect, as we shall see, is to reconstruct a parse tree from the leaves to the root, rather than the other way around, and so this class of methods is called bottom-up.

The bottom-up pushdown automaton is constructed as follows. Let G = (V, Σ, R, S) be any context-free grammar; then let M = (K, Σ, Γ, Δ, p, F), where K = {p, q}, Γ = V, F = {q}, and Δ contains the following.

(1) ((p, a, e), (p, a)) for each a ∈ Σ.
(2) ((p, e, αᴿ), (p, A)) for each rule A → α in R.
(3) ((p, e, S), (q, e)).
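The construction is entirely mechanical. A sketch in our own encoding (rules as (lhs, rhs) pairs of symbol tuples; a transition is ((state, input, pop), (state, push)), with "" for the empty input and the empty tuple () for the empty stack string):

```python
# A sketch of the bottom-up construction: build the transition relation of M
# from the terminals and rules of a grammar with start symbol `start`.

def bottom_up_pda(terminals, rules, start="S"):
    delta = []
    for a in terminals:                        # type 1: shift an input symbol
        delta.append((("p", a, ()), ("p", (a,))))
    for lhs, rhs in rules:                     # type 2: reduce alpha^R to A
        delta.append((("p", "", rhs[::-1]), ("p", (lhs,))))
    delta.append((("p", "", (start,)), ("q", ())))   # type 3: accept
    return delta
```

Note that a right-hand side is reversed before it is used as the pop string, since it is found reversed on the stack.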

Before moving to the proof itself, compare these types of transitions with those of the automaton constructed in the proof of Lemma 3.4.1. Transitions of type 1 here move input symbols onto the stack; transitions of type 3 of Lemma 3.4.1 pop terminal symbols off the stack when they match input symbols.

Transitions of type 2 here replace the right-hand side of a rule on the stack by the corresponding left-hand side, the right-hand side being found reversed on the stack; those of type 2 of Lemma 3.4.1 replace the left-hand side of a rule on the stack by the corresponding right-hand side. Transitions of type 3 here end a computation by moving to the final state when only the start symbol remains on the stack; transitions of type 1 of Lemma 3.4.1 start off the computation by placing the start symbol on the initially empty stack. So the machine of this construction is in a sense perfectly orthogonal to the one of Lemma 3.4.1.

Lemma 3.7.1: *Let G and M be as just presented. Then L(M) *= *L(G). *

Proof: Any string in L(G) has a rightmost derivation from the start symbol. Therefore proof of the following claim suffices to establish the lemma.

Claim: For any x ∈ Σ* and γ ∈ Γ*, (p, x, γ) ⊢*M (p, e, S) if and only if S derives γᴿx by a rightmost derivation.

For if we let x be an input to M and γ = e, then since q is the only final state and it can be entered only via transition 3, the claim implies that M accepts x if and only if G generates x. The only-if direction of the claim can be proved by an induction on the number of steps in the computation of M, whereas the if direction can be proved by an induction on the number of steps in the rightmost derivation of x from S. ∎

Let us consider again the grammar for arithmetic expressions (Example 3.1.3, without the rule F → id(E) of the previous subsection). The rules of this grammar are the following.

E → E + T    (R1)
E → T        (R2)
T → T * F    (R3)
T → F        (R4)
F → (E)      (R5)
F → id       (R6)

If our new construction is applied to this grammar, the following set of transitions is obtained.

(Δ0) ((p, a, e), (p, a))      for each a ∈ Σ
(Δ1) ((p, e, T+E), (p, E))
(Δ2) ((p, e, T), (p, E))
(Δ3) ((p, e, F*T), (p, T))
(Δ4) ((p, e, F), (p, T))
(Δ5) ((p, e, )E( ), (p, F))
(Δ6) ((p, e, id), (p, F))
(Δ7) ((p, e, E), (q, e))

Let us call this pushdown automaton M. The input id * (id) is accepted by M as shown in the table below.

Step  State  Unread Input  Stack   Transition Used  Rule of G
0     p      id * (id)     e
1     p      * (id)        id      Δ0
2     p      * (id)        F       Δ6               R6
3     p      * (id)        T       Δ4               R4
4     p      (id)          *T      Δ0
5     p      id)           (*T     Δ0
6     p      )             id(*T   Δ0
7     p      )             F(*T    Δ6               R6
8     p      )             T(*T    Δ4               R4
9     p      )             E(*T    Δ2               R2
10    p      e             )E(*T   Δ0
11    p      e             F*T     Δ5               R5
12    p      e             T       Δ3               R3
13    p      e             E       Δ2               R2
14    q      e             e       Δ7

We see that M is certainly not deterministic: transitions of type Δ0 are compatible with all the transitions of types Δ1 through Δ7. Still, its overall "philosophy" of operation is suggestive. At any point, M may shift a terminal symbol from its input to the top of the stack (transitions of type Δ0, used in the sample computation at Steps 1, 4, 5, 6, and 10). On the other hand, it may occasionally recognize the top few symbols on the stack as (the reverse of) the right-hand side of a rule of G, and may then reduce this string to the corresponding left-hand side (transitions of types Δ1 through Δ6, used in the sample computation at the steps where a "rule of G" is indicated in the rightmost column). The sequence of rules corresponding to the reduction steps turns out to mirror exactly, in reverse order, a rightmost derivation of the input string. In our example, the implied rightmost derivation is as follows.

E ⇒ T
  ⇒ T * F
  ⇒ T * (E)
  ⇒ T * (T)
  ⇒ T * (F)
  ⇒ T * (id)
  ⇒ F * (id)
  ⇒ id * (id)

This derivation can be read from the computation by applying the rules mentioned in the right-hand column, from bottom to top, always to the rightmost nonterminal. Equivalently, this process can be thought of as a bottom-to-top, left-to-right reconstruction of a parse tree (that is, exactly orthogonal to Figure 3-11(b)).

In order to construct a practically useful parser for L(G), we must turn M into a deterministic device that accepts L(G)$. As in our treatment of top-down parsers, we shall not give a systematic procedure for doing so. Instead, we carry through the example of G, pointing out the basic heuristic principles that govern this construction.

First, we need a way of deciding between the two basic kinds of moves, namely, shifting the next input symbol onto the stack and reducing the topmost stack symbols to a single nonterminal according to a rule of the grammar. One possible way of deciding this is by looking at two pieces of information: the next input symbol, call it b, and the top stack symbol, call it a. (The symbol a could be a nonterminal.) The decision between shifting and reducing is then made through a relation P ⊆ V × (Σ ∪ {$}) called a precedence relation. If (a, b) ∈ P, then we reduce; otherwise we shift b. The correct precedence relation for the grammar of our example is given in the table below. Intuitively, (a, b) ∈ P means that there exists a rightmost derivation of the form

S ⇒* βAbx ⇒ βγabx.

Since we are reconstructing rightmost derivations backwards, it makes sense to undo the rule A → γa whenever we observe that a immediately precedes b. There are systematic ways to calculate precedence relations, as well as to find out when, as is the case in this example, a precedence relation suffices to decide between shifting and reducing; however, in many cases inspection and experimentation will lead to the right table.

      (    )    id   +    *    $
(
)          ✓         ✓    ✓    ✓
id         ✓         ✓    ✓    ✓
+
E
*
T          ✓         ✓         ✓
F          ✓         ✓    ✓    ✓

Now we must confront the other source of nondeterminism: when we decide to reduce, how do we choose which of the prefixes of the pushdown store to replace with a nonterminal? For example, if the pushdown store contains the string F*T+E and we must reduce, we have a choice between reducing F to T (Rule R4) or reducing F*T to T (Rule R3). For our grammar, the correct action is always to choose the longest prefix of the stack contents that matches the reverse of the right-hand side of a rule and reduce it to the left-hand side of that rule. Thus in the case above we should take the second option and reduce F*T to T.

With these two rules (reduce when the top of the stack and the next input symbol are related by P, otherwise shift; and, when reducing, reduce the longest possible string from the top of the stack) the operation of the pushdown automaton M becomes completely deterministic. In fact, we could design a deterministic pushdown automaton that "implements" these two rules (see Problem 3.7.9).
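Such a deterministic shift-reduce parser is easy to sketch in code. In the version below (our own encoding, not the text's: the stack grows to the right, so a handle appears as a suffix in left-to-right order rather than reversed), P is the precedence table above, and the reductions are tried with the longer right-hand sides first, implementing the longest-string rule.

```python
# A sketch of the weak precedence parser for the arithmetic grammar.

REDUCTIONS = [                  # longer right-hand sides first
    ("R1", "E", ("E", "+", "T")),
    ("R3", "T", ("T", "*", "F")),
    ("R5", "F", ("(", "E", ")")),
    ("R2", "E", ("T",)),
    ("R4", "T", ("F",)),
    ("R6", "F", ("id",)),
]

# The precedence relation P of the table above.
P = {(a, b) for a in (")", "id", "F") for b in (")", "+", "*", "$")} \
    | {("T", b) for b in (")", "+", "$")}

def wp_parse(tokens):
    """Return (accepted, reduction sequence) for a token list ending in $."""
    stack, used, i = [], [], 0
    while True:
        if stack == ["E"] and tokens[i] == "$":
            return True, used                         # input reduced to E
        if stack and (stack[-1], tokens[i]) in P:     # reduce the longest handle
            for name, lhs, rhs in REDUCTIONS:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    used.append(name)
                    break
            else:
                return False, used                    # reduce called for, no handle
        elif tokens[i] != "$":
            stack.append(tokens[i])                   # shift
            i += 1
        else:
            return False, used
```

On id * (id)$ this performs the reductions R6, R4, R6, R4, R2, R5, R3, R2, exactly the "Rule of G" column of the sample computation above.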

Once again, bear in mind that the two heuristic rules we have described, namely, (1) use a precedence relation to decide whether to shift or reduce, and (2) when in doubt, reduce the longest possible string, do not work in all situations. The grammars for which they do work are called weak precedence grammars; in practice, many grammars related to programming languages are, or can readily be converted into, weak precedence grammars. And there are many even more sophisticated methods for constructing bottom-up parsers, which work for larger classes of grammars.

Problems for Section 3.7

3.7.1. Show that the following languages are deterministic context-free.

(a) {aᵐbⁿ : m ≠ n}

(b) {wcwᴿ : w ∈ {a, b}*}

(c) {caᵐbⁿ : m ≠ n} ∪ {daᵐb²ᵐ : m ≥ 0}

(d) {aᵐcbⁿ : m ≠ n} ∪ {aᵐdb²ᵐ : m ≥ 0}

3.7.2. Show that the class of deterministic context-free languages is not closed under homomorphism.

3.7.3. Show that if L is a deterministic context-free language, then L is not inherently ambiguous.

3.7.4. Show that the pushdown automaton *M' *constructed in Section 3.7.1 accepts
the language L, given that M accepts L$.

3.7.5. Consider the following context-free grammar: G = (V, Σ, R, S), where V = {(, ), ., a, S, A}, Σ = {(, ), ., a}, and R = {S → (), S → a, S → (A), A → S, A → A.S}. (For the reader familiar with the programming language LISP, L(G) contains all atoms and lists, where the symbol a stands for any non-null atom.)

(a) Apply Heuristic Rules 1 and 2 to G. Let G' be the resulting grammar. Argue that G' is LL(1). Construct a deterministic pushdown automaton M accepting L(G)$. Study the computation of M on the string ((()).a).

(b) Repeat Part (a) for the grammar resulting from G if one replaces the first rule by A → e.

(c) Repeat Part (a) for the grammar resulting from G if one replaces the last rule by A → S.A in G.

3.7.6. Consider again the grammar G of Problem 3.7.5. Argue that G is a weak precedence grammar, with the precedence relation shown below. Construct a deterministic pushdown automaton that accepts L(G)$.

      a    (    )    .    $
a               ✓    ✓    ✓
(
)               ✓    ✓    ✓
A
S               ✓    ✓

3.7.7. Let G' = (V, Σ, R', S) be the grammar with rules S → (A), S → a, A → S.A, A → e. Is G' weak precedence? If so, give an appropriate precedence relation; otherwise, explain why not.

3.7.8. Acceptance by final state is defined in Problem 3.3.3. Show that L is deterministic context-free if and only if L is accepted by final state by some deterministic pushdown automaton.

3.7.9. Give explicitly a deterministic pushdown automaton that accepts the language of arithmetic expressions, based on the nondeterministic pushdown automaton M and the precedence table P of the last subsection. Your automaton should look ahead in the input by absorbing the next input symbol, very much like the pushdown automaton M4 of the previous section.

3.7.10. Consider the following classes of languages:

(a) Regular

(b) Context-free

(c) The class of complements of context-free languages

(d) Deterministic context-free

Give a Venn diagram of these classes; that is, represent each class by a "bubble," so that inclusions, intersections, etc. of classes are reflected accurately. Can you supply a language for each non-empty region of your diagram?

REFERENCES

Context-free grammars are a creation of Noam Chomsky; see

o N. Chomsky "Three models for the description of languages," IRE Transactions on Information Theory, 2, 3, pp. 113-124, 1956, and also

o N. Chomsky "On certain formal properties of grammars," Information and Control, 2, pp. 137-167, 1959.

In the last paper, Chomsky normal form was also introduced. A closely related notation for the syntax of programming languages, called BNF (for Backus Normal Form or Backus-Naur Form), was also invented in the late 1950s; see

o P. Naur, ed. "Revised report on the algorithmic language Algol 60," Communications of the ACM, 6, 1, pp. 1-17, 1963, reprinted in S. Rosen, ed., Programming Systems and Languages, New York: McGraw-Hill, pp. 79-118, 1967.

Problem 3.1.9 on the equivalence of regular grammars and finite automata is from

o N. Chomsky, G. A. Miller "Finite-state languages," Information and Control, 1, pp. 91-112, 1958.

The pushdown automaton was introduced in

o A. G. Oettinger "Automatic syntactic analysis and the pushdown store," Proceedings of Symposia on Applied Mathematics, Vol. 12, Providence, R.I.: American Mathematical Society, 1961.

Theorem 3.4.1 on the equivalence of context-free languages and pushdown automata was proved independently by Schutzenberger, Chomsky, and Evey.

o M. P. Schutzenberger "On context-free languages and pushdown automata," Information and Control, 6, 3, pp. 246-264, 1963.

o N. Chomsky "Context-free grammar and pushdown storage," Quarterly Progress Report, 65, pp. 187-194, M.I.T. Research Laboratory in Electronics, Cambridge, Mass., 1962.

o J. Evey "Application of pushdown store machines," Proceedings of the 1963 Fall Joint Computer Conference, pp. 215-217. Montreal: AFIPS Press, 1963.

The closure properties presented in subsection 3.5.1, along with many others, were pointed out in

o V. Bar-Hillel, M. Perles, and E. Shamir "On formal properties of simple phrase structure grammars," Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 14, pp. 143-172, 1961.

In the same paper one finds a stronger version of Theorem 3.5.3 (the Pumping Theorem for context-free grammars; see also Problem 3.5.7). An even stronger version of that theorem appears in

o W. G. Ogden "A helpful result for proving inherent ambiguity," Mathematical Systems Theory, 2, pp. 191-194, 1968.

The dynamic programming algorithm for context-free language recognition was discovered by

o T. Kasami "An efficient recognition and syntax algorithm for context-free languages," Report AFCRL-65-758 (1965), Air Force Cambridge Research Laboratory, Cambridge, Mass., and independently by

o D. H. Younger "Recognition and parsing of context-free languages in time n³," Information and Control, 10, 2, pp. 189-208, 1967.

A variant of this algorithm is faster when the underlying grammar is unambiguous

o J. Earley "An efficient context-free parsing algorithm," Communications of the ACM, 13, pp. 94-102, 1970.

The most efficient general context-free recognition algorithm known is due to Valiant. It runs in time proportional to the time required for multiplying two n × n matrices, currently O(n^2.3...).

o L. G. Valiant "General context-free recognition in less than cubic time," Journal of Computer and Systems Sciences, 10, 2, pp. 308-315, 1975.

LL(1) parsers were introduced in

o P. M. Lewis II, R. E. Stearns "Syntax-directed transduction," Journal of the ACM, 15, 3, pp. 465-488, 1968, and also in

o D. E. Knuth "Top-down syntax analysis," Acta Informatica, 1, 2, pp. 79-110, 1971.

Weak precedence parsers were proposed in

o J. D. Ichbiah and S. P. Morse "A technique for generating almost optimal Floyd-Evans productions for precedence grammars," Communications of the ACM, 13, 8, pp. 501-508, 1970.

The following is a standard book on compilers

o A. V. Aho, R. Sethi, J. D. Ullman Principles of Compiler Design, Reading, Mass.: Addison-Wesley, 1985.

Ambiguity and inherent ambiguity were first studied in

o N. Chomsky and M. P. Schutzenberger "The algebraic theory of context free languages," in Computer Programming and Formal Systems (pp. 118-161), ed. P. Braffort, D. Hirschberg. Amsterdam: North Holland, 1963, and