
# Chapter 3: Context-Free Languages

## 3.4 PUSHDOWN AUTOMATA AND CONTEXT-FREE GRAMMARS

In this section we show that the pushdown automaton is exactly what is needed to accept arbitrary context-free languages. This fact is of mathematical and practical significance: mathematical, because it knits together two different formal views of the same class of languages; and practical, because it lays the foundation for the study of syntax analyzers for "real" context-free languages such as programming languages (more on this in Section 3.7).

Theorem 3.4.1: The class of languages accepted by pushdown automata is exactly the class of context-free languages.

Proof: We break this proof into two parts.

Lemma 3.4.1: Each context-free language is accepted by some pushdown automaton.

Proof: Let G = (V, Σ, R, S) be a context-free grammar; we must construct a pushdown automaton M such that L(M) = L(G). The machine we construct has only two states, p and q, and remains permanently in state q after its first move. Also, M uses V, the set of terminals and nonterminals, as its stack alphabet. We let M = ({p, q}, Σ, V, Δ, p, {q}), where Δ contains the following transitions:

(1) ((p, e, e), (q, S))

(2) ((q, e, A), (q, x)) for each rule A → x in R.

(3) ((q, a, a), (q, e)) for each a ∈ Σ.

The pushdown automaton M begins by pushing S, the start symbol of G, on its initially empty pushdown store, and entering state q (transition 1). On each subsequent step, it either replaces the topmost symbol A on the stack, provided that it is a nonterminal, by the right-hand side x of some rule A → x in R (transitions of type 2), or pops the topmost symbol from the stack, provided that it is a terminal symbol that matches the next input symbol (transitions of type 3). The transitions of M are designed so that the pushdown store during an accepting computation mimics a leftmost derivation of the input string; M intermittently carries out a step of such a derivation on the stack, and between such steps it strips away from the top of the stack any terminal symbols and matches them against symbols in the input string. Popping the terminals from the stack has in turn the effect of exposing the leftmost nonterminal, so that the process can continue.
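Since the construction is purely mechanical, it is easy to carry out by machine. The following Python sketch builds the transition relation Δ from an arbitrary grammar; the function name and the tuple encoding of transitions are ours, not the text's.

```python
def cfg_to_pda(terminals, rules, start):
    """Construction of Lemma 3.4.1: the transition set Delta of the
    two-state PDA for a grammar G.  A transition ((state, input, pop),
    (state', push)) is encoded as a pair of tuples, with the empty
    string standing for e, as in the text."""
    delta = {(("p", "", ""), ("q", start))}       # type 1: push S
    for lhs, rhs in rules:                        # type 2: expand A -> x
        delta.add((("q", "", lhs), ("q", rhs)))
    for a in terminals:                           # type 3: match a terminal
        delta.add((("q", a, a), ("q", "")))
    return delta

# The grammar of Example 3.4.1 below: S -> aSa | bSb | c
delta = cfg_to_pda({"a", "b", "c"},
                   [("S", "aSa"), ("S", "bSb"), ("S", "c")], "S")
assert (("q", "", "S"), ("q", "aSa")) in delta
assert (("q", "a", "a"), ("q", "")) in delta
assert len(delta) == 7
```

Running it on the grammar of Example 3.4.1 produces exactly the seven transitions T1 through T7 listed there.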

Example 3.4.1: Consider the grammar G = (V, Σ, R, S) with V = {S, a, b, c}, Σ = {a, b, c}, and R = {S → aSa, S → bSb, S → c}, which generates the language {wcw^R : w ∈ {a, b}*}. The corresponding pushdown automaton, according to the construction above, is M = ({p, q}, Σ, V, Δ, p, {q}), with

Δ = { ((p, e, e), (q, S)),    (T1)
      ((q, e, S), (q, aSa)),  (T2)
      ((q, e, S), (q, bSb)),  (T3)
      ((q, e, S), (q, c)),    (T4)
      ((q, a, a), (q, e)),    (T5)
      ((q, b, b), (q, e)),    (T6)
      ((q, c, c), (q, e)) }.  (T7)

The string abbcbba is accepted by M through the following sequence of moves.

| State | Unread Input | Stack | Transition Used |
| --- | --- | --- | --- |
| p | abbcbba | e |  |
| q | abbcbba | S | T1 |
| q | abbcbba | aSa | T2 |
| q | bbcbba | Sa | T5 |
| q | bbcbba | bSba | T3 |
| q | bcbba | Sba | T6 |
| q | bcbba | bSbba | T3 |
| q | cbba | Sbba | T6 |
| q | cbba | cbba | T4 |
| q | bba | bba | T7 |
| q | ba | ba | T6 |
| q | a | a | T6 |
| q | e | e | T5 |

Compare this to the operation, on the same string, of the pushdown automaton of Example 3.3.1. ◊
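The nondeterministic search for an accepting computation can also be simulated directly. This is a minimal sketch, with names and the pruning rule our own: it breadth-first-searches the configurations (unread input, stack) of the automaton M constructed above.

```python
from collections import deque

def accepts(w, rules, terminals, start):
    """Search for an accepting computation of the Lemma 3.4.1 PDA.

    Pruning note: configurations whose stack is longer than the unread
    input are discarded.  This is sound for grammars, like that of
    Example 3.4.1, in which every stack symbol eventually consumes at
    least one input symbol; it is what keeps the search finite."""
    seen = set()
    queue = deque([(w, start)])          # after transition T1, the stack holds S
    while queue:
        unread, stack = queue.popleft()
        if (unread, stack) in seen or len(stack) > len(unread):
            continue
        seen.add((unread, stack))
        if unread == "" and stack == "":
            return True                  # input consumed, stack empty: accept
        if stack and stack[0] in terminals:
            if unread and unread[0] == stack[0]:     # type 3: match terminal
                queue.append((unread[1:], stack[1:]))
        elif stack:                                  # type 2: expand nonterminal
            for lhs, rhs in rules:
                if lhs == stack[0]:
                    queue.append((unread, rhs + stack[1:]))
    return False

rules = [("S", "aSa"), ("S", "bSb"), ("S", "c")]
assert accepts("abbcbba", rules, {"a", "b", "c"}, "S")
assert not accepts("abbcbb", rules, {"a", "b", "c"}, "S")
```

The first assertion retraces exactly the thirteen-configuration computation tabulated above.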

To continue the proof of the Lemma, in order to establish that L(M) = L(G), we prove the following claim.

Claim. Let w ∈ Σ* and α ∈ (V − Σ)V* ∪ {e}. Then S ⇒*_G wα if and only if (q, w, S) ⊢*_M (q, e, α).

This claim will suffice to establish Lemma 3.4.1, since it will follow (by taking α = e) that S ⇒*_G w if and only if (q, w, S) ⊢*_M (q, e, e); in other words, w ∈ L(G) if and only if w ∈ L(M).

(Only if) Suppose that S ⇒*_G wα, where w ∈ Σ* and α ∈ (V − Σ)V* ∪ {e}. We shall show by induction on the length of the leftmost derivation of wα from S that (q, w, S) ⊢*_M (q, e, α).

Basis Step. If the derivation is of length 0, then w = e and α = S, and hence indeed (q, w, S) ⊢*_M (q, e, α).

Induction Hypothesis. Assume that if S ⇒*_G wα by a derivation of length n or less, n ≥ 0, then (q, w, S) ⊢*_M (q, e, α).

Induction Step. Let

S = u_0 ⇒_G u_1 ⇒_G ··· ⇒_G u_n ⇒_G u_{n+1} = wα

be a leftmost derivation of wα from S. Let A be the leftmost nonterminal of u_n. Then u_n = xAβ and u_{n+1} = xγβ, where x ∈ Σ*, β, γ ∈ V*, and A → γ is a rule in R. Since there is a leftmost derivation of length n of u_n = xAβ from S, by the induction hypothesis

(q, x, S) ⊢*_M (q, e, Aβ). (2)

Since A → γ is a rule in R,

(q, e, Aβ) ⊢_M (q, e, γβ), (3)

by a transition of type 2.

Now notice that u_{n+1} is wα, but it is also xγβ. Hence, there is a string y ∈ Σ* such that w = xy and yα = γβ. Thus, we can rewrite (2) and (3) above as

(q, w, S) ⊢*_M (q, y, γβ). (4)

However, since yα = γβ,

(q, y, γβ) ⊢*_M (q, e, α), (5)

by a sequence of |y| transitions of type 3. Combining (4) and (5) completes the induction step.

(If) Now suppose that (q, w, S) ⊢*_M (q, e, α) with w ∈ Σ* and α ∈ (V − Σ)V* ∪ {e}; we show that S ⇒*_G wα. Again, the proof is by induction, but this time on the number of transitions of type 2 in the computation by M.

Basis Step. Since the first move in any computation is by a transition of type 2, if (q, w, S) ⊢*_M (q, e, α) with no type-2 transitions, then w = e and α = S, and the result is true.

Induction Hypothesis. If (q, w, S) ⊢*_M (q, e, α) by a computation with n type-2 steps or fewer, n ≥ 0, then S ⇒*_G wα.

Induction Step. Suppose that (q, w, S) ⊢*_M (q, e, α) in n + 1 type-2 transitions, and consider the last such transition, say,

(q, w, S) ⊢*_M (q, y, Aβ) ⊢_M (q, y, γβ) ⊢*_M (q, e, α),

where w = xy for some x, y ∈ Σ*, and A → γ is a rule of the grammar. By the induction hypothesis we have that S ⇒*_G xAβ, and thus S ⇒*_G xγβ. Since however (q, y, γβ) ⊢*_M (q, e, α), necessarily by transitions of type 3, it follows that yα = γβ, and thus S ⇒*_G xyα = wα. This completes the proof of Lemma 3.4.1, and with it half the proof of Theorem 3.4.1. ∎

We now turn to the proof of the other half of Theorem 3.4.1.

Lemma 3.4.2: If a language is accepted by a pushdown automaton, it is a context-free language.

Proof: It will be helpful to restrict somewhat the pushdown automata under consideration. Call a pushdown automaton simple if the following is true:

Whenever ((q, a, β), (p, γ)) is a transition of the pushdown automaton and q is not the start state, then β ∈ Γ and |γ| ≤ 2.

In other words, the machine always consults its topmost stack symbol (and no symbols below it), and replaces it either with e, or with a single stack symbol, or with two stack symbols. Now it is easy to see that no interesting pushdown automaton can have only transitions of this kind, because then it would not be able to operate when the stack is empty (for example, it would not be able to start the computation, since in the beginning the stack is empty). This is why we do not restrict transitions from the start state.

We claim that if a language is accepted by an unrestricted pushdown automaton, then it is accepted by a simple pushdown automaton. To see this, let M = (K, Σ, Γ, Δ, s, F) be any pushdown automaton; we shall construct a simple pushdown automaton M' = (K', Σ, Γ ∪ {Z}, Δ', s', {f'}) that also accepts L(M); here s' and f' are new states not in K, and Z is a new stack symbol, the stack bottom symbol, also not in Γ.

We first add to Δ' the transition ((s', e, e), (s, Z)); this transition starts the computation by placing the stack bottom symbol at the bottom of the stack, where it will remain throughout the computation. No rule of Δ' will ever push a Z onto the stack, except to replace it at the bottom of the stack. We also add to Δ' the transitions

((f, e, Z), (f', e)) for each f ∈ F.

These transitions end the computation by removing Z from the bottom of the stack and accepting the input seen so far.

Initially, Δ' consists of the start and final transitions described above, and all transitions of Δ. We shall next replace all transitions in Δ' that violate the simplicity condition by equivalent transitions that satisfy it. We shall do this in three stages: First we shall replace transitions with |β| ≥ 2. Then we shall get rid of transitions with |γ| > 2, without introducing any transitions with |β| ≥ 2. Finally, we shall get rid of transitions with β = e, without introducing any transitions with |β| ≥ 2 or |γ| > 2.

Consider any transition ((q, a, β), (p, γ)) ∈ Δ', where β = B_1···B_n with n > 1. It is replaced by new transitions that pop sequentially the individual symbols B_1, ..., B_n, rather than removing them all in a single step. Specifically, we add to Δ' these transitions:

((q, e, B_1), (q_{B_1}, e)),
((q_{B_1}, e, B_2), (q_{B_1B_2}, e)),
...
((q_{B_1B_2···B_{n−2}}, e, B_{n−1}), (q_{B_1B_2···B_{n−1}}, e)),
((q_{B_1B_2···B_{n−1}}, a, B_n), (p, γ)),

where, for i = 1, ..., n − 1, q_{B_1B_2···B_i} is a new state with the intuitive meaning "state q after symbols B_1B_2···B_i have been popped." We repeat this with all transitions ((q, a, β), (p, γ)) ∈ Δ' with |β| > 1. It is clear that the resulting pushdown automaton is equivalent to the original one.
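This first stage is mechanical enough to state as code. A sketch, assuming transitions are encoded as pairs of tuples with the empty string standing for e; the helper name and the tuple encoding of the fresh states are ours.

```python
def split_pops(delta):
    """Stage 1 of the simplification in Lemma 3.4.2: replace every
    transition that pops a string beta = B1...Bn with n > 1 by a chain
    of single-symbol pops through fresh intermediate states.  A state
    ('q', 'B1B2') encodes "state q after B1B2 have been popped"."""
    new_delta = set()
    for (q, a, beta), (p, gamma) in delta:
        if len(beta) <= 1:
            new_delta.add(((q, a, beta), (p, gamma)))
            continue
        state = q
        for i, b in enumerate(beta[:-1]):
            nxt = (q, beta[: i + 1])              # fresh state q_{B1...Bi}
            new_delta.add(((state, "", b), (nxt, "")))
            state = nxt
        # the last pop carries the original input symbol a and push gamma
        new_delta.add(((state, a, beta[-1]), (p, gamma)))
    return new_delta

# A transition popping "AB" in one step becomes two single-symbol pops.
out = split_pops({(("q", "x", "AB"), ("p", "C"))})
assert (("q", "", "A"), (("q", "A"), "")) in out
assert ((("q", "A"), "x", "B"), ("p", "C")) in out
```

Stages 2 and 3 of the text (splitting long pushes, and eliminating empty pops) can be coded in the same style.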

Similarly, we replace transitions ((q, a, β), (p, γ)) with γ = C_1···C_m and m ≥ 2 by the transitions

((q, a, β), (r_1, C_m)),
((r_1, e, e), (r_2, C_{m−1})),
...
((r_{m−2}, e, e), (r_{m−1}, C_2)),
((r_{m−1}, e, e), (p, C_1)),

where r_1, ..., r_{m−1} are new states. Notice that all transitions ((q, a, β), (p, γ)) ∈ Δ' now have |γ| ≤ 1, a more stringent requirement than simplicity (and actually one that would be a loss of generality). It will be restored to |γ| ≤ 2 in the next stage. Also, no transitions ((q, a, β), (p, γ)) with |β| > 1 were added.

Finally, consider any transition ((q, a, e), (p, γ)) with q ≠ s', the only possible remaining violations of the simplicity condition. Replace any such transition by all transitions of the form ((q, a, A), (p, γA)), for all A ∈ Γ ∪ {Z}. That is, if the automaton could move without consulting its stack, it can also move by consulting the top stack symbol, whatever it may be, and replacing it immediately. And we know that there is at least one symbol on the stack: throughout the main computation, apart from the start and final transitions, the stack never becomes empty. Notice also that at this stage we may introduce γ's of length two; this does not violate the simplicity condition, but is necessary for capturing general pushdown automata.

It is easy to see that this construction results in a simple pushdown automaton M' such that L(M) = L(M'). To continue the proof of the lemma, we shall exhibit a context-free grammar G such that L(G) = L(M'); this will conclude the proof of the lemma, and with it of Theorem 3.4.1.

We let G = (V, Σ, R, S), where V contains, in addition to a new symbol S and the symbols in Σ, a new symbol (q, A, p) for all q, p ∈ K' and each A ∈ Γ ∪ {e, Z}. To understand the role of the nonterminals (q, A, p), remember that G is supposed to generate all strings accepted by M'. Therefore the nonterminals of G stand for different parts of the input strings that are accepted by M'. In particular, if A ∈ Γ, then the nonterminal (q, A, p) represents any portion of the input string that might be read between a point in time when M' is in state q with A on top of its stack, and a point in time when M' removes that occurrence of A from the stack and enters state p. If A = e, then (q, e, p) denotes a portion of the input string that might be read between a time when M' is in state q and a time when it is in state p with the same stack, without in the interim changing or consulting that part of the stack.

The rules in R are of four types.

(1) The rule S → (s, Z, f'), where s is the start state of the original pushdown automaton M and f' is the new final state.

(2) For each transition ((q, a, B), (r, C)), where q, r ∈ K', a ∈ Σ ∪ {e}, B, C ∈ Γ ∪ {e}, and for each p ∈ K', we add the rule (q, B, p) → a(r, C, p).

(3) For each transition ((q, a, B), (r, C_1C_2)), where q, r ∈ K', a ∈ Σ ∪ {e}, B ∈ Γ ∪ {e}, and C_1, C_2 ∈ Γ, and for each p, p' ∈ K', we add the rule (q, B, p) → a(r, C_1, p')(p', C_2, p).

(4) For each q ∈ K', the rule (q, e, q) → e.

Note that, because M' is simple, either type 2 or type 3 applies to each transition of M'.

A rule of type 1 states essentially that any input string which can be read by M' passing from state s to the final state, while at the same time the net effect on the stack is that the stack bottom symbol was popped, is a string in the language L(M'). A rule of type 4 says that no computation is needed to go from a state to itself without changing the stack. Finally, a rule of type 2 or 3 says that, if ((q, a, B), (r, γ)) ∈ Δ', then one of the possible computations that lead from state q to state p while consuming B (possibly empty) from the top of the stack starts by reading input a, replacing B by γ, and passing to state r, and then goes on to consume γ and end up in state p, reading whatever input is appropriate during such a computation. If γ = C_1C_2, this last computation can in principle pass through any state p' immediately after C_1 is popped (this is a type-3 rule).
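The generation of rules of types 2, 3, and 4 can be sketched as follows, assuming a simple pushdown automaton whose transitions pop at most one symbol and push at most two; the function name and the encoding of a nonterminal (q, A, p) as a Python tuple are ours.

```python
from itertools import product

def pda_to_cfg_rules(states, delta):
    """Rules of types 2-4 from the proof of Lemma 3.4.2, for a *simple*
    PDA (transitions from the start state are assumed already excluded).
    A rule is (lhs, rhs) with rhs a list of terminals and nonterminals;
    the empty string stands for e."""
    rules = []
    for (q, a, b), (r, gamma) in delta:
        if len(gamma) <= 1:
            # type 2: (q, B, p) -> a (r, C, p), for every p
            for p in states:
                rules.append(((q, b, p), [a, (r, gamma, p)]))
        else:
            # type 3, |gamma| = 2: (q, B, p) -> a (r, C1, p') (p', C2, p)
            c1, c2 = gamma[0], gamma[1]
            for p, p2 in product(states, states):
                rules.append(((q, b, p), [a, (r, c1, p2), (p2, c2, p)]))
    for q in states:
        rules.append(((q, "", q), [""]))      # type 4: (q, e, q) -> e
    return rules

states = {"q", "r"}
rules = pda_to_cfg_rules(states, {(("q", "a", "A"), ("r", ""))})
assert (("q", "A", "r"), ["a", ("r", "", "r")]) in rules
assert (("q", "", "q"), [""]) in rules
```

Note how the quantification over p (and p') in rules (2) and (3) shows up directly as the loops over `states`.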

These intuitive remarks are formalized in the following claim.

Claim. For any q, p ∈ K', A ∈ Γ ∪ {e}, and x ∈ Σ*,

(q, A, p) ⇒*_G x if and only if (q, x, A) ⊢*_{M'} (p, e, e).

Lemma 3.4.2, and with it Theorem 3.4.1, follows readily from the claim, since then (s, Z, f') ⇒*_G x if and only if (s, x, Z) ⊢*_{M'} (f', e, e); that is, x ∈ L(G) if and only if x ∈ L(M').

Both directions of the claim can be proved by induction on the length either of the derivation of G or of the computation of M'; they are left as an exercise (Problem 3.4.5). ∎

Problems for Section 3.4

3.4.1. Carry out the construction of Lemma 3.4.1 for the grammar of Example 3.1.4. Trace the operation of the automaton you have constructed on the input string (()()).

3.4.2. Carry out the construction of Lemma 3.4.2 for the pushdown automaton of Example 3.3.2, and let G be the resulting grammar. What is the set {w ∈ {a, b}* : (q, a, p) ⇒*_G w}? Compare with the proof of Lemma 3.4.2.

3.4.3. Carry out the construction of Lemma 3.4.2 for the pushdown automaton of Example 3.3.3. The resulting grammar will have 25 rules, but many can be eliminated as useless. Show a derivation of the string aababbba in this grammar. (You may change the names of the nonterminals for clarity.)

3.4.4. Show that if M = (K, Σ, Γ, Δ, s, F) is a pushdown automaton, then there is another pushdown automaton M' = (K', Σ, Γ, Δ', s, F) such that L(M') = L(M) and for all ((q, a, β), (p, γ)) ∈ Δ', |β| + |γ| ≤ 1.

3.4.5. Complete the proof of Lemma 3.4.2.

3.4.6. A context-free grammar is linear if and only if no rule has as its right-hand side a string with more than one nonterminal. A pushdown automaton (K, Σ, Γ, Δ, s, F) is said to be single-turn if and only if whenever (s, w, e) ⊢* (q_1, w_1, γ_1) ⊢ (q_2, w_2, γ_2) ⊢* (q_3, w_3, γ_3) and |γ_2| < |γ_1|, then |γ_3| ≤ |γ_2|. (That is, once the stack starts to decrease in height, it never again increases in height.) Show that a language is generated by a linear context-free grammar if and only if it is accepted by a single-turn pushdown automaton.

## 3.5 LANGUAGES THAT ARE AND ARE NOT CONTEXT-FREE

### Closure Properties

In the last section, two views of context-free languages, as languages generated by context-free grammars and as languages accepted by pushdown automata, were shown to be equivalent. These characterizations enrich our understanding of the context-free languages, since they provide two different methods for recognizing when a language is context-free. For example, the grammatical representation is more natural and compelling in the case of a programming language fragment such as that of Example 3.1.3; but the representation in terms of pushdown automata is easier to see in the case of {w ∈ {a, b}* : w has equal numbers of a's and b's} (see Example 3.3.3). In this subsection we shall provide further tools for establishing context-freeness: we show some closure properties of the context-free languages under language operations, very much in the spirit of the closure properties of regular languages. In the next subsection we shall prove a more powerful pumping theorem which enables us to show that certain languages are not context-free.

Theorem 3.5.1: The context-free languages are closed under union, concatenation, and Kleene star.

Proof: Let G_1 = (V_1, Σ_1, R_1, S_1) and G_2 = (V_2, Σ_2, R_2, S_2) be two context-free grammars, and without loss of generality assume that they have disjoint sets of nonterminals; that is, V_1 − Σ_1 and V_2 − Σ_2 are disjoint.

Union. Let S be a new symbol and let G = (V_1 ∪ V_2 ∪ {S}, Σ_1 ∪ Σ_2, R, S), where R = R_1 ∪ R_2 ∪ {S → S_1, S → S_2}. Then we claim that L(G) = L(G_1) ∪ L(G_2). For the only rules involving S are S → S_1 and S → S_2, so S ⇒*_G w if and only if either S_1 ⇒*_G w or S_2 ⇒*_G w; and since G_1 and G_2 have disjoint sets of nonterminals, the last disjunction is equivalent to saying that w ∈ L(G_1) ∪ L(G_2).

Concatenation. The construction is similar: L(G_1)L(G_2) is generated by the grammar

G = (V_1 ∪ V_2 ∪ {S}, Σ_1 ∪ Σ_2, R_1 ∪ R_2 ∪ {S → S_1S_2}, S).

Kleene Star. L(G_1)* is generated by the grammar

G = (V_1 ∪ {S}, Σ_1, R_1 ∪ {S → e, S → SS_1}, S). ∎
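The three constructions of Theorem 3.5.1 are simple enough to write down directly. A sketch, encoding a grammar as a tuple (nonterminals, terminals, rules, start) with rules as pairs of strings; all names are ours, and "S" is assumed fresh and the nonterminal sets disjoint, as in the proof.

```python
def union_grammar(g1, g2):
    """Union construction: add S -> S1 and S -> S2."""
    n1, t1, r1, s1 = g1
    n2, t2, r2, s2 = g2
    return (n1 | n2 | {"S"}, t1 | t2,
            r1 | r2 | {("S", s1), ("S", s2)}, "S")

def concat_grammar(g1, g2):
    """Concatenation construction: add S -> S1 S2."""
    n1, t1, r1, s1 = g1
    n2, t2, r2, s2 = g2
    return (n1 | n2 | {"S"}, t1 | t2, r1 | r2 | {("S", s1 + s2)}, "S")

def star_grammar(g1):
    """Kleene star construction: add S -> e and S -> S S1."""
    n1, t1, r1, s1 = g1
    return (n1 | {"S"}, t1, r1 | {("S", ""), ("S", "S" + s1)}, "S")

g1 = ({"A"}, {"a"}, {("A", "aA"), ("A", "a")}, "A")   # generates a+
g2 = ({"B"}, {"b"}, {("B", "bB"), ("B", "b")}, "B")   # generates b+
_, _, r, s = union_grammar(g1, g2)
assert ("S", "A") in r and ("S", "B") in r and s == "S"
```

Each function mirrors its paragraph above: the only new rules are the ones involving the fresh start symbol S.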

As we shall see shortly, the class of context-free languages is not closed under intersection or complementation. This is not very surprising: Recall that our proof that regular languages are closed under intersection depended on closure under complementation; and that construction required that the automaton be deterministic. And not all context-free languages are accepted by deterministic pushdown automata (see the corollary to Theorem 3.7.1).

There is an interesting direct proof of the closure under intersection of regular languages, not relying on closure under complement, but on a direct construction of a finite automaton whose set of states is the Cartesian product of the sets of states of the constituent finite automata (recall Problem 2.3.3).

This construction cannot of course be extended to pushdown automata: the product automaton would have needed two stacks. However, it can be made to work when one of the two automata is finite:

Theorem 3.5.2: The intersection of a context-free language with a regular language is a context-free language.

Proof: If L is a context-free language and R is a regular language, then L = L(M_1) for some pushdown automaton M_1 = (K_1, Σ, Γ_1, Δ_1, s_1, F_1), and R = L(M_2) for some deterministic finite automaton M_2 = (K_2, Σ, δ, s_2, F_2). The idea is to combine these machines into a single pushdown automaton M that carries out computations by M_1 and M_2 in parallel and accepts only if both would have accepted. Specifically, let M = (K, Σ, Γ, Δ, s, F), where

K = K_1 × K_2, the Cartesian product of the state sets of M_1 and M_2;
Γ = Γ_1;
s = (s_1, s_2);
F = F_1 × F_2;

and Δ, the transition relation, is defined as follows. For each transition ((q_1, a, β), (p_1, γ)) ∈ Δ_1 with a ∈ Σ, and for each state q_2 ∈ K_2, we add to Δ the transition (((q_1, q_2), a, β), ((p_1, δ(q_2, a)), γ)); and for each transition of the form ((q_1, e, β), (p_1, γ)) ∈ Δ_1 and each state q_2 ∈ K_2, we add to Δ the transition (((q_1, q_2), e, β), ((p_1, q_2), γ)). That is, M passes from state (q_1, q_2) to state (p_1, p_2) in the same way that M_1 passes from state q_1 to p_1, except that in addition M keeps track of the change in the state of M_2 caused by reading the same input.

It is easy to see that indeed w ∈ L(M) if and only if w ∈ L(M_1) ∩ L(M_2). ∎

Example 3.5.1: Let L consist of all strings of a's and b's with equal numbers of a's and b's but containing no substring abaa or babb. Then L is context-free, since it is the intersection of the language accepted by the pushdown automaton in Example 3.3.3 with the regular language {a, b}* − {a, b}*(abaa ∪ babb){a, b}*. ◊

### A Pumping Theorem

Infinite context-free languages display periodicity of a somewhat subtler form than do regular languages. To understand this aspect of context-freeness we start from a familiar quantitative fact about parse trees. Let G = (V, Σ, R, S) be a context-free grammar. The fanout of G, denoted φ(G), is the largest number of symbols on the right-hand side of any rule in R. A path in a parse tree is a sequence of distinct nodes, each connected to the previous one by a line segment; the first node is the root, and the last node is a leaf. The length of the path is the number of line segments in it. The height of a parse tree is the length of the longest path in it.

Lemma 3.5.1: The yield of any parse tree of G of height h has length at most φ(G)^h.

Proof: The proof is by induction on h. When h = 1, the parse tree is a rule of the grammar (this is Case 2 of the definition of a parse tree), and thus its yield has at most φ(G)^1 = φ(G) symbols.

Suppose then that the result is true for parse trees of height up to h ≥ 1. For the induction step, any parse tree of height h + 1 consists of a root connected to at most φ(G) smaller parse trees of height at most h (this is Case 3 of the definition of a parse tree). By induction, each of these parse "subtrees" has a yield of length at most φ(G)^h. It follows that the total yield of the overall parse tree is indeed at most φ(G)^{h+1}, completing the induction. ∎

To put it another way, the parse tree of any string w ∈ L(G) with |w| > φ(G)^h must have a path of length greater than h. This is crucial in proving the following pumping theorem for context-free languages.

Theorem 3.5.3: Let G = (V, Σ, R, S) be a context-free grammar. Then any string w ∈ L(G) of length greater than φ(G)^|V−Σ| can be rewritten as w = uvxyz in such a way that either v or y is nonempty and uv^n xy^n z is in L(G) for every n ≥ 0.

Proof: Let w be such a string, and let T be the parse tree with root labeled S and with yield w that has the smallest number of leaves among all parse trees with the same root and yield. Since T's yield is longer than φ(G)^|V−Σ|, it follows that T has a path of length at least |V − Σ| + 1, that is, with at least |V − Σ| + 2 nodes. Only one of these nodes can be labeled by a terminal, and thus the remaining ones are labeled by nonterminals. Since there are more such nodes than there are nonterminals, there are two nodes on the path labeled with the same member A of V − Σ. Let us look at this path in more detail (see Figure 3-9).

Figure 3-9

Let us call u, v, x, y, and z the parts of the yield of T as they are shown in the figure. That is, x is the yield of the subtree T'' whose root is the lower node labeled A; v is the part of the yield of the tree T' rooted at the higher node labeled A up to where the yield of T'' starts; u is the yield of T up to where the yield of T' starts; and z is the rest of the yield of T.

It is now clear that the part of T' excluding T'' can be repeated any number of times, including zero times, to produce other parse trees of G, whose yield is any string of the form uv^n xy^n z, n ≥ 0. This completes the proof, except for the requirement that vy ≠ e. But if vy = e, then there is a tree with root S and yield w with fewer leaves than T (namely, the one that results if we omit from T the part of T' that excludes T''), contrary to our assumption that T is the smallest tree of this kind. ∎

Example 3.5.2: Just like the pumping theorem for regular languages (Theorem 2.4.1), this theorem is useful for showing that certain languages are not context-free. For example, L = {a^n b^n c^n : n ≥ 0} is not. For suppose that L = L(G) for some context-free grammar G = (V, Σ, R, S). Let n > φ(G)^|V−Σ|. Then w = a^n b^n c^n is in L(G) and has a representation w = uvxyz such that v or y is nonempty and uv^m xy^m z is in L(G) for each m = 0, 1, 2, .... There are two cases, both leading to a contradiction. If vy contains occurrences of all three symbols a, b, c, then at least one of v, y must contain occurrences of at least two of them. But then uv^2 xy^2 z contains two occurrences out of their correct order: a b before an a, or a c before an a or b. If vy contains occurrences of some but not all of the three symbols, then uv^2 xy^2 z has unequal numbers of a's, b's, and c's. ◊
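For short strings, the case analysis of Example 3.5.2 can be checked exhaustively. The following sketch (names are ours) tries every split w = uvxyz and reports whether any of them pumps within the language for the first few values of n; it is a demonstration on small instances, not a substitute for the proof.

```python
def pumpable(w, in_lang, max_n=3):
    """True if some split w = u v x y z with vy nonempty keeps
    u v^n x y^n z in the language for all n = 0..max_n.  Brute force
    over all O(|w|^4) splits, so only feasible for short w."""
    L = len(w)
    for i in range(L + 1):
        for j in range(i, L + 1):
            for k in range(j, L + 1):
                for m in range(k, L + 1):
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:m], w[m:]
                    if v + y == "":
                        continue          # theorem requires vy nonempty
                    if all(in_lang(u + v * n + x + y * n + z)
                           for n in range(max_n + 1)):
                        return True
    return False

def in_anbncn(s):
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

# No split of aaabbbccc pumps within {a^n b^n c^n}, as the example argues;
# by contrast, strings of the context-free language {a^n b^n} do pump.
assert not pumpable("aaabbbccc", in_anbncn)
assert pumpable("aabb", lambda s: len(s) % 2 == 0
                and s == "a" * (len(s) // 2) + "b" * (len(s) // 2))
```

Every split of a^3 b^3 c^3 fails for some n ≤ 2, which is exactly the two-case argument of the example made concrete.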

Example 3.5.3: L = {a^n : n ≥ 1 is a prime} is not context-free. To see this, take a prime p greater than φ(G)^|V−Σ|, where G = (V, Σ, R, S) is the context-free grammar allegedly generating L. Then w = a^p can be written as prescribed by the theorem, w = uvxyz, where all components of w are strings of a's and vy ≠ e. Suppose that vy = a^q and uxz = a^r, where q and r are natural numbers and q > 0. Then the theorem states that r + nq is a prime for all n ≥ 0. This was found absurd in Example 2.4.3.

It was no accident that, in our proof that {a^n : n ≥ 1 is a prime} is not context-free, we resorted to an argument very similar to that in Example 2.4.3, showing that the same language is not regular. It turns out that any context-free language over a single-letter alphabet is regular; thus, the result of the present example follows immediately from this fact and Example 2.4.3. ◊

Example 3.5.4: We shall next show that the language L = {w ∈ {a, b, c}* : w has an equal number of a's, b's, and c's} is not context-free. This time we need both Theorems 3.5.3 and 3.5.2: if L were context-free, then so would be its intersection with the regular set a*b*c*. But this language, {a^n b^n c^n : n ≥ 0}, was shown to be non-context-free in Example 3.5.2 above. ◊

These negative facts also expose the poverty in closure properties of the class of context-free languages:

Theorem 3.5.4: The context-free languages are not closed under intersection or complementation.

Proof: Clearly {a^n b^n c^m : m, n ≥ 0} and {a^m b^n c^n : m, n ≥ 0} are both context-free. The intersection of these two context-free languages is the language {a^n b^n c^n : n ≥ 0}, just shown not to be context-free. And since L_1 ∩ L_2 = Σ* − ((Σ* − L_1) ∪ (Σ* − L_2)), if the context-free languages were closed under complementation, they would also be closed under intersection (we know they are closed under union, Theorem 3.5.1). ∎


Problems for Section 3.5

3.5.1. Use closure under union to show that the following languages are context-free.

(a) {a^m b^n : m ≠ n}

(b) {a, b}* − {a^n b^n : n ≥ 0}

(c) {a^m b^n c^p d^q : n = q, or m ≤ p, or m + n = p + q}

(d) {a, b}* − L, where L is the language L = {babaabaaab···ba^{n−1}ba^n b : n ≥ 1}

(e) {w ∈ {a, b}* : w = w^R}

3.5.2. Use Theorems 3.5.2 and 3.5.3 to show that the following languages are not context-free.

(a) {a^p : p is a prime}

(b) {a^{n^2} : n ≥ 0}

(c) {www : w ∈ {a, b}*}

(d) {w ∈ {a, b, c}* : w has equal numbers of a's, b's, and c's}

3.5.3. Recall that a homomorphism is a function h from strings to strings such that for any two strings v and w, h(vw) = h(v)h(w). Thus a homomorphism is determined by its values on single symbols: if w = a_1···a_n with each a_i a symbol, then h(w) = h(a_1)···h(a_n). Note that homomorphisms can "erase": h(w) may be e, even though w is not. Show that if L is a context-free language and h is a homomorphism, then

(a) h[L] is context-free;

(b) h^{-1}[L] (that is, {w ∈ Σ* : h(w) ∈ L}) is context-free. (Hint: Start from a pushdown automaton M accepting L. Construct another pushdown automaton, similar to M, except that it reads its input not from the input tape, but from a finite buffer that is occasionally replenished in some way. You supply the rest of the intuition and the formal details.)

3.5.4. In the proof of Theorem 3.5.2, why did we assume that M_2 was deterministic?

3.5.5. Show that the language L = {babaabaaab···ba^{n−1}ba^n b : n ≥ 1} is not context-free

(a) by applying the Pumping Theorem (3.5.3);

(b) by applying the result of Problem 3.5.3. (Hint: What is h[L], where h(a) = aa and h(b) = a?)

3.5.6. If L_1, L_2 ⊆ Σ* are languages, the right quotient of L_1 by L_2 is defined as follows:

L_1/L_2 = {w ∈ Σ* : there is a u ∈ L_2 such that wu ∈ L_1}.

(a) Show that if L_1 is context-free and R is regular, then L_1/R is context-free.

(b) Prove that {a^p b^n : p is a prime number and n > p} is not context-free.

3.5.7. Prove the following stronger version of the Pumping Theorem (Theorem 3.5.3): Let G be a context-free grammar. Then there are numbers K and k such that any string w ∈ L(G) with |w| ≥ K can be rewritten as w = uvxyz with |vxy| ≤ k, in such a way that either v or y is nonempty and uv^n xy^n z ∈ L(G) for every n ≥ 0.

3.5.8. Use Problem 3.5.7 to show that the language {ww : w ∈ {a, b}*} is not context-free.

3.5.9. Let G = (V, Σ, R, S) be a context-free grammar. A nonterminal A of G is called self-embedding if and only if A ⇒*_G uAv for some u, v ∈ V*.

(a) Give an algorithm to test whether a specific nonterminal of a given context-free grammar is self-embedding.

(b) Show that if G has no self-embedding nonterminal, then L(G) is a regular language.

3.5.10. A context-free grammar G = (V, Σ, R, S) is said to be in Greibach normal form if every rule is of the form A → w for some w ∈ Σ(V − Σ)*.

(a) Show that for every context-free grammar G, there is a context-free grammar G' in Greibach normal form such that L(G') = L(G) − {e}.

(b) Show that if M is constructed as in the proof of Lemma 3.4.1 from a grammar in Greibach normal form, then the number of steps in any computation of M on an input w can be bounded as a function of the length of w.

3.5.11. Deterministic finite-state transducers were introduced in Problem 2.1.4. Show that if L is context-free and f is computed by a deterministic finite-state transducer, then

(a) f[L] is context-free;

(b) f^{-1}[L] is context-free.

3.5.12. Develop a version of the Pumping Theorem for context-free languages in which the length of the "pumped part" is as long as possible.

3.5.13. Let M_1 and M_2 be pushdown automata. Show how to construct pushdown automata accepting L(M_1) ∪ L(M_2), L(M_1)L(M_2), and L(M_1)*, thus providing another proof of Theorem 3.5.1.

3.5.14. Which of the following languages are context-free? Explain briefly in each case.

(a) {a^m b^n c^p : m = n or n = p or m = p}

(b) {a^m b^n c^p : m ≠ n or n ≠ p or m ≠ p}
