
4. Resume normal parsing

In the two error productions illustrated above, we have taken care to follow the error symbol with an appropriate synchronizing token – in this case, right parenthesis or semicolon. Thus, the “non-error action” taken in step 3 will always shift. If instead we used the production exp → error, the “non-error action” would be reduce, and (in an SLR or LALR parser) it is possible that the original (erroneous) lookahead symbol would cause another error after the reduce action, without having advanced the input. Therefore, grammar rules that contain error not followed by a token should be used only when there is no good alternative.

Caution. One can attach semantic actions to Yacc grammar rules; whenever a rule is reduced, its semantic action is executed. Chapter 4 explains the use of semantic actions. Popping states from the stack can lead to seemingly “impossible” semantic actions, especially if the actions contain side effects. Consider this grammar fragment:

statements: statements exp SEMICOLON
          | statements error SEMICOLON
          | /* empty */

exp:        increment exp decrement
          | ID

increment:  LPAREN {nest=nest+1;}
decrement:  RPAREN {nest=nest-1;}

3.5. ERROR RECOVERY

“Obviously” it is true that whenever a semicolon is reached, the value of nest is zero, because it is incremented and decremented in a balanced way according to the grammar of expressions. But if a syntax error is found after some left parentheses have been parsed, then states will be popped from the stack without “completing” them, leading to a nonzero value of nest. The best solution to this problem is to have side-effect-free semantic actions that build abstract syntax trees, as described in Chapter 4.
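As a minimal sketch of that solution, assuming hypothetical names (Exp, mkIdExp, mkNestExp, and nestDepth are illustrations, not the Chapter 4 interface), the expression rules can build tree nodes instead of adjusting nest. Because nothing global changes until a subtree is complete, states popped during error recovery leave no stale count behind; the nesting depth is read off the finished tree.

```c
#include <stdlib.h>

/* Hypothetical tree for the fragment  exp : increment exp decrement | ID */
typedef struct Exp Exp;
struct Exp {
  enum { ID_EXP, NEST_EXP } kind;
  union {
    const char *id;   /* ID_EXP: the identifier's name      */
    Exp *inner;       /* NEST_EXP: the expression inside () */
  } u;
};

/* Constructors only allocate and fill in a node; they update no global
   variables, so nothing needs undoing when error recovery pops states
   and abandons their partial subtrees.                                */
Exp *mkIdExp(const char *id) {
  Exp *e = malloc(sizeof *e);
  if (!e) abort();
  e->kind = ID_EXP;
  e->u.id = id;
  return e;
}

Exp *mkNestExp(Exp *inner) {
  Exp *e = malloc(sizeof *e);
  if (!e) abort();
  e->kind = NEST_EXP;
  e->u.inner = inner;
  return e;
}

/* What the global nest counter tried to track, computed from a
   finished tree instead of maintained as a side effect.         */
int nestDepth(const Exp *e) {
  return e->kind == NEST_EXP ? 1 + nestDepth(e->u.inner) : 0;
}
```

In the grammar, the first exp rule would then carry an action such as { $$ = mkNestExp($2); } (assuming a %union member of type Exp *), and increment and decrement need no actions at all. For example, nestDepth(mkNestExp(mkNestExp(mkIdExp("x")))) is 2. Abandoned subtrees are simply leaked in this sketch.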

GLOBAL ERROR REPAIR

What if the best way to recover from the error is to insert or delete tokens from the input stream at a point before where the error was detected? Consider the following Tiger program:

let type a := intArray [ 10 ] of 0 in . . .

A local technique will discover a syntax error with := as the lookahead symbol. Error recovery based on error productions would likely delete the phrase from type to 0, resynchronizing on the in token. Some local repair techniques can insert tokens as well as delete them; but even a local repair that replaces the := by = is not very good, and will encounter another syntax error at the [ token. Really, the programmer’s mistake here is in using type instead of var, but the error is detected two tokens too late.

Global error repair finds the smallest set of insertions and deletions that would turn the source string into a syntactically correct string, even if the insertions and deletions are not at a point where an LL or LR parser would first report an error. In this case, global error repair would do a single-token substitution, replacing type by var.

Burke-Fisher error repair. I will describe a limited but useful form of global error repair, which tries every possible single-token insertion, deletion, or replacement at every point that occurs no earlier than K tokens before the point where the parser reported the error. Thus, with K = 15, if the parsing engine gets stuck at the 100th token of the input, then it will try every possible repair between the 85th and 100th token.

The correction that allows the parser to parse furthest past the original reported error is taken as the best error repair. Thus, if a single-token substitution of var for type at the 98th token allows the parsing engine to proceed past the 104th token without getting stuck, this repair is a successful one.

[Figure: Old Stack, 6-token queue, and Current Stack for the input a := 7 ; b := c + ( d := 5 + 6 , d ) $]

FIGURE 3.38. Burke-Fisher parsing, with an error-repair queue. Figure 3.18 shows the complete parse of this string according to Table 3.19.

Generally, if a repair carries the parser R = 4 tokens beyond where it originally got stuck, this is “good enough.”

The advantage of this technique is that the LL(k) or LR(k) (or LALR, etc.) grammar is not modified at all (no error productions), nor are the parsing tables modified. Only the parsing engine, which interprets the parsing tables, is modified.

The parsing engine must be able to back up K tokens and reparse. To do this, it needs to remember what the parse stack looked like K tokens ago.

Therefore, the algorithm maintains two parse stacks: the current stack and the old stack. A queue of K tokens is kept; as each new token is shifted, it is pushed on the current stack and also put onto the tail of the queue; simultaneously, the head of the queue is removed and shifted onto the old stack.

With each shift onto the old or current stack, the appropriate reduce actions are also performed. Figure 3.38 illustrates the two stacks and queue.
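The bookkeeping in the last two paragraphs can be sketched in C as a circular buffer of K tokens sitting between the two stacks. This is a simplification under stated assumptions: tokens are plain ints, the stacks hold tokens rather than LR states, the reductions that accompany each shift are omitted, and BFState, bf_shift, and the fixed capacities are hypothetical.

```c
#define BF_K 6          /* queue length K, as in the 6-token figure */
#define BF_MAXSTACK 256 /* hypothetical fixed capacity; not checked */

typedef struct {
  int old[BF_MAXSTACK], nold;   /* old stack: lags BF_K tokens behind */
  int cur[BF_MAXSTACK], ncur;   /* current stack                      */
  int queue[BF_K], head, count; /* shifted on cur, not yet on old     */
} BFState;

/* Shift one token: push it on the current stack and append it to the
   tail of the queue.  If the queue is already full, its head is first
   removed and shifted onto the old stack.  A real engine would also
   perform the reductions each shift enables, on both stacks.         */
void bf_shift(BFState *s, int tok) {
  s->cur[s->ncur++] = tok;
  if (s->count == BF_K) {
    s->old[s->nold++] = s->queue[s->head];
    s->head = (s->head + 1) % BF_K;
    s->count--;
  }
  s->queue[(s->head + s->count) % BF_K] = tok;
  s->count++;
}
```

Starting from a zero-initialized BFState and shifting tokens 1 through 8, the current stack holds all eight, the old stack holds 1 and 2, and the queue holds 3 through 8: exactly the stretch of input a repair is allowed to revisit.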

Now suppose a syntax error is detected at the current token. For each possible insertion, deletion, or substitution of a token at any position of the queue, the Burke-Fisher error repairer makes that change within (a copy of) the queue, then attempts to reparse from the old stack. The success of a modification is measured by how many tokens past the current token can be parsed; generally, if three or four new tokens can be parsed, this is considered a completely successful repair.
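The repair search can be sketched in C against a toy grammar. Everything here is an illustrative assumption: the token kinds, the recursive-descent recognizer parse_len (standing in for the LR engine and its tables), and the names Repair and best_repair. Two simplifications are deliberate: each candidate is reparsed from the start of a copied token array instead of from the old stack, and progress is measured in positions of the original input so that deletions, insertions, and substitutions compare fairly. The window covers the K tokens before the error plus the error token itself; the repair that parses furthest wins (ties go to the first one tried), and the search stops early once a repair gets R tokens past the error or parses the whole input.

```c
#include <string.h>

/* Toy token kinds (hypothetical); EOFT marks the end of the input. */
enum { ID, LP, RP, SEMI, EOFT };
#define MAXTOK 64

static const int *tk;           /* token array being parsed           */
static int ntk, pos, stuck;     /* its length, the cursor, error flag */

static void expect(int t) {
  if (!stuck && pos < ntk && tk[pos] == t) pos++;
  else stuck = 1;
}

static void parse_exp(void) {   /* exp -> ID | LP exp RP */
  if (stuck) return;
  if (pos < ntk && tk[pos] == ID) { pos++; return; }
  expect(LP); parse_exp(); expect(RP);
}

/* Stand-in for the parsing engine: prog -> { exp SEMI } EOFT.
   Returns how many tokens are accepted before the parser gets stuck. */
static int parse_len(const int *t, int n) {
  tk = t; ntk = n; pos = 0; stuck = 0;
  while (!stuck && pos < n && tk[pos] != EOFT) { parse_exp(); expect(SEMI); }
  expect(EOFT);
  return pos;
}

typedef struct {
  char op;    /* 'D'elete, 'I'nsert, 'S'ubstitute; 0 = none found  */
  int at;     /* position in the original token array              */
  int tok;    /* token inserted or substituted (-1 for a deletion) */
  int reach;  /* original tokens passed before getting stuck       */
} Repair;

/* Parse a modified copy; express its progress in ORIGINAL positions. */
static int score(const int *mod, int m, char op, int at) {
  int p = parse_len(mod, m);
  if (op == 'D' && p >= at) return p + 1;  /* deleted token was passed     */
  if (op == 'I' && p > at) return p - 1;   /* inserted token isn't original */
  return p;
}

/* Keep the candidate if it gets further; report "good enough". */
static int consider(Repair *b, char op, int at, int tok, int reach, int goal) {
  if (reach > b->reach) { b->op = op; b->at = at; b->tok = tok; b->reach = reach; }
  return b->reach >= goal;
}

Repair best_repair(const int *in, int n, int K, int R) {
  int buf[MAXTOK + 1], sz = (int)sizeof *in;
  int err = parse_len(in, n);
  Repair best = { 0, err, -1, err };
  int goal = err + R < n ? err + R : n;
  if (err == n || n > MAXTOK) return best;       /* nothing to repair */
  for (int i = err > K ? err - K : 0; i <= err; i++) {
    if (in[i] != EOFT) {                         /* delete in[i] */
      memcpy(buf, in, i * sz);
      memcpy(buf + i, in + i + 1, (n - i - 1) * sz);
      if (consider(&best, 'D', i, -1, score(buf, n - 1, 'D', i), goal)) return best;
    }
    for (int t = ID; t < EOFT; t++) {            /* end marker never inserted */
      memcpy(buf, in, i * sz);                   /* insert t before in[i] */
      buf[i] = t;
      memcpy(buf + i + 1, in + i, (n - i) * sz);
      if (consider(&best, 'I', i, t, score(buf, n + 1, 'I', i), goal)) return best;
      if (t == in[i] || in[i] == EOFT) continue;
      memcpy(buf, in, n * sz);                   /* substitute t for in[i] */
      buf[i] = t;
      if (consider(&best, 'S', i, t, score(buf, n, 'S', i), goal)) return best;
    }
  }
  return best;
}
```

For example, on ID ; ID ( ID ) ; followed by EOFT, the toy parser gets stuck at the ( (position 3). With K = 3 the search settles on deleting the ID at position 2, a repair before the point where the error was reported; with K = 0 only the error position is examined, and it inserts a SEMI there instead.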

In a language with N kinds of tokens, there are K + K · N + K · N possible deletions, insertions, and substitutions within the K-token window. Trying this many repairs is not very costly, especially considering that it happens only when a syntax error is discovered, not during ordinary parsing.
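Collecting terms, the count grows linearly in both the window length and the number of token kinds. For scale, with the K = 15 used earlier and a hypothetical vocabulary of N = 50 token kinds (an illustrative figure, not a count taken from Tiger):

```latex
\underbrace{K}_{\text{deletions}} + \underbrace{K \cdot N}_{\text{insertions}} + \underbrace{K \cdot N}_{\text{substitutions}}
  = K(2N+1), \qquad K = 15,\ N = 50:\quad 15 \cdot 101 = 1515
```

That is roughly fifteen hundred trial reparses, each covering only the queued tokens plus the few beyond the error needed to judge success.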

Semantic actions. Shift and reduce actions are tried repeatedly and discarded during the search for the best error repair. Parser generators usually perform programmer-specified semantic actions along with each reduce action, but the programmer does not expect that these actions will be performed repeatedly and discarded; they may have side effects. Therefore, a Burke-Fisher parser does not execute any of the semantic actions as reductions are performed on the current stack, but waits until the same reductions are performed (permanently) on the old stack.

This means that the lexical analyzer may be up to K + R tokens ahead of the point to which semantic actions have been performed. If semantic actions affect lexical analysis, as they do in C when compiling the typedef feature, this can be a problem with the Burke-Fisher approach. For languages with a pure context-free grammar approach to syntax, the delay of semantic actions poses no problem.
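The C difficulty is the classic “lexer hack”: the lexer asks a table of typedef names, filled in by semantic actions, whether an identifier is an ordinary name or a type name. A minimal sketch, with hypothetical names (add_typedef_name, classify, IDENTIFIER, TYPE_NAME) and a fixed-size table:

```c
#include <string.h>

enum { IDENTIFIER, TYPE_NAME };

#define MAX_TYPEDEFS 128
static const char *typedef_names[MAX_TYPEDEFS];
static int ntypedefs;

/* Called from the semantic action for a declaration such as
   "typedef int T;".  In a Burke-Fisher parser this runs only when
   the reduction is performed on the old stack, possibly K + R
   tokens after the lexer has moved past the declaration.          */
void add_typedef_name(const char *name) {
  if (ntypedefs < MAX_TYPEDEFS) typedef_names[ntypedefs++] = name;
}

/* Called by the lexer for every identifier it scans. */
int classify(const char *name) {
  for (int i = 0; i < ntypedefs; i++)
    if (strcmp(typedef_names[i], name) == 0) return TYPE_NAME;
  return IDENTIFIER;
}
```

In typedef int T; T *p; the lexer must return TYPE_NAME for the second T. If the reduction whose action calls add_typedef_name("T") is still pending when the lexer reaches that T, classify returns IDENTIFIER, and T *p looks like a multiplication.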

Semantic values for insertions. In repairing an error by insertion, the parser needs to provide a semantic value for each token it inserts, so that semantic actions can be performed as if the token had come from the lexical analyzer.

For punctuation tokens no value is necessary, but when tokens such as numbers or identifiers must be inserted, where can the value come from? The ML-Yacc parser generator, which uses Burke-Fisher error correction, has a %value directive, allowing the programmer to specify what value should be used when inserting each kind of token:

%value ID ("bogus")
%value INT (1)
%value STRING ("")
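A parser written in C could get the same effect from a table of default values indexed by token kind, consulted whenever the repairer inserts a token. This is a sketch under assumptions: the token names, the SemValue union, and default_value are hypothetical, and the values simply mirror the three %value lines above.

```c
#include <stddef.h>

/* Hypothetical token kinds for illustration. */
enum { TOK_ID, TOK_INT, TOK_STRING, TOK_SEMICOLON, NUM_TOKENS };

typedef union {
  const char *sval;   /* ID and STRING tokens */
  int ival;           /* INT tokens           */
} SemValue;

/* Value supplied when the repairer inserts a token of kind tok:
   the analogue of %value ID ("bogus"), %value INT (1), and
   %value STRING ("").  Punctuation gets no meaningful value.  */
SemValue default_value(int tok) {
  SemValue v;
  switch (tok) {
  case TOK_ID:     v.sval = "bogus"; break;
  case TOK_INT:    v.ival = 1;       break;
  case TOK_STRING: v.sval = "";      break;
  default:         v.sval = NULL;    break;
  }
  return v;
}
```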

Programmer-specified substitutions. Some common kinds of errors cannot be repaired by the insertion or deletion of a single token, and sometimes a particular single-token insertion or substitution is very commonly required and should be tried first. Therefore, in an ML-Yacc grammar specification the programmer can use the %change directive to suggest error corrections to be tried first, before the default “delete or insert each possible token” repairs.

%change EQ -> ASSIGN | ASSIGN -> EQ
      | SEMICOLON ELSE -> ELSE | -> IN INT END

Here the programmer is suggesting that users often write “; else” where they mean “else” and so on.
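A C engine might store such suggestions as pairs of token sequences and try each one at the candidate positions before the generic single-token edits. The sketch below, with hypothetical token names, a hypothetical Change struct, and a hypothetical apply_change helper, shows only the rewriting step; each rewritten array would then be reparsed and scored like any other repair.

```c
#include <string.h>

/* Hypothetical token kinds for illustration. */
enum { IF, THEN, ELSE, IN, END, INT, ID, EQ, ASSIGN, SEMICOLON };

typedef struct {
  int from[3], nfrom;   /* tokens to replace (may be empty) */
  int to[3], nto;       /* tokens to put in their place     */
} Change;

/* The four suggestions from the %change directive above. */
static const Change changes[] = {
  { {EQ}, 1, {ASSIGN}, 1 },
  { {ASSIGN}, 1, {EQ}, 1 },
  { {SEMICOLON, ELSE}, 2, {ELSE}, 1 },
  { {0}, 0, {IN, INT, END}, 3 },        /* scope closer: "in 0 end" */
};

/* If in[at..] begins with c->from, write the rewritten array to out
   and return its length; otherwise return -1.  out must have room
   for n - c->nfrom + c->nto tokens.                                */
int apply_change(const Change *c, const int *in, int n, int at, int *out) {
  if (at + c->nfrom > n) return -1;
  for (int k = 0; k < c->nfrom; k++)
    if (in[at + k] != c->from[k]) return -1;
  memcpy(out, in, at * sizeof *in);
  memcpy(out + at, c->to, c->nto * sizeof *in);
  memcpy(out + at + c->nto, in + at + c->nfrom,
         (n - at - c->nfrom) * sizeof *in);
  return n - c->nfrom + c->nto;
}
```

For instance, applying the third change at position 4 of IF ID THEN ID SEMICOLON ELSE ID yields IF ID THEN ID ELSE ID, and applying the fourth at the end of the array appends IN INT END.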

The insertion of in 0 end is a particularly important kind of correction, known as a scope closer. Programs commonly have extra left parentheses or right parentheses, or extra left or right brackets, and so on. In Tiger, another kind of nesting construct is let ··· in ··· end. If the programmer forgets to close a scope that was opened by a left parenthesis, then the automatic single-token insertion heuristic can close this scope where necessary. But closing a let scope requires the insertion of three tokens, which will not be done automatically unless the compiler-writer has suggested “change nothing to in 0 end” as illustrated in the %change command above.

PROGRAM: PARSING

Use Yacc to implement a parser for the Tiger language. Appendix A describes, among other things, the syntax of Tiger.

You should turn in the file tiger.grm and a README. Supporting files available in $TIGER/chap3 include:

makefile The “makefile.”

errormsg.[ch] The Error Message structure, useful for producing error messages with file names and line numbers.

lex.yy.c The lexical analyzer. I haven’t provided the source file tiger.lex, but I’ve provided the output of Lex that you can use if your lexer isn’t working.

parsetest.c A driver to run your parser on an input file.

tiger.grm The skeleton of a file you must fill in.

You won’t need tokens.h anymore; instead, the header file for tokens is y.tab.h, which is produced automatically by Yacc from the token specification of your grammar.

Your grammar should have as few shift-reduce conflicts as possible, and no reduce-reduce conflicts. Furthermore, your accompanying documentation should list each shift-reduce conflict (if any) and explain why it is not harmful.

My grammar has a shift-reduce conflict that’s related to the confusion between

variable [ expression ]

type-id [ expression ] of expression

In fact, I had to add a seemingly redundant grammar rule to handle this con-fusion. Is there a way to do this without a shift-reduce conflict?


From the document Modern Compiler Implementation in C (pages 89-99)