Sentence patterns and figures of speech are used to refine and enhance the concepts expressed in essays. English AES systems often evaluate essay quality based on the variety and versatility of sentence patterns and on the occurrence of patterns such as inverted sentences and relative clauses. However, Chinese AES systems cannot adopt these evaluation methods because the definition of a Chinese sentence is loose and uncertain.
This issue also explains the lack of studies on the relationship between essay quality and sentence patterns.
Although methods for identifying sentence patterns cannot be applied to Chinese AES, figures of speech are useful for scoring essays. Some studies indicate that writers who use figures of speech in their essays have better writing skills. Ko [26] notes that the use of figures of speech in Chinese essays is an important factor in essay scoring, and studies [10][20] state that students’ writing skills improve when they practice or study the use of figures of speech.
Many studies [22][51][52] have proposed various definitions and classifications of figures of speech in Chinese writing. Although these definitions and classifications differ, they describe the manifestations of the figures of speech “pi-yu” (譬喻, simile and metaphor) and “pai-bi” (排比, parallelism) in similar ways. This consistency indicates that it is feasible to extract “pi-yu” and “pai-bi” from essays.
On the other hand, some figures of speech have both basic and advanced forms. Huang [23] notes that ten syntactic rules for Chinese figures of speech are all covered in elementary school textbooks, yet Chen’s experiments [12] show that literary “pi-yu” is used less often than basic “pi-yu” in sixth-grade students’ Chinese essays. This implies that, among many alternatives, a better writer will choose an advanced form of a familiar figure of speech.
This thesis proposes methods for extracting the figures of speech “pi-yu”, “pai-bi”, and literary “pi-yu”. Subsection 6.1 discusses the representation of “pi-yu” and builds the sets of connectives and literary connectives. Subsection 6.2 presents methods for identifying “pi-yu” and literary “pi-yu”. Subsection 6.3 presents a method for extracting “pai-bi” from essays, and Subsection 6.4 examines the usefulness of these figures of speech for scoring essays.
In addition, although the Chinese comma can serve as either a comma or a period in English, this ambiguity does not affect the performance of our method. In this section, any Chinese character sequence ending with a comma, period, question mark, exclamation mark, or semicolon is treated as a sentence.
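The sentence definition above can be sketched as a simple splitter. This is a minimal sketch assuming plain Unicode text as input; the name `split_sentences` is hypothetical, and the ASCII variants of the marks are included on the assumption that they may appear in digitized essays. Each sentence keeps its closing mark, because the second “pi-yu” pattern in Subsection 6.2 needs to know whether the preceding sentence ends with a comma.

```python
import re

# Sentence-ending marks used in this section: comma, period, question mark,
# exclamation mark and semicolon. Full-width forms are standard in Chinese;
# the ASCII variants are an assumption for converted or mistyped text.
# The ASCII period is deliberately left out so decimal numbers are not split.
_MARKS = "，,。？?！!；;"
# Either a run of non-marks closed by one mark, or a trailing run with no mark.
_SENTENCE = re.compile(f"[^{_MARKS}]*[{_MARKS}]|[^{_MARKS}]+$")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping each sentence's closing mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


print(split_sentences("到操場走走，可以看到有人悠閒的慢跑；到合作社看看，可以看到有人瘋狂的搶購。"))
```

For the four-clause example used later in Subsection 6.3, the splitter returns four sentences, each ending in its own comma, semicolon, or period.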
6.1 Building Sets of Connectives and Literary Connectives
The figure of speech “pi-yu” makes a comparison between two unlike elements that share at least one quality or characteristic. “Pi-yu” has four main subcategories: “ming-yu” (明喻), “an-yu” (暗喻), “jie-yu” (借喻), and “lue-yu” (略喻). “Ming-yu” and “an-yu” each comprise three elements: tenor, connective, and vehicle. For example, in the sentence “during recess, the campus is just like a market” (下課時校園就像菜市場), the words “campus” (校園), “like” (像), and “market” (菜市場) are the tenor, connective, and vehicle, respectively. “Ming-yu” corresponds roughly to the English simile and “an-yu” to the English metaphor; they use different connectives and therefore differ in how closely the tenor is tied to the vehicle. Because “ming-yu” and “an-yu” occur in essays with specific patterns, this thesis discusses only these two subcategories of “pi-yu”.
Connectives are the key cues for detecting the patterns of “pi-yu”. Based on our observations, connectives fall into two part-of-speech classes: classificatory verbs and conjunctions, denoted VG and Caa, respectively, in [15]. For example, “變成” and “好像”, meaning “become” and “like”, respectively, are classificatory verbs, while “跟” (as) and “和” (as) are conjunctions. Because these two classes contain very few words in the Sinica CKIP lexicon, experts can select suitable connectives manually.
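Since the candidate pool consists only of words tagged VG or Caa, gathering it for the experts’ manual review amounts to a lexicon filter. This is a minimal sketch: the lexicon format (a mapping from each word to its set of tags), the function name `connective_candidates`, and the sample entries are hypothetical, since the real Sinica CKIP lexicon has its own format; the final choice of connectives remains a manual step.

```python
# Part-of-speech classes of connectives, as given in the text (tagset of [15]):
# classificatory verbs (VG) and conjunctions (Caa).
CONNECTIVE_TAGS = {"VG", "Caa"}


def connective_candidates(lexicon: dict[str, set[str]]) -> list[str]:
    """Return words carrying a VG or Caa tag, sorted for manual review."""
    return sorted(w for w, tags in lexicon.items() if tags & CONNECTIVE_TAGS)


# Hypothetical lexicon entries; tags are illustrative, not copied from CKIP.
sample = {"變成": {"VG"}, "好像": {"VG", "D"}, "跟": {"Caa", "P"}, "和": {"Caa"}, "校園": {"Ncb"}}
print(connective_candidates(sample))
```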
Some connectives, e.g., “如” (like), almost never appear in low-score essays but occur frequently in high-score essays. We call these literary connectives; they are seldom used in colloquial speech. Based on our observations, literary connectives should be useful for essay scoring.
Formula (6.1) is used to retrieve literary connectives from the training data. First, the essays in the training data are divided into a subset of high-score essays and a subset of low-score essays. A connective w is regarded as literary if it satisfies the following condition:
$$\frac{\mathrm{Hf}(w)}{\mathrm{Hf}(w) + \mathrm{Lf}(w)} \ge \beta \qquad (6.1)$$
where Hf(w) is the number of occurrences of w in the high-score subset, Lf(w) is the number of occurrences of w in the low-score subset, and β is a threshold between 0.5 and 1. A higher β gives the selected connectives more discriminatory power, but yields fewer literary connectives.
Based on our experiments, the best choice of β is 0.6.
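Formula (6.1) can be sketched as follows. This assumes the essays are already word-segmented and already split into the high-score and low-score subsets; the text does not specify the score at which the split is made, so it is left to the caller. The function name `literary_connectives` is hypothetical, and connectives that never occur in the training data are skipped because the ratio is undefined for them.

```python
from collections import Counter
from typing import Iterable


def literary_connectives(
    connectives: set[str],
    high_essays: Iterable[list[str]],
    low_essays: Iterable[list[str]],
    beta: float = 0.6,
) -> set[str]:
    """Return connectives w with Hf(w) / (Hf(w) + Lf(w)) >= beta (Formula 6.1)."""
    # Hf(w) and Lf(w): occurrence counts in the high- and low-score subsets.
    hf = Counter(tok for essay in high_essays for tok in essay if tok in connectives)
    lf = Counter(tok for essay in low_essays for tok in essay if tok in connectives)
    selected = set()
    for w in connectives:
        total = hf[w] + lf[w]
        if total == 0:
            continue  # w never occurs in the training data; ratio undefined
        if hf[w] / total >= beta:
            selected.add(w)
    return selected


# Toy segmented essays (hypothetical data, for illustration only).
high = [["校園", "如", "菜市場"], ["他", "像", "老師"], ["如", "一幅", "畫"]]
low = [["他", "好像", "老師"], ["校園", "像", "菜市場"]]
print(literary_connectives({"如", "像", "好像"}, high, low))
```

With β = 0.6, “如” passes (2 of its 2 occurrences are in high-score essays) while “像” (1 of 2) and “好像” (0 of 1) do not; lowering β to 0.5 would admit “像” as well, which illustrates the trade-off between discriminatory power and the number of selected connectives.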
6.2 Extracting FOS “Pi-yu”
The presence of a connective identifies two patterns of “pi-yu”. The first pattern is “noun+connective+noun” within a single sentence. For instance, the sentence

這時候 學校 變成了 一個 嘈雜的 菜市場
now campus become a noisy market
(Now the campus has become a noisy market.)

contains the sequence “campus+become+market”, which matches the pattern “noun+connective+noun”. Formula (6.2) gives the rule for the first pattern in detail:
> (Na | Nb | Nca | Ncb) > Connective > (Na | Nb | Nca | Ncb) >     (6.2)

where the symbol “>” stands for any number of words (possibly none) and the symbol “|” is the logical operator “OR”. The parts of speech Na, Nb, Nca, and Ncb denote general nouns, proper nouns, proper place nouns, and general place nouns, respectively.
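The rule in Formula (6.2) can be sketched as a check over a segmented, tagged sentence, represented here as a list of (word, tag) pairs. This is a minimal sketch: the function name, the pair representation, and the tags in the example are illustrative assumptions rather than actual CKIP output, and the connective set is assumed to be the manually selected one from Subsection 6.1. Because “>” may match any number of words, the rule reduces to finding a connective with at least one noun somewhere before it and one somewhere after it.

```python
# Noun tags from Formula (6.2): general, proper, proper place, general place nouns.
NOUN_TAGS = {"Na", "Nb", "Nca", "Ncb"}


def matches_rule_6_2(sentence: list[tuple[str, str]], connectives: set[str]) -> bool:
    """True if some connective has a noun before it and a noun after it."""
    for i, (word, _tag) in enumerate(sentence):
        if word not in connectives:
            continue
        noun_before = any(tag in NOUN_TAGS for _w, tag in sentence[:i])
        noun_after = any(tag in NOUN_TAGS for _w, tag in sentence[i + 1:])
        if noun_before and noun_after:
            return True
    return False


# Illustrative tagging of the example sentence (tags are assumptions).
example = [("這時候", "Nd"), ("學校", "Ncb"), ("變成", "VG"), ("了", "Di"),
           ("一", "Neu"), ("個", "Nf"), ("嘈雜", "VH"), ("的", "DE"), ("菜市場", "Ncb")]
print(matches_rule_6_2(example, {"變成", "像", "如"}))
```

A sentence whose only word before the connective is a pronoun (tagged Nh, which is outside the four noun tags) would not match, since the rule requires one of Na, Nb, Nca, or Ncb on both sides.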
The second pattern is either “connective+adjective+noun” or “connective+noun+adjective” within a single sentence. In addition, it must satisfy two conditions: (i) no noun precedes the connective, and (ii) the preceding sentence ends with a comma and contains a noun. For example, consider the two adjacent sentences:
校園 充滿 交談的 聲音,就 如 菜市場 般 熱鬧非凡,
campus fill conversation voice as market boisterous
(The campus is filled with the sound of conversation, just like a boisterous market.)
Here the preceding sentence ends with a comma and includes the noun “campus”, and the succeeding sentence contains the pattern “connective+noun+adjective”, corresponding to the sequence “as+market+boisterous”, with no noun before the connective.
Formula (6.3) gives the rule of this pattern in detail:

> Noun > , > Connective > ((Adjective > Noun) | (Noun > Adjective)) >     (6.3)

where the symbols “>” and “|” are defined as in Formula (6.2), “Noun” stands for the component (Na | Nb | Nca | Ncb) in Formula (6.2), and “Adjective” denotes a word whose part of speech is VH or A in [15].
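Formula (6.3) can be sketched in the same style, over two adjacent tagged sentences. As before, the (word, tag) representation, the function names, and the example tags are illustrative assumptions; here each sentence is assumed to keep its closing punctuation as a final token, so that condition (ii) can check for a comma (both the full-width and ASCII comma are accepted).

```python
NOUN_TAGS = {"Na", "Nb", "Nca", "Ncb"}  # "Noun" in Formula (6.3)
ADJ_TAGS = {"VH", "A"}                  # "Adjective" in Formula (6.3)
COMMAS = {"，", ","}
Tagged = list[tuple[str, str]]


def _adj_noun_in_order(tags: list[str]) -> bool:
    """True if an adjective is later followed by a noun, or a noun by an adjective."""
    for j, tag in enumerate(tags):
        later = tags[j + 1:]
        if tag in ADJ_TAGS and any(t in NOUN_TAGS for t in later):
            return True
        if tag in NOUN_TAGS and any(t in ADJ_TAGS for t in later):
            return True
    return False


def matches_rule_6_3(prev: Tagged, curr: Tagged, connectives: set[str]) -> bool:
    # Condition (ii): the preceding sentence ends with a comma and contains a noun.
    if not prev or prev[-1][0] not in COMMAS:
        return False
    if not any(tag in NOUN_TAGS for _w, tag in prev):
        return False
    for i, (word, _tag) in enumerate(curr):
        if word not in connectives:
            continue
        # Condition (i): no noun may precede the connective; once a noun has
        # appeared, every later connective violates this condition too.
        if any(tag in NOUN_TAGS for _w, tag in curr[:i]):
            return False
        if _adj_noun_in_order([tag for _w, tag in curr[i + 1:]]):
            return True
    return False


# Illustrative tagging of the two adjacent sentences (tags are assumptions).
prev = [("校園", "Ncb"), ("充滿", "VJ"), ("交談", "VA"), ("的", "DE"), ("聲音", "Na"), (",", "COMMACATEGORY")]
curr = [("就", "D"), ("如", "VG"), ("菜市場", "Ncb"), ("般", "Ng"), ("熱鬧非凡", "VH"), (",", "COMMACATEGORY")]
print(matches_rule_6_3(prev, curr, {"如", "像"}))
```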
Both rules for “pi-yu” in our proposed method conform to the theoretical structure of tenor, connective, and vehicle. The pattern of Formula (6.2) is common in English sentences and in short Chinese sentences. Formula (6.3) is a variant of Formula (6.2) in which the tenor and the vehicle appear in different sentences. This variant is needed because writers often describe the tenor and the vehicle at length, which pushes them into separate sentences.
6.3 Extracting FOS “Pai-bi”
The figure of speech “pai-bi” uses two sentences, or two sets of sentences, with similar syntactic structures to express two concepts of the same property and domain. For example, the two sentences “打球 的 打球、散步 的 散步” (Players are playing, walkers are walking.) describe activities on campus, each using three words and the same syntactic structure: a verb following a noun. Our proposed method identifies these two sentences as using the writing skill “pai-bi”.
The following criterion is used to identify whether “pai-bi” appears in an essay: if two sentences within a small segment of text contain the same number of words and the same part-of-speech sequence, “pai-bi” is considered to occur. For example, in the four consecutive sentences “到操場走走,可以看到有人悠閒的慢跑;到合作社看看,可以看到有人瘋狂的搶購。” (Walking to the field, one can see people jogging leisurely; visiting the snack bar, one can see people shopping frantically.), the word segmentation and part-of-speech tagging of the first and third sentences are as follows.
到(P) 操場(Ncb) 走走(VA)
到(P) 合作社(Ncb) 看看(VA)
Both sentences consist of three words with identical parts of speech: a preposition, a general place noun, and a verb, in that order. Our method therefore identifies an occurrence of “pai-bi” in the four consecutive sentences.
The example above also illustrates a subtler aspect of “pai-bi”: the first and third sentences form one instance, while the second and fourth form another, interleaved instance. This advanced usage is not considered in this study because it occurs rarely.
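The “pai-bi” criterion can be sketched as a pairwise comparison of part-of-speech sequences. The text does not quantify a “small segment of content”, so the window parameter `max_gap` (the largest allowed distance between sentence indices) is a hypothetical choice, set to 2 here so that the first and third sentences of the example can pair up. The function name and the tags in the example are also illustrative, and sentences are assumed to be passed without their closing punctuation, as in the tagging shown above.

```python
def find_pai_bi(sentences: list[list[tuple[str, str]]], max_gap: int = 2) -> list[tuple[int, int]]:
    """Return index pairs (i, j), 0 < j - i <= max_gap, with identical POS sequences.

    Identical tag sequences imply the same number of words, so one comparison
    checks both conditions of the criterion.
    """
    tag_seqs = [tuple(tag for _w, tag in s) for s in sentences]
    pairs = []
    for i, seq in enumerate(tag_seqs):
        if not seq:
            continue  # ignore empty sentences
        for j in range(i + 1, min(i + max_gap, len(tag_seqs) - 1) + 1):
            if seq == tag_seqs[j]:
                pairs.append((i, j))
    return pairs


# The four consecutive sentences of the example, with illustrative tags.
sents = [
    [("到", "P"), ("操場", "Ncb"), ("走走", "VA")],
    [("可以", "D"), ("看到", "VE"), ("有", "V_2"), ("人", "Na"), ("悠閒", "VH"), ("的", "DE"), ("慢跑", "VA")],
    [("到", "P"), ("合作社", "Ncb"), ("看看", "VA")],
    [("可以", "D"), ("看到", "VE"), ("有", "V_2"), ("人", "Na"), ("瘋狂", "VH"), ("的", "DE"), ("搶購", "VA")],
]
print(find_pai_bi(sents))
```

Under these illustrative tags the function reports both the (first, third) and the (second, fourth) pairs; it simply lists every qualifying pair and does not distinguish the interleaved usage described above. Any single pair is enough to mark “pai-bi” as present.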
6.4 Usefulness for Scoring Essays
Table 6.1 shows how figures of speech affect essay scores. Each row gives, for each score from 1 to 6, the ratio of essays in a group that received that score. The “All essays” row covers all essays in the corpus, and the FOS “pai-bi”, FOS “pi-yu”, and literary “pi-yu” rows cover the essays that use “pai-bi”, “pi-yu”, and literary “pi-yu”, respectively. The differences among these distributions show that the use of figures of speech does affect essay scores.
The total ratios in the higher-score columns are 0.57, 0.55, and 0.80 for “pai-bi”, “pi-yu”, and literary “pi-yu”, respectively, compared with 0.37 for all essays. Thus, essays that use figures of speech are more likely to obtain higher scores. Furthermore, the comparison between literary “pi-yu” and “pi-yu” shows that this likelihood increases when an advanced figure-of-speech skill is used. In other words, graders tend to give higher scores to essays containing advanced writing skills than to essays containing only common ones.
Table 6.1 The Distributions of the Ratios of Essays to All Essays

Score (1 = lowest, 6 = highest)   1     2     3     4     5     6
All essays                        0.07  0.19  0.31  0.30  0.13  0.01
FOS “pi-yu”                       0.03  0.15  0.28  0.35  0.19  0.01
literary “pi-yu”                  0.00  0.11  0.19  0.46  0.22  0.03
FOS “pai-bi”                      0.03  0.10  0.22  0.47  0.16  0.03
Unlike the selection and connection of concepts, the number and similarity of figures of speech in an essay cannot by themselves predict the essay’s exact score.
However, the experimental results indicate that the occurrence of figures of speech is a useful and important factor for Chinese AES.
Chapter 7 Performance of Conceptualization for Scoring Essays