Analyzing Chinese Grammatical Patterns in GW 2.0

3. STUDY I—CORPORA ANALYSIS…

3.1 Methods and Findings of Analyzing Native Speaker Corpora

National Chengchi University (國立政治大學)

have some difference in synonym analysis from the Chinese 發生 fāshēng ‘happen’ concept. In English, apart from the most closely related verb OCCUR, HAPPEN is closer to the EXIST concept (something or someone being in a certain place). However, in Chinese, 發生 fāshēng ‘happen’ is more relevant to the 出現 chūxiàn ‘appear’ concept (something or someone coming to be in a certain place).

Nonetheless, this does not mean that existence/appearance verbs sharing the same concept in the two languages must be strictly distinguished. Rather, it indicates a tendency in verb meaning that is useful when investigating the correlations among the verbs within a verb type (in this thesis, the unaccusative existence/appearance verbs).

3.1.3 Analyzing Chinese Grammatical Patterns in GW 2.0

The second analysis with respect to the synonyms of 發生 fāshēng ‘happen’ in Chinese focuses on the frequency and percentage of four grammatical patterns (V+-zhe, V+-le, N+V, and V+N).

Figure 3.3 displays examples of 發生 fāshēng ‘happen’ from the Chinese native speaker corpus and shows how the Chinese grammatical patterns were extracted from GW 2.0.

Figure 3.3 The Chinese Grammatical Patterns of 發生 fāshēng ‘happen’ in GW 2.0

Figure 3.3 shows how we searched for the frequency of each Chinese grammatical pattern for 發生 fāshēng ‘happen’ and its synonyms, taking the V+N pattern for 發生 fāshēng ‘happen’ as an example. To search for this pattern, "發生"[tag="N."] is typed in, so that the result displays all of the examples in which the verb 發生 fāshēng ‘happen’ is collocated with a postverbal noun. The other Chinese grammatical patterns were searched in the same way. The four Chinese grammatical patterns comprise the two aspectual auxiliaries of unaccusative existence/appearance verbs (V + the imperfective -zhe versus V + the perfective -le) proposed by Liu (2007) and Laws and Yuan (2010), as well as the verb-noun grammatical patterns (N+V versus V+N) discussed by Fu (2007), Wang (2008), and Shei (2005). With the concordance tool and the corpus query language (CQL), we can precisely identify the different distributions of the four Chinese grammatical patterns. These patterns also serve as a reference for the stimuli of the psycholinguistic experiments, which will be discussed in detail in Chapter Four.
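What such a pattern search matches can be sketched in Python on a toy example. This is a minimal sketch and not the GW 2.0 interface: the token list, the simplified tags ("V", "N", "ASP"), and the function names are hypothetical and only illustrate what counts as a V+N, N+V, or V+-le hit.

```python
from typing import List, Tuple

# Hypothetical toy data: (word, simplified POS tag) pairs. The tags are
# illustrative assumptions, not the tag set actually used in GW 2.0.
Token = Tuple[str, str]

TOKENS: List[Token] = [
    ("發生", "V"), ("意外", "N"), ("。", "PUNCT"),
    ("意外", "N"), ("發生", "V"), ("了", "ASP"), ("。", "PUNCT"),
]


def count_v_n(tokens: List[Token], verb: str) -> int:
    """Count the verb immediately followed by a noun-tagged token (V+N)."""
    return sum(
        1
        for (word, _), (_, next_tag) in zip(tokens, tokens[1:])
        if word == verb and next_tag.startswith("N")
    )


def count_n_v(tokens: List[Token], verb: str) -> int:
    """Count the verb immediately preceded by a noun-tagged token (N+V)."""
    return sum(
        1
        for (_, prev_tag), (word, _) in zip(tokens, tokens[1:])
        if word == verb and prev_tag.startswith("N")
    )


def count_v_marker(tokens: List[Token], verb: str, marker: str) -> int:
    """Count the verb immediately followed by an aspect marker such as 了 or 著."""
    return sum(
        1
        for (word, _), (next_word, _) in zip(tokens, tokens[1:])
        if word == verb and next_word == marker
    )


print("V+N :", count_v_n(TOKENS, "發生"))
print("N+V :", count_n_v(TOKENS, "發生"))
print("V+le:", count_v_marker(TOKENS, "發生", "了"))
```

On real corpus data the counts would come from the concordance results themselves; the sketch only makes the adjacency conditions explicit.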

The findings in terms of the frequency of the Chinese grammatical patterns are shown in Table 3.1.

TABLE 3.1 Frequency (and Percentages) of the Chinese Grammatical Patterns in GW 2.0

Chinese Unaccusative Verb Chinese Grammatical Pattern

Total V +-zhe V+-le N+V V+N

Table 3.1 shows that the frequencies and percentages of the four Chinese grammatical patterns differ across the three verbs. As for the pair V+-zhe and V+-le, for 發生 fāshēng ‘happen’ and 出現 chūxiàn ‘appear’, the percentages of V+-le (approximately 4% for 發生 fāshēng ‘happen’ in column one and about 10% for 出現 chūxiàn ‘appear’ in column two) are much higher than those of V+-zhe (0.093% for 發生 fāshēng ‘happen’ in column one and 0.001% for 出現 chūxiàn ‘appear’ in column two). For 存在 cúnzài ‘exist’, by contrast, the percentage of V+-zhe (9.117% in column three) is much higher than that of V+-le (0.480% in column three). This indicates that, in terms of the grammatical patterns related to perfectivity in Chinese, the three unaccusative existence/appearance verbs may be distinct: 發生 fāshēng ‘happen’ and 出現 chūxiàn ‘appear’ tend to combine with the perfective auxiliary -le, while 存在 cúnzài ‘exist’ tends to co-occur with the imperfective auxiliary -zhe.

On the other hand, for the two verb-noun grammatical patterns (V+N and N+V), the three unaccusative existence/appearance verbs share a similar pattern. All three verbs tend to be used in the N+V pattern (56.562% for 發生 fāshēng ‘happen’ in column one; 42.470% for 出現 chūxiàn ‘appear’ in column two; 38.335% for 存在 cúnzài ‘exist’ in column three), which is more frequent than the reverse V+N pattern (more than 43% for 發生 fāshēng ‘happen’ in column one; approximately 33% for 出現 chūxiàn ‘appear’ in column two; more than 21% for 存在 cúnzài ‘exist’ in column three). This means that Chinese native speakers use the N+V pattern with the three verbs more often than the V+N pattern, even though the V+N pattern is not much less frequent.

To summarize the Chinese grammatical patterns from the GW 2.0 corpus in section 3.1.3: for the V+-zhe and V+-le patterns, 發生 fāshēng ‘happen’ and 出現 chūxiàn ‘appear’ tend to combine with the perfective auxiliary -le, whereas 存在 cúnzài ‘exist’ usually collocates with the imperfective auxiliary -zhe. This implies that 發生 fāshēng ‘happen’ and 出現 chūxiàn ‘appear’ are frequently used in perfective clauses, while 存在 cúnzài ‘exist’ is likely to be used in imperfective clauses. As for the word orders V+N and N+V, all three verbs are frequent in both patterns, indicating that both types, such as 發生意外 fāshēngyìwài ‘The accident happened’ and 意外發生 yìwàifāshēng ‘The accident happened’, are used frequently by Chinese native speakers.

3.1.4 Grammatical Form Analysis in the BNC Corpus

The next step is to examine the grammatical form distributions of the four unaccusative existence/appearance verbs in the English native speaker corpus, the BNC. An example for HAPPEN is provided in Figure 3.4.

Figure 3.4 The Grammatical Forms of HAPPEN in BNC

Figure 3.4 shows an example of how the distributions of the grammatical forms of the verb HAPPEN are analyzed. We chose the four most frequent grammatical forms (V-ed, V-base, V-s, and V-ing) because they may be more representative and more frequently used by English native speakers. Moreover, other grammatical forms, such as HAPPENS, may not be used as a verb and would probably appear at the head of a sentence. The other three synonyms are analyzed in the same manner to find out how English native speakers use these verbs and to compare later the similarities and differences in their grammatical forms.

The findings of the frequencies of the grammatical forms in terms of the four verbs are displayed in Figure 3.5. The two arrows of each verb refer to the two most frequent grammatical forms in the BNC corpus.

Figure 3.5 Verb-forms of HAPPEN, OCCUR, APPEAR, and EXIST in BNC

Figure 3.5 shows that, for HAPPEN, OCCUR, and APPEAR, the V-ed form and the V-base form are the two most frequent grammatical forms, while EXIST tends to be used in the V-base form (47.76%) and the third-person V-s form (28.16% for exists). This indicates that native speakers of English tend to use the V-ed and V-base forms for HAPPEN, OCCUR, and APPEAR, whereas they are inclined to use the V-base and V-s forms for EXIST. However, even though HAPPEN, OCCUR, and APPEAR all have higher percentages of the V-ed and V-base forms, the most frequent form differs among the three.

OCCUR and APPEAR have the V-base form as the most frequent one (35.78% for occur and 35.83% for appear), though the base form of the two verbs is only slightly more frequent than the V-ed form (34.63% for occurred and 34.00% for appeared). HAPPEN, on the other hand, shows a large discrepancy between the most frequent grammatical form (41.96% for happened) and the second most frequent one (27.11% for happen), which suggests that the salient percentage of the V-ed form may distinguish HAPPEN from its three synonyms (OCCUR, APPEAR, and EXIST) in terms of grammatical form. From the grammatical form distributions, we found that, for English native speakers, HAPPEN is frequently used in the V-ed and V-base forms, OCCUR and APPEAR are frequently used in the V-base and V-ed forms with similar frequencies, and EXIST is frequently used in the V-base and V-s forms. This implies that English native speakers have diverse verb form preferences for unaccusative verbs.
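The comparison of the two most frequent forms can be retraced with a few lines of Python. This is only an illustrative sketch: the percentages are the ones quoted in the text above, while the dictionary layout and the function names are assumptions made for the example. Only the two forms reported per verb are included, since the remaining percentages are not given in the text.

```python
# Percentages quoted in the text for the two most frequent forms of each
# verb in the BNC; the other forms are not reported there and are omitted.
TOP_TWO_FORMS = {
    "HAPPEN": {"V-ed": 41.96, "V-base": 27.11},
    "OCCUR": {"V-base": 35.78, "V-ed": 34.63},
    "APPEAR": {"V-base": 35.83, "V-ed": 34.00},
    "EXIST": {"V-base": 47.76, "V-s": 28.16},
}


def rank_forms(forms):
    """Return (form, percentage) pairs sorted from most to least frequent."""
    return sorted(forms.items(), key=lambda item: item[1], reverse=True)


def gap_between_top_two(forms):
    """Percentage-point difference between the first and second form."""
    ranked = rank_forms(forms)
    return round(ranked[0][1] - ranked[1][1], 2)


for verb, forms in TOP_TWO_FORMS.items():
    (first, _), (second, _) = rank_forms(forms)
    print(f"{verb}: {first} > {second}, gap = {gap_between_top_two(forms)} points")
```

The gaps for OCCUR and APPEAR come out below two percentage points, whereas the gap for HAPPEN is close to fifteen, which is the contrast the paragraph above describes.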

3.2 Methods of Analyzing Learner Corpora

The analysis of the learner corpora has two main focuses. First, the grammatical form distributions and the erroneous rates of HAPPEN, OCCUR, APPEAR, and EXIST in the English native speaker corpus (BNC) and the three learner corpora will be investigated. Second, the error types of the four verbs in the three learner corpora will be analyzed and categorized. Some important information regarding the three learner corpora and the tool for extracting learner data is provided first in the following sections.

3.2.1 Three Learner Corpora

With respect to the learner corpora, we utilized three L2 English learner corpora: the Language Training and Testing Learner Corpus (the LTTC), the International Corpus of Learner English 2.0 (the ICLE, cf. Granger, Dagneaux, Meunier, & Paquot, 2009), and the National Chengchi University Foreign Language Learner Corpus (the NCCU, cf. Chung, Wang, & Tseng, 2010). All of the extracted data were produced by L1 Chinese learners, and the design and organization of each corpus offer advantages for different purposes. The LTTC corpus was selected in 2008 and consists of intermediate L2 English written texts, with 1,990 samples containing 262,178 words (to date), collected from the General English Proficiency Test (GEPT), a formal standardized English test in Taiwan. The L2 English data therefore also carry score metadata, so that errors can be diagnosed according to the given scores. Part of the learner data were extracted from L2 learners’ writing tests in the LTTC. As for the annotation of this learner corpus, part-of-speech (POS) tagging was conducted after the analysis for this thesis had been carried out. The ways of extracting L2 learners’ writing data are illustrated in section 3.2.2. In addition, the LTTC corpus has well-organized, randomly selected samples, although the data were designed for exam purposes and include no classroom exercises (cf. Cheung et al., 2010; Chung & Wu, 2009). As for the features of the LTTC, the L2 learners ranged in age from 12 to 56 years old, which provides more representative and objective sampling of the subjects.

The second L2 English learner corpus (the ICLE) contains 3,753,030 words from learners with a variety of L1 backgrounds, such as Bulgarian, Czech, Finnish, Japanese, and Chinese. The L2 learner data were mainly collected from argumentative academic writing, and each subcorpus contains approximately 200,000 word tokens. The present thesis adopts the Mandarin Chinese subcorpus, which comprises 982 samples with 490,617 words. The ICLE subcorpus (490,617 words) is therefore larger than the LTTC (262,178 words), and most of its L1 Chinese learners are from Mainland China. All of the L2 English learners in the ICLE were required to be undergraduate students with advanced L2 English proficiency. There are two versions of the ICLE; in this thesis, we utilized Version 2.0 because of its larger sample size.

The third L2 English learner corpus in the present study is the NCCU Learner Corpus, a newly established learner corpus in Taiwan covering six languages: English, Japanese, Korean, French, Russian, and Arabic. The learner data were mainly collected from the written assignments of undergraduate students at NCCU. In this thesis, we utilize the English subcorpus, comprising 814 samples with 204,945 words (retrieved in January 2010). Most of the subjects in the English subcorpus of the NCCU were English majors with advanced L2 English proficiency. As for the features of the NCCU learner corpus, the L2 English data were selected from a variety of English learning materials, such as classroom exams, take-home assignments, and blog writing. Therefore, compared with the previous two learner corpora (the LTTC and the ICLE), the NCCU covers different types of learning contexts for L2 English written data.
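Because the three learner corpora differ in size, it can help to put the reported sample and word counts side by side. The sketch below uses only the figures given above; the mean number of words per sample is a derived value computed here for illustration and is not reported in the thesis, and the variable and function names are assumptions.

```python
# Sample and word counts as reported in section 3.2.1.
CORPORA = {
    "LTTC": {"samples": 1990, "words": 262178},
    "ICLE (Chinese subcorpus)": {"samples": 982, "words": 490617},
    "NCCU (English subcorpus)": {"samples": 814, "words": 204945},
}


def mean_words_per_sample(samples: int, words: int) -> float:
    """Average text length; a derived figure, not one quoted in the thesis."""
    return words / samples


TOTAL_WORDS = sum(entry["words"] for entry in CORPORA.values())

for name, entry in CORPORA.items():
    mean = mean_words_per_sample(entry["samples"], entry["words"])
    share = 100 * entry["words"] / TOTAL_WORDS
    print(f"{name}: {mean:.1f} words per sample, {share:.1f}% of all learner words")
```

Differences of this kind are one reason to compare the corpora through percentages rather than raw frequencies.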

3.2.2 The Tool of Extracting Learner Data

To keep the procedure of extracting L2 learner data consistent across the three learner corpora, all of the learner data were extracted with AntConc 3.2.1w, developed by Laurence Anthony (2005). This simple corpus extraction tool helps us select the linguistic data of HAPPEN, OCCUR, APPEAR, and EXIST from the three learner corpora. The main search function we use in AntConc is the grammatical form search for the four verbs, so that the frequencies can be compared with the grammatical form distribution of each verb in the BNC. One example of extracting data for HAPPEN from the LTTC via AntConc is displayed in Figure 3.6.

Figure 3.6 The Grammatical Forms of HAPPEN in Learner Corpora

As can be seen in Figure 3.6, all of the instances of HAPPEN in the LTTC were extracted with AntConc 3.2.1w, and all of the possible grammatical forms (happen, happens, happening, and happened) were taken into account. The other three unaccusative existence/appearance verbs (OCCUR, APPEAR, and EXIST) followed the same extraction procedure in all three learner corpora. All of the learner data were then saved as output for further analysis, in which we manually counted the grammatical form distributions as well as the erroneous rates (section 3.2.3) of HAPPEN, OCCUR, APPEAR, and EXIST in the LTTC, the ICLE, and the NCCU learner corpora. The error types of the four verbs were then categorized (section 3.2.4).
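The kind of form search described above can be approximated in Python for readers without AntConc. This is a rough sketch under stated assumptions, not a reproduction of AntConc's behavior: the function names and the concordance layout are hypothetical, and the sample text simply strings together the four learner sentences quoted in Table 3.2.

```python
import re
from collections import Counter

# The four grammatical forms of HAPPEN listed in the text.
HAPPEN_FORMS = ["happen", "happens", "happening", "happened"]

# Learner sentences quoted in Table 3.2, joined here only as demo input.
SAMPLE = (
    "But you may say what is the reason cause this happen? "
    "To avoid this thing happen, we should always keep clearly in a good range. "
    "First problem is always happened. "
    "This situation I have never happened before!"
)


def count_forms(text, forms):
    """Count whole-word occurrences of each form, ignoring case."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token in forms)
    return {form: counts[form] for form in forms}


def concordance(text, forms, width=25):
    """Return simple keyword-in-context lines for every matched form."""
    pattern = re.compile(r"\b(" + "|".join(forms) + r")\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


print(count_forms(SAMPLE, HAPPEN_FORMS))
for line in concordance(SAMPLE, HAPPEN_FORMS):
    print(line)
```

The same two functions could be called with the form lists of OCCUR, APPEAR, and EXIST; deciding whether each hit is erroneous remains a manual step, as in the thesis.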

3.2.3 Grammatical Form and Erroneous Rate Analysis

To compare the similarities and differences of HAPPEN, OCCUR, APPEAR, and EXIST between the native speaker corpus (BNC) and the three learner corpora, the grammatical forms and the erroneous rates of the verbs are analyzed. The four verbs in the three learner corpora were examined through the frequencies of the four grammatical forms (V-base, V-s, V-ing, and V-ed), which were also compared with those in the BNC.

In addition to the grammatical form analysis, the erroneous rates of each grammatical form (V-base, V-s, V-ing, and V-ed) for HAPPEN, OCCUR, APPEAR, and EXIST in the three learner corpora are taken into account so that L2 learners’ difficulty in learning unaccusative existence/appearance verbs can be made clearer.

The grammatical form distributions and erroneous rates of each English verb were calculated through rigorous manual data collection. We first identified the erroneous instances of each grammatical form and then calculated the percentage of these erroneous instances, in order to compare the similarities and differences of HAPPEN, OCCUR, APPEAR, and EXIST across the native speaker corpus and the three learner corpora.
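The two calculations can be written out as a short Python sketch. The counts below are hypothetical toy numbers invented for illustration, not data from any of the corpora, and the sketch assumes that the erroneous rate of a form is the number of erroneous instances divided by all instances of that form; the text does not state the denominator explicitly.

```python
# HYPOTHETICAL toy counts for one verb in one corpus:
# form -> (all instances of the form, erroneous instances of the form).
TOY_COUNTS = {
    "V-base": (40, 10),
    "V-s": (20, 2),
    "V-ing": (10, 1),
    "V-ed": (30, 6),
}


def form_distribution(counts):
    """Percentage of each grammatical form among all instances of the verb."""
    total = sum(instances for instances, _ in counts.values())
    return {form: 100 * instances / total for form, (instances, _) in counts.items()}


def erroneous_rates(counts):
    """Percentage of erroneous instances within each form (assumed definition)."""
    return {
        form: (100 * errors / instances if instances else 0.0)
        for form, (instances, errors) in counts.items()
    }


print(form_distribution(TOY_COUNTS))
print(erroneous_rates(TOY_COUNTS))
```

Keeping the distribution and the erroneous rate as separate measures makes it possible to see whether a frequently used form is also a frequently misused one.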

3.2.4 Categorizing the Errors

After the analysis of the grammatical form distributions and erroneous rates of HAPPEN, OCCUR, APPEAR, and EXIST, the next step is to categorize the extracted erroneous instances into the common errors of the four verbs in the three learner corpora. All of the erroneous instances were identified and categorized manually. For the categories of error types, this thesis follows part of the results of a pilot study of HAPPEN in the LTTC corpus (Wang & Chung, 2009), previously shown in Table 2.1. In Table 3.2, however, the five most frequent errors of HAPPEN in the pilot study are re-categorized into two broader error categories, which places more stress on the typical error types of unaccusative existence/appearance verbs in the present thesis.

TABLE 3.2 Examples of the Five Error Types from Learner Corpora

Error Type | Freq. (%) | Examples
Type 2 - Mismatches in infinitive usages | 8 (24.24%) | *But you may say what is the reason cause this happen?
Type 3 - Mismatches in present participle usages | 5 (15.15%) | *To avoid this thing happen, we should always keep clearly in a good range.
Schematic errors total | 28 (84.84%) |
Unaccusative errors | |
Type 4 - Overpassivization | 4 (12.12%) | *First problem is always happened. When you eat noddles you will find glass bluring
Type 5 - Transitivization | 1 (3.03%) | *This situation I have never happened before!
Unaccusative errors total | 5 (15.15%) |
Grand total | 33 (100%) |

As shown in Table 3.2, there are two broader categories: schematic errors and unaccusative errors. Schematic errors are general error types that could be found with any verb type, such as unergative verbs (laugh or talk), during the learning process. This category includes three error types: Type 1 (mismatches in subject-verb agreement), Type 2 (mismatches in infinitive usages), and Type 3 (mismatches in present participle usages). The other category, unaccusative errors, contains two specific subtypes of errors usually found in the misuse of unaccusative existence/appearance verbs: Type 4 (overpassivization) and Type 5 (transitivization). After re-calculating the percentages of the five error types within the two categories, schematic errors account for 84.84% of the thirty-three instances from the LTTC learner corpus, while unaccusative errors account for 15.15%. However, since the schematic errors are general errors, this thesis focuses on the two specific unaccusative errors, overpassivization and transitivization, and investigates which of them occurs more frequently with the four L2 English verbs. The schematic errors are displayed mainly to show L2 learners’ general English proficiency.
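The percentages in Table 3.2 can be re-derived from the raw counts. The sketch below is an illustration under stated assumptions: the counts are those printed in the table, the function name is hypothetical, and the Type 1 count is obtained here by subtraction from the schematic total rather than quoted from the text. The reported values match truncation to two decimals (for example 84.84%), whereas conventional rounding of 28/33 would give 84.85%.

```python
GRAND_TOTAL = 33

# Counts as printed in Table 3.2.
COUNTS = {
    "Type 2": 8,
    "Type 3": 5,
    "Schematic errors total": 28,
    "Type 4": 4,
    "Type 5": 1,
    "Unaccusative errors total": 5,
}


def truncated_percentage(count: int, total: int) -> float:
    """Percentage truncated (not rounded) to two decimal places."""
    return (count * 10000 // total) / 100


# Derived by subtraction for this example; not a figure quoted in the text.
implied_type1 = COUNTS["Schematic errors total"] - COUNTS["Type 2"] - COUNTS["Type 3"]

for label, count in COUNTS.items():
    print(f"{label}: {count} ({truncated_percentage(count, GRAND_TOTAL)}%)")
print(f"Implied Type 1 count: {implied_type1}")
```

The two category totals also add up to the grand total (28 + 5 = 33), which confirms that the five error types exhaust the thirty-three instances.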

As for the criterion for judging errors, we examined the L2 English syntactic structures in which the grammatical forms or the uses of HAPPEN, OCCUR, APPEAR, and EXIST are incorrect or less appropriate. For instance, the erroneous sentence *But you may say what is the reason cause this happen? is categorized into Type 2 (mismatches in infinitive usages) within the schematic errors because the correct grammatical form in this sentence should be to-V, and cause…to-V is a type of L2 English infinitive syntactic structure. However, the pilot study of Wang and Chung did not compare other learner corpora. Additionally, more unaccusative existence/appearance verbs should be included in order to understand the learning difficulty of L2 learners. In the present thesis, all of the erroneous instances were categorized into these five error types within the schematic and unaccusative errors for the calculation of frequencies and percentages for HAPPEN, OCCUR, APPEAR, and EXIST across the three learner corpora, while errors that can hardly be categorized into these five types are grouped into an additional ‘other’ error type. This type will also be analyzed and discussed with particular concern in section 3.4.2.
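To make the overpassivization category (Type 4) concrete, the sketch below flags candidate sentences in which a form of be is followed, with at most one intervening word, by the -ed form of one of the four verbs. This is a hypothetical pre-screening heuristic written for illustration only; it is not the procedure used in the thesis, where all errors were identified manually, and it will both miss errors and flag acceptable sentences.

```python
import re

# Hypothetical heuristic: be-form + optional single word + -ed form of the verb.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
ED_FORMS = r"(?:happened|occurred|appeared|existed)"
OVERPASSIVE = re.compile(
    rf"\b{BE_FORMS}\b(?:\s+\w+)?\s+{ED_FORMS}\b",
    re.IGNORECASE,
)


def flag_overpassivization(sentence: str) -> bool:
    """Return True if the sentence contains a be + V-ed candidate."""
    return OVERPASSIVE.search(sentence) is not None


# Learner sentences quoted in Table 3.2.
examples = [
    "First problem is always happened.",
    "This situation I have never happened before!",
    "But you may say what is the reason cause this happen?",
]
for sentence in examples:
    print(flag_overpassivization(sentence), "|", sentence)
```

Only the first sentence is flagged; the transitivization example (Type 5) and the infinitive mismatch are not, so every flagged or unflagged instance would still need manual checking.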

3.3 Findings of Learner Corpora Analysis

From the analysis in section 3.1.4, we have discovered the distributions of the grammatical forms of the four unaccusative verbs in the English native speaker corpus. Section 3.3.1 focuses on the comparison between the native speaker corpus and the three learner corpora.

3.3.1 Findings of Grammatical Form and Erroneous Rate Analysis in Learner Corpora

This section compares the similarities and differences between the native speaker corpus (BNC) and the three learner corpora (the LTTC, the ICLE, and the NCCU). To show the features of the four verbs across the four corpora clearly, we used bar charts to present the percentages of HAPPEN, OCCUR, APPEAR, and EXIST. The results are shown in the following four figures.

Figure 3.7 BNC Frequency of the Four Verbs

Figure 3.8 LTTC Frequency of the Four Verbs

Regarding the grammatical form distributions, the four figures (Figures 3.7 to 3.10) present the percentages of the four frequent grammatical forms (V-ed, V-base, V-ing, and V-s) of the four unaccusative verbs in the four corpora. As mentioned previously, the results show that either the V-ed or the V-base form of the four unaccusative verbs appears most frequently in the BNC, although only EXIST appears with extremely high frequency in the base form exist. This corresponds to a

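Source document: Comparing HAPPEN and Its Synonyms: A Study of Unaccusative Existence/Appearance Verbs Based on Native Speaker and Learner Corpora - NCCU Academic Hub (政大學術集成)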
