Statistical Analysis Procedures - A Study of the [N1 of N2] Construction in English Academic Writing

National Chengchi University (國立政治大學)

the annotated result based on our corpus data. There are two kinds of results to be interpreted after running CFA: CFA types and antitypes. The former are configurations whose observed frequency is higher than the expected frequency; the latter are configurations whose observed frequency is lower than the expected frequency. According to von Eye (1990), CFA has two variants: hierarchical and nonhierarchical. He explains that HCFA “systematically excludes variables from the analysis that contribute little or nothing to the constitution of types and antitypes” (p. 6). In other words, HCFA does not analyze all variables simultaneously, as the nonhierarchical variant does. Glynn (2014b) comments that this method is a simple and powerful technique which “can be seen as a simplified log-linear analysis (…) or as multiple Chi-squared tests” (p. 318). He also points out a potential drawback of the approach, namely that the sample size must satisfy two restrictions: (1) at least 20% of the cells should contain more than five occurrences, and (2) all cells must contain at least one occurrence. With these restrictions in mind, the following sections describe the technical procedures for carrying out the analysis.
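The distinction between types and antitypes can be illustrated with a minimal Python sketch. It assumes that expected frequencies are derived under complete independence of the variables, which is a common baseline for CFA but is not stated in this section. The function name and the toy data are hypothetical, and the sketch only labels the direction of each deviation without testing its significance.

```python
from collections import Counter
from itertools import product
from math import prod

def cfa_directions(rows):
    """Label each configuration as 'type', 'antitype', or 'as expected'.

    rows: list of equal-length tuples, one value per variable.
    Expected frequencies assume complete independence (an assumption of this sketch).
    """
    n = len(rows)
    observed = Counter(rows)
    # Marginal counts for each variable position.
    margins = [Counter(r[i] for r in rows) for i in range(len(rows[0]))]
    result = {}
    for config in product(*(sorted(m) for m in margins)):
        expected = n * prod(m[v] / n for m, v in zip(margins, config))
        obs = observed[config]
        if obs > expected:
            label = "type"
        elif obs < expected:
            label = "antitype"
        else:
            label = "as expected"
        result[config] = (obs, expected, label)
    return result

# Hypothetical toy data: (semantic category of N1, syntactic position)
toy = (
    [("act", "subject")] * 6 + [("act", "object")] * 2
    + [("state", "subject")] * 2 + [("state", "object")] * 6
)
table = cfa_directions(toy)
for config, (obs, exp, label) in table.items():
    print(config, obs, round(exp, 2), label)
```

In this toy data every configuration has an expected frequency of 4, so the two configurations observed six times are labelled as types and the two observed twice as antitypes.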

3.3 Statistical Analysis Procedures

Several steps were necessary before subjecting the data to statistical computations in R. The following sub-sections describe the steps taken in the present study to convert the concordance results into data manageable for analyses in R.

3.3.1 Procedures for Correspondence Analysis

To prepare the data for correspondence analysis, the annotated data were manually transformed into a cross-tabulation table by counting each cell in an Excel file, as shown in Table 3.4. In this table, the rows represent the semantic categories of N1, whereas the columns represent the semantic categories of N2. Each cell gives the number of concordance lines in which N1 and N2 fall into the corresponding pair of categories. Take the number 209 in column 2, row 2 as an example. It means that there are 209 concordance lines in which both N1 and N2 belong to the semantic category ‘act’. An example of this configuration is the public acceptance of psychoanalysis (HP2-1305), where both acceptance and psychoanalysis are categorized under ‘act’. Glynn (2014a) points out that it is a good heuristic to gradually remove small cells, which tend to cause distortion in the analysis.
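The cross-tabulation step was carried out manually in Excel in this study. As an illustration only, the same counting could be sketched in Python as follows; the function name and the annotated pairs are hypothetical, and only the first pair mirrors the example cited in the text.

```python
from collections import Counter

def cross_tabulate(pairs):
    """Count (N1 category, N2 category) pairs into a nested dict table[n1][n2]."""
    counts = Counter(pairs)
    n1_cats = sorted({n1 for n1, _ in pairs})
    n2_cats = sorted({n2 for _, n2 in pairs})
    # Unattested combinations are filled with 0 so that the table is rectangular.
    return {n1: {n2: counts[(n1, n2)] for n2 in n2_cats} for n1 in n1_cats}

# Hypothetical annotations, one pair per concordance line
annotated = [
    ("act", "act"),        # e.g. "acceptance of psychoanalysis"
    ("act", "act"),
    ("act", "cognition"),
    ("group", "animate"),
]
crosstab = cross_tabulate(annotated)
print(crosstab["act"]["act"])  # 2
```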

Following Glynn’s advice, minor modifications were made to the data. First, five semantic categories with the lowest counts, including ‘event’, ‘quantity’, and ‘relation’, were removed from the table. Second, the two categories originally named ‘time’ and ‘location’ were combined to form a new ‘spatio-temporal’ category. Table 3.4 was then imported into R. The R package used for running the correspondence analysis was FactoMineR (Lê, Josse, & Husson, 2008), which was chosen for its use of colors and legends in the graphical output. The output biplot was saved as a PDF file for further analysis.
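The two modifications can be expressed as a simple recoding of the annotated pairs. The sketch below is illustrative rather than the procedure actually used (the table was edited by hand): it lists only the three removed categories that are named in the text, and it assumes that a pair is discarded whenever either slot carries a removed category.

```python
from collections import Counter

# The removed categories named in the text, and the merge of 'time' and 'location'.
DROPPED = {"event", "quantity", "relation"}
MERGED = {"time": "spatio-temporal", "location": "spatio-temporal"}

def recode(pairs):
    """Merge and drop categories in a list of (N1 category, N2 category) pairs."""
    kept = []
    for n1, n2 in pairs:
        n1, n2 = MERGED.get(n1, n1), MERGED.get(n2, n2)
        if n1 in DROPPED or n2 in DROPPED:
            continue  # assumption: discard the pair if either slot is a removed category
        kept.append((n1, n2))
    return kept

# Hypothetical annotated pairs
pairs = [
    ("time", "act"), ("location", "act"),
    ("event", "act"), ("act", "quantity"), ("act", "act"),
]
recoded = Counter(recode(pairs))
print(recoded)
```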

Table 3.4 Contingency table for correspondence analysis

N1 \ N2             act  animate  attribute  communication  group  spatio-temporal  cognition  process  state  technical/concrete
act                 209  97       81         105            109    40               141        14       55     201
animate             12   18       6          6              69     38               16         0        2      24
attribute           156  62       40         67             77     24               102        12       28     121
communication       76   40       28         59             33     14               42         2        7      27
group               71   48       20         25             71     35               46         1        18     47
cognition           192  72       64         88             61     27               133        14       37     62
process             36   4        13         3              9      7                12         11       15     36
state               72   21       25         15             28     12               44         8        28     34
technical/concrete  24   31       7          10             21     18               9          5        4      85
spatio-temporal     57   16       11         17             19     59               27         6        12     37
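As a worked illustration of what correspondence analysis operates on, the following Python sketch computes the margins of Table 3.4, the expected count of a cell under independence, and the total inertia (the chi-square statistic divided by the grand total). The variable and function names are hypothetical. The sketch does not reproduce the FactoMineR analysis itself, which additionally decomposes the inertia into dimensions.

```python
# Column order as in Table 3.4 (N2 categories); rows are N1 categories.
CATS_N2 = ["act", "animate", "attribute", "communication", "group",
           "spatio-temporal", "cognition", "process", "state", "technical/concrete"]
TABLE = {
    "act":                [209, 97, 81, 105, 109, 40, 141, 14, 55, 201],
    "animate":            [12, 18, 6, 6, 69, 38, 16, 0, 2, 24],
    "attribute":          [156, 62, 40, 67, 77, 24, 102, 12, 28, 121],
    "communication":      [76, 40, 28, 59, 33, 14, 42, 2, 7, 27],
    "group":              [71, 48, 20, 25, 71, 35, 46, 1, 18, 47],
    "cognition":          [192, 72, 64, 88, 61, 27, 133, 14, 37, 62],
    "process":            [36, 4, 13, 3, 9, 7, 12, 11, 15, 36],
    "state":              [72, 21, 25, 15, 28, 12, 44, 8, 28, 34],
    "technical/concrete": [24, 31, 7, 10, 21, 18, 9, 5, 4, 85],
    "spatio-temporal":    [57, 16, 11, 17, 19, 59, 27, 6, 12, 37],
}

row_totals = {r: sum(v) for r, v in TABLE.items()}
col_totals = [sum(v[j] for v in TABLE.values()) for j in range(len(CATS_N2))]
grand_total = sum(row_totals.values())

def expected(row, col):
    """Expected count of a cell under independence of N1 and N2 categories."""
    return row_totals[row] * col_totals[CATS_N2.index(col)] / grand_total

# Pearson chi-square over all cells; total inertia is chi-square / N.
chi_square = sum(
    (obs - expected(r, c)) ** 2 / expected(r, c)
    for r, v in TABLE.items()
    for c, obs in zip(CATS_N2, v)
)
total_inertia = chi_square / grand_total

print(grand_total, round(expected("act", "act"), 1))  # 4300 221.4
```

For the cell discussed in the text, the observed count of 209 can thus be compared with an expected count of about 221.4 under independence.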

3.3.2 Procedures for Covarying Collexeme Analysis

To prepare the data for covarying collexeme analysis (Stefanowitsch & Gries, 2003, 2005; Gries & Stefanowitsch, 2004), the concordance of 591,000 hits was first queried in R for the head nouns at N1 and N2. The head noun is semantically defined as “the most obligatory element” in the nominal group (Sinclair, 1991: 86). For instance, the head nouns in the of-construction in (3.3) are increase at N1 and antigen at N2.

(3.3) A significant increase of urokinase type plasminogen activator antigen in carcinomatous tissue extracts of oesophagus and stomach has recently been reported by Nishino et al. … (HU2_1048)
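The head-noun extraction illustrated by (3.3) can be sketched in Python as follows. This is not the R query used in the study, whose details are not given here. The sketch assumes part-of-speech-tagged input with a hypothetical coarse tagset (`NOUN`, `ADJ`, `DET`, `PREP`), takes the last noun of a run of consecutive nouns as the head, and caps the run with a hypothetical `max_nouns` parameter.

```python
def head_nouns(tagged, max_nouns=5):
    """Return (N1 head, N2 head) around the first 'of' in a tagged sequence.

    tagged: list of (word, tag) pairs. The tags are a hypothetical coarse
    tagset, not the one used in the study. Raises StopIteration if no 'of'.
    """
    of_idx = next(i for i, (w, _) in enumerate(tagged) if w.lower() == "of")
    # N1 head: the noun immediately before 'of', if there is one.
    n1 = None
    if of_idx > 0 and tagged[of_idx - 1][1] == "NOUN":
        n1 = tagged[of_idx - 1][0]
    # Skip determiners and adjectives that premodify N2.
    j = of_idx + 1
    while j < len(tagged) and tagged[j][1] in {"DET", "ADJ"}:
        j += 1
    # Collect the noun-noun sequence (at most max_nouns); its last noun is the head.
    run = []
    while j < len(tagged) and tagged[j][1] == "NOUN" and len(run) < max_nouns:
        run.append(tagged[j][0])
        j += 1
    n2 = run[-1] if run else None
    return n1, n2

# Example (3.3), tagged by hand for illustration
example = [
    ("A", "DET"), ("significant", "ADJ"), ("increase", "NOUN"), ("of", "PREP"),
    ("urokinase", "NOUN"), ("type", "NOUN"), ("plasminogen", "NOUN"),
    ("activator", "NOUN"), ("antigen", "NOUN"), ("in", "PREP"),
]
print(head_nouns(example))  # ('increase', 'antigen')
```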

This example also shows that some NPs contain noun-noun sequences consisting of premodifying nouns and a head noun, which occupies the final position of the NP (Biber et al., 1999: 590). To cover a wide range of NPs, the maximum number of nouns in a noun-noun sequence was set to five when extracting the head noun at N2. For example, in the application of the hierarchical engineering records management system (FE6-471), the noun-noun sequence at N2 following the adjectival modifier hierarchical contains four nouns. For running covarying collexeme analysis in R, the query results were compiled into a table consisting of 591,000 rows and two columns, headed N1 and N2, respectively. All words were converted to lower case before the table was used to run Gries’ R script, Coll.analysis 3.2a (Gries, 2007), which performs an inferential statistical analysis that calculates the degree of association (measured as collocation strength) between the words occurring at N1 and N2 in the of-construction. The output was then copied into an Excel file for further analysis.
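To make the notion of association strength concrete, the sketch below scores each N1-N2 pair with a one-tailed Fisher exact test on a 2x2 table and reports the negative base-10 logarithm of the p-value. The choice of the Fisher exact test is an assumption here: it is a measure commonly used in collostructional analysis, but this section does not state which measure was selected in the script. The function names and the toy data are hypothetical, and the exact computation is suitable only for small tables.

```python
from collections import Counter
from math import comb, log10

def fisher_right_tail(a, b, c, d):
    """One-tailed Fisher exact p-value, P(X >= a), for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

def covarying_strength(pairs):
    """Return {(n1, n2): -log10(p)} for each attested pair of lower-cased words."""
    pairs = [(n1.lower(), n2.lower()) for n1, n2 in pairs]
    n = len(pairs)
    pair_freq = Counter(pairs)
    n1_freq = Counter(n1 for n1, _ in pairs)
    n2_freq = Counter(n2 for _, n2 in pairs)
    scores = {}
    for (n1, n2), a in pair_freq.items():
        b = n1_freq[n1] - a   # this N1 word with other N2 words
        c = n2_freq[n2] - a   # other N1 words with this N2 word
        d = n - a - b - c     # neither word
        scores[(n1, n2)] = -log10(fisher_right_tail(a, b, c, d))
    return scores

# Hypothetical two-column table (N1, N2), four rows
rows = [("Increase", "antigen")] * 2 + [("application", "system")] * 2
scores = covarying_strength(rows)
print(round(scores[("increase", "antigen")], 3))  # 0.778
```

With 591,000 rows the exact binomial coefficients become very large, so a dedicated statistics library would be the more practical choice for real data.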

3.3.3 Procedures for Hierarchical Configural Frequency Analysis

For category annotation, 1% of the BNCweb concordance result (591,000 hits) of of-constructions was randomly extracted, which resulted in 5,910 instances. The 5,910 instances were initially processed in R to exclude any commonly occurring three-word complex prepositions (e.g., in advance of, on behalf of, in danger of, in place of, in terms of, and by virtue of). Three-word complex prepositions, defined by Hoffmann (2005: 23) as a sequence of “a simple preposition-any noun-a simple preposition”, share the same syntactic pattern as the target of-construction. According to previous studies such as Hoffmann (2005) and Quirk et al. (1985), these complex prepositions have been grammaticalized into units that are indivisible in both form and meaning, which makes them unsuitable for the present study. Before the exclusion procedure began, a list of 113 complex prepositions was compiled from Hoffmann’s (2005) and Quirk et al.’s (1985) work (see Appendix A for the compiled list).
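The sampling and exclusion steps were performed in R. A minimal Python sketch of the same logic is given below for illustration. It lists only the six complex prepositions cited as examples in the text rather than the full list of 113, and it assumes that a hit is excluded when one of these sequences occurs anywhere in the target string. The function names, the seed, and the sample lines are hypothetical.

```python
import random
import re

# The six examples named in the text; the study's full list has 113 entries.
COMPLEX_PREPOSITIONS = ["in advance of", "on behalf of", "in danger of",
                        "in place of", "in terms of", "by virtue of"]
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, COMPLEX_PREPOSITIONS)) + r")\b",
    re.IGNORECASE,
)

def sample_hits(hits, fraction=0.01, seed=0):
    """Draw a random sample of the given fraction without replacement."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    return rng.sample(hits, round(len(hits) * fraction))

def drop_complex_prepositions(lines):
    """Keep only lines that do not contain a listed complex preposition."""
    return [line for line in lines if not _PATTERN.search(line)]

# Hypothetical concordance targets
lines = [
    "in terms of quality",
    "the acceptance of psychoanalysis",
    "On behalf of the committee",
]
kept = drop_complex_prepositions(lines)
print(kept)  # ['the acceptance of psychoanalysis']
```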

Next, the 5,910 concordance instances were filtered to exclude targets identical to any of the 113 complex prepositions. Only 112 matches were found, suggesting that complex prepositions occur rather infrequently in academic writing. The remaining 5,798 instances were copied into an Excel file, where the annotation was carried out (Section 3.4.1 below). Three features were considered for annotation: nominal categories, semantic relations of N1 and N2, and syntactic positions of the of-construction. Because the scope of this study is limited to nominal groups occurring before and after of, instances in which either position was filled by a word other than a noun, such as a pronoun (e.g., it, this, anyone), a determiner (e.g., each), or a conjunction (e.g., whether, which), were considered unwanted and removed. The remaining 5,650 instances were then separated into two groups: (1) 4,881 instances with two nominal groups and one intervening of (i.e., [N1 of N2]); and (2) 769 instances with at least three nominal groups and at least two intervening of’s (i.e., [N1 of N2 of N3], [N1 of N2 of N3 of N4], or [N1 of N2 of N3 of N4 of N5]).
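The separation into the two groups can be sketched by counting the occurrences of of in each target. The following Python fragment is illustrative only: it assumes whitespace-tokenised target strings, the example targets are hypothetical, and the function names are not taken from the study. The final two lines check the arithmetic of the counts reported above.

```python
def count_of(target):
    """Number of 'of' tokens in a whitespace-tokenised target string."""
    return sum(1 for token in target.lower().split() if token == "of")

def split_by_of(targets):
    """Separate single-'of' targets ([N1 of N2]) from those with two or more."""
    single = [t for t in targets if count_of(t) == 1]
    multiple = [t for t in targets if count_of(t) >= 2]
    return single, multiple

# Hypothetical targets
targets = ["acceptance of psychoanalysis", "the end of the history of art"]
single, multiple = split_by_of(targets)
print(len(single), len(multiple))  # 1 1

# Consistency check on the counts reported in the text
assert 5910 - 112 == 5798
assert 4881 + 769 == 5650
```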

Only the former group was further annotated and then used in HCFA. The program generated an output folder containing several text files with the data. The file containing the hierarchical data was selected and saved as an Excel file for further analysis.
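One way to picture the hierarchical part of the procedure is as a series of configuration tables, one for each combination of the annotated variables, from which weakly contributing variables can then be set aside. The Python sketch below only enumerates these tables. It is an assumption about how the hierarchy may be organized rather than a description of the program used, and the variable names and records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def configurations_by_subset(records, variables):
    """Tabulate configuration frequencies for every combination of two or more variables.

    records: list of dicts keyed by variable name.
    """
    tables = {}
    for size in range(2, len(variables) + 1):
        for subset in combinations(variables, size):
            tables[subset] = Counter(tuple(rec[v] for v in subset) for rec in records)
    return tables

# Hypothetical annotations for the three features considered in the study
VARIABLES = ["n1_category", "relation", "position"]
records = [
    {"n1_category": "act", "relation": "objective", "position": "subject"},
    {"n1_category": "act", "relation": "objective", "position": "object"},
    {"n1_category": "state", "relation": "partitive", "position": "subject"},
]
tables = configurations_by_subset(records, VARIABLES)
print(len(tables))  # 4
```

With three variables this yields three two-way tables and one three-way table, each of which could then be examined for types and antitypes.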
