National Chengchi University

3.2 Statistical Measures

This study used three statistical analyses, two of which were developed by Gries and Stefanowitsch (Gries & Stefanowitsch, 2004; Stefanowitsch & Gries, 2005) using the R statistical software (R Development Core Team, 2010). These three statistical methods were chosen to provide multivariate analyses of categorical data.

The first statistical analysis is correspondence analysis (CA), a technique for visualizing categorical data. Like the other multivariate analyses used in this study, it is derived from a two-way contingency table in which each row represents a type of N1 head, each column represents a type of N2 head, and each cell holds the count for that pairing. Glynn (2014a: 451), however, cautions that the count in each cell of the contingency table should be at least ten.

The second, referred to as covarying collexeme analysis (CCA), is a type of collostructional analysis that employs statistical measures to identify the collocational association between a word pair and a construction. This analysis was used to determine the most commonly co-occurring N1 and N2 in the of-construction.

Like the other members of the family of collostructional analyses, this method has the advantage of taking into account each word’s frequency in the target construction by measuring it against the word’s overall frequency. In other words, the measurement is not biased toward target words that simply occur with high frequency. The following sub-sections provide a brief introduction to the two approaches.

Finally, the last statistical analysis is a clustering analysis advanced by Gries (2005) and Gries and Divjak (2009) that takes into account ID tags (Atkins, 1987), features (Dirven, Goossens, Putseys, & Vorlat, 1982), or levels (von Eye, 2002). This method, referred to as hierarchical configural frequency analysis (HCFA), presupposes that there is a network of words/senses connected by their linguistic behavior. The more similar two words are, the closer they are in the nexus of our mental lexicon. On this assumption, we examine patterns of annotated usage features using multivariate statistics to identify significantly correlating factors.

3.2.1 Correspondence Analysis

Correspondence analysis (CA) is a statistical tool employed in the current study to visualize categorical data. Because the aim of our study was to identify patterns of association among the categories of both N1 and N2, CA is a suitable method. Blasius and Greenacre (2006) state that “Correspondence analysis (CA) is an exploratory multivariate technique for the graphical and numerical analysis of almost any data matrix with nonnegative entries, but it principally involves tables of frequencies or counts” (p. 4). Glynn (2014a), in explicating how CA works, writes that “Basically, correspondence analysis takes the frequency of co-occurring features and converts them to distances, which are then plotted, revealing how things are related by how close to or far from each other they are in a two- or three-dimensional visualization” (p. 445). The output of CA is a biplot, which gives a two-dimensional representation of the structure of the data.
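The conversion of frequencies to distances that Glynn describes can be illustrated with a minimal sketch. It is not the procedure used in this study (which relied on R); it only shows, in Python, how the chi-square distance between two row profiles of a contingency table might be computed, which is the distance that CA approximates in its plot. The counts and the function names are hypothetical and serve purely as an illustration.

```python
import math

def row_profiles(table):
    """Divide each row by its total so rows become relative-frequency profiles."""
    return [[cell / sum(row) for cell in row] for row in table]

def column_masses(table):
    """Share of the grand total that falls in each column."""
    n = sum(sum(row) for row in table)
    return [sum(col) / n for col in zip(*table)]

def chi_square_distance(profile_a, profile_b, masses):
    """Distance between two row profiles, weighting each column by 1 / mass."""
    return math.sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(profile_a, profile_b, masses))
    )

# Hypothetical counts: rows stand for N1 categories, columns for N2 categories.
counts = [
    [30, 10, 10],
    [28, 12, 10],
    [10, 10, 30],
]
profiles = row_profiles(counts)
masses = column_masses(counts)

# Rows 0 and 1 have similar profiles, so they end up close together;
# rows 0 and 2 have different profiles, so they end up far apart.
d_01 = chi_square_distance(profiles[0], profiles[1], masses)
d_02 = chi_square_distance(profiles[0], profiles[2], masses)
print(round(d_01, 3), round(d_02, 3))
```

In a full CA these distances would be projected onto a small number of dimensions (typically by a singular value decomposition) to produce the biplot; that step is omitted here.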

3.2.2 Covarying Collexeme Analysis

Covarying collexeme analysis (CCA) was developed by Gries and Stefanowitsch (2004; Stefanowitsch & Gries, 2005) to provide a statistical measure of the association between words occurring in different slots of the same construction.

The analysis can be considered an extension of Gries and Stefanowitsch’s (2003) work on collexeme analysis, which identifies the association between a construction and the words occurring in one particular slot. The researchers use the term collostruction strength to refer to the degree of association between two entities based on the Fisher-Yates exact test. Collostruction strength is reported as the negative log-transformed p-value, which is given a positive sign if the association is attractive and a negative sign if the association is repulsive. The sign results from a comparison between the observed frequency, i.e., the actual number of occurrences found in the corpus, and the expected frequency, i.e., the frequency computed from a contingency table. A contingency table for covarying collexeme analysis, adapted from Stefanowitsch and Gries (2005), is illustrated in Table 3.3.

Table 3.3 Contingency table for covarying collexeme analysis (Stefanowitsch & Gries, 2005: 9)

                                      Mslot2                    ¬Mslot2
                                      (word M in slot 2)        (all other words in slot 2)
Lslot1 (word L in slot 1)             Freq (Lslot1 + Mslot2)    Freq (Lslot1 + ¬Mslot2)
¬Lslot1 (all other words in slot 1)   Freq (¬Lslot1 + Mslot2)   Freq (¬Lslot1 + ¬Mslot2)
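As a rough illustration of how a collostruction strength could be obtained from the four cells of such a table, the following Python sketch computes a one-tailed Fisher-Yates exact p-value and returns its signed negative base-10 logarithm. The choice of a one-tailed test in the direction of the deviation, the base-10 logarithm, the function names, and the cell counts are all assumptions made for this example; they are not taken from the R scripts used in the study.

```python
import math

def hypergeom_pmf(k, row1, col1, n):
    """Probability that the top-left cell equals k, given fixed margins."""
    return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)

def collostruction_strength(a, b, c, d):
    """Signed -log10(p) for the 2x2 table [[a, b], [c, d]].

    a = L in slot 1 with M in slot 2, b = L with other words in slot 2,
    c = other words in slot 1 with M,  d = neither L nor M.
    Assumption: one-tailed exact test in the direction of the deviation.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    expected = row1 * col1 / n
    lowest = max(0, row1 + col1 - n)   # smallest possible top-left count
    highest = min(row1, col1)          # largest possible top-left count
    attracted = a >= expected
    if attracted:
        p = sum(hypergeom_pmf(k, row1, col1, n) for k in range(a, highest + 1))
    else:
        p = sum(hypergeom_pmf(k, row1, col1, n) for k in range(lowest, a + 1))
    strength = -math.log10(min(p, 1.0))  # guard against rounding above 1
    return strength if attracted else -strength

# Hypothetical counts: observed 12 against an expected 3 (attraction),
# and observed 1 against an expected 12 (repulsion).
attraction = collostruction_strength(12, 8, 18, 162)
repulsion = collostruction_strength(1, 39, 59, 101)
print(attraction > 0, repulsion < 0)
```

Word pairs can then be ranked by this value, with large positive values indicating strongly attracted pairs and negative values indicating repelled ones.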

The expected cell frequency for the target co-occurring word L and word M is the product of the frequency of word L observed in slot 1 and the frequency of word M observed in slot 2, divided by the total number of constructions. The product can be viewed as the row total times the column total, as summarized in (3.2).

(3.2) $\text{expected frequency} = \dfrac{f_r \times f_c}{n}$,

where $f_r$ is the total row frequency, $f_c$ is the total column frequency, and $n$ refers to the total frequency (Hinkle, Wiersma & Jurs, 2003: 555).
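Formula (3.2) can be applied to every cell of a contingency table at once. The short Python sketch below does so for a hypothetical 2 × 2 table; the counts and the function name are invented for illustration only.

```python
def expected_frequencies(table):
    """Apply (row total * column total) / grand total to every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[f_r * f_c / n for f_c in col_totals] for f_r in row_totals]

# Hypothetical observed counts laid out as in Table 3.3.
observed = [
    [12, 8],     # word L in slot 1: with word M / with other words in slot 2
    [18, 162],   # other words in slot 1: with word M / with other words
]
expected = expected_frequencies(observed)

# Top-left cell: row total 20, column total 30, grand total 200.
print(expected[0][0])  # 20 * 30 / 200 = 3.0
```

Here the pair is observed 12 times but expected only 3 times, so the association would count as attractive.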

3.2.3 Hierarchical Configural Frequency Analysis

The most important statistical tool used in the present study is hierarchical configural frequency analysis (HCFA), carried out with a statistical program written as R scripts and provided by Gries (2004). HCFA is a type of configural frequency analysis (CFA). Von Eye, Mair, and Mun (2010: 1) explain that CFA differs from standard methods of categorical data analysis in the form of its results. Unlike standard methods such as log-linear modeling or logistic regression, which express results as relationships among variables, CFA expresses results in terms of configurations, or cells of a contingency table. In other words, CFA examines the various combinations of annotated features, as tabulated in a contingency table, to determine whether any pattern occurs more or less often than expected by chance. An expected frequency is calculated for each configuration and compared with the observed frequency, that is, the annotated result based on our corpus data. Running CFA yields two kinds of results to be interpreted: CFA types and antitypes. The former are configurations whose frequency is higher than the expected frequency; the latter, by contrast, are those whose frequency is lower than the expected frequency. There are two variations of CFA, according to von Eye (1990): hierarchical and nonhierarchical. He explains that HCFA “systematically excludes variables from the analysis that contribute little or nothing to the constitution of types and antitypes” (p. 6). In other words, HCFA does not run all variables simultaneously, as the nonhierarchical type of CFA does. Glynn (2014b) comments that this method is a simple and powerful technique which “can be seen as a simplified log-linear analysis (…) or as multiple Chi-squared tests” (p. 318). He also points out a potential drawback of this approach, namely that the sample size must satisfy two restrictions: (1) at least 20% of the cells should contain more than five occurrences, and (2) all cells must contain at least one occurrence. Keeping these in mind, the following sections describe the technical procedures for carrying out the analysis.
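The comparison of observed and expected configuration frequencies can be sketched as follows. This is a simplified, nonhierarchical illustration in Python, not the R program used in this study: it assumes that expected frequencies are computed under complete independence of the variables, it reports only the chi-square contribution of each configuration, and it omits the significance tests and the stepwise exclusion of variables that a full HCFA would perform. The feature labels and counts are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def cfa(observations):
    """Compare observed and expected frequencies for every configuration.

    observations: list of tuples, one annotated level per variable.
    Returns {configuration: (observed, expected, chi_square, label)}.
    """
    n = len(observations)
    n_vars = len(observations[0])
    observed = Counter(observations)
    # Marginal counts of each level, computed separately for each variable.
    marginals = [Counter(obs[i] for obs in observations) for i in range(n_vars)]
    results = {}
    for config in product(*(sorted(m) for m in marginals)):
        # Expected frequency assuming the variables are independent.
        expected = n * math.prod(
            marginals[i][level] / n for i, level in enumerate(config)
        )
        obs = observed.get(config, 0)
        chi_square = (obs - expected) ** 2 / expected
        if obs > expected:
            label = "type"
        elif obs < expected:
            label = "antitype"
        else:
            label = "as expected"
        results[config] = (obs, expected, chi_square, label)
    return results

# Hypothetical annotations with three variables of two levels each.
data = (
    [("A", "x", "p")] * 6
    + [("A", "y", "q")] * 2
    + [("B", "x", "q")] * 2
    + [("B", "y", "p")] * 2
)
results = cfa(data)
for config, (obs, exp, chi_sq, label) in results.items():
    print(config, obs, round(exp, 2), round(chi_sq, 2), label)
```

In this sketch the labels "type" and "antitype" only record the direction of the deviation; whether a deviation is larger than expected by chance would still have to be established by a significance test with a correction for multiple testing.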