
2.5 Measurement of Semantic Relation

2.5.3 Uses of Semantic Similarity Measures

WordNet was compiled as an online electronic lexicon based on the network theory of the mental lexicon (Fellbaum, 1998). It modeled the lexical network of English native speakers according to lexical memory and synonymic substitutability.

As indicated by the network theory, storage and retrieval of the lexicon were highly dependent on lexical semantic relatedness. English nouns, verbs, adjectives, and adverbs were organized into synonym sets (synsets). Each synset comprised a list of synonymous words and semantic pointers that captured relationships between the current synset and other synsets; the synsets were interlinked by means of conceptual-semantic and lexical relations. The semantic pointers included several types, such as hyponym/hypernym and meronym/holonym. The hierarchical relation of the lexicon was one of the important features of WordNet. Nouns were organized around hypernym and hyponym groupings. For instance, one of the direct hypernyms of ‘dog’ was ‘canine’, and one of its direct hyponyms was ‘hunting dog’. In WordNet, hypernyms and hyponyms were relative concepts. There were a total of 117,798 nominal word types, which contained 146,312 sense lemmas. There were tens of millions of hierarchical chains in WordNet, for which the cognitive levels of the words had not yet been identified. Another feature was that each hierarchical chain could be segmented into three parts that closely conformed to the three cognitive levels of each class in English: the superordinate, basic, and subordinate levels. The primary classes were characterized by nominal hierarchy, adjectival opposition, and verbal entailment (Fellbaum, 1998; Lin, 1997). The present study used nouns and adjectives from WordNet as a knowledge base, which provided a comparatively rich set of semantic links.
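
For readers who wish to explore these hierarchical pointers directly, the short sketch below uses the NLTK interface to WordNet (an assumption made for illustration; the present study consulted the WordNet online system rather than NLTK) to retrieve the direct hypernyms and hyponyms of ‘dog’:

```python
# Minimal sketch using NLTK's WordNet interface (illustrative only; the study
# itself used the WordNet online system). Requires: nltk.download('wordnet').
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')  # the first nominal sense of 'dog'

# Direct hypernyms: the more general synsets one level up the hierarchy
# (e.g. canine.n.02, as mentioned in the text).
print(dog.hypernyms())

# Direct hyponyms: the more specific synsets one level down
# (the list includes hunting_dog.n.01).
print([h.name() for h in dog.hyponyms()])

# One complete hierarchical chain from 'dog' up to the root node.
print([s.name() for s in dog.hypernym_paths()[0]])
```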

Due to their computational nature, semantic similarity measures were algorithmic methods mostly developed by computational linguists and computer scientists. They were designed for the purposes of performing automatic semantic analysis and facilitating natural language processing applications. The merits of different semantic distance measures were therefore tied to performance evaluations of how successfully the respective language-related tasks were solved. On the side of lexical resource-based measures, Patwardhan, Banerjee, and Pedersen (2003) compared a group of seven measures on the task of word-sense disambiguation and found that their hybrid method of WordNet structure and dictionary definitions performed the best, slightly edging out the Jiang-Conrath measure (Jiang & Conrath, 1997). For the task of finding predominant word senses, McCarthy (2004) found that the Patwardhan et al. measure and the Jiang-Conrath measure were comparably the best, but the latter was far more efficient.

In three other studies, the Jiang-Conrath measure was consistently found to be the best for the tasks of pattern induction for information extraction (Stevenson & Greenwood, 2005), learning coarse-grained semantic classes (Kohomban & Lee, 2005), and malapropism detection (Budanitsky & Hirst, 2006).

On the side of lexical distribution-based measures, Turney (2001) showed that the Pointwise Mutual Information (PMI) measure of semantic similarity could perform considerably well on TOEFL synonym test questions. Weeds and Weir (2005) evaluated two state-of-the-art distributional similarity measures, the Mutual Information (MI) based measure (Lin, 1998) and the α-skew divergence measure (Lee, 1999), and found that both distributional similarity measures performed better on the pseudo-disambiguation task than a representative lexical resource-based measure. Cilibrasi and Vitanyi (2007) demonstrated the successful application of Normalized Google Distance in hierarchical clustering, language translation, and WordNet-like semantic categorization. Lindsey et al. (2007) examined the impact of corpus selection on two popular semantic measures, Pointwise Mutual Information and Normalized Google Distance. Their experimental results showed that Normalized Google Distance was the best overall semantic measure and was better served by smaller, high-quality corpora (Wikipedia and the NY Times) than by the World Wide Web.
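
To make the two distributional measures discussed above concrete, the sketch below gives minimal implementations of Pointwise Mutual Information and Normalized Google Distance from raw frequency counts; the counts in the example are placeholders, not figures from the cited studies.

```python
import math

# Hedged sketch of two distributional measures; the counts below are
# illustrative placeholders, not data from any corpus used in this study.

def pmi(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )."""
    return math.log2((f_xy / n) / ((f_x / n) * (f_y / n)))

def ngd(f_x: int, f_y: int, f_xy: int, n: int) -> float:
    """Normalized Google Distance (Cilibrasi & Vitanyi, 2007):
    (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y)))."""
    lx, ly, lxy, ln = math.log(f_x), math.log(f_y), math.log(f_xy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))

# Placeholder counts for an imaginary word pair in a corpus of 1,000,000 pages.
print(pmi(f_xy=120, f_x=5000, f_y=800, n=1_000_000))  # higher = more associated
print(ngd(f_x=5000, f_y=800, f_xy=120, n=1_000_000))  # lower  = more related
```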

While most measures were reported to achieve high or at least respectable performance in individual tasks, the issue of how well these measures reflected true semantic relatedness remained. Therefore, comparing results of semantic distance measures with human judgment was another important dimension of evaluation.

Budanitsky and Hirst (2006) compared five lexical resource-based measures with human ratings, indicating that the coefficients of correlation ranged from 0.744 to 0.850. However, they also cautioned about the limited amount of data and the difficulty of the experimental setting for human subjects. Lindsey et al. (2007) also used human judgment of word relatedness in their evaluation and reported average agreement percentage scores ranging from 43.2 to 67.3. In contrast, Cramer (2008) presented a critical evaluation of eleven semantic measures whose correlation coefficients with human judgment were rather low (from 0.16 to 0.36). She attributed this to the inadequate coverage of semantic information in WordNet and corpora, as well as to the inappropriate experimental setup of the human judgment. On a more positive note, Gracia and Mena (2008) proposed a Google-based semantic relatedness measure and reported Spearman correlation coefficients with human judgment that ranged from 0.71 to 0.78 for the Google-based measure with the top five search engines, superior to the range of 0.46 to 0.62 for the top five WordNet-based measures. Recchia and Jones (2009) also conducted a comparative evaluation of several semantic measures with respect to human semantic judgment. The best Spearman rank correlation coefficients in the matrix of tasks and semantic measures ranged from 0.62 to 0.90.

They also concluded that a simple metric could outperform more complex metrics when trained on extremely large, high-quality data. In particular, the combination of the pointwise mutual information (PMI) metric and the Wikipedia corpus was found to be highly successful and efficiently applicable to other research tasks. Overall, the use of effective semantic measures for approximating human judgment seemed to be supported by promising evidence.
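
As a pointer to how such evaluations are typically computed, the sketch below correlates a measure’s similarity scores with human ratings using SciPy’s Spearman rank correlation; all numbers are placeholders rather than data from the studies reviewed above.

```python
from scipy.stats import spearmanr

# Placeholder similarity scores from a measure and human relatedness ratings
# for the same word pairs (not data from the studies cited above).
measure_scores = [0.82, 0.10, 0.55, 0.91, 0.30, 0.47]
human_ratings  = [3.8, 0.5, 2.9, 4.0, 1.2, 2.1]

rho, p_value = spearmanr(measure_scores, human_ratings)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```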

2.6 Concluding Remarks

Collocation learning is an important research area because it involves structural, semantic, and cognitive variations in lexical components, which underpin the foundation of language competence. Reviewing the strengths and weaknesses of each approach was essential for the interpretation and evaluation of the research discussion and outcomes.

The operational definition adopted in this study is based on the structure of a collocation. The “base” keeps its usual meaning as an autosemantic word, while the “collocate” usually has a less transparent meaning as a synsemantic word (Hausmann, 1999). Two or more lexical elements with the same base can also be syntactically associated, merging into binary or ternary collocations, or into even larger units such as a “collocational chain” or “collocational cluster” (Spohr, 2005), which also facilitate linguistic precision and text cohesion.

The current study used Benson et al.’s (1997) categorization because it provided clear distinctions between lexical and grammatical collocations in the syntagmatic structure. In addition, this study focused on semi-restricted collocations because many research findings indicated they were the most problematic for L2 learners.

CHAPTER THREE RESEARCH METHOD

In this chapter, the first section provides an overview of the research design and then describes the four sub-studies that address the research questions. The four sub-studies include verifying the applicability of lexical similarity measures in distinguishing semantic relations, examining collocational congruency based on similarity measures, investigating the relationship between congruency effects and collocational performance, and gaining insights into L2 learners’ cognitive factors in congruency processing. The second section describes the profiles of the research participants. The third section discusses the research instruments, including computational similarity measures via the WordNet online system, a collocation test, a collocation use questionnaire, and the think-aloud method. The fourth section addresses the data collection procedure, followed by an introduction to the data analysis methods in the last section.

3.1 Research Design

The theme of the current research was using semantic similarity measures to evaluate congruency and analyzing congruency effects on L2 collocation learning.

The overall research design involved four sub-studies to address the four research questions, as shown in Figure 3.1.1. The first three sub-studies conducted quantitative analyses to examine the applicability of semantic similarity measures, to evaluate congruency classification, and to investigate congruency effects on collocation learning. The fourth sub-study surveyed and elicited factors of learners’ collocational priming with both quantitative and qualitative analyses.

The operational research framework, as depicted in Figure 3.1.2, was composed of research components with their investigative roles and relations. The participants provided attributes of proficiency levels, collocation test data, questionnaire responses, and think-aloud data. The lexical similarity measure was used as a research instrument, whose utility was examined in the first sub-study to address the first research question. The collocation test was the core of the study and also a research instrument to gauge learners’ performance. The application of the similarity measure to collocations led to an objective examination of the notion of congruency, which was the focus of the second sub-study and addressed the second research question. Learners’ performance on collocation use was analyzed in the third sub-study to address the third research question, with learners’ proficiency level and collocational congruency as two independent variables. Finally, the questionnaire and think-aloud method were also research instruments, drawing on data from the participants and their test performance to address the fourth research question in the fourth sub-study.

Figure 3.1.1 Overall Research Design (four sub-studies: examining the applicability of similarity measures, evaluating congruency classification, investigating congruency effects on collocation learning, and surveying and eliciting factors of collocational priming)

Figure 3.1.2 Operational Research Framework (relating the participants, collocation test, similarity measure, congruency, and proficiency level to research questions RQ1-RQ4)

The first sub-study involved verifying the applicability of semantic similarity measures. Based on the literature review, it was found that WordNet (Miller, 1995) incorporated eight computational algorithms of semantic similarity measures and provided convenient online use. Of the eight algorithms, two (Adapted Lesk and Gloss Vectors) were selected in terms of their computational features and measuring stability. Three sets of word pairs with different semantic relations were composed and tested for lexical similarity values with the two measures. The distinguishability of similarity values between different semantic relations was examined so as to establish the use of the semantic similarity measures as a research instrument.
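
To give a sense of how gloss-based measures behave on word pairs with different semantic relations, the sketch below uses a simplified gloss-overlap count. This is only an approximation of the Adapted Lesk idea via the NLTK WordNet interface, not the WordNet::Similarity implementation used in this study, and the word pairs are illustrative choices.

```python
# Simplified gloss-overlap sketch (an approximation of the Adapted Lesk idea,
# not the exact WordNet::Similarity algorithm used in the study).
from nltk.corpus import wordnet as wn

def gloss_overlap(syn_a, syn_b) -> int:
    """Count word forms shared by the two synsets' dictionary glosses."""
    return len(set(syn_a.definition().lower().split())
               & set(syn_b.definition().lower().split()))

# Illustrative word pairs with different semantic relations.
pairs = [
    ('car.n.01', 'truck.n.01'),   # closely related concepts
    ('car.n.01', 'wheel.n.01'),   # part-whole relation
    ('car.n.01', 'banana.n.01'),  # largely unrelated
]
for a, b in pairs:
    print(a, b, gloss_overlap(wn.synset(a), wn.synset(b)))
```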

The second sub-study applied the two semantic similarity measures to the experimental set of collocations so as to objectively evaluate the properties of congruency. Semantic similarity between a collocate and a word transferred from its L1 counterpart was quantified by the two computational semantic similarity measures.

The distribution of semantic similarity values was observed and analyzed for congruency classification. Congruency derived from lexical semantic similarity was cross-examined against congruency based on human judgment. Statistical and analytical comparisons were made, which led to a further understanding of the potential advantage of exploiting semantic similarity for congruency evaluation. The quantitative results and analytic derivation addressed the second research question.
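
A simple way to picture the classification step is a threshold over the similarity values, as in the hedged sketch below; the item labels, scores, and threshold are all placeholders rather than values derived in this study.

```python
# Hedged sketch: classifying items as congruent or incongruent by comparing
# similarity values to a threshold. Item names, scores, and the threshold are
# placeholders, not values derived in this study.
from typing import Dict

def classify_congruency(similarities: Dict[str, float],
                        threshold: float) -> Dict[str, str]:
    """Label each item by comparing its similarity value to a threshold."""
    return {item: ("congruent" if value >= threshold else "incongruent")
            for item, value in similarities.items()}

scores = {"item_01": 0.81, "item_02": 0.74, "item_03": 0.22}
print(classify_congruency(scores, threshold=0.5))
```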

In the third sub-study, it was hypothesized that L2 collocations whose semantic components were disparate from those of their corresponding L1 counterparts would be more challenging to L2 learners. In other words, incongruent collocations would be error-prone for L2 learners. It was also conjectured that learners’ proficiency levels played a role in the interaction between congruency and performance. The objective of the experiment was to find the relationship between two independent variables, congruency and proficiency level, and one dependent variable, L2 collocation performance. The experimental results and the analytic derivation addressed the third research question and provided a comparison with previous research findings in the literature. For example, Yamashita and Jiang (2010) indicated that L1-L2 congruency and L2 exposure influenced L2 collocational acquisition. They also suggested that incongruent L2 collocations were more difficult to learn, but once stored in the lexical network, they were processed independently of L1 linguistic competence. However, the current study does not tackle the issue of whether L2 collocational knowledge develops in parallel with vocabulary knowledge, as this aspect is beyond the research scope.
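
One common way to analyze a two-factor design of this kind is a factorial analysis of variance. The sketch below, which is not necessarily the analysis adopted in this study, fits such a model with statsmodels on synthetic placeholder scores; the data frame and column names are assumptions for illustration only.

```python
# Hedged sketch: two-factor analysis of collocation scores by congruency and
# proficiency level. All data below are synthetic placeholders.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "score":       [8, 9, 7, 6, 5, 4, 9, 8, 6, 5, 4, 3],
    "congruency":  ["congruent"] * 6 + ["incongruent"] * 6,
    "proficiency": ["high", "mid", "low"] * 4,
})

# Fit a linear model with main effects and their interaction.
model = ols("score ~ C(congruency) * C(proficiency)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```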

In the fourth sub-study, a questionnaire survey was conducted to collect learners’ conceptions about collocational congruency processing. L2 learners’ conceptions of L1-L2 congruency processing could reveal more in-depth thoughts about how they processed congruent and incongruent collocations. It was conjectured that learners’ conceptions acted on lexical semantic processing and that this processing activated collocational priming and production. As such, an investigation of learners’ conceptions of L2 collocation processing could provide a fuller understanding of the conditions under which congruency is a facilitator or a hindrance in L2 collocation production.

After the questionnaire survey, the think-aloud protocol was compiled to gain insights into the learners’ dynamic thinking processes within a certain context of engagement. The fourth sub-study was beneficial for exploring how L2 learners constructed collocational meanings. The protocol also elicited learners’ introspective thoughts on L2 collocational congruency processing, which led to a potential induction of learning factors. Such a process-oriented study design was well suited to collecting narrative accounts. Based on this orientation, the present study adopted the think-aloud method to look into the sophisticated performance of language learners’ thinking. The contents of the think-aloud protocols represented explicit linguistic knowledge, and think-aloud could be used as a complement after language tests to further elicit more explicit knowledge. It was also useful for identifying general trends and patterns in the processing. With the questionnaire responses and the transcribed think-aloud data, both quantitative and qualitative analyses were conducted for comparison and interpretation, which addressed the fourth research question.

3.2 Participants

The present study used convenience sampling on account of administrative constraints. Three participant groups were classified into advanced, intermediate, and lower levels. A total of 345 college students from a medical university and a nursing junior college in Taiwan were recruited for the present investigation. The first group of participants consisted of medical and dentistry majors who were enrolled in the Freshman English conversation course for two semesters. Their English proficiency was more advanced, at or above the B1 level of the CEFR (Common European Framework of Reference for Languages: Learning, Teaching and Assessment) required by the general education center of the medical university. Their levels were close to the intermediate-to-high level of the GEPT (General English Proficiency Test), or close to the advanced level of the TOEIC (a score of at least 550). The second group was composed of pharmacy majors who held GEPT certificates at the intermediate level or TOEIC scores of at least 400. The third group consisted of nursing majors at a junior college. On average, the English proficiency of the third group ranged from the beginning to low-intermediate levels of the GEPT, or approximately from the A1 to A2 levels of the CEFR.

All of the participants were classified into three proficiency groups, namely Group 1 for the high level, Group 2 for the mid level, and Group 3 for the low level, as profiled in Table 3.2.1. All of the participants took the collocation test and then completed a questionnaire in Mandarin. Based on the questionnaire responses, twenty-one participants were selected for the think-aloud sub-study.

Table 3.2.1 Profile of Participant Groups and English Proficiency

Group      Major                        N    GEPT                   TOEIC         CEFR
G1 High    Medical Science & Dentistry  122  Advanced-Intermediate  550           B1
G2 Middle  Pharmacy                     112  Intermediate           400           A2~B1
G3 Low     Nursing                      111  Low                    325 or lower  A1~A2

3.3 Instruments

The instruments included two semantic similarity measures, a collocation test, a questionnaire, and the think-aloud protocol. By the operational definition, a collocation is formed by a collocate and a base. Given a pair of equivalent L2 and L1 collocations, the subject of study is usually the semantic relation between the pair of cross-linguistic collocates. However, the semantic similarity measures were all designed to operate on word pairs of the same language. To evaluate the semantic relation between a pair of cross-linguistic collocates with the semantic similarity measures, an L2 transferred word of the L1 collocate was used as a surrogate that embedded the word sense of the L1 collocate. As a design feature, the semantic similarity measures also allowed similarity evaluation between word pairs either over all word senses or over designated word senses. When operated over all word senses, the measures computed all possible combinations of word senses and returned the highest value, reflecting the most similar senses of the two words. Alternatively, when a particular sense of each word was selected, the measures provided the similarity value of the two designated word senses.
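
The all-senses mode described above can be thought of as taking the maximum similarity over every pairing of the two words’ senses. The sketch below illustrates this with NLTK’s Wu-Palmer similarity used only as a stand-in measure; the study itself used Adapted Lesk and Gloss Vectors through the WordNet::Similarity service.

```python
# Hedged sketch: "all senses" mode as the maximum similarity over all sense
# pairs, versus a designated-sense comparison. Wu-Palmer similarity is a
# stand-in for the measures actually employed in the study.
from itertools import product
from nltk.corpus import wordnet as wn

def max_similarity(word1: str, word2: str) -> float:
    """Highest similarity over all noun-sense combinations of the two words."""
    pairs = product(wn.synsets(word1, pos=wn.NOUN), wn.synsets(word2, pos=wn.NOUN))
    return max((s1.wup_similarity(s2) or 0.0) for s1, s2 in pairs)

# All-senses mode: the most similar senses of the two words.
print(max_similarity("bank", "shore"))

# Designated-sense mode: compare two particular senses directly.
print(wn.synset("bank.n.01").wup_similarity(wn.synset("shore.n.01")))
```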

As a convenient and useful semantic search instrument, WordNet Search-3.1 was employed to consult word senses and glosses. As shown in Figure 3.3.1, the online system of WordNet Search-3.1 (Miller, 1995) was different from other online dictionaries because it showed not only lexical meanings and parts of speech but also synset relations and word relations. For the purpose of this study, WordNet Search-3.1 provided word sense observation and selection for both the L2 collocates and the L2 transferred words used as surrogates of the L1 collocates.

Figure 3.3.1 Semantic Search: Word Sense Selection

Figure 3.3.2 Online Use of Semantic Similarity Measures

The use of the two semantic similarity measures, Adapted Lesk and Gloss Vectors, as a research instrument was operationalized with the online service of WordNet::Similarity, as shown in Figure 3.3.2. WordNet::Similarity conveniently integrated the online service of WordNet Search-3.1 (Miller, 1995) with hyperlinks and provided semantic similarity calculation through a straightforward process of data input and result output. The process of calculating and retrieving lexical similarity values is discussed in the later section on the data collection procedure.

The second instrument was a contextualized collocation test to evaluate collocation production. Forty items were selected according to collocation properties such as restriction, semantic transparency or opacity, and L1-L2 congruency.

Semantic congruency was one of the primary properties, indicating whether or not component words could be replaced with L2 near-synonyms transferred from the L1 word sense. In this study, L1 influence was considered one of the most crucial factors in collocation processing. The influence of L1 transferred meaning caused by non-equivalence with the L1 has been recognized by many researchers (Bahns & Eldaw, 1993; Caroli, 1998; Granger, 1998; Nesselhauf, 2003; Murao, 2004; Shehata, 2008). This study took L1 transfer into account to examine congruency relations and their influence on learners’ performance. The selection criteria for the collocation candidates included collocational type, length, and the property of restrictedness. The types were limited to two syntactic part-of-speech structures: verb-noun (VN) and adjective-noun (AdjN). The other properties of the collocation candidates were semi-restrictedness and a span of two words, i.e., the binary type.

According to the criteria, a set of collocation candidates was extracted from the collocation lists of previous studies on common miscollocations. Lin (2010) found that Taiwanese EFL learners commonly misused verb-noun collocations and provided a list of 210 miscollocations. Another study, conducted by Chen (2009), provided a collocation list of adjective-noun combinations. Three online retrieval tools, including WebCollocate (Chen, 2011), TANGO (Jian, Chang, & Chang, 2004), and ozdic.com (Oxford Dictionary Online; McIntosh, Francis, & Poole, 2009), were consulted to ensure the correctness of the collocation candidates. Table 3.3.1 summarizes the design of the collocation categories and the allocation of collocation items. In addition, word frequency was used as a selection criterion: only collocations with high-frequency words were considered, and the candidate collocation items were also selected to reach a balance of difficulty.

Table 3.3.1 Category Design and Allocation of Collocation Items

Collocation Type  Restriction  Length  Congruent  Incongruent
verb-noun         semi-        binary  10         10
adj.-noun         semi-        binary  10         10

The final set of the collocation test included four categories of collocation items: congruent verb-noun collocations, incongruent verb-noun collocations, congruent adjective-noun collocations, and incongruent adjective-noun collocations. Each category consisted of ten collocation items, as shown in Table 3.3.2, with given bases and expected collocates. The collocation test was conducted within limited class hours to evaluate productive knowledge of L2 collocations. All of the collocation items were embedded in contextual sentences in which the equivalent L1 collocations were given in Chinese, as presented in the appendix. The test required participants to complete a blank-filling task by providing a collocate (verb or adjective) for the given base (noun) in a contextual sentence. By filling in the blank word in a contextual sentence, participants demonstrated their productive knowledge of the target collocations.
