4.3 Pretest Three—Familiarity Rating Task for Ambiguous Words
4.3.5 Verification of the Ambiguous Words
To keep the experimental conditions for the two groups (homonymous words and
polysemous words) equated and to avoid other potential confounding factors, these
two groups were also controlled for a number of variables (in addition to word
frequency and experiential familiarity), including percentage of verb usage, meaning
frequency distribution, percentage among confounding homophones, number of
senses, number of syntactic categories, phonological neighborhood density, and
number of participant roles. The homonymy group and polysemy group were verified
to have no significant differences with regard to these variables so that the
experimental results would not be confounded. Through the verification process, the
potential outliers were screened out and thus 16 homonyms and 16 polysemes were
selected as the finalized stimuli (as primes). The verification of each potential
confounding factor is described below:
(1) Percentage of verb usage. We compared how often the two groups of
ambiguous words were used as verbs, both to avoid cross-categorical problems and
to ascertain that the two groups did not differ in their percentage of verb usage
[t (30) = .684, p = .249].
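The group comparisons reported in this section have 30 degrees of freedom, which is what a two-sample t-test with pooled variance yields for two groups of 16 items (16 + 16 - 2). The following is a minimal sketch of that statistic, assuming an equal-variance test was used; the function name pooled_t and the input scores are hypothetical placeholders, not the study's data. The p-value is omitted because it requires the t distribution (for instance via scipy.stats.ttest_ind).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical placeholder scores, used only to exercise the function.
t_value, df = pooled_t([1, 2, 3], [2, 3, 4])
print(f"t({df}) = {t_value:.3f}")
```

With two lists of 16 values each, the function returns df = 30, matching the degrees of freedom reported for the homonymy and polysemy groups.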
(2) Meaning frequency distribution (ranking percentage of the primary and
secondary meanings). Using the data collected from the sense ranking task, we
compared the strength of the two meanings of each ambiguous word in the two groups
to make sure that each selected ambiguous word was unbalanced, or polarized, in its
meaning distribution. That is, the frequency of the primary meaning of an ambiguous word
should be greater than that of its secondary meaning [t (30) = 4.946, p < .001 for the homonymy
group; t (30) = 3.359, p = .003 for the polysemy group]. The two groups were also
compared and did not differ in the ranking percentage of either the
primary or the secondary meaning [primary meaning: t (30) = .898, p
= .188; secondary meaning: t (30) = -.578, p = .284]. In addition, the difference in
ranking percentage between the primary and secondary meanings in the
homonymy group did not differ from that in the polysemy group [t (30) = 1.138, p
= .264].
(3) Influence of confounding homophones. Since the stimulus words would be
presented auditorily in a cross-modal experiment, it would be necessary to avoid the
possibility that the stimulus words might be confounded by their homophones. The
total frequency of each stimulus word’s homophones (i.e., the summed frequency of all words with the
same sound) was counted (according to data from Sinica Corpus 5.0 retrieved via
Chinese Wordsketch), and the proportion of the character frequency of each stimulus
word within the total frequency of all its homophones was then calculated. A
higher percentage indicated that a word was less likely to be confounded by
other homophones, and a lower percentage indicated that it was
more likely to be confounded by other homophones. For example, the
frequency of the prime word “送” was 1363 (based on Sinica Corpus 5.0), and its
homophones song4 included “送”, “宋”, “頌”, “訟” and “誦”. The total homophone
frequency of these words was 1801. The frequency percentage of the word “送”
among its confounding homophones was thus calculated as 1363/1801 = 75.68%.
By comparing the percentages of the two ambiguity groups, we verified that the two groups were
not significantly different in terms of their confounding homophones [t (30) = -.194, p
= .424].
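The percentage calculation in (3) can be sketched as follows, using only the two figures given for “送” (a word frequency of 1363 and a total homophone frequency of 1801). The function name homophone_share is a hypothetical label introduced for illustration; the frequencies of the individual homophones are not reported in the text and are therefore not reproduced here.

```python
def homophone_share(word_freq, total_homophone_freq):
    """Percentage of a homophone set's total frequency carried by one word."""
    if total_homophone_freq <= 0:
        raise ValueError("total homophone frequency must be positive")
    return 100 * word_freq / total_homophone_freq


# Figures reported for the prime word "送" (song4) in Sinica Corpus 5.0.
share = homophone_share(1363, 1801)
print(f"{share:.2f}%")  # prints "75.68%"
```

A value close to 100 means the stimulus word dominates its homophone set, so it is less likely to be confounded when presented auditorily.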
(4) Number of meanings/senses. To avoid the NOM (number-of-meaning)/NOS
(number-of-sense) effect reported by Borowsky and Masson (1996),
Lin (1999), Pexman and Lupker (1999), and Piercey and Joordens (2000), whereby
the number of senses a word has influences lexical processing, the
number of senses of each word in both ambiguity groups was counted (based
on Chinese Wordnet and MOE Revised Chinese Dictionary) and compared to
make sure that the two groups did not differ significantly in their numbers of
senses [t (30) = .164, p = .436].
(5) Number of syntactic categories. To remove the influence of the NOC
(number-of-category) effect (Huang & Chang, 2004; Huang et al., 2002; Tsai, 2005),
whereby the number of parts of speech a word has
affects lexical access, the number of syntactic categories of each word in
both ambiguity groups was counted (according to Chinese Wordnet and MOE
Revised Chinese Dictionary) and compared to ascertain that the two groups
did not differ in their numbers of syntactic categories [t (30) = .745, p
= .231].
(6) Phonological neighborhood density (size). Neighborhood density refers to the
number of word representations that sound like a given word. Words with few similar
sounding words or neighbors (with a sparse neighborhood) and those with many
similar sounding neighbors (with a dense neighborhood) may produce significantly
different effects in word recognition (Luce & Pisoni, 1998; Vitevitch & Rodríguez,
2005; Vitevitch & Stamer, 2006). In Chinese, the phonological neighborhood density of a
word is defined as the number of disyllabic (two-character) words that share the
sound of their initial constituent character with that word (Tsai, Lee, Lin, Tzeng, & Hung, 2006). For
example, disyllabic words such as ji4zhe3 ‘reporter’ (記者), ji4yi4 ‘memory’
(記憶), ji4hua4 ‘plan’ (計畫), and ji4nian4 ‘memorial’ (紀念) are all
neighbors of the word ji4. We counted the neighbors of each stimulus word and compared the two
groups with regard to phonological neighborhood density, based on the
data from the system SouCiXunZi ‘Search for Word and Character’ (搜詞尋字),16 in
order to validate that the two groups did not significantly differ [t (30) = .657, p
= .258].
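The counting rule in (6) can be illustrated with a short sketch. It assumes a lexicon stored as tuples of tone-marked syllables; this representation, the function name neighborhood_density, and the four-word lexicon (taken from the example above) are illustrative assumptions and do not reflect how the SouCiXunZi system stores or returns its data.

```python
def neighborhood_density(syllable, lexicon):
    """Count disyllabic words whose first syllable matches `syllable`."""
    return sum(
        1 for word in lexicon
        if len(word) == 2 and word[0] == syllable
    )


# The four neighbors of ji4 named in the text, as (syllable, syllable) pairs.
LEXICON = [
    ("ji4", "zhe3"),   # 記者 'reporter'
    ("ji4", "yi4"),    # 記憶 'memory'
    ("ji4", "hua4"),   # 計畫 'plan'
    ("ji4", "nian4"),  # 紀念 'memorial'
]

density = neighborhood_density("ji4", LEXICON)
print(density)  # prints 4
```

Matching on the syllable rather than the character is what makes 記者 and 計畫 neighbors of the same word, even though their first characters differ.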
16 “搜詞尋字” is an online retrieval system (http://words.sinica.edu.tw/) run by the Institute of Linguistics, Academia Sinica. This system consists of a five-million-word corpus for users
(7) The argument structure of verbs. Li, Shu, Liu, and Li
(2006) indicated that information about a verb’s arguments is an integral part of the mental
representation of verbs, and that this information is accessed on-line during
sentence processing. Similarly, Ahrens & Swinney (1995) and Ahrens (2003)
suggested that the number of participant roles (or thematic roles) associated with the
central sense of the verb is crucial information for lexical access in sentence
processing. Using a cross-modal lexical decision task, they demonstrated
that reaction times following verbs with three participant roles were longer than those
following verbs with one or two participant roles. For example, the two-role verb kick was integrated
faster into the sentence “It was to Robert that the football with a logo was
KICKED” than the three-role verb give was into the sentence “It was to Jen that the rabbit
from Mike was GIVEN”. It was thus suggested that the number of participant roles
associated with a verb influences the verb’s rate of integration into the sentence
(Ahrens, 2003). Accordingly, we counted and compared the stimulus words of the two
ambiguity groups with regard to the number of participant roles associated with the
central sense of each stimulus word, to make sure that the two groups were
not significantly different in terms of this variable [t (30) = .591, p = .279]. Moreover,
all of the stimulus words were checked (on the basis of Chinese Wordnet and MOE
Revised Chinese Dictionary) to be transitive verbs in order to make the stimulus
words homogeneous.
Therefore, the homonymy group and the polysemy group were compared and
confirmed not to differ on the above seven factors, nor on word
frequency [t (30) = -.435, p = .333] or word familiarity [t (30) = .646, p = .202].
In other words, the conditions for these two groups of ambiguous words were not
significantly different except for their sense relatedness rating scores [t (30) = -14.048,
p < .001], which was an important variable manipulated in the present study.
Please refer to Appendix 11 for the items and complete statistical data.
In addition to 16 homonyms and 16 polysemes, a list of 16 unambiguous words
was constructed and added as filler items. The homonymy group, polysemy group and
unambiguity group were also compared and they did not differ with regard to word
frequency [F(2, 45) = .428, p = .654], percentage of verb usage [F(2, 45) = 1.028, p
= .366], influence of homophones [F(2, 45) = .041, p = .96], number of syntactic
categories [F(2, 45) = 1.642, p = .205], number of participant roles [F(2, 45) = 1.047,
p = .36] and neighborhood density [F(2, 45) = 1.576, p = .218].
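The three-group comparisons above report F(2, 45), which corresponds to a one-way ANOVA over three groups of 16 items (3 - 1 = 2 between-group and 48 - 3 = 45 within-group degrees of freedom). The following is a minimal sketch of how such an F statistic is computed; the function name one_way_anova and the input values are hypothetical placeholders, not the study's data, and the p-value is omitted because it requires the F distribution (for instance via scipy.stats.f_oneway).

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of items around their own group mean.
    ss_within = sum(
        (x - m) ** 2
        for group, m in zip(groups, group_means)
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical placeholder scores, used only to exercise the function.
f_value, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F({df_b}, {df_w}) = {f_value:.3f}")
```

Passing three lists of 16 values each gives degrees of freedom (2, 45), as in the comparisons of the homonymy, polysemy, and unambiguity groups.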
Table 4.1 summarizes the statistical data of the experimental stimuli with respect
to each of the variables. The complete list of the experimental items (48 words in total)
is provided in Appendix 11.
Table 4.1. The statistical data of the three groups of prime words
Variables Prime groups N Mean SD T-test ANOVA
homophones Unambiguity 16 60.51 40.78
F(2, 45) = .041,