Chapter 2 Literature review
2.2 Empirical accounts on compound composition
The previous section showed that most linguists appeal to lexicalization to explain VOC
patterns. Among psycholinguists, however, there is still no consensus on the processing
mechanism of VOCs; that is, how people process a VOC remains unclear. Do we retrieve a
unitary word representation from the lexicon, or do we comprehend a VOC through a
combinatorial mechanism, as when understanding a phrase?
In psycholinguistics, the representation and processing of compound words, and of
morphologically complex words more generally, remain controversial. Much research has
focused on whether morphologically complex words are stored in the mental lexicon in
their full form, or whether only their morphemes are stored and then combined to process
complex word forms. The former view is embodied in full-listing models (Butterworth,
1983; Bybee, 1995) and the latter in full-parsing models (Libben, Derwing, & de Almeida,
1999; Taft, 2004; Taft & Forster, 1976).
Alternatively, another group of researchers proposes that both mechanisms may be
invoked, a view known as dual-route models (Isel, Gunter, & Friederici, 2003; Koester,
Gunter, & Wagner, 2004, 2007; Zwitserlood, 1994). Dual-route models assume two
processing routes: a complex word can either be accessed as a stored whole or be
decomposed into its morphological constituents (Baayen, Dijkstra, & Schreuder, 1997;
Isel, Gunter, & Friederici, 2003).
To decide among these models, many studies have examined the semantic decomposition of
compounds in the auditory and visual modalities (Coolen et al., 1993; Isel et al., 2003;
Libben, 1993; Libben et al., 1999), manipulating two variables: word frequency and
semantic transparency.
Word frequency plausibly matters in compound processing because more frequent words are
more likely to benefit from readily available whole-form storage, whereas less frequent
compounds might have to be processed through a combinatorial mechanism. Semantic
transparency is also an influential factor. Because transparent compounds (e.g.
blue+berry) carry no idiosyncratic meaning, they do not need distinct lexical
representations, and their processing may closely resemble the syntactic rules that link
words in a sentence. By contrast, the meaning of an opaque compound (e.g. straw+berry)
cannot be derived by combining the meanings of its constituents, so opaque compounds may
rely on whole-form lexical storage. It remains controversial, however, whether opaque
compounds are accessed solely in a full-listing fashion, as the studies reviewed below
illustrate.
To explore whether the meanings of individual constituents are accessed during compound
processing, a number of behavioral studies have used semantic priming paradigms. Lexical
decision times to two-constituent transparent compounds were shorter when the preceding
prime was semantically related to either the first or the second constituent of the
target compound (Sandra, 1990; Zwitserlood, 1994). On the basis of this result, it has
been argued that transparent compounds undergo combinatorial processing.
Zhou et al. (2000) sought further evidence in a cross-modal priming study. Visually
presented transparent compounds were primed by the prior auditory presentation of both
the first and the second constituents, but the effect was absent for opaque compounds.
In line with these findings, another cross-modal semantic priming study showed that the
prosodic cue of the initial morpheme of a compound can help the processing system
activate a decompositional route at the offset of the morpheme (Isel et al., 2003). This
facilitation occurred only for compounds with a transparent head, not for compounds with
an opaque head.
However, in a lexical decision task using a repetition priming paradigm (Libben,
Gibson, Yoon, & Sandra, 2003), presenting either the first or the second constituent as a
prime speeded lexical decisions for both opaque and transparent compounds. This implies
that constituents are accessed for both types of compound. Furthermore, eye movement
studies that directly compared the processing of transparent and opaque compounds, with
constituent frequencies and whole-word frequencies matched between the two types, found
no differences on any eye movement measure for either English (Frisson,
Niswander-Klement, & Pollatsek, 2008) or Finnish (Pollatsek & Hyönä, 2005) stimuli.
Again, the results suggest that transparent and opaque compounds are processed by
similar mechanisms. More interestingly, Gagné et al. (2009) found that both transparent
and opaque compounds were processed more quickly than monomorphemic words, which again
indicates that the lexical entries of constituents are accessed in compound processing
regardless of semantic transparency.
In sum, most of the behavioral studies reviewed here used semantic priming paradigms.
Although the role of transparency in compound processing is still under debate, these
studies generally report decomposition effects, which are consistent with full-parsing
and dual-route models but not with full-listing models.
Access to compound constituents has also been studied neurophysiologically in recent
years. Event-related potentials (ERPs), with their high temporal resolution, are a
suitable tool for investigating such fast psycholinguistic processes.
Before the ERP studies are reviewed, the N400 (Kutas & Hillyard, 1980) is briefly
introduced. The N400 is a broad negative deflection of the ERP that starts 200–300 ms
after a word has been presented auditorily or visually and peaks at approximately 400 ms.
This negative-going wave is usually largest over central and parietal electrode sites,
with a slightly larger amplitude over the right hemisphere than over the left. It is
typically seen in response to violations of semantic expectancy; more generally, it is
elicited by meaningful stimuli and is thought to reflect access to or integration of
conceptual information (Kutas & Federmeier, 2011). The N400 effect has also been taken to
reflect the difficulty of integrating local lexical semantics into the sentence or
discourse representation (Van Berkum et al., 1999; Van Berkum, Brown, Hagoort, &
Zwitserlood, 2003) or the difficulty of lexical access (Kutas & Federmeier, 2000).
Studies on compounds are rather few once studies on derivational and inflectional
morphology are excluded (e.g., Katz, 1991; Li et al., 1993; Carlisle, 2000; Myers, 2006).
However, studies on idioms and English verbal phrases can still provide
neurophysiological evidence for morphological decomposition or composition.
To begin with, two recent studies attempted to measure the combinatorial process itself,
using the N400 as an index of the lexico-semantic integration of compound constituents
(Koester et al., 2007; Zhang et al., 2013). Koester et al. (2007) showed that transparent
compounds elicited a larger N400 than opaque compounds, suggesting a combinatorial
mechanism for transparent compounds. Zhang et al. (2013) investigated the time course of
Chinese idiom comprehension and the effects of compositionality. Chinese idioms with
varying degrees of compositionality and non-idiomatic phrases, primed by their literal
interpretations, were presented visually to subjects performing a semantic judgment task.
The results showed a graded modulation of the N400 for the idioms: stimuli with high
compositionality (e.g. ju jing hui shen 聚精會神 “concentrate one's attention and energy
on”) elicited the smallest ERP effects and those with low compositionality (e.g. yao ya
qie chi 咬牙切齒 “gnash the teeth in anger”) the largest. This result was again taken to
support the view that compositionality may induce a larger N400.
To summarize, the processing of compounds, at least of transparent compounds, appears to
operate combinatorially. For opaque compounds, the evidence is not yet sufficient to
support a conclusion. The situation seems more complex for Chinese VOCs, which can be
separated like a verbal phrase yet have opaque meanings. Although no existing study can
be referred to for VOCs directly, studies on English verbal phrases may be informative,
because English verbal phrases pattern similarly to Chinese VOCs.
As with Chinese VOCs, there is considerable linguistic debate on whether verbal phrases
(e.g., turn up, break down) are processed as two separate words connected by a syntactic
rule or whether they form a single lexical unit. Views also differ on whether meaning
(transparency vs. opacity) plays a role in determining their syntactically connected or
lexical status. Because linguistic arguments had not produced a consensus, Cappelle et
al. (2010) used magnetoencephalography (MEG) to address the issue. Using a multi-feature
Mismatch Negativity (MMN) design in which subjects were instructed to ignore the speech
stimuli, they recorded magnetic brain responses to particles (up, down) presented
auditorily as infrequent “deviant” stimuli in the context of frequently occurring verb
“standard” stimuli. Already at latencies below 200 ms, magnetic brain responses were
larger to particles appearing in existing phrasal verbs (e.g. rise up) than to particles
appearing in non-existing combinations (e.g. *fall up), regardless of whether the
particle carried a literal or a metaphorical sense (e.g. rise up, heat up). Previous
research had found that the MMN is relatively enhanced when speech is linked to a single
word but relatively reduced when there is a syntactic and semantic match between two
words linked by phrase-structure rules (Pulvermüller & Shtyrov, 2003; Pulvermüller et
al., 2008). The increased brain activation to particles in real phrasal verbs reported
by Cappelle et al. thus provides neurophysiological support for the view that a congruent
verb–particle sequence is not a syntactic combination but behaves more like a lexical
unit.
In short, according to the psycholinguistic and neurolinguistic literature, transparent
compounds (i.e. similar to VOPs in Chinese) are processed with a
decomposition/integration mechanism, and the larger N400 effect can thus be regarded as
the cost of integration. The processing mechanism of opaque compounds (i.e. similar to
VOCs in Chinese), on the other hand, remains unclear.
Chapter 3
Methodology
This chapter describes the current experiment. The participants are described in
section 3.1, the materials in section 3.2, the settings and procedure in section 3.3, and
the data analysis in section 3.4.
3.1 Participants
Thirty-nine native speakers of Chinese (20 to 35 years old, mean age = 23, 25 females)
were recruited for the experiment. All participants were right-handed according to a
simplified version of the Edinburgh Handedness Inventory (Oldfield, 1971) and had normal
or corrected-to-normal vision. None of them had neurological or psychiatric disorders.
Written informed consent was obtained from all participants before the experiment
started. They were paid for their participation after completing the experimental task.
3.2 Materials
The materials were sentences containing VO-structured verbs, with two factors
manipulated: transparency (Transparent, In between, Opaque) and sentence pattern
(Separated, Unseparated). Crossing the two factors yielded six conditions: OS (opaque,
separated), OU (opaque, unseparated), IS (in between, separated), IU (in between,
unseparated), TS (transparent, separated) and TU (transparent, unseparated). Example
sentences are provided in Table 1.
Table 1: Example sentences of the current experiment (TS, transparent, separated; TU, transparent, unseparated; IS, in between, separated; IU, in between, unseparated; OS, opaque, separated; OU, opaque, unseparated).
Condition  Transparency  Sentence structure  Example sentence
OS         Opaque        Separated           司 機 熬 了 夜 si ji ao le ye ‘The driver stayed up late’
OU         Opaque        Unseparated         司 機 熬 夜 了 si ji ao ye le ‘The driver stayed up late’
IS         In between    Separated           胖 子 跑 了 步 pang zi pao le bu ‘The fat guy ran’
IU         In between    Unseparated         胖 子 跑 步 了 pang zi pao bu le ‘The fat guy ran’
TS         Transparent   Separated           助 理 犯 了 錯 zhu li fan le cuo ‘The assistant made a mistake’
TU         Transparent   Unseparated         助 理 犯 錯 了 zhu li fan cuo le ‘The assistant made a mistake’
For the separated conditions (OS, IS and TS), a fixed morpheme, le 了, was inserted into
the VO sequence, for two reasons: (1) it has been reported that when a frequently used
aspectual morpheme such as -le 了, -zhe 著 or -guo 過 is inserted, the VO sequence should
still be viewed as a word (Siewierska et al., 2010); (2) -le 了 has been reported to be
the most frequently used element inserted into the VO sequence (Smith, 1999; Wang,
2009). As a result, all the experimental stimuli had one of the following sentence
structures:
{[Subject]N.+[V le O] Com./Phr.}Sen. or
{[Subject] N.+[VO le] Com./Phr.}Sen.
Note: “N” means Noun, “Com.” means Compound, “Phr.” means Phrase and
“Sen.” means Sentence.
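The two sentence templates and the six condition labels can be sketched in a few lines of Python. This is only an illustrative sketch: the function name build_sentence, the dictionaries and the example item are introduced here for illustration and are not part of the original stimulus-construction procedure.

```python
from itertools import product

# Condition labels as used in Table 1: transparency letter + sentence-pattern letter.
TRANSPARENCY = {"O": "opaque", "I": "in between", "T": "transparent"}
PATTERN = {"S": "separated", "U": "unseparated"}


def build_sentence(subject, verb, obj, pattern, marker="了"):
    """Return a stimulus string following one of the two templates.

    "S" (separated):   Subject + V + le + O
    "U" (unseparated): Subject + V + O + le
    """
    if pattern == "S":
        return subject + verb + marker + obj
    if pattern == "U":
        return subject + verb + obj + marker
    raise ValueError("pattern must be 'S' or 'U'")


# Crossing 3 transparency levels with 2 sentence patterns gives 6 conditions.
conditions = [t + p for t, p in product(TRANSPARENCY, PATTERN)]

# Example item taken from Table 1 (opaque verb ao ye).
separated = build_sentence("司機", "熬", "夜", "S")
unseparated = build_sentence("司機", "熬", "夜", "U")
print(conditions, separated, unseparated)
```

With a two-character subject and a two-character VO verb, both templates produce five-character sentences, which matches the length constraint on the stimuli.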
First, verb-object (VO) structured verbs were selected for the experiment. Note that, in
order to observe the lexicalization of the VO sequence, the final division of the verbs
into three semantic transparency conditions (Opaque, In between, Transparent) was based
on the results of a pilot test completed by native speakers of Mandarin Chinese. The
three-way classification made before the pilot test was therefore only provisional.
The materials in the Opaque and In between groups came mainly from Smith’s (1999)
dissertation and Wang’s (2009) and Zhang’s (2013) master’s theses. The materials in
Smith’s and Zhang’s studies were drawn from the Academia Sinica corpus, using the tags
related to the VO sequence, [spv.] and [spo.] (see footnote 1). The main sources for the
Transparent group were PRACTICAL AUDIO-VISUAL CHINESE (新版視聽華語) and the Sinica
Corpus.
A total of 623 critical verbs were initially selected and then narrowed down in the
following steps. The first step was to control word frequency by looking up each verb’s
log frequency in the Chinese Word Sketch Engine (http://wordsketch.ling.sinica.edu.tw/).
The gigaword2all corpus, constructed by the Institute of Linguistics, Academia Sinica,
contains at least a billion Chinese lexical items. Verbs with no recorded frequency were
filtered out. The remaining materials (577 verbs in total) were then roughly classified
by their grammatical tags in the corpus. The verbs in the Opaque group (226 verbs) and
the In between group (152 verbs) came mainly from the sources mentioned above; they were
all intransitive verbs, usually tagged as VA, VB or VH (see footnote 2). Because the
Transparent group (199 verbs) consisted of transparent VO sequences, most of its verbs
were tagged as VC+Na, with VC referring to a transitive verb and Na referring to a noun
(e.g. 寫[VC]詩[Na] “write a poem”).
1 [spv] and [spo] are two features designed by the Academia Sinica Corpus.
[spv] marks the verb and [spo] the noun of a separable V N compound, e.g. 吃 Vc[+spv]了他的虧 Na[+spo].
2 [VA] Intransitive Action Verb 動作不及物動詞.
[VB] Intransitive-like Action Verb 動作類及物動詞.
[VH] Intransitive Stative Verb 狀態不及物動詞.
To finalize the classification of the selected verbs into the Opaque, In between and
Transparent conditions, a pilot test on semantic transparency was conducted. The verbs in
the three provisional groups were distributed evenly across four questionnaires: three
contained 144 verbs and one contained 145. Each questionnaire was rated by 30
Chinese-speaking subjects aged 20 to 40 (see Appendix A for the details of the semantic
transparency questionnaire). The subjects judged, on a 5-point scale (1: least different,
5: very different), how much a verb’s meaning in use differs from the combined meaning of
its constituents. Take chi cu 吃醋 “being jealous” as an example: since chi means “eat”
and cu means “vinegar”, the combined meaning of chi cu is “eating vinegar”, which is very
different from its meaning in use, “being jealous”. A subject might therefore rate this
verb 5 (very different). Based on the subjects’ ratings, the mean scores for the three
groups were 3.65 (SD = .45) for the Opaque group, 2.40 (SD = .25) for the In between
group and 1.53 (SD = .14) for the Transparent group. The three groups differed
significantly, F(2, 58) = 297.02, p < .001.
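The degrees of freedom reported here, (2, 58), are what a one-way repeated-measures ANOVA gives for 30 raters and three transparency levels: (3 - 1) and (30 - 1) x (3 - 1). The text does not state which procedure or software was used, so the following is only a sketch under that assumption. The ratings are simulated, not the pilot data; the reported group means serve only as centres for the simulated values, and the function name rm_anova_oneway is hypothetical.

```python
import random


def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per rater, one column per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(scores)        # number of raters
    k = len(scores[0])     # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # The error term is what remains after removing condition and rater effects.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Simulated per-rater mean ratings (NOT the real data): 30 raters x 3 groups,
# centred on the reported group means for Opaque, In between and Transparent.
rng = random.Random(0)
centres = (3.65, 2.40, 1.53)
simulated = [[rng.gauss(c, 0.3) for c in centres] for _ in range(30)]

f_value, df1, df2 = rm_anova_oneway(simulated)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

The F value printed for the simulated ratings will not match the reported 297.02; the sketch only shows how the degrees of freedom and the F ratio arise under the stated assumption.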
After the three groups (Opaque, In between, Transparent) were formed, other variables
that might confound the experiment were controlled. First, the word frequency of the
VO-structured verbs (obtained from the gigaword2all corpus in the Chinese Word Sketch
Engine mentioned above) did not differ among the three groups, either in the VO
unseparated form, F(2, 58) = 1.65, p = .200, or in the VO separated form, F(2, 58) = 2.67,
p = .078. Second, two further pilot tests were conducted to ensure that concreteness and
familiarity were matched among the three groups of verbs (see Appendices B and C for the
details of the familiarity questionnaire and the concreteness questionnaire). Two more
groups of 30 subjects (20-40 years old, native speakers of Mandarin Chinese) judged the
concreteness and the familiarity of the verbs on 5-point scales (1: least concrete, 5:
very concrete; 1: least familiar, 5: very familiar). These subjects took part in neither
the transparency pilot test nor the later formal experiment. The three groups of stimuli
did not differ significantly in concreteness, F(2, 58) = 2.21, p = .118, or in
familiarity, F(2, 58) = 1.08, p = .346. Finally, the neighborhood size of the verbs was
controlled. Neighborhood size was computed automatically with the Python-based Load and
Analysis Chinese Corpus-Natural Language Toolkit (LACC-NLTK), using the tagged texts of
the Academia Sinica Balanced Corpus (http://tm.itc.ntnu.edu.tw/CNLP/?q=node/6). There was
no significant difference among the three groups, F(2, 58) = 1.43, p = .247.
Once the critical VO verbs were finalized, the sentence stimuli were constructed. All
the sentences, including the fillers, consisted of five characters. In the critical
sentences, the subject was always a two-character animate noun. Because all the sentences
needed to sound natural, the subject of each sentence stimulus was chosen to be related
to its critical verb, e.g. 胖子+跑步了 “The fat guy + ran.” The correlation between the
subjects and the verbs was checked, and it did not differ significantly among the three
groups of sentence stimuli, F(2, 58) = 1.48, p = .236. The number of strokes of the verbs
was also statistically equivalent across the three groups, F(2, 58) = .42, p = .657.
Finally, in a last pilot test (see Appendix D for the details of the sentence naturalness
questionnaire), a new group of 30 subjects (20-40 years old, native speakers of Mandarin
Chinese) rated the naturalness of the sentences on a 5-point scale (1: least natural, 5:
very natural). Ideally, the sentence stimuli would be rated as equally natural across all
conditions, but the VO unseparated sentences received significantly higher scores than
the VO separated sentences. This was not surprising, since previous literature has
pointed out that the unseparated form is the main usage of VOCs (Yi, 2007). Although
there was a main effect of separability, F(1, 29) = 54.75, p < .001, and a significant
separability x transparency interaction, F(2, 58) = 4.58, p = .014, with the unseparated
condition rated higher than the separated condition, all the sentences could still be
counted as natural, since the mean score in every condition was at least 4.37 out of 5
(TS = 4.53, TU = 4.65, IS = 4.48, IU = 4.76, OS = 4.37, OU = 4.75). The results of the
pilot tests are summarized in Table 2.
Table 2. Summary of the statistical results for all the controlled variables.
Ratings/values                               Factors                                F value              p
VO verbs
Semantic transparency                        Semantic transparency (3 levels)       F (2, 58) = 297.02   .000**
Sentence naturalness                         Separability (2 levels)                F (1, 29) = 54.75    .000**
                                             Semantic transparency (3 levels)       F (2, 58) = .76      1
                                             Separability x Semantic transparency   F (2, 58) = 4.57     .014*
Sentence naturalness in VO unseparated form  Semantic transparency (3 levels)       F (2, 58) = 2.12     1
Sentence naturalness in VO separated form    Semantic transparency (3 levels)       F (2, 58) = 2.74     .072
Correlation between subjects and verbs       Semantic transparency (3 levels)       F (2, 58) = 1.48     1
Note. *p < .05, **p < .001
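As an illustrative check on the naturalness ratings, the sketch below recomputes the separated and unseparated averages from the six condition means reported in the text. It assumes an equal number of sentences per condition, so that the unweighted average of the condition means equals the marginal mean; the variable and function names are introduced here for illustration only.

```python
# Mean naturalness ratings per condition, as reported in the text.
means = {"TS": 4.53, "TU": 4.65, "IS": 4.48, "IU": 4.76, "OS": 4.37, "OU": 4.75}


def marginal_mean(pattern):
    """Average the condition means that share a sentence pattern ('S' or 'U')."""
    values = [v for cond, v in means.items() if cond.endswith(pattern)]
    return sum(values) / len(values)


separated_mean = marginal_mean("S")
unseparated_mean = marginal_mean("U")

# Unseparated-minus-separated gap within each transparency level.
gaps = {t: round(means[t + "U"] - means[t + "S"], 2) for t in "TIO"}

lowest = min(means.values())
print(round(separated_mean, 2), round(unseparated_mean, 2), gaps, lowest)
```

Under these assumptions the unseparated sentences average about 4.72 against about 4.46 for the separated ones, and the gap grows from the Transparent group (0.12) through the In between group (0.28) to the Opaque group (0.38). This pattern is in line with the reported main effect of separability and the separability x transparency interaction.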
There were 30 sentence stimuli in each condition, for a total of 180 critical sentences.
Ninety fillers were also added: 23 ungrammatical VO separated sentences (e.g. 粽子違著規
zong zi wei zhe gui ‘The rice dumpling is breaking rules’), 23 ungrammatical VO
unseparated sentences (e.g. 麵粉摸彩著 mian fen mo cai zhe ‘The flour is drawing lots’) and
44 non-VO grammatical sentences (蚊子很討厭 wen zi hen tao