Empirical accounts on compound composition

Chapter 2 Literature review

2.2 Empirical accounts on compound composition

In the previous section, it seems that most linguists agreed to use lexicalization to

explain VOC patterns. However, there is still no consensus on the processing mechanism of

VOCs among psycholinguists. In other words, how people process VOCs remains unclear. Are

we retrieving a unitary word representation from our lexicon? Or do we comprehend a

VOC with a combinatorial mechanism, as when understanding a phrase?

In the field of psycholinguistics, the representation and processing of compound words,

and morphologically complex words more generally, remains a controversial issue. Much

psycholinguistic research has focused on the question whether morphologically complex

words are stored in the mental lexicon in their full form or whether only their morphemes

are stored and then combined to process complex word forms. The former view is captured by

full-listing models (Butterworth, 1983; Bybee, 1995) and the latter by

full-parsing models (Libben, Derwing, & de Almeida, 1999; Taft, 2004; Taft & Forster,

1976).

Alternatively, another group of researchers proposes that both mechanisms may

be invoked, a position known as dual-route models (Isel, Gunter, & Friederici, 2003; Koester,

Gunter, & Wagner, 2004, 2007; Zwitserlood, 1994). In dual-route models, two routes of

processing are assumed. A complex word can either be stored completely or be decomposed

into its morphological constituents (Baayen, Dijkstra, & Schreuder, 1997; Isel, Gunter, &

Friederici, 2003).

In order to decide among these models, many studies have been designed to explore

semantic decomposition of compounds in the auditory and visual modalities (Coolen et al.,

1993; Isel et al., 2003; Libben et al., 1999; Libben, 1993) with two variables being

manipulated: word frequency and semantic transparency.

It is conceivable that the frequency of the word matters in compound processing since

more frequent words are more likely to benefit from readily available whole-form storage,

whereas less frequently used compounds might have to be processed through a

combinatorial mechanism. Semantic transparency also influences compound

processing. Since transparent compounds do not contain idiosyncratic meaning, they

do not need distinct lexical representations, and consequently their processing mechanism

may be very similar to the syntactic rules linking words in a sentence (e.g. blue+berry). By

contrast, the meaning of an opaque compound cannot be derived by combining the

meanings of its constituents (e.g. straw+berry) and thus may rely on whole-form lexical

storage. However, it should be noted that the claim that opaque

compounds are accessed solely in a full-listing fashion is still controversial, as illustrated below.

To explore whether the meanings of individual constituents are accessed during

compound processing, a number of behavioral studies have been conducted with semantic

priming paradigms. These studies show that, regardless of whether the preceding prime is semantically

related to the first or the second constituent of the target compound, lexical decision times

to two-constituent transparent compound words are shortened (Sandra, 1990;

Zwitserlood, 1994). Based on this result, it has been argued that combinatorial processing is

carried out for transparent compounds.

To provide further evidence, Zhou et al. (2000) conducted a cross-modal priming

study. Their results showed that visually presented

transparent compound words were primed by the prior auditory presentation of both first

and second compound constituents, but the effect was absent for opaque compounds. In line

with these findings, another cross-modal semantic priming study showed that the prosodic

cue of the initial morpheme of a compound can help the processing system

activate a decompositional route at the offset of the morphemes (Isel et al., 2003). This

facilitation occurred only in compound words with a transparent head, not in

compound words with an opaque head.

However, in a lexical decision task using a repetition priming paradigm (Libben,

Gibson, Yoon, & Sandra, 2003), the results showed that the presentation of either the first or

second constituent as a lexical prime speeded up lexical decisions for both opaque and

transparent compounds. This implies that constituents are accessed for both transparent

and opaque compounds. Furthermore, in eye movement studies which directly compared

processing of transparent and opaque compounds (frequencies of constituents and the

frequency of the whole-word forms were equal between the two types), no differences were

obtained on any eye movement measure for either English (Frisson, Niswander-Klement, &

Pollatsek, 2008) or Finnish (Pollatsek & Hyönä, 2005) stimuli. Again, the results suggest

that transparent and opaque compounds are processed by a similar mechanism. More

interestingly, in a recent study conducted by Gagné et al. (2009), both transparent and

opaque compounds were processed more quickly than monomorphemic words,

which again indicates that the lexical entries of constituents are accessed in compound

processing regardless of semantic transparency.

In sum, the behavioral studies under review mostly used semantic

priming paradigms. Although the role of transparency in compound processing is still under

debate, the above studies generally report decomposition effects, which are consistent

with full-parsing and dual-route models but not with full-listing models.

Access to compound constituents has also been studied neurophysiologically in

recent years. Event-related potentials (ERPs), with their high temporal resolution, are a

suitable tool for investigating such fast psycholinguistic processes.

Before the ERP studies are reviewed, the N400 (Kutas & Hillyard, 1980) will be

briefly introduced here. The N400 response is a broad negative deflection of the ERP that

starts 200–300 ms after a word has been presented auditorily or visually and peaks after

approximately 400 ms. This negative-going wave is usually largest over central and parietal

electrode sites, with slightly larger amplitude over the right hemisphere than over the left

hemisphere. The N400 is typically seen in response to violations of semantic expectancies.

More generally, it is elicited by meaningful stimuli and is thought to reflect

access to or integration of conceptual information (Kutas & Federmeier, 2011). It is also

assumed that the N400 effect reflects the difficulty of integrating local lexical semantics

into the sentence/discourse representation (Van Berkum et al., 1999; Van Berkum, Brown,

Hagoort, & Zwitserlood, 2003) or the difficulty of lexical access (Kutas & Federmeier,

2000).

Studies on compounds are rather few if studies on derivational and inflectional morpheme

processing are excluded (e.g., Katz, 1991; Li et al., 1993; Carlisle, 2000; Myers, 2006).

However, studies on idioms and English verbal phrases can still provide neurophysiological

evidence for morphological decomposition or composition.

To begin with, two recent studies have attempted to measure the combinatorial

process itself, using the N400 brain response as an index of lexico-semantic integration of

compound constituents (Koester et al., 2007; Zhang et al., 2013). Koester et al. (2007) showed

that transparent compounds elicited a larger N400 than opaque compounds, suggesting a

combinatorial mechanism for transparent compounds. Another recent

ERP study (Zhang et al., 2013) investigated the time course of Chinese idiom

comprehension and the effects of compositionality. In that study, Chinese idioms with varying

degrees of compositionality and non-idiomatic phrases, primed by their literal interpretations,

were visually presented to subjects, who performed a semantic judgment task. The results

showed a graded modulation of the N400 for the Chinese idioms, with stimuli with high

compositionality (e.g. ju jing hui shen 聚精會神 “concentrate one's attention and energy on”)

eliciting the smallest ERP effects and those with low compositionality (e.g. yao ya qie chi 咬牙切齒 “gnash the teeth in anger”) the largest. The result was again taken to support the view that

compositionality may induce a larger N400.

To summarize, it has been inferred that the processing of compounds, at least

transparent compounds, operates combinatorially. As for opaque compounds, the

evidence is still insufficient to draw a conclusion. When it comes to VOCs in Chinese,

which can be separated just like a verbal phrase while their meanings are

opaque, the situation appears more complex. Although no existing literature addresses them directly,

studies on English verbal phrases might be helpful because English verbal

phrases pattern similarly to Chinese VOCs.

As with the controversy over Chinese VOCs, there is considerable linguistic debate on

whether verbal phrases (e.g., turn up, break down) are processed as two separate words

connected by a syntactic rule or whether they form a single lexical unit. The views differ on

whether meaning (transparency vs. opacity) plays a role in determining their

syntactically-connected or lexical status. As linguistic arguments could not reach a

consensus, Cappelle et al. (2010) adopted magnetoencephalography (MEG) to address the

issue. By applying a multi-feature Mismatch Negativity (MMN) design with subjects

instructed to ignore speech stimuli, Cappelle et al. recorded magnetic brain responses to

particles (up, down) auditorily presented as infrequent “deviant” stimuli in the context of frequently occurring verb “standard” stimuli. Already at latencies below 200 ms, magnetic

brain responses were larger to particles appearing in existing phrasal verbs (e.g. rise up)

than to particles appearing in non-existing combinations (e.g. *fall up), regardless of

whether particles carried a literal or metaphorical sense (e.g. rise up, heat up). Previous

research found that MMN is relatively enhanced if speech is linked to a single word, but

relatively reduced in the case of a syntactic and semantic match between two words linked

by phrase-structure rules (Pulvermüller & Shtyrov, 2003; Pulvermüller et al., 2008). The

increased brain activation to particles in real phrasal verbs reported in Cappelle et al.’s

study thus provided neurophysiological support for the view that a congruent verb–particle sequence is

not in a syntactic relationship but is more like a lexical unit.

In short, according to the literature in both psycholinguistics and neurolinguistics,

transparent compounds (i.e. similar to VOPs in Chinese) are processed with a

decomposition/integration mechanism. The larger N400 effect can thus be regarded as the

cost of integration. On the other hand, the processing mechanism of opaque compounds (i.e.

similar to VOCs in Chinese) is still obscure.

Chapter 3 Methodology

In this chapter, the current experiment is described. First, the participants of the

experiment are described in section 3.1. Materials are introduced in section 3.2. Settings and

procedure of the experiment are reported in section 3.3. Finally, the process of data analysis

is presented in section 3.4.

3.1 Participants

Thirty-nine native speakers of Chinese (20 to 35 years old, mean age = 23, 25 females)

were recruited for the experiment. All participants were right-handed according to a

simplified version of the Edinburgh Handedness Inventory (Oldfield, 1971). They all had

normal or corrected-to-normal vision. None of the participants had neurological or psychiatric

disorders. Written informed consent was obtained from all participants before the experiment

started. They were paid for their participation after they completed the experimental task.

3.2 Materials

The materials were sentences containing VO-structured verbs, with two factors

manipulated: transparency (Transparent, In between and Opaque) and sentence

pattern (Separated, Unseparated). Crossing them yielded six conditions: OS (opaque,

separated), OU (opaque, unseparated), IS (in between, separated), IU (in between,

unseparated), TS (transparent, separated), TU (transparent, unseparated). The example

sentences are provided in Table 1.

Table 1: Example sentences of the current experiment (TS, transparent, separated; TU, transparent, unseparated; IS, in between, separated; IU, in between, unseparated; OS, opaque, separated; OU, opaque, unseparated).

Condition  Transparency  Sentence structure  Example sentence
OS         Opaque        Separated           司機熬了夜 si ji ao le ye ‘The driver stayed up late’
OU         Opaque        Unseparated         司機熬夜了 si ji ao ye le ‘The driver stayed up late’
IS         In between    Separated           胖子跑了步 pang zi pao le bu ‘The fat guy ran’
IU         In between    Unseparated         胖子跑步了 pang zi pao bu le ‘The fat guy ran’
TS         Transparent   Separated           助理犯了錯 zhu li fan le cuo ‘The assistant made a mistake’
TU         Transparent   Unseparated         助理犯錯了 zhu li fan cuo le ‘The assistant made a mistake’

For the separated conditions (OS, IS and TS), the fixed morpheme le 了 was inserted into the

VO sequence for the following two reasons: (1) it has been reported that when a

frequently used aspectual morpheme such as -le 了, -zhe 著, or -guo 過 is inserted, the VO

sequence should be viewed as a word (Siewierska et al., 2010); (2) -le 了 has been reported to be

the most frequently used interposing element for the VO sequence (Smith, 1999; Wang,

2009). As a result, all the experimental stimuli had one of the following sentence structures:

{[Subject]_N. + [V le O]_Com./_Phr.}_Sen. or

{[Subject]_N. + [VO le]_Com./_Phr.}_Sen.

Note: “N.” means Noun, “Com.” means Compound, “Phr.” means Phrase and

“Sen.” means Sentence.
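The two templates above can be sketched in a few lines of code. This is only an illustration, assuming each stimulus is stored as a subject, a one-character verb and a one-character object; the function name build_sentence is hypothetical and is not part of the original materials.

```python
def build_sentence(subject: str, verb: str, obj: str, separated: bool) -> str:
    """Assemble a stimulus sentence from one of the two templates.

    separated=True  -> Subject + V + le + O   (conditions OS, IS, TS)
    separated=False -> Subject + V + O + le   (conditions OU, IU, TU)
    """
    le = "了"  # the fixed aspectual morpheme used in the critical sentences
    if separated:
        return subject + verb + le + obj
    return subject + verb + obj + le


# Examples taken from Table 1 (opaque condition)
print(build_sentence("司機", "熬", "夜", separated=True))   # 司機熬了夜
print(build_sentence("司機", "熬", "夜", separated=False))  # 司機熬夜了
```

Keeping the verb and the object as separate fields means the same item can be rendered in both the separated and the unseparated form, so the two versions of a sentence differ only in the position of le.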

First, Verb-Object structured verbs were selected for the experiment. It should be noted

beforehand that, in order to observe lexicalization of the VO sequence, the final division of

semantic transparency into three conditions (Opaque/In between/Transparent) was based on

the results of a pilot test completed by native speakers of Mandarin Chinese. Therefore, the

classification into three groups in terms of semantic transparency before the pilot test was only

provisional.

The materials in the Opaque group and the In between group were mainly drawn from the

following references: Smith’s (1999) dissertation and Wang’s (2009) and Zhang’s (2013)

master’s theses. The materials in Smith’s and Zhang’s studies were from the Academia

Sinica corpus, with the tags related to the VO sequence, [spv] and [spo]¹. As for the materials in

the Transparent group, Practical Audio-Visual Chinese (新版視聽華語) and the

Sinica Corpus were the main sources.

A total of 623 critical verbs were selected. The set was then narrowed down in the

following steps. The first step was to control word frequency, which was done by

looking up the log frequency in the Chinese Word Sketch Engine

(http://wordsketch.ling.sinica.edu.tw/). There were at least a billion Chinese lexical items in

the gigaword2all corpus, which was constructed by the Institute of Linguistics, Academia

Sinica. Verbs with no recorded frequency were filtered out. The remaining materials (577

verbs in total) were then classified roughly by their grammatical tags in the corpus. In the

Opaque group (226 verbs) and the In between group (152 verbs), the verbs were mainly

from the sources mentioned above. The verbs in these two groups

were all tagged as intransitive verbs, usually VA, VB and VH². As for the Transparent group

(199 verbs), since this group was composed of transparent VO sequences, most of the verbs

in this group were tagged as VC+Na, with VC referring to a transitive verb and Na referring

to a noun (e.g. 寫[VC]詩[Na] “write a poem”).

¹ [spv] and [spo] are two features designed by the Academia Sinica Corpus. [spv] marks the verb and [spo] marks the noun of a separable V-N compound, e.g. 吃 Vc[+spv]了他的虧 Na[+spo].

² [VA] Intransitive Action Verb 動作不及物動詞; [VB] Intransitive-like Action Verb 動作類及物動詞; [VH] Intransitive Stative Verb 狀態不及物動詞.

To finalize the classification of the selected verbs into the Opaque, In between and

Transparent conditions, a pilot test on semantic transparency was conducted. The verbs from

the three provisionally classified groups were distributed evenly across four questionnaires for

subjects to rate. Three questionnaires contained 144 verbs and one contained 145 verbs, and

each of them was rated by 30 Chinese-speaking subjects ranging from 20 to 40 years old (see

Appendix A for the details of the semantic transparency questionnaire). The subjects were

asked to judge how different a verb’s meaning in use is from the combined meaning of its

constituents on a 5-point scale (1: least different, 5: very different).

Taking chi cu 吃醋 “being jealous” as an example, since

chi means “eat” and cu means “vinegar”, the combination chi cu means “eating vinegar”, which

is very different from its meaning in use, “being jealous”. A subject might consequently rate this

verb 5 (very different). Based on the subjects’ ratings, the mean scores for the verbs in the

three groups were: Opaque group = 3.65 (SD = .45); In between group = 2.40 (SD = .25);

Transparent group = 1.53 (SD = .14). Statistical analysis showed a

significant difference among the three groups, F(2, 58) = 297.02, p < .001.
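The questionnaire sizes follow from dividing the 577 verbs as evenly as possible over four lists. The sketch below assumes a simple round-robin assignment; the thesis does not state how verbs were actually assigned, and split_evenly and the placeholder labels are hypothetical.

```python
def split_evenly(items, n_lists):
    """Deal items round-robin into n_lists lists whose sizes differ by at most one."""
    lists = [[] for _ in range(n_lists)]
    for index, item in enumerate(items):
        lists[index % n_lists].append(item)
    return lists


verbs = [f"verb_{i}" for i in range(577)]  # placeholder labels, not the real stimuli
questionnaires = split_evenly(verbs, 4)
print(sorted(len(q) for q in questionnaires))  # [144, 144, 144, 145]
```

If the input list is ordered by provisional group, dealing round-robin also spreads each group across the four questionnaires, which is one way to obtain the even distribution described above.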

After these three groups (Opaque/In between/Transparent) were formed, other

variables that might confound the experiment were further controlled. First of all, the word

frequency (obtained from the gigaword2all corpus in the Chinese Word Sketch Engine

mentioned above) of the VO-structured verbs did not differ significantly across the three groups,

either in the VO unseparated form, F(2, 58) = 1.65, p = .200, or in the VO separated form, F(2, 58) =

2.67, p = .078. Moreover, two other pilot tests were conducted to ensure that the degrees of

concreteness and familiarity were equal among the three groups of verbs (see Appendices B

and C for the details of the familiarity questionnaire and concreteness questionnaire).

Another two groups of 30 subjects (20-40 years old, native speakers of Mandarin Chinese)

were recruited to judge the concreteness and familiarity of the verbs on a 5-point scale (1:

least concrete, 5: very concrete; 1: least familiar, 5: very familiar). These two groups of

subjects participated in neither the previous transparency pilot test nor the later formal

experiment. The results of these two pilot tests showed that there was no significant

difference among the three groups of stimuli in terms of concreteness, F(2, 58) = 2.21,

p = .118, or familiarity, F(2, 58) = 1.08, p = .346. Finally, the neighborhood size of

the verbs in the three groups was controlled. Neighborhood size was

computed automatically with the Python-based Load and Analysis Chinese

Corpus-Natural Language Toolkit (LACC-NLTK), based on the Academia Sinica Balanced

Corpus with tagged texts (http://tm.itc.ntnu.edu.tw/CNLP/?q=node/6). There was no

significant difference among the three groups, F(2, 58) = 1.43, p = .247.
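The degrees of freedom reported above, F(2, 58), are those of a one-way repeated-measures ANOVA with 30 units (raters or matched items) and three levels: 3 - 1 = 2 and (30 - 1)(3 - 1) = 58. The thesis does not say which analysis was run, so the sketch below rests on that inference; the function name rm_anova and the toy ratings are illustrative only and are not the study's data.

```python
def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per unit (e.g. rater), one value per condition.
    Returns (F, df_effect, df_error).
    """
    n = len(scores)        # number of units
    k = len(scores[0])     # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    unit_means = [sum(row) / k for row in scores]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_unit = k * sum((m - grand) ** 2 for m in unit_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_unit  # residual after removing unit differences

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Toy ratings: 3 units x 3 conditions (made up for illustration)
f_value, df1, df2 = rm_anova([[1, 2, 3], [2, 3, 5], [3, 4, 4]])
print(round(f_value, 2), df1, df2)  # 9.0 2 4
```

With 30 rows and three columns the same function returns degrees of freedom of 2 and 58, matching the form of the statistics reported in this section.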

As soon as the critical VO-structured verbs were finalized, construction of the sentence

stimuli began. All the sentences, including fillers, consisted of five characters. In

the critical sentences, the subject of a sentence was always a two-character animate noun.

Since all the sentences needed to be natural enough for the experiment, the subject of each sentence

stimulus was chosen to be correlated with its critical verb, e.g. 胖子+跑步了 “The fat guy + ran.” The correlation between the subjects and the verbs of the three groups of sentence

stimuli was checked. The statistical results showed that there was no significant difference

among the three groups of sentence stimuli in terms of correlation, F(2, 58) = 1.48, p = .236.

Furthermore, the number of strokes of the verbs (F(2, 58) = .42, p = .657) was statistically

equivalent across the three groups of sentences. Finally, a last pilot test (see Appendix D for the

details of the sentence naturalness questionnaire) was conducted, in which a new group of 30

subjects (20-40 years old, native speakers of Mandarin Chinese) rated the naturalness of the

sentences on a 5-point scale (1: least natural, 5: very natural). Ideally, all the sentence stimuli

should be rated as equally natural across all the conditions, but VO unseparated sentences received

significantly higher scores than VO separated sentences. However, this was not

surprising, since previous literature has pointed out that the VO unseparated form is the main

usage of VOCs (Yi, 2007). Although the statistical results showed a main effect of

separability on sentence naturalness (F(1, 29)

= 54.75, p < .001) and a significant separability x transparency interaction, F(2, 58) = 4.58,

p = .014, with the unseparated condition receiving higher ratings than the separated condition, all the

sentences could still be counted as natural, since the mean scores of all the conditions were

no lower than 4.37 out of 5 (TS = 4.53, TU = 4.65, IS = 4.48, IU = 4.76, OS = 4.37, OU = 4.75). The results of the pilot tests are summarized in Table 2.

Table 2. The summary of the statistical results for all the controlled variables.

Rating/value                                 Factor (levels)                         F value             p
Semantic transparency (VO verbs)             Semantic transparency (3)               F(2, 58) = 297.02   .000**
Sentence naturalness                         Separability (2)                        F(1, 29) = 54.75    .000**
Sentence naturalness                         Semantic transparency (3)               F(2, 58) = .76      1
Sentence naturalness                         Separability x Semantic transparency    F(2, 58) = 4.57     .014*
Sentence naturalness, VO unseparated form    Semantic transparency (3)               F(2, 58) = 2.12     1
Sentence naturalness, VO separated form      Semantic transparency (3)               F(2, 58) = 2.74     .072
Correlation between subjects and verbs       Semantic transparency (3)               F(2, 58) = 1.48     1
Note. *p < .05, **p < .001
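The separability effect and the interaction in the naturalness ratings can be read off the six condition means reported in the text. The following sketch only redoes that arithmetic on the published means; it does not reproduce the ANOVA, and the variable names are illustrative.

```python
# Mean naturalness ratings per condition, as reported in the text
means = {"TS": 4.53, "TU": 4.65, "IS": 4.48, "IU": 4.76, "OS": 4.37, "OU": 4.75}

# Marginal means for the separability factor
separated = round(sum(means[c] for c in ("TS", "IS", "OS")) / 3, 2)
unseparated = round(sum(means[c] for c in ("TU", "IU", "OU")) / 3, 2)
print(separated, unseparated)  # 4.46 4.72

# Unseparated-minus-separated difference at each transparency level
gaps = {t: round(means[t + "U"] - means[t + "S"], 2) for t in ("T", "I", "O")}
print(gaps)  # {'T': 0.12, 'I': 0.28, 'O': 0.38}
```

The difference widens from the transparent to the opaque items, which is consistent with the reported separability x transparency interaction.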

There were 30 sentence stimuli for each condition, for a total of 180 critical sentences

in the experiment. Ninety fillers were also added, including 23 ungrammatical VO

separated sentences (e.g. 粽子違著規 zong zi wei zhe gui ‘The rice dumpling is breaking rules’), 23 ungrammatical VO unseparated sentences (e.g. 麵粉摸彩著 mian fen mo cai zhe ‘The

flour is drawing lots’) and 44 non-VO grammatical sentences (蚊子很討厭 wen zi hen tao
