Lexical access and models - Processing the spoken language signal

Chapter 2 Literature Review

2.1 Processing the spoken language signal

National Chengchi University (國立政治大學)

stimuli have been identified, the cues for the consonants are lost, and only the coded

stimuli remain. Additionally, because vowels are relatively longer in duration, the time course of perception suggests that vowels are processed longer at the auditory level than consonants (Carroll, 2008; Frauenfelder & Tyler, 1987; Garman, 1990).

2.1.2 Lexical access and models

Beyond the discrimination and categorization of phonetic segments, many researchers have extended the inquiry to the processes by which spoken words are recognized and their meanings retrieved. Psycholinguists seek to understand how listeners use phonological and prosodic knowledge to parse the sensory input during word recognition (Grosjean & Gee, 1987; Lyn, 1987; Uli H & Tyler, 1987).

Models of spoken word recognition generally assume that phonological information is integrated continuously as a word is heard. As the speech unfolds, lexical candidates compete for recognition as a function of their phonological similarity to the speech input (Foss & Hakes, 1978; Garman, 1990; Gleason & Ratner, 1998; Myers, Laver, & Anderson, 1981). The models differ in how they explain the temporal dynamics of mapping the incoming speech stimuli onto potential lexical representations.

One influential model is the Cohort model (W. D. Marslen-Wilson & Welsh, 1978; William D & Marslen-Wilson, 1987). The Cohort model proposes that the onset of a word activates a set of lexical candidates that compete for recognition. In the first, autonomous stage, when the first phoneme of a word is heard, all candidates that share that onset are activated. For example, when the phoneme /d/ of the word “drive” is heard, it may activate many candidates beginning with /d/, such as “dive,” “drink,” “date,” and “dunk.” This set of activated words is called the “cohort.” The words in the cohort are not assumed to affect one another's activation levels, which means that at this stage word recognition is a completely data-driven, bottom-up process. In the second stage, once a cohort has been activated, all possible sources of information may begin to influence the selection of the target word from the cohort. Additional acoustic-phonetic information may eliminate some of the cohort words, and this incoming phonetic information is assumed to operate in a strictly left-to-right fashion. In this stage, however, higher-level sources of information may also help to eliminate candidates from the cohort. For instance, if the phoneme /r/ follows the phoneme /d/, this further acoustic-phonetic information may eliminate cohort words such as “date” and “dunk.” Higher-level sources of information may then eliminate other cohort words, such as “dive” and “drink,” that do not fit the available semantic or syntactic information. Spoken word recognition is finally achieved when a single candidate remains in the cohort. A later, revised version of the Cohort model also considers other sources of information, such as the word frequency effect (Frauenfelder & Tyler, 1987; Gleason & Ratner, 1998; Jusczyk & Luce, 2002; W. Marslen-Wilson & Tyler, 1980; William D & Marslen-Wilson, 1987).
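The two stages described above can be illustrated with a minimal Python sketch. The toy lexicon, its rough phoneme transcriptions, and the names `initial_cohort`, `narrow`, and `context_ok` are all assumptions made for illustration and do not come from the Cohort literature. Note that in this sketch strict left-to-right matching already removes “dive” once /r/ is heard, so only “drink” is left for the higher-level filter to reject.

```python
# Toy lexicon: word -> rough phoneme sequence (illustrative transcriptions only).
LEXICON = {
    "drive": ["d", "r", "ai", "v"],
    "drink": ["d", "r", "i", "ng", "k"],
    "dive": ["d", "ai", "v"],
    "date": ["d", "ei", "t"],
    "dunk": ["d", "uh", "ng", "k"],
    "cat": ["k", "ae", "t"],
}


def initial_cohort(first_phoneme, lexicon=LEXICON):
    """Stage 1 (bottom-up): activate every word whose onset matches the input."""
    return {w for w, ph in lexicon.items() if ph and ph[0] == first_phoneme}


def narrow(cohort, heard, lexicon=LEXICON, context_ok=None):
    """Stage 2: drop words that mismatch the phonemes heard so far (left to right),
    then drop words rejected by an optional higher-level context filter."""
    survivors = {w for w in cohort if lexicon[w][:len(heard)] == heard}
    if context_ok is not None:
        survivors = {w for w in survivors if context_ok(w)}
    return survivors


# Hearing /d/ activates the cohort; hearing /r/ prunes it.
cohort = initial_cohort("d")
after_r = narrow(cohort, ["d", "r"])
# Hypothetical stand-in for semantic or syntactic fit: the context rejects "drink".
final = narrow(after_r, ["d", "r"], context_ok=lambda w: w != "drink")
```

Here recognition is reached when `final` contains a single word. The context filter is deliberately a plain predicate, since the text does not specify how semantic or syntactic constraints are computed.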

The TRACE model (McClelland & Elman, 1986) is an interactive model that assumes three levels of primitive processing units: features, phonemes, and words (Figure 2). These units have excitatory connections between levels and inhibitory connections within levels, so that the activation level of each node is raised or lowered according to the stimulus input and the activity elsewhere in the system. For example, voiced stimuli such as the consonants /b/, /d/, or /g/ activate the voicing unit at the feature level. This unit in turn passes activation to all voiced phonemes at the next level, which in turn activate the words containing those phonemes. Furthermore, through lateral inhibition among the units within a level, the most activated unit may come to dominate competing units that are also temporarily consistent with the input. For example, the word unit “cat” at the lexical level will inhibit similar, competing lexical units (e.g., “pat”). This inhibition helps to ensure that the best candidate word wins the competition (Gleason & Ratner, 1998; Jusczyk & Luce, 2002; McClelland & Elman, 1986).
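The effect of lateral inhibition at the word level can be sketched in a few lines of Python. This is not the TRACE implementation: it models only the word level, omits the feature level, top-down feedback, and time slices, and the update rule, the parameter values, and the names `phoneme_support` and `compete` are assumptions chosen for illustration.

```python
def phoneme_support(heard, word_phonemes):
    """Bottom-up support: share of positions where input and word phonemes agree."""
    matches = sum(1 for h, p in zip(heard, word_phonemes) if h == p)
    return matches / max(len(heard), len(word_phonemes))


def compete(support, steps=30, excite=0.2, inhibit=0.3, decay=0.1):
    """Iterate word activations: excitation from below, inhibition from rivals."""
    act = {w: 0.0 for w in support}
    for _ in range(steps):
        new = {}
        for w in act:
            # Lateral inhibition: summed activation of all other word units.
            rivals = sum(a for v, a in act.items() if v != w)
            delta = excite * support[w] - inhibit * rivals - decay * act[w]
            # Clamp activations to the range [0, 1].
            new[w] = min(1.0, max(0.0, act[w] + delta))
        act = new  # synchronous update of all units
    return act


heard = ["k", "ae", "t"]  # rough transcription of "cat" (illustrative)
words = {"cat": ["k", "ae", "t"], "pat": ["p", "ae", "t"]}
support = {w: phoneme_support(heard, ph) for w, ph in words.items()}
activations = compete(support)
```

With these assumed parameters, “pat” initially gains some activation from its partial overlap with the input, but the better-supported “cat” suppresses it as the iterations proceed.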

Figure 2. A subset of the units in the TRACE. Each rectangle represents a different unit. The labels indicate the item for which the unit stands, and the horizontal edges of the rectangle indicate the portion of the TRACE spanned by each unit. The input feature specifications for the phrase “tea cup,” preceded and followed by silence, are indicated for the three illustrated dimensions by the blackening of the corresponding feature units (McClelland & Elman, 1986).

There are differences between the Cohort and the TRACE models (Table 1). First, the Cohort model emphasizes the temporal dynamics of spoken word recognition. It stresses the significance of word-initial information, which means that spoken words may be identified before their offsets if similar competitors are not active. The TRACE model, in contrast, duplicates its nodes and connections across successive time slices of the input. This treatment of temporal dynamics is questionable, because the time-slice solution results in an extremely complex structure. Second, although the TRACE model is relatively complex, its highly interactive architecture gives it computational specificity, which makes it relatively easy to test directly through simulations of behavior. This feature therefore helps the model account for a broad range of phenomena. By contrast, the Cohort model lacks this interactivity and is consequently less computationally specific. Last, the Cohort model emphasizes the exact match between the auditory input and lexical representations rather than sublexical representations, whereas the TRACE model includes a phoneme level between the feature level and the word level (Jusczyk & Luce, 2002).
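The claim that a word may be identified before its offset can be made concrete by computing the earliest position at which a word diverges from every other entry in a lexicon, often called the uniqueness point in the Cohort literature. The following is a minimal sketch under assumed data: the toy lexicon, the rough transcriptions, and the function name `uniqueness_point` are illustrative only.

```python
# Toy lexicon: word -> rough phoneme sequence (illustrative transcriptions only).
TOY_LEXICON = {
    "drive": ["d", "r", "ai", "v"],
    "drink": ["d", "r", "i", "ng", "k"],
    "date": ["d", "ei", "t"],
    "cat": ["k", "ae", "t"],
    "cats": ["k", "ae", "t", "s"],
}


def uniqueness_point(word, lexicon):
    """Return the number of phonemes after which `word` has no remaining rivals,
    or None if some other word still matches the whole of `word`."""
    target = lexicon[word]
    for n in range(1, len(target) + 1):
        rivals = [
            w for w, ph in lexicon.items()
            if w != word and ph[:n] == target[:n]
        ]
        if not rivals:
            return n
    return None


drive_up = uniqueness_point("drive", TOY_LEXICON)  # unique before the final /v/
cat_up = uniqueness_point("cat", TOY_LEXICON)      # "cats" never drops out
```

In this toy lexicon “drive” becomes unique at its third phoneme, one phoneme before its offset, whereas “cat” never becomes unique because “cats” remains a competitor throughout.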

Table 1. Features of the Cohort and the TRACE models (Jusczyk & Luce, 2002)

Feature                                         Cohort                                   TRACE
Activation                                      Constrained                              Radical
Units and levels lateral inhibition             No                                       Yes
Sublexical-to-lexical interaction (bottom-up)   Facilitative and inhibitory              Facilitative
Lexical-to-sublexical interaction (top-down)    No                                       Facilitative
Distinguishing features                         1. Focus on time-course of recognition   3. Attempts to account for broad range of phenomena

From the document: The temporal processing of tone in Chinese spoken lexical access: Evidence from eye movements - NCCU Academic Hub (pp. 26-31)