Chapter 2 Literature Review
2.1 Processing the spoken language signal
國立政治大學 National Chengchi University
stimuli have been identified, the cues for the consonants are lost, and only the coded
stimuli remain. Additionally, because of the relatively longer duration of vowels, the
perception course suggests that vowels are processed longer at the auditory level than
consonants (Carroll, 2008; Frauenfelder & Tyler, 1987; Garman, 1990).
2.1.2 Lexical access and models
In addition to the discrimination and categorization of phonetic segments,
many researchers have extended the inquiry to the processes by which
spoken words are recognized and their meanings retrieved. Psycholinguists seek to
understand how listeners use phonological and prosodic knowledge to parse the
sensory input during word recognition (Grosjean & Gee, 1987; Lyn, 1987;
Frauenfelder & Tyler, 1987).
Models of spoken word recognition generally assume that phonological
information is integrated continuously as a word is heard. As the
speech unfolds, lexical candidates compete for recognition as a function of
their phonological similarity to the speech input (Foss & Hakes, 1978; Garman, 1990;
Gleason & Ratner, 1998; Myers, Laver, & Anderson, 1981). The models differ
in how they explain the temporal dynamics of the mapping between the
incoming speech stimuli and potential lexical representations.
One significant model is the Cohort model (W. D. Marslen-Wilson & Welsh,
1978; W. D. Marslen-Wilson, 1987). The Cohort model proposes that the onset of a
word activates a set of lexical candidates that compete for recognition. In the first,
autonomous stage, when the first phoneme of a word is heard, all candidates
that begin with that phoneme are activated. For example, when the /d/ of the
word “drive” is heard, many candidates beginning with /d/, such as “dive,”
“drink,” “date,” and “dunk,” may be activated. This set
of activated words is called the “cohort.” The words in the cohort are not assumed to
affect one another's activation levels, which means that at this stage word
recognition is a completely data-driven, bottom-up process. In the second stage,
once a cohort has been activated, all available sources of information
may begin to influence the selection of the target word from the cohort. Additional
acoustic-phonetic information may eliminate some of the cohort members; this incoming
information is assumed to operate in a strictly left-to-right fashion. At this stage,
however, higher-level sources of information may also help to eliminate
candidates. For instance, if /r/ follows /d/, this further acoustic-phonetic
information eliminates cohort members such as “dive,” “date,” and “dunk.”
Higher-level information may then eliminate remaining members such as “drink”
that do not fit the available semantic or syntactic context.
Spoken word recognition is achieved when a single candidate
remains in the cohort. A later, revised Cohort model also takes into account other sources
of information, such as word frequency (Frauenfelder & Tyler, 1987; Gleason &
Ratner, 1998; Jusczyk & Luce, 2002; W. Marslen-Wilson & Tyler, 1980;
W. D. Marslen-Wilson, 1987).
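The bottom-up narrowing of the cohort described above can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the lexicon, the broad phoneme labels, and the function names `cohort_trace` and `recognition_point` are hypothetical and introduced here for illustration, and the sketch covers only left-to-right phonetic elimination, not the higher-level semantic or syntactic sources of information or word frequency.

```python
# Toy lexicon with hypothetical broad phoneme labels (illustration only).
LEXICON = {
    "drive": ("d", "r", "ai", "v"),
    "drink": ("d", "r", "i", "ng", "k"),
    "dive": ("d", "ai", "v"),
    "date": ("d", "ei", "t"),
    "dunk": ("d", "uh", "ng", "k"),
    "cat": ("k", "ae", "t"),
}


def cohort_trace(phonemes, lexicon=LEXICON):
    """Return the sorted cohort that remains after each input phoneme."""
    cohort = set(lexicon)
    history = []
    for position, phoneme in enumerate(phonemes):
        # Keep only candidates whose phoneme at this position matches the
        # input, i.e. strictly left-to-right elimination.
        cohort = {
            word
            for word in cohort
            if len(lexicon[word]) > position
            and lexicon[word][position] == phoneme
        }
        history.append(sorted(cohort))
    return history


def recognition_point(phonemes, lexicon=LEXICON):
    """Return (1-based phoneme index, word) once one candidate remains."""
    for index, cohort in enumerate(cohort_trace(phonemes, lexicon), start=1):
        if len(cohort) == 1:
            return index, cohort[0]
    return None  # The input never narrowed the cohort to a single word.


if __name__ == "__main__":
    for step, cohort in enumerate(cohort_trace(LEXICON["drive"]), start=1):
        print(step, cohort)
    print(recognition_point(LEXICON["drive"]))
```

In this toy lexicon the input for “drive” is reduced to a single candidate at its third phoneme, before the word's offset, which is the kind of early recognition the Cohort model predicts when no similar competitors remain active.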
The TRACE model (McClelland & Elman, 1986) is an interactive model that
assumes three levels of primitive processing units: features, phonemes, and
words (Figure 2). The units have excitatory connections between
levels and inhibitory connections within levels, so that the activation level of each
node is raised or lowered according to the stimulus input and the
activity elsewhere in the system. For example, voiced stimuli such as the consonants
/b/, /d/, or /g/ activate the voiced feature at the feature level of the model.
This unit in turn passes activation to all voiced phonemes at the next
level, which in turn activate the words containing those phonemes. Furthermore, through
lateral inhibition among units within a level, the most activated unit may come to
dominate competing units that are also temporarily consistent with the input.
For example, the word unit cat at the lexical level will inhibit similar,
competing lexical units (e.g., pat). This inhibition helps to ensure that the best
candidate word will win the competition in the process (Gleason & Ratner, 1998;
Jusczyk & Luce, 2002; McClelland & Elman, 1986).
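The effect of lateral inhibition within a level can be sketched with a small Python simulation. This is not the TRACE implementation of McClelland and Elman (1986): the update rule, the parameter values, the bottom-up support values, and the function name `compete` are all assumptions chosen for illustration, and only a single level of word units is modeled rather than the full feature-phoneme-word network.

```python
def compete(bottom_up, steps=30, excite=0.2, inhibit=0.3, decay=0.1):
    """Iterate a toy within-level competition among word units.

    bottom_up maps each word to a fixed bottom-up support value in [0, 1].
    All parameter values are arbitrary illustrative choices.
    """
    activation = {word: 0.0 for word in bottom_up}
    for _ in range(steps):
        updated = {}
        for word, level in activation.items():
            # Lateral inhibition: summed activation of all rival units.
            rivals = sum(a for other, a in activation.items() if other != word)
            net = excite * bottom_up[word] - inhibit * rivals
            new_level = level + net - decay * level
            # Keep activations inside the range [0, 1].
            updated[word] = min(1.0, max(0.0, new_level))
        activation = updated
    return activation


if __name__ == "__main__":
    # Hypothetical support: the input matches "cat" slightly better than "pat".
    support = {"cat": 1.0, "pat": 0.7}
    print(compete(support))
    print(compete(support, inhibit=0.0))
```

With inhibition switched on, the better-supported unit suppresses its rival over successive updates; with `inhibit=0.0`, the rival is no longer held down, which illustrates why within-level inhibition is what lets a single best candidate win the competition.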
Figure 2. A subset of the units in the TRACE. Each rectangle represents a different unit. The labels indicate the item for which the unit stands, and the horizontal edges of the rectangle indicate the portion of the TRACE spanned by each unit. The input feature specifications for the phrase “tea cup,” preceded and followed by silence, are indicated for the three illustrated dimensions by the blackening of the corresponding feature units (McClelland & Elman, 1986).
There are several differences between the Cohort and TRACE models (Table 1). First,
the Cohort model emphasizes the temporal dynamics of spoken word recognition.
The Cohort model stresses the significance of word-initial information, which means that spoken
words may be identified before their offsets if similar competitors are not active.
The TRACE model, by contrast, handles time by duplicating its nodes and connections
across successive time slices of the input. This is a questionable treatment of
the temporal dynamics of spoken word recognition, because the time-slice solution results in
an extremely complex structure. Second, although the TRACE model is relatively
complex, its highly interactive architecture gives it computational specificity,
which makes it relatively easy to test directly through simulation of behavior
and helps it account for a broad range of phenomena. The Cohort model,
lacking this interactivity, is computationally less specific.
Last, the Cohort model emphasizes an exact match between the
auditory input and lexical representations rather than sublexical representations,
whereas the TRACE model has a phoneme level that mediates between the feature
level and the word level (Jusczyk & Luce, 2002).
Table 1. Features of the Cohort and TRACE models (Jusczyk & Luce, 2002)

                                     Cohort                        TRACE
Activation                           Constrained                   Radical
Units and levels
Lateral inhibition                   No                            Yes
Sublexical-to-lexical
interaction (bottom-up)              Facilitative and inhibitory   Facilitative
Lexical-to-sublexical
interaction (top-down)               No                            Facilitative
Distinguishing features              1. Focus on time-course       3. Attempts to account for
                                        of recognition                broad range of phenomena