
3.2 Materials

3.2.1 The Resynthesized Disyllabic Nonsense Words

Following Lai (2008), Wang (2008) and Chrabaszcz et al. (2014), a nonsense word rather than a real word was chosen because listeners’ perception of real words might be influenced by how familiar they are with the words and where they expect the stress to fall. A nonsense word, which the listeners have never heard, avoids the issues of familiarity and expectancy. The disyllabic nonsense word prodawn was chosen for the stress perception test. A native speaker of English from the United States was asked to pronounce the word in the noun form, with stress on the first syllable, and in the verb form, with stress on the second syllable. The native speaker’s pronunciation was recorded in Praat at NTNU’s Phonetics Lab. The F0, duration and amplitude of the two syllables in the two forms were measured in Praat, as shown in Table 1.

Table 1: The acoustic measurements of the nonsense word prodawn

| Acoustic parameter | Noun form (PROdawn), Syllable 1 | Noun form (PROdawn), Syllable 2 | Verb form (proDAWN), Syllable 1 | Verb form (proDAWN), Syllable 2 |
| --- | --- | --- | --- | --- |
| Peak F0 (Hz) | 242 | 172 | 137 | 144 |
| Duration (ms) | 368 | 267 | 201 | 662 |
| Peak amplitude (dB) | 70 | 69 | 68 | 70 |

The noun and verb forms (PROdawn and proDAWN) were resynthesized into tokens with varying F0, duration and amplitude using Praat and Audacity. F0 and duration were manipulated in Praat, and amplitude was manipulated in Audacity.

Figure 1 illustrates the design of the noun forms. In the noun form, the manipulated syllable was the first syllable.

In the manipulations of F0, as shown in Figure 1 (A), the original F0 contour throughout the whole syllable was maintained, but the contour was set at three different heights. The F0 difference between the two syllables in the nonsense word as originally produced by the English speaker was defined as one increment, and the three heights were set on the basis of this increment. Among the three heights of the first syllable, the highest one was higher than the original by one increment (Figure 1 (A1)). The second highest one was the original height produced by the English speaker (Figure 1 (A2)). The lowest one was lower than the original height by one increment (Figure 1 (A3)).

In the manipulations of duration, as shown in Figure 1 (B), there were three different syllable lengths. The duration difference between the two syllables in the nonsense word as originally produced by the English speaker was defined as one increment, and the three syllable lengths were set on the basis of this increment. Among the three lengths of the first syllable, the longest one was longer than the original by one increment (Figure 1 (B1)). The second longest one was the original length produced by the English speaker (Figure 1 (B2)). The shortest one was shorter than the original length by one increment (Figure 1 (B3)).

In the manipulations of amplitude, as shown in Figure 1 (C), the amplitude contour was maintained, but the contour was set at three different heights. The amplitude difference between the two syllables in the nonsense word as originally produced by the English speaker was defined as one increment, and the three heights were set on the basis of this increment. Among the three heights of the first syllable, the highest one was higher than the original by one increment (Figure 1 (C1)). The second highest one was the original amplitude produced by the English speaker (Figure 1 (C2)). The lowest one was lower than the original height by one increment (Figure 1 (C3)).

Figure 1: Illustration of the synthesized noun forms

The verb form was manipulated into different tokens in the same way as the noun form, except that the syllable being adjusted was the second syllable. Tables 2 and 3 below demonstrate the acoustic settings for resynthesizing the noun and verb forms.

[Figure 1 panels: A. F0, tokens (1), (2), (3); B. Duration, tokens (1), (2), (3); C. Amplitude, tokens (1), (2), (3)]

The red lines are the manipulated syllables. The gray dotted lines are provided for comparing the first syllables in each panel. Panel A shows three tokens contrasting in F0, Panel B shows the duration contrast, and Panel C shows the amplitude contrast. The second token in each panel is the reference token, that is, the syllables as originally produced by the native English speaker. The first syllable of the third token in each panel is adjusted down by one increment, while the first syllable of the first token in each panel is adjusted up by one increment. The increment is defined as the distance between the original first syllable and the original second syllable.

Table 2: The acoustic settings for resynthesizing the noun form PROdawn

Token (1) Plus one increment (2) Original (3) Minus one increment

Syllable 1 2 1 2 1 2

Note: “--” means the syllable was not modified.

Table 3: The acoustic settings for resynthesizing the verb form proDAWN

Token (1) Plus one increment (2) Original (3) Minus one increment

Syllable 1 2 1 2 1 2

Note: “--” means the syllable was not modified.
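The increment rule can be worked through on the measurements in Table 1. The following Python sketch is only an illustration: it assumes that one increment is the absolute difference between the two syllables' measured values and that it is added to or subtracted from the manipulated syllable. The names `TABLE_1`, `MANIPULATED` and `token_values` are hypothetical, and the values it produces are derived under that assumption rather than copied from Tables 2 and 3.

```python
# Measurements copied from Table 1 as (syllable 1, syllable 2) for each form.
TABLE_1 = {
    "noun": {"f0_hz": (242, 172), "duration_ms": (368, 267), "amplitude_db": (70, 69)},
    "verb": {"f0_hz": (137, 144), "duration_ms": (201, 662), "amplitude_db": (68, 70)},
}

# Index of the manipulated syllable: the first for the noun form, the second for the verb form.
MANIPULATED = {"noun": 0, "verb": 1}


def token_values(form, parameter):
    """Return the manipulated syllable's value in the three tokens.

    Assumption: one increment is the absolute difference between the two
    original syllables, applied additively to the manipulated syllable.
    """
    syllables = TABLE_1[form][parameter]
    increment = abs(syllables[0] - syllables[1])
    original = syllables[MANIPULATED[form]]
    return {
        "plus_one": original + increment,
        "original": original,
        "minus_one": original - increment,
    }


if __name__ == "__main__":
    for form in TABLE_1:
        for parameter in TABLE_1[form]:
            print(form, parameter, token_values(form, parameter))
```

Under this reading, the minus-one-increment token always brings the manipulated syllable down to the value of the other syllable, because the increment equals the original difference between the two syllables.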

There were three acoustic parameters (F0, duration and amplitude) for the two forms (noun and verb), and each acoustic parameter had three conditions (plus one increment, original and minus one increment). Therefore, there were a total of 54 combinations (F0 × duration × amplitude × form: 3 × 3 × 3 × 2). All 54 combinations were presented to the participants in a block. The test contained two blocks of the same 54 items, and the trial order within each block was randomized individually for each participant. Two blocks were used so that if a listener missed some trials in one block, the listener’s responses to the same trials in the other block could still be used for analysis. With only one block, the responses to any missed trials could not be recovered.
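The factorial design and block structure described above can be sketched in Python as follows. This is a minimal illustration, not the presentation software used in the study: the function names, the level labels and the use of a per-participant seed are assumptions introduced here.

```python
import itertools
import random

# Labels for the three conditions of each acoustic parameter (hypothetical names).
LEVELS = ("plus_one", "original", "minus_one")
FORMS = ("noun", "verb")


def build_stimuli():
    """Enumerate every form x F0 x duration x amplitude combination (2 x 3 x 3 x 3 = 54)."""
    return [
        {"form": form, "f0": f0, "duration": duration, "amplitude": amplitude}
        for form, f0, duration, amplitude in itertools.product(FORMS, LEVELS, LEVELS, LEVELS)
    ]


def build_session(participant_seed, n_blocks=2):
    """Return n_blocks lists, each containing all stimuli in an independently shuffled order.

    Assumption: seeding the generator per participant makes each participant's
    trial order reproducible; the source only states that orders were randomized.
    """
    rng = random.Random(participant_seed)
    stimuli = build_stimuli()
    blocks = []
    for _ in range(n_blocks):
        block = list(stimuli)  # copy so each block is shuffled separately
        rng.shuffle(block)
        blocks.append(block)
    return blocks


if __name__ == "__main__":
    session = build_session(participant_seed=1)
    print(len(session), [len(block) for block in session])
```

Because every block contains the full set of 54 items, a trial missed in one block has a counterpart in the other block that can be matched on the four fields of the stimulus record.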
