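Imagining Standard Guoyu? An Analysis of the Phonetic and Social Implications of Dubbing Performance in Taiwan (標準國語的想像？台灣配音表演的語音與社會意涵分析)

3.1 Materials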

Ten lines from the Taiwanese dubbed versions of Japanese animations were selected as the test material for this study. Different genres of shows are handled by different studios through different processes, and Japanese animation dubbing was chosen for the following reasons. Firstly, projects in some other genres lack the budget to hire a separate script and subtitle/supertitle editor, and so do not allow dubbing artists to modify the script during recording. This can make dubbing difficult, because the scripts may have been translated from the text alone, without regard to the original characters’ mouth movements during speech.

As a result, the text on the script sometimes does not match the on-screen movement. The most common mismatch is between the duration of the written text and that of the mouth movement on screen⁷. Since they are not allowed to change the text, dubbing artists have no choice but to perform the line in an otherwise undesirable way, for example by stretching certain parts of the speech to match the character’s mouth movement. As a result, the performance may sound awkward. Other problems arising from the difference between the written text and the on-screen movement include nuances of mouth shape and short pauses within the speech, but these are less noticeable and are dealt with only in the largest projects with ample time.

7 Commonly referred to as zuibugou or zuitaiduo (嘴不夠，嘴太多), literally ‘not enough mouth’ or ‘too much mouth’.

In contrast, dubbing projects for Japanese animation are more likely to have an editor, whose presence allows dubbing artists to change the lines when necessary.

Sometimes, when possible, Japanese-speaking dubbing artists are themselves hired to translate the scripts they will perform. Understanding both the language and the craft, they usually deliver ideal scripts that are both linguistically natural and easy to perform.

Secondly, some other genres of shows are handled by studios that still keep printed dictionaries for pronunciation reference. As a result, dubbing artists there may use obsolete pronunciations of Mandarin characters, which also makes the performance sound awkward. With the spread of mobile network devices, dubbing artists can now consult up-to-date online dictionaries, but some studios still prefer printed ones. Moreover, many older shows in other genres have already been released with obsolete pronunciations, making them unsuitable for this study.

Thirdly, as described in the first chapter, the Taiwanese dubbing industry usually does not give a project enough budget to hire a separate dubbing artist for each character, and it is extremely common for dubbing artists to voice several characters in one show. Under such circumstances, Japanese animations are more often dubbed by artists who, beyond simply doing their job, are fond of the genre and familiar with the shows’ original Japanese voice actors. They tend to be more willing to research, despite limited time, how to make their voice quality as close as possible to the original. In contrast, dubbing artists in other genres tend to care less about the original performance and use a single method in all their dubbing. Indeed, studios sometimes receive complaints that all their shows sound the same. A notoriously ubiquitous voice from such genres may also sound bad to listeners by default, and is therefore unsuitable for the survey.

The 10 lines come from 5 animations aired between 2002 and 2012; one line by a female character and one by a male character were extracted from each animation. Each line is 25 to 35 Mandarin characters long. The lines concern daily life and contain no supernatural or unusually dramatic content, so that unfamiliarity would not affect perception. Finally, the lines contain no background music or sound effects, so that further phonetic analysis is possible. An attempt was made to obtain the original voice-only audio files (without any non-voice tracks) from the recording studio, but it failed because of copyright issues.

Table 3.1 lists the number, source, gender of speaker, year, number of characters and actual text of each line.

Table 3.1: Materials details

8 The number of Mandarin characters.


The corresponding Japanese lines are labelled (11) to (20) in this study. They are all performed in standard Japanese.⁹

9 Standard Japanese is the variety in which voice actors in Japan are trained and required to perform unless otherwise instructed. Exceptions include Rabu Kon and Chibi Maruko Chan: Itaria Kara Kita Shōnen, mentioned in section 1.1.2, where a western accent exhibits neutral locality. Other accents may be used as well; for example, in Spice and Wolf (狼と香辛料), the heroine speaks Kuruwa Kotoba (廓言葉), an accent that was itself designed to hide locality and that in the show evokes positive characteristics such as elegance and talent because of its association with the oiran, talented entertainers and celebrities who emerged in Japan’s Edo period (1603-1868).

3.2 Methodology

The excerpts were played to 100 university students in Taiwan and 100 university students in Japan. The Taiwanese students listened to (1) to (10), the Taiwanese dubbing performances, while the Japanese students listened to (11) to (20), the original Japanese voice-acting performances. The students were aged 19 to 24 in 2014. Half of them were female and half male. The Taiwanese students study at National Taiwan University and the Japanese students at Hokkaido University. Taiwan, as described in the first chapter, was home to several languages but has undergone the imposition of the Guoyu system since 1945. The Japanese dialect of the Hokkaido region has likewise been shown to contain rather mixed elements, which is why Hokkaido was chosen for this study. Like Taiwan, Hokkaido has accommodated immigrants who speak a variety of dialects. Shibata (2003) points out that the dialect can be recognised either as a branch of the Eastern Japanese dialects or as a Tōkai–Tōsan dialect (東海東山方言), which is in practice a transition between Eastern and Western Japanese. The complexity of the dialect (if a coherent dialect exists at all), like that of Taiwan, arose because settlers came to the region from various parts of Japan. Hokkaido’s settlers came primarily from Tōhoku (東北) and Hokuriku (北陸), which roughly correspond to the northeastern and northern parts of the main island. However, the dialect was also influenced by merchants from Kansai (関西). As a result, the dialect exhibits unique and diverse linguistic patterns (Fujiwara, 1965; Sasaki & Yamazaki, 2006; Sasaki, 2007; Asahi, 2010; Sasaki, 2015) but is not as dramatically distinctive as Western Japanese, making its speakers well suited to take part in the study.

The lines were played through AKG K240 headphones, and the participants answered a questionnaire rating, on a scale of 1 to 5, how natural and how standard they found each line, as well as their overall fondness for it (they were asked to answer regardless of the content of the lines). The survey also contained questions about the voice, the phraseology and the emotion of the performance to blur the primary focus of the study, so that more objective and spontaneous answers could be obtained. The participants rated each line on an on-screen form after listening to it, and the next line was played only after they had finished rating the current one and clicked to continue. I supervised the Taiwanese participants and an assisting colleague supervised the Japanese participants, pausing the session whenever participants needed a rest. The survey also included open questions about how participants generally felt about the lines. Figures 3.1 and 3.2 show the Taiwanese and Japanese questionnaires respectively, excluding the open-question part, and Figure 3.3 is an English translation of the questionnaire.

Figure 3.1: Questionnaire for Taiwanese participants

Figure 3.2: Questionnaire for Japanese participants

Figure 3.3: English translation of the questionnaire

3.3 Results and Discussion

Table 3.2 shows the questionnaire results. As mentioned in section 3.1, (1) to (10) are the ten Taiwanese lines and (11) to (20) are their Japanese counterparts. Following this numbering, Table 3.2 shows how natural, standard and likeable the participants rated each line. Values are rounded to two decimal places. The last two rows list the average and standard deviation of each column.

Table 3.2: Results and average of the questionnaire

        Taiwanese dubbing                        Japanese original
Line    Natural  Standard  Likeable      Line    Natural  Standard  Likeable
(1)     1.92     4.31      3.56          (11)    4.65     4.35      4.71
(2)     2.30     4.74      2.37          (12)    4.60     4.53      4.69
(3)     2.05     4.74      2.89          (13)    4.60     4.53      4.42
(4)     2.25     4.84      2.66          (14)    4.60     4.52      4.39
(5)     2.30     4.90      2.30          (15)    4.43     4.61      4.43
(6)     2.11     4.43      1.93          (16)    4.37     4.50      4.50
(7)     1.83     4.71      2.05          (17)    4.60     4.59      4.48
(8)     1.86     4.76      3.61          (18)    4.57     4.47      4.43
(9)     3.12     4.31      4.14          (19)    4.68     4.51      4.45
(10)    3.23     4.68      4.64          (20)    4.52     4.47      4.45
avg.    2.80     4.63      3.80                  4.53     4.53      4.37
sd.     0.47     0.20      0.88                  0.09     0.07      0.11

In both languages, the dubbing performances are considered rather standard. However, the lines by Taiwanese dubbing artists are generally not considered natural, while those by the Japanese dubbing artists are. Individual differences nonetheless exist: (9) and (10), for example, were rated relatively higher on naturalness.

Statistical tests were run to verify the difference in naturalness ratings between the two languages, testing each Taiwanese line’s ratings against those of its Japanese counterpart. The ratings are ordinal data from a questionnaire scale and cannot be assumed to be normally distributed, so the nonparametric Mann-Whitney U test was chosen. Table 3.3 shows the z and p-values of the test. Because the sample size is 100, the normal approximation was used rather than a table of critical values of U.

Table 3.3: Mann-Whitney U test on the naturalness ratings of each pair of lines

Pair         p (two-tailed)
(1)/(11)     0.00E+00
(2)/(12)     0.00E+00
(3)/(13)     0.00E+00
(4)/(14)     0.00E+00
(5)/(15)     0.00E+00
(6)/(16)     0.00E+00
(7)/(17)     0.00E+00
(8)/(18)     0.00E+00
(9)/(19)     6.66E-16
(10)/(20)    0.00E+00

The results show that the Taiwanese dubbing performances received significantly lower naturalness ratings than the Japanese ones. This is consistent with Ishii et al. (1999), who found that dubbing performance in Taiwan is mostly rejected as unnatural, and further shows that this problem does not seem to arise in Japanese audiences’ reactions to Japanese dubbing. Pairs (9)/(19) and (10)/(20) show slightly larger z, but their differences are still significant.
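For readers who want to reproduce this kind of comparison, a minimal sketch of a two-tailed Mann-Whitney U test with the normal approximation and a tie correction is given below, using only the Python standard library. It is an illustration, not the software actually used in this study (which is not specified); the function names and the short rating lists are hypothetical, whereas the real test compared two groups of 100 ratings each.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    first_pos, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first_pos[v] + (count[v] - 1) / 2 for v in values]


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, z, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U for the first sample
    u = min(u1, n1 * n2 - u1)

    # Tie correction: sum of (t^3 - t) over each group of t tied ratings.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every rating identical: no evidence of a difference
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed: 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical 1-5 naturalness ratings (not the study's data).
taiwanese = [1, 2, 2, 3, 1, 2]
japanese = [4, 5, 5, 4, 5, 4]
u, z, p = mann_whitney_u(taiwanese, japanese)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

No continuity correction is applied here; with 100 ratings per group it would change z only negligibly.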

Note that although (1) to (10) receive lower ratings in both naturalness and fondness, the two are not necessarily correlated. This study builds on the survey by Ishii et al. (1999), which showed that Taiwanese audiences’ general rejection of dubbing is due to its unnaturalness. In contrast, the ratings in this study concern specific lines heard separately through a high-quality output device, and overall fondness may be influenced by various factors other than naturalness, such as translation quality and voice character¹⁰. Spearman’s rank correlation coefficient was therefore computed between the naturalness and fondness ratings of the Taiwanese dubbing performances. Table 3.4 shows the results.
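Spearman’s coefficient is the Pearson correlation computed on ranks rather than on raw values. The Python sketch below (standard library only; function names and data are hypothetical and illustrative) shows that computation with average ranks for ties, which are frequent in 1-5 questionnaire ratings. It reports only the coefficient, not a significance test.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    first_pos, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first_pos[v] + (count[v] - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


# Hypothetical paired ratings (naturalness, fondness), not the study's data.
naturalness = [2, 1, 3, 2, 4, 1]
fondness = [3, 4, 2, 5, 3, 1]
print(f"rho = {spearman_rho(naturalness, fondness):.3f}")
```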

Table 3.4: Spearman's rank correlation coefficient of naturalness/fondness ranking of (1) to (10)

10 According to colleagues who were trained in both Japanese and Taiwanese voice-acting programmes, Taiwanese training is more restrictive in terms of preferred voice quality, while Japanese training is more open to different voices. Taiwanese training almost always prefers a bright, sharp voice, which works well for signal clarity but may not always be pleasant.

As the table shows, no correlation is observed between the two ratings. Some answers to the open questions, however, show more explicit links between naturalness and fondness. At least 15 participants wrote that the lines sounded unnatural and were uncomfortable to listen to; each of them either noticed an excessive amount of retroflexion or said they sensed traits of Mandarin speakers from mainland China. There were also comments on voice and emotion, but these were mixed. Some participants liked the bright voice quality while others found it unpleasant and fake-sounding. Some liked the emotional performance while others found it exaggerated, and some had no particular feelings about either voice or performance. These comments explain why no consistent correlation was found between naturalness and overall fondness, the latter being affected by different factors and tastes. The only consistent pattern in the open answers is that whenever naturalness and standardness were mentioned, the lines were always described as very standard but unnatural. Although, for separately presented selected materials, naturalness may not be the sole or overriding cause of the low fondness ratings, the naturalness rating itself is worth investigating, since it is the reason for the general rejection of dubbing shown by Ishii et al. (1999) and appears particularly low in the comparison made in this study.

In brief, over 30 participants commented that the pronunciation was extremely standard, and half of them noticed that the lines always had retroflex sounds. Over 20 participants also pointed out that the lines sounded exaggerated, and over 10 mentioned specifically that the lines reminded them of Mandarin speakers from mainland China.

Overall, the survey shows a gap between naturalness and standardness in how audiences perceive Taiwanese dubbing performance, a gap that does not exist for the Japanese lines, which are the originals of this study’s materials. The responses to the open questions suggest that the exaggerated way of speaking and the retroflex sounds contribute to both the unnaturalness and the standardness of dubbing performance. In the next chapter, the specific phonetic characteristics of dubbing performance will be compared with those of daily speech to identify what exactly makes the performance sound unnatural.

Chapter 4 What Is Dubbing Like: Phonetic Analysis

The previous chapter concluded that audiences generally find dubbing performance in Taiwan unnatural. To explore acoustically what exactly causes the unnaturalness, this chapter describes the phonetic analysis of materials from dubbing artists and non dubbing artist informants. It has four sections: section 4.1 describes how non dubbing artists’ readings of the same lines as in the materials were obtained, section 4.2 discusses pronunciations in the lines that are inconsistent with the prescriptive standard Guoyu pronunciation, section 4.3 analyses the pitch contour, and section 4.4 analyses the PVI (Pairwise Variability Index) of the performance.

4.1 Non Dubbing Artist Recordings

To obtain a representation of how younger people in Taiwan speak, 3 female and 3 male Taiwanese students were asked to record the same lines as in the materials. Like the participants in the previous chapter, they were aged 19 to 24 in 2014.

The recording was done with a Zoom H2n Handy Recorder. The dubbing artists’ performances were not played before recording, so that they would not influence how the informants read the lines. However, to obtain recordings at a speed close to the dubbing artists’ performances for ease of analysis, cue lines (21) to (30) were read to the informants before each line was recorded. Each cue line is similar in speed to its corresponding line and related in content, to achieve a more natural effect.

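(21) 今天早上真是辛苦你了 ‘Thanks for all your hard work this morning.’
(22) 那就麻煩你跟老師安排了 ‘Then please arrange it with the teacher for me.’
(23) 對不起，好像搞砸了 ‘Sorry, it looks like I messed it up.’
(24) 醫生，請問他沒事嗎？ ‘Doctor, is he all right?’
(25) 那可以開始上課了喔 ‘Then we can start the class now.’
(26) 那裡是做什麼用的阿 ‘What is that place for?’
(27) 他們一直都沒來耶 ‘They still haven’t shown up.’
(28) 那我差不多該回家了 ‘Well then, I should be going home soon.’
(29) 你為什麼都不出門啊 ‘Why do you never go out?’
(30) 你怎麼從這裡冒出來啊 ‘How did you pop out from here?’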

The informants were instructed to read as naturally as possible, as if answering the cue lines, and at a similar speed. Their recordings were then imported into the DAW Nuendo 4.0 and paired with the dubbing artists’ recordings for listening comparison.
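Because each Mandarin character corresponds to one syllable, whether an informant’s reading is close in speed to the dubbing performance could be checked by comparing speaking rates in syllables per second. The sketch below is a hypothetical illustration of such a check, not a procedure reported in this study: cue line (24) is used only as sample text, the durations are invented, and in practice they would be measured from the paired recordings.

```python
import unicodedata


def syllable_count(text):
    """Count letters of Unicode category Lo (Han characters in Chinese text).

    Each Han character is one syllable in Mandarin; punctuation is skipped.
    """
    return sum(1 for ch in text if unicodedata.category(ch) == "Lo")


def speaking_rate(text, duration_s):
    """Syllables per second for a reading of `text` lasting `duration_s` seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return syllable_count(text) / duration_s


# Sample text with hypothetical durations for two readings of it.
line = "醫生，請問他沒事嗎？"
artist_rate = speaking_rate(line, 2.0)
informant_rate = speaking_rate(line, 2.5)
print(syllable_count(line), round(informant_rate / artist_rate, 2))
```

Pauses are included in the duration here, so this measures speaking rate rather than articulation rate.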

Figure 4.1: A pair of recordings in Nuendo 4.0

4.2 Direct Inconsistency with Guoyu Pronunciation

Some features show immediately noticeable discrepancies with the prescriptive pronunciation system; even the non dubbing artist informants themselves explicitly noticed them.

4.2.1 Retroflexion

The most prominent difference in the non dubbing artists’ recordings, as Kubler (1985) and Li (1985) point out, is the absence of retroflex consonants. Of the 6 informants who recorded the lines, 3 did not pronounce retroflex sounds at all. The other 3, upon listening to their own recordings, reported that they had unconsciously pronounced sounds more “retroflexly” than they normally would. Chang (2011) suggested that retroflexion is gradient, and the result of this study supports that view, because the notion of “more retroflexly” serves as evidence for the psychological reality of gradient retroflexion.

Indeed, the first lesson in dubbing artist training, before more advanced performance-related and expressive issues, is adherence to Guoyu pronunciation, whose most immediately noticeable difference from daily speech lies in the phonemes [tʂ], [tʂʰ] and [ʂ], which in daily speech are replaced by [ts], [tsʰ] and [s]. Pronouncing the necessary retroflex sounds when performing is an absolute prerequisite for an aspiring dubbing artist; failing to do so immediately disqualifies them from becoming even an intern.

To see an actual example from the materials, Figure 4.2 shows the spectrogram and formants of (4) as read by the dubbing artist and by a non dubbing artist informant, displayed in Praat.

Figure 4.2: Retroflexion comparison from (4), by dubbing artist (left) and non dubbing artist informant (right)

In Figure 4.2, the waveform, spectrogram, romanisation and Mandarin characters are displayed from top to bottom. The red dots on the spectrogram represent formants.

In the excerpt yinggaishi (應該是 ‘should be’), it can be clearly seen that the third character, shi, has a much higher third formant (F3) in the non dubbing artist informant’s reading. The lowest point of F3 is around 3690 Hz for the dubbing artist and around 4435 Hz for the informant. This is because the dubbing artist pronounces the character as [ʂʅ]¹¹, with a voiceless retroflex sibilant, while the non dubbing artist informant pronounces it as [sɿ], with a voiceless alveolar fricative. As Lindau (1985), Trask (1996) and Stevens (1998) point out, retroflex consonants have lower third formants. Shi serves both copular and affirmative functions and is a frequently occurring character in modern spoken Mandarin, so the difference between dubbing performance and daily speech here illustrates very well the obvious gap between prescriptive and actual pronunciation in Taiwan.

11 The vowel here is a high back unrounded vowel with frication carried over from the preceding consonant. In conventional Mandarin transcription it is written [ʅ] after retroflex consonants and [ɿ] after alveolar sibilants, as in [sɿ]; in standard IPA it is represented by [ɯ] (with a slur from the preceding consonant).
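The measurement used above, the lowest F3 value within the syllable shi, can be sketched as follows. This assumes the formant track has already been exported from Praat as (time, F3) pairs, with undefined frames represented as None; the export step, the function name and all sample values are hypothetical and are not the study’s measurements.

```python
def f3_minimum(track, start, end):
    """Return (time, F3) of the lowest F3 between start and end (seconds).

    `track` is a list of (time_s, f3_hz) pairs; frames where F3 is
    undefined are given as None and skipped.
    """
    frames = [(t, f) for t, f in track if start <= t <= end and f is not None]
    if not frames:
        raise ValueError("no defined F3 values in the interval")
    return min(frames, key=lambda frame: frame[1])


# Hypothetical F3 tracks (Hz) over the interval of shi in two readings.
artist = [(0.50, 3850.0), (0.51, 3740.0), (0.52, 3700.0), (0.53, None), (0.54, 3780.0)]
informant = [(0.48, 4500.0), (0.49, 4450.0), (0.50, 4470.0)]

_, artist_min = f3_minimum(artist, 0.50, 0.54)
_, informant_min = f3_minimum(informant, 0.48, 0.50)
# A lower F3 minimum is consistent with a retroflex rather than an alveolar sibilant.
print(artist_min, informant_min, informant_min - artist_min)
```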

4.2.2 Downdrifting

A less obvious inconsistency is the downdrift of the utterance-final rising tone (2nd tone). Interestingly, this is observed in the recordings of both the dubbing artists and the non dubbing artist informants. Tseng (2004) finds downdrift to be a general feature of Taiwan Mandarin speakers: in that experiment, Taiwan Mandarin speakers generally pronounced a 1st tone significantly lower after a 4th tone. In this study, jinxing (進行 ‘process’) in (7) and mianqian (面前 ‘in front of’) in (8) both have a first character in the 4th tone and a second in the 2nd tone. However, no rise in pitch is observed on the second character at all, in the recordings of either the dubbing artists or the non dubbing artist informants. Figure 4.3 shows the downdrifting of mianqian in (8).
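Whether the second syllable rises, as a canonical 2nd tone should, could also be checked numerically by fitting a straight line to its F0 values. The sketch below assumes F0 has been sampled (for instance from a Praat pitch track) as (time, Hz) pairs with unvoiced frames as None; it converts Hz to semitones relative to a hypothetical 100 Hz reference so that rises are comparable across speakers, and returns the least-squares slope. The function names, the sample values and reading a non-positive slope as “no rise” are illustrative assumptions, not the criterion used in this study.

```python
import math


def semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to ref_hz."""
    return 12 * math.log2(f0_hz / ref_hz)


def f0_slope(samples):
    """Least-squares slope (semitones per second) of (time_s, f0_hz) samples.

    Unvoiced frames (f0_hz is None) are skipped.
    """
    pts = [(t, semitones(f)) for t, f in samples if f is not None]
    if len(pts) < 2:
        raise ValueError("need at least two voiced frames")
    mt = sum(t for t, _ in pts) / len(pts)
    ms = sum(s for _, s in pts) / len(pts)
    num = sum((t - mt) * (s - ms) for t, s in pts)
    den = sum((t - mt) ** 2 for t, _ in pts)
    if den == 0:
        raise ValueError("voiced frames must span more than one time point")
    return num / den


# Hypothetical F0 samples for the second syllable of mianqian.
expected_rise = [(0.00, 180.0), (0.05, 190.0), (0.10, 205.0), (0.15, 220.0)]
downdrifted = [(0.00, 200.0), (0.05, 196.0), (0.10, None), (0.15, 188.0)]
for label, samples in (("rising", expected_rise), ("downdrifted", downdrifted)):
    print(label, round(f0_slope(samples), 1))
```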

Figure 4.3: Downdrifting comparison from (8), by dubbing artist (left) and non dubbing artist informant (right)

In Figure 4.3, the waveform, spectrogram, romanisation and Mandarin characters are displayed from top to bottom. The blue line on the spectrogram represents the pitch contour. The two recordings exhibit a similar downdrifting pattern. This appears consistent with Tseng (2004), since the informants in that study were radio announcers, who are also professionally trained. In the basic training of dubbing artists, however, downdrift of the rising tone is strictly prohibited and regarded as an extremely undesirable feature of a “Taiwanese accent”. The presence of such a
