萬變不離其宗：建立台灣華語中英語借字之機率感知語法 (The Essence Remains amid Myriad Changes: Constructing a Probabilistic Perceptual Grammar of English Loanwords in Taiwan Mandarin)

The Corpus and Patterns



combination differences. Section 4.5 in turn investigates the patterns of simplex codas and, as with simplex onsets, proceeds in the order of stops, fricatives, affricates, nasals, and liquids. Next, Section 4.6 deals with complex codas: all possible combinations of coda clusters are collected and classified so that their patterns can be identified. Finally, Section 4.7 concludes this chapter.

4.2 The corpus

The loanword corpus adopted in this dissertation was originally compiled in Lu (2006) and contained a total of 947 monosyllabic and disyllabic TM adaptations of English loanwords and transliterations. The corpus has now been extended to 1,664 loanword adaptations, of which 350 are monosyllabic and 1,314 polysyllabic. The purpose of the extension is twofold. First, we intend to update the loanword data, since new loanwords are coined in TM at a remarkable pace as foreign brands are imported and foreign news is translated into TM on a daily basis. Furthermore, with the rapid development of communication technology, words of English origin are increasingly coined by the younger generation and adopted by the public media. An example is English loser ([.lu.z .]), which has recently been borrowed into TM as 魯蛇 ([.lu.z .]). Second, the larger the database, the more comprehensive the range of data available for analysis, which in turn increases the reliability of the observed adaptation patterns. Since this dissertation aims to provide a persuasive formal account of how English consonants are adapted in TM, and a fair number of the targets are consonant clusters, an ample body of data is essential.
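The corpus figures above can be cross-checked with simple arithmetic: the monosyllabic and polysyllabic subtotals should sum to the stated total, and the difference from the 947 entries in Lu (2006) gives the size of the extension. The Python sketch below does this; the function name `composition` is purely illustrative, and the derived percentages are computed here rather than reported in the text.

```python
# Figures as stated in the text.
ORIGINAL_SIZE = 947   # entries in the corpus compiled by Lu (2006)
MONOSYLLABIC = 350    # monosyllabic entries in the extended corpus
POLYSYLLABIC = 1_314  # polysyllabic entries in the extended corpus


def composition(mono: int, poly: int, original: int) -> dict:
    """Summarize the extended corpus: total size, growth, and share of each type."""
    total = mono + poly
    added = total - original
    return {
        "total": total,
        "added": added,
        "growth_pct": round(100 * added / original, 1),
        "mono_pct": round(100 * mono / total, 1),
        "poly_pct": round(100 * poly / total, 1),
    }


if __name__ == "__main__":
    # Shows total 1664, added 717, growth 75.7 (%), mono 21.0 (%), poly 79.0 (%).
    print(composition(MONOSYLLABIC, POLYSYLLABIC, ORIGINAL_SIZE))
```

On these figures the extension adds 717 entries, roughly 76% more than the original corpus, and about one entry in five is monosyllabic.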

Unlike previous work on Chinese loanwords, in which the data are drawn entirely from loanword dictionaries, the data in our corpus come from a variety of sources. These include loanword dictionaries, from which only terms familiar to the general public are extracted, and the author’s own collection of everyday items from the public media, the Internet, merchandise in supermarkets, shop signs on the street, and people’s conversations. The transliterations of proper nouns range from common male and female names and geographic terms to brand names and the names of celebrities, professional athletes, and politicians. Loanwords and transliterations collected from such sources are believed to make the database more authentic. Moreover, analyses based on these data are more likely to reflect the everyday aural events that occur in the real world, which in turn gives the analyses greater explanatory power when the adapter’s perceptual and mental mechanisms are formalized.

To the best of the author’s knowledge, the loanword corpus in our research may be the first comprehensive collection of English loanwords in TM. In the representative works on Mandarin loanword phonology (Miao 2005, Lin 2007a, 2008ab, Hsieh and Kenstowicz 2009, Paradis and Tremblay 2009, Dong 2012), the L1 in question is the Mandarin variety spoken in Mainland China (i.e. MC), also known as “Putonghua” (普通話). As discussed in Chapter 1, owing to Taiwan’s political separation from Mainland China for several decades since the relocation of the Kuomintang’s national government in 1949, the Mandarin spoken in Taiwan has long been influenced by Taiwanese (a variety of Southern Min), which is spoken by approximately 75% of the population of Taiwan. Lin (2007b) distinguishes at least four types of Standard Chinese, although it is difficult to draw clear boundaries between Mandarin varieties, so considerable variation is inevitable. Even though the prescribed Mandarin in Taiwan (TM) is essentially the same as the variety used throughout Mainland China, with only minor differences, there is nonetheless a great divergence between MC and TM in the adaptation and transliteration of English loanwords and proper nouns. A strict demarcation between the two varieties is therefore believed to be crucial. For a clear picture, (73) compares how the two Mandarin varieties adapt the same target, English [ ], both as a simplex onset and in the [t ] sequence. We will show that English loanwords in TM provide a better testing ground for probing the effects of perceptual salience on loanword adaptation.

(73) A comparison between MC and TM in the adaptation of English [ ]-onsets (MC examples from Dong (2012))

a. Simplex onset

Strategy MC TM

retention √ (167/167, 100%) √ (273/273, 100%)

b. Second/third onset in [t ] sequence

Strategy MC TM

(syllabification, 92.86%). There is thus no substantial difference in strategy between a simplex onset [ ] and a second/third onset [ ]. For TM adapters, however, the difference in perceptual salience between these two positions is clearly reflected in the retention/deletion contrast of [ ] in their TM mappings. Specifically, in (73a), where [ ] occupies the simplex onset position, arguably the most salient position according to Beckman (1998) and Steriade (2008), [ ] is retained as a liquid, either [l] or [ ], by both MC and TM speakers without exception. In (73b), on the other hand, [ ] occupies the second/third onset position, which is still fairly salient perceptually but is assumed to carry weaker cues than the simplex onset. In this category, the vast majority of MC cases preserve [ ] by inserting a vowel after the preceding stop. In TM, by contrast, besides the same syllabification strategy (16.22%), a good number of cases (35.14%) preserve only the rounding feature of [ ] and map it to the glide [w], and the largest group (48.65%) fuses the two successive consonants into a post-alveolar affricate. The perceptually weaker status of the second/third onset relative to the simplex onset is thus revealed far more clearly in TM than in MC. This “gradience” of positional prominence will be discussed in more detail later.
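The percentages in (73) and in the discussion above are token proportions per adaptation strategy. A minimal Python sketch of that computation follows. The simplex-onset counts (167 and 273) are those given in (73a); for the TM second/third-onset cases the text reports only percentages, so the counts 6, 13 and 18 (out of 37) used below are an inference that reproduces 16.22%, 35.14% and 48.65% at two decimal places, not figures taken from the corpus. The function name `strategy_percentages` is hypothetical.

```python
from __future__ import annotations


def strategy_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Turn raw token counts per adaptation strategy into percentages (2 decimals)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tokens to compute percentages from")
    return {strategy: round(100 * n / total, 2) for strategy, n in counts.items()}


# Simplex onset, (73a): retention in every token for both varieties.
SIMPLEX_MC = {"retention": 167}
SIMPLEX_TM = {"retention": 273}

# Second/third onset in TM: HYPOTHETICAL counts inferred from the reported percentages.
CLUSTER_TM = {"syllabification": 6, "glide [w]": 13, "affricate fusion": 18}

if __name__ == "__main__":
    for label, counts in [("MC simplex", SIMPLEX_MC),
                          ("TM simplex", SIMPLEX_TM),
                          ("TM cluster", CLUSTER_TM)]:
        print(label, strategy_percentages(counts))
```

Because each TM percentage is rounded, the three values sum to 100.01% rather than 100%; keeping raw counts next to the percentages makes such rounding drift easy to explain.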

Following the terminology introduced in Chapter 1, the following loanword types are considered in the phonological analyses. First, phonological loanwords constitute the majority of the corpus. Loanwords of this type are coined purely by mimicking the pronunciation of the source words as closely as possible, without reference to the meaning of the source. An example is “[.t æ .(o .] Tango → [.t an.k .] 探戈”, in which the meanings of the two characters 探 ([.t an.], ‘explore’) and 戈 ([.k .], ‘weapon’) have nothing to do with dancing. Second, only the phonological, not the semantic, part of a hybrid loanword is considered in the analyses. For instance, in “[.st .b ks.] Starbucks → [.$i .pa.k .] 星巴克”, the character 星 ([.$i .], ‘star’) is used for its meaning, while the characters 巴 ([.pa.], ‘stick to’) and 克 ([.k .], ‘overcome’) are a purely phonological mapping of the second syllable of the source. Moreover, reduplicate loanwords are included because of their high degree of phonetic similarity to the source words, even though the combined characters also carry meanings relevant to the source word. An example is “[.bu.fe .] buffet → [.pai.xwei.] 百匯”, where the literal meaning of the TM form is ‘hundred gather’. Lexical loanwords are also collected, in which the combined characters happen to form an existing word in the L1 lexicon, such as “[.t ou.f l.] TOEFL → [.t wo.fu.] 托福”, in which the TM form literally means ‘hold blessing’ (a polite expression meaning “with one’s blessing”). In addition, the phonological part of a qualitative loanword is considered in the analyses, since the additional character, which indicates the category of the noun, serves a semantic purpose. For example, in “[.k .t .] cutter → [.k a.t .t$ jou.] 卡特球”, the character 球 ([.t$ jou.], ‘ball’) is added to the TM form simply to indicate that a cutter is a type of baseball pitch. Finally, commercial loanwords are counted as a whole because of their phonetic similarity to the source words, although the selected characters, individually or together, usually carry positive meanings or are believed to bring good luck. Semantic loanwords, in contrast, are completely excluded from the corpus, since their creation involves no phonological process and is based entirely on semantic interpretation. An example is “[.feis.b k.] facebook → [.ljen. u.] 臉書”, where 臉 ([.ljen.]) literally means ‘face’ and 書 ([. u.]) means ‘book’.
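The inclusion criteria above amount to a small decision procedure: semantic loanwords are dropped, hybrid and qualitative loanwords contribute only their phonologically motivated characters, and the remaining types are analyzed as wholes. The Python sketch below encodes that reading. The names `LoanType`, `Loan` and `analyzable_part` are hypothetical, and the sketch assumes that the purely semantic characters of each entry have been marked by hand, which is not a procedure the text itself describes.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class LoanType(Enum):
    """Loanword types named in the text (terminology of Chapter 1)."""
    PHONOLOGICAL = "phonological"
    HYBRID = "hybrid"
    REDUPLICATE = "reduplicate"
    LEXICAL = "lexical"
    QUALITATIVE = "qualitative"
    COMMERCIAL = "commercial"
    SEMANTIC = "semantic"


@dataclass(frozen=True)
class Loan:
    source: str
    tm_form: str
    loan_type: LoanType
    # Positions of characters chosen only for their meaning (assumed hand-annotated),
    # e.g. 星 (index 0) in 星巴克 or 球 (index 2) in 卡特球.
    semantic_positions: frozenset[int] = frozenset()


def analyzable_part(loan: Loan) -> str | None:
    """Return the TM characters that enter the phonological analysis, or None if excluded."""
    if loan.loan_type is LoanType.SEMANTIC:
        return None  # no phonological process involved: excluded from the corpus
    if loan.loan_type in (LoanType.HYBRID, LoanType.QUALITATIVE):
        # Keep only the phonologically motivated characters.
        return "".join(
            ch for i, ch in enumerate(loan.tm_form) if i not in loan.semantic_positions
        )
    # Phonological, reduplicate, lexical and commercial loans are analyzed as a whole.
    return loan.tm_form


if __name__ == "__main__":
    examples = [
        Loan("Tango", "探戈", LoanType.PHONOLOGICAL),
        Loan("Starbucks", "星巴克", LoanType.HYBRID, frozenset({0})),
        Loan("cutter", "卡特球", LoanType.QUALITATIVE, frozenset({2})),
        Loan("facebook", "臉書", LoanType.SEMANTIC),
    ]
    for loan in examples:
        print(loan.source, analyzable_part(loan))
```

Representing the semantic characters as explicit positions keeps the decision about which characters are "semantic" in the annotation, where the analyst makes it, rather than in the code.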

The corpus data are transcribed in broad IPA and compiled in Microsoft Office Excel. Each entry consists of the spelling of the English source word, the phonetic transcription of each syllable under the English syllable template, the written TM form of the adaptation, and finally the phonetic transcription of each syllable of the TM adaptation (hence of each character) under the TM syllable template.
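The entry format just described maps naturally onto a record with four fields, with dots marking syllable boundaries inside each transcription. The Python sketch below shows one possible way to load and sanity-check such rows outside Excel. The names `CorpusEntry`, `parse_syllables` and `is_aligned` are hypothetical, the English transcription in the example is an assumed placeholder, and the alignment check simply restates the one-syllable-per-character correspondence mentioned in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


def parse_syllables(transcription: str) -> list[str]:
    """Split a dot-delimited transcription such as "[.pai.xwei.]" into syllables."""
    inner = transcription.strip().strip("[]")
    return [syl.strip() for syl in inner.split(".") if syl.strip()]


@dataclass(frozen=True)
class CorpusEntry:
    """One corpus row, following the four fields listed in the text."""
    spelling: str  # spelling of the English source word
    en_ipa: str    # English transcription, syllabified with dots
    tm_form: str   # written TM form (one character per syllable)
    tm_ipa: str    # TM transcription, syllabified with dots

    @property
    def en_syllables(self) -> list[str]:
        return parse_syllables(self.en_ipa)

    @property
    def tm_syllables(self) -> list[str]:
        return parse_syllables(self.tm_ipa)

    def is_aligned(self) -> bool:
        """Check that each TM character has exactly one transcribed syllable."""
        return len(self.tm_syllables) == len(self.tm_form)


if __name__ == "__main__":
    # The English transcription here is an assumed placeholder for illustration.
    entry = CorpusEntry("buffet", "[.bu.feɪ.]", "百匯", "[.pai.xwei.]")
    print(entry.en_syllables, entry.tm_syllables, entry.is_aligned())
```

Rows exported from the spreadsheet (for instance as CSV) could be mapped onto these four fields; the column names would depend on how the sheet is laid out.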


Source: NCCU Academic Hub (政大學術集成), pp. 122-127.
