Chapter 4 Tone Sandhi Problem and Algorithm

4.3 Rule-based Tone Sandhi Algorithm

Tone sandhi rules are the most important part of this study. The sandhi-marking algorithm is shown in Table 4-6.

Table 4-6 Tone Sandhi Marking Algorithm

1 Apply normal sandhi to all syllables

2 Mark the last syllable as basic tone #

3 ê ‘of 的’: Mark the syllable preceding ê as basic tone #
Remark: ê is a special marker

4 A/A Pair

4.1 A/A Pair: Mark the last syllable of the first word as basic tone #
Remark: POS level, with ambiguity

5 N/V, N/A, N/P, N/R, and N/D Pairs

5.1 N/V Pair: Mark the last syllable of the first word
Remark: POS level, with ambiguity

[...] and the word itself as basic tones #
Remark: POS level, without ambiguity

8 S: Mark the last syllable of this word as basic tone #

9 POS R

9.1 i / in ‘(s)he/they 他(們)’: Mark them as normal sandhi even if they are the last syllables

9.2 góa / lí / gún / góan / lán / lín ‘I/you/my/our/your 我/你(們)(的)’ of POS R: Mark them as normal sandhi if they are not the last syllables
Remark: POS/Word level

10 Sentence-final kóng ‘say 講’: Mark this word as normal sandhi if the delimiter is among [,|：|:|"] and there is any word of POS R in front of this word (note: this rule needs to be refined in case there is a name in front of this word)
Remark: Word level, induced from observation data

11 Preceding á [á is suffix of a word]: Mark any syllables just before á as preceding á sandhi &
Remark: Syllable level

12 Double sandhi

12.1 beh ‘want 要’: Mark any beh as double sandhi $ unless it appears at the end, including those within a word, such as kiông-beh, tih-beh.
Remark: Syllable level

12.2 khì ‘go 去’: Mark khì as double sandhi $ if the POS of the immediately following word is N or V, unless it appears at the end
Remark: Word level

12.3 koh ‘again 再’: Mark any koh as double sandhi $, including those within a word, such as chiah-koh ‘and then 再’ or iáu-koh ‘still 還是’, unless it appears at the end
Remark: Syllable level, extended from observation data

12.4 kah ‘and 和’: Mark any kah as double sandhi $ unless it appears at the end
Remark: Word level

13 Neutral sandhi of --: Mark the syllable just before -- as basic tone, and mark each syllable after -- as neutral sandhi %
Remark: Word level

14 Triplicate sandhi: Mark the first syllable as triplicate sandhi if that word has 3 syllables of the same spelling
Remark: Word level

15 Special words

15.1 sím-mih / sím-mi̍h ‘what 什麼’: Change these words into sím-mí (sandhi marks not changed)

15.2 án-ni / àn-ni / an-ni / an-nī ‘thus 這樣’: Change these words into án-ni and mark its sandhi marks as t#
Remark: Word level, extended from observation data because these words are not yet standardized

16 Markers

16.1 iah-sī / ah-sī / ia̍h-sī / a̍h-sī / á-sī ‘or 或是’: Mark the last syllable before these words as basic tone #
Remark: word level, extended from observation data

16.2 V sī ‘is 是’ V: Mark the last syllable of the verb just before sī as basic tone # if this verb appears again after sī

[...] Mark these words as basic tone #
Remark: word level

16.4 ū-sî ‘sometimes 有時’ / put-sî ‘from time to time 不時’ / kui-khì ‘just 乾脆’ / óan-jiân ‘like 宛然’ / gôan-lâi ‘originally 原來’ / chiong-lâi ‘future 將來’ / chiông-lâi ‘always 從來’ / sui-jiân / sui-bóng ‘though 雖然’ / sî-siông ‘often 時常’ / hui-siông ‘very 非常’ / si̍t-chāi ‘really 實在’ / sī-chūn ‘(the duration of) time 時候’: Mark the last syllables of these words as basic tone #
Remark: word level, [...]

16.6 sî-kàu ‘at that time 到時候’: Mark both syllables of this word as basic tones
Remark: word level, [...]

[...] lo̍h-lâi ‘come down 下來’ / lo̍h-khì ‘go down 下去’ / kòe-lâi ‘come up 過來’ / kòe-khì ‘pass away 過去’: Mark the last syllable of a verb just before these words as basic tone #, and mark these words as neutral sandhi %
Remark: word level

19.2 sian-siⁿ / sin-seⁿ / sian-seⁿ ‘Mr. 先生’: Mark the word before these words as basic tone # and these words as neutral sandhi %, if the first letter of the preceding word is uppercase
Remark: word level

19.3 bô ‘have nothing 無’ at the end

19.3.1 á / á-sī / iah / iah-sī / ah / ah-sī ‘or 或是’: if the preceding word is among these words, do nothing

19.3.2 Otherwise: Mark the last syllable of the word just before bô as basic tone #, and mark bô as neutral sandhi %

[...] at 會’: Mark any final bē/bōe as neutral sandhi %

19.4.2 á / á-sī / iah / iah-sī / ah / ah-sī ‘or 或是’: Mark the bē/bōe as basic tone # if any of these words immediately precedes it

19.4.3 Otherwise: Do nothing as it could be [...]

[...] / my / our / you(r)(s) / (s)he / him / her / his / they / them / their 我/你/他(們)(的)’: Mark the pronoun as following sandhi @ if it appears at the end and there is a verb before it
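
To illustrate how syllable-level and word-level entries of the table might be encoded, the following Python sketch covers only rules 1, 2, 12.1, 12.3, and 13. It is a minimal illustration, not the program used in this study. The input format (space-separated words, hyphen-separated syllables), the function names, and the mark "N" for normal sandhi are assumptions made for this example; the marks #, $, and % follow the table.

```python
NORMAL, BASIC, DOUBLE, NEUTRAL = "N", "#", "$", "%"  # "N" is our placeholder
DOUBLE_SYLLABLES = {"beh", "koh"}  # rules 12.1 and 12.3 apply to any such syllable


def parse(sentence):
    """Split a sentence into (syllable, role) pairs.

    role is "pre" for the syllable just before "--", "post" for syllables
    after "--" in the same word, and "" otherwise.
    """
    items = []
    for word in sentence.split():
        head, sep, tail = word.partition("--")
        head_syls = [s for s in head.split("-") if s]
        tail_syls = [s for s in tail.split("-") if s]
        for i, syl in enumerate(head_syls):
            is_last = i == len(head_syls) - 1
            items.append((syl, "pre" if sep and is_last else ""))
        items.extend((syl, "post") for syl in tail_syls)
    return items


def mark_syllables(sentence):
    """Apply rules 1, 2, 12.1/12.3, and 13 in that order."""
    items = parse(sentence)
    marks = [NORMAL] * len(items)            # rule 1: normal sandhi everywhere
    if marks:
        marks[-1] = BASIC                    # rule 2: last syllable is basic
    last = len(items) - 1
    for i, (syl, _) in enumerate(items):     # rules 12.1, 12.3: double sandhi
        if syl in DOUBLE_SYLLABLES and i != last:
            marks[i] = DOUBLE
    for i, (_, role) in enumerate(items):    # rule 13: neutral sandhi around "--"
        if role == "pre":
            marks[i] = BASIC
        elif role == "post":
            marks[i] = NEUTRAL
    return [(syl, mark) for (syl, _), mark in zip(items, marks)]
```

For instance, `mark_syllables("tih-beh lâi")` marks the word-internal beh as `$` and the final lâi as `#`, while a sentence-final beh keeps the `#` assigned by rule 2.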

The program applies the rules in the order in which they are listed above.

These sandhi rules work on four different levels: the syllable, the word, the part of speech, and the sentence pattern.
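
The sequential application can be pictured as a pipeline in which each rule rewrites the marks left by the rules before it. The sketch below is a hypothetical Python illustration restricted to rules 1 to 3. The `Word` class, the function names, and the mark "N" for normal sandhi are our assumptions, as is the example phrase.

```python
import unicodedata
from dataclasses import dataclass, field

NORMAL, BASIC = "N", "#"  # "#" follows the table; "N" is our placeholder
E_MARKER = unicodedata.normalize("NFC", "ê")


@dataclass
class Word:
    syllables: list
    marks: list = field(default_factory=list)


def rule1_normal(words):
    # Rule 1: every syllable starts out as normal sandhi.
    for w in words:
        w.marks = [NORMAL] * len(w.syllables)


def rule2_last_basic(words):
    # Rule 2: the last syllable of the sentence keeps its basic tone.
    if words:
        words[-1].marks[-1] = BASIC


def rule3_before_e(words):
    # Rule 3: the syllable right before the marker ê keeps its basic tone.
    for prev, cur in zip(words, words[1:]):
        if cur.syllables == [E_MARKER]:
            prev.marks[-1] = BASIC


RULES = [rule1_normal, rule2_last_basic, rule3_before_e]  # applied in this order


def mark_sentence(text):
    text = unicodedata.normalize("NFC", text)
    words = [Word(token.split("-")) for token in text.split()]
    for rule in RULES:  # a later rule may overwrite marks set by an earlier one
        rule(words)
    return [(w.syllables, w.marks) for w in words]
```

Keeping the rules in an ordered list makes the priority explicit: appending a new rule to `RULES` lets it supersede the marks of all earlier ones without touching their code.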

The algorithm described above is mainly based on:

(a) Tone sandhi rules proposed by linguists (R. L. Cheng, 1997, 2002);

(b) Rules induced from the observation data;

(c) Our intuition as native-speaking observers of sandhi phenomena.

We also consulted:

(d) The CWSTS, to examine its word segmentation and POS tagging output (CKIP, 2004);

(e) The OTCS, to check the sandhi behavior of certain words when questions arose (Iunn, 2003b).

It should be noted that some of the sandhi rules proposed by linguists deal with specific contexts and thus cannot be broadly applied; others carry exceptions. Converting these rules into an algorithm is therefore difficult. So, besides (a), we also formulated some rules from (b) and (c) by analyzing errors in the output for the observation data. In principle, sandhi rules are formulated to be applicable to “most situations”, that is, to reach an accuracy rate over 75% on corpus data. Once applied, the new rules may affect the original rules,

the new rules.
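
The 75% criterion can be read as a simple acceptance test for a candidate rule. The Python sketch below assumes that accuracy is measured as the share of corpus syllables on which the rule's predicted mark agrees with a hand-annotated mark; the text does not specify the exact measure, so this definition, the function names, and the sample marks are all illustrative.

```python
def rule_accuracy(predicted, gold):
    """Share of positions where the predicted mark equals the gold mark."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


def keep_rule(predicted, gold, threshold=0.75):
    """Accept a candidate rule only if its accuracy is strictly over the threshold."""
    return rule_accuracy(predicted, gold) > threshold
```

Under this reading, a rule that is right on exactly three of four syllables is rejected, because the criterion asks for an accuracy over 75%, not equal to it.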

Some rules have priority: subsequent rules can supersede previous ones. For example, rule 9 (the pronoun rule) can supersede rule 3 (the ê ‘of’ rule). At the level of sentence pattern, rule 19.4.2 can supersede rule 19.4.1, as in the following example:

(9) “Lí ē khì kok-gōa bē” ‘Will you go abroad or not 你會不會去國外’: the last bē ‘will not 不會’ is marked as neutral sandhi, whereas

“Lí ē khì kok-gōa iah-sī bē” ‘Will you go abroad or not 你會不會去國外’: the last bē is marked as basic tone.
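
The interaction in example (9) can be sketched in Python as two rules applied in sequence, the second overwriting the first. This is a hedged illustration only: the function name is ours, the romanized forms bē/bōe and iah-sī are our reading of the example, and the sketch looks only at the sentence-final word.

```python
import unicodedata


def nfc(text):
    return unicodedata.normalize("NFC", text)


BASIC, NEUTRAL = "#", "%"
FINAL_NEGATIVES = {nfc(w) for w in ("bē", "bōe")}
OR_WORDS = {nfc(w) for w in ("á", "á-sī", "iah", "iah-sī", "ah", "ah-sī")}


def mark_final_negative(sentence):
    """Return the mark for a sentence-final bē/bōe, or None if there is none."""
    words = nfc(sentence).split()
    if not words or words[-1] not in FINAL_NEGATIVES:
        return None
    mark = NEUTRAL                      # rule 19.4.1: final bē/bōe is neutral
    if len(words) > 1 and words[-2] in OR_WORDS:
        mark = BASIC                    # rule 19.4.2 supersedes 19.4.1
    return mark
```

Calling the function on the two sentences of example (9) yields `%` for the first and `#` for the second, matching the marks described in the text.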

Moreover, because POS tagging is uncertain, some rules are set to apply only when the POS is unambiguous, while others apply to any matching POS.
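
The two modes can be expressed as a single check on the set of candidate tags for a word. The Python sketch below assumes that the tagger output is available as a set of candidate POS tags per word; that representation and the function name are hypothetical, while the tags N and V are those used in the table.

```python
def pos_condition_holds(candidate_tags, target, allow_ambiguity):
    """Decide whether a POS-level rule may fire for a word.

    candidate_tags: all POS tags the tagger considers possible for the word.
    target: the POS the rule requires, e.g. "N".
    allow_ambiguity: True for rules marked "with ambiguity" (any matching
    candidate suffices), False for rules marked "without ambiguity" (the
    target must be the only candidate).
    """
    tags = set(candidate_tags)
    if allow_ambiguity:
        return target in tags
    return tags == {target}
```

A word tagged as either N or V thus satisfies a rule of the first kind that requires N, but not a rule of the second kind.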

We currently employ 20 rules and expect to refine them or append new ones.