
2.5 OT-based theories for language variation


國立政治大學 National Chengchi University

(36) Illustrations: Norman and hormone

a. Cophonology A: CodaCon >> Dep-V >> Max-R(V_)

   [.n .m n.]          | CodaCon | Dep-V | Max-R(V_)
   a. ☞ [.nwo.man.]    |         |       | *
   b.   [.no .man.]    | *W      |       | L
   c.   [.nwo. .man.]  |         | *W    | L

b. Cophonology B: CodaCon >> Max-R(V_) >> Dep-V

   [.h .mon.]          | CodaCon | Max-R(V_) | Dep-V
   a. ☞ [.x . .mo .]   |         |           | *
   b.   [.x .mo .]     | *W      |           | L
   c.   [.x .mo .]     |         | *W        | L

This is the best cophonologies can do: different rankings are proposed, but no quantitative prediction is available, and the effect of syllable position on perceptual salience remains invisible. Furthermore, treating the two loanwords as different categories governed by different cophonologies is unmotivated, given that both are nouns and are structurally very similar. It is frequently attested in language variation that several variants are possible, but only one of them is systematically preferred. Faced with an ill-formed structure in the input, cophonology cannot predict the relative probabilities of two competing repairs, such as retention of a segment through vowel insertion versus deletion of that segment.

2.5.2 Rank-ordering of EVAL

Coetzee (2006) claims that rather than simply selecting the best candidate and ignoring the set of losers, language users have access to the full candidate set; that is, the function of EVAL in OT is enriched to impose a rank-ordering on the losers as well. This version of OT is termed the “rank-ordering model of EVAL” (ROE). In essence, the redefinition of EVAL as ROE formalizes language variation.


In traditional OT, EVAL merely discriminates the winner from the losers. Once a losing candidate is eliminated (i.e. incurs a fatal violation of a higher-ranked constraint), all its violations of lower-ranked constraints are ignored. ROE, however, suggests that an OT grammar can generate more information than is generally assumed: EVAL should distinguish among the non-optimal candidates in terms of their well-formedness, just as it does between the winner and the losers. Coetzee (2006) provides a simple instance. In a hypothesized language, closed syllables are allowed but tautosyllabic consonant clusters are not; such clusters are repaired by consonant deletion. The grammar “*Complex >> Max >> NoCoda” is thus responsible for the process. This is shown in (37).

(37) /.prak./ → [.pak.] (Coetzee 2006, revised)

   /.prak./        | *Complex | Max | NoCoda
   a. ☞ [.pak.]    |          | *   | *
   b.   [.pa.]     |          | **! |
   c.   [.prak.]   | *!       |     | *
   d.   [.pra.]    | *!       | *   |

Unlike classic OT, which simply says that [.pak.] wins over the rest in this tableau, ROE further provides a well-formedness scale: [.pak.] >> [.pa.] >> [.prak.] >> [.pra.]. Under this rationale, the otherwise “irrelevant” violation marks of the losers are also considered, and the likelihood that a candidate will be selected is determined by its position on the rank-ordering: the higher a candidate sits on the rank-ordering, the more probable it is to be selected as the output. The conceptual difference between classic OT and ROE is schematized in (38).


(38) Comparison of classic OT and ROE (Coetzee 2006)

   Classic OT:  [.pak.]  |  {[.prak.], [.pra.], [.pa.]}
                only the winner is accessible; no information about the relations between losers

   ROE:  [.pak.] >> [.pa.] >> [.prak.] >> [.pra.]
         → decreasing well-formedness
         → decreasing likelihood of being selected as output
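The rank-ordering that EVAL imposes in (37) can be sketched computationally. This is a minimal illustration in Python; the encoding (dicts of violation counts, lexicographic comparison of profiles) is mine, not Coetzee's formalism:

```python
# Sketch of ROE's rank-ordering of the full candidate set in (37): candidates
# are compared by their violation profiles read off in ranking order
# (*Complex >> Max >> NoCoda), so lexicographic comparison of the tuples
# mirrors strict constraint domination.
RANKING = ["*Complex", "Max", "NoCoda"]

violations = {
    "[.pak.]":  {"Max": 1, "NoCoda": 1},
    "[.pa.]":   {"Max": 2},
    "[.prak.]": {"*Complex": 1, "NoCoda": 1},
    "[.pra.]":  {"*Complex": 1, "Max": 1},
}

def profile(cand):
    # Violation counts from highest- to lowest-ranked constraint.
    return tuple(violations[cand].get(c, 0) for c in RANKING)

scale = sorted(violations, key=profile)
print(" >> ".join(scale))   # [.pak.] >> [.pa.] >> [.prak.] >> [.pra.]
```

The first element of the sorted list is the classic-OT winner; the rest of the list is the extra information that ROE claims the grammar also makes available.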

Another claim that ROE puts forth is that language users’ access to the candidate set is not limitless. Instead, the constraint set is divided into two strata, one above the other, demarcated by a notional line called the “critical cut-off”. When, for a given input, every candidate is disfavored by some constraint above the cut-off, only the best candidate serves as the single grammatical output, just as in a conventional OT scenario. Constraints below the cut-off, however, cannot rule out a candidate as ungrammatical. Variation occurs so long as more than one candidate is disfavored by only one constraint from below the cut-off. The scenario is illustrated below.

(39) Illustration of critical cut-off (Coetzee 2006)

            | C1 | C2 ‖ C3 | C4
   1. Cand1 |    |    ‖    | *
   2. Cand2 |    |    ‖ *  |
      Cand3 |    | *! ‖    |
      Cand4 | *! |    ‖    |

   (‖ marks the critical cut-off)

In (39), variation arises: Cand1 and Cand2 violate C4 and C3 respectively, which


are below the cut-off, so the violations are not severe enough to eliminate Cand1 and Cand2 as possible outputs. As for the relative frequencies of the two outputs, Cand1 should be the more frequent one, since the constraint it violates is ranked lower than the one Cand2 violates; that is, Cand1 is more well-formed than Cand2. The ROE model neatly accounts for the variation attested in vowel deletion in Faialense Portuguese and in reduplication and metathesis in Ilokano in Coetzee’s paper.
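The cut-off mechanism just described can be sketched as follows. The data structures are my own encoding of the violation pattern in (39), not part of Coetzee's proposal:

```python
# Sketch of the critical cut-off (Coetzee 2006) as described in the text.
# Constraints before index CUTOFF lie above the cut-off.
RANKING = ["C1", "C2", "C3", "C4"]          # ranked from high to low
CUTOFF = 2                                  # C1 and C2 are above the cut-off

violations = {                              # the pattern of tableau (39)
    "Cand1": {"C4": 1},
    "Cand2": {"C3": 1},
    "Cand3": {"C2": 1},
    "Cand4": {"C1": 1},
}

def profile(cand):
    return tuple(violations[cand].get(c, 0) for c in RANKING)

# A candidate violating any constraint above the cut-off is ungrammatical.
grammatical = [c for c in violations
               if all(violations[c].get(k, 0) == 0 for k in RANKING[:CUTOFF])]

# The survivors are variant outputs, rank-ordered by violation profile:
# the higher a candidate sits, the more frequent it is as an output.
variants = sorted(grammatical, key=profile)
print(variants)    # ['Cand1', 'Cand2']
```

Cand3 and Cand4 are filtered out as ungrammatical, while Cand1 precedes Cand2 on the ordering because its only violation is of the lower-ranked constraint.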

ROE differs from cophonology and other constraint re-ranking models in that, in the latter, the non-observed candidates are essentially ungrammatical and can never be evaluated as the output under any sub-grammar or cophonology. It also differs from stochastic OT in that, in the latter, there is in principle no limit on variation, and ungrammaticality is replaced by the concept of extremely low probability, i.e. a form that a speaker may never produce in his or her whole lifespan. What makes ROE superior to cophonology is that relative frequencies are predictable via the notions of gradient well-formedness and the critical cut-off, which endows ROE with quantitative predictability. In (39), for example, Cand1 is predicted to be attested more frequently than Cand2, since the former’s violation is less severe than the latter’s, though both are possible outputs.

Though ROE outperforms cophonology in predicting the relative frequencies of possible outputs, it fails to formulate the precise probabilistic distribution of the variants, a shortcoming successfully overcome by stochastic OT, as will be discussed subsequently. That is, the well-formedness that ROE predicts is a matter of relative ranking between possible outputs, not of their probabilistic distribution. The reason is that in ROE the constraints below the cut-off are categorically ranked, so all that can be predicted is that Cand1 is more probable than Cand2, since the constraint Cand1 violates is ranked lower than the one


Cand2 violates. Another essential flaw of ROE is that it is designed for variants derived from a single input, i.e. free variation; this architecture falls short of accounting for the across-the-board patterned variation that exists in the lexicon, i.e. lexical variation, as observed in nasal substitution in Tagalog loanwords (Zuraw 2010) and in the adaptation of English [ ]-codas in TM loanwords (Lü 2013).

Again, to compare the validity of the different OT versions that cope with language variation, we instantiate the central claims of ROE with our loanword data. However, since ROE is not suited to a formal analysis of lexical variation, we investigate the free variation among TM adaptations of English [ ]-codas that correspond to a single English source word. This is exemplified in (40).

(40) Free variation of English loanwords in TM

   L2 input           | L1 output      | Process   | Percentage (Entries)
   [.n .m n.] Norman  | [.nwo. .man.]  | Retention |  8.21% ( 27,000/329,000)
                      | [.nwo.man.]    | Deletion  | 91.79% (302,000/329,000)

   Source: www.google.com.tw

The Google hit counts are strikingly similar in frequency to the distribution observed for the lexical variation of the same type in our corpus (9.28% for retention and 90.72% for deletion). Within ROE, the pattern can be formally analyzed as in (41).

(41) An ROE analysis of free variation

   [.n .m n.]            | CodaCon ‖ Dep-V | Max-R(V_)
   a. ☞1 [.nwo.man.]     |         ‖       | *
   b. ☞2 [.nwo. .man.]   |         ‖ *     |
   c.     [.no .man.]    | *!      ‖       |

   (‖ marks the critical cut-off)


Variation arises from this grammar, since Candidates (a) and (b) are each disfavored by only one of the constraints below the critical cut-off, and neither is disfavored by the constraint above it. Candidate (c) can never be one of the outputs, since it violates the constraint above the cut-off. Moreover, between the two possible outputs, Candidate (a) is predicted to be observed more frequently than Candidate (b), as the constraint the former violates is ranked lower. However, as has been discussed, the ROE approach cannot account for the probabilities, and the distributional tendencies, of the retention/deletion of English [ ]-codas that emerge from the effect of syllable position on loanword adaptation. At best, ROE tells us which variant is more common and which is rarer, which could correspond to a 60%-40% or a 90%-10% probabilistic distribution. This is inadequate from a functional point of view, since it says nothing about the significant effect that syllable position exerts on perceptual salience, which leads to over 90% of [ ]-coda deletion.

2.5.3 Stochastic OT

Stochastic candidate evaluation originates in the development of Boersma’s (1997, 1998) Gradual Learning Algorithm (GLA), an error-driven algorithm that simulates the phonological learning of a (fragment of a) constraint-based grammar, and in Boersma and Hayes’s (2001) empirical application of GLA. What is unique to the algorithm is the type of Optimality-theoretic grammar it advocates: instead of a set of ranked constraints that are essentially discrete from one another, it features a continuum of constraint strictness on which each constraint is assigned a certain value. Higher values correspond to higher-ranked, less violable constraints, and vice versa.
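The error-driven character of GLA can be sketched as a single update step. This is my own simplification of Boersma and Hayes's algorithm; the plasticity value and the dict encoding are illustrative, not theirs:

```python
# A simplified sketch of one GLA update: when the grammar's output mismatches
# the learning datum, every constraint that penalizes the wrong output is
# promoted a little, and every constraint that penalizes the correct form is
# demoted a little. PLASTICITY and the encoding are illustrative assumptions.
PLASTICITY = 0.1   # step size by which ranking values move per error

def gla_step(values, learner_viols, correct_viols):
    """values: constraint -> ranking value; learner_viols / correct_viols:
    constraint -> violation count of the learner's (wrong) output and of
    the correct adult form, respectively."""
    if learner_viols == correct_viols:
        return values                      # no mismatch: no update
    updated = dict(values)
    for c in values:
        lv, cv = learner_viols.get(c, 0), correct_viols.get(c, 0)
        if lv > cv:
            updated[c] += PLASTICITY       # c penalizes the wrong output: promote
        elif cv > lv:
            updated[c] -= PLASTICITY       # c penalizes the correct form: demote
    return updated
```

Repeated over many learning data, these small promotions and demotions gradually pull the ranking values toward a grammar that reproduces the input.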

The schema in (42) presents a categorical ranking, where Constraint A >> Constraint B >> Constraint C, though it might be said that Constraint A outranks Constraint B more than Constraint B outranks Constraint C.


(42) A continuous ranking scale (categorical)

   [Figure: Constraints A, B, and C as single points on a strict-to-lax scale, with A >> B >> C]

As long as each constraint occupies a single point on the scale, the ranking will never alter. The continuous scale becomes theoretically more significant, however, if each constraint is assigned a range of values rather than a single point on the scale. This assumption is realistically grounded: at evaluation time, i.e. the moment of speaking, a random positive or negative noise value is added to the ranking value (the permanent central point of a constraint’s range), and the resulting value actually used in evaluation is termed the selection point. In this scenario, the dominance relation between two constraints becomes less fixed if their ranking values are close enough for their ranges to overlap; within the overlapping area the ranking between them is unspecified, depending on which selection points are chosen as the real values. This is shown below.

(43) A continuous ranking scale with ranges

   [Figure: two strict-to-lax scales (high ranked → low ranked), each showing Constraints A, B, and C as ranges; selection points A1, B1, C1 are drawn at the first evaluation and A2, B2, C2 at the second, with the ranges of B and C overlapping]


As depicted in (43), Constraint A is too far from the other constraints to overlap with them, and hence it is ranked highest in both evaluations. At the first evaluation time (Selection Point 1), B1 is higher than C1, a ranking that obtains in most cases because Constraint B has the higher ranking value. At the second evaluation, however, C2 outranks B2, because the speaker happens to draw a value near the bottom of Constraint B’s range and one near the top of Constraint C’s. Such a ranking, though possible, is rarer, since it can only arise within the comparatively small overlapping area.
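This evaluation-time reversal can be simulated directly. The ranking values below are invented for illustration, and the noise value of 2.0 is an arbitrary choice:

```python
import random

# Simulating evaluation time in stochastic OT: each constraint's selection
# point is its fixed ranking value plus Gaussian noise. The ranking values
# here are invented for illustration; sigma = 2.0 is an arbitrary noise value.
random.seed(1)                      # fixed seed for a reproducible run
RANKING_VALUES = {"A": 110.0, "B": 100.0, "C": 97.0}
NOISE_SD = 2.0

def selection_points():
    return {c: v + random.gauss(0, NOISE_SD) for c, v in RANKING_VALUES.items()}

# B and C are close enough for their ranges to overlap, so the noise
# occasionally reverses their ranking; A is far enough away that reversals
# involving it are vanishingly rare.
flips = sum(1 for _ in range(10000)
            if (sp := selection_points())["C"] > sp["B"])
print(f"C outranked B in {flips / 100:.1f}% of 10,000 evaluations")
```

With these illustrative values the reversal shows up in a minority of evaluations, which is exactly the profile of a common variant coexisting with a rarer one.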

A noteworthy concept along these lines is that the random noise perturbation, which defines the constraint range, can be properly portrayed as a bell-shaped normal (Gaussian) distribution, in which 68.27% of the selection points reside within one standard deviation (σ) of the mean (μ), 95.45% within two σ’s, and 99.73% within three. Any selection point falling beyond μ ± 3σ is vanishingly improbable. The overlapping of ranking distributions is illustrated below (Boersma and Hayes 1999):

(44) Overlapping ranking distributions

   [Figure: two overlapping bell-shaped ranking distributions on the strictness scale]

Stochastic evaluation was originally applied in simulated learning algorithms, in which constraints whose relative positions conflict with the current ranking hypothesis may shift along the ranking scale as the algorithm is “fed” more correct linguistic input. This mechanism, however, has been found highly workable for lexical variation with optionality in loanword adaptation (Zuraw 2010). The


difference, however, is that the adult speaker’s grammar for loanwords should be fixed (though the values selected within the overlapping area vary), unlike the movable ranking values of constraints in GLA. Another discrepancy lies in the claim that the “fixed” loanword phonology is hardly acquired or learned, at least in the phase of perceptual processing that decides between retention and deletion of an input segment; rather, the decision is informed by both phonetic cues and native phonology.

As exemplified above, the difficulty of developing a formal analysis for the probabilistic distribution of English [ ]-codas in TM adaptations casts doubt on the quantitative predictability of cophonology and ROE. Let us turn to stochastic evaluation to see whether it serves as a more feasible model of lexical variation in loanword adaptation. The same data are given again, along with the respective rankings that derive the deletion and retention of postvocalic [ ]’s, as shown in (45).

(45) Variable adaptation of English [ ]-codas in TM

   Strategy  | Constraint ranking             | Percentage
   Deletion  | CodaCon >> Dep-V >> Max-R(V_)  | 90.72% (176/194)
   Retention | CodaCon >> Max-R(V_) >> Dep-V  |  9.28% ( 18/194)

From a stochastic viewpoint, the ranking values of Dep-V and Max-R(V_) must be close enough to create an overlapping area, whose degree of overlap is determined by the attested frequency of each ranking. Following the convention of stochastic OT, the arbitrary value of 2.0 is adopted as the evaluation noise (σ, the standard deviation), so each constraint’s range covers 12 units (2*3*2, as 99.73% of the selection points fall within 3 σ’s on either side). The initial state of the constraints is given the arbitrary value of 100. Though there should be no


“starting point” (in Boersma’s original works) in the current fixed adult grammar, we place the constraints in this neighborhood for the sake of consistency. Through mathematical calculation,13 the ranking values are worked out and listed in (46).

(46) Stochastic ranking for English postvocalic [ ]

   Constraint  | Ranking value
   CodaCon     | 135.56
   Dep-V       |  93.55
   Max-R(V_)   |  83.78

CodaCon is arbitrarily assigned the ranking value 135.56 so that it is high enough not to overlap with the other constraints’ ranges. Max-R(V_) is assigned a value generally lower than that of Dep-V, though their ranges still overlap. With the ranking values 93.55 and 83.78 allocated to Dep-V and Max-R(V_), respectively, the overlapping area covers 18.56% of each constraint’s range, so the odds that Max-R(V_) outranks Dep-V are 9.28%. In (47), these ranking values are applied to genuine loanword data with hypothetical selection points, where CSP stands for “common selection points” and RSP for “rare selection points”.

(47) Hypothetical selection points and results from (46)

        | Dep-V | Max-R(V_) | Result    | Example
   CSP  | 97.1  | 86.45     | Deletion  | [.n .m n.] → [.nwo.man.]
   RSP  | 87.63 | 89.11     | Retention | [.h .mon.] → [.x . .mo .]
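The reported figures can be approximately reproduced under one reading of the calculation, suggested by the 12-unit ranges mentioned above: treat each constraint as a uniform 12-unit range around its ranking value, take the proportional overlap of the two ranges, and halve it to obtain the odds of reversal. This is a sketch under that assumption, not the procedure actually spelled out in 6.2:

```python
# Approximate reconstruction of the figures in (46)-(47), assuming each
# constraint occupies a 12-unit range (ranking value ± 3σ, with σ = 2) and
# that the odds of a ranking reversal are half the proportional overlap.
# This is an assumption about the method, not the procedure in 6.2.
SIGMA = 2.0
HALF_RANGE = 3 * SIGMA              # 6 units on each side of the ranking value

dep_v, max_r = 93.55, 83.78         # ranking values from (46)

overlap = (max_r + HALF_RANGE) - (dep_v - HALF_RANGE)  # shared stretch of scale
share = overlap / (2 * HALF_RANGE)                     # fraction of each range
odds_reversal = share / 2

print(f"overlap covers {share:.2%} of each range; "
      f"odds that Max-R(V_) >> Dep-V ≈ {odds_reversal:.2%}")
```

This yields roughly 18.58% and 9.29%, matching the 18.56% and 9.28% reported in the text up to rounding.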

As has been discussed thus far, stochastic OT appears to be a promising version of OT, able to deal with the otherwise paradoxical problems posed by cophonology. The advantages are twofold. The first is theoretical simplicity,

13 A detailed elaboration of the rationale for calculating the ranking values of constraints is provided in 6.2.


since it economizes by maintaining a single ranking, viewing constraints as ranges of values on a continuous strictness scale; the overlap of two constraints, where free ranking may occur, arises naturally whenever their ranking values are close enough to one another. This is conceptually superior to cophonology, which requires diverse subgrammars even under a single master ranking. The second is explanatory adequacy. Inheriting its strength in modeling changing rankings in an error-driven learning algorithm, stochastic candidate evaluation excels at deriving the precise occurrence probability of each variant by calculating the overlapping coverage on the linear scale of constraint strictness, something both cophonology and ROE may find difficult to achieve.

Despite the many merits that stochastic OT enjoys, a theoretical phonologist may still face two potential challenges in developing a formal analysis of language variation within this framework. First, Boersma’s works hardly spell out how degrees of overlap and ranking values are actually calculated, though followers may find it convenient to conduct a computer-based stochastic analysis of language variation by means of OTSoft. The other challenge stems from an inherent limitation of the theory itself: it can formulate variation induced by the overlap of two conflicting constraints, but it seems unable to handle the overlapping areas created by three or more constraints in mutual conflict. That is, stochastic OT may have a hard time accounting for variation generated by three or more different rankings among three or more constraints. This potential challenge, however, can be overcome with a simple mathematical understanding of the positional relationships of overlapping constraints, as will be revealed in Chapter 6.
