
Parameter Re-Estimation Formulae

In the document The HTK Book (pages 151-156)


8.8 Parameter Re-Estimation Formulae

    # alignment model set for two-model re-estimation
    ALIGNMODELMMF = dir2/hmacs
    ALIGNHMMLIST = hmmlist2

is necessary. HERest only needs to be invoked using that configuration file.

    HERest -C config -C config.2model -S trainlist -I labs -H dir1/hmacs -M dir3 hmmlist1

The models in directory dir1 are updated using the alignment models stored in directory dir2, and the result is written to directory dir3. Note that trainlist is a standard HTK script and that the above command uses the capability of HERest to accept multiple configuration files on the command line. If each HMM is stored in a separate file, the configuration variables ALIGNMODELDIR and ALIGNMODELEXT can be used.

Only the state level alignment is obtained using the alignment models. In the exceptional case that the update model set contains mixtures of Gaussians, component level posterior probabilities are obtained from the update models themselves.

8.8 Parameter Re-Estimation Formulae

For reference purposes, this section lists the various formulae employed within the HTK parameter estimation tools. All are standard; however, the use of non-emitting states and multiple data streams leads to various special cases which are usually not covered fully in the literature.

The following notation is used in this section:

$N$: number of states

$S$: number of streams

$M_s$: number of mixture components in stream $s$

$T$: number of observations

$Q$: number of models in an embedded training sequence

$N_q$: number of states in the $q$'th model in a training sequence

$O$: a sequence of observations

$o_t$: the observation at time $t$, $1 \le t \le T$

$o_{st}$: the observation vector for stream $s$ at time $t$

$a_{ij}$: the probability of a transition from state $i$ to $j$

$c_{jsm}$: weight of mixture component $m$ in state $j$ stream $s$

$\mu_{jsm}$: vector of means for the mixture component $m$ of state $j$ stream $s$

$\Sigma_{jsm}$: covariance matrix for the mixture component $m$ of state $j$ stream $s$

$\lambda$: the set of all parameters defining a HMM

8.8.1 Viterbi Training (HInit)

In this style of model training, a set of training observations $O^r$, $1 \le r \le R$, is used to estimate the parameters of a single HMM by iteratively computing Viterbi alignments. When used to initialise a new HMM, the Viterbi segmentation is replaced by a uniform segmentation (i.e. each training observation is divided into $N$ equal segments) for the first iteration.
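As a minimal sketch of the uniform first-iteration segmentation, the Python function below gives each frame a segment index so that segments are contiguous and differ in length by at most one frame. The name `uniform_segmentation` is hypothetical, and the example passes the number of emitting states, on the assumption that the non-emitting entry and exit states receive no frames.

```python
def uniform_segmentation(num_frames: int, num_segments: int) -> list[int]:
    """Label each frame 0..num_frames-1 with a segment index 0..num_segments-1.

    Segments are contiguous and differ in length by at most one frame.
    Assumes 1 <= num_segments <= num_frames.
    """
    if not 1 <= num_segments <= num_frames:
        raise ValueError("need 1 <= num_segments <= num_frames")
    # Frame t goes to segment floor(t * S / T).
    return [t * num_segments // num_frames for t in range(num_frames)]


# Ten frames shared among three emitting states (hypothetical example).
labels = uniform_segmentation(10, 3)
```

Here the ten frames are split 4, 3, 3 across the three states.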

Apart from the first iteration on a new model, each training sequence O is segmented using a state alignment procedure which results from maximising

$$\phi_N(T) = \max_i \left\{ \phi_i(T)\, a_{iN} \right\}$$

for $1 < i < N$, where

$$\phi_j(t) = \left[ \max_i \left\{ \phi_i(t-1)\, a_{ij} \right\} \right] b_j(o_t)$$

with initial conditions given by

$$\phi_1(1) = 1, \qquad \phi_j(1) = a_{1j}\, b_j(o_1)$$

for $1 < j < N$. In this and all subsequent cases, the output probability $b_j(\cdot)$ is as defined in equations 7.1 and 7.2 in section 7.1.
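The Viterbi recursion above, including the non-emitting entry and exit states, can be sketched in Python as follows. This assumes a single stream, output probabilities $b_j(o_t)$ already computed into a table, and plain probabilities; a practical implementation would work with log probabilities to avoid underflow. Python indices are 0-based, so index 0 is HTK state 1 and index n-1 is state $N$. The function `viterbi_align` and the toy model are hypothetical.

```python
def viterbi_align(a, b):
    """Viterbi alignment for one HMM with non-emitting entry and exit states.

    a[i][j]: transition probabilities; index 0 is the entry state (HTK state 1)
             and index n-1 the exit state (HTK state N).
    b[j][t]: output probability b_j(o_t) of emitting state j at frame t
             (rows 0 and n-1 are never read).
    Returns (phi_N(T), list of emitting-state indices, one per frame).
    """
    n, num_frames = len(a), len(b[0])
    emitting = range(1, n - 1)
    phi = [[0.0] * n for _ in range(num_frames)]
    back = [[0] * n for _ in range(num_frames)]
    for j in emitting:  # phi_j(1) = a_1j b_j(o_1)
        phi[0][j] = a[0][j] * b[j][0]
    for t in range(1, num_frames):
        for j in emitting:  # phi_j(t) = [max_i phi_i(t-1) a_ij] b_j(o_t)
            best = max(emitting, key=lambda i: phi[t - 1][i] * a[i][j])
            phi[t][j] = phi[t - 1][best] * a[best][j] * b[j][t]
            back[t][j] = best
    # phi_N(T) = max_i phi_i(T) a_iN
    last = max(emitting, key=lambda i: phi[-1][i] * a[i][n - 1])
    score = phi[-1][last] * a[last][n - 1]
    path = [last]
    for t in range(num_frames - 1, 0, -1):  # follow the back-pointers
        path.append(back[t][path[-1]])
    return score, path[::-1]


# Toy left-to-right model: entry -> 1 -> 2 -> exit, three frames.
A = [[0.0, 1.0, 0.0, 0.0],
     [0.0, 0.6, 0.4, 0.0],
     [0.0, 0.0, 0.7, 0.3],
     [0.0, 0.0, 0.0, 0.0]]
B = [[0.0, 0.0, 0.0],
     [0.9, 0.8, 0.1],
     [0.1, 0.3, 0.9],
     [0.0, 0.0, 0.0]]
score, path = viterbi_align(A, B)
```

For this toy model the best path scores $\phi_N(T) = 0.046656$ and aligns the three frames to indices 1, 1, 2 (HTK states 2, 2, 3).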


If $A_{ij}$ represents the total number of transitions from state $i$ to state $j$ in performing the above maximisations, then the transition probabilities can be estimated from the relative frequencies

$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{k=2}^{N} A_{ik}}$$
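Counting transitions along the Viterbi paths and normalising each row gives the relative-frequency estimate. The sketch below, with a hypothetical `estimate_transitions` function, assumes each path lists the emitting states (0-based indices 1 to n-2) visited frame by frame and adds the entry and exit transitions explicitly. It applies the same normalisation to the entry-state row, which is an assumption of this sketch.

```python
def estimate_transitions(paths, n):
    """Relative-frequency transition estimates from Viterbi state sequences.

    paths: one list of emitting-state indices (1..n-2) per training sequence.
    n: total number of states, including entry (index 0) and exit (index n-1).
    """
    counts = [[0] * n for _ in range(n)]
    for path in paths:
        full = [0, *path, n - 1]  # add the entry and exit states
        for i, j in zip(full, full[1:]):
            counts[i][j] += 1  # A_ij
    a_hat = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        total = sum(counts[i][1:])  # sum_{k=2}^{N} A_ik in HTK numbering
        if total:
            for j in range(1, n):
                a_hat[i][j] = counts[i][j] / total
    return a_hat


a_hat = estimate_transitions([[1, 1, 2], [1, 2, 2, 2]], n=4)
```

With these two paths, state 1 is followed by itself once and by state 2 twice, so its row becomes 1/3 and 2/3.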

The sequence of states which maximises $\phi_N(T)$ implies an alignment of training data observations with states. Within each state, a further alignment of observations to mixture components is made. The tool HInit provides two mechanisms for this: for each state and each stream

1. use clustering to allocate each observation $o_{st}$ to one of $M_s$ clusters, or

2. associate each observation $o_{st}$ with the mixture component with the highest probability.

In either case, the net result is that every observation is associated with a single unique mixture component. This association can be represented by the indicator function $\psi^r_{jsm}(t)$, which is 1 if $o^r_{st}$ is associated with mixture component $m$ of stream $s$ of state $j$ and is zero otherwise.
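Option 2 can be sketched for a single stream of scalar observations. The Python below scores each frame against every component with $c_m \mathcal{N}(x; \mu_m, \sigma_m^2)$ and keeps the best. Including the weight in the score, and the helper names, are assumptions rather than a description of HInit's internals. The returned list encodes $\psi$: frame $t$ has $\psi_m(t) = 1$ only for the listed component $m$.

```python
import math


def gaussian(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def assign_components(frames, weights, means, variances):
    """Hard-assign each scalar frame to its most probable mixture component.

    Returns one component index per frame, i.e. exactly one active
    indicator psi_m(t) per frame.
    """
    assignment = []
    for x in frames:
        scores = [c * gaussian(x, mu, var)
                  for c, mu, var in zip(weights, means, variances)]
        assignment.append(max(range(len(scores)), key=scores.__getitem__))
    return assignment


psi = assign_components([-2.1, -1.9, 2.0, 2.2, 0.1],
                        weights=[0.5, 0.5], means=[-2.0, 2.0], variances=[1.0, 1.0])
```

The frame at 0.1 lies slightly closer to the component at 2.0, so it is allocated there.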

The means and variances are then estimated via simple averages

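$$\hat{\mu}_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)\, o^r_{st}}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)}$$

$$\hat{\Sigma}_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)\, (o^r_{st} - \hat{\mu}_{jsm})(o^r_{st} - \hat{\mu}_{jsm})^{\mathsf{T}}}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)}$$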
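Given a hard assignment, the two averages reduce to a per-component sample mean and covariance. The hypothetical `hard_mean_cov` below assumes the observations for one stream are plain Python lists of floats and that component `m` has at least one assigned frame. As in the formula, the covariance is taken about the newly estimated mean.

```python
def hard_mean_cov(sequences, assignments, m):
    """Mean and full covariance of component m from hard assignments.

    sequences[r][t]: observation vector o^r_t (a list of floats) for one stream.
    assignments[r][t]: index of the component that frame is allocated to.
    """
    frames = [o for seq, psi in zip(sequences, assignments)
              for o, k in zip(seq, psi) if k == m]
    count = len(frames)  # sum_r sum_t psi^r_m(t)
    dim = len(frames[0])
    mean = [sum(o[i] for o in frames) / count for i in range(dim)]
    # Average outer product of deviations from the new mean.
    cov = [[sum((o[i] - mean[i]) * (o[j] - mean[j]) for o in frames) / count
            for j in range(dim)] for i in range(dim)]
    return mean, cov


seqs = [[[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]],
        [[5.0, 4.0], [11.0, 11.0]]]
labels = [[0, 0, 1], [0, 1]]
mean0, cov0 = hard_mean_cov(seqs, labels, m=0)
```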

Finally, the mixture weights are based on the number of observations allocated to each component

$$c_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \sum_{l=1}^{M_s} \psi^r_{jsl}(t)}$$
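The mixture weight is then the share of the state's frames allocated to each component. A short sketch, taking the assignment as a list of component indices per training sequence (the function name is hypothetical):

```python
from collections import Counter


def hard_mixture_weights(assignments, num_components):
    """Fraction of a state's frames allocated to each mixture component."""
    counts = Counter(k for psi in assignments for k in psi)
    total = sum(counts.values())  # sum over r, t and all components l
    return [counts[m] / total for m in range(num_components)]


weights = hard_mixture_weights([[0, 0, 1], [0, 1]], num_components=2)
```

Three of the five frames belong to component 0, so the weights are 0.6 and 0.4.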

8.8.2 Forward/Backward Probabilities

Baum-Welch training is similar to the Viterbi training described in the previous section, except that the hard boundary implied by the $\psi$ function is replaced by a soft boundary function $L$, which represents the probability of an observation being associated with any given Gaussian mixture component. This occupation probability is computed from the forward and backward probabilities.

For the isolated-unit style of training, the forward probability $\alpha_j(t)$ for $1 < j < N$ and $1 < t \le T$ is calculated by the forward recursion

$$\alpha_j(t) = \left[ \sum_{i=2}^{N-1} \alpha_i(t-1)\, a_{ij} \right] b_j(o_t)$$

with initial conditions given by

$$\alpha_1(1) = 1, \qquad \alpha_j(1) = a_{1j}\, b_j(o_1)$$

for $1 < j < N$, and final condition given by

$$\alpha_N(T) = \sum_{i=2}^{N-1} \alpha_i(T)\, a_{iN}$$
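A direct Python transcription of the forward recursion follows. It assumes a 0-based layout (index 0 is the entry state, index n-1 the exit state), a precomputed table of $b_j(o_t)$, and plain rather than log or scaled probabilities. The function name and the toy model are illustrative.

```python
def forward(a, b):
    """Forward probabilities alpha[t][j] and P = alpha_N(T) for one HMM.

    Index 0 corresponds to HTK's entry state 1 and index n-1 to the exit
    state N; b[j][t] holds b_j(o_t) for the emitting states.
    """
    n, num_frames = len(a), len(b[0])
    emitting = range(1, n - 1)
    alpha = [[0.0] * n for _ in range(num_frames)]
    for j in emitting:  # alpha_j(1) = a_1j b_j(o_1)
        alpha[0][j] = a[0][j] * b[j][0]
    for t in range(1, num_frames):
        for j in emitting:  # [sum_i alpha_i(t-1) a_ij] b_j(o_t)
            alpha[t][j] = sum(alpha[t - 1][i] * a[i][j] for i in emitting) * b[j][t]
    # Final condition: alpha_N(T) = sum_i alpha_i(T) a_iN
    p = sum(alpha[-1][i] * a[i][n - 1] for i in emitting)
    return alpha, p


# Toy left-to-right model: entry -> 1 -> 2 -> exit, three frames.
A = [[0.0, 1.0, 0.0, 0.0],
     [0.0, 0.6, 0.4, 0.0],
     [0.0, 0.0, 0.7, 0.3],
     [0.0, 0.0, 0.0, 0.0]]
B = [[0.0, 0.0, 0.0],
     [0.9, 0.8, 0.1],
     [0.1, 0.3, 0.9],
     [0.0, 0.0, 0.0]]
alpha, P = forward(A, B)
```

For this toy model $\alpha_N(T) = 0.067068$.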

The backward probability $\beta_i(t)$ for $1 < i < N$ and $T > t \ge 1$ is calculated by the backward recursion

$$\beta_i(t) = \sum_{j=2}^{N-1} a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1)$$

with initial conditions given by

$$\beta_i(T) = a_{iN}$$

for $1 < i < N$, and final condition given by

$$\beta_1(1) = \sum_{j=2}^{N-1} a_{1j}\, b_j(o_1)\, \beta_j(1)$$
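The backward recursion transcribes the same way, under the same assumptions (0-based states with index 0 as entry and n-1 as exit, precomputed output probabilities, no scaling). For a given model and observation sequence, the value it returns as $\beta_1(1)$ should equal $\alpha_N(T)$ from the forward pass, which makes a convenient consistency check. The function name and toy model are illustrative.

```python
def backward(a, b):
    """Backward probabilities beta[t][i] and P = beta_1(1) for one HMM.

    Index 0 corresponds to HTK's entry state 1 and index n-1 to the exit
    state N; b[j][t] holds b_j(o_t) for the emitting states.
    """
    n, num_frames = len(a), len(b[0])
    emitting = range(1, n - 1)
    beta = [[0.0] * n for _ in range(num_frames)]
    for i in emitting:  # beta_i(T) = a_iN
        beta[-1][i] = a[i][n - 1]
    for t in range(num_frames - 2, -1, -1):
        for i in emitting:  # sum_j a_ij b_j(o_{t+1}) beta_j(t+1)
            beta[t][i] = sum(a[i][j] * b[j][t + 1] * beta[t + 1][j] for j in emitting)
    # Final condition: beta_1(1) = sum_j a_1j b_j(o_1) beta_j(1)
    p = sum(a[0][j] * b[j][0] * beta[0][j] for j in emitting)
    return beta, p


# Toy left-to-right model: entry -> 1 -> 2 -> exit, three frames.
A = [[0.0, 1.0, 0.0, 0.0],
     [0.0, 0.6, 0.4, 0.0],
     [0.0, 0.0, 0.7, 0.3],
     [0.0, 0.0, 0.0, 0.0]]
B = [[0.0, 0.0, 0.0],
     [0.9, 0.8, 0.1],
     [0.1, 0.3, 0.9],
     [0.0, 0.0, 0.0]]
beta, P = backward(A, B)
```

For this toy model the backward pass gives $\beta_1(1) = 0.067068$.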

In the case of embedded training, where the HMM spanning the observations is a composite constructed by concatenating $Q$ subword models, it is assumed that at time $t$ the $\alpha$ and $\beta$ values corresponding to the entry state and exit states of a HMM represent the forward and backward probabilities at times $t - \Delta t$ and $t + \Delta t$, respectively, where $\Delta t$ is small. The equations for calculating $\alpha$ and $\beta$ are then as follows.

For the forward probability, the initial conditions are established at time $t = 1$ as follows

$$\alpha^{(q)}_1(1) = \begin{cases} 1 & \text{if } q = 1 \\ \alpha^{(q-1)}_1(1)\, a^{(q-1)}_{1N_{q-1}} & \text{otherwise} \end{cases}$$

$$\alpha^{(q)}_j(1) = a^{(q)}_{1j}\, b^{(q)}_j(o_1)$$

$$\alpha^{(q)}_{N_q}(1) = \sum_{i=2}^{N_q-1} \alpha^{(q)}_i(1)\, a^{(q)}_{iN_q}$$

where the superscript in parentheses refers to the index of the model in the sequence of concatenated models. All unspecified values of $\alpha$ are zero. For time $t > 1$,

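$$\alpha^{(q)}_1(t) = \begin{cases} 0 & \text{if } q = 1 \\ \alpha^{(q-1)}_{N_{q-1}}(t-1) + \alpha^{(q-1)}_1(t)\, a^{(q-1)}_{1N_{q-1}} & \text{otherwise} \end{cases}$$

$$\alpha^{(q)}_j(t) = \left[ \alpha^{(q)}_1(t)\, a^{(q)}_{1j} + \sum_{i=2}^{N_q-1} \alpha^{(q)}_i(t-1)\, a^{(q)}_{ij} \right] b^{(q)}_j(o_t)$$

$$\alpha^{(q)}_{N_q}(t) = \sum_{i=2}^{N_q-1} \alpha^{(q)}_i(t)\, a^{(q)}_{iN_q}$$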

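For the backward probability, the initial conditions are set at time $t = T$ as follows

$$\beta^{(q)}_{N_q}(T) = \begin{cases} 1 & \text{if } q = Q \\ \beta^{(q+1)}_{N_{q+1}}(T)\, a^{(q+1)}_{1N_{q+1}} & \text{otherwise} \end{cases}$$

$$\beta^{(q)}_i(T) = a^{(q)}_{iN_q}\, \beta^{(q)}_{N_q}(T)$$

$$\beta^{(q)}_1(T) = \sum_{j=2}^{N_q-1} a^{(q)}_{1j}\, b^{(q)}_j(o_T)\, \beta^{(q)}_j(T)$$

where once again all unspecified $\beta$ values are zero. For time $t < T$,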

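$$\beta^{(q)}_{N_q}(t) = \begin{cases} 0 & \text{if } q = Q \\ \beta^{(q+1)}_1(t+1) + \beta^{(q+1)}_{N_{q+1}}(t)\, a^{(q+1)}_{1N_{q+1}} & \text{otherwise} \end{cases}$$

$$\beta^{(q)}_i(t) = a^{(q)}_{iN_q}\, \beta^{(q)}_{N_q}(t) + \sum_{j=2}^{N_q-1} a^{(q)}_{ij}\, b^{(q)}_j(o_{t+1})\, \beta^{(q)}_j(t+1)$$

$$\beta^{(q)}_1(t) = \sum_{j=2}^{N_q-1} a^{(q)}_{1j}\, b^{(q)}_j(o_t)\, \beta^{(q)}_j(t)$$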

The total probability $P = \mathrm{prob}(O \mid \lambda)$ can be computed from either the forward or backward probabilities

$$P = \alpha_N(T) = \beta_1(1)$$


8.8.3 Single Model Reestimation (HRest)

In this style of model training, a set of training observations $O^r$, $1 \le r \le R$, is used to estimate the parameters of a single HMM. The basic formula for the reestimation of the transition probabilities is

$$\hat{a}_{ij} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^r_i(t)\, a_{ij}\, b_j(o^r_{t+1})\, \beta^r_j(t+1)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \alpha^r_i(t)\, \beta^r_i(t)}$$

where $1 < i < N$ and $1 < j < N$, and $P_r$ is the total probability $P = \mathrm{prob}(O^r \mid \lambda)$ of the $r$'th observation. The transitions from the non-emitting entry state are reestimated by

$$\hat{a}_{1j} = \frac{1}{R} \sum_{r=1}^{R} \frac{1}{P_r}\, \alpha^r_j(1)\, \beta^r_j(1)$$

where $1 < j < N$, and the transitions from the emitting states to the final non-emitting exit state are reestimated by

$$\hat{a}_{iN} = \frac{\sum_{r=1}^{R} \frac{1}{P_r}\, \alpha^r_i(T)\, \beta^r_i(T)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \alpha^r_i(t)\, \beta^r_i(t)}$$

where $1 < i < N$.
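The three transition formulae can be accumulated in one pass over each training sequence. The sketch below is a hypothetical `reestimate_transitions` function that takes, for every sequence $r$, its forward and backward tables, $P_r$ and output probabilities. The toy tables were obtained by applying the forward and backward recursions of the previous section by hand to a three-frame model with two emitting states. Because each emitting row is normalised by that state's total occupancy, which equals the sum of its outgoing numerators, every re-estimated row sums to one.

```python
def reestimate_transitions(a, observations):
    """Baum-Welch update of a single HMM's transition matrix.

    a: current n x n transition matrix (index 0 = entry state, n-1 = exit).
    observations: one (alpha, beta, p, b) tuple per training sequence r, with
        alpha[t][j] and beta[t][j] from the forward/backward passes,
        p = P_r, and b[j][t] = b_j(o^r_t).
    """
    n = len(a)
    emitting = range(1, n - 1)
    num = [[0.0] * n for _ in range(n)]  # numerators, each term scaled by 1/P_r
    occ = [0.0] * n                      # sum_r (1/P_r) sum_t alpha_i(t) beta_i(t)
    for alpha, beta, p, b in observations:
        last = len(alpha) - 1
        for i in emitting:
            occ[i] += sum(alpha[t][i] * beta[t][i] for t in range(last + 1)) / p
            for j in emitting:
                num[i][j] += sum(alpha[t][i] * a[i][j] * b[j][t + 1] * beta[t + 1][j]
                                 for t in range(last)) / p
            num[i][n - 1] += alpha[last][i] * beta[last][i] / p  # into the exit state
            num[0][i] += alpha[0][i] * beta[0][i] / p            # out of the entry state
    a_hat = [[0.0] * n for _ in range(n)]
    for j in emitting:
        a_hat[0][j] = num[0][j] / len(observations)  # the 1/R factor
    for i in emitting:
        for j in range(1, n):
            a_hat[i][j] = num[i][j] / occ[i]
    return a_hat


# Toy model and hand-computed forward/backward tables for one 3-frame sequence.
A = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.6, 0.4, 0.0], [0.0, 0.0, 0.7, 0.3], [0.0] * 4]
B = [[0.0] * 3, [0.9, 0.8, 0.1], [0.1, 0.3, 0.9], [0.0] * 3]
ALPHA = [[0.0, 0.9, 0.0, 0.0], [0.0, 0.432, 0.108, 0.0], [0.0, 0.02592, 0.22356, 0.0]]
BETA = [[0.0, 0.07452, 0.03969, 0.0], [0.0, 0.108, 0.189, 0.0], [0.0, 0.0, 0.3, 0.0]]
P = 0.067068
a_hat = reestimate_transitions(A, [(ALPHA, BETA, P, B)])
```

With a single training sequence, the exit transition of the second emitting state moves from 0.3 to 23/30.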

For a HMM with $M_s$ mixture components in stream $s$, the means, covariances and mixture weights for that stream are reestimated as follows. Firstly, the probability of occupying the $m$'th mixture component in stream $s$ at time $t$ for the $r$'th observation is

$$L^r_{jsm}(t) = \frac{1}{P_r}\, U^r_j(t)\, c_{jsm}\, b_{jsm}(o^r_{st})\, \beta^r_j(t)\, b^*_{js}(o^r_t)$$

where

$$U^r_j(t) = \begin{cases} a_{1j} & \text{if } t = 1 \\ \sum_{i=2}^{N-1} \alpha^r_i(t-1)\, a_{ij} & \text{otherwise} \end{cases} \tag{8.1}$$

and

$$b^*_{js}(o^r_t) = \prod_{k \ne s} b_{jk}(o^r_{kt})$$

For single Gaussian streams, the probability of mixture component occupancy is equal to the probability of state occupancy, and hence it is more efficient in this case to use

$$L^r_{jsm}(t) = L^r_j(t) = \frac{1}{P_r}\, \alpha_j(t)\, \beta_j(t)$$
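For single-Gaussian states this occupancy is an elementwise product of the forward and backward tables divided by $P_r$. Below is a sketch with a hypothetical `state_occupancy` helper, using tables worked out by hand for a small model with two emitting states. Every path through that model occupies exactly one emitting state per frame, so each frame's occupancies sum to one.

```python
def state_occupancy(alpha, beta, p):
    """L_j(t) = alpha_j(t) beta_j(t) / P for every frame t and state j."""
    return [[a_tj * b_tj / p for a_tj, b_tj in zip(a_t, b_t)]
            for a_t, b_t in zip(alpha, beta)]


# Forward/backward tables for a toy model with emitting states 1 and 2 and
# three frames; index 0 is the entry state and index 3 the exit state.
ALPHA = [[0.0, 0.9, 0.0, 0.0], [0.0, 0.432, 0.108, 0.0], [0.0, 0.02592, 0.22356, 0.0]]
BETA = [[0.0, 0.07452, 0.03969, 0.0], [0.0, 0.108, 0.189, 0.0], [0.0, 0.0, 0.3, 0.0]]
P = 0.067068
occ = state_occupancy(ALPHA, BETA, P)
```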

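Given the above definitions, the re-estimation formulae may now be expressed in terms of $L^r_{jsm}(t)$ as follows.

$$\hat{\mu}_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)\, o^r_{st}}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)}$$

$$\hat{\Sigma}_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)\, (o^r_{st} - \hat{\mu}_{jsm})(o^r_{st} - \hat{\mu}_{jsm})^{\mathsf{T}}}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)} \tag{8.2}$$

$$c_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_j(t)}$$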
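These re-estimation formulae are occupancy-weighted versions of the HInit averages. The sketch below assumes scalar (1-D) observations in a single stream, so the covariance reduces to a variance, and takes the component and state occupancies as given. The function name and the numbers are hypothetical.

```python
def soft_reestimate(sequences, occ_mix, occ_state):
    """Occupancy-weighted means, variances and mixture weights for one state.

    sequences[r][t]: scalar observation o^r_t (single stream, 1-D features).
    occ_mix[r][t][m]: component occupancy L^r_m(t).
    occ_state[r][t]: state occupancy L^r_j(t).
    """
    num_mix = len(occ_mix[0][0])
    state_total = sum(sum(occ) for occ in occ_state)  # sum_r sum_t L^r_j(t)
    means, variances, weights = [], [], []
    for m in range(num_mix):
        pairs = [(o, gam[m]) for seq, occ in zip(sequences, occ_mix)
                 for o, gam in zip(seq, occ)]
        total = sum(w for _, w in pairs)  # sum_r sum_t L^r_m(t)
        mean = sum(w * o for o, w in pairs) / total
        # Variance about the new mean, as in equation (8.2).
        var = sum(w * (o - mean) ** 2 for o, w in pairs) / total
        means.append(mean)
        variances.append(var)
        weights.append(total / state_total)
    return means, variances, weights


means, variances, weights = soft_reestimate(
    sequences=[[0.0, 2.0], [4.0]],
    occ_mix=[[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0]]],
    occ_state=[[1.0, 1.0], [1.0]],
)
```

Component 0 is pulled toward the first frame and component 1 toward the last, giving means 2/3 and 10/3, each with variance 8/9 and weight 0.5.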

8.8.4 Embedded Model Reestimation (HERest)

The re-estimation formulae for the embedded model case have to be modified to take account of the fact that the entry states can be occupied at any time as a result of transitions out of the previous model. The basic formula for the re-estimation of the transition probabilities is

$$\hat{a}^{(q)}_{ij} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^{(q)r}_i(t)\, a^{(q)}_{ij}\, b^{(q)}_j(o^r_{t+1})\, \beta^{(q)r}_j(t+1)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \alpha^{(q)r}_i(t)\, \beta^{(q)r}_i(t)}$$


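The transitions from the non-emitting entry states into the HMM are re-estimated by

$$\hat{a}^{(q)}_{1j} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^{(q)r}_1(t)\, a^{(q)}_{1j}\, b^{(q)}_j(o^r_t)\, \beta^{(q)r}_j(t)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \left[ \alpha^{(q)r}_1(t)\, \beta^{(q)r}_1(t) + \alpha^{(q)r}_1(t)\, a^{(q)}_{1N_q}\, \beta^{(q+1)r}_1(t) \right]}$$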

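and the transitions out of the HMM into the non-emitting exit states are re-estimated by

$$\hat{a}^{(q)}_{iN_q} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^{(q)r}_i(t)\, a^{(q)}_{iN_q}\, \beta^{(q)r}_{N_q}(t)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \alpha^{(q)r}_i(t)\, \beta^{(q)r}_i(t)}$$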

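Finally, the direct transitions from non-emitting entry to non-emitting exit states are re-estimated by

$$\hat{a}^{(q)}_{1N_q} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^{(q)r}_1(t)\, a^{(q)}_{1N_q}\, \beta^{(q+1)r}_1(t)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \left[ \alpha^{(q)r}_1(t)\, \beta^{(q)r}_1(t) + \alpha^{(q)r}_1(t)\, a^{(q)}_{1N_q}\, \beta^{(q+1)r}_1(t) \right]}$$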

The re-estimation formulae for the output distributions are the same as for the single model case except for the obvious additional subscript for $q$. However, the probability calculations must now allow for transitions from the entry states by changing $U^r_j(t)$ in equation 8.1 to

$$U^{(q)r}_j(t) = \begin{cases} \alpha^{(q)r}_1(t)\, a^{(q)}_{1j} & \text{if } t = 1 \\ \alpha^{(q)r}_1(t)\, a^{(q)}_{1j} + \sum_{i=2}^{N_q-1} \alpha^{(q)r}_i(t-1)\, a^{(q)}_{ij} & \text{otherwise} \end{cases}$$

8.8.5 Semi-Tied Transform Estimation (HERest)

In addition to estimating the standard parameters above, HERest can be used to estimate semi-tied transforms and HLDA projections. This section describes semi-tied transforms; the updates for HLDA are very similar.

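Semi-tied covariance matrices have the form

$$\mu_{m_r} = \mu_{m_r}, \qquad \Sigma_{m_r} = H_r\, \Sigma^{\text{diag}}_{m_r}\, H_r^{\mathsf{T}} \tag{8.3}$$

For efficiency reasons the transforms are stored and likelihoods calculated using

$$\mathcal{N}(o; \mu_{m_r}, H_r \Sigma^{\text{diag}}_{m_r} H_r^{\mathsf{T}}) = \frac{1}{|H_r|}\, \mathcal{N}(H_r^{-1} o; H_r^{-1} \mu_{m_r}, \Sigma^{\text{diag}}_{m_r}) = |A_r|\, \mathcal{N}(A_r o; A_r \mu_{m_r}, \Sigma^{\text{diag}}_{m_r}) \tag{8.4}$$

where $A_r = H_r^{-1}$. The transformed mean, $A_r \mu_{m_r}$, is stored in the model files rather than the original mean for efficiency.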
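Equation (8.4) is the usual change-of-variables identity, and it can be checked numerically. The Python sketch below uses an arbitrary 2 × 2 transform $H_r$, diagonal variances, mean and observation (all made up), evaluates the full-covariance density directly, and compares it with $|A_r|\,\mathcal{N}(A_r o; A_r \mu, \Sigma^{\text{diag}})$. The determinant factor is taken in absolute value, as a Jacobian; the helper names are hypothetical.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matmul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec2(m, v):
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


def full_gauss(o, mu, cov):
    """2-D normal density with a full covariance matrix."""
    inv = inv2(cov)
    d = [o[0] - mu[0], o[1] - mu[1]]
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det2(cov)))


def diag_gauss(o, mu, var):
    """2-D normal density with a diagonal covariance (variances in var)."""
    q = sum((o[i] - mu[i]) ** 2 / var[i] for i in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(var[0] * var[1]))


H = [[1.0, 0.5], [0.2, 2.0]]   # semi-tied transform H_r
sigma_diag = [0.5, 1.5]        # diagonal of Sigma^diag
mu = [0.3, -1.0]
o = [1.0, 0.4]

H_T = [[H[0][0], H[1][0]], [H[0][1], H[1][1]]]
cov = matmul2(matmul2(H, [[sigma_diag[0], 0.0], [0.0, sigma_diag[1]]]), H_T)
A = inv2(H)                    # A_r = H_r^{-1}
lhs = full_gauss(o, mu, cov)
rhs = abs(det2(A)) * diag_gauss(matvec2(A, o), matvec2(A, mu), sigma_diag)
```

The two values agree to rounding error, which is why only $A_r$, $A_r \mu_{m_r}$ and the diagonal variances need to be stored.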

The estimation of semi-tied transforms is a doubly iterative process. Given a current set of covariance matrix estimates, the semi-tied transforms are estimated in a similar fashion to the full variance MLLRCOV transforms.

$$a_{ri} = c_{ri}\, G^{(i)-1}_r \sqrt{\frac{\beta_r}{c_{ri}\, G^{(i)-1}_r\, c_{ri}^{\mathsf{T}}}} \tag{8.5}$$

where $a_{ri}$ is the $i$th row of $A_r$, the $1 \times n$ row vector $c_{ri}$ is the vector of cofactors of $A_r$ for row $i$, $c_{rij} = \mathrm{cof}(A_{rij})$, and $G^{(i)}_r$ is defined as

$$G^{(i)}_r = \sum_{m_r=1}^{M_r} \frac{1}{\sigma^{\text{diag}\,2}_{m_r i}} \sum_{t=1}^{T} L_{m_r}(t)\, (o(t) - \mu_{m_r})(o(t) - \mu_{m_r})^{\mathsf{T}} \tag{8.6}$$

This iteratively estimates one row of the transform at a time. The number of iterations is controlled by the HAdapt configuration variable MAXXFORMITER.
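A 2 × 2 sketch of the row-by-row update in equation (8.5) follows. It assumes the statistics $G^{(i)}_r$ have already been accumulated with equation (8.6) (here they are made-up symmetric positive-definite matrices) and that $\beta_r$, which this section uses without defining, is the total occupancy of the components sharing the transform. `num_iters` merely stands in for the role MAXXFORMITER plays, and all names are hypothetical. Substituting (8.5) shows that $a_{ri} G^{(i)}_r a_{ri}^{\mathsf{T}} = \beta_r$ holds right after row $i$ is updated, which the example checks.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def cofactor_row(a, i):
    """Row i of the cofactor matrix of a 2x2 matrix: c_ij = cof(a_ij)."""
    return [a[1][1], -a[1][0]] if i == 0 else [-a[0][1], a[0][0]]


def update_row(a, i, g, beta):
    """One application of equation (8.5) to row i of a 2x2 transform."""
    c = cofactor_row(a, i)
    g_inv = inv2(g)
    c_g_inv = [sum(c[k] * g_inv[k][j] for k in range(2)) for j in range(2)]
    denom = sum(c_g_inv[j] * c[j] for j in range(2))  # c G^-1 c^T
    scale = math.sqrt(beta / denom)
    return [scale * x for x in c_g_inv]


def estimate_transform(a, g_rows, beta, num_iters):
    """Sweep over the rows num_iters times, updating one row at a time."""
    a = [row[:] for row in a]
    for _ in range(num_iters):
        for i in range(2):
            a[i] = update_row(a, i, g_rows[i], beta)
    return a


def quad(row, g):
    """row G row^T."""
    return sum(row[i] * g[i][j] * row[j] for i in range(2) for j in range(2))


# beta: assumed total occupancy for this transform (hypothetical value).
G = [[[2.0, 0.5], [0.5, 1.0]],   # G^(1), symmetric positive definite
     [[1.0, 0.2], [0.2, 3.0]]]   # G^(2)
A_new = estimate_transform([[1.0, 0.0], [0.0, 1.0]], G, beta=10.0, num_iters=3)
```

Since $c_{ri} a_{ri}^{\mathsf{T}}$ is the determinant of $A_r$ expanded along row $i$, and each update makes it positive, the transform stays non-singular from one row update to the next.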

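Having estimated the transform, the diagonal covariance matrix is updated as

$$\Sigma^{\text{diag}}_{m_r} = \mathrm{diag}\left( A_r\, \frac{\sum_{t=1}^{T} L_{m_r}(t)\, (o(t) - \mu_{m_r})(o(t) - \mu_{m_r})^{\mathsf{T}}}{\sum_{t=1}^{T} L_{m_r}(t)}\, A_r^{\mathsf{T}} \right) \tag{8.7}$$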

This is the second loop: given a new estimate of the diagonal variance, a new transform can be estimated. The number of iterations of transform and covariance matrix update is controlled by the HAdapt configuration variable MAXSEMITIEDITER.
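The covariance step of equation (8.7) can be sketched as follows: accumulate the occupancy-weighted scatter about the component mean, rotate it with $A_r$ and keep the diagonal. The function name and data are hypothetical, and the transform is held fixed during this step, which is what makes the overall procedure doubly iterative.

```python
def update_diag_cov(a, frames, occ, mu):
    """Equation (8.7): diag(A W A^T), W the occupancy-weighted scatter about mu.

    a: d x d transform A_r; frames: list of d-dim observations o(t);
    occ: component occupancies L_m(t); mu: component mean in the original space.
    """
    d = len(mu)
    total = sum(occ)
    w = [[sum(l * (o[i] - mu[i]) * (o[j] - mu[j]) for o, l in zip(frames, occ)) / total
          for j in range(d)] for i in range(d)]
    # The k-th diagonal element of A W A^T is a_k W a_k^T for row a_k of A.
    return [sum(a[k][i] * w[i][j] * a[k][j] for i in range(d) for j in range(d))
            for k in range(d)]


new_var = update_diag_cov([[1.0, 1.0], [0.0, 1.0]],
                          frames=[[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]],
                          occ=[1.0, 1.0, 1.0, 1.0], mu=[0.0, 0.0])
```

Here the weighted scatter is diag(0.5, 2.0), and the first row of the transform mixes both dimensions, giving new variances 2.5 and 2.0.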
