Parameter Re-Estimation Formulae



    # alignment model set for two-model re-estimation
    ALIGNMODELMMF = dir2/hmacs
    ALIGNHMMLIST = hmmlist2

is necessary. HERest only needs to be invoked using that configuration file.

    HERest -C config -C config.2model -S trainlist -I labs -H dir1/hmacs -M dir3 hmmlist1

The models in directory dir1 are updated using the alignment models stored in directory dir2 and the result is written to directory dir3. Note that trainlist is a standard HTK script and that the above command uses the capability of HERest to accept multiple configuration files on the command line. If each HMM is stored in a separate file, the configuration variables ALIGNMODELDIR and ALIGNMODELEXT can be used.

Only the state level alignment is obtained using the alignment models. In the exceptional case that the update model set contains mixtures of Gaussians, component level posterior probabilities are obtained from the update models themselves.

8.8 Parameter Re-Estimation Formulae

For reference purposes, this section lists the various formulae employed within the HTK parameter estimation tools. All are standard; however, the use of non-emitting states and multiple data streams leads to various special cases which are usually not covered fully in the literature.

The following notation is used in this section:

$N$: number of states

$S$: number of streams

$M_s$: number of mixture components in stream $s$

$T$: number of observations

$Q$: number of models in an embedded training sequence

$N_q$: number of states in the $q$'th model in a training sequence

$O$: a sequence of observations

$o_t$: the observation at time $t$, $1 \le t \le T$

$o_{st}$: the observation vector for stream $s$ at time $t$

$a_{ij}$: the probability of a transition from state $i$ to $j$

$c_{jsm}$: weight of mixture component $m$ in state $j$ stream $s$

$\mu_{jsm}$: vector of means for the mixture component $m$ of state $j$ stream $s$

$\Sigma_{jsm}$: covariance matrix for the mixture component $m$ of state $j$ stream $s$

$\lambda$: the set of all parameters defining a HMM

8.8.1 Viterbi Training (HInit)

In this style of model training, a set of training observations O^r, 1 ≤ r ≤ R is used to estimate the parameters of a single HMM by iteratively computing Viterbi alignments. When used to initialise a new HMM, the Viterbi segmentation is replaced by a uniform segmentation (i.e. each training observation is divided into N equal segments) for the first iteration.

Apart from the first iteration on a new model, each training sequence $O$ is segmented using a state alignment procedure which results from maximising

\[ \phi_N(T) = \max_i \phi_i(T)\, a_{iN} \]

for $1 < i < N$ where

\[ \phi_j(t) = \left[ \max_i \phi_i(t-1)\, a_{ij} \right] b_j(o_t) \]

with initial conditions given by

\[ \phi_1(1) = 1 \]

\[ \phi_j(1) = a_{1j}\, b_j(o_1) \]

for $1 < j < N$. In this and all subsequent cases, the output probability $b_j(\cdot)$ is as defined in equations 7.1 and 7.2 in section 7.1.
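As an illustration, the recursion above can be sketched in a few lines of Python. This is a minimal sketch and not HInit's actual implementation: the function name `viterbi_align` and the table `b[j][t]` of precomputed output probabilities are assumptions made for the example. States are numbered from 0, so state 0 is the non-emitting entry state and state N-1 the non-emitting exit state.

```python
def viterbi_align(a, b):
    """Viterbi state alignment for one observation sequence.

    a[i][j] : transition probability from state i to state j
              (0-based; state 0 is the entry state, N-1 the exit state).
    b[j][t] : output probability b_j(o_t) of frame t in state j
              (hypothetical precomputed table; only emitting rows are used).
    Returns (phi_N(T), best emitting-state path of length T).
    """
    N, T = len(a), len(b[0])
    emitting = range(1, N - 1)
    phi = [[0.0] * T for _ in range(N)]
    back = [[0] * T for _ in range(N)]

    # Initial conditions: phi_j(1) = a_1j * b_j(o_1)
    for j in emitting:
        phi[j][0] = a[0][j] * b[j][0]

    # Recursion: phi_j(t) = [max_i phi_i(t-1) * a_ij] * b_j(o_t)
    for t in range(1, T):
        for j in emitting:
            best = max(emitting, key=lambda i: phi[i][t - 1] * a[i][j])
            back[j][t] = best
            phi[j][t] = phi[best][t - 1] * a[best][j] * b[j][t]

    # Termination: phi_N(T) = max_i phi_i(T) * a_iN
    last = max(emitting, key=lambda i: phi[i][T - 1] * a[i][N - 1])
    score = phi[last][T - 1] * a[last][N - 1]

    # Trace back the maximising state sequence.
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[path[-1]][t])
    path.reverse()
    return score, path
```

The returned path gives, for each frame, the emitting state to which it is aligned. A practical implementation would normally work with log probabilities to avoid underflow on long sequences.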


If Aij represents the total number of transitions from state i to state j in performing the above maximisations, then the transition probabilities can be estimated from the relative frequencies

\[ \hat{a}_{ij} = \frac{A_{ij}}{\sum_{k=2}^{N} A_{ik}} \]
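The relative-frequency estimate can be sketched as follows. This is an illustrative example only: the function name `estimate_transitions` and the representation of an alignment as a list of emitting-state indices are assumptions, and states are numbered from 0 (entry state 0, exit state N-1). The sketch counts the entry and exit transitions of each aligned sequence along with the transitions between emitting states.

```python
def estimate_transitions(paths, N):
    """Estimate transition probabilities from Viterbi state paths.

    paths : list of state paths, each a list of emitting-state indices
            (0-based, so emitting states are 1 .. N-2).
    N     : total number of states including entry (0) and exit (N-1).
    """
    # A[i][j] = number of transitions from state i to state j
    A = [[0] * N for _ in range(N)]
    for path in paths:
        full = [0] + list(path) + [N - 1]  # add entry and exit states
        for i, j in zip(full, full[1:]):
            A[i][j] += 1

    # a_hat[i][j] = A[i][j] / sum_k A[i][k], k over all states but the entry
    a_hat = [[0.0] * N for _ in range(N)]
    for i in range(N - 1):
        total = sum(A[i][1:])
        if total > 0:
            for j in range(1, N):
                a_hat[i][j] = A[i][j] / total
    return a_hat
```

States that are never visited keep an all-zero row here; how such rows should be handled is a design decision that this sketch leaves open.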

The sequence of states which maximises $\phi_N(T)$ implies an alignment of training data observations with states. Within each state, a further alignment of observations to mixture components is made. The tool HInit provides two mechanisms for this: for each state and each stream

1. use clustering to allocate each observation $o_{st}$ to one of $M_s$ clusters, or

2. associate each observation $o_{st}$ with the mixture component with the highest probability.

In either case, the net result is that every observation is associated with a single unique mixture component. This association can be represented by the indicator function $\psi^r_{jsm}(t)$ which is 1 if $o^r_{st}$ is associated with mixture component $m$ of stream $s$ of state $j$ and is zero otherwise.

The means and variances are then estimated via simple averages

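\[ \hat{\mu}_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)\, o^r_{st}}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)} \]

\[ \hat{\Sigma}_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)\, (o^r_{st} - \hat{\mu}_{jsm})(o^r_{st} - \hat{\mu}_{jsm})^T}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)} \]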

Finally, the mixture weights are based on the number of observations allocated to each component

\[ c_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \psi^r_{jsm}(t)}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \sum_{l=1}^{M_s} \psi^r_{jsl}(t)} \]
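The simple-average estimates of the means, variances and mixture weights can be illustrated for a single state and stream. The sketch below is a simplification under stated assumptions: observations are scalars rather than vectors, the indicator function is represented by a list `assign` giving the component index of each observation, and the function name `hard_assignment_stats` is hypothetical.

```python
def hard_assignment_stats(obs, assign, M):
    """Means, variances and weights from hard component assignments.

    obs    : scalar observations aligned to one state and stream.
    assign : assign[k] in 0 .. M-1 is the component associated with obs[k]
             (plays the role of the indicator function psi).
    M      : number of mixture components.
    """
    counts = [0] * M
    sums = [0.0] * M
    for o, m in zip(obs, assign):
        counts[m] += 1
        sums[m] += o

    # Mean: average of the observations allocated to each component.
    means = [sums[m] / counts[m] if counts[m] else 0.0 for m in range(M)]

    # Variance: average squared deviation from the new mean.
    sq = [0.0] * M
    for o, m in zip(obs, assign):
        sq[m] += (o - means[m]) ** 2
    variances = [sq[m] / counts[m] if counts[m] else 0.0 for m in range(M)]

    # Weight: fraction of the state's observations allocated to component m.
    total = sum(counts)
    weights = [c / total for c in counts]
    return means, variances, weights
```

With vector observations the squared deviation becomes an outer product, giving the covariance matrix of the formula above.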

8.8.2 Forward/Backward Probabilities

Baum-Welch training is similar to the Viterbi training described in the previous section except that the hard boundary implied by the ψ function is replaced by a soft boundary function L which represents the probability of an observation being associated with any given Gaussian mixture component. This occupation probability is computed from the forward and backward probabilities.

For the isolated-unit style of training, the forward probability $\alpha_j(t)$ for $1 < j < N$ and $1 < t \le T$ is calculated by the forward recursion

\[ \alpha_j(t) = \left[ \sum_{i=2}^{N-1} \alpha_i(t-1)\, a_{ij} \right] b_j(o_t) \]

with initial conditions given by

\[ \alpha_1(1) = 1 \]

\[ \alpha_j(1) = a_{1j}\, b_j(o_1) \]

for $1 < j < N$ and final condition given by

\[ \alpha_N(T) = \sum_{i=2}^{N-1} \alpha_i(T)\, a_{iN} \]
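The forward recursion for the isolated-unit case can be sketched as below. This is an illustrative sketch, not HTK code: the function name `forward` and the precomputed table `b[j][t]` of output probabilities are assumptions, and states are numbered from 0 (entry state 0, exit state N-1).

```python
def forward(a, b):
    """Forward probabilities for a single HMM (isolated-unit training).

    a[i][j] : transition probabilities, 0-based states
              (0 is the entry state, N-1 the exit state).
    b[j][t] : output probability b_j(o_t) (hypothetical precomputed table).
    Returns (alpha, total) where alpha[j][t] holds the forward probability
    of emitting state j at frame t and total is alpha_N(T).
    """
    N, T = len(a), len(b[0])
    emitting = range(1, N - 1)
    alpha = [[0.0] * T for _ in range(N)]

    # Initial conditions: alpha_j(1) = a_1j * b_j(o_1)
    for j in emitting:
        alpha[j][0] = a[0][j] * b[j][0]

    # Recursion over the emitting states only.
    for t in range(1, T):
        for j in emitting:
            alpha[j][t] = sum(alpha[i][t - 1] * a[i][j] for i in emitting) * b[j][t]

    # Final condition: alpha_N(T) = sum_i alpha_i(T) * a_iN
    total = sum(alpha[i][T - 1] * a[i][N - 1] for i in emitting)
    return alpha, total
```

Compared with the Viterbi recursion, the only change is that the maximisation over predecessor states is replaced by a summation.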

The backward probability $\beta_i(t)$ for $1 < i < N$ and $T > t \ge 1$ is calculated by the backward recursion

\[ \beta_i(t) = \sum_{j=2}^{N-1} a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1) \]

with initial conditions given by

\[ \beta_i(T) = a_{iN} \]

for $1 < i < N$ and final condition given by

\[ \beta_1(1) = \sum_{j=2}^{N-1} a_{1j}\, b_j(o_1)\, \beta_j(1) \]
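The backward recursion can be sketched in the same style. As before, this is a minimal illustration under assumptions: the function name `backward` and the output-probability table `b[j][t]` are hypothetical, and states are numbered from 0 with entry state 0 and exit state N-1.

```python
def backward(a, b):
    """Backward probabilities for a single HMM (isolated-unit training).

    a[i][j] : transition probabilities, 0-based states
              (0 is the entry state, N-1 the exit state).
    b[j][t] : output probability b_j(o_t) (hypothetical precomputed table).
    Returns (beta, total) where beta[i][t] holds the backward probability
    of emitting state i at frame t and total is beta_1(1).
    """
    N, T = len(a), len(b[0])
    emitting = range(1, N - 1)
    beta = [[0.0] * T for _ in range(N)]

    # Initial conditions: beta_i(T) = a_iN
    for i in emitting:
        beta[i][T - 1] = a[i][N - 1]

    # Recursion, working backwards in time.
    for t in range(T - 2, -1, -1):
        for i in emitting:
            beta[i][t] = sum(a[i][j] * b[j][t + 1] * beta[j][t + 1] for j in emitting)

    # Final condition: beta_1(1) = sum_j a_1j * b_j(o_1) * beta_j(1)
    total = sum(a[0][j] * b[j][0] * beta[j][0] for j in emitting)
    return beta, total
```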

In the case of embedded training where the HMM spanning the observations is a composite constructed by concatenating Q subword models, it is assumed that at time t, the α and β values corresponding to the entry state and exit states of a HMM represent the forward and backward probabilities at time t−∆t and t+∆t, respectively, where ∆t is small. The equations for calculating α and β are then as follows.

For the forward probability, the initial conditions are established at time t = 1 as follows

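\[ \alpha^{(q)}_1(1) = \begin{cases} 1 & \text{if } q = 1 \\ \alpha^{(q-1)}_1(1)\, a^{(q-1)}_{1N_{q-1}} & \text{otherwise} \end{cases} \]

\[ \alpha^{(q)}_j(1) = a^{(q)}_{1j}\, b^{(q)}_j(o_1) \]

\[ \alpha^{(q)}_{N_q}(1) = \sum_{i=2}^{N_q-1} \alpha^{(q)}_i(1)\, a^{(q)}_{iN_q} \]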

where the superscript in parentheses refers to the index of the model in the sequence of concatenated models. All unspecified values of α are zero. For time t > 1,

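\[ \alpha^{(q)}_1(t) = \begin{cases} 0 & \text{if } q = 1 \\ \alpha^{(q-1)}_{N_{q-1}}(t-1) + \alpha^{(q-1)}_1(t)\, a^{(q-1)}_{1N_{q-1}} & \text{otherwise} \end{cases} \]

\[ \alpha^{(q)}_j(t) = \left[ \alpha^{(q)}_1(t)\, a^{(q)}_{1j} + \sum_{i=2}^{N_q-1} \alpha^{(q)}_i(t-1)\, a^{(q)}_{ij} \right] b^{(q)}_j(o_t) \]

\[ \alpha^{(q)}_{N_q}(t) = \sum_{i=2}^{N_q-1} \alpha^{(q)}_i(t)\, a^{(q)}_{iN_q} \]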

For the backward probability, the initial conditions are set at time t = T as follows

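\[ \beta^{(q)}_{N_q}(T) = \begin{cases} 1 & \text{if } q = Q \\ \beta^{(q+1)}_{N_{q+1}}(T)\, a^{(q+1)}_{1N_{q+1}} & \text{otherwise} \end{cases} \]

\[ \beta^{(q)}_i(T) = a^{(q)}_{iN_q}\, \beta^{(q)}_{N_q}(T) \]

\[ \beta^{(q)}_1(T) = \sum_{j=2}^{N_q-1} a^{(q)}_{1j}\, b^{(q)}_j(o_T)\, \beta^{(q)}_j(T) \]

where once again, all unspecified β values are zero. For time t < T,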

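\[ \beta^{(q)}_{N_q}(t) = \begin{cases} 0 & \text{if } q = Q \\ \beta^{(q+1)}_1(t+1) + \beta^{(q+1)}_{N_{q+1}}(t)\, a^{(q+1)}_{1N_{q+1}} & \text{otherwise} \end{cases} \]

\[ \beta^{(q)}_i(t) = a^{(q)}_{iN_q}\, \beta^{(q)}_{N_q}(t) + \sum_{j=2}^{N_q-1} a^{(q)}_{ij}\, b^{(q)}_j(o_{t+1})\, \beta^{(q)}_j(t+1) \]

\[ \beta^{(q)}_1(t) = \sum_{j=2}^{N_q-1} a^{(q)}_{1j}\, b^{(q)}_j(o_t)\, \beta^{(q)}_j(t) \]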

The total probability P = prob(O|λ) can be computed from either the forward or backward probabilities

\[ P = \alpha_N(T) = \beta_1(1) \]


8.8.3 Single Model Reestimation (HRest)

In this style of model training, a set of training observations O^r, 1 ≤ r ≤ R is used to estimate the parameters of a single HMM. The basic formula for the reestimation of the transition probabilities is

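\[ \hat{a}_{ij} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^r_i(t)\, a_{ij}\, b_j(o^r_{t+1})\, \beta^r_j(t+1)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \alpha^r_i(t)\, \beta^r_i(t)} \]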

where $1 < i < N$ and $1 < j < N$ and $P_r$ is the total probability $P = \mathrm{prob}(O^r|\lambda)$ of the $r$'th observation. The transitions from the non-emitting entry state are reestimated by

\[ \hat{a}_{1j} = \frac{1}{R} \sum_{r=1}^{R} \frac{1}{P_r}\, \alpha^r_j(1)\, \beta^r_j(1) \]

where $1 < j < N$ and the transitions from the emitting states to the final non-emitting exit state are reestimated by

\[ \hat{a}_{iN} = \frac{\sum_{r=1}^{R} \frac{1}{P_r}\, \alpha^r_i(T)\, \beta^r_i(T)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \alpha^r_i(t)\, \beta^r_i(t)} \]

where $1 < i < N$.
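The three transition formulae above can be combined into one re-estimation pass, sketched below. This is an illustration under assumptions and not HRest's implementation: the function names, the list `b_list` of per-sequence output-probability tables `b[j][t]`, and the 0-based state numbering (entry state 0, exit state N-1) are all choices made for the example. It assumes every emitting state has non-zero occupancy.

```python
def forward_backward(a, b):
    """Isolated-unit forward/backward pass; returns (alpha, beta, P)."""
    N, T = len(a), len(b[0])
    em = range(1, N - 1)
    alpha = [[0.0] * T for _ in range(N)]
    beta = [[0.0] * T for _ in range(N)]
    for j in em:
        alpha[j][0] = a[0][j] * b[j][0]
        beta[j][T - 1] = a[j][N - 1]
    for t in range(1, T):
        for j in em:
            alpha[j][t] = sum(alpha[i][t - 1] * a[i][j] for i in em) * b[j][t]
    for t in range(T - 2, -1, -1):
        for i in em:
            beta[i][t] = sum(a[i][j] * b[j][t + 1] * beta[j][t + 1] for j in em)
    P = sum(alpha[i][T - 1] * a[i][N - 1] for i in em)
    return alpha, beta, P


def reestimate_transitions(a, b_list):
    """One Baum-Welch update of the transition matrix over R sequences."""
    N, R = len(a), len(b_list)
    em = range(1, N - 1)
    num = [[0.0] * N for _ in range(N)]  # numerators for a_ij and a_iN
    den = [0.0] * N                      # state occupancies (denominators)
    entry = [0.0] * N                    # accumulators for a_1j
    for b in b_list:
        alpha, beta, P = forward_backward(a, b)
        T = len(b[0])
        for i in em:
            den[i] += sum(alpha[i][t] * beta[i][t] for t in range(T)) / P
            for j in em:
                num[i][j] += sum(
                    alpha[i][t] * a[i][j] * b[j][t + 1] * beta[j][t + 1]
                    for t in range(T - 1)
                ) / P
            # Transition into the exit state uses the last frame only.
            num[i][N - 1] += alpha[i][T - 1] * beta[i][T - 1] / P
        for j in em:
            entry[j] += alpha[j][0] * beta[j][0] / P

    a_hat = [[0.0] * N for _ in range(N)]
    for j in em:
        a_hat[0][j] = entry[j] / R
    for i in em:
        for j in range(1, N):
            a_hat[i][j] = num[i][j] / den[i]
    return a_hat
```

A useful sanity check is that each re-estimated row of an emitting state, and the entry row, sums to one.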

For a HMM with M_s mixture components in stream s, the means, covariances and mixture weights for that stream are reestimated as follows. Firstly, the probability of occupying the m’th mixture component in stream s at time t for the r’th observation is

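\[ L^r_{jsm}(t) = \frac{1}{P_r}\, U^r_j(t)\, c_{jsm}\, b_{jsm}(o^r_{st})\, \beta^r_j(t)\, b^*_{js}(o^r_t) \]

where

\[ U^r_j(t) = \begin{cases} a_{1j} & \text{if } t = 1 \\ \sum_{i=2}^{N-1} \alpha^r_i(t-1)\, a_{ij} & \text{otherwise} \end{cases} \tag{8.1} \]

and

\[ b^*_{js}(o^r_t) = \prod_{k \ne s} b_{jk}(o^r_{kt}) \]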

For single Gaussian streams, the probability of mixture component occupancy is equal to the probability of state occupancy and hence it is more efficient in this case to use

\[ L^r_{jsm}(t) = L^r_j(t) = \frac{1}{P_r}\, \alpha_j(t)\, \beta_j(t) \]
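For the single Gaussian case, the state occupation probability can be sketched as follows. This is an illustrative example rather than HTK code: the function name `state_occupancy` and the output-probability table `b[j][t]` are assumptions, and states are numbered from 0 (entry state 0, exit state N-1).

```python
def state_occupancy(a, b):
    """State occupation probabilities L_j(t) = alpha_j(t) * beta_j(t) / P.

    a[i][j] : transition probabilities (0-based; 0 entry, N-1 exit).
    b[j][t] : output probability b_j(o_t) (hypothetical precomputed table).
    Returns L with L[j][t] for every state j and frame t; the rows of the
    non-emitting entry and exit states stay at zero.
    """
    N, T = len(a), len(b[0])
    em = range(1, N - 1)
    alpha = [[0.0] * T for _ in range(N)]
    beta = [[0.0] * T for _ in range(N)]

    # Forward pass.
    for j in em:
        alpha[j][0] = a[0][j] * b[j][0]
    for t in range(1, T):
        for j in em:
            alpha[j][t] = sum(alpha[i][t - 1] * a[i][j] for i in em) * b[j][t]

    # Backward pass.
    for i in em:
        beta[i][T - 1] = a[i][N - 1]
    for t in range(T - 2, -1, -1):
        for i in em:
            beta[i][t] = sum(a[i][j] * b[j][t + 1] * beta[j][t + 1] for j in em)

    # Total probability and normalised occupancies.
    P = sum(alpha[i][T - 1] * a[i][N - 1] for i in em)
    return [[alpha[j][t] * beta[j][t] / P for t in range(T)] for j in range(N)]
```

Because each frame must be generated by exactly one emitting state, the occupancies of the emitting states sum to one at every frame.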

Given the above definitions, the re-estimation formulae may now be expressed in terms of L^r_jsm(t) as follows.

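\[ \hat{\mu}_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)\, o^r_{st}}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)} \]

\[ \hat{\Sigma}_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)\, (o^r_{st} - \hat{\mu}_{jsm})(o^r_{st} - \hat{\mu}_{jsm})^T}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)} \tag{8.2} \]

\[ c_{jsm} = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_{jsm}(t)}{\sum_{r=1}^{R} \sum_{t=1}^{T_r} L^r_j(t)} \]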
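These occupancy-weighted updates can be illustrated for one state and one stream. The sketch below makes several simplifying assumptions: observations are scalars, a single training sequence is used, the occupation probabilities are supplied as a table `occ[m][t]`, and the state occupancy is taken to be the sum of the component occupancies. The function name `soft_update` is hypothetical.

```python
def soft_update(obs, occ):
    """Occupancy-weighted mean, variance and weight updates.

    obs : scalar observations o_t for one stream.
    occ : occ[m][t] is the occupation probability L_m(t) of component m
          at frame t (assumed to be computed beforehand).
    """
    M = len(occ)
    # Denominator of the mean and variance formulae: total occupancy of m.
    denom = [sum(row) for row in occ]

    means = [
        sum(l * o for l, o in zip(occ[m], obs)) / denom[m] for m in range(M)
    ]
    variances = [
        sum(l * (o - means[m]) ** 2 for l, o in zip(occ[m], obs)) / denom[m]
        for m in range(M)
    ]

    # State occupancy summed over time equals the summed component occupancy.
    state_occ = sum(denom)
    weights = [d / state_occ for d in denom]
    return means, variances, weights
```

If every occupancy is either 0 or 1, these updates reduce to the simple averages used in Viterbi training.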

8.8.4 Embedded Model Reestimation (HERest)

The re-estimation formulae for the embedded model case have to be modified to take account of the fact that the entry states can be occupied at any time as a result of transitions out of the previous model. The basic formula for the re-estimation of the transition probabilities is

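\[ \hat{a}^{(q)}_{ij} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^{(q)r}_i(t)\, a^{(q)}_{ij}\, b^{(q)}_j(o^r_{t+1})\, \beta^{(q)r}_j(t+1)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \alpha^{(q)r}_i(t)\, \beta^{(q)r}_i(t)} \]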


The transitions from the non-emitting entry states into the HMM are re-estimated by

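\[ \hat{a}^{(q)}_{1j} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^{(q)r}_1(t)\, a^{(q)}_{1j}\, b^{(q)}_j(o^r_t)\, \beta^{(q)r}_j(t)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \left[ \alpha^{(q)r}_1(t)\, \beta^{(q)r}_1(t) + \alpha^{(q)r}_1(t)\, a^{(q)}_{1N_q}\, \beta^{(q+1)r}_1(t) \right]} \]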

and the transitions out of the HMM into the non-emitting exit states are re-estimated by

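\[ \hat{a}^{(q)}_{iN_q} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^{(q)r}_i(t)\, a^{(q)}_{iN_q}\, \beta^{(q)r}_{N_q}(t)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \alpha^{(q)r}_i(t)\, \beta^{(q)r}_i(t)} \]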

Finally, the direct transitions from non-emitting entry to non-emitting exit states are re-estimated by

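\[ \hat{a}^{(q)}_{1N_q} = \frac{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r-1} \alpha^{(q)r}_1(t)\, a^{(q)}_{1N_q}\, \beta^{(q+1)r}_1(t)}{\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r} \left[ \alpha^{(q)r}_i(t)\, \beta^{(q)r}_i(t) + \alpha^{(q)r}_1(t)\, a^{(q)}_{1N_q}\, \beta^{(q+1)r}_1(t) \right]} \]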

The re-estimation formulae for the output distributions are the same as for the single model case except for the obvious additional subscript for $q$. However, the probability calculations must now allow for transitions from the entry states by changing $U^r_j(t)$ in equation 8.1 to

\[ U^{(q)r}_j(t) = \begin{cases} \alpha^{(q)r}_1(t)\, a^{(q)}_{1j} & \text{if } t = 1 \\ \alpha^{(q)r}_1(t)\, a^{(q)}_{1j} + \sum_{i=2}^{N_q-1} \alpha^{(q)r}_i(t-1)\, a^{(q)}_{ij} & \text{otherwise} \end{cases} \]

8.8.5 Semi-Tied Transform Estimation (HERest)

In addition to estimating the standard parameters above, HERest can be used to estimate semi-tied transforms and HLDA projections. This section describes semi-tied transforms; the updates for HLDA are very similar.

Semi-tied covariance matrices have the form

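\[ \mu_{m_r} = \mu_{m_r}, \qquad \Sigma_{m_r} = H_r\, \Sigma^{\mathrm{diag}}_{m_r}\, H_r^T \tag{8.3} \]

For efficiency reasons the transforms are stored and likelihoods calculated using

\[ \mathcal{N}(o;\, \mu_{m_r},\, H_r \Sigma^{\mathrm{diag}}_{m_r} H_r^T) = \frac{1}{|H_r|}\, \mathcal{N}(H_r^{-1} o;\, H_r^{-1} \mu_{m_r},\, \Sigma^{\mathrm{diag}}_{m_r}) = |A_r|\, \mathcal{N}(A_r o;\, A_r \mu_{m_r},\, \Sigma^{\mathrm{diag}}_{m_r}) \tag{8.4} \]

where $A_r = H_r^{-1}$. The transformed mean, $A_r \mu_{m_r}$, is stored in the model files rather than the original mean for efficiency.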
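The identity in equation 8.4 can be checked numerically. The sketch below is an illustration for the two-dimensional case only and is not HTK code; all function names are hypothetical, and the absolute value of the determinant is used for $|A_r|$.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def semi_tied_cov(H, var):
    """Full covariance H * diag(var) * H^T for a 2x2 transform H."""
    return [
        [sum(H[i][k] * var[k] * H[j][k] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def full_gauss(o, mu, cov):
    """2-D Gaussian density with a full covariance matrix."""
    d = [o[0] - mu[0], o[1] - mu[1]]
    y = matvec(inv2(cov), d)
    quad = d[0] * y[0] + d[1] * y[1]
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det2(cov)))


def diag_gauss(o, mu, var):
    """Gaussian density with a diagonal covariance (product of 1-D terms)."""
    p = 1.0
    for x, m, v in zip(o, mu, var):
        p *= math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return p


def semi_tied_likelihood(o, mu, var, A):
    """Right-hand side of 8.4: |A| * N(A o; A mu, diag(var))."""
    return abs(det2(A)) * diag_gauss(matvec(A, o), matvec(A, mu), var)
```

Evaluating the diagonal form avoids inverting a full covariance matrix for every component, which is the efficiency argument made above.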

The estimation of semi-tied transforms is a doubly iterative process. Given a current set of covariance matrix estimates the semi-tied transforms are estimated in a similar fashion to the full variance MLLRCOV transforms.

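\[ a_{ri} = c_{ri}\, G_r^{(i)-1} \sqrt{\frac{\beta_r}{c_{ri}\, G_r^{(i)-1}\, c_{ri}^T}} \tag{8.5} \]

where $a_{ri}$ is the $i$'th row of $A_r$, the $1 \times n$ row vector $c_{ri}$ is the vector of cofactors of $A_r$, $c_{rij} = \mathrm{cof}(A_{rij})$, and $G_r^{(i)}$ is defined as

\[ G_r^{(i)} = \sum_{m_r=1}^{M_r} \frac{1}{\sigma^{\mathrm{diag}\,2}_{m_r i}} \sum_{t=1}^{T} L_{m_r}(t)\, (o(t) - \mu_{m_r})(o(t) - \mu_{m_r})^T \tag{8.6} \]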

This iteratively estimates one row of the transform at a time. The number of iterations is controlled by the HAdapt configuration variable MAXXFORMITER.

Having estimated the transform the diagonal covariance matrix is updated as

\[ \Sigma^{\mathrm{diag}}_{m_r} = \mathrm{diag}\left( \frac{A_r \sum_{t=1}^{T} L_{m_r}(t)\, (o(t) - \mu_{m_r})(o(t) - \mu_{m_r})^T A_r^T}{\sum_{t=1}^{T} L_{m_r}(t)} \right) \tag{8.7} \]
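The diagonal covariance update of equation 8.7 can be sketched for a single component as follows. This is a minimal illustration under assumptions, not the HERest implementation: the function name `update_diag_cov` is hypothetical, and the transform, the occupation probabilities and the mean are taken as given.

```python
def update_diag_cov(A, obs, occ, mu):
    """Diagonal covariance update for one component.

    A   : n x n semi-tied transform (list of rows).
    obs : list of n-dimensional observation vectors o(t).
    occ : occ[t] is the occupation probability L(t) of the component.
    mu  : n-dimensional mean of the component.
    Returns the n diagonal entries of diag(A W A^T / sum_t L(t)), where
    W = sum_t L(t) (o(t) - mu)(o(t) - mu)^T.
    """
    n = len(mu)
    W = [[0.0] * n for _ in range(n)]
    total = 0.0
    for o, weight in zip(obs, occ):
        d = [o[k] - mu[k] for k in range(n)]
        for i in range(n):
            for j in range(n):
                W[i][j] += weight * d[i] * d[j]
        total += weight

    # Only the diagonal is needed: entry i is a_i W a_i^T / total,
    # where a_i is the i'th row of A.
    return [
        sum(A[i][k] * W[k][m] * A[i][m] for k in range(n) for m in range(n)) / total
        for i in range(n)
    ]
```

With the identity transform the update reduces to the ordinary occupancy-weighted variance of each dimension.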

This is the second loop: given a new estimate of the diagonal variance, a new transform can be estimated. The number of iterations of transform and covariance matrix update is controlled by the HAdapt configuration variable MAXSEMITIEDITER.

From The HTK Book (pages 151-156)