MLLR Formulae - The HTK Book

based on the adaptation data presented.

The mathematical details of the Forward-Backward algorithm are given in section 8.7, while the mathematical details for the MLLR mean and variance transformation calculations can be found in section 9.4.

HEAdapt is typically invoked by a command line of the form

HEAdapt -S adaptlist -I labs -H dir1/hmacs -M dir2 hmmlist

where hmmlist contains the list of HMMs.

Once all MMFs and MLFs have been loaded, HEAdapt processes each file in the adaptlist, and accumulates the required statistics as described above. On completion, an updated MMF is output to the directory dir2.

If the following form of the command is used

HEAdapt -S adaptlist -I labs -H dir1/hmacs -K dir2/tmf hmmlist

then on completion a transform model file (TMF) tmf is output to the directory dir2. This process is illustrated by Fig 9.3. Section 9.1.3 describes the TMF format in more detail. The output tmf contains transforms that transform the MMF hmacs. Once this is saved, HVite can be used to perform recognition for the adapted speaker either using a transformed MMF or by using the speaker independent MMF together with a speaker specific TMF.

HEAdapt employs the same pruning mechanism as HERest during the forward-backward computation. As such the pruning on the backward path is under the user’s control, and the beam is set using the -t option.

HEAdapt can also be run several times in block or static fashion. For instance a first pass might entail a global adaptation (forced using the -g option), producing the TMF global.tmf by invoking

HEAdapt -g -S adaptlist -I labs -H mmf -K tmfs/global.tmf \
    hmmlist

The second pass could load in the global transformations (and transform the model set) using the -J option, performing a better frame/state alignment than the speaker independent model set, and output a set of regression class transformations,

HEAdapt -S adaptlist -I labs -H mmf -K tmfs/rc.tmf \
    -J tmfs/global.tmf hmmlist

Note again that the number of transformations is selected automatically and depends on the node occupation threshold setting and the amount of adaptation data available. Finally, whenever HEAdapt produces a TMF, the transforms in it always apply to the input MMF.

In the last example the input MMF is transformed by the global transform file global.tmf in order to obtain the frame/state alignment only. The final TMF that is output, rc.tmf, contains the set of transforms to transform the input MMF mmf, based on this frame/state alignment.
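The two passes described above lend themselves to scripting. The following Python sketch only assembles the two command lines shown in this section. The helper names `two_pass_commands` and `run_two_pass` and their parameters are hypothetical, only options that appear above (-g, -S, -I, -H, -K, -J) are used, and HEAdapt is assumed to be on the search path if the commands are actually run.

```python
import subprocess


def two_pass_commands(adaptlist, labs, mmf, hmmlist, tmf_dir="tmfs"):
    """Build the argument lists for a global pass followed by a
    regression class pass (hypothetical helper, not part of HTK)."""
    common = ["HEAdapt", "-S", adaptlist, "-I", labs, "-H", mmf]
    global_tmf = f"{tmf_dir}/global.tmf"
    rc_tmf = f"{tmf_dir}/rc.tmf"
    # Pass 1: force a single global transform and save it as a TMF.
    first = common[:1] + ["-g"] + common[1:] + ["-K", global_tmf, hmmlist]
    # Pass 2: load the global transform for alignment only; the TMF
    # written here still applies to the original input MMF.
    second = common + ["-K", rc_tmf, "-J", global_tmf, hmmlist]
    return first, second


def run_two_pass(*args, **kwargs):
    """Run both passes in order, stopping if either one fails."""
    for command in two_pass_commands(*args, **kwargs):
        subprocess.run(command, check=True)
```

Building the argument lists separately from running them makes it easy to inspect or log the exact commands before any adaptation data is processed.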

As an alternative, the second pass could entail MLLR together with MAP adaptation, outputting a new model set. Note that with MAP adaptation a transform cannot be saved and a full HMM set must be output.

HEAdapt -S adaptlist -I labs -H mmf -M dir2 -k -j 12.0 -J tmfs/global.tmf hmmlist

Note that MAP alone could be used by removing the -k option. The argument to the -j option represents the MAP adaptation scaling factor.

9.4 MLLR Formulae

For reference purposes, this section lists the various formulae employed within the HTK adaptation tool. It is assumed throughout that single stream data is used and that diagonal covariances are also used. All of the formulae are standard and can be found in the literature.

The following notation is used in this section

$M$ the model set

$T$ number of observations

$m$ a mixture component

$O$ a sequence of observations

$o(t)$ the observation at time $t$, $1 \le t \le T$

$\mu_{m_r}$ mean vector for the mixture component $m_r$

$\xi_{m_r}$ extended mean vector for the mixture component $m_r$

$\Sigma_{m_r}$ covariance matrix for the mixture component $m_r$

$L_{m_r}(t)$ the occupancy probability for the mixture component $m_r$ at time $t$

9.4.1 Estimation of the Mean Transformation Matrix

To enable robust transformations to be trained, the transform matrices are tied across a number of Gaussians. The set of Gaussians which share a transform is referred to as a regression class.

For a particular transform case $W_m$, the $R$ Gaussian components $\{m_1, m_2, \ldots, m_R\}$ will be tied together, as determined by the regression class tree (see section 9.1.2). By formulating the standard auxiliary function, and then maximising it with respect to the transformed mean, and considering only these tied Gaussian components, the following is obtained,

$$\sum_{t=1}^{T} \sum_{r=1}^{R} L_{m_r}(t)\, \Sigma_{m_r}^{-1}\, o(t)\, \xi_{m_r}^T = \sum_{t=1}^{T} \sum_{r=1}^{R} L_{m_r}(t)\, \Sigma_{m_r}^{-1}\, W_m\, \xi_{m_r} \xi_{m_r}^T \tag{9.4}$$

and $L_{m_r}(t)$, the occupation likelihood, is defined as,

$$L_{m_r}(t) = p(q_{m_r}(t) \mid M, O_T)$$

where $q_{m_r}(t)$ indicates the Gaussian component $m_r$ at time $t$, and $O_T = \{o(1), \ldots, o(T)\}$ is the adaptation data. The occupation likelihood is obtained from the forward-backward process described in section 8.7.

To solve for $W_m$, two new terms are defined.

1. The left hand side of equation 9.4 is independent of the transformation matrix and is referred to as $Z$, where

$$Z = \sum_{t=1}^{T} \sum_{r=1}^{R} L_{m_r}(t)\, \Sigma_{m_r}^{-1}\, o(t)\, \xi_{m_r}^T$$

2. A new variable $G^{(i)}$ is defined with elements

$$g^{(i)}_{jq} = \sum_{r=1}^{R} v^{(r)}_{ii}\, d^{(r)}_{jq}$$

where

$$V^{(r)} = \sum_{t=1}^{T} L_{m_r}(t)\, \Sigma_{m_r}^{-1}$$

and

$$D^{(r)} = \xi_{m_r} \xi_{m_r}^T$$

It can be seen that from these two new terms, $W_m$ can be calculated from

$$w_i^T = \left(G^{(i)}\right)^{-1} z_i^T$$

where $w_i$ is the $i$-th row vector of $W_m$ and $z_i$ is the $i$-th row vector of $Z$.
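The row-by-row solution can be illustrated with a minimal Python sketch, assuming diagonal covariances as stated at the start of this section. Because the inverse covariance is diagonal, row i of equation 9.4 reduces to z_i = w_i G^(i), so each row of the transform can be solved independently. The extended mean vector is assumed here to be [1, mu_1, ..., mu_n], which this section does not define, and all function and parameter names are illustrative rather than HTK's own. Plain nested lists and a small Gaussian elimination routine are used so that the example has no dependencies.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("G is singular: too little adaptation data")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[k][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def estimate_mean_transform(obs, occ, means, inv_vars):
    """Return W (n rows, n + 1 columns) for one regression class.

    obs[t][i]      observation o(t), element i
    occ[t][r]      occupation likelihood L_{m_r}(t)
    means[r][i]    mean of Gaussian m_r
    inv_vars[r][i] diagonal of the inverse covariance of m_r
    """
    n = len(means[0])
    # Assumed extended mean vector: xi = [1, mu_1, ..., mu_n].
    xis = [[1.0] + list(mu) for mu in means]
    z = [[0.0] * (n + 1) for _ in range(n)]
    g = [[[0.0] * (n + 1) for _ in range(n + 1)] for _ in range(n)]
    for r, xi in enumerate(xis):
        total_occ = sum(frame_occ[r] for frame_occ in occ)
        for i in range(n):
            # sum over t of L(t) * o_i(t) for this Gaussian
            occ_obs = sum(frame_occ[r] * o[i] for frame_occ, o in zip(occ, obs))
            v_ii = total_occ * inv_vars[r][i]  # element (i, i) of V^(r)
            for q in range(n + 1):
                z[i][q] += inv_vars[r][i] * occ_obs * xi[q]
                for j in range(n + 1):
                    g[i][j][q] += v_ii * xi[j] * xi[q]  # v_ii * d_jq
    # G^(i) is symmetric, so solving G^(i) w = z_i gives row i of W.
    return [solve_linear(g[i], z[i]) for i in range(n)]
```

At least n + 1 Gaussians with linearly independent extended mean vectors must receive data, otherwise G^(i) is singular and the sketch raises an error.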

The regression class tree is used to generate the classes dynamically, so it is not known a priori which regression classes will be used to estimate the transform. This does not present a problem, since $G^{(i)}$ and $Z$ for the chosen regression class may be obtained from its child classes (as defined by the tree). If the parent node $R$ has children $\{R_1, \ldots, R_C\}$ then

$$Z = \sum_{c=1}^{C} Z^{(R_c)}$$

and

$$G^{(i)} = \sum_{c=1}^{C} G^{(i R_c)}$$

From this it is clear that $G^{(i)}$ and $Z$ need only be calculated for the most specific regression classes possible, i.e. the base classes.
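The summation over child classes can be sketched as a recursive walk of the tree. In this Python sketch the tree layout (a `children` dictionary), the `base_stats` mapping and the function names are illustrative assumptions rather than HTK data structures. Matrices are plain nested lists so that the example has no dependencies.

```python
def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def node_statistics(node, base_stats, children):
    """Return (Z, [G^(1), ..., G^(n)]) for a regression tree node.

    base_stats maps each base class to its accumulated (Z, G list);
    children maps each non-terminal node to its child nodes.
    """
    if node in base_stats:
        # Base class: statistics were accumulated directly from data.
        return base_stats[node]
    z_total, g_total = None, None
    for child in children[node]:
        z, g_list = node_statistics(child, base_stats, children)
        if z_total is None:
            z_total, g_total = z, g_list
        else:
            # add_matrices builds new lists, so stored statistics
            # are never modified in place.
            z_total = add_matrices(z_total, z)
            g_total = [add_matrices(g_a, g_b) for g_a, g_b in zip(g_total, g_list)]
    return z_total, g_total
```

Only the base classes need accumulators that are updated frame by frame; the statistics of any higher node are derived on demand once the adaptation data has been processed.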

9.4.2 Estimation of the Variance Transformation Matrix

Estimation of the variance transformation matrices is only available for diagonal covariance Gaussian systems. The Gaussian covariance is transformed using,

$$\hat{\Sigma}_m = B_m^T H_m B_m$$

where $H_m$ is the linear transformation to be estimated and $B_m$ is the inverse of the Choleski factor of $\Sigma_m^{-1}$, so

$$\Sigma_m^{-1} = C_m C_m^T$$

and

$$B_m = C_m^{-1}$$

After rewriting the auxiliary function, the transform matrix $H_m$ is estimated from,

$$H_m = \frac{\sum_{r=1}^{R_c} C_{m_r}^T \left[ L_{m_r}(t)\, (o(t) - \mu_{m_r})(o(t) - \mu_{m_r})^T \right] C_{m_r}}{L_{m_r}(t)}$$

Here, $H_m$ is forced to be a diagonal transformation by setting the off-diagonal terms to zero, which ensures that $\hat{\Sigma}_m$ is also diagonal.
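For diagonal covariances the variance transform reduces to one scale factor per feature dimension, which the following Python sketch illustrates. It rests on two assumptions that the formula above leaves implicit: the bracketed term is summed over all frames t, and the denominator is the occupation likelihood summed over all frames and all Gaussians in the class. With a diagonal covariance, C is diag(1/sigma) and B is diag(sigma), so the transformed variance in dimension i is the original variance multiplied by the i-th diagonal element of H. The function names are illustrative, not HTK's.

```python
def estimate_variance_scales(obs, occ, means, variances):
    """Return the diagonal of H_m for one regression class.

    obs[t][i]       observation o(t), element i
    occ[t][r]       occupation likelihood L_{m_r}(t)
    means[r][i]     mean of Gaussian m_r
    variances[r][i] diagonal covariance of Gaussian m_r
    """
    n = len(means[0])
    numerator = [0.0] * n
    denominator = 0.0  # assumed: total occupation over frames and Gaussians
    for o, frame_occ in zip(obs, occ):
        for r, (mu, var) in enumerate(zip(means, variances)):
            weight = frame_occ[r]
            denominator += weight
            for i in range(n):
                # With C = diag(1/sigma), element (i, i) of
                # C^T (o - mu)(o - mu)^T C is (o_i - mu_i)^2 / sigma_i^2.
                numerator[i] += weight * (o[i] - mu[i]) ** 2 / var[i]
    return [value / denominator for value in numerator]


def apply_variance_scales(variances, scales):
    """Sigma-hat = B^T H B with B = diag(sigma): each variance is scaled."""
    return [[v * h for v, h in zip(var, scales)] for var in variances]
```

A scale factor above one indicates that the adaptation data is spread more widely around the means than the original variances predict, and a factor below one indicates the opposite.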

Chapter 10

HMM System Refinement

In chapter 8, the basic processes involved in training a continuous density HMM system were explained and examples were given of building a set of HMM phone models. In the practical application of these techniques to building real systems, there are often a number of problems to overcome. Most of these arise from the conflicting desire to have a large number of model parameters in order to achieve high accuracy, whilst at the same time having limited and uneven training data.

As mentioned previously, the HTK philosophy is to build systems incrementally. Starting with a set of context-independent monophone HMMs, a system can be refined in a sequence of stages. Each refinement step typically uses the HTK HMM definition editor HHEd followed by re-estimation using HERest. These incremental manipulations of the HMM set often involve parameter tying, thus many of HHEd’s operations involve generating new macro definitions.

The principal types of manipulation that can be performed by HHEd are

• HMM cloning to form context-dependent model sets

• Generalised parameter tying

• Data driven and decision tree based clustering

• Mixture component splitting

• Adding/removing state transitions

• Stream splitting, resizing and recasting

This chapter describes how the HTK tool HHEd is used, its editing language and the main operations that can be performed.

10.1 Using HHEd

The HMM editor HHEd takes as input a set of HMM definitions and outputs a new modified set, usually to a new directory. It is invoked by a command line of the form

HHEd -H MMF1 -H MMF2 ... -M newdir cmds.hed hmmlist

where cmds.hed is an edit script containing a list of edit commands. Each command is written on a separate line and begins with a two-letter command name.

The effect of executing the above command line would be to read in the HMMs listed in hmmlist and defined by files MMF1, MMF2, etc., apply the editing operations defined in cmds.hed and then write the resulting system out to the directory newdir. As with all tools, HTK will attempt to replicate the file structure of the input in the output directory. By default, any new macros generated by HHEd will be written to one or more of the existing MMFs. In doing this, HTK will attempt to ensure that the “definition before use” rule for macros is preserved, but it cannot always guarantee this. Hence, it is usually best to define explicit target file names for new macros. This can be done in two ways. Firstly, explicit target file names can be given in the edit script using the UF command. For example, if cmds.hed contained

....
UF smacs
# commands to generate state macros
....
UF vmacs
# commands to generate variance macros
....

then the output directory would contain an MMF called smacs containing a set of state macro definitions and an MMF called vmacs containing a set of variance macro definitions. These would be in addition to the existing MMF files MMF1, MMF2, etc.

Alternatively, the whole HMM system can be written to a single file using the -w option. For example,

HHEd -H MMF1 -H MMF2 ... -w newMMF cmds.hed hmmlist

would write the whole of the edited HMM set to the file newMMF.

As mentioned previously, each execution of HHEd is normally followed by re-estimation using HERest. Normally, all the information needed by HHEd is contained in the model set itself. However, some clustering operations require various statistics about the training data (see sections 10.4 and 10.5). These statistics are gathered by HERest and output to a stats file, which is then read in by HHEd. Note, however, that the statistics file generated by HERest refers to the input model set, not the re-estimated set. Thus, for example, in the following sequence, the HHEd edit script in cmds.hed contains a command (see the RO command in section 10.4) which references a statistics file (called stats) describing the HMM set defined by hmm1/MMF.

HERest -H hmm1/MMF -M hmmx -s stats hmmlist train1 train2 ....

HHEd -H hmm1/MMF -M hmm2 cmds.hed hmmlist

The required statistics file is generated by HERest but the re-estimated model set stored in hmmx/MMF is ignored and can be deleted.
