Model Adaptation using Linear Transformations


9.1 Model Adaptation using Linear Transformations


An alternative, more efficient form of variance transformation is also available. Here, the transformation of the covariance matrix is of the form

\[ \hat{\Sigma} = H \Sigma H^{T} \tag{9.3} \]

where H is the n × n covariance transformation matrix. This form of transformation, referred to in the code as MLLRCOV, can be efficiently implemented as a transformation of the means and the features.

\[ \mathcal{N}(o; \mu, H \Sigma H^{T}) = \frac{1}{|H|} \, \mathcal{N}(H^{-1} o; H^{-1} \mu, \Sigma) = |A| \, \mathcal{N}(A o; A \mu, \Sigma) \tag{9.4} \]

where A = H⁻¹. Using this form it is possible to estimate and efficiently apply full transformations.

MLLRCOV transformations are normally estimated using MLLRMEAN transformations as the parent transform.
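The identity in equation (9.4) can be checked numerically. The following is a minimal sketch for the two-dimensional case using only the Python standard library; the values of H, Σ, µ and o are arbitrary illustrative numbers, and the helper names are hypothetical rather than anything taken from the HTK code.

```python
import math

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def gauss2(x, mu, cov):
    """Density of a 2-D Gaussian with full covariance matrix cov."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    p = inv2(cov)
    q = (d[0] * (p[0][0] * d[0] + p[0][1] * d[1])
         + d[1] * (p[1][0] * d[0] + p[1][1] * d[1]))
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det2(cov)))

# Arbitrary illustrative values (assumptions, not taken from HTK).
H = [[1.2, 0.3], [-0.1, 0.9]]
SIGMA = [[0.5, 0.0], [0.0, 2.0]]
MU = [0.4, -1.0]
O = [1.0, 0.5]

A = inv2(H)
# Left-hand side: Gaussian with the transformed covariance H Sigma H^T.
lhs = gauss2(O, MU, matmul(matmul(H, SIGMA), transpose(H)))
# Right-hand side: original covariance, transformed mean and feature, times |A|.
rhs = abs(det2(A)) * gauss2(matvec(A, O), matvec(A, MU), SIGMA)
print(lhs, rhs)
```

The right-hand side never forms the full covariance H Σ Hᵀ, which is why the transform can be applied cheaply when Σ is diagonal.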

Constrained MLLR (CMLLR)

Constrained maximum likelihood linear regression, or CMLLR, computes a set of transformations that will reduce the mismatch between an initial model set and the adaptation data². More specifically, CMLLR is a feature adaptation technique that estimates a set of linear transformations for the features. The effect of these transformations is to shift the feature vector in the initial system so that each state in the HMM system is more likely to generate the adaptation data. Note that, for computational reasons, CMLLR is only implemented within HTK for diagonal covariance, continuous density HMMs.

The transformation used to give the adapted feature vector is given by

\[ \hat{o} = W \zeta \tag{9.5} \]

where W is the n × (n + 1) transformation matrix (where n is the dimensionality of the data) and ζ is the extended observation vector,

\[ \zeta = [\, w \;\; o_1 \;\; o_2 \;\; \dots \;\; o_n \,]^{T} \]

where w represents a bias offset whose value is fixed (within HTK) at 1.

Hence W can be decomposed into

\[ W = [\, b \;\; A \,] \tag{9.6} \]

where A represents an n × n transformation matrix and b represents a bias vector. This form of transform is referred to in the code as CMLLR.

Since multiple CMLLR transforms may be used it is important to include the Jacobian in the likelihood calculation.

\[ \mathcal{L}(o; \mu, \Sigma, A, b) = |A| \, \mathcal{N}(A o + b; \mu, \Sigma) \tag{9.7} \]

This is the implementation used in the code.
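As an illustration of equations (9.5) to (9.7), the sketch below applies a CMLLR transform to a feature vector and evaluates the log-likelihood, including the Jacobian term, for a diagonal covariance Gaussian. It is a minimal example under stated assumptions: the function names and all numeric values are hypothetical, and the log-determinant is passed in precomputed purely for convenience.

```python
import math

def cmllr_transform(o, A, b):
    """Return A o + b, i.e. W zeta with W = [b A] and zeta = [1 o_1 ... o_n]^T."""
    n = len(o)
    return [sum(A[i][j] * o[j] for j in range(n)) + b[i] for i in range(n)]

def diag_gauss_loglik(x, mu, var):
    """Log density of a Gaussian with diagonal covariance (variances in var)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))

def cmllr_loglik(o, mu, var, A, b, log_abs_det_A):
    """log( |A| N(A o + b; mu, Sigma) ), the log of equation (9.7)."""
    return log_abs_det_A + diag_gauss_loglik(cmllr_transform(o, A, b), mu, var)

# Arbitrary illustrative values (assumptions, not taken from HTK).
A = [[0.9, 0.1], [-0.2, 1.1]]
b = [0.05, -0.3]
log_det = math.log(abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]))  # 2 x 2 case
o = [1.0, 0.5]
mu = [0.4, -1.0]
var = [0.5, 2.0]

adapted = cmllr_transform(o, A, b)
ll = cmllr_loglik(o, mu, var, A, b, log_det)
print(adapted, ll)
```

Without the log|A| term, likelihoods computed under different transforms would not be comparable, which is the reason the Jacobian matters once more than one CMLLR transform is in use.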

9.1.2 Input/Output/Parent Transformations

There are three types of linear transform that may be used with the HTK tools.

• Input transform: the input transform is used to determine the forward-backward probabilities, hence the component posteriors, for estimating model and transform parameters. MLLR transforms can be iteratively estimated by refining the posteriors using a newly estimated transform.

• Output transform: the output transform is the transform that is generated. The form of the transform is specified using the appropriate configuration options.

• Parent transform: the parent transform determines the model, or features, on which the model set or transform is to be generated. For transform estimation this allows cascades of transforms to be used to adapt the model parameters. For model estimation this supports speaker adaptive training. Note the current implementation only supports adaptive training with CMLLR. Any parent transform can be used when generating transforms.

There is no difference in the storage of the transform parameters, whether it is to be a parent transform or an input transform. There are also no restrictions on the base classes, or regression classes, that are used for each transform.

² MLLR can also be used to perform environmental compensation by reducing the mismatch due to channel or additive noise effects.


9.1.3 Base Class Definitions

The first requirement to allow adaptation is to specify the set of components that share the same transform. This is achieved using a baseclass. The baseclass definition file uses the same syntax for defining components as the HHEd command. However, for baseclass definitions the components must always be specified.

~b "global"
<MMFIDMASK> CUED_WSJ*
<PARAMETERS> MIXBASE
<NUMCLASSES> 1
<CLASS> 1 {*.state[2-4].mix[1-12]}

Figure 9.1: Global base class definition

The simplest form of transform uses a global transformation for all components. Figure 9.1 shows a global transformation for a system where there are up to 3 emitting states and up to 12 Gaussian components per state.

~b "baseclass_4.base"
<MMFIDMASK> CUED_WSJ*
<PARAMETERS> MIXBASE
<NUMCLASSES> 4
<CLASS> 1 {(one,sil).state[2-4].mix[1-12]}
<CLASS> 2 {two.state[2-4].mix[1-12]}
<CLASS> 3 {three.state[2-4].mix[1-12]}
<CLASS> 4 {four.state[2-4].mix[1-12]}

Figure 9.2: Four base classes definition

These baseclasses may be used directly to determine which components share a particular transform. However, a more general approach is to use a regression class tree.

9.1.4 Regression Class Trees

To improve the flexibility of the adaptation process it is possible to determine the appropriate set of baseclasses depending on the amount of adaptation data that is available. If a small amount of data is available then a global adaptation transform can be generated. A global transform (as its name suggests) is applied to every Gaussian component in the model set. However as more adaptation data becomes available, improved adaptation is possible by increasing the number of transformations. Each transformation is now more specific and applied to certain groupings of Gaussian components. For instance the Gaussian components could be grouped into the broad phone classes: silence, vowels, stops, glides, nasals, fricatives, etc. The adaptation data could now be used to construct more specific broad class transforms to apply to these groupings.

Rather than specifying static component groupings or classes, a robust and dynamic method is used for the construction of further transformations as more adaptation data becomes available.

MLLR makes use of a regression class tree to group the Gaussians in the model set, so that the set of transformations to be estimated can be chosen according to the amount and type of adaptation data that is available. The tying of each transformation across a number of mixture components makes it possible to adapt distributions for which there were no observations at all. With this process all models can be adapted and the adaptation process is dynamically refined when more adaptation data becomes available.

The regression class tree is constructed so as to cluster together components that are close in acoustic space, so that similar components can be transformed in a similar way. Note that the tree is built using the original speaker independent model set, and is thus independent of any new speaker. The tree is constructed with a centroid splitting algorithm, which uses a Euclidean distance measure. For more details see section 10.7. The terminal nodes or leaves of the tree specify the final component groupings, and are termed the base (regression) classes. Each Gaussian component of a model set belongs to one particular base class. The tool HHEd can be used to build a binary regression class tree, and to label each component with a base class number. Both the tree and component base class numbers can be saved as part of the MMF, or simply stored separately. Please refer to section 10.7 for further details.

~r "regtree_4.tree"
<BASECLASS>~b "baseclass_4.base"
<NODE> 1 2 2 3
<NODE> 2 2 4 5
<NODE> 3 2 6 7
<TNODE> 4 1 1
<TNODE> 5 1 2
<TNODE> 6 1 3
<TNODE> 7 1 4

Figure 9.3: Regression class tree example

[Figure: binary tree diagram; nodes 2 and 3 form the second level and nodes 4, 5, 6 and 7 are the leaves]

Fig. 9.1 A binary regression tree

Fig. 9.1 shows a simple example of a binary regression tree with four base classes, denoted as {C₄, C₅, C₆, C₇}. During “dynamic” adaptation, the occupation counts are accumulated for each of the regression base classes. In the diagram, a solid arrow and circle (or node) indicate that there is sufficient data for a transformation matrix to be generated using the data associated with that class. A dotted line and circle indicate that there is insufficient data. For example, neither node 6 nor node 7 has sufficient data; however, when pooled at node 3, there is sufficient adaptation data.

The amount of data that is “determined” as sufficient is set as a configuration option for HERest (see reference section 17.7).

HERest uses a top-down approach to traverse the regression class tree. Here the search starts at the root node and progresses down the tree generating transforms only for those nodes which

1. have sufficient data and

2. are either terminal nodes (i.e. base classes) or have any children without sufficient data.

In the example shown in Fig. 9.1, transforms are constructed only for regression nodes 2, 3 and 4, which can be denoted as W₂, W₃ and W₄. Hence when the transformed model set is required, the transformation matrices (mean and variance) are applied in the following fashion to the Gaussian components in each base class:

W₂ → {C₅}
W₃ → {C₆, C₇}
W₄ → {C₄}
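The top-down selection rule can be sketched in a few lines of Python. This is a minimal illustration rather than the HERest implementation: the occupation counts and the threshold are invented values chosen so that the outcome matches the example in the text, and all names are hypothetical.

```python
# Tree from the example: internal node -> children. Nodes 4 to 7 are leaves.
CHILDREN = {1: (2, 3), 2: (4, 5), 3: (6, 7)}
# Hypothetical occupation counts accumulated for each base class (leaf).
LEAF_OCC = {4: 1200.0, 5: 300.0, 6: 400.0, 7: 350.0}
# Hypothetical threshold for "sufficient" data.
THRESHOLD = 700.0

def occupancy(node):
    """Occupation count of a node: its own count for a leaf, else the pooled sum."""
    if node not in CHILDREN:
        return LEAF_OCC[node]
    return sum(occupancy(child) for child in CHILDREN[node])

def leaves(node):
    """All leaf nodes (base classes) below and including node."""
    if node not in CHILDREN:
        return [node]
    return [leaf for child in CHILDREN[node] for leaf in leaves(child)]

def assign_transforms(node, assignment):
    """Fill assignment with {leaf: node whose transform is applied to that leaf}.

    A transform is generated at a node with sufficient data that is either
    a leaf or has at least one child without sufficient data.
    """
    if occupancy(node) < THRESHOLD:
        return  # nothing can be estimated here; an ancestor must cover it
    if node not in CHILDREN:
        assignment[node] = node
        return
    for child in CHILDREN[node]:
        if occupancy(child) < THRESHOLD:
            # Child is short of data, so its leaves use this node's transform.
            for leaf in leaves(child):
                assignment[leaf] = node
        else:
            assign_transforms(child, assignment)

assignment = {}
assign_transforms(1, assignment)
transform_nodes = sorted(set(assignment.values()))
print(transform_nodes, assignment)
```

With these counts the transforms are generated at nodes 2, 3 and 4, and the leaf-to-transform mapping agrees with the assignment listed above. If the root itself lacked sufficient data, this sketch would assign nothing.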





At this point it is interesting to note that the global adaptation case is the same as a tree with just a root node, and is in fact treated as such.

An example of a regression class tree is shown in figure 9.3. This uses the four baseclasses from the baseclass macro “baseclass_4.base”. A binary regression tree is shown, thus there are 4 terminal nodes.
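To make the layout of the tree definition concrete, the sketch below reads the listing of figure 9.3 into two dictionaries. It assumes that each `<NODE>` line holds the node number, the number of children and then the child node numbers, and that each `<TNODE>` line holds the node number, the number of base classes and then the base class numbers. That reading is an interpretation of the example, and the parser is a hypothetical helper, not part of HTK.

```python
TREE_TEXT = """
~r "regtree_4.tree"
<BASECLASS>~b "baseclass_4.base"
<NODE> 1 2 2 3
<NODE> 2 2 4 5
<NODE> 3 2 6 7
<TNODE> 4 1 1
<TNODE> 5 1 2
<TNODE> 6 1 3
<TNODE> 7 1 4
"""

def parse_regtree(text):
    """Return (children, base_classes) dictionaries keyed by node number."""
    children, base_classes = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "<NODE>":
            index, count = int(fields[1]), int(fields[2])
            children[index] = [int(f) for f in fields[3:3 + count]]
        elif fields[0] == "<TNODE>":
            index, count = int(fields[1]), int(fields[2])
            base_classes[index] = [int(f) for f in fields[3:3 + count]]
    return children, base_classes

def classes_under(node, children, base_classes):
    """Base class numbers covered by a node (pooled over its subtree)."""
    if node in base_classes:
        return list(base_classes[node])
    return [c for child in children[node]
            for c in classes_under(child, children, base_classes)]

children, base_classes = parse_regtree(TREE_TEXT)
print(classes_under(3, children, base_classes))
```

Under this reading, node 3 pools base classes 3 and 4, and the root covers all four base classes.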


9.1.5 Linear Transform Format

HERest estimates the required transformation statistics and can either output a set of transformation models, or a single transform model file (TMF). The advantage of storing the transforms, as opposed to an adapted MMF, is that the TMFs are considerably smaller than MMFs (especially triphone MMFs). This section gives examples of the format that the transforms are stored in. For a description of the transform definition see section 7.10.

~a "cued"
<ADAPTKIND> BASE
<BASECLASSES> ~b "global"
<XFORMKIND> CMLLR
<NUMXFORMS> 1
<LINXFORM> 1 <VECSIZE> 5
<BIAS> 5
-0.357 0.001 -0.002 0.132 0.072
<LOGDET> -0.3419
<BLOCKINFO> 2 3 2
<BLOCK> 1
<XFORM> 3 3
0.942 -0.032 -0.001
-0.102 0.922 -0.015
-0.016 0.045 0.910
<BLOCK> 2
<XFORM> 2 2
1.028 -0.032
-0.017 1.041
<CLASSXFORM> 1 1

Figure 9.4: Example Constrained MLLR transform using hard weights

Figure 9.4 shows the format of a single transform. In the same fashion as HMMs, all transforms are stored as macros. The header information gives how the transform was estimated, currently either with a regression class tree (TREE) or directly using the base classes (BASE). The base class macro is then specified. The form of transformation is then described in the transformset. The code currently supports constrained MLLR (illustrated), MLLR mean adaptation, MLLR full variance adaptation and diagonal variance adaptation. Arbitrary block structures are allowable. The assignment of base class to transform number is specified at the end of the file.

The LOGDET value stored with the transform is twice the log-determinant of the transform³.
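The relationship between LOGDET and the transform matrix can be checked against the values in figure 9.4. The sketch below assumes that the full transform is block diagonal, built from the 3 × 3 and 2 × 2 `<XFORM>` blocks, so that its determinant is the product of the block determinants. The `det` helper is a hypothetical utility written for this illustration.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (small blocks only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

# Block values copied from the CMLLR transform in figure 9.4.
BLOCK1 = [[0.942, -0.032, -0.001],
          [-0.102, 0.922, -0.015],
          [-0.016, 0.045, 0.910]]
BLOCK2 = [[1.028, -0.032],
          [-0.017, 1.041]]

# For a block-diagonal matrix, log|A| is the sum of the block log-determinants.
log_det = sum(math.log(abs(det(block))) for block in (BLOCK1, BLOCK2))
twice_log_det = 2.0 * log_det
print(round(twice_log_det, 4))
```

The result is approximately -0.3419, in agreement with the `<LOGDET>` entry of the example. The determinant is unchanged by transposition, so the check does not depend on whether the block values are read row by row or column by column.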

9.1.6 Hierarchy of Transform

It is possible to specify a hierarchy of transformations. This results from using a parent transform during the training process. Figure 9.5 shows the use of a set of MLLR transforms generated using a parent CMLLR transform stored in the macro “cued”. The action of this transform is

1. Apply transform cued

2. Apply transform mjfg

The parent transform is always applied before the transform itself.

A hierarchy of transforms automatically results from using a parent transform when estimating a transform.

³ There is no advantage in storing twice the log determinant; however, this is maintained for backward compatibility with internal HTK releases.


~a "mjfg"
<ADAPTKIND> TREE
<BASECLASSES> ~b "baseclass_4.base"
<PARENTXFORM> ~a "cued"
<XFORMKIND> MLLRMEAN
<NUMXFORMS> 2
<LINXFORM> 1 <VECSIZE> 5
<BIAS> 5
-0.357 0.001 -0.002 0.132 0.072
<BLOCKINFO> 2 3 2
<BLOCK> 1
<XFORM> 3 3
0.942 -0.032 -0.001
-0.102 0.922 -0.015
-0.016 0.045 0.910
<BLOCK> 2
<XFORM> 2 2
1.028 -0.032
-0.017 1.041
<LINXFORM> 2 <VECSIZE> 5
<BIAS> 5
-0.357 0.001 -0.002 0.132 0.072
<BLOCKINFO> 2 3 2
<BLOCK> 1
<XFORM> 3 3
0.942 -0.032 -0.001
-0.102 0.922 -0.015
-0.016 0.045 0.910
<BLOCK> 2
<XFORM> 2 2
1.028 -0.032
-0.017 1.041
<CLASSXFORM> 1 1
<CLASSXFORM> 2 1
<CLASSXFORM> 3 1
<CLASSXFORM> 4 2

Figure 9.5: Example of an MLLR transform with a parent transform

In document: The HTK Book (pages 163-169)