
8.10 Discriminative Training using HMMIRest


that each invocation of HMMIRest uses the same initial set of models but has its own private set of data. By setting the option -p N, where N is an integer, HMMIRest will dump the contents of all its accumulators into a set of files labelled HDRN.acc.1 to HDRN.acc.n. The number of files n depends on the discriminative training criterion and the I-smoothing prior being used. For all set-ups the denominator and numerator accumulators are kept separate. The standard training options have the following numbers of accumulators:

4: MPE/MWE training with a dynamic MMI prior;

3: MPE/MWE training with a dynamic ML prior, or MPE/MWE training with static priors⁴;

2: MMI training.

As each of the accumulators will be approximately the size of the model set, and in this parallel mode a large number of accumulators can be generated, it is useful to ensure that there is sufficient disk space for all the accumulators generated. These dumped files are then collected together and input to a new invocation of HMMIRest with the option -p 0 set. HMMIRest then reloads the accumulators from all of the dump files and updates the models in the normal way. This process is illustrated in Figure 8.9.
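To make the file naming concrete: an MMI run (two accumulators per job) parallelised over four jobs with -p 1 through -p 4 leaves eight files in the output directory, here dir2 as in the example that follows:

dir2/HDR1.acc.1 dir2/HDR1.acc.2
dir2/HDR2.acc.1 dir2/HDR2.acc.2
dir2/HDR3.acc.1 dir2/HDR3.acc.2
dir2/HDR4.acc.1 dir2/HDR4.acc.2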

To give a concrete example in the same fashion as described for HERest, suppose that four networked workstations were available to execute the HMMIRest command performing MMI training. Again the training files listed previously in trainlist would be split into four equal sets and a list of the files in each set stored in trlist1, trlist2, trlist3, and trlist4. Phone-marked numerator and denominator lattices are assumed to be available in plat.num and plat.den respectively. On the first workstation, the command

HMMIRest -S trlist1 -C mmi.cfg -q plat.num -r plat.den \
   -H dir1/hmacs -M dir2 -p 1 hmmlist

would be executed. This will load in the HMM definitions in dir1/hmacs, process the files listed in trlist1 and finally dump its accumulators into files called HDR1.acc.1 and HDR1.acc.2 in the output directory dir2. At the same time, the command

HMMIRest -S trlist2 -C mmi.cfg -q plat.num -r plat.den \
   -H dir1/hmacs -M dir2 -p 2 hmmlist

would be executed on the second workstation, and so on. When HMMIRest has finished on all four workstations, the following command will be executed on just one of them

HMMIRest -C mmi.cfg -H dir1/hmacs -M dir2 -p 0 hmmlist \
   dir2/HDR1.acc.1 dir2/HDR1.acc.2 dir2/HDR2.acc.1 dir2/HDR2.acc.2 \
   dir2/HDR3.acc.1 dir2/HDR3.acc.2 dir2/HDR4.acc.1 dir2/HDR4.acc.2

where the list of training files has been replaced by the dumped accumulator files. This will cause the accumulated statistics to be reloaded and merged so that the model parameters can be re-estimated and the new model set output to dir2.
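Pulling these steps together, one parallel MMI iteration can be scripted as below. This is a minimal sketch: the host names host1 to host4, the use of ssh with a shared filesystem, and the list-splitting recipe are illustrative assumptions, not part of HTK.

#!/bin/sh
# Sketch of one parallel MMI iteration. Assumes hosts host1..host4 are
# reachable via ssh and share the current working directory (e.g. NFS).

# Split trainlist into four roughly equal lists trlist1..trlist4.
total=$(wc -l < trainlist)
per=$(( (total + 3) / 4 ))
split -l "$per" trainlist trlist.
i=1
for f in trlist.*; do
    mv "$f" "trlist$i"
    i=$((i + 1))
done

# Run one accumulation job per host, in parallel; job N dumps its
# statistics into dir2/HDRN.acc.1 and dir2/HDRN.acc.2.
for i in 1 2 3 4; do
    ssh "host$i" "cd $PWD && HMMIRest -S trlist$i -C mmi.cfg \
        -q plat.num -r plat.den -H dir1/hmacs -M dir2 -p $i hmmlist" &
done
wait    # block until all four jobs have written their accumulators

# Merge the dumped accumulators and re-estimate the model set.
HMMIRest -C mmi.cfg -H dir1/hmacs -M dir2 -p 0 hmmlist \
    dir2/HDR1.acc.1 dir2/HDR1.acc.2 dir2/HDR2.acc.1 dir2/HDR2.acc.2 \
    dir2/HDR3.acc.1 dir2/HDR3.acc.2 dir2/HDR4.acc.1 dir2/HDR4.acc.2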

When discriminatively training large systems on large amounts of training data, and to a lesser extent for maximum likelihood training, the merging of possibly hundreds of accumulators associated with large model sets can be slow and can significantly load the network. To avoid this problem, it is possible to merge subsets of the accumulators using the UPDATEMODE = DUMP configuration option.

As an example using the above configuration, assume that the file dump.cfg contains

UPDATEMODE = DUMP

The following two commands would be used to merge the statistics into two sets of accumulators in directories acc1 and acc2.

HMMIRest -C mmi.cfg -C dump.cfg -H dir1/hmacs -M acc1 -p 0 hmmlist \
   dir2/HDR1.acc.1 dir2/HDR1.acc.2 dir2/HDR2.acc.1 dir2/HDR2.acc.2
HMMIRest -C mmi.cfg -C dump.cfg -H dir1/hmacs -M acc2 -p 0 hmmlist \
   dir2/HDR3.acc.1 dir2/HDR3.acc.2 dir2/HDR4.acc.1 dir2/HDR4.acc.2

⁴The third accumulator is no longer used, but is stored for backward compatibility.


These two sets of merged statistics can then be used to update the acoustic model using

HMMIRest -C mmi.cfg -H dir1/hmacs -M dir2 -p 0 hmmlist \
   acc1/HDR0.acc.1 acc1/HDR0.acc.2 acc2/HDR0.acc.1 acc2/HDR0.acc.2

For very large systems this hierarchical merging of statistics can be performed repeatedly. Note that this form of accumulator merging is also supported by HERest.
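For HERest the analogous merge might look as follows, assuming per-job accumulators were dumped with -p 1 and -p 2 (which HERest names HER1.acc and HER2.acc) and reusing the same dump.cfg; the directory names are illustrative:

HERest -C dump.cfg -H dir1/hmacs -M acc1 -p 0 hmmlist \
   dir2/HER1.acc dir2/HER2.acc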

Chapter 9

HMM Adaptation

[Figure: labelled adaptation or enrollment data is processed by HVite or HERest, transforming a speaker independent model set into a transformed (adapted) speaker independent model set.]

Chapter 8 described how the parameters are estimated for plain continuous density HMMs within HTK, primarily using the embedded training tool HERest. Using the training strategy depicted in figure 8.2, together with other techniques, high performance speaker independent acoustic models can be produced for a large vocabulary recognition system. However, it is possible to build improved acoustic models by tailoring a model set to a specific speaker. By collecting data from a speaker and training a model set on this speaker's data alone, the speaker's characteristics can be modelled more accurately. Such systems are commonly known as speaker dependent systems, and on a typical word recognition task they may have half the errors of a speaker independent system. The drawback of speaker dependent systems is that a large amount of data (typically hours) must be collected in order to obtain sufficient model accuracy.

Rather than training speaker dependent models, adaptation techniques can be applied. In this case, by using only a small amount of data from a new speaker, a good speaker independent model set can be adapted to better fit the characteristics of this new speaker.

Speaker adaptation techniques can be used in various different modes. If the true transcription of the adaptation data is known, then it is termed supervised adaptation, whereas if the adaptation data is unlabelled, then it is termed unsupervised adaptation. In the case where all the adaptation data is available in one block, e.g. from a speaker enrollment session, this is termed static adaptation. Alternatively, adaptation can proceed incrementally as adaptation data becomes available, and this is termed incremental adaptation.

HTK provides two tools to adapt continuous density HMMs. HERest performs offline supervised adaptation using various forms of linear transformation and/or maximum a-posteriori (MAP) adaptation, while unsupervised adaptation is supported by HVite (using only linear transformations). In this case HVite not only performs recognition, but simultaneously adapts the model set as the data becomes available.
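As a first flavour of the supervised case, a transform-estimation run with HERest might look like the sketch below; the script and label files (adapt.scp, adapt.mlf), the classes directory holding the base class definitions, the output transform name mllr1, and the speaker mask are all illustrative assumptions, and the recipes given later in this chapter are definitive.

HERest -C config -S adapt.scp -I adapt.mlf -H hmm0/macros -H hmm0/hmmdefs \
   -u a -J classes -K xforms mllr1 -h '*/%%%%%%_*.mfc' hmmlist

Here -u a requests transform estimation, -J and -K name the input and output transform directories, and -h gives a mask identifying the speaker from each file name.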
