Using the HTK Large Vocabulary Decoder HDecode

3.9 Using the HTK Large Vocabulary Decoder HDecode

TO t ax

TO t uw

ZERO z ia r ow

This recognition dictionary will be assumed to be stored in the file dict.hdecode.

A range of bigram and trigram language models, which must match the dictionary in dict.hdecode, can be used. For example, the first few entries of a bigram language model are shown below.

\data\
ngram 1=994
ngram 2=1490

\1-grams:
-4.6305 !!UNK
-1.0296 SENT-END -1.9574
-1.0295 SENT-START -1.8367
-2.2940 A -0.6935
... ...

where !!UNK is a symbol representing the out-of-vocabulary (OOV) word class. For more details of the form of language models that can be used, see chapter 15. Note that if !!UNK (or <unk>) is not the symbol used for the OOV class, a large number of warnings will be printed in the log file. To avoid this, HLMCopy may be used to extract the word list excluding the unknown word symbol.
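Because the language model must match the dictionary, it can be useful to compare the two before decoding. The following is a minimal sketch, not part of HTK, that reads the header counts and unigram entries from ARPA-format text and reports dictionary words with no unigram entry. The function name parse_arpa_unigrams and the dictionary subset are assumptions made for illustration, and the sample text is only the truncated excerpt shown above.

```python
import re

def parse_arpa_unigrams(text):
    """Return (counts, unigrams) from ARPA-format text.

    counts maps n-gram order -> declared count; unigrams maps
    word -> (log10 probability, back-off weight or None).
    """
    counts, unigrams, section = {}, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            section = line  # e.g. \data\, \1-grams:, \2-grams:, \end\
            continue
        if section == "\\data\\":
            m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
        elif section == "\\1-grams:":
            fields = line.split()
            backoff = float(fields[2]) if len(fields) > 2 else None
            unigrams[fields[1]] = (float(fields[0]), backoff)
    return counts, unigrams

# Truncated excerpt of the bigram model shown in this section.
SAMPLE = """\\data\\
ngram 1=994
ngram 2=1490

\\1-grams:
-4.6305 !!UNK
-1.0296 SENT-END -1.9574
-1.0295 SENT-START -1.8367
-2.2940 A -0.6935
"""

counts, unigrams = parse_arpa_unigrams(SAMPLE)
dict_words = {"A", "SENT-START", "SENT-END", "ZERO"}  # hypothetical dictionary subset
missing = sorted(dict_words - set(unigrams))
print(counts)
print(missing)
```

With the truncated excerpt, ZERO is reported as missing only because its unigram line is not part of the sample; against the full model the list would be expected to be empty.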

For large vocabulary speech recognition tasks these language model files may become very large. It is therefore common to store them in a compressed format. For this section the language model is assumed to be compressed using gzip and stored in a file bg_lm.gz in the ARPA-MIT format.

3.9.2 Option 1 - Recognition

HDecode can be used to generate 1-best output or lattices. For both options the same configuration file, assumed to be stored in config.hdecode, may be used. This should contain the following entries:

TARGETKIND = MFCC_0_D_A
HLANGMODFILTER = 'gunzip -c $.gz'
HNETFILTER = 'gunzip -c $.gz'
HNETOFILTER = 'gzip -c > $.gz'
RAWMITFORMAT = T
STARTWORD = SENT-START
ENDWORD = SENT-END

This configuration file specifies the front end (TARGETKIND), the filter for reading the language model⁹ (HLANGMODFILTER), and the filters for reading and writing lattices (HNETFILTER and HNETOFILTER respectively).
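The filter entries explain why the language model is later referred to as bg_lm although the file on disk is bg_lm.gz. The sketch below only illustrates the substitution, assuming the HTK convention that $ in a filter string stands for the name of the file being opened. The helper expand_filter and the lattice path are hypothetical and not part of HTK.

```python
def expand_filter(template, filename):
    """Substitute the file name for '$' in an HTK-style filter string."""
    return template.replace("$", filename)

# Reading the language model named on the command line as bg_lm.
lm_cmd = expand_filter("gunzip -c $.gz", "bg_lm")

# Writing a lattice; the path is a hypothetical example.
lat_cmd = expand_filter("gzip -c > $.gz", "lat_bg/adg0_4_sr009.lat")

print(lm_cmd)
print(lat_cmd)
```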

Recognition can then be run on the files specified in test.scp using the following command.

HDecode -H hmm20/models -S test.scp \
    -t 220.0 220.0 \
    -C config.hdecode -i recout.mlf -w bg_lm \
    -p 0.0 -s 5.0 dict.hdecode xwrdtiedlist

The output will be written to an MLF in recout.mlf. The -w option specifies the n-gram model to be used, in this case a bigram. The final recognition results may be analysed using HResults in the same way as for HVite.

In common with HVite, there are a number of options that need to be set empirically to obtain good recognition performance and speed. The options -p and -s specify the word insertion penalty and the grammar scale factor respectively as in HVite. There are also a number of pruning options that may be tuned to adjust the run time. These include the main beam (see the -t option), word end beam (see the -v option) and the maximum model pruning (see the -u option).
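Since these options are set empirically, one common approach is to decode a development set over a small grid of values and compare the HResults scores. The following is a sketch under stated assumptions: it only builds the command lines, using the options that appear in this section. The grid values, the output file naming scheme and the helper hdecode_args are hypothetical, and passing the same value twice to -t simply mirrors the command above.

```python
import itertools
import shlex

def hdecode_args(beam, penalty, scale):
    """Build an HDecode argument list for one setting of -t, -p and -s."""
    out = f"recout_t{beam}_p{penalty}_s{scale}.mlf"  # hypothetical naming scheme
    return ["HDecode", "-H", "hmm20/models", "-S", "test.scp",
            "-t", str(beam), str(beam),
            "-C", "config.hdecode", "-i", out, "-w", "bg_lm",
            "-p", str(penalty), "-s", str(scale),
            "dict.hdecode", "xwrdtiedlist"]

# Illustrative grid only; suitable ranges depend on the task.
grid = itertools.product([180.0, 220.0], [0.0, -4.0], [5.0, 10.0])
commands = [hdecode_args(b, p, s) for b, p, s in grid]

for args in commands:
    print(shlex.join(args))  # each line could be pasted into a shell
```

Each list could be passed to subprocess.run on a machine where HDecode is installed; printing the commands first makes it easy to review them before starting a long run.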

⁹ gzip and gunzip are assumed to be in the current path.

3.9.3 Option 2 - Speaker Adaptation

HDecode also supports the use of speaker adaptation transforms, as described in the tutorial steps 12-14. Note that incremental adaptation and transform estimation are not currently supported by HDecode.

Similar command line options are used for speaker adaptation with HDecode as with HVite. The main difference is that the use of an input transform is specified using the -m option in HDecode rather than the -k option in HVite. Assuming that a set of MLLR transforms has been generated using HERest as described in section 3.6 and stored in the directory xforms with the transform extension mllr2, decoding can be run using

HDecode -H hmm20/models -S testAdapt.scp \
    -t 220.0 220.0 \
    -J xforms mllr2 -h '*/%%%%%%_*.mfc' -m -i recoutAdapt.mlf -w bg_lm \
    -J classes -C config.hdecode -p 0.0 -s 5.0 dict.hdecode xwrdtiedlist

The recognition output is written to recoutAdapt.mlf.
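The -h mask in the command above determines which transform is applied to each test file. The sketch below approximates the matching in Python, assuming the usual HTK mask convention: * matches any run of characters, ? matches one character, and each % matches one character that becomes part of the speaker name. The helper names and the example path are hypothetical, and HTK's own matcher may differ in edge cases.

```python
import re

def mask_to_regex(mask):
    """Translate an HTK-style mask into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append("(.)")   # captured: part of the speaker name
        elif ch == "?":
            parts.append(".")     # any single character, not captured
        elif ch == "*":
            parts.append(".*")    # any run of characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def speaker_from_path(mask, path):
    """Return the characters matched by '%', or None if the mask fails."""
    m = mask_to_regex(mask).fullmatch(path)
    return "".join(m.groups()) if m else None

# Hypothetical feature file path in the style of the lattice names used later.
speaker = speaker_from_path("*/%%%%%%_*.mfc", "data/test/adg0_4_sr009.mfc")
print(speaker)  # adg0_4
```

Under this reading, the six % characters pick out the six-character speaker name that precedes the underscore in each file name.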

3.9.4 Option 3 - Lattice Generation

HDecode also supports lattice generation, which allows more complex language models to be applied and provides lattices for lattice-based discriminative training. The -z ext option, where ext specifies the extension to be used for the lattices, specifies that lattices should be generated.

The lattices are to be stored in a directory lat_bg, which must be created beforehand. The following command will perform lattice generation:

HDecode -H hmm20/models -S test.scp \
    -t 220.0 220.0 \
    -C config.hdecode -i recout.mlf -w bg_lm \
    -o M -z lat -l lat_bg -X lat \
    -p 0.0 -s 5.0 dict.hdecode xwrdtiedlist

In addition to the standard printing options, word insertion penalty and grammar scale factor, an option to specify the number of tokens used per state (see the -n option) is available. This can significantly affect the decoding time and the size of the lattices generated: increasing the value (the default is 32) increases both. Note that the lattices will be compressed using gzip, as specified by HNETOFILTER.

Prior to rescoring, the lattices generated by HDecode must be made deterministic using HLRescore. The first stage is to generate a list of the lattices that need to be made deterministic. Let test.lcp hold this list; a few possible entries are given below.

adg0_4_sr009.lat
adg0_4_sr049.lat
adg0_4_sr089.lat
adg0_4_sr129.lat
adg0_4_sr169.lat
... ...
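One way to produce such a list is to derive it from the script file used for decoding. The sketch below assumes that each lattice is named after the base name of its feature file with the extension given to -z, as the entries above suggest. The helper lattice_names and the feature file paths are hypothetical.

```python
from pathlib import PurePosixPath

def lattice_names(scp_lines, ext="lat"):
    """Map feature file paths to lattice file names (base name + extension)."""
    names = []
    for line in scp_lines:
        line = line.strip()
        if line:
            names.append(PurePosixPath(line).stem + "." + ext)
    return names

# Hypothetical contents of test.scp.
scp = ["data/test/adg0_4_sr009.mfc", "data/test/adg0_4_sr049.mfc"]
lcp = lattice_names(scp)
print("\n".join(lcp))
```

Writing the joined result to test.lcp gives a list in the format shown above.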

For the bigram lattices previously generated in lat_bg, the following command needs to be run:

HLRescore -C config.hlrescore -S test.lcp \
    -t 200.0 1000.0 \
    -m f -L lat_bg -w -l lat_bg_det dict.hdecode

The resulting deterministic bigram lattices are now stored under the directory lat_bg_det. The configuration file config.hlrescore should contain the following settings:

HLANGMODFILTER = 'gunzip -c $.gz'
HNETFILTER = 'gunzip -c $.gz'
HNETOFILTER = 'gzip -c > $.gz'
RAWMITFORMAT = T
STARTWORD = SENT-START
ENDWORD = SENT-END
FIXBADLATS = T

The FIXBADLATS configuration option ensures that if the final word in the lattice is !NULL, and the word specified in ENDWORD is missing, then !NULL is replaced by the word specified in ENDWORD. This is found to make lattice generation more robust.
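The rule can be illustrated on a single word sequence. This is a toy sketch rather than HTK's implementation: a real lattice is a graph, whereas the function below treats it as one list of word labels in time order. The function name and the example sequences are hypothetical.

```python
def fix_bad_lattice_end(words, endword="SENT-END"):
    """Replace a final !NULL by endword when endword does not occur at all."""
    if words and words[-1] == "!NULL" and endword not in words:
        return words[:-1] + [endword]
    return list(words)

repaired = fix_bad_lattice_end(["SENT-START", "A", "!NULL"])
untouched = fix_bad_lattice_end(["SENT-START", "A", "SENT-END"])
print(repaired)
print(untouched)
```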

3.9.5 Option 4 - Lattice Rescoring

More complicated language models, for instance higher-order n-gram models, may be applied to expand the initial lattices and improve recognition performance. Assume that a compressed version of a trigram language model with the same vocabulary as the bigram above is stored in tg_lm.gz.

The 1-best path in the lattice after applying the trigram language model may be obtained using the following command.

HLRescore -C config.hlrescore -S test.lcp \
    -f -i recout_tg.mlf -n tg_lm -L lat_bg -w -l lat_tg \
    -p 0.0 -s 5.0 dict.hdecode

The 1-best output is placed in recout_tg.mlf. In addition, compressed versions of the lattices, now with trigram language model scores, are stored in lat_tg.

It is then possible to rescore these trigram lattices using HDecode with either a different set of acoustic models or a different grammar scale factor. However, prior to this it is again necessary to ensure that the lattices are deterministic. Thus the following command is required:

HLRescore -C config.hlrescore -S test.lcp \
    -t 200.0 1000.0 -m f -L lat_tg \
    -w -l lat_tg_det dict.hdecode

These lattices can then be rescored using

HDecode -H hmm21/models -S test.scp \
    -t 220.0 220.0 \
    -C config.hdecode -i recout_rescore.mlf -L lat_tg_det \
    -p 0.0 -s 5.0 dict.hdecode xwrdtiedlist2

where the new set of acoustic models is assumed to be stored in hmm21/models and the corresponding model list in xwrdtiedlist2.

From the document: The HTK Book (pages 60-64)