N-Best Lists and Lattices

PREEMCOEF = 0.97
USEPOWER = T
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12

Many of these variable settings are the defaults and could be omitted; they are included explicitly here as a reminder of the main configuration options available.

When HVite is executed in direct audio input mode, it issues a prompt prior to each input and it is normal to enable basic tracing so that the recognition results can be seen. A typical terminal output might be

READY[1]>
Please speak sentence - measuring levels
Level measurement completed
DIAL ONE FOUR SEVEN
== [258 frames] -97.8668 [Ac=-25031.3 LM=-218.4] (Act=22.3)
READY[2]>
CALL NINE TWO EIGHT
== [233 frames] -97.0850 [Ac=-22402.5 LM=-218.4] (Act=21.8)
etc
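The result lines in a trace like this can be picked apart mechanically. The following is a minimal Python sketch, not part of HTK; the field names (frames, score, ac, lm, act) are labels chosen here for illustration. In the two sample lines, the number after the frame count is close to (Ac + LM) / frames, which suggests a per-frame average, although that interpretation is inferred from the sample figures rather than stated in this section.

```python
import re

# Regular expression for a signed decimal number.
NUM = r"-?\d+(?:\.\d+)?"

# Matches a result line of the shape shown in the sample trace.
# The group names are illustrative labels, not HTK terminology.
RESULT_RE = re.compile(
    rf"==\s*\[(?P<frames>\d+) frames\]\s+(?P<score>{NUM})\s+"
    rf"\[Ac=(?P<ac>{NUM})\s+LM=(?P<lm>{NUM})\]\s+\(Act=(?P<act>{NUM})\)"
)

def parse_result(line):
    """Return the numeric fields of a result line, or None if it does not match."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    return {
        "frames": int(m["frames"]),
        "score": float(m["score"]),
        "ac": float(m["ac"]),
        "lm": float(m["lm"]),
        "act": float(m["act"]),
    }

r = parse_result("== [258 frames] -97.8668 [Ac=-25031.3 LM=-218.4] (Act=22.3)")
# Compare the reported score with the per-frame average of Ac + LM.
print(r["score"], (r["ac"] + r["lm"]) / r["frames"])
```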

If required, a transcription of each spoken input can be output to a label file or an MLF in the usual way by setting the -e option. However, to do this a file name must be synthesised. This is done by using a counter prefixed by the value of the HVite configuration variable RECOUTPREFIX and suffixed by the value of RECOUTSUFFIX. For example, with the settings

RECOUTPREFIX = sjy
RECOUTSUFFIX = .rec

then the output transcriptions would be stored as sjy0001.rec, sjy0002.rec etc.
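The naming scheme can be mimicked in a few lines of Python. This is only a sketch of the rule described above, not HVite's own code; the function name synth_name is hypothetical, and the four-digit zero-padded counter is an assumption read off the sjy0001.rec example.

```python
def synth_name(prefix, suffix, counter, width=4):
    """Build a transcription file name from a prefix, a counter and a suffix.

    The zero-padded width of 4 is assumed from the example names in the text.
    """
    return f"{prefix}{counter:0{width}d}{suffix}"

# Names for the first two utterances with the settings shown above.
for n in (1, 2):
    print(synth_name("sjy", ".rec", n))
```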

13.8 N-Best Lists and Lattices

As noted in section 13.1, HVite can generate lattices and N-best outputs. To generate an N-best list, the -n option is used to specify the number of N-best tokens to store per state and the number of N-best hypotheses to generate. The result is that for each input utterance, a multiple alternative transcription is generated. For example, setting -n 4 20 with a digit recogniser would generate an output of the form

"testf1.rec"

FOUR SEVEN NINE OH ///

FOUR SEVEN NINE OH OH ///

etc
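A reader of such output needs to separate the alternatives at each /// marker. The sketch below is one way to do that in Python; it assumes the labels are plain words with no times or scores attached, and the function name split_nbest is hypothetical.

```python
def split_nbest(text):
    """Split the body of a multiple alternative transcription into hypotheses.

    Each hypothesis is a list of words; "///" closes the current hypothesis.
    Assumes labels are bare words (no start/end times or scores).
    """
    hyps, current = [], []
    for token in text.split():
        if token == "///":
            hyps.append(current)
            current = []
        else:
            current.append(token)
    if current:  # tolerate a final hypothesis without a closing marker
        hyps.append(current)
    return hyps

body = "FOUR SEVEN NINE OH ///\nFOUR SEVEN NINE OH OH ///"
for rank, words in enumerate(split_nbest(body), start=1):
    print(rank, " ".join(words))
```

Because the text is split on any whitespace, the same function works whether the words appear on one line or one per line.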

The lattices from which the N-best lists are generated can be output by setting the option -z ext. In this case, a lattice called testf.ext will be generated for each input test file testf.xxx.

By default, these lattices will be stored in the same directory as the test files, but they can be redirected to another directory using the -l option.
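The naming rule for lattice files can be sketched as follows. This is an illustration of the behaviour described above rather than HVite's implementation; lattice_path and its parameters are hypothetical names, with ext standing for the argument of -z and out_dir for the argument of -l.

```python
from pathlib import Path

def lattice_path(test_file, ext, out_dir=None):
    """Return where the lattice for test_file would be written.

    ext     -- extension given to -z (without the leading dot)
    out_dir -- directory given to -l, or None to use the test file's directory
    """
    test = Path(test_file)
    name = test.with_suffix("." + ext).name      # testf.xxx -> testf.ext
    target_dir = Path(out_dir) if out_dir is not None else test.parent
    return target_dir / name

print(lattice_path("data/testf.mfc", "lat"))
print(lattice_path("data/testf.mfc", "lat", out_dir="lats"))
```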

The lattices generated by HVite have the following general form

VERSION=1.0
UTTERANCE=testf1.mfc
lmname=wdnet
lmscale=20.00 wdpenalty=-30.00
vocab=dict
N=31 L=56
I=0 t=0.00
I=1 t=0.36
I=2 t=0.75
I=3 t=0.81
... etc
I=30 t=2.48
J=0 S=0 E=1 W=SILENCE v=0 a=-3239.01 l=0.00
J=1 S=1 E=2 W=FOUR v=0 a=-3820.77 l=0.00
... etc
J=55 S=29 E=30 W=SILENCE v=0 a=-246.99 l=-1.20

The first 5 lines comprise a header which records names of the files used to generate the lattice along with the settings of the language model scale and penalty factors. Each node in the lattice represents a point in time measured in seconds and each arc represents a word spanning the segment of the input starting at the time of its start node and ending at the time of its end node. For each such span, v gives the number of the pronunciation used, a gives the acoustic score and l gives the language model score.
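The node and arc fields described above map naturally onto a small data structure. The Python sketch below reads a lattice fragment into a header dictionary, a table of node times and a list of arcs. It is not an HTK reader: it assumes one node or arc definition per line and the key=value layout of the sample, and the names parse_lattice and arc_span are hypothetical.

```python
def parse_fields(line):
    """Turn 'key=value key=value ...' into a dict; tokens without '=' are ignored."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)

def parse_lattice(text):
    """Return (header, nodes, arcs) for a lattice with one definition per line."""
    header, nodes, arcs = {}, {}, []
    for line in text.splitlines():
        fields = parse_fields(line)
        if not fields:
            continue                          # blank lines, "... etc" and the like
        first = line.split()[0]
        if first.startswith("I="):            # node: index and time in seconds
            nodes[int(fields["I"])] = float(fields["t"])
        elif first.startswith("J="):          # arc: a word between two nodes
            arcs.append({
                "start": int(fields["S"]), "end": int(fields["E"]),
                "word": fields["W"], "v": int(fields["v"]),
                "a": float(fields["a"]), "l": float(fields["l"]),
            })
        else:                                 # header and size lines
            header.update(fields)
    return header, nodes, arcs

def arc_span(arc, nodes):
    """Start and end time (in seconds) of the word carried by an arc."""
    return nodes[arc["start"]], nodes[arc["end"]]

SAMPLE = """VERSION=1.0
UTTERANCE=testf1.mfc
lmscale=20.00 wdpenalty=-30.00
I=0 t=0.00
I=1 t=0.36
I=2 t=0.75
J=0 S=0 E=1 W=SILENCE v=0 a=-3239.01 l=0.00
J=1 S=1 E=2 W=FOUR v=0 a=-3820.77 l=0.00
"""
header, nodes, arcs = parse_lattice(SAMPLE)
for arc in arcs:
    print(arc["word"], arc_span(arc, nodes))
```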

The language model scores in output lattices do not include the scale factors and penalties.

These are removed so that the lattice can be used as a constraint network for subsequent recogniser testing. When using HVite normally, the word level network file is specified using the -w option.

When the -w option is given without a file name, HVite constructs the name of a lattice file from the name of the test file and inputs that. Hence, a new recognition network is created for each input file and recognition is very fast. For example, this is an efficient way of experimentally determining optimum values for the language model scale and penalty factors.
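To see why storing unscaled language model scores makes this kind of tuning convenient, consider how a path through the lattice might be rescored under different settings. The sketch below assumes that each arc contributes a + lmscale * l + wdpenalty to the total; that combination rule is an assumption made here for illustration and is not spelled out in this section. The arc values are taken from the sample lattice but do not form a real path, and path_score is a hypothetical name.

```python
def path_score(arcs, lmscale, wdpenalty):
    """Total score of a path given (acoustic, lm) pairs for its arcs.

    Assumed rule: each arc adds a + lmscale * l + wdpenalty.
    """
    return sum(a + lmscale * l + wdpenalty for a, l in arcs)

# (a, l) pairs copied from arcs in the sample lattice, for illustration only.
arcs = [(-3239.01, 0.00), (-3820.77, 0.00), (-246.99, -1.20)]

# The same stored scores can be recombined under several candidate settings.
for lmscale, wdpenalty in [(10.0, 0.0), (20.0, -30.0)]:
    print(lmscale, wdpenalty, path_score(arcs, lmscale, wdpenalty))
```

Only the final combination changes between settings; the acoustic and language model scores stored in the lattice are reused unchanged.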

Part III

Reference Section


Chapter 14

The HTK Tools



14.1 HBuild

14.1.1 Function

This program converts input files that represent language models in a number of different formats into a standard HTK lattice. The main purpose of HBuild is to allow the expansion of HTK multi-level lattices and the conversion of bigram language models (such as those generated by HLStats) into lattice format.

The specific input file types supported by HBuild are:

1. HTK multi-level lattice files.

2. Back-off bigram files in ARPA/MIT-LL format.

3. Matrix bigram files produced by HLStats.

4. Word lists (to generate a word-loop grammar).

5. Word-pair grammars in ARPA Resource Management format.

The formats of both types of bigram supported by HBuild are described in Chapter 12. The format for multi-level HTK lattice files is described in Chapter 17.

14.1.2 Use

HBuild is invoked by the command line

HBuild [options] wordList outLatFile

The wordList should contain a list of all the words used in the input language model. The options specify the type of input language model as well as the source filename. If none of the flags specifying the input language model type is given, a simple word-loop is generated using the wordList given.

After processing the input language model, the resulting lattice is saved to file outLatFile.

The operation of HBuild is controlled by the following command line options

-b Output the lattice in binary format. This increases speed of subsequent loading (default ASCII text lattices).

-m fn The matrix format bigram in fn forms the input language model.

-n fn The ARPA/MIT-LL format back-off bigram in fn forms the input language model.

-s st en Set the bigram entry and exit words to st and en. (Default !ENTER and !EXIT). Note that no words will follow the exit word, or precede the entry word. Both the entry and exit word must be included in the wordList. This option is only effective in conjunction with the -n option.

-t st en This option is used with word-loops and word-pair grammars. An output lattice is produced with an initial word-symbol st (before the loop) and a final word-symbol en (after the loop). This allows initial and final silences to be specified. (Default is that the initial and final nodes are labelled with !NULL). Note that st and en shouldn’t be included in the wordList unless they occur elsewhere in the network. This is only effective for word-loop and word-pair grammars.

-u s The unknown word is s (default !NULL). This option only has an effect when bigram input language models are specified. It can be used in conjunction with the -z flag to delete the symbol for unknown words from the output lattice.

-w fn The word-pair grammar in fn forms the input language model. The file must be in the format used for the ARPA Resource Management grammar.

-x fn The extended HTK lattice in fn forms the input language model. This option is used to expand a multi-level lattice into a single level lattice that can be processed by other HTK tools.

-z Delete (zap) any references to the unknown word (see -u option) in the output lattice.

HBuild also supports the standard options -A, -C, -D, -S, -T, and -V as described in section 4.4.
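When HBuild is driven from a script, the command line can be assembled from the options listed above. The Python sketch below only builds the argument list and does not run HBuild; the function name hbuild_args, its parameter names and the file names in the example are all hypothetical, and only a subset of the options (-b, -n, -s, -u, -z) is covered.

```python
def hbuild_args(word_list, out_lat, bigram=None, entry_exit=None,
                unknown=None, zap_unknown=False, binary=False):
    """Assemble an HBuild command line as a list of arguments.

    bigram      -- file for -n (ARPA/MIT-LL back-off bigram)
    entry_exit  -- (st, en) pair for -s
    unknown     -- unknown word symbol for -u
    zap_unknown -- add -z to delete references to the unknown word
    binary      -- add -b to write the lattice in binary format
    """
    args = ["HBuild"]
    if binary:
        args.append("-b")
    if bigram is not None:
        args += ["-n", bigram]
    if entry_exit is not None:
        args += ["-s", entry_exit[0], entry_exit[1]]
    if unknown is not None:
        args += ["-u", unknown]
    if zap_unknown:
        args.append("-z")
    args += [word_list, out_lat]
    return args

# With no language model flags, the text says a word-loop is generated.
print(" ".join(hbuild_args("wordlist", "wdnet")))
# Back-off bigram input with explicit entry and exit words (hypothetical file names).
print(" ".join(hbuild_args("wordlist", "wdnet", bigram="bigram.lm",
                           entry_exit=("!ENTER", "!EXIT"))))
```

The resulting list could be handed to a process launcher such as subprocess.run on a system where HBuild is installed.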


14.1.3 Tracing

HBuild supports the following trace options, where each trace flag is given using an octal base

0001 basic progress reporting.

Trace flags are set using the -T option or the TRACE configuration variable.
