Summary - The HTK Book

machine’s natural byte order. The default value of these two configuration variables is false, which is the appropriate setting when using HTK in an environment with multiple machine architectures. In an environment consisting entirely of machines with VAX byte order, both configuration parameters can be set true, which disables the byte swapping procedure during reading and writing of data.
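The effect of these two settings can be illustrated with a minimal Python sketch. It is not HTK code: the function name read_int32 and its flag are hypothetical, and the sketch assumes that the fixed (non-natural) order used for HTK binary files is big-endian.

```python
import struct


def read_int32(raw: bytes, natural_read_order: bool = False) -> int:
    """Decode a 4-byte signed integer from raw bytes.

    natural_read_order=False: the bytes are taken to be in a fixed
    big-endian order (an assumption of this sketch), so they are
    swapped whenever the machine's own order differs.
    natural_read_order=True: the bytes are taken to be in the
    machine's natural order and no swapping is done.
    """
    fmt = "=i" if natural_read_order else ">i"
    return struct.unpack(fmt, raw)[0]


# A value written in the fixed order decodes correctly on any machine.
fixed_order_bytes = struct.pack(">i", 1000)
print(read_int32(fixed_order_bytes))

# A value written in natural order needs natural_read_order=True.
natural_order_bytes = struct.pack("=i", 1000)
print(read_int32(natural_order_bytes, natural_read_order=True))
```

Files written in natural order are only portable between machines that share the same byte order, which is why false is the safer default in a mixed environment.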

4.10 Summary

This section summarises the environment variables and configuration parameters recognised by HShell, HMem and HMath. It also provides a list of all the standard command line options used with HTK.

Table 4.1 lists all of the configuration parameters along with a brief description. A missing module name means that the parameter is recognised by more than one module. Table 4.2 lists all of the environment variables used by these modules. Finally, Table 4.3 lists all of the standard options.

Module   Name               Description

HShell   ABORTONERR         Core dump on error (for debugging)
HShell   HWAVEFILTER        Filter for waveform file input
HShell   HPARMFILTER        Filter for parameter file input
HShell   HLANGMODFILTER     Filter for language model file input
HShell   HMMLISTFILTER      Filter for HMM list file input
HShell   HMMDEFFILTER       Filter for HMM definition file input
HShell   HLABELFILTER       Filter for label file input
HShell   HNETFILTER         Filter for network file input
HShell   HDICTFILTER        Filter for dictionary file input
HShell   HWAVEOFILTER       Filter for waveform file output
HShell   HPARMOFILTER       Filter for parameter file output
HShell   HLANGMODOFILTER    Filter for language model file output
HShell   HMMLISTOFILTER     Filter for HMM list file output
HShell   HMMDEFOFILTER      Filter for HMM definition file output
HShell   HLABELOFILTER      Filter for label file output
HShell   HNETOFILTER        Filter for network file output
HShell   HDICTOFILTER       Filter for dictionary file output
HShell   MAXTRYOPEN         Number of file open retries
HShell   NONUMESCAPES       Prevent string output using \012 format
         NATURALREADORDER   Enable natural read order for HTK binary files
         NATURALWRITEORDER  Enable natural write order for HTK binary files
HMem     PROTECTSTAKS       Warn if stack is cut back (debugging)
         TRACE              Trace control (default=0)

Table 4.1 Configuration Parameters used in Operating Environment

Env Variable   Meaning

HCONFIG        Name of default configuration file
HxxxFILTER     Input/Output filters as above

Table 4.2 Environment Variables used in Operating Environment

Standard Option   Meaning

-A                Print command line arguments
-B                Store output HMM macro files in binary
-C cf             Configuration file is cf
-D                Display configuration variables
-F fmt            Set source data file format to fmt
-G fmt            Set source label file format to fmt
-H mmf            Load HMM macro file mmf
-I mlf            Load master label file mlf
-J tmf            Load transform model file tmf
-K tmf            Save transform model file tmf
-L dir            Look for label files in directory dir
-M dir            Store output HMM macro files in directory dir
-O fmt            Set output data file format to fmt
-P fmt            Set output label file format to fmt
-Q                Print command summary info
-S scp            Use command line script file scp
-T N              Set trace level to N
-V                Print version information
-X ext            Set label file extension to ext

Table 4.3 Summary of Standard Options
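As an illustration of how a few of these standard options combine, the following Python sketch assembles an argument list that could be passed to an HTK tool from a driver script. The helper standard_options and the file names are hypothetical; only the option letters -A, -D, -C, -S and -T are taken from the table above.

```python
from typing import List, Optional


def standard_options(config: Optional[str] = None,
                     script: Optional[str] = None,
                     trace: Optional[int] = None,
                     print_args: bool = False,
                     show_config: bool = False) -> List[str]:
    """Build a list of standard HTK options (hypothetical helper)."""
    args: List[str] = []
    if print_args:
        args.append("-A")            # print command line arguments
    if show_config:
        args.append("-D")            # display configuration variables
    if config is not None:
        args += ["-C", config]       # configuration file
    if script is not None:
        args += ["-S", script]       # command line script file
    if trace is not None:
        args += ["-T", str(trace)]   # trace level
    return args


# Hypothetical file names, for illustration only.
opts = standard_options(config="config", script="train.scp",
                        trace=1, print_args=True)
print(" ".join(opts))
```

The tool name and its own arguments would be placed around this list; they are left out here because they differ from tool to tool.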

Chapter 5

Speech Input/Output

Many tools need to input parameterised speech data and HTK provides a number of different methods for doing this:

• input from a previously encoded speech parameter file

• input from a waveform file which is encoded as part of the input processing

• input from an audio device which is encoded as part of the input processing.

For input from a waveform file, a large number of different file formats are supported, including all of the commonly used CD-ROM formats. Input/output for parameter files is limited to the standard HTK file format and the new Entropic Esignal format.

[Figure: HTK software architecture. An HTK tool sits on the library modules HShell, HMem, HMath, HSigP, HLabel, HLM, HNet, HDict, HVQ, HModel, HUtil, HGraf, HAudio, HWave, HParm, HTrain, HFB, HAdapt and HRec, which handle speech data, labels, language models, constraint networks, lattices, the dictionary, HMM definitions, terminal and graphical I/O, model training and adaptation.]

All HTK speech input is controlled by configuration parameters which give details of what processing operations to apply to each input speech file or audio source. This chapter describes speech input/output in HTK. The general mechanisms are explained and the various configuration parameters are defined. The facilities for signal pre-processing, linear prediction-based processing, Fourier-based processing and vector quantisation are presented and the supported file formats are given. Also described are the facilities for augmenting the basic speech parameters with energy measures, delta coefficients and acceleration (delta-delta) coefficients and for splitting each parameter vector into multiple data streams to form observations. The chapter concludes with a brief description of the tools HList and HCopy which are provided for viewing, manipulating and encoding speech files.

5.1 General Mechanism

The facilities for speech input and output in HTK are provided by five distinct modules: HAudio, HWave, HParm, HVQ and HSigP. The interconnections between these modules are shown in Fig. 5.1.

[Figure: HWave reads waveform files and HAudio takes audio input; both feed HParm, which also reads parameter files and, using HSigP and HVQ, outputs observations (parameter vectors and/or VQ symbols).]

Fig. 5.1 Speech Input Subsystem

Waveforms are read from files using HWave, or are input direct from an audio device using HAudio. In a few rare cases, such as in the display tool HSLab, only the speech waveform is needed. However, in most cases the waveform is wanted in parameterised form and the required encoding is performed by HParm using the signal processing operations defined in HSigP. The parameter vectors are output by HParm in the form of observations, which are the basic units of data processed by the HTK recognition and training tools. An observation contains all components of a raw parameter vector, but it may be split into a number of independent parts. Each such part is regarded by an HTK tool as a statistically independent data stream. Also, an observation may include VQ indices attached to each data stream. Alternatively, VQ indices can be read directly from a parameter file, in which case the observation will contain only VQ indices.
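The idea of splitting one parameter vector into several data streams can be sketched in Python as follows. This is only an illustration: the function split_into_streams and the stream widths are hypothetical, and contiguous slicing is an assumption of the sketch, since this section does not say which components HTK assigns to which stream.

```python
from typing import List, Sequence


def split_into_streams(vector: Sequence[float],
                       widths: Sequence[int]) -> List[List[float]]:
    """Split a parameter vector into contiguous parts (streams).

    widths gives the number of components in each stream; the
    widths must add up to the length of the vector.
    """
    if sum(widths) != len(vector):
        raise ValueError("stream widths must sum to the vector length")
    streams: List[List[float]] = []
    start = 0
    for width in widths:
        streams.append(list(vector[start:start + width]))
        start += width
    return streams


# A 6-component vector split into hypothetical streams of width 3, 2 and 1.
observation = split_into_streams([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [3, 2, 1])
print(observation)
```

Every component of the raw vector appears in exactly one stream, so the observation still carries the whole vector.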

Usually an HTK tool will require a number of speech data files to be specified on the command line. In the majority of cases, these files will be required in parameterised form. Thus, the following example invokes the HTK embedded training tool HERest to re-estimate a set of models using the speech data files s1, s2, s3, .... These are input via the library module HParm and they must be in exactly the form needed by the models.

HERest ... s1 s2 s3 s4 ...

However, if the external form of the speech data files is not in the required form, it will often be possible to convert them automatically during the input process. To do this, configuration parameter values are specified whose function is to define exactly how the conversion should be done. The key idea is that there is a source parameter kind and a target parameter kind. The source refers to the natural form of the data in the external medium and the target refers to the form of the data that is required internally by the HTK tool. The principal function of the speech input subsystem is to convert the source parameter kind into the required target parameter kind.

Parameter kinds consist of a base form to which one or more qualifiers may be attached where each qualifier consists of a single letter preceded by an underscore character. Some examples of parameter kinds are

WAVEFORM   simple waveform

LPC        linear prediction coefficients

LPC_D_E    LPC with energy and delta coefficients

MFCC_C     compressed mel-cepstral coefficients
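The structure of a parameter kind (a base form followed by underscore-separated qualifiers) can be made concrete with a short Python sketch. The function parse_parm_kind is hypothetical and is not part of HTK; it checks only that each qualifier is a single character and does not check that the base form or the qualifiers are ones HTK actually accepts.

```python
from typing import List, Tuple


def parse_parm_kind(kind: str) -> Tuple[str, List[str]]:
    """Split a parameter kind such as "LPC_D_E" into (base, qualifiers)."""
    base, *qualifiers = kind.split("_")
    if not base:
        raise ValueError("parameter kind has no base form: %r" % kind)
    for qualifier in qualifiers:
        if len(qualifier) != 1:
            raise ValueError("qualifier must be a single character: %r"
                             % qualifier)
    return base, qualifiers


for kind in ("WAVEFORM", "LPC", "LPC_D_E", "MFCC_C"):
    print(kind, parse_parm_kind(kind))
```

For example, LPC_D_E has the base form LPC and the two qualifiers D and E, while WAVEFORM has no qualifiers.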

The required source and target parameter kinds are specified using the configuration parameters SOURCEKIND and TARGETKIND. Thus, if the following configuration parameters were defined

SOURCEKIND = WAVEFORM
TARGETKIND = MFCC_E
