Basic HMM Definitions - The HTK Book

• optional model duration parameter vector

• number of states

• for each emitting state and each stream

– mixture component weights or discrete probabilities

– if continuous density, then means and covariances

– optional stream weight vector

– optional duration parameter vector

• transition matrix

The following sections explain how these are defined.

7.2 Basic HMM Definitions

Some HTK tools require a single HMM to be defined. For example, the isolated-unit re-estimation tool HRest would be invoked as

HRest hmmdef s1 s2 s3 ....

This would cause the model defined in the file hmmdef to be input and its parameters re-estimated using the speech data files s1, s2, etc.

~h "hmm1"
<BeginHMM>
  <VecSize> 4 <MFCC>
  <NumStates> 5
  <State> 2
    <Mean> 4
      0.2 0.1 0.1 0.9
    <Variance> 4
      1.0 1.0 1.0 1.0
  <State> 3
    <Mean> 4
      0.4 0.9 0.2 0.1
    <Variance> 4
      1.0 2.0 2.0 0.5
  <State> 4
    <Mean> 4
      1.2 3.1 0.5 0.9
    <Variance> 4
      5.0 5.0 5.0 5.0
  <TransP> 5
    0.0 0.5 0.5 0.0 0.0
    0.0 0.4 0.4 0.2 0.0
    0.0 0.0 0.6 0.4 0.0
    0.0 0.0 0.0 0.7 0.3
    0.0 0.0 0.0 0.0 0.0
<EndHMM>

Fig. 7.2 Definition for Simple L-R HMM

HMM definition files consist of a sequence of symbols representing the elements of a simple language. These symbols are mainly keywords written within angle brackets, together with integer and floating point numbers. The full HTK definition language is presented more formally later in section 7.10.

For now, the main features of the language will be described by some examples.
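The lexical structure just described is simple enough to tokenize with a single regular expression. The following is a minimal Python sketch, assuming only four token classes: macro markers such as ~h, double-quoted names, keywords in angle brackets, and numbers. The name tokenize and the token labels are hypothetical; this is not HTK's own lexer, and the full grammar of section 7.10 may admit forms it does not handle.

```python
import re

# Hypothetical token classes, for illustration only (not HTK's own lexer):
#   macro    ~ followed by one letter, e.g. ~h or ~o
#   string   a double-quoted name, e.g. "hmm1"
#   keyword  text within angle brackets, e.g. <BeginHMM>
#   number   an integer or floating point value
TOKEN_RE = re.compile(
    r'(?P<macro>~[a-z])'
    r'|(?P<string>"[^"]*")'
    r'|(?P<keyword><[^<>]+>)'
    r'|(?P<number>[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?)'
    r'|(?P<space>\s+)'
)


def tokenize(text):
    """Split definition text into (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, raw = m.lastgroup, m.group()
        if kind == "number":
            # Integers stay ints (dimensions, indices); anything else is a float.
            tokens.append((kind, float(raw) if any(c in raw for c in ".eE") else int(raw)))
        elif kind in ("string", "keyword"):
            tokens.append((kind, raw[1:-1]))  # strip the quotes or angle brackets
        elif kind == "macro":
            tokens.append((kind, raw[1]))  # keep only the macro type letter
        pos = m.end()
    return tokens


print(tokenize('~h "hmm1" <BeginHMM> <VecSize> 4 <MFCC>'))
```

Keeping integers distinct from floats makes it easy for a later parsing stage to tell dimension counts such as <Mean> 4 apart from parameter values.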

Fig. 7.2 shows a HMM definition corresponding to the simple left-right HMM illustrated in Fig. 7.1. It is a continuous density HMM with 5 states in total, 3 of which are emitting. The first symbol in the file, ~h, indicates that the following string is the name of a macro of type h, which means that it is a HMM definition (macros are explained in detail later). Thus, this definition describes a HMM called "hmm1". Note that HMM names should be composed of alphanumeric characters only and must not consist solely of numbers. The HMM definition itself is bracketed by the symbols <BeginHMM> and <EndHMM>.
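The naming rule can be checked mechanically. This is a minimal Python sketch; the function name is hypothetical, and restricting "alphanumeric" to ASCII letters and digits is an assumption made here rather than something the text specifies.

```python
def is_valid_hmm_name(name: str) -> bool:
    """Literal reading of the naming rule: ASCII letters and digits only,
    and not made up entirely of digits."""
    return name.isascii() and name.isalnum() and not name.isdigit()


for candidate in ["hmm1", "1234", "hmm_1"]:
    print(candidate, is_valid_hmm_name(candidate))
```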

The first line of the definition proper specifies the global features of the HMM. In any system consisting of many HMMs, these features will be the same for all of them. In this case, the global definitions indicate that the observation vectors have 4 components (<VecSize> 4) and that they denote MFCC coefficients (<MFCC>).

The next line specifies the number of states in the HMM. There then follows a definition for each emitting state $j$, each of which has a single mean vector $\mu_j$ introduced by the keyword <Mean> and a diagonal variance vector $\Sigma_j$ introduced by the keyword <Variance>. The definition ends with the transition matrix $\{a_{ij}\}$ introduced by the keyword <TransP>.
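To make this layout concrete, the following minimal Python sketch writes a single-stream, single-mixture, diagonal-covariance HMM in the same order as Fig. 7.2: global options, <NumStates>, one <State> block per emitting state (states 2 to N-1), then <TransP>. The function format_hmm, its arguments, and the indentation are illustrative choices, not part of HTK. Values are written with Python's repr so that no precision is lost.

```python
def format_hmm(name, vec_size, kind, means, variances, transp):
    """Format a single-stream, single-mixture, diagonal-covariance HMM.

    means, variances: one list per emitting state, for states 2 .. N-1.
    transp: N rows of N values, including the non-emitting entry and exit states.
    """
    n_states = len(transp)
    if len(means) != n_states - 2 or len(variances) != n_states - 2:
        raise ValueError("need exactly one mean and one variance per emitting state")

    def fmt(values):
        # repr keeps full precision, e.g. 0.2 -> '0.2', 1 -> '1.0'
        return " ".join(repr(float(v)) for v in values)

    lines = [f'~h "{name}"', "<BeginHMM>",
             f"  <VecSize> {vec_size} <{kind}>",
             f"  <NumStates> {n_states}"]
    for j, (mu, var) in enumerate(zip(means, variances), start=2):
        lines += [f"  <State> {j}",
                  f"    <Mean> {len(mu)}", f"      {fmt(mu)}",
                  f"    <Variance> {len(var)}", f"      {fmt(var)}"]
    lines.append(f"  <TransP> {n_states}")
    lines += [f"    {fmt(row)}" for row in transp]
    lines.append("<EndHMM>")
    return "\n".join(lines)


# The parameters of hmm1 from Fig. 7.2
print(format_hmm(
    "hmm1", 4, "MFCC",
    means=[[0.2, 0.1, 0.1, 0.9], [0.4, 0.9, 0.2, 0.1], [1.2, 3.1, 0.5, 0.9]],
    variances=[[1.0] * 4, [1.0, 2.0, 2.0, 0.5], [5.0] * 4],
    transp=[[0.0, 0.5, 0.5, 0.0, 0.0],
            [0.0, 0.4, 0.4, 0.2, 0.0],
            [0.0, 0.0, 0.6, 0.4, 0.0],
            [0.0, 0.0, 0.0, 0.7, 0.3],
            [0.0] * 5],
))
```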

Notice that the dimension of each vector or matrix is specified explicitly before listing the component values. These dimensions must be consistent with the corresponding observation width (in the case of output distribution parameters) or number of states (in the case of transition matrices).

Although in this example they could be inferred, HTK requires that they are included explicitly since, as will be described shortly, they can be detached from the HMM definition and stored elsewhere as a macro.
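A loader has to verify this kind of consistency. As a hedged sketch, assuming a hypothetical in-memory layout in which each emitting state is a dict holding "mean" and "variance" lists, the Python check below compares every vector against <VecSize> and the transition matrix against <NumStates>. It also reports any emitting state that was never defined.

```python
def check_dimensions(vec_size, n_states, states, transp):
    """Return a list of dimension problems (empty if consistent).

    states: dict mapping state index j -> {"mean": [...], "variance": [...]},
    a hypothetical in-memory layout used only for this sketch.
    """
    problems = []
    for j in sorted(states):
        if not 2 <= j <= n_states - 1:
            problems.append(f"state {j} is not an emitting state")
        for key in ("mean", "variance"):
            size = len(states[j][key])
            if size != vec_size:
                problems.append(f"state {j} {key} has {size} values, expected {vec_size}")
    # Every emitting state 2 .. N-1 must have been defined somewhere.
    for j in sorted(set(range(2, n_states)) - set(states)):
        problems.append(f"state {j} missing")
    if len(transp) != n_states or any(len(row) != n_states for row in transp):
        problems.append(f"transition matrix is not {n_states} x {n_states}")
    return problems
```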

The definition for hmm1 makes use of many defaults. In particular, there is no definition for the number of input data streams or for the number of mixture components per output distribution. Hence, in both cases, a default of 1 is assumed.

Fig. 7.3 shows a HMM definition in which the emitting states are 2-component Gaussian mixtures.

The number of mixture components in each state $j$ is indicated by the keyword <NumMixes> and each mixture component is prefixed by the keyword <Mixture> followed by the component index $m$ and component weight $c_{jm}$. Note that there is no requirement for the number of mixture components to be the same in each distribution.
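For a mixture state, the output probability is the weighted sum of its component Gaussians, $b_j(o) = \sum_m c_{jm} \mathcal{N}(o; \mu_{jm}, \Sigma_{jm})$. The Python sketch below evaluates it for diagonal covariances using the parameters of state 2 of hmm2. The function names are hypothetical, and a practical implementation would normally work with log probabilities to avoid numerical underflow.

```python
import math


def diag_gaussian(o, mean, var):
    """N(o; mean, diag(var)): Gaussian density with a diagonal covariance."""
    log_det = sum(math.log(v) for v in var)
    maha = sum((x - m) ** 2 / v for x, m, v in zip(o, mean, var))
    return math.exp(-0.5 * (len(o) * math.log(2 * math.pi) + log_det + maha))


def mixture_prob(o, mixtures):
    """b_j(o) = sum over m of c_jm * N(o; mu_jm, Sigma_jm).

    mixtures: list of (weight, mean, variance) triples for one state.
    """
    return sum(c * diag_gaussian(o, mu, var) for c, mu, var in mixtures)


# State 2 of hmm2 (Fig. 7.3): two components with weights 0.4 and 0.6
state2 = [
    (0.4, [0.3, 0.2, 0.2, 1.0], [1.0, 1.0, 1.0, 1.0]),
    (0.6, [0.1, 0.0, 0.0, 0.8], [1.0, 1.0, 1.0, 1.0]),
]
print(mixture_prob([0.2, 0.1, 0.1, 0.9], state2))
```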

State definitions and the mixture components within them may be listed in any order. When a HMM definition is loaded, a check is made that all the required components have been defined.

In addition, checks are made that the mixture component weights and each row of the transition matrix sum to one. If very rapid loading is required, this consistency checking can be inhibited by setting the Boolean configuration variable CHKHMMDEFS to false.
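The sum-to-one checks can be sketched in a few lines of Python. This minimal version assumes that the final row of the transition matrix, which belongs to the non-emitting exit state and is all zeros in Figs. 7.2 and 7.3, is excluded from the check; that treatment of the last row is an assumption here, not something the text states. The function name and the tolerance tol are hypothetical.

```python
def check_stochastic(transp, mixture_weights, tol=1e-6):
    """Return messages for any row or weight set that does not sum to one.

    transp: N x N transition matrix; its last row (the non-emitting exit
    state) is skipped, an assumption based on the all-zero final rows of
    Figs. 7.2 and 7.3.
    mixture_weights: dict mapping state index j -> list of weights c_jm.
    """
    errors = []
    for i, row in enumerate(transp[:-1], start=1):
        if abs(sum(row) - 1.0) > tol:
            errors.append(f"row {i} of <TransP> sums to {sum(row):g}")
    for j in sorted(mixture_weights):
        total = sum(mixture_weights[j])
        if abs(total - 1.0) > tol:
            errors.append(f"mixture weights of state {j} sum to {total:g}")
    return errors


# hmm2 from Fig. 7.3 passes both checks
hmm2_transp = [[0.0, 1.0, 0.0, 0.0],
               [0.0, 0.5, 0.5, 0.0],
               [0.0, 0.0, 0.6, 0.4],
               [0.0, 0.0, 0.0, 0.0]]
print(check_stochastic(hmm2_transp, {2: [0.4, 0.6], 3: [0.7, 0.3]}))
```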

As an alternative to diagonal variance vectors, a Gaussian distribution can have a full rank covariance matrix. An example of this is the definition for hmm3 shown in Fig. 7.4. Since covariance matrices are symmetric, they are stored in upper triangular form, i.e. each row of the matrix starts at the diagonal element². Also, covariance matrices are stored in their inverse form, i.e. HMM definitions contain $\Sigma^{-1}$ rather than $\Sigma$. To reflect this, the keyword chosen to introduce a full covariance matrix is <InvCovar>.
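Reading an <InvCovar> block therefore means expanding the stored upper triangle back into a full symmetric matrix. A minimal Python sketch follows; the function names are hypothetical, and the 3x3 values in the usage lines are made up purely to show the layout (they are not taken from Fig. 7.4). Because the inverse is stored, the quadratic term $(o - \mu)^{\top} \Sigma^{-1} (o - \mu)$ of the Gaussian exponent can be evaluated without any matrix inversion, which is one plausible motivation for this storage choice.

```python
def unpack_upper(values, dim):
    """Rebuild a full symmetric matrix from packed upper-triangular rows,
    where row i lists elements (i, i), (i, i+1), ..., (i, dim-1)."""
    if len(values) != dim * (dim + 1) // 2:
        raise ValueError(f"expected {dim * (dim + 1) // 2} values for dim={dim}")
    m = [[0.0] * dim for _ in range(dim)]
    k = 0
    for i in range(dim):
        for j in range(i, dim):
            m[i][j] = m[j][i] = values[k]  # mirror into the lower triangle
            k += 1
    return m


def mahalanobis_sq(o, mean, inv_cov):
    """(o - mean)^T Sigma^{-1} (o - mean), using the stored inverse directly."""
    d = [x - mu for x, mu in zip(o, mean)]
    n = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))


# Made-up 3x3 example with rows "2 1 0", "2 1", "2"
inv_cov = unpack_upper([2.0, 1.0, 0.0, 2.0, 1.0, 2.0], 3)
print(inv_cov)
print(mahalanobis_sq([1.0, 1.0, 0.0], [0.0, 0.0, 0.0], inv_cov))
```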

² Covariance matrices are actually stored internally in lower triangular form.


~h "hmm2"
<BeginHMM>
  <VecSize> 4 <MFCC>
  <NumStates> 4
  <State> 2 <NumMixes> 2
    <Mixture> 1 0.4
      <Mean> 4
        0.3 0.2 0.2 1.0
      <Variance> 4
        1.0 1.0 1.0 1.0
    <Mixture> 2 0.6
      <Mean> 4
        0.1 0.0 0.0 0.8
      <Variance> 4
        1.0 1.0 1.0 1.0
  <State> 3 <NumMixes> 2
    <Mixture> 1 0.7
      <Mean> 4
        0.1 0.2 0.6 1.4
      <Variance> 4
        1.0 1.0 1.0 1.0
    <Mixture> 2 0.3
      <Mean> 4
        2.1 0.0 1.0 1.8
      <Variance> 4
        1.0 1.0 1.0 1.0
  <TransP> 4
    0.0 1.0 0.0 0.0
    0.0 0.5 0.5 0.0
    0.0 0.0 0.6 0.4
    0.0 0.0 0.0 0.0
<EndHMM>

Fig. 7.3 Simple Mixture Gaussian HMM

Notice that in hmm3 only the second emitting state has a full covariance Gaussian component. The first emitting state has a mixture of two diagonal variance Gaussian components. Again, this illustrates the flexibility of HMM definition in HTK: if required, the structure of every Gaussian can be individually configured.

Another possible way to store covariance information is in the form of the Choleski decomposition $L$ of the inverse covariance matrix, i.e. $\Sigma^{-1} = L L^{\top}$. Again, this is stored externally in upper triangular form, so $L^{\top}$ is actually stored. It is distinguished from the normal inverse covariance matrix by using the keyword <LLTCovar> in place of <InvCovar>³.
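Given the stored factor $U = L^{\top}$, the inverse covariance is recovered as $\Sigma^{-1} = L L^{\top} = U^{\top} U$, and the quadratic term of the Gaussian exponent becomes $d^{\top} \Sigma^{-1} d = \lVert U d \rVert^2$ with $d = o - \mu$. The Python sketch below assumes the packed values use the same row layout as <InvCovar> (each row starting at its diagonal element); that layout detail and all function names are assumptions for illustration.

```python
def unpack_upper_triangular(values, dim):
    """Place packed rows (each starting at its diagonal) into a dim x dim
    upper-triangular matrix U; entries below the diagonal are zero."""
    if len(values) != dim * (dim + 1) // 2:
        raise ValueError(f"expected {dim * (dim + 1) // 2} values for dim={dim}")
    u = [[0.0] * dim for _ in range(dim)]
    k = 0
    for i in range(dim):
        for j in range(i, dim):
            u[i][j] = values[k]
            k += 1
    return u


def inverse_covariance(u):
    """Sigma^{-1} = L L^T = U^T U, where U = L^T is the stored factor."""
    n = len(u)
    return [[sum(u[k][i] * u[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def quad_form(u, d):
    """d^T Sigma^{-1} d = ||U d||^2, computed without forming Sigma^{-1}."""
    n = len(d)
    ud = [sum(u[i][j] * d[j] for j in range(i, n)) for i in range(n)]
    return sum(x * x for x in ud)


u = unpack_upper_triangular([2.0, 1.0, 3.0], 2)  # made-up U = [[2, 1], [0, 3]]
print(inverse_covariance(u))
print(quad_form(u, [1.0, 1.0]))
```

Working with $U$ directly means each evaluation needs only one triangular matrix-vector product, which is one reasonable argument for this format.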

The definition for hmm3 also illustrates another macro type, ~o. This macro is used as an alternative way of specifying global options and, in fact, it is the format used by HTK tools when they write out a HMM definition. It is provided so that global options can be specified ahead of any other HMM parameters. As will be seen later, this is useful when using many types of macro.
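Since the global options are the same for every model in a system, a writer can hoist them into a single ~o macro placed ahead of the ~h definitions. The Python sketch below does this for a list of models and refuses to proceed if they disagree. The tuple layout and the function name are hypothetical, and it assumes, for simplicity, that <VecSize> and the parameter kind are the only global options involved; real HTK output may carry further options.

```python
def write_model_set(models):
    """Write one ~o macro followed by one ~h definition per model.

    models: list of (name, vec_size, kind, body_lines) tuples, where body_lines
    are the definition lines after the global options (from <NumStates> on).
    """
    options = {(vec_size, kind) for _, vec_size, kind, _ in models}
    if len(options) != 1:
        raise ValueError(f"models must share one set of global options, got {sorted(options)}")
    vec_size, kind = options.pop()
    out = [f"~o <VecSize> {vec_size} <{kind}>"]
    for name, _, _, body in models:
        out += [f'~h "{name}"', "<BeginHMM>"]
        out += ["  " + line for line in body]  # indent the model body
        out.append("<EndHMM>")
    return "\n".join(out)


# Placeholder bodies: only <NumStates> is shown for brevity
print(write_model_set([("hmmA", 4, "MFCC", ["<NumStates> 3"]),
                       ("hmmB", 4, "MFCC", ["<NumStates> 5"])]))
```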

As noted earlier, the observation vectors used to represent the speech signal can be divided into two or more statistically independent data streams. This corresponds to the splitting-up of the input speech vectors as described in section 5.10. In HMM definitions, the use of multiple data streams must be indicated by specifying the number of streams and the width (i.e. dimension) of each stream as a global option. This is done using the keyword <StreamInfo> followed by the number of streams, and then a sequence of numbers indicating the width of each stream. The sum of these stream widths must equal the original vector size as indicated by the <VecSize> keyword.
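The <StreamInfo> constraint can be illustrated with a short Python sketch. It assumes, for simplicity, that each stream is a contiguous slice of the observation vector taken in order; the actual rules for how HTK partitions a parameter vector into streams are those of section 5.10, and the function names here are hypothetical.

```python
def stream_info(widths, vec_size):
    """Build the <StreamInfo> option text: number of streams, then each width.
    The widths must add up to the <VecSize> value."""
    if sum(widths) != vec_size:
        raise ValueError(f"stream widths sum to {sum(widths)}, but <VecSize> is {vec_size}")
    return "<StreamInfo> " + " ".join(str(n) for n in [len(widths), *widths])


def split_streams(vector, widths):
    """Cut one observation vector into consecutive per-stream sub-vectors
    (assumes streams are contiguous slices taken in order)."""
    if sum(widths) != len(vector):
        raise ValueError("stream widths do not cover the whole vector")
    parts, start = [], 0
    for w in widths:
        parts.append(vector[start:start + w])
        start += w
    return parts


print(stream_info([2, 2], 4))
print(split_streams([0.2, 0.1, 0.1, 0.9], [2, 2]))
```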

³ The Choleski storage format is not used by default in HTK Version 2.

Source: The HTK Book (pp. 98-101)