
The most difficult problem of HMMs is to determine a method to adjust the model parameters λ = (A, B, π) to maximize the probability of the observation sequence given the model. Given any finite observation sequence as training data, there is no analytical method for finding the globally optimal model parameters. However, we can use an iterative procedure such as the Segmental K-means algorithm [13] or the Baum-Welch algorithm [18] to maximize P(O, I*|λ) (where I* is the optimal state sequence) or P(O|λ). In the Segmental K-means algorithm, the parameters of the model λ = (A, B, π) are adjusted to maximize P(O, I*|λ), where I* is the optimal state sequence as given by the Viterbi algorithm [14]. In Baum-Welch re-estimation, the parameters of the model λ = (A, B, π) are adjusted so as to increase P(O|λ) until a maximum value is reached.

As seen before, calculating P(O|λ) involves summing P(O, Q|λ) over all possible state sequences Q (Q = q1 q2 q3 ... qT). Hence the Baum-Welch algorithm does not focus on a particular state sequence. The two methods are described in turn below.
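The brute-force sum over all N^T state sequences is intractable for realistic T; the forward algorithm computes the same probability in O(N²T) time. A minimal sketch for the discrete-observation case (variable names are illustrative, not taken from the thesis):

```python
import numpy as np

def forward_prob(A, B, pi, obs):
    """P(O | lambda) via the forward algorithm.
    A: (N, N) transition matrix, B: (N, M) emission matrix,
    pi: (N,) initial distribution, obs: list of symbol indices."""
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step
    return alpha.sum()                 # P(O | lambda) = sum_i alpha_T(i)
```

For short sequences the result can be checked directly against the sum of P(O, Q|λ) over every state sequence Q.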

The Segmental K-means algorithm takes us from λk to λk+1 (iteration k to k+1) such that P(O, I*k|λk) ≤ P(O, I*k+1|λk+1), where I*k is the optimum state sequence for O = o1, o2, ..., oT and λk, found according to the Viterbi algorithm. The criterion of optimization is called the maximum state optimized likelihood criterion, and the function P(O, I*) = maxI P(O, I) is called the state-optimized likelihood function. To train the model with the Segmental K-means algorithm, a number of (training) observation sequences are required. Let there be w sequences available. Each sequence consists of T observations, and each observation symbol oi is assumed to be a vector of dimension D (D ≥ 1). The algorithm then consists of the following steps:

1. Randomly choose N observation symbols (mapping each vector of dimension D to a symbol by a rule table) and assign each of the wT observation symbols to the one of these N symbols from which its Euclidean distance is minimal. We have thus formed N clusters, each of which is called a state (1 to N). Alternatively, we can divide the training data into N groups and pick one observation vector from each group. Either way, the purpose is simply to make the initial choice of states as widely distributed as possible.

2. Calculate the initial probabilities and the transition probabilities, where i and j represent the current and next state index and t represents time from 1 to T−1:

3. Calculate the mean vector and the covariance matrix for each state. For 1 ≤ i ≤ N, i and j represent the current and next state index, and t represents time from 1 to T:

4. Calculate the symbol probability distribution for each training vector in each state (a Gaussian distribution is assumed; change the formula below to whatever probability distribution suits the problem). For 1 ≤ i ≤ N, i represents the state index and t represents time from 1 to T:

bi(ot) = (2π)^(−D/2) |Σi|^(−1/2) exp[ −(1/2)(ot − μi)ᵀ Σi⁻¹ (ot − μi) ]

where μi and Σi are the mean vector and covariance matrix of state i computed in step 3.

5. Find the optimal state sequence I* (as given by the Viterbi algorithm) for each training sequence using the λ = (A, B, π) computed in steps 2 to 4 above, where A, B, and π are the re-estimated state transition, output symbol, and initial state probabilities, respectively. Each observation symbol is reassigned a state if its original assignment differs from the corresponding estimated optimum state.

6. If any observation symbol is reassigned a new state in step 5, use the new assignment and repeat steps 2 through 6; otherwise, stop.
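The loop in steps 2–6 above can be sketched as follows for a discrete-symbol variant (the Gaussian estimation of steps 3–4 is replaced here by simple symbol counting to keep the example short; all names are illustrative, not the thesis implementation):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most likely state sequence for one observation sequence (step 5)."""
    N, T = A.shape[0], len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)   # scores[i, j]: best path ending i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # backtrack
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

def segmental_kmeans(obs_seqs, states, N, M, max_iter=100):
    """obs_seqs: list of symbol sequences; states: initial state assignment
    per sequence (step 1). Repeats steps 2-6 until assignments stabilize."""
    eps = 1e-12  # smoothing to avoid zero probabilities
    for _ in range(max_iter):
        pi = np.full(N, eps); A = np.full((N, N), eps); B = np.full((N, M), eps)
        for obs, q in zip(obs_seqs, states):  # steps 2-4: count-based estimates
            pi[q[0]] += 1
            for t in range(len(obs) - 1):
                A[q[t], q[t + 1]] += 1
            for t, o in enumerate(obs):
                B[q[t], o] += 1
        pi /= pi.sum(); A /= A.sum(1, keepdims=True); B /= B.sum(1, keepdims=True)
        new_states = [viterbi(A, B, pi, obs) for obs in obs_seqs]  # step 5
        if new_states == states:              # step 6: no reassignment, stop
            break
        states = new_states
    return A, B, pi
```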

It can be shown [15] that the Segmental K-means algorithm converges to a local maximum of the state-optimized likelihood function for a wide range of observation density functions, including the Gaussian density function.

The second method is the Baum-Welch algorithm, which assumes that an initial model can be improved by the re-estimation formulas in Eq. (30)-(32). An initial HMM can be constructed in any way, such as by random generation, but we may use the first five steps of the Segmental K-means algorithm described above to obtain a reasonable initial estimate of the HMM and then use the Baum-Welch algorithm to re-estimate it. Before presenting Eq. (30)-(32), some concepts and notation required by them must be introduced.

The forward-backward variable γt(i) is defined in Eq. (24):

γt(i) = P(st = qi | O, λ) = αt(i)βt(i) / Σj=1..N αt(j)βt(j)    (24)

Eq. (24) describes the probability of being in state qi at time t. To describe the re-estimation procedure (iterative update and improvement) of the HMM parameters, the variable εt(i, j) is defined in Eq. (25); it describes the probability of being in state qi at time t and in state qj at time t+1.

εt(i, j) = P(st = qi, st+1 = qj | O, λ)    (25)

The sequence of events leading to the conditions required by Eq. (25) is illustrated in Fig. 3-5. It should be clear, from the definitions of the forward variable αt(i) and backward variable βt(i), that we can rewrite Eq. (25) in the following form, in which dividing by P(O|λ) gives the desired probability measure:

εt(i, j) = αt(i) aij bj(ot+1) βt+1(j) / P(O|λ)

Fig. 3-5 Illustration of the sequence of operations required for the computation of the joint event that the system is in state qi at time t and state qj at time t+1.

If we sum the forward-backward variable γt(i) from t = 1 to T at each state i, we get a quantity that can be viewed as the expected number of times state qi is visited; if we sum only to T−1, we get the expected number of transitions out of state qi (as no transition is made at t = T). Similarly, if εt(i, j) is summed from t = 1 to T−1, we get the expected number of transitions from state qi to state qj. Hence

Σt=1..T γt(i) = expected number of times state qi is visited,
Σt=1..T−1 γt(i) = expected number of transitions out of state qi,
Σt=1..T−1 εt(i, j) = expected number of transitions from state qi to state qj.
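Assuming the forward and backward variables αt(i) and βt(i) have already been computed as (T, N) arrays, γt(i) and εt(i, j) of Eq. (24)-(25) can be obtained as follows (a sketch with illustrative names, not the thesis implementation):

```python
import numpy as np

def gamma_xi(A, B, alpha, beta, obs):
    """gamma[t, i] = P(s_t = q_i | O, lambda)           (Eq. 24)
    xi[t, i, j]    = P(s_t = q_i, s_t+1 = q_j | O, lambda) (Eq. 25)."""
    p_obs = alpha[-1].sum()                  # P(O | lambda)
    gamma = alpha * beta / p_obs             # Eq. (24)
    # Eq. (25): alpha_t(i) a_ij b_j(o_t+1) beta_t+1(j) / P(O | lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / p_obs
    return gamma, xi
```

Each γt sums to 1 over states, and summing εt(i, j) over j recovers γt(i), matching the expected-count interpretations above.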

Using the above quantities, we can give a method for re-estimating the parameters of an HMM. A set of reasonable re-estimation formulas for A, B, and π is

π̄i = expected frequency of being in state qi at time t = 1 = γ1(i)    (30)

āij = Σt=1..T−1 εt(i, j) / Σt=1..T−1 γt(i)    (31)

b̄j(k) = Σt: ot=vk γt(j) / Σt=1..T γt(j)    (32)

where the index i represents state i. If we call the re-estimated model λ̄ = (Ā, B̄, π̄), it can be shown that either (1) the initial model λ defines a critical point of the likelihood function, in which case λ̄ = λ, or (2) model λ̄ is more likely than model λ in the sense that P(O|λ̄) > P(O|λ), i.e., a new model λ̄ has been found from which the observation sequence is more likely to have been produced. We can therefore improve the probability of O being observed from the model by repeating the above procedure, replacing λ with λ̄ each time, until some limiting point is reached. The final result of this re-estimation is called a maximum likelihood estimate of the HMM.
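Given γ and ε for one observation sequence, a single re-estimation pass in the spirit of Eq. (30)-(32) might look like this (discrete-symbol case; a hedged sketch with illustrative names):

```python
import numpy as np

def baum_welch_update(gamma, xi, obs, M):
    """One re-estimation pass; returns (A_bar, B_bar, pi_bar).
    gamma: (T, N), xi: (T-1, N, N), obs: symbol indices, M: alphabet size."""
    pi_bar = gamma[0]                                          # Eq. (30)
    A_bar = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # Eq. (31)
    B_bar = np.zeros((gamma.shape[1], M))
    for t, o in enumerate(obs):                                # Eq. (32): numerator
        B_bar[:, o] += gamma[t]
    B_bar /= gamma.sum(axis=0)[:, None]                        # Eq. (32): denominator
    return A_bar, B_bar, pi_bar
```

By construction each row of Ā and B̄, and π̄ itself, sums to 1, so the update always yields a valid model.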

Chapter 4

Proposed Scheme for Event Classification

4.1 Overview of Proposed Scheme

An overview of the proposed semantic baseball event classification is depicted in Fig. 4-1(a) and Fig. 4-1(b). The process can roughly be divided into two steps: a training step and a classification step. In the training step, indexed baseball events of each type listed in Table 4-2 are input as training data for the corresponding highlight classifier. In the classification step, when the observation symbol sequence of an unknown clip is input, each highlight classifier evaluates how well its model predicts the given observation sequence.
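The classification step thus amounts to scoring the unknown clip's observation symbol sequence under every trained event HMM and choosing the most likely one. A minimal sketch using a scaled forward pass (the event names and function names are illustrative assumptions, not the thesis implementation):

```python
import numpy as np

def log_forward(A, B, pi, obs):
    """Scaled forward pass; returns log P(O | lambda) without underflow."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum()); alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        log_p += np.log(s); alpha /= s   # accumulate the scaling factors
    return log_p

def classify_event(models, obs):
    """models: dict mapping event name -> (A, B, pi).
    Returns the event whose HMM best predicts the observation sequence."""
    return max(models, key=lambda name: log_forward(*models[name], obs))
```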

1. Training step

Fig. 4-1(a) Overview of the training step in the proposed baseball event classification (several indexed baseball clips of each type → color conversion → object detection, frame by frame, until the last key frame → HMM training)

2. Event classification step

Fig. 4-1(b) Overview of the classification step in the proposed baseball event classification (unknown clip → color conversion → object detection until the last key frame)

Each highlight clip given as input starts with a PC shot and ends with a close-up shot or a specific shot, depending on the baseball event type. There are

相關文件