
Chapter 3 Content-based Classification

3.1 Type Initiating

We classify video content into four basic types: simple texture with slow motion (SS), simple texture with high motion (SH), complex texture with slow motion (CS), and complex texture with high motion (CH). For simple texture with slow motion, both spatial scalable coding and temporal scalable coding can be used. For simple texture with high motion, only spatial scalable coding is suitable, because high motion degrades the recovered quality when temporal scalable coding is used, as discussed in section 2.1. For complex texture with slow motion, temporal scalable coding is a better choice than spatial scalable coding, which may lose much of the texture detail, as discussed in section 2.2.

For complex texture with high motion, neither temporal nor spatial scalable coding is appropriate, so quality scalable coding is applied. Let C( fi ) denote the classification of frame i and S( fi ) denote the scalable coding type for frame i. We then have the following rule:

If C( fi ) == SS then S( fi ) = SSC or TSC

If C( fi ) == SH then S( fi ) = SSC

If C( fi ) == CS then S( fi ) = TSC

If C( fi ) == CH then S( fi ) = QSC (Rule.3.1.1)

where SSC represents spatial scalable coding along with FGS, TSC represents temporal scalable coding along with FGS, and QSC represents quality scalable coding, which is FGS here.
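Rule.3.1.1 can be read as a lookup from content class to the permitted scalable coding types. The following is a minimal Python sketch of that reading; the names `SCALABLE_CODING_BY_CLASS` and `scalable_coding_types` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Rule.3.1.1 as a lookup table: content class -> permitted scalable coding types.
# SSC = spatial + FGS, TSC = temporal + FGS, QSC = quality (FGS only).
SCALABLE_CODING_BY_CLASS: Dict[str, Tuple[str, ...]] = {
    "SS": ("SSC", "TSC"),  # simple texture, slow motion: either type may be used
    "SH": ("SSC",),        # simple texture, high motion
    "CS": ("TSC",),        # complex texture, slow motion
    "CH": ("QSC",),        # complex texture, high motion
}


def scalable_coding_types(content_class: str) -> Tuple[str, ...]:
    """Return the coding types Rule.3.1.1 permits for a content class."""
    return SCALABLE_CODING_BY_CLASS[content_class]
```

Only the SS class maps to more than one coding type, which is why frames of that class need a further decision step.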

Given this classification, the key issue is how to distinguish the kinds of texture and the degree of motion.

3.1.1 Simple texture vs. Complex texture

In the type initiating stage, the texture of each picture is examined first as the picture enters the encoder. The examination first decimates the input picture and then calculates the difference between the scaled-up version of the decimated picture and the original one. The procedure is shown in Appendix A.

The difference is calculated as

where W represents the width of frame i and H represents its height, I_hw denotes the pixel at position (h, w) of the original frame i, and I'_hw denotes the pixel at the same position in the scaled-up version of the decimated frame i. Let γ be the complexity threshold of a frame:

If Diff ≥ γ then T( fi ) = C, otherwise T( fi ) = S

where T( fi ) denotes the texture of frame i, C represents complex texture, and S represents simple texture. A frame with Diff greater than or equal to γ is classified as complex texture; otherwise it is classified as simple texture. In the experiments, the threshold γ is set to 150, which is a suitable boundary for distinguishing the two types of texture.
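The texture examination can be sketched in Python as follows. Several details here are assumptions rather than the procedure of Appendix A: the decimation factor of 2, scaling up by sample replication, and the use of a mean squared difference as the Diff measure. All function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # luma samples indexed as frame[h][w]

GAMMA = 150  # complexity threshold reported in the text


def decimate(frame: Frame, factor: int = 2) -> Frame:
    """Keep every `factor`-th sample in both directions (assumed decimation)."""
    return [row[::factor] for row in frame[::factor]]


def scale_up(small: Frame, height: int, width: int, factor: int = 2) -> Frame:
    """Scale back to height x width by sample replication (assumed method)."""
    return [[small[h // factor][w // factor] for w in range(width)]
            for h in range(height)]


def texture_difference(frame: Frame, factor: int = 2) -> float:
    """Difference between a frame and its decimated-then-scaled-up copy.

    A mean squared difference over the W x H samples is ASSUMED here.
    """
    height, width = len(frame), len(frame[0])
    restored = scale_up(decimate(frame, factor), height, width, factor)
    total = sum((frame[h][w] - restored[h][w]) ** 2
                for h in range(height) for w in range(width))
    return total / (width * height)


def classify_texture(frame: Frame, gamma: float = GAMMA) -> str:
    """Return "C" (complex) if Diff >= gamma, otherwise "S" (simple)."""
    return "C" if texture_difference(frame) >= gamma else "S"
```

A flat frame survives decimation unchanged and is classified as simple, whereas a fine checkerboard loses its detail and is classified as complex.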

3.1.2 Slow motion vs. High motion

After the texture examination, the degree of motion in each picture is measured. Motion estimation (ME) and motion compensation (MC) are used for this purpose. Each input picture is treated as a B-type frame, so the MC residual is calculated in both the past and the future direction. Let Dp( fi ) denote the residual when the past frame is used as the reference frame for frame fi, and Df( fi ) the residual when the future frame is used. A picture is taken as a high motion picture under any of the following three conditions (the threshold value, β, is set to 280.368 in the experiment):

1. Dp( fi ) >β and Df( fi ) >β.

2. Dp( fi ) >β, Df( fi ) <β, and frame fi is a recovered-by-past MV picture.

3. Df( fi ) >β, Dp( fi ) <β, and frame fi is a recovered-by-future MV picture.
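The three conditions above can be sketched as a single predicate. This is a minimal illustration, not the encoder's implementation: the function name `is_high_motion` and the string labels `"past"` and `"future"` for the recovery direction are hypothetical.

```python
BETA = 280.368  # motion threshold reported in the text


def is_high_motion(dp: float, df: float, recovery: str,
                   beta: float = BETA) -> bool:
    """Apply the three high-motion conditions to one picture.

    dp, df   -- MC residuals with the past / future frame as reference
    recovery -- "past" for a recovered-by-past MV picture,
                "future" for a recovered-by-future MV picture
    """
    # Condition 1: the residuals in both directions exceed the threshold.
    if dp > beta and df > beta:
        return True
    # Condition 2: only the past residual is large, and a lost picture
    # would be recovered from a past MV.
    if dp > beta and df < beta and recovery == "past":
        return True
    # Condition 3: only the future residual is large, and a lost picture
    # would be recovered from a future MV.
    if df > beta and dp < beta and recovery == "future":
        return True
    return False
```

The comparisons are strict, as in the conditions, so a residual exactly equal to β does not satisfy condition 2 or 3.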

The first condition means that if the residuals in both directions are large, the picture is a high motion picture. In the temporal coding hierarchy, each B frame has two MVs: one (called the past MV) uses a reference frame earlier than it, and the other (called the future MV) uses a reference frame after it. Recovered-by-past MV in condition 2 means that, when frame fi is lost, its past MV and future MV are both predicted from a past MV of another frame, while recovered-by-future MV in condition 3 means that the past MV and future MV of fi are both predicted from a future MV of another frame. Fig. 14 shows a temporal scalable coding hierarchy, where the second picture is a recovered-by-past MV picture because, once it is lost, its past MV and future MV both come from the past MV of the third picture. The fourth picture, on the other hand, is a recovered-by-future MV picture.

Fig. 14 The hierarchical coding structure marked with some recovery directions, with GOP size = 8

According to Rule.2.5.1, the MVs of the lost second picture in Fig. 14 can be obtained as follows:

MV( 2,j,p ) = MV( 3,j,p )/2

MV( 2,j,f ) = -MV( 3,j,p )/2
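As a worked illustration of these two equations, the sketch below derives both MVs of a lost block in picture 2 from the past MV of the corresponding block in picture 3. Representing an MV as an (x, y) pair and the name `recover_from_past_mv` are assumptions made for the example.

```python
from typing import Tuple

Vector = Tuple[float, float]  # (x, y) displacement; assumed representation


def recover_from_past_mv(neighbor_past_mv: Vector) -> Tuple[Vector, Vector]:
    """Derive the past and future MVs of a lost block from MV( 3,j,p ).

    Returns (MV( 2,j,p ), MV( 2,j,f )):
      past MV   = half of the neighbor's past MV
      future MV = the negated half of the neighbor's past MV
    """
    x, y = neighbor_past_mv
    past_mv = (x / 2, y / 2)
    future_mv = (-x / 2, -y / 2)
    return past_mv, future_mv
```

For example, if MV( 3,j,p ) is (8, -4), the recovered past MV is (4, -2) and the recovered future MV is (-4, 2).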

For every picture in the hierarchical coding structure shown in Fig. 15, we use Å to indicate that it is a recovered-by-past MV picture and Æ to indicate that it is a recovered-by-future MV picture.

Fig. 15 The recovery direction of each picture in a hierarchical coding structure with GOP size = 8

The idea behind conditions 2 and 3 is to handle the case in which something appears in or disappears from a video sequence. As an example, in Fig. 16 something appears at the top of (b) and (c) that is absent from (a). This means that Dp(frame(b)) and Dp(frame(c)) would be large, and the past MVs of frames (b) and (c) would not be reliable, because the new content cannot be found in reference frame (a). Therefore, using the past MV of frame (c) to predict both MVs of frame (b), if (b) is lost, would not be a good idea.

(a) (b) (c)

Fig. 16 Successive frames illustrating the high motion condition: (a) in the same position as picture 1 in Fig. 14, (b) in the same position as picture 2 in Fig. 14, (c) in the same position as picture 3 in Fig. 14

After the texture and motion examinations described above, the final step in the type initiating stage is to mark pictures with different types according to Rule.3.1.1. If a picture has both simple texture and slow motion, it is marked as an indecisive frame, because in this situation both spatial and temporal scalable coding can be applied. The decision on the scalable coding type for these pictures relies on the distortion measurement algorithm discussed in section 3.4.
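The marking step can be sketched as combining the two examination results into one of the four class labels and flagging the SS case as indecisive. The function name `mark_picture` and the tuple it returns are hypothetical choices for this illustration.

```python
from typing import Tuple


def mark_picture(texture: str, high_motion: bool) -> Tuple[str, bool]:
    """Combine the texture and motion results for one picture.

    texture     -- "S" (simple) or "C" (complex)
    high_motion -- True for a high motion picture, False for slow motion

    Returns (content_class, indecisive), where content_class is one of
    "SS", "SH", "CS", "CH" and indecisive is True only for "SS".
    """
    if texture not in ("S", "C"):
        raise ValueError("texture must be 'S' or 'C'")
    content_class = texture + ("H" if high_motion else "S")
    # Simple texture with slow motion allows both SSC and TSC, so the
    # final choice is deferred to a later stage.
    indecisive = content_class == "SS"
    return content_class, indecisive
```

Only pictures flagged as indecisive need the later distortion measurement; the other three classes each have a single coding type under Rule.3.1.1.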
