Experiments - Experiment and Results - Content-Based Scalable Coding Classification Using a Variable GOP Size

Chapter 4 Experiment and Results

4.2 Experiments

We tested the proposed method on many video samples. Because few test samples contain varied video content, we chose some representative samples to show the experimental results; most of the other samples are classified as pure temporal scalable coding with FGS or pure spatial scalable coding with FGS, because the content is consistent within each sample.

The following experiments compare three types of scalable coding techniques with the proposed classification method. Spatial+FGS denotes spatial scalability with FGS, temporal+FGS denotes temporal scalability with FGS, mix denotes the combination of temporal and spatial scalability with FGS, and CBC denotes the proposed method. In each experiment, all methods except CBC encode the video sequence with a fixed GOP, whereas CBC, as described earlier, uses an adaptive GOP.

4.2.1 Exp. 1

Experiment 1 uses a YUV test sample, coastguard, with 162 frames.

Spatial scalability is achieved using two resolution formats, CIF (352 x 288) and QCIF (176 x 144). The test bit rate is set to 618 kbps.

Table 2 shows, for each GOP, the initial type after content type initiating, the adjusted type after group adjusting, and the decided type after distortion measurement.

GOP | Content type initiating | Group adjusting | Distortion measurement
GOP9 | I t t t t t t t t t t ?????? | (I t t)(t t t t t t t t)(???)(???) | (I t t)(t t t t t t t t)(t t t)(t t t)
GOP10 | I?????? t t ?? t t t t t t | (I??)(????)(t t)(??)(t t t)(t t t) | (I t t)(t t t t)(t t)(t t)(t t t)(t t t)

Table 2 Procedure of type deciding for each GOP

The experimental result is shown in Fig. 21.

Fig. 21 Experimental result of coastguard

This video sequence is basically a slow-motion video, except during frames 48~96, where the camera moves drastically upward. These frames are therefore not suitable for TSC, which leads to an obvious decrease in quality. In most cases, the proposed method (cbsvc) yields quality better than or equal to that of the best standard coding for the current GOP.

4.2.2 Exp. 2

Experiment 2 uses a YUV test sample, stefan, with 81 frames.

Spatial scalability is achieved using two resolution formats, CIF (352 x 288) and QCIF (176 x 144). The test bit rate is set to 718 kbps.

The scalable type assignment for each step of the proposed method is shown in Table 3 below.

GOP | Content type initiating | Group adjusting | Distortion measurement
GOP1 | I t t t t t q q q q q t t t t t t | (I t t)(t t t)(q q q q q)(t t t)(t t t) | None
GOP2 | I t t t t t t t t t t t t t t t t | None | None
GOP3 | I t t t t t t t t t t t t t t t t | None | None
GOP4 | I t t t t t t t t t t t t t t t t | None | None
GOP5 | I t t t t t t t t t t t t t t t t | None | None

Table 3 Procedure of type deciding for each GOP

The experimental result is shown in Fig. 22.

Fig. 22 Experimental result of Stefan

Stefan is a complex video sequence. During frames 0~16, the video content contains high motion, and thus only QSC is used. The resulting quality is far better than that of the other methods.

4.2.3 Exp. 3

Experiment 3 uses a YUV test sample, dancer, with 81 frames.

Spatial scalability is achieved using two resolution formats, CIF (352 x 288) and QCIF (176 x 144). The test bit rate is set to 350 kbps.

The scalable type assignment for each step of the proposed method is shown in Table 4 below.

GOP | Content type initiating | Group adjusting | Distortion measurement
GOP1 | I???????????????? | None | I t t t t t t t t t t t t t t t t
GOP2 | I???????????????? | None | (I t t t t t t t t)(s s s s s s s s)
GOP3 | I???????????????? | None | I s s s s s s s s s s s s s s s s
GOP4 | I???????????????? | None | I s s s s s s s s s s s s s s s s
GOP5 | I???????????????? | None | (I t t t t t t t t)(s s s s s s s s)

Table 4 Procedure of type deciding for each GOP

The experimental result is shown in Fig. 23.

Fig. 23 Experimental result of dancer

Dancer has simple texture and slow motion, so distortion measurement must be applied to every GOP. The resulting video quality is better than that of the other methods in most cases, and otherwise equal to that of the best standard coding. The result shows that applying distortion measurement is successful.

4.2.4 Exp. 4

Experiment 4 uses a YUV test sample, football, with 81 frames.

Spatial scalability is achieved using two resolution formats, CIF (352 x 288) and QCIF (176 x 144). The test bit rate is set to 570 kbps.

The scalable type assignment for each step of the proposed method is shown in Table 5 below.

GOP | Content type initiating | Group adjusting | Distortion measurement
GOP1 | I????? s s s s s s s s s s ? | (I??)(???)s s s s s s s s s s s | (I t t)(t t t)s s s s s s s s s s s
GOP2 | I? s s ? s s s s s s s s s s ?? | I s s s s s s s s s s s s s s ?? | I s s s s s s s s s s s s s s s s
GOP3 | I???????????????? | None | I t t t t t t t t s s s s s s s s
GOP4 | I???????????????? | None | I s s s s s s s s s s s s s s s s
GOP5 | I???????????????? | None | I s s s s s s s s s s s s s s s s

Table 5 Procedure of type deciding for each GOP

The experimental result is shown in Fig. 24.

Fig. 24 Experimental result of football

Football is a simple-texture video sequence. The type initiating stage correctly determines the type for some high-motion pictures during frames 0~32, so the resulting quality equals that of the best standard coding or exceeds that of the others. For the other GOPs, distortion measurement works well in choosing the most suitable coding type for each frame.

4.2.5 Exp. 5

Experiment 5 uses a real-world test sample, Madagascar, with 209 frames. Madagascar is an animated movie; we use it to examine the effect of applying the proposed method to a real-world video sequence. Spatial scalability is achieved using two resolution formats, 704 x 576 and 352 x 288. The test bit rate is set to 1100 kbps.

The video content of this sample contains the SS and SH types. Because a similar classification procedure has been shown in the previous sections, we show only the experimental result.

The experimental result is shown in Fig. 25.

Fig. 25 Experimental result of Madagascar

Madagascar is a test sample converted from an AVI file into a video sequence. Its content contains many scene changes and exaggerated object movement, including deformation. Fig. 25 shows that the proposed CBC method performs well on a real-world video compared with the standard coding methods.

4.2.6 Complexity analysis

This section compares the complexity of the proposed method with that of the standard coding methods. Complexity is measured as the number of motion estimation and motion compensation (ME and MC) operations, because ME and MC take most of the time in the encoding process. Before discussing the complexity, we first introduce the concept of reusing ME and MC: the ME and MC results used in the type initiating stage to estimate the degree of motion can be reused in the distortion measurement stage if the reference frames are the same.
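The reuse idea can be illustrated with a small cache keyed by the pair (current frame, reference frame). This is only a minimal sketch under stated assumptions: the class name MotionEstimationCache and the stand-in routine toy_estimate are hypothetical and are not part of the encoder used in the experiments.

```python
from typing import Callable, Dict, Tuple


class MotionEstimationCache:
    """Memoizes ME/MC results by (current frame, reference frame) pair."""

    def __init__(self, estimate: Callable[[int, int], float]) -> None:
        self._estimate = estimate  # hypothetical ME/MC routine
        self._results: Dict[Tuple[int, int], float] = {}
        self.calls = 0  # number of ME/MC runs actually performed

    def get(self, cur: int, ref: int) -> float:
        key = (cur, ref)
        if key not in self._results:
            # No stored result for this reference: run ME/MC once.
            self.calls += 1
            self._results[key] = self._estimate(cur, ref)
        return self._results[key]


def toy_estimate(cur: int, ref: int) -> float:
    # Stand-in for a real motion search; returns a fake motion cost.
    return float(abs(cur - ref))


cache = MotionEstimationCache(toy_estimate)
cache.get(1, 0)  # type initiating: frame 1 predicted from frame 0
cache.get(1, 0)  # distortion measurement, same reference: result reused
cache.get(2, 0)  # a different pair: a new ME/MC run is needed
print(cache.calls)
```

Only the second request is served from the cache; a request with a different reference frame always triggers a new ME and MC run, which is the source of the extra cost discussed in this section.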

Assume that, after type initiating and group adjusting of a GOP, the number of pictures marked S is Ns, the number marked T is Nt, the number marked Q is Nq, the number marked ? is N?, and the size of the GOP is N. Sub-GOPs of type S and Q need no further ME and MC, because they do not require distortion measurement. However, for a sub-GOP of type T, all pictures except those at the highest enhancement layer need one extra ME and MC, because their reference frames differ from those used in the type initiating stage. That is, Nt – floor(Nt/2) – 1 pictures require extra ME and MC. Similarly, for an indecisive sub-GOP, N? – floor(N?/2) – 1 pictures require extra ME and MC.
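As a rough illustration of the counts defined above, the following sketch tallies the markers in a type string written as in Tables 2 to 5 and applies the expression n – floor(n/2) – 1. The function names are hypothetical, and treating an empty group as needing zero extra pictures is our assumption, since the expression would otherwise give –1.

```python
from collections import Counter


def tally_types(marks: str) -> Counter:
    """Count S, T, Q and '?' markers; 'I', spaces and brackets are ignored."""
    return Counter(ch.upper() for ch in marks if ch in "stqSTQ?")


def extra_me_pictures(n: int) -> int:
    """Pictures needing one extra ME/MC: n - floor(n/2) - 1.

    Assumption: clamped at 0 so that an empty group contributes nothing.
    """
    return max(n - n // 2 - 1, 0)


# GOP10 of Table 2 after group adjusting.
counts = tally_types("(I??)(????)(t t)(??)(t t t)(t t t)")
nt, n_ind = counts["T"], counts["?"]
print(nt, n_ind, extra_me_pictures(nt), extra_me_pictures(n_ind))
```

For this row, Nt = 8 and N? = 8, so each group has 8 – 4 – 1 = 3 pictures that need an extra ME and MC.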

Considering that bi-prediction is used for each picture in a GOP except the first and the last frame, and letting C be the number of ME and MC operations, we have the following comparison of complexity:

Proposed method: C = (2N+1) + [(Nt – floor(Nt/2) – 1) x 2 – 1] + [(N? – floor(N?/2) – 1) x 2 – 1]

SSC: C = 2N+1

TSC: C = 2N+1

The equation for the proposed method contains three terms. The first term is the number of ME and MC operations during type initiating, and the remaining terms are the numbers of extra ME and MC operations for sub-GOPs marked T or indecisive. The worst case occurs when Nt = N+1 or N? = N+1; in this case C is about 3N-1, which is about N-2 more ME and MC operations than the standard coding methods require (roughly 1.5 times their cost).
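The formulas for C can be checked numerically with a short sketch. The function names and the GOP size N = 17 are illustrative choices, and dropping a bracketed term when it would be negative (for example when no picture carries that marker) is our assumption rather than something stated in the equations.

```python
def extra_term(n: int) -> int:
    """One bracketed term of the formula: (n - floor(n/2) - 1) x 2 - 1.

    Assumption: the term is treated as 0 when it would be negative.
    """
    return max((n - n // 2 - 1) * 2 - 1, 0)


def proposed_cost(n: int, nt: int, n_ind: int) -> int:
    """C for the proposed method; n_ind stands for N? in the text."""
    return (2 * n + 1) + extra_term(nt) + extra_term(n_ind)


def standard_cost(n: int) -> int:
    """C for SSC or TSC with a fixed GOP."""
    return 2 * n + 1


n = 17  # illustrative GOP size
worst = proposed_cost(n, nt=n + 1, n_ind=0)  # worst case: Nt = N+1
base = standard_cost(n)
print(worst, base, worst - base)
```

With N = 17 this gives C = 50 = 3N-1 for the worst case and C = 35 for the standard methods, a difference of N-2 = 15 operations.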
