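Analysis and Design of MPEG Multimedia Transport Mechanisms and Communication Protocols on Embedded Mobile Platforms -- Prof. Chun-Jen Tsai (蔡淳仁) - MPEG-Standard-Based Integrated Multimedia Communication Platform and Its Applications --- Main Project (II)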


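In addition to continuing the first year's work of assisting MPEG standardization (the Test Bed for MPEG-21 Resource Delivery has formally become the international standard ISO/IEC TR 21000-12), this subproject pursued two new research directions based on the architecture of that test bed. The first is a streaming rate control and error-resilience mechanism that takes the transmission bandwidth and packet loss into account in order to achieve rate-distortion optimized streaming. The second is a rate control mechanism for a wavelet-based scalable video codec. This part of the research mainly concerns rate-distortion optimized partitioning of the bitstream under a given bandwidth requirement, so that the back-end streaming system can packetize and transmit the result.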

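The main improvement of the rate control mechanism designed here over other systems is that we take multiple adaptation into account, which will be of great benefit to future embedded mobile multimedia applications such as P2P multimedia delivery. Determining a good streaming data rate is not enough: the system must also be able to extract the best sub-bitstream from the scalable video bitstream under the computed rate constraint. To meet this need, we designed a fast-converging rate control mechanism for wavelet video coding and, further, a mechanism that can efficiently perform rate-distortion optimized sub-bitstream extraction multiple times.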

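The rate-distortion optimized streaming mechanism we designed was also implemented and tested on the common multimedia delivery test platform that the main project designed for MPEG. This platform has now become an MPEG international standard (ISO/IEC JTC 1/SC 29 Part 12: Test Bed for MPEG-21 Resource Delivery, 2004). Under this integrated project, the open source software that the main project team designed for MPEG includes a complete scalable media server, a network simulator, and a media [...] ("Rate-distortion optimized streaming of packetized media," IEEE Transactions on Multimedia, February 2001). However, the results published so far for this method are mainly theoretical analysis; many implementation details are left without solutions, and the requirement of smooth playback, which is the hardest to meet in streaming over networks with large bandwidth variation, is not considered. The system also has two major drawbacks. First, Chou uses the packet loss rate to represent the distortion in the rate-distortion optimization analysis, which is impractical. Second, his method for reducing distortion is to retransmit packets in advance (not ARQ), which is also inefficient.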

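In scalable bitstream delivery, the video data can be sent in several passes, and each pass helps the decoder obtain a reconstructed signal closer to the original video. The design of scalable bitstream adaptation must therefore consider the following points: it must support a variety of update operations that produce valid, decodable streams; removing data must not violate decoding dependencies; scalability must be allowed along each dimension; all possible adaptability must be provided for media characteristics (such as rate, distortion, frame rate, and frame size); different adaptation decisions may have to be designed for different adaptation units; and all possible adaptation methods must be designed for network quality of service (QoS). With respect to the location of adaptation, the delivery and adaptation of media resources can be divided into three categories: sender-driven adaptation, receiver-driven adaptation, and network-driven adaptation.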

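In the rate-distortion optimized streaming system of this project, we convert the distortion caused by packet loss into the distortion caused by different levels of FEC protection. For example, the distortion caused by a packet loss rate of 10^-3 is equivalent to the distortion caused by the data rate reduction that results from FEC error protection at the 10^-3 level. The whole system can be divided into two major parts: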

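1. Media packet dependency control: The design goal of media packet dependency control is to provide higher error resilience and to eliminate the need for retransmission of video packets. In a typical multimedia stream, video packets depend strongly on one another; if one video packet is lost in transit, decoding of the subsequent frames that depend on it may be affected. A network-adaptive media packet dependency control module can be used to improve the error resilience of scalable multimedia streams and to reduce latency. Here, a tree model can be used to record the channel loss rate and error propagation in order to achieve an effective control mechanism.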

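2. Rate-distortion optimized transmission control: A rate-distortion optimized control framework for multimedia packet delivery must use a Lagrangian cost function of rate and distortion across groups of data units and solve for its minimum in order to allocate the network resources of time and bandwidth efficiently. In a multimedia streaming system with rate-distortion optimized control, the level of interleaving FEC protection is decided for each packet. This level depends on the packet's deadline, the transmission history, the channel statistics, the feedback information, the dependencies among packets [...]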
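The Lagrangian selection described in item 2 can be illustrated with a minimal sketch. It assumes that each packet has a small set of candidate FEC protection levels, each summarized by a (rate, expected distortion) pair, and that the multiplier is found by bisection against a rate budget. The function names, the candidate values, and the bisection bounds are hypothetical; the report does not specify how the minimum is actually computed.

```python
def pick_levels(packets, lam):
    """For each packet, pick the (rate, distortion) option minimizing D + lam * R."""
    return [min(options, key=lambda o: o[1] + lam * o[0]) for options in packets]


def allocate_protection(packets, rate_budget, iterations=50):
    """Bisect on the Lagrange multiplier until the total rate fits the budget.

    packets: list of option lists; each option is (rate, expected_distortion).
    Returns the chosen option per packet. If even the largest multiplier is
    infeasible, the cheapest-looking choice is returned unchanged.
    """
    lo, hi = 0.0, 1e6                      # assumed search range for the multiplier
    best = pick_levels(packets, hi)
    for _ in range(iterations):
        lam = (lo + hi) / 2
        choice = pick_levels(packets, lam)
        if sum(rate for rate, _ in choice) <= rate_budget:
            best, hi = choice, lam         # feasible: try spending more rate
        else:
            lo = lam                       # over budget: penalize rate more
    return best


# Hypothetical candidates: stronger FEC costs more rate and lowers distortion.
packets = [
    [(10, 50.0), (14, 20.0), (18, 15.0)],
    [(10, 30.0), (14, 26.0), (18, 25.0)],
]
chosen = allocate_protection(packets, rate_budget=28)
```

In this toy input the first packet benefits much more from protection than the second, so the budget is spent on the first packet. A Lagrangian search of this kind only reaches operating points on the convex hull of the rate-distortion trade-off, which is a known limitation of the approach.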

(3) Adaptive Interleaved Forward Error Correction: System and Experiments

We have added Reed-Solomon coding modules, an interleaver module, and a de-interleaver module to the original MPEG-21 multimedia test bed. The system diagram is shown in Fig. 4-1. In the experiments, the CIF version of the FOREMAN sequence is used. The sequence is encoded using the ISO/IEC 14496 (MPEG-4) visual reference software (Microsoft-FDAM1-2.5-040207) at 10 frames per second. The coding mode is one I-frame followed by nine P-frames.
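To illustrate why an interleaver is paired with the Reed-Solomon modules, the following sketch shows a simple block interleaver. This is an assumption for illustration: the report does not state which interleaver structure the test bed uses, and the function names and sizes here are hypothetical. Symbols of `depth` codewords are written row by row and read column by column, so a burst of consecutive losses is spread across different codewords.

```python
def interleave(symbols, depth):
    """Block interleaver: treat `symbols` as `depth` rows (codewords) of equal
    width, then read the matrix column by column."""
    if len(symbols) % depth:
        raise ValueError("length must be a multiple of depth")
    width = len(symbols) // depth
    return [symbols[r * width + c] for c in range(width) for r in range(depth)]


def deinterleave(symbols, depth):
    """Inverse of interleave(): restore the original row-by-row order."""
    if len(symbols) % depth:
        raise ValueError("length must be a multiple of depth")
    width = len(symbols) // depth
    return [symbols[c * depth + r] for r in range(depth) for c in range(width)]


# Three hypothetical codewords of four symbols each: 0-3, 4-7, 8-11.
data = list(range(12))
sent = interleave(data, depth=3)
burst = sent[3:6]  # three consecutive symbols lost on the channel
```

With this layout, a burst no longer than `depth` symbols removes at most one symbol from each codeword, which is the situation in which a per-codeword erasure code is most effective.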

Fig. 4-1. Architecture of the proposed system

Figure 4-2(a) presents the PSNR performance of the streaming system under a variable bandwidth scenario, ranging from 68kbps to 240kbps. When the transmission rate fluctuates significantly, e.g., from the 40th to the 80th frame, the proposed system adapts quickly to the bandwidth changes and reduces the degree of quality variation. Fig. 4-2(b) shows the bit rate variation of the FEC-protected base-layer and enhancement-layer bitstreams under the variable bandwidth condition. For example, when the transmission rate changes from 180kbps to 116kbps, the system performs dynamic rate allocation to add more FEC protection to the base layer. Consequently, most of the enhancement-layer bitstream is truncated during this period.
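The trade-off between base-layer protection and enhancement-layer rate can be sketched as follows. This is a simplified model, not the allocation rule used in the experiments: it assumes independent packet losses, an erasure code in which any k of n packets recover a block, a hypothetical 64 kbps base layer, and a hypothetical residual-failure target. The function names are likewise illustrative.

```python
from math import comb


def block_failure_prob(n, k, loss_rate):
    """Probability that more than n - k of n packets are lost, i.e. that an
    (n, k) erasure code cannot recover the block (independent losses assumed)."""
    return sum(
        comb(n, i) * loss_rate**i * (1 - loss_rate) ** (n - i)
        for i in range(n - k + 1, n + 1)
    )


def allocate_rates(total_kbps, base_kbps, loss_rate, target, k=10, max_n=40):
    """Pick the smallest n meeting the failure target for the base layer and
    give the leftover rate to the enhancement layer.

    Returns (n, protected_base_kbps, enhancement_kbps), or None if the
    protected base layer does not fit into the total rate.
    """
    for n in range(k, max_n + 1):
        if block_failure_prob(n, k, loss_rate) <= target:
            protected = base_kbps * n / k
            if protected > total_kbps:
                return None
            return n, protected, total_kbps - protected
    return None


# Hypothetical channel: 5% packet loss, residual failure target of 1e-3.
at_180 = allocate_rates(180, 64, 0.05, 1e-3)
at_116 = allocate_rates(116, 64, 0.05, 1e-3)
```

Under these assumptions the protected base layer costs the same at both transmission rates, so the drop from 180kbps to 116kbps is absorbed entirely by the enhancement layer, which mirrors the truncation behavior described above.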

[Plots for Fig. 4-2; horizontal axis from 0 to 100; legend labels: The proposed system, Interleaved bitstream, Bitplane size.]

Fig. 4-2. (a) PSNR under variable transmission rates. (b) Bitrates for the FEC-protected base-layer and the enhancement-layer bitstreams.

5. Research on Advanced Fine-Granularity Scalable Layered Video Coding Techniques -- Prof. Tihao Chiang (蔣迪豪)

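Among the requirements of the MPEG Scalable Video Coding (SVC) standard, quality enhancement of regions of interest (ROI) is an important functionality. In scalable video coding applications, the decoder wishes to provide better quality in the regions of interest when the enhancement layer is lost. However, JSVM1.0, the reference software of the current scalable video coding standard, does not provide this functionality. In this project we therefore developed an algorithm that provides arbitrarily shaped regions of interest with graceful quality adjustment.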

(1) Background

The ROI functionality is commonly supported in scalable video/image coding. In MPEG-4 FGS, the ROI functionality is enabled by a selective bit-plane shifting scheme. Unlike MPEG-4 FGS, SVC replaces the bit-planes with FGS layers, which are produced by successive quantization of the inter-layer prediction residue. Each FGS layer can be regarded as a group of multiple bit-planes. However, these bit-planes are coded by cyclical block coding instead of traditional bit-plane coding. In SVC, the coding of an FGS layer is partitioned into the significance and refinement passes. The significance pass first encodes the insignificant coefficients, i.e., those that have zero values in the subordinate layers. After that, the refinement pass refines the remaining significant coefficients with values ranging from -1 to +1. In particular, the significance pass coding is performed in a cyclical manner, whereas the refinement pass coding is conducted subband by subband.

(2) Method -- Prioritized Cyclical Block Coding

To provide the ROI functionality based on cyclical block coding, we propose to code the blocks unequally within a coding cycle. For illustration, Fig. 5-1 and Fig. 5-2 show the differences between cyclical block coding and our prioritized coding scheme. For simplicity, we use the notation of an (EOB, Run, Level) symbol to represent the coefficients of a block that are to be coded in a coding cycle of the significance pass. In Fig. 5-1, cyclical block coding encodes each block equally, with one (EOB, Run, Level) symbol (or one refinement symbol) in every cycle. However, as shown in Fig. 5-2, to offer the ROI functionality, we should encode the blocks in the ROIs with more symbols by enabling their coding before that of the blocks outside the ROIs. We can further extend this concept to enable the coding of different ROIs at different cycles for graceful selective enhancement. For example, in Fig. 5-2 we have selected two ROIs. The coding of the highest-priority region, ROI 1, which includes Block 0, is enabled before the rest of the blocks. After two coding cycles, the coding of ROI 2, which has lower priority, is activated. In particular, the coding of ROI 2 starts before the coding of the background region, i.e., Block 2 and Block 3. With such prioritization, the blocks in the ROIs will have been coded with more symbols at the end of each cycle. When the enhancement layer is truncated, the blocks in the ROIs will be decoded and updated first.

To specify the priority of an ROI, we use the number of shifting cycles, defined as the number of coding cycles spent in the ROI before the coding of the background region begins. For example, the number of shifting cycles of ROI 1 in Fig. 5-2 is 2, which means that Block 0 is coded for 2 cycles before the coding of the background region starts. A higher number of shifting cycles means a higher coding priority.
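The scheduling rule defined by the number of shifting cycles can be sketched as below. This is an illustrative model only: it assumes one symbol per active block per cycle, and the block-to-ROI assignment, the shift values, and the symbol counts are hypothetical rather than taken from Fig. 5-2 or from the JSVM implementation.

```python
def prioritized_cycle_order(shift_cycles, symbols_per_block):
    """Return the coding order as (cycle, block) pairs.

    shift_cycles: block id -> number of shifting cycles (0 for background).
    symbols_per_block: block id -> number of symbols the block has to code.
    A block becomes active at cycle (max_shift - its shift), so a block with
    s shifting cycles is coded for s cycles before the background starts.
    """
    if not shift_cycles:
        return []
    max_shift = max(shift_cycles.values())
    remaining = dict(symbols_per_block)
    order = []
    cycle = 0
    while any(remaining.values()):
        for block in sorted(remaining):
            start_cycle = max_shift - shift_cycles[block]
            if cycle >= start_cycle and remaining[block] > 0:
                order.append((cycle, block))   # one symbol per active block
                remaining[block] -= 1
        cycle += 1
    return order


# Hypothetical setup: Block 0 in ROI 1 (shift 2), Block 1 in ROI 2 (shift 1),
# Blocks 2 and 3 in the background (shift 0); three symbols per block.
shifts = {0: 2, 1: 1, 2: 0, 3: 0}
order = prioritized_cycle_order(shifts, {b: 3 for b in shifts})
```

Truncating `order` at any point leaves the ROI blocks with at least as many coded symbols as the background blocks, which is the property that lets a truncated enhancement layer favor the ROIs.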

(3) Experiment

For the experiment, we implement our prioritized cyclical block coding based on JSVM1.0.

Specifically, we use the Foreman sequence as an example to demonstrate the functionality of ROI. We compare the regional PSNR and the subjective quality between the schemes with ROI and without ROI.

Fig. 5-3 shows the regional PSNR comparison of different ROI coding modes. The legend SEn denotes the performance of our prioritized cyclical block coding, where the number n represents the foreground shifting factor. In addition, SEn_Remap shows the performance with both the prioritized coding and layer remapping. As shown, enabling the ROI functionality can dramatically increase the PSNR of the foreground region. Without layer remapping, the PSNR of the foreground region is the same as that of the original cyclical block coding at the end of FGS layer 1 (at a bit rate of around 700kbits/s). With layer remapping, however, the ROI functionality is available over the entire bit rate range. In particular, the curve of JSVM1_Remap shows that simply enabling layer remapping without our prioritized cyclical block coding is not sufficient to offer the ROI functionality over the entire bit rate range.

Figure 5-1: Cyclical block coding.

Figure 5-2: Prioritized cyclical block coding for the ROI functionality.

(4) Conclusions

In this project, we use the cyclical block coding in JSVM1.0 to develop a graceful, arbitrarily shaped region-of-interest (ROI) functionality for SVC. To enable the ROI functionality, we additionally introduce two syntax elements in the FGS slice header, three syntax elements at the FGS layer level, and one syntax element at the macroblock (MB) level. Experimental results show that the proposed ROI functionality can significantly improve the subjective quality while maintaining the coding efficiency and scalability. Thus, we suggest that the group look into this technology and consider including ROI functionality in SVC.

[Plots for Figure 5-3; panels: "Foreman foreground PSNR" and "Foreman background PSNR", each plotted against bit rate from 0 to 2000 kbits/s.]

Figure 5-3: Regional PSNR comparison.

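6. Design and Simulation of an MPEG Intellectual Property Management and Protection System and a Robust Video Decoder -- Prof. Hsueh-Ming Hang (杭學鳴)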

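Interframe wavelet video coding has recently attracted much attention because of its good compression ratio and versatile scalability. It offers three kinds of scalability: 1) rate scalability, 2) temporal resolution scalability, and 3) spatial resolution scalability. Preliminary results of our interframe wavelet video coding research were submitted in March and July 2004 to the MPEG scalable video coding Call-for-Proposal competition, after which we continued to take part in the discussions of the MPEG Ad Hoc Group. In 2005 we continued to submit proposals to the MPEG scalable video coding Core Experiments. This project has three main research topics: (1) rate control based on the human visual system, (2) a directional multiresolution transform for image compression, and (3) block bit-plane arithmetic coding. The goal of the project is to improve the spatial transform, the entropy coder, and the rate control of interframe wavelet video coding.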

A. Rate Control Based on the Human Visual System

(1) Background and Objectives

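In the rate control algorithm of interframe wavelet video coding, every truncation point used in the interframe wavelet coding has its own associated distortion and bit length. The slope of each truncation point is the quotient of the distortion difference divided by the bit difference. In optimization theory, truncation points with higher slopes have higher priority for transmission. Here we propose multiplying the slope of each truncation point by a weight computed from the human visual system, so that this visually weighted slope becomes the decision criterion in the rate control algorithm. Our simulations show that the final reconstructed images have a lower peak signal-to-noise ratio (PSNR) but better visual quality.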
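The weighted-slope criterion can be sketched as a greedy selection over truncation points. This is a minimal illustration under stated assumptions: each coding unit lists its truncation points as cumulative (bits, distortion) pairs starting from (0, initial distortion), the slopes within a unit are already decreasing, and the perceptual weights are supplied as plain numbers. The unit names, weights, and values below are hypothetical, and the actual weight computation from the human visual system model is not reproduced here.

```python
import heapq


def select_truncation_points(units, weights, bit_budget):
    """Greedily include truncation points in order of weighted slope.

    units: name -> list of cumulative (bits, distortion) pairs, first (0, D0).
    weights: name -> perceptual weight applied to that unit's slopes.
    Returns (chosen index per unit, bits used). Selection stops at the first
    increment that does not fit the budget.
    """
    chosen = {name: 0 for name in units}
    heap = []

    def push_next(name, idx):
        points = units[name]
        if idx + 1 < len(points):
            bits0, dist0 = points[idx]
            bits1, dist1 = points[idx + 1]
            slope = (dist0 - dist1) / (bits1 - bits0)   # distortion drop per bit
            heapq.heappush(heap, (-weights[name] * slope, name, idx + 1, bits1 - bits0))

    for name in units:
        push_next(name, 0)

    used = 0
    while heap:
        _, name, idx, cost = heapq.heappop(heap)
        if used + cost > bit_budget:
            break
        used += cost
        chosen[name] = idx
        push_next(name, idx)
    return chosen, used


# Hypothetical subbands: HH has the steeper raw slope, LL the larger weight.
units = {"LL": [(0, 100), (10, 40), (20, 20)], "HH": [(0, 100), (10, 30), (20, 10)]}
plain = select_truncation_points(units, {"LL": 1.0, "HH": 1.0}, 10)
weighted = select_truncation_points(units, {"LL": 2.0, "HH": 1.0}, 10)
```

With equal weights the first 10 bits go to HH, which reduces the measured distortion most; with the larger weight on LL they go to LL instead. This is consistent with the observation above that the visually weighted criterion can lower PSNR while favoring the components the weights treat as more visible.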

(2) Method and Experiments

Just Noticeable Distortion -- After an image undergoes a wavelet transform, each subband can be indexed by its level λ and orientation θ, and the just noticeable distortion y of the luminance component of each subband can
