Thesis Outline - Design of a Low-Video-Delay Feedback Mechanism for Scalable Video Coding Streaming over WiMedia Ultra-Wideband Wireless Transmission

Chapter 1 INTRODUCTION

1.3 Thesis Outline

The rest of this thesis is organized as follows. Chapter 2 briefly introduces scalable video coding (SVC). Chapter 3 gives a brief overview of WiMedia, focusing on the MAC layer. Chapter 4 proposes the system architecture, in which a feedback scheme combining proper prediction and adaptation establishes the cross-layer connection between the SVC layer and the MAC layer. Chapter 5 presents simulation results that compare the proposed scheme with two other methods in four aspects. Finally, Chapter 6 draws conclusions and discusses future work.

CHAPTER 2 BRIEF INTRODUCTION OF H.264/AVC SVC

The main goal of traditional video coding standards has been to increase coding efficiency. As multimedia applications have developed, video compression technology has shifted from serving a single user (unicast) to serving multiple users (multicast). Because channel conditions and requirements differ among receivers, a video coding technology that can scale according to these heterogeneous conditions is necessary.

To adapt content flexibly for multimedia communications, scalable video coding (SVC), based on the scalable extension of H.264/AVC (referred to as SVC in the following), has been developed by the Joint Video Team (JVT), which is formed by ISO/IEC MPEG and ITU-T VCEG. The current Joint Scalable Video Model (JSVM) allows on-the-fly adaptation of frame resolution (spatial), frame rate (temporal) and frame quality (SNR): the encoder generates a global bitstream at the highest resolution for all intended receivers, and each decoder obtains a reduced resolution simply by discarding Network Abstraction Layer (NAL) units from the global bitstream.

SVC makes it possible for streaming clients on various networks, wired or wireless, such as personal computers (PCs), high-definition TVs (HDTVs) and even mobile devices, to display the same video at different specifications, each according to its own requirements, from one scalable global bitstream. In the application scenario illustrated in Figure 2-1, the system consists of an SVC server containing the SVC encoder, a wireless access point (AP) that may include the SVC extractor and the wireless transmission mechanism, and three clients connected to the network under different communication conditions. The SVC server delivers the global bitstream to the wireless AP without any truncation, since this connection is usually wired. The wireless AP then extracts the resolution required by each recipient in the heterogeneous network.

Figure 2-1 SVC overview

2.1 Encoder Overview

Figure 2-2 gives an overview of a generic SVC encoder structure with three spatial layers, which encodes the video into multiple spatial, temporal and quality (SNR) layers for combined scalability [1] [2]. The encoding is layer-based: the input video is first separated into several spatial layers to support multiple spatial resolutions, and each layer is then coded with a separate encoding loop. Within each spatial layer, temporal scalability is achieved with the hierarchical-B structure, a temporal decomposition. Since the prediction information comes either from spatially lower layers or from temporally neighboring pictures in the same spatial layer, the prediction scheme can exploit the correlations among the layers and reuse the information to improve the coding efficiency of the enhancement layers [1]. After the inter-layer prediction module, the residues of each spatial layer are encoded with either a scalable entropy coder for fine grain scalability (FGS) or a non-scalable entropy coder for coarse grain scalability (CGS) to support SNR scalability. However, the first SNR layer (also called the base layer) of each spatial layer must be encoded in non-scalable mode.

Figure 2-2 SVC encoder with 3 levels of spatial scalability [3]

2.2 Temporal Scalability

Figure 2-3 Hierarchical-B prediction structure with GOP size of 8

Temporal scalability in SVC, which supports multiple frame rates, is implemented with the hierarchical-B prediction structure. Figure 2-3 gives an example with a GOP size of eight that supports four temporal layers. The pictures of the lower temporal layers are encoded first so that pictures of the higher temporal layers can refer to them.

Frames 0, 8 and 16 in Figure 2-3 are called key frames. Each key frame is either an I-frame, which is predicted only from information within itself, or a P-frame, which uses only previous key frames as references for inter prediction. Each GOP can be decoded independently if the preceding key frames are available. The frames between two key frames are B-frames. Each B-frame is hierarchically predicted using both previous and future frames from lower temporal layers as reference pictures.
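The layer assignment described above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming a dyadic hierarchy (the GOP size is a power of two) and assuming that temporal layers are numbered from 0 for key frames; the function names `temporal_layer` and `frames_for_layers` are hypothetical and do not come from the JSVM software.

```python
def temporal_layer(frame_index: int, gop_size: int = 8) -> int:
    """Temporal layer of a frame in a dyadic hierarchical-B GOP (assumed numbering)."""
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1       # log2(gop_size); 3 for a GOP size of 8
    pos = frame_index % gop_size
    if pos == 0:
        return 0                             # key frame (frames 0, 8, 16, ...)
    # The more trailing zero bits in pos, the lower the temporal layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def frames_for_layers(max_layer: int, num_frames: int = 17, gop_size: int = 8) -> list:
    """Frames that remain when only temporal layers 0..max_layer are kept."""
    return [f for f in range(num_frames) if temporal_layer(f, gop_size) <= max_layer]


if __name__ == "__main__":
    print([temporal_layer(i) for i in range(9)])
    print(frames_for_layers(1))
```

Keeping one more temporal layer doubles the frame rate in this sketch, which is how discarding the highest layers yields the reduced frame rates.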

2.3 Spatial Scalability

Spatial scalability is achieved by decomposing the video into a spatial pyramid, in which the video is down-sampled to generate different spatial resolutions. The different layers are coded independently. To improve the coding efficiency of the enhancement layers, inter-layer prediction is introduced; it is used for intra-texture, motion and residue prediction.

The inter-layer prediction is assembled according to the types of layers used.

The base and CGS layers can flexibly select their reference layer from any lower layer, while an FGS layer must be predicted from the previous SNR layer in the same spatial layer. Figure 2-4 shows an example of inter-layer prediction with three spatial layers, each containing several SNR layers that may be either CGS or FGS, while the base layer of each spatial layer must be CGS. The notation X_Y_Z used in the figure, for example CGS_1_0, specifies each SNR layer in each spatial layer and is used by the decoder to identify the layers. The symbol X denotes the coding method: BASE, CGS or FGS. Y indicates the dependency_id, which increases by one for each successive spatial layer or CGS layer, such as from CGS_1_0 to CGS_2_0. The third symbol, Z, designates the quality_id, which is incremented by one for each successive FGS layer.
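To make the X_Y_Z notation concrete, the following minimal sketch splits a label into its three fields. The `LayerId` type and the `parse_layer_label` function are hypothetical helpers introduced only for illustration; they are not part of the SVC syntax or of the JSVM software.

```python
from typing import NamedTuple


class LayerId(NamedTuple):
    method: str          # X: "BASE", "CGS" or "FGS"
    dependency_id: int   # Y: grows with successive spatial or CGS layers
    quality_id: int      # Z: grows with successive FGS layers


def parse_layer_label(label: str) -> LayerId:
    """Split a label such as 'CGS_1_0' into (method, dependency_id, quality_id)."""
    method, dependency_id, quality_id = label.split("_")
    if method not in {"BASE", "CGS", "FGS"}:
        raise ValueError(f"unknown coding method: {method}")
    return LayerId(method, int(dependency_id), int(quality_id))


if __name__ == "__main__":
    print(parse_layer_label("CGS_1_0"))
    print(parse_layer_label("FGS_4_2"))
```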

In spatial layer 0 of Figure 2-4, BASE_0_0, which is compatible with H.264/AVC, is the base layer. Above BASE_0_0 there are two other SNR layers encoded in CGS mode in spatial layer 0, CGS_1_0 and CGS_2_0, which are predicted from BASE_0_0 and CGS_1_0 respectively. Because SVC supports flexible selection of the reference layer, the base layer of spatial layer 1, BASE_1_0, and the upper SNR layer CGS_4_0 refer to CGS_1_0 and CGS_2_0 respectively, instead of CGS_2_0 and BASE_4_0. Thus CGS_4_0, for example, can be decoded even when BASE_4_0 is not received correctly. This flexibility leaves room for further optimization.

Unlike CGS layers, an FGS layer can refer only to the previous layer: FGS_4_1 refers to CGS_4_0, FGS_4_1 is in turn the reference of FGS_4_2, and so on. Under this rule, if every enhancement layer were encoded as FGS, all the base layers would be necessary to decode the enhancement layers above them.
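The reference relations stated in the text can be followed mechanically to find which layers a decoder needs. The sketch below is a simplification that assumes each layer has exactly one inter-layer reference and ignores temporal references; the `REFERENCE` table contains only the relations named in the text, and `layers_needed` is a hypothetical helper.

```python
# Inter-layer references as stated in the text (None marks the lowest layer).
REFERENCE = {
    "BASE_0_0": None,
    "CGS_1_0": "BASE_0_0",
    "CGS_2_0": "CGS_1_0",
    "CGS_4_0": "CGS_2_0",
    "FGS_4_1": "CGS_4_0",
    "FGS_4_2": "FGS_4_1",
}


def layers_needed(target: str, reference: dict = REFERENCE) -> list:
    """Follow the reference chain downward; return layers in decoding order."""
    chain = []
    layer = target
    while layer is not None:
        chain.append(layer)
        layer = reference[layer]
    return list(reversed(chain))


if __name__ == "__main__":
    print(layers_needed("FGS_4_2"))
```

In this sketch, every layer on the chain of FGS_4_2 must be received, whereas a layer that is not on the chain can be lost without affecting the target.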

Figure 2-5 shows an example of spatial scalability with three levels, in which each spatial layer still contains SNR layers to further improve the quality.

Figure 2-4 Inter-layer prediction structure with three spatial layers [2]

Figure 2-5 Illustration of spatial scalability

2.4 SNR Scalability

SNR scalability uses multiple SNR layers in each spatial layer to provide quality scalability. Figure 2-6 shows an example: the more SNR layers are decoded correctly, the higher the picture quality.

SNR scalability can be achieved via Coarse Grain Scalability (CGS) and Fine Grain Scalability (FGS). CGS encodes in a non-scalable way while FGS can be truncated at any arbitrary position.

Figure 2-6 Illustration of SNR scalability

2.4.1 Coarse Grain Scalability (CGS)

CGS data can be decoded only at several pre-defined integral points, as shown in Figure 2-7(a). A CGS layer resembles an additional spatial layer with a smaller quantization parameter (QP). The only difference is that the inter-layer prediction of CGS reuses the information from the lower layer without spatial interpolation, treating every layer as having the same resolution. As a result, CGS does not use motion vector refinement as spatial scalability does [2].

2.4.2 Fine Grain Scalability (FGS)

FGS allows truncation at any arbitrary location, so it can provide arbitrary quality levels according to the user's bandwidth capability, which may fluctuate, as shown in Figure 2-7(b).

By scanning the whole frame, each FGS enhancement layer provides a refinement of the residue signal, so that the quality improvement is spread over the entire frame. FGS has undergone many successive improvements; more details are given in [2].

Figure 2-7 SNR scalability: (a) CGS, (b) FGS

2.5 Bit stream Extraction

An extractor is used to extract, from the single SVC bit stream, a set of spatial, temporal and quality resolutions that meets the informed bit rate, which fluctuates with the available bandwidth. There are two extraction methods, named simple truncation and quality layer adaption [1].

2.5.1 Simple Truncation

The extractor must first determine all the reference pictures required to decode the base layers for the target spatial-temporal resolution. It then extracts from the lower layers to the higher layers, following the causality of the encoding. If the available bandwidth allows only some of the layers to be transmitted, the higher layers are truncated first; the more bandwidth is available, the more quality layers of the requested spatial-temporal resolution can be transmitted.

If the CGS scheme is used for SNR scalability, the extractor can truncate the bit stream only at layer boundaries; if the FGS scheme is adopted, every picture is truncated equally according to the informed bit rate.
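The behavior of simple truncation can be illustrated with a small sketch. It is not the JSVM extractor: the `Layer` record, the layer names and the bit-rate figures are hypothetical, and the partial bit rate kept for an FGS layer stands for the same fraction being cut from every picture of that layer.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kind: str       # "BASE", "CGS" or "FGS"
    bitrate: int    # kbit/s, hypothetical figures


def simple_truncation(layers: list, target_bitrate: int) -> list:
    """Keep layers from lowest to highest until the target bit rate is reached.

    Returns (name, kept_bitrate) pairs. A CGS or BASE layer is kept whole or
    not at all; an FGS layer may be kept partially.
    """
    kept = []
    remaining = target_bitrate
    for layer in layers:                      # ordered from lowest to highest
        if layer.bitrate <= remaining:
            kept.append((layer.name, layer.bitrate))
            remaining -= layer.bitrate
        else:
            if layer.kind == "FGS" and remaining > 0:
                kept.append((layer.name, remaining))   # truncated mid-layer
            break                             # higher layers depend on this one
    return kept


if __name__ == "__main__":
    stream = [Layer("BASE_0_0", "BASE", 200),
              Layer("CGS_1_0", "CGS", 300),
              Layer("FGS_1_1", "FGS", 400)]
    print(simple_truncation(stream, 700))
    print(simple_truncation(stream, 400))
```

With a target of 400 kbit/s in this example, the CGS layer does not fit and is dropped entirely, which leaves part of the budget unused; this is the coarse granularity that FGS avoids.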

2.5.2 Quality Layer Adaption

For quality layer adaption, the extractor adds control information named quality_layer_id to the Network Abstraction Layer (NAL) unit. It is carried in FGS layers to indicate the importance of the NAL unit and to provide better bit stream adaption. Less important NAL units are dropped first.

As in simple truncation, the extractor keeps the dependent quality layers in order from the lower layers to the higher layers to reach the target bit rate. It computes the bit rate of each quality layer and removes NAL units from the quality layers according to their quality_layer_id. If the bandwidth is not enough to transmit all the NAL units in a quality layer, the NAL units with the same quality_layer_id are truncated equally.
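The dropping and equal-truncation rule can be sketched as follows. This is a rough illustration, not the JSVM algorithm: it assumes that a larger quality_layer_id means a more important NAL unit (the text does not state the direction), and the unit names, sizes and the `quality_layer_extract` function are hypothetical.

```python
from collections import defaultdict


def quality_layer_extract(nal_units: list, budget: int) -> dict:
    """Select NAL unit bytes within a byte budget.

    nal_units: list of (name, quality_layer_id, size_bytes).
    Returns {name: kept_bytes}; units absent from the result are dropped.
    ASSUMPTION: a larger quality_layer_id marks a more important unit.
    """
    by_level = defaultdict(list)
    for name, qid, size in nal_units:
        by_level[qid].append((name, size))

    kept = {}
    remaining = budget
    for qid in sorted(by_level, reverse=True):      # most important first
        units = by_level[qid]
        level_size = sum(size for _, size in units)
        if level_size <= remaining:
            for name, size in units:
                kept[name] = size
            remaining -= level_size
        else:
            # Not enough room: cut every unit of this level by the same fraction.
            fraction = remaining / level_size
            for name, size in units:
                kept[name] = int(size * fraction)
            break                                   # lower levels are dropped
    return kept


if __name__ == "__main__":
    units = [("a", 2, 100), ("b", 1, 100), ("c", 1, 300), ("d", 0, 50)]
    print(quality_layer_extract(units, 300))
```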

CHAPTER 3 BRIEF OVERVIEW OF WIMEDIA UWB

This chapter briefly introduces the structure and characteristics of WiMedia.

Ultra-wideband (UWB) is a next-generation wireless networking technology that offers extremely high data rates, up to 480 Mb/s at distances of up to 10 m, low interference with other radio systems and low power consumption. These advantages bring many benefits to users and enable several new applications. One of them is the idea of the E-HOME, which uses UWB technology to connect the consumer electronics in a house.

WiMedia is a UWB-based specification backed by more than 170 participating companies, including Intel, Texas Instruments, SONY, Nokia and Microsoft [4]. Owing to its high data rate at short range with low transmission power, WiMedia also enables high-speed wireless personal area networks (WPANs).

WiMedia uses MultiBand Orthogonal Frequency Division Multiplexing (MB-OFDM) [5], proposed by the MultiBand OFDM Alliance (MBOA) [5], as its PHY technology. MB-OFDM splits the spectrum into fourteen 500 MHz-wide bands, as shown in Figure 3-1, and uses OFDM to increase bandwidth. In March 2005, the MBOA merged with the WiMedia Alliance [6] to promote and standardize UWB, and the original MBOA-UWB was renamed WiMedia UWB. In December 2005, the European Computer Manufacturers Association (ECMA) adopted WiMedia UWB as its standard and completed the high-rate ultra-wideband PHY and MAC standard, ECMA-368 [7]. We choose ECMA-368, which is equivalent to WiMedia UWB, as our simulation system.

Figure 3-1 Fourteen 500 MHz-wide bands for MB-OFDM

3.1 Brief Introduction of WiMedia UWB MAC

The WiMedia MAC protocol has the following characteristics. It preserves the frame structure (including the super-frame and the MAC header) specified by the IEEE 802.15.3 task group [8]. It has a TDMA-like structure. It adopts CSMA and an RTS/CTS mechanism similar to the enhanced distributed channel access (EDCA) [9] mechanism of the WLAN standard. Finally, it has two distinguishing features, beaconing and the distributed reservation protocol (DRP), which enable a fully distributed coordination architecture in which no Piconet Coordinator (PNC) controls medium access. Because of these features, the MAC functionality resides individually in each device in the network.

3.1.1 Super-frame Structure

The WiMedia MAC layer divides transmission time into consecutive super-frames. A super-frame is a time interval used in the standard to coordinate frame transmission between devices.

Figure 3-2 Super-frame structure consisting of media access slots

Figure 3-2 shows the time structure of the super-frame of the WiMedia MAC layer. It is composed of 256 Media Access Slots (MASs), each lasting 256 μs, so the total duration of a super-frame is 256 × 256 = 65,536 μs.
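The timing arithmetic above can be written out as a short sketch. The constants come from the text; the helper names `mas_start_offset_us` and `mas_index_at` are hypothetical and serve only to show how a MAS index maps to a time offset from the start of the super-frame.

```python
MAS_PER_SUPERFRAME = 256      # media access slots per super-frame
MAS_DURATION_US = 256         # duration of one MAS in microseconds

SUPERFRAME_DURATION_US = MAS_PER_SUPERFRAME * MAS_DURATION_US   # 65,536 us


def mas_start_offset_us(mas_index: int) -> int:
    """Time offset of a MAS from the start of its super-frame."""
    if not 0 <= mas_index < MAS_PER_SUPERFRAME:
        raise ValueError("MAS index must be in 0..255")
    return mas_index * MAS_DURATION_US


def mas_index_at(time_us: int) -> int:
    """Index of the MAS that contains a given absolute time in microseconds."""
    return (time_us % SUPERFRAME_DURATION_US) // MAS_DURATION_US


if __name__ == "__main__":
    print(SUPERFRAME_DURATION_US)
    print(mas_start_offset_us(140))
    print(mas_index_at(65536 + 300))
```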

The super-frame is separated by function into two parts, the Beacon Period (BP) and the Data Transfer Period (DTP). According to the medium access method, the Data Transfer Period can be further divided into Distributed Reservation Protocol (DRP) periods and Prioritized Contention Access (PCA) periods, as shown in Figure 3-3.

Figure 3-3 Structure of super-frame of WiMedia MAC

3.1.2 Frame Transaction

A source device may fragment or aggregate MSDUs into several frames and transmit frames with the same Delivery ID to the same destination address [10]. A device may consider a frame received if it has a valid Header Check Sequence (HCS), which protects the combination of the MAC layer header and the PHY layer header, and a valid Frame Check Sequence (FCS), a 32-bit CRC used by the receiver to check the correctness of the frame.

Before any frame transaction, the transmitter has to send a Request To Send (RTS) frame and then wait for a Short Inter-Frame Space (SIFS); every frame transmission is separated from the next by a SIFS interval. After that, it expects to receive a Clear To Send (CTS) frame from the receiver, as shown in Figure 3-4. RTS/CTS is also the mechanism used by the IEEE 802.11 wireless networking protocol to reduce collisions caused by the hidden node problem. If the transmitter receives the CTS, it begins the transmission. Otherwise, the transmission fails and the transmitter releases its right to access the medium.

Frame transactions support three acknowledgement policies for verifying delivery of a frame: Imm-ACK, B-ACK and No-ACK. With No-ACK, the receiver does not acknowledge the transmission, and the transmitter treats it as successful regardless of the actual result. With Imm-ACK, the receiver returns an Imm-ACK frame after each correct frame reception. With B-ACK, the transmitter sends multiple frames while the receiver keeps track of their reception until the block terminates; the transmitter then receives a single frame from the receiver indicating which frames were received and which need to be retransmitted [11].
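The bookkeeping behind B-ACK can be illustrated with a minimal sketch. The per-frame boolean list used here is a hypothetical representation chosen for clarity; it is not the B-ACK frame format defined in ECMA-368, and the function names are assumptions.

```python
def build_block_ack(received_seqs: set, block_start: int, block_size: int) -> list:
    """Receiver side: one flag per frame of the block (True = received correctly)."""
    return [(block_start + i) in received_seqs for i in range(block_size)]


def frames_to_retransmit(block_ack: list, block_start: int) -> list:
    """Transmitter side: sequence numbers that the acknowledgement marks as missing."""
    return [block_start + i for i, ok in enumerate(block_ack) if not ok]


if __name__ == "__main__":
    # The transmitter sent frames 10..13; frame 12 was lost or corrupted.
    ack = build_block_ack({10, 11, 13}, block_start=10, block_size=4)
    print(ack)
    print(frames_to_retransmit(ack, block_start=10))
```

Compared with Imm-ACK, a single acknowledgement covers the whole block, so the per-frame acknowledgement overhead is paid once per block in this sketch.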

Figure 3-4 Illustration of frame transaction

Figure 3-5 Acknowledgement policies

3.1.3 Overview of Beacon Period

The beacon mechanism is the most fundamental part of the operation of the WiMedia MAC.

In WiMedia, each super-frame starts with a BP, which extends over one or more contiguous MASs, up to 32 MASs, for beacon transmission. The start of the first MAS of the BP in the super-frame is called the Beacon Period Start Time (BPST). Every MAS in the BP may carry beacons but no data, and each MAS contains three beacon slots, so up to three beacons can be sent per MAS and at most 3 × 32 = 96 beacons in the whole BP. Beacons are used to announce presence, coordinate with neighbors, control topology and negotiate traffic on the medium. Because there is no central coordinator, beacons play a key role in making the system's coordination fully distributed rather than centralized.

In the original design, the IEEE 802.15.3 MAC needs a PNC as a central controller, and any device, portable or not, can be the PNC. Communication between devices can be enabled only by, or pass through, the PNC. Without a PNC, the WPAN cannot operate. For example, if the PNC disappears because of mobility or device failure, several seconds are needed to elect a new PNC to control the network, and no communication can be maintained during the reorganization [4]. That is why the WiMedia MAC adopted a distributed architecture, which places the function of negotiating resource allocation in each device instead of in the PNC.

Without central information from a PNC, each active device in the network needs to discover the WPAN information by itself and update it periodically in order to make correct decisions. This is done through the transmission of management and control information, coded in beacons as Information Elements (IEs) with different functions. Beacon frames should be transmitted at the lowest rate, 39.4 Mb/s, for robustness, so that the most important information in the system is not lost.

According to the WiMedia MAC standard, each device has to announce its existence by sending a beacon frame in the BP, with devices transmitting in order. Each beacon contains the device ID of its originating device and the IDs of all direct neighbors whose beacons the device heard in the previous super-frame, so that devices exchange their presence and the topology information they have recently heard.

After scanning its neighbors' beacons, a device knows the nearby network information and which MASs are available and which are in use. For example, device A may want to transmit data to device B through the DRP mechanism in MASs #140~145, and device C may want to transmit data to device D through the PCA mechanism, contending with others, in MASs #161~165; this information is coded in the beacons. By collecting the information carried in beacons, devices learn how to transmit data or respond to others' requirements.

3.1.3.1 Beacon Group

Devices may announce their own BPST and BP length in the BP. The BPST is used mainly to synchronize the devices in the network, and the BP length indicates the number of beacon slots occupied, which also reflects the number of devices within range. The BP length cannot exceed mMaxBPLength = 96.
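The relation between the BP length in beacon slots and the number of MASs it occupies follows from the figures given in this chapter (three beacon slots per MAS, at most 32 MASs). The sketch below works it out; the function name `bp_mas_count` is hypothetical, and signaling slots are assumed to be counted within the BP length.

```python
BEACON_SLOTS_PER_MAS = 3
MAX_BP_MAS = 32
M_MAX_BP_LENGTH = BEACON_SLOTS_PER_MAS * MAX_BP_MAS    # 96 beacon slots


def bp_mas_count(bp_length: int) -> int:
    """Number of MASs covered by a BP of bp_length beacon slots (rounded up)."""
    if not 0 <= bp_length <= M_MAX_BP_LENGTH:
        raise ValueError("BP length must be between 0 and mMaxBPLength")
    return -(-bp_length // BEACON_SLOTS_PER_MAS)        # ceiling division


if __name__ == "__main__":
    print(M_MAX_BP_LENGTH)
    print(bp_mas_count(4))
    print(bp_mas_count(96))
```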

A device that communicates with others should listen for beacons during the BP length it announced in the last super-frame.

A beacon group is defined as the set of devices that are one hop away from a device and have the same BPST, which means the beacon group may differ from device to device. The extended beacon group is the union of a device's beacon group and the beacon groups of all devices in that beacon group, as shown in Figure 3-6.
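Both definitions reduce to simple set operations, as the following sketch shows. The `heard` map (which devices each device hears one hop away) and the `bpst` map are hypothetical inputs invented for the example; the sketch also assumes that a device's own beacon group does not contain the device itself.

```python
def beacon_group(device: str, heard: dict, bpst: dict) -> set:
    """One-hop neighbors of `device` that announce the same BPST."""
    return {n for n in heard[device] if bpst[n] == bpst[device]}


def extended_beacon_group(device: str, heard: dict, bpst: dict) -> set:
    """Union of the device's beacon group and its members' beacon groups."""
    group = beacon_group(device, heard, bpst)
    extended = set(group)
    for member in group:
        # Each member's group contains `device` too, so the union includes it.
        extended |= beacon_group(member, heard, bpst)
    return extended


if __name__ == "__main__":
    # Chain topology A - B - C: A and C cannot hear each other.
    heard = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    bpst = {"A": 0, "B": 0, "C": 0}
    print(beacon_group("A", heard, bpst))
    print(extended_beacon_group("A", heard, bpst))
```

In this chain example, device C is outside the beacon group of A but inside its extended beacon group, which is why the two sets differ from device to device.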

If a device does not receive a beacon from another device for more than 3 consecutive super-frames, it considers that device to be no longer in the network.
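The lost-beacon rule can be sketched as a per-neighbor counter that is reset whenever a beacon is heard. The `NeighborTable` class and its method names are hypothetical; only the threshold of 3 consecutive super-frames comes from the text.

```python
MAX_LOST_BEACONS = 3    # threshold stated in the text


class NeighborTable:
    """Tracks, per neighbor, how many consecutive super-frames had no beacon."""

    def __init__(self):
        self.missed = {}

    def end_of_superframe(self, heard: set) -> None:
        """Update the counters with the set of devices heard in this super-frame."""
        for dev in heard:
            self.missed[dev] = 0                   # heard again: reset the counter
        for dev in list(self.missed):
            if dev not in heard:
                self.missed[dev] += 1
                if self.missed[dev] > MAX_LOST_BEACONS:
                    del self.missed[dev]           # considered gone from the network

    def neighbors(self) -> set:
        return set(self.missed)


if __name__ == "__main__":
    table = NeighborTable()
    table.end_of_superframe({"A", "B"})
    for _ in range(4):                             # B stays silent for 4 super-frames
        table.end_of_superframe({"A"})
    print(table.neighbors())
```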

Figure 3-6 Beacon group and extended beacon group [4]

3.1.3.2 Beacon Transmission And Reception

Before transmitting any non-beacon frame, a device should scan for beacons on the chosen channel for at least one super-frame. If it cannot hear any beacon from a neighbor, it creates a new BP and sends its beacon in the first beacon slot right after the signaling slots. Otherwise, it uses the same BP as its neighbors and transmits its beacon in a beacon slot randomly selected from up to 8 beacon slots after the last unavailable beacon slot, but not beyond the end of the BP. If there is no available
