Chapter 1 Introduction
1.5 Thesis Organization
The organization of this thesis is described as follows:
Chapter 2 introduces the emulation platform, the MPEG-21 Resource Delivery Testbed, which is the basis of the experimental environment of this thesis. Some features of the Testbed are also described in this chapter.
Chapter 3 addresses the problem of wireless video streaming and presents the architecture of the core control algorithm. A detailed explanation of the proposed control scheme is included. A migration to content-aware streaming is also described in this chapter.
Chapter 4 presents experimental results based on the proposed control algorithm. Both end-to-end and system-level performance are presented to demonstrate the efficiency of the proposed control scheme.
Chapter 5 gives the conclusions and possible future work based on this thesis.
Chapter 2 Overview of MPEG-21 Resource Delivery Testbed and cdma2000 1x-RTT System
This chapter introduces the architecture of the MPEG-21 resource delivery testbed, which is the emulation platform for the experiments in this thesis. Characteristics such as buffer configuration, rate control, traffic shaping, and packetization are also discussed. Standard features such as the Real Time Streaming Protocol (RTSP) / Session Description Protocol (SDP) control protocols are only skimmed, as they affect the end-to-end performance only slightly. In addition, this chapter describes some streaming-related features of the cdma2000 1x-RTT (Radio Transmission Technology) mobile system, including the frame error rate, physical channel (FCH and SCH) assignments, and RLP retransmission. The physical-layer parameters follow those used in [26], and some assumptions are set in the same fashion.
A mixed-traffic system environment is also built to perform the system-level simulation.
2.1 Overview of MPEG-21 Resource Delivery Testbed
This section gives an overview of the MPEG-21 resource delivery testbed (abbreviated as Testbed in the following text). The overall architecture is introduced first, and each component is then explained functionally. This section does not cover implementation details, because the MPEG-21 resource delivery testbed is just one example of existing streaming systems. Common features of the Testbed, such as packetization, rate control, and rate shaping, are described.
2.1.1 Overall Architecture
The overall architecture of the Testbed is shown in Figure 2-1. There are four major components in this streaming system: server, client, network interface, and network emulator. The server plays the role of a normal web server that waits for subscriptions to multimedia content. The server also performs some sender-based controls that provide better quality of service for users. The client can be any subscriber that can request a subscription from the server, i.e., one that uses the same protocols for negotiation during streaming. The network interface defined in the Testbed is the session-layer interface that uses RTP/RTCP for real-time delivery of the content. The network emulator emulates a channel whose behavior can be specified through a set of parameters.
[Figure 2-1 diagram; block labels: Media, Control, Network Interface, Network Emulator, Client]
Figure 2-1. Overall architecture of the MPEG-21 Testbed [8]
2.1.2 Streaming Server
There are currently five functional blocks in the streaming server of the Testbed system: the media database, the Digital Item Adaptation (DIA) engine, the streamer, the QoS decision module, and the server controller. The media source is first encoded by the offline media encoder and archived in storage. The coded bitstream is then managed by the media database, which supports segmentation of the media into resource units (in the terminology of the MPEG-21 standard). The segmented bitstream is then retrieved by the DIA engine, which performs resource adaptation according to the external descriptions (Content Description Item (CDI) and conteXt Description Item (XDI)); both static and dynamic descriptions can be resolved. For video, resource adaptation refers to the rate control and rate shaping of the scalable bitstream. The streamer is primarily in charge of sending the adapted resource smoothly. To this end, a leaky-bucket traffic shaping algorithm is embedded in the streamer. The details of bitstream segmentation, resource adaptation, and traffic shaping are explained in the following subsections. The functionality of each functional block on the server side is as follows:
• Offline encoder: Compress the source sequence into a scalable bitstream
• Media database: Manage the coded bitstream and segment it into video packets
• DIA engine: Provide resource and description adaptation capability
• Streamer: Shape the transmission traffic in a smooth way
• QoS decision: Analytically compute the available transmission rate for each media based on the estimation of available channel rates
Bitstream segmentation (Media Database)
Because it has direct access to the bitstream, the media database can retrieve embedded information by parsing it. During encoding, resynchronization markers are inserted into the bitstream to improve error resilience.
Figure 2-2 illustrates the MPEG-4 FGS bitstream structure in terms of the wide-sense resynchronization markers (Video Object Plane (VOP) start code, bit-plane start code (BPSC), and resynchronization marker (RM)).
[Figure 2-2 diagram; labels: Sequence header, VOP header, RM]
Figure 2-2. MPEG-4 FGS bitstream structure in terms of resynchronization markers
The bitstream segmentation of the MPEG-4 FGS structure follows the recommendation in [27]. This segmentation helps prevent start-code emulation (i.e., incorrectly recognizing a received symbol as another valid symbol because of errors), because it prohibits a wide-sense resynchronization marker from being split across separate packets. It improves error resilience and makes it easier to find the next synchronization point for decoding when packet loss occurs. For the MPEG-4 FGS bitstream structure, the corresponding segmentation is depicted in Figure 2-3. There are five additional types of segments in the enhancement-layer bitstream owing to the use of the bit-plane start code. The segments shown in Figure 2-3 do not suffer from start-code emulation even if the packet loss rate is high. Note that each segment mentioned here is not necessarily an RTP packet. However, in the implementation of the Testbed, no adaptive packetization is used, and hence each segment is mapped to one RTP packet.
[Figure 2-3 diagram; labels: VOP, segment types occurring in both BL and EL, segment types occurring in EL only]
Figure 2-3. Representation of the bitstream segmentation for MPEG-4 FGS
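The idea that a segment boundary never falls inside a marker can be illustrated with a small sketch. It is not the segmentation of [27]: it only handles byte-aligned start codes that share the 0x000001 prefix (such as the VOP start code 0x000001b6), it ignores resynchronization markers, and the function name and the toy bitstream are hypothetical.

```python
START_PREFIX = b"\x00\x00\x01"  # common prefix of byte-aligned MPEG-4 start codes


def split_at_start_codes(bitstream: bytes) -> list[bytes]:
    """Split so that every boundary falls immediately before a start code."""
    positions = []
    i = bitstream.find(START_PREFIX)
    while i != -1:
        positions.append(i)
        i = bitstream.find(START_PREFIX, i + len(START_PREFIX))
    if not positions:
        return [bitstream] if bitstream else []
    segments = []
    if positions[0] > 0:
        # Bytes preceding the first start code form their own segment.
        segments.append(bitstream[:positions[0]])
    for start, end in zip(positions, positions[1:] + [len(bitstream)]):
        segments.append(bitstream[start:end])
    return segments


# Toy data: two "VOPs", each starting with the VOP start code 0x000001b6.
toy = bytes.fromhex("000001b6aabb" "000001b6ccddee")
segments = split_at_start_codes(toy)
```

Because each segment begins with a complete start code, the loss of one segment leaves the start codes of the remaining segments intact, which is the property the text relies on for resynchronization.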
Resource Adaptation (DIA engine)
The DIA engine performs resource adaptation based on the CDI and XDI descriptions. For video streaming, the XDI is generated from the network profile to describe the available channel bandwidth over a time interval. The resulting XDI is fed into the DIA engine to perform rate adaptation of the scalable-coded bitstream. Rate adaptation can be broken into two steps: rate control and rate shaping. The purpose of rate control is to compute the available bit budget for a resource unit (e.g., one VOP). The rate control for video is similar to that proposed in [28], which is based on a weighted-average scheme. The formula is as follows:
$$R_{vop}(t) = R(t) \cdot \frac{w_{vop}}{w_I I_{sec} + w_P P_{sec} + w_B B_{sec}} \qquad (2.1)$$
where R(t) is the available channel rate at time t, w_vop is the weighting factor of the associated VOP type, and I_sec, P_sec, and B_sec are the numbers of VOPs of each type in the current second. The weighting factors w_I, w_P, and w_B are set to 1, 1, and 0.6, respectively. The advantage of this rate control algorithm lies in its low complexity. However, the resulting visual quality depends on the selection of the weighting factors, and the quality varies among VOP types. Furthermore, better visual quality may be achieved by R-D optimizing the streaming sequence, but this may require additional computational and/or storage complexity.
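As a worked illustration of Equation (2.1), the following sketch computes the per-VOP bit budget for one second of video. The channel rate of 384 kbit/s and the VOP counts are hypothetical values chosen only for the example; the weights are those stated above.

```python
def vop_bit_budget(channel_rate_bps, vop_type, counts, weights=None):
    """Bit budget for one VOP under the weighted-average rule of Eq. (2.1)."""
    if weights is None:
        weights = {"I": 1.0, "P": 1.0, "B": 0.6}  # w_I, w_P, w_B from the text
    # Denominator: weighted number of VOPs scheduled in this second.
    denom = sum(weights[t] * counts.get(t, 0) for t in weights)
    if denom <= 0:
        raise ValueError("no VOPs scheduled in this second")
    return channel_rate_bps * weights[vop_type] / denom


# Hypothetical second of video: 1 I-VOP, 9 P-VOPs, 20 B-VOPs at 384 kbit/s.
counts = {"I": 1, "P": 9, "B": 20}
budgets = {t: vop_bit_budget(384_000, t, counts) for t in counts}
total = sum(budgets[t] * counts[t] for t in counts)
```

With these numbers the weighted count is 1 + 9 + 0.6 × 20 = 22, so an I- or P-VOP receives 384000/22 bits and a B-VOP receives 0.6 times that amount. Summed over all 30 VOPs, the budgets add up to the channel rate, which is the normalization the denominator of Equation (2.1) provides.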
The computed bit budget is passed to the rate shaping module, which truncates the enhancement-layer bitstream. Note that the enhancement-layer bitstream can be truncated at any byte boundary, provided that symbol emulation is avoided. For example, suppose the last byte of a truncated resource unit is “0x00” and the next resource unit starts with a VOP start code (0x000001b6). The concatenated bitstream would then contain “0x00000001b6…”, which is not a valid codeword in the MPEG-4 bitstream syntax.
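One simple way to respect this constraint, shown here only as a sketch, is to back the cut point off until the last kept byte is nonzero. This back-off rule is an assumption made for illustration; the text does not state which rule the Testbed applies, and the function name and sample bytes are hypothetical.

```python
VOP_START_CODE = bytes.fromhex("000001b6")


def truncate_enhancement_layer(unit: bytes, budget_bits: int) -> bytes:
    """Cut `unit` to at most `budget_bits`, byte-aligned, with no trailing 0x00."""
    cut = min(len(unit), max(budget_bits, 0) // 8)
    # Back off while the last kept byte is zero, so that the zeros cannot
    # merge with the leading zeros of the start code that follows.
    while cut > 0 and unit[cut - 1] == 0x00:
        cut -= 1
    return unit[:cut]


unit = bytes.fromhex("a1b200c3d4")
shaped = truncate_enhancement_layer(unit, 24)  # 3 bytes allowed, third is 0x00
stream = shaped + VOP_START_CODE
```

Here the budget allows three bytes, but the third byte is 0x00, so only two bytes are kept and the following start code keeps exactly its own two leading zero bytes.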
Traffic shaping (Streamer)
The major purpose of traffic shaping is to keep the transmission as smooth as possible, in terms of either bit rate or number of packets, depending on the structure of the underlying network. No apparent impact appears when the transmission rate is low. However, without traffic shaping, packet loss results when the traffic is high during some time interval. If the transmission rate is high and no traffic shaping is used, the data are sent in a burst whose duration is short compared to the total transmission time, so the instantaneous transmission rate can exceed the rate that the underlying network tolerates. An intuitive solution is to distribute the streamed data evenly over the total transmission time. Typical solutions are the “Leaky Bucket” and “Token Bucket” algorithms. The Testbed adopts the leaky bucket for traffic shaping; the concept is illustrated in Figure 2-4.
One can imagine a buffer in the underlying network, i.e., in a lower layer of the protocol stack. The buffer is initially empty, and transmission piles data into the buffer at the transmission rate. Once the buffer fullness reaches a specific value, data begins to “leak” from the buffer at the sustainable rate. With a fixed buffer size, a leaky bucket model can usually be characterized by three parameters: B (buffer size), R (leak rate), and F (initial buffer fullness). These three parameters determine how smoothly a leaky bucket model transports data. To avoid buffer overflow, the buffer size or the leak rate should be large enough. Conversely, the buffer fullness and leak rate should be adjusted appropriately to prevent the buffer from underflowing.
[Figure 2-4 diagram; labels: Buffer Size, Buffer Fullness, Buffer increment rate, Leak rate = Sustainable traffic rate]
Figure 2-4. Representation of typical leaky bucket model
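The role of the three parameters B, R, and F can be made concrete with a discrete-time sketch. It is a simplified model rather than the Testbed streamer: time advances in unit steps, bits that exceed the buffer size are counted as dropped, and the function name and arrival pattern are hypothetical.

```python
def leaky_bucket(arrivals, B, R, F):
    """Simulate one leaky bucket over unit time steps.

    arrivals: bits offered by the sender in each step
    B: buffer size (bits), R: leak per step (bits),
    F: fullness at which leaking starts (bits)
    Returns (bits sent per step, dropped bits, number of underflow steps).
    """
    fullness, leaking = 0, False
    sent, dropped, underflows = [], 0, 0
    for a in arrivals:
        fullness += a
        if fullness > B:                # overflow: the excess is lost
            dropped += fullness - B
            fullness = B
        if not leaking and fullness >= F:
            leaking = True              # start leaking once F is reached
        out = min(R, fullness) if leaking else 0
        if leaking and out < R:
            underflows += 1             # bucket could not fill this slot
        fullness -= out
        sent.append(out)
    return sent, dropped, underflows


# Two bursts of 3000 bits are smoothed into a constant 1000 bits per step.
bursty = [3000, 0, 0, 3000, 0, 0]
sent, dropped, underflows = leaky_bucket(bursty, B=4000, R=1000, F=2000)
```

In this example the output is 1000 bits in every step with no drops and no underflow. Offering a single 6000-bit burst to the same bucket would instead drop 2000 bits, which illustrates why B or R must be large enough for the offered traffic.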
2.1.3 Client
The ultimate goal of client-side processing is to play back the desired content smoothly. However, the limited bandwidth and random errors of the channel complicate good reconstruction. Buffering is especially important because it alleviates synchronization problems and variation in the receiving rate. The Testbed uses two buffers on the client side: a packet buffer and a stream buffer. The packet buffer receives packets from the network interface, while the stream buffer provides additional protection to prevent the packet buffer from underflowing or overflowing. The decoder in the Testbed is an MPEG-4 FGS compatible decoder with error-resilient capability, i.e., the decoder can continue to decode the received bitstream even if the bitstream is corrupted by loss. The error control used in the Testbed is a NACK-based retransmission mechanism. The retransmission monitor checks the packet buffer for lost packets at some frequency, say, once per second, and records the sequence numbers of the lost packets. The loss record is sent from the client to the server via a reliable control channel, and the lost packets are then retransmitted to the client. In addition, the decoder is followed by an output buffer that moderates the decoding speed and playback. The functionality of each component on the client side is summarized as follows:
• Packet buffer: Smooth out the fluctuation of the receiving rate
• Stream buffer: Provide additional protection for the packet buffer against overflow or underflow
• Decoder: Decode the received bitstream in real time with error-resilient capability
• Retransmission monitor: Monitor packet loss in the packet buffer periodically
• Output buffer: Moderate the decoding speed for smooth playback
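One pass of the retransmission monitor can be sketched as a scan for gaps in the received sequence numbers. The sketch assumes 16-bit RTP sequence numbers that wrap around; the function name and the sample data are hypothetical, and the format of the loss record sent to the server is not modeled.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def missing_sequence_numbers(first_seq, last_seq, received):
    """Sequence numbers in [first_seq, last_seq] (modulo 2**16) not in `received`."""
    span = (last_seq - first_seq) % SEQ_MOD
    return [(first_seq + k) % SEQ_MOD
            for k in range(span + 1)
            if (first_seq + k) % SEQ_MOD not in received]


# Packets seen since the previous check, spanning a wrap-around.
received = {65533, 65534, 0, 2, 3}
nack_list = missing_sequence_numbers(65533, 3, received)
```

For this window the monitor would report packets 65535 and 1 as lost. Running such a check once per second, as in the example frequency given above, keeps the control traffic small.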
Buffer configuration
The packet buffer acts as a receiver buffer that handles duplicated and out-of-order packets. From the perspective of rate, the buffer pair regulates the receiving rate, which is strongly correlated with the channel delay/jitter, into the transmission rate of the scalable-coded video clip for smooth decoding. Figure 2-5 shows the interaction among the packet buffer, stream buffer, and decoder. The notation “active” indicates that the component performs its function continuously and concurrently.
In normal operation, we can assume that there is data in both the packet buffer and the stream buffer. The input data rate to this buffer configuration at time t equals the receiving rate, while the output data rate equals the transmission rate at time (t-∆), where ∆ is the pre-roll time. When the network condition is good, the transmission rate equals the available channel rate. In this case, the packet buffer easily overflows because of the high receiving rate. The packet buffer then immediately moves the surplus data (the oldest in time) into the stream buffer for decoding, so no packet loss is incurred.
On the other hand, the packet buffer underflows when the channel condition becomes poor. In this state, the decoder continues to decode the data stored in the stream buffer even if there is no incoming data. When the network condition improves, the stream buffer is refilled with incoming packet data. Note that this buffer design primarily protects against underflow, for two reasons. First, underflow is more likely because the unpredictable channel condition is usually poor. Second, the use of a scalable coding scheme does not overflow the buffer; the play-out rate equals the receiving rate provided that the available channel rate is always sufficient.
[Figure 2-5 diagram; labels: Buffer (Active), Stream buffer, Decoder (Active)]
Figure 2-5. Interaction among packet buffer, stream buffer, and decoder in Testbed
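The spill-over behavior of the buffer pair can be sketched with two queues. This is a simplified model under stated assumptions: capacity is counted in packets rather than bytes, and the decoder is assumed to fall back to the packet buffer when the stream buffer is empty, a detail the text does not specify. The class and method names are hypothetical.

```python
from collections import deque


class BufferPair:
    """Packet buffer that spills its oldest packets into a stream buffer."""

    def __init__(self, packet_capacity):
        self.packet_capacity = packet_capacity
        self.packet_buffer = deque()
        self.stream_buffer = deque()

    def receive(self, packet):
        self.packet_buffer.append(packet)
        # On overflow, move the oldest packets to the stream buffer
        # instead of discarding them.
        while len(self.packet_buffer) > self.packet_capacity:
            self.stream_buffer.append(self.packet_buffer.popleft())

    def next_for_decoder(self):
        # The stream buffer holds the oldest data, so it is drained first.
        if self.stream_buffer:
            return self.stream_buffer.popleft()
        if self.packet_buffer:
            return self.packet_buffer.popleft()  # assumed fallback
        return None  # both buffers are empty: underflow


pair = BufferPair(packet_capacity=2)
for seq in [1, 2, 3, 4]:
    pair.receive(seq)
spilled = list(pair.stream_buffer)  # oldest packets moved on overflow
drained = [pair.next_for_decoder() for _ in range(5)]
```

After four arrivals into a two-packet buffer, packets 1 and 2 sit in the stream buffer, and the decoder still reads all four packets in order before it runs dry.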
2.1.4 Network Interface
The network interface in the Testbed mainly encapsulates resource units into RTP packets for transmission. For video, each adapted resource unit is retrieved from the DIA engine by the streamer and then sent to the integrated packet buffer. The integrated packet buffer consists of a buffer that stores packet data for retransmission and the network interface used for packet delivery. Once a resource unit arrives at the integrated packet buffer, an RTP packetizer encapsulates the resource unit as a payload and prepends the associated RTP header, forming a valid RTP packet for video content delivery.
A detailed description of the RTP packet format can be found in [29]. Figure 2-6 shows the protocol stack used in the Testbed system. Video data is delivered via the RTP channel, which runs over best-effort UDP and provides no reliability. As mentioned in Section 2.1.3, errors caused by packet loss are recovered through retransmission requests from the client to the server. To make sure that a retransmission request is correctly delivered to the server, RTSP messaging goes through the reliable TCP channel; this is affordable because the traffic used for retransmission requests is minor. Note that retransmission may sometimes fail because of the timing constraint of continuous playback, i.e., a retransmitted packet is useful only if it arrives before it is due for playback.
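The encapsulation step can be illustrated by building the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRC entries) in front of a resource unit. This is a minimal sketch, not the Testbed packetizer: the payload type 96 (a dynamic payload type), the SSRC value, and the function name are assumptions made for the example.

```python
import struct


def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prepend a 12-byte fixed RTP header to `payload`."""
    byte0 = 2 << 6                                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit and payload type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,                  # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,        # 32-bit timestamp
                         ssrc & 0xFFFFFFFF)             # 32-bit source identifier
    return header + payload


# One resource unit (here a dummy payload starting with a VOP start code).
packet = build_rtp_packet(b"\x00\x00\x01\xb6demo", seq=7,
                          timestamp=90000, ssrc=0x1234ABCD, marker=True)
```

The sequence number written here is the field that the client-side retransmission monitor inspects when it looks for lost packets.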
[Figure 2-6 diagram; layers from bottom to top: Physical Layer, Data Link Layer, Network Layer (IP), TCP and UDP, RTSP/SDP and RTP/RTCP, Application Layer (Control Command; Layered Video Data: Base Layer, Enhancement Layer)]
Figure 2-6. Protocol stack in Testbed streaming system
2.1.5 Network Emulator
The principal use of the network emulator is to produce the heterogeneous channel environments that a streaming server-client pair often experiences. The Testbed adopts a Linux-based, public-domain network emulator named NIST Net, developed by the National Institute of Standards and Technology (NIST) of the United States. In addition to supplying a heterogeneous channel environment, its controllable channel parameters are another significant feature, allowing users to create their own channel characteristics. The controllable parameters comprise the bandwidth, random packet loss probability, delay, jitter, and size of the network queue. Users can adjust these parameters using a text-file-based network profile, and the minimum time interval between two consecutive parameter settings is set to 1 second.
The architecture of NIST Net is illustrated in Figure 2-7. The logical operation that NIST Net performs is described as follows [30]. Once a packet reaches the computer on which NIST Net is installed, the packet intercept code activates and seizes control of the IP packet type handler. All IP packets received by the network device are passed directly to the NIST Net module. Packet matching determines whether and how an incoming IP packet should be handled by packet processing, which includes dropping, duplicating, and delaying a packet. The processed packet is then passed to the Linux IP-level code. The fast timer takes control of the system real-time clock and uses it as a timer source for scheduling delayed packets. The fast timer reprograms the clock to interrupt at a sufficiently high rate for precise packet delay.
Figure 2-7. Architecture of NIST Net [30]
2.1.6 Summary
This section introduced the main architecture and the operation of each functional block in the MPEG-21 Testbed resource delivery system. The Testbed is a real streaming system in which users can create their own video content and deliver it over a network. Some messaging mechanisms, which do not significantly influence the end-to-end performance, were touched on only lightly. In addition, the network parameters are controllable and can be specified by a network profile generated for any underlying network (both wireline and wireless).
2.2 cdma2000 1x-RTT System
Wireless standards often cover a large range of technical descriptions distributed across both the physical layer and the data-link layer. Nevertheless, this thesis focuses on video data transmission over this underlying network, especially for streaming of scalable video. In this regard, we discuss the layering structure of the cdma2000 1x-RTT system and explain the impact of physical-layer attributes on the upper-layer parameters. The term “1x” indicates that a single carrier is used in the system deployment. The bandwidth variation can be derived directly from the rate assignment of the system, i.e., how many Supplementary Channels (SCHs) are assigned to a data user. In addition, the upper-layer packet loss probability can be computed from the frame error rate of the wireless link by mapping each upper-layer packet onto lower-layer frames. Furthermore, the delay and jitter caused by the wireless transmission can also be obtained from the frame error rate and the number of associated