TCP-friendly congestion control for the fair streaming of scalable video


Sheng-Shuen Wang *, Hsu-Feng Hsiao

Department of Computer Science, National Chiao Tung University, 1001 University Road, Hsinchu, Taiwan

Article info

Article history:

Available online 24 February 2010

Keywords:

Bandwidth estimation; Congestion control; Packet train; Scalable video coding; TCP-friendly

Abstract

Dynamic bandwidth estimation serves as an important basis for performance optimization of real-time distributed multimedia applications. The objective of this paper is to develop a TCP-friendly and fair congestion control algorithm which regulates the sending rate robustly by inferring the end-to-end available bandwidth. In addition to network stability, we also consider the characteristics of streaming applications, such as the bandwidth resolution in scalable video coding (SVC), which can achieve fine granularity of scalability at the bit level to fit time-varying heterogeneous networks. The congestion control algorithm is composed of two phases, a start phase and a transmission phase, to better utilize the network resource by subscribing to SVC layers. In the start phase, we analyze the relationship between the one-way delay and the dispersion of packet trains, and then propose an available bandwidth inference algorithm which makes use of these two features without requiring administrative access to the intermediate routers along the network path. Instead of either binary search or fixed-rate bandwidth adjustment of the probing data as proposed in the literature, a top-down approach is proposed to infer the initial available bandwidth robustly and much more efficiently. After acquiring the initial available bandwidth, the missions of the transmission phase include the fair adaptation of the sending rate by progressive probing and also the accommodation of the network resource to TCP flows.

In case of unavoidable network congestion, we unsubscribe scalable video layers according to the packet loss rate, instead of dropping only one layer at a time, to rapidly adapt the streaming service to the channel and also to avoid penalizing the other flows at the same bottleneck. In addition, the probing packets for the estimation of the available bandwidth are encapsulated with RTP/RTCP. The simulations show that the proposed congestion control algorithm for real-time applications fairly utilizes network bandwidth without hampering the performance of the existing TCP applications.

1. Introduction

There has been explosive growth of emerging audio and video streaming applications recently. Many applications of multimedia communication over IP networks, such as VoIP, multimedia on demand, IPTV, and video blogs, have rapidly been integrated into our daily life. However, efficient delivery of media streams over the Internet is confronted with many challenges. With the growing demand for multimedia services over the Internet, congestion control plays an important role in avoiding congestive collapse by preventing oversubscription of any network resource of the intermediate nodes. TCP is a connection-oriented unicast protocol that offers reliable data transfer as well as flow and congestion control. However, multimedia streaming protocols have stricter requirements on transfer latency than on reliable delivery. Many TCP-friendly congestion control protocols, which are usually built upon UDP (User Datagram Protocol) with some specified congestion control algorithm, are being developed so that the multimedia flows can behave fairly with respect to the coexistent TCP flows that dominate the network traffic, so as to avoid the starvation of TCP traffic and to prevent the network from congestive collapse. In addition, the RTT unfairness introduced by the AIMD scheme of TCP, which leads to unequal bandwidth distribution among competing flows with different round-trip times over the same congested links, should also be taken into consideration. However, most of the existing streaming protocols consider neither the fairness nor the characteristics of video streams, which might affect the quality of streaming services significantly.

According to the general rate-distortion curve shown in Fig. 1, the larger the acquired bit rate, the better the displayed quality. Some works focus on optimizing rate-distortion by multipath routing [28], or by active queue management and receiver feedback [27]. As to the various video coding technologies, bit-stream scalability is a desirable feature for many multimedia applications so that graceful adaptation to transmission requirements can be achieved. Scalable Video Coding (SVC) [13], as amendment G to the H.264/MPEG-4 AVC standard created by the Joint Video Team (JVT), intends to encode a video sequence once such that the encoded bit stream allows a diversity of receivers to acquire and to decode a subset of the encoded bit stream without the need for transcoding. Scalable video coding not only enables efficient distribution of real-time multimedia streaming over heterogeneous networks but is also a most promising solution for one-to-many congestion control over multicast networks. To fulfill the above requirements, a scalable video bit stream contains a non-scalable base layer which is in compliance with H.264/MPEG-4 AVC and one or more enhancement layers which may result from the spatial, temporal or fidelity scalabilities of the scalable tools. The third octet in a Network Abstraction Layer (NAL) unit header records the layer identifications, including temporal_level (3 bits), dependency_id (3 bits), and quality_level (4 bits). Thus, there can be $2^3 \times 2^3 \times 2^4$ video layers. Besides, the layers are subscribed layer by layer, and the bit rate allocation between neighboring layers may vary significantly according to the streaming applications.

* Corresponding author.

E-mail addresses: sswang@cs.nctu.edu.tw (S.-S. Wang), hillhsiao@cs.nctu.edu.tw (H.-F. Hsiao).

Contents lists available at ScienceDirect: Computer Communications

Due to the large number of possible scalable video layers and the various bit rate allocations between layers, how to quickly converge to the time-varying available bandwidth without harming the existing flows under fair competition becomes a critical factor in real-time video streaming. As to the information of available bandwidth, Multi-Router Traffic Grapher (MRTG) has been used with the Simple Network Management Protocol (SNMP) to obtain the information from intermediate routers. However, this is often difficult if not impossible due to various technical difficulties and privacy considerations, or due to an insufficient level of measurement resolution [12]. One-way delay trend detection is utilized in Pathload [3] to measure the end-to-end available bandwidth by sending periodic packet trains. Since each packet train yields only one decision, namely whether the probed bit rate is greater or smaller than the available bandwidth, binary search is usually adopted to adapt the probing rate to the available bandwidth gradually. In contrast to acquiring the initial available bandwidth over unicast networks, the layered congestion control algorithm proposed in BIC [14], similar to Pathload, generates periodic burst packet trains from the upper layer over a multicast network so that the probing periods of the receivers can be synchronized. Each receiver uses one-way delay trend detection to make the decision of joining one additional layer at a time, and leaves a scalable video layer when the packet loss rate exceeds a specified threshold. As a result, it is not suitable for receivers that might need to join or leave several scalable video layers in a short time due to dramatically fluctuating channels. In [11], a hierarchical sub-layer probing scheme which adopts coarse-to-fine layer partitioning to improve the efficiency of the probing interval was proposed. It might help to reduce the number of probing periods when compared to BIC, but on the other hand, the probing packets might overshoot easily. Besides, network-layer multicast is still not widely deployed due to cost and management problems. Furthermore, it does not take TCP-friendliness into account because only one probing rate is allowed for each synchronization point.

In this paper, we focus on TCP-friendly congestion control for fair end-to-end video streaming by inferring the available bandwidth. We regulate the sending rate by using probing packets periodically such that a client running SVC applications can subscribe to video layers gradually. In addition, we also consider both intra-protocol and inter-protocol fairness, especially under different RTTs. Furthermore, RTP/RTCP [16], which relies on additional protocols to provide congestion control and to guarantee QoS for real-time multimedia streaming, is integrated with the proposed congestion control algorithm. The remainder of this paper is organized as follows. In Section 2, the background and related works about bandwidth estimation and congestion control algorithms are presented. In Section 3, we describe our TCP-friendly congestion control algorithm with the consideration of RTT-fairness. In Section 4, the performance of our proposed algorithm is evaluated, and the conclusion of this paper is given in Section 5.

2. Background and related works

In this section, we introduce the bandwidth estimation model based on the one-way delay (OWD) and the packet dispersion by sending probing packets. The selected TCP-friendly congestion control protocols in the literature will also be presented.

2.1. Bandwidth estimation

Among the results of emerging research in bandwidth estimation, link capacity and available bandwidth are of interest. The former is constrained by the underlying transmission bandwidth. Given that packets are delivered from sender S to its receiver R through a fixed network path P, which consists of a sequence of store-and-forward links, the narrow link of P is defined as the link with minimum capacity along the path. Assuming $C_i$ is the capacity of link $i$ and there are $H$ hops in P, the capacity $C$ of the narrow link is:

$$C = \min_{i=1 \ldots H} C_i. \tag{1}$$

The packet pair technique [1], with two back-to-back packets of packet size $S$, is usually used to measure the capacity by observing the dispersion $d$ ($d = S/C$) after the pair has passed through the narrow link, provided that there is no background traffic.

On the other hand, the available bandwidth depends on the traffic load of the path and is typically a time-varying random variable. Assuming $\lambda_i(t)$ is the traffic load of link $i$ at time $t$, the available bandwidth $A_i(t, T)$ of link $i$ is the average unused bandwidth over some time interval $T$, as shown in (2).

$$A_i(t, T) = \frac{1}{T} \int_{t}^{T+t} \left( C_i - \lambda_i(t) \right) dt. \tag{2}$$

The available bandwidth $A(t, T)$ of the tight link, which is defined as the link with minimum available bandwidth along a path, is:

$$A(t, T) = \min_{i=1 \ldots H} A_i(t, T). \tag{3}$$
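The difference between the narrow link of (1) and the tight link of (3) can be illustrated with a small sketch. It assumes that the integral in (2) is approximated by averaging equally spaced load samples; the function names, the three-hop path, and the load values are hypothetical and only serve the illustration.

```python
from statistics import mean

def narrow_link_capacity(capacities):
    """Eq. (1): capacity of the narrow link, C = min_i C_i."""
    return min(capacities)

def link_available_bandwidth(capacity, load_samples):
    """Eq. (2), discretized: average unused bandwidth of one link.

    load_samples are traffic-load readings (same unit as capacity) taken at
    equally spaced instants inside the averaging window T (an assumption).
    """
    return mean(capacity - load for load in load_samples)

def path_available_bandwidth(capacities, loads_per_link):
    """Eq. (3): available bandwidth of the tight link, A = min_i A_i."""
    return min(
        link_available_bandwidth(capacity, loads)
        for capacity, loads in zip(capacities, loads_per_link)
    )

# Hypothetical three-hop path (Mbps); the load samples are made up.
capacities = [100.0, 40.0, 60.0]
loads = [[30.0, 50.0], [10.0, 14.0], [45.0, 51.0]]

C = narrow_link_capacity(capacities)              # set by the second hop
A = path_available_bandwidth(capacities, loads)   # set by the third hop
```

In this made-up path the narrow link (second hop) and the tight link (third hop) are different links, which is why capacity and available bandwidth have to be estimated separately.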


Several studies have been devoted to available bandwidth estimation in recent years. Apart from mathematical network models that are based on a specified network behavior or protocol [2], probing-based methods by means of packet train [1] analysis are widely adopted to infer network utilization. A packet train is a sequence of probing packets of equal packet size arranged either back-to-back or with some specified inter-packet dispersion. According to the analysis approach, there are two major types of packet train based algorithms: the one-way delay (OWD) based analysis model and the dispersion based analysis model.

2.2. One-way delay based analysis model

Given that a sender transmits $K$ packets of packet size $S$ to its receiver, and assuming that the propagation delay can be ignored, the OWD $D^k$ of the $k$-th packet can be modeled as the summation of the transmission delay ($S/C_i$), processing delay ($\sigma_i$), and queuing delay ($d_i^k$) of each and every link ($i = 1 \ldots H$) along the path.

$$D^k = \sum_{i=1}^{H} \left( \frac{S}{C_i} + \sigma_i + d_i^k \right). \tag{4}$$

The OWD difference between adjacent packets can be expressed as the contribution from the queuing delay, as shown in (5).

$$\Delta D^k = D^{k+1} - D^k = \sum_{i=1}^{H} \left( \Delta d_i^k \right). \tag{5}$$

The idea of the OWD based model comes from the following proposition [7]:

$$\text{If } R_p > A,\ \Delta D^k > 0; \qquad \text{if } R_p \le A,\ \Delta D^k = 0.$$

$R_p$ stands for the probing rate and $A$ is the available bandwidth of a given path. The proposition states that if the probing rate is not higher than the available bandwidth of the path, the arrival rate at the receiver will match the probing rate at the sender. On the other hand, if the probing rate is higher than the available bandwidth, network queues will build up and the probing packets will be delayed ($\Delta D^k > 0$). By observing the delay trend of OWD, many algorithms, such as Pathload [3], pathChirp [4], Pathbw [5], TOPP [6] and SLoPS [7], search for the turning point at which the sending rate and the receiving rate start to match.
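The proposition can be reproduced with a very small queue model. The sketch below is not the model used in the cited tools: it assumes a single FIFO hop in which the cross traffic behaves like a fluid, so that every probe packet is drained at the available bandwidth. The function names and the numeric values are illustrative only.

```python
def one_way_delays(probe_rate, avail_bw, packet_bits=12000, n_packets=10):
    """OWDs (seconds) of a packet train through one FIFO hop.

    Simplifying assumption: cross traffic is treated as a fluid, so each
    probe packet is served at the available bandwidth avail_bw (bits/s).
    """
    gap = packet_bits / probe_rate      # sender-side inter-packet spacing
    service = packet_bits / avail_bw    # time to drain one probe packet
    delays = []
    prev_departure = 0.0
    for k in range(n_packets):
        arrival = k * gap
        departure = max(arrival, prev_departure) + service
        delays.append(departure - arrival)
        prev_departure = departure
    return delays

def owd_differences(delays):
    """Delta D^k = D^(k+1) - D^k for consecutive packets."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]

over = owd_differences(one_way_delays(40e6, 16e6))    # R_p > A
under = owd_differences(one_way_delays(10e6, 16e6))   # R_p <= A
```

Under these assumptions every element of `over` is positive, because the queue grows with each packet, while the elements of `under` are zero up to floating-point rounding.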

2.2.1. Dispersion based analysis model

The dispersion based analysis model exploits the inter-arrival time between two successive probing packets at the receiver. Let $d_{in}$ and $d_{out}$ be the time dispersion of a packet pair before and after passing through a single hop, respectively. Assume that the pair falls in the joint queuing region (JQR) [8], i.e., the network queue does not become empty between the departure time of the first probing packet of the pair and the arrival time of the second probing packet. Given the capacity $C$ of the tight link, the available bandwidth $A$ ($= C - \lambda$) can be estimated by solving the following equation for the traffic load $\lambda$ [9].

$$d_{out} = \frac{S}{C} + \frac{\lambda}{C}\, d_{in}. \tag{6}$$

However, if these two packets do not fall into the same queuing period (i.e., they are in the disjoint queuing region (DQR) [8]), the packet dispersion before and after passing through a hop will be equal ($d_{out} = d_{in}$). Bandwidth estimation tools such as IGI [8] and Spruce [10] are two examples that benefit from this observation. We denote $R_i = S/d_i$ as the departure rate after passing through hop $i$ with packet time dispersion $d_i$. For probing packets passing through hop $i$ with arrival rate $R_{i-1}$ to hop $i$ and departure rate $R_i$ from the same hop, $R_{i-1}$ and $R_i$ have the following relationship:

$$R_i = R_{i-1}\, \frac{C_i}{\lambda_i + \max\{R_{i-1}, A_i\}} \tag{7}$$

where $\lambda_i$ is the traffic load of hop $i$.

Obviously, $R_i$ is monotonically non-increasing in $i$ because the departure rate is less than or equal to the arrival rate ($R_{i-1} \ge R_i$), with strict inequality when the arrival rate $R_{i-1}$ is greater than the available bandwidth $A_i$. In addition, the available bandwidth $A$ is the minimum of all $A_i$; thus we can deduce that:

$$R_{in} \ge R_{out} \ge A. \tag{8}$$
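The hop-by-hop relation (7) and the resulting bound (8) can be checked numerically. The following sketch applies (7) along a path; the six link capacities are those quoted for the simulated path in Section 3.1, whereas the per-hop loads are hypothetical values chosen so that the 40 Mbps hop is the tight link. The function names are introduced here for illustration.

```python
def departure_rate(arrival_rate, capacity, load):
    """Eq. (7): R_i = R_{i-1} * C_i / (lambda_i + max(R_{i-1}, A_i))."""
    avail = capacity - load
    return arrival_rate * capacity / (load + max(arrival_rate, avail))

def rates_along_path(probe_rate, capacities, loads):
    """Return [R_0, R_1, ..., R_H] for a train entering at probe_rate."""
    rates = [probe_rate]
    for capacity, load in zip(capacities, loads):
        rates.append(departure_rate(rates[-1], capacity, load))
    return rates

# Link capacities in Mbps; the loads are hypothetical.
capacities = [100, 75, 55, 40, 60, 80]
loads = [20, 15, 10, 24, 12, 16]
available = min(c - l for c, l in zip(capacities, loads))

rates = rates_along_path(40.0, capacities, loads)
r_in, r_out = rates[0], rates[-1]
```

With these numbers the rate is unchanged on every hop whose available bandwidth exceeds the arrival rate and drops only at the tight link, so that `r_in >= r_out >= available` holds as stated in (8).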

2.2.2. One-way delay vs. packet dispersion

The relation between OWD and packet dispersion can be expressed briefly as in Fig. 2. When the probing rate is less than the available bandwidth, we will most likely have $\Delta D^k = 0$ and $\Delta d = d_{out} - d_{in} = 0$. In other words, there is no delay trend of OWD, and the packet dispersion of the packet pair measured at the sender and at the receiver is the same. On the other hand, when the probing rate is greater than the available bandwidth, the cross traffic can enlarge the dispersion and will cause the queuing delay to increase. Assume that the transmission start times of packets $i$ and $i+1$ are $s_i$ and $s_{i+1}$, and that their arrival times at the receiver are $r_i$ and $r_{i+1}$. We can infer the dispersion difference $\Delta d_i = (r_{i+1} - r_i) - (s_{i+1} - s_i) = (r_{i+1} - s_{i+1}) - (r_i - s_i) = \Delta D^i$. If $\Delta D^i$ is greater than 0, the OWD of the packet train has an increasing trend. Therefore, $\Delta d_i$ can be viewed as an index of queuing delay. In summary, packet dispersion and OWD are two criteria that can work together to estimate the available bandwidth more precisely.
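The identity between the dispersion difference and the OWD difference can be verified directly from timestamps. The sketch below uses hypothetical integer timestamps in microseconds and a hypothetical function name; it also shows that a constant offset between the sender and receiver clocks cancels out of both quantities.

```python
def dispersion_and_owd_differences(send_times, recv_times):
    """Return (delta_d, delta_D) for consecutive packets of a train."""
    delta_d, delta_D = [], []
    for i in range(len(send_times) - 1):
        d_in = send_times[i + 1] - send_times[i]     # spacing at the sender
        d_out = recv_times[i + 1] - recv_times[i]    # spacing at the receiver
        owd_now = recv_times[i] - send_times[i]
        owd_next = recv_times[i + 1] - send_times[i + 1]
        delta_d.append(d_out - d_in)
        delta_D.append(owd_next - owd_now)
    return delta_d, delta_D

# Hypothetical timestamps in microseconds (integers keep the arithmetic exact).
send = [0, 300, 600, 900]            # sender clock
recv = [5000, 5750, 6500, 7250]      # receiver clock with an arbitrary offset

delta_d, delta_D = dispersion_and_owd_differences(send, recv)
```

Both lists are identical here, and shifting every receiver timestamp by the same amount leaves them unchanged, which is why the trend can be observed without synchronizing the clock offset.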

Available bandwidth can fluctuate dramatically, and thus it is very important for the bandwidth measurement to converge fast and accurately. Previous bandwidth estimation algorithms, such as Pathload [3], which uses binary search to adjust the probing rate for the next iteration, and IGI [8], which updates the probing rate by some fixed step size to inspect whether the probing rate matches the available bandwidth, might be too inefficient in choosing the probing rate for the next iteration, especially in real-time distributed applications, in addition to the possible resolution issue of the estimated bandwidth.

2.3. Congestion control

For most unicast flows that require transferring data reliably and as quickly as possible, one of the straightforward options is to use TCP directly. However, TCP, whose congestion control is mainly based on the AIMD scheme and slow start, cannot utilize the network resource efficiently, especially in the case of networks with a high bandwidth-RTT product. It might be too conservative for the AIMD mechanism of TCP to increase the congestion window linearly per RTT, while at the same time the reduction of the congestion window by half tends to be too drastic. This results in inefficient link utilization. Moreover, some applications such as multimedia streaming can tolerate data loss to some degree and are usually highly delay sensitive.

The Datagram Congestion Control Protocol (DCCP) [29] has been proposed by the Internet Engineering Task Force (IETF) to implement connection setup, teardown, ECN, and feature negotiation, and also to provide a framework that allows applications to choose among different TCP-friendly congestion control profiles, such as TCP-like or TFRC, by CCID. TCP-friendly congestion control algorithms built upon UDP with a suitable rate adaptation mechanism for streaming applications are designed to ensure that the coexisting TCP flows will not be treated unfairly by non-TCP flows. Widmer [18] classified unicast TCP-friendly congestion control algorithms into window-based [20] and rate-based [17][21] algorithms. The algorithms in the window-based category use a congestion window at the sender or at the receiver to ensure TCP friendliness. On the other hand, rate-based congestion control achieves TCP friendliness by adapting the transmission rate according to either an Additive Increase/Multiplicative Decrease (AIMD) scheme or a model-based congestion control algorithm. The AIMD scheme mimics the behavior of TCP, while the model-based one follows Padhye's TCP throughput model [19] instead of a TCP-like AIMD mechanism. The analytical model of TCP-friendly available bandwidth is shown as follows.

$$f(p) = \frac{S}{t_{RTT} \sqrt{\frac{2bp}{3}} + t_{RTO} \left( 3 \sqrt{\frac{3bp}{8}} \right) p \left( 1 + 32p^2 \right)} \tag{9}$$

$S$ is the packet size; $p$ is the packet loss rate; $t_{RTT}$ is the round-trip time; $t_{RTO}$ is the TCP retransmission timeout (RTO), which is usually set to four times the RTT in the experiments; and $b$ is the number of packets acknowledged by a single TCP acknowledgement.
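As a worked illustration of (9), the sketch below evaluates the throughput model for given loss rate and RTT. The function name is hypothetical, the default of four times the RTT for the timeout follows the text, and the default `b = 1` is an assumption made here for the example.

```python
from math import sqrt

def tcp_friendly_rate(packet_size, loss_rate, rtt, rto=None, b=1):
    """Eq. (9): TCP-friendly rate, in packet_size units per second.

    rto defaults to 4 * rtt as stated in the text; b = 1 is an assumption.
    """
    if not 0.0 < loss_rate <= 1.0:
        raise ValueError("loss_rate must be in (0, 1]")
    if rto is None:
        rto = 4.0 * rtt
    denominator = (
        rtt * sqrt(2.0 * b * loss_rate / 3.0)
        + rto * 3.0 * sqrt(3.0 * b * loss_rate / 8.0)
        * loss_rate * (1.0 + 32.0 * loss_rate ** 2)
    )
    return packet_size / denominator

# 1500-byte packets, 1% loss, RTT of 100 ms and 200 ms (illustrative values).
rate_short_rtt = tcp_friendly_rate(1500, 0.01, 0.1)
rate_long_rtt = tcp_friendly_rate(1500, 0.01, 0.2)
```

Because the timeout scales with the RTT in this setting, doubling the RTT halves the computed rate, which is the RTT dependence that motivates the RTT-fairness discussion.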

TCP Friendly Rate Control (TFRC) [17], which is a rate-based instead of a window-based congestion control, adjusts its sending rate based on Eq. (9) according to the packet loss rate and the RTT. The loss event rate $p$ is measured as the inverse of the average loss interval. As in TCP NewReno, all the lost packets in the same congestion window are treated as a single loss event, and the reduction of the congestion window is only triggered once. A loss event may consist of several lost packets within one round-trip time after the first packet loss, and the loss interval is defined as the number of packets between consecutive loss events. A certain number of loss intervals are averaged using decaying weights so that the older loss intervals contribute less to the average. This prevents the loss rate from reacting too strongly to a single loss event. If we take the most recent $L$ loss event intervals into account and $I_n$ is the $n$th loss event interval, which is the number of packets sent between the $n$th and the $(n+1)$th loss event, the weighted average of loss event intervals $\tilde{I}$ can be obtained as follows, and the loss event rate equals $1/\tilde{I}$.

$$\tilde{I} = \sum_{l=1}^{L} w_l I_{n-l}, \qquad w_l = \begin{cases} 1, & \text{if } l < L/2, \\ 1 - \dfrac{l - (L/2 - 1)}{L/2 + 1}, & \text{otherwise.} \end{cases} \tag{10}$$
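The weighting of loss intervals can be sketched as follows. Two points in this sketch are assumptions taken from the TFRC specification rather than from the equation as printed above: the weight index is zero-based (index 0 is the most recent interval), and the weighted sum is divided by the sum of the weights so that the result is an average. The function names are hypothetical.

```python
def loss_interval_weights(history_len=8):
    """Weights for the most recent history_len loss intervals.

    Index 0 is the most recent interval (zero-based indexing is an
    assumption following the TFRC specification).
    """
    half = history_len / 2
    weights = []
    for i in range(history_len):
        if i < half:
            weights.append(1.0)
        else:
            weights.append(1.0 - (i - (half - 1)) / (half + 1))
    return weights

def loss_event_rate(intervals):
    """Loss event rate 1 / I~, with intervals[0] the most recent interval.

    The division by sum(weights) is an assumption (normalized average).
    """
    weights = loss_interval_weights(len(intervals))
    average = sum(w * x for w, x in zip(weights, intervals)) / sum(weights)
    return 1.0 / average

recent_burst = loss_event_rate([10] + [100] * 7)   # short interval is recent
old_burst = loss_event_rate([100] * 7 + [10])      # short interval is old
```

For a history of eight intervals the weights are 1, 1, 1, 1, 0.8, 0.6, 0.4, 0.2, so the same short interval raises the loss event rate more when it is recent than when it is old.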

Slow start is applied in the initialization phase and after a retransmission timeout. Otherwise, the congestion avoidance phase is applied, in which losses are detected and the sending rate is calculated by the TCP throughput model. However, earlier work [23] shows that the convexity of $1/\tilde{I}$ ($E[1/\tilde{I}] \ne 1/E[I]$) and the different retransmission timeout (RTO) periods can be the reasons for TCP and TFRC to experience initially different sending rates. The difference in loss event rates caused by the different sending rates then greatly amplifies the initial throughput difference. In addition, since TFRC increases the sending rate per round-trip time, it leads to RTT unfairness. In other words, the flows with shorter RTT will gain more bandwidth than the flows with larger RTT under the same bottleneck.

3. The proposed method

The queuing delay usually becomes severe before packet loss occurs due to buffer overflow. Therefore, using the queuing delay as an indication of congestion can give a more accurate estimation than using the loss event rate. Different from traditional TCP, which adjusts the congestion window by increasing it by one packet per RTT (TCP Tahoe, NewReno, etc.) or by observing the change of RTT (TCP Vegas, Fast TCP [26], etc.), we determine whether to add one layer by observing the queuing delay of packet trains.

RTP and RTCP are popular streaming protocols over the Internet. In order to cooperate with SVC and RTP/RTCP, our proposed congestion control algorithm consists of two phases, a start phase and a transmission phase, as illustrated in Fig. 3, in analogy to the slow start and congestion avoidance states of the TCP congestion control algorithm. In the start phase, we focus on the precision of the bandwidth estimation algorithm so that the bandwidth can be utilized efficiently, especially in the case of networks with a high bandwidth-RTT product. TCP-friendliness and RTT-fairness are taken into account during the transmission phase.

3.1. Start phase

Before laying down the proposed algorithm, we use the ns2 network simulator to conduct simulations with the topology in Fig. 4 in order to verify the analysis of the previous section. The capacities along the path from sender to receiver are 100, 75, 55, 40, 60, and 80 Mbps, respectively. The link delay of each link is 100 ms. Cross traffic is generated from 16 random sources at each link. The inter-arrival time of the cross traffic from each source follows a Pareto distribution with shape parameter $\alpha = 1.5$. A packet train transmitted by the sender consists of 10 packets of packet size $S$ = 1500 bytes with fixed packet dispersion $d_{in}$, so that the probing rate $R_{in}$ ($= S/d_{in} = 40$ Mbps) of these 10 packets is greater than the available bandwidth. The receiver R records the dispersion $d_{out}$ of the arriving packets under different network utilizations of the bottleneck (i.e., the tight link).

In Fig. 5 we show the dispersion distribution of the received packets under various network utilizations of the bottleneck link. The received dispersion $d_{out}$ is clearly influenced by the utilization of the tight link and is positively correlated with the network utilization when the fixed probing rate ($R_{in}$) is greater than the available bandwidth ($A$). The reason is that if the network load gets heavier, there is a higher probability that packets of cross traffic will be placed among the probing packets, and they will contribute to the packet dispersion of the probing traffic. In addition, since the received probing rate $R_{out}$ ($= S/d_{out}$) is inversely proportional to the received dispersion $d_{out}$ and the available bandwidth $A$ is equal to $C - \lambda$, the received probing rate at R is positively correlated with the available bandwidth. Under this condition, the received probing rate is also an upper bound of the available bandwidth, as shown in Section 2, Part A. Based on the inequality in (8), we can make a better "guess" of the probing rate for the next iteration.


At the beginning of the start phase, we rarely have any information about the available bandwidth. The probing sender therefore transmits a packet train with a probing rate ($R_{in}$) equal to the capacity of the tight link. Alternatively, the train can simply consist of back-to-back packets when the capacity of the tight link is not known, or the initial probing rate can be the bit rate of the largest layer in SVC. After the train has passed through the links, the receiver observes the dispersions of the packet train and the corresponding rate ($R_{out} = S/d_{out}$), and feeds $R_{out}$ back to the sender as the next probing rate. We repeat the above steps iteratively until the probing rate and the available bandwidth start to match, which is decided by performing delay trend detection on the OWD. The full search algorithm [11] is modified to detect the delay trend. It is shown in [11] that the full search algorithm is better than the Pairwise Comparison Test (PCT) and the Pairwise Difference Test (PDT) used for delay trend detection in Pathload [3]. The block diagram of the proposed algorithm for available bandwidth estimation is shown in Fig. 6.
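The iteration described above can be sketched under strong simplifications. The code below is not the authors' implementation: it replaces the network by the fluid single-hop relation (7), and it replaces the delay trend test by a relative tolerance on the difference between the sent and the received rate. The function names, the tolerance, and the numeric values are assumptions made for illustration.

```python
def received_rate(probe_rate, capacity, load):
    """Fluid single-hop model, Eq. (7): rate of the train after the tight link."""
    avail = capacity - load
    return probe_rate * capacity / (load + max(probe_rate, avail))

def top_down_probe(initial_rate, capacity, load, tolerance=0.02, max_rounds=50):
    """Return the list of probing rates tried, ending with the estimate."""
    rates = [initial_rate]
    for _ in range(max_rounds):
        r_in = rates[-1]
        r_out = received_rate(r_in, capacity, load)
        # Stand-in for "no delay trend detected" (an assumption): the train
        # is almost not slowed down by the path.
        if r_in - r_out <= tolerance * r_in:
            break
        rates.append(r_out)   # the received rate becomes the next probing rate
    return rates

# Tight link of 40 Mbps at 60% utilization, i.e. 16 Mbps available.
trace = top_down_probe(initial_rate=40.0, capacity=40.0, load=24.0)
estimate = trace[-1]
```

In this model every probing rate stays at or above the available bandwidth and the sequence decreases monotonically towards it, which mirrors the top-down behaviour of the start phase; the number of rounds in a real network depends on the delay trend threshold rather than on the tolerance used here.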

Fig. 3. Schematic diagram of the proposed congestion control algorithm.

Fig. 4. Network topology used in the ns2 simulations.

[Plot: packet dispersion at receiver (msec) versus network utilization (%) at the bottleneck, when probing rate > available bandwidth.]

Fig. 5. The received packet dispersion of probing traffic at various traffic loads, given that the probing rate is greater than the available bandwidth.

Instead of the binary search in Pathload, we adopt a top-down approach that uses the received probing rate as the next probing rate of the sender, based on the analysis that the received probing rate is positively correlated with the available bandwidth and is also an upper bound of the available bandwidth. The estimation process can converge faster, as shown in the next section. In addition, we use OWD as the criterion to decide whether the probing rate is less than or equal to the available bandwidth. In other words, a packet train is sent to examine whether the probing rate is greater than the available bandwidth, and the received rate is used as the next probing rate iteratively until no delay trend is detected by (11). Eq. (11) computes, for a packet train of length $M$, the fraction of packet pairs in which the OWD of the later packet ($D^k$) is greater than that of the earlier one ($D^l$). Therefore, we can acquire the residual bandwidth faster than the slow start of TCP and TFRC. As to $threshold_{FS}$, if the probing rate is greater than 2 Mbps, the heuristic value of the threshold is 0.7. Otherwise, the range of the threshold is 0.6 to 0.7.

$$FS = \frac{\sum_{k=2}^{M} \sum_{l=1}^{k-1} I(D^k > D^l)}{\frac{M(M-1)}{2}}, \qquad I(D^k > D^l) = \begin{cases} 1, & \text{if } D^k > D^l, \\ 0, & \text{otherwise.} \end{cases} \tag{11}$$
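The delay trend metric of (11) is straightforward to compute from the OWDs of one packet train. The sketch below uses hypothetical function names; comparing the metric with the threshold by a strict "greater than" is an assumption, since the text only gives the threshold values.

```python
def full_search_metric(owds):
    """Eq. (11): fraction of pairs (l < k) with D^k > D^l."""
    m = len(owds)
    if m < 2:
        raise ValueError("need at least two one-way delays")
    later_is_larger = sum(
        1
        for k in range(1, m)
        for l in range(k)
        if owds[k] > owds[l]
    )
    return later_is_larger / (m * (m - 1) / 2)

def has_increasing_trend(owds, threshold=0.7):
    """True if the train shows an increasing OWD trend (strict comparison assumed)."""
    return full_search_metric(owds) > threshold

increasing = full_search_metric([1.0, 1.2, 1.4, 1.6])   # every pair increases
flat = full_search_metric([1.0, 1.0, 1.0, 1.0])         # no pair increases
noisy = full_search_metric([1.0, 3.0, 2.0, 4.0])        # five of six pairs
```

Because every pair of packets is compared, a single out-of-order delay sample only lowers the metric slightly instead of breaking the trend, as the `noisy` example illustrates.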

Compared with other algorithms in the literature, such as Pathload [3] or IGI [8], our algorithm is more robust. For example, if the available bandwidth suddenly drops during the estimation period, Pathload or IGI may easily overestimate the available bandwidth, which makes them inappropriate for multimedia streaming. In contrast, our probing rate is based on the previous receiving rate, so it is not misled by a sudden decrease of the available bandwidth. On the other hand, when the available bandwidth increases during the probing, the available bandwidth might be underestimated, but this will not induce packet loss.

3.2. Transmission phase

Once the initial sending rate is determined, we periodically send probing packets to detect whether there is extra available bandwidth to enhance the QoS of video streaming. This phase alternates between normal periods and probing periods, as shown in Fig. 3. Therefore, we have to determine when to perform the probing and how to drop video layers whenever congestion is detected in this phase. In order to compete with TCP flows for the network resource in a friendly manner, the probing period is dynamically adapted based on the RTT and the bit rate difference between two layers. Let the current sending bit rate for layer $i$ be $rate_i$, so that the algorithm will probe $rate_{i+1}$ next. The time interval $t$ between two probing events is determined as follows:

$$t = \frac{rate_{i+1} - rate_i}{s} \times RTT_{min} \tag{12}$$

$RTT_{min}$ is the minimal RTT recorded so far, and it is updated whenever a smaller RTT is measured from the RTCP receiver report.

Fig. 6. System diagram for available bandwidth estimation in start phase.

[Header layout: V, P, X = 1, CSRC count, M = 1, payload type, sequence number (16 bits); timestamp (32 bits); synchronization source (SSRC) id. (32 bits); contributing source (CSRC) id. (0 to 15 items, 32 bits each); header extension: defined by profile, extension header length, system clock.]

Fig. 7. Modified RTP header for probing packet.

[Packet layout: V, P, subtype, payload type = APP, length (16 bits); SSRC/CSRC; name "Probe"; received rate / application specified data.]

Fig. 8. Modified RTCP APP packet header for feedback packet.

Fig. 9. The estimation error with different packet sizes under different network utilizations.


In addition, whenever the probing fails to add a video layer, the next probing period is postponed by lengthening the time interval $t$ as indicated in (13). Assume there are $N$ video layers, $layer_i$ is the bit rate of the current layer $i$, and $threshold_t$ is the upper bound of the time interval between two probing events. According to the experiments, $threshold_t$ = 10 s is selected. In this way, a flow sending at a lower layer has a better opportunity to catch up with a flow at a higher layer because of its shorter probing period when there is extra available bandwidth. Furthermore, this rule can also reduce the influence of different RTTs under the same bottleneck on the receiving rate.

$$t = \min\left( threshold_t,\; t \left( 1 + \frac{layer_i}{layer_N} \right) \right) \tag{13}$$
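The back-off rule of (13) can be written as a one-line update. The sketch below uses hypothetical names and illustrative layer rates; the default cap of 10 s is the value reported in the text.

```python
def next_probe_interval(current_interval, layer_rate, top_layer_rate,
                        max_interval=10.0):
    """Eq. (13): lengthen the probing interval after a failed probe.

    layer_rate is the bit rate of the current layer and top_layer_rate that
    of layer N; max_interval is threshold_t in seconds.
    """
    grown = current_interval * (1.0 + layer_rate / top_layer_rate)
    return min(max_interval, grown)

# Two flows that both start from a 1 s interval and fail three probes in a row.
low, high = 1.0, 1.0
for _ in range(3):
    low = next_probe_interval(low, layer_rate=1.0, top_layer_rate=4.0)
    high = next_probe_interval(high, layer_rate=3.0, top_layer_rate=4.0)
```

After the same number of failed probes the flow at the lower layer ends up with the shorter interval, so it probes again sooner than the flow at the higher layer.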

Fig. 10. The estimation error with different packet numbers in a packet train under different network utilizations.

Fig. 11. The estimated available bandwidth at network utilization 60% (average available bandwidth is 16 Mbps).

Table 1
The MAD and probing number under different network conditions.

Network utilization (%) | Binary search: MAD (Mbps) | Binary search: probing number | Proposed algorithm: MAD (Mbps) | Proposed algorithm: probing number
20 | 1.98  | 4 | 1.19 | 2.25
40 | 2.231 | 4 | 1.82 | 2.7
60 | 3.62  | 4 | 2.27 | 3.05

Fig. 12. The throughput of TCP, TFRC, and the proposed algorithm under different background traffics.

[Diagram link labels: four links of 5 Mb/10 ms and one link of 3 Mb/10 ms.]

Fig. 13. Dumbbell network topology for TCP-friendly simulation.

Fig. 14. Throughput for TCP (Tahoe) and the proposed algorithm to compete under the same network condition.

As usual, the packet loss rate ($p$) is also used as the indicator of congestion during normal periods. Because multimedia streaming tolerates some degree of quality deterioration, we drop layers only if the packet loss rate is greater than $threshold_{PLR}$, and then immediately reset the time interval between two probing periods according to (12). In order to respond to the packet loss rate, we also scale the sending rate by the factor $(1 - \sqrt{p})$, since the throughput is roughly inversely proportional to $\sqrt{p}$ according to the equation model of TCP throughput. Therefore, our transmission rate will be adjusted to the layer corresponding to $rate_i (1 - \sqrt{p})$.
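The layer selection after a loss event can be sketched as follows. The sketch assumes that the layer bit rates are cumulative and sorted in increasing order, and that the base layer is never dropped; both are assumptions made here, as are the function name and the example rates.

```python
from math import sqrt

def layer_after_loss(layer_rates, current_layer, loss_rate):
    """Index of the layer to fall back to once loss exceeds the threshold.

    layer_rates[i] is the cumulative bit rate needed to receive layers 0..i,
    in increasing order; index 0 is the base layer, which is always kept
    (an assumption of this sketch).
    """
    target_rate = layer_rates[current_layer] * (1.0 - sqrt(loss_rate))
    layer = current_layer
    while layer > 0 and layer_rates[layer] > target_rate:
        layer -= 1
    return layer

rates_mbps = [0.5, 1.0, 2.0, 3.0, 4.0]          # hypothetical cumulative rates
mild = layer_after_loss(rates_mbps, 4, 0.04)    # target 4 * (1 - 0.2) Mbps
severe = layer_after_loss(rates_mbps, 4, 0.25)  # target 4 * (1 - 0.5) Mbps
```

With a mild loss rate only the top layer is dropped, whereas a higher loss rate removes several layers in one step, which is the behaviour that distinguishes this rule from dropping one layer at a time.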

For the avoidance of the introduction of unnecessary noise and also for better robustness over a wide range of time scales, thresh-oldPLRused in normal periods needs to be determined carefully. When the damage of a layer is greater than what we can tolerate, we have to consider dropping layers. The determination of the threshold of packet loss rate (thresholdPLR) is related to the bit rate of the current sending layer as shown in(14)so that the transmis-sion of lower layer would have higher probability than the trans-mission of the higher one to increase one additional layer.

$$\mathrm{threshold}_{PLR} = \alpha \times \frac{rate_i - rate_{i-1}}{rate_i}, \qquad \alpha \in (0, 1]. \tag{14}$$
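As a hedged illustration of (14), the sketch below evaluates the threshold for each layer. The function name and the 0-based indexing are hypothetical, and the rate below the base layer is assumed to be 0, which the text does not state.

```python
def threshold_plr(cumulative_rates_kbps, layer, alpha=0.3):
    """Loss-rate threshold for a 0-based layer index, following (14)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    rate_i = cumulative_rates_kbps[layer]
    # Assumption: the "layer" below the base layer has rate 0.
    rate_prev = cumulative_rates_kbps[layer - 1] if layer > 0 else 0.0
    return alpha * (rate_i - rate_prev) / rate_i


rates = [100 * (i + 1) for i in range(20)]  # 100, 200, ..., 2000 kbps
for layer in (0, 9, 19):
    print(rates[layer], threshold_plr(rates, layer))
```

With equally spaced layers the threshold shrinks as the layer index grows (about 0.3 at 100 kbps and 0.015 at 2000 kbps for alpha = 0.3), so flows on lower layers tolerate more loss before dropping a layer.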

In addition, if the packet loss rate in normal periods is less than thresholdPLR and the RTT shows an obvious decrease and is close to RTTmin, extra bandwidth may be available. In that case we also launch the probing mechanism immediately to examine whether one more layer is appropriate. The condition for additional probing according to RTT is as follows:

$$RTT_{measure} \approx RTT_{min}. \tag{15}$$
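One possible way to turn this condition into a check is sketched below. The text only requires the measured RTT to decrease and to be close to RTTmin; the 5% `tolerance`, the comparison against the previous RTT sample, and all names are assumptions for illustration.

```python
def should_probe_on_rtt(rtt_measure, rtt_previous, rtt_min,
                        loss_rate, threshold_plr, tolerance=0.05):
    """Return True when an extra probing round looks worthwhile."""
    low_loss = loss_rate < threshold_plr
    # Assumption: "obvious decrement" is approximated by a drop versus the last sample.
    decreasing = rtt_measure < rtt_previous
    # Assumption: "close to RTTmin" means within a relative tolerance of the minimum.
    near_min = rtt_measure <= rtt_min * (1.0 + tolerance)
    return low_loss and decreasing and near_min


print(should_probe_on_rtt(0.101, 0.130, 0.100, loss_rate=0.001, threshold_plr=0.03))
```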

3.3. Integration of RTP/RTCP and congestion control algorithm

The Real-time Transport Protocol (RTP) provides end-to-end delivery services for real-time traffic, and the RTP Control Protocol (RTCP) conveys statistical information about the participants as well as QoS-related information. In RTP, the sequence number field can be used to detect packet loss, and the timestamp reflects the sampling instant of the first byte of data, where the clock frequency is specified by the profile or payload format document of the application. In order to use existing RTP data packets as probing packets, so that the probing overhead is kept minimal, the RTP header extension is used to carry the system clock information. In addition, the marker bit is set so that non-probing data can be bypassed efficiently. Fig. 7 shows the values of the corresponding fields. RTCP provides feedback on the quality of data distribution through five packet types, Sender Report (SR), Receiver Report (RR), Source Description (SDES), BYE, and Application (APP), which are combined into compound packets. The APP packet is selected as the control message for the probing results. The control message contains the received rate, and the corresponding values are shown in Fig. 8.
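To make the control message concrete, the sketch below packs and parses an RTCP APP packet using the generic APP layout of RFC 3550 (version, subtype, packet type 204, length in 32-bit words minus one, SSRC, four-character name). The field values actually used by the authors are given in Fig. 8 and are not reproduced here; the name `PRBE`, the subtype 0, and the single 32-bit received-rate field in bits per second are hypothetical choices for the example.

```python
import struct

RTCP_VERSION = 2
RTCP_PT_APP = 204  # RTCP packet type for APP (RFC 3550)
HEADER_FORMAT = "!BBHI4s"  # V/P/subtype, PT, length, SSRC, 4-character name
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes


def build_probe_report(ssrc, received_rate_bps, name=b"PRBE", subtype=0):
    """Pack a received-rate report into an RTCP APP packet (hypothetical payload)."""
    payload = struct.pack("!I", received_rate_bps)
    # RTCP length field: packet size in 32-bit words, minus one.
    length_words = (HEADER_SIZE + len(payload)) // 4 - 1
    first_byte = (RTCP_VERSION << 6) | (subtype & 0x1F)  # padding bit left at 0
    header = struct.pack(HEADER_FORMAT, first_byte, RTCP_PT_APP,
                         length_words, ssrc, name)
    return header + payload


def parse_probe_report(packet):
    """Return (ssrc, name, received_rate_bps) from a packet built above."""
    first_byte, packet_type, _length, ssrc, name = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE])
    if first_byte >> 6 != RTCP_VERSION or packet_type != RTCP_PT_APP:
        raise ValueError("not an RTCP APP packet")
    (received_rate_bps,) = struct.unpack("!I", packet[HEADER_SIZE:HEADER_SIZE + 4])
    return ssrc, name, received_rate_bps


packet = build_probe_report(ssrc=0x1234, received_rate_bps=1_500_000)
print(len(packet), parse_probe_report(packet))
```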

4. Simulations and experimental results

We use ns2 to simulate and evaluate the performance of the proposed algorithms. As mentioned in the previous section, some of the normal data packets are used as probing packets so as to eliminate the cost of probing overhead. First, we examine the precision and the efficiency of the proposed bandwidth estimation algorithm in the start phase. Then, we present experiments and discussions from the aspects of available bandwidth utilization, TCP friendliness, and fairness, respectively. In addition, slow start in TCP or TFRC is inefficient, especially for high-speed networks. In order to verify whether the proposed algorithms work well for various bottleneck capacities, simulations with different capacities are conducted for both bandwidth estimation and congestion control.

4.1. Bandwidth estimation

We consider the case of a topology with a large bandwidth-delay product, shown in Fig. 4, where the capacities from the sender to its receivers are 100, 75, 55, 40, 60, and 80 Mbps, with a 100 ms delay for each link. For all the following simulations, if the parameters are not mentioned explicitly, the default settings are as follows. The length of a packet train is 30 packets and the probing packet size is 1500 bytes. The link with capacity 40 Mbps is the tight and narrow link, and the queue length is 20 packets. Cross traffic is generated from 16 random sources at each link with a Pareto distribution as stated earlier. To mimic real-world traffic, the packet sizes of the flows are set as follows: 40% are 40 bytes, 50% are 550 bytes, and 10% are 1500 bytes. The thresholdFS in Fig. 6 is 0.7 and ε is 2 Mbps. We discuss the parameters of the probing packet train (packet size and train length) and compare with binary search as follows.
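The cross-traffic packet-size mix can be reproduced with a few lines. This sketch only models the 40/50/10% size distribution stated above; the Pareto on/off parameters are defined earlier in the paper and are not modeled here, and the function names and seed are hypothetical.

```python
import random

# Packet size in bytes -> share of packets, as stated in the default settings.
PACKET_SIZE_MIX = {40: 0.40, 550: 0.50, 1500: 0.10}


def mean_packet_size(mix=PACKET_SIZE_MIX):
    """Expected packet size in bytes for the given mix."""
    return sum(size * share for size, share in mix.items())


def sample_packet_sizes(count, seed=0, mix=PACKET_SIZE_MIX):
    """Draw `count` packet sizes according to the mix (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


print(mean_packet_size())
print(sample_packet_sizes(10))
```

The expected packet size of this mix is about 441 bytes, well below the 1500-byte probing packets.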

4.1.1. Packet size

In this section, we observe the influence of the packet size under different network utilizations, because the packet size has an impact on the transmission delay and the packet dispersion according to (6). If the packet size is larger, the time interval between two consecutive packets in a packet train increases at a fixed probing rate. Therefore, the delay trend detection will not be disturbed easily by transient variations of the cross traffic. The mean absolute difference (MAD) between the average real available bandwidth and the estimated bandwidth is used to measure the estimation error. From Fig. 9, when the packet size is greater than 800 bytes, both the estimation error and the variation of the MAD are smaller.

Fig. 15. Throughput for TCP (NewReno) and the proposed algorithm to compete under the same network condition.

Fig. 16. Throughput for two flows with the proposed algorithm to compete under the same network condition.
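The two quantities used in the packet-size discussion can be sketched as below. The spacing formula is the usual size-over-rate relation for packets sent back to back at a fixed rate; the MAD is written here as the mean of the absolute differences between each estimate and the average real available bandwidth, which is one reading of the definition in the text. Function names are hypothetical.

```python
def inter_packet_gap_s(packet_size_bytes, probing_rate_bps):
    """Spacing between packet starts when sending at a fixed probing rate."""
    return packet_size_bytes * 8 / probing_rate_bps


def mean_absolute_difference(estimates_mbps, average_real_mbps):
    """Mean of |estimate - average real available bandwidth| over all estimates."""
    return sum(abs(e - average_real_mbps) for e in estimates_mbps) / len(estimates_mbps)


for size in (400, 800, 1500):
    print(size, inter_packet_gap_s(size, 40e6))
print(mean_absolute_difference([14.0, 18.0, 17.0], 16.0))
```

At a 40 Mbps probing rate, a 1500-byte packet gives a spacing of 0.3 ms, against 0.08 ms for a 400-byte packet, which leaves more room for the delay trend to stand out from short cross-traffic bursts.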

4.1.2. The length of packet train

The length of a packet train determines the sampling rate for detecting the delay trend. If the samples are too few, such as 10 packets under 20% utilization in Fig. 10, the estimation error will be larger. However, too many probing packets sent at a rate higher than the available bandwidth will easily lead to queue overflow and are intrusive to the existing cross traffic, as in the result for 60 packets under 60% utilization. There is a tradeoff between the precision of the estimation and the amount of network traffic caused by probing packets. Therefore, the packet train has to be long enough to accommodate various situations while remaining non-intrusive. As shown in Fig. 10, 30 or 50 packets in a packet train is the better choice. In order to reduce the network load, we choose 30 packets in the following simulations.
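The overflow risk of long trains can be illustrated with a first-order fluid approximation, which is an assumption of this sketch and not a model taken from the paper: while a train of N packets is sent at rate R above the available bandwidth A, roughly N(1 - A/R) probing packets accumulate in the bottleneck queue. The approximation ignores cross-traffic burstiness and any backlog already in the queue.

```python
def train_backlog_packets(train_length, probing_rate_mbps, available_mbps):
    """Approximate number of probing packets queued by the end of one train."""
    if probing_rate_mbps <= available_mbps:
        return 0.0  # the path drains the train as fast as it arrives
    return train_length * (1.0 - available_mbps / probing_rate_mbps)


QUEUE_LIMIT_PACKETS = 20  # default queue length in the bandwidth estimation setup
for length in (10, 30, 60):
    backlog = train_backlog_packets(length, probing_rate_mbps=32, available_mbps=16)
    print(length, backlog, backlog > QUEUE_LIMIT_PACKETS)
```

Under these assumed rates, a 60-packet train would exceed a 20-packet queue while a 30-packet train would not.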

4.1.3. Comparison

We compare our algorithm to the binary-search based algorithm in [11]. The available bandwidth is estimated every 20 s, and the spots in Fig. 11 show the starting time of each estimation and the corresponding estimated available bandwidth. From Fig. 11, it is shown that our algorithm stays much closer to the curve of the real available bandwidth.

Table 1 shows the MAD between the estimated and the true bandwidths. The average number of packet trains required to complete bandwidth estimation under different network utilizations is also shown. Our algorithm outperforms the binary-search based algorithm, especially in the case of heavy traffic load.
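The gap between the two methods can be read off Table 1 with simple arithmetic. The sketch below only restates the table values; the dictionary layout and function name are hypothetical.

```python
# utilization (%) -> method -> (MAD in Mbps, average number of packet trains)
TABLE_1 = {
    20: {"binary_search": (1.98, 4.0), "proposed": (1.19, 2.25)},
    40: {"binary_search": (2.231, 4.0), "proposed": (1.82, 2.7)},
    60: {"binary_search": (3.62, 4.0), "proposed": (2.27, 3.05)},
}


def improvement(utilization):
    """Differences between binary search and the proposed algorithm."""
    base_mad, base_trains = TABLE_1[utilization]["binary_search"]
    new_mad, new_trains = TABLE_1[utilization]["proposed"]
    return {
        "mad_reduction_mbps": base_mad - new_mad,
        "mad_reduction_ratio": (base_mad - new_mad) / base_mad,
        "trains_saved": base_trains - new_trains,
    }


for utilization in sorted(TABLE_1):
    print(utilization, improvement(utilization))
```

The absolute MAD reduction is largest at 60% utilization (about 1.35 Mbps, versus 0.79 and 0.411 Mbps at 20% and 40%), while the proposed algorithm needs fewer packet trains than the four used by binary search at every utilization.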

Using binary search to infer the available bandwidth is unstable with only one threshold, because once the probing rate falls in the gray region [3], the results may be inconsistent. Even though Pathload uses two thresholds to detect the delay trend, it is still easy to misjudge in a rapidly fluctuating network. Our proposed algorithm utilizes the features of OWD and dispersion to avoid the above problems, so that the accuracy and convergence speed of probing can be improved.

4.2. Congestion control

As to the performance evaluation of congestion control algorithms, we present the discussion from the aspects of available bandwidth utilization, friendliness, and fairness, respectively. Simulations over a Brite-generated network topology are also shown. Our simulation parameters are set as follows unless otherwise mentioned. The threshold of the full search algorithm is 0.63. As to the scalability of SVC, we assume the maximum bit rate is 2 Mbps, equally divided into 20 layers, so the cumulative bit rates of the layers are 100, 200, 300, ..., 2000 kbps. The parameter α of thresholdPLR is set to 0.3 heuristically. The version of the TCP flows is Tahoe, and the packet sizes for TFRC and TCP flows are also 1000 bytes.

4.2.1. Available bandwidth utilization

In this section we evaluate the available bandwidth utilization of the proposed congestion control algorithm compared to the algorithms in TFRC [22] and TCP. We use the ns2 network simulator to conduct simulations with the topology shown in Fig. 4. The length of a packet train is 30 packets of size 1000 bytes. The capacities along the path are 10, 7.5, 5.5, 2, 6, and 8 Mbps, respectively. The link with capacity 2 Mbps is the tight and narrow link, and the queue length is 30 packets. Cross traffic consisting of constant-bit-rate (CBR) flows of packet size 550 bytes is generated for each link to alter the available bandwidth, so that the available bandwidth exhibits large fluctuations over a period of time.

From the experiment results shown in Fig. 12, the sawtooth-like rate shape of TCP shows the worst utilization. The proposed congestion control algorithm has better performance than TFRC not only in the start phase but also in the transmission phase. During the start phase, our algorithm converges to the available bandwidth by the top-down approach faster than the slow start in TFRC/TCP. During the transmission phase, our proposed algorithm converges quickly to the available bandwidth under the TCP-friendly condition, because it estimates the available bandwidth by probing instead of by the TCP throughput model. In our proposed algorithm, the efficiency of converging to the available bandwidth depends on the time interval between two probing periods. In order to be friendly with TCP, we mimic the congestion avoidance status in TCP to dynamically adapt the time scale between two probing periods. The simulation results of the proposed algorithm show steady throughput without overestimating the shared bandwidth, which is a critical property for video streaming.

4.2.2. Friendliness

We simulate the proposed algorithm with different versions of TCP over the basic dumbbell network topology depicted in Fig. 13. The capacity of the bottleneck is 3 Mbps, and the proposed algorithm starts 2 s after the TCP flows. When calculating the average throughput, the first 200 s are considered a transient phase and are not taken into account. In Fig. 14, the average throughputs of TCP Tahoe and the proposed algorithm are 1327.63 and 1599.5 kbps, respectively. In Fig. 15, the average throughputs of TCP NewReno and the proposed algorithm are 1537.24 and 1462.75 kbps. Tahoe and NewReno are aggressive congestion control algorithms that send packets continuously until packet loss occurs. According to Jain’s fairness index [24], the corresponding indices of Tahoe and NewReno with our proposed algorithm are 0.991 and 0.999. The main difference between Tahoe and NewReno is the reaction to three duplicate acknowledgements: NewReno sets the congestion window to one half of the previous window size, whereas Tahoe sets it to one. Therefore, NewReno has better throughput than Tahoe.

The throughput of the proposed algorithm depends on thresholdPLR, because our proposed algorithm only considers dropping video layers when the packet loss rate is greater than thresholdPLR. Due to the imitation of the TCP behavior to adapt the probing period and also the use of the queuing delay trend as the congestion index in the transmission period, our proposed algorithm will not lead to congestion collapse or starvation of TCP traffic. In addition, the step size between video layers is also one of the factors affecting the throughput. If TCP and the proposed algorithm are in the congestion avoidance status and compete for the residual bandwidth at the same time, our proposed algorithm will occupy enough bandwidth for a layer by probing. If the residual bandwidth is not enough for a layer, the proposed algorithm will not change the sending rate. On the other hand, the average throughputs of TCP Tahoe/TFRC and TCP NewReno/TFRC are 1189.26/1755.06 and 1411.63/1587.99 kbps, respectively. The corresponding fairness indices are 0.964 and 0.997, respectively.
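The fairness indices quoted above can be checked from the reported average throughputs with Jain's fairness index, $J = (\sum_i x_i)^2 / (n \sum_i x_i^2)$. The sketch below is only a verification aid; the function name and the dictionary of throughput pairs are hypothetical.

```python
def jain_fairness_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))


# Average throughputs in kbps as reported in the text.
pairs = {
    "Tahoe vs proposed": (1327.63, 1599.5),
    "NewReno vs proposed": (1537.24, 1462.75),
    "Tahoe vs TFRC": (1189.26, 1755.06),
    "NewReno vs TFRC": (1411.63, 1587.99),
}
for label, values in pairs.items():
    print(label, round(jain_fairness_index(values), 3))
```

Rounded to three decimals, these reproduce the indices 0.991, 0.999, 0.964, and 0.997 given in the text.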

We run two instances of the proposed congestion control algorithm over the topology shown in Fig. 13, and the difference between the start times of the two flows is 2 s. The average throughputs of the two flows are 1500.06 and 1499.93 kbps. From Fig. 16, the throughputs of the two instances converge after 50 s, because our congestion control algorithm takes only the residual bandwidth and respects the existing traffic until the packet loss reaches the threshold.

4.2.3. Fairness over larger network topology

In order to get closer to a real network topology, we use the Brite [25] topology generator to generate the network topology shown in Fig. 17, which is composed of 50 nodes. Of those 50 nodes, 33 (the red ones in Fig. 17) are in the core network. The minimum degree of each node is 2 and the capacity of each link is 10 Mbps. We observe the fairness index for different numbers of peers and different protocols. From Table 2, our proposed algorithm achieves a higher fairness index with 4 and 8 peers. However, the fairness of our algorithm is slightly worse than that of TFRC when there are 16 peers, due to the bandwidth resolution (100 kbps) of the video layers, which also determines the sending rate resolution in the proposed algorithm. The fair throughput is 625 kbps, whereas the sending rate can only be 600 or 700 kbps, which is unfavorable in calculating the fairness index. However, when the link capacity becomes larger (20 Mbps as shown in Table 2), the fairness index of TFRC degrades considerably.

Fig. 18. Fairness index over time with respect to different peer numbers and shared bandwidths.

Table 2

Peer size and fairness index under different protocols.

Protocol | 4 peers | 8 peers | 16 peers | 16 peers (20 Mbps)
Proposed | 0.941021 | 0.919904 | 0.916501 | 0.994578466
TFRC | 0.437532 | 0.610675 | 0.999736 | 0.520080428
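The layer-granularity argument for 16 peers can be made concrete with a small sketch: the fair share of a 10 Mbps link among 16 peers falls between two video layers. The function name and the clamping to the lowest and highest layer are assumptions for illustration.

```python
def bracketing_layers(fair_share_kbps, layer_step_kbps=100, max_rate_kbps=2000):
    """Return the layer rate at or below the fair share and the next layer up."""
    lower = int(fair_share_kbps // layer_step_kbps) * layer_step_kbps
    # Assumption: a flow always sends at least the base layer and at most the top one.
    lower = max(layer_step_kbps, min(lower, max_rate_kbps))
    upper = min(lower + layer_step_kbps, max_rate_kbps)
    return lower, upper


fair_share = 10_000 / 16  # 10 Mbps shared by 16 peers, in kbps
print(fair_share, bracketing_layers(fair_share))  # prints 625.0 (600, 700)
```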

Fig. 18 shows the fairness index over time with respect to different peer numbers and different shared bandwidths. The proposed algorithm is more stable than TFRC under every condition shown. When one of the TFRC flows is suppressed by the other flows, it is rather difficult for it to recover.

4.2.4. Fairness over various RTT values

TCP flows usually show different end-to-end throughputs at different RTTs even under the same bottleneck. In the following simulations, we change the link delay of the last segment of one path in Fig. 13 so that the two flows have different RTTs, and compare the proposed algorithm with TFRC. Each simulation lasts for 1000 s, and the RTT of one flow varies from 1 to 5 times that of the other. The average performance over 10 simulation runs with random start times is shown in Table 3. The proposed congestion control is based on the queuing delay, and the back-off time of the probing periods is related to the transmitted video layers according to (13) whenever a probing fails. The lower layer has better opportunities to catch up with the higher layer because of its shorter probing period when there is extra available bandwidth. Therefore, the unfairness caused by different RTTs is much smaller for our proposed algorithm than for TFRC. Because TFRC does not emphasize inter-session fairness, one of the TFRC flows usually suppresses the other and dominates the whole bandwidth resource after a long period of time.

5. Conclusion

In this paper, we propose a bandwidth estimation algorithm using a top-down scheme that combines the features of OWD and packet dispersion, and further present a congestion control algorithm for SVC-based streaming that uses bandwidth inference by periodic probing. In order to compete with the TCP congestion control algorithm without hampering existing TCP flows, we dynamically adapt the probing periods according to the RTT and the bit rate of the video layers to mimic the TCP congestion avoidance status. When the packet loss rate is greater than thresholdPLR, we reduce the transmission rate according to the sending rate and the packet loss rate. In addition, we also observe the changes of RTT to determine whether probing for one more layer is beneficial. As shown in the simulation results, the proposed algorithm demonstrates better performance than TFRC, and it coexists with TCP flows in a friendly manner.

References

[1] C. Dovrolis, P. Ramanathan, D. Moore, Packet-dispersion techniques and a capacity-estimation methodology, IEEE/ACM Transactions on Networking 12 (6) (2004).

[2] R.S. Prasad, M. Murray, C. Dovrolis, K. Claffy, Bandwidth estimation: metrics, measurement techniques, and tools, IEEE Network 17 (6) (2003) 27–35.

[3] M. Jain, C. Dovrolis, Pathload: a measurement tool for end-to-end available bandwidth, in: Proc. Passive and Active Measurements, Fort Collins, CO, March 2002.

[4] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, L. Cottrell, PathChirp: efficient available bandwidth estimation for network paths, in: Passive and Active Measurement Workshop, 2003.

[5] Q. Liu, J.N. Hwang, End-to-end available bandwidth estimation and time measurement adjustment for multimedia QoS, in: International Conference on Multimedia and Expo (ICME), July 2003.

[6] B. Melander, M. Bjorkman, P. Gunningberg, A new end-to-end probing and analysis method for estimating bandwidth bottlenecks, in: Global Internet Symposium, 2000.

[7] M. Jain, C. Dovrolis, End-to-end available bandwidth: measurement methodology, dynamics, and relation with TCP throughput, IEEE/ACM Transactions on Networking 11 (4) (2003).

[8] N. Hu, P. Steenkiste, Evaluation and characterization of available bandwidth probing techniques, IEEE Journal on Selected Areas in Communications 21 (6) (2003).

[9] X. Liu, K. Ravindran, D. Loguinov, What signals do packet-pair dispersion carry?, in: Proceedings of IEEE INFOCOM, vol. 1, March 2005, pp. 281–292.

[10] J. Strauss, D. Katabi, F. Kaashoek, A measurement study of available bandwidth estimation tools, in: Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, October 2003, pp. 39–44.

[11] J.L. Lin, S.C. Pei, J.N. Hwang, Fine-grain layered multicast based on hierarchical bandwidth inference congestion control, in: International Symposium on Circuits and Systems (ISCAS), May 2005.

[12] S.S. Wang, H.F. Hsiao, Fast end-to-end available bandwidth estimation for real-time multimedia networking, in: International Workshop on Multimedia Signal Processing (MMSP), October 2006.

[13] Joint Video Team, ITU-T Recommendation H.264: Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, March 2009.

[14] Q. Liu, J.N. Hwang, A new congestion control algorithm for layered multicast in heterogeneous multimedia dissemination, in: International Conference on Multimedia and Expo (ICME), July 2003.

[16] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, A transport protocol for real-time applications, RFC 3550, July 2003.

[17] S. Floyd, M. Handley, J. Padhye, J. Widmer, Equation-based congestion control for unicast applications, in: Proceedings of ACM Special Interest Group on Data Communication (SIGCOMM), May 2000.

[18] J. Widmer, R. Denda, M. Mauve, A survey on TCP-friendly congestion control, in: IEEE Network, vol. 15, issue 3, May 2001, pp. 28–37.

[19] J. Padhye, V. Firoiu, D. Towsley, J. Kurose, Modeling TCP throughput: a simple model and its empirical validation, ACM SIGCOMM 28 (4) (1998).

[20] S.-G. Na, J.-S. Ahn, TCP-like flow control algorithm for real-time applications, in: IEEE International Conference on Networks (ICON), 2000.

[21] I. Rhee, V. Ozdemir, Y. Yi, TEAR: TCP emulation at receivers - flow control for multimedia streaming, Dept. of Comp. Sci., NCSU, Tech. Rep., April 2000.

[22] S. Floyd, M. Handley, J. Padhye, J. Widmer, TCP Friendly Rate Control (TFRC): Protocol Specification, RFC 3448, January 2003.

[23] I. Rhee, L. Xu, Limitations of equation-based congestion control, in: IEEE/ACM Transactions on Networking, vol. 15, no. 4, August 2007.

[24] D-M. Chiu, R. Jain, Analysis of increase and decrease algorithms for congestion avoidance in computer networks, Computer Networks and ISDN Systems 17 (1) (1989) 1–14.

[25] A. Medina, A. Lakhina, I. Matta, J. Byers, Brite, <http://www.cs.bu.edu/brite/>.

[26] D.X. Wei, C. Jin, S.H. Low, S. Hegde, FAST TCP: motivation, architecture, algorithms, performance, IEEE/ACM Transactions on Networking 14 (6) (2006) 1246–1259.

[27] Yingsong Huang, Shiwen Mao, Scott F. Midkiff, A control-theoretic approach to rate control for streaming videos, in: IEEE Transactions on Multimedia, Special Issue on Quality-Driven Cross-Layer Design for Multimedia Communications, vol. 11, no. 6, October 2009, pp. 1072–1081.

[28] L. Zhou, B. Geller, B. Zheng, A. Wei, J. Cui, System scheduling for multi-description video streaming over wireless multi-hop networks, IEEE Transactions on Broadcasting 55 (4) (2009) 731–741.

[29] E. Kohler, M. Handley, S. Floyd, Datagram Congestion Control Protocol (DCCP), March 2006.