Content-aware optimization on rate-distortion and network traffic for scalable video multicast networks


Junni Zou · Lu Jiang · Chenglin Li

Abstract This paper aims to optimize the content-aware prioritization of scalable video multicast, which is coupled with multipath streaming and network coding based routing. It constructs multiple layer distribution meshes for the scalable video stream to minimize the total video distortion at all the receivers, determines the base layer meshes with minimum costs to maintain application-layer QoS and the layer synchronization of SVC streaming, and improves the network throughput by encouraging path-overlapping transmissions and thus allowing bandwidth sharing among different receivers for the same video layer by utilizing network coding.

The targeted problem is formulated as a minimization program in which the quality variation between layers, the transmission cost of the base layer, and the efficient resource utilization are jointly considered. By decomposition and a dual approach, the global convex problem is solved by a two-level decentralized iterative algorithm. The implementation of the distributed algorithm is discussed with regard to the communication overhead, and the convergence performance is validated by numerical experiments. Packet-level simulations demonstrate that the proposed algorithm can approximately achieve the maximum flow rates determined by the Max-Flow Min-Cut Theorem and benefit the overall received video quality.

The work has been partially supported by the NSFC grants (No. 61271211, No. 60972055), and the Research Program from Shanghai Science and Technology Commission (No. 11510707000, No. 11QA1402600).

J. Zou
Department of Electrical and Computer Engineering, University of California, San Diego, CA 92093, USA
e-mail: zoujunni@gmail.com

J. Zou
Key Laboratory of Special Fiber Optics and Optical Access Networks, Shanghai University, Shanghai 200072, China

L. Jiang
Department of Communication and Information Engineering, Shanghai University, Shanghai 200072, China

C. Li
Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Published online: 22 December 2012

Keywords Scalable video coding · Multicast · Network coding · Rate distortion · Distributed algorithm

1 Introduction

Multirate multicast is superior to single rate multicast for media streaming distribution to a set of heterogeneous receivers [18]. Multirate multicast allows receivers to subscribe to content in compliance with their available bandwidth by joining a proper subset of multicast sessions. With the development of layered and scalable video coding [25], layered multicast has emerged as a variant of multirate multicast for scalable media streaming [3]. In layered multicast, on one hand, video can be transmitted and decoded at multiple bit rates with progressively improved video quality. On the other hand, rate adaptation is implemented at both the sender/receiver and intermediate network nodes, while achieving highly efficient video rate-distortion performance.

Therefore, joint optimization on multirate flow control and video distortion is of paramount importance in scalable streaming dissemination.

An SVC elementary stream is encoded to contain an H.264/AVC compatible base layer and represent the bit stream in the fully scalable representation. Utilizing the SVC technique, a scalable bit stream can be represented in two different ways: a layered representation (layered scalable) or a flexible combined scalability (fully scalable) [2]. Generally, full scalability benefits unicast scenarios, where the target stream can be extracted at any bit rate from the SVC elementary stream according to a single receiver's capability. In comparison, layered scalability benefits multicast distribution by offering simple adaptation to heterogeneous receivers, i.e., different receivers can subscribe to different combinations of layers under the constraints of network capacity and layer dependency. For practical streaming multicast, we adopt layered scalability and assume that the SVC video stream is encoded into a set of multiple layers, where higher layers can be viewed as progressively refinable layers for the lower layers to update the video from one quality to the next [2]. Rooted in a base layer, an SVC stream extends one or multiple enhancement layers with a flexible multi-dimension layer structure (at least one dimension from temporal, spatial, or SNR) to provide various operating points in spatial resolution, temporal frame rate, and video reconstruction quality.

Rate control of scalable video streaming has been studied extensively in the past [11,13,27,30]. For example, Zhu et al. [30] presented a packet-based rate adaptation scheme for minimizing the total distortion of multiple video streams for application-layer multicast with multipath transmission. van der Schaar et al. [27] proposed a packet-based channel access scheme for scalable streaming over wireless networks. A message-based pricing and access coordination scheme was presented in [11]. To support heterogeneous device capability in video multicasting/broadcasting, statistical multiplexing for layered multicast was investigated with a complexity measure among all programs in all layers [13]. These rate control methods can improve the performance of scalable video streaming over networks. However, they use predetermined distribution trees to improve the network throughput and overall video quality, which may cause decoding problems because the synchronization among SVC layers is not adequately addressed. For example, along predetermined distribution trees, video packets from higher layers may arrive before or without packets from lower layers, which causes decoding failure. To address this layer synchronization issue and allocate paths with lower cost to lower layers according to different SVC streams, in this paper we study the rate-distortion minimization problem for scalable streaming multicast networks, where each receiver can have multiple alternative paths through the coded network (i.e., a network that uses network coding) to receive the subscribed SVC layers in an incremental order. In addition, network coding assisted multirate multicast is employed to further enhance network transmission performance.

The first optimization model for multirate multicast problems was proposed by Kar et al. [14,15], and a distributed algorithm for the receiver to receive service at any rate within a continuous set of rates was proposed in [24]. Extending the single-source scenario to maximize the overall utility of multiple sources constrained by the sources' transmission rates, a flow control and optimization scheme is presented in [20]. Based on this approach, a number of source-oriented rate control schemes have been developed [11,27,30], which were introduced above as rate control schemes for scalable video streaming. Multipath routing combined with congestion control was studied in [12]. Also, inter-session fairness for layered video multicast was investigated by considering layer-based congestion sensitivity, which lets different video layers have different sensitivity to congestion [17] to address the layer synchronization issue. However, these existing methods for network performance optimization focus only on resource allocation among receivers. The problem of utility maximization for heterogeneous receivers that subscribe to multiple video coding layers with prioritized multirate multicasting has not been adequately addressed.

Network coding represents a novel paradigm in information theory that was first proposed by Ahlswede et al. in 2000 [1]. It extends the functionality of network nodes from storing/forwarding packets to performing algebraic operations on received data. Li et al. [19] proved that the max-flow multicast throughput can be reached through linear network coding. Chen et al. [7] developed two adaptive rate control algorithms by considering networks with and without coding subgraphs. Wu [28] extended network utility maximization (optimizing the QoS of the entire network based on a specific utility function) to network coding based multicasting. The authors of [29] attempted to address the layered multicasting problem by including network coding and multipath constraints in the objective function, and proposed a solution called the LION algorithm. However, they simply formulated it as an integer linear program without utility maximization and without the prioritized path costs of different layered multicasting groups. Moreover, they only provided a heuristic approach instead of a rigorous distributed algorithm with global optimality. As a further improvement, we [31] proposed a prioritized flow optimization formulation for SVC and multicast over heterogeneous networks, which used the path cost and prices of each layer as the priority parameters to capture layer synchronization of SVC streaming. Although the path cost of each layer and the layer synchronization problem have been considered in that work, the successful decoding of the base layer cannot be guaranteed at all times, especially when the path costs of higher layers are much smaller than that of the base layer. Moreover, the overall communication network in that work is simply modeled by a generalized network utility maximization problem, which does not take into account the distortion property of video applications. In this paper, we consider rate-distortion modeling from the perspective of application-layer QoS, and improve the optimization formulation by minimizing the total received video distortion of all receivers, while also minimizing the delay of the base layer to guarantee a basic quality for all receivers and minimizing the actual bandwidth assigned to all SVC layers to achieve efficient network resource utilization.

In this paper, we consider content-aware prioritization of scalable video coding and investigate how it can be coupled with multipath video streaming and network coding based routing to achieve optimum performance. To minimize the total video distortion at all the receivers, we propose an efficient flow control and resource allocation scheme. It constructs multiple layer distribution meshes for the scalable video stream with multipath routing, and determines the base layer meshes with minimum costs so as to guarantee application-layer QoS and tackle the layer synchronization issue of SVC streaming. Also, a specific strategy to efficiently allocate paths for receivers with minimum bandwidth consumption is proposed to improve the network throughput, which encourages path-overlapping transmissions and allows bandwidth sharing among different receivers for the same video layer by utilizing network coding. We formulate the flow control problem as a minimization program in which the quality variation between layers, the transmission cost of the base layer, and the efficient resource utilization are jointly considered. By using primal decomposition and a primal-dual approach, the global convex problem is solved by a two-level decentralized iterative algorithm. The implementation of the distributed algorithm is discussed with regard to the communication overhead, and the convergence performance of the proposed algorithm is validated by numerical experiments. Packet-level simulations demonstrate that the algorithm can approximately achieve the maximum flow rates determined by the Max-Flow Min-Cut Theorem and benefit the overall received video quality.

The remainder of this paper is organized as follows. Section 2 describes the motivation for SVC streaming multicast networks. In Section 3, we formulate the problem of rate allocation and performance optimization for scalable video coding and multicast over networks. In Section 4, we propose a decentralized algorithm for the original scheme, and Section 5 discusses implementation issues related to the algorithm. Numerical and simulation results are presented in Section 6. Finally, the paper concludes in Section 7.

2 Motivations

Our motivations in this paper are derived from two aspects. The first is related to the layer synchronization of SVC streaming, which requires that each receiver subscribes to scalable layers in an incremental order, since successfully received higher layers cannot be decoded without the presence of lower layers. In other words, there exist layer dependency and priority constraints among scalable video layers: higher layers depend on the lower layers, and the lower layers have higher priority than the higher layers. Therefore, within the context of SVC, maximizing the total number of received layers or the overall throughput cannot guarantee the quality of video streaming, since the decoding of the enhancement layers depends on packets of the base layer. The layer synchronization requirement makes the total utility of maximum received layers suboptimal, because some higher layers, though successfully received, cannot be decoded for lack of their corresponding lower layers. When layer dependency and priority are not considered in constructing multicasting paths, the higher layers may overwhelm the lower layers through low path costs and prices. That is, when the packets of dependent lower layers are not all available by playback time, the packets of higher layers have to be discarded, even though bandwidth has been allocated to the higher layers to maximize the total utility for all receivers. This unexpected result obviously deviates from the original optimization objective.

To clearly illustrate this problem, we take a simple example. The network shown in Fig. 1a contains one source S, four relay nodes R1 to R4, and two receivers T1, T2, with the capacity marked on each link. Assume the source encodes an SVC stream into three layers, each with a rate of 2 (data units/second). According to the Max-Flow Min-Cut Theorem, the max flows to receivers T1 and T2 are 4 and 6. Thus, T1 and T2 can subscribe to 2 layers and 3 layers respectively. For simplicity, assume the data suffer similar propagation delay along each link.
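The subscription counts in this example can be checked with a minimal sketch. The helper name `subscribable_layers` is hypothetical and not from the paper; it assumes that layers must be taken in incremental order, starting from the base layer, and that a layer is subscribed only if its full rate fits within the receiver's max flow.

```python
def subscribable_layers(max_flow, layer_rates):
    """Count consecutive layers (base layer first) whose cumulative rate fits in max_flow."""
    total = 0.0
    count = 0
    for rate in layer_rates:
        if total + rate > max_flow:
            break  # a higher layer is useless without all lower layers
        total += rate
        count += 1
    return count


# Three layers of rate 2 each, as in the example above.
layer_rates = [2, 2, 2]
layers_t1 = subscribable_layers(4, layer_rates)  # max flow to T1 is 4
layers_t2 = subscribable_layers(6, layer_rates)  # max flow to T2 is 6
print(layers_t1, layers_t2)
```

The max-flow values 4 and 6 are taken from the text rather than computed, since the link capacities of Fig. 1a are not reproduced here.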

When the LION model [29] is adopted to maximize the network throughput, its distribution meshes of the base layer and the first enhancement layer are shown in Fig. 1b and c (with solid lines for T1 and dashed lines for T2, and the associated numbers on each link specifying the bandwidth assigned for T1/T2 on each layer). Observing the distribution mesh for T1, we find that it crosses four links (S → R2 → R3 → R4 → T1) at the base layer, while it only takes the propagation delay of two links (S → R1 → T1) at the enhancement layer, which introduces a reversed propagation delay of two links between the base layer and the enhancement layer. The layer synchronization of SVC decoding in T1 will be greatly influenced by such reversed propagation delay, resulting in heavy buffer management and decoder burden. As the reversed propagation delay between lower and higher layers increases, the burden of buffer and decoder at the receivers also increases in order to have the higher layer successfully decoded. Moreover, this dilemma becomes more critical when either the scale of the network or the number of total scalable video layers becomes large.

Fig. 1 An example of flow distribution meshes, where a is the initial topology, b and c are distribution meshes constructed by LION algorithm for the base layer and the first enhancement layer, respectively

The second issue is associated with efficient bandwidth utilization. Under a multipath routing mechanism, each receiver has multiple candidate paths from the source to receive the video streams. To receive the same layer with minimum bandwidth consumption, the paths that share more links with other receivers' paths are preferred. As shown in Fig. 2a, the example topology consists of a source node S, three relay nodes R1 to R3, and three receiver nodes T1 to T3, and the available capacity in number of packets is marked on each link. Suppose the base layer has 3 packets to be transmitted. Also, assume the data suffer similar propagation delay along each link. Figure 2b and c display the distribution meshes for the base layer under two different routing strategies. We find that, although the three receivers successfully obtain the base layer with roughly similar latency under both strategies, their aggregate bandwidth consumptions are quite different (the former consumes 17, while the latter uses 14). The delivery of the base layer packets under the two strategies is shown in Fig. 2d and e, where a, b and c denote three packets in the base layer and b + c corresponds to the packet after the network coding operation. It is observed from Fig. 2e that, due to the selection of the overlapping paths S → R2 → T1 and S → R2 → T3 as well as the employment of network coding, the latter solution utilizes less bandwidth at the base layer, and thus leaves more available resource for the higher layers.

Fig. 2 An example of distribution meshes on the base layer by two different routing solutions, where a is the topology, b and c are the base layer mesh, d and e are packet transmissions
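The role of the coded packet b + c can be illustrated with a minimal sketch. It assumes coding over GF(2), i.e., that b + c is the bytewise XOR of two equal-length packets; the text does not state which field is used, and the packet contents and the helper `xor_bytes` are hypothetical.

```python
def xor_bytes(x, y):
    """Bytewise XOR of two equal-length packets (addition over GF(2))."""
    if len(x) != len(y):
        raise ValueError("packets must have equal length")
    return bytes(p ^ q for p, q in zip(x, y))


# Hypothetical payloads standing in for base layer packets b and c.
b = b"pkt-b"
c = b"pkt-c"

# A relay sends the single coded packet b + c instead of b and c separately.
coded = xor_bytes(b, c)

# A receiver that also obtains c over another path recovers b from the coded packet.
recovered_b = xor_bytes(coded, c)
print(recovered_b == b)
```

One coded packet on a shared link thus serves receivers that hold different side information, which is why overlapping paths can lower the aggregate bandwidth.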

3 Problem statement

3.1 Notations

The video distribution network can be modeled as a directed graph $G(V, E)$, where $V$ is the set of nodes and $E$ is the set of directed links. The set $V$ comprises three kinds of nodes: $S$, $R$ and $T$, representing the set of source nodes, relay nodes and receiver nodes respectively. The SVC stream is encoded into $M$ layers, with each layer $m$ corresponding to a multicast session at expected transmission rate $B_m$. For any link $(i, j)$, let $c_{ij}$ denote its capacity, and $f_{ij}^m$ represent the bandwidth consumed on layer $m$.

Suppose from the source to receiver $t$ there exist multiple alternative paths $P(t)$. For each receiver $t$, let $R_{t,k}^m$ denote the information flow rate assigned to its $k$-th path for transmitting packets of layer $m$. As a path consists of consecutive links, we use a matrix $Z^t$ to denote whether the links are included in $t$'s paths:

$$
Z_{k,ij}^{tm} =
\begin{cases}
1, & \text{if edge } (i, j) \in \text{path } k \text{ on layer } m; \\
0, & \text{otherwise.}
\end{cases}
$$
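A minimal sketch of how the indicator could be built for one receiver and one layer is given below. The helper names are hypothetical; the two paths are the ones quoted for T1 in the Fig. 1 discussion of Section 2, and the dictionary-based layout is only one possible representation of $Z^t$.

```python
def path_links(nodes):
    """Turn a node sequence into its list of directed links (i, j)."""
    return list(zip(nodes, nodes[1:]))


def build_indicator(paths, links):
    """Return rows with Z[k][(i, j)] = 1 if link (i, j) lies on path k, else 0."""
    rows = []
    for path in paths:
        on_path = set(path_links(path))
        rows.append({link: int(link in on_path) for link in links})
    return rows


# Paths of receiver T1 quoted in Section 2 (one layer each, used here only as sample data).
paths_t1 = [["S", "R2", "R3", "R4", "T1"], ["S", "R1", "T1"]]
links = sorted({link for path in paths_t1 for link in path_links(path)})

Z = build_indicator(paths_t1, links)
hops = [sum(row.values()) for row in Z]  # number of links on each path
print(hops)
```

Summing a row of the indicator gives the hop count of that path, which is the quantity behind the reversed propagation delay discussed in Section 2.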

The ultimate goal of video streaming is to provide the receiver with the best video quality. To estimate the quality of the SVC video stream that is received by each destination node, in this work, we take the rate-distortion model in [26]:

$$
D_e(R_e) = \frac{\theta}{R_e - R_0} + D_0
$$

where $D_e$ is the distortion of the encoded video sequence, measured by the mean squared error (MSE), and $R_e$ is the encoded rate. The variables $\theta$, $R_0$ and $D_0$ are the parameters of the R-D model.

When receiver $t$ accesses a new layer $m$, its receiving rate increases from $R$ to $R + \Delta R$. Approximating $D_e(R + \Delta R)$ by the first terms of its Taylor series up to second order, the corresponding quality variation between layers is:

$$
\begin{aligned}
\Delta D_e &= D_e(R + \Delta R) - D_e(R) \\
&= D_e'(R) \cdot \Delta R + \frac{1}{2} D_e''(R) \cdot \Delta R^2 + o(\Delta R^2) \\
&\approx -\frac{\theta}{(R - R_0)^2} \cdot \Delta R + \frac{\theta}{(R - R_0)^3} \cdot \Delta R^2
\end{aligned}
$$
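The quality of this second-order approximation can be checked numerically. The sketch below is illustrative only: the parameter values for $\theta$, $R_0$, $D_0$ and the rates are hypothetical and are not taken from the paper or from [26].

```python
def distortion(rate, theta, r0, d0):
    """R-D model D_e(R) = theta / (R - R0) + D0, valid for R > R0."""
    return theta / (rate - r0) + d0


def taylor_delta(rate, delta, theta, r0):
    """Second-order Taylor approximation of D_e(R + dR) - D_e(R)."""
    first_order = -theta / (rate - r0) ** 2 * delta
    second_order = theta / (rate - r0) ** 3 * delta ** 2
    return first_order + second_order


# Hypothetical model parameters and operating point.
theta, r0, d0 = 100.0, 0.0, 1.0
rate, delta = 10.0, 1.0

exact = distortion(rate + delta, theta, r0, d0) - distortion(rate, theta, r0, d0)
approx = taylor_delta(rate, delta, theta, r0)
print(exact, approx)
```

Both values are negative, reflecting that subscribing to an additional layer reduces distortion, and the gap between them shrinks as the rate increment becomes small relative to $R - R_0$.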

Namely, for any receiver $t$ with flow rate $R_t^m$ on layer $m$, its distortion decrement can be described as a strictly convex function of $R_t^m$:

$$
\Delta D_e(R_t^m) = -\frac{\theta}{\left(\sum_{i=0}^{m-1} R_t^i - R_0\right)^2} \cdot R_t^m + \frac{\theta}{\left(\sum_{i=0}^{m-1} R_t^i - R_0\right)^3} \cdot (R_t^m)^2 \tag{1}
$$

Generally, receiver $t$ has multiple alternative paths to join the multicast session $m$, but not all these paths are optimal ones. Analogous to practical routing, the optimal paths can be chosen in a variety of ways based on different considerations, such as delay, resource usage or commercial charge. Inspired by the generic cost function definition [4], we propose the following path cost function:

$$
\rho(R_{t,k}^m) = \sum_{(i,j) \in E} Z_{k,ij}^{tm} \cdot \frac{R_{t,k}^m}{c_{ij} - R_{t,k}^m} + d_k^t \cdot R_{t,k}^m \tag{2}
$$

According to [4], receiver $t$'s congestion in terms of queuing delay on each link in layer $m$ is a function of the ongoing information flow rate $R_{t,k}^m$ and the capacity $c_{ij}$ of that link. Using the M/M/1 queuing model [16], the average queuing delay on each link can be expressed by $1/(c_{ij} - R_{t,k}^m)$, and the total queuing delay on that link becomes $R_{t,k}^m/(c_{ij} - R_{t,k}^m)$. Consequently, the first part of (2) represents the sum of the queuing delays at the links that belong to that path. In the second term, $d_k^t$ is a parameter corresponding to the average propagation delay over path $k$ normalized by the average packet size. Therefore, the second term, $d_k^t \cdot R_{t,k}^m$, denotes the propagation delay on path $k$. With this definition, $\rho(R_{t,k}^m)$ denotes the end-to-end delay of the information flow within layer $m$ transmitted to receiver $t$ along its $k$-th path, and it is a differentiable and convex function.
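As a rough illustration, the path cost (2) can be evaluated as follows. The function name, the capacities and the delay parameters are hypothetical; the sketch takes the capacities of the links on the path directly, which plays the role of the indicator $Z_{k,ij}^{tm}$.

```python
def path_cost(rate, capacities, prop_delay):
    """Evaluate rho = sum over path links of rate / (c - rate), plus prop_delay * rate.

    `capacities` lists c_ij for the links on the path; the rate must stay
    strictly below every capacity for the M/M/1 term to be finite.
    """
    if any(rate >= c for c in capacities):
        raise ValueError("rate must be below the capacity of every link on the path")
    queuing = sum(rate / (c - rate) for c in capacities)
    return queuing + prop_delay * rate


# Hypothetical two-link and four-link paths, each link with capacity 4.
cost_short = path_cost(2.0, [4.0, 4.0], prop_delay=0.5)
cost_long = path_cost(2.0, [4.0, 4.0, 4.0, 4.0], prop_delay=1.0)
print(cost_short, cost_long)
```

With equal link capacities, the longer path accumulates more queuing and propagation delay, so a cost-minimizing allocation for the base layer would favor the shorter path.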

3.2 Optimization problem

For a given SVC streaming multicast network, we aim at maximizing the overall video quality (i.e., minimizing the total video distortion) of all receivers, while satisfying content priority of the base layer and minimum bandwidth utilization at all the layers. Mathematically, it can be formulated as:

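$$
\begin{aligned}
\textbf{P1:}\quad \text{minimize}\quad & O(R, f) = \sum_{t \in T} \sum_{m \in M} \Delta D_e\!\left(\sum_{k \in P(t)} R_{t,k}^m\right) + \sum_{t \in T} \sum_{k \in P(t)} \rho(R_{t,k}^1) + \sum_{m \in M} \sum_{(i,j) \in E} f_{ij}^m \\
\text{subject to}\quad & 1)\ \sum_{k \in P(t)} Z_{k,ij}^{tm} \cdot R_{t,k}^m \le f_{ij}^m, \quad \forall (i,j) \in E,\ \forall m \in M,\ \forall t \in T; \\
& 2)\ \sum_{m \in M} f_{ij}^m \le c_{ij}, \quad \forall (i,j) \in E; \\
& 3)\ 0 \le \sum_{k \in P(t)} R_{t,k}^m \le B_m, \quad \forall m \in M,\ \forall t \in T;
\end{aligned}
$$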

The objective function $O(R, f)$ consists of three parts. The first term represents the total quality variation between layers. The second term defines the overall end-to-end latency for the base layer dissemination. As the base layer makes the predominant contribution to video data reconstruction, we emphasize minimizing the delay of the base layer to guarantee a basic quality for all the receivers. The last term denotes the bandwidth assigned at all the layers. Clearly, it should be reduced as much as possible on the premise that all the receivers can successfully receive their desired contents. For an optimal overall video quality, we seek an aggregate minimization solution that takes into account these three factors.

Constraint 1) represents the relationship between the information flow rate and the actual bandwidth consumption within each layer on each link, where network coding is applied to information flows of the same video layer. With network coding, different receivers do not compete for link bandwidth within the same session. Therefore, the actual bandwidth consumption on link $(i, j)$ for layer $m$ is equal to the largest information flow rate among all the receivers.

Constraint 2) ensures that the total bandwidth consumption of each link over the different layers does not exceed the link capacity. Constraint 3) gives the upper bound of the information flow rate allocated to each receiver at each layer, i.e., for each receiver, the sum of the information flow rates for transmitting layer $m$ over all $P(t)$ paths cannot exceed the expected transmission rate $B_m$.
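The effect of Constraints 1) and 2) on a single link can be sketched as below. All names and numbers are hypothetical; the uncoded case, where the flows of different receivers simply add up, is included only as a contrast and is not part of the formulation.

```python
def link_bandwidth(receiver_flows, network_coding=True):
    """Bandwidth one layer needs on a link, given each receiver's flow on that link.

    With network coding inside a session the largest flow suffices (Constraint 1);
    without it, the flows of different receivers are assumed to add up.
    """
    return max(receiver_flows) if network_coding else sum(receiver_flows)


# Hypothetical flows of two receivers on one link, keyed by layer index m.
flows = {1: [2.0, 1.0], 2: [0.0, 2.0]}
capacity = 4.0

coded = {m: link_bandwidth(v) for m, v in flows.items()}
uncoded = {m: link_bandwidth(v, network_coding=False) for m, v in flows.items()}

# Constraint 2): the per-layer bandwidths on a link must fit into its capacity.
coded_fits = sum(coded.values()) <= capacity
uncoded_fits = sum(uncoded.values()) <= capacity
print(coded, coded_fits, uncoded, uncoded_fits)
```

In this sample, the same receiver flows fit the link only when flows of the same layer share bandwidth, which is the saving that the third term of the objective tries to exploit.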

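Define $R_t = [R_{t,1}^1, \cdots, R_{t,|P(t)|}^1, R_{t,1}^2, \cdots, R_{t,|P(t)|}^2, \cdots, R_{t,1}^M, \cdots, R_{t,|P(t)|}^M]$ and $R = [R_1, \cdots, R_T]^T$. Also let $\mathcal{R}_t = \left\{ R_t \,\middle|\, 0 \le \sum_{k \in P(t)} R_{t,k}^m \le B_m, \text{ for all } m \text{ and } k \right\}$, $t \in T$, and let $\mathcal{R}$ denote the Cartesian product of $\mathcal{R}_t$ ($t \in T$). Then Problem P1 can be rewritten as:

$$
\begin{aligned}
\textbf{P2:}\quad \underset{R \in \mathcal{R}}{\text{minimize}}\quad & \sum_{t \in T} \sum_{m \in M} O(R, f) \\
\text{subject to}\quad & 1)\ \sum_{k \in P(t)} Z_{k,ij}^{tm} \cdot R_{t,k}^m \le f_{ij}^m, \quad \forall (i,j) \in E,\ \forall m \in M,\ \forall t \in T; \\
& 2)\ \sum_{m \in M} f_{ij}^m \le c_{ij}, \quad \forall (i,j) \in E.
\end{aligned}
\tag{3}
$$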

It can be verified that the objective function and the constraint set in P2 are all convex [6]. Thus, there exists a unique optimal solution to P2, which can be obtained by centralized algorithms. However, the drawback of a centralized solution is that it requires a central node to collect global information, such as the assigned flow rates on all links, and to perform all the computations. Such a solution can be very costly and sometimes infeasible in practice. As the network size grows, it is preferable to solve the problem in a distributed manner.

4 Distributed algorithm

4.1 Primal decomposition

It is difficult to directly solve problem P2 with Lagrange duality, because of the interaction between the variables $f_{ij}^m$ and $R_{t,k}^m$ in Constraint 1). If we fix the variables $f_{ij}^m$, P2 can be decoupled with respect to the variables $R_{t,k}^m$. Following this assumption, we adopt the primal decomposition approach [23] and solve P2 by a two-level optimization procedure:

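$$
\begin{aligned}
\textbf{P2a:}\quad \underset{R \in \mathcal{R}}{\text{minimize}}\quad & \sum_{t \in T} \sum_{m \in M} O(R, f) \\
\text{subject to}\quad & \sum_{k \in P(t)} Z_{k,ij}^{tm} \cdot R_{t,k}^m \le f_{ij}^m, \quad \forall (i,j) \in E,\ \forall m \in M,\ \forall t \in T;
\end{aligned}
\tag{4}
$$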

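$$
\begin{aligned}
\textbf{P2b:}\quad \text{minimize}\quad & \hat{O}(f) \\
\text{subject to}\quad & \sum_{m \in M} f_{ij}^m \le c_{ij}, \quad \forall (i,j) \in E.
\end{aligned}
\tag{5}
$$

Here $\hat{O}(f)$ denotes the optimal objective value of P2a for a given $f$.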

Problem P2a performs a low-level optimization, which can be further decomposed into a set of sub-problems under the condition that $f$ is fixed. Problem P2b performs a high-level optimization, which updates the variable $f$. The optimal value of the objective function of the low-level optimization is only optimal for the given $f$. It approaches the global optimum by using the results of the high-level optimization.

4.2 Two-level optimization update

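(1) Low-level optimization update The Lagrangian of Problem P2a is defined as:

$$
L(R, \lambda) = \sum_{t \in T} \sum_{m \in M} O(R, f) + \sum_{t \in T} \sum_{m \in M} \sum_{(i,j) \in E} \lambda_{ij}^{tm} \left[ \sum_{k \in P(t)} \left(Z_{k,ij}^{tm} \cdot R_{t,k}^m\right) - f_{ij}^m \right] \tag{6}
$$

where $\lambda_{ij}^{tm}$ is the Lagrange multiplier.

The Lagrange dual function $g(\lambda)$ is the maximum value of the Lagrangian over the primal variable $R$, and it is given by $g(\lambda) = \sup_{R} L(R, \lambda)$. The Lagrange dual problem is then formulated as: $\underset{\lambda \ge 0}{\text{maximize}}\ g(\lambda)$. Note that P2a is equivalent to the above dual problem when the following Karush-Kuhn-Tucker (KKT) conditions [6] are satisfied:

$$
\begin{aligned}
&(1)\ \left.\frac{\partial L(R, \hat{\lambda})}{\partial R_{t,k}^m}\right|_{R_{t,k}^m = \hat{R}_{t,k}^m} = 0, \quad \forall k \in P(t),\ \forall m \in M,\ \forall t \in T; \\
&(2)\ \sum_{(i,j) \in E} \hat{\lambda}_{ij}^{tm} \left[ \sum_{k \in P(t)} \left(Z_{k,ij}^{tm} \cdot \hat{R}_{t,k}^m\right) - f_{ij}^m \right] = 0, \quad \forall (i,j) \in E,\ \forall m \in M,\ \forall t \in T; \\
&(3)\ \sum_{k \in P(t)} Z_{k,ij}^{tm} \cdot \hat{R}_{t,k}^m - f_{ij}^m \le 0, \quad \forall (i,j) \in E,\ \forall m \in M,\ \forall t \in T; \\
&(4)\ \hat{\lambda}_{ij}^{tm} \ge 0, \quad \forall (i,j) \in E,\ \forall m \in M,\ \forall t \in T;
\end{aligned}
$$

where $\hat{R}$ and $\hat{\lambda}$ represent the primal and dual optimal points, respectively.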

We now propose the following primal-dual algorithm [22] to solve the low-level optimization problem. It updates the primal and the dual variables simultaneously, and moves together towards the optimal points asymptotically.

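$$
R_{t,k}^m(n+1) = \left[ R_{t,k}^m(n) + \dot{R}_{t,k}^m \right]^+ = \left[ R_{t,k}^m(n) + \alpha(n) \cdot \left.\frac{\partial L(R, \lambda)}{\partial R_{t,k}^m}\right|_{R_{t,k}^m(n)} \right]^+ \tag{7}
$$

$$
\lambda_{ij}^{tm}(n+1) = \left[ \lambda_{ij}^{tm}(n) + \dot{\lambda}_{ij}^{tm} \right]^+ = \left[ \lambda_{ij}^{tm}(n) - \beta(n) \cdot \left.\frac{\partial L(R, \lambda)}{\partial \lambda_{ij}^{tm}}\right|_{\lambda_{ij}^{tm}(n)} \right]^+ \tag{8}
$$

where $n$ is the iteration index, $\alpha(n)$ and $\beta(n)$ are positive step sizes, and $[z]^+ = \max\{z, 0\}$. The partial derivatives of $R$ and $\lambda$ are given by:

$$
\dot{R}_{t,k}^m = \alpha(R_{t,k}^m) \left[ \frac{\partial O(R_{t,k}^m, f_{ij}^m)}{\partial R_{t,k}^m} + \sum_{(i,j) \in E} Z_{k,ij}^{tm} \cdot \lambda_{ij}^{tm} \right] \tag{9}
$$

$$
\dot{\lambda}_{ij}^{tm} = \beta(\lambda_{ij}^{tm}) \left[ \sum_{k \in P(t)} Z_{k,ij}^{tm} \cdot R_{t,k}^m - f_{ij}^m \right] \tag{10}
$$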

Here $\lambda_{ij}^{tm}$ can be viewed as the congestion price at link $(i, j)$ for the bandwidth requirement of receiver $t$ in layer $m$. It can be seen from (8) and (10) that if the demand $\sum_{k \in P(t)} (Z_{k,ij}^{tm} \cdot R_{t,k}^m)$ at link $(i, j)$ for the information flow exceeds the supply $f_{ij}^m$, the price $\lambda_{ij}^{tm}$ will rise, and it will decrease otherwise. Also, it is notable that all the updating steps are distributed and can be implemented at individual links using only local information.
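The price mechanism can be illustrated on a toy problem with one receiver, one path and one link. The sketch below is not the paper's algorithm: it replaces the objective by a hypothetical convex cost $(R - B)^2$, uses constant step sizes, and adopts the standard projected primal-dual sign convention for a minimization problem (gradient descent on the rate, price increase when demand exceeds supply).

```python
def primal_dual(target, supply, alpha=0.05, beta=0.05, iterations=2000):
    """Projected primal-dual iteration for: minimize (R - target)^2 s.t. R <= supply, R >= 0.

    `rate` plays the role of R and `price` the role of the congestion price lambda.
    """
    rate, price = 0.0, 0.0
    for _ in range(iterations):
        grad_rate = 2.0 * (rate - target) + price  # dL/dR of the toy Lagrangian
        excess = rate - supply                     # demand minus supply on the link
        # Simultaneous update: both new values use the old rate and price.
        rate, price = (max(0.0, rate - alpha * grad_rate),
                       max(0.0, price + beta * excess))
    return rate, price


# Toy instance: the receiver would like rate 3, but the link offers only 2.
rate, price = primal_dual(target=3.0, supply=2.0)
print(round(rate, 4), round(price, 4))
```

In this toy instance the rate settles at the supplied bandwidth and the price settles at the value that makes the gradient of the Lagrangian vanish, which mirrors the complementary slackness condition among the KKT conditions.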

(2) High-level optimization update As mentioned above, the low-level optimization is operated under the assumption that the value of f is fixed. In this section, we discuss how to adjust f to solve the high-level optimization problem.

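Suppose $\hat{\lambda}_{ij}^{tm}$ is the optimal Lagrange multiplier corresponding to the constraint in P2a. Similar to $R_t$, we define $f_{ij} = [f_{ij}^1, \cdots, f_{ij}^M]$ and $f = [f_{11}, \cdots, f_{E}]^T$. Also let $\mathcal{F}_{ij} = \left\{ f_{ij} \,\middle|\, f_{ij}^m \ge 0 \text{ for all } m \text{ and } \sum_{m \in M} f_{ij}^m \le c_{ij} \right\}$, $(i, j) \in E$, and let $\mathcal{F}$ denote the Cartesian product of $\mathcal{F}_{ij}$, $(i, j) \in E$. Then the Lagrangian and the primal-dual algorithm of P2b are proposed as follows:

$$
L(f, \eta) = \hat{O}(f) + \sum_{(i,j) \in E} \eta_{ij} \left( \sum_{m \in M} f_{ij}^m - c_{ij} \right) \tag{11}
$$

$$
f_{ij}^m(n+1) = \left[ f_{ij}^m(n) + \dot{f}_{ij}^m \right]^+ = \left[ f_{ij}^m(n) + a(n) \cdot \left.\frac{\partial L(f, \eta)}{\partial f_{ij}^m}\right|_{f_{ij}^m(n)} \right]^+ \tag{12}
$$

$$
\eta_{ij}(n+1) = \left[ \eta_{ij}(n) + \dot{\eta}_{ij} \right]^+ = \left[ \eta_{ij}(n) - b(n) \cdot \left.\frac{\partial L(f, \eta)}{\partial \eta_{ij}}\right|_{\eta_{ij}(n)} \right]^+ \tag{13}
$$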

where $n$ denotes the iteration index, and $a(n)$, $b(n)$ are positive step sizes. Through mathematical deduction, the partial derivatives of $f$ and $\eta$ are given by:

$$
\dot{f}_{ij}^m = \left.\frac{\partial L(f, \eta)}{\partial f_{ij}^m}\right|_{f_{ij}^m(n)} = a(f_{ij}^m) \left[ 2 \cdot f_{ij}^m + \sum_{t \in T} \lambda_{ij}^{tm} + \eta_{ij} \right] \tag{14}
$$

$$
\dot{\eta}_{ij} = \left.\frac{\partial L(f, \eta)}{\partial \eta_{ij}}\right|_{\eta_{ij}(n)} = b(\eta_{ij}) \left[ \sum_{m \in M} f_{ij}^m - c_{ij} \right] \tag{15}
$$

Actually, $\eta_{ij}$ can be regarded as the aggregate congestion price of link $(i, j)$. If the consumed bandwidth $f_{ij}^m$ on link $(i, j)$ in layer $m$ cannot meet the actual requirement of all receivers, $f_{ij}^m$ will increase in the next step; otherwise, it will decrease. Also, the iterations of $\eta_{ij}$ and $f_{ij}^m$ can be implemented in a decentralized manner.
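The interplay of the two levels can be sketched on a toy instance with one link shared by two layers. Everything below is an illustrative assumption rather than the paper's algorithm: the low-level objective is replaced by a hypothetical cost $\sum_m (R_m - B_m)^2$ whose solution and optimal prices are available in closed form, the bandwidth term is taken as $\sum_m f_m$, constant step sizes are used, and the outer loop follows the standard descent convention for a minimization problem, in which the optimal price $\hat{\lambda}_m$ acts as the marginal benefit of more bandwidth.

```python
def inner_solution(f, targets):
    """Closed-form stand-in for the converged low-level loop.

    Solves: minimize sum_m (R_m - B_m)^2 subject to 0 <= R_m <= f_m,
    and returns the optimal rates and the optimal prices lambda_m.
    """
    rates = [min(fm, bm) for fm, bm in zip(f, targets)]
    prices = [2.0 * max(0.0, bm - fm) for fm, bm in zip(f, targets)]
    return rates, prices


def two_level(targets, capacity, a=0.05, b=0.05, iterations=3000):
    """Outer loop on per-layer bandwidth f_m and the link price eta."""
    f = [0.5 * capacity / len(targets)] * len(targets)
    eta = 0.0
    for _ in range(iterations):
        _, prices = inner_solution(f, targets)  # low level runs to convergence first
        total = sum(f)
        # Gradient of (optimal inner value + sum f + eta * (sum f - c)) w.r.t. f_m.
        f = [max(0.0, fm - a * (1.0 - pm + eta)) for fm, pm in zip(f, prices)]
        eta = max(0.0, eta + b * (total - capacity))
    rates, _ = inner_solution(f, targets)
    return f, eta, rates


# Two layers with target rates 2.5 and 1.0 sharing a link of capacity 2.
f, eta, rates = two_level(targets=[2.5, 1.0], capacity=2.0)
print([round(x, 4) for x in f], round(eta, 4))
```

Solving the inner problem in closed form mimics the time-scale separation in which the high level only moves after the low level has converged; in this toy instance both layers end up with the same marginal price, and the link price becomes positive because the capacity is fully used.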

5 Practical implementation of the distributed algorithm

When implementing the proposed distributed algorithm, each link $(i, j)$ and each receiver $t$ is treated as an entity capable of processing, storing and communicating information in a distributed computation system. Assume that the processor for link $(i, j)$ keeps track of the variables $\lambda_{ij}^{tm}$ and $f_{ij}^m$, while the processor for receiver $t$ keeps track of the variable $R_{t,k}^m$. A decentralized version of the proposed algorithm is summarized in Table 1.

Note that the low-level and high-level algorithms operate at different time scales. The former is an inner loop and operates at a fast time scale, while the latter is an outer loop and operates at a slow time scale. More specifically, the high-level algorithm will not move to its next step until $\hat{\lambda}$ at the low level converges to its optimum value. When the algorithm converges, the generated solution jointly optimizes the rate allocation and the transmission structure.

Table 1 Implementation of the proposed distributed algorithm

Initialization
    Sets the low-level and high-level iteration indices to 0, and sets $\lambda_{ij}^{tm}(0)$, $R_{t,k}^m(0)$, $f_{ij}^m(0)$, $\eta_{ij}(0)$ respectively to some non-negative values for all $t$, $m$, $(i, j)$ and $k$.
Repeat
    Updating at link $(i, j)$ in Low-level Implementation:
        Receives $R_{t,k}^m(n)$ from all receivers $\{t \mid t \in T, \text{ and } Z_{k,ij}^{tm} = 1\}$.
        Updates the congestion price $\lambda_{ij}^{tm}(n)$ according to (8) and (10).
        Broadcasts the new price $\lambda_{ij}^{tm}(n+1)$ to all receivers $\{t \mid t \in T, \text{ and } Z_{k,ij}^{tm} = 1\}$.
    Updating at receiver $t$ in Low-level Implementation:
        Receives from the network the aggregate congestion price $\sum_{k \in P(t)} Z_{k,ij}^{tm} \cdot R_{t,k}^m$.
        Updates the rate $R_{t,k}^m(n)$ with (7) and (9).
        Broadcasts the rate $R_{t,k}^m(n+1)$ to all links $\{(i, j) \mid (i, j) \in E, \text{ and } Z_{k,ij}^{tm} = 1\}$.
    Updating at link $(i, j)$ in High-level Implementation:
        Calculates the sum $\hat{\lambda}_{ij}^{m}\left(f_{ij}^m(n)\right) = \sum_{t \in T} \hat{\lambda}_{ij}^{tm}\left(f_{ij}^m(n)\right)$.
        Updates a new $f_{ij}^m(n)$ with (12) and (14).
        Updates the aggregate congestion price according to (13) and (15).
        Broadcasts the new $f_{ij}^m(n+1)$ to all receivers $\{t \mid t \in T, \text{ and } Z_{k,ij}^{tm} = 1\}$.
Until
    All variables converge to the optimums.

When the communication overhead issue [15] is taken into account, all the update operations at both low-level and high-level iterations can use variables stored in the local node or link, except the updated rates $R_{t,k}^m(n+1)$, $f_{ij}^m(n+1)$ and the updated price $\lambda_{ij}^{tm}(n+1)$, which need to be transmitted by extra packets. For example, according to (8), to update the Lagrange price $\lambda_{ij}^{tm}(n)$, the Rate Packet (RP) of receiver $t$ carrying the rate information $R_{t,k}^m(n)$ only needs to be transmitted upward along $t$'s paths to the subset of links $\{(i, j) \mid (i, j) \in E, \text{ and } Z_{k,ij}^{tm} = 1\}$. Similarly, on the basis of (7), to update the rate $R_{t,k}^m(n)$, the Control Packet (CP) containing link $(i, j)$'s Lagrange price $\lambda_{ij}^{tm}(n)$ only needs to be sent downward to the subset of receivers $\{t \mid t \in T, \text{ and } Z_{k,ij}^{tm} = 1\}$ along the paths that the link belongs to.

If we adopt the float type in the implementation, each rate or Lagrange price takes up only 4 bytes, which is negligible compared with the main video streaming traffic. As a rough estimate, the time the whole network needs to reach stability equals the number of iterations required for convergence multiplied by the update interval of each iteration. It is found in [9] that an update interval of about 2 to 3 times the one-way propagation delay of the particular receiver is sufficient. Therefore, the overall overhead of the proposed distributed algorithm is quite small.
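The estimate can be made concrete with a short calculation. The sketch below plugs in the iteration count (400) and update interval (5.4 ms) listed in Table 2 for the packet-level simulation; treating these as the values that govern convergence time is an assumption, and the function names are illustrative.

```python
def convergence_time_s(iterations, update_interval_ms):
    """Rough stabilisation time: iterations multiplied by the update interval."""
    return iterations * update_interval_ms / 1000.0


def signalling_payload_bytes(iterations, bytes_per_value=4):
    """Payload carried for one rate or price value over all iterations (headers excluded)."""
    return iterations * bytes_per_value


t_conv = convergence_time_s(400, 5.4)
payload = signalling_payload_bytes(400)
print(t_conv, payload)
```

With these inputs the stabilisation time is about 2.16 s and each tracked value accounts for 1600 bytes of payload over the whole run.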

6 Results and discussion

In this section, we present numerical and simulation results to show the performance of the proposed algorithm. We conduct numerical experiments on the classical butterfly network topology, which has been used extensively in network coding-based simulation studies [1,19,28]. The purpose of the numerical experiments is to evaluate the convergence behavior of the proposed distributed algorithm. We also present results of a packet-level simulation on a general network topology, and show that our algorithm achieves an overall balanced throughput and better video quality over all receivers.

6.1 Numerical simulation results

The classical butterfly network topology, shown in Fig. 3a, consists of source S, relay nodes Ri, and receivers Ti. The capacity and the random propagation delay (between 0 and 1) of each link are marked as capacity/delay on the link. Assume the source encodes an SVC stream into three layers, with a rate of 2.5 (data units/second) on both the base layer and the first enhancement layer, and a rate of 1 on the second enhancement layer.

Convergence behavior Figure 4 shows the data rate assigned to each receiver at each layer during the low-level optimization, where we adopt constant step sizes with α(n) = 0.0631, β(n) = 0.01733, a(n) = 0.51 and b(n) = 0.0155. It can be seen that all data rates converge after 100 iterations. For instance, the total rate achieved by T1 reaches within 10 % of its optimal value after 37 iterations and converges to 5.00038 after 80 iterations. The rate achieved by T2 over two layers reaches within 10 % of its optimum after 46 iterations and converges to 4.995936 after 92 iterations.

Fig. 3 Network topology annotated with link capacity/delay, where (a) is a butterfly topology for the numerical experiment and (b) is a general network topology for the SVC streaming based simulation

Figure 5 shows the convergence behavior of the high-level optimization. Owing to space limits, we only show the rate evolution of links (R1, R3), (R2, R3) and (R3, R4) at the first enhancement layer; other links behave similarly. It is observed that the flow rates on these three links converge after 250 iterations. In addition, due to the implementation of network coding on link (R3, R4), the sum of the flow rates on links (R1, R3) and (R2, R3) is almost equal to the flow rate on link (R3, R4).

Fig. 4 Evolution of the assigned rate for each receiver in the low-level optimization (panels: Receiver T1, Receiver T2; x-axis: number of iterations; y-axis: received rate; curves: base layer, first enhancement layer, second enhancement layer)

Fig. 5 The performance of the high-level optimization (y-axis: flow rate; curves: edges (R1, R3), (R2, R3) and (R3, R4))

Impact of step size The Lagrange multipliers λ and η signal the congestion status of the entire network. By iteratively modifying their values, the distributed algorithm gradually reaches an optimal rate allocation solution. We now investigate the impact of the step sizes used to update λ and η on the convergence speed. In contrast to the aforementioned experiment, we adjust β(n) to 0.018 and b(n) to 0.025, with α(n) and a(n) unchanged. As seen in Fig. 6, in this case the receiving rate of T2 does not converge to the optimal point. Instead, it converges to a suboptimal solution within a small neighborhood of the optimum. Since this phenomenon is likely to occur when a constant step size is used [5], a diminishing step size is a better alternative.

Fig. 6 Impact of different fixed step sizes on convergence behavior

Fig. 7 Performance comparison for constant and diminishing step size (panels: Receiver T1, Receiver T2; y-axis: received rate; curves: constant step size, diminishing step size)

Here we let β(t) = 1/t, satisfying lim_{t→∞} β(t) = 0 and Σ_{t=1}^{∞} β(t) = ∞. Compared with a constant step size, we can see in Fig. 7 that the receiving rates with a diminishing step size vary more smoothly but converge more slowly than with a fixed step size.

Although a fixed step size is more convenient for distributed implementation, a diminishing step size is recommended in practice, because a rate with small, smooth fluctuations is crucial for smooth video quality.
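The qualitative difference between the two step-size rules can be reproduced on a toy problem. The Python sketch below applies the subgradient method to the one-dimensional function f(x) = |x − 3|, which is a stand-in chosen for illustration and not the dual problem of this paper. The constant step 0.4, the starting point and the iteration count are arbitrary assumptions.

```python
def subgradient_descent(step, x0=0.0, target=3.0, n_iter=200):
    """Minimise f(x) = |x - target| using step sizes step(n), n = 1, 2, ..."""
    x = x0
    for n in range(1, n_iter + 1):
        g = (x > target) - (x < target)   # a subgradient of |x - target|: -1, 0 or +1
        x -= step(n) * g
    return x


x_const = subgradient_descent(lambda n: 0.4)      # constant step size
x_dimin = subgradient_descent(lambda n: 1.0 / n)  # diminishing step size 1/n
err_const = abs(x_const - 3.0)
err_dimin = abs(x_dimin - 3.0)
print(err_const, err_dimin)
```

With the constant step the iterate keeps oscillating at a fixed distance from the optimum, whereas with the step 1/n the distance is bounded by the current step size once the optimum has been crossed, so it keeps shrinking.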

Fig. 8 Comparison of achievable throughput

Throughput performance Figure 8 compares the throughput achieved at the two receivers by the shortest path (SP) distribution tree, the LION algorithm and the proposed algorithm. It is seen that the proposed algorithm outperforms both the shortest path and LION algorithms. The shortest path scheme constructs the video distribution tree with a single path and does not use network coding. In contrast, LION and our method introduce network coding based multipath routing and achieve significant gains in network throughput. Furthermore, for receiver T1, both multipath algorithms can realize its max-flow capacity of 5, while for receiver T2, only our distributed algorithm achieves a rate of 5.942, which approximates its max-flow capacity of 6.
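The max-flow capacity used as the benchmark here is the largest rate that can be routed from the source to a single receiver. A minimal Python sketch of how such a bound can be computed is given below, using the Edmonds-Karp algorithm. The link capacities are those of the textbook unit-capacity butterfly and are an assumption for illustration; they are not the capacities of Fig. 3a.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; capacity is {(u, v): c} for directed links."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)         # reverse residual arcs
    neighbours = {}
    for (u, v) in residual:
        neighbours.setdefault(u, []).append(v)
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:    # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in neighbours.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total                       # no augmenting path is left
        path, v = [], sink
        while parent[v] is not None:           # walk back from sink to source
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)  # bottleneck capacity on the path
        for (u, v) in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        total += push


# Textbook butterfly with unit capacities (hypothetical values).
butterfly = {
    ("S", "R1"): 1, ("S", "R2"): 1,
    ("R1", "T1"): 1, ("R1", "R3"): 1,
    ("R2", "T2"): 1, ("R2", "R3"): 1,
    ("R3", "R4"): 1, ("R4", "T1"): 1, ("R4", "T2"): 1,
}
flow_t1 = max_flow(butterfly, "S", "T1")
flow_t2 = max_flow(butterfly, "S", "T2")
print(flow_t1, flow_t2)
```

In this unit-capacity example each receiver has a max-flow of 2, with both receivers relying on the shared link (R3, R4) for their second path.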

6.2 Packet-level simulation results

To evaluate the received video quality under the proposed distributed algorithm, we also conduct packet-level simulations on a general network topology, as shown in Fig. 3b. It contains a source S, 11 relay nodes R1 ∼ R11 and 5 receivers T1 ∼ T5. The capacity (Kbps)/propagation delay (per Kbit) is marked on each link. The numbers of alternative paths for the 5 receivers are 4, 4, 6, 1 and 2, and their max-flow rates are 400, 550, 1300, 300 and 400 Kbps, respectively. The configuration of parameters is shown in Table 2.

In the packet-level simulations, we use practical random network coding [8] to distribute the source packets of each layer. Here we assume that intra-session network coding is implemented within each layer to ensure easy operation. During data transmission, each relay node (as well as the source node) combines the packets of the same generation received from different upstream links (or the video source packets encoded by the source node) with random linear operations over a large Galois field, and then sends the coded packets to its downstream links. Each destination node can correctly decode the original packets if it receives enough coded packets. To cope with asynchronous transmission, we use the buffer model [8] to synchronize packet arrivals and departures. In the buffer model, packets that arrive at a node on any of the incoming links are put into a single buffer sorted by layer. Then, whenever there is a transmission opportunity on an outgoing link, the number of packets of every layer in the buffer is checked, and a packet is generated containing a random linear combination of all the packets that belong to the layer with the largest number of packets. After the generated packet is transmitted on the outgoing link, certain old packets are flushed from the buffer according to the flushing policy. In particular, if two layers have the same number of packets in the buffer, the lower layer is given priority to generate a packet for transmission.
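The coding and layer-selection steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: arithmetic is done in the prime field GF(257) so that plain modular arithmetic suffices, instead of the Galois field configured in Table 2; the generation holds 4 packets instead of 50; relay recoding and buffer flushing are omitted. All names are hypothetical.

```python
import random

P = 257  # prime field size, assumed here for simplicity


def pick_layer(buffer_counts):
    """Layer with the most buffered packets; ties go to the lower layer (0 = base)."""
    return max(range(len(buffer_counts)), key=lambda m: (buffer_counts[m], -m))


def encode(packets, rng):
    """One coded packet: random coefficients plus the combined payload."""
    coeffs = [rng.randrange(P) for _ in packets]
    payload = [sum(c * pkt[i] for c, pkt in zip(coeffs, packets)) % P
               for i in range(len(packets[0]))]
    return coeffs, payload


def decode(coded, generation_size):
    """Gauss-Jordan elimination over GF(P); returns source packets, or None if rank is short."""
    rows = [list(c) + list(p) for c, p in coded]
    rank = 0
    for col in range(generation_size):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None                       # not enough independent packets yet
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], P - 2, P)  # modular inverse via Fermat's little theorem
        rows[rank] = [x * inv % P for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return [row[generation_size:] for row in rows[:generation_size]]


rng = random.Random(0)
source = [[rng.randrange(P) for _ in range(6)] for _ in range(4)]  # 4 packets of 6 symbols
received, decoded = [], None
while decoded is None:                        # keep collecting until decodable
    received.append(encode(source, rng))
    if len(received) >= len(source):
        decoded = decode(received, len(source))
print(decoded == source, pick_layer([3, 5, 5]))
```

The receiver does not need any particular packet, only enough linearly independent combinations, which is why coded packets arriving over different paths are interchangeable.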

We use four standard test sequences “Bus”, “Coastguard”, “Foreman” and “Mobile” with a frame rate of 30 fps, CIF (352×288) resolution, and a GOP length of 32 frames with an IBBP... structure. The streams are generated using the Joint Scalable Video Model 9_10 reference codec of the H.264/AVC scalable extension, with 256 Kbps on the base layer and 384 Kbps, 512 Kbps and 1024 Kbps on the enhancement layers by fine granularity scalability (FGS) encoding. Figure 9 shows the rate-distortion performance, measured in average peak signal-to-noise ratio (PSNR), for the four CIF video sequences.
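For reference, PSNR for 8-bit video is commonly computed from the mean squared error between original and decoded samples as 10·log10(255²/MSE). The Python sketch below implements this standard definition on flat sample lists. How the averages over frames are formed in the experiments is not stated here, so the function is only an illustration and its name is hypothetical.

```python
import math


def psnr_db(original, decoded, peak=255):
    """PSNR in dB between two equally long sequences of sample values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


print(psnr_db([10, 20, 30, 40], [11, 19, 31, 39]))
```

In the usage line every sample is off by one, so the MSE is 1 and the result is roughly 48.13 dB.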

Table 2 Configuration of parameters

Parameter description                   Value
Step size α(n)                          0.05
Step size a(n)                          0.05
Step size β(n)                          0.01
Step size b(n)                          0.01
Galois field size of network coding     8
Generation size of network coding       50
Number of iterations                    400
Update interval                         5.4 ms

Fig. 9 PSNR performance achieved for four CIF sequences with frame rate of 30 fps and GOP length of 32 (x-axis: rate (Kbps); y-axis: PSNR (dB); curves: Bus, Coastguard, Foreman, Mobile)

Throughput and transmission cost comparison Table 3 presents the number of layers received at each receiver with different algorithms. The transmission cost of the base layer for each receiver is shown in Fig. 10; it is the sum of the costs of the paths used for the base layer distribution, each calculated by (2). It can be seen that the shortest path scheme not only achieves the lowest throughput but also incurs an undesirably high cost for the base layer transmission. The LION algorithm builds the distribution meshes with a heuristic scheme and achieves a suboptimal throughput. Like the shortest path algorithm, LION does not consider the layer synchronization of SVC streams. Therefore, these two algorithms are not efficient for practical SVC multirate multicast. In contrast, the proposed algorithm jointly optimizes the throughput and the transmission cost. As a result, it achieves the best throughput over all receivers while maintaining the smallest cost for the base layer transmission. In addition, if network coding is not used in our algorithm, the overall throughput decreases markedly because bandwidth can no longer be shared within the same layer.
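The layer counts can be related to the receivers' max-flow rates with a short calculation. The Python sketch below assumes that the stated layer rates (256, 384, 512 and 1024 Kbps) are cumulative, i.e. the rate needed to decode up to and including each layer; this reading is an assumption, and the function name is illustrative.

```python
import bisect

# Assumed cumulative rate (Kbps) needed to decode layers 1..4.
CUMULATIVE_LAYER_RATES_KBPS = [256, 384, 512, 1024]


def layers_decodable(rate_kbps, cumulative_rates=CUMULATIVE_LAYER_RATES_KBPS):
    """Number of whole layers whose cumulative rate fits within rate_kbps."""
    return bisect.bisect_right(cumulative_rates, rate_kbps)


# Max-flow rates of the five receivers in the general topology.
max_flow_kbps = {"T1": 400, "T2": 550, "T3": 1300, "T4": 300, "T5": 400}
layers = {t: layers_decodable(r) for t, r in max_flow_kbps.items()}
print(layers, sum(layers.values()))
```

Under this reading the max-flow rates admit 2, 3, 4, 1 and 2 layers, 12 in total, which coincides with the row of Table 3 for the proposed algorithm with network coding.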

Table 3 Number of layers received at each receiver

Algorithm                                      T1  T2  T3  T4  T5  Total
Shortest path tree (without network coding)    2   2   3   1   2   10
LION (with network coding)                     2   3   3   1   2   11
Proposed algorithm (with network coding)       2   3   4   1   2   12
Proposed algorithm (without network coding)    2   2   3   1   2   10

Fig. 10 The transmission cost of the base layer for each receiver

Relationship between cost and delay According to the definition of the cost function in (2), the path cost can be used to depict the end-to-end path delay. To verify their linear relationship, we vary the playback deadline for the “Bus”, “Coastguard”, “Foreman” and “Mobile” streams from 400 ms to 500 ms. Note that we only consider broadcasting stored video; ‘live’ video is beyond the scope of this paper. Here, we suppose that packets are dropped if they do not arrive at the receiver by the playback deadline. In Table 4, we show the average video quality (in PSNR) at receiver T1 as an example. Clearly, the proposed algorithm achieves better video quality. Note that in the shortest path or the LION scheme, the base layer packets for T1 are dropped when the playback deadline is small, i.e. 400 ms. Although T1 can receive higher layer packets at lower cost, it still cannot decode any video information. As the playback deadline increases, larger packet delays can be tolerated. When the playback deadline increases to 500 ms, the video quality of the shortest path and LION schemes matches that of the proposed algorithm.
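The dependence on the base layer can be expressed as a small rule: a layer is useful only if it and every lower layer arrive before the playback deadline. The Python sketch below encodes that rule. The per-layer delays are hypothetical values chosen to mimic the situation described for T1, not measurements from the simulation.

```python
def decodable_layers(layer_delays_ms, deadline_ms):
    """Count consecutive layers, starting from the base layer, that meet the deadline."""
    count = 0
    for delay in layer_delays_ms:
        if delay > deadline_ms:
            break                 # a late layer makes all higher layers useless
        count += 1
    return count


# Hypothetical delays: base layer on a slow path, enhancement layer on a fast one.
delays = [450, 380]
print(decodable_layers(delays, 400), decodable_layers(delays, 500))
```

With a 400 ms deadline nothing is decodable even though the enhancement layer arrives in time, while at 500 ms both layers are usable.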

Influence of continuous achievable rate region Figure 11 shows the average video quality, measured as PSNR, for the “Mobile” stream at T2 and T3, where the aggregate rate allocated over the network, i.e., the total rate allocated on the output links of source node S, varies from 200 Kbps to 1.3 Mbps. Within the context of SVC, the achievable set of layer bandwidths can be continuous with FGS. Different receivers are able to receive data at different rates by joining different multicast groups and video streaming layers. The LION algorithm adopts a discrete layer rate control, where a receiver should receive either a layer in whole or nothing, even if there remains

Table 4 Received average video quality of T1 measured as PSNR (dB) for four sequences

                      Playback deadline = 400 ms              Playback deadline = 500 ms
                      Bus    Coastguard  Foreman  Mobile      Bus    Coastguard  Foreman  Mobile
Shortest path         0      0           0        0           29.38  31.47       36.06    27.9
LION                  0      0           0        0           29.38  31.47       36.06    27.9
Proposed algorithm    27.94  30.47       34.44    26.2        29.38  31.47       36.06    27.9