Joint Source and Flow Optimization for Scalable Video Multirate Multicast Over Hybrid Wired/Wireless Coded Networks

Chenglin Li, Hongkai Xiong, Senior Member, IEEE, Junni Zou, Member, IEEE, and Zhihai He, Senior Member, IEEE

Abstract—This paper aims to optimize the overall video quality and traffic performance for multi-rate video multicast over hybrid wired/wireless networks. In order to perform layered utility maximization over tiered networks, we propose a joint source-network flow optimization scheme in which individual layers of the scalable video stream are assigned to their optimal multicast paths and associated rates, so as to achieve the highest sustainable layered video quality with minimum costs. The scheme guarantees that each destination node accesses the progressive layered stream in an incremental order, applies network coding across overlapping paths to destination nodes to improve multicast capacity, and addresses the link contention problem during wireless transmission. We formulate the problem as a convex program with the objective to minimize the total rate-distortion variations between layers. Using primal decomposition and the primal-dual approach, we develop a decentralized algorithm with two levels of optimization. Numerical and packet-level results compare performance under different control conditions over coded and non-coded hybrid networks. They demonstrate that the proposed algorithm can achieve the max-flow throughput and provide better video quality with optimal layered access for heterogeneous receivers.

Index Terms—Multi-rate multicast, network coding, rate-distortion, resource allocation, scalable video coding.

I. Introduction

Multi-rate multicast has emerged as an important method for content distribution over large networks, owing to its capability to adapt to different user requirements and time-varying network conditions [1]. From a source coding perspective, scalable video coding (SVC) allows rate adaptation not only at the encoder/decoder, but also at intermediate network nodes while achieving highly efficient rate-distortion performance [2]. An SVC stream consists of a base layer and one or multiple enhancement layers with a flexible multi-dimensional layered structure, providing various operating points in spatial resolutions, temporal frame rates, and reconstruction quality levels. With multi-rate multicast, different SVC layers are transported in different IP multicast groups, to which heterogeneous receivers with different computation and communication resources and capabilities subscribe. Within this context, layered multi-rate multicasting is equivalent to a generalized multi-source problem where the progressive inter-layer dependency is considered as fairness between different sources [9], [11].

Manuscript received May 5, 2010; revised September 21, 2010 and November 6, 2010; accepted November 9, 2010. Date of publication March 17, 2011; date of current version May 4, 2011. This work was supported in part by the NSFC, under Grants 60632040, 60772099, 60802019, and 60928003, and by the Program for New Century Excellent Talents in University (NCET-09-0554). This paper was recommended by Associate Editor P. Frossard.

C. Li and H. Xiong are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: lcl1985@sjtu.edu.cn; xionghongkai@sjtu.edu.cn).

J. Zou is with the School of Communication and Information Engineering, Shanghai University, Shanghai 200072, China (e-mail: zoujn@shu.edu.cn).

Z. He is with the Department of Electrical and Computer Engineering, University of Missouri, Columbia, MO 65211 USA (e-mail: hezhi@missouri.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2011.2129070

Recently, the hybrid wired/wireless network has become an increasingly important tool for communication [3]–[5]. It is formed by placing a sparse network of base stations in an ad hoc network, where the base stations are connected by a high-bandwidth wired network and act as relays for wireless nodes. Hybrid networks present a tradeoff between traditional wired networks and pure ad hoc networks. They introduce flexibility and scalability that traditional wired-only networks cannot achieve, since data may be forwarded either wirelessly or through the high-capacity wired links. At the same time, adding wired base station nodes is a natural approach to reducing the energy and traffic burden on ad hoc nodes and increasing the system throughput. Furthermore, the hybrid ad hoc network model can be viewed as a means to extend the communication coverage of wireless cellular infrastructure. Because network performance is heterogeneous, with different wireless users having distinct access capacities and correspondingly requiring different quality of service, the hierarchical hybrid wired/wireless network structure can benefit the asymptotic capacity in terms of flexibility and scalability. Such advantages make multicast over hybrid networks suitable for multimedia dissemination with different quality levels. In this paper, we aim to develop an efficient flow control and performance optimization scheme for scalable video streaming over hybrid wired/wireless networks. Fig. 1 shows an example, where the layered SVC stream is generated at the source node and distributed to different users through the two-tier wired and wireless network. The objective is to maximize the overall video quality of all receivers.

Fig. 1. Illustration of a two-tier wired/wireless network.

The contribution of this paper is twofold. First, we formulate a mathematically rigorous convex optimization problem with the objective to minimize the total quality variation among layers, which considers the inter-layer dependency constraints of SVC bit streams to achieve the highest sustainable layered video quality with minimum costs. This formulation has several advantages: it guarantees that each destination node accesses the progressive layered stream in an incremental order; it uses network coding to improve multicast capacity; it addresses the link contention of wireless transmission; and it allows a partial quality layer to be received by introducing a tolerable rate region for each SVC layer. Second, we develop a distributed algorithm with two levels of optimization to maximize the total network utility by jointly optimizing the allocated source rates, transmission flow rates, and routing scheme. The proposed distributed algorithm is proven to be asymptotically stable using the Lyapunov theorem. Extensive numerical and packet-level results demonstrate that the proposed algorithm can achieve the max-flow throughput and improve the overall received video quality.

1051-8215/$26.00 © 2011 IEEE

The rest of this paper is organized as follows. Related work is discussed in Section II. The notations and network models are described in Section III. In Section IV, we formulate the flow control and resource allocation problem for scalable video coding and multi-rate multicast over hybrid wired/wireless networks. The decentralized algorithm for SVC-based multi-rate multicasting with network coding based routing is proposed in Section V. We also prove the stability of the proposed decentralized rate control scheme and provide an efficient implementation scheme. Numerical experimental results and packet-level simulations are presented in Section VI. Section VII concludes this paper.

II. Related Work

In communication networks, to achieve the capacity in single-source multiple-terminal multicast [6], network coding is often performed at intermediate nodes. Recent research has demonstrated that it is able to significantly improve the network throughput and robustness to link or node failure and packet losses [7]. Distributed random linear coding schemes [7], [17] have been proposed for practical implementation of network coding. Chen et al. [8] developed adaptive rate control algorithms for networks with and without coding sub-graphs.

In this paper, we incorporate network coding into scalable video streaming over the network to optimize the overall content distribution performance.

Several rate control schemes have been developed in the literature on scalable video coding, layered multicast, and network coding [9]–[12]. For example, the distributed rate allocation scheme in [9] addressed the problem of rate allocation for video multicast with SVC over wireless mesh networks, with the goal of minimizing the total video distortion of all peers. Yan et al. [10] used a rate-distortion function as the application utility measure for optimizing the overall video quality. These rate control methods improve the performance of scalable video streaming over hybrid networks. However, they yield suboptimal solutions, and the inter-dependency between SVC layers has not been adequately addressed. This may cause decoding problems in scalable video streaming over networks. For example, video packets from higher layers may arrive before or without packets from lower layers, which causes decoding failure.

Recently, network coding has emerged as an important tool for maximizing the network throughput during content distribution. Incorporating network coding into flow control and resource allocation allows us to achieve the maximum network throughput and performance limit. A recent work by Zhao et al., called the LION algorithm [11], demonstrated that incorporating both inter-layer dependency and network coding into flow control can significantly improve the network throughput for overlay multicast. However, the LION algorithm is only a heuristic method that progressively organizes receivers into separate layered meshes, and it achieves only suboptimal performance. In addition, it does not provide rigorous theoretical justification of the relation between the LION algorithm and the initial layered overlay multicast problem. As a further improvement, [12] proposed a prioritized flow optimization formulation for SVC and multicast over heterogeneous networks, which used the path cost and prices of each layer as the priority parameters to capture inter-layer dependency, and provided rigorous distributed algorithms proven to be stable and convergent.

III. Notations and Network Models

We consider video content distribution over a tiered wired and wireless network. The network can be modeled as a directed graph $G_1 \cup G_2$, where $G_1 = (V_1, E_1)$ denotes the wired network and $G_2 = (V_2, E_2)$ represents the wireless network.

In the wired network $G_1$, $E_1$ is the set of wired links and $V_1 = \{s\} \cup N \cup T$ is the set of wired nodes, where $\{s\}$, $N$, and $T$ represent the sets of source nodes, relay nodes, and receiver nodes, respectively. The wireless network $G_2$ is composed of the set of wireless links $E_2$ and the set of wireless nodes $V_2 = T \cup R \cup D$, where $R$ and $D$ denote the relay nodes and destination nodes, respectively, and $T$ represents the set of source nodes in $G_2$, which is also the set of receiver nodes in $G_1$. Hereinafter, we write $V = \{s\} \cup N \cup T \cup R \cup D$ and $E = E_1 \cup E_2$. Let each wired link $l \in E_1$ have a finite capacity $C_l$. For each wireless link $l \in E_2$, we consider the link contention in a shared transmission medium with capacity $C$ [20].


A. SVC Coding Model

With SVC, a scalable bit stream can be represented in two different ways: as a layered representation (layered scalable) or with flexible combined scalability (fully scalable) [34]. Generally, flexible combined scalability benefits the unicast scenario, where the target stream can be extracted at any bit rate from the SVC elementary stream in compliance with the single receiver’s capacity status. In comparison, layered scalability benefits network multicasting by offering a simple adaptation operation to heterogeneous receivers, i.e., different receivers can subscribe to different combinations of layers under the constraints of network capacity and layer dependency.

Therefore, we adopt the layered scalability for its simplicity and assume that the SVC video stream is encoded into a set of $M$ layers $\{L_1, L_2, \ldots, L_M\}$ with a predefined encoding rate based on the network condition. According to the encoding rates of layers, we can make the optimal adaptation decision in the scalability cube model illustrated in Fig. 2 by mapping from an SVC elementary stream with fully scalable representation into the layered representation. Correspondingly, the multicast of the SVC video stream is divided into $M$ multicast sessions. Each multicast session $m$ has one source node $s$, a set of destination nodes $D$, and a set of relay nodes $N \cup T \cup R$.

In order to successfully decode received SVC video streams, we should make sure that all destination nodes are able to subscribe to SVC multicast layers in an incremental order, since layer m + 1 is not decodable without its previous layers 1 to m. This layer dependency constraint promises the most efficient SVC decoding at each destination node.

In fact, a practical SVC encoder and decoder adapting to a fluctuating network can choose among a very large variety of options for flexible inter-layer motion-compensated prediction and reconstruction [38]. It is well known that the standardized scalable extension of H.264 (SVC) only specifies the bit-stream syntax and bounds to facilitate different applications. Concretely, a higher layer might still be decodable even if the lower layers it depends on are truncated or partially dropped to a mild extent. This would, however, cause a drift between the pictures reconstructed in the encoder and in the decoder. To prevent error propagation, the reconstructed reference pictures in the motion-compensated prediction would be limited to the highest layer that is completely received. Under this condition, non-normative coding techniques, e.g., a variety of error resilience and error concealment tools, would be utilized to estimate the lost information for the decoding of the higher layer. This makes it possible for a receiver to subscribe to a partial quality layer within an achievable rate region. Considering both fluctuating network adaptation and the optimization condition [39], [40], each layer is distributed over a multicast session at a variable transmission rate within a tolerable rate region $[b_m, B_m]$. Mathematically, the upper bound $B_m$ (e.g., the encoding rate with a resilient margin) and the lower bound $b_m$ (e.g., the minimum partial margin for layer $m$) specify a confidence interval for the layered transmission rate in layer $m$. This differentiates the layers by piecewise confidence intervals along the layer-dependent direction; namely, the achievable transmission rate for each layer is mathematically extended from an encoding rate point to a tolerable rate region. From the perspective of layered optimization, the fine-granular continuity of the target variables (rates) promotes the convexity of the optimization problem for developing a distributed solution, and supports a strong notion of fairness.

Fig. 2. Typical structure of scalable video bitstream with multiple dimensions.

B. Network Coding Model

The algebraic operation at the packet level at intermediate nodes, called network coding [6], has recently attracted significant research interest [8], [13]. Besides improving the throughput of communication networks, network coding has been found to offer various other potential benefits, including robustness to link/node failures [14] and packet losses [15], [16]. Distributed random linear coding schemes [7], [17], on the other hand, have made practical implementation of network coding possible. To transmit multiple multicast sessions over a shared network, we might perform network coding across sessions to achieve the optimal throughput. However, combining data belonging to different layers makes it difficult to recover all original data for destination nodes that only receive partial layers. Thus, we limit network coding to within each session in this paper. This approach is often referred to as intra-session coding or superposition coding [18].

With intra-session network coding, flows to different destinations of a multicast session are allowed to share network capacity by being coded together. For a single multicast session $m$ with transmission rate $R_m \in [b_m, B_m]$, information flow must be transmitted at rate $R_m$ to each destination.

However, with network coding, we only need to set the actual physical flow on each link to be the maximum of the individual destinations’ information flows. Specifically, for link $l = (i, j)$, let $x^{md}_{(i,j)}$ denote the information flow for destination $d$ of multicast session $m$, and $f^m_{(i,j)}$ denote the physical flow for multicast session $m$. These constraints can then be expressed as follows:

$$\sum_{j:(i,j)\in E} x^{md}_{(i,j)} - \sum_{j:(j,i)\in E} x^{md}_{(j,i)} = \begin{cases} R_m, & \text{for } i = s \\ -R_m, & \text{for } i = d \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

$$x^{md}_{(i,j)} \le f^m_{(i,j)} \quad \forall d \in D \tag{2}$$

where (1) reflects the information flow balance equation similar to the physical flow balance equation. Equation (2) specifies the network coding condition, relating physical rates to information rates.

Here, the physical flow rate vector $f$ is called the coding subgraph and can vary within a constraint set $F$. For a feasible coding subgraph, [22, Theorem 1] states that the multicast sessions can be achieved with the distributed random network coding schemes in [7] and [17] using intra-session network coding. For completeness, we include this theorem here with slight adaptation to our scenario:

Theorem 1: Given a feasible coding subgraph $f \in F$, a multicast session $m$ at rate arbitrarily close to $R_m$ from source node $s$ to destination nodes in the set $D$, injecting packets at rate arbitrarily close to $f^m_{(i,j)}$ on each link $(i, j)$, is achievable with network coding if and only if the information flow rate vector $x$ and the physical flow rate vector $f$ satisfy (1) and (2).

Therefore, when setting up optimal multicast sessions over the hybrid network, there is no loss of optimality in separating the problems of subgraph selection and network coding. In other words, we can find an optimal coding subgraph f satisfying (1) and (2), and then apply a network coding scheme to it where coding is done on overlapping links across different destination nodes’ paths.

For each multicast session, we find multiple paths from the source node to destination nodes using existing multi-path routing schemes [30], [31]. For each node $d \in D$, we use a matrix $H_d = \{h^l_{dj}\}$ to represent the relationship between its transmission paths and corresponding links. More specifically, suppose destination node $d$ has $J(d)$ alternative paths from source node $s$; then $h^l_{dj} = 1$ if path $j$ of node $d$ uses link $l$, and $h^l_{dj} = 0$ otherwise.

When multi-path routing is employed, a link $l$ might be shared by multiple paths of a certain destination node. Therefore, destination $d$’s bandwidth consumption for layer $m$ on link $l$ is the sum of the layer-$m$ flow rates on all of $d$’s paths that pass through link $l$. Mathematically, let $R^m_{dj}$ denote the information flow rate of destination node $d$’s $j$th path in multicast session $m$, and $f^m_l$ represent the physical flow rate for link $l$ in multicast session $m$; then we have $x^{md}_l = \sum_{j=1}^{J(d)} h^l_{dj} R^m_{dj}$. The information flow balance condition in (1) will be automatically satisfied. With intra-session coding, the network coding constraint in (2) becomes

$$\sum_{j=1}^{J(d)} h^l_{dj} R^m_{dj} \le f^m_l \quad \forall m \in M,\ \forall l \in E,\ \forall d \in D. \tag{3}$$
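As a minimal sketch of how (3) ties path rates to link flows, the following Python snippet computes each destination's information flow on every link from its path rates, and then takes the per-link maximum over destinations as the smallest physical flow that satisfies (3). The link names, path sets, and rates are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
# Toy illustration of constraint (3) for one multicast session (layer) m.
# All link names, paths, and rates below are hypothetical.

def link_information_flow(paths, rates):
    """Return {link: x_l} with x_l = sum_j h^l_dj * R^m_dj for one destination."""
    flow = {}
    for path_links, rate in zip(paths, rates):
        for link in path_links:  # h^l_dj = 1 exactly for the links on path j
            flow[link] = flow.get(link, 0.0) + rate
    return flow

def min_physical_flow(info_flows):
    """Smallest f_l meeting (3): the maximum information flow over destinations."""
    physical = {}
    for flow in info_flows.values():
        for link, x in flow.items():
            physical[link] = max(physical.get(link, 0.0), x)
    return physical

# Destination d1 has two paths, destination d2 has one; all share link l0.
paths = {"d1": [{"l0", "l1"}, {"l0", "l2"}], "d2": [{"l0", "l2"}]}
rates = {"d1": [1.0, 0.5], "d2": [1.2]}

info = {d: link_information_flow(paths[d], rates[d]) for d in paths}
coded = min_physical_flow(info)
# For comparison: the load on l0 if the two information flows were simply added.
uncoded_l0 = sum(flow.get("l0", 0.0) for flow in info.values())
```

In this toy instance the shared link l0 needs a physical flow of 1.5 under intra-session coding, whereas adding the two destinations' information flows would require 2.7.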

C. Channel Capacity Model

In a wired network, the total transmission rate of the physical flow at each link should be no more than its capacity $C_{(i,j)}$, that is

$$\sum_{m \in M} f^m_{(i,j)} \le C_{(i,j)} \quad \forall (i, j) \in E_1. \tag{4}$$

In wireless networks, however, the capacity of a wireless link is interrelated with other adjacent wireless links. Consequently, we should consider the wireless link contention in a shared transmission medium by introducing constraints on the location-dependent contention among the competing wireless data flows [19]. In the proposed problem formulation, we assume that the wireless medium capacity $C$ is shared among a wireless link $l$ and the cluster of its competing links. The method in [20] considers the spatial locations of the nodes and determines which transmissions can be successfully received by their intended recipients. According to this protocol model, any link originating from node $k$ interferes with link $(i, j)$ if the link distance satisfies $d_{(k,j)} < (1 + \Delta)\, d_{(i,j)}$, $\Delta \ge 0$. We define $\Psi(i,j)$ for each link $(i, j) \in E_2$ as the cluster of links that cannot transmit when link $(i, j)$ is active. Compared with individual links in a traditional wired network, the cluster can be treated as a basic resource unit: wireless data flows compete for the capacity of an individual cluster, which is equivalent to the capacity of the wireless shared medium. Hence, the wireless network channel capacity constraint [35] is

$$\sum_{m \in M} f^m_{(i,j)} + \sum_{(p,q) \in \Psi(i,j)} \sum_{m \in M} f^m_{(p,q)} \le C \cdot (1 - \rho_{(i,j)}) \quad \forall (i, j) \in E_2 \tag{5}$$

where $C$ is defined as the maximum rate of link $(i, j)$ and its corresponding cluster $\Psi(i,j)$ supported by the wireless shared medium, and $\rho_{(i,j)}$ is assumed to be the packet loss probability at wireless link $(i, j)$. Theoretically, this packet loss rate can be derived from the Gilbert–Elliott model [41], [42].
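The contention constraint (5) can be illustrated with a small Python sketch. It builds the interference cluster of each wireless link from node coordinates using the distance rule stated above, and then checks whether the aggregate load of a link and its cluster fits within the loss-discounted medium capacity. The node positions, link set, flow values, and function names are hypothetical and serve only as an example under these assumptions.

```python
import math

# Hypothetical node coordinates, wireless links, and per-layer physical flows.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.2, 0.0),
             "d": (5.0, 0.0), "e": (6.0, 0.0)}
links = [("a", "b"), ("c", "d"), ("d", "e")]
flows = {("a", "b"): [1.5, 0.5], ("c", "d"): [1.0], ("d", "e"): [0.6, 0.4]}

def dist(u, v):
    """Euclidean distance between two nodes."""
    (x1, y1), (x2, y2) = positions[u], positions[v]
    return math.hypot(x1 - x2, y1 - y2)

def cluster(link, delta):
    """Links (p, q) whose sender p lies within (1 + delta) * d(i, j) of receiver j."""
    i, j = link
    radius = (1.0 + delta) * dist(i, j)
    return [(p, q) for (p, q) in links if (p, q) != link and dist(p, j) < radius]

def satisfies_capacity(link, delta, capacity, loss):
    """Check constraint (5) for one link with packet loss probability `loss`."""
    load = sum(flows[link]) + sum(sum(flows[k]) for k in cluster(link, delta))
    return load <= capacity * (1.0 - loss)

clusters = {l: cluster(l, delta=0.5) for l in links}
feasible = all(satisfies_capacity(l, 0.5, capacity=5.0, loss=0.1) for l in links)
```

With these numbers, link (c, d) has the largest cluster, so it is the first link to violate (5) when the medium capacity is reduced.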

D. Rate-Distortion Model

From the perspective of application-layer QoS, a rate-distortion model [21] can be chosen as the utility to be optimized for video applications:

$$D_e(R_e) = \frac{\theta}{R_e - R_0} + D_0 \tag{6}$$

where $D_e$ is the distortion of the encoded video sequence, measured by the mean squared error, and $R_e$ is the encoded rate. The variables $\theta$, $R_0$, and $D_0$ are the parameters of the R-D model, which can be fitted to empirical data from trial encodings using nonlinear regression techniques.

For an SVC stream, a destination node can subscribe to a partial layer. To characterize the video streaming performance of each layer m, we introduce a utility function Um(·), which is continuously differentiable, increasing and strictly concave with respect to the receiving rate. In this paper, we multicast the video streams to all destination nodes and attempt to maximize the total utility of all recipients. In other words, our objective function is given by

$$\max \sum_{d \in D} \sum_{m \in M} U_m(R^m_d) = \max \sum_{d \in D} \sum_{m \in M} U_m\Big(\sum_{j=1}^{J(d)} R^m_{dj}\Big) \tag{7}$$

where $R^m_d$ denotes the received rate at destination node $d$ in multicast session $m$. Using the R-D model in (6) and Taylor expansion, we can define the utility function in (7) as the absolute value of the distortion decrement for destination node $d$ when a new layer $m$ is successfully received and decoded:

$$\begin{aligned}
U_m(R^m_d) &= -\Big[D_e\Big(\sum_{i=0}^{m-1} R^i_d + R^m_d\Big) - D_e\Big(\sum_{i=0}^{m-1} R^i_d\Big)\Big] \\
&\approx \frac{\theta}{\big(\sum_{i=0}^{m-1} R^i_d - R_0\big)^2}\, R^m_d - \frac{\theta}{\big(\sum_{i=0}^{m-1} R^i_d - R_0\big)^3}\, (R^m_d)^2.
\end{aligned} \tag{8}$$

Since (8) is a quadratic function of $R^m_d$ whose quadratic coefficient is negative (whenever $\sum_{i=0}^{m-1} R^i_d > R_0$), it can be easily seen that $U_m(R^m_d)$ is strictly concave.
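The quality of the Taylor approximation in (8) can be checked numerically. The sketch below evaluates the exact distortion decrement from the R-D model (6) and its second-order approximation for one set of parameters. The values of `THETA`, `R0`, `D0`, and the rates are hypothetical and are not fitted to any real sequence; the function names are introduced here only for illustration.

```python
# Hypothetical R-D parameters (theta, R0, D0) and rates; not fitted to real data.
THETA, R0, D0 = 1000.0, 0.0, 1.0

def distortion(rate):
    """R-D model (6): D_e(R_e) = theta / (R_e - R0) + D0."""
    return THETA / (rate - R0) + D0

def utility_exact(prev_rate, layer_rate):
    """Distortion decrement from adding a layer (first line of (8))."""
    return -(distortion(prev_rate + layer_rate) - distortion(prev_rate))

def utility_taylor(prev_rate, layer_rate):
    """Second-order Taylor approximation (second line of (8))."""
    gap = prev_rate - R0
    return THETA / gap**2 * layer_rate - THETA / gap**3 * layer_rate**2

prev, layer = 100.0, 10.0   # accumulated lower-layer rate and new layer rate
exact = utility_exact(prev, layer)
approx = utility_taylor(prev, layer)
```

For these assumed values the two expressions differ by less than 0.01, and the approximation is accurate as long as the new layer's rate is small relative to the accumulated rate of the lower layers.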

IV. Problem Formulation

Within the context of SVC, layered multi-rate multicasting is equivalent to a generalized multi-source problem where the inter-layer dependency is considered. The proposed optimization problem will integrate the prior context of source decomposition into the layered multi-rate multicast optimization.

In most previous rate control schemes [9], [11], a sink node can either receive an entire layer or discard the layer, i.e., receiving a partial layer when there is residual bandwidth is not supported. In contrast, the proposed formulation is a continuous optimization problem and makes it possible for a receiver to subscribe to a partial quality layer by introducing a tolerable rate region $[b_m, B_m]$ for Layer $m$. Therefore, the proposed algorithm can fully utilize the network bandwidth resource. We propose the following optimization formulation, which can achieve the global optimal solution while the layer dependency constraints of the SVC stream are strictly satisfied with minimum costs:

$$\textbf{P1}: \quad \max_{R} \sum_{d \in D} \sum_{m \in M} U_m\Big(\sum_{j=1}^{J(d)} R^m_{dj}\Big) \tag{9}$$

s.t.

1) $\sum_{j=1}^{J(d)} h^l_{dj} R^m_{dj} \le f^m_l \quad \forall m \in M,\ \forall l \in E,\ \forall d \in D$

2) $\sum_{m \in M} f^m_l \le C_l \quad \forall l \in E_1$

3) $\sum_{m \in M} f^m_l + \sum_{k \in \Psi(l)} \sum_{m \in M} f^m_k \le C \cdot (1 - \rho_l) \quad \forall l \in E_2$

4) $b_m \le \sum_{j=1}^{J(d)} R^m_{dj} \le B_m$ or $\sum_{j=1}^{J(d)} R^m_{dj} = 0 \quad \forall m \in M,\ \forall d \in D$

5) $\dfrac{\sum_{j=1}^{J(d)} R^m_{dj}}{b_m} \ge \dfrac{\sum_{j=1}^{J(d)} R^{m+1}_{dj}}{B_{m+1}} \quad \forall m \in \{1, 2, \ldots, M-1\},\ \forall d \in D$

6) $R^m_{dj} \ge 0 \quad \forall j \in J(d),\ \forall m \in M,\ \forall d \in D$

7) $f^m_l \ge 0 \quad \forall l \in E,\ \forall m \in M.$

Constraint 1 specifies the required physical flow rate on each link for each layer under the network coding condition. With network coding, different destinations do not compete for link bandwidth within the same layer; therefore, the required physical flow rate on link $l$ for Layer $m$ is the largest information flow rate on link $l$ consumed among all destination nodes. Constraint 2 ensures that for each wired link, the aggregate physical flow rate of the different layers does not exceed the link’s capacity. Constraint 3 characterizes the wireless link contention in a shared medium. For each wireless link $l$, the sum of $l$’s physical flow rate and the physical flow rates of the links in its contention cluster $\Psi(l)$ cannot exceed the loss-discounted wireless medium capacity $C \cdot (1 - \rho_l)$. Constraint 4 gives the lower bound and upper bound of the transmission rates allocated for Layer $m$, denoted by $b_m$ and $B_m$, respectively. According to Proposition 1, constraints 4 and 5 together strictly ensure that each destination node subscribes to SVC multicast layers in an incremental order, i.e., Layer $m$ is received before Layer $m + 1$. Constraints 6 and 7 specify that the allocated rates and the physical flow rates are nonnegative.

Proposition 1: The allocated rate R satisfies constraints 4 and 5 if and only if each destination node receives layered video streams in an incremental order.

Proof: See Appendix A.
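Proposition 1 can be spot-checked by brute force on a small grid. The Python sketch below compares constraints 4 and 5 against one plausible reading of "incremental order", namely that the received layers form a prefix of the layer sequence and that each received layer's rate lies in its tolerable region. This reading, the bounds `b` and `B`, and the grid values are assumptions made here for illustration; the check is not a substitute for the proof in Appendix A.

```python
from itertools import product

def satisfies_4_and_5(rates, b, B):
    """rates[m] is a destination's total rate in layer m (0-based index)."""
    c4 = all(r == 0 or b[m] <= r <= B[m] for m, r in enumerate(rates))
    c5 = all(rates[m] / b[m] >= rates[m + 1] / B[m + 1]
             for m in range(len(rates) - 1))
    return c4 and c5

def incremental_order(rates, b, B):
    """Received layers form a prefix, each within its tolerable rate region."""
    in_region = all(r == 0 or b[m] <= r <= B[m] for m, r in enumerate(rates))
    no_gap = all(not (rates[m] == 0 and rates[m + 1] > 0)
                 for m in range(len(rates) - 1))
    return in_region and no_gap

b = [2.0, 1.0, 1.0]   # hypothetical lower bounds b_m
B = [4.0, 3.0, 2.0]   # hypothetical upper bounds B_m
grid = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0]

# Compare the two predicates on every rate triple from the grid.
agree = all(satisfies_4_and_5(r, b, B) == incremental_order(r, b, B)
            for r in product(grid, repeat=3))
```

The key step is visible in constraint 5: a received Layer $m$ has ratio at least 1 while any admissible Layer $m+1$ has ratio at most 1, and a skipped Layer $m$ forces the rate of Layer $m+1$ to zero.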

To ensure the convexity of the proposed optimization problem P1, constraint 4 needs to be re-defined to meet the convexity requirement. The nonnegativity constraint 6, $R^m_{dj} \ge 0$, directly implies $\sum_{j=1}^{J(d)} R^m_{dj} \ge 0$. Hence, we can simply extend the equality $\sum_{j=1}^{J(d)} R^m_{dj} = 0$ in constraint 4 to $\sum_{j=1}^{J(d)} R^m_{dj} \le 0$, because $\sum_{j=1}^{J(d)} R^m_{dj} = 0$ is then guaranteed together with the nonnegativity constraint 6. Therefore, constraint 4 is formalized as $b_m \le \sum_{j=1}^{J(d)} R^m_{dj} \le B_m$ or $\sum_{j=1}^{J(d)} R^m_{dj} \le 0$, and further simplified as the cubic inequality $\big(\sum_{j=1}^{J(d)} R^m_{dj}\big)\big(\sum_{j=1}^{J(d)} R^m_{dj} - b_m\big)\big(\sum_{j=1}^{J(d)} R^m_{dj} - B_m\big) \le 0$.
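The sign pattern behind the cubic form can be written out explicitly. The shorthand $S = \sum_{j=1}^{J(d)} R^m_{dj}$ is introduced here only for brevity, and the case analysis assumes $0 < b_m \le B_m$ and $S \ge 0$ (the latter from constraint 6):

```latex
\[
S\,(S - b_m)\,(S - B_m)\;
\begin{cases}
= 0,   & S = 0,\\
> 0,   & 0 < S < b_m,\\
\le 0, & b_m \le S \le B_m,\\
> 0,   & S > B_m.
\end{cases}
\]
```

Under these assumptions, the cubic inequality therefore holds exactly when $S = 0$ or $b_m \le S \le B_m$, which is the either-or condition of constraint 4.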

In P1, the optimization variable is the rate vector $R = [R^m_{dj}]$, i.e., the information flow rates along the $j$th path to destination node $d$ in multicast session $m$, $\forall m \in M$, $\forall d \in D$, $\forall j \in J(d)$. If we choose the same number of paths for each destination node, the total number of optimization variables is $|M| \cdot |D| \cdot |J(d)|$. On the other hand, when optimizing over the entire network without specified paths from the source node to destination nodes, the optimization variable is the links’ information flow vector $x = [x^{md}_l]$, i.e., the vector of information flow rates on link $l$ for destination $d$ of multicast session $m$, where $\forall m \in M$, $\forall d \in D$, $\forall l \in E$, and accordingly the total number of optimization variables is $|M| \cdot |D| \cdot |E|$. In large-scale networks, the total number of links $|E|$ is often much larger than the number of transmission paths $|J(d)|$. Therefore, compared with performance optimization over the entire network, P1 has a much smaller number of optimization variables.

We can see that P1 has a unique optimal solution since its objective function is strictly concave and the solution space defined by the constraints is convex. In other words, this is a convex optimization problem. Centralized solutions require global information and coordination between all nodes and links, which is very costly and sometimes infeasible in practice [23], [24]. In the following section, we will develop a distributed solution based on decomposition and duality theories.


V. Distributed Algorithm

In this section, we develop a distributed solution to the proposed optimization problem which allows each node and link to control and update the transmission parameters by itself.

A. Primal Decomposition

Decomposition theories provide a mathematical foundation for the design of modularized and distributed control of networks [24]. The decomposition procedure aims to decompose a large and complex optimization problem into a set of small sub-problems, which can be then solved with distributed and often iterative algorithms that converge to the global optimum.

For P1, in which the physical flow rates $f^m_l$ act as coupling variables, primal decomposition is appropriate:

$$\textbf{P1-1}: \quad \max_{R} \sum_{d \in D} \sum_{m \in M} U_m\Big(\sum_{j=1}^{J(d)} R^m_{dj}\Big) \tag{10}$$

s.t. constraints 1, 4, 5, and 6

$$\textbf{P1-2}: \quad \max_{f}\ U^*(f) \tag{11}$$

s.t. constraints 2, 3, and 7

where P1-1 performs a low-level optimization when the coupling variable vector $f$ is fixed, while P1-2 performs a high-level optimization to update $f$. $U^*(f)$ is the value of the objective function in P1-1 for a given $f$. The output of the low-level optimization is locally optimal and provides an approximation to the global optimal solution.

As mentioned in Section III-B, the coupling variable vector $f$ in P1 represents the subgraph of network coding. Note that in P1, the impact of network coding is embedded in constraint 1. Theorem 1 implies a form of “separation principle” that allows independent decisions on resource utilization and rate control during network coding. This suggests that the optimal configurations of multicast sessions over the hybrid network can be determined by decoupling the problem of subgraph selection from network coding. The task of the high-level optimization problem P1-2 is to update $f$ by selecting the optimal subgraph, while the low-level optimization problem P1-1 attempts to find a locally optimal solution for resource utilization and rate control of a specified network coding scheme for a given coding subgraph $f$.

B. Low-Level Optimization

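We observe that the low-level optimization problem P1-1 can be further decoupled using dual decomposition. More specifically, by relaxing the coupling constraints 1, 4, and 5 with Lagrange multipliers $\lambda$, $\mu$, and $\eta$, respectively, the Lagrangian of P1-1 can be written as follows:

$$\begin{aligned}
L(R, \lambda, \mu, \eta) ={}& \sum_{d \in D} \sum_{m \in M} U_m\Big(\sum_{j=1}^{J(d)} R^m_{dj}\Big) - \sum_{l \in E} \sum_{d \in D} \sum_{m \in M} \lambda^{ml}_d \Big[\sum_{j=1}^{J(d)} h^l_{dj} R^m_{dj} - f^m_l\Big] \\
& - \sum_{d \in D} \sum_{m \in M} \mu^m_d \Big[\Big(\sum_{j=1}^{J(d)} R^m_{dj}\Big)\Big(\sum_{j=1}^{J(d)} R^m_{dj} - b_m\Big)\Big(\sum_{j=1}^{J(d)} R^m_{dj} - B_m\Big)\Big] \\
& - \sum_{d \in D} \sum_{m=1}^{M-1} \eta^m_d \Bigg[\frac{\sum_{j=1}^{J(d)} R^{m+1}_{dj}}{B_{m+1}} - \frac{\sum_{j=1}^{J(d)} R^m_{dj}}{b_m}\Bigg]
\end{aligned} \tag{12}$$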

and the corresponding Lagrange dual function is

$$g(\lambda, \mu, \eta) = \sup_{R:\ R^m_{dj} \ge 0} L(R, \lambda, \mu, \eta). \tag{13}$$

The Lagrange dual problem of P1-1 can be formulated as follows:

$$\min_{\lambda \ge 0,\ \mu \ge 0,\ \eta \ge 0} g(\lambda, \mu, \eta). \tag{14}$$

According to convex optimization theories [24], [25], if the original problem P1-1 is convex, it is equivalent to its Lagrange dual problem in (14). Then, the low-level optimization problem P1-1 can be further decomposed into a secondary master dual problem P1-1a and a set of sub-problems P1-1b that can be solved in a distributed manner:

$$\textbf{P1-1a}: \quad \min_{\lambda, \mu, \eta}\ g(\lambda, \mu, \eta) \tag{15}$$

s.t. $\lambda \ge 0$, $\mu \ge 0$, $\eta \ge 0$

$$\textbf{P1-1b}: \quad \max_{R}\ L(R, \lambda, \mu, \eta) \tag{16}$$

s.t. $R^m_{dj} \ge 0 \quad \forall j \in J(d),\ \forall m \in M,\ \forall d \in D.$

At the lower level, the sub-problems P1-1b (i.e., the Lagrangians) for each $d$, $j$, and $m$ can be solved separately. At the higher level, we have the secondary master dual problem P1-1a to update the dual variables $\lambda$, $\mu$, and $\eta$.

Since the objective functions of problem P1-1a and P1-1b are differentiable with respect to the dual variables λ, µ, η and primal variables R, both problems can be solved by the gradient algorithm [26], [27]. Based on this observation, we propose the following primal-dual algorithm that updates the primal and dual variables simultaneously to solve the low-level optimization problem P1-1:

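$$
R^m_{dj}(t_L+1) = \left[R^m_{dj}(t_L) + a(t_L)\,\frac{\partial L(R,\lambda,\mu,\eta)}{\partial R^m_{dj}}\right]^+ \tag{17}
$$

$$
\lambda^{ml}_d(t_L+1) = \left[\lambda^{ml}_d(t_L) - b(t_L)\,\frac{\partial L(R,\lambda,\mu,\eta)}{\partial \lambda^{ml}_d}\right]^+ \tag{18}
$$

$$
\mu^m_d(t_L+1) = \left[\mu^m_d(t_L) - c(t_L)\,\frac{\partial L(R,\lambda,\mu,\eta)}{\partial \mu^m_d}\right]^+ \tag{19}
$$

$$
\eta^m_d(t_L+1) = \left[\eta^m_d(t_L) - d(t_L)\,\frac{\partial L(R,\lambda,\mu,\eta)}{\partial \eta^m_d}\right]^+ \tag{20}
$$

where $t_L$ denotes the iteration index, $a(t_L)$, $b(t_L)$, $c(t_L)$, and $d(t_L)$ are positive step sizes, and $[\cdot]^+$ denotes the projection onto the set of non-negative real numbers.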
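For reference, the partial derivatives that drive (17)–(20) can be written out from the Lagrangian (12). The following is a sketch under stated assumptions: $U_m$ is taken to be differentiable, constraint 4 is taken in the product form that appears in (12), and the shorthand $R_d^m = \sum_{j=1}^{J(d)} R^m_{dj}$ is introduced here only for brevity.

```latex
\begin{align*}
\frac{\partial L}{\partial \lambda^{ml}_d} &= -\Bigl(\sum_{j=1}^{J(d)} h^l_{dj} R^m_{dj} - f_l^m\Bigr), \\
\frac{\partial L}{\partial \mu^m_d} &= -R_d^m\,(R_d^m - b_m)\,(R_d^m - B_m), \\
\frac{\partial L}{\partial \eta^m_d} &= -\Bigl(\frac{R_d^{m+1}}{B_{m+1}} - \frac{R_d^m}{b_m}\Bigr), \qquad m = 1,\dots,M-1, \\
\frac{\partial L}{\partial R^m_{dj}} &= U_m'(R_d^m) - \sum_{l\in E} \lambda^{ml}_d h^l_{dj}
  - \mu^m_d\bigl[3 (R_d^m)^2 - 2 (b_m + B_m) R_d^m + b_m B_m\bigr] \\
  &\quad + \frac{\eta^m_d}{b_m}\,\mathbf{1}[m \le M-1] - \frac{\eta^{m-1}_d}{B_m}\,\mathbf{1}[m \ge 2].
\end{align*}
```

Substituting the first line into (18) gives $\lambda^{ml}_d(t_L+1) = [\lambda^{ml}_d(t_L) + b(t_L)(\sum_{j} h^l_{dj} R^m_{dj} - f_l^m)]^+$, so the price grows exactly when the demand on link l exceeds its supply.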

In terms of their physical meanings, λ represents the “congestion prices” of information flow at all links, i.e., $\lambda^{ml}_d$ can be considered as the “congestion price” of information flow at link l for destination node d’s bandwidth requirement in Layer m. At each link l, if the total information flow bandwidth demand $\sum_{j=1}^{J(d)} h^l_{dj} R^m_{dj}$ in Layer m exceeds the supply $f_l^m$, then the “congestion price” $\lambda^{ml}_d$ will increase. As a result, in problem P1-1b, $R^m_{dj}$ will decrease in order to meet the link’s bandwidth supply of information flow, $f_l^m$, and vice versa. Similarly, the other two Lagrange multipliers, µ and η, can be considered as the “SVC encoding prices” for each destination node in a multicast session. Furthermore, all updating operations are distributed and can be implemented at individual links and nodes using only local information.
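The price mechanism described above can be illustrated on a deliberately tiny instance. The following is a minimal sketch, not the authors' implementation: it assumes a single destination, a single layer, and a single path over one link with supply f, a logarithmic utility U(R) = log R, and constant step sizes, and it drops the SVC constraints 4 and 5 (so µ and η do not appear). The function name and all numeric values are hypothetical.

```python
def primal_dual_toy(f=2.0, a=0.05, b=0.05, iters=5000):
    """Toy version of updates (17)-(18): maximize log(R) subject to R <= f.

    Lagrangian: L(R, lam) = log(R) - lam * (R - f).
    Assumed setup (not from the paper): one link, one path, one layer,
    log utility, constant step sizes a and b.
    """
    R, lam = 1.0, 0.0  # arbitrary starting point
    for _ in range(iters):
        grad_R = 1.0 / R - lam   # dL/dR
        grad_lam = -(R - f)      # dL/dlam
        # Primal ascent. The tiny floor keeps log(R) defined in this toy;
        # the paper projects onto the non-negative reals.
        R_next = max(R + a * grad_R, 1e-9)
        # Dual descent projected onto lam >= 0: the price rises while R > f.
        lam_next = max(lam - b * grad_lam, 0.0)
        # Both variables are updated simultaneously from the old values.
        R, lam = R_next, lam_next
    return R, lam


if __name__ == "__main__":
    rate, price = primal_dual_toy()
    print(f"rate = {rate:.4f}, congestion price = {price:.4f}")
```

At the equilibrium of this toy, the rate equals the supply f and the price equals the marginal utility U'(f) = 1/f.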

C. High-Level Optimization

The low-level optimization and the corresponding primal-dual algorithm operate under the assumption that the value of f is fixed. Next, we discuss how to update f in order to solve the high-level optimization problem P1-2. Suppose $\hat{\lambda}^{ml}_d$ is the optimal Lagrange price, i.e., the optimal dual variable corresponding to the constraint $\sum_{j=1}^{J(d)} h^l_{dj} R^m_{dj} \le f_l^m$ in P1-1. First, we define the Lagrangian of P1-2 as follows:

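$$
\begin{aligned}
L(f,\alpha,\beta) ={}& U^*(f) - \sum_{l\in E_1} \alpha_l \left(\sum_{m\in M} f_l^m - C_l\right)
 - \sum_{l\in E_2} \beta_l \left(\sum_{m\in M} f_l^m + \sum_{k\in\Psi(l)}\sum_{m\in M} f_k^m - C\cdot(1-\rho_l)\right) \\
={}& U^*(f) - \sum_{l\in E_1} \alpha_l \left(\sum_{m\in M} f_l^m - C_l\right)
 - \sum_{l\in E_2} \beta_l \sum_{m\in M} f_l^m \\
& - \sum_{l\in E_2}\sum_{m\in M} f_l^m \left(\sum_{k\in\Phi(l)} \beta_k\right)
 + \sum_{l\in E_2} \beta_l\, C\cdot(1-\rho_l)
\end{aligned}
\tag{21}
$$

where we introduce a new notation $\Phi(l)$ to denote the cluster of links that interfere with link l. Since $\Psi(k)$ denotes the cluster of all links that cannot transmit when link k is active, we have $k \in \Phi(l) \iff l \in \Psi(k)$. The corresponding Lagrange dual function is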

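$$
g(\alpha,\beta) = \sup_{f\,:\,f_l^m\ge 0,\ \forall l\in E,\ \forall m\in M} L(f,\alpha,\beta). \tag{22}
$$

Similar to the solution of the low-level optimization problem P1-1, with dual decomposition, the following procedure is used to solve the high-level optimization problem P1-2:

$$
f_l^m(t_H+1) = \left[f_l^m(t_H) + a(t_H)\,\frac{\partial L(f,\alpha,\beta)}{\partial f_l^m}\right]^+ \tag{23}
$$

$$
\alpha_l(t_H+1) = \left[\alpha_l(t_H) - b(t_H)\,\frac{\partial L(f,\alpha,\beta)}{\partial \alpha_l}\right]^+ \tag{24}
$$

$$
\beta_l(t_H+1) = \left[\beta_l(t_H) - c(t_H)\,\frac{\partial L(f,\alpha,\beta)}{\partial \beta_l}\right]^+ \tag{25}
$$

where $t_H$ denotes the iteration index, and $a(t_H)$, $b(t_H)$, and $c(t_H)$ are positive step sizes.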
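For reference, the partial derivatives used in (23)–(25) can be written out from the Lagrangian of P1-2. The sketch below rests on two assumptions that are stated here for illustration only. First, as is common in primal decomposition, the sensitivity of the low-level optimum $U^*(f)$ to $f_l^m$ is taken to be the sum of the optimal low-level prices on that link. Second, $\Psi(l)$ labels the cluster of links whose flows count against the capacity of wireless link l, and $\Phi(l)$ labels the set of links k whose capacity constraint contains $f_l^m$.

```latex
\begin{align*}
\frac{\partial U^*(f)}{\partial f_l^m} &= \sum_{d\in D} \hat{\lambda}^{ml}_d
  && \text{(assumed sensitivity of the low-level optimum)} \\
\frac{\partial L}{\partial f_l^m} &= \sum_{d\in D} \hat{\lambda}^{ml}_d - \alpha_l, && l \in E_1, \\
\frac{\partial L}{\partial f_l^m} &= \sum_{d\in D} \hat{\lambda}^{ml}_d - \beta_l - \sum_{k\in\Phi(l)} \beta_k, && l \in E_2, \\
\frac{\partial L}{\partial \alpha_l} &= -\Bigl(\sum_{m\in M} f_l^m - C_l\Bigr), && l \in E_1, \\
\frac{\partial L}{\partial \beta_l} &= -\Bigl(\sum_{m\in M} f_l^m + \sum_{k\in\Psi(l)}\sum_{m\in M} f_k^m - C\cdot(1-\rho_l)\Bigr), && l \in E_2.
\end{align*}
```

Under these assumptions, a link raises its physical flow rate while the information-flow prices it collects exceed the aggregate congestion price it is charged, and lowers it otherwise.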

In terms of their physical meanings, α and β can be considered as the “aggregate congestion prices” of physical flows at wired and wireless links, respectively. At each wired (or wireless) link l, if the total physical flow bandwidth demand $\sum_{m\in M} f_l^m$ (or $\sum_{m\in M} f_l^m + \sum_{k\in\Psi(l)}\sum_{m\in M} f_k^m$) exceeds the supply $C_l$ (or $C\cdot(1-\rho_l)$), then the “aggregate congestion price” $\alpha_l$ (or $\beta_l$) will increase. As a result, $f_l^m$ in P1-2 will decrease in order to meet the link’s bandwidth supply of physical flow, $C_l$ (or $C\cdot(1-\rho_l)$), and vice versa. The update of $f_l^m$ can be performed individually by each link with knowledge of only the congestion price α or β, while the update of α and β uses only the local information of each link.
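As an illustration of the wireless price update (25), the following minimal sketch computes the demand seen by a wireless link and applies one projected price step. The dictionary layout, function names, and numeric values are hypothetical; the interference cluster is passed in explicitly as the list of links whose flows count against the link's capacity.

```python
def wireless_demand(flows, cluster, link):
    """Total physical-flow demand charged to a wireless link.

    flows:   dict mapping link id -> list of per-layer rates f_l^m
    cluster: dict mapping link id -> list of link ids in its interference cluster
    """
    own = sum(flows[link])
    interfering = sum(sum(flows[k]) for k in cluster[link])
    return own + interfering


def update_beta(beta, demand, capacity, rho, step):
    """One projected step of the aggregate congestion price beta_l.

    The price rises when demand exceeds the supply C * (1 - rho_l) and
    falls otherwise, but never drops below zero.
    """
    supply = capacity * (1.0 - rho)
    return max(0.0, beta + step * (demand - supply))


if __name__ == "__main__":
    flows = {"l1": [1.0, 0.5], "l2": [2.0, 0.0]}   # hypothetical per-layer rates
    cluster = {"l1": ["l2"], "l2": ["l1"]}         # hypothetical clusters
    demand = wireless_demand(flows, cluster, "l1")
    print(update_beta(0.0, demand, capacity=4.0, rho=0.25, step=0.1))
```

The wired update (24) has the same shape, with the demand reduced to the link's own flows and the supply replaced by $C_l$.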

D. Convergence Analysis

We now analyze the convergence behavior of the proposed algorithm. We regard the primal-dual algorithm as a nonlinear autonomous system, to which the following Lyapunov stability theorem [26], [29] can be applied.

Theorem 2 (Lyapunov’s Theorem): Consider an autonomous system with its equilibrium point at $\hat{x} = 0$. This equilibrium point is globally asymptotically stable if there exists a continuously differentiable Lyapunov function $V(x)$ such that:

1) $V(x) > 0$, $\forall x \neq 0$; and $V(x) = 0$ when $x = 0$;

2) $\dot{V}(x) \le 0$, $\forall x$; and $\dot{V}(x) < 0$, $\forall x \neq 0$;

3) $V(x) \to \infty$ when $\|x\| \to \infty$.

Note that the above theorem also holds if the equilibrium point is $\hat{x} \neq 0$, by considering a system with state vector $x - \hat{x}$.
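As a one-line illustration of how the three conditions are checked (a textbook scalar example chosen here for clarity, not a system taken from this paper), consider $\dot{x} = -x$ with the candidate function $V(x) = x^2/2$:

```latex
\begin{align*}
V(x) &= \tfrac{1}{2}x^2 > 0 \ \ \forall x \neq 0, \qquad V(0) = 0, \\
\dot{V}(x) &= x\,\dot{x} = -x^2 < 0 \ \ \forall x \neq 0, \\
V(x) &\to \infty \ \text{ as } |x| \to \infty.
\end{align*}
```

All three conditions hold, so the equilibrium $\hat{x} = 0$ of this scalar system is globally asymptotically stable.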

Proposition 2: If $(\hat{R}, \hat{\lambda}, \hat{\mu}, \hat{\eta})$ is an equilibrium point of the low-level primal-dual algorithm outlined in (17)–(20), then the equilibrium point is asymptotically stable, i.e., the low-level primal-dual algorithm converges to its equilibrium point $(\hat{R}, \hat{\lambda}, \hat{\mu}, \hat{\eta})$.

Proof: See Appendix B.

Proposition 3: If $(\hat{f}, \hat{\alpha}, \hat{\beta})$ is an equilibrium point of the high-level primal-dual algorithm proposed in (23)–(25), then the equilibrium point is asymptotically stable.

Proof: See Appendix C.

According to Lyapunov’s theorem, if we can find a Lyapunov function for the dynamical system that satisfies all three conditions, then the equilibrium point of the system is asymptotically stable. Propositions 2 and 3 establish the global asymptotic stability of the primal-dual controllers (17)–(20) and (23)–(25), respectively, which implies that the distributed solutions to the dual problems of P1-1 and P1-2 converge. Since P1-1 and P1-2 are both convex, we can solve them through their equivalent dual problems using the proposed distributed algorithms [24].

E. Summary of the Distributed Algorithm and Its Implementation

To implement the proposed distributed algorithm, each link l or destination node d is treated as an entity capable of processing, storing, and communicating information. In practice, each link l = (i, j) is delegated to its sender node i, and all computations related to link l = (i, j) are executed on node i. Assume that the processor for link l keeps track of the variables $f_l^m$, $\alpha_l$, $\beta_l$, and $\lambda^{ml}_d$, while the processor of destination node d keeps track of the variables $R^m_{dj}$, $\mu^m_d$, and $\eta^m_d$. A distributed version of the proposed algorithm is summarized in Algorithm 1.

Algorithm 1 Distributed two-level optimization algorithm

Step 1. Initialization:

Set $R^m_{dj}(0)$, $\lambda^{ml}_d(0)$, $\mu^m_d(0)$, $\eta^m_d(0)$, $f_l^m(0)$, $\alpha_l(0)$, and $\beta_l(0)$ to nonnegative values for all d, m, l, and j.

Step 2. Low-level optimization ($t_L = 1, 2, \ldots$):

At link l:

1) Receives $R^m_{dj}(t_L)$ from the subset of destination nodes $\{d \mid d \in D \text{ and } h^l_{dj} = 1\}$.

2) Fetches $f_l^m(t_H)$ stored in the local processor of node i.

3) Updates the congestion price $\lambda^{ml}_d(t_L)$ according to (18).

4) Sends the control packet (CP) that comprises the updated price $\lambda^{ml}_d(t_L+1)$ in the downstream direction to the subset of destination nodes $\{d \mid d \in D \text{ and } h^l_{dj} = 1\}$.

At destination node d:

1) Receives from the network the aggregate congestion price $\sum_{l\in E} \lambda^{ml}_d(t_L) \cdot h^l_{dj}$.

2) Fetches $\mu^m_d(t_L)$ and $\eta^m_d(t_L)$ stored in the local processor.

3) Updates the allocated rate $R^m_{dj}(t_L)$ with (17).

4) Updates the SVC encoding price $\mu^m_d(t_L)$ with (19).

5) Updates the SVC encoding price $\eta^m_d(t_L)$ with (20).

6) Sends the rate packet (RP) that contains the updated rate $R^m_{dj}(t_L+1)$ in the upstream direction to the subset of links $\{l \mid l \in E \text{ and } h^l_{dj} = 1\}$.

Iterate until the low-level optimization converges or the maximum number of iterations is reached, then proceed to Step 3 (the high-level optimization stage).

Step 3. High-level optimization ($t_H = 1, 2, \ldots$):

At wired link $l \in E_1$:

1) Receives from the network the locally optimal congestion price $\hat{\lambda}^{ml}_d$ of the low-level optimization.

2) Fetches $f_l^m(t_H)$ and $\alpha_l(t_H)$ stored in the local processor.

3) Updates $f_l^m(t_H)$ with (23).

4) Updates the aggregate congestion price $\alpha_l(t_H)$ with (24).

At wireless link $l \in E_2$:

1) Receives from the network the locally optimal congestion price $\hat{\lambda}^{ml}_d$ of the low-level optimization.

2) Receives $f_k^m(t_H)$ from the cluster $\{k \mid k \in \Psi(l)\}$.

3) Receives $\beta_k(t_H)$ from the cluster $\{k \mid k \in \Phi(l)\}$.

4) Fetches $f_l^m(t_H)$ and $\beta_l(t_H)$ stored in the local processor.

5) Updates $f_l^m(t_H)$ with (23).

6) Updates the aggregate congestion price $\beta_l(t_H)$ with (25).

7) Transmits the flow-rate packet (FP) that contains the updated physical flow rate $f_l^m(t_H+1)$ to the cluster $\{k \mid k \in \Phi(l)\}$.

8) Transmits the control packet (CP) that comprises the updated price $\beta_l(t_H+1)$ to the cluster $\{k \mid k \in \Psi(l)\}$.

If the high-level optimization converges or the maximum number of iterations is reached, the algorithm stops; otherwise, it returns to Step 2 (the low-level optimization).

Note that the low-level and high-level optimizations operate at different timescales. The low-level iteration algorithm is in the inner loop and operates at a smaller time index $t_L$, while the high-level iteration algorithm is in the outer loop and operates at a larger time index $t_H$. More specifically, the high-level algorithm will not move to its next step until $\lambda^{ml}_d(t_L)$ at the low level converges to its optimum value $\hat{\lambda}^{ml}_d$.
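The inner/outer timescale structure can be sketched as a nested loop. This is only a control-flow skeleton under stated assumptions: the four callables (one low-level pass, a low-level convergence check, one high-level pass, a high-level convergence check) and the iteration caps are hypothetical placeholders, and the distributed message exchange between links and destination nodes is not modeled.

```python
def run_two_level(low_step, low_converged, high_step, high_converged,
                  max_low=1000, max_high=100):
    """Outer (high-level, t_H) loop wrapped around an inner (low-level, t_L) loop.

    low_step:       performs one pass of the low-level updates (17)-(20)
    low_converged:  returns True once the low-level prices have settled
    high_step:      performs one pass of the high-level updates (23)-(25)
    high_converged: returns True once the high-level variables have settled
    Returns (number of high-level iterations, total low-level iterations).
    """
    total_low = 0
    for t_high in range(1, max_high + 1):
        # Inner loop: run until convergence or the low-level iteration cap.
        for _t_low in range(1, max_low + 1):
            low_step()
            total_low += 1
            if low_converged():
                break
        # The outer loop advances only after the inner loop has finished.
        high_step()
        if high_converged():
            return t_high, total_low
    return max_high, total_low


if __name__ == "__main__":
    calls = {"low": 0, "high": 0}
    result = run_two_level(
        lambda: calls.__setitem__("low", calls["low"] + 1),
        lambda: calls["low"] % 3 == 0,      # stub: inner loop settles every 3 passes
        lambda: calls.__setitem__("high", calls["high"] + 1),
        lambda: calls["high"] == 2,         # stub: outer loop settles after 2 passes
    )
    print(result)
```

Keeping the convergence checks as separate callables mirrors the stopping rules of Steps 2 and 3, where each level stops either on convergence or on reaching its iteration cap.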

In summary, the centralized approach requires all of the above variables to be shared across the entire network and thus causes a great amount of communication overhead. With the proposed distributed algorithm, however, the communication overhead comprises only the sending of $\lambda^{ml}_d(t_L+1)$ and $R^m_{dj}(t_L+1)$ at each iteration of the low-level optimization and the transmission of $f_l^m(t_H+1)$ and $\beta_l(t_H+1)$ at each iteration of the high-level optimization.

Fig. 3. Network topology associated with link capacity, where (a) and (c) illustrate the wired and wireless networks in the numerical experiment, respectively, and (b) and (d) illustrate the wired and wireless networks in the packet-level simulation, respectively.

The overhead of the proposed distributed algorithm consists of two parts: the network coding overhead and the communication overhead. It is demonstrated in [17] that the side information required by network coding is very small, e.g., approximately 3% in the typical Internet scenario. The communication overhead consists of the CP and RP information in the low-level optimization, and the FP and CP information in the high-level optimization. Considering the implementation issues [39] and taking Fig. 3(c) as an example, at the end of each low-level iteration, link $(r_{15}, r_{16})$ needs only to send its CP downward to destination nodes $d_1$ and $d_4$, and destination node $d_1$ needs only to send its RP upward to the links belonging to its six paths. Supposing that each updated price or rate is a 4-byte float and letting M = 3, the CP of link $(r_{15}, r_{16})$ requires 3 × 2 × 4 = 24 bytes and the RP of node $d_1$ requires 3 × 6 × 4 = 72 bytes. Generally, in the worst case, the CP of link l takes up 4·M·|D| bytes, and the RP of destination node d takes up 4·M·|J(d)| bytes. Similarly, in the high-level optimization, wireless link l needs to send its FP and CP to the clusters $\{k \mid k \in \Phi(l)\}$ and $\{k \mid k \in \Psi(l)\}$, respectively. The FP and CP require 3 × 4 = 12 bytes and 1 × 4 = 4 bytes, respectively. Thus, the total communication overhead sums to 24 + 72 + 12 + 4 = 112 bytes. For an IP packet size of 1400 bytes, the communication overhead introduced by the proposed algorithm is 112/1400 = 8%. Furthermore, in a practical implementation these packets (CPs, RPs, and FPs) need not be communicated as separate packets; the CPs can be conveyed through a field in the video data packets, while the RPs and FPs can be conveyed through a field in the acknowledgement packets. The maximal additional delay introduced by sending these packets is the one-way propagation delay of the particular multicast destination node.
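The byte counts above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the 4-byte value size, M = 3, two served destinations, six paths, and the 1400-byte packet size are the figures quoted in the text.

```python
BYTES_PER_VALUE = 4  # each price or rate is assumed to be a 4-byte float


def cp_bytes_low(num_layers, num_destinations):
    """Low-level CP: one price per layer for each destination served by the link."""
    return num_layers * num_destinations * BYTES_PER_VALUE


def rp_bytes_low(num_layers, num_paths):
    """Low-level RP: one rate per layer for each path of the destination."""
    return num_layers * num_paths * BYTES_PER_VALUE


def fp_bytes_high(num_layers):
    """High-level FP: one physical flow rate per layer."""
    return num_layers * BYTES_PER_VALUE


def cp_bytes_high():
    """High-level CP: a single aggregate congestion price."""
    return BYTES_PER_VALUE


def total_overhead(num_layers, num_destinations, num_paths):
    """Sum of the four packet sizes for one link and one destination."""
    return (cp_bytes_low(num_layers, num_destinations)
            + rp_bytes_low(num_layers, num_paths)
            + fp_bytes_high(num_layers)
            + cp_bytes_high())


if __name__ == "__main__":
    total = total_overhead(num_layers=3, num_destinations=2, num_paths=6)
    print(total, f"{total / 1400:.0%}")
```

Because the sizes scale linearly with the number of layers, destinations, and paths, the same helpers give the worst-case bounds 4·M·|D| and 4·M·|J(d)| when the full sets are passed in.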

Fig. 4. Convergence performance of the low-level optimization: (a) allocated rate for $d_2$, $d_4$, and $d_5$, and (b) example of Lagrange prices λ, µ, and η. Convergence behavior of the high-level optimization: (c) physical flow rate for links, (d) allocated rate for $d_1$, (e) allocated rate for $d_3$, and (f) allocated rate for $d_5$.

The proposed distributed algorithm needs to be executed whenever the network starts a multicast session or the network conditions change suddenly, so that the destination nodes can catch up with the optimal allocated transmission rates. The execution does not stop until the algorithm converges to the optimum or the maximum number of high-level iterations is reached. Theoretically, the convergence time spent by destination nodes to catch up with the calculated optimal transmission rates is the number of iterations multiplied by the update interval of each iteration. It is found in [37] that an update interval of about two to three times the one-way propagation delay of the particular multicast destination node is sufficient.
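The convergence-time estimate described above is a simple product, sketched below. The function name, the default interval factor of 2.5 (a value inside the two-to-three range quoted from [37]), and the numbers in the usage example are hypothetical and are not measurements from this paper.

```python
def convergence_time(num_iterations, one_way_delay_s, interval_factor=2.5):
    """Estimated convergence time in seconds.

    num_iterations:  number of iterations until the rates settle
    one_way_delay_s: one-way propagation delay of the destination node, in seconds
    interval_factor: update interval expressed as a multiple of the one-way delay
    """
    update_interval_s = interval_factor * one_way_delay_s
    return num_iterations * update_interval_s


if __name__ == "__main__":
    # Hypothetical example: 200 iterations with a 10 ms one-way delay.
    print(f"{convergence_time(200, 0.010):.2f} s")
```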

VI. Experimental Results

In this section, experimental results are presented to demonstrate the performance of the proposed algorithm. We first conduct numerical experiments, implemented in C++, that jointly run the iterations of both the low-level and high-level optimization. The two-level optimization model is solved numerically to evaluate the convergence behavior of the proposed distributed algorithm and to demonstrate that it is able to achieve the max-flow throughput. Packet-level simulations are then performed on ns-2 [32] with a hybrid wired and wireless network over which SVC video streams are distributed. We show that the proposed algorithm significantly improves the overall video quality. Furthermore, we study the impact of the playback deadline (PD) and background traffic on the overall performance.

Fig. 5. Impact of step size on the convergence behavior. (a) Constant step sizes. (b) Diminishing step sizes and constant step size.

A. Algorithm Behavior and Performance Evaluations

We conduct numerical experiments and evaluate the proposed algorithm over the hybrid wired and wireless network shown in Fig. 3. Fig. 3(a) shows the wired network, a typical butterfly network with network coding. Here, s denotes the source node, $n_1, n_2, \ldots, n_4$ denote relay nodes, and $t_1$ and $t_2$ denote the two receiver nodes in the wired network, which also serve as the source nodes in the wireless network. Fig. 3(c) shows the wireless network, which consists of 20 randomly distributed wireless sensors. $t_1$ and $t_2$ denote the relay stations that connect the wireless network with the wired network, $d_1, d_2, \ldots, d_5$ denote the five destination nodes, and the other nodes are wireless relay nodes.

In Fig. 3(a), we can see that each relay station in the wired network has three alternative paths from the source. In Fig. 3(c), we assume that every two nodes within a distance of less than 20 m are able to communicate with each other. We plot two shortest paths for each destination node from the relay stations. Therefore, each destination node has six alternative paths from the source.

In the numerical experiments, we assume that the video bit stream has three layers, with the base layer at a rate of 3 (data units/s), the first enhancement layer at a rate of 2 (data units/s)