Joint Coding/Routing Optimization for Distributed Video Sources in Wireless Visual Sensor Networks


Chenglin Li, Junni Zou, Member, IEEE, Hongkai Xiong, Senior Member, IEEE, and Chang Wen Chen, Fellow, IEEE

Abstract—This paper studies a joint coding/routing optimization that balances network lifetime against video distortion by applying information theory to wireless visual sensor networks with correlated sources. Arbitrary coding [distributed video coding and network coding (NC)], viewed from both combinatorial optimization and information theory, can make significant progress toward the performance limit while remaining tractable. In addition, multipath routing spreads energy utilization across the nodes of the entire network to sustain a potentially longer lifetime, and relieves wireless contention by splitting traffic. The objective function not only minimizes the total energy consumed by encoding, transmission, and reception, but also ensures that the information received by sink nodes suffices to approximately reconstruct the visual field. A generalized power consumption model for distributed video sources is also developed, in which the coding complexity of Key frames and Wyner-Ziv frames is measured by translating specific coding behavior into energy consumption. On the basis of distributed multiview video coding and NC-based multipath routing, the balance problem between lifetime (costs) and distortion (capacity) is modeled as an optimization formulation with a fully distributed solution. Through a primal decomposition, a two-level optimization is relaxed with Lagrangian dualization and solved by the gradient algorithm. The low-level optimization problem is further decomposed into a secondary master dual problem with four cross-layer subproblems: a rate control problem, a channel contention problem, a distortion control problem, and an energy conservation problem. The implementation of the distributed algorithm is discussed with regard to the communication overhead and dynamic network change. Simulation results validate the convergence and performance of the proposed algorithm.

Index Terms—Distributed video coding (DVC), network coding (NC), network lifetime, rate-distortion, wireless visual sensor network (WVSN).

Manuscript received April 26, 2010; revised July 18, 2010; accepted September 1, 2010. Date of publication January 13, 2011; date of current version March 2, 2011. This work was supported in part by the NSFC, under Grants 60632040, 60772099, 60802019, 60928003, and the Program for New Century Excellent Talents in University, under Grant NCET-09-0554, and the National High Technology Research and Development Program of China, under Grant 2006AA01Z322. This paper was recommended by Associate Editor Y. Qian.

C. Li and H. Xiong are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]).

J. Zou is with the Key Laboratory of Special Fiber Optics and Optical Access Networks, School of Communication and Information Engineering, Shanghai University, Shanghai 200072, China (e-mail: [email protected]).

C. W. Chen is with the Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2011.2105596

I. Introduction

In a wireless visual sensor network (WVSN), each sensor, equipped with video acquisition and processing functionality, is tasked with capturing digital visual information about target events and delivering the video data over wireless channels to a control unit for further data analysis and decision making [1]. A WVSN supports a wide range of applications, such as security monitoring, video surveillance, and environmental tracking. As the illustrative configuration in Fig. 1 shows, the fields of view of multiview visual sensor nodes overlap, so the videos captured in a WVSN are correlated under the multiview geometry. The major concern is to extend the lifetime of the WVSN while optimizing the network performance for correlated video sources under energy consumption, rate-distortion, and bandwidth constraints.

Since sensor nodes are typically battery powered and replacing batteries is impractical in many remote sensing applications, energy conservation is a critical problem in conventional wireless sensor networks. Network lifetime maximization for wireless sensor networks has therefore been extensively studied [2], [3]. These works usually assume that the data processing at each sensor node is relatively simple, so that the corresponding energy consumption is negligible. However, visual sensors in a WVSN must compress and encode data as well as relay it, and efficient video compression algorithms typically incur high encoding power consumption, which means the conventional methods for wireless sensor networks are no longer applicable to the WVSN. To support video applications, He et al. [4] recently studied the lifetime maximization problem with unicast routing in WVSNs, based on an analytic power-rate-distortion (P-R-D) model [5]. However, that work considers neither the multicast capacity nor the source correlation model over a generalized network model.

1051-8215/$26.00 © 2011 IEEE

Fig. 1. WVSN under the multiview geometry. (a) Network topology. (b) Sketch map of the 3-D view.

Recently, there has been considerable interest in applying information theory to data networks, which turns the traditional routing issue into a joint coding/routing optimization problem. It was proven that either the minimum communication cost or the minimum energy consumption for correlated sensed data can be achieved using distributed source coding (DSC) and network coding (NC) when the link communication cost is a convex function of the link's data rate [6]. On one hand, lossless Slepian-Wolf [7] and lossy Wyner-Ziv (WZ) DSC theory [8] specify that separate encoding of correlated sources can approach the rate of joint entropy, provided joint decoding is executed with known correlation. Several practical Slepian-Wolf and WZ video coding approaches [9], [10] have been proposed, in which temporal prediction for the side information of the estimated frame is performed at the decoder rather than at the encoder. Given the extremely large amount of data associated with multiview video, distributed video coding (DVC) promises to exploit the inherent similarities of the multiview imagery without accurate geometric priors [11].

For example, correlation modeling of WZ DVC in WVSNs was addressed in [12], using available vision knowledge to estimate the source correlation structure. Zhu and Girod [13] applied WZ coding to compress data acquired by large light field systems, and Artigas et al. [14] then used view synthesis prediction to compensate inter-view correlated side information. In [15], a mixed prediction method was applied through the wavelet transform. Moreover, by exploiting the data correlation among spatially adjacent video sensor nodes, the encoding power consumption at each sensor node can be reduced, and the transmission power consumption also decreases as the rate that each sensor node must transmit decreases [16]. On the other hand, NC, as a generalization of routing, offers the best achievable rate region for multicasting in a communication network [17]. In practice, random linear NC, as an efficient distributed strategy, achieves this capacity with high probability [18]. Besides improving network throughput, NC has been found to offer various other potential benefits, including robustness to link/node failures [19] and packet losses [20]. It has been shown that random linear NC suffices for the NC of correlated sources [21]. Reference [22] provides a practical low-complexity scheme for joint DSC and NC. However, most existing work on joint DSC and NC focuses on capacity and ignores costs.

Fig. 2. WZ-based multiview video coding scheme. (a) Codec architecture. (b) Coding structure.

It is worth mentioning that another concern for WVSN applications is to minimize the video distortion received at sink nodes. Typically, prior work reduces to video distortion minimization without considering network lifetime and routing optimization. For example, He and Chen [23] investigated the resource-distortion optimization problem for video encoding and transmission over WVSNs. The rate-distortion performance and the power consumption were further characterized in [5] by developing an analytic P-R-D model of a wireless video sensor. Recently, Zhu et al. [24] studied the tradeoff between network utility and network lifetime for wireless sensor networks in a distributed manner. However, that study was set in a general wireless sensor network and is not suitable for video applications.

This paper aims to impose a joint coding/routing optimization on correlated video sources through a P-R-D model of DVC, while maintaining a tradeoff between the network lifetime and the video distortion. By combining network combinatorics and information theory over an arbitrary graph structure, we investigate the performance limit of a generalized WVSN model. In addition, multipath routing spreads energy utilization across the nodes of the entire network to sustain a potentially longer lifetime, and relieves wireless contention by splitting traffic. In this context, the objective function not only minimizes the total energy consumed by encoding, transmission, and reception, but also ensures that the information received by sink nodes suffices to approximately reconstruct the visual field. We also develop a generalized power consumption model for distributed video sources, in which the coding complexity of Key frames and WZ frames is measured by translating specific coding behavior into energy consumption. On the basis of distributed multiview video coding (DMVC) and NC-based multipath routing, the balance problem between lifetime (costs) and distortion (capacity) is modeled as an optimization formulation with a fully distributed solution. Through a primal decomposition, a two-level optimization is relaxed with Lagrangian dualization and solved by the gradient algorithm. The low-level optimization problem is further decomposed into a secondary master dual problem (encoding, congestion, and energy price updates) with four cross-layer subproblems: a rate control problem, a channel contention problem, a distortion control problem, and an energy conservation problem. The implementation of the distributed algorithm is analyzed with regard to the communication overhead and dynamic network change, and its convergence and performance are validated by extensive experimental results.

The rest of this paper is organized as follows. Section II describes a generalized system model. Section III formulates the joint network lifetime and video distortion optimization without NC. In Section IV, we expand this formulation into an optimization problem with NC. A fully distributed algorithm over lossy WVSNs is proposed in Section V. Extensive simulation results are presented in Section VI. Finally, we summarize this paper in Section VII.

II. System Models

A. Network Model

Suppose that a WVSN is modeled as a directed graph $G = (V, E)$, where $E$ is the set of directed wireless links and $V = V_N \cup S$ is the set of nodes. Here, $V_N = \{v_{1,1}, \ldots, v_{M,N}\}$ and $S = \{s_1, \ldots, s_K\}$ denote the set of wireless video sensor nodes and the set of sink nodes, respectively. Wireless video sensor nodes are in charge of video capturing, encoding, and packet transmission, while sink nodes are remote control units or human interface devices that act as destinations of the WVSN.

Assume that all sensor nodes have a fixed transmission range $D_t$. Let $d_{(i,j)}$ denote the distance between node $i$ and node $j$ through link $l = (i, j)$; then a directed wireless link exists if $d_{(i,j)} < D_t$.
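As a minimal sketch of the link rule above, the following Python snippet builds the directed link set $E$ from node coordinates. The coordinates, the node names, and the helper `build_links` are illustrative assumptions, as is the choice that only sensor nodes originate links (sinks are treated purely as destinations).

```python
import math

def build_links(positions, sensors, d_t):
    """Return the directed links (i, j) with d(i, j) < d_t.

    positions maps every node (sensor or sink) to its coordinates;
    only sensor nodes are assumed to transmit.
    """
    links = set()
    for i in sensors:
        for j in positions:
            if i != j and math.dist(positions[i], positions[j]) < d_t:
                links.add((i, j))
    return links

# Hypothetical three-node layout: two sensors and one sink on a line.
positions = {"v11": (0.0, 0.0), "v12": (10.0, 0.0), "s1": (18.0, 0.0)}
links = build_links(positions, sensors=["v11", "v12"], d_t=12.0)
print(sorted(links))
```

In this layout the sink is out of range of `v11`, so `v11` can reach it only by relaying through `v12`.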

B. Distributed Multiview Video Coding Model

1) Wyner-Ziv Based Multiview Video Coding Scheme: Suppose $X$ and $Y$ are correlated sources termed the source data and the side information, respectively [7]. Traditional source coding assumes that $Y$ is available at both the encoder and the decoder, in which case the rate-distortion (R-D) function for $X$ given $Y$ is $R_{X|Y}(d)$. In contrast, the WZ coding theorem assumes that $Y$ is available only at the decoder, while the encoder can access only the correlation between $X$ and $Y$; the corresponding rate-distortion function is denoted by $R^{WZ}_{X|Y}(d)$. A zero rate loss, $R_{X|Y}(d) - R^{WZ}_{X|Y}(d) = 0$, has been proven achievable for a Gaussian memoryless source under the mean square error (MSE) distortion metric. Pradhan et al. [25] also proved that, in theory, there is no rate loss for arbitrary side information $Y$ and independent Gaussian noise. For a general distribution and an arbitrary distortion metric, Zamir et al. [26] proved that the rate loss is less than 0.5 bit/sample.

DMVC has emerged to attain the benefits inherent to WZ-based DVC. It is envisaged that the spatial correlation is always stronger than the temporal correlation under the multiview geometry, especially in areas with high motion. Here, we adopt the WZ-based multiview video coding scheme [11], [15]. Fig. 2(a) illustrates the multiview video codec architecture, where each sensor node consists of sensing and encoding components. The multiview video frames are divided into two categories, Key (K-) frames and WZ-frames, in an interlaced way. The coding structure of the WZ-based multiview video coding scheme is shown in Fig. 2(b), where a block-wise DCT is first applied to each WZ-frame. For each DCT coefficient band of a WZ-frame, the WZ codec uses a quantizer, bit-plane extraction, and a Slepian-Wolf codec (LDPC) to generate layered parity bits [27]. These parity bits are punctured and transmitted upon request by the decoder. The K-frames are encoded with a traditional video coding engine (e.g., state-of-the-art H.264/AVC), which can be configured in a power-scalable manner by tuning motion-compensated prediction (e.g., intra- or inter-coding modes, generalized B pictures, multihypothesis-mode macroblocks, and hierarchical tree-structured macroblock partitions), the spatial transform (e.g., 8 × 8 and 4 × 4 integer transforms), and the data representation (e.g., context-adaptive variable-length coding or context-based adaptive binary arithmetic coding) [28]. At the decoder side, the K-frames are conventionally decoded to generate the side information. The side information can be seen as a noisy version of the WZ-frames, and the decoder employs a Laplacian noise model for error correction of the received codes. The Laplacian parameter is estimated from the statistics of decoded frames. Within the underlying WZ-based MVC, two directions are defined: the temporal direction, in which the intra-view side information is generated by temporal interpolation, and the view direction, in which spatial correlation modeling is used to infer the inter-view side information. The final side information is obtained by fusing the temporal and view side information.

In summary, the captured multiview video streams are first encoded independently into either K-frames or WZ-frames and then transmitted to the joint decoder. By utilizing the inter-view correlations among adjacent sensor nodes, the correlation exploitation module jointly decodes the received K-frames and WZ-frames.

2) Rate-Distortion Estimation: The R-D performance of the WZ video encoder is determined by the correlation between the source data and its side information, i.e., the inter-view correlation [8]. Suppose that, in the quadratic Gaussian case, the source data $X$ and its side information $Y$ are zero-mean, stationary Gaussian memoryless sources and the distortion metric is MSE. Let the inter-view spatial correlation coefficient between $X$ and $Y$ be $\rho_{X,Y}$; then, according to the WZ theory [8], the R-D performance limit is given by

$$R^{*}_{WZ}(d) = R_{X|Y}(d) = \frac{1}{2} \log_2^{+} \left[ \frac{\sigma_X^2 (1 - \rho_{X,Y}^2)}{d} \right]. \tag{1}$$

It can be seen that, akin to the variance $\sigma_X^2$ in traditional video compression [43], the source correlation $\rho_{X,Y}$ is the key parameter of the R-D function, and it provides critical information for rate control and performance optimization of the WZ encoder.
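To illustrate how the correlation coefficient drives the rate in (1), the following Python sketch evaluates the WZ rate for a few values of $\rho_{X,Y}$. It assumes that $\log_2^{+}(x)$ means $\max(\log_2 x, 0)$; the function names and the numeric values are illustrative only.

```python
import math

def log2_plus(x):
    """Assumed meaning of log2^+: max(log2(x), 0), and 0 for non-positive x."""
    return max(math.log2(x), 0.0) if x > 0 else 0.0

def intra_rate(sigma2, d):
    """Rate of a Gaussian source coded without side information."""
    return 0.5 * log2_plus(sigma2 / d)

def wz_rate(sigma2, rho, d):
    """Eq. (1): WZ rate limit with side information of correlation rho."""
    return 0.5 * log2_plus(sigma2 * (1.0 - rho ** 2) / d)

sigma2, d = 16.0, 1.0  # hypothetical variance and target distortion
for rho in (0.0, 0.5, 0.9):
    print(rho, intra_rate(sigma2, d), round(wz_rate(sigma2, rho, d), 3))
```

With $\rho_{X,Y} = 0$ the side information carries no information and the WZ rate equals the rate without side information; the required rate falls as the correlation grows.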

Therefore, the key step in the rate-distortion estimation for each sensor node is the estimation of the inter-view spatial correlation coefficient, which can be obtained by the method proposed in [12]. Under the multiview geometry, the spatial correlation between two sensor nodes is inversely proportional to the distance between them. Thus, we assume that for each sensor node $v_{m,n}$, the K-frames of its one-hop neighbor nodes $N_1(v_{m,n}) = \{v_{m-1,n}, v_{m+1,n}, v_{m,n-1}, v_{m,n+1}\}$ can be viewed as its spatial side information. Furthermore, to save the transmission rate required for side information, only one of the four one-hop neighbor nodes, with its K-frame, is chosen as the side information at any one time. For each node $v_{m,n}$, define $K_{v_{m,n}}$ as

$$K_{v_{m,n}} = \frac{\text{the number of K-frames}}{\text{the number of (K-frames + WZ-frames)}}. \tag{2}$$

Then, in terms of a weighted average, the R-D function of sensor node $v_{m,n}$ is given by

$$R_{v_{m,n}} = \frac{1}{K_{v_{m,n}} + \sum_{v \in N_1(v_{m,n})} K_v} \cdot \left[ \frac{K_{v_{m,n}}}{2} \log_2^{+} \frac{\sigma^2}{d} + \sum_{v \in N_1(v_{m,n})} \frac{K_v}{2} \log_2^{+} \frac{\sigma^2 (1 - \rho_{v,v_{m,n}}^2)}{d} \right]. \tag{3}$$

In (3), each encoded video frame at node $v_{m,n}$ is either a K-frame or a WZ-frame for which one node from $N_1(v_{m,n})$, with its K-frame, is chosen to provide the side information. Hence, the video coding constraint is expressed as

$$K_{v_{m,n}} + \sum_{v \in N_1(v_{m,n})} K_v = 1 \quad \forall m, n. \tag{4}$$
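The weighted average in (3) and the coding constraint (4) can be sketched in Python as follows. The sketch assumes $\log_2^{+}(x) = \max(\log_2 x, 0)$; the function `node_rate`, the neighbor labels, and the numeric values are hypothetical and serve only to show how the frame-type fractions weight the two rate terms.

```python
import math

def log2_plus(x):
    """Assumed meaning of log2^+: max(log2(x), 0), and 0 for non-positive x."""
    return max(math.log2(x), 0.0) if x > 0 else 0.0

def node_rate(k_self, k_nbr, rho_nbr, sigma2, d):
    """Eq. (3) for one sensor node.

    k_self     -- fraction of K-frames at the node
    k_nbr[u]   -- fraction of WZ-frames decoded with neighbor u's K-frame
    rho_nbr[u] -- spatial correlation coefficient with neighbor u
    """
    total = k_self + sum(k_nbr.values())
    if not math.isclose(total, 1.0):
        raise ValueError("video coding constraint (4) is violated")
    key_part = 0.5 * k_self * log2_plus(sigma2 / d)
    wz_part = sum(
        0.5 * k_u * log2_plus(sigma2 * (1.0 - rho_nbr[u] ** 2) / d)
        for u, k_u in k_nbr.items()
    )
    return (key_part + wz_part) / total

# Hypothetical node: half K-frames, half WZ-frames using the "east" neighbor.
rate = node_rate(0.5, {"east": 0.5}, {"east": 0.9}, sigma2=16.0, d=1.0)
print(round(rate, 3))
```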

C. Wireless Channel Contention Model

Contention-based protocols [24] are widely used for medium access control (MAC) in wireless sensor networks. Here, a p-persistent contention-based protocol is employed at the MAC layer, i.e., each sensor node $i$ contends for channel access with a certain persistence probability $P_i$. Furthermore, assume that time is divided into slots and that each sensor node can start a transmission only at the beginning of a time slot. When node $i$ decides to transmit, it first chooses one link $(i, j)$ out of its set of outgoing links with probability $q_{(i,j)}$, and then contends for channel access with persistence probability $P_i$. Therefore, the transmission attempt probability of link $(i, j)$ is given by $p_{(i,j)} = q_{(i,j)} P_i$, where $\sum_{j:(i,j) \in E} q_{(i,j)} = 1$. Thus, the persistence probability is

$$P_i = \sum_{j:(i,j) \in E} p_{(i,j)} \tag{5}$$

where $0 \le P_i \le 1$, $\forall i \in V_N$, and $0 \le p_{(i,j)} \le 1$, $\forall (i, j) \in E$.

Consider the saturated wireless sensor network scenario in which each link always has data to transmit, and assume that the wireless channel at link $l$ has packet loss probability $\varepsilon_l$. Under such conditions, the probability of a successful packet transmission is

$$\tau_{(i,j)} = (1 - \varepsilon_{(i,j)})\, p_{(i,j)} \prod_{m \in \mathcal{I}_{(i,j)}} (1 - P_m) \tag{6}$$

where $\mathcal{I}_{(i,j)}$ is defined as the cluster of nodes whose transmissions interfere with the transmission at link $(i, j)$. Assuming that any link originating from node $m$ interferes with link $(i, j)$ if $d_{(m,j)} < (1 + \Delta) d_{(i,j)}$, where $\Delta \ge 0$ specifies the interference range, the average throughput of link $(i, j)$ is

$$c_{(i,j)} = C^{0}_{(i,j)} \tau_{(i,j)} \tag{7}$$

where $C^{0}_{(i,j)}$ is the instantaneous transmission rate at link $(i, j)$. The wireless channel contention constraint, which requires that the aggregate flow rate not exceed the link capacity, is

$$f_{(i,j)} \le c_{(i,j)} \quad \forall (i, j) \in E. \tag{8}$$
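The contention model in (5)-(8) can be illustrated with a short Python sketch. The two-sender topology, the probabilities, and the function names below are assumptions for illustration; the interfering node set of each link is passed in directly rather than derived from distances.

```python
import math

def attempt_probabilities(P, q):
    """p_(i,j) = q_(i,j) * P_i for every outgoing link (i, j)."""
    return {(i, j): q_ij * P[i] for (i, j), q_ij in q.items()}

def success_probability(link, p, P, eps, interferers):
    """Eq. (6): the link attempts, the packet is not lost, interferers stay silent."""
    silent = 1.0
    for m in interferers[link]:
        silent *= 1.0 - P[m]
    return (1.0 - eps[link]) * p[link] * silent

def link_capacity(c0, tau):
    """Eq. (7): average throughput of a link."""
    return c0 * tau

# Hypothetical scenario: sensors "a" and "b" both send to sink "s".
P = {"a": 0.5, "b": 0.2}
q = {("a", "s"): 1.0, ("b", "s"): 1.0}
eps = {("a", "s"): 0.1, ("b", "s"): 0.1}
interferers = {("a", "s"): ["b"], ("b", "s"): ["a"]}

p = attempt_probabilities(P, q)
tau_a = success_probability(("a", "s"), p, P, eps, interferers)
cap_a = link_capacity(1000.0, tau_a)
flow_ok = 300.0 <= cap_a  # constraint (8) for an aggregate flow of 300
print(tau_a, cap_a, flow_ok)
```

Raising a node's persistence probability increases its own attempt probability but lowers the success probability of every link it interferes with, which is the coupling that the channel contention subproblem has to resolve.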

D. Multipath Routing Model

Two major criteria are considered when choosing the multiple paths. First, the shortest path based on link weights has been proven to be the best way to transmit data between each source-destination pair [16]. Applying this idea to our problem, we should find, for each sensor node, multiple shortest paths to the set of sink nodes. Second, in order to achieve the bound of the multicast capacity from NC, the multiple paths in a multicast session from one sensor node to multiple sink nodes should be chosen such that the probability of path overlapping is high. In practice, we use existing multipath routing schemes [31] to find multiple paths for each multicast session. Although restricting the routing to a limited set of paths may result in lower rates [32], it has the major advantage of low computational complexity.

Next, we introduce the notation that describes the multipath routing transmission structure. For each node $v$, we use a matrix $H_v = \{h^{kl}_{vj}\}$ to reflect the relationship between its paths and the corresponding links. More specifically, if each wireless sensor node $v$ has $J(v)$ alternative paths to sink node $k$, let $h^{kl}_{vj} = 1$ if path $j$ of node $v$ to sink node $k$ uses link $l$, and $h^{kl}_{vj} = 0$ otherwise. Furthermore, let $R^{k}_{vj}$ denote the information flow rate of wireless sensor node $v$'s $j$th path to sink node $k$; the aggregate flow rate $f_l$ in (8) can then be expressed as

$$f_l = \sum_{k \in S} \sum_{v \in V_N} \sum_{j \in J(v)} h^{kl}_{vj} R^{k}_{vj} \quad \forall l \in E. \tag{9}$$
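As a small illustration of (9), the Python sketch below accumulates path rates into per-link aggregate flows. Representing $h^{kl}_{vj}$ as the set of links used by each path is an implementation choice made here for brevity, and the link names, path indices, and rates are hypothetical.

```python
def aggregate_flow(h, R, links):
    """Eq. (9): sum the rates of all paths that traverse each link.

    h[(k, v, j)] -- set of links used by path j of sensor v toward sink k,
                    i.e. h^{kl}_{vj} = 1 exactly for the links l in this set
    R[(k, v, j)] -- information flow rate carried on that path
    """
    f = {l: 0.0 for l in links}
    for path, rate in R.items():
        for l in h[path]:
            f[l] += rate
    return f

# Hypothetical sensor "v" with two paths to sink "s1" that share link "e3".
links = ["e1", "e2", "e3"]
h = {("s1", "v", 0): {"e1", "e3"}, ("s1", "v", 1): {"e2", "e3"}}
R = {("s1", "v", 0): 2.0, ("s1", "v", 1): 3.0}
f = aggregate_flow(h, R, links)
print(f)
```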

E. Power Consumption Model

1) Video Coding Power Consumption: As mentioned in Section II-B, the total video coding power consumption at each sensor node can be divided into two parts: the power consumed to encode K-frames and the power consumed to encode WZ-frames. The former can be characterized by the analytic P-R-D model [5], and the latter is measured by translating its specific coding behavior into energy consumption.

First, we denote $a = \log_2^{+}[\sigma^2/d] \,/\, \log_2^{+}[\sigma^2(1 - \rho^2)/d]$ as the ratio between the encoded bit rate of a single K-frame and that of a single WZ-frame. As long as the total encoded rate $R_v$ of sensor node $v$ is known, the equivalent encoded rate of a stream consisting purely of K-frames is $aR_v/[(1 - K_v) + aK_v]$. According to the analytical P-R-D model, the encoding power consumption of K-frames at each sensor node $v$ can be computed as the K-frame ratio $K_v$ multiplied by the power consumption of the encoded stream consisting purely of K-frames:

$$P_v^{K} = K_v \left[ \frac{1}{\gamma \dfrac{a R_v}{(1 - K_v) + a K_v}} \log\frac{\sigma^2}{d_v} \right]^{3/2} \triangleq \left[ \frac{1}{\gamma R_v} \log\frac{\sigma^2}{d_v} \right]^{3/2} \tag{10}$$

where $\sigma^2$ is the average input variance, and $\gamma$ is the encoding efficiency coefficient. For the sake of simplicity, the rightmost expression in (10) and all subsequent expressions reuse the symbol $\gamma$ for the equivalent encoding efficiency coefficient $\gamma a / \{K_v^{2/3} [(1 - K_v) + aK_v]\}$.
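The two forms of the K-frame encoding power can be cross-checked numerically. The Python sketch below evaluates the power once from the equivalent pure K-frame rate and once through the equivalent encoding efficiency coefficient. It assumes a natural logarithm for the unspecified "log" and $\sigma^2 > d_v$; the function names and parameter values are hypothetical.

```python
import math

def key_frame_power(R_v, K_v, a, gamma, sigma2, d_v):
    """K-frame power: ratio K_v times the P-R-D power of a pure K-frame stream."""
    r_key = a * R_v / ((1.0 - K_v) + a * K_v)  # equivalent pure K-frame rate
    return K_v * (math.log(sigma2 / d_v) / (gamma * r_key)) ** 1.5

def key_frame_power_equiv(R_v, K_v, a, gamma, sigma2, d_v):
    """Same power written with the equivalent encoding efficiency coefficient."""
    gamma_eq = gamma * a / (K_v ** (2.0 / 3.0) * ((1.0 - K_v) + a * K_v))
    return (math.log(sigma2 / d_v) / (gamma_eq * R_v)) ** 1.5

# Hypothetical operating point (natural log assumed, sigma2 > d_v).
args = dict(R_v=2.0, K_v=0.5, a=2.0, gamma=0.1, sigma2=16.0, d_v=1.0)
p_direct = key_frame_power(**args)
p_equiv = key_frame_power_equiv(**args)
print(p_direct, p_equiv)
```

Both evaluations agree up to floating-point rounding, which confirms that folding $K_v$ and $a$ into the coefficient leaves the power unchanged.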

In the WZ codec [27], a WZ-frame is obtained by sequentially going through the DCT transform, the quantizer, and the LDPC encoder. Assume that the size of each WZ-frame is $M \times N$. Adopting the 4 × 4 integer DCT and $2^Q$-level quantization [33], the number of instructions needed to perform the DCT over the entire WZ-frame is

$$\mathrm{Num}(\mathrm{DCT}) = \frac{M \times N}{4 \times 4} \times \left( 12\,\text{“+”} + 8\,\text{“−”} \right) \times 4 \times 2 \tag{11}$$

where “·” denotes the corresponding instruction. The number of instructions needed to quantize one WZ-frame is given by

$$\mathrm{Num}(Q) = M \times N \times \left( 1\,\text{“×”} + 1\,\text{“+”} + 1\,\text{“ ”} \right). \tag{12}$$

For the LDPC encoder, suppose the parity check matrix is $H_{MN \times L}$ with the ratio of “1” entries being $\kappa$; then the number of instructions is

$$\mathrm{Num}(\mathrm{LDPC}) = M \times N \times L \times \kappa \times Q \times \left( 1\,\text{“AND”} \right). \tag{13}$$

Let $\xi_{\text{“·”}}$ (mW/MHz) denote the energy consumption per corresponding instruction, which can be determined by the specific CMOS technology [34]. Then the power consumption of WZ-frames at each sensor node $v$ can be computed as the number of WZ-frames per second multiplied by the encoding power consumption of a single WZ-frame:

$$P_v^{WZ} = \frac{R_v (1 - K_v)}{[(1 - K_v) + aK_v]\, MNQ} \times \xi_{\text{“·”}} \times \left[ \mathrm{Num}(\mathrm{DCT}) + \mathrm{Num}(Q) + \mathrm{Num}(\mathrm{LDPC}) \right] \triangleq b \cdot R_v. \tag{14}$$

Hereinafter, we use the notation $b$ as the WZ encoding parameter for the sake of simplicity.
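A rough Python sketch of the instruction counts and of the resulting WZ encoding parameter $b$ follows. As a simplifying assumption that is not made in the text, every instruction type is charged the same energy `xi`; the frame size, LDPC parameters, and function names are hypothetical.

```python
import math

def wz_instruction_count(M, N, L, kappa, Q):
    """Total instructions per WZ-frame, counting all instruction types alike."""
    num_dct = (M * N) / (4 * 4) * (12 + 8) * 4 * 2  # additions and subtractions
    num_q = M * N * 3                               # three instructions per pixel
    num_ldpc = M * N * L * kappa * Q                # one AND per nonzero entry and bit-plane
    return num_dct + num_q + num_ldpc

def wz_parameter_b(K_v, a, M, N, L, kappa, Q, xi):
    """Coefficient b such that P_v^WZ = b * R_v, with a uniform energy xi."""
    frames_per_unit_rate = (1.0 - K_v) / (((1.0 - K_v) + a * K_v) * M * N * Q)
    return frames_per_unit_rate * xi * wz_instruction_count(M, N, L, kappa, Q)

# Hypothetical toy frame of 4 x 4 pixels with Q = 2 bit-planes.
count = wz_instruction_count(M=4, N=4, L=2, kappa=0.5, Q=2)
b = wz_parameter_b(K_v=0.5, a=2.0, M=4, N=4, L=2, kappa=0.5, Q=2, xi=1.0)
print(count, b)
```

Because $b$ does not depend on $R_v$, the WZ encoding power is linear in the encoded rate, which is what allows it to appear as a simple equality constraint in the optimization.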

2) Data Transmission and Reception Power Model: The transmission power consumption [35] at link $l$ can be formulated as

$$P_l^{t} = c_l^{s} f_l / \tau_l \quad \text{and} \quad c_l^{s} = \alpha + \beta d_l^{\,n_p} \tag{15}$$

where $f_l/\tau_l$ is the actual aggregate rate transmitted through link $l$, $c_l^{s}$ is the transmission power consumption cost of link $l$, $\alpha$ is the energy cost of the transmit electronics, $\beta$ is the coefficient corresponding to the energy cost of the transmit amplifier, $d_l$ is the distance between the sender node and the receiver node along link $l$, and $n_p$ is the path-loss factor, which takes values between 2 and 4.

The reception power consumption at node $v$ can be formulated as

$$P_v^{r} = c^{r} \sum_{j:(j,v) \in E} \frac{f_{(j,v)}}{\tau_{(j,v)}} \tag{16}$$

where $c^{r}$ is the energy consumption cost of the radio receiver, and $\sum_{j:(j,v) \in E} (f_{(j,v)}/\tau_{(j,v)})$ is the actual aggregate rate received at node $v$.

The total power dissipation at node $v$ can be expressed as

$$\begin{aligned} P_v &= P_v^{K} + P_v^{WZ} + P_v^{t} + P_v^{r} \\ &= P_v^{K} + P_v^{WZ} + \sum_{j:(v,j) \in E} \frac{c^{s}_{(v,j)} f_{(v,j)}}{\tau_{(v,j)}} + c^{r} \sum_{j:(j,v) \in E} \frac{f_{(j,v)}}{\tau_{(j,v)}}. \end{aligned} \tag{17}$$

Assuming that each sensor node $v$ has initial energy $E_v$, the lifetime of sensor node $v$ can be stated as $T_v = E_v / P_v$.

III. Joint Network Lifetime and Video Distortion Optimization Without Network Coding

A. Optimization Problem Formulation

In the proposed problem, one major concern is to extend the lifetime of the entire video data gathering and transmission network, which means minimizing the cost function with respect to the rate allocated to each sensor node and the transmission structure. In visual monitoring applications, each visual sensor node in the WVSN is of equal importance, and the energy exhaustion of any node results in the failure of the whole network. Take a security-monitoring application as an example: if any sensor node fails because its energy is exhausted, an intruder can break into the monitored area covered by that node. In that case, the entire security-monitoring system loses its effectiveness even if all the other sensor nodes are still working. Therefore, with $T_v$ denoting the lifetime of sensor node $v$, the lifetime of the visual sensor network is defined as the time until the first sensor exhausts its battery energy, i.e., $T_{\text{network}} = \min_{v \in V_N} T_v$, and the objective is $\max T_{\text{network}} = \max[\min_{v \in V_N} T_v]$.
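To make the lifetime definition concrete, the following Python sketch combines the per-node power terms of Section II-E into node lifetimes and takes their minimum. The two-sensor relay chain, the cost values, and the function names are assumptions chosen for illustration.

```python
import math

def node_power(v, p_key, p_wz, f, tau, c_s, c_r):
    """Total power of node v: encoding plus transmission plus reception."""
    tx = sum(c_s[l] * f[l] / tau[l] for l in f if l[0] == v)  # outgoing links
    rx = c_r * sum(f[l] / tau[l] for l in f if l[1] == v)     # incoming links
    return p_key[v] + p_wz[v] + tx + rx

def network_lifetime(energy, power):
    """Time until the first sensor node exhausts its battery."""
    return min(energy[v] / power[v] for v in energy)

# Hypothetical chain: sensor "a" relays through sensor "b" to sink "s".
f = {("a", "b"): 1.0, ("b", "s"): 2.0}
tau = {("a", "b"): 0.5, ("b", "s"): 0.5}
c_s = {("a", "b"): 2.0, ("b", "s"): 3.0}
p_key = {"a": 1.0, "b": 1.0}
p_wz = {"a": 0.5, "b": 0.5}
energy = {"a": 110.0, "b": 155.0}

power = {v: node_power(v, p_key, p_wz, f, tau, c_s, c_r=1.0) for v in energy}
print(power, network_lifetime(energy, power))
```

In this example the relay node `b` carries its own traffic plus that of `a`, so it drains first and determines the network lifetime even though it starts with more energy.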

Another major concern is to improve the decoded video quality at each sink node, which favors an encoded video rate at each sensor node that is as high as possible. The corresponding objective can be expressed as $\min \sum_{v \in V_N} d_v$, where $d_v$ is the distortion determined by the encoded video rate of sensor node $v$.

Achieving the maximal performance of correlated source reconstruction in WVSNs involves two important but conflicting objectives: maximizing the network lifetime and minimizing the total video distortion at all the sink nodes. The tradeoff between these two constrained problems can be formulated as a multi-objective optimization problem. By introducing a weighting system parameter $\delta \in [0, 1]$, the two objectives are combined into a single objective function. With the aforementioned constraints, the balance problem is defined as P1.

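$$
\begin{aligned}
\textbf{P1:}\quad & \min_{R,\,p,\,d,\,t} \Big\{ \delta \sum_{v \in V_N} d_v - (1 - \delta) \big[ \min_{v \in V_N} T_v \big] \Big\} \\
\text{s.t.}\quad
& 1)\ R_v = \frac{1}{K_v + \sum_{u \in N_1(v)} K_u} \cdot \Big[ \frac{K_v}{2} \log_2^{+} \frac{\sigma^2}{d_v} + \sum_{u \in N_1(v)} \frac{K_u}{2} \log_2^{+} \frac{\sigma^2 (1 - \rho_{u,v}^2)}{d_v} \Big] \quad \forall v \in V_N \\
& 2)\ \sum_{j \in J(v)} R^{k}_{vj} \ge R_v \quad \forall v \in V_N,\ \forall k \in S \\
& 3)\ \sum_{k \in S} \sum_{v \in V_N} \sum_{j \in J(v)} h^{kl}_{vj} R^{k}_{vj} = f_l \quad \forall l \in E \\
& 4)\ \tau_{(i,j)} = (1 - \varepsilon_{(i,j)})\, p_{(i,j)} \prod_{m \in \mathcal{I}_{(i,j)}} (1 - P_m) \quad \forall (i, j) \in E \\
& 5)\ P_i = \sum_{j:(i,j) \in E} p_{(i,j)} \quad \forall i \in V_N \\
& 6)\ \frac{1}{\gamma R_v} \log\frac{\sigma^2}{d_v} \le (P_v^{K})^{2/3} \quad \forall v \in V_N \\
& 7)\ P_v^{WZ} = b \cdot R_v \quad \forall v \in V_N \\
& 8)\ \frac{E_v}{T_v} = P_v^{K} + P_v^{WZ} + \sum_{j:(v,j) \in E} \frac{c^{s}_{(v,j)} f_{(v,j)}}{\tau_{(v,j)}} + c^{r} \sum_{j:(j,v) \in E} \frac{f_{(j,v)}}{\tau_{(j,v)}} \quad \forall v \in V_N \\
& 9)\ f_{(i,j)} \le C^{0}_{(i,j)} \tau_{(i,j)} \quad \forall (i, j) \in E.
\end{aligned}
$$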

In P1, constraint (1) determines the video encoding rate of each sensor node needed to satisfy the given distortion requirement, and constraint (2) ensures that the actual transmitted information rate of each sensor node is no less than that required encoding rate. Constraints (3)–(5) and (9) characterize the wireless channel contention and transmission conditions in wireless sensor networks. Constraints (6)–(8) specify the power consumption of each sensor node. Furthermore, it can be observed that the variable vectors $\tau$, $P$, $P^{K}$, and $P^{WZ}$ are dummy variables, since they can be expressed as functions of other variables; thus the optimization variable vectors in P1 are $R$, $p$, $d$, and $t$.

IV. Joint Network Lifetime and Video Distortion Optimization with Network Coding

A. Network Coding Constraints

Assume that each pairing of a single sensor node with multiple sink nodes corresponds to one multicast session. In this paper, we restrict NC to be implemented separately within each session, i.e., NC is applied only to data originating from the same sensor node, to ensure easy operation and reduce complexity. With intra-session NC, flows from the same video sensor node to different sink nodes are allowed to share network capacity by being coded together. Suppose that a sensor node $v$ transmits at rate $R_v$ to sink node $k$; then information must flow at rate $R_v$ to the sink nodes. However, with NC, we only need to set the actual physical flow on each link to the maximum of the individual sink nodes' information flows.

Specifically, for link $(i, j)$, let $x^{k}_{v(i,j)}$ denote the information flow for sink node $k$ from sensor node $v$, and let $f^{v}_{(i,j)}$ denote the physical flow from sensor node $v$. These constraints can then be expressed as

$$\sum_{j:(i,j) \in E} x^{k}_{v(i,j)} - \sum_{j:(j,i) \in E} x^{k}_{v(j,i)} = \begin{cases} R_v, & \text{for } i = v \\ -R_v, & \text{for } i \in S \\ 0, & \text{otherwise} \end{cases} \tag{18}$$

$$x^{k}_{v(i,j)} \le f^{v}_{(i,j)} \quad \forall k \in S \tag{19}$$

where (18) is the information flow balance equation, which is similar to the physical flow balance equation, and (19) specifies the NC condition, relating physical rates to information rates.

Here, the physical flow rate vector $f$ is called the coding subgraph and can vary within a constraint set $\{f\}$. For a feasible coding subgraph, Theorem 1 of [30] states that the multicast sessions of video sensor nodes can be achieved with the distributed random NC schemes in [29]. For completeness, we include this theorem here with slight adaptation to our scenario.

Theorem 1: Given a feasible coding subgraph $f \in \{f\}$, a multicast session of sensor node $v$, which flows at a rate arbitrarily close to $R_v$ to the sink nodes in the set $S$ and injects packets at a rate arbitrarily close to $f^{v}_{(i,j)}$ on each link $(i, j)$, is achievable with NC if and only if the information flow rate vector $x$ and the physical flow rate vector $f$ satisfy (18) and (19).

Therefore, when setting up optimal multicast sessions over the WVSN, there is no loss of optimality in separating the problems of subgraph selection and NC. In other words, we can find an optimal coding subgraph $f$ satisfying (18) and (19), and then apply an NC scheme to it in which coding is done on the links where the paths to different sinks overlap.

The above constraints are obtained under the ideal condition that there is no channel contention on wireless links. When considering wireless contention, we assume that the MAC protocol has no retransmission limit, i.e., it does not stop transmitting a packet until the packet is successfully delivered. In this case, the average number of transmissions attempted by link $(i, j)$ to successfully deliver a packet is $1/\tau_{(i,j)}$. Therefore, given a physical flow rate $f_{(i,j)}$ on link $(i, j)$, the actual flow rate needed to ensure successful delivery from the transmitter to the receiver is $F_{(i,j)} = f_{(i,j)}/\tau_{(i,j)}$. The NC constraints in (19) can then be rewritten as

$$x^{k}_{v(i,j)} \le F^{v}_{(i,j)} \tau_{(i,j)} \quad \forall k \in S. \tag{20}$$

Considering NC-based multipath routing, let $R^{k}_{vj}$ denote the information flow rate of wireless sensor node $v$'s $j$th path to sink node $k$, and let $f^{v}_{l}$ represent the physical flow rate of wireless sensor node $v$ on link $l$. The information flow balance condition in (18) is then automatically satisfied, and the NC constraints in (19) become

$$\sum_{j \in J(v)} h^{kl}_{vj} R^{k}_{vj} \le f^{v}_{l} \quad \forall k \in S,\ \forall v \in V_N,\ \forall l \in E. \tag{21}$$
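The saving that intra-session NC offers on shared links can be sketched in Python as follows. For one sensor node, the snippet computes the smallest per-link physical flow that satisfies the NC constraint (the maximum over sinks) and compares it with plain routing (the sum over sinks). The topology, rates, and the function `physical_flow` are hypothetical.

```python
def physical_flow(h, R, links, sinks, coded=True):
    """Smallest per-link physical flow of one sensor that carries all sink flows.

    h[(k, j)] -- set of links used by path j toward sink k
    R[(k, j)] -- information flow rate on that path
    """
    flow = {}
    for l in links:
        per_sink = [
            sum(rate for (k2, j), rate in R.items() if k2 == k and l in h[(k2, j)])
            for k in sinks
        ]
        # With NC the link only has to cover the largest sink flow;
        # without NC the flows destined to different sinks add up.
        flow[l] = max(per_sink) if coded else sum(per_sink)
    return flow

# Hypothetical sensor multicasting to two sinks over paths that share link "e1".
links = ["e1", "e2", "e3"]
sinks = ["s1", "s2"]
h = {("s1", 0): {"e1", "e2"}, ("s2", 0): {"e1", "e3"}}
R = {("s1", 0): 2.0, ("s2", 0): 2.0}

with_nc = physical_flow(h, R, links, sinks, coded=True)
without_nc = physical_flow(h, R, links, sinks, coded=False)
print(with_nc["e1"], without_nc["e1"])
```

The shared link carries half the load when coding is allowed, which is why the multipath selection in Section II-D favors overlapping paths within a multicast session.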


B. Optimization Problem Formulation

It can be observed that, in order to introduce the NC technique into P1, only (3) and (9) need to be replaced by NC constraints. Regarding power consumption, we assume that NC at sensor nodes employs only the binary coding approach proposed in [36], so that implementing NC requires only bit-wise addition operations, which have much lower complexity than other operations [e.g., multiplication over an alphabet larger than GF(2)].

Owing to the simplicity and low complexity of the NC operation, its power consumption is very small compared with the power spent on video encoding and transmission.
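To make the notion of bit-wise addition over GF(2) concrete, here is a minimal Python sketch in which two equal-length packets are combined by XOR and one is recovered from the coded packet and the other. This is only an illustration of the arithmetic; it is not the coding scheme of [36], and the packet contents and the helper name `xor_bytes` are made up for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Add two equal-length packets over GF(2), i.e., bit-wise XOR."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical payloads arriving at a coding node from two different sources.
p1 = b"frameA"
p2 = b"frameB"

coded = xor_bytes(p1, p2)         # single coded packet sent on the shared link
recovered = xor_bytes(coded, p1)  # a sink that already holds p1 recovers p2
```

Each output byte costs one XOR, which is why the text treats the coding power as negligible next to video encoding and radio transmission.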

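Therefore, when NC is taken into account, the original optimization problem P1 can be rewritten as P2

$$\textbf{P2:}\quad \min_{R,p,d,t}\ \Big\{\delta\sum_{v\in V_N} d_v - (1-\delta)\Big[\min_{v\in V_N} T_v\Big]\Big\}$$

s.t. (1), (2), (4), (5), (6), (7), (8) in P1; and

$$3)\quad \sum_{j\in J(v)} h^{kl}_{vj}\,R^{k}_{vj} \le f_l^{v} \quad \forall k \in S,\ \forall v \in V_N$$

$$9)\quad \sum_{v\in V_N} f^{v}_{(i,j)} \le C^{0}_{(i,j)}\,\tau_{(i,j)} \quad \forall (i,j) \in E.$$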

However, since equality constraints (4) and (8) are not linear, the formulated optimization problem P2 is not convex and is therefore generally more difficult to solve.

We first transform these constraints into linear equalities. Constraint (4) is reformulated by taking the logarithm of both sides. For (8), we introduce a new variable $t_v = 1/T_v$, which can be interpreted as node v's normalized power dissipation with respect to its initial energy $E_v$; maximizing $\min_{v\in V_N} T_v$ is then equivalent to minimizing $\max_{v\in V_N} t_v$.

Also, after this substitution the second part of the objective function in P2 contains the maximum function, which is not differentiable and requires global information about all sensor nodes, so it is difficult to solve the problem in a fully distributed manner. We therefore express the maximum as the limit of a q-norm

$$\max_{v\in V_N} t_v = \|t\|_{\infty} = \lim_{q\to+\infty}\|t\|_{q} = \lim_{q\to+\infty}\Big(\sum_{v\in V_N} t_v^{q}\Big)^{1/q}. \qquad (22)$$

Here, $\|t\|_{\infty}$ is approximated by $\|t\|_{q}$ [37], where q is a sufficiently large integer. Furthermore, we slightly modify the objective by replacing $\|t\|_{q}$ with $\|t\|_{q}^{q}$, so that P2 can be reformulated as

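$$\textbf{P3:}\quad \min_{R,p,d,t}\ \Big\{\delta\sum_{v\in V_N} d_v + (1-\delta)\sum_{v\in V_N} t_v^{q}\Big\}$$

s.t. (1), (2), (3), (5), (6), (7), (9) in P2; and

$$4)\quad \log\tau_{(i,j)} = \log(1-\varepsilon_{(i,j)})p_{(i,j)} + \sum_{m\in(i,j)}\log(1-P_m)$$

$$8)\quad E_v t_v = P_v^{K} + P_v^{WZ} + \sum_{j:(v,j)\in E}\frac{c^{s}_{l}\,f_{(v,j)}}{\tau_{(v,j)}} + c^{r}\sum_{j:(j,v)\in E} f_{(j,v)}\,\tau_{(j,v)}.$$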

According to [37], we have the following lemma.

Lemma 2: Let $t^{*} = 1/T^{*}$ denote the optimal solution of P2, and let $t_q^{*}$ denote the optimal solution of P3. Then

$$\|t^{*}\|_{\infty} \le \|t_q^{*}\|_{\infty} \le |V_N|^{1/q}\,\|t^{*}\|_{\infty}. \qquad (23)$$

From Lemma 2, it can be seen that $\lim_{q\to+\infty}\|t_q^{*}\|_{\infty} = \|t^{*}\|_{\infty}$, which means that the normalized power dissipation obtained from P2 is well approximated by the one obtained from P3 when q is a sufficiently large integer. Therefore, the convex optimization problem P3 is approximately equivalent to P2, but can be solved more easily in a fully distributed manner.
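The approximation in (22) rests on the elementary norm inequality $\|t\|_\infty \le \|t\|_q \le n^{1/q}\|t\|_\infty$ for a vector with n entries. The short Python sketch below checks this numerically; it illustrates the norm bound only, not Lemma 2 itself (which concerns optimal solutions), and the vector `t` is a made-up example.

```python
def q_norm(t, q):
    """(sum_v t_v^q)^(1/q) for nonnegative t, scaled by max(t) to avoid underflow."""
    m = max(t)
    if m == 0.0:
        return 0.0
    return m * sum((x / m) ** q for x in t) ** (1.0 / q)


# Hypothetical normalized power dissipations t_v = 1/T_v (in 1/s) for four nodes.
t = [2.0e-6, 3.5e-6, 1.0e-6, 3.4e-6]

# Relative gap between the q-norm and the true maximum, for increasing q.
gaps = {q: q_norm(t, q) / max(t) - 1.0 for q in (2, 10, 50)}
```

The relative gap is bounded by $n^{1/q} - 1$, so it shrinks as q grows. The scaling by `max(t)` matters in practice because raw terms such as $t_v^{q}$ with $t_v$ around $10^{-6}$ underflow quickly in floating point.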

V. Distributed Algorithm

In this section, we develop a distributed solution to the proposed optimization problem P3, which allows each sensor node and wireless link to control and update its own video processing and transmission parameters.

A. Two Level Decomposition

Decomposition theory aims to decompose a large and complex optimization problem into a set of small subproblems, which can then be solved with distributed algorithms that converge to the global optimum [38]. For P3, one way to decouple the problem is to first apply a primal decomposition with respect to the coupling variable $F_l^{v} = f_l^{v}/\tau_l$, and then a dual decomposition with respect to the coupling constraints (2), (3), and (6). This yields a two-level decomposition procedure consisting of a master primal problem and a secondary master dual problem with several subproblems.

In mathematical terms, after the first-level (high-level) primal decomposition, obtained by fixing the coupling variable $F_l^{v} = f_l^{v}/\tau_l$, the original optimization problem P3 is decoupled into two hierarchical problems

$$\textbf{P3-1:}\quad \min_{R,p,d,t}\ \Big\{\delta\sum_{v\in V_N} d_v + (1-\delta)\sum_{v\in V_N} t_v^{q}\Big\} \quad \text{s.t. (1), (2), (3), (4), (5), (6), (7), (8) in P3}$$

$$\textbf{P3-2:}\quad \min_{F}\ U^{*}(F) \quad \text{s.t. (9) in P3}$$

wherein P3-1 performs a low-level optimization when the coupling variable vector F is fixed, while P3-2 performs a high-level optimization to update F. U^∗(F) is the optimal value of the objective function in P3-1 for a given F. The output of the low-level optimization is locally optimal and provides an approximation to the global optimal solution using the result of the high-level optimization.
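The interplay between the two levels can be sketched on a toy problem. In the Python example below, two users share a capacity C; for a fixed split (F1, C - F1) the low level is solved in closed form, and the high level adjusts F1 by a projected gradient step on the optimal value U*(F1). The quadratic cost, the demands `a1` and `a2`, and the step size are all assumptions made for illustration and have nothing to do with the actual objective of P3.

```python
def low_level(F1, C, a1, a2):
    """Inner problem for a fixed split: min (x1-a1)^2 + (x2-a2)^2, 0<=x1<=F1, 0<=x2<=C-F1.

    Returns the optimal value U*(F1) and its derivative with respect to F1.
    """
    s1 = max(a1 - F1, 0.0)          # unmet demand of user 1 (x1* = min(a1, F1))
    s2 = max(a2 - (C - F1), 0.0)    # unmet demand of user 2
    value = s1 ** 2 + s2 ** 2
    grad = -2.0 * s1 + 2.0 * s2     # more F1 helps user 1 and hurts user 2
    return value, grad


def high_level(C, a1, a2, step=0.1, iters=200):
    """Master problem: projected gradient descent on the coupling variable F1."""
    F1 = 0.0
    for _ in range(iters):
        _, grad = low_level(F1, C, a1, a2)
        F1 = min(max(F1 - step * grad, 0.0), C)   # keep the split feasible
    return F1


F1_opt = high_level(C=4.0, a1=3.0, a2=2.0)
```

In this toy case the total demand (5) exceeds the capacity (4), and the master iteration settles on the split that equalizes the two unmet demands. In the paper the low level is not solved in closed form but by the dual decomposition described next.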

As mentioned in Section IV-A, the coupling variable vector F (or f) in P3 represents the NC subgraph. Note that in P3 the impact of NC is embedded in (3). Theorem 1 implies a form of “separation principle” that allows independent decisions on power, rate, and distortion control during NC. This suggests that the optimal configuration of multicast sessions over the WVSN can be determined by decoupling the problem of subgraph selection from NC. The task of the high-level optimization problem P3-2 is to update F by selecting the optimal subgraph, while the low-level optimization problem P3-1 attempts to find a locally optimal solution for power, rate, and distortion control of a specified NC scheme for a given coding subgraph F.

Furthermore, after the second-level dual decomposition, the low-level optimization problem P3-1 can be decomposed into several subproblems whose solutions need only local information rather than global information about the entire network. By associating coupling constraints (2), (3), and (6) with Lagrange multipliers λ, µ, and η, respectively, the Lagrangian of problem P3-1 can be expressed as

$$
\begin{aligned}
L(\lambda,\mu,\eta,R,p,d,t) ={}& \delta\sum_{v\in V_N} d_v + (1-\delta)\sum_{v\in V_N} t_v^{q}
 + \sum_{v\in V_N}\sum_{k\in S}\lambda_v^{k}\Big(R_v - \sum_{j\in J(v)} R^{k}_{vj}\Big)\\
&+ \sum_{l\in E}\sum_{v\in V_N}\sum_{k\in S}\mu_v^{kl}\Big(\sum_{j\in J(v)} h^{kl}_{vj}R^{k}_{vj} - F_l^{v}\tau_l\Big)\\
&+ \sum_{v\in V_N}\eta_v\Big[\frac{1}{\gamma R_v}\log\Big(\frac{\sigma^{2}}{d_v}\Big) - \big(P_v^{K}\big)^{2/3}\Big] \qquad (24)
\end{aligned}
$$

and the corresponding dual function is

$$g(\lambda,\mu,\eta) = \inf_{R,p,d,t}\ L(\lambda,\mu,\eta,R,p,d,t) \quad \text{s.t. (1), (4), (5), (7), (8) in P3.}$$

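The Lagrange dual problem of P3-1 can be formulated as

$$\max_{\lambda\ge 0,\ \mu\ge 0,\ \eta\ge 0}\ g(\lambda,\mu,\eta). \qquad (25)$$

According to convex optimization theory [38], if the original problem P3-1 is convex, it is equivalent to its Lagrange dual problem in (25). Then, the low-level optimization problem P3-1 can be further decomposed into a secondary master dual problem P3-1a and a set of subproblems P3-1b to P3-1e that can be solved in a distributed manner

$$\textbf{P3-1a:}\quad \max_{\lambda,\mu,\eta}\ g(\lambda,\mu,\eta) \quad \text{s.t. } \lambda\ge 0,\ \mu\ge 0,\ \eta\ge 0$$

$$\textbf{P3-1b:}\quad \min_{R}\ U_b(R) = \sum_{v\in V_N}\sum_{k\in S}\Big[\sum_{l\in E}\mu_v^{kl}\Big(\sum_{j\in J(v)} h^{kl}_{vj}R^{k}_{vj}\Big) - \lambda_v^{k}\Big(\sum_{j\in J(v)} R^{k}_{vj}\Big)\Big]$$

$$\textbf{P3-1c:}\quad \min_{p}\ U_c(p) = -\sum_{l\in E}\sum_{v\in V_N}\sum_{k\in S}\mu_v^{kl}F_l^{v}\tau_l \quad \text{s.t. (4), (5) in P3}$$

$$\textbf{P3-1d:}\quad \min_{d}\ U_d(d) = \delta\sum_{v\in V_N} d_v + \sum_{v\in V_N}\sum_{k\in S}\lambda_v^{k}R_v + \sum_{v\in V_N}\eta_v\Big[\frac{1}{\gamma R_v}\log\Big(\frac{\sigma^{2}}{d_v}\Big)\Big]$$

$$\textbf{P3-1e:}\quad \min_{t}\ U_e(t) = (1-\delta)\sum_{v\in V_N} t_v^{q} - \sum_{v\in V_N}\eta_v\big(P_v^{K}\big)^{2/3} \quad \text{s.t. (1), (7), (8) in P3.}$$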

The relationship between the two levels of the decomposition is shown in Fig. 3. At the high level, the master primal problem P3-2 is a wireless link capacity allocation and NC subgraph selection problem at the transport layer. At the low level, subproblem P3-1b is a rate control problem at the transport layer, subproblem P3-1c is a contention resolution problem at the MAC layer, and subproblems P3-1d and P3-1e are, respectively, the distortion control and energy conservation problems, which take into account impacts from both the transport layer and the MAC layer in wireless sensor networks. The secondary master dual problem P3-1a is the Lagrange price update problem.

Fig. 3. Hierarchical decomposition with two levels.

B. Low-Level Update

Since the objective functions of the secondary master dual problem P3-1a and subproblems P3-1b ∼ P3-1e are differentiable with respect to the dual variables λ, µ, η, and primal variables R, p, d, and t, all problems can be solved by the gradient algorithm [39]. Based on this observation, we propose the following primal-dual algorithm that updates the primal and dual variables simultaneously to solve the low-level optimization problem P3-1.

1) Rate Control Problem at Transport Layer:

$$R^{k}_{vj}(t_L+1) = \Big[R^{k}_{vj}(t_L) - \alpha(t_L)\,\frac{\partial U_b(R)}{\partial R^{k}_{vj}}\Big]^{+} \qquad (26)$$

where $t_L$ denotes the low-level iteration index, $\alpha(t_L)$ is the corresponding positive step size, and $[\cdot]^{+}$ denotes the projection onto the set of nonnegative real numbers.
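Differentiating the objective of P3-1b with respect to a single path rate gives $\sum_{l\in E}\mu_v^{kl}h^{kl}_{vj} - \lambda_v^{k}$, i.e., the total congestion price along path j minus the encoding price. A minimal Python sketch of one step of (26) for a fixed source v and sink k follows; the function names, data layout, and numbers are hypothetical.

```python
def project_nonneg(x):
    """The projection [x]^+ onto the nonnegative reals."""
    return max(x, 0.0)


def rate_step(R, lam, mu, h, alpha):
    """One projected-gradient update of the path rates of one (source, sink) pair.

    R[j]    : current rate on path j
    lam     : encoding price lambda_v^k
    mu[l]   : congestion price mu_v^{kl} of link l
    h[j][l] : 1 if path j uses link l, else 0
    """
    new_R = []
    for j, r in enumerate(R):
        path_price = sum(mu[l] * h[j][l] for l in range(len(mu)))
        grad = path_price - lam               # dU_b / dR_j
        new_R.append(project_nonneg(r - alpha * grad))
    return new_R


# Hypothetical two-path example: path 0 is cheap, path 1 is congested.
h = [[1, 0, 1],
     [0, 1, 1]]
R_next = rate_step(R=[0.3, 0.2], lam=0.5, mu=[0.2, 0.6, 0.1], h=h, alpha=0.1)
```

Traffic shifts toward the path whose accumulated link price is below the encoding price, which is the multipath load-balancing behavior the prices are meant to induce.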

2) Contention Resolution Problem at the MAC Layer:

$$p_{(i,j)}(t_L+1) = \Big[p_{(i,j)}(t_L) - \alpha(t_L)\,\frac{\partial U_c(p)}{\partial p_{(i,j)}}\Big]^{1}_{0} \qquad (27)$$

where $[\cdot]^{1}_{0}$ denotes the projection onto the range [0, 1].

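3) Distortion Control Problem and the Energy Conservation Problem:

$$d_v(t_L+1) = \Big[d_v(t_L) - \alpha(t_L)\,\frac{\partial U_d(d)}{\partial d_v}\Big]^{+} \qquad (28)$$

$$t_v(t_L+1) = \Big[t_v(t_L) - \alpha(t_L)\,\frac{\partial U_e(t)}{\partial t_v}\Big]^{+}. \qquad (29)$$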
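For the distortion update, differentiating the objective of P3-1d with respect to $d_v$ gives $\delta - \eta_v/(\gamma R_v d_v)$. The Python sketch below implements one step of (28) for a single node under that reading. The lower bound `d_min` is an added safeguard (an assumption, not part of the paper) that keeps the logarithm well defined, and all parameter values are hypothetical.

```python
import math


def U_d_node(d, delta, eta, gamma, R, sigma2):
    """d-dependent part of U_d for one node: delta*d + eta/(gamma*R) * log(sigma2/d)."""
    return delta * d + eta / (gamma * R) * math.log(sigma2 / d)


def distortion_step(d, delta, eta, gamma, R, alpha, d_min=1e-9):
    """One projected-gradient update of d_v; d_min is an assumed positive floor."""
    grad = delta - eta / (gamma * R * d)      # dU_d / dd_v
    return max(d - alpha * grad, d_min)


# Hypothetical parameters; the stationary point is d* = eta / (delta * gamma * R).
delta, eta, gamma, R = 0.5, 2.0, 5.0, 0.4
d_star = eta / (delta * gamma * R)
d_next = distortion_step(d_star, delta, eta, gamma, R, alpha=0.1)
```

Since $-\log d_v$ is convex, the per-node objective is convex in $d_v$ and the stationary point is its minimizer: a higher energy price $\eta_v$ pushes the node toward accepting more distortion.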

4) Secondary Master Dual Problem:

$$\lambda_v^{k}(t_L+1) = \Big[\lambda_v^{k}(t_L) + \alpha(t_L)\,\frac{\partial g(\lambda,\mu,\eta)}{\partial \lambda_v^{k}}\Big]^{+} \qquad (30)$$

$$\mu_v^{kl}(t_L+1) = \Big[\mu_v^{kl}(t_L) + \alpha(t_L)\,\frac{\partial g(\lambda,\mu,\eta)}{\partial \mu_v^{kl}}\Big]^{+} \qquad (31)$$

$$\eta_v(t_L+1) = \Big[\eta_v(t_L) + \alpha(t_L)\,\frac{\partial g(\lambda,\mu,\eta)}{\partial \eta_v}\Big]^{+}. \qquad (32)$$

Fig. 4. WVSN in simulation. (a) Topology with all nodes and links. (b) Multipath routing example.

In terms of the physical meaning of the Lagrange multipliers, λ corresponds to the “encoding prices” at sensor nodes. At sensor node v, if the encoding rate demand $R_v$ exceeds the supply $\sum_{j\in J(v)} R^{k}_{vj}$, then the price $\lambda_v^{k}$ rises, which in problem P3-1b drives the encoding rate supply $R^{k}_{vj}$ up to meet the encoding rate demand, and vice versa. Similarly, the other two Lagrange multipliers, µ and η, can be interpreted as “congestion prices” at wireless links and “energy prices” at sensor nodes, respectively. Furthermore, all update steps are distributed and can be implemented at individual links and nodes using only local information.
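The price interpretation can be sketched in a few lines of Python. The sketch assumes the usual result that the (sub)gradient of the dual function with respect to a multiplier is the residual of its constraint at the current primal point, which the text above does not spell out; the function names and numbers are hypothetical, and the analogous update for η is omitted.

```python
def price_step(price, violation, alpha):
    """Projected ascent step: raise the price when its constraint is violated."""
    return max(price + alpha * violation, 0.0)


def encoding_price_step(lam, R_v, path_rates, alpha):
    """Update lambda_v^k from demand R_v minus the supply sum_j R_vj^k."""
    return price_step(lam, R_v - sum(path_rates), alpha)


def congestion_price_step(mu, path_rates, uses_link, F, tau, alpha):
    """Update mu_v^{kl} from the link load minus the delivered rate F_l^v * tau_l."""
    load = sum(r for r, u in zip(path_rates, uses_link) if u)
    return price_step(mu, load - F * tau, alpha)


# Demand 1.0 exceeds supply 0.5, so the encoding price rises.
lam_next = encoding_price_step(0.5, 1.0, [0.3, 0.2], alpha=0.1)
# Load 0.5 is below the delivered rate 0.8, so the congestion price falls.
mu_next = congestion_price_step(0.05, [0.3, 0.2], [1, 1], F=1.0, tau=0.8, alpha=0.1)
```

Each update uses only quantities known at the node or at the sender of the link, which is what makes the scheme distributed.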

C. High-Level Update

Next, we discuss how to adjust F in order to solve the high-level optimization problem P3-2. Suppose $\hat{\mu}_v^{kl}$ and $\hat{\tau}_l$ are the optimal Lagrange price and the optimal variable corresponding to the constraint $\sum_{j\in J(v)} h^{kl}_{vj}R^{k}_{vj} \le F_l^{v}\tau_l$ in problem P3-1. As for P3-1, we first define the Lagrangian of problem P3-2 as

$$L(\theta,F) = U^{*}(F) + \sum_{(i,j)\in E}\theta_{(i,j)}\Big(\sum_{v\in V_N} F^{v}_{(i,j)} - C^{0}_{(i,j)}\Big) \qquad (33)$$

where θ is the Lagrange price corresponding to (9). Then a primal-dual algorithm similar to that for P3-1 is proposed

$$F^{v}_{(i,j)}(t_H+1) = \Big[F^{v}_{(i,j)}(t_H) - \beta(t_H)\,\frac{\partial L(\theta,F)}{\partial F^{v}_{(i,j)}}\Big]^{+} \qquad (34)$$

$$\theta_{(i,j)}(t_H+1) = \Big[\theta_{(i,j)}(t_H) + \beta(t_H)\,\frac{\partial L(\theta,F)}{\partial \theta_{(i,j)}}\Big]^{+} \qquad (35)$$

where $t_H$ denotes the high-level iteration index, and $\beta(t_H)$ is a positive step size.

Here, θ can be regarded as the “aggregate congestion prices” at wireless links. The update of $F^{v}_{(i,j)}$ can be performed individually by each link with knowledge of only the congestion price θ, while the update of θ uses only the local information of each link.
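A possible per-link realization of (34) and (35) is sketched below in Python. It assumes the standard sensitivity result for primal decomposition, namely that the derivative of $U^{*}(F)$ with respect to $F_l^{v}$ equals $-\sum_{k}\hat{\mu}_v^{kl}\hat{\tau}_l$ (read off the term $-\mu_v^{kl}F_l^{v}\tau_l$ in the low-level Lagrangian); the text does not state this derivative explicitly, so it should be treated as an assumption. All names and numbers are hypothetical.

```python
def high_level_step(F, theta, mu_hat, tau_hat, C0, beta):
    """One primal-dual iteration on a single link l.

    F[v]      : attempted flow rate F_l^v of source v on this link
    theta     : aggregate congestion price of the link
    mu_hat[v] : optimal low-level prices of source v on this link, summed over sinks k
    tau_hat   : optimal successful-transmission probability of the link
    C0        : instantaneous transmission rate C_l^0
    """
    # Assumed sensitivity: dU*/dF_v = -mu_hat[v] * tau_hat, so dL/dF_v adds theta.
    new_F = [max(f - beta * (-m * tau_hat + theta), 0.0) for f, m in zip(F, mu_hat)]
    # dL/dtheta is the capacity violation sum_v F_v - C0 at the current F.
    new_theta = max(theta + beta * (sum(F) - C0), 0.0)
    return new_F, new_theta


F_next, theta_next = high_level_step(
    F=[4.0, 7.0], theta=0.2, mu_hat=[0.5, 0.1], tau_hat=0.8, C0=10.0, beta=0.1
)
```

In this example the link is overloaded (11 > 10), so its price rises, while the source whose low-level price outweighs θ still gains capacity at the expense of the other.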

D. Summary of the Distributed Algorithm

TABLE I

Configuration of Model Parameters in a WVSN

Para.    Description                                   Value
C⁰_l     Instantaneous transmission rate               10 Mb/s
ε_l      Packet loss probability                       0.1
ρ        Source correlation parameter                  0.5
b        WZ encoding parameter                         0.1 W/Mb
c^s_l    Transmission power cost                       20 nJ/b
c^r      Reception power cost                          10 nJ/b
E_v      Node’s initial energy                         500 kJ
σ²       Average input variance of video               3500
γ        Equivalent encoding efficiency coefficient    5 W^(2/3)·s/Mb
δ        Weighted system parameter                     5.0 × 10⁻⁴⁷

To implement the proposed distributed algorithm, any link l ∈ E or video sensor node v ∈ V_N is treated as an entity capable of processing, storing, and communicating information. In practice, each link l = (i, j) is delegated to its sender node i, and all computations related to that link are executed on node i. Assume that the processor for the link keeps track of the variables $p_l(t_L)$, $\mu_v^{kl}(t_L)$, $F_l^{v}(t_H)$, and $\theta_l(t_H)$, while the processor of sensor node v keeps track of the variables $R^{k}_{vj}(t_L)$, $d_v(t_L)$, $t_v(t_L)$, $\lambda_v^{k}(t_L)$, and $\eta_v(t_L)$. Regarding the communication overhead [40], all update operations in both low-level and high-level iterations can use these variables stored at the local node or link, except for the updated rate $R^{k}_{vj}(t_L+1)$, the updated congestion price $\mu_v^{kl}(t_L+1)$, and the updated flow rate $F_l^{v}(t_H+1)$, which need to be transmitted in extra packets. If the float type is adopted in the implementation, each rate, Lagrange price, or flow rate takes up only 4 bytes. Thus, compared with the main video transmission traffic, the communication overhead introduced by this information exchange is quite small. Furthermore, in a real implementation these values (rates, Lagrange prices, and flow rates) need not be sent as separate packets: the rates can be conveyed in a field of the video data packets, while the Lagrange prices and flow rates can be conveyed in a field of the acknowledgment packets. The maximal additional delay introduced by sending this information is the one-way propagation delay of the particular sensor node.
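The 4-byte figure corresponds to a 32-bit floating-point encoding. The Python sketch below packs a few updated values with the standard `struct` module and compares their size with a video payload. The 1000-byte payload and the particular values are assumptions chosen only to give a sense of scale.

```python
import struct


def pack_update(values):
    """Serialize rates or prices as 32-bit little-endian floats (4 bytes each)."""
    return struct.pack("<%df" % len(values), *values)


def unpack_update(blob):
    """Inverse of pack_update."""
    return list(struct.unpack("<%df" % (len(blob) // 4), blob))


# Hypothetical piggybacked field: one updated rate and two updated prices.
field = pack_update([0.32, 0.55, 0.02])

video_payload_bytes = 1000  # assumed size of a video data packet payload
overhead_fraction = len(field) / (len(field) + video_payload_bytes)
```

Single precision keeps about seven significant decimal digits, which is usually ample for step-size-scaled gradient updates of this kind.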

The proposed distributed algorithm needs to be run whenever the network starts a monitoring and transmission session or the network conditions change abruptly, so that the network can catch up with the optimal total video distortion and network lifetime. The algorithm runs until it converges to the optimum or the maximum number of high-level iterations is reached. Therefore, the time needed by the entire network to catch up with the computed optimal total video distortion and network lifetime is the number of iterations multiplied by the update interval of each iteration. According to [41], an update interval of about two to three times the one-way propagation delay of the particular sensor node is sufficient.
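As a back-of-the-envelope aid, the catch-up time described above can be computed as follows. The iteration count and the one-way delay in the example are placeholders, not values taken from the simulations.

```python
def catch_up_time(iterations, one_way_delay_s, interval_factor=3.0):
    """Iterations times the update interval, with the interval set to
    interval_factor (about 2 to 3) times the one-way propagation delay."""
    if iterations < 0 or one_way_delay_s < 0 or interval_factor <= 0:
        raise ValueError("arguments must be nonnegative (factor positive)")
    return iterations * interval_factor * one_way_delay_s


# Hypothetical: 200 iterations and a 5 ms one-way delay, with the interval at 3x the delay.
t_total = catch_up_time(200, 0.005, interval_factor=3.0)
```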

VI. Simulation Results

In this section, we evaluate the overall performance of the proposed optimization algorithm. Initially, the WVSN in Fig. 4(a) is used, where each node of the 3×3 video sensor node array transmits a quarter common intermediate format (QCIF) video sequence at a frame rate of 15 frames/s to both sink nodes. The distance between any two nodes is set to 10 m