
QoS Control for Multi-Stream Voice over IP Networks

3.2 Joint FEC and Playout Control Mechanisms

3.2.2 Joint FEC and Playout Control

The main attraction of multi-stream transmission arises from its flexibility in trading different sources of impairments against each other. Waiting for the arrival of both descriptions results in lower equipment impairment, but at the cost of higher delay impairment. On the other hand, playing out the voice description with lower delay avoids latency, but increases the equipment impairment. Since playout scheduling aims to improve the overall conversational speech quality, which hangs on the balance between delay and packet loss, full reconstruction of both descriptions may not always be the priority if the overall impairment does not justify the extra delay from waiting.

Given that, the joint playout and FEC control must switch between different playout scenarios in order to maximize the benefits of packet path diversity.

To accomplish this goal, we formulated the system design as a perceptually motivated optimization problem whose criterion relies on the proposed multi-stream voice quality prediction model. We began by estimating the playout delay, defined as the time from the moment a packet is delivered to the network until it has to be played out. We applied an autoregressive algorithm (Moon et al. 1998) to estimate the mean $\hat{d}$ and variance $\hat{v}$ of the network delay, and used them to calculate the buffer delay $d_b = \hat{d} + \beta\hat{v}$. Waiting for the FEC check packets results in additional delay and, consequently, the playout delay is given by

$$ d_{play} = \hat{d} + \beta\hat{v} + (N - 1)T_p \qquad (3.23) $$

where $T_p$ is the packet generation interval. The parameter $\beta$ has a critical impact on the tradeoff between delay and late packet loss, which in turn influences the conversational speech quality. From (3.23) it can be deduced that increasing $\beta$ lowers the late loss rate, since more packets arrive in time, but also increases the end-to-end delay. Most playout buffer algorithms [11][12][13] use a fixed value of $\beta$, e.g., $\beta = 4$, to set the buffer size, so that only a small fraction of the arriving packets is lost due to late arrival. In this work, a $\beta$-adaptive algorithm is used instead to control the buffer size so that the reconstructed voice quality is maximized in terms of delay and loss.
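As a minimal sketch of (3.23), the following Python snippet evaluates the playout delay for a few values of β. The function name `playout_delay`, the millisecond units, and the numeric inputs are illustrative assumptions, not values taken from the experiments.

```python
def playout_delay(d_hat, v_hat, beta, n, t_p):
    """Playout delay of (3.23): d_hat + beta * v_hat + (n - 1) * t_p.

    d_hat, v_hat: estimated mean and variance term of the network delay (ms, assumed unit)
    beta:         safety factor applied to v_hat
    n:            RS block length N (waiting for check packets costs (N - 1) intervals)
    t_p:          packet generation interval (ms, assumed unit)
    """
    if n < 1:
        raise ValueError("RS block length N must be at least 1")
    return d_hat + beta * v_hat + (n - 1) * t_p


# Illustrative inputs only: a larger beta buys fewer late packets
# at the price of a longer playout delay.
for beta in (1, 2, 4):
    print(beta, playout_delay(d_hat=60.0, v_hat=5.0, beta=beta, n=3, t_p=10.0))
```

The delay grows linearly in both β and N, which is why the two parameters have to be chosen together rather than one at a time.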

Our general problem can be stated as follows: Given estimates of the parameters characterizing the packet loss and delay distribution, find the optimal values of β and {N, K} so as to minimize the overall impairment function subject to the rate constraint.

Let $d_i$ be the end-to-end delay experienced by the $i$th packet, which consists of the encoding delay $d_c$ and the playout delay $d_{play}$. We now define an overall impairment function $I_m$ as a function of both $d_i$ and $e_1^K = (e_1, \cdots, e_K)$ with the following form

$$ I_m(d_i, e_1^K) = I_d(d_i) + \frac{1}{K} \sum_{j=1}^{K} \sum_{l=1,2} r_l I_{e,l}(e_j) \qquad (3.24) $$

where $r_1 + r_2 = 1$ and the probability of receiving both descriptions is given by

r2 = 1
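The structure of (3.24) can be sketched in Python as follows. This is only an illustration under stated assumptions: the weights r_1 and r_2 are passed in as inputs rather than computed, and `toy_i_d` and `toy_i_e` are hypothetical placeholder curves, not the E-model impairment functions used in this work.

```python
def overall_impairment(d_i, erasures, r, delay_impairment, equipment_impairment):
    """I_m = I_d(d_i) + (1/K) * sum_j sum_{l=1,2} r_l * I_{e,l}(e_j), as in (3.24).

    erasures:             sequence (e_1, ..., e_K) of packet erasure probabilities
    r:                    dict {1: r_1, 2: r_2} with r_1 + r_2 = 1
    delay_impairment:     callable I_d(d)
    equipment_impairment: dict {1: I_{e,1}, 2: I_{e,2}} of callables
    """
    k = len(erasures)
    if k == 0:
        raise ValueError("need at least one erasure probability")
    loss_term = sum(
        r[l] * equipment_impairment[l](e_j) for e_j in erasures for l in (1, 2)
    ) / k
    return delay_impairment(d_i) + loss_term


# Hypothetical placeholder curves, for illustration only.
def toy_i_d(d_ms):
    return 0.02 * d_ms


toy_i_e = {
    1: lambda e: 20.0 + 100.0 * e,  # one description played out
    2: lambda e: 10.0 + 80.0 * e,   # both descriptions played out
}

print(overall_impairment(100.0, [0.0, 0.5], {1: 0.25, 2: 0.75}, toy_i_d, toy_i_e))
```

Passing the impairment curves in as callables keeps the sketch independent of any particular quality model.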

Our optimization framework requires an analytic expression for the packet erasure probability $e_i$ as a function of the parameter $\beta$. Notice that $e_{b,i}^{(l)}$ and the playout delay $d_{play}$ are strongly correlated. To find their relationship, the network delays of stream $l$ are assumed to follow a Pareto distribution, defined as $F_{D^{(l)}}(d) = 1 - (g_l/d)^{\alpha_l}$. The parameters $\alpha_l$ and $g_l$ of the Pareto distribution can be estimated from past recorded delays using the maximum likelihood estimation method [13]. More specifically, given a set of past network delays $\{n_{i-1}^{(l)}, n_{i-2}^{(l)}, \ldots, n_{i-M}^{(l)}\}$, we compute

$$ g_l = \min\{n_{i-1}^{(l)}, n_{i-2}^{(l)}, \ldots, n_{i-M}^{(l)}\}, \qquad \alpha_l = \frac{M}{\sum_{j=i-M}^{i-1} \log\left(n_j^{(l)} / g_l\right)} $$

Then, the late loss probability of packet $i$ in stream $l$ can be computed as follows:

$$ e_{b,i}^{(l)} = 1 - F_{D^{(l)}}(D_{F,i}) = (g_l / D_{F,i})^{\alpha_l} \qquad (3.26) $$

where $D_{F,i} = d_{play} - (i - 1)T_p$. This reduces the expression of the packet-erasure probability $e_i$ to a function of the playout delay $d_{play}$, which in turn is a function of the parameter $\beta$.
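The Pareto fit and the late loss probability of (3.26) might be sketched in Python as below. The function names are hypothetical, and the handling of a deadline at or below the scale parameter g (returning 1, since the Pareto CDF is zero there) is an assumption added for robustness rather than something stated in the text.

```python
import math


def fit_pareto(delays):
    """Maximum likelihood estimate (alpha, g) from a window of past delays."""
    g = min(delays)
    log_sum = sum(math.log(n / g) for n in delays)
    if log_sum == 0.0:
        raise ValueError("all delays are identical; alpha is undefined")
    return len(delays) / log_sum, g


def late_loss_probability(alpha, g, d_play, i, t_p):
    """Late loss probability of packet i, following (3.26)."""
    d_f = d_play - (i - 1) * t_p  # time budget D_{F,i} left for packet i
    if d_f <= g:
        return 1.0  # assumption: no delay below g is possible, so the packet is late
    return (g / d_f) ** alpha


# Illustrative delays (ms, assumed unit) chosen so that the fitted alpha is about 1.
alpha, g = fit_pareto([10.0, 10.0 * math.e, 10.0 * math.e ** 2])
print(alpha, g, late_loss_probability(2.0, 10.0, d_play=40.0, i=1, t_p=10.0))
```

Later packets in an FEC block have a smaller budget D_{F,i}, so their late loss probability is higher for the same playout delay.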

Finally, we summarize the proposed multi-stream joint playout and FEC adjustment algorithm as below.

1. Apply an autoregressive algorithm [11] to estimate the delay mean $\hat{d}_i^{(l)}$ and variance $\hat{v}_i^{(l)}$ for each stream $l$ ($l = 1, 2$) as follows:

$$ \hat{d}_i^{(l)} = \mu\hat{d}_{i-1}^{(l)} + (1 - \mu)n_i^{(l)} \qquad (3.27) $$

$$ \hat{v}_i^{(l)} = \mu\hat{v}_{i-1}^{(l)} + (1 - \mu)\left|n_i^{(l)} - \hat{d}_i^{(l)}\right| \qquad (3.28) $$

where $n_i^{(l)}$ is the network delay of packet $i$ in stream $l$ and $\mu = 0.998002$ is a weighting factor for convergence control.

2. At the beginning of each talkspurt, update network delay records for the past M = 200 packets in every stream l (l = 1, 2), and use them to calculate the Pareto distribution parameters (αl, gl) by the maximum likelihood estimation method.

3. Use the values of $(\alpha_l, g_l)$ to compute the late loss probability in (3.26) and the packet erasure probability $e_i$ in (3.22). Apply an exhaustive search to determine the minimizer $(\hat{\beta}_i^{(l)}, \hat{N}^{(l)}, \hat{K}^{(l)})$ of the overall impairment function in (3.24) subject to the code rate constraint $\frac{N}{K} \times \frac{9.2}{8} \le R_{max}$. Here, the maximum overall code rate $R_{max}$ is chosen to be 2.

4. Set the playout delay and RS code parameters to

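$$ d_{play} = \hat{d}^{(l)} + \hat{\beta}_i^{(l)}\hat{v}^{(l)} + (\hat{N}^{(l)} - 1)T_p, \qquad (N, K) = (\hat{N}^{(l)}, \hat{K}^{(l)}) \qquad (3.29) $$

with $l = \arg\min\{I_m(\hat{\beta}^{(l)}, \hat{N}^{(l)}, \hat{K}^{(l)}),\ l = 1, 2\}$.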
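Steps 1 and 3 of the algorithm above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the experiments: the `impairment` callable stands in for the overall impairment function (3.24), the candidate grids for β and (N, K) are hypothetical, and the rate check assumes the constraint reads (N/K) × (9.2/8) ≤ R_max.

```python
MU = 0.998002  # weighting factor given in step 1


def update_delay_stats(d_prev, v_prev, n_i, mu=MU):
    """One autoregressive update of the delay mean and variance, (3.27)-(3.28)."""
    d_hat = mu * d_prev + (1.0 - mu) * n_i
    v_hat = mu * v_prev + (1.0 - mu) * abs(n_i - d_hat)
    return d_hat, v_hat


def search_parameters(impairment, betas, codes, r_max=2.0, rate_factor=9.2 / 8):
    """Exhaustive search for the (beta, N, K) that minimizes impairment(beta, n, k).

    codes is an iterable of candidate (N, K) pairs; pairs that violate the
    assumed rate constraint (N / K) * rate_factor <= r_max are skipped.
    Returns (cost, beta, N, K).
    """
    best = None
    for n, k in codes:
        if (n / k) * rate_factor > r_max:
            continue
        for beta in betas:
            cost = impairment(beta, n, k)
            if best is None or cost < best[0]:
                best = (cost, beta, n, k)
    if best is None:
        raise ValueError("no RS(N, K) candidate satisfies the rate constraint")
    return best


# Hypothetical toy cost, for illustration only: prefers beta = 3 and few check packets.
def toy_cost(beta, n, k):
    return abs(beta - 3) + (n - k)


print(update_delay_stats(100.0, 10.0, 100.0))
print(search_parameters(toy_cost, betas=[1, 2, 3, 4], codes=[(2, 1), (3, 2), (5, 3)]))
```

Running the search once per stream and keeping the result with the smaller cost corresponds to the arg min selection in step 4.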

3.2.3 Experimental Results

Computer simulations were carried out to evaluate the performances given by the four MD voice transmission schemes, MD1-4, which all used the MD-G.729 for source coding and RS(N, K) code for channel coding. The speech data fed into the simulations were two sentential utterances spoken by one male and one female, each sampled at 8 kHz and 8 seconds in duration. Both samples were encoded and then processed in accordance with the delay and loss characteristics of the trace data to degrade the speech. Among the four schemes, MD1 had its parameters {β, N, K} dynamically adjusted according to the proposed voice quality prediction model, while MD2-4 shared a fixed β = 4 with (N, K) set at (3,2), (5,3), and (10,6) respectively. It should be pointed out that the last two (N, K) sets allowed MD3 and MD4 to perform at the same FEC coding ratio but with different lengths of delay, which gave us the opportunity to evaluate in our test environment the effect of packet loss vs. delay. It was hypothesized that the performances of these schemes would be set apart mainly by the values of {β, N, K} they each assumed, and that the best performance should come with the adaptive parameter adjustment scheme, or MD1 in the current case, whose calculation was based on link loss, packet-erasure loss and various transmission scenarios.

[Figure 3.3: Performance comparison for different playout algorithms. R-factor versus link loss rate (%) for MD1 (dynamic {N, K, β}), MD2 (RS(3,2), β = 4), MD3 (RS(5,3), β = 4), MD4 (RS(10,6), β = 4), and SD (dynamic {N, K, d_play}).]

The performance of the MD transmission schemes was also compared with an FEC-protected single description (SD) transmission scheme, which consists of an 8 kbps G.729 speech coder followed by an RS(N, K) channel coder. Following the work of [16], the SD scheme applied a joint playout buffer and FEC adjustment scheme that chooses both the playout delay $d_{play}$ and the FEC scheme RS(N, K) so as to maximize the perceived voice quality. Figure 3.3 plots the perceived speech quality associated with the SD and four MD schemes for the case where the network paths are subject to a Gilbert-model loss process with link loss rates ranging from 0% to 15%. As described in Section 2.3, the perceived quality was gauged by calculating the predicted average R-factor according to the E-model. It can be seen that the R-factor decreased as the link loss rate increased, regardless of the scheme used. When a joint playout buffer and FEC control scheme was applied, the results obtained with MD1 clearly demonstrated an improvement over those obtained with the SD scheme, especially at high link loss rates. At link loss rates slightly beyond 6%, the SD scheme, despite its FEC feature, began to fail to recover the lost packets under the Gilbert-model link loss process.

Among the four MD schemes, MD4, with the longest end-to-end delay, yielded the lowest R-factors, while MD3, with the same FEC coding ratio but shorter delays than MD4, yielded higher R-factors than MD4 but lower R-factors than MD2. The lower delay impairment of MD2 allowed it to outperform MD3 and MD4, but its packet recovery strength, as seen in Figure 3.3, receded faster as the link loss rate increased, and at link loss rates greater than 12% it yielded lower R-factors than MD3. The best results in the plot, as hypothesized, were obtained with the proposed scheme MD1. Table 3.3 presents some of the varying parameters that shaped its performance and demonstrates the dynamic aspects of this scheme. At a link loss rate of 12%, 10.25% of the descriptions were recovered with (N, K) = (5, 3) while the remaining 89.7% were recovered with (N, K) = (3, 2); when the loss rate was increased to 15%, 25.6% of the descriptions were recovered with (N, K) = (5, 3) and the remaining 74.35% were recovered with (N, K) = (3, 2). The average redundant bits thus obtained at the two link loss rates were 1.14 (= 10.25% · 2 + 89.7% · 1) and 1.255 (= 25.6% · 2 + 74.35% · 1), respectively. The plot shows that these settings allowed MD1 to outperform the schemes with fixed settings under the transmission scenarios tested.

It follows that in multi-stream voice transmission scheme design, the pursuit of high FEC performance does not guarantee high perceptual speech quality if delay is not jointly considered. The best performance seen in MD1 should therefore be taken as evidence of the advantage of the all-encompassing algorithm proposed here, which aims to lower the total impairment by adapting to the ongoing interplay of delay, packet-erasure loss and various transmission scenarios.

Table 3.3: Average redundant bits comparison for different link loss rates.

Link loss rate (%)    RS(3,2)    RS(5,3)    Average redundant bits
12                    89.7%      10.25%     1.14
15                    74.35%     25.6%      1.255
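The weighted average behind the last column of Table 3.3 can be sketched as below, assuming that each RS(N, K) code contributes N − K redundant units and is weighted by its usage share. The function name is hypothetical; the input is the 15% row of the table.

```python
def average_redundancy(usage):
    """Usage-weighted mean of N - K over the RS(N, K) codes that were selected.

    usage maps (N, K) to the fraction of descriptions recovered with that code.
    """
    return sum(share * (n - k) for (n, k), share in usage.items())


# 15% link loss row: 25.6% with RS(5,3), 74.35% with RS(3,2).
row_15 = {(5, 3): 0.256, (3, 2): 0.7435}
print(round(average_redundancy(row_15), 4))
```

For this row the sketch gives about 1.2555, in line with the 1.255 listed in the table.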

Chapter 4