Intelligent voice smoother for silence-suppressed voice over Internet


Po L. Tien and Maria C. Yuang

Abstract—When voice data with silence suppression are transported over the Internet, the jitter introduced by the network often renders the speech unintelligible. It is thus indispensable to offer intramedia synchronization that removes jitter while retaining a minimal playout delay (PD). In this paper, we propose a neural network (NN)-based intravoice synchronization mechanism, called the intelligent voice smoother (IVoS). IVoS is composed of three components: 1) the smoother buffer; 2) the NN traffic predictor; and 3) the constant bit rate (CBR) enforcer. Newly arriving frames, assumed to follow a generic Markov modulated Bernoulli process (MMBP), are queued in the smoother buffer. The NN traffic predictor employs an online-trained back propagation NN (BPNN) to predict three traffic characteristics of every newly encountered talkspurt period. Based on the predicted characteristics, the CBR enforcer derives an adaptive buffering delay (ABD) by means of a near-optimal, simple closed-form formula and imposes this delay on the playout of the first frame in the talkspurt period. The CBR enforcer in turn regulates CBR-based departures for the remaining frames of the talkspurt, aiming to assure minimal mean and variance of distortion of talkspurts (DOT) and minimal mean PD. Simulation results reveal that, compared to three other playout approaches, IVoS achieves superior playout, yielding negligible DOT and PD, irrespective of traffic variation.

Index Terms—Back propagation neural network (BPNN), best effort service, constant bit rate (CBR), Internet, intramedia synchronization, jitter, Markov modulated Bernoulli process (MMBP), multimedia communications, silence suppression.

I. INTRODUCTION

The recent evolution in high-speed communication technology enables the development of distributed multimedia applications combining a variety of media data such as text, audio, graphics, images, voice, and full-motion video. These applications often require stringent quality of service (QoS) guarantees, such as bounded delay and jitter [1]. Moreover, the multicast backbone (MBone)/Internet [2]–[4] has been widely deployed to support diverse multicast traffic for businesses and individuals. Its current transport protocols, however, were originally designed to offer best-effort service without performance guarantees. Consequently, the proliferation of these multimedia applications has placed ever more strain on the MBone/Internet. In particular, to make more efficient use of scarce bandwidth, voice data has been considered for transport at a variable bit rate (VBR), using speech activity detection [5], [6]. As a result, the unbounded jitter introduced by the network often renders the speech unacceptable or even unintelligible. It thus becomes essential to offer intramedia synchronization that meets specific QoS guarantees.

Manuscript received December 1, 1997; revised May 17, 1998. This work was supported in part by the Institute for Information Industry (III) under Contract 86-0040. This paper was presented in part at INFOCOM, 1998.

The authors are with the Department of Computer Science and Information Engineering, National Chiao Tung University, Taiwan, R.O.C.

Publisher Item Identifier S 0733-8716(99)00006-2.

Essentially, voice applications can be broadly classified as either interactive, such as voice conversation in teleconferencing, or unidirectional (unicast or multicast), such as voice distribution services [7]. Serving dissimilar purposes, these two classes of applications differ in their requirements on, and tolerance of, playout impairments [8]. The two playout impairments considered are distortion of talkspurts (DOT) (or speech clipping) and distortion of silence (or variable speech burst delay). While the former is invariably significant, the latter is often imperceptible for most applications. Owing to their real-time nature, interactive voice applications are more sensitive to PD than to playout impairments. Unidirectional applications, on the contrary, are more susceptible to playout impairments, subject to a reasonable PD. The main objective of this paper is to propose an adaptive playout mechanism satisfying any given set of QoS requirements in terms of DOT and PD.

Several existing intramedia synchronization methods that operate at end systems exhibit various performance merits. They can be categorized as static delay-based, dynamic feedback-based, and dynamic delay-based. Static delay-based methods preserve playout continuity by buffering a large number of packets at receiving end systems [9] or by delaying the playout time of the first packet received [4], [10], [11]. These methods have been shown to be feasible, but at the expense of a drastic increase in PD. Dynamic feedback-based methods [12], [13], on the other hand, perform intramedia synchronization by adjusting the source generation rate through feedback sent from receiving end systems. While effective, these methods are not viable for most live-source applications, whose generation rate generally cannot be adjusted. Unlike the two classes of methods described above, dynamic delay-based methods [14], [15] employ dynamic playout rates in accordance with a computed window (or threshold), which can be intelligently predicted in real time or analytically computed in advance. These methods have been shown to be viable, particularly for video streams, which differ in nature from voice streams [7]. Two other dynamic delay-based methods [16], [17] attempt to preserve playout continuity via adaptive buffering of frames time-stamped at the sources. These methods are feasible, but at the expense of considerable processing and framing overhead from frequent time stamping.

Fig. 1. Concept of IVoS, where X: first frame of a talkspurt; P_i: pause i; T_i: talkspurt i. (a) An end-to-end voice flow scenario. (b) IVoS states.

In this paper, we propose a neural network (NN)-based intravoice synchronization mechanism, called the intelligent voice smoother (IVoS), operating at the application layer of receiving end systems. IVoS is composed of three components: 1) the smoother buffer; 2) the NN traffic predictor; and 3) the constant bit rate (CBR) enforcer. The inbound traffic to IVoS is modeled as a generic discrete-time Markov modulated Bernoulli process (MMBP) with unknown and varying probabilistic parameters.

Newly arriving frames are first queued in the smoother buffer. The NN traffic predictor employs an online-trained back propagation NN (BPNN) to predict three characteristics (talkspurt length, frame count, and last burst length) of the upcoming talkspurt period. Based on the predicted characteristics, the CBR enforcer imposes an adaptive buffering delay (ABD) derived from a near-optimal closed-form formula. The CBR enforcer, in turn, regulates CBR-based departures of frames within this talkspurt period, aiming to assure minimal mean and variance of DOT and minimal mean PD. Simulation results reveal that, compared to three other playout approaches, IVoS achieves superior playout, yielding negligible DOT and PD, irrespective of traffic variation.

The remainder of this paper is organized as follows. Section II presents the main concept and the inbound traffic model of IVoS. Section III details the system architecture, including its NN traffic predictor and the CBR enforcer. Section IV then presents performance comparisons between IVoS and existing playout approaches. Finally, concluding remarks are given in Section V.

II. IVoS—CONCEPT AND MODEL

A. Concept

Generally, voice data are sampled and encoded as fixed-size frames. With silence suppression, these frames are in turn sent over the MBone/Internet [2]. Frames are assumed to arrive in accordance with a generic MMBP (described later) with unknown probabilistic parameters. Upon receiving them, IVoS determines the departure time at which each frame is transferred from the IVoS smoother buffer to the decoder buffer, from which frames are played back. An end-to-end flow scenario is illustrated in the time-space diagram in Fig. 1(a). In the figure, the conversation between the sender and the receiver (with IVoS) alternates between active and inactive periods: IVoS is in an active period while receiving frames, and in an inactive period otherwise. Within an active period, IVoS further alternates between the talkspurt state, in which frames are received intermittently (during the busy state), and the pause state, in which no frames appear. Moreover, the first frame of every talkspurt is tagged (marked as X in the figure) before being transmitted.

Accordingly, the goal of IVoS is to enforce CBR playout during the active period by dynamically adjusting the duration of pauses to compensate for jitter within talkspurts. For ease of illustration, and without loss of generality, we assume throughout the rest of the paper that the system remains in one active period for the entire duration of the connection. This is often the case for unidirectional multimedia applications, such as distance learning.

TABLE I
Variables Used Throughout the Paper

The rationale of how IVoS achieves this goal is also shown in Fig. 1(a). Time in IVoS is slotted, with one slot corresponding to the processing of a single frame by the adjacent lower layer, i.e., the transport layer. We assume that, disregarding the framing overhead, voice frames are generated (played back) at the encoder (decoder) of the sending (receiving) end system at one-third of the processing rate of the transport layer [14]. Define $\kappa$ as the ratio of the time to generate or play out a frame, referred to as the frame time, to the time to process a single frame at the transport layer, referred to as the slot time, i.e., $\kappa$ = frame time/slot time. For the example given in Fig. 1, $\kappa = 3$. For ease of reference, all variables used throughout the paper are summarized in Table I.

Frames are finally received by IVoS at the receiving end system. Ideally, in a jitter-free network, the interdeparture times of frames from the sender's application equal the interarrival times of frames at IVoS. In this case, frames are played back intelligibly at the maximum rate, i.e., one frame every three time slots during talkspurts. In reality, however, delay jitter induced in the network causes different frames to experience different end-to-end delays, and this results in speech unintelligibility. Denote $s_j$ and $p_j$ as the sampling and playout times of frame $j$, respectively, and denote $d_j$ as the end-to-end transfer delay of frame $j$. Playout discontinuity is then quantified by DOT, where the DOT of talkspurt $i$, denoted as $\mathrm{DOT}_i$, is defined as

(1)

where $f_i$ and $l_i$ are the ordinal numbers of the first and last frames in talkspurt $i$, and $F_i$ is the total number of frames in talkspurt $i$. Moreover, playout discontinuity can be reduced at the expense of an increase in PD. Let $\mathrm{PD}_i$ denote the PD of talkspurt $i$, defined as the elapsed time between the fastest possible departure and the real departure of the last frame in talkspurt $i$.

Fig. 2. Inbound traffic model.

Consequently, IVoS aims to achieve minimal mean and variance of DOT (zero in the case of CBR playout) while sustaining minimal mean PD. Note that we do not consider the frame loss probability, that is, the dropping of frames due to late arrival. Although discarding late frames avoids additional playout delay, the price paid is increased speech clipping [8]. Two issues have been considered in the design of IVoS: 1) which characteristics of future traffic to predict, and how; and 2) how to determine the ABD imposed on each talkspurt to achieve quasi-CBR playout during talkspurts. Before proposing solutions to these two issues, we first present the inbound traffic model in the next subsection and the architecture of IVoS in Section III.

B. Inbound Traffic Model

The inbound traffic to IVoS is modeled by a generic discrete-time MMBP [5], [6], [18], as shown in Fig. 2. The process alternates between the pause state and the talkspurt state. Within the talkspurt state, the process switches between the busy state, during which frames arrive in a burst, and the idle state, during which no frame appears. The transition probabilities between states are given in the figure. For example, one of them defines the probability of switching from the busy state to the pause state, and another defines the opposite transition. The sojourn times in all three states are assumed to be geometrically distributed. Moreover, during the busy state exactly one frame arrives per slot time, and no frame is generated during the idle and pause states.

It is worth noting that VBR voice sources have been modeled by such a three-state MMBP [6], with parameters matched to the nature of the voice application. It has been shown [18] that the multiplexing of MMBPs can be approximated by another MMBP with different transition probabilities. Therefore, for the traffic that reaches IVoS at the end system after being multiplexed and demultiplexed over the network, we adopt a generic MMBP with unknown and varying transition probabilities. The talkspurt characteristics that result from these probabilities are in turn predicted by the NN traffic predictor.
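To make the three-state model concrete, the following is a minimal Python sketch that draws a slot-by-slot state sequence from such an MMBP and extracts, for each complete talkspurt, the three characteristics used later: talkspurt length T, frame count F, and last burst length B. The parameter names (`p_pb`, `p_bp`, `p_bi`, `p_ib`) are hypothetical. The sketch also assumes, as the description above suggests but does not state outright, that the pause state is entered only from, and left only to, the busy state.

```python
import random
from dataclasses import dataclass

PAUSE, BUSY, IDLE = 0, 1, 2


@dataclass
class MMBPParams:
    # Hypothetical names: p_xy = one-slot probability of moving from state x to y.
    p_pb: float  # pause -> busy
    p_bp: float  # busy -> pause
    p_bi: float  # busy -> idle
    p_ib: float  # idle -> busy


def simulate_mmbp(params, n_slots, seed=0):
    """Return the per-slot state sequence; a frame arrives in every BUSY slot."""
    rng = random.Random(seed)
    state, states = PAUSE, []
    for _ in range(n_slots):
        states.append(state)
        u = rng.random()
        if state == PAUSE:
            state = BUSY if u < params.p_pb else PAUSE
        elif state == BUSY:
            if u < params.p_bp:
                state = PAUSE
            elif u < params.p_bp + params.p_bi:
                state = IDLE
        else:  # IDLE
            state = BUSY if u < params.p_ib else IDLE
    return states


def talkspurt_stats(states):
    """Return (T, F, B) for every talkspurt, i.e., every non-PAUSE run closed by a PAUSE."""
    stats, run = [], []
    for s in states:
        if s != PAUSE:
            run.append(s)
            continue
        if run:  # a PAUSE slot closes the current talkspurt
            last_burst = 0
            for x in reversed(run):
                if x != BUSY:
                    break
                last_burst += 1
            stats.append((len(run), run.count(BUSY), last_burst))
            run = []
    return stats  # a trailing, unfinished talkspurt is ignored


states = simulate_mmbp(MMBPParams(p_pb=0.05, p_bp=0.1, p_bi=0.3, p_ib=0.5), 30_000)
print(talkspurt_stats(states)[:5])
```

Because each talkspurt in this sketch starts with a busy slot and ends with a busy-to-pause transition, every extracted triple satisfies 1 ≤ B ≤ F ≤ T.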

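The steady-state probabilities of being in the pause, busy, and idle states, denoted as $\pi_P$, $\pi_B$, and $\pi_I$, can be computed using $\boldsymbol{\pi} = \boldsymbol{\pi}\mathbf{P}$ and $\pi_P + \pi_B + \pi_I = 1$ [19], where $\boldsymbol{\pi} = (\pi_P, \pi_B, \pi_I)$ and $\mathbf{P}$ is the state transition probability matrix of the MMBP. The resulting expressions for $\pi_P$, $\pi_B$, and $\pi_I$ are

(2)

Moreover, the mean frame rate (MFR) and the mean burst length (MBL) can be directly expressed as functions of the steady-state and transition probabilities:

(3)

Next, note that the ABD of each talkspurt depends largely on three variables of the talkspurt: the talkspurt length $T$, the frame count $F$, and the last burst length $B$. We now examine the probability mass functions (pmfs) of these three random variables.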
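The computation of (2) and (3) can be sketched numerically as follows. The sketch assumes that the busy state is the only entry point to and exit from the pause state, and the parameter names are hypothetical. MFR equals $\pi_B$ because exactly one frame arrives per busy slot. MBL is the mean of the geometric busy sojourn, $1/(p_{bp} + p_{bi})$.

```python
import numpy as np


def mmbp_matrix(p_pb, p_bp, p_bi, p_ib):
    """One-slot transition matrix over the states (pause, busy, idle)."""
    return np.array([
        [1.0 - p_pb, p_pb, 0.0],
        [p_bp, 1.0 - p_bp - p_bi, p_bi],
        [0.0, p_ib, 1.0 - p_ib],
    ])


def steady_state(P):
    """Solve pi = pi P together with sum(pi) = 1 by stacking both conditions."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi


def mfr_and_mbl(p_pb, p_bp, p_bi, p_ib):
    pi = steady_state(mmbp_matrix(p_pb, p_bp, p_bi, p_ib))
    mfr = pi[1]                 # one frame per busy slot
    mbl = 1.0 / (p_bp + p_bi)   # mean of the geometric busy sojourn
    return pi, mfr, mbl


pi, mfr, mbl = mfr_and_mbl(p_pb=0.1, p_bp=0.2, p_bi=0.2, p_ib=0.4)
print(pi, mfr, mbl)  # pi is approximately (4/7, 2/7, 1/7); MBL = 2.5
```

For this structure, the balance equations $\pi_P p_{pb} = \pi_B p_{bp}$ and $\pi_I p_{ib} = \pi_B p_{bi}$ also give the closed form $\pi_B = 1/(1 + p_{bp}/p_{pb} + p_{bi}/p_{ib})$, which the linear solve reproduces.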

First, let $C$ denote the total number of cycles in a talkspurt, where a cycle runs from the busy state to the idle state and back to the busy state. The pmf of the talkspurt length becomes

$$P(T=t) = \sum_{c \ge 0} P(T=t \mid C=c)\,P(C=c) \qquad (4)$$

with the conditional pmf given in (5). Second, the pmf of the frame count can be expressed as

$$P(F=f) = \sum_{t \ge f} P(T=t,\, F=f) \qquad (6)$$

in which the joint pmf is derived in (7). Finally, the pmf of the last burst length can be obtained from the joint pmf of the three random variables, that is,

$$P(B=b) = \sum_{t} \sum_{f} P(T=t,\, F=f,\, B=b) \qquad (8)$$

in which the joint pmf, given in (9), is derived from the same notion of the cycle introduced above. After the analytical computation of (8) and (9), we notice that the last burst length, as anticipated, turns out to be geometrically distributed.

For ease of explanation, Table II summarizes the 81 types of traffic arrivals (nine different MFRs and nine different MBLs) used throughout the rest of the paper. For all traffic arrivals, we assume that, owing to the CBR nature of the source (one frame every $\kappa$ slot times, where $\kappa$ is the frame-time-to-slot-time ratio), the MFR during talkspurts is $1/\kappa$, which equals one-third in all cases. Assuming a silence suppression rate of 40%, this results in an MFR for the entire talk of $(1 - 0.4)/3 = 0.2$. Moreover, we employ the same values of the remaining model parameters for all traffic types, since they are independent of playout intelligibility.
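The geometric form of the last burst length can be checked directly from the Markov structure. Write $p_{BB}$, $p_{BP}$, and $p_{BI}$ for the one-slot probabilities of staying busy, moving from busy to pause, and moving from busy to idle, with $p_{BB} + p_{BP} + p_{BI} = 1$. These symbols are hypothetical and not the paper's notation. A busy run of length $b$ that ends in the pause state has probability $p_{BB}^{\,b-1} p_{BP}$. The last burst is exactly such a run, so conditioning on the exit to pause gives

$$P(B=b) = \frac{p_{BB}^{\,b-1}\,p_{BP}}{\sum_{k \ge 1} p_{BB}^{\,k-1}\,p_{BP}} = (1 - p_{BB})\,p_{BB}^{\,b-1}, \qquad b \ge 1,$$

$$E[B] = \frac{1}{1 - p_{BB}} = \frac{1}{p_{BP} + p_{BI}}.$$

In other words, under this model the last burst has the same geometric distribution as any busy period, and its mean equals the MBL. Likewise, with a talkspurt load of $1/3$ frame/slot and a silence suppression rate of 40%, the MFR over the entire talk is

$$\mathrm{MFR} = (1 - 0.4) \cdot \frac{1}{3} = 0.2.$$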

III. IVoS SYSTEM ARCHITECTURE

IVoS is composed of three major components (see Fig. 3): 1) the smoother buffer; 2) the NN traffic predictor; and 3) the CBR enforcer. Newly arriving frames are first placed in the smoother buffer in a first-come-first-served (FCFS) fashion. Each reception of a marked frame, which signals the start of a new talkspurt $i$, triggers the NN traffic predictor to predict three traffic characteristics of the upcoming talkspurt: the talkspurt length $\tilde T_i$, the frame count $\tilde F_i$, and the last burst length $\tilde B_i$. Based on these three predicted characteristics, the CBR enforcer then determines a dynamic delay to be imposed on the first frame and regulates CBR playout for all subsequent frames of the talkspurt. The same process repeats for each subsequent talkspurt until the end of the talk. The predictor and the enforcer are described in detail in the following subsections.

TABLE II
Inbound Traffic Arrivals Used Throughout the Paper

Fig. 3. System architecture of IVoS.

A. NN Traffic Predictor

We have found that NNs offer several strengths for learning traffic distributions. While the offline learning of traffic distributions has been shown to be highly viable, the online training [20], [21] on highly bursty traffic is more challenging. The NN traffic predictor of IVoS employs an online-trained BPNN to predict $\tilde T_i$ (talkspurt length), $\tilde F_i$ (frame count), and $\tilde B_i$ (last burst length) of the upcoming talkspurt $i$. The prediction is based on the same characteristics taken from the past three talkspurt periods, $i-1$, $i-2$, and $i-3$. More explicitly, the NN is modeled, as shown in Fig. 4, as

$$\hat{\mathbf{y}}_i = \mathrm{NN}(\mathbf{X}_i;\, \mathbf{WG}) \qquad (10)$$

In the equation, $\mathrm{NN}(\cdot)$ denotes the NN function and $\mathbf{WG}$ represents the weight matrix of the links between neurons. $\mathbf{X}_i = \{\mathbf{x}_{i-1}, \mathbf{x}_{i-2}, \mathbf{x}_{i-3}\}$ denotes the set of input vectors, where $\mathbf{x}_k = (T_k, F_k, B_k)$ holds the three characteristics measured in talkspurt $k$. $\hat{\mathbf{y}}_i = (\tilde T_i, \tilde F_i, \tilde B_i)$ denotes the output vector, representing the three traffic characteristics predicted for the next talkspurt period. At the beginning of every talkspurt, the NN predicts the three characteristics of the upcoming talkspurt as described above. It also performs the back-propagation training operation, updating $\mathbf{WG}$ based on the three traffic measurements of the talkspurt that has just ended.

Fig. 4. NN traffic predictor.

Fig. 5. Comparison of actual and predicted frame counts ($F_i$).

Fig. 6. ABD under various loads and burstiness of arrivals. (a) Load effect on ABD. (b) Burstiness effect on ABD.

TABLE III
Performance Comparisons Between Optimal Playouts and IVoS (MBL = 2, $\alpha = 0.9$)

In Fig. 5, we compare the actual and predicted frame counts in each talkspurt $i$, assuming an MMBP arrival of one of the types listed in Table II. In this experiment, we employed a three-layer NN with five hidden nodes and a learning constant of 1.25. We observed that the predictor captures most of the variation of this characteristic. The predictions of the other two characteristics, $T_i$ and $B_i$, exhibit comparable results. For greater detail, readers are referred to an earlier paper [14].
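A minimal sketch of such an online predictor, written with NumPy, is given below. Only the layer sizes and the learning constant come from the text: nine inputs (our reading of three past talkspurts times three characteristics), five hidden nodes, three outputs, and 1.25. The sigmoid units, bias weights, initial weight range, and the normalization constants in `scale` are all assumptions. `TalkspurtPredictor` is a hypothetical wrapper that predicts when a marked frame arrives and trains once the talkspurt has been measured.

```python
from collections import deque

import numpy as np


class OnlineBPNN:
    """Three-layer back-propagation NN (sigmoid units) trained one sample at a time."""

    def __init__(self, n_in=9, n_hidden=5, n_out=3, lr=1.25, seed=0):
        rng = np.random.default_rng(seed)
        # The last column of each matrix holds the bias weight.
        self.W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))
        self.W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))
        self.lr = lr

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def _forward(self, x):
        x1 = np.append(x, 1.0)            # input plus bias
        h = self._sigmoid(self.W1 @ x1)   # hidden activations
        h1 = np.append(h, 1.0)            # hidden plus bias
        y = self._sigmoid(self.W2 @ h1)   # outputs in (0, 1)
        return x1, h, h1, y

    def predict(self, x):
        return self._forward(np.asarray(x, dtype=float))[3]

    def train(self, x, target):
        """One gradient step on 0.5 * ||target - y||^2; returns the pre-update error."""
        x1, h, h1, y = self._forward(np.asarray(x, dtype=float))
        t = np.asarray(target, dtype=float)
        delta_out = (t - y) * y * (1.0 - y)
        delta_hid = (self.W2[:, :-1].T @ delta_out) * h * (1.0 - h)
        self.W2 += self.lr * np.outer(delta_out, h1)
        self.W1 += self.lr * np.outer(delta_hid, x1)
        return 0.5 * float(np.sum((t - y) ** 2))


class TalkspurtPredictor:
    """Predicts (T, F, B) of talkspurt i from the measured (T, F, B) of talkspurts i-1..i-3."""

    def __init__(self, scale=(400.0, 150.0, 50.0), **nn_kwargs):
        self.scale = np.asarray(scale, dtype=float)  # hypothetical normalization constants
        self.history = deque(maxlen=3)
        self.net = OnlineBPNN(**nn_kwargs)
        self._pending = None  # input vector used for the current prediction

    def predict(self):
        """Call when a marked frame arrives; returns None until three talkspurts are seen."""
        if len(self.history) < 3:
            return None
        self._pending = np.concatenate(
            [np.asarray(h, dtype=float) / self.scale for h in reversed(self.history)])
        return self.net.predict(self._pending) * self.scale

    def observe(self, tfb):
        """Record the measured (T, F, B) of the talkspurt that just ended and train on it."""
        if self._pending is not None:
            target = np.clip(np.asarray(tfb, dtype=float) / self.scale, 0.0, 1.0)
            self.net.train(self._pending, target)
            self._pending = None
        self.history.append(tuple(tfb))
```

In use, the receiver would call `predict()` on each marked frame and `observe((T, F, B))` once the talkspurt is over. This mirrors the predict-then-train cycle described above.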

B. CBR Enforcer

Based on the three predicted characteristics, the CBR enforcer mainly determines the $\mathrm{ABD}_i$ to be imposed on the first (marked) frame initiating talkspurt $i$. Let $T_i$, $F_i$, and $B_i$ denote the random variables of the talkspurt length, frame count, and last burst length of talkspurt $i$, respectively. The enforcer aims to achieve quasi-CBR departures for all subsequent frames belonging to the same talkspurt. In principle, the buffering delay of a talkspurt should be large enough to compensate for the total number of time slots in the talkspurt during which no frame would otherwise be available for playout.

Suppose the three traffic characteristics $(T_i, F_i, B_i)$ of talkspurt $i$ were known, and consider first the best case, i.e., the case incurring the minimum ABD. In this case, the remaining frames (frames not belonging to the last burst) have arrived back to back at the beginning of the talkspurt. First, consider the entire playout duration, defined as the interval from the beginning of the talkspurt to the end of playout of the last frame. It is the sum of the talkspurt length and the additional duration required to play back the frames of the last burst. The $B_i$ frames of the last burst arrive within $B_i$ slots but take $\kappa B_i$ slots to play back, where $\kappa$ is the frame-time-to-slot-time ratio. The entire playout duration is therefore

$$T_i + (\kappa - 1)B_i.$$

Moreover, the total elapsed time required for CBR playout of the $F_i$ frames in talkspurt $i$ is $\kappa F_i$. Therefore, the ABD can be given as the difference between the entire playout duration and the elapsed time for CBR playout, i.e.,

$$\mathrm{ABD}_i = T_i + (\kappa - 1)B_i - \kappa F_i.$$
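The best-case expression can be cross-checked by brute force, as in the following sketch. Frame $k$ is assumed to be released at the first arrival plus the delay plus $k\kappa$ slots, and the smallest delay for which no frame is released before it arrives is computed directly. The helper names are hypothetical. The floor at zero in `best_case_abd` is also an assumption, covering talkspurts whose frames arrive faster than playout.

```python
def min_first_frame_delay(arrivals, kappa):
    """Smallest delay D on the first frame such that frame k, released at
    arrivals[0] + D + k * kappa, never departs before it arrives (k = 0 gives 0)."""
    a0 = arrivals[0]
    return max(a - a0 - k * kappa for k, a in enumerate(arrivals))


def best_case_arrivals(t, f, b):
    """Best-case arrival slots: the f - b remaining frames arrive back to back
    from slot 0, and the last burst of b frames fills the final b slots."""
    # A talkspurt starts and ends with a burst, so either an idle gap exists
    # (b < f < t) or the talkspurt is one single burst (b == f == t).
    assert (b < f < t) or (b == f == t)
    return list(range(f - b)) + list(range(t - b, t))


def best_case_abd(t, f, b, kappa):
    """Closed form T + (kappa - 1) * B - kappa * F, floored at zero."""
    return max(0, t + (kappa - 1) * b - kappa * f)


# T = 40, F = 10, B = 3, kappa = 3: both give 16 slots.
print(best_case_abd(40, 10, 3, 3),
      min_first_frame_delay(best_case_arrivals(40, 10, 3), 3))
```

The two functions agree because the binding constraint is the first frame of the last burst. That frame arrives at slot $T_i - B_i$ and is scheduled at $\mathrm{ABD}_i + (F_i - B_i)\kappa$.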

Now consider the real case, in which the remaining frames have arrived at different locations between the beginning of the talkspurt and the last burst. To take the localities of the remaining frames into account, we introduce a so-called locality parameter, denoted as $\alpha$. We thus attain the theoretical buffering delay, denoted as $\mathrm{ABD}^{*}_i$, yielding CBR playout as

(11)

Fig. 7. Six cases for deriving the upper and lower bounds of DOT and PD, where t, f, and b denote constants of $\tilde t$, $\tilde f$, and $\tilde b$ in (15), respectively, and Abd denotes the buffering delay imposed on the talkspurt. (a) Best case 1: t − b − Abd ≥ (f − b)κ. (b) Best case 2: t − b − Abd < (f − b)κ. (c) Worst case 1: t − b − Abd ≥ κ. (d) Worst case 2: t − b − Abd < κ. (e) Worst case 3: t − f − Abd ≥ κ. (f) Worst case 4: t − f − Abd < κ.

TABLE IV
DOT and PD Bounds for the Determination of $\alpha$ in IVoS

It is worth noting that $\alpha = 1$ corresponds to the best case given previously. The smaller the $\alpha$, the later and more widely spread the arrivals of the remaining frames.

Furthermore, $T_i$, $F_i$, and $B_i$ are not known in advance. We therefore replace them by $\tilde T_i$, $\tilde F_i$, and $\tilde B_i$ as predicted by the NN traffic predictor, and formulate the actual buffering delay as

(12)

With such a delay imposed, the CBR enforcer then regulates the departure of frames at a rate of $1/\kappa$ frames/slot, where $\kappa$ is the frame-time-to-slot-time ratio. It does so until the next marked frame initiating the subsequent talkspurt is encountered. The process repeats until the end of the talk.
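The enforcement rule just described can be sketched as a departure schedule: hold the first (marked) frame for the imposed delay, then release one frame every $\kappa$ slots, waiting for any frame that has not yet arrived. The same function covers instant playout (delay 0) and prebuffering (a fixed delay). The helper names are hypothetical. `cbr_breaks` is only a simple proxy for playout discontinuity, not the DOT of (1). PD is measured against instant playout, which is taken here as the fastest possible departure.

```python
def playout_schedule(arrivals, kappa, first_delay):
    """Departure slots: the first frame waits first_delay slots; each later frame
    leaves kappa slots after its predecessor, or on arrival if it is late."""
    departures = []
    for k, a in enumerate(arrivals):
        if k == 0:
            d = a + first_delay
        else:
            d = max(a, departures[-1] + kappa)
        departures.append(d)
    return departures


def cbr_breaks(departures, kappa):
    """Count inter-departure gaps longer than kappa (a simple proxy, not Eq. (1))."""
    return sum(1 for x, y in zip(departures, departures[1:]) if y - x > kappa)


def playout_delay(arrivals, kappa, first_delay):
    """PD of the talkspurt: last departure minus the last departure under instant playout."""
    fastest = playout_schedule(arrivals, kappa, 0)[-1]
    return playout_schedule(arrivals, kappa, first_delay)[-1] - fastest


# A best-case talkspurt with T = 40, F = 10, B = 3 and kappa = 3.
arrivals = list(range(7)) + [37, 38, 39]
for name, delay in [("instant", 0), ("ABD", 16), ("prebuffer", 30)]:
    deps = playout_schedule(arrivals, 3, delay)
    print(name, cbr_breaks(deps, 3), playout_delay(arrivals, 3, delay))
# instant 1 0
# ABD 0 0
# prebuffer 0 14
```

In this example, instant playout stalls once before the last burst. A delay of exactly the best-case ABD (16 slots) removes the stall at no extra PD, while a larger fixed prebuffering delay also avoids the stall but adds 14 slots of PD.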

To formally examine the behavior of ABD in relation to the two traffic parameters MFR and MBL, we formulate the pmf of ABD in terms of these two parameters as

(13)

Based on (8), (9), and (13), the mean ABD can be directly computed.

Fig. 8. Performance comparisons of playout approaches.

We carried out the analytical computation using Mathematica and the event-based simulation in the C language, both on Pentium Pro 200 PCs. The analytical computation terminated when the cumulative distribution of ABD reached 99.9%, and the simulation terminated when 10 slots had been executed (steady state was reached). In Fig. 6, we depict analytical and simulation results of mean ABD under various loads and burstiness of arrivals. Owing to the enormous analytical computation time, some of the results were gathered only via simulation. The figure demonstrates that the analytical results agree closely with the simulation results. Moreover, mean ABD consistently declines as the load increases and increases with burstiness. Finally, both panels show that a smaller $\alpha$ results in a greater mean ABD, as expected.

Fig. 9. Comparisons of mean DOT among three playout approaches. (a) Instant playout. (b) 100-slot prebuffering playout. (c) 200-slot prebuffering playout. (d) IVoS.

Fig. 10. Comparisons of mean PD among three playout approaches. (a) Instant playout. (b) 100-slot prebuffering playout. (c) 200-slot prebuffering playout. (d) IVoS.

In Table III, we display simulation results comparing the playout performance of IVoS with that of two other playout methods under an MBL of 2 and an $\alpha$ of 0.9. The two methods are pure CBR playout and instant playout. The former achieves the minimum mean and variance of DOT, and the latter yields the minimum mean PD. Since pure CBR playout regulates the departure of all frames in a fully CBR fashion, it yields zero mean and variance of DOT. In comparison, IVoS offers negligible distortion under all nine traffic types, as shown in Table III. Moreover, with respect to mean PD, IVoS achieves satisfactory delay compared to the instant playout approach. As Table III shows, the smaller the MFR, the greater this shortfall. Note that pure CBR playout, unfortunately, incurs a larger PD, whereas instant playout suffers from a poor mean and variance of DOT.

Up to this point, the problem left unsolved is the determination of $\alpha$. In principle, the selection of $\alpha$ depends on whether the QoS preference lies with DOT or with PD. A smaller (larger) $\alpha$ leads to a larger (smaller) ABD. It therefore favors a lower DOT (PD) at the expense of a higher PD (DOT).

Fig. 11. Comparisons of DOT instances over talkspurts $i$. (a) Instant playout approach. (b) Prebuffering playout (60-slot fixed buffering delay). (c) IVoS.

First, in the case where DOT is preferred, the problem can be formalized as the determination of the $\alpha$'s satisfying a given QoS requirement on the ninety-ninth percentile of DOT. That is,

Find a range of $\alpha$'s satisfying $\Pr\{\mathrm{DOT} \le \delta\} \ge 0.99$ (14)

where $\delta$ is a given tolerable value. To solve (14), we first derive two bounds of DOT, the upper bound and the lower bound, denoted as $\mathrm{DOT}^{U}$ and $\mathrm{DOT}^{L}$, respectively. They can be derived under the various conditions depicted in Fig. 7 and are summarized in Table IV. With $\mathrm{DOT}^{U}$ and $\mathrm{DOT}^{L}$ introduced, the cumulative distribution function of DOT can be formulated as

(15)

where

$$G(x) = \begin{cases} 0, & x < \mathrm{DOT}^{L} \\ \dfrac{x - \mathrm{DOT}^{L}}{\mathrm{DOT}^{U} - \mathrm{DOT}^{L}}, & \mathrm{DOT}^{L} \le x < \mathrm{DOT}^{U} \\ 1, & x \ge \mathrm{DOT}^{U} \end{cases} \qquad (16)$$

assuming that DOT is uniformly distributed between $\mathrm{DOT}^{L}$ and $\mathrm{DOT}^{U}$.

On the basis of (14)–(16) and Table IV, a viable range of $\alpha$'s satisfying a given requirement can be attained offline. The same procedure can be applied to the case where PD is preferred.

Fig. 12. Comparisons of PD instances over talkspurts $i$. (a) Instant playout approach. (b) Prebuffering playout (60-slot fixed buffering delay). (c) IVoS.
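The offline search for a viable range of $\alpha$ can be sketched as follows. The sketch evaluates the distribution function of DOT as a probability-weighted mixture of uniform distributions between the lower and upper bounds, following the uniform assumption above. It then keeps the candidate values of $\alpha$ whose distribution function at the tolerable value $\delta$ reaches 0.99. Table IV is not reproduced here, so `toy_cases` is a purely hypothetical stand-in in which the bounds shrink as $\alpha$ decreases. The mixture form of (15) is likewise our assumption.

```python
def dot_cdf(x, cases):
    """P(DOT <= x) for a mixture of uniforms; cases is an iterable of
    (probability, lower_bound, upper_bound), one entry per traffic outcome."""
    total = 0.0
    for prob, lo, hi in cases:
        if x >= hi:
            g = 1.0                   # x lies above the whole interval
        elif x < lo:
            g = 0.0                   # x lies below the whole interval
        else:
            g = (x - lo) / (hi - lo)  # uniform CDF inside [lo, hi)
        total += prob * g
    return total


def feasible_alphas(alphas, cases_for_alpha, delta, level=0.99):
    """Keep the alphas whose DOT CDF at the tolerable value delta reaches level."""
    return [a for a in alphas if dot_cdf(delta, cases_for_alpha(a)) >= level]


def toy_cases(alpha):
    """Hypothetical stand-in for Table IV: two equally likely outcomes whose DOT
    bounds shrink as alpha decreases (a smaller alpha means a larger ABD)."""
    return [(0.5, 0.0, 2.0 * alpha), (0.5, 0.0, 4.0 * alpha)]


print(feasible_alphas([0.1, 0.2, 0.25, 0.5, 1.0], toy_cases, delta=1.0))
# [0.1, 0.2, 0.25]
```

The search for a PD requirement has the same structure, with PD bounds substituted for the DOT bounds.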

IV. EXPERIMENTAL RESULTS

To demonstrate the viability of IVoS, we used simulations to compare IVoS with two other playout approaches, instant playout and prebuffering playout. The comparison uses three performance metrics: the mean and variance of DOT, and the mean PD. In the instant playout approach, frames were queued and played back at a rate of at most $1/\kappa$ frames/slot. Note that the instant playout approach differs from IVoS in the absence of a buffering delay imposed on the first frame of each talkspurt. In the prebuffering approach, a predetermined fixed delay is imposed on the first frame of each talkspurt. To show the performance contrast, we simulated several variants of prebuffering playout with different fixed delays. All assumptions and parameters used for the simulation are summarized as follows:

• voice data rate during talkspurts: 64 kb/s;

• talk duration: 6.25 min (corresponding to 30 000 slots long);

• slot size: 100 bytes (corresponding to 12.5 ms/slot of voice);

• $\kappa = 3$ (corresponding to a mean load of 1/3 frame/slot during talkspurts);

• inbound traffic: nine traffic types, each with nine different burstiness levels, listed in Table II;

• buffer size used for all approaches: 1200 slots (large enough to ensure loss-free operation);

• silence suppression rate: ranging from 25% to 85%, as shown in Table II.
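The parameter list above is internally consistent, which can be checked with exact rational arithmetic. A 100-byte slot at 64 kb/s carries 12.5 ms of voice, and 30 000 slots therefore span 6.25 min. The overall MFR for each silence suppression rate is also computed, under the assumption (ours, not the text's) that the suppression rate equals the fraction of time spent in pauses.

```python
from fractions import Fraction

VOICE_RATE_BPS = 64_000  # voice data rate during talkspurts
SLOT_BYTES = 100         # slot size
TALK_SLOTS = 30_000      # talk duration in slots
KAPPA = 3                # frame time / slot time

slot_seconds = Fraction(SLOT_BYTES * 8, VOICE_RATE_BPS)  # 1/80 s = 12.5 ms
talk_minutes = TALK_SLOTS * slot_seconds / 60             # 25/4 = 6.25 min
talkspurt_load = Fraction(1, KAPPA)                        # frames/slot during talkspurts


def overall_mfr(suppression_rate):
    """MFR over the whole talk, assuming the suppression rate is the
    fraction of time spent in pauses (an interpretation, not from the text)."""
    return talkspurt_load * (1 - suppression_rate)


for s in (Fraction(1, 4), Fraction(2, 5), Fraction(17, 20)):
    print(f"suppression {float(s):.0%}: MFR = {float(overall_mfr(s)):.3f} frame/slot")
```

At the extremes of the simulated range, this gives an overall MFR between 0.05 and 0.25 frame/slot.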

In Fig. 8, we compare the mean and variance of DOT and the mean PD among the playout approaches under various loads and burstiness levels. For the prebuffering playout approach, we show three versions using fixed buffering delays of 20, 40, and 60 slots, respectively. Compared to the other approaches, IVoS achieves superior playout, yielding minimal mean and variance of DOT irrespective of traffic variation. In contrast, the other approaches suffer a deteriorating mean and variance of DOT as the load declines or as burstiness increases. Moreover, they all exhibit a high variance of DOT, i.e., unstable speech intelligibility, under medium to highly bursty traffic. As for mean PD, all approaches yield longer delays under highly bursty traffic, but IVoS retains a short delay comparable to that of the optimal approach, i.e., instant playout.

Figs. 9 and 10 plot mean DOT and mean PD, respectively, under the complete set of traffic types. Notably, both versions of prebuffering playout (100 and 200 slots) achieve only a tolerable mean DOT, at the expense of a drastic increase in mean PD. In Figs. 11 and 12, we further plot instances of DOT and PD over talkspurts under two traffic types. The figures indicate that IVoS outperforms the two other approaches, with an invariably low DOT and an acceptable PD over time. In addition, the figures clearly reveal a tradeoff between DOT and PD for both the instant playout and prebuffering playout approaches. In contrast, IVoS is free from this tradeoff and achieves near-optimal performance. In Fig. 11 in particular, under highly bursty traffic, the two playout approaches other than IVoS undergo a high variance of DOT, resulting in unstable speech intelligibility throughout the entire talk.

V. CONCLUSIONS

In this paper, we have proposed IVoS, an NN-based intravoice synchronization mechanism composed of three components: 1) the smoother buffer; 2) the NN traffic predictor; and 3) the CBR enforcer. The NN traffic predictor employs an online-trained BPNN to predict the talkspurt length, frame count, and last burst length of every newly encountered talkspurt, based on the same set of characteristics of the past three talkspurt periods. According to the predicted characteristics, the CBR enforcer imposes an ABD, computed by means of a near-optimal, simple closed-form formula, on the first frame of each talkspurt. It then regulates CBR-based departures for the remaining frames within the talkspurt, with the goal of assuring minimal mean and variance of DOT and minimal mean PD. Simulation results revealed that IVoS achieves superior playout, yielding minimal mean and variance of DOT irrespective of traffic variation. In contrast, the two other playout approaches (instant playout and prebuffering playout) suffer a deteriorating mean and variance of DOT as the load declines. Moreover, they all exhibit a high variance of DOT, i.e., unstable speech intelligibility, under medium and highly bursty traffic. As for mean PD, the existing approaches yield longer delays under highly bursty traffic, whereas IVoS retains a short delay comparable to that of the optimal approach, i.e., instant playout.

REFERENCES

[1] R. Steinmetz, “Human perception of jitter and media synchronization,” IEEE J. Select. Areas Commun., vol. 14, pp. 61–72, Jan. 1996.

[2] H. Eriksson, “MBone: The multicast backbone,” Commun. ACM, vol. 37, pp. 54–66, Aug. 1994.

[3] D. E. Comer, Internetworking with TCP/IP. Englewood Cliffs, NJ: Prentice-Hall, 1994, vol. II.

[4] J. Bolot, “End-to-end packet delay and loss behavior in the Internet,” in Proc. ACM SIGCOMM, Sept. 1993, pp. 289–298.

[5] H. Saito, Teletraffic Technologies in ATM Networks. Norwood, MA: Artech House, 1994.

[6] S. Nanda, D. Goodman, and U. Timor, “Performance of PRMA: A packet voice protocol for cellular systems,” IEEE Trans. Veh. Technol., vol. 40, pp. 584–598, Aug. 1991.

[7] R. Onvural, Asynchronous Transfer Mode Networks: Performance Issues, 2nd ed. Norwood, MA: Artech House, 1995.

[8] J. Gruber and L. Strawczynski, “Subjective effects of variable delay and speech clipping in dynamically managed voice systems,” IEEE Trans. Commun., vol. COM-33, pp. 801–808, Aug. 1985.

[9] C. Nicolaou, “An architecture for real-time multimedia communication systems,” IEEE J. Select. Areas Commun., vol. 8, pp. 391–400, Apr. 1990.

[10] W. Montgomery, “Techniques for packet voice synchronization,” IEEE J. Select. Areas Commun., vol. SAC-1, pp. 1022–1028, Dec. 1983.

[11] R. Ramjee, J. Kurose, and D. Towsley, “Adaptive playout mechanisms for packetized audio applications in wide-area networks,” in Proc. IEEE INFOCOM, 1994, pp. 680–688.

[12] S. Ramanathan and P. Rangan, “Feedback techniques for intra-media continuity and inter-media synchronization in distributed multimedia systems,” Comput. J., vol. 36, no. 1, pp. 19–31, 1993.

[13] T. Little and A. Ghafoor, “Multimedia synchronization protocols for broadband integrated services,” IEEE J. Select. Areas Commun., vol. 9, pp. 1368–1382, Dec. 1991.

[14] M. C. Yuang, P. L. Tien, and S. T. Liang, “Intelligent video smoother for multimedia communications,” IEEE J. Select. Areas Commun., vol. 15, pp. 136–146, Feb. 1997.

[15] M. C. Yuang, S. Liang, Y. Chen, and C. Shen, “Dynamic video playout smoothing method for multimedia applications,” in Proc. IEEE ICC, 1996, pp. 1365–1369.

[16] Y. Xie, C. Liu, M. Lee, and T. Saadawi, “Adaptive multimedia syn-chronization in a teleconference system,” in Proc. IEEE ICC, 1996, pp. 1355–1359.

[17] Y. Ishibashi, S. Tasaka, and A. Tsuji, “Measured performance of a live media synchronization mechanism in an ATM network,” in Proc. IEEE ICC, 1996, pp. 1348–1354.

[18] H. Heffes and D. M. Lucantoni, “A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance,” IEEE J. Select. Areas Commun., vol. 4, pp. 856–868, June 1986.

[19] L. Kleinrock, Queueing Systems, Volume 1: Theory. New York: Wiley, 1975.

[20] A. Tarraf and I. Habib, “A novel neural network traffic enforcement mechanism for ATM networks,” IEEE J. Select. Areas Commun., vol. 12, pp. 1088–1096, Aug. 1994.

[21] A. Tarraf, I. Habib, and T. Saadawi, “Intelligent traffic control for ATM broadband networks,” IEEE Commun. Mag., vol. 33, pp. 76–82, Oct. 1995.

Po L. Tien was born in Taiwan, R.O.C., in 1969.

He received the B.S. degree in applied mathematics and the M.S. degree in computer and information science from the National Chiao Tung University, Taiwan, R.O.C., in 1992 and 1995, respectively. He is currently a Ph.D. candidate in the Department of Computer Science and Information Engineering at the National Chiao Tung University.

His current research interests include high-speed networking, multimedia communications, performance analysis, and applications of artificial neural networks.

Maria C. Yuang received the B.S. degree in applied mathematics from the National Chiao Tung University, Taiwan, R.O.C., in 1978, the M.S. degree in computer science from the University of Maryland, College Park, in 1981, and the Ph.D. degree in electrical engineering and computer science from the Polytechnic University, Brooklyn, NY, in 1989.

From 1981 to 1990 she was with AT&T Bell Laboratories and Bell Communications Research (Bellcore) where she was a Member of Technical Staff working on high-speed networking and protocol engineering. She was also an Adjunct Professor at the Department of Electrical Engineering at the Polytechnic University from 1989 to 1990. In 1990, she joined National Chiao Tung University, where she is currently a Professor of the Department of Computer Science and Information Engineering. Her current research interests include high-speed networking, multimedia communications, and performance analysis.