IEEE Transactions on Consumer Electronics, Vol. 41, No. 1, FEBRUARY 1995
A NOVEL MULTIMEDIA SYNCHRONIZATION MODEL AND ITS APPLICATIONS IN MULTIMEDIA SYSTEMS
Herng-Yow Chen, Nien-Bao Liu, Chee-Wen Shiah, Ja-Ling Wu, Wen-Chin Chen and Ming Ouhyoung
Communication & Multimedia Laboratory, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Abstract
The synchronization problem, which arises whenever sound, video, motion pictures and other media are brought together and integrated into a computer system, is one of the most important issues in multimedia communications and applications. In a time-sharing, multi-process environment such as the UNIX operating system, the traditional synchronization mechanism suffers from two fatal defects, namely audio discontinuity and loss of synchronization between audio and video. In this paper, a novel media synchronization model for a multi-process environment is proposed. Based on this model, a continuous media playback module was implemented and served as the key component of two multimedia systems developed in the Communication and Multimedia Laboratory of National Taiwan University. One is a multimedia authoring system which provides interactive VCR-like operations and audio/video editing facilities. The other is a prototype of a VOD (Video On Demand) system which provides a video browsing facility. Both systems show that the performance of the proposed synchronization model is quite satisfactory.
1. Introduction
A multimedia system combines various information sources, such as text, voice, audio, video, graphics and images, and suggests a wide variety of potential applications such as remote learning [1], multimedia mailing systems [2], collaborative work systems [3], and multimedia communication systems (video phone, conference systems [4] and information-on-demand systems [5]), to name just a few. Nevertheless, the complexity of these multimedia systems introduces a number of new technical problems in the field of computer science. It is worth noting that these problems basically result from the different features of the different media. Solving them is one of the major topics in multimedia research.
The synchronization problem arises mainly because computer systems use hard disks as their storage devices and store all types of data in one-dimensional form. One thus has to expand (digitize) the media source data, which is stored in two-dimensional form in analog storage, into the one-dimensional form used on hard disks so that it can be processed in digital systems. The problem of synchronization among media consequently arises when sound, motion pictures and other media are all stored and processed together in this form in the computer system.
The purpose of this paper is to develop a robust media synchronization model in the application layer and to provide some media synchronization playback modules. These modules serve as the key components of media playback applications and have been used to implement a variety of multimedia applications in the UNIX environment, including a multimedia authoring tool, a VOD (Video On Demand) system, and so forth. In the following sections, the basic synchronization principle and the traditional synchronization mechanism are discussed first. Then a multi-process (multi-thread) synchronization model is proposed. Next, the system implementation and some related application systems are presented. Some experiments that show the superiority of the proposed model are then described. Finally, a brief conclusion is given.
Revised manuscript received December 13, 1994. 0098 3063/95 $04.00 © 1995 IEEE
2. The Synchronization Principle
The problem of synchronization arises mainly when several related media are to be played back under their corresponding temporal constraints. In analog systems, such as VHS, the rotational speed of the tape is used as the reference for timing information. In a digital system, on the other hand, the timing information can be obtained directly from the size of the audio segment in digital form (cf. Figure 1). To solve the synchronization problem, many different techniques have been proposed on different platforms (PC with DOS or MS-Windows, workstation with UNIX and X-Windows) [6][7][8][9]. All of these synchronization schemes are based on the idea of aligning the physical location of the audio/video data on the storage. Model I describes a simple synchronization scheme proposed in [7].
Model I : [Synchronization by location alignment]
loop {
    /* estimate the audio waiting time in advance */
    estimate audio-waiting-time;        /* cf. Eqn. (1) */
    /* show the related audio and video frame */
    play-audio-segment();
    play-video-frame();
    /* wait until the audio data is consumed completely by the audio device */
    sleep(audio-waiting-time);
} until end-of-playback
Eqn. (1):
\[
\text{audio-waiting-time} = \frac{\text{audio-segment-size}}{\text{audio-sampling-rate}} + \text{overhead},
\qquad
\text{overhead} = \text{data-access-time} + \text{system-overhead}. \tag{1}
\]
Ideally, the audio-waiting-time equals the size of the audio segment divided by the sampling rate of the audio device. In a practical system implementation, however, some overheads, including the media access time and the instruction execution time, should also be taken into account. In single-process environments such as DOS, the interrupt service routine of the OS kernel can also be a system overhead. In multi-process environments such as UNIX, the overhead of process context switching is important as well, but it is difficult to predict. As pointed out in [7] (cf. Eqn. (1)), the estimation of the data access time and the system overhead is critical in Model I. If the next playing time interval (i.e. audio-waiting-time) cannot be estimated precisely enough, two undesirable phenomena may occur:
(i) Audio discontinuity (cf. Figure 2): If the estimated time interval (i.e. audio-waiting-time) from Eqn. (1) is longer than the real one, the next audio data segment is written to the audio device too late. The previous audio data segment in the audio buffer is therefore exhausted before the next audio segment arrives. In the meantime, there is no audio data to be played back between these two audio segments, which results in discontinuous audio output.
(ii) Out of synchronization (cf. Figure 3): If the estimated time interval is shorter than the real one, audio data is written to the audio device too early. In this case, the residual part of the previous audio segment is still in the audio device buffer, but the corresponding video frames are played out immediately by video devices such as JPEG or MPEG compression hardware. Thus the audio device buffer can become almost full and the phenomenon of "out of synchronization" occurs.
These two shortcomings make the work in [6][7][8] incomplete.
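As a worked illustration of Eqn. (1) and the two failure cases above, the following minimal Python sketch computes the waiting time and classifies a misestimate. The function names, the choice of counting the segment size in samples, and the `tolerance` parameter are assumptions made for this illustration only; they do not come from the original implementation.

```python
def audio_waiting_time(segment_size, sampling_rate,
                       data_access_time=0.0, system_overhead=0.0):
    """Eqn. (1): seconds to wait before writing the next audio segment.

    segment_size is counted in samples and sampling_rate in samples per
    second (an assumption; with 8-bit samples this equals the byte count).
    """
    overhead = data_access_time + system_overhead
    return segment_size / sampling_rate + overhead


def failure_mode(estimated, real, tolerance=0.001):
    """Classify a misestimate according to cases (i) and (ii).

    tolerance (seconds) is a hypothetical slack, not a value from the paper.
    """
    if estimated > real + tolerance:
        return "audio discontinuity"      # next segment is written too late
    if estimated < real - tolerance:
        return "out of synchronization"   # next segment is written too early
    return "ok"


if __name__ == "__main__":
    ideal = audio_waiting_time(800, 8000)    # 800 samples at 8 kHz
    print(ideal)
    print(failure_mode(ideal + 0.02, ideal))
    print(failure_mode(ideal - 0.02, ideal))
```

The sketch makes the weakness of Model I visible: the result is only as good as the two overhead terms, and in a multi-process system those terms vary from one iteration to the next.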
3. The Synchronization Model
In order to overcome the problems of Model I, a novel multi-process (multi-thread) media synchronization model (Model II) is presented as follows. Some operating systems, such as UNIX, do not support multiple threads in a process. Others, such as MACH, do. In our model, a thread can be treated as a process if the multi-thread facility is not supported by the operating system. We use "process" instead of "thread" for convenience of explanation.
Model II : [multi-process (threads) model]
(1) Each process is responsible for playing back one medium.
(2) The parent process plays the role of monitoring its child processes and playing back the highest priority medium.
(3) The child processes play back the lower priority media.
(4) The responsibilities of the parent process are:
    • pre-calculate the vital synchronization information.
    • fork (generate) the child processes before the playback starts.
    • kill (terminate) the child processes after the playback ends.
(5) Synchronization mechanism among different media processes: Two approaches can be adopted in the proposed model.
    • relative synchronization: [complex method] According to the pre-calculated synchronization information (some synchronization points), media processes can synchronize with each other through some well-known inter-process communication (IPC) techniques, such as shared memory, sockets, and pipes.
    • absolute synchronization: [simple method] According to the pre-calculated information (some time table), each medium process synchronizes with the global system clock.
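The parent responsibilities in item (4) can be sketched with POSIX process calls. The following Python fragment is only an illustration under stated assumptions: `play_medium`, `start_children` and `stop_children` are hypothetical names, the child body is a placeholder loop rather than a real decoder, and SIGKILL is chosen here simply so that termination is unconditional. It requires a UNIX-like system because it uses `os.fork`.

```python
import os
import signal
import time


def play_medium(name):
    """Placeholder for a child process that plays one lower priority medium."""
    while True:
        time.sleep(0.01)      # stands in for decoding and showing one unit


def start_children(media):
    """Fork one child per medium; return {pid: medium name}."""
    children = {}
    for name in media:
        pid = os.fork()
        if pid == 0:          # child: play until killed by the parent
            try:
                play_medium(name)
            finally:
                os._exit(0)
        children[pid] = name  # parent: remember the child
    return children


def stop_children(children):
    """Kill and reap every child; return {medium name: terminating signal}."""
    result = {}
    for pid, name in children.items():
        os.kill(pid, signal.SIGKILL)
        _, status = os.waitpid(pid, 0)
        result[name] = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return result


if __name__ == "__main__":
    kids = start_children(["video", "text"])
    time.sleep(0.1)           # the parent would play the audio here
    print(stop_children(kids))
```

Reaping each child with `waitpid` after the kill avoids leaving zombie processes when playback is stopped and restarted repeatedly.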
To make the proposed model clear, we give a simple example in the following, which shows how the synchronization between audio and video is done in our model.
(1) A time-stamp table, which is pre-calculated from the audio segment sizes, is used as the set of check-points for synchronization. Each element of the table records the starting point (start-time) and the ending point (dead-time) of an audio segment. All of the video frames and their associated audio segments should be played back during their corresponding time intervals (cf. Figure 4).
(2) Generate a child process for each medium by using the system call "fork" [17] from the parent process. As shown in Figure 5, an audio parent process can generate a child video process and a child text process. Each child process inherits the synchronization information from its parent.
(3) Perform the synchronization mechanism described in step (5) of Model II within those child processes. The following pseudo algorithm is used for video playing:
Procedure Play-Video
begin
    loop
        /* can get current-time from system absolute clock */
        /* check dead time */
        if (current-time > dead-time of frame i) {
            jump to next appropriate frame j; i = j;
        }
        /* check start time */
        if (current-time < start-time of frame i) {
            wait until (current-time = start-time of frame i);
        }
        display video frame i;
    end loop
end Procedure
(4) The audio child process should be busy sending audio data to the audio device. The following is its pseudo code:
Procedure Play-Audio
begin
    for i = current-frame to end-frame
        play audio data of i-th frame
end Procedure
(5) If the user presses the "stop" or "pause" button, all the active child processes are killed by their parent process.
(6) If the user presses the "play" or "continue" button later, steps (1)-(5) are repeated.
In this audio/video synchronization example, one tricky technique, rather than a time stamp or inter-process communication technique, is used in our implementation of the audio process. Synchronization can be achieved for the following reasons. The output rate of the audio device is constant and equals the input rate of the audio device. Thus, the audio buffer is almost full while the audio process is busy writing audio data. The playback speed of the audio data can therefore keep pace with the time axis owing to the constant output rate of the audio device. Consequently, only the video process needs the time stamp table to check whether the corresponding video frame should be played back or not. Its checking rules for synchronization (cf. Figure 4) are as follows: the corresponding video frame should
1. be dropped if current-time is later than the dead-time;
2. wait if current-time is earlier than the start-time;
3. be played back if current-time is in the time interval between start-time and dead-time.
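The time-stamp table of step (1) and the three checking rules can be sketched in a few lines of Python. This is a minimal illustration, assuming one video frame per audio segment and segment sizes counted in samples; the function names are hypothetical and the sketch leaves out the actual display and clock calls.

```python
def build_time_table(segment_sizes, sampling_rate):
    """Return [(start_time, dead_time), ...] in seconds, one per audio segment."""
    table, start = [], 0.0
    for size in segment_sizes:
        dead = start + size / sampling_rate   # the segment ends when its data is consumed
        table.append((start, dead))
        start = dead
    return table


def check_frame(current_time, start_time, dead_time):
    """Apply the three checking rules to one video frame."""
    if current_time > dead_time:
        return "drop"     # late frame
    if current_time < start_time:
        return "wait"     # early frame
    return "play"         # normal frame


def next_frame(table, i, current_time):
    """Index of the first frame at or after i whose dead-time has not passed."""
    while i < len(table) and current_time > table[i][1]:
        i += 1
    return i


if __name__ == "__main__":
    table = build_time_table([4000, 4000, 4000], 8000)   # 0.5 s segments at 8 kHz
    print(table)
    print(check_frame(0.7, *table[1]))
    print(next_frame(table, 0, 1.2))
```

Because every decision is made against the absolute clock, a late frame costs only that frame; the error does not accumulate as it does when each iteration sleeps for an estimated interval.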
The advantages of the proposed model include:
• Easy to program and debug: The use of the multi-process (thread) model simplifies the program. Programming each media function separately is more intuitive than in Model I.
• "Audio discontinuity" is eliminated: Since the audio process is constantly busy writing audio data, the audio device buffer is always almost full.
• The phenomenon of "out of synchronization" never happens: The estimation of the waiting time required in Model I is no longer needed.
• Different media priorities can be supported: Media priority can be supported by assigning different priorities to different processes (threads) in the application. For example, in a lip synchronization application, the priority of audio is clearly higher than that of video because human perception is more sensitive to audio than to video. In some other applications such as slide presentation (foreground slides and background music), the priority of the image or graphic media may be higher than that of the audio data.
• The system is more robust and more flexible: Multimedia applications based on the proposed model become more flexible and adaptive than those based on the traditional synchronization model. For example, in our proposed model, an application can kill or suspend some less important media processes (threads) when the system load is heavy. These processes can be restarted or resumed when the system load becomes light.
4. Implementation
We have implemented a media player module based on the proposed synchronization model. A JPEG-based hardware board is used to compress and decompress the video data in real time. The built-in audio device provides the record/playback audio function at an 8 kHz sampling rate. To provide high disk access speed and large storage space, a disk array is used as the local disk.
The system is currently developed in the UNIX environment using the X Toolkit of the X-Windows system [14][15][16] as the graphical user interface. To provide a VCR-like interactive operating facility, the UNIX alarm signal (in X Toolkit Intrinsics: XtAddTimeOut) rather than the UNIX sleep [17] system function is adopted. Figure 6 shows the hardware and software architecture of the system.
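The reason a timeout callback suits VCR-like interaction better than a blocking sleep can be illustrated with a toy event loop. The Python sketch below is an analogy only: `TimerLoop` and its `add_timeout` method are hypothetical stand-ins with a simulated clock, not the X Toolkit API, and the frame period and stop time are arbitrary values chosen for the example.

```python
import heapq


class TimerLoop:
    """Tiny event loop with a simulated clock (hypothetical toolkit stand-in)."""

    def __init__(self):
        self.now = 0.0
        self._timers = []     # heap of (fire_time, sequence, callback)
        self._seq = 0

    def add_timeout(self, delay, callback):
        """Register callback to run once, delay seconds from now."""
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        """Dispatch callbacks in time order until none are left."""
        while self._timers:
            fire_time, _, callback = heapq.heappop(self._timers)
            self.now = fire_time
            callback()


def demo():
    loop, log, state = TimerLoop(), [], {"stopped": False}

    def show_frame(i):
        if state["stopped"]:
            return            # playback was stopped: do not re-arm the timer
        log.append(("frame", i, loop.now))
        loop.add_timeout(0.5, lambda: show_frame(i + 1))

    def press_stop():         # simulated user event between two frames
        state["stopped"] = True
        log.append(("stop", loop.now))

    loop.add_timeout(0.0, lambda: show_frame(0))
    loop.add_timeout(1.25, press_stop)
    loop.run()
    return log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

Each frame callback returns to the loop immediately after re-arming its timer, so the "stop" event is handled between frames; a player blocked in a sleep call would not see the button press until the sleep returned.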
5. Applications
On the basis of the proposed synchronization model, several multimedia systems have been developed in the Communication and Multimedia Laboratory of National Taiwan University. One is a powerful multimedia authoring system, which provides digital VCR-like video operations and KTV facilities in which audio, video and text media can be synchronized. This system provides a friendly and functionally complete environment allowing users to do their audio/video editing work. Table 1 summarizes the authoring functionalities provided in this system. Figure 7 shows a photo of the authoring system.
A prototype of a VOD (Video On Demand) system has also been developed on an Ethernet network, which is capable of handling simple content-based media queries. Moreover, using a DCT-based video scene detection technique [18] and a video parsing technique [19], a multi-layer video-shot browser was developed for the VOD clients. When a client user requests a movie from the VOD server, the user can rapidly browse a number of pre-processed video shots provided by the VOD server to decide whether or not to see the movie. Figure 8 gives a snapshot of the prototype of the VOD system.
6. Experiment
To show the superiority of the proposed model, experiments were carried out to test the performance of the model on UNIX workstations with X windows in two critical situations which yield bad performance in the traditional synchronization model. One is the I/O burst situation, such as when the system executes a program with high disk I/O demand. The other is the CPU burst situation, such as when the system executes many CPU-bound processes concurrently. The experimental results show that the performance of the proposed model is quite good in both cases. Neither audio discontinuity nor out-of-synchronization phenomena are observed, even though some video frames are skipped under heavy system load.
7. Conclusion
We proposed a synchronization model that is suitable for both multi-user, multi-process UNIX-like environments and multi-thread MACH-like environments. Compared to the traditional approach, the new model enjoys a number of advantages which have been discussed above. Above all, the proposed model is insensitive to the I/O and CPU burst situations in which the traditional synchronization method does not perform well.
As discussed previously, the conventional UNIX system does not support real-time applications because its kernel is non-preemptive and its process scheduling criterion adopts the multi-level feedback with round robin policy. As pointed out in [10][11], the conventional UNIX environment for workstation computing, although useful for many applications, may not be suitable for high-performance multimedia computing. The main contributions of this paper include: (1) a novel synchronization model and some related implementation experiences for multimedia computing are presented; (2) a general model is proposed for non-real-time operating systems (such as UNIX) to achieve the media synchronization requirement; and (3) based on the proposed synchronization model, a media playback module is developed and has been used as the key component of several multimedia systems.
8. Acknowledgement
This research was supported in part by the National Science Council of the Republic of China under contract no. NSC 83-0-125-E-002-140.
References:
[1] Roger C. Schank, "Active Learning through Multimedia", IEEE Multimedia Magazine, Spring 1994, pages 69-78.
[2] Ming Ouhyoung, Wen-Chin Chen, et al., "The MOS Multimedia E-mail System", Proceeding of IEEE Multimedia, 1994, pages 315-324.
[3] Earl Craighill, Ruth Lang, Martin Fong, and Keith Skinner, "CECED: A System For Information Multimedia Collaboration", Proceeding of ACM Multimedia, 1993, pages 437-446.
[4] William J. Clark, "Multipoint Multimedia Conferencing", IEEE Communications Magazine, May 1992, pages 44-50.
[5] P. Venkat Rangan, Harrick M. Vin, and Srinivas Ramanathan, "Designing An On-Demand Multimedia Service", IEEE Communications Magazine, July 1992, pages 56-64.
[6] Shyi-Bang Wey, Chee-Wen Shiah, and Wen-Chin Chen, "Synchronization of Audio and Video Signals in Multimedia Computing Systems", Proceeding of 1992 International Computer Symposium, November 1992, pages 665-669.
[7] Chun-Chuan Yang, Jau-Hsiung Huang, and Ming Ouhyoung, "Synchronization of Digitized Audio and Video in Multimedia System", HD-Media Technology and Applications Workshop, November 1992, pages 2-6.
[8] Yuong-Wei Lei, Ming Ouhyoung, "A New Architecture For A TV Graphics Animation Module", IEEE Transactions on Consumer Electronics, Vol. 39, No. 4, November 1993, pages 797-800.
[9] Lawrence A. Rowe and Brian C. Smith, "A Continuous Media Player", Proc. 3rd Int. Workshop on Network and OS Support for Digital Audio and Video, San Diego CA, November 1992, pages 328-335.
[10] Dick C. A. Bulterman, Guido van Rossum, and Dik Winter, "Multimedia Synchronization and UNIX", EurOpen Conference, Autumn 1991.
[11] Dick C. A. Bulterman, "Synchronization of Multi-Sourced Multimedia Data for Heterogeneous Target Systems", Proc. 3rd Int. Workshop on Network and OS Support for Digital Audio and Video, San Diego CA, November 1992.
[12] XVIDEO: User's Guide, Parallax Graphics Inc., 1991.
[13] XVIDEO: Software Developer's Guide, Parallax Graphics Inc., 1991.
[14] An OPEN LOOK at UNIX: a developer's guide to X, by John David Miller, M&T Inc., 1990.
[15] X Toolkit Intrinsics Programming Manual, by Adrian Nye and Tim O'Reilly, O'Reilly & Associates Inc., 1990.
[16] X Toolkit Intrinsics Reference Manual, O'Reilly & Associates Inc., 1990.
[17] SUN MicroSystem, "Programmer's Language Guides".
[18] Farshid Arman, Arding Hsu, and Ming-Yee Chiu, "Image Processing On Compressed Data For Large Video Databases", ACM Multimedia.
[19] AI Lab, University of Michigan, "Knowledge Guided Parsing in Video Databases", ACM Multimedia Tutorial, 1993.
Figure 1. The basic principle of audio/video synchronization. [Diagram: audio segments and video frames are read from the hard disk along the playback path of audio and the playback path of video, against a common time axis divided into time intervals, where time interval = audio segment size / audio sampling rate.]
Figure 2. The phenomenon of audio discontinuity due to the fact that the estimated time is longer than the real one. [Diagram: time axis with audio segments and video frames (V), showing periods of playing audio separated by gaps of audio discontinuity.]
Figure 3. The phenomenon of "out of synchronization" due to the fact that the estimated time is shorter than the real one. [Diagram: time axis with video frames (V) running ahead of the audio data that is still held in the audio buffer.]
Figure 4. Using the pre-calculated table of time stamp and system clock for audio/video synchronization. [Diagram: in the ideal case, video frames V1 to V6 on the video axis line up with the audio segment boundaries t1 to t5 on the audio axis and the time axis. In the real case (multi-process environment), each frame is checked against the start time (s) and dead time (d) of its audio segment: a normal frame should be played back, a late frame should be dropped, and an early frame should be delayed.]
Figure 5. [Caption not recovered. Diagram: the highest priority process (play audio) is connected to a lower priority process (play video) and the lowest priority process (show text), and terminates them with Kill().]
Table 1. Authoring functions provided in the multimedia authoring system.
FUNCTIONS                               EXAMPLES
Video creation                          video record, format convert, etc.
Multi-sources combination               picture in picture, video-graphic composite, etc.
Video concatenation                     fade in, fade out, etc.
Visual effect                           dissolve, door open, rotation, etc.
Video frame edition                     image processing, draw, paint, etc.
Video sequence edition                  reverse, cut, paste, copy, search, etc.
Video playing speed adjustment          play, forward, slow, backward, etc.
Video display factor adjustment         hue, saturation, brightness, etc.
Video file management                   copy, rename, load, etc.
Video scene browser                     decompose video into shots of scenes
Text media synchronization with A/V     KTV facility
Figure 6. The hardware and software architecture of the proposed media player module. [Diagram: hardware inputs include a microphone and an Ethernet connection. At the program level, the Multimedia PlayBack System consists of the synchronization module and the GUI. At the library level, it rests on the X Video Vendor Extension and the X-window Library, on top of the UNIX Operating System.]
Figure 8. A snapshot of the proposed video on demand system.
Biographies
Herng-Yow Chen received the B.S. degree in Computer Science and Information Engineering from Tamkang University, Tamshoei, Taiwan, ROC. He has been a Ph.D. student in the Department of Computer Science and Information Engineering at National Taiwan University, Taipei, Taiwan, ROC, since 1992. His research interests include Digital Signal Processing, Image Coding, Data Compression, Multimedia Synchronization Modeling and Multimedia Systems. He is a student member of IEEE.
Nien-Bao Liu is a master student in the Department of Computer Science and Information Engineering at National Taiwan University. He received the B.S. degree in Computer Science from the Tatung Institute of Technology in 1991. His research interests include Multimedia Database and Multimedia Synchronization Modeling.
Chee-Wen Shiah is a Ph.D. student in the Department of Computer Science and Information Engineering at National Taiwan University. His research interests include Multimedia Database, Electronic Classroom, Real Time System Design, and Computer Graphics. He received the B.S. degree in Computer Science from National Chiao-Tung University in 1990, and the M.S. degree in Computer Engineering from National Taiwan University in 1992.
Ja-Ling Wu was born in Taipei, Taiwan, on November 21, 1956. He received the B.S. degree in Electrical Engineering from Tamkang University, Tamshoei, Taiwan, in 1979, and the M.S. and Ph.D. degrees in Electrical Engineering from the Tatung Institute of Technology. From 1986 to 1987 he was an Associate Professor of the Electrical Engineering Department at Tatung Institute of Technology, Taipei, Taiwan. Since 1987 he has been with the Department of Computer Science and Information Engineering, National Taiwan University, where he is presently a Professor. Prof. Wu was the recipient of the 1989 Outstanding Youth Medal of China and the Outstanding Research Award sponsored by the National Science Council, from 1987 to 1992. Prof. Wu has published more than 100 technical and conference papers. His research interests include Neural Networks, VLSI Signal Processing, Parallel Processing, Image Coding, Algorithm Design for DSP, Data Compression, and Multimedia Systems.
Wen-Chin Chen received the B.S. degree in Mathematics from National Taiwan University in 1976, and the Sc.M. and Ph.D. degrees in Computer Science from Brown University in 1981 and 1984, respectively. In 1987 he joined the faculty of National Taiwan University, where he is currently a Professor of Computer Science and Information Engineering. His research interests include Design and Analysis of Algorithms, Multimedia, and Database Systems. He is the co-author of a book entitled "The Design and Analysis of Coalesced Hashing", which was published by Oxford University Press in 1987.
Ming Ouhyoung received the B.S. and M.S. degrees in Electrical Engineering from National Taiwan University in 1981 and 1985, respectively. He received the Ph.D. degree in Computer Science from the University of North Carolina at Chapel Hill in 1990. He was a member of the technical staff at AT&T Bell Laboratories, Middletown, during 1990 and 1991. Since August 1991, he has been an Associate Professor in the Computer Science and Information Engineering Department, National Taiwan University. He has published papers on Signal Processing and Computer Graphics. He is currently engaged in research in the area of Computer Graphics, Virtual Reality, and Multimedia Systems. He is a member of ACM and IEEE.