

4.3   Analysis of the Experimental Results

In this section, we classify the beat detection errors into two categories based on the vision-based evaluation: false positives and false negatives.

(1) False Positive Error: A false positive is an error in which a test reports something as positive when it is not. In our experiment, the beat detector produces a false positive when it reports a beat event although no beat event has occurred. We further separate these errors into two situations: a beat event is detected when there is no beat event, and a beat event is detected again after the correct beat event has already been detected.

(2) False Negative Error: A false negative is the error of failing to report an event when one has in fact occurred. In our case, the beat detector produces a false negative when it fails to report a beat event at the moment a beat actually occurs.
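To make this evaluation concrete, the following is a minimal sketch of how detected beat times could be matched against the vision-based ground-truth annotations and counted as false positives, duplicates, or false negatives. The 0.1 s matching tolerance and the function name are illustrative assumptions, not values taken from our experiments.

    # Minimal sketch: classify detected beats as matched, false positives
    # (including duplicates), or false negatives against annotated ground truth.
    # The tolerance window (0.1 s) is an illustrative assumption.

    def classify_beats(detected, ground_truth, tol=0.1):
        """detected, ground_truth: sorted lists of beat times in seconds."""
        matched_gt = set()
        false_pos = 0          # detections with no nearby ground-truth beat
        duplicates = 0         # detections whose ground-truth beat was already matched
        for t in detected:
            # find the closest annotated beat
            best = min(range(len(ground_truth)),
                       key=lambda i: abs(ground_truth[i] - t),
                       default=None)
            if best is None or abs(ground_truth[best] - t) > tol:
                false_pos += 1
            elif best in matched_gt:
                duplicates += 1
            else:
                matched_gt.add(best)
        false_neg = len(ground_truth) - len(matched_gt)  # annotated beats never detected
        return false_pos, duplicates, false_neg

    fp, dup, fn = classify_beats([0.52, 1.01, 1.05, 2.30], [0.50, 1.00, 1.50, 2.00])
    print(fp, dup, fn)   # 1 false positive, 1 duplicate, 2 false negatives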

Table 4.5 The False Positive and False Negative Error Rates

    Method               False Positive:          False Positive:        False Negative:
                         non-beat but detected    duplicate detection    beats but not detected
    K-Curvature (θ =  °)      4.72%                    6.10%                  15.97%
    K-Curvature (θ =  °)      5.37%                    9.93%                   7.34%
    Local Minimum             4.79%                    1.51%                  14.70%

When false positive errors occur, we separate them into the two situations described above: a beat event is detected when there is no beat event, and a beat event is detected again after the correct beat event has already been detected. Non-beat events are detected falsely because the user's trajectory includes some changes of direction that are not beats.

Duplicate detections occur mainly when tracking is lost, either because the target leaves the scene or because the target vibrates. This situation might be resolved by designing a dynamic beat filter that eliminates successive spurious beats whenever the time interval between two consecutive beats is too short.
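The following is a minimal sketch of such a beat filter, under the assumption that a newly detected beat is simply discarded when it follows the previously accepted beat too closely; the 0.25 s minimum interval is an illustrative value only, and a dynamic version could adapt this interval to the recent tempo.

    # Minimal sketch of the suggested beat filter: drop a newly detected beat if it
    # follows the previous accepted beat too closely. The 0.25 s minimum interval
    # is an illustrative value, not one taken from the thesis.

    class BeatFilter:
        def __init__(self, min_interval=0.25):
            self.min_interval = min_interval
            self.last_beat = None

        def accept(self, t):
            """Return True if the beat detected at time t (seconds) should be kept."""
            if self.last_beat is not None and t - self.last_beat < self.min_interval:
                return False        # too close to the previous beat: treat as a duplicate
            self.last_beat = t
            return True

    f = BeatFilter()
    print([t for t in [0.0, 0.1, 0.6, 0.65, 1.2] if f.accept(t)])  # [0.0, 0.6, 1.2]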

The false negative errors occur mainly when the measured angle θ fails the 60° threshold test in the K-Curvature algorithm, or when the corresponding condition in the Local-Minimum algorithm is not met. This is largely due to frames being dropped because of the limited processing performance of the program. If we can increase the maximum frame rate that we process, this kind of error might be reduced.
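For reference, the following is a minimal sketch of the K-Curvature test discussed above, assuming that a beat candidate is declared when the angle between the backward and forward trajectory vectors falls below the 60° threshold; the choice of k and the vector convention are illustrative assumptions rather than the exact settings of our implementation.

    # Minimal sketch of a K-Curvature beat test: at trajectory point i, measure the
    # angle between the backward vector (to point i-k) and the forward vector
    # (to point i+k); declare a beat candidate when the angle is below the threshold.
    # The value of k and the vector convention are illustrative assumptions.
    import math

    def k_curvature_angle(points, i, k=5):
        """points: list of (x, y) trajectory samples; returns the angle at index i in degrees."""
        (x0, y0), (xc, yc), (x1, y1) = points[i - k], points[i], points[i + k]
        v_back = (x0 - xc, y0 - yc)
        v_fwd = (x1 - xc, y1 - yc)
        dot = v_back[0] * v_fwd[0] + v_back[1] * v_fwd[1]
        norm = math.hypot(*v_back) * math.hypot(*v_fwd)
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

    def is_beat_candidate(points, i, k=5, threshold_deg=60.0):
        return k_curvature_angle(points, i, k) < threshold_deg

    # A sharp V-shaped path (a downbeat-like reversal) triggers a candidate at the vertex.
    path = [(x, 2 * abs(x)) for x in range(-10, 11)]
    print(is_beat_candidate(path, 10))   # True: the angle at the vertex is about 53 degrees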


Chapter 5

Conclusion and Future Work

5.1 Conclusion

In this thesis, we have presented an efficient real-time target tracking system for conductor gesture tracking based on the CAMSHIFT algorithm. In addition, to extract beat events from the trajectory of the target, we used the K-Curvature algorithm and the Local-Minimum algorithm to interpret different kinds of conducting gestures.

The major part of our framework is based on the CAMSHIFT algorithm, a simple and computationally efficient colored-object tracker. It is usable as a visual interface and can be incorporated into our system to provide conductor gesture tracking. The CAMSHIFT algorithm handles the following computer-vision problems (a minimal tracking sketch is given after the list):

• Irregular object motion: CAMSHIFT scales its search window to the object size, thus naturally handling perspective-induced motion irregularities, so it is suitable for our purpose of detecting changes of direction.

• Distracters: CAMSHIFT ignores objects outside its search window, so other objects do not affect its tracking.


• Lighting variation: Using only hue from the HSV color space and ignoring pixels with very high or very low brightness gives CAMSHIFT a wide tolerance to lighting changes.
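For illustration, the following is a minimal OpenCV-based sketch of this kind of hue-histogram CamShift tracker; the region-of-interest selection, histogram settings, and termination criteria are illustrative assumptions and may differ from the details of our actual implementation.

    # Minimal CamShift tracking sketch with OpenCV: build a hue histogram of the
    # target once, then track it frame by frame via back-projection. The ROI
    # selection, mask thresholds, and termination criteria are illustrative.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)                        # hypothetical video source of the conductor
    ok, frame = cap.read()
    x, y, w, h = cv2.selectROI("init", frame)        # mark the target (e.g. the hand) once
    roi = frame[y:y + h, x:x + w]

    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    # keep only pixels with reasonable saturation/brightness, then build a hue histogram
    mask = cv2.inRange(hsv_roi, (0, 60, 32), (180, 255, 255))
    hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    track_window = (x, y, w, h)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        # CamShift adapts the window size and orientation to the tracked color blob
        rot_rect, track_window = cv2.CamShift(back_proj, track_window, term)
        (cx, cy), _, _ = rot_rect                    # centroid fed to the beat-detection stage
        cv2.polylines(frame, [np.int32(cv2.boxPoints(rot_rect))], True, (0, 255, 0), 2)
        cv2.imshow("track", frame)
        if cv2.waitKey(30) & 0xFF == 27:             # press Esc to quit
            break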

In other words, we have designed an HCI system for interpreting a conductor's gestures and translating these gestures into musical beats, which can be regarded as the major element of the music. The system does not require active sensing, a special baton, or other constraints on the physical motion of the conductor. This framework can therefore also be used for human motion analysis and many other applications, such as interactive virtual worlds that allow users to interact with computer systems.

5.2 Future Work

Since CAMSHIFT relies on color distribution alone, errors in color (colored lighting, dim illumination, excessive illumination, etc.) will cause errors in the tracking procedure. More sophisticated trackers use multiple modes such as feature tracking and motion analysis to compensate for this, but the added complexity would undermine the original design criterion of CAMSHIFT. Other possible improvements include:

• Improve tempo following: the current system reacts only to changes of direction and cannot follow some of the complex and subtle movements of a professional conductor. We could replace our beat detection and analysis module with more sophisticated gesture recognition algorithms, so that the module can be adjusted to users with different levels of conducting skill.


• Include a time-stretching algorithm: time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Since our system can already adjust the music playback speed according to the beat events, time stretching would help users understand the conducting speed they performed (a minimal sketch follows this list).

• New application areas: our framework can be applied to several different areas at the interface of vision and other multimedia. Not only can we estimate the accuracy of beat events for conducting gestures, but we can also do so for the movements of a dancer. Building on the previous items, we could design another system for a dancer whose routine is no longer constrained by the tempo of a recording, so that the music reacts spontaneously to his/her movements.
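As a sketch of the time-stretching idea mentioned above, the following example uses the librosa library to resynthesize audio at a tempo ratio derived from the detected beat interval without changing its pitch; the file names and the way the ratio is computed from beat intervals are illustrative assumptions.

    # Minimal time-stretching sketch: resynthesize the recording at a tempo ratio
    # derived from the user's detected beat interval, without changing pitch.
    # File names, the reference interval, and the ratio computation are illustrative.
    import librosa
    import soundfile as sf

    y, sr = librosa.load("score.wav", sr=None)         # hypothetical recording

    reference_interval = 0.60                           # seconds per beat in the recording (assumed)
    detected_interval = 0.75                            # seconds per beat from the gesture tracker
    rate = reference_interval / detected_interval       # < 1.0 slows the playback down

    y_stretched = librosa.effects.time_stretch(y, rate=rate)
    sf.write("score_stretched.wav", y_stretched, sr)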

In conclusion, since the proposed system is a framework that combines video and audio processing, applications of this technology can help us explore unexplored areas in interfaces with music and other multimedia. We could build an "interactive karaoke" system in which a user sings along to a recording while the recording adjusts to the user's tempo. Other applications can be implemented along the same lines, including conductor training, live performance, and music synthesis control. We hope that the flexible and interchangeable modules will make further research easier in the future.
