
CHAPTER 5 CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS

5.2 Future Research Directions

Content-based audio analysis is still a young field, and many of its problems remain largely unexplored. Several directions for future research follow from this work. In audio classification and segmentation, we will extend our methods to other types of audio sources, such as sound effects, and to audio in the compressed domain. In content-based audio retrieval, we will focus on query by humming (QBH).

REFERENCES

[1] S. Pfeiffer, S. Fischer, and W. Effelsberg, “Automatic audio content analysis,” in Proc. ACM Multimedia’96, Boston, MA, April 1996, pp. 21-30.

[2] J. Foote, “An overview of audio information retrieval,” ACM Multimedia Systems, vol. 7, no. 1, pp. 2-11, January 1999.

[3] E. Scheirer and M. Slaney, “Construction and evaluation of a robust multifeature speech/music discriminator,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing’97, Munich, Germany, April 1997, pp. 1331-1334.

[4] S. Rossignol, X. Rodet, J. Soumagne, et al., “Feature extraction and temporal segmentation of acoustic signals,” in Proc. ICMC 98, Ann Arbor, Michigan, 1998, pp. 199-202.

[5] J. Saunders, “Real-time discrimination of broadcast speech/music,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing’96, vol. 2, Atlanta, GA, May 1996, pp. 993-996.

[6] I. Fujinaga, “Machine recognition of timbre using steady-state tone of acoustic instruments,” in Proc. ICMC 98, Ann Arbor, Michigan, 1998, pp. 207-210.

[7] L. Wyse and S. Smoliar, “Toward content-based audio indexing and retrieval and a new speaker discrimination technique,” in Proc. ICJAI’95, Singapore, December 1995.

[8] D. Kimber and L. D. Wilcox, “Acoustic segmentation for audio browsers,” in Proc. Interface Conf., Sydney, Australia, July 1996.

[9] E. Wold, T. Blum, D. Keislar, and J. Wheaton, “Content-based classification, search, and retrieval of audio,” IEEE Multimedia Mag., vol. 3, no. 3, pp. 27-36,

[10] G. Lu and T. Hankinson, “A technique towards automatic audio classification and retrieval,” in Proc. Int. Conf. Signal Processing’98, vol. 2, 1998, pp. 1142-1145.

[11] J. S. Boreczky and L. D. Wilcox, “A hidden Markov model framework for video segmentation using audio and image features,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing’98, Seattle, WA, May 1998, pp. 3741-3744.

[12] D. Li, I. K. Sethi, N. Dimitrova, and T. McGee, “Classification of general audio data for content-based retrieval,” Pattern Recognition Letters, vol. 22, no. 5, pp. 533-544, April 2001.

[13] T. Zhang and C.-C. J. Kuo, “Hierarchical classification of audio data for archiving and retrieving,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing’99, vol. 6, 1999, pp. 3001-3004.

[14] T. Zhang and C.-C. J. Kuo, “Audio content analysis for online audiovisual data segmentation and classification,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 4, pp. 441-457, May 2001.

[15] G. Smith, H. Murase, and K. Kashino, “Quick audio retrieval using active search,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing’98, Seattle, WA, May 1998, pp. 3777-3780.

[16] J. Foote, “Content-based retrieval of music and audio,” in Proc. SPIE, Multimedia Storage and Archiving Systems II, vol. 3229, 1997, pp. 138-147.

[17] T. Zhang and C.-C. J. Kuo, “Content-based classification and retrieval of audio,” in Proc. SPIE, Conf. Advanced Signal Processing Algorithms, Architectures, and Implementations VIII, vol. 3461, San Diego, CA, July 1998.

[18] A. Ghias, J. Logan, D. Chamberlin, and B. Smith, “Query by humming: musical information retrieval in an audio database,” in Proc. Int. Conf. ACM Multimedia,

[19] G. Tzanetakis and P. Cook, “Audio information retrieval (AIR) tools,” in Proc. Int. Symposium on Music Information Retrieval (ISMIR), 2000.

[20] K. Martin, E. Scheirer, and B. Vercoe, “Musical content analysis through models of audition,” in Proc. ACM Multimedia Workshop on Content-Based Processing of Music, Bristol, UK, 1998.

[21] C. Spevak and E. Favreau, “Soundspotter – a prototype system for content-based audio retrieval,” in Proc. Int. Conf. Digital Audio Effects, September 2002, pp. 27-32.

[22] G. Tzanetakis, Manipulation, Analysis and Retrieval Systems for Audio Signals. Ph.D. thesis, Princeton University, 2002.

[23] C. Yang, “MACS: music database retrieval based on spectral similarity,” in IEEE Workshop on Applications of Signal Processing, 2001.

[24] C. Yang, Music database retrieval based on spectral similarity. Stanford University Database Group Technical Report 2001-14, 2001.

[25] S.-T. Bow, Pattern Recognition and Image Preprocessing. Marcel Dekker, 1992.

[26] MPEG Requirements Group, “Description of MPEG-7 content set,” Doc. ISO/MPEG N2467, MPEG Atlantic City Meeting, October 1998.

[27] D. Gabor, “Theory of communication,” Journal of the Institution of Electrical Engineers, vol. 93, pp. 429-439, 1946.

[28] B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837-842, August 1996.

[29] S. Qian and D. Chen, Joint Time-Frequency Analysis: Methods and Applications. Upper Saddle River, NJ: Prentice-Hall, 1996.

[30] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models. Springer, 1990.

Associates, Inc., 1998.

[32] J. O. Smith and X. Serra, “An analysis/resynthesis program for non-harmonic sounds based on a sinusoidal representation,” in Proc. ICMC 87, Ann Arbor, Michigan, 1987, pp. 290ff.

[33] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.

[34] B. S. Manjunath, P. Salembier, and T. Sikora, Introduction to MPEG-7: Multimedia Content Description Interface. John Wiley, 2002.

[35] M. Casey, “MPEG-7 sound recognition tools,” IEEE Transactions on Circuits and Systems for Video Technology, special issue on MPEG-7, vol. 11, no. 6, pp. 737-747, 2001.

[36] ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group, “Information technology – Multimedia content description interface – Part 4: Audio. Committee Draft 15938-4,” ISO/IEC, 2000.

[37] ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group, “Introduction to MPEG-7,” available from http://www.cselt.it/mpeg.

[38] C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, MA, 1999.

[39] A. J. Willis and L. Myers, “A cost-effective fingerprint recognition system for use with low-quality prints and damaged fingertips,” Pattern Recognition, vol. 34, no. 2, pp. 255-270, February 2001.

PUBLICATION LIST

The publication status of the methods proposed in this dissertation is summarized below.

(1) Ruei-Shiang Lin and Ling-Hwei Chen, “A New Approach for Classification of Generic Audio Data,” accepted by International Journal of Pattern Recognition and Artificial Intelligence.

(2) Ruei-Shiang Lin and Ling-Hwei Chen, “A New Approach for Audio Classification and Segmentation Using Gabor Wavelet Filtering and Fisher Linear Discriminator,” accepted by International Journal of Pattern Recognition and Artificial Intelligence.

(3) Ruei-Shiang Lin and Ling-Hwei Chen, “Content-based Retrieval of Audio Based on Gabor Wavelet Filtering,” accepted by International Journal of Pattern Recognition and Artificial Intelligence.
