8. Experimental Results
8.3 Evaluation of Results
8.3.3 Classification Analysis of Advertisement Content
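For advertisement content classification, we take the correctly segmented advertisement segments from the advertisement audio samples used in the previous section; each advertisement segment represents one independent advertisement content. Based on the category relevance of advertisement content, we classify 80 advertisement audio samples into six content categories: food, medicine, automobiles, promotion, daily necessities, and electrical appliances. For each advertisement segment, the system's feature formulas compute a set of content feature values, and the TF-IDF weighting formula then orders these values into a feature path; different feature paths represent different advertisement contents. A cosine vector similarity measure compares the feature path of each advertisement with the feature path of each category to identify the closest category. In the experiment, we apply several similarity thresholds to the same set of samples, count how many samples can be classified under each threshold, and evaluate the accuracy from these counts. For similarity thresholds of 70% and above, the average classification accuracy is 74.5%. Table 6 lists the classification results for the five similarity thresholds and the six advertisement content categories, and Table 7 lists the classification accuracy for each of the five similarity thresholds.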
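Table 6. Evaluation results of similarity-based classification
Table 7. Evaluation results of advertisement classification accuracy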
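The classification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: each segment and each category is assumed to be summarized as counts of discrete feature labels (labels such as `music` or `speech` are hypothetical), idf is smoothed as log(N/df) + 1 so that labels shared by every category keep a nonzero weight, each category vector is built from its pooled counts, and a segment is assigned to the most similar category only when the cosine similarity reaches the threshold.

```python
import math
from collections import Counter


def build_idf(category_docs):
    """Smoothed inverse document frequency over the category 'documents'.

    Assumption: idf = log(N / df) + 1, so a feature label that occurs in
    every category still keeps a nonzero weight.
    """
    n = len(category_docs)
    df = Counter(label for doc in category_docs for label in set(doc))
    return {label: math.log(n / d) + 1.0 for label, d in df.items()}


def tfidf(counts, idf):
    """TF-IDF weight vector (dict label -> weight) for a segment or category."""
    total = sum(counts.values())
    # Labels never seen in any category receive weight 0.
    return {label: (c / total) * idf.get(label, 0.0) for label, c in counts.items()}


def feature_path(vector):
    """Order feature labels by descending TF-IDF weight (ties broken by name)."""
    return [label for label, _ in sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))]


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(label, 0.0) for label, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(segment_counts, category_counts, threshold):
    """Return (best category or None, best similarity).

    The segment stays unclassified (None) when even the best cosine
    similarity is below `threshold`.
    """
    idf = build_idf(list(category_counts.values()))
    seg_vec = tfidf(segment_counts, idf)
    best_cat, best_sim = None, -1.0
    for cat, counts in category_counts.items():
        sim = cosine(seg_vec, tfidf(counts, idf))
        if sim > best_sim:
            best_cat, best_sim = cat, sim
    return (best_cat if best_sim >= threshold else None), best_sim


if __name__ == "__main__":
    # Hypothetical feature-label counts for two of the six categories.
    categories = {
        "food": Counter({"music": 5, "speech": 2}),
        "medicine": Counter({"speech": 6, "silence": 2}),
    }
    segment = Counter({"music": 4, "speech": 1})
    print(classify(segment, categories, threshold=0.7))
```

Raising `threshold` leaves more segments unclassified (returned as `None`), which is one way the number of classified samples can differ across thresholds.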
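The classification accuracy statistics for the five similarity thresholds are plotted in Figure 8.6. The experiment shows that the classification accuracy at the 80% and 90% similarity thresholds falls short of expectations. By the premise of the proposed method, feature paths with higher similarity should identify closer advertisement content, so the identification accuracy should also be higher. Suspecting that these results were affected by an insufficient number of samples, we removed the results for the 80% and 90% thresholds (Figure 8.7) and reanalyzed the identification reliability from the lowest to the highest remaining threshold. The resulting classification accuracy rises linearly as the similarity threshold increases, which indicates that the proposed classification method is effective in the classification experiment.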
Figure 8.6 Evaluation of classification accuracy
Figure 8.7 Analysis of classification accuracy
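The per-threshold evaluation and the trend check discussed above can be sketched as follows. This is an illustrative reconstruction, not the thesis's evaluation code: accuracy is assumed to be correct / classified, a segment is assumed to be classified at threshold t when its best similarity is at least t, and the `min_classified` filter is a hypothetical stand-in for discarding thresholds that too few samples reach (the thesis itself simply drops the 80% and 90% results). The linear trend is measured as an ordinary least-squares slope.

```python
import math


def accuracy_by_threshold(results, thresholds):
    """Map each threshold to (number classified, accuracy).

    results: list of (predicted_label, true_label, similarity) per segment.
    """
    table = {}
    for t in thresholds:
        classified = [(p, y) for p, y, s in results if s >= t]
        correct = sum(1 for p, y in classified if p == y)
        acc = correct / len(classified) if classified else math.nan
        table[t] = (len(classified), acc)
    return table


def accuracy_trend(table, min_classified):
    """Least-squares slope of accuracy versus threshold.

    Thresholds with fewer than `min_classified` classified samples are dropped.
    A positive slope means accuracy rises with the similarity threshold.
    """
    points = [(t, acc) for t, (n, acc) in sorted(table.items()) if n >= min_classified]
    if len(points) < 2:
        raise ValueError("need at least two thresholds to fit a trend")
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```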
9. Conclusions
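In this thesis, we propose an automatic analysis technique for advertisement audio in the AAC format. We use the MDCT coefficients extracted during AAC decoding to compute typical classification features and MPEG-7 audio descriptors, such as the silence ratio, 4ME, and spectral and energy characteristics, in order to locate the segment boundaries among four audio types: mixed music, songs, sound effects, and others. We also use GMM, BPN, and SVM classifiers to compare the performance of different audio segmentation methods. After boundary segmentation, the content of the segments on either side of each boundary is matched against advertisement content feature paths using a modified classical vector space model, which identifies the type of advertisement content. In this way, AAC advertisement audio can be automatically segmented and its content identified, with applications in multimedia databases, Internet retrieval and classification, and content analysis of advertising data.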
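Two of the MDCT-domain features named above can be sketched as follows, assuming each AAC frame is available as a list of MDCT coefficients M[i] (Appendix A notation). Frame energy is taken here as the RMS of the coefficients, and the silence ratio as the fraction of frames whose RMS falls below a fraction of the clip's mean RMS. This is one common style of definition, not necessarily the system's own formulas; the relative threshold `rel_threshold=0.5` is a hypothetical choice.

```python
import math


def frame_rms(mdct_frame):
    """RMS of one frame's MDCT coefficients, used here as a frame-energy proxy."""
    return math.sqrt(sum(c * c for c in mdct_frame) / len(mdct_frame))


def silence_ratio(mdct_frames, rel_threshold=0.5):
    """Fraction of frames whose RMS is below rel_threshold * mean RMS of the clip."""
    rms = [frame_rms(f) for f in mdct_frames]
    mean_rms = sum(rms) / len(rms)
    if mean_rms == 0.0:
        return 1.0  # an all-zero clip is treated as entirely silent
    silent = sum(1 for r in rms if r < rel_threshold * mean_rms)
    return silent / len(rms)
```

Evaluated frame by frame, `frame_rms` produces a per-frame energy curve of the kind plotted against frame index in Appendix B.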
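Current Digital Terrestrial Television Broadcasting (DTTB) standards comprise three major families: the American, European, and Japanese standards. Among them, the European DVB-T standard has been adopted by the largest number of countries, and it was followed by DVB-H, a digital broadcasting technology designed specifically for handheld receivers. A report by IMS Research projected that 120 million users worldwide would be watching live TV on mobile phones by 2010. The digitization of television broadcasting is an inevitable trend: digital broadcasting offers higher audio and video quality and greatly expands multimedia applications. Governments have therefore actively promoted digital broadcasting projects to accelerate adoption and move into a fully digital broadcasting era. Consequently, digital broadcasting technologies for mobile and handheld devices have become a research focus; apart from the hardware resource constraints of DVB-H, research on its multimedia content has also drawn considerable attention. Among the audio formats specified by MPEG, AAC, the core audio codec of MPEG-4, is gradually replacing MP3 in everyday audio applications and becoming the new mainstream format. In future work, we will therefore carry out related multimedia content analysis research on audio technologies for handheld receivers, the Internet, and other everyday applications.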
Appendix A. Definitions of Feature Formula Parameters
Symbol    Definition
M[i]      i-th MDCT value in a frame
n         time index
s(n)      digital audio signal
Fs        sampling frequency
l         frame index
L         total number of frames
Nw        length of a frame in number of time samples
Nhop      number of time samples between two successive frames
k         frequency bin index
f(k)      frequency corresponding to the index k
S_l(k)    spectrum extracted from the l-th frame
P_l(k)    power spectrum extracted from the l-th frame
NFT       size of the fast Fourier transform
ΔF        frequency interval between two successive FFT bins
r         spectral resolution
b         frequency band index
B         number of frequency bands
loF_b     lower frequency limit of band b
hiF_b     higher frequency limit of band b
          normalized autocorrelation function of the l-th frame
m         autocorrelation lag
          a pitch candidate
          a weighting factor between 0 and 1
T0        fundamental period
f0        fundamental frequency
h         index of harmonic component
NH        number of harmonic components
f_h       frequency of the h-th harmonic
A_h       amplitude of the h-th harmonic
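Several of the parameters above are related in the usual way for frame-based spectral analysis: ΔF = Fs / NFT, f(k) = k·ΔF, and frame l starts l·Nhop samples into the signal. The helpers below sketch these relations. The total sample count `n_samples` is an added symbol that does not appear in the table, and values such as Fs = 44100 Hz, Nw = 2048, and Nhop = 1024 are illustrative only.

```python
def bin_spacing(fs, n_ft):
    """ΔF: frequency interval between two successive FFT bins, in Hz."""
    return fs / n_ft


def bin_frequency(k, fs, n_ft):
    """f(k): frequency of FFT bin k, in Hz (meaningful for k = 0 .. n_ft // 2)."""
    return k * fs / n_ft


def frame_start_seconds(l, n_hop, fs):
    """Start time of frame l when successive frames are n_hop samples apart."""
    return l * n_hop / fs


def num_frames(n_samples, n_w, n_hop):
    """L: number of complete frames of length n_w in a signal of n_samples."""
    if n_samples < n_w:
        return 0
    return (n_samples - n_w) // n_hop + 1
```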
Appendix B. Distribution Plots of Computed Feature Values
Panels: audio sample waveform; energy feature distribution; silence ratio distribution. Plotted series: RMS and SR (x-axis: Frame; y-axis: Feature value).