
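Appendix A  Usage of the AC-3 Feature Extraction Tools and the Volume String Conversion Tool

Program name           Purpose
MDCT_EXTRACTOR         Extracts the exponent and mantissa coefficients from an AC-3 stream
SURROUND_ANALYZE       Reassembles the MDCT coefficients and normalizes them
AUDIOSTRING_CREATOR    Generates the four kinds of volume strings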

(1) Tool for extracting the raw exponents and mantissas of the MDCT coefficients: MDCT_EXTRACTOR

Example.Frame

This file holds a single record: the total number of frames in the AC-3 file.

Example.Exponent

This file stores the exponent coefficients of every frame, MaxFrame × 6 (the number of channels) × 256 coefficients in total, in the following format:

    Frame Number
    Channel : 1
    Exponent_Number1
    Exponent_Number2
    . . .
    Exponent_Number256
    Channel : 2
    Exponent Set
    Channel : 3
    . . .
    Channel : 6
    Exponent Set

Each frame has one such exponent block, so the file contains MaxFrame blocks.
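The block layout described above can be read back with a short parser. The following Python sketch is illustrative only: the function name parse_coefficient_file is hypothetical, and it assumes one value per line, one frame-number line per frame, and one "Channel : n" line per channel. The demo input is shortened to 2 channels of 2 values, and its last value is made up.

```python
from typing import Iterable, List


def parse_coefficient_file(lines: Iterable[str], channels: int = 6,
                           bands: int = 256) -> List[List[List[float]]]:
    """Parse Exponent/Mantissa-style text into frames[frame][channel][band]."""
    rows = [line.strip() for line in lines if line.strip()]  # drop blank lines
    # One block = frame-number line + per channel (header line + values).
    block = 1 + channels * (1 + bands)
    if len(rows) % block != 0:
        raise ValueError("input is not a whole number of frame blocks")
    frames = []
    for start in range(0, len(rows), block):
        pos = start + 1  # skip the frame-number line
        frame = []
        for _ in range(channels):
            pos += 1  # skip the "Channel : n" line
            frame.append([float(v) for v in rows[pos:pos + bands]])
            pos += bands
        frames.append(frame)
    return frames


# Shortened, partly made-up demo input: 1 frame, 2 channels, 2 values each.
sample = ["Frame 1", "Channel : 1", "15", "17", "Channel : 2", "24", "3"]
demo = parse_coefficient_file(sample, channels=2, bands=2)
print(demo)  # [[[15.0, 17.0], [24.0, 3.0]]]
```

With the default arguments (6 channels, 256 values) the same function would apply to both the exponent and the mantissa file, provided they follow the layout assumed here.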

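Example.Mantissa

This file stores the mantissa coefficients of every frame, MaxFrame × 6 (the number of channels) × 256 coefficients in total, in the following format:

    Frame Number
    Channel : 1
    Mantissa_Number1
    Mantissa_Number2
    . . .
    Mantissa_Number256
    Channel : 2
    Mantissa Set
    Channel : 3
    . . .
    Channel : 6
    Mantissa Set

Each frame has one such mantissa block, so the file contains MaxFrame blocks.

The following walks through an actual example using a Dolby surround demonstration sample.

First, run the following command:

    MDCT_EXTRACTOR.EXE FILENAME

    FILENAME    Full path of the AC-3 file to analyze

The AC-3 file is also played back while the command runs. When it finishes, it has produced the files Dolby.Frame, Dolby.Mantissa, and Dolby.Exponent, with contents such as the following:

Dolby.Exponent

    Frame 1
    Channel : 1
    15
    17
    24
    . . .
    205
    Channel : 2
    Channel : 3
    . . .
    Channel : 6

Dolby.Mantissa

    Frame 1
    Channel : 1
    0.005859
    0.019531
    . . .
    0.000000
    Channel : 2
    Channel : 3
    . . .
    Channel : 6

(2) MDCT coefficient computation and normalization tool: SURROUND_ANALYZE

Once Dolby.Frame, Dolby.Mantissa, and Dolby.Exponent are available, their contents are recombined into MDCT coefficients.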

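With the MDCT coefficients restored in this way, the surround information can be analyzed further. Run the following command:

    SURROUND_ANALYZE.EXE FILENAME -METHOD

    FILENAME    Full path of the AC-3 file to analyze
    METHOD      -RMS    Compute the energy as a root mean square (RMS) value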
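As an illustration of the -RMS option, the root mean square of a set of coefficients can be computed as in the Python sketch below. The function name rms_energy is hypothetical, and applying it to the 256 coefficients of one channel in one frame is an assumption; this appendix does not state how SURROUND_ANALYZE groups the coefficients.

```python
import math
from typing import Sequence


def rms_energy(coefficients: Sequence[float]) -> float:
    """Root mean square of a group of MDCT coefficients."""
    if not coefficients:
        return 0.0  # treat an empty group as silence
    return math.sqrt(sum(c * c for c in coefficients) / len(coefficients))


# Two coefficients: sqrt((3^2 + (-4)^2) / 2) = sqrt(12.5)
energy = rms_energy([3.0, -4.0])
print(energy)
```

Squaring before averaging makes the result independent of the sign of each coefficient, which is why RMS is a common choice for energy estimates.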

The MDCT coefficients are restored from the three files as in the following pseudocode:

    Dim MaxFrame    // Maximum frame number, for example 1060
    Dim FCount      // Frame counter
    Dim Ch          // Channel number
    Dim Band        // Sub-band number
    Dim Index       // Loop counter for exponent scaling

    MaxFrame = LineInput(FileNumber_MaxFrame)

    Dim Exp(MaxFrame, 6, 256)     // Exponent array
    Dim Mant(MaxFrame, 6, 256)    // Mantissa array

    For FCount = 1 To MaxFrame
        // Read the frame-number line
        Temp = LineInput(FileNumber_Mantissa)
        Temp = LineInput(FileNumber_Exponent)

        For Ch = 1 To 6
            // Read the channel comment line
            Temp = LineInput(FileNumber_Mantissa)
            Temp = LineInput(FileNumber_Exponent)

            For Band = 1 To 256
                Exp(FCount, Ch, Band) = LineInput(FileNumber_Exponent)
                Mant(FCount, Ch, Band) = LineInput(FileNumber_Mantissa)

                // Scale the mantissa once per unit of its exponent
                For Index = 1 To Exp(FCount, Ch, Band)
                    Mant(FCount, Ch, Band) = Mant(FCount, Ch, Band) * 0.2
                Next Index
            Next Band
        Next Ch
    Next FCount
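The same restoration loop can be sketched in Python. This is not the tool's actual source: restore_mdct and the Grid alias are hypothetical names, and the data are assumed to be already loaded into nested lists indexed [frame][channel][band]. The default scale of 0.2 is copied from the pseudocode above; for comparison, the AC-3 standard (ATSC A/52) represents a coefficient as mantissa × 2^(-exponent), which would correspond to a scale of 0.5.

```python
from typing import List

Grid = List[List[List[float]]]  # indexed [frame][channel][band]


def restore_mdct(exponents: Grid, mantissas: Grid, scale: float = 0.2) -> Grid:
    """Multiply each mantissa by `scale` once per unit of its exponent."""
    return [
        [
            [m * scale ** int(e) for e, m in zip(exp_ch, mant_ch)]
            for exp_ch, mant_ch in zip(exp_fr, mant_fr)
        ]
        for exp_fr, mant_fr in zip(exponents, mantissas)
    ]


# One frame, one channel, two bands, using scale=0.5 so the values are exact:
# 0.5 * 0.5**0 = 0.5 and 0.5 * 0.5**2 = 0.125
coeffs = restore_mdct([[[0.0, 2.0]]], [[[0.5, 0.5]]], scale=0.5)
print(coeffs)  # [[[0.5, 0.125]]]
```

Raising the scale to the power of the exponent gives the same result as the repeated multiplication in the inner loop of the pseudocode, without the explicit counter.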

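(3) AC-3 volume string computation tool: AUDIOSTRING_CREATOR

To obtain the volume strings, run the following command:

    AUDIOSTRING_CREATOR.EXE FILENAME -METHOD

    FILENAME    Full path of the AC-3 surround information file to analyze
    METHOD      -Opposite    Relative volume string
                -Absolute    Absolute volume string

The output is a text file. Taking Dolby.AC3 as an example, the file is named Dolby_Opposite.TXT or Dolby_Absolute.TXT, with contents as shown below.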

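    Frame Number              // Total number of frames in this sample
    Frame Counter             // Frame index
    // Records which channel has the highest energy:
    // 0 (silence), 2 (front left), 4 (center), 8 (front right),
    // 16 (rear left), 32 (rear right), 64 (bass)
    Maximum Energy Channel
    Maximum Energy            // Energy of the loudest channel
    Channel Energy 1          // Front-left channel energy
    Channel Energy 2          // Center channel energy
    Channel Energy 3          // Front-right channel energy
    Channel Energy 4          // Rear-left channel energy
    Channel Energy 5          // Rear-right channel energy
    Flag for LFE Exist        // Whether the low-frequency channel exceeds the threshold
    Channel Energy 6          // Bass channel energy

    F1: 5645641321565445456   // Front left
    C:  8789564545132134564   // Center
    F2: 5456121315648646513   // Front right
    R1: 2222221121212122121   // Rear left
    R2: 2222323321212123211   // Rear right
    S:  5556565656565655656   // Bass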
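For downstream processing, volume string lines of this shape can be loaded with a few lines of Python. The sketch below is an assumption-laden illustration: parse_volume_strings is a hypothetical helper, it expects one "LABEL: digits" entry per line, and it treats every digit as one quantized volume level, which this appendix does not state explicitly.

```python
from typing import Dict, Iterable, List


def parse_volume_strings(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each channel label (F1, C, F2, R1, R2, S) to its list of levels."""
    levels: Dict[str, List[int]] = {}
    for line in lines:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "LABEL: digits" line
        digits = rest.split("//")[0].strip()  # drop a trailing // comment
        levels[label.strip()] = [int(d) for d in digits]
    return levels


strings = parse_volume_strings([
    "F1: 5645641321565445456 // front left",
    "R1: 2222221121212122121 // rear left",
])
print(strings["R1"][:6])  # [2, 2, 2, 2, 2, 2]
```

Keeping the levels as integer lists rather than raw strings makes it straightforward to compare channels position by position.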

Appendix B  Experimental Data

Table 9  Experimental data for the continuous-bass event parameter CB

CB      Precision (%)    Recall (%)
0       31.34            100
1       31.34            100
2       31.34            100
3       37.74            95.24
4       48.78            95.24
5       52.63            95.24
6       66.67            95.24
7       80               95.24
8       85               80.95
9       90.91            47.62
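For reference, precision and recall percentages of this kind follow from three counts, as in the Python sketch below. The function name and the counts 20, 25, and 21 are hypothetical: they were chosen only because they reproduce the CB = 7 row and are not raw counts reported in this appendix. Returning 0 when nothing is detected is likewise an assumed convention.

```python
from typing import Tuple


def precision_recall(true_positives: int, detected: int,
                     relevant: int) -> Tuple[float, float]:
    """Return (precision, recall) as percentages rounded to two decimals."""
    # precision = correct detections / all detections
    precision = 100.0 * true_positives / detected if detected else 0.0
    # recall = correct detections / all events that should have been found
    recall = 100.0 * true_positives / relevant if relevant else 0.0
    return round(precision, 2), round(recall, 2)


# Hypothetical counts: 20 correct out of 25 detections, 21 true events.
result = precision_recall(20, 25, 21)
print(result)  # (80.0, 95.24)
```

Raising a threshold typically reduces the number of detections, which tends to raise precision and lower recall; the rows for CB = 7 through CB = 9 show that trade-off.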

Table 10  Experimental data for the continuous-bass event parameter CK

CK      Precision (%)    Recall (%)
1       44.68            100
2       52.63            92.24
3       66.67            95.24
4       80               95.24
5       80.95            80.95
6       80               76.19
7       83.33            71.43
8       77.78            66.67
9       77.78            66.67
10      76.47            61.9
11      75               57.14
12      65.22            71.43
13      68.18            71.43
14      63.16            57.14
15      63.16            57.14

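Table 11  Experimental data for the continuous-bass event parameter CW

CW      Precision (%)    Recall (%)
0       88.89            38.1
1       88.89            38.1
2       80               95.24
3       66.67            95.24
4       51.22            100
5       44.68            100
6       36.84            100
7       30.43            100
8       30.43            100
9       30.43            100

Table 12  Experimental data for the sudden-bass event parameter SB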

SB      Precision (%)    Recall (%)
0       0                0
1       52.63            37.04
2       54.17            48.15
3       55.17            59.26
4       57.14            74.07
5       70.97            81.48
6       50               51.85
7       37.04            37.04
8       31.25            18.52
9       0                0

Table 13  Experimental data for the sudden-bass event parameter SKmax

SKmax   Precision (%)    Recall (%)
1       61.11            40.47
2       66.67            59.26
3       70.97            81.48
4       69.7             85.18
5       66.66            88.89
6       65.71            85.16
7       58.97            88.16
8       58.54            88.89
9       58.54            88.89
10      57.14            88.89

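Table 14  Experimental data for the rhythmic-bass event parameter RBMin

RBMin   Precision (%)    Recall (%)
0       66.67            66.67
1       66.67            66.67
2       66.67            66.67
3       66.67            66.67
4       66.67            66.67
5       50               33.33
6       50               33.33
7       100              33.33
8       0                0
9       0                0

Table 15  Experimental data for the rhythmic-bass event parameter RKmax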

RKmax   Precision (%)    Recall (%)
1       40               66.67
2       66.67            66.67
3       0                0
4       0                0

Table 16  Experimental data for the rhythmic-bass event parameter RR

RR      Precision (%)    Recall (%)
1       6.38             100
2       21.43            100
3       66.67            66.67
4       100              33.33
5       0                0

Table 17  Experimental data for the single-movement event parameter SiBmin

Direct energy

SiBmin    Relative volume            Absolute volume
          Precision (%)  Recall (%)  Precision (%)  Recall (%)
0         60.32          88.37       60.38          74.42
1         60.32          88.37       60.38          74.42
2         60.32          88.37       61.54          74.42
9         59.14          18.6        0              0

RMS

SiBmin    Relative volume            Absolute volume
          Precision (%)  Recall (%)  Precision (%)  Recall (%)
0         59.38          88.37       47.06          18.6
1         59.68          86.05       47.06          18.6
2         60.66          86.05       47.06          18.6
3         60.66          86.05       47.06          18.6
4         58.62          79.07       47.06          18.6
5         59.65          79.07       47.06          18.6
6         58.82          69.77       47.06          18.6
7         59.52          58.14       47.06          18.6
8         61.29          44.19       40             13.95

Table 18  Experimental data for the single-movement event parameter SiKmax

Direct energy

SiKmax    Relative volume            Absolute volume
          Precision (%)  Recall (%)  Precision (%)  Recall (%)
1         63.64          99.67       62.5           81.4
2         63.08          95.35       64.41          88.37
3         60.32          88.37       61.54          74.42
4         61.02          83.72       61.7           67.44
5         56.6           69.78       54.76          53.45
6         58             67.44       55             51.16
7         55.56          58.14       54.29          44.19
8         58.14          58.14       52.94          41.86
9         58.97          53.49       53.13          39.53

RMS

SiKmax    Relative volume            Absolute volume
          Precision (%)  Recall (%)  Precision (%)  Recall (%)
1         60.32          88.37       50             18.6
2         60.94          90.7        53.33          18.6
3         59.38          88.37       47.06          18.6
4         62.9           90.7        35.71          11.63
5         60.38          74.42       16.67          4.65
6         57.14          65.12       11.11          2.33
7         52.38          51.16       40             9.3
8         60             55.81       40             9.3
9         54.05          46.51       37.5           6.98

Table 19  Experimental data for the single-movement event parameter SiGapmin

Direct energy

SiGapmin  Relative volume            Absolute volume
          Precision (%)  Recall (%)  Precision (%)  Recall (%)
0         63.64          99.67       64.41          88.37
1         56.76          48.84       78.57          25.58
2         80             27.91       100            6.98
3         100            11.63       0              0
4         100            4.65        0              0
5         100            2.33        0              0
6         0              0           0              0

RMS

SiGapmin  Relative volume            Absolute volume
          Precision (%)  Recall (%)  Precision (%)  Recall (%)
0         60.94          90.7        53.33          18.6
1         65.85          62.79       100            2.33
2         72.22          30.23       0              0
3         81.82          20.93       0              0
4         83.33          11.63       0              0
5         75             6.97        0              0
6         100            6.97        0              0
