MPEG-4 Structured Audio - Combining MPEG-4 Structured Audio with Physical Modeling Synthesis for High-Quality Internet Music Service Applications (2/2)

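MPEG-4 Structured Audio (abbreviated MP4-SA) is an ISO/IEC standard [6]. Rather than describing sound with audio sample data, it generates sound at run time from a computer program. For transmission, MPEG-4 Structured Audio consists of two main parts: a sound synthesis method that describes how sounds are created, and a sequence of synthesis control events that specifies which sounds are to be created. The following briefly describes the components of MPEG-4 Structured Audio defined in the standard, an overview of the decoding process, some basic definitions, the syntax, and the object types.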

1.1 The Five Main Components of Structured Audio and an Overview of the Decoding Process

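1. The Structured Audio Orchestra Language (SAOL). SAOL is a digital signal processing language used to describe arbitrary synthesis and control algorithms. It is carried in the bitstream and consists of a sequence of characters, also called symbols.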

Structured Audio technology can describe complex, high-quality sound with a very small amount of data.

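2. The Structured Audio Score Language (SASL). SASL is a simple control language for describing scores. It controls the sound synthesis algorithms described in SAOL in order to produce sound. SASL can be regarded as a simple control language that makes music and sound synthesis and control more flexible and more powerful.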

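3. The Structured Audio Sample Bank Format (SASBF). For wavetable synthesis, or for algorithms described as processing sound samples, SASBF allows banks of audio samples to be transmitted for use by the sound engine.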

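4. A Normative Scheduler Description. The normative scheduler is the element that supervises run time during Structured Audio decoding. It maps the control of synthesized music described in SASL or MIDI onto the real-time events dispatched to the normative sound-generation algorithms.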

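5. Normative Reference to the MIDI Standards. The MIDI standards are defined by the MIDI Manufacturers Association. MIDI is the representative format for structured music control, and it can be used together with SASL or in place of it. Although MIDI is not as powerful or flexible as SASL, it has an important advantage: it is highly compatible with music applications and composition systems that have existed and been in use for a long time. MIDI messages and standard syntax are therefore also accepted by the MPEG-4 Structured Audio standard.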

Fig 1. An SA Synthesis Example.

There are 3 tracks containing score data, and 3 synthesis algorithms in a data stream. The score data is translated into raw PCM data with the corresponding synthesis algorithm.
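As a rough illustration of the split described above (synthesis algorithms on one side, score data on the other), the following Python sketch renders three score events with three toy instruments into raw PCM samples. The instrument functions, the score tuple layout, and the 8000 Hz sample rate are assumptions made for this sketch only; an actual decoder would run SAOL instruments under the normative scheduler.

```python
import math

SAMPLE_RATE = 8000  # assumed output rate in Hz, chosen only for this sketch


def sine(freq, dur):
    """Hypothetical instrument: a plain sine tone."""
    n = int(dur * SAMPLE_RATE)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def square(freq, dur):
    """Hypothetical instrument: a square wave taken from the sign of a sine."""
    return [1.0 if s >= 0 else -1.0 for s in sine(freq, dur)]


def decaying(freq, dur):
    """Hypothetical instrument: a sine with a linear fade-out."""
    tone = sine(freq, dur)
    n = len(tone)
    return [s * (1 - i / n) for i, s in enumerate(tone)]


# The "orchestra": one synthesis algorithm per track.
ORCHESTRA = {"sine": sine, "square": square, "decay": decaying}

# The "score": (start time in s, instrument name, frequency in Hz, duration in s).
SCORE = [
    (0.0, "sine", 440.0, 0.5),
    (0.25, "square", 220.0, 0.5),
    (0.5, "decay", 330.0, 0.5),
]


def render(score, orchestra):
    """Mix every score event into one PCM buffer using its instrument."""
    total = max(start + dur for start, _, _, dur in score)
    out = [0.0] * int(total * SAMPLE_RATE)
    for start, name, freq, dur in score:
        offset = int(start * SAMPLE_RATE)
        for i, s in enumerate(orchestra[name](freq, dur)):
            if offset + i < len(out):
                # Scale by the number of instruments to avoid clipping.
                out[offset + i] += s / len(orchestra)
    return out


pcm = render(SCORE, ORCHESTRA)
```

Only the score list and the instrument definitions would need to travel over the network in such a scheme; the PCM buffer is computed on the receiving side.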

1.2 MPEG-4 Structured Audio Server

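To use the SA system in Internet applications, a streaming technology has to be chosen as the platform. We chose RTP, the official standard accepted by both ISMA (Internet Streaming Media Alliance) and MPEG-4, as the implementation technology. Table 1 shows the RTP header format; its fields are defined as follows: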

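 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| V |P|X|  CC   |M|     PT      |       Sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Table 1 RTP header format.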

V: RTP version number.

P: Padding flag; 0 in our implementation.

X: Header extension.

CC: CSRC count.

M: Marker bit; indicates whether this is the last packet of the transmitted sample. The sample here is an MPEG-4 file format sample, not an audio sample.

PT: Payload type.

Sequence number: Generated randomly at initialization.

Timestamp: Uses the tick as its basic unit, identical to MIDI, because it is more precise.

SSRC: Determined in the playback stage.
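The fixed 12-byte header in Table 1 can be sketched in Python with the standard `struct` module. This is a minimal illustration and not the server code described here: the function names, the payload type 96, the tick value 480, and the SSRC value are all hypothetical placeholders, while the bit layout and the version number 2 follow the RTP specification.

```python
import random
import struct

RTP_VERSION = 2  # version number used by current RTP


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False,
                    padding=0, extension=0, csrc_count=0):
    """Pack the fixed RTP header (no CSRC list) in network byte order."""
    # First byte: V (2 bits), P (1 bit), X (1 bit), CC (4 bits).
    byte0 = (RTP_VERSION << 6) | (padding << 5) | (extension << 4) | csrc_count
    # Second byte: M (1 bit), PT (7 bits).
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Decode the first 12 bytes of a packet back into named fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1,
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# The sequence number starts at a random value, as stated in the field list.
initial_seq = random.randrange(1 << 16)

# Hypothetical values: dynamic payload type 96, a timestamp of 480 ticks,
# and an arbitrary SSRC. The marker flags the last packet of a sample.
header = pack_rtp_header(payload_type=96, seq=initial_seq, timestamp=480,
                         ssrc=0x1234ABCD, marker=True)
fields = unpack_rtp_header(header)
```

Round-tripping a header through both functions is a quick way to check that the bit positions agree with Table 1.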

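Figure 4 shows how our client-server system carries out a typical transmission session. SA media data uses the standard MPEG-4/QuickTime file format [7]. Timing information is needed in order to integrate other kinds of media such as video, so timeScale in the Media header atom is set to krate, and duration is computed and used in units of krate. The info field of the MPEG-4 Audio atom contains important parameters such as arate, track, and outputBusWidth. In the Time-to-sample atom, SampleCount is a 32-bit integer that counts the number of consecutive samples with the same duration, and SampleDuration is another 32-bit integer that specifies the duration of each sample. A simple example: if one sample lasts 2 seconds and krate is set to 10, SampleDuration is 20; to produce 60 seconds of music, SampleCount is 30. The arrangement of the remaining data can mostly be found online and is not repeated here. Because krate and duration are fixed under these settings, the settempo opcode in SAOL must not be used; otherwise the data will fall out of synchronization.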
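The arithmetic in the example above can be written out as a small Python helper. The function name and its parameters are hypothetical; the sketch also assumes that the total length is a whole multiple of the per-sample length.

```python
def time_to_sample_entry(krate, sample_seconds, total_seconds):
    """Return (SampleCount, SampleDuration) for equally long samples.

    SampleDuration is expressed in krate ticks because timeScale is krate.
    """
    if total_seconds % sample_seconds != 0:
        raise ValueError("total length must be a multiple of the sample length")
    sample_duration = krate * sample_seconds        # ticks per sample
    sample_count = total_seconds // sample_seconds  # number of equal samples
    return sample_count, sample_duration


# Values from the text: 2-second samples, krate = 10, 60 seconds of music.
count, duration = time_to_sample_entry(krate=10, sample_seconds=2,
                                       total_seconds=60)
total_ticks = count * duration  # overall duration in krate units
```

With the values from the text this gives a SampleCount of 30 and a SampleDuration of 20, for a total of 600 krate ticks.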

[Figure 4: communication flow among the Client, the RTP server, and the SA Server. The client builds a connection to the SA database inside the SA Server and requests the list of music; the server retrieves the music list from the database and transmits it. After music is selected, the address of the requested music in the SA Server is searched for and returned, and the ScreamSA Player is started to receive the selected music from the SA Server. The server retrieves the files for the selected music and transmits the MPEG-4 SA bitstream. If necessary, the client requests the SAOL stream, and the server searches for the SAOL files and transmits the SAOL data stream. The ScreamSA Player receives the SA bitstream, builds the processing units and the scheduler, computes the sounds, and plays them using Windows utilities.]

Figure 4 MPEG-4/ScreamSA session process.

1.3 MPEG-4 Structured Audio Player

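Figure 1 shows the basic workflow of ScreamSA, the SA player we designed. When it receives an MPEG-4 SA data stream, it first splits the stream into four parts, which are stored separately: the header, the sample data (SASBF), the score data (MIDI or SASL), and the instrument data (SAOL). The header and the instrument data are reassembled so that the decoder can execute them conveniently. In general, a parser is first needed to convert them into a high-level [...] detailed SASL and Karplus-Strong plucked string model [9] SAOL examples. Note that standard MIDI files are not supported at present.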

Figure 1 Structured Audio decoding process
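For readers unfamiliar with the Karplus-Strong plucked string model [9] mentioned above, the following Python sketch shows a common ring-buffer formulation of the algorithm. It is not the SAOL example referred to in the text; the function name, the decay factor 0.996, the fixed random seed, and the 8000 Hz sample rate used in the call are assumptions chosen for illustration.

```python
import random


def karplus_strong(freq, duration, sample_rate=44100, decay=0.996, seed=0):
    """Generate a plucked-string tone with a noise-filled delay line."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    period = int(sample_rate / freq)  # delay-line length sets the pitch
    # The "pluck": fill the delay line with white noise.
    buf = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for i in range(int(duration * sample_rate)):
        cur = buf[i % period]
        nxt = buf[(i + 1) % period]
        out.append(cur)
        # Averaging two neighbours acts as a low-pass filter, and the
        # decay factor makes the string lose energy on every pass.
        buf[i % period] = decay * 0.5 * (cur + nxt)
    return out


tone = karplus_strong(220.0, 0.5, sample_rate=8000)
```

The noise burst supplies the initial excitation, and repeated averaging removes high frequencies over time, which is what gives the tone its string-like decay.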
