Related works - An Asynchronous MP3 Decoder Designed with Balsa

2-1 Introduction to MP3

The compression technology standardized by MPEG (Moving Picture Experts Group) is widely used in current multimedia applications, for example network multimedia streaming, online music stores, digital television, and portable devices.

In the MPEG-1 standard, audio compression is divided into three layers: MPEG Layer 1, MPEG Layer 2, and MPEG Layer 3. The layers differ in codec complexity and in compressed audio quality. Layer 1 forms the basic algorithm and is suitable for bit rates above 128 kbps per channel. Layer 2 targets bit rates around 128 kbps per channel and provides additional coding of bit allocations, scale factors, and samples. Layer 3 is the most complex, but it offers the best audio quality. CD audio is sampled at 44.1 kHz with 16 bits per sample, so it consumes around 10 MB of storage per minute. MP3 audio needs only about 1 MB per minute, which is a compression ratio of roughly 10:1 to 12:1 relative to CD audio. The three layers of MPEG-1 are compared in Table 2.

| | Layer I | Layer II | Layer III |
|---|---|---|---|
| Analysis/Transform | 32 sub-bands | 32 sub-bands | 32 sub-bands |
| Psychoacoustic model | Model 1 | Model 1 | Model 2 |
| Bit rate | 32~448 kbps | 32~384 kbps | 32~320 kbps |
| Sample frequency | 32, 44.1, 48 kHz | 32, 44.1, 48 kHz | 32, 44.1, 48 kHz |
| Quantization | Uniform | Uniform | Non-uniform |
| Samples per frame | 384 samples | 1152 samples | 1152 samples |

Table 2: The comparisons between the three layers of MPEG-1
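The storage figures quoted before Table 2 can be checked with a little arithmetic. The sketch below assumes two-channel (stereo) CD audio and an MP3 bit rate of 128 kbps; neither value is stated in the text, so both are illustrative assumptions.

```python
def cd_bytes_per_minute(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Uncompressed PCM storage for one minute (stereo is an assumption)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * 60


def mp3_bytes_per_minute(bitrate_bps=128_000):
    """Compressed storage for one minute at an assumed constant bit rate."""
    return bitrate_bps // 8 * 60


cd = cd_bytes_per_minute()
mp3 = mp3_bytes_per_minute()
print(f"CD : {cd / 2**20:.1f} MiB per minute")
print(f"MP3: {mp3 / 2**20:.1f} MiB per minute")
print(f"compression ratio: {cd / mp3:.1f}:1")
```

Under these assumptions the ratio falls inside the 10 to 12 range given in the text; a different MP3 bit rate would shift it accordingly.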

2-1-1 Frame format

All MP3 files are divided into smaller fragments called frames. Each frame stores 1152 audio samples, divided into two granules of 576 samples each, and lasts about 26 ms at a 44.1 kHz sampling rate. The frame structure of an MP3 file can be divided into five parts, as shown in Figure 4. The header of each frame is 32 bits long and carries information about the frame, such as the sync word, ID, layer, CRC protection flag, and sampling frequency. The side information of each frame is used by two later stages: the Huffman decoder and the scale factor decoder. The main data part of the frame consists of scale factors, Huffman-coded bits, and ancillary data.

Figure 4: The frame structure of an MP3 file. Each frame holds Header (32 bits), CRC (16 bits), Side info (17 or 32 bytes), Main data, and Ancillary data; the main data carries granule0 and granule1, each with a left and a right channel, and each channel with scale factors and Huffman code.
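To make the header fields concrete, the following sketch decodes a few of them from a 32-bit header word. The bit positions and the bit rate and sampling frequency tables follow the MPEG-1 Layer III header layout rather than anything given in this chapter, and the function name `parse_header` and the returned dictionary keys are hypothetical. Free-format and reserved values are simply rejected.

```python
# Index tables for MPEG-1 Layer III (None marks free-format or reserved entries).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]


def parse_header(word):
    """Decode selected fields of a 32-bit MPEG-1 Layer III frame header."""
    if (word >> 20) & 0xFFF != 0xFFF:
        raise ValueError("sync word not found")
    if (word >> 19) & 0x1 != 1 or (word >> 17) & 0x3 != 0b01:
        raise ValueError("not an MPEG-1 Layer III frame")
    bitrate_kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate_hz = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate_kbps is None or sample_rate_hz is None:
        raise ValueError("free-format or reserved value not handled here")
    padding = (word >> 9) & 0x1
    mono = ((word >> 6) & 0x3) == 0b11
    return {
        "crc_present": ((word >> 16) & 0x1) == 0,  # protection bit 0 means CRC follows
        "bitrate_kbps": bitrate_kbps,
        "sample_rate_hz": sample_rate_hz,
        "side_info_bytes": 17 if mono else 32,
        "frame_bytes": 144 * bitrate_kbps * 1000 // sample_rate_hz + padding,
        "frame_ms": 1152 * 1000 / sample_rate_hz,
    }


print(parse_header(0xFFFB9000))
```

The example word `0xFFFB9000` is a commonly seen header for a 128 kbps, 44.1 kHz stereo frame without CRC; its frame duration works out to the roughly 26 ms mentioned above.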

2-1-2 Side information and Main data

The side information section contains the information needed to decode the main data. This section is 17 bytes long in single-channel mode and 32 bytes long in dual-channel mode.

The main data section contains the coded scale factor values and the Huffman-coded frequency lines. Its length depends on the bit rate and on the length of the ancillary data. The length of the scale factor part depends on whether scale factors are reused, and also on the window length (short or long).

The first 9 bits of the side information form a pointer that gives the starting address of the main data for the current frame. Because MP3 uses Huffman coding, the coded audio data of different frames differ in length. To use the available space more efficiently, the bit reservoir technique is used, as shown in Figure 5: a frame that needs fewer bits leaves room that later frames can fill. Therefore, the main data of a frame does not always begin right after that frame's own side information.

Figure 5: The bit reservoir technology (frame headers interleaved with the main data of Frames 0 to 3)
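One way to picture the bit reservoir is as a byte buffer that outlives a single frame. The sketch below assumes that the 9-bit pointer is a backward offset in bytes counted over main data only (headers and side information excluded), which is how MPEG-1 Layer III defines it; the class name `BitReservoir` and its method are hypothetical.

```python
class BitReservoir:
    """Keeps main-data bytes of earlier frames so later frames can reach back."""

    def __init__(self):
        self.buf = bytearray()

    def start_frame(self, main_data_begin, frame_main_bytes):
        """Return the bytes from which decoding of the current frame starts.

        main_data_begin: backward offset in bytes taken from the side information.
        frame_main_bytes: main-data bytes physically stored in the current frame.
        """
        if main_data_begin > len(self.buf):
            raise ValueError("pointer reaches before the buffered data")
        keep_from = len(self.buf) - main_data_begin
        self.buf = self.buf[keep_from:] + bytearray(frame_main_bytes)
        return bytes(self.buf)


reservoir = BitReservoir()
reservoir.start_frame(0, b"AAAA")         # first frame: data starts in the frame itself
print(reservoir.start_frame(2, b"BB"))    # second frame borrows 2 bytes from the first
```

A real decoder would also track how many bits each granule consumes; that bookkeeping is left out to keep the sketch short.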

2-1-3 Huffman decoding

There are two parts in the main data section: the scale factor part and the Huffman data part. The size of the Huffman data part can be derived from the side information and the scale factors. The Huffman data are the quantized spectral values, coded with different Huffman code tables. These values cover frequencies from zero to the Nyquist frequency and are divided into five regions (see Figure 6). The rzero region contains pairs of quantized values that are all equal to zero and represents the highest frequencies. The count1 region contains quadruples of quantized values that are equal to -1, 0, or 1. Finally, the big_values region contains pairs of values whose magnitude is limited to 8191 (13 bits); it is further split into three sub-regions, which together with count1 and rzero give the five regions. The big_values field of the side information indicates the size of the big_values region in pairs, and its maximum value is 288.

Figure 6: The five regions of Huffman data.
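The partition of the 576 frequency lines of a granule can be sketched as follows. The function name and the `count1_quads` parameter are hypothetical: in a real bitstream the number of count1 quadruples is not a side information field but is found by decoding until the Huffman bits of the granule are used up. The split of big_values into its three sub-regions is omitted.

```python
def huffman_regions(big_values, count1_quads, lines=576):
    """Return (start, end) index ranges of the three main regions of a granule."""
    if not 0 <= big_values <= 288:
        raise ValueError("big_values must be in 0..288")
    big_end = 2 * big_values                  # big_values counts pairs
    count1_end = big_end + 4 * count1_quads   # count1 counts quadruples
    if count1_end > lines:
        raise ValueError("regions exceed the granule length")
    return {
        "big_values": (0, big_end),
        "count1": (big_end, count1_end),
        "rzero": (count1_end, lines),         # remaining lines are all zero
    }


print(huffman_regions(big_values=100, count1_quads=50))
```

With big_values at its maximum of 288, the big_values region alone covers all 576 lines, which is consistent with the limit stated above.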

The Huffman decoding flow is shown in Figure 7.


Figure 7: The Huffman decoding flow

2-2 The MP3 processing flow

The MPEG audio Layer 3 decoding process has three main parts [10]: bitstream decoding, inverse quantization, and frequency-to-time mapping, as shown in Figure 8.

Figure 8: The MP3 decoding process

The bitstream is fed into the decoder. The bitstream decoding block reads the header and performs error detection if error checking (a CRC error detection code) was applied in the encoder. The bitstream data are then unpacked to recover the various pieces of information, and the inverse quantization block reconstructs the quantized version of the set of mapped samples. Finally, the frequency-to-time mapping block transforms these mapped samples back into uniform PCM samples.

2-2-1 Bitstream decoding

There are four phases in the bitstream decoding part: header decoding, side information decoding, scale factor decoding, and Huffman data decoding. First, the bitstream decoder synchronizes to the header of a frame. Second, it reads the header and side information into a buffer for use in later phases. Third, the scale factor decoding phase decodes the scale factor data needed for re-quantization. Fourth, the Huffman decoding phase recovers the 576 frequency values, which were produced by the MDCT and quantization in the encoder and are ordered from low to high frequency. The bitstream decoding block is shown in Figure 9.


Figure 9: The bitstream decoding block

2-2-2 Inverse quantization

There are three parts in the inverse quantization block: re-quantization, reordering, and joint stereo processing. The re-quantization part converts the Huffman-decoded values back to their spectral values using a power law: for each output value Y of the Huffman decoder, Y^(4/3) is calculated and then scaled, which requires the scale factors and Huffman values decoded earlier. The re-quantization formula is as follows:

$$ Xr_i = is_i^{4/3} \times 2^{0.25 \times C} \qquad (1) $$

The factor C in the equation is formed from the global gain and the scale factor band information, taken from the side information and the scale factors. The value is_i is the Huffman-decoded value at buffer index i, and Xr_i is the input to the next processing block at index i.
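Equation (1) translates almost directly into code. In the sketch below the exponent C is passed in as a ready-made number, since its exact composition from global gain and scale factors is not spelled out here. Applying the power law to the magnitude and restoring the sign afterwards is an assumption taken from common decoder practice, because a fractional power of a negative number is not real-valued.

```python
import math


def requantize(is_i, c):
    """Evaluate equation (1): |is_i|^(4/3) * 2^(0.25 * c), with the sign of is_i."""
    magnitude = abs(is_i) ** (4.0 / 3.0)
    return math.copysign(magnitude, is_i) * 2.0 ** (0.25 * c)


for value, c in [(8, 0), (-8, 4), (1, -8)]:
    print(value, c, requantize(value, c))
```

Hardware and fixed-point decoders usually replace the power computation with a lookup table indexed by the Huffman value; the floating-point form above is only meant to show the arithmetic.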

Reordering

To make Huffman coding more efficient, the encoder reorders the frequency values produced by the MDCT and quantization, and the decoder must undo this reordering. This step is only used with short-block windowing. During Huffman encoding, the samples of the three windows that belong to the same frequency of each sub-band are rearranged so that the samples of one window are stored together, so they must be converted back to their original order.

The reordering method is shown in Figure 10.

Figure 10: The reordering method
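A minimal sketch of the reordering for a single scale factor band is given below. It assumes that the decoded values of the band arrive as three consecutive runs, one per short window, and that the output interleaves them so that the three window values of each frequency sit next to each other. This index mapping follows common reference decoders and may differ in detail from Figure 10; the function name is hypothetical.

```python
def reorder_band(values, width):
    """Interleave three window-major runs of `width` values into frequency-major order."""
    if len(values) != 3 * width:
        raise ValueError("a short-block band holds 3 * width values")
    out = [0] * (3 * width)
    for window in range(3):
        for i in range(width):
            out[3 * i + window] = values[window * width + i]
    return out


# Window 0 holds 1, 2; window 1 holds 10, 20; window 2 holds 100, 200.
print(reorder_band([1, 2, 10, 20, 100, 200], width=2))
```

In a full decoder this mapping is applied band by band, using the short-block scale factor band widths for the current sampling frequency.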

Joint stereo processing

MP3 decoding supports not only the mono and dual-channel modes but also the stereo channel modes, including joint stereo.

2-2-3 The Frequency to time mapping block

The frequency to time mapping block can be divided into three phases: alias reduction, IMDCT, and poly-phase synthesis filter bank (Figure 11). The purpose of this block is to convert the re-quantized frequency-domain values into time-domain values.

Figure 11: The frequency to time mapping block (alias reduction, IMDCT and overlapping, and poly-phase synthesis, producing the PCM signal)

Alias Reduction

Alias reduction is required to cancel the aliasing introduced by the poly-phase analysis filter bank during encoding. It is applied to long blocks. There are eight butterfly calculations between each pair of adjacent sub-bands, as shown in Figure 12. Here x(i) is a frequency value output by the reorder module, and cs and ca are constants that can be found in the standard tables.

Figure 12: The alias reduction
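The butterflies can be sketched as below for one granule of 576 frequency lines. The eight coefficients `C` are the values listed for alias reduction in the MPEG-1 Layer III tables, from which cs and ca are derived; the index pattern (mirroring around each sub-band boundary) follows common reference decoders and is an assumption with respect to Figure 12.

```python
import math

# Coefficients c_i from the MPEG-1 Layer III alias reduction table.
C = (-0.6, -0.535, -0.33, -0.185, -0.095, -0.041, -0.0142, -0.0037)
CS = [1.0 / math.sqrt(1.0 + c * c) for c in C]
CA = [c / math.sqrt(1.0 + c * c) for c in C]


def alias_reduce(x):
    """Apply 8 butterflies across each of the 31 sub-band boundaries."""
    if len(x) != 576:
        raise ValueError("expected 32 sub-bands of 18 lines")
    y = list(x)
    for sb in range(1, 32):
        for i in range(8):
            lo = 18 * sb - 1 - i   # line just below the boundary, moving down
            hi = 18 * sb + i       # line just above the boundary, moving up
            a, b = y[lo], y[hi]
            y[lo] = a * CS[i] - b * CA[i]
            y[hi] = b * CS[i] + a * CA[i]
    return y
```

Because cs squared plus ca squared equals one, every butterfly is a plane rotation, so the total energy of the 576 lines is unchanged by this step.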

IMDCT

The IMDCT (Inverse Modified Discrete Cosine Transform) transforms the frequency lines into poly-phase filter sub-band samples. The analytical expression of the IMDCT is shown below, where n is 12 for short blocks and 36 for long blocks.

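$$ x_i = \sum_{k=0}^{n/2-1} X_k \cos\left( \frac{\pi}{2n} \left( 2i + 1 + \frac{n}{2} \right) (2k + 1) \right), \quad 0 \le i \le n-1 \qquad (2) $$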

For long blocks, the input of the IMDCT consists of 18 frequency lines, and the IMDCT produces 36 outputs. For a series of three short-window blocks, the input of each IMDCT consists of 6 frequency lines, and every block produces 12 outputs.
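A direct, unoptimized evaluation of the IMDCT is sketched below. It assumes the standard MPEG-1 Layer III definition, x_i = sum over k of X_k cos(pi/(2n) (2i + 1 + n/2)(2k + 1)), with n/2 inputs and n outputs; the function name is hypothetical.

```python
import math


def imdct(freq_lines):
    """Direct IMDCT: 18 inputs give 36 outputs (long), 6 inputs give 12 outputs (short)."""
    n = 2 * len(freq_lines)
    if n not in (12, 36):
        raise ValueError("expected 6 or 18 frequency lines")
    return [
        sum(
            freq_lines[k]
            * math.cos(math.pi / (2 * n) * (2 * i + 1 + n / 2) * (2 * k + 1))
            for k in range(n // 2)
        )
        for i in range(n)
    ]


print(len(imdct([0.0] * 18)), len(imdct([0.0] * 6)))
```

The direct form costs n * n/2 multiplications per block. Practical decoders use fast algorithms instead, but the direct sum is convenient as a reference for checking them.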

After the IMDCT, the result x_i must be multiplied by a window function. The shape of the window depends on the block_type:

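1. Block_type = 0 (normal window): equation (3)

2. Block_type = 1 (start window): equation (4)

3. Block_type = 3 (stop window): equation (5)

4. Block_type = 2 (short window): equations (6) and (7)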
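The four window shapes can be generated as sketched below. The sine-based definitions are those of MPEG-1 Layer III and are stated here as an assumption, since only the block types are named in the text; the function name is hypothetical. For block_type 2 the function returns the 12-point window that is applied to each of the three short blocks.

```python
import math


def window(block_type):
    """Return the 36-point window (types 0, 1, 3) or the 12-point window (type 2)."""
    long_w = [math.sin(math.pi / 36 * (i + 0.5)) for i in range(36)]
    short_w = [math.sin(math.pi / 12 * (i + 0.5)) for i in range(12)]
    if block_type == 0:   # normal: one long sine window
        return long_w
    if block_type == 1:   # start: long rise, flat top, short fall, zeros
        return long_w[:18] + [1.0] * 6 + short_w[6:] + [0.0] * 6
    if block_type == 3:   # stop: zeros, short rise, flat top, long fall
        return [0.0] * 6 + short_w[:6] + [1.0] * 6 + long_w[18:]
    if block_type == 2:   # short: applied to each 12-sample block
        return short_w
    raise ValueError("block_type must be 0, 1, 2, or 3")


windowed = [w * x for w, x in zip(window(0), [1.0] * 36)]
```

The start and stop windows exist to bridge between long and short blocks: their short-slope halves match the short window so that overlap-add still reconstructs the signal across a block-type switch.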

After windowing, the results must be overlapped and added with the previous block. The first half (18 values) of the current 36-value block is added to the second half of the previous block. The second half of the current block is stored for use with the next block, as shown in Figure 13.

Figure 13: The overlapping of the IMDCT
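The overlap-add step for one sub-band can be sketched as follows; the function name and the idea of returning the stored half alongside the output are illustrative choices.

```python
def overlap_add(windowed, prev_tail):
    """Combine a windowed 36-value block with the stored half of the previous block.

    Returns (18 output samples, 18 values to store for the next block).
    """
    if len(windowed) != 36 or len(prev_tail) != 18:
        raise ValueError("expected 36 windowed values and 18 stored values")
    out = [windowed[i] + prev_tail[i] for i in range(18)]
    new_tail = list(windowed[18:])
    return out, new_tail


tail = [0.0] * 18                      # nothing stored before the first block
samples, tail = overlap_add([1.0] * 36, tail)
samples, tail = overlap_add([2.0] * 36, tail)
print(samples[:3])
```

One such stored half is needed per sub-band and per channel, so a stereo decoder keeps 2 x 32 x 18 values between granules.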

Poly-phase Synthesis

The poly-phase synthesis (filter bank) block transforms the 32 sub-band blocks of 18 time-domain samples in each granule into 18 blocks of 32 PCM samples. This block can be divided into four parts: moving, DCT, matrix multiply, and overall adding. The flow of poly-phase synthesis is shown in Figure 14.

In the synthesis operation, the 32 sub-band values are transformed into the 64-value V vector via the DCT computation. The V vector is pushed into the FIFO buffer, and a new vector, the U vector, is built from the FIFO. Finally, the U vector is multiplied by the constant D window to obtain the W vector, and the 16 W vectors are added together. The final 32 samples form a PCM vector.


Figure 14: The flow of poly-phase synthesis
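The four parts of the synthesis can be sketched as below. The matrixing coefficients and the way U is assembled from the 1024-entry FIFO follow the MPEG-1 audio synthesis procedure and are assumptions with respect to Figure 14. The 512-coefficient D window is a table from the standard that is too long to list, so the sketch takes it as a constructor argument; the class name is hypothetical.

```python
import math

# Matrixing coefficients: 64 outputs from 32 sub-band samples.
N = [[math.cos((16 + i) * (2 * k + 1) * math.pi / 64) for k in range(32)]
     for i in range(64)]


class SynthesisFilterbank:
    def __init__(self, d_window):
        if len(d_window) != 512:
            raise ValueError("the D window has 512 coefficients")
        self.d = list(d_window)
        self.v = [0.0] * 1024          # FIFO holding 16 V vectors of 64 values

    def process(self, subband):
        """Turn 32 sub-band samples into 32 PCM samples."""
        if len(subband) != 32:
            raise ValueError("expected 32 sub-band samples")
        # DCT (matrixing), then moving: newest V vector enters at the front.
        v_new = [sum(N[i][k] * subband[k] for k in range(32)) for i in range(64)]
        self.v = v_new + self.v[:-64]
        # Build U from alternating halves of the stored V vectors.
        u = [0.0] * 512
        for i in range(8):
            for j in range(32):
                u[64 * i + j] = self.v[128 * i + j]
                u[64 * i + 32 + j] = self.v[128 * i + 96 + j]
        # Matrix multiply with the D window, then add the 16 W vectors.
        w = [u[i] * self.d[i] for i in range(512)]
        return [sum(w[j + 32 * i] for i in range(16)) for j in range(32)]
```

Calling `process` 18 times per granule, once for each time slot of the 32 sub-bands, yields the 18 blocks of 32 PCM samples described above.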

2-3 Overview of Pipeline Architecture

Pipeline architectures are widely used in microprocessor designs. Pipelining increases throughput by processing several instructions in parallel. In Figure 15, every stage of the synchronous pipeline is controlled by a global clock, whose period is set by the slowest pipeline stage. In the asynchronous pipeline, however, each stage can proceed at its own speed. Some instructions can even bypass stages they do not need, such as instructions 3 and 4 in Figure 15: instruction 3 does not need the WB stage, and instruction 4 does not need the EXE stage. In the synchronous pipeline, these instructions must still wait a complete clock cycle before moving to the next stage, but in the asynchronous pipeline they can pass quickly through (bypass) to the next stage.

Figure 15: Synchronous pipeline vs. asynchronous pipeline
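The latency argument can be illustrated with a toy model of a single instruction passing through the pipeline. The stage names other than EXE and WB, the delay values, and the function names are hypothetical; the model also ignores handshake overhead and stalls caused by other instructions.

```python
def sync_latency(stage_delays):
    """Every stage takes one clock period, set by the slowest stage."""
    clock_period = max(stage_delays.values())
    return clock_period * len(stage_delays)


def async_latency(stage_delays, stages_used, bypass_delay=0):
    """Each stage takes its own delay; unused stages cost only a bypass delay."""
    return sum(
        delay if name in stages_used else bypass_delay
        for name, delay in stage_delays.items()
    )


# Hypothetical stage delays in nanoseconds.
delays = {"IF": 2, "ID": 1, "EXE": 4, "MEM": 3, "WB": 1}
print("synchronous        :", sync_latency(delays))
print("asynchronous       :", async_latency(delays, set(delays)))
print("asynchronous no EXE:", async_latency(delays, set(delays) - {"EXE"}))
```

In this model the synchronous latency is fixed by the slowest stage regardless of what an instruction needs, while the asynchronous latency shrinks further when a stage is bypassed.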

2-4 Balsa Back-End

The Balsa back-end can be used to generate gate level netlists for supported CAD systems [1] [2]. In this section, we will describe some basic cells for the Xilinx FPGA generated by Balsa and some handshake components in the Balsa synthesis system.

2-4-1 Basic Elements

There are many basic cells generated by Balsa for the Xilinx FPGA, including AND, OR, NOR, XOR, NAND, BUF, XNOR, INV, FD (D-type flip-flop), FDC, and FDCE. The most important cell for an asynchronous circuit is the Muller C-element, shown in Figure 16, which can hold its previous state. When both inputs are 1, the output is set to 1. When both inputs are 0, the output is set to 0. For any other input combination, the output does not change. The Muller C-element is a fundamental component that is used extensively in asynchronous circuits.

Figure 16: The Muller C-element, (a) symbol (b) truth table (c) gate-level implementation
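The behavior of the C-element can be captured in a small behavioral model. This is a sketch of the state-holding rule described above, not of the gate-level implementation in Figure 16, and the class name and initial output value are assumptions.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, q=0):
        self.q = q                 # assumed initial output

    def update(self, a, b):
        if a == b:                 # both 1 sets q, both 0 clears q
            self.q = a
        return self.q              # otherwise the previous output is held


c = CElement()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, c.update(a, b))
```

The hold behavior is what lets a C-element wait for both of two events, which is why it appears throughout handshake circuits.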

| i0 | i1 | q |
|---|---|---|
| 0 | X | 1 |
| 1 | 1 | 0 |
| 1 | 0 | no change |

Figure 17: The NC2P-element (a) symbol (b) truth table (c) gate-level implementation

Figure 17 shows the NC2P element. When the input, i0, is equal to 0, the output, q, is set to 1. When the i0 and i1 are both 1, the output is set to 0. For other input conditions, the output is not changed.
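The same kind of behavioral model works for the NC2P element. The sketch follows the three input cases described for Figure 17; the class name and the initial output value are assumptions.

```python
class NC2P:
    """Behavioral model of the NC2P element."""

    def __init__(self, q=1):
        self.q = q                 # assumed initial output

    def update(self, i0, i1):
        if i0 == 0:                # i0 low forces the output high
            self.q = 1
        elif i1 == 1:              # both inputs high force the output low
            self.q = 0
        return self.q              # i0 = 1, i1 = 0: output is held


n = NC2P()
for i0, i1 in [(1, 0), (1, 1), (1, 0), (0, 1)]:
    print(i0, i1, n.update(i0, i1))
```

Unlike the symmetric C-element, the two inputs play different roles here: i0 alone can set the output, while clearing it needs both inputs.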

Figure 18: The S-element (a) symbol (b) gate-level implementation (c) handshaking protocol

Figure 18 shows the S-element. An S-element performs a series of handshakes. It has four signals that form two request/acknowledge handshake pairs, 'Ar'/'Aa' and 'Br'/'Ba'. It is composed of NC2P, NOR, and AND gates. The Balsa system usually replaces a C-element followed by an inverter with an NC2P element, because the behavior of an NC2P element is much like that of a C-element. This reduces the number of gates: a C-element with an inverter uses 3 AND gates, 1 OR gate, and 1 INV, whereas an NC2P element uses 2 AND gates, 1 NOR gate, and 1 INV.

2-4-2 Handshake Components

There are 40 handshake components in the Balsa system. Each handshake component has a gate-level implementation. In the following, we illustrate some of them.

Figure 19: The Fetch Component (a) handshake component (b) gate-level implementation

Figure 19 shows the Fetch component. This component is used to transfer data from input channels to variables, from variables to output channels, and from variables to variables.

Figure 20 and Figure 21 show the sequence and concurrent components. The sequence component activates its output signals one after another, and the concurrent component activates its output signals in parallel.

Figure 20: The Sequence Component (a) handshake component (b) gate-level implementation

Figure 21: The Concurrent Component (a) handshake component (b) gate-level implementation

Figure 22 shows the variable component, which is built from FD (D-type flip-flop) cells. The Balsa system maps a "variable" description to this component when translating handshake component files (.breeze). Data is stored when the signal write_0r is set, and data is read when the signal read_0r is set.

Figure 22: The variable component (a) handshake component (b) gate-level implementation, built around an FD cell (D, Q) with the signals write_0r, write_0a, write_0d, read_0r, read_0a, and read_0d

2-5 Concluding Remarks

In this chapter, we introduced the MP3 (MPEG-1 Layer 3) architecture. The frame structure of an MP3 file contains five parts: header, CRC, side information, main data, and ancillary data. The MP3 decoding flow processes these parts in sequence and can be divided into three main parts: bitstream decoding, inverse quantization, and frequency-to-time mapping. We then introduced the concept of the asynchronous pipeline. Finally, we described the Balsa back-end. The Balsa synthesis system is composed of about 40 handshake components, which can be translated into gate-level netlists.
