CHAPTER 1 INTRODUCTION
1.5 CONTENT ORGANIZATION
This thesis contains six chapters. Chapter 1 is the introduction. Chapter 2 introduces the three methods used in this thesis to embed data into MP3 files. Chapter 3 introduces the hardware and software environment in which the MP3 decoder with the data embedded decoder is developed and ported. Chapter 4 presents the implementation and performance verification of these methods. Chapter 5 concludes the thesis and discusses future work. Appendix A introduces the MPEG-1 Layer III codec algorithm, including its basic principles and functionality. Appendix B introduces the ODG standard, which is used to test music quality.
CHAPTER 2
Data Embedded Algorithm in the MPEG/Audio
In this chapter, we briefly describe the MPEG-1/Audio compression algorithm and the MPEG-1/Audio format. This serves as the necessary background for understanding our MPEG-1/Audio data embedded schemes. The data embedded technique is introduced in Section 2.2, including the principle, the applications, and the classification of data embedded algorithms. Section 2.3 introduces the data embedded algorithm used in this thesis to implement the data embedded codec for MPEG/Audio. It covers the data embedded encoder, including its principles, applications, advantages, and the methods used to embed data into MPEG/Audio, and the data embedded decoder, which extracts the embedded data.
2.1 Introduction to MPEG Audio
The ISO MPEG standard [3][4] comprises four standards, as shown in Fig. 1. MPEG-1 is divided into five parts: system, video, audio, compliance testing, and software simulation. The MPEG-1 audio algorithm is an international standard for digital audio compression and makes no assumptions about the nature of the audio source. It is suitable for audio-only applications as well as for audio combined with video data (MPEG Systems Coding).
[Figure: the ISO MPEG Standard comprises MPEG-1, MPEG-2, MPEG-7, and MPEG-21; MPEG-1 comprises System, Video, Audio, Compliance Testing, and Software Simulation; Audio comprises Layer 1, Layer 2, and Layer 3]
Fig. 1 The hierarchy of the ISO MPEG Standard
Depending on the application, the MPEG audio coding system is divided into three layers of increasing encoder complexity:
• Layer I
Layer I contains the basic mapping of the audio samples into 32 subbands, fixed segmentation to format the data into blocks, a psychoacoustic model for the bit allocation, and quantization. It is best suited to bit rates above 128 kbps per channel.
• Layer II
Layer II provides additional coding of bit allocation, scale factors, and samples. It targets bit rates around 128 kbps per channel.
• Layer III
Layer III introduces increased frequency resolution based on a hybrid filter bank. It uses a non-uniform quantizer and entropy coding (Huffman coding). It offers the best audio quality at bit rates around 64 kbps per channel.
MPEG audio compression is lossy and exploits the characteristics of the human auditory system (HAS). It removes the perceptually irrelevant parts of the audio and keeps the resulting distortion inaudible to the human ear, so it can provide compression ratios ranging from 2.7 to 24, as shown in Fig. 2. The compression ratio depends on the predefined fixed bit rate, which ranges from 32 kbps to 224 kbps.
[Figure: a source WAVE file compressed by Layer I (1:4 compression ratio), Layer II (1:8 compression ratio), and Layer III (1:12 compression ratio)]
Fig. 2 The comparison of the ISO MPEG Audio standard compression ratio
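The relation between bit rate and compression ratio can be checked with a short calculation. The sketch below is only illustrative: the source format (16-bit PCM at 48 kHz, one channel) and the function name are assumptions, not values stated in this thesis. Under those assumptions the lowest bit rate quoted above, 32 kbps, gives the upper ratio of 24.

```python
def compression_ratio(sample_rate_hz, bits_per_sample, coded_bitrate_bps):
    """Ratio of the uncompressed PCM bit rate to the coded bit rate (one channel)."""
    pcm_bitrate_bps = sample_rate_hz * bits_per_sample
    return pcm_bitrate_bps / coded_bitrate_bps


# Assumed source format: 16-bit PCM sampled at 48 kHz, single channel.
low_rate = compression_ratio(48_000, 16, 32_000)    # lowest bit rate in the text
high_rate = compression_ratio(48_000, 16, 224_000)  # highest bit rate in the text
print(low_rate, round(high_rate, 2))
```

The ratio obtained at the highest bit rate depends on the sampling rate of the source, so the lower bound quoted in the text is not reproduced by this single assumed format.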
2.1.1 Introduction of the MP3 Encoder Algorithm
The description of the encoding process is based on the block diagram in Fig. 3.
The input audio signal, a single-channel PCM signal, is passed through a polyphase filter bank, which divides it into 32 equally spaced frequency subbands. After this step, the samples in each subband are still in the time domain. A Modified Discrete Cosine Transform (MDCT) is then used to map the samples in each subband to the frequency domain. In parallel, the input signal is transformed by an FFT and passed to a psychoacoustic model that determines the ratio of the signal energy to the masking threshold for each subband. The distortion control block uses these signal-to-mask ratios (SMR) to decide how to distribute the available code bits among the subband signals so that the quantization noise is as inaudible as possible. The quantized subband samples are then coded with lossless Huffman coding to reduce their redundancy. Finally, the last block packs the Huffman-coded samples and the side information into a bitstream according to the MPEG/Audio standard.
Fig. 3 MPEG-1/Audio Layer 3 encoder block diagram [5]
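The MDCT step and its inverse can be illustrated with a direct, unoptimized implementation. This is a minimal sketch under stated assumptions, not the code used in this thesis: it uses the textbook MDCT definition with a sine window and 50% overlap-add, and the function names are hypothetical. The block size N = 18 used in the check corresponds to the long-block case of Layer III (36 input samples give 18 frequency lines per subband, and 32 subbands give 576 lines).

```python
import math


def sine_window(n_coef):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n + N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coef)) for n in range(2 * n_coef)]


def mdct(block):
    """Forward MDCT: 2N time samples -> N frequency coefficients."""
    n_coef = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_coef)
    ]


def imdct(coefs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (scale 2/N)."""
    n_coef = len(coefs)
    return [
        2.0 / n_coef * sum(
            c * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for k, c in enumerate(coefs))
        for n in range(2 * n_coef)
    ]


def analysis_synthesis(signal, n_coef):
    """Windowed MDCT followed by windowed IMDCT with 50% overlap-add."""
    window = sine_window(n_coef)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        block = [signal[start + n] * window[n] for n in range(2 * n_coef)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * n_coef):
            out[start + n] += rebuilt[n] * window[n]
    return out


signal = [math.sin(0.3 * i) for i in range(72)]
restored = analysis_synthesis(signal, 18)
```

Each block alone yields time-aliased samples. The aliasing cancels only where two overlapping blocks are added, so the first and last N samples of `restored` are not reconstructed in this sketch.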
2.1.2 Introduction of the MP3 Decoder Algorithm
This section describes the functionality of the MPEG-1/Audio Layer III decoder. The decoding process is based on the block diagram in Fig. 4. The decoder has three main parts: “Decoding of Bitstream”, “Inverse Quantization”, and “Frequency to Time mapping”.
The input coded bitstream is passed through the first part, which synchronizes to the stream and extracts the quantized frequency lines and other information for each frame. The inverse quantization part dequantizes the frequency lines according to the output of the previous part.
Finally, the last part performs the inverse of the MDCT and the analysis polyphase filter bank used in the encoder. Its output is the audio signal in PCM format.
Fig. 4 MPEG-1/Audio Layer III decoder block diagram
2.2 Introduction of Data Embedded methods
Many watermarking techniques exist [8], differing in their application areas and purposes. Data embedding is a kind of watermarking and is also related to steganography. The word steganography is derived from the Greek words steganos (covered) and graphein (to write) and therefore means “covered writing”.
Data embedding for MPEG/Audio is a technique for transmitting additional data along with audio signals over existing distribution channels.
The principle, characteristics, applications, and classification of watermarking are introduced in the following subsections.
2.2.1 The Principle of Watermarking Algorithm
Mathematically, data embedding can be expressed as in Eq. 1. If an original audio signal A and a watermark W are given, the watermarked audio signal A′ is represented as
$A' = A + f(A, W)$    Eq. 1
Fig. 5 shows the complete watermarking process, which consists of inserting and extracting the watermark.
Fig. 5 Combination of the watermarking process on MPEG/Audio
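Eq. 1 leaves the function f unspecified. As a minimal sketch, assume the hypothetical choice f(A, W) = s·|A|·W, where W is a sequence of +1 and −1 values and s is a small strength factor. Neither this choice of f nor the names below are taken from this thesis; they only show the additive structure of the equation.

```python
def embed_additive(audio, watermark, strength=0.1):
    """Return A' = A + f(A, W) with the assumed f = strength * |A| * W."""
    if len(audio) != len(watermark):
        raise ValueError("audio and watermark must have the same length")
    return [a + strength * abs(a) * w for a, w in zip(audio, watermark)]


host = [1.0, -2.0, 0.5, 0.0]
mark = [1, -1, -1, 1]  # watermark as a +/-1 sequence (assumption)
marked = embed_additive(host, mark)
print(marked)
```

With this choice the change to each sample scales with the sample's own magnitude, so silent samples are left untouched.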
2.2.2 Main Characteristics for Watermarking Algorithm
Many characteristics may be required of an effective watermark, but the following are the most important.
• Invisibility
Human senses should not be able to tell the difference between the host media and the watermarked media. This is an essential requirement of all data hiding systems, including watermarking systems, and it is why a watermark hidden in audio must be inaudible.
• Robustness
Robustness, also an essential requirement, is the ability of the watermark to survive signal processing operations such as filtering and compression, and it determines how identifiable the retrieved watermark is. The embedding algorithm must withstand different kinds of signal processing operations. In general, the more robust a watermarking technique is, the less capacity it offers for embedding.
• Security
After the watermark is embedded, anyone who wants to extract it must hold some secret information related to the original signal. In general, it is not easy to keep the embedding algorithm secret, so the security of the embedding system relies on a secret key that represents the locations where the watermark is embedded. Using the secret key as the seed of a random number generator yields a sequence of random numbers, which an algorithm uses to embed the watermark. The secret key is therefore necessary to extract the watermark from the embedded media.
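The role of the secret key described under Security can be sketched as follows. The example assumes integer samples and a least-significant-bit embedding rule, both hypothetical and chosen only for brevity. It also uses Python's general-purpose `random` module, which is not cryptographically secure, so a real system would need a stronger generator.

```python
import random


def key_positions(secret_key, n_samples, n_bits):
    """Derive n_bits distinct embedding positions from the secret key."""
    rng = random.Random(secret_key)  # the key is the seed of the generator
    return rng.sample(range(n_samples), n_bits)


def embed_bits(samples, bits, secret_key):
    """Write each bit into the least significant bit of a key-selected sample."""
    marked = list(samples)
    positions = key_positions(secret_key, len(samples), len(bits))
    for pos, bit in zip(positions, bits):
        marked[pos] = (marked[pos] & ~1) | bit
    return marked


def extract_bits(samples, n_bits, secret_key):
    """Read the bits back from the positions derived from the same key."""
    positions = key_positions(secret_key, len(samples), n_bits)
    return [samples[pos] & 1 for pos in positions]


cover = list(range(-50, 50))
payload = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_bits(cover, payload, secret_key=2024)
recovered = extract_bits(marked, len(payload), secret_key=2024)
```

Because the same seed reproduces the same position sequence, the extractor only needs the key and the number of bits, not the original signal.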
2.2.3 Applications of Watermarking Algorithm
• Compatible Transmission of Data (Watermarks)
Watermarking algorithms basically provide a data transmission channel that can be used within existing distribution channels. The data hiding (watermark) transmission is backward compatible in the sense that every existing channel that is able to carry music is also able to carry watermarked music. Hence watermarking can be used in a wide field of applications.
• Digital Rights Management (DRM)
Digital Rights Management is often considered the main application of watermarking. Watermarks can provide the means to fulfill the demands of DRM, such as proof of ownership, access control for digital media, and tracing of illegal copies.
• Broadcasting
A variety of applications for audio watermarking are found in the field of broadcasting. These include program type identification, advertising research, and broadcast coverage research.
2.2.4 Classification on Watermarking Techniques
The data embedded technique has different insertion and extraction methods, which can be classified and analyzed from various points of view, as in Table 1.
Table 1 Classification of watermarking according to several viewpoints [9]
Classification                   Contents
Inserted media category          text, image, audio, video
Perceptivity of watermark        visible, invisible
Robustness of watermark          robust, semi-fragile, fragile
Inserting watermark type         noise, image, format
Processing method                Spatial domain: LSB, patchwork, random function
                                 Transform domain: look-up table, spread spectrum
Necessary data for extraction    private, semi-private, public watermarking
File size                        vary or not
2.3 Data Embedded Codec Algorithm for MPEG/Audio
This section introduces the properties of the data embedded codec and its algorithm, which includes several methods for embedding data into MPEG/Audio.
2.3.1 The Properties of Data Embedded Codec
In this thesis, the MPEG/Audio signal is the inserted media because the data embedded technique is based on the MPEG/Audio specification. The embedded data is private information that is invisible and fragile. The embedded data can be a file of any format or simply a bitstream. In other words, any data can be embedded into the MPEG/Audio media regardless of its type, as long as its size does not exceed the embedding capacity of the media.
This thesis uses three methods for data embedding: embedding data into the count1 region, embedding data into the bit reservoir, and modifying the MP3 encoder from floating point to fixed point.
Recent research has produced a number of algorithms for embedding and retrieving watermarks in audio signals [10][11][12][13]. While most known systems operate in the uncompressed domain (PCM watermarking), few are capable of embedding watermarks in the compressed domain (bitstream watermarking), as this thesis does. Following the viewpoints of Table 1, the classification of the watermarking algorithm proposed in this thesis is summarized in Table 2:
Table 2 Classification of the watermarking technique in this thesis
Classification                   Contents
Inserted media category          MPEG/Audio
Perceptivity of watermark        invisible
Robustness of watermark          fragile
Inserting watermark type         any format
Processing method                Frequency domain: spread spectrum of high frequency
Necessary data for extraction    public watermarking
File size of inserted media      no change
• Inserted media category: MPEG/Audio
The design flow of the data embedded method is based on the properties of the MPEG/Audio specification. MPEG-1 Layer III (MP3) is used for embedding data in this thesis. After the MP3 encoder performs the MDCT, which transforms the signal from the time domain to the frequency domain, the frequency lines of the main data are distributed from low frequency to high frequency within a frame, as shown in Fig. 41. Data is embedded in the frequency domain by the MP3 encoder and extracted by the MP3 decoder.
• Perceptivity of watermark: invisible
The embedded data, as a watermark, must be imperceptible because the inserted media is an audio file. The embedded data must either not affect the quality of the original music or affect it so little that the change cannot be heard. The MP3 decoder with the data embedded decoder can extract the embedded data stream and reconstruct the original file from it.
• Robustness of watermark: fragile
Embedding data into MP3 music is an additional service offered by content providers. The purpose of embedding data is not to provide additional protection for the MP3 music; on the contrary, the embedded data is fragile and easily distorted when the music is compressed. The more robust the watermark is, the less space is available for data embedding. A fragile method is therefore preferred, because it leaves more space for data embedding.
• Inserting watermark type: any format
The data embedded into the audio file can be of any file format, because the embedded data stream has a header that records the synchronization, the embedded file size, and the embedded file length, followed by the file data stream. Whatever the file type, the embedded data stream is just a sequence of “0” and “1” bits. The extractor in the MP3 decoder extracts the embedded data, and an embedded data analyzer reconstructs the embedded files.
• Processing method: frequency domain
The data is embedded into the frequency lines produced by the MDCT, which transforms the signal from the time domain to the frequency domain.
• Necessary data for extraction: public watermark
The embedded data is a public watermark. It can only be extracted by a special decoder.
• File size: no change
After data is embedded, the MP3 file has the same size as an MP3 file produced by any other MP3 encoder. Consider two files, one encoded by the MP3 encoder with the data embedded encoder and the other encoded by any other MP3 encoder at the same bit rate and sampling rate. The two MP3 files have the same size. They can only be told apart by the MP3 decoder with the embedded data analyzer: embedded information can be extracted from the first file but not from the second.
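The header-plus-bitstream idea described under "Inserting watermark type" can be sketched as below. The layout is an assumption made for illustration only: a 16-bit sync word followed by a 32-bit payload length. The sync value `0xA55A`, the field widths, and the function names are hypothetical and are not the format defined in this thesis.

```python
import struct

SYNC_WORD = 0xA55A  # hypothetical sync pattern, not taken from the thesis


def frame_payload(payload: bytes) -> bytes:
    """Prefix the payload with a sync word and its length in bytes."""
    return struct.pack(">HI", SYNC_WORD, len(payload)) + payload


def to_bits(data: bytes):
    """Expand bytes into a list of 0/1 values, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits):
    """Pack a list of 0/1 values (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def parse_frame(data: bytes) -> bytes:
    """Check the sync word and return the payload announced by the header."""
    sync, size = struct.unpack(">HI", data[:6])
    if sync != SYNC_WORD:
        raise ValueError("sync word not found")
    return data[6:6 + size]


bits = to_bits(frame_payload(b"lyrics"))
restored = parse_frame(from_bits(bits))
```

Since the framed stream is only a sequence of bits, the payload bytes can come from a file of any type; the length field tells the analyzer where the payload ends.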
2.3.2 The Structure of Data Embedded Codec
The data embedded codec is divided into two parts: the data embedded encoder and the data embedded decoder. The data embedded encoder is typically used by content providers to offer additional services, such as embedding lyrics, basic information about the singer, photos of the singer, or even customer information into the MP3 audio. Almost any information can be embedded into the MP3 file, provided that its size stays within the upper bound for embedded data. The data embedded decoder is used by end users and is combined with the MP3 decoder. It can extract all the information embedded in the MP3 file and show it on a display. The MP3 decoder with the data embedded decoder has also been ported to the ADSP-2181 to form a portable device.
Fig. 6 shows the structure of the data embedded encoder. The encoder has two data sources: the raw audio data and the data to be embedded. The embedded data need not be a single file; it may consist of two or more files. If several files were fed to the encoder at the same time, the encoder would mix them up and the decoder could not extract the embedded data. A packaging program is therefore designed to pack all the files into a single file with a special format for encoding. The packaged file and the raw audio data are input together to the MP3 encoder with the data embedded encoder, which outputs an MP3 file with embedded data.
The file size after embedding is the same as that of a file encoded by another MP3 encoder. The MP3 file with embedded data can also be played by any general MP3 player, and the embedded data does not affect the quality of the music.
Fig. 6 The structure of data embedded encoder
Fig. 7 shows the structure of the data embedded decoder, which is the inverse of the encoder flow. The MP3 file with embedded data is input to the MP3 decoder with the data embedded decoder. The decoder has two outputs: the raw music data and the embedded data stream. The raw music data is the same CD-quality music that any other general MP3 decoder would produce. The embedded data stream is passed to the data stream analyzer, which reconstructs the original embedded files. The files are then shown on the display or saved to disk.
Fig. 7 The structure of data embedded decoder
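The packaging program on the encoder side and the data stream analyzer on the decoder side can be pictured as a pack and unpack pair. The container layout below (a file count, then a name length, content length, name, and content for each file) is an assumption made for this sketch and is not the special format used in this thesis. The function names are likewise hypothetical.

```python
import struct


def pack_files(files):
    """Pack {name: content} into one byte string: a count, then one record per file."""
    out = bytearray(struct.pack(">H", len(files)))
    for name, content in files.items():
        encoded = name.encode("utf-8")
        out += struct.pack(">HI", len(encoded), len(content))
        out += encoded + content
    return bytes(out)


def unpack_files(blob):
    """Inverse of pack_files: rebuild the {name: content} mapping."""
    (count,) = struct.unpack_from(">H", blob, 0)
    offset, files = 2, {}
    for _ in range(count):
        name_len, size = struct.unpack_from(">HI", blob, offset)
        offset += 6
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        files[name] = blob[offset:offset + size]
        offset += size
    return files


originals = {"lyrics.txt": b"la la", "cover.jpg": b"\xff\xd8\xff"}
package = pack_files(originals)
rebuilt = unpack_files(package)
```

Length prefixes let the analyzer separate the files again without scanning for delimiters, which matters because binary content such as images may contain any byte value.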
2.3.3 The Methods of Data Embedded Codec
This section introduces several methods for data embedding in the following subsections:
2.3.3.1 Embedded Data into the Count1 Region
The count1 region stores the frequency lines located at relatively high frequencies in a frame. The energy of the count1 region is smaller than that of the big-value region, so embedding data into the count1 region has only a small effect on the quality of the music.
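To make the regions concrete, the sketch below partitions a list of quantized frequency lines, ordered from low to high frequency, in the way commonly used by reference MP3 encoders: trailing pairs of zeros form the zero region, preceding quadruples whose magnitudes do not exceed 1 form the count1 region, and the remaining lines form the big-value region. The function name and the short example spectrum are hypothetical, and the sketch shows only where the count1 region lies, not how this thesis embeds data into it.

```python
def partition_regions(quantized):
    """Return (big_values_end, count1_end) for quantized frequency lines.

    Lines [0, big_values_end) are the big-value region,
    lines [big_values_end, count1_end) are the count1 region,
    and lines [count1_end, len(quantized)) are all zero.
    """
    end = len(quantized)
    # Zero region: trailing pairs of zeros at the high-frequency end.
    while end > 1 and quantized[end - 1] == 0 and quantized[end - 2] == 0:
        end -= 2
    count1_end = end
    # Count1 region: quadruples whose values are all -1, 0, or 1.
    while end > 3 and all(abs(v) <= 1 for v in quantized[end - 4:end]):
        end -= 4
    return end, count1_end


lines = [5, -3, 2, 4, 1, 0, -1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
big_end, c1_end = partition_regions(lines)
count1_lines = lines[big_end:c1_end]
```

In this example the count1 region holds the four lines `[-1, 1, 0, 1]`, all of small magnitude, which is consistent with the low energy of this region described above.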
General watermarking techniques refer to the absolute threshold of hearing from the psychoacoustic model [14] used in music compression, as shown in Fig. 8. Signal energy below the absolute threshold of hearing cannot be heard, so the watermark is usually hidden below this threshold as well. Because the embedded signal cannot be heard, it does not affect the quality of the music.
Fig. 8 The Absolute Threshold of Hearing
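The general shape of the curve in Fig. 8 can be reproduced with a widely used closed-form approximation of the threshold in quiet, usually attributed to Terhardt. The formula is not given in this thesis, so its use here is an assumption, and the function name is hypothetical.

```python
import math


def absolute_threshold_db(freq_hz):
    """Approximate threshold in quiet (dB SPL) at freq_hz, Terhardt's approximation."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


for freq in (100, 1000, 3300, 16000):
    print(freq, round(absolute_threshold_db(freq), 1))
```

The approximation is lowest around 3 to 4 kHz, where hearing is most sensitive, and rises steeply toward both low and very high frequencies.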
The psychoacoustic model accounts for a large share of the computation in the MP3 encoder, about 20% of the total. The case considered here is MP3 music encoded without the psychoacoustic model at a bit rate of 128 kbps. Most MP3 music today is encoded at about 128 kbps, and a few songs use a higher bit rate for higher quality. Few songs are encoded at 96 kbps or less, because the quality is noticeably worse. To shorten the encoding time, the psychoacoustic model is removed from the MP3 encoder for the embedded system.
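As a rough estimate of the gain, assume that the psychoacoustic model takes a fraction $p = 0.20$ of the encoding time and that the cost of the remaining stages is unchanged when it is removed. Both are simplifying assumptions. The encoding time then falls to $1 - p$ of its original value, and the speedup $S$ is

```latex
S = \frac{1}{1 - p} = \frac{1}{1 - 0.20} = 1.25
```

Under these assumptions the encoder runs about 25% faster once the psychoacoustic model is removed.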
Removing the psychoacoustic model speeds up MP3 encoding. On the other hand, it is a disadvantage for data embedded techniques, which can easily degrade the quality of the music without the guidance of the psychoacoustic model. To embed information without a psychoacoustic model, we must therefore look for other places in the music where information can be embedded. The main requirement for such a place is that embedding there does not affect the original quality of the music, or affects it as little as possible.
In this thesis, the data embedded method is based on the principle that human ears are not equally sensitive to different frequency bands. Regardless of how loud a sound is or where its source is located, ordinary listeners distinguish low-frequency sound relatively easily.
Human ears are, however, relatively insensitive to high-frequency sound. This property, namely that human ears cannot distinguish the phase of high-frequency sound, is exploited during MP3 encoding.
The MP3 media data embedded technique is designed to exploit the different sensitivity of human ears to different frequency bands. Regardless of the volume or source of a sound, human ears are normally more sensitive to the phase of lower frequencies and less sensitive to the phase of higher frequencies. Using this characteristic, we develop a modified MP3 coding technique that embeds the data in the high-frequency band while compressing the MP3 data file, which reduces the negative influence on sound quality. Then an MP3 media data decoder is being