
Adaptive Wavelet Quantization Index Modulation Technique for Audio Watermarking


Jong-Tzy Wang 1, Ming-Shan Lai 2, Kai-Wen Liang 2, and Pao-Chi Chang 2
1 Department of Electrical Engineering, Jin Wen Institute of Technology, Taiwan
2 Department of Communication Engineering, National Central University, Taiwan
{jwang, mslai, kwliang, pcchang}@vaplab.ee.ncu.edu.tw

ABSTRACT

Quantization index modulation (QIM), which is often used in watermarking, provides a tradeoff between robustness and transparency. In this paper, we propose a robust audio watermarking technique that adopts the wavelet QIM method with adaptive step sizes for blind watermark extraction. Since the wavelet transform offers both temporal and frequency resolution, it is suitable for audio signal processing. The adaptive step size technique is applied to audio signals with different characteristics. It is designed under the criterion that the SNR must be maintained above 20 dB, so that the watermark is both robust and transparent. No side information on the step sizes needs to be transmitted. The experimental results show that the embedding capacity is around 4 bits/frame and that the watermark is robust against MP3 compression at 64 kbps, resampling, requantization, and Gaussian noise corruption. The NC values after attacks are all above 0.8 in the experiments, so the copyright can easily be distinguished.

Keywords: robust watermarking, blind detection, wavelet packet, QIM

1: Introduction

Watermarking techniques that provide solutions to the copyright problem are becoming more and more important as the distribution of digital media over the Internet grows. Conventional audio watermarking algorithms operate in the time and the frequency domains. Since the wavelet transform offers both temporal and frequency resolution, it is suitable for analyzing audio signals that need different resolutions in different bands. Recently, several watermarking techniques in the wavelet domain were proposed [1][2]. Chen et al. [3] introduced the QIM technique for characterizing the inherent tradeoff between the robustness and the rate-distortion of the embedding. The embedding scheme [1] proposed by T.-T.
Lu uses the AIM (Activity Index Modulation) technique, in which QIM is applied to the image activity, represented by the sum of the absolute pixel values, for embedding. A potential problem of that scheme is that the image quality and the robustness cannot both be maintained, since it uses the same step size for image signals with different characteristics. In the watermarking scheme [2] proposed by P. Bao and X. Ma, adaptive step sizes are used for different image signals. However, the step size must be sent to the extraction end, which makes the overhead very large.

We propose an audio watermarking system based on wavelet packet decomposition and psychoacoustic modeling. The system delivers perceptually transparent audio quality and is robust against various signal processing and malicious attacks. We analyze signals in the wavelet domain and use adaptive step sizes for audio signals with different characteristics, based on the criterion that the SNR must be maintained above 20 dB, so that the watermark is both robust and transparent. In addition, no side information on the step size needs to be sent to the extraction end. The relationship between the QIM step size and the SNR is analyzed, and both the embedding end and the extraction end use the same formula to obtain the optimal step size.

Section 2 introduces the quantization index modulation technique. The proposed scheme is described in Section 3, Section 4 shows the experimental results, and finally the conclusion is provided in Section 5.

2: Quantization Index Modulation

The determination of the step size is a tradeoff between robustness and audio quality. In this section, we discuss how the QIM step size affects the noise margin and the SNR. When QIM is used to embed a watermark, we first find the maximum value of the audio signal, and the range between 0 and that maximum value is divided into intervals. Each interval is assigned an index of 0 or 1. We define the polarity of a signal value by the index of the interval in which it is located. To embed a watermark bit, we shift the value either to the median of its own interval or to the median of the nearest neighboring interval, depending on the relationship between the polarity of the signal and the watermark bit. An example is shown in Fig. 1: if the watermark bit and the polarity are the same (right black point), the value is simply moved to the median of the same interval; if the watermark bit and the polarity are different (left black point), the value is shifted to the median of the nearest neighboring interval. The quantization error is at most ±Δ, as expressed in (1), where Δ is the step size,

$Q(x) = x \pm \Delta$.   (1)

The mean squared quantization noise power is derived as

$\overline{q^2} = \frac{1}{2\Delta}\int_{-\Delta}^{\Delta} \varepsilon^2 \, d\varepsilon = \Delta^2/3$.   (2)

Fig. 1. Step Size vs. SNR.

This shows that the mean squared quantization noise power is determined by the step size. On the other hand, the noise margin, which determines the robustness of the watermark, is also proportional to the step size. Therefore, the selection of the step size is basically a tradeoff between audio quality and robustness, and selecting a suitable step size is extremely important in a QIM-based watermarking system. Our proposed adaptive system is capable of maintaining the SNR of the watermarked audio above 20 dB while providing good robustness at the same time.
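For illustration, the following is a minimal Python sketch of the basic scalar QIM operation described above: a value is moved to the median of an interval whose index parity matches the watermark bit, and the bit is recovered blindly from the parity of the interval in which the received value falls. The function names and the single-value interface are illustrative assumptions, not the paper's implementation.

```python
import math

def qim_embed(value, bit, delta):
    """Embed one bit by moving `value` to the median of an interval
    whose index parity equals `bit` (intervals of width `delta`)."""
    idx = math.floor(value / delta)                  # interval index of the value
    if idx % 2 != bit:                               # polarity differs from the bit:
        frac = value / delta - idx                   # position inside the interval (0..1)
        idx = idx + 1 if frac >= 0.5 else idx - 1    # nearest neighboring interval
    return (idx + 0.5) * delta                       # median (center) of the chosen interval

def qim_extract(value, delta):
    """Blind extraction: the bit is the parity of the interval index."""
    return math.floor(value / delta) % 2

# Tiny check: the embedded value never moves by more than delta, as in (1).
delta = 4.0
for v, b in [(10.3, 0), (10.3, 1), (-7.2, 1)]:
    w = qim_embed(v, b, delta)
    assert qim_extract(w, delta) == b
    assert abs(w - v) <= delta
```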

3: The Proposed Watermarking Strategy

This section describes the signal flow of the watermark embedding and the blind watermark extraction of our proposed adaptive QIM algorithm.

3.1: Embedding Algorithm

The embedding algorithm is illustrated in Fig. 2. The original audio signal is first segmented into frames, with each frame denoted as x = {xn, n = 1, 2, ...}, and divided into 29 subbands via the wavelet packet decomposition. The bandwidth allocation of the subband decomposition structure is close to the critical band structure of the human auditory system [4]. The watermark is embedded in the wavelet domain by the QIM method to produce the watermarked audio x' = {x'n, n = 1, 2, ...}.

Fig. 2. Embedding diagram.

3.1.1. Scramble watermark

Perform the scrambling operation on the watermark sequence Wi to obtain Vi,

$V_i = W_i \oplus T_i, \quad i = 1 \sim N_1$   (3)

where Ti is a random binary sequence and N1 is the length of Wi in bits.

3.1.2. Selection of wavelet packet subbands

An audio signal is transformed into wavelet packet subbands:

$y = WP(x), \quad x = \{x_n,\ n = 1 \sim m_1\}$   (4)

where y = {yn, n = 1 ~ m1} and m1 is the number of samples in a frame. According to the psychoacoustic model, middle-low subbands are used for watermark embedding to maintain both robustness and transparency. Therefore, we choose the wavelet coefficients from n1 to n2 out of the m1 coefficients for embedding. In addition, based on the experiments in [5], we adopt the simplest wavelet basis, the Haar wavelet, in our system.

3.1.3. Permutation and Block Composition

In order to avoid damage from burst errors, the wavelet coefficients yn are first processed by a random permutation Pw(·) to yield Z = {Zn, n = n1 ~ n2},

$Z = P_w(y), \quad y = \{y_n,\ n = n_1 \sim n_2\}$.   (5)

The permuted wavelet coefficients in each subband are grouped into blocks of a suitable size, such as 8 coefficients, and the mean αj of each block is calculated, where j is the block index, j = 1 ~ N:

$\alpha_j = \frac{1}{8}\sum (z_n), \quad n = n_1 \sim n_2,\ j = 1 \sim N, \quad N = \left\lfloor \frac{n_2 - n_1}{8} \right\rfloor$   (6)

3.1.4. Adaptive Enhanced QIM

The adaptive enhanced QIM consists of two major steps, the determination of the number of intervals and the calculation of Index_numberj, which are described in detail as follows.

Determination of the number of intervals (I): First, we choose the maximum value of αj over all blocks:

$\alpha_{MAX} = \mathrm{MAX}(\alpha_j), \quad j = 1 \sim N$.   (7)

The initial number of intervals Ii is calculated as

$I_i = \alpha_{MAX} / \Delta_k$   (8)

where Δk is the step size determined by the adaptive algorithm described below.

Based on the different characteristics of the audio signal in the temporal domain, we use an adaptive step size for each frame, so that the watermark is robust while good audio quality is maintained.
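As a rough illustration of the embedding front end (the scrambling of eq. (3) and the block composition of eqs. (5)-(8)), a Python sketch is given below. Taking the mean of coefficient magnitudes is an assumption based on the later description of αj as an average magnitude; the function names and stand-in data are illustrative only.

```python
import numpy as np

def scramble(watermark_bits, key_bits):
    """Eq. (3): V_i = W_i XOR T_i, with T_i a random binary key sequence."""
    return np.bitwise_xor(watermark_bits, key_bits)

def block_means(z, block_size=8):
    """Eqs. (5)-(6): group the permuted coefficients z_n into blocks of 8
    and compute the mean magnitude alpha_j of each block."""
    n_blocks = len(z) // block_size                      # N = floor((n2 - n1) / 8)
    blocks = np.abs(z[:n_blocks * block_size]).reshape(n_blocks, block_size)
    return blocks.mean(axis=1)                           # alpha_j, j = 1..N

def initial_interval_count(alphas, delta_k):
    """Eqs. (7)-(8): alpha_MAX over all blocks and the initial I_i."""
    alpha_max = alphas.max()
    return alpha_max, alpha_max / delta_k

# Usage sketch with random stand-in data.
rng = np.random.default_rng(0)
w = rng.integers(0, 2, 64)                               # watermark bits W_i
t = rng.integers(0, 2, 64)                               # random binary key T_i
v = scramble(w, t)                                       # scrambled bits V_i
z = rng.normal(0.0, 10.0, 256)                           # permuted wavelet coefficients
alpha_max, i_init = initial_interval_count(block_means(z), delta_k=5.0)
```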

No side information on the step sizes needs to be transmitted. According to the requirement of IFPI [6], the SNR of a watermarked audio signal of good quality must be above 20 dB. Therefore, we start the step size from a small value and increase it until the SNR is equal to or larger than 20 dB. From the experimental results, we find that the step size Δk varies roughly in proportion to the block mean (αMAX)k, so the step size can be expressed as the linear equation (9),

$\Delta_k = \Delta_m + \frac{\Delta_M - \Delta_m}{\beta_M - \beta_m}\left((\alpha_{MAX})_k - \beta_m\right)$   (9)

where $\beta_M = \mathrm{MAX}((\alpha_{MAX})_k)$, $\beta_m = \mathrm{MIN}((\alpha_{MAX})_k)$, $\Delta_m = \mathrm{MIN}(\Delta_k)$, and $\Delta_M = \mathrm{MAX}(\Delta_k)$; from the statistics, Δm = 5, ΔM = 102.33, βM = 1218, and βm = 50.

It is important to maintain the same number of intervals at the extraction end to avoid watermark decoding errors. In order to send no side information, the number of intervals I is recalculated at the extraction end by (8), too. Dead zone evacuation is used to enhance the robustness of the calculated I:

$\sigma_1 = \left[0.5 - (I_i - \lfloor I_i \rfloor)\right]\Delta_k, \quad \alpha'_{MAX} = \alpha_{MAX} + \sigma_1$   (10)

where σ1 is the parameter used to change the original audio signal so that αMAX increases to α'MAX. α'MAX is then used to calculate I', so that $I' = (\lceil I_i \rceil + \lfloor I_i \rfloor)/2$. Because the number of intervals must be an integer, we floor this value to get I, which is described as

$I = \lfloor I' \rfloor$.   (11)

Calculation of Index_numberj: Before embedding the watermark, we calculate Index_numberj for each block as in (12), where αj is the average magnitude of the permuted wavelet coefficients:

$\mathrm{Index\_number}_j = \left\lfloor \frac{\alpha_j}{I} \right\rfloor \bmod 2, \quad j = 1 \sim N$   (12)

In our proposed algorithm, the basic concept of QIM is used and the range of the block means is divided into intervals according to I. We calculate Aj based on the step size Δk to determine the interval in which αj is located:

$A_j = \frac{\alpha_j}{\Delta_k}, \quad j = 1 \sim N$.   (13)

It is important to keep αMAX maximal because αMAX is used to determine the number of intervals I. In the process of embedding the watermark, we need to change specific αj to make sure that αj remains less than αMAX even after embedding, because αj is modulated for embedding and its value may exceed αMAX. This may happen when αj is located in the maximal interval. On the other hand, the block that αMAX belongs to should be kept unchanged. In the case that αj is in the maximal interval, we decrease Aj by two to enhance correct extraction,

$A'_j = A_j - 2$.   (14)

To obtain this result, we first modify the coefficients in each block,

$z'_n = \begin{cases} z_n - 2\Delta_k, & \text{if } z_n \ge 0 \\ z_n + 2\Delta_k, & \text{if } z_n < 0 \end{cases}$   (15)

Since α'j is calculated from z'n, we get

$\alpha'_j = \alpha_j - 2\Delta_k$.   (16)

Finally, A'j calculated from α'j will match the desired condition in (14). In cases where αj is not in the maximal interval, the coefficients are kept unchanged.

Embedding scheme: If Index_numberj and the permuted watermark bit Vi are the same, α'j is moved by σ2 to the center of the interval it belongs to,

$\sigma_2 = \left[0.5 - (A'_j - \lfloor A'_j \rfloor)\right]\Delta_k, \quad \alpha''_j = \alpha'_j + \sigma_2$.   (17)

If Index_numberj and the permuted watermark bit Vi are different, α'j is moved by σ2 to the center of the nearest neighboring interval,

$\sigma_2 = \begin{cases} \left[1.5 - (A'_j - \lfloor A'_j \rfloor)\right]\Delta_k, & \text{if } (A'_j - \lfloor A'_j \rfloor) \ge 0.5 \\ \left[(\lceil A'_j \rceil - A'_j) - 1.5\right]\Delta_k, & \text{if } (A'_j - \lfloor A'_j \rfloor) < 0.5 \end{cases}, \quad \alpha''_j = \alpha'_j + \sigma_2$.   (18)
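The per-block embedding shift of eqs. (13), (17) and (18) can be sketched as follows, assuming the step size Δk has already been chosen by (9), the block parity Index_numberj has been computed as in (12), and the maximal-interval handling of (14)-(16) has already been applied where needed. The function name and interface are illustrative assumptions, not the paper's implementation.

```python
import math

def embed_block_mean(alpha_j, index_number, v_bit, delta_k):
    """Shift one block mean alpha_j so that it sits at an interval center
    carrying the watermark bit, following eqs. (13), (17) and (18).
    `index_number` is the parity already computed for this block (eq. (12))."""
    a_j = alpha_j / delta_k                                 # eq. (13): interval coordinate
    frac = a_j - math.floor(a_j)                            # fractional position in interval
    if index_number == v_bit:
        sigma2 = (0.5 - frac) * delta_k                     # eq. (17): center of its own interval
    elif frac >= 0.5:
        sigma2 = (1.5 - frac) * delta_k                     # eq. (18): center of the interval above
    else:
        sigma2 = ((math.ceil(a_j) - a_j) - 1.5) * delta_k   # eq. (18): center of the interval below
    return alpha_j + sigma2                                 # alpha''_j
```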

3.1.5. Block Decomposition and Inverse Permutation

In the process of the adaptive enhanced QIM, zn is changed to z'n, and the shift σ2 is applied to the coefficients of each block in the same way,

$z''_n = \begin{cases} z'_n + \sigma_2, & \text{if } z'_n \ge 0 \\ z'_n - \sigma_2, & \text{if } z'_n < 0 \end{cases}$   (19)

The watermarked wavelet packet coefficients y'n are then generated by the inverse permutation,

$y' = P_w^{-1}(Z')$.   (20)

3.1.6. Inverse Wavelet Packet Transform (IWP)

Finally, the IWP of the wavelet coefficients generates the watermarked audio signal (x'n),

$x' = W_P^{-1}(y')$.   (21)

After performing all the above steps, the watermarked audio signal is obtained.

3.2: Blind Watermark Extraction

In the extraction process, a watermarked audio signal (x'n) is processed by the same procedure as in the embedding process to finally generate the extracted $\widehat{\mathrm{Index\_number}}_j$. According to the following criterion, we determine $\hat{V}_i$:

(A) if $\widehat{\mathrm{Index\_number}}_j = 0$, then $\hat{V}_i = 0$;
(B) if $\widehat{\mathrm{Index\_number}}_j = 1$, then $\hat{V}_i = 1$.

Since the watermark sequence Wi is randomly permuted at the embedding end, the extracted watermark $\hat{W}_i$ must be inverse permuted as in (22),

$\hat{W}_i = P_w^{-1}(\hat{V}_i), \quad i = 1 \sim N_1$.   (22)

4: Experimental Results

We test our algorithm on 24 sequences of 16-bit signed mono audio signals sampled at 44.1 kHz in PCM format. NC (normalized correlation) and SNR are used as the performance criteria of the proposed algorithm. The NC, defined in (23), measures the similarity between the extracted and the original watermarks. A watermark pattern (W) of size 28×28 pixels is used in the simulations.

$NC = \frac{\sum_i \sum_j W(i,j)\,\hat{W}(i,j)}{\sum_i \sum_j \left[W(i,j)\right]^2}$   (23)

Fig. 3. SNR of original audio signals and watermarked audio signals.

In these experiments, we use the LAME MP3 encoder/decoder for re-encoding attacks. For re-sampling attacks, we reduce the original sampling rate from 44.1 kHz to 22.05 kHz and raise it back to 44.1 kHz by interpolation. Similarly, requantization changes the word length from 16 bits to 8 bits and then raises it back to 16 bits. White Gaussian noise at levels of 28 dB and 36 dB is added to the watermarked signals. The performance after attacks is shown in Fig. 4, which shows that the watermark is robust against MP3 compression at 64 kbps, resampling, requantization, and Gaussian noise corruption. In all cases, the NC values are above 0.8, so the copyright can clearly be distinguished.

Fig. 4. NC values after audio processing.

A false positive is the detection of a watermark in a piece of media that does not actually contain that watermark. For convenience, only the host signal is extracted; if the NC of the host signal is very small, it indicates that there is no false positive error. Fig. 5 shows that the false positive rate is very low because the NC of the original (unwatermarked) signal is nearly 0.

Fig. 5. NCs of original audio signals and watermarked audio signals.
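For reference, the NC of eq. (23) amounts to a short computation such as the Python sketch below; it assumes the watermark patterns are available as numeric arrays and is illustrative only.

```python
import numpy as np

def normalized_correlation(w, w_hat):
    """Eq. (23): similarity between the original and extracted watermark patterns."""
    w = np.asarray(w, dtype=float)
    w_hat = np.asarray(w_hat, dtype=float)
    return float((w * w_hat).sum() / (w ** 2).sum())

# A 28x28 binary pattern compared with itself gives NC = 1.
pattern = (np.random.default_rng(1).random((28, 28)) > 0.5).astype(float)
assert abs(normalized_correlation(pattern, pattern) - 1.0) < 1e-12
```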

Fig. 6 shows the step sizes obtained by running the adaptive algorithm at the embedding side and at the extraction side. The two curves are almost indistinguishable, which shows that it is not necessary to send side information on the step size.

Fig. 6. The step size of trpt21_2 at the embedding and extraction ends.

5: Conclusion

In this paper, we present a robust audio watermarking system based on wavelet quantization index modulation. The original audio signal is not needed for extraction. In addition, an adaptive step size technique is utilized to balance robustness and transparency. The capacity is around 4 bits/frame. The simulation results show excellent performance against various attacks. The focus of future work will be the enhancement of robustness against further attacks, such as random dropping, random inserting, and random cropping.

References:
[1] Lu, T. T.: Featured-based block-wise processing applied to image and video compression and watermarking systems. Ph.D. dissertation, Department of Electrical Engineering, National Central University, Taoyuan, Taiwan (2003)
[2] Bao, P., Ma, X.: Image adaptive watermarking using wavelet domain singular value decomposition. IEEE Trans. on Circuits and Systems for Video Technology, Vol. 15, No. 1, Jan. (2005) 96-102
[3] Chen, B., Wornell, G. W.: Quantization index modulation: A class of provably good methods for digital watermarking and information embedding. IEEE Trans. Inf. Theory, Vol. 47, No. 4, May (2001) 1423-1443
[4] ISO/IEC 11172-3: Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 3: Audio (MPEG-1). (1992)
[5] Wu, S., Huang, J. D., Shi, Y. Q.: Self-synchronized audio watermark in DWT domain. IEEE Int. Symposium on Circuits and Systems, Vancouver, Canada, May 23-26 (2004)
[6] Katzenbeisser, S., Petitcolas, F. A. P.: Information Hiding Techniques for Steganography and Digital Watermarking. Artech House, Norwood, MA (2000)
