Layered Protection of Scalable Media using Perceptually Removable Watermarks

National Chiao Tung University

Department of Electronics Engineering & Institute of Electronics

Master's Thesis

Student: Yu-Cheng Chu

Advisor: Dr. Hsueh-Ming Hang

A Thesis

Submitted to the Department of Electronics Engineering & Institute of Electronics,

College of Electrical and Computer Engineering,

National Chiao Tung University,

in Partial Fulfillment of the Requirements

for the Degree of Master in Electronics Engineering

June 2007

HsinChu, Taiwan, Republic of China

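Abstract (translated from the Chinese original)

As the Internet becomes increasingly widespread, digital data have become very easy to obtain, so a protection framework is needed to safeguard intellectual property. Furthermore, with the growth of mobile communication and the diversity of its bandwidths, scalable media have become increasingly important. To protect scalable media, a layered protection scheme for scalable media was therefore proposed. This scheme protects the data of each layer with a cryptosystem, and the decryption key is embedded as a watermark in the data of the previous layer. In this thesis, our main goal is to verify the feasibility of layered protection of scalable media, covering both resolution scalability and image-quality scalability. To verify these two types of scalable media, JPEG2000 is chosen as the test platform, because JPEG2000 itself provides file formats with resolution scalability and image-quality scalability. To carry the decryption keys, a perceptually removable digital watermark is designed. It embeds a high-intensity watermark to achieve a high data capacity; the cost is a watermarked image of poorer quality, but a perceptually unaffected image can be obtained in the end. With the help of the perceptually removable digital watermark, the layered protection scheme for resolution-scalable or quality-scalable media can be fully realized.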

Layered Protection of Scalable Media using Perceptually Removable Watermarks

Student: Yu-Cheng Chu

Advisor: Dr. Hsueh-Ming Hang

Department of Electronics Engineering & Institute of Electronics National Chiao Tung University

Abstract

With the widespread use of the Internet, digital multimedia data are easy to access. Therefore, a protection scheme is needed to secure intellectual property. Meanwhile, scalable media coding is becoming more and more important because of the varying bandwidth of transmission channels such as mobile communication. Hence, a layered protection scheme for scalable media was proposed. It protects the data of each layer with a cryptosystem, and the decoding key of the current layer is carried by a watermark embedded in the previous layer.

In this thesis, our goal is to design and implement the layered protection scheme for SNR and spatial scalable media. The JPEG2000 still image compression standard, which provides content formats for SNR and spatial scalability, is used as the platform in our implementation. To fulfill all the requirements of our system, perceptually removable watermarking (PRW) is chosen for carrying the decoding keys. The specifically designed PRW embeds the watermark with high intensity to increase the data capacity, and the watermark can be removed at the receiving end with virtually no loss in image quality. With the aid of the PRW, the layered protection scheme for SNR and spatial scalability can be successfully implemented.

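Acknowledgments

I thank Professor Hsueh-Ming Hang for his careful guidance over these two years, which taught me the attitude and the methods that research requires. I also thank the senior students and classmates in the laboratory. In particular, my senior 峰誠 always gave me good advice that helped me get through bottlenecks, and 建志 helped me understand the architecture of JPEG2000 so that I could concentrate my research effort on design rather than on exploring the test platform. I thank my family and my girlfriend for their understanding, which let me focus on research without being distracted by trivial matters. During these two years of research, thanks to everyone's help, I learned how to solve problems and completed my studies. I dedicate this thesis to everyone who has helped me.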

List of Tables

Table 2-1 Le Gall 5/3 analysis and synthesis filter coefficients
Table 2-2 Daubechies 9/7 analysis and synthesis filter coefficients
Table 2-3 Coding Pass Classification
Table 2-4 Contexts for the significance propagation pass and cleanup pass
Table 2-5 Contributions of the vertical and the horizontal neighbors to the sign context
Table 2-6 Contexts for the magnitude refinement pass
Table 2-7 The progression order of packets
Table 3-1 Results of correlation-based watermarking
Table 3-2 Results of bitplane-based watermarking
Table 4-1 Results of correlation-based watermarking
Table 4-2 Watermarking with fixed and proportioned intensities
Table 5-1 Results of resolution-based PRW using 5_3 filter in JPEG2000
Table 5-2 Results of resolution-based PRW using 9_7 filter in JPEG2000
Table 5-3 Results of PSNR-based PRW using 5_3 filter in JPEG2000
Table 5-4 Results of PSNR-based PRW using 9_7 filter in JPEG2000
Table 5-5 PSNR shown for the compressed and the reconstructed images
Table 5-6 PSNR shown for the uncompressed and the uncompressed reconstructed images

List of Figures

Figure 2-1 General block diagram of JPEG2000 encoder [1]
Figure 2-2 General block diagram of JPEG2000 decoder [1]
Figure 2-3 Tiling, DC Level Shifting, Multi-component Transform (optional)
Figure 2-4 2D forward discrete wavelet transform
Figure 2-5 2D DWT decomposition
Figure 2-6 Two tiers of EBCOT algorithm
Figure 2-7 Diagram of code-blocks, bit-planes, stripes and coding pass
Figure 2-8 Context window and Neighbor states
Figure 2-9 Basic operation of the arithmetic encoding
Figure 2-10 Embedding process
Figure 2-11 Extraction process
Figure 2-12 Detection process
Figure 2-13 Embedding procedure of correlation-based watermark
Figure 2-14 Correlation values between the watermarked image and the patterns generated by different seeds (the seed used to generate the watermark is 10)
Figure 2-15 Four bits watermark embedding
Figure 2-16 Watermarking in wavelet domain
Figure 2-17 Bit-planes of the fruit image
Figure 2-18 DCT coefficient ordering method of watermarking
Figure 2-19 A typical MPEG-2 conditional access receiver
Figure 2-20 Decryption and decoding of layer-protected content
Figure 3-1 Embedding procedure of the correlation-based watermark
Figure 3-2 Embedding “0110” into the host image
Figure 3-3 An image with four decomposition levels using DWT
Figure 3-5 Procedure of removing the watermark from the watermarked image
Figure 3-6 Test images
Figure 3-7 Choosing embedding positions before R-D Optimization
Figure 3-8 Choosing embedding positions after R-D Optimization
Figure 4-1 An image with four decomposition levels using the DWT
Figure 4-2 Decryption and decoding of layer-protected content
Figure 4-3 Relationship between layers
Figure 4-4 Embedding positions of the watermark
Figure 4-5 Embedding positions of the first bit of a 6-bit data
Figure 4-6 Seed search procedure
Figure 4-7 Error Correction Codes
Figure 4-8 Parity-check matrix of the (39, 32) SEC-DED code
Figure 4-9 Architecture of the Perceptually Removable Watermarking: Data Hiding, Data Extraction and Image Reconstruction
Figure 4-10 Layers of PSNR-based scalable media
Figure 4-11 Different embedding ranges of the ideal watermarking technique for PSNR-based scalable media and the modified one
Figure 4-12 The embedding areas of the simplified watermarking technique for PSNR-based scalable media
Figure 4-13 Watermark embedding of resolution-based PRW and PSNR-based PRW
Figure 4-14 Watermark Extraction and Removal of Resolution-based and PSNR-based PRW
Figure 4-15 Conceptual structure of a JP2 file
Figure 5-1 Images used for simulation
Figure 5-2 The decoded images of the resolution-based scalable media at different reconstruction levels with 5_3 filter
Figure 5-3 The decoded images of the resolution-based scalable media at different reconstruction levels with 9_7 filter
Figure 5-4 The decoded images of the PSNR-based scalable media at different reconstruction levels with 5_3 filter
Figure 5-5 The decoded images of the PSNR-based scalable media at different reconstruction levels with 9_7 filter
Figure 5-6 Amplified absolute difference of res.-based watermarking with 5_3 filter
Figure 5-7 Amplified absolute difference of res.-based watermarking with 9_7 filter
Figure 5-8 Amplified absolute difference of PSNR-based watermarking with 5_3 filter
Figure 5-9 Amplified absolute difference of PSNR-based watermarking with 9_7 filter

Chapter 1 Introduction

With the widespread use of the Internet, digital data have become easy to access, so a protection scheme is necessary for multimedia distribution. Moreover, scalable media coding is becoming more and more important because of the varying transmission bandwidth in applications such as mobile communication. We therefore propose a scheme for protecting scalable media.

Scalable coding divides a media stream into a base layer and several enhancement layers, and each enhancement layer can be independently used to improve the reconstructed image quality. There are three types of scalability: spatial (resolution) scalability, temporal scalability, and signal-to-noise ratio (SNR) scalability. With spatial scalability, each layer has a different spatial resolution, and a reconstructed image of higher resolution is obtained as more enhancement layers are decoded. With temporal scalability, the enhancement layers increase the overall frame rate of a video. SNR scalability allows different layers to have different image quality.

The layered protection scheme for scalable media was proposed in [22]. It protects the data of each layer with a cryptosystem, and the decoding key of the current layer is carried by a watermark embedded in the previous layer. Since the scheme in [22] was verified only for temporal scalability, it remains an open question whether it also works for SNR scalability and spatial scalability. Our contribution in this thesis is therefore to design the layered protection scheme for SNR and spatial scalability.

Our system adopts the JPEG2000 still image compression standard, which provides content formats for SNR and spatial scalability. The watermarking technique used for carrying the decoding key is specifically designed for this purpose: the perceptually removable watermarking (PRW). With the aid of the PRW, the layered protection scheme for SNR and spatial scalability can be successfully implemented.

This thesis is organized as follows. Chapter 2 describes the JPEG2000 standard, the watermarking techniques, and the concept of the layered protection scheme for scalable media. In Chapter 3, the characteristics of different watermarking techniques are analyzed, and correlation-based watermarking is chosen as the method of data transmission in the layered protection scheme. In Chapter 4, we improve the efficiency of the chosen watermarking technique, which leads to the proposed perceptually removable watermarking. Chapter 5 shows the simulation results, and Chapter 6 concludes this thesis.

Chapter 2 JPEG2000, Watermarking, and Layered Protection

This chapter has three parts: the JPEG2000 standard, digital watermarking, and the architecture of layered protection of scalable media. We first briefly describe the JPEG2000 standard, covering its features and its coding procedure. We then give an overview of digital watermarking. Finally, we present the architecture of layered protection of scalable media that we will design and implement.

2.1 The JPEG2000 Standard

With the increasingly popular use of multimedia technologies, the old JPEG standard for compressing still images can no longer fulfill today's advanced requirements for image coding. A new image compression technique with better compression efficiency and other useful features is therefore needed. A new standard called JPEG2000 was developed from 1997 to 2000 to meet these demands.

2.1.1 Introduction to JPEG2000

... applications such as the Internet, remote sensing, medical imagery, and so on. JPEG2000 therefore has a number of features, and the most important ones [2] are listed below.

Superior low bit-rate performance

The standard should provide superior performance at low bit-rates (e.g. below 0.25 bpp) compared with older standards such as JPEG. In the end, JPEG2000 has a compression advantage over JPEG of roughly 20% as well as a subjective quality benefit. This feature is important in bandwidth-sensitive applications such as network image transmission and remote sensing.

Lossless and lossy compression

JPEG2000 provides both lossless and lossy compression using the same architecture. It is desired to provide lossless compression naturally in the course of progressive decoding, and then we can progressively construct the image from the lossy mode to the lossless mode.

Progressive transmission by pixel accuracy and resolution

Progressive transmission that allows images to be reconstructed with increasing pixel accuracy or spatial resolution is important for many applications. This feature makes it possible for different devices to reconstruct an image at different resolutions and pixel accuracies. The World Wide Web, image archival, and printers are some example applications.

Region-of-interest coding

Some parts of an image are more important than others. This feature allows users to preserve the fidelity of the region of interest (ROI) by using more bits to code the data in that region and by transmitting the coded data earlier.

Random codestream access and processing

This feature allows users to randomly define the ROIs in the image and code them with less distortion than the rest of the image. Moreover, rotation, filtering, scaling and feature extraction are supported.

Robustness to bit-errors

Proper design of the codestream can aid subsequent error correction systems to alleviate catastrophic decoding errors and make sure that some specific portions of the codestream are correct. This feature can overcome the obstacle of transmission over a noisy channel like wireless channel.

Open architecture

It is desirable to have an open architecture to optimize the system for different image types and applications, so the decoder is only required to implement the core tool set and a parser that understands the codestream. Also, unknown tools could be added to the decoder if necessary.

Content-based description

Since image archival, indexing and searching are important issues in image processing, a content-based description of images might be available as part of the compression system.

Side channel spatial information (transparency)

Side channel spatial information such as alpha planes and transparency planes is useful in many applications. One example is the transparency plane used in World Wide Web applications.

Protective image security

We can use watermarking, labeling, stamping and encryption to protect a digital image. Labeling is already implemented in the still picture interchange file format (SPIFF) [1], and it should be easy to achieve the same in JPEG2000.

Continuous-tone and bi-level compression

A coding standard with a unified compression architecture for both continuous-tone and bi-level images is desirable. The system is able to compress and decompress each color component with a depth of 1 to 16 bits.

Figure 2-1 General block diagram of JPEG2000 encoder [1]

Figure 2-2 General block diagram of JPEG2000 decoder [1]

Figures 2-1 and 2-2 are the block diagrams of the encoder and the decoder in the JPEG2000 standard. Before introducing the functionality of each block in the block diagrams, we should note that the decoder is simply the inverse of the encoder. Thus, although most parts of the JPEG2000 standard are written from the point of view of the decoder, for easy understanding, we will describe the JPEG2000 encoding tools in the following sections.

2.1.2 Pre-Processing

There are three sub-steps in the preprocessing step, which are “Image Tiling”, “DC Level Shifting” and “Multi-component Transform” as shown in Figure 2-3. We will introduce each of them as follows.

Figure 2-3 Tiling, DC Level Shifting, Multi-component Transform (optional)

2.1.2.1 Image Tiling

Image tiling divides the original image into several rectangular non-overlapping blocks (tiles), and each tile is compressed independently. Because each tile can be processed independently, this step can significantly reduce memory use, and it also makes it possible to decode specific parts of the image instead of the whole image. The cost of tiling is a reduction in the quality of the compressed image.
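As a rough illustration of the tiling step, the following Python sketch lists the rectangle covered by each tile. The function name `tile_bounds` and its parameters are hypothetical, and the sketch assumes that the tile grid starts at the image origin, which is a simplification.

```python
def tile_bounds(width, height, tile_w, tile_h):
    """Return (x0, y0, x1, y1) for each tile in raster order; x1 and y1 are exclusive."""
    tiles = []
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            # Tiles on the right and bottom edges may be smaller than the nominal size.
            tiles.append((x0, y0, min(x0 + tile_w, width), min(y0 + tile_h, height)))
    return tiles


if __name__ == "__main__":
    for tile in tile_bounds(5, 4, 2, 2):
        print(tile)
```

Each rectangle can then be passed to the rest of the coding chain on its own, which is what keeps the memory requirement bounded by the tile size.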

2.1.2.2 DC Level Shifting

After tiling, all samples in each tile are DC-level shifted by subtracting the same quantity $2^{p-1}$, where $p$ is the precision (bit depth) of the corresponding component. It is important to note that DC-level shifting is performed only on samples that are unsigned.
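A minimal Python sketch of the DC level shift described above is given here for illustration. The function names are hypothetical, and the samples are assumed to be unsigned integers of the stated precision.

```python
def dc_level_shift(samples, precision):
    """Subtract 2**(precision - 1) from every unsigned sample."""
    offset = 1 << (precision - 1)
    return [s - offset for s in samples]


def inverse_dc_level_shift(samples, precision):
    """Undo the shift at the decoder by adding the same offset back."""
    offset = 1 << (precision - 1)
    return [s + offset for s in samples]


if __name__ == "__main__":
    shifted = dc_level_shift([0, 128, 255], precision=8)
    print(shifted)
    print(inverse_dc_level_shift(shifted, precision=8))
```

For 8-bit samples the range 0 to 255 is mapped to -128 to 127, so the data become roughly centered on zero before the transform.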

2.1.2.3 Multi-component transform

The multi-component transform improves the coding efficiency by reducing the redundancy between the components of the original image [3], such as the RGB components of BMP files. JPEG2000 supports two types of transformation: the reversible component transformation (RCT) and the irreversible component transformation (ICT). The RCT can be used for lossy or lossless coding, while the ICT should only be used for lossy coding. The forward and inverse ICT are given by equations (2.1.2-1) and (2.1.2-2), respectively, and the forward and inverse RCT by (2.1.2-3) and (2.1.2-4).

JPEG2000 supports multiple-component images and different components need not have the same bit depths. For this reason, the bit depth of each output image component must be identical to the bit depth of the corresponding input image component. This is the only requirement for reversible systems.

                    − − − − =           B G R Cr Cb Y 08131 . 0 41869 . 0 5 . 0 5 . 0 33126 . 0 16875 . 0 114 . 0 587 . 0 299 . 0 (2.1.2-1)                     − − =           Cr Cb Y B G R 0 772 . 1 0 . 1 71414 . 0 34413 . 0 0 . 1 402 . 1 0 0 . 1 (2.1.2-2)               − −     + + =           G B G R B G R V U Y ₄ 2 (2.1.2-3)               + +     + − =           G V G U V U Y B R G ₄ (2.1.2-4)

2.1.3 Discrete Wavelet Transform

The wavelet transform produces transformed data that carry both spatial and frequency information, and JPEG2000 uses this property to achieve scalable coding with different resolutions and qualities.

In JPEG2000, the wavelet transform is used to decompose the tile components into different decomposition levels. Each level has a number of subbands that capture the vertical and horizontal characteristics of the original image. Because these subband signals are less correlated than the original data, the transformed data can be coded more efficiently.

Figure 2-4 2D forward discrete wavelet transform

Usually, the two-dimensional (2D) discrete wavelet transform is accomplished by cascading two one-dimensional (1D) discrete wavelet transforms. As shown in Figure 2-4, the 2D transform is composed of two 1D transforms along the horizontal and the vertical directions, respectively. After the horizontal transform, two subbands are formed, and the vertical transform is then applied to each of them to produce the four subbands of the 2D wavelet transform. Figure 2-5 shows the flow of decomposing one subband into four higher-level subbands. It is important to note that for the 1D discrete wavelet transform, the low-pass samples represent a down-sampled low-resolution version of the original signal, and the high-pass samples represent a down-sampled residual version of the original signal. This property is useful in scalable coding, since a low-resolution image can be obtained by decoding only the LL band of the transformed data.

Figure 2-5 2D DWT decomposition

JPEG2000 provides two types of wavelet transform. One of them is the reversible DWT with the Le Gall 5/3 filter; the other one is the irreversible DWT with the Daubechies 9/7 filter. Their coefficients of the analysis and synthesis filter are listed in Table 2-1 and Table 2-2.
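To make the reversible case concrete, the following Python sketch runs one level of the 1D 5/3 transform on integer samples. It uses the lifting form of the 5/3 filter, an implementation route that this text does not describe and that is assumed here; the function names are hypothetical, the input length is assumed to be even, and the borders are handled by mirroring.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 transform on an even-length list of ints."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    even, odd = x[0::2], x[1::2]
    last = len(even) - 1
    # Predict step: high-pass (detail) samples; the right edge is mirrored.
    high = [odd[i] - (even[i] + even[min(i + 1, last)]) // 2 for i in range(len(odd))]
    # Update step: low-pass (approximation) samples; the left edge is mirrored.
    low = [even[i] + (high[max(i - 1, 0)] + high[i] + 2) // 4 for i in range(len(even))]
    return low, high


def dwt53_inverse(low, high):
    """Undo the two lifting steps in reverse order."""
    last = len(low) - 1
    even = [low[i] - (high[max(i - 1, 0)] + high[i] + 2) // 4 for i in range(len(low))]
    odd = [high[i] + (even[i] + even[min(i + 1, last)]) // 2 for i in range(len(high))]
    x = [0] * (2 * len(low))
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    signal = [10, 12, 14, 200, 202, 13, 9, 8]
    low, high = dwt53_forward(signal)
    print(low, high)
    print(dwt53_inverse(low, high) == signal)
```

Applying the same routine to every row and then to every column of a tile would give the four subbands of one 2D decomposition level; the low-pass output is the half-resolution signal that the resolution-scalable decoder can stop at.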

Because the RCT is reversible and may be used for lossy or lossless coding, it may only be used together with the reversible 5/3 wavelet transform. Because the ICT may only be used for lossy coding, it may only be used together with the irreversible 9/7 wavelet transform.

| i | Analysis low-pass filter $h_L(i)$ | Analysis high-pass filter $h_H(i)$ | Synthesis low-pass filter $g_L(i)$ | Synthesis high-pass filter $g_H(i)$ |
|---|---|---|---|---|
| 0 | 6/8 | 1 | 1 | 6/8 |
| ±1 | 2/8 | -1/2 | 1/2 | -2/8 |
| ±2 | -1/8 | | | -1/8 |

Table 2-1 Le Gall 5/3 analysis and synthesis filter coefficients

| i | Analysis low-pass filter $h_L(i)$ | Analysis high-pass filter $h_H(i)$ | Synthesis low-pass filter $g_L(i)$ | Synthesis high-pass filter $g_H(i)$ |
|---|---|---|---|---|
| 0 | 0.6029490182363579 | 1.115087052456994 | 1.115087052456994 | 0.6029490182363579 |
| ±1 | 0.2668641184428723 | -0.5912717631142470 | 0.5912717631142470 | -0.2668641184428723 |
| ±2 | -0.07822326652898785 | -0.05754352622849957 | -0.05754352622849957 | -0.07822326652898785 |
| ±3 | -0.01686411844287495 | 0.09127176311424948 | -0.09127176311424948 | 0.01686411844287495 |
| ±4 | 0.02674875741080976 | | | 0.02674875741080976 |

Table 2-2 Daubechies 9/7 analysis and synthesis filter coefficients

2.1.4 Quantization

After transformation, all coefficients are quantized. This process is lossy unless the quantization step is equal to one with integer coefficients. Several quantization options are provided by the JPEG2000 standard, but only the uniform scalar quantization, which is the default method in JPEG2000 standard Part I, will be introduced here.

In the integer mode, the quantizer step sizes are always fixed to one, which effectively bypasses quantization and leaves the transform coefficients unchanged. In this case, lossy coding is still possible, but rate control is achieved by another mechanism. In the real mode, the quantizer step sizes are chosen in conjunction with rate control. Each transformed coefficient $a_b(u,v)$ of subband $b$ is quantized to the value $q_b(u,v)$ according to (2.1.4-1). The step size $\Delta_b$ is defined in (2.1.4-2); it is relative to the dynamic range $R_b$ of subband $b$, and the exponent/mantissa pairs $(\varepsilon_b, \mu_b)$ are signaled in the bit-stream syntax.

$$q_b(u,v) = \operatorname{sign}(a_b(u,v)) \left\lfloor \frac{|a_b(u,v)|}{\Delta_b} \right\rfloor \tag{2.1.4-1}$$

$$\Delta_b = 2^{R_b - \varepsilon_b} \left( 1 + \frac{\mu_b}{2^{11}} \right) \tag{2.1.4-2}$$
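The two quantization formulas can be tried out with the following Python sketch. The function and parameter names are hypothetical, and the numeric inputs are made-up values chosen only to exercise the formulas.

```python
import math


def step_size(dynamic_range, exponent, mantissa):
    """Step size from (2.1.4-2): 2**(R_b - eps_b) * (1 + mu_b / 2**11)."""
    return 2.0 ** (dynamic_range - exponent) * (1.0 + mantissa / 2 ** 11)


def quantize(coefficient, delta):
    """Uniform scalar quantizer from (2.1.4-1): sign(a) * floor(|a| / delta)."""
    sign = -1 if coefficient < 0 else 1
    return sign * math.floor(abs(coefficient) / delta)


if __name__ == "__main__":
    delta = step_size(dynamic_range=10, exponent=8, mantissa=1024)
    print(delta)
    print([quantize(a, delta) for a in (-13.2, -5.9, 0.4, 5.9, 13.2)])
```

Because the magnitude is floored and the sign is applied afterwards, every coefficient with a magnitude below the step size maps to zero, whichever its sign.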

2.1.5 Embedded Block Coding with Optimized Truncation

Embedded block coding with optimized truncation (EBCOT) [4] is adopted for the entropy coding part of JPEG2000. EBCOT has two major coding steps, tier-1 and tier-2 coding, as shown in Figure 2-6. Tier-1 is the embedded block coding (EBC), which includes the context formation (CF) and the arithmetic encoder (AE). In tier-1 coding, the encoder divides each subband into code-blocks, and all code-blocks are encoded independently to form block-based embedded bit-streams. The coding method is bit-plane coding, described in the next section. For each code-block, the embedded bit-stream is composed of numerous coding passes, and the output of tier-1 is the collection of coding passes of the various code-blocks. After the embedded bit-streams are produced, tier-2 truncates them for the best rate-distortion trade-off: for a fixed quality the rate is minimized, and for a fixed rate the overall distortion is minimized. The two tiers are introduced in detail in the following sections.

Figure 2-6 Two tiers of EBCOT algorithm

2.1.5.1 Tier-1 Coding

The tier-1 coding is also known as the embedded block coding (EBC). It is composed of the context formation (CF) and the arithmetic encoder (AE), and the code-block is its basic coding unit. The EBC operates at the bit level by bit-plane coding, that is, each code-block is coded from the most significant bit (MSB) bit-plane to the least significant bit (LSB) bit-plane. Each bit-plane is coded in three passes and is scanned stripe by stripe, as shown in Figure 2-7.
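The bit-plane view of a code-block can be sketched in Python as follows. The function name and the tiny 2x2 block are hypothetical, and the sketch only separates the planes; it does not perform the three coding passes or the stripe scan.

```python
def bit_planes(code_block, num_planes):
    """Split integer coefficients into a sign plane and magnitude bit-planes, MSB first."""
    signs = [[1 if c < 0 else 0 for c in row] for row in code_block]
    planes = []
    for p in range(num_planes - 1, -1, -1):
        # Bit p of the magnitude of every coefficient forms one bit-plane.
        planes.append([[(abs(c) >> p) & 1 for c in row] for row in code_block])
    return signs, planes


if __name__ == "__main__":
    signs, planes = bit_planes([[5, -3], [0, 6]], num_planes=3)
    print("sign plane:", signs)
    for index, plane in enumerate(planes):
        print("plane", index, "(MSB first):", plane)
```

Truncating the list of planes after the first few entries gives a coarser approximation of every coefficient, which is the property that the embedded bit-stream exploits.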

[Figure 2-7 shows the DWT subbands divided into code-blocks (CB0 to CB15), a code-block split into a sign plane and magnitude bit-planes, and a bit-plane scanned in stripes (Stripe 0 to Stripe n) by three coding passes: Pass1 (Significance Propagation Pass), Pass2 (Magnitude Refinement Pass), and Pass3 (Cleanup Pass).]

Figure 2-7 Diagram of code-blocks, bit-planes, stripes and coding pass

Context Formation (CF)

As shown in Figure 2-6, the EBC is essentially a context-adaptive arithmetic encoder. The context formation (CF) generates context-decision pairs for the arithmetic encoder (AE) to adapt the probability of decision. In context modeling, all code-blocks are coded a bit-plane at a time starting from the MSB bit-plane with a nonzero element to the LSB bit-plane. There are three coding passes for a bit-plane, that is, Pass1 (Significance Propagation Pass), Pass2 (Magnitude Refinement Pass), and Pass3 (Cleanup Pass). Each DWT coefficient bit is coded in only one of the three coding passes, and the coding condition is shown in Table 2-3.

| Coding Pass | Coding condition |
|---|---|
| Pass1 (Significance Propagation Pass) | Insignificant sample with at least one significant neighbor |
| Pass2 (Magnitude Refinement Pass) | Significant sample |
| Pass3 (Cleanup Pass) | All remaining coefficients |

Table 2-3 Coding Pass Classification

Figure 2-8 Context window and Neighbor states

Using the context window shown in Figure 2-8, the 4-connected or the 8-connected neighbors of a sample are examined to obtain the state information used for context-based arithmetic coding. The context information (context labels) for the different passes is shown in Table 2-4, Table 2-5, and Table 2-6, where “X” means “don’t care”.

The first coding pass (Pass1) for each bit-plane is the significance propagation pass. In this pass, a bit is coded if and only if its location is insignificant and at least one of its 8-connected neighbors is significant. The nine context labels in Table 2-4 are created based on the characteristics of the 8-connected neighbors, that is, how many and which ones are significant. The mapping to the contexts also depends on which subband the code-block is in. The significance propagation pass only includes bits of coefficients that were insignificant with a non-zero context; all other coefficients are skipped. If the value of the bit is 1, the significance state is set to 1 and the sign bit is coded immediately. Otherwise, the significance state remains 0. It is important to note that the coding pass always uses the most current significance state to encode a bit.
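The pass classification of Table 2-3 can be illustrated with a small Python sketch. The function name is hypothetical, and the sketch makes two simplifying assumptions: it works on a fixed snapshot of the significance states, whereas the real coder updates the states while a pass is running, and it treats positions outside the array as insignificant.

```python
def classify_pass(significant, row, col):
    """Name the coding pass that would code the bit at (row, col) for a given state snapshot."""
    if significant[row][col]:
        return "magnitude refinement"
    rows, cols = len(significant), len(significant[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            # Any significant 8-connected neighbor puts the bit in Pass1.
            if 0 <= r < rows and 0 <= c < cols and significant[r][c]:
                return "significance propagation"
    return "cleanup"


if __name__ == "__main__":
    state = [
        [0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    for position in [(1, 1), (0, 0), (0, 3)]:
        print(position, classify_pass(state, *position))
```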

| LL/LH ΣH | LL/LH ΣV | LL/LH ΣD | HL ΣH | HL ΣV | HL ΣD | HH Σ(H+V) | HH ΣD | Context Label |
|---|---|---|---|---|---|---|---|---|
| 2 | X | X | X | 2 | X | X | ≥3 | 8 |
| 1 | ≥1 | X | ≥1 | 1 | X | ≥1 | 2 | 7 |
| 1 | 0 | ≥1 | 0 | 1 | ≥1 | 0 | 2 | 6 |
| 1 | 0 | 0 | 0 | 1 | 0 | ≥2 | 1 | 5 |
| 0 | 2 | X | 2 | 0 | X | 1 | 1 | 4 |
| 0 | 1 | X | 1 | 0 | X | 0 | 1 | 3 |
| 0 | 0 | ≥2 | 0 | 0 | ≥2 | ≥2 | 0 | 2 |
| 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Column groups: LL and LH sub-bands (vertical high-pass), HL sub-band (horizontal high-pass), HH sub-band (diagonally high-pass).

Table 2-4 Contexts for the significance propagation pass and cleanup pass

As shown in Table 2-5, the sign bit coding uses another context of the neighborhood to determine the context label. Only four neighbors are considered, and each neighbor may have one of three states: significant positive, significant negative, or insignificant. If the two vertical neighbors are both significant with the same sign, or if only one of them is significant, then the vertical contribution is 1 if the sign is positive or -1 if the sign is negative. If both vertical neighbors are insignificant, or both are significant with different signs, then the vertical contribution is 0. The horizontal contribution is determined in the same manner.

The second coding pass (Pass2) for each bit-plane is the magnitude refinement pass. This pass includes the bits of coefficients that are already significant (except those that have just become significant in the preceding significance propagation pass). The context used in this pass is determined by the sum of the significance states of the horizontal, vertical, and diagonal neighbors, as shown in Table 2-6.

| Horizontal contribution | Vertical contribution | Context Label | XOR bit |
|---|---|---|---|
| 1 | 1 | 13 | 0 |
| 1 | 0 | 12 | 0 |
| 1 | -1 | 11 | 0 |
| 0 | 1 | 10 | 0 |
| 0 | 0 | 9 | 0 |
| 0 | -1 | 10 | 1 |
| -1 | 1 | 11 | 1 |
| -1 | 0 | 12 | 1 |
| -1 | -1 | 13 | 1 |

Table 2-5 Contributions of the vertical and the horizontal neighbors to the sign context
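The sign-context rules and Table 2-5 can be combined into a short Python sketch. The names `contribution`, `sign_context` and `SIGN_CONTEXT` are hypothetical; neighbor states are encoded as +1 (significant positive), -1 (significant negative) and 0 (insignificant), which is an encoding chosen here for convenience.

```python
# (horizontal contribution, vertical contribution) -> (context label, XOR bit), from Table 2-5
SIGN_CONTEXT = {
    (1, 1): (13, 0), (1, 0): (12, 0), (1, -1): (11, 0),
    (0, 1): (10, 0), (0, 0): (9, 0), (0, -1): (10, 1),
    (-1, 1): (11, 1), (-1, 0): (12, 1), (-1, -1): (13, 1),
}


def contribution(first, second):
    """Combine two neighbor states into a contribution of -1, 0 or 1."""
    # Same sign or a single significant neighbor gives +/-1; opposite signs or none give 0.
    return max(-1, min(1, first + second))


def sign_context(left, right, up, down):
    """Return (context label, XOR bit) for the sign bit of the current sample."""
    horizontal = contribution(left, right)
    vertical = contribution(up, down)
    return SIGN_CONTEXT[(horizontal, vertical)]


if __name__ == "__main__":
    print(sign_context(left=1, right=1, up=0, down=0))
    print(sign_context(left=1, right=-1, up=-1, down=0))
```

Clamping the sum of the two states reproduces all four cases of the rule without a separate branch for each.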

| ΣH+ΣV+ΣD | First refinement for this sample | Context Label |
|---|---|---|
| X | False | 16 |
| ≥1 | True | 15 |
| 0 | True | 14 |

Table 2-6 Contexts for the magnitude refinement pass

All the remaining coefficients in the bit-plane, which were insignificant with a context value of zero during the significance propagation pass, are included in the cleanup pass (Pass3). The cleanup pass uses not only the neighbor context of Table 2-4, as the significance propagation pass does, but also a run-length context. If four consecutive samples in a column are all insignificant and their context labels are all zero, run-length coding is performed.

Arithmetic Encoder (AE)

The decisions produced by the CF are coded by the arithmetic encoder. The AE is an adaptive binary MQ-coder [5]. Binary arithmetic coding is based on the recursive probability interval subdivision of Elias coding. Since the arithmetic encoder is binary, with each binary decision the current probability interval is divided into two sub-intervals, and the width of each sub-interval is proportional to the probability of the corresponding symbol. The code-stream is modified (if necessary) so that it points to the base (lower bound) of the sub-interval assigned to the symbol, and if the interval becomes too small, the sub-interval is renormalized to a larger interval for the AE to divide. These operations are shown in Figure 2-9. In addition, a lazy coding mode can be used to reduce the number of symbols that are arithmetically coded. In this mode, after the fourth bit-plane is coded, the first and second passes are included as raw data, while only the third coding pass of each bit-plane is arithmetically coded.

[Figure 2-9 illustrates the interval subdivision for the sequence 01... with p(0) = 0.7 (most probable symbol, MPS) and p(1) = 0.3 (least probable symbol, LPS): the interval [0, 1) is narrowed to [0, 0.7) and then to [0.49, 0.7), followed by renormalization.]

Figure 2-9 Basic operation of the arithmetic encoding
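The interval subdivision of Figure 2-9 can be reproduced with the Python sketch below. It is a floating-point illustration of Elias-style subdivision with a fixed probability, not the MQ-coder itself, which adapts its probability estimates and works with fixed-point approximations and renormalization. The function name is hypothetical, and the sketch assumes that symbol 0 takes the lower sub-interval.

```python
def subdivide(bits, p0):
    """Return the [low, high) interval selected by a bit sequence when P(0) = p0."""
    low, width = 0.0, 1.0
    for bit in bits:
        if bit == 0:
            # Symbol 0 keeps the lower part of the current interval.
            width *= p0
        else:
            # Symbol 1 moves the base up and keeps the upper part.
            low += width * p0
            width *= 1.0 - p0
    return low, low + width


if __name__ == "__main__":
    print(subdivide([0], p0=0.7))
    print(subdivide([0, 1], p0=0.7))
```

Any number inside the final interval identifies the whole sequence, and longer or less probable sequences yield narrower intervals that need more bits to pin down.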

2.1.5.2 Tier-2 Coding

Tier-2 coding has two main purposes. One is to truncate the bit-stream produced by tier-1 coding to achieve the target quality or rate; the other is to arrange the packets, which are composed of the bit-plane coding passes, in different progression orders to achieve different kinds of scalability, such as resolution scalability and quality (PSNR) scalability.

The input of the tier-2 encoding process is the set of bit-plane coding passes generated during the tier-1 encoding. Each coding pass is a candidate for truncation point of a code-block, and the coding pass information is packaged into packets based on different requirements. For meeting a target bit-rate, the packaging process imposes a particular organization of coding pass data in the output code-stream. The rate-control scheme assures that the output code-stream with desired bit-rate has the best reserved quality compared to all other code-stream at the same rate. The rate distortion optimization (RDO) algorithm will be introduced as follow.

In the encoder, rate control can be achieved through two distinct mechanisms, that is, the choice of quantization step size and the selection of the subset of coding passes to include in the code-stream. When lossless coding is employed, only the second mechanism may be used. The quantization step size must be fixed to one. When lossy coding is applied, both mechanisms may be employed. Since the tier-1 coding needs a lot of computation and changing quantization step sizes leads to redo the tier-1 coding, the first mechanism seems not practical in the tier-2 encoder. The remaining method to control bit-rate is to discard some coding passes to achieve the target rate. Each coding pass gives a load on bit-rate but provides an improvement in quality. Using this information, the encoder could create a bit-stream composed of specific coding passes to minimize the distortion at a fixed rate, and this is the task of rate control.

The rate-distortion optimization algorithm makes a trade-off between bit-rate and distortion: we minimize the distortion for a fixed rate, or minimize the rate for a fixed distortion. This problem can be solved as the Lagrangian optimization problem [1] in (2.1.5.2-1).

$$\min(R + \lambda D) = \min \sum_i \left( R_i^{n_i} + \lambda D_i^{n_i} \right) \qquad \text{(2.1.5.2-1)}$$

where $R_i^{n_i}$ and $D_i^{n_i}$ are the rate and the distortion obtained when truncating at point $n_i$ of code-block $B_i$. The Lagrange multiplier $\lambda$ is used to minimize $J = R + \lambda D$. Because the sum separates over code-blocks, minimizing $J$ is equivalent to minimizing $R_i^{n_i} + \lambda D_i^{n_i}$ for each code-block separately. A simple algorithm to minimize $J$ is as follows:

    Set $n_i = 0$
    For $k = 1, 2, 3, \ldots$
        Set $\Delta R_i^k = R_i^k - R_i^{n_i}$ and $\Delta D_i^k = D_i^{n_i} - D_i^k$
        If $\Delta D_i^k / \Delta R_i^k > \lambda^{-1}$ then set $n_i = k$

The core concept of the algorithm above is that $R_i^{n_i} + \lambda D_i^{n_i} > R_i^k + \lambda D_i^k$ means that $k$ is a better truncation point than $n_i$. Rearranging this inequality gives $\lambda (D_i^{n_i} - D_i^k) > R_i^k - R_i^{n_i}$, which yields the rule "if $\Delta D_i^k / \Delta R_i^k > \lambda^{-1}$ then set $n_i = k$".

After the rate control process, the packets of data are arranged in different progression orders to achieve scalable coding. As shown in Table 2-7, there are four types of progression order.

Progression order
Layer-resolution-component-position progressive (LRCP)
Resolution-layer-component-position progressive (RLCP)
Resolution-position-component-layer progressive (RPCL)
Position-component-resolution-layer progressive (PCRL)

Table 2-7 The progression order of packets
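As an illustration of the truncation-point search given earlier in this section, the following Python sketch applies the rule to a single code-block. It is a minimal sketch and not the rate-control code of any JPEG2000 implementation: the function name, the list-based inputs, and the sample rate and distortion values are assumptions made for this example. Index k = 0 stands for "no coding pass included", and the ratio test is written in cross-multiplied form, which is equivalent when the rate increments and λ are positive.

```python
def select_truncation_point(rates, dists, lam):
    """Return the truncation point n_i of one code-block for J = R + lam * D.

    rates[k] and dists[k] are the cumulative rate and the remaining distortion
    when the code-block is truncated after k coding passes (hypothetical inputs).
    """
    n = 0  # start with nothing included
    for k in range(1, len(rates)):
        delta_r = rates[k] - rates[n]   # extra rate spent by moving from n to k
        delta_d = dists[n] - dists[k]   # distortion removed by moving from n to k
        # Same test as delta_d / delta_r > 1 / lam, without the division.
        if lam * delta_d > delta_r:
            n = k
    return n


# Illustrative numbers only: five candidate truncation points.
rates = [0, 10, 20, 30, 40]
dists = [100, 60, 40, 35, 34]
chosen = {lam: select_truncation_point(rates, dists, lam) for lam in (0.1, 1.0, 4.0)}
```

A larger λ weights distortion more heavily, so more coding passes are kept; a smaller λ favors a lower rate.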


2.2 The Digital Watermarking

Because digital data can be transmitted over the Internet very easily today, a technique is needed to protect the intellectual property of digital data. A traditional encryption system satisfies only part of this requirement, because the protected data become available in the clear at the paying receiver's end. Hence, digital watermarking was proposed, which hides information in the host data without perceptual distortion. There are many kinds of digital data, but the following sections focus only on watermarking of image data.

2.2.1 Introduction to Digital Watermarking

A watermark is required to be robust and invisible, and to convey as much information as possible. For a real-time system, the watermarking system is also required to have low complexity. Since these requirements conflict with one another, watermark design is a matter of trade-offs.

Exploiting the characteristics of the human visual system (HVS) to hide more information at the same perceptual quality is the most popular way to improve watermarking efficiency. The domain in which the watermark is embedded also affects robustness. Generally speaking, the frequency domain is well suited to data embedding, whereas data embedded in the spatial domain are easily destroyed.

In the following sections, we introduce watermarking in more detail, including the watermarking process, the properties and categories of watermarks, and some examples of watermarking techniques. The most important model, the human visual system (HVS) model, is described at the end.

2.2.2 The Process of Watermarking

A watermarking system consists of four basic stages [7]: the embedding stage, the distribution stage, the extraction stage, and the detection stage.

Embedding stage:

In this stage, the watermark W with the weighting g is added to the host image I as shown in Figure 2-10. The host image data may be in the spatial domain or the frequency domain.

Figure 2-10 Embedding Process

Distribution stage:

In this stage, the watermarked image is distributed. It may be published on a website or sold to a customer. During transmission and distribution, the watermarked image is affected by harmful factors, for example common image processing such as lossy compression, transmission errors due to a noisy channel, and geometric transformations such as clipping or rotation. All of these factors introduce errors into the watermarked image. Therefore, any manipulation of the watermarked image should be regarded as an attack on the information embedded by the watermark.

Extraction stage:

There are three types of methods for extracting the watermark from the watermarked image: blind, semi-blind, and non-blind extraction. As shown in Figure 2-11, the watermark is extracted from the watermarked image by the watermark extraction process. Blind extraction needs no side information. Semi-blind extraction needs a little side information, such as image features. Extraction that needs the original host image is non-blind.

Figure 2-11 Extraction process

Detection stage:

As shown in Figure 2-12, the extracted watermark in the extraction stage is compared to the original watermark. If the extracted watermark is the same as the original watermark or they are similar to a certain degree, we can say that the image is exactly the one we want to identify.

Two types of errors may happen in this process, that is, false positive and false negative. False positive means that a watermark is detected although there is none, and false negative rejects the existence of the watermark even though there is one. These two errors may be caused by the transmission error or the design of watermarks.


2.2.3 Properties of Watermark

There are several desirable properties of a digital watermark, and they may be required depending on applications. For example, the fragile watermark is used for data authentication, and the robust watermark is used for copyright protection. The properties of watermark [7] are listed below.

Perceptual transparency

A watermarked image should not be perceptually different from the original one. That is, the watermark should not degrade the perceptual quality of the image in which it is embedded.

Robustness

Robustness means that the watermark is very hard to remove without degrading the quality of the watermarked image. Thus, if a watermark is destroyed, the result is unavoidably a noisy, meaningless image. The watermark should resist several kinds of processing. The first is common image processing such as lossy compression, which discards information that is unnecessary for the perceptual quality of an image in order to compress the data. The second is geometric transformation such as rotation, scaling, or clipping. The third is the noisy channel through which the watermarked image must pass; in this channel, noise is added to the watermarked image and damages the data. The watermark should be resistant to this kind of attack as well.

Security

Security describes how difficult it is to intentionally remove a watermark by deletion, by modification, or by burying it under another, illicit watermark.

Data capacity

Data capacity refers to the amount of hidden information a host image can carry. Each image has its own capacity limit for a given watermarking method. Therefore, the same watermarking architecture should not be expected to embed the same amount of information into different pictures.

2.2.4 Categories of Watermark

Based on the properties of watermarks, watermarking techniques can be classified into several categories [8], some of which are listed below.

Perceptible watermarking

A perceptible watermark is one that can be seen by the human eye. There are two important criteria for a perceptible watermark. One is that the watermark should be hard for an unauthorized person to remove; the other is that it should be falsification-resistant. Since it is relatively easy to embed a logo or pattern into an image, we have to ensure that the watermark was embedded by the claimed user.

Imperceptible watermark

The imperceptible watermark is the kind now in common use. It is invisible but can be extracted by a computer using a specific algorithm. This makes it possible to build a secret communication channel to one or more recipients.

Robust watermark

Robust watermark refers to the watermark which is resistant to common image processing such as compression or geometric transformation. This means that the watermark will still survive after the destructive procedures described above.

Two major classes of robust watermarks are public and private watermarks [6] [8]. A public watermark needs no side information in the watermark detection stage, whereas a private watermark requires it. Public watermarks can be accessed by virtually anyone and provide a side channel for carrying information; their major drawback is the limited bandwidth available for the data. The side information required by a private watermark, on the other hand, places a heavy load on the channel bandwidth, and transmitting it securely is a serious problem. This is why the private watermark is not popular with most users.

Fragile watermark

As its name implies, the fragile watermark is easily destroyed by all kinds of processing. It is used for data authentication: if the watermark is destroyed, we can conclude that someone has modified the data.

Semi-fragile watermark

The semi-fragile watermark has characteristics between those of the robust and the fragile watermark. Its only purpose is to differentiate between lossy transformations that are “information preserving” and lossy transformations that are “information altering” [9]. For example, in authentication it is important to distinguish whether a change to the original data was made by compression or by some other manipulation. Data compression, which preserves the integrity of the content, is allowed, but manipulations that alter the integrity of the data are forbidden. The semi-fragile watermark achieves this goal because it survives compression but is broken by other manipulations.

Removable watermark

The watermarking technique which makes it possible to obtain the original image from the watermarked image is called the removable watermarking. But this kind of watermark violates the desired properties of the robust watermark.

Un-removable watermark

This is the goal that most watermarking algorithms aim to achieve. In this scheme, the watermark is tightly combined with the host image, and separating them is almost impossible. Hence, extra information such as the ownership of the product is securely protected.

2.2.5 Watermarking Techniques

We describe several types of watermarking techniques. We will not describe their details but only introduce the principles. The watermarking techniques can be coarsely divided into two classes [10], the correlation-based watermarking, and the noncorrelation-based watermarking.

Correlation-based watermarking:

The correlation-based watermarking technique adds the watermark directly to the host image with an intensity factor g. The watermark is a pseudorandom noise pattern, generated from a seed and consisting of the integers {-1, 0, 1}. As shown in Figure 2-13, the watermark W is multiplied by the gain factor g and then added to the host image I to form the watermarked image, that is, I’ = I + g‧W.

For watermark detection, we calculate the correlation between the watermarked image and the pseudorandom noise pattern. Since the watermark is pseudorandom, we assume that the correlation between the host image and the watermark is almost zero. Therefore, the correlation between the watermarked image and the watermark is approximately the autocorrelation of the watermark itself, scaled by the gain factor g. Thus, if the detection process yields a high positive correlation, we can say that the image is watermarked; if the wrong watermark pattern is used to inspect the watermarked image, the correlation should be very low because of the characteristics of the pseudorandom pattern. Figure 2-14 shows a statistical result from [10]. The seed is used to generate the pseudorandom pattern, and different seeds produce different patterns that are almost uncorrelated with each other. The diagram shows that the correlation between the watermarked image and a pseudorandom pattern is close to zero unless the seed is the one that generated the embedded watermark.

Figure 2-14 Correlation values between the watermarked image and the patterns generated by different seeds; the seed used to generate the watermark is 10
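The embedding and detection steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the helper names, the gain of 5, the use of NumPy's random generator as the pseudorandom source, and the random stand-in for a host image are all choices made for this example and are not taken from [10].

```python
import numpy as np


def make_pattern(seed, shape):
    """Pseudorandom watermark with values in {-1, 0, 1}, reproducible from the seed."""
    rng = np.random.default_rng(seed)
    return rng.integers(-1, 2, size=shape)  # upper bound is exclusive


def embed(host, pattern, gain):
    """Watermarked image I' = I + g * W."""
    return host.astype(float) + gain * pattern


def correlate(image, pattern):
    """Mean product of the mean-removed image and a candidate pattern."""
    centered = image - image.mean()
    return float(np.mean(centered * pattern))


# Stand-in host image (random 8-bit values); a real image would be loaded here.
host = np.random.default_rng(12345).integers(0, 256, size=(256, 256))
marked = embed(host, make_pattern(10, host.shape), gain=5.0)

# Test the watermarked image against patterns generated from seeds 0..19.
scores = {seed: correlate(marked, make_pattern(seed, host.shape)) for seed in range(20)}
detected = max(scores, key=scores.get)
```

In this sketch the score for the embedding seed is close to g times the mean of W squared, while the scores for the other seeds stay near zero, which mirrors the single peak in Figure 2-14.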

Since the correlation value of a wrong pattern is not exactly equal to zero, a threshold T is required to decide whether the image is watermarked. If the correlation value is below the threshold T, we say that the image is un-watermarked. This detection scheme causes two types of errors, false positives and false negatives. A false positive means that a watermark is detected although there is none, and a false negative means that the watermark is not detected even though there is one. Normally, a higher gain leads to a lower error probability, but it also lowers the quality of the watermarked image.

With the above method, a watermark pattern can embed only one bit of data in an image. To embed several bits, the host image is divided into several embedding blocks and one bit is embedded in each block. Figure 2-15 is a simple example of embedding four bits in one image: the host image is first divided into four sub-pictures, and four watermarks are embedded into the corresponding sub-pictures, each with its own intensity.
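A possible way to realize this block-wise embedding is sketched below. The text does not state how a bit value is mapped to a watermark, so the sketch assumes one common convention: the pattern is added for a bit of 1 and subtracted for a bit of 0, and the bit is read back from the sign of the correlation. The function names, the 2x2 block layout, the gain, and the stand-in host image are likewise assumptions for illustration.

```python
import numpy as np


def block_regions(shape, rows, cols):
    """Index regions that split an image of the given shape into rows x cols blocks."""
    bh, bw = shape[0] // rows, shape[1] // cols
    return [(slice(r * bh, (r + 1) * bh), slice(c * bw, (c + 1) * bw))
            for r in range(rows) for c in range(cols)]


def embed_bits(host, bits, seed, gain, rows=2, cols=2):
    """Embed one bit per block: +g*W for bit 1, -g*W for bit 0 (assumed convention)."""
    marked = host.astype(float).copy()
    rng = np.random.default_rng(seed)
    for bit, region in zip(bits, block_regions(host.shape, rows, cols)):
        pattern = rng.integers(-1, 2, size=marked[region].shape)
        sign = 1.0 if bit else -1.0
        marked[region] += sign * gain * pattern
    return marked


def extract_bits(marked, seed, rows=2, cols=2):
    """Regenerate the per-block patterns and read each bit from the correlation sign."""
    rng = np.random.default_rng(seed)
    bits = []
    for region in block_regions(marked.shape, rows, cols):
        block = marked[region]
        pattern = rng.integers(-1, 2, size=block.shape)
        corr = np.mean((block - block.mean()) * pattern)
        bits.append(1 if corr > 0 else 0)
    return bits


host = np.random.default_rng(12345).integers(0, 256, size=(256, 256))
bits = [1, 0, 0, 1]
marked = embed_bits(host, bits, seed=10, gain=5.0)
recovered = extract_bits(marked, seed=10)
```

Because each block is smaller than the whole image, each correlation is computed from fewer samples, so the capacity gained by adding blocks is paid for with a less reliable decision per bit at the same gain.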


Figure 2-16 Watermarking in wavelet domain

The correlation-based watermarking technique described above operates in the spatial domain. The watermark can also be embedded in a frequency domain such as the wavelet transform (WT) domain, the discrete cosine transform (DCT) domain, or the Fourier transform (FT) domain. Each frequency domain has its own properties, which can be used to improve watermarking efficiency. For instance, according to the human visual system (HVS) model (introduced in a later section), human eyes are less sensitive to disturbances at high frequencies, so a watermark of higher intensity can be embedded in the high-frequency components to increase robustness without seriously damaging the quality of the host image. Figure 2-16 shows an example of embedding a watermark in the wavelet domain: the watermark of each subband is independent and may have a different intensity to improve watermarking efficiency. Techniques that embed the watermark in the DCT or FT domain operate similarly, but since they are difficult to illustrate graphically, we omit them here.

Noncorrelation-based watermarking:

There are various types of noncorrelation-based watermarking such as least significant bit (LSB) modification, DCT coefficient ordering, and block-based watermarking.


Least significant bit (LSB) modification is the simplest example of a spatial-domain watermarking technique. This method directly replaces the LSB bit-plane of the host image with watermark bits, so a very large number of bits can be embedded. It is certainly not a secure scheme, however, because the LSB bit-plane is easily corrupted by random noise, which destroys the watermark. As shown in Figure 2-17, since the LSB bit-plane of the host image carries almost no information about the image, replacing it with the watermark causes no perceptual difference between the watermarked image and the host image.
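The LSB replacement described above can be sketched as follows for an 8-bit grayscale image. This is a minimal illustration; the function names and the row-by-row embedding order are assumptions made for the example.

```python
import numpy as np


def embed_lsb(host, bits):
    """Replace the least significant bit of the first len(bits) pixels with the watermark bits."""
    flat = host.astype(np.uint8).ravel().copy()
    bits = np.asarray(bits, dtype=np.uint8)
    if bits.size > flat.size:
        raise ValueError("watermark is longer than the number of pixels")
    # Clear the LSB (mask 0xFE), then set it to the watermark bit.
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(host.shape)


def extract_lsb(marked, count):
    """Read back the first `count` watermark bits from the LSB bit-plane."""
    return marked.ravel()[:count] & 1


rng = np.random.default_rng(0)
host = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)  # stand-in host image
payload = rng.integers(0, 2, size=host.size)                 # one bit per pixel
marked = embed_lsb(host, payload)
recovered = extract_lsb(marked, payload.size)
```

Each pixel changes by at most one gray level, which is why the result is visually indistinguishable from the host, and any operation that perturbs pixel values by one level is enough to corrupt the payload.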

The DCT coefficient ordering method was proposed in [11] and described more clearly in [10]. It arranges selected DCT coefficients of each block in a specific order that represents the embedded information. First, the host image is divided into 8x8 blocks, and the DCT is applied to each block. Then, two or three DCT coefficients in the mid-band $F_{mid}$ are selected and quantized using the default JPEG quantization table and a relatively low JPEG quality factor. Finally, the quantized coefficients are reordered according to the specific order shown in Figure 2-18. For example, to embed a bit with value 0, the third selected coefficient should have the lowest value.


Block-based watermarking [12] [13] divides the host image into several embedding blocks and tunes the mean value of each block by changing the values of some coefficients in the block. In [12], an even block mean represents an embedded watermark bit of 0, while an odd block mean represents an embedded 1. In [13], a block mean of zero represents an embedded 1, while an embedded 0 makes the block mean exceed a threshold value P.

Another method is the bit-plane-based method, in which the watermark is embedded by directly replacing a specific bit with the watermark bit. The LSB modification method mentioned above is a well-known example, but there are many variations, such as the method described in [15]. In that method, the watermark is embedded into bits that need not lie in the LSB bit-plane, and the Torus Automorphisms (TA) technique is used to protect the security of the embedded information.

2.2.6 Human Visual System (HVS) Model

The HVS model is useful for tuning the intensity of the watermark to improve the trade-off between robustness and imperceptibility. A watermarking technique that takes advantage of the HVS model can embed a watermark of higher intensity, at the same perceptual quality, than one that does not.

We can view the HVS model as three independent stages [6]: a receiver (the eye and retina), a transmission channel (the optic nerve), and a processing engine (the brain). Since the behavior of the brain is still not well understood, constructing a perfect HVS model is very difficult. In this section, we only try to understand the features of the HVS model that a watermark designer can use, from an engineering perspective.

2.2.6.1 Introduction to the Human Visual System (HVS) model

Wandell [16] divides human vision into three successive stages: encoding, representation, and interpretation. The encoding stage corresponds to the transformation of light into electric signals by the retina. The representation stage represents the encoded image through different visual pathways tuned to specific characteristics of the visual signal. The interpretation stage is the highest level of human vision and is located in the brain. This stage is very complex and involves considerable subjectivity; however, motion, depth, and color appearance, which can be roughly modeled, also belong to it.

The retina of the human eye has three kinds of photoreceptors, called red, green, and blue cones. Each reaches its maximum sensitivity at the wavelength of light corresponding to its basic color. All colors are first encoded as a function of these basic colors, but as research on the human visual system progressed, a better color space was found, consisting of one achromatic and two chromatic channels. Many color spaces based on this concept have been proposed, such as CIE Luv and CIE Lab; the YCbCr color space is the most popular one.

The achromatic channel is also called the luminance channel. For simplicity, most modifications of color images are made only in this channel, because the characteristics of the chromatic channels are still not well explored.

2.2.6.2 Visibility Threshold of the Human Visual System

In this section, we introduce only the concepts and skip the detailed formulas. The human visual system (HVS) is not sensitive to stimuli that are too small and cannot discriminate between signals with infinite precision. This is also why a lossy compression system can compress an image with no perceptual difference. HVS sensitivity depends first on the anatomy of the eye, which gives rise to effects such as refraction, diffraction, color aberration, and spatial sampling. In addition, HVS sensitivity depends on the characteristics of the visual signal. Unfortunately, the whole HVS is too complex to analyze by anatomy alone. Thus, another approach has been proposed and works well: experiments in which human observers detect and discriminate patterns. Several effects describing the properties of HVS sensitivity are listed below.

Just Noticeable Difference

The just noticeable difference (JND) threshold is the level below which any noise can be ignored, since the human eye will not perceive the difference. The JND threshold depends on the features of both the signal and the background pattern.

Weber-Fechner’s law states that “if the luminance of a test stimulus is just noticeable from the surrounding luminance, then the ratio of the luminance difference to the surrounding luminance is approximately constant” [17], which implies that noise is easier to detect in a dark region. However, the experimental conditions of this law are too simple in comparison to the real conditions of image viewing. To complement Weber-Fechner’s law, Albert Munsell proposed another system, the Munsell Renotation System [18]. In this system, lightness is shown to be nonlinearly related to the luminance signal, and this relationship has been used to shape the γ characteristic of displays.

Contrast Sensitivity Function

When frequency is taken into account, the JND threshold is also considered in the frequency domain. To determine it, extensive psycho-visual measurements have been performed using gratings [19], that is, simple sinusoidal signals of a given spatial frequency and orientation with a window extension. Mannos and Sakrison proposed the contrast sensitivity function (CSF) in (2.2.6-1) [20]. In (2.2.6-1), $L_{\max}$ and $L_{\min}$ are the maximal and minimal luminance of the grating and $f$ is the spatial frequency; $K_0$, $K_1$, $a$, and $\alpha$ depend on parameters such as mean luminance, temporal frequency, and orientation. Although the CSF may be more precise, it is valid only under many constraints, so it should be used cautiously.

$$C = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}, \qquad CSF(f) = K_0 \cdot (1 + K_1 \cdot f) \cdot e^{-(a \cdot f)^{\alpha}} \qquad \text{(2.2.6-1)}$$
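To give a feeling for the shape of the CSF, the short Python sketch below evaluates the contrast of a grating and a CSF of the form K0 (1 + K1 f) exp(-(a f)^α). The default constants are an assumption for this example: they are the commonly quoted Mannos and Sakrison values, A(f) = 2.6 (0.0192 + 0.114 f) exp(-(0.114 f)^1.1) with f in cycles per degree, rewritten in this form, and the text above does not state them. The function names are also chosen for illustration only.

```python
import math


def michelson_contrast(l_max, l_min):
    """Contrast C of a grating from its maximal and minimal luminance."""
    return (l_max - l_min) / (l_max + l_min)


def csf(f, k0=2.6 * 0.0192, k1=0.114 / 0.0192, a=0.114, alpha=1.1):
    """CSF(f) = K0 * (1 + K1 * f) * exp(-(a * f) ** alpha), for f >= 0.

    The default constants are assumed values (commonly quoted Mannos-Sakrison fit).
    """
    return k0 * (1.0 + k1 * f) * math.exp(-((a * f) ** alpha))


# Scan spatial frequencies from 0.5 to 60 cycles per degree in steps of 0.5.
frequencies = [0.5 * n for n in range(1, 121)]
peak_frequency = max(frequencies, key=csf)
```

With these assumed constants the sensitivity is band-pass: it rises at low frequencies, peaks at roughly 8 cycles per degree, and falls off at high frequencies, which is consistent with embedding stronger watermark components at high spatial frequencies.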


2.2.6.3 Visual Masking

When two or more signals interfere with each other, their visibility may be increased or, more often, decreased; this is called the masking effect. What matters for a watermarking system is the ability of the host image to hide other signals. Masking effects include spatial masking, contrast (pattern) masking, noise masking, and so forth. Since the masking effect is complex, we do not describe it here; the reader can find more details in [6].

2.3 Layered Protection of Scalable Media

With the widespread use of the Internet, digital data are easy to obtain, so a protection scheme is needed for intellectual property. Moreover, scalable media coding is becoming more and more important because of the varying transmission bandwidth in applications such as mobile communication. In scalable coding, the media content is divided into one base layer and several enhancement layers, and each enhancement layer can independently improve a certain quality aspect of the reconstructed image.

There are three categories of scalability: spatial scalability, temporal scalability, and signal-to-noise ratio (SNR) scalability. With spatial scalability, each layer has a different resolution, and the reconstructed image has a higher resolution when more enhancement layers are decoded. With temporal scalability, the enhancement layers increase the overall frame rate of a video. With SNR scalability, each layer contributes a different amount of quality improvement.

Traditionally, each layer of scalable coding is protected by cryptographic techniques, but the synchronization problem [21] between the encryption key and the encrypted content is still an issue. Therefore, if we could transmit the encryption key with the encrypted content, the synchronization problem will be resolved. Thus, a scheme which combines encryption and robust watermarking is proposed [22]. Before introducing this scheme, a typical MPEG-2 conditional access receiver [23] is introduced.

As shown in Figure 2-19, the digital content is protected by a Control Word (CW), which is obtained through a two-step decryption flow. To retrieve the CW, the Service Key (SK) must be decoded first, and the decoding of the SK is based on the combination of the User Key (UK) and the Entitlement Management Message (EMM). The User Key may be stored in a smart card, and the EMM can be used to change a user's access rights to the content; for example, if a subscription is overdue, the broadcaster can disable the user's access by changing the EMM. After the SK is decoded, an Entitlement Control Message (ECM) is required to decode the CW. Because computing the Control Word is complicated, the synchronization problem of cryptographic techniques is hard to handle. Thus, a new architecture was proposed [22], shown in Figure 2-20. In this scheme, the EMM and ECM are embedded into the content and transmitted together with it, which resolves the synchronization problem. Note that the ECM and EMM in Figure 2-19 correspond to the key function key() and the watermark $W_i$ in Figure 2-20, and UK, SK, and CW correspond to $G_i$, $F_i$, and $K_i$.

[Block diagram with the blocks Descrambler, Decipherment of CW, and Decipherment of SK, and the signals Scrambled bitstream, Video/Audio/Data, Control Word (CW), Service Key (SK), User Key (UK), ECM, and EMM]

Figure 2-19 A typical MPEG-2 conditional access receiver

In Figure 2-20, $X_i$ is the encrypted enhancement layer, which can be decrypted using the decryption key $K_i$, and $E_i$ is the resulting enhancement layer used to form the $i$th constructed base layer $B_i$. $W_i$ is the watermark extracted from the constructed base layer $B_{i-1}$ with extraction parameter $P_{i-1}$, and $F_i$ is the secret information, obtained by decrypting $W_i$ with $G_i$, from which the decryption key $K_i$ and the extraction parameter $P_i$ are derived. In short, the relationships between all coefficients in Figure 2-20 are listed in (2.3-1) to (2.3-6).

[Block diagram with the blocks Decrypt_e(), Compose(), Delay, Extract(), Decrypt_f(), key(), and param(), and the signals B_0 or X_i, B_i, B_{i-1}, {G_i}, W_i, K_i, F_i, P_i, and P_{i-1}]

Figure 2-20 Decryption and decoding of layer-protected content

$$E_i = Decrypt_e(X_i, K_i) \qquad \text{(2.3-1)}$$

$$B_i = Compose(B_{i-1}, E_i) \qquad \text{(2.3-2)}$$

$$W_i = Extract(B_{i-1}, P_{i-1}) \qquad \text{(2.3-3)}$$

$$F_i = Decrypt_f(W_i, G_i) \qquad \text{(2.3-4)}$$

$$K_i = key(F_i) \qquad \text{(2.3-5)}$$

$$P_i = param(F_i) \qquad \text{(2.3-6)}$$
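The order in which relations (2.3-1) to (2.3-6) are applied at the receiver can be sketched as a simple loop. The sketch below is only a toy illustration: the six operations are passed in as functions, and the stand-ins used in the demo (XOR "decryption", a list index as the extraction parameter, list concatenation as composition, and the packing of $P_i$ and $K_i$ into one integer) are hypothetical choices made so that the example runs; they are not the cryptosystem or watermark of [22].

```python
def decode_layers(base, encrypted_layers, group_keys, initial_param,
                  extract, decrypt_f, key, param, decrypt_e, compose):
    """Reconstruct the final layer B_n from B_0, the X_i, the G_i, and P_0."""
    b, p = base, initial_param
    for x_i, g_i in zip(encrypted_layers, group_keys):
        w_i = extract(b, p)        # (2.3-3) watermark from B_{i-1} with P_{i-1}
        f_i = decrypt_f(w_i, g_i)  # (2.3-4) secret information
        k_i = key(f_i)             # (2.3-5) decryption key of layer i
        p_i = param(f_i)           # (2.3-6) extraction parameter for the next layer
        e_i = decrypt_e(x_i, k_i)  # (2.3-1) decrypted enhancement layer
        b = compose(b, e_i)        # (2.3-2) new constructed base layer B_i
        p = p_i
    return b


# Toy stand-ins (hypothetical): layers are lists of integers.
toy = dict(
    extract=lambda b, p: b[p],                  # watermark sits at index p
    decrypt_f=lambda w, g: w ^ g,               # XOR with G_i
    key=lambda f: f & 0xFF,                     # low byte of F_i is K_i
    param=lambda f: f >> 8,                     # remaining bits of F_i are P_i
    decrypt_e=lambda x, k: [v ^ k for v in x],  # XOR each sample with K_i
    compose=lambda b, e: b + e,                 # append the enhancement data
)

w1 = ((3 << 8) | 0x21) ^ 0x5A   # F_1 packs P_1 = 3 and K_1 = 0x21, protected by G_1
w2 = ((0 << 8) | 0x11) ^ 0x3C   # F_2 packs P_2 = 0 and K_2 = 0x11, protected by G_2
base = [7, w1]                  # B_0 carries W_1 at index P_0 = 1
e1, e2 = [10, w2], [20, 30]     # E_1 carries W_2 at index 3 of B_1
x1 = [v ^ 0x21 for v in e1]     # X_1 is E_1 encrypted with K_1
x2 = [v ^ 0x11 for v in e2]     # X_2 is E_2 encrypted with K_2
result = decode_layers(base, [x1, x2], [0x5A, 0x3C], 1, **toy)
```

The loop makes the dependency chain explicit: layer i can be decrypted only after layer i-1 has been reconstructed, because the key for $X_i$ travels inside the watermark of $B_{i-1}$.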
