Transform Coding
National Chiao Tung University Chun-Jen Tsai 11/24/2014
Transform Domain Data Analysis
Given an invertible transform A, the entropy of a source x does not change under A, i.e., Ax has the same entropy as x.
However, there are several reasons why we may want to perform lossy compression on Ax instead of x:
- The input data sequence can be interpreted with more insight
- The input data are possibly de-correlated in the transform domain
- The original time-ordered sequence of data can be decomposed into different categories
Example: Height-Weight Data (1/3)
The height-weight data pairs tend to cluster along the line x_w = 2.5 x_h. A rotation transform can simplify the data representation:

$$A = \begin{bmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{bmatrix}, \quad \varphi = 68.02^\circ, \qquad \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} = A \begin{bmatrix} x_h \\ x_w \end{bmatrix}.$$
Example: Height-Weight Data (2/3)
If we set θ_1 to zero for all the data pairs and transform the data back to the x_h–x_w domain, we get the reconstruction errors as follows:
(Figure: original data vs. reconstructed data)
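The effect can be sketched numerically. The sketch below uses a hypothetical data set (heights in inches, weights in pounds, clustered near weight ≈ 2.5 × height, which is consistent with the 68.02° rotation angle); it is not the slide's actual data:

```python
import numpy as np

# Hypothetical height (inches) / weight (pounds) pairs clustered near x_w = 2.5 * x_h
rng = np.random.default_rng(0)
xh = rng.uniform(60, 75, size=200)              # heights
xw = 2.5 * xh + rng.normal(0, 5, size=200)      # weights scattered about the line
X = np.vstack([xh, xw])                         # 2 x 200 data matrix

phi = np.deg2rad(68.02)                         # rotation angle from the slide
A = np.array([[ np.cos(phi), np.sin(phi)],
              [-np.sin(phi), np.cos(phi)]])
Theta = A @ X                                   # theta_0: along-line, theta_1: across-line

# Discard theta_1 entirely, then invert the rotation (A is orthonormal: A^-1 = A^T)
Theta_zeroed = Theta.copy()
Theta_zeroed[1, :] = 0.0
X_rec = A.T @ Theta_zeroed

rms_error = np.sqrt(np.mean((X - X_rec) ** 2))  # small: most energy lives in theta_0
var0, var1 = Theta.var(axis=1)                  # variance concentrates in theta_0
```

Zeroing the low-variance coordinate θ_1 costs only a small reconstruction error, which is the point of the example.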
Example: Height-Weight Data (3/3)
Note that, in the original data, both x_h and x_w have non-negligible variances; however, of θ_0 and θ_1, only θ_0 has a large variance.
The variance (or energy) of a source and its information content are positively related: the larger the source variance, the higher the entropy.
- For a Gaussian source, the differential entropy is $\frac{1}{2}\log(2\pi e\sigma^2)$.
The error introduced into the reconstructed sequence {x_n} is equal to the error introduced into the transform-domain sequence {θ_n}.
Transform Coding Principle
Transform step:
- The source {x_n} is divided into blocks of size N. Each block is mapped into a transform sequence {θ_n} using a reversible mapping.
- Most of the energy of the transformed block is contained in a few elements of the transformed values.
Quantization step:
- The transformed sequence is quantized based on the following strategy:
  - The desired average bit rate
  - The statistics of the various transformed elements
  - The effect of distortion on the reconstructed sequence
Entropy coding step:
- The quantized data are entropy-coded using Huffman coding, arithmetic coding, or other techniques.
Transform Formulation
For media coding, only linear transforms are used.
- The forward transform can be denoted by
$$\theta_n = \sum_{i=0}^{N-1} x_i\, a_{n,i}.$$
- The inverse transform is
$$x_n = \sum_{i=0}^{N-1} \theta_i\, b_{n,i}.$$
The selection of N is application-specific:
- The complexity of the transform is lower for small N
- A large N adapts badly to fast-changing statistics
- A large N produces better resolution in the transform domain
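The two sums can be written out directly. The sketch below assumes an orthonormal transform, so the inverse weights are b_{n,i} = a_{i,n} (i.e., B = A^T); a random orthogonal matrix stands in for a concrete transform:

```python
import numpy as np

N = 4
# Any orthonormal A works here; take one from a QR decomposition
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.normal(size=(N, N)))

x = rng.normal(size=N)

# Forward transform: theta_n = sum_i x_i * a_{n,i}
theta = np.array([sum(A[n, i] * x[i] for i in range(N)) for n in range(N)])

# Inverse transform with b_{n,i} = a_{i,n}, i.e. B = A^T for an orthonormal A
B = A.T
x_rec = np.array([sum(B[n, i] * theta[i] for i in range(N)) for n in range(N)])
```

The explicit sums are, of course, just `A @ x` and `A.T @ theta` in matrix notation.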
2-D Forward Transform
For 2-D signals X_{i,j}, a general linear 2-D transform of block size N×N is given as
$$\Theta_{k,l} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{i,j}\, a_{i,j,k,l}.$$
If a separable transform is used, the formulation can be simplified to
$$\Theta_{k,l} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} a_{k,i}\, x_{i,j}\, a_{l,j} = \sum_{i=0}^{N-1} a_{k,i} \sum_{j=0}^{N-1} x_{i,j}\, a_{l,j}.$$
In matrix form, the separable transform becomes $\Theta = AXA^T$.
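The equivalence of the double-sum and matrix forms can be checked directly; a random orthogonal matrix is used here as a stand-in transform:

```python
import numpy as np

N = 4
rng = np.random.default_rng(2)
A, _ = np.linalg.qr(rng.normal(size=(N, N)))   # stand-in N x N transform matrix
X = rng.normal(size=(N, N))

# Separable 2-D transform as the explicit double sum:
# Theta_{k,l} = sum_i sum_j a_{k,i} * x_{i,j} * a_{l,j}
Theta_loops = np.zeros((N, N))
for k in range(N):
    for l in range(N):
        for i in range(N):
            for j in range(N):
                Theta_loops[k, l] += A[k, i] * X[i, j] * A[l, j]

# The same computation in matrix form: Theta = A X A^T
Theta_matrix = A @ X @ A.T
```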
Orthonormal Transform
All the transforms used in multimedia compression are orthonormal transforms; thus A^{-1} = A^T.
In this case, Θ = AXA^T becomes Θ = AXA^{-1}.
Orthonormal transforms are energy preserving:
$$\sum_{i=0}^{N-1}\theta_i^2 = \boldsymbol{\theta}^T\boldsymbol{\theta} = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^T A^T A\,\mathbf{x} = \mathbf{x}^T\mathbf{x} = \sum_{n=0}^{N-1} x_n^2.$$
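A quick numerical check of the energy-preservation chain above, again with a random orthonormal matrix standing in for a concrete transform:

```python
import numpy as np

# Energy preservation: for orthonormal A, sum(theta^2) == sum(x^2)
rng = np.random.default_rng(3)
A, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthonormal 8 x 8 matrix
x = rng.normal(size=8)
theta = A @ x

energy_in = np.sum(x ** 2)
energy_out = np.sum(theta ** 2)                # equal, since A^T A = I
```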
Energy Compaction Effect
The efficiency of a transform depends on how much energy compaction it provides.
The amount of energy compaction can be measured by the ratio of the arithmetic mean of the coefficient variances to their geometric mean:
$$G_{TC} = \frac{\frac{1}{N}\sum_{i=0}^{N-1}\sigma_i^2}{\left(\prod_{i=0}^{N-1}\sigma_i^2\right)^{1/N}},$$
where σ_i² is the variance of the ith coefficient.
Note: the wider the spread of the σ_i² w.r.t. their arithmetic mean, the smaller their geometric mean will be → better energy compaction!
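G_TC can be computed for a concrete case. The sketch below assumes a unit-variance AR(1) source with correlation ρ = 0.9 and the 8-point DCT, whose matrix is built inline from the standard cosine formula:

```python
import numpy as np

def dct_matrix(N):
    """N x N DCT-II matrix: rows are cosine basis functions."""
    C = np.zeros((N, N))
    for i in range(N):
        scale = np.sqrt(1.0 / N) if i == 0 else np.sqrt(2.0 / N)
        for j in range(N):
            C[i, j] = scale * np.cos((2 * j + 1) * i * np.pi / (2 * N))
    return C

N, rho = 8, 0.9
# Autocorrelation matrix R_{ij} = rho^|i-j| of a unit-variance AR(1) source
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
C = dct_matrix(N)
coeff_var = np.diag(C @ R @ C.T)          # variances of the transform coefficients

arith_mean = coeff_var.mean()             # equals the input variance (= 1) here
geo_mean = np.exp(np.mean(np.log(coeff_var)))
gtc = arith_mean / geo_mean               # > 1 whenever the transform compacts energy
```

For an orthonormal transform the arithmetic mean of the coefficient variances equals the input variance, so only the geometric mean shrinks as compaction improves.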
Decomposition of 1-D Input
A transform decomposes an input sequence into components with different characteristics. For input x = [x_1, x_2]^T and
$$A = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},$$
the transformed output is
$$A\mathbf{x} = \left(\frac{x_1 + x_2}{\sqrt{2}},\; \frac{x_1 - x_2}{\sqrt{2}}\right).$$
The first transformed component captures the average (i.e., low-pass) behavior of the input sequence, while the second component captures the differential (i.e., high-pass) behavior of the input.
Decomposition of 2-D Input
If the A in the previous example is used as a 2-D transform and X is a 2-D input, we have X = A^TΘA:
$$X = \frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}\begin{bmatrix}\theta_{00} & \theta_{01}\\ \theta_{10} & \theta_{11}\end{bmatrix}\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}$$
$$= \frac{1}{2}\begin{bmatrix}\theta_{00}+\theta_{01}+\theta_{10}+\theta_{11} & \theta_{00}-\theta_{01}+\theta_{10}-\theta_{11}\\ \theta_{00}+\theta_{01}-\theta_{10}-\theta_{11} & \theta_{00}-\theta_{01}-\theta_{10}+\theta_{11}\end{bmatrix}$$
$$= \theta_{00}\,\alpha_{0,0} + \theta_{01}\,\alpha_{0,1} + \theta_{10}\,\alpha_{1,0} + \theta_{11}\,\alpha_{1,1},$$
where α_{i,j} is the outer product of the ith and jth rows of A.
How do you interpret θ_{0,0}, …, θ_{1,1}? θ_{0,0} is the DC coefficient, and the other θ_{i,j} are AC coefficients.
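The basis-image decomposition can be verified numerically for the 2×2 transform of the example:

```python
import numpy as np

A = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # the 2 x 2 transform from the example
rng = np.random.default_rng(4)
X = rng.normal(size=(2, 2))
Theta = A @ X @ A.T                             # forward 2-D transform

# Reconstruct X as a weighted sum of basis images alpha_{i,j} = outer(A[i], A[j])
X_sum = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        alpha_ij = np.outer(A[i], A[j])         # outer product of rows i and j of A
        X_sum += Theta[i, j] * alpha_ij
```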
Karhunen-Loeve Transform (KLT)
The KLT consists of the eigenvectors of the autocorrelation matrix: [R]_{i,j} = E[X_n X_{n+|i−j|}].
The KLT minimizes the geometric mean of the variances of the transform coefficients → it provides the maximal G_TC.
Issues with the KLT:
- For non-stationary inputs, the autocorrelation function is time-varying; computation of the KLT is relatively expensive
- The KLT matrix must be transmitted to the decoder
If the input statistics change slowly and the transform size can be kept small, the KLT can be useful.
Example: KLT
For N = 2, the autocorrelation matrix for a stationary process is
$$R = \begin{bmatrix} R_{xx}(0) & R_{xx}(1) \\ R_{xx}(1) & R_{xx}(0) \end{bmatrix}.$$
The eigenvectors of R are
$$v_1 = \begin{bmatrix}\alpha\\ \alpha\end{bmatrix}, \qquad v_2 = \begin{bmatrix}-\beta\\ \beta\end{bmatrix}.$$
With the orthonormal constraint, the transform matrix is
$$K = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$
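A numerical check, with hypothetical values R_xx(0) = 1, R_xx(1) = 0.8; the eigenvectors come out as (1, ±1)/√2 regardless of R_xx(1), which is why K does not depend on the source statistics for N = 2:

```python
import numpy as np

# Autocorrelation matrix of a stationary process for N = 2 (hypothetical values)
Rxx0, Rxx1 = 1.0, 0.8
R = np.array([[Rxx0, Rxx1],
              [Rxx1, Rxx0]])

eigvals, eigvecs = np.linalg.eigh(R)   # columns of eigvecs are the eigenvectors

# The normalized eigenvectors are (1,1)/sqrt(2) and (1,-1)/sqrt(2),
# matching K = (1/sqrt(2)) [[1, 1], [1, -1]]
K = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
coeff_var = np.diag(K @ R @ K.T)       # variances: Rxx0 + Rxx1 and Rxx0 - Rxx1
```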
Discrete Cosine Transform
The DCT is derived from the Discrete Fourier Transform (DFT) by first performing an even-function extension of the input data and then computing its DFT:
- Only real-number operations are required
- Better energy compaction than the DFT
(Figure: periodic extensions assumed by the DFT vs. the DCT)
DCT Formulation
The rows of the DCT matrix are composed of cosine functions of different frequencies:
$$[C]_{i,j} = \begin{cases} \sqrt{\dfrac{1}{N}}\cos\dfrac{(2j+1)i\pi}{2N}, & i = 0,\; j = 0, 1, \ldots, N-1 \\[6pt] \sqrt{\dfrac{2}{N}}\cos\dfrac{(2j+1)i\pi}{2N}, & i = 1, \ldots, N-1,\; j = 0, 1, \ldots, N-1 \end{cases}$$
- The inner product of the input signal with each row of the matrix is the projection of the input signal onto a cosine function of fixed frequency
- The larger N is, the better the frequency resolution
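The matrix can be built directly from this formula and checked for orthonormality; a constant input projects entirely onto the i = 0 (DC) row:

```python
import numpy as np

def dct_matrix(N):
    """Build the N x N DCT-II matrix row by row from the cosine formula."""
    C = np.zeros((N, N))
    for i in range(N):
        scale = np.sqrt(1.0 / N) if i == 0 else np.sqrt(2.0 / N)
        for j in range(N):
            C[i, j] = scale * np.cos((2 * j + 1) * i * np.pi / (2 * N))
    return C

C8 = dct_matrix(8)

# The matrix is orthonormal, so C C^T = I and the inverse transform is just C^T
identity_check = C8 @ C8.T

# A constant input has only a DC component: theta[0] = sqrt(N), theta[1:] = 0
theta = C8 @ np.ones(8)
```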
Basis Functions of 8-Point DCT
Each row of the DCT matrix is a basis function:
Basis Images of 8-Point 2-D DCT
DCT can be extended to a 2-D transform:
Performance of DCT
For Markov sources with a high correlation coefficient ρ,
$$\rho = \frac{E[x_n x_{n+1}]}{E[x_n^2]},$$
the compaction ability of the DCT is close to that of the KLT.
As many sources can be modeled as Markov sources with high values of ρ, the DCT is the most popular transform for multimedia compression.
Discrete Walsh-Hadamard Trans.
The Hadamard transform is defined by an N×N matrix H with the property HH^T = NI.
- Simple to compute while still separating the low-frequency from the high-frequency components of the input data
The Hadamard matrix is recursively defined as
$$H_{2N} = \begin{bmatrix} H_N & H_N \\ H_N & -H_N \end{bmatrix}, \qquad H_1 = [1].$$
The DWHT transform matrix is obtained by:
- Normalizing the matrix by 1/√N so that it is orthonormal
- Re-arranging the rows according to the number of sign changes
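A sketch of the recursive construction and the sequency (sign-change) re-ordering:

```python
import numpy as np

def hadamard(N):
    """Recursive Hadamard construction: H_1 = [1], H_2N = [[H, H], [H, -H]]."""
    if N == 1:
        return np.array([[1.0]])
    H = hadamard(N // 2)
    return np.block([[H, H], [H, -H]])

N = 8
H = hadamard(N)

# Defining property: H H^T = N I
prop_check = H @ H.T

# DWHT matrix: normalize by 1/sqrt(N), then sort rows by number of sign changes
Hn = H / np.sqrt(N)
sign_changes = [int(np.sum(np.abs(np.diff(np.sign(row))) > 0)) for row in Hn]
W = Hn[np.argsort(sign_changes)]     # rows ordered from 0 to N-1 sign changes
```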
Coding of Transform Coefficients
Different transform coefficients should be quantized and coded differently based on the amount of information they carry.
- Information is related to the variance of each coefficient
The bit allocation problem tries to determine the quantizer levels to use for the different transform coefficients.
- The Lagrange multiplier optimization technique is often used to solve for the optimal bit allocation
Lagrange Multiplier
A constrained optimization problem tries to minimize a cost function f(x, y) subject to a constraint on the parameters x and y: g(x, y) = c.
The Lagrange cost function is defined as follows:
$$J(x, y, \lambda) = f(x, y) - \lambda\cdot(g(x, y) - c).$$
Solution: solve
$$\nabla_{x,\, y,\, \lambda}\, J(x, y, \lambda) = 0.$$
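For a toy instance (not from the slides), the stationarity conditions form a linear system that can be solved directly, e.g. minimizing f(x, y) = x² + y² subject to g(x, y) = x + y = 1:

```python
import numpy as np

# J(x, y, lam) = x^2 + y^2 - lam * (x + y - 1); setting the gradient to zero gives:
#   dJ/dx   =  2x - lam       = 0
#   dJ/dy   =  2y - lam       = 0
#   dJ/dlam = -(x + y - 1)    = 0
M = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(M, rhs)   # minimum at x = y = 1/2 with lam = 1
```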
Rate-Distortion Optimization (1/3)
If the average rate per coefficient is R and the rate for the kth coefficient is R_k, then
$$R = \frac{1}{M}\sum_{k=1}^{M} R_k,$$
where M is the number of transform coefficients.
The error variance for the kth quantizer, σ_{r_k}², is related to the kth input variance σ_{θ_k}² by
$$\sigma_{r_k}^2 = \alpha_k\, 2^{-2R_k}\, \sigma_{\theta_k}^2,$$
where α_k depends on the input distribution and the quantizer.
The total reconstruction error is given by
$$\sigma_r^2 = \sum_{k=1}^{M} \alpha_k\, 2^{-2R_k}\, \sigma_{\theta_k}^2.$$
Rate-Distortion Optimization (2/3)
The objective of the bit allocation procedure is to find the R_k that minimize σ_r² subject to the total rate constraint R.
If we assume that α_k is a constant α for all k, we can set up the minimization problem in terms of Lagrange multipliers as
$$J = \sum_{k=1}^{M} \alpha\, 2^{-2R_k}\, \sigma_{\theta_k}^2 - \lambda\left(R - \frac{1}{M}\sum_{k=1}^{M} R_k\right).$$
Taking the derivative of J with respect to R_k and setting it to zero, we obtain the expression for R_k:
$$R_k = \frac{1}{2}\log_2\!\left(2\ln 2\cdot\alpha\,\sigma_{\theta_k}^2\right) - \frac{1}{2}\log_2\frac{\lambda}{M}.$$
Rate-Distortion Optimization (3/3)
Substituting R_k into the expression for R, we have
$$\frac{\lambda}{2M\ln 2} = \alpha\left(\prod_{k=1}^{M}\sigma_{\theta_k}^2\right)^{1/M} 2^{-2R}.$$
Therefore, the individual bit allocation for each transform coefficient is
$$R_k = R + \frac{1}{2}\log_2\frac{\sigma_{\theta_k}^2}{\left(\prod_{j=1}^{M}\sigma_{\theta_j}^2\right)^{1/M}}.$$
Note that the R_k may not be integers or positive numbers:
- Negative R_k's are set to zero
- Positive R_k's are reduced to smaller integer values
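The closed-form rule can be checked numerically with hypothetical coefficient variances (and α = 1): the allocation averages to R and equalizes the per-coefficient distortion terms σ_{θk}² 2^{−2R_k}:

```python
import numpy as np

# Closed-form allocation: R_k = R + 0.5 * log2(var_k / geometric_mean(vars))
var = np.array([100.0, 25.0, 9.0, 1.0])     # hypothetical coefficient variances
M, R = len(var), 2.0                        # target: 2 bits/coefficient on average

geo = np.exp(np.mean(np.log(var)))          # geometric mean of the variances
Rk = R + 0.5 * np.log2(var / geo)

avg_rate = Rk.mean()                        # equals R by construction
# Every quantizer ends up with the same distortion term var_k * 2^(-2 R_k)
distortion = var * 2.0 ** (-2 * Rk)
```

Note that some R_k here come out fractional (and the smallest one nearly zero), which is exactly why the rounding rules above are needed in practice.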
Zonal Sampling
Zonal sampling is a simple bit allocation algorithm:
1. Compute σ_{θk}² for each coefficient.
2. Set R_k = 0 for all k and set R_b = MR, where R_b is the total number of bits available for distribution.
3. Sort the variances {σ_{θk}²}. Suppose σ_{θm}² is the maximum.
4. Increment R_m by 1, and divide σ_{θm}² by 2.
5. Decrement R_b by 1. If R_b = 0, then stop; otherwise, go to step 3.
Bit allocation map for an 8×8 transform
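The steps above can be transcribed directly (hypothetical variances; the slide's "divide the variance by 2 per allocated bit" rule is used as stated):

```python
import numpy as np

def zonal_allocation(variances, R):
    """Greedy zonal-sampling bit allocation following steps 1-5 above."""
    var = np.asarray(variances, dtype=float).copy()
    M = var.size
    Rk = np.zeros(M, dtype=int)
    Rb = M * R                       # total bit budget (step 2)
    while Rb > 0:
        m = int(np.argmax(var))      # coefficient with the largest remaining variance
        Rk[m] += 1                   # give it one more bit (step 4)
        var[m] /= 2.0                # halve its remaining variance (step 4)
        Rb -= 1                      # spend one bit (step 5)
    return Rk

Rk = zonal_allocation([100.0, 25.0, 9.0, 1.0], R=2)   # hypothetical variances
```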
Threshold Coding
Another bit allocation policy is called threshold coding:
- Arrange the transform coefficients in a line; the first coefficient is always coded
- For the remaining coefficients:
  - If the magnitude is smaller than a threshold, the coefficient is skipped
  - If the magnitude is larger than the threshold, its quantized value and the number of skipped coefficients before it are coded
- A zigzag scan is often used for the 2-D to 1-D mapping
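Both ideas can be sketched together. The `(run, value)` token format below is illustrative only, not an actual codeword format, and the 4×4 block and threshold are hypothetical:

```python
import numpy as np

def zigzag_indices(N):
    """Zigzag scan order for an N x N block (anti-diagonals, alternating direction)."""
    order = []
    for s in range(2 * N - 1):
        diag = [(i, s - i) for i in range(N) if 0 <= s - i < N]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def threshold_runs(block, T):
    """Tokens (zero_run, value) for coefficients with |value| >= T, in zigzag
    order; the first (DC) coefficient is always coded."""
    seq = [float(block[i, j]) for i, j in zigzag_indices(block.shape[0])]
    tokens = [("DC", seq[0])]
    run = 0
    for v in seq[1:]:
        if abs(v) < T:
            run += 1                 # skip a sub-threshold coefficient
        else:
            tokens.append((run, v))  # code the run length and the value
            run = 0
    tokens.append("EOB")             # end-of-block marker
    return tokens

block = np.zeros((4, 4))
block[0, 0], block[0, 1], block[2, 0] = 50, -7, 3
tokens = threshold_runs(block, T=2)
```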
JPEG Image Compression
A standard defined by ISO/IEC JTC1/SC 29/WG 1 in 1992
The official IS number is IS 10918-1, which defines the input to the decoder (a.k.a. the elementary stream), and how the decoder reconstructs the image
The popular file format JFIF for the JPEG elementary stream is defined in IS 10918-5
There are several newer image coding standards that are incompatible with the old JPEG but still bear the JPEG name
Wavelet-based JPEG-2000 (IS 15444-1)
High quality lossless/lossy JPEG-XR (IS 29199-2)
JPEG Initial Processing
- Color space mapping: RGB → YC_BC_R
- Chroma channel 4:2:2 sub-sampling
- Level shifting: assuming each pixel has p bits, each pixel becomes x_{i,j} = x_{i,j} − 2^{p−1}
- Split pixels into 8×8 blocks
  - If the image size is not a multiple of 8, extra rows/columns are padded to reach a multiple of 8
  - Padded data are discarded after decoding
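The level-shift and blocking steps can be sketched as follows; the padding method (edge replication) is one common choice and is an assumption here, since the slide does not specify one:

```python
import numpy as np

def level_shift_and_block(img, p=8, B=8):
    """Level-shift p-bit pixels by 2^(p-1) and split into B x B blocks,
    padding by edge replication when dimensions are not multiples of B."""
    h, w = img.shape
    pad_h = (-h) % B                 # rows needed to reach a multiple of B
    pad_w = (-w) % B                 # columns needed to reach a multiple of B
    padded = np.pad(img.astype(np.int32), ((0, pad_h), (0, pad_w)), mode="edge")
    shifted = padded - 2 ** (p - 1)  # e.g. [0, 255] -> [-128, 127] for p = 8
    H, W = shifted.shape
    blocks = shifted.reshape(H // B, B, W // B, B).swapaxes(1, 2)
    return blocks                    # shape: (H/B, W/B, B, B)

img = np.full((10, 12), 128, dtype=np.uint8)   # hypothetical 10 x 12 gray image
blocks = level_shift_and_block(img)            # padded up to 16 x 16 -> 2 x 2 blocks
```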
JPEG 8×8 DCT Transform
Forward DCT is applied to each 8×8 block
(Figure: an 8×8 block after level-shifting and after the forward DCT)
JPEG Quantization
Midtread quantization is used; the step size for each coefficient comes from an 8×8 quantization matrix Q.
Quantized values are called "labels." For input coefficient θ_{ij}, we have
$$l_{ij} = \left\lfloor \frac{\theta_{ij}}{Q_{ij}} + 0.5 \right\rfloor,$$
where Q_{ij} is the step size for the (i, j)-th transform coefficient.
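The label formula can be applied element-wise; the coefficients and step sizes below are the top-left 2×2 corner of the example on the next slide:

```python
import numpy as np

def quantize(theta, Q):
    """Midtread quantization: label l_ij = floor(theta_ij / Q_ij + 0.5)."""
    return np.floor(theta / Q + 0.5).astype(int)

theta = np.array([[  39.88, 6.56],
                  [-102.43, 4.56]])
Q = np.array([[16, 11],
              [12, 12]])
labels = quantize(theta, Q)   # e.g. floor(39.88/16 + 0.5) = floor(2.99) = 2
```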
JPEG Quantization Example
Quantization controls the entropy of the image.
Quantization matrices reflect image quality:
- A scalar number (quality factor) is often used as a quantization matrix multiplier to control image quality
Example (top-left 4×4 portion of a block):

DCT coefficients θ:
   39.88     6.56   –2.24    1.22
 –102.43     4.56    2.26    1.12
   37.77     1.31    1.77    0.25
   –5.67     2.24   –1.32   –0.81

Quantization matrix Q:
  16 11 10 16
  12 12 14 19
  14 13 16 24
  14 17 22 29

Labels l:
   2  1  0  0
  –9  0  0  0
   3  0  0  0
   0  0  0  0

For example, $l_{00} = \left\lfloor \frac{\theta_{00}}{Q_{00}} + 0.5 \right\rfloor = \lfloor 39.88/16 + 0.5 \rfloor = \lfloor 2.99 \rfloor = 2$.
Entropy Coding
DC and AC coefficients are coded differently.
DCs are coded using differential coding + Huffman coding:
- Each DC difference is coded using a Huffman prefix plus a fixed-length suffix
ACs are coded using run-length coding + Huffman coding.
DC Difference Code Table
- The difference category is coded with a variable-length code (VLC) as prefix
- The value within each category is coded with a fixed-length code (FLC) as suffix
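The category/suffix mapping can be sketched as follows. This follows the JPEG convention that the category is the bit length of |diff|, and the suffix is the low-order bits of diff (or of diff − 1 for negative values):

```python
def dc_category(diff):
    """DC difference category: number of bits needed for |diff| (0 when diff == 0)."""
    return 0 if diff == 0 else abs(diff).bit_length()

def dc_suffix_bits(diff):
    """Fixed-length suffix: low-order category bits of diff when positive,
    of (diff - 1) when negative; empty for a zero difference."""
    s = dc_category(diff)
    if s == 0:
        return ""
    v = diff if diff > 0 else diff - 1
    return format(v & ((1 << s) - 1), f"0{s}b")

cat = dc_category(-9)        # -9 falls in category 4 (values -15..-8 and 8..15)
bits = dc_suffix_bits(-9)    # 4-bit suffix identifying -9 within category 4
```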
AC RLE Code Table
The AC coefficients are zigzag-scanned into a 1-D sequence.
Each non-zero coefficient is coded using a Z/C codeword plus a sign bit S:
- Z: the length of the zero run before the label
- C: the label magnitude category
EOB is used to signal the end of each block; ZRL is used to signal a run of 16 consecutive zeros.
JPEG Coding Example
A good example from Wikipedia:
- 83,261 bytes (compression ratio 2.6:1)
- 15,138 bytes (compression ratio 15:1)
- 4,787 bytes (compression ratio 46:1)