Overview of the Thesis - Acceleration of the JPEG2000 Encoder and Its Implementation on a TI DSP System Platform

Chapter 1 Introduction

1.2 Overview of the Thesis

In this thesis, a JPEG2000 encoder is implemented on an embedded system, a TI DSP platform, and several speed-up methods are adopted in the encoder. Chapter 2 introduces the concepts of the JPEG2000 algorithm and presents each coding module in turn. Chapter 3 introduces the implementation environment, including the DSP platform, the code development tools, and some typical optimization methods. In Chapter 4, the JPEG2000 encoder is profiled and analyzed, and some previous acceleration methods are reviewed and adapted to our DSP platform. Chapter 5 proposes our improved methods for accelerating the JPEG2000 encoder and presents extensive experiments with the different methods. Finally, Chapter 6 summarizes the project and discusses possible future work.

Chapter 2 Conspectus of JPEG2000 Algorithm

The JPEG standard has been in use for almost a decade. It has been a valuable tool throughout these years, but it cannot fulfill today's advanced requirements for image coding. By adopting new technologies, the JPEG2000 standard provides a set of features that are important to many high-end and emerging applications. This chapter introduces the feature set and provides an overview of Part 1 of the JPEG2000 standard, which is the core of the JPEG2000 image coding system. The details of JPEG2000 Part 1 can be found in [1].

2.1 Introduction to JPEG2000

In March 1997, a call for contributions was launched for the development of a new standard for the compression of still images, the JPEG2000 standard [1], [2]. The submitted compression technologies were evaluated during the November 1997 WG1 meeting in Sydney, Australia. The JPEG2000 standard achieves many desired features, including support for different types of still images, different characteristics, and different imaging models within a unified system. The most important features [7] of the JPEG2000 algorithm are listed below.

Superior low bit-rate performance:

While superior performance at all bit-rates was considered desirable, improved performance at low bit-rate (e.g. below 0.25 bpp), with respect to JPEG, was considered to be

Seamless compression of image components (e.g., R, G, or B), each from 1 to 16 bits deep, was desired from one unified compression architecture.

Progressive transmission by pixel accuracy and resolution:

Progressive transmission that allows images to be reconstructed with increasing pixel accuracy or spatial resolution is essential for many applications. Common examples are the World Wide Web, image archival, and printers.

Lossless and lossy compression:

JPEG2000 provides both lossless and lossy compression, again from a single compression architecture. It is desirable to provide lossless compression in the natural course of progressive decoding.

Region-of-Interest Coding:

Some parts of an image are often more important than others, and it is desirable to transmit them with better quality and less distortion than the rest of the image. Users can define certain ROIs in the image to be coded and transmitted first.

Random code-stream access and processing:

This feature allows users to define certain ROIs in the image to be coded and transmitted with less distortion than the rest of the image. In addition, operations such as rotation, filtering, translation, scaling, and feature extraction are supported.

Robustness to bit-errors:

It is desirable to consider robustness to bit-errors when designing the codestream. In noisy communication channels (e.g., wireless), proper design of the codestream can help subsequent error correction systems avoid catastrophic decoding failures.

Open architecture:

It is desirable to allow an open architecture so that the system can be optimized for different image types and applications. A decoder is only required to implement the core tool set and a parser that understands the codestream. Furthermore, unknown tools could be sent from the source and adopted by the decoder.

Content-based description:

Image archival, indexing, and searching are important areas in image processing. Content-based description of images might be available as part of the compression system.

Side channel spatial information (transparency):

Side channel spatial information, such as alpha planes and transparency planes, is useful for transmitting information for processing the image for display, printing, or editing.

Protective image security:

Protection of a digital image can be achieved by means of watermarking, stamping, encryption, and labeling. SPIFF already implements a labeling method, and it should be easy to achieve the same in JPEG2000.

Figure 2-1 General block diagram of JPEG2000 encoder [1] (blocks: Source Image Data, Pre-Processing, Forward DWT, Uniform Scalar Quantization, Tier-1 Encoder, Tier-2 Encoder, Rate Control, Coded Image)

Figure 2-2 General block diagram of JPEG2000 decoder [1] (blocks: Coded Image, Tier-2 Decoder, Tier-1 Decoder, Dequantization, Inverse DWT, Post-Processing, Reconstructed Image)

Owing to the attractive features mentioned above, JPEG2000 has a very large potential application base. Possible application areas include document imaging, digital photography, desktop publishing, the Internet, image archiving, medical imaging, and remote sensing. The JPEG2000 standard consists of multiple parts, several of which are listed in Table 2-1. Part 2 [3] and Part 3 [4] describe extensions to the baseline codec that are useful for certain specific applications such as intraframe-style video compression. For convenience, we refer to the codec defined in Part 1 of the standard as the baseline codec. Before introducing the major blocks of the codec, note that most parts of the JPEG2000 standard are written from the point of view of the decoder.

Since the decoder is the reverse of the encoder, we describe only the JPEG2000 encoding tools in the following sections.

Part | Title | Purpose
1 | Core coding system | Specifies the core codec for the JPEG2000 family of standards
2 | Extensions [3] | Specifies additional functionalities that are useful in some applications but need not be supported by all codecs
3 | Motion JPEG2000 [4] | Specifies extensions to JPEG2000 for intraframe-style video compression
4 | Conformance testing [5] | Specifies the procedure to be employed for compliance testing
5 | Reference software [6] | Provides sample software implementations of the standard to serve as a guide for implementations

Table 2-1 Parts of the JPEG2000 standard

2.2 Pre-Processing

The Pre-Processing block includes three types of processes: “Image Tiling”, “DC Level Shifting”, and “Component Transformation”. They are described below.

Figure 2-3 Tiling, DC-Level shifting, and Component transformation (optional)

2.2.1 Image Tiling

The standard operations, including component mixing, wavelet transform, quantization, and entropy coding, work on image tiles, which partition the original image. The image tiles are rectangular, non-overlapping blocks that are compressed independently.

Tiling reduces memory requirements, and since the tiles are reconstructed independently, they can be used to decode specific parts of the image instead of the whole image.
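As a minimal sketch of the partitioning described above, the following Python function enumerates rectangular, non-overlapping tiles. The function name is hypothetical, and the sketch assumes the tile grid starts at the image origin, which is a simplification of what the standard permits.

```python
def tile_bounds(width, height, tile_w, tile_h):
    """Return (x0, y0, x1, y1) rectangles that partition a width x height image.

    Assumption: the tile grid is anchored at (0, 0). Tiles on the right and
    bottom edges may be smaller than tile_w x tile_h.
    """
    tiles = []
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            x1 = min(x0 + tile_w, width)   # exclusive right edge
            y1 = min(y0 + tile_h, height)  # exclusive bottom edge
            tiles.append((x0, y0, x1, y1))
    return tiles
```

Because the rectangles do not overlap and together cover the image, each one can be passed through the remaining coding stages on its own.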

2.2.2 DC Level Shifting

After tiling, all samples of each tile are DC level shifted by subtracting the same quantity 2^(P-1), where P is the component's precision. DC level shifting is performed only on samples of components that are unsigned.
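The level shift can be illustrated with a short Python sketch. The function name is hypothetical; the offset used is 2^(P-1), which maps unsigned P-bit samples onto a range that is roughly symmetric about zero.

```python
def dc_level_shift(samples, precision):
    """Subtract 2^(precision - 1) from each unsigned sample."""
    offset = 1 << (precision - 1)
    return [s - offset for s in samples]
```

For 8-bit samples the offset is 128, so the input range 0..255 becomes -128..127.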

2.2.3 Component Transformation

The next stage is an optional inter-component transformation. It reduces the correlation between components and leads to improved coding efficiency [8]. JPEG2000 supports multiple-component images and different bit depths. For reversible (i.e., lossless) systems, the only requirement is that the bit depth of each output image component must be identical to the bit depth of the corresponding input image component. JPEG2000 supports two component transforms: the irreversible component transformation (ICT) for lossy coding and the reversible component transformation (RCT) for lossless or lossy coding.

The image component samples I0(x, y), I1(x, y), I2(x, y), corresponding to the first, second, and third components, produce the transform samples Y0(x, y), Y1(x, y), Y2(x, y). The forward and inverse RCT are given by (2.2-1) and (2.2-2), and the forward and inverse ICT by (2.2-3) and (2.2-4).
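Equations (2.2-1) to (2.2-4) are not reproduced in this text. As a hedged illustration, the sketch below implements the RCT in the form commonly given for JPEG2000 Part 1, using integer floor division. It is an assumption that this matches the thesis equations term for term, and the function names are hypothetical.

```python
def rct_forward(i0, i1, i2):
    """Forward reversible component transform on one sample triple."""
    y0 = (i0 + 2 * i1 + i2) // 4  # floor division, luminance-like term
    y1 = i2 - i1                  # first difference component
    y2 = i0 - i1                  # second difference component
    return y0, y1, y2


def rct_inverse(y0, y1, y2):
    """Inverse RCT; recovers the original integers exactly."""
    i1 = y0 - (y1 + y2) // 4
    i0 = y2 + i1
    i2 = y1 + i1
    return i0, i1, i2
```

The inverse is exact because the floor term removed in the forward step depends only on y1 + y2, which the decoder also knows.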

2.3 Discrete Wavelet Transform and Quantization

The wavelet transform is used to analyze the tile components into different decomposition levels. These decomposition levels contain a number of subbands, which consist of coefficients that describe the horizontal and vertical spatial frequency characteristics of the original tile component. Owing to the statistical properties of these subband signals, the transformed data can usually be coded more efficiently than the original untransformed data.

In the JPEG2000 system, two wavelet transform kernels are provided, so the DWT can be irreversible or reversible. The default reversible transform is implemented by means of the Le Gall 5-3 filter; its analysis and synthesis filter coefficients are given in Table 2-2. The default irreversible transform is implemented by means of the Daubechies 9-7 filter, and the corresponding coefficients are given in Table 2-3.

i | Analysis low-pass h_L(i) | Analysis high-pass h_H(i) | Synthesis low-pass g_L(i) | Synthesis high-pass g_H(i)
0 | 6/8 | 1 | 1 | 6/8
±1 | 2/8 | -1/2 | 1/2 | -2/8
±2 | -1/8 | | | -1/8

Table 2-2 Le Gall 5-3 analysis and synthesis filter coefficients

i | Analysis low-pass h_L(i) | Analysis high-pass h_H(i) | Synthesis low-pass g_L(i) | Synthesis high-pass g_H(i)
0 | 0.6029490182363579 | 1.115087052456994 | 1.115087052456994 | 0.6029490182363579
±1 | 0.2668641184428723 | -0.5912717631142470 | 0.5912717631142470 | -0.2668641184428723
±2 | -0.07822326652898785 | -0.05754352622849957 | -0.05754352622849957 | -0.07822326652898785
±3 | -0.01686411844287495 | 0.09127176311424948 | -0.09127176311424948 | 0.01686411844287495
±4 | 0.02674875741080976 | | | 0.02674875741080976

Table 2-3 Daubechies 9-7 analysis and synthesis filter coefficients

Figure 2-4 2-D forward discrete wavelet transform (labels: Source Image Data; Horizontal Filtering and Vertical Filtering stages, each with LPF h_L(i) and HPF h_H(i) followed by downsampling by 2; output subbands LL_(i+1), LH_(i+1), HL_(i+1), HH_(i+1))

Figure 2-5 2-D DWT decomposition

Usually, the two-dimensional (2-D) discrete wavelet transform is accomplished by cascading two one-dimensional (1-D) discrete wavelet transforms. The image is decomposed by a two-channel 1-D discrete wavelet transform in the horizontal and vertical directions respectively, as shown in Figure 2-4. After the 1-D vertical discrete wavelet transform, two subbands are formed. The low-pass samples represent a downsampled low-resolution version of the original set, and the high-pass samples represent a downsampled residual version of the original set. The two subbands then pass through the horizontal filters. Each of the four resulting subbands is one quarter the size of the original image, as shown in Figure 2-5.
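The reversible 5-3 transform is commonly implemented with integer lifting steps rather than by direct convolution with the filter taps. The sketch below is a minimal one-level, 1-D version under stated assumptions: the input is an even-length list of integers, boundaries use whole-sample symmetric extension, and the function names are hypothetical. It is an illustration, not the DSP code of this thesis. A 2-D level would apply it along one direction and then along the other.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length list.

    Returns (s, d): the low-pass and high-pass halves.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: each odd sample minus the floor-average of its even neighbors.
    d = []
    for k in range(half):
        left = x[2 * k]
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]  # mirror: x[n] = x[n-2]
        d.append(x[2 * k + 1] - (left + right) // 2)
    # Update step: each even sample plus a rounded quarter of neighboring details.
    s = []
    for k in range(half):
        d_prev = d[k - 1] if k > 0 else d[0]  # mirror: d[-1] = d[0]
        s.append(x[2 * k] + (d_prev + d[k] + 2) // 4)
    return s, d


def dwt53_inverse(s, d):
    """Undo dwt53_forward exactly by reversing the lifting steps."""
    half = len(s)
    x = [0] * (2 * half)
    for k in range(half):
        d_prev = d[k - 1] if k > 0 else d[0]
        x[2 * k] = s[k] - (d_prev + d[k] + 2) // 4
    for k in range(half):
        left = x[2 * k]
        right = x[2 * k + 2] if k + 1 < half else x[2 * half - 2]
        x[2 * k + 1] = d[k] + (left + right) // 2
    return x
```

Because each lifting step is undone by subtracting exactly what was added, the round trip is lossless for integer input, which is what makes this kernel suitable for the reversible path.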

Dyadic (power-of-2) decomposition is allowed in Part 1, as shown in Figure 2-6. For an N by N image decomposed by an M-level 2-D discrete wavelet transform, the subbands at level m (1 ≤ m ≤ M) are N/2^m by N/2^m, so the subbands at the coarsest level are N/2^M by N/2^M. An example of a dyadic decomposition into subbands of the image ‘Lena’ is illustrated in Figure 2-7.

Figure 2-6 Hierarchical structure of multi-level 2-D DWT (subband dimensions labeled N/2 and N/4)

Figure 2-7 An example of Lena image for multi-level 2-D DWT

After transformation, all coefficients are quantized. Several quantization options are provided in the JPEG2000 standard. Only uniform scalar quantization, the default quantization method in Part 1 of the JPEG2000 standard, is introduced here.

In integer mode, the quantizer step sizes are always fixed at one, effectively bypassing quantization and forcing the quantizer indices and transform coefficients to be one and the same. In this case, lossy coding is still possible, but rate control is achieved by another mechanism. In real mode, the quantizer step sizes are chosen in conjunction with rate control. Each transform coefficient a_b(u,v) of subband b is quantized to the value q_b(u,v) according to (2.3-1). The step size Δ_b is represented relative to the dynamic range R_b of subband b, as defined in (2.3-2). The exponent/mantissa pairs (ε_b, μ_b) may be explicitly signaled in the bit-stream syntax for every subband.
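Equations (2.3-1) and (2.3-2) are not reproduced in this text. As a hedged illustration, the sketch below uses the form commonly given for JPEG2000 Part 1: q = sign(a) * floor(|a| / Δ_b) and Δ_b = 2^(R_b - ε_b) * (1 + μ_b / 2^11). It is an assumption that these match the thesis equations exactly, and the function names are hypothetical.

```python
import math


def step_size(dynamic_range_bits, exponent, mantissa):
    """Step size from the exponent/mantissa pair (11-bit mantissa assumed)."""
    return 2.0 ** (dynamic_range_bits - exponent) * (1.0 + mantissa / 2048.0)


def quantize(coeff, delta):
    """Uniform scalar quantization: the magnitude is floored, the sign is kept."""
    q = math.floor(abs(coeff) / delta)
    return -q if coeff < 0 else q
```

With exponent equal to the dynamic range and a zero mantissa, the step size is one, which corresponds to the integer mode described above.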

2.4 Embedded Block Coding with Optimized Truncation

Embedded block coding with optimized truncation (EBCOT) [9] is adopted for the entropy coding of JPEG2000. EBCOT consists of two major coding steps, tier-1 and tier-2, as shown in Figure 2-8. The tier-1 part is the embedded block coding (EBC), which is composed of the context formation (CF) and the arithmetic encoder (AE). The tier-1 coder divides the coefficients of each subband into code-blocks, and all code-blocks are coded separately into a block-based embedded bit-stream. The coding is performed using the bit-plane coder described in the next section. For each code-block, an embedded code is produced, comprising numerous coding passes. The output of tier-1, the block-based embedded bit-stream, is a collection of coding passes for the various code-blocks. After that, tier-2 truncates the embedded bit-stream to minimize the overall distortion. The two tiers are introduced in the following sections.

Figure 2-8 Two tiers of EBCOT algorithm (labels: DWT Coefficients; Tier-1 EBC with Context Formation, Context/Decision, and Arithmetic Encoder; Tier-2 with Rate-Distortion Optimization; Full-featured bit-stream)

2.4.1 Tier-1 Coding

Tier-1 coding is also known as embedded block coding (EBC). It includes the context formation (CF) and the arithmetic encoder (AE), and its basic coding unit is a code-block. The EBC is a bit-level processing algorithm, and the code-block is coded in a

Figure 2-9 Diagram of tile, code-block, bit-plane, stripe and coding pass

2.4.1.1 Context Formation (CF)

The embedded block coding is essentially a context-adaptive arithmetic encoder, as shown in Figure 2-8. The context formation (CF) generates context-decision pairs for the arithmetic encoder (AE). The AE uses the context to adapt the probability of the decision. In context modeling, all code-blocks are coded one bit-plane at a time, starting from the most significant bit-plane with a non-zero element and proceeding to the LSB bit-plane. For each bit-plane in a code-block, a special scan pattern is used for each of three coding passes. The three coding passes are coded in the order Pass1 (significance propagation pass), Pass2 (magnitude refinement pass), and Pass3 (cleanup pass). Each coefficient bit from the DWT is coded in only one of the three coding passes, and the coding conditions are shown in Table 2-4.

Coding Pass | Coding Condition
Pass1 (Significance Propagation Pass) | Insignificant sample with at least one significant neighbor
Pass2 (Magnitude Refinement Pass) | Significant sample
Pass3 (Cleanup Pass) | Insignificant sample with all neighbors insignificant

Table 2-4 Coding Pass Classification
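The classification in Table 2-4 can be sketched as a lookup over the significance state of the 8-connected neighbors. This is a simplified illustration with a hypothetical function name: it evaluates a static snapshot of the significance map, whereas a real coder updates the state as each pass proceeds.

```python
def coding_pass(significant, row, col):
    """Return 1, 2, or 3 for the pass that codes the bit at (row, col).

    significant: 2-D list of booleans, True where the sample is already
    significant. Neighbors outside the code-block count as insignificant.
    """
    if significant[row][col]:
        return 2  # magnitude refinement pass
    rows, cols = len(significant), len(significant[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols and significant[r][c]:
                return 1  # significance propagation pass
    return 3  # cleanup pass
```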

Figure 2-10 Context window and Neighbors states

Since context-based arithmetic coding is employed, a means of selecting the context is necessary. Figure 2-10 shows the context window. The context is selected by examining the state information of the 4-connected or 8-connected neighbors of a sample.

The first coding pass (Pass1) for each bit-plane is the significance propagation pass. During this pass, a bit is coded if its location is not significant but at least one of its 8-connected neighbors is significant. Nine context labels (Table 2-5) are created based on how many and which neighbors are significant. The significance propagation pass includes only bits of coefficients that were insignificant and have a non-zero context. All other

Only four neighbors are considered, and each neighbor may have one of three states: significant positive, significant negative, or insignificant. The vertical and horizontal neighbors make separate contributions to the context table. The nine permutations of the vertical and horizontal contributions are reduced to five context labels, as shown in Table 2-6. The sign-coding decision is obtained by XORing the sign bit with the XOR bit from the sign context table.

The second coding pass (Pass2) for each bit-plane is the magnitude refinement pass. This pass signals the bits that follow the most significant bit of each sample. If a sample was found to be significant in a previous bit-plane (excluding samples that have just become significant in the immediately preceding significance propagation pass), the next most significant bit of that sample is conveyed using a single binary symbol. The context used in magnitude refinement coding is determined by the sum of the significance states of the horizontal, vertical, and diagonal neighbors, as shown in Table 2-7.

All the remaining coefficients in the bit-plane are insignificant and had a context value of zero during the significance propagation pass. These are all included in the cleanup pass (Pass3). Cleanup coding uses not only the neighbor context, as in the significance coding of Table 2-5, but also run-length coding. If four contiguous samples in a column are all insignificant and their context labels are all zero, run-length coding is performed.

Table 2-5 Contexts for the significance propagation and cleanup coding passes (columns: LL and LH sub-bands (vertical high-pass); HL sub-band (horizontal high-pass); HH sub-band (diagonally high-pass); Context Label)

Horizontal contribution | Vertical contribution | Context Label | XOR bit
1 | 1 | 13 | 0
1 | 0 | 12 | 0
1 | -1 | 11 | 0
0 | 1 | 10 | 0
0 | 0 | 9 | 0
0 | -1 | 10 | 1
-1 | 1 | 11 | 1
-1 | 0 | 12 | 1
-1 | -1 | 13 | 1

Table 2-6 Contributions of the vertical (and the horizontal) neighbors to the sign context
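Table 2-6 maps directly onto a dictionary lookup. The sketch below is illustrative, with hypothetical function names. It assumes each neighbor is encoded as +1 (significant positive), -1 (significant negative), or 0 (insignificant), and that the contribution of a neighbor pair is their sum clamped to the range -1..1, which is how the contributions are usually defined for JPEG2000.

```python
# (horizontal contribution, vertical contribution) -> (context label, XOR bit)
SIGN_CONTEXT = {
    (1, 1): (13, 0), (1, 0): (12, 0), (1, -1): (11, 0),
    (0, 1): (10, 0), (0, 0): (9, 0), (0, -1): (10, 1),
    (-1, 1): (11, 1), (-1, 0): (12, 1), (-1, -1): (13, 1),
}


def contribution(a, b):
    """Combine two neighbor states (+1, 0, -1) into one contribution."""
    return max(-1, min(1, a + b))


def sign_context(left, right, up, down):
    """Look up the context label and XOR bit for the sign of a sample."""
    h = contribution(left, right)
    v = contribution(up, down)
    return SIGN_CONTEXT[(h, v)]


def sign_decision(sign_bit, xor_bit):
    """Decision passed to the arithmetic encoder: sign bit XOR the table bit."""
    return sign_bit ^ xor_bit
```

The symmetry of the table is what reduces nine permutations to five labels: negating both contributions keeps the label and flips the XOR bit.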

ΣH+ΣV+ΣD | First refinement for this sample | Context Label
x (don't care) | False | 16
≥ 1 | True | 15
0 | True | 14

Table 2-7 Contexts for the magnitude refinement coding pass
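The rule in Table 2-7 is small enough to express as a single function. The sketch below is illustrative and the function name is hypothetical.

```python
def refinement_context(neighbor_sum, first_refinement):
    """Context label for the magnitude refinement pass.

    neighbor_sum: number of significant horizontal, vertical, and diagonal
    neighbors. first_refinement: True if this is the first time the sample
    is refined.
    """
    if not first_refinement:
        return 16  # the neighbor sum is ignored in this case
    return 15 if neighbor_sum >= 1 else 14
```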

Figure 2-11 Basic operation of the AE

(Most Probable Symbol, Least Probable Symbol, and Renormalization)

2.4.1.2 Arithmetic Encoder (AE)

The decisions produced by the CF are coded by the arithmetic encoder. The AE is an adaptive, binary MQ-coder [10]. The basis of the binary arithmetic coding process is the recursive probability interval subdivision of Elias coding. Since it is a binary AE, there are only two sub-intervals. With each binary decision, the current probability interval is subdivided into two sub-intervals, and the codestream is modified (if necessary) so that it points to the base (lower bound) of the probability sub-interval assigned to the symbol, as shown in Figure 2-11. In addition, a lazy coding mode is used to reduce the number of symbols that are arithmetically coded. In this mode, after the fourth bit-plane is coded, the first and second passes are included as raw data, while only the third coding pass of each bit-plane employs arithmetic coding.
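The recursive interval subdivision can be illustrated with an idealized sketch. It is not the MQ-coder: it uses exact rational arithmetic, a fixed probability instead of adaptive state tables, and no renormalization. The function name is hypothetical, and the choice of giving symbol 0 the lower sub-interval is an assumption made for the illustration.

```python
from fractions import Fraction


def subdivide(bits, p_zero):
    """Return (base, width) of the interval after coding a bit sequence.

    p_zero: probability assigned to symbol 0, as a Fraction in (0, 1).
    Assumption: symbol 0 takes the lower sub-interval, symbol 1 the upper.
    """
    low, width = Fraction(0), Fraction(1)
    for bit in bits:
        zero_width = width * p_zero
        if bit == 0:
            width = zero_width           # keep the lower part
        else:
            low += zero_width            # move the base past the 0 part
            width -= zero_width          # keep the upper part
    return low, width
```

The final width equals the product of the symbol probabilities, so likely sequences leave a wide interval that can be identified with few bits.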

2.4.2 Tier-2 Coding

Tier-2 encoding follows tier-1 encoding, and its input is the set of bit-plane coding passes generated during tier-1 encoding. Each coding pass is a candidate truncation point of a code-block, and the coding pass information is packaged into data units called packets in tier-2 coding. To meet a target bit-rate or transmission time, the packaging process imposes a particular organization of coding pass data in the output codestream. Rate control thus ensures that the desired number of bytes is used by the codestream while achieving the highest possible image quality. The RDO algorithm is reviewed below.

In the encoder, rate control can be achieved through two distinct mechanisms: the choice of quantization step sizes and the selection of the subset of coding passes to include in the codestream. When the reversible (integer) mode is employed, only the second mechanism may be used, because the quantization step sizes must be fixed at one. In lossy coding mode, both mechanisms may be employed. If the quantization step sizes are changed, tier-1 encoding must be performed again. Since tier-1 coding requires a lot of computation, changing the step sizes may not be practical in the encoder. Instead, the encoder can elect to discard coding passes in order to control the rate. The encoder knows the contribution that each coding pass makes to the rate and can calculate the distortion reduction associated with each pass. Using this information, the encoder can include the coding passes in order of decreasing distortion reduction until the bit budget has been exhausted.

The goal of rate control is to minimize the distortion while keeping the rate below the target rate R_T. The problem is mapped into a Lagrangian optimization problem [11], as in (2.4-1).

D denotes the total distortion and R the total bit rate. The Lagrange multiplier λ is used to minimize J = D + λR, and thus the derivative of J is set to zero. The candidate truncation point corresponding to coding pass m of bit-plane k in code-block i (B_i) is denoted Z_i. The optimal multiplier λ* and the slope of the R-D curve are then obtained as in (2.4-2).

For each code-block B_i, the slope of the R-D curve corresponding to truncation point Z_i is given in (2.4-3). S_i^(Z_i) denotes the rate at which distortion is reduced when B_i is truncated at Z_i. The optimal solution, proved in [11], satisfies the constraint in (2.4-4). Z_i* is the optimal truncation point of B_i, and rate-distortion optimization is achieved when Z_i is sufficiently close to Z_i*.
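Equations (2.4-1) to (2.4-4) are not reproduced in this text, so the following is only a generic sketch of Lagrangian truncation under stated assumptions. Each code-block is given as a hypothetical list of (rate, distortion) pairs, one per candidate truncation point, with index 0 meaning that nothing is included. For a fixed λ each block independently minimizes D + λR, and a bisection on λ (a search strategy chosen here for simplicity, not necessarily that of [11]) finds the smallest multiplier whose total rate fits the target.

```python
def best_truncation(points, lam):
    """Index of the truncation point minimizing D + lam * R for one block."""
    return min(range(len(points)),
               key=lambda z: points[z][1] + lam * points[z][0])


def allocate(blocks, target_rate, lam_max=1e9, iterations=60):
    """Choose one truncation point per block so the total rate fits the target.

    Assumption: at lam_max every block truncates to a point whose total rate
    is within the target (for example, index 0 with rate 0).
    """
    lo, hi = 0.0, lam_max  # hi always yields a feasible total rate
    for _ in range(iterations):
        lam = (lo + hi) / 2
        choice = [best_truncation(p, lam) for p in blocks]
        rate = sum(p[z][0] for p, z in zip(blocks, choice))
        if rate > target_rate:
            lo = lam   # over budget: penalize rate more strongly
        else:
            hi = lam   # within budget: try a smaller penalty
    return [best_truncation(p, hi) for p in blocks]
```

A single multiplier shared by all code-blocks means every block stops at a point of roughly equal R-D slope, which is the condition the optimal truncation points are expected to satisfy.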

Chapter 3 DSP Implementation Environment

In this chapter, we briefly introduce the DSP platform environment and some optimization methods. We use the SMT395 DSP module made by Sundance. It houses two important chips: the TMS320C6416T DSP chip made by Texas Instruments and a Xilinx Virtex-II Pro FPGA. Since our implementation is a software-based system, we focus only on the DSP chip. In
