
Chapter 2 Lossless Data Compression Methods

2.3 Lossless Data Compression Tools

WinRAR and WinZip are the most common and popular lossless data compression tools. They are designed for general-purpose compression with flexibility in mind: they combine several algorithms and adaptively select the best one to achieve good performance across diverse sources. This adaptation relies on analyzing the data before coding, and the analysis requires a larger buffer as the source data grows. Consequently, WinRAR and WinZip are not suitable when low complexity is required. In addition, the technical details of these commercial tools are not publicly available. In our simulations, the two tools serve only as performance references for the proposed line-based adaptive lossless video compression (LALVC) algorithm.

Chapter 3 Line-based Adaptive Lossless Video Compression

For interactive multimedia applications that demand low complexity and low latency, we present a novel line-based adaptive lossless video compression (LALVC) algorithm. This chapter explains the architecture, syntax, and coding methods of LALVC in detail. We also present the hardware architecture of LALVC on an ARM-based platform and address the associated hardware implementation issues.

3.1 Motivation

We propose a low-complexity algorithm suitable for implementation on an FPGA platform or as an ASIC design. Its features include one-pass coding, no transform, no motion search, and a simple entropy coder. Video interaction on high-resolution screens or display devices requires huge bandwidth to exchange raw data. To reduce the transmission cost while delivering video without any loss of data accuracy, lossless video compression is required.

3.2 Main Architecture

Figure 10 shows an application example of LALVC. The screen signal output by the VGA card is converted into an LVDS signal for wired transmission. Our algorithm is implemented on a PCB with an FPGA chip: the digital signal is fed into our design for compression, and the compressed signal is converted into an LVDS signal for transmission.

After the receiver gets the signal from the wire, the LVDS signal is converted back into a digital signal, and the decoder PCB decodes it to reconstruct the screen.

3.3 Algorithm Development

3.3.1 Overview of Algorithm

We propose a line-based adaptive lossless video compression (LALVC) algorithm for interactive multimedia applications that demand low complexity and low latency. To keep complexity and delay low, LALVC adopts a simple and efficient architecture built on a one-pass, raster-scan, transform-free coding process and a simple predictor.

(a). One-pass:

Multiple-pass encoding can enhance coding performance by analyzing the source first and using side information to control the coding flow. However, multiple-pass encoding needs more buffer memory for the pre-analysis and increases the latency of the bitstream output.

Figure 10. Application example of LALVC

(b). Transform-free:

Lossless image coding must preserve data accuracy exactly. If a transform such as the DCT or a wavelet is applied, the coefficients are represented in floating point whenever the transform is non-integer, so rounding can lose information. Transform coding achieves good subjective quality by removing high-frequency components, which is itself a loss of data. Therefore, no transform is used in our lossless video compression.

(c). Simple predictor:

Complex statistical modeling is an obstacle to meeting the low-complexity requirement. The main goal of our algorithm is to achieve high coding efficiency with a simple predictor that avoids multipliers and dividers.

For low latency, zero-motion prediction with a one-frame buffer is used to reduce temporal redundancy. In addition, to maximize coding efficiency for both natural and computer-generated video sequences, LALVC adaptively selects the best coding mode for each line of every frame. For each line, Golomb coding is used for entropy coding because it provides good coding efficiency with a light computational load and is easy to implement in hardware. The experimental results show that temporal preprocessing and line-based mode decision increase the compression ratio at a modest increase in complexity compared with JPEG-LS.

LALVC contains three major parts. As shown in Figure 11, the source video is first fed into a preprocessing unit for temporal redundancy removal, which determines the best mode for coding each line. The mode information is then passed to the second part, a mode-dependent spatial predictor, which removes inter-pixel redundancy with a simple spatial predictor. Finally, the residuals are encoded by an entropy coder based on Golomb codes to remove coding redundancy. The decoder applies the inverse operations of the encoder to reconstruct the sequence line by line and frame by frame. The gray blocks in Figure 11 are novel; the remaining blocks, including the fixed predictor, the context model, and the Golomb codes, are taken from the existing JPEG-LS standard because they fit our requirements.

Figure 11. The block diagram of the LALVC encoder.

3.3.2 Prediction Method

Three predictors, covering temporal, spatial, and coding relations, are introduced individually below. To meet the low-complexity requirement, we build the algorithm from simple operations that avoid the computational cost of multiplication and division.

3.3.2.1 Temporal Prediction (Inter-Frame Process)

To avoid a high computational load, motion search is not considered in LALVC. Zero-motion DPCM is the simplest temporal prediction method. To evaluate its benefit, we compare the coding performance with and without zero-motion prediction.

Figure 12. Line-based zero-motion comparator

Table 7 shows the results without zero-motion prediction. Table 8 and Table 9 both apply zero-motion prediction; they differ in how the prediction residuals are subsequently coded. The source images are gray-level pictures. Each residual is obtained by subtracting the co-located pixel of the previous frame from the pixel of the current frame, so the value range doubles to -255 ~ +255 and each residual needs a 9-bit depth. In Table 8, the residuals are split into two planes: a positive-value plane and a negative-value plane. In Table 9, the residuals are split into a sign plane (a 1-bit plane) and a plane of absolute values.
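To make the two residual layouts of Table 8 and Table 9 concrete, here is a minimal Python sketch on a single line of pixels. The function names are hypothetical, and we assume the negative-value plane stores magnitudes; the exact plane storage used in the experiments is not specified in the text.

```python
from typing import List, Tuple

def zero_motion_residual(cur: List[int], ref: List[int]) -> List[int]:
    """Zero-motion DPCM: subtract the co-located pixel of the previous frame."""
    return [c - r for c, r in zip(cur, ref)]  # each value lies in [-255, 255]

def split_pos_neg(res: List[int]) -> Tuple[List[int], List[int]]:
    """Table 8 layout (assumed): positive-value plane and negative-value plane, both >= 0."""
    pos = [e if e > 0 else 0 for e in res]
    neg = [-e if e < 0 else 0 for e in res]
    return pos, neg

def split_sign_abs(res: List[int]) -> Tuple[List[int], List[int]]:
    """Table 9 layout: 1-bit sign plane and absolute-value plane."""
    sign = [1 if e < 0 else 0 for e in res]
    mag = [abs(e) for e in res]
    return sign, mag

cur = [12, 200, 0, 255]
ref = [10, 210, 255, 0]
res = zero_motion_residual(cur, ref)  # [2, -10, -255, 255]
pos, neg = split_pos_neg(res)         # [2, 0, 0, 255], [0, 10, 255, 0]
sign, mag = split_sign_abs(res)       # [0, 1, 1, 0], [2, 10, 255, 255]

# Both decompositions are lossless: each pair of 8-bit planes rebuilds the 9-bit residual.
assert [p - n for p, n in zip(pos, neg)] == res
assert [-m if s else m for s, m in zip(sign, mag)] == res
```

Both layouts keep every plane within 8 bits per sample; they differ only in how the sign information is distributed, which is what changes the behavior of the downstream coders in the two tables.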

The gain from zero-motion prediction can be evaluated by comparing Table 7 with Table 8. The sequences Akiyo_cif and Silent_cif compress better when zero-motion prediction is applied, whereas Foreman_cif and Bus_cif do not benefit. In terms of sequence properties, Akiyo_cif and Silent_cif contain slow-moving objects, while Foreman_cif and Bus_cif contain fast-moving objects. Thus, a hybrid, context-aware scheme is needed that applies zero-motion prediction automatically only where it helps.

To achieve the highest performance across various video sequences, such a hybrid predictor requires a good mode decision.

Table 7. Compression ratio of the original sequences without the DPCM process

Sequence      WinZip     WinRAR     JBIG1      JPEG2000     JPEG-LS    CALIC
Akiyo_cif     1.670916   2.649852   1.909968   2.82182592   2.995522   3.170546
Silent_cif    1.399279   1.859162   1.555788   2.12750716   2.198372   2.194892
Foreman_cif   1.53747    2.085842   1.753247   2.28110599   2.414634   2.506403
Bus_cif       1.368598   1.602092   1.423095   1.99786088   1.988431   2.022003

Table 8. Compression ratio with the DPCM process (negative-value plane + positive-value plane)

V1            WinZip     WinRAR     JBIG1      JPEG2000     JPEG-LS    CALIC
Akiyo_cif     4.184175   4.807215   4.935194   4.68809193   5.791167   5.146251
Silent_cif    2.120254   2.337154   2.649184   2.22588623   2.716298   2.648214
Foreman_cif   1.559567   1.554926   1.833107   1.55026621   1.854164   1.860442
Bus_cif       1.124387   1.213907   1.23257    1.08378339   1.335618   1.313306

In our algorithm, LALVC encodes video sequences pixel by pixel and line by line in raster-scan order. The line-based preprocessing algorithm uses prediction to remove redundancy between temporally neighboring frames. As shown in Figure 12, temporal prediction with a zero motion vector, which has the lowest complexity, is applied to co-located lines of successive frames. The i-th and (i-1)-th lines of frame I_n and the i-th line of frame I_{n-1} are fed into the line comparator, which computes the differences between these lines for mode decision. The mode decision is based on two metrics: SAD (sum of absolute differences) and SAGD (sum of absolute gradients of the difference).

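These metrics are computed from the intra and inter difference signals of the i-th row. Let I_n(i,j) denote the j-th pixel of the i-th row in the n-th frame, and let W be the number of pixels in a line. Then

$$I^{D}_{\mathrm{intra}}(i,j) = I_n(i,j) - I_n(i-1,j), \qquad I^{D}_{\mathrm{inter}}(i,j) = I_n(i,j) - I_{n-1}(i,j)$$

$$\mathrm{SAD}_{\mathrm{inter}} = \sum_{j=0}^{W-1} \left| I^{D}_{\mathrm{inter}}(i,j) \right|$$

$$\mathrm{SAGD}_{\mathrm{intra}} = \sum_{j=1}^{W-1} \left| I^{D}_{\mathrm{intra}}(i,j) - I^{D}_{\mathrm{intra}}(i,j-1) \right|, \qquad \mathrm{SAGD}_{\mathrm{inter}} = \sum_{j=1}^{W-1} \left| I^{D}_{\mathrm{inter}}(i,j) - I^{D}_{\mathrm{inter}}(i,j-1) \right|$$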
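The three line-comparator metrics can be sketched in Python as follows. The function names are hypothetical, and the gradient is assumed to be taken horizontally along the line, which is consistent with the DC mode condition that SAGD_inter = 0 implies a constant difference across the line.

```python
from typing import List, Tuple

def sad(diff: List[int]) -> int:
    """Sum of absolute differences over one line."""
    return sum(abs(d) for d in diff)

def sagd(diff: List[int]) -> int:
    """Sum of absolute horizontal gradients of a difference line."""
    return sum(abs(diff[j] - diff[j - 1]) for j in range(1, len(diff)))

def line_metrics(cur_line: List[int], above_line: List[int],
                 prev_frame_line: List[int]) -> Tuple[int, int, int]:
    """Return (SAD_inter, SAGD_intra, SAGD_inter) for line i of frame n."""
    d_intra = [c - a for c, a in zip(cur_line, above_line)]       # I_n(i,j) - I_n(i-1,j)
    d_inter = [c - p for c, p in zip(cur_line, prev_frame_line)]  # I_n(i,j) - I_{n-1}(i,j)
    return sad(d_inter), sagd(d_intra), sagd(d_inter)

# A line that became uniformly 2 levels brighter than in the previous frame:
print(line_metrics([100, 102, 104, 106], [90, 95, 100, 130], [98, 100, 102, 104]))
# (8, 34, 0)
```

Only additions, subtractions, and absolute values are needed, which matches the multiplier-free design goal.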

Using SAD_inter, SAGD_intra, and SAGD_inter, Figure 17 shows the proposed mode decision algorithm. The line-based preprocessing with zero-motion prediction is shown in Figure 13.

We use a near-exhaustive search to find the upper bound of coding performance under the specified structure and to determine the mode tags. Based on this upper bound, we analyze the absolute gradient of difference (AGD) in the inter and intra domains to reveal the principle of optimal mode decision.

Table 9. Compression ratio with the DPCM process (absolute-value plane + sign plane)

V2            WinZip     WinRAR     JBIG1      JPEG2000     JPEG-LS    CALIC
Akiyo_cif     4.106447   4.760964   5.201401   4.72215597   5.201765   5.379369
Silent_cif    2.136152   2.478691   2.667505   2.30054222   2.619741   2.782672
Foreman_cif   1.667592   1.921768   2.001752   1.81020296   2.013832   2.155733
Bus_cif       1.270787   1.374832   1.468044   1.48336829   1.538868   1.639251


Figure 14(a) and Figure 14(b) show the probability distributions (histograms) of AGD(inter) and AGD(intra). When raw data mode is preferred, AGD(intra) is concentrated at low levels. The sequence used for this analysis is Foreman; in the near-upper-bound case, raw data mode is chosen with probability 72.7% and difference mode with probability 27.2%.

(a) Raw data mode (72.7%) (b) Difference data mode (27.2%)
Figure 14. Probability distribution of AGD(inter) and AGD(intra) for the Foreman sequence in CIF resolution.

Figure 13. Line-based preprocessing.

(a) Bus, Raw_Mode (98.1%) (b) Football, Raw_Mode (86.9%) (c) Mobile, Raw_Mode (85.3%)

Figure 15. Probability distribution of sequences dominated by raw data mode

In our simulation, we use six sequences. Apart from Foreman, we group the other five sequences into two types. The first type is dominated by Raw_Mode: as shown in Figure 15, Bus, Football, and Mobile belong to this type, and their AGD(inter) is distributed at higher levels, although the trend is less obvious for Mobile. The second type is dominated by Diff_Mode: as shown in Figure 16, AGD(intra) is distributed at higher levels than AGD(inter) for Akiyo and Silent. From these statistics, we derive a simple rule that selects the suitable mode for coding each line.

(a) Akiyo, Diff_Mode (99.6%) (b) Silent, Diff_Mode (88.7%)
Figure 16. Probability distribution of sequences dominated by Diff_Mode

With the preprocessing, we choose the best coding mode for each line, so that each line of the current frame can be encoded most efficiently. To further improve coding efficiency by exploiting the close relationship between spatially successive lines, a line-based intra-frame prediction method consisting of an adaptive reference line and simplified context modeling is introduced.

3.3.2.2 Four Line-based Coding Modes

Four modes, namely skip mode, difference mode, raw data mode, and DC mode, are used to categorize the outputs of the line comparator at the encoder. Each line carries a 2-bit overhead that signals its coding mode to the decoder.

(a). Skip Mode (codeword ‘00’ in the syntax)

The line in the current frame is identical to the co-located line in the previous frame. The line is skipped, and the side information tells the decoder to reconstruct it by directly copying the line from the previous frame. In Table 10, the first two columns give the compression ratio without and with skip mode. Coding efficiency improves greatly for Akiyo, Out.R, Out.G, and Out.B because of the high proportion of skippable lines. If the screen stays still for a long time, skip mode contributes the most.

Table 10. Statistics of skip mode for 9 sequences (CR: compression ratio)

Sequence     CR w/o skip   CR with skip   Skip-mode lines   Skip-mode ratio
Akiyo.Y      2.70          3.05           11683             0.135
bus.Y        1.64          1.64           0                 0
Football.Y   1.99          1.99           249               0.003
Foreman.Y    1.97          1.97           0                 0
Mobile.Y     1.43          1.43           0                 0
Silent.Y     1.86          1.86           0                 0
Out.R        4.96          16.81          21051             0.731
Out.G        5.10          17.97          21048             0.731
Out.B        5.30          19.45          21109             0.733

(b). Difference Mode (codeword ‘01’ in the syntax)

Table 7, Table 8, and Table 9 show the benefit of zero-motion prediction for slow-motion sequences such as Akiyo and Silent. The difference mode turns on zero-motion prediction and is applied when SAGD_inter is smaller than SAGD_intra. In this case, the residuals of zero-motion prediction are coded. Coding a signed prediction residual costs one extra sign bit. To save the sign bits, the same value remapping approach adopted in JPEG-LS is used. If the image alphabet size is α (α = 2⁸ for a gray-level image), remapping the wide signed prediction residual ε (−α < ε < α, where ε is the current residual) into a narrower unsigned range (0 ≤ ε < α) reduces the value range without losing data, because the prediction value is known to the decoder.

(c). Raw Data Mode (codeword ‘11’ in the syntax)

Table 7, Table 8, and Table 9 show that fast-motion sequences are coded better without zero-motion prediction. When SAGD_intra is smaller than SAGD_inter, the spatial coding algorithm is used instead of zero-motion prediction. In this mode, only intra-frame prediction is applied to the original pixels.

(d). DC Mode (codeword ‘10’ in the syntax)

DC mode is designed for cases where the brightness or luminance of the screen changes, and in such cases it contributes greatly to coding performance. When SAGD_inter is zero, the difference $I^{D}_{\mathrm{inter}}(i,j)$ is the same for all pixels j in the current line, so a single value, the DC offset, represents the whole line. With the DC offset, the decoder can perfectly reconstruct the line.
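Putting the four modes together, a per-line decision can be sketched as below. The function name is hypothetical; the check order (skip, then DC, then difference versus raw) and the tie-break toward raw data mode when SAGD_inter equals SAGD_intra are our assumptions, since the flow of Figure 17 is not reproduced in the text. The 2-bit codewords are those given for each mode.

```python
# 2-bit line mode tags from the LALVC syntax
SKIP, DIFF, DC, RAW = "00", "01", "10", "11"

def decide_line_mode(sad_inter: int, sagd_intra: int, sagd_inter: int) -> str:
    """Hypothetical per-line mode decision; check order and tie-break are assumptions."""
    if sad_inter == 0:
        return SKIP  # identical to the co-located line of the previous frame
    if sagd_inter == 0:
        return DC    # constant non-zero offset: one DC value rebuilds the line
    if sagd_inter < sagd_intra:
        return DIFF  # zero-motion residuals are smoother than intra differences
    return RAW       # otherwise, intra-only prediction on the original pixels

print(decide_line_mode(8, 34, 0))    # "10" (uniform brightness change)
print(decide_line_mode(50, 34, 10))  # "01"
print(decide_line_mode(50, 10, 34))  # "11"
```

Skip is tested first because a line with SAD_inter = 0 also has SAGD_inter = 0, and skip mode needs no DC value at all.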

A. Inter-Component Process

For color sequences, the three color planes can be fed into three encoding modules individually: the R plane is coded with reference only to R, and the G and B planes are processed in the same way. If the context models of the planes are independent, the memory requirement for realizing the inter-component process is large. Another choice is to code the planes in R-G-B order, in which the G plane references the R plane and the B plane references the G plane.

B. Mode-Dependent Spatial Prediction (Intra-Frame Process)

In LALVC, the pixels of the current frame are predicted by a fixed predictor using the simple template in Eq. 16.

With this template, only two lines need to be buffered, which also covers advanced processing such as error bias cancellation in context modeling and run mode extension. To simplify context modeling, we use only two gradients, (b-c) and (c-a), for error bias cancellation. To avoid degrading inter-line prediction when the coding mode changes from one line to the next, the reference line is kept in the same domain as the current line: a raw data mode line uses a raw-value reference line, and a difference mode line uses a difference-value reference line. In addition, separate context models for the different modes preserve the continuity between the template and the pixels being coded. The mode adaptation is applied prior to the gradient computation in Eq. 16.
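Eq. 16 is not reproduced in this text. Because the fixed predictor is taken from JPEG-LS, the sketch below assumes it is the JPEG-LS median edge detector (MED) over the template pixels a (left), b (above), and c (above-left). The function names are hypothetical, and the handling of the first pixel of a line (using b for both a and c) is our own simplification rather than the JPEG-LS boundary rule. In the mode-dependent scheme, `above` would be the reference line in the same domain (raw or difference) as the current line.

```python
from typing import List

def med_predict(a: int, b: int, c: int) -> int:
    """JPEG-LS median edge detector: a = left, b = above, c = above-left."""
    if c >= max(a, b):
        return min(a, b)   # edge above or to the left
    if c <= min(a, b):
        return max(a, b)
    return a + b - c       # smooth region: planar prediction

def _template(line: List[int], above: List[int], j: int):
    """Neighbors of pixel j; at j = 0 both a and c fall back to b (simplification)."""
    if j == 0:
        return above[0], above[0], above[0]
    return line[j - 1], above[j], above[j - 1]

def predict_residuals(cur: List[int], above: List[int]) -> List[int]:
    """Encoder side: prediction residual for every pixel of the current line."""
    return [x - med_predict(*_template(cur, above, j)) for j, x in enumerate(cur)]

def reconstruct_line(res: List[int], above: List[int]) -> List[int]:
    """Decoder side: rebuild the line left to right from residuals and the buffered line."""
    line: List[int] = []
    for j, e in enumerate(res):
        line.append(med_predict(*_template(line, above, j)) + e)
    return line

res = predict_residuals([12, 22, 31, 45], [10, 20, 30, 40])
print(res)                                        # [2, 2, 1, 5]
print(reconstruct_line(res, [10, 20, 30, 40]))    # [12, 22, 31, 45]
```

Only the current line and one buffered line are touched, matching the two-line buffer stated above, and the predictor uses comparisons and additions only.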

C. Mode Decision

Mode decision mainly chooses the better of the raw data mode and the difference mode. In LALVC, mode decision is made in the temporal preprocessing, which can be seen as a first pass over each line to analyze its statistics. The complexity is analyzed in Section 4.1.3.3. Each line carries a 2-bit overhead that tells the decoder which mode to follow when reconstructing the current line.

We use opt_16_line to obtain near-optimal performance. From this simulation, the line mode tags are acquired, and statistics of the image properties related to the tags are collected. The mode decision is made between temporal prediction and intra-only prediction. The following shows the distance of the difference between the reference lines from the inter frame and the intra frame.

D. Summary

LALVC follows the basic ideas stated at the beginning of this chapter. Through mode decision, the four-mode adaptive algorithm achieves high performance based on temporal and spatial predictors, and all functions involved in LALVC meet the low-complexity requirement. The hardware design architecture of LALVC is given in the next chapter.

3.3.3 Error Mapping and Entropy Coder

3.3.3.1 Error Mapping

The difference mode was explained in the previous sections. Its residuals are produced by the temporal preprocessing as the difference of two pixel values. For 8-bit gray-level precision, the residual range without remapping grows to -255 ~ +255, so 9 bits are required to represent each value. To reduce the range and save bits, LALVC adopts an error remapping approach.

Mapping the 9-bit value range into an 8-bit range still preserves the information of the pre-mapped value perfectly, because the decoder reconstructs each value from both the residual and the predicted value. The remapping methods fall into two types by value range: an unsigned 8-bit representation (0 ~ 255) and a signed 8-bit representation (-128 ~ 127).

Figure 17. Mode decision in the preprocessing.

A. Signed mapping

In Figure 18, the two triangular regions are remapped. For the upper one, Ex (the residual) ranges from 128 to 255, and Px (the predicted value) ranges from 0 to 127. The reconstructed value $R_x = E_x + P_x$ can be perfectly recovered from Ex and Px through a one-to-one mapping without aliasing.

Figure 18. Residual remapping method (signed mapping).

B. Unsigned mapping

In Figure 19, the remapping moves the lower triangle to the upper side. After remapping, Ex (the residual) ranges from 0 to 255 and can be represented with 8 bits. In our simulation, the two remapping methods differ only slightly in coding performance. For hardware implementation, however, the unsigned mapping leads to a simpler architecture.

When the 9-bit Ex is available, the unsigned remapping is completed simply by truncating the MSB, whereas the signed remapping requires more comparators. Owing to this complexity consideration, the unsigned mapping method is preferred.
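Both remapping variants amount to reducing the 9-bit residual modulo 256. The Python sketch below (hypothetical function names) shows the unsigned variant as the MSB truncation described above and a signed variant folded into -128 ~ 127; that the signed variant matches Figure 18 exactly is our assumption. An exhaustive loop checks that both are lossless given the prediction.

```python
def unsigned_map(x: int, p: int) -> int:
    """Encoder: form the 9-bit residual x - p and keep its low 8 bits (drop the MSB)."""
    return (x - p) & 0xFF

def unsigned_unmap(e: int, p: int) -> int:
    """Decoder: add the prediction back and wrap modulo 256."""
    return (p + e) & 0xFF

def signed_map(x: int, p: int) -> int:
    """Encoder: the same modulo-256 residual, folded into [-128, 127]."""
    e = (x - p) & 0xFF
    return e - 256 if e >= 128 else e

def signed_unmap(e: int, p: int) -> int:
    """Decoder: identical to the unsigned case because p + e is taken modulo 256."""
    return (p + e) & 0xFF

# Exhaustive check over all 8-bit pixel/prediction pairs.
for p in range(256):
    for x in range(256):
        assert 0 <= unsigned_map(x, p) <= 255
        assert -128 <= signed_map(x, p) <= 127
        assert unsigned_unmap(unsigned_map(x, p), p) == x
        assert signed_unmap(signed_map(x, p), p) == x
```

In hardware, `& 0xFF` corresponds to simply dropping the ninth bit of a subtractor output, which is why the unsigned variant needs no comparators.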

Figure 19. Residual remapping method (unsigned mapping)

3.3.3.2 Entropy Coder

To meet the low-complexity requirement, Golomb coding offers the best tradeoff between coding efficiency and complexity. Arithmetic coding achieves higher coding efficiency than Huffman coding, but at higher complexity.
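As an illustration of the entropy coder, the following Python sketch implements Golomb-Rice coding (Golomb codes with a power-of-two parameter 2^k), choosing k with the JPEG-LS rule from the context counters A[] and N[]. The function names are hypothetical, and JPEG-LS details such as the code-length limit (LIMIT) and the mapping of signed errors to non-negative values are omitted.

```python
from typing import Tuple

def golomb_k(A: int, N: int) -> int:
    """JPEG-LS rule: the smallest k such that N * 2**k >= A."""
    k = 0
    while (N << k) < A:
        k += 1
    return k

def rice_encode(value: int, k: int) -> str:
    """Quotient in unary (q zeros, then a terminating one), followed by k low-order bits."""
    q = value >> k
    bits = "0" * q + "1"
    if k > 0:
        bits += format(value & ((1 << k) - 1), f"0{k}b")
    return bits

def rice_decode(bits: str, k: int) -> Tuple[int, str]:
    """Decode one value; return it together with the unread remainder of the bit string."""
    q = bits.index("1")                 # number of leading zeros
    rest = bits[q + 1:]
    r = int(rest[:k], 2) if k > 0 else 0
    return (q << k) | r, rest[k:]

print(golomb_k(20, 4))    # 3
print(rice_encode(9, 2))  # 00101
stream = "".join(rice_encode(v, 2) for v in [0, 3, 9])
values = []
while stream:
    v, stream = rice_decode(stream, 2)
    values.append(v)
print(values)             # [0, 3, 9]
```

Encoding needs only a shift, a mask, and a counter for the unary part, and choosing k needs only shifts and comparisons, which is what makes this family attractive for FPGA or ASIC implementation.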

3.3.4 Syntax Organization

We define the LALVC decoding syntax, which supports the four line-based coding modes. In Figure 20, the bits of successive frames are separated by the ‘Sync’ field. The ‘Frame Skip Tag’ field indicates that the reference frame is reused as the current frame without refreshing the display, which reduces the data volume when the video content does not change for a long time. At the line level, the ‘Line Mode Tag’ is set in the preprocessing. In addition, the ‘DC value’ field reduces coding overhead when the brightness or color saturation of all pixels in the image is adjusted. With the selected coding mode, the prediction residuals are generated and passed through the entropy coder to form the data bitstream. Using this syntax, the LALVC decoding process reconstructs the sequence.

Figure 20. Syntax of the LALVC file format.

3.3.5 Algorithm Simplification

The context model used in JPEG-LS uses three gradients to derive the context index.

The gradients are computed as a-c, b-c, and d-b, where a, b, c, and d are the neighboring pixels around the current pixel x (Figure 21). Each gradient ranges from -255 to 255, so without gradient quantization the context model would need 511 × 511 × 511 states. After quantization (Figure 7), with the thresholds set at 3, 7, and 21 (T1, T2, and T3, respectively), each gradient is mapped to one of 9 levels (-4 ~ +4), reducing the total number of states to 9 × 9 × 9 = 729. Merging states by exploiting context symmetry roughly halves this number, to 365.
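The quantization and symmetric merging described above can be checked with a short Python sketch. It follows the JPEG-LS gradient quantization with NEAR = 0 and the thresholds T1 = 3, T2 = 7, T3 = 21; the function names are hypothetical. Counting the distinct merged contexts reproduces the quantized entries of Table 11: 365 for three gradients and 41 for two.

```python
from itertools import product
from typing import Tuple

T1, T2, T3 = 3, 7, 21

def quantize(d: int) -> int:
    """Map a gradient in [-255, 255] to one of 9 levels (-4..+4), JPEG-LS style, NEAR = 0."""
    if d <= -T3: return -4
    if d <= -T2: return -3
    if d <= -T1: return -2
    if d < 0:    return -1
    if d == 0:   return 0
    if d < T1:   return 1
    if d < T2:   return 2
    if d < T3:   return 3
    return 4

def merge_symmetric(q: Tuple[int, ...]) -> Tuple[Tuple[int, ...], int]:
    """Negate the context if its first non-zero entry is negative; return (context, sign)."""
    for v in q:
        if v != 0:
            if v < 0:
                return tuple(-x for x in q), -1
            break
    return tuple(q), 1

def count_contexts(num_gradients: int) -> int:
    """Number of distinct contexts after quantization and symmetric merging."""
    levels = range(-4, 5)
    return len({merge_symmetric(q)[0] for q in product(levels, repeat=num_gradients)})

print(count_contexts(3), count_contexts(2))  # 365 41
```

The sign returned by `merge_symmetric` is what JPEG-LS uses to flip the prediction error of a merged context, so the halving costs no coding efficiency for symmetric statistics.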

Figure 21. The template for context model

Table 11 lists the number of states of context models based on two or three gradients. “2 gradients” means that only (c-a) and (b-c) determine the context index.

Table 11. The number of states to support the context model

                   No grouping                          Grouping (quantization)
Num of gradients   No merging   Merging with symmetry   No merging   Merging with symmetry
3                  133432831    66716416                729          365
2                  261121       130561                  81           41

The context modeler consists of four arrays: N[], A[], B[], and C[]. For hardware implementation, the size of each state must be decided. The relation among the four arrays is defined by Eq. 17.

Table 12. The memory requirement for elements in the context model

Array name   Value range   Storage requirement   Actual memory size
N[]          0 ~ 64        7 bits                8 bits
A[]          0 ~ 64*255    15 bits               16 bits
B[]          -63 ~ 0       7 bits                8 bits
C[]          0 ~ 255       8 bits                8 bits
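For reference, the following Python sketch shows the JPEG-LS per-context update of A[], B[], C[], and N[] (NEAR = 0, RESET = 64), including the halving that JPEG-LS performs when N reaches RESET. It is a sketch of the standard's update rule, not of the LALVC hardware, and the class name is hypothetical; it illustrates why N[] stays within 64 and B[] within -63 ~ 0, as listed in Table 12. Note that JPEG-LS keeps C in [-128, 127], which also fits in 8 bits.

```python
RESET = 64                # JPEG-LS default reset threshold
MIN_C, MAX_C = -128, 127  # JPEG-LS clamping range for the bias correction C

class ContextState:
    """Counters of one context: A (sum of |error|), B (bias sum), C (correction), N (count)."""

    def __init__(self, alphabet: int = 256) -> None:
        self.A = max(2, (alphabet + 32) >> 6)  # JPEG-LS initial value (RANGE = 256, NEAR = 0)
        self.B = 0
        self.C = 0
        self.N = 1

    def update(self, err: int) -> None:
        """Update the counters after coding the prediction error err."""
        self.B += err
        self.A += abs(err)
        if self.N == RESET:  # halve A, B, N so the accumulators stay bounded
            self.A >>= 1
            self.B = self.B >> 1 if self.B >= 0 else -((1 - self.B) >> 1)
            self.N >>= 1
        self.N += 1
        # Bias cancellation: keep B in (-N, 0] and move C by at most one step.
        if self.B <= -self.N:
            self.B += self.N
            if self.C > MIN_C:
                self.C -= 1
            if self.B <= -self.N:
                self.B = -self.N + 1
        elif self.B > 0:
            self.B -= self.N
            if self.C < MAX_C:
                self.C += 1
            if self.B > 0:
                self.B = 0

ctx = ContextState()
for i in range(5000):
    ctx.update(((i * 37) % 256) - 128)  # deterministic test errors in [-128, 127]
    assert 2 <= ctx.N <= RESET and -ctx.N < ctx.B <= 0
```

Because N never exceeds 64 and B stays strictly above -N, B is bounded below by -63, which is where the 7-bit requirement for B[] in Table 12 comes from.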

In JPEG-LS, the reset scheme is triggered when N[] reaches 64, which keeps A[] within a reasonable range. If the context model arrays are stored in SRAM, resetting them for every frame is not suitable for implementation. A simple method is to turn on the reset scheme when
