

3. THE COMPRESSION ALGORITHMS FOR COMPOUND DOCUMENT IMAGES


the marked pixels "▲" are the transition pixels, of which five are included. The transition pixel ratio T reflects the complexity of the block, and the foreground pixel ratio B reflects the density of the foreground pixels. The block size is defined as block width × block length. The thresholds T ≤ 0.3, B ≤ 0.5 and 300 ≤ block size ≤ 30000 are set to extract the text from the foreground blocks.
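The block-level test above can be sketched as follows. The exact formulas for T and B are not reproduced in this excerpt, so the sketch assumes T is the ratio of transition pixels to block size and B is the ratio of foreground pixels to block size; the threshold values are those stated above.

```python
import numpy as np

def is_text_block(mask, n_transitions, t_max=0.3, b_max=0.5,
                  size_min=300, size_max=30000):
    """Decide whether a foreground block is text.

    mask          -- binary array for one block (1 = foreground pixel)
    n_transitions -- number of transition pixels found in the block
    The definitions of T and B are assumed: T = transitions / block size
    (block complexity) and B = foreground pixels / block size (density).
    """
    block_size = mask.size                  # block width x block length
    T = n_transitions / block_size          # assumed form of the complexity ratio
    B = mask.sum() / block_size             # assumed form of the density ratio
    return bool(T <= t_max and B <= b_max and size_min <= block_size <= size_max)
```

A 20×20 block with a small patch of foreground pixels and five transition pixels passes all three thresholds, whereas a 10×10 block is rejected by the size constraint alone.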

3.3 Document image compression algorithm

As stated above, text and background images have different characteristics.

Traditional image compression algorithms, such as JPEG, are unsuitable for document images. JPEG’s use of local cosine transforms is based on the assumption that the high spatial frequency components in images can be removed without too much degradation of quality. While this assumption holds for most images of natural scenes, it does not hold for document images. A different compression method is required to code text accurately and efficiently to maximize its clarity. Text and the background image can be encoded by methods appropriate for bi-level and continuous-tone images, respectively.

The foreground/background representation was proposed in the ITU MRC/T.44 recommendation [31]. This prototype of document image representation is used in Xerox's XIFF image format, which presently uses CCITT-G4 to code the mask (text) layer and JPEG to code the foreground (text color) and background layers. However, the compression ratio of MRC/T.44 is insufficient for document images. Thus, this foreground/background representation is adopted here, and two compression algorithms are proposed for compound document images. In this work, the pixels of the text are extracted from a compound document image; the text plane is the mask layer.

Several gaps appear in the background image when pixels of text are extracted from it. The gaps are replaced by pixels with the average gray value of the neighboring pixels to improve the efficiency of compression. The foreground image is the color plane of the text. The color of the text can be obtained from the original image according to the position of the text. The pixels of the text are called used pixels, and the others are called unused pixels. Those unused pixels can be replaced by pixels of an appropriate color to enhance the compression. The color-filling algorithm is as follows.

Step 1. Mark the pixels in the mask as used pixels; the other pixels are unused pixels.

Step 2. Fill the gap with the color of the pixel which adjoins the used pixel, row by row. Mark the filled pixels as used pixels.

Step 3. Fill the gap with the color of the pixel next to the used pixel, column by column. Mark the filled pixels as used pixels.

Step 4. Repeat the processes in Steps 2 and 3 until no unused pixels remain. The foreground plane is thus obtained.
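As a hedged illustration, Steps 1–4 can be sketched as follows; the scan order within each row and column pass is an assumption, since the text above does not fix it.

```python
import numpy as np

def fill_foreground(color, mask):
    """Propagate text colors into the unused pixels of the foreground plane.

    color -- H x W x 3 plane holding text colors at the mask positions
    mask  -- boolean array marking the text (used) pixels
    """
    color = color.copy()
    used = mask.astype(bool).copy()                 # Step 1: mark used pixels
    h, w = used.shape
    while not used.all():                           # Step 4: repeat until done
        progress = False
        for y in range(h):                          # Step 2: fill row by row
            for x in range(w):
                if not used[y, x]:
                    if x > 0 and used[y, x - 1]:    # adjoining used pixel on the left
                        color[y, x] = color[y, x - 1]; used[y, x] = True; progress = True
                    elif x + 1 < w and used[y, x + 1]:
                        color[y, x] = color[y, x + 1]; used[y, x] = True; progress = True
        for x in range(w):                          # Step 3: fill column by column
            for y in range(h):
                if not used[y, x]:
                    if y > 0 and used[y - 1, x]:
                        color[y, x] = color[y - 1, x]; used[y, x] = True; progress = True
                    elif y + 1 < h and used[y + 1, x]:
                        color[y, x] = color[y + 1, x]; used[y, x] = True; progress = True
        if not progress:                            # empty mask: nothing to propagate
            break
    return color
```

Starting from a single red text pixel, the alternating row and column passes flood the color across the whole plane, which is the intended effect: large areas of almost uniform color that compress well.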

Different planes are compressed using different compression methods.

(1) Mask plane: The text pixels, also called the mask, are represented by a single bit-plane. This bit-plane uses “1” to represent a text pixel and “0” to represent a background pixel. The text pixels are coded using JB2, which is a variation of AT&T’s proposal for the JBIG2 fax standard.

(2) Foreground plane: The text’s color, also called the foreground, is represented in a color plane. Neighboring text characters generally have identical color, so the color plane contains large areas of contiguous pixels of almost the same color. This color plane is coded using a wavelet-based compression algorithm [3].

(3) Background plane: The background image is coded using the same algorithm as that used to code the foreground image.

[Mask]: JB2  [Foreground]: wavelet-based coding  [Background]: wavelet-based coding

Fig.18 Document image compression format

Figure 18 depicts the compression format. This work proposes two compression algorithms, CSSP-I and CSSP-II, to compress compound document images. Each component of the encoder is described below.

A. Method of compressing the mask plane

The mask image compression method uses JB2 [29]. JB2 is an algorithm proposed by AT&T for the JBIG2 standard [30] for compressing fax and bi-level images. JB2 provides better compression than JBIG1 [32], which is often used for faxes, in both lossless and lossy compression of arbitrarily scanned images with scanning resolutions from 100 to 800 dpi. JB2 uses information from previously encountered characters and does not risk introducing the character substitution errors that are inherent in the use of OCR. The JB2 method has been proven to be approximately 20% more efficient than the JBIG1 standard in the lossless compression of bi-level images. By running JB2 in a controlled-loss mode, this algorithm yields a compression ratio of about two to four times that provided by the JBIG1 method. In lossy mode, JB2 is four to eight times better than CCITT-G4 (which is lossless). It is also four to eight times better than GIF.

B. Method for compressing foreground/background plane

The combination of the discrete wavelet transform and zerotree coding [3],[4],[33] has been proposed to compress pure images with a high compression ratio. Such coding algorithms provide good image quality. This study uses the embedded zerotree wavelet (EZW) coding algorithm [3] to compress the foreground/background images.

These compression methods are applied to mask, foreground and background images. The document image compression algorithm mentioned earlier is called compression algorithm CSSP-I.

Compression algorithm CSSP-I can extract the text plane from an overlapping background image and compress it using JB2. The compressed data thus obtained are approximately 50% of all the compressed data. Therefore, compression algorithm CSSP-II is proposed to improve the compression ratio.

Compression algorithm CSSP-II uses downsampling to reduce the data of the original document images. The downsampling method replaces each 2×2 pixel block by its mean value, so the image is diminished to a quarter of its original size.
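The 2×2-mean downsampling step can be sketched as below; this is a minimal version that assumes even image dimensions and a NumPy array input, and since the thesis does not state how the mean is rounded, plain rounding is used.

```python
import numpy as np

def downsample_2x2(img):
    """Replace each 2x2 pixel block by its mean value, quartering the image.

    Works for grayscale (H x W) or color (H x W x 3) planes; H and W are
    assumed to be even.
    """
    h, w = img.shape[:2]
    # Split the row and column axes into (blocks, 2) pairs, then average
    # over the two in-block axes.
    blocks = img.astype(np.float64).reshape(h // 2, 2, w // 2, 2, *img.shape[2:])
    return blocks.mean(axis=(1, 3)).round().astype(img.dtype)
```

Applying the function twice would shrink an image to one sixteenth of its size; CSSP-II applies it once, so the segmented planes are a quarter of the original.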

The text segmentation algorithm extracts the text plane. The processing time and the size of the text plane are reduced because the image is a quarter of its original size. After the segmentation algorithm is applied, the full-size background and the quarter-size foreground are compressed using the wavelet-based compression algorithm, and the quarter-size mask is compressed using JB2.

In the decompression phase, the quarter-size mask is enlarged by upsampling.

The upsampling method expands each pixel in the mask into a 2×2 block. After upsampling, the text looks thinner than the original, so the characters are expanded by one pixel around their boundaries.
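The decoder's upsampling and one-pixel boundary expansion might look like the following sketch; the use of 4-connected neighbours for the expansion is an assumption, since the text does not specify the connectivity.

```python
import numpy as np

def upsample_mask(mask):
    """Expand each mask pixel into a 2x2 block, then thicken the characters
    by one pixel around their boundaries (4-connected dilation, assumed)."""
    # Each source pixel becomes a 2x2 block of identical values.
    up = np.kron(mask.astype(np.uint8), np.ones((2, 2), dtype=np.uint8)).astype(bool)
    thick = up.copy()
    thick[:-1, :] |= up[1:, :]    # set the pixel above every set pixel
    thick[1:, :]  |= up[:-1, :]   # set the pixel below
    thick[:, :-1] |= up[:, 1:]    # set the pixel to the left
    thick[:, 1:]  |= up[:, :-1]   # set the pixel to the right
    return thick
```

A single mask pixel becomes a 2×2 block whose four 4-connected boundary neighbours are then switched on, restoring the stroke width lost in downsampling.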

3.4 Experimental results

The proposed algorithms were simulated in Windows 2000 (Pentium III 700, 128 MB RAM) using programs written in C++. The test images are 24-bit true-color images scanned at 200 dpi; each pixel is characterized by R, G and B values, and every value is represented by 8 bits. The compression algorithms are intended for complex compound documents, so the test images include text that overlaps the background. Two compression algorithms, JPEG and DjVu, are selected for comparison with the proposed algorithms. Figure 19 shows the mask images and the background images obtained using proposed algorithm CSSP-I.
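For reference, the two quality measures reported below can be computed as in this sketch; the thesis does not state whether the PSNR is averaged over the R, G and B channels, so that is an assumption here.

```python
import numpy as np

def psnr(original, reconstructed):
    """PSNR in dB between two 8-bit images (MSE averaged over all channels)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def compression_ratio(width, height, compressed_bytes):
    """Raw 24-bit size (3 bytes per pixel) divided by the compressed size."""
    return width * height * 3 / compressed_bytes
```

For example, a 100×100 true-color image compressed to 300 bytes has a compression ratio of 100, which is how the per-image ratios in Figs. 20–22 can be read.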

The test images are processed using CSSP-I and JPEG, as shown in Fig.20. The image compressed by JPEG loses color severely; the blocking artifacts are obvious and the text is blurred. The visual quality obtained using CSSP-I is better than that obtained using the JPEG algorithm.

Figure 21 displays images processed by DjVu. CSSP-I and DjVu are both based on the MRC format, so the test images are divided into three parts for comparison: the mask image, the background image and the reconstruction image.

(1) Mask image: From Figs. 19 and 21, the text extracted by CSSP-I is clearer than that extracted by DjVu. The text can be extracted from the complex background using the segmentation algorithm of CSSP-I, so the mask image produced by CSSP-I can be passed to post-processing, such as OCR, more precisely than the mask produced by DjVu.

Accordingly, the segmentation algorithm can also be applied to an OCR system to recognize text on a complex background.

(2) Background image: Figure 19 displays the background images obtained by CSSP-I, and Fig. 21 shows those obtained by DjVu. Although DjVu is designed especially to extract sharp edges, some parts of the text are missing. Clearly, the background images obtained using CSSP-I are more precise than those obtained using DjVu.

(3) Reconstruction image: Figure 20 displays the reconstruction images obtained by CSSP-I, and Fig. 21 displays those reconstructed by DjVu.

Figure 22 shows the reconstruction images and the mask images obtained by CSSP-II. The latter are a quarter of the size obtained using CSSP-I. The mask images obtained using CSSP-II are not directly downsampled from those obtained using CSSP-I, but they are extracted from downsampled document images. Therefore, the processing time and the amount of memory used are reduced.

Table 3 presents the compression ratio and PSNR of the proposed methods, JPEG and DjVu. CSSP-I and CSSP-II yield better quality images than JPEG.

Furthermore, the compression ratio and PSNR of CSSP-I and CSSP-II are higher than those of JPEG. The average PSNR of the proposed methods is close to that of DjVu, but the visual quality obtained using the proposed methods is better than that obtained using DjVu. The total compression ratio of CSSP-II is higher than that of DjVu.

3.5 Concluding remarks

Document image segmentation has been studied for over ten years. Directly extracting text from a complex compound document is difficult because the text overlaps background. This chapter proposed a new segmentation method for separating text from compound document images with high text/background overlap.

Based on the new segmentation method, two methods for compressing compound document images were presented. High-quality compound document images with both high compression ratio and a good presentation of text were thus obtained. The proposed compression algorithms were compared with JPEG and DjVu. The proposed methods perform much better.

[(a)–(d) Test images A–D: original image, mask image, and background image. (e), (f) Test images E and F: original image, mask images 1 and 2, and background image.]

Fig.19 Segmentation images of proposed algorithm CSSP-I

(a) Test image A: CSSP-I mask 8,647 bytes, foreground 2,359 bytes, background 15,728 bytes, compression ratio 176.7; JPEG compression ratio 122.4.
(b) Test image B: CSSP-I mask 7,083 bytes, foreground 2,359 bytes, background 15,728 bytes, compression ratio 187.5; JPEG compression ratio 138.7.
(c) Test image C: CSSP-I mask 9,556 bytes, foreground 2,359 bytes, background 15,728 bytes, compression ratio 170.7; JPEG compression ratio 123.9.
(d) Test image D: CSSP-I mask 6,995 bytes, foreground 2,359 bytes, background 15,728 bytes, compression ratio 188.1; JPEG compression ratio 120.5.
(e) Test image E: CSSP-I mask 8,835+1,051 bytes, foreground 3,612×2 bytes, background 24,084 bytes, compression ratio 175.4; JPEG compression ratio 160.4.
(f) Test image F: CSSP-I mask 14,947+4,096 bytes, foreground 2,359×2 bytes, background 15,728 bytes, compression ratio 119.5; JPEG compression ratio 117.4.

Fig.20 Comparison of proposed algorithm CSSP-I with JPEG

(a) Test image A: compression ratio 163.
(b) Test image B: compression ratio 157.7.
(c) Test image C: compression ratio 160.5.
(d) Test image D: compression ratio 166.3.
(e) Test image E: compression ratio 166.
(f) Test image F: compression ratio 118.2.

Fig.21 Images processed by DjVu (mask plane and background plane for each test image)

(a) Test image A: mask 4,362 bytes, foreground 590 bytes, background 15,728 bytes, compression ratio 228.2.
(b) Test image B: mask 3,731 bytes, foreground 590 bytes, background 15,728 bytes, compression ratio 235.4.
(c) Test image C: mask 4,936 bytes, foreground 590 bytes, background 15,728 bytes, compression ratio 222.
(d) Test image D: mask 3,572 bytes, foreground 590 bytes, background 15,728 bytes, compression ratio 237.2.
(e) Test image E: mask 4,616+671 bytes, foreground 3,612×2 bytes, background 24,084 bytes, compression ratio 197.5.
(f) Test image F: mask 7,884+674 bytes, foreground 590×2 bytes, background 15,728 bytes, compression ratio 185.3.

Fig.22 Images processed by CSSP-II (reconstruction image and mask plane(s) for each test image)

Table 3 Comparison of the compression ratio and PSNR of the proposed methods with JPEG and DjVu

Algorithm   Metric               a      b      c      d      e      f    Average
CSSP-I      Compression ratio  176.7  185.5  170.7  188.1  175.4  119.5   169.3
            PSNR (dB)           18.9   22.8   20.9   19.8   23.9   20.5    21.13
CSSP-II     Compression ratio  228.2  235.4  222.2  237.2  197.5  185.3   217.6
            PSNR (dB)           18.8   22.7   20.9   20.3   23.7   20.3    21.1
JPEG        Compression ratio  122.4  138.7  123.9  120.5  160.4  117.4   130.6
            PSNR (dB)           17.6   21.5   20.0   19.8   21.3   19.5    19.95
DjVu        Compression ratio  163    157.7  160.5  166.3  166    118.2   155.3
            PSNR (dB)           18.7   23.6   20.6   20.2   23.6   20.3    21.17

CHAPTER 4

THE MULTI-LAYER SEGMENTATION METHOD FOR COMPLEX DOCUMENT IMAGES

Texts are frequently printed on complex backgrounds. Segmenting texts is an important topic in document analysis. Some methods of segmentation have been developed for texts with images. However, previous studies have not sufficiently addressed complex compound documents. This chapter proposes a text segmentation algorithm for various document images. The proposed segmentation algorithm incorporates a new multi-layer segmentation method to separate the text from various compound document images, regardless of whether the text and background overlap.

This method solves various problems associated with the complexity of background images. Experimental results obtained using different document images scanned from book covers, advertisements, brochures, and magazines reveal that the proposed algorithm can successfully segment Chinese and English text strings from various backgrounds, regardless of whether the texts are over a simple, slowly varying or rapidly varying background texture.

4.1 Introduction

The complexity of background images is critical to the application of the text segmentation algorithm. Segmenting the texts from a complex compound document image is an important issue in document analysis. Document image segmentation, which separates the text from a monochromatic background, has been studied for over a decade. Some systems based on prior knowledge of some statistical characteristics of various blocks [11],[13], or texture analyses [27] have been successively developed.

A text segmentation algorithm based on block-thresholding, which applies rate-distortion thresholds, has been proposed [26]. A system that focuses on the extraction and classification of bibliographical information from book covers has been developed [34]. Several other approaches use the features of wavelet coefficients to extract text [35]-[39].

All such systems focus on processing document images whose texts do not overlie a complex background. These studies are effective in extracting characters from monochromatic backgrounds. However, they do not apply when backgrounds include sharply varying contours or overlap with texts. These background images include 1) monochromatic backgrounds with/without texts; 2) slowly varying backgrounds with/without texts; 3) highly varying backgrounds with/without texts; and 4) complex varying backgrounds with/without texts of different colors. Extracting the texts is particularly difficult when the compound document image includes all of these backgrounds.

A text extraction algorithm aimed at WWW images was proposed [40]. The algorithm used the Euclidean minimum-spanning-tree (EMST) technique to cluster the R-G-B space of the input image into a number of color classes. For each color class, the bounding boxes of connected components were found by the connected-component labeling method, and the shape and features of the bounding boxes were used to classify the connected components as text-like or non-text-like.

However, this global algorithm is not sufficient to extract the texts in many document, advertisement, brochure, and magazine images, in which the texts are widely distributed over A4-size pages. Because these images are captured by scanners, the pixel values of the texts are spread by the optical properties of the scanner when complex varying backgrounds overlap texts of different colors. Hence, the texts in each color class are fragmented by the global algorithm.

A global segmentation method for color documents was proposed [41]. It uses spatial information in the R-G-B color space to select line segments as initial clusters, merges line segments that are close to each other, and then uses a predefined threshold to group the neighboring pixels of the remaining line segments. The method assumes that, for the documents under consideration, the background color of each frame is uniform over the whole frame. Hence, the method does not apply when images include rich and colorful backgrounds or little text, because the line segments cannot be selected correctly. Meanwhile, the texts in each cluster are fragmented by the global algorithm.

Some edge-based methods were proposed to detect texts in complex document images [42],[43]. These methods use the Sobel or Canny operator to detect edge features and calculate an edge-based feature to detect the texts. The edge-based methods can use the detected edge features to extract the texts from document images, but the edge features of the background are detected simultaneously. Therefore, the edge-based methods are not valid when backgrounds include sharply varying contours and overlap with texts.

A text detection method [44] used three second-order Gaussian derivative filters to calculate an edge-feature vector at three different image scales, and the K-means algorithm (with K=3) was used to cluster the pixels based on the edge-feature vectors.

One of the three clusters is labeled as the text plane. Because the text plane contains many complex backgrounds and texture patterns, a refinement phase calculates the strokes (edge information) of the text plane and groups strokes that have similar heights and are horizontally aligned into tight rectangular bounding boxes. Furthermore, the text plane clustered by the edge-feature vectors is interfered with by non-text edges when the texts are connected with a complex texture background. Hence, this edge-based method is not useful when backgrounds include sharply varying contours and overlap with texts.

Recently, some text detection and tracking methods focused on digital video have been proposed [45],[46]. These text detection methods use the wavelet transform or the gradient image of the R-G-B space to obtain edge information from the images, and the text extraction methods are edge-based and multi-scale. These methods are sufficient to detect and track texts in video frames. In document images, however, many texts are small and thin, and the edge features of small texts and thin strokes are diminished by the downsampling process. Hence, the methods could be unsuitable for detecting texts in document images. Furthermore, the edge-based text extraction methods are also interfered with by non-text edges when the texts are connected with complex texture backgrounds that include sharply varying contours.

The multi-layer segmentation method (MLSM) proposed in this chapter uses a block-based unsupervised clustering algorithm to cluster pixels whose values are near each other. As is well known, the disadvantage of such a local method is that much structural information is lost in the process. Hence, a jigsaw-puzzle layer construction algorithm is presented to reconstruct the structural information based on the different objects of the processed images. Therefore, the MLSM can solve various problems associated with the different document images scanned from book covers, advertisements, brochures, and magazines, regardless of whether the texts are over a simple, slowly varying or rapidly varying background texture. The MLSM focuses on document images and overcomes the disadvantages of the edge-based and downsampling methods.

This chapter presents a method to extract texts from different compound document images. A compound document image includes several objects, including texts of different colors, figures, scenes and complex backgrounds. Such objects may overlap each other. They have different features, according to which the document image can be partitioned into several object layers. The MLSM can separate texts

