
2. THE FUZZY-BASED TEXT SEGMENTATION METHOD

2.6 Concluding remarks

Traditional color image-compression standards such as JPEG are inappropriate for document images. JPEG relies on the assumption that the high spatial frequency components of an image can be removed without much quality degradation. While this assumption holds for most pictures of natural scenes, it fails for document images, whose text requires a lossless coding technique to maximize readability. This chapter has proposed a new compression method with promising performance on color document images. The method applies different compression algorithms, selected by fuzzy picture-text segmentation, to sub-images with different characteristics. The fuzzy picture-text segmentation algorithm is based on the coefficients of the wavelet transform, from which the text components and picture components can be identified quickly. We have also compared our method with JPEG; the results show that the new compression method achieves better and clearer quality than JPEG.

Fig.13(a) The original image (512×1024, scanned by ScanMaker V600 at 200 dpi)

Fig.13(b) JPEG image (CR=104.1)

Fig.13(c) Proposed method (CR=112.3)

Fig.14(a) The original image (1024×512, scanned at 200 dpi)

Fig.14(b) JPEG image (CR=83.1)

Fig.14(c) Proposed method (CR=84.31)

CHAPTER 3

THE COMPRESSION ALGORITHMS FOR COMPOUND DOCUMENT IMAGES WITH LARGE TEXT/BACKGROUND OVERLAP

This chapter presents two algorithms for compressing document images that achieve a high compression ratio on both color and monochromatic compound document images.

The proposed algorithms apply a new segmentation method to separate the text from the image in a compound document in which the text and background overlap. The segmentation method classifies document images into three planes: the text plane, the background plane, and the text's color plane. Different compression techniques are then applied to each plane. The text plane is compressed using a pattern matching technique called JB2. The wavelet transform and zerotree coding are used to compress the background plane and the text's color plane. Assigning bits to the different planes yields high-quality compound document images with both a high compression ratio and well-presented text. The proposed algorithms greatly outperform the well-known image compression methods JPEG and DjVu, and enable the effective extraction of text from a complex background, achieving a high compression ratio for compound document images.

3.1 Introduction

A color page of A4 size at 200 dpi is 1660 pixels wide and 2360 pixels high; it occupies about 12 Mbytes of memory in uncompressed form. This large amount of data prolongs transmission time and makes storage expensive. Text and pictures cannot be compressed well by a single method such as JPEG, since they have different characteristics. Digitized images of compound documents typically consist of a mixture of text, pictures, and graphic elements, which must be separated for further processing and efficient representation. Rapid advances in multimedia techniques have enabled document images, advertisements, checks, brochures, and magazines to overlap text with background images. Separating the text from a compound document image is therefore an important step in analyzing a document.

Document image segmentation, which separates the text from a monochromatic background, has been studied for over ten years, yet segmenting compound document images is still an open research field. Traditional image compression methods, such as JPEG, are not suitable for compound document images because such images include much text. Text consists of high-frequency components, many of which are lost in JPEG compression, so the text becomes blurred and cannot easily be recognized by the human eye or a computer. Because the text carries most of the information, separating it from a compound document image is one of the most significant areas of research into document images. Traditional compression methods cannot meet the needs of the digital world, because when compound documents are compressed at a high compression ratio, the image quality of the text part usually becomes unacceptable.

Many techniques have been developed to segment document images. Some approaches to processing monochromatic document images have already been proposed. Queiroz et al. proposed a segmentation algorithm based on block thresholding, in which the thresholds were found by a rate-distortion analysis method [26]. Other systems based on prior knowledge of statistical properties of the various blocks [11]-[15], or on textural analyses [16],[17],[27], have also been developed subsequently. All of these systems focus on processing monochromatic documents. In contrast, few approaches to analyzing color documents have been proposed. Suen and Wang [19] utilized geometric features and color information to classify segmented blocks into lines of text and picture components. Digipaper [28] and DjVu [20],[21] are two image compression techniques that are particularly geared towards the compression of color document images. The basic idea behind Digipaper and DjVu is to separate the text from the background and to use different techniques to compress each of those components. In DjVu, the image of the text part is encoded using a bi-level image compression algorithm called JB2, and the background image is encoded by a progressive, wavelet-based compression algorithm called IW44.

These methods powerfully extract text characters from a simple or slowly varying background. However, they are insufficient when the background includes sharply varying contours or overlaps with the text. Extracting the text is especially difficult when the color of the overlapped background is close to that of the text. Finding a text segmentation method for complex compound documents remains a great challenge, and the research field is still young.

This chapter proposes a new segmentation algorithm for separating text from a complex compound document in 24-bit true color or 8-bit monochrome. Two compression algorithms that yield a high compression ratio are also proposed. The technique separates the text from the background image after segmenting the text; it therefore has many applications, such as color facsimile and document compression.

Moreover, the segmentation algorithm can be used to find characters in complex documents with a large text/background overlap.

3.2 Text segmentation algorithm

When a document image is captured from a scanner, it can include several different components, including text, graphics, pictures, and others. The textual parts must be separated from a compound document image, whether effective compression or optical character recognition (OCR) is intended.

This section introduces a new extraction algorithm that can separate text from a complex, color or monochromatic compound document, as in Fig.15.

(a) Test image A size=1024×1536

(b) Test image B size=1024×1536

(c) Test image C size=1024×1536

(d) Test image D size=1024×1536

(e) Test image E size=1344×1792

(f) Test image F size=1024×1536

Fig.15 The original full images (200 dpi)

(a) Original Y-plane image

(b) Bright plane

(c) Medium plane

(d) Dark plane

Fig.16 The planes after the clustering algorithm

The features and texture of a complex compound document can be very complicated. In a pilot test, when a color document image was converted into a monochromatic image, the gray value of the image of text differed slightly from the gray value of the background image. However, the image of text cannot easily be directly separated from the background image because the difference between the gray values is too small. Therefore, two phases are used to accomplish the desired purpose. In the first phase, which involves color transformation and clustering analysis, the monochromatic document image is partitioned into three planes, the dark plane, the medium plane, and the bright plane, as depicted in Fig.16. The color of the text is almost all the same, so the variance of the text’s grayscale is small. Therefore, all the text can be grouped in the same plane.

When the text is black, the text and some of the background with a gray value close to that of the text is put in the dark plane. In contrast, the text is put into another plane if it is not black. Thus, the text and some noise are coarsely separated from the background. In the second phase, an adaptive threshold is determined to refine the text by adaptive binarization and block extraction. The two phases in which the algorithm extracts the text from the background are shown below.

A. Color Transformation

The color transformation technique is used to transfer a color document image to the YUV space; the Y-plane (grayscale image) is then used to segment the text from the complicated background image.
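As a concrete sketch of this step, the Y-plane can be computed per pixel from RGB. The ITU-R BT.601 luma weights are assumed here, since the chapter does not reproduce the exact transform matrix:

```python
def rgb_to_y(r, g, b):
    """Luma (Y) of one RGB pixel, using the ITU-R BT.601 weights
    commonly paired with the YUV color space (an assumed choice)."""
    return 0.299 * r + 0.587 * g + 0.114 * b

# White maps to full luminance, black to zero.
print(round(rgb_to_y(255, 255, 255)))  # 255
```

Applying this function to every pixel of the color document yields the grayscale image on which the clustering analysis below operates.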

B. Clustering analysis

In general, after a color document image is converted into a monochromatic one, the textures of the original color image are still present in the converted grayscale image. The difference between the text's gray value and that of the overlapping background image is small. Thus, a clustering algorithm is used to split the grayscale image. Clustering analysis roughly separates the text from the background image: first, it extracts as many as possible of the different textures of an image, and the text is embedded in one of the resulting planes.

The clustering algorithm is described below.

Step 1. Partition the M×N grayscale image A(i,j) into p sub-block images xn(i,j). Each sub-block xn(i,j) is of size K×L, where n=1,2,…,p.

Step 2. Calculate the mean gray value, mn, and the standard deviation, σn, of each K×L sub-block image xn(i,j) using Equations (3-1) and (3-2), respectively.

Step 3. Split xn(i,j) according to the mean and standard deviation. Define two centers, Cn1 and Cn2. If σn1 > σn2, then the third center is Cn3 = mn − 0.5×σn, and the two new centers are recomputed. Each sub-block is partitioned into three clusters ψk (k=1,2,3) according to,

ψ1 = { xn(i,j) | Dij,1 < Dij,2 and Dij,1 < Dij,3 };
ψ2 = { xn(i,j) | Dij,2 < Dij,1 and Dij,2 < Dij,3 };
ψ3 = { xn(i,j) | Dij,3 < Dij,1 and Dij,3 < Dij,2 };   (3-8)

where Dij,1 = |xn(i,j) − Cn1|, Dij,2 = |xn(i,j) − Cn2|, and Dij,3 = |xn(i,j) − Cn3|.   (3-9)

Repeat Steps 2 to 6 until all of the sub-block images xn(i,j) (n=1,2,…,p) have been processed.
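Since the center equations (3-1)-(3-7) are not reproduced above, the sketch below stands in mn−σn, mn, and mn+σn for the dark, medium, and bright centers; the nearest-center assignment itself follows Eqs. (3-8) and (3-9):

```python
import numpy as np

def cluster_subblock(block):
    """Assign each pixel of a grayscale sub-block to the dark (0),
    medium (1), or bright (2) cluster by nearest-center distance,
    as in Eqs. (3-8)/(3-9).  The centers m-s, m, m+s are stand-ins
    for the center equations not reproduced in the text."""
    m, s = block.mean(), block.std()
    centers = np.array([m - s, m, m + s])
    # Dij,k = |xn(i,j) - Cnk|; each pixel joins its nearest center
    dist = np.abs(block[..., None] - centers)
    return dist.argmin(axis=-1)
```

For a toy sub-block with one dark, one medium, and one bright pixel, `cluster_subblock(np.array([[0.0, 120.0, 255.0]]))` yields the labels 0, 1, 2.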

The optimal partition of the images depends on the intensity distribution of background images and the lengths, sizes and layouts of the text strings. However, analyzing those parameters is very complex. Therefore, images are partitioned into equal sub-blocks for simplicity. This study used K=256 and L=128, and a value of p that depended on the image size. The constants were determined empirically to ensure good performance in general cases.

C. Adaptive binarization

After the first phase, the gray values of each plane are simpler than those of the original document image. The text is then extracted from the background image using a thresholding algorithm. Thresholding techniques can be categorized into two classes, global and local. A global thresholding algorithm uses a single threshold for the entire image, while a local thresholding algorithm computes a separate threshold for each local region. This work utilizes a local thresholding algorithm. The threshold value THn of the dark and bright planes of the sub-block images xn(i,j) (n=1,2,…,p) is set by Eq. (3-10), in which m_fn is computed from the foreground pixels of the plane being processed, while m_bn relates to the background pixels of the other two planes. From Eq. (3-10), the value of THn adapts to the gray values of sub-block xn(i,j). The foreground pixels in the dark plane are darker than the background pixels, so the adaptive threshold THn is biased to the left of the mean value m_fn. Conversely, the foreground pixels in the bright plane are brighter than the background pixels, so THn is biased to the right of the mean value m_fn. Figure 17 gives an example of the algorithm.
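Because Eq. (3-10) itself is not recoverable from the text above, the following sketch keeps only its stated behavior: THn sits left of the foreground mean for the dark plane and right of it for the bright plane. The bias term, here a fraction of the plane's gray-level spread, is an assumption:

```python
import numpy as np

def binarize_plane(plane, dark=True, bias=0.5):
    """Adaptive binarization of one sub-block plane.  The threshold
    THn is biased away from the mean gray value m_fn: to the left
    for the dark plane, to the right for the bright plane.  The
    bias weight stands in for the exact Eq. (3-10) term."""
    m_fn, s_fn = plane.mean(), plane.std()
    th_n = m_fn - bias * s_fn if dark else m_fn + bias * s_fn
    # dark-plane foreground is darker than THn; bright-plane, brighter
    return plane <= th_n if dark else plane >= th_n
```

On a sub-block with pixel values `[[10, 20, 200, 210]]`, the dark-plane call marks the two dark pixels as foreground and the bright-plane call marks the two bright ones.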

(a) Original sub-block image

(b) Dark plane of the sub-block

(c) Histogram of image (b)

(d) Bi-level image after adaptive thresholding

Fig.17 An example of the text extraction algorithm (sub-block size=256×128)

Figure 17(a) shows the sub-block image xn(i,j), Figure 17(b) shows the dark plane of the sub-block, and Figure 17(c) shows the histogram of the sub-block image xn(i,j). The threshold value THn is biased to the left of the mean value m_fn to clarify the text. Figure 17(d) shows the resulting bi-level image.

D. Spreading and region growing for block extraction

The thresholding method yields the foreground pixels and noise pixels simultaneously. Accordingly, the isolated pixels are deleted, and the Constrained Run Length Algorithm (CRLA) [7] is applied to remove the noise pixels.

The CRLA is performed in the horizontal and vertical directions, yielding the binary images Mh(i,j) and Mv(i,j) (1≤i≤M, 1≤j≤N), respectively. Then, the "AND" operator is applied to Mh(i,j) and Mv(i,j) pixel by pixel, and a binary image Mhv(i,j), which merges the neighboring pixels in both directions, is obtained.

The CRLA and the logic operation together constitute the spreading process, after which the binary spreading image Mhv(i,j) is processed using the region growing method to arrange the foreground pixels into rectangular blocks. The region growing method is described in Chapter 2.

The processes of spreading and region growing yield the positions of the foreground blocks.
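A minimal sketch of the spreading step: one-dimensional CRLA fills short interior gaps between foreground pixels, and the horizontal and vertical results are combined with a pixelwise AND. The run-length constraint c is a free parameter not specified in the text:

```python
import numpy as np

def crla_1d(row, c):
    """Constrained Run Length Algorithm on one binary row: interior
    runs of 0s no longer than c, lying between 1s, are filled."""
    out = row.copy()
    n, i = len(row), 0
    while i < n:
        if row[i] == 0:
            j = i
            while j < n and row[j] == 0:
                j += 1
            # fill only interior gaps no longer than the constraint c
            if i > 0 and j < n and (j - i) <= c:
                out[i:j] = 1
            i = j
        else:
            i += 1
    return out

def spread(img, c):
    """Horizontal and vertical CRLA (Mh and Mv) followed by a
    pixelwise AND, yielding the merged binary image Mhv."""
    mh = np.array([crla_1d(r, c) for r in img])
    mv = np.array([crla_1d(col, c) for col in img.T]).T
    return mh & mv
```

For example, a single-pixel hole surrounded by foreground in both directions is filled, while a gap that closes in only one direction is removed again by the AND.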

E. Distinguishing text from foreground blocks

The blocks that contain foreground pixels are extracted following the spreading and growing processes. The blocks that contain text strings must now be identified. In this study, three parameters, transition pixel ratio, foreground pixel ratio, and block size, are used to identify these blocks.

The transition pixel ratio is defined as,

T = (number of transition pixels) / (area of the block).

A transition pixel occurs at the boundary of the foreground pixels. For example, in the following bi-level image,

0 0 0 1 1 0 1 0 0 1 1 1 0 0 0
▲ ▲ ▲ ▲ ▲

the marked pixels "▲" are the transition pixels, of which five are included. The foreground pixel ratio is defined as,

B = (number of foreground pixels) / (area of the block).

The transition pixel ratio reflects the complexity of the block, and the foreground pixel ratio reflects the density of the foreground pixels. The block size is defined as block width×block length. The conditions T ≤ 0.3, B ≤ 0.5, and 300 ≤ block size ≤ 30000 are set to extract the text from the foreground blocks.
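The three tests can be combined directly; counting the transition and foreground pixels of a block is assumed to be done beforehand:

```python
def is_text_block(n_transition, n_foreground, width, length):
    """Classify a foreground block as text using the three features of
    Section 3.2-E: transition pixel ratio T, foreground pixel ratio B,
    and block size, with the thresholds quoted in the text."""
    area = width * length
    t = n_transition / area   # boundary complexity of the block
    b = n_foreground / area   # density of the foreground pixels
    return t <= 0.3 and b <= 0.5 and 300 <= area <= 30000
```

For instance, a 40×20 block with 90 transition pixels and 250 foreground pixels satisfies all three conditions and is kept as a text block.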

3.3 Document image compression algorithm

As stated above, text and background images have different characteristics.

Traditional image compression algorithms, such as JPEG, are unsuitable for document images. JPEG’s use of local cosine transforms is based on the assumption that the high spatial frequency components in images can be removed without too much degradation of quality. While this assumption holds for most images of natural scenes, it does not hold for document images. A different compression method is required to code text accurately and efficiently to maximize its clarity. Text and the background image can be encoded by methods appropriate for bi-level and continuous-tone images, respectively.

The foreground/background representation was proposed in the ITU MRC/T.44 recommendation [31]. This prototype of document image representation is used in Xerox's XIFF image format, which presently uses CCITT-G4 to code the mask (text) layer and JPEG to code the foreground (text color) and background layers. However, the compression ratio of MRC/T.44 is insufficient for document images. Thus, this foreground/background representation is adopted, and two compression algorithms are proposed for compound document images. In this work, the pixels of text are extracted from a compound document image; the text plane is the mask layer.

Several gaps appear in the background image when pixels of text are extracted from it. The gaps are replaced by pixels with the average gray value of the neighboring pixels to improve the efficiency of compression. The foreground image is the color plane of the text. The color of the text can be obtained from the original image according to the position of the text. The pixels of the text are called used pixels, and the others are called unused pixels. Those unused pixels can be replaced by pixels of an appropriate color to enhance the compression. The color-filling algorithm is as follows.

Step 1. Mark the pixels in the mask as used pixels; the other pixels are unused pixels.

Step 2. Fill the gap with the color of the pixel which adjoins the used pixel, row by row. Mark the filled pixels as used pixels.

Step 3. Fill the gap with the color of the pixel next to the used pixel, column by column. Mark the filled pixels as used pixels.

Step 4. Repeat the processes in Steps 2 and 3 until no unused pixels remain. The foreground plane is thus obtained.
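A sketch of Steps 1-4 on a single-channel foreground plane (a true-color plane would apply the same fill to each channel); the use of np.roll to propagate neighbor colors is an implementation choice, not from the text:

```python
import numpy as np

def fill_foreground(color, used):
    """Color filling, Steps 1-4: every unused pixel inherits the color
    of an adjacent used pixel, sweeping rows then columns, repeated
    until no unused pixels remain."""
    if not used.any():
        raise ValueError("mask must contain at least one used pixel")
    color, used = color.astype(float), used.copy()
    while not used.all():
        for axis in (1, 0):          # Step 2: rows; Step 3: columns
            for shift in (1, -1):    # propagate in both directions
                src = np.roll(used, shift, axis=axis)
                src_color = np.roll(color, shift, axis=axis)
                # cancel the wrap-around introduced by np.roll
                if axis == 1:
                    src[:, 0 if shift == 1 else -1] = False
                else:
                    src[0 if shift == 1 else -1, :] = False
                new = src & ~used
                color[new] = src_color[new]
                used |= new          # filled pixels become used pixels
    return color
```

Starting from a plane whose only used pixel has value 5, the fill propagates that value to every unused pixel, giving the smooth foreground plane that compresses well.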

Different planes are compressed using different compression methods.

(1) Mask plane: The text pixels, also called the mask, are represented by a single bit-plane. This bit-plane uses “1” to represent a text pixel and “0” to represent a background pixel. The text pixels are coded using JB2, which is a variation of AT&T’s proposal for the JBIG2 fax standard.

(2) Foreground plane: The text’s color, also called the foreground, is represented in a color plane. Neighboring text characters generally have identical color, so the color plane contains large areas of contiguous pixels of almost the same color. This color plane is coded using a wavelet-based compression algorithm [3].

(3) Background plane: The background image is coded by the same algorithm as that to code the foreground image.

[Foreground]: wavelet-based coding
[Mask]: JB2
[Background]: wavelet-based coding

Fig.18 Document image compression format

Figure 18 depicts the compression format. This work proposes two compression algorithms, CSSP-I and CSSP-II, to compress compound document images. Each component of the encoder is described below.

A. Method of compressing the mask plane

The mask image compression method uses JB2 [29], AT&T's proposed algorithm for the JBIG2 standard [30] for compressing fax and bi-level images. JB2 provides better compression than JBIG1 [32], which is often used for faxes, in both lossless and lossy compression of arbitrarily scanned images with scanning resolutions from 100 to 800 dpi. JB2 uses information from previously encountered characters and does not risk introducing the character substitution errors that are inherent in the use of OCR. The JB2 method has been proven to be approximately 20% more efficient than the JBIG1 standard in the lossless compression of bi-level images. By running JB2 in a controlled-loss mode, this algorithm yields a compression ratio of about two to four times that of the JBIG1 method. In lossy mode, JB2 is four to eight times better than CCITT-G4 (which is lossless); it is also four to eight times better than GIF.

B. Method for compressing foreground/background plane

The combination of the discrete wavelet transform and zerotree coding [3],[4],[33] has been proposed to compress a pure image with a high compression ratio. Such coding algorithms provide good image quality. This study uses the embedded zerotree wavelet (EZW) coding algorithm [3] to compress the foreground/background images.

These compression methods are applied to mask, foreground and background images. The document image compression algorithm mentioned earlier is called compression algorithm CSSP-I.

Compression algorithm CSSP-I can extract the text plane from an overlapping background image and compress it using JB2. The compressed data thus obtained are approximately 50% of all the compressed data. Therefore, compression algorithm CSSP-II is proposed to improve the compression ratio.

Compression algorithm CSSP-II uses a downsampling method to reduce the data of the original document images. The downsampling method replaces each 2×2 pixel block by its mean value. This process of reducing the number of pixels is called downsampling. The size of the image is thus diminished to a quarter of the original.

The text segmentation algorithm then extracts the text plane. The processing time and the size of the text plane are reduced because the image is a quarter of its original size. After the segmentation algorithm is applied, the full-size background and the quarter-size foreground are compressed using the wavelet-based compression algorithm, and the quarter-size mask is compressed using JB2.

In the decompression phase, the quarter-size mask is enlarged by upsampling. The upsampling method expands each pixel in the mask into a 2×2 block. After upsampling, the text looks thinner than the original, so the characters are expanded by one pixel around their boundaries.
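The 2×2 sampling used by CSSP-II can be sketched directly (even image dimensions are assumed):

```python
import numpy as np

def downsample(img):
    """Replace each 2x2 block by its mean, quartering the image (CSSP-II)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(mask):
    """Expand each mask pixel into a 2x2 block (decompression phase)."""
    return mask.repeat(2, axis=0).repeat(2, axis=1)
```

For example, the 2×2 block [[1, 3], [5, 7]] downsamples to the single value 4, and upsampling a one-pixel mask restores a 2×2 block; the one-pixel boundary expansion of the characters mentioned above would follow as a separate dilation step.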

3.4 Experimental results

The proposed algorithms were simulated in Windows 2000 (Pentium III 700, 128 MB RAM) using programs written in the C++ language. A 24-bit true color image and 200 dpi processing were used. Each pixel in a 24-bit true color image is characterized by

