複雜型複合式文件影像壓縮方法之研究

(1)

國

立

交

通

大

學

電機與控制工程學系

博士

論

文

複雜型複合式文件影像壓縮方法之研究

THE STUDY OF THE COMPRESSION ALGORITHMS FOR

COMPLEX COMPOUND DOCUMENT IMAGES

研究生：瞿忠正

指導教授：吳炳飛教授

(2)

複雜型複合式文件影像壓縮方法之研究

THE STUDY OF THE COMPRESSION ALGORITHMS FOR

COMPLEX COMPOUND DOCUMENT IMAGES

研究生：瞿忠正 Student：Chung-Cheng Chiu

指導教授：吳炳飛 Advisor：Bing-Fei Wu

國立交通大學

電機與控制工程學系

博士論文

A Dissertation

Submitted to Department of Electrical and Control Engineering College of Electrical Engineering and Computer Science

National Chiao Tung University In partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy

in

Electrical and Control Engineering

December 2004

Hsinchu, Taiwan, Republic of China

(3)

複雜型複合式文件影像壓縮方法之研究

學生：瞿忠正指導教授：吳炳飛教授

國立交通大學電機與控制工程學系博士班

摘要

由於複合式文件影像中包含許多文字資訊，當文件影像以傳統壓縮方法壓縮時，文字資訊會產生大量的失真，文字和一些屬於高頻的資訊會變的模糊。所以，傳統壓縮方法並不適合拿來直接對複合式文件影像作壓縮處理，同時，壓縮後文件影像中的文字，也無法容易的被電腦辨識或被我們閱讀。因為文件中文字資訊的重要性，所以文件影像的文字切割技術已經發展了十多年，但是針對複合式文件影像的研究，仍是一個新鮮的研究課題。目前已有許多學者針對複合式文件影像研究文字切割的方法，但是這些方法依然不能適用於目前報章雜誌上圖文交疊、背景變化多端的複雜型複合式文件影像。像這類複雜型複合式文件影像的文字切割技術，可以說是文件影像處理的一大挑戰。如果可以從不同複雜程度的影像中，將文字切割出來，那就可以適用於所有的文件影像處理。本篇論文研究目標就是發展出一種可以解決複雜型複合式文件影像的文字切割方法，使文件影像壓縮可以達到更高的壓縮倍數與視覺品質。本篇論文提出三個文字切割的方法，這三個方法所處理的複合式文件影像難度依章節順序增高。本文中提出的切割方法應用於文件影像壓縮，可以明顯的看出壓縮倍

(4)

數與視覺品質優於 JPEG 或 DjVu，而且在本文第三個切割方法(MLSM)中，提出新的

區域性區塊特徵分離與拼圖式全圖整合的方法，在解決複雜型複合式文件影像的文字

切割問題時，即使在同一張完整的文件影像中，包含各種不同程度的複雜狀況，也可

以順利的將不同顏色、不同複雜背景與不同交疊程度的文字切割出來，提高各種複雜

(5)

THE STUDY OF THE COMPRESSION ALGORITHMS

FOR COMPLEX COMPOUND DOCUMENT IMAGES

Student：Chung-Cheng Chiu Advisor：Prof. Bing-Fei Wu

Department of Electrical and Control Engineering

National Chiao Tung University

ABSTRACT

Traditional image compression methods are not suitable for compound document

images because such images include much text. These image data are high-frequency

components, many of which are lost in compression. Text and the high-frequency

components thus become blurred. Then, the text cannot be recognized easily by the human

eye or a computer. The text contains most information, separating the text from a

compound document image is one of the most significant areas of research into document

images. Document image segmentation, which separates the text from the monochromatic

background, has been studied for over ten years. Segmenting compound document images

is still an open research field. Many techniques have been developed to segment document

images. However, they are insufficient when the background includes sharply varying

contours or overlaps with text. Finding a text segmentation method of complex compound

documents remains a great challenge and the research field is still young. This dissertation

presents three segmentation algorithms for compressing image documents, with a high

(6)

proposed algorithms greatly outperform the famous image compression methods, JPEG and

DjVu, and enable the effective extraction of the text from a complex background, achieving

(7)

ACKNOWLEDGEMENTS

第一次見到恩師吳炳飛教授，就深深被教授的獨特氣質與淵博的學養所吸引，當下就決定進到 CSSP 實驗室展開人生一段非常重要的學術研究訓練。這一段學習的過程當然是多采多姿，研究的工作雖然辛苦，但是絕不會累!因為一路上有教授幫忙排除萬難；研究的路途雖然遙遠，但是決不孤獨!因為一路上有許多實驗室的夥伴陪伴。能順利完成博士學位，需要感謝的人很多，首先要感謝的就是恩師吳炳飛教授與師母，感謝教授為我開啟一扇智慧之門，讓我未來可以走的更遠更平順；感謝師母的協助，讓教授能有更多的時間來指導我們。感謝強哥與旭哥在我研究的過程一路陪伴，真懷念我們一起在 CSSP 實驗室並肩熬夜打拼的日子；尤其要感謝陳彥霖學弟，他從碩士班開始就跟著我做研究，並直攻到博士班，一路陪伴我完成博士學業；感謝實驗室所有學弟、妹們，與各位合作研究的經驗，是我這一生重要的回憶。感謝口試委員宋開泰教授、張志永教授、賈叢林教授、陸儀斌教授與蘇崇彥教授，給予我在研究上許多寶貴的意見。更要感謝我的父親瞿梓庭先生與母親羅金妹女士，從小就給我一個模範的榜樣，感謝父親嚴厲的身教與言教，讓我能掌握人生的方向，謝謝您們！最後要感謝的是我的妻子李明珍小姐，由於研究工作的繁重，這段時間感謝妳的陪伴與支持，兩個頑皮小子旭民與瑞宏也照顧的很出色，讓我無後顧之憂的將研究工作圓滿完成，謹將此成果與妳分享。忠正於國立交通大學電機與控制工程學系 CSSP 實驗室 2004/12/14

(8)

ABSTRACT (Chinese) ……….……….………. ii ABSTRACT (English) ………..…. iv ACKNOWLEDGEMENTS ……….……….………. vi CONTENTS ……….……….………. vii LIST OF TABLES ……….……….………. ix LIST OF FIGURES ……….……….………. x 1. INTRODUCTION ……….………..………..……. 1 1.1 Motivation……….……….. 1

1.2 Organization of the dissertation……….. 2

2. THE FUZZY-BASED TEXT SEGMENTATION METHOD ...………..………….. 7

2.1 Introduction ……… 8

2.2 The characteristics of coefficients in wavelet transform………. 9

2.3 Fuzzy picture-text segmentation algorithm………..……... 12

2.4 Color document images compression method………. 27

2.5 Experimental results……… 31

2.6 Concluding remarks……… 32

3. THE COMPRESSION ALGORITHMS FOR COMPOUND DOCUMENT IMAGES WITH LARGE TEXT/BACKGROUND OVERLAP……….…. 35

(9)

3.2 Text segmentation algorithm………...… 38

3.3 Document image compression algorithm……… 48

3.4 Experimental results……….………... 53

3.5 Concluding remarks……… 55

4. THE MULTI-LAYER SEGMENTATION METHOD FOR COMPLEX DOCUMENT IMAGES………... 68

4.1 Introduction………. 69

4.2 Multi-layer segmentation method………..……. 73

4.2.1 Block-based clustering algorithm………. 77

4.2.2 Jigsaw-puzzle layer construction algorithm………... 81

4.3 Text extraction algorithm……….. 97

4.4 Experimental results and discussions……… 104

4.5 Concluding remarks………... 108

5. CONCLUSIONS AND PERSPECTIVE……….. 116

REFERENCE……….. 120

VITA………. 125

PUBLICATION LIST………. 127

(10)

LIST OF TABLES

Table 1 The dynamic range of UV-plane coefficient……… 20

Table 2 The fractal dimension influenced by the resolution of images……… 23

Table 3 Comparison the compression ratio and PSNR for the proposed methods to JPEG & DjVu...

(11)

LIST OF FIGURES

Fig.1 Image after 2-level discrete wavelet transformation.……….... 10

Fig.2 The flowchart of Fuzzy picture-text Segmentation algorithm………….. 13

Fig.3 The influence of threshold………... 14

Fig.4 Example of CRLA and region growing……… 17

Fig.5 The edge projection of text components in high frequency band………. 18

Fig.6 Two kinds of projection histogram………... 19

Fig.7 Images with text/picture component………. 23

Fig.8 The characteristics of three parameters……… 25

Fig.9 The membership functions of three parameters……… 25

Fig.10 Fuzzy Rule Table……… 26

Fig.11 Possible fuzzy quantization by triangle-sharp fuzzy number…………. 27

Fig.12 The flowchart of proposed compression algorithm……… 28

Fig.13(a) The original image(512×1024, scanned by ScanMaker V600 at 200dpi)………... 33

Fig.13(b) JPEG image(CR=104.1)……… 33

(12)

Fig.14(a) The original image(1024×512, scanned at 200dpi)……… 34

Fig.14(b) JPEG image(CR=83.1)………... 34

Fig.14(c) Proposed method(CR=84.31)………. 34

Fig.15 The original full image (200 dpi)……… 39

Fig.16 The planes after clustering algorithm………….……… 40

Fig.17 An example of text extraction algorithm (size=256×128)……….. 46

Fig.18 Document image compression format……… 50

Fig.19 Segmentation images of proposed algorithm CSSP-I……… 59

Fig.20 Compared with proposed algorithm CSSP-I & JPEG……… 61

Fig.21 Processed images of the DjVu……… 64

Fig.22 Processed images of the CSSP-II………... 66

Fig.23 An example of the results after the block-based clustering algorithm... 75

Fig.24 Flowchart of the decision procedure to construct or extend an object layer……… 89

Fig.25 An example of the MLSM (image size=1361x1333)………. 96

Fig.26 The example of the text extraction algorithm of the Fig.25(d)……….. 101

(13)

Fig.28 Test image 2 (image size=1829x2330)………... 111

Fig.29 Test image 3 (image size=2462x3250)………... 112

(14)

CHAPTER 1 INTRODUCTION

1.1 Motivation

As the electronic storage, retrieval and transmission of documents become faster

and cheaper, documents are becoming increasingly digitized. The image size of a

magazine page at 300 dpi is 3300 pixels high and 2500 pixels wide, it occupies about

25 Mbytes of memory in uncompressed form. The volume of data greatly prolongs

the transmission time and makes the storage cost high. Therefore, the document

images should be compressed before transmission or storage.

A typical digital image encoder initially converts the input image data into

coefficients by means of one of the transform procedures, such as DCT and FFT. The

obtained coefficients are then encoded using scalar or vector quantization followed by

one of entropy-based coders which is a Huffman coder in a majority of cases. The

combination of discrete wavelet transform [1],[2] and zerotree coding [3],[4] was

proposed to compress a nature image.

However, using those methods on color document images as advertisements

(15)

characteristics of text and pictures are different, it is not suitable to compress them by

the same method like JPEG [5] or discrete wavelet transform. Digitized images of

printed documents typically consist of a mixture of texts, pictures, and graphics

elements, which have to be separated for further processing and efficient

representation. Because text captures the most information, how to segment the text

from printed document images becomes an important step in document analysis.

So far, there are more and more documents printed with gorgeous styles such as

various color texts and background objects. They must be segmented from an image

to facilitate further processing. Therefore, many researchers have developed valuable

segmentation techniques for applications that include document analysis, image

segmentation, image compression, and pattern recognition.

1.2 Organization of the dissertation

In this dissertation, three segmentation methods for document images proposed

to extract texts from compound document images.

In the Chapter 2, a compression method for color document images based on the

wavelet transform and fuzzy picture-text segmentation is presented. This approach

addresses a fuzzy picture-text segmentation method, which separates pictures and

texts by using wavelet coefficients from color document images. Two components,

(16)

algorithms.

The fuzzy picture-text segmentation method separates the text and picture from

the monochromatic background. However, the rapid development of multimedia

technology has led to increasing numbers of real-life documents, including stylistic

text strings with decorated objects and colorful, slowly or highly varying background

components. These documents overlap the text strings with the background images.

Therefore, the fuzzy picture-text segmentation method cannot effectively segment all

important objects. It is insufficient when the background includes sharply varying

contours or overlaps with text.

Therefore, Chapter 3 proposes a new segmentation algorithm for separating text

from a document image with a complex background. However, the image of text

cannot easily be directly separated from the background image because the difference

between the gray values is too small. Therefore, two phases are used to accomplish

the desired purpose. In the first phase, which involves color transformation and

clustering analysis, the monochromatic document image is partitioned into three

planes, the dark plane, the medium plane, and the bright plane. The color of the text is

almost all the same, so the variance of the text’s grayscale is small. Therefore, all the

text can be grouped in the same plane. When the text is black, the text and some of the

background with a gray value close to that of the text is put in the dark plane. In

(17)

noise are coarsely separated from the background. In the second phase, an adaptive

threshold is determined to refine the text by adaptive binarization and block extraction.

Then, two compression algorithms that yield a high compression ratio are also

proposed.

The segmentation algorithm focuses on processing the images whose texts are

overlap to the complex background. The study is powerful in extracting texts from

complex backgrounds. However, we can find many advertisements or magazines

whose background images contain many different cases including 1) monochromatic

background with/without texts, 2) slowly varying background with/without texts, 3)

highly varying background with/without texts and 4) complex varying background

with/without different color texts. It is hard to extract the texts when all of the cases

spread in a compound document image, especially. Furthermore, the color of texts

may be more than three. Therefore, the segmentation algorithm in the Chapter 3 may

be insufficient to extract the text from document images in all cases. The text

segmentation method of those complex images becomes a great challenge and still a

novel research field.

To conquer this challenge, we present a text segmentation algorithm for various

document images in Chapter 4. The proposed segmentation algorithm incorporating

with a new multi-layer segmentation method (MLSM) can separate the text from

(18)

overlap. This method solves various problems associated with the complexity of

background images.

The MLSM provides an effective method to extract objects from different

complex images. The complex image includes many different objects such as

difference color texts, figures, scenes and complex backgrounds. Those objects could

be overlapped or non-overlapped by each others. Because those objects have

different features, the image can be partitioned into many object-layers by means of

the features of objects embedded in it. Then the block-based clustering algorithm can

be performed on those layered image sub-blocks and cluster them to form several

object layers. Consequently, different text, non-text objects and background

components are segmented into separate object layers. The proposed method can

separate or objects from 8-bit grayscale or 24-bit true-color images, no matter the

objects overlap a simple, slowly or highly varying background. The block-based

clustering algorithm decomposes the sub-block image into different layered

sub-block images, LSBs, in the order of darkest to lightest corresponding to the

original sub-block image. In the jigsaw-puzzle layer construction algorithm, some

statistical and spatial features of adjacent LSBs are introduced to assemble all LSBs

of the same text paragraph or object.

(19)

segmented into several independent object layers for further extraction process. When

applied to real-life, complex document images, the proposed method can successfully

extract text strings with various colors and illuminations from overlaying non-text

objects or complex backgrounds, as determined experimentally. Experimental results

obtained using different document images scanned from book covers, advertisements,

brochures, and magazines reveal that the proposed algorithm can successfully

segment Chinese and English text strings from various backgrounds, regardless of

whether the texts are over a simple, slowly varying or rapidly varying background

(20)

CHAPTER 2 THE FUZZY-BASED TEXT SEGMENTATION

METHOD

This chapter presents a compression method for color document images based on

the wavelet transform and fuzzy picture-text segmentation. This approach addresses a

fuzzy picture-text segmentation method, which separates pictures and texts by using

wavelet coefficients from color document images. The number of colors, the ratio of

projection variance, and the fractal dimension are utilized to segment the pictures and

texts. By using the fuzzy characteristics of these parameters, a fuzzy rule is proposed

to achieve the purpose of picture-text image segmentation. Two components, text

strings and pictures, are generated and processed by different compression algorithms.

The picture components and the text components are encoded by zerotree wavelet

coding and by the modified run-length Huffman coding, respectively. Experimental

results have shown that the work has achieved promising performance on high

(21)

2.1 Introduction

Digitized images of printed documents typically consist of a mixture of texts,

pictures, and graphics elements, which have to be separated for further processing and

efficient representation. Because text captures the most information, how to segment

the text from printed document images becomes an important step in document

analysis. Accordingly, various techniques have been developed to segment document

images. Many approaches devoted to process monochrome document have been

proposed in the past years. Wahl et al. [6] designed a prototype system for document

analysis and a constrained run length algorithm (CRLA) for block segmentation.

Nagy et al. [7] presented an expert system with two tools: the X-Y tree and formal

block-labeling schema, to accomplish document analysis. Fletcher and Kasturi [8]

proposed a robust algorithm, which uses the Hough transform to group connected

components into local character strings, to separate text from mixed text/graphics

document images. Kamel and Zhao [9] presented two new extraction techniques: a

logical level technique and a mask-based subtraction technique. Tsai [10] proposed an

approach to automatic threshold selection using the moment-preserving principle.

Some other systems based on the prior knowledge of some statistical properties of

various blocks [11]-[15], or texture analyses [16],[17] have also been successively

developed. Those systems all focus on processing monochrome document. In contrast,

(22)

[19] presented a text string extraction algorithm, which uses the edge-detection

technique and text block identification to extract the text string. Haffner et al. [20]

proposed an image compression technique called “DjVu” that is specially geared

toward the compression of document image in color.

In this chapter, we present a compression method for color document images by

using a new fuzzy picture-text segmentation algorithm. This proposed segmentation

algorithm separates the text from color document images in frequency domain by

using the coefficients derived from the discrete wavelet transform. The coefficients

are used to separate the text/picture components by using the fuzzy classification

method. Then, the coefficients of picture components are encoded by the coding

method of zerotree, and Modified Run-Length Huffman Code (MRLHC) encodes the

coefficients of text components.

2.2 The characteristics of coefficients in wavelet transform

The basic theory of the wavelet transform is to represent any arbitrary function f

as a superposition of wavelets. After the first level of two-dimensional discrete

wavelet transform, the image was arranged placing the lowest-frequency band in the

left upper corner, the highest-frequency band in the right down corner, and the

middle-frequency band in the right upper corner and left down corner. The

(23)

bands, therefore, making the image decomposed into the second level of wavelet

transform, and then seven frequency bands as Fig.1 are obtained. The coefficients of

seven bands obtained from the original image by wavelet transform are treated as the

textures of different frequency.

Fig.1 Image after 2-level discrete wavelet transformation.

In Fig.1, the LL2 band has the coefficients of the lowest-frequency. LHi, HLi

and HHi (i=1, 2) bands indicate the edge information in the original images. The LL2

band is very similar to the original image but with only the size of 1/16. The cost of

calculation can be reduced by using the LL2 band for picture-text segmentation. For

picture components, the signals in LH1, HL1 and HH1 bands are not sensitive to

human eyes, and they could be directly discarded. However, for text components,

those frequency bands include prominent edge information which should be coded to

preserve the text information.

After the wavelet transform, the coefficients extracted from text components LL2 LH2

HL1 HL2 HH2

LH1

(24)

appear more edge information than the coefficients from picture components. That is,

the coefficients of text components show higher frequency characteristics than the

coefficients of picture components. As such, the characteristics of wavelet transform

coefficients are different between text components and picture components. Therefore,

the edge feature is a good parameter for segmenting picture-text components.

The number of colors, also a useful feature, can be used for color-document

image segmentation. Because the coefficients of LL2 band are very similar to the

original image after wavelet transform, the color number can be obtained by counting

the color number of UV-plane from the coefficients of LL2 band. However, it is

difficult to obtain the number of color from the color document images directly. In

this chapter, a new algorithm is proposed to extract the number of colors from the

coefficients of LL2 band.

The fractal dimension indicates the complexity of images. In addition, it is found that the fractal dimension [22] between text components and picture components is quit different. Because the picture components are more complicated than the text components, the picture components have higher fractal dimension than the text components. The fractal dimension in original image and in low-resolution image is similar. Therefore, the coefficients of LL2 band are applied only to obtain the fractal dimension from text components and picture components. In this way, the processing time to compute fractal dimension can be reduced.

(25)

2.3 Fuzzy picture-text segmentation algorithm

As mentioned in previous section, the new document image segmentation utilizes

the coefficients from wavelet transform to extract the features of picture components

and text components which can be further separated by extracted features using a

fuzzy algorithm. We use the technique of spreading and region growing to mark all

the foreground blocks including picture-images, text-images and other kinds of

images on LL2 band, and these blocks are then segmented to text components or

picture components (non-text components).

The segmentation method uses color number, the energy of edge projection, and

fractal dimension to segment foreground blocks. The reasons why we use those three

parameters to perform segmentation are listed as fellows：

(1)Color number: Since picture components are more colorful than text components,

color number can distinguish them.

(2)The energy of edge projection: It shows the distribution of edge projection in a

block. In general, the variation of edge projection is regular in text components, and

irregular in picture components.

(3)Fractal dimension (FD): It shows the complexity of images. In most cases, the

pixels in text components distribute more uniformly and the fractal dimension is

(26)

The three parameters, color number, the energy of edge projection and fractal

dimension, are used in the same time to reduce misjudgment. For example, if the

reliability of the three parameters are 9/10, 19/20 and 4/5, the decision error of

picture-text segmentation would diminish to 1/1000 (1/10×1/20×1/5) when we consider the three parameters in the same time appropriately. The Fuzzy Rule

calculation [23] is very suitable to analyze this kind of variables. Therefore, we

propose a fuzzy picture-text segmentation algorithm to separate text components and

picture components from color document images.

Fig.2 The flowchart of fuzzy picture-text segmentation algorithm. W avelet coefficient of layer 2 Sub-band of LL2 Sub-band of LH2, HL2,HH2 Spreading Region growing The position of foreground Fractal Dimension Calculation Color Number Calculation LEPVR Calculation Fuzzifiier Inference Engine Defuzzifier Fuzzy Rule Table The position of picture and text

(27)

The flowchart of the algorithm is shown in Fig.2. Details of the algorithm are

explained in the following subsections.

A. Spreading and region growing for blocks extraction

The coefficients of LL2 band are used to perform block extraction. The

proposed block extraction method is to divide the foreground of document images

into text components and picture components. Before the process, we need to convert

the coefficients of LL2 band into bi-level data, and use the thresholding method to

decide the location of foreground and background. However, the pixel numbers of

background are more than foreground's. In order to make the boundary of foreground

more obvious, the algorithm uses a threshold value from the mean value. The

threshold value is (Mean－Variance). We can realize the influence of bias in Fig.3.

Fig.3 The influence of threshold.

Original Gray-Scale Image

Seperate Foreground and Background by Mean

Seperate Foreground and Background by

(28)

The pixels of foreground and noise will be all extracted by a thresholding

method in the same time. Therefore, we use the Constrained Run Length Algorithm

(CRLA) to remove noise pixels. The algorithm was proposed by Wahl et al. [6] to

preserve the pixel when it comes from the valid continuous pixels. For example, there

is a binary string, 11001000001000011, with a constraint C=4 to the run length of 0s,

if the number of consecutive 0s is less than or equal to C, these 0s must be replaced

with 1s; otherwise, they are reserved. As a result, the above binary string is converted

into the sequence, 11111000001111111. Some noises are eliminated by the method.

The CRLA is performed in horizontal and vertical directions, and the bi-level

images, "Mv" and "Mh", are obtained, respectively. Then, we apply the "OR" operator

on Mv and Mh pixel by pixel, and get a bi-level spreading image, Mhv, which merges

the neighboring pixels in both direction.

Therefore, the methods of thresholding, CRLA and logic operation are called

the spreading process. After the spreading process, the bi-level spreading image Mhv

is processed by the region growing method to gather the foreground pixels into

rectangle blocks. The steps of region growing method are described below:

Step 1. Collect the foreground pixels of image Mhv row by row.

Step 2. Compare the foreground pixels collected from Step 1 with the current

(29)

blocks, the foreground pixels and blocks are merged into the same block.

If there is no overlap, make a new block for the foreground pixels.

Step 3. After region growing, every block will be checked. If the block is

neither growing bigger nor being a new one, stop the block's growing

and regard it as an isolated block.

Step 4. Check if there is any overlap between blocks or not. Merging the

overlapping blocks into the same block.

Step 5. If Mhv comes to the last row, then go to Step 6; if not, return to Step 1.

Step 6. Change all existing blocks into isolated blocks. If there is any overlap

between blocks, merge the overlapping blocks into the same block.

Step 7. Delete those smaller noise blocks.

Step 8. The end.

After the processes of spreading and region growing, we got all foreground

blocks of the image Mhv. An example is shown in Fig.4.

We can calculate the local edge projection variance ratio, the color number, and

(30)

(a) Original document image.(200 dpi, image size=768×256)

(b) The binary image of sub-band LL2.(image size=384×128)

(c) The binary image of sub-band LL2 after CRLA and region growing. (image size=384×128)

Fig.4 Example of CRLA and region growing.

Those calculating methods are described as follows.

B. The calculation of local edge projection variance ratio

It is assumed that texts are written in horizontal or vertical direction. When the

edge information is projected toward the vertical direction of text strings, the

projection histogram would variation regularly. In addition, the projection magnitudes

(31)

information is not projected on the vertical direction of text strings, the variation of

the projection histogram will be irregularly. This property can be used to decide the

direction of text strings. The horizontal or vertical edge projection is used to decide

the direction of text strings.

Furthermore, since the variation of edge projection is different between text

components and picture components, it can be applied to distinguish text components

or picture components from foreground blocks. After discrete wavelet transform, the

edge projection in high frequency bands (LH, HL, and HH) is more obvious than the

one in low frequency band (LL). Therefore, the edge projection is calculated from the

binary image which combines the binary images of high frequency bands (LH, HL,

and HH) using logical OR operator. Fig.5 shows the vertical and horizontal edge

projection of text-image in high frequency band.

Fig.5 The edge projections of text components in high frequency band. H orizontal P ro jection histogram V ertical P rojection histo gram

(32)

The variation of edge projection is regular in text components, and irregular in

picture components. The edge projection variance ratio is defined by

Edge projection variance ration (EPVR)= ×

∑

−

i 2 ) ) i ( ( 1 Mean P Mean (1-1)

in which P(i) is the magnitude of ith projection and Mean is the average of the projection histogram.

The edge projection variance ratio shows the variation in projection histogram.

By considering those two histograms shown in Fig.6, the left one is the projection

histogram of text component and the right one is the projection histogram of picture

component. We find these two projection histograms share the same EPVR. However,

only the left diagram reveals the property of text.

(a)The projection of text component (b)The projection of picture component

Fig.6 Two kinds of projection histogram.

So we modify the equation (1-1) below:

Local edge projection variance ratio (LEPVR)= ×

∑

−

i 2 ) (i) ) i ( ( 1 LMean P Mean (1-2) mean Projection histogram mean Projection histogram

(33)

, where P(i) is the magnitude of ith projection, Mean is the average of projection, and , (k) 1 = (i) 1 -/2 + i /2 -i

∑

× N N P N

LMean N is the width of the sliding window.

LEPVR uses the local mean to compute the local standard deviation used to

substitute the global one. Therefore, the LEPVR of text components is larger than that

of picture components.

C. The calculation of color number

In general, the RGB-plane of a color document image is transformed to the

YUV-plane before further processing. Because all color information can be collected

in the UV-plane, the color number will be calculated in the UV-plane of LL2 band.

Because the wavelet transform 9/7 filter is adopted in this work, the dynamic range of

UV-coefficients, shown in Table 1, would be amplified 4 times after the wavelet

transform.

Table 1. The dynamic range of UV-plane coefficient

Original dynamic

range

First level Second level

U -plane ±112 ±448 ±1792

V-plane ±158 ±632 ±2528

In Table 1, the coefficients in second-level wavelet transform could be enlarged

to 16 times at maximum, but it would not achieve the maximum value in most cases.

(34)

Therefore, minimizing 8 times is more reasonable in current situation.

The extraction of color number from foreground blocks is performed as

follows:

Step 1. Calculating the histogram of UV-plane.

Step 2. Set LB as the threshold value, calculate the effective color range

(ColorRange) whose pixel number is larger than LB.

, , , 3 100 10 ⎪ ⎩ ⎪ ⎨ ⎧ = TotalPixel LB Otherwise TotalPixel If TotalPixel If 1000 400 1000 < ≤ ≤ (1-3)

Step 3. Define the effective mean value (MeanPixel) of pixels as:

ColorRange TotalPixel MeanPixel=

(1-4) Step 4. Set 2 MeanPixel

as the threshold value. Calculate the number of colors

whose peak values of histogram are larger than the threshold value.

This number of colors is called color number in UV-plane.

We can calculate the color numbers of A and B in UV-plane. We define the value

of A×B to be the color number of the foreground block.

D. The calculation of fractal dimension

The fractal dimension indicates the complexity of images. The more complex

image is, the larger the fractal dimensions. However, images are two-dimensional, so

(35)

There are many ways to calculate the fractal dimension (FD). Here, we use the

Two Dimension Box Counting Method [25] which is easier and faster to calculate

compared to others. The steps of Two Dimension Box Counting Method are listed

below:

Step 1. Assume all boxes are square. Set the minimum value (Boxmin) and the maximum value (Boxmax) of box, and let Len equal to the width of

Boxmin. To partition the image into blocks, the width of block is the value of Len.

Step 2. From the upper left side of image, search whether there exists any foreground pixel in the box whose width is the value of Len. If yes, plus 1 to the counter (Count(Len)). Apply this method to the whole image from left to right and up to down. After the process, the number of boxes contained the foreground pixel is obtained.

Step 3. If Len ≦ Boxmax, Len is increased by 1, and go to Step 2, else go to Step 4.

Step 4. Draw the logarithm figure of

Len

1

and Count(Len).

Step 5. Use the Least Square Error Method to get the line which is the nearest to the curve of the logarithm figure in step 4. Then define the slope of the curve as the fractal dimension of the image.

Theoretically, fractal dimension would not change in any scale, meaning that

the characteristics of fractal dimension will be the same when enlarge or reduce the

size of image. However, the property will be slightly changed because of the limited

(36)

lower down the resolution of image. Fig.7 illustrates the images with text component

and picture component. The FD difference between the original image and the image

in LL2 band is shown in Table 2.

(a) picture component (b) text component

Fig.7 Images with text/picture component.

Table 2. The fractal dimension influenced by the resolution of images. FD of original image FD of LL2 image

Fig.7 (a) 1.809 1.861

Fig.7 (b) 1.24 1.178

There are two information concluded from Table 2.

(1) The FD in original image is very close to the FD in the low-resolution image. The variation range is between ±0.1.

(2) FD in text components is smaller than in picture components.

The information (1) confirms that we can calculate FD in LL2 band to replace FD

in original image. The information (2) shows the difference of FD in picture and text

(37)

E. Fuzzy logic decision system

We can use the local edge projection variance ratio, color number, and fractal

dimension to segment the foreground blocks to picture components or text

components. In some cases, the local edge projection variance ratio, color number,

and fractal dimension of text components would show the same characteristics as

those of picture components do. In other words, those parameters have the fuzzy

characteristic. Therefore, we can use the fuzzy theorem to classify the foreground

blocks.

In general, picture components and text components have three characteristics:

(1) If the local-edge projection variance ratio (LEPVR) is large, it is likely to be text

components.

(2) If the color number is large, it is likely to be picture components.

(3) If the FD is large, it is likely to be picture components.

The three characteristics are showed in Fig.8. The fuzzy membership functions

shown in Fig.9 are used for the testing of many images and the recognition of human

(38)

Fig.8 The characteristics of three parameters.

Fig.9 The membership functions of three parameters.

(S:small; CS:close to small; CL:close to large; L:large)

By using Hamming distance, and the number of reliability of three parameters,

we set a Fuzzy Rule Table (4×4×4 cubic) as shown in Fig.10. First, filling into 0 (the

most likely be picture components) in one edge of this cubic block, then fill into 18

(the most likely be text components) in the opposite position of 0. From 0 to 18, no

matter which the route is, the Hamming distance is equal to 9. Then, filling number

based on equal Hamming distance, we got an initial fuzzy rule table where the three

parameters maintain the same degree of importance.

S CS CL L 3 5 7 9 LEPVR S CS CL L 1.3 1.5 1.7 1.9 FD S CS CL L 40 70 100 130 Color number LEPVR large small picture text

Color number small large

picture text

FD small

large

(39)

(a) (b)

(c) (d) Fig.10 The fuzzy rule table.

In the fuzzy rule table, the large number is more likely to be the text components.

On the contrary, the small number is more likely to be the picture components. Even

though the text components, which contain only a color in the same text component, it

would not include more color number than picture components do. Therefore, we can

emphasize the weighting of “color number is large” for picture components. However,

for gray or single color images, since there is only one color, this information (color

number) is not reliable for text components and picture components. Under this

circumstance, it is needed to rely on LEPVR and FD, so we need to put more

emphasis on the weighting of these two parameters for gray or single color images.

Color Numbers Color Numbers

LEPVR small large small 6 10 16 18 4 8 14 16 4 6 8 10 large 2 3 5 7 FD is small LEPVR small large small 5 10 14 16 3 7 12 14 3 5 7 9 large 1 3 4 6 FD is close to small

Color Numbers Color Numbers

LEPVR small large small 4 8 12 14 2 6 10 12 2 4 6 8 large 0 2 2 4 FD is close to large LEPVR small large small 3 6 10 12 2 5 8 10 1 4 6 8 larg 多 0 1 1 1 FD is large

(40)

Through these methods discussed above, we design the fuzzy rule table in Fig.10. The

corresponding function in every rule is shown in Fig.11.

Fig.11 Possible fuzzy quantization by triangular-shaped fuzzy numbers

A number of defuzzification methods leading to distinct results were proposed in

many literatures. Each method is based on some rationale. In this method, the center

of area method [23] is selected to define the defuzzified value. The defuzzified value

is set 100： if the defuzzified value of foreground blocks is smaller than 100, it

belongs to picture components; and if the defuzzified value of foreground blocks is

larger or equal to 100, then it belongs to text components.

2.4 Color document images compression method

After fuzzy picture-text segmentation algorithm is applied, the document images

are classified into text components and picture components. In this section, zerotree C1 1 C0 C2 C17 C18 10 20 C5 60 190 180 Possibilit y 30

(41)

coding method is used to compress the coefficients of wavelet transform for picture

components, but the coefficients of text components need to be removed. For

text-components, we have to extract the colors of text from the original document

image. Each color of text components will form a single color plane. The color plane

is compressed by the Modified Huffman code.

However, the run-length of blank pixels between text-lines can be very long, and

may reduce the coding efficiency of Huffman code. To solve this problem, we use

Modified Run-length code to deal with the long run-length, and Modified Huffman

code codes the short run-length. The flowchart of proposed compression algorithm is

shown in Fig.12.

Fig.12 The flowchart of proposed compression algorithm.

The coefficients after wavelet transform of layer 2

Picture and Text Segmentation

Text components

Picture components

separate foreground and background

extract the color of background

quantize the color in

Text-plane

coding each single-color plane by MRHC clear text's coefficients in picture components continue wavelet transform of layer 3,4... coding the coefficients by zerotree

(42)

A. Color quantization in text components

For the text components, they are separated into several single-color planes.

Therefore, we have to decide the number of colors in the text components by

calculating the foreground histogram (His[i]) from the UV-plane. The steps are listed

as follow:

Step 1. Let His[i] pass the low-pass filter, and set the result as Fhis[i].

Step 2. Find the maximum values of Fhis[i].

Step 3. Get rid of the maximum value of Fhis[i] which is smaller than

TH_LowBound.

Step 4. Set a threshold of duration (TH_UV), and calculate the local-maximum

between ±TH_UV.

In Step 1, the purpose of low-pass filter is to eliminate high frequency noise. In

Step 2, all the peak values in FHis[i] can be found, and the smaller peaks in Step 3 are

deleted. In Step 4, we can obtain the major colors of text components and use the

color information to segment the text component into many single-color planes by

difference colors. The meaning of TH_UV is the minimum difference of colors that

human eyes are able to discern. Here the value of TH_UV is set to 15 and

(43)

B. Using Modified Run-Length Huffman Code (MRLHC) to encode text

The text components are segmented into several single-color planes. Those

planes are compressed by Modified Run-Length Huffman Code.

Modified Run-Length Huffman Code is especially designed to code the long

run, so it is expected that the code has the ability to handle extreme long runs. The

algorithm is designed as follows:

Run-Length=1728×Multi-code + 64×Makeup code + Terminate code

'27' 26' ~ '1 only 1728) (2 ~ 1728 1727 ~ 64 63 ~ 0 length -Run 16 code Length Run Modified code Makeup code Terminate code Makeup code Terminate + + ⎪ ⎩ ⎪ ⎨ ⎧ × =

Modified Run Length code = Continue-code + Multi-code + Makeup code +

Terminate code.

The first four bits are Continue-code which indicates the number of bits can be

read in Multi-code. By using this approach, the maximum Run-Length is represented

as long as 1728×216＝27×222

. The maximum Run-Length in A3-size paper scanned in

300dpi is approximate by 8×222. Therefore, Using Modified Run Length Code can

represent an A4-size paper just in one code.

C. Compression method of picture components

When the blocks of texts are extracted from the color document image, many gaps are located at the color document image. The boundary of those gaps produces many high-frequency coefficients after wavelet transform. The zerotree coding

(44)

algorithm will waste bits to encode those high-frequency coefficients. In order to improve the efficiency of compression, those gaps must be compensated with appropriate coefficients. Because the fuzzy picture-text segmentation algorithm extracts the texts based on the coefficients of wavelet transform, it just need to compensate the coefficients of text components from the coefficients of wavelet transform. Our method is to directly compensate the text components by the average of neighboring data in the LL2 band, and set the coefficients of text components in HLi, LHi, and HHi (i=1,2) bands as zeroes. After compensating the text components in the sub-band coefficients, the wavelet transform is adopted for the coefficients of LL2 band continuously. Then, the zerotree coding algorithm is used to encode the coefficients of wavelet transform.

2.5 Experimental results

The proposed coding algorithm was simulated on Window 2000 (Pentium Ⅲ

700, 128 MB RAM) with programs written in C++ language. In this study, we used

the 24-bit true color image format and 200 dpi in processing. Each pixel in a 24-bit

true color image is characterized by its R, G, and B color values, and 8 bits represent

every value. Fig.13 and Fig.14 show the comparison of the JPEG and the new

compression algorithm. The time of performing fuzzy picture-text segmentation in

Fig.13(c) and Fig.14(c) are both 0.02 sec. Experimental results show that the

compression algorithm based on fuzzy picture-text segmentation has achieved better

(45)

2.6 Concluding remarks

Traditional color image-compression standards such as JPEG are inappropriate

for document images. JPEG’s application relies on the assumption that the high

spatial frequency components in images can be essentially removed without much

quality degradation. While this assumption holds for most pictures of natural scenes,

it does not work for document images. The texts require a lossless coding technique to

maximize readability. This chapter has proposed a new compression method with

promising performance on color document images. The method uses different

compression algorithms based on fuzzy picture-text segmentation for sub-images with

different characteristics. The fuzzy picture-text segmentation algorithm is based on

the coefficients of wavelet transform. It is fast to find out the text components and

picture components from the coefficients of wavelet transform. We have also

compared our method with JPEG. The results show that the new compression method

(46)

Fig.13(a) The original image(512×1024, scanned by ScanMaker V600 at 200dpi)

(47)

Fig.14(a) The original image(1024×512, scanned at 200dpi)

Fig.14(b) JPEG image(CR=83.1)

(48)

CHAPTER 3 THE COMPRESSION ALGORITHMS FOR

COMPOUND DOCUMENT IMAGES WITH

LARGE TEXT/BACKGROUND OVERLAP

This chapter presents two algorithms for compressing image documents, with a

high compression ratio of both color and monochromatic compound document images.

The proposed algorithms apply a new segmentation method to separate the text from

the image in a compound document in which the text and background overlap. The

segmentation method classifies document images into three planes: the text plane, the

background plane, and the text’s color plane. Different compression techniques are

used to process the text plane, the background and the text’s color plane. The text

plane is compressed using the pattern matching technique, called JB2. Wavelet

transform and zerotree coding are used to compress the background plane and the

text’s color plane. Assigning bits for different planes yields high-quality compound

document images with both a high compression ratio and well presented text. The

proposed algorithms greatly outperform the famous image compression methods,

JPEG and DjVu, and enable the effective extraction of the text from a complex

(49)

3.1 Introduction

A color page of A4 size at 200 dpi is 1660 pixels wide and 2360 pixels high; it

occupies about 12 Mbytes of memory in an uncompressed form. The large amount of

data prolongs transmission time and makes storage expensive. Texts and pictures

cannot be compressed by a single method like JPEG since they have different

characteristics. Digitized images of compound documents typically consist of a

mixture of text, pictures, and graphic elements, which have to be separated for further

processing and efficient representation. Rapid advances in multimedia techniques

have enabled document images, advertisements, checks, brochures and magazines to

overlap text with background images. Separating the text from a compound document

image is an important step in analyzing a document.

Document image segmentation, which separates the text from the

monochromatic background, has been studied for over ten years. Segmenting

compound document images is still an open research field. Traditional image

compression methods, such as JPEG, are not suitable for compound document images

because such images include much text. These image data are high-frequency

components, many of which are lost in JPEG compression. Text and the

high-frequency components thus become blurred. Then, the text cannot be recognized

easily by the human eye or a computer. The text contains most information, separating

(50)

research into document images. The traditional compression method cannot meet the

needs of the digital world, because when compound documents are compressed at a

high compression ratio, the image quality of the text part usually becomes

unacceptable.

Many techniques have been developed to segment document images. Some

approaches to processing monochromatic document images have already been

proposed. Queiroz et al. proposed a segmentation algorithm based on

block-thresholding, in which the thresholds were found in a rate-distortion analysis

method [26]. Some other systems based on a prior knowledge of some statistical

properties of the various blocks [11]-[15], or textual analyses [16],[17],[27] have also

been subsequently developed. All these systems focus on processing monochromatic

documents. In contrast, few approaches to analyzing color documents have been

proposed. Suen and Wang [19] utilized geometric features and color information to

classify segmented blocks into lines of text and picture components. Digipaper [28]

and DjVu [20],[21] are two image compression techniques that are particularly geared

towards the compression of a color document image. The basic idea behind Digipaper

and DjVu is to separate the text from the background and to use different techniques

to compress each of those components. The image of the text part in DjVu is encoded

using a bi-level image compression algorithm called JB2, and the background image

(51)

These methods powerfully extract text characters from a simple or slowly varying

background. However, they are insufficient when the background includes sharply

varying contours or overlaps with text. Extracting the text when the color of the

overlapped background is close that of the text is especially difficult. Finding a text

segmentation method of complex compound documents remains a great challenge and

the research field is still young.

This chapter proposes a new segmentation algorithm for separating text from a

complex compound document in 24-bit true-color or 8-bit monochrome. Two

compression algorithms that yield a high compression ratio are also proposed. The

technique separates text from background image after segmenting the text. Therefore,

it has many applications, such as to color facsimiles and document compression.

Moreover, the segmentation algorithm can be used to find characters in complex

documents with a large text/background overlap.

3.2 Text segmentation algorithm

When a document image is captured from a scanner, it can include several

different components, including text, graphics, pictures, and others. The textual parts

must be separated from a compound document image, whether effective compression

or optical character recognition (OCR) is intended.

This section introduces a new extraction algorithm that can separate text from a

(52)

(a) Test image A size=1024×1536 (b) Test image B size=1024×1536 (c) Test image C size=1024×1536 (d) Test image D size=1024×1536

(e) Test image E size=1344×1792

(f) Test image F size=1024×1536 Fig.15 The original full images (200 dpi)

(53)

(a) Original Y-plane image

(b) Bright plane

(c) Medium plane

(d) Dark plane

(54)

The features and texture of a complex compound document can be very

complicated. In a pilot test, when a color document image was converted into a

monochromatic image, the gray value of the image of text differed slightly from the

gray value of the background image. However, the image of text cannot easily be

directly separated from the background image because the difference between the

gray values is too small. Therefore, two phases are used to accomplish the desired

purpose. In the first phase, which involves color transformation and clustering

analysis, the monochromatic document image is partitioned into three planes, the dark

plane, the medium plane, and the bright plane, as depicted in Fig.16. The color of the

text is almost all the same, so the variance of the text’s grayscale is small. Therefore,

all the text can be grouped in the same plane.

When the text is black, the text and some of the background with a gray value

close to that of the text is put in the dark plane. In contrast, the text is put into another

plane if it is not black. Thus, the text and some noise are coarsely separated from the

background. In the second phase, an adaptive threshold is determined to refine the

text by adaptive binarization and block extraction. The two phases in which the

algorithm extracts the text from the background are shown below.

A. Color Transformation

The color transformation technique is used to transfer a color document image to

(55)

complicated background image.

B. Clustering analysis

In general, after a color document image is converted into a monochromatic one,

the textures of the original color image are still present in the converted grayscale

image. The difference between the text’s gray value and that of the overlapping

background image is small. Thus, a clustering algorithm is used to split the grayscale

images. Clustering analysis roughly separates text from a background image. First, it

extracts as many as possible of the different textures of an image. The text is

embedded in one of the planes.

The clustering algorithm is described below.

Step 1. Partition the M×N grayscale image A

( )

i, j into p sub-block images xn

( )

i,j . Each sub-block xn

( )

i,j is of K×L, where n=1,2,…p.

Step 2. Calculate the mean of gray value, mn, and the standard derivation,σ n, of each

K× L sub-block image. For the nth sub-block image xn

( )

i,j , the mean and standard derivation are computed as

mn= L K j i x j i n × ∑ , (, ) ; (3-1) σ n= L K m j i x j i n n × ∑ − , 2 ] ) , ( [ . (3-2)

(56)

' 1 n C and Cn'₂, by ' 1 n C =mn + 50. ×σn and ' 2 n C =mn − 50. ×σn. (3-3)

Step 4. Calculate the absolute difference of each pixel of xn

( )

i, j to Cn'1 and ' 2 n C using ' 1 , ij D = xn

( )

i,j −Cn'₁ and ' 2 , ij D = xn

( )

i, j −Cn'₂ . (3-4)

Then, xn

( )

i,j partition into two clusters ηk

(

k=1,2

)

according to η₁:

{

( )

'

}

2 , ' 1 , | , _ij _ij n i j D D x ≤ , and η₂:

{

( )

'

}

2 , ' 1 , | , _ij _ij n i j D D x > (3-5)

Step 5. Calculate the mean mnk and standard derivation σnk of the two clusters

(

k=1,2

)

k

η using Equations (3-1) and (3-2), respectively.

If σn₁ >σn₂, then center Cn3 =mn −0.5×σn, and compute the two new centers

1 n

C and Cn₂ using Cn1=mn1+0.5×σn1, and

2 n

C =mn₁−0.5×σn₁. (3-6)

Else, (if σn₁<σn₂, then center Cn3 =mn +0.5×σn, and compute the two new centers

1 n

C and Cn₂ using Cn1=mn2 +0.5×σn2, and

2 n

C =mn₂ −0.5×σn₂. (3-7)

(57)

is partitioned into three clusters ψk (k=1,2,3) according to, : 1 ψ

{

xn

( )

i, j |Dij_,₁<Dij_,₂ and Dij_,₁<Dij_,₃

}

; : 2 ψ

{

xn

( )

i, j |Dij_,₂ <Dij_,₁ and Dij_,₂ <Dij_,₃

}

; : 3 ψ

{

x_n

( )

i,j |D_ij_,₃ <D_ij_,₁ and D_ij_,₃<D_ij_,₂

}

; (3-8) where, Dij_,₁= xn

( )

i,j −Cn1 , D_ij_,₂= xn

( )

i,j −Cn₂ , and Dij_,₃= xn

( )

i,j −Cn3 . (3-9) Repeat Steps 2 to 6 until all of the sub-block images xn

( )

i,j (n=1,2,…p) have been processed.

The optimal partition of the images depends on the intensity distribution of

background images and the lengths, sizes and layouts of the text strings. However,

analyzing those parameters is very complex. Therefore, images are partitioned into

equal sub-blocks for simplicity. This study used K=256 and L=128, and a value of p

that depended on the image size. The constants were determined empirically to ensure

good performance in general cases.

C. Adaptive binarization

After the first phase, the gray values of each plane become simpler than those of

the original document image. Then, the text is extracted from the background image

(58)

two classes, global and local. The global thresholding algorithm uses a single

threshold, while a local thresholding algorithm computes a separate threshold for each

local region. This work utilizes a local thresholding algorithm. Set the threshold value

THn of the dark and bright planes of the sub-block images xn

( )

i, j (n=1,2,…p) to,

THn= m_fn ⎟⎟ ⎠ ⎞ ⎜⎜ ⎝ ⎛ ± n n n - m_f m_b m_f ×σ _fn (3-10)

, where m_fn is the mean value calculated from the pixels in one of the planes, xn

( )

i,j ;

m_bn is the mean value calculated from the pixels in the other xn

( )

i,j plane , and σ _fn is the standard deviation calculated from the pixels in the same plane xn

( )

i, j as

m_fn.

Restated, m_fn is determined by the foreground pixels of the plane that is being

processed, while m_bn relates to the background pixels of the other two planes. From

Eq. (3-10), the value of THn can be adapted to the gray value of sub-block xn

( )

i,j . The foreground pixels in the dark plane are darker than the background pixels. The

adaptive thresholding value can be calculated as, THn= m_fn

-n n n -m_f m_b m_f ×σ _fn.

The threshold value THn is biased to the left of the mean value m_fn. The foreground

pixels in the bright plane are brighter than the background pixels. The adaptive

thresholding value can be calculated as, THn= m_fn +

n n n -m_f m_b m_f ×σ _fn. The threshold value THn is biased to the right of the mean value m_fn. Figure 17 gives an

(59)

example of the algorithm.

(a) Original sub-block image (b) Dark plane of the sub-block

(c) Histogram of image (b)

(d) Bi-level image after adaptive thresholding

Fig.17 An example of text extraction algorithm image(size=256×128)

Figure 17(a) is the sub-block image xn

( )

i,j , Figure 17(b) is the dark plane of the sub-block, and Fig. 17(c) shows the histogram of the sub-block image xn

( )

i,j . The threshold value THn has a bias to the left of the mean value m_fn to clarify the text.

Figure 17(d) shows the bi-level image.

D. Spreading and region growing for block extraction

The pixels of foreground and noise are obtained simultaneously using the

thresholding method. Accordingly, the isolated pixels are deleted and the Constrained

Run Length Algorithm (CRLA) [7] is applied to remove noise pixels.

(60)

binary images Mh

( )

i,j and Mv

( )

i,j (1<i<M, 1<j<N), respectively. Then, the "AND"

operator is applied to Mh

( )

i,j and Mv

( )

i, j pixel by pixel, and a binary image,

Mhv

( )

i,j , which merges the neighboring pixels in both directions, is obtained.

The CRLA and the logic operation together constitute the spreading process,

after which, the binary spreading image Mhv

( )

i,j is processed using the region

growing method to arrange the foreground pixels into rectangular blocks. The region

growing method is described in the Chapter 2.

The processes of spreading and region growing yield the positions of the

foreground blocks.

E. Distinguishing text from foreground blocks

The blocks that contain foreground pixels are extracted following the spreading

and growing processes. The blocks that contain text strings must now be identified. In

this study, three parameters, transition pixel ratio, foreground pixel ratio, and block

size, are used to identify these blocks.

The transition pixel ratio is defined as,

T = block of Area block in pixels transition of number Total (3-11)

The transition pixel occurs at the boundary of foreground pixels. For example,

複雜型複合式文件影像壓縮方法之研究

國

立

交

通

大

學

電機與控制工程學系

博 士

論

文

複雜型複合式文件影像壓縮方法之研究

THE STUDY OF THE COMPRESSION ALGORITHMS FOR

COMPLEX COMPOUND DOCUMENT IMAGES

研 究 生：瞿忠正

指導教授：吳炳飛 教授

複雜型複合式文件影像壓縮方法之研究

THE STUDY OF THE COMPRESSION ALGORITHMS FOR

COMPLEX COMPOUND DOCUMENT IMAGES

研 究 生：瞿忠正 Student：Chung-Cheng Chiu

指導教授：吳炳飛 Advisor：Bing-Fei Wu

國 立 交 通 大 學

電 機 與 控 制 工 程 學 系

博 士 論 文

複雜型複合式文件影像壓縮方法之研究

學生：瞿忠正 指導教授：吳炳飛 教授

國立交通大學 電機與控制工程學系 博士班

摘 要

THE STUDY OF THE COMPRESSION ALGORITHMS

FOR COMPLEX COMPOUND DOCUMENT IMAGES

Student：Chung-Cheng Chiu Advisor：Prof. Bing-Fei Wu

Department of Electrical and Control Engineering

National Chiao Tung University

ABSTRACT

ACKNOWLEDGEMENTS

CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1

INTRODUCTION

CHAPTER 2

THE FUZZY-BASED TEXT SEGMENTATION

METHOD

∑

∑

∑

CHAPTER 3

THE COMPRESSION ALGORITHMS FOR

COMPOUND DOCUMENT IMAGES WITH

LARGE TEXT/BACKGROUND OVERLAP

( )

( )

( )

( )

( )

( )

( )

( )

(

)

{

( )

}

{

( )

}

(

)

{

( )

}

{

( )

}

{

( )

}

( )

( )

( )

博士

研究生：瞿忠正

指導教授：吳炳飛教授

研究生：瞿忠正 Student：Chung-Cheng Chiu

國立交通大學

電機與控制工程學系

博士論文

學生：瞿忠正指導教授：吳炳飛教授

國立交通大學電機與控制工程學系博士班

摘要