Text extraction from complex document images using the multi-plane segmentation technique

(1)

2006 IEEEInternational Conference on Systems, Man,and Cybernetics

October 8-11, 2006, Taipei, Taiwan

Text

Extraction

from Complex

Document Images

Using

the

Multi-plane

Segmentation Technique

Yen-Lin Chen, Student Member, IEEE, and

Bing-Fei

Wu*, Senior Member, IEEE

Abstract-This study presents a new method for extracting characters from various real-life complex document images. The

proposed method applies a multi-plane segmentation technique to separate homogeneous objects including text blocks, non-text graphical objects, andbackground textures into individual object planes. It consists of two stages - automatic localized multilevel thresholding, and multi-plane region matching and assembling. Then a text extraction process can beperformed on the resultant planes todetect and extract characters with differentcharacteristics

intherespective planes. Theproposedmethod processes document imagesregionallyandadaptively accordingtotheirrespectivelocal features. This allows preservation of detailed characteristics from

extracted characters, especially small characters with thin strokes,

aswell as gradationalilluminations of characters. This alsopermits background objects with uneven, gradational, and sharp vanations incontrast, illumination, andtexture tobehandledeasilyandwell. Experimental results on real-life complex document images

demonstrate that the proposed method is effective in extracting characters with various illuminations, sizes, and font styles from varioustypesofcomplexdocument

images.'

I. INTRODUCTION

Textual information extraction from document images provides many useful applications in documentanalysis and understanding, such as optical character

recognition,

document retrieval, and compression. To-date, many effective techniques have been

developed

for

extracting

characters from monochromatic document

images [1].

In recentyears,advances in multimedia

publishing

and

printing

technology have led to an increasing number of real-life documents in which

stylistic

text

strings

are

printed

with decorated objects and colorful, varied

background

components. However, these approaches do not work well for extracting characters from real-life complex document images. Comparedtomonochromatic documentimages, text extraction incomplex document images

brings

with it many difficulties associated with the complexity of

background

images, variety and shading of character illuminations, superimposing characters with illustrations and

pictures,

as

wellasother decoratedbackgroundcomponents.

Since most characters show sharp and distinctive edge features, methods based on edge information

[2]-[4]

have been developed. Such methods utilize an edge detection operatorto extract theedgefeatures of characterobjects, and then use these features to extract characters from document images. Such edge-based methods are capable of extracting characters in different homogeneous illuminations from graphic backgrounds. However, when the characters are adjoined or touched with graphical objects, texture patterns,

This workis supported by the National Science Council under GrantNo. NSC94-2213-E-009-066.

*Theauthorsarewith Department ofElectrical and Control Engineering, National Chiao TungUniversity,1001Ta-HsuehRoad, Hsinchu 30050,

Taiwan(e-mail: bWL occnCt1u.edut w)

or backgrounds with sharply varying contours, edge-feature vectors of non-text objects with similar characteristics may also beidentified as text, and thus the characters in extracted textregions are blurred by those non-text objects.

In recent years, several color segmentation-based methods for text extraction from color document images have been proposed [5]-[7]. Thesemethods utilizecolor clustering or quantization approaches for determining the prototype colors of documents so as to facilitate the detection of character objects in these separated color planes. However, most of these methods have difficulties in extracting characterswhich are embedded in complex backgrounds or that touch other graphical objects. This is because the prototype colors are determined in a global view, so that appropriate prototype colors cannot be easily selected for distinguishing characters from those touched graphical objects and complex backgrounds without sufficient contrast. In this study, we propose an effective method for extracting characters from these complex document images, and resolving the above issues associated with the complexity of their backgrounds. The document image is processed by the proposed multi-plane segmentation technique to decompose it into separate object planes. The proposedmulti-plane segmentation technique comprises two stages: automatic localizedhistogram multilevel thresholding, andmulti-planeregion matchingandassemblingprocessing. After the multi-plane segmentation technique has been carried out,homogeneous objects includingtextblocks,other non-text objects, and backgroundtextures are separated into individual object planes. The textextraction process is then performed on the resultant planes to detect and extract characters with different characteristics in the respective planes. The document image is processed regionally and adaptively according to local features by the proposed method. This allows detailed characteristics ofthe extracted characters to be well-preserved, especially the small characters with thin strokes, as well as the gradational illuminations ofcharacters. This also allows for characters adjoined ortouched with graphical objects andbackgrounds with uneven, gradational, and sharp variations in contrast, illumination, and texture to be handled easily and well. Experimental results demonstrate thatthe proposed method is capable of extracting characters from various types of complexdocumentimages.

II. LOCALIZED HISTOGRAMMULTILEVELTHRESHOLDING The multi-plane segmentation, if necessary, begins by applying a color-to-grayscale transformation on the RGB components of a color document image to obtain its illumination component Y After the color transformation is performed, the illumination image Y still retains the texture features of theoriginal color image, as pointed out in [2], and

(2)

thus the character strokes in theiroriginal colorarestill well-preserved. Then the obtained illumination image Ywill be divided intonon-overlapping rectangular blockregions with dimension

MHXMv,

asshowninFig. 1(b). Thus the mission is to extract objects with similar characteristics from these rectangular block regions into different sub-block regions to facilitate further analysis in the following stage. Hence, an effective multilevel thresholding technique is needed for automatically determining the suitable number ofthresholds for segmenting the block region into different decomposed object regions. By using the properties of discriminant analysis, we have proposed an automatic multilevel global thresholding technique for image segmentation [8]. This technique automatically determines the suitable number of thresholds, and utilizes afast recursive selection strategy for selecting the optimal thresholds to segment the image into separate objects with similar characteristics in a computationally frugal way. In this study, we utilize this multilevel threshold selection technique and make necessary modifications to adapt it for segmenting block regions into differentobjectswithsimilar characteristics.

We utilize the concept of separability measure described in [8] as a criterion ofautomatic segmentation of objects in a given block region, denoted by 91 . This segmentation criterion is denoted by the "separability factor"- ST as in[8], and isdefinedas,

SE = VBC(T)!v91 = 1-

vwc

(T)/vqf

(1)

where VBC VWC and

v,,

are between-class variance, within-class variance, and total variance of the gray intensities of pixels in the region SR, and

v,,

serves as the normalization factor; and T is the evaluated threshold set composed ofnthresholds,topartitionpixelsintheregion 91 into n±+1 classes. Thesepixelclasses arerepresentedbyCo

to'...,

tl}

I

I .. Ck= {tk +1

X..tk+1}1

Cn {tn+1, .l--,U

Based on this efficient discriminant criterion, an automatic multilevel thresholding is applied for recursively segmenting the block region

91

into different objects of homogeneous illuminations, regardless of the number of objects and image complexity ofthe region

91.

It can be performed until the SE measure is large enough to show that the appropriate discrepancy among the resultant classes has been obtained. This objective is reached by the scheme that selects the class with the maximal contribution

Wk

Uk2

(wherewk and

ok

arethe cumulative probabilityand variance ofthe gray intensities in class Ck, respectively) of the total within-class variance

vwc(T),

denoted by

Cp,

for recursively performing an optimal bi-class partition procedure, as described in [8], until the separability among all classes becomes satisfactory, i.e. the condition where the ST measure

approximates

a

sufficiently large

value. The class

Cp

will be divided into two classes

Cpo

and

Cpl

by applying the optimal threshold

t*

determined by the localized histogram thresholding procedure as described in [8]. Thus, the

ST

measure will most rapidly reach the maximal increment to satisfy sufficient separability among

the resultant classes of pixels. As a result, objects with homogeneous gray illuminations will bewell-separated.

Furthermore, if a region 91 comprises of a set of pixels with homogeneous gray intensity features, consisting of parts of a larger object or background region, then it should not be partitioned and should keep its original components. Thesehomogeneous regions can be determined by evaluating two statistical features: 1) the bi-class ST measure, denoted as

S(F,

which is the SF value obtained by performing the initial bi-class partition procedure on region 91 , i.e. the SF value associated with the determined threshold

t*;

and2) theillumination variance,

v,

of pixels in the region 91. Hence, if both the

SFb

and

v,

features have small values, this reveals that the distribution of the region 91 is concentrated within a compact range, and thus the 91 comprises a set of homogeneous pixels representing a simple object or parts thereof. Therefore, the following homogeneity condition is utilized for determining the situation where both the

Sfb

and

v,

features are small:

SFb

<

Thho

,and vS <

Thhl

(2)

where Thho and Thhl are pre-defined thresholds. Therefore, if the homogeneity condition is satisfied, the region 91 is recognized as a homogeneous region, and does not need to undergo the partition process and hence keeps its pixels of homogeneous objects unchangedto be processed by the next stage. The values ofthe two thresholds

Thho

and ThhI are experimentallychosen as 0.6 and 90, respectively.

Then the localized automatic multilevel thresholding processis performedasthefollowing steps:

Step 1: To begin, the illumination image Y with size

Wi,ngXHimg

is divided into

rectangular

block

regions

91'j with dimension

MHxMv,

as shown in Fig. 1(b). Here

(i,

j)

are the location indices, and i=0,...,N,, and

j

=0,...,

Nv

, where N,, =

([WJmg

/MI,

I

-l) and

Nv=

(H,,ng

/M,

-

1)

, which

represent

the numbers of

divided blockregionsper rowand per column, respectively. Step2: For each block region

91"'

, compute the histogram of pixels in 9 1" , and then determine the illumination variance - V9j' and the bi-class separability measure

SE,

initially, there is only one class

,2"i;

let q represent the present amount of classes, and thus set q = 1. If the

homogeneity condition, i.e. Eq. (2), is satisfied,thenskip the localizedthresholdingprocessforthisregion

91"'J

and go to step 7;elseperformthefollowingsteps.

Step3: Currently, q classes exist, having been decomposed from

91"j.

Compute the class probability

wIj,

the class mean u4', andthestandarddeviation S',ofeachexisting class

Ck'i

of pixels decomposed from 91"', where k denotestheindex of the present classes andn=0,. q-1.

Step 4: From all classes Ck", determine the class Cp" which has the maximal contribution

(Wi' k'/

2) of the total

(3)

within-class variance

v"4

of

9"i

", to be partitioned in the next step in order to achieve the maximalincrement of ST. Step 5: Partition

Co"':

{

ti'

+1, ...,

t>

} into two classes

Cpjo

:{

ti

+1,

...,ts1

i},

and

C(j

j:{ tsj

+l,

...,t,}

, using

the optimal threshold

t"I

* determined by the bi-class

partitionprocedure. Consequently, the gray intensities of the region

9V'i

arepartitioned intoq+l classes, cI',I ... I

(l

. , and then let q =q±+l toupdatetherecord of the currentclassamount.

Step 6: Compute the SE value of all currently obtained classes using Eq. (1), if theobjective condition, SE .ThSF, is satisfied, thenperform the following Step 7; otherwise, go back to Step 3 to conduct further partition process on the obtained classes.

Step 7: Classify the pixels of the block region 91"j into separate sub-block regions, SR

i,,O,

SR

i,j,l

..., SRi,j,q-, corresponding to the partitioned classes of gray illumination intensities,

GC',

Iq

I',

...,

Cq,'l

, respectively, where the notation SRi,j,k represents the k-th SR decomposed from the region 91"j . Then, finish the localized thresholding process on

91"j

andgobackto step 2and repeatSteps 2-6 to process the remaining block regions; ifall block

regions

have beenprocessed,gotostep 8.

Step 8: Terminate the segmentation process and deliver all obtainedsub-blockregions of the

corresponding region

91.

The value oftheseparabilitymeasurethreshold

ThsF

is chosen as 0.92 according to the

experimental

analysis

described in [8] to

yield satisfactory

segmentation

results on

the block regions. With regard to the dimension

MHXMv

of theblockregion, suitable larger values of theparameters MH and Mv shouldbe selected so that all

foreground objects

in the images can be clearly segmented. Since a

typical

resolution ofthe document images ranges from 200 dpi to 600 dpi for scanning books, advertisements,

journals

and magazines, etc,we utilize

M,,

-

Mv

=100 (8.5 mm on the

300 dpi resolution) in our experiments. The multi-plane region matching andassemblingprocess, aspresented in the followingsection, isthencarriedout toassembleand

classify

them into object planes, denoted as P. All SRs obtained from the localized multilevel

thresholding

process are

collected intoahypothetical "Pool", inwhich the

multi-plane

region matching andassemblingprocess is conducted.

III. MULTI-PLANEREGION MATCHINGANDASSEMBLING This section describes an algorithm for constructing the object planes from the SRs, generated from the localized multilevel thresholding procedure introduced in the preceding section. Some statistical and spatial features of the adjacent SRs areintroduced into the multi-plane region matching and assembling procedure in order to assemble all SRs of the homogenous text region or object. The proposed multi-plane region matching and assembling process is conducted by

performing the following three phases - the initial plane

selection phase, the matching phase and the plane constructingdecision phase.

Several concepts and definitions are first introduced to facilitate the matching and assembling process of SRs obtained from the previous localized thresholding procedure. An SR may comprise several connected object regions of pixels decomposed from its associated region 9t. Thus the pixelsthat belong to the regions of a certain SR are said to be objectpixelsof this SR, whileotherpixelsinthis SR are non-objectpixels. Thesetof theobject pixelsin one SR isdefined asfollows,

OP(SR

ii)

={g(SR

iJ,kx,y)|

The pixel at (x,y) is an object pixel in SRiijk} where g(SRi 1,x,y) is thegrayillumination intensity of the

pixel atlocation (x, y) in SR i,j,k , and the range of x is within

[0,

MH

-1]

and y is within

[0,

M,,

-1]

. The concept 4-adjacentreferstothesituationinwhich each SR has four sides thatborder the top, the bottom, the left or theright boundary ofits adjoiningSRs. The SRs which arecomprised ofobjects with homogeneous features are assembled to form an object plane, denoted by P.

A.InitialPlane SelectionPhase

In the first processing phase, to improve the speed and accurateness of the final convergence of the multi-plane region matching and assembling process, the mountain clusteringtechnique [9] can be applied todetermine the SRs with the most prominent and representative gray intensity features, and these SRs are selectedas seedsto establisha set of initial planes. The mountain method is a fast, one-pass algorithm, which utilizes the densityof features todetermine the most representative feature points. Here we consider SRs asfeaturepointsinthemountainmethod.

First,given the localized multilevel thresholdingprocess to segment the image into r SRs in the Pool, the mean

pu(SR

Ik)

associated with each of them is also obtained.

,u(SR

i} k) is the mean of gray intensities of object pixels

comprised by SR ,j,k andis equivalent to

Atj

obtainedby localized multilevel thresholding process. Then the region dissimilarity measure, denoted by DRM, between two 4-adjacentSRs canbecomputedas,

DR4(SRi

j1,k,

SRi'j2

k,)

=U(SR i'Jl1k))u(SRiJ2)|

(4)

The range of the DRM is within

[0,255].

The lower the computedvalueof

D.,

thestrongerthesimilarity amongtwo SRs.

Therefore, the mountain function at a SR can be computed as,

M,

(SR

i

),

jk

SEP-oDRM

(SR

,SR'j

) (5) VSRi,jiEpooI

where a is a positive constant. A higher value of the mountain function reflects that SRi,j,k possesses more homogenous SRs in its vicinity. Therefore, it is sensible to

(4)

selecta SR

'j'

k with ahigh value of mountain function as a

representative seed to establish a plane. Let M0 be the maximalvalueof the mountain function values, and

SR,,

be the SRwhose mountain value is M*:

MO`

(SR

)= max

[MO

(SR jo)]

(6)

Thus,

SRO*

is selected as the seedof thefirstinitial plane. Aftercomputingthemountain functionof each SR in the Pool,the following representative seeded SRs are determined by destroying the mountains. Since the SRs whose gray intensity features close to SR0 also have high mountain

values, itisnecessary toeliminate the effects of theidentified seeded SRs before determining the follow-up seeded SRs. Toward this purpose, the updating equation of the mountain function, after eliminating the previous

(m-i

)th seeded SR

-SRm

i*

X is computedby,

m,, (SR 'jk') mA-, (SR 'j') M* (SR, e')e OD(SR lSR, (7)

wheretheparameter ,B determinesthe neighborhood radius thatprovide measurable reductions in the updated mountain function. Thenrecursively performing the discountedprocess of the mountain function given by Eq. (7), new suitable seededSRscanbedeterminedinthesameway,untilthelevel of the current maximal M* falls bellow a certain level compared to the first maximal mountain MO The terminativecriterion ofthisprocedure is defined as,

M*-

*)<g

~~~~~(8)

(MG /MO)

(8

where 6 is a positive constant less 1. Here the parameters are selected as a=5.4, ,B=1.5 and -=0.45 as suggested by Pal and Chakraborty

[10].

As a result, this process converges to the determination of resultant N seeded SRs:

{SR,*,

m=0:N-1}, and they are utilized to establish N

initialplanesforperformingthefollowing matching phase. B.Matching Phase

Inthematching phase,thesimilarityandconnectedness between each currentunclassified SR andall current exiting planesare analyzedtodetermine the best belongingplane. If the best matching plane for a certain unclassified SR is

determined,

then this SR is assembled into that plane and removed from the

Pool;

otherwise, if there is no matching plane for a certain unclassified SR, then this SR remains in the Pool. Since new planes will be established in the plane constructing phase ofthefollowing

recursions,

the SRswhich cannot find matching planes in the current recursion of the matching phase will be analyzed in subsequent recursions untilthey findtheirbest

matching plane.

Twomeasurements of thecontinuityandsimilaritybetweentwo4-adjacentSRs

-the side-match measure, denoted as DSM, and the region dissimilarityDRM, ascomputedbyEq. (4), are employedfor theassemblingprocess. Then both DSM andDRMmeasuresare

considered to determine the match

grade

oftwo

4-adjacent

SRs. The matching phaseprocess is basedon

evaluating

the matchgradeamongthe SRs andassemblingthemintoplanes.

First, the side-match measure - DSM, which reveals the dissimilarity of the touching boundary between the two 4-adjacent SRs, is described as follows. Suppose that two SRs are 4-adjacent. They mayhave one of two types of touching boundaries: 1) the vertical touching boundary shared by two horizontally adjacent SRs, or 2) the horizontal side shared by two vertically adjacent SRs. Here we take the case of two horizontally adjacent SRs for example, and the case of vertically adjacent SRs can also be similarly derived. For a pair oftwo horizontally adjacent SRs -SR ,ij,

k"

on the left, and SR ,,

J2,k,

onthe right, the pixel values on the rightmost side of SR

i',jk

I' and the leftmost side of SR "

X"2"

can be described as: g(SR

'"k',

M1

-l,y)

and g(SRkJ_kl,0,y) respectively. The sets of object pixels on the rightmost side and theleftmost side of an SR, denoted by RS(SRisj,k) and

LS(SR i,jk ), respectively, are defined as follows,

RS(SR'

ij.k)

={yg(SR

)O(jR,

M,+

-n1,dy)

g(Sijk,,,-l,y)E-OP(SR I-k),and O<y<M,-1} LS(SR )

=g(SR

o

(

(9) 10) g(SR'i k 0,y)eOP(SR j), and

O<y<Mv

-1} To facilitate the following descriptions of the side-match features, the denotations of SR ,,j k and SR' k', are simplified as

SR'

and SR , respectively. The vertical touching boundary of

SR'

and SR , denoted as

VB(SR',

SRIr),

is represented by a set of side connections formed by pairs of object pixels that are symmetrically connected on their associated rightmost and leftmost sides, andisdefinedasfollows,

VB(SR',

SRr

)

=

(g(SR',

MH

-,y),g(SRr,

0,

Y))

(11)

g(SRI,

M-1,y)

E

RS(SR'),

and

g(SRr

,O,y)

E

LS(SRr)}

Also, the number of side connections of the touching boundary, i.e. the amount of connected pixel pairs in VB(SR

'iJii,k

SR

i2j12k2J

) Xshould also be considered for the connectedness of the two 4-adjacent SRs, and is denoted by

NSC

(SR

j,'

JI, ,SRi2 J2 2). Therefore, the side-match measure,

DSM,

ofthetwo

4-adjacent

SRs canbe

computed

as,

DSM

(SRI,SRr)

=

I

11 g(SR',

M,

-1,

y)-

g(SRr,

0,

Y)||

(12)

(g(SR,MH-I-X).g(SR

.0J#))

VB(SRl,SRr)

Ns,.(SRI,SRr)

If the DSM value oftwo 4-adjacent SRs is sufficiently low, then these two SRs are homogeneous with each other, and shouldbelongtothe sameobject plane P. The range ofDSM values is within

[0,255].

Accordingly,theDSMmeasure canreflect thecontinuity oftwo4-adjacent

SRs,

and the

DRM

value,asobtainedbyEq. (4), reveals the similarity between them. Hence the homogeneity and connectedness oftwo 4-adjacent SRs can be measured bydetermining thedominant effect of theDSM

(5)

and the DRM. Therefore, based on the abovedefinitions, the match grade of two 4-adjacent SRs, denoted by m , is determinedas,

m(SR'

IX,k1

ISRi.,~

XRk,

max(DSM (SR k,,

SRij

k,

),

DRM (SR"

l,kI

I,sRk-,k

)) (13)

max(a(SR" i1k1 ) +U(ysrii)ko), 1)

where o(SRi,j,k) is the standard deviation of gray intensities of all object pixels associated with the SR i,j,k andisequivalent to < obtained from localized histogram multilevel thresholding process. Here the denominator term max(o7(SR"j1 k)+ 7(SR'i2k,), 1) in Eq. (13) serves as the normalization factor.

In each recursion ofthe matching phase, each of the unclassified SRs, i.e. SRi,j,k in the Pool, is analyzed by evaluating the match grades of SR

i,j,k

associated with those SRs, denoted by SRi',j',k' which have been grouped into current existing object planes(the subscript q represents that SR i',j',k' belongs to the q-th plane

Eq)

to seek for the matching planeinto which SRi,j,k canbe

grouped.

Inorder to facilitate match grade determination, the set AS(SR

iIj.k,p

) isutilized forcontainingthe SRqS inaplane

Pq

which are4-adjacentto SRisjk iSdefined

by,

AS(SRijk) (14)

=1SRqj j k E

CqI

SRqiiqk

is

4-adjacent

tO SRi3j1k Then thematchgrade

2R(SR

i

k),

which revealshowwell

SRi,j,k matches with

Pq

, can be determined by the followingoperation,

9t(SR

ik)

)-

min

k m(SRij,kS q

i'jk)

(15)

VSRq'Ik',EAS(SRLJk,Pq)

It must be noted that if none of the SRs in

P,

are 4-adjacent to SRi,jk, i.e.

AS(SRJk,7f)=0,

then P is excluded from the consideration for

matching

with SRi,jk Thus the plane which has the best match grade associated with SRij,k among allexisting planes,denotedby Cm, can

be determinedby,

9i(sR i,j,k

,

P4)min9MGSR

i,j,k,

pq

(16)

VPq

The following matchingcondition is then applied to check whether the selected candidate plane (m and SRi,j,k are sufficiently matched, and thus (m can be determined for suitably absorbing SR i,j,k,defined as follows,

95L(SR

i,j,k

P

C

)

<

Thm

(17)

where

Th.

is a

predefined

threshold which represents the acceptable tolerance of dissimilarity for SR i,j,k to be grouped into

P,.

The matching condition has a moderate effectonthenumberof resultant object planes, and the value choice of

Thm

=1.2 is experimentally determined to obtain

sufficiently distinct planes and avoid excessive splitting of planes. Accordingly, if SR ,jk and its associated

P"m

satisfythe matching condition, then SRi,j,k is merged into

Pm,

andremoved from the Pool. If the matching condition cannotbe satisfied, this reflects that there is no appropriate matching plane for SRi,j,k in this current matching phase recursion. As a result, SRijk should remain in the Pool until its suitable matching plane emerges after more plane constructing phase recursions. Afteradeterminationhas been made for SRi,jkk the matchingprocessis in turn appliedon the subsequent unclassified SRs in the Pool, until all have beenprocessedonceinthecurrentmatching phase recursion. C. PlaneConstructing Decision Phase

Afterperforming the previous matching phaserecursion, ifthere are unclassified SRsremaining, and the Pool is not drained, these unclassified SRs must be analyzed and a determination madeas towhether it is necessarytoestablish a newplaneto assemble SRswith such features into another homogeneous region. Theplane constructing decision phase determines whetherto 1)establish andinitialize a newplane byselecting the unclassified SR "farthest" from all existing planes as an initialseed, or 2) extend one selectedplane by merging one unclassified SR "nearest" to this plane. The decision is made according to theanalysis ofthe following gray intensity and location features. The dissimilarity measure between one unclassified SR, not adjoined to any existing planes in theprevious matching phase recursion, and a certain object plane CP, is determined by the relative-difference of gray intensities between this SR and its nearest

SRqi'ko

among all

SRqs

that

belong

to T, and is

computedas,

D1

(SRjk,

mm r DRM(SR ijk

SRqJk

)

(1k8)

VsRq-Pq

max(o0(SRlk"I'

(Sqik,

)

where DRM(SRilj,k

,SRqi

jlk) iscomputed byEq.(4). Then the smallest dissimilarity ofgrayintensity between SRi,j,k and theplane beingmostsimilartoit amongallcurrentlyexisting object planes, denoted by

(PS,,SR

.j)k,

isdetermined by,

DI

(SR

k,jSI(SR

j.k))=

mn(D,))

i

(19)

q

Then the dissimilarity measure of SR

,jk

and its associated

PSI(SR

ij)

is

also

expressed

as Ds,

(SR

ijk)

to

represent

the

least dissimilarity of gray intensity between this SR and all existing planes. If SRi,j,k has a sufficiently low dissimilarity

D,

with theplane

P1(SRJ.k),

and they are also locatively close to each other, this means that this SR is homogeneous with

uSl(SRi.(.k,

, even if it is not currently

4-adjacentto

PSI(SR

j-)

-To determine this situation, the locative distance betweenanSRand a plane

Pl,

denoted as DE(SR

.J-kJp),

is computedby the Euclidean distance between this SR and the

(6)

closest SRq among all SRqs associated with the plane

Pq;

and is determined as,

DE(SR , GP)

mi(SR

(SR k

RSRq

i ) (20)

where De(SR

ij*,kSRqI'ij'k')=

V(i

_i,2+(j_

)'2)2

If SRij,k and its

PSI(SR

Q.k) are homogeneous in gray illumination and alsolocatively close to each other, i.e. both D,(R SI

((SR

k)) and DE(SR p, ,(SR,i)) values are

sufficiently small, then SRi,j,k should join the plane

PSI(SR

A)k,

rather than establish a new

independent

plane,

in

order to prevent a text region orhomogeneous object to be split into morethan one plane. Otherwise, if no such planes are found, a newplane should be created to aggregate those SRswithdistinct features.

To ensure that the newplane contains distinct features with currently existing planes, a scheme for selecting the suitable SRastherepresentative seed for constructing a new plane isgivenas follows. Bythis means, this seedSR, which is most dissimilar to any currently existing planes in illuminationintensities, canbeobtained by,

DLI

(SRLI)

= max

DL

(SR

i)jk

(21)

VSRE Pool

where DLI (SR

ijilk)

=

max(DI

(SR

i,jk

)) VPq

Hence, the determined

SRLI

with the largest

DL,

value will be selected as the seed SR to establish a new plane to aggregate those SRs whose features are distinct from other existing planes.

By means of the definitions given above, the plane constructing decision isperformed accordingtothefollowing steps:

Step1: First,the SRswhich havesufficientlylow

Ds,

values areselected into theset

SRs5

using thefollowing operation:

SRs,

={SR i,j,kE Pool

|Ds,

(SR iljk)<

Ths,

},

(22)

where

Ths,

is a predefined threshold for determining whether the SR is homogeneous with any one of existing planes. If noneofthe SRsare selected bytheabove condition, i.e.

SRsI

isempty, thengodirectlytoStep 3for constructing a newplane; otherwise, performthefollowing Step 2. Step 2: The set SRsI now contains the SRs which are homogeneous with other currently existing planes, but are not4-adjacent with them, andthusremainunclassifiedin the previous matching phaserecursion. The SRlocativelynearest to its associated

PI(SR,j,)'

denotedby

SR'

, is determined asfollows,

DE (SRN,(PS

(sRN)

)= min DE

(SRi,j,k

,(p

i,(,k

)

Si VSRE=-SRs, SI(SR'' (23)

If SRN and its

PS,(sR,V

are sufficiently close to each other, i.e. the condition

DE(SRNj'PS,(SRN,)

<ThL is satisfied, then SRN is decided to be merged with

Ps,(,R,)

to extend its influential area on nearby SRs, and

proceeds

to Step 4. Otherwise, performStep3forconstructinga newplane.

Step 3: The

SRLI,

the SR most dissimilar to any currently existing planes, is determined by Eq. (21). Thus

SRLI

is employed as aseeded SR toestablish anewplane

qP,,

and thencontinuestoStep 4.

Step 4: Finish the plane constructing decision phase, and thenconductthe nextmatchingphaserecursion.

The threshold

Th5,

utilized in Eq. (22) moderately influences the number of resultant planes. If

Ths,

is low, then the number ofresultant planes will be increased and a homogeneous region may be broken into more than one plane, althoughits influence on text extraction is not serious. Ifthe value is large however, then the number of planes is reduced, andsomeobjects maybe merged to a certain degree. Reasonably, its value should be tighter than the

Th,,

value, which isutilizedinthe pre-match condition to ensure that the determined SRN is sufficiently homogeneous with PSI(SR% and thus SI(SRN) can appropriately absorb homogeneous SRs near the extended influential area benefited from participation with SRN. Therefore, in our experiments, the value of

Ths5

is chosen as (3/4).

Th,,

. Normally, text-lines or text-blocks usuallyoccupyperceptible area of the image, and thus their width or height should be in appreciable proportion to those of the whole image. Therefore,

ThL=

min(N,,,NV

)/4 is used for experiments, where N,,

and

NV

are the numbers ofblock regions per row and per column, respectively.

D. Overall Process

Based on the three above-mentioned processing phases, the region matching and assembling process begins by applying the initial plane selection phase on allunclassified SRs in the Pool to determine the representative seeded SRs {SRm m m=1:N} for setting up Ninitial planes. Thus, the

matching phasecan then becarried out for the rest ofSRs in the Pool and the initial planes

P,

*, , . Then the

matchingphaseand theplane constructing decision phase are recursively performed in turns on the rest of the SRs in the Pool and emergingplanes, until each SR has been classified and associated with a particular plane, and the Pool is eventually cleared.

Consequently, afterthemulti-planeregion matchingand assembling process is completed, each of the homogenous objects with connected and similar features is separated into corresponding object planes. To facilitate further analysis, they are represented as

P,,

C, ...,

,L-

where L is the number of theresultant planes obtained. We use Fig. 1 as an example of the

proposed

method procedure. The original image in Fig. l(a) consists of three different colored text regions printed on a varying and shaded background. Moreover, the black charactersaresuperimposedonthe white characters. First, as shown in Fig.

1(b),

the original imageis transformedinto grayscale,and is divided into blockregions. Figures

1(c)-(i)

are seven object planes obtained from Fig. l(a) by performing the multi-plane segmentation technique. As aresult, the homogeneous objects in which all characters

(7)

and backgroundtextures are segmentedinto several separate

planes can effectively be analyzed in detail. By observing

these obtained planes,we can seethat threetextregionswith differentcharacteristicsaredistinctly separated. Extraction of

text strings from each binarized plane in which objects are

well separated, and canbe easily performed by ourprevious

proposed projection-based textextraction method [11]. After the text extraction process being conducted on all object

planes, the text lines extracted from these planes are then

collectedintoaresultanttextplane,as shown inFig.

o(j).

(a)Original image. (b) Sectored regions corresponding totheoriginalimage

(d) Object plane (P

(e) Object plane P2 (f)Object plane 3

..d s.^'

m--* ~sitL^*.X As4lt rz.itS

(g) Object plane P4 (h) Object plane (5

(j)Thetextplaneextractedfromall

~~~planesderived fromFig. (a) Fig. 1. Anexampleofthetestimage, "Calibre",processed bytheproposed multi-plane segmentation technique (imagesize:1929x1019)

IV. EXPERIMENTALRiESULTS

Theperformanceof theproposedmethod isevaluated in this section and compared to other existing text extraction methods, namely Jamn and Yu's color-quantization-based

method [6], and Pietikainen and Okun's edge-based method [4]. A set of 46 real-life complex document images was

employed forexperiments on performance evaluation oftext

extraction [12]. These test images are scanned from book

covers, book andmagazine pages, advertisements, and other real-life documents at the resolution of 200 dpi to 300 dpi. They were transformed into gray-scale images, and then

processed by the proposed method. Most are comprised of

characterstrings invariouscolorsorilluminations, font styles

and sizes which areoverlapped with multi-colored, textured, or uneven illuminated backgrounds. Figures 2(a) - 3(a) are

themostrepresentative samples with typical characteristics of complex document images, andmoreresults oftest samples inthe experimentalsetarealso shown in [12].

Figures2-3 show the results oftextextractionproduced

by the proposed method, Jain and Yu's color-quantization-based method [6], and Pietikainen and Okun's edge-based method [4]. Here the extraction results of Pietikainen and Okun's method in Figs 2(d) - 3(d) were converted into

masked images where the black maskwasadoptedtoexhibit

the non-textregion. Figure 2(a) contains background objects with sharp illumination variations across text regions, and someof these alsopossesssimilarcolors and illuminationsto

those characterstouched withthem. Asaresult, the character

illuminations are influenced and have gradational variations

due to the scanning process. After performing the proposed

method, the extractionresultsshown inFig. 2(b) demonstrate that themajority ofthe charactersare successfully segmented

fromthesharply varying backgrounds.AsshowninFig. 2(c), Jain and Yu's method fails to extract the large captions and many characters of the main text region, because many characters are fragmented due to the influence of those background objects in color-quantization process. The edge-based method extracts most characters except some broken

large characters and several missed small characters, as shown in Fig. 2(d); however, several graphical objects with sharply varying contours arealso identified as text,and thus the characters in extracted text regions are blurred. In Fig.

3(a), large portions of the main bodytexts are printed on a

large blacktextured and shadedregion, and thus the contrast

between the characters and this textured region is extremely degraded. As shown in Fig. 3(b), the characters in three different text regions are successfully extracted by the

proposed method. BothJain and Yu'smethodandthe edge-based method fail to extract characters from the textured region with degradedcontrast,asshown inFigs. 3(c)and3(d),

respectively. Therefore, as seen from Figs. 2 - 3, our

proposed method performs significantly better than Jain and Yu's method and the edge-based method in various difficult

cases. By observing the results obtained, while character

strings comprised of variousilluminations, sizes, and styles,

are overlapped with background objects with considerable variations incontrast, illumination, andtexture, nearly all the

textregionsareeffectivelyextractedbytheproposed method.

To quantitatively evaluate text extraction performance,

twomeasures, the recallrateandtheprecisionrate, whichare

commonly used for evaluating performance in information retrieval,areadopted. Theyarerespectively definedas,

,, + _No. of correctly extracted characters AX

recail

rate

_No. _{of actual characters}

precision rate=

No. of correctly extracted characters (25)

No. of extracted character-like components

Wecompute the recall and precision ratesfortextextraction results oftestimages in this study by manually counting the

number of actual characters of the document image, total :70:::Er

., i,,2.= .a.B>.,--X<. 70 --;;

i. i.- ..55-:;j,,,,;,f,fi.i-., .ff.§,i,z,,<-'u,,M-.,2s&>>.100

.,,,-,....,...Efi-;ES'SSES-'-E

iSiE1EU;7jR,,;,;,;,'),,,7.,{,,=,T:h::E

-000t0Xdyt'{=S';-'':i".'W-"''-''''==-"::i'""-"-"'S.ttitEStEtEST;

7: 7.t:f: ?;- t; bES,,:D iL.: tC'd: t47:S ffit: itiS

jf ff:0:: it0 itiC'VE t: .t L.E

iES::SXiti)iDiyS)iE0:

(c) Object plane (P0

... 1

-1 Li

(8)

extracted character-like components, and the correctly extracted characters, respectively. The quantitative evaluation were performed on our test set [12] of 46 complex document images totaling 22791 visible characters. Since these quantitative evaluation criteriaareperformed on the extracted connected-components, the results of Pietikainen and Okun's method [4] is inappropriate for quantitative evaluation using these criteria. Table I depicts the results of quantitative evaluation of Jain and Yu's method [6] and the proposed method. By observing Table 1, wecan see that the proposed method exhibits significantly better text extraction performanceascomparedwith that of Jain and Yu's method.

TableI.Experimentaldata of Jain and Yu'smethod and proposedmethod Method Recall Rate Precision Rate JainandYu'smethod 79.8% 95.2%

Proposedmethod 99.1 % 99.3% REFERENCES

[1] L. 0' Gorman, R. Kasturi, Document image analysis, IEEE Comput. Soc.Press,SilverSpring, MD, 1995.

[2] V. Wu, R. Manmatha, and E.M. Riseman, "Textfinder: an automatic system todetect andrecognize textin images," IEEE Trans. Pattern

Anal. Mach.Intell., vol. 21,no. 11,pp. 1224-1229, 1999.

[3] Y. M. Y. Hasan,and L. J. Karam,"MorphologicalTextExtractionfrom Images,"IEEE Trans. Image Process., vol. 9, no. 11, pp. 1978-1983, 2000.

[4] M.Pietikinenand0.Okun,"Edge-based method for text detection from

complexdocumentimages,"Proc. 6thInt'lConfDoc.Anal. Recognit., pp.286-291,2001.

[5] Y. Zhong, K. Karu, A. K. Jain, "Locating text in complex color images," Pattern Recognit.,vol.28, no.10,pp. 1523-1535, 1995. [6] A. K. Jain, B. Yu, "Automatic text location in images and video

frames," Pattern Recognit.,vol. 31, no. 12, pp. 2055-2076, 1998.

[7] C.Strouthopoulos,N.Papamarkos,A.E.Atsalakis,"Textextractionin

complex color documents,"Pattern Recognit.,vol. 35, pp. 1743-1758, 2002.

[8] B.-F. Wu, Y.-L. Chen, and C.-C.Chiu,"Adiscriminantanalysis based recursive automatic thresholding approach for image

segmentation,"IEICE Trans. Info. Systems,vol. E88-D, no.7, pp.1 716-1723,2005.

[9] R.R. Yagerand D.P. Filev, "Approximate clustering via the mountain

method," IEEETrans.Syst. ManCybern,vol. 24, no. 8, pp. 1279-1284, 1994.

[10]N.R. Pal and D. Chakraborty, "Mountain and subtractive clustering method: improvements and generalization," Int'l.J.Initell.Sy7st.,vol. 15, pp.329-341,2000.

[11]B.-F. Wu, Y.-L. Chen, and C.-C. Chiu, "Multi-layer segmentation methodforcomplexdocumentimages,"Int'l. J. PatternRecognit.Artif.

Intell., Vol.19,No.8,pp.997-1025,2005.

[12]Experimentalresultsofall testimages inourdatabaseareavailableat:

http://140.113.150.97/SMC06 TestDatabase.htm

SIEMENS SIEMENS

(b).Textextraction resultsby the (c).Textextractionresultsby Jain and

proposedmethod Yu'smethod

Fig. 2.Resultsofthe test image I (size: 2333x3153)

you

see

v

you

see

SAMSIH* TFTtCB~~~~~~~~~~M~S00

(b).Textextraction resultsbythe (c).TextextractionresultsbyJamnand

proposedmethod Yu'smethod

Fig.3.Results of thetestimage2(size:2405x 3207)

(d). I ext extractionresultsby

Pietikainenand Okun's method

(a). 1extextractionresultsby