Contents lists available at ScienceDirect
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
A multi-plane approach for text segmentation of complex document images
Yen-Lin Chen a, Bing-Fei Wu b,∗
a Department of Computer Science and Information Engineering, Asia University, 500 Liufeng Road, Wufeng, Taichung 41354, Taiwan
b Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta-Hsueh Road, Hsinchu 30010, Taiwan
A R T I C L E  I N F O

Article history:
Received 19 January 2008
Received in revised form 1 September 2008
Accepted 19 October 2008

Keywords:
Document image processing
Text extraction
Image segmentation
Multilevel thresholding
Region segmentation
Complex document images

A B S T R A C T
This study presents a new method, namely the multi-plane segmentation approach, for segmenting and extracting textual objects from various real-life complex document images. The proposed multi-plane segmentation approach first decomposes the document image into distinct object planes to extract and separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. This process consists of two stages: localized histogram multilevel thresholding, and multi-plane region matching and assembling. A text extraction procedure is then applied on the resultant planes to detect and extract textual objects with different characteristics in the respective planes. The proposed approach processes document images regionally and adaptively according to their respective local features. Hence the detailed characteristics of the extracted textual objects, particularly small characters with thin strokes, as well as gradational illuminations of characters, can be well preserved. Moreover, this approach also allows background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture to be handled easily and well. Experimental results on real-life complex document images demonstrate that the proposed approach is effective in extracting textual objects with various illuminations, sizes, and font styles from various types of complex document images.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Extraction of textual information from document images provides many useful applications in document analysis and understanding, such as optical character recognition, document retrieval, and compression [1,2]. To date, many techniques have been presented for extracting textual objects from monochromatic document images [3–6]. In recent years, advances in multimedia publishing and printing technology have led to an increasing number of real-life documents in which stylistic character strings are printed with pictorial, textured, and decorated objects and colorful, varied background components. However, most current approaches cannot extract textual objects well from real-life complex document images. Compared to monochromatic document images, text extraction from complex document images brings many difficulties associated with the complexity of background images, the variety and shading of character illuminations, the superimposing of characters on illustrations and pictures, as well as other decorated background components. As a result, there is an increasing demand for a system that is able to read and extract the textual information printed on pictorial and
∗ Corresponding author. Tel.: +886 3 5131538; fax: +886 3 5712385.
E-mail addresses: ylchen@asia.edu.tw (Y.-L. Chen), bwu@cssp.cn.nctu.edu.tw (B.-F. Wu).
0031-3203/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2008.10.032
textured regions in both colored images as well as monochromatic main text regions.
Several newly developed global thresholding methods are useful for separating textual objects from non-uniformly illuminated document images. Liu and Srihari [7] proposed a method based on texture features of character patterns, while Cheriet et al. [8] presented a recursive thresholding algorithm extended from Otsu's optimal criterion [9]. These methods classify pixels in the original image as foreground objects (particularly textual objects of interest) or as background according to their gray intensities in a global view, and are attractive because of their computational simplicity. However, binary images obtained by global thresholding techniques are subject to noise and distortion, especially because of uneven illumination and the spreading effect caused by the image scanner. To address these issues, Solihin and Leedham's integral ratio approaches [10] provided a new class of histogram-based thresholding techniques which classify pixels into three classes: foreground, background, and a fuzzy region between the two basic classes. In Ref. [11], Parker proposed a local gray-intensity gradient thresholding technique which is effective for extracting textual objects from badly illuminated document images. Because this method is based on the assumption of binary document images, its application is limited to extracting character objects from backgrounds no more complex than monotonically changing illuminations. A local and adaptive
binarization method was presented by Ohya et al. [12]. This method divides the original image into blocks of a specific size, determines an optimal threshold for each block to be applied at its center pixel, and uses interpolation to determine pixel-wise thresholds. It can effectively extract textual objects from images with complex backgrounds, on condition that the background illuminations are very bright compared with those of the textual objects.
Some other methods take a different viewpoint, extracting texts by modeling the features of textual objects and backgrounds. Kamel and Zhao [13] proposed the logical level technique to utilize local linearity features of character strokes, while Venkateswarlu and Boyle's average clustering algorithm [14] utilizes local statistical features of textual objects. These methods apply symmetric local windows of a pre-specified size and several pre-determined thresholds based on prior knowledge of the local features, so characters with stroke widths substantially thinner or thicker than the assumed stroke width, or characters in varying illumination contrasts with backgrounds, may not be appropriately extracted. To deal with these problems, Yang and Yan [15] presented an adaptive logical method (ALM) which applies the concepts of Liu and Srihari's run-length histogram [7] on sectored image regions, providing an effective scheme for automatically adjusting the size of the local window and the logical thresholding level. Ye et al.'s hybrid extraction method [16] integrates global thresholding, local thresholding, and double-edge stroke feature extraction techniques to extract textual objects from document images of different complexities. The double-edge technique is useful in separating characters whose stroke widths are within a specified size from uneven backgrounds. Some recently presented methods [17,18] utilized sub-image concepts to deal with the extraction of textual objects under different illumination contrasts with backgrounds. Dawoud and Kamel [17] proposed a multi-model sub-image thresholding method that considers a document image as a collection of pre-determined regions, i.e. sub-images; the textual objects contained in each sub-image are then segmented using statistical models of gray-intensity and stroke-run features.
In Amin and Wu's multi-stage thresholding approach [18], Otsu's global thresholding method is first applied; a connected-component labeling process is then applied on the thresholded image to determine the sub-images of interest, and these sub-images undergo another thresholding process to extract textual objects. The extraction performance of the above two methods relies principally on adequate determination of the sub-image regions. Thus, when textual objects overlap pictorial or textured backgrounds of poor and varying contrasts, suitable sub-images are hard to determine, and satisfactory extraction results are difficult to obtain.
Since most textual objects show sharp and distinctive edge features, methods based on edge information [19–22] have been developed. Such methods utilize an edge detection operator to extract the edge features of textual objects, and then use these features to extract texts from document images. Wu et al.'s textfinder system [20] uses nine second-order Gaussian derivative filters to obtain edge-feature vectors of each pixel at three different scales, and applies the K-means algorithm on these edge-feature vectors to identify the corresponding textual pixels. Hasan and Karam [21] introduced a method that utilizes a morphological edge extraction scheme, and applies morphological dilation and erosion operations on the extracted closure edges to locate textual regions. Edge information can also be treated as a measure for detecting the existence of textual objects in a specific region. In Pietikainen and Okun's work [22], edge features extracted by the Sobel operator are divided into non-overlapping blocks, and these blocks are classified as text or non-text according to the corresponding values of their edge features. Such edge-based methods are capable of extracting textual objects in different homogeneous illuminations from graphic
backgrounds. However, when textual objects adjoin or touch graphical objects, texture patterns, or backgrounds with sharply varying contours, the edge-feature vectors of non-text objects with similar characteristics may also be identified as textual ones, and the characters in the extracted textual regions are then blurred by those non-text objects. Moreover, when textual objects do not contrast sufficiently with non-text objects or backgrounds to form strong edge features, such textual objects cannot easily be extracted by edge-based methods.
In recent years, several color-segmentation-based methods for text extraction from color document images have been proposed. Zhong et al. [23] proposed two methods and a hybrid approach for locating texts in color images, such as CD jackets and book covers. The first method utilizes a histogram-based color clustering process to obtain connected components with uniform colors, and then applies several heuristic rules to classify them as textual or non-textual objects. The second method locates textual regions based on their distinctive spatial variance. To detect textual regions more effectively, both methods are combined into a hybrid approach. Although the spatial variance method still suffers from the drawbacks of the edge-based methods mentioned previously, the color connected-component method moderately compensates for these drawbacks. However, this approach still cannot provide acceptable results when the illuminations or colors of characters in large textual regions are shaded. Several recent techniques utilize color clustering or quantization approaches to determine the prototype colors of documents so as to facilitate the detection of character objects in the separated color planes. In Jain and Yu's work [24], a color document is decomposed into a set of foreground images in the RGB color space using bit-dropping quantization and the single-link color clustering algorithm. Strouthopoulos et al.'s adaptive color reduction technique [25] utilizes an unsupervised neural network classifier and a tree-search procedure to determine prototype colors. Some alternative color spaces are also adopted to determine prototype colors for finding textual objects of interest. Yang and Ozawa [26] make use of the HSI color space to segment homogeneous color regions to extract bibliographic information from book covers, while Hase et al. [27] apply a histogram-based approach in the CIE Lab color space to select prototype colors for obtaining textual regions.
However, most of the aforementioned methods have difficulties in extracting texts which are embedded in complex backgrounds or touch other pictorial and graphical objects. This is because the prototype colors are determined in a global view, so appropriate prototype colors cannot easily be selected to distinguish textual objects from touching pictorial objects and complex backgrounds without sufficient contrast. Furthermore, such problems also limit the reliability of these methods in handling unevenly illuminated document images.
In brief, extracting texts from complex document images involves several difficulties. These difficulties arise from the following properties of complex documents: (1) character strings in complex document images may have different illuminations, sizes, and font styles, and may be overlapped with various background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture, such as illustrations, photographs, pictures, or other background textures; and (2) these documents may comprise small characters with very thin strokes as well as large characters with thick strokes, and may be influenced by image shading. An approach for extracting black texts from such complex backgrounds to facilitate compression of document images has been proposed in our previous work [28].
In this study, we propose an effective method, namely the multi-plane segmentation approach, for segmenting and extracting textual objects of interest from these complex document images, and for resolving the above issues associated with the complexity of their backgrounds. The proposed multi-plane segmentation approach first decomposes the document image into distinct object planes to extract and separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. This process consists of two stages: localized histogram multilevel thresholding, and multi-plane region matching and assembling. A text extraction procedure is then applied on the resultant planes to detect and extract textual objects with different characteristics in the respective planes. The proposed approach processes document images regionally and adaptively by means of their local features. This allows the detailed characteristics of the extracted textual objects to be well preserved, especially small characters with thin strokes, as well as characters in gradational and shaded illumination contrasts. Thus, textual objects adjoining or touching pictorial objects and backgrounds with uneven, gradational, and sharp variations in contrast, illumination, and texture can be handled easily and well. Experimental results demonstrate that the proposed approach is capable of extracting textual objects with different illuminations, sizes, and font styles from different types of complex document images. Compared with other existing techniques, our proposed approach exhibits feasible and effective performance on text extraction from various real-life complex document images.
2. Overview of the proposed approach
The proposed multi-plane segmentation approach decomposes the document image into separate object planes by applying two processing stages: automatic localized histogram multilevel thresholding, and multi-plane region matching and assembling. The flow diagram of the proposed approach is illustrated in Fig. 1. In the first stage, the original image is sectored into non-overlapping "localized block regions", denoted by Θ^{i,j}; distinct objects embedded in the block regions are then decomposed into separate "sub-block regions (SRs)" by applying the localized histogram multilevel thresholding process, as illustrated in Figs. 2–4. Afterward, in the second stage, the multi-plane region matching and assembling process, which adopts both the localized spatial dissimilarity relation and the global feature information, is applied to perceptually classify and arrange the obtained SRs to compose a set of homogeneous "object planes", denoted by P_q, especially textual regions of interest. This proposed multi-plane region matching and assembling process is conducted by recursively applying the following three phases: the initial plane selection phase, the matching phase, and the plane construction phase, as depicted in Fig. 6. Consequently, homogeneous objects including textual regions of interest, non-text objects such as graphics and pictures, and background textures are extracted and separated into distinct object planes. The text extraction process is then performed on the resultant planes to extract the textual objects with different characteristics in the respective planes, as shown in Fig. 7. The important symbols utilized in the presentation of the proposed approach are listed in Table 1.
The following sections describe the detailed stages of the proposed approach, and are organized as follows. Sections 3 and 4 present, respectively, the two stages of the proposed multi-plane segmentation approach: the localized histogram multilevel thresholding procedure, and the multi-plane region matching and assembling process. Then, a simple text extraction procedure is described in Section 5. Next, Section 6 illustrates parameter adaptation and comparative performance evaluation results. Finally, the conclusions of this study are stated in Section 7.
3. Localized histogram multilevel thresholding
For complex document images with textual objects in different illuminations, sizes, and font styles, printed on varying or
Fig. 1. Block diagram of the proposed multi-plane segmentation approach.
inhomogeneous background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture, such as illustrations, photographs, pictures, or other background patterns, a critical difficulty arises in that no global segmentation techniques could
Fig. 3. Sectored regions of the illumination image Y obtained from the original image inFig. 2.
Fig. 4. Example of the results of the localized multilevel thresholding procedure; the resultant SF values of Θ^{2,1}, Θ^{1,2}, and Θ^{2,2} after the thresholding procedure are 0.931, 0.961, and 0.96, respectively: (a) part of the partitioned block regions of the image "Calibre" in Fig. 3, where the block regions enclosed by yellow ink are employed in the following examples of the localized multilevel thresholding procedure; (b) the upper-left block region Θ^{1,1}, SF_b = 0.577 and σ = 8.81; (c) SR^{1,1,0} derived from Θ^{1,1}, which is a homogeneous block region; (d) the upper-right block region Θ^{2,1}, SF_b = 0.931 and σ = 22.8; (e) SR^{2,1,0} derived from Θ^{2,1}; (f) SR^{2,1,1} derived from Θ^{2,1}; (g) the bottom-left block region Θ^{1,2}, SF_b = 0.804 and σ = 42.3; (h) SR^{1,2,0} derived from Θ^{1,2}; (i) SR^{1,2,1} derived from Θ^{1,2}; (j) SR^{1,2,2} derived from Θ^{1,2}; (k) SR^{1,2,3} derived from Θ^{1,2}; (l) the bottom-right block region Θ^{2,2}, SF_b = 0.835 and σ = 46.6; (m) SR^{2,2,0} derived from Θ^{2,2}; (n) SR^{2,2,1} derived from Θ^{2,2}; (o) SR^{2,2,2} derived from Θ^{2,2}; and (p) SR^{2,2,3} derived from Θ^{2,2}.
Fig. 5. Types of touching boundaries of the two 4-adjacent SRs: (a) vertical boundary and (b) horizontal boundary.
Fig. 6. An example of the test image, "Calibre", and the object planes obtained by the multi-plane segmentation (image size = 1929 × 1019): (a) object plane P0, (b) object plane P1, (c) object plane P2, (d) object plane P3, (e) object plane P4, (f) object plane P5, and (g) object plane P6.
work well for such kinds of document images. This is because when the regions of interesting textual objects, consisting of multiple colors or gray intensities, are undersized compared with those of the touching pictorial objects and complex backgrounds with indistinct contrasts, these textual objects cannot be discriminated in a global view of statistical features. A typical example with these characteristics is shown in Fig. 2. This sample image consists of three differently colored textual regions printed on a varying and shaded background. Moreover, the black characters are superimposed on the white characters. By observing some localized regions, the statistical features of the textual objects, pictorial objects, and backgrounds become much more distinguishable. Therefore, a regional and adaptive analysis of the localized statistical features can provide the detailed characteristics of the textual objects of interest to be well extracted for later document processing. In this section, we introduce a simple and effective localized segmentation approach as the first stage of the multi-plane segmentation process for extracting textual objects from complex document images.
The multi-plane segmentation process, if necessary, begins by applying a color-to-grayscale transformation on the RGB components of the image pixels in a color document image, to obtain its illumination image Y. After the color transformation is performed, the illumination image Y still retains the texture features of the original color image, as pointed out in Ref. [20], and thus the character strokes in their original colors are still well preserved. The obtained illumination image Y is then sectored into non-overlapping localized block regions Θ^{i,j} of a given size M_H × M_V, as shown in Fig. 3. To facilitate analysis in the following stage, the objects of interest must be extracted from these localized block regions into separate SRs, each of which contains objects with homogeneous features. Toward this goal, the discriminant criterion is useful for measuring separability among the decomposed regions with different objects. Its application to bi-level global thresholding to extract foreground objects from the background was first presented by Otsu [9]. This method is ranked as the most effective bi-level threshold selection method [29,30]. However, when the number of desired thresholds increases, the computation needed to obtain the optimal threshold values increases substantially, and the search for the optimal value of the criterion function becomes particularly exhaustive.
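As a concrete illustration of this first step, the sketch below converts an RGB image to an illumination image Y and sectors it into non-overlapping localized block regions. The luminance weights (ITU-R BT.601) are our assumption, since the paper does not spell out the transform, and the function names are ours, not the paper's:

```python
import numpy as np

def to_illumination(rgb):
    """Color-to-grayscale transform producing the illumination image Y.
    Assumes the common BT.601 luminance weights (the paper does not
    specify the exact transform)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def sector_blocks(y, mh, mv):
    """Sector Y into non-overlapping MH x MV localized block regions,
    keyed by their location index (i, j); border blocks may be smaller."""
    h, w = y.shape
    return {(i, j): y[j * mv:(j + 1) * mv, i * mh:(i + 1) * mh]
            for i in range((w + mh - 1) // mh)
            for j in range((h + mv - 1) // mv)}

rgb = np.zeros((64, 96, 3))
rgb[..., 1] = 255.0                  # a pure-green test image
y = to_illumination(rgb)
blocks = sector_blocks(y, 32, 32)    # 3 columns x 2 rows = 6 regions
```

Each value of `blocks` is a view into Y, so the subsequent per-region histogram analysis adds no copying cost.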
Hence, an efficient multilevel thresholding technique is needed to automatically determine the suitable number of thresholds to segment the block region into different decomposed object regions. By using the properties of discriminant analysis, we have proposed an automatic multilevel global thresholding technique for image
Fig. 7. Examples of the text location and extraction process: (a) example of performing X-cut on connected components in the binary plane BP4 of Fig. 6(g); (b) example of performing Y-cut on the top connected-component group, which is the first among five groups obtained from the X-cut procedure on BP4; (c) the resultant candidate text-lines obtained by the XY-cut spatial clustering process; and (d) the resultant text plane obtained by performing the text extraction process on all object planes derived from Fig. 2.
Table 1
List of important symbols of the proposed approach.

Symbol          Description
Θ^{i,j}         Localized block region: one of the non-overlapping block regions sectored from the original image; the superscript (i,j) denotes its location index
SR^{i,j,k}, SR_q^{i,j,k}   Sub-block region, derived from Θ^{i,j} by applying the localized histogram multilevel thresholding process; the additional superscript k means it is the k-th SR derived from Θ^{i,j}, and when the subscript q is assigned, this SR has been attributed to an existent object plane P_q
P_q             Object plane, formed by a set of homogeneous SRs after performing the multi-plane region matching and assembling process; the subscript q represents its order of creation
segmentation [31]. This technique extends and applies the concept of the discriminant criterion to analyzing the separability among the gray levels in the image. It can automatically determine the suitable number of thresholds, and utilizes a fast recursive selection strategy to select the optimal thresholds to segment the image into separate objects with similar features in a computationally frugal way. Based on this effective technique, we introduce a localized histogram multilevel thresholding process to decompose distinct objects with homogeneous features in localized block regions into separate SRs. This process is described in the following subsections.
3.1. Statistical features and recursive partition concepts of localized regions
Let f_g denote the observed frequency (histogram) of pixels in a localized block region Θ^{i,j} having a given gray intensity g; the total number of pixels in Θ^{i,j} is thus N = f_0 + f_1 + ⋯ + f_{U−1}, where U is the number of gray intensities in the histogram. Hence, the normalized probability of one pixel having a given gray intensity can be computed as

P_g = f_g/N, where P_g ≥ 0 and Σ_{g=0}^{U−1} P_g = 1    (1)

In order to segment textual objects, foreground objects, and background components in a given localized region Θ^{i,j}, the pixels in Θ^{i,j} should be partitioned into a suitable number of classes. For multilevel thresholding, with n thresholds partitioning the pixels in the region Θ^{i,j} into n+1 classes, the gray intensities of pixels in Θ^{i,j} are segmented by applying a threshold set T composed of n thresholds, where T = {t_k | k = 1, ..., n}. These classes are represented by C_0 = {0, 1, ..., t_1}, ..., C_k = {t_k+1, t_k+2, ..., t_{k+1}}, ..., C_n = {t_n+1, t_n+2, ..., U−1}. Then the statistical features associated with a given pixel class C_k, namely the cumulative probability, the mean, and the standard deviation, denoted by w_k, μ_k, and σ_k, respectively, can be computed as

w_k = Σ_{g=t_k+1}^{t_{k+1}} P_g,  μ_k = (Σ_{g=t_k+1}^{t_{k+1}} g P_g)/w_k,  σ_k² = (Σ_{g=t_k+1}^{t_{k+1}} P_g (g − μ_k)²)/w_k    (2)

Based on the above statistical features of the pixels in the region Θ^{i,j}, the between-class variance, denoted by v_BC, an effective criterion for evaluating segmentation results, can be obtained for measuring the separability among all classes, and is expressed as

v_BC(T) = Σ_{k=0}^{n} w_k (μ_k − μ)², where μ = Σ_{g=0}^{U−1} g P_g    (3)

where μ is the overall mean of the gray intensities in Θ^{i,j}. The total within-class variance v_WC and the total variance σ² of all segmented classes of gray intensities are, respectively, computed as

v_WC(T) = Σ_{k=0}^{n} w_k σ_k²,  σ² = Σ_{g=0}^{U−1} (g − μ)² P_g    (4)

Here, a dummy threshold t_0 = 0 is utilized for the sake of convenience in simplifying the expression of the equation terms.
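Eqs. (1)-(4) can be checked numerically. The following minimal sketch (our own illustration; the toy histogram and the threshold set are arbitrary) computes the class statistics of Eq. (2) and verifies the decomposition σ² = v_BC + v_WC:

```python
import numpy as np

def class_stats(P, T, U=256):
    """w_k, mu_k, sigma_k^2 of the classes induced by the threshold
    set T = [t_1, ..., t_n] on a normalized histogram P (Eq. (2))."""
    bounds = [0] + [t + 1 for t in T] + [U]
    g = np.arange(U)
    stats = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = P[lo:hi].sum()
        if w == 0:
            stats.append((0.0, 0.0, 0.0))
            continue
        mu = (g[lo:hi] * P[lo:hi]).sum() / w
        var = (P[lo:hi] * (g[lo:hi] - mu) ** 2).sum() / w
        stats.append((w, mu, var))
    return stats

def variances(P, T, U=256):
    """Between-class v_BC (Eq. (3)) and within-class v_WC (Eq. (4))."""
    g = np.arange(U)
    mu_total = (g * P).sum()
    s = class_stats(P, T, U)
    v_bc = sum(w * (mu - mu_total) ** 2 for w, mu, _ in s)
    v_wc = sum(w * var for w, _, var in s)
    return v_bc, v_wc

# toy trimodal histogram: all mass at gray levels 20, 128, and 230
f = np.zeros(256)
f[20], f[128], f[230] = 300, 500, 200
P = f / f.sum()                              # Eq. (1)
v_bc, v_wc = variances(P, [70, 180])         # T = {70, 180}, n = 2
g = np.arange(256)
sigma2 = ((g - (g * P).sum()) ** 2 * P).sum()
```

Because each class here holds a single gray level, v_WC vanishes and v_BC equals the total variance, which is the ideal case the SF criterion below rewards.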
The aforementioned criterion functions can be considered as measures of separability among all existing classes decomposed from the original region Θ^{i,j}. We utilize this concept as a criterion for the automatic segmentation of objects in a region, denoted in this study by the "separability factor" SF, which is defined as

SF = v_BC(T)/σ² = 1 − v_WC(T)/σ²    (5)

where σ² serves as the normalization factor in this equation. The SF value represents the separability measure among all existing classes, and lies within the range SF ∈ [0, 1]; the lower bound is approached when the region Θ^{i,j} comprises a uniform gray intensity, while the upper bound is achieved when the region Θ^{i,j} consists of exactly n+1 gray intensities. The objective is to maximize the SF value so as to optimize the segmentation result. This concept is supported by the property that σ² is equivalent to the sum of v_BC and v_WC. Observing the terms comprising v_WC(T): if the gray intensities of the pixels belonging to most existing classes are widely distributed, i.e. the contributions of their class variances σ_k² are large, then the value of the corresponding SF measure becomes low. Accordingly, when SF approximates 1.0, all resultant classes of gray intensities C_k (k = 0, ..., n), which are decomposed from the original region Θ^{i,j}, are ideally and completely separated.
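The bounds of SF can be demonstrated on two extreme histograms: a region holding exactly n+1 = 2 distinct gray levels yields SF = 1 under a single threshold, while a broadly spread histogram yields a lower value. A sketch of this, reusing the definitions of Eqs. (1)-(5):

```python
import numpy as np

def separability_factor(P, T, U=256):
    """SF = v_BC(T)/sigma^2 = 1 - v_WC(T)/sigma^2 (Eq. (5))."""
    g = np.arange(U)
    mu = (g * P).sum()
    sigma2 = ((g - mu) ** 2 * P).sum()
    bounds = [0] + [t + 1 for t in T] + [U]
    v_wc = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = P[lo:hi].sum()
        if w == 0:
            continue
        mu_k = (g[lo:hi] * P[lo:hi]).sum() / w
        # w_k * sigma_k^2 written directly as the class's mass-weighted spread
        v_wc += (P[lo:hi] * (g[lo:hi] - mu_k) ** 2).sum()
    return 1.0 - v_wc / sigma2

# exactly two gray levels split by one threshold: SF reaches the upper bound
P_bi = np.zeros(256)
P_bi[40], P_bi[200] = 0.5, 0.5
# a flat histogram split by the same threshold: SF is clearly lower
P_flat = np.full(256, 1 / 256)
sf_hi = separability_factor(P_bi, [120])
sf_lo = separability_factor(P_flat, [120])
```

Note that a perfectly uniform region makes σ² zero, so SF is undefined there; that case is handled separately by the homogeneity test of Section 3.1.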
Therefore, based on this efficient discriminant criterion, automatic multilevel thresholding can be applied to recursively segment the block region Θ^{i,j} into different objects of homogeneous illuminations, regardless of the number of objects and the image complexity of the region Θ^{i,j}. It can be performed until the SF measure is large enough to show that the appropriate discrepancy among the resultant classes has been obtained. Through the aforementioned properties, this objective can be achieved by minimizing the total within-class variance v_WC(T), using a scheme that, in each recursion, selects the class with the maximal contribution (w_k σ_k²) to the total within-class variance for the bi-class partition procedure. Thus, the SF measure most rapidly reaches the maximal increment needed to satisfy sufficient separability among the resultant classes of pixels. As a result, objects with homogeneous gray intensities are well separated.
The class having the maximal within-class variance contribution w_k σ_k² is denoted by C_p, and it comprises a subset interval of gray intensities represented by C_p: {t_p+1, t_p+2, ..., t_{p+1}}. Then a simple and effective bi-class partition procedure, as described in Ref. [31], is performed on the determined C_p in each recursion until the separability among all classes becomes satisfactory, i.e. until the SF measure approximates a sufficiently large value. The class C_p is divided into two classes C_p0 and C_p1 by applying the optimal threshold t_S* determined by the localized histogram-based selection procedure described in Ref. [31]. The resultant classes C_p0 and C_p1 comprise the subsets of gray intensities derived from C_p and can be represented as C_p0: {t_p+1, t_p+2, ..., t_S*} and C_p1: {t_S*+1, t_S*+2, ..., t_{p+1}}. The threshold values determined by this recursive selection strategy are ensured to achieve maximum separation of the resultant segmented classes of gray intensities, and hence satisfactory segmentation results of objects can be accomplished with the smallest number of thresholding levels.
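The recursion described above, i.e. repeatedly bi-partitioning the class with the largest contribution w_k σ_k² until SF is sufficiently large, can be sketched as follows. Plain Otsu bi-class selection stands in for the histogram-based selection procedure of Ref. [31], and the stopping value 0.95 is an illustrative choice rather than the paper's tuned parameter:

```python
import numpy as np

def otsu_split(P, lo, hi):
    """Bi-class partition of the gray-level interval [lo, hi): return the
    threshold t maximizing the between-class variance (a stand-in for the
    selection procedure of Ref. [31])."""
    g = np.arange(lo, hi)
    p = P[lo:hi]
    best_t, best_v = lo, -1.0
    for t in range(lo, hi - 1):
        m = t - lo + 1
        w0, w1 = p[:m].sum(), p[m:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (g[:m] * p[:m]).sum() / w0
        mu1 = (g[m:] * p[m:]).sum() / w1
        v = w0 * w1 * (mu0 - mu1) ** 2
        if v > best_v:
            best_t, best_v = t, v
    return best_t

def multilevel_thresholds(P, sf_goal=0.95, U=256):
    """Recursively split the class with maximal w_k * sigma_k^2 until
    SF (Eq. (5)) reaches sf_goal; returns the threshold set T. Assumes
    a non-homogeneous block (sigma^2 > 0); homogeneous blocks are
    filtered out beforehand by the test of Eq. (6)."""
    g = np.arange(U)
    mu = (g * P).sum()
    sigma2 = ((g - mu) ** 2 * P).sum()
    T = []
    while True:
        bounds = [0] + [t + 1 for t in sorted(T)] + [U]
        contrib = []                       # w_k * sigma_k^2 per class
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            w = P[lo:hi].sum()
            if w == 0:
                contrib.append(0.0)
                continue
            mu_k = (g[lo:hi] * P[lo:hi]).sum() / w
            contrib.append((P[lo:hi] * (g[lo:hi] - mu_k) ** 2).sum())
        if 1.0 - sum(contrib) / sigma2 >= sf_goal:
            return sorted(T)
        p = int(np.argmax(contrib))        # the class C_p to bi-partition
        T.append(otsu_split(P, bounds[p], bounds[p + 1]))

# three well-separated gray levels require exactly two thresholds
P = np.zeros(256)
P[30], P[128], P[220] = 0.3, 0.4, 0.3
```

On this toy histogram the first recursion splits off the darkest level and the second isolates the remaining two, after which SF = 1 and the recursion stops.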
Furthermore, if a region Θ^{i,j} comprises a set of pixels with homogeneous gray intensities, most of them are parts of a large homogeneous background region, and partitioning it would be redundant segmentation at needless computational cost. For example, Fig. 4(b) is a block region with such characteristics. Therefore, before performing the first partition procedure on the region Θ^{i,j}, an investigation of the homogeneity of Θ^{i,j} should be conducted in advance to avoid such redundant segmentation. This condition can be determined by evaluating the following two statistical features: (1) the bi-class SF measure, denoted SF_b, which is the SF value obtained by performing the initial bi-class partition procedure on the region Θ^{i,j}, i.e. the SF value associated with the determined threshold t_S*; and (2) the standard deviation σ of the gray intensities of the pixels in the entire region Θ^{i,j}. According to the aforementioned properties, the SF_b value reflects the separability of the statistical distribution of gray intensities of pixels in the entire region Θ^{i,j}: the lower the SF_b value, the more indistinct or uniform the distribution. The standard deviation σ indicates whether the distribution of gray intensities in Θ^{i,j} is widely dispersed or narrowly aggregated. Therefore, a region Θ^{i,j} is determined to be a homogeneous region, comprising a set of homogeneous pixels of a uniform object or parts thereof, if both the SF_b and σ features reveal low values. On the other hand, if SF_b is small but σ is large, the region Θ^{i,j} may consist of many indistinct object regions with low separability, and should still undergo the recursive partition process to separate all objects. Based on the above-mentioned phenomenon, a region Θ^{i,j} can be recognized as a homogeneous region if the following homogeneity condition is satisfied:

SF_b ≤ h0 and σ ≤ h1    (6)

where h0 and h1 are pre-defined thresholds. If a region Θ^{i,j} is recognized as a homogeneous region, then it does not need to undergo the partition process, and its pixels of homogeneous objects are kept unchanged to be processed by the next stage.
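The homogeneity test of Eq. (6) can be sketched as follows; the threshold values h0 and h1 used here are illustrative placeholders, not the values the paper determines empirically in Section 6:

```python
import numpy as np

def region_stats(block):
    """Normalized histogram P_g and standard deviation sigma of a block."""
    f = np.bincount(block.ravel().astype(int), minlength=256)
    P = f / f.sum()
    g = np.arange(256)
    mu = (g * P).sum()
    return P, np.sqrt(((g - mu) ** 2 * P).sum())

def bi_class_sf(P):
    """SF_b: the SF value of the best single-threshold (bi-class)
    partition, i.e. the maximum over t of v_BC(t)/sigma^2."""
    g = np.arange(256)
    mu = (g * P).sum()
    sigma2 = ((g - mu) ** 2 * P).sum()
    if sigma2 == 0:
        return 0.0                      # perfectly uniform region
    w0 = np.cumsum(P)[:-1]              # cumulative mass up to each t
    m0 = np.cumsum(g * P)[:-1]          # cumulative first moment
    valid = (w0 > 0) & (w0 < 1)
    v_bc = np.where(valid,
                    (mu * w0 - m0) ** 2 / (w0 * (1 - w0) + 1e-12), 0.0)
    return float(v_bc.max() / sigma2)

def is_homogeneous(block, h0=0.6, h1=16.0):
    """Homogeneity condition of Eq. (6): SF_b <= h0 and sigma <= h1.
    h0 and h1 are illustrative, not the paper's tuned values."""
    P, sigma = region_stats(block)
    return bi_class_sf(P) <= h0 and sigma <= h1

flat = np.full((32, 32), 200)           # uniform background block
bimodal = np.concatenate([np.full(512, 40),
                          np.full(512, 210)]).reshape(32, 32)
```

A flat block passes the test and skips partitioning entirely, while a bimodal block (high SF_b and high σ) falls through to the recursive partition process.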
3.2. Recursive partition process of localized regions
Based on the above-mentioned concepts, the localized automatic multilevel thresholding process is performed by the following recursive steps:
Step 1: To begin, the illumination image Y of size W_img × H_img is divided into localized block regions Λ_{i,j} of the given size M_H × M_V, as shown in Fig. 3. Here (i, j) are the location indices, with i = 0, ..., N_H and j = 0, ..., N_V, where N_H = (W_img/M_H) − 1 and N_V = (H_img/M_V) − 1 represent the numbers of divided block regions per row and per column, respectively.
Step 2: For each block region Λ_{i,j}, compute the histogram of pixels in Λ_{i,j}, and then determine its associated standard deviation σ_{i,j} and bi-class separability measure SF_b. Initially there is only one class, C_0^{i,j}; let q represent the present number of classes, and thus set q = 1. If the homogeneity condition, i.e. Eq. (6), is satisfied, then skip the localized thresholding process for this region Λ_{i,j} and go to Step 7; else perform the following steps.
Step 3: Currently, q classes exist, having been decomposed from Λ_{i,j}. Compute the class probability w_k^{i,j}, the class mean μ_k^{i,j}, and the standard deviation σ_k^{i,j} of each existing class C_k^{i,j} of gray intensities decomposed from Λ_{i,j}, where k denotes the index of the present classes and k = 0, ..., q − 1.
Step 4: From all classes C_k^{i,j}, determine the class C_p^{i,j} which contributes the maximal within-class variance v_WC^{i,j} of Λ_{i,j}, to be partitioned in the next step in order to achieve the maximal increment of SF.
Step 5: Partition C_p^{i,j} : {t_p^{i,j} + 1, t_p^{i,j} + 2, ..., t_{p+1}^{i,j}} into two classes C_{p0}^{i,j} : {t_p^{i,j} + 1, ..., t_S^{i,j*}} and C_{p1}^{i,j} : {t_S^{i,j*} + 1, ..., t_{p+1}^{i,j}}, using the optimal threshold t_S^{i,j*} determined by the bi-class partition procedure. Consequently, the gray intensities of the region Λ_{i,j} are partitioned into q + 1 classes, C_0^{i,j}, ..., C_{p0}^{i,j}, C_{p1}^{i,j}, ..., C_{q−1}^{i,j}; then let q = q + 1 to update the current class count.
Step 6: Compute the SF value of all currently obtained classes using Eq. (5). If the objective condition, SF ≥ T_SF, is satisfied, then perform the following Step 7; otherwise, go back to Step 3 to conduct a further partition of the obtained classes.
Step 7: Classify the pixels of the block region Λ_{i,j} into separate SRs, SR_{i,j,0}, SR_{i,j,1}, ..., SR_{i,j,q−1}, corresponding to the partitioned classes of gray intensities C_0^{i,j}, C_1^{i,j}, ..., C_{q−1}^{i,j}, respectively, where the notation SR_{i,j,k} represents the k-th SR decomposed from the region Λ_{i,j}. Consequently, we obtain

⋃_{k=0}^{q−1} SR_{i,j,k} = Λ_{i,j}, and SR_{i,j,k1} ∩ SR_{i,j,k2} = ∅ for k1 ≠ k2

Then, finish the localized thresholding process on Λ_{i,j}, go back to Step 2, and repeat Steps 2–6 to recursively partition the remaining block regions; if all block regions have been processed, go to Step 8.
Step 8: Terminate the segmentation process and deliver all obtained SRs of the corresponding block regions.
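The recursion of Steps 2–7 for a single block region can be sketched as below. This is not the paper's exact implementation: Otsu's between-class-variance split stands in for the bi-class partition procedure, the ratio of between-class to total variance stands in for the SF measure, and `otsu_split`, `sf`, and `multilevel_threshold` are hypothetical helper names; the default thresholds follow the values suggested in the text.

```python
from statistics import mean, pvariance

def otsu_split(values):
    """Stand-in for the paper's bi-class partition procedure: the threshold
    maximizing the between-class variance (Otsu's criterion)."""
    best_t, best_bcv = None, -1.0
    for t in sorted(set(values))[:-1]:
        lo = [v for v in values if v <= t]
        hi = [v for v in values if v > t]
        bcv = len(lo) * len(hi) / len(values) ** 2 * (mean(lo) - mean(hi)) ** 2
        if bcv > best_bcv:
            best_t, best_bcv = t, bcv
    return best_t

def sf(classes, total_var):
    """Stand-in separability measure: between-class variance over total
    variance, approaching 1.0 as the classes become well separated."""
    n = sum(len(c) for c in classes)
    mu = sum(len(c) * mean(c) for c in classes) / n
    bcv = sum(len(c) * (mean(c) - mu) ** 2 for c in classes) / n
    return bcv / total_var if total_var > 0 else 1.0

def multilevel_threshold(region, t_sf=0.92, t_h0=0.6, t_h1=11.0):
    """Steps 2-7 for one block region `region` (a flat list of gray values):
    recursively split the class with the largest within-class variance
    until SF >= T_SF."""
    total_var = pvariance(region)
    # Step 2: homogeneity test on the bi-class split (Eq. (6))
    t0 = otsu_split(region)
    sf_b = 0.0 if t0 is None else sf(
        [[v for v in region if v <= t0], [v for v in region if v > t0]],
        total_var)
    if sf_b <= t_h0 and total_var ** 0.5 <= t_h1:
        return [sorted(region)]           # homogeneous: keep as one class
    classes = [sorted(region)]
    while sf(classes, total_var) < t_sf:  # Step 6: objective condition
        # Step 4: pick the class contributing the largest within-class variance
        wcv = [len(c) * pvariance(c) for c in classes]
        if max(wcv) == 0.0:
            break                         # nothing left to split
        p = wcv.index(max(wcv))
        t = otsu_split(classes[p])        # Step 5: bi-class partition of C_p
        classes[p:p + 1] = [[v for v in classes[p] if v <= t],
                            [v for v in classes[p] if v > t]]
    return classes
```

On a tri-level region the loop splits twice and stops once the stand-in SF reaches 1.0; a constant region is caught by the homogeneity test and returned as a single class.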
Here the separability measure threshold T_SF is a pre-defined threshold that determines whether the segmented objects in the block regions are sufficiently separated to satisfy the objective condition. From our experimental analysis of the block regions containing textual objects in the test images, most achieve satisfactory segmentation of homogeneous objects when their resultant SF values exceed 0.92 after the segmentation procedure, and complementary experimental analysis described in Ref. [31] shows similar results. Therefore, the value of T_SF is set to 0.92 to yield satisfactory segmentation results on the block regions. The thresholds T_{h0} and T_{h1} utilized in the homogeneity condition can be determined in a similar way. Observing the non-textual background regions containing pixels of homogeneous gray intensities, their associated SF_b features mostly reflect small values below 0.6, accompanied by corresponding standard deviation features below 11. Therefore, the thresholds T_{h0} and T_{h1} are chosen as 0.6 and 11, respectively, to detect non-textual homogeneous block regions before performing the thresholding process. Unnecessary segmentations that produce redundant SRs can thus be avoided, saving computation in the localized multilevel thresholding and the subsequent multi-plane region matching and assembling process.
With regard to the size parameters M_H × M_V of each block region, smaller block regions are desirable in order for the localized thresholding process to adapt to steep gradations and to extract the foreground objects in greater detail. Small objects can then be segmented more clearly, but at the cost of greater computation when performing the subsequent multi-plane region matching and assembling process. Suitably large values of M_H and M_V should therefore be chosen to moderately localize and accommodate the allowable character sizes, so that the textual objects contained in the images can still be clearly segmented. Hence, given an input document image, M_H and M_V should be automatically determined with respect to its scanning resolution RES (pixels per inch) by applying a size mapping parameter d:

M_H = M_V = d · RES   (7)

Based on the analysis of typical character sizes described in Ref. [32], and the fact that typical resolutions for scanning most real-life document images range from 200 to 600 dpi, the value of d is reasonably determined as 0.4 according to the typical allowable character sizes with respect to the scanning resolution RES. In this way, the size of each block region is about 10 × 10 mm² at different scanning resolutions: M_H = M_V = 80, 120, and 240 at 200, 300, and 600 dpi, respectively. These parameters were determined through experiments on numerous real-life document samples with various characteristics in our experimental set, such that nearly all foreground and textual objects in various document images were appropriately separated in the preliminary experiments.
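Eq. (7) with d = 0.4 can be checked directly; `block_size` is an illustrative helper name:

```python
def block_size(res_dpi, d=0.4):
    """Eq. (7): M_H = M_V = d * RES, keeping each block region at roughly
    10 x 10 mm regardless of the scanning resolution (d = 0.4 as in the
    text)."""
    return round(d * res_dpi)
```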
We utilize Fig. 4 as an example of performing the localized automatic multilevel thresholding procedure on several block regions. Here Fig. 4(a) is part of the sectored sample image in Fig. 3. Figs. 4(b), (d), (g), and (l) show four adjacent block regions, Λ_{i1,j1}, Λ_{i2,j1}, Λ_{i1,j2}, and Λ_{i2,j2}, with their corresponding SF_b and σ values, to illustrate the localized thresholding procedure. Fig. 4(b) is a homogeneous block region, properly detected by the homogeneity condition, and its pixels are therefore kept intact in Fig. 4(c). Figs. 4(d), (g), and (l) are block regions comprised of multiple homogeneous objects. After the localized histogram multilevel thresholding procedure has been performed, the different objects in these localized regions are distinctly segmented into separate SRs from darkest to lightest, and their corresponding resultant SF values approach 1.0, as shown in Figs. 4(e), (f), (h)–(k), and (m)–(p), respectively.
4. Multi-plane region matching and assembling process

Having decomposed all localized block regions into several separate classes of pixels by the localized multilevel thresholding procedure, the various objects embedded or superimposed in different background objects and textures are separated into relevant SRs. We then need a methodology for grouping them into meaningful objects, especially textual objects of interest, for the further extraction process. Concepts of grouping pixels into meaningful regions are widely applied in region-based image segmentation [33,34]. Nevertheless, contemporary pixel-based image segmentation techniques do not work well for segmenting textual objects in complex document images. More commonly, performing pixel-based region segmentation on textual objects may cause extracted printed characters to be fragmented, falsely connected, or occluded by non-text pictorial objects or background textures. Moreover, this approach incurs heavy computational costs when applied to real-life document images scanned at 200–600 dpi resolutions.
Therefore, there is a need for an effective segmentation approach that deals with regions instead of pixels, offering a considerable reduction in computational complexity while appropriately preserving the structural characteristics of extracted textual objects, particularly those of small characters with thin strokes. In this section, we present a multi-plane region matching and assembling method, which adopts both the localized spatial dissimilarity relation and the global feature information to perceptually classify and assemble the obtained SRs into a set of object planes (P_q) of homogeneous features, especially textual regions of interest. The proposed multi-plane region matching and assembling process is conducted by recursively performing the following three phases: the initial plane selection phase, the matching phase, and the plane construction phase.
4.1. Overview and basic definitions
To facilitate the matching and assembling process of the SRs obtained from the previous procedure, several concepts and definitions of statistical and spatial features of the SRs are introduced in this subsection. First, given that the localized multilevel thresholding process segments the N_H × N_V block regions of the original image into r SRs, a hypothetical “Pool” is adopted to initially collect these obtained SRs, representing that they are not yet classified into any object plane. The term 4-adjacent refers to the situation in which each SR has four sides that border the top, bottom, left, or right boundary of its adjoining SRs. The SRs comprised of objects with homogeneous features are assembled to form an object plane P_q. An object plane P_q represents a set of matching SRs such that, for each pair of SRs in P_q, there is some finite chain of SRs connecting them in which each successive pair of SRs is 4-adjacent.
Furthermore, each SR may comprise several connected object regions of pixels decomposed from its associated block region Λ_{i,j}. The pixels that belong to the object regions of a certain SR are said to be the object pixels of this SR, while the other pixels in this SR are non-object pixels. The set of the object pixels in an SR indexed at (i, j, k) is defined as follows:

OP(SR_{i,j,k}) = {g(SR_{i,j,k}, x, y) | the pixel at (x, y) is an object pixel in SR_{i,j,k}}

where g(SR_{i,j,k}, x, y) is the gray intensity of the pixel at location (x, y) in SR_{i,j,k}, the range of x is within [0, M_H − 1], and y is within [0, M_V − 1]. The total number of object pixels in SR_{i,j,k}, i.e. the cardinality of OP(SR_{i,j,k}), is represented by N_op(SR_{i,j,k}). Then, a mean feature μ(SR_{i,j,k}) is also accordingly obtained for each of these SRs. Here μ(SR_{i,j,k}) is the mean of the gray intensities of the object pixels comprised by SR_{i,j,k}, and is equivalent to μ_k^{i,j} obtained in the localized multilevel thresholding process.
Accordingly, given the unclassified SRs in the Pool, the initial plane selection phase is first performed on these unclassified SRs to determine a representative set of seed SRs {SR*_m, m = 0 : N − 1}, and then N initial object planes {P_m : m = 0 : N − 1} are set up based on these selected seed SRs. Afterward, the matching phase is performed on the remaining unclassified SRs in the Pool and these initial planes, to determine the association and belongingness of these SRs with the existing object planes. For unclassified SRs whose features are perceptibly distinct from the currently existing planes, the plane construction phase is then conducted to create and initialize an appropriate new plane, so that SRs with such features can be assembled into this new plane to form another homogeneous object region in the subsequent matching phase recursion. After the first pass of the multi-plane region matching and assembling process has been performed, the matching phase and the plane construction phase are recursively performed in turn on the remaining unclassified SRs in the Pool and the emerging planes, until each SR has been classified and associated with a particular plane and the Pool is eventually cleared. As a result, the whole illumination image Y is segmented into a set of separate object planes {P_q : q = 0 : L − 1}, each of which consists of homogeneous objects with connected and similar features, such as textual regions of interest, non-text objects such as graphics and pictures, and background textures. Consequently, we obtain

⋃_{q=0}^{L−1} P_q = Y, with P_{q1} ∩ P_{q2} = ∅ for q1 ≠ q2

where L is the number of resultant planes obtained. In the following subsections, we describe the detailed elements of the proposed multi-plane region matching and assembling process.
4.2. Initial plane selection phase
In this initial processing phase, determining the number and approximate locations of the significant clusters of SRs in the Pool can improve the speed and accuracy of the final convergence of the multi-plane region matching and assembling process. For this purpose, the subtractive/mountain clustering technique [35,36] is applied to determine the SRs with the most prominent and representative gray intensity features in the Pool. The SRs selected as seeds by the mountain clustering process are then adopted to establish a set of initial object planes, for clustering the SRs that share homogeneous features with them.
The mountain method is a fast, one-pass algorithm which utilizes the density of features to determine the most representative feature points as approximate cluster centers. Here we employ the mean features associated with the SRs, i.e. μ(SR), as the feature points in the mountain clustering process. To facilitate the description of the mountain clustering process, the region dissimilarity measure D_RM between a pair of SRs, SR_{i,j,k} and SR_{i′,j′,k′}, is defined as

D_RM(SR_{i,j,k}, SR_{i′,j′,k′}) = ||μ(SR_{i,j,k}) − μ(SR_{i′,j′,k′})||   (8)

The range of D_RM is within [0, 255]; the lower the computed value of D_RM, the stronger the similarity between the two SRs. Then, the initial mountain function at an SR is computed as

M(SR_{i,j,k}) = Σ_{∀SR_{i′,j′,k′} ∈ Pool} e^{−α · D_RM(SR_{i,j,k}, SR_{i′,j′,k′})}   (9)

where α is a positive constant. It is obvious from Eq. (9) that an SR that attracts more SRs with similar features obtains a high value of the mountain function. The mountain can be viewed as a measure of the density of SRs in the vicinity of the gray intensity feature space. Therefore, it is reasonable to choose the SRs with the most significant mountain values as representative seeds for creating object planes. Let M*_m denote the maximal value of the m-th mountain function, and SR*_m denote the SR whose mountain value is M*_m. They are determined by

M*_m = M_m(SR*_m) = max_{∀SR_{i,j,k} ∈ Pool} [M_m(SR_{i,j,k})]   (10)
First, by applying Eqs. (9) and (10) to all the SRs in the Pool, we obtain the first (and highest) mountain M*_0 and its associated representative SR, SR*_0. SR*_0 is selected as the seed of the first initial plane. After the first iteration of mountain clustering, the subsequent representative seed SRs are determined by successively destructing the mountains. This is necessary because SRs whose gray intensity features are close to previously determined seed SRs have an influential effect on the values of the subsequent mountain functions, and these effects of the identified seed SRs must be eliminated before determining the follow-up seed SRs. Toward this purpose, the updated mountain function, after eliminating the last seed SR, SR*_{m−1}, is computed by

M_m(SR_{i,j,k}) = M_{m−1}(SR_{i,j,k}) − M*_{m−1} · e^{−β · D_RM(SR_{i,j,k}, SR*_{m−1})}   (11)
where the parameter β determines the neighborhood radius that provides measurable reductions in the updated mountain function. Accordingly, by recursively performing the discount process of the mountain function given by Eq. (11), new suitable seed SRs can be determined in the same manner, until the current maximal value M*_{m−1} falls below a certain level compared to that of the first maximal mountain M*_0. The termination criterion of this procedure is defined as

(M*_{m−1} / M*_0) < δ   (12)

where δ is a positive constant less than 1. Here the parameters are selected as α = 5.4, β = 1.5, and δ = 0.45, as suggested by Pal and Chakraborty [37]. Consequently, this process converges to the determination of the N resultant seed SRs {SR*_m, m = 0 : N − 1}, which are utilized to establish the N initial object planes {P_m : m = 0 : N − 1} for the following matching phase.
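The seed-selection loop of Eqs. (9)–(12) can be sketched as follows. The mean features are normalized to [0, 1] before the Pal–Chakraborty parameters are applied (a scaling assumption of this sketch; the paper does not state the feature scaling), and `select_seeds` is a hypothetical helper name:

```python
import math

def select_seeds(means, alpha=5.4, beta=1.5, delta=0.45):
    """Seed selection by the mountain method (Eqs. (9)-(12)) over the SR
    mean features mu(SR). Returns the indices of the selected seed SRs."""
    if not means:
        return []
    xs = [v / 255.0 for v in means]       # normalize D_RM from [0, 255]
    # Eq. (9): initial mountain function, with D_RM = |mu_i - mu_j| (Eq. (8))
    m = [sum(math.exp(-alpha * abs(xi - xj)) for xj in xs) for xi in xs]
    seeds, m0_star = [], None
    while True:
        m_star = max(m)                   # Eq. (10)
        p = m.index(m_star)
        if m0_star is None:
            m0_star = m_star              # first (highest) mountain M*_0
        elif m_star / m0_star < delta:    # Eq. (12): termination criterion
            break
        if p in seeds:                    # safety guard for this sketch
            break
        seeds.append(p)
        # Eq. (11): destruct the mountain around the new seed SR*_{m-1}
        m = [m[i] - m_star * math.exp(-beta * abs(xs[i] - xs[p]))
             for i in range(len(xs))]
    return seeds
```

For two well-separated dense clusters of mean features, the loop picks one seed per cluster and then terminates via Eq. (12).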
4.3. Matching phase
Given a set of existing object planes from the initial processing phase or from previous iterations of the assembling process, an efficient methodology is needed to associate and assemble the unclassified SRs remaining in the Pool with these object planes, in order to produce appropriate segmentation results for textual objects. Toward this goal, we present a matching process that evaluates, for each unclassified SR, its mutual connectedness and similarity with the already existing planes, and determines its best belonging plane.
4.3.1. Matching grades
To effectively determine the best belonging plane of an unclassified SR, we employ a hybrid methodology, named the matching grade evaluation, for evaluating the mutual connectedness and similarity between them. This hybrid evaluation methodology considers both the local pair-wise and the global information provided by the SRs and the existing planes, based on two forms of matching grades: the single-link matching grade and the centroid-link matching grade. The single-link matching grade examines the degree of local disconnectedness between a pair of neighboring SRs, namely an unclassified SR and a neighboring classified SR that already has a belonging plane, while the centroid-link matching grade assesses the degree of global dissimilarity between an unclassified SR and an already existing plane. The two matching grades are then combined into an effective hybrid criterion to determine the best belonging plane for the unclassified SR among all the existing planes.

During a given matching phase recursion, if an unclassified SR finds its best belonging plane after examination of their mutual matching grade, this SR is classified and assembled into that plane and removed from the Pool; otherwise, if there is no suitable matching plane for an unclassified SR at this time, the SR remains unclassified in the Pool. Since new potential object planes will be created in the following recursion of the plane construction phase, SRs remaining unclassified in the current matching phase recursion will be re-analyzed in subsequent recursions until their best matching planes are determined.
The single-link matching grade is utilized to examine, in a local manner, the degree of disconnectedness between an unclassified SR in the Pool, SR_{i,j,k}, and an already existing plane P_q. It is determined by applying a connectedness measure to SR_{i,j,k} and its 4-adjacent SRs that already belong to an existing plane P_q, denoted by SR_q^{i′,j′,k′}, where the subscript q indicates that SR_q^{i′,j′,k′} belongs to the q-th plane P_q. To effectively evaluate the single-link matching grade, two measures of the discontinuity and dissimilarity between a pair of 4-adjacent SRs are employed: the side-match measure, denoted by D_SM, and the region dissimilarity measure D_RM, as computed using Eq. (8). Both the D_SM and D_RM measures are jointly considered to determine the single-link matching grade of a pair of 4-adjacent SRs.

The side-match measure D_SM, which examines the degree of disconnectedness of the touching boundary between SR_{i,j,k} and SR_q^{i′,j′,k′}, is described as follows. Given that such a pair of SRs is 4-adjacent, they have one of two types of touching boundaries: (1) a vertical touching boundary mutually shared by two horizontally adjacent SRs, as shown in Fig. 5(a), or (2) a horizontal boundary shared by two vertically adjacent SRs, as shown in Fig. 5(b).
First, given a pair of horizontally adjacent SRs, SR_{i,j,k} on the left and SR_q^{i′,j′,k′} on the right, the gray intensities of the pixels on the rightmost side of SR_{i,j,k} and on the leftmost side of SR_q^{i′,j′,k′} can be described as g(SR_{i,j,k}, M_H − 1, y) and g(SR_q^{i′,j′,k′}, 0, y), respectively. Then the sets of object pixels on the rightmost side and the leftmost side of a given SR, denoted by RS(SR_{i,j,k}) and LS(SR_{i,j,k}), respectively, are defined as follows:

RS(SR_{i,j,k}) = {g(SR_{i,j,k}, M_H − 1, y) | g(SR_{i,j,k}, M_H − 1, y) ∈ OP(SR_{i,j,k}), and 0 ≤ y ≤ M_V − 1}

and

LS(SR_{i,j,k}) = {g(SR_{i,j,k}, 0, y) | g(SR_{i,j,k}, 0, y) ∈ OP(SR_{i,j,k}), and 0 ≤ y ≤ M_V − 1}

To facilitate the following descriptions of the side-match features, the denotations of SR_{i,j,k} and SR_q^{i′,j′,k′} are simplified to SR_l and SR_r, respectively. The vertical touching boundary of SR_l and SR_r, denoted as VB(SR_l, SR_r), is represented by the set of side connections formed by pairs of object pixels that are symmetrically connected on the associated rightmost and leftmost sides, and is defined as follows:

VB(SR_l, SR_r) = {(g(SR_l, M_H − 1, y), g(SR_r, 0, y)) | g(SR_l, M_H − 1, y) ∈ RS(SR_l), and g(SR_r, 0, y) ∈ LS(SR_r)}
Similarly, in the case that SR_{i,j,k} and SR_q^{i′,j′,k′} are vertically adjacent (suppose that SR_{i,j,k} is on the top and SR_q^{i′,j′,k′} is on the bottom, with their denotations simplified to SR_t and SR_b, respectively), their horizontal touching boundary can be represented as

HB(SR_t, SR_b) = {(g(SR_t, x, M_V − 1), g(SR_b, x, 0)) | g(SR_t, x, M_V − 1) ∈ BS(SR_t), and g(SR_b, x, 0) ∈ TS(SR_b)}

where BS(SR_t) and TS(SR_b) represent the bottommost side of SR_t and the topmost side of SR_b, respectively, and are defined as

BS(SR_{i,j,k}) = {g(SR_{i,j,k}, x, M_V − 1) | g(SR_{i,j,k}, x, M_V − 1) ∈ OP(SR_{i,j,k}), and 0 ≤ x ≤ M_H − 1}

and

TS(SR_{i,j,k}) = {g(SR_{i,j,k}, x, 0) | g(SR_{i,j,k}, x, 0) ∈ OP(SR_{i,j,k}), and 0 ≤ x ≤ M_H − 1}

Also, the number of side connections of the touching boundary, i.e. the number of connected pixel pairs in VB(SR_{i1,j1,k1}, SR_{i2,j2,k2}) or HB(SR_{i1,j1,k1}, SR_{i2,j2,k2}), should be considered for normalizing the disconnectedness measure of the two 4-adjacent SRs, and is denoted by N_sc(SR_{i1,j1,k1}, SR_{i2,j2,k2}).
Therefore, the horizontal and vertical types of the side-match measure of a pair of 4-adjacent SRs, denoted by D^h_SM and D^v_SM, respectively, are computed as

D^h_SM(SR_l, SR_r) = [Σ_{(g(SR_l, M_H−1, y), g(SR_r, 0, y)) ∈ VB(SR_l, SR_r)} ||g(SR_l, M_H − 1, y) − g(SR_r, 0, y)||] / N_sc(SR_l, SR_r)

and

D^v_SM(SR_t, SR_b) = [Σ_{(g(SR_t, x, M_V−1), g(SR_b, x, 0)) ∈ HB(SR_t, SR_b)} ||g(SR_t, x, M_V − 1) − g(SR_b, x, 0)||] / N_sc(SR_t, SR_b)   (13)

Accordingly, the side-match measure of SR_{i,j,k} and SR_q^{i′,j′,k′} is obtained by

D_SM(SR_{i,j,k}, SR_q^{i′,j′,k′}) =
  D^h_SM(SR_l, SR_r)  if SR_{i,j,k} and SR_q^{i′,j′,k′} are horizontally adjacent
  D^v_SM(SR_t, SR_b)  if SR_{i,j,k} and SR_q^{i′,j′,k′} are vertically adjacent   (14)

The range of D_SM values is within [0, 255]. If the D_SM value of two 4-adjacent SRs is sufficiently low, then the two SRs are homogeneous with each other, and thus they should belong to the same plane.
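A sketch of the horizontal side-match measure of Eq. (13), again assuming the 2D-grid-with-None representation of an SR introduced earlier (the vertical case of Eq. (13) is symmetric, over the bottommost and topmost rows); `side_match_h` is an illustrative name:

```python
def side_match_h(sr_left, sr_right):
    """Horizontal side-match measure D^h_SM of Eq. (13) for two horizontally
    4-adjacent SRs, each a 2D grid of gray values with None marking
    non-object pixels. Returns None when VB(SR_l, SR_r) is empty."""
    diffs = []
    for row_l, row_r in zip(sr_left, sr_right):
        g_l, g_r = row_l[-1], row_r[0]    # rightmost / leftmost columns
        if g_l is not None and g_r is not None:
            diffs.append(abs(g_l - g_r))  # a side connection in VB(SR_l, SR_r)
    if not diffs:
        return None
    return sum(diffs) / len(diffs)        # normalized by N_sc(SR_l, SR_r)
```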
Accordingly, the D_SM measure reflects the disconnectedness of two 4-adjacent SRs, while the D_RM value, as obtained by Eq. (8), assesses the dissimilarity between them. The single-link matching grade, denoted by m_s, evaluates both the degree of disconnectedness and the dissimilarity of the two 4-adjacent SRs by considering the dominant effect of their associated D_SM and D_RM values, and is determined by

m_s(SR_{i,j,k}, SR_q^{i′,j′,k′}) = max(D_SM(SR_{i,j,k}, SR_q^{i′,j′,k′}), D_RM(SR_{i,j,k}, SR_q^{i′,j′,k′})) / max(σ(SR_{i,j,k}) + σ(SR_q^{i′,j′,k′}), 1)   (15)

where σ(SR_{i,j,k}) is the standard deviation of the gray intensities of all object pixels associated with SR_{i,j,k}, and is equivalent to σ_k^{i,j} obtained in the localized histogram multilevel thresholding process. The denominator term max(σ(SR_{i,j,k}) + σ(SR_q^{i′,j′,k′}), 1) in Eq. (15) serves as the normalization factor.
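Given precomputed D_SM and D_RM values, Eq. (15) is a one-liner; the function name is illustrative:

```python
def single_link_grade(d_sm, d_rm, sigma_a, sigma_b):
    """Single-link matching grade m_s of Eq. (15): the dominant of the two
    precomputed measures D_SM and D_RM, normalized by the summed standard
    deviations of the two SRs (clamped below at 1)."""
    return max(d_sm, d_rm) / max(sigma_a + sigma_b, 1.0)
```

The clamp at 1 prevents the grade from blowing up for nearly constant SRs whose standard deviations are close to zero.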
Next, the centroid-link matching grade, which evaluates the degree of dissimilarity between SR_{i,j,k} and an already existing plane P_q in a global manner, is given as follows. Let μ(P_q) and σ²(P_q) denote the mean and variance of the existing plane P_q, respectively; they are given by

μ(P_q) = [Σ_{SR_q^{i′,j′,k′} ∈ P_q} N_op(SR_q^{i′,j′,k′}) · μ(SR_q^{i′,j′,k′})] / N_op(P_q)   (16)

and

σ²(P_q) = [Σ_{SR_q^{i′,j′,k′} ∈ P_q} N_op(SR_q^{i′,j′,k′}) · ||μ(SR_q^{i′,j′,k′}) − μ(P_q)||²] / N_op(P_q)   (17)

where N_op(P_q) denotes the number of pixels in P_q, given by

N_op(P_q) = Σ_{SR_q^{i′,j′,k′} ∈ P_q} N_op(SR_q^{i′,j′,k′})   (18)

Accordingly, the centroid-link matching grade of SR_{i,j,k} and P_q is computed by

m_c(SR_{i,j,k}, P_q) = ||μ(SR_{i,j,k}) − μ(P_q)|| / max(σ(SR_{i,j,k}) + σ(P_q), 1)   (19)

If SR_{i,j,k} is finally determined to be merged into the plane P_q, then the mean μ(P_q) and variance σ²(P_q) of P_q should be updated after taking in SR_{i,j,k}. The new mean and variance of P_q are, respectively, computed by

μ(P_q^new) = [N_op(P_q^prev) · μ(P_q^prev) + N_op(SR_{i,j,k}) · μ(SR_{i,j,k})] / [N_op(P_q^prev) + N_op(SR_{i,j,k})]   (20)

and

σ²(P_q^new) = [N_op(P_q^prev) · σ²(P_q^prev) + N_op(SR_{i,j,k}) · ||μ(SR_{i,j,k}) − μ(P_q^new)||² + N_op(P_q^prev) · ||μ(P_q^new) − μ(P_q^prev)||²] / [N_op(P_q^prev) + N_op(SR_{i,j,k})]   (21)

where P_q^new denotes the newly expanded plane P_q and P_q^prev denotes the previous one; μ(P_q^new) and σ²(P_q^new) represent the updated mean and variance of P_q, respectively, while μ(P_q^prev) and σ²(P_q^prev) represent the previous ones.
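Eqs. (20)–(21) form a standard incremental update of a weighted mean and of the pixel-count-weighted variance of the member SRs' mean features (the quantity defined in Eq. (17)). A sketch with a hypothetical helper name, which can be checked against the batch definitions of Eqs. (16)–(17):

```python
def update_plane(n_prev, mu_prev, var_prev, n_sr, mu_sr):
    """Absorb an SR with N_op = n_sr and mean feature mu_sr into a plane
    whose current statistics are (n_prev, mu_prev, var_prev); returns the
    updated (N_op, mean, variance) per Eqs. (20)-(21)."""
    n_new = n_prev + n_sr
    mu_new = (n_prev * mu_prev + n_sr * mu_sr) / n_new            # Eq. (20)
    var_new = (n_prev * var_prev
               + n_sr * (mu_sr - mu_new) ** 2
               + n_prev * (mu_new - mu_prev) ** 2) / n_new        # Eq. (21)
    return n_new, mu_new, var_new
```

Folding SRs in one at a time reproduces exactly the batch-computed weighted mean and variance, which is why the plane statistics never need to be recomputed from scratch during the matching phase.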
Both of the above-mentioned matching grades are then combined to form a composite matching grade, denoted by M(SR_{i,j,k}, P_q), to complementarily assess the degree of disconnectedness and dissimilarity of an unclassified SR and an already existing plane in both the local pair-wise and the global manner. Consequently, the composite matching grade provides a more effective criterion for determining the best belonging plane for each of the unclassified SRs. In each recursion of the matching phase, each unclassified SR in the Pool, SR_{i,j,k}, is analyzed by evaluating the composite matching grade of SR_{i,j,k} associated with each of its neighboring existing planes P_q, to seek the best matching plane to which SR_{i,j,k} should belong.

Since the evaluation of the composite matching grades of SR_{i,j,k} is performed on its neighboring planes, a plane P_q must have at least one of its own SRs 4-adjacent to SR_{i,j,k} in order to compete for the belongingness of SR_{i,j,k}. To facilitate the computation of the composite matching grade of SR_{i,j,k} and a plane P_q, the processing set AS(SR_{i,j,k}, P_q) is utilized to store the SRs which belong to P_q and are also 4-adjacent to SR_{i,j,k}, and is defined by

AS(SR_{i,j,k}, P_q) = {SR_q^{i′,j′,k′} ∈ P_q | SR_q^{i′,j′,k′} is 4-adjacent to SR_{i,j,k}}

Then the composite matching grade M of SR_{i,j,k} associated with the plane P_q, which reveals how well SR_{i,j,k} matches P_q, is determined by

M(SR_{i,j,k}, P_q) = w_c · m_c(SR_{i,j,k}, P_q) + w_s · min_{∀SR_q^{i′,j′,k′} ∈ AS(SR_{i,j,k}, P_q)} m_s(SR_{i,j,k}, SR_q^{i′,j′,k′})   (22)

where w_c and w_s are the weighting factors that control the weighted contributions of the centroid-linkage and single-linkage strengths to the composite matching grade, respectively, and w_c + w_s = 1. By applying the weighting factors w_c and w_s in the composite matching grade, the centroid-linkage and the single-linkage can be combined to take advantage of their related strengths. Because textual regions mostly