Contents lists available at ScienceDirect
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
A multi-plane approach for text segmentation of complex document images
Yen-Lin Chen a, Bing-Fei Wu b,∗
a Department of Computer Science and Information Engineering, Asia University, 500 Liufeng Road, Wufeng, Taichung 41354, Taiwan
b Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta-Hsueh Road, Hsinchu 30010, Taiwan
A R T I C L E  I N F O

Article history:
Received 19 January 2008
Received in revised form 1 September 2008
Accepted 19 October 2008

Keywords:
Document image processing
Text extraction
Image segmentation
Multilevel thresholding
Region segmentation
Complex document images

A B S T R A C T
This study presents a new method, namely the multi-plane segmentation approach, for segmenting and extracting textual objects from various real-life complex document images. The proposed multi-plane segmentation approach first decomposes the document image into distinct object planes to extract and separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. This process consists of two stages: localized histogram multilevel thresholding, and multi-plane region matching and assembling. A text extraction procedure is then applied on the resultant planes to detect and extract textual objects with different characteristics in the respective planes. The proposed approach processes document images regionally and adaptively according to their respective local features. Hence the detailed characteristics of the extracted textual objects, particularly small characters with thin strokes, as well as gradational illuminations of characters, can be well preserved. Moreover, this approach also allows background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture to be handled easily and well. Experimental results on real-life complex document images demonstrate that the proposed approach is effective in extracting textual objects with various illuminations, sizes, and font styles from various types of complex document images.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Extraction of textual information from document images provides many useful applications in document analysis and understanding, such as optical character recognition, document retrieval, and compression [1,2]. To date, many techniques have been presented for extracting textual objects from monochromatic document images [3–6]. In recent years, advances in multimedia publishing and printing technology have led to an increasing number of real-life documents in which stylistic character strings are printed with pictorial, textured, and decorated objects and colorful, varied background components. However, most current approaches cannot extract textual objects well from real-life complex document images. Compared to monochromatic document images, text extraction from complex document images brings many difficulties associated with the complexity of background images, the variety and shading of character illuminations, the superimposing of characters on illustrations and pictures, as well as other decorated background components. As a result, there is an increasing demand for a system that is able to read and extract the textual information printed on pictorial and
∗ Corresponding author. Tel.: +886 3 5131538; fax: +886 3 5712385.
E-mail addresses: ylchen@asia.edu.tw (Y.-L. Chen), bwu@cssp.cn.nctu.edu.tw (B.-F. Wu).
0031-3203/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2008.10.032
textured regions in both colored images as well as monochromatic main text regions.
Several newly developed global thresholding methods are useful for separating textual objects from non-uniformly illuminated document images. Liu and Srihari [7] proposed a method based on texture features of character patterns, while Cheriet et al. [8] presented a recursive thresholding algorithm extended from Otsu's optimal criterion [9]. These methods classify pixels in the original image as foreground objects (particularly textual objects of interest) or as background according to their gray intensities in a global view, and are attractive because of their computational simplicity. However, binary images obtained by global thresholding techniques are subject to noise and distortion, especially because of uneven illumination and the spreading effect caused by the image scanner. To address these issues, Solihin and Leedham's integral ratio approaches [10] provided a new class of histogram-based thresholding techniques which classify pixels into three classes: foreground, background, and a fuzzy region between the two basic classes. In Ref. [11], Parker proposed a local gray-intensity gradient thresholding technique which is effective for extracting textual objects from badly illuminated document images. Because this method is based on the assumption of binary document images, its application is limited to extracting character objects from backgrounds no more complex than monotonically changing illuminations. A local and adaptive
binarization method was presented by Ohya et al. [12]. This method divides the original image into blocks of a specific size, determines an optimal threshold for each block to be applied at its center pixel, and uses interpolation to determine pixel-wise thresholds. It can effectively extract textual objects from images with complex backgrounds, on condition that the background illuminations are very bright compared with those of the textual objects.
Some other methods take a different viewpoint, extracting texts by modeling the features of textual objects and backgrounds. Kamel and Zhao [13] proposed the logical level technique to utilize local linearity features of character strokes, while Venkateswarlu and Boyle's average clustering algorithm [14] utilizes local statistical features of textual objects. These methods apply symmetric local windows of a pre-specified size and several pre-determined thresholds based on prior knowledge of the local features, so characters with stroke widths substantially thinner or thicker than the assumed stroke width, or characters in varying illumination contrasts with backgrounds, may not be appropriately extracted. To deal with these problems, Yang and Yan [15] presented an adaptive logical method (ALM) which applies the concepts of Liu and Srihari's run-length histogram [7] on sectored image regions, providing an effective scheme for automatically adjusting the size of the local window and the logical thresholding level. Ye et al.'s hybrid extraction method [16] integrates global thresholding, local thresholding, and double-edge stroke feature extraction techniques to extract textual objects from document images of different complexities. The double-edge technique is useful in separating characters whose stroke widths are within a specified size from uneven backgrounds. Some recently presented methods [17,18] utilized sub-image concepts to deal with the extraction of textual objects under different illumination contrasts with backgrounds. Dawoud and Kamel [17] proposed a multi-model sub-image thresholding method that considers a document image as a collection of pre-determined regions, i.e. sub-images; the textual objects contained in each sub-image are then segmented using statistical models of gray-intensity and stroke-run features.
In Amin and Wu's multi-stage thresholding approach [18], Otsu's global thresholding method is first applied; a connected-component labeling process is then applied on the thresholded image to determine the sub-images of interest, and these sub-images undergo another thresholding process to extract textual objects. The extraction performance of the above two methods relies principally on adequate determination of the sub-image regions. Thus, when textual objects overlap pictorial or textured backgrounds of poor and varying contrasts, suitable sub-images are hard to determine, and satisfactory extraction results are difficult to obtain.
Since most textual objects show sharp and distinctive edge features, methods based on edge information [19–22] have been developed. Such methods utilize an edge detection operator to extract the edge features of textual objects, and then use these features to extract texts from document images. Wu et al.'s textfinder system [20] uses nine second-order Gaussian derivative filters to obtain edge-feature vectors of each pixel at three different scales, and applies the K-means algorithm on these edge-feature vectors to identify the corresponding textual pixels. Hasan and Karam [21] introduced a method that utilizes a morphological edge extraction scheme, and applies morphological dilation and erosion operations on the extracted closure edges to locate textual regions. Edge information can also be treated as a measure for detecting the existence of textual objects in a specific region. In Pietikainen and Okun's work [22], edge features extracted by the Sobel operator are divided into non-overlapping blocks, and these blocks are classified as text or non-text according to the corresponding values of their edge features. Such edge-based methods are capable of extracting textual objects in different homogeneous illuminations from graphic
backgrounds. However, when textual objects adjoin or touch graphical objects, texture patterns, or backgrounds with sharply varying contours, the edge-feature vectors of non-text objects with similar characteristics may also be identified as textual ones, and the characters in the extracted textual regions are then blurred by those non-text objects. Moreover, when textual objects do not contrast sufficiently with non-text objects or backgrounds to form strong edge features, such textual objects cannot easily be extracted by edge-based methods.
In recent years, several color-segmentation-based methods for text extraction from color document images have been proposed. Zhong et al. [23] proposed two methods and a hybrid approach for locating texts in color images, such as CD jackets and book covers. The first method utilizes a histogram-based color clustering process to obtain connected components with uniform colors, and then applies several heuristic rules to classify them as textual or non-textual objects. The second method locates textual regions based on their distinctive spatial variance. To detect textual regions more effectively, both methods are combined into a hybrid approach. Although the spatial variance method still suffers from the drawbacks of the edge-based methods mentioned previously, the color connected-component method moderately compensates for these drawbacks. However, this approach still cannot provide acceptable results when the illuminations or colors of characters in large textual regions are shaded. Several recent techniques utilize color clustering or quantization approaches to determine the prototype colors of documents so as to facilitate the detection of character objects in the separated color planes. In Jain and Yu's work [24], a color document is decomposed into a set of foreground images in the RGB color space using bit-dropping quantization and the single-link color clustering algorithm. Strouthopoulos et al.'s adaptive color reduction technique [25] utilizes an unsupervised neural network classifier and a tree-search procedure to determine prototype colors. Some alternative color spaces are also adopted to determine prototype colors for finding textual objects of interest. Yang and Ozawa [26] make use of the HSI color space to segment homogeneous color regions to extract bibliographic information from book covers, while Hase et al. [27] apply a histogram-based approach in the CIE Lab color space to select prototype colors for obtaining textual regions.
However, most of the aforementioned methods have difficulties in extracting texts which are embedded in complex backgrounds or touch other pictorial and graphical objects. This is because the prototype colors are determined in a global view, so appropriate prototype colors cannot easily be selected to distinguish textual objects from touching pictorial objects and complex backgrounds without sufficient contrast. Furthermore, such problems also limit the reliability of these methods in handling unevenly illuminated document images.
In brief, extracting texts from complex document images involves several difficulties. These difficulties arise from the following properties of complex documents: (1) character strings in complex document images may have different illuminations, sizes, and font styles, and may be overlapped with various background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture, such as illustrations, photographs, pictures, or other background textures; and (2) these documents may comprise small characters with very thin strokes as well as large characters with thick strokes, and may be influenced by image shading. An approach for extracting black texts from such complex backgrounds to facilitate compression of document images has been proposed in our previous work [28].
In this study, we propose an effective method, namely the multi-plane segmentation approach, for segmenting and extracting textual objects of interest from these complex document images, and for resolving the above issues associated with the complexity of their backgrounds. The proposed multi-plane segmentation approach first decomposes the document image into distinct object planes to extract and separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. This process consists of two stages: localized histogram multilevel thresholding, and multi-plane region matching and assembling. A text extraction procedure is then applied on the resultant planes to detect and extract textual objects with different characteristics in the respective planes. The proposed approach processes document images regionally and adaptively by means of their local features. This allows the detailed characteristics of the extracted textual objects to be well preserved, especially small characters with thin strokes, as well as characters in gradational and shaded illumination contrasts. Thus, textual objects adjoining or touching pictorial objects and backgrounds with uneven, gradational, and sharp variations in contrast, illumination, and texture can be handled easily and well. Experimental results demonstrate that the proposed approach is capable of extracting textual objects with different illuminations, sizes, and font styles from different types of complex document images. Compared with other existing techniques, our proposed approach exhibits feasible and effective performance on text extraction from various real-life complex document images.
2. Overview of the proposed approach
The proposed multi-plane segmentation approach decomposes the document image into separate object planes by applying two processing stages: automatic localized histogram multilevel thresholding, and multi-plane region matching and assembling. The flow diagram of the proposed approach is illustrated in Fig. 1. In the first stage, the original image is sectored into non-overlapping "localized block regions", denoted by Θ^{i,j}; distinct objects embedded in the block regions are then decomposed into separate "sub-block regions (SRs)" by applying the localized histogram multilevel thresholding process, as illustrated in Figs. 2–4. Afterward, in the second stage, the multi-plane region matching and assembling process, which adopts both the localized spatial dissimilarity relation and the global feature information, is applied to perceptually classify and arrange the obtained SRs to compose a set of homogeneous "object planes", denoted by P_q, especially textual regions of interest. This proposed multi-plane region matching and assembling process is conducted by recursively applying the following three phases: the initial plane selection phase, the matching phase, and the plane construction phase, as depicted in Fig. 6. Consequently, homogeneous objects including textual regions of interest, non-text objects such as graphics and pictures, and background textures are extracted and separated into distinct object planes. The text extraction process is then performed on the resultant planes to extract the textual objects with different characteristics in the respective planes, as shown in Fig. 7. The important symbols utilized in the presentation of the proposed approach are listed in Table 1.
The following sections describe the detailed stages of the proposed approach, and are organized as follows. Sections 3 and 4 present, respectively, the two stages of the proposed multi-plane segmentation approach: the localized histogram multilevel thresholding procedure, and the multi-plane region matching and assembling process. Then, a simple text extraction procedure is described in Section 5. Next, Section 6 illustrates parameter adaptation and comparative performance evaluation results. Finally, the conclusions of this study are stated in Section 7.
3. Localized histogram multilevel thresholding
For complex document images with textual objects in different illuminations, sizes, and font styles, printed on varying or
Fig. 1. Block diagram of the proposed multi-plane segmentation approach.
inhomogeneous background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture, such as illustrations, photographs, pictures, or other background patterns, a critical difficulty arises in that no global segmentation techniques could
Fig. 3. Sectored regions of the illumination image Y obtained from the original image inFig. 2.
Fig. 4. Example of the results of the localized multilevel thresholding procedure; the resultant SF values of Θ^{2,1}, Θ^{1,2}, and Θ^{2,2} after the thresholding procedure are 0.931, 0.961, and 0.96, respectively: (a) part of the partitioned block regions of the image "Calibre" in Fig. 3, where the block regions enclosed by yellow ink are employed in the following examples of the localized multilevel thresholding procedure; (b) the upper-left block region Θ^{1,1}, SF_b = 0.577 and σ = 8.81; (c) SR^{1,1,0} derived from Θ^{1,1}, which is a homogeneous block region; (d) the upper-right block region Θ^{2,1}, SF_b = 0.931 and σ = 22.8; (e) SR^{2,1,0} derived from Θ^{2,1}; (f) SR^{2,1,1} derived from Θ^{2,1}; (g) the bottom-left block region Θ^{1,2}, SF_b = 0.804 and σ = 42.3; (h) SR^{1,2,0} derived from Θ^{1,2}; (i) SR^{1,2,1} derived from Θ^{1,2}; (j) SR^{1,2,2} derived from Θ^{1,2}; (k) SR^{1,2,3} derived from Θ^{1,2}; (l) the bottom-right block region Θ^{2,2}, SF_b = 0.835 and σ = 46.6; (m) SR^{2,2,0} derived from Θ^{2,2}; (n) SR^{2,2,1} derived from Θ^{2,2}; (o) SR^{2,2,2} derived from Θ^{2,2}; and (p) SR^{2,2,3} derived from Θ^{2,2}.
Fig. 5. Types of touching boundaries of the two 4-adjacent SRs: (a) vertical boundary and (b) horizontal boundary.
Fig. 6. An example of the test image, "Calibre", and the object planes obtained by the multi-plane segmentation (image size = 1929 × 1019): (a) object plane P0, (b) object plane P1, (c) object plane P2, (d) object plane P3, (e) object plane P4, (f) object plane P5, and (g) object plane P6.
work well for such kinds of document images. This is because when the regions of interesting textual objects, consisting of multiple colors or gray intensities, are undersized compared with those of the touching pictorial objects and complex backgrounds with indistinct contrasts, these textual objects cannot be discriminated in a global view of statistical features. A typical example with these characteristics is shown in Fig. 2. This sample image consists of three differently colored textual regions printed on a varying and shaded background. Moreover, the black characters are superimposed on the white characters. By observing some localized regions, the statistical features of the textual objects, pictorial objects, and backgrounds become much more distinguishable. Therefore, a regional and adaptive analysis of the localized statistical features can provide the detailed characteristics of the textual objects of interest to be well extracted for later document processing. In this section, we introduce a simple and effective localized segmentation approach as the first stage of the multi-plane segmentation process for extracting textual objects from complex document images.
The multi-plane segmentation process, if necessary, begins by applying a color-to-grayscale transformation on the RGB components of the image pixels in a color document image, to obtain its illumination image Y. After the color transformation is performed, the illumination image Y still retains the texture features of the original color image, as pointed out in Ref. [20], and thus the character strokes in their original colors are still well preserved. The obtained illumination image Y is then sectored into non-overlapping localized block regions Θ^{i,j} of a given size M_H × M_V, as shown in Fig. 3. To facilitate analysis in the following stage, the objects of interest must be extracted from these localized block regions into separate SRs, each of which contains objects with homogeneous features. Toward this goal, the discriminant criterion is useful for measuring separability among the decomposed regions with different objects. Its application to bi-level global thresholding to extract foreground objects from the background was first presented by Otsu [9]. This method is ranked as the most effective bi-level threshold selection method [29,30]. However, when the number of desired thresholds increases, the computation needed to obtain the optimal threshold values increases substantially, and the search for the optimal value of the criterion function becomes particularly exhaustive.
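As a concrete illustration of this first step, the sketch below converts an RGB image to an illumination image Y and sectors it into non-overlapping localized block regions. The luminance weights (ITU-R BT.601) are our assumption, since the paper does not spell out the transform, and the function names are ours, not the paper's:

```python
import numpy as np

def to_illumination(rgb):
    """Color-to-grayscale transform producing the illumination image Y.
    Assumes the common BT.601 luminance weights (the paper does not
    specify the exact transform)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def sector_blocks(y, mh, mv):
    """Sector Y into non-overlapping MH x MV localized block regions,
    keyed by their location index (i, j); border blocks may be smaller."""
    h, w = y.shape
    return {(i, j): y[j * mv:(j + 1) * mv, i * mh:(i + 1) * mh]
            for i in range((w + mh - 1) // mh)
            for j in range((h + mv - 1) // mv)}

rgb = np.zeros((64, 96, 3))
rgb[..., 1] = 255.0                  # a pure-green test image
y = to_illumination(rgb)
blocks = sector_blocks(y, 32, 32)    # 3 columns x 2 rows = 6 regions
```

Each value of `blocks` is a view into Y, so the subsequent per-region histogram analysis adds no copying cost.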
Hence, an efficient multilevel thresholding technique is needed to automatically determine the suitable number of thresholds to segment the block region into different decomposed object regions. By using the properties of discriminant analysis, we have proposed an automatic multilevel global thresholding technique for image
Fig. 7. Examples of the text location and extraction process: (a) example of performing X-cut on connected components in the binary plane BP4 of Fig. 6(g); (b) example of performing Y-cut on the top connected-component group, which is the first among five groups obtained from the X-cut procedure on BP4; (c) the resultant candidate text-lines obtained by the XY-cut spatial clustering process; and (d) the resultant text plane obtained by performing the text extraction process on all object planes derived from Fig. 2.
Table 1
List of important symbols of the proposed approach.

Symbol          Description
Θ^{i,j}         Localized block region: one of the non-overlapping block regions sectored from the original image; the superscript (i,j) denotes its location index
SR^{i,j,k}, SR_q^{i,j,k}   Sub-block region, derived from Θ^{i,j} by applying the localized histogram multilevel thresholding process; the additional superscript k means it is the k-th SR derived from Θ^{i,j}, and when the subscript q is assigned, this SR has been attributed to an existent object plane P_q
P_q             Object plane, formed by a set of homogeneous SRs after performing the multi-plane region matching and assembling process; the subscript q represents its order of creation
segmentation [31]. This technique extends and applies the concept of the discriminant criterion to analyzing the separability among the gray levels in the image. It can automatically determine the suitable number of thresholds, and utilizes a fast recursive selection strategy to select the optimal thresholds to segment the image into separate objects with similar features in a computationally frugal way. Based on this effective technique, we introduce a localized histogram multilevel thresholding process to decompose distinct objects with homogeneous features in localized block regions into separate SRs. This process is described in the following subsections.
3.1. Statistical features and recursive partition concepts of localized regions
Let f_g denote the observed frequency (histogram) of pixels in a localized block region Θ^{i,j} having a given gray intensity g; the total number of pixels in Θ^{i,j} is thus N = f_0 + f_1 + ⋯ + f_{U−1}, where U is the number of gray intensities in the histogram. Hence, the normalized probability of one pixel having a given gray intensity can be computed as

P_g = f_g/N, where P_g ≥ 0 and Σ_{g=0}^{U−1} P_g = 1    (1)

In order to segment textual objects, foreground objects, and background components in a given localized region Θ^{i,j}, the pixels in Θ^{i,j} should be partitioned into a suitable number of classes. For multilevel thresholding, with n thresholds partitioning the pixels in the region Θ^{i,j} into n+1 classes, the gray intensities of pixels in Θ^{i,j} are segmented by applying a threshold set T composed of n thresholds, where T = {t_k | k = 1, ..., n}. These classes are represented by C_0 = {0, 1, ..., t_1}, ..., C_k = {t_k+1, t_k+2, ..., t_{k+1}}, ..., C_n = {t_n+1, t_n+2, ..., U−1}. Then the statistical features associated with a given pixel class C_k, namely the cumulative probability, the mean, and the standard deviation, denoted by w_k, μ_k, and σ_k, respectively, can be computed as

w_k = Σ_{g=t_k+1}^{t_{k+1}} P_g,  μ_k = (Σ_{g=t_k+1}^{t_{k+1}} g P_g)/w_k,  σ_k² = (Σ_{g=t_k+1}^{t_{k+1}} P_g (g − μ_k)²)/w_k    (2)

Based on the above statistical features of the pixels in the region Θ^{i,j}, the between-class variance, denoted by v_BC, an effective criterion for evaluating segmentation results, can be obtained for measuring the separability among all classes, and is expressed as

v_BC(T) = Σ_{k=0}^{n} w_k (μ_k − μ)², where μ = Σ_{g=0}^{U−1} g P_g    (3)

where μ is the overall mean of the gray intensities in Θ^{i,j}. The total within-class variance v_WC and the total variance σ² of all segmented classes of gray intensities are, respectively, computed as

v_WC(T) = Σ_{k=0}^{n} w_k σ_k²,  σ² = Σ_{g=0}^{U−1} (g − μ)² P_g    (4)

Here, a dummy threshold t_0 = 0 is utilized for the sake of convenience in simplifying the expression of the equation terms.
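Eqs. (1)-(4) can be checked numerically. The following minimal sketch (our own illustration; the toy histogram and the threshold set are arbitrary) computes the class statistics of Eq. (2) and verifies the decomposition σ² = v_BC + v_WC:

```python
import numpy as np

def class_stats(P, T, U=256):
    """w_k, mu_k, sigma_k^2 of the classes induced by the threshold
    set T = [t_1, ..., t_n] on a normalized histogram P (Eq. (2))."""
    bounds = [0] + [t + 1 for t in T] + [U]
    g = np.arange(U)
    stats = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = P[lo:hi].sum()
        if w == 0:
            stats.append((0.0, 0.0, 0.0))
            continue
        mu = (g[lo:hi] * P[lo:hi]).sum() / w
        var = (P[lo:hi] * (g[lo:hi] - mu) ** 2).sum() / w
        stats.append((w, mu, var))
    return stats

def variances(P, T, U=256):
    """Between-class v_BC (Eq. (3)) and within-class v_WC (Eq. (4))."""
    g = np.arange(U)
    mu_total = (g * P).sum()
    s = class_stats(P, T, U)
    v_bc = sum(w * (mu - mu_total) ** 2 for w, mu, _ in s)
    v_wc = sum(w * var for w, _, var in s)
    return v_bc, v_wc

# toy trimodal histogram: all mass at gray levels 20, 128, and 230
f = np.zeros(256)
f[20], f[128], f[230] = 300, 500, 200
P = f / f.sum()                              # Eq. (1)
v_bc, v_wc = variances(P, [70, 180])         # T = {70, 180}, n = 2
g = np.arange(256)
sigma2 = ((g - (g * P).sum()) ** 2 * P).sum()
```

Because each class here holds a single gray level, v_WC vanishes and v_BC equals the total variance, which is the ideal case the SF criterion below rewards.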
The aforementioned criterion functions can be considered as measures of separability among all existing classes decomposed from the original region Θ^{i,j}. We utilize this concept as a criterion for the automatic segmentation of objects in a region, denoted in this study by the "separability factor" SF, which is defined as

SF = v_BC(T)/σ² = 1 − v_WC(T)/σ²    (5)

where σ² serves as the normalization factor in this equation. The SF value represents the separability measure among all existing classes, and lies within the range SF ∈ [0, 1]; the lower bound is approached when the region Θ^{i,j} comprises a uniform gray intensity, while the upper bound is achieved when the region Θ^{i,j} consists of exactly n+1 gray intensities. The objective is to maximize the SF value so as to optimize the segmentation result. This concept is supported by the property that σ² is equivalent to the sum of v_BC and v_WC. Observing the terms comprising v_WC(T): if the gray intensities of the pixels belonging to most existing classes are widely distributed, i.e. the contributions of their class variances σ_k² are large, then the value of the corresponding SF measure becomes low. Accordingly, when SF approximates 1.0, all resultant classes of gray intensities C_k (k = 0, ..., n), which are decomposed from the original region Θ^{i,j}, are ideally and completely separated.
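The bounds of SF can be demonstrated on two extreme histograms: a region holding exactly n+1 = 2 distinct gray levels yields SF = 1 under a single threshold, while a broadly spread histogram yields a lower value. A sketch of this, reusing the definitions of Eqs. (1)-(5):

```python
import numpy as np

def separability_factor(P, T, U=256):
    """SF = v_BC(T)/sigma^2 = 1 - v_WC(T)/sigma^2 (Eq. (5))."""
    g = np.arange(U)
    mu = (g * P).sum()
    sigma2 = ((g - mu) ** 2 * P).sum()
    bounds = [0] + [t + 1 for t in T] + [U]
    v_wc = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = P[lo:hi].sum()
        if w == 0:
            continue
        mu_k = (g[lo:hi] * P[lo:hi]).sum() / w
        # w_k * sigma_k^2 written directly as the class's mass-weighted spread
        v_wc += (P[lo:hi] * (g[lo:hi] - mu_k) ** 2).sum()
    return 1.0 - v_wc / sigma2

# exactly two gray levels split by one threshold: SF reaches the upper bound
P_bi = np.zeros(256)
P_bi[40], P_bi[200] = 0.5, 0.5
# a flat histogram split by the same threshold: SF is clearly lower
P_flat = np.full(256, 1 / 256)
sf_hi = separability_factor(P_bi, [120])
sf_lo = separability_factor(P_flat, [120])
```

Note that a perfectly uniform region makes σ² zero, so SF is undefined there; that case is handled separately by the homogeneity test of Section 3.1.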
Therefore, based on this efficient discriminant criterion, automatic multilevel thresholding can be applied to recursively segment the block region Θ^{i,j} into different objects of homogeneous illuminations, regardless of the number of objects and the image complexity of the region Θ^{i,j}. It can be performed until the SF measure is large enough to show that the appropriate discrepancy among the resultant classes has been obtained. Through the aforementioned properties, this objective can be achieved by minimizing the total within-class variance v_WC(T), using a scheme that, in each recursion, selects the class with the maximal contribution (w_k σ_k²) to the total within-class variance for the bi-class partition procedure. Thus, the SF measure most rapidly reaches the maximal increment needed to satisfy sufficient separability among the resultant classes of pixels. As a result, objects with homogeneous gray intensities are well separated.
The class having the maximal within-class variance contribution w_k σ_k² is denoted by C_p, and it comprises a subset interval of gray intensities represented by C_p: {t_p+1, t_p+2, ..., t_{p+1}}. Then a simple and effective bi-class partition procedure, as described in Ref. [31], is performed on the determined C_p in each recursion until the separability among all classes becomes satisfactory, i.e. until the SF measure approximates a sufficiently large value. The class C_p is divided into two classes C_p0 and C_p1 by applying the optimal threshold t_S* determined by the localized histogram-based selection procedure described in Ref. [31]. The resultant classes C_p0 and C_p1 comprise the subsets of gray intensities derived from C_p and can be represented as C_p0: {t_p+1, t_p+2, ..., t_S*} and C_p1: {t_S*+1, t_S*+2, ..., t_{p+1}}. The threshold values determined by this recursive selection strategy are ensured to achieve maximum separation of the resultant segmented classes of gray intensities, and hence satisfactory segmentation results of objects can be accomplished with the smallest number of thresholding levels.
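The recursion described above, i.e. repeatedly bi-partitioning the class with the largest contribution w_k σ_k² until SF is sufficiently large, can be sketched as follows. Plain Otsu bi-class selection stands in for the histogram-based selection procedure of Ref. [31], and the stopping value 0.95 is an illustrative choice rather than the paper's tuned parameter:

```python
import numpy as np

def otsu_split(P, lo, hi):
    """Bi-class partition of the gray-level interval [lo, hi): return the
    threshold t maximizing the between-class variance (a stand-in for the
    selection procedure of Ref. [31])."""
    g = np.arange(lo, hi)
    p = P[lo:hi]
    best_t, best_v = lo, -1.0
    for t in range(lo, hi - 1):
        m = t - lo + 1
        w0, w1 = p[:m].sum(), p[m:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (g[:m] * p[:m]).sum() / w0
        mu1 = (g[m:] * p[m:]).sum() / w1
        v = w0 * w1 * (mu0 - mu1) ** 2
        if v > best_v:
            best_t, best_v = t, v
    return best_t

def multilevel_thresholds(P, sf_goal=0.95, U=256):
    """Recursively split the class with maximal w_k * sigma_k^2 until
    SF (Eq. (5)) reaches sf_goal; returns the threshold set T. Assumes
    a non-homogeneous block (sigma^2 > 0); homogeneous blocks are
    filtered out beforehand by the test of Eq. (6)."""
    g = np.arange(U)
    mu = (g * P).sum()
    sigma2 = ((g - mu) ** 2 * P).sum()
    T = []
    while True:
        bounds = [0] + [t + 1 for t in sorted(T)] + [U]
        contrib = []                       # w_k * sigma_k^2 per class
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            w = P[lo:hi].sum()
            if w == 0:
                contrib.append(0.0)
                continue
            mu_k = (g[lo:hi] * P[lo:hi]).sum() / w
            contrib.append((P[lo:hi] * (g[lo:hi] - mu_k) ** 2).sum())
        if 1.0 - sum(contrib) / sigma2 >= sf_goal:
            return sorted(T)
        p = int(np.argmax(contrib))        # the class C_p to bi-partition
        T.append(otsu_split(P, bounds[p], bounds[p + 1]))

# three well-separated gray levels require exactly two thresholds
P = np.zeros(256)
P[30], P[128], P[220] = 0.3, 0.4, 0.3
```

On this toy histogram the first recursion splits off the darkest level and the second isolates the remaining two, after which SF = 1 and the recursion stops.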
Furthermore, if a region Θ^{i,j} comprises a set of pixels with homogeneous gray intensities, most of them are parts of a large homogeneous background region, and partitioning it would be redundant segmentation at needless computational cost. For example, Fig. 4(b) is a block region with such characteristics. Therefore, before performing the first partition procedure on the region Θ^{i,j}, an investigation of the homogeneity of Θ^{i,j} should be conducted in advance to avoid such redundant segmentation. This condition can be determined by evaluating the following two statistical features: (1) the bi-class SF measure, denoted SF_b, which is the SF value obtained by performing the initial bi-class partition procedure on the region Θ^{i,j}, i.e. the SF value associated with the determined threshold t_S*; and (2) the standard deviation σ of the gray intensities of the pixels in the entire region Θ^{i,j}. According to the aforementioned properties, the SF_b value reflects the separability of the statistical distribution of gray intensities of pixels in the entire region Θ^{i,j}: the lower the SF_b value, the more indistinct or uniform the distribution. The standard deviation σ indicates whether the distribution of gray intensities in Θ^{i,j} is widely dispersed or narrowly aggregated. Therefore, a region Θ^{i,j} is determined to be a homogeneous region, comprising a set of homogeneous pixels of a uniform object or parts thereof, if both the SF_b and σ features reveal low values. On the other hand, if SF_b is small but σ is large, the region Θ^{i,j} may consist of many indistinct object regions with low separability, and should still undergo the recursive partition process to separate all objects. Based on the above-mentioned phenomenon, a region Θ^{i,j} can be recognized as a homogeneous region if the following homogeneity condition is satisfied:

SF_b ≤ h0 and σ ≤ h1    (6)

where h0 and h1 are pre-defined thresholds. If a region Θ^{i,j} is recognized as a homogeneous region, then it does not need to undergo the partition process, and its pixels of homogeneous objects are kept unchanged to be processed by the next stage.
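The homogeneity test of Eq. (6) can be sketched as follows; the threshold values h0 and h1 used here are illustrative placeholders, not the values the paper determines empirically in Section 6:

```python
import numpy as np

def region_stats(block):
    """Normalized histogram P_g and standard deviation sigma of a block."""
    f = np.bincount(block.ravel().astype(int), minlength=256)
    P = f / f.sum()
    g = np.arange(256)
    mu = (g * P).sum()
    return P, np.sqrt(((g - mu) ** 2 * P).sum())

def bi_class_sf(P):
    """SF_b: the SF value of the best single-threshold (bi-class)
    partition, i.e. the maximum over t of v_BC(t)/sigma^2."""
    g = np.arange(256)
    mu = (g * P).sum()
    sigma2 = ((g - mu) ** 2 * P).sum()
    if sigma2 == 0:
        return 0.0                      # perfectly uniform region
    w0 = np.cumsum(P)[:-1]              # cumulative mass up to each t
    m0 = np.cumsum(g * P)[:-1]          # cumulative first moment
    valid = (w0 > 0) & (w0 < 1)
    v_bc = np.where(valid,
                    (mu * w0 - m0) ** 2 / (w0 * (1 - w0) + 1e-12), 0.0)
    return float(v_bc.max() / sigma2)

def is_homogeneous(block, h0=0.6, h1=16.0):
    """Homogeneity condition of Eq. (6): SF_b <= h0 and sigma <= h1.
    h0 and h1 are illustrative, not the paper's tuned values."""
    P, sigma = region_stats(block)
    return bi_class_sf(P) <= h0 and sigma <= h1

flat = np.full((32, 32), 200)           # uniform background block
bimodal = np.concatenate([np.full(512, 40),
                          np.full(512, 210)]).reshape(32, 32)
```

A flat block passes the test and skips partitioning entirely, while a bimodal block (high SF_b and high σ) falls through to the recursive partition process.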
3.2. Recursive partition process of localized regions
Based on the above-mentioned concepts, the localized automatic multilevel thresholding process is performed by the following recursive steps:
Step 1: To begin, the illumination image Y of size W_img × H_img is divided into localized block regions Λ_{i,j} of the given size M_H × M_V, as shown in Fig. 3. Here (i, j) are the location indices, with i = 0, ..., N_H and j = 0, ..., N_V, where N_H = (W_img/M_H) − 1 and N_V = (H_img/M_V) − 1 represent the numbers of divided block regions per row and per column, respectively.
Step 2: For each block region Λ_{i,j}, compute the histogram of pixels in Λ_{i,j}, and then determine its associated standard deviation σ_{i,j} and bi-class separability measure SF_b. Initially there is only one class, C_0^{i,j}; let q represent the present number of classes, and thus set q = 1. If the homogeneity condition, i.e. Eq. (6), is satisfied, then skip the localized thresholding process for this region Λ_{i,j} and go to Step 7; else perform the following steps.
Step 3: Currently, q classes exist, having been decomposed from Λ_{i,j}. Compute the class probability w_k^{i,j}, the class mean μ_k^{i,j}, and the standard deviation σ_k^{i,j} of each existing class C_k^{i,j} of gray intensities decomposed from Λ_{i,j}, where k denotes the index of the present classes and k = 0, ..., q − 1.
Step 4: From all classes C_k^{i,j}, determine the class C_p^{i,j} which contributes the maximal within-class variance v_WC^{i,j} of Λ_{i,j}, to be partitioned in the next step in order to achieve the maximal increment of SF.
Step 5: Partition C_p^{i,j} : {t_p^{i,j} + 1, t_p^{i,j} + 2, ..., t_{p+1}^{i,j}} into two classes C_{p0}^{i,j} : {t_p^{i,j} + 1, ..., t_S^{i,j*}} and C_{p1}^{i,j} : {t_S^{i,j*} + 1, ..., t_{p+1}^{i,j}}, using the optimal threshold t_S^{i,j*} determined by the bi-class partition procedure. Consequently, the gray intensities of the region Λ_{i,j} are partitioned into q + 1 classes, C_0^{i,j}, ..., C_{p0}^{i,j}, C_{p1}^{i,j}, ..., C_{q−1}^{i,j}; then let q = q + 1 to update the current class count.
Step 6: Compute the SF value of all currently obtained classes using Eq. (5). If the objective condition, SF ≥ T_SF, is satisfied, then perform the following Step 7; otherwise, go back to Step 3 to conduct a further partition of the obtained classes.
Step 7: Classify the pixels of the block region Λ_{i,j} into separate SRs, SR_{i,j,0}, SR_{i,j,1}, ..., SR_{i,j,q−1}, corresponding to the partitioned classes of gray intensities C_0^{i,j}, C_1^{i,j}, ..., C_{q−1}^{i,j}, respectively, where the notation SR_{i,j,k} represents the k-th SR decomposed from the region Λ_{i,j}. Consequently, we obtain

⋃_{k=0}^{q−1} SR_{i,j,k} = Λ_{i,j}, and SR_{i,j,k1} ∩ SR_{i,j,k2} = ∅ for k1 ≠ k2

Then, finish the localized thresholding process on Λ_{i,j}, go back to Step 2, and repeat Steps 2–6 to recursively partition the remaining block regions; if all block regions have been processed, go to Step 8.
Step 8: Terminate the segmentation process and deliver all obtained SRs of the corresponding block regions.
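The recursion of Steps 2–7 for a single block region can be sketched as below. This is not the paper's exact implementation: Otsu's between-class-variance split stands in for the bi-class partition procedure, the ratio of between-class to total variance stands in for the SF measure, and `otsu_split`, `sf`, and `multilevel_threshold` are hypothetical helper names; the default thresholds follow the values suggested in the text.

```python
from statistics import mean, pvariance

def otsu_split(values):
    """Stand-in for the paper's bi-class partition procedure: the threshold
    maximizing the between-class variance (Otsu's criterion)."""
    best_t, best_bcv = None, -1.0
    for t in sorted(set(values))[:-1]:
        lo = [v for v in values if v <= t]
        hi = [v for v in values if v > t]
        bcv = len(lo) * len(hi) / len(values) ** 2 * (mean(lo) - mean(hi)) ** 2
        if bcv > best_bcv:
            best_t, best_bcv = t, bcv
    return best_t

def sf(classes, total_var):
    """Stand-in separability measure: between-class variance over total
    variance, approaching 1.0 as the classes become well separated."""
    n = sum(len(c) for c in classes)
    mu = sum(len(c) * mean(c) for c in classes) / n
    bcv = sum(len(c) * (mean(c) - mu) ** 2 for c in classes) / n
    return bcv / total_var if total_var > 0 else 1.0

def multilevel_threshold(region, t_sf=0.92, t_h0=0.6, t_h1=11.0):
    """Steps 2-7 for one block region `region` (a flat list of gray values):
    recursively split the class with the largest within-class variance
    until SF >= T_SF."""
    total_var = pvariance(region)
    # Step 2: homogeneity test on the bi-class split (Eq. (6))
    t0 = otsu_split(region)
    sf_b = 0.0 if t0 is None else sf(
        [[v for v in region if v <= t0], [v for v in region if v > t0]],
        total_var)
    if sf_b <= t_h0 and total_var ** 0.5 <= t_h1:
        return [sorted(region)]           # homogeneous: keep as one class
    classes = [sorted(region)]
    while sf(classes, total_var) < t_sf:  # Step 6: objective condition
        # Step 4: pick the class contributing the largest within-class variance
        wcv = [len(c) * pvariance(c) for c in classes]
        if max(wcv) == 0.0:
            break                         # nothing left to split
        p = wcv.index(max(wcv))
        t = otsu_split(classes[p])        # Step 5: bi-class partition of C_p
        classes[p:p + 1] = [[v for v in classes[p] if v <= t],
                            [v for v in classes[p] if v > t]]
    return classes
```

On a tri-level region the loop splits twice and stops once the stand-in SF reaches 1.0; a constant region is caught by the homogeneity test and returned as a single class.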
Here the separability measure threshold T_SF is a pre-defined threshold that determines whether the segmented objects in the block regions are sufficiently separated to satisfy the objective condition. From our experimental analysis of the block regions containing textual objects in the test images, most achieve satisfactory segmentation of homogeneous objects when their resultant SF values exceed 0.92 after the segmentation procedure, and complementary experimental analysis described in Ref. [31] shows similar results. Therefore, the value of T_SF is set to 0.92 to yield satisfactory segmentation results on the block regions. The thresholds T_{h0} and T_{h1} utilized in the homogeneity condition can be determined in a similar way. Observing the non-textual background regions containing pixels of homogeneous gray intensities, their associated SF_b features mostly reflect small values below 0.6, accompanied by corresponding standard deviation features below 11. Therefore, the thresholds T_{h0} and T_{h1} are chosen as 0.6 and 11, respectively, to detect non-textual homogeneous block regions before performing the thresholding process. Unnecessary segmentations that produce redundant SRs can thus be avoided, saving computation in the localized multilevel thresholding and the subsequent multi-plane region matching and assembling process.
With regard to the size parameters M_H × M_V of each block region, smaller block regions are desirable in order for the localized thresholding process to adapt to steep gradations and to extract the foreground objects in greater detail. Small objects can then be segmented more clearly, but at the cost of greater computation when performing the subsequent multi-plane region matching and assembling process. Suitably large values of M_H and M_V should therefore be chosen to moderately localize and accommodate the allowable character sizes, so that the textual objects contained in the images can still be clearly segmented. Hence, given an input document image, M_H and M_V should be automatically determined with respect to its scanning resolution RES (pixels per inch) by applying a size mapping parameter d:

M_H = M_V = d · RES   (7)

Based on the analysis of typical character sizes described in Ref. [32], and the fact that typical resolutions for scanning most real-life document images range from 200 to 600 dpi, the value of d is reasonably determined as 0.4 according to the typical allowable character sizes with respect to the scanning resolution RES. In this way, the size of each block region is about 10 × 10 mm² at different scanning resolutions: M_H = M_V = 80, 120, and 240 at 200, 300, and 600 dpi, respectively. These parameters were determined through experiments on numerous real-life document samples with various characteristics in our experimental set, such that nearly all foreground and textual objects in various document images were appropriately separated in the preliminary experiments.
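Eq. (7) with d = 0.4 can be checked directly; `block_size` is an illustrative helper name:

```python
def block_size(res_dpi, d=0.4):
    """Eq. (7): M_H = M_V = d * RES, keeping each block region at roughly
    10 x 10 mm regardless of the scanning resolution (d = 0.4 as in the
    text)."""
    return round(d * res_dpi)
```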
We utilize Fig. 4 as an example of performing the localized automatic multilevel thresholding procedure on several block regions. Here Fig. 4(a) is part of the sectored sample image in Fig. 3. Figs. 4(b), (d), (g), and (l) show four adjacent block regions, Λ_{i1,j1}, Λ_{i2,j1}, Λ_{i1,j2}, and Λ_{i2,j2}, with their corresponding SF_b and σ values, to illustrate the localized thresholding procedure. Fig. 4(b) is a homogeneous block region, properly detected by the homogeneity condition, and its pixels are therefore kept intact in Fig. 4(c). Figs. 4(d), (g), and (l) are block regions comprised of multiple homogeneous objects. After the localized histogram multilevel thresholding procedure has been performed, the different objects in these localized regions are distinctly segmented into separate SRs from darkest to lightest, and their corresponding resultant SF values approach 1.0, as shown in Figs. 4(e), (f), (h)–(k), and (m)–(p), respectively.
4. Multi-plane region matching and assembling process

Having decomposed all localized block regions into several separate classes of pixels by the localized multilevel thresholding procedure, the various objects embedded or superimposed in different background objects and textures are separated into relevant SRs. We then need a methodology for grouping them into meaningful objects, especially textual objects of interest, for the further extraction process. Concepts of grouping pixels into meaningful regions are widely applied in region-based image segmentation [33,34]. Nevertheless, contemporary pixel-based image segmentation techniques do not work well for segmenting textual objects in complex document images. More commonly, performing pixel-based region segmentation on textual objects may cause extracted printed characters to be fragmented, falsely connected, or occluded by non-text pictorial objects or background textures. Moreover, this approach incurs heavy computational costs when applied to real-life document images scanned at 200–600 dpi resolutions.
Therefore, there is a need for an effective segmentation approach that deals with regions instead of pixels, offering a considerable reduction in computational complexity while appropriately preserving the structural characteristics of extracted textual objects, particularly those of small characters with thin strokes. In this section, we present a multi-plane region matching and assembling method, which adopts both the localized spatial dissimilarity relation and the global feature information to perceptually classify and assemble the obtained SRs into a set of object planes (P_q) of homogeneous features, especially textual regions of interest. The proposed multi-plane region matching and assembling process is conducted by recursively performing the following three phases: the initial plane selection phase, the matching phase, and the plane construction phase.
4.1. Overview and basic definitions
To facilitate the matching and assembling process of the SRs obtained from the previous procedure, several concepts and definitions of statistical and spatial features of the SRs are introduced in this subsection. First, given that the localized multilevel thresholding process segments the N_H × N_V block regions of the original image into r SRs, a hypothetical “Pool” is adopted to initially collect these obtained SRs, representing that they are not yet classified into any object plane. The term 4-adjacent refers to the situation in which each SR has four sides that border the top, bottom, left, or right boundary of its adjoining SRs. The SRs comprised of objects with homogeneous features are assembled to form an object plane P_q. An object plane P_q represents a set of matching SRs such that, for each pair of SRs in P_q, there is some finite chain of SRs connecting them in which each successive pair of SRs is 4-adjacent.
Furthermore, each SR may comprise several connected object regions of pixels decomposed from its associated block region Λ_{i,j}. The pixels that belong to the object regions of a certain SR are said to be the object pixels of this SR, while the other pixels in this SR are non-object pixels. The set of the object pixels in an SR indexed at (i, j, k) is defined as follows:

OP(SR_{i,j,k}) = {g(SR_{i,j,k}, x, y) | the pixel at (x, y) is an object pixel in SR_{i,j,k}}

where g(SR_{i,j,k}, x, y) is the gray intensity of the pixel at location (x, y) in SR_{i,j,k}, the range of x is within [0, M_H − 1], and y is within [0, M_V − 1]. The total number of object pixels in SR_{i,j,k}, i.e. the cardinality of OP(SR_{i,j,k}), is represented by N_op(SR_{i,j,k}). Then, a mean feature μ(SR_{i,j,k}) is also accordingly obtained for each of these SRs. Here μ(SR_{i,j,k}) is the mean of the gray intensities of the object pixels comprised by SR_{i,j,k}, and is equivalent to μ_k^{i,j} obtained in the localized multilevel thresholding process.
Accordingly, given the unclassified SRs in the Pool, the initial plane selection phase is first performed on these unclassified SRs to determine a representative set of seed SRs {SR*_m, m = 0 : N − 1}, and then N initial object planes {P_m : m = 0 : N − 1} are set up based on these selected seed SRs. Afterward, the matching phase is performed on the remaining unclassified SRs in the Pool and these initial planes, to determine the association and belongingness of these SRs with the existing object planes. For unclassified SRs whose features are perceptibly distinct from the currently existing planes, the plane construction phase is then conducted to create and initialize an appropriate new plane, so that SRs with such features can be assembled into this new plane to form another homogeneous object region in the subsequent matching phase recursion. After the first pass of the multi-plane region matching and assembling process has been performed, the matching phase and the plane construction phase are recursively performed in turn on the remaining unclassified SRs in the Pool and the emerging planes, until each SR has been classified and associated with a particular plane and the Pool is eventually cleared. As a result, the whole illumination image Y is segmented into a set of separate object planes {P_q : q = 0 : L − 1}, each of which consists of homogeneous objects with connected and similar features, such as textual regions of interest, non-text objects such as graphics and pictures, and background textures. Consequently, we obtain

⋃_{q=0}^{L−1} P_q = Y, with P_{q1} ∩ P_{q2} = ∅ for q1 ≠ q2

where L is the number of resultant planes obtained. In the following subsections, we describe the detailed elements of the proposed multi-plane region matching and assembling process.
4.2. Initial plane selection phase
In this initial processing phase, determining the number and approximate locations of the significant clusters of SRs in the Pool can improve the speed and accuracy of the final convergence of the multi-plane region matching and assembling process. For this purpose, the subtractive/mountain clustering technique [35,36] is applied to determine the SRs with the most prominent and representative gray intensity features in the Pool. The SRs selected as seeds by the mountain clustering process are then adopted to establish a set of initial object planes, for clustering the SRs that share homogeneous features with them.
The mountain method is a fast, one-pass algorithm which utilizes the density of features to determine the most representative feature points as approximate cluster centers. Here we employ the mean features associated with the SRs, i.e. μ(SR), as the feature points in the mountain clustering process. To facilitate the description of the mountain clustering process, the region dissimilarity measure D_RM between a pair of SRs, SR_{i,j,k} and SR_{i′,j′,k′}, is defined as

D_RM(SR_{i,j,k}, SR_{i′,j′,k′}) = ||μ(SR_{i,j,k}) − μ(SR_{i′,j′,k′})||   (8)

The range of D_RM is within [0, 255]; the lower the computed value of D_RM, the stronger the similarity between the two SRs. Then, the initial mountain function at an SR is computed as

M(SR_{i,j,k}) = Σ_{∀SR_{i′,j′,k′} ∈ Pool} e^{−α · D_RM(SR_{i,j,k}, SR_{i′,j′,k′})}   (9)

where α is a positive constant. It is obvious from Eq. (9) that an SR that attracts more SRs with similar features obtains a high value of the mountain function. The mountain can be viewed as a measure of the density of SRs in the vicinity of the gray intensity feature space. Therefore, it is reasonable to choose the SRs with the most significant mountain values as representative seeds for creating object planes. Let M*_m denote the maximal value of the m-th mountain function, and SR*_m denote the SR whose mountain value is M*_m. They are determined by

M*_m = M_m(SR*_m) = max_{∀SR_{i,j,k} ∈ Pool} [M_m(SR_{i,j,k})]   (10)
First, by applying Eqs. (9) and (10) to all the SRs in the Pool, we obtain the first (and highest) mountain M*_0 and its associated representative SR, SR*_0. SR*_0 is selected as the seed of the first initial plane. After the first iteration of mountain clustering, the subsequent representative seed SRs are determined by successively destructing the mountains. This is necessary because SRs whose gray intensity features are close to previously determined seed SRs have an influential effect on the values of the subsequent mountain functions, and these effects of the identified seed SRs must be eliminated before determining the follow-up seed SRs. Toward this purpose, the updated mountain function, after eliminating the last seed SR, SR*_{m−1}, is computed by

M_m(SR_{i,j,k}) = M_{m−1}(SR_{i,j,k}) − M*_{m−1} · e^{−β · D_RM(SR_{i,j,k}, SR*_{m−1})}   (11)
where the parameter β determines the neighborhood radius that provides measurable reductions in the updated mountain function. Accordingly, by recursively performing the discount process of the mountain function given by Eq. (11), new suitable seed SRs can be determined in the same manner, until the current maximal value M*_{m−1} falls below a certain level compared to that of the first maximal mountain M*_0. The termination criterion of this procedure is defined as

(M*_{m−1} / M*_0) < δ   (12)

where δ is a positive constant less than 1. Here the parameters are selected as α = 5.4, β = 1.5, and δ = 0.45, as suggested by Pal and Chakraborty [37]. Consequently, this process converges to the determination of the N resultant seed SRs {SR*_m, m = 0 : N − 1}, which are utilized to establish the N initial object planes {P_m : m = 0 : N − 1} for the following matching phase.
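The seed-selection loop of Eqs. (9)–(12) can be sketched as follows. The mean features are normalized to [0, 1] before the Pal–Chakraborty parameters are applied (a scaling assumption of this sketch; the paper does not state the feature scaling), and `select_seeds` is a hypothetical helper name:

```python
import math

def select_seeds(means, alpha=5.4, beta=1.5, delta=0.45):
    """Seed selection by the mountain method (Eqs. (9)-(12)) over the SR
    mean features mu(SR). Returns the indices of the selected seed SRs."""
    if not means:
        return []
    xs = [v / 255.0 for v in means]       # normalize D_RM from [0, 255]
    # Eq. (9): initial mountain function, with D_RM = |mu_i - mu_j| (Eq. (8))
    m = [sum(math.exp(-alpha * abs(xi - xj)) for xj in xs) for xi in xs]
    seeds, m0_star = [], None
    while True:
        m_star = max(m)                   # Eq. (10)
        p = m.index(m_star)
        if m0_star is None:
            m0_star = m_star              # first (highest) mountain M*_0
        elif m_star / m0_star < delta:    # Eq. (12): termination criterion
            break
        if p in seeds:                    # safety guard for this sketch
            break
        seeds.append(p)
        # Eq. (11): destruct the mountain around the new seed SR*_{m-1}
        m = [m[i] - m_star * math.exp(-beta * abs(xs[i] - xs[p]))
             for i in range(len(xs))]
    return seeds
```

For two well-separated dense clusters of mean features, the loop picks one seed per cluster and then terminates via Eq. (12).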
4.3. Matching phase
Given a set of existing object planes from the initial processing phase or from previous iterations of the assembling process, an efficient methodology is needed to associate and assemble the unclassified SRs remaining in the Pool with these object planes, in order to produce appropriate segmentation results for textual objects. Toward this goal, we present a matching process that evaluates, for each unclassified SR, its mutual connectedness and similarity with the already existing planes, and determines its best belonging plane.
4.3.1. Matching grades
To effectively determine the best belonging plane of an unclassified SR, we employ a hybrid methodology, named the matching grade evaluation, for evaluating the mutual connectedness and similarity between them. This hybrid evaluation methodology considers both the local pair-wise and the global information provided by the SRs and the existing planes, based on two forms of matching grades: the single-link matching grade and the centroid-link matching grade. The single-link matching grade examines the degree of local disconnectedness between a pair of neighboring SRs, namely an unclassified SR and a neighboring classified SR that already has a belonging plane, while the centroid-link matching grade assesses the degree of global dissimilarity between an unclassified SR and an already existing plane. The two matching grades are then combined into an effective hybrid criterion to determine the best belonging plane for the unclassified SR among all the existing planes.

During a given matching phase recursion, if an unclassified SR finds its best belonging plane after examination of their mutual matching grade, this SR is classified and assembled into that plane and removed from the Pool; otherwise, if there is no suitable matching plane for an unclassified SR at this time, the SR remains unclassified in the Pool. Since new potential object planes will be created in the following recursion of the plane construction phase, SRs remaining unclassified in the current matching phase recursion will be re-analyzed in subsequent recursions until their best matching planes are determined.
The single-link matching grade is utilized to examine, in a local manner, the degree of disconnectedness between an unclassified SR in the Pool, SR_{i,j,k}, and an already existing plane P_q. It is determined by applying a connectedness measure to SR_{i,j,k} and its 4-adjacent SRs that already belong to an existing plane P_q, denoted by SR_q^{i′,j′,k′}, where the subscript q indicates that SR_q^{i′,j′,k′} belongs to the q-th plane P_q. To effectively evaluate the single-link matching grade, two measures of the discontinuity and dissimilarity between a pair of 4-adjacent SRs are employed: the side-match measure, denoted by D_SM, and the region dissimilarity measure D_RM, as computed using Eq. (8). Both the D_SM and D_RM measures are jointly considered to determine the single-link matching grade of a pair of 4-adjacent SRs.

The side-match measure D_SM, which examines the degree of disconnectedness of the touching boundary between SR_{i,j,k} and SR_q^{i′,j′,k′}, is described as follows. Given that such a pair of SRs is 4-adjacent, they have one of two types of touching boundaries: (1) a vertical touching boundary mutually shared by two horizontally adjacent SRs, as shown in Fig. 5(a), or (2) a horizontal boundary shared by two vertically adjacent SRs, as shown in Fig. 5(b).
First, given a pair of horizontally adjacent SRs, SR_{i,j,k} on the left and SR_q^{i′,j′,k′} on the right, the gray intensities of the pixels on the rightmost side of SR_{i,j,k} and on the leftmost side of SR_q^{i′,j′,k′} can be described as g(SR_{i,j,k}, M_H − 1, y) and g(SR_q^{i′,j′,k′}, 0, y), respectively. Then the sets of object pixels on the rightmost side and the leftmost side of a given SR, denoted by RS(SR_{i,j,k}) and LS(SR_{i,j,k}), respectively, are defined as follows:

RS(SR_{i,j,k}) = {g(SR_{i,j,k}, M_H − 1, y) | g(SR_{i,j,k}, M_H − 1, y) ∈ OP(SR_{i,j,k}), and 0 ≤ y ≤ M_V − 1}

and

LS(SR_{i,j,k}) = {g(SR_{i,j,k}, 0, y) | g(SR_{i,j,k}, 0, y) ∈ OP(SR_{i,j,k}), and 0 ≤ y ≤ M_V − 1}

To facilitate the following descriptions of the side-match features, the denotations of SR_{i,j,k} and SR_q^{i′,j′,k′} are simplified to SR_l and SR_r, respectively. The vertical touching boundary of SR_l and SR_r, denoted as VB(SR_l, SR_r), is represented by the set of side connections formed by pairs of object pixels that are symmetrically connected on the associated rightmost and leftmost sides, and is defined as follows:

VB(SR_l, SR_r) = {(g(SR_l, M_H − 1, y), g(SR_r, 0, y)) | g(SR_l, M_H − 1, y) ∈ RS(SR_l), and g(SR_r, 0, y) ∈ LS(SR_r)}
Similarly, in the case that SR_{i,j,k} and SR_q^{i′,j′,k′} are vertically adjacent (suppose that SR_{i,j,k} is on the top and SR_q^{i′,j′,k′} is on the bottom, with their denotations simplified to SR_t and SR_b, respectively), their horizontal touching boundary can be represented as

HB(SR_t, SR_b) = {(g(SR_t, x, M_V − 1), g(SR_b, x, 0)) | g(SR_t, x, M_V − 1) ∈ BS(SR_t), and g(SR_b, x, 0) ∈ TS(SR_b)}

where BS(SR_t) and TS(SR_b) represent the bottommost side of SR_t and the topmost side of SR_b, respectively, and are defined as

BS(SR_{i,j,k}) = {g(SR_{i,j,k}, x, M_V − 1) | g(SR_{i,j,k}, x, M_V − 1) ∈ OP(SR_{i,j,k}), and 0 ≤ x ≤ M_H − 1}

and

TS(SR_{i,j,k}) = {g(SR_{i,j,k}, x, 0) | g(SR_{i,j,k}, x, 0) ∈ OP(SR_{i,j,k}), and 0 ≤ x ≤ M_H − 1}

Also, the number of side connections of the touching boundary, i.e. the number of connected pixel pairs in VB(SR_{i1,j1,k1}, SR_{i2,j2,k2}) or HB(SR_{i1,j1,k1}, SR_{i2,j2,k2}), should be considered for normalizing the disconnectedness measure of the two 4-adjacent SRs, and is denoted by N_sc(SR_{i1,j1,k1}, SR_{i2,j2,k2}).
Therefore, the horizontal and vertical types of the side-match measure of a pair of 4-adjacent SRs, denoted by D^h_SM and D^v_SM, respectively, are computed as

D^h_SM(SR_l, SR_r) = [Σ_{(g(SR_l, M_H−1, y), g(SR_r, 0, y)) ∈ VB(SR_l, SR_r)} ||g(SR_l, M_H − 1, y) − g(SR_r, 0, y)||] / N_sc(SR_l, SR_r)

and

D^v_SM(SR_t, SR_b) = [Σ_{(g(SR_t, x, M_V−1), g(SR_b, x, 0)) ∈ HB(SR_t, SR_b)} ||g(SR_t, x, M_V − 1) − g(SR_b, x, 0)||] / N_sc(SR_t, SR_b)   (13)

Accordingly, the side-match measure of SR_{i,j,k} and SR_q^{i′,j′,k′} is obtained by

D_SM(SR_{i,j,k}, SR_q^{i′,j′,k′}) =
  D^h_SM(SR_l, SR_r)  if SR_{i,j,k} and SR_q^{i′,j′,k′} are horizontally adjacent
  D^v_SM(SR_t, SR_b)  if SR_{i,j,k} and SR_q^{i′,j′,k′} are vertically adjacent   (14)

The range of D_SM values is within [0, 255]. If the D_SM value of two 4-adjacent SRs is sufficiently low, then the two SRs are homogeneous with each other, and thus they should belong to the same plane.
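A sketch of the horizontal side-match measure of Eq. (13), again assuming the 2D-grid-with-None representation of an SR introduced earlier (the vertical case of Eq. (13) is symmetric, over the bottommost and topmost rows); `side_match_h` is an illustrative name:

```python
def side_match_h(sr_left, sr_right):
    """Horizontal side-match measure D^h_SM of Eq. (13) for two horizontally
    4-adjacent SRs, each a 2D grid of gray values with None marking
    non-object pixels. Returns None when VB(SR_l, SR_r) is empty."""
    diffs = []
    for row_l, row_r in zip(sr_left, sr_right):
        g_l, g_r = row_l[-1], row_r[0]    # rightmost / leftmost columns
        if g_l is not None and g_r is not None:
            diffs.append(abs(g_l - g_r))  # a side connection in VB(SR_l, SR_r)
    if not diffs:
        return None
    return sum(diffs) / len(diffs)        # normalized by N_sc(SR_l, SR_r)
```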
Accordingly, the D_SM measure reflects the disconnectedness of two 4-adjacent SRs, while the D_RM value, as obtained by Eq. (8), assesses the dissimilarity between them. The single-link matching grade, denoted by m_s, evaluates both the degree of disconnectedness and the dissimilarity of the two 4-adjacent SRs by considering the dominant effect of their associated D_SM and D_RM values, and is determined by

m_s(SR_{i,j,k}, SR_q^{i′,j′,k′}) = max(D_SM(SR_{i,j,k}, SR_q^{i′,j′,k′}), D_RM(SR_{i,j,k}, SR_q^{i′,j′,k′})) / max(σ(SR_{i,j,k}) + σ(SR_q^{i′,j′,k′}), 1)   (15)

where σ(SR_{i,j,k}) is the standard deviation of the gray intensities of all object pixels associated with SR_{i,j,k}, and is equivalent to σ_k^{i,j} obtained in the localized histogram multilevel thresholding process. The denominator term max(σ(SR_{i,j,k}) + σ(SR_q^{i′,j′,k′}), 1) in Eq. (15) serves as the normalization factor.
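Given precomputed D_SM and D_RM values, Eq. (15) is a one-liner; the function name is illustrative:

```python
def single_link_grade(d_sm, d_rm, sigma_a, sigma_b):
    """Single-link matching grade m_s of Eq. (15): the dominant of the two
    precomputed measures D_SM and D_RM, normalized by the summed standard
    deviations of the two SRs (clamped below at 1)."""
    return max(d_sm, d_rm) / max(sigma_a + sigma_b, 1.0)
```

The clamp at 1 prevents the grade from blowing up for nearly constant SRs whose standard deviations are close to zero.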
Next, the centroid-link matching grade, which evaluates the degree of dissimilarity between SR_{i,j,k} and an already existing plane P_q in a global manner, is given as follows. Let μ(P_q) and σ²(P_q) denote the mean and variance of the existing plane P_q, respectively; they are given by

μ(P_q) = [Σ_{SR_q^{i′,j′,k′} ∈ P_q} N_op(SR_q^{i′,j′,k′}) · μ(SR_q^{i′,j′,k′})] / N_op(P_q)   (16)

and

σ²(P_q) = [Σ_{SR_q^{i′,j′,k′} ∈ P_q} N_op(SR_q^{i′,j′,k′}) · ||μ(SR_q^{i′,j′,k′}) − μ(P_q)||²] / N_op(P_q)   (17)

where N_op(P_q) denotes the number of pixels in P_q, given by

N_op(P_q) = Σ_{SR_q^{i′,j′,k′} ∈ P_q} N_op(SR_q^{i′,j′,k′})   (18)

Accordingly, the centroid-link matching grade of SR_{i,j,k} and P_q is computed by

m_c(SR_{i,j,k}, P_q) = ||μ(SR_{i,j,k}) − μ(P_q)|| / max(σ(SR_{i,j,k}) + σ(P_q), 1)   (19)

If SR_{i,j,k} is finally determined to be merged into the plane P_q, then the mean μ(P_q) and variance σ²(P_q) of P_q should be updated after taking in SR_{i,j,k}. The new mean and variance of P_q are, respectively, computed by

μ(P_q^new) = [N_op(P_q^prev) · μ(P_q^prev) + N_op(SR_{i,j,k}) · μ(SR_{i,j,k})] / [N_op(P_q^prev) + N_op(SR_{i,j,k})]   (20)

and

σ²(P_q^new) = [N_op(P_q^prev) · σ²(P_q^prev) + N_op(SR_{i,j,k}) · ||μ(SR_{i,j,k}) − μ(P_q^new)||² + N_op(P_q^prev) · ||μ(P_q^new) − μ(P_q^prev)||²] / [N_op(P_q^prev) + N_op(SR_{i,j,k})]   (21)

where P_q^new denotes the newly expanded plane P_q and P_q^prev denotes the previous one; μ(P_q^new) and σ²(P_q^new) represent the updated mean and variance of P_q, respectively, while μ(P_q^prev) and σ²(P_q^prev) represent the previous ones.
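Eqs. (20)–(21) form a standard incremental update of a weighted mean and of the pixel-count-weighted variance of the member SRs' mean features (the quantity defined in Eq. (17)). A sketch with a hypothetical helper name, which can be checked against the batch definitions of Eqs. (16)–(17):

```python
def update_plane(n_prev, mu_prev, var_prev, n_sr, mu_sr):
    """Absorb an SR with N_op = n_sr and mean feature mu_sr into a plane
    whose current statistics are (n_prev, mu_prev, var_prev); returns the
    updated (N_op, mean, variance) per Eqs. (20)-(21)."""
    n_new = n_prev + n_sr
    mu_new = (n_prev * mu_prev + n_sr * mu_sr) / n_new            # Eq. (20)
    var_new = (n_prev * var_prev
               + n_sr * (mu_sr - mu_new) ** 2
               + n_prev * (mu_new - mu_prev) ** 2) / n_new        # Eq. (21)
    return n_new, mu_new, var_new
```

Folding SRs in one at a time reproduces exactly the batch-computed weighted mean and variance, which is why the plane statistics never need to be recomputed from scratch during the matching phase.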
Both of the above-mentioned matching grades are then combined to form a composite matching grade, denoted by M(SR_{i,j,k}, P_q), to complementarily assess the degree of disconnectedness and dissimilarity of an unclassified SR and an already existing plane in both the local pair-wise and the global manner. Consequently, the composite matching grade provides a more effective criterion for determining the best belonging plane for each of the unclassified SRs. In each recursion of the matching phase, each unclassified SR in the Pool, SR_{i,j,k}, is analyzed by evaluating the composite matching grade of SR_{i,j,k} associated with each of its neighboring existing planes P_q, to seek the best matching plane to which SR_{i,j,k} should belong.

Since the evaluation of the composite matching grades of SR_{i,j,k} is performed on its neighboring planes, a plane P_q must have at least one of its own SRs 4-adjacent to SR_{i,j,k} in order to compete for the belongingness of SR_{i,j,k}. To facilitate the computation of the composite matching grade of SR_{i,j,k} and a plane P_q, the processing set AS(SR_{i,j,k}, P_q) is utilized to store the SRs which belong to P_q and are also 4-adjacent to SR_{i,j,k}, and is defined by

AS(SR_{i,j,k}, P_q) = {SR_q^{i′,j′,k′} ∈ P_q | SR_q^{i′,j′,k′} is 4-adjacent to SR_{i,j,k}}

Then the composite matching grade M of SR_{i,j,k} associated with the plane P_q, which reveals how well SR_{i,j,k} matches P_q, is determined by

M(SR_{i,j,k}, P_q) = w_c · m_c(SR_{i,j,k}, P_q) + w_s · min_{∀SR_q^{i′,j′,k′} ∈ AS(SR_{i,j,k}, P_q)} m_s(SR_{i,j,k}, SR_q^{i′,j′,k′})   (22)

where w_c and w_s are the weighting factors that control the weighted contributions of the centroid-linkage and single-linkage strengths to the composite matching grade, respectively, and w_c + w_s = 1. By applying the weighting factors w_c and w_s in the composite matching grade, the centroid-linkage and the single-linkage can be combined to take advantage of their related strengths. Because textual regions mostly