Jigsaw-puzzle layer construction algorithm

4. THE MULTI-LAYER SEGMENTATION METHOD FOR COMPLEX DOCUMENT

4.2 Multi-layer segmentation method

4.2.2 Jigsaw-puzzle layer construction algorithm

A sub-block image may be composed of one or more object images with different intensity features. These object images may be parts of a larger object, one or several character patterns with different intensities, or a piece of a background texture component. The block-based clustering algorithm decomposes each sub-block image into layered sub-block images (LSBs), ordered from darkest to brightest with respect to the original sub-block image. The jigsaw-puzzle layer construction algorithm uses statistical and spatial features of adjacent LSBs to assemble all LSBs that belong to the same text paragraph or object. This section describes how the object layers are constructed from the LSBs generated by the block-based clustering algorithm of the preceding section.

Before explaining the algorithm, we introduce the definitions it relies on.

We define the 4-adjacent relation as follows: each LSB has four sides, which border the adjacent sub-block images on its top, bottom, left, and right. Each side of an LSB adjoins the several LSBs derived from the adjacent sub-block image, but it matches only one of them. An object layer is assembled from LSBs that match their adjacent LSBs, and all LSBs of the object image are recorded in a finite chain. Since text strings in documents are mostly printed horizontally or vertically, the text strings of a document image are continuous in the horizontal or vertical direction. Hence the 4-adjacent property is appropriate for determining the connectedness of LSBs, and the continuity between adjacent LSBs is used to match LSBs that are 4-adjacent. The pixels of each LSB form a specified subset of the pixels of the corresponding sub-block image. An LSB may comprise several connected regions; the pixels of these connected regions are called valid pixels, and the remaining pixels of the LSB are called invalid pixels.
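As an illustration, the 4-adjacent relation between sub-block positions may be sketched in Python as follows. The sketch assumes that sub-blocks are indexed by integer positions (i, j) on a grid of n_i by n_j blocks; the function name and the grid parameters are ours and are not part of the original algorithm.

```python
def four_adjacent(i, j, n_i, n_j):
    """Return the block positions that share a side with block (i, j).

    Assumption: blocks are indexed 0..n_i-1 and 0..n_j-1 along the two axes.
    Positions outside the grid are discarded.
    """
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n_i and 0 <= b < n_j]


# A corner block has two 4-adjacent neighbours, an interior block has four.
corner = four_adjacent(0, 0, 3, 3)
interior = four_adjacent(1, 1, 3, 3)
```

Every LSB derived from one of the returned positions is a potential neighbour of the LSBs at (i, j); which of them actually matches is decided by the measures defined next.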

The parameter LSB(i,j,k) denotes the k-th LSB decomposed from the sub-block image at block position (i,j). If LSB(i,j,k) is matched to the object layer L_q, it is denoted as LSB_q(i,j,k), where the subscript q indicates the q-th object layer. The value of the valid pixel located at (x, y) in LSB(i,j,k) is denoted as Pix(LSB, x, y), with x=0~(K-1) and y=0~(L-1). Two measurements of the continuity between two LSBs are defined.

First, the mean distortion of all touching valid pixels at the boundary between two adjacent LSBs, called the side-match distortion, is denoted as the DSM. For instance, consider two horizontally adjacent LSBs of dimension K×L, where the left one is denoted by LSB_l and the right one by LSB_r. Their pixel values on the touching boundary are Pix(LSB_l, K-1, y) and Pix(LSB_r, 0, y), y=0~(L-1). Only the valid boundary pixels are taken into account for the DSM. The touching boundaries of two horizontally adjacent LSBs form a vertical edge, and the valid pixels that are symmetrically located on both sides of this edge are regarded as the valid side connection. The number of valid pixel pairs in the valid side connection, denoted by N_vs(LSB_l, LSB_r), reflects the connectedness of the two adjacent LSBs. Hence the DSM of two horizontally adjacent LSBs is the mean absolute difference over the valid side connection:

$$D_{SM}(LSB_l, LSB_r) = \frac{1}{N_{vs}(LSB_l, LSB_r)} \sum_{y} \left| Pix(LSB_l, K-1, y) - Pix(LSB_r, 0, y) \right| \qquad (4\text{-}8)$$

where the sum runs over the rows y at which both Pix(LSB_l, K-1, y) and Pix(LSB_r, 0, y) are valid pixels.

The DSM is the average difference between the valid pixels in the valid side connection. If the value of the DSM is small, the two adjacent LSBs are likely to belong to the same object layer. The range of the DSM value is 0~255.
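The side-match distortion described above may be sketched in Python as follows. This is a minimal sketch under our own assumptions: each LSB is stored as a list of L rows with K entries, an entry is a gray value (0~255) for a valid pixel and None for an invalid pixel, and the DSM is taken as the mean absolute difference over the valid side connection. The function name and the data layout are hypothetical.

```python
def side_match_distortion(lsb_left, lsb_right):
    """Return (DSM, N_vs) for two horizontally adjacent LSBs.

    Assumption: an LSB is a list of rows; a pixel is an int gray value
    when valid and None when invalid.
    """
    diffs = []
    for row_l, row_r in zip(lsb_left, lsb_right):
        p_l = row_l[-1]  # Pix(LSB_l, K-1, y): last column of the left LSB
        p_r = row_r[0]   # Pix(LSB_r, 0, y): first column of the right LSB
        if p_l is not None and p_r is not None:  # pair in the valid side connection
            diffs.append(abs(p_l - p_r))
    n_vs = len(diffs)
    # The DSM is undefined when no valid pair exists; None marks that case here.
    dsm = sum(diffs) / n_vs if n_vs else None
    return dsm, n_vs


left = [[10, 20], [None, 30], [5, None]]
right = [[24, 0], [40, 1], [7, 2]]
dsm, n_vs = side_match_distortion(left, right)  # dsm = 7.0, n_vs = 2
```

In the example, rows 0 and 1 contribute the differences 4 and 10, while row 2 is skipped because the left boundary pixel is invalid. For vertically adjacent LSBs the same computation applies to the bottom row of the upper LSB and the top row of the lower LSB.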

Second, the difference between the average pixel values of two LSBs, called the inter-LSB distortion, is denoted as the DLM. Similarly, only the valid pixels of the LSBs are taken into account for the DLM:

$$D_{LM}(LSB_l, LSB_r) = \left| m(LSB_l) - m(LSB_r) \right| \qquad (4\text{-}9)$$

where m(LSB) denotes the average of all valid pixels belonging to the LSB. Writing N_v(LSB) for the number of valid pixels of the LSB, it is computed as

$$m(LSB) = \frac{1}{N_v(LSB)} \sum_{(x,y)\ \text{valid}} Pix(LSB, x, y)$$

The DLM is the difference between the average valid pixel values of the two LSBs. If the value of the DLM is small, the two LSBs are likely to belong to the same object layer. The range of the DLM value is 0~255.
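The inter-LSB distortion may be sketched in the same way. The sketch assumes, as our own convention, that an LSB is a list of rows in which a valid pixel is an int gray value and an invalid pixel is None; the function names are hypothetical.

```python
def mean_valid(lsb):
    """Average gray value m(LSB) over the valid (non-None) pixels of an LSB."""
    values = [p for row in lsb for p in row if p is not None]
    if not values:
        raise ValueError("LSB has no valid pixels")
    return sum(values) / len(values)


def inter_lsb_distortion(lsb_a, lsb_b):
    """DLM: absolute difference between the mean valid gray values of two LSBs."""
    return abs(mean_valid(lsb_a) - mean_valid(lsb_b))


lsb_a = [[10, None], [20, 30]]    # valid pixels 10, 20, 30 -> mean 20.0
lsb_b = [[None, 50], [70, None]]  # valid pixels 50, 70     -> mean 60.0
dlm = inter_lsb_distortion(lsb_a, lsb_b)  # 40.0
```

Unlike the DSM, this measure does not depend on where the two LSBs touch, so it is the same for horizontally and vertically adjacent LSBs.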

Similarly, the DSM and DLM of two vertically adjacent LSBs can be deduced from Eq.(4-8) and Eq.(4-9), respectively. The smaller the DSM or DLM, the stronger the continuity or similarity of the adjacent LSBs.

According to the definitions above, the match grade of two adjacent LSBs is defined as

$$\text{match grade}(LSB_l, LSB_r) = \max\left( D_{SM}(LSB_l, LSB_r),\ D_{LM}(LSB_l, LSB_r) \right)$$

The DSM can be treated as the local average difference located on the adjoining sides of the two LSBs, and the DLM can be treated as the global average difference of the two LSBs. Therefore, the maximum of the DSM and DLM is defined as the match grade. The best match of two LSBs is selected by the minimal value of the match grade.
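A short worked example may clarify how the match grade selects the best match. The sketch assumes that the DSM and DLM of each candidate pairing have already been computed; the candidate labels and numeric values are invented for illustration only.

```python
def match_grade(dsm, dlm):
    """Match grade: the larger of the side-match and inter-LSB distortions."""
    return max(dsm, dlm)


def best_match(candidates):
    """Return the candidate with the minimal match grade.

    Assumption: candidates is a list of (label, dsm, dlm) tuples.
    """
    return min(candidates, key=lambda c: match_grade(c[1], c[2]))


candidates = [
    ("A", 3.0, 12.0),  # match grade 12.0: good side match, dissimilar means
    ("B", 8.0, 5.0),   # match grade 8.0
]
label, _, _ = best_match(candidates)  # "B"
```

Candidate A has the smaller DSM, but its DLM of 12.0 dominates its match grade, so B is selected. Taking the maximum means that a pairing is judged by its worse measure, and a good local fit cannot hide a poor global fit or vice versa.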

The jigsaw-puzzle layer construction algorithm consists of two procedures: the decision procedure for constructing a new object layer, and the matching procedure. The proposed algorithm is given in pseudo-code as follows:

Algorithm: Jigsaw-puzzle layer construction

MP( ) /* The matching procedure */

if the LSBN satisfies the pre-match condition {
    Mark the LSBN as the representative LSB of the object layer
}
}
if there is more than one representative LSB in this process {
    LSBq’ ← the representative LSB with the minimal match grade
    Found_flag ← 1
}
else if there is one representative LSB in this process {
    candidate_insert(LSBq’)
}

The proposed algorithm begins by analyzing the initially unclassified LSBs in the Pool. The Pool is analyzed repeatedly until every unclassified LSB has been classified into an object layer. Before each new iteration over the unclassified LSBs in the Pool, the algorithm performs the “decision procedure for constructing a new object layer”, described later, which decides whether a chosen seeded LSB sets up a new object layer or joins an existing object layer that is similar to it. Then the “matching procedure” finds the matched object layers of the unclassified LSBs. Once an unclassified LSB in the Pool has been classified into an object layer, it is removed from the Pool.

In the first analysis of the Pool, a new object layer is set up from the first unclassified LSB(0,0,0), because no object layer exists initially. The LSB(0,0,0) thus becomes the new object layer L_0 and is denoted as LSB₀(0,0,0) in the decision procedure for constructing a new object layer. In the matching procedure, we then scan each of the remaining unclassified LSBs in the Pool. Whenever an unclassified LSB(i,j,k) that is 4-adjacent to one or more existing object layers is detected, we check the pre-match condition to find the reasonable object layers for it.

The pre-match condition is defined as a bound on a gray level distance. When the condition is satisfied, the object layer L_q becomes a candidate for the unclassified LSB(i,j,k), and the representative LSB_q(i',j',k') of L_q takes part in the match grade computation. The purpose of the pre-match is to filter out unreasonable object layers and so save computation.

All representative LSBs of the reasonable object layers are found and inserted into the candidate list. Note that if more than one LSB of the same object layer L_q is 4-adjacent to the unclassified LSB(i,j,k) and satisfies the pre-match condition, the one with the minimal match grade is chosen as the representative LSB_q(i',j',k') of the object layer L_q. After the representatives of all candidate object layers are obtained, we calculate and compare the match grades between the unclassified LSB(i,j,k) and all representatives in the candidate list. The representative with the minimal match grade is the best match, LSB_w, and the unclassified LSB(i,j,k) is classified into L_w.

Returning to the matching procedure: after L_0 has been set up, one object layer exists in this iteration. Suppose the LSB(1,0,0) is now analyzed, and that LSB₀(0,0,0) is 4-adjacent to LSB(1,0,0) and satisfies the pre-match condition with it. Since LSB₀(0,0,0) is the only LSB in L_0, it is selected as the representative LSB of L_0 and inserted into the candidate list. Since no other object layer exists at this time, L_0 is directly determined as the best match object layer, so LSB(1,0,0) is classified into L_0 and removed from the Pool. The two procedures are then repeated until all unclassified LSBs in the Pool have been analyzed once in this iteration. The two procedures are described in detail below.

A. The decision procedure for constructing a new object layer

The procedure decides, for a chosen LSB, whether 1) to set up and initialize a new object layer with it, or 2) to classify it into the existing object layer that is most similar to it. The aim is an optimum decision on constructing or extending an object layer. The decision is based on the features defined below and is depicted in Fig.24.

Fig.24 Flowchart of the decision procedure to construct or extend an object layer

Several definitions and measures must be stated before the details of this procedure are described. The minimum gray intensity distance between an unclassified LSB(i,j,k) and the object layer L_p, denoted as ID(LSB(i,j,k), L_p), is the minimum intensity difference between the unclassified LSB(i,j,k) and all LSBs of L_p, and is computed as

$$ID(LSB(i,j,k), L_p) = \min_{LSB_p(i',j',k') \in L_p} \left| m(LSB(i,j,k)) - m(LSB_p(i',j',k')) \right|$$

The Euclidean location distance between an unclassified LSB(i,j,k) and the object layer L_p, denoted as LD(LSB(i,j,k), L_p), is computed as the Euclidean distance between the locations of the LSB and the object layer.

The smallest gray intensity distance between an unclassified LSB(i,j,k) and all currently existing object layers is defined as

$$SID(LSB(i,j,k)) = \min_{n} ID(LSB(i,j,k), L_n)$$

where n is the index of the existing object layers. The object layer with the smallest gray intensity distance, as determined by SID(LSB(i,j,k)), is denoted as L_SI, and the LSB of L_SI with the minimum gray intensity distance, as determined by ID(LSB(i,j,k), L_SI), is denoted as LSB_SI. The value of SID(LSB(i,j,k)) is the smallest difference of gray intensity between the unclassified LSB(i,j,k) and all existing object layers. A very small SID(LSB(i,j,k)) value means that the gray intensity of the unclassified LSB(i,j,k) is very similar to that of LSB_SI in the object layer L_SI. Because texts and homogeneous objects each have a consistent gray intensity, the unclassified LSB(i,j,k) may then be part of the object layer L_SI. In that case it should not set up a new object layer, so that texts or homogeneous objects are not split into more than one object layer.

The largest gray intensity distance between an unclassified LSB(i,j,k) and all currently existing object layers is defined as

$$LID(LSB(i,j,k)) = \max_{n} ID(LSB(i,j,k), L_n)$$

The value of LID(LSB(i,j,k)) is the largest difference of gray intensity between the unclassified LSB(i,j,k) and all existing object layers, attained at the object layer with the largest gray intensity distance. If the SID(LSB(i,j,k)) values of all unclassified LSBs are very large, all of them are dissimilar to the existing object layers. Hence the unclassified LSB(i,j,k) with the largest LID should be selected as the seeded LSB to set up a new object layer.

The minimum Euclidean location distance between an unclassified LSB(i,j,k) and all currently existing object layers is defined as

$$SLD(LSB(i,j,k)) = \min_{n} LD(LSB(i,j,k), L_n)$$

and the object layer with the minimum Euclidean location distance, as determined by SLD(LSB(i,j,k)), is denoted as L_SL. The value of SLD(LSB(i,j,k)) is the minimum Euclidean location distance between the unclassified LSB(i,j,k) and all existing object layers.
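The three measures SID, LID and SLD may be sketched in Python as follows. The sketch rests on assumptions of ours: each LSB is summarized by the mean gray value of its valid pixels and by its block position, and the location distance between an LSB and an object layer is taken as the smallest Euclidean distance between block positions. The dictionary keys and function names are hypothetical.

```python
import math


def intensity_distance(lsb, layer):
    """ID: smallest mean-intensity difference between lsb and the LSBs of one layer."""
    return min(abs(lsb["mean"] - other["mean"]) for other in layer)


def location_distance(lsb, layer):
    """LD (assumed form): smallest Euclidean distance between block positions."""
    return min(math.dist(lsb["pos"], other["pos"]) for other in layer)


def layer_measures(lsb, layers):
    """Return SID, LID, SLD and the indices of L_SI and L_SL for one unclassified LSB."""
    ids = [intensity_distance(lsb, layer) for layer in layers]
    lds = [location_distance(lsb, layer) for layer in layers]
    return {
        "SID": min(ids), "L_SI": ids.index(min(ids)),
        "LID": max(ids),
        "SLD": min(lds), "L_SL": lds.index(min(lds)),
    }


layers = [
    [{"mean": 30, "pos": (0, 0)}, {"mean": 40, "pos": (1, 0)}],  # layer 0
    [{"mean": 200, "pos": (3, 4)}],                              # layer 1
]
measures = layer_measures({"mean": 50, "pos": (3, 3)}, layers)
# SID = 10 (layer 0), LID = 150, SLD = 1.0 (layer 1)
```

The example shows that L_SI and L_SL need not coincide: the LSB is closest in intensity to layer 0 but closest in location to layer 1.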

In this procedure, all unclassified LSBs in the Pool are processed to extract their SIDs, LIDs and SLDs with respect to all existing object layers, according to the definitions above. To classify the chosen LSB into the existing object layer that is most similar to it, the unclassified LSB that has a similar gray intensity and is closest to its corresponding object layer is chosen using the SID and SLD values. We select at most five unclassified LSBs with the smallest SID values, each of which must satisfy the condition SID(LSB(i,j,k)) ≤ Th_SI, and compute the SLDs between these LSBs and their corresponding L_SL. Because texts or homogeneous objects may contain many separate connected regions, each of these unclassified LSBs is regarded as part of its corresponding L_SI. The unclassified LSB with the smallest SLD then becomes the seeded LSB and is classified into its corresponding L_SL.

If the SID values of all unclassified LSBs are larger than Th_SI, that is, all unclassified LSBs are dissimilar to the existing object layers, then it is appropriate to set up a new object layer: the unclassified LSB with the largest LID in the Pool is chosen as the seeded LSB to initialize it.

Th_SI is a predefined threshold and is set to 14. Its value influences the number of resulting object layers. If the value is too small, the number of object layers increases and a homogeneous region may be split into more than one object layer, producing, for example, broken text lines; if the value is too large, different object regions may be merged into the same object layer.
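The decision described in this subsection may be sketched in Python as follows. The sketch assumes that the SID, LID, SLD and the index of L_SL have already been computed for every unclassified LSB and are supplied in a dictionary; the function name, the dictionary layout and the return convention are ours, and only the threshold value 14 and the limit of five LSBs come from the text.

```python
TH_SI = 14  # threshold on the SID, as set in the text


def decide_seed(measures):
    """Choose the seeded LSB and its target layer.

    Assumption: measures maps an LSB label to a dict with the keys
    "SID", "LID", "SLD" and "L_SL" (index of the nearest layer).
    Returns (label, layer index) or (label, "new").
    """
    # At most five LSBs with the smallest SID, each satisfying SID <= Th_SI.
    similar = sorted(
        (name for name in measures if measures[name]["SID"] <= TH_SI),
        key=lambda name: measures[name]["SID"],
    )[:5]
    if similar:
        # Extend an existing layer with the similar LSB that is closest in location.
        seed = min(similar, key=lambda name: measures[name]["SLD"])
        return seed, measures[seed]["L_SL"]
    # No LSB is similar to any layer: the most dissimilar one starts a new layer.
    seed = max(measures, key=lambda name: measures[name]["LID"])
    return seed, "new"


measures = {
    "a": {"SID": 3, "LID": 90, "SLD": 2.0, "L_SL": 1},
    "b": {"SID": 10, "LID": 120, "SLD": 1.0, "L_SL": 0},
    "c": {"SID": 40, "LID": 200, "SLD": 0.5, "L_SL": 0},
}
choice = decide_seed(measures)  # ("b", 0)
```

In the example, "c" has the smallest SLD but is excluded because its SID exceeds Th_SI; among the remaining LSBs, "b" is closest to an existing layer and is therefore chosen to extend it.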

B. The matching procedure

We now present the matching procedure, which assigns each unclassified LSB to the existing object layer to which it should belong. All unclassified LSBs are put in a “Pool”, and the matching procedure analyzes them in order from darkest to lightest, left to right, and top to bottom.

The algorithm uses a list to keep track of the representative LSBs of the object layers and to determine the object layer to which the unclassified LSB should belong. The representative LSB_q of the object layer L_q must be one of the LSBs that belong to L_q and are 4-adjacent to the current unclassified LSB. When an unclassified LSB is analyzed, several object layers may be candidates for the best match. The list stores the representative LSBs of the candidate object layers, with one representative LSB per candidate object layer. We then calculate the match grades between the unclassified LSB and all representative LSBs in the list, which are 4-adjacent to it, to determine the best-matching object layer. The match grade is the criterion that measures how well the unclassified LSB(i,j,k) matches a candidate object layer L_q. Before the match grade is computed, the pre-match condition is used to determine whether the representative LSB_q(i',j',k'), which represents the object layer L_q, is a candidate for the unclassified LSB(i,j,k).

Because some pixels will influence the valid pixels in the valid side connection, the DSM can be invalid when the N_vs value is small. Hence the DSM is computed under two cases. 1) When the N_vs value is large enough, relative to a predefined threshold Th_vs, to indicate that the side information of the two adjacent LSBs is appropriate, the DSM is taken into consideration for the match grade. 2) Otherwise, the DSM factor is disabled by setting the DSM to zero. To accommodate adjacent LSBs that contain character patterns with thin strokes crossing their shared side, a reasonable value of Th_vs is 5% of the average of K and L. Since K=L=96 is used experimentally, as described before, 5% of 96 is 4.8, and Th_vs=5 is used in this work.
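The two cases for the DSM may be sketched in Python as follows. The sketch assumes that the DSM, DLM and N_vs of a pair of adjacent LSBs are already available. Whether the comparison with Th_vs is strict is not recoverable from the text, so the use of "at least Th_vs" below is our assumption, as is the function name.

```python
K = L = 96                          # sub-block dimensions used in the text
TH_VS = round(0.05 * (K + L) / 2)   # 5% of 96 is 4.8, rounded to 5


def gated_match_grade(dsm, dlm, n_vs, th_vs=TH_VS):
    """Match grade with the DSM disabled when the side connection is too short.

    Assumption: the DSM counts only when n_vs is at least th_vs.
    """
    if n_vs < th_vs:
        dsm = 0.0  # case 2: the DSM factor is disabled
    return max(dsm, dlm)


few_pairs = gated_match_grade(30.0, 8.0, n_vs=2)    # 8.0: DSM ignored
many_pairs = gated_match_grade(30.0, 8.0, n_vs=20)  # 30.0: DSM counted
```

With only two valid pixel pairs on the shared side, the large DSM of 30.0 is not trusted and the match grade falls back to the DLM. With twenty pairs the same DSM determines the match grade.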

We define two operations for the candidate list:

i) candidate_insert(LSB_q(i',j',k')): inserts a representative LSB_q(i',j',k') of the object layer L_q into the candidate list.

ii) candidate_decide( ) → L_w: computes the match grades of all representatives LSB_q(i',j',k') in the candidate list and finds the best match object layer L_w, which has the minimal match grade among all candidates. The current unclassified LSB(i,j,k) is then classified into the object layer L_w.
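The two operations on the candidate list may be sketched in Python as follows. The operation names follow the text, but the class, the arguments and the stand-in grade function are our own assumptions: the layer index q is passed explicitly, and the match grade is supplied as a function so that the sketch stays independent of how LSBs are stored.

```python
class CandidateList:
    """Holds one representative LSB per candidate object layer."""

    def __init__(self):
        self.representatives = {}  # layer index q -> representative LSB of L_q

    def candidate_insert(self, q, representative):
        """Insert the representative LSB of the object layer L_q."""
        self.representatives[q] = representative

    def candidate_decide(self, unclassified, grade):
        """Return the index w of the layer with the minimal match grade.

        Assumption: grade(unclassified, representative) returns the match grade.
        Returns None when the list is empty.
        """
        if not self.representatives:
            return None
        return min(
            self.representatives,
            key=lambda q: grade(unclassified, self.representatives[q]),
        )


# Stand-in grade for illustration only: LSBs are reduced to their mean gray value.
def toy_grade(a, b):
    return abs(a - b)


candidates = CandidateList()
candidates.candidate_insert(0, 40)
candidates.candidate_insert(2, 58)
w = candidates.candidate_decide(55, toy_grade)  # 2
```

Storing the representatives by layer index enforces the rule that each candidate object layer contributes exactly one representative LSB.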

After the proposed MLSM algorithm is performed, all LSBs are classified into appropriate object layers. Consequently, N object layers, L_0, L_1, …, L_{N-1}, are created. Each object layer possesses a set of LSBs, and an object image is created from all pixels belonging to the object layer. Figure 25(a) displays the image of a CD cover, and Figures 25(b), (c), and (d) are the object images derived from Fig.25(a) by the MLSM. Because the character patterns, foreground objects and background components are well separated in these object images, a detailed analysis of them can be performed easily. The text lines will be extracted from each object layer by the text extraction algorithm presented in the following section.

Fig. 25 An example of the MLSM (image size=1361x1333): (a) original image; (b) Layer 1; (c) Layer 2; (d) Layer 3

Source document: A Study on Compression Methods for Complex Compound Document Images (pp. 94-110)