

4. THE MULTI-LAYER SEGMENTATION METHOD FOR COMPLEX DOCUMENT

4.4 Experimental results and discussions

The test images are full-page, 24-bit true color or 8-bit monochromatic document images scanned at 300 dpi. The proposed method for automatic text segmentation has been tested on numerous magazine images, cover images and advertisement images.

Figures 27(a)~32(a) display parts of the test images. The backgrounds in Figs. 27(a) to 32(a) include the following features: 1) monochromatic background with/without text; 2) slowly varying background with/without text; 3) highly varying background with/without text; and 4) complex varying background with/without text of various colors.

Figures 27(b)~32(b) present the text planes of Figs. 27(a)~32(a) after the proposed text segmentation method is applied, and Figs. 27(c) to 32(c) show parts of the object layers of Figs. 27(a) to 32(a). The ratio of success of the proposed text segmentation method is defined as

Ratio of success = (number of texts correctly segmented into the text plane / total number of texts in the document image) × 100%.

The ratios of success in Figs. 27(b)~32(b) are 100%, 98.5%, 99.2%, 98.7%, 100%, and 97%, respectively. The proposed text segmentation method can successfully extract texts with different typefaces or sizes, as well as texts spread over compound document images with monochromatic, slowly varying, highly varying and complex varying backgrounds.
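As a concrete illustration of this arithmetic, the short sketch below computes the ratio for a single test image; it assumes the counts are taken per text (for example, per character or per text region), and the function name and example numbers are illustrative only.

def ratio_of_success(num_correct: int, num_total: int) -> float:
    """Percentage of texts correctly segmented into the text plane.

    num_correct : texts that end up in the text plane (counted by
                  visual inspection in this illustrative sketch).
    num_total   : texts present in the original document image.
    """
    if num_total == 0:
        return 100.0
    return 100.0 * num_correct / num_total

# Example: 394 of 400 texts extracted correctly -> 98.5%
print(f"{ratio_of_success(394, 400):.1f}%")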

The MLSM decomposes the document image into several object layers, and all of the texts are spread into different object layers according to their colors. The text extraction algorithm then extracts the text from all of the object layers. Because different object layers may contain text-like blocks at a particular position, the text extraction algorithm may make wrong decisions; consequently, the text extraction algorithm can be further improved. For instance, although most of the text in Fig. 28(a) overlays a complex varying background - a map - all of the text that overlaps the map is segmented into one of the object layers in Fig. 28(c). Thus, although the ratios of success in Figs. 28(b), 29(b), 30(b) and 32(b) are not 100%, the MLSM itself successfully segments all of the texts.

According to our results, the texts can be extracted from different backgrounds, regardless of whether the texts overlap a simple, slowly or rapidly varying background. This method overcomes various issues raised by the complexity of the background images. Consequently, the multi-layer segmentation algorithm constitutes an effective solution for extracting text from various document images.

In the block-based clustering algorithm, the parameters THJDF and THσ are the threshold values used to decide whether the clusters have converged; convergence is declared when the conditions JDF > THJDF and σ < THσ are met. The JDF value measures the separability between two adjacent clusters in the block-based clustering algorithm.

The JDF value lies within the range 0 ≤ JDF ≤ 1, so maximizing the JDF can be used as an objective function to optimize the segmentation result. When the JDF approaches 1.0, the two adjacent clusters are ideally and completely separated. When there are more than two clusters, the average JDF is used to measure the separability of the clusters. This study employs THJDF = 0.9.

The standard deviation, σ, measures the compactness of the pixel values of each cluster. Ideally, σ approaches zero for a monochromatic object. In a pilot experiment, we analyzed the pixel-value distributions of monochromatic texts in different document images; these distributions are widened by the scanner or by the original document itself. The variation of monochromatic texts of different sizes or styles is around 0~50. In general, THσ = 25 preserves the texts well, but it is insufficient for our needs when the texts overlap a background with rapidly varying texture and similar grayscale; therefore, this study employs THσ = 14 to obtain a better outcome in that case. When THσ is below 25, the extracted texts are thinner than the original texts and the boundaries of the texts are clustered into different object layers, as in Fig. 23(j).
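To make the interplay of the two thresholds concrete, the sketch below tests the convergence condition JDF > THJDF and σ < THσ for one block. It assumes the per-pair JDF values are already supplied by the clustering step (the JDF formula itself is defined earlier in the chapter and is not reproduced here), and the function name, the data layout, and the use of the largest per-cluster standard deviation are illustrative assumptions rather than the exact implementation.

import numpy as np

TH_JDF = 0.9    # separability threshold on the (average) JDF
TH_SIGMA = 14   # compactness threshold on the standard deviation

def clusters_converged(pairwise_jdf, cluster_pixels):
    """Illustrative convergence test: JDF > TH_JDF and sigma < TH_SIGMA.

    pairwise_jdf   : JDF value(s) of adjacent cluster pairs; the average
                     JDF is used when there are more than two clusters.
    cluster_pixels : list of 1-D arrays holding the pixel values of
                     each cluster.
    """
    avg_jdf = float(np.mean(pairwise_jdf))                # separability
    sigmas = [float(np.std(p)) for p in cluster_pixels]   # compactness
    return avg_jdf > TH_JDF and max(sigmas) < TH_SIGMA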

Because THσ is set to 14, the standard deviation, σ, of each LSB is less than 14. In other words, if two LSBs belong to the same object layer, the difference between their average values should be less than 14. The threshold values ThLM and ThSI are used to judge, from the difference of the average values, whether two LSBs belong to the same object layer.

Therefore, the threshold values ThLM and ThSI are both set to 14. In the decision procedure for constructing a new object layer, ThSI determines whether an unclassified LSB should be merged with an existing object layer or used to set up a new object layer. In the pre-match condition of the matching procedure, ThLM is used to filter out unreasonable object layers in order to save computation.
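The way the two thresholds steer the layer construction can be sketched as follows. The actual matching procedure involves more than a comparison of average values; this simplified sketch only illustrates ThLM acting as a pre-match filter and ThSI deciding between merging and constructing a new object layer, and the function name and data structures are assumptions made for illustration.

TH_LM = 14  # pre-match filter threshold (matching procedure)
TH_SI = 14  # merge / new-layer threshold (decision procedure)

def assign_lsb(lsb_mean, layer_means):
    """Return the index of the object layer an unclassified LSB should
    join, or None if a new object layer should be constructed.

    lsb_mean    : average pixel value of the unclassified LSB.
    layer_means : average pixel values of the existing object layers.
    """
    # Pre-match condition: skip layers whose average value is already
    # too far from the LSB, saving the detailed matching computation.
    candidates = [i for i, m in enumerate(layer_means)
                  if abs(m - lsb_mean) < TH_LM]
    if not candidates:
        return None  # construct a new object layer

    # Decision procedure: merge with the closest candidate layer only
    # if the difference of the average values stays below TH_SI.
    best = min(candidates, key=lambda i: abs(layer_means[i] - lsb_mean))
    if abs(layer_means[best] - lsb_mean) < TH_SI:
        return best
    return None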

The segmentation method proposed in this chapter has been tested on a large number of different document images scanned from book covers, advertisements, brochures, and magazines. We find that monochromatic objects, text or non-text, can be successfully separated from a document image by the MLSM; nevertheless, a few texts may fail to be extracted when their pixel values are multicolored, gradually changing, or too close to the pixel values of the background. A multicolored or gradually changing text becomes fragmented and distributed over different clusters. A text may be merged with its background when its values are too close to those of its background. Although decreasing the parameter THσ (below 14) can separate such a text from the overlapping background whose values are close to it, thereby solving the merging problem, it also causes the text to become fragmented and distributed over different clusters. Therefore, an adaptive threshold THσ is left for future work to solve the merging problem.

4.5 Concluding remarks

This study presents a viable method for extracting texts from a complex compound document image in which texts overlay various background images. The proposed segmentation algorithm uses a multi-layer segmentation method to segment the texts from various compound document images, regardless of whether the texts overlap the background. This method overcomes various issues raised by the complexity of the background images. Experimental results obtained with various document images reveal that the proposed algorithm can successfully segment Chinese and English text strings from various backgrounds, regardless of whether the texts overlap a simple, slowly or rapidly varying background. The method can be used to improve the effectiveness of compression; the technique has many applications, including compressing color faxes and documents. Moreover, the segmentation algorithm can be used in Optical Character Recognition (OCR) to search for characters in complex documents with strong text/background overlap.

Fig. 27 Test image 1 (image size = 2262x3263): (a) original image; (b) text plane; (c) parts of layer planes.

Fig. 28 Test image 2 (image size = 1829x2330): (a) original image; (b) text plane; (c) parts of layer planes.

Fig. 29 Test image 3 (image size = 2462x3250): (a) original image; (b) text plane; (c) parts of layer planes.

Fig. 30 Test image 4 (image size = 2333x3153): (a) original image; (b) text plane; (c) parts of layer planes.

Fig. 31 Test image 5 (image size = 2469x3535): (a) original image; (b) text plane; (c) parts of layer planes.

Fig. 32 Test image 6 (image size = 2469x3535): (a) original image; (b) text plane; (c) parts of layer planes.

CHAPTER 5

CONCLUSIONS AND PERSPECTIVE

This dissertation presents three segmentation methods for document image compression. In Chapter 2, a compression method for color document images based on the wavelet transform and fuzzy picture-text segmentation was presented.

This approach introduces a fuzzy picture-text segmentation method that separates pictures and texts by using the wavelet coefficients of color document images. The number of colors, the ratio of projection variance, and the fractal dimension are used to distinguish pictures from texts, and a fuzzy rule built on the fuzzy characteristics of these parameters performs the picture-text segmentation. The two resulting components, text strings and pictures, are processed by different compression algorithms: the picture components are encoded by zerotree wavelet coding and the text components by modified run-length Huffman coding.

However, the fuzzy picture-text segmentation method is not suitable for document images whose texts overlap a complex background. Therefore, two algorithms for compressing image documents with large text/background overlap are proposed in Chapter 3. The proposed algorithms apply a new segmentation method to separate the text from the image in a compound document in which the text and background overlap. The segmentation method classifies document images into three planes: the text plane, the background (non-text) plane, and the text's color plane.

Different compression techniques are used to process the text plane, the background plane and the text's color plane. The text plane is compressed using a pattern matching technique called JB2, while the wavelet transform and zerotree coding are used to compress the background plane and the text's color plane. The proposed algorithms greatly outperform the well-known image compression methods JPEG and DjVu, and enable the effective extraction of the text from a complex background, achieving a high compression ratio for compound document images.

Although the segmentation method in Chapter 3 outperforms the well-known image compression methods JPEG and DjVu, it does not apply when backgrounds include sharply varying contours or overlap with texts. These background images include 1) monochromatic backgrounds with/without texts; 2) slowly varying backgrounds with/without texts; 3) highly varying backgrounds with/without texts; and 4) complex varying backgrounds with/without texts of different colors. Extracting the texts is particularly difficult when the compound document image includes all of these backgrounds.

In Chapter 4, a viable method for extracting texts from a complex compound document image in which texts overlay various background images is presented. The proposed segmentation algorithm uses a multi-layer segmentation method (MLSM) to segment the texts from various compound document images, regardless of whether the texts overlap the background. The MLSM overcomes various issues raised by the complexity of the background images. Experimental results obtained with various document images reveal that the proposed algorithm can successfully segment Chinese and English text strings from various backgrounds, regardless of whether the texts overlap a simple, slowly or rapidly varying background. The method can be used to improve the effectiveness of compression; the technique has many applications, including compressing color faxes and documents. Moreover, the segmentation algorithm can be used in Optical Character Recognition (OCR) to search for characters in complex documents with strong text/background overlap.

According to our results, the texts can be extracted from different backgrounds, regardless of whether the texts overlap a simple, slowly or rapidly varying background. This method overcomes various issues raised by the complexity of background images. Consequently, the multi-layer segmentation algorithm constitutes an effective solution for extracting text from various document images.

The MLSM has been tested on a large number of different document images scanned from book covers, advertisements, brochures, and magazines. We find that monochromatic objects, text or non-text, can be successfully separated from a document image by the MLSM; nevertheless, a few texts may fail to be extracted when their pixel values are multicolored, gradually changing, or too close to the pixel values of the background. A multicolored or gradually changing text becomes fragmented and distributed over different clusters. A text may be merged with its background when its values are too close to those of its background.

Although decreasing the parameter THσ (below 14) can separate such a text from the overlapping background whose values are close to it, thereby solving the merging problem, it also causes the text to become fragmented and distributed over different clusters.

Therefore, an adaptive threshold THσ is left for future work to solve the merging problem.


VITA

Curriculum Vitae of the Doctoral Candidate

Name: Chung-Cheng Chiu (瞿忠正)    Gender: Male

Date of birth: January 18, 1967    Place of birth: Tainan City

Dissertation title:

Chinese: 複雜型複合式文件影像壓縮方法之研究

English: THE STUDY OF THE COMPRESSION ALGORITHMS FOR COMPLEX COMPOUND DOCUMENT IMAGES

Education and experience:

1. September 1986 - July 1990: Department of Electrical Engineering, Chung Cheng Institute of Technology
2. July 1990 - July 1992: Teaching Assistant, Department of Electrical Engineering, Chung Cheng Institute of Technology
3. September 1992 - July 1994: Institute of Electronic Engineering, Chung Cheng Institute of Technology
4. July 1994 - present: Lecturer, Department of Electrical Engineering, Chung Cheng Institute of Technology
5. September 1998 - present: In-service Ph.D. student, Institute of Electrical and Control Engineering, National Chiao Tung University

Honors:

1. Champion, Winter Camp of the 5th TIC100 Entrepreneurship Competition
2. Silver Award, Final Round of the 5th TIC100 Entrepreneurship Competition
3. Gold Award, 17th Acer Dragon Thesis Award on the Knowledge Economy


PUBLICATION LIST

Publication List of the Doctoral Candidate

Name: Chung-Cheng Chiu (瞿忠正)

Journal

[1] Bing-Fei Wu, Chung-Cheng Chiu and Wen-Long Lin, “Wavelet-Based Images Compression of Color Document by Fuzzy Picture-Text Segmentation,” Journal of The Chinese Institute of Engineers, Vol. 26, No.1, pp.113-118, 2003.

[2] Bing-Fei Wu, Chung-Cheng Chiu and Yen-Lin Chen, “Algorithms for Compressing Compound Document Images with Large Text/Background Overlap”, accepted by IEE

