
Conclusions and Future Work

Section 1 Conclusions

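This study proposes a detection method that strengthens the Hough transform for separating text lines on business cards that mix horizontal and vertical layouts. Text components are linked with a nearest-neighbor method, and complete text lines are then extracted.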

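The Hough transform used in this study is a modification of Epshtein's method. It detects text lines at different angles within the same business card image, and it does so stably: even when two text regions with different layout directions lie very close together, statistical analysis of the angle projections, combined with RLSA-assisted selection of the text line direction, separates the two layout angles.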
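As a rough illustration of the voting idea behind Hough-based text line detection, the sketch below lets the centroids of text components vote for (angle, offset) line parameters and returns the strongest line. It is a generic textbook Hough transform, not the modified version developed in this study; the function names, the 1-degree angle step, and the 2-pixel offset bin are assumptions chosen for the example.

```python
import math
from collections import Counter


def hough_votes(points, angle_step_deg=1, rho_step=2.0):
    """Accumulate votes for lines x*cos(t) + y*sin(t) = rho through each point.

    `points` are (x, y) centroids of text components (hypothetical input).
    The key of the accumulator is (normal angle in degrees, quantized rho).
    """
    acc = Counter()
    for x, y in points:
        for deg in range(0, 180, angle_step_deg):
            theta = math.radians(deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(deg, round(rho / rho_step))] += 1
    return acc


def dominant_line(points, **kwargs):
    """Return (normal angle in degrees, rho bin, votes) of the strongest line.

    The angle is that of the line's normal: 90 means a horizontal text line,
    0 means a vertical text line.
    """
    acc = hough_votes(points, **kwargs)
    (deg, rho_bin), votes = max(acc.items(), key=lambda kv: kv[1])
    return deg, rho_bin, votes
```

Running `dominant_line` separately on the components of each candidate region would give one angle per region, which is the kind of per-region direction estimate that a mixed horizontal and vertical layout requires.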

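Experiments in this study show that business card images processed by the proposed method yield higher character recognition rates in downstream OCR software, particularly for complex cards. Because the modified Hough transform detects text lines at arbitrary angles, the method applies both to images captured perpendicular to the card and to images with perspective distortion.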

Section 2 Future Work

Future work will pursue improvements in the following directions:

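1. This study assumes that all four borders of the business card (top, bottom, left, and right) fall inside the image. To capture as much of the card's content as possible, however, users often shoot full-frame, so that the card borders extend beyond the image. If perspective distortion could still be corrected when the borders are not in the image, the method would apply more broadly.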

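2. The under-segmentation of text lines observed in this study can be reduced by adding an effective text/graphics separation algorithm. Over-segmentation occurs on business card images with wide character spacing. After the text lines have been constructed, X-Y cut or another top-down segmentation method can be applied to the sparser parts of the text line clusters, reducing the errors that the bottom-up method used in this study makes when linking widely spaced characters without information about the card's overall layout. Over-segmentation also occurs when text overlaps a background of similar color. In that case the background cannot be extracted as a separate connected component for text/graphics separation, and a finer color analysis is needed to extract the text components from the background.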

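3. The NBS threshold used in this study is fixed. Under strongly uneven illumination or blur, it is insufficient for extracting the text across the whole image. A finer-grained threshold decision rule, or an added illumination-correction algorithm, could make the method applicable to a wider range of conditions.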
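The top-down segmentation suggested in item 2 can be sketched as a recursive X-Y cut over component bounding boxes. This is a minimal illustration under stated assumptions, not part of the method evaluated in this study: boxes are hypothetical `(x0, y0, x1, y1)` tuples, and `min_gap` (in pixels) is an illustrative parameter that would need tuning to the card's character spacing.

```python
def find_gap(boxes, axis, min_gap):
    """Return the midpoint of the widest empty gap along `axis`, or None.

    axis 0 projects the boxes onto x, axis 1 onto y. Only gaps of at
    least `min_gap` pixels qualify as cut positions.
    """
    intervals = sorted((b[axis], b[axis + 2]) for b in boxes)
    best_width, best_cut = 0, None
    reach = intervals[0][1]  # farthest extent covered so far
    for lo, hi in intervals[1:]:
        gap = lo - reach
        if gap >= min_gap and gap > best_width:
            best_width, best_cut = gap, (lo + reach) / 2
        reach = max(reach, hi)
    return best_cut


def xy_cut(boxes, min_gap=10):
    """Recursively split boxes into groups separated by projection gaps."""
    if len(boxes) <= 1:
        return [boxes]
    for axis in (1, 0):  # try gaps in y (horizontal cuts) first, then in x
        cut = find_gap(boxes, axis, min_gap)
        if cut is not None:
            first = [b for b in boxes if b[axis + 2] <= cut]
            second = [b for b in boxes if b[axis] >= cut]
            return xy_cut(first, min_gap) + xy_cut(second, min_gap)
    return [boxes]  # no qualifying gap: keep the boxes as one group
```

Because every cut is placed inside an empty projection gap, widely spaced characters on one line stay together as long as their spacing is below `min_gap`, while blocks separated by larger whitespace are split using the layout of the whole region rather than local neighbor distances.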

References

[1] The Stanford Mobile Visual Search Dataset. Available: http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/

[2] J. Liang, D. Doermann, and H. Li, "Camera-based analysis of text and documents: a survey," International Journal on Document Analysis and Recognition, vol. 7, pp. 84-104, 2005.

[3] R. Lienhart and A. Wernicke, "Localizing and segmenting text in images and videos," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, pp. 256-268, Apr 2002.

[4] H.-K. Kim, "Efficient automatic text location method and content-based indexing and structuring of video database," Journal of Visual Communication and Image Representation, vol. 7, pp. 336-344, 1996.

[5] A. Miene, T. Hermes, G. Ioannidis, and A. Christoffers, "Extracting textual inserts from digital videos," in Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on, 2001, pp. 1079-1083.

[6] Y. M. Y. Hasan and L. J. Karam, "Morphological text extraction from images," IEEE Transactions on Image Processing, vol. 9, pp. 1978-1983, Nov 2000.

[7] D. Chen, H. Bourlard, and J.-P. Thiran, "Text identification in complex background using SVM," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, 2001, pp. II-621-II-626 vol. 2.

[8] J. Canny, "A computational approach to edge detection," Pattern Analysis and Machine Intelligence, IEEE Transactions on, pp. 679-698, 1986.

[9] V. Wu, R. Manmatha, and E. M. Riseman, "Textfinder: An automatic system to detect and recognize text in images," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, pp. 1224-1229, 1999.

[10] H. Li, D. Doermann, and O. Kia, "Automatic text detection and tracking in digital video," Image Processing, IEEE Transactions on, vol. 9, pp. 147-156, 2000.

[11] K. I. Kim, K. Jung, and J. H. Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, pp. 1631-1639, 2003.

[12] J. Ha, R. M. Haralick, and I. T. Phillips, "Recursive XY cut using bounding boxes of connected components," in Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on, 1995, pp. 952-955.

[13] L. O'Gorman, "The document spectrum for page layout analysis," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, pp. 1162-1173, 1993.

[14] R. Cattoni, T. Coianiz, S. Messelodi, and C. Modena, "Geometric layout analysis techniques for document image understanding: a review," 1998.

[15] N. Papamarkos, J. Tzortzakis, and B. Gatos, "Determination of run-length smoothing values for document segmentation," in Electronics, Circuits, and Systems, 1996. ICECS'96., Proceedings of the Third IEEE International Conference on, 1996, pp. 684-687.

[16] R. O. Duda and P. E. Hart, "Use of the Hough transformation to detect lines and curves in pictures," Communications of the ACM, vol. 15, pp. 11-15, 1972.

[17] B. Gatos, N. Papamarkos, and C. Chamzas, "Skew detection and text line position determination in digitized documents," Pattern Recognition, vol. 30, pp. 1505-1519, 1997.

[18] L. Likforman-Sulem, A. Hanimyan, and C. Faure, "A Hough based algorithm for extracting text lines in handwritten documents," in Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on, 1995, pp. 774-777.

[19] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 20, pp. 1254-1259, 1998.

[20] S. Montabone and A. Soto, "Human detection using a mobile platform and novel features derived from a visual saliency mechanism," Image and Vision Computing, vol. 28, pp. 391-402, 2010.

[21] R. Hartley and A. Zisserman, Multiple view geometry in computer vision vol. 2: Cambridge Univ Press, 2000.

[22] L. Shapiro and G. C. Stockman, Computer Vision. Prentice Hall, 2001.

[23] Y. Gong, "Advancing content-based image retrieval by exploiting image color and region features," Multimedia Systems, vol. 7, pp. 449-457, 1999.

[24] B. Epshtein, "Determining Document Skew Using Inter-line Spaces," in Document Analysis and Recognition (ICDAR), 2011 International Conference on, 2011, pp. 27-31.

[25] Y. Li, Y. Zheng, and D. Doermann, "Detecting text lines in handwritten documents," in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, 2006, pp. 1030-1033.

[26] ICDAR 2011 Robust Reading Competition, Challenge 1: "Reading Text in Born-Digital Images (Web and Email)". Available: http://www.cvc.uab.es/icdar2011competition/

[27] A. Clavelli, D. Karatzas, and J. Lladós, "A framework for the assessment of text extraction algorithms on complex colour images," in Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, 2010, pp. 19-26.

[28] D. Karatzas, S. R. Mestre, J. Mas, F. Nourbakhsh, and P. P. Roy, "ICDAR 2011 Robust Reading Competition-Challenge 1: Reading Text in Born-Digital Images (Web and Email)," in Document Analysis and Recognition (ICDAR), 2011 International Conference on, 2011, pp. 1485-1490.

[29] ABBYY FineReader 11. Available: http://www.abbyy.com/

[30] About Microsoft Office Document Imaging. Available: http://office.microsoft.com/en-us/word-help/about-microsoft-office-document-imaging-HP001077103.aspx

[31] Tesseract OCR engine. Available: http://code.google.com/p/tesseract-ocr/
