2 0 0 4 IEEE International Conference on Multimedia a n d Expo (ICME)
Automatic Text Detection Using Multi-Layer
Color
Quantization in Complex
Color
Images
Soo-Chang
Pei, and Yu-Ting Chuang
Department of Electrical Engineering, National Taiwan University
No.1, Sec. 4 Roosevelt Rd., Taipei,
10617 Taiwan, R.0.C
TEL:
886-2-23635251 ext. 321
FAX:
886-2-23671
909
Email:
[email protected], 190942058 @ms90.ntu.edu.tw
ABSTRACT
News, magazines, Web pages, etc in this modem life always contain much text information.
In
this paper. we propose a novel approach to detect text in images with very low false alarm rate. First of all. neural network color quantization is used to compact text color. Second, 3D histogram analysis chooses several color candidates, and then extracted each of these color candidates to obtain several bi-level images. Moreover, for each bi-level image. connectivity analysis and some morphological operators are fed to produce character candidates. Furthermore. we calculate some spatial features and relationships of each text candidate. At last, we can localize text regions by authentication from L.0.G edge detector. Meanwhile, in complex color images, multi-quantization layers can be integrated to reject non-text parts and reduce false alarm rate.1.
INTRODUCTIONIn modem multimedia times, News, Web pages, magazines. and advertisements are everywhere in our lives. Among them, text absolutely, is the most important information. For example, when people surf Web pages, they always care about scores in a baseball game. or the price and name of products they like. Therefore, text detection is becoming a popular research nowa- days. In related works, Jain and Yu
[U]
use color reduction to decompose an input image to several individual foreground im- ages and then put them to connected component analysis to lo- calize text regions. This approach has two drawbacks. First, in the low contrast color image, false alarm rate might increase rapidly due to color quantization. Second,as
number of q u a - tized color incmases, the system has to pay higher computing complexity or more memory space. Lienhart and Wemicke [3] decimate the input image to multiple resolution layers. By util- izing edge features in each individual layer, text can be located after integrating all resolution layers. But in general compound images and video score bar always contain many characters with small font size so that characters after decimation are almost invisible. Gao, and Yang [ 5 ] , Cai, Song, and Lyu [41 suppose that edge strength and density of charactersare
always stronger than other objects in color images. Therefore, after edge filtering, text candidates couldbe
easily found. Unfortunately, systems with above assumptions could never work very well in complex color images. Zhong, Karu and lain [7] compute the spatial0-7803-8603-5/04/$20.00 02004
IEEE
619
variance along each horizontal line over the whole image and text lines can then be found by extracting the rows between two sharp edges of the spatial variance - one edge rising and the other falling.
In
their approach. if the background is complex. an appropriate threshold could not easily be identifird. By our ap- proach, we could get fewer foreground images by means of 3D histogram analysis and raise detecting rate by integrating all single quantization layers. In addition. we use two morphologi- cal operators to compensate text fractions resulted from color quantization.Remaindcrs of this paper are organized as follows. Section.2 describe details of our algorithm. Section.3 shows our perform- ance of this algorithm and some experiment results. Final con- clusion is made in Section.4
2. ALGORITHM
The fundamental building block of our whole text detection system is illustrated in
Fig.1.
First, the input image is quantized to several quantized images with different number of quantized color. For each quantized image, it was put to 3D histogram analysis to find some specific colors, which are probable text candidates. Furthermore, each bi-level image relative to its color candidate could be produced. By calculating some spatial fea- tllres and relationships of characters, text candidates would be identified. Final, we combine all single quantization layers so that we could localize text regions accurately. In the following sections,we
will first explain details of single quantization layer, and than multi-layer combination approach will be described later.2.1 Text Contents of Single Quantization Layer
Before explaining steps of single quantization layer, we have to first make some assumptions for general text in color images as follow>.
a. Text within an image is meaningful if and only if people can recognize easily.
b. For the same sentence or word, character size is similar to each other.
c. The words need to be big enough for human eyes to recog-
nize.
d. Characters of the same word or sentence within an image always have similar colors.
Single Quantizatinn
Layer
+
...
-
3 0 hklogram analysis Morphological operator 3D hbtagramt
L analysiss
x
3-
Morphological operator Connectivity analysis LOG-
.
Morphological operator
2.
. I
-
-
3D hstogmm...
-
.
.
Fig. 1. Building blocks of whole text detection system
2.2 Color Quantization
It
is
essential that text regions have to be merged together first if we would like to localize it by using color information. In accordance with above points. we choose SOFM neural network [I] to quantize input image. First of all, we select an appropriate neural structure and small color table size (4 toIO
colors were recommended), then assigned uniformly distributed color to initial neurons. Final. Butterily-lumping sample sequence from original image was fedin SOFM
training to obtain the color palette.2.3
3D
Histogram AnalysisGenerally speaking. pixels in text region are often more compact than in other objects. According to this point, we cal- culate the 3D color histogram as shown in Equation (I), how- ever, with respect to each quantized color Vq , whem
V q E [ i j q l , i j q 2 ,
...
ijqm], and m=nwnber of quantized color,we estimate its histogram gradient
E(
Vq J by equation (2). Thus, some quantized colors whose gradient is higher than a threshold would be considered as textual color candidates. After deter- mining dominant colors, several non-important colors could be rejected to reduce computation complexity.-
( 1 )
where
V
E( r , g , b )
where %is
=
(i,
j , k )
2.4 Morphological Operating
Substantially, English character consists of only one con- nected region (except “i”. ‘)”), but in other languages, a charac- ter always includes two o r three regions (see Fig.2.a). Thus, if we put this kind of “non-single region” characters to connected component analysis without doing any preprocessing. it is more likely that connected component analysis might make the wrong decision. So. we have to merge these regions first. Here, we utilize morphological dilation
[IO]
tn merge the “co-charactei‘ regions (see Fig.2.b). Unfortunately, a serious problem might be followed by this operation, if two characters are very close to each other. they might also be merged together due to this op- eration. Consequently, morphological erosion [lo] with different strucruring element such as bar shape must be used to Solve this problem (Fig2.c). Also, it is exciting that Morphological dilation can also compensate the effect that when color quantization works on low contrast images, a character sometimes might bebroken to pieces, an example is shown in Fig2.d and Fig2.e. where the left character of Fig2.d is broken to four pieces and it is compensated in Fig2.e.
In
parenthescs. it is known that using both dilation and erosion will result in granularities in images, but in our application, they are always discarded, because granularities are always resulted from very small and independ- ent connected region, i.e.. a small connected region is difficult for human eyes to recognize it even if it is actually a character.Fig. 2. (a) Original characters, (b) dilation on (a), (c) emsion
on (b), (d) color quantization in low contrast image, (e) dila- tion on (d)
2 5 Connectivity Analysis
Details of this step are similar to other existing text detection methods.
In
general, characters in an image always appear in groups; therefore, using some features such as width, height anddistance
can
identify text candidates. 2.6 Authentication from L.0.G Edge FilterInstead of adopting edge filter to tind text candidates, we make use of L.0.G ( Laplacian of Gaussian ) edge filter to only confirm each text candidate. By calculating the ratio of edge and non-edge points, we could make sure whether it is text region or not, if the ratio of this hounding box is higher than a threshold. 2.7 Multi-Layer Combination
For simple background images, single quantization layer may work very well. But in complex background images single quantization layer may fail to detect the accurate text regions, meanwhile, it may also produce many f d s e boxes at the same time. Fortunately, in different single layers although many false boxes might happen, they would neither appear in similar loca- tion nor have same box size. Therefore, we could solve this serious problem by integnting several different single quantiza- tion layers that have different false boxes as shown in Fig.3. It is clear that if we hold boxes that always not only appear in the fixed location but also have similar box size, and reject the other boxes, we could detect real text boxes with low false alarm rate io complex background images.
3. EXPERIMENT RESULTS
Fig. 3. (a, b) Single layer with CQ
=
6
and its own text boxes,(e,
d) Single layer with CQ=
8
and its own text boxes, (e, 0 Single layer with CQ = 10 and its own text boxes, (g) output text boxes after integrating all single layers, (h) output image (where CQis the number of quantiaed color)
To prove the robustness of our system, we test some color images, and three video sequences included sports and news
wirlrout adjusring an>’ parameter of our svsrern by changing conditions such as resolution. font size. languages, and com- plexity of background. The overall performance ofthe algorithm is listed in Table 1, and some test image and video frames are shown in Fig. 4.
As listed in Table I.we detine the hit rate, false alarm rate, and m i s s rate to evaluate our system as follows:
hit rare =
#(boxes detected as text and they are indeed text
]
#{boxes detected as text ) x 100miss rare = 100 - hir rare false alarm rare=
# (boxes detected as text but they are not text
1
#[boxes detected as text ) x 100 where the symbol “ # ” represents the number of the set.Tahle.1. 4. CONCLUSION
In
our algorithm, instead of concentrating on choosing some “tied” pxameters to avoid false text localization, we use a mul- tiple calor quantization layer approach to Localize correct text with very loose parameters. In addition, several morphologicaloperations
are
used to cumpensate some shoncomings. For de- tection performance, we indeed get a low false alarm rate, and high hit rate.5.REFERENCES
[I] S. C. Pei, and Y.
S.
Lo, “Color Image Compression and Limited Display Using Self-organization Kohonen Map”,IEEE Truns. Circuits and Svstems for Vidco Technolog): pp.191-205, Ap,: 1998.
Ani1 K. Jain. and Bin Yu “Automatic Text Location in Images and Video Frames”, IEEE. Intl. ConJ Patrem Recognition. pp. 1497-1499, Aug. 1998
R.Lienhart, and A. Wemicke “Localizing and Segmenting Text in Images and Videos”, IEEE Trans. Circuirs and Sysrerns for Video Technology, pp.256-68, Ap,: 2002.
M . Cai. 1. Song, and M.
R.
Lyu, “A New Approach for Video Text Detection”, IEEE, htl. Cant Image Pmcess-[2]
[3]
[4]
ing. pp.117-120, 2002.
I. Gao. and J. Yang, “An AdaDtive Aleorithm for Text 151
. .
-
-
Detection from Natural Scenes”, Pmceedings of Com-
purer’ Vision and Parrem Recognirion (CVPR). pp.84-89.
2001.
J.
Ymg,
X. Chen. I. Zhang, Y. Zhang, and A. Waibel, “Automatic Detection and Translation of Text from Natural Scenes”, IEEE. Intl. Confi Acousrics. Speech. a d Signal Processing (ICASSPJ, pp.2101-2104. May, 2002.Y. Zhong, K. Karu, and A. K. Jain, “Locating Text in Complex Color Images”, Parrem Recognition,
28: 1523-1535, pp.146-149, 1995.
A. K. Jain. and Bin Yu, “Automatic text location in im- ages and video frames”, Partem Recognition, vol. 31, no.
12, pp.2055-2076, 1998.
S. kdbhakar, H. C h e w John
C.
Handley, Z.Fan,
andY.
W. Lin, “Picture-GraphicsColor Image
Classification”,EEE,
IntL
Conf
onha@
hocessin~(ICP),
pp, 78.5-788,2@2Robert M. Haralick, and Linda C Shapiro, ”Computer
and Robor Vision” vol
I
[6][7]
[8]
[91
[IO]
Fig. 4. (a) vehicle license (h) non-compact text (c) low contrast image