Automatic text detection using multi-layer color quantization in complex color images

(1)

2 0 0 4 IEEE International Conference on Multimedia a n d Expo (ICME)

Automatic Text Detection Using Multi-Layer

Color

Quantization in Complex

Color

Images

Soo-Chang

Pei, and Yu-Ting Chuang

Department of Electrical Engineering, National Taiwan University

No.1, Sec. 4 Roosevelt Rd., Taipei,

10617 Taiwan, R.0.C

TEL:

886-2-23635251 ext. 321

FAX:

886-2-23671

909 Email:

[email protected], 190942058 @ms90.ntu.edu.tw

ABSTRACT

News, magazines, Web pages, etc in this modem life always contain much text information.

In

this paper. we propose a novel approach to detect text in images with very low false alarm rate. First of all. neural network color quantization is used to compact text color. Second, 3D histogram analysis chooses several color candidates, and then extracted each of these color candidates to obtain several bi-level images. Moreover, for each bi-level image. connectivity analysis and some morphological operators are fed to produce character candidates. Furthermore. we calculate some spatial features and relationships of each text candidate. At last, we can localize text regions by authentication from L.0.G edge detector. Meanwhile, in complex color images, multi-quantization layers can be integrated to reject non-text parts and reduce false alarm rate.

1.

INTRODUCTION

In modem multimedia times, News, Web pages, magazines. and advertisements are everywhere in our lives. Among them, text absolutely, is the most important information. For example, when people surf Web pages, they always care about scores in a baseball game. or the price and name of products they like. Therefore, text detection is becoming a popular research nowa- days. In related works, Jain and Yu

[U]

use color reduction to decompose an input image to several individual foreground images and then put them to connected component analysis to localize text regions. This approach has two drawbacks. First, in the low contrast color image, false alarm rate might increase rapidly due to color quantization. Second,

as

number of q u a - tized color incmases, the system has to pay higher computing complexity or more memory space. Lienhart and Wemicke [3] decimate the input image to multiple resolution layers. By util- izing edge features in each individual layer, text can be located after integrating all resolution layers. But in general compound images and video score bar always contain many characters with small font size so that characters after decimation are almost invisible. Gao, and Yang [ 5 ] , Cai, Song, and Lyu [41 suppose that edge strength and density of characters

are

always stronger than other objects in color images. Therefore, after edge filtering, text candidates could

be

easily found. Unfortunately, systems with above assumptions could never work very well in complex color images. Zhong, Karu and lain [7] compute the spatial

0-7803-8603-5/04/$20.00 02004

IEEE

619

variance along each horizontal line over the whole image and text lines can then be found by extracting the rows between two sharp edges of the spatial variance - one edge rising and the other falling.

In

their approach. if the background is complex. an appropriate threshold could not easily be identifird. By our approach, we could get fewer foreground images by means of 3D histogram analysis and raise detecting rate by integrating all single quantization layers. In addition. we use two morphological operators to compensate text fractions resulted from color quantization.

Remaindcrs of this paper are organized as follows. Section.2 describe details of our algorithm. Section.3 shows our performance of this algorithm and some experiment results. Final conclusion is made in Section.4

2. ALGORITHM

The fundamental building block of our whole text detection system is illustrated in

Fig.1.

First, the input image is quantized to several quantized images with different number of quantized color. For each quantized image, it was put to 3D histogram analysis to find some specific colors, which are probable text candidates. Furthermore, each bi-level image relative to its color candidate could be produced. By calculating some spatial fea- tllres and relationships of characters, text candidates would be identified. Final, we combine all single quantization layers so that we could localize text regions accurately. In the following sections,

we

will first explain details of single quantization layer, and than multi-layer combination approach will be described later.

2.1 Text Contents of Single Quantization Layer

Before explaining steps of single quantization layer, we have to first make some assumptions for general text in color images as follow>.

a. Text within an image is meaningful if and only if people can recognize easily.

b. For the same sentence or word, character size is similar to each other.

c. The words need to be big enough for human eyes to recog-

nize.

d. Characters of the same word or sentence within an image always have similar colors.

(2)

Single Quantizatinn

Layer

+

...

-

3 0 hklogram analysis Morphological operator 3D hbtagram

t

L analysis

s

x

3

-

Morphological operator Connectivity analysis LOG

-

.

Morphological operator

2. . I

-

3D hstogmm

...

-

.

Fig. 1. Building blocks of whole text detection system

2.2 Color Quantization

It

is

essential that text regions have to be merged together first if we would like to localize it by using color information. In accordance with above points. we choose SOFM neural network [I] to quantize input image. First of all, we select an appropriate neural structure and small color table size (4 to

IO

colors were recommended), then assigned uniformly distributed color to initial neurons. Final. Butterily-lumping sample sequence from original image was fed

in SOFM

training to obtain the color palette.

2.3

3D

Histogram Analysis

Generally speaking. pixels in text region are often more compact than in other objects. According to this point, we cal- culate the 3D color histogram as shown in Equation (I), how- ever, with respect to each quantized color Vq , whem

V q E [ i j q l , i j q 2 ,

...

ijqm], and m=nwnber of quantized color,

we estimate its histogram gradient

E(

Vq J by equation (2). Thus, some quantized colors whose gradient is higher than a threshold would be considered as textual color candidates. After deter- mining dominant colors, several non-important colors could be rejected to reduce computation complexity.

-

( 1 )

where

V

E

( r , g , b )

where %is

=

(i,

j , k )

2.4 Morphological Operating

Substantially, English character consists of only one connected region (except “i”. ‘)”), but in other languages, a character always includes two o r three regions (see Fig.2.a). Thus, if we put this kind of “non-single region” characters to connected component analysis without doing any preprocessing. it is more likely that connected component analysis might make the wrong decision. So. we have to merge these regions first. Here, we utilize morphological dilation

[IO]

tn merge the “co-charactei‘ regions (see Fig.2.b). Unfortunately, a serious problem might be followed by this operation, if two characters are very close to each other. they might also be merged together due to this operation. Consequently, morphological erosion [lo] with different strucruring element such as bar shape must be used to Solve this problem (Fig2.c). Also, it is exciting that Morphological dilation can also compensate the effect that when color quantization works on low contrast images, a character sometimes might be

(3)

broken to pieces, an example is shown in Fig2.d and Fig2.e. where the left character of Fig2.d is broken to four pieces and it is compensated in Fig2.e.

In

parenthescs. it is known that using both dilation and erosion will result in granularities in images, but in our application, they are always discarded, because granularities are always resulted from very small and independ- ent connected region, i.e.. a small connected region is difficult for human eyes to recognize it even if it is actually a character.

Fig. 2. (a) Original characters, (b) dilation on (a), (c) emsion

on (b), (d) color quantization in low contrast image, (e) dila- tion on (d)

2 5 Connectivity Analysis

Details of this step are similar to other existing text detection methods.

In

general, characters in an image always appear in groups; therefore, using some features such as width, height and

distance

can

identify text candidates. 2.6 Authentication from L.0.G Edge Filter

Instead of adopting edge filter to tind text candidates, we make use of L.0.G ( Laplacian of Gaussian ) edge filter to only confirm each text candidate. By calculating the ratio of edge and non-edge points, we could make sure whether it is text region or not, if the ratio of this hounding box is higher than a threshold. 2.7 Multi-Layer Combination

For simple background images, single quantization layer may work very well. But in complex background images single quantization layer may fail to detect the accurate text regions, meanwhile, it may also produce many f d s e boxes at the same time. Fortunately, in different single layers although many false boxes might happen, they would neither appear in similar location nor have same box size. Therefore, we could solve this serious problem by integnting several different single quantization layers that have different false boxes as shown in Fig.3. It is clear that if we hold boxes that always not only appear in the fixed location but also have similar box size, and reject the other boxes, we could detect real text boxes with low false alarm rate io complex background images.

3. EXPERIMENT RESULTS

Fig. 3. (a, b) Single layer with CQ

=

6

and its own text boxes,

(e,

d) Single layer with CQ

=

8

and its own text boxes, (e, 0 Single layer with CQ = 10 and its own text boxes, (g) output text boxes after integrating all single layers, (h) output image (where CQ

is the number of quantiaed color)

(4)

To prove the robustness of our system, we test some color images, and three video sequences included sports and news

wirlrout adjusring an>’ parameter of our svsrern by changing conditions such as resolution. font size. languages, and complexity of background. The overall performance ofthe algorithm is listed in Table 1, and some test image and video frames are shown in Fig. 4.

As listed in Table I.we detine the hit rate, false alarm rate, and m i s s rate to evaluate our system as follows:

hit rare =

#(boxes detected as text and they are indeed text

]

#{boxes detected as text ) x 100

miss rare = 100 - hir rare false alarm rare=

# (boxes detected as text but they are not text

1

#[boxes detected as text ) x 100 where the symbol “ # ” represents the number of the set.

Tahle.1. 4. CONCLUSION

In

our algorithm, instead of concentrating on choosing some “tied” pxameters to avoid false text localization, we use a multiple calor quantization layer approach to Localize correct text with very loose parameters. In addition, several morphological

operations

are

used to cumpensate some shoncomings. For detection performance, we indeed get a low false alarm rate, and high hit rate.

5.REFERENCES

[I] S. C. Pei, and Y.

S.

Lo, “Color Image Compression and Limited Display Using Self-organization Kohonen Map”,

IEEE Truns. Circuits and Svstems for Vidco Technolog): pp.191-205, Ap,: 1998.

Ani1 K. Jain. and Bin Yu “Automatic Text Location in Images and Video Frames”, IEEE. Intl. ConJ Patrem Recognition. pp. 1497-1499, Aug. 1998

R.Lienhart, and A. Wemicke “Localizing and Segmenting Text in Images and Videos”, IEEE Trans. Circuirs and Sysrerns for Video Technology, pp.256-68, Ap,: 2002.

M . Cai. 1. Song, and M.

R.

Lyu, “A New Approach for Video Text Detection”, IEEE, htl. Cant Image Pmcess-

[2]

[3]

[4]

ing. pp.117-120, 2002.

I. Gao. and J. Yang, “An AdaDtive Aleorithm for Text 151

. .

-

Detection from Natural Scenes”, Pmceedings of Com-

purer’ Vision and Parrem Recognirion (CVPR). pp.84-89.

2001.

J.

Ymg,

X. Chen. I. Zhang, Y. Zhang, and A. Waibel, “Automatic Detection and Translation of Text from Natural Scenes”, IEEE. Intl. Confi Acousrics. Speech. a d Signal Processing (ICASSPJ, pp.2101-2104. May, 2002.

Y. Zhong, K. Karu, and A. K. Jain, “Locating Text in Complex Color Images”, Parrem Recognition,

28: 1523-1535, pp.146-149, 1995.

A. K. Jain. and Bin Yu, “Automatic text location in im- ages and video frames”, Partem Recognition, vol. 31, no.

12, pp.2055-2076, 1998.

S. kdbhakar, H. C h e w John

C.

Handley, Z.

Fan,

and

Y.

W. Lin, “Picture-Graphics

Color Image

Classification”,

EEE,

IntL

Conf

on

ha@

hocessin~

(ICP),

pp, 78.5-788,2@2

Robert M. Haralick, and Linda C Shapiro, ”Computer

and Robor Vision” vol

I

[6]

[7]

[8]

[91

[IO]

Fig. 4. (a) vehicle license (h) non-compact text (c) low contrast image

(d)

TV

advertisements (e) soccer

(0

baseball