Introduction - Shop Sign Detection System for Street View Navigation

1.1 Introduction

A single photograph of a scene is just a static snapshot with a limited field of view, captured from a single viewpoint. Many techniques have been proposed to extend the ways in which a scene can be visualized by taking multiple photographs.

A multi-perspective panorama (MPP), also called a route panorama, is built from side-view images captured continuously while walking along a street. These images are then stitched together to show the street view. Users can browse the street view in a web browser or another graphical user interface. By combining the route panorama with the Internet, users can easily see the street view without going outside.

General route panorama systems help users see the street view; however, they show it only statically and lack interaction with users. Nowadays, people want to interact with shops on the web, for example by ordering a ticket when they see a movie theater or making a reservation when they see a famous restaurant. A route panorama cannot offer these services merely by stitching continuous images together. To solve this problem, a route panorama needs an additional system to handle dynamic requests, one that links users to a street shop's homepage or to related discussions on the Internet.

To help users reach the homepage of a street shop while viewing a route panorama, each shop sign in the panorama needs a hyperlink. To add a hyperlink to a shop sign, the sign has to be identified first, and then the text in the sign has to be recognized; finally, a hyperlink is added to the sign according to the recognized text.

To achieve this purpose, a system is designed as follows. First, the sign in the image must be detected. Second, the characters in the sign are classified by optical character recognition (OCR). Finally, a hyperlink is added to the characters to link to the homepage or related pages of the shop. Characters appear not only in the sign but everywhere in the image, so sign detection has to be done before the characters are classified. In this thesis, we focus on the first step, sign detection.

In general, a sign is a rigid object. A rigid object usually has a fixed shape and fixed colors, and these characteristics can be used to detect it in a digital image. In research on rigid objects, vehicle license plate detection and traffic sign detection are currently two of the most popular topics. Some of this research is reviewed below.

1.2 Related Work

Vehicle license plate detection (LPD) is widely used for detecting speeding cars, security control in restricted areas, unattended parking zones, traffic law enforcement, and electronic toll collection. A vehicle license plate (VLP) recognition system has three steps: the plate location step detects where the license plate is in the image, the character segmentation step extracts the characters from the plate, and the character recognition step recognizes them. Here we focus on the plate location step.

Vehicle license plates have two significant characteristics: a fixed shape and a limited set of colors. All vehicle license plates must be rectangular and have a fixed aspect ratio. Based on this characteristic, the Hough transform (HT) [1] has been proposed for line detection. The sliding concentric windows (SCW) algorithm has also been proposed for detecting candidate rectangles [2]. Mathematical morphology is another effective approach that is often used to locate license plates [3]. However, when the license plate image is taken at various inclination angles or under various lighting conditions, the edge-based methods mentioned above are not effective. Color-based methods have been proposed to solve these problems. A vehicle license plate uses only a few color combinations, so these methods exploit the color information of the plate [4].
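The fixed-aspect-ratio property can be illustrated with a minimal sketch. It is not taken from the cited methods; the function names, the target ratio of 2.0, and the 25% tolerance are hypothetical values chosen only for illustration. It keeps the candidate rectangles whose width-to-height ratio is close to an expected value.

```python
def has_expected_aspect(width, height, target_ratio=2.0, tolerance=0.25):
    """Return True if width/height lies within a relative tolerance of target_ratio.

    target_ratio and tolerance are illustrative assumptions, not values
    from the cited license plate detection methods.
    """
    if width <= 0 or height <= 0:
        return False
    ratio = width / height
    return abs(ratio - target_ratio) <= tolerance * target_ratio


def filter_candidates(boxes, target_ratio=2.0, tolerance=0.25):
    """Keep only the (x, y, width, height) boxes with the expected aspect ratio."""
    return [
        box for box in boxes
        if has_expected_aspect(box[2], box[3], target_ratio, tolerance)
    ]


if __name__ == "__main__":
    candidates = [(0, 0, 200, 100), (0, 0, 100, 100), (0, 0, 230, 100)]
    print(filter_candidates(candidates))
```

A filter of this kind only works when the expected ratio is known in advance, which holds for license plates.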

Another popular research topic in rigid object detection is traffic sign detection. Traffic sign detection also has three stages: detection, classification, and recognition. The detection stage finds the image areas most likely to contain a traffic sign. These areas are often called regions of interest (ROIs). Each ROI is then tested to decide whether it is a traffic sign and which category it belongs to, such as warning, prohibition, or obligation. In the final stage, the recognition stage identifies what the traffic sign represents. Here we also focus on the first stage, detecting regions of interest.

Since all traffic signs have a strict shape format (circle, rectangle, octagon, or triangle), many shape-based methods have been proposed that rely on this characteristic. The Hough transform has been proposed for line detection [5]. Cross-correlation-based template matching against traffic sign templates of these strict shapes has also been proposed [6]. Shape-based methods for traffic sign detection still have the same problems mentioned for vehicle license plate detection: images may be taken at various inclination angles or under various lighting conditions. Color-based methods have been proposed to handle these situations.

1.3 Objectives

However, the shape-based and color-based methods mentioned above do not work well for detecting shop signs. The sliding concentric windows algorithm can only detect rectangles with a fixed aspect ratio, but shop signs vary in shape and do not have a fixed aspect ratio. Color-based methods are not suitable for detecting shop signs, either. Vehicle license plates and traffic signs use only a few color combinations, so they can easily be detected by their color features; shop signs, in contrast, are colorful and follow no rules, so color cannot serve as a feature for detecting them.

To detect shop signs, Jerod Weinman has proposed a Markov random field based method for detecting text regions in photos [7]. In his research, text exists only in the shop sign and the image background is simple. The images we want to process are more complicated: text appears everywhere in our images, not just in the shop sign. Besides, using a Markov random field to detect text in an image is time-consuming. For these reasons, a new method must be found.

Our system is divided into two stages: the first stage detects areas that may contain a shop sign, and the second stage extracts the most likely area as the shop sign. In the first stage, the Sobel edge detection algorithm is used to find all horizontal and vertical edges, and histogram projection filters out the shorter lines among those edges. We then build new rules to eliminate impossible lines from the remaining candidate lines. In the second stage, a max area matching rule is proposed to classify the regions formed by the candidate lines from the first stage.
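As a rough illustration of the first stage, the sketch below applies the standard 3x3 Sobel kernels to a grayscale image and then uses row and column projections to keep only the rows and columns with enough edge pixels. It is a minimal sketch under stated assumptions, not the implementation described in Chapter 2: the function names, the gradient threshold, and the minimum length are hypothetical, and the projection counts all edge pixels in a row or column rather than contiguous runs.

```python
def sobel_edges(image, threshold=200):
    """Return (horizontal, vertical) binary edge maps of a grayscale image.

    image is a list of rows of integer intensities. threshold is an
    illustrative assumption. Border pixels are left as 0.
    """
    h, w = len(image), len(image[0])
    horizontal = [[0] * w for _ in range(h)]
    vertical = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Gx: intensity change along x, strong on vertical edges.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Gy: intensity change along y, strong on horizontal edges.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            if abs(gy) >= threshold:
                horizontal[y][x] = 1
            if abs(gx) >= threshold:
                vertical[y][x] = 1
    return horizontal, vertical


def row_projection(edge_map):
    """Count the edge pixels in each row."""
    return [sum(row) for row in edge_map]


def column_projection(edge_map):
    """Count the edge pixels in each column."""
    return [sum(column) for column in zip(*edge_map)]


def long_lines(projection, min_length):
    """Return the indices whose edge-pixel count reaches min_length."""
    return [i for i, count in enumerate(projection) if count >= min_length]


if __name__ == "__main__":
    # Synthetic 8x10 image with one bright rectangle standing in for a sign.
    img = [[0] * 10 for _ in range(8)]
    for y in range(2, 6):
        for x in range(2, 8):
            img[y][x] = 255
    horizontal, vertical = sobel_edges(img)
    print("candidate rows:", long_lines(row_projection(horizontal), 6))
    print("candidate columns:", long_lines(column_projection(vertical), 6))
```

On the synthetic image, the surviving rows and columns cluster around the rectangle's borders, which is the kind of candidate line the later rules would refine.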

1.4 Overview of this thesis

This thesis is organized as follows. Chapter 2 introduces our new method. Chapter 3 presents experimental results and some likely failure cases of our method. Chapter 4 gives conclusions and suggestions for future work.
