CHAPTER 1 INTRODUCTION
1.2 Previous works
Although almost ten years have passed since the concept of butterfly image recognition was proposed, research in this field is still limited. After searching a large body of literature, including IEEE, SCI, ACM and Google Scholar, we found fewer than 20 studies on butterfly recognition on “specimen image” and none on “natural image.” Therefore, in the following two sections, we first survey a paper on automatic segmentation of general images, and then summarize the related works on butterfly recognition on “specimen image.”
1.2.1 Segmentation
Before recognition, the butterfly region must be extracted from the background; however, all the previous studies we found work on specimen images. Since the background of a natural image is much more complex than the simple, clean background of a specimen image, the previous methods cannot be applied to our situation. Therefore, we borrow an idea from a study on automatic segmentation of general images.
Li-Jen Chen and Ling-Hwei Chen [1] proposed a fast method for automatic object segmentation based on the human visual system. They first applied K-means to the local color distribution (the color values of all pixels) and obtained many small regions. Then, they analyzed the properties of each region: color composition, size, relative position, and coarseness. Finally, they used those color and texture properties to merge similar regions. Although this method achieves good results, it takes too much computing time. Hence, we use only the concept of its first step in our method.
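The first step described above, clustering pixel colors with K-means, can be illustrated by a minimal sketch. It is not the implementation of [1]: the function name `kmeans_colors`, the toy image, the choice k = 2, and the initialization from distinct colors are all assumptions made for illustration only.

```python
import numpy as np

def kmeans_colors(pixels, k, n_iter=20, seed=0):
    """Cluster pixel colors with plain K-means (Lloyd's algorithm).

    pixels: array of shape (N, 3), one RGB triple per pixel.
    Returns (labels, centers). Hypothetical helper, not taken from [1].
    """
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialize the centers from k distinct colors in the image
    # (requires the image to contain at least k different colors).
    unique_colors = np.unique(pixels, axis=0)
    centers = unique_colors[rng.choice(len(unique_colors), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every pixel to every center: shape (N, k).
        dist = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers

# Toy 4x4 RGB image: left half reddish, right half bluish.
image = np.zeros((4, 4, 3))
image[:, :2] = [200, 30, 30]
image[:, 2:] = [20, 40, 210]

labels, centers = kmeans_colors(image.reshape(-1, 3), k=2)
label_map = labels.reshape(4, 4)  # one color-cluster index per pixel
```

In this sketch, the small regions would correspond to connected groups of pixels sharing the same entry in `label_map`; the region analysis and merging steps of [1] are not reproduced here.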
1.2.2 Feature extraction and recognition
The concept of butterfly image retrieval was introduced in the late 1990s by Suzuki and Nagao [2-3]. At that time, because retrieval technology was limited, the systems supported only text-based search. Recently, with the development of digital collections, biological image recognition has gradually attracted attention. Some researchers started with retrieval on butterfly specimen images [4-12].
From 2000 to 2003, Prof. Hsiang and his students [4-9] proposed an Internet-based butterfly retrieval system and tried to retrieve information from butterfly specimen images automatically. In 2000, they first built the retrieval system with all feature values supplied by humans [4-5]. After constructing the system, they examined the automation of object segmentation and color feature extraction. The correct rate of segmentation was 82%, and the best color feature they found was the human pre-defined perceptual level. Because these results were unsatisfying, they continued to study automatic color and texture recognition on butterfly specimen images [6-9]. Nonetheless, their best recognition rate was about 50%, measured by fuzzy average R-Precision, which evaluates the average R-Precision by counting a similar retrieved image as correct to some degree.
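To make the measure concrete, the following sketch computes R-Precision, the fraction of relevant images among the top R results, where R is the number of relevant images for the query. The graded variant shown here is only one possible reading of "fuzzy": it assumes each similar image carries a relevance grade between 0 and 1 and that R counts the fully relevant images. The exact definition used in [6-9] is not given here, and the names `r_precision` and `average_r_precision` are hypothetical.

```python
def r_precision(ranked_ids, relevance):
    """R-Precision with graded relevance.

    ranked_ids: retrieved image ids, best match first.
    relevance:  dict mapping image id -> grade in [0, 1]; missing ids count as 0.
    Assumption: R is the number of fully relevant images (grade 1.0).
    """
    r = sum(1 for grade in relevance.values() if grade == 1.0)
    if r == 0:
        return 0.0
    top_r = ranked_ids[:r]
    return sum(relevance.get(image_id, 0.0) for image_id in top_r) / r

def average_r_precision(queries):
    """Mean R-Precision over a list of (ranked_ids, relevance) pairs."""
    scores = [r_precision(ranked, rel) for ranked, rel in queries]
    return sum(scores) / len(scores)

ranking = ["a", "b", "c", "d"]
crisp = {"a": 1.0, "c": 1.0}            # "b" counts as wrong
fuzzy = {"a": 1.0, "c": 1.0, "b": 0.5}  # "b" is similar: half credit

crisp_score = r_precision(ranking, crisp)  # top 2 = a, b -> 1.0 / 2
fuzzy_score = r_precision(ranking, fuzzy)  # top 2 = a, b -> 1.5 / 2
mean_score = average_r_precision([(ranking, crisp), (ranking, fuzzy)])
```

Under these assumptions, giving partial credit to a similar image raises the score for the same ranking, which is the effect the fuzzy variant is meant to capture.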
In 2002, Hyuga and Nishikawa [10] constructed a clustering system for butterfly specimen images. They first calculated 88 features for each database image: 10 color features of the HSL color histogram, 26 shape features of local decimal and moment, and 52 texture features of angular second moment, contrast, correlation and entropy. They then mapped all images onto a Self-Organizing Map (SOM) according to their features and divided them into clusters. The clustering results on the SOM appeared disordered and overlapped seriously. Moreover, the authors did not provide any experimental data.
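The four texture measures named above match the standard statistics of a gray-level co-occurrence matrix (GLCM). Assuming that is what is meant, the sketch below computes them for a single pixel offset. It is not the feature extractor of [10]: the function names, the horizontal offset, and the two-level checkerboard input are illustrative assumptions.

```python
import numpy as np

def glcm(gray, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of gray levels for the offset (dx, dy)."""
    gray = np.asarray(gray)
    h, w = gray.shape
    counts = np.zeros((levels, levels), dtype=float)
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[gray[y, x], gray[y2, x2]] += 1
    return counts / counts.sum()  # joint probability of each level pair

def texture_features(p):
    """Angular second moment, contrast, correlation and entropy of a GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    asm = (p ** 2).sum()
    contrast = (((i - j) ** 2) * p).sum()
    # Correlation is undefined when either standard deviation is zero.
    correlation = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
    nonzero = p[p > 0]
    entropy = -(nonzero * np.log2(nonzero)).sum()  # in bits
    return asm, contrast, correlation, entropy

# 4x4 checkerboard with two gray levels: horizontal neighbors always differ.
board = np.indices((4, 4)).sum(axis=0) % 2
asm, contrast, correlation, entropy = texture_features(glcm(board, levels=2))
```

For the checkerboard, every horizontal pair is (0, 1) or (1, 0) with equal probability, so the sketch yields an angular second moment of 0.5, a contrast of 1, a correlation of -1 and an entropy of 1 bit.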
In 2006, Lim [11] proposed a classification algorithm for butterflies and ladybugs. The algorithm combined three traditional methods: morphological processing, edge detection and color histogram. The author claimed that this algorithm can recognize “every” insect (more than one) in an image taken in the field, and that its average recognition rate was very high, about 90%. Nevertheless, from our observation, its input and database images are all specimen images, and each image contains only one insect, which contradicts the author’s description. Besides, its database, consisting of 9 species of butterflies and 4 species of ladybugs, is too small for practical use.
Later, the work presented by Starostenko [12] came closer to the real situation. This study can be divided into two parts: the shape indexing method and the feature filters. The indexing method [13], previously proposed for general image retrieval, combined the SUSAN, CORPAI and 2STF algorithms. The latter part included color, compactness and elongatedness features, which were used to pre-eliminate butterfly species with large differences from the candidates. Its database consisted of 140 species and 1400 images in total, and for each species, the author defined its degree of relevance to other species. Its average recall was 28%, and its average precision was 46%.
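For reference, recall and precision follow the usual retrieval definitions: recall is the fraction of relevant images that are retrieved, and precision is the fraction of retrieved images that are relevant. The sketch below is illustrative only. The function name and the query data are hypothetical, and the figure of 10 relevant images simply reflects the average of 1400 images over 140 species.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids returned by the system; relevant: ids that should be found.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 10 relevant images, 6 retrieved, 3 of them relevant.
relevant = [f"species_a_{n}" for n in range(10)]
retrieved = relevant[:3] + ["other_1", "other_2", "other_3"]
precision, recall = precision_recall(retrieved, relevant)
```

Here the precision is 3/6 and the recall is 3/10, so a system can return mostly plausible results while still missing most of the relevant images.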
Considering practicality for ordinary users and further progress in this field, we propose a novel system for butterfly recognition on natural images. First, since there is no previous study on natural images, a useful database model is designed and constructed. Next, a user-friendly interactive segmentation method is provided to solve some problems with natural images. Then, the system automatically extracts the representative dominant colors of the butterfly using the proposed Automatic Region Growing Boundary Thresholding (ARGBT) and Automatic Error Threshold K-means (AET-K-means) methods. The corresponding color and distribution features of each dominant color are acquired, and a similarity measure is provided for recognition. After the recognition process, we also provide two feedback mechanisms to meet the user’s expectations. Finally, the experimental results show the effectiveness of our system.