
4.3 Multistage character recognition

4.3.3 Personal adaptation in handwriting recognition

Most recently announced handwritten character recognition systems claim benchmark recognition rates above 90%.

However, when these systems were tested on unconstrained freehand writing, the recognition accuracy of most of them fell to between 40% and 50% [41]. Hence, we propose an unconstrained freehand-writing recognition module that adaptively fine-tunes the parameters of the SPDNN character recognizer in order to learn the user’s own writing style. When input characters are misclassified, the erroneous recognition results are manually corrected by the user. Meanwhile, the parameters, or decision boundaries, of the corresponding character SPDNN are modified and improved by the reinforced and antireinforced learning processes. In addition, when necessary, new clusters may be created in a character SPDNN (self-growing rules) to better approximate the partition boundaries. To prevent excessive learning of the designated character boundary, the adaptive learning process usually includes a verification process. Naturally, the reinforced and antireinforced learning processes are applied to the SPDNNs associated with the mismatched character and its similar characters (the top 10 candidates). As more and more unconstrained freehand-written characters are presented to the system, each character SPDNN gradually learns the user’s personal writing style.
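The correction-driven update described above can be illustrated with a minimal sketch. It is not the SPDNN itself: as a stand-in, each character class is assumed to be a list of cluster centroids scored by Euclidean distance, and the names `adapt`, `lr`, and `grow_dist` are hypothetical. The verification process and the update of all top 10 candidates are omitted; only the wrongly winning class is pushed away.

```python
import math

def nearest(centroids, x):
    """Return (index, distance) of the centroid closest to x."""
    dists = [math.dist(c, x) for c in centroids]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]

def classify(model, x):
    """Pick the class whose nearest centroid is closest to x (stand-in scoring)."""
    return min(model, key=lambda label: nearest(model[label], x)[1])

def adapt(model, x, true_label, lr=0.2, grow_dist=2.0):
    """One user-correction step; lr and grow_dist are illustrative constants."""
    predicted = classify(model, x)
    if predicted == true_label:
        return predicted  # correctly recognized, nothing to adapt
    # Reinforced learning: pull the correct class toward the sample,
    # or grow a new cluster when no existing cluster is close enough.
    i, d = nearest(model[true_label], x)
    if d > grow_dist:
        model[true_label].append(list(x))
    else:
        c = model[true_label][i]
        model[true_label][i] = [ci + lr * (xi - ci) for ci, xi in zip(c, x)]
    # Antireinforced learning: push the wrongly winning class away.
    j, _ = nearest(model[predicted], x)
    c = model[predicted][j]
    model[predicted][j] = [ci - lr * (xi - ci) for ci, xi in zip(c, x)]
    return predicted

model = {"A": [[0.0, 0.0]], "B": [[1.0, 0.0]]}
sample = [0.55, 0.0]
before = adapt(model, sample, "A")   # misclassified, then corrected by the user
after = classify(model, sample)      # classification after one adaptation step
```

In this toy setting a single correction is enough to move the decision boundary past the sample; with real handwriting features several learning cycles would normally be needed, as the experiments in this section suggest.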

Figure 4.6: The user interface and a recognition snapshot of the proposed three-stage recognizer.

Experimental results and Performance evaluation

To evaluate the adaptation and recognition capabilities of the unconstrained freehand-writing recognition module, we prepared our in-house database (NCTU/NNL) as follows. We first selected the 300 most commonly used characters from the Chinese textbooks for elementary schools in Taiwan. These 300 Chinese characters and the alphanumerics were then written 10 times over several days, with no restriction on writing style, by several students at our university. In this way we intended to simulate a natural and general unconstrained freehand-written database. The test results for the adaptation processes of 5 users are listed in Table 4.6. The recognition rate rose from 44.09% to 82.2% over the first 5 learning cycles, and eventually reached 90.03% after 10 learning cycles. Figure 4.6 depicts the user interface and a snapshot of the recognition results of the prototype system.

Table 4.6: Recognition accuracy on 300 commonly used characters written without any constraints by five students. The proposed adaptive system shows significant improvement in recognition accuracy over the 10 learning cycles.

Trial   user#1   user#2   user#3   user#4   user#5   avg.
1st     50.6%    33.6%    38.1%    52.3%    45.7%    44.0%
2nd     67.7%    69.0%    55.5%    56.6%    61.1%    62.0%
3rd     78.6%    80.0%    69.9%    71.5%    72.7%    74.5%
4th     84.3%    78.7%    69.3%    75.9%    86.0%    79.5%
5th     84.6%    87.6%    73.9%    79.9%    85.1%    82.2%
6th     81.9%    89.0%    76.2%    80.2%    84.0%    82.2%
7th     86.5%    89.6%    79.9%    78.5%    84.6%    83.8%
8th     89.5%    90.3%    79.5%    86.9%    89.3%    87.1%
9th     90.5%    90.6%    81.2%    87.7%    89.3%    87.9%
10th    93.6%    91.4%    84.6%    90.5%    90.1%    90.0%
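To read the per-user effect of adaptation directly from Table 4.6, the short sketch below computes each user's gain in percentage points between the first and the tenth trial. The values are copied from the table; the variable names are illustrative only.

```python
# Recognition rates (%) per trial, copied from Table 4.6.
rates = {
    "user#1": [50.6, 67.7, 78.6, 84.3, 84.6, 81.9, 86.5, 89.5, 90.5, 93.6],
    "user#2": [33.6, 69.0, 80.0, 78.7, 87.6, 89.0, 89.6, 90.3, 90.6, 91.4],
    "user#3": [38.1, 55.5, 69.9, 69.3, 73.9, 76.2, 79.9, 79.5, 81.2, 84.6],
    "user#4": [52.3, 56.6, 71.5, 75.9, 79.9, 80.2, 78.5, 86.9, 87.7, 90.5],
    "user#5": [45.7, 61.1, 72.7, 86.0, 85.1, 84.0, 84.6, 89.3, 89.3, 90.1],
}

# Gain in percentage points from the 1st to the 10th trial.
gains = {user: round(r[-1] - r[0], 1) for user, r in rates.items()}
largest = max(gains, key=gains.get)

for user, gain in gains.items():
    print(f"{user}: +{gain} points")
print("largest gain:", largest)
```

The user with the lowest initial rate (user#2) shows the largest gain, 57.8 points, while the smallest gain, 38.2 points, belongs to user#4, who started from the highest initial rate.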

Chapter 5 Content-based Image Retrieval

The ongoing proliferation of digital content available over the Internet has led to an increasing demand for systems that can automatically query, search, and retrieve relevant images from large content databases and/or libraries. To construct such systems, two issues have to be considered: (1) how to properly index an image, and (2) how to design a user-friendly query method. Over the past decades, a considerable number of studies have been conducted on content-based image retrieval (CBIR), in which images are indexed and retrieved by their visual features, such as object shape, position, color, and texture [42, 43, 44, 3, 4, 45, 46].

5.1 Overview

According to the query method, image query systems can be divided into (1) fully automatic and (2) user feedback query categories. In fully automatic query systems [47, 48, 49, 50], a user specifies several related images that vaguely reflect his/her desired images, and the query system responds with a set of so-called related images. Sometimes most of these images are undesired because of misinterpretation between the user and the query system. On the other hand, to make the retrieval results more satisfactory, users of user feedback query systems [51, 52] specify some details of the image contents instead of just the image itself, so that the query system can directly use this information to search for and match suitable images.

In a fully automatic query system, an image is often indexed by its global features, such as color histograms. Some early systems, such as QBIC [47], Virage [48], Photobook [49], and VisualSEEk and WebSEEk [50], basically applied global features for image retrieval. However, when global features are used for image indexing, a query system may ignore significant local details of an image and thus retrieve undesired images.
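As an illustration of global-feature indexing and of its weakness, the following minimal sketch builds a normalized joint RGB histogram and compares two images by histogram intersection. This is a generic technique, not the indexing scheme of any of the cited systems; the function names and the choice of 4 bins per channel are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of (r, g, b) pixels with values 0..255.

    bins_per_channel is assumed to divide 256 evenly.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

red, blue, green = (255, 0, 0), (0, 0, 255), (0, 255, 0)
image_a = [red, red, blue, blue]      # red region first, then blue
image_b = [blue, blue, red, red]      # same colors, different layout
image_c = [green, green, green, green]

same_colors = histogram_intersection(color_histogram(image_a), color_histogram(image_b))
different = histogram_intersection(color_histogram(image_a), color_histogram(image_c))
```

Here `image_a` and `image_b` receive the maximum similarity although their layouts differ, which is exactly the loss of local detail that global features suffer from.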

Instead of using the global features of an image, user feedback query systems, such as Netra [51] and Blobworld [52], adopt local features to represent or index an image. In these systems, regions or subimages are first segmented or sketched from an image, and various visual features of these regions are then extracted as the local features. In general, the query and retrieval precision of these systems is better than that of the global feature based systems, but their performance depends heavily on precise segmentation or skillful sketching of a region.

Despite decades of work, segmenting an image into semantically meaningful regions is still a difficult task in image processing [42, 43, 44]. Instead of emphasizing precise region segmentation, the Integrated Region Matching (IRM) metric [3] was proposed to robustly measure the similarity between regions and to reduce the influence of inaccurate segmentation. In addition, a region-based fuzzy feature matching approach, called unified feature matching (UFM) [4], was proposed to characterize each region with a fuzzy feature set; thus, an image is associated with a family of fuzzy feature sets. Since fuzzy features naturally characterize the blurry boundaries between regions, the influence of inaccurate segmentation is reduced.
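The idea behind IRM, letting every region of one image match several regions of the other so that a wrong split costs little, can be sketched as a greedy assignment in which the most similar region pairs are served first. The sketch below is a simplified reading of the metric, assuming Euclidean distance between region feature vectors and region weights (for example, area fractions) that sum to 1 per image; the function name and data layout are hypothetical.

```python
import math

def irm_distance(regions_a, regions_b):
    """Greedy region matching distance between two images.

    Each image is a list of (feature_vector, weight) pairs whose
    weights are assumed to sum to 1.
    """
    # All region pairs, most similar (smallest distance) first.
    pairs = sorted(
        (math.dist(fa, fb), i, j)
        for i, (fa, _) in enumerate(regions_a)
        for j, (fb, _) in enumerate(regions_b)
    )
    left_a = [w for _, w in regions_a]
    left_b = [w for _, w in regions_b]
    total = 0.0
    for d, i, j in pairs:
        s = min(left_a[i], left_b[j])  # weight this pair can still absorb
        if s > 0.0:
            total += s * d
            left_a[i] -= s
            left_b[j] -= s
    return total

reference = [([0.0], 0.5), ([10.0], 0.5)]
# Same scene, but the second region was split in two by the segmenter.
over_segmented = [([0.0], 0.5), ([10.0], 0.25), ([11.0], 0.25)]

d_same = irm_distance(reference, reference)
d_split = irm_distance(reference, over_segmented)
```

In this toy case the over-segmented image stays close to the reference (distance 0.25 on a feature scale of 10), because the two fragments are both matched to the region they came from.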

Since an image can be partitioned into several sub-images, called regions or objects, the spatial relationship of these regions plays an important role in representing an image. The 2D B-string [53] was proposed to represent the spatial relations of regions, where each region is represented by two symbols: the begin-boundary and the end-boundary symbols. With these symbols, a 2D B-string can represent the spatial relations of partially overlapping regions without a boundary cutting process.
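A simplified sketch of the begin/end boundary encoding is given below. It assumes that each region is described by its bounding box, that a region named `A` contributes the symbols `Ab` (begin) and `Ae` (end) on each axis, and that `=` joins symbols projected onto the same coordinate; these naming and notation choices are assumptions of the sketch and may differ from the notation in [53].

```python
def b_string(intervals):
    """Order begin/end boundary symbols along one axis.

    intervals maps a region name to its (begin, end) projection.
    """
    events = []
    for name, (begin, end) in intervals.items():
        events.append((begin, name + "b"))
        events.append((end, name + "e"))
    events.sort()
    parts = []
    previous = None
    for position, symbol in events:
        if previous is not None and position == previous:
            parts.append("=")  # same projected position as the previous symbol
        parts.append(symbol)
        previous = position
    return " ".join(parts)

def two_d_b_string(boxes):
    """boxes maps a region name to (x0, y0, x1, y1); returns the (u, v) strings."""
    u = b_string({n: (b[0], b[2]) for n, b in boxes.items()})
    v = b_string({n: (b[1], b[3]) for n, b in boxes.items()})
    return u, v

# Two partially overlapping regions; neither one has to be cut.
u, v = two_d_b_string({"A": (0, 0, 4, 4), "B": (2, 1, 6, 3)})
```

For these two boxes the x-axis string is `Ab Bb Ae Be` (partial overlap) and the y-axis string is `Ab Bb Be Ae` (B lies inside A's vertical extent), so the overlap is expressed purely by the order of the boundary symbols.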

Since the 2D B-string represents only the spatial relations of regions in an image, the same 2D B-string may correspond to two sets of regions that have similar spatial relations but completely different shapes and sizes. Thus, in addition to the 2D B-string, more visual features, such as color, texture, and shape, are needed to represent regions and to index an image for query and retrieval purposes.

The kernel problem of image retrieval lies in translating a user’s desired images, which reside conceptually in his/her mind, into a set of computable image processing formulas or models. Similar to keywords for text queries, visual keywords are proposed for image query and retrieval.

Instead of annotating each region in an image with keywords, I represent a region with a visual keyword and the spatial relations of regions with a visual string.

A Neural Networks based Image Retrieval System (NNIRS) has been developed at “http://140.113.216.78/ImageQuerySystem”. The system configuration is depicted in Fig. 5.1. All the major processing modules, including the pre-processing and feature extraction module, a Visual Keyword based Retrieval Module (VKRM), and a Visual String based Retrieval Module (VSRM), are implemented on a personal computer.


Figure 5.1: System configuration of the Neural Networks based Image Retrieval System (NNIRS). The pre-processing and feature extraction module is used to extract the visual keyword or visual string from the image. The visual keyword and visual string based retrieval modules are used to find the relevant images in the database using visual keywords or visual string, respectively.

The user can query the NNIRS by visual keyword or visual string. When a query image is submitted, the pre-processing and feature extraction module is first activated to extract the visual keyword or visual string from the image depending on the query requirement. Then, the VKRM or VSRM is used to find the relevant images in the database using visual keywords or visual string.

The following sections show the details of these modules in the NNIRS.

5.2 Image Pre-processing and Feature

In the document: A Study of Composite Gaussian Neural Networks (pages 55-62)