CHAPTER 1: INTRODUCTION

1.1 Statement of Problem

Statistical learning methods estimate functional dependencies from a given collection of data. Statistical learning theory analyzes the factors responsible for generalization and controls these factors in order to generalize well. It covers important topics of classical statistics, in particular discriminant analysis, regression analysis, and density estimation (Vapnik, 1998). In this thesis, we focus on discriminant analysis, also called pattern recognition, and obtain our results via statistical learning methods. Pattern recognition is a human activity that we try to imitate by mechanical means. There are no physical laws that assign observations to classes; it is human consciousness that groups observations into classes. Although the connections and inter-relations among observations are often hidden, by attempting to imitate this process, some understanding and general relations might be gained via statistical learning, which deals with the problem of detecting and characterizing relations in data. Most statistical learning, machine learning, and data mining methods for pattern recognition assume that the data are in vector form and that the relations can be expressed as classification rules, regression functions, or cluster structures. Hence, pattern recognition techniques have been extensively applied in a variety of engineering and scientific disciplines, such as bioinformatics, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing (Duin & Pekalska, 2005; Shawe-Taylor & Cristianini, 2004; Jain, Duin, & Mao, 2000).

Presently, owing to developments in science and technology, multi-dimensional data have evolved into high-dimensional data, such as hyperspectral data, bioinformatics data, gene expression data, and face recognition data. The classification techniques of classical statistics in pattern recognition typically assume that enough learning patterns are available to obtain reasonably accurate class descriptions in quantitative form, based on various types of a priori information. Unfortunately, the number of learning patterns required to learn a classification algorithm for high-dimensional data is much greater than that required for conventional data, and gathering these learning patterns can be difficult and expensive. Consequently, the assumption that enough training samples are available to accurately estimate the quantitative class description is frequently not satisfied for high-dimensional data. In practice, we often encounter a great difficulty known as the curse of dimensionality (also called the Hughes phenomenon (Hughes, 1968; Bellman, 1961; Raudys & Jain, 1991)): only a limited number of learning patterns is available for high-dimensional classification problems, which usually leads to poor statistical estimates in traditional classification techniques (e.g., the covariance matrix in the maximum-likelihood (ML) classifier) (Fukunaga, 1990; Kuo & Landgrebe, 2002; Kuo & Chang, 2007). To overcome this situation, classification techniques such as dimensionality reduction, regularization, semi-supervised approaches, and suitable classifiers have been proposed.
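
As a minimal numerical sketch of this failure mode (an illustration on synthetic data that we add here for exposition; it is not an experiment from this thesis), the following Python snippet shows that when the number of training patterns n is smaller than the dimensionality d, the sample covariance matrix required by the ML classifier is singular and its inverse cannot be evaluated reliably:

    import numpy as np

    # Illustrative synthetic-data sketch (not an experiment from this
    # thesis): with fewer training patterns (n) than spectral bands (d),
    # the sample covariance matrix is singular, so the maximum-likelihood
    # (Gaussian) discriminant, which needs its inverse, breaks down.
    rng = np.random.default_rng(0)
    d, n = 200, 50                     # e.g., 200 bands, 50 training patterns
    X = rng.normal(size=(n, d))        # n patterns in d dimensions

    cov = np.cov(X, rowvar=False)      # d x d sample covariance matrix
    rank = np.linalg.matrix_rank(cov)  # at most n - 1, hence rank < d
    print(f"dimension d = {d}, covariance rank = {rank}")
    # np.linalg.inv(cov) is singular/ill-conditioned here, which is the
    # Hughes-phenomenon failure mode described above.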

Boser, Guyon, and Vapnik (1992) developed a classification technique for small data samples, called the support vector machine (SVM), which does not rely on a priori knowledge. Unlike traditional methods, which minimize the empirical training error, SVM aims at minimizing an upper bound on the generalization error by maximizing the margin between the separating hyperplane and the training data.
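
The following is a minimal sketch of this training regime (scikit-learn and the synthetic data below are our expository choices, not tools used in this thesis): a margin-maximizing SVM trained on high-dimensional data with few training samples.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Hedged sketch: many features, few training samples -- the regime
    # discussed above. The data are synthetic, for illustration only.
    X, y = make_classification(n_samples=120, n_features=200,
                               n_informative=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=60,
                                              random_state=0)

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # soft-margin SVM
    clf.fit(X_tr, y_tr)                            # maximizes the margin
    print("test accuracy:", clf.score(X_te, y_te))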

By minimizing this bound, SVM can overcome the poor statistical estimation problem, and it achieves relatively higher empirical accuracy and excellent generalization capability compared with other standard supervised classifiers (Melgani & Bruzzone, 2004; Fauvel, Chanussot, & Benediktsson, 2006). In particular, SVM has shown good performance in high-dimensional data classification with a small number of training samples (Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006) and robustness to the Hughes phenomenon (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006).

However, the most important topic in SVM is the kernel method. The main idea of the kernel method is to map the input data from the original space to a convenient feature space by a nonlinear mapping, where inner products in the feature space can be computed by a kernel function without knowing the nonlinear mapping explicitly, and linear relations are sought among the images of the data items in the feature space (Vapnik, 1998; Shawe-Taylor & Cristianini, 2004; Schölkopf, Burges, & Smola, 1999). Hence, the most important issue of kernel-based methods is "how to find a proper kernel function" for the reference data.
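
A minimal sketch of this kernel trick (an illustration we add here, not part of the thesis): for the degree-2 polynomial kernel on two-dimensional inputs, the inner product in the six-dimensional feature space equals the kernel evaluated directly in the input space, so the nonlinear mapping never has to be formed explicitly.

    import numpy as np

    # For k(x, y) = (x.y + 1)^2 on 2-D inputs, the feature map phi below
    # satisfies <phi(x), phi(y)> = k(x, y): the feature-space inner
    # product is computed without ever constructing phi.
    def phi(x):
        x1, x2 = x
        return np.array([x1**2, x2**2,
                         np.sqrt(2) * x1 * x2,
                         np.sqrt(2) * x1,
                         np.sqrt(2) * x2,
                         1.0])

    def poly_kernel(x, y):
        return (np.dot(x, y) + 1.0) ** 2

    x = np.array([1.0, 2.0])
    y = np.array([3.0, -1.0])
    print(np.dot(phi(x), phi(y)))  # inner product in feature space: 4.0
    print(poly_kernel(x, y))       # same value, no explicit mapping: 4.0

In practice the feature space may be very high- or even infinite-dimensional (e.g., for the Gaussian RBF kernel), which is why evaluating inner products through the kernel function is essential.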

For hyperspectral image classification problems, there is another problem: spectral-domain-based classifiers often produce imprecise estimates and have difficulty distinguishing unlabeled patterns when the training patterns come from different land-cover classes but have very similar spectral properties (Jackson & Landgrebe, 2002; Kuo, Chuang, Huang, & Hung, 2009). Figure 1.1 shows the spectral values obtained from patterns of two categories, Soybeans-min till (purple) and Corn-no till (yellow), in the Indian Pine Site dataset. They belong to two different classes but have very similar spectral properties. Hence, applying conventional classifiers (e.g., the ML classifier, the k-nearest neighbor (k-NN) classifier (Fukunaga, 1990), and SVM) to these training patterns would yield poor classification performance, and the resulting classification maps exhibit speckle-like errors (Jackson & Landgrebe, 2002; Kuo, Chuang, Huang, & Hung, 2009). Figure 1.2 shows the classification map of the Indian Pine Site produced by SVM; there are a number of speckle-like errors in the classification result.

Figure 1.1 Spectral values (vertical axis) across bands (horizontal axis) for patterns from the Indian Pine Site dataset. The purple color represents patterns of Soybeans-min till and the yellow color represents patterns of Corn-no till; the two classes are different but have very similar spectral properties.

Figure 1.2 Classification map of the hyperspectral image (Indian Pine Site) produced by SVM. There are a number of speckle-like errors in the classification result.

The objectives of this thesis are:

i) To construct a proper kernel function for different datasets via a suitable criterion, to compare the performance of several reference kernel functions and composite kernel functions with the SVM classifier (a sketch of a composite kernel follows this list), and to investigate the influence of the kernel function.

ii) To develop a spatial-contextual support vector machine (SCSVM) that uses both spectral information and spatial contextual information to classify hyperspectral images, to compare the performance of classifiers based only on spectral information against classifiers based on both spectral and spatial information, and to investigate the influence of adding spatial information to the classifiers.
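
To make objective i) concrete, the sketch below shows one common form of composite kernel: a convex combination of a spectral kernel and a spatial kernel. The weighting scheme, the RBF choice, and all parameters here are illustrative assumptions, not the construction developed later in this thesis.

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import SVC

    # Hedged sketch of a composite kernel: a convex combination of two
    # valid kernels is again a valid (positive semidefinite) kernel.
    def composite_kernel(A_spec, A_spat, B_spec, B_spat, mu=0.5, gamma=0.1):
        # K = mu * K_spectral + (1 - mu) * K_spatial
        return (mu * rbf_kernel(A_spec, B_spec, gamma=gamma)
                + (1.0 - mu) * rbf_kernel(A_spat, B_spat, gamma=gamma))

    # Toy data: 10 patterns, 200 spectral and 5 spatial features each.
    rng = np.random.default_rng(0)
    X_spec = rng.normal(size=(10, 200))
    X_spat = rng.normal(size=(10, 5))
    y = np.array([0, 1] * 5)                      # two classes

    K_train = composite_kernel(X_spec, X_spat, X_spec, X_spat)
    clf = SVC(kernel="precomputed").fit(K_train, y)  # SVM on Gram matrix
    print("training accuracy:", clf.score(K_train, y))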
