CHAPTER 3: ADAPTIVE FEATURE EXTRACTIONS
3.2 Adaptive Feature Extractions
where α_t = (1/2) log((1 − ε_t)/ε_t) is used to estimate the classifier's weight at round t.
B. Classification Procedure:
y = argmax_{c = 1, ..., L} Σ_{t=1}^{T} α_t δ(h_t(z), c).
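The classifier-weight and weighted-vote rules above can be sketched in Python. This is a minimal illustration, not the thesis implementation; the predictions, errors, and class labels in the example are purely hypothetical inputs.

```python
import math

def classifier_weight(eps):
    # AdaBoost-style weight alpha_t = (1/2) log((1 - eps_t) / eps_t)
    # for a classifier whose weighted training error is eps (0 < eps < 1).
    return 0.5 * math.log((1.0 - eps) / eps)

def weighted_vote(predictions, alphas, labels):
    # y = argmax_c sum_t alpha_t * delta(h_t(z), c)
    scores = {c: sum(a for p, a in zip(predictions, alphas) if p == c)
              for c in labels}
    return max(scores, key=scores.get)

# A more accurate classifier (smaller error) receives a larger vote:
alphas = [classifier_weight(e) for e in (0.10, 0.25, 0.40)]
y = weighted_vote(["roofs", "roads", "roofs"], alphas, ["roofs", "roads"])
```

Note that a classifier with error exactly 0.5 receives zero weight, so it contributes nothing to the vote.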
3.2.2 An Adaptive Feature Extraction with Reject Region
In this section, nonparametric weighted feature extraction is applied to reduce the dimensionality of the hyperspectral image and to train the classifier. On the classifier side, the samples that fall in the reject region are carried into the next round. The iterations are repeated until convergence, and the classifiers are then combined by weighted vote. Figure 3.3 shows the flowchart of adaptive feature extraction with reject region (AdaFE_RR).
Figure 3.3. The flowchart of AdaFE with reject region. (Blocks: hyperspectral data (training samples); feature extraction; classifier; samples of the reject region; misclassified samples; misclassification rate; weights for classifiers; weights for terms in scatter matrices; weighted vote.)
The algorithm of adaptive feature extraction with reject region (AdaFE_RR) is described in Table 3.2:
Table 3.2. The algorithm of adaptive feature extraction with reject region.
1. Input:
(1) The training data x_i^(l), i = 1, ..., N_l, l = 1, ..., L, and the test sample z.
(2) A classifier algorithm h(·), with output h_t(·) at round t.
(3) The reduced dimension, p.
2. Output: The label y of the test sample z by the ensemble.
A. Training Procedure:
(1) Initialize the weights: w_1(x_i^(l)) = 1/N, i = 1, ..., N_l, l = 1, ..., L.
(2) Set the stop parameter q. Do for t = 1, ... until |ε_t − ε_{t−1}| ≤ q:
- Build the classifier, including the linear transformation A_t ∈ R^{d×p}, by applying adaptive nonparametric weighted feature extraction, and create h_t(x^(i)) = h(A_t^T x^(i)).
- Estimate the reject region RG(r) = {x_(1), x_(2), ..., x_([rN])}, where r ∈ (0, 1], and the error ε_t for the training samples.
- Renew the weights, where Z_t is the normalization constant, and compute the classifier weight α_t = (1/2) log((1 − ε_t)/ε_t) at round t.
B. Classification Procedure:
y = argmax_{c = 1, ..., L} Σ_{t=1}^{T} α_t δ(h_t(z), c).
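The training loop of Table 3.2 can be sketched as follows. This is a hedged illustration, not the thesis implementation: a weighted nearest-mean classifier stands in for the NWFE-transformed classifier h_t, the distance to the predicted class mean stands in for the confidence used to rank samples for the reject region, and the AdaBoost-style up-weighting of misclassified and rejected samples is an assumption.

```python
import numpy as np

def nearest_mean_fit(X, y, w):
    # Weighted class means stand in for the classifier built on A_t^T x.
    classes = np.unique(y)
    means = np.stack([np.average(X[y == c], axis=0, weights=w[y == c])
                      for c in classes])
    return classes, means

def nearest_mean_predict(X, classes, means):
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(1)], d.min(1)

def adafe_rr(X, y, r=0.1, q=1e-3, max_rounds=20):
    N = len(y)
    w = np.full(N, 1.0 / N)                       # (1) initialize weights to 1/N
    ensemble, prev_eps = [], None
    for _ in range(max_rounds):
        classes, means = nearest_mean_fit(X, y, w)
        pred, dist = nearest_mean_predict(X, classes, means)
        eps = float(w[pred != y].sum())           # weighted training error
        if eps == 0.0:                            # perfect round: keep it and stop
            ensemble.append((1.0, classes, means))
            break
        if eps >= 0.5:
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        ensemble.append((alpha, classes, means))
        # Reject region RG(r): the ceil(r*N) least-confident samples.
        reject = np.argsort(dist)[-int(np.ceil(r * N)):]
        hard = (pred != y).copy()
        hard[reject] = True
        w = w * np.exp(alpha * hard)              # up-weight errors and rejects
        w = w / w.sum()                           # Z_t normalization
        if prev_eps is not None and abs(eps - prev_eps) <= q:
            break                                 # stop when the error stabilizes
        prev_eps = eps
    return ensemble

# Toy data: two well-separated 2-D classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(6.0, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
ens = adafe_rr(X, y)
```

Each ensemble member carries its vote weight α_t, so the final label can be produced by the weighted-vote rule of step B.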
3.2.3 An Adaptive Feature Extraction with Spatial Information
This processing includes two concepts for classifying hyperspectral images: (1) To avoid the Hughes phenomenon, feature extraction is important for hyperspectral image classification. Hence, the feature space at the next round is varied at every round so that it suits the samples misclassified at the current round: the weights of the terms of the scatter matrices corresponding to samples classified correctly at this round are decreased in the next round, while the weights of the misclassified samples are increased. (2) Many studies (Bruzzone & Persello, 2009; Kuo, Chuang, Huang, & Hung, 2009) show that classification performance can be improved by exploiting spatial information; hence, at every round, the spectral classifier and the spatial classifier are combined according to their classification performances at this round. The traditional hyperspectral image classification procedure is a special case of the proposed processing, since it is equivalent to performing the proposed method for one round without the spatial classifier. Figure 3.4 shows the flowchart of adaptive feature extraction with spatial information (AdaFE_SI). Note that any type of classifier and feature extraction method can be used in the proposed procedure.
Figure 3.4. The flowchart of AdaFE with spatial information. (Blocks: hyperspectral data (training samples); feature extraction; spectral classifier or spatial classifier; misclassified samples; misclassification rate; weights for classifiers; weights for terms in scatter matrices; weighted vote.)
The algorithm of adaptive feature extraction with spatial information (AdaFE_SI) is described in Table 3.3:
Table 3.3. The algorithm of adaptive feature extraction with spatial information.
1. Input:
(1) The training data x_i^(l), i = 1, ..., N_l, l = 1, ..., L. (2) The test sample z.
(3) The classifier methods, spectral h(·) and spatial h′(·), with outputs h_t(·) and h′_t(·) at round t.
(4) The reduced dimension, p.
2. Output: The label y of the test sample z by the ensemble.
A. Training Procedure:
(1) Initialize the weights: w_1(x_i^(l)) = 1/N, i = 1, ..., N_l, l = 1, ..., L.
(2) At each round t, build the classifier, including the linear transformation A_t ∈ R^{d×p}, by applying NWFE with S_b^{Ada_NW(t)} and S_w^{Ada_NW(t)}; renew the weights, where Z_t is the normalization constant, and compute the classifier weight α_t at round t.
B. Classification Procedure:
y = argmax_{c = 1, ..., L} Σ_{t=1}^{T} α_t (ρ_t δ(h_t(z), c) + (1 − ρ_t) δ(h′_t(z), c)).
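The combined decision rule of Table 3.3 mixes a spectral classifier h and a spatial classifier h′ at each round. A minimal sketch follows; the mixing weight is called `rho` here, and all inputs in the example are hypothetical.

```python
def combined_vote(spectral_preds, spatial_preds, alphas, rhos, labels):
    # y = argmax_c sum_t alpha_t * (rho_t * [h_t(z) = c]
    #                               + (1 - rho_t) * [h'_t(z) = c])
    score = {c: 0.0 for c in labels}
    for h, h_sp, a, rho in zip(spectral_preds, spatial_preds, alphas, rhos):
        score[h] += a * rho             # spectral classifier's share of the vote
        score[h_sp] += a * (1.0 - rho)  # spatial classifier's share
    return max(score, key=score.get)

# Two rounds: the spatial classifier consistently says "grass".
y = combined_vote(["water", "grass"], ["grass", "grass"],
                  alphas=[1.0, 0.8], rhos=[0.6, 0.5],
                  labels=["water", "grass"])
```

Setting every `rho` to 1 recovers the purely spectral weighted vote, which is why the spectral-only procedure is a special case.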
CHAPTER 4: DATASETS AND EXPERIMENTAL RESULTS
4.1 Hyperspectral Image Dataset and Experiments
4.1.1 Data Description
In this thesis, to investigate the influence of the training sample size relative to the dimensionality d, three distinct cases, N_i = 20 < d (case 1), N_i = 40 < d (case 2), and N_i = 300 > d (case 3), are discussed. Due to these sample size constraints, only some of the classes in the selected hyperspectral images are used in the experiments. MultiSpec (Landgrebe, 2003) was used to select the training and testing samples (100 testing samples per class), following the same method as in (Benediktsson, Palmason, & Sveinsson, 2005), (Sebastiano & Gabriele, 2007), and (Landgrebe, 2003). In each experiment, ten training datasets and one testing dataset are randomly selected for estimating the system parameters and computing the average testing accuracy of the different algorithms, respectively.
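The evaluation protocol just described (repeated random draws of training samples per class, with accuracy averaged over the repetitions) can be sketched as follows; the 1-NN classifier and the toy data are stand-ins for illustration, not the thesis classifiers or datasets.

```python
import numpy as np

def one_nn(X_train, y_train, X_test):
    # Minimal 1-nearest-neighbor classifier as a stand-in.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d.argmin(1)]

def repeated_accuracy(X, y, fit_predict, n_train=20, n_repeats=10, seed=0):
    # Average test accuracy over n_repeats random draws of n_train
    # samples per class; the remaining samples form the test set.
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        train_idx, test_idx = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            train_idx.extend(idx[:n_train])
            test_idx.extend(idx[n_train:])
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accs.append(float(np.mean(pred == y[test_idx])))
    return float(np.mean(accs))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (120, 2)), rng.normal(6.0, 1.0, (120, 2))])
y = np.array([0] * 120 + [1] * 120)
acc = repeated_accuracy(X, y, one_nn)
```

Averaging over repeated draws reduces the variance caused by any single lucky or unlucky choice of the small training set.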
The Washington, DC Mall hyperspectral image (Landgrebe, 2003) is used as an urban site. It is a Hyperspectral Digital Imagery Collection Experiment (HYDICE) airborne data flight line over the Washington, DC Mall. Two hundred and ten bands were collected in the 0.4–2.4 μm region of the visible and infrared spectrum; some water-absorption channels were discarded, resulting in 191 channels. The dataset is available on the student CD-ROM of (Landgrebe, 2003). There are seven information classes in the Washington, DC data: roofs, roads, paths, grass, trees, water, and shadows.
4.1.2 Experimental Designs and Results
One dataset (Washington DC Mall Image) is applied to compare the performances of the no-feature-extraction (NonFE), NWFE, AdaFE_CE, AdaFE_RR, and AdaFE_SI methods in our experiments. In AdaFE_CE and AdaFE_SI, the training error from 5-fold cross validation is used to renew the weights, and u is set to 5. In AdaFE_RR, the training error from 5-fold cross validation is also used to renew the weights, u is set to 5, and r is set to 0.1. We create the classifiers under each different feature space and finally combine them into a strong classifier by the weighted-vote method.
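The 5-fold cross-validation error used above to renew the weights can be estimated as in the sketch below; the simple 1-NN stand-in classifier and the toy data are assumptions for illustration.

```python
import numpy as np

def kfold_error(X, y, fit_predict, k=5, seed=0):
    # Mean misclassification rate over k cross-validation folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X[train], y[train], X[test])
        errs.append(float(np.mean(pred != y[test])))
    return float(np.mean(errs))

def one_nn(X_train, y_train, X_test):
    # Minimal 1-nearest-neighbor classifier as a stand-in.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d.argmin(1)]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
err = kfold_error(X, y, one_nn)
```

In the adaptive methods, this cross-validated error plays the role of ε_t when the sample weights and the classifier weight are renewed at each round.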
Table 4.1. Number of pixels for each category in the Washington DC Mall Image.
Category #(Pixels)
1 Roofs 3776
2 Roads 1982
3 Paths 737
4 Grass 2870
5 Trees 1430
6 Water 1156
7 Shadows 840
Total 12791
Figure 4.1. Washington DC Mall Image.
Five methods are compared: no feature extraction (NonFE), NWFE, AdaFE_CE, AdaFE_RR, and AdaFE_SI. Table 4.2 displays the classification accuracies of the testing data in cases 1, 2, and 3, respectively.
Note that the highest accuracy of each dataset (in each column) is highlighted in a shaded cell. The numbers in parentheses are the numbers of extracted features.
Table 4.2. The accuracy of Case 1, Case 2, and Case 3 (Washington, DC).

Methods   | Classifier | Case 1      | Case 2      | Case 3
NonFE     | Gaussian   | N/A         | N/A         | 92.3 %
NonFE     | 3NN        | 84.3 %      | 87.5 %      | 94.2 %
NWFE      | Gaussian   | 87.7 % (4)  | 92.0 % (4)  | 94.1 % (15)
NWFE      | 3NN        | 88.9 % (4)  | 91.2 % (7)  | 95.1 % (8)
AdaFE_CE  | Gaussian   | 89.2 % (11) | 92.3 % (9)  | 94.1 % (13)
AdaFE_CE  | 3NN        | 88.3 % (4)  | 91.1 % (5)  | 96.0 % (7)
AdaFE_RR  | Gaussian   | 90.3 % (7)  | 93.7 % (15) | 95.2 % (15)
AdaFE_RR  | 3NN        | 89.2 % (4)  | 92.1 % (4)  | 95.5 % (8)
AdaFE_SI  | Gaussian   | 93.1 % (8)  | 94.6 % (14) | 95.0 % (14)
AdaFE_SI  | 3NN        | 92.6 % (3)  | 94.3 % (7)  | 98.1 % (8)
The comparisons of mean accuracies between all algorithms using two different classifiers are displayed in Figures 4.2-4.7.
Figure 4.2. The accuracy of five methods with Gaussian classifier for hyperspectral image classification in Case 1 (NonFE: N/A, NWFE: 87.7 %, AdaFE_CE: 89.2 %, AdaFE_RR: 90.3 %, AdaFE_SI: 93.1 %).
Figure 4.3. The accuracy of five methods with kNN (k=3) classifier for hyperspectral image classification in Case 1 (NonFE: 84.3 %, NWFE: 88.9 %, AdaFE_CE: 88.3 %, AdaFE_RR: 89.2 %, AdaFE_SI: 92.6 %).
Figure 4.4. The accuracy of five methods with Gaussian classifier for hyperspectral image classification in Case 2 (NonFE: N/A, NWFE: 92.0 %, AdaFE_CE: 92.3 %, AdaFE_RR: 93.7 %, AdaFE_SI: 94.6 %).
Figure 4.5. The accuracy of five methods with kNN (k=3) classifier for hyperspectral image classification in Case 2 (NonFE: 87.5 %, NWFE: 91.2 %, AdaFE_CE: 91.1 %, AdaFE_RR: 92.1 %, AdaFE_SI: 94.3 %).
Figure 4.6. The accuracy of five methods with Gaussian classifier for hyperspectral image classification in Case 3 (NonFE: 92.3 %, NWFE: 94.1 %, AdaFE_CE: 94.1 %, AdaFE_RR: 95.2 %, AdaFE_SI: 95.0 %).
Figure 4.7. The accuracy of five methods with kNN (k=3) classifier for hyperspectral image classification in Case 3 (NonFE: 94.2 %, NWFE: 95.1 %, AdaFE_CE: 96.0 %, AdaFE_RR: 95.5 %, AdaFE_SI: 98.1 %).
From this table and these figures, we have the following findings.
1. In the Washington, DC Mall dataset, the highest accuracies among all methods are 93.1 % (AdaFE_SI with Gaussian classifier), 94.6 % (AdaFE_SI with Gaussian classifier), and 98.1 % (AdaFE_SI with 3NN classifier) in case 1, 2, and 3, respectively.
2. The best classification result is 98.1 %, which is obtained using AdaFE_SI with the kNN (k=3) classifier.
3. According to the accuracy rate from Table 4.2, AdaFE_RR and AdaFE_SI with two different classifiers are better than conventional methods.
We choose the well-known Washington, DC Mall image as the example and only some classified images are shown for comparison. The classification mechanisms under case 1, 2, and 3 (Ni =20, 40, and 300) and five methods, NonFE, NWFE, AdaFE_CE, AdaFE_RR, and AdaFE_SI, are selected to generate the classified images.
Figures 4.9-4.13 are the classification maps from the five feature extraction conditions (NonFE, NWFE, AdaFE_CE, AdaFE_RR, and AdaFE_SI) with Gaussian and kNN (k=3) classifiers. It can be observed from Figures 4.9-4.13 that our novel methods with the Gaussian or kNN (k=3) classifier outperform NonFE and NWFE with the Gaussian or kNN (k=3) classifier.
Figure 4.8. Washington DC Mall Image.
(Legend: grass, path, tree, water, roof, road, shadow; panels (a)-(d).)
Figure 4.9. (a) NonFE with Gaussian classifier in case 3.
Figure 4.10. (a)-(c) AdaFE_CE with Gaussian classifier in cases 1, 2, and 3; (d)-(f) AdaFE_RR with Gaussian classifier in cases 1, 2, and 3.
Figure 4.12. (a)-(c) NWFE with kNN (k=3) classifier in cases 1, 2, and 3; (d)-(f) AdaFE_CE with kNN (k=3) classifier in cases 1, 2, and 3.
4.2 Educational Measurement Dataset and Experiments
4.2.1 Data Description
According to the results of Chapter 4.1.2, the proposed adaptive feature extraction methods perform better than the other conventional methods. For evaluating students' learning profiles and arranging suitable remedial instruction, educational tests are usually administered. In this section, an educational measurement dataset is analyzed by the proposed methods and other conventional methods.
Performances of two algorithms listed in Chapters 3.2.1-3.2.2 are compared by using adaptive testing simulation processes with Mathematics paper-based test data.
The content of the test, designed for sixth-grade students, covers "Sector"-related concepts (see Appendix A). The experts' structures of the unit for this test, shown in Figure 4.14, were developed by seven elementary school teachers and three researchers. These structures differ from usual concept maps in that they emphasize the ordering of the nodes. Additionally, every node can be assessed by an item. There are 21 items in this test, and 828 subjects were collected in the "Sector" tests.
Figure 4.14. Experts' structure of the "Sector" unit. (Nodes: finding the areas of compound sectors; finding the areas of simple sectors; definition of sector; finding the areas of circles; drawing compound graphs; drawing sectors.)
According to this structure of the "Sector" test, the subjects can be divided into categories of remedial instruction. Table 4.3 shows the remedial concepts and the number of subjects of each category.
Table 4.3. Six categories of remedial instructions.
Category Subjects The concepts of remedial instruction
0 80 All concepts are known, they don’t need remedial instructions.
1 50 They are careless and need to practice more.
2 36 “Finding the areas of compound sectors” and “Finding the areas of simple sectors”.
3 47 “Finding the areas of compound sectors” and “Finding the areas of simple sectors”.
4 221 “Definition of sector”, “Finding the areas of compound sectors”, and “Finding the areas of simple sectors”.
5 53 "Drawing sectors", "Finding the areas of simple sectors", "Drawing compound graphs", and "Definition of sector".
6 30 “Finding the areas of compound sectors” and “Drawing sectors”.
7 25 “Finding the areas of compound sectors”, “Drawing sectors”, and “Finding the areas of simple sectors”.
8 286 They need to learn all concepts of remedial instruction.
Total 828
For evaluating the performance of the proposed methods, twenty subjects in each category are randomly selected to form the training datasets, and the others form the testing datasets. Table 4.4 shows the training subjects and testing subjects in the experiment. Ten training and testing datasets are randomly selected for estimating the system parameters and computing the mean accuracies of the testing data of the different algorithms, respectively.
Table 4.4. The number of training subjects and testing subjects of the experiment.
Category          | 1  | 2  | 3  | 4   | 5  | 6  | 7  | 8
Training Subjects | 20 | 20 | 20 | 20  | 20 | 20 | 20 | 20
Testing Subjects  | 30 | 16 | 27 | 201 | 33 | 10 | 5  | 266
Total             | 50 | 36 | 47 | 221 | 53 | 30 | 25 | 748
4.2.2 Experimental Designs and Results
One dataset (Educational Measurement) is applied to compare the performances of the no-feature-extraction (NonFE), NWFE, AdaFE_CE, and AdaFE_RR methods in our experiments. In AdaFE_CE, the training error from 5-fold cross validation is used to renew the weights, and u is set to 5. In AdaFE_RR, the training error from 5-fold cross validation is also used to renew the weights, u is set to 5, and r is set to 0.1. We create the classifiers under each different feature space and finally combine them into a strong classifier by the weighted-vote method.
Mean accuracies of the two algorithms listed in Chapters 3.2.1-3.2.2 and the other conventional methods for the educational testing experiment are shown in Table 4.5. Note that the shaded cell indicates the best accuracy of all combinations, and the best accuracy of each applied classifier among all algorithms is written in bold type. The comparisons of mean accuracies between all algorithms using two different classifiers are displayed in Figures 4.15-4.16.
The following are some findings based on these results.
1. The best accuracies with kNN (k=3) and Gaussian are 70.2 % (AdaFE_RR) and 52.1 % (AdaFE_RR), respectively.
2. Applying the kNN (k=3) and Gaussian classifiers within AdaFE_CE or AdaFE_RR can improve the classification performance effectively.
3. The best classification result is 70.2 %, which is obtained using AdaFE_RR with kNN (k=3).
4. The single classifier (Gaussian or kNN) does not work well on this dataset. It can be observed from Table 4.5 and Figures 4.15-4.16 that performing feature extraction before the classifier (Gaussian or kNN) outperforms NonFE with the same classifier.
Table 4.5. The accuracy of four methods with two classifiers in educational testing experiment.
Ni = 20 (Educational Measurement Data)

Methods  | Classifier | Accuracy
NonFE    | Gaussian   | 16.3 %
NonFE    | kNN (k=3)  | 35.4 %
NWFE     | Gaussian   | 48.6 % (2)
NWFE     | kNN (k=3)  | 58.6 % (4)
AdaFE_CE | Gaussian   | 51.7 % (2)
AdaFE_CE | kNN (k=3)  | 62.6 % (5)
AdaFE_RR | Gaussian   | 52.1 % (3)
AdaFE_RR | kNN (k=3)  | 70.2 % (4)
Figure 4.15. The accuracy of four methods with Gaussian classifier in the educational testing experiment (NonFE: 16.3 %, NWFE: 48.6 %, AdaFE_CE: 51.7 %, AdaFE_RR: 52.1 %).
Figure 4.16. The accuracy of four methods with kNN (k=3) classifier in the educational testing experiment (NonFE: 35.4 %, NWFE: 58.6 %, AdaFE_CE: 62.6 %, AdaFE_RR: 70.2 %).
4.3 UCI Datasets and Experiments
4.3.1 Data Description
In this section, we select a few well-known datasets from the University of California Irvine (UCI) Machine Learning Repository (Blake & Merz, 1998) for experimental analysis. The datasets are described in Table 4.6:
Table 4.6. The datasets of UCI Machine Learning Repository.
Datasets | Number of Categories | Number of Instances | Number of Attributes
Glass    | 6                    | 214                 | 9
Heart    | 2                    | 267                 | 44
Yeast    | 4                    | 1299                | 8
4.3.2 Experimental Designs and Results
Three datasets (Glass, Heart, and Yeast) are applied to compare the performances of the no-feature-extraction (NonFE), NWFE, AdaFE_CE, and AdaFE_RR methods in our experiments. In AdaFE_CE, the training error from 5-fold cross validation is used to renew the weights, and u is set to 5. In AdaFE_RR, the training error from 5-fold cross validation is also used to renew the weights, u is set to 5, and r is set to 0.1. We create the classifiers under each different feature space and finally combine them into a strong classifier by the weighted-vote method.
Mean accuracies of the two algorithms listed in Chapters 3.2.1-3.2.2 and the other conventional methods for the UCI datasets experiment are shown in Table 4.7. Note that the shaded cell indicates the best accuracy of all combinations, and the best accuracy of each applied classifier among all algorithms is written in bold type. The comparisons of mean accuracies between all algorithms using two different classifiers are displayed in Figures 4.17-4.22.
The following are some findings based on these results.
1. In the Glass, Heart, and Yeast datasets, the highest accuracies among all methods are 77.6 % (AdaFE_RR with 3NN), 82.4 % (AdaFE_RR with 3NN), and 61.4 % (AdaFE_RR with 3NN), respectively.
2. Applying kNN (k=3) and Gaussian into AdaFE_CE or AdaFE_RR can improve the classification performance effectively.
Table 4.7. The accuracy of four methods with two classifiers in UCI datasets experiment.
Methods  | Classifier | Glass      | Heart       | Yeast
NonFE    | Gaussian   | 21.7 %     | 74.9 %      | 27.0 %
NonFE    | kNN (k=3)  | 75.3 %     | 80.1 %      | 56.2 %
NWFE     | Gaussian   | 56.2 % (6) | 78.3 % (4)  | 58.4 % (8)
NWFE     | kNN (k=3)  | 72.0 % (6) | 79.4 % (12) | 56.5 % (2)
AdaFE_CE | Gaussian   | 60.4 % (7) | 79.4 % (2)  | 60.9 % (4)
AdaFE_CE | kNN (k=3)  | 74.3 % (8) | 82.0 % (15) | 60.3 % (3)
AdaFE_RR | Gaussian   | 60.8 % (6) | 79.7 % (3)  | 60.5 % (7)
AdaFE_RR | kNN (k=3)  | 77.6 % (8) | 82.4 % (4)  | 61.4 % (7)
Figure 4.17. The accuracy of four methods with Gaussian classifier in the Glass dataset experiment (NonFE: 21.7 %, NWFE: 56.2 %, AdaFE_CE: 60.4 %, AdaFE_RR: 60.8 %).
Figure 4.18. The accuracy of four methods with kNN (k=3) classifier in the Glass dataset experiment (NonFE: 75.3 %, NWFE: 72.0 %, AdaFE_CE: 74.3 %, AdaFE_RR: 77.6 %).
Figure 4.19. The accuracy of four methods with Gaussian classifier in the Heart dataset experiment (NonFE: 74.9 %, NWFE: 78.3 %, AdaFE_CE: 79.4 %, AdaFE_RR: 79.7 %).
Figure 4.20. The accuracy of four methods with kNN (k=3) classifier in the Heart dataset experiment (NonFE: 80.1 %, NWFE: 79.4 %, AdaFE_CE: 82.0 %, AdaFE_RR: 82.4 %).
Figure 4.21. The accuracy of four methods with Gaussian classifier in the Yeast dataset experiment (NonFE: 27.0 %, NWFE: 58.4 %, AdaFE_CE: 60.9 %, AdaFE_RR: 60.5 %).
Figure 4.22. The accuracy of four methods with kNN (k=3) classifier in the Yeast dataset experiment (NonFE: 56.2 %, NWFE: 56.5 %, AdaFE_CE: 60.3 %, AdaFE_RR: 61.4 %).
CHAPTER 5: CONCLUSION AND FUTURE WORK
5.1 Conclusion
In Chapter 2, nonparametric weighted feature extraction (NWFE), adaptive boosting, the reject region, the neighborhood system, and the classifiers were introduced, which are the background for Chapters 3, 4, and 5.
In Chapter 3, a novel method named adaptive feature extraction for constructing various classifiers is proposed for mitigating the Hughes effect (small training sample problem) and improving high dimensional data classification performances.
In Chapter 4, experimental results on the hyperspectral image dataset show that the proposed algorithms can improve on the Gaussian and kNN (k=3) classifiers, especially for small ensemble sizes. The best classification accuracy is obtained by using AdaFE_SI with the Gaussian or kNN (k=3) classifier. Experimental results on the educational testing and UCI datasets show that the proposed method is also useful in these applications.
Based on the above summary, the adaptive feature extraction method is a robust classification process for high dimensional data.
In this thesis, AdaFE_CE is proposed, and the traditional NWFE is a special case of AdaFE_CE when only one round is performed; if the ratio r, which determines the number of rejected samples among the training samples, is set to 1, then NWFE is also a special case of AdaFE_RR. Our novel methods integrate the advantages of NWFE, AdaBoost, the reject region, and the neighborhood system to achieve better efficacy. The experimental results show that only a few rounds of the proposed algorithms are needed to obtain higher accuracy and achieve the best result, regardless of which classifier is applied.
5.2 Suggestions for Future Work
1. Integrate feature selection into the multiple classifier system to further improve the classification results.
2. Add other feature extraction methods and classifiers to enlarge the family of adaptive feature extraction methods.
3. Use other high dimensional data with very large dimensionality, such as face recognition data, to test the completeness and suitability of the adaptive feature extraction methods.
APPENDIX A: THE TEST OF “SECTOR” UNIT
REFERENCES
Bellman, R. E. (1961). Adaptive Control Processes. Princeton, NJ: Princeton Univ. Press.
Benediktsson, J. A., Palmason, J. A., & Sveinsson, J. R. (2005). Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. Geosci. Remote Sens., 43(3), 480–491.
Blake, C. L. & Merz, C. J. (1998). UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html
Bruzzone, L. & Persello, C. (2009). A Novel Context-Sensitive Semisupervised SVM Classifier Robust to Mislabeled Training Samples. IEEE Trans. Geosci. Remote Sens., 47(7), 2142-2154.
Chow, C. K. (1970). On Optimum Recognition Error and Reject Trade-Off. IEEE Trans. Information Theory, 16, 41-46.
Cossu, R., Chaudhuri, S., & Bruzzone, L. (2005). A Context-Sensitive Bayesian Technique for the Partially Supervised Classification of Multitemporal Images. IEEE Geoscience and Remote Sensing Letters, 2(3), 352-356.
Duda, R. O. & Hart, P. E. (1973). Pattern Classification and Scene Analysis. NY: John Wiley & Sons.
Freund, Y. & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
Friedman, J. H. (1989). Regularized Discriminant Analysis. Journal of the American Statistical Association, 84, 165-175.
Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. (2nd ed.). San Diego, CA: Academic.
Giusti, N., Masulli, F., & Sperduti, A. (2002). Theoretical and Experimental Analysis of a Two-Stage System for Classification. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-24(7):893-904.
Hammersley, J. M., & Clifford, P. (1971). Markov fields on finite graphs and lattices. Unpublished manuscript, Oxford University.
Hsieh, P. F., Wang, D. S., & Hsu, C. W. (2006). A linear feature extraction for multiclass classification problems based on class mean and covariance discriminant information. IEEE Trans. Pattern Anal. Mach. Intell., 28(2), 223–235.
Hughes, G. F. (1968). On the Mean Accuracy of Statistical Pattern Recognition. IEEE Transactions on Information Theory, 14(1), 55-63.
Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 4-37.
Jia, X. & Richards, J. A. (2008). Managing the Spectral-Spatial Mix in Context Classification using Markov Random Fields. IEEE Geoscience and Remote Sensing Letters, 5(2), 311-314.
Kinderman, R., & Snell, J. L. (1980). Markov random fields and their applications. Amer. Math. Soc., vol. 1, pp. 1-142.
Ko, L.-W., Kuo, B.-C., Lin, C.-T., & Liu, D.-J. (2005). A Two-stage Classification System with Nearest Neighbor Reject Option for Hyperspectral Image Data. The 21st IPPR Conference on Computer Vision, Graphics, and Image Processing (CVGIP).
Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms.
Hoboken, NJ: Wiley & Sons.
Kuo, B.-C. & Landgrebe, D. A. (2004). Nonparametric Weighted Feature Extraction for Classification. IEEE Transaction on Geoscience and Remote Sensing, 42(5), 1096-1105.
Kuo, B.-C., Ko, L.-W., Pai, C.-H., & Yang, J.-M. (2003). Combining Feature