Cluster Based Dynamic Subspace Method

CHAPTER 3: CLUSTER BASED DYNAMIC SUBSPACE METHOD

3.3 Cluster Based Dynamic Subspace Method

In this chapter, a novel multiple classifier system named the cluster based dynamic subspace method (CDSM) is proposed, and it is shown how CDSM overcomes the drawbacks of RSM and WRSM. The framework of CDSM is displayed in Figure 3.3: all feature vectors are embedded into prototype space (Mojaradi, Emami, Varshosaz, & Jamali, 2008) to explore the similarity among them, and two new distributions are imposed on the feature selection process. One is the importance distribution of feature weight, M, which models the probability that each feature is selected as a component dimension of the subspace. The selection probabilities of the features are assumed to differ and are modeled by membership degrees obtained from clustering algorithms. The other is the importance distribution of subspace dimensionality, C, whose function is to indicate how many dimensions are suitable for achieving better classification. The subspace dimensionality is neither predefined nor fixed; it is drawn from the C distribution in each overproduction. Additionally, we propose an automatic update algorithm for the C distribution through the iteration process.

Figure 3.3. The framework of CDSM. [Diagram blocks: Clustering; Subspace dimensionality, c]

The algorithm of CDSM is shown in Table 3.3.

Table 3.3. The algorithm of CDSM

Input:

The training dataset X

The testing sample Y

A learning algorithm (classifier), Ψ

The number of classifiers for initializing the C distribution, b

The number of classifiers in the ensemble, B

Output:

Final hypothesis F: Y → {1, 2, …, L} obtained by the ensemble classifiers.

A. Training Procedure

B. Classification Procedure

The following sections introduce prototype space, the M distribution, the C distribution, and the steps of the proposed ensemble technique, in that order.

3.3.1 Prototype space

Each kind of feature has different attributes for characterizing objects that come from different classes. To avoid selecting similar features that carry redundant information, all features are first embedded into prototype space (Mojaradi et al., 2008) and grouped into clusters by a clustering algorithm; each component feature of the subspace is then selected from a different cluster. Prototype space is an L-dimensional space for feature representation. Let $E = [x_{ij}]$ be the mean spectra for the L classes, where $x_{ij}$ represents the mean value of class $i$ at the $j$th feature, and let $\mu_j = (x_{1j}, x_{2j}, \ldots, x_{Lj})^T$ be the L-dimensional vector that describes the L classes with the $j$th feature. All vectors $\mu_1, \mu_2, \ldots, \mu_n$ scatter in the L-dimensional space called prototype space, as in Figure 3.4 (b), to explore the similarity between all features. From a physical perspective, if features have similar spectral responses for the different classes, these highly correlated features should be close to each other in prototype space and constitute a cluster.
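As an illustration only, the embedding into prototype space can be sketched in a few lines of NumPy. The function name `prototype_vectors` and the toy data are assumptions made for this sketch and are not part of the thesis; the only idea taken from the text is that each feature is represented by the vector of its class means.

```python
import numpy as np

def prototype_vectors(X, y):
    """Return one prototype vector per feature, each of length L (number of classes)."""
    classes = np.unique(y)
    # E has shape (L, n): row i is the mean spectrum of class i.
    E = np.vstack([X[y == k].mean(axis=0) for k in classes])
    # Column j of E is mu_j, so the transpose holds one row per feature.
    return E.T

# Toy data (made up): 6 samples, n = 4 features, L = 3 classes.
X = np.array([[1.0, 1.1, 5.0, 0.0],
              [1.2, 1.3, 5.2, 0.2],
              [3.0, 3.1, 1.0, 0.1],
              [3.2, 3.3, 1.2, 0.3],
              [2.0, 2.1, 9.0, 0.2],
              [2.2, 2.3, 9.2, 0.0]])
y = np.array([0, 0, 1, 1, 2, 2])

mu = prototype_vectors(X, y)  # shape (n, L) = (4, 3)

# Features 0 and 1 respond almost identically across the classes,
# so their prototype vectors lie close together.
d01 = np.linalg.norm(mu[0] - mu[1])
d02 = np.linalg.norm(mu[0] - mu[2])
```

In this toy example the first two features are nearly redundant, so a clustering algorithm applied to `mu` would be expected to place them in the same cluster.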

3.3.2 Importance distribution of feature weight

The membership degrees of all features for each cluster are utilized to model the importance distribution of feature weight. In each cluster, a feature that is close to the cluster center is identified as representative of the cluster and is given a large probability of being selected. In this thesis, we propose two importance distributions of feature weight, $M_{km}$ and $M_{fcm}$, to impose on the feature selection procedure.

Assume that all features can be divided into $c$ groups. Two well-known clustering algorithms are applied to $\mu_1, \mu_2, \ldots, \mu_n$ to obtain the membership degree of each feature in each group. The first is the k-means algorithm (km), a hard partition method that assigns each point to the single cluster whose centroid is nearest. The probability mass function (pmf) for cluster $k$, $M_{km}(k)$, is defined from these hard memberships. The other is the fuzzy c-means algorithm (fcm), in which each feature has a degree of belonging to every cluster rather than belonging completely to just one cluster. The pmf $M_{fcm}(k)$ is defined from these fuzzy membership degrees.
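The following sketch shows one plausible way to turn membership degrees into per-cluster pmfs and to draw one feature per cluster. It is not the thesis's definition of $M_{km}$ or $M_{fcm}$: the normalization in `membership_to_pmf` (each cluster's memberships scaled to sum to one) is an assumption, the cluster centers are taken as given instead of being fitted, and all function names are hypothetical. The fuzzy membership formula is the standard fuzzy c-means expression with fuzzifier `m`.

```python
import numpy as np

def distances(points, centers):
    """Euclidean distances, shape (c, n): row k holds distances to center k."""
    return np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=2)

def hard_membership(points, centers):
    """k-means style membership: 1 for the nearest center, 0 elsewhere."""
    d = distances(points, centers)
    U = np.zeros_like(d)
    U[d.argmin(axis=0), np.arange(points.shape[0])] = 1.0
    return U

def fuzzy_membership(points, centers, m=2.0):
    """Standard fuzzy c-means membership degrees for fixed centers."""
    d = np.maximum(distances(points, centers), 1e-12)  # guard against d = 0
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)  # each column sums to 1

def membership_to_pmf(U):
    """Assumed normalization: scale each cluster's row to sum to 1.
    Requires every cluster to have at least one member."""
    return U / U.sum(axis=1, keepdims=True)

def sample_subspace(pmf, rng):
    """Draw one feature index per cluster according to that cluster's pmf.
    With fuzzy pmfs the same feature may be drawn for two clusters."""
    return [int(rng.choice(pmf.shape[1], p=row)) for row in pmf]

# Toy prototype vectors for n = 4 features and c = 2 fixed centers (made up).
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
centers = np.array([[0.0, 0.5], [5.0, 5.5]])

M_km = membership_to_pmf(hard_membership(points, centers))
M_fcm = membership_to_pmf(fuzzy_membership(points, centers))

rng = np.random.default_rng(0)
subspace = sample_subspace(M_km, rng)  # one feature index per cluster
```

Under the hard partition each cluster's pmf is uniform over its members, whereas the fuzzy version gives every feature a nonzero probability in every cluster, weighted toward features near the cluster center.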

The features carrying large membership degrees have large probabilities of being selected, which is intuitive because these features are highly representative of their clusters. The algorithm for estimating the distribution of feature weights is summarized in the following:

Next, we propose another distribution that determines the subspace dimensionality automatically instead of predefining a fixed number.

3.3.3 Importance distribution of subspace dimensionality

Let the subspace dimensionality, $c$, be an outcome of the distribution of subspace dimensionalities, $C$, with probability function $f(c)$, where $1 \le c \le n$. The function $f(c)$ is estimated from the re-substitution accuracy of the classifier trained in the $c$-dimensional space and is smoothed by the kernel smoothing technique (Parzen, 1962; Silverman, 1985). First, the initial distribution $C_0$ has to be set up; then an automatic updating algorithm for the $C$ distribution is applied in the process of constructing the ensemble classifiers. The initialization steps are summarized below, and the updating mechanism is introduced in the next section.

Let $b$ be the number of classifiers for initializing the $C_0$ distribution, with index $i = 1, 2, \ldots, b$.

Step 1. Construct the $i$th classifier $h^i$ based on the $i$th subspace, whose dimensionality is set to a prescribed value $c^i$.

Step 3. The feature selection process selects $c^i$ features based on the $M$ distributions of the clusters, respectively, and obtains a $c^i$-dimensional dataset $\tilde{X}^i$ through the feature selection algorithm.

Step 4. Train the $i$th classifier by $h^i = \Psi(\tilde{X}^i)$, where $\Psi$ is the learning algorithm.

Step 5. Estimate the re-substitution accuracy $\varphi(h^i)$ of the $i$th classifier.

Step 6. Obtain the $C_0$ distribution by using a kernel smoothing technique (KS), which estimates the density $f_0$ from the dimensionalities $c^i$ and the accuracies $\varphi(h^i)$, $i = 1, 2, \ldots, b$.
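To make Step 6 concrete, here is a minimal sketch of accuracy-weighted kernel smoothing over the dimensionalities $1, \ldots, n$. The Gaussian kernel, the bandwidth value, the normalization to a discrete pmf, and the name `smooth_dimensionality_pmf` are all assumptions of this sketch; the thesis's own estimator for $f_0$ is not reproduced here. The dimensionalities and accuracies in the example are made-up numbers.

```python
import numpy as np

def smooth_dimensionality_pmf(dims, accs, n, bandwidth=2.0):
    """Accuracy-weighted Gaussian kernel smoothing over c = 1..n (assumed form)."""
    grid = np.arange(1, n + 1)
    dims = np.asarray(dims, dtype=float)
    accs = np.asarray(accs, dtype=float)
    # One Gaussian bump per trained classifier, centered at its dimensionality.
    z = (grid[:, None] - dims[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2)
    f = kernel @ accs          # weight each bump by the classifier's accuracy
    return f / f.sum()         # normalize to a pmf over 1..n

# Hypothetical initialization with b = 4 classifiers on n = 10 features.
init_dims = [1, 4, 7, 10]
init_accs = [0.6, 0.9, 0.8, 0.5]   # made-up re-substitution accuracies
f0 = smooth_dimensionality_pmf(init_dims, init_accs, n=10)
```

With these numbers the probability mass concentrates around the middle dimensionalities, where the two most accurate classifiers sit, and falls off toward both ends.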

3.3.4 Dynamic subspace ensemble

After the $C$ distribution is initialized, the ensemble classifiers are constructed by the following steps.

Let $B$ be the number of classifiers in the ensemble, with index $j = 1, 2, \ldots, B$.

Step 1. Draw a new subspace dimensionality $c^{b+j}$ from the $C_{j-1}$ distribution, and then follow Steps 2 to 4 of the previous section to train classifier $h^{b+j}$.

Step 2. Estimate the re-substitution accuracy $\varphi(h^{b+j})$ and use it as feedback to obtain an updated $C_j$ distribution, whose probability density is re-estimated accordingly.

Step 3. Return to Step 1 until $B$ classifiers have been trained.

Step 4. Finally, combine the $B$ classifiers by simple majority voting in the final decision rule.

These $B$ classifiers are constructed in subspaces of different dimensionalities because the $C$ distribution changes during the training process; an example of the updating process is shown in Figures 3.6 (b)-(l). This is why the proposed method is called the “dynamic” subspace method.
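The draw, train, update loop and the final vote can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's algorithm: the density is re-estimated by pooling all classifiers trained so far with an accuracy-weighted Gaussian kernel, `train_and_score` is a hypothetical callback standing in for Steps 2 to 4 of Section 3.3.3 plus the accuracy estimate, and `toy_train_and_score` returns made-up accuracies purely to make the example runnable.

```python
import numpy as np

def smooth_pmf(dims, accs, n, bandwidth=2.0):
    """Accuracy-weighted Gaussian kernel smoothing over c = 1..n (assumed form)."""
    grid = np.arange(1, n + 1)
    z = (grid[:, None] - np.asarray(dims, float)[None, :]) / bandwidth
    f = np.exp(-0.5 * z ** 2) @ np.asarray(accs, float)
    return f / f.sum()

def dynamic_ensemble(train_and_score, init_dims, init_accs, n, B, rng):
    """Train B classifiers, re-estimating the C distribution after each one."""
    dims, accs, models = list(init_dims), list(init_accs), []
    for _ in range(B):
        f = smooth_pmf(dims, accs, n)                   # current C_{j-1}
        c = int(rng.choice(np.arange(1, n + 1), p=f))   # Step 1: draw c
        model, acc = train_and_score(c)                 # hypothetical callback
        models.append(model)
        dims.append(c)                                  # Step 2: feedback
        accs.append(acc)
    return models, dims, accs

def majority_vote(predictions):
    """predictions: (B, num_samples) non-negative integer labels.
    Ties go to the smallest label."""
    predictions = np.asarray(predictions)
    n_labels = int(predictions.max()) + 1
    counts = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_labels)
    return counts.argmax(axis=0)

def toy_train_and_score(c):
    # Made-up stand-in: a dummy "model" and an accuracy that peaks at c = 5.
    return {"dim": c}, 1.0 / (1.0 + abs(c - 5))

rng = np.random.default_rng(0)
init_dims = [1, 4, 7, 10]
init_accs = [toy_train_and_score(c)[1] for c in init_dims]
models, dims, accs = dynamic_ensemble(toy_train_and_score, init_dims, init_accs,
                                      n=10, B=20, rng=rng)

votes = majority_vote([[0, 1, 1],
                       [0, 1, 2],
                       [1, 1, 2]])
```

Because each new accuracy is fed back before the next draw, dimensionalities that yield accurate classifiers gain probability mass over the iterations, which is the behavior the term "dynamic" refers to.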
