2.2 Reducing Cost-Sensitive Multi-Label Classification into Cost-Sensitive Classification

Reduction is a commonly used machine learning technique when a problem cannot be solved directly by standard learning algorithms. In traditional multi-label classification research, the family of methods known as problem transformation falls into this category: these methods transform the multi-label classification problem into one or more single-label classification tasks.

The binary relevance (BR) method is one of the most popular problem transformation approaches. It trains one binary classifier for each label independently: for each label, the instances with the label are treated as positive examples and those without it as negative examples for training the corresponding binary classifier. This approach inevitably discards the co-occurrence information among labels, which may be useful. Label correlation is valuable for multi-label classification because some labels often co-occur. For example, in music tag annotation, a song tagged “hip hop” is more likely to also be tagged “rap” than “jazz”, while a song tagged “dance” is more likely to also be tagged “electronic” than “guitar”.
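As a minimal sketch of the BR transformation described above (data preparation only; the binary base learner is omitted), assuming the labels are given as an N × K 0/1 matrix and using the ±1 target convention. The function name `binary_relevance_targets` is hypothetical.

```python
def binary_relevance_targets(Y):
    """Split an N x K 0/1 label matrix into K independent binary target lists.

    Target list k trains the k-th binary classifier: instances that carry
    label k become positives (+1) and all other instances negatives (-1).
    """
    num_labels = len(Y[0])
    return [[1 if row[k] == 1 else -1 for row in Y] for k in range(num_labels)]


# Toy tags in the column order (hip hop, rap, jazz)
Y = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
targets = binary_relevance_targets(Y)
print(targets[0])  # [1, -1, 1]  -> training targets for the "hip hop" classifier
```

Each target list is consumed by its own classifier, so nothing in the "hip hop" targets tells that learner the first instance is also tagged "rap"; this is the co-occurrence information BR discards.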

The label powerset (LP) method [55] is another problem transformation approach. It treats each distinct combination of labels in the training set as a different class and thus casts multi-label classification as a multi-class classification problem. Given a test instance, the multi-class LP classifier predicts the most probable class, which is then transformed back into a set of labels. Table 2.1 shows an example of a multi-label dataset with transformed multi-class labels based on the concept of LP. However, a major concern with this model is that, as the number of labels increases, the number of potential classes grows exponentially (up to 2^K for K labels), so each class is associated with very few training instances. Moreover, LP can only predict labelsets observed in the training data.

Table 2.1: An Example of Multi-Label Dataset with Transformed Multi-Class Labels

Instance   Label Set              Transformed Class
1          Rock, Guitar           1
2          Rock, Guitar, Drum     2
3          Rock, Guitar, Vocal    3
4          Country, Guitar        4
5          Rock, Guitar, Drum     2
6          R&B, Vocal             5
7          Country, Guitar        4
8          Vocal                  6

In [56], a method called Random k-Labelsets (RAkEL) is proposed to overcome the drawbacks of the traditional LP method. RAkEL randomly selects a number of label subsets from the original set of labels and uses the LP method to train a multi-class classifier on each subset. The final prediction of RAkEL is made by voting among the LP classifiers in the ensemble.

This approach not only reduces the number of classes but also gives each class more training instances. Experimental results have shown that RAkEL improves on LP.
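The LP transformation behind Table 2.1 can be sketched as follows, assuming class ids are assigned in order of first appearance (which reproduces the numbering in the table). The function name `label_powerset` is hypothetical. The decoding step at the end shows why LP can only predict labelsets that occur in the training data.

```python
def label_powerset(label_sets):
    """Assign class ids 1, 2, ... to distinct labelsets in order of first appearance."""
    class_of = {}  # frozenset of labels -> class id
    classes = []
    for labels in label_sets:
        key = frozenset(labels)  # the order of tags within a labelset is irrelevant
        if key not in class_of:
            class_of[key] = len(class_of) + 1
        classes.append(class_of[key])
    return classes, class_of


# The eight instances of Table 2.1
table_2_1 = [
    {"Rock", "Guitar"},
    {"Rock", "Guitar", "Drum"},
    {"Rock", "Guitar", "Vocal"},
    {"Country", "Guitar"},
    {"Rock", "Guitar", "Drum"},
    {"R&B", "Vocal"},
    {"Country", "Guitar"},
    {"Vocal"},
]
classes, class_of = label_powerset(table_2_1)
print(classes)  # [1, 2, 3, 4, 2, 5, 4, 6]

# Decoding: a predicted class id can only map back to a labelset seen in training
labelset_of = {cid: set(key) for key, cid in class_of.items()}
print(sorted(labelset_of[2]))  # ['Drum', 'Guitar', 'Rock']
```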

Inspired by the reduction methods for multi-label classification, we propose two general strategies for reducing the CSML problem to cost-sensitive single-label classification problems: a binary relevance based strategy and a label powerset based strategy. We describe these two methods in the following two subsections.

2.2.1 Cost-Sensitive Stacking

In this subsection, we propose a two-stage method called cost-sensitive stacking. Stacking [63] is a method of combining the outputs of multiple independent classifiers for multi-label classification. In the first stage of cost-sensitive stacking, we assume that the K labels are independent and train K cost-sensitive binary classifiers independently. Then, we use the outputs of all binary classifiers, f₁(x), f₂(x), ..., f_K(x), as features to form a new feature set. Let the new feature vector be z = (z₁, z₂, ..., z_K).

We can use the new feature set together with the true labels to learn the weights w_kj of the stacking classifiers:

$$h_k(z) = \sum_{j=1}^{K} w_{kj} z_j, \qquad (2.9)$$

where the weight w_kj is expected to be positive if label j is positively correlated with label k, and negative otherwise. The stacking classifiers can recover misclassified labels by exploiting the correlation information captured in the weights w_kj.
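A minimal two-stage sketch of cost-sensitive stacking follows. The text does not fix the base learner or how the weights w_kj are estimated, so this sketch assumes a cost-weighted ridge regression (per-instance weights taken from a hypothetical N × K cost matrix C) for both stages, real-valued stage-1 scores as the features z, and sign thresholding at prediction time. All function names are hypothetical, and this is not the cost-sensitive classifier used in the thesis.

```python
import numpy as np


def weighted_least_squares(A, y, weights, lam=1e-3):
    """Cost-weighted ridge: minimize sum_i weights_i (a_i . v - y_i)^2 + lam |v|^2.

    Hypothetical stand-in for a cost-sensitive binary classifier.
    """
    Aw = A * weights[:, None]
    gram = A.T @ Aw + lam * np.eye(A.shape[1])
    return np.linalg.solve(gram, Aw.T @ y)


def fit_cs_stacking(X, Y, C):
    """X: N x d features, Y: N x K labels in {-1, +1}, C: N x K per-label costs."""
    N, K = Y.shape
    X1 = np.hstack([X, np.ones((N, 1))])  # append a bias feature
    # Stage 1: K cost-weighted scorers f_k, trained independently of each other
    V = np.column_stack([weighted_least_squares(X1, Y[:, k], C[:, k]) for k in range(K)])
    Z = X1 @ V  # row i is z = (f_1(x_i), ..., f_K(x_i))
    # Stage 2: row k of W holds the weights w_kj of h_k(z) = sum_j w_kj z_j
    W = np.vstack([weighted_least_squares(Z, Y[:, k], C[:, k]) for k in range(K)])
    return V, W


def predict_cs_stacking(X, V, W):
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    H = (X1 @ V) @ W.T  # H[i, k] = h_k(z_i)
    return np.where(H >= 0, 1, -1)


# Toy data: labels 0 and 1 always co-occur (like "hip hop" and "rap")
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y0 = np.where(X[:, 0] >= 0, 1, -1)
Y = np.column_stack([y0, y0, np.where(X[:, 1] >= 0, 1, -1)])
C = np.ones(Y.shape)  # uniform costs for the example
V, W = fit_cs_stacking(X, Y, C)
P = predict_cs_stacking(X, V, W)
print("training accuracy per label:", (P == Y).mean(axis=0))
```

Stacking is often fit on out-of-fold (cross-validated) stage-1 outputs so that the second stage does not learn from over-optimistic in-sample scores; the sketch reuses in-sample outputs for brevity.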

2.2.2 Cost-Sensitive RAkEL

As mentioned at the beginning of Section 2.2, a method called Random k-Labelsets [58] was proposed to realize and improve the LP method. A k-labelset is a labelset R ⊆ L with |R| = k. RAkEL randomly selects a number of k-labelsets from L and uses the LP method to train the corresponding multi-class classifiers. Algorithms 1 and 2 describe the training and classification processes of RAkEL, respectively.

The prediction of a multi-class LP classifier g_m for a sample x is denoted by g_m(x) ∈ {1, 2, ..., V}. Note that V will be much smaller than 2^k if the data are sparse. In be 1, 1, −1, and −1, respectively.
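The classification step (voting over the LP classifiers, as described for RAkEL above) can be sketched as follows. It assumes each predicted class g_m(x) is decoded into +1/−1 votes for the labels of R_m, the votes are averaged per label, and a label is predicted relevant when its mean vote is strictly positive, which matches thresholding the average 0/1 vote at 0.5; ties go to −1 here. The function names, the `subset_of` mapping, and the example models are hypothetical.

```python
def decode(class_id, labelset, subset_of):
    """Turn LP class g_m(x) into +1/-1 votes for the labels of its k-labelset R_m."""
    positives = subset_of[class_id]  # labels present in the predicted class
    return {label: (1 if label in positives else -1) for label in labelset}


def rakel_vote(model_outputs, labels):
    """Combine the +1/-1 votes of all LP models whose k-labelset covers each label.

    model_outputs: list of (class_id, labelset, subset_of) triples, one per model.
    A label is predicted relevant (+1) only if its vote total is strictly positive,
    i.e. its mean vote is positive; ties and uncovered labels fall back to -1.
    """
    total = {label: 0 for label in labels}
    for class_id, labelset, subset_of in model_outputs:
        for label, vote in decode(class_id, labelset, subset_of).items():
            total[label] += vote
    return {label: (1 if total[label] > 0 else -1) for label in labels}


# Three hypothetical 2-labelset models over labels a, b, c, d
outputs = [
    (2, ("a", "b"), {1: {"a"}, 2: {"a", "b"}, 3: set()}),
    (1, ("b", "c"), {1: {"c"}, 2: {"b"}}),
    (1, ("a", "d"), {1: {"a"}, 2: {"d"}}),
]
print(rakel_vote(outputs, ["a", "b", "c", "d"]))
# {'a': 1, 'b': -1, 'c': 1, 'd': -1}
```

Label b receives one +1 and one −1 vote, so the tie rule decides it; with an odd number of models covering every label, ties cannot occur.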

Algorithm 1 The training process of RAkEL

• Input: number of models M, size of labelset k, set of labels L, and the training set D = {(x_i, y_i)}_{i=1}^N

• Output: an ensemble of LP classifiers g_m and the corresponding k-labelsets R_m

1. Initialize S ← L^k (the set of all k-labelsets of L)

2. for m ← 1 to min(M, |L^k|) do

• R_m ← a k-labelset randomly selected from S

• train the LP classifier g_m based on D and R_m

• S ← S \ {R_m}

3. end

We extend RAkEL for cost-sensitive multi-label classification. The extension is not straightforward because we are given a cost value for each label, whereas RAkEL considers a set of labels as a class. Our idea is to train the cost-sensitive LP classifier ĝ_m by transforming the costs of the individual labels in a labelset into the total cost of the labelset.

The transformed cost ĉ_i of a training sample x_i for training ĝ_m is computed by

$$\hat{c}_i = \sum_{j \in R_m} c_{ij},$$

where c_i is the cost vector mentioned in Section 1.3 and c_ij is its j-th component. Therefore, we obtain the multi-class training sample with its associated cost, (x_i, ŷ_i, ĉ_i), for training the LP classifier, where ŷ_i ∈ {1, 2, ..., V} is the class value and ĉ_i is the cost incurred when the class of this instance is misclassified. We use the multi-class SVM as the LP classifier in this study and employ the one-versus-one strategy [33] for cost-sensitive multi-class classification.
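Combining Algorithm 1 with the cost transformation, cost-sensitive RAkEL training can be sketched as below. The sketch assumes that ĉ_i is the sum of the per-label costs of instance i over the labels in R_m (the "total cost of the labelset" described above), that k-labelsets are drawn uniformly without replacement, and that each instance's costs are given as a label-to-cost mapping. The thesis uses a cost-sensitive multi-class SVM with the one-versus-one strategy as the LP classifier; here `train_fn` is a hypothetical hook, and `cost_weighted_majority` is only a placeholder learner so that the example runs. All function names are hypothetical.

```python
import itertools
import random


def sample_labelsets(labels, k, M, seed=0):
    """Algorithm 1, steps 1-2: draw min(M, |L^k|) distinct k-labelsets."""
    all_labelsets = list(itertools.combinations(sorted(labels), k))  # S <- L^k
    rng = random.Random(seed)
    # Sampling without replacement has the same effect as S <- S \ {R_m} after each draw
    return rng.sample(all_labelsets, min(M, len(all_labelsets)))


def cost_sensitive_lp_data(X, Y, C, labelset):
    """Build the triples (x_i, y_hat_i, c_hat_i) for one k-labelset R_m.

    Y[i] is the set of relevant labels of instance i and C[i] maps each label to
    its cost; c_hat_i is assumed to be the sum of C[i] over the labels in R_m.
    """
    class_of = {}  # projected labelset -> class id in {1, ..., V}
    triples = []
    for x, relevant, cost in zip(X, Y, C):
        key = frozenset(label for label in labelset if label in relevant)
        if key not in class_of:
            class_of[key] = len(class_of) + 1
        c_hat = sum(cost[label] for label in labelset)
        triples.append((x, class_of[key], c_hat))
    return triples, class_of


def train_cs_rakel(X, Y, C, labels, k, M, train_fn, seed=0):
    """Return a list of (R_m, classifier g_hat_m, class_of) entries."""
    ensemble = []
    for labelset in sample_labelsets(labels, k, M, seed):
        triples, class_of = cost_sensitive_lp_data(X, Y, C, labelset)
        ensemble.append((labelset, train_fn(triples), class_of))
    return ensemble


def cost_weighted_majority(triples):
    """Placeholder learner: always predicts the class with the largest total cost."""
    weight = {}
    for _, cls, c_hat in triples:
        weight[cls] = weight.get(cls, 0.0) + c_hat
    best = max(weight, key=weight.get)
    return lambda x: best


labels = ["rock", "guitar", "drum", "vocal"]
X = [[0.1], [0.4], [0.9]]
Y = [{"rock", "guitar"}, {"rock", "drum"}, {"vocal"}]
C = [
    {"rock": 1.0, "guitar": 2.0, "drum": 1.0, "vocal": 1.0},
    {"rock": 3.0, "guitar": 1.0, "drum": 1.0, "vocal": 1.0},
    {"rock": 1.0, "guitar": 1.0, "drum": 1.0, "vocal": 4.0},
]
triples, _ = cost_sensitive_lp_data(X, Y, C, ("guitar", "rock"))
print([(y_hat, c_hat) for _, y_hat, c_hat in triples])  # [(1, 3.0), (2, 4.0), (3, 2.0)]
ensemble = train_cs_rakel(X, Y, C, labels, k=2, M=3, train_fn=cost_weighted_majority)
print(len(ensemble))  # 3
```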

In the document 成本導向多標籤學習演算法與應用 (Cost-Sensitive Multi-Label Learning Algorithms and Applications), pages 25-28