
4.6 Extended Experiments

5.1.3 Experimental Setup

For the cost-less multi-label classification part, we compare the performance of GLE with that of six state-of-the-art multi-label learning algorithms: RAkEL, BR (Binary Relevance) [55], CC (Classifier Chains) [49], MLKNN [67], IBLR [10], and BPMLL [66]. We implement GLE in MATLAB. For the LP multi-class classifiers in GLE, we use the linear multi-class SVM implemented in LIBSVM, which is based on the one-against-one approach. The SVM parameter C is set to 1.

In our observation, the best parameters k and M are usually the same for RAkEL and GLE. We therefore select k and M for RAkEL by cross-validation, with Hamming loss as the model selection criterion, and apply the selected values to GLE; they are listed in Table 5.2. The parameters γ and ν in GLE are then selected by cross-validation separately for each of the five evaluation metrics. As mentioned in Section 2.3.1, the computational cost of obtaining β* is very small: after training the LP classifiers, we can obtain β for thousands of different parameter pairs (γ, ν) within one second. Furthermore, we show the experimental results for different γ and ν in Section 5.1.5. The results suggest that we can obtain good enough results by tuning γ and setting ν to 1, which alleviates the burden of parameter selection.
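To make the roles of k and M concrete, the following is a minimal sketch of how M random k-labelsets can be drawn and how an instance is mapped to a label-powerset (LP) class on one labelset. The function names, the use of distinct labelsets, and the seed are assumptions made for illustration; they are not taken from the MATLAB implementation used in the experiments.

```python
import math
import random


def draw_k_labelsets(num_labels, k, M, seed=0):
    """Draw M distinct random k-labelsets as sorted tuples of label indices.

    Assumption: labelsets are distinct, so M cannot exceed C(num_labels, k).
    """
    if M > math.comb(num_labels, k):
        raise ValueError("M exceeds the number of distinct k-labelsets")
    rng = random.Random(seed)
    labelsets = set()
    while len(labelsets) < M:
        labelsets.add(tuple(sorted(rng.sample(range(num_labels), k))))
    return sorted(labelsets)


def lp_class(label_vector, labelset):
    """LP class of one instance restricted to a k-labelset.

    Each distinct combination of the k labels becomes one multi-class target.
    """
    return tuple(label_vector[j] for j in labelset)


# Hypothetical example: 6 labels with k = 4 and M = 15.
labelsets = draw_k_labelsets(num_labels=6, k=4, M=15)
target = lp_class([1, 0, 1, 1, 0, 0], labelsets[0])
```

Each of the M labelsets yields one multi-class problem, so M is the number of LP classifiers to train and k controls how many label combinations each classifier may have to separate.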

Table 5.2: Selected Parameters k and M of GLE and RAkEL for the Multi-Label Datasets

Dataset        k     M
scene          4    15
enron         16   250
cal500        10   250
majorminer    14   250
medical       14   250
bibtex        24   250
dlc1          26   250
dlc2          32   250
dlc3          32   250
dlc4          32   250

BR, CC, MLKNN, and IBLR are implemented in the MULAN package. For BR, we use linear logistic regression with probability output scores as the base classifier. For CC, we use the linear SVM as the base classifier, with the parameter C set to 1. For MLKNN and IBLR, the size of the neighborhood is set to 10 [10, 67]. We use the MATLAB implementation of BPMLL provided by its authors. For BPMLL, the number of hidden neurons is set to 20% of the dimensionality, and the number of training epochs is selected using cross-validation. The implementation and settings of RAkEL are similar to those of GLE. The parameters k and M in RAkEL are selected using cross-validation. We perform three-fold cross-validation sixty times for the five medium-scale datasets and three times for the five large-scale datasets, and calculate the mean and standard deviation of the results.
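The repeated three-fold cross-validation protocol can be sketched as follows. This is an illustration only: the splitting scheme, the seed, and the choice to compute the standard deviation over all per-fold scores are assumptions, and the placeholder score stands in for training and evaluating a real model.

```python
import random
import statistics


def repeated_kfold(n, folds=3, repeats=3, seed=0):
    """Yield (train_idx, test_idx) for every fold of every shuffled repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(folds):
            test = idx[f::folds]              # every folds-th shuffled index
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test


def summarize(scores):
    """Mean and sample standard deviation of the collected scores."""
    return statistics.mean(scores), statistics.stdev(scores)


scores = []
for train_idx, test_idx in repeated_kfold(n=30, folds=3, repeats=3):
    # Placeholder: a real run would train on train_idx and score on test_idx.
    scores.append(len(test_idx) / 30)
mean_score, std_score = summarize(scores)
```

Passing repeats=60 or repeats=3 gives the two repetition counts mentioned in the text for the medium-scale and large-scale datasets.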

5.1.4 Experimental Results

The experimental results of multi-label classification are summarized in Table 5.3. The numbers in parentheses represent the rank of the algorithm among the compared algorithms. We do not report the performance on cal500 in terms of subset 0/1 loss since none of the methods can achieve an error rate better than 1.0. The average rankings of our method GLE on the ten datasets under the five metrics are 1.8, 2.4, 1.5, 1.4, and 1.9, respectively. GLE achieves the best performance on four of the five metrics. It performs slightly worse than MLKNN only in terms of ranking loss, and the difference is very small. We observe that RAkEL performs close to GLE in terms of Hamming loss, but in terms of the other four metrics GLE performs much better than RAkEL. Among the five metrics, the improvement of GLE is more significant on Hamming loss, subset 0/1 loss, and one error. Generally speaking, GLE performs better than or competitively with the other state-of-the-art methods.

We have run the pairwise t-test at the 5% significance level on the experimental results. In Table 5.3, we use •/◦ to indicate whether GLE is statistically superior/inferior to the compared algorithm. When the difference is not significant, no marker is given. There are 246 cases in which GLE performs significantly better than the compared method and only 34 cases in which GLE performs significantly worse.
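The ranks in parentheses and the average rank can be computed as in the sketch below. The tie-handling rule (tied methods share the best rank) is inferred from the cal500 row of Table 5.3, where all methods receive rank 1; the function names are illustrative. The two rows of scores are the ones shown in Table 5.3.

```python
from statistics import mean

METHODS = ["GLE", "RAkEL", "BR", "CC", "MLKNN", "IBLR", "BPMLL"]


def competition_ranks(scores, lower_is_better=True):
    """Rank methods on one dataset; tied scores share the best (smallest) rank."""
    ordered = sorted(scores.values(), reverse=not lower_is_better)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def average_ranks(per_dataset_scores, lower_is_better=True):
    """Average each method's rank over all datasets."""
    ranks = [competition_ranks(s, lower_is_better)
             for s in per_dataset_scores.values()]
    return {name: mean(r[name] for r in ranks) for name in ranks[0]}


rows = {
    "cal500": dict.fromkeys(METHODS, 1.0),
    "majorminer": dict(zip(
        METHODS, [0.9081, 0.9113, 0.9602, 0.9434, 0.9358, 0.9701, 0.9910])),
}
majorminer_ranks = competition_ranks(rows["majorminer"])
avg = average_ranks(rows)
```

For a metric where larger values are better, such as average precision, lower_is_better=False reverses the ordering.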

Since both GLE and RAkEL are LP-based methods, we calculate the relative improvement of GLE over RAkEL on each dataset and report the average over all datasets for each of the five evaluation metrics in Table 5.4. In each iteration of the training phase, GLE and RAkEL use the same randomly selected k-labelsets for the LP classifiers. Table 5.4 also shows the relative improvement over RAkEL of two simplified versions of GLE, namely without two-norm regularization (γ = 0) and without hypergraph regularization (ν = 0). We observe that the relative improvement of GLE over RAkEL is more significant in terms of ranking loss than in terms of the other metrics. GLE achieves around 10% relative improvement over RAkEL in terms of both one error and average precision, but the improvement is small in terms of Hamming loss and subset 0/1 loss. The simplified version of GLE without two-norm regularization performs even worse than RAkEL in terms of Hamming loss, subset 0/1 loss, and one error. The simplified version of GLE without hypergraph regularization performs better than RAkEL but worse than GLE.
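The text does not spell out the formula for relative improvement, so the sketch below assumes a common definition: the gain is measured relative to the baseline score, with the sign flipped for metrics where larger values are better (average precision). The function names and the example values are illustrative only.

```python
from statistics import mean


def relative_improvement(baseline, candidate, lower_is_better=True):
    """Relative improvement of candidate over baseline, in percent.

    Assumed definition: (baseline - candidate) / baseline for losses,
    (candidate - baseline) / baseline for scores where larger is better.
    """
    gain = baseline - candidate if lower_is_better else candidate - baseline
    return 100.0 * gain / baseline


def average_relative_improvement(pairs, lower_is_better=True):
    """Average the per-dataset improvements; pairs holds (baseline, candidate)."""
    return mean(relative_improvement(b, c, lower_is_better) for b, c in pairs)


# Hypothetical per-dataset (RAkEL, GLE) losses, not results from the thesis.
example = average_relative_improvement([(0.20, 0.18), (0.50, 0.45)])
```

Under this definition a negative value, as in the γ = 0 row of Table 5.4, means the variant is worse than RAkEL on that metric.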

Table 5.3: Experimental Results in Terms of Five Different Evaluation Metrics. The Numbers in Parentheses Represent the Rank of the Algorithm Among the Compared Algorithms. The Average Rank is the Average of the Ranks Across All Datasets. •/◦ indicates whether GLE is statistically superior/inferior to the compared algorithm (the pairwise t-test at the 5% significance level).

Hamming Loss
Dataset       GLE          RAkEL         BR            CC            MLKNN         IBLR          BPMLL
cal500        1.0000 (1)   1.0000 (1)    1.0000 (1)    1.0000 (1)    1.0000 (1)    1.0000 (1)    1.0000 (1)
majorminer    0.9081 (1)   0.9113 (2)•   0.9602 (5)•   0.9434 (4)•   0.9358 (3)•   0.9701 (6)•   0.9910 (7)•

Table 5.4: Relative Improvement of GLE and Its Two Simplified Versions Over RAkEL in Terms of Five Different Evaluation Metrics (In %)

          Hamming Loss   Ranking Loss   Subset 0/1 Loss   One Error   Average Precision
γ = 0        -9.11          26.44           -5.19           -28.29          2.16
ν = 0         0.20          51.26            1.30             3.87         10.25
GLE           0.22          55.34            1.83             8.15         11.80

We further compare GLE and RAkEL by varying the parameters k and M. Both methods use the same randomly selected k-labelsets for the LP classifiers. We show the average relative improvement of GLE over RAkEL in terms of the five evaluation metrics in Figure 5.1. The results for the selected parameter k in Table 5.2 are indicated by the blue curves in Figure 5.1. We have also tested larger and smaller k, and the results are indicated by the red and green curves, respectively. We incrementally increase the number of models M, as indicated by the horizontal axis. Since the number of instances and the number of labels of the scene dataset are much smaller than those of the other datasets, the ten steps of M for scene run from 6 to 15. For the other nine datasets, the ten steps of M run from 25 to 250 in increments of 25. From the results, we observe that GLE performs better than RAkEL in most parameter settings, especially as the number of models increases. A likely reason is that, when the number of models is small, it is less likely that a significantly better or worse k-labelset is selected, so it is reasonable to assume that all LP classifiers are equally important.
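The sweep over the number of models described above can be written down as a small configuration helper. The function name is illustrative, and the step of 1 for scene is inferred from the statement that ten steps cover 6 to 15.

```python
def model_count_steps(dataset):
    """Return the ten values of M swept for a dataset."""
    if dataset == "scene":
        return list(range(6, 16))       # 6, 7, ..., 15
    return list(range(25, 251, 25))     # 25, 50, ..., 250


scene_steps = model_count_steps("scene")
other_steps = model_count_steps("enron")
```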
