Cost-Sensitive Experiments

In document: Cost-Sensitive Multi-Label Learning Algorithms and Applications (pages 74-89)

As mentioned in Section 6.6, six datasets (cal500, majorminer, dlc1, dlc2, dlc3, and dlc4) come from the social tagging domain. These datasets, which contain tag count information, are used for the cost-sensitive social tagging experiments.

The experimental setup is the same as in Section 6.6. However, we compare GLE only with RAkEL, BR, and MLKNN, since these three methods perform better than CC, IBLR, and BPMLL in the experiments of Section 6.6. Following previous work [38], we replace the base classifier in BR with a cost-sensitive binary classifier and refer to the resulting method as CSBR. The cost-sensitive binary classifier is implemented using LIBSVM.

The experimental results of cost-sensitive social tag annotation and retrieval are summarized in Table 5.5. In most cases, GLE outperforms RAkEL, CSBR, and MLKNN. In only one case (cost-sensitive annotation on dlc1) does GLE perform slightly worse than RAkEL, and the difference is not statistically significant. The average ranks of GLE across the six datasets are 1.2 for annotation and 1.0 for retrieval.

Table 5.5: Experimental Results in Terms of Two Cost-Sensitive Evaluation Metrics. The Average Rank is the Average of the Ranks Across All Datasets. •/◦ indicates whether GLE is statistically superior/inferior to the compared algorithm (pairwise t-test at the 5% significance level).

Cost-Sensitive F-Measure for Annotation

Dataset        GLE          RAkEL          CSBR           MLKNN
cal500         0.6544 (1)   0.6436 (2) •   0.4916 (4) •   0.5889 (3) •
majorminer     0.4938 (1)   0.4885 (2) •   0.2607 (4) •   0.2964 (3) •
dlc1           0.2048 (2)   0.2052 (1)     0.0780 (4) •   0.1537 (3) •
dlc2           0.1498 (1)   0.1433 (2) •   0.0588 (4) •   0.1213 (3) •
dlc3           0.1555 (1)   0.1502 (2) •   0.0621 (4) •   0.1174 (3) •
dlc4           0.1875 (1)   0.1835 (2) •   0.0529 (4) •   0.1449 (3) •
Average Rank   1.2          1.8            4.0            3.0

Cost-Sensitive F-Measure for Retrieval

Dataset        GLE          RAkEL          CSBR           MLKNN
cal500         0.4699 (1)   0.2929 (4) •   0.3341 (2) •   0.3053 (3) •
majorminer     0.3157 (1)   0.3066 (2) •   0.1147 (4) •   0.1427 (3) •
dlc1           0.2141 (1)   0.2094 (2) •   0.1721 (4)     0.1874 (3) •
dlc2           0.2801 (1)   0.2678 (2) •   0.1887 (3) •   0.1725 (4) •
dlc3           0.2515 (1)   0.2416 (2)     0.1864 (3)     0.1464 (4) •
dlc4           0.2572 (1)   0.2479 (2) •   0.1329 (4) •   0.1466 (3) •
Average Rank   1.0          2.3            3.3            3.3
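As a sanity check on the "Average Rank" rows, the following minimal sketch recomputes the per-dataset ranks and their averages from the F-measure values reported in Table 5.5. The helper names (`ranks_desc`, `average_ranks`) are illustrative and not part of the original work, and ties are not handled because none occur in the table.

```python
# Column order used in every row below.
METHODS = ["GLE", "RAkEL", "CSBR", "MLKNN"]

# Cost-sensitive F-measure values copied from Table 5.5.
ANNOTATION = {
    "cal500":     [0.6544, 0.6436, 0.4916, 0.5889],
    "majorminer": [0.4938, 0.4885, 0.2607, 0.2964],
    "dlc1":       [0.2048, 0.2052, 0.0780, 0.1537],
    "dlc2":       [0.1498, 0.1433, 0.0588, 0.1213],
    "dlc3":       [0.1555, 0.1502, 0.0621, 0.1174],
    "dlc4":       [0.1875, 0.1835, 0.0529, 0.1449],
}
RETRIEVAL = {
    "cal500":     [0.4699, 0.2929, 0.3341, 0.3053],
    "majorminer": [0.3157, 0.3066, 0.1147, 0.1427],
    "dlc1":       [0.2141, 0.2094, 0.1721, 0.1874],
    "dlc2":       [0.2801, 0.2678, 0.1887, 0.1725],
    "dlc3":       [0.2515, 0.2416, 0.1864, 0.1464],
    "dlc4":       [0.2572, 0.2479, 0.1329, 0.1466],
}


def ranks_desc(scores):
    """Rank scores so that the highest score gets rank 1 (no tie handling)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def average_ranks(table):
    """Average each method's rank over all datasets in the table."""
    per_dataset = [ranks_desc(row) for row in table.values()]
    n = len(per_dataset)
    return [sum(r[m] for r in per_dataset) / n for m in range(len(METHODS))]


if __name__ == "__main__":
    for name, table in [("annotation", ANNOTATION), ("retrieval", RETRIEVAL)]:
        rounded = [round(r, 1) for r in average_ranks(table)]
        print(name, dict(zip(METHODS, rounded)))
```

Rounded to one decimal place, the recomputed averages agree with the "Average Rank" rows of the table.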

Figures 5.2 to 5.11 each contain five panels: (a) Hamming Loss, (b) Ranking Loss, (c) Subset 0/1 Loss, (d) One Error, (e) Average Precision.

Figure 5.2: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Scene Dataset.

Figure 5.3: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Enron Dataset.

Figure 5.4: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Cal500 Dataset.

Figure 5.5: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Majorminer Dataset.

Figure 5.6: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Medical Dataset.

Figure 5.7: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Bibtex Dataset.

Figure 5.8: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Dlc1 Dataset.

Figure 5.9: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Dlc2 Dataset.

Figure 5.10: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Dlc3 Dataset.

Figure 5.11: Experimental Results of GLE with Different γ and ν in Terms of Five Different Evaluation Metrics on the Dlc4 Dataset.

Chapter 6

Patient-Balanced Learning for Medical Image Classification

KDD Cup is an annual worldwide competition on KDD (knowledge discovery and data mining). It is organized by the ACM Special Interest Group on KDD and has been held since 1997. It is now the most prestigious data mining competition. In both KDD Cup 2006 and KDD Cup 2008, the prediction task was medical image classification, and the medical image datasets were provided by Siemens Medical Solutions, USA. In KDD Cup 2006¹, the task was pulmonary embolism (PE) classification using pre-processed computed tomography images [29], while in KDD Cup 2008², the task was breast cancer classification using mammogram images [48]. We participated in KDD Cup 2008 and were joint winners of the competition.

In this chapter, we start by discussing some practical issues of model selection for medical image classification. Since performance is evaluated with patient-based metrics rather than traditional instance-based metrics, general model selection strategies may not work well. We describe the model selection strategies used in our winning method. Then, we describe the class-imbalance issue and a class-balanced SVM. Furthermore, we discuss a patient-imbalance problem that might seriously hurt the generalization ability of the image classifier. To the best of our knowledge, this problem has not been addressed or solved in previous research. We believe that it occurs in medical image classification tasks in general and is not specific to the KDD Cup competition. We design a patient-balanced learning strategy based on cost-sensitive binary classification. The experiments are conducted on both the breast cancer dataset and the pulmonary embolism dataset. The absolute performance improvement of patient-balanced learning over the traditional learning method is about 5% on the test data in terms of AUC, which should be considered crucial for winning the competition.

¹ http://www.cs.unm.edu/kdd cup 2006

² http://www.kddcup2008.com/

6.1 Background

Data mining techniques have been widely used for computer-aided diagnosis (CAD) of medical image data (e.g., CT scans, X-rays, and MRI). Given a set of labeled images, one can design a learning program that predicts whether an unlabeled image contains cancer regions. There are generally three steps in developing a CAD system [29]:

1. Identify some potentially unhealthy regions (or regions of interest, ROIs) from a medical image.

2. Extract descriptive features from each candidate region.

3. Design a classifier to identify the labels of the candidates.

The third step in the CAD scenario can be formulated as a supervised learning problem. That is, we are given a training data set $\{(x_i, y_i, p_i)\}_{i=1}^{N}$, where $x_i$ is the feature vector of an ROI, $y_i \in \{1, -1\}$ is a class label indicating whether this ROI is unhealthy (positive) or not (negative), and $p_i \in \{1, 2, \cdots, M\}$ is the index of the patient with which this instance is associated (that is, $p_i = j$ means the instance belongs to the $j$-th patient). We note that the instances of a patient may not come from one single image, but from images of diverse viewpoints or organs (e.g., left/right breast). We define a patient as unhealthy if and only if at least one of the patient's ROI instances is regarded as unhealthy, and as healthy if and only if none of the patient's ROI instances is regarded as unhealthy. Let $X_j^{\mathrm{bag}}$ be the set of ROI feature vectors associated with the $j$-th patient. We define a patient classifier $F(X_j^{\mathrm{bag}})$ with input $X_j^{\mathrm{bag}}$ as:

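\[
F(X_j^{\mathrm{bag}}) =
\begin{cases}
1 \ (\text{unhealthy}) & \text{if } \exists\, x_i \in X_j^{\mathrm{bag}},\ f(x_i) = 1, \\
-1 \ (\text{healthy}) & \text{otherwise,}
\end{cases}
\tag{6.1}
\]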

where $f(x_i)$ can be any binary classifier, such as an SVM, that takes a single feature vector as its input and outputs a label in $\{1, -1\}$. The patient classifier can also be expressed compactly as:

\[
F(X_j^{\mathrm{bag}}) = \max_{x_i \in X_j^{\mathrm{bag}}} f(x_i).
\tag{6.2}
\]

Suppose the classifier $f(x_i)$ can output confidence scores for the ROIs, where a higher score means more confidence that the ROI is unhealthy. According to (6.2), we can then use the largest score among a patient's ROIs to represent the confidence that the patient is unhealthy.
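The aggregation in (6.1) and (6.2) can be illustrated with a short sketch. It assumes that ROI-level predictions are already available as parallel lists of labels (or confidence scores) and patient indices; the function names and the toy values are hypothetical and serve only to show the max-over-bag rule, not the actual pipeline used in the competition.

```python
def patient_labels(roi_labels, patient_ids):
    """Eq. (6.1): a patient is unhealthy (+1) iff at least one ROI is
    predicted +1; otherwise the patient is healthy (-1)."""
    labels = {}
    for label, pid in zip(roi_labels, patient_ids):
        # Taking the max over labels in {+1, -1} implements "exists an ROI with +1".
        labels[pid] = max(labels.get(pid, -1), label)
    return labels


def patient_scores(roi_scores, patient_ids):
    """Eq. (6.2) applied to confidence scores: the patient-level score is
    the largest score among the patient's ROIs."""
    best = {}
    for score, pid in zip(roi_scores, patient_ids):
        if pid not in best or score > best[pid]:
            best[pid] = score
    return best


if __name__ == "__main__":
    # Toy example (made-up values): patient 1 has three ROIs, patient 2 has two.
    patient_ids = [1, 1, 1, 2, 2]
    roi_labels = [-1, 1, -1, -1, -1]
    roi_scores = [-0.8, 0.6, -0.1, -0.4, -0.9]

    print(patient_labels(roi_labels, patient_ids))  # {1: 1, 2: -1}
    print(patient_scores(roi_scores, patient_ids))  # {1: 0.6, 2: -0.4}
```

In this toy example, one positive ROI is enough to mark patient 1 as unhealthy, while patient 2 is healthy because none of its ROIs is predicted positive.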

This scenario is similar to the setting of the multi-instance learning (MIL) problem [2, 42, 62, 68] if we consider the instances belonging to a patient as a bag of instances, as defined in MIL. The major difference between MIL and our scenario for medical image classification is that in MIL the label information is provided at the bag level but not at the instance level. Consequently, treating this medical image classification problem as a conventional MIL problem would lose the detailed label information. Nevertheless, this fine-grained MIL-like problem is important enough that it was posed as the major challenge of the KDD Cup 2006 and 2008 competitions. We propose learning methods to improve the performance of medical classifiers in this fine-grained MIL-like problem.
