
Condensed Filter Tree for Cost-Sensitive Multi-Label Classification

A. Proof of Theorem 1

Theorem 1. Under the proper ordering and K-classifier tricks, for each $x$ and the multi-label classifier $h$ formed by chaining $K$ binary classifiers $(h_1, \ldots, h_K)$ as in the prediction procedure of Filter Tree, the regret $\mathrm{rg}(h, P)$ satisfies
\[
\mathrm{rg}(h, P) \le \sum_{t \in \langle r, y \rangle} \llbracket h_k(x, t) \neq y[k] \rrbracket \, \mathrm{rg}\bigl(h_k(x, t), \mathrm{FT}_t(P, h_{k+1}, \ldots, h_K)\bigr),
\]
where $k$ denotes the layer that $t$ is on, and $\mathrm{FT}_t(P, h_{k+1}, \ldots, h_K)$ represents the procedure that generates weighted examples $(x, b, w)$ to train the node at index $t$ based on sampling $y$ from $P|_x$ and considering the predictions of the classifiers in the lower layers.
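For concreteness, the chained prediction procedure referenced above can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions (heap-style node indices with root $r = 1$, and each classifier exposed as a callable h_k(x, t) returning 0 or 1); it is not the authors' implementation.

def filter_tree_predict(x, classifiers):
    # classifiers[k-1] plays the role of h_k in the theorem (k = 1, ..., K).
    t = 1                      # start at the root r
    y_hat = []                 # predicted label vector, built bit by bit
    for h_k in classifiers:    # descend one layer per classifier
        bit = h_k(x, t)        # weighted binary decision at node t
        y_hat.append(bit)      # bit k of the prediction
        t = 2 * t + bit        # move to child t_0 or t_1
    return y_hat               # the leaf reached encodes the prediction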

Proof. The proof is similar to the one in (Beygelzimer et al., 2008), which is based on defining the overall-regret of any subtree. The key change in our proof is to define the path-regret of any subtree to be the total regret of the nodes on the ideal path of the subtree. The induction step follows similarly from the proof in (Beygelzimer et al., 2008) by considering two cases: one for the ideal prediction to be in the left subtree and one for the ideal prediction to be in the right. Then an induction from layer $K$ to the root proves the theorem.

For each node $t$ on layer $k$, $h_k$ makes a weighted binary classification decision of 0 or 1, which directs the prediction procedure to move to either the node $t_0$ or $t_1$. Without loss of generality, assume $h_k(x, t) = 1$. We denote $\hat{t}$ as the prediction (leaf) on $x$ when starting at node $t$. For each leaf node $\tilde{y}$, let $\bar{C}(\tilde{y}) \equiv \mathbb{E}_{y \sim P|_x} C(y, \tilde{y})$. Then the node regret $\mathrm{rg}(t)$ is simply $\bar{C}(\hat{t}_1) - \min_{i \in \{0,1\}} \bar{C}(\hat{t}_i)$. Obviously, $\mathrm{rg}(t) \ge \bar{C}(\hat{t}_1) - \bar{C}(\hat{t}_0)$ for every node $t$.
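As a purely hypothetical numerical illustration of the node regret (the costs below are made up, not taken from the paper): suppose $\bar{C}(\hat{t}_0) = 0.3$ and $\bar{C}(\hat{t}_1) = 0.5$ while $h_k(x, t) = 1$. Then
\[
\mathrm{rg}(t) = \bar{C}(\hat{t}_1) - \min_{i \in \{0,1\}} \bar{C}(\hat{t}_i) = 0.5 - 0.3 = 0.2,
\]
which here also equals $\bar{C}(\hat{t}_1) - \bar{C}(\hat{t}_0)$, consistent with the inequality above.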

In addition to the regret of nodes, we also define the regret of the subtree $T_t$ rooted at node $t$. The regret of the subtree $T_t$ is defined as the regret of the predicted path (vector) $\hat{t}$ within the subtree $T_t$, that is, $\mathrm{rg}(T_t) = \bar{C}(\hat{t}) - \bar{C}(t^*)$, where $t^*$ denotes the optimal prediction (leaf node) in the subtree $T_t$. By this definition, $\mathrm{rg}(h, P)$ can be treated as $\mathrm{rg}(T_r)$.

We now prove by induction from layer $K$ to the root. The induction hypothesis is that
\[
\mathrm{rg}(T_t) \le \sum_{t' \in \langle t, t^* \rangle} \llbracket h_k(x, t') \neq y[k] \rrbracket \, \mathrm{rg}(t'),
\]
where $k$ is the corresponding layer of each node $t'$. The hypothesis states that the regret of the subtree is bounded by the sum of the regrets of the wrongly predicted nodes on the path from $t$ to the ideal prediction $t^*$. The base case is the reduction tree with one single internal node $t$ and two leaf nodes, which is a cost-sensitive binary classification problem with $\mathrm{rg}(T_t) = \mathrm{rg}(t)$ trivially. If the classifier at $t$ predicts correctly, then $\mathrm{rg}(T_t) = 0$; otherwise $\mathrm{rg}(T_t) = \mathrm{rg}(t)$. In either case the induction hypothesis is satisfied.
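Continuing the hypothetical costs above for the base case (a single internal node whose two children are leaves, so $\hat{t}_0 = t_0$ and $\hat{t}_1 = t_1$): if the classifier at $t$ errs by choosing $t_1$ while $t^* = t_0$, then
\[
\mathrm{rg}(T_t) = \bar{C}(t_1) - \bar{C}(t_0) = 0.2 = \llbracket h_k(x, t) \neq y[k] \rrbracket \, \mathrm{rg}(t),
\]
so the hypothesis holds with equality; if it predicts correctly, both sides are 0.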

For the inductive step, for node $t$ on layer $k$, assume
\[
R_0 \equiv \mathrm{rg}(T_{t_0}) \le \sum_{t' \in \langle t_0, t_0^* \rangle} \llbracket h_k(x, t') \neq y[k] \rrbracket \, \mathrm{rg}(t'),
\]
and
\[
R_1 \equiv \mathrm{rg}(T_{t_1}) \le \sum_{t' \in \langle t_1, t_1^* \rangle} \llbracket h_k(x, t') \neq y[k] \rrbracket \, \mathrm{rg}(t').
\]

The optimal prediction $t^*$ is either in the right subtree $T_{t_1}$ or in the left subtree $T_{t_0}$. For the first case, it implies $t^* = t_1^*$ and $y[k] = h_k(x, t) = 1$, and then
\[
\begin{aligned}
\mathrm{rg}(T_t) &= \bar{C}(\hat{t}_1) - \bar{C}(t^*) \\
&= \bar{C}(\hat{t}_1) - \bar{C}(t_1^*) \\
&= R_1 \le \sum_{t' \in \langle t_1, t_1^* \rangle} \llbracket h_k(x, t') \neq y[k] \rrbracket \, \mathrm{rg}(t') \\
&= \sum_{t' \in \langle t, t^* \rangle} \llbracket h_k(x, t') \neq y[k] \rrbracket \, \mathrm{rg}(t'),
\end{aligned}
\]
where the last equality holds because the extra node $t$ on the path $\langle t, t^* \rangle$ contributes nothing, as $h_k(x, t) = y[k]$.

For the second case, it implies $t^* = t_0^*$ and $y[k] \neq h_k(x, t) = 1$, and then
\[
\begin{aligned}
\mathrm{rg}(T_t) &= \bar{C}(\hat{t}_1) - \bar{C}(t^*) \\
&= \bar{C}(\hat{t}_1) - \bar{C}(t_0^*) \\
&= \bar{C}(\hat{t}_1) - \bar{C}(\hat{t}_0) + \bar{C}(\hat{t}_0) - \bar{C}(t_0^*) \\
&\le \mathrm{rg}(t) + R_0 \\
&\le \mathrm{rg}(t) + \sum_{t' \in \langle t_0, t_0^* \rangle} \llbracket h_k(x, t') \neq y[k] \rrbracket \, \mathrm{rg}(t') \\
&= \sum_{t' \in \langle t, t^* \rangle} \llbracket h_k(x, t') \neq y[k] \rrbracket \, \mathrm{rg}(t'),
\end{aligned}
\]
where the last equality holds because node $t$ contributes exactly $\mathrm{rg}(t)$ to the sum over $\langle t, t^* \rangle$, as $h_k(x, t) \neq y[k]$.

This completes the induction; applying the hypothesis at the root $r$, where $\mathrm{rg}(T_r) = \mathrm{rg}(h, P)$, proves the theorem.

B. Datasets

Here we summarize the basic statistics of the datasets used in our experiments in Table 1.


Table 1. The properties of each dataset

Dataset     # Instances    # Labels (K)
CAL500              502             174
emotions             593               6
enron               1702              53
imdb               86290              28
medical              662              45
scene               2407               6
slash               3279              22
tmc                28596              22
yeast               2389             144

References

Beygelzimer, A., Langford, J., and Ravikumar, P. Error-correcting tournaments, 2008.
