
Regularization by Pruning

fully-grown tree: E_in(G) = 0 if all x_n are different, but it overfits (large E_out) because the low-level trees are built with small D_c

need a regularizer, say, Ω(G) = NumberOfLeaves(G)

want a regularized decision tree:

    argmin_{all possible G} E_in(G) + λ Ω(G)

—called the pruned decision tree

cannot enumerate all possible G computationally; often consider only
• G^(0) = fully-grown tree
• G^(i) = argmin_G E_in(G) such that G is one leaf removed from G^(i−1)

systematic choice of λ?
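
The pruning loop above is straightforward to sketch in code. Below is a minimal Python sketch, not the lecture's implementation: the Node class, the majority-label prediction stored at every node, and the helper names are assumptions for illustration.

```python
# Sketch of one-leaf-at-a-time pruning. G(0) is the fully-grown tree;
# G(i) is the argmin-E_in tree among all trees obtained by collapsing
# one branch of G(i-1). Every node stores `prediction` (assumed to be
# the majority label of the training points reaching it), so collapsing
# a node just means dropping its children.
import copy
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: int = 0                 # branch on x[feature] <= theta
    theta: float = 0.0
    prediction: int = 0              # majority label of this region
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self):
        return self.left is None and self.right is None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.theta else node.right
    return node.prediction

def e_in(tree, X, y):
    return sum(predict(tree, xi) != yi for xi, yi in zip(X, y)) / len(y)

def num_leaves(node):
    return 1 if node.is_leaf() else num_leaves(node.left) + num_leaves(node.right)

def preorder(node):
    yield node
    if not node.is_leaf():
        yield from preorder(node.left)
        yield from preorder(node.right)

def one_leaf_removed(tree):
    """Copies of `tree`, each with one collapsible node turned into a leaf."""
    collapsible = [k for k, nd in enumerate(preorder(tree))
                   if not nd.is_leaf() and nd.left.is_leaf() and nd.right.is_leaf()]
    for k in collapsible:
        t = copy.deepcopy(tree)
        node = list(preorder(t))[k]
        node.left = node.right = None    # its stored prediction takes over
        yield t

def pruned_sequence(full_tree, X, y):
    """G(0), G(1), ..., down to a single leaf."""
    seq = [full_tree]
    while not seq[-1].is_leaf():
        seq.append(min(one_leaf_removed(seq[-1]), key=lambda t: e_in(t, X, y)))
    return seq

def best_regularized(seq, lam, X, y):
    """argmin over the sequence of E_in(G) + lambda * NumberOfLeaves(G)."""
    return min(seq, key=lambda t: e_in(t, X, y) + lam * num_leaves(t))
```

For the open question of choosing λ systematically, one common recipe (an assumption here, not stated on this slide) is to evaluate each G^(i) on validation data and keep the tree, and hence the λ, with the best validation error.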


Branching on Categorical Features

numerical features, e.g. blood pressure: 130, 98, 115, 147, 120
branching for numerical: decision stump

    b(x) = [[ x_i ≤ θ ]] + 1 with θ ∈ ℝ

categorical features, e.g. major symptom: fever, pain, tired, sweaty
branching for categorical: decision subset

    b(x) = [[ x_i ∈ S ]] + 1 with S ⊂ {1, 2, . . . , K}

C&RT (& general decision trees): handles categorical features easily
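
To make the decision subset concrete, here is a small Python sketch that searches for the subset S minimizing weighted Gini impurity (the impurity C&RT uses for classification); the exhaustive search over subsets and the toy symptom encoding are assumptions for the example.

```python
# Decision-subset branching: for a categorical feature with categories
# {1, ..., K}, pick S so that splitting by b(x) = [[x_i in S]] + 1
# minimizes the weighted Gini impurity of the two branches.
from itertools import combinations

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_subset(xs, ys, categories):
    """Exhaustive search over nonempty proper subsets S (each split
    appears twice, as S and its complement, which is harmless here)."""
    best_score, best_S = float("inf"), None
    cats = sorted(categories)
    for r in range(1, len(cats)):
        for S in map(set, combinations(cats, r)):
            in_S = [y for x, y in zip(xs, ys) if x in S]       # branch 2
            out_S = [y for x, y in zip(xs, ys) if x not in S]  # branch 1
            score = (len(in_S) * gini(in_S) + len(out_S) * gini(out_S)) / len(ys)
            if score < best_score:
                best_score, best_S = score, S
    return best_score, best_S

# toy usage: symptoms coded 1=fever, 2=pain, 3=tired, 4=sweaty (made up)
xs = [1, 1, 2, 3, 4, 4, 2, 3]
ys = [1, 1, 1, 0, 0, 0, 1, 0]
print(best_subset(xs, ys, {1, 2, 3, 4}))   # -> (0.0, {1, 2})
```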


Missing Features by Surrogate Branch

possible branch: b(x) = [[ weight ≤ 50kg ]]
if weight is missing during prediction: what would a human do?
• go get the weight
• or, use a threshold on height instead, because a threshold on height ≈ a threshold on weight

surrogate branch:
• maintain surrogate branches b_1(x), b_2(x), . . . ≈ best branch b(x) during training
• allow a missing feature for b(x) during prediction by using a surrogate instead

C&RT: handles missing features easily
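
A minimal sketch of the surrogate idea, assuming stump-style branches, inputs represented as x = (weight, height), and None for a missing value; the agreement-based ranking of surrogates and all names are illustrative assumptions.

```python
# Surrogate branching: during training, rank alternative single-feature
# branches by how often they agree with the best branch b(x); during
# prediction, if the feature of b(x) is missing (None), fall back to
# the best-agreeing surrogate whose feature is present.

def stump(i, theta):
    """Branch b(x) = [[x[i] <= theta]] + 1, returning 1 or 2."""
    return lambda x: 1 if x[i] <= theta else 2

def fit_surrogates(best, candidates, X):
    """Order candidate branches by agreement with the best branch on X."""
    def agreement(b):
        return sum(b(x) == best(x) for x in X) / len(X)
    return sorted(candidates, key=agreement, reverse=True)

def branch_with_surrogates(best, surrogates, feature_of):
    """Use b(x) when its feature is present, else the first usable surrogate."""
    def b(x):
        for candidate in [best] + surrogates:
            if x[feature_of[candidate]] is not None:
                return candidate(x)
        raise ValueError("all branching features missing")
    return b

# usage: x = (weight, height); weight may be missing at prediction time
X_train = [(45, 150), (60, 170), (80, 182), (52, 158)]
b_weight = stump(0, 50)                     # best branch: weight <= 50kg
b_height = stump(1, 160)                    # surrogate candidate on height
feature_of = {b_weight: 0, b_height: 1}
surrogates = fit_surrogates(b_weight, [b_height], X_train)
b = branch_with_surrogates(b_weight, surrogates, feature_of)
print(b((None, 155)))                       # weight missing -> height used -> 1
```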


Fun Time

For the categorical branching criterion b(x) = [[ x_i ∈ S ]] + 1 with S = {1, 6}, which of the following explains the criterion?

1. if the i-th feature is of type 1 or type 6, branch to the first sub-tree; else branch to the second sub-tree
2. if the i-th feature is of type 1 or type 6, branch to the second sub-tree; else branch to the first sub-tree
3. if the i-th feature is of type 1 and type 6, branch to the second sub-tree; else branch to the first sub-tree
4. if the i-th feature is of type 1 and type 6, branch to the first sub-tree; else branch to the second sub-tree

Reference Answer: 2

Note that ‘∈ S’ is an ‘or’-style condition on the elements of S in human language.
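
A quick sanity check of the answer in Python (the lambda encoding of b is just for illustration):

```python
# With S = {1, 6}: [[x_i in S]] evaluates to 1 when x_i is type 1 or 6,
# so b(x) = 2 (second sub-tree); otherwise b(x) = 1 (first sub-tree).
S = {1, 6}
b = lambda x_i: int(x_i in S) + 1
print([(x_i, b(x_i)) for x_i in [1, 2, 5, 6]])
# [(1, 2), (2, 1), (5, 1), (6, 2)] -> answer 2
```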


A Simple Data Set

[figure: C&RT vs. AdaBoost-Stump on a simple data set]

C&RT: ‘divide-and-conquer’


A Complicated Data Set

[figure: C&RT vs. AdaBoost-Stump on a complicated data set]

C&RT: even more efficient than AdaBoost-Stump


Practical Specialties of C&RT

• human-explainable
• multiclass easily
• categorical features easily
• missing features easily
• efficient non-linear training (and testing)

—almost no other learning model shares all such specialties, except for other decision trees

another popular decision tree algorithm: C4.5, with different choices of heuristics
