# Machine Learning Techniques (ᘤᢈ)


## Machine Learning Techniques (機器學習技法)

### Lecture 9: Decision Tree

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

## Roadmap

### 1 Embedding Numerous Features: Kernel Models

### 2 Combining Predictive Features: Aggregation Models

Lecture 9: Decision Tree

### 3 Distilling Implicit Features: Extraction Models

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 1/22

## What We Have Done

blending: aggregate after getting g_t; learning: aggregate as well as getting g_t

| aggregation type | blending | learning |
| --- | --- | --- |
| uniform | voting/averaging | Bagging |
| non-uniform | linear | AdaBoost |
| conditional | stacking | Decision Tree |

decision tree: a traditional learning model that realizes conditional aggregation

## Decision Tree for Watching MOOC Lectures

G(x) = Σ_{t=1}^{T} q_t(x) · g_t(x)

- base hypothesis g_t(x): leaf at the end of path t, a constant here
- condition q_t(x): [[is x on path t?]]
- usually with simple internal nodes

(figure: an example tree that decides Y/N on watching a MOOC lecture tonight, branching on conditions such as quitting time (<18:30 / between / >21:30) and deadline (>2 days / between / < −2 days))

decision tree: arguably one of the most human-mimicking models

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 3/22

## Recursive View of Decision Tree

Path View: G(x) = Σ_{t=1}^{T} [[x on path t]] · leaf_t(x)

(figure: the same MOOC-watching tree, now read recursively from the root node "quitting time?" down through nodes such as "has a date?")

Recursive View:

G(x) = Σ_{c=1}^{C} [[b(x) = c]] · G_c(x)

- G(x): full-tree hypothesis
- b(x): branching criteria
- G_c(x): sub-tree hypothesis at the c-th branch

tree = (root, sub-trees), just like what your data structure instructor would say :-)

## Disclaimers about Decision Tree

Usefulness:

- human-explainable
- simple
- efficient in prediction and training

Weaknesses:

- heuristic: mostly little theoretical explanation
- 'heuristics selection': confusing to beginners
- arguably no single representative algorithm

decision tree: mostly heuristic but useful on its own

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 5/22

## Fun Time

A short piece of C-like code (omitted here) can be viewed as a decision tree of three leaves; it expresses the boolean condition [[income > 100000 or debt ≤ 50000]]. What is the output of the tree for (income, debt) = (98765, 56789)?

1. true
2. false
3. 98765
4. 56789

Answer: 2. You can simply trace the code. The tree expresses a complicated boolean condition [[income > 100000 or debt ≤ 50000]]; since 98765 ≤ 100000 and 56789 > 50000, the output is false.
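A minimal Python stand-in for the three-leaf tree, assuming the structure implied by the stated condition [[income > 100000 or debt ≤ 50000]] (the function name `tree_output` is hypothetical, not from the slide):

```python
def tree_output(income, debt):
    """Hypothetical three-leaf tree for [[income > 100000 or debt <= 50000]]."""
    if income > 100000:
        return True      # leaf 1: income branch fires
    if debt <= 50000:
        return True      # leaf 2: debt branch fires
    return False         # leaf 3: neither condition holds
```

Tracing (income, debt) = (98765, 56789): neither branch fires, so the tree outputs false.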


## A Basic Decision Tree Algorithm

G(x) = Σ_{c=1}^{C} [[b(x) = c]] · G_c(x)

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^{N}):
  if termination criteria met:
    return base hypothesis g_t(x)
  else:
    1. learn branching criteria b(x)
    2. split D to C parts D_c = {(x_n, y_n) : b(x_n) = c}
    3. build sub-tree G_c ← DecisionTree(D_c)
    4. return G(x) = Σ_{c=1}^{C} [[b(x) = c]] · G_c(x)

four choices: number of branches, branching criteria, termination criteria, & base hypothesis
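The four-step recursion can be sketched in Python. The helper names (`learn_branching`, `terminate`, `base_hypothesis`) are placeholders for the algorithm's free choices, not part of the lecture's notation:

```python
def decision_tree(D, learn_branching, terminate, base_hypothesis):
    """Recursive decision tree learner (sketch). D is a list of (x, y) pairs;
    the three function arguments stand for the algorithm's free choices."""
    if terminate(D):
        g = base_hypothesis(D)                 # leaf: constant g_t
        return lambda x, g=g: g
    b = learn_branching(D)                     # branching criterion b(x) in {1..C}
    parts = {}
    for x, y in D:                             # split D into D_c = {(x,y): b(x)=c}
        parts.setdefault(b(x), []).append((x, y))
    subtrees = {c: decision_tree(Dc, learn_branching, terminate, base_hypothesis)
                for c, Dc in parts.items()}    # build sub-trees G_c
    return lambda x: subtrees[b(x)](x)         # G(x) = sum_c [[b(x)=c]] G_c(x)
```

For example, on 1-D data split by a fixed stump at 1.0, the recursion bottoms out once each part carries a single label.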

## Classification and Regression Tree (C&RT)

function DecisionTree(data D = {(x_n, y_n)}): if termination criteria met, return base hypothesis g_t(x); else split D to C parts D_c = {(x_n, y_n) : b(x_n) = c} ...

two simple choices:

- C = 2 (binary tree)
- base hypothesis g_t(x) = E_in-optimal constant: majority of {y_n} for classification (0/1 error), average of {y_n} for regression (squared error)

disclaimer: the C&RT here is based on selected components of CART™ of California Statistical Software

## Branching in C&RT: Purifying

function DecisionTree(data D = {(x_n, y_n)}): if termination criteria met, return g_t(x) = E_in-optimal constant; else learn branching criteria b(x) and split D to 2 parts D_c = {(x_n, y_n) : b(x_n) = c} ...

more simple choices:

- simple internal node for C = 2: {1, 2}-output decision stump
- 'easier' sub-tree: branch by purifying

b(x) = argmin over decision stumps h(x) of Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)

C&RT: bi-branching by purifying
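The argmin over decision stumps can be sketched as an exhaustive search over features and thresholds. The `(feature_index, threshold)` return format and the passed-in `impurity` function are conventions of this sketch:

```python
def best_stump(D, impurity):
    """Find (i, theta) minimizing sum_c |D_c with h| * impurity(D_c with h)
    over decision stumps h(x) = [[x_i <= theta]] (C&RT bi-branching sketch)."""
    best, best_score = None, float("inf")
    num_features = len(D[0][0])
    for i in range(num_features):
        for theta in sorted({x[i] for x, _ in D}):
            left = [(x, y) for x, y in D if x[i] <= theta]
            right = [(x, y) for x, y in D if x[i] > theta]
            if not left or not right:          # skip degenerate splits
                continue
            score = len(left) * impurity(left) + len(right) * impurity(right)
            if score < best_score:
                best, best_score = (i, theta), score
    return best
```

On a pure-split example, the weighted impurity reaches 0 at the separating threshold.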

## Impurity Functions

by E_in of the optimal constant:

- regression error: impurity(D) = (1/N) Σ_{n=1}^{N} (y_n − ȳ)², with ȳ = average of {y_n}
- classification error: impurity(D) = (1/N) Σ_{n=1}^{N} [[y_n ≠ y*]], with y* = majority of {y_n}

for classification:

- Gini index: 1 − Σ_{k=1}^{K} ((1/N) Σ_{n=1}^{N} [[y_n = k]])² (all k considered together)
- classification error: 1 − max_{1≤k≤K} (1/N) Σ_{n=1}^{N} [[y_n = k]] (the optimal k only)

popular choices: Gini for classification, regression error for regression
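The impurity measures above, written directly from the formulas; a sketch over a plain Python list `ys` of labels or regression targets:

```python
def regression_impurity(ys):
    """(1/N) sum (y_n - y_bar)^2, with y_bar the average of {y_n}."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys) / len(ys)

def gini_index(ys):
    """1 - sum_k (fraction of class k)^2: all k considered together."""
    n = len(ys)
    return 1.0 - sum((ys.count(k) / n) ** 2 for k in set(ys))

def classification_error(ys):
    """1 - fraction of the majority class: the optimal k only."""
    return 1.0 - max(ys.count(k) for k in set(ys)) / len(ys)
```

For a balanced two-class set, both Gini and classification error give 0.5; a pure set gives 0.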

## Termination in C&RT

function DecisionTree(data D = {(x_n, y_n)}): if termination criteria met, return g_t(x) = E_in-optimal constant; else learn b(x) = argmin over decision stumps h(x) of Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h) ...

'forced' to terminate when:

- all y_n the same: impurity = 0, so g_t(x) = that constant y_n
- all x_n the same: no decision stumps to branch with

C&RT: a fully-grown tree with constant leaves that come from bi-branching by purifying

## Fun Time

For the Gini index, impurity(D) = 1 − Σ_{k=1}^{K} ((1/N) Σ_{n=1}^{N} [[y_n = k]])². Consider K = 2, and let μ = N_1/N, where N_1 is the number of examples with y_n = 1. Which formula of μ equals the Gini index in this case?

Answer: 2μ(1 − μ). Simplify 1 − (μ² + (1 − μ)²) and the answer should pop up.
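A quick numeric check of the simplification 1 − (μ² + (1 − μ)²) = 2μ(1 − μ):

```python
def gini_two_class(mu):
    """Gini index for K = 2 when a fraction mu of the examples has y_n = 1."""
    return 1.0 - (mu ** 2 + (1.0 - mu) ** 2)
```

For instance, μ = 0.5 gives 0.5, the maximal impurity for two classes, and μ = 0 or 1 gives a pure set with impurity 0.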


## Basic C&RT Algorithm

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^{N}):
  if cannot branch anymore:
    return g_t(x) = E_in-optimal constant
  else:
    1. learn branching criteria b(x) = argmin over decision stumps h(x) of Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)
    2. split D to 2 parts D_c = {(x_n, y_n) : b(x_n) = c}
    3. build sub-tree G_c ← DecisionTree(D_c)
    4. return G(x) = Σ_{c=1}^{2} [[b(x) = c]] · G_c(x)

C&RT: easily handles binary classification, regression, & multi-class classification

## Regularization by Pruning

a fully-grown tree has E_in(G) = 0 if all x_n are different, but can overfit (large E_out) because the low-level trees are built with small D_c

need a regularizer, say Ω(G) = NumberOfLeaves(G); want a regularized decision tree:

argmin over all possible G of E_in(G) + λΩ(G)

(called the pruned decision tree)

cannot enumerate all possible G computationally; often consider only:

- G^(0) = fully-grown tree
- G^(i) = argmin_G E_in(G) such that G is one leaf removed from G^(i−1)

systematic choice of λ? validation
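The selection step argmin E_in(G) + λΩ(G) over the candidate chain G^(0), G^(1), ... can be sketched as below, with each candidate summarized by its (E_in, number-of-leaves) pair; this summary format is an assumption of the sketch:

```python
def pick_pruned_tree(candidates, lam):
    """candidates[i] = (E_in of G^(i), number of leaves of G^(i));
    return the index minimizing the regularized objective E_in + lam * leaves."""
    return min(range(len(candidates)),
               key=lambda i: candidates[i][0] + lam * candidates[i][1])
```

A larger λ favors the smaller trees later in the chain; the systematic choice of λ itself comes from validation.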

## Branching on Categorical Features

numerical features, e.g. blood pressure: 130, 98, 115, 147, 120

- decision stump: b(x) = [[x_i ≤ θ]] + 1, with threshold θ ∈ R

categorical features, e.g. major symptom: fever, pain, tired, sweaty

- decision subset: b(x) = [[x_i ∈ S]] + 1, with subset S ⊆ {1, 2, ..., K}

C&RT (& general decision trees): handles categorical features easily
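Both branching criteria in one Python sketch; the '+ 1' matches the slide's convention that the branch index lies in {1, 2}:

```python
def stump_branch(x, i, theta):
    """Numerical feature: b(x) = [[x_i <= theta]] + 1, branch in {1, 2}."""
    return int(x[i] <= theta) + 1

def subset_branch(x, i, S):
    """Categorical feature: b(x) = [[x_i in S]] + 1, branch in {1, 2}."""
    return int(x[i] in S) + 1
```

So a blood pressure of 98 with θ = 120 goes to branch 2, while a symptom outside S goes to branch 1.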

## Missing Features by Surrogate Branch

a possible branch: b(x) = [[weight ≤ 50 kg]]. if weight is missing during prediction, what would a human do? use a correlated feature instead, say a threshold on height

surrogate branch:

1. during training, also maintain surrogate branches b_1(x), b_2(x), ... that approximate the best branch b(x)
2. during prediction, when the feature needed by b(x) is missing, branch with a surrogate instead

C&RT: handles missing features easily
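One way to sketch prediction with surrogate branches. The convention that a branching function returns None on a missing feature is an assumption of this sketch, not the lecture's notation:

```python
def branch_with_surrogate(x, main_branch, surrogates):
    """Try the best branch b(x) first; on a missing feature (signalled here by
    the branching function returning None), fall back to surrogates in order."""
    for b in [main_branch] + surrogates:
        c = b(x)
        if c is not None:
            return c
    raise ValueError("no applicable branch for this example")
```

For example, a [[weight ≤ 50 kg]] branch can be backed by a correlated [[height ≤ θ]] surrogate when weight is absent.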

## Fun Time

For a categorical branching criterion b(x) = [[x_i ∈ S]] + 1 with S = {1, 6}, which of the following explains the criterion?

1. if the i-th feature is of type 1 or type 6, branch to the first sub-tree; else branch to the second sub-tree
2. if the i-th feature is of type 1 or type 6, branch to the second sub-tree; else branch to the first sub-tree
3. if the i-th feature is of type 1 and type 6, branch to the second sub-tree; else branch to the first sub-tree
4. if the i-th feature is of type 1 and type 6, branch to the first sub-tree; else branch to the second sub-tree

Answer: 2. Note that '∈ S' is an 'or'-style condition on the elements of S in human language, and the '+ 1' sends members of S to branch 2.


## A Simple Data Set

(figure: a sequence of frames showing C&RT recursively splitting a simple 2-D data set until the regions are pure)

### C&RT: 'divide-and-conquer'

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 18/22

## A Complicated Data Set

### C&RT: even more efficient than AdaBoost-Stump

## Practical Specialties of C&RT

- human-explainable
- handles multi-class classification easily
- handles categorical features easily
- handles missing features easily
- efficient non-linear training (and testing)

almost no other learning model shares all such specialties, except for other decision trees; another popular decision tree algorithm: C4.5, with different choices of heuristics

## Fun Time

Which of the following is not a specialty of C&RT without pruning?

1. handles missing features easily
2. produces explainable hypotheses
3. achieves low E_in
4. achieves low E_out

Answer: 4. The first two choices are easy; the third comes from the fact that a fully-grown C&RT greedily minimizes E_in (almost always to 0). But as you may imagine, overfitting may happen and E_out may not always be low.


## Summary

### 2 Combining Predictive Features: Aggregation Models

Lecture 9: Decision Tree. A decision tree realizes conditional aggregation through recursive branching; C&RT bi-branches by purifying toward E_in-optimal constant leaves, and adds simple heuristics for pruning, categorical features, and missing features.

### 3 Distilling Implicit Features: Extraction Models