# Machine Learning Techniques

### Lecture 9: Decision Tree

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

## Agenda

- Decision Tree Hypothesis
- Decision Tree Algorithm
- Decision Tree in Practice
- Decision Tree in Action


Decision Tree Decision Tree Hypothesis

## What We Have Done

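- blending: aggregate after getting $g_t$
- learning: aggregate as well as getting $g_t$

| aggregation type | blending | learning |
|---|---|---|
| uniform | voting/averaging | Bagging |
| non-uniform | linear | AdaBoost |
| conditional | stacking | Decision Tree |

decision tree: a traditional learning model that realizes conditional aggregation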


## Decision Tree for Playing Golf

$$G(\mathbf{x}) = \sum_{t=1}^{T} q_t(\mathbf{x}) \cdot g_t(\mathbf{x})$$

- base hypothesis $g_t(\mathbf{x})$: leaf at end of path $t$, a constant here
- condition $q_t(\mathbf{x})$: $[\![\text{is } \mathbf{x} \text{ on path } t?]\!]$
- usually with simple internal nodes

decision tree: arguably one of the most human-mimicking models
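The path view can be written out directly as a sum over (condition, leaf) pairs. The sketch below is only an illustration: the feature names (`outlook`, `humidity`, `windy`), the thresholds, and the leaf values are hypothetical and do not come from the slides.

```python
# Hypothetical golf-style tree; features, thresholds and leaves are made up.
# Each path t is a pair (q_t, g_t): a condition on x and a constant leaf.
PATHS = [
    (lambda x: x["outlook"] == "sunny" and x["humidity"] <= 70, +1),
    (lambda x: x["outlook"] == "sunny" and x["humidity"] > 70, -1),
    (lambda x: x["outlook"] == "overcast", +1),
    (lambda x: x["outlook"] == "rainy" and not x["windy"], +1),
    (lambda x: x["outlook"] == "rainy" and x["windy"], -1),
]


def G(x):
    """Path view: G(x) = sum_t q_t(x) * g_t(x), with q_t(x) in {0, 1}."""
    return sum(int(q(x)) * g for q, g in PATHS)


def paths_hit(x):
    """Number of paths whose condition holds; should be 1 for a valid tree."""
    return sum(int(q(x)) for q, _ in PATHS)
```

Because the conditions are mutually exclusive and cover every input, exactly one term of the sum is non-zero, so `G(x)` simply returns the constant at the leaf that `x` reaches.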


## Recursive View of Decision Tree

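Path View:

$$G(\mathbf{x}) = \sum_{t=1}^{T} [\![\mathbf{x} \text{ on path } t]\!] \cdot \text{leaf}_t(\mathbf{x})$$

Recursive View:

$$G(\mathbf{x}) = \sum_{c=1}^{C} [\![b(\mathbf{x}) = c]\!] \cdot G_c(\mathbf{x})$$

- $G(\mathbf{x})$: full-tree hypothesis
- $b(\mathbf{x})$: branching criteria
- $G_c(\mathbf{x})$: sub-tree hypothesis at the $c$-th branch

tree = (root, sub-trees)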
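The recursive view maps naturally onto a recursive data structure. The following is a minimal sketch, assuming a tree is encoded either as a constant (a leaf) or as a pair of a branching function and a dictionary of sub-trees; this encoding and the example features are hypothetical.

```python
def predict(tree, x):
    """Recursive view: G(x) = sum_c [[b(x) = c]] * G_c(x)."""
    if not isinstance(tree, tuple):       # leaf: constant base hypothesis
        return tree
    b, subtrees = tree                    # tree = (root, sub-trees)
    return predict(subtrees[b(x)], x)     # only the branch c = b(x) contributes


# Hypothetical example: branch on humidity first, then on wind.
tree = (
    lambda x: 1 if x["humidity"] <= 70 else 2,
    {
        1: +1,
        2: (lambda x: 1 if x["windy"] else 2, {1: -1, 2: +1}),
    },
)
```

Since exactly one indicator $[\![b(\mathbf{x}) = c]\!]$ equals 1, the sum reduces to following a single branch, which is what the recursive call does.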

## Disclaimers about Decision Tree

Usefulness

- human-explainable
- simple
- efficient in prediction and training

However

- heuristic: mostly with little theoretical explanation
- heuristics: 'heuristics selection' confusing to beginners
- arguably no single representative algorithm

decision tree: mostly heuristic but useful on its own


## A Basic Decision Tree Algorithm

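$$G(\mathbf{x}) = \sum_{c=1}^{C} [\![b(\mathbf{x}) = c]\!] \, G_c(\mathbf{x})$$

function DecisionTree(data $\mathcal{D} = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$)

if termination criteria met

- return base hypothesis $g_t(\mathbf{x})$

else

1. learn branching criteria $b(\mathbf{x})$
2. split $\mathcal{D}$ to $C$ parts $\mathcal{D}_c = \{(\mathbf{x}_n, y_n) : b(\mathbf{x}_n) = c\}$
3. build sub-tree $G_c \leftarrow \text{DecisionTree}(\mathcal{D}_c)$
4. return $G(\mathbf{x}) = \sum_{c=1}^{C} [\![b(\mathbf{x}) = c]\!] \, G_c(\mathbf{x})$

four choices: number of branches, branching criteria, termination criteria, & base hypothesis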
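To make the four choices concrete, the skeleton below takes them as parameters. It is a sketch under stated assumptions: the three plugged-in choices at the bottom (stop when all labels agree, majority leaf, threshold at the mean of a single numeric feature) are purely illustrative, and the guard against a branch that fails to split the data is an added safety measure that is not part of the pseudocode.

```python
from collections import Counter


def decision_tree(data, terminate, base_hypothesis, learn_branching):
    """Generic skeleton; the callables are the pluggable choices."""
    if terminate(data):
        return base_hypothesis(data)                 # leaf
    b = learn_branching(data)                        # 1. learn b(x)
    parts = {}
    for x, y in data:                                # 2. split D into parts D_c
        parts.setdefault(b(x), []).append((x, y))
    if len(parts) < 2:                               # added guard: no real split
        return base_hypothesis(data)
    subtrees = {c: decision_tree(part, terminate, base_hypothesis, learn_branching)
                for c, part in parts.items()}        # 3. build sub-trees G_c
    return (b, subtrees)                             # 4. G = (root, sub-trees)


def predict(tree, x):
    while isinstance(tree, tuple):
        b, subtrees = tree
        tree = subtrees[b(x)]
    return tree


# Illustrative choices for data with one numeric feature (assumptions).
def all_labels_equal(data):
    return len({y for _, y in data}) == 1


def majority(data):
    return Counter(y for _, y in data).most_common(1)[0][0]


def mean_threshold(data):
    theta = sum(x for x, _ in data) / len(data)
    return lambda x: 1 if x <= theta else 2          # C = 2 branches


data = [(1.0, -1), (2.0, -1), (3.0, +1), (4.0, +1)]
tree = decision_tree(data, all_labels_equal, majority, mean_threshold)
```

Swapping any of the three callables changes the resulting algorithm without touching the recursion, which is why different decision tree algorithms can share this outline.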


Decision Tree Decision Tree Algorithm

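## Classification and Regression Tree (C&RT)

function DecisionTree(data $\mathcal{D} = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$)

if termination criteria met

- return base hypothesis $g_t(\mathbf{x})$

else ...

2. split $\mathcal{D}$ to $C$ parts $\mathcal{D}_c = \{(\mathbf{x}_n, y_n) : b(\mathbf{x}_n) = c\}$

two simple choices

- $C = 2$ (binary tree)
- $g_t(\mathbf{x}) = E_{\text{in}}$-optimal constant
  - binary/multiclass classification (0/1 error): majority of $\{y_n\}$
  - regression (squared error): average of $\{y_n\}$

disclaimer: C&RT here is based on selected components of CART of California Statistical Software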
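The two $E_{\text{in}}$-optimal constants are short enough to state in code. This is a minimal sketch; the function names are chosen here for illustration.

```python
from collections import Counter


def optimal_constant_classification(ys):
    """Majority label: the constant with the smallest 0/1 error on ys."""
    return Counter(ys).most_common(1)[0][0]


def optimal_constant_regression(ys):
    """Average: the constant with the smallest squared error on ys."""
    return sum(ys) / len(ys)


def squared_error(ys, c):
    """Mean squared error of predicting the constant c for every y."""
    return sum((y - c) ** 2 for y in ys) / len(ys)
```

For example, on targets `[1.0, 2.0, 6.0]` the average is `3.0`, and nudging the constant in either direction can only increase `squared_error`.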


## Branching in C&RT: Purifying

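function DecisionTree(data $\mathcal{D} = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$)

if termination criteria met

- return base hypothesis $g_t(\mathbf{x}) = E_{\text{in}}$-optimal constant

else ...

1. learn branching criteria $b(\mathbf{x})$
2. split $\mathcal{D}$ to 2 parts $\mathcal{D}_c = \{(\mathbf{x}_n, y_n) : b(\mathbf{x}_n) = c\}$

more simple choices

- simple internal node for $C = 2$: $\{1, 2\}$-output decision stump
- 'easier' sub-tree: branch by purifying

$$b(\mathbf{x}) = \underset{\text{decision stumps } h(\mathbf{x})}{\operatorname{argmin}} \; \sum_{c=1}^{2} |\mathcal{D}_c \text{ with } h| \cdot \text{impurity}(\mathcal{D}_c \text{ with } h)$$

C&RT: bi-branching by purifying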
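The argmin over decision stumps can be sketched as an exhaustive search over features and thresholds. Assumptions in this sketch: features are numeric, candidate thresholds are midpoints between consecutive distinct values, and the Gini index is used as the default impurity; none of these details are fixed by the formula itself.

```python
def gini(ys):
    """Gini index of a list of labels."""
    n = len(ys)
    return 1.0 - sum((ys.count(k) / n) ** 2 for k in set(ys))


def best_stump(X, y, impurity=gini):
    """Return (feature i, threshold theta, score) minimising
    sum_c |D_c| * impurity(D_c) for the stump b(x) = [[x_i <= theta]] + 1,
    or None when no feature has two distinct values."""
    best = None
    for i in range(len(X[0])):
        values = sorted(set(row[i] for row in X))
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2                     # midpoint threshold
            part_le = [t for row, t in zip(X, y) if row[i] <= theta]
            part_gt = [t for row, t in zip(X, y) if row[i] > theta]
            score = (len(part_le) * impurity(part_le)
                     + len(part_gt) * impurity(part_gt))
            if best is None or score < best[2]:
                best = (i, theta, score)
    return best
```

Weighting each part's impurity by its size keeps the search from favouring a split that isolates one example in a tiny, trivially pure part while leaving the large part mixed.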


## Impurity Functions

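by $E_{\text{in}}$ of optimal constant

- regression error: $\text{impurity}(\mathcal{D}) = \frac{1}{N} \sum_{n=1}^{N} (y_n - \bar{y})^2$ with $\bar{y}$ = average of $\{y_n\}$
- classification error: $\text{impurity}(\mathcal{D}) = \frac{1}{N} \sum_{n=1}^{N} [\![y_n \ne y^*]\!]$ with $y^*$ = majority of $\{y_n\}$

for classification

- Gini index: $1 - \sum_{k=1}^{K} \left( \frac{\sum_{n=1}^{N} [\![y_n = k]\!]}{N} \right)^2$ (all $k$ considered together)
- classification error: $1 - \max_{1 \le k \le K} \frac{\sum_{n=1}^{N} [\![y_n = k]\!]}{N}$ (optimal $k = y^*$ only)

popular choices: Gini for classification, regression error for regression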
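The three impurity functions translate almost line by line into code. A minimal sketch, with function names chosen here for illustration:

```python
from collections import Counter


def regression_error(ys):
    """(1/N) * sum_n (y_n - mean)^2."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def classification_error(ys):
    """1 - (fraction of the majority class); only the best class k matters."""
    return 1.0 - max(Counter(ys).values()) / len(ys)


def gini_index(ys):
    """1 - sum_k (fraction of class k)^2; every class contributes."""
    n = len(ys)
    return 1.0 - sum((count / n) ** 2 for count in Counter(ys).values())
```

On the labels `["a", "a", "a", "b"]` the classification error is 1 - 3/4 = 0.25 and the Gini index is 1 - (9/16 + 1/16) = 0.375; both are 0 on a pure set.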


## Termination in C&RT

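function DecisionTree(data $\mathcal{D} = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$)

if termination criteria met

- return base hypothesis $g_t(\mathbf{x}) = E_{\text{in}}$-optimal constant

else ...

1. learn branching criteria $b(\mathbf{x}) = \underset{\text{decision stumps } h(\mathbf{x})}{\operatorname{argmin}} \; \sum_{c=1}^{2} |\mathcal{D}_c \text{ with } h| \cdot \text{impurity}(\mathcal{D}_c \text{ with } h)$

'forced' to terminate when

- all $y_n$ the same: impurity $= 0 \Longrightarrow g_t(\mathbf{x}) = y_n$
- all $\mathbf{x}_n$ the same: no decision stumps

C&RT: fully-grown tree with constant leaves that come from bi-branching by purifying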


## Basic C&RT Algorithm

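function DecisionTree(data $\mathcal{D} = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$)

if cannot branch anymore

- return $g_t(\mathbf{x}) = E_{\text{in}}$-optimal constant

else

1. learn branching criteria $b(\mathbf{x}) = \underset{\text{decision stumps } h(\mathbf{x})}{\operatorname{argmin}} \; \sum_{c=1}^{2} |\mathcal{D}_c \text{ with } h| \cdot \text{impurity}(\mathcal{D}_c \text{ with } h)$
2. split $\mathcal{D}$ to 2 parts $\mathcal{D}_c = \{(\mathbf{x}_n, y_n) : b(\mathbf{x}_n) = c\}$
3. build sub-tree $G_c \leftarrow \text{DecisionTree}(\mathcal{D}_c)$
4. return $G(\mathbf{x}) = \sum_{c=1}^{2} [\![b(\mathbf{x}) = c]\!] \, G_c(\mathbf{x})$

easily handle binary classification, regression, & multi-class classification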
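Putting the pieces together, a fully-grown classification tree fits in a few dozen lines. This is a sketch rather than a reference implementation: it assumes numeric features, Gini impurity, midpoint thresholds, and labels that are not Python tuples (tuples are reserved for internal nodes in this hypothetical encoding).

```python
from collections import Counter


def gini(ys):
    n = len(ys)
    return 1.0 - sum((count / n) ** 2 for count in Counter(ys).values())


def best_stump(X, y):
    """Best (score, feature, threshold) by weighted Gini, or None."""
    best = None
    for i in range(len(X[0])):
        values = sorted(set(row[i] for row in X))
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2
            part_le = [t for row, t in zip(X, y) if row[i] <= theta]
            part_gt = [t for row, t in zip(X, y) if row[i] > theta]
            score = len(part_le) * gini(part_le) + len(part_gt) * gini(part_gt)
            if best is None or score < best[0]:
                best = (score, i, theta)
    return best


def cart(X, y):
    """Grow the tree until no further branching is possible."""
    stump = None if len(set(y)) == 1 else best_stump(X, y)
    if stump is None:          # all y_n the same, or all x_n the same
        return Counter(y).most_common(1)[0][0]      # majority leaf
    _, i, theta = stump
    part_le = [(row, t) for row, t in zip(X, y) if row[i] <= theta]
    part_gt = [(row, t) for row, t in zip(X, y) if row[i] > theta]
    return (i, theta,
            cart([r for r, _ in part_le], [t for _, t in part_le]),
            cart([r for r, _ in part_gt], [t for _, t in part_gt]))


def predict(tree, x):
    while isinstance(tree, tuple):
        i, theta, sub_le, sub_gt = tree
        tree = sub_le if x[i] <= theta else sub_gt
    return tree
```

Every threshold lies strictly between two distinct feature values, so both parts of a split are non-empty and the recursion always terminates. For regression, replacing `gini` with the regression error and the majority leaf with the average would follow the same outline.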


Decision Tree Decision Tree in Practice

## Regularization by Pruning

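fully-grown tree: $E_{\text{in}}(G) = 0$ if all $\mathbf{x}_n$ different

but overfit (large $E_{\text{out}}$) because low-level trees built with small $\mathcal{D}_c$

- need a regularizer, say, $\Omega(G) = \text{NumberOfLeaves}(G)$
- want regularized decision tree, called pruned decision tree:

$$\underset{\text{all possible } G}{\operatorname{argmin}} \; E_{\text{in}}(G) + \lambda \Omega(G)$$

- cannot enumerate all possible $G$ computationally; often consider only
  - $G^{(0)}$ = fully-grown tree
  - $G^{(i)} = \operatorname{argmin}_{G} E_{\text{in}}(G)$ such that $G$ is one-leaf removed from $G^{(i-1)}$

systematic choice of $\lambda$? validation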
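The restricted candidate set $G^{(0)}, G^{(1)}, \ldots$ can be generated by repeatedly merging two sibling leaves into their parent. The sketch below assumes a hypothetical tuple encoding in which every internal node also stores the majority label of the training data that reached it; picking the final tree by validation error stands in for choosing $\lambda$.

```python
# Hypothetical encoding:
#   ("leaf", label)
#   ("node", feature, theta, majority_label, subtree_le, subtree_gt)

def predict(tree, x):
    while tree[0] == "node":
        _, i, theta, _, sub_le, sub_gt = tree
        tree = sub_le if x[i] <= theta else sub_gt
    return tree[1]


def error(tree, X, y):
    """Fraction of examples the tree misclassifies."""
    return sum(predict(tree, x) != t for x, t in zip(X, y)) / len(y)


def count_leaves(tree):
    if tree[0] == "leaf":
        return 1
    return count_leaves(tree[4]) + count_leaves(tree[5])


def one_leaf_removed(tree):
    """Yield each tree obtained by merging one pair of sibling leaves."""
    if tree[0] == "leaf":
        return
    kind, i, theta, label, sub_le, sub_gt = tree
    if sub_le[0] == "leaf" and sub_gt[0] == "leaf":
        yield ("leaf", label)
    for pruned in one_leaf_removed(sub_le):
        yield (kind, i, theta, label, pruned, sub_gt)
    for pruned in one_leaf_removed(sub_gt):
        yield (kind, i, theta, label, sub_le, pruned)


def pruning_path(tree, X, y):
    """G(0) = tree; G(i) = one-leaf-removed tree from G(i-1) with least E_in."""
    path = [tree]
    while path[-1][0] == "node":
        path.append(min(one_leaf_removed(path[-1]), key=lambda g: error(g, X, y)))
    return path


def select_by_validation(path, X_val, y_val):
    return min(path, key=lambda g: error(g, X_val, y_val))


# Hand-built tree that fits a noisy label at x = 5 (illustrative data).
X_train = [[1], [2], [3], [4], [5], [6]]
y_train = [-1, -1, +1, +1, -1, +1]
full = ("node", 0, 2.5, +1,
        ("leaf", -1),
        ("node", 0, 4.5, +1,
         ("leaf", +1),
         ("node", 0, 5.5, +1, ("leaf", -1), ("leaf", +1))))
path = pruning_path(full, X_train, y_train)
```

Each step removes exactly one leaf, so a tree with $L$ leaves yields a path of $L$ candidates, and validation only has to compare those instead of all possible trees.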


## Branching on Categorical Features

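numerical features

- blood pressure: 130, 98, 115, 147, 120
- branching for numerical: decision stump $b(\mathbf{x}) = [\![x_i \le \theta]\!] + 1$ with $\theta \in \mathbb{R}$

categorical features

- major symptom: fever, pain, tired, sweaty
- branching for categorical: decision subset $b(\mathbf{x}) = [\![x_i \in S]\!] + 1$ with $S \subset \{1, 2, \ldots, K\}$

C&RT (& general decision trees): handles categorical features easily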
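A decision subset can be learned the same way as a decision stump, by minimising the size-weighted impurity of the two parts. The sketch below assumes a brute-force search over all non-trivial subsets, which is only practical for a small number of categories $K$, and uses Gini impurity; the symptom labels follow the example above, while the class labels are made up.

```python
from collections import Counter
from itertools import combinations


def gini(ys):
    n = len(ys)
    return 1.0 - sum((count / n) ** 2 for count in Counter(ys).values())


def best_subset(values, y):
    """Search subsets S for b(x) = [[x in S]] + 1; return (score, S)."""
    categories = sorted(set(values))
    best = None
    for size in range(1, len(categories)):           # skip empty and full sets
        for candidate in combinations(categories, size):
            inside = [t for v, t in zip(values, y) if v in candidate]
            outside = [t for v, t in zip(values, y) if v not in candidate]
            score = len(inside) * gini(inside) + len(outside) * gini(outside)
            if best is None or score < best[0]:
                best = (score, set(candidate))
    return best


symptoms = ["fever", "pain", "tired", "sweaty", "fever", "tired"]
labels = [1, -1, -1, 1, 1, -1]                        # illustrative labels
score, S = best_subset(symptoms, labels)


def branch(value):
    """Decision subset: 2 when the value is in S, 1 otherwise."""
    return int(value in S) + 1
```

Here the search finds `S = {"fever", "sweaty"}`, which separates the two classes perfectly; its complement would do equally well, and the first one found is kept.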


## Missing Features by Surrogate Branch

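possible $b(\mathbf{x}) = [\![\text{weight} \le 50\text{kg}]\!]$

if weight missing during prediction:

- what would human do?
  1. go get the weight
  2. or, use a threshold on a related feature (say, height) instead, because it splits the data almost like the threshold on weight
- surrogate branch:
  - maintain surrogate branches $b_1(\mathbf{x}), b_2(\mathbf{x}), \ldots \approx$ best branch $b(\mathbf{x})$ during training
  - allow missing feature for $b(\mathbf{x})$ during prediction by using a surrogate instead

C&RT: handles missing features easily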
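One way to picture a surrogate branch: during training, look for a stump on another feature that agrees with the best branch on as many training inputs as possible, and fall back to it when the primary feature is missing. The sketch below keeps a single surrogate and uses made-up (weight, height) data; full CART implementations typically rank several surrogates, which this sketch does not attempt.

```python
def stump(feature, theta):
    """Decision stump on one feature: branch 1 if x[feature] <= theta, else 2."""
    return lambda x: 1 if x[feature] <= theta else 2


def best_surrogate(X, primary, feature):
    """Threshold on `feature` whose stump agrees most often with `primary`
    on the training inputs; returns (agreements, theta) or None."""
    values = sorted(set(row[feature] for row in X))
    best = None
    for lo, hi in zip(values, values[1:]):
        theta = (lo + hi) / 2
        candidate = stump(feature, theta)
        agree = sum(candidate(row) == primary(row) for row in X)
        if best is None or agree > best[0]:
            best = (agree, theta)
    return best


def branch_with_surrogate(x, primary_feature, primary, surrogate):
    """Use the primary branch unless its feature is missing (None)."""
    return primary(x) if x[primary_feature] is not None else surrogate(x)


# Illustrative training inputs: [weight in kg, height in cm].
X = [[45, 150], [48, 155], [52, 165], [60, 170], [70, 180], [49, 168]]
primary = stump(0, 50)                       # best branch: weight <= 50kg
agreements, theta = best_surrogate(X, primary, feature=1)
surrogate = stump(1, theta)                  # threshold on height
```

With this data the height threshold 160 reproduces the weight branch on 5 of the 6 training inputs, so a patient with unknown weight and height 150 is sent down the same branch as a light patient.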


## A Simple Data Set


Decision Tree Decision Tree in Action


## A Complicated Data Set


## Practical Specialties of C&RT

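- multi-class classification easily
- categorical features easily
- missing features easily

almost no other learning model shares these specialties, except for other decision trees

another popular decision tree algorithm: C4.5, with different choices of heuristics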

