# Machine Learning Techniques (ᘤᢈ)

(1)

## ( 機器學習技法)

### Lecture 9: Decision Tree

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

### ( 國立台灣大學資訊工程系)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 0/22

## Roadmap

Combining Predictive Features: Aggregation Models

### Lecture 9: Decision Tree

1. Decision Tree Hypothesis
2. Decision Tree Algorithm
3. Decision Tree Heuristics in C&RT
4. Decision Tree in Action

## What We Have Done

- blending: aggregate after getting g_t
- learning: aggregate as well as getting g_t

| aggregation type | blending | learning |
|---|---|---|
| uniform | voting/averaging | Bagging |
| non-uniform | linear | AdaBoost |
| conditional | stacking | decision tree |

decision tree: a traditional learning model that realizes conditional aggregation

## Decision Tree for Watching MOOC Lectures

G(x) = Σ_{t=1}^{T} q_t(x) · g_t(x)

- base hypothesis g_t(x): leaf at end of path t, a constant here
- condition q_t(x): ⟦is x on path t?⟧
- usually with simple internal nodes

(figure: a tree that decides Y/N on watching a lecture, with internal nodes such as "quitting time?" branching on <18:30 / between / >21:30 and a deadline node branching on >2 days / between / < −2 days)

decision tree: arguably one of the most human-mimicking models

## Recursive View of Decision Tree

Path View: G(x) = Σ_{t=1}^{T} ⟦x on path t⟧ · leaf_t(x)

(figure: the same tree, with internal nodes such as "quitting time?" branching on <18:30 / between / >21:30, "has a date?" branching on true / false, and a deadline node branching on >2 days / between / < −2 days)

Recursive View: G(x) = Σ_{c=1}^{C} ⟦b(x) = c⟧ · G_c(x)

- G(x): full-tree hypothesis
- b(x): branching criteria
- G_c(x): sub-tree hypothesis at the c-th branch

tree = (root, sub-trees), just like what your data structure instructor would say :-)
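The recursive view above (tree = root plus sub-trees) maps directly onto a data structure. A minimal Python sketch, with illustrative names that are not from the lecture:

```python
# A minimal sketch of the recursive view: a tree is (branching criterion,
# sub-trees), and G(x) = sum_c [[b(x) = c]] * G_c(x). Since exactly one
# branch fires per input, evaluation just follows the selected sub-tree.

def make_leaf(constant):
    # leaf: a constant base hypothesis g_t(x)
    return {"leaf": constant}

def make_node(branch, subtrees):
    # internal node: branching criterion b(x) in {0, ..., C-1} plus sub-trees
    return {"branch": branch, "subtrees": subtrees}

def G(tree, x):
    # recursive view: delegate x to the sub-tree selected by b(x)
    if "leaf" in tree:
        return tree["leaf"]
    c = tree["branch"](x)
    return G(tree["subtrees"][c], x)

# toy tree: watch the lecture ("Y") unless quitting time is after 21:30
toy = make_node(lambda x: 0 if x["quitting_time"] <= 21.5 else 1,
                [make_leaf("Y"), make_leaf("N")])
```

Because the branches at each node partition the input space, the path view (sum over all root-to-leaf paths) and the recursive view compute the same G(x).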

## Disclaimers about Decision Tree

Usefulness:

- human-explainable
- simple
- efficient in prediction and training

However:

- heuristic: mostly little theoretical explanations
- heuristics: 'heuristics selection' confusing to beginners
- arguably no single representative algorithm

decision tree: mostly heuristic but useful on its own

## Fun Time

The following C-like code can be viewed as a decision tree of three leaves. What is the output of the tree for (income, debt) = (98765, 56789)?

(code listing not recoverable from the source)

Answer: you can simply trace the code. The tree expresses a complicated boolean condition ⟦income > 100000 or debt ≤ 50000⟧, which is false for (income, debt) = (98765, 56789).
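Since the original listing is lost, here is a hypothetical three-leaf reconstruction, assumed only to be consistent with the stated condition ⟦income > 100000 or debt ≤ 50000⟧; the function name and leaf values are mine:

```python
# Hypothetical reconstruction (the original C-like listing was lost in
# extraction): a three-leaf decision tree expressing
# [[income > 100000 or debt <= 50000]].

def approve(income, debt):
    if income > 100000:
        return True    # leaf 1
    elif debt <= 50000:
        return True    # leaf 2
    else:
        return False   # leaf 3
```

For (income, debt) = (98765, 56789), neither condition holds, so the third leaf fires.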

## A Basic Decision Tree Algorithm

G(x) = Σ_{c=1}^{C} ⟦b(x) = c⟧ · G_c(x)

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^{N})
    if termination criteria met
        return base hypothesis g_t(x)
    else
        1. learn branching criteria b(x)
        2. split D to C parts D_c = {(x_n, y_n) : b(x_n) = c}
        3. build sub-tree G_c ← DecisionTree(D_c)
        4. return G(x) = Σ_{c=1}^{C} ⟦b(x) = c⟧ · G_c(x)

four choices: number of branches, branching criteria, termination criteria, & base hypothesis

## Classification and Regression Tree (C&RT)

function DecisionTree(data D = {(x_n, y_n)})
    if termination criteria met
        return base hypothesis g_t(x)
    else ...
        split D to C parts D_c = {(x_n, y_n) : b(x_n) = c}

two simple choices:

- C = 2 (binary tree)
- base hypothesis g_t(x) = E_in-optimal constant
  - binary/multiclass classification: majority of {y_n}
  - regression: average of {y_n}

disclaimer: the C&RT presented here is based on selected components of CART™ of California Statistical Software

## Branching in C&RT: Purifying

function DecisionTree(data D = {(x_n, y_n)})
    if termination criteria met
        return base hypothesis g_t(x) = E_in-optimal constant
    else ...
        learn branching criteria b(x)
        split D to 2 parts D_c = {(x_n, y_n) : b(x_n) = c}

- simple internal node for C = 2: decision stump
- 'easier' sub-tree: branch by purifying

b(x) = argmin over decision stumps h(x) of Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)

C&RT: bi-branching by purifying
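The branching rule above is a search over decision stumps for the split with the lowest size-weighted impurity. A minimal sketch, assuming the Gini index as the impurity function; the helper names are mine:

```python
# Sketch of C&RT branching by purifying: pick the decision stump
# b(x) = [[x_i <= theta]] minimizing the size-weighted impurity of the
# two resulting parts, |D_1|*impurity(D_1) + |D_2|*impurity(D_2).

def gini(ys):
    # Gini index: 1 - sum_k (fraction of class k)^2
    n = len(ys)
    return 1.0 - sum((ys.count(k) / n) ** 2 for k in set(ys))

def best_stump(xs, ys):
    # xs: list of feature vectors, ys: labels; try thresholds between
    # consecutive sorted feature values on every dimension i
    best = (float("inf"), None, None)  # (weighted impurity, i, theta)
    for i in range(len(xs[0])):
        vals = sorted(set(x[i] for x in xs))
        for lo, hi in zip(vals, vals[1:]):
            theta = (lo + hi) / 2
            left = [y for x, y in zip(xs, ys) if x[i] <= theta]
            right = [y for x, y in zip(xs, ys) if x[i] > theta]
            score = len(left) * gini(left) + len(right) * gini(right)
            if score < best[0]:
                best = (score, i, theta)
    return best[1], best[2]
```

Placing thresholds strictly between distinct feature values guarantees both parts are non-empty, so the weighted impurity is always well-defined.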

## Impurity Functions

by E_in of the optimal constant:

- regression error: impurity(D) = (1/N) Σ_{n=1}^{N} (y_n − ȳ)², with ȳ = average of {y_n}
- classification error: impurity(D) = (1/N) Σ_{n=1}^{N} ⟦y_n ≠ y*⟧, with y* = majority of {y_n}

for classification:

- Gini index: 1 − Σ_{k=1}^{K} ( (1/N) Σ_{n=1}^{N} ⟦y_n = k⟧ )² —all k considered together
- classification error: 1 − max_{1≤k≤K} (1/N) Σ_{n=1}^{N} ⟦y_n = k⟧ —optimal k = y* only

popular choices: Gini index for classification, regression error for regression
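The impurity functions above can be written down directly. A self-contained sketch (function names are mine): the first two are the in-sample error of the E_in-optimal constant on D, and the third is the Gini index, which weighs all K classes together.

```python
def regression_impurity(ys):
    # squared error against the optimal constant: the average of {y_n}
    ybar = sum(ys) / len(ys)
    return sum((y - ybar) ** 2 for y in ys) / len(ys)

def classification_impurity(ys):
    # 0/1 error against the optimal constant: the majority class y*
    majority = max(set(ys), key=ys.count)
    return sum(1 for y in ys if y != majority) / len(ys)

def gini_index(ys):
    # 1 - sum_k (N_k / N)^2 -- all k considered together
    n = len(ys)
    return 1.0 - sum((ys.count(k) / n) ** 2 for k in set(ys))
```

For K = 2 with class-1 fraction µ, the Gini index reduces to 2µ(1 − µ), which is the point of the Fun Time question below.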

## Termination in C&RT

function DecisionTree(data D = {(x_n, y_n)})
    if termination criteria met
        return base hypothesis g_t(x) = E_in-optimal constant
    else ...
        learn branching criteria
        b(x) = argmin over decision stumps h(x) of Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)

forced to terminate when:

- all y_n the same: impurity = 0 ⇒ g_t(x) = y_n
- all x_n the same: no decision stumps

C&RT: fully-grown tree with constant leaves that come from bi-branching by purifying

## Fun Time

For the Gini index, impurity(D) = 1 − Σ_{k=1}^{K} ( (1/N) Σ_{n=1}^{N} ⟦y_n = k⟧ )². Consider K = 2, and let µ = N_1/N, where N_1 is the number of examples with y_n = 1. Which of the following formulas of µ equals the Gini index in this case?

Answer: 2µ(1 − µ). Simplify 1 − (µ² + (1 − µ)²) and the answer should pop up.

Decision Tree Decision Tree Heuristics in C&RT

## Basic C&RT Algorithm

function

data D = {(x

,y

)}

 if

return

(x) = E

-optimal

else

learn

argmin

X

|D

with h| ·

with h)

split D to

parts

= {(x

,y

) :

=c}

build sub-tree

)

return

P

J

=cK

### G c

(x)

easily handle binary classification, regression, &

### multi-class classification

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 13/22

(64)

function

data D = {(x

,y

)}

 if

return

(x) = E

-optimal

else

learn

argmin

X

|D

with h| ·

with h)

split D to

parts

= {(x

,y

) :

=c}

build sub-tree

)

return

P

J

=cK

### G c

(x)
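Putting the four choices together, here is a compact sketch of the basic algorithm for classification with the Gini index. All names are mine, and this is a sketch under those assumptions, not the CART™ implementation:

```python
def gini(ys):
    # Gini index: 1 - sum_k (fraction of class k)^2
    n = len(ys)
    return 1.0 - sum((ys.count(k) / n) ** 2 for k in set(ys))

def cart(xs, ys):
    # termination: all y_n the same (impurity 0) or all x_n the same
    if len(set(ys)) == 1 or all(x == xs[0] for x in xs):
        return {"leaf": max(set(ys), key=ys.count)}  # E_in-optimal constant
    # 1. learn branching criterion b(x) = [[x_i <= theta]] by purifying
    best = None
    for i in range(len(xs[0])):
        vals = sorted(set(x[i] for x in xs))
        for lo, hi in zip(vals, vals[1:]):
            theta = (lo + hi) / 2
            left = [(x, y) for x, y in zip(xs, ys) if x[i] <= theta]
            right = [(x, y) for x, y in zip(xs, ys) if x[i] > theta]
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right]))
            if best is None or score < best[0]:
                best = (score, i, theta, left, right)
    _, i, theta, left, right = best
    # 2.-3. split D into two parts and build the sub-trees recursively
    return {"i": i, "theta": theta,
            "le": cart([x for x, _ in left], [y for _, y in left]),
            "gt": cart([x for x, _ in right], [y for _, y in right])}

def predict(tree, x):
    # 4. G(x) = sum_c [[b(x) = c]] * G_c(x): follow the branch that fires
    if "leaf" in tree:
        return tree["leaf"]
    branch = "le" if x[tree["i"]] <= tree["theta"] else "gt"
    return predict(tree[branch], x)
```

Swapping `gini` for a regression-error impurity and the majority leaf for an average leaf turns the same skeleton into a regression tree, which is exactly why C&RT handles both settings.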

## Regularization by Pruning

- fully-grown tree: E_in(G) = 0 if all x_n are different
- but overfits (large E_out) because low-level trees are built with very few examples
- need a regularizer, say, Ω(G) = NumberOfLeaves(G)
- want a regularized decision tree: argmin over all possible G of E_in(G) + λΩ(G)
  —called the pruned decision tree
- cannot enumerate all possible G computationally:
  —often consider only
  G^(0) = fully-grown tree
  G^(i) = argmin over G of E_in(G) such that G is one-leaf removed from G^(i−1)

systematic choice of λ? validation
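The pruning chain above can be sketched in a simplified form. This is my own simplification: instead of removing exactly one leaf, a whole internal node is collapsed into a leaf, and the `maj` field (the majority label of the data that reached each node) is my own device for making the collapse possible:

```python
def leaves(t):
    # Omega(G): number of leaves of the tree
    return 1 if "leaf" in t else leaves(t["le"]) + leaves(t["gt"])

def predict(t, x):
    if "leaf" in t:
        return t["leaf"]
    return predict(t["le" if x[t["i"]] <= t["theta"] else "gt"], x)

def e_in(t, xs, ys):
    return sum(predict(t, x) != y for x, y in zip(xs, ys)) / len(ys)

def pruned_once(t):
    # all trees obtainable by collapsing one internal node into a leaf
    if "leaf" in t:
        return []
    out = [{"leaf": t["maj"]}]
    for side in ("le", "gt"):
        for sub in pruned_once(t[side]):
            clone = dict(t)
            clone[side] = sub
            out.append(clone)
    return out

def regularized_tree(t, xs, ys, lam):
    # G^(0) = fully grown; G^(i) = argmin of E_in among trees one collapse
    # away from G^(i-1); return the argmin of E_in(G) + lam * Omega(G)
    chain = [t]
    while "leaf" not in chain[-1]:
        chain.append(min(pruned_once(chain[-1]),
                         key=lambda g: e_in(g, xs, ys)))
    return min(chain, key=lambda g: e_in(g, xs, ys) + lam * leaves(g))

# toy fully-grown tree on three 1-d points with labels 0, 1, 0
xs, ys = [[1.0], [2.0], [3.0]], [0, 1, 0]
toy = {"i": 0, "theta": 1.5, "maj": 0, "le": {"leaf": 0},
       "gt": {"i": 0, "theta": 2.5, "maj": 0, "le": {"leaf": 1},
              "gt": {"leaf": 0}}}
```

With λ = 0 the fully-grown tree wins; as λ grows, the penalty on the leaf count pushes the argmin toward heavily pruned candidates.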

## Branching on Categorical Features

numerical features — e.g. blood pressure: 130, 98, 115, 147, 120
branching by decision stump: b(x) = ⟦x_i ≤ θ⟧ + 1, with θ ∈ R

categorical features — e.g. major symptom: fever, pain, tired, sweaty
branching by decision subset: b(x) = ⟦x_i ∈ S⟧ + 1, with S a subset of the categories

C&RT (& general decision trees): handles categorical features easily
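The decision-subset branch above can be searched the same way as a decision stump. A minimal sketch, assuming Gini impurity; the helper names are mine, and the exhaustive subset search is only practical for a small number of categories:

```python
from itertools import combinations

# Sketch of the "decision subset" branch for a categorical feature:
# b(x) = [[x_i in S]] + 1, with S chosen to minimize the size-weighted
# Gini impurity of the two resulting parts.

def gini(ys):
    n = len(ys)
    return 1.0 - sum((ys.count(k) / n) ** 2 for k in set(ys))

def best_subset(xs_i, ys):
    # xs_i: one categorical feature column; try every nonempty proper
    # subset S of the observed categories
    cats = sorted(set(xs_i))
    best = (float("inf"), None)
    for r in range(1, len(cats)):
        for S in combinations(cats, r):
            inside = [y for v, y in zip(xs_i, ys) if v in S]
            outside = [y for v, y in zip(xs_i, ys) if v not in S]
            score = len(inside) * gini(inside) + len(outside) * gini(outside)
            if score < best[0]:
                best = (score, set(S))
    return best[1]
```

The only difference from the numerical case is the family of branching criteria being searched; the purifying objective is unchanged.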

## Missing Features by Surrogate Branch

possible branching criterion: b(x) = ⟦weight ≤ 50kg⟧

if weight is missing during prediction:

- what would a human do? go by a similar feature instead
- surrogate branch:
  1. maintain surrogate branches b_1(x), b_2(x), … that approximate the best branch b(x) during training
  2. allow a surrogate branch to substitute when b(x) cannot be evaluated during prediction

C&RT: handles missing features easily
