Decision Tree / Decision Tree Algorithm

Classification and Regression Tree (C&RT)

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^N)
    if termination criteria met
        return base hypothesis g_t(x) = E_in-optimal constant
    else
        1. learn branching criteria b(x)
        2. split D to C parts D_c = {(x_n, y_n) : b(x_n) = c}
        3. build sub-tree G_c <- DecisionTree(D_c)
        4. return G(x) = Σ_{c=1}^{C} [b(x) = c] G_c(x)

two simple choices:
• C = 2 (binary tree)
• g_t(x) = E_in-optimal constant

disclaimer: C&RT here is based on selected components of CART™ of California Statistical Software

Hsuan-Tien Lin (NTU CSIE)    Machine Learning Techniques    8/22
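The four-step recursion above can be sketched in plain Python. This is a minimal illustration, not the course's reference code: the branching criterion here is a fixed median split on a scalar feature, standing in for the decision-stump learning described on the later slides, and `max_depth` is an illustrative termination criterion.

```python
# Hypothetical sketch of the C&RT recursion on this slide (C = 2).
from statistics import median, mode

def decision_tree(D, depth=0, max_depth=2):
    """D is a list of (x, y) pairs with scalar x; returns a predictor g(x)."""
    ys = [y for _, y in D]
    # termination criteria met: return E_in-optimal constant (majority for 0/1 error)
    if depth == max_depth or len(set(ys)) == 1:
        g_t = mode(ys)
        return lambda x: g_t
    theta = median(x for x, _ in D)          # stand-in branching criterion b(x)
    parts = {1: [(x, y) for x, y in D if x <= theta],
             2: [(x, y) for x, y in D if x > theta]}
    if not parts[1] or not parts[2]:         # degenerate split: stop
        g_t = mode(ys)
        return lambda x: g_t
    subtree = {c: decision_tree(parts[c], depth + 1, max_depth) for c in (1, 2)}
    # G(x) = sum_c [b(x) = c] G_c(x): route x to the matching sub-tree
    return lambda x: subtree[1](x) if x <= theta else subtree[2](x)

G = decision_tree([(1, -1), (2, -1), (7, +1), (9, +1)])
```

Calling `G(0)` routes through the left sub-tree, `G(10)` through the right one.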


Decision Tree / Decision Tree Algorithm

Branching in C&RT: Purifying

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^N)
    if termination criteria met
        return base hypothesis g_t(x) = E_in-optimal constant
    else
        1. learn branching criteria
           b(x) = argmin over decision stumps h(x) of
                  Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)
        2. split D to 2 parts D_c = {(x_n, y_n) : b(x_n) = c}
        ...

• simple internal node for C = 2: {1, 2}-output decision stump
• 'easier' sub-tree: branch by purifying

C&RT: bi-branching by purifying
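The stump search on this slide, argmin over h of Σ_c |D_c with h| · impurity(D_c with h), can be written out directly. A minimal sketch, assuming Gini impurity as the purifying measure (defined on the following slides) and axis-aligned thresholds at the observed feature values:

```python
# Exhaustive decision-stump search minimizing the weighted impurity sum.
from collections import Counter

def gini(ys):
    N = len(ys)
    return 1.0 - sum((cnt / N) ** 2 for cnt in Counter(ys).values())

def best_stump(X, y):
    """X: list of feature vectors, y: labels. Returns (feature index, threshold)."""
    best = (float("inf"), None, None)
    for i in range(len(X[0])):
        for theta in sorted({x[i] for x in X}):
            left = [y[n] for n in range(len(X)) if X[n][i] <= theta]
            right = [y[n] for n in range(len(X)) if X[n][i] > theta]
            if not left or not right:            # stump must split D into 2 parts
                continue
            # |D_1 with h| * impurity(D_1) + |D_2 with h| * impurity(D_2)
            score = len(left) * gini(left) + len(right) * gini(right)
            if score < best[0]:
                best = (score, i, theta)
    return best[1], best[2]

i, theta = best_stump([[1, 5], [2, 6], [3, 1], [4, 2]], [-1, -1, +1, +1])
```

On this toy data the purest split is on feature 0 at threshold 2, which separates the two classes exactly.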


Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 9/22


Decision Tree / Decision Tree Algorithm

Impurity Functions

by E_in of optimal constant:
• regression error: impurity(D) = (1/N) Σ_{n=1}^{N} (y_n − ȳ)², with ȳ = average of {y_n}
• classification error: impurity(D) = (1/N) Σ_{n=1}^{N} [y_n ≠ y*], with y* = majority of {y_n}

for classification:
• Gini index: 1 − Σ_{k=1}^{K} ( Σ_{n=1}^{N} [y_n = k] / N )²
  —all k considered together
• classification error: 1 − max_{1≤k≤K} Σ_{n=1}^{N} [y_n = k] / N
  —optimal k = y* only

popular choices: Gini for classification, regression error for regression
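The three impurity measures on this slide translate directly into code. A small sketch, assuming the usual 1/N normalization written in the formulas above:

```python
# The slide's impurity functions for a multiset of labels/targets ys.
from collections import Counter

def regression_impurity(ys):
    ybar = sum(ys) / len(ys)                     # E_in-optimal constant: average
    return sum((y - ybar) ** 2 for y in ys) / len(ys)

def gini_index(ys):
    N = len(ys)                                  # all classes k considered together
    return 1.0 - sum((c / N) ** 2 for c in Counter(ys).values())

def classification_error(ys):
    N = len(ys)                                  # only the optimal k = y* matters
    return 1.0 - max(Counter(ys).values()) / N
```

For example, `gini_index([1, 1, 1, 2])` is 1 − (3/4)² − (1/4)² = 0.375, while `classification_error([1, 1, 1, 2])` is 1 − 3/4 = 0.25; the Gini index reacts to the full class distribution, the classification error only to the majority class.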


Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/22


Decision Tree / Decision Tree Algorithm

Termination in C&RT

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^N)
    if termination criteria met
        return base hypothesis g_t(x) = E_in-optimal constant
    else
        1. learn branching criteria
           b(x) = argmin over decision stumps h(x) of
                  Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)
        ...

forced to terminate when:
• all y_n the same: impurity = 0 =⇒ g_t(x) = y_n
• all x_n the same: no decision stumps

C&RT: fully-grown tree with constant leaves that come from bi-branching by purifying

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 11/22
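The two forced-termination conditions are simple to check in code. A hypothetical helper (the name `must_terminate` is illustrative):

```python
# Forced termination: all y_n the same (impurity 0), or all x_n the
# same (no decision stump can split D into two non-empty parts).
def must_terminate(X, y):
    return len(set(y)) == 1 or all(x == X[0] for x in X)
```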


Decision Tree / Decision Tree Algorithm

Fun Time

For the Gini index, the impurity is 1 − Σ_{k=1}^{K} ( Σ_{n=1}^{N} [y_n = k] / N )².
Consider K = 2, and let μ = N_1/N, where N_1 is the number of examples with y_n = 1.
Which of the following formulas of μ equals the Gini index in this case?

Reference answer: 2μ(1 − μ)

Simplify 1 − (μ² + (1 − μ)²) and the answer should pop up.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 12/22
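The simplification behind the reference answer is 1 − μ² − (1 − 2μ + μ²) = 2μ − 2μ² = 2μ(1 − μ), which a quick numeric check confirms:

```python
# Numeric check of the Fun Time answer: for K = 2 the Gini index
# 1 - (mu^2 + (1 - mu)^2) equals 2*mu*(1 - mu) for every mu in [0, 1].
for mu in [0.0, 0.25, 0.5, 0.9]:
    assert abs((1 - (mu**2 + (1 - mu)**2)) - 2 * mu * (1 - mu)) < 1e-12
```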


Decision Tree / Decision Tree Heuristics in C&RT

Basic C&RT Algorithm

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^N)
    if cannot branch anymore
        return g_t(x) = E_in-optimal constant
    else
        1. learn branching criteria
           b(x) = argmin over decision stumps h(x) of
                  Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)
        2. split D to 2 parts D_c = {(x_n, y_n) : b(x_n) = c}
        3. build sub-tree G_c <- DecisionTree(D_c)
        4. return G(x) = Σ_{c=1}^{2} [b(x) = c] G_c(x)

easily handle binary classification, regression, & multi-class classification

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 13/22
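Putting the pieces together, the basic algorithm fits in a few dozen lines. This is an illustrative end-to-end sketch (fully-grown binary tree, Gini-purifying stump branching, majority-vote constant leaves), not CART's actual implementation; the tuple layout `("leaf", value)` / `("node", i, theta, left, right)` is an assumption of this sketch.

```python
# Minimal end-to-end basic C&RT for classification.
from collections import Counter

def gini(ys):
    N = len(ys)
    return 1.0 - sum((c / N) ** 2 for c in Counter(ys).values())

def crt(X, y):
    # cannot branch anymore: all y_n the same, or all x_n the same
    if len(set(y)) == 1 or all(x == X[0] for x in X):
        return ("leaf", Counter(y).most_common(1)[0][0])
    best = None
    for i in range(len(X[0])):                   # learn branching criteria b(x)
        for theta in sorted({x[i] for x in X}):
            L = [n for n in range(len(X)) if X[n][i] <= theta]
            R = [n for n in range(len(X)) if X[n][i] > theta]
            if not L or not R:
                continue
            score = (len(L) * gini([y[n] for n in L])
                     + len(R) * gini([y[n] for n in R]))
            if best is None or score < best[0]:
                best = (score, i, theta, L, R)
    _, i, theta, L, R = best
    return ("node", i, theta,                    # split D, build sub-trees
            crt([X[n] for n in L], [y[n] for n in L]),
            crt([X[n] for n in R], [y[n] for n in R]))

def predict(G, x):
    while G[0] == "node":
        _, i, theta, left, right = G
        G = left if x[i] <= theta else right
    return G[1]

G = crt([[1], [2], [6], [7]], [-1, -1, +1, +1])
```

Swapping `gini` for the regression impurity and the majority leaf for the average leaf gives the regression variant, which is why the same routine handles binary, multi-class, and regression problems.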


Decision Tree / Decision Tree Heuristics in C&RT

Regularization by Pruning

• fully-grown tree: E_in(G) = 0 if all x_n different
• but overfit (large E_out) because low-level trees built with small D_c
• need a regularizer, say, Ω(G) = NumberOfLeaves(G)
• want regularized decision tree:
  argmin over all possible G of E_in(G) + λΩ(G)
  —called pruned decision tree
• cannot enumerate all possible G computationally
  —often consider only:
  G^(0) = fully-grown tree
  G^(i) = argmin_G E_in(G) such that G is one-leaf removed from G^(i−1)

systematic choice of λ? validation

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 14/22
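The pruning path on this slide can be sketched concretely: removing one leaf corresponds to collapsing one internal node whose children are both leaves, and each G^(i) is the candidate with smallest E_in. This sketch reuses the illustrative `("leaf", value)` / `("node", i, theta, left, right)` tuples from the basic-C&RT sketch; all names are assumptions of this sketch, not CART's API.

```python
# Pruning path G(0), G(1), ... and the final lambda-regularized pick.
from collections import Counter

def leaves(G):
    return 1 if G[0] == "leaf" else leaves(G[3]) + leaves(G[4])

def predict(G, x):
    while G[0] == "node":
        G = G[3] if x[G[1]] <= G[2] else G[4]
    return G[1]

def e_in(G, X, y):
    return sum(predict(G, x) != t for x, t in zip(X, y)) / len(y)

def collapses(G, X, y):
    """All trees obtained from G by merging one (leaf, leaf) node,
    i.e. the trees that are one-leaf removed from G."""
    if G[0] == "leaf":
        return
    _, i, theta, L, R = G
    if L[0] == "leaf" and R[0] == "leaf":
        yield ("leaf", Counter(y).most_common(1)[0][0])   # majority at this node
    XL = [(x, t) for x, t in zip(X, y) if x[i] <= theta]
    XR = [(x, t) for x, t in zip(X, y) if x[i] > theta]
    for sub in collapses(L, [x for x, _ in XL], [t for _, t in XL]):
        yield ("node", i, theta, sub, R)
    for sub in collapses(R, [x for x, _ in XR], [t for _, t in XR]):
        yield ("node", i, theta, L, sub)

def prune_path(G, X, y):
    path = [G]                                   # G(0): fully-grown tree
    while path[-1][0] == "node":                 # G(i): E_in-best one-leaf-removed
        path.append(min(collapses(path[-1], X, y), key=lambda g: e_in(g, X, y)))
    return path

def pick(path, X, y, lam):
    return min(path, key=lambda g: e_in(g, X, y) + lam * leaves(g))

G = ("node", 0, 2, ("leaf", -1), ("leaf", 1))
X, y = [[1], [3]], [-1, 1]
path = prune_path(G, X, y)
```

With λ = 0 the full tree wins (E_in = 0); a large λ pushes the pick toward the single-leaf tree. Choosing λ systematically is, as the slide says, a job for validation.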


Outline