Decision Tree: Decision Tree Algorithm

Classification and Regression Tree (C&RT)

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^{N})
  if termination criteria met
    return base hypothesis g_t(x)
  else
    ...
    2. split D to C parts D_c = {(x_n, y_n) : b(x_n) = c}

two simple choices
• C = 2 (binary tree)
• g_t(x) = E_in-optimal constant
  • binary/multiclass classification (0/1 error): majority of {y_n}
  • regression (squared error): average of {y_n}

disclaimer: C&RT here is based on selected components of CART™ of California Statistical Software
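To make the leaf choice concrete, here is a minimal sketch (not from the slides) of the E_in-optimal constant, assuming the labels or targets sit in a 1-D numpy array: the majority label minimizes 0/1 error, and the average minimizes squared error.

```python
import numpy as np
from collections import Counter

def optimal_constant_classification(y):
    """E_in-optimal constant for 0/1 error: the majority label of {y_n}."""
    return Counter(y).most_common(1)[0][0]

def optimal_constant_regression(y):
    """E_in-optimal constant for squared error: the average of {y_n}."""
    return float(np.mean(y))
```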
Branching in C&RT: Purifying

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^{N})
  if termination criteria met
    return base hypothesis g_t(x) = E_in-optimal constant
  else
    1. learn branching criteria b(x)
    2. split D to 2 parts D_c = {(x_n, y_n) : b(x_n) = c}

more simple choices
• simple internal node for C = 2: {1, 2}-output decision stump
• 'easier' sub-tree: branch by purifying

b(x) = argmin_{decision stumps h(x)} Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)

C&RT: bi-branching by purifying
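Below is a sketch of the purifying branch search, assuming numeric features in a numpy matrix X and an impurity function like the ones on the next slide; the helper name best_stump and the threshold convention x_i ≤ θ are illustrative assumptions, not part of the slides.

```python
import numpy as np

def best_stump(X, y, impurity):
    """Pick the decision stump (feature i, threshold theta) that minimizes
    sum_c |D_c with h| * impurity(D_c with h) over the two induced parts."""
    best, best_score = None, np.inf
    for i in range(X.shape[1]):                    # try every feature
        for theta in np.unique(X[:, i])[:-1]:      # candidate thresholds (keep both parts nonempty)
            left = X[:, i] <= theta                # part c = 1
            right = ~left                          # part c = 2
            score = left.sum() * impurity(y[left]) + right.sum() * impurity(y[right])
            if score < best_score:
                best_score, best = score, (i, theta)
    return best                                    # None if no stump exists
```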
Impurity Functions

by E_in of optimal constant
• regression error: impurity(D) = (1/N) Σ_{n=1}^{N} (y_n − ȳ)², with ȳ = average of {y_n}
• classification error: impurity(D) = (1/N) Σ_{n=1}^{N} ⟦y_n ≠ y*⟧, with y* = majority of {y_n}

for classification
• Gini index: 1 − Σ_{k=1}^{K} ( Σ_{n=1}^{N} ⟦y_n = k⟧ / N )² (all k considered together)
• classification error: 1 − max_{1≤k≤K} Σ_{n=1}^{N} ⟦y_n = k⟧ / N (optimal k = y* only)

popular choices: Gini for classification, regression error for regression
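A sketch of the two popular impurity functions, again assuming 1-D numpy arrays; these are direct transcriptions of the formulas above (regression error around the mean, and the Gini index over class fractions).

```python
import numpy as np

def regression_impurity(y):
    """Regression error: mean squared deviation from the average of {y_n}."""
    return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0

def gini_impurity(y):
    """Gini index: 1 - sum_k (fraction of examples with y_n = k)^2."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return float(1.0 - np.sum(p ** 2))
```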
Termination in C&RT

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^{N})
  if termination criteria met
    return base hypothesis g_t(x) = E_in-optimal constant
  else
    1. learn branching criteria
       b(x) = argmin_{decision stumps h(x)} Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)

'forced' to terminate when
• all y_n the same: impurity = 0 ⟹ g_t(x) = y_n
• all x_n the same: no decision stumps

C&RT: fully-grown tree with constant leaves that come from bi-branching by purifying
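The forced-termination test translates into a one-line check; a sketch assuming numpy arrays X (rows are x_n) and y:

```python
import numpy as np

def cannot_branch(X, y):
    """True when all y_n are identical (impurity 0) or all x_n are identical
    (no decision stump can split the data)."""
    return len(np.unique(y)) <= 1 or bool(np.all(X == X[0]))
```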
Fun Time

For the Gini index 1 − Σ_{k=1}^{K} ( Σ_{n=1}^{N} ⟦y_n = k⟧ / N )², consider K = 2 and let µ = N_1/N, where N_1 is the number of examples with y_n = 1. Which of the following formulas of µ equals the Gini index in this case?
1. 2µ(1 − µ)
2. 2µ²(1 − µ)
3. 2µ(1 − µ)²
4. 2µ²(1 − µ)²

Reference Answer: 1
Simplify 1 − (µ² + (1 − µ)²) and the answer should pop up.
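Writing out the simplification behind the reference answer:

```latex
1 - \mu^2 - (1-\mu)^2 = 1 - \mu^2 - 1 + 2\mu - \mu^2 = 2\mu - 2\mu^2 = 2\mu(1-\mu)
```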
Decision Tree: Decision Tree Heuristics in C&RT
Basic C&RT Algorithm

function DecisionTree(data D = {(x_n, y_n)}_{n=1}^{N})
  if cannot branch anymore
    return g_t(x) = E_in-optimal constant
  else
    1. learn branching criteria
       b(x) = argmin_{decision stumps h(x)} Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)
    2. split D to 2 parts D_c = {(x_n, y_n) : b(x_n) = c}
    3. build sub-tree G_c ← DecisionTree(D_c)
    4. return G(x) = Σ_{c=1}^{2} ⟦b(x) = c⟧ G_c(x)

easily handles binary classification, regression, and multi-class classification
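Assembling the earlier sketches into the four numbered steps, a compact recursive version for classification; it assumes the hypothetical helpers sketched above (cannot_branch, best_stump, gini_impurity, optimal_constant_classification) and a nested-dict tree representation.

```python
def decision_tree(X, y):
    """Return a nested-dict tree: leaves hold an E_in-optimal constant,
    internal nodes hold a stump (feature, threshold) and two sub-trees."""
    if cannot_branch(X, y):
        return {"leaf": optimal_constant_classification(y)}
    feature, theta = best_stump(X, y, gini_impurity)   # 1. learn b(x)
    left = X[:, feature] <= theta                      # 2. split D into D_1, D_2
    return {
        "stump": (feature, theta),
        "left":  decision_tree(X[left],  y[left]),     # 3. build sub-trees G_c
        "right": decision_tree(X[~left], y[~left]),
    }

def predict(tree, x):
    """4. G(x) = sum_c [[b(x) = c]] G_c(x): follow the stumps to a leaf."""
    while "leaf" not in tree:
        feature, theta = tree["stump"]
        tree = tree["left"] if x[feature] <= theta else tree["right"]
    return tree["leaf"]
```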
Regularization by Pruning

fully-grown tree: E_in(G) = 0 if all x_n different
but overfit (large E_out) because low-level trees are built with small D_c
• need a regularizer, say Ω(G) = NumberOfLeaves(G)
• want a regularized decision tree:
    argmin_{all possible G} E_in(G) + λΩ(G)
  (called the pruned decision tree)
• cannot enumerate all possible G computationally: often consider only
  • G^(0) = fully-grown tree
  • G^(i) = argmin_G E_in(G) such that G is one-leaf removed from G^(i−1)

systematic choice of λ? validation
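A sketch of this pruning heuristic, under clearly illustrative assumptions: the nested-dict trees and predict from the earlier sketches, 0/1 error as E_in, and a held-out validation set to pick among the candidates G^(0), G^(1), ... (which implicitly picks λ). The helpers count_leaves and one_leaf_removed are hypothetical names introduced here, not part of CART itself.

```python
import numpy as np

def count_leaves(tree):
    """Omega(G) = NumberOfLeaves(G) for the nested-dict representation."""
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

def one_leaf_removed(tree, X, y):
    """All trees obtained by collapsing one internal node whose children are
    both leaves into a single E_in-optimal constant leaf (one leaf fewer)."""
    if "leaf" in tree:
        return []
    candidates = []
    if "leaf" in tree["left"] and "leaf" in tree["right"]:
        candidates.append({"leaf": optimal_constant_classification(y)})
    feature, theta = tree["stump"]
    mask = X[:, feature] <= theta
    candidates += [{**tree, "left": sub} for sub in one_leaf_removed(tree["left"], X[mask], y[mask])]
    candidates += [{**tree, "right": sub} for sub in one_leaf_removed(tree["right"], X[~mask], y[~mask])]
    return candidates

def prune_by_validation(tree, X, y, X_val, y_val):
    """Build G^(0), G^(1), ... (each step keeps the one-leaf-removed candidate
    with smallest E_in), then return the tree with the smallest validation
    error, a systematic stand-in for choosing lambda."""
    err = lambda t, A, b: float(np.mean([predict(t, a) != label for a, label in zip(A, b)]))
    candidates = [tree]
    while count_leaves(candidates[-1]) > 1:
        options = one_leaf_removed(candidates[-1], X, y)
        candidates.append(min(options, key=lambda t: err(t, X, y)))
    return min(candidates, key=lambda t: err(t, X_val, y_val))
```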