Finale: Machine Learning in Practice

NTU KDDCup 2011 Track 1 World Champion Model
A Linear Ensemble of Individual and Blended Models for Music Rating
Prediction, Chen et al., KDDCup 2011

NNet, DecTree-like models, and then linear blending of:
• Matrix Factorization variants, including probabilistic PCA
• Restricted Boltzmann Machines: an ‘extended’ autoencoder
• k Nearest Neighbors
• Probabilistic Latent Semantic Analysis: an extraction model that has
  ‘soft clusters’ as hidden variables
• linear regression, NNet, & GBDT

yes, you can ‘easily’ understand everything! :-)
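The second stage above, linear blending, simply fits linear weights over the base models' predictions. A minimal sketch, assuming a small matrix of hypothetical base-model predictions on a validation set (the real champion model blended hundreds of predictors):

```python
import numpy as np

def linear_blend(base_preds, y_val):
    """Fit blending weights w by least squares on validation targets.

    base_preds: (n_samples, n_models) matrix of base-model predictions
    y_val:      (n_samples,) true ratings on the validation set
    """
    # Solve min_w ||base_preds @ w - y_val||^2 (linear regression, no intercept)
    w, *_ = np.linalg.lstsq(base_preds, y_val, rcond=None)
    return w

# Toy example: three base models predicting five ratings (made-up numbers)
P = np.array([[3.1, 2.9, 3.5],
              [4.2, 4.0, 3.8],
              [1.9, 2.2, 2.0],
              [4.8, 4.9, 4.4],
              [2.5, 2.7, 3.0]])
y = np.array([3.0, 4.0, 2.0, 5.0, 2.5])
w = linear_blend(P, y)
blended = P @ w  # new predictions reuse the same weights
```

By construction the blend's squared error on the fitting set is never worse than any single base model's, since weight vectors that pick out one model are in the search space.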
NTU KDDCup 2012 Track 2 World Champion Model
A Two-Stage Ensemble of Diverse Models for Advertisement Ranking in
KDD Cup 2012, Wu et al., KDDCup 2012

NNet, GBDT-like models, and then linear blending of:
• Linear Regression variants, including linear SVR
• Logistic Regression variants
• Matrix Factorization variants
• . . .

the ‘key’ is to blend properly without overfitting
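One generic safeguard against overfitting the blend (not necessarily the champion team's exact recipe) is to learn the blending weights on held-out predictions with regularization, e.g. ridge-penalized least squares:

```python
import numpy as np

def ridge_blend(P_val, y_val, lam=1.0):
    """Ridge-regularized blending weights: (P^T P + lam*I)^{-1} P^T y.

    Larger lam shrinks the weights toward zero, trading a little
    validation fit for stability on unseen test data.
    """
    n_models = P_val.shape[1]
    A = P_val.T @ P_val + lam * np.eye(n_models)
    return np.linalg.solve(A, P_val.T @ y_val)

# Hypothetical held-out predictions from two base models
P_val = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 3.0]])
y_val = np.array([1.5, 1.5, 3.0])
w = ridge_blend(P_val, y_val, lam=1.0)
```

Equally important is that `P_val` contains out-of-sample predictions: blending on the same data the base models were trained on rewards the most overfit model.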
NTU KDDCup 2013 Track 1 World Champion Model
Combination of Feature Engineering and Ranking Models for Paper-Author
Identification in KDD Cup 2013, Li et al., KDDCup 2013

linear blending of:
• Random Forest with many, many, many trees
• GBDT variants

with tons of effort in designing features

‘another key’ is to construct features with domain knowledge
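A toy illustration of what a domain-knowledge feature for paper-author identification might look like. The feature below (token overlap between a candidate author's name and the paper's listed author names) is a hypothetical example for flavor, not a feature from the winning solution:

```python
def name_overlap(candidate_name, paper_author_names):
    """Fraction of the candidate's name tokens that appear among the
    paper's listed author-name tokens (case-insensitive)."""
    tokens = set(candidate_name.lower().split())
    listed = set()
    for name in paper_author_names:
        listed.update(name.lower().split())
    if not tokens:
        return 0.0
    return len(tokens & listed) / len(tokens)

# A strong match scores near 1.0, a mismatch scores 0.0
score = name_overlap("jane smith", ["Jane Smith", "Bob Lee"])
```

Dozens of such hand-crafted signals, fed into Random Forest and GBDT, did the heavy lifting; the learning algorithms themselves were standard.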
ICDM 2006 Top 10 Data Mining Algorithms
1. C4.5: another decision tree
2. k-Means
3. SVM
4. Apriori: for frequent itemset mining
5. EM: ‘alternating optimization’ algorithm for some models
6. PageRank: for link analysis, similar to matrix factorization
7. AdaBoost
8. k Nearest Neighbor
9. Naive Bayes: a simple linear model with ‘weights’ decided by data statistics
10. C&RT

personal view of five missing ML competitors:
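The remark that Naive Bayes is ‘a simple linear model with weights decided by data statistics’ can be made concrete: for binary features, the Bernoulli Naive Bayes log-odds is linear in x, with weights computed directly from smoothed class-conditional frequencies rather than by iterative optimization. A minimal sketch:

```python
import numpy as np

def bernoulli_nb_weights(X, y, alpha=1.0):
    """Return (w, b) such that sign(x @ w + b) is the Bernoulli NB decision.

    The weights are log-odds ratios of Laplace-smoothed per-class feature
    frequencies -- pure 'data statistics', no gradient steps.
    """
    X, y = np.asarray(X, float), np.asarray(y)
    n1, n0 = (y == 1).sum(), (y == 0).sum()
    p1 = (X[y == 1].sum(0) + alpha) / (n1 + 2 * alpha)  # P(x_j=1 | y=1)
    p0 = (X[y == 0].sum(0) + alpha) / (n0 + 2 * alpha)  # P(x_j=1 | y=0)
    # log P(y=1|x) - log P(y=0|x) = x @ w + b under the independence assumption
    w = np.log(p1 / p0) - np.log((1 - p1) / (1 - p0))
    b = np.log(n1 / n0) + np.log((1 - p1) / (1 - p0)).sum()
    return w, b
```

A tiny usage check: with a feature that perfectly separates the classes, the learned linear rule recovers the right labels.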