Finale: Machine Learning in Practice

NTU KDDCup 2011 Track 1 World Champion Model
A Linear Ensemble of Individual and Blended Models for Music Rating
Prediction, Chen et al., KDDCup 2011

NNet, DecTree-like models, and then linear blending of:
• Matrix Factorization variants, including probabilistic PCA
• Restricted Boltzmann Machines: an ‘extended’ autoencoder
• k Nearest Neighbors
• Probabilistic Latent Semantic Analysis: an extraction model that has
  ‘soft clusters’ as hidden variables
• linear regression, NNet, & GBDT

yes, you can ‘easily’ understand everything! :-)
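The second stage above, linear blending, simply fits linear weights over the base models' predictions. A minimal sketch, assuming a small matrix of hypothetical base-model predictions on a validation set (the real champion model blended hundreds of predictors):

```python
import numpy as np

def linear_blend(base_preds, y_val):
    """Fit blending weights w by least squares on validation targets.

    base_preds: (n_samples, n_models) matrix of base-model predictions
    y_val:      (n_samples,) true ratings on the validation set
    """
    # Solve min_w ||base_preds @ w - y_val||^2 (linear regression, no intercept)
    w, *_ = np.linalg.lstsq(base_preds, y_val, rcond=None)
    return w

# Toy example: three base models predicting five ratings (made-up numbers)
P = np.array([[3.1, 2.9, 3.5],
              [4.2, 4.0, 3.8],
              [1.9, 2.2, 2.0],
              [4.8, 4.9, 4.4],
              [2.5, 2.7, 3.0]])
y = np.array([3.0, 4.0, 2.0, 5.0, 2.5])
w = linear_blend(P, y)
blended = P @ w  # new predictions reuse the same weights
```

By construction the blend's squared error on the fitting set is never worse than any single base model's, since weight vectors that pick out one model are in the search space.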
NTU KDDCup 2012 Track 2 World Champion Model
A Two-Stage Ensemble of Diverse Models for Advertisement Ranking in
KDD Cup 2012, Wu et al., KDDCup 2012

NNet, GBDT-like models, and then linear blending of:
• Linear Regression variants, including linear SVR
• Logistic Regression variants
• Matrix Factorization variants
• . . .

the ‘key’ is to blend properly without overfitting
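One generic safeguard against overfitting the blend (not necessarily the champion team's exact recipe) is to learn the blending weights on held-out predictions with regularization, e.g. ridge-penalized least squares:

```python
import numpy as np

def ridge_blend(P_val, y_val, lam=1.0):
    """Ridge-regularized blending weights: (P^T P + lam*I)^{-1} P^T y.

    Larger lam shrinks the weights toward zero, trading a little
    validation fit for stability on unseen test data.
    """
    n_models = P_val.shape[1]
    A = P_val.T @ P_val + lam * np.eye(n_models)
    return np.linalg.solve(A, P_val.T @ y_val)

# Hypothetical held-out predictions from two base models
P_val = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 3.0]])
y_val = np.array([1.5, 1.5, 3.0])
w = ridge_blend(P_val, y_val, lam=1.0)
```

Equally important is that `P_val` contains out-of-sample predictions: blending on the same data the base models were trained on rewards the most overfit model.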
NTU KDDCup 2013 Track 1 World Champion Model
Combination of Feature Engineering and Ranking Models for Paper-Author
Identification in KDD Cup 2013, Li et al., KDDCup 2013

linear blending of:
• Random Forest with many, many, many trees
• GBDT variants

with tons of effort in designing features

‘another key’ is to construct features with domain knowledge
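A toy illustration of what a domain-knowledge feature for paper-author identification might look like. The feature below (token overlap between a candidate author's name and the paper's listed author names) is a hypothetical example for flavor, not a feature from the winning solution:

```python
def name_overlap(candidate_name, paper_author_names):
    """Fraction of the candidate's name tokens that appear among the
    paper's listed author-name tokens (case-insensitive)."""
    tokens = set(candidate_name.lower().split())
    listed = set()
    for name in paper_author_names:
        listed.update(name.lower().split())
    if not tokens:
        return 0.0
    return len(tokens & listed) / len(tokens)

# A strong match scores near 1.0, a mismatch scores 0.0
score = name_overlap("jane smith", ["Jane Smith", "Bob Lee"])
```

Dozens of such hand-crafted signals, fed into Random Forest and GBDT, did the heavy lifting; the learning algorithms themselves were standard.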
ICDM 2006 Top 10 Data Mining Algorithms
1. C4.5: another decision tree
2. k-Means
3. SVM
4. Apriori: for frequent itemset mining
5. EM: ‘alternating optimization’ algorithm for some models
6. PageRank: for link analysis, similar to matrix factorization
7. AdaBoost
8. k Nearest Neighbor
9. Naive Bayes: a simple linear model with ‘weights’ decided by data statistics
10. C&RT

personal view of five missing ML competitors:
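The remark that Naive Bayes is ‘a simple linear model with weights decided by data statistics’ can be made concrete: for binary features, the Bernoulli Naive Bayes log-odds is linear in x, with weights computed directly from smoothed class-conditional frequencies rather than by iterative optimization. A minimal sketch:

```python
import numpy as np

def bernoulli_nb_weights(X, y, alpha=1.0):
    """Return (w, b) such that sign(x @ w + b) is the Bernoulli NB decision.

    The weights are log-odds ratios of Laplace-smoothed per-class feature
    frequencies -- pure 'data statistics', no gradient steps.
    """
    X, y = np.asarray(X, float), np.asarray(y)
    n1, n0 = (y == 1).sum(), (y == 0).sum()
    p1 = (X[y == 1].sum(0) + alpha) / (n1 + 2 * alpha)  # P(x_j=1 | y=1)
    p0 = (X[y == 0].sum(0) + alpha) / (n0 + 2 * alpha)  # P(x_j=1 | y=0)
    # log P(y=1|x) - log P(y=0|x) = x @ w + b under the independence assumption
    w = np.log(p1 / p0) - np.log((1 - p1) / (1 - p0))
    b = np.log(n1 / n0) + np.log((1 - p1) / (1 - p0)).sum()
    return w, b
```

A tiny usage check: with a feature that perfectly separates the classes, the learned linear rule recovers the right labels.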