Teaching Machine Learning to a Diverse Audience:
the Foundation-based Approach
Hsuan-Tien Lin, National Taiwan University
Malik Magdon-Ismail, Rensselaer Polytechnic Institute
Yaser S. Abu-Mostafa, California Institute of Technology
Teaching Machine Learning Workshop @ ICML 2012, June 30, 2012
Lin, Magdon-Ismail, Abu-Mostafa Foundation-based Approach 06/30/2012 1 / 15
Diversity in ML classes
NTU ML 2011 Fall (77 students)
background diversity
“maturity” diversity: junior: 8, senior: 20, master's: 44, PhD: 5
similarly diverse at RPI and in the Caltech online course[1]
challenge: serving CS students while accommodating the needs of a diverse non-CS audience
mindset of the audience?
[1] http://work.caltech.edu/telecourse
Observed Mindsets of the Diverse Audience
highly motivated to learn
→ not satisfied with only shallow comic-book stories
often with minimal but non-empty math/programming background
→ capable of downloading and trying the latest packages
words of a student from industry (Caltech online course 2012)[2]
demand: solid foundation (and better intuition)!
[2] http://book.caltech.edu/bookforum/showthread.php?p=3107
Our Proposed Teaching Approach
foundation-based, and foundation-first
then, complement the foundation with a couple of useful algorithms/techniques
comparison to techniques-based
techniques-based: hop through the forest of many latest and greatest techniques
foundation-based: illustrate the map (core) first to avoid getting lost in the forest
foundation-based:
prepare students for easy learning of untaught/future techniques
Our Proposed Teaching Approach [Cont.]
comparison to foundation-later
foundation-later:
first, techniques to raise interest
then, foundations to consolidate understanding
foundation-first: build the basis (core) first to perceive the techniques from the right angle
foundation-first:
let students know when and how to use the powerful tools before getting addicted to the power
Our Proposed Foundation: Three Concepts
understand learnability, approximation and generalization: when can we learn, and what are the tradeoffs?
→ conducting machine learning properly
use simple models first: the linear model coupled with some nonlinear transforms is typically enough for most applications
→ conducting machine learning safely
deal with noise and overfitting carefully: how to tackle the “dark side” of learning?
→ conducting machine learning professionally
our experience: worth starting with these foundations, even for a diverse audience
learnability, approximation & generalization
—conducting machine learning properly
good learning (test performance)
= good approximation (training performance) + good generalization (complexity penalty)
a must-teach key message
can be illustrated in different forms (e.g. VC bound, bias-variance, even human-learning philosophy)
make learning non-trivial and fascinating to students
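The key message can be made quantitative with the VC bound. The sketch below assumes the textbook form E_out ≤ E_in + sqrt((8/N) ln(4 m_H(2N)/δ)), holding with probability at least 1 − δ, together with the polynomial growth-function bound m_H(N) ≤ N^d_vc + 1; the function names are hypothetical.

```python
import math


def growth_bound(n: int, d_vc: int) -> float:
    """Polynomial bound on the growth function: m_H(N) <= N**d_vc + 1."""
    return float(n) ** d_vc + 1.0


def vc_penalty(n: int, d_vc: int, delta: float = 0.05) -> float:
    """Generalization (complexity) penalty Omega(N, H, delta) of the VC bound."""
    return math.sqrt(8.0 / n * math.log(4.0 * growth_bound(2 * n, d_vc) / delta))


def out_of_sample_bound(e_in: float, n: int, d_vc: int, delta: float = 0.05) -> float:
    """With probability >= 1 - delta: E_out <= E_in (approximation) + Omega (generalization)."""
    return e_in + vc_penalty(n, d_vc, delta)


if __name__ == "__main__":
    # 2-D perceptrons have d_vc = 3; the penalty shrinks slowly as N grows.
    for n in (100, 1_000, 10_000, 100_000):
        print(f"N={n:>6}  Omega={vc_penalty(n, d_vc=3):.3f}")
```

With d_vc = 3 the penalty is still above 1 at N = 100, a handy classroom illustration of how loose, yet still informative, the bound is.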
learnability, approximation & generalization
—conducting machine learning properly [Cont.]
wrong use of learning (beginner’s mistakes)
ensure good approximation, pray for good generalization
→ praying for something out of control
right use of learning
ensure good generalization, try our best for good approximation
→ trying something possibly in control
We cannot guarantee learning. We can “guarantee” no disasters. That is, after we learn, we will either declare success or failure, and in both cases we will be right.
linear models
—conducting machine learning safely
linear models = good generalization
with established optimization tools for good approximation
after knowing approximation/generalization: a good stage for learning safe techniques
sufficiently useful for many practical problems (Yuan et al., 2012)
building block in sophisticated techniques through feature transforms
make learning concrete to students
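A minimal sketch of the linear model as a building block through a feature transform. It assumes a hypothetical circular target (radius² = 0.6) on [-1, 1]²: no line separates it in X-space, but with Φ(x) = (1, x1², x2²) the linear weights w̃ = (-0.6, 1, 1) classify it perfectly in Z-space. The target, radius and weights are illustrative assumptions.

```python
import random


def phi(x1: float, x2: float) -> tuple[float, float, float]:
    """Quadratic feature transform: x -> z = (1, x1^2, x2^2)."""
    return (1.0, x1 * x1, x2 * x2)


def linear_sign(w, z) -> int:
    """A plain linear classifier sign(w . z), applied in Z-space."""
    s = sum(wi * zi for wi, zi in zip(w, z))
    return 1 if s > 0 else -1


def circle_label(x1: float, x2: float, r2: float = 0.6) -> int:
    """Hypothetical target: +1 outside the circle x1^2 + x2^2 = r2, -1 inside."""
    return 1 if x1 * x1 + x2 * x2 > r2 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
    # In Z-space the circle becomes the hyperplane -0.6 + z1 + z2 = 0.
    w_tilde = (-0.6, 1.0, 1.0)
    errors = sum(linear_sign(w_tilde, phi(*p)) != circle_label(*p) for p in points)
    print("misclassified in Z-space:", errors)
```

The learning algorithm itself is untouched; only the inputs change, which is why the linear toolbox carries over to nonlinear problems.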
linear models
—conducting machine learning safely [Cont.]
wrong use of learning (beginner’s mistakes)
start with the “greatest” techniques first
→ a point of no return
right use of learning
start with the simplest techniques first
→ and yes, it can work well
a rich and representative family of linear techniques:
classification: approximate combinatorial optimization (perceptron-like)
regression: analytic optimization (pseudo-inverse)
logistic regression: iterative optimization (SGD)
Students from diverse backgrounds get not only the big picture but also the finer details in a concrete setting.
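The three linear techniques above can be sketched with NumPy, assuming labels y ∈ {−1, +1} and a bias coordinate x0 = 1. The function names and hyperparameters (max_updates, eta, epochs) are hypothetical, illustrative choices.

```python
import numpy as np


def add_bias(X):
    """Prepend the constant coordinate x0 = 1 to every example."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def pla(X, y, max_updates=1000):
    """Perceptron learning algorithm: correct one misclassified point per step."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(max_updates):
        wrong = np.flatnonzero(np.sign(Xb @ w) != y)
        if wrong.size == 0:
            break  # every training point is on the correct side
        i = wrong[0]
        w = w + y[i] * Xb[i]
    return w


def linear_regression(X, y):
    """Analytic one-shot solution w = X^+ y using the pseudo-inverse."""
    return np.linalg.pinv(add_bias(X)) @ y


def logistic_regression_sgd(X, y, eta=0.1, epochs=50, seed=0):
    """Minimize the cross-entropy error ln(1 + exp(-y w.x)) by SGD."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            s = y[i] * (Xb[i] @ w)
            # The negative gradient of ln(1 + exp(-s)) is y x / (1 + exp(s)).
            w = w + eta * y[i] * Xb[i] / (1.0 + np.exp(s))
    return w


def error_rate(X, y, w):
    """Fraction of points whose sign(w . x) disagrees with the label."""
    return float(np.mean(np.sign(add_bias(X) @ w) != y))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(200, 2))
    s = X[:, 0] + X[:, 1] - 0.2   # hypothetical linear target
    keep = np.abs(s) > 0.1         # leave a margin so PLA halts quickly
    X, y = X[keep], np.sign(s[keep])
    for name, fit in [("PLA", pla), ("linear regression", linear_regression),
                      ("logistic regression (SGD)", logistic_regression_sgd)]:
        print(f"{name}: E_in = {error_rate(X, y, fit(X, y)):.3f}")
```

The same data set exercises all three optimization styles: combinatorial corrections, a closed-form solve, and an iterative stochastic update.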
deal with noise and overfitting
—conducting machine learning professionally
overfitting = difficulty in ensuring good generalization/learning with stochastic or deterministic noise on finite data
regularization = tools for further guaranteeing good generalization
validation = tools for certifying good learning
figure: overfit(data size, noise level)
turn amateur students into professionals
make learning artistic to students
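A minimal sketch of the overfit(data size, noise level) idea. It assumes a hypothetical target sin(πx) on [-1, 1], least-squares polynomial fits of degree 2 and 10, and Gaussian stochastic noise; the measure E_out(degree 10) − E_out(degree 2) is one common choice, not necessarily the one behind the slide's figure.

```python
import numpy as np

GRID = np.linspace(-1.0, 1.0, 1001)  # dense grid used to estimate E_out


def target(x):
    """Hypothetical target function (an assumption, not from the slides)."""
    return np.sin(np.pi * x)


def sample(n, sigma, rng):
    """Draw n inputs uniformly on [-1, 1] with Gaussian (stochastic) noise of std sigma."""
    x = rng.uniform(-1.0, 1.0, n)
    return x, target(x) + sigma * rng.standard_normal(n)


def errors(x, y, degree):
    """Least-squares polynomial fit; returns (E_in, E_out vs. the noiseless target)."""
    coeffs = np.polyfit(x, y, degree)
    e_in = np.mean((np.polyval(coeffs, x) - y) ** 2)
    e_out = np.mean((np.polyval(coeffs, GRID) - target(GRID)) ** 2)
    return float(e_in), float(e_out)


def overfit_measure(n, sigma, low=2, high=10, trials=100, seed=0):
    """Average E_out(high-degree fit) - E_out(low-degree fit); positive means overfitting."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        x, y = sample(n, sigma, rng)  # both models see the same data set
        total += errors(x, y, high)[1] - errors(x, y, low)[1]
    return total / trials


if __name__ == "__main__":
    for n in (15, 40, 120):
        row = [overfit_measure(n, sigma) for sigma in (0.0, 0.5, 1.0)]
        print(f"N={n:>3}: " + "  ".join(f"{v:+.3f}" for v in row))
```

The degree-10 fit always has the lower E_in, since it minimizes over a superset of the degree-2 polynomials; whether it also has the lower E_out depends on N and the noise level, which is exactly what the overfit measure tracks.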
deal with noise and overfitting
—conducting machine learning professionally [Cont.]
wrong use of learning (beginner’s mistakes)
apply all possible techniques and choose by the best approximation result
→ high risk of overfitting
right use of learning
apply a reasonable number of well-regularized techniques and choose by the best validation result
→ relatively immune to noise and overfitting
Complex situations call for simpler models.
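A minimal sketch of "choose by the best validation result", assuming weight decay (ridge) on a hypothetical 10th-order polynomial transform, a single held-out validation split, and noisy sin(πx) data; all names and values are illustrative. For simplicity it also regularizes the constant weight.

```python
import numpy as np


def poly_features(x, degree):
    """Nonlinear transform x -> (1, x, x^2, ..., x^degree)."""
    return np.vander(x, degree + 1, increasing=True)


def weight_decay_fit(Z, y, lam):
    """Regularized least squares: w = (Z^T Z + lam * I)^(-1) Z^T y."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)


def mse(Z, y, w):
    """Mean squared error of the linear model w on transformed inputs Z."""
    return float(np.mean((Z @ w - y) ** 2))


def select_lambda(x_tr, y_tr, x_val, y_val, lambdas, degree=10):
    """Train one candidate per lambda on D_train; compare choosing by E_in vs. by E_val."""
    Z_tr, Z_val = poly_features(x_tr, degree), poly_features(x_val, degree)
    results = []
    for lam in lambdas:
        w = weight_decay_fit(Z_tr, y_tr, lam)
        results.append((lam, mse(Z_tr, y_tr, w), mse(Z_val, y_val, w)))
    by_e_in = min(results, key=lambda r: r[1])[0]   # the beginner's mistake
    by_e_val = min(results, key=lambda r: r[2])[0]  # the recommended choice
    return results, by_e_in, by_e_val


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 40)
    y = np.sin(np.pi * x) + 0.5 * rng.standard_normal(40)  # hypothetical noisy data
    lambdas = [1e-4, 1e-2, 1e-1, 1.0, 10.0]
    results, by_e_in, by_e_val = select_lambda(x[:25], y[:25], x[25:], y[25:], lambdas)
    for lam, e_in, e_val in results:
        print(f"lambda={lam:<7g} E_in={e_in:.3f}  E_val={e_val:.3f}")
    print("chosen by E_in:", by_e_in, "| chosen by E_val:", by_e_val)
```

Choosing by E_in always favors the weakest regularization, because the weight-decay training error never decreases as lambda grows; only the validation error can reveal overfitting. After selection, a common practice is to retrain the chosen model on all the data.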
Teaching/Learning Life After the Foundations
                   Support Vector Machine     Neural Network
generalization     large-margin bound         #-neuron bound
approximation      quadratic programming      gradient descent et al.
linear model       basic formulation          neurons
feature transform  through kernel             through cascading
regularization     large margin               weight decay or early stopping
validation         #SV bound                  for choices in regularization

[libsvm-2.9]$ ./svm-train -t 2 -g 0.05 -c 100 heart_scale
optimization finished, #iter = 1966
Total nSV = 113

good approximation (by choosing the kernel and the optimization)
good generalization (by regularization)
good learning (by using #SV as a validation indicator)
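The libsvm run above uses the RBF kernel (-t 2) with gamma = 0.05 (-g) and cost C = 100 (-c). A minimal plain-Python sketch of the two ingredients the table relies on, the kernel itself and the #SV bound on the leave-one-out error; the function names are hypothetical, and N = 270 for heart_scale is an assumption about the data file that should be checked against your copy.

```python
import math


def rbf_kernel(x, x_prime, gamma=0.05):
    """Gaussian (RBF) kernel exp(-gamma * ||x - x'||^2), as selected by libsvm's -t 2."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


def loo_error_bound(n_sv, n):
    """Leave-one-out error of an SVM is at most #SV / N.

    Removing a non-support vector leaves the solution unchanged, and that point
    is still classified correctly, so only support vectors can cause LOO errors.
    """
    if not 0 <= n_sv <= n:
        raise ValueError("need 0 <= n_sv <= n")
    return n_sv / n


if __name__ == "__main__":
    # nSV = 113 comes from the svm-train run; N = 270 is an assumed data-set size.
    print(f"LOO error <= {loo_error_bound(113, 270):.3f}")
```

Under that assumption the certificate is about 0.42: cheap to obtain (no retraining), but loose, so fewer support vectors mean a more useful validation indicator.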
Teaching/Learning Life After the Foundations [Cont.]
Caltech 2012 (mixed): 7 weeks of foundations, 0.5 week of NNet, 0.5 week of RBF Net, 1 week of SVM
NTU 2011 (sequential): 10 weeks of foundations, 2.5 weeks of SVM, 2.5 weeks of bagging/boosting
→ with an in-class data mining competition[3] in which students exploited taught and untaught techniques with ease
after solid foundations, teaching/learning a new technique often takes only incremental effort
[3] http://main.learner.csie.ntu.edu.tw/php/ml11fall/
Conclusion
foundation-based, foundation-first
—works well in our experience
learnability: philosophical understanding, make learning non-trivial, conduct learning properly
linear models: algorithmic modeling, make learning concrete, conduct learning safely
overfitting: practical tuning, make learning artistic, conduct learning professionally
Thank you. Questions?