Teaching Machine Learning to a Diverse Audience: the Foundation-based Approach

(1)

Hsuan-Tien Lin, National Taiwan University
Malik Magdon-Ismail, Rensselaer Polytechnic Institute
Yaser S. Abu-Mostafa, California Institute of Technology

Teaching Machine Learning Workshop @ ICML 2012, June 30, 2012

Lin, Magdon-Ismail, Abu-Mostafa Foundation-based Approach 06/30/2012 1 / 15

(2)

Diversity in ML classes

NTU ML 2011 Fall (77 students)

background diversity [figure]

“maturity” diversity: junior: 8, senior: 20, master: 44, PhD: 5

similarly diverse at RPI and at Caltech (online course)¹

challenge: serving CS students while accommodating the needs of a diverse non-CS audience

mindset of the audience?

¹ http://work.caltech.edu/telecourse

(3)

Observed Mindsets of the Diverse Audience

highly motivated to learn
- not satisfied with only shallow comic-book stories

often with minimal but non-empty math/programming background
- capable of downloading and trying the latest packages

words of a student from industry (Caltech online course 2012)²

demand: solid foundation (and better intuition)!

² http://book.caltech.edu/bookforum/showthread.php?p=3107

(4)

Our Proposed Teaching Approach

foundation-based, and foundation-first

then, complement the foundation with a couple of useful algorithms/techniques

comparison to techniques-based

techniques-based: hops through the forest of many latest and greatest techniques

foundation-based: illustrates the map (core) first to prevent getting lost in the forest

foundation-based: prepares students for easy learning of untaught/future techniques

(5)

Our Proposed Teaching Approach [Cont.]

foundation-based, and foundation-first

then, complement the foundation with a couple of useful algorithms/techniques

comparison to foundation-later

foundation-later:
first, techniques to raise interest
then, foundations to consolidate understanding

foundation-first: builds the basis (core) first to perceive the techniques from the right angle

foundation-first: lets students know when and how to use the powerful tools before getting addicted to the power

(6)

Our Proposed Foundation: Three Concepts

understand learnability, approximation and generalization
when can we learn and what are the tradeoffs?
conducting machine learning properly

use simple models first
the linear model coupled with some nonlinear transforms is typically enough for most applications
conducting machine learning safely

deal with noise and overfitting carefully
how to tackle the “dark side” of learning?
conducting machine learning professionally

our experience: worth starting with those foundations, even for a diverse audience

(7)

learnability, approximation & generalization

—conducting machine learning properly

good learning (test performance)

= good approximation (training performance) + good generalization (complexity penalty)

a must-teach key message

can be illustrated in different forms (e.g. VC bound, bias-variance, even human-learning philosophy)

make learning non-trivial and fascinating to students

(8)

learnability, approximation & generalization

—conducting machine learning properly [Cont.]

wrong use of learning (beginner’s mistakes)

ensure good approximation, pray for good generalization

—praying for something out-of-control

right use of learning

ensure good generalization, try best for good approximation

—trying something possibly in-control

We cannot guarantee learning. We can “guarantee” no disasters. That is, after we learn we will either declare success or failure, and in both cases we will be right.

(9)

linear models

—conducting machine learning safely

linear models = good generalization with established optimization tools for good approximation

after knowing approximation/generalization:

a good stage for learning safe techniques

sufficiently useful for many practical problems (Yuan et al., 2012)

building block in sophisticated techniques through feature transforms

make learning concrete to students
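The "building block through feature transforms" point can be sketched with a toy problem: points inside a circle versus points outside are not linearly separable in the original space, but a plain perceptron separates them after a quadratic transform. The transform, the data points, and the function names below are hypothetical choices for illustration, not material from the slides.

```python
def transform(x):
    """Hypothetical nonlinear transform: (x1, x2) -> (1, x1^2, x2^2)."""
    x1, x2 = x
    return (1.0, x1 * x1, x2 * x2)

def predict(w, z):
    """Linear classifier in the transformed space."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else -1

def perceptron(Z, y, max_epochs=1000):
    """Perceptron learning: correct one misclassified point at a time."""
    w = [0.0] * len(Z[0])
    for _ in range(max_epochs):
        updated = False
        for z, t in zip(Z, y):
            if predict(w, z) != t:
                w = [wi + t * zi for wi, zi in zip(w, z)]
                updated = True
        if not updated:  # a full pass without mistakes: done
            break
    return w

# Toy data (assumed): +1 near the origin, -1 far from it.
inside = [(0.0, 0.0), (0.5, 0.0), (0.0, -0.5), (0.3, 0.3)]
outside = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0), (1.5, 1.5), (-1.5, 1.5)]
X = inside + outside
y = [1] * len(inside) + [-1] * len(outside)

Z = [transform(x) for x in X]
w = perceptron(Z, y)
train_errors = sum(predict(w, z) != t for z, t in zip(Z, y))
print("weights in transformed space:", w)
print("training errors:", train_errors)
```

The learning algorithm itself stays linear; only the input representation changes, which is why the linear model remains the core to teach.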

(10)

linear models

—conducting machine learning safely [Cont.]

start with the “greatest” techniques first
- a point of no return

right use of learning

start with the simplest techniques first
- and yes, it can work well

a rich and representative family of linear techniques

classification: approx. combinatorial optimization (perceptron-like)
regression: analytic optimization (pseudo-inverse)
logistic regression: iterative optimization (SGD)

Students coming from diverse backgrounds not only get the big picture, but also the finer details in a concrete setting.
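The three optimization styles in the linear family can be sketched side by side in plain Python. This is a minimal illustration under assumptions: the regression routine handles only the one-dimensional case (where the pseudo-inverse solution reduces to the familiar closed form), labels are in {-1, +1}, and the learning rate, epoch counts, and toy data are arbitrary choices rather than settings from the courses.

```python
import math
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(v):
    return 1 if v > 0 else -1

def perceptron(X, y, max_epochs=1000):
    """Classification: perceptron-style correction of misclassified points."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(X, y):
            if sign(dot(w, x)) != t:
                w = [wi + t * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def least_squares_line(xs, ys):
    """Regression: analytic least-squares fit of y ~ w0 + w1 * x (1-D case)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    return my - w1 * mx, w1

def logistic_sgd(X, y, eta=0.1, epochs=200, seed=0):
    """Logistic regression: SGD on the error ln(1 + exp(-t * w.x))."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, t = X[i], y[i]
            s = min(t * dot(w, x), 50.0)   # clamp to keep exp() well-behaved
            step = eta * t / (1.0 + math.exp(s))
            w = [wi + step * xi for wi, xi in zip(w, x)]
    return w

# Toy separable data (assumed); the first coordinate is the constant 1.
X = [[1, 2, 3], [1, 3, 3], [1, 3, 4], [1, -1, -2], [1, -2, -1], [1, -3, -3]]
y = [1, 1, 1, -1, -1, -1]

w_pla = perceptron(X, y)
w_log = logistic_sgd(X, y)
w0, w1 = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print("perceptron weights:", w_pla)
print("logistic weights:", w_log)
print("regression line:", w0, w1)
```

All three share the same hypothesis form, a weighted sum of the inputs; what differs is the error measure and therefore the optimization tool.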

(11)

deal with noise and overfitting

—conducting machine learning professionally

overfit = difficult to ensure good generalization/learning with stochastic or deterministic noise on finite data

regularization = tools for further guaranteeing good generalization

validation = tools for certifying good learning

overfit(data size, noise level)

turn amateur students into professionals

make learning artistic to students

(12)

deal with noise and overfitting

—conducting machine learning professionally [Cont.]

apply all possible techniques and choose by best approximation result
- high risk of overfitting

right use of learning

apply a reasonable number of well-regularized techniques and choose by best validation result
- relatively immune to noise and overfitting

Complex situations call for simpler models.
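The contrast between choosing by approximation (training) result and choosing by validation result can be sketched with a tiny regularized model. The example assumes one-dimensional ridge regression without an intercept, a synthetic noisy target, and an arbitrary grid of regularization strengths; none of these come from the slides.

```python
import random

def fit_ridge(xs, ys, lam):
    """1-D ridge regression: minimize sum (w*x - y)^2 + lam * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(w, xs, ys):
    """Mean squared error of the hypothesis y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def make_data(n):
    """Synthetic noisy data (assumed target y = 0.5 * x plus Gaussian noise)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

x_train, y_train = make_data(10)
x_val, y_val = make_data(10)
lambdas = [0.0, 0.1, 1.0, 10.0]

models = {lam: fit_ridge(x_train, y_train, lam) for lam in lambdas}

# Choosing by training error always favors the least-regularized model.
by_training = min(lambdas, key=lambda lam: mse(models[lam], x_train, y_train))
# Choosing by validation error uses data the model has not been fit to.
by_validation = min(lambdas, key=lambda lam: mse(models[lam], x_val, y_val))

print("chosen by training error:", by_training)
print("chosen by validation error:", by_validation)
```

Selection by training error picks the unregularized model by construction, since regularization can only increase the in-sample error; the validation set is what makes the choice informative.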

(13)

Teaching/Learning Life After the Foundations

                     Support Vector Machine    Neural Network
generalization       large-margin bound        #-neuron bound
approximation        quadratic programming     gradient descent et al.
linear model         basic formulation         neurons
feature transform    through kernel            through cascading
regularization       large-margin              weight-decay or early-stopping
validation           #-SV bound                for choices in regularization

[libsvm-2.9]$ ./svm-train -t 2 -g 0.05 -c 100 heart_scale
optimization finished, #iter = 1966
Total nSV = 113

good approximation (by choosing kernel and optimization)
good generalization (by regularization)
good learning (by using #SV as validation indicator)
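The libsvm options map directly onto the foundations: `-t 2` selects the RBF kernel, `-g 0.05` sets its gamma, and `-c 100` sets the cost parameter that controls regularization. A rough sketch of the kernel and of the #SV validation indicator follows. It assumes the heart_scale data set has 270 examples (a figure not stated on the slide) and uses the standard result that the leave-one-out error of an SVM is at most #SV / N.

```python
import math

def rbf_kernel(x, z, gamma=0.05):
    """RBF kernel exp(-gamma * ||x - z||^2), as selected by -t 2 -g 0.05."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sv_bound(n_sv, n_examples):
    """Upper bound on the leave-one-out error: #SV / N."""
    return n_sv / n_examples

N_EXAMPLES = 270  # assumed size of heart_scale; not given on the slide

print("kernel value:", rbf_kernel((0.0, 0.0), (1.0, 2.0)))
print("#SV bound:", sv_bound(113, N_EXAMPLES))
```

Under that assumption the bound is about 0.42, which is weak here; a student who knows the foundations can read this as a signal to revisit the kernel or cost parameter rather than trust the training result.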

(14)

Teaching/Learning Life After the Foundations [Cont.]

Caltech 2012: (mixed) 7 weeks of foundations, 0.5 week of NNet, 0.5 week of RBF Net, 1 week of SVM

NTU 2011: (sequential) 10 weeks of foundations, 2.5 weeks of SVM, 2.5 weeks of bagging/boosting
- with an in-class data mining competition³ where students exploited taught/not-taught techniques with ease

often incremental efforts to teach/learn a new technique after solid foundations

³ http://main.learner.csie.ntu.edu.tw/php/ml11fall/

(15)

Conclusion

foundation-based, foundation-first

—works well in our experience

learnability: philosophical understanding, make learning non-trivial, conduct learning properly

linear models: algorithmic modeling, make learning concrete, conduct learning safely

overfitting: practical tuning, make learning artistic, conduct learning professionally

Thank you. Questions?