Academic year: 2022

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering

National Taiwan University



My Two Classes with Competition-Based Final Projects

Machine Learning: elective (junior+)



2014 (7 years)

Data Structures & Algorithms: required (freshmen)



2015 (5 years)


Students’ Reactions (Selected)



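游靖堂, 宋彥頡, 2013.06.27: So many nights spent debugging or writing new algorithms and data structures; even on the night before the discrete math exam, we rushed until 4 a.m. to finish before the competition ended.

All for this... the DSA Final Project!!!!!

From the first submission at 75%, 46 seconds (only command c implemented) to the last submission at 100%, 4 seconds.

From 40-some seconds down to 10 seconds (and adding -O2 instantly brought it down to 4 seconds); from 2249162 bytes to 1499446 bytes.

From array + binary search to Trie + Binary Search Tree. Looking back on 20 days of hard work, it is finally... done!!!!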


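游靖堂, 宋彥頡, 2013.06.27: While coding, we also became more and more familiar with classes. It is a pity we did not manage to hand-code an AVL Tree to replace the BST; it is a pity we did not learn how powerful -O2 is before the competition ended; it is a pity...

To be able to pull together all the spell checker functions like this,

and to quickly code up a Trie, which we had never learned and whose name we did not even know before coding

My partner 游靖堂 talks a lot of trash, but he is also really reliable; many of the ideas came entirely from him.

Also, the gurus 洪湧 Cheng Yueh Han Han W stayed far ahead the whole time, which motivated us to keep striving upward.

Whatever the competition result or the Final Project Report score turns out to be, I think we have done our best.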


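Personally, I think the most brilliant part is the competition in the final project. It is truly brilliant: coding becomes addictive, you want to write again and again, faster and still faster, smaller and still smaller, especially with the shock of being overtaken by other competitors. It even makes you want to give up the finals of other courses and go all in on writing a super-strong program. (DSA2013)

The Final Project is really too tempting... finals are almost here, yet every day I still cannot help tweaking it a bit, poking it a bit, tuning it a bit. (DSA2013)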



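As for the final project, I personally took quite a hit, because the other teams' accuracies were all so high, while we, even though we used weka, could not even get the algorithm to run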



Excitement of Competition




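“Sixth, encourage students to compete. Nothing else is like ‘competition’ in making people forget to eat and sleep and work 24 hours a day without tiring. We encourage students to take part in all kinds of international competitions,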




Another “In-Class” Competition: KDD Cup


an annual competition on KDD (knowledge discovery and data mining)

organized by ACM SIGKDD, starting from 1997, now the most prestigious data mining competition

usually lasts 3-4 months

participants include famous research labs (IBM, AT&T) and top universities (Stanford, Berkeley)


KDD Cups: 2008 to 2013 I


2008
organizer: Siemens
topic: breast cancer prediction (medical)
data size: 0.2M
teams: > 200
NTU: co-champion with IBM (led by Prof. Shou-De Lin)


2009
organizer: Orange
topic: customer behavior prediction (business)
data size: 0.1M
teams: > 400
NTU: 3rd place in the slow track


2010
organizer: PSLC Data Shop
topic: student performance prediction (education)
data size: 30M
teams: > 100
NTU: champion and student-team champion


2011
organizer: Yahoo!
topic: music preference prediction (recommendation)
data size: 300M
teams: > 1000
NTU: double champions


2012
organizer: Tencent
topic: web user behavior prediction (Internet)
data size: 150M
teams: > 800
NTU: champion of track 2


2013
organizer: Microsoft
topic: paper-author relationship prediction (text mining)
data size: 700M
teams: > 500
NTU: double champions


KDD Cup 2011

Music Recommendation Systems

host: Yahoo!

11 years of Yahoo! music data

2 tracks of competition

official dates: March 15 to June 30

1878 teams submitted to track 1;

1854 teams submitted to track 2


NTU Team for KDD Cup 2011

3 faculty members:

Profs. Chih-Jen Lin, Hsuan-Tien Lin and Shou-De Lin

1 course

Data Mining and Machine Learning: Theory and Practice

3 TAs and 19 students:

most were inexperienced in music recommendation in the beginning

official classes: April to June;

actual classes: December to June

our motto: study state-of-the-art approaches and then creatively improve them


Previously: How Much Did You Like These Movies?


(1M-dollar competition between 2007 and 2009)

goal: use “movies you’ve rated” to automatically predict your preferences on future movies


My Other Motivations

I HATE exams

even more than my students...


My Design: Time Line

key dates:

report due (i.e. overall competition end): as late as possible

(often 4 days before I need to submit the scores to NTU)

award ceremony (i.e. early competition end): usually the last class

announcement: best timing is right after the midterm

—but may highly depend on TAs’ schedule

start designing: two or more weeks before the announcement


My Design: Story/Topic

an interesting story makes the competition exciting!


In this final project, you are going to be part of an exciting machine learning competition. Consider a startup company that features a coming product on the mobile phone. The core of the product is a robust character recognition system... To win the prize, you need to fight for the leading positions on the score board. Then, you need to submit a comprehensive report that describes not only the recommended approaches, but also the reasoning behind your recommendations. Well, let’s get started!

more interesting ones:

ML2014, ML2013: optical character recognition

ML2012: ad click prediction (derived from KDD Cup 2012)

DSA2014, DSA2012: email searcher

DSA2013, DSA2011: spell checker

(often okay to reuse with modifications)


My Design: Team Size

most ideal team size IMHO is 3:

collaboration, dispute resolution, fewer free riders, etc.

but can also allow 4 if the class is too big for the TAs to grade

usually allow ≤ 3:

so students do not have the burden of finding exactly 3

students can flexibly break up teams if needed

but evaluate with the workload of 3 for fairness

still sometimes hard for some students to find team members:

motto: provide a matching mechanism, but do not force anyone onto any team

prevent free riders: require a workload distribution in the report


My Design: Scoreboard

core place that makes the game exciting

thanks to my TAs in all those years for creating and maintaining the service

basically, a simple submit-judge-scoreboard system

usually provide the students an additional description field to interact, though few use it for serious purposes
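The submit-judge-scoreboard loop described above can be sketched roughly as follows. This is only an illustration, not the TAs' actual service: the class and function names, the use of accuracy as the judging metric, and the split into a public answer key (shown on the scoreboard) and a hidden one (revealed after the competition) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


def accuracy(predicted, truth):
    """Fraction of positions where the prediction matches the answer key."""
    if len(predicted) != len(truth):
        raise ValueError("prediction length does not match the answer key")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


@dataclass
class Submission:
    team: str
    public_score: float
    hidden_score: float
    description: str  # free-text field the team can use to interact


@dataclass
class Scoreboard:
    public_key: list  # answers behind the visible score (assumed split)
    hidden_key: list  # answers held back until the competition ends
    log: list = field(default_factory=list)

    def submit(self, team, public_pred, hidden_pred, description=""):
        """Judge one submission, log it, and return the visible score."""
        entry = Submission(
            team,
            accuracy(public_pred, self.public_key),
            accuracy(hidden_pred, self.hidden_key),
            description,
        )
        self.log.append(entry)
        return entry.public_score

    def ranking(self):
        """Best public score per team, highest first."""
        best = {}
        for s in self.log:
            best[s.team] = max(best.get(s.team, 0.0), s.public_score)
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Keeping the full submission log, rather than only the best score, is what would make timestamp-based or count-based awards possible later; a real service would also record the submission time.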


My Design: Team Names

good (humorous) team names make the competition interesting



DSAGG (Don’t Submit A Goddamn Garbage)


HTLIN (Have To Learn In NTU)

“bad” team names?

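2014 ITSA online programming contest: 「閉上眼睛深吸氣,想想妹妹就打出來 (roughly, “close your eyes, take a deep breath, think of a girl, and it just gets typed out”)

(I don’t know whether I should “educate” students about this, but so far no students have crossed my line)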


My Design: Award Ceremony

purpose: to add more fun

light presents (postcards, paper notebooks, etc.)

some students list their good-performing awards in their resumes

may serve some educational purposes

in addition to good-performing awards, can also give interesting awards


ML2012: How Much Overfitting Can We Get?

9472 submissions from 52 teams within 1.5 months...


Award 1: First Submission Award

team                       scoreboard  hidden  algorithm  time
Not Here∼ Combo Three!!!   0.5018      0.4998  Random     2012/11/27 20:28:38


Award 4: Happy 2013 Award

team          scoreboard  hidden  algorithm  time
Minimaximizer 0.7632      0.7407  rwa        2013/01/01 00:00:08


Award 5: Goodbye 2012 Award

team      scoreboard  hidden  algorithm  time
anything  0.7704      0.7527  b          2012/12/31 23:59:24


Award 7-8: Hard Working Awards

team submission count

A 1097

anything 1149


My Design: Grade

generally based on the report, not the competition, but the two are correlated

too much emphasis on competition ⇒ utilitarianism

too little emphasis on competition ⇒ less interesting game

ask TAs to act as “bosses”: The grading TAs would grade qualitatively with letters: A++[210], A+[196], A[186], B+[176], B[166], C+[156], C[146], D+[136], D[126], F+[116], F[76], F-[36], Z[0]

list basic requirements corresponding to B

to get B, students only need to put in about the effort of a usual homework

to get more, students need to do more to convince the TAs

generally “loose” about basic requirements

—most students perform way beyond the basic requirements anyway

generally a team grade, but adjust individual grades if the workload is unbalanced
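As a small illustration of the letter scheme above, the lookup below copies the letters and bracketed numbers from the slide. Reading the bracketed numbers as points, and averaging the letters from several grading TAs, are assumptions for this sketch rather than the course's documented rule; `team_points` is a hypothetical helper name.

```python
from statistics import mean

# Letters and numbers as listed on the slide; treating the numbers
# as points per letter is an assumption.
LETTER_POINTS = {
    "A++": 210, "A+": 196, "A": 186, "B+": 176, "B": 166,
    "C+": 156, "C": 146, "D+": 136, "D": 126,
    "F+": 116, "F": 76, "F-": 36, "Z": 0,
}


def team_points(letters):
    """Average the points of the letters given by the grading TAs
    (hypothetical aggregation rule)."""
    if not letters:
        raise ValueError("need at least one letter grade")
    return mean([LETTER_POINTS[letter] for letter in letters])
```

For example, under these assumptions a team graded A by one TA and B+ by another would receive the average of 186 and 176 points.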


My Design: Loading

ideal: a bit harder than homework

estimate: 60 to 90 man-hours to finish basic requirements (30 man-hours per member)

sometimes need to adjust the loading of other homeworks

—not an easy task, though


My Design: Coverage

motto: try not to restrict the tools that students use

but the competition sometimes needs some restrictions for fairness and to keep the focus of the project

parallel programming for freshmen?

external data for optical character recognition?

decision criterion: which choice makes most people in the game most excited?

try to be super-flexible in the report to still reward creativity


My Design: TAs

good TAs’ help is essential; I cannot thank them enough!

design, system setup, discussion with students

unfortunately, NTU cannot pay many TAs

(many of our TAs are volunteers)

(joined entirely of their own free will, even my lab students)

some TAs even play in the competition (good or bad)


My Design: TAs

always note: TAs are busy!!


My Design: Instructor

my main job: heat up the competition



My Design: Instructor

my two other jobs:

participate seriously in the design

maintain fairness of the competition


Less Successful Stories

DSA2015: announced late, hard homework

DSA2011: decided to do a final project too late

ML2013: task too easy in some sense


Some Summary Thoughts

Positive Side

fun for most students, TAs and instructor

students, TAs and instructor learn a lot

Negative Side

exhausting for most students, TAs and instructor

can be disappointing for some students

Questions and Discussions?



