(1)

ML2012 Final Project

TAs: Wei-Yuan Shen, Han-Jay Yang, and Ching-Pei Lee
Instructor: Hsuan-Tien Lin

2012.12.3

(2)

Outline

Data Introduction
Evaluation Criterion
Possible Directions
Practical Issues

(3)

Data Introduction

The data sets originate from our validation set blending process in track 2 of KDDCUP2012.

Track 2 of KDDCUP2012

Task: predict the click-through rate of ads on a search engine.

Data: 155,750,158 training instances, over 10 GB of data.

Goal: maximize AUC on the test instances.

Difficulties: Huge data sets and feature extraction.

Key to our success:

Explore useful features from the data.

Exploit a diverse set of models.

Use blending to take advantage of the diversity and boost the performance.

(4)

Data Introduction

Validation set blending

1 Validation set (V): sample 1/11 of the instances from the training set.

2 Train several models on the remaining 10/11 of the instances.

3 Split V into a sub-training set (V1) and a sub-testing set (V2).

4 Use the models from step 2 to get predictions on both V and the test set.

5 Create features for V1, V2, and the test set for validation set blending, including the predictions of the models from step 2 and some optional extra features.

6 Treat V1 as the new training data and V2 as the new validation data, then train on V1 to predict on the test set (see the sketch after this list).
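A minimal sketch of the procedure above, assuming scikit-learn is available and using logistic regression as a stand-in for both the base models and the blender; the data here is synthetic, and the actual competition models are not specified on this slide:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: X, y play the role of the full training set, X_test the test set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1100, 20))
y = (rng.random(1100) < 0.5).astype(int)
X_test = rng.normal(size=(500, 20))

# Step 1: hold out 1/11 of the training instances as the validation set V.
X_rest, X_V, y_rest, y_V = train_test_split(X, y, test_size=1/11, random_state=0)

# Step 2: train several (here, two) base models on the remaining 10/11.
base_models = [LogisticRegression(C=c, max_iter=1000).fit(X_rest, y_rest)
               for c in (0.1, 1.0)]

# Step 3: split V into a sub-training set (V1) and a sub-testing set (V2).
X_V1, X_V2, y_V1, y_V2 = train_test_split(X_V, y_V, test_size=0.5, random_state=0)

# Steps 4-5: base-model predictions on V1, V2, and the test set become the
# blending features (extra hand-crafted features could be appended as columns).
Z_V1   = np.column_stack([m.predict_proba(X_V1)[:, 1]   for m in base_models])
Z_V2   = np.column_stack([m.predict_proba(X_V2)[:, 1]   for m in base_models])
Z_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

# Step 6: train the blender on V1, validate on V2, then predict on the test set.
blender = LogisticRegression(max_iter=1000).fit(Z_V1, y_V1)
print("blending AUC on V2:", roc_auc_score(y_V2, blender.predict_proba(Z_V2)[:, 1]))
test_scores = blender.predict_proba(Z_test)[:, 1]
```

The point of restricting the blender to V is that the blending features are predictions the base models made on data they were not trained on, which keeps the blender's validation estimate honest.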

(10)

Data Introduction

Validation set blending (cont.)

Benefits:

Validation set blending works when single models have enough diversity.

Since the training size is much smaller than for the single models, we can try more complicated algorithms and feature engineering.

We got about a 1% improvement in the last week of the competition.

Data sets of the final project

40,000 training examples and 50,000 testing examples.

Binary labels; each example contains 71 features.

All training and testing examples are sampled from our validation set (V) of track 2 of KDDCUP2012.

The features include 45 single-model predictions and 26 numerical features we extracted from the raw data.

Missing values: about 10% of the values in each column are missing in both the training and testing data sets.
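A quick way to check the description above once the files are in hand; the file names, CSV layout, and label column here are assumptions for illustration, not the actual distribution format:

```python
import pandas as pd

# Assumed file names and layout: adjust to the actual distribution format.
train = pd.read_csv("train.csv")   # expected shape: (40000, 72), i.e. 71 features + label
test  = pd.read_csv("test.csv")    # expected shape: (50000, 71)

X_train = train.drop(columns=["label"])
y_train = train["label"]

# Fraction of missing values per feature column (should be roughly 10% each).
print(X_train.isna().mean().describe())
```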

(11)

The ROC Curve

Receiver Operating Characteristic

True Positive Rate = TP / P

False Positive Rate = FP / N

(12)

The ROC Curve

Receiver Operating Characteristic

Each point on the curve corresponds to a (TP, FP) pair.

As we become more willing to report instances as positive, both TP and FP increase.
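A small sketch of how sweeping a score threshold traces out the curve; the labels and scores below are made up for illustration:

```python
import numpy as np

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # hypothetical labels
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])   # hypothetical scores

P, N = (y_true == 1).sum(), (y_true == 0).sum()

# Lowering the threshold reports more instances as positive,
# so both TP and FP (hence TPR and FPR) can only grow.
for thr in sorted(set(y_score), reverse=True):
    pred = y_score >= thr
    tp = (pred & (y_true == 1)).sum()
    fp = (pred & (y_true == 0)).sum()
    print(f"threshold={thr:.2f}  TPR={tp / P:.2f}  FPR={fp / N:.2f}")
```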

(13)

Typical Ranking Scenario & ROC Curve

(14)

Area Under Curve (AUC)

Defined as the area under the ROC curve.

Characteristics:

Equal to P(Rank(I+) > Rank(I−)): the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one.

Equal to the proportion of correctly-ranked pairs among all positive-negative pairs.

In a sense, this measures how well your model ranks positive instances (higher).
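Written out, with s(·) the model's score, I+ a random positive instance, and I− a random negative one, the two characterizations above read:

```latex
\mathrm{AUC}
  \;=\; \Pr\!\left[\, s(I^{+}) > s(I^{-}) \,\right]
  \;=\; \frac{\#\{(i,j) \,:\, y_i = +1,\; y_j = -1,\; s(x_i) > s(x_j)\}}{P \cdot N}
```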

(16)

Calculation of AUC

Equal to the proportion of correctly-ranked pairs among all pairs.

Given a list sorted by score, we can count the number of correctly-ranked pairs in O(n).

For each negative item, cumulatively count how many positive instances appear before it (see the sketch below).
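A minimal sketch of that counting scheme, assuming the labels are already sorted by predicted score in descending order and there are no ties (ties would need the usual half-credit treatment):

```python
import numpy as np

def auc_by_counting(y_sorted):
    """AUC from labels sorted by score, highest score first (no ties assumed).

    Each positive item seen before a negative item forms one correctly-ranked
    pair, so AUC = (# correctly-ranked pairs) / (P * N) in a single O(n) pass.
    """
    y_sorted = np.asarray(y_sorted)
    P = int((y_sorted == 1).sum())
    N = int((y_sorted == 0).sum())
    correct = positives_seen = 0
    for label in y_sorted:
        if label == 1:
            positives_seen += 1
        else:
            correct += positives_seen
    return correct / (P * N)

# Example: labels of instances sorted by predicted score, highest first.
print(auc_by_counting([1, 1, 0, 1, 0, 0]))   # 8/9 correctly-ranked pairs ≈ 0.889
```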

(17)

The Challenges

What you know so far:

How to do (binary) classification.

How to do linear / logistic regression.

The challenge:

Ranking: output is a sorted list.

Bipartite ranking: each instance is either positive or negative.

Missing values.

(18)

The Bipartite Ranking Problem

“Ranking”: give a “score” to each instance, similar to a regression problem.

But the binary labels in the training data could be a problem.

We want to rank positive instances before negative ones.

Not that different from a classification problem.

Thus, possible strategies:

“Score”: use regression techniques.

“Pairwise Comparison”: transform to a binary classification problem over pairs of examples, F : (x, x’) → y, which measures whether x is “better” than x’ (a sketch follows below).

Any way you can turn a classification prediction into a confidence measure.
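A sketch of the pairwise-comparison route, assuming numpy and scikit-learn: it samples positive-negative pairs, builds difference features x − x’, and trains an ordinary binary classifier on them. All P·N pairs are usually too many, so pairs are subsampled, and the model choice here is a placeholder:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_pairs(X, y, n_pairs=10000, seed=0):
    """Sample (positive, negative) pairs and return difference features.

    Each sampled pair contributes x_pos - x_neg with target 1 and the mirrored
    x_neg - x_pos with target 0, so the classifier learns F(x, x') = "is x better?".
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    i = rng.choice(pos, size=n_pairs)
    j = rng.choice(neg, size=n_pairs)
    diff = X[i] - X[j]
    return np.vstack([diff, -diff]), np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])

# Synthetic stand-in data; replace with the project features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 71))
y = (rng.random(2000) < 0.1).astype(int)

X_pairs, y_pairs = make_pairs(X, y)
ranker = LogisticRegression(max_iter=1000).fit(X_pairs, y_pairs)

# With a linear model, the learned weights give a per-instance score,
# so sorting by X @ coef_ produces the ranked list.
scores = X @ ranker.coef_.ravel()
```

With a linear model the pairwise classifier collapses back to a per-instance scoring function, which is what makes this reduction convenient for producing the final sorted list.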

(19)

The Bipartite Ranking Problem

A few things to note, though:

Handle ties with caution. Try to break ties if possible.

As in typical bipartite ranking problems, the classes could be unbalanced.

Be sure to use AUC to measure your performance (including your validation performance).

(20)

Handling Missing Data

Random values.

Average values.

Special label ‘?’ ..?

Most “likely” values.

Look for similar sample?

Predict the missing value?

Use your imagination.
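A few of these options in code, as a sketch assuming the features sit in a pandas DataFrame; the data and column names below are made up:

```python
import numpy as np
import pandas as pd

# Synthetic feature table with roughly 10% missing entries per column.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 5)), columns=[f"f{i}" for i in range(5)])
X = X.mask(rng.random(X.shape) < 0.1)

# "Average values": fill each column with its own mean.
X_mean = X.fillna(X.mean())

# "Special label" variant: fill with a sentinel and add per-column missing
# indicators, so the model can still tell which values were originally absent.
indicators = X.isna().astype(int).add_suffix("_missing")
X_flag = pd.concat([X.fillna(0.0), indicators], axis=1)

# "Random values": draw replacements from the observed values of the same column.
X_rand = X.apply(lambda col: col.fillna(pd.Series(
    rng.choice(col.dropna().to_numpy(), size=len(col)), index=col.index)))
```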

(21)

Practical Issues

1 Data Pre-Processing
   Target normalization
   Feature normalization (a sketch follows this list)
   Feature engineering

2 Parameter Selection
   Depends on your data
   Overfitting and underfitting
   Model type selection
   Tradeoff between training time and performance
   Stopping criteria: error tolerance

3 Accelerate the whole training procedure
   Training time vs. loading time
   Local disk vs. NFS
   Parallelization
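As one concrete instance of points 1 and 2, a sketch of feature normalization plus parameter selection scored by AUC, assuming scikit-learn; the model, parameter grid, and synthetic data are placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; replace with the (imputed) project features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 71)) * rng.uniform(0.1, 10, size=71)  # wildly different scales
y = (rng.random(2000) < 0.1).astype(int)

pipeline = Pipeline([
    ("scale", StandardScaler()),                  # feature normalization
    ("clf", LogisticRegression(max_iter=1000)),   # placeholder model
])

# Parameter selection with AUC as the criterion, since that is what the project scores;
# n_jobs=-1 parallelizes the cross-validation folds (point 3).
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1, 10]},
                      scoring="roc_auc", cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```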

(22)

Questions?
