
5. Empirical Results

5.1 GAM

In order to identify which variables practically enhance predictive performance, we specify models using reservation information and restaurant information and evaluate their performance by AUC (see Section 4 for details). We note that all the variables used for forecasting are statistically significant. In our study, P_i denotes

P_i = \Pr(\text{Return90}_i = 1 \mid X_i)

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1 S_1(\text{Gender}) + \beta_2 S_2(\text{Age})    (9)

Equation (9) is a generalized additive model with a member's age (Age) and gender (Gender) as predictors. Traditionally, these profile variables are considered powerful for predicting consumer behavior. However, the AUC (Fig 5.1) is only 7.7% larger than that of a randomly guessing model (AUC = 0.5). That is, Age and Gender are not helpful for predicting a customer's return (Return90). Moreover, the independent variable Age has many missing values, which may cause a loss of information.

Fig 5.1
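As a sketch of how such a model can be fit and scored, the following uses pygam (one possible GAM implementation) on synthetic stand-in data; the column layout, sample size, and resulting AUC are illustrative assumptions only and do not reproduce the thesis results.

```python
import numpy as np
from pygam import LogisticGAM, f, s
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins for the four reservation variables used in Section 5.1
X = np.column_stack([
    rng.integers(0, 2, n),      # Gender (0/1)
    rng.integers(18, 66, n),    # Age in years
    rng.integers(0, 31, n),     # Timediff: days between booking and dining
    rng.integers(1, 11, n),     # People: reservation group size
])
y = rng.integers(0, 2, n)       # Return90 (synthetic labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# M0: a factor term for the binary Gender and a smooth term S2(Age) for Age
m0 = LogisticGAM(f(0) + s(1)).fit(X_train[:, :2], y_train)
p0 = m0.predict_proba(X_test[:, :2])       # P_i = Pr(Return90_i = 1)
print(f"M0 AUC: {roc_auc_score(y_test, p0):.4f}")  # near 0.5 on random labels
```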


We next include the period of time between the order-placed day and the dining day (Timediff) and the reservation group size (People) as predictors (Equation 10).

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1 S_1(\text{Timediff}) + \beta_2 S_2(\text{People})    (10)

In Fig 5.2, the ROC curve of Equation (10) is closer to the point (0, 1) than that of Equation (9) in Fig 5.1, and its AUC is 0.5799. This means that Equation (10) has better prediction performance than Equation (9) (0.5799 versus 0.5385).

As a result, we conclude that these two predictors improve prediction performance.

Fig 5.2
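Continuing the synthetic sketch above, M1 can be fit on Timediff and People and the two ROC curves drawn in the manner of Figs 5.1 and 5.2 (m0, p0, and the train/test arrays come from the previous block):

```python
import matplotlib.pyplot as plt
from pygam import LogisticGAM, s
from sklearn.metrics import roc_auc_score, roc_curve

# M1: smooth terms for Timediff and People (columns 2 and 3 of X)
m1 = LogisticGAM(s(0) + s(1)).fit(X_train[:, 2:4], y_train)
p1 = m1.predict_proba(X_test[:, 2:4])

for label, p in [("M0: Age + Gender", p0), ("M1: Timediff + People", p1)]:
    fpr, tpr, _ = roc_curve(y_test, p)
    plt.plot(fpr, tpr, label=f"{label} (AUC={roc_auc_score(y_test, p):.4f})")
plt.plot([0, 1], [0, 1], "k--", label="random guess (AUC = 0.5)")
plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
plt.legend(); plt.show()
```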

Since some of the reservation-information variables improve prediction performance, we further examine whether the status of a reservation can improve performance as well.

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1\,\text{status\_ok} + \beta_2\,\text{status\_canceled} + \beta_3\,\text{status\_changed}    (11)

Equation (11) is another classification model using reservation status: the reservation is new or ok (status_ok), canceled (status_canceled), or changed (status_changed). As we mention in Section 3, these three independent variables record the status of a reservation. Fig 5.3 compares model M1 (Equation 10) with M2 (Equation 11). The prediction performance of M2 is better: its AUC is 0.6236, higher than the 0.5799 of M1.
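Because Equation (11) contains only linear dummy terms, an ordinary logistic regression reproduces it. The sketch below builds a synthetic one-hot status table purely for illustration; the printed AUC does not reflect the thesis results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
status = ["status_ok", "status_canceled", "status_changed"]
# Each synthetic reservation has exactly one status dummy set to 1
codes = rng.integers(0, 3, 2000)
df = pd.DataFrame(np.eye(3, dtype=int)[codes], columns=status)
df["Return90"] = rng.integers(0, 2, 2000)
df_train, df_test = df.iloc[:1400], df.iloc[1400:]

m2 = LogisticRegression().fit(df_train[status], df_train["Return90"])
p2 = m2.predict_proba(df_test[status])[:, 1]    # Pr(Return90 = 1)
print(f"M2 AUC: {roc_auc_score(df_test['Return90'], p2):.4f}")
```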


Consequently, reservation status improves the prediction as well.

Fig 5.3

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1\,\text{status\_ok} + \beta_2\,\text{status\_canceled} + \beta_3\,\text{status\_changed} + \beta_4 S_1(\text{Timediff}) + \beta_5 S_2(\text{People})    (12)

Equation (12) combines the independent variables of Equations (10) and (11). Fig 5.4 shows the ROC curves of the new model M3 together with the other two, M1 and M2. The AUC of M3 is 0.6605, an improvement over M2 (from 0.6236 to 0.6605).
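A sketch of Equation (12) in pygam combines linear terms l(·) for the status dummies with smooth terms s(·) for Timediff and People. The data below are synthetic stand-ins, and the in-sample AUC is illustrative only.

```python
import numpy as np
from pygam import LogisticGAM, l, s
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 2000
codes = rng.integers(0, 3, n)                         # synthetic status codes
X12 = np.column_stack([np.eye(3, dtype=int)[codes],   # status_ok/canceled/changed
                       rng.integers(0, 31, n),        # Timediff
                       rng.integers(1, 11, n)])       # People
y12 = rng.integers(0, 2, n)                           # Return90 (synthetic)

# All three dummies are kept to mirror Equation (12); pygam's penalized
# fit copes with their collinearity with the intercept.
m3 = LogisticGAM(l(0) + l(1) + l(2) + s(3) + s(4)).fit(X12, y12)
print(f"in-sample AUC: {roc_auc_score(y12, m3.predict_proba(X12)):.4f}")
```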


Fig 5.4

With the improvement achieved by M3, we attempt to introduce all the remaining variables into the classification model. We infer that the predictive power of the member profile may increase within a larger predictor set. In Section 3, we mention that, besides reservation information, the dataset also includes information about the restaurants in the booking records. It is possible that restaurant information influences members' behavior.

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1\,\text{age16--25} + \beta_2\,\text{age26--35} + \beta_3\,\text{age36--45} + \beta_4\,\text{gender} + \beta_5\,\text{status\_ok} + \beta_6\,\text{status\_canceled} + \beta_7\,\text{status\_changed} + \beta_8 S_1(\text{Timediff}) + \beta_9 S_2(\text{People}) + \beta_{10}\,\text{Is\_hotel} + \beta_{11}\,\text{new\_taipei\_city} + \beta_{12}\,\text{out\_of\_greater\_taipei} + \beta_{13}\,\text{Wifi}    (13)

Equation (13) includes the member's profile and restaurant information, covering age, gender, the location of the restaurant (area), whether the restaurant is in a hotel (is_hotel), and whether the restaurant provides wifi (wifi). In Fig 5.5, the ROC curve of M4 is almost the same as that of M3. This implies that, in our case, a restaurant's geographic information and facilities have little power to predict a member's usage of the online booking service. In other words, no matter where a restaurant is, its location does not influence member behavior. Table 8 shows the AUC comparison among these GAM models.


Fig 5.5

Table 8 GAM model comparison

Model Name Variables AUC

M0 age, gender 0.5305

M1 timediff, people 0.5799

M2 status 0.6236

M3 timediff, people, status 0.6605

M4 timediff, people, status, restaurant 0.6644

After constructing the GAM models with different combinations of independent variables, we are unable to improve the AUC further. Therefore, we turn to tree-based methods, which also perform well in data classification.

The goal of tree-based methods is to divide the predictor space into several regions for classification; at the end of the division, the regions are summarized in a tree. A decision tree, constructed through this process, classifies data and yields easily interpreted results. However, it has lower predictive power than learning methods such as logistic regression and GAM (James et al., 2013). Therefore, we also fit our data with two other tree-based methods,

Bagging and Random Forest. Unlike a single decision tree, these two learning approaches resample the training data through the bootstrap. The bootstrap is commonly used to reduce the variance of predictions across training sets; it resamples the data multiple times by sampling the training data with replacement, as the sketch below illustrates.
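The following minimal illustration reuses the synthetic X_train and y_train from the earlier GAM sketch; the loop simply collects B = 100 bootstrap resamples of the training set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = len(X_train)                      # X_train/y_train from the earlier sketch
boot_samples = []
for b in range(100):                  # B = 100 bootstrap replicates
    idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
    boot_samples.append((X_train[idx], y_train[idx]))
```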

Bagging first bootstraps new training sets from the original one. Each new training set is fit separately to a prediction model using the full set of p predictors, and the individual results are then aggregated into one single model. Generally speaking, the results are aggregated through the classifiers' voting. For instance, if the training data are resampled 100 times through the bootstrap, there are 100 classifiers, and hence 100 predictions for each observation. The final prediction is decided by their vote: if 90 of the 100 predictions are “Yes”, the final prediction for that observation is “Yes”.
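As a sketch of the procedure just described, scikit-learn's BaggingClassifier fits 100 trees to bootstrap samples and aggregates them by voting; the synthetic X_train/y_train from the earlier sketch stand in for our data.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 100 trees, each fit to a bootstrap sample with the full predictor set
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        bootstrap=True).fit(X_train, y_train)
y_hat = bag.predict(X_test)   # the majority vote of the 100 classifiers
```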

Random forest, on the other hand, is a refined version of bagging. Like bagging, it bootstraps multiple training sets. However, when each of these training sets is fit to a decision tree, a random subset of m predictors is chosen from the full set of p predictors. The number m is typically the square root of p, that is, m = \sqrt{p}. This improves on bagging by de-correlating the trees; that is, random forest mitigates the problem of overfitting (James et al., 2013).
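A corresponding random forest sketch follows, again on the synthetic stand-in data; note that scikit-learn's max_features="sqrt" draws the random subset of m = sqrt(p) predictors at every split of every tree.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rf = RandomForestClassifier(n_estimators=100,
                            max_features="sqrt")   # m = sqrt(p) per split
rf.fit(X_train, y_train)
p_rf = rf.predict_proba(X_test)[:, 1]              # averaged class-1 probability
print(f"Random forest AUC: {roc_auc_score(y_test, p_rf):.4f}")
```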

Fig 5.6 visualizes the two types of decision tree: a classification tree and a regression tree. The classification tree has only one node, status_canceled. Whether a record's status_canceled is 1 or 0, the record is classified as not placing another reservation within 90 days. Accordingly, there is no positive prediction throughout the classification process, which leads to a poor AUC close to 0.5. In contrast, the AUC of the regression tree is about 0.61, because it produces positive predictions and improves the prediction performance. The regression tree also has only one node: if a record's status_canceled is 1, the member's return probability is 0.3834; if status_canceled is 0, the return probability is 0.1553. Consequently, we use the regression tree for prediction.
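For illustration, a depth-1 regression tree like the one in Fig 5.6 can be fit as follows; the synthetic df_train/df_test from the Equation (11) sketch stand in for our data, so the leaf values will not match the 0.3834 and 0.1553 reported above.

```python
from sklearn.tree import DecisionTreeRegressor

# A single split, as in Fig 5.6: each leaf's fitted value is the mean of
# Return90 in that leaf, i.e. an estimated return probability.
reg = DecisionTreeRegressor(max_depth=1)
reg.fit(df_train[["status_canceled"]], df_train["Return90"])
p_reg = reg.predict(df_test[["status_canceled"]])  # leaf means as Pr(return)
```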
