
5. Empirical Results

5.2 Tree-Based Learning

After constructing models through GAM with different combinations of independent variables, we are not able to improve their AUC further. Therefore, we turn to tree-based methods, which also perform well in data classification.

The goal of tree-based methods is to divide the predictor space into several distinct regions for classification. At the end of the division, the regions are summarized as a tree. A decision tree is constructed through exactly this process; it classifies data and yields easily interpretable results. However, its prediction power is poor compared with learning methods such as logistic regression and GAM (James et al., 2013). Therefore, we also fit our data with two other tree-based methods, bagging and random forest. Unlike a single decision tree, these two learning approaches resample the training data through the bootstrap procedure. The bootstrap is commonly used to reduce the variance in predictions that comes from fitting a model to a single training set: it resamples the data multiple times by sampling the training data with replacement.
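To make the bootstrap concrete, the following minimal NumPy sketch draws B = 100 resamples with replacement. The data shapes and variable names here are placeholders for illustration, not the EZTable data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder training data: n reservations with p predictors and a binary
# label (1 = the member places another reservation within 90 days).
n, p = 1000, 13
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)

B = 100  # number of bootstrap resamples
bootstrap_sets = []
for b in range(B):
    idx = rng.integers(0, n, size=n)          # sample row indices with replacement
    bootstrap_sets.append((X[idx], y[idx]))   # one resampled training set per draw
```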

Bagging first bootstraps new training data sets from the original one. Each new training set is fit to a separate prediction model using the full set of p predictors, and the predictions are then aggregated into a single model. Generally speaking, the results are aggregated through the classifiers' votes. For instance, if the training data are resampled 100 times through the bootstrap, there are 100 classifiers and hence 100 predictions for each observation. The final prediction is decided by a vote among these 100 predictions: if 90 of them are “Yes”, the observation's final prediction is “Yes”.

Random forest, on the other hand, is a refined version of bagging. As in bagging, multiple training data sets are bootstrapped. However, when each of these training sets is fit to a decision tree, a random subset of m predictors is chosen from the full set of p predictors at each split. The number m is typically the square root of p, i.e., m = √p. This improves on bagging by de-correlating the trees; that is, random forest can mitigate the problem of overfitting (James et al., 2013).
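Assuming the placeholder X and y above, a minimal scikit-learn sketch of the two ensembles looks as follows; the essential difference is that each bagged tree may use all p predictors, while the random forest considers only m ≈ √p candidate predictors at each split (its max_features argument).

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

# Bagging: 100 bootstrapped trees; every split may use the full set of p predictors.
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Random forest: 100 bootstrapped trees, but only m = sqrt(p) randomly chosen
# predictors are considered at each split, which de-correlates the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

bagging.fit(X, y)
forest.fit(X, y)
```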

Fig 5.6 visualizes the two different types of decision tree: the classification tree and the regression tree. The classification tree has only one split node, on status_canceled, yet whether a record's status_canceled is 1 or 0, the record is classified as not placing another reservation within 90 days. Accordingly, there is no positive prediction throughout the classification process, which leads to a poor AUC close to 0.5. In contrast, the AUC of the regression tree is about 0.61, because it produces positive predictions and thereby improves prediction performance. The regression tree also has only one split node: if a record's status_canceled is 1, the member's return probability is 0.3834; if status_canceled is 0, the return probability is 0.1553. Consequently, we use regression trees to construct the models in the rest of this research, including bagging and random forest.

Fig 5.7 compares the AUC of the tree-based models (decision tree, bagging, and random forest) and GAM. The independent variables for constructing the models are the same as in equation (13). The AUC of GAM is 0.6675, higher than that of all the other models, including random forest (0.6582).

This indicates that, with the same predictors as in M4, GAM still performs better than these tree-based methods. Interestingly, the AUC of the decision tree with only one predictor (0.6145) is higher than that of bagging (0.6058).
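The AUC gap between the classification and regression trees can be reproduced in spirit with the sketch below, continuing with the synthetic X and y from the earlier sketch (so the numbers will not match the thesis results). AUC is computed from predicted scores, so a classifier that outputs the same hard label for every record scores near 0.5, while a regression tree's leaf means (the 0.3834 vs. 0.1553 probabilities above) can still rank members.

```python
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Classification tree: hard 0/1 labels; if every leaf predicts "No", AUC = 0.5.
clf = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)
print("classification tree AUC:", roc_auc_score(y_te, clf.predict(X_te)))

# Regression tree: leaf means act as return probabilities, so records can be
# ranked even when no hard prediction would be positive.
reg = DecisionTreeRegressor(max_depth=1).fit(X_tr, y_tr)
print("regression tree AUC:", roc_auc_score(y_te, reg.predict(X_te)))
```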


Fig 5.6 Classification tree and regression tree

Fig 5.7 AUC comparison of tree-based models and GAM


Fig 5.8 Variable importance of the bagging model
Fig 5.9 Variable importance of the random forest model

Fig 5.8 and Fig 5.9 compare the variable importance of bagging and random forest. For each tree, the original prediction error for classification is recorded, and a permuted prediction error is recorded after permuting each predictor variable.

For example, if there are 10 variables in the model, each tree has one original prediction error for the model with the full set of variables and 10 permuted prediction errors, one per variable. The differences between the original prediction error and the permuted prediction errors are then averaged across trees and normalized by the standard deviation of these differences. The normalized difference is the relative importance of the variable: the variable with the biggest error difference is the most important one, because the prediction error increases substantially without it. Just as the structure of the decision tree shows (Fig 5.6), status_canceled is the most important variable in both the bagging and the random forest models. However, restaurant information such as is_hotel and wifi also appears to improve prediction considerably, which differs from the GAM results.
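One common implementation of this permutation-based importance is scikit-learn's permutation_importance, which records the drop in a chosen score (here AUC) after shuffling each predictor. The sketch below reuses the fitted forest and the held-out X_te, y_te from the earlier sketches; note that this variant permutes on a test set rather than on out-of-bag samples, so it only mirrors the per-tree procedure described above.

```python
from sklearn.inspection import permutation_importance

# Permute each predictor in turn and measure how much the AUC drops; the mean
# drop and its spread play the role of the normalized difference above.
result = permutation_importance(
    forest, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0
)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: mean AUC drop {result.importances_mean[i]:.4f} "
          f"(sd {result.importances_std[i]:.4f})")
```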

In order to improve the performance of the tree-based models, we also include two-way interaction terms. First, we add the squared terms of people and timediff to equation (13), resulting in 15 independent variables. Each pair of these independent variables is then multiplied together, yielding 96 two-way interaction terms. These are the combinations of every two variables, except that the interaction terms among the status, age, and area dummies themselves are excluded because they are categorical variables, and the interaction terms between people, timediff and their squared terms are also excluded (96 = C(15,2) − C(3,2) − C(3,2) − C(2,2) − 2). As a result, there are 111 independent variables in total (96 + 15). Moreover, unlike the bagging model, the random forest model chooses its predictors at random: the number of variables considered in the bagging model is 111, while in the random forest model it is 11 (≈ √111).
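The count of 96 interaction terms can be checked mechanically. The sketch below uses hypothetical variable names and groupings consistent with the exclusion rule above (pairs within the status, age, and area dummy groups, and pairs between people, timediff and their own squares); the actual variables come from equation (13).

```python
from itertools import combinations

# Hypothetical names; only the group sizes (3, 3, 2) and the two square pairs
# matter for the count.
status = ["status_ok", "status_canceled", "status_changed"]
age = ["age_young", "age_middle", "age_old"]
area = ["area_north", "area_south"]
others = ["gender", "is_hotel", "wifi"]
squares = [("people", "people_sq"), ("timediff", "timediff_sq")]

variables = status + age + area + others + [v for pair in squares for v in pair]
assert len(variables) == 15

def excluded(a, b):
    same_group = any(a in g and b in g for g in (status, age, area))
    same_square_pair = (a, b) in squares or (b, a) in squares
    return same_group or same_square_pair

interactions = [(a, b) for a, b in combinations(variables, 2) if not excluded(a, b)]
print(len(interactions))  # 96 = C(15,2) - C(3,2) - C(3,2) - C(2,2) - 2
```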

Fig 5.10 AUC comparison of tree-based models with two-way interaction terms and GAM

Fig 5.10 compares the AUC of the tree-based learning models with two-way interaction terms and GAM without interaction terms. The GAM model still performs better than the tree-based learning models. The AUC of the decision tree model (0.6145) is higher than that of the bagging model (0.6058) and the random forest model (0.6092). Moreover, the random forest model with interaction terms (0.6092) performs worse than the one without (0.6582).

Fig 5.11 lists the importance of the variables in the random forest model with interaction terms. No interaction term appears on the list, which explains why the performance of the random forest model does not improve after adding these terms to the predictive models. In other words, interaction terms cannot improve prediction performance in this context. Moreover, it is surprising that is_hotel and wifi are two of the most important predictors in the model. We suspect that the well-performing variable status_canceled does not appear in every variable subset when variables are randomly selected in the random forest; other variables, such as is_hotel and wifi, may therefore perform well in the trees without status_canceled, and their relative importance may be inflated compared with status_canceled.


Fig 5.11 Variable importance of the random forest model with interaction terms

Due to the poor performance of the random forest model with interaction terms, we suspect that having too many predictors raises the probability of selecting variables without prediction power when training the model. Therefore, we eliminate the redundant variables and keep the predictors that improve performance. According to the results above, the status of the reservation (status_ok, status_canceled, status_changed) improves prediction performance considerably. We therefore try specifications focusing on reservation status to see whether GAM still performs better than the tree-based learning methods in this context. In the first model, we pick the three status variables (status_ok, status_canceled, status_changed) and the interactions between these three status variables and the remaining 10 variables in equation (13); in other words, there are 33 (3 + 3×10) variables in the model. In the second model, the 10 variables themselves are also added, so there are 43 (3 + 3×10 + 10) independent variables. Fig 5.12 and Fig 5.13 show the AUC of the two sets of independent variables under the four different training methods: Fig 5.12 is the first model with 33 variables and Fig 5.13 is the second model with 43 variables. The GAM model still performs better than all the other tree-based training models. Also, the performance of the random forest models improves greatly under this combination of independent variables: the AUC grows from 0.6092 to about 0.665. Table 9 compares the learning methods with and without interaction terms.
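The two status-focused design matrices can be sketched as below, assuming a pandas DataFrame holding the three status dummies and the remaining ten variables of equation (13); the x1…x10 column names here are stand-ins for the actual variables.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
status_vars = ["status_ok", "status_canceled", "status_changed"]
other_vars = [f"x{i}" for i in range(1, 11)]   # stand-ins for the 10 remaining variables
df = pd.DataFrame(rng.normal(size=(1000, 13)), columns=status_vars + other_vars)

# 3 * 10 = 30 status-by-other interaction columns.
interactions = pd.DataFrame(
    {f"{s}_x_{o}": df[s] * df[o] for s in status_vars for o in other_vars}
)

# First model: status dummies plus their interactions (3 + 30 = 33 variables).
X_model1 = pd.concat([df[status_vars], interactions], axis=1)

# Second model: additionally include the 10 main effects (43 variables).
X_model2 = pd.concat([df[status_vars], interactions, df[other_vars]], axis=1)
```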


Fig 5.12 AUC comparison of the first model (33 variables)

Fig 5.13 AUC comparison of the second model (43 variables)


Table 9 Learning methods comparison (model AUC)

Learning method    Without interaction terms    With interaction terms (status)
bagging            0.6058                       0.6067
random forest      0.6582                       0.665
GAM                0.6675                       0.6675

Our study has two major findings pertaining to variable selection and modeling method for predicting online reservation behavior. Regarding variable selection, the comparison of performance between the predictive models shows that age and gender have weak prediction power. Traditionally, we consider information about the members themselves strongly connected to their behavior: it should influence whether a member buys products or services again, and thus help predict member behavior. In our study, however, the models with strong prediction power rely on variables related to the reservation itself, including the status of the reservation (status), the period between the day the order is placed and the dining day (timediff), and the size of the dining group (people). As for restaurant information, it barely improves prediction performance. Although these independent variables are all statistically significant in the models, they may not have strong prediction power. In other words, statistically significant variables may explain customer behavior, but they do not necessarily improve the performance of predictive models.

Regarding modeling method, as described in section 5.2, we apply tree-based learning methods to the training data. These methods, such as decision tree and random forest, are widely used for data classification because of their prediction power. In order to provide more information to the tree-based learning methods, two-way interaction terms are added to the models. However, the prediction performance of this specification is worse than GAM's. Because not all variables are selected in a random forest, some trees may contain variables without prediction power, e.g., age and restaurant information. We then focus on the interactions between status and the other predictors to fit the training data; however, under this specification GAM still performs better than the tree-based learning methods. That is, computationally complex models cannot improve the AUC further. Instead, GAM, with less computational effort, provides the best prediction accuracy. As a result, we conclude that neither model selection nor model complexity is the most important issue in our research context; it is variable selection that determines prediction power across models. For example, picking transaction-dependent variables like status over restaurant information significantly improves prediction performance.

In conclusion, service providers like EZTable can focus on collecting data about reservations. For example, the reservation status can be collected automatically by the system. Also, the time the order is placed and the size of the dining group must be entered correctly for a member to place a reservation. It is efficient to collect these data because
