
5. Empirical Results

5.1 GAM

In order to identify which variables practically enhance predictive performance, we specify models using reservation information and restaurant information and evaluate their performance by AUC (see Section 4 for details). We note that all the variables used for forecasting are statistically significant. In our study, P_i denotes

P_i = \Pr(\text{Return90}_i = 1 \mid X_i)

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1 S_1(\text{Gender}) + \beta_2 S_2(\text{Age})    (9)

Equation (9) is a generalized additive model with a member's age (Age) and gender (Gender) as predictors. Traditionally, these profile variables are considered powerful for predicting consumer behavior. However, the AUC (Fig 5.1) is only 7.7% larger than that of a randomly guessing model (AUC = 0.5). That is, Age and Gender are not helpful for predicting a customer's return (Return90). Moreover, the independent variable Age has many missing values, which may cause a loss of information.

Fig 5.1
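As a sketch of how such a model can be fit and scored, the following uses pygam (one possible GAM implementation) on synthetic stand-in data; the column layout, sample size, and resulting AUC are illustrative assumptions only and do not reproduce the thesis results.

```python
import numpy as np
from pygam import LogisticGAM, f, s
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins for the four reservation variables used in Section 5.1
X = np.column_stack([
    rng.integers(0, 2, n),      # Gender (0/1)
    rng.integers(18, 66, n),    # Age in years
    rng.integers(0, 31, n),     # Timediff: days between booking and dining
    rng.integers(1, 11, n),     # People: reservation group size
])
y = rng.integers(0, 2, n)       # Return90 (synthetic labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# M0: a factor term for the binary Gender and a smooth term S2(Age) for Age
m0 = LogisticGAM(f(0) + s(1)).fit(X_train[:, :2], y_train)
p0 = m0.predict_proba(X_test[:, :2])       # P_i = Pr(Return90_i = 1)
print(f"M0 AUC: {roc_auc_score(y_test, p0):.4f}")  # near 0.5 on random labels
```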


We next include the period of time between the order-placed day and the dining day (Timediff) and the reservation group size (People) as predictors (Equation 10).

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1 S_1(\text{Timediff}) + \beta_2 S_2(\text{People})    (10)

In Fig 5.2, the ROC curve of Equation (10) is closer to the point (0, 1) than that of Equation (9) in Fig 5.1, and its AUC is 0.5799. This means that Equation (10) has better prediction performance than Equation (9) (0.5799 versus 0.5385).

As a result, we conclude that these two predictors improve prediction performance.

Fig 5.2
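Continuing the synthetic sketch above, M1 can be fit on Timediff and People and the two ROC curves drawn in the manner of Figs 5.1 and 5.2 (m0, p0, and the train/test arrays come from the previous block):

```python
import matplotlib.pyplot as plt
from pygam import LogisticGAM, s
from sklearn.metrics import roc_auc_score, roc_curve

# M1: smooth terms for Timediff and People (columns 2 and 3 of X)
m1 = LogisticGAM(s(0) + s(1)).fit(X_train[:, 2:4], y_train)
p1 = m1.predict_proba(X_test[:, 2:4])

for label, p in [("M0: Age + Gender", p0), ("M1: Timediff + People", p1)]:
    fpr, tpr, _ = roc_curve(y_test, p)
    plt.plot(fpr, tpr, label=f"{label} (AUC={roc_auc_score(y_test, p):.4f})")
plt.plot([0, 1], [0, 1], "k--", label="random guess (AUC = 0.5)")
plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
plt.legend(); plt.show()
```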

Since some of the reservation-information variables improve prediction performance, we further examine whether the status of a reservation can improve performance as well.

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1\,\text{status\_ok} + \beta_2\,\text{status\_canceled} + \beta_3\,\text{status\_changed}    (11)

Equation (11) is another classification model using reservation status: the reservation is new or ok (status_ok), canceled (status_canceled), or changed (status_changed). As we mention in Section 3, these three independent variables record the status of a reservation. Fig 5.3 compares model M1 (Equation 10) with M2 (Equation 11). The prediction performance of M2 is better: its AUC is 0.6236, higher than the 0.5799 of M1.
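Because Equation (11) contains only linear dummy terms, an ordinary logistic regression reproduces it. The sketch below builds a synthetic one-hot status table purely for illustration; the printed AUC does not reflect the thesis results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
status = ["status_ok", "status_canceled", "status_changed"]
# Each synthetic reservation has exactly one status dummy set to 1
codes = rng.integers(0, 3, 2000)
df = pd.DataFrame(np.eye(3, dtype=int)[codes], columns=status)
df["Return90"] = rng.integers(0, 2, 2000)
df_train, df_test = df.iloc[:1400], df.iloc[1400:]

m2 = LogisticRegression().fit(df_train[status], df_train["Return90"])
p2 = m2.predict_proba(df_test[status])[:, 1]    # Pr(Return90 = 1)
print(f"M2 AUC: {roc_auc_score(df_test['Return90'], p2):.4f}")
```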


Consequently, reservation status improves the prediction as well.

Fig 5.3

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1\,\text{status\_ok} + \beta_2\,\text{status\_canceled} + \beta_3\,\text{status\_changed} + \beta_4 S_1(\text{Timediff}) + \beta_5 S_2(\text{People})    (12)

Equation (12) combines the independent variables of Equations (10) and (11). Fig 5.4 shows the ROC curves of the new model M3 together with the other two, M1 and M2. The AUC of M3 is 0.6605, an improvement over M2 (from 0.6236 to 0.6605).
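A sketch of Equation (12) in pygam combines linear terms l(·) for the status dummies with smooth terms s(·) for Timediff and People. The data below are synthetic stand-ins, and the in-sample AUC is illustrative only.

```python
import numpy as np
from pygam import LogisticGAM, l, s
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 2000
codes = rng.integers(0, 3, n)                         # synthetic status codes
X12 = np.column_stack([np.eye(3, dtype=int)[codes],   # status_ok/canceled/changed
                       rng.integers(0, 31, n),        # Timediff
                       rng.integers(1, 11, n)])       # People
y12 = rng.integers(0, 2, n)                           # Return90 (synthetic)

# All three dummies are kept to mirror Equation (12); pygam's penalized
# fit copes with their collinearity with the intercept.
m3 = LogisticGAM(l(0) + l(1) + l(2) + s(3) + s(4)).fit(X12, y12)
print(f"in-sample AUC: {roc_auc_score(y12, m3.predict_proba(X12)):.4f}")
```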


Fig 5.4

With the improvement achieved by M3, we attempt to introduce all the remaining variables into the classification model. We infer that the predictive power of the member profile may increase within a larger predictor set. In Section 3, we mention that, besides reservation information, the dataset also includes information about the restaurants in the booking records. It is possible that restaurant information influences members' behavior.

\log\left(\frac{P_i}{1 - P_i}\right) = \alpha + \beta_1\,\text{age16--25} + \beta_2\,\text{age26--35} + \beta_3\,\text{age36--45} + \beta_4\,\text{gender} + \beta_5\,\text{status\_ok} + \beta_6\,\text{status\_canceled} + \beta_7\,\text{status\_changed} + \beta_8 S_1(\text{Timediff}) + \beta_9 S_2(\text{People}) + \beta_{10}\,\text{Is\_hotel} + \beta_{11}\,\text{new\_taipei\_city} + \beta_{12}\,\text{out\_of\_greater\_taipei} + \beta_{13}\,\text{Wifi}    (13)

Equation (13) includes the member's profile and restaurant information, covering age, gender, the location of the restaurant (area), whether the restaurant is in a hotel (is_hotel), and whether the restaurant provides wifi (wifi). In Fig 5.5, the ROC curve of M4 is almost the same as that of M3. This implies that, in our case, a restaurant's geographic information and facilities have little power to predict a member's usage of the online booking service. In other words, no matter where a restaurant is, its location does not influence member behavior. Table 8 shows the AUC comparison among these GAM models.


Fig 5.5

Table 8 GAM model comparison

Model Name Variables AUC

M0 age, gender 0.5305

M1 timediff, people 0.5799

M2 status 0.6236

M3 timediff, people, status 0.6605

M4 timediff, people, status, restaurant 0.6644

After constructing the GAM models with different combinations of independent variables, we are unable to improve the AUC further. Therefore, we turn to tree-based methods, which also perform well in data classification.

The goal of tree-based methods is to divide the predictor space into several regions for classification; at the end of the division, the regions are summarized in a tree. A decision tree, constructed through this process, classifies data and yields easily interpreted results. However, it has lower predictive power than learning methods such as logistic regression and GAM (James et al., 2013). Therefore, we also fit our data with two other tree-based methods,

Bagging and Random Forest. Unlike a single decision tree, these two learning approaches resample the training data through the bootstrap. The bootstrap is commonly used to reduce the variance of predictions across training sets; it resamples the data multiple times by sampling the training data with replacement, as the sketch below illustrates.
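The following minimal illustration reuses the synthetic X_train and y_train from the earlier GAM sketch; the loop simply collects B = 100 bootstrap resamples of the training set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = len(X_train)                      # X_train/y_train from the earlier sketch
boot_samples = []
for b in range(100):                  # B = 100 bootstrap replicates
    idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
    boot_samples.append((X_train[idx], y_train[idx]))
```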

Bagging first bootstraps new training sets from the original one. Each new training set is fit separately to a prediction model using the full set of p predictors, and the individual results are then aggregated into one single model. Generally speaking, the results are aggregated through the classifiers' voting. For instance, if the training data are resampled 100 times through the bootstrap, there are 100 classifiers, and hence 100 predictions for each observation. The final prediction is decided by their vote: if 90 of the 100 predictions are “Yes”, the final prediction for that observation is “Yes”.
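As a sketch of the procedure just described, scikit-learn's BaggingClassifier fits 100 trees to bootstrap samples and aggregates them by voting; the synthetic X_train/y_train from the earlier sketch stand in for our data.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 100 trees, each fit to a bootstrap sample with the full predictor set
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        bootstrap=True).fit(X_train, y_train)
y_hat = bag.predict(X_test)   # the majority vote of the 100 classifiers
```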

Random forest, on the other hand, is a refined version of bagging. Like bagging, it bootstraps multiple training sets. However, when each of these training sets is fit to a decision tree, a random subset of m predictors is chosen from the full set of p predictors. The number m is typically the square root of p, that is, m = \sqrt{p}. This improves on bagging by de-correlating the trees; that is, random forest mitigates the problem of overfitting (James et al., 2013).
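A corresponding random forest sketch follows, again on the synthetic stand-in data; note that scikit-learn's max_features="sqrt" draws the random subset of m = sqrt(p) predictors at every split of every tree.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rf = RandomForestClassifier(n_estimators=100,
                            max_features="sqrt")   # m = sqrt(p) per split
rf.fit(X_train, y_train)
p_rf = rf.predict_proba(X_test)[:, 1]              # averaged class-1 probability
print(f"Random forest AUC: {roc_auc_score(y_test, p_rf):.4f}")
```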

Fig 5.6 visualizes the two types of decision tree: a classification tree and a regression tree. The classification tree has only one node, status_canceled. Whether a record's status_canceled is 1 or 0, the record is classified as not placing another reservation within 90 days. Accordingly, there is no positive prediction throughout the classification process, which leads to a poor AUC close to 0.5. In contrast, the AUC of the regression tree is about 0.61, because it produces positive predictions and improves the prediction performance. The regression tree also has only one node: if a record's status_canceled is 1, the member's return probability is 0.3834; if status_canceled is 0, the return probability is 0.1553. Consequently, we use the regression tree for prediction.
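For illustration, a depth-1 regression tree like the one in Fig 5.6 can be fit as follows; the synthetic df_train/df_test from the Equation (11) sketch stand in for our data, so the leaf values will not match the 0.3834 and 0.1553 reported above.

```python
from sklearn.tree import DecisionTreeRegressor

# A single split, as in Fig 5.6: each leaf's fitted value is the mean of
# Return90 in that leaf, i.e. an estimated return probability.
reg = DecisionTreeRegressor(max_depth=1)
reg.fit(df_train[["status_canceled"]], df_train["Return90"])
p_reg = reg.predict(df_test[["status_canceled"]])  # leaf means as Pr(return)
```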
