
CHAPTER 4 FORECASTING

4.2 Forecast

The data we received from W company are of two types. The first is the Calloff data. Table 4.2 shows the content of the Calloff data. The first column is the transaction date of the item, the second is the plant where the item is produced, the third is the item number, and the fourth is the demand quantity of the item on that transaction date. This table does not contain forecasting data.

Table 4.2 The content of the Calloff data

TRANSACTION_DATE CUST_PLANT ITEM_NUM QUANTITY

2015/4/1 F130 AO3418L 9000

2015/4/1 F130 AOZ5019QI 6000

2015/4/1 F130 SSM3J327R,LF(T 12000

2015/4/1 F131 AO3401A 288000

2015/4/1 F131 AO4407A 27000

… … … …

2017/11/1 F130 TPC8067-H,LQ(S 2500

2017/11/1 F131 SSM3K16CT,L3AP1F(T 10000

2017/11/1 F131 SSM3K16CT(TL3APP1E 140000

2017/11/1 F130 SSM3K15AMFV,L3AF(T 216000

2017/11/1 F130 TCS10DPU(T5LAP,E) 36000


The second type is the forecasting data. Table 4.3 shows the content of the forecasting data. The first column is the date on which the downstream companies generated the forecast with their material requirements planning (MRP) system, the second is the plant where the item is produced, the third is the item number, the fourth is the target date of the forecast, and the fifth is the forecast demand quantity of the item on that target date.

Table 4.3 The content of the forecasting data

MRP_DATE CUST_PLANT ITEM_NUM DMD_DATE QUANTITY

2015/6/29 F130 AO4435 2015/6/29 1226

2015/6/29 F130 AO4435 2015/7/6 1842

2015/6/29 F130 AO4435 2015/7/13 2073

2015/6/29 F130 AO4435 2015/7/20 518

2015/6/29 F130 AO4435 2015/7/27 2774

… … … … …

We want the machine to learn not only from the demand data but also from the forecasting data, so we need to combine the two data sets. The first step is to find how many item-plant pairs there are in the data; we find 2,214 item-plant pairs in total. The time unit used by W company is the week, so we transform the TRANSACTION_DATE in Table 4.2 and the MRP_DATE and DMD_DATE in Table 4.3 into week numbers. The first week runs from April 1, 2015 to April 7, 2015, and the data span 136 weeks in total. We then sort the data by item-plant pair and week number.
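As a concrete illustration, the week-number conversion could be done as in the R sketch below. The data frame names calloff and fcst are assumptions; only the column names come from Tables 4.2 and 4.3.

# A minimal sketch of the week-number conversion in R, assuming the Calloff and
# forecasting data have been read into data frames named calloff and fcst.
week_start <- as.Date("2015-04-01")

to_week <- function(d) {
  # days elapsed since April 1, 2015, in 7-day buckets; April 1-7, 2015 is week 1
  as.integer(floor(as.numeric(as.Date(d) - week_start) / 7)) + 1
}

calloff$Week  <- to_week(calloff$TRANSACTION_DATE)
fcst$MRP_Week <- to_week(fcst$MRP_DATE)
fcst$DMD_Week <- to_week(fcst$DMD_DATE)

# build the item-plant key and sort by pair and week number
calloff$ITEM_PLANT <- paste(calloff$ITEM_NUM, calloff$CUST_PLANT, sep = "_")
calloff <- calloff[order(calloff$ITEM_PLANT, calloff$Week), ]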

Table 4.4 shows an example of the Calloff data after this transformation and sorting. Some weeks have demand while others do not. The second step is to match the item-plant pairs and the week numbers. There can be many forecasting values for the same week, so we list the forecasting values in new columns after the Week column. Table 4.5 shows an example of the combined Calloff and forecasting data.

Table 4.4 The Calloff data after transformation

CUST_PLANT QUANTITY ITEM_PLANT Week

F130 39000 AO3418L -

We show an example of the forecasting data framework in Table 4.5. Suppose we are standing at week 7 and the lead time is 4 weeks. The target we want to forecast is Calloff.LT.1, the aggregate demand over the next lead time plus one week, that is, weeks 8 to 12. FCST(-1) is the forecast of week 7's Calloff.LT.1 made one week earlier, in week 6; FCST(-2) is the forecast made two weeks earlier, in week 5. The forecasting signals already available in week 7 form the triangle shown in Table 4.5. The FCST(-1) of week 9, for example, would be produced in week 8, which is information we do not yet have in week 7. As the week number increases, less forecasting information is known, so we can only use the forecasting data inside the triangle to forecast future demand. When we stand at week 7, the latest Calloff.LT.1 we have observed is the one at week 7 minus the lead time, that is, the Calloff.LT.1 of week 3.
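The target construction can be sketched in R as follows; the weekly demand vector used here is a toy example, not W company data.

# Calloff.LT.1 for one item-plant pair, assuming demand is a vector of weekly
# Calloff quantities (hypothetical numbers) and the lead time LT is 4 weeks.
LT <- 4
demand <- c(10, 0, 5, 8, 0, 12, 7, 9, 4, 0, 6, 3)   # weeks 1 to 12
n_weeks <- length(demand)

# Calloff.LT.1 at week t aggregates the demand of the next LT + 1 weeks;
# standing at week 7, this is the sum of weeks 8 to 12.
calloff_lt1 <- sapply(seq_len(n_weeks), function(t) {
  horizon <- (t + 1):(t + LT + 1)
  sum(demand[horizon[horizon <= n_weeks]])
})
calloff_lt1[7]   # 9 + 4 + 0 + 6 + 3 = 22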


Table 4.5 Example of forecasting data framework

We cannot randomly assign the data to the training, validation, and testing sets, because the data are ordered in time. We take 52 weeks as the basis for training and validation. In the first round, we take weeks 1 to 36 as the training data, weeks 37 to 39 as the validation data, and week 52 as the testing data. We use week 52 instead of week 40 because the Calloff.LT.1 of week 39 collects the demand from week 40 to week 52; in this setting the lead time is 12 weeks. The situation is similar to the one described above: when we stand at week 52, the latest Calloff.LT.1 we already know is that of week 39. Table 4.6 shows the process of each round.
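The rolling split can be sketched in R as below; the assumption that every round shifts the training, validation, and testing windows forward by one week is ours, and the exact windows of each round follow Table 4.6.

# Rolling time-based split: round 1 trains on weeks 1-36, validates on 37-39,
# and tests on week 52; each later round is assumed to shift forward by one week.
make_round <- function(r) {
  list(
    train = r:(35 + r),
    valid = (36 + r):(38 + r),
    test  = 51 + r
  )
}
rounds <- lapply(1:52, make_round)   # testing weeks 52 through 103
rounds[[1]]$test    # 52
rounds[[52]]$test   # 103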


Table 4.6 The process of the training, validation, and testing time-domain change

The testing time-domain runs from week 52 to week 103, a total of 52 weeks, that is, one year.

The index we use to measure performance is the mean absolute error (MAE). However, the first week in which an item-plant pair has calloff or forecasting data can differ from pair to pair. Our method finds, for each item-plant pair, the first week in which it has calloff or forecasting data, and accumulates the MAE only from that first week to the 103rd week. Item-plant pairs whose calloff and forecasting data begin only after the 103rd week are not considered. After this step, 1,899 item-plant pairs remain.
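A sketch of this accumulation in R, where the data frame df and its columns ITEM_PLANT, Week, actual, and pred are hypothetical names:

# MAE per item-plant pair, accumulated from the pair's first active week to week 103
mae_by_pair <- function(df) {
  first_week <- tapply(df$Week, df$ITEM_PLANT, min)         # first calloff/forecast week
  keep <- df$Week >= first_week[as.character(df$ITEM_PLANT)] & df$Week <= 103
  d <- df[keep, ]
  tapply(abs(d$actual - d$pred), d$ITEM_PLANT, mean)        # MAE of each pair
}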

We first try the GBM model, which is Model 1. We think the 8WMA and the GBM have their own strengths, which is why we want to mix the 8WMA and the GBM models. The first way is simply to give the same weight to the 8WMA and the GBM, which is Model 2. However, we think that adjusting the weights weekly will improve performance as the trend changes. In Model 3, we compare the forecast produced on the validation data with the real value and use this error to set the weight for each week. In Model 4, we also consider the characteristics of each item-plant pair and give each item-plant pair its own weight. The brief contents of each model are listed below.

Model 1: Gradient Boosting Machine (GBM)

We use the package “H2O” in R and the function h2o.gbm. The parameters we tune are listed in Table 4.7. GBM has many parameters that can be fine-tuned; for detailed explanations of the parameters in Table 4.7, please refer to Gradient Boosted Models with H2O (Click, Malohlava, Candel, Roark, and Parmar, 2016).

Table 4.7 The parameter of the GBM

Number of trees: 100, 200, … , 1000

The larger the number of trees, the more accurate the result tends to be, so it may seem that more trees are always better. However, some problems are simple enough that we do not need many trees.

Maximum depth of the trees: 1, 2, … , 25

This controls the complexity of the trees. The less complex the trees, the lower the chance of overfitting. However, complex trees have better explanatory power compared with simple trees.

Minimum rows for each iteration: 5, 10, … , 100

Learning with more data rows may perform better because more data are used. However, sometimes the data we have are limited; in that situation we should learn with fewer rows.

Learning rate: 0.01, 0.012, … , 0.08

When the current solution is not yet optimal, the machine follows the gradient to move toward the next solution. The step size used to update from the current solution to the next one is determined by the learning rate.

Sampling rate: 0.3, 0.35, … , 1

This sample rate applies to rows, that is, the percentage of the data rows used in training. Using more data can make the result more accurate, but it takes more time to train.

Column sample rate: 0.3, 0.35, … , 1


This sample rate applies to columns, that is, the percentage of the features used in training. Some of the features we give the machine may not be useful, so using more features does not guarantee a better result. Note that the columns are sampled without replacement.

Column sample rate per tree: 0.3, 0.35, … , 1

We should set this parameter after setting the column sample rate. If the column sample rate is 0.8 and the column sample rate per tree is 0.7, each tree will use 0.8 × 0.7 = 56% of the columns for training. Note that the columns are sampled without replacement.

Distribution: Gamma

We assume that the distribution of the data is Gamma.

In total, there are over 489 million parameter combinations. We perform a random grid search, randomly testing a subset of these combinations, and the model then chooses the best one among those tested according to the criterion we set. Choosing how to measure the improvement is also an issue. Waller (2015) describes several indexes, including mean absolute error (MAE), geometric mean absolute error (GMAE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (sMAPE), and mean absolute scaled error (MASE). Because we care about the difference between the forecast value and the true value, we set the criterion to MAE.
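A minimal sketch of this random grid search with the H2O R package is shown below. The frames train_h2o and valid_h2o, the predictor list x_cols, the response name, and the number of models sampled are assumptions for illustration; the parameter ranges follow Table 4.7.

library(h2o)
h2o.init()

# hyper-parameter ranges following Table 4.7
hyper_params <- list(
  ntrees                   = seq(100, 1000, by = 100),
  max_depth                = 1:25,
  min_rows                 = seq(5, 100, by = 5),
  learn_rate               = seq(0.01, 0.08, by = 0.002),
  sample_rate              = seq(0.3, 1, by = 0.05),
  col_sample_rate          = seq(0.3, 1, by = 0.05),
  col_sample_rate_per_tree = seq(0.3, 1, by = 0.05)
)

# test a random subset of the combinations instead of the full grid
search_criteria <- list(strategy = "RandomDiscrete", max_models = 100, seed = 1)

grid <- h2o.grid(
  algorithm        = "gbm",
  grid_id          = "gbm_random_search",
  x                = x_cols,           # predictor columns (assumed)
  y                = "Calloff.LT.1",   # response column (assumed name)
  training_frame   = train_h2o,
  validation_frame = valid_h2o,
  distribution     = "gamma",
  hyper_params     = hyper_params,
  search_criteria  = search_criteria
)

# choose the scenario with the smallest validation MAE
sorted_grid <- h2o.getGrid("gbm_random_search", sort_by = "mae", decreasing = FALSE)
best_gbm    <- h2o.getModel(sorted_grid@model_ids[[1]])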

Model 2: 50% 8WMA and 50% GBM

In this model, we simply give the 8WMA model and the GBM model the same weight, that is, both weights are 0.5 for every week from the 52nd week to the 103rd week.

Model 3: Weighted by week

We try to weight the 8WMA model and the GBM model by week. We calculate the weight from the MAE between the forecast and the real value. We first forecast on the validation data to obtain forecast values, and then use the MAE between those forecasts and the real values to adjust the weight every week. Equation 4.2 shows the calculation. MAE.8WMA is the MAE of the 8WMA and MAE.GBM is the MAE of the GBM. When the error of the 8WMA is large, the weight of the GBM should be heavier so that the combination approximates the real value, and vice versa. Since all item-plant pairs share the same weight in a given week, MAE.8WMA and MAE.GBM carry only one subscript, the week.
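One weighting scheme consistent with this description is the inverse-error form sketched below; it is an assumed form for illustration, not necessarily the exact expression of Equation 4.2.

\[ w^{8WMA}_{week} = \frac{MAE.GBM_{week}}{MAE.8WMA_{week} + MAE.GBM_{week}}, \qquad w^{GBM}_{week} = 1 - w^{8WMA}_{week} \]

\[ \widehat{Calloff.LT.1}_{week} = w^{8WMA}_{week}\,\hat{y}^{8WMA}_{week} + w^{GBM}_{week}\,\hat{y}^{GBM}_{week} \]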

Model 4: Weighted by week and by item

We try to give the 8WMA and the GBM weights by week and by item. We first forecast on the validation data to obtain forecast values. We then calculate the MAE by week and by item, which means different item-plant pairs have different MAEs in the same week. This is why MAE.8WMA and MAE.GBM carry two subscripts here, the item-plant pair and the week, written MAE.8WMA_{item,week} and MAE.GBM_{item,week}.
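By analogy, an item-and-week version of the same inverse-error form (again an assumed sketch) would be:

\[ w^{8WMA}_{item,week} = \frac{MAE.GBM_{item,week}}{MAE.8WMA_{item,week} + MAE.GBM_{item,week}}, \qquad w^{GBM}_{item,week} = 1 - w^{8WMA}_{item,week} \]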

The measurement we use here is the error of each model divided by the error of the 8WMA, so that we can check how well or how poorly the new model performs relative to the 8WMA. The calculation is shown in Equation 4.3. The start time is the first week in which item-plant pair i has calloff or forecasting data, that is, the week in which we first meet demand for the pair or first receive its forecasting data from the downstream companies. Only the errors after the start time are counted, and the end time is fixed at the 103rd week. We sum the errors of each week in absolute value.
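A ratio consistent with this description is sketched below (an assumed form, where i is the item-plant pair, start_i its first active week, y the real Calloff.LT.1, and \hat{y} the forecast):

\[ Ratio_i = \frac{\sum_{week = start_i}^{103} \left| \hat{y}^{model}_{i,week} - y_{i,week} \right|}{\sum_{week = start_i}^{103} \left| \hat{y}^{8WMA}_{i,week} - y_{i,week} \right|} \]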
