
National Chengchi University
Department of Management Information Systems
Master's Thesis

應用梯度提升機於供應鏈預測
The Application of Gradient Boosting Machine in Supply Chain Forecasting

Advisors: Dr. 張欣綠, Dr. 莊皓鈞
Author: 許博淳

February 2019

DOI: 10.6814/THE.NCCU.MIS.003.2019.A05

ACKNOWLEDGMENTS

After two years, this research has finally come to a close. It was not a straight road; many attempts were needed before results emerged. My first thanks go to Dr. 張欣綠 and Dr. 莊皓鈞, who warmly welcomed me into the E-Commerce Lab when NCCU was still unfamiliar to me. Dr. 張欣綠 had already been collaborating with the case company for several years, and after considering my undergraduate background and project experience, she assigned me to an area where I could contribute, bringing in Dr. 莊皓鈞 to guide me through mathematical model construction and simulation research. Thanks to the efforts of both advisors, and building on the earlier work of several senior lab members, we earned greater trust and more data from the case company, which brought my research closer to reality and made it more valuable as a reference. Beyond guiding my research, both advisors often cared about matters in my daily life; having their advice on problems outside research made me feel very fortunate, and I am also grateful for their trust in letting me manage my own work without many restrictions.

Although 楊惠晴 was my only labmate in the same cohort, I also had three senior labmates, 石詠綺, 胡惠宸, and 孫若庭, who gave me more than enough help, in coursework and in daily life, when I first arrived at NCCU; they eased my worries and added much warmth. I also thank my classmates 黃俊敏, 許安廷, 吳旻諺, 徐祥華, 蔡佑誠, and 柯典佑, and senior 何志偉. Warm relationships let me focus on research. Special thanks go to 惠晴 for her many suggestions on data processing, and to 詠綺 for her thorough advice on paper submission, presentation, and writing.

Finally, I thank my family: my father 許志誠, my mother 季明華, my sister 許博涵, my girlfriend 黃佳琳, and Mr. and Mrs. 黃金興 and 許雪霞. Besides covering my necessary living expenses, they supported me spiritually through a bumpy and stressful research process; with their company I had an outlet for stress and more motivation to finish the thesis without slacking. At the 健而美 gym, everyone from director 許火塗 and 張來秀 down to the national athletes and training partners kept me energized for research.

許博淳
February 22, 2019

摘要 (ABSTRACT IN CHINESE)

To help the largest electronic components distributor in the Asia-Pacific region, which has long suffered from excessive inventory and an unsatisfactory fill rate, this study makes two attempts: first, optimizing the ordering policy, and second, introducing machine learning to assist demand forecasting. For the ordering policy, we propose a new rule for deciding the order quantity and compare the company's current rule with the new rule under perfect information. The case company currently forecasts with the moving average method; this study introduces the Gradient Boosting Machine, a machine learning method, and also adds hybrid models that combine the moving average method with machine learning, comparing the models with three indicators: fill rate, service level, and end-of-period inventory. In addition, to construct data the machine can learn from, the data format must be processed and the content screened in advance, and extra features must be added so the machine can learn the characteristics of demand changes. The goal of this study is to help the case company improve its forecasting ability, aiming to lower inventory, raise the fill rate, and improve operational performance.

關鍵詞 (Keywords): 梯度提升機 (Gradient Boosting Machine), 電子零組件經銷商 (electronic component distributor)

ABSTRACT

To address the two main problems of our case company, W company, high inventory pressure and an unsatisfactory fill rate, this research aims to find a better ordering policy and to use machine learning to forecast demand. We propose a new policy for deciding the order quantity and compare it to the current policy under perfect information. W company currently forecasts demand with the moving average method. We try a machine learning method, the Gradient Boosting Machine (GBM), and we also combine the GBM and moving average forecasts into hybrid models. We use three indexes, fill rate, service level, and end-of-period stock quantity, to measure performance. The raw data from W company required processing and screening, and we added features to make the demand pattern learnable by the machine. The objective of this research is to make the forecast more precise, so that W company can keep the fill rate high while minimizing inventory and thereby become more competitive.

Keywords: Gradient Boosting Machine, electronic component distributor

CONTENTS

TABLES AND FIGURES
CHAPTER 1 INTRODUCTION
  1.1 Background and Motivation
  1.2 Research Questions
CHAPTER 2 LITERATURE REVIEW
  2.1 The Problems of Demand Prediction
  2.2 Machine Learning for Supply Chain Demand Forecasting
CHAPTER 3 INVENTORY MODEL AND POLICY ANALYSIS
  3.1 Inventory Control System
  3.2 As-Is and To-Be Policy Analysis
CHAPTER 4 FORECASTING
  4.1 Data and Feature Engineering
  4.2 Forecast
  4.3 The Result
CHAPTER 5 DISCUSSION
CHAPTER 6 CONCLUSION & LIMITATION
  6.1 Conclusion
  6.2 Limitation
REFERENCES

TABLES AND FIGURES

Table 3.1 The results of the As-Is and To-Be policies
Table 4.1 The data columns
Table 4.2 The content of the Calloff data
Table 4.3 The content of the forecasting data
Table 4.4 The Calloff data after transformation
Table 4.5 Example of the forecasting data framework
Table 4.6 The process of the training, validation, and testing time-domain change
Table 4.7 The parameters of the GBM

Figure 1.1 The position of W company in the semiconductor supply chain
Figure 1.2 Real demand frequency
Figure 3.1 The demand frequency of the high frequency items
Figure 3.2 The demand frequency of the 75 percent quantile frequency items
Figure 3.3 Result of service level
Figure 3.4 Result of end of week stock on hand
Figure 4.2 The cumulated number of items for each of the models
Figure 4.3 Error ratio distribution

CHAPTER 1 INTRODUCTION

1.1 Background and Motivation

W company is a Taiwan holding company founded in 2005. Today, W company is the largest semiconductor components distributor in the world, with annual revenue of about 17.5 billion US dollars. Figure 1.1 shows the position that W company occupies in the semiconductor supply chain.

[Figure 1.1 The position of W company in the semiconductor supply chain]

The primary reason is that more stock remains on hand at the end of a component's life cycle. These stocks are called dead stocks, and the problem becomes increasingly serious as W company expands its business. It cannot sell the dead stocks, as these components are no longer used in the latest products.

Furthermore, W company has to pay to dispose of the dead stocks. The reason W company needs to prepare far more than the real demand is the difference in lead times. Its supplier lead time is steady but long, at around 12 weeks, while its downstream customers demand quick response whenever components are needed; W company therefore has to forecast ahead of its customers by 2 to 3 months or even longer.

Due to its contracts with customers, W company must deliver components on time to cover a certain percentage of the demand, or it will be punished with compensation. The goal of W company is a fill rate of 100%, but it is not satisfied with carrying high stock pressure just to keep the fill rate at 100%, since falling short may lose customers, and once customers leave, it costs a lot to bring them back. The only way to avoid this problem is precise forecasting.

However, the demand can change a lot without any rules. We find that some demand quantities change on a weekly basis, while other items have no demand in most weeks. Figure 1.2 shows the situation.

[Figure 1.2 Real demand frequency: lumpy weekly calloff quantities over 52 weeks]

According to our research, only 25% of the items from W company completely satisfy the demand in every week. This is something we can improve through better forecasting.

1.2 Research Questions

Given the erratic, non-stationary demand pattern that W company faces, the research questions are as follows.

1. How to decide the order quantity?
The perfect order quantity would be the real demand one lead time from now. However, it is difficult to foresee the future. Because of the error between the real demand and the estimate, we should order more than the estimated quantity; yet the more we order, the more stock on hand we have at the end. We try to find a balance between fulfilling the demand and minimizing the stock on hand at the end.

2. How to predict the demand quantity over the future lead time?
The lead time is the time difference between ordering and receiving. In other words, W company has to send out an order one lead time before it receives the goods. The lead time of W company is steady but long, at around 12 weeks. The available stock should cover the demand over the lead time, and the longer the lead time, the harder it is to forecast future demand precisely. To summarize, the main question is how to forecast future demand.

CHAPTER 2 LITERATURE REVIEW

In this chapter, we discuss demand forecasting and machine learning for supply chain demand forecasting. In the first part, we want to find out why the demand of W company is so unpredictable; once we know the demand characteristics, we can find the breaking point more easily. In the second part, we discuss the differences between our research and previous studies, and our reasons for choosing a particular machine learning method.

2.1 The Problems of Demand Prediction

Hendry, Simangunsong, and Stevenson (2011) list three kinds of uncertainty in supply chain management: demand uncertainty, manufacturing uncertainty, and supply uncertainty. Demand uncertainty results from uncertainty in the demand quantity and in the period when the demand occurs. Manufacturing uncertainty results from uncertainty in yield rates, machine breakdown frequency, and other logistical problems. Supply uncertainty results from uncertainty in component quality, import reliability, and component backup levels. Manufacturing uncertainty can be reduced with more precise equipment, and supply uncertainty can be reduced by using historical data to roughly predict the expected supply. However, no simple method resolves demand uncertainty, as demand prediction is too complex.

The bullwhip effect is one of the main reasons why demand is so unpredictable. It results from poor supply chain integration. Babai et al. (2015) found that a small increase or decrease in final-customer demand can lead to a huge difference for upstream suppliers. Each supply chain unit adjusts its demand to maintain a safe stock quantity, so by the time information reaches the upstream suppliers, it has been distorted by every unit in the chain. The bullwhip effect makes the demand quantity change more than the real change.

W company has a serious forecasting problem. First, much research (e.g., Bischak, Naseraldin, and Silver, 2008; Esper and Waller, 2014) assumes that the demand pattern is normally distributed and stationary. However, the demand distributions that W company faces are intermittent or lumpy. Components of the latest 3C products very likely did not even exist one year ago, which means components are frequently replaced. Therefore, the demand for a component is finite rather than stationary.

The demand pattern of W company has a finite horizon with a long lead time. Deng, Paul, Tan, and Wei (2017) proved that a periodic review system tends to leave more stock on hand at the end of the time-domain when the demand has a finite horizon. Moreover, the gap between the minimum and maximum possible future demand grows as the lead time increases. It is therefore not easy to find a method to predict the demand that W company faces.

2.2 Machine Learning for Supply Chain Demand Forecasting

Crone, Fildes, Nikolopoulos, and Syntetos (2008) show that many time series models have been used in demand forecasting, such as the autoregressive integrated moving average (ARIMA) and exponential smoothing. Carbonneau, Laframboise, and Vahidov (2008) tried machine learning models, neural networks and support vector machines, to forecast supply chain demand. However, all these models operate at the item level; that is, a model is fitted to each individual item. We want to forecast a block of many weeks at once, where the number of weeks in a block depends on the supplier lead time. In this setting, our data size shrinks significantly: if the supplier lead time is 8 weeks and we have 72 weeks of data in total, we only have 72/8 = 9 blocks, which is not enough for the machine to learn. As a solution, we pool different items together as the input and fit the model with cross-item aggregation: in every round, we use all available items to fit the models.

Unlike neural networks, the Gradient Boosting Machine (GBM) does not use one strong machine learning model for forecasting. On the contrary, a GBM is a collection of many weak machine learning models (Knoll and Natekin, 2013).

A GBM can remedy some weaknesses of a strong machine learning model: a strong model has a complex algorithm, which does not generalize easily, while a GBM built from many weak models largely avoids this problem. Another advantage of weak models is speed, since the computation time of a weak machine learning model is much less than that of a strong one.

Gradient boosting minimizes the loss function through iterative execution. Each iteration has three steps: the first finds the direction, the second decides the step size, and the last updates the value. The machine decides the direction and step size according to the gradient.

Click, Lanford, Malohlava, Parmar, and Roark (2015) published the H2O package in R and Python to make GBM, Random Forest, and other machine learning models easier to use. Unlike Random Forest, which averages independent regression trees, GBM uses an ensemble of trees in which each tree sequentially learns from the previous trees' prediction errors. Compared to Deep Learning, which is sensitive to the scale of the variables and is time consuming, GBM is easy to train and has been shown to be among the best methods in many structured-data predictions.
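In symbols, the three steps above can be written as the standard gradient boosting update; the notation ($F_m$, $h_m$, $\rho_m$, $L$, $\nu$) here is our own rather than the thesis's:

$$r_{i,m} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \qquad h_m \approx \arg\min_h \sum_i \big(r_{i,m} - h(x_i)\big)^2 \quad \text{(direction)}$$

$$\rho_m = \arg\min_\rho \sum_i L\big(y_i, F_{m-1}(x_i) + \rho\, h_m(x_i)\big) \quad \text{(step size)}$$

$$F_m(x) = F_{m-1}(x) + \nu\, \rho_m\, h_m(x) \quad \text{(update, with learning rate } \nu\text{)}$$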

CHAPTER 3 INVENTORY MODEL AND POLICY ANALYSIS

3.1 Inventory Control System

The inventory control system in W company is based on a periodic review model. The order frequency is once a week, and the system is built in three steps: first, check the stock on hand and the orders received; second, calculate the stock in transit and the available stock; last, decide the order quantity. The notation used in this chapter is as follows: t is the week index, running from 1 to T, and LT is the lead time from ordering to receiving.

The first step confirms the stock on hand and the stock just received. Q_t is the order quantity for week t. An order is received LT + 1 weeks after it is placed, so the amount received in week t, OR_t, is the order quantity of LT + 1 weeks before, Q_{t-LT-1}. BOH_t, the stock on hand at the beginning of week t, equals the stock on hand at the end of week t-1, EOH_{t-1}. The stock on hand at the end of the week, EOH_t, equals the beginning-of-week stock BOH_t, plus the stock received during the week OR_t, minus the stock given out to the customers, which is called the demand D_t. Sometimes the demand is higher than what W company can cover; in that out-of-stock situation, EOH_t becomes zero.

OR_t = Q_{t-LT-1}    (3.1)

BOH_t = EOH_{t-1}    (3.2)

EOH_t = max(BOH_t + OR_t - D_t, 0)    (3.3)
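As a minimal R sketch (the function and variable names are ours, not the thesis's), the recursion in equations 3.1-3.3 can be simulated directly:

```r
# Weekly stock recursion, eqs. 3.1-3.3.
# D: vector of weekly demand; Q: vector of weekly order quantities; LT: lead time.
stock_on_hand <- function(D, Q, LT) {
  T_ <- length(D)
  OR <- BOH <- EOH <- numeric(T_)
  for (t in 1:T_) {
    OR[t]  <- if (t - LT - 1 >= 1) Q[t - LT - 1] else 0  # eq. 3.1: arrivals
    BOH[t] <- if (t >= 2) EOH[t - 1] else 0              # eq. 3.2: carry-over
    EOH[t] <- max(BOH[t] + OR[t] - D[t], 0)              # eq. 3.3: lost-sales floor
  }
  data.frame(week = 1:T_, OR, BOH, EOH)
}
```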

The beginning-of-week backlog, BBL_t, comprises the stock in transit. It equals the backlog of the previous week, BBL_{t-1}, minus the stock received in the previous week, OR_{t-1}, plus the order quantity decided in the previous week, Q_{t-1}. The available stock before ordering, AS_t, is the stock on hand EOH_t plus the stock that will be received in the near future, BBL_t - OR_t.

BBL_t = BBL_{t-1} - OR_{t-1} + Q_{t-1}    (3.4)

AS_t = EOH_t + BBL_t - OR_t    (3.5)

The order quantity Q_t equals the difference between the targeted inventory position S_t and the available stock before ordering AS_t. The targeted inventory position is the value the policy generates; its goal is to protect W company by holding enough stock to fulfill the future demand, while not leaving too much stock on hand at the end.

Q_t = max(0, S_t - AS_t)    (3.6)

If the demand were independent and identically distributed (iid), deciding the targeted inventory position would be simple. In truth, the demand is non-iid, the horizon is finite, and the lead time is long, all of which makes forecasting difficult. The way W company decides the targeted inventory position is called the m-week moving average (mWMA). Standing at week t, it uses the demand from weeks t-m to t-1 to forecast weeks t+1 to t+LT. W company can adjust the targeted inventory position by changing the adjustment factor r: increasing r by 1 unit raises the targeted inventory position by 1 week of average demand. The calculation is shown in equation 3.7.

S_t = (∑_{s=t-m}^{t-1} D_s / m) × (LT + r)    (3.7)
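A minimal R sketch of the mWMA target and the resulting order (eqs. 3.6-3.7); the function names are ours:

```r
# As-Is target inventory position via the m-week moving average (eq. 3.7).
# D_hist: demand observed up to week t-1; r: adjustment factor.
mwma_target <- function(D_hist, m, LT, r) {
  mean(tail(D_hist, m)) * (LT + r)   # average of weeks t-m..t-1, scaled
}

# Order quantity given the target and the available stock (eq. 3.6).
order_qty <- function(S_t, AS_t) max(0, S_t - AS_t)
```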

The objective of W company is to minimize the overall end-of-week stock on hand, under the constraint that the fraction of each week's demand that is fulfilled is at least the fill rate (FR) we have set. The sum of the beginning-of-week stock on hand and the order received in the week is the stock that can be used to fulfill the demand. When a week has no demand, the FR constraint is fulfilled trivially.

min_{S_t} ∑_{t=1}^{T} EOH_t    (3.8)

(BOH_t + OR_t) / D_t ≥ FR,  ∀t = 1, …, T    (3.9)

Bijvank and Vis (2011) surveyed the literature on lost sales. Replenishment models are usually classified into backorder models and lost-sales models. In a backorder model, the part of the demand that cannot be fulfilled now is replenished in the near future, whereas in a lost-sales model the company can do nothing about the stockout. For W company, the lost-sales model is more appropriate, because its customers' demands are urgent and perishable: once W company fails to fulfill the demand, the customers transfer their orders to other distributors.

We use the FR in equation 3.9 as a performance index because it has been used by a number of researchers (e.g., Bischak, Naseraldin, and Silver, 2008). Other researchers have used another index, the service level (SL), to measure performance (Beutel and Minner, 2012; Deng, Tan, Paul, and Wei, 2017). We use both in this research.

It is easy to see that ordering as little as possible lowers the EOH but misses the FR, and vice versa. We therefore add the SL index to count how often the FR constraint is fulfilled. The counting starts a few weeks after the first week, because the EOH and OR in the first week are 0: even if we order in the first week, the order arrives LT+1 weeks later, so the demand of the first LT+1 weeks can never be fulfilled and is not counted in the SL.

SL = (number of weeks that fulfill the FR constraint) / (number of weeks we count)    (3.10)
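A small R sketch of the service level index (eqs. 3.9-3.10); the names are ours:

```r
# Service level (eq. 3.10): share of counted weeks whose available stock
# covers at least FR of the demand (eq. 3.9). Weeks before 'start'
# (the first LT+1 weeks) are excluded; zero-demand weeks pass trivially.
service_level <- function(BOH, OR, D, FR, start) {
  idx <- start:length(D)
  ok  <- D[idx] == 0 | (BOH[idx] + OR[idx]) / D[idx] >= FR
  mean(ok)
}
```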

3.2 As-Is and To-Be Policy Analysis

One thing we can be sure about forecasting is that it will always have some error, the difference between the forecasted value and the real value. If we assume that the future demand is already known, we eliminate the influence of forecasting, which allows us to focus on the problems in the policy itself. With perfect information, the SL should be 1 and the EOH should be 0.

The As-Is policy is the policy W company uses nowadays: it sets m to 8 in equation 3.7. This policy was invented by rule of thumb and relies on a naïve forecasting method. Its major defect is the adjustment factor r. To avoid using r, we try another way to decide the order quantity. The order quantity should mainly consider two factors, the future demand and the AS. Standing at week t, the future demand in week t+LT+1 is the quantity we should order now; however, we should also consider the AS, the stock W company has on hand plus the orders in transit. The more AS we have, the less we should order if the EOH is to reach 0. From equation 3.3, EOH_t = max(BOH_t + OR_t - D_t, 0); carrying the recursion from week t+1 to week t+LT gives equation 3.11.

EOH_{t+LT} = max(∑_{s=t+1}^{t+LT} (BOH_s + OR_s - D_s), 0)    (3.11)

Equation 3.11 shows that EOH_{t+LT} equals BOH_{t+LT+1}, which we already know from equation 3.2. BOH_{t+LT+1} is the available stock AS_t minus the demand summation from week t+1 to week t+LT.

EOH_{t+LT} = BOH_{t+LT+1}    (3.12)

BOH_{t+LT+1} = max(AS_t - ∑_{s=t+1}^{t+LT} D_s, 0)    (3.13)

The To-Be order quantity follows in equation 3.14. After the demand from t+1 to t+LT is served, we need to fulfill the demand of week t+LT+1 with however much stock remains. If AS_t - ∑_{s=t+1}^{t+LT} D_s is less than 0, no inventory remains and we should order exactly D_{t+LT+1}. If some AS remains, we should order only the difference between D_{t+LT+1} and the remainder, so that the EOH ends at 0.

Q_t = D_{t+LT+1} - max(0, AS_t - ∑_{s=t+1}^{t+LT} D_s)    (3.14)
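A minimal R sketch of the To-Be rule (eqs. 3.13-3.14) under perfect information; the names are ours, and we follow equation 3.14 as written (in practice one might additionally floor the result at zero):

```r
# To-Be order quantity (eq. 3.14): cover week t+LT+1 demand with
# whatever is left of today's available stock after lead-time demand.
# D: full (known) demand vector; AS_t: available stock before ordering.
tobe_order <- function(D, AS_t, t, LT) {
  leftover <- max(0, AS_t - sum(D[(t + 1):(t + LT)]))  # eq. 3.13
  D[t + LT + 1] - leftover                             # eq. 3.14
}
```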

We now use perfect information to compare the performance of the As-Is policy and the To-Be policy. We choose 4 item-plant pairs to check the performance. Items A and B represent the high frequency item-plant pairs: Item A has demand in 123 of the 136 weeks, and Item B in 121 weeks. Items C and D represent the medium frequency items; both have demand in 23 of the 136 weeks. The demand frequencies are shown in figures 3.1 and 3.2.

[Figure 3.1 The demand frequency of the high frequency items]

[Figure 3.2 The demand frequency of the 75 percent quantile frequency items]

After we put the demand data into the As-Is and To-Be policies, the results are shown in figures 3.3 and 3.4.

[Figure 3.3 Result of service level]

[Figure 3.4 Result of end of week stock on hand]

We can clearly see that under the To-Be policy the SL reaches 1 and the EOH reaches 0 for every representative item, whereas under the As-Is policy the SLs are still not perfect and the EOHs are still above 0. The results for all 2214 items are in table 3.1.

Table 3.1 The results of the As-Is and To-Be policies

            SL = 1    SL < 1    EOH = 0    EOH > 0
  As-Is     25.91%    74.09%    35.23%     64.77%
  To-Be     100%      0%        100%       0%

Most of the results under the As-Is policy are still not optimal. Obviously, the To-Be policy is better than the As-Is policy.

Although we will never know the future demand in the real world, we can be sure that the more precisely we forecast, the closer we get to an SL equal to 1 and an EOH equal to 0. The learning problem is therefore to predict ∑_{s=t+1}^{t+LT} D_s and D_{t+LT+1}. The way to obtain D_{t+LT+1} is shown in equation 3.15.

D_{t+LT+1} = ∑_{s=t+1}^{t+LT+1} D_s - ∑_{s=t+1}^{t+LT} D_s    (3.15)

CHAPTER 4 FORECASTING

4.1 Data and Feature Engineering

The data we have belong to the customer with id 1171. The time-domain is from April 1, 2015 to December 31, 2017, a total of 136 weeks. The columns are listed in table 4.1. The week we stand at is called week t.

Table 4.1 The data columns

Target variables
  Calloff.LT: the future demand aggregated from t+1 to t+LT.
  Calloff.LT.1: the future demand aggregated from t+1 to t+LT+1.
Features
  Item-plant pairs: identity of the items; the combination of the item number and the cust plant.
  Week: week number.
  Quantity: demand in week t.
  Cumulated Calloff Quantity: the aggregated demand from week 1 to week t.
  Cumulated mean Calloff Quantity: the mean demand from week 1 to week t.
  Cumulated Zero: the number of zero-demand weeks from week 1 to week t.
  Continual Zero: the number of zero-demand weeks from the last week with non-zero demand to week t.
  Forecasting: the forecast for weeks t to t+LT+1 given by the downstream clients.

Some items keep the same item number while being produced at different cust plants; this is why we combine the item number and the cust plant into a new column, item-plant pairs. In total there are 2214 different item-plant pairs. The week number runs from 1 to 136, representing the week the data belong to.

We want the machine to learn the non-zero demand quantities through Quantity, Cumulated Calloff Quantity, and Cumulated mean Calloff Quantity; on the other hand, we want the machine to learn which future demands will be zero through Cumulated Zero and Continual Zero.
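A sketch in R of how these features can be computed for one item-plant pair, assuming a weekly demand vector with zeros filled in for weeks without demand (as described below); the function name is ours, the column names follow table 4.1:

```r
# Build the Table 4.1 features from a zero-filled weekly demand vector.
build_features <- function(qty) {
  n    <- length(qty)
  zero <- qty == 0
  run  <- integer(n); streak <- 0
  for (i in seq_len(n)) {             # length of the current zero streak
    streak <- if (zero[i]) streak + 1 else 0
    run[i] <- streak
  }
  data.frame(
    Week                            = seq_len(n),
    Quantity                        = qty,
    Cumulated.Calloff.Quantity      = cumsum(qty),
    Cumulated.mean.Calloff.Quantity = cumsum(qty) / seq_len(n),
    Cumulated.Zero                  = cumsum(zero),
    Continual.Zero                  = run
  )
}
```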

We briefly introduce the forecasting data in section 4.2. Because the first demand of each item-plant pair occurs in a different week, most item-plant pairs do not have demand in every one of the 136 weeks; for weeks without demand, we fill in zero in the Quantity column.

4.2 Forecast

The data we received from W company are of two types. The first is the Calloff data, whose content is shown in table 4.2. The first column is the transaction date of the item, the second is the plant where the items are produced, the third is the item number, and the fourth is the demand quantity of the item on that transaction date. The forecasting data are not in this table.

Table 4.2 The content of the Calloff data

  TRANSACTION_DATE   CUST_PLANT   ITEM_NUM             QUANTITY
  2015/4/1           F130         AO3418L              9000
  2015/4/1           F130         AOZ5019QI            6000
  2015/4/1           F130         SSM3J327R,LF(T       12000
  2015/4/1           F131         AO3401A              288000
  2015/4/1           F131         AO4407A              27000
  …                  …            …                    …
  2017/11/1          F130         TPC8067-H,LQ(S       2500
  2017/11/1          F131         SSM3K16CT,L3AP1F(T   10000
  2017/11/1          F131         SSM3K16CT(TL3APP1E   140000
  2017/11/1          F130         SSM3K15AMFV,L3AF(T   216000
  2017/11/1          F130         TCS10DPU(T5LAP,E)    36000

The second type is the forecasting data, whose content is shown in table 4.3. The first column is the date when the downstream companies generated the forecast with their materials requirement planning system, the second is the plant where the items are produced, the third is the item number, the fourth is the target date of the forecast, and the fifth is the forecasted demand quantity of the item for that target date.

Table 4.3 The content of the forecasting data

  MRP_DATE     CUST_PLANT   ITEM_NUM            DMD_DATE     QUANTITY
  2015/6/29    F130         AO4435              2015/6/29    1226
  2015/6/29    F130         AO4435              2015/7/6     1842
  2015/6/29    F130         AO4435              2015/7/13    2073
  2015/6/29    F130         AO4435              2015/7/20    518
  2015/6/29    F130         AO4435              2015/7/27    2774
  …            …            …                   …            …
  2015/12/27   F136         TPCA8006H(TE12L,Q   2016/8/1     0
  2015/12/27   F136         TPCA8006H(TE12L,Q   2016/9/1     0
  2015/12/27   F136         TPCA8006H(TE12L,Q   2016/10/1    0
  2015/12/27   F136         TPCA8006H(TE12L,Q   2016/11/1    0
  2015/12/27   F136         TPCA8006H(TE12L,Q   2016/12/1    0

We want the machine to learn not only from the demand data but also from the forecasting data, so we need to combine the two. The first step is to find how many item-plant pairs there are in the data; we find 2214 in total. The time unit of W company is the week, so we transform the TRANSACTION_DATE in table 4.2 and the MRP_DATE and DMD_DATE in table 4.3 into week numbers. The first week runs from April 1, 2015 to April 7, 2015, and there are 136 weeks in total. We then sort the data by item-plant pair and week number.
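A one-line R sketch of this date-to-week transformation; the helper name is ours, the start date comes from the text:

```r
# Map calendar dates onto 1-based week numbers, week 1 = 2015-04-01..04-07.
week_number <- function(dates, start = as.Date("2015-04-01")) {
  as.integer(difftime(as.Date(dates), start, units = "days")) %/% 7L + 1L
}
week_number(c("2015/4/1", "2015/4/8"))  # -> 1 2
```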

Table 4.4 shows an example of the Calloff data after transformation and sorting; some weeks have demand while others do not.

Table 4.4 The Calloff data after transformation

  CUST_PLANT   QUANTITY   ITEM_PLANT       Week
  F130         39000      AO3418L - F130   1
  F130         105000     AO3418L - F130   2
  F130         36000      AO3418L - F130   5
  F130         39000      AO3418L - F130   6
  F130         90000      AO3418L - F130   7
  …            …          …                …

The second step is to match the item-plant pairs and the week numbers. There are many forecasting values for the same week; we list them in new columns after the Week column. Table 4.5 is an example of the combination of the Calloff data and the forecasting data.

We illustrate the forecasting framework with table 4.5. Suppose we now stand at week 7 and the lead time is 4 weeks. The target we want to forecast is Calloff.LT.1, the aggregate demand over the future lead time plus 1 week, that is, weeks 8 to 12. FCST(-1) is the forecast of a week's Calloff.LT.1 made 1 week earlier; for week 7 it was made in week 6. FCST(-2) is the forecast made 2 weeks earlier; for week 7 it was made in week 5. The forecasting signals already available in week 7 form the triangle shown in table 4.5: the FCST(-1) of week 9, for example, is made in week 8, information we do not yet have in week 7. As the week number increases, less forecasting information is known, and we can only use the forecasting data inside the triangle to forecast the future demand. Standing at week 7, the latest fully observed target is that of week 7 minus the lead time: the Calloff.LT of week 3.

[Table 4.5 Example of the forecasting data framework]

We cannot randomly split the data into training, validation, and testing sets, because the data are ordered in time. We take 52 weeks as the basis for training and validation. In the first round, we take weeks 1 to 36 as the training data, weeks 37 to 39 as the validation data, and week 52 as the testing data. We use week 52 instead of week 40 because the Calloff.LT.1 of week 39 is the demand collected from weeks 40 to 52; the lead time here is 12 weeks. The situation is very similar to the triangle in table 4.5: standing at week 52, the latest Calloff.LT.1 we already know is that of week 39. Table 4.6 shows the process of each round.

Table 4.6 The process of the training, validation, and testing time-domain change

  Round 1
    Training:   Calloff.LT.1 of weeks 1-36;  features of weeks (2-13) to (37-49)
    Validation: Calloff.LT.1 of weeks 37-39; features of weeks (38-50) to (40-52)
    Testing:    Calloff.LT.1 of week 52;     features of weeks 53-65
  Round 2
    Training:   Calloff.LT.1 of weeks 2-37;  features of weeks (3-14) to (38-50)
    Validation: Calloff.LT.1 of weeks 38-40; features of weeks (39-51) to (41-53)
    Testing:    Calloff.LT.1 of week 53;     features of weeks 54-66
  …
  Round 52
    Training:   Calloff.LT.1 of weeks 52-87;  features of weeks (53-65) to (88-100)
    Validation: Calloff.LT.1 of weeks 88-91;  features of weeks (89-101) to (92-104)
    Testing:    Calloff.LT.1 of week 103;     features of weeks 104-116
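A small R sketch of how these rolling rounds can be generated; the helper is ours, and it follows the three-week validation window of rounds 1 and 2:

```r
# Rolling-origin rounds as in Table 4.6: everything slides one week per
# round; the test target sits LT + 1 = 13 weeks after the last
# validation target.
rounds <- lapply(1:52, function(k) list(
  train_target = k:(k + 35),         # Calloff.LT.1 weeks used for training
  valid_target = (k + 36):(k + 38),  # Calloff.LT.1 weeks used for validation
  test_target  = k + 51              # Calloff.LT.1 week used for testing
))
rounds[[1]]$test_target  # 52, matching round 1 in the text
```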

The testing time-domain is from week 52 to week 103, a total of 52 weeks, which is one year. The index we use to measure the performance is the mean absolute error (MAE). However, the first week in which an item-plant pair has calloff or forecasting data can differ across pairs. We therefore find, for each item-plant pair, the first week with calloff or forecasting data and cumulate the MAE only from that week to week 103. Item-plant pairs whose calloff and forecasting data only appear after week 103 are not considered. After this step, 1899 item-plant pairs remain.

We first try the pure GBM model, which is Model 1. We think the 8WMA and the GBM have their own strengths, which is why we also mix them. The simplest mixture, Model 2, gives the same weight to the 8WMA and the GBM. However, adjusting the weights weekly should improve the performance as the trend changes: in Model 3, we compare the forecasting values produced on the validation data with the real values and use the error between them to set the weight for each week. Model 4 additionally considers the characteristics of each item-plant pair and gives every item-plant pair its own weight. The brief contents of each model are listed below.

Model 1: Gradient Boosting Machine (GBM)
We use the package "H2O" in R, specifically the function h2o.gbm. The parameters we tune are listed in table 4.7. The GBM has many parameters that can be fine-tuned; for detailed explanations of the parameters in table 4.7, please refer to Gradient Boosted Models with H2O (Click, Malohlava, Candel, Roark, and Parmar, 2016).

Table 4.7 The parameters of the GBM

Number of trees: 100, 200, …, 1000
  The larger the number of trees, the more accurate the result tends to be. It may seem that more trees are always better, but some problems are easy enough that not many trees are needed.
Maximum depth of the trees: 1, 2, …, 25
  This sets the complexity of the trees. The less complex the trees, the less chance of overfitting; however, complex trees have better explanatory power than simple trees.
Minimum rows for each iteration: 5, 10, …, 100
  Learning from more data rows may perform better because more data are used. However, when the available data are limited, we should learn from fewer rows.
Learning rate: 0.01, 0.012, …, 0.08
  When the current solution is not optimal, the machine follows the gradient toward the next solution. The step size taken from the current solution to the next is set by the learning rate.
Sampling rate: 0.3, 0.35, …, 1
  This sampling rate is for rows: the percentage of the data rows used in training. Using more data can make the result more accurate, but training takes more time.
Column sample rate: 0.3, 0.35, …, 1

  This sampling rate is for columns: the percentage of the features used in training. Some features we give the machine may not be useful, and using more features does not guarantee a better result. Note that this is sampling without replacement.
Column sample rate per tree: 0.3, 0.35, …, 1
  We set this parameter after the column sample rate. If the column sample rate is 0.8 and the column sample rate per tree is 0.7, each tree uses 0.8 × 0.7 = 56% of the columns for training. Note that this is sampling without replacement.
Distribution: Gamma
  We assume that the distribution of the data is Gamma.

In total, there are over 489 million scenarios. We perform a random grid search, randomly testing a subset of these scenarios; the model then chooses the optimal scenario according to the condition we set. Measuring the improvement also requires choosing a metric. Waller (2015) lists several indexes, including the mean absolute error (MAE), the geometric mean absolute error (GMAE), the mean absolute percentage error (MAPE), the symmetric mean absolute percentage error (sMAPE), and the mean absolute scaled error (MASE). Because we care about the difference between the forecasting value and the true value, we set the condition to MAE.
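To illustrate, here is a hedged R sketch of such a random grid search with H2O; the hyperparameter grids follow table 4.7, while the frame and column names (train, valid, features, target) are placeholders of ours:

```r
library(h2o)
h2o.init()

# Hyperparameter grids following Table 4.7.
hyper_params <- list(
  ntrees                   = seq(100, 1000, by = 100),
  max_depth                = 1:25,
  min_rows                 = seq(5, 100, by = 5),
  learn_rate               = seq(0.01, 0.08, by = 0.002),
  sample_rate              = seq(0.3, 1, by = 0.05),
  col_sample_rate          = seq(0.3, 1, by = 0.05),
  col_sample_rate_per_tree = seq(0.3, 1, by = 0.05)
)

# 'train', 'valid', 'features', 'target' are placeholder H2O frames/names.
grid <- h2o.grid(
  algorithm        = "gbm",
  x                = features,
  y                = target,
  training_frame   = train,
  validation_frame = valid,
  distribution     = "gamma",
  hyper_params     = hyper_params,
  search_criteria  = list(strategy = "RandomDiscrete", max_models = 100)
)

# Keep the model with the lowest validation MAE, as in the text.
best <- h2o.getModel(
  h2o.getGrid(grid@grid_id, sort_by = "mae", decreasing = FALSE)@model_ids[[1]]
)
```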

Model 2: 50% 8WMA and 50% GBM
In this model, we simply give the 8WMA model and the GBM model the same weight, from week 52 to week 103.

Model 3: Weighted by week
We weight the 8WMA model and the GBM model by week, using the MAE between the forecasting value and the real value. We first forecast on the validation data; then we use the MAE between those forecasts and the real values to adjust the weight every week. Equation 4.1 shows the calculation, where MAE.8WMA is the MAE of the 8WMA and MAE.GBM is the MAE of the GBM. When the error of the 8WMA is large, the weight of the GBM should be heavier to approximate the real value, and vice versa. Since the weights of all item-plant pairs are the same within a week, MAE.8WMA and MAE.GBM carry only one subscript, the week.

forecast_week = [MAE.8WMA_week / (MAE.8WMA_week + MAE.GBM_week)] × GBM.forecasting_week
              + [MAE.GBM_week / (MAE.8WMA_week + MAE.GBM_week)] × 8WMA.forecasting_week    (4.1)

Model 4: Weighted by week and by item
We give the 8WMA and the GBM weights by week and by item. We first forecast on the validation data, then calculate the MAE by week and by item, so different item-plant pairs have different MAEs in the same week. This is why MAE.8WMA and MAE.GBM carry two subscripts here, the item-plant pair and the week.

forecast_{item,week} = [MAE.8WMA_{item,week} / (MAE.8WMA_{item,week} + MAE.GBM_{item,week})] × GBM.forecasting_{item,week}
                     + [MAE.GBM_{item,week} / (MAE.8WMA_{item,week} + MAE.GBM_{item,week})] × 8WMA.forecasting_{item,week}    (4.2)
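Equations 4.1 and 4.2 share the same blending rule; a minimal R sketch (names ours) over vectors of values:

```r
# Inverse-error blend of the two forecasts (eqs. 4.1/4.2): the larger a
# model's validation MAE, the smaller its weight. Vectorized over weeks
# (Model 3) or over item-week pairs (Model 4).
blend <- function(fc_gbm, fc_8wma, mae_gbm, mae_8wma) {
  w_gbm <- mae_8wma / (mae_8wma + mae_gbm)  # high 8WMA error -> trust GBM
  w_gbm * fc_gbm + (1 - w_gbm) * fc_8wma
}
```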

4.3 The Result

The measurement we use here is the error of each model divided by the error of the 8WMA, which shows how much better or worse the new model performs; the calculation is in equation 4.3. The start time is the first week in which item-plant pair i has its calloff or forecasting data, that is, the first week we meet its demand or receive its forecasting data from the downstream companies. Only the error after the start time is counted, and the end time is fixed at week 103. We sum up the errors of each week by their absolute values.

error ratio_i = ∑_{t=start time}^{103} |error_{i,t}^{model}| / ∑_{t=start time}^{103} |error_{i,t}^{8WMA}|    (4.3)
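In R, the computation for one item-plant pair might look as follows (names ours):

```r
# Error ratio (eq. 4.3) for one item-plant pair: total absolute error of
# the new model relative to the 8WMA, from the pair's first active week
# to week 103.
error_ratio <- function(err_model, err_8wma, start, end = 103) {
  idx <- start:end
  sum(abs(err_model[idx])) / sum(abs(err_8wma[idx]))
}
```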

A ratio less than 1 means the new model performs better than the 8WMA for item-plant pair i, and the smaller the ratio, the better. If the ratio is 0.4, the error sum of the new model is only 40% of the 8WMA's; we can also say the performance is 1/0.4 = 2.5 times better than the 8WMA. Conversely, the new model performs worse than the 8WMA if the ratio is larger than 1: if the ratio is 3, the error sum of the new model is 3 times that of the 8WMA, so the 8WMA performs 3 times better.

The error ratios of each model can be seen in figure 4.2. The weighted by week model has the best performance: for 66.4% of the 1899 item-plant pairs it beats the 8WMA model. The 50% GBM and 50% 8WMA model comes very close to the weighted by week model; however, directly giving the same weight to the GBM and the 8WMA means the minimum error ratio cannot become small.

Although the weighted by week by item model does not perform very well overall, it has something special: some of its item-plant pairs reach zero error against the true values. In the error ratio range 0 to 0.5, the GBM model and the weighted by week by item model perform significantly better than the other models. In figure 4.3 we can see that the 50% GBM and 50% 8WMA model and the weighted by week model have less than 10% of their item-plant pairs in the error ratio range 0 to 0.5, whereas the GBM model and the weighted by week by item model have more than 20% in this range.

[Figure 4.2 The cumulated number of items for each of the models]

[Figure 4.3 Error ratio distribution: the percentage of item-plant pairs per error ratio interval (≤0.5, 0.5-1, 1-2, >2) for the GBM, 50% GBM & 50% 8WMA, weighted by week, and weighted by week by item models]

We find that the mixed models are all better than the pure GBM model. There are always some item-plant pairs on which the GBM cannot perform well, while other item-plant pairs suit the GBM perfectly. This is why it performs really well in the range of ratios below 0.5, even though the total performance of the GBM model is the worst among the new models we tried. The mixture models may perform better because when an item-plant pair is suitable for the 8WMA model, the weight of the 8WMA becomes higher; conversely, the GBM receives a higher weight when the item-plant pair suits it better.

The weighted by week by item model is the only model with error ratios equal to 0. After checking the weights behind these zero ratios, we find that all the weights for the GBM are 0 and all the weights for the 8WMA are 1. This shows that the 8WMA model performs extraordinarily well on some item-plant pairs. Checking the demands of these pairs, we find that their demands are all 0 after the first test week, week 52.

CHAPTER 5 DISCUSSION

The model with the best overall performance is clearly the weighted by week model: over 66 percent of the item-plant pairs forecast better than the current 8WMA model. Its weakness is that over 50% of the item-plant pairs fall in the error ratio range 0.5 to 1. The objective of W company is to minimize the EOH, and the more precise the forecast, the lower the EOH. From section 4.3, we know the weighted by week by item model does this really well: an error ratio under 0.5 means the error is cut by at least 50 percent relative to the 8WMA model, and this model also has the most item-plant pairs with a zero error ratio, meaning it forecasts those pairs without any error, with 100 percent accuracy.

If a downstream company will never use an item in its future products, it lets W company know, so we think the situation where the demand after a certain week is all 0 will not normally arise. Still, we can learn something from it: the 8WMA may perform really well on some items while the GBM performs well on others. We need to find the characteristics of the item-plant pairs. W company can classify the item-plant pairs before forecasting, feeding pairs with the same characteristics into the machine so that it can learn better. We have shown that machine learning can help with the inventory problem, and we can also group the item-plant pairs and assign them to the appropriate forecasting method: for example, use a time series model for the pairs that it forecasts better than the machine learning model does.

The features roughly determine how well the machine learns. The features we put in, such as Cumulated Calloff Quantity and Cumulated Zero, are mostly lagging indicators; they only reflect what has already happened. Fortunately, we have a feature called Forecasting, a leading indicator given by the downstream customers: the demand forecast for the future LT+1 weeks.

We used the data from 2015 to 2016 to forecast 2016 to 2017. However, most of the forecasting data from 2015 to 2016 are full of missing values. This hugely limits the learning ability, as the machine can then only learn from the lagging indicators. Given our results, W company should try to collect the forecasting data more aggressively.

CHAPTER 6 CONCLUSION & LIMITATION

6.1 Conclusion

The main target of our research is to minimize the inventory of W company while keeping the fill rate the same or even higher than before. We first inspect the inventory policy. The order quantity is important but difficult to decide. We propose a new method, the To-Be policy, and compare it with the As-Is policy. We find that the To-Be policy performs better than the As-Is policy under perfect information: it reaches the optimal situation, where the service level equals 1 and the average end-of-week stock on hand equals 0, which the As-Is policy cannot do.

Most previous research uses two kinds of methods for this problem: time series methods and machine learning. W company already uses a time series method, the moving average. We try a machine learning method, the Gradient Boosting Machine, to see whether it performs better. However, most of the time the best choice is a mixture of the two. We propose 3 kinds of mixed models: the 50% GBM and 50% 8WMA, the weighted by week, and the weighted by week by item. The weighted by week and weighted by week by item models have their own strengths in different situations: the weighted by week model has the highest ratio of item-plant pairs that perform better than the 8WMA, and the weighted by week by item model has the highest ratio of item-plant pairs with error ratios below 0.5.

6.2 Limitation

The data we have are from one customer only; when W company faces other customers, the situation may change. There are three possible approaches. The first puts all the data from different companies into the machine at once: we get only one model in the end, but the errors across companies may be huge. The second lets the machine learn one model for each company.

Per-company models may reflect the different characteristics of the different companies; however, when we face a new company, we may not have enough data to construct its model. The third approach groups data with similar characteristics together for the machine to learn from; the difficult part is finding the effective characteristics.

We assume the demand distribution is gamma. Future research can try other demand distributions, and can also add a quantile parameter to adjust the output: the higher the quantile, the more of the demand can be covered, while a lower quantile may lower the end-of-week stock on hand. We pick the model with the minimum MAE as the optimal one; other researchers can try other error indexes, such as the mean square error (MSE) and the root mean squared logarithmic error (RMSLE), according to their concerns.

We try only one machine learning method from the h2o package. Other methods, such as Random Forest, Deep Learning, and the Generalized Linear Model, can be used in future research. Many machine learning methods are not in the h2o package, and new methods keep being invented; mixing more machine learning or time series models may improve the results.

REFERENCES

[1] Hendry, L. C., Simangunsong, E., & Stevenson, M. (2011). Supply chain uncertainty: a review and theoretical foundation for future research. International Journal of Production Research, 252(2016), pp. 1-26.

[2] Babai, Z., Boylan, J. E., Kolassa, S., & Syntetos, A. A. (2015). Supply chain forecasting: theory, practice, their gap and the future. European Journal of Operational Research.

[3] Bischak, D. P., Naseraldin, H., & Silver, E. A. (2008). Determining the reorder point and order-up-to-level in a periodic review system so as to achieve a desired fill rate and a desired average time between replenishments. The Journal of the Operational Research Society, 60(9), pp. 1244-1253.

[4] Esper, T. L., & Waller, M. A. (2014). The Definitive Guide to Inventory Management: The Principles and Strategies for the Efficient Flow of Inventory across the Supply Chain. Council of Supply Chain Management Professionals, Ch. 3.

[5] Deng, Q., Paul, A. A., Tan, Y. (R.), & Wei, L. (2017). Mitigating inventory overstocking: optimal order-up-to level to achieve a target fill rate over a finite horizon. Production and Operations Management, forthcoming.

[6] Crone, S. F., Fildes, R., Nikolopoulos, K., & Syntetos, A. A. (2008). Forecasting and operational research: a review. Journal of the Operational Research Society, 59, pp. 1150-1172.

[7] Carbonneau, R., Laframboise, K., & Vahidov, R. (2008). Application of machine learning techniques for supply chain demand forecasting. European Journal of Operational Research, 184, pp. 1140-1154.

[8] Knoll, A., & Natekin, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7(21).

[9] Candel, A., Click, C., Malohlava, M., Parmar, V., & Roark, H. (2016). Gradient Boosted Models with H2O. H2O.ai, Inc.

[10] Bijvank, M., & Vis, I. F. A. (2011). Lost-sales inventory theory: a review. European Journal of Operational Research, 215(1), pp. 1-13.

[11] Beutel, A.-L., & Minner, S. (2012). Safety stock planning under causal demand forecasting. International Journal of Production Economics, 140(2), pp. 637-645.

[12] Waller, D. (2015). Method for Intermittent Demand Forecasting. (Unpublished thesis). Lancaster University.
