Tracking Supply Chain Process Variability with Unsupervised Cluster Traversal

Teng-Yung Lin

August 26, 2018

Abstract

Supply chain processes need stability and predictability for the supply to better match demand at the right time with the right quantity. Reaching stable operations under uncertainty, however, is challenging, as fluctuating demand patterns in the downstream are common and make inventory control at the upstream a daunting task. Working with one of the leading semiconductor distributors in the world, which piles up stock that hampers profitability for the sake of satisfying lumpy and erratic demand in the downstream production plants, we help the distributor track process variability in its operations. Specifically, we integrate unsupervised clustering with the recurrent neural network to track supply chain process variability without pre-assumptions on demand patterns. We first apply unsupervised learning techniques to characterize the weekly process performance of a wide variety of electronic items, where item-week pairs with relatively high similarity in demand and stock attributes are clustered together. The operational variability of each item can then be measured with the trajectory of the item over its clusters ordered by time. To predict how each item moves from week to week, we propose a new cluster sequence encoding and employ the recurrent neural network structure for sequence prediction. We show that with a training loss scheme, the presented loss function tailored to our encoding approach can achieve high accuracy on variability prediction for real-world data. Since any upstream supply operations are driven by downstream demand patterns, the prediction of items' operational variability may help suppliers better prepare for demand irregularities by dynamically adjusting their operations strategies, e.g., altering throughput rates, rescheduling deliveries, and increasing or decreasing fulfillment frequencies.

Contents

Abstract
1 Introduction
2 Related Works
  2.1 Supply Chain Demand Forecasting
  2.2 Growing Hierarchical SOM (GHSOM)
  2.3 Recurrent Neural Network (RNN)
  2.4 Time Series Data Prediction with Clustering
3 Methodology
  3.1 System Overview
  3.2 SCM process variability
  3.3 Unsupervised clustering with GHSOM
  3.4 Cluster items with trajectories
  3.5 Cluster sequence prediction
  3.6 Model Accuracy and Performance
4 Case Study
  4.1 System Settings
  4.2 Comparison on lMSE and wMSE in fluctuating and stable items
  4.3 Trajectory prediction
5 Conclusion
References

List of Figures

1 System Architecture
2 Defining neighbor unit
3 Cluster tree
4 GHSOM labeling method
5 A typical sample of item trajectory
6 The RNN and its input/output sequence (a three-layer GHSOM tree, i.e., h=3)
7 Cluster distance
8 Item-A 5-step trajectory prediction

List of Tables

1 Numerical Attributes
2 Nominal Attributes
3 Company W's Data Table
4 Company W's Data Table 2
5 Item trajectories in the same cluster
6 Experiment input sequences
7 Accuracy comparison on wMSE, lMSE and MSE in all settings
8 Training data accuracy on wMSE, lMSE and MSE in all settings
9 Traversal distance comparison on MSE, wMSE and lMSE in all settings
10 Traversal distance comparison on high and low trajectory clusters
11 Accuracy comparison on high and low trajectory clusters

1 Introduction

Supply chain processes need stability and predictability for the upstream supply to better match downstream demand at the right time with the right quantity. Unstable and volatile operations, however, are the norm rather than the exception in semiconductor supply chains, with their long production lead times, short product life cycles, and erratic market demand. In this study, we work with one of the leading semiconductor distributors in the world, Company W, to better monitor its supply chain process variability through contemporary machine learning techniques. As a global distributor in this industry, Company W has to buy, store, and transfer thousands of semiconductor items, each with different demand patterns and supply availability from week to week. As a result, the distributor has no choice but to pile up inventories that hamper profitability just for the sake of satisfying abnormal demand in the downstream production plants. To secure a fine return on working capital, it is imperative for the distributor to closely monitor metrics such as in-transit inventory, total on-hand inventory, and demand levels regularly. While monitoring the foregoing metrics for a single item may be easy, doing so for all items turns out to be a highly challenging task. Essentially, managers in Company W wish to identify groups of items with similar operational attributes (e.g., inventory, demand) from time to time, so that they can greatly reduce the managerial burden and focus on tracking process variability on a weekly basis.

To help the distributor improve its supply chain operations, we propose a two-stage cluster traversal approach in which we combine unsupervised clustering and the Recurrent Neural Network (RNN) to capture the rapidly changing process attributes of different items without presuming the items' demand processes. In the first stage, we employ an unsupervised clustering method to identify item-week pairs with similar inventory and demand metrics, in addition to developing a new cluster sequence labeling method grounded on continual numbers. After labeling each cluster, we form cluster trajectories by ordering the clusters of the same item by time to observe process variability. This similarity learning helps Company W achieve its goal of reducing management complexity. In the second stage, given the advantage that cluster labels retain rich process information in low data dimensions, we are able to train the RNN model using weekly cluster labels without having to put all attributes into the RNN model, as is done in video streaming and speech recognition research [1]. The fitted RNN enables Company W to foresee process variability by predicting the moving trajectories among clusters for all items. Further, we develop two loss functions tailored to the cluster labeling scheme for the RNN. By doing so, we show that the tailored RNN has substantially better prediction performance than the RNN trained with the ordinary mean squared error loss function. Motivated by practical challenges in semiconductor supply chain management, our research aims to address the questions below:

1. How can we characterize the process variability of groups of items in terms of the inventory and demand metrics that are crucial for supply chain operations?

2. How can we predict the process variability movement of each item, and how does our method perform in analyzing data from real-world supply chain operations?

To answer the first question, we perform unsupervised clustering on various features of items across periods and visualize clusters in two dimensions: the backlog (i.e., in-transit inventory)-to-avail (i.e., total inventory including stock on hand) ratio and the demand (i.e., accumulated demand)-to-avail ratio. For the distributor, a higher backlog-to-avail ratio implies a lower risk of holding inventory for an item, as ownership of the in-transit inventory still rests with the upstream suppliers, whereas a higher demand-to-avail ratio relates to better utilization of working capital, as inventory shipment to downstream customers is more active. Therefore, the distributor generally prefers and attempts to keep items at high backlog-to-avail and demand-to-avail ratios. Projecting clusters of item-week pairs onto the two dimensions helps Company W uncover clusters with high/low supply and demand risks. The movement of any cluster (and the item-weeks within it) in the two-dimensional space characterizes performance variation in the items' inventory and demand metrics that are crucial for supply chain operations. To answer the second question, for each item we generate its week-to-week cluster labels, which allow managers to track process variability at the item level. Using past trajectories of cluster labels as input features, we train an RNN to predict the next cluster movement of all items. Furthermore, we implement the presented approach in a prototype system, DejaVu, and apply it to analyze a two-year transaction data set, covering more than three thousand items, from the distributor. The preliminary results indicate that the two-stage cluster traversal approach presented in this paper can help Company W track the process variability of items using readily available demand and inventory data. Through the process of clustering, labeling and data transformation, and movement prediction, managers of Company W become more capable of predicting performance variation, hedging against uncertainty, dynamically adjusting their supply chain operations, and, perhaps more importantly, developing their supply chain tactics and strategies with foresight.

The remainder of this paper is organized as follows: Chapter 2 reviews related work. Chapter 3 describes the proposed system architecture. Chapter 4 introduces a case study that applies our proposed approach. Chapter 5 presents the conclusions.

2 Related Works

2.1 Supply Chain Demand Forecasting

One of the biggest challenges in supply chain management is to coordinate demand and inventory. Two common econometric models for forecasting product demand are the logit [2] and nested logit [3] models. Beyer et al. (2005) [4] used a profile extractor that normalizes the demand profiles of similar products to obtain the demand profile of a new product. Singh et al. (2006) [5] discussed systems and methods for multiple-scenario comparisons that help users create predictions from multiple history streams with various alternative forecast algorithm theories. They further provided an automatic tuning mechanism to determine the optimal parameter settings of the forecasting algorithm and model. Facing uncertain customer demands in a multi-level supply chain, Efendigil et al. (2009) [6] proposed a new comparative forecasting methodology via neural techniques, developing a comparative forecasting mechanism based on artificial intelligence methods. Carbonneau et al. (2008) [7] applied recurrent neural networks, support vector machines, and a traditional regression model to forecast the distorted demand of a supply chain. Their findings suggested that recurrent neural networks and support vector machines showed the best performance, but their accuracy was not statistically significantly better than that of the regression model.

2.2 Growing Hierarchical SOM (GHSOM)

Clustering methods [8, 9, 10, 11], which group samples by their similarity, have been widely employed in various fields, such as operations management [12], facility location [13], financial time series prediction [14], market segmentation [15], product demand prediction [16], fraudulent financial report detection [17], and mobile application recommendation [18]. Self-Organizing Maps (SOMs) [9] are good at reducing high-dimensional data to a two-dimensional representation space. The Growing Hierarchical SOM (GHSOM) [10] builds a tree-like SOM architecture with layers. The size and depth of the GHSOM are adjusted within an unsupervised training procedure based on inter- and intra-variations among samples and clusters. To improve the scalability of GHSOMs, Chiu, Chen, and Yu (2017) [11] proposed an actor-based GHSOM algorithm that facilitates parallel map construction. We adopt the GHSOM and their construction [11] as our unsupervised clustering mechanism in this work.

2.3 Recurrent Neural Network (RNN)

The Recurrent Neural Network (RNN) [19], including variants such as the LSTM and GRU, is considered one of the most successful machine learning methods for complex time-based data modeling, such as natural language processing [20]. In the business field, many important forecasting problems, such as those involving short-life-span products, noise, and small sample sizes, raise issues that are hard to tackle in existing software systems [21]. Giles et al. (2001) [22] proposed a hybrid processing method that combines the SOM to preprocess data and employs the RNN to predict daily foreign exchange rates. Darbellay et al. (2000) [23] proposed a neural network method to predict short-term electricity consumption, finding that the neural network is superior to linear models under some conditions. Mupparaju et al. [24] showed that the LSTM is suitable for demand forecasting, taking previous sales as the input sequence and predicting the sequence of future sales.

2.4 Time Series Data Prediction with Clustering

Previous research has applied unsupervised clustering methods such as [9] in the exploratory phase of data mining, since they are excellent at projecting high-dimensional input onto a low-dimensional space that can be effectively utilized to explore the similarity of real-world data. Rauber et al. (2002) [25] used the GHSOM to categorize characteristics of frequency spectra and create different dynamic music collections. Hsu et al. (2009) [26] applied the SOM to capture the non-stationary property of financial series and then applied support vector regression (SVR) to forecast financial indices. Liu et al. (2006) [27] presented an approach that employed the GHSOM to analyze time series data and provided a useful practical tool for pattern discovery. Unlike work that uses clustering as preprocessing, we employ the GHSOM to cluster samples and use the cluster trajectory to characterize demand and inventory movement. We employ RNNs as our sequence model for tracking the movement, achieving accurate prediction of supply chain process variability. As far as we know, no prior research focuses on predicting the movement of clusters.

3 Methodology

Artificial intelligence forecasting techniques have been receiving much attention lately for their ability to achieve prediction accuracy that cannot possibly be reached by traditional methods. They are claimed to have the ability to learn like humans by accumulating knowledge through continual learning activities. In response to the need for capturing irregularities in real-world supply chain operations, we propose a new forecasting technique that employs unsupervised clustering and prediction techniques on the cluster movement trajectory of time sequence data. In this section, we first give an overview of the system structure of our proposed approach in Sec. 3.1. Next, to capture supply chain process variability, we list practically important demand and inventory indicators that are commonly used in supply chain management as our input data in Sec. 3.2. We start our analysis with unsupervised clustering on these raw samples, where each sample corresponds to the predefined demand features of an item in a specific week. We employ the growing hierarchical SOM on these samples so that samples (the demand of an item in a specific week) with similar feature values are clustered together. For each item, its cluster trajectory (based on sequencing the clusters of its samples in each week) can then be generated and further used to represent the process variability of that item. We detail how to generate item trajectories in Sec. 3.3. Items can then be further clustered together based on their cluster trajectories, where items falling into the same cluster have similar trajectory features and hence relatively similar patterns of demand variability. Finally, we employ an RNN model to predict the item trajectory and present our proposed loss functions in Sec. 3.5. The RNN model takes the cluster trajectory as its input and yields the prediction of the next trajectory step. In our research, we propose two different loss functions, the weighted mean square error (wMSE) and the ladder mean square error (lMSE), to optimize our objective function during the RNN training process. We later show that adopting wMSE and lMSE significantly improves the learning accuracy in our experiments.

3.1 System Overview

Fig. 1 shows a high-level diagram of our analysis architecture.

[Figure 1: System Architecture]

In the preprocessing phase, we first separate nominal and numerical attributes, using the nominal attributes as cluster representations and the numerical attributes to assess inter- and intra-cluster data differences. We also apply a min-max scaler transformation to the numerical attributes to reduce the effect of noise and outliers [28]. In the clustering phase, we label each record with its nominal attributes, including item and date (by week), and use the numerical attributes, including all the predefined demand and inventory indicators of supply chain management, as the sample attributes of that record. We then apply the GHSOM to cluster samples based on these numerical attributes. In the trajectory phase, we form the cluster trajectory of each item based on the labels of its samples. In the prediction phase, we further apply the GHSOM to cluster items based on their trajectory features and use an RNN to predict the movement of the trajectories.

3.2 SCM process variability

To characterize process variability in supply chain operations, we interviewed managers of Company W and identified eight practically relevant numerical attributes pertaining to demand (two attributes) or inventory levels (six attributes). A list and brief description of these attributes, considered key performance indicators in operations management [29], are shown in Tab. 1.

Since a major goal of supply chain management is to satisfy demand while lowering the inventory holdings that require capital investment, the demand- and inventory-related features are intrinsically relevant metrics for characterizing performance variation and process variability in supply chain operations.

Table 1: Numerical Attributes

  Demand
    Actual AWU   - Average weekly usage (i.e., actual demand) in the past
    FCST M       - Managers' forecast of monthly demand for the future
  Inventory
    BL <= 9WKs   - In-transit inventory to be delivered by the upstream supplier within 9 weeks
    Backlog (BL) - Total in-transit inventory to be delivered (the company backlog)
    DC-OH        - On-hand (OH) inventory in the distribution center (DC)
    Hub-OH       - On-hand (OH) inventory in the warehouse near the downstream customer's production plant
    TTL OH       - Total on-hand inventory
    Available    - In-stock quantities

We also choose some representative nominal attributes as labels, such as item short name, brand name, and salesperson. All selected nominal attributes are shown in Tab. 2.

Table 2: Nominal Attributes

  Report Date     - Date when every trade was made
  Customer        - Company W's customers
  Item Short Name - Items sold by Company W
  Brand           - Brand of the items sold by Company W
  Type            - Stock-level category of the items
  Sales           - Salesperson who sells the items

To have a robust and practical explanation of each cluster, we adopt two critical attributes as our labels. One is the actual weekly usage (AWU) to availability ratio, which shows the amount of product consumed by downstream partners.

For instance, if the ratio is high, it means the company has too little inventory relative to demand. The other is the backlog-to-availability ratio, which represents company backlog levels; if the backlog is high, it indicates a high backlog storage cost [30]. These labels, which indicate inventory and demand levels, can help company managers discover clusters with high/low supply and demand risks.

    AWU/Avail Ratio = Actual AWU / Available
    Backlog/Avail Ratio = Backlog / Available

After defining our numerical and nominal attributes, we integrate all historical records into a table with eight numerical attributes and ten nominal labels, shown in Tab. 3 and Tab. 4.

Table 3: Company W's Data Table

  Report Date | Customer | Type       | Item Short Name | Brand     | Sales   | AWU/Avail ratio | Backlog/Avail ratio
  20160805    | 10759    | Over Stock | Item A          | TOSHIBA   | Sales A | 0.0143          | 0.3395
  20160805    | 10759    | Over Stock | Item B          | TOSHIBA   | Sales B | 0               | 0.4389
  20160805    | 10759    | Over Stock | Item C          | ZILLTEK   | Sales C | 0.0321          | 0.4014
  20160812    | 1171     | Zero       | Item C          | TOSHIBA   | Sales B | 0.0021          | 0.4756
  20160812    | 1171     | Zero       | Item D          | EVERLIGHT | Sales B | 0.0545          | 0.7292
  20160819    | 1171     | Zero       | Item E          | TOSHIBA   | Sales D | 0.0152          | 0.5217
  ...

Table 4: Company W's Data Table 2

  Actual AWU | Avail  | BL=<9WKs | Backlog | DC OH  | Hub OH | TTL OH  | FCST M
  0.0074     | 0.0179 | 0.0097   | 0.0087  | 0.0431 | 0      | 0.02762 | 0
  0.0025     | 0.0027 | 0        | 0.0017  | 0.0002 | 0      | 0.0001  | 0
  0.0031     | 0.0504 | 0.0019   | 0.0291  | 0.0016 | 0.0077 | 0.0036  | 0
  0.0831     | 0.0522 | 0.0325   | 0.0357  | 0.1102 | 0      | 0.0705  | 0.0003
  0.0809     | 0.1814 | 0        | 0.19    | 0.0453 | 0.1055 | 0.064   | 0.0003
  0.0003     | 0.0053 | 0.2128   | 0.0002  | 0.1745 | 0.0091 | 0.1147  | 0.0001
  ...

To get rid of noise in the real-world data, we use a min-max scaler to give more credibility to the results [31]. The following equation shows the transformation applied to all eight numerical attributes, which are re-scaled into the [0, 1] range:

    X_scaled = (X − X_min) / (X_max − X_min)    (1)

In the case study, we come across some missing values in the numerical attributes of Company W's data. We substitute −1 for missing entries and feed them, along with the scaled [0, 1] attributes, to the GHSOM clustering algorithm, in which a negative value is accommodated in the clustering process and reflected in the emerged clusters.
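As an illustration, this scaling-plus-imputation step takes only a few lines of Python; the pandas DataFrame layout and the column names (taken from Tab. 4) are our own assumptions for the sketch, not part of the original system:

    import pandas as pd

    NUMERIC_COLS = ["Actual AWU", "Avail", "BL=<9WKs", "Backlog",
                    "DC OH", "Hub OH", "TTL OH", "FCST M"]

    def scale_numerical(df: pd.DataFrame, cols=NUMERIC_COLS) -> pd.DataFrame:
        """Re-scale each numerical attribute to [0, 1] as in eq. (1)
        and substitute -1 for missing entries."""
        out = df.copy()
        for col in cols:
            x = out[col]
            lo, hi = x.min(), x.max()          # pandas skips NaNs here
            scaled = (x - lo) / (hi - lo) if hi > lo else x * 0.0
            out[col] = scaled.fillna(-1.0)     # missing values become -1
        return out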

3.3 Unsupervised clustering with GHSOM

The Growing Hierarchical Self-Organizing Map (GHSOM) [10] is an architecture that can restructure SOMs both hierarchically, according to the data distribution, allowing hierarchical decomposition into sub-parts of the data, and horizontally, meaning that each map adapts its size to the requirements of the input space. The advantage of the GHSOM is its ability to determine the depth and size of the map throughout the training process. The growth process of the GHSOM is guided by two parameters: τ1, which controls the similarity within a same-layer SOM map, and τ2, which determines the depth of the GHSOM structure. The input x shown in eq. (2) is a high-dimensional matrix consisting of n vectors over the i selected attributes:

    x = [v_1, v_2, ..., v_n],  v_n = [a_1, a_2, ..., a_i],  i, n ∈ ℕ    (2)

The starting point for the growth process is the overall input data as measured by the single-unit SOM at layer 0. This starting unit is assigned a weight vector m_0 = [µ_01, µ_02, ..., µ_0i], computed as the average of all input attributes. The deviation of the input data, i.e., the mean quantization error mqe of this single unit, is computed as given in eq. (3), with n the number of input vectors.

    mqe_0 = (1/n) · Σ_x ||m_0 − x||    (3)

After the computation of mqe_0, training of the GHSOM starts with its first-layer SOM, which consists of a rather small number of units, e.g., a grid of 2 × 2 units. To adapt the size of this first-layer SOM to the data distribution, the mean quantization error of the map is computed as given in eq. (4), where u refers to the number of units i contained in the SOM m. Following eq. (3), mqe_i is computed as the average distance between the weight vector m_i and the input patterns mapped onto unit i.

    MQE_m = (1/u) · Σ_i mqe_i    (4)

The basic concept is that each layer of the GHSOM is responsible for explaining the deviation of the input data present in its prior layer. This is done by adding units to expand the initial SOM map on each layer until it reaches a suitable size. More specifically, the SOM on each layer is allowed to grow until the deviation in the unit of its preceding layer is reduced to at least a fixed percentage τ1. Thus, as long as MQE_m ≥ τ1 · mqe_0 holds true for the first-layer map m, a new row or a new column of units is added to this SOM. The insertion happens next to the unit e with the highest mean quantization error, mqe_e, called the error unit. Whether a new row or a new column is added is controlled by the location of the most dissimilar neighboring unit d relative to the error unit. We take advantage of matrix calculations, which can be accelerated on multi-threaded GPU processing units. Here we use two matrices to record the units' weights and the units' topological locations. First, we get the location of the error unit by taking the index of the highest unit mqe, mqe_e. Next, we can easily obtain the locations of the neighboring units by subtracting the error unit's location from the topology locations and keeping the results whose absolute sum equals 1 (a one-step distance difference); Fig. 2 shows how we find the locations of the neighbor units. After finding the neighbor units, we identify the neighboring unit whose weight vector is most dissimilar to that of the error unit.
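The neighbor search described above reduces to two array operations. The following is a minimal NumPy sketch under the assumption that the per-unit mqe values and grid coordinates are kept in two arrays, as described:

    import numpy as np

    def error_unit_neighbors(unit_mqe: np.ndarray, locations: np.ndarray):
        """unit_mqe: mean quantization error per unit, shape (u,).
        locations: grid coordinates per unit, shape (u, 2).
        Returns the error unit e (highest mqe) and the indices of its
        grid neighbors, i.e., units whose absolute coordinate difference
        from e sums to 1, as in Fig. 2."""
        e = int(np.argmax(unit_mqe))
        offsets = np.abs(locations - locations[e])
        neighbors = np.flatnonzero(offsets.sum(axis=1) == 1)
        return e, neighbors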

[Figure 2: Defining neighbor unit]

Hence, we insert a new row or a new column depending on the position of the neighbor with the most dissimilar weight vector. As soon as the expansion of the first-layer map is finished, i.e., MQE_m < τ1 · mqe_0, the units of this map are tested for expansion on the second layer. In particular, those units that have a large mean quantization error add a new SOM to the second layer of the GHSOM. A parameter τ2 is used to express the desired level of granularity of input data discrimination in the final maps. More precisely, each unit i fulfilling the criterion given in eq. (5) is expanded hierarchically:

    mqe_i > τ2 · mqe_0    (5)

Once the τ2 condition is satisfied, the unit insertion procedure continues with the newly established SOM map, which takes the input data that belongs to its preceding SOM unit. We examine the τ1 and τ2 conditions until no more units require further expansion, and the training process leads to a hierarchical SOM tree structure.

[Figure 3: Cluster tree]
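The interplay of the two criteria can be summarized by the following control-flow sketch; the map object and its methods (fit, mean_mqe, unit_mqes, insert_row_or_column) are hypothetical stand-ins for an actual GHSOM implementation such as [11], and only the control flow is meant to be literal:

    def grow_ghsom_layer(som, data, tau1, tau2, mqe0):
        """Schematic of the two GHSOM growth criteria."""
        som.fit(data)
        # horizontal growth: add units while MQE_m >= tau1 * mqe0
        while som.mean_mqe() >= tau1 * mqe0:
            som.insert_row_or_column()   # next to the error unit
            som.fit(data)
        # hierarchical growth: units with mqe_i > tau2 * mqe0 (eq. (5))
        # spawn a child SOM on the next layer
        return [i for i, m in enumerate(som.unit_mqes()) if m > tau2 * mqe0]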

The resulting GHSOM represents a tree structure, as shown in Fig. 3. We further label each cluster according to its position in each layer. Considering a GHSOM structure with depth h, the label of a cluster C^k is represented as an h-dimensional vector:

    C^k = [l_1, l_2, ..., l_h],    (6)

where l_i indicates the position of C^k in the i-th layer. Fig. 4 shows an example of labeling the clusters in a three-layer GHSOM structure.

[Figure 4: GHSOM labeling method]

Each cluster represents a set of samples that have similar values in the demand and inventory attributes, where each sample is associated with an item (a product in the supply chain) in some week (a week in which the product has transactions). That is to say, a product (an item) has a weekly sample in the historical records if it has transactions in that week. For each item, according to the clusters that its samples fall into, we can generate an item trajectory as a cluster sequence ordered by date. For weeks in which there are no transactions, we use '0' as a padding symbol that is distinguishable from all the clusters. Below we show item-A, a sample item among Company W's products, which represents a typical item with highly irregular demand.

A typical item trajectory is [0, 0, 0, 0, 0, C^21, 0, 0, 0, 0, C^21, C^19, 0, 0, C^17, 0, C^21, 0, 0, 0, 0, 0, 0, C^21, 0, 0, 0, 0, C^21, 0, 0, 0, C^21, 0, 0, 0, 0, C^17, 0, C^17, 0, 0, 0, C^17, 0, 0, 0, C^17, 0, 0, C^17, 0, 0, C^18, 0, 0, 0, C^18, 0, 0, C^18, 0, 0, 0, C^19, 0, 0, 0, C^19, C^20, 0, 0, C^19, 0, 0, C^19, 0, 0, 0, C^20, 0, 0, 0, C^20, 0, 0, C^19, 0, C^20, 0, C^20, C^19], where this item has 27 transaction records over the past 92 weeks. For a week in which the item has transactions, e.g., week 6, the item's demand and inventory can be characterized by the corresponding cluster, e.g., C^21. The trajectory reflects the process variability of the item. One can project the cluster movement onto the AWU-Avail-Ratio and Backlog-Avail-Ratio map as shown in Fig. 5, which represents the movement of item-A at different times. It gives company managers insight through observation of the fluctuation pattern of its movements in AWU-Avail-Ratio (x-axis) and Backlog-Avail-Ratio (y-axis).

[Figure 5: A typical sample of item trajectory]
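Generating such a padded trajectory is straightforward. Below is a minimal sketch, assuming the weekly cluster assignments of an item are available as a week-to-label mapping (a data layout we choose for illustration):

    def item_trajectory(weekly_clusters: dict, num_weeks: int = 92) -> list:
        """weekly_clusters maps a week index to the cluster label (an
        h-dimensional tuple such as (2, 1)) of the item's sample in that
        week; weeks without transactions are padded with 0."""
        return [weekly_clusters.get(w, 0) for w in range(num_weeks)]

    # an item that traded only in weeks 5 and 10:
    traj = item_trajectory({5: (2, 1), 10: (2, 3)})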

3.4 Cluster items with trajectories

It is common for supply chain distributors to carry thousands of products. To cluster similar items, we feed the item trajectories into the GHSOM and cluster similar trajectories together. In the experiment section, we train one RNN model by feeding it the trajectories of each cluster, and we also compare the RNN model's performance when fed different trajectories, such as a single item trajectory and all item trajectories. To thoroughly compare our RNN models, our experiments also separate the stable trajectory clusters, in which items usually stay in the same cluster or have similar moving patterns, from the unstable trajectory clusters, which represent highly fluctuating item trajectories. Tab. 5 shows some examples of Company W's item sequences that belong to the same cluster.

3.5 Cluster sequence prediction

To capture the real-world fluctuation of items, we adopt recurrent neural networks (RNNs) to learn and predict the trajectory movement. The movement consists of two parts: the cluster (in which the demand and inventory status of the next transaction falls) and the delay (the number of no-transaction weeks before it happens). To this aim, we re-encode the item trajectory as a sequence of [cluster, delay] pairs:

    Item-X = {[c_0, d_0], [c_1, d_1], ..., [c_n, d_n]},    (7)

where n refers to the number of transaction weeks in the trajectory and, for 0 ≤ i < n, c_i refers to the cluster of the i-th transaction week and d_i refers to the number of padding symbols '0' before c_i in the trajectory. The cluster c_i is encoded as a layer vector [l_1, l_2, ..., l_h].
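The re-encoding of eq. (7) can be sketched as follows; the tuple-based cluster representation is an assumption made for illustration:

    def encode_trajectory(traj: list) -> list:
        """Re-encode a padded trajectory into (cluster, delay) pairs as in
        eq. (7): delay counts the '0' weeks preceding each transaction."""
        pairs, delay = [], 0
        for entry in traj:
            if entry == 0:
                delay += 1
            else:
                pairs.append((entry, delay))
                delay = 0
        return pairs

    # [0, 0, (2, 1), 0, (2, 3)] -> [((2, 1), 2), ((2, 3), 1)]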

Table 5: Item trajectories in the same cluster

Cluster 820, Item S:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, C^17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, C^17, 0, 0, 0, 0, 0, C^17, 0, C^17]

Cluster 820, Item M:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, C^17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, C^17, 0, 0, 0, 0, 0, C^17, 0, C^17]

Cluster 980, Item A:
[0, 0, 0, 0, 0, C^21, 0, 0, 0, 0, C^21, C^19, 0, 0, C^17, 0, C^21, 0, 0, 0, 0, 0, 0, C^21, 0, 0, 0, 0, C^21, 0, 0, 0, C^21, 0, 0, 0, 0, C^17, 0, C^17, 0, 0, 0, C^17, 0, 0, 0, C^17, 0, 0, C^17, 0, 0, C^18, 0, 0, 0, C^18, 0, 0, C^18, 0, 0, 0, C^19, 0, 0, 0, C^19, C^20, 0, 0, C^18, 0, 0, C^18, 0, 0, 0, C^20, 0, 0, 0, C^20, 0, 0, C^19, 0, C^20, 0, C^20, C^19]

Cluster 980, Item A2:
[0, 0, 0, 0, 0, C^21, 0, 0, 0, 0, C^21, C^21, 0, 0, C^21, 0, C^17, 0, 0, 0, 0, 0, 0, C^21, 0, 0, 0, 0, C^21, 0, 0, 0, C^21, 0, 0, 0, 0, C^21, 0, C^21, 0, 0, 0, C^21, 0, 0, 0, C^21, 0, 0, C^21, 0, 0, C^20, 0, 0, 0, C^21, 0, 0, C^21, 0, 0, 0, C^21, 0, 0, 0, C^21, C^21, 0, 0, C^21, 0, 0, C^20, 0, 0, 0, C^21, 0, 0, 0, C^21, 0, 0, C^21, 0, C^21, 0, C^21, C^21]

[Figure 6: The RNN and its input/output sequence (a three-layer GHSOM tree, i.e., h=3)]

We adopt an RNN model as our learning mechanism, an approach that has shown great success in music modeling and speech signal modeling [20]. Fig. 6 shows the RNN structure we employ. A sequence Item-X (eq. (7)) of length n is fed to an RNN model with n neuron nodes. Each neuron S_t takes x_t = [c_t, d_t] to predict x̂_{t+1} at transaction time t, as shown in eq. (8); x̂_{t+1} shall be as close as possible to x_{t+1} = [c_{t+1}, d_{t+1}] during the training process. ReLU is the half-rectifier activation function that transforms an input x into a × x + b if x is positive and 0 otherwise:

    ReLU: f(x) = max(0, a × x + b)
    x̂_{t+1} = ReLU(U × x_t + W × x̂_t)    (8)

W is used to scale the output of S_t before it is fed to the next neuron, while U is used to scale the input x_{t+1} to the next neuron; both are adjusted to fit the correct sequence during the training process.

The key issue is to define a suitable loss function for cluster prediction. Note that for each cluster label, accuracy at a lower layer dominates accuracy at a higher layer: when i < j, an accurate prediction of l_i is more important than an accurate prediction of l_j. For instance, predicting the cluster [1, 2, 2] as [1, 3, 8] is better than predicting it as [2, 2, 2], since the former prediction is a cluster within the same first layer. In this case, directly using the mean square error (MSE) [32] as the loss function would overlook the differences between layers. To address this issue, we propose two new loss functions, the weighted mean square error (wMSE) shown in eq. (9) and the ladder mean square error (lMSE), to enlarge the impact of layer domination:

    W = [w_1, w_2, ..., w_h, w_d]
    wMSE = (1/n) · Σ_{t=0}^{n} [W × (ŷ_t − y_t)]²    (9)

In our wMSE loss function, W is a weight vector aligned with [l^1_n, l^2_n, ..., l^h_n, d_n] that gives different layers different weights. This forces the RNN model to comprehend the idea of layer dissimilarity. Consider ŷ as the RNN-predicted item trajectory {[l̂^1_0, l̂^2_0, l̂^3_0, d̂_0], [l̂^1_1, l̂^2_1, l̂^3_1, d̂_1], ..., [l̂^1_n, l̂^2_n, l̂^3_n, d̂_n]} and y as the actual item trajectory {[l^1_0, l^2_0, l^3_0, d_0], [l^1_1, l^2_1, l^3_1, d_1], ..., [l^1_n, l^2_n, l^3_n, d_n]} from transaction time 0 to n. Eq. (10) shows an example of W multiplied elementwise by (ŷ − y) at time n:

    [w_1 × (l̂^1_n − l^1_n), w_2 × (l̂^2_n − l^2_n), w_3 × (l̂^3_n − l^3_n), w_d × (d̂_n − d_n)]    (10)
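A minimal TensorFlow rendering of eq. (9) might look as follows (the case study uses TensorFlow; the exact tensor shapes and names are our assumptions):

    import tensorflow as tf

    def weighted_mse(y_true, y_pred, weights):
        """wMSE of eq. (9): `weights` is the vector W = [w_1, ..., w_h, w_d],
        so each GHSOM layer (and the delay) contributes to the squared
        error with its own weight. y_true, y_pred: shape (n, h + 1)."""
        w = tf.constant(weights, dtype=y_pred.dtype)
        return tf.reduce_mean(tf.square(w * (y_pred - y_true)))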

However, when adopting wMSE, error terms still interact across layers: an error in l_j still contributes to the loss alongside l_i (given i < j), at a w_j scale. To eliminate the impact of deeper layers on shallower ones, we propose lMSE, which checks the first layer first and then checks the deeper layers in order.

Algorithm 1: Ladder MSE

Input: ground truth y = {[l^1_0, l^2_0, l^3_0, d_0], ..., [l^1_n, l^2_n, l^3_n, d_n]} and prediction ŷ = {[l̂^1_0, l̂^2_0, l̂^3_0, d̂_0], ..., [l̂^1_n, l̂^2_n, l̂^3_n, d̂_n]}
Output: lMSE

 1:  l^1_diff ← [l̂^1_0, ..., l̂^1_n] − [l^1_0, ..., l^1_n]
 2:  l^2_diff ← [l̂^2_0, ..., l̂^2_n] − [l^2_0, ..., l^2_n]
 3:  l^3_diff ← [l̂^3_0, ..., l̂^3_n] − [l^3_0, ..., l^3_n]
 4:  d_diff ← [d̂_0, ..., d̂_n] − [d_0, ..., d_n]
 5:  if l^1_diff ≠ 0 then
 6:      lMSE ← (1/n)·Σ [w^1 × l^1_diff]² + (1/n)·Σ [w^d × d_diff]²
 7:  else if l^2_diff ≠ 0 then
 8:      lMSE ← (1/n)·Σ [w^2 × l^2_diff]² + (1/n)·Σ [w^d × d_diff]²
 9:  else if l^3_diff ≠ 0 then
10:      lMSE ← (1/n)·Σ [w^3 × l^3_diff]² + (1/n)·Σ [w^d × d_diff]²
11:  else
12:      lMSE ← (1/n)·Σ [w^d × d_diff]²
13:  end if
14:  return lMSE
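For clarity, here is a NumPy sketch of Algorithm 1, generalized from the three-layer tree to h layers; the array shapes are our assumption:

    import numpy as np

    def ladder_mse(y_true: np.ndarray, y_pred: np.ndarray, w: np.ndarray) -> float:
        """y_true, y_pred: shape (n, h + 1) with columns [l_1, ..., l_h, d];
        w = [w_1, ..., w_h, w_d]. Only the shallowest layer that still
        contains an error contributes, plus the delay term, so deeper
        layers cannot leak into the loss."""
        h = y_true.shape[1] - 1
        d_diff = y_pred[:, -1] - y_true[:, -1]
        d_term = np.mean((w[-1] * d_diff) ** 2)
        for layer in range(h):                      # check layer 1 first
            diff = y_pred[:, layer] - y_true[:, layer]
            if np.any(diff != 0):
                return float(np.mean((w[layer] * diff) ** 2) + d_term)
        return float(d_term)                        # all layers already match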

3.6 Model Accuracy and Performance

To evaluate the accuracy of the proposed model, we adopt two methods to indicate our RNN model's performance. One is to count whether the prediction equals the ground truth across all layers. The other is to calculate the traversal distance, the ladder distance, between the predicted cluster and the correct cluster. Here the ladder distance is defined as the total number of steps needed to move from one cluster to the other. For instance, the distance between [2, 3, 1] and [2, 3, 5] is two steps, since it takes one step from [2, 3, 1] to its parent node [2, 3] and one step from [2, 3] to its child [2, 3, 5]. The distance between [2, 3, 1] and [2, 1, 6] is 4, as shown in Fig. 7: [2, 3, 1] has to move two steps up to the parent node [2] and then two steps back down to its child [2, 1, 6]. To estimate the delay correctness, we use Δd, defined as the mean absolute distance between the ground-truth and predicted delays, to represent our model's accuracy in predicting the delay.

In the case study, we show that using wMSE and lMSE as our loss functions helps the RNN apprehend the layer differences and significantly improves the prediction accuracy.
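Since two clusters are connected through their deepest common ancestor, the ladder distance reduces to a common-prefix computation; a minimal sketch:

    def ladder_distance(a: tuple, b: tuple) -> int:
        """Steps from cluster `a` up to the deepest common ancestor and
        back down to cluster `b` in the GHSOM tree, e.g.
        ladder_distance((2, 3, 1), (2, 3, 5)) == 2 and
        ladder_distance((2, 3, 1), (2, 1, 6)) == 4."""
        k = 0
        for x, y in zip(a, b):
            if x != y:
                break
            k += 1                      # length of the common prefix
        return (len(a) - k) + (len(b) - k)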

[Figure 7: Cluster distance]

4 Case Study

In this section, we conduct a case study with a real-world electronics distributor, Company W, and report the outcomes and findings of applying the presented cluster trajectory and prediction approach to track process variability. Based in Taipei, Taiwan, Company W is one of the world's largest electronics component distributors, with more than 30 branches worldwide. Company W acts as a franchise partner to more than 60 international electronics component suppliers, including Intel, Texas Instruments, Philips, Hynix, Vishay, and OmniVision. The company plays a buffer role for the entire supply chain by coordinating order quantities and production schedules with its downstream companies (OEMs, ODMs) and upstream manufacturing partners. For instance, as small and medium manufacturers generally order small quantities, it is hard for them to negotiate good prices with vendors; Company W therefore assists by aggregating orders from small companies to obtain a better quantity price. Company W can also help small companies control their inventory pooling. Currently, product managers in Company W depend solely on their experience and intuition to make ordering decisions, since the company lacks clear rules or methods for educating its product managers, and it faces tremendous challenges in recognizing patterns of upstream and downstream demand fluctuations.

4.1 System Settings

All experiments were carried out on data collected from real transactions of Company W, covering two years of historical data with more than eight thousand transaction records of three thousand items. Each item has samples for up to 92 weeks. We conduct three experiment settings in this case study, as shown in Tab. 6.

To evaluate model accuracy, we slice the cluster sequences into training and testing sequences. The training sequences include two portions that are one step apart from each other: the first portion is the item sequence [c_0, d_0] to [c_{n−2}, d_{n−2}], and the second portion is [c_1, d_1] to [c_{n−1}, d_{n−1}]. The first portion is used to train the RNN model, and the second portion is used to generate the prediction x̂_n, which is checked for accuracy against [c_n, d_n] in the real data.
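One reading of this slicing scheme as code (the exact list layout is our assumption):

    def split_sequence(seq: list):
        """seq = [(c_0, d_0), ..., (c_n, d_n)]. The two training portions
        sit one step apart; the held-out last pair is the test target."""
        train_input  = seq[:-2]    # (c_0, d_0) ... (c_{n-2}, d_{n-2})
        train_target = seq[1:-1]   # (c_1, d_1) ... (c_{n-1}, d_{n-1})
        test_input, test_target = seq[1:-1], seq[-1]
        return train_input, train_target, test_input, test_target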

Table 6: Experiment input sequences

  RNN_all     - We train one RNN model for all the items; in this setting, we consider that all 3000+ items can be predicted with one RNN model.
  RNN_cluster - We cluster items based on their trajectory features and train one RNN model per cluster; in this setting, we employ the GHSOM to cluster the 3000+ items into 104 clusters and train 104 RNN models with the item trajectories of each cluster.
  RNN_item    - We train one RNN model for each item; that is, we generate 3000+ RNN models, one per item trajectory.

The measurement of model accuracy is the ratio at which the prediction exactly matches the ground truth, by layers and by delay. We also evaluate model performance by calculating the traversal distances and the mean no-transaction-weeks difference Δd (defined in Sec. 3.6), which captures the error when predicting the number of no-transaction weeks before the next transaction. We implement BasicRNN in TensorFlow as our model, set the training stop condition to 15000 iterations or an accepted loss lower than 0.0001, and employ the AdamOptimizer [33] as the optimization approach. We compare using the proposed weighted mean square error (wMSE) and ladder mean square error (lMSE) as loss functions against the plain MSE. The experiment results are shown in Tab. 7. We also test whether our model overfits the training data; Tab. 8 shows the training results. In all settings, we achieve better accuracy ratios using wMSE and lMSE as the loss function, compared to using MSE, when training the RNN models to predict the trajectory movement. However, there is no significant difference between adopting wMSE and lMSE in predicting the exact ground truth. We therefore also calculate the mean ladder distance of these models; Tab. 9 shows the mean ladder distance when adopting the different loss functions. As we can see, lMSE better predicts similar clusters, with a shorter ladder distance.

Table 7: Accuracy comparison on wMSE, lMSE and MSE in all settings

  Sequence     Loss   l1     l1,l2   l1,l2,l3   Δd
  RNN_all      MSE    0.431  0.233   0.108      19.86
               wMSE   0.752  0.49    0.248      13.82
               lMSE   0.762  0.399   0.151      10.11
  RNN_cluster  MSE    0.481  0.262   0.136      13.85
               wMSE   0.718  0.532   0.37       4.88
               lMSE   0.817  0.688   0.469      2.34
  RNN_item     MSE    0.554  0.498   0.49       14.23
               wMSE   0.948  0.773   0.553      5.52
               lMSE   0.972  0.78    0.57       2.84

Table 8: Training data accuracy on wMSE, lMSE and MSE in all settings

  Sequence     Loss   l1     l1,l2   l1,l2,l3   Δd
  RNN_all      MSE    0.62   0.38    0.21       8.2
               wMSE   0.9    0.59    0.28       5.19
               lMSE   0.91   0.63    0.32       4.21
  RNN_cluster  MSE    0.88   0.69    0.31       6.34
               wMSE   0.97   0.71    0.43       1.79
               lMSE   0.98   0.97    0.88       1.04
  RNN_item     MSE    0.92   0.78    0.36       6.48
               wMSE   0.98   0.83    0.62       3.32
               lMSE   0.99   0.97    0.91       1.94

Table 9: Traversal distance comparison on MSE, wMSE and lMSE in all settings

  Sequence     Loss   Mean ladder distance
  RNN_all      MSE    6.32
               wMSE   3.77
               lMSE   3.58
  RNN_cluster  MSE    5.17
               wMSE   3.74
               lMSE   2.31
  RNN_item     MSE    4.84
               wMSE   2.63
               lMSE   1.16

4.2 Comparison on lMSE and wMSE in fluctuating and stable items

To further compare the difference between wMSE and lMSE, we select clusters with stable trajectory movements (std lower than 25%) and clusters with fluctuating trajectories (std higher than 75%), and we employ the different RNN models on these data. In this setting, we have 26 unstable trajectory clusters with more than 600 items and 26 stable trajectory clusters with more than twelve thousand items. Tab. 10 shows the results on traversal distance, and Tab. 11 shows the prediction accuracy across the different layers. As we can see in the experiment, lMSE significantly outperforms wMSE in the high process variability scenario, while the wMSE and lMSE predictions are almost the same in the low variability scenario.

4.3 Trajectory prediction

Once the RNN model has been trained, it can be used to predict future trajectory movement. We feed [c_0, d_0] to [c_{n−2}, d_{n−2}] and [c_1, d_1] to [c_{n−1}, d_{n−1}] to train our model. Then, we feed [c_2, d_2] to [c_n, d_n] to predict [c_{n+1}, d_{n+1}].

Table 10: Traversal distance comparison on high and low trajectory clusters

  Sequence     Loss   Mean ladder distance (high)   Mean ladder distance (low)
  RNN_all      wMSE   4.68                          3.56
               lMSE   4.41                          3.34
  RNN_cluster  wMSE   3.87                          3.32
               lMSE   3.34                          3.28
  RNN_item     wMSE   3.58                          3.16
               lMSE   3.08                          2.96

Table 11: Accuracy comparison on high and low trajectory clusters

  Sequence     Variability  Loss   l1     l1,l2   l1,l2,l3   Δd
  RNN_all      High         wMSE   0.37   0.16    0.07       25.31
                            lMSE   0.49   0.24    0.17       14.88
               Low          wMSE   0.77   0.44    0.24       6.82
                            lMSE   0.86   0.59    0.48       5.49
  RNN_cluster  High         wMSE   0.45   0.24    0.18       5.37
                            lMSE   0.63   0.49    0.26       3.84
               Low          wMSE   0.73   0.51    0.38       1.76
                            lMSE   0.82   0.61    0.43       1.01
  RNN_item     High         wMSE   0.66   0.43    0.2        6.17
                            lMSE   0.76   0.49    0.28       3.56
               Low          wMSE   0.99   0.85    0.59       3.27
                            lMSE   0.99   0.93    0.84       1.98

After our model generates a new prediction, we feed it back to create a two-step-ahead prediction, and we can keep feeding the predicted values back into the RNN model. Fig. 8 shows the movement of the item-A trajectory with a five-step-ahead prediction. The five predicted steps [p1, p2, p3, p4, p5] unfold into the padded sequence [0, [2, 3, 4], [2, 3, 2], 0, [2, 3, 3], [2, 3, 1], 0, [2, 3, 3]], where each 0 marks a predicted no-transaction week. The prediction of item movement provides managers with insights into SCM process variability at the item level and facilitates dynamic adjustment strategies for item-level demand and inventory management.

[Figure 8: Item-A 5-step trajectory prediction]
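The feedback loop can be sketched as follows; model.predict_next is a hypothetical stand-in for the trained RNN's one-step prediction:

    def rollout(model, seq: list, steps: int = 5) -> list:
        """Multi-step forecast as in Fig. 8: each predicted [cluster, delay]
        pair is appended to the history and fed back for the next step."""
        history = list(seq)
        predictions = []
        for _ in range(steps):
            nxt = model.predict_next(history)   # one step ahead
            predictions.append(nxt)
            history.append(nxt)                 # feed the prediction back
        return predictions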

5 Conclusion

We propose a novel prediction model that combines unsupervised clustering and sequence prediction techniques to address the problem of SCM process variability forecasting with real-world transaction data. We focus on developing an effective cluster forecasting model for a company with many upstream and downstream partners. We treat the GHSOM results as the inputs of our RNN model, which differs from many previous studies that use the GHSOM only to preprocess data, and we propose an effective encoding and loss function for the RNN model to improve its accuracy. The prediction of item-level demand and inventory trajectories can help managers face irregular demand patterns and adjust their operation strategies dynamically.

Acknowledgement

This project is funded in part by MOST 106-2221-E-004-002- and MOST 105-2923-E-002-016-MY3.

References

[1] H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
[2] D. McFadden et al., "Conditional logit analysis of qualitative choice behavior," 1973.
[3] M. Ben-Akiva and S. R. Lerman, Discrete Choice Analysis. MIT Press, Cambridge, MA, 1985.
[4] D. M. Beyer, F. Safai, and F. AitSalia, "Profile-based product demand forecasting," Dec. 20, 2005, US Patent 6,978,249.
[5] N. Singh, S. J. Olasky, K. S. Cluff, and W. F. Welch Jr, "Supply chain demand forecasting and planning," Jul. 18, 2006, US Patent 7,080,026.
[6] T. Efendigil, S. Önüt, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697–6707, 2009.
[7] R. Carbonneau, K. Laframboise, and R. Vahidov, "Application of machine learning techniques for supply chain demand forecasting," European Journal of Operational Research, vol. 184, no. 3, pp. 1140–1154, 2008.
[8] F. Murtagh and P. Legendre, "Ward's hierarchical agglomerative clustering method: Which algorithms implement Ward's criterion?" Journal of Classification, vol. 31, no. 3, pp. 274–295, Oct. 2014. [Online]. Available: https://doi.org/10.1007/s00357-014-9161-z
[9] J. Vesanto and E. Alhoniemi, "Clustering of the self-organizing map," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 586–600, 2000.
[10] A. Rauber, D. Merkl, and M. Dittenbach, "The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data," IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. 1331–1341, 2002.
[11] C.-H. Chiu, J.-J. Chen, and F. Yu, "An effective distributed GHSOM algorithm for unsupervised clustering on big data," in Big Data (BigData Congress), 2017 IEEE International Congress on. IEEE, 2017, pp. 297–304.
[12] M. J. Brusco, R. Singh, J. D. Cradit, and D. Steinley, "Cluster analysis in empirical OM research: survey and recommendations," International Journal of Operations & Production Management, vol. 37, no. 3, pp. 300–320, 2017.
[13] N. Mladenović and P. Hansen, "Variable neighborhood search," Computers & Operations Research, vol. 24, no. 11, pp. 1097–1100, 1997.
[14] F. E. H. Tay and L. J. Cao, "Improved financial time series forecasting by combining support vector machines with self-organizing feature map," Intelligent Data Analysis, vol. 5, no. 4, pp. 339–354, 2001.
[15] C. Hung and C.-F. Tsai, "Market segmentation based on hierarchical self-organizing map for markets of multimedia on demand," Expert Systems with Applications, vol. 34, no. 1, pp. 780–787, 2008.
[16] C.-J. Lu and Y.-W. Wang, "Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting," International Journal of Production Economics, vol. 128, no. 2, pp. 603–613, 2010.
[17] S.-Y. Huang, R.-H. Tsaih, and F. Yu, "Topological pattern discovery and feature extraction for fraudulent financial reporting," Expert Systems with Applications, vol. 41, no. 9, pp. 4360–4372, 2014.
[18] Z.-R. Fang, S.-W. Huang, and F. Yu, "AppReco: Behavior-aware recommendation for iOS mobile applications," in Web Services (ICWS), 2016 IEEE International Conference on. IEEE, 2016, pp. 492–499.
[19] M. Sundermeyer, R. Schlüter, and H. Ney, "LSTM neural networks for language modeling," in Thirteenth Annual Conference of the International Speech Communication Association, 2012.
[20] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.
[21] N. S. Chaudhari and X.-M. Yuan, "Demand forecasting of short life span products: Issues, challenges, and use of soft computing techniques," in Handbook of Computational Intelligence in Manufacturing and Production Management. IGI Global, 2008, pp. 124–143.
[22] C. L. Giles, S. Lawrence, and A. C. Tsoi, "Noisy time series prediction using recurrent neural networks and grammatical inference," Machine Learning, vol. 44, no. 1-2, pp. 161–183, 2001.
[23] G. A. Darbellay and M. Slama, "Forecasting the short-term demand for electricity: Do neural networks stand a better chance?" International Journal of Forecasting, vol. 16, no. 1, pp. 71–83, 2000.
[24] K. Mupparaju, A. Soni, P. Gujela, and M. A. Lanham, "A comparative study of machine learning frameworks for demand forecasting."
[25] A. Rauber, E. Pampalk, and D. Merkl, "Using psycho-acoustic models and self-organizing maps to create a hierarchical structuring of music by sound similarity," 2002.
[26] S.-H. Hsu, J. P.-A. Hsieh, T.-C. Chih, and K.-C. Hsu, "A two-stage architecture for stock price forecasting by integrating self-organizing map and support vector regression," Expert Systems with Applications, vol. 36, no. 4, pp. 7947–7951, 2009.
[27] S. Liu, L. Lu, G. Liao, and J. Xuan, "Pattern discovery from time series using growing hierarchical self-organizing map," in International Conference on Neural Information Processing. Springer, 2006, pp. 1030–1037.
[28] M. A. Hernández and S. J. Stolfo, "Real-world data is dirty: Data cleansing and the merge/purge problem," Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9–37, 1998.
[29] M. C. Wilson, "The impact of transportation disruptions on supply chain performance," Transportation Research Part E: Logistics and Transportation Review, vol. 43, no. 4, pp. 295–320, 2007.
[30] S. M. Homayouni, T. S. Hong, and N. Ismail, "Development of genetic fuzzy logic controllers for complex production systems," Computers & Industrial Engineering, vol. 57, no. 4, pp. 1247–1257, 2009.
[31] P. J. Huber, "Robust statistics," in International Encyclopedia of Statistical Science. Springer, 2011, pp. 1248–1251.
[32] L. B. Sheiner and S. L. Beal, "Some suggestions for measuring predictive performance," Journal of Pharmacokinetics and Biopharmaceutics, vol. 9, no. 4, pp. 503–512, 1981.
[33] A. Tato and R. Nkambou, "Improving Adam optimizer," 2018.
