
eight numerical attributes, which are re-scaled into a [0, 1] range:

$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}} \qquad (1)$$

In the case study section, we encounter missing values in some numerical attributes of Company W's data. We substitute −1 for missing entries and feed them, along with the attributes scaled to [0, 1], to the GHSOM clustering algorithm. Because this value lies outside the scaled range, the clustering process can accommodate it, and it is reflected in the emerging clusters.
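The scaling of eq. (1) and the missing-value substitution can be sketched with NumPy as below. This is a minimal sketch, assuming the attributes are held in a 2-D array (one row per sample, one column per attribute) with NaN marking missing entries, and that the minimum and maximum are taken over the observed values only; `scale_with_missing` is a hypothetical helper name.

```python
import numpy as np

def scale_with_missing(X, missing_value=-1.0):
    """Min-max scale each column to [0, 1] (eq. 1); missing entries (NaN) become missing_value.

    Assumption: X_min and X_max are computed per attribute over the observed values only.
    """
    X = np.asarray(X, dtype=float)
    x_min = np.nanmin(X, axis=0)                          # per-attribute minimum, ignoring NaN
    x_max = np.nanmax(X, axis=0)                          # per-attribute maximum, ignoring NaN
    span = np.where(x_max > x_min, x_max - x_min, 1.0)    # constant columns map to 0 instead of dividing by 0
    scaled = (X - x_min) / span                           # NaN stays NaN here
    return np.where(np.isnan(scaled), missing_value, scaled)

X = np.array([[2.0, 10.0],
              [4.0, np.nan],
              [6.0, 30.0]])
print(scale_with_missing(X))
```

Here the second attribute of the second sample is missing and becomes −1, while all observed values fall in [0, 1]; the negative value is therefore easy for the clustering step to separate from genuine observations.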

3.3 Unsupervised clustering with GHSOM

The Growing Hierarchical Self-Organizing Map (GHSOM) [10] has an architecture that can restructure SOMs in two ways: hierarchically, according to the data distribution, which allows the data to be decomposed into sub-parts, and horizontally, meaning that each map adapts its size to the requirements of the input space. The advantage of GHSOM is that it determines the depth and size of the maps during the training process rather than requiring them to be fixed in advance. The growth process of the GHSOM is guided by two parameters: $\tau_1$, which controls the same-layer SOM map similarity, and $\tau_2$, which determines the depth of the GHSOM structure. The input x in eq. (2) is a high-dimensional matrix of n vectors v1, ..., vn, each consisting of the i selected attributes a1, ..., ai.

$$v_n = [a_1, a_2, \ldots, a_i], \quad i, n \in \mathbb{N}$$

$$x = [v_1, v_2, \ldots, v_n] \qquad (2)$$

The starting point for the growth process is the overall input data, as measured by the single-unit SOM at layer 0. This starting unit is assigned a weight vector $m_0 = [\mu_{01}, \mu_{02}, \ldots, \mu_{0i}]$, computed as the average of each input attribute over all input vectors. The deviation of the input data, i.e. the mean quantization error $mqe_0$ of this single unit, is computed as given in eq. (3), where n is the number of input vectors.

$$mqe_0 = \frac{1}{n} \sum_{j=1}^{n} \lVert m_0 - v_j \rVert \qquad (3)$$
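The layer-0 computation of $m_0$ and eq. (3) amounts to a column mean followed by an average distance. The sketch below is a minimal illustration, assuming the input vectors of eq. (2) are stored as the rows of an (n, i) array and that $\lVert\cdot\rVert$ is the Euclidean norm; `layer0_unit` is a hypothetical name.

```python
import numpy as np

def layer0_unit(x):
    """Return the layer-0 weight vector m0 and its mean quantization error mqe0 (eq. 3).

    x: array of shape (n, i); each row is one input vector v_j with i attributes.
    Assumption: the distance is Euclidean.
    """
    x = np.asarray(x, dtype=float)
    m0 = x.mean(axis=0)                          # m0 = [mu_01, ..., mu_0i], attribute-wise mean
    distances = np.linalg.norm(x - m0, axis=1)   # ||m0 - v_j|| for every input vector
    mqe0 = distances.mean()                      # (1/n) * sum_j ||m0 - v_j||
    return m0, mqe0

x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
m0, mqe0 = layer0_unit(x)
print(m0, mqe0)   # m0 = [0.5, 0.5]; every point is sqrt(0.5) away, so mqe0 = sqrt(0.5)
```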

After computing $mqe_0$, training of the GHSOM starts with its first-layer SOM, which consists of a rather small number of units, e.g. a grid of 2 × 2 units. To adapt the size of this first-layer SOM to the data distribution, the mean quantization error of the map is computed as given in eq. (4). In this formula, u refers to the number of units i contained in the SOM m. Analogous to eq. (3), $mqe_i$ is computed as the average distance between the weight vector $m_i$ and the input patterns mapped onto unit i.

$$MQE_m = \frac{1}{u} \sum_{i} mqe_i \qquad (4)$$
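Eq. (4) requires first mapping each input onto its best-matching unit and then averaging per unit. The following is a minimal sketch, assuming Euclidean distances, inputs as rows of an (n, i) array, and unit weight vectors as rows of a (u, i) array. The handling of units that receive no inputs (their $mqe_i$ is set to 0 and still counted in u) is an assumption, since the text does not specify it; `map_mqe` is a hypothetical name.

```python
import numpy as np

def map_mqe(weights, x):
    """Compute per-unit mqe_i and the map-level MQE_m of eq. (4).

    weights: (u, i) array, one weight vector m_i per unit of SOM m.
    x:       (n, i) array of input vectors.
    """
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(x, dtype=float)
    # Euclidean distance between every input and every unit: shape (n, u).
    dist = np.linalg.norm(x[:, None, :] - weights[None, :, :], axis=2)
    bmu = dist.argmin(axis=1)                # best-matching unit of each input
    mqe = np.zeros(len(weights))
    for unit in range(len(weights)):
        mapped = dist[bmu == unit, unit]     # distances of the inputs mapped onto this unit
        if mapped.size:                      # assumption: empty units keep mqe_i = 0
            mqe[unit] = mapped.mean()
    return mqe, mqe.mean()                   # MQE_m = (1/u) * sum_i mqe_i

weights = np.array([[0.0, 0.0], [1.0, 1.0]])
x = np.array([[0.0, 0.0], [0.0, 0.2], [1.0, 1.0], [1.0, 0.6]])
mqe, MQE = map_mqe(weights, x)
print(mqe, MQE)   # per-unit errors 0.1 and 0.2, map error 0.15
```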

The basic concept is that each layer of the GHSOM is responsible for explaining the deviation of the input data present in its preceding layer. This is done by adding units to expand the initial SOM on each layer until it reaches a suitable size. More specifically, the SOM on each layer is allowed to grow until the deviation of the corresponding unit in its preceding layer is reduced to at least a fixed percentage $\tau_1$. Thus, as long as $MQE_m \geq \tau_1 \cdot mqe_0$ holds for the first-layer map m, a new row or a new column of units is added to this SOM. The new units are inserted next to the unit e with the highest mean quantization error $mqe_e$, called the error unit. Whether a new row or a new column is added depends on the location of the neighboring unit d that is most dissimilar to the error unit. To speed up these steps, we formulate them as matrix operations, which can be accelerated on multi-threaded GPUs. We use two matrices: one records the weight vectors of the units and the other records their locations in the map topology. First, we obtain the location of the error unit as the index of the unit with the highest $mqe_e$. Next, we find the neighboring units by subtracting the error unit's location from every entry of the topology matrix and keeping the units whose absolute differences sum to 1 (i.e., a distance of one step). Fig. 2 shows how we find the locations of the neighboring units. We then determine, among these neighbors, the unit whose weight vector is most dissimilar to that of the error unit.

National Chengchi University

Figure 2: Defining neighbor units

We insert a new row or a new column depending on the position of this most dissimilar neighbor. As soon as the expansion of the first-layer map is finished, i.e. $MQE_m < \tau_1 \cdot mqe_0$, the units of this map are tested for expansion on the second layer. In particular, units that have a large mean quantization error each add a new SOM to the second layer of the GHSOM. The parameter $\tau_2$ expresses the desired level of granularity in input data discrimination in the final maps. More precisely, each unit i fulfilling the criterion given in eq. (5) is expanded hierarchically.

$$mqe_i > \tau_2 \cdot mqe_0 \qquad (5)$$
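The matrix-based search for the error unit and its most dissimilar neighbor, together with the two growth criteria, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes the topology matrix stores one (row, col) grid position per unit, measures weight dissimilarity with the Euclidean distance, and follows the common GHSOM convention that a neighbor in the same row leads to a new column (and a neighbor in the same column to a new row) inserted between the two units. It only makes the decision; creating and initializing the new units is not shown. All function names are hypothetical.

```python
import numpy as np

def growth_decision(weights, locations, mqe):
    """Locate the error unit e, its most dissimilar neighbor d, and decide row vs. column.

    weights:   (u, i) array of unit weight vectors.
    locations: (u, 2) integer array of (row, col) grid positions, same unit order.
    mqe:       (u,) array of per-unit mean quantization errors.
    Assumes the map has more than one unit, so e has at least one neighbor.
    """
    e = int(np.argmax(mqe))                              # error unit: highest mqe_e
    # Manhattan distance of every unit to e; direct neighbors are exactly one step away.
    steps = np.abs(locations - locations[e]).sum(axis=1)
    neighbors = np.flatnonzero(steps == 1)
    # Most dissimilar neighbor d: largest Euclidean distance between weight vectors.
    dissim = np.linalg.norm(weights[neighbors] - weights[e], axis=1)
    d = int(neighbors[np.argmax(dissim)])
    # Same row means e and d sit side by side, so a column goes between them; otherwise a row.
    kind = "column" if locations[d, 0] == locations[e, 0] else "row"
    return e, d, kind

def needs_horizontal_growth(MQE_m, mqe_parent, tau1):
    # Keep adding rows/columns while MQE_m >= tau1 * mqe_parent (mqe_parent = mqe0 on layer 1).
    return MQE_m >= tau1 * mqe_parent

def needs_hierarchical_expansion(mqe_i, mqe0, tau2):
    # Eq. (5): a unit spawns a child SOM on the next layer if mqe_i > tau2 * mqe0.
    return mqe_i > tau2 * mqe0

# 2 x 2 initial map: units 0..3 at (0,0), (0,1), (1,0), (1,1).
locations = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
weights = np.array([[0.0, 0.0], [0.1, 0.0], [0.9, 0.9], [0.2, 0.1]])
mqe = np.array([0.5, 0.1, 0.2, 0.1])
print(growth_decision(weights, locations, mqe))   # unit 0 is the error unit; unit 2 (below it) is most dissimilar
```

Because the neighbor search is a single broadcast subtraction and reduction over the topology matrix, the same code runs unchanged on GPU array libraries that mirror the NumPy API.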

Once the $\tau_2$ condition is satisfied, the unit insertion procedure continues with the newly established SOM, which takes as input the data mapped onto the corresponding unit of the preceding SOM. We check the $\tau_1$ and $\tau_2$ criteria until no unit requires further expansion, at which point training yields a hierarchical SOM tree structure. The result of the GHSOM is a tree structure, as shown in Fig. 3.

Figure 3: Cluster tree

We further label each cluster according to its position in each layer. Considering a GHSOM structure with depth h, the label of a cluster Ck is represented as an h-dimensional vector:

$$C_k = [l_1, l_2, \ldots, l_h] \qquad (6)$$

where $l_i$ indicates the position of $C_k$ in the $i$th layer. Fig. 4 shows an example of labeling clusters in a three-layer GHSOM structure.

Figure 4: GHSOM labeling method

Each cluster represents a set of samples that have similar values in the demand and inventory attributes, where each sample is associated with an item (a product in the supply chain) in a given week (a week in which the product has transactions). That is, a product (an item) has a weekly sample in the historical records if the product has transactions in that week. For each item, we generate an item trajectory as the sequence of clusters that its samples fall in, ordered by date. For weeks without transactions, we use '0' as a padding symbol that is distinguishable from all clusters. Here we show item-A, one of Company W's products, which is a typical item with highly irregular demand.


A typical sample of an item trajectory is [0, 0, 0, 0, 0, C21, 0, 0, 0, 0, C21, C19, 0, 0, C17, 0, C21, 0, 0, 0, 0, 0, 0, C21, 0, 0, 0, 0, C21, 0, 0, 0, C21, 0, 0, 0, 0, C17, 0, C17, 0, 0, 0, C17, 0, 0, 0, C17, 0, 0, C17, 0, 0, C18, 0, 0, 0, C18, 0, 0, C18, 0, 0, 0, C19, 0, 0, 0, C19, C20, 0, 0, C19, 0, 0, C19, 0, 0, 0, C20, 0, 0, 0, C20, 0, 0, C19, 0, C20, 0, C20, C19], where this item has 27 transaction records in the past 92 weeks. For each week in which the item has transactions (e.g., week 6), the item's demand and inventory status is characterized by the corresponding cluster (e.g., C21). The trajectory thus reflects the process variability of the item.
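As a sanity check, the item-A trajectory above can be parsed directly to confirm the stated counts and to see how its transactions are distributed over clusters. The sketch also lists, for each transaction, how many empty (padding) weeks precede it, which is the quantity Section 3.5 uses as the delay. Weeks are assumed to be numbered from 1, so that week 6 is the first C21 entry; the variable names are illustrative only.

```python
from collections import Counter

# Item-A trajectory as listed above (0 = no transaction in that week).
item_a = [0, 0, 0, 0, 0, "C21", 0, 0, 0, 0, "C21", "C19",
          0, 0, "C17", 0, "C21", 0, 0, 0, 0, 0, 0, "C21",
          0, 0, 0, 0, "C21", 0, 0, 0, "C21", 0, 0, 0, 0, "C17",
          0, "C17", 0, 0, 0, "C17", 0, 0, 0, "C17", 0, 0, "C17",
          0, 0, "C18", 0, 0, 0, "C18", 0, 0, "C18",
          0, 0, 0, "C19", 0, 0, 0, "C19", "C20", 0, 0, "C19", 0, 0, "C19",
          0, 0, 0, "C20", 0, 0, 0, "C20", 0, 0, "C19", 0, "C20", 0, "C20", "C19"]

weeks = len(item_a)
transactions = [(week, c) for week, c in enumerate(item_a, start=1) if c != 0]
cluster_counts = Counter(c for _, c in transactions)

# Padding weeks between consecutive transactions (before the first one: weeks since the start).
gaps, previous = [], 0
for week, _ in transactions:
    gaps.append(week - previous - 1)
    previous = week

print(weeks, len(transactions))   # → 92 27
print(transactions[0])            # → (6, 'C21')
print(gaps[:3])                   # → [5, 4, 0]
```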

One can project the cluster movement onto the AWU-Avail-Ratio & Backlog-Avail-Ratio map, as shown in Fig. 5, which represents the movement of item-A over time. By observing the fluctuation pattern of these movements along the AWU-Avail-Ratio (x-axis) and the Backlog-Avail-Ratio (y-axis), company managers can gain insight into the item's behavior.

Figure 5: A typical sample of item trajectory

It is common for supply-chain distributors to have thousands of products. To group similar items, we feed the item trajectories into GHSOM and cluster similar trajectories together. In the following experiment section, we train one RNN model on the trajectories of each cluster, and we also compare the RNN model's performance when it is fed different trajectories, such as a single item's trajectory and all item trajectories. To thoroughly compare the differences between our RNN models, we also distinguish in our experiments between stable trajectory clusters, in which items usually stay in the same cluster or follow similar movement patterns, and unstable trajectory clusters, which contain highly fluctuating item trajectories. Tab. 5 shows some examples of Company W's item sequences that belong to the same cluster.

3.5 Cluster sequence prediction

To capture the real-world fluctuation of items, we adopt recurrent neural networks (RNNs) to learn and predict the trajectory movement. The movement consists of two parts: the cluster (into which the demand and inventory status of the next transaction falls) and the delay (the number of weeks without transactions before that transaction occurs). To this end, we re-encode the item trajectory as a sequence of [cluster, delay] pairs:

$$\text{Item-X} = \{[c_0, d_0], [c_1, d_1], \ldots, [c_{n-1}, d_{n-1}]\} \qquad (7)$$

where n refers to the number of transaction weeks in the trajectory and, for $0 \leq i < n$, $c_i$ refers to the cluster of the $i$th transaction week and $d_i$ refers to the number of padding symbols '0' preceding $c_i$ in the trajectory. Each cluster $c_i$ is encoded as its layer vector $[l_1, l_2, \ldots, l_h]$. We adopt an RNN model as our learning mechanism, as RNNs have shown great success in music modeling and speech signal modeling [20]. Fig. 6 shows the RNN structure we employ. A sequence item-X (eq. 7) of length n is fed to an RNN model with n neuron nodes. Each neuron $S_t$ takes $x_t = [c_t, d_t]$ to predict $\hat{x}_{t+1}$ at transaction
