A new ART-counterpropagation neural network for solving a forecasting problem


Tzu-Chiang Liu*, Rong-Kwei Li

Department of Industrial Engineering and Management, National Chiao-Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan, ROC

Abstract

This study presents a novel Adaptive resonance theory-Counterpropagation neural network (ART-CPN) for solving forecasting problems. The network combines the ART concept with the CPN learning algorithm. The vigilance parameter is used to generate the nodes of the cluster layer automatically during CPN learning. This addresses the initial-weight problem and adapts the number of nodes in the cluster layer (Kohonen layer). ART-CPN performs real-time learning and, through self-organization, can develop a more stable and plastic prediction model of the input patterns. The advantages of ART-CPN include the ability to cluster, learn and construct the network model for forecasting problems. The network was applied to two real forecasting problems, where the learning algorithm showed better learning efficiency and good prediction performance.

Keywords: Adaptive resonance theory; Counterpropagation; Neural network

1. Introduction

Many neural networks are used for solving forecasting problems, and forecasting network models are trained with supervised learning algorithms. These models include the backpropagation (BP), radial basis function (RBF) and counterpropagation network (CPN) models (Chun & Kim, 2004; Kim, Jeong, Lee, & Yarlagadda, 2003; Shi, Xu, & Liu, 1999). The counterpropagation network was introduced by Hecht-Nielsen (1988). CPN was designed to provide an efficient learning algorithm for solving the function approximation problem y = f(x) and the forecasting problem (Chang & Chen, 2001). The full CPN works best only when the inverse function $f^{-1}$ exists. The forward-only CPN is designed to approximate y = f(x) when $f^{-1}$ is not needed. The forward-only CPN consists of three layers: input, cluster (Kohonen), and output (Grossberg) layers. The learning of CPN can be split into two stages that combine unsupervised and supervised learning. During the first stage, the input vectors are clustered and the weights of the cluster nodes are determined. During the second stage, the weights from the cluster nodes to the output nodes are adapted to produce the desired response (target output). This supervised learning reduces the errors between the CPN outputs and the desired targets.

Adaptive resonance theory (ART) was developed by Carpenter and Grossberg. ART nets are a well-known family of unsupervised learning algorithms that can automatically find adaptive clusters in the training patterns. The clustering result of an ART net is affected by even small changes in the value of the vigilance parameter. CPN is a supervised neural network based on Kohonen learning vector quantization. The learning vector quantization algorithm depends on an approximately optimal number of codebook vectors assigned to each cluster and on their initial weights (Kohonen, Hynninen, Laaksonen, & Torkkola, 1995). This study proposes a new neural network, called ART-CPN, that uses the vigilance parameter to generate the cluster layer. During training, the learning algorithm uses the vigilance parameter of the input layer ($\rho_X$) to generate new nodes in the cluster layer. The net can thus automatically create the nodes of the cluster layer and their initial weights.

www.elsevier.com/locate/eswa

* Corresponding author. Tel.: 23924505-7626; fax: +886-4-23934620.


Moreover, the weights of each layer are adjusted according to the CPN learning rule.

This study introduces a new neural network for solving forecasting problems using a modified CPN algorithm. ART-CPN was applied to forecast the Box-Jenkins gas furnace data and the lead frame dimension in the etching process for semiconductors. Compared with a conventional neural network, the standard backpropagation network (BPN), ART-CPN showed better learning efficiency and good prediction performance for solving forecasting problems.

The rest of this paper is organized as follows. Section 2 describes the ART and CPN neural network architectures and their learning algorithms. Section 3 then describes the ART-CPN neural network. Subsequently, Section 4 examines the performance of the proposed method by computer simulation on the benchmark Box-Jenkins gas furnace data and on predicting the lead frame dimension. Finally, Section 5 discusses the results and draws conclusions.

2. Adaptive resonance theory and forward-only counterpropagation network

Adaptive resonance theory was developed by Carpenter and Grossberg. ART nets are designed to control the degree of similarity of patterns placed on the same cluster unit. The system is sufficiently stable against noise to enable learning, and sufficiently plastic to learn new input vectors without affecting already learned results. ART networks can develop stable and plastic clustering of arbitrary sequences of input patterns by self-organization. Upon receiving an input pattern, the network attempts to categorize it by first comparing it against the stored weight vectors of the existing categories. If a category with the required matching level (vigilance parameter) is identified, the network enters a so-called resonant state and learns by modifying the weight vectors of that category. ART has since led to an evolving series of real-time neural network models for unsupervised and supervised learning. These models are capable of learning stable recognition categories in response to arbitrary input sequences with either fast or slow learning (Yang, Han, & Kim, 2004). Model families include ART1 (Carpenter & Grossberg, 1987a), which can stably learn to categorize binary input patterns presented in an arbitrary order; ART2 (Carpenter & Grossberg, 1987b), which can learn to categorize either analog or binary input patterns; ART3 (Carpenter & Grossberg, 1990), which can carry out parallel search, or hypothesis testing, of distributed recognition codes in a multilevel network hierarchy; and Fuzzy ART (Carpenter, Grossberg, & David, 1991), which generalizes ART1 to learn stable recognition categories in response to both analog and binary input patterns.

The forward-only counterpropagation network combines a portion of the Kohonen self-organizing map with an output layer. Fig. 1 illustrates the architecture of the CPN, which appears similar to that of the backpropagation net. The net consists of three layers: input layer, cluster layer (Kohonen layer) and output layer (Grossberg layer). The training procedure for the CPN comprises two steps. First, an input vector is presented to the input layer, and the nodes in the cluster layer compete (winner-take-all) for the right to learn it. The weights of the network are adjusted automatically during this learning process. Unsupervised learning is used in this step to separate the input vectors into distinct clusters. Second, the weight vectors between the cluster and output layers are adjusted using supervised learning to reduce the errors between the CPN outputs and the corresponding desired target outputs.

During the first step, the Euclidean distance between the input vector and each weight vector is calculated. The winning node is selected by comparing the input vector $X = (x_1, x_2, \ldots, x_n)^T$ with the weight vectors $v_j = (v_{1j}, v_{2j}, \ldots, v_{nj})^T$ of the cluster nodes. The winner-take-all operation selects the cluster node J whose weight vector $v_J$ is most similar to the input vector, and only the weights of node J are adjusted. The weight vector of the winner is updated according to

$$v_{iJ}^{\text{new}} = v_{iJ}^{\text{old}} + \alpha\,(x_i - v_{iJ}^{\text{old}}) \tag{1}$$

where $\alpha$ denotes the learning rate and $x_i$ is the ith component of the input vector (the value at the ith input node).

After the weights from the input layer to the cluster layer have been trained, the weights from the cluster layer to the output layer are trained. Each training pattern is presented to the input layer, and the associated target vector is presented to the output layer. The competitive signal is a binary variable that takes the value 1 for the winning node and 0 for the other nodes of the cluster layer. Each output node k receives the input signal $w_{Jk}$ and the target output $y_k$. The weights between the winning cluster node and the output layer nodes are updated as follows:

$$w_{Jk}^{\text{new}} = w_{Jk}^{\text{old}} + \beta\,(y_k - w_{Jk}^{\text{old}}) \tag{2}$$

where $w_{Jk}$ denotes the weights from the cluster layer to the output layer and $\beta$ represents the learning rate. The competitive signal $z_j$ of the cluster layer is computed by

$$z_j = \begin{cases} 1 & \text{if } j = J \text{ (the winning node)} \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

and the output of node k is given by

$$\hat{y}_k = \sum_{j=1}^{p} w_{jk}\, z_j \tag{4}$$

where $\hat{y}_k$ is the kth computed output of the CPN.
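The two-stage forward-only CPN of Eqs. (1)–(4) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `train_cpn` and `predict_cpn` are hypothetical, and the initialization of the p cluster weights (copied from evenly spaced training inputs) is an assumption, since the text leaves CPN initialization open, which is precisely the problem ART-CPN addresses.

```python
import math


def euclidean(a, b):
    """Euclidean distance used by the CPN competition."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def winner(x, V):
    """Index J of the cluster node whose weight vector is closest to x."""
    return min(range(len(V)), key=lambda j: euclidean(x, V[j]))


def train_cpn(X, Y, p, alpha=0.1, beta=0.1, epochs=20):
    """Forward-only CPN sketch: Kohonen stage (Eq. 1), then Grossberg stage (Eq. 2)."""
    # Assumption: p initial cluster weights copied from evenly spaced training inputs.
    V = [list(X[i * len(X) // p]) for i in range(p)]
    W = [[0.0] * len(Y[0]) for _ in range(p)]
    # Stage 1 (unsupervised): move the winning weight vector toward x.
    for _ in range(epochs):
        for x in X:
            J = winner(x, V)
            V[J] = [v + alpha * (xi - v) for v, xi in zip(V[J], x)]
    # Stage 2 (supervised): move the winner's outgoing weights toward y.
    for _ in range(epochs):
        for x, y in zip(X, Y):
            J = winner(x, V)
            W[J] = [w + beta * (yk - w) for w, yk in zip(W[J], y)]
    return V, W


def predict_cpn(x, V, W):
    """Eqs. (3)-(4): z_J = 1 for the winner, 0 otherwise; y_hat_k = sum_j w_jk z_j."""
    J = winner(x, V)
    z = [1.0 if j == J else 0.0 for j in range(len(V))]
    return [sum(W[j][k] * z[j] for j in range(len(V))) for k in range(len(W[0]))]
```

Because $z_j$ is one-hot, the prediction is simply the outgoing weight vector of the winning cluster node, which is why the number and placement of the p cluster nodes govern the output accuracy.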

The CPN assigns the input vector to the most similar cluster node and then outputs the prediction result. CPN learns quickly compared with other neural networks because of its efficient learning algorithm for forecasting problems. The CPN can compress the m input patterns into p clusters, where in general p < m, and the number p of adaptive cluster nodes determines the accuracy of the network output. The next section develops a new ART-CPN network that automatically generates the adaptive nodes of the cluster layer and applies it to real forecasting problems.

3. ART-CPN neural network

Adaptive resonance theory nets are designed to allow the user to control the degree of similarity of patterns placed on the same cluster. The adaptive resonance theory algorithm proposed by Grossberg is a special neural network that can cluster the training patterns. Meanwhile, the CNN-ART algorithm (Lin & Yu, 2003) can dynamically generate the nodes of the cluster layer and, in contrast to conventional unsupervised learning algorithms, obtain appropriate initial weight vectors (codebook vectors). ART-CPN combines adaptive resonance theory and the counterpropagation network to develop a new prediction network model. The algorithm redefines the relative similarity between the input vector and the weight vector of a cluster node. The vigilance parameter makes the network automatically generate the nodes of the cluster layer and adaptive initial weights between the input, cluster and output layers. ART-CPN uses the Kohonen and Grossberg learning rules to update the weights of the winning node. The network has fast learning speed and good prediction performance.

3.1. Similarity and architecture

Pattern recognition systems are designed to find similar vectors among the training vectors. From a geometric perspective, the similarity of two vectors (patterns) is measured by a distance metric between them. The commonly used p-norm metric for $X = [x_1, x_2, x_3, \ldots, x_n]^T$ and $Y = [y_1, y_2, y_3, \ldots, y_n]^T$ is defined as follows (Han & Kamber, 2001):

$$\|X - Y\|_p \equiv \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}, \quad 1 \le p \le \infty \tag{5}$$

The following conditions must be satisfied:

$$C_1:\; d(x, y) \ge 0 \tag{6}$$

$$C_2:\; d(x, y) = d(y, x) \tag{7}$$

$$C_3:\; d(x, y) \le d(x, z) + d(z, y) \tag{8}$$

The p-norm is generally used to determine whether two patterns belong to the same class. For p = 1 and p = 2, the distance between two vectors becomes (Han & Kamber, 2001):

$$p = 1:\quad \|X - Y\|_1 = \sum_{i=1}^{n} |x_i - y_i| \quad \text{(Manhattan distance)} \tag{9}$$

$$p = 2:\quad \|X - Y\|_2 = \sqrt{\sum_{i=1}^{n} |x_i - y_i|^2} \quad \text{(Euclidean distance)} \tag{10}$$
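As a small illustration of Eqs. (5), (9) and (10), the p-norm distance can be written in a few lines of Python. The function name `p_norm_distance` is hypothetical; `p=float("inf")` is handled as the limiting case (maximum absolute difference) allowed by $1 \le p \le \infty$.

```python
def p_norm_distance(x, y, p=2.0):
    """Minkowski distance of Eq. (5): p=1 is Manhattan (Eq. 9), p=2 is Euclidean (Eq. 10)."""
    if p == float("inf"):
        # Limit p -> infinity: the largest absolute coordinate difference.
        return max(abs(a - b) for a, b in zip(x, y))
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity for a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Example: for X = (0, 0) and Y = (3, 4), p=1 gives 7.0 and p=2 gives 5.0.
print(p_norm_distance((0, 0), (3, 4), 1), p_norm_distance((0, 0), (3, 4), 2))
```

The restriction $p \ge 1$ matters: for $p < 1$ the expression violates the triangle inequality (condition $C_3$) and is no longer a metric.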

The competitive learning law uses the Euclidean distance to determine the winner: the distance between the input vector and each weight vector is calculated, and the winner is the unit whose weight vector has the smallest Euclidean distance from the input vector. ART-CPN instead uses the mean of range-normalized Manhattan distance components to calculate the similarity between the input and weight vectors. Consider a set of m unlabeled training patterns $X_1, X_2, \ldots, X_m$, each with n attributes. For two of these vectors,

$$X_1 = [x_{11}, x_{12}, x_{13}, \ldots, x_{1n}]^T, \qquad X_2 = [x_{21}, x_{22}, x_{23}, \ldots, x_{2n}]^T,$$

the distance component for the first attribute is

$$D_s(x_{11}, x_{21}) = \frac{|x_{11} - x_{21}|}{\max_m x_{m1} - \min_m x_{m1}} \tag{11}$$

where the maximum and minimum are taken over the first attribute of all m training patterns; the components for the other attributes are defined in the same way. The mean distance between $X_1$ and $X_2$ is

$$D_s(X_1, X_2) = \frac{1}{n} \sum_{i=1}^{n} D_s(x_{1i}, x_{2i}) \tag{12}$$

and the similarity $V_s$ between $X_1$ and $X_2$ is defined as

$$V_s(X_1, X_2) = 1 - D_s(X_1, X_2) \tag{13}$$
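Eqs. (11)–(13) amount to one minus the mean absolute difference after scaling each attribute by its range over the training set. A minimal Python sketch follows; the helper names `attribute_ranges` and `similarity` are hypothetical, and giving a constant (zero-range) attribute zero distance is an assumption the text does not address.

```python
def attribute_ranges(patterns):
    """Per-attribute (min, max) over all m training patterns, as used in Eq. (11)."""
    return [(min(col), max(col)) for col in zip(*patterns)]


def similarity(x1, x2, ranges):
    """Vs(X1, X2) = 1 - Ds(X1, X2), following Eqs. (11)-(13)."""
    total = 0.0
    for a, b, (lo, hi) in zip(x1, x2, ranges):
        span = hi - lo
        # Assumption: a constant attribute (zero range) contributes no distance.
        total += abs(a - b) / span if span > 0 else 0.0
    return 1.0 - total / len(x1)  # Eq. (12) averages over the n attributes


patterns = [[0, 10], [4, 30], [2, 20]]
r = attribute_ranges(patterns)          # [(0, 4), (10, 30)]
print(similarity([0, 10], [2, 20], r))  # each attribute is half its range apart
```

Within the training set, $V_s$ lies in [0, 1]; a new pattern outside the training ranges can make $D_s$ exceed 1 and $V_s$ negative, which still orders the cluster nodes correctly for winner selection.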

The major difference between ART and other unsupervised neural networks is the vigilance parameter ($\rho$). ART defines the similarity between a new pattern and a stored pattern, and compares this similarity with $\rho$ to decide whether the new pattern is assigned to an existing cluster or a new cluster is generated. ART-CPN likewise compares the vector similarity $V_s$ with the vigilance parameter: if $V_s < \rho$, a new node is created in the CPN cluster layer. Fig. 2 shows the architecture of the ART-CPN network. The network uses a new method to generate the adaptive nodes of the cluster layer. The first input vector and target vector are used directly as the weights $v_{ij}$ and $w_{jk}$ of the initial cluster node. The similarity $V_s(X_j)$ is calculated between the input vector $X$ and the weight vector $v_j$ of each cluster node j, and the node with the maximum $V_s(X_j)$ is the winning node, with index J. $\rho_X$ is the vigilance parameter applied to the weight vector $v_J$ of the winner. If $V_s(X_J) < \rho_X$, one node is added to the cluster layer. If instead $V_s(X_J) \ge \rho_X$, the input vector belongs to the cluster of the winning weight vector, and the Kohonen learning rule adjusts the weights $v_{iJ}$ while the Grossberg learning rule adjusts the weights $w_{Jk}$. The network thus uses real-time learning to generate the nodes of the cluster layer dynamically during training.

3.2. The ART-CPN algorithm

The training procedure for the forward-only counterpropagation net consists of two separate steps. ART-CPN, in contrast, trains the weights between the input, cluster and output layers simultaneously. When an input vector $X$ is presented, the similarity $V_s(X_j)$ between $X$ and the weight vector $v_j$ of each cluster node j is calculated. Each training vector is presented to the input layer, and the associated target vector is presented to the output layer. The nodes in the cluster layer compete (winner-take-all) for the right to learn the input vector; the node with the maximum $V_s(X_j)$ is the winner (call its index J).

The winning node sends a signal of 1 to the output layer. Each output node k receives the input signal $w_{Jk}$ and the target value $y_k$. The learning rule updates only the weights of the winning node. The weights from the input layer to the winning cluster node are updated as

$$v_{iJ}^{\text{new}} = v_{iJ}^{\text{old}} + \alpha\,(x_i - v_{iJ}^{\text{old}}) \tag{14}$$

where J denotes the winning node. The weights from the winning cluster node to the output layer are updated as

$$w_{Jk}^{\text{new}} = w_{Jk}^{\text{old}} + \beta\,(y_k - w_{Jk}^{\text{old}}) \tag{15}$$

The competitive signal $z_j$ of the cluster layer is computed by

$$z_j = \begin{cases} 1 & \text{if } j = J \text{ (the winning node)} \\ 0 & \text{otherwise} \end{cases} \tag{16}$$

With this signal, the learning rule for the weights from the cluster nodes to the output nodes can be expressed as a delta rule:

$$w_{jk}^{\text{new}} = w_{jk}^{\text{old}} + \beta\, z_j\,(y_k - w_{jk}^{\text{old}}) \tag{17}$$

so that only the weights of the winning node J change.

The weights from the input nodes to the cluster nodes continue to be trained at a low learning rate, while the learning rate for the weights from the cluster nodes to the output nodes is gradually reduced. The nomenclature used is as follows:

$X = (x_1, \ldots, x_n)$: input vector.

$Y = (y_1, \ldots, y_K)$: target vector.

$V_s(X_j)$: similarity between the input vector and the weight vector of cluster node j.

$\rho_X$: vigilance parameter of the input vector (0.8–0.9).

C: number of learning epochs.

$\alpha, \beta$: learning rates (0.04–0.1).

$v_{iJ}, w_{Jk}$: weights of the winning cluster node J.

Table 1 lists pseudo-code for the ART-CPN algorithm. The vigilance parameter determines the number of cluster nodes: the higher the vigilance, the more cluster nodes are generated. After training, the weights of the cluster nodes are distributed in a statistically optimal manner that improves accuracy. ART-CPN learns very quickly because of its one-step learning process and efficient learning algorithm.
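The procedure in Table 1 can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the stopping condition is a fixed number of epochs C, and the multiplicative `decay` used in Step 7 is an assumed schedule because the table only says "reduce learning rate".

```python
def attribute_ranges(patterns):
    """Per-attribute (min, max) over all training patterns, used by Eq. (11)."""
    return [(min(col), max(col)) for col in zip(*patterns)]


def similarity(x1, x2, ranges):
    """Vs = 1 - mean range-normalized absolute difference, Eqs. (11)-(13)."""
    total = 0.0
    for a, b, (lo, hi) in zip(x1, x2, ranges):
        span = hi - lo
        # Assumption: a constant attribute (zero range) adds no distance.
        total += abs(a - b) / span if span > 0 else 0.0
    return 1.0 - total / len(x1)


def train_art_cpn(X, Y, rho=0.9, alpha=0.1, beta=0.1, epochs=6, decay=0.9):
    """Sketch of Table 1; `decay` is a hypothetical learning-rate schedule."""
    ranges = attribute_ranges(X)
    # Step 1: the first input/target pair initializes the first cluster node.
    V = [list(X[0])]
    W = [list(Y[0])]
    for _ in range(epochs):  # Steps 2 and 8: stop after a fixed number of epochs
        for x, y in zip(X, Y):  # Step 3
            # Step 4: the winner J has the largest similarity Vs(X_j).
            sims = [similarity(x, v, ranges) for v in V]
            J = max(range(len(V)), key=sims.__getitem__)
            if sims[J] >= rho:
                # Step 5: resonance; Kohonen (Eq. 14) and Grossberg (Eq. 15) updates.
                V[J] = [v + alpha * (xi - v) for v, xi in zip(V[J], x)]
                W[J] = [w + beta * (yk - w) for w, yk in zip(W[J], y)]
            else:
                # Step 6: add a cluster node whose weights copy (x, y).
                V.append(list(x))
                W.append(list(y))
        # Step 7: reduce the learning rates.
        alpha *= decay
        beta *= decay
    return V, W, ranges
```

For example, with one-dimensional inputs [0.0], [0.05], [1.0], [0.95] and targets [0], [0], [1], [1], a vigilance of 0.9 yields two cluster nodes, one per group. Raising `rho` toward 1 creates more nodes, matching the statement that a higher vigilance generates more cluster nodes.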

In the testing process, only input data are required when ART-CPN is used for prediction. The application procedure for ART-CPN is as follows:

Step 1. Present the input vector X.

Step 2. Compute $V_s(X_j)$ for each cluster node j, and find the node J with the maximum $V_s(X_J)$.

Step 3. Set the activations $\hat{y}_k$ of the output nodes. The competitive signal of the cluster layer is $z_j = 1$ if $j = J$ (the winning node) and $z_j = 0$ otherwise, and

$$\hat{y}_k = \sum_j z_j\, w_{jk} \tag{18}$$
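Steps 1–3 and Eq. (18) can be sketched as a small prediction function. The name `predict_art_cpn` is hypothetical, and it assumes the trained weights `V`, `W` and the training-set attribute ranges are available (for instance, from a training routine like Table 1). No vigilance test is applied at prediction time, so no node is ever added.

```python
def similarity(x1, x2, ranges):
    """Vs = 1 - mean range-normalized absolute difference, Eqs. (11)-(13)."""
    total = 0.0
    for a, b, (lo, hi) in zip(x1, x2, ranges):
        span = hi - lo
        total += abs(a - b) / span if span > 0 else 0.0  # zero range: no distance
    return 1.0 - total / len(x1)


def predict_art_cpn(x, V, W, ranges):
    """Steps 1-3: winner by maximum Vs, then Eq. (18)."""
    # Step 2: the winning node J maximizes Vs(X_j).
    sims = [similarity(x, v, ranges) for v in V]
    J = max(range(len(V)), key=sims.__getitem__)
    # Step 3: one-hot competitive signal z_j and output y_hat_k = sum_j z_j w_jk.
    z = [1.0 if j == J else 0.0 for j in range(len(V))]
    return [sum(z[j] * W[j][k] for j in range(len(V))) for k in range(len(W[0]))]


V = [[0.0], [1.0]]        # cluster weights v_j
W = [[10.0], [20.0]]      # output weights w_jk
ranges = [(0.0, 1.0)]     # attribute ranges from the training set
print(predict_art_cpn([0.2], V, W, ranges))
```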

4. Application of the ART-CPN to solve the forecasting problems

The above methodology is applied to two forecasting problems. The first is the benchmark Box and Jenkins gas furnace data (Lin & Cunningham, 1995); the second is the prediction of the lead frame dimension for semiconductors. The performance of ART-CPN and BPN is evaluated and compared using the root-mean-square error (RMSE) and the performance index PI. The RMSE is calculated from the target outputs $y_i$ and the predicted values $\hat{y}_i$ as

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{m}} \tag{19}$$

for m training patterns. PI is calculated from the target outputs and predicted values as (Lin & Cunningham, 1995)

$$\text{PI} = \frac{\sqrt{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}}{\sum_{i=1}^{m} |y_i|} \tag{20}$$

for m training patterns.

4.1. Box and Jenkins gas furnace data

The Box and Jenkins gas furnace data were obtained from the literature (Lin & Cunningham, 1995). The process is a gas furnace with a single input u(t) (gas flow rate) and a single output y(t) (CO2 concentration). The variables y(t−1), y(t−2), …, y(t−4) and u(t−1), u(t−2), …, u(t−6) are used as input nodes. The network was trained on 250 data points and used to predict the next 40. ART-CPN was trained with $\rho_X = 0.92$, $\alpha = \beta = 0.04$ and C = 6, which took 1500 iterations. Its forecasting performance was RMSE = 0.65722, PI = 0.00077 on the training set and RMSE = 1.61294, PI = 0.00473 on the test set. The same training data were used to train the backpropagation network, which required 15,000 iterations. The BPN performance was RMSE = 0.4284, PI = 0.0005 on the training set and RMSE = 1.4795, PI = 0.00436 on the test set.

4.2. Forecasting the dimension of lead frame for semiconductor

The electronics industry is developing rapidly, creating high demand for IC production. The etched semiconductor lead frame is the basic material used in IC packaging, and lead frame manufacturing generally requires highly precise pilot-hole dimensions. Fig. 3 shows the dimension of the pilot hole of the etched lead frame. The photo-etching process must control the pilot-hole dimension, and both the manufacturing parameters of the etching machine and the inspection data are recorded. These data were used to construct the ART-CPN network model, which forecasts the pilot-hole dimension so that it can be kept on its target value during etching. The model helps maintain process stability and supports adjusting the parameters of

Table 1

The ART-CPN algorithm

Step 1. Initialize the weights (with the first input vector and the first target vector); initialize the learning rates ($\alpha$, $\beta$), the vigilance parameter $0 \le \rho_X \le 1$ and the number of epochs.

Step 2. While the stopping condition is false, do Steps 3–8.

Step 3. For each input vector $X = (x_1, x_2, \ldots, x_n)$ and target vector $Y = (y_1, y_2, \ldots, y_K)$, do Steps 4–8.

Step 4. Compute $V_s(X_j)$ for each cluster node j. Find the winning cluster node, the one with the maximum $V_s(X_j)$; call its index J.

Step 5. If $V_s(X_J) \ge \rho_X$, update the weights of node J:
$v_{iJ}^{\text{new}} = v_{iJ}^{\text{old}} + \alpha\,(x_i - v_{iJ}^{\text{old}}),\; i = 1, \ldots, n$;
$w_{Jk}^{\text{new}} = w_{Jk}^{\text{old}} + \beta\,(y_k - w_{Jk}^{\text{old}}),\; k = 1, \ldots, K$.

Step 6. If the maximum $V_s(X_J) < \rho_X$, add one cluster node ($j = j + 1$) and set its weights: $v_{ij} = x_i$ ($i = 1, \ldots, n$), the weights between the input layer and the cluster layer; $w_{jk} = y_k$ ($k = 1, \ldots, K$), the weights between the cluster layer and the output layer.

Step 7. Reduce the learning rates.

Step 8. Test the stopping condition. The condition may specify a fixed number of epochs or that the learning rate has reached a sufficiently small value.


the subsequent etching process. The input parameters that significantly influence the pilot-hole dimension during etching must be identified so that the inputs can be mapped onto the output with the desired accuracy. Ten input parameters are used to construct the forecasting model. They are identified as follows:

Manufacturing data of the etching process:

1. Be(t−1): the Baumé value of the etching solution in time period t−1.

2. pH(t−1): the pH value of the etching solution in time period t−1.

3. ORP(t−1): the ORP (oxidation-reduction potential) value of the etching solution in time period t−1.

4. ET(t−1): the temperature of the etching solution (°C) in time period t−1.

5. Speed(t−1): the speed of the material (mm/min) in time period t−1.

6. RON(t−1): the etching open roller number in time period t−1.

Inspection data:

7. y(t−1): the sample mean of the pilot-hole dimension in time period t−1.
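Both applications feed the network lagged observations: Section 4.1 uses y(t−1), …, y(t−4) and u(t−1), …, u(t−6) to predict y(t), and Table 2 lists y(t−1), …, y(t−4) alongside the process parameters. A minimal sketch of building such (input, target) pairs from time series follows; the function name `make_lagged_pairs` is hypothetical, and appending several process variables per period, as in Table 2, would need a straightforward extension.

```python
def make_lagged_pairs(y, u=None, y_lags=4, u_lags=6):
    """Build pairs ([y(t-1..t-y_lags), u(t-1..t-u_lags)], y(t)) from aligned series."""
    # The first usable t must have every lag available.
    start = max(y_lags, u_lags if u is not None else 0)
    pairs = []
    for t in range(start, len(y)):
        x = [y[t - k] for k in range(1, y_lags + 1)]      # y(t-1), ..., y(t-y_lags)
        if u is not None:
            x += [u[t - k] for k in range(1, u_lags + 1)]  # u(t-1), ..., u(t-u_lags)
        pairs.append((x, y[t]))
    return pairs


# Box-Jenkins style layout: 4 output lags and 6 input lags give 10 input nodes.
pairs = make_lagged_pairs(list(range(10)), list(range(100, 110)))
print(len(pairs), len(pairs[0][0]))
```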

Table 2 lists the input data for the training process, where y(t) is the sample pilot-hole dimension to be predicted for time period t. A total of 72 etching-process records were used to establish the forecasting network model. ART-CPN training took 1008 iterations, and its forecasting performance on the training set was RMSE = 1.5525, PI = 0.00012. BPN training required 100,000 iterations, and its performance on the training set was RMSE = 1.65795, PI = 0.00013.

The trained network models were used to predict 25 samples of the pilot hole in the etching process. Fig. 4 shows the prediction errors for the test process. The maximum prediction error was 4 μm for ART-CPN and 7 μm for BPN. The performance measures were RMSE = 1.60609, PI = 0.00020 for ART-CPN and RMSE = 2.86667, PI = 0.00037 for BPN. The pilot hole has a tolerance of 25 μm. The ART-CPN network can therefore forecast the pilot-hole dimension precisely from the manufacturing parameters and inspection data of the etching process. The algorithm provides good learning efficiency and prediction performance, which can help improve lead frame quality in the etching process.
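The two performance measures, Eqs. (19) and (20), can be computed as in the following sketch (function names are hypothetical). Since $\sqrt{\sum (y_i - \hat{y}_i)^2} = \text{RMSE}\cdot\sqrt{m}$, PI equals $\text{RMSE}\cdot\sqrt{m} / \sum |y_i|$, so PI is tiny whenever the targets are large in magnitude.

```python
import math


def rmse(y, y_hat):
    """Root-mean-square error, Eq. (19)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def performance_index(y, y_hat):
    """Performance index, Eq. (20): sqrt(sum of squared errors) / sum |y_i|."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat))) / sum(abs(a) for a in y)


y, y_hat = [3.0, 4.0], [0.0, 0.0]
print(rmse(y, y_hat), performance_index(y, y_hat))  # sqrt(12.5) and 5/7
```

As a rough consistency check, the reported lead-frame training RMSE of 1.5525 over 72 records with targets near 1550 (Table 2) gives $1.5525 \cdot \sqrt{72} / (72 \cdot 1550) \approx 0.00012$, in line with the reported PI.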

5. Conclusions

This investigation proposed a new neural network for solving forecasting problems. ART-CPN uses the vigilance parameter ($\rho_X$) to generate the cluster layer nodes, and these adaptive cluster nodes can enhance the performance of the traditional CPN. The network was successfully applied to predict the gas furnace data and the lead frame dimension, and good forecasting performance could improve lead frame quality in the etching process. ART-CPN requires only a one-step learning process, and its learning speed is faster than that of CPN and BPN. ART-CPN, based on adaptive resonance theory and CPN, performs supervised real-time learning with a self-organizing neural network. If the vigilance is fixed, there is no need to retrain on all patterns when new training patterns are added. In a dynamic database, the real-time learning algorithm can adapt to environmental changes and produce a new network model for new training patterns; this learning characteristic is critical for dynamic forecasting problems. Overall, the algorithm can reduce the learning time and obtain good prediction performance for forecasting problems.

References

Carpenter, G. A., & Grossberg, S. (1987a). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54–115.

Carpenter, G. A., & Grossberg, S. (1987b). ART2: Stable self-organization of pattern recognition codes for analog input patterns. Applied Optics, 26, 4919–4930.

Carpenter, G. A., & Grossberg, S. (1990). ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks, 3, 129–152.

Table 2

Training data used to train the neural network

Be  pH  ORP  ET  Speed  Roll number  y(t−1)  y(t−2)  y(t−3)  y(t−4)  y(t)

41.70 486.90 534.50 50.10 1905 97.00 1551.40 1549.60 1551.00 1549.20 1550.20

41.70 488.90 534.60 50.20 1905 97.00 1550.20 1551.40 1549.60 1551.00 1551.40


Carpenter, G. A., Grossberg, S., & David, B. R. (1991). Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks, 4, 759–771.

Chang, F. J., & Chen, Y. C. (2001). A counterpropagation fuzzy-neural network modeling approach to real time streamflow prediction. Journal of Hydrology, 245, 153–164.

Chun, S. H., & Kim, S. H. (2004). Data mining for financial prediction and trading: Application to single and multiple markets. Expert Systems with Applications, 26, 131–139.

Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques (pp. 338–344). Morgan Kaufmann.

Hecht-Nielsen, R. (1988). Application of counterpropagation networks. Neural Networks, 1, 131–139.

Kohonen, T., Hynninen, J., Laaksonen, J., & Torkkola, K. (1995). The learning vector quantization program package reference guide. www.cis.hut.fi/research/som-research/nnrc-programs.shtml, 1995 (pp.1-10).

Kim, I. S., Jeong, Y. J., Lee, C. W., & Yarlagadda, P. K. D. V. (2003). Prediction of welding parameters for pipeline welding using an intelligent system. International Journal of Advanced Manufacturing Technology, 22, 713–719.

Lin, Y., & Cunningham, G. A. (1995). A new approach to fuzzy-neural system modeling. IEEE Transactions on Fuzzy Systems, 3(2), 190–198.

Lin, T. C., & Yu, P. T. (2003). Centroid neural network adaptive resonance theory for vector quantization. Signal Processing, 83, 649–654.

Shi, S. M., Xu, L. D., & Liu, B. (1999). Improving the accuracy of nonlinear combined forecasting using neural networks. Expert Systems with Applications, 16, 49–54.

Yang, B. S., Han, T. H., & Kim, Y. S. (2004). Integration of ART-Kohonen neural network and case-based reasoning for intelligent fault diagnosis. Expert Systems with Applications, 26, 387–395.