Optimal design for a ball grid array wire bonding process using a neuro-genetic approach


Chao-Ton Su and Tai-Lin Chiang

Abstract—This study presents an integrated method in which neural networks, genetic algorithms, and exponential desirability functions are used to optimize the ball grid array (BGA) wire bonding process. The BGA package is widely anticipated to become the fastest-growing semiconductor package and to push integrated circuit (IC) packaging to a higher level of compactness and density. However, wire bonding in BGA is difficult owing to its high input/output (I/O) count, fine pitch wire bonds, and long wire lengths. This study addresses two fundamental issues that a semiconductor assembly facility faces in its quest toward a defect-free manufacturing environment: first, exploring the nonlinear multivariate relationship between parameters and responses, and second, obtaining the optimum operating parameters, with respect to each response, at which the process should operate. The proposed method was implemented in an IC assembly factory in Taiwan; the results demonstrate the practicability of the proposed approach.

Index Terms—Ball grid array (BGA), exponential desirability function, genetic algorithms, neural networks, wire bonding.

I. INTRODUCTION

BGA packages provide a high interconnect density and lead count using standard pitch dimensions. Wire bonding designs include ultra fine pitch and cavity-up, which conduct heat from the die through the substrate and interconnect. Owing to their intrinsic design, BGAs are technically complex to bond and are designed for high I/O counts; up to 500 leads is common. They also demand fine-pitch ( μm) wire bonding and require long wire lengths, straight loops, and small first and second bond areas [1]. With high I/O count, fine pitch wire bonds, and long wire lengths, wire bonding in the BGA assembly is difficult. Exploring a manufacturing solution for the BGA requires an integrated study of wire bonding parameters. Fig. 1 depicts the thermal design architecture of the BGA package.

The wire bonding process begins with the capillary targeted on the bond pad and positioned above the die, with a ball formed on the end of the wire and pressed against the face of the capillary. The capillary descends, bringing the ball into contact with the die. The inside cone, or radius, grips the ball in forming the bond. In a thermosonic system, ultrasonic vibration is then applied. After the ball is bonded to the die, the capillary rises

Manuscript received December 4, 2000; revised December 20, 2001. This paper was supported by the National Science Council of Taiwan, R.O.C., under Contract NSC-90-2218-E-009-008.

C.-T. Su is with the Department of Industrial Engineering and Management, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

T.-L. Chiang is with the Department of Business Administration, Minghsin Institute of Technology, Hsinchu, Taiwan, R.O.C.

Publisher Item Identifier S 1521-334X(02)02656-3.

Fig. 1. Typical BGA thermal design.

Fig. 2. Wire bonding process mechanism.

to the loop height position. The clamps are open, and wire is free to feed out the end of the capillary. The lead of the device is positioned under the capillary, which is then lowered to the lead. Wire is fed out the end of the capillary, forming a loop. The capillary deforms the wire against the lead, producing a wedge-shaped bond, which has a gradual transition into the wire. In a thermosonic machine, ultrasonic vibration is then applied. The capillary rises off the lead, leaving the stitch bond. At a preset height, the clamps are closed while the capillary is still rising with the bonding lead. This prevents the wire from feeding out the capillary and pulls at the bond. The wire breaks at the thinnest cross section of the bond. A new ball is formed on the tail of the wire, which extends from the end of the capillary. A hydrogen flame or an electronic spark may be used to form the ball. The cycle is then complete, and the bonder is ready for the next ball bond. Fig. 2 depicts the mechanism of wire bonding.

Wire bonding is used throughout the semiconductor industry as a means of interconnecting the dies, substrates, and I/O pins. Ultrasonic metal welding technology can be used for many different applications by appropriately utilizing its sound wave and high-frequency mechanical energy characteristics. Ultrasonic energy is used to improve the structure of materials in metallurgy. The acoustic irradiation of a molten mass improves degasification and yields a finer grain structure during the hardening process.


The wire bonding operation attempts to develop a high-yield interconnect and low wire sweep process with sufficient long-term reliability to satisfy customer requirements [2]. To achieve a high level of wire bonding performance and quality, the appropriate bonding process parameters must be accurately identified and controlled. Process engineers must identify and control these parameters iteratively, based on their experience or the equipment provider's recommendations, to obtain the desired wire bonding quality across multiple responses (e.g., maximum ball shear strength, wire pull strength, and appropriate ball size). However, this task is complicated and difficult because the process is a coupled multivariable system, which makes it impossible to adjust a single parameter without affecting the others. Therefore, this multivariate operation requires an intelligent system capable of evaluating the process and determining the optimum adjustment [3].

The use of statistical experimental design techniques in semiconductor manufacturing has proven very beneficial in process modeling, optimization, and control. This approach has yielded fairly good empirical models for processes such as plasma etching and LPCVD [4]. However, statistical modeling in semiconductor manufacturing relies on response surface methods (RSM) to construct process models following experimentation. Himmel and May [5] demonstrated that RSM models are less accurate and robust than models constructed using neural networks.

This study presents an integrated approach not only for exploring empirical models between process parameters and responses via neural networks, but also for optimizing the process through certain parameter settings using genetic algorithms and an exponential desirability function for the BGA wire bonding process. A comparison between RSM and the proposed approach with respect to each response is also conducted through confirmatory trials.

The rest of this paper is organized as follows. Neural networks, genetic algorithms, and exponential desirability functions are briefly described. The next section presents an integrated procedure for optimizing the BGA wire bonding process. An experimental design for the implementation of the proposed procedure is then illustrated, followed by a comparison of the proposed procedure and RSM in terms of process performance. Concluding remarks are finally made.

II. MODELING AND OPTIMIZATION APPROACH FOR BGA WIRE BONDING

A. Neural Networks

Major progress in studying neural networks has been made since 1980. Neural networks are increasingly used to model complex manufacturing processes, generally for process and quality control [6]. Frequently these models are used to identify optimal process settings. An approximate model can be constructed using a back-propagation neural network. The output can also be predicted by using statistical methods; however, according to Sette et al., such methods tend to be less accurate [7]. Neural networks possess the unique capability of learning arbitrary nonlinear mappings between noisy sets of input and output patterns [4]. Basically, a

Fig. 3. Topology of the back-propagation neural network.

neural network approach can typically be constructed without assuming anything about the functional form of the relationship between predictors and response [8]. In addition to learning and extracting the process behavior from previous operating information, this approach can also be used as a model for process optimization. Neural networks have demonstrated a strong capability for learning nonlinear and complex relationships between process parameters and responses without any prior knowledge of the process. The neural network approach holds a major advantage over the statistical method in that the neural network is explicitly nonlinear through its hidden layers. It is a more general mapping procedure in which a specific functional form is not required in model building [9]. This particularly suits the highly complex process of BGA wire bonding [10].

Neural networks have recently emerged as a highly promising alternative to physically based models and statistical methods of semiconductor process modeling. Fig. 3 displays the general structure of a feedforward, multilayer neural network used for semiconductor process modeling that is typically trained via back-propagation [4]. Back-propagation networks have already been applied to a wide range of problems (e.g., speech synthesis and pattern recognition) and, in most cases, exhibit good behavior and results [11]. Once trained, a back-propagation network can be evaluated quickly, which is an advantage during the optimization phase. Recent overviews of neural network applications in the manufacturing industry were compiled by Zhang and Huang [12]. More related applications can be found in [3]–[7], [10], [13]–[17].

The BP neural networks consist of layers of neurons interconnected such that information is stored in the weights assigned to the connections. Network learning aims to determine an appropriate set of connection strengths that lead the processing units to a desired state imitating a given set of sampled patterns. In addition, a sigmoid activation function determines the activation level of a neuron.
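As a minimal sketch of the learning rule just described, the following Python fragment applies one back-propagation update to a single sigmoid neuron. The weights, inputs, target, and learning rate are hypothetical values chosen for illustration; they are not taken from the wire bonding experiment, and a full BP network repeats this update layer by layer over many samples.

```python
import math

def sigmoid(a: float) -> float:
    """Sigmoid activation: maps any real activation level into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs followed by the sigmoid activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def bp_step(weights, bias, inputs, target, lr):
    """One gradient-descent update of the squared error 0.5 * (y - target)**2."""
    y = neuron_output(weights, bias, inputs)
    delta = (y - target) * y * (1.0 - y)  # error signal times sigmoid derivative
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias

# Hypothetical starting point (illustrative numbers only).
weights, bias = [0.1, -0.2], 0.0
inputs, target = [1.0, 0.5], 0.9

error_before = 0.5 * (neuron_output(weights, bias, inputs) - target) ** 2
weights, bias = bp_step(weights, bias, inputs, target, lr=0.5)
error_after = 0.5 * (neuron_output(weights, bias, inputs) - target) ** 2
print(error_before, error_after)
```

With these numbers the squared error on the sample decreases after the update, which is the behavior network learning relies on when it adjusts the connection strengths.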

B. Genetic Algorithms (GAs)

More conventional optimization methods start from one point in the search area and then move sequentially toward the optimum solution, thereby operating rather locally and being highly prone


to falling inside a coincidental local optimum. GAs counteract entrapment in a local optimum by imitating the principles of natural genetics and natural selection to constitute search and optimization procedures. They perform a global, random, parallel search for an optimal solution using simple computations.

GAs are efficient global search methods based on natural selection and population genetics. These algorithms use randomized operators operating on a population of candidate solutions to generate a new population of candidates in the search space [18]. Because the parameters-to-responses function has many dimensions and no mathematical formulation is available, this study applies genetic algorithms, one of the promising approaches for optimizing complicated production systems. GAs are known for their robustness and effective overall search capabilities [19]. Hung and Adeli [15], Sette et al. [7], as well as Hsu and Su [20] have demonstrated the ability of GAs to perform an optimum search. A GA in its simplest form uses three operators: reproduction, crossover, and mutation.

C. Exponential Desirability Function

It is not unusual to deal with multiple responses in a manufacturing process. Optimizing the process with respect to any single response often results in nonoptimum values for the remaining characteristics. A simple and intuitive approach to the multiresponse problem is to superimpose the response contour plots and determine an optimal solution by visual inspection. Such a method is severely limited by the number of input variables and/or responses [21]. The desirability function approach attempts to transform a multiresponse problem into a single-response one by mathematical transformation [22]. Kim and Lin [21] developed an approach based on maximizing exponential desirability functions that does not require any assumptions regarding the form or degree of the estimated response models; such an approach is robust to potential dependencies between response variables. Their approach aims to identify the settings of the input variables that maximize the degree of overall satisfaction with respect to all the responses. The exponential desirability function has been extensively used to optimize several responses simultaneously. Its benefits are that it is easily and intuitively understood and allows the user to weight the responses according to their importance.

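In order to achieve an overall optimization with respect to all the responses, a “minimum” operator for aggregating the responses can be stated as

$$\text{Maximize } \lambda \tag{1}$$

subject to

$$d_j(z_j) \ge \lambda, \quad j = 1, 2, \ldots, r; \qquad \mathbf{x} \in \Omega. \tag{2}$$

This formulation aims to identify the $\mathbf{x}$ which maximizes the minimum degree of satisfaction $\lambda$ with respect to all the responses within the experimental region $\Omega$, i.e.,

$$\max_{\mathbf{x} \in \Omega} \; \min\left[ d_1(z_1), d_2(z_2), \ldots, d_r(z_r) \right]. \tag{3}$$

The exponential desirability function can be formed as

$$d(z) = \begin{cases} \dfrac{e^{t} - e^{t|z|}}{e^{t} - 1}, & t \neq 0 \\[2mm] 1 - |z|, & t = 0 \end{cases} \tag{4}$$

where $t$ is the exponential constant ($-\infty < t < \infty$), and $z$ is a standardized parameter representing the distance of the estimated response from its target in units of the maximum allowable deviation. For example, $z$ for a nominal-the-best (NTB) type response with a symmetric desirability function is defined as

$$z = \frac{\hat{y}(\mathbf{x}) - T}{y^{\max} - T} = \frac{\hat{y}(\mathbf{x}) - T}{T - y^{\min}}, \quad y^{\min} \le \hat{y}(\mathbf{x}) \le y^{\max} \tag{5}$$

where $T$ denotes the target value and $y^{\min}$ and $y^{\max}$ denote the minimum and maximum acceptable values of the response. Similarly, for the smaller-the-better (STB) type or larger-the-better (LTB) type response, the following transformations give the value of $z$:

$$z = \frac{\hat{y}(\mathbf{x}) - y^{\min}}{y^{\max} - y^{\min}} \quad \text{(STB)} \tag{6}$$

$$z = \frac{y^{\max} - \hat{y}(\mathbf{x})}{y^{\max} - y^{\min}} \quad \text{(LTB)} \tag{7}$$

Here $z$ ranges between $-1$ and $1$ for an NTB-type response and between 0 and 1 otherwise. In both cases the value of $d(z)$ achieves its maximum value when $z = 0$. The function given in (4) has been proven to provide a reasonable and flexible representation of human perception and is convenient to handle analytically.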
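To make the desirability transformation concrete, the sketch below evaluates an exponential desirability function and the "minimum" aggregation in Python. It assumes the Kim and Lin [21] form, d(z) = (e^t - e^(t|z|)) / (e^t - 1) for t not equal to 0 and d(z) = 1 - |z| for t = 0; the function names, specification limits, and response values are hypothetical and serve only as an illustration.

```python
import math

def desirability(z: float, t: float) -> float:
    """Exponential desirability of a standardized deviation z (assumed form).

    t < 0 gives a convex curve, t = 0 a linear one, t > 0 a concave one.
    """
    z = abs(z)
    if t == 0:
        return 1.0 - z
    return (math.exp(t) - math.exp(t * z)) / (math.exp(t) - 1.0)

def z_ntb(y, target, y_min, y_max):
    """Nominal-the-best: signed distance from target, scaled to [-1, 1]."""
    if y >= target:
        return (y - target) / (y_max - target)
    return (y - target) / (target - y_min)

def z_stb(y, y_min, y_max):
    """Smaller-the-better: 0 at y_min, 1 at y_max."""
    return (y - y_min) / (y_max - y_min)

def z_ltb(y, y_min, y_max):
    """Larger-the-better: 0 at y_max, 1 at y_min."""
    return (y_max - y) / (y_max - y_min)

def overall_satisfaction(desirabilities):
    """'Minimum' operator: the least satisfied response sets the overall level."""
    return min(desirabilities)

# Hypothetical example: one LTB response and one NTB response.
d_ltb = desirability(z_ltb(8.0, y_min=5.0, y_max=10.0), t=-1.0)
d_ntb = desirability(z_ntb(52.0, target=50.0, y_min=40.0, y_max=60.0), t=1.0)
print(overall_satisfaction([d_ltb, d_ntb]))
```

Maximizing the value returned by `overall_satisfaction` over the input settings is what the "maximin" formulation asks for: no response can be traded away entirely to favor another.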

The exponential desirability function has several methodological advantages over the available methods for optimizing multiple responses. First, the “maximin” approach is robust to potential dependence between responses. Such dependence is extremely difficult to detect or model in practice. Second, this approach achieves a better balance among all the responses than the existing methods. Third, the objective function value allows a satisfactory physical interpretation in terms of degree of satisfaction. Fourth, the approach can also be viewed as a fuzzy logic approach: the “maximin” approach is equivalent to using the logical AND operator in fuzzy logic, denoting the intersection of the corresponding membership functions. Related studies have demonstrated that this optimization scheme is quite effective in reconciling multiple conflicting objectives [22]–[24].

D. Proposed Optimization Procedure

This study proposes an integrated neuro-genetic-exponential desirability function algorithm capable of optimizing the parameter settings in a BGA wire bonding process. The proposed approach consists of two stages. The first stage involves using a BP network to derive the relationship model between input parameters and output responses. Notably, the trained network can accurately predict the behavior of possible parameter combinations. Thus, tuning the input parameters in the trained network allows us to obtain the corresponding responses. The exponential desirability function is then used to transform the multiple responses into a single response. During the second stage, a GA is applied to obtain the optimum degree of satisfaction. Herein, a chromosome is used to represent a possible solution. Each


Fig. 4. Schematic diagram for determining the optimal wire bonding parameters.

gene in the chromosome represents the value of an input parameter. For example, if a manufacturing process has three input parameters, a chromosome with three genes represents the values of the three parameters, respectively. The essential genetic operators used during the iterative procedure are described in the previous section. These operations are conducted to obtain the optimal response, which is evaluated by the fitness function. The optimal parameter settings of the problem can thereby be obtained. Fig. 4 schematically depicts the proposed optimization procedure. The detailed procedure is summarized as follows.

Step 1) Collect the input parameters and corresponding responses.

Step 2) Develop a BP network model to obtain the relationship between the input parameters and output responses.

Step 3) Apply the exponential desirability function to transform the multiple responses into a single one. The trained network with a modified single response is referred to as a fitness function.

Step 4) Set the GA operating conditions (e.g., population size, generation size, parameter number, crossover rate, and mutation rate).

Step 5) Create an initial population by randomly selecting the values of the input parameters.

Step 6) Repeat Steps 7–11 until the stopping condition is reached.

Step 7) Calculate the fitness value by inputting the parameter values to the fitness function.

Step 8) Select the parameter values according to the computed responses.

Step 9) Cross over the selected parameter values.

Step 10) Mutate the parameter values to yield the next generation.

Step 11) Obtain the current optimal parameter values.

Step 12) Obtain the optimal parameter settings.
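Steps 4 to 12 can be sketched in Python as follows. This is an illustrative outline, not the authors' implementation: `surrogate_fitness` is a hypothetical stand-in for the trained BP network combined with the desirability function, the genes are assumed to be real values in [0, 1], and single-point crossover and random-reset mutation are assumed operator choices. The population size, crossover rate, and mutation rate follow the settings reported in Section III-C; the number of generations is reduced so the sketch runs quickly.

```python
import random

N_PARAMS = 8           # genes per chromosome (one per input parameter)
POP_SIZE = 50          # Step 4: GA operating conditions
CROSSOVER_RATE = 0.5
MUTATION_RATE = 0.01
GENERATIONS = 200      # reduced for illustration

def surrogate_fitness(chromosome):
    """Hypothetical stand-in for 'trained network + desirability'; peak of 1.0 at 0.5."""
    return 1.0 - sum((g - 0.5) ** 2 for g in chromosome) / len(chromosome)

def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]

def crossover(a, b, rng):
    """Single-point crossover, applied with probability CROSSOVER_RATE."""
    if rng.random() < CROSSOVER_RATE:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]

def mutate(chrom, rng):
    """Replace each gene by a fresh random value with probability MUTATION_RATE."""
    return [rng.random() if rng.random() < MUTATION_RATE else g for g in chrom]

def run_ga(seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(N_PARAMS)]
                  for _ in range(POP_SIZE)]                        # Step 5
    best = max(population, key=surrogate_fitness)
    for _ in range(GENERATIONS):                                   # Step 6
        fitnesses = [surrogate_fitness(c) for c in population]     # Step 7
        children = []
        while len(children) < POP_SIZE:
            p1 = roulette_select(population, fitnesses, rng)       # Step 8
            p2 = roulette_select(population, fitnesses, rng)
            c1, c2 = crossover(p1, p2, rng)                        # Step 9
            children += [mutate(c1, rng), mutate(c2, rng)]         # Step 10
        population = children[:POP_SIZE]
        best = max(population + [best], key=surrogate_fitness)     # Step 11
    return best, surrogate_fitness(best)                           # Step 12

best_chromosome, best_fitness = run_ga()
print(best_fitness)
```

Keeping track of the best chromosome seen so far is a design choice of this sketch; it guarantees that the reported solution never gets worse from one generation to the next.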

III. EXPERIMENTAL RESULTS

A. Training of Neural Networks

An engineering experiment on the 52-μm fine pitch BGA wire bonding process is conducted to optimize the wire bonding process with respect to each response, which is shown

TABLE I

RESPONSES OF THE 52-μm FINE PITCH BGA

TABLE II

PROCESS PARAMETERS AND THEIR LEVELS

TABLE III

OPTIONS FOR NEURAL NETWORKS

in Table I. Table II lists the process parameters and the values for each level. Thirty-two trials are conducted using a well-structured orthogonal array. The experimental data are then used to construct the relationship model between parameters and responses through the BP neural network: 80% of the data (approximately 25 samples) are used for training the neural network, while the remaining 20% (approximately seven samples) are used for testing.
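The 80/20 partition of the 32 trials can be sketched as below. The paper does not state how trials were assigned to the two sets, so the random shuffle and the seed are assumptions made only for illustration.

```python
import random

N_TRIALS = 32
TRAIN_FRACTION = 0.8

rng = random.Random(42)  # arbitrary seed, for reproducibility only
trial_ids = list(range(N_TRIALS))
rng.shuffle(trial_ids)

n_train = int(TRAIN_FRACTION * N_TRIALS)  # 25.6 truncated to 25
train_ids = sorted(trial_ids[:n_train])
test_ids = sorted(trial_ids[n_train:])
print(len(train_ids), len(test_ids))  # 25 7
```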

Table III lists several options for the neural network architecture, among which the 8-4-3 structure is selected because it gives the best convergence in terms of the root mean square error (RMSE).
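As a rough illustration of what an 8-4-3 structure and the RMSE criterion mean, the following Python sketch builds a network with eight inputs, four hidden neurons, and three outputs, and defines an RMSE function. Reading "8-4-3" as input-hidden-output layer sizes is an assumption, the weights are random placeholders rather than trained values, and the helper names are hypothetical.

```python
import math
import random

LAYER_SIZES = [8, 4, 3]  # assumed reading: inputs, hidden neurons, outputs

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def init_network(sizes, rng):
    """Random weights; each neuron stores its input weights followed by a bias."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def forward(network, inputs):
    """Propagate one input vector through every layer of the network."""
    activations = list(inputs)
    for layer in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron[:-1], activations)) + neuron[-1])
            for neuron in layer
        ]
    return activations

def rmse(predicted, observed):
    """Root mean square error over all outputs of all samples."""
    errors = [(p - o) ** 2
              for ps, os_ in zip(predicted, observed)
              for p, o in zip(ps, os_)]
    return math.sqrt(sum(errors) / len(errors))

rng = random.Random(0)
network = init_network(LAYER_SIZES, rng)
prediction = forward(network, [0.5] * 8)   # eight normalized process parameters
print(len(prediction))                     # 3
```

Candidate architectures can then be compared by training each one and evaluating `rmse` on the held-out test samples, keeping the structure with the lowest value.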

B. Determination of the Fitness Function

In this study, two of the three responses have lower specification limits, and the third has a corresponding target value. Herein, the exponential desirability function is used to solve the multiresponse problem, with the overall degree of satisfaction calculated from (3) and (4). The engineering management agrees on employing convex, convex, and concave exponential desirability functions for the three responses, respectively, with the exponential constants chosen according to their importance. Therefore, the overall degree of satisfaction is set as the fitness function of the GA, as further explored in the next section.


TABLE IV

IMPLEMENTATION RESULTS OF GA

C. Optimization Using Genetic Algorithms

Each input parameter in the BGA wire bonding process is normalized to a value between 0 and 1, and the normalized values are combined into one string. For example, the input parameters listed in Table II are transformed into a chromosome represented as a string. When a GA is applied to optimize the BGA wire bonding parameter selection, the essential operators, including reproduction, crossover, and mutation, should be determined in advance. Herein, a roulette wheel approach is adopted as the selection procedure. The crossover and mutation rates are set to 0.5 and 0.01, respectively. Fifty strings are randomly generated to establish the initial population, and 5000 generations are processed.
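One plausible way to normalize parameters into a chromosome and map genes back to machine settings is min-max scaling, sketched below. The paper does not specify the normalization formula, so min-max scaling is an assumption, and the ranges in `PARAM_RANGES` are hypothetical placeholders rather than the levels of Table II.

```python
# Hypothetical (low, high) ranges; the actual levels are those of Table II.
PARAM_RANGES = [(100.0, 200.0), (5.0, 10.0), (0.1, 0.5)]

def encode(values, ranges):
    """Map physical parameter values to genes in [0, 1] (min-max scaling)."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]

def decode(genes, ranges):
    """Map genes in [0, 1] back to physical parameter values."""
    return [lo + g * (hi - lo) for g, (lo, hi) in zip(genes, ranges)]

settings = [150.0, 7.5, 0.3]
chromosome = encode(settings, PARAM_RANGES)
restored = decode(chromosome, PARAM_RANGES)
print(chromosome, restored)
```

Working on normalized genes lets the same crossover and mutation operators treat every parameter alike, regardless of its physical unit or scale.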

D. Results

Using the above settings, the GA is executed for twenty runs. Table IV summarizes the implementation results. A higher fitness value implies a better degree of satisfaction in terms of the compromise solution. The largest value is 0.8812, and its optimum chromosome is (167.1, 6.7, 36.7, 46.5, 0.21, 24.8, 5.9, 62.3). These settings are the optimal condition for the eight process parameters.

E. Comparison

Conventionally, process engineers handle a multiresponse problem by applying RSM, with polynomial models fitted to each of the responses. They then superimpose the response contour plots, along with a separate response surface analysis, to determine the optimal parameter settings. Best-subset models are fitted to each response using RSM.

The optimal process settings obtained with RSM are (159.2, 8.2, 40.5, 51.8, 0.31, 21.4, 5.3, 67.1).

This study compared RSM and the proposed approach for benchmarking purposes. According to the comparison in Table V, the proposed approach shows better

TABLE V

COMPARISON BETWEEN THE RSM AND THE PROPOSED APPROACH

Fig. 5. Yield rate trend.

performance, by more than 10%, on the wire pull and ball shear in terms of short-term process capability. Because wire pull and ball shear are the most important characteristics, strongly affecting the electrical function of the device in later applications, these results were highly satisfactory to the line engineers.

This study also tested the difference between the mean values obtained by the two approaches for the wire pull and the ball shear; the test statistics are 4.26 with a p-value of 0.0001 and 3.35 with a p-value of 0.0009, respectively. Thus, there is strong evidence that the means for wire pull and ball shear under the proposed approach are greater than those under the RSM approach.

The effectiveness of the proposed approach was evaluated at a semiconductor assembly line in Taiwan, in a project undertaken to optimize the BGA wire bonding parameters. The implementation results under mass production over eight months confirm that the proposed approach outperforms the conventional RSM method in optimizing a BGA wire bonding process. According to the quality trend chart (Fig. 5) from the shop floor, the wire bonding yield has risen to an average of around 99.92% over


eight months from 98.1%, which is equivalent to a reduction of 18 200 DPPM (defective parts per million). The annual cost saving from implementing the proposed approach in Month 2 is expected to exceed USD 1.1 million, whereas the expenditure for the experiment was below USD 2,000.
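As a quick check of the arithmetic behind the quoted defect reduction, and assuming that every unit of yield loss counts as one defective part, the yield improvement converts to DPPM as follows:

```latex
\[
(99.92\% - 98.1\%) \times 10^{6}
  = 0.0182 \times 10^{6}
  = 18\,200 \ \text{DPPM}.
\]
```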

IV. CONCLUSION

This study has demonstrated that integrating BP neural networks, genetic algorithms, and the exponential desirability function can optimize the BGA wire bonding process. Although statistical experimental design techniques have greatly benefited process modeling, optimization, and control in semiconductor manufacturing, statistical modeling that relies heavily on response surface methods (RSM) to construct process models following experimentation is less accurate and robust than neural network models. The neural network approach is better than the statistical method largely because the neural network is explicitly nonlinear through its hidden layers. It is a more general mapping procedure in which a specific functional form is not required in model building. This study also demonstrated the superiority of the proposed approach over RSM based on criteria such as the degree of satisfaction, a test of the difference between two population means, and short-term process capability. The proposed approach can readily optimize a complex process with multiple responses. The resulting settings help process engineers achieve acceptable process control during production. In addition, the improvement in process performance allows the factory to more easily fabricate products of superior quality in the IC assembly industry.

ACKNOWLEDGMENT

The authors would like to thank Process Engineering Manager D. Hsiao, AdvanTech, Ltd., for his full support, and B. Thomas, for permitting the use of the BGA technical data in this study.

REFERENCES

[1] C. Leroy, L. Lee, and M. Eshelman, “Bonding BGA packages requires integrated solutions,” in Proc. IMAPS Conf., Philadelphia, PA, Oct. 1997.

[2] M. Kuzawinski, “High density package applications for wire bond and flip chip: Small, fine pitch BGA packages,” in Proc. Semiconduct. Packag. Symp.-Session I: SEMICON W., 1999.

[3] K. M. Tay and C. Butler, “Modeling and optimizing of a MIG welding process-A case study using experimental designs and neural networks,” Qual. Rel. Eng. Int., vol. 13, pp. 61–70, 1997.

[4] K. K. Lee, T. Brown, G. Dagnall, R. Bicknell-Tassius, A. Brown, and G. S. May, “Using neural networks to construct model of the molecular beam epitaxy process,” IEEE Trans. Semiconduct. Manufact., vol. 13, pp. 34–45, Feb. 2000.

[5] C. Himmel and G. May, “Advantages of plasma etch modeling using neural networks over statistical techniques,” IEEE Trans. Semiconduct. Manufact., vol. 6, pp. 103–111, May 1993.

[6] D. W. Coit, B. T. Jackson, and A. E. Smith, “Static neural network process model: Considerations and cases studies,” Int. J. Prod. Res., vol. 36, no. 11, pp. 2953–2967, 1998.

[7] S. Sette, L. Boullart, L. V. Langenhove, and P. Kiekens, “Optimizing the fiber-to-yarn production process with a combined neural network/genetic algorithm approach,” Textile Res. J., vol. 67, no. 2, pp. 84–92, 1997.

[8] H. S. Stern, “Neural networks in applied statistics,” Technometrics, vol. 38, no. 3, pp. 205–220, Aug. 1996.

[9] C. A. Chang and C.-T. Su, “A comparison of statistical regression and neural network methods in modeling measurement errors for computer vision inspection systems,” Comput. Ind. Eng., vol. 28, no. 3, pp. 593–603, 1995.

[10] J. Chen, P. T. P. Chu, D. Shan, H. Wong, and S. S. Jang, “Optimal design using neural network and information analysis in plasma etching,” J. Vac. Sci. Technol. B, vol. 17, no. 1, pp. 145–153, Jan./Feb. 1999.

[11] R. P. Lippmann, “An introduction to computing with neural nets,” ASSP, vol. 40, 1987.

[12] H. C. Zhang and S. H. Huang, “Applications of neural networks in manufacturing: A state-of-the-art survey,” Int. J. Prod. Res., vol. 33, pp. 705–728, 1995.

[13] C. C. Chiu, C. T. Su, G. S. Yang, J. S. Huang, S. C. Chen, and T. N. Cheng, “Selection of optimal parameters in gas-assisted injection moulding using neural network model and the Taguchi method,” Int. J. Qual. Sci., vol. 2, no. 2, pp. 106–120, 1997.

[14] F. L. Chen and S. F. Liu, “A neural-network approach to recognize defect spatial pattern in semiconductor fabrication,” IEEE Trans. Semiconduct. Manufact., vol. 13, pp. 366–373, Aug. 2000.

[15] S. L. Hung and H. Adeli, “A parallel genetic/neural network learning algorithm for MIMD shared memory machines,” IEEE Trans. Neural Networks, vol. 5, pp. 900–909, Nov. 1994.

[16] T. K. Meng and C. Butler, “Solving multiple response optimization problems using adaptive neural networks,” Int. J. Adv. Manufact. Technol., vol. 13, pp. 666–675, 1997.

[17] S. Y. S. Lam, K. L. Petri, and A. E. Smith, “Prediction and optimization of a ceramic casting process using a hierarchical hybrid system of neural networks and fuzzy logic,” IIE Trans., vol. 32, pp. 83–91, 2000.

[18] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[19] Z. Khan, B. Prasad, and T. Singh, “Machining condition optimization by genetic algorithms and simulated annealing,” Comput. Oper. Res., vol. 24, no. 7, pp. 647–657, 1997.

[20] C.-M. Hsu and C.-T. Su, “Multiobjective machine-component grouping in cellular manufacturing: A genetic algorithm,” Prod. Planning Contr., vol. 9, no. 2, pp. 155–166, 1998.

[21] K. J. Kim and K. J. D. Lin, “Simultaneous optimization of mechanical properties of steel by maximizing exponential desirability functions,” Appl. Stat., pt. 3, vol. 49, pp. 311–325, 2000.

[22] E. Castillo, D. C. Montgomery, and D. R. Maccarville, “Modified desirability functions for multiple response optimization,” J. Qual. Technol., vol. 28, no. 3, pp. 337–345, July 1996.

[23] R. Bellman and L. Zadeh, “Decision-making in a fuzzy environment,” Manag. Sci., vol. 17, pp. 141–164, 1970.

[24] M. Laviolette, J. Seaman, J. Barrett, and W. Woodall, “A probabilistic and statistical view of fuzzy methods,” Technometrics, vol. 37, pp. 249–261, 1995.

Chao-Ton Su received the Ph.D. degree in industrial engineering from the University of Missouri, Columbia.

He is a Professor in the Department of Industrial Engineering and Management, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C. His research is in the areas of quality management and neural networks in industrial applications.

Tai-Lin Chiang received the Ph.D. degree in industrial engineering and management from National Chiao-Tung University, Hsinchu, Taiwan, R.O.C.

He is currently a Lecturer in the Department of Business Administration, Minghsin Institute of Technology, Hsinchu. His research interests include quality engineering, neural network applications, and semiconductor manufacturing engineering.