Test Results and Comparisons

Our simulations are based on data collected from two products, referred to as product A and product B, of a renowned wafer foundry in Taiwan. Both products are made on 6-inch wafers⁷. Each wafer of product A consists of 203 dies, and each wafer of product B consists of 206 dies. In the following, we focus our description on product A; the test results of product B are presented afterwards. All test results shown in this section were simulated on a Pentium IV PC using Borland C++.

There are 12 bins in the wafers of product A. The probability mass function P(B_k = n), k = 1,…,12, and the probability of the number of overkills in bin k, p_k, k = 1,…,12, are given. The yield rate of product A is 68%. The decision-variable space […]. We employ a sigmoid-type function as our penalty function P in (3.2), i.e., […], where the coefficient (0.1594) is a normalized coefficient such that […], and where μ_k is the mean of the number of dies of bin k.

The parameters of the proposed two-level algorithm are set as follows: L_s = 300, L_e = 10,000, M = 1000, I = 5000, N = 1000, and s = 35. We have simulated three cases with different values of r_T: 10, 30, and 50.

The good enough vector of threshold values and the average overkill percentage for the three cases of r_T we obtained from the two-level algorithm are shown in Table 3.1. The CPU time consumed in each case plus the training time is approximately 6.05 minutes.

From Table 3.1, we can observe that as r_T increases, the values of g_W_min increase, as shown in row 2, and the values of the leading n_k_max, k = 5 and 6, which account for most of the retests, decrease, as shown in rows 7 and 8, respectively. This indicates that if we allow more retests (that is, increase r_T), we can set more stringent threshold values (that is, increase g_W_min and decrease the leading n_k_max values), so as to save more overkills (that is, lower the average overkill percentage), as indicated in the last row of Table 3.1.

⁷ The reason we use 6-inch wafer products is to make the bins and overkills easier to identify in the experiments. In fact, our results apply to wafers of any size.

Figure 3.4: The resulting (E[V],E[R]) pairs of the 521 test wafers based on the vectors of threshold values determined by the two-level algorithm, the random generator, the three-sigma limit, the six-sigma limit, GA, and SA.

Table 3.1: The good enough vector of threshold values and the average overkill percentage of product A for three different values of r_T.

To demonstrate the real-world performance of the vectors of threshold values obtained by the two-level algorithm for the three cases shown in Table 3.1, we use 521 real test wafers, for which the number of dies of every bin, b_jk, j = 1,…,521, k = 1,…,12, and the overkills before retest, v_jk, j = 1,…,521, k = 1,…,12, are known. The resulting pairs of the average overkills per wafer, E[V], and the average retests per wafer, E[R], over these 521 test wafers are shown in Figure 3.4 as three distinctly marked points, with the corresponding r_T shown in the top right corner of the figure. We also use 2000 randomly selected vectors of threshold values to test the same 521 wafers; the resulting pairs of E[V] and E[R] are shown as a separate set of marked points in Figure 3.4.

We see that for E[R] of about 10, the E[V] produced by the good enough vector of threshold values obtained by the two-level algorithm is almost the minimum among those produced by the randomly selected vectors of threshold values. Similar conclusions can be drawn for the cases of r_T = 30 and 50. Since reducing overkills and reducing retests are conflicting goals, the considered unconstrained stochastic optimization problem (3.2) possesses Pareto optimal solutions. From Figure 3.4, we can see that the results we obtained for r_T = 10, 30, and 50 lie almost on the boundary of the region formed by the randomly generated vectors of threshold values; this implicit boundary represents the (E[V],E[R]) pairs produced by the Pareto optimal vectors of threshold values.
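The implicit boundary mentioned above can be made explicit by filtering a cloud of (E[V],E[R]) pairs down to its non-dominated points. The following is a minimal sketch, not the procedure used in the experiments; the function name `pareto_front` and the sample points are hypothetical, and both coordinates are assumed to be minimized.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (E[V], E[R]) for one vector of threshold values


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when both E[V] and E[R] are minimized."""
    front: List[Point] = []
    best_r = float("inf")
    # Sort by E[V] (ties broken by E[R]); a point is non-dominated only if its
    # E[R] is strictly smaller than every E[R] seen so far in that order.
    for v, r in sorted(set(points)):
        if r < best_r:
            front.append((v, r))
            best_r = r
    return front


# Hypothetical (E[V], E[R]) pairs, for illustration only.
sample = [(5.0, 10.0), (4.0, 30.0), (3.0, 50.0), (6.0, 12.0), (4.5, 35.0)]
front = pareto_front(sample)
```

With these sample values, the points (6.0, 12.0) and (4.5, 35.0) are dominated and are dropped, which mirrors how a vector of threshold values lying inside the region of Figure 3.4 is worse in both objectives than some point on the boundary.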

We also use the three-sigma limit and the six-sigma limit to determine the threshold values, such that $g_{W,\min}^{3\sigma} = \mu_g - 3\sigma_g$, $n_{k,\max}^{3\sigma} = \mu_k + 3\sigma_k$, $k = 1,\dots,12$, and $g_{W,\min}^{6\sigma} = \mu_g - 6\sigma_g$, $n_{k,\max}^{6\sigma} = \mu_k + 6\sigma_k$, $k = 1,\dots,12$, where $\mu_g$ and $\sigma_g$, the mean and standard deviation of the number of good dies in a wafer, and $\mu_k$ and $\sigma_k$, the mean and standard deviation of the number of dies of bin k, are obtained from the data set of 521 test wafers. Using these threshold values to test the same set of 521 test wafers, the resulting (E[V],E[R]) pairs from the three-sigma limit and the six-sigma limit are also shown in Figure 3.4, each with its own marker. For E[R] of about 10, our method saves 22% and 24% more overkills than the three-sigma limit and the six-sigma limit, respectively. Considering the vast number of dies manufactured per month, the increased profit due to saving overkills is too large to neglect. Furthermore, neither the three-sigma limit nor the six-sigma limit generates a Pareto optimal solution for (3.2), and neither can control the level of retests as our method does.
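The sigma-limit thresholds described above can be computed directly from per-wafer counts. The sketch below is illustrative only: the function name, the input layout, and the use of the population standard deviation are assumptions, as is the sign convention (a lower limit for good dies and an upper limit for each bin).

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def sigma_limit_thresholds(
    good_dies: Sequence[int],
    bin_counts: Sequence[Sequence[int]],
    c: float = 3.0,
) -> Tuple[float, List[float]]:
    """Return (g_W_min, [n_k_max for each bin k]) using c-sigma limits.

    good_dies[j]     : number of good dies on wafer j
    bin_counts[j][k] : number of dies of bin k on wafer j
    """
    # Lower limit on good dies: mean minus c standard deviations (assumed sign).
    g_min = mean(good_dies) - c * pstdev(good_dies)
    n_max: List[float] = []
    for k in range(len(bin_counts[0])):
        column = [wafer[k] for wafer in bin_counts]
        # Upper limit on bin k: mean plus c standard deviations (assumed sign).
        n_max.append(mean(column) + c * pstdev(column))
    return g_min, n_max


# Hypothetical data for three wafers and two bins.
g3, n3 = sigma_limit_thresholds([140, 138, 142], [[10, 5], [12, 5], [14, 5]])
g6, n6 = sigma_limit_thresholds([140, 138, 142], [[10, 5], [12, 5], [14, 5]], c=6.0)
```

Because these limits depend only on the sample mean and spread, they contain no parameter playing the role of r_T, which is why they cannot be tuned to a tolerable retest level.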

We have also used a typical GA and a Simulated Annealing (SA) [42] algorithm to solve (3.2) for the case of r_T = 10. As indicated at the beginning of Section 3.4, global search techniques are computationally expensive in solving (3.2). We stop the GA and SA when they have consumed 50 times the CPU time consumed by the two-level algorithm; the objective values of (3.2) they obtain are still 5.4% and 8.1% higher than the final objective value obtained by the two-level algorithm, respectively. Using the threshold values they obtained to test the 521 wafers, the resulting (E[V],E[R]) pairs from GA and SA are marked in Figure 3.4 by “+” and a second symbol, respectively. We found that, for E[R] of about 10, the two-level algorithm saves 6.2% and 8.6% more overkills than GA and SA, respectively. In addition, neither GA nor SA generates a Pareto optimal solution, because the best-so-far solutions they obtained after 5 hours of CPU time are still far from the optimal solution of (3.2).
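As a quick check on the stopping budget quoted above, and assuming the reference is the roughly 6.05 minutes reported for one case of the two-level algorithm including training time, 50 times that budget comes to about 5 hours:

```latex
\[
50 \times 6.05\ \text{min} = 302.5\ \text{min} \approx 5.04\ \text{h}
\]
```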

There are 10 bins in the wafers of product B. The probability mass function P(B_k = n), k = 1,…,10, and the probability of the number of overkills in bin k, p_k, k = 1,…,10, are given. The yield rate of product B is 46.6%. We employed the same sigmoid-type function as that used for product A; however, the normalized coefficient is 0.1207. The other settings of the two-level algorithm applied to product B are similar to those of product A. We have again simulated three cases with different values of r_T: 20, 40, and 80. The CPU time consumed in each case plus the training time is approximately 5.2 minutes. More retests are required here because the yield of product B is lower than that of product A. The good enough vectors of threshold values and the average overkill percentage for the three values of r_T obtained by our algorithm are shown in Table 3.2. The relation between r_T and both the threshold values and the average overkill percentage follows the same trend as for product A. From Table 3.2, we can observe that as r_T increases, the values of g_W_min increase, as shown in row 2, and the values of the leading n_k_max, k = 8 and 9, which account for most of the retests, decrease, as shown in rows 10 and 11, respectively.

We use 590 real wafers of product B to test the performance of the vectors of threshold values shown in Table 3.2. The resulting pairs of E[V] and E[R] are shown in Figure 3.5, together with the pairs of E[V] and E[R] of the 2000 randomly generated vectors of threshold values applied to the same set of 590 wafers. From this figure, we can see that the results we obtained for r_T = 20, 40, and 80 lie almost on the boundary of the region formed by the randomly generated vectors of threshold values; this implicit boundary represents the (E[V],E[R]) pairs produced by the Pareto optimal vectors of threshold values.

We also use the three-sigma limit and the six-sigma limit to determine the threshold values; the resulting (E[V],E[R]) pairs are also shown in Figure 3.5, each with its own marker. For E[R] of about 20, our method saves 21% and 24% more overkills than the three-sigma limit and the six-sigma limit, respectively. We have also used a typical GA and SA algorithm to solve (3.2) for the case of r_T = 20. We stop the GA and SA when they have consumed 50 times the CPU time consumed by the two-level algorithm; the objective values of (3.2) they obtain are still 5.3% and 6.9% higher than the final objective value obtained by the two-level algorithm, respectively. Using the threshold values they obtained to test the 590 wafers, the resulting (E[V],E[R]) pairs from GA and SA are marked in Figure 3.5 by “+” and a second symbol, respectively. We found that, for E[R] of about 20, the two-level algorithm saves 6.6% and 4.2% more overkills than GA and SA, respectively.

Figure 3.5: The resulting (E[V],E[R]) pairs of the 590 test wafers based on the vectors of threshold values determined by the two-level algorithm, the random generator, the three-sigma limit, the six-sigma limit, GA, and SA.

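| Good enough vector of threshold values | r_T = 20 | r_T = 40 | r_T = 80 |
|----------------------------------------|----------|----------|----------|
| g_W_min                                | 118      | 131      | 146      |
| n_1_max                                | 6        | 2        | 3        |
| n_2_max                                | 4        | 4        | 1        |
| n_3_max                                | 3        | 4        | 5        |
| n_4_max                                | 6        | 3        | 3        |
| n_5_max                                | 5        | 1        | 4        |
| n_6_max                                | 10       | 5        | 5        |
| n_7_max                                | 7        | 4        | 2        |
| n_8_max                                | 65       | 54       | 38       |
| n_9_max                                | 76       | 63       | 45       |
| n_10_max                               | 18       | 13       | 9        |
| E[V]/T_DB × 100% *                     | 2.36%    | 1.71%    | 0.56%    |

* T_DB: the total number of dies in a wafer of product B.

Table 3.2: The good enough vector of threshold values and the average overkill percentage of product B for three different values of r_T.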
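The trends read off Table 3.2 can be checked mechanically, and the overkill percentages can be converted back to dies per wafer. The sketch below copies the table values; it assumes that the last row is E[V] divided by T_DB and that T_DB equals the 206 dies per wafer stated for product B. The helper names are hypothetical.

```python
from typing import List, Sequence

T_DB = 206  # dies per wafer of product B, as stated at the start of the section

# Columns correspond to r_T = 20, 40, 80 (values copied from Table 3.2).
g_w_min = [118, 131, 146]
n8_max = [65, 54, 38]
n9_max = [76, 63, 45]
overkill_pct = [2.36, 1.71, 0.56]


def strictly_increasing(xs: Sequence[float]) -> bool:
    return all(a < b for a, b in zip(xs, xs[1:]))


def strictly_decreasing(xs: Sequence[float]) -> bool:
    return all(a > b for a, b in zip(xs, xs[1:]))


# Trend described in the text: g_W_min rises while the leading n_k_max values
# and the average overkill percentage fall as r_T grows.
trend_holds = (
    strictly_increasing(g_w_min)
    and strictly_decreasing(n8_max)
    and strictly_decreasing(n9_max)
    and strictly_decreasing(overkill_pct)
)

# Average overkilled dies per wafer implied by each percentage (assumed relation).
overkilled_dies: List[float] = [p / 100 * T_DB for p in overkill_pct]
```

Under these assumptions, the three cases correspond to roughly 4.9, 3.5, and 1.2 overkilled dies per wafer.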

3.6 Concluding Remarks

To cope with computationally intractable stochastic simulation optimization problems, we have proposed a two-level algorithm based on ordinal optimization theory that solves for a good enough solution within reasonable computational time. We have justified the performance of the proposed algorithm through simulations.

To demonstrate the applicability of the proposed algorithm, we have used it to solve for a vector of good enough threshold values that reduces overkills and retests in the wafer testing process of a wafer foundry. We have tested the performance of the solution using real data and found that the resulting average numbers of overkills and retests per wafer lie almost on the boundary formed by the Pareto optimal vectors of threshold values of the considered stochastic optimization problem. This indicates that the proposed algorithm not only controls the tolerable level of retests by taking the varying chip demand into account but also provides a near Pareto optimal vector of threshold values. The vector of good enough threshold values obtained by the proposed algorithm performs well in both solution quality and computational efficiency.

The proposed formulation for reducing overkills and retests is not limited to the testing process of a foundry; it can easily be adapted to general testing procedures. Likewise, the proposed two-level algorithm based on ordinal optimization theory is not limited to the problem considered in this chapter. In fact, it can be used to solve any hard optimization problem in which evaluating the performance of a decision variable requires lengthy computational time.
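The general idea of ordinal optimization referred to above, ranking many candidates with a cheap approximate model and spending the expensive evaluation only on a short list, can be sketched as follows. This is a generic illustration and not the two-level algorithm of this chapter: the function `two_level_select`, the toy objective, and the deterministic error term are all hypothetical; only the subset size s = 35 is taken from the parameter settings reported earlier.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def two_level_select(
    candidates: Sequence[T],
    crude_cost: Callable[[T], float],
    exact_cost: Callable[[T], float],
    s: int,
) -> T:
    """Rank all candidates with the cheap model, keep the best s,
    then return the best of those according to the expensive model."""
    shortlist = sorted(candidates, key=crude_cost)[:s]
    return min(shortlist, key=exact_cost)


# Toy problem (hypothetical): minimize (x - 3)^2 over a grid of 100 points.
def exact_cost(x: float) -> float:
    return (x - 3.0) ** 2


def crude_cost(x: float) -> float:
    # Exact cost plus a bounded deterministic error standing in for the
    # inaccuracy of a cheap surrogate model.
    return exact_cost(x) + 0.5 * math.sin(37.0 * x)


grid = [i / 10 for i in range(100)]
best = two_level_select(grid, crude_cost, exact_cost, s=35)
```

In this toy case the expensive model is called only 35 times instead of 100, yet the true minimizer survives the crude ranking because the surrogate error is small relative to the spread of the costs. That is the ordering-over-value argument that makes the approach attractive when each exact evaluation is a lengthy simulation.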

Chapter 4 Conclusions and Perspectives

Two related issues concerning the throughput and yield of wafer fabrication and testing processes have been discussed. For the first issue, we have presented a classification-based fault detection and isolation scheme for the ion implanter. For the second issue, we have presented an ordinal optimization approach for finding threshold values that reduce the overkills of dies under a tolerable retest level in the wafer testing process.

For the first issue, the proposed classification-based fault detection and isolation scheme is a general methodology. By modifying the warning signal generation criteria to meet an individual machine’s needs, the fault detection scheme can be applied beyond the ion implanter. The simplicity of the HCT-based fault isolation scheme makes HCT worthwhile, especially since its accuracy can be compensated for by the warning signal generation criteria when it is applied to the ion implanter. Owing to the efficient learning capability of HCT and the 0.05-second classification time for classifying the recipe of a working wafer, the proposed fault detection and isolation scheme can operate online and in real time.

For the second issue, the proposed two-level algorithm based on ordinal optimization theory is not limited to the problem considered in this dissertation. In fact, it can be used to solve any hard optimization problem in which evaluating the performance of a decision variable requires lengthy computational time. Although the approach presented in this dissertation was illustrated using a wafer testing problem, it is well suited to other application areas.

References

[1] C. M. McKenna, “A personal historical perspective of ion implantation equipment for semiconductor applications,” 2000 International Conference on Ion Implantation Technology, Alpbach, Austria, pp. 1-19, Sep. 2000.

[2] P. M. Frank, “Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy,” Automatica, vol. 26, no. 3, pp. 459-474, May 1990.

[3] R. Isermann, “Fault diagnosis of machines via parameter estimation and knowledge processing,” Automatica, vol. 29, no. 4, pp. 815-835, July 1993.

[4] J. Gertler, Fault Detection and Diagnosis in Engineering Systems, New York: Marcel Dekker, May 1998.

[5] M. Basseville and A. Benveniste (eds.), “Detection of abrupt changes in signals and dynamical systems,” Lecture Notes in Control and Information Sciences, vol. 77, Berlin: Springer-Verlag, Dec. 1985.

[6] H. H. Yue, S. J. Qin, R. J. Markle, C. Nauert and M. Gatto, “Fault detection of plasma etchers using optical emission spectra,” IEEE Transactions on Semiconductor Manufacturing, vol. 13, no. 3, pp. 374-385, Aug. 2000.

[7] R. Isermann, “Model-based fault detection and diagnosis - status and applications,” 16th Symposium on Automatic Control in Aerospace, St. Petersburg, Russia, June 2004.

[8] G. M. Smith, Statistical Process Control and Quality Improvement, 5th ed., Upper Saddle River, NJ: Prentice Hall, July 2003.

[9] M. H. Dunham, Data Mining: Introductory and Advanced Topics, Englewood Cliffs, NJ: Prentice Hall, Aug. 2002.

[10] L. Breiman, J. H. Friedman, J. A. Olshen and C. J. Stone, Classification and Regression Trees, London: Chapman & Hall, Jan. 1984.

[11] G. P. Zhang, “Neural networks for classification: a survey,” IEEE Transactions on Systems, Man and Cybernetics, Part C, vol. 30, no. 4, pp. 451-462, Nov. 2000.

[12] L. Bruzzone and D. F. Prieto, “Unsupervised retraining of a maximum likelihood classifier for the analysis of multitemporal remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 39, no. 2, pp. 456-460, Feb. 2001.

[13] X. Chang and J. H. Lilly, “Evolutionary design of a fuzzy classifier from data,” IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 34, no. 2, pp. 1031-1044, April 2004.

[14] J. R. Quinlan, C4.5: Programs for Machine Learning, San Mateo, Calif.: Morgan Kaufmann, Jan. 1993.

[15] K. R. Müller, S. Mika, G. Rätsch, K. Tsuda and B. Schölkopf, “An introduction to kernel-based learning algorithms,” IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181-202, March 2001.

[16] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, Oct. 2001.

[17] J. H. Friedman, “Stochastic gradient boosting,” Computational Statistics & Data Analysis, vol. 34, no. 4, pp. 367-378, Feb. 2002.

[18] A. Borisov, V. Eruhimov and E. Tuv, “Boosting flexible learning ensembles with dynamic feature selection,” NIPS 2003 workshop on feature extraction, British Columbia, CA, Dec. 2003.

[19] G. Grimmett and D. Stirzaker, Probability and Random Processes, 3rd ed., Oxford: Oxford University Press, May 2001.

[20] H. Ishibuchi and T. Nakashima, “Effect of rule weights in fuzzy rule-based classification systems,” IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 506-515, Aug. 2001.

[21] J. R. Quinlan, (2003) Data Mining Tools See5 and C5.0, version 1.20. [Online]. Available: http://www.rulequest.com/see5-info.html

[22] S. Muriel, P. Garcia, O. Marie-Richard, M. Monleon and M. Recio, “Statistical bin analysis on wafer probe,” 2001 IEEE/SEMI Advanced Semiconductor Manufacturing Conference and Workshop, Munich, Germany, pp. 187-192, April 2001.

[23] D. C. Montgomery, Introduction to Statistical Quality Control, 5th ed., New York: John Wiley and Sons, July 2004.

[24] J. Andersson, A Survey of Multiobjective Optimization in Engineering Design, Technical Report No. LiTH-IKP-R-1097, Department of Mechanical Engineering, Linköpings University, Sweden, 2000.

[25] K. M. Miettinen, Nonlinear Multiobjective Optimization, Boston: Kluwer Academic Publishers, Oct. 1999.

[26] Y. C. Ho, “An explanation of ordinal optimization: Soft computing for hard problems,” Information Sciences, vol. 113, no. 3-4, pp. 169-192, Feb. 1999.

[27] T. W. E. Lau and Y. C. Ho, “Universal alignment probability and subset selection for ordinal optimization,” Journal of Optimization Theory and Applications, vol. 39, no. 3, pp. 455-489, June 1997.

[28] C.-H. Chen, S. D. Wu and L. Dai, “Ordinal comparison of heuristic algorithms using stochastic optimization,” IEEE Transactions on Robotics and Automation, vol. 15, no. 1, pp. 44-56, Nov. 1999.

[29] B.-W. Hsieh, C.-H. Chen and S.-C. Chang, “Scheduling semiconductor wafer fabrication by using ordinal optimization-based simulation,” IEEE Transactions on Robotics and Automation, vol. 17, no. 5, pp. 599-608, Oct. 2001.

[30] S.-Y. Lin and Y. C. Ho, “Universal alignment probability revisited,” Journal of Optimization Theory and Applications, vol. 113, no. 2, pp. 399-407, May 2002.

[31] S.-Y. Lin, Y. C. Ho and C.-H. Lin, “An ordinal optimization theory based algorithm for solving the optimal power flow problem with discrete control variables,” IEEE Transactions on Power Systems, vol. 19, no. 1, pp. 276-286, Feb. 2004.

[32] M. Bosque, Understanding 99% of Artificial Neural Networks: Introduction & Tricks, San Jose: Writers Club Press, March 2002.

[33] C. R. Reeves and J. E. Rowe, Genetic Algorithms: Principles and Perspectives: A Guide to GA Theory, Boston: Kluwer Academic Publishers, Dec. 2002.

[34] G. A. F. Seber and C. J. Wild, Nonlinear Regression, New York: John Wiley and Sons, Sep. 2003.

[35] K. Hornik, M. Stinchcombe and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.

[36] T. Chen, H. Chen and R. W. Liu, “Approximation capability in C(R¯ⁿ) by multilayer feedforward networks and related problems,” IEEE Transactions on Neural Networks, vol. 6, no. 1, pp. 25-30, Jan. 1995.

[37] J. G. Attali and G. Pagès, “Approximation of functions by a multilayer perceptron: a new approach,” Neural Networks, vol. 10, no. 6, pp. 1069-1081, Aug. 1997.

[38] C. G. Panayiotou, C. G. Cassandras and W. B. Gong, “Model abstraction for discrete event systems using neural networks and sensitivity information,” Proceedings of the 2000 Winter Simulation Conference, Orlando, FL, USA, vol. 1, pp. 335-341, Dec. 2000.

[39] R. A. Kilmer, A. E. Smith and L. J. Shuman, “Computing confidence intervals for stochastic simulation using neural network metamodels,” Computers and Industrial Engineering, vol. 36, no. 2, pp. 391-407, April 1999.

[40] G. Lera and M. Pinzolas, “Neighborhood based Levenberg-Marquardt algorithm for neural network training,” IEEE Transactions on Neural Networks, vol. 13, no. 5, pp. 1200-1203, Sep. 2002.

[41] M. F. Moller, “A scaled conjugate gradient algorithm for fast supervised learning,” Neural Networks, vol. 6, no. 4, pp. 525-533, 1993.

[42] S. M. Sait and H. Youssef, Iterative Computer Algorithms with Applications in Engineering: Solving Combinatorial Optimization Problems, Los Alamitos: IEEE Computer Society, Aug. 1999.

[43] E. K. P. Chong and S. H. Żak, An Introduction to Optimization, 2nd ed., New York: John Wiley and Sons, July 2001.

List of Publications

Name: Shih-Cheng Horng (洪士程)

Journal papers:

1. Shin-Yeu Lin and Shih-Cheng Horng, “A Classification Based Fault Detection and Isolation Scheme for the Ion Implanter”, accepted to appear in IEEE Transactions on Semiconductor Manufacturing. (EI, SCI)

2. Shin-Yeu Lin and Shih-Cheng Horng, “Application of an Ordinal Optimization Algorithm to the Wafer Testing Process”, accepted to appear in IEEE Transactions on Systems, Man and Cybernetics, Part A. (EI, SCI)

Conference papers:

1. Shin-Yeu Lin, Shih-Cheng Horng, “Ordinal Optimization Approach to Stochastic Simulation Optimization Problems and Applications”, Proceedings of the 15th IASTED International Conference on Applied Simulation and Modelling, pp. 274-279, Rhodes, Greece, June 26-28, 2006.

2. Shih-Cheng Horng, Shin-Yeu Lin, “A Hybrid Classification Tree for Products of Complicated Machines in Flexible Manufacturing Systems”, Proceedings of IEEE SMC 2005 - International Conference on Systems, Man and Cybernetics, pp. 3775-3780, Hawaii, USA, Oct. 10-12, 2005.

3. Shin-Yeu Lin, Shih-Cheng Horng, Chi-Hsing Tsai, “Fault Detection of the Ion Implanter Using Classification Approach”, Proceedings of the 5th Asian Control Conference, pp. 808-813, Melbourne, Australia, July 20-23, 2004.

4. Chi-Hsing Tsai, Shin-Yeu Lin, Mu-Huo Cheng, Shih-Cheng Horng, Chun-Hung Liu, Wen-Yo Lee, Chia-Hung Tsai, “An Effective and Efficient Hierarchical Fuzzy Rule Based Classifier”, Proceedings of IEEE International Conference on Machine Learning and Cybernetics 2003, vol. 4, pp. 2173-2178, Xi-An, China, Nov. 2-5, 2003.

5. S. C. Horng, S. Y. Lin, M. H. Cheng, F. Y. Yang, C. H. Lin, W. H. Lee, C. H. Tsai,

Source document: Two Issues Concerning the Throughput and Yield of Wafer Fabrication and Testing Processes and Their Solutions (pp. 68-81)






