Calculating next matrix algorithm:

Input: An s-feature matrix M^s for a table T.

Output: An (s+1)-feature matrix M^{s+1} for the table T.

Step 1: For each j, j = 1 to |C^s| - 1, do the following steps.

Step 2: For each l, l = (j mod m) + 1 to m, do the following sub-steps.

Step 2.1: Set NV^{s+1}_j = NV^s_j OR NV^1_l.

Step 2.2: Set the temporary counter k to 1.

Step 2.3: For each feature-value vector F^s_{jx} in M^s_j, 1 ≤ x ≤ |C^s_j|, do the following sub-steps:

Step 2.3.1: For each feature-value vector F^1_{ly} in M^1_l, 1 ≤ y ≤ |C^1_l|, do the following sub-steps:

Step 2.3.1.1: Set RV^{s+1}_{jk} = RV^s_{jx} AND RV^1_{ly}.

Step 2.3.1.2: Set CV^{s+1}_{jk} = CV^s_{jx} AND CV^1_{ly}.

Step 2.3.1.3: If CV^{s+1}_{jk} ≠ UNIQUE_σm, set CV^{s+1}_{jk} = Find class vector algorithm(RV^{s+1}_{jk}).

Step 2.3.1.4: Set k = k + 1.

Step 3: Return the (s+1)-feature matrix M^{s+1}.

For example, the 2-feature matrix M² for the data in Table 7.1 is generated from the 1-feature matrix M¹ as follows. The name vector for feature C²1 is first calculated.

Thus:

NV²1 = NV¹1 OR NV¹2

= 1000 OR 0100

= 1100.

The feature-value vector F²11 in M²1 is then calculated. The record vector is found as follows:

RV²11 = RV¹11 AND RV¹21

= 1100100000 AND 1110000000

= 1100000000.

The class vector is found as follows:

CV²11 = CV¹11 AND CV¹21

= 110 AND 100

= 100.
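The bit-wise arithmetic of this worked example can be checked with a short Python sketch. The bit strings are those quoted above; the helper name `bits` and the fixed widths are illustrative assumptions and are not part of the original algorithm.

```python
def bits(value: int, width: int) -> str:
    """Format an integer as a zero-padded bit string of the given width."""
    return format(value, f"0{width}b")


# Name vector: OR of the two 1-feature name vectors.
nv = int("1000", 2) | int("0100", 2)

# Record vector: AND of the two 1-feature record vectors.
rv = int("1100100000", 2) & int("1110000000", 2)

# Class vector: AND of the two 1-feature class vectors.
cv = int("110", 2) & int("100", 2)

print(bits(nv, 4), bits(rv, 10), bits(cv, 3))
```

Storing each vector as an integer means that every OR or AND above is a single machine-level operation on vectors of this size.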

In a similar way, all the feature-value vectors in the 2-feature matrix M² can be found. The results are shown in Table 7.5:

Table 7.5: The 2-feature matrix M² found by the Calculating next matrix algorithm

Feature Set  Feature Set Value  Name Vector  Record Vector  Class Vector
C²1          V²11               1100         1100000000     100
C²1          V²12               1100         0000100000     010
C²1          V²13               1100         0010000000     100
C²1          V²14               1100         0001010000     011
C²1          V²15               1100         0000001000     001
C²1          V²16               1100         0000000100     001
C²2          V²21               1010         1000000000     100
C²2          V²22               1010         0100000000     100
C²2          V²23               1010         0000100000     010
C²2          V²24               1010         0001010000     011
C²2          V²25               1010         0010000000     100
C²2          V²26               1010         0000001100     001
C²3          V²31               1001         1000100000     110
C²3          V²32               1001         0100000000     100
C²3          V²33               1001         0011000000     110
C²3          V²34               1001         0000010000     001
C²3          V²35               1001         0000001100     001
C²4          V²41               0110         1000000000     100
C²4          V²42               0110         0110000000     100
C²4          V²43               0110         0001011000     011
C²4          V²44               0110         0000100000     010
C²4          V²45               0110         0000000100     001
C²5          V²51               0101         1010000000     100
C²5          V²52               0101         0100000000     100
C²5          V²53               0101         0001100000     010
C²5          V²54               0101         0000011000     001
C²5          V²55               0101         0000000100     001
C²6          V²61               0011         1001000000     110
C²6          V²62               0011         0000011100     001
C²6          V²63               0011         0010000000     100
C²6          V²64               0011         0100000000     100
C²6          V²65               0011         0000100000     010

Note that in Step 2.3.1.2, the class vector derived by the bit-wise "AND" operator denotes only the "possible" class distribution. For example, the feature-value vector F²21 consists of RV²21 = "1000000000" and CV²21 = "110" after Step 2.3.1.2. Since RV²21 contains a single record and each record belongs to only one class, this class vector cannot be correct; the actual class vector is CV²21 = "100". Step 2.3.1.2 therefore serves only as a quick check: if CV^{s+1}_{jk} ≠ UNIQUE_σm, the Find class vector algorithm is run in Step 2.3.1.3 to find the correct class vector.
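The quick check and its fallback can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the UNIQUE condition is read as "exactly one class bit is set"; `find_class_vector` is a hypothetical stand-in for the Find class vector algorithm, which is defined elsewhere; the per-class record vectors are inferred from the class vectors in Table 7.5 (records 9 and 10 are left unassigned); and the second operand in the usage lines is a hypothetical 1-feature vector chosen so that the AND reproduces the values quoted for F²21.

```python
from typing import List, Tuple


def is_unique(cv: int) -> bool:
    """True when exactly one class bit is set (assumed meaning of UNIQUE)."""
    return cv != 0 and cv & (cv - 1) == 0


def find_class_vector(rv: int, class_rvs: List[int]) -> int:
    """Hypothetical stand-in for the Find class vector algorithm.

    class_rvs[i] is the record vector of class i + 1. The result sets the
    bit of every class that shares at least one record with rv.
    """
    n_classes = len(class_rvs)
    cv = 0
    for i, class_rv in enumerate(class_rvs):
        if rv & class_rv:
            cv |= 1 << (n_classes - 1 - i)  # leftmost bit is class 1
    return cv


def combine(fv_a: Tuple[int, int], fv_b: Tuple[int, int],
            class_rvs: List[int]) -> Tuple[int, int]:
    """Steps 2.3.1.1 to 2.3.1.3 for one pair of (record, class) vectors."""
    rv = fv_a[0] & fv_b[0]
    cv = fv_a[1] & fv_b[1]  # quick check: only the "possible" classes
    if not is_unique(cv):
        cv = find_class_vector(rv, class_rvs)
    return rv, cv


# Assumed class membership, inferred from Table 7.5.
CLASS_RVS = [0b1110000000, 0b0001100000, 0b0000011100]

rv, cv = combine((0b1100100000, 0b110), (0b1001000000, 0b110), CLASS_RVS)
print(format(rv, "010b"), format(cv, "03b"))
```

The fallback runs only when the cheap AND leaves more than one candidate class, so most vector pairs cost just two bit-wise operations.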

After the new feature matrix is derived, the Selecting feature set algorithm is executed again to find an appropriate feature set. For the above example, the 2-feature matrix M² is input to the Selecting feature set algorithm, and the feature set FS = {C2, C4} is found and returned as the solution.

After the above method is executed, the feature set FS for classifying the given data set T has been generated. FS may overfit or underfit the problem since it is derived only from the current data set. The selected features are therefore evaluated and modified by domain experts; they serve as candidates that give the experts a good starting point.

7.3 Complexity Analysis and Experiments

The time and space complexities of the proposed algorithms are analyzed in this section. Let n be the number of records, m be the number of features and c be the number of classes. Also define i as the maximum possible number of features in a feature set, j as the maximum number of possible values of a feature, and s as the number of iterations. The time and space complexities of each step in the Find class vector algorithm are shown in Table 7.6.

Table 7.6: The time and space complexities of the Find class vector algorithm

Step No Time Complexity Space Complexity

Step 1 O(1) O(c)

Step 2 O(jc) O(jc)

Step 3 O(1) O(c)

Total O(jc) O(jc)

The time and space complexities of each step in the Create cleansing tree algorithm are shown in Table 7.7. Note that the maximum number of nodes within a Ctree is n.

Table 7.7: The time and space complexities of the Create cleansing tree algorithm

Step No Time Complexity Space Complexity

Step 1 O(1) O(1)

Step 2 O(1) O(1)

Step 3 O(1) O(1)

Step 4 O(nmj) O(n)

Step 5 O(mj) O(n)

Step 6 O(1) O(1)

Step 7 O(1) O(1)

Total O(nmj) O(n)^*

The time and space complexities of each step in the Find span order algorithm are shown in Table 7.8:

Table 7.8: The time and space complexities of the Find span order algorithm

Step No Time Complexity Space Complexity

Step 1 O(m) O(m)

Step 2 O(cm) O(cm)

Step 3 O(clgc) O(c)

Step 4 O(1) O(1)

Total O(Max(cm, clgc)) O(cm)

The time and space complexities of each step in the Cleansing feature matrix algorithm are shown in Table 7.9:

Table 7.9: The time and space complexities of the Cleansing feature matrix algorithm

Step No Time Complexity Space Complexity

Step 1 O(mj) O(mj)

Step 2 O(1) O(1)

Total O(mj) O(mj)

The time and space complexities of each step in the Selecting feature set algorithm are shown in Table 7.10.

Table 7.10: The time and space complexities of the Selecting feature set algorithm

Step No Time Complexity Space Complexity

Step 1 O(1) O(1)

Step 2 O(m^s j^s) O(1)

Step 3 O(1) O(1)

Step 4 O(j^s) O(1)

Step 5 O(1) O(1)

Step 6 O(c) O(c)

Step 7 O(1) O(1)

Total O(m^s j^s) O(c)

The time and space complexities of each step in the Calculating next matrix algorithm are shown in Table 7.11:

Table 7.11: The time and space complexities of the Calculating next matrix algorithm

Step No Time Complexity Space Complexity

Step 1 O(m^s j^s) O(m^s j^s)

Step 2 O(mj) O(mj)

Step 3 O(1) O(j)

Total O(m^s j^s) O(m^s j^s)
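To give a feel for how quickly the O(m^s j^s) term grows, the following Python sketch counts an upper bound on the number of feature-value vectors at level s. It assumes that every feature has at most j values, so there are C(m, s) feature sets with at most j^s value combinations each; this count is consistent with the bound in Table 7.11 but is not taken from the original analysis. The function name and the parameters m = 4 and j = 3 are illustrative assumptions.

```python
from math import comb


def max_feature_value_vectors(m: int, j: int, s: int) -> int:
    """Upper bound on the feature-value vectors in an s-feature matrix:
    C(m, s) feature sets, each with at most j**s value combinations."""
    return comb(m, s) * j ** s


# Compare the combinatorial count with the looser m^s * j^s bound.
for s in range(1, 5):
    count = max_feature_value_vectors(m=4, j=3, s=s)
    bound = (4 * 3) ** s
    print(f"s={s}: at most {count} vectors, m^s j^s = {bound}")
```

Because C(m, s) never exceeds m^s, the count always stays within the tabulated bound, and both grow exponentially with s.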

To evaluate the performance of the proposed method, we compare it with other feature selection methods. The target machine is a 1 GHz Pentium III system running the multithreaded Microsoft Windows 2000 OS, with 512 KB of L2 cache and 256 MB of shared memory.

Several datasets from the UCI Repository [60] are used for the experiments. These datasets have different characteristics: some have known relevant features (such as Monks), some have many classes (such as SoybeanL), and some have many instances (such as Mushroom). In addition, a large real data set about endowment insurance from a worldwide financial group is used to examine the usability of the proposed method. Experimental results show that the proposed method can discover the desired feature sets and can thus help the enterprise build a CBR system for the loan promotion function of its customer relationship management system. The insurance data set uses 27 condition features to describe the states of 3 different insurance types. The attribute values are of different types, including date/time, numeric and symbolic data; they are all transformed into the symbolic type by clustering methods. Six of the features have missing values.

The characteristics of the above datasets are summarized in Table 7.12.

Table 7.12: The datasets used in the experiments

Database Name  Class No.  Condition Feature No.  Record No.  Missing Features
Monk1          2          6                      124         No
Monk2          2          6                      169         No
Monk3          2          6                      122         No
Vote           2          16                     300         No
Mushroom       2          22                     8124        Yes
SoybeanL       19         35                     683         Yes
Insurance      3          27                     35000       Yes

In the experiments, the accuracy, the number of selected features, and the computation time of the traditional rough set approach and the proposed bitmap-based approach are compared. The accuracy is measured by the classification results on the target table. If the selected feature set can solve the problem without any error, 100% accuracy is reached; otherwise the accuracy is calculated as the number of correctly classified records over the total number of records. Experimental results show that both methods can reach 100% accuracy. We then compare the feature sets found by the two approaches. The results are shown in Table 7.13. The accuracy on all datasets is 100% since both methods discover the minimal feature sets.

Table 7.13: The selected feature sets found by the two approaches.

Dataset    Traditional Rough Set Approach   Bitmap-based Approach                                  Accuracy
Monk1      C1, C2, C5                       C1, C2, C5                                             100%
Monk2      C1-C6                            C1-C6                                                  100%
Monk3      C1, C2, C4, C5                   C1, C2, C4, C5                                         100%
Vote       C1-C4, C9, C11, C13, C16         C1-C4, C9, C11, C13, C16                               100%
Mushroom   C3, C4, C11, C20                 C3, C4, C11, C20                                       100%
SoybeanL   Need too much computation time.  C14, C20, C26, C27, C29, C30, C31, C32, C33, C34, C35  100%
Insurance  C4, C15, C17, C20, C22, C25      C4, C15, C17, C20, C22, C25                            100%

Note that there may be more than one solution for the selected features. In Table 7.13, only the first selected feature set (in alphabetical order) is listed. The feature sets selected by our proposed approach and by the traditional rough set approach are the same except for the SoybeanL problem, for which the traditional rough set approach needs too much computation time.

The numbers of features selected by the two approaches are shown in Table 7.14. Both methods get the same numbers for all problems except for SoybeanL.

Table 7.14: The number of the selected features found by the two approaches.

Dataset Traditional RS Bitmap-based

Monk1 3 3

Monk2 6 6

Monk3 4 4

Vote 8 8

Mushroom 4 4

SoybeanL 11 11

Insurance 6 6

Finally, the computation time is compared. The data sets are first loaded into memory from the hard disk, and the processing times are then measured. A time is rounded to 0 if the real time is less than 0.001 seconds. The results are shown in Table 7.15.

Table 7.15: The CPU times (in seconds) needed by the two approaches

Dataset    Traditional RS  Bitmap-based
Monk1      0.07            0
Monk2      0.351           0.01
Monk3      0.141           0
Vote       428.19          1.923
Mushroom   4911.32         27.91
SoybeanL   >1000000        247805
Insurance  468656          2435.66

As expected, the proposed approach is much faster than the traditional rough set approach. For the Insurance data in particular, our approach needs only about 40 minutes (2435.66 seconds), whereas the traditional rough set approach needs about 130 hours (468656 seconds).
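The speedups implied by Table 7.15 can be computed with a short Python sketch. It assumes the tabulated times are in seconds, as the rounding rule in the text suggests. Monk1 and Monk3 are omitted because their bitmap-based times are rounded to 0, and SoybeanL is omitted because only a lower bound is reported for the traditional approach. The names `cpu_times` and `speedup` are illustrative.

```python
# CPU times from Table 7.15 as (traditional RS, bitmap-based), assumed seconds.
cpu_times = {
    "Monk2": (0.351, 0.01),
    "Vote": (428.19, 1.923),
    "Mushroom": (4911.32, 27.91),
    "Insurance": (468656, 2435.66),
}


def speedup(traditional: float, bitmap: float) -> float:
    """Ratio of traditional rough set time to bitmap-based time."""
    return traditional / bitmap


for name, (traditional, bitmap) in cpu_times.items():
    print(f"{name}: {speedup(traditional, bitmap):.1f}x faster")

# The Insurance run of the bitmap-based approach, converted to minutes.
insurance_minutes = cpu_times["Insurance"][1] / 60
print(f"Insurance (bitmap-based): {insurance_minutes:.1f} minutes")
```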

In this chapter, we have proposed a bit-based feature selection approach to discover optimal feature sets for a given table (dataset). In this approach, the feature values are first encoded into bitmap indices so that the optimal solutions can be searched efficiently. The corresponding indexing and selecting algorithms are described in detail for implementing the proposed approach. Experimental results on different data sets have also shown the efficiency and accuracy of the proposed approach.

The traditional rough-set approach has two very time-consuming parts: the combination of features and the comparison of upper/lower approximations. In our method, single-clock-cycle bit-wise operations are used to shorten the computation time of the comparison part. Moreover, the workload of the combination part is greatly reduced since each new level of combinations can be generated from the previous ones.

The bit-wise operations are also used to speed up the combination generation. The proposed feature-selection approach also adopts appropriate meta-data structures to take advantage of the computational power of the bit-wise operations.

The feature selection problem is generally an NP-complete problem. Although the proposed approach can process a larger number of features than the traditional rough-set approach, it still becomes unmanageable when the number of features is huge or when the number of possible feature values is large. In the future, we will continue to investigate and design efficient heuristic approaches to manage huge numbers of features and possible values. We will also attempt to integrate different feature selection approaches so that an appropriate one can be selected automatically, according to the characteristics of the given data set, for optimal or near-optimal solutions.

Chapter 8 Using BWI Indexing in Semiconductor Manufacturing Defect Detection Systems

In this chapter, an unsupervised-learning, data-driven data mining system is introduced. It is part of a production-level defect detection system within the intelligent engineering data analysis (iEDA) system at Taiwan Semiconductor Manufacturing Company Ltd. (TSMC). The bit-wise indexing methods (including the Sample, Encapsulated and Compact Bit-wise Indexing Methods), data mining technologies, and statistical methods are combined in this application to generate a list of possible root-cause candidates from the manufacturing details of an individual low-yield event. Some critical issues in applying a data mining solution to manufacturing defect detection in the semiconductor manufacturing domain are also discussed and reviewed. Finally, we propose the system framework of a next-generation data mining solution that is more knowledgeable, reasonable, reliable and flexible for the semiconductor manufacturing domain.

8.1 Problem Description

In recent years, manufacturing procedures have become increasingly complex [16][17][18]. Rapidly identifying the root causes of defects is essential for meeting high expectations regarding yield targets. Therefore, the technologies of process control, statistical analysis and experiment design are used to establish a solid base for well-tuned manufacturing processes. However, identifying root causes remains extremely difficult because of the multi-factor and nonlinear interactions in this intermittent problem. Traditionally, the process of identifying the root causes of defects is costly. The semiconductor manufacturing industry provides an example. With a huge amount of semiconductor engineering data stored in the database and versatile analytical charting and reporting in production and development, the CIM/MES/EDA systems in most semiconductor manufacturing companies help users analyze the collected data to achieve the goal of yield enhancement. However, semiconductor manufacturing procedures are sophisticated, and thus large volumes of multi-dimensional data must be collected for these procedures. Data mining technologies [4][3][9][33] are employed to deal with such large amounts of high-dimensional data [6][16][17][29][41][51][52][59].

In this chapter, we propose a data mining system and describe the experience of applying such systems for discovering the root causes of low-yield situations in TSMC [16][17]. Additionally, the evaluation of applying such a mining system for manufacturing defect detection in the semiconductor manufacturing domain is discussed and reviewed. Finally, a new architecture for a reasonable, reliable and flexible defect detection platform based on the data mining approach is briefly described.

8.2 DM Project for Yield Enhancement

In June 2002, a research project on data mining techniques was initiated by the Manufacturing Information Technology Division of Taiwan Semiconductor Manufacturing Company. Five test cases were provided, including partial lot-based information, WIP information, CP information, in-line metrology results, WAT results and some manufacturing parameters. Each case represents a low-yield situation with an already discovered root cause related to some manufacturing procedure; however, all of the cases required extensive troubleshooting time. Based on the given cases, a prototype of the data-driven data mining system is required to discover the possible root causes for the subject cases. Since the company holds a large amount of data, the data mining system only discovers the killer machines for the cases that were prepared by product engineers in the event of an abnormal manufacturing situation. Additionally, the attribute weights in the given cases are initially treated as equal because of the lack of previous built-in knowledge. The engine is also required to be noise-insensitive since noise is difficult to filter in semiconductor yield enhancement applications.

Following the discussions on this project, the data mining system should be designed according to the following criteria:

1. Platform criterion: The data mining system needs to be executed in both server-end and client-end applications according to the functional specification of an iEDA (Intelligent Engineering Data Analysis) system in TSMC.

2. Development environment criterion: The data mining system should be developed as independent functional modules because of system integration and platform issues, and a prototype system integrating all proposed modules should be provided for testing and evaluation by TSMC.

3. Given data set criterion: Since the EDA system involves a vast and still growing quantity of data, it is impractical for the data mining system to analyze all manufacturing data in the EDA system. The data mining system is designed for analyzing a pre-generated data set in the event of a low-yield situation. Restated, the input data for the data mining system should be generated as a low-yield situation case. Each case involves some lot-based manufacturing information and comprises a maximum of six segments (basic lot information, WIP information, CP information, in-line metrology results, WAT results and other manufacturing parameters), together with a unique decision feature used to classify the given lots into high-yield and low-yield groups. As mentioned above, the data mining system is designed as a data-driven solution, and no previous knowledge is built in to recognize the attribute catalog and type; instead, the attributes of all given cases processed by the data mining system are named according to pre-defined naming rules. Furthermore, user-prepared data files are acceptable only if the attribute naming rules are followed.

4. Accuracy criterion: In this data mining project, the accuracy rate should exceed 80% in all cases. The percentage of hit cases thus should exceed 80%, where a hit case means that the real root cause ranks within the top five on the possible root cause ranking list.

5. Efficiency criterion: The procedure of the mining engine should be completed within one minute using the benchmark case involving 300 lots and 13000 attributes for each lot.
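The accuracy criterion can be expressed as a small Python sketch. It is a minimal illustration, assuming each test case is summarized by the rank of its real root cause on the candidate list (or `None` if the root cause is absent from the list); the function names and the example ranks are hypothetical and are not results from the project.

```python
from typing import Optional, Sequence


def hit_rate(true_ranks: Sequence[Optional[int]], top_k: int = 5) -> float:
    """Fraction of cases whose real root cause ranks within the top_k."""
    if not true_ranks:
        raise ValueError("at least one case is required")
    hits = sum(1 for rank in true_ranks if rank is not None and rank <= top_k)
    return hits / len(true_ranks)


def meets_accuracy_criterion(true_ranks: Sequence[Optional[int]],
                             threshold: float = 0.8, top_k: int = 5) -> bool:
    """The criterion requires the hit rate to exceed the threshold."""
    return hit_rate(true_ranks, top_k) > threshold


# Hypothetical ranks of the real root cause in five test cases.
example_ranks = [1, 3, 2, 5, 4]
print(hit_rate(example_ranks), meets_accuracy_criterion(example_ranks))
```

Note that the comparison is strict: four hits out of five cases give a hit rate of exactly 80%, which does not exceed the threshold.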

The above criteria are incorporated into a data mining system scenario through the following procedures:

1. Data preparation procedure: The raw data of the cases are first retrieved from the EDA database and then transformed into Bit-wise Indexing (BWI) matrixes [10] to accelerate the subsequent mining procedure. Figure 8.1 illustrates the three major functional modules in this processing phase: the Data Quality Analysis, Cutting-Point Calculating and Data Dispatcher modules. Since semiconductor manufacturing processes have become increasingly sophisticated, data collection problems have also become increasingly serious, particularly when advanced technologies are used. Generally, in a split-lot situation, sparse and null data may seriously impact the accuracy of the data mining results. The Data Quality Analysis module is therefore employed to check the quality of a given data set based upon our proposed quality indicators. This module also provides lot and attribute merging mechanisms to help users combine split lots or procedure steps in the given data set. Once the quality of the given data set is confirmed by the user, a decision feature is required for judging the lot information within the whole data set.

The decision feature of the data set is used to separate all given lots into two independent groups, called the normal and abnormal lot groups. After the decision feature is selected, the Cutting-Point Calculating module is executed to determine whether the normal lot group is located at the right-hand (larger than) or left-hand (smaller than) side of the given critical point. Users can also define these two parameters (the cutting point and the decision direction) themselves for different situations. Once the decision feature and cutting point are selected, the Data Dispatcher module dispatches the individual data segments for data mining according to the naming rules, and the corresponding BWI matrixes are generated.

[Figure 8.1: The flowchart of the data preparation procedure. Labels: Raw Data, Data Quality Analysis, Analysis Result, QC OK, Data Set, Cutting-Point Calculating, Cutting Point & Decision Direct, Data Dispatcher, Naming, Columns/Lots, DM Model Decision, Data Segments]

[Figure 8.2: The flowchart of the data mining procedure]

2. Data mining procedure: Once the target BWI matrixes are fully prepared and the data quality is verified, the data mining procedure is triggered to analyze the content of the cases and discover the root causes for the target cases. Figure 8.2 briefly describes the four major data mining modules, namely the Transaction-based, Learning-based, Feature-based and Statistical-based modules, as presented below:

i) Transaction-based module: Generally, over 80% of low-yield situations in semiconductor manufacturing result from machine failure, and it is extremely difficult to determine the degree to which each machine contributes to failure during the manufacturing procedure. The root causes for the production of low-yield wafers are hard to determine, since yield cannot be qualified during the manufacturing process. Generally, product engineers require data analysis methods for identifying evidence regarding possible root causes. According to the experience of domain experts, methods based on single-variable analysis seldom tolerate null values. Therefore, these methods are not well suited to seeking the root-cause machine in the semiconductor manufacturing domain. To solve the above problem, the Transaction-based module, including equipment and multiple-factor mining functions, is applied to analyze the WIP
