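A Study of Features Selection to Process Time of IC Substrate - For Example of Drilling Operation (IC基板製程時間之特徵選擇研究－以鑽孔作業為例) - 政大學術集成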


National Chengchi University (國立政治大學), Department of Management Information Systems
Master's Thesis

IC 基板製程時間之特徵選擇研究－以鑽孔作業為例
A Study of Features Selection to Process Time of IC Substrate - For Example of Drilling Operation

Advisors: Dr. 劉文卿, Dr. 許志堅
Graduate student: 宋伯謙
January 2017 (ROC year 106)

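中文摘要 (Chinese Abstract)

In data analysis, and especially in big data, data sets often contain predictor variables of very high dimension, so feature selection is an important topic. In semiconductor applications this topic has already produced substantial results, but results for IC substrates are comparatively scarce. This study therefore takes the drilling process of IC substrates as an example and uses the following experimental methods, GR-SNBC (Gain Ratio with Naive Bayes Classifier), SU-SNBC (Symmetrical Uncertainty with Naive Bayes Classifier) and SU-CART (Symmetrical Uncertainty with Classification and Regression Tree Classifier), to establish a set of attributes that can be applied to predicting IC substrate process time. The outcome of the study is twofold. First, data-mining methods are used to find a significant set of product characteristic attributes that can predict IC substrate process time. Second, it is found that, for shortening process time, factors arising from the product structure itself have a more significant effect than factors arising from production management.

Keywords: feature selection, product characteristics, process time, data mining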

ABSTRACT

Feature selection is a significant subject in data analysis, especially in big data with many high-dimensional predictive variables. In the semiconductor field this subject has already produced plenty of results, but not for IC substrates. Taking the drilling operation as an example, this research therefore builds, through experiments, a group of selected features for predicting process time in this field. The methods used are GR-SNBC (Gain Ratio with Naive Bayes Classifier), SU-SNBC (Symmetrical Uncertainty with Naive Bayes Classifier) and SU-CART (Symmetrical Uncertainty with Classification and Regression Tree Classifier). The contributions of this research are not only a selected subset of product characteristics suggested for predicting process time in an IC-substrate fab via these data-mining methods, but also the observation that, for shortening process time, factors of product construction weigh more than factors of production management.

Keywords: Features Selection, Product Characteristics, Process Time, Data-mining

Table of Contents

中文摘要 (Chinese Abstract)
ABSTRACT
Table of Contents
Table of Figures
Table of Tables
1. Introduction
2. Literature Review
  2.1. Predictive Factors about Cycle Time
  2.2. Related Work about Time Predicted
  2.3. Data Mining and Knowledge Discovery in Databases
3. Methods Design
  3.1. Data Preparation
    3.1.1. Data Collection and Review
    3.1.2. Data Integration
  3.2. Data Pre-processing
    3.2.1. Data Filtering
    3.2.2. Data Cleaning
    3.2.3. Data Transformation
    3.2.4. Data Reduction
  3.3. Data-Mining Method
4. Empirical Implement
  4.1. Experimental Data Settlement
  4.2. Experimental Approaches Implement
  4.3. Experimental Result and Discussion
    4.3.1. Experimental Result of Filters
    4.3.2. Experimental Result of Wrappers with Filtering
    4.3.3. Experimental Result of Analysis
5. Conclusion
  5.1. Study Limitations
  5.2. Research Contribution
  5.3. Future Suggestion
REFERENCE

Table of Figures

Figure-1 An Overview of the Steps That Compose the KDD Process
Figure-2 overall experiment flow chart
Figure-3 intersection of feature subsets

Table of Tables

Table-1 research domain and time predicted
Table-2 research category, description and representatives
Table-3 tableau of experiment design
Table-4 the summarized table of data preparation in practice
Table-5 the summarized table of data pre-processing in practice
Table-6 information of dataset for experiment
Table-7 GR implementation commands
Table-8 SU implementation commands
Table-9 GR-SNBC implementation commands
Table-10 SU-SNBC implementation commands
Table-11 SU-CART implementation commands
Table-12 GR-SNBC probable accuracy evaluation
Table-13 SU-SNBC probable accuracy evaluation
Table-14 SU-CART probable accuracy evaluation
Table-15 features ranked by filter of GR
Table-16 features ranked by filter of SU
Table-17 selective attributes by GR-SNBC
Table-18 selective attributes by SU-SNBC
Table-19 selective attributes by SU-CART
Table-20 probable accuracy estimated
Table-21 illustration of feature subsets

1 Introduction

Manufacturing has become increasingly important since "Industry 4.0", which originated from a project in the high-tech strategy of the German government, began promoting the computerization of manufacturing. Together, the "Industry 4.0" initiative of the German government, the "Advanced Manufacturing Partnership" of the US, and the "China Manufacturing 2025 Plan" of China announce the stage of the fourth industrial revolution. This new era facilitates the vision and execution of the "Smart Factory". With the arrival of "Cyber-Physical Systems", forecasting and prediction are key capabilities for any industry in the future, especially for 3C products. For forecasting and predicting cycle time, a number of methods already exist, such as direct-procedure, mathematical, statistical, simulation, and data-mining approaches. (Chung and Huang, 2002) (Meidan et al., 2011)

Therefore, in order to control cycle time better, I first have to understand and investigate a data-driven approach that identifies key factors and predicts their impact on process time. [2][3] Likewise, predictor and feature selection have become significant in industrial applications, where datasets with hundreds of data columns are available in MES (Manufacturing Execution System) and ERP (Enterprise Resource Planning) systems. The objective of feature selection is three-fold: to improve the recognition rate; to reduce the computing power needed to construct the classifier; and to help us understand, through the selected features, the causal relationship between features and classes.

In the remainder of this research, Section 2 reviews the literature. The first part of Section 3 describes data collection, data preparation and data pre-processing, and the latter part of Section 3 illustrates the feature selection approaches. In Section 4, I apply these approaches to the data and present my experimental results. In Section 5, I discuss my findings and conclude this research.

2 Literature Review

Before entering the main content, this section reviews related work on predictive factors, predicted time, research categories, and methods with their representative studies. The methodology that this research designs and implements, data mining and knowledge discovery in databases (KDD), has itself attracted a significant amount of research, industry, and media attention of late.

2.1 Predictive Factors about Cycle Time

Basically, key factors are categorized into three kinds: "job characteristics", "shop condition", and "dynamic workload and queue". [21][22][23][24][25]

(a) job characteristics: only the characteristics of the job are considered, such as the total process time, the number of re-entrances, the number of operations of the lot, and slack time.

(b) shop condition: both the processing and the queueing status in realistic fab production are considered, such as the total process time, the total current number of WIP (work-in-progress) lots, and the probability distribution of waiting time.

(c) dynamic workload and queue: since the cycle time of WIP is auto-correlated, its estimation should be constructed dynamically from workload and queue, such as the number of WIP lots in the fab, waiting for the most heavily bottlenecked machines, or on the processing route of the lot, and the average fab utilization.

2.2 Related Work about Time Predicted

In the past, many scholars have studied time prediction, with research focused primarily on the semiconductor domain. Almost all of these studies address "cycle time", while a few address "process time", as shown in Table-1.

| reference | authors | research domain | time predicted to research |
|---|---|---|---|
| [1] | Backus et al. | semi-conductor | new lot's cycle time at certain (critical) steps in the process |
| [2] | Tirkel | semi-conductor | establishing dynamic CT prediction models, which can be used to predict CT of a single operation step, a line segment or a complete production line |
| [3] | Meidan et al. | semi-conductor | lot cycle time (CT) |
| [4] | Chien et al. | semi-conductor | semiconductor cycle time prediction based on historical production line data; this cycle time only takes "wafer fabrication time" into account in that study |
| [5] | Hassoun | non-volatile memory | forecasting the steady-state cycle time of process segments |

Table-1 research domain and time predicted

Research on cycle time also falls into different categories that take different approaches, as shown in Table-2.

| category | research description | representatives |
|---|---|---|
| direct procedure | Direct procedure only uses currently available information, such as job characteristics and shop condition, to predict cycle time. | Cheng & Gupta (1989); Vig & Dooley (1991) |
| simulation | Discrete event simulations are used to simulate the lot cycle time that occurs in a wafer fab. | Atherton and Atherton (1995); Chang et al. (1999); Wood (1997); Kim et al. (1998) |
| statistical analysis method | Regression analysis or some other statistical analysis method is applied to determine the relationship between the cycle time and other related parameters. | Her & Li (1996); Raddon and Grigsby (1997); Chien et al. (2012) |
| analytical method | The analytical method is based primarily on queueing theory or some other mathematical model to derive the lot cycle time and its deviation. | Martin (1998); Su (1998); Vig and Dooley (1991) |
| machine learning-data mining | An interdisciplinary field bringing together techniques from machine learning, pattern recognition, statistics, databases, and visualization to address the issue of information extraction from large data bases. Cabena et al. (1998) | Chien et al. (2005); Hsu et al. (2007); Ganesan et al. (2008) |
| hybrid method | The hybrid method combines different methods to produce a cycle time estimation, for example, by consideration of the lot characteristics. | Etheshami et al. (1992); Enns (1995); Fronckowiak et al. (1996) |

Table-2 research category, description and representatives [3][18][19]

Backus et al. (2006) discuss the advantages and disadvantages of simulation, statistical, analytical and data-mining methods as follows. First, in a completely linear process, intermediate cycle-time predictions can be obtained directly from Little's law. Second, simulation has been the most commonly studied approach, and Atherton and Atherton (1995) argued that it is the best approach for a complex process such as semiconductor manufacturing. Third, traditional regression analysis does not handle categorical predictors, interactions, missing data, or outliers well, and it is limited by inherently linear models. Their own work is also based on statistical analysis; however, they develop sophisticated predictor variables and apply modern modeling methods. As pointed out by Enns, statistical models need to be updated with the current tooling/capacity characteristics of the process. Consequently, they consider the maintenance of a model (in particular, the ability to regenerate the model) to be an important element. Advantages of a data-mining approach were discussed by Janakiram et al.: with the ability to quickly reanalyze the statistical data, such models can be updated as necessary. Next, analytical methods rooted in queueing theory comprise the third class of methods, and much process modeling is necessary to develop an effective model. Finally, instead of an overly simplistic analytical model, a linear statistical model, or a complex simulation model, they propose to identify those characteristics and evaluate the quality of predictions with a data-mining approach. [1]

2.3 Data Mining and Knowledge Discovery in Databases

Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing. KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data. The steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data.

KDD has evolved, and continues to evolve, from the intersection of research fields such as machine learning, pattern recognition, databases, statistics, AI, knowledge acquisition for expert systems, data visualization, and high-performance computing. The unifying goal is extracting high-level knowledge from low-level data in the context of large data sets. The data-mining component of KDD currently relies heavily on known techniques from machine learning, pattern recognition, and statistics to find patterns from data in the data-mining step of the KDD process.

The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user. Brachman and Anand (1996) give a practical view of the KDD process, emphasizing the interactive nature of the process. Here, I broadly outline some of its basic steps:

(a) First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer's viewpoint.

(b) Second is creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

(c) Third is data cleaning and preprocessing. Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.

(d) Fourth is data reduction and projection: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.

(e) Fifth is matching the goals of the KDD process (step 1) to a particular data-mining method. For example, summarization, classification, regression, clustering, and so on, are described later as well as in Fayyad, Piatetsky-Shapiro, and Smyth (1996).

(f) Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s) to be used for searching for data patterns. This process includes deciding which models and parameters might be appropriate (for example, models of categorical data are different than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more interested in understanding the model than its predictive capabilities).

(g) Seventh is data mining: searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data-mining method by correctly performing the preceding steps.

(h) Eighth is interpreting mined patterns, possibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.

(i) Ninth is acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

The KDD process can involve significant iteration and can contain loops between any two steps. The basic flow of steps (although not the potential multitude of iterations and loops) is illustrated in Figure-1. Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice. Having defined the basic notions and introduced the KDD process, I now focus on the data-mining component, which has, by far, received the most attention in the literature. [11]

Figure-1 An Overview of the Steps That Compose the KDD Process [11]

3 Methods Design

Devising useful features to extract requires domain knowledge. Inventing features that might be useful without some underlying idea of why such a feature, or set of features, might be useful is seldom of value. (Pyle, 1999) Before constructing the feature selection model to obtain a group of selected features for process time, a huge amount of long-term historical data and many characteristics of material items were collected from different systems. On review, the collected data often contains incorrect, inconsistent, redundant, and duplicate data, as well as noisy data such as missing values and outliers, which must be cleaned so that they do not corrupt the information used in later analysis. Moreover, there are several concerns such as data quantity, data scales, data reduction and data transformation, which also have to be taken into consideration before the data is input into models.

3.1 Data Preparation

Data preparation consists of the steps "data collection", "data review" and "data integration". Before data integration, I should deal with incorrect, inconsistent, redundant and duplicate data; then I can combine the data sources into one dataset for the next stage.

3.1.1 Data Collection and Review

In large enterprises, data could be scattered in a number of departments and on different platforms. In most cases, the data is even acquired and maintained using different software systems. The goal, depth and standard of data collection may vary across the enterprise. As a result, when data from more than one group is required for data analysis, problems related to the use of data from multiple sources may arise. [17]

There are two original data sources I plan to collect: one is the ITEM data table from the ERP (Enterprise Resource Planning) system, and the other is the WIP data table from the MES (Manufacturing Execution System).

The former provides many product characteristics, including physical dimensions, product category, and some procedure details such as etching, copper and image processes. The latter provides WIP (Work In Process) historical production records from the first operation to the last, including, primarily, lot type, lot id, lot priority, lot status, process time, waiting time, hold time, quantity of WIP, unit of WIP, production route, operations, and the critical time stamps for each operation.

3.1.2 Data Integration

In the data integration step, the following kinds of data mistakes have to be considered: incorrect, inconsistent, redundant and duplicate data.

First of all, incorrect data has incorrect attribute values, for reasons such as the following: the data collection instrument may be faulty; humans or programs may make errors during input; or errors may occur in data transmission. In my case, the "product-class" attribute is constrained to class1, class2 and class3, but other values are recorded; records of this kind have to be deleted.

Next, inconsistent data becomes important when data is collected by several groups. This is especially true in domains where sensor data are collected and analyzed. Sensor data consists of many text and symbolic attributes where groups of data have to be combined. The inconsistencies could be due to the human way of representing text or even the use of natural language understanding/processing capabilities in the data collection process. (Fayyad et al, 1996) In my case, the date-time format differs between the two systems: one is 'YYYY/MM/DD HH24:MI:SS' and the other is 'MM/DD/YYYY HH24:MI:SS', so related time durations have to be calculated carefully.

After that, redundant data can easily arise when different streams are merged, or may be present in one stream. Redundancy occurs when essentially identical information is entered in multiple variables, such as "date_of_birth" and "age". If the information is not actually identical, the worst damage is likely to be only that it takes a longer time to build the models. However, most modeling techniques are affected more by the number of variables than by the number of instances. Removing redundant variables, particularly if there are many of them, will increase modeling speed. (Pyle, 1999) In my case, "customer-id" already exists to identify each customer clearly, yet "customer-name" also exists; this is redundant data and has to be removed.
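The date-time mismatch noted for the inconsistent-data case can be handled by parsing each source with its own explicit format before any duration is computed. The following is a minimal sketch, assuming the two format masks quoted above map to the `strptime` patterns shown; the function names and the sample timestamps are hypothetical and are not taken from the thesis data, and the text does not say which system uses which format.

```python
from datetime import datetime

# Assumed strptime equivalents of the two format masks quoted in the text.
FORMAT_A = "%Y/%m/%d %H:%M:%S"   # 'YYYY/MM/DD HH24:MI:SS'
FORMAT_B = "%m/%d/%Y %H:%M:%S"   # 'MM/DD/YYYY HH24:MI:SS'

def parse_timestamp(text, fmt):
    """Parse a timestamp with an explicit format instead of guessing it."""
    return datetime.strptime(text, fmt)

def duration_minutes(start_text, start_fmt, end_text, end_fmt):
    """Minutes between two timestamps that may be written in different formats."""
    start = parse_timestamp(start_text, start_fmt)
    end = parse_timestamp(end_text, end_fmt)
    return (end - start).total_seconds() / 60.0

# Hypothetical example: the start is recorded in one format, the end in the other.
minutes = duration_minutes("2016/03/04 08:15:00", FORMAT_A,
                           "03/04/2016 09:45:30", FORMAT_B)
print(minutes)  # 90.5
```

Passing the format explicitly makes a record in the wrong format fail loudly with a `ValueError` rather than silently swapping day and month.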

Finally, duplicate data arises when data is integrated or collected from multiple sources: as data from multiple sources is integrated, the amount of data increases and data is duplicated as well. (Tamilselvi et al., 2011) In my case, 'process-name' in the ITEM data table duplicates 'route-name' in the WIP data table; since duplicate data exists, one of them should be removed.

After finishing the above procedures, since the raw data comes from two different systems, a proper foreign key has to be found to merge the two data sources into one dataset for data pre-processing.

3.2 Data Pre-processing

Dirty data could be due to reasons such as sensor failure, data transmission errors or improper data entry, many of which may be unknown at the time of data collection, so data pre-processing is an important and time-consuming stage in data mining. Data pre-processing consists of "data filtering", "data cleaning", "data transformation" and "data reduction". In the data cleaning step, I should deal with noisy data, missing values and outliers; then I can improve data quality for model building. In the following sections, I describe the routine handling for the specific situations considered in this research work.

3.2.1 Data Filtering

Whenever data is collected and used for a mining project, the miner needs to have some underlying idea, rationale, or theory as to why that particular data set can address the problem area. This idea, rationale, or theory forms the explanatory structure for the data set. It explains how the variables are expected to relate to each other, and how the data set as a whole relates to the problem. It establishes a reason for why the selected data set is appropriate to use. (Pyle, 1999) So I select mass-production WIP, excluding experimental and engineering lots, and the drilling operation, an operation specific to IC substrates, as the empirical implementation objective of this research work.

3.2.2 Data Cleaning

In the "data cleaning" step, real-world data presents many particular and complex cases, and different technical approaches should be adopted for each.

(a) Missing data

If it is noted that there are many tuples that have no recorded value for several attributes, then the missing values can be filled in for the attribute by the various methods (Arora et al., 2012) described below; methods a, b and f are applied in this work.

a. Ignore the tuple.
b. Fill in the missing value manually.
c. Use a global constant to fill in the missing value.
d. Use the attribute mean to fill in the missing value.
e. Use the attribute mean for all samples belonging to the same class as the given tuple.
f. Use the most probable value to fill in the missing value.
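Two of the methods listed above, a (ignore the tuple) and f (most probable value), can be sketched in a few lines of Python. This is an illustration only: the column names and rows are hypothetical, `None` is assumed to mark a missing field, and the "most probable value" is approximated here by the most frequent observed value, which is a simplification and not necessarily how the thesis estimated it.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing field

def drop_incomplete(rows, required):
    """Method a: ignore any tuple that lacks a value in a required column."""
    return [r for r in rows if all(r.get(c) is not MISSING for c in required)]

def fill_most_frequent(rows, column):
    """Method f (simplified): fill gaps with the most frequent observed value."""
    observed = [r[column] for r in rows if r.get(column) is not MISSING]
    if not observed:
        return [dict(r) for r in rows]  # nothing to learn from; return copies
    mode = Counter(observed).most_common(1)[0][0]
    return [dict(r, **{column: mode}) if r.get(column) is MISSING else dict(r)
            for r in rows]

# Hypothetical records; real ITEM/WIP columns will differ.
rows = [
    {"lot_id": "L1", "layer_count": 4, "process_time": 120},
    {"lot_id": "L2", "layer_count": None, "process_time": 95},
    {"lot_id": "L3", "layer_count": 4, "process_time": None},
    {"lot_id": "L4", "layer_count": 6, "process_time": 150},
    {"lot_id": "L5", "layer_count": 4, "process_time": 110},
]

kept = drop_incomplete(rows, ["process_time"])   # the target must be present
filled = fill_most_frequent(kept, "layer_count") # predictors may be imputed
```

Dropping rows only when the target is missing, and imputing predictors otherwise, keeps more records than dropping every incomplete tuple; whether that trade-off is appropriate depends on how much of the data is incomplete.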

(23) (b) Noise data Noise is a random error or variance in a measured variable. Noise in the data can be attributed to several reasons: one is that data measurement or transmission errors, and the other is that inherent reasons such as characteristics of processes or systems from which data is collected. (Arora et al., 2012) They could be settled by several approaches such as the followings. After some consideration of causal relation and information keeping, then binning method is applied here for all data column which value is numerical scale.. 政治大 a. Binning method 立. ‧ 國. 學. b. Clustering method. c. Regression method. ‧. d. Combined computer and human inspection. y. Nat. (c) Outlier data. io. sit. An outlier is a single, or very low frequency, occurrence of the. er. value of a variable that is far away from the bulk of the values of the. al. n. v i n C h for some U variable. It is a problem because, modeling methods in parengchi. ticular (some types of neural network, for instance), outliers may distort the remaining data to the point of uselessness. For outlier data, although I consider the below approaches (Chien et al., 2014) to settle, finally, interquartile range (IQR) measurement to identity and remove outlier data is adopted. a. To eliminate directly. b. To use clustering method. c. To use interquartile range (IQR) measurement. d. To alternate it by another values using standardized -16-.

transformation.

3.2.3 Data Transformation

In the data transformation step, the data are converted and consolidated into forms appropriate for data mining. Many specialized transformation approaches exist for particular cases. The approaches applied in this research are conversion to dummy variables, aggregation, normalization and standardization (Arora et al., 2012).

(a) Conversion to dummy variables, where a dummy variable, also known as an indicator variable, is an artificial variable created to represent an attribute with two or more distinct categories/levels.
(b) Aggregation, where summary or aggregation operations are applied to the data.
(c) Normalization, where the attribute data are scaled so as to fall within a small specified range, usually a primary target scale.
(d) Standardization, also known in statistics as the Z-score transformation.

3.2.4 Data Reduction

Complex data analysis and mining on huge amounts of data may take a very long time, making such analysis impractical or infeasible. Data reduction techniques help by analyzing a reduced representation of the dataset without compromising the integrity of the original data while still producing quality knowledge. Data reduction is commonly understood as reducing either the volume or the dimensions of the data (Arora et al., 2012). In this work, the following approaches are used for this step.

(a) Generalization, where low-level or primitive (raw) data are replaced by higher-level concepts through the use of concept hierarchies. It is also known as “concept hierarchy generation”.
(b) Discretization, the process of quantizing continuous attributes. In other words, values are put into buckets so that there is a limited number of possible states. Successful discretization can significantly extend the borders of many learning algorithms.

3.3 Data-Mining Method

Feature selection methods are typically presented in three classes, based on how they combine the selection algorithm and the model building.

(a) Filter Method:
It analyzes intrinsic properties of the data, ignoring the classifier. Most of these methods can perform two operations, ranking and subset selection: in the former, the importance of each individual feature is evaluated, usually neglecting potential interactions among the elements of the joint set; in the latter, the final subset of features to be selected is provided.
(b) Wrapper Method:
It evaluates subsets of variables, which, unlike filter approaches, allows possible interactions between variables to be detected. The two main disadvantages of these methods are the increased risk of overfitting when the number of observations is insufficient and the significant computation time when the number of variables is large.
(c) Embedded Method:
It was proposed to reduce the cost of classifier learning while combining the advantages of both previous methods. The learning algorithm takes advantage of its own variable selection algorithm, so it needs to know in advance what a good selection is, which limits its use.

In 2011, Meidan et al. proposed CMIM-SNBC (conditional mutual information maximization and selective naive Bayesian classifier), a machine-learning and data-mining approach that compares favorably in simplicity, interpretability and efficiency with decision trees, neural networks and multinomial logistic regression in the semiconductor field. The data-mining approaches used in my research to obtain a selective feature subset therefore follow the CMIM-SNBC work (Meidan et al., 2011). CMIM is based on an information-theoretic ranking criterion (conditional mutual information), so information theory is adopted as the foundation of my study for identifying and predicting the key process-time factors in the IC-substrate field. My experimental approach basically consists of a wrapper combined with a filter; its overall process flow is outlined in Figure-2.
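The combination of a filter and a wrapper described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the WEKA implementation used in this study: the function names `filter_rank` and `greedy_forward`, the toy scores and the toy evaluation function are all assumptions made for the example. In the real experiment the evaluation would be the cross-validated accuracy of a classifier on the candidate subset.

```python
from typing import Callable, FrozenSet, List, Sequence


def filter_rank(features: Sequence[str], score: Callable[[str], float]) -> List[str]:
    """Filter step: rank features by an individual score, ignoring the classifier."""
    return sorted(features, key=score, reverse=True)


def greedy_forward(ranked: Sequence[str],
                   evaluate: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Wrapper step: forward search that keeps adding the single best feature
    while the subset evaluation strictly improves."""
    selected: List[str] = []
    best = evaluate(frozenset())
    remaining = list(ranked)
    while remaining:
        # Score every candidate subset obtained by adding one more feature.
        scored = [(evaluate(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:  # no remaining feature improves the evaluation
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


# Hypothetical toy inputs, for illustration only.
toy_scores = {"a": 0.9, "b": 0.5, "c": 0.1}    # stand-in for a filter score
toy_effect = {"a": 0.3, "b": 0.2, "c": -0.05}  # stand-in for classifier behaviour


def toy_evaluate(subset: FrozenSet[str]) -> float:
    # Stand-in for the cross-validated accuracy of a classifier on `subset`.
    return 0.5 + sum(toy_effect[f] for f in subset)


ranked = filter_rank(list(toy_scores), lambda f: toy_scores[f])
chosen = greedy_forward(ranked, toy_evaluate)
print(ranked, chosen)
```

The filter is cheap because it scores each feature once, whereas the wrapper calls the evaluation once per candidate subset, which is why the wrapper dominates the running time when the number of features is large.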

A wrapper evaluates attribute sets by using a learning scheme. It contains a classifier that is used to estimate the accuracy of subsets, and cross-validation is used to estimate the accuracy of the learning scheme for a set of attributes. The filter, in turn, runs an arbitrary classifier (the base classifier to be used) on data that have been passed through an arbitrary filter. Like the classifier, the structure of the filter is based exclusively on the training data, and test instances are processed by the filter without changing their structure. Each wrapper adopts the greedy-stepwise search strategy. Greedy-stepwise performs a greedy forward or backward search through the space of attribute subsets. It may start with no attributes, with all attributes, or from an arbitrary point in the space, and it stops when the addition or deletion of any remaining attribute results in a decrease in evaluation. It can also produce a ranked list of attributes by traversing the space from one side to the other and recording the order in which attributes are selected.

[Figure-2: Features of Raw Dataset → Pre-processing → Features of Processed Dataset → Filter → Features of Ranking Dataset → Wrapper → Key Features of Prediction to Process-Time]

Figure-2 overall experiment flow chart

There are two base classifiers in the experiment: “Naïve Bayes” and “Classification and Regression Tree”.

In WEKA, Naïve Bayes is the class for a Naïve Bayes classifier using estimator classes. Naïve Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. They are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by the expensive iterative approximation used for many other types of classifiers.

The CART (Classification and Regression Tree) approach was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term for the following types of decision trees:

(a) Classification Trees: where the target variable is categorical and the tree is used to identify the "class" into which a target variable would likely fall.
(b) Regression Trees: where the target variable is continuous and the tree is used to predict its value.

The main elements of CART are:

(a) Rules for splitting data at a node based on the value of one variable;
(b) Stopping rules for deciding when a branch is terminal and can be split no more;
(c) A prediction for the target variable in each terminal node.

Here, the “artificial neural network” is not chosen. Although it is precise and accurate, it has already been studied and discussed extensively, and it is inherently difficult to explain, slow to train, complex to model and poorly scalable.

Finally, as evaluators, “Information Gain Ratio” and “Symmetrical Uncertainty” are considered in the experiment.

"Information Gain Ratio" evaluates the worth of an attribute by measuring the gain ratio with respect to the class.

$$GR(\mathit{Class}, \mathit{Attribute}) = \frac{Gain(\mathit{Class} \mid \mathit{Attribute})}{H(\mathit{Attribute})}$$
(Equation-1)

“Symmetrical Uncertainty” evaluates the worth of an attribute by measuring the symmetrical uncertainty with respect to the class.

$$SU(\mathit{Class}, \mathit{Attribute}) = \frac{2 \cdot Gain(\mathit{Class} \mid \mathit{Attribute})}{H(\mathit{Class}) + H(\mathit{Attribute})}$$
(Equation-2)

Where,

1) $H(X)$ is the entropy of a discrete random variable $X$. Suppose $p(x)$ is the prior probability for each value of $X$; $H(X)$ is defined by

$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$$
(Equation-3)

2) $Gain(X \mid Y)$ is the amount by which the entropy of $Y$ decreases. It reflects the additional information about $Y$ provided by $X$ and is called the information gain, which is given by

$$Gain(X \mid Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$$
(Equation-4)

where $H(X \mid Y)$ is the conditional entropy, which quantifies the remaining entropy (i.e. uncertainty) of a random variable $X$ given that the value of another random variable $Y$ is known. Suppose $p(x)$ is the prior probability for each value of $X$ and $p(x \mid y)$ is the posterior probability of $X$ given the values of $Y$; $H(X \mid Y)$ is defined by

$$H(X \mid Y) = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x \mid y) \log_2 p(x \mid y)$$
(Equation-5)

Information gain is a symmetrical measure. That is, the amount of information gained about $X$ after observing $Y$ is equal to the amount of information gained about $Y$ after observing $X$. This ensures that the order of two variables (e.g., $(X, Y)$ or $(Y, X)$) will not affect the value of the measure. [31]

Both evaluators adopt the “Ranker” search method, which ranks attributes by their individual evaluations.

In brief, the method design is summarized in the following Table-3.

Experiment Type | Filter: Evaluator & Search Method | Wrapper: Classifier & Search Strategy
GR-SNBC | Information Gain Ratio & Ranker | Naïve Bayes & Greedy Stepwise-forward
SU-SNBC | Symmetrical Uncertainty & Ranker | Naïve Bayes & Greedy Stepwise-forward
SU-CART | Symmetrical Uncertainty & Ranker | Classification And Regression Tree & Greedy Stepwise-forward

Table-3 tableau of experiment design
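To make Equation-1 to Equation-5 concrete, the following Python sketch computes entropy, conditional entropy, information gain, gain ratio and symmetrical uncertainty for small discrete samples. It is a minimal illustration under stated assumptions, not WEKA's implementation: probabilities are estimated by simple relative frequencies, the function names are chosen for this example, and the toy class and attribute values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """H(X) = -sum_x p(x) log2 p(x), with p estimated by relative frequency."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X | Y) = -sum_y p(y) sum_x p(x|y) log2 p(x|y)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    # Each group holds the X values observed together with one value of Y.
    return sum(len(g) / n * entropy(g) for g in groups.values())


def gain(xs, ys):
    """Gain(X | Y) = H(X) - H(X | Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)


def gain_ratio(cls, attr):
    """GR(Class, Attribute) = Gain(Class | Attribute) / H(Attribute)."""
    h = entropy(attr)
    return gain(cls, attr) / h if h > 0 else 0.0


def symmetrical_uncertainty(cls, attr):
    """SU(Class, Attribute) = 2 * Gain(Class | Attribute) / (H(Class) + H(Attribute))."""
    denom = entropy(cls) + entropy(attr)
    return 2 * gain(cls, attr) / denom if denom > 0 else 0.0


# Hypothetical toy data: a discretized process-time class and three attributes.
cls = ["short", "short", "long", "long"]
attr_perfect = ["thin", "thin", "thick", "thick"]  # determines the class
attr_noise = ["x", "y", "x", "y"]                  # independent of the class
attr_partial = ["p", "q", "q", "q"]                # partly informative

for name, attr in [("perfect", attr_perfect), ("noise", attr_noise), ("partial", attr_partial)]:
    print(name, gain_ratio(cls, attr), symmetrical_uncertainty(cls, attr))
```

An attribute that fully determines the class scores 1 on both measures, and an independent attribute scores 0. Ranking attributes by either score is what the “Ranker” search does with the evaluator's output.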

4 Empirical Implement

Following the research method in the previous section, I carry out data collection, review, preparation and pre-processing, then implement the models, and finally obtain and analyze the results.

4.1 Experimental Data Settlement

In this case study, I collected raw data from two major sources: the ITEM data table from the ERP system, containing around 200 data columns and 8,000 data rows, and the WIP data table from the MES system, containing around 100 data columns and 340,000 data rows.

In the data preparation stage, the valid-status column of ITEM was found to contain, in the same data cell, its own identity together with descriptive information that can already be identified from this identity, so the description needs to be purged.

Data Source | Incorrect Data | Inconsistent Data | Redundant Data | Duplicate Data
Data Columns of ITEM | product-class, valid-status | date-time format, user-login format | customer-info, operation-info | process-name
Data Columns of WIP | n/a | n/a | n/a | route-name

Table-4 the summarized table of data preparation in practice

Next comes the data pre-processing stage, in which the dataset is processed sequentially by data filtering, data cleaning (covering missing data, noisy data and outliers), data transformation and data reduction. The summary is given in Table-5.

In the data filtering step, keys such as CK (candidate key) and FK (foreign key), identities such as member-id and system-id, and control information such as system-timestamp, system-control and check-flag are all filtered out. In addition, mass-production lots are kept but experimental and engineering lots are filtered out; final-product lots are kept but semi-product lots are filtered out; normal operations in the drilling process are kept but others are filtered out; in-house WIP is kept but outsourced WIP is filtered out; and normal processing is kept but abnormal processing such as defect, scrap and rework is filtered out.

In the data cleaning step, three sub-steps check missing data, noisy data and outliers. For missing data, some data columns are reserved for future use and therefore contain no data; some data columns contain information but are too sparse (<= 10%) to be of value; and some data columns contain the same value in every data cell. Columns in these cases are deleted, while missing values in other columns are filled with “unknown” or “0”. For noisy data, I first group by “product-type” and “operation-id”, and then apply a binning procedure using equal-frequency bins of depth 3 with smoothing by bin medians. After that, an inter-quartile procedure is executed with an outlier factor of 1.5 and an extreme-value factor of 3.

In the data transformation step, the following sub-steps are executed:

(a) In the sub-step of conversion to dummy variables, I check every nominal (categorical) data column and convert it into dummy variables. For example, product-type originally consists of a 10-character code; according to the values it takes, it is converted into 60 dummy variables.

(b) In the aggregation sub-step, several important time durations used for prediction are derived: process-time, waiting-time, cycle-time and mean-time. They are derived from the following equations.

$$\mathit{CycleTime} = \mathit{EndTime} - \mathit{BeginTime}$$
(Equation-6)

$$\mathit{ProcessTime} = \mathit{OutTime} - \mathit{InTime}$$
(Equation-7)

$$\mathit{WaitingTime} = \mathit{CycleTime} - \mathit{ProcessTime}$$
(Equation-8)

$$\mathit{MeanTime} = \frac{\mathit{ProcessTime}}{\mathit{Input\ Quantity\ Panel}}$$
(Equation-9)

(c) In the normalization sub-step, although the dataset is combined from 2 data tables, inspection shows that no data value needs the normalization procedure, given that standardization is executed.

(d) In the standardization sub-step, so that all data columns are treated on an equal scale, every data column is standardized by Z-score.

In the data reduction step, in consideration of the properties of the classifiers in the next data-mining stage, I execute generalization and discretization on process-time.

Finally, in this case, the foreign keys are composite keys: “item-id” with “item-version” in the ITEM data table and “product-id” with “product-version” in the WIP data table. The resulting dataset contains around 2,000 rows and 600 columns (including dummy variables). The following Table-5 summarizes the data pre-processing.

Data Source | Data filtering | Data cleaning | Data transformation | Data reduction
Data Columns of ITEM | foreign keys, ID, control-info | any column which is for backup, sparse or same-value | any column whose type is nominal or ordinal, such as flag, status … | n/a
Data Columns of WIP | non MP lots, non final product, abnormal OP., non drilling OP. | factory-site, out-sourcing, specific-tool-rank, useless-route, recipe-info | cycle-time, waiting-time, process-time, mean-time, product-type | mean-time

Table-5 the summarized table of data pre-processing in practice

4.2 Experimental Approaches Implement

In this study, the methods are implemented in WEKA (Waikato Environment for Knowledge Analysis), stable version 3.6.

The dataset used in the experiments is described in the following Table-6. Of its attributes, 630 are predictors and only one, named ‘SMTIME’, is the response variable.

Relation | wip_with_item
Instances | 1795
Attributes | 631

Table-6 information of dataset for experiment

Before the main experiments, two filter procedures are executed, with GR (Gain Ratio) and SU (Symmetrical Uncertainty) respectively. The detailed commands are presented in Table-7 and Table-8.

Evaluator Method | weka.attributeSelection.GainRatioAttributeEval
Search Method | weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Evaluation Mode | 10-fold cross-validation

Table-7 GR implementation commands

Evaluator Method | weka.attributeSelection.SymmetricalUncertAttributeEval
Search Method | weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Evaluation Mode | 10-fold cross-validation

Table-8 SU implementation commands

Then there are three main experiment types: GR-SNBC, SU-SNBC and SU-CART. Their detailed commands are presented in Table-9, Table-10 and Table-11 respectively.

Resample Method | weka.filters.supervised.instance.Resample -B 0.0 -S 3 -Z 30.0 -no-replacement
Filter with Search Method | weka.filters.supervised.attribute.AttributeSelection -E "weka.attributeSelection.InfoGainAttributeEval " -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1"
Wrapper with Search Strategy | weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.WrapperSubsetEval -B weka.classifiers.bayes.NaiveBayes -F 5 -T 0.01 -R 1 -E DEFAULT --" -S "weka.attributeSelection.GreedyStepwise -T -1.7976931348623157E308 -N -1 -num-slots 1" -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Evaluation Mode | 5-fold cross-validation

Table-9 GR-SNBC implementation commands

Resample Method | weka.filters.supervised.instance.Resample -B 0.0 -S 5 -Z 50.0 -no-replacement
Filter with Search Method | weka.filters.supervised.attribute.AttributeSelection -E "weka.attributeSelection.SymmetricalUncertAttributeEval " -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1"
Wrapper with Search Strategy | weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.WrapperSubsetEval -B weka.classifiers.bayes.NaiveBayes -F 5 -T 0.01 -R 1 -E DEFAULT --" -S "weka.attributeSelection.GreedyStepwise -T -1.7976931348623157E308 -N -1 -num-slots 1" -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Evaluation Mode | 5-fold cross-validation

Table-10 SU-SNBC implementation commands

Resample Method | weka.filters.supervised.instance.Resample -B 0.0 -S 7 -Z 50.0 -no-replacement
Filter with Search Method | weka.filters.supervised.attribute.AttributeSelection -E "weka.attributeSelection.SymmetricalUncertAttributeEval " -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1"
Wrapper with Search Strategy | weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.WrapperSubsetEval -B weka.classifiers.trees.SimpleCart -F 5 -T 0.01 -R 1 -- -S 1 -M 2.0 -N 5 -C 1.0" -S "weka.attributeSelection.GreedyStepwise -T -1.7976931348623157E308 -N -1" -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Evaluation Mode | 5-fold cross-validation

Table-11 SU-CART implementation commands

In addition to the detailed commands of the main experiments, the probable accuracy is estimated with the designs presented in Table-12, Table-13 and Table-14 respectively.

Attribute Selected Classifier | weka.classifiers.bayes.NaiveBayes
Evaluator | weka.attributeSelection.GainRatioAttributeEval
Search Strategy | weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Test Mode | 10-fold cross-validation

Table-12 GR-SNBC probable accuracy evaluation

Attribute Selected Classifier | weka.classifiers.bayes.NaiveBayes
Evaluator | weka.attributeSelection.SymmetricalUncertAttributeEval
Search Strategy | weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Test Mode | 10-fold cross-validation

Table-13 SU-SNBC probable accuracy evaluation

Attribute Selected Classifier | weka.classifiers.trees.SimpleCart -- -S 1 -M 2.0 -N 5 -C 1.0
Evaluator | weka.attributeSelection.SymmetricalUncertAttributeEval
Search Strategy | weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Test Mode | 10-fold cross-validation

Table-14 SU-CART probable accuracy evaluation
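The “10-fold cross-validation” test mode in the tables above can be pictured with the following Python sketch of stratified k-fold accuracy estimation. It is an assumption-laden illustration rather than WEKA's procedure: the fold assignment, the function names, the `train_and_predict` callback and the toy labels are all invented for this example, and WEKA's own randomization and stratification details may differ.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign instance indices to k folds so that class proportions stay similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):      # labels are assumed to be sortable
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:               # deal indices round-robin, class by class
            folds[pos % k].append(i)
            pos += 1
    return folds


def cross_validated_accuracy(labels, k, train_and_predict, seed=0):
    """Each fold is used once as the test set; accuracy is pooled over all folds."""
    folds = stratified_folds(labels, k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = train_and_predict(train_idx, test_idx)
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return correct / len(labels)


# Hypothetical toy labels and a majority-class baseline, for illustration only.
toy_labels = ["A"] * 6 + ["B"] * 4


def majority_baseline(train_idx, test_idx):
    most_common = Counter(toy_labels[i] for i in train_idx).most_common(1)[0][0]
    return [most_common for _ in test_idx]


print(cross_validated_accuracy(toy_labels, 5, majority_baseline))
```

In the experiments a real classifier (Naïve Bayes or CART on the selected attributes) takes the place of the majority-class baseline.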

4.3 Experimental Result and Discussion

In this section, I present the results of the filters and the wrappers and analyze both.

4.3.1 Experimental Result of Filters

All of the features are ranked by the GR (Gain Ratio) filter and the SU (Symmetrical Uncertainty) filter, as shown in Table-15 and Table-16 respectively. Both tables are ordered by rank of importance.

Rank Level | Predictor Category | Predictor Examples
1 | product_category | category_code_4, category_code_5
2 | product_type | product_type_D, product_type_E
3 | product_dimension | panel_depth, panel_width, panel_length, product_layer
4 | product_convert_rate | panel_use_rate, strip_use_rate
5 | product_details | ball_criteria, pin_criteria, drc_criteria, xout_spec, net_numbers
6 | product_quality | quality_level, testing_level, product_yeild
7 | product_level | product_class
8 | client_check_status | customer_pass_flag
9 | wip_flow_status | lot critical, lot priority

Table-15 features ranked by filter of GR

Rank Level | Predictor Category | Predictor Examples
1 | product_dimension | panel_depth, panel_width, panel_length, product_layer
2 | product_convert_rate | panel_use_rate, strip_use_rate
3 | product_quality | quality_level, testing_level, product_yeild
4 | product_category | category_code_4, category_code_5
5 | product_type | product_type_D, product_type_E
6 | product_details | ball_criteria, pin_criteria, drc_criteria, xout_spec, net_numbers
7 | product_level | product_class
8 | client_check_status | customer_pass_flag
9 | wip_flow_status | lot critical, lot priority

Table-16 features ranked by filter of SU

4.3.2 Experimental Result of Wrappers with Filtering

The hit rate in the following tables is the proportion of cross-validation folds in which an attribute is selected.

$$\mathit{hit\ rate} = \frac{\text{the selected number of folds}}{\text{the total evaluated number of folds}}$$
(Equation-10)

The subset selected by the GR-SNBC commands is shown in Table-17.

Search Method | Greedy Stepwise (forwards); Start set: no attributes
Attribute Subset Evaluator | supervised, Class (nominal): SMTIME
Wrapper Subset Evaluator | Learning scheme: weka.classifiers.bayes.NaiveBayes; Scheme options: N/A; Subset evaluation: classification accuracy; Number of folds for accuracy estimation: 5

Selective attributes:

Hit Rate | Predictor Category | Predictor Examples
100% | product_dimension | product_depth, product_layer
80% | n/a | n/a
60% | product_type | product_type_E4, product_type_E8
40% | product_category | category_code_2
40% | product_convert_rate | panel_use_rate, strip_use_rate
40% | product_quality | scarp_status
20% | product_details | xout_spec, component_criteria, net_numbers
20% | product_level | product_class
0% | client_check_status | customer_pass_flag
0% | wip_flow_status | lot critical, lot priority

Execution duration | Around 6 hours

Table-17 selective attributes by GR-SNBC

The subset selected by the SU-SNBC commands is shown in the following Table-18.

Search Method | Greedy Stepwise (forwards); Start set: no attributes
Attribute Subset Evaluator | supervised, Class (nominal): SMTIME
Wrapper Subset Evaluator | Learning scheme: weka.classifiers.bayes.NaiveBayes; Scheme options: N/A; Subset evaluation: classification accuracy; Number of folds for accuracy estimation: 5

Selective attributes:

Hit Rate | Predictor Category | Predictor Examples
100% | product_dimension | product_depth, product_layer, product_length
80% | n/a | n/a
60% | product_type | product_type_E8
40% | product_quality | scarp_status
40% | product_category | category_code_3
40% | product_details | DRC_criteria
40% | product_convert_rate | panel_use_rate, strip_use_rate
20% | product_level | product_class
0% | client_check_status | customer_pass_flag
0% | wip_flow_status | lot critical, lot priority

Execution duration | Around 7 hours 30 mins

Table-18 selective attributes by SU-SNBC

The subset selected by the SU-CART commands is shown in Table-19.

Search Method | Greedy Stepwise (forwards); Start set: no attributes
Attribute Subset Evaluator | supervised, Class (nominal): SMTIME
Wrapper Subset Evaluator | Learning scheme: weka.classifiers.trees.SimpleCart; Scheme options: N/A; Subset evaluation: classification accuracy; Number of folds for accuracy estimation: 5

Selective attributes:

Hit Rate | Predictor Category | Predictor Examples
100% | product_dimension | product_depth, product_layer
80% | n/a | n/a
60% | product_type | product_type_E8
60% | product_details | DRC_criteria
40% | product_convert_rate | panel_use_rate, strip_use_rate
40% | product_category | category_code_3
20% | product_quality | scarp_status
20% | product_level | product_class
20% | client_check_status | customer_pass_flag
0% | wip_flow_status | lot critical, lot priority

Execution duration | Around 16 hours

Table-19 selective attributes by SU-CART

The probable accuracy of GR-SNBC, SU-SNBC and SU-CART, estimated by stratified 10-fold cross-validation, is shown in Table-20.

Experimental Type | Attribute Selected Classifier | Attribute Evaluator | Search Strategy | Probable Accuracy
GR-SNBC | NaiveBayes | GainRatio | Ranker | 91.14%
SU-SNBC | NaiveBayes | Symmetrical Uncertainty | Ranker | 91.14%
SU-CART | SimpleCart | Symmetrical Uncertainty | Ranker | 92.26%

Table-20 probable accuracy estimated

4.3.3 Experimental Result of Analysis

In the filter phase, product_category, product_convert_rate, product_dimension, product_quality, product_details and product_type are generally selected as feature subsets, especially product_dimension. By contrast, product_level, client_check_status and wip_flow_status are commonly not significant predictive indicators.

In the wrapper phase, product_dimension is still strongly significant, and wip_flow_status should not be selected as a predictor. The other features could be applied but are not as significant.

Finally, comparing the three selected feature subsets yields the intersecting and common feature subsets shown in the following Figure-3 and Table-21.

[Figure-3: Venn diagram of the feature subsets of GR-SNBC (91.14%), SU-SNBC (91.14%) and SU-CART (92.26%), with regions labeled 1A, 2A, 2B, 2C, 3A, 3B, 3C and 4A]

Figure-3 intersection of feature subsets

Notation | Predictor Category
1A | product_dimension
2A | product_type, product_category, product_convert_rate, product_quality, product_details
2B | product_type, product_quality, product_category, product_details, product_convert_rate, product_level
2C | product_type, product_details, product_convert_rate, product_category, product_quality
3A | product_level
3B | {EMPTY}
3C | client_check_status
4A | wip_flow_status

Table-21 illustration of feature subsets
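The hit rate of Section 4.3.2 and the comparison of feature subsets across methods can be sketched as follows. This is a hedged illustration only: the per-fold selections below are hypothetical, not the experimental results; the method and feature names are placeholders; and the function names are chosen for this example.

```python
def hit_rates(fold_selections, all_features):
    """Hit rate of a feature = folds that selected it / total folds evaluated."""
    total = len(fold_selections)
    return {f: sum(f in s for s in fold_selections) / total for f in all_features}


def selected_features(rates, threshold=0.0):
    """Features whose hit rate is above the threshold."""
    return {f for f, r in rates.items() if r > threshold}


# Hypothetical 5-fold selections for two methods, for illustration only.
features = ["dimension", "type", "flow_status"]
folds_method_a = [
    {"dimension", "type"}, {"dimension"}, {"dimension", "type"},
    {"dimension", "type"}, {"dimension"},
]
folds_method_b = [{"dimension"} for _ in range(5)]

rates_a = hit_rates(folds_method_a, features)
rates_b = hit_rates(folds_method_b, features)

subset_a = selected_features(rates_a)
subset_b = selected_features(rates_b)

common = subset_a & subset_b                        # selected by every method
never = set(features) - (subset_a | subset_b)       # selected by no method
print(rates_a, rates_b, common, never)
```

Features in the common set play the role of the central region of a Venn diagram such as Figure-3, while features selected by no method fall outside every circle.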

5 Conclusion

In the end, I reflect on the purpose set at the beginning of this study, summarize it and suggest possible future work.

5.1 Study Limitations

WEKA is a quite good and open data-mining tool for data analysis, so it was chosen for implementing this study. However, given my limited capability and knowledge, I could only try my best to adapt the data to the algorithms and to invoke the most suitable procedures built into its framework, without customizing it to fit the workflow and algorithm of the referenced work.

5.2 Research Contribution

A significant product feature subset for predicting process time in IC substrate manufacturing was obtained, as presented in the previous section, through three data-mining methods: GR-SNBC (Gain Ratio with Naive Bayes Classifier), SU-SNBC (Symmetrical Uncertainty with Naive Bayes Classifier) and SU-CART (Symmetrical Uncertainty with Classification and Regression Tree Classifier).

From this result, I conclude that the process time of the IC-substrate drilling operation depends much more on product characteristics than on processing control. Therefore, in order to decrease the process time, it is reasonable to evaluate the factors of product construction before those of production management.

5.3 Future Suggestion

In the new era of “Industry 4.0”, prediction and estimation are becoming more and more important in all manufacturing. Feature selection should therefore aim not only at precision and accuracy but also at efficiency and ease of understanding.

So far, there are basically two approaches, the filter approach and the wrapper approach. A wrapper approach gives more satisfactory results but takes much more time than a filter approach, so how to develop a balanced approach that is precise, accurate, efficient and easy to understand deserves consideration in future work on this subject.

REFERENCES

[1] Backus, P.; Janakiram, M.; Mowzoon, S.; Runger, G.C.; Bhargava, A. "Factory cycle-time prediction with a data-mining approach", IEEE Transactions on Semiconductor Manufacturing, vol. 19, no. 2, pp. 252-258, May 2006.
[2] I. Tirkel, "Cycle time prediction in wafer fabrication line by applying data mining methods", Proc. 22nd IEEE/SEMI ASMC, pp. 1-5, 2011.
[3] Y. Meidan, B. Lerner, G. Rabinowitz and M. Hassoun, "Cycle-time key factor identification and prediction in semiconductor manufacturing using machine learning and data mining", IEEE Trans. Semicond. Manuf., vol. 24, no. 2, pp. 237-248, 2011.
[4] Chien, C. F., Hsiao, C. W., Meng, C., Hong, K. D., Wang, S. T., 2005. Cycle time prediction and control based on production line status and manufacturing data mining, Proceedings of International Symposium on Semiconductor Manufacturing Conference 2005, 13-15 September, San Jose, California, USA, pp. 327-330.
[5] Hassoun, M. "On Improving the Predictability of Cycle Time in an NVM Fab by Correct Segmentation of the Process", IEEE Transactions on Semiconductor Manufacturing, vol. 26, no. 4, pp. 613-618, Nov. 2013.
[6] Dash, M., & Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 1(1-4), 131-156.
[7] Liu, H., & Motoda, H. (1998). Feature extraction, construction and selection: A data mining perspective. Norwell, MA: Kluwer Academic Publishers.
[8] Liu, H., & Motoda, H. (1998). Feature selection for knowledge discovery and data mining. Norwell, MA: Kluwer Academic Publishers.
[9] A. Whitney, "A direct method of nonparametric measurement selection", IEEE Transactions on Computers, vol. 20, pp. 1100-1103, 1971.
[10] S.-H. Chung and H.-W. Huang, "Cycle time estimation for wafer fab with engineering lots", IIE Trans., vol. 34, pp. 105-118, 2002.
[11] Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37.
[12] Pyle, D. (1999). Data preparation for data mining (Vol. 1). Morgan Kaufmann.
[13] Chien, C. F., Hsiao, A., and Wang, I., 2004. Constructing semiconductor manufacturing performance index and applying data mining for manufacturing data analysis, Journal of Chinese Institute of Industrial Engineering, vol. 21, pp. 313-327.
[14] Chien, C. F., Wang, W. C., and Cheng, J. C., 2007. Data mining for yield enhancement in semiconductor manufacturing and an empirical study, Expert Systems with Applications, vol. 33, pp. 1-7.
[15] Sheng, V. S., Provost, F., & Ipeirotis, P. G. (2008, August). Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 614-622). ACM.
[16] Lommen, A. (2009). MetAlign: interface-driven, versatile metabolomics tool for hyphenated full-scan mass spectrometry data preprocessing. Analytical Chemistry, 81(8), 3079-3086.
[17] Famili, F., Shen, W. M., Weber, R., & Simoudis, E. (1997). Data pre-processing and intelligent data analysis. International Journal on Intelligent Data Analysis, 1(1).
[18] Chung, S. H., & Huang, H. W. (2002). Cycle time estimation for wafer fab with engineering lots. IIE Transactions, 34(2), 105-118.
[19] Chung, S. H., & Huang, H. W. (1999). The block-based cycle time estimation algorithm for wafer fabrication factories. International Journal of Industrial Engineering: Theory, Applications and Practice, 6(4), 307-316.
[20] Singhal, S., & Jena, M. (2013). A Study on WEKA Tool for Data Preprocessing, Classification and Clustering. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 2(6), 250-253.
[21] Her, Jane-Fung & Rong-Kwei Li. (1996). Cycletime Prediction used Stage Regression Method in Wafer Fabrication Factory (Doctoral dissertation).
[22] Cheng, Chao-Ming & Chung, Shu-Hsing (1995). The Design of Due-Date Assignment Model for a Wafer Fabrication Factory (Doctoral dissertation).
[23] Chen, T. (2007). An intelligent hybrid system for wafer lot output time prediction. Advanced Engineering Informatics, 21(1), 55-65.
[24] Vig, M. M., & Dooley, K. J. (1991). Dynamic rules for due-date assignment. The International Journal of Production Research, 29(7), 1361-1377.
[25] Cheng, T. C. E., & Gupta, M. C. (1989). Survey of scheduling research involving due date determination decisions. European Journal of Operational Research, 38(2), 156-166.
[26] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10-18.
[27] Tamilselvi, J. J., & Gifta, C. B. (2011). Handling Duplicate Data in Data Warehouse for Data Mining. International Journal of Computer Applications (0975–8887), 15(4).
[28] Piatetsky-Shapiro, G., Brachman, R. J., Khabaza, T., Kloesgen, W., & Simoudis, E. (1996, August). An Overview of Issues in Developing Industrial Data Mining and Knowledge Discovery Applications. In KDD (Vol. 96, pp. 89-95).
[29] Chen, Z., Wu, C., Zhang, Y., Huang, Z., Ran, B., Zhong, M., & Lyu, N. (2015). Feature selection with redundancy-complementariness dispersion. Knowledge-Based Systems, 89, 203-217.
[30] Quinlan, J. R. (1992, November). Learning with continuous classes. In 5th Australian joint conference on artificial intelligence (Vol. 92, pp. 343-348).
[31] Song, Q., Ni, J., & Wang, G. (2013). A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Transactions on Knowledge and Data Engineering, 25(1), 1-14.
