Data-Mining Method

3 Methods Design

3.3 Data-Mining Method

National Chengchi University (國立政治大學)

the original data while still producing quality knowledge. The concept of data reduction is commonly understood as reducing either the volume or the dimensions of the data (Arora et al., 2012). In this work, the following approaches are used to carry out this step.

(a) Generalization, where low level or primitive (raw) data are replaced by higher level concepts through the use of concept hierarchies. It is also known as “concept hierarchy generation”.

(b) Discretization, the process of quantizing continuous attributes. In other words, it is the process of putting values into buckets so that there is a limited number of possible states. Successful discretization can significantly extend the applicability of many learning algorithms.
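As an illustration of step (b), the following is a minimal sketch of equal-width discretization. The choice of equal-width buckets, the function name `equal_width_bins` and the sample process times are assumptions made for this example; the text does not state which discretization scheme was used.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bucket index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bucket is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall into bucket n_bins, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical process times in minutes (illustrative only).
process_times = [12.0, 15.5, 19.0, 22.5, 30.0, 41.0, 48.0]
buckets = equal_width_bins(process_times, 3)
print(buckets)  # [0, 0, 0, 0, 1, 2, 2]
```

After this step, each attribute takes one of only three states, which is the form required by the entropy-based evaluators used later in this section.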

3.3 Data-Mining Method

The feature selection methods are typically presented in three classes based on how they combine the selection algorithm and the model building.

(a) Filter Method:

It analyzes intrinsic properties of data, ignoring the classifier.

Most of these methods can perform two operations, ranking and subset selection. In the former, the importance of each individual feature is evaluated, usually neglecting potential interactions among the elements of the joint set; in the latter, the final subset of features to be selected is provided.

(b) Wrapper Method:


It evaluates subsets of variables, which, unlike filter approaches, makes it possible to detect interactions between variables.

The two main disadvantages of these methods are (1) the increased risk of overfitting when the number of observations is insufficient and (2) the significant computation time when the number of variables is large.

(c) Embedded Method:

It has been proposed to reduce the classification of learning. Embedded methods try to combine the advantages of both previous methods.

The learning algorithm takes advantage of its own variable selection algorithm. It therefore needs to know beforehand what a good selection is, which limits its applicability.

In 2011, Meidan et al. proposed CMIM-SNBC, a machine-learning data-mining approach that combines conditional mutual information maximization (CMIM) with a selective naive Bayesian classifier (SNBC). In the semiconductor field, it performed well in terms of simplicity, interpretability and efficiency compared with a decision tree, a neural network and multinomial logistic regression. In my research, the data-mining approach for obtaining a selective feature subset therefore follows the CMIM-SNBC approach (Meidan et al., 2011). CMIM is based on an information-theoretic ranking criterion (conditional mutual information), so information theory is adopted as the foundation of my study for identifying and predicting the key process-time factors in the IC-substrate field.
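To make the CMIM criterion concrete, the sketch below follows the usual greedy formulation of conditional mutual information maximization: the first feature maximizes the mutual information with the class, and each later feature maximizes the smallest conditional mutual information with the class given any already selected feature. This is an assumption about the general technique, not a reproduction of the implementation of Meidan et al.; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint entropy (in bits) of one or more aligned discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def cond_mutual_info(y, x, z=None):
    """I(Y; X | Z); with z=None this is the plain mutual information I(Y; X)."""
    if z is None:
        return entropy(y) + entropy(x) - entropy(y, x)
    return entropy(y, z) + entropy(x, z) - entropy(y, x, z) - entropy(z)


def cmim(features, y, k):
    """Greedy CMIM: return k feature names chosen from the dict `features`."""
    score = {name: cond_mutual_info(y, col) for name, col in features.items()}
    selected = []
    while len(selected) < k and score:
        best = max(score, key=score.get)
        selected.append(best)
        del score[best]
        # Each remaining feature keeps its worst-case conditional score.
        for name in score:
            cmi = cond_mutual_info(y, features[name], features[best])
            score[name] = min(score[name], cmi)
    return selected


# Toy data: y = a AND c, and b is an exact copy of a (fully redundant).
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
y = [0, 0, 0, 0, 0, 0, 1, 1]
selected = cmim(features, y, 2)
print(selected)  # ['a', 'c']
```

In the toy data, `b` is individually as informative as `a` but adds nothing once `a` is known, so the criterion skips it in favour of `c`.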

My experimental approach basically consists of a wrapper combined with a filter; its overall process flow is outlined in Figure-2.


The wrapper evaluates attribute sets by using a learning scheme. It contains a classifier that is used to estimate the accuracy of subsets.

Cross-validation is used to estimate the accuracy of the learning scheme for a set of attributes.

The filter runs an arbitrary classifier (the base classifier) on data that has been passed through an arbitrary filter. Like the classifier, the structure of the filter is based exclusively on the training data, and test instances are processed by the filter without changing their structure. Each wrapper adopts the greedy-stepwise search strategy. Greedy stepwise performs a greedy forward or backward search through the space of attribute subsets. It may start with no attributes, with all attributes, or from an arbitrary point in the space. It stops when the addition or deletion of any remaining attribute results in a decrease in evaluation. It can also produce a ranked list of attributes by traversing the space from one side to the other and recording the order in which attributes are selected.
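The forward variant of the greedy-stepwise search described above can be sketched as follows. The function `greedy_forward`, the attribute names and the merit function `toy_merit` are all hypothetical; in a real wrapper, the merit of a subset would be the cross-validated accuracy of the base classifier trained on that subset.

```python
def greedy_forward(attributes, evaluate):
    """Greedy forward search over attribute subsets.

    `evaluate` maps a tuple of attributes to a merit value. The returned
    list holds the selected attributes in the order they were added.
    """
    selected = []
    best_merit = evaluate(tuple(selected))
    remaining = list(attributes)
    while remaining:
        # Score every one-attribute extension of the current subset.
        merits = {a: evaluate(tuple(selected + [a])) for a in remaining}
        candidate = max(merits, key=merits.get)
        if merits[candidate] < best_merit:
            break  # every possible addition decreases the evaluation
        selected.append(candidate)
        remaining.remove(candidate)
        best_merit = merits[candidate]
    return selected


def toy_merit(subset):
    """Stand-in for cross-validated accuracy (hypothetical numbers)."""
    gains = {"drill_diameter": 0.20, "panel_thickness": 0.10, "hole_count": 0.05}
    # Each extra attribute costs a little, so useless ones lower the merit.
    return 0.5 + sum(gains.get(a, 0.0) for a in subset) - 0.03 * len(subset)


names = ["operator_id", "hole_count", "panel_thickness", "drill_diameter"]
chosen = greedy_forward(names, toy_merit)
print(chosen)  # ['drill_diameter', 'panel_thickness', 'hole_count']
```

The order of `chosen` is also the ranked list mentioned in the text: attributes are recorded in the order in which the search adds them.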

[Figure-2: flow chart with the boxes Raw Dataset of Features, Pre-processing, Processed Dataset of Features, Filter, Ranking of Dataset Features, Wrapper, and Prediction of Key Features to Process-Time]

Figure-2 overall experiment flow chart

Two base classifiers are used in the experiment: “Naïve Bayes” and “Classification and Regression Tree”.

Naïve Bayes here refers to a Naïve Bayes classifier that uses estimator classes. Naïve Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. They are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by the expensive iterative approximation used for many other types of classifiers.
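The closed-form training mentioned above amounts to counting. The sketch below is a minimal categorical naive Bayes classifier written for illustration; the add-one (Laplace) smoothing is an assumption added here to avoid zero probabilities, and the function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels):
    """Collect the counts that define a categorical naive Bayes model."""
    class_counts = Counter(labels)
    # value_counts[(feature index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            values_seen[j].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_label in class_counts.items():
        score = log(n_label / total)  # log prior
        for j, value in enumerate(row):
            count = value_counts[(j, label)][value]
            score += log((count + 1) / (n_label + len(values_seen[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical records: (drill size, panel thickness) -> process-time class.
rows = [("small", "thin"), ("small", "thick"), ("large", "thin"),
        ("large", "thick"), ("large", "thick"), ("small", "thin")]
labels = ["short", "short", "long", "long", "long", "short"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("large", "thin")))  # long
```

Training makes a single pass over the data, which is the linear-time behaviour described in the paragraph above.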

The CART (Classification and Regression Tree) approach was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term for the following types of decision trees:

(a) Classification Trees: where the target variable is categorical and the tree is used to identify the "class" into which the target variable would most likely fall.

(b) Regression Trees: where the target variable is continuous and the tree is used to predict its value.

The main elements of CART are:

(a) Rules for splitting data at a node based on the value of one variable;

(b) Stopping rules for deciding when a branch is terminal and can be split no more;
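The two elements above can be illustrated with a single-variable split search. The sketch assumes Gini impurity as the splitting criterion, which is the common choice for CART classification trees but is not stated in the text; the function names and sample values are hypothetical. Returning no threshold plays the role of a simple stopping rule.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find the split `value <= t` on one variable with the lowest weighted Gini.

    Returns (None, impurity) when no split improves on the unsplit node,
    which marks the node as terminal.
    """
    best_t, best_impurity = None, gini(labels)
    for t in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_t, best_impurity = t, weighted
    return best_t, best_impurity


# Hypothetical drill diameters (mm) and process-time classes.
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = ["short", "short", "short", "long", "long", "long"]
print(best_threshold(values, labels))  # (0.3, 0.0)
```

A full tree would apply `best_threshold` to every variable at each node and recurse on the two resulting partitions until the stopping rule fires.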

Here, the artificial neural network is not chosen. Although it has already been studied and discussed extensively for its precision and accuracy, its inherent characteristics include difficult interpretation, long training time, complex modeling and poor scalability.

Finally, two attribute evaluators are considered in the experiment: “Information Gain Ratio” and “Symmetrical Uncertainty”.

"Information Gain Ratio" evaluates the worth of an attribute by measuring the gain ratio with respect to the class.

$$\mathrm{GR}(\mathit{Class}, \mathit{Attribute}) = \frac{\mathrm{Gain}(\mathit{Class} \mid \mathit{Attribute})}{H(\mathit{Attribute})}$$

(Equation-1)

“Symmetrical Uncertainty” evaluates the worth of an attribute by measuring the symmetrical uncertainty with respect to the class.

$$\mathrm{SU}(\mathit{Class}, \mathit{Attribute}) = \frac{2 \cdot \mathrm{Gain}(\mathit{Class} \mid \mathit{Attribute})}{H(\mathit{Class}) + H(\mathit{Attribute})}$$

(Equation-2)

Where,

1) $H(X)$ is the entropy of a discrete random variable $X$. Suppose $p(x)$ is the prior probability of each value $x$ of $X$; then $H(X)$ is defined by

$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$$

(Equation-3)

2) $\mathrm{Gain}(X \mid Y)$ is the amount by which the entropy of $X$ decreases once $Y$ is observed. It reflects the additional information about $X$ provided by $Y$ and is called the information gain, which is given by

$$\mathrm{Gain}(X \mid Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$$

(Equation-4)

where $H(X \mid Y)$ is the conditional entropy, which quantifies the remaining entropy (i.e., uncertainty) of a random variable $X$ given that the value of another random variable $Y$ is known.

Suppose $p(x)$ is the prior probability of each value of $X$ and $p(x \mid y)$ is the posterior probability of $X$ given the values of $Y$; then $H(X \mid Y)$ is defined by

$$H(X \mid Y) = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x \mid y) \log_2 p(x \mid y)$$

Information gain is a symmetrical measure. That is, the amount of information gained about $X$ after observing $Y$ is equal to the amount of information gained about $Y$ after observing $X$. This ensures that the order of the two variables (e.g., $(X, Y)$ or $(Y, X)$) does not affect the value of the measure.[31]
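The definitions of entropy, conditional entropy and information gain given above can be checked numerically. The sketch below estimates the probabilities from value frequencies in two aligned columns; the function names and the toy columns are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H(X) in bits, with p(x) estimated from value frequencies."""
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y): the entropy of X within each group of equal Y, weighted by p(y)."""
    n = len(y)
    total = 0.0
    for y_value, count in Counter(y).items():
        x_given_y = [xi for xi, yi in zip(x, y) if yi == y_value]
        total += (count / n) * entropy(x_given_y)
    return total


def gain(x, y):
    """Gain(X | Y) = H(X) - H(X | Y)."""
    return entropy(x) - conditional_entropy(x, y)


# Toy columns (illustrative only).
x = ["a", "a", "b", "b", "c", "c"]
y = [0, 0, 0, 1, 1, 1]
print(round(gain(x, y), 4), round(gain(y, x), 4))  # 0.6667 0.6667
```

Both calls return the same value, which is the symmetry property stated above: $H(X) - H(X \mid Y)$ and $H(Y) - H(Y \mid X)$ agree.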

Both evaluators adopt the “Ranker” search method, which ranks attributes by their individual evaluations.
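A ranking of this kind can be sketched by scoring each attribute with the gain ratio or the symmetrical uncertainty defined earlier and sorting by score. The code is an illustrative assumption, not the tool used in the experiment; it computes the gain through the identity Gain = H(Class) + H(Attribute) - H(Class, Attribute), and the attribute names and values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Entropy in bits of a column of discrete (hashable) values."""
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


def gain(cls, attr):
    """Information gain via H(Class) + H(Attribute) - H(Class, Attribute)."""
    return entropy(cls) + entropy(attr) - entropy(list(zip(cls, attr)))


def gain_ratio(cls, attr):
    h_attr = entropy(attr)
    return gain(cls, attr) / h_attr if h_attr > 0 else 0.0


def symmetrical_uncertainty(cls, attr):
    denom = entropy(cls) + entropy(attr)
    return 2 * gain(cls, attr) / denom if denom > 0 else 0.0


def rank(attributes, cls, evaluator):
    """Score every attribute on its own and sort from best to worst."""
    scores = {name: evaluator(cls, col) for name, col in attributes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical discretized attributes and process-time classes.
cls = ["short", "short", "long", "long"]
attributes = {
    "operator": ["x", "y", "x", "y"],
    "lot_id": ["1", "2", "3", "4"],
    "drill_diameter": ["s", "s", "l", "l"],
}
gr_ranking = rank(attributes, cls, gain_ratio)
su_ranking = rank(attributes, cls, symmetrical_uncertainty)
print([name for name, _ in gr_ranking])
# ['drill_diameter', 'lot_id', 'operator']
```

Note how `lot_id` has the same raw gain as `drill_diameter` but a lower score under both measures, because dividing by the attribute entropy penalizes attributes with many distinct values.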

In brief, the method design is summarized in Table-3.


Type       Evaluator & Search Method           Classifier & Search Strategy
GR-SNBC    Information Gain Ratio & Ranker     Naïve Bayes & Greedy Stepwise (forward)
SU-SNBC    Symmetrical Uncertainty & Ranker    Naïve Bayes & Greedy Stepwise (forward)
SU-CART    Symmetrical Uncertainty & Ranker    Classification and Regression Tree & Greedy Stepwise (forward)

Table-3 tableau of experiment design

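Source document: IC基板製程時間之特徵選擇研究－以鑽孔作業為例 (A Study of Feature Selection for IC-Substrate Process Time: The Case of the Drilling Operation), NCCU Academic Hub (政大學術集成), pp. 25-32.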
