Genetic Algorithm for MDDP-t - Advanced Bit-wise Indexing Method



$$\mathrm{RCEF}(m_{i,j}) = \sum_{k=1}^{3} w_k \, f_k(m_{i,j})$$

However, the corresponding weights W={w1, w2, w3} of these three RCEF probing functions require further investigation. A genetic algorithm is thus used to solve the weight-learning problem of the three given probing functions in order to determine suitable weights for the MDDP-t.

5.3 Genetic Algorithm for MDDP-t

The search space in a GA (Genetic Algorithm) consists of the possible solutions to a problem [15]. A solution in the search space is called an individual, and its genotype consists of a set of chromosomes represented by sequences of 0s and 1s. These chromosomes determine the individual's phenotype. Each individual has an associated objective-function value called its fitness. A good individual is one that has a high or low fitness value, depending on whether the problem involves maximization or minimization. The strength of a chromosome in an individual is represented by its fitness value, and the chromosomes of fit individuals are carried to the next generation. The set of individuals with associated fitness values is called the population. The population at a given stage in the GA is referred to as a generation. The best individual in each generation is the individual with the best fitness value discovered so far.

There are three main components in the GA while loop:

(1) selection/reproduction, the process of selecting good individuals from the current generation to be carried to the next generation;

(2) crossover, the process of shuffling two randomly selected strings (chromosomes) in two parent individuals to generate new offspring;

(3) replacement, the replacing of the worst-performing individuals in a generation based on fitness value.

Sometimes one or more bits of a chromosome are complemented to generate a new offspring; this process is called mutation. The population size is finite in each GA generation, which implies that only relatively fit individuals in generation j will be carried to the next generation j+1. The strength of a GA is that the algorithm converges rapidly to an optimal or near-optimal solution. The iterative process is terminated when the solution reaches the optimum value [16].

Details of the GA developed to solve MDDP-t are described in this section. As mentioned above, the weight set W is important in solving MDDP-t. Since the weights are domain-dependent, we propose a GA-based weight-learning function for MDDP-t that finds a weight w for each probing function from MDDP-t instances with known root causes. The weight-learning function is described in detail below.

NOTATION 5.4

$M_i$: the machine set for the i-th MDDP-t instance;

$rm_i$: the root-cause machine already known to cause the defect in the i-th MDDP-t instance;

$\mathrm{rank}(M_i, rm_i) = k$, where $rm_i$ has the k-th largest RCEF value in set $M_i$.

Weight-learning Problem: Given k MDDP-t instances, find weights w1, w2 and w3 to minimize:

$$\sum_{i=1}^{k} \mathrm{rank}(M_i, rm_i) \tag{5.4}$$
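The objective in (5.4) can be illustrated with a minimal sketch. It assumes that the three probing-function values of every machine are already available as a tuple and that the RCEF value is their weighted sum. The function names (`rcef`, `rank_of_root_cause`, `rank_sum`), the data layout, the tie-breaking rule and the sample numbers are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: machine name -> (f1, f2, f3) probing-function values.
Instance = Dict[str, Tuple[float, float, float]]


def rcef(weights: Sequence[float], probes: Sequence[float]) -> float:
    """Weighted sum of the probing-function values (assumed form of the RCEF)."""
    return sum(w * f for w, f in zip(weights, probes))


def rank_of_root_cause(weights: Sequence[float], machines: Instance, root: str) -> int:
    """Return k such that `root` has the k-th largest RCEF value in `machines`.

    Assumption: ties count against the root cause, so all-zero weights
    cannot produce an artificially good rank.
    """
    root_score = rcef(weights, machines[root])
    return sum(rcef(weights, probes) >= root_score for probes in machines.values())


def rank_sum(weights: Sequence[float], instances: List[Tuple[Instance, str]]) -> int:
    """Objective (5.4): sum of the root-cause ranks over all instances."""
    return sum(rank_of_root_cause(weights, m, rm) for m, rm in instances)


# Made-up training data: two instances with known root causes "A" and "B".
instances = [
    ({"A": (9, 1, 2), "B": (2, 8, 1), "C": (1, 2, 3)}, "A"),
    ({"A": (3, 7, 1), "B": (6, 2, 2), "C": (1, 1, 1)}, "B"),
]
for candidate in [(3, 5, 4), (5, 1, 1)]:
    print(candidate, rank_sum(candidate, instances))
```

A smaller sum means the known root causes sit closer to the top of their ranking lists, which is the criterion the GA is asked to minimize.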

EXAMPLE 5.3:

Assume three MDDP-t instances and three candidate weight sets, labelled w1, w2 and w3. Table 5.2 shows the rank of the actual root cause in each of the three datasets when evaluated with each weight set. Because w1 yields the smallest rank sum (4, against 9 for w2 and 7 for w3), w1 is the best choice.

Table 5.2: Weight-learning function example for three MDDP-t instances

| Weight set | rank(M₁, rm₁) | rank(M₂, rm₂) | rank(M₃, rm₃) | Sum over i = 1..3 of rank(Mᵢ, rmᵢ) |
|---|---|---|---|---|
| w1 | 1 | 1 | 2 | 4 |
| w2 | 2 | 3 | 4 | 9 |
| w3 | 1 | 2 | 4 | 7 |
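The selection in Example 5.3 amounts to summing each row of Table 5.2 and taking the minimum. The short sketch below reproduces that arithmetic; the dictionary name `table_5_2` is only an illustrative label.

```python
# Ranks from Table 5.2: weight set -> rank of the true root cause per instance.
table_5_2 = {
    "w1": [1, 1, 2],
    "w2": [2, 3, 4],
    "w3": [1, 2, 4],
}

# Rank sum per weight set, then the weight set with the smallest sum.
rank_sums = {name: sum(ranks) for name, ranks in table_5_2.items()}
best = min(rank_sums, key=rank_sums.get)

print(rank_sums)
print(best)
```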

There are four parts to our GA approach: encoding, crossover/mutation, selection/terminal conditions, and fitness determination. In general, the chromosomes in the first generation are created randomly and succeeding generations are generated by crossover and mutation. Details of these four parts are given below.

Encoding

The proposed probing functions are based on expert experience, and each chromosome is the concatenation of the bit-strings representing w₁, w₂ and w₃. Since not all probing functions are used in every domain, the n-bit flags e₁, e₂ and e₃ help the GA efficiently determine which probing functions to use in the RCEF function. When all bits of the flag ei are zero, the weight wi of that probing function is set to zero in the chromosome. The n-bit ei is used to set the probing-function determination probability to 1/n; for example, if the determination probability is 25%, the number of bits for ei is set to 4. Each flag ei occupies n bits at the tail of its weight string, and its initial value is set randomly. As an example with 5-bit weights and 2-bit flags, assume that w1=00011=3, e1=01, w2=00101=5, e2=10, w3=00100=4, and e3=10. The chromosome thus generated is 000110100101100010010.
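The encoding example can be checked with a small sketch. It assumes 5-bit weights followed by 2-bit flags, as in the example chromosome, and applies the rule from Step 5 of Algorithm 5.1 that a weight becomes 0 when its flag is all zeros. The helper names `encode` and `decode` and the constants are hypothetical.

```python
WEIGHT_BITS = 5  # bits per weight, as in the example chromosome
FLAG_BITS = 2    # bits per flag, as in the example chromosome


def encode(weights, flags):
    """Concatenate each weight (as a fixed-width bit-string) with its flag."""
    return "".join(format(w, f"0{WEIGHT_BITS}b") + f for w, f in zip(weights, flags))


def decode(chromosome):
    """Recover the weights; a weight is set to 0 when its flag is all zeros."""
    gene = WEIGHT_BITS + FLAG_BITS
    weights = []
    for start in range(0, len(chromosome), gene):
        weight_bits = chromosome[start:start + WEIGHT_BITS]
        flag_bits = chromosome[start + WEIGHT_BITS:start + gene]
        weights.append(int(weight_bits, 2) if "1" in flag_bits else 0)
    return weights


chromosome = encode([3, 5, 4], ["01", "10", "10"])
print(chromosome)
print(decode(chromosome))
```

Keeping the flag next to its weight means a one-point crossover usually moves a weight together with its own flag.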

Crossover/Mutation Procedures

Many methods can be employed in the crossover process, so a suitable operation should be selected according to the application domain. For example, the strings 001111001011001001 and 010011011001001011 could be crossed over after the second locus in each to produce 000011011001001011 and 011111001011001001. Our experience indicates that the random one-point crossover method is suitable for solving MDDP-t weight-learning problems.
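The crossover example can be reproduced with the following sketch of one-point crossover on bit-strings. The function name `one_point_crossover` and its optional `point` and `rng` parameters are illustrative choices; when no point is given, the cut is drawn at random, which is one possible reading of "random one-point crossover".

```python
import random


def one_point_crossover(a, b, point=None, rng=random):
    """Swap the tails of two equal-length bit-strings after locus `point`."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if point is None:
        # Random cut that leaves at least one bit on each side.
        point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


children = one_point_crossover(
    "001111001011001001", "010011011001001011", point=2
)
print(children)
```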

The conventional bit-inversion method can be used in the mutation process. For example, the second position in the string 001111001011001001 might be mutated to yield 011111001011001001 by changing the 0 to 1 in bit 2. Our experience indicates that the inversion probability should be set to 0.05.
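A minimal sketch of bit-inversion mutation follows. It assumes that the inversion probability of 0.05 applies independently to every bit, which the text does not state explicitly; the names `flip_bit` and `mutate` are hypothetical.

```python
import random


def flip_bit(chromosome, position):
    """Invert the bit at a 1-based position, as in the example above."""
    i = position - 1
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]


def mutate(chromosome, rate=0.05, rng=random):
    """Invert each bit independently with probability `rate` (assumed per bit)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


print(flip_bit("001111001011001001", 2))
print(mutate("001111001011001001", rng=random.Random(0)))
```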

Selection/Terminal Conditions

The population size in each generation and the terminal conditions can be determined according to the application domain. Our experience indicates that the initial number of chromosomes in the population should be set to 300 and the terminal condition to 500 generations.

Fitness Function

Many chromosomes are produced in each generation and weights W must be evaluated. In order to identify suitable weight sets, all machine information is input to the RCEF, which then computes the actual root-cause rankings. An MDDP-t GA fitness function and MDDP-t GA algorithm are shown below.

MDDP-t GA fitness function

For n given MDDP-t instances MDDP-t₁, MDDP-t₂, …, MDDP-tₙ, let rmₖ be the actual root cause of instance MDDP-tₖ. Weight set Wi is better than weight set Wj if

$$\sum_{k=1}^{n} \mathrm{rank}(M_k, rm_k)$$

using weight set Wi is smaller than the same sum using Wj.

Algorithm 5.1 - MDDP-t GA algorithm

Input: Training datasets

Output: The weight set W for the RCEF

Step 1: Initialize population (bit-strings combining w1, e1, w2, e2, w3, e3)

Step 2: Choose parents

Step 3: Construct offspring using one-point crossover

Step 4: Call mutation procedure

Step 5: For all flags ei, if ei is all 0, set wi=0; otherwise wi=wi

Step 6: Evaluate offspring and replace least-fit individual with better offspring

Step 7: Go to Step 2 until a terminal condition is reached
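The steps of Algorithm 5.1 can be sketched end to end as follows. This is an illustration under several assumptions that the text leaves open: parents are drawn uniformly at random, mutation is applied per bit, the RCEF is a weighted sum of precomputed probing-function values, ties in the ranking count against the root cause, and weights and flags use 5 and 2 bits. All names (`run_ga`, `decode`, `rank_sum`, `toy`) and the training data are hypothetical.

```python
import random

WEIGHT_BITS, FLAG_BITS, N_WEIGHTS = 5, 2, 3
GENE = WEIGHT_BITS + FLAG_BITS
LENGTH = GENE * N_WEIGHTS


def decode(chrom):
    """Step 5: a weight is forced to 0 when its flag bits are all 0."""
    weights = []
    for s in range(0, LENGTH, GENE):
        flag = chrom[s + WEIGHT_BITS:s + GENE]
        weights.append(int(chrom[s:s + WEIGHT_BITS], 2) if "1" in flag else 0)
    return weights


def rank_sum(weights, instances):
    """Fitness to minimize: sum of the root cause's rank over all instances."""
    total = 0
    for machines, root in instances:
        scores = {m: sum(w * f for w, f in zip(weights, p)) for m, p in machines.items()}
        # Ties count against the root cause (an assumption of this sketch).
        total += sum(s >= scores[root] for s in scores.values())
    return total


def run_ga(instances, pop_size=300, generations=500, rate=0.05, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population of bit-strings.
    pop = ["".join(rng.choice("01") for _ in range(LENGTH)) for _ in range(pop_size)]
    fit = [rank_sum(decode(c), instances) for c in pop]
    for _ in range(generations):          # Step 7: fixed generation budget
        a, b = rng.sample(pop, 2)         # Step 2: choose parents (uniformly)
        cut = rng.randint(1, LENGTH - 1)  # Step 3: one-point crossover
        for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
            child = "".join(              # Step 4: bit-inversion mutation
                ("1" if bit == "0" else "0") if rng.random() < rate else bit
                for bit in child
            )
            child_fit = rank_sum(decode(child), instances)
            worst = max(range(pop_size), key=fit.__getitem__)
            if child_fit < fit[worst]:    # Step 6: replace the least-fit individual
                pop[worst], fit[worst] = child, child_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best]


# Made-up training data: machine -> (f1, f2, f3), plus the known root cause.
toy = [
    ({"A": (9, 1, 2), "B": (2, 8, 1), "C": (1, 2, 3)}, "A"),
    ({"A": (3, 7, 1), "B": (6, 2, 2), "C": (1, 1, 1)}, "B"),
]
print(run_ga(toy, pop_size=30, generations=200))
```

The defaults of 300 individuals and 500 generations follow the settings reported above; the demo call uses smaller values only to keep the run short.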

Training will generate several weight sets, which can then be applied to detect root causes in future datasets. When a new dataset with an unknown root cause is input into the manufacturing defect detection system for root-cause discovery, it must first be translated into MDDP-t terms. The top-ranked weight combination is then used to generate a ranking list of possible root causes. Engineers can use this ranking list to check machines one by one and filter out possible killer machines. Finally, engineers can record the real root cause and may re-run the MDDP-t learning procedure if the weights resulting from training fail to identify the correct root cause.

In the document: A Study of Fast Indexing Mechanisms in Knowledge Systems (pp. 121-128)