3.3.1 The OO Theory Based Searching Procedures
The stochastic simulation optimization problem considered here is stated as follows:
$$\min_{\theta \in \Theta} J(\theta) \qquad (3.3)$$
where $\Theta$ is a huge decision-variable space and $J(\cdot)$ is the objective function, which may be an expected output, or a function of expected outputs, of the simulated system. To cope with the computational complexity of this problem, we employ the Ordinal Optimization (OO) theory based searching procedure [26]-[27], which efficiently seeks a good enough solution with high probability instead of searching for the best solution with certainty. The procedure rests on the observation that the performance order of the decision-variable vectors is likely to be preserved even when they are evaluated by a crude model. From here on, we use the word vector to denote a vector of decision variables, written $\theta$.
The existing searching procedure of OO can be summarized as follows [27]: (i) Uniformly or randomly select N, say 1000, vectors of decision variables from $\Theta$. (ii) Evaluate and order the N vectors using an approximate model, then pick the top s, say 35, vectors to form the estimated good enough subset. (iii) Evaluate and order all the s vectors obtained from (ii) using the exact model, then pick the top k ($\geq 1$) vectors. Because the performance order of the vectors is likely to be preserved even under a crude model, the OO approach can shrink the search space with cheap evaluations, as in (ii), and thereby save computational time. The best vector obtained in (iii) is proved in [27] to be a good enough solution, i.e., a top 5% solution among the N (=1000) vectors, with probability 0.95.
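Steps (i)-(iii) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `oo_search`, the toy objective `exact_model`, its noisy stand-in `crude_model`, and the five-dimensional sampling box are hypothetical and are not taken from [27].

```python
import random

def oo_search(sample_vector, crude_model, exact_model, n=1000, s=35, seed=0):
    """Steps (i)-(iii) of the OO searching procedure for a minimization problem."""
    rng = random.Random(seed)
    # (i) randomly select N vectors from the decision-variable space
    candidates = [sample_vector(rng) for _ in range(n)]
    # (ii) order the N vectors with the cheap crude model and keep the top s
    selected = sorted(candidates, key=crude_model)[:s]
    # (iii) evaluate only the s survivors with the exact model; return the best (k = 1)
    return min(selected, key=exact_model)

# Hypothetical toy problem (an assumption, not the simulated system of the text).
def sample_vector(rng):
    return tuple(rng.uniform(-1.0, 1.0) for _ in range(5))

def exact_model(theta):
    return sum(t * t for t in theta)

noise_rng = random.Random(1)

def crude_model(theta):
    # crude model = exact value corrupted by bounded modeling noise
    return exact_model(theta) + noise_rng.uniform(-0.1, 0.1)

best = oo_search(sample_vector, crude_model, exact_model)
print(best, exact_model(best))
```

Only s of the N sampled vectors are ever passed to the exact model, which is where the computational saving of step (ii) comes from.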
However, the good enough solution of problem (3.3) that we are searching for should be a good enough vector in $\Theta$, not merely among the N sampled vectors, unless $|\Theta|$ is as small as N [28]-[29]. As indicated in a recent paper by Lin and Ho [30], under moderate modeling noise, the top 3.5% of the uniformly selected N vectors will be among the top 5% of the vectors of a huge $\Theta$ with very high probability (0.99), and in the best case, when there is no modeling error, they can be among the top 3.5% of the vectors of $\Theta$. However, for $\Theta$ of size $10^{30}$, a top 3.5% vector is merely one among the top $3.5 \times 10^{28}$. This hardly seems a good enough solution in the sense of practical optimization; it is acceptable only when $\Theta$ contains lots of good vectors, so that even if the performance order of the selected vector is not practically good enough, the corresponding objective value is. As a matter of fact, most practical stochastic simulation optimization problems do not have lots of good vectors; otherwise, finding a good enough solution would not be difficult. Therefore, to apply the existing ordinal optimization searching procedures, we need to develop a new scheme that selects N excellent vectors from $\Theta$ to replace (i), so as to ensure that the finally selected vector is a good enough solution of (3.3) from the practical viewpoint.
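As a quick check of the count quoted above, taking $|\Theta| = 10^{30}$ as assumed in this section:

```latex
0.035 \times |\Theta| = 0.035 \times 10^{30} = 3.5 \times 10^{28}
```

so a vector that is only guaranteed to lie in the top 3.5% may still have on the order of $10^{28}$ better vectors ahead of it.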
Heuristic methods for obtaining N excellent vectors may depend on how well one knows the considered system. For instance, for optimal power flow problems with discrete control variables, Lin et al. proposed an algorithm based on the OO theory and engineering intuition to select N excellent discrete control vectors [31]. However, engineering intuition may work only for specific systems. Thus, in this section, we propose an OO theory based systematic approach to select N excellent vectors from $\Theta$, and combine it with the existing ordinal optimization searching procedures to find a good enough solution of (3.3). The systematic method we propose for finding N excellent vectors combines an Artificial Neural Network (ANN) with the Genetic Algorithm (GA). We use the ANN to construct the crude model required to evaluate the objective values of the vectors. Using this efficient evaluation as the fitness value of a vector, the GA can efficiently find N excellent vectors from $\Theta$.
3.3.2 Finding N Roughly Good Vectors from the Decision-Variable Space
As indicated in the OO theory [26]-[27], the performance "order" of the vectors is likely to be preserved even when they are evaluated using a crude model. Thus, to select N roughly good vectors from $\Theta$ without consuming much computation time, we need to construct a crude but effective and efficient model to evaluate the objective value of (3.3) for a given vector $\theta$, and to use an efficient scheme to select the N roughly good vectors. Our crude model is constructed based on an ANN [32], and our selection scheme is the GA [33].
3.3.2.1 The Artificial Neural Network (ANN) Based Model
Taking the vectors $\theta$ as the inputs and the corresponding objective values $J(\theta)$ as the outputs, we can use an ANN to implement the mapping from the inputs to the outputs [32]. First, we select a representative subset of $\Theta$ by uniformly picking M, say 1000, vectors from $\Theta$. Then we evaluate the objective values of these M vectors using an exact model, which can be a stochastic simulation with a moderate number of test samples as indicated in [28]. The M collected input-output pairs $(\theta, J(\theta))$ are used to train the ANN, i.e., to adjust its arc weights. Once the ANN is trained, we can input any vector $\theta$ and obtain an estimate of the corresponding $J(\theta)$ from the output of the ANN; in this manner, we avoid an accurate but lengthy stochastic simulation to evaluate $J(\theta)$ for a given $\theta$. This forms our crude but efficient model for roughly estimating the objective value of (3.3) for a given vector $\theta$. The effectiveness of this crude model is justified by the OO theory as mentioned above, because what we care about here is the relative order of the $\theta$'s, not the values of the $J(\theta)$'s.
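A minimal sketch of such an ANN-based crude model is given below, assuming a single hidden layer of tanh units trained by plain stochastic gradient descent. The network size, learning rate, the two-dimensional toy objective `exact_model`, and all function names are hypothetical choices for illustration; the text does not specify the ANN architecture or training rule.

```python
import math
import random

def train_ann(pairs, hidden=8, epochs=200, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network to (theta, J(theta)) pairs by
    stochastic gradient descent on the squared error; return a predictor."""
    rng = random.Random(seed)
    dim = len(pairs[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = [0.0]

    def forward(theta):
        h = [math.tanh(sum(w * t for w, t in zip(row, theta)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2[0]

    for _ in range(epochs):
        for theta, target in pairs:
            h, out = forward(theta)
            err = out - target
            for j in range(hidden):
                # error back-propagated to hidden unit j (uses w2[j] before its update)
                delta = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * delta
                for i in range(dim):
                    w1[j][i] -= lr * delta * theta[i]
            b2[0] -= lr * err
    return lambda theta: forward(theta)[1]

# Hypothetical stand-in for the exact model (an assumption for illustration).
def exact_model(theta):
    return sum(t * t for t in theta)

data_rng = random.Random(1)
thetas = [tuple(data_rng.uniform(-1.0, 1.0) for _ in range(2)) for _ in range(100)]
pairs = [(th, exact_model(th)) for th in thetas]   # the M input-output pairs

def mse(model):
    return sum((model(th) - j) ** 2 for th, j in pairs) / len(pairs)

untrained = train_ann(pairs, epochs=0)
crude_model = train_ann(pairs)
print(mse(untrained), mse(crude_model))
```

Once trained, `crude_model(theta)` costs a handful of multiplications, so it can be called many times inside a search loop where a full simulation would be too slow.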
3.3.2.2 The Genetic Algorithm (GA)
With the aid of the above effective and efficient model for evaluating objective values (the so-called fitness values in GA terminology), we can efficiently select N roughly good vectors from $\Theta$ using the GA, which is briefly described as follows. Assuming that an initial random population has been produced and evaluated, genetic evolution takes place by means of three basic genetic operators: (a) parent selection; (b) crossover; (c) mutation. A population in GA terminology represents a vector in our problem, and each population is encoded by a string of 0s and 1s. The string is called a chromosome. Parent selection is a simple procedure whereby two chromosomes are selected from the parent population based on their fitness values. Solutions with high fitness values have a high probability of contributing new offspring to the next generation. The selection rule used in our approach is simple roulette-wheel selection [33]. Crossover is an extremely important operator for the GA. It is responsible for structure recombination (information exchange between mating chromosomes) and for the convergence speed of the GA, and it is usually applied with a relatively high probability, say 0.7. The chromosomes of the two selected parents are combined to form new chromosomes that inherit segments of the information stored in the parent chromosomes.
There are many crossover schemes; we employ single-point crossover [33] in our approach. While crossover is the main genetic operator for exploiting the information contained in the current generation, it does not produce new information. Mutation is the operator responsible for injecting new information. With a small probability, random bits of the offspring chromosomes flip from 0 to 1 or vice versa, giving new characteristics that do not exist in the parent population. In our approach, the mutation operator is applied with a relatively small probability, 0.02, to every bit of the chromosome.
Two criteria are used for the convergence of the GA: either the fitness value of the best population does not improve over the previous generation, or a sufficient number of generations has evolved.
The initial populations of the GA employed in our first-level approach are I, say 5000, vectors randomly selected from $\Theta$. After the GA converges, we rank the final generation of these I populations by their fitness values and pick the top N populations, which form the N roughly good vectors that we look for.
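The GA described above can be sketched as follows. This is a simplified illustration, not the exact implementation used here: it stops after a fixed number of generations (only the second convergence criterion), it assumes a positive, larger-is-better fitness (for the minimization problem (3.3) the fitness would have to be a decreasing transform of the ANN estimate), and the function names and the bit-counting `toy_fitness` are hypothetical.

```python
import random

def run_ga(fitness, n_bits, pop_size=50, generations=30,
           p_cross=0.7, p_mut=0.02, top_n=10, seed=0):
    """Simple binary GA; fitness must be positive and pop_size even."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):          # stop after a fixed number of generations
        weights = [fitness(c) for c in pop]
        # (a) roulette-wheel parent selection: probability proportional to fitness
        parents = rng.choices(pop, weights=weights, k=pop_size)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            # (b) single-point crossover, applied with probability p_cross
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            offspring += [a, b]
        # (c) mutation: flip every bit independently with probability p_mut
        pop = [[bit ^ 1 if rng.random() < p_mut else bit for bit in c]
               for c in offspring]
    # rank the final generation and keep the top_n chromosomes
    return sorted(pop, key=fitness, reverse=True)[:top_n]

# Hypothetical fitness for illustration: one plus the number of 1 bits.
def toy_fitness(chromosome):
    return 1 + sum(chromosome)

good = run_ga(toy_fitness, n_bits=16)
print([toy_fitness(c) for c in good])
```

In the first-level approach, `toy_fitness` would be replaced by a fitness derived from the trained ANN, with pop_size = I and top_n = N.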
3.3.3 Searching the Good Enough Solution Among the N Roughly Good Vectors
Starting from the selected N roughly good vectors, in the second level we proceed directly with step (ii) of the existing ordinal optimization searching procedures described in Section 3.3.1. In this step, we evaluate the objective value of each vector using a more refined model³ than the crude one employed in the first level. We order the N vectors based on the estimated objective values and choose the top s vectors to form the Selected Subset (SS). Then, as indicated in step (iii) of the existing ordinal optimization searching procedures, we evaluate each of the s vectors using the exact model of the considered problem, which is a stochastic simulation with a sufficiently large number of test samples to make the estimate of $J(\theta)$ for a given $\theta$ sufficiently stable. The vector associated with the smallest objective value of (3.3) among the s vectors is the good enough solution that we seek.
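The second level can be sketched as below, assuming that both the more refined model and the exact model are the same stochastic simulation run with a small and a large number of test samples, respectively. The simulator `simulate`, its Gaussian noise, the sample counts, and the randomly generated stand-in for the N roughly good vectors are all hypothetical.

```python
import random
import statistics

def simulate(theta, n_samples, rng):
    """Hypothetical stochastic simulation: sample mean of a noisy toy objective."""
    true_j = sum(t * t for t in theta)
    return statistics.fmean([true_j + rng.gauss(0.0, 0.5) for _ in range(n_samples)])

def second_level(vectors, s=35, short_run=10, long_run=1000, seed=0):
    rng = random.Random(seed)
    # step (ii): rank the N vectors with a short simulation and keep the top s
    selected = sorted(vectors, key=lambda th: simulate(th, short_run, rng))[:s]
    # step (iii): long simulation on the s survivors; the smallest estimate wins
    return min(selected, key=lambda th: simulate(th, long_run, rng))

# Stand-in for the N roughly good vectors delivered by the first level.
pool_rng = random.Random(2)
vectors = [tuple(pool_rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(1000)]
good_enough = second_level(vectors)
print(good_enough)
```

With these illustrative settings, the long simulation is run for only 35 of the 1000 vectors.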
3.3.4 The OO Theory Based Two-level Algorithm
Now, our OO theory based two-level algorithm can be stated as follows.
Step 1: Uniformly select M $\theta$'s from $\Theta$ and use an exact model to compute the corresponding $J(\theta)$'s. Train an ANN (or ANNs) by adjusting its (or their) arc weights using the given M input-output pairs $(\theta, J(\theta))$.
Step 2: Randomly select I vectors from $\Theta$ as the initial populations. Apply the GA to these populations, using the efficient and effective fitness-value evaluation model based on the ANN trained in Step 1. After the algorithm converges, rank all the final I populations based on their fitness values and select the top N populations.
Steps 1 and 2 constitute the first-level approach.
³ This more refined model can be, for example, a stochastic simulation with a small number of test samples [28] to evaluate the objective value of a given vector in the considered problem.
Step 3: Use a more refined model than the ANN to estimate the objective values of the N vectors obtained in Step 2. Rank the N vectors based on their estimated objective values and select the top s vectors.
Step 4: Use the exact model of the considered problem to compute the objective values of the s vectors. The vector with the smallest objective value of (3.3) is the good enough solution.
Steps 3 and 4 constitute the second-level approach. The overall structure of the proposed OO theory based two-level algorithm is shown in Figure 3.2.
[Figure 3.2 flowchart: randomly select I (=5000) solutions as the initial population → GA, with the ANN roughly evaluating $E[J(\theta)]$ → pick the best N (=1000) solutions → run a shorter stochastic simulation for each of the N solutions and compute the approximate $E[J(\theta)]$ → pick the best s (=35) solutions → run a lengthy stochastic simulation for each of the s designs and compute the exact $E[J(\theta)]$ → the best solution is the good enough solution.]
Figure 3.2. The structure of the OO theory based two-level algorithm.
3.3.5 Performance Evaluation
3.3.5.1 Performance Evaluation of the First-level Approach
Since the performance of the second-level approach has been thoroughly investigated in [27], what we need to address here is how excellent the N selected vectors are within various types of decision-variable space $\Theta$, so as to demonstrate the validity of our first-level approach. This evaluation is carried out below, and the performance of the two-level algorithm is presented afterwards.
As indicated in [27], the Order Performance Curve (OPC) of all the ordered vectors $\theta_1, \theta_2, \ldots, \theta_{|\Theta|}$ in $\Theta$ is determined by the spread of the ordered performances $J_{[1]}, J_{[2]}, \ldots, J_{[|\Theta|]}$, where $J_{[i]}$ denotes $J(\theta_i)$. Without loss of generality, the $J_{[i]}$'s can be normalized into the range [0,1], i.e., for $i = 1, 2, \ldots, |\Theta|$, $y_i = (J_{[i]} - J_{[1]})/(J_{[|\Theta|]} - J_{[1]})$. Meanwhile, the $|\Theta|$ ordered vectors, spaced equally, are also mapped into the range [0,1] such that, for $i = 1, 2, \ldots, |\Theta|$, $z_i = (i-1)/(|\Theta|-1)$. There are five broad categories of OPC models: (i) lots of good vectors, (ii) lots of intermediate but few good and bad vectors, (iii) equally distributed good, bad and intermediate vectors, (iv) lots of good and lots of bad but few intermediate vectors, and (v) lots of bad vectors. Figure 3.3 shows these five types of OPCs graphically. More precisely, a standardized OPC can be described by a two-parameter smooth curve $y = F^{-1}(z \mid \alpha, \beta)$, where $F(\cdot \mid \alpha, \beta)$ is the Incomplete Beta function of the two parameters $(\alpha, \beta)$. In general, $\alpha<1$, $\beta>1$ corresponds to the OPC of type (i); $\alpha>1$, $\beta>1$ corresponds to the OPC of type (ii); $\alpha=1$, $\beta=1$ corresponds to the OPC of type (iii); $\alpha<1$, $\beta<1$ corresponds to the OPC of type (iv); and $\alpha>1$, $\beta<1$ corresponds to the OPC of type (v).

Figure 3.3: Five types of standardized OPCs.

As indicated in Section 3.1, we need not consider the types of $\Theta$ consisting of lots of good vectors in this evaluation; thus we take only the three OPC types (ii), (iii) and (v) into account. For the purpose of evaluation, we assume the size of the decision-variable space to be $10^{30}$.⁴
The roughness of the ANN model can be described by adding a uniform noise to the normalized performances $y_i$ [26]-[27]. That is, the ANN model can be described by a noisy model in which each $y_i$ is perturbed by an additive random noise drawn from the uniform distribution U[-0.01, 0.01].⁵ Although this noise range may seem conservative, it can switch the order of up to $2 \times 10^{28}$ vectors for a type (iii) OPC.
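The normalization of the ordered performances and the uniform-noise model can be illustrated with a small sketch. The function names and the 1001-point linear (type (iii)) OPC are assumptions made for illustration; the actual evaluation uses a space of size $10^{30}$, which cannot be enumerated.

```python
import random

def standardize_opc(objective_values):
    """Return (z, y): equally spaced positions and normalized ordered performances."""
    ordered = sorted(objective_values)
    lo, hi = ordered[0], ordered[-1]
    n = len(ordered)
    y = [(j - lo) / (hi - lo) for j in ordered]   # y_i in [0, 1]
    z = [i / (n - 1) for i in range(n)]           # z_i = (i - 1)/(n - 1) for 1-based i
    return z, y

def observed_order(y, half_width=0.01, seed=0):
    """Rank the vectors by the noisy performance y_i + U[-half_width, half_width]."""
    rng = random.Random(seed)
    noisy = [yi + rng.uniform(-half_width, half_width) for yi in y]
    return sorted(range(len(y)), key=noisy.__getitem__)

z, y = standardize_opc([6.0, 2.0, 4.0])

# Type (iii) OPC (y = z) on 1001 points: neighbouring vectors differ by 0.001.
order = observed_order([i / 1000 for i in range(1001)])
max_shift = max(abs(rank - idx) for rank, idx in enumerate(order))
print(z, y, max_shift)
```

Two vectors can exchange order only if their normalized performances differ by less than 0.02, so no vector moves by more than about 0.02 × 1001 ≈ 20 positions in this toy case; for $|\Theta| = 10^{30}$ the same window covers $0.02 \times 10^{30} = 2 \times 10^{28}$ vectors.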
We studied a total of 28 OPCs distributed uniformly among the three broadly generic types (ii), (iii) and (v), formed from the following parameters: $\alpha$ = 1.0, 2.0, 4.0, 5.0 and $\beta$ = 0.2, 0.4, 0.8, 1.0, 2.0, 4.0, 5.0. In all of our Monte Carlo calculations, we simulated 10000 realizations of noisy OPCs. We found that the top 5% of the top-ranked N (=1000) populations obtained after the GA converges lie in the top $10^{-6}$% of the $|\Theta|$ (=$10^{30}$) populations with probability 0.99. This result is far better than that of the uniformly selected N vectors, whose top 3.5% vectors can at best (i.e., with no modeling error) be among the top 3.5% vectors of $\Theta$, as indicated in [30]. This shows that the N vectors obtained by our first-level approach are indeed excellent.
⁴ Since what we care about here is the ranking percentage of the selected N vectors within $\Theta$, we can, without loss of generality, assume $|\Theta| = 10^{30}$ for a typical huge decision-variable space.
⁵ The magnitude of the noise describing the roughness of a crude model is determined either by engineering judgment or by an empirical experiment; in our case, it is estimated from an experiment with this crude model on the application problem of this chapter.
Remark 3: Although we do not investigate the actual order of the N vectors for OPC types (i) and (iv), our first-level approach can still be applied to problems whose $\Theta$ has these two types of OPC. Because the order is highly sensitive to noise in these two types, the order of the N vectors obtained may not be as good as for the other three OPC types; nevertheless, their actual objective values will still be good enough owing to the existence of lots of good vectors. That is, in both OPC types (i) and (iv), good vectors may differ greatly in order while differing very little in objective value. Thus, no matter which type of OPC we are facing, our first-level approach proceeds in the same way.
3.3.5.2 Performance Evaluation of the Two-level Algorithm
As indicated in Section 3.3.1, for N = 1000 and s = 35, the top vector obtained in Step 4 of the two-level algorithm is among the top 5% of the N vectors with probability 0.95. Combining this with the performance evaluation of the first-level approach, we can conclude the following: the good enough solution obtained by the OO theory based two-level algorithm is among the top $10^{-6}$% of $\Theta$ with probability $0.95 \times 0.99$.
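For concreteness, the numbers in this conclusion work out as follows, taking $|\Theta| = 10^{30}$ as assumed in the first-level evaluation and multiplying the two probabilities as the text does:

```latex
0.95 \times 0.99 = 0.9405, \qquad
10^{-6}\,\% \times |\Theta| = 10^{-8} \times 10^{30} = 10^{22}
```

That is, with probability of about 0.94 the selected vector is among the best $10^{22}$ of the $10^{30}$ vectors.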