

5. Sampling-test scheme

Conventionally, the effectiveness of an optimizer is examined via experiments on a suite of test functions that serves as a benchmark, and these test functions are selected according to prior knowledge of their importance. Here we propose and adopt a different approach in order to confirm the theoretical results obtained in the previous section from an empirical aspect. We draw a sample of functions from PDLC uniformly at random, much as respondents are selected for an opinion survey, and conduct experiments on these sampled functions. No bias is introduced in favor of particular functions, and we expect this arbitrariness to deliver information about the composition of the problem class as a whole.

A uniform sampler for PDLC is first given in Section 5.1. Experiments are then presented in Section 5.2 to conclude this section and to demonstrate, from a practical standpoint, how the Lipschitz condition facilitates the search process.

5.1. A uniform sampler for PDLC

In order to conduct the sampling test, we need a uniform sampler in the first place. The following algorithm generates problem instances of PDLC with Lipschitz constant K u.a.r.

Algorithm 2 (Uniform PDLC Sampler).

procedure Uniform PDLC Sampler(v1 v2 … vn, Y = {0, 1, …, m}, K)
    f(v1) ← Uniform([0, m])
    i ← 2
    while i ≤ n do
        f(vi) ← f(vi−1) + Uniform([−K, K])
        if f(vi) > m or f(vi) < 0 then
            f(v1) ← Uniform([0, m])
            i ← 1
        end if
        i ← i + 1
    end while
    return f
end procedure

Here Uniform([a, b]) denotes the function that selects an integer u.a.r. from the closed interval [a, b]. Such a sampler belongs to the category of accept–reject algorithms [25]. It generates, u.a.r., a problem instance whose difference between any two successive vertexes is bounded by K, and if the instance at hand exceeds the range of the codomain, the sampler rejects the instance.

The accept–reject mechanism guarantees the uniformity. Once the sampler halts, the output is always an instance of the PDLC.
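To make the procedure concrete, the following is a minimal Python sketch of Algorithm 2; the function name uniform_pdlc_sampler and the representation of f as a list are our own illustrative choices rather than anything prescribed by the paper.

```python
import random

def uniform_pdlc_sampler(n, m, K, rng=random):
    """Sketch of Algorithm 2: draw f: {v1, ..., vn} -> {0, ..., m} u.a.r.
    among functions whose successive values differ by at most K."""
    f = [None] * n                    # f[i] stores the value at v_{i+1}
    f[0] = rng.randint(0, m)          # f(v1) <- Uniform([0, m])
    i = 1                             # 0-based counterpart of i = 2
    while i < n:
        f[i] = f[i - 1] + rng.randint(-K, K)
        if f[i] > m or f[i] < 0:      # candidate leaves the codomain:
            f[0] = rng.randint(0, m)  # reject it and restart from v1
            i = 0
        i += 1
    return f
```

Note that a rejection discards the entire partial candidate rather than resampling only the offending step; this restart-from-scratch behavior is precisely what keeps the output distribution uniform over PDLC.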

Since this sampler is a Las Vegas algorithm, in which the answer is guaranteed to be correct but the resources used are not bounded [26], we need to address its time complexity for practicality. For each candidate instance, the sampler goes through at most |X| steps to assign all the vertex objective values, so it remains to show how many candidate instances it takes to generate a legitimate instance. The number of candidates consumed follows a geometric distribution, and therefore its expectation is the inverse of the acceptance probability. The following lemma provides an upper bound on the rejection probability.
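For concreteness, the expectation invoked here is simply the mean of a geometric distribution; no additional assumption is involved:

\[
\mathbb{E}[\text{number of candidate instances}] \;=\; \frac{1}{\operatorname{Prob}\{\text{accept}\}} \;=\; \frac{1}{1 - \operatorname{Prob}\{\text{reject}\}} ,
\]

so any upper bound on the rejection probability immediately translates into an upper bound on the expected number of candidates.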


Lemma 17. Suppose |Y| = 2m + 1, where m is an integer, and |X| = n. If

m > √((n − 1)(K² + K)/3) ≥ 2,

then the rejection probability is less than 4√((n − 1)(K² + K)/3)/|Y| + O(|Y|⁻¹).

Proof. The values f(v₂), …, f(vₙ) produced by the sampler form a random walk started at f(v₁) whose steps are drawn u.a.r. from [−K, K]; writing S_k for the sum of the first k steps, Var[S_{n−1}] = (n − 1)(K² + K)/3. Using Kolmogorov's inequality [27], we can get

Prob{ max_{1≤k≤n−1} |S_k| ≥ λ } ≤ Var[S_{n−1}]/λ²   for every λ > 0.


A candidate instance is rejected only if the walk leaves the codomain, that is, only if max_{1≤k≤n−1} |S_k| reaches the distance from the starting value f(v₁) to the nearer end of the codomain. Averaging the bound above over the uniformly distributed starting value and using the assumption that Var[S_{n−1}]/(m + 1)² < 1, the rejection probability is less than 4√((n − 1)(K² + K)/3)/|Y| + O(|Y|⁻¹), which completes the proof. In particular, if |Y| ≥ C√((n − 1)(K² + K)/3) for a constant C, then the rejection probability is less than 4/C + O(|Y|⁻¹).


For instance, if C is chosen so that the rejection probability is at most 8/9 and |Y| is so large that the O(|Y|⁻¹) term is negligible, the expected number of candidate instances consumed is no more than 9. Multiplying by the time to assign all vertexes' values, the expected runtime, in terms of the number of assignments, is no more than 9|X|. In other words, asymptotically speaking, if |X| and |Y| are about equal and |Y| is larger than K² to some extent, then the expected runtime is approximately linear.
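As an illustrative sanity check (our own, with parameter values of our own choosing), the sketch below runs Algorithm 2 once and counts how many candidate instances are started before one is accepted; averaging this count over a few runs can be compared against the discussion above.

```python
import random

def count_candidates(n, m, K, rng=random):
    """Run Algorithm 2 once and return the number of candidate
    instances started before one is accepted (hypothetical helper)."""
    candidates = 1
    value = rng.randint(0, m)          # f(v1)
    i = 1
    while i < n:
        value += rng.randint(-K, K)
        if value < 0 or value > m:     # reject: restart from v1
            candidates += 1
            value, i = rng.randint(0, m), 0
        i += 1
    return candidates

if __name__ == "__main__":
    n, m, K = 10**4, 10**4, 100        # illustrative values only
    runs = [count_candidates(n, m, K) for _ in range(20)]
    print(sum(runs) / len(runs))       # average number of candidates
```

Fewer restarts are to be expected as the codomain grows relative to the typical spread of the walk, which is the regime described by Lemma 17.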

5.2. Experimental settings and results

As demonstrated in Section 4, the virtues of the subthreshold-seeker rely on a proper algorithmic threshold. Although the main results in Section 4 hold when θ ≤ β_α(f) − K, in real-world applications we do not literally set a performance threshold and scrutinize how many subthreshold points are visited. In an experimental setting we can therefore examine the subthreshold-seeker more practically, in terms of the time to identify the optimum; that is, the algorithmic threshold is utilized for optimization, or more specifically, to minimize the objective function in this case.

We will compare the subthreshold-seeker with random search. Here we present three different subthreshold-seekers.

For theoretical reference, the first one uses the actual median of all objective values, a form of exterior knowledge, as θ. The second one first selects 100 points u.a.r. and then employs their median as θ. The third one also starts by obtaining 100 points u.a.r., but it computes the mean and the standard deviation of these points and sets θ to the mean minus the standard deviation. Moreover, the three subthreshold-seekers and random search obey the NFL framework and hence are non-repeating.
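To make the three threshold rules concrete, here is a minimal sketch, assuming the objective is available as a Python list of values indexed by the points of X; the helper names (pilot_threshold, subthreshold_order, time_to_minimum) and the idealized visiting order are our own illustrative choices, not the authors' implementation.

```python
import random
import statistics

def pilot_threshold(values, rule, pilot_size=100, rng=random):
    """Compute the algorithmic threshold theta under one of the three rules."""
    if rule == "true_median":                 # exterior knowledge
        return statistics.median(values)
    pilot = [rng.choice(values) for _ in range(pilot_size)]
    if rule == "sample_median":
        return statistics.median(pilot)
    if rule == "mean_minus_std":
        return statistics.mean(pilot) - statistics.stdev(pilot)
    raise ValueError(rule)

def subthreshold_order(values, theta, rng=random):
    """Idealized non-repeating seeker: visit subthreshold points
    (f(x) <= theta) before the rest, each group in random order."""
    xs = list(range(len(values)))
    rng.shuffle(xs)
    return [x for x in xs if values[x] <= theta] + \
           [x for x in xs if values[x] > theta]

def time_to_minimum(values, order):
    """Number of evaluations spent before a global minimum is first seen."""
    best = min(values)
    for t, x in enumerate(order, start=1):
        if values[x] == best:
            return t
    return len(values)
```

A non-repeating random-search baseline is simply time_to_minimum(values, xs) with xs shuffled, and an instance produced by the sampler of Section 5.1 can serve as values; comparing the two evaluation counts over many sampled instances mirrors the success-rate measurement reported in Table 1.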

Before conducting the experiments, we need to determine the number of PDLC problem instances to be sampled. Suppose we want to estimate a population proportion q ∈ [0, 1]. We draw a sequence of samples uniformly and independently from the population with replacement, and for each sample we observe whether it belongs to the variety of interest. With a large sample size, we expect the proportion in the sample to approximate the real proportion. The following theorem depicts the relationship between the sample size and the error bound.

Theorem 20 (Sample Size and Error Bound). Let (Z_i) be a sequence of i.i.d. indicator variables with E[Z_i] = q. For all δ, ϵ ∈ (0, 1), if

n ≥ −ln(δ/2)/(2ϵ²),

then

Prob{ |(1/n)∑_{i=1}^{n} Z_i − q| > ϵ } ≤ δ.

Proof. Let Z̄ = (∑_{i=1}^{n} Z_i)/n. Applying Hoeffding's inequality [28], for 0 < ϵ < 1 − q we have

Prob{Z̄ − q > ϵ} ≤ e^{−2nϵ²},

and for 0 < ϵ < q,

Prob{Z̄ − q < −ϵ} ≤ e^{−2nϵ²}.

Moreover, if ϵ ≥ 1 − q,

Prob{Z̄ − q > ϵ} ≤ Prob{Z̄ > 1} = 0 ≤ e^{−2nϵ²}.

Similarly, if ϵ ≥ q,

Prob{Z̄ − q < −ϵ} ≤ Prob{Z̄ < 0} = 0 ≤ e^{−2nϵ²}.

Hence, we conclude that for all ϵ ∈ (0, 1),

Prob{|Z̄ − q| > ϵ} ≤ 2e^{−2nϵ²}.

Finally, n ≥ −ln(δ/2)/(2ϵ²) implies 2e^{−2nϵ²} ≤ δ, and we complete the proof. □

In particular, with the conventional setting of (ϵ, δ) = (0.03, 0.05), a sample of size 2,050 suffices. In other words, if we draw a sample of size 2,050, then [Z̄ − 0.03, Z̄ + 0.03] forms a confidence interval for q with confidence level at least 95%.
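As a quick check of this figure (our own arithmetic, not part of the paper), plugging (ϵ, δ) = (0.03, 0.05) into Theorem 20 gives

\[
n \;\ge\; \frac{-\ln(0.05/2)}{2 \times 0.03^{2}} \;=\; \frac{\ln 40}{0.0018} \;\approx\; 2049.4,
\]

so 2,050 is the smallest admissible integer sample size.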


Table 1
Successful rate.

θ          Category    (|X|, |Y|)
                       (10⁴, 10⁴)        (10⁵, 10⁵)        (10⁶, 10⁶)
γ          >           0.9995 (2,049)    0.9985 (2,047)    0.9976 (2,045)
           > .2        0.9951 (2,040)    0.9620 (1,972)    0.9624 (1,973)
γ̂          >           0.9995 (2,049)    0.9990 (2,048)    0.9985 (2,047)
           > .2        0.9937 (2,037)    0.9620 (1,972)    0.9732 (1,995)
μ̂ − σ̂      >           1.0000 (2,050)    1.0000 (2,050)    1.0000 (2,050)
           > .2        1.0000 (2,050)    1.0000 (2,050)    1.0000 (2,050)

γ: median. γ̂: estimated median. μ̂: estimated mean. σ̂: estimated standard deviation.
">": proportion of instances in which the subthreshold-seeker outperforms random search. "> .2": proportion of instances in which the subthreshold-seeker outperforms random search by a 20% margin.

Table 2
Mean time steps to locate the minimum.

Algorithm              (|X|, |Y|)
                       (10⁴, 10⁴)    (10⁵, 10⁵)    (10⁶, 10⁶)
STS, θ = γ             2037.58       22913.23      229232.26
STS, θ = γ̂             2221.44       23170.58      229532.04
STS, θ = μ̂ − σ̂         918.29        8095.78       80322.92
Random search          4972.50       49724.74      496912.49

γ: median. γ̂: estimated median. μ̂: estimated mean. σ̂: estimated standard deviation. STS: subthreshold-seeker.

The sampler generates 2,050 instances of PDLC with (|X|, |Y|) = (10⁴, 10⁴), (10⁵, 10⁵), and (10⁶, 10⁶), respectively. The Lipschitz constant K is set to 100 out of concern for execution time, as previously discussed. For each problem instance, we test each algorithm for 50 independent runs. If the average time of a subthreshold-seeker to find the optimum is less than that of random search, the instance is counted as a success. We also count the number of instances in which a subthreshold-seeker outperforms random search by a 20% margin, i.e., the instances where the average optimization time of a subthreshold-seeker is less than 80% of that of random search. Table 1 displays the empirical results.

All three subthreshold-seekers outperform random search in most of the sampled problem instances. Furthermore, the subthreshold-seeker with θ = μ̂ − σ̂ outperforms random search in all 2,050 instances sampled, even with the requirement of a 20% margin. The statistical significance of such results is easy to see: suppose the population proportion of instances on which the subthreshold-seeker with θ = μ̂ − σ̂ outperforms random search is q. The probability of observing random search outperformed in all 2,050 sampled instances is q^2050. Even if q is as high as 0.995, this probability is only 0.000034. More formally, if the null hypothesis is "q ≤ 0.995", the p-value is merely 0.000034.
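The quoted probability is easy to reproduce (our own arithmetic):

\[
0.995^{2050} \;=\; e^{2050 \ln 0.995} \;\approx\; e^{-10.28} \;\approx\; 3.4 \times 10^{-5}.
\]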

Table 2 displays the optimization time averaged over the 2,050 sampled problem instances. The subthreshold-seeker with θ = μ̂ − σ̂ outperforms the others by a significant margin. Random search takes approximately |X|/2 time steps on average to find the minimum, which is expected. The subthreshold-seeker using the actual median and the one using the sample median both take about half the time steps needed by random search to optimize the function.

The subthreshold-seekers with θ = μ̂ − σ̂ and θ = γ̂ are genuinely black-box algorithms, since no exterior knowledge is exerted and the only information they can use is function evaluations, yet they outperform random search by a remarkable margin.

The performance difference between θ = γ̂ and θ = γ is insignificant. Such a result suggests that, in this case, an estimate of the median may be adequate. Suppose that P with |P| = N is a set of real numbers, and for all i ∈ P, R(i) is defined to be the rank (i.e., the ordering) of i in P. For instance, R(min P) = 1 and R(max P) = N. For simplicity, we assume that N is odd, and hence the median of P is the element i with R(i) = ⌈N/2⌉. Now we want to estimate the median of P. If a point sample S of size n, where n is also assumed odd, is drawn by successively selecting an element u.a.r. from P with replacement, the estimated median γ̂ is taken to be the median of the sample, and we want the error to be bounded by ϵ > 0, i.e., |R(γ̂) − ⌈N/2⌉| ≤ ϵN.

If R(γ̂) < ⌈N/2⌉ − ϵN, there are at least ⌈n/2⌉ selections with ranks less than ⌈N/2⌉ − ϵN. Let X_i be the indicator variable that indicates whether the ith selection has rank less than ⌈N/2⌉ − ϵN, so that X_i = 1 with probability p := (⌈N/2⌉ − ⌊ϵN⌋ − 1)/N.

R(γ̂) < ⌈N/2⌉ − ϵN if and only if ∑_{i=1}^{n} X_i ≥ ⌈n/2⌉. Since E[∑_{i=1}^{n} X_i] = np, applying another form of Hoeffding's inequality [28], we have

Prob{ R(γ̂) < ⌈N/2⌉ − ϵN } = Prob{ ∑_{i=1}^{n} X_i ≥ n/2 }
    = Prob{ (1/n)∑_{i=1}^{n} X_i ≥ p + (1/2 − p) }
    ≤ [ (p/(p + 1/2 − p))^{p + 1/2 − p} ((1 − p)/(1 − p − (1/2 − p)))^{1 − p − (1/2 − p)} ]^{n}
    = [4p(1 − p)]^{n/2}.

Moreover, the symmetry implies that

Prob{ R(γ̂) > ⌈N/2⌉ + ϵN } ≤ [4p(1 − p)]^{n/2}.

Therefore,

Prob{ |R(γ̂) − ⌈N/2⌉| > ϵN } ≤ 2[4p(1 − p)]^{n/2}.

Now the only quantity left is p. By definition,

p = (⌈N/2⌉ − ⌊ϵN⌋ − 1)/N, which is approximately 1/2 − ϵ.

For instance, if we set ϵ = 0.1 and n = 100, the probability of exceeding the error bound is less than 0.26. If the sample size n increases to 2,000, even with a small ϵ = 0.03, the probability reduces to just 0.054. It is noteworthy that the effect of the population size N is negligible, so the required number of samples remains essentially the same even if the search space is immense. Although in real-world applications P is usually a multiset, if the multiplicities in P are not too large, such a gauge should not diverge significantly.
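Both figures are straightforward to reproduce; the snippet below (our own check, not part of the paper) evaluates the bound 2[4p(1 − p)]^{n/2} with p = 1/2 − ϵ.

```python
def median_error_bound(eps, n):
    """Upper bound 2 * (4p(1-p))**(n/2), with p = 1/2 - eps, on the
    probability that the sample median misses the true median by more
    than eps * N positions in rank."""
    p = 0.5 - eps
    return 2 * (4 * p * (1 - p)) ** (n / 2)

print(median_error_bound(0.10, 100))    # roughly 0.26
print(median_error_bound(0.03, 2000))   # roughly 0.054
```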

6. Conclusions

In this study, we introduced and investigated the properties of the discrete Lipschitz class. A generalized subthreshold-seeker was then proposed and shown to outperform random search on this broad function class. Finally, we proposed a tractable sampling-test scheme to empirically demonstrate the performance of the generalized subthreshold-seeker under practical configurations. We showed that optimization algorithms outperforming random search on the discrete Lipschitz class do exist from both theoretical and practical aspects.

As controversial as it may be, the NFL theorem provides an alternative standpoint from which to review the position of optimization algorithms and search heuristics. The NFL theorem dispels the false hope of conquering all possible functions with only limited information available, as it points out that the expectation of finding a universally effective black-box optimizer is over-optimistic.

However, the NFL theorem by no means implies that the land of search heuristics is utterly infertile, provided our goals are appropriately placed. In this paper, the discrete Lipschitz class, which mimics continuous functions in a discrete space, is shown to be a class of problems on which black-box optimizers have performance advantages in both theory and practice. The only constraint imposed is that objective values differ boundedly within a neighborhood. Under such a minor condition, black-box optimizers can still be effective over a broad, meaningful, and practical problem class, as suggested by this study.

Acknowledgements

The work was supported in part by the National Science Council of Taiwan under Grant NSC 99-2221-E-009-123-MY2.

The authors are grateful to the National Center for High-performance Computing for computer time and facilities.

References

[1] D.H. Wolpert, W.G. Macready, No free lunch theorems for search, Tech. Rep. SFI-TR-95-02-010, Santa Fe Institute, 1995.

[2] D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation 1 (1) (1997) 67–82.

[3] J.C. Culberson, On the futility of blind search: An algorithmic view of no free lunch, Evolutionary Computation 6 (1998) 109–127.

[4] S. Droste, T. Jansen, I. Wegener, Perhaps not a free lunch but at least a free appetizer, Tech. Rep. ISSN 1433-3325, Department of Computer Science, University of Dortmund, 1998.

[5] S. Droste, T. Jansen, I. Wegener, Optimization with randomized search heuristics – the (A)NFL theorem, realistic scenarios, and difficult functions, Theoretical Computer Science 287 (2002) 131–144.

[6] M.J. Streeter, Two broad classes of functions for which a no free lunch result does not hold, in: Proceedings of the Genetic and Evolutionary Computation Conference 2003, 2003, pp. 1418–1430.


[7] C. Igel, M. Toussaint, A no-free-lunch theorem for non-uniform distributions of target functions, Journal of Mathematical Modelling and Algorithms 3 (4) (2004) 313–322.

[8] S. Christensen, F. Oppacher, What can we learn from no free lunch? a first attempt to characterize the concept of a searchable function, in: Proceedings of the Genetic and Evolutionary Computation Conference 2001, 2001, pp. 1219–1226.

[9] D. Whitley, J. Rowe, Subthreshold-seeking local search, Theoretical Computer Science 361 (2006) 2–17.

[10] H. Mühlenbein, How genetic algorithms really work: I. Mutation and hillclimbing, in: Proceedings of Parallel Problem Solving from Nature 2, 1992, pp. 15–26.

[11] S. Droste, T. Jansen, I. Wegener, On the analysis of the (1+1) evolutionary algorithm, Theoretical Computer Science 276 (2002) 51–81.

[12] J. He, X. Yao, Towards an analytic framework for analysing the computation time of evolutionary algorithms, Artificial Intelligence 145 (1–2) (2003) 59–97.

[13] P.S. Oliveto, C. Witt, Simplified drift analysis for proving lower bounds in evolutionary computation, in: Proceedings of Parallel Problem Solving from Nature, 2008, pp. 82–91.

[14] O. Giel, I. Wegener, Evolutionary algorithms and the maximum matching problem, in: Proceedings of the 20th Annual Symposium on Theoretical Aspects of Computer Science, 2003, pp. 415–426.

[15] F. Neumann, I. Wegener, Randomized local search, evolutionary algorithms, and the minimum spanning tree problem, Theoretical Computer Science 378 (1) (2007) 32–40.

[16] T. Jansen, I. Wegener, A comparison of simulated annealing with a simple evolutionary algorithm on pseudo-boolean functions of unitation, Theoretical Computer Science 386 (1–2) (2007) 73–93.

[17] A. Auger, O. Teytaud, Continuous lunches are free!, in: Proceedings of the Genetic and Evolutionary Computation Conference 2007, 2007, pp. 916–922.

[18] J. Rowe, M. Vose, A. Wright, Reinterpreting no free lunch, Evolutionary Computation 17 (1) (2009) 117–129.

[19] C. Schumacher, M.D. Vose, L.D. Whitley, The no free lunch and problem description length, in: Proceedings of the Genetic and Evolutionary Computation Conference 2001, 2001, pp. 565–570.

[20] R. Courant, F. John, Introduction to Calculus and Analysis, Vol. 1, Springer-Verlag, 1989.

[21] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, 2nd edition, The MIT Press, 2001.

[22] R. Motwani, P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.

[23] I. Rechenberg, Evolutionsstrategie ’94, Frommann Holzboog, 1994.

[24] J. Jägersküpper, Algorithmic analysis of a basic evolutionary algorithm for continuous optimization, Theoretical Computer Science 379 (3) (2007) 329–347.

[25] C. Robert, G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, 1999.

[26] M.J. Atallah (Ed.), Algorithms and Theory of Computation Handbook, CRC Press, 1999.

[27] Y.S. Chow, H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, 3rd edition, Springer, 1997.

[28] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58 (1963) 13–30.