Research and Development of an Optimization Framework Customized for Wireless Network Systems


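Project type: Individual project

Project number: NSC 98-2221-E-009-072-

Project period: August 1, 2009 to July 31, 2010

Executing institution: Department (Institute) of Computer Science, National Chiao Tung University

Principal investigator: 陳穎平 (Ying-ping Chen)

Co-principal investigators: 許騰尹, 陳耀宗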

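Project participants: Master's student, part-time research assistant: 黃淵暐

Master's student, part-time research assistant: 古明哲

Master's student, part-time research assistant: 許庭毓

Ph.D. student, part-time research assistant: 林季穎

Ph.D. student, part-time research assistant: 李長紘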

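Report attachments: reports on attending international conferences, and published papers

Disclosure: this project report is open to public inquiry

October 29, 2010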


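…methods can all be applied to a wide variety of optimization problems. Accordingly, this project, "Research and Development of an Optimization Framework Customized for Wireless Network Systems," was originally planned as a two-year effort. The plan was to exploit the high flexibility of optimization methods from evolutionary computation to develop a customizable optimization framework, tailor it to the various optimization problems that arise in wireless networking technologies, and deliver suitable optimization tools and software. However, the project was approved for only one year. Its goal was therefore revised to completing an optimization tool for mixed-type variables and, during the project period, carrying out related research on the methodology of evolutionary-computation optimization.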

II. Research Objectives

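The ultimate goal originally set for this project was to research and develop an "optimization service infrastructure for wireless network system applications." The intent was to use a customizable optimization framework to handle and solve several optimization issues in wireless network application systems whose parameters differ in type and kind.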

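Because the project was approved for only one year, the goal was revised to focus on the theoretical investigation and characteristic analysis of evolutionary-computation optimization methodology. Taking the extended compact genetic algorithm (ECGA) as the base framework, we planned to propose a new general-purpose optimization framework. This framework applies directly to problems containing decision variables of different types, including Boolean, integer, and real variables. Optimization methods from evolutionary computation treat the objective function as a black box. Exploiting this property, we develop a customizable framework that applies to parameter optimization problems of various types. In the future, it can then be matched to the different kinds of optimization problems found in wireless network application systems and customized to meet their needs.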

III. Literature Review

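Most real-world problems are not as simple as pure mathematical problems, whose correct answers can be obtained by applying a formula directly or by following a fixed computational procedure. Such problems ultimately rely on optimization techniques and tools to determine the values of their decision variables, also called parameters. Optimization problems of many kinds exist in fields such as industrial design, scheduling, circuit design, data compression, economics, and architecture. One example is how to lay out a given integrated-circuit design so that it occupies the smallest area. Another is how to design a structure from given building materials so that it obtains the greatest load-bearing capacity. For such problems it is usually not hard to find a feasible solution, or even several, but finding the optimal solution is typically much harder. Different solutions lead to different outcomes. If we can judge the quality of those outcomes objectively, optimization techniques can raise the ratio of value to cost, reducing cost or improving results across many kinds of problems.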

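Among these, the most common form of optimization is parameter tuning. For a problem to be optimized, one usually needs to define an objective function, which helps us apply various optimization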


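…even problems without an objective function can be handled, for example, generating personalized musical passages [8]. Such algorithms are highly feasible and practical and have considerable problem-solving ability. Within limited time they can usually obtain solutions of acceptable quality, so they have gradually come into wide use on real-world problems.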

IV. Research Methods

1. Study and Analysis of Variable Types

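Across the many existing optimization problems, we classify and analyze the common parameter types. Using a thief's knapsack problem as the running example, we simulate a scenario for each of the three types. The thief's knapsack has a fixed weight limit, and there are items made of three materials: gold, silver, and copper. Each material has a different weight and unit price. The question is how the thief should choose in order to obtain the maximum profit under this constraint.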

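• Boolean values

Boolean variables are commonly used as decision variables when the set of options is already known. Each option is represented by a single Boolean variable indicating whether it is selected. Boolean problems are usually ordinary combinatorial problems. When there is only one item of each of the three kinds in the thief's problem, three Boolean variables can indicate whether each item is taken or left behind. This is a typical Boolean parameter problem.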

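• Integers

Integers are the type used for discrete data; parameters that correspond to counts of physical objects often must be represented as integers. Suppose the thief's knapsack problem has more than one piece of each of the three kinds of items. Three integer parameters can then record how many of each to take, which gives an integer parameter optimization problem.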

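• Real numbers

Most real-world engineering problems operate over the real numbers. Real-valued parameters are therefore the most commonly used type, and a candidate solution is usually represented by a real-valued vector. Because the reals are continuous, searching for a real-valued optimum is often much harder than searching for a Boolean or integer solution. The exception is particular problem classes, such as linear programming or optimization problems that can be approximated over the reals. Suppose that in the thief's knapsack problem the thief has a tool for cutting the three kinds of items. The amount of each item taken must then be represented by a real parameter, and the problem becomes a real-valued optimization problem.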
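To make the three encodings concrete, here is a minimal Python sketch of the thief's knapsack under each variable type. The report gives no numbers, so the weights, values, stock, and capacity below are hypothetical. The Boolean and integer cases are solved by brute force. The real case uses the greedy value-per-weight fill, which is known to be optimal for the fractional knapsack. This is an illustration of the encodings, not the project's optimizer.

```python
from itertools import product

# Hypothetical instance: per-piece weight and value of gold, silver, copper.
ITEMS = ("gold", "silver", "copper")
WEIGHT = (4.0, 3.0, 1.0)
VALUE = (9.0, 8.0, 2.0)
STOCK = 3          # pieces of each item available (integer and real cases)
CAPACITY = 7.0     # weight limit of the knapsack


def profit(amounts):
    """Total value of a candidate, or -inf if it exceeds the weight limit.

    Each entry of `amounts` is a bool (take or leave the single piece), an int
    (number of pieces taken) or a float (amount taken after cutting).
    """
    weight = sum(w * a for w, a in zip(WEIGHT, amounts))
    if weight > CAPACITY + 1e-9:            # tolerance for float round-off
        return float("-inf")
    return sum(v * a for v, a in zip(VALUE, amounts))


# Boolean variables: exhaustively try all 2**3 take/leave decisions.
best_bool = max(product((False, True), repeat=3), key=profit)

# Integer variables: try every count 0..STOCK for each item.
best_int = max(product(range(STOCK + 1), repeat=3), key=profit)


def best_real():
    """Real variables: with cutting allowed, filling greedily by value/weight
    ratio is optimal (the fractional knapsack)."""
    amounts = [0.0] * len(ITEMS)
    room = CAPACITY
    for i in sorted(range(len(ITEMS)), key=lambda i: VALUE[i] / WEIGHT[i], reverse=True):
        amounts[i] = min(float(STOCK), room / WEIGHT[i])
        room -= amounts[i] * WEIGHT[i]
    return tuple(amounts)


print(best_bool, profit(best_bool))     # (True, True, False) 17.0
print(best_int, profit(best_int))       # (0, 2, 1) 18.0
print(best_real(), profit(best_real()))
```

Each relaxation can only match or improve on the previous one. With this instance the optimum rises from 17 (Boolean) to 18 (integer) to about 18.67 (real, 7/3 pieces of silver).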


…it may become possible to provide tailor-made optimization framework services for optimization problems in various fields.

2. Optimization Algorithm Framework

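The design of the general-purpose optimization algorithm has two main parts: extending ECGA to integer parameters, and extending it to real-valued parameters. Section 2.1 describes how the marginal product model and the minimum description length (MDL) mechanism are modified so that integer parameters can be handled. Section 2.2 adds split-on-demand (SoD), a continuous-domain discretization technique developed by our team, on top of the integer framework in order to handle real-valued problems.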

2.1 Integer Extended Compact Genetic Algorithm (iECGA)

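In the iECGA framework, we first define the integer range to run from $l$ to $u$. Integer vectors then replace the original binary bit strings as the gene representation of individuals. To ease comparison with the original ECGA, we usually set the size of the integer range to a power of two, that is, $d = u - l = 2^n$, so that every integer in the range can be represented by a binary string of length $n$. In ECGA, the marginal product model (MPM) counts the occurrences of each pattern over a particular group of genes. For example, suppose $s = [1, 3, 4]$ is a gene group with group size $|s| = 3$. The MPM statistics are then computed over a total of $2^{|s|}$ possible patterns, as shown in Table 1.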

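Table 1: Binary example of the marginal product model (gene group s = [1, 3, 4])

Current population: 00110, 01010, 01110, 01100, 00010, 10001

Pattern | Count
000 | 0
001 | 2
010 | 1
011 | 2
100 | 1
101 | 0
110 | 0
111 | 0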
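The counting step behind Table 1 can be sketched in Python. The function name `mpm_counts` and its 1-based `group` convention are our own choices for illustration. Passing `d > 2` enumerates the $d^{|s|}$ patterns that iECGA counts instead of the $2^{|s|}$ binary ones.

```python
from collections import Counter
from itertools import product


def mpm_counts(population, group, d=2):
    """Occurrence count of every value pattern on the genes listed in `group`.

    population: individuals as sequences of ints in range(d)
    group:      1-based gene indices, e.g. [1, 3, 4] as in Table 1
    d:          values per gene (2 for binary ECGA; d = u - l for iECGA)
    Returns all d ** len(group) patterns in lexicographic order, unseen ones with 0.
    """
    seen = Counter(tuple(ind[i - 1] for i in group) for ind in population)
    return {pat: seen[pat] for pat in product(range(d), repeat=len(group))}


pop = [[int(c) for c in s] for s in ("00110", "01010", "01110", "01100", "00010", "10001")]
for pattern, count in mpm_counts(pop, [1, 3, 4]).items():
    print("".join(map(str, pattern)), count)
```

Run on the Table 1 population, it lists the eight patterns 000 to 111 with counts 0, 2, 1, 2, 1, 0, 0, 0.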


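iECGA modifies the MPM so that the counted objects are integer parameters. For the same group $s = [1, 3, 4]$ and an integer range $d = u - l$, occurrences must be counted for $d^{|s|}$ different patterns, as shown in Table 2. Besides adapting the MPM to integers, the formula for estimating model complexity must also change. iECGA extends the two possible values of a binary (Boolean) gene to the $d$ values of the integer range, so the model complexity is revised as in Eq. (1). The compressed population complexity and the other mechanisms are the same as in the original ECGA.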





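$$\text{Model Complexity} = \log_2 (N + 1) \sum_{i=1}^{m} \left( d^{|s_i|} - 1 \right) \qquad (1)$$

where $N$ is the population size and $s_i$ is the $i$-th of the $m$ gene groups.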
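Eq. (1) translates directly into a few lines of Python. The function name and the example partition are hypothetical. The sketch shows how the per-group cost $d^{|s_i|} - 1$ grows with the integer range.

```python
import math


def model_complexity(group_sizes, pop_size, d=2):
    """Model complexity of Eq. (1): log2(N + 1) * sum_i (d ** |s_i| - 1).

    group_sizes: sizes |s_i| of the m gene groups in the MPM
    pop_size:    population size N (each frequency count needs log2(N + 1) bits)
    d:           values per gene; d = 2 recovers the binary ECGA formula
    """
    return math.log2(pop_size + 1) * sum(d ** k - 1 for k in group_sizes)


# Same partition (groups of size 3 and 2), N = 255, binary vs. 8-valued genes.
print(model_complexity([3, 2], 255))        # 8 * (7 + 3) = 80.0
print(model_complexity([3, 2], 255, d=8))   # 8 * (511 + 63) = 4592.0
```

The cost of a group grows as $d^{|s_i|}$, so for larger integer ranges the MDL criterion charges far more for merging genes into big groups.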

2.2 Real-Valued Extended Compact Genetic Algorithm

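Before discussing the real-valued ECGA, we first introduce split-on-demand (SoD), an adaptive discretization algorithm for continuous domains that our team developed in previous work. SoD divides a continuous domain into several intervals so that each interval contains fewer than $N\lambda$ search individuals, where $N$ is the population size and $\lambda$ is the split ratio. The split ratio balances the intensity and weight of global search and local search; that is, it seeks a suitable balance between exploration and exploitation. Figure 1 shows an example of split-on-demand. Through split-on-demand, real values can be discretized into integers.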

Figure 1: Example of split-on-demand
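The following Python sketch illustrates the split-on-demand idea described above. The report does not state where an interval is split, so this sketch assumes recursive midpoint bisection. It also adds a hypothetical `min_width` guard so that clusters of identical values cannot cause endless splitting. The function names are our own.

```python
import bisect


def split_on_demand(values, lower, upper, ratio, min_width=1e-9):
    """Split [lower, upper] until each interval holds fewer than N * ratio values.

    N = len(values) is the population size and `ratio` plays the role of lambda.
    Midpoint bisection and the `min_width` stopping guard are assumptions of
    this sketch. Returns the sorted list of interval boundaries.
    """
    limit = len(values) * ratio

    def split(lo, hi, pts):
        if len(pts) < limit or hi - lo <= min_width:
            return [lo, hi]
        mid = (lo + hi) / 2
        left = split(lo, mid, [p for p in pts if p < mid])
        right = split(mid, hi, [p for p in pts if p >= mid])
        return left[:-1] + right          # keep `mid` only once

    return split(lower, upper, list(values))


def to_integers(values, bounds):
    """Replace each real value by the index of the interval containing it."""
    last = len(bounds) - 2                # index of the final interval
    return [min(bisect.bisect_right(bounds, v) - 1, last) for v in values]


values = [0.1, 0.2, 0.3, 0.35, 0.9, 5.0, 9.0]     # N = 7 individuals
bounds = split_on_demand(values, 0.0, 10.0, ratio=0.5)
print(bounds)                       # [0.0, 0.3125, 0.625, 1.25, 2.5, 5.0, 10.0]
print(to_integers(values, bounds))  # [0, 0, 0, 1, 2, 5, 5]
```

The values crowded below 1 end up in narrow intervals, while the sparse upper half stays one wide interval. Crowded regions are therefore searched more finely. Lowering `ratio` (λ) lowers the per-interval limit and yields more, narrower intervals.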

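ECGA was originally designed for binary data. To handle real-valued parameters, we integrate the split-on-demand mechanism into the workflow of the integer ECGA. Each individual in the real-valued ECGA framework therefore carries two genotypes, a real vector and an integer vector. In the evaluation of the objective function and the surviving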

Figure 2: Framework flow of the real-valued extended compact genetic algorithm (flowchart blocks: Random Sampling Individuals; MPM model; Function Evaluation; Tournament Selection with fitness; Crossover; Mutation; New Integer Individuals; Greedy MPM Search; Split-on-Demand; Good Integer Individuals; Good Real Individuals)

2.3 General-Purpose Optimization Technique

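The novel general-purpose optimization technique proposed in this project builds on the original binary ECGA. It also builds on the integer ECGA and the real-valued ECGA previously developed in our laboratory. ECGA serves as the lowest-level optimization engine. The different kinds of decision variables in a problem are converted to suitable types, processed by the engine, and then restored to their original types. Based on our past experience in developing optimization techniques, we expect to be able to optimize decision variables of different types simultaneously and to develop an innovative general-purpose optimization framework.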
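The type-conversion step described above can be sketched as an encode/decode pair around an integer-gene engine. Several details are hypothetical choices for illustration: the `Var` descriptor, the function names, the offset encoding of integers, and decoding a real gene to its interval midpoint. The report does not specify how a real value is restored from its interval.

```python
import bisect
from dataclasses import dataclass
from typing import List, Sequence, Union

Value = Union[bool, int, float]


@dataclass
class Var:
    """Hypothetical descriptor of one decision variable in a mixed-type problem."""
    kind: str                      # "bool", "int" or "real"
    low: int = 0                   # lower bound l of an integer variable
    bounds: Sequence[float] = ()   # interval boundaries of a real variable (e.g. from SoD)


def encode(x: Sequence[Value], spec: Sequence[Var]) -> List[int]:
    """Convert every variable to a non-negative integer gene for the engine."""
    genes = []
    for v, var in zip(x, spec):
        if var.kind == "bool":
            genes.append(int(v))
        elif var.kind == "int":
            genes.append(int(v) - var.low)
        else:  # real: index of the interval that contains v
            i = bisect.bisect_right(var.bounds, v) - 1
            genes.append(min(max(i, 0), len(var.bounds) - 2))
    return genes


def decode(genes: Sequence[int], spec: Sequence[Var]) -> List[Value]:
    """Restore the original types; a real gene becomes its interval midpoint."""
    x: List[Value] = []
    for g, var in zip(genes, spec):
        if var.kind == "bool":
            x.append(bool(g))
        elif var.kind == "int":
            x.append(var.low + g)
        else:
            x.append((var.bounds[g] + var.bounds[g + 1]) / 2)
    return x


spec = [Var("bool"), Var("int", low=-3), Var("real", bounds=(0.0, 2.5, 5.0, 10.0))]
genes = encode([True, 2, 6.2], spec)
print(genes, decode(genes, spec))   # [1, 5, 2] [True, 2, 7.5]
```

Boolean and integer variables round-trip exactly. A real variable only comes back to the resolution of its interval, which is why the interval boundaries matter.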

V. Results and Discussion

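This project was originally planned as a two-year effort to research, investigate, and develop an "optimization framework customized for wireless network systems." The expected final outcome was a customizable general-purpose optimization framework offering optimization services for problems in various engineering and scientific fields. That framework would then be customized for wireless network systems into tailor-made optimization infrastructure and tools for the various optimization problems within wireless network applications. However, as mentioned above, the project was approved for only one year. The completed work is therefore the originally planned first-year topic, "Design and development of a customizable general-purpose optimization framework." Under this topic, we combined evolutionary-computation methodology with several innovative techniques. On that basis we designed and developed a new customizable general-purpose optimization framework that applies directly to optimization problems with decision variables of different types. The concrete work items completed are as follows: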


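• Strengthened the participants' skills in data analysis, evolutionary computation, machine learning, numerical analysis, optimization techniques, and related areas.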

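• Design of the optimization framework: proposed a new optimization technique applicable to problems containing decision variables of various types, to cope with the highly complex engineering problems and difficulties encountered in real-world settings.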

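• Implementation of the optimization framework: implemented the proposed technique as a standalone optimization tool and service, to be shared with and used by the project personnel and researchers in other fields. The completed source code can be downloaded from: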

http://nclab.tw/SM/2010/01/

• Report writing and paper submission. With the support of the National Science Council, our laboratory published the following related papers:

Journal papers:

◦ Chuang, C.-Y., & Chen, Y.-p. (2010). Sensibility of linkage information and effectiveness of estimated distributions. Evolutionary Computation, 18(4). doi: 10.1162/EVCO_a_00010. (SCI).

◦ Chen, Y.-p., & Jiang, P. (2010). Analysis on the facet of particle interaction in particle swarm optimization. Theoretical Computer Science, 411(21), 2101–2115. doi: 10.1016/j.tcs.2010.03.003. (SCI, EI).

Conference papers:

◦ Huang, Y.-w., & Chen, Y.-p. (2010). Detecting General Problem Structures with Inductive Linkage Identification. In Proceedings of the 2010 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2010). (Accepted).

◦ Lin, J.-H., & Chen, Y.-p. (2010). XCS with Bit Masks. In Proceedings of the 2010 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2010). (Accepted).

◦ Chen, Y.-p. (2010). Estimation of distribution algorithms: Basic ideas and future directions. In Proceedings of World Automation Congress 2010 (WAC 2010) (pp. IFMIP–152). (Invited).

◦ Chen, C.-M., Chen, Y.-p., Shen, T.-C., & Zao, J. (2010). On the optimization of degree distributions in LT codes with covariance matrix adaptation evolution strategy. In Proceedings of 2010 IEEE Congress on Evolutionary Computation (CEC 2010) (pp. 3531–3538). doi:

References

[1] D. E. Goldberg, Genetic algorithms in search, optimization, and machine learning. Reading, Mass.: Addison-Wesley Pub. Co., 1989.

[2] D. E. Goldberg, The design of innovation: lessons from and for competent genetic algorithms. Boston: Kluwer Academic Publishers, 2002.

[3] S. Kirkpatrick, et al., "Optimization by Simulated Annealing," Science, vol. 220, pp. 671-680, 1983.

[4] V. Černý, "Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm," Journal of Optimization Theory and Applications, vol. 45, pp. 41-51, 1985.

[5] M. Dorigo, et al., "Ant algorithms for discrete optimization," Artificial Life, vol. 5, pp. 137-172, 1999.

[6] R. C. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," Proceedings of the Sixth International Symposium on Micromachine and Human Science, pp. 39-43, 1995.

[7] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," Proceedings of IEEE International Conference on Neural Networks, pp. 1942-1948, 1995.

[8] T. Y. Fu, et al., "Evolutionary interactive music composition," GECCO 2006: Genetic and Evolutionary Computation Conference, Vol 1 and 2, pp. 1863-1864,


2. Chen, Y.-p., & Jiang, P. (2010). Analysis on the facet of particle interaction in particle swarm optimization. Theoretical Computer Science, 411(21), 2101–2115. doi: 10.1016/j.tcs.2010.03.003. (SCI, EI).

Conference papers:

3. Huang, Y.-w., & Chen, Y.-p. (2010). Detecting General Problem Structures with Inductive Linkage Identification. In Proceedings of the 2010 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2010). (Accepted).

4. Lin, J.-H., & Chen, Y.-p. (2010). XCS with Bit Masks. In Proceedings of the 2010 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2010). (Accepted).

5. Chen, Y.-p. (2010). Estimation of distribution algorithms: Basic ideas and future directions. In Proceedings of World Automation Congress 2010 (WAC 2010) (pp. IFMIP–152). (Invited).

6. Chen, C.-M., Chen, Y.-p., Shen, T.-C., & Zao, J. (2010). On the optimization of degree distributions in LT codes with covariance matrix adaptation evolution strategy. In Proceedings of 2010 IEEE Congress on Evolutionary Computation (CEC 2010) (pp. 3531–3538). doi: 10.1109/CEC.2010.5586202. (EI).

7. Chen, C.-M., Chen, Y.-p., Shen, T.-C., & Zao, J. (2010). Optimizing degree distributions in LT codes by using the multiobjective evolutionary algorithm based on decomposition. In Proceedings of 2010 IEEE Congress on Evolutionary

Uncorrected Proof

Sensibility of Linkage Information and

Effectiveness of Estimated Distributions

Chung-Yao Chuang


Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan

Ying-ping Chen


Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan

Abstract

The probabilistic model building performed by estimation of distribution algorithms (EDAs) enables these methods to use advanced techniques of statistics and machine learning for automatic discovery of problem structures. However, in some situations, it may not be possible to completely and accurately identify the whole problem structure by probabilistic modeling due to certain inherent properties of the given problem. In this work, we illustrate one possible cause of such situations with problems consisting of structures with unequal fitness contributions. Based on the illustrative example, we introduce a notion that the estimated probabilistic models should be inspected to reveal the effective search directions, and further propose a general approach which utilizes a reserved set of solutions to examine the built model for likely inaccurate fragments. Furthermore, the proposed approach is implemented on the extended compact genetic algorithm (ECGA) and experiments are performed on several sets of additively separable problems with different scaling setups. The results indicate that the proposed method can significantly assist ECGA to handle problems comprising structures of disparate fitness contributions and therefore may potentially help EDAs in general to overcome those situations in which the entire problem structure cannot be recognized properly due to the temporal delay of emergence of some promising partial solutions.

Keywords

Sensible linkage, effective distribution, linkage sensibility, probabilistic model, model pruning, estimation of distribution algorithm, extended compact genetic algorithm, evolutionary computation.

1 Introduction

Estimation of distribution algorithms (EDAs; Mühlenbein and Paaß, 1996; Larrañaga and Lozano, 2001; Pelikan, Goldberg, et al., 2002) are a class of evolutionary algorithms that replace the traditional variation operators, such as mutation and crossover, by building a probabilistic model on promising solutions and sampling the built model to generate new candidate solutions. Using probabilistic models for exploration enables these methods to automatically capture the likely structure of promising solutions and exploit the identified problem regularities to facilitate further search. It is presumed that EDAs can detect the structure of the problem by recognizing the regularities within the promising solutions. However, for certain problems, EDAs are unable to identify the entire structure of the problem at a given time because the set of selected solutions on which the probabilistic model is built contains insufficient information regarding some parts of the problem and renders EDAs incapable of processing these parts accurately. This paper starts by observing the evolutionary process of an EDA when dealing with an exponentially scaled problem, and recognizing that the population on which the probabilistic model is built does not necessarily contain sufficient information for all problem structures to be detected completely and accurately. Based on this observation, this study proposes a general concept that estimated probabilistic models should be inspected to reveal the effective search directions, and we provide a practical approach that utilizes a reserved set of solutions to examine the built model for the fragments that may be inconsistent with the actual problem structure. Furthermore, the proposed approach is implemented on the extended compact genetic algorithm (ECGA; Harik, 1999) and experimented on several sets of additively separable problems with different scaling difficulties (Goldberg, 2002) to demonstrate the applicability.

The following section briefly reviews the research topics concerning this study. Section 3 then demonstrates the interaction between the scaling difficulty and probabilistic model building performed by EDAs. More specifically, we will investigate how the scaling difficulty shadows the ability of EDAs to recognize problem structures and causes inaccurate processing on the part of some solutions. Accordingly, a general approach will be proposed in Section 4 to resolve this issue and enforce accurate processing during the optimization process. In Section 5, an implementation of the proposed approach on the extended compact genetic algorithm will be detailed. Section 6 presents the empirical results, followed by discussion and analysis in Section 7. Finally, Section 8 concludes the paper.

2 Background

Genetic algorithms (GAs; Holland, 1992; Goldberg, 1989) are search techniques loosely based on the paradigm of natural evolution, in which species of creatures tend to adapt to their living environments through mutation and inheritance of useful traits. Genetic algorithms mimic this mechanism by introducing artificial selections and genetic operators to discover and recombine partial solutions. By properly growing and mixing promising partial solutions, which are often referred to as building blocks (BBs; Goldberg, 2002), GAs are capable of efficiently solving a host of problems. The ability to implicitly process a large number of partial solutions has been recognized as an important source of the computational power of GAs. According to the Schema theorem (Holland, 1992), short, low-order, and highly fit subsolutions increase their share in the final combined solution. Further, as stated in the building block hypothesis (Goldberg, 1989), GAs implicitly decompose a problem into subproblems by processing building blocks. This decompositional bias is a good strategy for tackling many real-world problems, because real-world problems can oftentimes be reliably solved by combining the pieces of promising solutions in the form of problem decomposition.

However, proper growth and mixing of building blocks are not always achieved. GAs in the simplest form employ fixed representations and problem-independent recombination operators, which often break promising partial solutions while performing crossovers. This can cause crucial building blocks to vanish, thus leading to a convergence to local optima. In order to overcome this building block disruption problem, various techniques have been proposed. In this study, we focus on one line of effort often called the estimation of distribution algorithm (EDA; Mühlenbein and Paaß, 1996; Larrañaga and Lozano, 2001; Pelikan, Goldberg, et al., 2002). These methods construct probabilistic models of promising solutions and utilize the built models to generate new solutions. Ideally, by detecting dependencies among variables through probabilistic modeling, these approaches can capture the structure of the problem and thus avoid the disruption of identified partial solutions. Early EDAs, such as population-based incremental learning (PBIL; Baluja, 1994) and the compact genetic algorithm (cGA; Harik et al., 1999), assume no interaction between decision variables, that is, decision variables are assumed to be independent of each other. Subsequent studies progressed from capturing pairwise interactions, such as mutual-information-maximizing input clustering (MIMIC; De Bonet et al., 1997), Baluja’s dependency tree approach (Baluja and Davies, 1997), and the bivariate marginal distribution algorithm (BMDA; Pelikan and Mühlenbein, 1999), to modeling multivariate interactions, such as the extended compact genetic algorithm (ECGA; Harik, 1999), the Bayesian optimization algorithm (BOA; Pelikan et al., 1999), the estimation of Bayesian network algorithm (EBNA; Etxeberria and Larrañaga, 1999), the factorized distribution algorithm (FDA; Mühlenbein and Mahnig, 1999), and the learning version of FDA (LFDA; Mühlenbein and Höns, 2005). Along this line of research, questions arose naturally regarding the ability of EDAs to solve problems and the probabilistic models employed to learn the problem structures. Early studies recognized that solving problems composed of higher order building blocks is not expected to be accomplished by using just any probability density structure. Bosman and Thierens (1999) demonstrated that even when the set of variables forming a building block is linked and expressed by the best possible MIMIC-like chain structure, directly sampling that chain to generate new solutions is not a good strategy for reliable optimization. More recently, Echegoyen et al. (2007) compared the behavior of EBNA with approximate and exact Bayesian network learning. In another vein, Hauschild et al. (2007) analyzed the structure and complexity of learned probabilistic models and attempted to facilitate the model building process by incorporating the knowledge acquired from previous models (Hauschild et al., 2008).

Another topic relevant to this study is the impact of disparate scale among different building blocks on the behavior and performance of the evolutionary algorithms. It is commonly observed that building blocks with higher marginal fitness contributions (salient building blocks) converge before those with lower marginal fitness contributions. This sequential convergence behavior is referred to as domino convergence (Thierens et al., 1998). In real-world applications, it is often the case that some parts of the problem are more prominent and contribute more to the fitness than other parts.¹

Such a situation can pose two types of difficulties. Firstly, because the processing on the population is statistical in nature, building block scaling can cause inaccurate processing of less fit building blocks (Goldberg et al., 1992; Goldberg and Rudnick, 1991). The second difficulty arises because the lower fitness of a building block generally causes it to be processed at a later time compared to those of higher fitness. This delay on timeline can cause the building block to converge under random pressure, instead of proper selective pressure. Previous studies on this topic include the explicit role of scale in a systematic experimental setting (Goldberg et al., 1990), a theoretical model on the convergence behavior of exponentially scaled problems (Thierens et al., 1998), an extension of that model to building blocks more than one variable long (Lobo et al., 2000), and a convergence model of linkage learning genetic algorithms (LLGAs; Harik, 1997) on problems with different scaling setups (Chen and Goldberg, 2005).

¹ The reader may note that this statement cannot be formally proved nor disproved because we do not know nor even have a way to estimate the distribution of all real-world problems. However, this intuition can be better articulated by the explanation provided in Goldberg (2002): differences in scale are likely to be common across the space of likely problems, that is, the chance that we encounter differences in scale may be much larger than encountering equivalence in scale.

Although the aforementioned scaling difficulty exists in a number of problems and degrades the performance of many evolutionary algorithms (EAs), there are scant investigations concerning the behavior of EDAs in the presence of scaling difficulties. Therefore, this study attempts to explore how the scaling difficulty affects EDAs, and proposes a practical countermeasure to assist EDAs on problems with different scalings. Specifically, we propose the notion that the estimated probabilistic models should be examined to enforce accurate processing of building blocks and prevent random drift from taking place. In the remainder of this paper, our approach will be demonstrated and evaluated on the test problems constructed by concatenating several trap functions. A k-bit trap function is a function of unitation,² which can be expressed as

$$f_{\mathrm{trap}_k}(s_1 s_2 \cdots s_k) = \begin{cases} k, & \text{if } u = k, \\ k - 1 - u, & \text{otherwise}, \end{cases}$$

where u is the number of ones in the binary string s1s2···sk. The trap functions were used pervasively in the studies concerning EDAs and other evolutionary algorithms because they provide well-defined structures among variables, and the ability to recognize intervariable relationships is essential to solve the problems consisting of traps (Deb and Goldberg, 1993, 1994).
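A direct Python transcription of the trap function above makes its deceptive shape easy to check. Every additional one lowers the value, until the all-ones string jumps to the global optimum $k$. The function name is our own.

```python
def trap(bits):
    """k-bit trap of the text: k if all k bits are ones, else k - 1 - u."""
    k, u = len(bits), sum(bits)     # u = number of ones (unitation)
    return k if u == k else k - 1 - u


# Value of a 4-bit trap as the number of ones grows from 0 to 4.
for u in range(5):
    print(u, trap([1] * u + [0] * (4 - u)))   # 3, 2, 1, 0, then 4
```

Only the count of ones matters, not their positions, which is what "a function of unitation" means.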

3 Linkage Sensibility

The ability of EDAs to handle the building block disruption problem comes primarily from the explicit modeling of selected promising solutions using probabilistic models. The model construction algorithms, though they differ in their representative power, capture the likely structures of good solutions by processing the population-wise statistics collected from the selected solutions. By reasoning about the dependencies among different parts of the problem and the possible formations of good solutions, reliable mixing and growing of building blocks can be achieved. As noted by Harik (1999), learning a good probability distribution is equivalent to learning linkage, where linkage refers to the dependencies among variables. Bosman and Thierens (1999) further recognized that in order to achieve reliable optimization, linkage information should be utilized in a way such that each corresponding building block can be identified and used as a whole.

In most studies on EDAs, it is presumed that EDAs can detect linkage and recognize building blocks according to the information contained in the set of selected solutions. However, in this study, we argue that in some situations, accurate and complete linkage information cannot be acquired by distribution estimation because the selected set of solutions on which the model is built contains insufficient information on the lower fitness parts of the problem. For example, consider a 16-bit maximization problem formed by concatenating four 4-bit trap functions as subproblems,

$$f(s_1 s_2 \cdots s_{16}) = \sum_{i=0}^{3} 5^{3-i} \, f_{\mathrm{trap}_4}(s_{4i+1} s_{4i+2} s_{4i+3} s_{4i+4}),$$

where s1s2···s16 is a solution string. Note that in contrast to other studies of EDAs, in which the test problems are scaled uniformly, that is, the subproblems are of equal fitness, in this problem, each elementary trap function is scaled exponentially. This scaling is an abstraction for problems of distinguishable prominence or solving priority among the constitutive subproblems. Suppose that we choose ECGA (Harik, 1999), which uses a class of multivariate probabilistic models called marginal product models (MPMs), to tackle this problem.³ By observing subsequent generations of the optimization process, a series of models built by ECGA can be obtained like those listed in Table 1. In this table, the variables enclosed by the same pair of brackets are considered dependent and are modeled jointly. Each group of variables represents a marginal model in which a marginal distribution resides, and the converged variables are crossed out.⁴

Table 1: Marginal product models built by ECGA when solving an exponentially scaled problem. Each group of variables represents a marginal model in which a marginal distribution resides. The converged variables are crossed out.

Generation | Marginal product model
1 | [s1 s2 s3 s4] [s5 s10 s16] [s6 s7] [s8 s9 s12] [s11 s14 s15] [s13]
2 | [s1] [s2] [s3] [s4] [s5 s6 s7 s8] [s9 s13 s16] [s10 s14 s15] [s11 s12]
3 | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9 s10 s11 s12] [s13 s16] [s14 s15]
4 | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9] [s10] [s11] [s12] [s13 s14 s15 s16]

² A function in which the function value depends only on the number of ones in the binary input
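The exponentially scaled test function above can be written in a few lines of Python (function names are ours). The usage lines check the dominance argument used in the next paragraph. Moving block 1 from its deceptive optimum (value 3) to its global optimum (value 4) is worth 125. Blocks 2 to 4 together can contribute at most 4 × (25 + 5 + 1) = 124.

```python
def trap(bits):
    """k-bit trap: k if all bits are ones, otherwise k - 1 - (number of ones)."""
    k, u = len(bits), sum(bits)
    return k if u == k else k - 1 - u


def scaled_trap16(s):
    """f(s1..s16) = sum_{i=0}^{3} 5**(3 - i) * trap4(s_{4i+1} .. s_{4i+4})."""
    assert len(s) == 16
    return sum(5 ** (3 - i) * trap(s[4 * i:4 * i + 4]) for i in range(4))


a = [1, 1, 1, 1] + [1, 1, 1, 0] * 3     # block 1 solved, blocks 2-4 at their worst
b = [0, 0, 0, 0] + [1, 1, 1, 1] * 3     # block 1 deceptive, blocks 2-4 solved
print(scaled_trap16(a), scaled_trap16(b), scaled_trap16([1] * 16))   # 500 499 624
```

String `a` beats string `b` even though `b` solves three of the four blocks. This is the sense in which the most salient block alone decides which solutions are selected.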

It can be observed that the models shown in Table 1 are only partially correct in each generation. More specifically, in each generation, only the most fit building block on which the population has not converged is correctly modeled. This is due to the fact that some part of the problem contributes much more than all others combined. If one part of the problem is worth more than the others, then this part of the solution solely determines the chance regarding whether or not the solution will be selected. As a consequence, only the most fit building block can provide sufficient information to be modeled correctly, since the model searching is performed based on the selected solutions. The remaining parts of the model are primarily the result of low fitness partial solutions “hitchhiking” on the more fit building blocks.

From the above example, we can see that not all building blocks can be detected from a given set of selected solutions by probabilistic model building. Model building algorithms cannot “see” the entire structure of the problem from the selected set of solutions because the disparate scale among different building blocks prevents complete linkage information from being included in the selected population. In this work, we will refer to this concept as linkage sensibility and those problem structures that can be identified properly using the given set of solutions are called sensible linkage. Based on this notion, we reexamine EDAs on the building block disruption problem. It is clear that the disruption problem still exists in the insensible portion of the problem because that part of the problem cannot be modeled properly. Although the above example is an extreme case of scaling, in that each subproblem is exponentially scaled, in real-world problems, it is often the case that the constitutive subproblems are weighted significantly differently, which implies that the linkage might be only partially sensible. In addition to the building block disruption problem, the random drift of the less salient parts of the problem mentioned in Section 2 further worsens the situation. These situations and issues are usually handled by increasing population size when EDAs are adopted. However, we may gain a new way to deal with these situations if it is possible to distinguish a sensible linkage from an insensible linkage.

³ See Section 5.1 for a more detailed description of ECGA and marginal product models.

⁴ The convergence of a variable is defined as all solutions in the population possessing the same

4 Effective Distributions

The idea of sensible linkage can be closely mapped into another notion called effective distributions. By effective distributions, we mean that by sampling these distributions, the solution quality can be reliably advanced. Hence, the crucial criteria for effective distributions are the consistency with building blocks and the provision of good directions for further search. If it is possible to extract effective marginal distributions from the built probabilistic model, we can perform partial sampling using only these marginal distributions, and leave the remaining parts of the solutions unchanged. Thus, the diversity is maintained and we are free from the building block disruption and random drift problems. For instance, returning to the earlier 16-bit optimization problem, if it is possible to identify those partial models that are built on the sensible linkage like [s1s2s3s4] in the first generation and [s5s6s7s8] in the second generation, we can sample only the corresponding marginal distributions which are, in this case, effective. That is, in the first generation, for each solution string, we resample only s1s2s3s4 according to the marginal distribution and keep s5s6···s16 unchanged. In the second generation, we resample only s1 to s8 according to the marginal distributions and keep s9s10···s16 with the same values (note that s1s2s3s4 are converged). In this way, we do not have to resort to increasing the population size to deal with the problems caused by the disparate building block scaling.

The above thoughts leave us one complication: the identification of effective distributions. However, the direct identification of effective distributions may be a difficult if not impossible task. It may be wise to adopt a complementary approach: to identify those marginal distributions that are not likely to be effective. If there is a way to identify the ineffective distributions, we can bypass them and use only the rest of the probabilistic model, and thus approximate the result of knowing effective distributions. Our idea is that we can split the entire population into two subpopulations, use only one of the subpopulations for building the probabilistic model, and utilize the other subpopulation to collect some statistics for possible indications of ineffectiveness of certain marginal distributions in the probabilistic model built on the first subpopulation. That is, with some appropriate heuristics or criteria, we can prune the likely ineffective portions of the model.
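The partial sampling step described above can be sketched as follows for the first generation of the 16-bit example. Only the positions of one marginal model, s1s2s3s4, are resampled from the pattern frequencies of the selected solutions, and s5 to s16 are copied unchanged. The function name and 0-based positions are our own. The sketch simply assumes the effective group is already known; identifying it is the job of the pruning mechanism, which is not modeled here.

```python
import random
from collections import Counter


def partial_resample(selected, group, rng=random):
    """Resample only the positions in `group`; all other positions are copied.

    The marginal distribution over `group` is the empirical frequency of its
    value patterns among the selected solutions (0-based positions).
    """
    freq = Counter(tuple(sol[i] for i in group) for sol in selected)
    patterns, weights = zip(*freq.items())
    offspring = []
    for sol in selected:
        child = list(sol)
        pattern = rng.choices(patterns, weights=weights)[0]
        for i, v in zip(group, pattern):
            child[i] = v
        offspring.append(child)
    return offspring


# First generation of the 16-bit example: resample s1..s4 only.
rng = random.Random(0)
selected = [[rng.randint(0, 1) for _ in range(16)] for _ in range(8)]
offspring = partial_resample(selected, [0, 1, 2, 3], rng)
assert all(o[4:] == s[4:] for o, s in zip(offspring, selected))   # s5..s16 kept
```

Because the untouched positions are copied rather than resampled, the diversity in s5 to s16 is preserved. This is the property the paragraph relies on to avoid disruption and drift in the not-yet-sensible blocks.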

In the next section, our implementation in ECGA of the proposed concept will be detailed. More specifically, a judging criterion will be proposed to detect the likely ineffective marginal distributions of a given marginal product model.

5 ECGA with Model Pruning

This section starts with a brief review of the extended compact genetic algorithm (ECGA; Harik, 1999). Based on the idea of detecting the inconsistency of statistics gathered from two subpopulations of the same source, a mechanism is devised to identify the possibly ineffective parts of the built probabilistic model. Finally, an optimization algorithm incorporating the proposed technique is described in detail.

Table 2: An example of a marginal product model that defines a probability distribution over four variables. The variables enclosed in the same brackets are modeled jointly, and each variable subset is considered independent of the other variable subsets.

[s1] | [s2 s4] | [s3]
P(s1 = 0) = 0.4 | P(s2 = 0, s4 = 0) = 0.2 | P(s3 = 0) = 0.5
P(s1 = 1) = 0.6 | P(s2 = 0, s4 = 1) = 0.1 | P(s3 = 1) = 0.5
 | P(s2 = 1, s4 = 0) = 0.1 |
 | P(s2 = 1, s4 = 1) = 0.6 |

5.1 Extended Compact Genetic Algorithm

ECGA uses a product of marginal distributions on a partition of the variables. This kind of probability distribution belongs to a class of probabilistic models known as marginal product models (MPMs). In this kind of model, subsets of variables can be modeled jointly, and each subset is considered independent of other subsets. In this work, the conventional notation is adopted that variable subsets are enclosed in brackets. Table 2 presents an example of MPM defined over four variables: s1, s2, s3, and s4. In this example, s2 and s4 are modeled jointly and each of the three variable subsets ([s1], [s2s4], and [s3]) is considered independent of the other subsets. For instance, the probability that this MPM generates a sample s1s2s3s4 = 0101 is calculated as follows,

$$P(s_1 s_2 s_3 s_4 = 0101) = P(s_1 = 0) \times P(s_2 = 1, s_4 = 1) \times P(s_3 = 0) = 0.4 \times 0.6 \times 0.5 .$$

In fact, as its name suggests, a marginal product model represents a distribution that is a “product” of the marginal distributions defined over variable subsets.
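The "product of marginals" computation above can be sketched in Python with the MPM of Table 2. The dictionary layout, keyed by 1-based variable subsets, is our own illustrative representation, not ECGA's internal data structure. `math.prod` requires Python 3.8 or later.

```python
import math

# MPM of Table 2: each variable subset (1-based indices) maps its joint value
# patterns to probabilities; different subsets are independent.
MPM = {
    (1,): {(0,): 0.4, (1,): 0.6},
    (2, 4): {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.6},
    (3,): {(0,): 0.5, (1,): 0.5},
}


def mpm_probability(sample, mpm):
    """Probability of `sample` (s1, s2, ...) as a product of subset marginals."""
    return math.prod(
        dist[tuple(sample[i - 1] for i in subset)] for subset, dist in mpm.items()
    )


p = mpm_probability((0, 1, 0, 1), MPM)   # P(s1=0) * P(s2=1, s4=1) * P(s3=0)
print(p)                                  # about 0.12 (= 0.4 * 0.6 * 0.5)
```

Because each marginal sums to one, the probabilities of all 16 possible samples also sum to one.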

In ECGA, both the structure and the parameters of the model are searched and optimized in a greedy fashion to fit the statistics of the selected set of promising solutions. The quality of an MPM is quantified with the minimum description length (MDL) principle (Rissanen, 1978), which states that any regularity in a given set of data can be used to compress that data, and that the success of a model in capturing those regularities can be measured by the cost of describing the model plus the length of the data compressed according to the model. The MDL principle thus penalizes both inaccurate and complex models, leading to a descriptive yet not overly complicated distribution. Specifically, the search measure is the MPM complexity, which is quantified as the sum of the model complexity, $C_m$, and the compressed population complexity, $C_p$. The greedy MPM search first considers all variables as independent, each forming a separate variable subset. In each iteration, the greedy search merges the two variable subsets that yield the greatest reduction in $C_m + C_p$. This process continues until no further merge can decrease the combined complexity.
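The greedy search can be sketched in Python as below. This is an illustration, not the authors' implementation: solutions are assumed to be tuples of 0/1, a partition is a list of 0-based index tuples, and each subset's cost is its share of $C_m + C_p$ as defined in the following paragraphs. The names `subset_complexity` and `greedy_mpm` and the toy population are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def subset_complexity(pop, subset):
    """Share of C_m + C_p contributed by one variable subset (a tuple of indices)."""
    n = len(pop)
    counts = Counter(tuple(s[v] for v in subset) for s in pop)
    model_bits = log2(n + 1) * (2 ** len(subset) - 1)
    # n * entropy == sum over observed partial solutions of -count * log2(count / n)
    data_bits = sum(-c * log2(c / n) for c in counts.values())
    return model_bits + data_bits

def greedy_mpm(pop):
    """Start from singletons; apply the merge with the largest drop in C_m + C_p."""
    partition = [(v,) for v in range(len(pop[0]))]
    cost = {g: subset_complexity(pop, g) for g in partition}
    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(partition, 2):
            gain = cost[a] + cost[b] - subset_complexity(pop, a + b)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge decreases the combined complexity
            return partition
        a, b = best_pair
        cost[a + b] = subset_complexity(pop, a + b)
        partition = [g for g in partition if g not in best_pair] + [a + b]

# Toy population (invented): bits 0 and 1 always agree; bit 2 is independent.
pop = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)] * 25
print(greedy_mpm(pop))  # [(2,), (0, 1)]
```

Because $C_m$ and $C_p$ are both sums over subsets, a merge only changes the terms of the two subsets involved, which is why the sketch compares per-subset costs instead of recomputing the whole model.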

The model complexity, $C_m$, quantifies the model representation in terms of the number of bits required to store all the marginal distributions. Suppose that the given problem is of length $\ell$ with binary encoding, and that the variables are partitioned into $m$ subsets, each of size $k_i$, $i = 1, \ldots, m$, such that $\ell = \sum_{i=1}^{m} k_i$. Then the marginal distribution corresponding to the $i$th variable subset requires $2^{k_i} - 1$ frequency counts to be completely specified. Taking into account that each frequency count is of length $\log_2(n+1)$ bits, where $n$ is the population size, the model complexity $C_m$ can be defined as

$$C_m = \log_2(n+1) \sum_{i=1}^{m} \left(2^{k_i} - 1\right).$$
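As a small worked instance of the formula, the sketch below evaluates $C_m$ for the partition $[s_1]\,[s_2 s_4]\,[s_3]$ from the earlier example, assuming a hypothetical population size of $n = 100$; `model_complexity` is an illustrative name.

```python
from math import log2

def model_complexity(subset_sizes, n):
    """C_m = log2(n + 1) * sum_i (2**k_i - 1) bits for subset sizes k_1..k_m."""
    return log2(n + 1) * sum(2 ** k - 1 for k in subset_sizes)

# [s1] [s2 s4] [s3] -> k = 1, 2, 1 -> 1 + 3 + 1 = 5 frequency counts to store.
print(model_complexity([1, 2, 1], n=100))  # 5 * log2(101), about 33.29 bits
```

The $-1$ per subset reflects that the last frequency count is implied by the others, since the counts of each subset sum to $n$.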

The compressed population complexity, $C_p$, quantifies the suitability of the model in terms of the number of bits required to store the entire selected population (the set of promising solutions picked by the selection operator) under an ideal compression scheme. The compression scheme is based on the partition of the variables. Each subset of the variables specifies an independent “compression block” on which the corresponding partial solutions are optimally compressed. Theoretically, the optimal compression method encodes a message of probability $p_i$ using $-\log_2 p_i$ bits. Thus, taking into account all possible messages, the expected length of a compressed message is $\sum_i -p_i \log_2 p_i$ bits, which is optimal. In information theory (Cover and Thomas, 1991), the quantity $-\log_2 p_i$ is called the information of that message, and $\sum_i -p_i \log_2 p_i$ is called the entropy of the corresponding distribution. Based on information theory, the compressed population complexity $C_p$ can be derived as

$$C_p = n \sum_{i=1}^{m} \sum_{j=1}^{2^{k_i}} -p_{ij} \log_2 p_{ij},$$

where $p_{ij}$ is the frequency of the $j$th possible partial solution to the $i$th variable subset observed in the selected population.
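The same quantity can be computed directly from observed frequencies. This sketch assumes solutions are tuples of 0/1 and a partition is a list of 0-based index tuples; the toy population, in which the first two bits always agree and the third is independent of them, is invented for illustration. Partial solutions that never occur contribute nothing, since $p \log_2 p \to 0$ as $p \to 0$.

```python
from collections import Counter
from math import log2

def compressed_population_complexity(pop, partition):
    """C_p = n * sum_i sum_j -p_ij * log2(p_ij), in bits (unseen patterns add 0)."""
    n = len(pop)
    total = 0.0
    for subset in partition:
        counts = Counter(tuple(s[v] for v in subset) for s in pop)
        entropy = sum(-(c / n) * log2(c / n) for c in counts.values())
        total += n * entropy
    return total

pop = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)] * 25
print(compressed_population_complexity(pop, [(0,), (1,), (2,)]))  # 300.0
print(compressed_population_complexity(pop, [(0, 1), (2,)]))      # 200.0
```

Modeling the two correlated bits jointly saves 100 bits of population description in this toy case; whether such a merge is accepted also depends on the extra $C_m$ it costs.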

Note that in the calculation of $C_p$, it is assumed that the $j$th possible partial solution to the $i$th variable subset is encoded using $-\log_2 p_{ij}$ bits. This assumption is fundamental to our technique of identifying the likely ineffective marginal distributions. More precisely, the information of the partial solutions, $-\log_2 p_{ij}$, is a good indicator of inconsistency between the statistics gathered from two separate subpopulations.

5.2 Model Pruning

Our technique of identifying the possibly ineffective fragments of a marginal product model is based on the notion that ECGA uses compression performance to quantify the suitability of a probabilistic model for a given set of solutions. The degree of compression is a representative measure of how well a model fits, because all good compression methods are based on capturing and utilizing the relationships among data (Grünwald, 2007). Thus, if the compression scheme of the MPM built on one set of solutions is incapable of compressing another set of solutions produced under the same condition,⁵ then we can speculate that some of the constituent marginal models observed in the first set of solutions are likely inconsistent with the distribution of the corresponding partial solutions observed in the second set. Such inconsistency can be seen as a disagreement on the direction of further search. However, under the premise that these two sets of solutions are produced under the same condition, they are supposed to reveal similar directions of further search. Thus, we can reasonably speculate that proper selection pressure was not applied to these partial solutions (causing them to drift in two different directions), and that the true linkage structures of these parts of the problem are not sensible under this condition. Recalling our definition in Section 4, an effective distribution should be capable of providing good direction for further search and be consistent with the linkage structure. Thus, if the abovementioned inconsistency is found, we can expect that, with a high probability,⁶ the inconsistent marginal models are ineffective. Based on this reasoning, we can systematically check the given MPM for likely ineffective portions.

⁵ For example, if all individuals are produced by sampling the same probabilistic model and selected

Suppose that the population of solutions, $P$, is split into two subpopulations $S$ and $T$. The model search is performed on $S'$, the set of promising solutions selected from $S$. Then we can use the statistics collected from $T'$, the set of solutions selected from $T$, to examine the built probabilistic model, $M$. Since each marginal model functions independently, the marginal models can be inspected separately. Recall from the earlier description that a variable subset, which specifies a marginal model, is viewed as a “compression block” that encodes each possible partial solution according to the marginal distribution. The $j$th possible partial solution to the $i$th variable subset is encoded using $-\log_2 p_{ij}$ bits, where $p_{ij}$ is the frequency of the $j$th possible partial solution to the $i$th variable subset observed in $S'$. Assuming that the given problem is of length $\ell$ with binary encoding, and that the built model $M$ contains $m$ variable subsets, each of size $k_i$, $i = 1, \ldots, m$, we can check for the $i$th marginal model whether or not

$$\sum_{j=1}^{2^{k_i}} q_{ij} \left(-\log_2 p_{ij}\right) > k_i,$$

where $q_{ij}$ is the frequency of the $j$th possible partial solution to the $i$th variable subset collected from $T'$. If the inequality holds, then the compression scheme employed in the $i$th marginal model is not a good one for compressing the corresponding partial solutions in $T'$, because it encodes a $k_i$-bit partial solution into a bit string with an expected length of more than $k_i$ bits. Based on the earlier reasoning, such a condition indicates that the marginal model is likely ineffective, because $T'$ does not agree on this part of the model. Otherwise, the scheme should be able to compress the partial solutions in $T'$.
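The check can be sketched as follows, under stated assumptions rather than as the authors' code: solutions are tuples of 0/1, `s_sel` and `t_sel` stand for $S'$ and $T'$, and zero frequencies in $S'$ are handled as described at the end of this subsection, with the small value fixed at $1/(2n)$ purely as one example of a value below $1/n$. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

def smoothed_frequencies(sel, subset):
    """p_ij for every one of the 2**k partial solutions on `subset`, observed in `sel`.
    Zero frequencies get 1/(2n), an assumed value below 1/n, then all are renormalized."""
    n = len(sel)
    counts = Counter(tuple(s[v] for v in subset) for s in sel)
    raw = {x: (counts[x] / n if counts[x] else 1.0 / (2 * n))
           for x in product((0, 1), repeat=len(subset))}
    total = sum(raw.values())
    return {x: p / total for x, p in raw.items()}

def prune(partition, s_sel, t_sel):
    """Keep subset i only if sum_j q_ij * (-log2 p_ij) <= k_i, i.e. the code built
    on S' does not expand the partial solutions of T' beyond k_i bits on average."""
    kept = []
    for subset in partition:
        p = smoothed_frequencies(s_sel, subset)
        q = Counter(tuple(s[v] for v in subset) for s in t_sel)
        expected_bits = sum((c / len(t_sel)) * -log2(p[x]) for x, c in q.items())
        if expected_bits <= len(subset):
            kept.append(subset)
    return kept

# Toy data (invented): bits 0 and 1 agree in both halves, bit 2 drifted apart.
s_sel = [(1, 1, 0)] * 8 + [(0, 0, 0)] * 2
t_sel = [(1, 1, 1)] * 9 + [(0, 0, 1)] * 1
print(prune([(0, 1), (2,)], s_sel, t_sel))  # [(0, 1)]
```

In the toy data, the subset (0, 1) costs about 0.66 bits per 2-bit partial solution of $T'$ and is kept, while bit 2 drifted toward 0 in $S'$ and toward 1 in $T'$, costs about 4.39 bits per 1-bit partial solution, and is removed.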

From a machine learning perspective (Mitchell, 1997), a good model should generalize well to unseen instances; otherwise, it has captured coincidental regularities in the training data, that is, in what it has observed. If model building is performed on the portions of the given set of solutions where linkage is not sensible, it will “overfit” these partial solutions (i.e., take on hitchhikers) that were not subject to proper selection pressure. Consequently, the regularities captured by this part of the model tend to be inconsistent with the true problem structure. Furthermore, the partial solutions that were not subject to proper selection pressure appear to be random, which brings about the random drift phenomenon mentioned in Section 2. Because drift is random by nature, two different subpopulations tend to drift in two different directions. Thus, we can use the statistical inconsistency between $S'$ and $T'$ to locate the possible drift portions of the solutions and identify the likely ineffective parts within the whole model. By removing these likely ineffective parts, we can forge a partial but more effective model.

Algorithm 1 ECGA with Model Pruning
  Initialize a population P with n solutions of length ℓ.
  while the stopping criteria are not met do
    Evaluate the solutions in P.
    Divide P into two subpopulations S and T at random.
    S′ ← apply t-wise tournament selection on S.
    T′ ← apply t-wise tournament selection on T.
    M ← build the MPM on S′ with greedy search.
    M ← prune M based on the inconsistency with T′.
    for each remaining marginal distribution D in M do
      for each solution s = s1 s2 ··· sℓ in P do
        Change the values in s partially by sampling D.
      end for
    end for
  end while
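For readers who want to experiment, Algorithm 1 can be sketched end to end in Python. This is a hedged, minimal illustration rather than the authors' implementation. Assumptions: binary solutions, a t-wise tournament with replacement, a random split into two equal halves, convergence to a single fitness value as the stopping criterion (as in the experimental settings of Section 6) plus a generation cap, and $1/(2n)$ as the small value for zero frequencies. Sampling a surviving marginal distribution is done by copying the partial solution of a randomly chosen member of $S'$, which draws each partial solution with its observed frequency $p_{ij}$. The test fitness `two_traps` (two concatenated 4-bit traps in the standard fully deceptive form) and all function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations, product
from math import log2

def counts(pop, subset):
    """Occurrences of each partial solution on the given variable subset."""
    return Counter(tuple(s[v] for v in subset) for s in pop)

def cost(pop, subset):
    """Contribution of one subset to C_m + C_p."""
    n = len(pop)
    data_bits = sum(-c * log2(c / n) for c in counts(pop, subset).values())
    return log2(n + 1) * (2 ** len(subset) - 1) + data_bits

def build_mpm(pop):
    """Greedy MPM search: merge the pair of subsets that most reduces C_m + C_p."""
    part = [(v,) for v in range(len(pop[0]))]
    while True:
        best_gain, pair = 0.0, None
        for a, b in combinations(part, 2):
            gain = cost(pop, a) + cost(pop, b) - cost(pop, a + b)
            if gain > best_gain:
                best_gain, pair = gain, (a, b)
        if pair is None:
            return part
        part = [g for g in part if g not in pair] + [pair[0] + pair[1]]

def prune(part, s_sel, t_sel):
    """Drop marginal models whose code (built on S') expands T' beyond k_i bits."""
    n, kept = len(s_sel), []
    for sub in part:
        cnt = counts(s_sel, sub)
        eps = 1.0 / (2 * n)  # assumed stand-in for "a value smaller than 1/n"
        raw = {x: (cnt[x] / n if cnt[x] else eps)
               for x in product((0, 1), repeat=len(sub))}
        z = sum(raw.values())
        q = counts(t_sel, sub)
        bits = sum(c / len(t_sel) * -log2(raw[x] / z) for x, c in q.items())
        if bits <= len(sub):
            kept.append(sub)
    return kept

def tournament(pop, fitness, t, rng):
    """t-wise tournament with replacement (assumed variant); returns tuple copies."""
    scores = [fitness(s) for s in pop]
    idx = range(len(pop))
    return [tuple(pop[max(rng.choices(idx, k=t), key=scores.__getitem__)])
            for _ in pop]

def ecga_mp(fitness, length, n, t=12, max_gens=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for gen in range(max_gens):
        if len({fitness(s) for s in pop}) == 1:  # converged to one fitness value
            break
        rng.shuffle(pop)                          # random split into S and T
        s_sel = tournament(pop[: n // 2], fitness, t, rng)
        t_sel = tournament(pop[n // 2:], fitness, t, rng)
        model = prune(build_mpm(s_sel), s_sel, t_sel)
        for sub in model:                         # sample surviving marginals only
            for s in pop:
                donor = rng.choice(s_sel)         # picks a partial solution w.p. p_ij
                for v in sub:
                    s[v] = donor[v]
    return max(pop, key=fitness), gen

def two_traps(s):
    """Hypothetical test fitness: two concatenated 4-bit traps."""
    return sum(4 if sum(s[i:i + 4]) == 4 else 3 - sum(s[i:i + 4]) for i in (0, 4))

best, gens = ecga_mp(two_traps, length=8, n=200, t=8, seed=1)
print(best, two_traps(best), gens)
```

The usage line only checks that the loop runs; with such a small population and this simplified sampling, reaching the optimum is not guaranteed. Note that solutions whose marginal models were all pruned are left unchanged in that generation, exactly as in Algorithm 1.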

An issue in practice concerning the calculation of the inequality is that one or more possible partial solutions are sometimes absent from the set of selected solutions, leaving $-\log_2 p_{ij}$ undefined because $p_{ij} = 0$. In the present work, we handle this practical problem by assigning a very small value, smaller than $1/n$, to the $p_{ij}$'s that are zero and normalizing them such that they sum to 1 (i.e., $\sum_j p_{ij} = 1$).

5.3 Integration

In this section, the optimization process incorporating ECGA and the proposed technique is described. This combination helps ECGA achieve better performance when a disparate scale exists among different parts of the problem.

The procedure is presented in Algorithm 1. This process starts with initializing a population of solutions. After initialization, the solutions are evaluated, and then the entire population is randomly split into two subpopulations. Selection operations are performed on the two subpopulations separately with the same operator and selection pressure. Model building is performed on one of the subpopulations. The other subpopulation is used to prune the built model using the technique described previously. Finally, all solutions in the population are altered by sampling the remaining marginal distributions, which are considered effective, in the pruned model. These steps are repeated until the stopping criteria are satisfied.

A prominent difference between the above process and regular EDAs is that the sampling might not include all variables. As introduced in Section 4, the existing solutions are altered by sampling only the marginal distributions that survive the model pruning process. Thus, a solution string might not be entirely modified in an iteration. This technique hence avoids random drift and inaccurate processing of low-fitness building blocks by postponing their processing until sufficient linkage information is available. Similar to the concept proposed by Bosman and Thierens (1999) that linkage information estimated from the selected solutions has to be utilized to recognize building blocks, we further argue that the validity of the linkage information should be confirmed beforehand. In this way, better performance in terms of function evaluations can be achieved if a disparate scale exists among different parts of the problem.

Table 3: Marginal product models before and after pruning when solving a 16-bit exponentially scaled problem with the proposed approach.

| Generation | Stage | Marginal product model |
|---|---|---|
| 1 | Before | [s1 s2 s3 s4] [s5 s13 s16] [s6 s7 s12] [s8 s11] [s9 s10] [s14 s15] |
| 1 | After | [s1 s2 s3 s4] |
| 2 | Before | [s1] [s2] [s3] [s4] [s5 s6 s7 s8] [s9 s14] [s10 s15] [s11 s13 s16] [s12] |
| 2 | After | [s1] [s2] [s3] [s4] [s5 s6 s7 s8] |
| 3 | Before | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9 s10 s11 s12] [s13 s14] [s15 s16] |
| 3 | After | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9 s10 s11 s12] |
| 4 | Before | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9] [s10] [s11] [s12] [s13 s14 s15 s16] |
| 4 | After | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9] [s10] [s11] [s12] [s13 s14 s15 s16] |

To confirm that the proposed method meets its design purpose, Table 3 lists the models before and after pruning when the earlier exponentially scaled problem is solved by Algorithm 1. It can be seen that the proposed approach appropriately removes the ineffective parts during each stage of the optimization process. To further illustrate the behavior and effect of the proposed approach, the algorithm is applied to another problem with a different scaling, called overloaded scaling:⁷

$$f(s_1 s_2 \cdots s_{16}) = \sum_{i=0}^{1} f_{\mathrm{trap}4}(s_{4i+1} s_{4i+2} s_{4i+3} s_{4i+4}) + \sum_{i=2}^{3} \frac{1}{5} f_{\mathrm{trap}4}(s_{4i+1} s_{4i+2} s_{4i+3} s_{4i+4}),$$

where $s_1 s_2 \cdots s_{16}$ is a solution string. The overloaded cases are those with two scales, where some subproblems are at the high level and the rest are at the low one. The models before and after pruning when such a problem is solved are shown in Table 4. It can be observed that the proposed method works as expected, splitting the solving process according to the scaling structure: the two subproblems of higher fitness are handled first, and the two subproblems of lower fitness are solved later.
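The overloaded function is easy to write down directly. This sketch assumes the standard fully deceptive trap, $f_{\mathrm{trap}k}(u) = k$ if $u = k$ and $k - 1 - u$ otherwise, with $u$ the number of ones in the block; the trap function is defined earlier in the paper, and this exact form is an assumption here. Indices are 0-based, so block $i$ covers `s[4*i : 4*i + 4]`, and the names are hypothetical.

```python
def trap(block):
    """Fully deceptive trap on a block of bits (standard form, assumed here)."""
    k, u = len(block), sum(block)
    return k if u == k else k - 1 - u

def overloaded16(s):
    """Blocks 0 and 1 at full weight, blocks 2 and 3 at weight 1/5 (0-based blocks)."""
    if len(s) != 16:
        raise ValueError("expected a 16-bit solution string")
    blocks = [s[4 * i: 4 * i + 4] for i in range(4)]
    high = sum(trap(b) for b in blocks[:2])
    low = sum(trap(b) / 5 for b in blocks[2:])
    return high + low

print(overloaded16([1] * 16))  # about 9.6, the global optimum
print(overloaded16([0] * 16))  # about 7.2, the all-zeros deceptive attractor
```

Only two weight levels appear, which matches the two-phase pruning pattern in Table 4: the full-weight blocks dominate selection first, and the 1/5-weight blocks become decisive only after the first two are resolved.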

⁷ As mentioned by Goldberg (2002), the word “overloaded” is a reference to the application of this idea in the early messy GA work (Goldberg et al., 1990), where such distributions were used to try to overload or overwhelm the ability of the messy GA to keep all building blocks present through all phases of the process.

Table 4: Marginal product models before and after pruning when solving a 16-bit problem of the overloaded scaling with the proposed approach.

| Generation | Stage | Marginal product model |
|---|---|---|
| 1 | Before | [s1 s2 s3 s4] [s5 s6 s7 s8] [s9 s16] [s10 s14 s15] [s11 s13] [s12] |
| 1 | After | [s1 s2 s3 s4] [s5 s6 s7 s8] |
| 2 | Before | [s1 s2 s3 s4] [s5 s6 s7 s8] [s9 s13 s14] [s10 s12] [s11 s15] [s16] |
| 2 | After | [s1 s2 s3 s4] [s5 s6 s7 s8] |
| 3 | Before | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9 s10 s11 s12] [s13 s14 s15 s16] |
| 3 | After | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9 s10 s11 s12] [s13 s14 s15 s16] |
| 4 | Before | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9 s10 s11 s12] [s13 s14 s15 s16] |
| 4 | After | [s1] [s2] [s3] [s4] [s5] [s6] [s7] [s8] [s9 s10 s11 s12] [s13 s14 s15 s16] |

6 Experiments

The experiments are designed to reveal the behavior of the proposed approach in handling sets of problems with different scaling difficulties. Because ECGA is limited in handling overlapping building blocks, we use only test problems that are additively separable. In this study, three bounding models of scaling (Goldberg, 2002) are considered: exponential, power law, and uniform. While the uniform and exponential cases bound the scaling performance of an algorithm at two extremes, the power law cases enable us to see the behavior in between. Based on the different scalings, three sets of test functions are constructed using $f_{\mathrm{trap}k}$ as the elemental function:

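Exponential:

$$\sum_{i=0}^{m-1} (k+1)^{i} \, f_{\mathrm{trap}k}(s_{k \times i+1} s_{k \times i+2} \cdots s_{k \times i+k})$$

Power law:

$$\sum_{i=0}^{m-1} (i+1)^{3} \, f_{\mathrm{trap}k}(s_{k \times i+1} s_{k \times i+2} \cdots s_{k \times i+k})$$

Uniform:

$$\sum_{i=0}^{m-1} f_{\mathrm{trap}k}(s_{k \times i+1} s_{k \times i+2} \cdots s_{k \times i+k})$$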
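The three test suites differ only in the weight attached to each trap, which the sketch below makes explicit. It assumes the standard fully deceptive trap form (the paper's own definition appears earlier and is not reproduced here); `scaled_traps` and the other names are hypothetical, and blocks are indexed from 0 as in the sums above.

```python
def trap(block):
    """Fully deceptive trap on a k-bit block (standard form, assumed here)."""
    k, u = len(block), sum(block)
    return k if u == k else k - 1 - u

def scaled_traps(s, k, weight):
    """sum_{i=0}^{m-1} weight(i) * trap(block i), with m = len(s) // k."""
    m = len(s) // k
    return sum(weight(i) * trap(s[k * i: k * i + k]) for i in range(m))

def exponential(s, k):
    return scaled_traps(s, k, lambda i: (k + 1) ** i)

def power_law(s, k):
    return scaled_traps(s, k, lambda i: (i + 1) ** 3)

def uniform(s, k):
    return scaled_traps(s, k, lambda i: 1)

s = [1] * 40  # all-ones optimum of a 40-bit instance (k = 4, m = 10)
print(uniform(s, 4), power_law(s, 4), exponential(s, 4))  # 40 12100 9765624
```

With the $(k+1)^i$ weights, improving block $i$ by one unit outweighs any combined change in blocks $0, \ldots, i-1$, because those blocks together can vary by at most $k \sum_{j<i} (k+1)^j = (k+1)^i - 1$. The power law weights grow far more slowly, and the uniform case weights all blocks equally.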

By adopting different scaling setups, we can compare the original ECGA with our approach under different degrees of linkage sensibility. By varying $k$ and $m$, we can observe the behavior of the proposed method with respect to different problem and subproblem sizes in a controlled manner. Furthermore, various selection pressures are also considered to allow a more thorough observation.

The purpose of the following experiments is to understand the impact of the proposed method on the computational resources (population size and function evaluations) required to solve a problem. Thus, we do not use solution quality as a measure of comparison but treat it as a minimum requirement. More precisely, we use a bisection method (Sastry, 2001) to bound the minimum population size capable of achieving reliable convergence to the optimum. Solution quality can of course be an important indicator for evaluating a newly invented approach. However, the primary goal of this study is to design a more economical approach for solving problems, and the experiments are designed to evaluate the proposed approach in this respect.

6.1 Effect of Selection Pressure

This section describes the experiments designed for observing the effect of selection pressure on both the original ECGA and the ECGA combined with the proposed approach. The purpose of these experiments is twofold.

• First, we want to determine the range of selection pressure with which the proposed approach works as designed. Appropriate selection pressure is quite important to the proper functioning of our approach, because the pruning mechanism is designed according to the statistical inconsistencies between the two subpopulations.

• Second, because the proposed approach will be compared with the original ECGA in the subsequent experiments, the selection pressure must be set to an appropriate value for the original ECGA to work under good conditions, so that the comparison is fair and meaningful.

[Figure 1 plot: (a) Population Sizes and (b) Function Evaluations (×10⁴) versus Tournament Sizes (8 to 24); series ECGA and ECGA+MP at $\ell = 40$ and $\ell = 80$.]

Figure 1: Empirical results of the proposed method and original ECGA on 40- and 80-bit ($k = 4$; $m = 10$ and 20) exponentially scaled problems. Five tournament sizes ranging from 8 to 24 were used to observe the behavior of the algorithms under different selection pressures.

6.1.1 Experimental Settings

Because tournament selection is adopted, the selection pressure is altered by changing the tournament size. We consider tournament sizes ranging from 8 to 24, and the problem instances used to make the observations are of length 40 bits and 80 bits with 4-bit trap functions as subproblems ($k = 4$; $m = 10$ and 20, respectively).

For simplicity, the population is split such that the two resulting subpopulations are disjoint and of equal size. The stopping criterion is set such that a run is terminated when all solutions in the population converge to the same fitness value. For each tournament size, the minimum required population size is determined by a bisection method (Sastry, 2001) such that, on average, $m - 1$ building blocks converge to the correct values in 50 runs for each of the two problem instances.

6.1.2 Results and Observations

The results for exponential, power law, and uniformly scaled problems are presented in Figures 1, 2, and 3, respectively. It can be observed from Figures 1(b), 2(b), and 3(b) that for all three scalings, the original ECGA works best (in terms of the number of function evaluations) under tournament size 12 or 16. Based on that, we use these two tournament sizes in the following sets of experiments to ensure that the improvement of our approach over the original ECGA is not a result of improper selection pressure. In fact, we also performed experiments using a tournament size of 4, the results of which are listed in Table 5. These results show that adopting a lower selection pressure does not yield better performance for either ECGA or our approach.

[Figure 2 plot: Population Sizes versus Tournament Sizes (8 to 24); series ECGA and ECGA+MP at $\ell = 40$ and $\ell = 80$.]

Figure 2: Empirical results of the proposed method and original ECGA on 40- and 80-bit ($k = 4$; $m = 10$ and 20) power law scaled problems. Five tournament sizes ranging from 8 to 24 were used to observe the behavior of the algorithms under different selection pressures.

[Figure 3 plot: Population Sizes versus Tournament Sizes (8 to 24); series ECGA and ECGA+MP at $\ell = 40$ and $\ell = 80$.]

Figure 3: Empirical results of the proposed method and original ECGA on 40- and 80-bit ($k = 4$; $m = 10$ and 20) uniformly scaled problems. Five tournament sizes ranging from 8 to 24 were used to observe the behavior of the algorithms under different selection pressures.

The results of these experiments give some insight into the pruning mechanism. It can be observed that the appropriateness of a particular selection pressure is related to the linkage sensibility of the problem at hand. This property could make choosing the selection pressure inconvenient, because in black-box optimization we usually do not have any information about the problem at hand. Fortunately, Figures 1(b), 2(b), and 3(b) also suggest that under tournament sizes ranging from 8 to 16, our approach works better than the original ECGA in the exponential and power law scaled cases. Under this range of tournament sizes (8 to 16), the behavior of the proposed approach in uniformly scaled cases is relatively stable compared to that under higher selection pressure. This observation indicates that, for a broad range of selection pressures, the improvement obtained by using the pruning mechanism can be expected in cases of limited linkage sensibility, while in cases where linkage information is completely sensible, the overhead is relatively stable.

Table 5: Empirical results of the proposed method and original ECGA using a tournament size of 4. Experiments were conducted on 40- and 80-bit problems formed by concatenating 4-bit trap functions with three different scalings. The symbols $\ell$, $n$, and fev denote problem size, population size, and function evaluations, respectively.

| Scaling | Algorithm | $\ell$ | $n$ | fev | std. of fev |
|---|---|---|---|---|---|
| Exponential | ECGA | 40 | 1,719 | 44,487.72 | 2,682.02 |
| Exponential | ECGA | 80 | 3,748 | 187,549.92 | 5,912.06 |
| Exponential | ECGA+MP | 40 | 1,405 | 37,373.00 | 2,027.11 |
| Exponential | ECGA+MP | 80 | 4,221 | 210,881.16 | 8,568.54 |
| Power law | ECGA | 40 | 1,604 | 32,946.16 | 2,105.37 |
| Power law | ECGA | 80 | 5,507 | 163,557.90 | 6,017.21 |
| Power law | ECGA+MP | 40 | 1,248 | 27,755.52 | 1,929.44 |
| Power law | ECGA+MP | 80 | 4,361 | 141,034.74 | 5,884.63 |
| Uniform | ECGA | 40 | 1,346 | 17,228.80 | 1,489.44 |
| Uniform | ECGA | 80 | 3,479 | 58,308.04 | 3,411.61 |
| Uniform | ECGA+MP | 40 | 2,181 | 30,446.76 | 2,411.81 |
| Uniform | ECGA+MP | 80 | 5,598 | 100,540.08 | 5,535.96 |

6.2 Impact on Population Requirement with Increasing m

This section describes experiments designed to reveal the behavior of the proposed approach when the number of subproblems within a problem grows (i.e., increasing $m$ with fixed $k$). To illustrate the effectiveness and benefit of adopting the pruning mechanism, and to estimate the overhead when it is not needed, the proposed approach is compared with the original ECGA on three sets of problems with different scaling setups.

6.2.1 Experimental Settings

The problem instances used in this set of experiments are composed of 4-bit trap functions and range from 40 to 80 bits ($k = 4$, $m = 10, \ldots, 20$). Two selection pressures are adopted by setting the tournament size $t$ to 12 and 16. We use these two tournament sizes because our approach is compared with the original ECGA, which appears to perform best with $t = 12$ or $t = 16$ according to the previous set of experiments. Otherwise, a question might arise as to whether the inferior performance of the original ECGA under some scaling difficulties comes from an inappropriate setting of selection pressure.

As in the previous experiment, the population is split such that the two resulting subpopulations are disjoint and of equal size. The stopping criterion is set such that a run is terminated when all solutions in the population converge to the same fitness value. For each problem instance, the minimum required population size is determined by a bisection method such that, on average, $m - 1$ building blocks converge to the correct values in 50 runs.

[Figure 4 plot: Population Sizes and (b) Function Evaluations (×10⁴) versus Problem Sizes (Bits, 40 to 80); series ECGA and ECGA+MP with $t = 12$ and $t = 16$.]

Figure 4: Empirical results of the proposed method compared to the original ECGA on exponentially scaled problems with tournament sizes $t = 12$ and $t = 16$. Problem sizes ranging from 40 to 80 bits ($k = 4$, $m = 10, \ldots, 20$) were used to observe the performance of the algorithms.

6.2.2 Results and Observations

The empirical results for exponentially scaled problems are shown in Figure 4. The minimum population sizes required by the proposed method are much smaller than those needed by the original ECGA and grow at a relatively slow rate. The same situation is observed in the function evaluations, for which our approach performed remarkably well. This improvement can be explained by the discussion of random drift and linkage sensibility presented in earlier sections. If simultaneous detection and processing of all building blocks cannot be achieved, additional costs have to be paid for the inaccurate processing and random drift of subsolutions. By adopting the pruning mechanism, we can save these costs by detecting possibly ineffective partial models and postponing changes to them until accurate processing can be made.

Figure 5 shows the results for power law scaled problems. The minimum population sizes are similar to those obtained in the previous set of experiments. The proposed method still uses fewer function evaluations, but the differences are reduced. This is because the linkage sensibility of the power law scaled problems is less limited than that of the exponentially scaled problems.

The empirical results for uniformly scaled problems are presented in Figure 6. As expected, the proposed method requires larger population sizes than those needed by the original ECGA. Because the model building process can correctly identify all building blocks in uniformly scaled problems, verifying the built model is likely unnecessary and wasteful. The results also show that the proposed method uses about twice as many function evaluations as the original ECGA.

In order to support the significance of the observations, we have also performed Welch’s t-test on the results. For each problem size, a t-test of the null hypothesis that the



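• Real numbers (Real numbers)

Most real-world engineering problems operate on the real domain, so real-valued parameters are the most commonly used type, and a candidate solution can usually be represented as a real-valued vector. Because of the continuity of the reals, searching for an optimal real-valued solution is often much harder than searching for Boolean or integer solutions, except for specific problem classes (e.g., linear programming problems or optimization problems that can be approximated with real numbers). Suppose that in the thief's knapsack problem the thief has a tool that can cut the three kinds of items; then the amount of each item taken must be represented with a real-valued parameter, and the problem becomes a real-valued optimization problem.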

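it is possible to provide tailor-made optimization framework services for optimization problems in various domains.

2. Optimization Algorithm Framework

The design of the general-purpose optimization algorithm consists of two main parts: extending the extended compact genetic algorithm to integer parameters and to real-valued parameters. Section 2.1 describes how the marginal product model and the minimum description length principle mechanism are modified so that integer parameters can also be handled. Section 2.2 then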



different patterns, with their numbers of occurrences tallied as shown in Table 2. In addition to modifying