
Research on Applying a General-Purpose Optimization Framework to Optimization Problems in Wireless Network Transmission Technology


I. Introduction

Evolutionary computation is an important research direction in the field of artificial intelligence. By learning from and imitating the evolutionary mechanisms found in nature, it has produced many algorithms applicable to optimization, search, and design problems. In particular, evolutionary computation performs black-box optimization, and its algorithms are highly adjustable, allowing them to handle a wide variety of optimization problems [1, 2]. In this project, we continue to study evolutionary computation methods in greater depth and apply this highly customizable optimization tool to optimization problems in communication and transmission technology, in order to examine its performance and obtain high-quality solutions. We expect to exploit the flexibility of optimization methods in evolutionary computation to quickly provide decision options when facing diverse optimization objectives. The following sections describe the motivation, background, and objectives of this two-year project.

II. Research Objectives

In the previous year's NSC project (NSC 98-2221-E-009-072), our laboratory proposed a new optimization framework for mixed-type parameters based on the extended compact genetic algorithm (ECGA). After a full year of research, the team obtained several results toward that goal. This project first builds on that development experience and further studies techniques for detecting relationships among decision variables, aiming to improve the performance of optimization methods once the information about variable dependencies has been extracted. In addition, we also focus on the fundamental theory of evolutionary computation, in order to strengthen the theoretical foundation of the methodology, to probe its core mechanisms at the mathematical level, and to counter the view held by some researchers that evolutionary computation lacks theoretical grounding. After gaining a better understanding of the behavior of optimization algorithms, we then take the LT code from communication and transmission technology as the optimization target and apply the developed optimization methods to the optimization of the degree distribution adopted in LT codes. Besides verifying the effectiveness of the algorithms, this also gives LT code users the opportunity to customize and optimize the degree distribution for different application scenarios.

III. Literature Review

Most real-world problems are not as clean as pure mathematical problems, for which correct answers can be obtained by directly applying formulas or fixed computational procedures. Solving these real-world problems ultimately relies on optimization techniques and tools to determine the decision variables, also called parameters. Various optimization problems arise in numerous fields such as industrial design, scheduling, circuit design, data compression, economics, and architecture. For example, in integrated-circuit placement, how should a given circuit design be laid out to occupy the smallest area; in construction engineering, how should the same building materials be arranged to obtain the greatest load-bearing capacity? For such problems it is often not difficult to find a feasible solution, or even several feasible solutions, but finding the optimal solution is usually much harder. If we can objectively distinguish better results from worse ones, optimization techniques can be used to raise the value-to-cost ratio, reducing cost or improving outcomes across a wide range of problems.

Among these, the most common form of optimization is parameter tuning. For a problem to be optimized, an objective function is usually defined so that various optimization techniques can be applied. One class of techniques uses heuristics to search for improvements step by step within the feasible region. Such algorithms include genetic algorithms [1, 2], simulated annealing [3, 4], ant colony algorithms [5], particle swarm optimization [6, 7], and so on. These methods achieve optimization by simulating processes observed in nature. They are not restricted by the mathematical properties of the objective function and can be applied to nonlinear, non-differentiable, or discontinuous functions. Even problems that cannot be described by a mathematical function can be handled by building a model and optimizing according to the feedback obtained from simulation. As long as two candidate solutions can somehow be compared, even problems without an explicit objective function are amenable, for example, personalized music fragment generation [8]. These algorithms are highly feasible and practical, possess considerable solving power, and can usually obtain solutions of acceptable quality within limited time, so they have gradually been widely applied to real-world problems.

However, for genetic algorithms in evolutionary computation, previous studies have pointed out that when the solution encoding is inappropriate, that is, when interdependent variables are not arranged together, optimization performance degrades severely [9]. One of the main directions for improving genetic algorithms is therefore linkage learning: detecting dependencies among variables and exploiting this information, for example by dynamically adjusting the encoding or designing special operators, to improve optimization performance [10]. Moreover, since the no-free-lunch theorem [11] was proposed, the existence of "general-purpose" optimization algorithms has been questioned at the theoretical level; at the same time, the limited theoretical understanding of the core mechanisms of optimization algorithms is one of the main reasons the field of evolutionary computation has not advanced further. In the first year of this project, we therefore investigated these issues in depth.

On the application side, in the second year this project studied the optimization of the degree distribution adopted in LT codes and also proposed an improvement to the LT coding scheme itself, creating the possibility of customization for different application scenarios. LT codes have been adopted as a basic building block in many important rateless coding frameworks, so improving the performance of LT codes themselves is highly important. To this end, many studies have proposed improvements to individual components of LT codes. The decoding algorithm was improved in [12, 13], which use different mechanisms to recover the source data, while [14] replaces the random number generator with a chaos-based sequence generator to supply the randomness used by LT codes.

Beyond these efforts, most related researchers have focused on designing degree distributions that can give LT codes better performance than the robust soliton distribution, which has been proven to be very close to optimal as the number of source symbols tends to infinity. Studies of this kind [15, 16] concentrate on scenarios with small numbers of source symbols, even though cases with fewer than about 30 symbols can in practice be handled with Gaussian elimination [17, 18]. To optimize LT codes for symbol counts that are larger but still far from infinity, [19] first proposed using heuristics to optimize the degree distribution and tested the case of 100 symbols. However, from the standpoint of practical requirements (for example, real-time multimedia transmission and audio/video streaming), the regime that most urgently needs study is roughly several hundred to several tens of thousands of symbols. Symbol counts of this magnitude are still far from infinity, so the robust soliton distribution is of little help, yet they are large enough that finding a degree distribution offering better LT code performance is very difficult. Several previous studies, including work by our laboratory and collaborators, have applied evolutionary computation methods to optimize degree distributions [20-22]. In the second year of this project, we therefore applied the results obtained in the first year to this important optimization problem and also attempted to improve the original LT coding mechanism.
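For reference, the robust soliton distribution discussed above combines the ideal soliton distribution with an additional spike term. A minimal Python sketch, assuming the standard parameterization with constants c and delta (the function name and the default values here are illustrative, not taken from this report):

import math

def robust_soliton(k, c=0.1, delta=0.5):
    """Return the robust soliton distribution mu(1..k) as a list of probabilities."""
    # Ideal soliton distribution rho(d)
    rho = [0.0] * (k + 1)
    rho[1] = 1.0 / k
    for d in range(2, k + 1):
        rho[d] = 1.0 / (d * (d - 1))
    # Spike/tail term tau(d) with R = c * ln(k/delta) * sqrt(k)
    R = c * math.log(k / delta) * math.sqrt(k)
    tau = [0.0] * (k + 1)
    pivot = int(round(k / R))
    for d in range(1, min(pivot, k + 1)):
        tau[d] = R / (d * k)
    if 1 <= pivot <= k:
        tau[pivot] = R * math.log(R / delta) / k
    # Normalize the sum rho + tau into a probability distribution
    Z = sum(rho[d] + tau[d] for d in range(1, k + 1))
    return [(rho[d] + tau[d]) / Z for d in range(1, k + 1)]

# Example: degree distribution for k = 1000 source symbols
mu = robust_soliton(1000)
print(sum(mu), mu[:5])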


IV. Research Methods

1. Properties of inductive linkage identification

Inductive linkage identification (ILI) is a technique previously proposed by our laboratory [23] that performs linkage learning with the ID3 method from machine learning. We had already analyzed this method in several respects, including the influence of building block difficulty [24, 25] and of problem structure [26]. Building on those results, and with the support of this project, we further investigated how the method performs on problems containing building blocks of different sizes and types, as well as how the population size it requires grows with the problem size, as shown in the corresponding figure.

2. Escaping the constraints of the no-free-lunch theorem

To establish the possibility and scope of general-purpose methods, we proposed a sufficiently broad mathematical framework and combined theoretical viewpoints with computational practice to offer a reasonable interpretation: although the no-free-lunch theorem is true within the scope of optimization problems and optimization methods it defines, in practice the vast majority of the problems (that is, objective functions) contained in the set of "all problems" never need to be considered at all. Under these circumstances, the no-free-lunch claim that "any two optimization methods have equal average performance over all problems" has no bearing on the development of the optimization methods actually used in practice.


3. A mathematical model of the core mechanism of memetic algorithms

Memetic algorithms differ from genetic algorithms in that they view the local search operator in the optimization process from the perspective of "skills acquired through learning during a lifetime". For an optimization method to achieve good performance, a necessary condition is that the global search operator and the local search operator cooperate and stay in balance. However, although scattered theoretical investigations have appeared over the years, most are narrow in scope or rely on unrealistic models. As a result, even though the concept itself is accepted by most researchers in the area, it has remained at the conceptual stage and has not been turned into a mathematical object that can actually be computed. Our laboratory addressed this point by establishing a concrete yet general mathematical model of memetic algorithms, which is used to study the relative properties of global and local search mechanisms and how their balance can be obtained, and which can serve as a guideline for designing more effective algorithms.

Our results, as shown in the figure below, indicate that the balance between the global and local search mechanisms rests on whether the search resources are evenly allocated between them. Here, "even allocation" does not mean actively and directly giving global search and local search half of the search resources each; rather, it is achieved by adjusting the parameters of the optimization algorithm itself. If, under some parameter setting, the algorithm happens to spend half of its search resources on global-search behavior and half on local-search behavior, that setting should be its most efficient configuration.
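To make the notion of "search resources spent on each mechanism" concrete, the sketch below shows a toy memetic loop in Python that simply counts the fitness evaluations consumed by the global phase (selection, crossover, mutation) and by the local phase (a bounded hill climber). The operators, the OneMax objective, and all parameter values are illustrative assumptions; this is not the mathematical model developed in the project.

import random

def onemax(x):
    return sum(x)

def hill_climb(x, f, budget):
    """Greedy bit-flip local search; returns the improved solution, its fitness,
    and the number of evaluations used."""
    evals = 1
    best, best_f = x[:], f(x)
    for i in range(len(x)):
        if evals >= budget:
            break
        y = best[:]
        y[i] ^= 1
        fy = f(y)
        evals += 1
        if fy > best_f:
            best, best_f = y, fy
    return best, best_f, evals

def memetic(f, n=40, pop_size=20, generations=50, local_budget=10):
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    fitness = [f(ind) for ind in pop]
    global_evals, local_evals = pop_size, 0
    for _ in range(generations):
        # Global phase: tournament selection + uniform crossover + bit-flip mutation
        a, b = random.sample(range(pop_size), 2)
        p1 = pop[a] if fitness[a] >= fitness[b] else pop[b]
        a, b = random.sample(range(pop_size), 2)
        p2 = pop[a] if fitness[a] >= fitness[b] else pop[b]
        child = [random.choice(pair) for pair in zip(p1, p2)]
        if random.random() < 0.5:
            child[random.randrange(n)] ^= 1
        fc = f(child)
        global_evals += 1
        # Local phase: refine the child with a budget-limited hill climber
        child, fc, used = hill_climb(child, f, local_budget)
        local_evals += used
        # Replace the current worst individual
        worst = min(range(pop_size), key=lambda i: fitness[i])
        pop[worst], fitness[worst] = child, fc
    return max(fitness), global_evals, local_evals

best, g, l = memetic(onemax)
print("best fitness:", best, "global evals:", g, "local evals:", l)

The ratio of the two counters is the quantity whose balance the model above is concerned with.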


4. Convergence time analysis of particle swarm optimization

Because particle swarm optimization (PSO) is easy to use and performs well, it is now widely applied to many engineering and scientific optimization problems. From the theoretical side, however, a survey of the literature shows that existing theoretical studies of PSO start by reducing the swarm to a single particle, treating the motion of these (in fact, this one) particles as a dynamical system, and then invoking well-known results from dynamical systems theory to argue for the convergence of the swarm. The few studies built on the same idea that extend to slightly more particles all assume that the particles are completely independent of one another. Although heavy simplification is understandable given the difficulty of theoretical analysis, discussing a "swarm" of size one and using the eventual stillness of a single particle to argue the inevitability of swarm convergence discards the most essential feature of PSO: the exchange of information among particles. Theoretical work of this kind obviously contributes very little to advancing PSO. Our laboratory has carried out substantial research on this topic. Our previously published work [27] was the first to interpret the operation of PSO statistically. The proposed theoretical framework directly considers a swarm composed of multiple particles and also takes into account the key mechanism of information exchange among particles, making it the theoretical model closest to PSO as actually executed among all related studies. Building on this framework, we derived the convergence time and obtained the preliminary experimental verification shown in the figure below.
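For concreteness, the particle movement rule that such analyses model is the standard PSO update, sketched below in Python; the inertia weight w, the acceleration coefficients c1 and c2, and the sphere objective are the usual textbook choices and are assumptions of this sketch rather than elements of the statistical framework in [27].

import random

def sphere(x):
    return sum(v * v for v in x)

def pso(f, dim=10, swarm_size=30, iterations=200, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia + cognitive (pbest) + social (gbest) terms;
                # the gbest term is the particle-interaction mechanism discussed above.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest_val

print("best value found:", pso(sphere))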


5. Finding surrogate evaluation functions for LT code optimization

The most intuitive way to improve an LT code is to reduce the overhead it requires, so the evaluation function is usually the overhead of an LT code combined with a given degree distribution. Unfortunately, there is currently no closed form for computing the average required overhead; it can only be estimated from a large amount of simulation data. Looking at LT encoding and decoding from another angle, as more and more output symbols are collected, the LT code has a higher probability of completely recovering the input symbols. This means we can fix the overhead received by the LT decoder and instead compute and minimize the decoding failure probability. In this project, we adopted two evaluation schemes from the literature [28, 29] and used them in the optimization of the degree distribution, obtaining the high-performance LT code degree distributions listed in the table below.
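As an illustration of the fixed-overhead viewpoint (not of the analytical evaluations in [28, 29]), the following Python sketch estimates the decoding failure probability of an LT code by Monte Carlo simulation with a standard peeling decoder; the number of input symbols k, the overhead factor, the trial count, and the sample degree distribution are assumed values for demonstration only.

import random

def sample_degree(dist):
    """Sample a degree from a distribution given as a list of (degree, probability)."""
    r, acc = random.random(), 0.0
    for d, p in dist:
        acc += p
        if r <= acc:
            return d
    return dist[-1][0]

def lt_decoding_fails(k, n_received, dist):
    """Simulate receiving n_received output symbols and run the peeling decoder.
    Returns True if at least one input symbol remains unrecovered."""
    symbols = []                        # each output symbol = set of still-unknown neighbors
    for _ in range(n_received):
        d = min(sample_degree(dist), k)
        symbols.append(set(random.sample(range(k), d)))
    recovered = [False] * k
    ripple = [s for s in symbols if len(s) == 1]
    while ripple:
        s = ripple.pop()
        if not s:
            continue                    # already emptied by an earlier recovery
        src = s.pop()                   # degree-one symbol reveals one input symbol
        if recovered[src]:
            continue
        recovered[src] = True
        for t in symbols:               # remove the recovered input symbol everywhere
            if src in t:
                t.discard(src)
                if len(t) == 1:
                    ripple.append(t)
    return not all(recovered)

def failure_probability(k, dist, overhead=1.05, trials=100):
    n = int(round(k * overhead))
    fails = sum(lt_decoding_fails(k, n, dist) for _ in range(trials))
    return fails / trials

# Example: a small sparse degree distribution (illustrative values only)
sparse_dist = [(1, 0.08), (2, 0.49), (3, 0.17), (4, 0.08), (8, 0.07), (16, 0.06), (64, 0.05)]
print("estimated failure probability:", failure_probability(500, sparse_dist))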

6. Studying the choice of sparse degree distributions

To lighten the search burden in the optimization framework, we use a sparse probability distribution in place of the full distribution. This effectively reduces the size of the search space but also limits the possibility of reaching the global optimum. In our earlier work, we relied on experimental experience to manually choose suitable degrees as the nonzero entries of the sparse distribution. Although the optimized results were indeed better than the robust soliton distribution, we did not know whether these distributions were close to the global optimum, or whether other degree combinations could yield even better sparse distributions. According to the literature, output symbols of different degrees each play their own role in decoding. To clarify the influence of individual degrees on the LT decoding rate, we consider quantifying the relationship between the probability placed on each degree and the LT decoding rate. Analyzing these data helps us find the most appropriate set of degrees, so that the corresponding sub-search-space comes as close as possible to the location of the global optimum.
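A minimal sketch of the sparse-distribution encoding, assuming a hand-picked degree set and an optimizer that proposes unconstrained real vectors; the softmax-style normalization used here is just one convenient way to map a search vector to a valid probability distribution and is not necessarily the mapping used in the project.

import math

# Hypothetical degree set chosen by experience (nonzero entries of the sparse distribution)
DEGREES = [1, 2, 3, 4, 8, 16, 32, 64]

def vector_to_sparse_distribution(x, degrees=DEGREES):
    """Map an unconstrained real vector x (one entry per degree) to a
    sparse degree distribution: a list of (degree, probability) pairs."""
    assert len(x) == len(degrees)
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [(d, e / total) for d, e in zip(degrees, exps)]

# Example: the optimizer proposes a candidate vector, which is turned into a
# distribution before handing it to a simulation-based evaluation function.
candidate = [0.2, 1.5, 0.9, 0.1, -0.3, -0.8, -1.0, -1.2]
dist = vector_to_sparse_distribution(candidate)
print(dist, "sum =", sum(p for _, p in dist))

In a full pipeline, an evolutionary optimizer such as CMA-ES [20] would propose candidate vectors, and each resulting distribution would be scored with a simulation-based surrogate like the failure-probability estimate sketched in the previous item.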


7. Improving the LT coding mechanism to allow customization

In the development of fountain codes, the universal property has always occupied a very important place at the theoretical level, because it guarantees that the overhead required for decoding stays the same regardless of the channel erasure rate. Today, however, in practical situations an excessively high channel erasure rate is simply not worth considering: even if the data receiver does manage to receive the data, the user experience is severely degraded compared with peer users on better channels. Furthermore, in real-time multimedia transmission, even a channel with a slightly higher erasure rate may cause system-wide problems, so the channel erasure rate becomes a design parameter, with a deliberately chosen bound on what must be supported in each scenario. Yet in the current development of LT codes, this practical requirement has not been treated by researchers as an academic problem at all. Our laboratory therefore addressed this need by introducing the tournament selection mechanism, commonly used in evolutionary computation, into the LT coding mechanism, so that LT codes can be customized and optimized for different application scenarios instead of being bound to preserving the universal property. As shown in the figure below, adopters of fountain codes can customize distributions such as CDD1, CDD2, and CDD3 to meet the needs of different application scenarios. In contrast, a coding method with the universal property, such as the LT code, can only offer the horizontal-line behavior in the figure; the red horizontal line is the performance delivered by the robust soliton distribution combined with the LT code.
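Tournament selection itself is the standard operator from evolutionary computation: draw t candidates at random and keep the one preferred by some comparison. How it is wired into the LT encoder is the subject of the project's Connection Choice Codes work and is not reproduced here; the Python sketch below only shows the generic operator, with the preference function left as a hypothetical placeholder.

import random

def tournament_select(candidates, better, t=2):
    """Standard tournament selection: sample t candidates at random and
    return the one preferred by the comparison function `better`."""
    pool = [random.choice(candidates) for _ in range(t)]
    winner = pool[0]
    for c in pool[1:]:
        if better(c, winner):
            winner = c
    return winner

# Illustrative use: pick among candidate input-symbol indices, preferring the one
# chosen least often so far (a hypothetical preference; the actual criterion used
# in the proposed encoder is defined in the project's papers).
usage_count = {i: 0 for i in range(10)}
choice = tournament_select(list(range(10)),
                           lambda a, b: usage_count[a] < usage_count[b], t=3)
print("selected input symbol:", choice)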


V. Results and Discussion

Over its two-year duration, this project conducted in-depth research on techniques for detecting relationships among decision variables, investigated the theoretical foundations of evolutionary computation methods, and studied LT code problems, including the optimization of the degree distribution and improvements to the LT coding mechanism. The concrete work items completed under these topics are as follows:

- Training received by project participants:
  - cultivating graduate students' ability to divide work and collaborate;
  - training participants in research, synthesis, and paper writing;
  - consolidating research results and publishing academic papers;
  - strengthening participants' skills in data analysis, evolutionary computation, machine learning, numerical analysis, optimization techniques, and related areas.
- In-depth study of the properties of inductive linkage identification: including its performance on problems containing building blocks of different sizes and types, and a mathematical model that shows and predicts how the population size it requires grows with the problem size.
- Establishing the possibility and scope of general-purpose methods: a mathematical framework, combining theoretical viewpoints with computational practice, that reasonably explains why the no-free-lunch theorem, although true within the scope it defines, has no effect on developing the various methods used in practical computation.
- Building a mathematical model of evolutionary computation methods: a model used to study the relative properties of the global and local search mechanisms contained in optimization algorithms and how they should be allocated to reach a balance, so as to help design more effective algorithms.
- Analyzing the convergence time of particle swarm optimization: based on the mathematical model of PSO convergence previously proposed by our laboratory, we derived the convergence time and ran actual PSO on concrete mathematical functions to obtain numerical results that verify the derived convergence time.
- Applying the research results to the optimization of the degree distribution in LT codes: we first identified surrogate evaluation functions for LT code optimization and then combined them with the study of sparse degree distributions to reduce the burden on the optimization algorithm, thereby successfully optimizing the degree distribution.
- Making the LT coding mechanism customizable: the tournament selection concept frequently used in evolutionary computation was applied to improve the LT coding mechanism, so that LT code adopters have the opportunity to customize and optimize LT codes for their different application scenarios and conditions.
- Writing reports and submitting papers to relevant journals and major conferences to publish the project's research results. Supported by this NSC grant, our laboratory has submitted and published the following academic papers:

  Journal papers:
  - Chen, C.-M., Chen, Y.-p., Shen, T.-C., & Zao, J. K. A practical optimization framework for the degree distribution in LT codes. IET Communications. (Submitted)
  - Lin, J.-Y., & Chen, Y.-p. (2013). Population sizing for inductive linkage identification. International Journal of Systems Science. doi: 10.1080/00207721.2011.577246. (SCI, EI). (Accepted)
  - Chen, Y.-p., Chuang, C.-Y., & Huang, Y.-W. (2012). Inductive linkage identification on building blocks of different sizes and types. International Journal of Systems Science, 43(12), 2202–2213. doi: 10.1080/00207721.2011.566639. (SCI, EI).
  - Lee, M.-C., Leu, F.-Y., & Chen, Y.-p. (2012). PFRF: An adaptive data replication algorithm based on star-topology data grids. Future Generation Computer Systems, 28(7), 1045–1057. doi: 10.1016/j.future.2011.08.015. (SCI, EI).
  - Chen, C.-H., & Chen, Y.-p. (2011). Convergence time analysis of particle swarm optimization based on particle interaction. Advances in Artificial Intelligence, 2011(204750), 1–7. doi: 10.1155/2011/204750.
  - Lin, J.-Y., & Chen, Y.-p. (2011). Analysis on the collaboration between global search and local search in memetic computation. IEEE Transactions on Evolutionary Computation, 15(5), 608–623. doi: 10.1109/TEVC.2011.2150754. (SCI, EI).
  - Jiang, P., & Chen, Y.-p. (2011). Free lunches on the discrete Lipschitz class. Theoretical Computer Science, 412(17), 1614–1628. doi: 10.1016/j.tcs.2010.12.028. (SCI, EI).

  Conference papers:
  - Chen, C.-M., & Chen, Y.-p. Connection Choice Codes. The 32nd IEEE International Conference on Computer Communications (IEEE INFOCOM 2013). (Submitted)
  - Tsai, P.-C., Chen, C.-M., & Chen, Y.-p. (2012). Sparse degrees analysis for LT codes optimization. In Proceedings of 2012 IEEE Congress on Evolutionary Computation (CEC 2012) (pp. 2463–2468). doi: 10.1109/CEC.2012.6252861. (EI).
  - Lin, J.-Y., & Chen, Y.-p. (2012). When and what kind of memetic algorithms perform well. In Proceedings of 2012 IEEE Congress on Evolutionary Computation (CEC 2012) (pp. 2716–2723). doi: 10.1109/CEC.2012.6252894. (EI).

References

[1] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[2] D. E. Goldberg, The Design of Innovation: Lessons from and for Competent Genetic Algorithms. Boston: Kluwer Academic Publishers, 2002.
[3] S. Kirkpatrick, et al., "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
[4] V. Černý, "Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm," Journal of Optimization Theory and Applications, vol. 45, pp. 41-51, 1985.
[5] M. Dorigo, et al., "Ant algorithms for discrete optimization," Artificial Life, vol. 5, pp. 137-172, 1999.
[6] R. C. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," Proceedings of the Sixth International Symposium on Micromachine and Human Science, pp. 39-43, 1995.
[7] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," Proceedings of IEEE International Conference on Neural Networks, pp. 1942-1948, 1995.
[8] T. Y. Fu, et al., "Evolutionary interactive music composition," Proceedings of ACM SIGEVO Genetic and Evolutionary Computation Conference 2006 (GECCO-2006), pp. 1863-1864, 2006.
[9] D. E. Goldberg, et al., "Messy genetic algorithms: Motivation, analysis, and first results," Complex Systems, vol. 3, pp. 493-530, 1989.
[10] Y.-p. Chen, Extending the Scalability of Linkage Learning Genetic Algorithms: Theory and Practice, 2005.
[11] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, pp. 67-82, 1997.
[12] H. Tarus, et al., "Exploiting redundancies to improve performance of LT decoding," Proceedings of the 6th Annual Conference on Communication Networks and Services Research (CNSR 2008), pp. 198-202, 2008.
[13] F. Lu, et al., "LT codes decoding: Design and analysis," Proceedings of the IEEE International Symposium on Information Theory (ISIT 2009), pp. 2492-2496, 2009.
[14] Q. Zhou, et al., "Encoding and decoding of LT codes based on chaos," Proceedings of the 3rd International Conference on Innovative Computing Information and Control (ICICIC '08), p. 451, 2008.
[15] E. Hyytiä, et al., "Optimal degree distribution for LT codes with small message length," Proceedings of the 26th IEEE International Conference on Computer Communications (INFOCOM 2007), pp. 2576-2580, 2007.
[16] E. A. Bodine and M. K. Cheng, "Characterization of Luby transform codes with small message size for low-latency decoding," Proceedings of the IEEE International Conference on Communications, pp. 1195-1199, 2008.
[17] V. Bioglio, et al., "On the fly Gaussian elimination for LT codes," IEEE Communications Letters, pp. 953-955, 2009.
[18] M. Rossi, et al., "SYNAPSE++: Code dissemination in wireless sensor networks using fountain codes," IEEE Transactions on Mobile Computing, vol. 9, pp. 1749-1765, 2010.
[19] E. Hyytiä, et al., "Optimizing the degree distribution of LT codes with an importance sampling approach," Proceedings of the 6th International Workshop on Rare Event Simulation (RESIM 2006), pp. 64-73, 2006.
[20] C.-M. Chen, et al., "On the optimization of degree distributions in LT codes with covariance matrix adaptation evolution strategy," Proceedings of 2010 IEEE Congress on Evolutionary Computation (CEC 2010), pp. 3531-3538, 2010.
[21] C.-M. Chen, et al., "Optimizing degree distributions in LT codes by using the multiobjective evolutionary algorithm based on decomposition," Proceedings of 2010 IEEE Congress on Evolutionary Computation (CEC 2010), pp. 3635-3642, 2010.
[22] A. Talari and N. Rahnavard, "Rateless codes with optimum intermediate performance," Proceedings of the Global Telecommunications Conference (GLOBECOM 2009), pp. 1-6, 2009.
[23] C.-Y. Chuang and Y.-p. Chen, "Linkage identification by perturbation and decision tree induction," Proceedings of 2007 IEEE Congress on Evolutionary Computation (CEC 2007), pp. 357-363, 2007.
[24] C.-Y. Chuang and Y.-p. Chen, "Recognizing problem decomposition with inductive linkage identification: Population requirement vs. subproblem complexity," Proceedings of the Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on Advanced Intelligent Systems (SCIS & ISIS 2008), pp. 670-675, 2008.
[25] Y.-W. Huang and Y.-p. Chen, "Detecting general problem structures with inductive linkage identification," Proceedings of the 2010 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2010), pp. 508-515, 2010.
[26] Y.-W. Huang and Y.-p. Chen, "On the detection of general problem structures by using inductive linkage identification," Proceedings of ACM SIGEVO Genetic and Evolutionary Computation Conference 2009 (GECCO-2009), pp. 1853-1854, 2009.
[27] Y.-p. Chen and P. Jiang, "Analysis on the facet of particle interaction in particle swarm optimization," Theoretical Computer Science, vol. 411, pp. 2101-2115, 2010.
[28] R. Karp, et al., "Finite length analysis of LT codes," Proceedings of 2004 IEEE International Symposium on Information Theory (ISIT 2004), p. 39, 2004.
[29] E. Maneva and A. Shokrollahi, "New model for rigorous analysis of LT-codes," Proceedings of 2006 IEEE International Symposium on Information Theory (ISIT 2006), pp. 2677-2679, 2006.

Appendix: Full Text of Published Papers

Journal papers:

1. Chen, Y.-p., Chuang, C.-Y., & Huang, Y.-W. (2012). Inductive linkage identification on building blocks of different sizes and types. International Journal of Systems Science, 43(12), 2202–2213. doi: 10.1080/00207721.2011.566639. (SCI, EI).
2. Lee, M.-C., Leu, F.-Y., & Chen, Y.-p. (2012). PFRF: An adaptive data replication algorithm based on star-topology data grids. Future Generation Computer Systems, 28(7), 1045–1057. doi: 10.1016/j.future.2011.08.015. (SCI, EI).
3. Chen, C.-H., & Chen, Y.-p. (2011). Convergence time analysis of particle swarm optimization based on particle interaction. Advances in Artificial Intelligence, 2011(204750), 1–7. doi: 10.1155/2011/204750.
4. Lin, J.-Y., & Chen, Y.-p. (2011). Analysis on the collaboration between global search and local search in memetic computation. IEEE Transactions on Evolutionary Computation, 15(5), 608–623. doi: 10.1109/TEVC.2011.2150754. (SCI, EI).
5. Jiang, P., & Chen, Y.-p. (2011). Free lunches on the discrete Lipschitz class. Theoretical Computer Science, 412(17), 1614–1628. doi: 10.1016/j.tcs.2010.12.028. (SCI, EI).

Conference papers:

1. Tsai, P.-C., Chen, C.-M., & Chen, Y.-p. (2012). Sparse degrees analysis for LT codes optimization. In Proceedings of 2012 IEEE Congress on Evolutionary Computation (CEC 2012) (pp. 2463–2468). doi: 10.1109/CEC.2012.6252861. (EI).
2. Lin, J.-Y., & Chen, Y.-p. (2012). When and what kind of memetic algorithms perform well. In Proceedings of 2012 IEEE Congress on Evolutionary Computation (CEC 2012) (pp. 2716–2723). doi: 10.1109/CEC.2012.6252894. (EI).



International Journal of Systems Science Vol. 43, No. 12, December 2012, 2202–2213

Inductive linkage identification on building blocks of different sizes and types

Ying-ping Chen*, Chung-Yao Chuang and Yuan-Wei Huang

Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan (Received 24 March 2010; final version received 17 February 2011)

The goal of linkage identification is to obtain the dependencies among decision variables. Such information or knowledge can be applied to design crossover operators and/or the encoding schemes in genetic and evolutionary methods. Thus, promising sub-solutions to the problem will be disrupted less likely, and successful convergence may be achieved more likely. To obtain linkage information, a linkage identification technique, called Inductive Linkage Identification (ILI), was proposed recently. ILI was established upon the mechanism of perturbation and the idea of decision tree learning. By constructing a decision tree according to decision variables and fitness difference values, the interdependent variables will be determined by the adopted decision tree learning algorithm. In this article, we aim to acquire a better understanding on the characteristics of ILI, especially its behaviour under problems composed of different-sized and different-type building blocks (BBs) which are not overlapped. Experiments showed that ILI can efficiently handle BBs of different sizes and is insensitive to BB types. Our experimental observations indicate the flexibility and the applicability of ILI on various elementary BB types that are commonly adopted in related experiments.

Keywords: inductive linkage identification; ILI; linkage learning; BBs; genetic algorithms; evolutionary computation

1. Introduction

Previous studies (Goldberg, Korb, and Deb 1989; Harik 1997) on genetic algorithms (GAs), which are widely utilised to handle control and engineering problems (Wang 2009; Li and Li 2010; Gladwin, Stewart, and Stewart 2011), have shown that the encoding scheme of solutions is one of the key factors to the success of GAs by demonstrating that simple GAs fail to handle problems of which the solutions are represented with loose encodings while genetic algorithms capable of learning linkage succeed. If strongly related variables, which are usually referred to as building blocks (BBs), are arranged loosely with the adopted representation, they are likely to be disrupted by crossover operations. Such a condition contributes to the divergence of population, instead of the convergence towards optimal solutions. Although encoding strongly related variables tightly or making crossover operators aware of such relationships could mitigate the problem and improve the GA performance (Stonedahl, Rand, and Wilensky 2008), both measures require the foreknowledge of the target problem, which is often not the case in which evolutionary algorithms are adopted.

In order to overcome the BB disruption problem, a variety of techniques have been proposed and developed in the past two decades and can be roughly

classified into three categories (Munetomo and Goldberg 1998; Chen, Yu, Sastry, and Goldberg 2007):

(1) Evolving representations or operators;

(2) Probabilistic modelling for promising solutions;

(3) Perturbation methods.

The objective of the techniques in the first class is to make individual promising sub-solutions separated and less likely to be disrupted by crossover via manipulating the representation of solutions during optimisation. Various reordering and mapping operators have been proposed in the literature, such as self-crossover (Pal, Nandi, and Kundu 1998), which is proven able to generate any arbitrary permutation of the symbols, the messy GA (mGA) (Goldberg et al. 1989), and the fast mGA (fmGA) (Kargupta 1995), which is the more efficient descendant of mGA. The difficulty faced by these methods is that the reordering operator usually reacts too slowly and loses the race against selection. Therefore, premature convergence at local optima occurs. Another technique, the linkage learning GA (LLGA) proposed by Harik (1997), uses circular structures as the representation with two-point crossover such that the tight linkage might be more likely preserved. LLGA works well while the shares of BBs are exponentially apportioned in the total fitness,


which are usually referred to as exponentially scaled problems. However, it is inefficient when applied to uniformly scaled problems.

The methods in the second category are often referred to as the estimation of distribution algorithms (EDAs) (Mühlenbein and Paaß 1996; Larrañaga and Lozano 2001; Pelikan, Goldberg, and Lobo 2002). These approaches describe the dependencies among variables in a probabilistic manner by constructing a probabilistic model from selected solutions and then sample the built model to generate new solutions. Early EDAs began with assuming no interactions among variables, such as the population-based incremental learning (PBIL) (Baluja 1994) and the compact GA (cGA) (Harik, Lobo, and Goldberg 1999). Subsequent studies started to model pairwise interactions, e.g. the mutual-information input clustering (MIMIC) (de Bonet, Isbell, and Viola 1997), Baluja's dependency tree approach (Baluja and Davies 1997), and the bivariate marginal distribution algorithm (BMDA) (Pelikan and Mühlenbein 1999). Multivariate dependencies were then exploited, and more general interactions were modelled. Example methods include the extended compact GA (ECGA) (Harik 1999), the Bayesian optimisation algorithm (BOA) (Pelikan, Goldberg, and Cantú-Paz 1999), the factorised distribution algorithm (FDA) (Mühlenbein and Mahnig 1999) and the learning version of FDA (LFDA) (Mühlenbein and Höns 2005). Since model constructing in these methods requires no additional fitness evaluations, EDAs are usually considered efficient in the traditional viewpoints of evolutionary computation, especially when fitness evaluations involve time-consuming simulations. However, the model constructing mechanism itself is sometimes computationally expensive with a large population size, which usually occurs in evolutionary methods. The difficulty which EDAs often face is that the BBs contributing less to the total fitness are likely ignored rather than recognised.

Approaches in the third category observe the fitness differences caused by perturbing variables to detect dependencies. In the literature, the gene expression messy GA (GEMGA) (Kargupta 1996) models the sets of tightly linked variables as weights assigned to solutions and employs a perturbation method to detect them. GEMGA observes the fitness changes caused by perturbations on every variable for strings in the population and detects interactions among variables according to how likely the variables compose optimal solutions. Assuming that nonlinearity exists within a BB, the linkage identification by nonlinearity check (LINC) (Munetomo and Goldberg 1998) perturbs a pair of variables and observes the presence of nonlinearities to identify linkages. If the sum of fitness differences of respective perturbations on two variables is equal to the fitness difference caused by simultaneously perturbing the two variables, linearity is confirmed, and thus, these two variables are considered independent. Instead of non-linearity, the descendant of LINC, linkage identification by non-monotonicity detection (LIMD) (Munetomo and Goldberg 1999), adopts non-monotonicity to detect interactions among variables. Compared to EDAs, the low salience BBs are unlikely ignored in these approaches. However, since obtaining fitness differences requires extra function evaluations, perturbation methods are usually considered demanding more computational efforts to detect linkages. In addition to empirical studies, Heckendorn and Wright (2004) generalised these methods through Walsh analysis to obtain theoretical resource requirements. Zhou, Sun, and Heckendorn (2007) and Zhou, Heckendorn, and Sun (2008) later extended this study from the binary domain to high-cardinality domains.

An interesting approach combining the ideas of EDAs and perturbation methods, called the dependency detection for distribution derived from fitness differences (D5), was developed by Tsuji, Munetomo, and Akama (2006). D5 detects the dependencies of variables by estimating the distributions of strings clustered according to fitness differences. For each variable, D5 calculates fitness differences by perturbations on that variable in the entire population and clusters the strings into sub-populations according to the obtained fitness differences. The sub-populations are examined to find k variables with the lowest entropies, where k is an algorithmic parameter for problem complexity, i.e. the number of variables in a linkage set. The determined k variables are considered forming a linkage set. D5 can detect dependencies for a class of functions that are difficult for EDAs, e.g. functions containing low salience BBs, and requires less computational cost than other perturbation methods do. However, its major constraint is that it relies on parameter k, which may not be available due to the limited information of the problem structure. As a side-effect to parameter k, D5 might be fragile in the situation where the problem is composed of subproblems of different sizes. Moreover, Ting, Zeng, and Lin (2010) recently utilised another data mining technique, the Apriori algorithm, to learn potential association rules between decision variables for linkage discovery. They reported that their proposal can improve D5 in terms of solution quality and efficiency.

In our previous work, we proposed inductive linkage identification (ILI) based on perturbations and the integration with the Iterative Dichotomiser 3 (ID3) (Quinlan 1986) algorithm, which is widely used in machine learning. ILI is an unsupervised method without any parameter for the complexity of BBs.


Its scalability and efficiency against the increasing problem sizes have been demonstrated (Chuang and Chen 2007, 2008; Huang and Chen 2009, 2010). Compared to the conventional perturbation methods, such as LINC and LIMD, ILI utilises a data mining technique to analyse objective functions. Compared to D5, which uses clustering, and the method proposed by Ting et al. (2010), which uses the Apriori algorithm, ILI adopts the ID3 algorithm and behaves quite differently. In this article, we aim to address more detailed characteristics of ILI in order to gain deeper insights and better understandings of linkage learning. In particular, problems constructed by non-overlapped BBs of different sizes and sub-functions are studied and experimented upon. Our experimental results indicate that ILI holds the properties of robustness and efficiency when facing various configurations of BBs.

The remainder of this article is organised as follows. In Section 2, the background of linkage learning in GAs and decomposability of problems is briefly introduced. Section 3 gives an introduction to ILI, including a review of the ID3 decision tree learning algorithm, an example illustrating the proposed approach, and an algorithmic description of ILI. Section 4 presents the experiments conducted in this study and the results revealing the behaviour of ILI. Finally, Section 5 summarises and concludes this article.

2. Linkage and BBs

In this section, we briefly review the definitions and terminologies which will be used throughout this article. As stated by de Jong, Watson, and Thierens (2005), 'two variables in a problem are interdependent if the fitness contribution or optimal setting for one variable depends on the setting of the other variable', and such a relationship between variables is often referred to as linkage in the GA literature. In order to obtain the full linkage information of a pair of variables, the fitness contribution or optimal setting of these two variables will be examined on all possible settings of the other variables.

Although obtaining the full linkage information is computationally expensive, linkage should be estimated using a reasonable amount of efforts if the target problem is decomposable. According to the Schema theorem (Holland 1992), short, low-order and highly fit substrings increase their share to be combined. Also stated in the BB hypothesis, GAs implicitly decompose a problem into sub-problems by processing BBs. It is considered that combining small parts is important for GAs and is consistent with human innovation (Goldberg 2002). These lead to a problem

model called the additively decomposable function (ADF), which can be written as a sum of low-order sub-functions.

Let a string s of length ℓ be described as a series of variables, s = s_1 s_2 ··· s_ℓ. We assume that s = s_1 s_2 ··· s_ℓ is a permutation of the decision variables x = x_1 x_2 ··· x_ℓ to represent the encoding scheme adopted by GA users. The fitness of string s is then defined as

f(s) = \sum_{i=1}^{m} f_i(s_{v_i}),     (1)

where m is the number of sub-functions, f_i is the i-th sub-function and s_{v_i} is the substring fed to f_i. Each v_i is a vector specifying the substring s_{v_i}. For example, if v_i = (1, 2, 4, 8), then s_{v_i} = s_1 s_2 s_4 s_8. If f_i is also a sum of other functions, it can be replaced by those sub-functions. Thus, each f_i can be considered as a nonlinear function.

By eliminating the ordering property of v_i, we can obtain a set V_i containing the elements of v_i. The variables belonging to the same set V_i are regarded as interdependent because f_i is nonlinear. Thus, we refer to the set V_i as a linkage set. A related term, BBs, is referred to as the candidate solutions to sub-function f_i. In this article, only a subclass of the ADFs is considered. We concentrate on non-overlapping sub-functions; that is, V_i ∩ V_j = ∅ if i ≠ j. In addition, the strings are assumed to be composed of binary variables.

3. Inductive linkage learning

In this section, the ideas behind ILI will be presented. Then, the ID3 algorithm, which is proposed and widely utilised in the field of machine learning, will be briefly introduced. An example is given to illustratively explain the mechanism of ILI, followed by the pseudo code.

In ILI, linkage learning is regarded as the issue of decision tree learning. As an illustration, the fitness difference can be derived in the following equation within the ADF model:

f(s_1 s_2 ··· s_8) = f_1(s_1 s_2 s_3 s_4 s_5) + f_2(s_6 s_7 s_8)

df_1(s) = f(\bar{s}_1 s_2 ··· s_8) − f(s_1 s_2 ··· s_8)
        = (f_1(\bar{s}_1 s_2 s_3 s_4 s_5) + f_2(s_6 s_7 s_8)) − (f_1(s_1 s_2 s_3 s_4 s_5) + f_2(s_6 s_7 s_8))
        = f_1(\bar{s}_1 s_2 s_3 s_4 s_5) − f_1(s_1 s_2 s_3 s_4 s_5),     (2)

where \bar{s}_1 denotes the perturbed value of s_1. Equation (2) indicates that the fitness difference df_1 should be affected only by the bits belonging to the same sub-function as the perturbed bit s_1, which are s_1 s_2 ··· s_5. Since certain fitness difference values are respectively caused by particular bits arranged in some permutation of the sub-function where the perturbed variable belongs, we can consider the task as finding which values of variables will result in the corresponding fitness differences.

We found that this kind of task is similar to decision making in machine learning: given a condition composed of attributes, an agent (algorithm) should learn to make a decision with the given training instances. When the decision-making method is adopted for conducting linkage learning, decision variables are regarded as attributes and the fitness difference values stand for class labels. With this simple and direct mapping, linkage learning in GAs can potentially be handled with certain well-developed methods in machine learning.

3.1. Decision tree learning: ID3

The ID3 algorithm was proposed by Quinlan (1986) for the purpose of constructing a decision tree on a set of training instances. In its basic form, ID3 constructs a decision tree in a top–down manner without back-tracking. When a decision tree is being constructed, each attribute is evaluated using a statistical property, called the information gain, to measure how well the attribute alone classifies the training instances. The best attribute, which leads to the highest information gain, is accordingly selected and used as the root node of the tree. A descendant node of the root is created for each possible value of the selected attribute, and the training instances are split into appropriate descendant branches. The entire process is repeated on the training instances associated with each descendant node.

The statistical property, information gain, of each attribute is simply the expected reduction in the impurity of instances after classifying the instances with the selected attribute. The impurity of an arbitrary collection of instances is called entropy in information theory. Given a collection D containing instances of c different target values, the entropy of D relative to this c-wise classification is defined as

Entropy(D) ≡ −\sum_{i=1}^{c} p_i \log_2 p_i,     (3)

where p_i is the proportion of D belonging to class i. For simplicity, in all the calculations involving entropy, we define 0 \log_2 0 to be 0. In terms of entropy, the information gain, Gain(D, A), of an attribute A relative to a collection of instances D, is defined as

Gain(D, A) ≡ Entropy(D) − \sum_{v ∈ Val(A)} (|D_v| / |D|) Entropy(D_v),     (4)

where Val(A) is the set of all possible values for attribute A and D_v is the subset of D of which attribute A has value v. In summary, ID3 can be described as the pseudo code given in Algorithm 1.

Algorithm 1: Pseudo code of ID3

procedure ID3(D)
    Stop if no further classification is needed
    for each attribute A do
        Calculate Gain(D, A)
    end for
    Select the attribute with the highest information gain as a tree node
    for each possible value v of the selected attribute do
        Create a branch for D_v, the subset of D of which the selected attribute has value v
        Call ID3(D_v) to construct this subtree
    end for
end procedure
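As a quick concrete check of Equations (3) and (4) and of the attribute-selection step in Algorithm 1, the following small Python sketch (not part of the original article) computes the entropy of a labelled collection and the information gain of a single attribute on a made-up toy data set.

import math
from collections import Counter

def entropy(labels):
    """Entropy of a collection of class labels, Equation (3)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Information gain of one attribute over a collection, Equation (4)."""
    n = len(rows)
    total = entropy(labels)
    # Partition the collection by the value of the chosen attribute
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    remainder = sum((len(part) / n) * entropy(part) for part in partitions.values())
    return total - remainder

# Toy training instances: attribute vectors and class labels (fitness differences)
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ['a', 'a', 'b', 'b']             # attribute 0 determines the label, attribute 1 does not
print(information_gain(rows, labels, 0))  # 1.0
print(information_gain(rows, labels, 1))  # 0.0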

In the proposal of ILI (Chuang and Chen 2007), the ID3 algorithm is adopted as a classification and relationship extraction mechanism. Linkage learning is then achieved by a sequence of decision tree constructions. In a classification problem, a training instance is composed of a list of attributes describing the instance and a target value which the decision tree is supposed to predict after training. For the purpose of linkage identification, the list of attributes is the solution string, and the target value is the fitness difference caused by perturbations.

3.2. Exemplary illustration

This section illustrates the idea that linkage learning is considered as decision learning with an example. We consider a trap function of size k defined as the following:

f_{trap_k}(s_1 s_2 ··· s_k) = trap_k(u = \sum_{i=1}^{k} s_i) =
    k,            if u = k;
    k − 1 − u,    otherwise,     (5)

where u is the number of ones in the string s_1 s_2 ··· s_k. Suppose that we are dealing with an eight-bit problem

f(s_1 s_2 ··· s_8) = f_{trap_5}(s_1 s_2 s_3 s_4 s_5) + f_{trap_3}(s_6 s_7 s_8),     (6)

where s_1 s_2 ··· s_8 is a solution string. In the black-box optimisation scenario, the structural decomposition of the objective function is unknown. Our goal here is to identify the two linkage sets V_1 = {1, 2, 3, 4, 5} and V_2 = {6, 7, 8}, which correspond to the problem structural decomposition.
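For readers who wish to reproduce the example, a minimal Python rendering (not part of the original article) of the trap function in Equation (5) and the eight-bit objective in Equation (6) is given below.

def trap(bits):
    """Trap function of size k = len(bits), Equation (5)."""
    k, u = len(bits), sum(bits)
    return k if u == k else k - 1 - u

def f8(s):
    """Eight-bit objective of Equation (6): trap5 on s1..s5 plus trap3 on s6..s8."""
    return trap(s[0:5]) + trap(s[5:8])

print(f8([1, 1, 1, 1, 1, 0, 0, 0]))  # 5 + 2 = 7
print(f8([0, 0, 0, 0, 0, 1, 1, 1]))  # 4 + 3 = 7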

In the beginning, a population of strings is randomly generated as listed in Table 1. The first column lists the solution strings, and the second column lists the fitness values of the corresponding strings. After initializing the population, we perturb the first variable s_1 (0 → 1 or 1 → 0) for all strings in the population in order to detect the variables interdependent on s_1. Note that the choice of first operating on s_1 in this example is not mandatory. Any un-grouped decision variable in the encoding may be chosen as the root node. The third column of Table 1 records the fitness differences, df_1, caused by perturbations at variable s_1.

Then, we construct an ID3 decision tree by using the perturbed population of strings as the training instances and the perturbed variable s_1 as the tree root. Variables in s_1 s_2 ··· s_8 are regarded as attributes of the instances, and the fitness differences df_1 are the target values/class labels. Corresponding to Table 1, an ID3 decision tree shown in Figure 1 is constructed. By gathering all the decision variables on the non-leaf nodes, we can identify a group of s_1, s_2, s_3, s_4 and s_5. As a consequence, linkage set V_1 is correctly identified.

For the remainder of this example, since s_1, s_2, s_3, s_4 and s_5 are already identified as linkage set V_1, we proceed at s_6. The fitness differences after perturbing variable s_6 are shown in Table 2. Conducting the same procedure, an ID3 decision tree presented in Figure 2 is obtained. By gathering all the decision variables used in the decision tree, we obtain variables s_6, s_7 and s_8, which form linkage set V_2. Because all the decision variables are classified into their respective linkage sets, the linkage detecting task is accomplished. ILI finally reports two linkage sets, V_1 = {s_1, s_2, s_3, s_4, s_5} and V_2 = {s_6, s_7, s_8}.

As illustrated in the example, the mechanism of ILI can detect size-varied BBs without assumptions. Such an ability implies that ILI should be capable of finding all relations among these variables as long as the population size is sufficiently large to provide significant statistics.

Table 1. Population perturbed at s_1 (solution strings s_1 s_2 ··· s_8 with their fitness values f and the fitness differences df_1; data not reproduced here).

Table 2. Population perturbed at s_6 (solution strings s_1 s_2 ··· s_8 with their fitness values f and the fitness differences df_6; data not reproduced here).

Figure 1. ID3 decision tree constructed according to Table 1.

3.3. Inductive linkage identification

In this section, the idea demonstrated in the previous section is formalized as an algorithm, which is called ILI. The pseudo code of ILI is presented in Algorithm 2. Conceptually, ILI includes the following three main steps:

(1) Calculate fitness differences by perturbations;
(2) Construct an ID3 decision tree;
(3) Consider the tree nodes as a linkage set.

The three steps repeat until all the variables of the objective function are classified into their corresponding linkage sets.

ILI starts with initializing a population of strings. After initialization, ILI identifies one linkage set at a time using the following procedure: (1) a variable is randomly selected to be perturbed; (2) an ID3 decision tree is constructed according to the fitness differences caused by perturbations; (3) the variables used in the tree are gathered and considered as a linkage set.

Algorithm 2: Inductive linkage identification

procedure IdentifyLinkage(f, ℓ)
    Initialise a population P with n strings of length ℓ.
    Evaluate the fitness of the strings in P using f.
    V ← {1, ..., ℓ}
    m ← 0
    while V ≠ ∅ do
        m ← m + 1
        Select v in V at random.
        V_m ← {v}
        V ← V \ {v}
        for each string s^i = s^i_1 s^i_2 ··· s^i_ℓ in P do
            Perturb s^i_v.
            df^i ← fitness difference caused by the perturbation.
        end for
        Construct an ID3 decision tree using (P, df).
        for each decision variable s_j in the tree do
            V_m ← V_m ∪ {j}
            V ← V \ {j}
        end for
    end while
    return linkage sets V_1, V_2, ..., V_m
end procedure

As clearly shown in Algorithm 2, there is no parameter needed for indicating the complexity of sub-functions. That is, ILI does not rely on any assumption on the size of BBs, while other existing perturbation methods usually require the maximum size of BBs to be specified. This property distinguishes ILI from other existing methods. The only factor affecting the correctness of ILI is whether or not the solution strings in the population can provide sufficient information for the decision tree construction.
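To make the procedure concrete, the following Python sketch (not part of the original article) performs one ILI iteration: perturb a chosen variable across the whole population, label each string with the resulting fitness difference, fit a decision tree on the (string, difference) pairs, and collect the variables the tree uses. It substitutes scikit-learn's DecisionTreeClassifier with the entropy criterion for a faithful ID3 implementation, so it is only an approximation of the algorithm described above.

import random
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def one_ili_iteration(f, population, perturbed_index):
    """Return the linkage set detected by perturbing one variable, as in ILI."""
    pop = np.array(population)
    base_fitness = np.array([f(ind) for ind in pop])
    flipped = pop.copy()
    flipped[:, perturbed_index] ^= 1                       # perturb s_v in every string
    diffs = np.array([f(ind) for ind in flipped]) - base_fitness
    # Fitness differences are the class labels; strings are the attribute vectors.
    labels = np.unique(diffs, return_inverse=True)[1]
    tree = DecisionTreeClassifier(criterion="entropy").fit(pop, labels)
    used = {int(i) for i in np.flatnonzero(tree.feature_importances_ > 0)}
    return used | {perturbed_index}

# Example on the eight-bit trap5 + trap3 problem from Section 3.2
def trap(bits):
    k, u = len(bits), sum(bits)
    return k if u == k else k - 1 - u

def f8(s):
    return trap(s[0:5]) + trap(s[5:8])

random.seed(0)
population = [[random.randint(0, 1) for _ in range(8)] for _ in range(200)]
print(one_ili_iteration(f8, population, 0))   # expected {0, 1, 2, 3, 4} for a large enough population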

From our previous studies (Chuang and Chen 2007, 2008), we know that the required population size grows linearly with the problem size while the BB size is constant. Such results indicate that ILI is more efficient than LINC, O(ℓ^2) = O(k^2 m^2) (Munetomo and Goldberg 1998), and similar to D5, O(ℓ) = O(km) (Tsuji et al. 2006), where ℓ is the problem size, k is the size (i.e. length or order) of BBs, and m is the number of BBs. Note that the comparison focuses on the amount of required computational resource instead of the identification quality. This is because, given sufficient computational resource, all these methods can successfully identify every BB. In order to gain further understandings on the flexibility and applicability of ILI, in the next section, experiments on the BBs of different sub-functions as well as lengths are conducted and discussed.

4. Experiments and results

Experiments and results of ILI on binary and non-overlapped ADFs will be presented in this section. These experiments are designed to gain a better understanding of the behaviour of ILI on problems of different sub-function compositions, including size-varied and size-mixed BBs and different sub-functions.

Figure 2. ID3 decision tree constructed according to Table 2.

The required population size reflects the behaviour of ILI. Therefore, our experiments are designed to obtain the minimal population sizes required for different problem configurations. For a given problem, a population size assuring successful trials of linkage identification, which means correctly identifying all the BBs within the problem for 30 consecutive and independent runs, is first obtained by doubling the population size from 2500 until the first successful trial is achieved. Once the upper bound of population sizes P_U is found, the required population size is determined in a bisection manner: the population size P = (P_L + P_U)/2 will be configured for ILI, where P_L = 1 for the first iteration. If ILI can succeed with this population size P, then P will be regarded as sufficiently large for the problem, and the next iteration will be performed on the range [P_L, P]; otherwise, the range [P, P_U] will be used. This procedure repeats until the range is smaller than a predefined distance, which is 2 in this study, and the last tested population size is considered the minimal requirement for the current problem.
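The doubling-then-bisection search for the minimal population size can be paraphrased in Python as follows (not code from the article); succeeds(p) stands for running ILI for 30 independent trials with population size p and checking that all BBs are correctly identified every time.

def minimal_population_size(succeeds, start=2500, distance=2):
    """Find the smallest population size for which `succeeds` returns True,
    by doubling to find an upper bound and then bisecting."""
    upper = start
    while not succeeds(upper):          # double until the first successful trial
        upper *= 2
    lower, minimal = 1, upper
    while upper - lower >= distance:
        mid = (lower + upper) // 2
        if succeeds(mid):
            minimal, upper = mid, mid   # mid is sufficient; search below it
        else:
            lower = mid                 # mid is insufficient; search above it
    return minimal

# Illustrative stand-in for the real criterion (true population requirement = 730 here)
print(minimal_population_size(lambda p: p >= 730))   # 730, or a value within `distance` of it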

4.1. Different BB sizes

This section describes the experiment on problems of identical overall sizes but with different-sized sub-functions. From our experimental results with different configurations of the BB size k and the number of BBs m, we group those results with the overall problem sizes and arrange them with the BB size k. Thus, the results of the same problem size with different k can be examined.

Figure 3(a) and (b) shows the experimental results where the overall problem sizes are 60 bits, 240 bits, 420 bits and 600 bits with a log-scaled y-axis. The straight lines indicate that for identical overall problem sizes, the requirements of both the population size and the function evaluation grow exponentially.

With the exponential regression of the experimental results, an estimation of y = C · 2^{ak} can be obtained, where a is a constant around 0.8 and C varies with different problem sizes. Earlier studies by Munetomo and Goldberg (1998) and Heckendorn and Wright (2004), respectively, suggested an empirical and a theoretical upper bound of function evaluations, which are both of the form 2^k ℓ^j log(·) for problems of ℓ bits composed of order-k BBs with each BB sharing j bits with others. Reviewing our empirical results against the upper bounds, ILI shows the same computational complexity of exponential growth with k when overall sizes remain constant; such an observation is consistent with the upper bounds reported in the literature. However, the regression gives 0.8 as the coefficient of k in the exponent and thus indicates a practically better efficiency compared to the suggested upper bound when the complexity of the sub-problem increases.

4.2. Mixed BB sizes

One of the key features of ILI is that it is unsupervised. In this section, we inspect this feature by conducting experiments on the problems consisting of non-overlapping BBs of order-k_1 and order-k_2 trap functions as

trap_{k_1+k_2}(·) = \sum_{i=1}^{m} [ trap_{k_1}(·) + trap_{k_2}(·) ],     (7)

where m is the number of trap_{k_1} and trap_{k_2}. By designing the experiments in this way, the empirical results can be easily compared with those from problems consisting of identical sub-problem complexities in the following manner: for each problem size obtained from the experiment of trap_{k_1+k_2}(·), two results of the same amount of trap_{k_1} and trap_{k_2} from experiments in Section 4.1 are summed up to get the same problem size and total number of BBs; interpolation is utilised when there are no results of such configurations. These calculated numbers are denoted as trap_{k_1} + trap_{k_2} in Figure 4 along with the experimental results trap_{k_1+k_2}.

Figure 3. Requirements on different BB sizes: (a) population size and (b) function evaluation.

First, these results show that ILI is capable of detecting BBs of different sizes within one problem without any extra information regarding the complexity of sub-problems. Second, comparing with calculated data, it can be seen that although ILI requires more function evaluations for the problems composed of mixed BB sizes, the growth rate is still linear or very close to linear. The observation indicates that identifying size-varied BBs within a problem poses no particular difficulties for ILI. Such a property of robustness makes ILI more practical when being applied to real world problems where information regarding the sub-problem complexity is usually unavailable and no guideline exists to make appropriate assumptions.

4.3. BBs of various elementary functions

In addition to using trap_k functions as the sub-function to construct BBs, the capability of ILI to handle BBs formed by the other functions shown in Figure 5 is examined in this section. These elementary functions are used to compose the objective function according to the ADF model, and the complexity of order 4 is used in this section.

Figure 6 shows the experimental results. The required population sizes and function evaluations of trap_4, nith_4, tmmp_4 and valley_4 are plotted together, and the standard deviation of the results for trap_4 is also shown in the figures. Because the population and function evaluation requirements of these problems are similar, the behaviour of ILI should also be similar for problems constructed by mixing sub-problems of the same complexity. Moreover, the applicability of ILI on a wide range of problems is also confirmed. ILI is capable of detecting the interactions among variables as long as a sufficiently large population is employed to provide significant statistics.

5. Summary and conclusions

In this article, we examined ILI on several different configurations of BBs in order to gain better understandings. We focused on the mixed sizes of BBs and the elementary functions of different types. These series of experiments verified the efficiency of ILI on the population requirement growth, the robustness of ILI on mixed sizes of BBs, and the applicability of ILI on BBs formed with various elementary functions.

From the experiments of BB sizes, it is demonstrated that the required function evaluations grow exponentially with the size of BBs when the overall problem size remains constant. Such a result is consistent with the conclusions of previous studies from other researchers in the manner of Big-O, while ILI demands less computational resource in practice. On the other hand, if computationally expensive real-world problems, such as parametric engineering design (Saridakis and Dentsoras 2009), are handled, and the optimisation framework has to be made much more efficient, techniques of the surrogate-assisted evolutionary algorithm (SAEA) (Sastry, Goldberg, and Pelikan 2001; Jin 2003; Lim, Jin, Ong, and Sendhoff 2010a; Lim, Ong, Setiawan, and Idris 2010b) may be adopted and utilised.

Figure 4. Problems with mixed BB sizes. The solid lines represent the actual experimental results while the dashed lines are the summed-up calculations from Section 4.1.

Another observation is that when ILI performs on problems composed of mixed-sized BBs, the computational complexity of ILI is still in the same order. This phenomenon indicates that detecting these more complicated problem structures poses no particular difficulty for ILI. Finally, the experimental results obtained by using four different elementary functions to construct BBs are quite similar. Thus, this series of experiments evidentially proves that ILI behaves similarly when handling sub-problems of different types.

Figure 5. Elementary functions adopted in the series of experiments in Section 4.3: (a) trap_4, (b) nith_4, (c) tmmp_4 and (d) valley_4.

Figure 6. Experimental results on different 4-bit BB types: (a) required population sizes and (b) required function evaluations.


As a consequence, we can now know that the most important factor that affects ILI’s ability to identify linkage is the size of BBs. Compared with the BB size, ILI is relatively insensitive to other factors commonly studied by the related work, including the overall problem size, the number of BBs, and the type of BBs. Hence, ILI can be considered as a good linkage learning technique and can be adopted as a tool for analysing structures of target problems or a pre-processing procedure in frameworks of GAs.

Since its introduction, ILI as a linkage learning technique has been empirically proven efficient, robust and widely applicable. Research along this line includes integrating ILI into a GA framework, handling real-world applications with ILI, exploring ILI's capability of analysing problem structures and understanding the nature of linkage learning via getting deeper insights of ILI. As for the immediate future studies, the idea of 'linkage identification as decision learning' can be adapted to work with other advanced decision tree techniques. Characteristics of different decision tree algorithms might exhibit behaviour of different kinds and give us a better understanding of linkage identification. Such knowledge can be utilised to practically help the algorithmic development of GAs and theoretically reveal the working principle of evolutionary computation.

Acknowledgements

The work was supported in part by the National Science Council of Taiwan under Grant NSC 99-2221-E-009-123-MY2. The authors are grateful to the National Center for High-performance Computing for computer time and facilities.

Notes on contributors

Ying-ping Chen received his BS and MS degrees in Computer Science and Information Engineering from National Taiwan University, Taiwan, in 1995 and 1997, respectively, and PhD in 2004 from the Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, USA. He has been an Assistant Professor from 2004 to 2009 and an Associate Professor since 2009 in the Department of Computer Science, National Chiao Tung University, Taiwan. His research interests in the field of genetic and evolutionary computation include theories, working principles, particle swarm optimisation, estimation of distribution algorithms, linkage learning techniques and dimensional/facet-wise models.

Chung-Yao Chuang received his BS and MS degrees in Computer Science from National Chiao Tung University, Taiwan, in 2006 and 2008, respectively. He is currently working at Academia Sinica, Taiwan, for obligated citizen service and looking forward to being included in a PhD program starting in Fall 2012. His research interests include the linkage problem in evolutionary algorithms, estimation of distribution algorithms, evolutionary computation and machine learning in general.

Yuan-Wei Huang received the BS and MS degrees in Computer Science from National Chiao Tung University, Taiwan, in 2008 and 2010, respectively. His research interests include linkage learning techniques, machine learning, and artificial intelligence.

References

Baluja, S. (1994), ‘Population-based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning’, Technical Report, Pittsburgh, PA, USA.

Baluja, S., and Davies, S. (1997), ‘Using Optimal Dependency-trees for Combinational Optimization’, in Proceedings of the Fourteenth International Conference on Machine Learning, ICML’97, pp. 30–38.

Chen, Y.-p., Yu, T.-L., Sastry, K. and Goldberg, D.E. (2007), ‘A Survey of Linkage Learning Techniques in Genetic and Evolutionary Algorithms’, IlliGAL Report No. 2007014, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign.

Chuang, C.-Y., and Chen, Y.-p. (2007), 'Linkage Identification by Perturbation and Decision Tree Induction', in Proceedings of 2007 IEEE Congress on Evolutionary Computation (CEC 2007), pp. 357–363.

Chuang, C.-Y., and Chen, Y.-p. (2008), 'Recognizing Problem Decomposition with Inductive Linkage Identification: Population Requirement vs. Subproblem Complexity', in Proceedings of the Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on Advanced Intelligent Systems (SCIS & ISIS 2008), pp. 670–675.

de Bonet, J., Isbell, C., and Viola, P. (1997), ‘MIMIC: Finding Optima by Estimating Probability Densities’, Advances in Neural Information Processing Systems, 9, 424–430.

de Jong, E.D., Watson, R., and Thierens, D. (2005), 'On the Complexity of Hierarchical Problem Solving', in Proceedings of the Genetic and Evolutionary Computation Conference 2005 (GECCO 2005), pp. 1201–1208.
