National Science Council (Executive Yuan) Research Project Final Report

Design and Development of Composite Resource Scheduling Techniques for Heterogeneous Grid Computing Environments (Year 2)

Research Report (Complete Version)

Project Type: Individual

Project Number: NSC 95-2221-E-216-011-MY2

Execution Period: August 1, 2007 to July 31, 2008
Executing Institution: Department of Computer Science and Information Engineering, Chung Hua University

Principal Investigator: Ching-Hsien Hsu (許慶賢)

Project Participants: M.S. students serving as part-time assistants: 李開文, 郁家豪; Ph.D. students serving as part-time assistants: 陳世璋, 陳泰龍

Processing: This project involves patents or other intellectual property rights; it may be made publicly accessible after 2 years

Date: October 30, 2008


National Science Council Funded Research Project ■ Final Report □ Interim Progress Report

Design and Development of Composite Resource Scheduling Techniques for Heterogeneous Grid Computing Environments

Project Type: ■ Individual Project □ Integrated Project
Project Number: NSC95-2221-E-216-011-MY2

Execution Period: August 1, 2006 to July 31, 2008

Principal Investigator: Ching-Hsien Hsu (許慶賢), Associate Professor, Department of Computer Science and Information Engineering, Chung Hua University
Co-Investigators: none

Project Participants: 陳世璋, 陳泰龍 (Ph.D. students, Institute of Engineering Science, Chung Hua University); 李開文, 郁家豪 (M.S. students, Department of Computer Science and Information Engineering, Chung Hua University)

Report Type (as required by the approved funding list): □ Abridged Report ■ Complete Report

This report includes the following required attachments:

□ Report on overseas travel or study
□ Report on travel or study in Mainland China
■ Report on attendance at an international academic conference, with the presented paper
□ Foreign research report for an international cooperative project

Processing: Except for industry-academia cooperation projects, projects promoting industrial technology or personnel training, monitored projects, and the cases below, this report may be made publicly accessible immediately
■ Involves patents or other intellectual property rights; publicly accessible after □ one year ■ two years

Executing Institution: Department of Computer Science and Information Engineering, Chung Hua University

Date: October 31, 2008


National Science Council Research Project Final Report

Design and Development of Composite Resource Scheduling Techniques for Heterogeneous Grid Computing Environments

Design, Analysis and Implementation of Composite Multiple Resource Scheduler for Heterogeneous Grid Computing

Project Number: NSC95-2221-E-216-011-MY2
Execution Period: August 1, 2006 to July 31, 2008
Principal Investigator: Ching-Hsien Hsu (許慶賢), Associate Professor, Department of Computer Science and Information Engineering, Chung Hua University

Project Participants: graduate students, Department of Computer Science and Information Engineering, Chung Hua University: 陳世璋 (3rd-year Ph.D.), 李開文 (2nd-year M.S.), 陳泰龍 (2nd-year Ph.D.), 郁家豪 (2nd-year M.S.)

1. Abstract

This report describes the design of composite resource scheduling techniques for heterogeneous grid environments, together with the development of platform-transparent analysis tools. The project has three main research topics. First, developing master-slave task scheduling techniques for cluster grid environments; these results can be ported directly to cluster-grid job scheduling systems. Second, developing the core techniques of composite resource scheduling: for heterogeneous computing grid systems and grid topologies, we build optimized evaluation modules and analyze system performance against real workload trace tapes. Third, developing platform-transparent tools for system resource scheduling, tuning, and learning. As grid platforms become increasingly popular for large-scale computing and high-performance scientific applications, the results of this project can be applied directly to building experimental environments for high-performance cluster and grid computing.

Keywords: Heterogeneous Computing, Grid Computing, Composite Scheduling, Resource Scheduling, Task Scheduling, Master-Slave Architecture.

Abstract

This report presents a project to design, analyze, and implement a composite resource scheduler and a platform-transparent analysis tool on heterogeneous grids. There are three major subjects in this research: first, we develop master-slave task scheduling technologies that can be directly incorporated into cluster grids; second, we develop the main technique of the composite resource scheduler: for a heterogeneous grid and its topology, we devise an optimized performance analysis model and analyze system efficiency against a set of real workload trace tapes from SDSC; third, we develop a platform-transparent resource scheduling and learning tool. As grid computing becomes widespread for massive computing and high-performance scientific applications, the achievements of this research will facilitate constructing high-performance cluster and grid systems.

Keywords: Heterogeneous Computing, Grid Computing, Composite Scheduling, Resource Scheduling, Task Scheduling, Master-Slave.

2. Background and Objectives

In recent years, grid computing technology has been used to integrate resources of all kinds across many types of network environments, with the goal of letting users obtain the most efficient results in the shortest time when processing massive data and heavy computation. Simply put, grid computing combines large-scale integrated computer systems with efficient network transmission to provide large-volume data processing according to users' needs. In a heterogeneous network architecture, resource scheduling is critical: its main objectives are to let the many participating processors achieve their maximum performance, to optimize transfers according to network bandwidth, and to minimize processor idle time. These research directions leave considerable room for advancing high-performance grid computing. To combine grid environments with optimized resource scheduling, the problems arising from large-scale task scheduling in master-slave network computing environments are well worth studying. In grid and master-slave architectures, the quality of task scheduling directly affects program completion time and how well system resources are used. Master-slave resource scheduling can prevent a single processor from being overloaded and prolonging the overall completion time, thereby achieving high-performance computing. Moreover, to execute the tasks dispatched by the master efficiently, proper data-transfer scheduling coordinated with execution scheduling is also important. In heterogeneous network computing environments, resource allocation, dynamic addition of idle processors, management of the execution environment, and even data security are all problems awaiting solutions and improvement. In addition, many past studies scheduled tasks according to processor heterogeneity alone, or according to network bandwidth alone. Considering a single system factor, however, may in some cases fail to achieve fairness, optimal efficiency, and throughput. In this project we investigate a composite resource scheduler that takes both heterogeneous CPU capability and heterogeneous network bandwidth into account as the core of task dispatching. The main advantages of composite scheduling are improved overall system fairness, reduced response and delay time (delay guarantee), and higher overall system throughput.

3. Research Methods and Results

Since processors must receive continuous streams of task data, some algorithms cannot transfer and execute data efficiently as the number of processors grows. To reduce wasted processor idle time, efficient task scheduling algorithms are necessary. The main research work includes studying how the allocation of processor computing power affects overall program execution and performance, and porting and testing existing resource scheduling techniques. We develop composite resource scheduling for heterogeneous CPUs and bandwidth, focusing on raising system throughput and resource utilization and on shortening idle, response, and delay times; this part also covers optimizing data transfers and the scheduling algorithms themselves. We further investigate how to apply these techniques to heterogeneous grid topologies, covering heterogeneous task partitioning as well as task redistribution scheduling.

Task scheduling must consider the processors, the heterogeneous network bandwidth, and the data size. Figure 1 illustrates task dispatching on a cluster grid system, where different grid computing nodes have different computing capabilities. In a grid system, the Master is the resource-dispatching node and can be regarded as the Resource Scheduler; heterogeneous task queues are dispatched by the Master to the computing nodes. Under this architecture we assume execution is non-preemptive: once system resources are allocated to a task, no other task may use them at the same time. For communication, the links between the Resource Scheduler and the peer nodes are likewise heterogeneous, and their communications are assumed to be mutually exclusive.

Figure 1. Task dispatching on a cluster grid system

The relationship between processors and task scheduling is governed by two variables: Ti_comm (unit task transmission time) and Ti (unit task execution time). As shown in Figure 2, C1 to C4 are sub-processors in four different regions connected to the Master-Server; P1 to P4 denote their different computing capabilities; and T1_comm to T4_comm are the unit times the Master-Server needs to transmit one task to the processor in each region.

The resource scheduling algorithm proposed in the first year of this project is Shortest Communication Ratio (SCR), which has three main parts. The first part sorts the participating processors, P1 to P4 in Figure 2, from the one with the fastest execution performance (P1) to the slowest (P4).

Figure 2. Heterogeneous processors and heterogeneous network bandwidth (Master-Server connected to C1:P1, C2:P2, C3:P3, C4:P4)

The second part takes each participating processor's (Ti_comm + Ti) and uses a least-common-multiple computation to determine how many tasks each processor receives in each Basic Scheduling Cycle (BSC), which facilitates system scheduling.

The third part allocates tasks according to each processor's proportion of data transmission time within the computed basic scheduling cycle: processors with a smaller transmission-time proportion (in other words, a larger execution-time proportion) receive tasks first so they can start earlier, reducing every processor's waiting time.

In the second year we proposed three improved SCR resource scheduling algorithms: SCR-Best-fit, SCR-Worst-fit, and Extended SCR (ESCR). In addition to the three parts above, a fourth part uses Best-fit, Worst-fit, or a binary approximation method, respectively, to maximize the number of tasks each processor receives and processes within a bounded number of scheduling cycles, or within a bounded completion time (deadline), without leaving resources idle or wasted.

The biggest differences between SCR and the Greedy and FPF (Fast Processor First) algorithms [5,6] are as follows. Greedy sends tasks to processors one at a time; by ignoring the benefit of batched transmission, it produces many fragments of system idle time during execution. FPF does transmit tasks in batches but ignores the effect of heterogeneous network bandwidth: although the more efficient processors receive more tasks, they also incur a larger transmission load, so the other processors must wait longer before receiving tasks. SCR therefore reduces processor waiting and idle time and improves overall throughput.

As an example of the SCR scheduling algorithm, take the architecture of Figure 2 and assume T1_comm=5, T2_comm=2, T3_comm=1, T4_comm=3 and T1=3, T2=6, T3=11, T4=13. The steps are as follows:

Step 1: sort the processors by unit task execution time (Ti). With n nodes, the sorted set is <P1, P2, …, Pn>.

Step 2: from the sorted nodes of Step 1, take each node's Ti and unit task transmission time Ti_comm and compute the least common multiple LCM(5+3, 2+6, 1+11, 3+13) = 48, which means processors P1 to P4 receive (6, 6, 4, 3) task units in each basic scheduling cycle (BSC).

Step 3: allocate according to each processor's proportion of data transmission time within the basic scheduling cycle. P3 has the smallest transmission-time proportion (in other words, the largest execution-time proportion) and receives tasks first so it can start earliest, reducing every processor's waiting time; the dispatch order is P3, P4, P2, P1.
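Steps 2 and 3 can be reproduced with a short sketch (Python; a minimal model using only the example's Ti and Ti_comm values, not the full SCR implementation):

```python
from math import lcm  # Python 3.9+

# Per-processor unit transmission and execution times from the worked example
t_comm = {"P1": 5, "P2": 2, "P3": 1, "P4": 3}
t_exec = {"P1": 3, "P2": 6, "P3": 11, "P4": 13}

def scr_cycle(t_comm, t_exec):
    """Tasks per Basic Scheduling Cycle (BSC) and SCR dispatch order."""
    cycle = lcm(*(t_comm[p] + t_exec[p] for p in t_comm))  # LCM of (Ti_comm + Ti)
    tasks = {p: cycle // (t_comm[p] + t_exec[p]) for p in t_comm}
    # Smaller share of transmission time within a cycle goes first
    order = sorted(t_comm, key=lambda p: t_comm[p] / (t_comm[p] + t_exec[p]))
    return cycle, tasks, order

cycle, tasks, order = scr_cycle(t_comm, t_exec)
print(cycle)   # 48
print(tasks)   # {'P1': 6, 'P2': 6, 'P3': 4, 'P4': 3}
print(order)   # ['P3', 'P4', 'P2', 'P1']
```

The output matches the example: a 48-unit BSC, (6, 6, 4, 3) tasks per cycle, and dispatch order P3, P4, P2, P1.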

As Figure 3 shows, the SCR algorithm lets the nodes with a smaller transmission-time proportion receive tasks first, reducing the waiting time of the other nodes; every participating node receives and executes tasks, which improves system performance.

In this example the schedule is limited to three scheduling cycles (BSC) with a completion deadline of 183. Before P1's last scheduling cycle ends, P2, P3, and P4 all sit idle for a while, waiting for P1 to finish its running tasks. To make use of these available processors, some additional tasks must be sent to P2, P3, and P4 within the 183 time units; the improved SCR algorithms SCR-Best-fit and SCR-Worst-fit solve this problem.

Figure 3. Simulation of the SCR scheduling algorithm.

As Figure 4 shows, SCR-Best-fit not only lets the nodes with a smaller transmission-time proportion receive tasks first, but also keeps every computing node receiving and executing tasks as the deadline of 183 approaches, improving system performance. The idle time of the last cycle is 22 time units.

Figure 4. Simulation of the SCR-Best-fit scheduling algorithm.

As Figure 5 shows, under SCR-Worst-fit every computing node likewise keeps receiving and executing tasks as the deadline of 183 approaches, improving system performance. It differs from SCR-Best-fit only in the final allocation order, which causes a slight difference in system idle time: the idle time of the last cycle is 24.

To verify that the scheduling algorithms developed in this project are efficient, we compare them with the Largest Communication Ratio (LCR) and FPF algorithms on the same example architecture. Under the same architecture, LCR and FPF favor giving tasks first to the nodes with higher execution performance, so both their processor idle time and their initial system waiting time exceed those of SCR.

Figure 5. Simulation of the SCR-Worst-fit scheduling algorithm.

As Figures 6(a) and (b) show, when sorted by processor performance the dispatch order is P1, P2, P3, P4, and the total initial waiting time Wi of P2, P3, and P4 exceeds that under SCR; in some cases FPF's idle time is even so long that P4 cannot participate in the computation at all, wasting resources and degrading system performance.

Figure 6. (a) Simulation of the SCR scheduling algorithm; (b) simulation of the FPF scheduling algorithm.


To make the whole system more efficient and flexible, we designed the Extended SCR (ESCR) algorithm: within the idle space of an SCR schedule, a binary approximation method maximizes the number of tasks each processor handles within the configured scheduling cycles and the given completion deadline.

As Figure 7 shows, in this example the ESCR algorithm uses binary approximation to complete 66 assigned tasks within scheduling cycles j-1 through j+1. Every computing node keeps receiving and executing tasks as the deadline of 199 approaches, improving system performance. ESCR can work from either a user-specified task count or a final deadline; the idle time of the last cycle is 12.

Figure 7. Simulation of the ESCR scheduling algorithm.

As Figure 8 shows, when a fixed task count is specified, ESCR's binary approximation automatically determines which cycle the completion time falls in and finds the makespan, guaranteeing that the specified number of tasks finishes in the shortest time.

Figure 8. ESCR's binary approximation algorithm.
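The binary approximation of Figure 8 is an ordinary binary search over time. A minimal runnable sketch (Python), where `tasks_finished_by` is a hypothetical stand-in for the report's TaskESCRfinish function:

```python
def escr_makespan(tasks_finished_by, left_t, right_t, q_task, eps=1e-6):
    """Binary-approximate the earliest time x by which q_task tasks finish.

    tasks_finished_by(x) -> number of tasks the schedule completes by time x
    (a stand-in for TaskESCRfinish); left_t/right_t bracket the answer,
    like the finish times of cycles BSC_{j-1} and BSC_{j+1}.
    """
    while right_t - left_t > eps:
        x = (left_t + right_t) / 2
        if tasks_finished_by(x) >= q_task:
            right_t = x          # enough tasks finish by x: makespan <= x
        else:
            left_t = x           # too few tasks finish by x: makespan > x
    return right_t

# Toy model: one task completes every 3 time units (hypothetical, for illustration)
makespan = escr_makespan(lambda x: int(x // 3), 0.0, 300.0, q_task=66)
print(round(makespan))  # 198: 66 tasks at 3 time units each
```

The branch direction is the essential point: when more than the requested tasks finish by the midpoint, the upper bound shrinks; otherwise the lower bound grows, so the search converges on the minimum makespan.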

Analyzing the scheduling algorithms under different processor configurations and heterogeneous grid network environments, we found that which algorithm is optimal, or incurs the least scheduling time, is related to the processors' heterogeneity.

The simulation of Figure 9 compares average system throughput with each processor's computing power varying by ±10 and the network bandwidth differing by ±4, the system execution period set to 1 to 5 basic scheduling cycles, and five processor nodes.

Figure 9. Throughput of FPF, SCR_W, LCR, SCR, and SCR_B under different basic scheduling cycles

The simulation of Figure 10 compares processor waiting time with each processor's computing power varying by ±10 and the network bandwidth differing by ±4, the system execution period set to 1 to 5 basic scheduling cycles, and five processor nodes.

Figure 10. Total processor idle time of FPF, SCR_W, LCR, SCR, and SCR_B under different basic scheduling cycles

The listing of Figure 8 is as follows:

Algorithm ESCR_Binary_Approximation(Ti, Ti_comm, Qtask)
    // Qtask is the number of tasks to be processed
    Left_t  = TfinishSCR(BSC_j-1)
    Right_t = TfinishSCR(BSC_j+1)
    x = (Left_t + Right_t) / 2
    While (TaskESCRfinish(x) != Qtask) {
        if (TaskESCRfinish(x) > Qtask)
            Right_t = x        // too many tasks finish by x
        else
            Left_t = x         // too few tasks finish by x
        x = (Left_t + Right_t) / 2
    }
    makespan = x
End_of_ESCR_Binary_Approximation

The simulation of Figure 11 compares system throughput with each processor's computing power varying by ±10 and the network bandwidth differing by ±4, the system execution time set from 50 to 2000, and five processor nodes.

Figure 11. Throughput of ESCR, Greedy, and FPF under different system execution times (deadlines)

The simulation of Figure 12 compares processor waiting time with each processor's computing power varying by ±10 and the network bandwidth differing by ±4, the system execution time set from 50 to 2000, and five processor nodes.

Figure 12. Total processor idle time of ESCR, Greedy, and FPF under different system execution times (deadlines)

The simulation of Figure 13 compares average system throughput with each processor's computing power varying by ±5 to ±10 and the network bandwidth differing by ±5 to ±10, the system execution time set to 10000, and five to twenty-five processor nodes.

Figure 13. Average throughput of Greedy, FPF, SCR, and ESCR under different node counts

The simulation of Figure 14 compares average system throughput with each processor's computing power varying by ±10 and the network bandwidth differing by ±5, the system execution time set from 5000 to 25000, and twenty-five processor nodes.

Figure 14. Average throughput of Greedy, FPF, SCR, and ESCR under different system execution times

The simulation of Figure 15 compares average system throughput with each processor's computing power varying by ±10 and the network bandwidth differing by ±5, the task count set from 50 to 6400, and ten processor nodes.

Figure 15. Average throughput of Greedy, FPF, SCR, and ESCR under different task counts

The simulation of Figure 16 compares average error with each processor's computing power varying by ±10 and the network bandwidth differing by ±10, the task count set from 50 to 6400, and ten processor nodes. Average error is defined as the number of participating processor nodes that receive no task; under SCR and ESCR every node receives and executes tasks, so no such error occurs.

Figure 16. Average error of Greedy, FPF, SCR, and ESCR under different task counts

Comparing these results, we find that regardless of the number of nodes, the degree of network heterogeneity, or the number of tasks, ESCR delivers higher performance than the other algorithms. The proposed ESCR scheduling algorithm clearly outperforms the others, and the advantage widens when the number of nodes is large: ESCR shows the biggest throughput gap over the other algorithms while keeping the lowest system waiting time.

4. Conclusions and Discussion

The main results of this project are summarized below:

• Completed master-slave task scheduling for a single cluster grid system and implemented it under the HPHC (Heterogeneous Processor with Heterogeneous Communication) architecture, achieving efficient task scheduling in heterogeneous cluster-grid computing environments.

• Completed the implementation of the SCR scheduler algorithm, which determines the minimum number of scheduling steps for data transmission and execution. For tasks and data distributions of different sizes, we designed a processor-priority computation model and a formula for the basic scheduling cycle, and studied how the performance evaluation mechanism affects the overall network architecture and its performance.

• Completed the implementations of SCR-Best-fit and SCR-Worst-fit, which increase the volume of data transmitted. Compared with SCR in the same execution environment, adding Best-fit and Worst-fit cycle-evaluation optimization lets the scheduler make better use of overall network resources, yielding better performance than the plain SCR scheduler.

• Completed the implementation of the ESCR scheduler, which increases data transmission and reduces execution time, remedying the shortcomings of SCR-Best-fit and SCR-Worst-fit. Beyond the previously designed processor-priority model, it adds binary approximation for cycle-evaluation optimization, giving the scheduler the highest utilization of overall network resources.

• Completed the implementation of the Minimum-Deadline data-allocation scheduling algorithm. For minimum-time scheduling analysis, the designed algorithm finds the shortest system execution time (makespan), ensuring that for a fixed workload the processors' running and idle times are shorter than under other algorithms.

• Completed the implementation of the Maximum-Job data-allocation scheduling algorithm. For maximum-workload scheduling analysis, the designed algorithm ensures that within a specified system execution time (makespan), over long system runs the processors produce the most efficient output; that is, compared with the other algorithms, they complete the most tasks within the given time.

• Completed the dynamic grid topology simulation. For different cluster grid architectures, we built a model that dynamically evaluates external communication performance. This work includes gateway computation, simulation of the underlying network bandwidth topology, real-time network information retrieval, and weight computation methods.

• Completed the implementations of the Greedy and FPF (Fast Processor First) algorithms [5,6] and of LCR (Largest Communication Ratio) [1]. To verify that the scheduling algorithms developed in this project are efficient, the other scheduling algorithms had to be implemented for the experiments.

• Completed a theoretical module for analyzing scheduling steps and network transmission bandwidth. To compare the strengths and weaknesses of the scheduling algorithms, we completed a theoretical module that simulates data transmission and judges the quality of scheduling results.

• Completed a study of bandwidth contention caused by data transmission. Transmitting data to the processors makes them compete with one another, increasing system idle time. Our scheduling algorithms successfully reduce the communication contention and the initial waiting time caused by data transmission.

• Completed a generator of processor and network-bandwidth configuration variables. To simulate realistic heterogeneous processor computation variables, we implemented a sub-processor and network simulation generator that produces configurations in which high-performance computing units are concentrated on certain processors, and heterogeneous network bandwidth is either concentrated or evenly distributed across the processors.

5. Self-Evaluation of Project Results

The two years of research have achieved the project's expected goals. In the first year, two conference papers [1, 2] were published on this topic; among them, [2] "Performance Effective Pre-scheduling Strategy for Heterogeneous Communication Grid Systems" was accepted by the journal Future Generation Computer Systems (SCI). In the second year, two more conference papers [3, 4] were published; among them, [4] "An Efficient Job Allocation Method for Master Slave Paradigm with Heterogeneous Networks in Ubiquitous Environments" was accepted by the Journal of Supercomputing (SCI).

6. References

[1] Tai-Lung Chen and Ching-Hsien Hsu, "An Efficient Processor Selection Scheme for Master Slave Paradigm on Heterogeneous Networks,"

Proceedings of Network and Parallel Computing (NPC'06), Oct. 2006.

[2] Ching-Hsien Hsu, Tai-Lung Chen and Kuan-Ching Li, "Performance Effective Pre-scheduling Strategy for Heterogeneous Communication Grid Systems," Future Generation Computer Systems, Elsevier, Vol. 23, Issue 4, pp. 569-579, May 2007. (SCI, EI)

[3] Ching-Hsien Hsu and Tai-Lung Chen, "An Efficient Task Dispatching Method in Heterogeneous Networks," IEEE Proceedings of the 2007 International Conference on Multimedia and Ubiquitous Engineering (MUE’07), pp. 17-22, April 2007. (EI)

[4] Ching-Hsien Hsu, Tai-Lung Chen and Jong-Hyuk Park, "On improving resource utilization and system throughput of master slave job scheduling in heterogeneous systems," Journal of Supercomputing, Springer, Vol. 45, No. 1, pp. 129-150, July 2008. (SCI, EI)

[5] Oliver Beaumont, Arnaud Legrand and Yves Robert, “The Master-Slave Paradigm with Heterogeneous Processors,” IEEE Trans. on parallel and distributed systems, Vol. 14, No.9, pp. 897-908, September 2003.

[6] Cyril Banino, Olivier Beaumont, Larry Carter, Jeanne Ferrante, Arnaud Legrand and Yves Robert, "Scheduling Strategies for Master-Slave Tasking on Heterogeneous Processor Platforms," IEEE Trans. on Parallel and Distributed Systems, Vol. 15, No. 4, pp. 319-330, April 2004.

[7] Oliver Beaumont, Arnaud Legrand and Yves Robert, “Pipelining Broadcasts on Heterogeneous Platforms,” IEEE Trans. on parallel and distributed systems, Vol. 16, No.4, pp. 300-313 April 2005.

[8] Francine Berman, Richard Wolski, Henri Casanova, Walfredo Cirne, Holly Dail, Marcio Faerman, Silvia Figueira, Jim Hayes, Graziano Obertelli, Jennifer Schopf, Gary Shao, Shava Smallen, Neil Spring, Alan Su and Dmitrii Zagorodnov, "Adaptive Computing on the Grid Using AppLeS," IEEE Trans. on Parallel and Distributed Systems, Vol. 14, No. 4, pp. 369-379, April 2003.

[9] S. Bataineh, T.Y. Hsiung and T.G. Robertazzi, "Closed Form Solutions for Bus and Tree Networks of Processors Load Sharing a Divisible Job," IEEE Trans. Computers, Vol. 43, No. 10, pp. 1184-1196, Oct. 1994.

[10] A.T. Chronopoulos and S. Jagannathan, “A Distributed Discrete-Time Neural Network Architecture for Pattern Allocation and Control,”

Proc. IPDPS Workshop Bioinspired Solutions to Parallel Processing Problems, 2002.

[11] Atakan Dogan, Fusun Ozguner, ”Matching and Scheduling Algorithms for Failure Probability of Applications in Heterogeneous Computing,” IEEE Trans. on parallel and distributed systems, Vol. 13, No. 3, pp. 308-323, March 2002.

[12] Ching-Chin Han, Kang G. Shin, Jian Wu, ”A Fault-Tolerant Scheduling Algorithm for Real-Time Periodic Tasks with Possible Software Faults,” IEEE Trans. on computers, Vol. 52, No. 3, pp.362-372, March 2003.

[13] Tarek Hagras, Jan Janecek, ”A High Performance, Low Complexity Algorithm for Compile-Time Task Scheduling in Heterogeneous Systems,”

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04).

[14] Jennifer M. Schopf, “A General Architecture for Scheduling on the Grid,”

TR-ANL/MCS-P1000-1002, special issue of JPDC on Grid Computing, April, 2002.

[15] Rui Min and Muthucumaru Maheswaran, "Scheduling Co-Reservations with Priorities in Grid Computing Systems," Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'02), pp. 250-251, May 2002.

[16] Muhammad K. Dhodhi, Imtiaz Ahmad, Anwar Yatama and Ishfaq Ahmad, "An integrated technique for task matching and scheduling onto distributed heterogeneous computing systems," Journal of Parallel and Distributed Computing, Vol. 62, No. 9, pp. 1338-1361, 2002.

[17] Ching-Hsien Hsu and Tai-Lung Chen, "Grid Enabled Master Slave Task Scheduling for Heterogeneous Processor Paradigm," Grid and Cooperative Computing - Lecture Notes in Computer Science, Vol. 3795, pp. 449-454, Springer-Verlag, Dec. 2005. (GCC'05) (SCI Expanded)

[18] G. Aloisio, M. Cafaro, E. Blasi and I. Epicoco,

“The Grid Resource Broker, a Ubiquitous Grid Computing Framework,” Journal of Scientific Programming, Vol. 10, No. 2, pp. 113-119, 2002.

[19] W. E. Allcock, I. Foster and R. Madduri, "Reliable Data Transport: A Critical Service for the Grid," Building Service Based Grids Workshop, Global Grid Forum, June 2004.

[20] B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnal and S. Tuecke, "Data Management and Transfer in High Performance Computational Grid Environments," Parallel Computing Journal, Vol. 28 (5), May 2002.

[21] O. Beaumont, A. Legrand and Y. Robert, "Optimal algorithms for scheduling divisible workloads on heterogeneous systems," Proceedings of the 12th Heterogeneous Computing Workshop, IEEE Computer Press, 2003.

[22] J. Blythe, E. Deelman, Y. Gil, C. Kesselman, A.Agarwal, G. Mehta and K. Vahi, “The role of planning in grid computing,” Proceedings of ICAPS’03, 2003.

[23] H. Casanova, "Simgrid: A Toolkit for the Simulation of Application Scheduling," Proceedings of the IEEE Int'l Symp. on Cluster Computing and the Grid (CCGrid '01), pp. 430-437, May 2001.

[24] L. J. Chang, H.Y. Chen, H.C. Chang, K.C. Li, and C.T. Yang, "The Visuel Performance Analysis and Monitoring Tool for Cluster Environments", Proceedings of ICS'2004 International Computer Symposium, Taipei, Taiwan, 2004.

[25] M. Cai, A. Chervenak, M. Frank. “A Peer-to-Peer Replica Location Service Based on A Distributed Hash Table.” Proceedings of the SC2004 Conference (SC2004), November 2004.

[26] M. Faerman, A. Birnbaum, H. Casanova and F. Berman, "Resource Allocation for Steerable Parallel Parameter Searches," Proceedings of GRID'02, 2002.

[27] I. Foster, “Building an open Grid,” Proceedings of the second IEEE international symposium on Network Computing and Applications, 2003.

[28] James Frey, Todd Tannenbaum, M. Livny, I. Foster and S. Tuecke, "Condor-G: A Computation Management Agent for Multi-Institutional Grids," Journal of Cluster Computing, Vol. 5, pp. 237-246, 2002.

[29] N. Fujimoto and K. Hagihara, "A Comparison among Grid Scheduling Algorithms for Independent Coarse-Grained Tasks," Applications and the Internet Workshops, Jan. 26-30, 2004, pp. 629-635.

[30] J. Nabrzyski, J.M. Schopf, J. Weglarz (Eds), “Grid Resource Management” Kluwer Publishing, Fall 2003.

[31] K. Ranganathan and I. Foster. “Identifying Dynamic Replication Strategies for High Performance Data Grids” Proceedings of International Workshop on Grid Computing, Denver, CO, November 2002.

[32] D. P. Spooner, S.A. Jarvis, J. Cao, S. Saini and G.R. Nudd, "Local Grid Scheduling Techniques using Performance Prediction," IEE Proc. Computers and Digital Techniques, 150(2):87-96, 2003.

[33] Ming Wu and Xian-He Sun, “A General Self-adaptive Task Scheduling System for Non-dedicated Heterogeneous Computing,”

Proceeding in IEEE International Conference on Cluster Computing, 2003.

[34] Hui Wang, Minyi Guo, Sushil K. Prasad, Yi Pan and Wenxi Chen, "An Efficient Algorithm for Irregular Redistributions in Parallelizing Compilers," Proceedings of the 2003 International Symposium on Parallel and Distributed Processing and Applications, pp. 76-87, July 2003.

[35] Chao-Tung Yang, Yu-Lun Kuo, and Chuan-Lin Lai, “Designing Computing Platform for BioGrid,”

International Journal of Computer Applications in Technology (IJCAT), Inderscience Publishers, ISSN (Paper): 0952-8091, UK, 2004.

[36] X. Zhang and J. Schopf. “Performance Analysis of the Globus Toolkit Monitoring and Discovery Service, MDS2,” Proceedings of the International Workshop on Middleware Performance (MP2004), part of the 23rd International Performance Computing and Communications Workshop (IPCCC), April 2004.

Executive Yuan Agencies Personnel Overseas Travel Report Summary

Date written: June 20, 2007
Name: Ching-Hsien Hsu (許慶賢)
Institution: Department of Computer Science and Information Engineering, Chung Hua University
Contact: 03-5186410, chh@chu.edu.tw
Date of birth: February 23, 1973
Title: Associate Professor

Conference attended: 2007 International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP-07), June 11-14, 2007
Destination: Hangzhou, China
Travel period: June 11, 2007 to June 19, 2007

Summary:

This international conference, held in Hangzhou, lasted four days. I arrived at the venue and registered on the afternoon of the first day. On the second day I chaired an invited session and, in the morning session, presented the paper the conference had accepted. On the first day I also attended Dr. Byeongho Kang's insightful talk on Web Information Management. On the second day many important research results were presented in six parallel sessions; I chose the sessions on Architecture and Infrastructure, Grid Computing, and P2P Computing. In the evening I attended the reception, exchanged views with several foreign scholars and professors from China and Hong Kong, and took photos with them. On the morning of the third day I attended the sessions on Data and Information Management, where I learned of many emerging research topics and the main research directions of scholars abroad, and I used the final day to get acquainted with foreign professors, hoping to deepen their impression of research in Taiwan. Over the three days I heard many excellent paper presentations covering grid system technology, task scheduling, grid computing, grid databases, wireless networks, and other popular research topics. With many well-known scholars participating, everyone attending could obtain the latest international techniques and information; it was a very successful conference. I benefited greatly from attending: I met many internationally known researchers and professionals, exchanged ideas with them, and discussed problems in my field face to face with other professors. Having seen numerous research results and heard several keynote speeches, I conclude that the venue and the invited speakers were excellent; the conference was very well run and worth learning from.

Review comments of the attendee's institution:

Review comments of the forwarding agency:

Handling comments of the Research, Development and Evaluation Commission:

(Paper presented at the ICA3PP-07 conference)

A Generalized Critical Task Anticipation Technique for DAG Scheduling

Ching-Hsien Hsu¹, Chih-Wei Hsieh¹ and Chao-Tung Yang²

¹ Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan 300, R.O.C.
chh@chu.edu.tw

² High-Performance Computing Laboratory, Department of Computer Science and Information Engineering, Tunghai University, Taichung City 40704, Taiwan, R.O.C.
ctyang@thu.edu.tw

Abstract. The problem of scheduling a weighted directed acyclic graph (DAG) representing an application to a set of heterogeneous processors to minimize the completion time has been recently studied. The NP-completeness of the problem has instigated researchers to propose different heuristic algorithms. In this paper, we present a Generalized Critical-task Anticipation (GCA) algorithm for DAG scheduling in heterogeneous computing environment. The GCA scheduling algorithm employs task prioritizing technique based on CA algorithm and introduces a new processor selection scheme by considering heterogeneous communication costs among processors for adapting grid and scalable computing. To evaluate the performance of the proposed technique, we have developed a simulator that contains a parametric graph generator for generating weighted directed acyclic graphs with various characteristics. We have implemented the GCA algorithm along with the CA and HEFT scheduling algorithms on the simulator.

The GCA algorithm is shown to be effective in terms of speedup and low scheduling costs.

1. Introduction

The purpose of heterogeneous computing system is to drive processors cooperation to get the application done quickly. Because of diverse quality among processors or some special requirements, like exclusive function, memory access speed, or the customize I/O devices, etc.; tasks might have distinct execution time on different resources. Therefore, efficient task scheduling is important for achieving good performance in heterogeneous systems.

The primary scheduling methods can be classified into three categories according to when the scheduling decision is made: dynamic, static, and hybrid scheduling. In the dynamic approach, the system redistributes tasks among processors at run time, expecting to balance the computational load and reduce processor idle time. In contrast, in the static approach, information about the application, such as task execution times, the message sizes of communications among tasks, and task dependences, is known a priori at compile time; tasks are assigned to processors accordingly to minimize the overall application completion time while satisfying task precedence. Hybrid scheduling techniques mix the dynamic and static methods: some preprocessing is done statically to guide the dynamic scheduler [8].

A Directed Acyclic Graph (DAG) [2] is usually used to model parallel applications that consist of a number of tasks. The nodes of the DAG correspond to tasks, and its edges indicate the precedence constraints between tasks. In addition, the weight of an edge represents the communication cost between tasks. Each node is given a computation cost on each processor, represented by a computation cost matrix. Figure 1 shows an example of the DAG scheduling model. In Figure 1(a), task nj is a successor (predecessor) of task ni if there exists an edge from ni to nj (from nj to ni) in the graph. Under the precedence constraint, a successor nj can start its execution only after its predecessor ni completes execution and nj receives the messages from ni. Figure 1(b) demonstrates the different computation costs of tasks performed on heterogeneous processors. It is also assumed that each task executes on a single processor in a non-preemptable style. A simple fully connected processor network with asymmetrical data transfer rates is shown in Figures 1(c) and 1(d).

Figure 1(b), computation cost matrix (W) with mean execution times:

task   P1  P2  P3  | w̄i
n1     14  19   9  | 14
n2     13  19  18  | 16.7
n3     11  17  15  | 14.3
n4     13   8  18  | 13
n5     12  13  10  | 11.7
n6     12  19  13  | 14.7
n7      7  16  11  | 11
n8      5  11  14  | 10
n9     18  12  20  | 16.7
n10    17  20  11  | 16

Figure 1: An example of the DAG scheduling problem: (a) directed acyclic graph (DAG-1), (b) computation cost matrix (W), (c) processor topology, (d) communication weights.

The scheduling problem has been widely studied in heterogeneous systems, where the computational abilities of processors differ and the processors communicate over an underlying network. Many approaches have been proposed in the literature.

The scheduling problem has been shown to be NP-complete [3] in general cases as well as in several restricted cases, so the desire for optimal scheduling leads to high scheduling overhead. This negative result motivates heuristic approaches to the scheduling problem. A comprehensive survey of static scheduling algorithms is given in [9]; its authors show that heuristic-based algorithms can be classified into a variety of categories, such as clustering algorithms, duplication-based algorithms, and list-scheduling algorithms. Due to page limitations, we omit the description of related work.

In this paper, we present a Generalized Critical task Anticipation (GCA) algorithm, a list-scheduling approach to the DAG task scheduling problem. The main contribution of this paper is a novel heuristic for DAG scheduling on heterogeneous machines and networks. A significant improvement is that inter-processor communication costs are considered in the processor selection phase, so that tasks can be mapped to more suitable processors. The GCA heuristic compares favorably with the previous CA [5] and HEFT heuristics in terms of schedule length and speedup under different parameters.

The rest of this paper is organized as follows: Section 2 provides background, describes preliminaries of the heterogeneous scheduling system in the DAG model, and formalizes the research problem. Section 3 defines notations and terminology used in this paper. Section 4 forms the main body of the paper, presenting the Generalized Critical task Anticipation (GCA) scheduling algorithm and illustrating it with an example. Section 5 discusses the performance of the proposed heuristic and its simulation results. Finally, Section 6 briefly concludes this paper.

2. DAG Scheduling on Heterogeneous Systems

The DAG scheduling problem studied in this paper is formalized as follows. Given a parallel application represented by a DAG, in which nodes represent tasks and edges represent dependences between these tasks, the target computing architecture is a set of heterogeneous processors, M = {Pk : k = 1…P}, P = |M|, communicating over an underlying network that is assumed to be fully connected. We make the following assumptions:

• Inter-processor communications are performed without network contention between arbitrary processors.

• Computation of tasks is non-preemptive: once a task is assigned to a processor and starts executing, it is not interrupted until its completion.

• Computation and communication can proceed simultaneously because of separated I/O.

• If two tasks are assigned to the same processor, the communication cost between them can be discarded.

• A processor sends the computational results of a task to its immediate successors as soon as it completes the computation.

Given a DAG scheduling system, W is an n × P matrix in which wi,j indicates the estimated computation time of processor Pj executing task ni. The mean execution time of task ni is calculated by the following equation:

$$\overline{w_i} = \frac{1}{P}\sum_{j=1}^{P} w_{i,j} \qquad (1)$$

Examples of the mean execution time can be found in Figure 1(b).
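Equation (1) can be checked against the matrix of Figure 1(b) with a few lines (Python):

```python
# Mean execution time w̄_i of each task over the P processors, eq. (1),
# using the computation cost matrix W from Figure 1(b).
W = {
    "n1": [14, 19, 9],  "n2": [13, 19, 18], "n3": [11, 17, 15],
    "n4": [13, 8, 18],  "n5": [12, 13, 10], "n6": [12, 19, 13],
    "n7": [7, 16, 11],  "n8": [5, 11, 14],  "n9": [18, 12, 20],
    "n10": [17, 20, 11],
}

mean_w = {task: round(sum(costs) / len(costs), 1) for task, costs in W.items()}
print(mean_w["n2"])   # 16.7
print(mean_w["n5"])   # 11.7
```

The results agree with the w̄i column of Figure 1(b), rounded to one decimal place.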

For the communication part, a P × P matrix T represents the different data transfer rates among processors (Figure 1(d) shows an example). The communication cost of transferring data from task ni (executed on processor Px) to task nj (executed on processor Py) is denoted ci,j and is calculated by the following equation:

$$c_{i,j} = V_m + Msg_{i,j} \times t_{x,y} \qquad (2)$$

where Vm is the communication latency of processor Pm, Msgi,j is the size of the message from task ni to task nj, and tx,y is the data transfer rate from processor Px to processor Py, 1 ≤ x, y ≤ P.

In the static DAG scheduling problem, processor latency is usually considered together with the data transfer rate, so equation (2) can be simplified as follows:

$$c_{i,j} = Msg_{i,j} \times t_{x,y} \qquad (3)$$

c, = , × , , (3) Given an application represented by Directed Acyclic Graph (DAG), G = (V, E), where V = {nj: j = 1: v} is the set of nodes and v = |V|; E = {ei,j = <ni, nj>} is the set of communication edges and e =|E|. In this model, each node indicates least indivisible task. Namely, each node must be executed on a processor from the start to its completion. Edge <ni, nj> denotes precedence of tasks ni and nj. In other words, task ni is the immediate predecessor of task nj and task nj is the immediate successor of task ni. Such precedence represents that task nj can be start for execution only upon the completion of task ni. Meanwhile, task nj should receive essential message from ni for its execution. Weight of edge <ni, nj > indicates the average communication cost between ni and nj.

A node without any inward edge is called the entry node, denoted nentry, while a node without any outward edge is called the exit node, denoted nexit. In general, the application is supposed to have only one entry node and one exit node. If an actual application has more than one entry (exit) node, we can insert a dummy entry (exit) node with zero-cost edges.

3. Preliminaries

This study concentrates on list-scheduling approaches in the DAG model. List scheduling is usually divided into a list phase and a processor selection phase. Before the main discussion, we therefore first define some notations and terminology used in both phases.

3.1 Parameters for List Phase

Definition 1: Given a DAG scheduling system on G = (V, E), the Critical Score of task ni, denoted CS(ni), is an accumulative value computed recursively while traversing the graph upward, starting from the exit node. CS(ni) is computed by the following equations:

$$CS(n_i)=\begin{cases}\overline{w_{exit}} & \text{if } n_i \text{ is the exit node (i.e., } n_i = n_{exit})\\[4pt] \overline{w_i}+\max\limits_{n_j\in suc(n_i)}\left(\overline{c_{i,j}}+CS(n_j)\right) & \text{otherwise}\end{cases} \qquad (4)$$

where w̄exit is the average computation cost of task nexit, w̄i is the average computation cost of task ni, suc(ni) is the set of immediate successors of task ni, and c̄i,j is the average communication cost of edge <ni, nj>, defined as follows:

$$\overline{c_{i,j}} = Msg_{i,j}\times\frac{\sum_{x,y=1}^{P} t_{x,y}}{P^2-P}, \qquad 1 \le x, y \le P \qquad (5)$$
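The recursion of equation (4) can be sketched directly (Python); the tiny DAG, mean computation costs, and mean edge costs below are hypothetical illustrations, not the paper's DAG-1:

```python
from functools import lru_cache

# Hypothetical example: mean computation costs w̄_i, mean edge costs c̄_{i,j},
# and successor sets for a 4-node DAG (n1 -> {n2, n3} -> n4).
w_mean = {"n1": 14.0, "n2": 16.7, "n3": 14.3, "n4": 16.0}
c_mean = {("n1", "n2"): 9.0, ("n1", "n3"): 12.0,
          ("n2", "n4"): 11.0, ("n3", "n4"): 13.0}
succ = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n4"], "n4": []}

@lru_cache(maxsize=None)
def cs(ni):
    """Critical Score, eq. (4): accumulated upward from the exit node."""
    if not succ[ni]:                       # exit node: CS = mean computation cost
        return w_mean[ni]
    return w_mean[ni] + max(c_mean[(ni, nj)] + cs(nj) for nj in succ[ni])

print(cs("n4"))             # 16.0 (exit node)
print(round(cs("n1"), 1))   # 69.3 via the longer branch through n3
```

Memoization (`lru_cache`) keeps the recursion linear in the number of edges even when many paths share suffixes.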

3.2 Parameters for Processor Selection Phase

Most algorithms in the processor selection phase employ a partial schedule scheme to minimize the overall schedule length of an application. To achieve this partial optimization, an intuitive method is to evaluate the finish time (FT) of task ni when executed on each processor. According to the calculated results, one then selects the processor with the minimum finish time as the target processor to execute task ni. In such an approach, each processor Pk maintains a list of tasks, task-list(Pk), which keeps the latest status of tasks according to EFT(ni, Pk), the earliest finish time of task ni when it is assigned to processor Pk.

Recall that, as mentioned above, the application represented by a DAG must satisfy the precedence relationship. Taking the precedence of tasks in the DAG into account, a task nj can start to execute on a processor Pk only after all of its immediate predecessors have sent the essential messages to nj and nj has successfully received all of them. Thus, the latest message arrival time of node nj on processor Pk, denoted by LMAT(nj, Pk), is calculated by the following equation,

LMAT(nj, Pk) = Max{ EFT(ni) + cu,k : ni ∈ pred(nj) }, for task ni executed on processor Pu  (6)

where pred(nj) is the set of immediate predecessors of task nj. Note that if tasks ni and nj are assigned to the same processor, cu,k is assumed to be zero because it is negligible.

Because the entry task nentry has no inward edge, we have

LMAT(nentry, Pk) = 0  (7)

for all k = 1 to P.
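A minimal Python sketch of equations (6) and (7) follows; the predecessor sets, EFT values, processor assignments, and the inter-processor cost table cu,k are all hypothetical.

```python
def lmat(nj, k, pred, eft, proc_of, comm):
    """Latest message arrival time of task nj on processor Pk (equations (6)-(7)).
    comm[u][k] plays the role of c_{u,k}; it is taken as zero when the
    predecessor and nj share the same processor."""
    if not pred[nj]:                       # entry task: LMAT(n_entry, Pk) = 0
        return 0
    return max(eft[ni] + (0 if proc_of[ni] == k else comm[proc_of[ni]][k])
               for ni in pred[nj])

# Toy data: entry task 'a' finished on P0 at time 5; task 'b' depends on 'a'.
pred = {'a': [], 'b': ['a']}
eft = {'a': 5}
proc_of = {'a': 0}
comm = [[0, 2], [2, 0]]
```

Placing 'b' on P0 avoids the message transfer (LMAT = 5), while placing it on P1 adds the link cost c0,1 (LMAT = 7).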

Definition 2: Given a DAG scheduling system on G = (V, E), the Start Time of task nj executed on processor Pk is denoted by ST(nj, Pk).

Estimating a task's start time (for example, of task nj) facilitates the search for an available time slot on the target processor that is large enough to execute that task (i.e., length of time slot > wj,k). Note that the search for an available time slot starts from LMAT(nj, Pk).

Definition 3: Given a DAG scheduling system on G = (V, E), the finish time of task nj, denoted by FT(nj, Pk), represents the completion time of task nj executed on processor Pk. FT(nj, Pk) is defined as follows,

FT(nj, Pk) = ST(nj, Pk) + wj,k  (8)

Definition 4: Given a DAG scheduling system on G = (V, E), the earliest finish time of task nj, denoted by EFT(nj), is formulated as follows,

EFT(nj) = Min{ FT(nj, Pk) : Pk ∈ P }  (9)

Definition 5: Based on the determination of EFT(nj) in equation (9), if the earliest finish time of task nj is obtained when task nj is executed on processor pt, then the target processor of task nj is denoted by TP(nj), and TP(nj) = pt.
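Definitions 3-5 boil down to the selection loop sketched below. Note one simplification: here ST(nj, Pk) is taken as max(LMAT(nj, Pk), processor ready time), whereas the algorithm proper searches task-list(Pk) for an idle slot of sufficient length starting at LMAT(nj, Pk); all numbers are hypothetical.

```python
def select_processor(w_jk, lmat_jk, ready):
    """Return (TP(nj), EFT(nj)) over processors 0..P-1 (equations (8)-(9)).
    w_jk[k]: computation cost of nj on Pk; ready[k]: time at which Pk becomes idle."""
    best_k, best_ft = None, float('inf')
    for k in range(len(w_jk)):
        st = max(lmat_jk[k], ready[k])     # simplified ST(nj, Pk)
        ft = st + w_jk[k]                  # FT(nj, Pk) = ST(nj, Pk) + w_{j,k}   (8)
        if ft < best_ft:                   # keep the minimum finish time        (9)
            best_k, best_ft = k, ft
    return best_k, best_ft
```

With w_jk = [4, 6], lmat_jk = [3, 1] and ready = [5, 0], task nj would finish at time 9 on P0 but at time 7 on P1, so the target processor is P1.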

4. The Generalized Critical-task Anticipation Scheduling Algorithm

Our approach takes advantage of list scheduling's low algorithmic complexity and superior scheduling performance, and proposes a novel heuristic, the generalized critical-task anticipation (GCA) scheduling algorithm, to improve the schedule length as well as the speedup of applications. We delineate the algorithm and illustrate example scenarios in three phases: the prioritizing phase, the listing phase and the processor selection phase.

In the prioritizing phase, CS(ni) is defined as the maximal summation of scores, including the average computation cost and communication cost, along any path from task ni to the exit task. Therefore, the magnitude of a task's critical score is regarded as the decisive factor when determining its priority. In the listing phase, an ordered list of tasks is determined for the subsequent processor selection phase. The proposed GCA scheduling technique arranges tasks into a list L not only according to their critical scores but also according to their urgency.

Several observations motivate the GCA scheduling method. Because of processor heterogeneity, the execution cost of the same task varies from processor to processor. In such circumstances, tasks with larger computation costs should be assigned higher priority. This helps critical tasks to be executed earlier and increases the probability that tasks reduce their finish times.

Furthermore, each task has to receive the essential messages from its immediate predecessors. In other words, a task remains in a waiting state until it has collected all of these messages. For this reason, we emphasize the importance of the last arriving message, so that the succeeding task can start its execution earlier.

Therefore, it is imperative to give the predecessor that sends the last arriving message a higher priority. This helps the succeeding task get a chance to advance its start time. On the other hand, if a task ni is inserted toward the front of the scheduling list, it occupies a vantage position; that is, ni has a higher probability of accelerating its execution, and consequently the start times of the tasks in suc(ni) can be advanced as well.

In most list scheduling approaches, the algorithm is usually presented in two phases: the list phase and the processor selection phase. The list phase of the proposed GCA scheduling algorithm consists of two steps: the CS (critical score) calculation step and the task prioritization step.

Let us take an example to demonstrate the CS calculation, which is performed in level order, starting from the deepest level, i.e., the level of the exit task. According to equation (4), we have CS(n10) = w10 = 16. For the upper-level tasks n7, n8 and n9, CS(n7) = w7 + (c7,10 + CS(n10)) = 47.12, CS(n8) = w8 + (c8,10 + CS(n10)) = 37.83, and CS(n9) = w9 + (c9,10 + CS(n10)) = 49.23. The other tasks are calculated by the same method. Table 1 shows the complete critical scores of all tasks for DAG-1.

Table 1: Critical Scores of tasks in DAG-1 using the GCA algorithm

Task:  n1      n2     n3     n4     n5     n6     n7     n8     n9     n10
CS:    120.13  84.83  88.67  89.45  76.28  70.25  47.12  37.83  49.23  16.00

Following the critical score calculation, the GCA scheduling method considers both a task's importance (i.e., its critical score) and its relative urgency when prioritizing tasks.

Based on the results obtained previously, we use the same example to demonstrate task prioritization in GCA. Let us start at the exit task n10, which has the lowest critical score. Tasks will be arranged into an ordered list L, so we have L = {n10} initially. Because task n10 has three immediate predecessors, with the order CS(n9) > CS(n7) > CS(n8), the list L is updated to L = {n9, n7, n8, n10}. Applying the same prioritizing method to the front element of L, task n9: because task n9 has three immediate predecessors, with the order CS(n4) > CS(n2) > CS(n5), we have the updated list L = {n4, n2, n5, n9, n7, n8, n10}. Repeating the same operations, we insert task n1 in front of task n4, task n3 in front of task n7, and tasks n4, n2, n6 (because CS(n4) > CS(n2) > CS(n6)) in front of task n8; this yields L = {n1, n4, n2, n5, n9, n3, n7, n6, n4, n2, n6, n8, n10}. The final list L = {n1, n4, n2, n5, n9, n3, n7, n6, n8, n10} is derived by removing duplicated tasks.

In the listing phase, the GCA scheduling algorithm proposes two enhancements over the majority of the literature. First, the GCA scheduling technique incorporates the varying transmission costs of messages among processors into the calculation of critical scores. Second, the GCA algorithm prioritizes tasks according to their influence on their successors and strives to form an accelerated chain, while other techniques simply schedule tasks with higher critical scores first. In other words, the GCA algorithm prioritizes tasks not only by their importance but also by the urgency among tasks. The prioritizing scheme of the GCA scheduling technique can be accomplished using simple stack operations, push and pop, as outlined in the GCA_List_Phase procedure below.

Begin_GCA_List_Phase
1.  Initially, construct an array of Boolean QV and a stack S.
2.  QV[nj] = false, for all nj ∈ V.
3.  Push nexit on top of S.
4.  While S is not empty do
5.      Peek task nj on the top of S;
6.      If (QV[ni] is true for all ni ∈ pred(nj), or task nj is nentry) {
7.          Pop task nj from the top of S and put nj into scheduling list L;
8.          QV[nj] = true; }
9.      Else /* search the CT(nj) */
10.         For each task ni, where ni ∈ pred(nj), do
11.             If (QV[ni] = false)
12.                 Put CS(ni) into container C;
13.             Endif
14.         Push the tasks of pred(nj) collected in C into S in non-decreasing order of their critical scores;
15.         Reset C to empty;
16.     Endif
17. Endwhile
End_GCA_List_Phase
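Assuming the edge structure of DAG-1 that the worked example implies (n1 fans out to n2..n6; pred(n7) = {n3}, pred(n8) = {n2, n4, n6}, pred(n9) = {n2, n4, n5}; n7, n8, n9 feed n10 — an inference, since the figure itself is not reproduced here), the procedure above can be sketched in Python with an explicit stack; it reproduces the list derived earlier.

```python
# Critical scores from Table 1 and the inferred precedence of DAG-1.
cs = {1: 120.13, 2: 84.83, 3: 88.67, 4: 89.45, 5: 76.28,
      6: 70.25, 7: 47.12, 8: 37.83, 9: 49.23, 10: 16.00}
pred = {1: [], 2: [1], 3: [1], 4: [1], 5: [1], 6: [1],
        7: [3], 8: [2, 4, 6], 9: [2, 4, 5], 10: [7, 8, 9]}

def gca_list_phase(cs, pred, n_exit):
    qv = {n: False for n in cs}            # QV[n]: task already placed in L
    stack = [n_exit]                       # S, with n_exit pushed first
    L = []
    while stack:
        nj = stack[-1]                     # peek the top of S
        waiting = [ni for ni in pred[nj] if not qv[ni]]
        if not waiting:                    # all predecessors done, or entry task
            stack.pop()
            if not qv[nj]:                 # ignore duplicated stack entries
                L.append(nj)
                qv[nj] = True
        else:
            # push unfinished predecessors in non-decreasing CS order,
            # leaving the highest-CS predecessor on top of the stack
            for ni in sorted(waiting, key=lambda n: cs[n]):
                stack.append(ni)
    return L
```

For DAG-1 this returns [1, 4, 2, 5, 9, 3, 7, 6, 8, 10], matching the final list L = {n1, n4, n2, n5, n9, n3, n7, n6, n8, n10} derived above.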
