
Chung Hua University
Master's Thesis

Title: Genetic Algorithm Based Dynamic Scheduling Algorithms in Grid Computing Environment

Department: Master's Program, Department of Computer Science and Information Engineering
Student ID / Name: M09502036, 陳政光
Advisor: Dr. 游坤明

August 2008


摘要 (Chinese Abstract)

Grid computing can integrate computational resources located in different geographic areas or network domains into a high-performance virtual computing platform, on which computationally complex problems can be solved efficiently. In a grid computing environment, the network conditions and computational capabilities of the resources usually differ, so an efficient task scheduling algorithm is needed to dispatch tasks to the various computing nodes and fully exploit the performance of the grid. This thesis proposes two task scheduling algorithms for grid computing environments, GDSA and EDSA. Both use the optimization search of the genetic algorithm to find an efficient schedule in a grid computing environment and are applicable to varying numbers of computing nodes with different computational capabilities. In the two algorithms, different chromosome structures are used to investigate their effect on evolutionary performance. In the EDSA, hybrid crossover and incremental mutation are applied according to the evolution of the fitness value, so that the evolution avoids being trapped in local optima and converges toward the global optimum faster. To verify the performance of the proposed scheduling algorithms, a grid simulation environment was built with the GridSim simulator. In the simulation experiments, performance was measured with different numbers of computing nodes and tasks and compared with five other scheduling algorithms. The results show that, compared with common scheduling algorithms, the GA-based scheduling algorithms indeed obtain better schedules; in particular, the proposed EDSA outperforms the other scheduling algorithms as the number of participating nodes or the number of tasks to be executed increases.

Keywords: genetic algorithm, grid computing, heterogeneity, scheduling algorithm


Abstract

Grid computing can integrate computational resources from different networks or geographic areas into a high-performance computational platform on which complex, computation-intensive problems can be solved efficiently. Scheduling is an important issue in a grid computing environment.

Because the computational capabilities and network status of the resources differ, an efficient scheduling algorithm is necessary to assign jobs to appropriate computing nodes. In this thesis, we propose two dynamic scheduling algorithms, GDSA and EDSA, for scheduling tasks in a grid computing environment. The proposed algorithms use the optimization search of the genetic algorithm (GA) to obtain an efficient schedule in a grid computing environment and adapt to varying numbers of computing nodes with different computational capabilities. Two types of chromosomes are used to study their effect on performance. Furthermore, the hybrid crossover and incremental mutation operations in the EDSA can move the solution away from local optima toward a near-optimal solution. To verify the performance of the algorithms, a simulation with randomly generated task sets was performed, and the algorithms were compared with five other scheduling algorithms. The simulation results show that the GA can evolve better schedules than conventional scheduling algorithms; in particular, the proposed EDSA outperformed all the other scheduling algorithms across a range of scenarios.

Keywords: Genetic algorithm, grid computing, heterogeneous, scheduling algorithm


TABLE OF CONTENTS

Chapter 1. Introduction ... 1

Chapter 2. Related Works ... 3

Chapter 3. Introduction to Genetic Algorithm ... 8

3.1 Encoding and Initialization ... 9

3.2 Selection and Reproduction ... 9

3.3 Crossover ... 10

3.4 Mutation ... 11

3.5 Termination ... 11

Chapter 4. GDSA and EDSA scheduling algorithms ... 12

4.1 GA based dynamic scheduling algorithm (GDSA) ... 13

4.1.1 Chromosome Definition... 13

4.1.2 Initialization ... 14

4.1.3 Decode and Fitness Function ... 14

4.1.4 Selection and Reproduction ... 15

4.1.5 Crossover ... 15

4.1.6 Mutation ... 16

4.1.7 Termination ... 17

4.2 Evolution based dynamic scheduling algorithm (EDSA) ... 22

4.2.1 Chromosome Definition... 22

4.2.2 Optimal Initialization ... 23

4.2.3 Decode and Fitness Function ... 23

4.2.4 Selection and Reproduction ... 23

4.2.5 Hybrid Crossover ... 24

4.2.6 Incremental Mutation ... 25


4.2.7 Termination ... 26

4.2.8 EDSA Workflow ... 26

Chapter 5. Experiments ... 31

5.1 Experimental Environments ... 31

5.2 Experimental Results ... 32

Chapter 6. Conclusion and Future Works ... 39

References ... 40


LIST OF FIGURES

Figure 2.1 An example of using Min-Min scheduling algorithm ... 4

Figure 2.2 An example of using Max-Min scheduling algorithm ... 4

Figure 2.3 An example of using Sufferage scheduling algorithm ... 5

Figure 3.1 General genetic algorithm workflow ... 8

Figure 3.2 Example of single-point crossover ... 10

Figure 3.3 Example of two-point crossover ... 10

Figure 3.4 Example of mask crossover ... 11

Figure 4.1 Encoding of a schedule (GDSA) ... 13

Figure 4.2 Example of improved single-point crossover (GDSA) ... 16

Figure 4.3 Example of mutation (GDSA) ... 17

Figure 4.4 GDSA workflow ... 18

Figure 4.5 Encoding of a schedule (EDSA) ... 22

Figure 4.6 Example of two-point crossover (EDSA) ... 24

Figure 4.7 Example of mask crossover (EDSA) ... 25

Figure 4.8 Example of mutation (EDSA) ... 26

Figure 4.9 EDSA workflow ... 27

Figure 5.1 Makespan in different number of tasks (nodes = 20) ... 34

Figure 5.2 Makespan in different number of tasks (nodes = 40) ... 34

Figure 5.3 Makespan in different number of tasks (nodes = 60) ... 34

Figure 5.4 Makespan in different number of computing nodes (tasks = 500) ... 36

Figure 5.5 Makespan in different number of computing nodes (tasks = 1000) .... 37

Figure 5.6 Makespan in different number of computing nodes (tasks = 1500) .... 37

Figure 5.7 Makespan in different number of computing nodes (tasks = 2000) .... 37


LIST OF TABLES

Table 2.1 An example of expected execution time matrix ... 4

Table 4.1 The definitions of GDSA and EDSA ... 18

Table 5.1 Hardware and Software specification of the simulation environment ... 31


Chapter 1. Introduction

With the growth in the amount of data to be computed, computational problems are becoming more complex and more costly in terms of time. Grid computing can integrate and utilize heterogeneous computational resources connected through networks, so it is widely used to solve large-scale computational problems. Unlike traditional cluster computing, the computational capabilities of resources in a grid computing environment usually differ, and the network status between the computational resources also varies. Because of this heterogeneity, an efficient scheduling algorithm is required to distribute tasks among the different computing nodes.

In this thesis, two dynamic scheduling algorithms that utilize the genetic algorithm (GA) are presented. They can schedule independent, heterogeneous tasks to appropriate computing nodes with different computational capabilities. Both the network status between the computational resources and their computational capabilities are taken into consideration, so the time for processing each task can be estimated more precisely. In the first algorithm, the chromosome is encoded according to the number of computing nodes. Because the number of computing nodes is usually smaller than the number of tasks to be scheduled, this algorithm has a shorter chromosome and a faster search speed.

In the second algorithm, hybrid crossover and incremental mutation operators are applied according to the varying status of the fitness value in each generation, and the rates of reproduction, crossover, and mutation are also changed dynamically. Thus the solution can evolve efficiently toward a near-optimal one.

Furthermore, in addition to the improvements in the evolutionary phase, an optimal initialization is employed: with known scheduling algorithms, the chromosomes are optimized in advance and the evolution is accelerated, so the efficiency is greatly enhanced and the probability of finding a near-optimal schedule is high.

The organization of this thesis is as follows. Chapter 2 reviews related work on scheduling algorithms proposed by other researchers. Chapter 3 introduces the genetic algorithm. Chapter 4 presents the proposed GDSA and EDSA scheduling algorithms. Chapter 5 reports the results of our performance experiments. Finally, conclusions and future work are given in Chapter 6.


Chapter 2. Related Works

The scheduling of a set of tasks on heterogeneous computational resources has been investigated by many researchers [1][4][9][14][16][19][26] and has been proven to be an NP-complete problem [8]. In general, heuristic scheduling algorithms can be classified into two categories: on-line mode and batch mode. In on-line mode, a task is scheduled as soon as it arrives at the scheduler. In batch mode, tasks are usually independent and have no execution order, so the scheduling algorithm schedules a batch of tasks at a time. This thesis investigates batch-mode scheduling.

Min-Min, Max-Min, and Sufferage are conventional scheduling algorithms widely used in batch-mode scheduling [13][18]. The details of these algorithms are as follows:

• Min-Min: In the Min-Min scheduling algorithm, the task with the lowest computation time has the highest priority, and each task is assigned to the computing node that can finish executing it first.

• Max-Min: Max-Min is the same as Min-Min in that each task is assigned to the computing node that can finish executing it first; the difference is that the task needing the most computation time has the highest priority.

• Sufferage: In the Sufferage scheduling algorithm, priorities are given according to the sufferage value, the difference in completion time between the best and second-best computing nodes. As in the above algorithms, tasks are assigned to the computing nodes that can finish them first.
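As an illustration (the function and variable names below are ours, not the thesis's), the Min-Min heuristic can be sketched in a few lines; Max-Min and Sufferage differ only in how the next task is prioritized:

```python
def min_min(etc):
    """Min-Min sketch: etc[i][k] is the expected execution time of task i
    on node k. Repeatedly pick the task whose best completion time is
    smallest and assign it to the node that finishes it first."""
    n_tasks, n_nodes = len(etc), len(etc[0])
    ready = [0.0] * n_nodes          # time at which each node becomes free
    schedule = {}                    # task -> node
    unassigned = set(range(n_tasks))
    while unassigned:
        # best node (earliest completion) for every unassigned task
        best = {t: min(range(n_nodes), key=lambda k: ready[k] + etc[t][k])
                for t in unassigned}
        # Min-Min: the task with the minimum best completion time goes next
        t = min(unassigned, key=lambda t: ready[best[t]] + etc[t][best[t]])
        k = best[t]
        ready[k] += etc[t][k]
        schedule[t] = k
        unassigned.remove(t)
    return schedule, max(ready)      # assignment and makespan
```

Applied to the Table 2.1 matrix below, this skeleton assigns each of the four tasks to a distinct node; swapping the outer `min` for `max` gives Max-Min.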

Table 2.1 is an example of an expected execution time matrix, and Figure 2.1, Figure 2.2, and Figure 2.3 show the scheduling results of applying the Min-Min, Max-Min, and Sufferage scheduling algorithms to this example.

Table 2.1 An example of expected execution time matrix

Task Node 0 Node 1 Node 2 Node 3

T0 40 48 134 50

T1 50 82 88 89

T2 55 68 94 93

T3 52 60 78 108

Figure 2.1 An example of using Min-Min scheduling algorithm

Figure 2.2 An example of using Max-Min scheduling algorithm


Figure 2.3 An example of using Sufferage scheduling algorithm

Although the above three scheduling algorithms perform better than traditional random or sequential scheduling approaches, they ignore the importance of dynamic network status. Usually, each task in a batch has a different computation and transmission time; the transmission time includes both transmitting the task to the computing node and returning the execution result after the task is finished. Because of these factors, the total time for the tasks is affected not only by the computational capabilities of the computing nodes but also by the network status between the computational resources. However, the above three scheduling algorithms, as well as some other studies [22][27], take only the computational capabilities into consideration and ignore the importance of the network status.

The genetic algorithm (GA) was developed by Holland [11]. It simulates natural biological evolution based on Darwinian principles of natural selection. The GA operates on a population of chromosomes encoded according to the problem; each chromosome in the population represents a potential solution from the search space. In each generation, the chromosomes are operated on by the reproduction, crossover, and mutation operators. Through these operators, not only can superior solutions be preserved, but improved solutions may also be generated.

Because of these advantages, the GA is widely used by many researchers to solve hard optimization problems heuristically [5][6][10][17][24][25]. The use of GAs has been investigated in both homogeneous [12][28][29] and heterogeneous environments [23]. However, since grid computing is a heterogeneous environment, techniques designed for homogeneous environments are not suitable.

In [23], a load balancing algorithm using a GA (referred to in this thesis as Z-GA) was proposed for grid computing environments. In Z-GA, the chromosome is represented as a binary string. For example, with four tasks to schedule on three computing nodes, one possible string is 0101|1000|0010, which means that tasks 2 and 4 are scheduled on node 1, task 1 on node 2, and task 3 on node 3. At initialization, each chromosome is generated as follows. First, the Sufferage algorithm is used to produce an initial configuration for the chromosomes, and a certain number of chromosomes are then mutated with a mutation probability of 0.3. The mutation operator is a custom bit mutation, whereby a task is randomly moved to another node. At each generation, the best α chromosomes are kept and survive to the next generation, and tournament selection is used to fill the mating pool. A custom uniform crossover is then applied to the chromosomes in the mating pool. For example, with four tasks and three computing nodes, two possible strings are 0101|1000|0010 and 1100|0010|0001; if crossover occurs at the first and last positions, the resulting strings are 0101|0000|0011 and 0101|1010|0000. After crossover, a custom gene mutation operator is applied. Although Z-GA performed better than other conventional scheduling algorithms, its chromosomes are longer, which may reduce the efficiency of the evolution. Moreover, the crossover and mutation probabilities are controlled only by a fixed number of generations during the evolution. In the evolutionary phase it is hard to predict whether the fitness value has reached a local optimum, so controlling the probabilities by a fixed number of generations is not suitable. In the worst case, if the mutation rate is increased while the fitness value has not yet converged, the chance of further improving the fitness value may be lost.

To remedy the disadvantages of the above scheduling algorithm, this thesis proposes two GA-based scheduling algorithms that consider the dynamic network status and have shorter chromosomes than Z-GA. Furthermore, the rates of reproduction, crossover, and mutation are changed dynamically according to the variation of the fitness value, so the evolutionary efficiency can be greatly enhanced.


Chapter 3. Introduction to Genetic Algorithm

The genetic algorithm (GA) is an optimization technique that can search for a near-optimal solution in a large solution space. It simulates natural biological evolution to improve candidate solutions progressively. The general GA workflow is shown in Figure 3.1.

Figure 3.1 General genetic algorithm workflow

1. Initialize the population of chromosomes
2. Evaluate the fitness value of each chromosome in the population
3. While the stopping condition is not met:
   1. Selection
   2. Reproduction
   3. Crossover
   4. Mutation
4. Finish the evolution and output the solution
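The workflow above can be sketched as a problem-independent skeleton; everything here (the operator signatures, population size, and the one-max toy problem) is illustrative rather than taken from the thesis:

```python
import random

def genetic_algorithm(init, fitness, select, crossover, mutate,
                      pop_size=20, generations=50, seed=0):
    """Generic GA skeleton following the workflow above: initialize,
    evaluate, then loop selection/reproduction/crossover/mutation."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        elite = scored[:2]                       # reproduction: keep the best
        children = []
        while len(children) < pop_size - len(elite):
            a, b = select(scored, rng), select(scored, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=fitness)

def one_max_demo():
    """Toy usage: maximize the number of 1-bits in a 16-bit string."""
    def crossover(a, b, rng):                    # single-point crossover
        i = rng.randrange(1, len(a))
        return a[:i] + b[i:]
    def mutate(c, rng):                          # flip one random bit
        c = c[:]
        c[rng.randrange(len(c))] ^= 1
        return c
    return genetic_algorithm(
        init=lambda rng: [rng.randint(0, 1) for _ in range(16)],
        fitness=sum,
        select=lambda pop, rng: max(rng.sample(pop, 2), key=sum),
        crossover=crossover, mutate=mutate)
```

Each phase of this loop is discussed in turn in the sections that follow.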

In the following, we introduce each phase of the general genetic algorithm workflow.


3.1 Encoding and Initialization

In a genetic algorithm, the chromosome is encoded according to the problem at hand, usually as a discrete or binary string, and a fitness function is designed; through it, the quality of each chromosome, called its fitness value, can be obtained. The population is a collection of chromosomes, each representing a candidate solution to the problem with its own fitness value.

In the initialization phase, a population of chromosomes is produced randomly from the solution space. After the fitness value of each chromosome is evaluated, the chromosomes with higher fitness values are selected into the mating pool.

3.2 Selection and Reproduction

After the fitness values are evaluated, chromosomes with higher fitness represent better solutions and are selected into the mating pool; this is the reproduction phase. The two conventional approaches, roulette wheel selection and tournament selection, are described below.

1. Roulette Wheel Selection: In each generation, the roulette wheel is divided into areas proportional to the fitness value of each chromosome, so a chromosome with a higher fitness value occupies a larger area. A point on the wheel is then selected at random, and the chromosome that point maps to is placed in the mating pool. With this approach, a chromosome with a higher fitness value has a higher probability of being selected.

2. Tournament Selection: In each generation, two or more chromosomes are selected randomly, and the one with the highest fitness value is placed in the mating pool. Because it requires less computation, tournament selection is faster. In this thesis, tournament selection is used in the proposed EDSA.
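A minimal sketch of the two selection schemes (the names and signatures are ours):

```python
import random

def roulette_select(population, fitness, rng):
    """Roulette wheel: selection probability proportional to fitness."""
    weights = [fitness(c) for c in population]
    pick = rng.uniform(0, sum(weights))   # random point on the wheel
    acc = 0.0
    for chromosome, w in zip(population, weights):
        acc += w
        if pick <= acc:
            return chromosome
    return population[-1]                 # guard against rounding

def tournament_select(population, fitness, rng, k=2):
    """Tournament: draw k chromosomes at random, keep the fittest."""
    return max(rng.sample(population, k), key=fitness)
```

Tournament selection avoids summing and scanning the whole population per draw, which is why the text prefers it when computation time matters.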

3.3 Crossover

Crossover randomly selects two chromosomes from the mating pool and swaps their gene information. Through the superior chromosomes in the mating pool, the crossover operator aims to generate chromosomes better than those already in the pool. The conventional crossover approaches are described as follows:

1. Single-point Crossover: Randomly select one point in the two selected chromosomes and swap the gene information after it, as shown in Figure 3.2.

Figure 3.2 Example of single-point crossover

2. Two-point Crossover: The same as single-point crossover except that two points are selected and the genes between them are swapped, as shown in Figure 3.3.

Figure 3.3 Example of two-point crossover

3. Mask Crossover: First generate a mask whose length is the same as the chromosome's. The mask consists of 0s and 1s: a 1 means the gene information at that position is swapped, and a 0 means it is not, as shown in Figure 3.4. When the chromosome is longer, mask crossover has a higher probability of swapping gene information than single-point or two-point crossover.
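The three variants can be sketched as follows (names are ours; each returns the two offspring):

```python
import random

def single_point(a, b, rng):
    """Swap everything after one random cut point."""
    i = rng.randrange(1, len(a))
    return a[:i] + b[i:], b[:i] + a[i:]

def two_point(a, b, rng):
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mask_crossover(a, b, rng):
    """Swap exactly the positions where a random 0/1 mask holds 1."""
    mask = [rng.randint(0, 1) for _ in a]
    c = [y if m else x for x, y, m in zip(a, b, mask)]
    d = [x if m else y for x, y, m in zip(a, b, mask)]
    return c, d
```

Note that all three are position-preserving: at every index, the offspring pair carries exactly the two parent genes.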

Figure 3.4 Example of mask crossover

3.4 Mutation

Mutation randomly selects one chromosome and then a certain number of mutation points whose gene values are changed; e.g., if the chromosome is encoded as a binary string, the selected gene values flip 0 → 1 or 1 → 0. The mutation operator aims to prevent the chromosomes from converging to a local optimum. A moderate number of mutation points and a moderate mutation probability help the chromosomes escape local optima; conversely, an excessive number of mutation points or an excessive probability degrades the whole evolution into random search.
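For a binary encoding, the operator described above can be sketched as (names are ours):

```python
import random

def bit_mutation(chromosome, rng, n_points=1):
    """Flip n_points randomly chosen genes of a binary-encoded chromosome
    (0 -> 1, 1 -> 0). Few points keep the search local; too many would
    degrade the evolution into random search, as noted in the text."""
    mutant = chromosome[:]                       # leave the parent intact
    for i in rng.sample(range(len(mutant)), n_points):
        mutant[i] ^= 1
    return mutant
```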

3.5 Termination

If the fitness value or some other condition satisfies the stopping criterion, the evolution terminates and the solution is obtained; otherwise, the chromosomes continue to evolve until the stopping condition is met.


Chapter 4. GDSA and EDSA scheduling algorithms

To solve the scheduling problem in a grid computing environment, this thesis proposes two GA-based dynamic scheduling algorithms, GDSA (GA based Dynamic Scheduling Algorithm) and EDSA (Evolution based Dynamic Scheduling Algorithm), which use the GA's search technique to find a near-optimal schedule in a heterogeneous grid computing environment. In the GDSA, the chromosome is encoded according to the number of computing nodes and is therefore shorter; with the shorter length and limited solution space, the search is fast and efficient.

In the EDSA, the crossover and mutation rates are changed dynamically on the basis of the variation of the fitness values in each generation, and an optimal initialization built from several known scheduling algorithms speeds up the evolution. Furthermore, incremental mutation controls the number of points of each mutation; with this operator, both premature convergence to a local optimum and degeneration into random search can be avoided. The proposed GDSA and EDSA are described in detail below. The parameters used in the functions and pseudocode are defined as follows:

N : number of computing nodes joining the computation
C_k : the computing node with ID = k
TC_k : number of tasks processed by computing node C_k
tf_i : file size of task i (MB)
to_i : output size of task i (MB)
tl_i : length of task i (MFLOPs)
CP_k : computational capability of computing node C_k (MFLOPs/s)
BW_k : bandwidth between the scheduler and computing node C_k (MB/s)
VC_k : time to finish the tasks assigned to computing node C_k (s)

4.1 GA based dynamic scheduling algorithm (GDSA)

In the GDSA, computing nodes are used as the indexes of a chromosome, and each index is mapped to the number of tasks to be processed by the corresponding computing node. Because of the shorter chromosome and limited solution space, the GDSA can search quickly and efficiently. The details of the GDSA are described below.

4.1.1 Chromosome Definition

In the GDSA, the chromosome is encoded according to the scheduling problem, as shown in Figure 4.1.

Figure 4.1 Encoding of a schedule (GDSA)

In Figure 4.1, the Node ID is the index of a computing node joining the computation; in other words, the length of the chromosome equals the number of participating computing nodes. Each index holds a gene value representing the number of tasks to be processed by that node. Taking Figure 4.1 as an example, the number of tasks to be processed by the indicated computing node is four. Because the number of tasks is usually larger than the number of computing nodes in a grid computing environment, the chromosomes of the GDSA are shorter. Moreover, since the gene values only count tasks per node, the solution space is limited; searching a limited solution space with shorter chromosomes speeds up the evolution.

4.1.2 Initialization

In order to cover the global solution space, the initial population is generated with a random approach; that is, the number of tasks processed by each computing node is generated at random.

4.1.3 Decode and Fitness Function

After initialization, each chromosome in the population must be decoded and its fitness value evaluated through the fitness function. In the GDSA, each gene on a chromosome represents the number of tasks processed by a certain computing node, and the time to finish the tasks on each computing node can be evaluated by Equation (1). The time needed to finish a batch of tasks runs from the moment the first task is assigned to a computing node until the result of the last task is received by the scheduler, so the fitness function is designed on the basis of this time period, as in Equation (2).

VC_k = Σ_{i=1}^{TC_k} ( (tf_i + to_i) / BW_k + tl_i / CP_k )        (1)

FitnessValue = 1 / ( max_{1 ≤ k ≤ N} VC_k )        (2)

The time to finish a task includes transmitting the task to the computing node, processing it, and returning the result to the scheduler. To determine the time needed to finish a task precisely, all three components must be taken into consideration. Because the number of participating computing nodes, their computational capabilities, and the network status vary dynamically in a grid computing environment, the real-time computational capabilities and network status are determined for each schedule. After the fitness value of each chromosome is evaluated, the chromosome with the largest fitness value indicates the best schedule in the current generation; in other words, it schedules tasks to the fittest computing nodes so that they are completed in the shortest time.
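A sketch of this fitness evaluation, using the notation reconstructed above. The text does not spell out how the per-node task counts are decoded into a concrete assignment, so this sketch simply assigns tasks in submission order; all names are illustrative:

```python
def node_time(tasks, bandwidth, capability):
    """Time for one node to finish its tasks: transmit input and output
    over the link, plus compute time (Equation (1)). Each task is a
    (file_size_MB, output_size_MB, length_MFLOPs) tuple."""
    return sum((fs + out) / bandwidth + length / capability
               for fs, out, length in tasks)

def gdsa_fitness(counts, tasks, bandwidths, capabilities):
    """Decode a GDSA chromosome (tasks-per-node counts) by assigning tasks
    in submission order, then return 1 / makespan (Equation (2))."""
    assert sum(counts) == len(tasks)
    times, start = [], 0
    for k, c in enumerate(counts):
        times.append(node_time(tasks[start:start + c],
                               bandwidths[k], capabilities[k]))
        start += c
    return 1.0 / max(times)
```

With this definition, a schedule with a smaller makespan has a larger fitness value, matching the statement that the largest-fitness chromosome completes the batch in the shortest time.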

4.1.4 Selection and Reproduction

In the GA, the selection operation aims to preserve superior chromosomes by selecting them into the mating pool. In the GDSA, tournament selection is used; in a previous study [23] it showed respectable performance. Tournament selection draws two or more chromosomes from the previous generation, compares them, and puts the one with the largest fitness value into the mating pool. Because it requires little computation, it shortens the time needed to search for a solution. Moreover, in order to preserve superior chromosomes, the best α chromosomes are reproduced completely into the next generation.

4.1.5 Crossover

The crossover operator swaps the information of superior chromosomes in the mating pool to evolve better solutions. Conventional approaches include single-point, two-point, uniform, and cycle crossover. To match the design of our chromosome, an improved single-point crossover is used in our algorithm. As in the original single-point crossover, one point is chosen at random from two chromosomes and their gene values are swapped. Because of the swap, the total number of tasks in each chromosome differs from the original, so another point is chosen and complemented until the number of tasks equals the original. For example, in Figure 4.2 one computing node is chosen for crossover; after the swap, another randomly chosen computing node adds or subtracts the differential value of one.
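The swap-and-repair idea can be sketched as follows (names are ours; the fallback when no valid repair point exists is our addition, since the text does not cover that case):

```python
import random

def gdsa_crossover(parent_a, parent_b, rng):
    """Improved single-point crossover for GDSA count-encoded chromosomes:
    swap one gene (a task count), then complement the difference at
    another point so each offspring still schedules the same total
    number of tasks as its parent."""
    a, b = parent_a[:], parent_b[:]
    i = rng.randrange(len(a))
    diff = a[i] - b[i]              # imbalance introduced by the swap
    a[i], b[i] = b[i], a[i]
    # pick a repair point that keeps every count non-negative; if none
    # exists, fall back to i, which simply undoes the swap
    candidates = [k for k in range(len(a))
                  if k != i and a[k] + diff >= 0 and b[k] - diff >= 0]
    j = rng.choice(candidates) if candidates else i
    a[j] += diff
    b[j] -= diff
    return a, b
```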

Figure 4.2 Example of improved single-point crossover (GDSA)

4.1.6 Mutation

After the selection, reproduction, and crossover operations, the solution evolves toward an optimal one; but without mutation it may become locally optimal, making it hard to reach global solutions. In the GDSA, a single-point mutation that moves tasks from one computing node to another is used. Taking Figure 4.3 as an example, a computing node that originally processed seven tasks is randomly chosen to mutate. A random number within seven is chosen, say three; another computing node is then selected at random and three tasks are moved to it, so the number of processed tasks changes from seven to four on the first node and from three to six on the second.
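This move-tasks mutation can be sketched as (names are ours):

```python
import random

def gdsa_mutation(chromosome, rng):
    """GDSA single-point mutation: move a random number of tasks from one
    randomly chosen computing node to another, keeping the total number
    of scheduled tasks constant."""
    genes = chromosome[:]
    src = rng.choice([k for k, g in enumerate(genes) if g > 0])
    dst = rng.choice([k for k in range(len(genes)) if k != src])
    moved = rng.randint(1, genes[src])   # how many tasks to relocate
    genes[src] -= moved
    genes[dst] += moved
    return genes
```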


Figure 4.3 Example of mutation (GDSA)

4.1.7 Termination

In the GA, the termination condition determines when to stop the evolution. Because it is hard to determine when the fitness value has converged, the stopping condition is set to 2000 generations. After the evolution stops, a near-optimal schedule is obtained from the chromosome with the largest fitness value. With this scheduling approach, computing nodes are utilized efficiently and the performance of the computational grid is enhanced.

4.1.8 GDSA Workflow

The workflow of the GDSA is shown in Figure 4.4. The initial population is produced randomly; after the fitness values are evaluated, the solution evolves through the tournament selection, reproduction, improved single-point crossover, and single-point mutation operators. After 2000 generations the evolution stops and a near-optimal schedule is obtained.


Figure 4.4 GDSA workflow

The procedures of the GDSA are shown below, and Table 4.1 gives the definitions of the corresponding notations and terminology used in the procedures of the GDSA and of the EDSA in the next section.

Table 4.1 The definitions of GDSA and EDSA

Definition 1. T = {T_1, T_2, ..., T_n} is a batch of tasks submitted by the user, where n is the number of tasks.

Definition 2. C = {C_1, C_2, ..., C_N} is the set of computing nodes joining the computation, where N is the number of computing nodes.

Definition 3. P = {P_1, P_2, ..., P_p} is a population of chromosomes, where p is the size of the population; F(P_j) denotes the fitness value of chromosome P_j.

Definition 4. P' = {P'_1, P'_2, ..., P'_p} is the population of chromosomes of the next generation.

Definition 5. R_r, R_c, and R_m are the rates of reproduction, crossover, and mutation, where R_r + R_c + R_m = 1.

Definition 6. D_r, D_c, and I_m are the amounts by which the reproduction and crossover rates are decreased and the mutation rate is increased, where (R_r - D_r) + (R_c - D_c) + (R_m + I_m) = 1.

Definition 7. G_j = {g_1, g_2, ..., g_L} denotes the gene values of chromosome P_j, where L is the length of the chromosome.

Definition 8. M = {M_1, M_2, ..., M_m} is the set of chromosomes in the mating pool, where m is the size of the mating pool.

Definition 9. G'_j denotes the gene values of chromosome M_j.

Definition 10. K = {k_1, k_2, ..., k_L} is a mask with k_t ∈ {0, 1}, t = 1 ... L, where k_t denotes a value in K.

Definition 11. num_generation is the variable used to count the number of generations that have been evolved.

Definition 12. num_mutation is the variable used to count the number of mutation points.

Definition 13. num_invariable is the variable used to count the generations for which the fitness value has remained unchanged.

Definition 14. S is the near-optimal schedule.

// GDSA
Input:
    T : a batch of tasks submitted by the user
Output:
    S : the near-optimal schedule

1. Initialization(P)
2. num_generation = 1
3. While (num_generation != 2000)
       Fitness(P)
       Reproduction(P, P')
       Crossover(P, P')
       Mutation(P')
       num_generation = num_generation + 1
4. Finish the evolution and output the schedule S

Procedure Initialization(P)
    For each chromosome P_j in P, j = 1 ... p
        For each gene g_k in chromosome P_j, k = 1 ... N
            Randomly produce the gene's value

Procedure Fitness(P)
    For each chromosome P_j in population P, j = 1 ... p
        Evaluate the fitness value F(P_j)

Procedure Reproduction(P, P')
    Sort the chromosomes in P in descending order according to the fitness value
    For j = 1 ... α
        P'_j = P_j

Procedure Crossover(P, P')
    For each pair of offspring to be generated, j = α + 1 ... p
        Randomly select two chromosomes P_a and P_b from P
        If (F(P_a) > F(P_b))
            M_j = P_a
        Else
            M_j = P_b
        (repeat the tournament to obtain M_j+1)
        Randomly select one point k, k = 1 ... N
        Swap the gene values g_k of M_j and M_j+1
        Complement the difference at another point so that the total number
        of tasks of M_j and M_j+1 stays unchanged
        j = j + 2

Procedure Mutation(P')
    For j = 1 ... p × R_m
        Randomly select one chromosome P'_x in P', x = 1 ... p
        Randomly select two genes g_k and g_r in P'_x, k, r = 1 ... N, k != r
        Randomly generate a number d, d = 1 ... g_k
        Move d tasks from g_k to g_r (g_k = g_k - d, g_r = g_r + d)

4.2 Evolution based dynamic scheduling algorithm (EDSA)

In the previous section, the GDSA scheduling algorithm was proposed. Although the GDSA considers both the computational capabilities of the computing nodes and the network status, its search efficiency may be limited by its chromosome architecture: when the number of computing nodes increases, the chromosome becomes longer and the solution space larger, making it difficult for the GDSA to find a good schedule.

To enhance the search efficiency of the GDSA, the EDSA scheduling algorithm is proposed. To search the global solution space, the Task ID is used as the index of the chromosome and each index is mapped to a computing node. Besides considering the computational capabilities and the network status, the EDSA also applies different crossover and mutation operators according to the varying status of the fitness value.

Furthermore, the crossover and mutation rates are changed dynamically. The details of the EDSA are described below.

4.2.1 Chromosome Definition

In the EDSA, each chromosome in the population represents a potential schedule. The length of the chromosome equals the number of tasks in the batch, and each task is assigned to a single computing node. Figure 4.5 is an example of scheduling six tasks onto three computing nodes: T = {T_0, T_1, T_2, T_3, T_4, T_5} is a batch of tasks, and each index T_i holds a gene value giving a computing node ID, e.g. a task mapped to the computing node with ID = 1 (C_1).

Figure 4.5 Encoding of a schedule (EDSA)


Although the encoding approach of the EDSA yields a longer chromosome than that of the GDSA, it covers the global solution space. In order to compare the performance of the two approaches, the encoding that uses the Task ID as index is adopted in the EDSA.

4.2.2 Optimal Initialization

The standard GA uses a random approach to generate the initial population.

Although a random approach can generate global potential solutions, it costs more time in converge the solution or evolve it toward a better solution. In [23], the Sufferage algorithm was used in initializing a population and it performed reasonably.

In order to further optimize the initial population, the heuristic algorithms Min-Min and Max-Min were added in addition to Sufferage. In [7], it is indicated that Min-Min and Max-Min perform well in some instances. Moreover, to cover more possible cases, the round-robin approach was also added. Finally, the remaining chromosomes were generated with the random approach.

Through the known heuristic algorithms together with the random approach, both evolutionary efficiency and global coverage of the solution space were taken into consideration.
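A sketch of this seeded initialization is given below. Only a simplified Min-Min and the round-robin seed are shown (the Max-Min and Sufferage seeds would be built analogously); the `etc` matrix of expected task/node execution times, the function names, and the 0-based node indices are all illustrative assumptions:

```python
import random

def min_min(etc):
    """Simplified Min-Min: etc[t][n] = expected time of task t on node n.
    Repeatedly assign the task whose best-node completion time is smallest."""
    num_tasks, num_nodes = len(etc), len(etc[0])
    ready = [0.0] * num_nodes          # node ready times
    genes = [0] * num_tasks
    unassigned = set(range(num_tasks))
    while unassigned:
        t, n = min(
            ((t, n) for t in unassigned for n in range(num_nodes)),
            key=lambda tn: ready[tn[1]] + etc[tn[0]][tn[1]],
        )
        genes[t] = n
        ready[n] += etc[t][n]
        unassigned.remove(t)
    return genes

def init_population(etc, pop_size):
    """Seed with heuristics, then fill with random chromosomes.
    Assumes pop_size is at least the number of heuristic seeds."""
    num_tasks, num_nodes = len(etc), len(etc[0])
    pop = [min_min(etc)]                                   # heuristic seed
    pop.append([t % num_nodes for t in range(num_tasks)])  # round-robin seed
    while len(pop) < pop_size:                             # random remainder
        pop.append([random.randrange(num_nodes) for _ in range(num_tasks)])
    return pop
```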

4.2.3 Decode and Fitness Function

In the EDSA, the fitness function is designed the same as in GDSA, taking both processing time and transmission time into consideration. The fitness value can be obtained through Equation (2).
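Equation (2) is not reproduced here; the sketch below only illustrates the idea, assuming the fitness is the reciprocal of the makespan, where each node's finish time accumulates processing time plus transmission time. The tuple layouts and units are illustrative assumptions, not the thesis's actual model:

```python
def makespan(chromosome, tasks, nodes):
    """tasks[t] = (exec_length, file_size); nodes[n] = (capability, bandwidth).
    chromosome[t] is the (0-based) node index that task t is assigned to."""
    finish = [0.0] * len(nodes)
    for t, n in enumerate(chromosome):
        length, size = tasks[t]
        capability, bandwidth = nodes[n]
        finish[n] += length / capability + size / bandwidth  # process + transmit
    return max(finish)

def fitness(chromosome, tasks, nodes):
    return 1.0 / makespan(chromosome, tasks, nodes)  # shorter makespan -> fitter
```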

4.2.4 Selection and Reproduction

In order to reduce the computing time, tournament selection was used, the same as in GDSA. In addition, the best α chromosomes were copied unchanged to the next generation to preserve superior chromosomes.
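This elitism-plus-tournament step can be sketched as follows; the defaults for α and the tournament size k are illustrative assumptions:

```python
import random

def select_next_generation(pop, fitnesses, alpha=2, k=2):
    """Keep the best `alpha` chromosomes unchanged (elitism), then fill the
    rest of the next generation by k-way tournament selection."""
    order = sorted(range(len(pop)), key=lambda i: fitnesses[i], reverse=True)
    next_gen = [pop[i][:] for i in order[:alpha]]          # elitist copies
    while len(next_gen) < len(pop):
        contenders = random.sample(range(len(pop)), k)     # random tournament
        winner = max(contenders, key=lambda i: fitnesses[i])
        next_gen.append(pop[winner][:])
    return next_gen
```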


4.2.5 Hybrid Crossover

Mask crossover has a higher probability of swapping gene information. In this case, the best chromosomes may change too much, making it difficult to evolve better solutions. Compared with the mask crossover, the two-point crossover exchanges the information of only two genes in each operation, so the superior characteristics of the best chromosomes can be preserved. For these reasons, both the two-point crossover and the mask crossover were used in the crossover phase.

In the beginning, the two-point crossover is used. When the fitness value has remained unchanged for five generations, the mask crossover, which introduces higher variation, replaces the two-point crossover; when the evolution stagnates, mask crossover has a higher probability of evolving a better solution. The two-point crossover is restored once the fitness value varies again. Figure 4.6 and Figure 4.7 give examples of the two-point crossover and the mask crossover in our EDSA.
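A minimal sketch of the two operators and the switching policy described above; the function names and the stall-counter threshold handling are illustrative assumptions:

```python
import random

def two_point_crossover(a, b):
    """Swap the genes at two randomly chosen positions between the parents."""
    p, q = random.sample(range(len(a)), 2)
    a, b = a[:], b[:]
    a[p], b[p] = b[p], a[p]
    a[q], b[q] = b[q], a[q]
    return a, b

def mask_crossover(a, b):
    """Swap genes wherever a random bit mask is 1 (higher variation)."""
    a, b = a[:], b[:]
    for i in range(len(a)):
        if random.random() < 0.5:        # mask bit = 1
            a[i], b[i] = b[i], a[i]
    return a, b

def crossover(a, b, invariable_count, threshold=5):
    """Use mask crossover once the best fitness has stalled for `threshold`
    generations; use two-point crossover otherwise."""
    if invariable_count >= threshold:
        return mask_crossover(a, b)
    return two_point_crossover(a, b)
```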

Figure 4.6 Example of two-point crossover (EDSA)


Figure 4.7 Example of mask crossover (EDSA)

4.2.6 Incremental Mutation

In the EDSA, incremental mutation is used to control the speed of mutation. Initially, two-point mutation is used. When the fitness value has remained unchanged for more than five generations, the number of mutation points is increased by one in each subsequent generation. When the number of mutation points reaches five, the increment stops; in other words, the maximum number of mutation points is five. This is because an excessive number of mutation points would degenerate into random searching. When the fitness value varies again, the number of mutation points is reset to two, and the incremental mutation restarts only after the fitness value has once again remained unchanged for five generations. Figure 4.8 is an example of the mutation with four mutation points.
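The point-count schedule and the mutation operator can be sketched as follows; the helper names are illustrative, and the thresholds follow the description above:

```python
import random

BASE_POINTS, MAX_POINTS = 2, 5

def update_mutation_points(points, invariable_count):
    """Reset to two points when fitness varies; after five stalled
    generations, add one point per generation, capped at five."""
    if invariable_count == 0:
        return BASE_POINTS
    if invariable_count >= 5 and points < MAX_POINTS:
        return points + 1
    return points

def mutate(chromosome, num_nodes, points):
    """Reassign `points` randomly chosen genes to random node IDs (1-based)."""
    c = chromosome[:]
    for g in random.sample(range(len(c)), points):
        c[g] = random.randint(1, num_nodes)
    return c
```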


Figure 4.8 Example of mutation (EDSA)

4.2.7 Termination

As in GDSA, because it is hard to determine when the fitness value has converged, the stopping condition was set to 2000 generations of evolution in the EDSA. A near-optimal schedule is obtained after 2000 generations.

4.2.8 EDSA Workflow

The workflow of the EDSA is shown in Figure 4.9. In the figure, Invariable_Count is a variable that counts the consecutive generations in which the fitness value remained unchanged. According to the variation of the fitness value, the corresponding crossover and mutation operators are applied to enhance the efficiency of the evolution.

Moreover, the optimal initialization and the dynamic crossover and mutation rates also help to find a better solution.


Figure 4.9 EDSA workflow

The procedures of the EDSA scheduling algorithm are shown in the following.

// EDSA algorithm
Input:
    T: a batch of tasks submitted by the user
Output:
    S: the near-optimal schedule
(P: the population; N: the number of tasks; M: the number of computing nodes; P_size: the population size; Pc, Pm: the crossover and mutation rates)

1. Initialization(P)
2. Generation_Count ← 1
3. Mutation_Point ← 2
4. While ( Generation_Count ≠ 2000 )
       Fitness(P)
       Reproduction(P)
       If ( the fitness value varies )
           Invariable_Count ← 0
           Mutation_Point ← 2
           TwoPointCrossover(P, Pc)
           Mutation(P, Pm, Mutation_Point)
       Else
           Invariable_Count ← Invariable_Count + 1
           If ( Invariable_Count ≥ 5 )
               MaskCrossover(P, Pc)
               If ( Mutation_Point < 5 )
                   Mutation_Point ← Mutation_Point + 1
               Mutation(P, Pm, Mutation_Point)
           Else
               Mutation_Point ← 2
               TwoPointCrossover(P, Pc)
               Mutation(P, Pm, Mutation_Point)
       Generation_Count ← Generation_Count + 1
5. Finish the evolution and output the schedule S

Procedure Initialization(P)
    For each gene g in chromosome C1, g = 1 … N
        Produce the gene value via the Min-Min algorithm
    For each gene g in chromosome C2, g = 1 … N
        Produce the gene value via the Max-Min algorithm
    For each gene g in chromosome C3, g = 1 … N
        Produce the gene value via the Sufferage algorithm
    For each gene g in chromosome C4, g = 1 … N
        Produce the gene value via the Round-Robin approach
    For each chromosome Ci, i = 5 … P_size
        For each gene g in Ci, g = 1 … N
            Produce the gene value at random

Procedure Fitness(P)
    For each chromosome Ci in P, i = 1 … P_size
        Evaluate the fitness value of Ci

Procedure Reproduction(P)
    Sort the chromosomes in P in descending order of fitness
    For each chromosome Ci, i = 1 … α
        Copy Ci to the next generation

Procedure TwoPointCrossover(P, Pc)
    For j = 1 … P_size / 2
        Randomly select two chromosomes Ca and Cb from P
        If ( random(0, 1) ≤ Pc )
            Randomly select one point p, p = 1 … N
            Swap the gene values at point p of Ca and Cb
            Randomly select one point r, r = 1 … N and r ≠ p
            Swap the gene values at point r of Ca and Cb

Procedure MaskCrossover(P, Pc)
    For j = 1 … P_size / 2
        Randomly select two chromosomes Ca and Cb from P
        If ( random(0, 1) ≤ Pc )
            Randomly produce a bit mask K of length N
            For each bit K[g], g = 1 … N
                If ( K[g] = 1 )
                    Swap the gene values at point g of Ca and Cb

Procedure Mutation(P, Pm, Mutation_Point)
    For j = 1 … P_size
        Randomly select one chromosome Ci in P, i = 1 … P_size
        If ( random(0, 1) ≤ Pm )
            For d = 1 … Mutation_Point
                Randomly select one gene g in Ci, g = 1 … N
                Randomly select one computing node c, c = 1 … M
                Set the value of gene g to c

Chapter 5. Experiments

Experiments were conducted to verify that GDSA and EDSA can efficiently schedule a batch of tasks onto the fittest computing nodes and reduce the completion time. The experiments were performed with different numbers of computing nodes and tasks.

In the experiments, GDSA and EDSA were compared with the Random, Round-Robin, Min-Min, and Sufferage scheduling algorithms. Moreover, a GA-based scheduling algorithm (Z-GA) proposed in [23] was also compared.

5.1 Experimental Environments

Several grid-environment simulators have been proposed, e.g. Bricks [2], MicroGrid [21], and SimGrid [3]. In this thesis, the well-known GridSim toolkit [30] was used to construct the simulation environment; it has been used in many studies to construct computational-grid or data-grid environments [15][20]. Table 5.1 specifies the simulation environment.

Table 5.1 Hardware and software specification of the simulation environment

Hardware: Intel Core 2 Duo 2.0 GHz, 3 GB DDR2 RAM
Software: Mac OS X 10.5, Java SE 6, GridSim toolkit

In order to test whether GDSA and EDSA can find a near-optimal schedule for a large number of tasks or computing nodes, the simulation was performed in two scenarios:

• Scenario 1 (changing the number of tasks): 20, 40, and 60 computing nodes were used to process 300 to 2100 tasks.

• Scenario 2 (changing the number of computing nodes): 10 to 60 computing nodes were used to process 500, 1000, 1500, and 2000 tasks.


In both scenarios, the tasks were independent and differed in file size, execution length, and result size. The computing nodes also had different processing capabilities and bandwidths. With these two scenarios, the performance of the scheduling algorithms as the number of tasks or computing nodes increased could be observed.
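A heterogeneous workload of this kind can be sketched as below. All parameter ranges and field layouts are illustrative assumptions, not the actual GridSim configuration used in the experiments:

```python
import random

def make_workload(num_tasks, num_nodes, seed=0):
    """Randomly generate independent tasks and heterogeneous nodes."""
    rng = random.Random(seed)           # seeded for reproducible experiments
    tasks = [(rng.uniform(1e3, 1e4),    # execution length
              rng.uniform(10, 100))     # input file size
             for _ in range(num_tasks)]
    nodes = [(rng.uniform(100, 1000),   # processing capability
              rng.uniform(1, 10))       # bandwidth
             for _ in range(num_nodes)]
    return tasks, nodes
```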

5.2 Experimental Results

In scenario 1, the number of tasks was varied to compare the makespan of the seven scheduling algorithms. In Figure 5.1 ~ Figure 5.3, the experimental results show that as the number of tasks increased, the time for finishing the tasks increased as well. In all situations, the GA-based scheduling algorithms performed much better than the other conventional scheduling algorithms. For example, compared with Random, Figure 5.1 shows that when the number of tasks was 2100 and the number of nodes was 20, Z-GA shortened the makespan by 44% (202510 sec. less), GDSA by 45% (204955 sec. less), and EDSA by 49% (223044 sec. less). Figure 5.2 shows that with 2100 tasks and 40 nodes, Z-GA shortened the makespan by 46% (129647 sec. less), GDSA by 38% (108431 sec. less), and EDSA by 55% (15629 sec. less). Further, Figure 5.3 shows that with 2100 tasks and 60 nodes, Z-GA shortened the makespan by 49% (108186 sec. less), GDSA by 41% (90980 sec. less), and EDSA by 60% (132165 sec. less).

In Figure 5.1, the performance difference between GDSA and Z-GA is small throughout the experiments. Z-GA performed better than GDSA in the cases of 300 and 600 tasks; e.g., compared with GDSA, Figure 5.1 shows that with 300 tasks and 20 nodes, Z-GA shortened the makespan by 13% (5191 sec. less).


Conversely, GDSA performed better than Z-GA when the number of tasks was larger than 900; e.g., compared with Z-GA, Figure 5.1 shows that with 2100 tasks and 20 nodes, GDSA shortened the makespan by 10% (2445 sec. less). However, in Figure 5.2 and Figure 5.3, when the number of computing nodes increased to 40 and 60, the performance of GDSA decreased; e.g., compared with GDSA, Figure 5.2 shows that with 2100 tasks and 40 nodes, Z-GA shortened the makespan by 12% (21216 sec. less), and Figure 5.3 shows that with 2100 tasks and 60 nodes, Z-GA shortened the makespan by 13% (17206 sec. less). With a larger number of computing nodes, the solution space of GDSA grows and its search efficiency is affected. In contrast, even as the number of computing nodes increased, EDSA performed best throughout the experiments. Compared with Z-GA, Figure 5.1 shows that with 300 tasks, EDSA shortened the makespan by 9% (3065 sec. less); when the number of tasks increased to 2100, EDSA shortened the makespan by 8% (20500 sec. less). Figure 5.2 shows that when the number of computing nodes increased to 40, EDSA shortened the makespan by 22% (4869 sec. less) in the case of 300 tasks and by 17% (26882 sec. less) in the case of 2100 tasks. Even when the number of computing nodes increased to 60, Figure 5.3 shows that EDSA shortened the makespan by 31% (5500 sec. less) in the case of 300 tasks and by 22% (23979 sec. less) in the case of 2100 tasks. Thus, EDSA performed better than Z-GA throughout the experiments. This is because EDSA uses different crossover and mutation operators and changes the rates dynamically, rather than only changing the rates after a fixed number of generations. Because it is hard to predict when a fitness value is going to stall or converge, changing the crossover and mutation rates according to a fixed number of generations is not suitable.


Figure 5.1 Makespan in different number of tasks (nodes = 20)

Figure 5.2 Makespan in different number of tasks (nodes = 40)

Figure 5.3 Makespan in different number of tasks (nodes = 60)


In scenario 2, the number of computing nodes was varied to compare the makespan of the seven scheduling algorithms. As shown in Figure 5.4 ~ Figure 5.7, thanks to the use of GA, Z-GA, GDSA, and EDSA performed respectably; e.g., compared with Random, Figure 5.4 shows that with 500 tasks and 60 nodes, Z-GA shortened the makespan by 55% (32468 sec. less), GDSA by 43% (25327 sec. less), and EDSA by 67% (39354 sec. less). Figure 5.5 shows that with 1000 tasks and 60 nodes, Z-GA shortened the makespan by 49% (48941 sec. less), GDSA by 32% (31749 sec. less), and EDSA by 63% (63038 sec. less). Figure 5.6 shows that with 1500 tasks and 60 nodes, Z-GA shortened the makespan by 48% (69460 sec. less), GDSA by 36% (51737 sec. less), and EDSA by 59% (85312 sec. less). Figure 5.7 shows that with 2000 tasks and 60 nodes, Z-GA shortened the makespan by 49% (98697 sec. less), GDSA by 41% (83119 sec. less), and EDSA by 58% (116689 sec. less).

Moreover, when the number of computing nodes was small and the number of tasks was large, GDSA performed better than Z-GA; e.g., compared with Z-GA, Figure 5.7 shows that GDSA shortened the makespan by 5% (27187 sec. less) in the case of 10 nodes and by 4% (9459 sec. less) in the case of 20 nodes. But, as mentioned in scenario 1, the search efficiency of GDSA was affected when the number of computing nodes increased; e.g., compared with GDSA, Figure 5.7 shows that Z-GA shortened the makespan by 16% (23376 sec. less) in the case of 50 nodes and by 13% (15578 sec. less) in the case of 60 nodes. According to the above results, we can observe that when the number of computing nodes increases, the search efficiency of


GDSA decreases because of the larger solution space.

Compared with Z-GA, Figure 5.4 shows that EDSA shortened the makespan by 7% (9506 sec. less) in the case of 10 nodes and by 26% (6885 sec. less) in the case of 60 nodes. In Figure 5.5, with 1000 scheduled tasks, EDSA shortened the makespan by 6% (16815 sec. less) in the case of 10 nodes and by 27% (14098 sec. less) in the case of 60 nodes. In Figure 5.6, with 1500 scheduled tasks, EDSA shortened the makespan by 7% (26001 sec. less) in the case of 10 nodes and by 21% (15851 sec. less) in the case of 60 nodes. In Figure 5.7, with 2000 scheduled tasks, EDSA shortened the makespan by 7% (39024 sec. less) in the case of 10 nodes and by 18% (17990 sec. less) in the case of 60 nodes. Thus, owing to the use of different operators and dynamic rates, the performance of EDSA was not affected even as the number of computing nodes or tasks increased, so the proposed EDSA performed better than Z-GA throughout the experiments.

Figure 5.4 Makespan in different number of computing nodes (tasks = 500)


Figure 5.5 Makespan in different number of computing nodes (tasks = 1000)

Figure 5.6 Makespan in different number of computing nodes (tasks = 1500)

Figure 5.7 Makespan in different number of computing nodes (tasks = 2000)


From the above two scenarios, we can observe that the scheduling algorithms that use the optimal-searching technique of GA perform better than the other conventional scheduling algorithms. Although the performance of the GDSA scheduling algorithm is affected by the number of computing nodes joining the computation, it performs respectably when the number of computing nodes is small and the number of scheduled tasks is large. The EDSA scheduling algorithm outperformed the others in both scenarios because of its use of different crossover and mutation operators and its dynamically changing rates.


Chapter 6. Conclusion and Future Works

Although grid computing can integrate computational resources located in different geographical or network areas, scheduling tasks onto the computing nodes efficiently is an NP-complete problem. To solve this scheduling problem, we proposed two scheduling algorithms, GDSA and EDSA, which use the genetic algorithm to search for a near-optimal schedule in a grid computing environment. In the GDSA, the chromosome is encoded according to the number of computing nodes to shorten the chromosome length, and the gene information, which represents the number of tasks each node processes, limits the solution space. Thus, the evolution can be sped up within the limited solution space and shorter chromosome length. In the EDSA, the use of hybrid crossover and incremental mutation enhances the efficiency of the evolution.

Furthermore, the rates, controlled by the variation of the fitness value, prevent the search from being trapped in a local optimum or losing the chance to evolve toward a better solution. In order to evaluate the performance of the proposed GDSA and EDSA, a simulation was performed. Compared with the traditional random approach, GDSA shortened the makespan by 45% (in the case of 20 nodes and 2100 tasks) and EDSA by 60% (in the case of 60 nodes and 2100 tasks). Moreover, the results show that EDSA performed best throughout the experiments. In other words, the proposed EDSA can schedule a batch of tasks onto the fittest computing nodes and efficiently shorten the time needed to complete the tasks.

Although the genetic algorithm is effective in searching for an optimal solution, it takes considerable time to evolve. In the future, we will study how to combine other optimal-searching techniques or utilize other approaches to shorten the evolution time. If so, the number of evolutionary generations can be reduced, and the time spent finishing the jobs can be shortened greatly.


References

[1] J.H. Abawajy, “Fault-Tolerant Dynamic Job Scheduling Policy,” 6th International Conference on Algorithms and Architectures for Parallel Processing, Lecture Notes in Computer Science, vol. 3719, pp. 165-173, October 2005.

[2] K. Aida, A. Takefusa, H. Nakada , S. Matsuoka , S. Sekiguchi and U. Nagashima,

“Performance evaluation model for scheduling in a global computing system,”

International Journal of High Performance Computing Applications, vol. 4, pp.

268-279, 2000.

[3] H. Casanova, “Simgrid: A toolkit for the simulation of application scheduling,”

1st International Symposium on Cluster Computing and the Grid, pp. 430, 2001.

[4] K.-W. Cheng, C.-T. Yang, C.-L. Lai and S.-C. Chang, “A parallel loop self-scheduling on grid computing environments,” 7th International Symposium on Parallel Architectures, Algorithms and Networks, pp. 409-414, May 2004.

[5] G. Chryssolouris and V. Subramaniam, “Dynamic scheduling of manufacturing job shops using genetic algorithms,” Journal of Intelligent Manufacturing, vol.

12, no. 3, pp. 281-293, June 2001.

[6] K.P. Dahal, G.M. Burt, J.R. McDonald, and A. Moyes “A case study of scheduling storage tanks using a hybrid genetic algorithm,” IEEE Transactions on Evolutionary Computation, vol.5, issue 3, pp. 283-294, June 2001.

[7] K. Etminani and M. Naghibzadeh, “A Min-Min Max-Min Selective Algorithm for Grid Task Scheduling,” 3rd IEEE/IFIP International Conference in Central Asia on Internet, pp. 1-7, September 2007.

[8] M.R. Garey and D.S. Johnson, “Computers and Intractability: A Guide to the Theory of NP-Completeness,” W. H. Freeman & Co., New York, USA, 1979.

[9] R. Gruber, V. Keller, P. Kuonen, M.-C. Sawley, B. Schaeli, A. Tolou,


M. Torruella and T.-M. Tran, “Towards an Intelligent Grid Scheduling System,”

Sixth International Conference on Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science, vol. 3911, pp. 751-757, September 2005.

[10] A.T. Haghighat, K. Faez, M. Dehghan, A. Mowlaei and Y. Ghahremani,

“GA-Based Heuristic Algorithms for QoS Based Multicast Routing,” The Twenty-second SGAI International Conference on Knowledge Based Systems and Applied Artificial Intelligence, vol. 16, issues 5-6, pp. 305-312, July 2003.

[11] J.H. Holland, “Adaption in Natural and Artificial System,” MIT Press, Cambridge, MA, USA, 1992.

[12] E. S. H. Hou , N. Ansari and H. Ren, “A Genetic Algorithm for Multiprocessor Scheduling,” IEEE Transactions on Parallel and Distributed Systems, vol. 5, no.

2, pp.113-120, February 1994.

[13] O.H. Ibarra and C.E. Kim, “Heuristic Algorithms for Scheduling Independent Tasks on Nonidentical Processors,” Journal of the ACM, vol. 24, issue 2, pp.

280-289, April 1977.

[14] S.H. Jang and J.S. Lee, “Predictive Grid Process Scheduling Model in Computational Grid,” International Workshop on Metropolis/Enterprise Grid and Applications, Lecture Notes in Computer Science, vol. 3842, pp. 525-533, January 2006.

[15] K.H. Kim and R. Buyya, “Fair Resource Sharing in Hierarchical Virtual Organizations for Global Grids,” 8th IEEE/ACM International Conference on Grid Computing, pp. 50-57, September 2007.

[16] H. Lee, D. Lee and R.S. Ramakrishna, “An Enhanced Grid Scheduling with Job Priority and Equitable Interval Job Distribution,” The first International Conference on Grid and Pervasive Computing, Lecture Notes in Computer


Science, vol. 3947, pp. 53-62, May 2006.

[17] S.-S. Leu and C.-H. Yang, “GA-Based Multicriteria Optimal Model for Construction Scheduling,” Journal of Construction Engineering and Management, vol. 125, no. 6, pp. 420-427, November/December 1999.

[18] M. Maheswaran, S. Ali, H.J. Siegel, D. Hensgen and R.F. Freund, “Dynamic mapping of a class of independent tasks onto heterogeneous computing systems,”

Journal of Parallel and Distributed Computing, vol. 59, issue 2, pp. 107-131, November 1999.

[19] W.-C. Shih, C.-T. Yang and S.-S. Tseng, “A Performance-Based Approach to Dynamic Workload Distribution for Master-Slave Applications on Grid Environments,” The first International Conference on Grid and Pervasive Computing, Lecture Notes in Computer Science, vol. 3947, pp. 73-82, May 2006.

[20] G. Singh, C. Kesselman and E. Deelman, “A Provisioning Model and its Comparison with Best Effort for Performance-Cost Optimization in Grids,” The Sixteenth IEEE International Symposium on High-Performance Distributed Computing, pp. 117-126, June 2007.

[21] H. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang, K. Taura and A. Chien,

“The MicroGrid: A scientific tool for modeling computational Grids,” Journal of Scientific Programming, vol. 8, no. 3, pp. 127-141, 2000.

[22] E.-H. Song, Y.-S. Jeon, S.-K. Han and Y.-S. Jeong, “Hierarchical and Dynamic Information Management Framework on Grid Computing,” International Federation for Information Processing, Lecture Notes in Computer Science, vol.

4096, pp. 151-161, October 2006.

[23] R. Subrata, A.Y. Zomaya and B. Landfeldt, “Artificial life techniques for load


balancing in computational grids,” Journal of Computer and System Sciences, vol. 73, issue 8, pp. 1176-1190, December 2007.

[24] M. Tanaka, H. Watanabe, Y. Furukawa and T. Tanino, “GA-based decision support system for multicriteria optimization,” IEEE International Conference on Systems, Man and Cybernetics, Intelligent Systems for the 21st Century, vol. 2, pp. 1556-1561, October 1995.

[25] S. Uckun , S. Bagchi , K. Kawamura and Y. Miyabe, “Managing Genetic Search in Job Shop Scheduling,” IEEE Expert: Intelligent Systems and Their Applications, vol. 8, no. 5, pp.15-24, October 1993.

[26] Z. Xu, X. Hou and J. Sun , “Ant algorithm-based task scheduling in grid computing,” IEEE Canadian Conference on Electrical and Computer Engineering, vol. 2, pp. 1107-1110, May 2003.

[27] K.-M. Yu, Z.-J. Luo, C.-H. Chou, C.-K. Chen and J. Zhou, "A Fuzzy Neural Network Based Scheduling Algorithm for Job Assignment on Computational Grids," The 1st International Conference on Network-Based Information Systems, Lecture Notes in Computer Science, vol. 4658, pp. 533-542, September 2007.

[28] A.Y. Zomaya and Y.-H. Teh, “Observations on using genetic algorithms for dynamic load-balancing,” IEEE Transactions on Parallel and Distributed Systems, vol. 12, issue 9, pp. 899-911, September 2001.

[29] A.Y. Zomaya, C. Ward and B. Macey, “Genetic scheduling for parallel processor systems: comparative studies and performance issues,” IEEE Transactions on Parallel and Distributed Systems, vol. 10, issue 8, pp. 795-812, August 1999.

[30] http://www.gridbus.org/gridsim/
