Contents lists available at ScienceDirect
Future Generation Computer Systems
journal homepage: www.elsevier.com/locate/fgcs
Dynamic resource selection heuristics for a non-reserved bidding-based Grid environment
Chien-Min Wang (a), Hsi-Min Chen (b,∗), Chun-Chen Hsu (c), Jonathan Lee (b)
(a) Institute of Information Science, Academia Sinica, Taipei, Taiwan
(b) Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
(c) Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Article info
Article history:
Received 19 December 2008; received in revised form 29 July 2009; accepted 3 August 2009; available online 7 August 2009
Keywords:
Grid computing; Resource selection; Resource management; Bidding; Matchmaking
Abstract
A Grid system is comprised of large sets of heterogeneous and geographically distributed resources that are aggregated as a virtual computing platform for executing large-scale scientific applications.
As the number of resources in Grids increases rapidly, selecting appropriate resources for jobs has become a crucial issue. To avoid single point of failure and server overload problems, bidding provides an alternative means of resource selection in distributed systems. However, under the bidding model, the key challenge of resource selection is that there is no global information system to facilitate optimum decision-making; hence requesters can only obtain partial information revealed by resource providers.
To address this problem, we propose a set of resource selection heuristics to minimize the turnaround time in a non-reserved bidding-based Grid environment, while considering the level of information about competing jobs revealed by providers. We also present the results of experiments conducted to evaluate the performance of the proposed heuristics.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
With the rapid growth in the number of PCs and clusters, Grid computing technologies have emerged to facilitate resource sharing and the coordination of problem solving in distributed systems [1,2]. Such systems consist of large sets of heterogeneous and geographically distributed resources that are aggregated as a virtual computing platform for executing large-scale scientific applications. As the number of resources in Grids increases rapidly, selecting appropriate resources for jobs has become a crucial issue.
In essence, Grid resources are heterogeneous and managed independently by different organizations, and resource providers can specify their own access policies for sharing resources and joining/leaving Grids dynamically. Thus, exploiting previous cluster-based scheduling heuristics [3–7] to allocate tasks through a centralized manager or mapper is not feasible.
In recent years, many matchmaking-based technologies have been proposed to address the issue of Grid resource management [8–15]. Fig. 1(a) presents an abstract matchmaking model generalized from these technologies. However, the matchmaking technique may cause a matchmaker overload problem. Since a resource matchmaker is responsible for registering all resource states advertised by providers and executing matching algorithms, an increase in the number of resources and the frequency of job requests creates a performance bottleneck. Moreover, resource states may change minute by minute due to requesters' activities or resource failures, so the matchmaking technique may fail to reflect the dynamic nature of Grid resources. This is because matchmaking is a push-based model in which a matchmaker does not learn about changes in resource states until the resource providers publish their new states. In consequence, matchmaking may return inaccurate results.
∗Corresponding author.
E-mail address: [email protected] (H.-M. Chen).
To avoid a single point of failure, matchmaker overload and outdated resource information, bidding provides an alternative means of resource allocation in distributed systems [16–24].
Fig. 1(b) depicts the abstract process of the bidding model. A resource requester starts a bidding process by sending a set of call-for-proposal (CFP) requests, which contain job requirements, to resource providers. Then, based on their resource utilization and policies, the providers decide whether or not to participate in the bidding process. If they join the bidding process, they return bids that describe the states of their resources to the requester.
Finally, the requester evaluates and ranks the collected bids based on its selection strategy and submits the job to the provider that proposes the best-ranked bid. The bidding model has the following advantages over the matchmaking model. (1) Scalability: resource allocations between providers and requesters in the bidding model are fully distributed without the intervention of a centralized matchmaker/broker. (2) Autonomy: requesters themselves can determine which of the offered resources are best suited to execute their jobs, while providers can contribute their resources according to their sharing policies and report up-to-date state information. (3) Reliability: if a resource fails during a job's execution, the requester can select other candidate resources from the received bids.
0167-739X/$ – see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.future.2009.08.003
Fig. 1. (a) The abstract process of the matchmaking model. (b) The abstract process of the bidding model.
Under the bidding model, resource providers can usually choose between two bidding strategies: reserved and non-reserved bidding [25]. Providers who adopt the reserved strategy keep the resources for each bid as commitments to guarantee future resource states. However, if the requester subsequently rejects the bid, the reserved resources are wasted. Moreover, other requesters may be prepared to accept the bid before the original requester rejects it; hence, there is a high probability that the provider will miss the opportunity to serve other requesters with the reserved resources. In contrast, resource providers who adopt the non-reserved strategy offer the same resource states to a set of requesters without reserving resources for each bid. This strategy enables providers to fully utilize their resources, but it does not guarantee the resource states. If requesters receiving the same bids submit jobs to the provider simultaneously, they have to compete for the resources, so the job completion time may be later than expected.
In addition to the above strategies, the bidding model allows providers to reveal different levels of information about competing jobs to requesters. As shown in Fig. 1(b), after resource providers receive CFPs from requesters, they can simply reveal the capabilities of the provided resources, provide information about the number of competitors, or give even more complete information about the competitors. The level of information revealed is an important factor that affects the performance of resource selection for a job's execution.
In this paper, our objective is to minimize the turnaround time of jobs in a non-reserved bidding-based Grid environment. The turnaround time covers the period from the time a job arrives to the receipt of the executed result. In online systems, users are more sensitive to the turnaround time than the execution time, waiting time or makespan [26]. To minimize the turnaround time in this model, we propose a set of deterministic and probabilistic resource selection heuristics. In contrast to traditional centralized scheduling problems, the key challenge of resource selection in the bidding model is that there is no global information system to facilitate optimum decision-making; hence, requesters are only aware of partial information released by resource providers. Thus, we consider various levels of information about competitors in the proposed heuristics. We want to determine whether requesters could make better scheduling decisions if they have more information about the states of competing jobs.
We conduct experiments to evaluate the performance of the heuristics for various levels of information and the impact of non-cooperative requesters. The experimental results show that the performance of the Dissolve-P heuristic is superior to that of other heuristics when information about competitors is not provided. However, the MCT-D heuristics outperform the other heuristics when information about the execution times of competitors is provided. We also find that the level of information has a significant effect on the performance of the MCT-D based heuristics, but it does not influence the Dissolve-P based heuristics. Furthermore, requesters who adopt cooperative resource selection strategies achieve better results than those that use non-cooperative strategies.
The contributions of this paper are as follows. (1) To the best of our knowledge, this is the first study of the resource selection issue in an online non-reserved bidding-based Grid system that focuses on minimizing the turnaround time. (2) To address this issue, we propose a set of probabilistic and deterministic resource selection heuristics, as well as a pre-scheduling mechanism, and evaluate their performance. (3) The proposed heuristics consider various levels of information about competing jobs. (4) We examine the impact of cooperative requesters and non-cooperative requesters on the performance of Grid resource selection.
The remainder of the paper is organized as follows. Section 2 contains a review of the literature on resource selection. In Section 3, we formally define the problem considered in this research. Section 4 presents the proposed heuristics for the various levels of information revealed by providers. We describe the simulation setup and evaluate the performance of the proposed heuristics in Section 5. Then, in Section 6, we summarize our conclusions.
2. Related work
A number of resource management approaches have been proposed in various Grid projects. Globus Toolkit [27], the most popular Grid middleware, integrates distributed computing resources and provides a set of management tools, such as security, data management, information services and execution management. In addition, for resource management, Globus provides MDS (Monitoring and Discovery Service) [28] to support the discovery and monitoring of resources, services, and computations, and GRAM (Grid Resource Allocation and Management) [29] combined with RSL (Resource Specification Language) for resource allocation tasks. However, Globus only allows users to specify basic configurations, such as the file path, maximum CPU power, required memory, and wall clock time. It does not support job matching/scheduling at the global level; instead, it leaves the task to the development of an upper-layer service. The bidding model and the proposed heuristics can be constructed as a high-level resource management service on top of Globus.
Condor matchmaker [8,11] is another well-known resource management framework designed for high-throughput computing in Grids. Under this framework, providers and consumers describe their respective capabilities and requirements in classified advertisements (classads), which are pushed to a central matchmaker that does the matching. One of the key features of the framework is that it considers different levels of sharing policies, which are specified in the providers' advertisements. The approaches in [9,10,12] are extensions of the Condor matchmaker for handling specific requirements. However, Condor is based on a centralized matchmaking model in which the problem of matchmaker overload may occur. Moreover, matchmaking decisions are made by checking the resource states kept by the matchmaker, but those states may not be consistent with the real states of resources. Therefore, the matching results provided to requesters may be incorrect.
Fig. 2. An example of the postponement phenomenon in the non-reserved bidding model.
In contrast to the centralized matchmaking-based approaches, many bidding-related studies have been conducted in the field of distributed systems [17]. For example, Xiao et al. [24] presented a bidding-based resource management mechanism called a P2P decentralized scheduling framework. Based on the mechanism, they proposed an incentive-based scheduling scheme to maximize the success rate of job executions and minimize the fairness deviation among resources. In [20], the authors proposed two contract-net based resource selection policies to increase the number of jobs completed successfully according to the given budgets and deadlines. Das Anubhav et al. [19] introduced a combinatorial auction-based resource allocation protocol in which a user bids a price value for each of a combination of resources available for a task's execution. The CORA (Coallocative, Oversubscribing Resource Allocation) [16] architecture is a market-based resource reservation system that utilizes the trustworthy Vickrey auction to make combinatorial allocations of resources. These approaches focus on devising economic Grid methods based on the trading prices for resources used to achieve various goals. Although we adopt a similar bidding scheme, unlike these approaches, our objective is to select appropriate resources for requesters and also improve the performance by minimizing the turnaround time in a non-reserved bidding-based Grid environment.
Surfer [22] is a resource selection and ranking framework that adopts a pull-based protocol (a simplified bidding model) to extract the highest-ranked resources. A pull-based model allows requesters to obtain dynamic information directly from providers, but Surfer only provides a general resource selection framework and is neutral in terms of selection policies. The work in [21] presents an agent-based resource selection mechanism that splits the Grid scheduling process into two phases. In the discovery phase, resources that do not satisfy static resource requirements are filtered out. Then, in the second phase, requesters negotiate directly with providers to determine the current state of the remaining resources and select those suitable for the job's execution. Unlike our work, the approach in [21] focuses on the benefit to individual requesters instead of all requesters, and it assumes that each provider adopts the reserved resource model.
3. Problem statement
Suppose that a set of requesters $R = \{r_1, r_2, \ldots, r_m\}$ and a set of providers $S = \{s_1, s_2, \ldots, s_n\}$ are given in a non-reserved bidding-based Grid system. A requester $r_i \in R$ submits a set of jobs $J_i = \{j_{i,1}, j_{i,2}, \ldots, j_{i,l_i}\}$ dynamically within a given time period $T$. The job arrival rates differ among requesters. Each job $j_{i,k} \in J_i$ has a workload $w_{i,k}$, which is included in the CFP messages sent to a set of providers. After sending the CFP messages, requester $r_i$ is given a deadline $d_i$ for the current bids, and bids received after the deadline are ignored. Once the deadline has passed, requester $r_i$ evaluates the bids within a time interval $e_i$ and finally submits the job to the selected provider. The initiation time $t^{init}_{i,k}$ is the point at which job $j_{i,k}$ arrives. Note that since we assume an online system in which jobs arrive dynamically, the workload $w_{i,k}$ and the initiation time $t^{init}_{i,k}$ are not known a priori.
We assume each provider $s_j \in S$ manages a computing resource with a given CPU capability $c_j$. A bid proposed by provider $s_j$ in reply to a CFP from requester $r_i$ includes an expected available time $a_{j,i,k}$ and a predicted execution time $et_{j,i,k} \approx w_{i,k}/c_j$ for executing job $j_{i,k}$. The expected available time is the time at which the provider finishes executing all the jobs it has accepted. As mentioned previously, we focus on the non-reserved bidding model, in which providers do not reserve resources for the bids they propose; therefore, a provider does not update its expected available time until it actually receives a job. In other words, before it receives a job, a provider proposes bids with the same expected available time in reply to every CFP. We thus define the actual available time $u_{j,i,k}$ as the point at which provider $s_j$ starts executing job $j_{i,k}$. Because resources are not reserved for each bid, the start may be postponed, i.e., $u_{j,i,k} \ge a_{j,i,k}$.
Fig. 2 shows an example of the postponement phenomenon in the non-reserved bidding model. Three job requests $j_m$, $j_{m+1}$ and $j_{m+2}$ were included in $CFP_m$, $CFP_{m+1}$ and $CFP_{m+2}$, respectively, and sent to provider $s_j$. Because the provider did not accept any jobs between the times that $CFP_m$ and $CFP_{m+2}$ were received, under the non-reserved bidding model the three requesting jobs $j_m$, $j_{m+1}$ and $j_{m+2}$ were offered bids $b_n$, $b_{n+1}$ and $b_{n+2}$, respectively, all with the same expected available time $a_{j,n}$. Since the requester of job $j_m$ submitted its job before the requester of job $j_{m+1}$, job $j_m$ can start at the actual available time $u_m$, which is equal to the expected available time $a_{j,n}$. However, because the time slot after $a_{j,n}$ had been allocated to job $j_m$, the actual available time for executing job $j_{m+1}$ is postponed to $u_{m+1} = a_{j,n} + et_{j,m}$. Thus, among the competitors that received bids with the same expected available time, only one can start at the proposed expected available time; the execution of the others is deferred.
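The postponement effect in Fig. 2 can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's simulator: the `Provider` class and its `bid`/`submit` methods are hypothetical, the provider runs accepted jobs one at a time in submission order, and it keeps quoting the same expected available time until a job actually arrives.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical non-reserved provider: a bid is a quote, not a reservation."""

    capability: float            # c_j
    available_time: float = 0.0  # a_j: when all accepted jobs will have finished

    def bid(self, workload: float) -> tuple[float, float]:
        # Same expected available time for every CFP until a job is received.
        return self.available_time, workload / self.capability

    def submit(self, workload: float, now: float) -> float:
        # Actual start u_j >= a_j: the job queues behind earlier submissions.
        start = max(self.available_time, now)
        self.available_time = start + workload / self.capability
        return start


provider = Provider(capability=2.0, available_time=10.0)
workloads = {"j_m": 8.0, "j_m+1": 4.0, "j_m+2": 6.0}

# All three CFPs arrive before any job is submitted, so every bid quotes a = 10.
bids = {job: provider.bid(w) for job, w in workloads.items()}

# The requesters then submit at time 5 in the order j_m, j_m+1, j_m+2.
starts = {job: provider.submit(w, now=5.0) for job, w in workloads.items()}

for job in workloads:
    print(job, "quoted", bids[job][0], "actual", starts[job])
# j_m quoted 10.0 actual 10.0
# j_m+1 quoted 10.0 actual 14.0
# j_m+2 quoted 10.0 actual 16.0
```

Only the first submitter starts at the quoted time; each later job is deferred by the execution times of the jobs submitted before it, which is exactly the gap between $a_{j,i,k}$ and $u_{j,i,k}$.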
Competitors $P_{j,i,k}$ of job $j_{i,k}$ are the requesting jobs that receive bids with the same expected available time from provider $s_j$, each receiving its bid before job $j_{i,k}$. For example, job $j_m$ in Fig. 2 has no competitor. Job $j_{m+1}$ has one competitor contending for the resource of provider $s_j$, i.e., $P_{j,m+1} = \{j_m\}$, whereas job $j_{m+2}$ has two, i.e., $P_{j,m+2} = \{j_m, j_{m+1}\}$. We also define the order of competitors as the time precedence ($\prec$) in which a provider proposes the bids with the same expected available time to them. For example, in Fig. 2 the order of competitors $P_{j,m+2}$ is $(j_m \prec j_{m+1})$, meaning that provider $s_j$ sent a bid to the requester of job $j_m$ before sending one to the requester of job $j_{m+1}$.
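The competitor sets $P_{j,i,k}$ and their order can be read off a provider's event log. The sketch below assumes a hypothetical log of `("bid", job)` and `("accept", job)` events in time order (the log format and the helper name `competitor_sets` are ours). A bid shares its expected available time with every bid issued since the provider last accepted a job, so those earlier bids are its competitors, listed in precedence ($\prec$) order.

```python
def competitor_sets(events):
    """Map each job that received a bid to its ordered competitors P_j."""
    competitors = {}
    same_quote = []  # bids issued since the last accepted job, oldest first
    for kind, job in events:
        if kind == "bid":
            # Earlier bids in this group share the same expected available time.
            competitors[job] = list(same_quote)
            same_quote.append(job)
        elif kind == "accept":
            # Accepting a job moves a_j forward, so later bids form a new group.
            same_quote = []
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return competitors


log = [("bid", "j_m"), ("bid", "j_m+1"), ("bid", "j_m+2"), ("accept", "j_m")]
print(competitor_sets(log))
# {'j_m': [], 'j_m+1': ['j_m'], 'j_m+2': ['j_m', 'j_m+1']}
```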
Not all providers in the system are capable of proposing bids that satisfy a job request. Hence, we define $Q_{i,k} \subseteq S$ as the set of contactable providers in the system (a.k.a. feasible machines in [30]) that can propose bids to requester $r_i$ for job $j_{i,k}$. A list of contactable providers can be obtained from lightweight/hierarchical matchmakers [31,32] or by employing the peer discovery technologies of P2P systems [24,33].
Table 1
The notations used for the turnaround time and the presented heuristics.
Symbol  Description
$R$  The set of requesters $\{r_1, r_2, \ldots, r_m\}$ in the system.
$S$  The set of providers $\{s_1, s_2, \ldots, s_n\}$ in the system.
$r_i$  A requester, where $r_i \in R$.
$s_j$  A provider, where $s_j \in S$.
$c_j$  The CPU capability of provider $s_j$.
$T$  The time period during which requesters submit jobs.
$J_i$  The set of jobs $\{j_{i,1}, j_{i,2}, \ldots, j_{i,l_i}\}$ that requester $r_i$ submits during $T$.
$j_{i,k}$  A job submitted by requester $r_i$, where $j_{i,k} \in J_i$.
$w_{i,k}$  The workload of job $j_{i,k}$.
$t^{init}_{i,k}$  The time at which job $j_{i,k}$ arrives.
$et_{j,i,k}$  The execution time of job $j_{i,k}$ on provider $s_j$ ($\approx w_{i,k}/c_j$), as predicted by $s_j$.
$a_{j,i,k}$  The expected available time at which provider $s_j$ expects to start executing job $j_{i,k}$.
$u_{j,i,k}$  The actual time at which provider $s_j$ starts executing job $j_{i,k}$.
$p_{j,i,k}$  The probability of selecting provider $s_j$ to execute job $j_{i,k}$.
$Q_{i,k}$  The contactable providers for job $j_{i,k}$.
$P_{j,i,k}$  The competitors of job $j_{i,k}$ that contend for provider $s_j$.
Recall that, in the bidding model, requesters do not have global information about other requesters' selection decisions. Therefore, if all requesters greedily select the same provider, e.g., the one with the most powerful CPU or the minimum completion time, the load across providers would become imbalanced. To address this problem, we adopt a probabilistic approach to allocating jobs, whereby the most powerful providers execute more jobs, but the less powerful ones are still employed. Suppose that the probability of selecting provider $s_j$ to execute job $j_{i,k}$ is $p_{j,i,k}$. Under the non-reserved bidding model, given a time period $T$ in which requesters dynamically generate jobs for submission to providers, we try to allocate the jobs such that the total turnaround time of all jobs is minimized; that is, we try to find an appropriate $p_{j,i,k}$. The notations used for the turnaround time and the presented heuristics are listed in Table 1.
Eq. (1) defines the formal objective function that we want to minimize. The job execution time is $et_{j,i,k}$ and the waiting time is $u_{j,i,k} - t^{init}_{i,k}$, i.e., the actual available time minus the initiation time. As in [34,35], we consider only the job execution time and the waiting time when selecting resources, so we assume that all machines are interconnected by high-speed links.
Thus, the turnaround time of a job is the sum of its waiting time and its execution time.
$$\sum_{i=1}^{m} \sum_{k=1}^{l_i} \left( et_{j,i,k} + u_{j,i,k} - t^{init}_{i,k} \right), \quad \text{where } j_{i,k} \text{ is assigned to } s_j. \qquad (1)$$
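As a worked instance of Eq. (1), the objective can be computed directly from the per-job quantities once every job has been assigned. The `Assignment` record below is a hypothetical container for $et_{j,i,k}$, $u_{j,i,k}$ and $t^{init}_{i,k}$; the two jobs are made-up numbers chosen to be easy to check by hand.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    """Hypothetical record for one job j_{i,k} after it has run on provider s_j."""

    exec_time: float   # et_{j,i,k}
    start_time: float  # u_{j,i,k}, the actual available time
    init_time: float   # t^init_{i,k}, the arrival time of the job


def total_turnaround(assignments):
    # Eq. (1): execution time plus waiting time (u - t_init), summed over all jobs.
    return sum(a.exec_time + a.start_time - a.init_time for a in assignments)


jobs = [
    Assignment(exec_time=4.0, start_time=10.0, init_time=5.0),  # waits 5, runs 4
    Assignment(exec_time=2.0, start_time=14.0, init_time=5.0),  # waits 9, runs 2
]
print(total_turnaround(jobs))  # 20.0
```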
4. Heuristics
Resource selection in the non-reserved bidding-based model presents two major challenges. The first is the lack of a global information system to facilitate optimum decision-making, so a requester cannot determine whether its competitors are also selecting the resource it requests. The second is that, since we assume the non-reserved bidding model operates in an online system, the job arrival times and workloads are not known a priori. To address these challenges, we propose a set of resource selection heuristics for the various levels of information released by resource providers under this model. Specifically, we consider four levels of information:
(1) No Competitors’ Information: providers reveal only the predicted job execution time and the expected available time.
(2) The Number of Competitors: besides the above information, providers report the number of competitors.
(3) Competitors’ Execution Times: in addition to the above information, providers report the execution times of competitors.
(4) Complete Information about Competitors: besides the above information, providers release the order of competing jobs.
Previous works [34,36] proposed Minimum Execution Time and Minimum Completion Time strategies to facilitate centralized task allocation. However, because the non-reserved bidding model does not provide centralized control, the resource loads would become imbalanced if all requesters greedily selected the same resource. Therefore, we propose a set of heuristics that allocate jobs probabilistically. To find an appropriate probability for minimizing the turnaround time, each proposed heuristic derives the probability $p_{j,i,k}$ of allocating job $j_{i,k}$ to provider $s_j$. We also consider the extreme case of probabilistic resource selection, in which the probability of selecting the most preferred provider is one and the probability of selecting any other is zero. We call this extreme case Deterministic selection and the other cases Probabilistic selection. For ease of presentation, we discuss the heuristics in the following order: No Competitors’ Information, Complete Information about Competitors, Competitors’ Execution Times, and The Number of Competitors.
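Every heuristic below outputs a probability for each contactable provider, and the requester then draws one provider from that distribution. A minimal sketch of this final step using Python's standard `random.choices` follows; the helper name `select_provider` and the example probabilities are illustrative assumptions. A deterministic heuristic is simply the one-hot case.

```python
import random


def select_provider(probabilities, rng=random):
    """Draw one provider id according to its selection probability p_{j,i,k}."""
    providers = list(probabilities)
    weights = [probabilities[s] for s in providers]
    return rng.choices(providers, weights=weights, k=1)[0]


# Deterministic selection is the one-hot case: the preferred provider always wins.
print(select_provider({"s1": 0.0, "s2": 1.0, "s3": 0.0}))  # s2

# Probabilistic selection: stronger providers are favored, weaker ones still used.
rng = random.Random(42)
picks = [select_provider({"s1": 0.5, "s2": 0.3, "s3": 0.2}, rng) for _ in range(1000)]
print({s: picks.count(s) for s in ("s1", "s2", "s3")})
```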
4.1. No competitors’ information
For the level of no competitors’ information, we propose one random and three probabilistic resource selection heuristics in addition to the deterministic Minimum Execution Time and Minimum Completion Time heuristics. We use the suffixes “-D” and “-P” to distinguish deterministic strategies from probabilistic ones. The formal definition of each strategy is as follows.
Random selection (RANDOM): For a job $j_{i,k}$, the probability of selecting each of the contactable providers $Q_{i,k}$ is calculated as follows:
$$p_{j,i,k} = \frac{1}{|Q_{i,k}|}.$$
Minimum Execution Time-Deterministic (MET-D): The provider that offers the minimum execution time for job $j_{i,k}$ is selected. The formulation of the MET-D heuristic is as follows:
$$p_{j,i,k} = \begin{cases} 1, & \text{if } et_{j,i,k} \text{ is minimum } \forall s_j \in Q_{i,k}, \\ 0, & \text{otherwise.} \end{cases}$$
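Minimum Execution Time-Probabilistic (MET-P): For job $j_{i,k}$, the probability of selecting one of the contactable providers $Q_{i,k}$ is proportional to the CPU capability of the provider relative to that of all contactable providers. Since $et_{j,i,k} \approx w_{i,k}/c_j$, this is equivalent to weighting each provider by the inverse of its predicted execution time. The formulation of the MET-P heuristic is as follows:
$$p_{j,i,k} = \frac{1/et_{j,i,k}}{\sum_{\forall s_n \in Q_{i,k}} 1/et_{n,i,k}}.$$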
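The three heuristics above depend only on the predicted execution times $et_{j,i,k}$ carried in the bids. A sketch, assuming the bids for one job have been collected into a dict from a (hypothetical) provider id to $et_{j,i,k}$; ties in MET-D are broken in favor of the first provider in the dict, a choice the text does not specify.

```python
def random_selection(exec_times):
    # RANDOM: every contactable provider gets probability 1/|Q_{i,k}|.
    return {s: 1.0 / len(exec_times) for s in exec_times}


def met_d(exec_times):
    # MET-D: probability one for the provider with the minimum et_{j,i,k}.
    best = min(exec_times, key=exec_times.get)
    return {s: 1.0 if s == best else 0.0 for s in exec_times}


def met_p(exec_times):
    # MET-P: p_j proportional to 1/et_j, i.e. to the CPU capability c_j.
    inverse = {s: 1.0 / et for s, et in exec_times.items()}
    total = sum(inverse.values())
    return {s: v / total for s, v in inverse.items()}


# A workload of 12 on providers with capabilities 1, 2 and 3 (et = w / c).
et = {"s1": 12.0, "s2": 6.0, "s3": 4.0}
print(random_selection(et))  # each provider gets 1/3
print(met_d(et))             # {'s1': 0.0, 's2': 0.0, 's3': 1.0}
print(met_p(et))             # about 1/6, 1/3 and 1/2, matching the 1:2:3 capabilities
```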
Minimum Completion Time-Deterministic (MCT-D): The provider that offers the minimum completion time, i.e., the waiting time plus the execution time, for job $j_{i,k}$ is selected. The formulation of the MCT-D heuristic is as follows:
$$p_{j,i,k} = \begin{cases} 1, & \text{if } et_{j,i,k} + \max\{a_{j,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k} \text{ is minimum } \forall s_j \in Q_{i,k}, \\ 0, & \text{otherwise.} \end{cases}$$
Minimum Completion Time-Probabilistic (MCT-P): For job $j_{i,k}$, the probability of selecting one of the contactable providers $Q_{i,k}$ is proportional to the inverse of the completion time of the provider relative to that of all contactable providers. The formulation of the MCT-P heuristic is as follows:
$$p_{j,i,k} = \frac{\dfrac{1}{et_{j,i,k} + \max\{a_{j,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k}}}{\sum_{\forall s_n \in Q_{i,k}} \dfrac{1}{et_{n,i,k} + \max\{a_{n,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k}}}.$$
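MCT-D and MCT-P share the same completion-time expression: execution time plus waiting time $\max\{a_{j,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k}$. A sketch, assuming dicts keyed by hypothetical provider ids for $et_{j,i,k}$ and $a_{j,i,k}$, strictly positive completion times (so the inverses in MCT-P are defined), and first-in-dict tie-breaking for MCT-D.

```python
def completion_times(exec_times, available_times, t_init):
    # et_j + (max{a_j, t_init} - t_init): execution time plus waiting time.
    return {
        s: exec_times[s] + max(available_times[s], t_init) - t_init
        for s in exec_times
    }


def mct_d(exec_times, available_times, t_init):
    # MCT-D: probability one for the minimum completion time.
    ct = completion_times(exec_times, available_times, t_init)
    best = min(ct, key=ct.get)
    return {s: 1.0 if s == best else 0.0 for s in ct}


def mct_p(exec_times, available_times, t_init):
    # MCT-P: p_j proportional to the inverse of the completion time.
    ct = completion_times(exec_times, available_times, t_init)
    total = sum(1.0 / v for v in ct.values())
    return {s: (1.0 / v) / total for s, v in ct.items()}


et = {"s1": 12.0, "s2": 6.0, "s3": 4.0}
a = {"s1": 0.0, "s2": 3.0, "s3": 10.0}  # expected available times quoted in the bids
print(completion_times(et, a, t_init=2.0))  # {'s1': 12.0, 's2': 7.0, 's3': 12.0}
print(mct_d(et, a, t_init=2.0))             # {'s1': 0.0, 's2': 1.0, 's3': 0.0}
print(mct_p(et, a, t_init=2.0))             # s2 gets 6/13, s1 and s3 get 7/26 each
```

Here $s_2$ wins under MCT-D even though $s_3$ has the shortest execution time, because $s_3$ is busy until time 10; MCT-P instead spreads the load while still favoring $s_2$.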
Algorithm 1. The Dissolve-P algorithm
1: Q'_{i,k} ← sort Q_{i,k} of job j_{i,k} by waiting time, shortest first;
2: w ← w_{i,k};  // remaining workload, not yet dissolved
3: ct ← 0;  // final completion time
4: numOfProviders ← 0;
5: for each s_j ∈ Q'_{i,k}, in sorted order (j = 1, 2, ...) do
6:   wt_{j,i,k} ← max{a_{j,i,k}, t^{init}_{i,k}} − t^{init}_{i,k};
7:   c ← 0;  // aggregate capability of s_1, ..., s_j
8:   for q ← 1 to j do
9:     c ← c + s_q.capability;
10:  end for
11:  if j < |Q'_{i,k}| then
12:    wt_{j+1,i,k} ← max{a_{j+1,i,k}, t^{init}_{i,k}} − t^{init}_{i,k};
13:    if (w/c) > (wt_{j+1,i,k} − wt_{j,i,k}) then  // workload overflows to s_{j+1}
14:      w ← w − c × (wt_{j+1,i,k} − wt_{j,i,k});
15:      ct ← wt_{j+1,i,k};
16:    else
17:      ct ← (w/c) + wt_{j,i,k};
18:      numOfProviders ← j;
19:      break;
20:    end if
21:  else
22:    ct ← (w/c) + wt_{j,i,k};
23:    numOfProviders ← j;
24:  end if
25: end for
26: for j ← 1 to numOfProviders do
27:   wt_{j,i,k} ← max{a_{j,i,k}, t^{init}_{i,k}} − t^{init}_{i,k};
28:   p_{j,i,k} ← (s_j.capability × (ct − wt_{j,i,k})) / w_{i,k};
29: end for
Dissolve-Probabilistic (Dissolve-P): This heuristic is inspired by the way ice cubes dissolve. We treat the workload of a job as an ice cube that can be dissolved by several providers.
Fig. 3 shows an example of the Dissolve-P heuristic. First, the heuristic sorts the providers by their waiting times. Then, it tries to dissolve the workload on provider $s_1$, which offers the minimum waiting time, and checks whether $s_1$ has enough capability to perform the job (Fig. 3(a)). However, $s_1$ does not have enough capability to service the job, because the completion time $ct_1$, i.e., the time at which $s_1$ could complete the job, would be greater than the waiting time $wt_2$ of provider $s_2$. This means the workload would overflow to provider $s_2$, which would have to help service the job. Likewise, in Fig. 3(b), providers $s_1$ and $s_2$ are not capable of servicing the job, because the completion time $ct_2$ would be longer than $wt_3$. The heuristic repeats the process to check whether the workload overflows to provider $s_3$ (Fig. 3(c)). It finds that providers $s_1$, $s_2$ and $s_3$ can service the job, i.e., the completion time $ct_3$ is less than the waiting time $wt_4$. In this way, we can derive the selection probability of each involved provider from its potential contribution to executing the job.
Fig. 3. An example of the Dissolve-P heuristic.
Algorithm 1 details the steps of the Dissolve-P heuristic. Given a set of contactable providers $Q_{i,k}$ of job $j_{i,k}$, suppose that we want to obtain the selection probability $p_{j,i,k}$ of each provider $s_j$ involved in the dissolution process. First, the algorithm sorts the contactable providers for job $j_{i,k}$ in order of waiting time, from the shortest to the longest (line 1). The remaining workload $w$, i.e., the part not yet dissolved, is initially set to the workload of $j_{i,k}$, and $ct$ is the final completion time of the providers involved in executing $j_{i,k}$ (lines 2–3). In line 4, numOfProviders denotes the number of providers involved in the dissolution process.
The for loop of lines 5–25 determines how many providers are involved in the dissolution and the final completion time for executing $j_{i,k}$. Lines 7–9 derive the aggregate capability $c$ of the involved providers, i.e., the current provider and all previously involved ones. Two cases determine the number of involved providers and the final completion time. The condition in line 11 checks whether another provider follows the current one in the sorted list. If not, the algorithm proceeds to the second case (lines 22–23): all contactable providers are involved in the dissolution. In the first case, line 13 checks whether the aggregate capability can finish the remaining workload before the next provider becomes available. If it can (lines 17–19), the number of involved providers and the final completion time are determined. Otherwise, the algorithm subtracts the workload executed by the aggregate capability before the next provider becomes available from the remaining workload (line 14), updates the completion time to the waiting time of the next provider (line 15), and proceeds to the next iteration. In the second for loop (lines 26–29), once the number of involved providers and the final completion time are known, the selection probability of each involved provider is the proportion of the workload it serves, $c_j (ct - wt_{j,i,k})$, to the whole workload of $j_{i,k}$.
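Algorithm 1 translates fairly directly into Python. The sketch below is our own rendering under stated assumptions: providers are identified by hypothetical string ids, capabilities and workloads share one unit so that workload divided by capability is a time, all capabilities are positive, and the aggregate capability $c$ is accumulated incrementally rather than re-summed in an inner loop (lines 7–9), which yields the same value.

```python
def dissolve_p(capabilities, available_times, workload, t_init):
    """Selection probabilities p_{j,i,k} under Dissolve-P (a sketch of Algorithm 1).

    capabilities and available_times map provider ids to c_j and a_{j,i,k}.
    """
    # Line 1: waiting times, and providers sorted from shortest to longest wait.
    wait = {s: max(available_times[s], t_init) - t_init for s in capabilities}
    order = sorted(capabilities, key=wait.get)

    remaining = workload  # w: workload not yet dissolved
    capacity = 0.0        # c: aggregate capability of the involved providers
    completion = 0.0      # ct: final completion time
    involved = []
    for idx, s in enumerate(order):
        capacity += capabilities[s]
        if idx + 1 < len(order):
            gap = wait[order[idx + 1]] - wait[s]
            if remaining / capacity > gap:
                # Lines 13-15: overflow; dissolve what fits before the next provider.
                remaining -= capacity * gap
                completion = wait[order[idx + 1]]
                continue
        # Lines 17-18 or 22-23: the involved providers finish the remaining work.
        completion = remaining / capacity + wait[s]
        involved = order[: idx + 1]
        break

    # Lines 26-29: each involved provider's share of the workload is its probability.
    probs = {s: 0.0 for s in capabilities}
    for s in involved:
        probs[s] = capabilities[s] * (completion - wait[s]) / workload
    return probs


caps = {"s1": 1.0, "s2": 2.0, "s3": 1.0, "s4": 4.0}
avail = {"s1": 0.0, "s2": 2.0, "s3": 4.0, "s4": 10.0}
print(dissolve_p(caps, avail, workload=10.0, t_init=0.0))
# roughly {'s1': 0.45, 's2': 0.5, 's3': 0.05, 's4': 0.0}
```

With these made-up numbers, $s_1$ and $s_2$ cannot finish the job before $s_3$ becomes free at time 4, but $s_1$, $s_2$ and $s_3$ together finish it at $ct = 4.5 < wt_4 = 10$, mirroring Fig. 3(c). Each probability is the provider's share $c_j(ct - wt_{j,i,k})/w_{i,k}$, so they sum to one and $s_4$ is never selected.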
4.2. Complete information about competitors
Under the bidding model, a number of competitors can contend for a resource simultaneously. If the provider reveals information about the competitors, we can schedule the competitors in advance and make proper selection decisions to balance the providers' loads, which would further minimize the turnaround time.
To this end, we propose a mechanism that pre-schedules the competitors and updates the waiting time of each involved provider.
Based on the updated waiting times, we use the previous heuristics to derive appropriate probabilities for resource selection.
Before pre-scheduling competitors, we have to sort the jobs by their time precedence ($\prec$), since we assume that earlier jobs have more opportunities to be allocated first. Fig. 4 shows an example of the sorting process for job $j_{i,k}$. In this example, $Q_{i,k} = \{s_1, \ldots, s_5\}$ are the contactable providers for job $j_{i,k}$ and $P_{i,k} = \{j_1, \ldots, j_8\}$ are the competitors of $j_{i,k}$. If the providers reveal details of the number of competitors to the requester, we can construct a bipartite graph to describe the relationships between providers and competitors, as shown in Fig. 4(a). Furthermore, if the order information of competitors is released, we can construct the precedence links of the competitors for each provider, as shown in Fig. 4(b). Then, we can merge the precedence links into a precedence graph, as shown in Fig. 4(c). Finally, we can use a topological sort [37] to derive the time precedence of competitors ($j_1 \prec j_3 \prec j_4 \prec j_5 \prec j_7 \prec j_8 \prec j_2 \prec j_6$), as shown in Fig. 4(d).
Fig. 4. An example of the process for sorting competitors.
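The merge-and-sort step can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The precedence links below are a hypothetical example, not the links of Fig. 4, and the helper name `order_competitors` is ours; each provider contributes a chain of competitors in the order it bid to them. When several orders are consistent with the links, `static_order()` returns one of them, so the result need not be unique.

```python
from graphlib import TopologicalSorter


def order_competitors(precedence_links):
    """Merge per-provider precedence chains and return one time precedence.

    precedence_links maps a provider id to its competitors in the order it bid.
    """
    graph = TopologicalSorter()
    for chain in precedence_links.values():
        for job in chain:
            graph.add(job)  # make sure jobs without predecessors are included
        for earlier, later in zip(chain, chain[1:]):
            graph.add(later, earlier)  # `earlier` must precede `later`
    # static_order() raises graphlib.CycleError if the links are inconsistent.
    return list(graph.static_order())


links = {  # hypothetical links, not those of Fig. 4
    "s1": ["j1", "j4", "j2"],
    "s2": ["j3", "j4"],
    "s3": ["j1", "j3", "j6"],
}
order = order_competitors(links)
print(" ≺ ".join(order))
```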
After sorting, we know the time precedence of the competitors as well as the providers they may select. We then pre-schedule the competitors from the earliest to the latest using the previously proposed heuristics. For instance, if we apply the MCT-P heuristic in the pre-scheduling stage, each competing job is assumed to use the same heuristic to select providers. Once a competitor selects a provider, the waiting time of that provider is updated to include the execution time of the competitor. The waiting times are updated because these earlier jobs have a chance to be submitted to the selected providers before the requester submits its own job.
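The pre-scheduling mechanism combined with MCT-D (the MCT-D-C variant described next) might look as follows. This is a sketch under simplifying assumptions of ours: all times are measured from the requester's own $t^{init}_{i,k}$, each competitor is given as a hypothetical dict from the providers whose bids it received to its execution time there, and a competitor's job is assumed to reach its chosen provider before the requester's job. The function names `mct_d_choice` and `mct_d_c` are illustrative.

```python
def mct_d_choice(exec_times, available, t_init):
    # MCT-D: provider minimizing et_j + max{a_j, t_init} - t_init.
    return min(
        exec_times,
        key=lambda s: exec_times[s] + max(available[s], t_init) - t_init,
    )


def mct_d_c(own_exec_times, available, t_init, competitors):
    """Pre-schedule competitors with MCT-D, then choose a provider for our job.

    competitors: earliest first; each maps the providers that bid to it to its
    execution time there.
    """
    updated = dict(available)  # working copy of the expected available times
    for comp in competitors:
        choice = mct_d_choice(comp, updated, t_init)
        # The competitor's job is assumed to reach `choice` before ours does.
        updated[choice] = max(updated[choice], t_init) + comp[choice]
    return mct_d_choice(own_exec_times, updated, t_init), updated


own = {"s1": 4.0, "s2": 5.0}
avail = {"s1": 0.0, "s2": 0.0}
earlier = [{"s1": 5.0}, {"s1": 3.0, "s2": 3.0}]
print(mct_d_choice(own, avail, 0.0))     # s1 without pre-scheduling
print(mct_d_c(own, avail, 0.0, earlier))  # ('s2', {'s1': 5.0, 's2': 3.0})
```

Without pre-scheduling, MCT-D picks $s_1$ (completion time 4 versus 5). Once the two earlier competitors are pre-scheduled, $s_1$ is expected to be busy until time 5, so its completion time grows to 9 and $s_2$ (completion time 8) is chosen instead.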
Based on the updated waiting times of the contactable providers, we reuse the previously proposed heuristics to select an appropriate provider for a requester. We propose the following three extensions of the previous heuristics for this level of information.
Minimum Completion Time-Deterministic-Complete (MCT- D-C): In the pre-scheduling phase, we assume that each competitor adopts the MCT-D heuristic to select providers, and uses the competitors’ execution times as input to derive the updated waiting times of their respective providers. After pre-scheduling the competitors by the MCT-D heuristic, the provider that offers the minimum completion time is selected for the job’s execution.
Minimum Completion Time-Probabilistic-Complete (MCT-P-C): In the pre-scheduling phase, we assume that each competitor adopts the MCT-P heuristic to select a provider, and we use the competitors' execution times as input to derive the updated waiting times of their respective providers. After the competitors have been pre-scheduled with the MCT-P heuristic, each contactable provider is selected with a probability proportional to the inverse of its completion time, normalized by the sum of the inverse completion times of all contactable providers.
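Below is a minimal Python sketch of the MCT-P-C selection step. It assumes the completion times are those obtained after pre-scheduling and that all of them are positive. The provider names and values are hypothetical.

```python
import random


def mct_p_probabilities(completion):
    """completion[s]: positive completion time offered by provider s.
    Returns p_s = (1 / CT_s) / sum over t of (1 / CT_t)."""
    inverse = {s: 1.0 / ct for s, ct in completion.items()}
    total = sum(inverse.values())
    return {s: v / total for s, v in inverse.items()}


def select_mct_p(completion, rng=random):
    """Draw one provider at random according to the MCT-P probabilities."""
    probs = mct_p_probabilities(completion)
    providers = list(probs)
    return rng.choices(providers, weights=[probs[s] for s in providers], k=1)[0]


completion = {"A": 10.0, "B": 20.0, "C": 20.0}  # hypothetical, after pre-scheduling
print(mct_p_probabilities(completion))  # roughly A: 0.5, B: 0.25, C: 0.25
print(select_mct_p(completion, random.Random(42)))
```

A provider that finishes twice as fast is twice as likely to be chosen. It does not win every time, which spreads requesters over providers instead of letting all of them converge on the same one.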
Dissolve-Probabilistic-Complete (Dissolve-P-C): In the pre-scheduling phase, we assume that each competitor adopts the Dissolve-P heuristic to select a provider, and we use the competitors' execution times as input to derive the updated waiting times of their respective providers. After the competing jobs have been pre-scheduled with the Dissolve-P heuristic, Algorithm 1 is reapplied to calculate the selection probabilities of the providers; however, the waiting times used in Algorithm 1 must be replaced by the updated waiting times derived in the pre-scheduling process.
4.3. Competitors’ execution times
At this level of information, providers reveal the competitors' execution times, but not the order of the competitors. Without the order information, we cannot derive the time precedence of the competitors, as in Fig. 4(d). To solve this problem, we arrange the competitors in an arbitrary order.
For this information level, we propose three heuristics: Minimum Completion Time-Deterministic-Execution Time (MCT-D-E), Minimum Completion Time-Probabilistic-Execution Time (MCT-P-E) and Dissolve-Probabilistic-Execution Time (Dissolve-P-E). They are similar to the heuristics for Complete Information about Competitors, but the competitors are arranged in an arbitrary order.
4.4. The number of competitors
Requesters know the number of competitors, but neither their order nor their execution times. Without the timing information, we cannot determine the updated waiting times of the providers selected by competitors in the pre-scheduling phase. To solve this problem, we substitute the execution time of the requester's job for those of the competitors. For instance, in Fig. 4, the execution time $et_{1,i,k}$ of job $j_{i,k}$ is used as a substitute for the execution times $et_{1,4}$, $et_{1,5}$ and $et_{1,6}$ of jobs $j_4$, $j_5$ and $j_6$. For this level of information, we propose three heuristics: Minimum Completion Time-Deterministic-Number (MCT-D-N), Minimum Completion Time-Probabilistic-Number (MCT-P-N) and Dissolve-Probabilistic-Number (Dissolve-P-N). They are similar to the heuristics for Competitors' Execution Times, but they use the execution time of the requester's job as a substitute when deriving the updated waiting times of the involved providers in the pre-scheduling phase.
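The Python sketch below illustrates the two workarounds used at the lower information levels, applied to the deterministic (MCT-D-N) case. Competitors are placed in an arbitrary order, and at the number-only level the requester's own per-provider execution time is substituted for each competitor's time. A random permutation is only one possible "arbitrary" order. The bipartite graph and all values are hypothetical, and the helper names are illustrative only. For MCT-D-E, the revealed execution times would be used instead of `substituted_exec_times`.

```python
import random


def substituted_exec_times(competitors, my_exec):
    """Number-only level: assume each competitor needs, on provider s, the
    same time as the requester's own job on s (et_{s,j} := et_{s,i,k})."""
    return {j: dict(my_exec) for j in competitors}


def arbitrary_order(competitors, seed=None):
    """The true order is unknown; a random permutation is one arbitrary choice."""
    order = list(competitors)
    random.Random(seed).shuffle(order)
    return order


def pre_schedule(order, candidates, exec_time, waiting):
    """MCT-D pre-scheduling: each competitor takes its minimum-completion-time
    provider, whose waiting time then grows by that competitor's execution time."""
    updated = dict(waiting)
    for job in order:
        best = min(candidates[job], key=lambda s: updated[s] + exec_time[job][s])
        updated[best] += exec_time[job][best]
    return updated


competitors = [4, 5, 6]
candidates = {4: ["s1"], 5: ["s1", "s2"], 6: ["s2"]}  # hypothetical bipartite graph
my_exec = {"s1": 12.0, "s2": 8.0}                     # requester's job on each provider
times = substituted_exec_times(competitors, my_exec)
idle = {"s1": 0.0, "s2": 0.0}
print(pre_schedule([4, 5, 6], candidates, times, idle))  # {'s1': 12.0, 's2': 16.0}
print(pre_schedule(arbitrary_order(competitors, seed=7), candidates, times, idle))
```

The order matters. With the order [6, 5, 4], the same inputs give {'s1': 24.0, 's2': 8.0}. This shows why the loss of order information can change which provider looks least loaded.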
5. Performance evaluation
5.1. Experiment setup
The objective of the experiments described in this section is threefold: (1) to evaluate the performance of each proposed heuristic under the various levels of information revealed by resource providers; (2) to determine the impact of non-cooperative requesters; and (3) to compare the performance with that of a centralized matchmaking model. For the experiments, we developed a resource selection tool running on Taiwan UniGrid [38] to evaluate the performance of the presented heuristics. Table 2 details the set-up parameters used in the experiments. We selected 20 machines from Taiwan UniGrid as resource providers, as listed in the first column of Table 4. For the job types, we adopted the Linpack benchmark [39,40] and four application benchmarks provided by SciMark2 [41], namely, the Fast Fourier Transform (FFT), Jacobi Successive Over-relaxation (SOR), Monte Carlo Integration (MCI) and Dense LU Matrix Factorization (LU). As shown in Table 3, four of the benchmarks comprised two different problem sizes; MCI, which is not listed in Table 3, had only one. Thus, we had a total of 9 benchmarks, from which the type of each experimental job was selected at random. Table 4 shows a snapshot of the time required to execute each type of job on each selected machine; each value is the average of three executions of that job type.

Table 2
Experiment settings.
Parameter                                        Value
Number of providers ($|S|$)                      20
Number of requesters ($|R|$)                     4, 8, 12, 16 and 20
Number of contactable providers ($|Q_{i,k}|$)    20 and Random(20)
Job arrival rate                                 Negative exponential inter-arrival times (mean = 50 s)
Deadline for waiting bids ($d_i$)                20 s
Experiment period ($T$)                          15 min
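A workload matching the settings in Table 2 can be generated as in the following Python sketch. It assumes that each requester has its own job stream, since the table does not say whether the arrival rate is per requester or system-wide. It also assumes that job types are drawn uniformly from the 9 benchmarks, which is one reading of "selected at random". The names `JOB_TYPES` and `generate_jobs` are hypothetical.

```python
import random

# Nine job types: Linpack, FFT, SOR and LU at two problem sizes, plus MCI.
JOB_TYPES = [f"{name}-{size}"
             for name in ("Linpack", "FFT", "SOR", "LU")
             for size in ("Large", "Small")] + ["MCI"]


def generate_jobs(period_s=15 * 60, mean_interarrival_s=50.0, seed=None):
    """Return (arrival_time_s, job_type) pairs for one requester.

    Inter-arrival times are exponential with the given mean (rate 1/mean);
    each job type is drawn uniformly from JOB_TYPES (assumed)."""
    rng = random.Random(seed)
    jobs, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mean_interarrival_s)
        if t > period_s:
            return jobs
        jobs.append((t, rng.choice(JOB_TYPES)))


# One stream per requester, e.g. 8 requesters:
streams = {r: generate_jobs(seed=r) for r in range(8)}
# Expected total is about 8 * 900 / 50 = 144 jobs; the exact count varies.
print(sum(len(s) for s in streams.values()))
```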
To help providers report the predicted job execution time for each proposed bid, we adopt a weighted-mean prediction function to predict the execution times of jobs that belong to the same job type. Its formal definition is

$$U_p(j_{n+1}) = \sum_{i=1}^{n} w_i \cdot U(j_i), \qquad w_i = \frac{i}{\sum_{j=1}^{n} j}, \qquad (2)$$

where $U_p(j_{n+1})$ is the predicted execution time of the next job and $U(j_i)$ is the actual execution time of the $i$-th of the $n$ previously executed jobs of that type, indexed from the oldest ($i = 1$) to the most recent ($i = n$). Each actual execution time is weighted by how recently the job was executed: the most recent job receives the largest weight, and older jobs receive progressively smaller weights. In our experiments, we take the execution times of the last 5 jobs as historical data to predict the execution time of the next job.
To address the lack of historical data at the beginning of the experiments, we use the execution times listed in Table 4 as historical data to predict the execution times of the first 5 jobs.
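Eq. (2) can be implemented as in the Python sketch below, with a window of the last 5 runs. Filling the initial window with the Table 4 average is an assumed bootstrapping scheme, because the text does not say exactly how the Table 4 values enter the history. `ExecTimePredictor` and `weighted_mean` are hypothetical names. With $n = 5$, the weights are 1/15, 2/15, 3/15, 4/15 and 5/15, so the most recent run contributes one third of the prediction.

```python
from collections import deque


def weighted_mean(history):
    """Eq. (2): history is ordered oldest first, so history[i-1] is U(j_i)
    and receives weight w_i = i / (1 + 2 + ... + n)."""
    n = len(history)
    denominator = n * (n + 1) / 2  # sum_{j=1}^{n} j
    return sum(i * u for i, u in enumerate(history, start=1)) / denominator


class ExecTimePredictor:
    """Predictor for one (machine, job type) pair over the last `window` runs.

    Seeding the window with the Table 4 average is an assumed way to
    bootstrap the first predictions."""

    def __init__(self, table4_average, window=5):
        self.history = deque([table4_average] * window, maxlen=window)

    def predict(self):
        return weighted_mean(self.history)

    def observe(self, actual):
        self.history.append(actual)  # the oldest run drops out automatically


print(weighted_mean([10, 20, 30, 40, 50]))  # 550 / 15, about 36.67
p = ExecTimePredictor(83.09)                # NTU 1, Linpack (large), from Table 4
p.observe(90.0)
print(round(p.predict(), 2))                # 85.39
```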
In the following experiments, the deviation rate of execution time prediction is approximately 7% on average.
We conduct four groups of experiments to evaluate the performance of the proposed heuristics at each level of information. To observe the influence of the system load on performance, we vary the number of requesters from 4 to 20 to represent different relative system loads. Using 4 requesters allows us to assess the performance when only a small number of requesters contend for resources, whereas using 20 requesters allows us to observe the performance when a large number of requesters compete for limited resources. The system load is therefore proportional to the number of requesters that join the system.
We also consider the impact of the number of contactable providers on the performance of resource selection. Specifically, we assess two settings, in which the number of contactable providers is 20 and Random(20), respectively. The different numbers of contactable providers represent different levels of heterogeneity among the providers. $|Q_{i,k}| = |S| = 20$ indicates that the providers are homogeneous in their specifications, except for their computational capability, so that potentially every provider could be accessed by every requester. In contrast, $|Q_{i,k}| = \mathrm{Random}(20) < 20$ simulates a heterogeneous system in which only Random(20) providers, selected randomly from the original 20, can service job $j_{i,k}$.
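The Python sketch below shows one way to draw the contactable set $Q_{i,k}$ for a job in the two settings. The text only states that Random(20) < 20 providers are selected randomly from the original 20. Drawing the set size uniformly from 1 to $|S| - 1$ is therefore an assumption, and the provider names are hypothetical.

```python
import random

PROVIDERS = [f"provider-{n}" for n in range(1, 21)]  # hypothetical names, |S| = 20


def contactable_providers(providers, heterogeneous, rng=random):
    """Return Q_{i,k} for one job.

    Homogeneous case: every provider can serve the job (|Q| = |S|).
    Heterogeneous case: a random subset of size 1..|S|-1 (assumed uniform),
    drawn without replacement."""
    if not heterogeneous:
        return list(providers)
    size = rng.randint(1, len(providers) - 1)
    return rng.sample(providers, size)


rng = random.Random(2009)
print(len(contactable_providers(PROVIDERS, heterogeneous=False)))  # 20
print(len(contactable_providers(PROVIDERS, heterogeneous=True, rng=rng)))
```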
We repeated the evaluation of the proposed resource selection heuristics for each case 10 times and took the average result as the experimental outcome.
5.2. Evaluation results
In this section, we present six sets of evaluation results. First, to determine the distribution of competitors in the experiments, we examine the average number of competitors per job as a function of the number of requesters and the number of contactable providers. Second, we evaluate the performance of the heuristics at each information level and observe the influence of the number of contactable providers. Third, we discuss the effect of the various information levels on the MCT-D and Dissolve-P heuristics. Fourth,
Table 3
Benchmarks and problem sizes (excluding MCI).
Benchmark    Large size    Small size
Linpack      3600          1800
FFT          4 194 304     1 048 576
SOR          5000          2500
LU           4000          2000
Table 4
The average time (in seconds) required to execute each type of job on the selected machines (L = large problem size, S = small problem size; see Table 3).
Machine  Linpack-L  Linpack-S  FFT-L  FFT-S  SOR-L  SOR-S  LU-L  LU-S  MCI
NTCU 1 122.08 15.47 34.03 6.12 6.39 5.85 158.47 20.19 4.54
NTCU 2 100.81 13.74 21.98 5.22 5.85 5.38 113.90 14.80 4.31
NTCU 3 136.63 18.71 31.21 6.70 8.91 7.24 244.57 20.29 6.48
NTCU 4 99.08 14.20 33.93 7.08 9.51 7.67 155.19 16.22 7.01
NTHU 1 130.39 18.44 32.61 6.78 8.42 6.54 151.04 24.73 6.56
NTHU 2 130.38 18.51 33.79 6.45 8.47 6.56 194.66 19.29 6.53
NTHU 3 130.27 18.54 33.84 7.04 8.45 6.55 150.35 19.24 6.53
NTHU 4 110.69 16.00 28.54 5.32 6.27 5.75 120.83 15.77 4.55
NTTU 1 159.47 20.02 25.41 5.64 10.98 4.72 189.96 24.41 4.40
NTTU 2 112.16 16.05 26.74 5.61 7.55 5.84 130.04 16.97 5.85
NTTU 3 112.17 16.05 26.99 5.58 7.52 5.84 129.76 16.96 5.85
NTTU 4 112.43 16.07 26.84 5.57 7.55 5.83 129.98 16.96 5.85
Sinica 1 98.38 13.81 33.24 7.20 17.69 8.83 154.84 16.09 7.01
Sinica 2 85.45 11.90 28.48 5.93 9.38 7.27 114.87 15.06 5.37
Sinica 3 85.50 11.87 28.53 5.47 9.25 7.27 114.98 14.98 5.39
Sinica 4 85.52 11.85 25.71 5.32 13.22 8.66 114.97 15.05 5.40
CJCU 1 822.60 110.13 136.93 32.04 20.78 7.55 892.57 120.21 7.14
NCHC 1 381.39 50.25 61.95 13.21 8.53 5.78 492.10 60.26 4.30
NTU 1 83.09 11.58 22.42 5.18 7.44 6.48 121.04 15.39 7.03
THU 1 350.67 45.10 59.52 14.31 6.61 5.59 443.60 56.37 5.35