Future Generation Computer Systems



journal homepage: www.elsevier.com/locate/fgcs

Dynamic resource selection heuristics for a non-reserved bidding-based Grid environment

Chien-Min Wang^a, Hsi-Min Chen^{b,∗}, Chun-Chen Hsu^c, Jonathan Lee^b

^a Institute of Information Science, Academia Sinica, Taipei, Taiwan

^b Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan

^c Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

Article info

Article history:
Received 19 December 2008
Received in revised form 29 July 2009
Accepted 3 August 2009
Available online 7 August 2009

Keywords: Grid computing; Resource selection; Resource management; Bidding; Matchmaking

Abstract

A Grid system is comprised of large sets of heterogeneous and geographically distributed resources that are aggregated as a virtual computing platform for executing large-scale scientific applications.

As the number of resources in Grids increases rapidly, selecting appropriate resources for jobs has become a crucial issue. To avoid single point of failure and server overload problems, bidding provides an alternative means of resource selection in distributed systems. However, under the bidding model, the key challenge of resource selection is that there is no global information system to facilitate optimum decision-making; hence requesters can only obtain partial information revealed by resource providers.

To address this problem, we propose a set of resource selection heuristics to minimize the turnaround time in a non-reserved bidding-based Grid environment, while considering the level of information about competing jobs revealed by providers. We also present the results of experiments conducted to evaluate the performance of the proposed heuristics.

1. Introduction

With the rapid growth in the number of PCs and clusters, Grid computing technologies have emerged to facilitate resource sharing and the coordination of problem solving in distributed systems [1,2]. Such systems consist of large sets of heterogeneous and geographically distributed resources that are aggregated as a virtual computing platform for executing large-scale scientific applications. As the number of resources in Grids increases rapidly, selecting appropriate resources for jobs has become a crucial issue.

In essence, Grid resources are heterogeneous and managed independently by different organizations, and resource providers can specify their own access policies for sharing resources and joining/leaving Grids dynamically. Thus, exploiting previous cluster-based scheduling heuristics [3–7] to allocate tasks through a centralized manager or mapper is not feasible.

In recent years, many matchmaking-based technologies have been proposed to address the issue of Grid resource management [8–15]. Fig. 1(a) presents an abstract matchmaking model generalized from these technologies. However, the matchmaking technique may cause a matchmaker overload problem. Since a resource matchmaker is responsible for registering all resource states advertised by providers and executing matching algorithms, an increase in the number of resources and the frequency of job requests creates a performance bottleneck. Moreover, resource states may change minute by minute due to requesters’ activities or resource failures, so the matchmaking technique may fail to reflect the dynamic nature of Grid resources. This is because matchmaking is a push-based model in which a matchmaker does not learn about the changes in resource states until the resource providers publish their new states. In consequence, matchmaking may return inaccurate results.

∗ Corresponding author.
E-mail address: [email protected] (H.-M. Chen).

To avoid single point of failure, matchmaker overload and expired resource information, bidding provides an alternative means of resource allocation in distributed systems [16–24].

Fig. 1(b) depicts the abstract process of the bidding model. A resource requester starts a bidding process by sending a set of call-for-proposal (CFP) requests, which contain job requirements, to resource providers. Then, based on their resource utilization and policies, the providers decide whether or not to participate in the bidding process. If they join the bidding process, they return bids that describe the states of their resources to the requester. Finally, the requester evaluates and ranks the collected bids based on its selection strategy and submits the job to the provider that proposes the best-ranked bid. The bidding model has the following advantages over the matchmaking model. (1) Scalability: resource allocations between providers and requesters in the bidding model are fully distributed without the intervention of a centralized matchmaker/broker. (2) Autonomy: requesters themselves can determine which of the offered resources are best suited to execute their jobs, while providers can contribute their resources according to their sharing policies and report up-to-date state information. (3) Reliability: if a resource fails during a job’s execution, the requester can select other candidate resources from the received bids.

Fig. 1. (a) The abstract process of the matchmaking model. (b) The abstract process of the bidding model.

doi:10.1016/j.future.2009.08.003

Under the bidding model, resource providers can usually choose between two bidding strategies, reserved and non-reserved bidding [25]. Providers who adopt the reserved strategy keep the resources for each bid as commitments to guarantee future resource states. However, if the requester subsequently rejects the bid, the reserved resources will be wasted. In this scenario, other requesters may be prepared to accept the bid before the original requester rejects it; hence, there is a high probability that the provider will miss the opportunity to serve other requesters with reserved resources. In contrast to the reserved strategy, resource providers who adopt the non-reserved option offer the same resource states to a set of requesters without reserving resources for each bid. This strategy enables providers to fully utilize their resources, but it does not guarantee the resource states. If requesters receiving the same bids submit jobs to the provider simultaneously, they will have to compete for the resources so that the job completion time may not be as expected.

In addition to the above strategies, the bidding model allows providers to reveal different levels of information about competing jobs to requesters. As shown in Fig. 1(b), after resource providers receive CFPs from requesters, they can simply reveal the capabilities of the provided resources, provide information about the number of competitors, or give even more complete information about the competitors. The level of information revealed is an important factor that affects the performance of resource selection for a job’s execution.

In this paper, our objective is to minimize the turnaround time of jobs in a non-reserved bidding-based Grid environment. The turnaround time covers the period from the time a job arrives to the receipt of the executed result. In online systems, users are more sensitive to the turnaround time than the execution time, waiting time or makespan [26]. To minimize the turnaround time in this model, we propose a set of deterministic and probabilistic resource selection heuristics. In contrast to traditional centralized scheduling problems, the key challenge of resource selection in the bidding model is that there is no global information system to facilitate optimum decision-making; hence, requesters are only aware of partial information released by resource providers. Thus, we consider various levels of information about competitors in the proposed heuristics. We want to determine whether requesters could make better scheduling decisions if they have more information about the states of competing jobs.

We conduct experiments to evaluate the performance of the heuristics for various levels of information and the impact of non-cooperative requesters. The experimental results show that the performance of the Dissolve-P heuristic is superior to that of other heuristics when information about competitors is not provided. However, the MCT-D heuristics outperform the other heuristics when information about the execution times of competitors is provided. We also find that the level of information has a significant effect on the performance of the MCT-D based heuristics, but it does not influence the Dissolve-P based heuristics. Furthermore, requesters who adopt cooperative resource selection strategies achieve better results than those that use non-cooperative strategies.

The contributions of this paper are as follows. (1) To the best of our knowledge, this is the first study of the resource selection issue in an online non-reserved bidding-based Grid system that focuses on minimizing the turnaround time. (2) To address this issue, we propose a set of probabilistic and deterministic resource selection heuristics, as well as a pre-scheduling mechanism, and evaluate their performance. (3) The proposed heuristics consider various levels of information about competing jobs. (4) We examine the impact of cooperative requesters and non-cooperative requesters on the performance of Grid resource selection.

The remainder of the paper is organized as follows. Section 2 contains a review of the literature on resource selection. In Section 3, we formally define the problem considered in this research. Section 4 presents the proposed heuristics for the various levels of information revealed by providers. We describe the simulation setup and evaluate the performance of the proposed heuristics in Section 5. Then, in Section 6, we summarize our conclusions.

2. Related work

A number of resource management approaches have been proposed in various Grid projects. Globus Toolkit [27], the most popular Grid middleware, integrates distributed computing resources and provides a set of management tools, such as security, data management, information services and execution management. In addition, for resource management, Globus provides MDS (Monitoring and Discovery Service) [28] to support the discovery and monitoring of resources, services, and computations, and GRAM (Grid Resource Allocation and Management) [29] combined with RSL (Resource Specification Language) for resource allocation tasks. However, Globus only allows users to specify basic configurations, such as the file path, maximum CPU power, required memory, and wall clock time. It does not support job matching/scheduling at the global level; instead it leaves the task to the development of an upper-layer service. The bidding model and the proposed heuristics can be constructed as a high-level resource management service on top of Globus.

Condor matchmaker [8,11] is another well-known resource management framework designed for high-throughput computing in Grids. Under this framework, providers and consumers describe their respective capabilities and requirements in classified advertisements (classads), which are pushed to a central matchmaker that does the matching. One of the key features of the framework is that it considers different levels of sharing policies, which are specified in the providers’ advertisements. The approaches in [9,10,12] are extensions of the Condor matchmaker for handling specific requirements. However, Condor is based on a centralized matchmaking model in which the problem of matchmaker overload may occur. Moreover, matchmaking decisions are made by checking the resource states kept by the matchmaker, but those states may not be consistent with the real states of resources. Therefore, the matching results provided to requesters may be incorrect.

Fig. 2. An example of the postponement phenomenon in the non-reserved bidding model.

In contrast to the centralized matchmaking-based approaches, many bidding-related studies have been conducted in the field of distributed systems [17]. For example, Xiao et al. [24] presented a bidding-based resource management mechanism called a P2P decentralized scheduling framework. Based on the mechanism, they proposed an incentive-based scheduling scheme to maximize the success rate of job executions and minimize the fairness deviation among resources. In [20], the authors proposed two contract-net based resource selection policies to increase the number of jobs completed successfully according to the given budgets and deadlines. Das Anubhav et al. [19] introduced a combinatorial auction-based resource allocation protocol in which a user bids a price value for each of a combination of resources available for a task’s execution. The CORA (Coallocative, Oversubscribing Resource Allocation) [16] architecture is a market-based resource reservation system that utilizes the trustworthy Vickrey auction to make combinatorial allocations of resources. These approaches focus on devising economic Grid methods based on the trading prices for resources used to achieve various goals. Although we adopt a similar bidding scheme, unlike these approaches, our objective is to select appropriate resources for requesters and also improve the performance by minimizing the turnaround time in a non-reserved bidding-based Grid environment.

Surfer [22] is a resource selection and ranking framework that adopts a pull-based protocol (a simplified bidding model) to extract the highest ranked resources. A pull-based model allows requesters to obtain dynamic information directly from providers, but it only provides a general resource selection framework and it is neutral in terms of selection policies. [21] presents an agent-based resource selection mechanism that splits the Grid scheduling process into two phases. In the discovery phase, resources that do not satisfy static resource requirements are filtered out. Then, in the second phase, requesters negotiate directly with providers to determine the current state of the remaining resources and select those suitable for the job’s execution. Unlike our work, the approach in [21] focuses on the benefit to individual requesters instead of all requesters, and it assumes that each provider adopts the reserved resource model.

3. Problem statement

Suppose that $R = \{r_1, r_2, \ldots, r_m\}$ requesters and $S = \{s_1, s_2, \ldots, s_n\}$ providers are given in a non-reserved bidding-based Grid system. A requester $r_i \in R$ submits $J_i = \{j_{i,1}, j_{i,2}, \ldots, j_{i,l_i}\}$ jobs dynamically within a given time period $T$. The arrival rates of jobs are different for each requester. Each job $j_{i,k} \in J_i$ has a workload $w_{i,k}$, which is included in CFP messages and sent to a set of providers. After sending CFP messages, the requester $r_i$ is given a deadline $d_i$ for current bids, and bids received after the deadline will be ignored. Once the deadline has passed, requester $r_i$ starts evaluating bids within a time interval $e_i$ and finally submits a job to the selected provider. The initiation time $t^{init}_{i,k}$ is the point at which job $j_{i,k}$ arrives. Note that since we assume this is an online system where jobs arrive dynamically, the workload $w_{i,k}$ and initiation time $t^{init}_{i,k}$ are not known a priori.

We assume each provider $s_j \in S$ manages a computing resource that has a given CPU capability $c_j$. A bid proposed by a provider $s_j$ in reply to a CFP from requester $r_i$ includes an expected available time $a_{j,i,k}$ and a predicted execution time $et_{j,i,k}$, which is approximately $w_{i,k}/c_j$, for executing job $j_{i,k}$. The expected available time is the time at which a provider finishes the execution of all accepted jobs. As mentioned previously, we focus on the non-reserved bidding model in which providers do not reserve resources for each bid proposed by them; therefore, the expected available time is not updated by a provider until it actually receives a job. In other words, each provider proposes bids with the same expected available time for each CFP before it receives a job. Thus, we define the actual available time $u_{j,i,k}$ as the point at which provider $s_j$ starts executing job $j_{i,k}$. Because resources are not reserved for each bid, the available time may be postponed, i.e., $u_{j,i,k} \geq a_{j,i,k}$.

Fig. 2 shows an example of the postponement phenomenon in the non-reserved bidding model. Three job requests $j_m$, $j_{m+1}$ and $j_{m+2}$ were included in $CFP_m$, $CFP_{m+1}$ and $CFP_{m+2}$, respectively, and sent to provider $s_j$. Because the provider did not accept any jobs between the times that $CFP_m$ and $CFP_{m+2}$ were received, under the non-reserved bidding model, the three requesting jobs $j_m$, $j_{m+1}$ and $j_{m+2}$ were allocated, respectively, bids $b_n$, $b_{n+1}$ and $b_{n+2}$ with the same expected available time $a_{j,n}$. Since the requester of job $j_m$ decided to submit its job before the requester of job $j_{m+1}$, job $j_m$ can be executed at actual available time $u_m$, which is equal to the expected available time $a_{j,n}$. However, because the time slot after $a_{j,n}$ had been allocated to job $j_m$, the actual available time for executing job $j_{m+1}$ would be postponed to $u_{m+1}$, i.e., $a_{j,n} + et_{j,m}$. Thus, only one of the competitors, which received bids with the same expected available time, can be executed at the proposed expected available time, and the execution times of others will be deferred.
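The postponement effect can be illustrated with a minimal Python sketch. It assumes that every job holding a bid with the same expected available time is submitted to the provider before that time and is executed in submission order; the function name `actual_start_times` and the sample numbers are hypothetical and not taken from the paper.

```python
def actual_start_times(expected_available, exec_times):
    """Return the actual start time u of each job on a single provider.

    expected_available: the common expected available time a_{j,n} quoted
        in every bid (assumption: no job starts before it).
    exec_times: predicted execution times et of the submitted jobs, listed
        in the order in which the requesters submitted them.
    """
    starts = []
    next_free = expected_available
    for et in exec_times:
        starts.append(next_free)  # the job starts when the provider is free
        next_free += et           # later jobs are postponed by this job
    return starts


# Three jobs received bids with the same expected available time 10.0.
print(actual_start_times(10.0, [4.0, 2.0, 5.0]))  # [10.0, 14.0, 16.0]
```

Only the first job starts at the quoted time; each later job starts at the quoted time plus the execution times of the jobs submitted before it, which matches $u_{m+1} = a_{j,n} + et_{j,m}$ in the example.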

Competitors $P_{j,i,k}$ of job $j_{i,k}$ are the requesting jobs that receive bids with the same expected available time from provider $s_j$, and each one receives its bid before job $j_{i,k}$. For example, there is no competitor of job $j_m$ in Fig. 2. Job $j_{m+1}$ has one competitor contending for the resource of provider $s_j$, i.e., $P_{j,m+1} = \{j_m\}$, whereas job $j_{m+2}$ has two competitors contending for the resource of provider $s_j$, i.e., $P_{j,m+2} = \{j_m, j_{m+1}\}$. We also define the order of competitors as the time precedence ($\prec$) in which a provider proposes the bids with the same expected available time to competitors. For example, in Fig. 2, the order of competitors $P_{j,m+2}$ is $(j_m \prec j_{m+1})$, so provider $s_j$ sent a bid to the requester of job $j_m$ before sending it to the requester of job $j_{m+1}$.

Not all providers in the system are capable of proposing bids to requesters to satisfy job requests. Hence, we define that there are $Q_{i,k} \subseteq S$ contactable providers in the system (a.k.a. feasible machines in [30]) that can propose bids to requester $r_i$ for job $j_{i,k}$. A list of contactable providers can be obtained from lightweight/hierarchical matchmakers [31,32] or by employing peer discovery technologies of P2P [24,33].

Table 1
The notations used for the turnaround time and the presented heuristics.

| Symbol | Description |
|---|---|
| $R$ | The set of requesters $\{r_1, r_2, \ldots, r_m\}$ in the system. |
| $S$ | The set of providers $\{s_1, s_2, \ldots, s_n\}$ in the system. |
| $r_i$ | A requester, where $r_i \in R$. |
| $s_j$ | A provider, where $s_j \in S$. |
| $c_j$ | The CPU capability of provider $s_j$. |
| $T$ | The time period during which requesters submit jobs. |
| $J_i$ | The set of jobs $\{j_{i,1}, j_{i,2}, \ldots, j_{i,l_i}\}$ that requester $r_i$ submits during $T$. |
| $j_{i,k}$ | A job submitted by requester $r_i$, where $j_{i,k} \in J_i$. |
| $w_{i,k}$ | The workload of job $j_{i,k} \in J_i$. |
| $t^{init}_{i,k}$ | The time at which job $j_{i,k}$ arrives. |
| $et_{j,i,k}$ | The execution time ($\approx w_{i,k}/c_j$), predicted by provider $s_j$, for executing $j_{i,k}$. |
| $a_{j,i,k}$ | The expected available time at which provider $s_j$ expects to start executing job $j_{i,k}$. |
| $u_{j,i,k}$ | The actual time at which provider $s_j$ starts executing job $j_{i,k}$. |
| $p_{j,i,k}$ | The probability of selecting resource $s_j$ to execute job $j_{i,k}$. |
| $Q_{i,k}$ | The contactable providers for job $j_{i,k}$. |
| $P_{j,i,k}$ | The competitors of job $j_{i,k}$ that contend for provider $s_j$. |

Recall that, in the bidding model, requesters do not have global information about other requesters’ selection decisions. Therefore, if all requesters greedily select the same provider, e.g., the one with the most powerful CPU capability or the minimum completion time, the load on that provider would become imbalanced. To address the problem, we adopt a probabilistic concept for allocating jobs, whereby the most powerful providers can execute more jobs, but the less powerful ones can still be employed. Suppose that the probability of selecting provider $s_j$ to execute job $j_{i,k}$ is $p_{j,i,k}$. Under the non-reserved bidding model, given a time period $T$ in which requesters dynamically generate jobs for submission to providers, we try to allocate the jobs such that the total turnaround time for all jobs will be minimized; that is, we try to find an appropriate $p_{j,i,k}$. The notations used for turnaround time and presented heuristics are listed in Table 1.

Eq. (1) defines the formal objective function that we want to minimize. The job execution time is $et_{j,i,k}$ and the waiting time is $u_{j,i,k} - t^{init}_{i,k}$, which is the actual available time minus the initiation time. Similar to [34,35], we focus on the job execution time and the waiting time when selecting resources, so we assume that the machines are interconnected with high-speed links. Thus, the sum of the waiting time and the job execution time is the turnaround time.

$$\sum_{i=1}^{m} \sum_{k=1}^{l_i} \left( et_{j,i,k} + u_{j,i,k} - t^{init}_{i,k} \right), \quad \text{where } j_{i,k} \text{ is assigned to } s_j. \tag{1}$$
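As an illustration of Eq. (1), the following Python sketch sums the turnaround times of a set of completed jobs. The `JobRecord` type and its field names are hypothetical and introduced only for this example; each record is assumed to hold the values observed on the provider to which the job was assigned.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    init_time: float   # t^init_{i,k}: when the job arrived
    start_time: float  # u_{j,i,k}: when the assigned provider started it
    exec_time: float   # et_{j,i,k}: execution time on that provider


def total_turnaround(records):
    """Objective of Eq. (1): execution time plus waiting time, over all jobs."""
    return sum(r.exec_time + (r.start_time - r.init_time) for r in records)


jobs = [
    JobRecord(init_time=0.0, start_time=10.0, exec_time=4.0),  # turnaround 14
    JobRecord(init_time=3.0, start_time=14.0, exec_time=2.0),  # turnaround 13
]
print(total_turnaround(jobs))  # 27.0
```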

4. Heuristics

Resource selection in the non-reserved bidding-based model presents two major challenges. The first is the lack of a global information system to facilitate optimum decision-making, so a requester cannot determine whether its competitors are selecting the resource it requests. The second challenge is that, since we assume the non-reserved bidding model works in an online system, the job arrival time and job workload are not known a priori. To address these challenges, we propose a set of resource selection heuristics for various levels of information released by resource providers under this model. Specifically, we consider four levels of information:

(1) No Competitors’ Information: Only information about predicted job execution time and expected available time is revealed by providers.

(2) The Number of Competitors: Besides the above information, providers list the number of competitors.

(3) Competitors’ Execution Times: In addition to the above information, providers report the execution times of competitors.

(4) Complete Information about Competitors: Besides the above information, providers release the order of competing jobs.

Previous works [34,36] proposed Minimum Execution Time and Minimum Completion Time strategies to facilitate centralized task allocation. However, because the non-reserved bidding model does not provide centralized control, the resource loads would become imbalanced if all requesters greedily select the same resource. Therefore, to allocate jobs, we propose a set of heuristics based on the notion of probability. To find an appropriate probability to minimize the turnaround time, the probability $p_{j,i,k}$ of allocating job $j_{i,k}$ to provider $s_j$ is derived by each proposed heuristic. We also consider the extreme case of probabilistic resource selection, i.e., the probability of selecting the most preferred provider is one, and the probability of selecting the others is zero. We call the extreme case Deterministic selection and other cases Probabilistic selection. For ease of presentation, we discuss the heuristics in the following order: No Competitors’ Information, Complete Information about Competitors, Competitors’ Execution Times, and The Number of Competitors.

4.1. No competitors’ information

For the level of no competitors’ information, we propose one random and three probabilistic resource selection heuristics in addition to the Minimum Execution Time and the Minimum Completion Time. We use the postfixes ‘‘-D’’ and ‘‘-P’’ to distinguish deterministic strategies from probabilistic strategies. The formal definition of each strategy is as follows.

Random selection (RANDOM): For a job $j_{i,k}$, the probability of selecting one of the contactable providers $Q_{i,k}$ is calculated as follows:

$$p_{j,i,k} = \frac{1}{|Q_{i,k}|}.$$

Minimum Execution Time-Deterministic (MET-D): The provider that offers the minimum execution time for the job $j_{i,k}$ is selected. The formulation of the MET-D heuristic is as follows:

$$p_{j,i,k} = \begin{cases} 1, & \text{if } et_{j,i,k} \text{ is minimum } \forall s_j \in Q_{i,k}, \\ 0, & \text{otherwise.} \end{cases}$$

Minimum Execution Time-Probabilistic (MET-P): For the job $j_{i,k}$, the probability of selecting one of the contactable providers $Q_{i,k}$ is proportional to the CPU capability of the provider over that of all contactable providers. The formulation of the MET-P heuristic is as follows:

$$p_{j,i,k} = \frac{\dfrac{1}{et_{j,i,k}}}{\sum\limits_{\forall s_n \in Q_{i,k}} \dfrac{1}{et_{n,i,k}}}.$$
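The RANDOM, MET-D and MET-P rules can be sketched in Python as follows. This is only an illustrative reading of the formulas: the function names, the dictionary representation of bids, and the tie-breaking in `met_d` (the first provider with the minimum execution time wins) are assumptions, as is the use of `random.Random.choices` to draw one provider from the resulting distribution.

```python
import random


def random_selection(providers):
    """RANDOM: every contactable provider gets probability 1/|Q|."""
    p = 1.0 / len(providers)
    return {s: p for s in providers}


def met_d(exec_times):
    """MET-D: probability 1 for the provider with the minimum execution time."""
    best = min(exec_times, key=exec_times.get)  # assumption: first minimum wins ties
    return {s: 1.0 if s == best else 0.0 for s in exec_times}


def met_p(exec_times):
    """MET-P: probability proportional to 1/et, normalized over all providers."""
    inverse = {s: 1.0 / et for s, et in exec_times.items()}
    total = sum(inverse.values())
    return {s: v / total for s, v in inverse.items()}


def draw_provider(probabilities, rng):
    """Draw one provider according to the selection probabilities."""
    names = list(probabilities)
    weights = [probabilities[s] for s in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Predicted execution times et_{j,i,k} quoted in the bids (sample values).
exec_times = {"s1": 2.0, "s2": 4.0, "s3": 4.0}
print(met_d(exec_times))  # {'s1': 1.0, 's2': 0.0, 's3': 0.0}
print(met_p(exec_times))  # {'s1': 0.5, 's2': 0.25, 's3': 0.25}
print(draw_provider(met_p(exec_times), random.Random(0)))
```

Because $et_{j,i,k} \approx w_{i,k}/c_j$, weighting by $1/et_{j,i,k}$ is the same as weighting by CPU capability, which is why the fastest provider in the sample receives half of the probability mass.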

Minimum Completion Time-Deterministic (MCT-D): The provider that offers the minimum completion time, i.e., the waiting time plus the execution time, for job $j_{i,k}$ is selected. The formulation of the MCT-D heuristic is as follows:

$$p_{j,i,k} = \begin{cases} 1, & \text{if } et_{j,i,k} + \max\{a_{j,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k} \text{ is minimum } \forall s_j \in Q_{i,k}, \\ 0, & \text{otherwise.} \end{cases}$$


Minimum Completion Time-Probabilistic (MCT-P): For the job $j_{i,k}$, the probability of selecting one of the contactable providers $Q_{i,k}$ is proportional to the inverse of the completion time of the provider over that of all contactable providers. The formulation of the MCT-P heuristic is as follows:

$$p_{j,i,k} = \frac{\dfrac{1}{et_{j,i,k} + \max\{a_{j,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k}}}{\sum\limits_{\forall s_n \in Q_{i,k}} \dfrac{1}{et_{n,i,k} + \max\{a_{n,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k}}}.$$
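A short Python sketch of the completion-time rules follows. It assumes each bid is represented as a pair (predicted execution time, expected available time); this representation, the function names and the sample values are hypothetical.

```python
def completion_time(exec_time, expected_available, t_init):
    """Execution time plus waiting time, max{a, t_init} - t_init."""
    return exec_time + max(expected_available, t_init) - t_init


def mct_d(bids, t_init):
    """MCT-D: probability 1 for the provider with the minimum completion time."""
    ct = {s: completion_time(et, a, t_init) for s, (et, a) in bids.items()}
    best = min(ct, key=ct.get)  # assumption: first minimum wins ties
    return {s: 1.0 if s == best else 0.0 for s in ct}


def mct_p(bids, t_init):
    """MCT-P: probability proportional to the inverse completion time."""
    inverse = {
        s: 1.0 / completion_time(et, a, t_init) for s, (et, a) in bids.items()
    }
    total = sum(inverse.values())
    return {s: v / total for s, v in inverse.items()}


# bids: provider -> (et, a); the job arrives at t_init = 6.
bids = {"s1": (2.0, 10.0), "s2": (4.0, 0.0)}
print(mct_d(bids, 6.0))  # {'s1': 0.0, 's2': 1.0}
print(mct_p(bids, 6.0))  # s1 gets about 0.4, s2 about 0.6
```

In the sample, $s_1$ is the faster machine but is busy until time 10, so its completion time is 6 while the idle $s_2$ offers 4; MET-based rules would prefer $s_1$, whereas the MCT-based rules prefer $s_2$.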

Algorithm 1 The algorithm of Dissolve-P

 1: $\overrightarrow{Q}_{i,k}$ ← sort $Q_{i,k}$ of job $j_{i,k}$ by waiting time;
 2: $w = w_{i,k}$;
 3: $ct = 0$;
 4: numOfProviders = 0;
 5: for all $s_j \in \overrightarrow{Q}_{i,k}$ do
 6:     $wt_{j,i,k} = \max\{a_{j,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k}$;
 7:     $c = s_j$.capability;
 8:     for $n$ ← 1 to $j - 1$ do
 9:         $c$ += $s_n$.capability;
10:     end for
11:     if $j < |\overrightarrow{Q}_{i,k}|$ then
12:         $wt_{j+1,i,k} = \max\{a_{j+1,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k}$;
13:         if $(w/c) > (wt_{j+1,i,k} - wt_{j,i,k})$ then
14:             $w = w - c \times (wt_{j+1,i,k} - wt_{j,i,k})$;
15:             $ct = wt_{j+1,i,k}$;
16:         else
17:             $ct = (w/c) + wt_{j,i,k}$;
18:             numOfProviders = $j$;
19:             break;
20:         end if
21:     else
22:         $ct = (w/c) + wt_{j,i,k}$;
23:         numOfProviders = $j$;
24:     end if
25: end for
26: for $j$ ← 1 to numOfProviders do
27:     $wt_{j,i,k} = \max\{a_{j,i,k}, t^{init}_{i,k}\} - t^{init}_{i,k}$;
28:     $p_{j,i,k} = (s_j.\text{capability} \times (ct - wt_{j,i,k})) / w_{i,k}$;
29: end for

Dissolve-Probabilistic (Dissolve-P): This heuristic is inspired by the way ice cubes dissolve. We treat the workload of a job as an ice cube that can be dissolved by several providers. Fig. 3 shows an example of the Dissolve-P heuristic. First, it sorts the providers based on their waiting times. Then, it tries to dissolve the workload on provider $s_1$, which offers the minimum waiting time, and checks if $s_1$ has enough capability to perform the job (Fig. 3(a)). However, $s_1$ does not have enough capability to service the job because the completion time $ct_1$, i.e., the time at which $s_1$ could complete the job, would be greater than the waiting time, $wt_2$, of provider $s_2$. This means the workload would overflow to provider $s_2$, which would have to help service the job. Likewise, in Fig. 3(b), providers $s_1$ and $s_2$ are not capable of servicing the job because completion time $ct_2$ would be longer than $wt_3$. The heuristic repeats the process to check if the workload overflows to provider $s_3$ (Fig. 3(c)). It finds that providers $s_1$, $s_2$ and $s_3$ can service the job, i.e., the completion time $ct_3$ is less than the waiting time $wt_4$. In this way, we can derive the selection probability of each involved provider from its potential contribution to executing the job.

Fig. 3. An example of the Dissolve-P heuristic.

Algorithm 1 details the steps of the Dissolve-P heuristic. Given a set of contactable providers $Q_{i,k}$ of job $j_{i,k}$, suppose that we want to obtain the selection probability $p_{j,i,k}$ of provider $s_j$ involved in the dissolution process. First, the algorithm sorts the contactable providers for job $j_{i,k}$ in order of waiting time from the shortest to the longest (line 1). The remaining workload, $w$, which is not yet dissolved, is initially set as the workload of $j_{i,k}$, and $ct$ is the final completion time of the providers involved in executing $j_{i,k}$ (lines 2–3). In line 4, numOfProviders denotes the number of providers involved in the dissolution process.

The for loop of lines 5–25 is used to determine how many providers are involved in the dissolution and the final completion time for executing $j_{i,k}$. The aggregate capability, $c$, of the involved providers, comprising the current and previously involved providers, is derived in lines 7–9. We use two cases to determine the number of involved providers and the final completion time. The condition in line 11 checks whether any provider remains after the current one, i.e., whether a subset of the contactable providers may have sufficient capability to serve $j_{i,k}$. If no provider remains, the algorithm proceeds to the second case (lines 22–23), indicating that all contactable providers are involved in the dissolution. In the first case, if the aggregate capability is sufficient to serve $j_{i,k}$ before the next provider becomes available (lines 17–19), the number of involved providers and the final completion time can be determined. Otherwise, the algorithm subtracts the workload executed by the aggregate capability from the remaining workload (line 14), updates the completion time to the waiting time of the next provider (line 15), and proceeds to the next iteration. In the second for loop (lines 26–29), after the number of involved providers and the final completion time are determined, the selection probability of each involved provider can be obtained from the proportion of the workload it serves to the whole workload of $j_{i,k}$.
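The dissolution process described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the tuple representation of providers, the function name `dissolve_p` and the sample values are assumptions, and the aggregate capability is accumulated incrementally instead of being recomputed by an inner loop.

```python
def dissolve_p(workload, providers, t_init):
    """Selection probabilities of the providers involved in the dissolution.

    providers: list of (name, capability, expected_available_time) tuples.
    Returns a dict mapping each involved provider to its probability.
    """
    # Sort providers by waiting time, shortest first.
    ranked = sorted(
        (max(a, t_init) - t_init, name, cap) for name, cap, a in providers
    )
    remaining = workload  # workload not yet dissolved
    agg_cap = 0.0         # capability of all providers involved so far
    ct = 0.0              # final completion time, relative to t_init
    involved = 0
    for idx, (wt, _name, cap) in enumerate(ranked):
        agg_cap += cap
        involved = idx + 1
        if idx + 1 < len(ranked):
            gap = ranked[idx + 1][0] - wt
            if remaining / agg_cap > gap:
                # Workload overflows to the next provider.
                remaining -= agg_cap * gap
                ct = ranked[idx + 1][0]
                continue
        # The involved providers finish the job (or none are left).
        ct = wt + remaining / agg_cap
        break
    # Each provider's share is the work it performs between wt and ct.
    return {
        name: cap * (ct - wt) / workload for wt, name, cap in ranked[:involved]
    }


providers = [("s1", 1.0, 0.0), ("s2", 2.0, 2.0), ("s3", 1.0, 3.0), ("s4", 1.0, 10.0)]
probs = dissolve_p(10.0, providers, t_init=0.0)
print(probs)                # s1, s2 and s3 are involved; s4 is not
print(sum(probs.values()))  # the shares add up to 1 (up to rounding)
```

In this sample, $s_1$ alone dissolves 2 units before $s_2$ becomes available, $s_1$ and $s_2$ together dissolve 3 more before $s_3$ joins, and the three providers finish the remaining 5 units at time 4.25, well before $s_4$ is available at time 10.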

4.2. Complete information about competitors

Under the bidding model, a number of competitors can contend for a resource simultaneously. If the provider reveals information about the competitors, we could schedule the competitors in advance and make proper selection decisions to balance the providers’ loads, which would further minimize turnaround time. To this end, we propose a mechanism that pre-schedules the competitors and updates the waiting time of each involved provider. Based on the updated waiting times, we use the previous heuristics to derive appropriate probabilities for resource selection.

Before pre-scheduling competitors, we have to sort the jobs by their time precedence ($\prec$), since we assume that the earlier jobs have more opportunities to be allocated first. Fig. 4 shows an example of the sorting process for job $j_{i,k}$. In this example, $Q_{i,k} = \{s_1, \ldots, s_5\}$ are the contactable providers for job $j_{i,k}$ and $P_{i,k} = \{j_1, \ldots, j_8\}$ are the competitors of $j_{i,k}$. If the providers reveal details of the number of competitors to the requester, we can construct a bipartite graph to describe the relationships between providers and competitors, as shown in Fig. 4(a). Furthermore, if the order information of competitors is released, we can construct the precedence links of the competitors for each provider, as shown in Fig. 4(b). Then, we can merge the precedence links into a precedence graph, as shown in Fig. 4(c). Finally, we can use a topological sort [37] to derive the time precedence of competitors $(j_1 \prec j_3 \prec j_4 \prec j_5 \prec j_7 \prec j_8 \prec j_2 \prec j_6)$, as shown in Fig. 4(d).

Fig. 4. An example of the process for sorting competitors.
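The merge-and-sort step can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The per-provider precedence chains below are hypothetical and do not reproduce the links of Fig. 4; the function name `merge_precedence` is likewise an assumption. A topological order is generally not unique, so the sketch only guarantees that every per-provider precedence is respected.

```python
from graphlib import TopologicalSorter


def merge_precedence(chains):
    """Merge per-provider precedence chains and return one compatible order.

    chains: one list of job names per provider, earliest bid first.
    Raises graphlib.CycleError if the chains contradict each other.
    """
    sorter = TopologicalSorter()
    for chain in chains:
        if chain:
            sorter.add(chain[0])  # register jobs that have no predecessor
        for earlier, later in zip(chain, chain[1:]):
            sorter.add(later, earlier)  # 'earlier' must precede 'later'
    return list(sorter.static_order())


# Hypothetical precedence links revealed by three providers.
chains = [["j1", "j3"], ["j3", "j4", "j2"], ["j1", "j4"]]
print(merge_precedence(chains))
```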

After sorting, we can determine the time precedence of competitors as well as the providers that may be selected. We start pre-scheduling the competitors from the earliest to the latest based on the previously proposed heuristics. For instance, if we apply the MCT-P heuristic in the pre-scheduling stage, each competing job would use the same heuristic to select providers. Once a competitor selects a provider, the waiting time of the provider will be updated to include the execution time of the competitor. The reason for updating the waiting time of the providers selected by competitors is that the jobs have a chance to be submitted to the selected providers before the requester submits its job to the selected providers.

Based on the updated waiting times of the contactable providers, we reuse the previously proposed heuristics to select an appropriate provider for a requester. We propose the following three extensions of the previous heuristics for this level of information.

Minimum Completion Time-Deterministic-Complete (MCT-D-C): In the pre-scheduling phase, we assume that each competitor adopts the MCT-D heuristic to select providers, and uses the competitors’ execution times as input to derive the updated waiting times of their respective providers. After pre-scheduling the competitors by the MCT-D heuristic, the provider that offers the minimum completion time is selected for the job’s execution.

Minimum Completion Time-Probabilistic-Complete (MCT-P-C): In the pre-scheduling phase, we assume that each competitor adopts the MCT-P heuristic to select providers, and uses the other competitors’ execution times as input to derive the updated waiting times of their respective providers. After pre-scheduling the competitors by the MCT-P heuristic, the probability of selecting one of the providers is proportional to the inverse of the completion time of the provider over that of all contactable providers.

Dissolve-Probabilistic-Complete (Dissolve-P-C): In the pre-scheduling phase, we assume that each competitor adopts the Dissolve-P heuristic to select providers, and uses the competitors’ execution times as input to derive the updated waiting times of their respective providers. After pre-scheduling the competing jobs by the Dissolve-P heuristic, Algorithm 1 is re-applied to calculate the selection probability of the providers; however, the waiting time in Algorithm 1 must be replaced by the updated waiting time derived by the pre-scheduling process.
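The pre-scheduling idea behind these extensions can be sketched in Python. The sketch assumes that the completion time at a provider is its waiting time plus the job's execution time there, and that MCT-D picks the provider with the smallest such value; all function names and the input format are hypothetical. Only the deterministic pre-scheduling pass and the inverse-completion-time probabilities are shown; Dissolve-P-C is omitted because it depends on Algorithm 1.

```python
def completion_times(waiting, exec_times):
    """Completion time per provider: waiting time plus execution time (assumed)."""
    return {s: waiting[s] + exec_times[s] for s in exec_times}


def preschedule_mct_d(waiting, competitors):
    """Pre-schedule competitors, earliest first, with a deterministic MCT rule.

    competitors is a list of dicts mapping each provider a competitor can
    contact to that competitor's execution time there (hypothetical format).
    Returns the updated waiting time of every provider.
    """
    updated = dict(waiting)
    for exec_times in competitors:
        ct = completion_times(updated, exec_times)
        chosen = min(ct, key=lambda s: (ct[s], s))  # ties broken by name
        updated[chosen] += exec_times[chosen]
    return updated


def mct_d_c(waiting, competitors, job_exec):
    """Select the provider with the minimum completion time after pre-scheduling."""
    updated = preschedule_mct_d(waiting, competitors)
    ct = completion_times(updated, job_exec)
    return min(ct, key=lambda s: (ct[s], s))


def inverse_ct_probabilities(updated_waiting, job_exec):
    """Selection probabilities proportional to 1 / completion time."""
    ct = completion_times(updated_waiting, job_exec)
    inverse = {s: 1.0 / t for s, t in ct.items()}
    total = sum(inverse.values())
    return {s: v / total for s, v in inverse.items()}


waiting = {"s1": 0.0, "s2": 5.0}
competitors = [{"s1": 10.0, "s2": 10.0}]  # one earlier competitor
job_exec = {"s1": 4.0, "s2": 6.0}         # the requester's own job
print(mct_d_c(waiting, competitors, job_exec))
```

In this made-up example the competitor is pre-scheduled onto s1, so the requester's completion times become 14 on s1 and 11 on s2 and the sketch selects s2, whereas ignoring the competitor would have favoured s1. With the same updated waiting times, `inverse_ct_probabilities` assigns probabilities 0.44 to s1 and 0.56 to s2.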

4.3. Competitors’ execution times

At this level of information, providers reveal the competitors’ execution times, but not the order of the competitors. Due to the lack of order information, we cannot derive the time precedence of competitors as was done in Fig. 4(d). To solve this problem, we arrange the competitors in an arbitrary order.

For this information level, we propose three heuristics, Minimum Completion Time-Deterministic-Execution Time (MCT-D-E), Minimum Completion Time-Probabilistic-Execution Time (MCT-P-E) and Dissolve-Probabilistic-Execution Time (Dissolve-P-E). They are similar to the heuristics for Complete Information about Competitors, but the order of competitors is arranged arbitrarily.

4.4. The number of competitors

Requesters know the number of competitors, but not the order and execution times of the competitors. Without the time information, we cannot determine the updated waiting times of the providers selected by competitors in the pre-scheduling phase. To solve this problem, we substitute the execution time of the requester’s job for those of the competitors. For instance, in Fig. 4, the execution time et_{1,i,k} of job j_{i,k} is used as a substitute for the execution times et_{1,4}, et_{1,5} and et_{1,6} of jobs j_4, j_5 and j_6. For this level of information, we propose three heuristics, Minimum Completion Time-Deterministic-Number (MCT-D-N), Minimum Completion Time-Probabilistic-Number (MCT-P-N) and Dissolve-Probabilistic-Number (Dissolve-P-N). They are similar to the heuristics for Competitors’ Execution Times, but they take the execution time of the requester’s job as a substitute to derive the updated waiting times of the involved providers in the pre-scheduling phase.
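The substitution can be illustrated with a short Python sketch. It assumes the requester knows which providers each competitor is queued at (as in the bipartite graph of Fig. 4(a)) and its own predicted execution time on each of those providers; the function name and data layout are hypothetical.

```python
def substitute_execution_times(competitor_providers, requester_exec):
    """Fill in unknown competitor execution times with the requester's own.

    competitor_providers maps a competitor to the providers it is queued at;
    requester_exec maps a provider to the requester's execution time there.
    Both formats are assumptions made for this illustration.
    """
    return {
        competitor: {s: requester_exec[s] for s in providers}
        for competitor, providers in competitor_providers.items()
    }


requester_exec = {"s1": 8.0, "s2": 12.0}
competitor_providers = {"j4": ["s1"], "j5": ["s1", "s2"]}
print(substitute_execution_times(competitor_providers, requester_exec))
```

The resulting per-competitor execution times can then be fed to the same pre-scheduling procedure used at the higher information levels, with the competitors arranged in an arbitrary order.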

5. Performance evaluation

5.1. Experiment setup

The objective of the experiments described in this section is threefold: (1) to evaluate the performance of each proposed heuristic under the various levels of information revealed by resource providers; (2) to determine the impact of non-cooperative requesters; and (3) to compare the performance with a centralized matchmaking model. For the experiments, we developed a resource selection tool running on Taiwan UniGrid [38] to evaluate the performance of the presented heuristics. Table 2 details the set-up parameters used in the experiments. We selected 20 machines from Taiwan UniGrid as resource providers, as shown in the first column of Table 4. For the job types, we adopted the Linpack benchmark [39,40] and four application benchmarks, namely, the Fast Fourier Transform (FFT), Jacobi Successive Over-relaxation (SOR), Monte Carlo Integration (MCI) and Dense LU Matrix Factorization (LU), provided by SciMark2 [41]. As shown in Table 3, four of the benchmarks comprised two different-size problems. MCI (not shown) only had one problem. Thus, we had a total of 9 benchmarks, from which the type of each experimental job was selected at random. Table 4 shows a snapshot of the time required to execute each type of job on each selected machine. The result is the average of executing each type of job three times.

Table 2

Experiment settings.

| Parameter | Value |
| --- | --- |
| Number of providers (\|S\|) | 20 |
| Number of requesters (\|R\|) | 4, 8, 12, 16 and 20 |
| Number of contactable providers (\|Q_{i,k}\|) | 20 and Random(20) |
| Job arrival rate | Negative exponential distribution with mean = 50 seconds |
| Deadline for waiting bids (d_i) | 20 s |
| Experiment period (T) | 15 min |
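A workload consistent with these settings could be generated along the following lines. This is only a sketch under stated assumptions: it treats the 50-second mean as the mean inter-arrival time of a single job stream, uses a 15-minute (900 s) period, and draws job types uniformly from the 9 benchmarks; the function name, the benchmark labels and the seeding are hypothetical and not taken from the authors' tool.

```python
import random

# Labels for the 9 job types (hypothetical naming).
BENCHMARKS = [
    "Linpack-large", "Linpack-small",
    "FFT-large", "FFT-small",
    "SOR-large", "SOR-small",
    "LU-large", "LU-small",
    "MCI",
]


def generate_jobs(mean_interarrival=50.0, period=900.0, seed=None):
    """Return (arrival_time, job_type) pairs within the experiment period.

    Inter-arrival times are drawn from an exponential distribution with the
    given mean; job types are chosen uniformly at random (an assumption).
    """
    rng = random.Random(seed)
    jobs = []
    t = rng.expovariate(1.0 / mean_interarrival)
    while t < period:
        jobs.append((t, rng.choice(BENCHMARKS)))
        t += rng.expovariate(1.0 / mean_interarrival)
    return jobs


for arrival, job_type in generate_jobs(seed=1)[:3]:
    print(f"{arrival:7.2f} s  {job_type}")
```

With a 50-second mean and a 900-second period, such a stream produces about 18 jobs on average.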

$$U_p(j_{n+1}) = \sum_{i=1}^{n} \bigl(w_i \cdot U(j_i)\bigr), \quad \text{where } w_i = \frac{i}{\sum_{j=1}^{n} j}. \tag{2}$$

To help providers report the predicted job execution time for each proposed bid, we adopt the weighted-mean prediction function to predict the execution times of jobs that belong to the same job type. Eq. (2) is the formal definition of the weighted-mean prediction function, where U_p(j_{n+1}) is the predicted execution time for the next job, and U(j_i) denotes the actual execution time of the previous job j_i. Each actual execution time is assigned a weight, based on the freshness degree of the jobs already executed, i.e., the execution time of the most recent job has more weight than the execution times of the second and third most recently executed jobs. In our experiments, we take the execution times of the last 5 jobs as historical data to predict the execution time of the next job.

To solve the problem of the lack of historical data at the beginning of the experiments, we use the execution times listed in Table 4 as historical data to predict the execution times of the first 5 jobs.

In the following experiments, the deviation rate of execution time prediction is approximately 7% on average.
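The weighted-mean prediction can be written as a few lines of Python. The sketch reads the weights in Eq. (2) as w_i = i / (1 + 2 + ... + n), with the history ordered from oldest (i = 1) to most recent (i = n); the function name and the `window` parameter are hypothetical.

```python
def weighted_mean_prediction(history, window=5):
    """Predict the next execution time from the most recent observations.

    history lists actual execution times of one job type, oldest first.
    Only the last `window` values are used, and the i-th of them (1-based)
    receives weight i / (1 + 2 + ... + n), so fresher jobs weigh more.
    """
    recent = history[-window:]
    n = len(recent)
    if n == 0:
        raise ValueError("at least one historical execution time is required")
    denominator = n * (n + 1) / 2  # 1 + 2 + ... + n
    return sum(i * u for i, u in enumerate(recent, start=1)) / denominator


print(weighted_mean_prediction([10.0, 20.0]))
```

For a history of 10 s followed by 20 s, the prediction is (1·10 + 2·20) / 3 ≈ 16.67 s, which lies closer to the more recent observation.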

We conduct four groups of experiments to evaluate the performance of the proposed heuristics for each level of information. To observe the influence of the system load on the performance, we use various numbers of requesters, ranging from 4 to 20, to represent the relative system loads. Using 4 requesters allows us to assess the performance when only a small number of requesters contend for resources, whereas 20 requesters allows us to observe the performance when a large number of requesters compete for limited resources. Clearly, the system load is proportional to the number of requesters that join the system.

We also consider the impact of the number of contactable providers on the performance of resource selection. Specifically, we assess two groups, comprised of 20 and Random(20) providers respectively. The different numbers of contactable providers represent the levels of heterogeneity of the providers. |Q_{i,k}| = |S| = 20 indicates that the providers are homogeneous in terms of their specifications, except for their computational capability. Thus, potentially, all providers could be accessed by all requesters. In contrast, |Q_{i,k}| = Random(20) < 20 simulates a heterogeneous system in which the number of providers that can service job j_{i,k} is Random(20), selected randomly from the original 20 providers.

We repeated the evaluation of the proposed resource selection heuristics for each case 10 times and took the average result as the experimental outcome.

5.2. Evaluation results

In this section, we present six sets of evaluation results. First, to determine the distribution of competitors in the experiments, we consider the average number of competitors per job compared to the number of requesters and the number of contactable providers. Second, we evaluate the performance of the heuristics on each information level and observe the influence of different contactable providers. Third, we discuss the effect of various information levels on the MCT-D and Dissolve-P heuristics. Fourth,

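Table 3

Benchmarks and problem sizes (excluding MCI).

| Benchmark | Linpack Large | Linpack Small | FFT Large | FFT Small | SOR Large | SOR Small | LU Large | LU Small |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Problem size | 3600 | 1800 | 4 194 304 | 1 048 576 | 5000 | 2500 | 4000 | 2000 |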

Table 4

The average time (in seconds) required to execute each type of job on the selected machines.

| Machine | Linpack Large | Linpack Small | FFT Large | FFT Small | SOR Large | SOR Small | LU Large | LU Small | MCI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NTCU 1 | 122.08 | 15.47 | 34.03 | 6.12 | 6.39 | 5.85 | 158.47 | 20.19 | 4.54 |
| NTCU 2 | 100.81 | 13.74 | 21.98 | 5.22 | 5.85 | 5.38 | 113.90 | 14.80 | 4.31 |
| NTCU 3 | 136.63 | 18.71 | 31.21 | 6.70 | 8.91 | 7.24 | 244.57 | 20.29 | 6.48 |
| NTCU 4 | 99.08 | 14.20 | 33.93 | 7.08 | 9.51 | 7.67 | 155.19 | 16.22 | 7.01 |
| NTHU 1 | 130.39 | 18.44 | 32.61 | 6.78 | 8.42 | 6.54 | 151.04 | 24.73 | 6.56 |
| NTHU 2 | 130.38 | 18.51 | 33.79 | 6.45 | 8.47 | 6.56 | 194.66 | 19.29 | 6.53 |
| NTHU 3 | 130.27 | 18.54 | 33.84 | 7.04 | 8.45 | 6.55 | 150.35 | 19.24 | 6.53 |
| NTHU 4 | 110.69 | 16.00 | 28.54 | 5.32 | 6.27 | 5.75 | 120.83 | 15.77 | 4.55 |
| NTTU 1 | 159.47 | 20.02 | 25.41 | 5.64 | 10.98 | 4.72 | 189.96 | 24.41 | 4.40 |
| NTTU 2 | 112.16 | 16.05 | 26.74 | 5.61 | 7.55 | 5.84 | 130.04 | 16.97 | 5.85 |
| NTTU 3 | 112.17 | 16.05 | 26.99 | 5.58 | 7.52 | 5.84 | 129.76 | 16.96 | 5.85 |
| NTTU 4 | 112.43 | 16.07 | 26.84 | 5.57 | 7.55 | 5.83 | 129.98 | 16.96 | 5.85 |
| Sinica 1 | 98.38 | 13.81 | 33.24 | 7.20 | 17.69 | 8.83 | 154.84 | 16.09 | 7.01 |
| Sinica 2 | 85.45 | 11.90 | 28.48 | 5.93 | 9.38 | 7.27 | 114.87 | 15.06 | 5.37 |
| Sinica 3 | 85.50 | 11.87 | 28.53 | 5.47 | 9.25 | 7.27 | 114.98 | 14.98 | 5.39 |
| Sinica 4 | 85.52 | 11.85 | 25.71 | 5.32 | 13.22 | 8.66 | 114.97 | 15.05 | 5.40 |
| CJCU 1 | 822.60 | 110.13 | 136.93 | 32.04 | 20.78 | 7.55 | 892.57 | 120.21 | 7.14 |
| NCHC 1 | 381.39 | 50.25 | 61.95 | 13.21 | 8.53 | 5.78 | 492.10 | 60.26 | 4.30 |
| NTU 1 | 83.09 | 11.58 | 22.42 | 5.18 | 7.44 | 6.48 | 121.04 | 15.39 | 7.03 |
| THU 1 | 350.67 | 45.10 | 59.52 | 14.31 | 6.61 | 5.59 | 443.60 | 56.37 | 5.35 |