Grid Workflow System - Using Grid Techniques to Improve the Response Time of a Large-Scale Workflow Management System

1. Introduction


Figure 2-2 The Grid Architecture (layers: User applications; Collective services; Connectivity and resource protocols; Fabric)

2.3 Grid Workflow System

Workflow concepts are central to grid computing. Systems that manage job dependencies and control the flow of jobs in a grid computing environment are called grid workflow systems. Many grid workflow systems exist today, such as GridAnt [9], Triana [10], XCAT [11], GridFlow [12], Kepler [13], and Grid-Flow.

All of the grid workflow systems mentioned above are designed to orchestrate grid-enabled programs or services. They share a common feature: each provides a graphical user interface and a scripting language with which users model the workflow process.

Chapter 3. A Grid-Enabled Scalable Workflow Computing Platform

This chapter first introduces a grid-enabled scalable workflow computing platform based on Agentflow. The platform keeps request response times acceptable and stable across a wide range of request workloads. In addition to the system architecture, the chapter presents the strategies used to achieve on-demand resource provisioning.

3.1 The PASE Grid Architecture

The PASE server is the workflow enactment subsystem of an Agentflow system. It drives the flow of work and supports process enactment, control, management, and monitoring. The original Agentflow system has only one centralized PASE server with a dedicated database server. As a result, the request response time increases greatly when requests arrive at the PASE server at a high rate. In such circumstances, the single centralized server becomes the performance bottleneck of the platform. To solve this performance issue, this thesis proposes a scalable workflow computing platform, the PASE grid architecture, which extends an Agentflow system into a grid-enabled system.

The following three concepts of grid computing guide us in designing the PASE grid architecture:

Virtualization

A PASE broker in the PASE grid provides a Java-based interface, the PASE Broker Common Interface (PBCI). The PBCI implements a set of functions that let users query and acquire PASE resources in the grid. Clients interact with the PASE grid through the PBCI and need not know the underlying grid configuration, such as the number of available PASE servers or the computing speed of the constituent machines. To clients, the PASE grid thus looks like a single powerful and scalable super-PASE server.

Utilizing resources across different administrative domains

The PASE information server is described in detail in Section 3.1.3. Through it, resources located in different administrative domains can be dynamically integrated to achieve a common goal, which partly realizes the concept of a virtual organization.

On-demand resource provisioning

When all existing PASE servers are overloaded, the PASE grid automatically discovers more computing resources and configures them as new PASE servers to share the request workload. Conversely, when incoming requests decrease and the overall system is under-utilized, the PASE grid releases a portion of its PASE resources so that other PASE brokers or other applications that need them can use them.

Figure 3-1 shows the PASE grid architecture. Its components are described in the following subsections.

Figure 3-1 The PASE grid architecture

3.1.1 PASE Resource

A PASE resource is composed of software and hardware resources. The software resources are of two kinds:

- a PASE server, which is a flow engine derived from the Agentflow system;
- a database that stores the PASE server's runtime data and replicas of process definitions.

The hardware resource is typically a computer, such as a PC, notebook, or workstation, on which the software resources run. The PASE server and the database of a PASE resource can run on one or more sites. Each PASE resource is managed by one PASE information server (PIS), and it can be used by only one PASE broker at any instant.

A PASE resource must be registered with the PIS in its domain before it can be included in the PASE grid. To withdraw a PASE resource, its provider must ask the PIS to delete the resource's record from the PIS database.
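The registration rules above can be pictured with a small Java sketch. It assumes a PIS behaves like an in-memory registry and enforces the rule that only one broker may use a resource at any instant. The class and method names (`PisRegistry`, `register`, `withdraw`, `acquire`, `release`) are hypothetical and do not come from Agentflow. A real PIS keeps these records in its database. Refusing withdrawal while a broker holds the resource is also our assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the resource records one PIS keeps for its domain.
class PisRegistry {
    private final Map<String, String> records = new HashMap<>(); // resource ID (host:port) -> description
    private final Map<String, String> holders = new HashMap<>(); // resource ID -> broker currently using it

    /** A resource must be registered before it can join the PASE grid. */
    synchronized void register(String resourceId, String description) {
        records.put(resourceId, description);
    }

    /** Withdrawal deletes the record; refusing it while a broker holds the resource is an assumption. */
    synchronized boolean withdraw(String resourceId) {
        if (holders.containsKey(resourceId)) return false;
        return records.remove(resourceId) != null;
    }

    /** At most one PASE broker may use a resource at any instant; the first claim wins. */
    synchronized boolean acquire(String resourceId, String brokerId) {
        if (!records.containsKey(resourceId)) return false;     // unregistered resource
        String current = holders.putIfAbsent(resourceId, brokerId);
        return current == null || current.equals(brokerId);      // re-acquiring by the holder is a no-op
    }

    /** Only the broker that holds the resource can release it. */
    synchronized void release(String resourceId, String brokerId) {
        holders.remove(resourceId, brokerId);
    }
}
```

All methods are `synchronized` so that a withdrawal cannot interleave with an acquisition of the same resource.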

3.1.2 Process Definition Repository and Global Runtime Repository

The process definition repository (PDR) contains business process definitions created by process designers with the process definition editor (PDE). When the PASE broker needs to add a new PASE resource, the PIS replicates the content of the PDR into the new resource's database on request. Each domain can contain more than one PDR, and each PDR can be accessed by more than one PASE resource. Administrators are responsible for registering PDRs with the PIS.

The global runtime repository (GRTR) contains workflow instances that have finished execution; they are kept there for future reference. When the PASE broker removes a PASE resource, it moves that resource's runtime data into the GRTR. There is only one GRTR in the PASE grid, and it is managed by the PASE broker.

3.1.3 PASE Information Server

The PASE information server (PIS) plays a role similar to that of MCAT in the Storage Resource Broker [15] or the Grid Information Service (GIS) in the Globus Toolkit [14]. It maintains the necessary information about a domain and about all the PASE resources in that domain.

The PIS is also responsible for two database tasks: it replicates data from the PDR into each new PASE resource, and it clears the database of each removed PASE resource.

Tables 3-1 and 3-2 describe the information maintained by the PIS: the PASE general information (PASE_Geninf) and the process definition repository general information (PDR_Geninf). The PASE broker needs this information to access, discover, monitor, and manage PASE resources.

Table 3-1 PASE_Geninf

Attribute | Description
PASE_ID | The unique ID of the PASE resource
PASE_Host | The host of the PASE resource
PASE_Port | The port of the PASE resource
Database_Name | The database name of the PASE resource
Database_Host | The host of the PASE resource's database
Database_Port | The port of the PASE resource's database
Database_User, Database_Password | The user name and password granted for database access
PDR_Id | The ID of the PDR to which the PASE resource refers
State | The state of the PASE resource
Load_Max_Limit | The limit on the load (instances)
ArrivalRate_Max_Limit | The limit on the arrival rate

Table 3-1 lists the general information of a PASE resource. The unique ID has the form host:port. A PASE resource is always in one of four states:

- Ready: the PASE resource is available, i.e., its database has been created and its PASE server has been started.
- Reserving: a PASE broker has reserved the PASE resource, but the resource is not yet connected to that broker.
- Running: the PASE broker is using the PASE resource to serve incoming requests.
- Blocking: the PASE resource has failed.
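The four states can be encoded as a Java enum with checked transitions, as sketched below. The thesis defines the states but not every transition between them. The successor sets in `successors()` are therefore our assumptions, for example that a repaired resource returns from blocking to ready. The enum name `PaseState` is hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of the four PASE resource states listed for Table 3-1.
enum PaseState {
    READY,      // database created and PASE server started; available
    RESERVING,  // reserved by a PASE broker but not yet connected to it
    RUNNING,    // used by a PASE broker to serve incoming requests
    BLOCKING;   // the resource has failed

    /** Allowed successor states; these edges are assumptions, not taken from the thesis. */
    Set<PaseState> successors() {
        switch (this) {
            case READY:     return EnumSet.of(RESERVING, BLOCKING);
            case RESERVING: return EnumSet.of(RUNNING, READY, BLOCKING);
            case RUNNING:   return EnumSet.of(READY, BLOCKING);
            default:        return EnumSet.of(READY); // BLOCKING: back to READY once repaired
        }
    }

    /** Returns the target state, or throws if the transition is not allowed. */
    PaseState moveTo(PaseState target) {
        if (!successors().contains(target)) {
            throw new IllegalStateException(this + " -> " + target + " is not allowed");
        }
        return target;
    }
}
```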


Table 3-2 PDR_Geninf

Attribute | Description
PDR_ID | The unique ID of the PDR
PDR_Name | The database name of the PDR
PDR_Host | The host of the PDR
PDR_Port | The port of the PDR
PDR_User, PDR_Password | The user name and password granted for PDR access

Table 3-2 describes the general information of a PDR. The unique ID of a PDR begins with the prefix “PDR”. When an administrator registers a new process definition repository in the PASE grid, the PIS generates this information from the properties of the PDR.

3.1.4 PASE Broker

The PASE broker is a vital part of the PASE grid. It coordinates the PISs, the PASE resources, the process definition repositories, and the global runtime repository. It can manage multiple PISs and use the PASE resources that belong to them. Figure 3-2 shows the architecture of the PASE broker.

Figure 3-2 The architecture of PASE broker

The PISManager manages the PIS connections (PISCs) to all PISs, and it retrieves and caches the information maintained by the PISs. Initially, the administrator selects the PASE resources and PDRs to be used. The PISManager then sends replication requests to the PISs so that the process definitions are replicated into each PASE resource.

The PDRManager manages the PDR connections (PDRCs) to all PDRs and handles client requests for process-definition-related data. The GRTRManager backs up the workflow instances of any PASE resource that the PASE broker is about to remove.

The WFCIPoolManager creates AbstractWFCIs (AWs) that connect to the selected PASE resources through the RMI mechanism. Each AW wraps a WFCI connection and records metadata about it, such as a list of processes and a list of member records. An AW also measures several metrics: the load, the average request arrival rate, and the average request response time. The PerformanceMonitor monitors performance on the basis of these metrics.

The WFCIPoolManager maintains three pools, each corresponding to a different AW state:

- Running pool: contains the healthy AWs.
- Suspending pool: contains AWs that no longer accept new create-process requests but still have unfinished workflow instances running on them.
- Blocking pool: contains AWs in which the PASE broker has detected some kind of failure.
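The following Java sketch shows the three-pool bookkeeping. It assumes that each AW can be identified by its resource ID (host:port) and that an AW belongs to exactly one pool at a time. The names `WfciPools`, `move`, and `forget` are hypothetical; a real AW also wraps a WFCI connection and records metrics.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three AW pools kept by the WFCIPoolManager.
// AWs are represented only by their resource ID (host:port); real AWs wrap a WFCI connection.
class WfciPools {
    enum Pool { RUNNING, SUSPENDING, BLOCKING }

    private final Map<String, Pool> poolOf = new LinkedHashMap<>(); // insertion order keeps listings stable

    void addRunning(String awId) { poolOf.put(awId, Pool.RUNNING); }

    /** IDs of the AWs currently in the given pool, in insertion order. */
    List<String> members(Pool pool) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Pool> e : poolOf.entrySet()) {
            if (e.getValue() == pool) out.add(e.getKey());
        }
        return out;
    }

    /** Only AWs in the running pool may receive new create-process requests. */
    boolean acceptsNewProcesses(String awId) { return poolOf.get(awId) == Pool.RUNNING; }

    /** Moves an AW between pools, e.g. RUNNING -> SUSPENDING when removal is requested. */
    void move(String awId, Pool from, Pool to) {
        if (poolOf.get(awId) != from) {
            throw new IllegalStateException(awId + " is not in the " + from + " pool");
        }
        poolOf.put(awId, to);
    }

    /** Drops the AW entirely once its resource has been disconnected. */
    void forget(String awId) { poolOf.remove(awId); }
}
```

A map from AW to pool is used instead of three separate lists. This makes "exactly one pool per AW" hold by construction.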

The PerformanceMonitor (PM) monitors the performance of the overall system using one of several load determination modes:

- the load (number of instances);
- the average request arrival rate;
- the average request response time.

When the system is overloaded, the PM tells the WFCIPoolManager to find new available PASE resources through the PISs and to connect to them. If no new PASE resources are found, the PM alerts the administrators, who can then add PASE resources manually. Conversely, when the system has been under-utilized for a specified period of time, the PM tells the WFCIPoolManager to remove some AWs.

The PASEDispatcher performs pre-actions on each request to manipulate workflow-instance-relevant parameters:

- the task identity (TskID);
- the artifact instance identity (AnsID);
- the artifact instance (PASEartInstance);
- the attached file identity (FileID).

It then selects an appropriate PASE resource and delegates the request to it. Table 3-3 shows an example.

Table 3-3 Instance ID manipulation and request dispatching

Instance ID | Allocated Resource | ID in the Resource
Tsk(140.113.210.11:20000)000000000001 | 140.113.210.11:20000 | Tsk000000000001
Ans(140.113.210.21:20000)000012345678 | 140.113.210.21:20000 | Ans000012345678

Before each request returns to the user, the PASE broker performs post-actions. It appends the identity of the PASE resource to the instance-relevant data, and it merges the data returned from different PASE resources. Table 3-4 shows an example.

Table 3-4 Returned ID manipulation

Returned ID | Source Resource | Appended ID
Tsk000000000001 | 140.113.210.11:20000 | Tsk(140.113.210.11:20000)000000000001
Ans000012345678 | 140.113.210.21:20000 | Ans(140.113.210.21:20000)000012345678
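The ID rewriting in Tables 3-3 and 3-4 can be expressed as a small Java codec. It assumes that every ID consists of a three-letter type prefix (as in `Tsk` and `Ans`) followed by digits. It also assumes that the broker-facing form inserts `(host:port)` between the prefix and the digits. The thesis does not say whether FileID or PASEartInstance values follow the same pattern. The class `InstanceIds` and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical codec for the ID rewriting shown in Tables 3-3 and 3-4.
// Assumes every ID is a three-letter type prefix (Tsk, Ans, ...) followed by digits.
class InstanceIds {
    // Broker-facing form: <prefix>(<host:port>)<digits>
    private static final Pattern GLOBAL = Pattern.compile("([A-Za-z]{3})\\(([^()]+)\\)(\\d+)");
    // Resource-local form: <prefix><digits>
    private static final Pattern LOCAL = Pattern.compile("([A-Za-z]{3})(\\d+)");

    /** A decoded global ID: the owning PASE resource and the ID used inside it. */
    static final class Routed {
        final String resourceId;
        final String localId;
        Routed(String resourceId, String localId) {
            this.resourceId = resourceId;
            this.localId = localId;
        }
    }

    /** Post-action: insert the source resource's host:port into a returned local ID. */
    static String toGlobal(String localId, String resourceId) {
        Matcher m = LOCAL.matcher(localId);
        if (!m.matches()) throw new IllegalArgumentException("Unexpected local ID: " + localId);
        return m.group(1) + "(" + resourceId + ")" + m.group(2);
    }

    /** Pre-action: split a global ID into the target resource and the local ID. */
    static Routed toLocal(String globalId) {
        Matcher m = GLOBAL.matcher(globalId);
        if (!m.matches()) throw new IllegalArgumentException("Unexpected global ID: " + globalId);
        return new Routed(m.group(2), m.group(1) + m.group(3));
    }
}
```

For example, `toGlobal("Tsk000000000001", "140.113.210.11:20000")` reproduces the first row of Table 3-4. `toLocal` reverses the second row of Table 3-3.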

3.2 On-Demand Resource Provisioning Strategies

This section discusses the resource provisioning strategies used in the PASE grid. Two kinds of demand are considered, user request processing and adaptable resource allocation, and each has its own strategy. The following subsections describe these strategies in detail.

3.2.1 User Request Processing

In the centralized Agentflow system, the response time of each request is determined by the computing capability of the PASE server and the capacity of the database server. The PASE grid architecture provides scalable computing capacity, which alleviates the bottleneck of the centralized server and yields shorter response times for user requests.

Among the different kinds of requests, manipulate-task requests (MTRs) and process enactment requests (PERs) benefit from the PASE grid architecture. Collect-data requests (CDRs), in contrast, take slightly longer than in the original centralized architecture. Overall, the proposed PASE grid architecture therefore improves runtime performance for most workflow activities. The processing of each kind of request is described below.

A PER creates a workflow instance from a predefined process definition, for example through createProcess() in the PBCI. When a PER arrives, a dynamic request dispatching algorithm selects the PASE resource that will process it (Table 3-5, Algorithm 3.1). The algorithm chooses a single PASE resource according to the load metric selected by the administrator.

Table 3-5 The Dynamic Request Dispatching Algorithm for PER

Algorithm 3.1 (Dynamic Request Dispatching for PER)
Input: the load determination mode M
Output: the ID R of the selected PASE resource
  List wfciList = wfciPoolManager.getRunnings();
  AbstractWFCI t = new AbstractWFCI();
  For each AbstractWFCI a ∈ wfciList do
    // Compare the current workload of each PASE resource
    If (a.getMaxLoadByMode(M) - a.getLoadByMode(M)) > (t.getMaxLoadByMode(M) - t.getLoadByMode(M))
      t = a;
    EndIf
  EndFor
  R = t.getID();
End
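The headroom rule in Algorithm 3.1 can be expressed as a runnable Java sketch: pick the running PASE resource with the largest gap between its limit and its current load under mode M. Here `LoadView` is a hypothetical stand-in for an AbstractWFCI that has already been evaluated under mode M.

One deliberate difference from the pseudocode: the pseudocode starts from an empty AbstractWFCI, whose headroom the thesis does not define. This sketch starts from the first running resource instead, and it returns an empty result only when no resource is running.

```java
import java.util.List;
import java.util.Optional;

// Sketch of the headroom rule in Algorithm 3.1; the LoadView type is hypothetical.
class Dispatcher {
    /** The chosen metric of one running AW under load determination mode M. */
    static final class LoadView {
        final String id;       // PASE resource ID (host:port)
        final double load;     // current value: instances, arrival rate, or average response time
        final double maxLoad;  // configured limit for the same metric
        LoadView(String id, double load, double maxLoad) {
            this.id = id;
            this.load = load;
            this.maxLoad = maxLoad;
        }
        double headroom() { return maxLoad - load; }
    }

    /** Returns the running resource with the largest (maxLoad - load), or empty if none is running. */
    static Optional<String> select(List<LoadView> running) {
        LoadView best = null;
        for (LoadView a : running) {
            if (best == null || a.headroom() > best.headroom()) best = a;
        }
        return best == null ? Optional.empty() : Optional.of(best.id);
    }
}
```

Because the comparison is strict, ties go to the resource that appears first in the list.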

The PASE resources for CDRs and MTRs are not chosen by the dynamic request dispatching algorithm.

A CDR retrieves instance-relevant data or process-definition-related data, for example through getTaskList() or getMemberRecord() in the PBCI. A CDR may need more than one PASE resource to accomplish its request, and the data to be retrieved determines which PASE resources are involved.

An MTR manipulates a task or a group of tasks, for example through startTask(), suspendTask(), or completeTask() in the PBCI. An MTR is dispatched to the PASE resource on which the workflow instance that generated the request was created.
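The two non-PER routing rules can be sketched in Java. An MTR goes to the single resource named inside its global instance ID. A CDR fans out to every resource that holds relevant data and concatenates the partial results. The sketch assumes the `Tsk(host:port)digits` ID form of Table 3-3 and models the per-resource query as a plain function. The names `RequestRouter`, `ownerOf`, and `collect` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical routing helpers for requests that bypass Algorithm 3.1.
class RequestRouter {
    /** MTR: the target is the resource embedded in the global instance ID, e.g. Tsk(host:port)digits. */
    static String ownerOf(String globalId) {
        int open = globalId.indexOf('(');
        int close = globalId.indexOf(')', open + 1);
        if (open < 0 || close < 0) throw new IllegalArgumentException("No resource in ID: " + globalId);
        return globalId.substring(open + 1, close);
    }

    /** CDR: query every resource that holds relevant data and concatenate the partial results. */
    static List<String> collect(List<String> resources, Function<String, List<String>> query) {
        List<String> merged = new ArrayList<>();
        for (String resource : resources) {
            merged.addAll(query.apply(resource)); // sequential here; a broker could query in parallel
        }
        return merged;
    }
}
```

The sequential fan-out in `collect` also suggests why CDRs can be slightly slower than in the centralized system: one logical request becomes several remote calls plus a merge.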

3.2.2 Adaptable Resource Allocation

The PerformanceMonitor monitors the performance of each PASE resource in the PASE grid. When the PASE grid as a whole is overloaded or under-utilized, it sends an event to the WFCIPoolManager. Table 3-6 describes the PASE grid performance monitoring algorithm.

Table 3-6 PASE Grid Performance Monitoring Algorithm

Algorithm 3.2 (Performance Monitoring Algorithm)
Input:
  The monitoring interval I
  The list of running AWs L
  The load determination mode M
  /* When a continuous under-utilized time period exceeds this predefined threshold, the system removes some PASE resources. */
  A time period C

15 u_count++;
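Only the inputs of Algorithm 3.2 and the counter `u_count` survive. The following Java sketch shows one plausible reading of its decision logic; it is not the thesis's algorithm. Each monitoring interval I, the monitor receives the load/limit ratio of every running AW under mode M and acts as follows:

- If every AW is at or above an upper ratio, it requests a new resource.
- If the average ratio stays at or below a lower ratio for C consecutive intervals, it requests a removal.

The two ratios, the use of the average, and the class name `PerformanceMonitorSketch` are all assumptions.

```java
import java.util.List;

// Hypothetical sketch of the decision logic behind Algorithm 3.2; thresholds are assumptions.
class PerformanceMonitorSketch {
    enum Decision { ADD_RESOURCE, REMOVE_RESOURCE, NONE }

    private final double lowerRatio; // average utilization at or below this counts as under-utilized
    private final double upperRatio; // a resource at or above this is treated as overloaded
    private final int periodC;       // time period C, measured in monitoring intervals I
    private int uCount = 0;          // consecutive under-utilized intervals (u_count)

    PerformanceMonitorSketch(double lowerRatio, double upperRatio, int periodC) {
        this.lowerRatio = lowerRatio;
        this.upperRatio = upperRatio;
        this.periodC = periodC;
    }

    /** Called once per monitoring interval I with load / limit of each running AW under mode M. */
    Decision tick(List<Double> utilization) {
        if (utilization.isEmpty()) return Decision.ADD_RESOURCE; // nothing running at all
        boolean allOverloaded = true;
        double sum = 0;
        for (double u : utilization) {
            sum += u;
            if (u < upperRatio) allOverloaded = false;
        }
        if (allOverloaded) {             // the whole grid is overloaded
            uCount = 0;
            return Decision.ADD_RESOURCE;
        }
        if (sum / utilization.size() <= lowerRatio) {
            uCount++;                    // one more under-utilized interval
            if (uCount >= periodC) {
                uCount = 0;
                return Decision.REMOVE_RESOURCE;
            }
        } else {
            uCount = 0;                  // the under-utilized streak is broken
        }
        return Decision.NONE;
    }
}
```

Resetting `uCount` whenever an interval is not under-utilized makes C a *continuous* period, as the comment in Algorithm 3.2 requires.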

When all existing PASE servers are overloaded, the PASE grid automatically discovers more computing resources and configures them as new PASE servers to share the request workload. Table 3-7 shows the structure of PASEProperty, which describes a PASE resource and whose content is stored in PASE_Geninf in the PIS. Table 3-8 describes the resource-adding algorithm.

Table 3-7 Structure of PASEProperty Structure PASEProperty {

03 List sList=wfciPoolManager.getSuspendingPool();

04 If sList.size()>0

12 wfciPoolManager.moveSuspendingToRunning(pID);

13 return;

26 pisManager.updatePASEState(aList, “Ready”);

27 pisManager.replicatePDR(t.id,pdrID);

28 wfciPoolManager.connectToServer(t.id);

29 End
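The surviving fragments of Table 3-8 suggest an order of operations for adding a resource:

1. If the suspending pool is not empty, reactivate a suspended AW and return.
2. Otherwise, bring in a ready resource, replicate the PDR into it, and connect to it.

The following Java sketch follows that reading. The `Grid` facade is hypothetical; its methods are named after calls in the fragments (`moveSuspendingToRunning`, `replicatePDR`, `connectToServer`), but their signatures are our assumptions. The reservation step and the administrator alert are taken from the prose of Section 3.1.4. `FakeGrid` exists only to exercise the logic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical reading of the resource-adding algorithm (Table 3-8) from its surviving fragments.
class AddResourceSketch {
    /** Facade over PISManager and WFCIPoolManager; signatures are illustrative only. */
    interface Grid {
        Optional<String> takeSuspended();          // an AW still connected but in the suspending pool
        void moveSuspendingToRunning(String id);
        Optional<String> reserveReadyResource();   // ask the PISs for a READY resource and reserve it
        void replicatePDR(String id, String pdrId);
        void connectToServer(String id);
        void alertAdministrator();
    }

    /** Returns the ID of the resource put into service, or empty when none could be found. */
    static Optional<String> addResource(Grid grid, String pdrId) {
        // Cheapest option first: a suspended AW is still connected and still holds its definitions.
        Optional<String> suspended = grid.takeSuspended();
        if (suspended.isPresent()) {
            grid.moveSuspendingToRunning(suspended.get());
            return suspended;
        }
        // Otherwise bring in a fresh resource: replicate the PDR into it, then connect.
        Optional<String> fresh = grid.reserveReadyResource();
        if (!fresh.isPresent()) {
            grid.alertAdministrator(); // administrators may add PASE resources manually
            return Optional.empty();
        }
        grid.replicatePDR(fresh.get(), pdrId);
        grid.connectToServer(fresh.get());
        return fresh;
    }

    /** In-memory fake that records calls instead of contacting PISs or RMI servers. */
    static final class FakeGrid implements Grid {
        final Deque<String> suspended = new ArrayDeque<>();
        final Deque<String> ready = new ArrayDeque<>();
        final List<String> log = new ArrayList<>();
        public Optional<String> takeSuspended() { return Optional.ofNullable(suspended.poll()); }
        public void moveSuspendingToRunning(String id) { log.add("resume " + id); }
        public Optional<String> reserveReadyResource() { return Optional.ofNullable(ready.poll()); }
        public void replicatePDR(String id, String pdrId) { log.add("replicate " + pdrId + " -> " + id); }
        public void connectToServer(String id) { log.add("connect " + id); }
        public void alertAdministrator() { log.add("alert"); }
    }
}
```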

Remove Resource

Conversely, when incoming requests decrease and the overall system is under-utilized, the PASE grid removes a portion of its PASE resources so that other PASE brokers that need them can use them. Table 3-9 presents the resource-removal algorithm. The algorithm simply moves the AW representing the PASE resource to be removed into the suspending pool of the WFCIPoolManager.

Table 3-9 Removing Resource Algorithm

In the WFCIPoolManager, the SuspendChecker periodically runs a suspend-checking algorithm over all AWs in the suspending pool. For each AW whose workflow instances have all finished, it retires the AW in three steps:

1. The SuspendChecker asks the GRTRManager to back up the instance data.
2. It then asks the PISManager to clear the instance data and process definition data from the PASE resource's database.
3. Finally, the WFCIPoolManager disconnects the PASE resource from the PASE broker.

Table 3-10 shows the suspend-checking algorithm.

15 EndWhile 16 End
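One pass of the SuspendChecker can be sketched in Java as follows. The sketch assumes the suspending pool is available as a map from AW ID to the number of unfinished workflow instances. The three callbacks stand in for the GRTRManager, the PISManager, and the WFCIPoolManager; `SuspendCheckerSketch` and `checkOnce` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical single pass of the SuspendChecker over the suspending pool.
class SuspendCheckerSketch {
    /**
     * @param suspending AW ID -> number of unfinished workflow instances on that resource
     * @return the IDs of the AWs retired in this pass
     */
    static List<String> checkOnce(Map<String, Integer> suspending,
                                  Consumer<String> backupToGrtr,   // GRTRManager: keep instance data
                                  Consumer<String> clearDatabase,  // PISManager: wipe instance + definition data
                                  Consumer<String> disconnect) {   // WFCIPoolManager: drop the connection
        List<String> retired = new ArrayList<>();
        Iterator<Map.Entry<String, Integer>> it = suspending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            if (e.getValue() > 0) continue; // instances still running; check again next period
            String id = e.getKey();
            backupToGrtr.accept(id);
            clearDatabase.accept(id);
            disconnect.accept(id);
            it.remove();                    // the AW leaves the suspending pool
            retired.add(id);
        }
        return retired;
    }
}
```

To run the pass periodically, a broker could schedule `checkOnce` with `java.util.concurrent.ScheduledExecutorService.scheduleAtFixedRate`.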

Chapter 4. Performance Evaluation

Based on the PASE grid architecture described in Chapter 3, we implemented a prototype system and conducted a series of experiments to evaluate its performance. Section 4.1 describes the configuration of the PASE grid environment and the related experimental settings. Section 4.2 describes the program that drives the experiments. Section 4.3 presents and discusses the experimental results.

4.1 Experimental Settings

4.1.1 PASE Resources

The experiments use four PASE resources in the PASE grid environment; Table 4-1 lists the software and hardware configuration of each. All PASE resources use the same process definition repository. Its process definitions are described in Section 4.1.2.

Table 4-1 PASE resources

Resource Host | CPU | Memory | Database | Agentflow
140.113.210.11 | AMD Athlon 64, 1.81 GHz | DDR2 1 GB | MySQL 4.1 | 2.2.3.2
140.113.210.18 | AMD Athlon XP, 1.83 GHz | DDR2 512 MB | MySQL 4.1 | 2.2.3.2
140.113.210.21 | AMD Athlon 64, 1.81 GHz | DDR2 1 GB | MySQL 4.1 | 2.2.3.2
140.113.210.23 | AMD Athlon 64, 1.81 GHz | DDR2 1 GB | MySQL 4.1 | 2.2.3.2

In the experiments, we explore three types of metrics for defining the load limit on each PASE resource:

- workflow instance number;
- request arrival rate;
- average response time.

The first two metrics are workload-directed, and the third is performance-directed. The load limits should correspond directly to users' perception of system performance, and a less capable machine reaches a given response time at a lower workload. For this reason, the limit values for the first two metrics depend on the computing capabilities of the underlying machines. The limit values for the third metric are the same on all machines. Table 4-2 shows the limit values used in the experiments. 140.113.210.18 has less memory and a less powerful CPU than the other machines, so its limits for the first two metrics are set lower than those of the other machines.

Table 4-2 Limits on three metrics of PASE resources

Resource Host | Workflow instance number | Request arrival rate | Average response time
140.113.210.18 | 250 | 0.00025 | 2000
140.113.210.21 | 300 | 0.0005 | 2000
140.113.210.23 | 300 | 0.0005 | 2000

4.1.2 Process Definitions and PDR

The process definitions used in the experiments are real cases obtained from [16], where they were used to build a university department management system. The system includes several subsystems:

1) the working system for master's students;
2) the working system for Ph.D. students;
3) the bulletin system;
4) the department computer and network center;
5) the laboratory.

The services of these subsystems are defined as specific processes designed with and run on the Agentflow system.

In the experiments, we created 1,500 members representing faculty, assistants, and students. These members use the department management system to accomplish the various tasks that arise in a department's daily operations.

4.1.3 PASE Information Server

To establish a PASE grid, the PASE information server (PIS) first needs to set up several tables in its database server. These tables maintain information about the PASE grid status, as described in Section 3.1.3. Figures 4-1 and 4-2 show the essential data stored in the PIS database for the experiment. In this experiment, the PIS runs on 140.113.210.11:2099.

Figure 4-1 PASE_Geninf

Figure 4-2 PDR_Geninf

4.1.4 PASE Broker

Before starting the PASE broker, we must select the PISs to be used and enter their host and port information into the PASE broker's configuration file. We also need to set several attributes of the performance monitor. Figures 4-3 and 4-4 show snapshots of these two steps.

Figure 4-3 PIS Configurations

Figure 4-4 Performance Monitor Configurations

In Figure 4-4, we select a monitoring mode for the performance monitor and set the lower and upper bounds of that mode. The upper-bound values in the Performance Monitor configuration serve as defaults when the administrator has not set them in the PIS. Two buffer settings control how metrics are measured:

- Arrival Rate Buffer Size: the time interval over which the PASE broker measures the request arrival rate.
- Response Time Buffer Size: the number of requests over which the average response time is computed.
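The two buffer settings describe two different sliding windows: a time-based window for the arrival rate and a count-based window for the average response time. The following Java sketch shows how an AW could maintain both. The class name `RequestMetrics` is hypothetical, and expressing the arrival rate in requests per millisecond is our assumption about units.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-AW metric tracker mirroring the two buffer settings of Figure 4-4.
class RequestMetrics {
    private final long arrivalWindowMs;   // Arrival Rate Buffer Size: a time interval
    private final int responseBufferSize; // Response Time Buffer Size: a number of requests
    private final Deque<Long> arrivals = new ArrayDeque<>();      // arrival timestamps inside the window
    private final Deque<Long> responseTimes = new ArrayDeque<>(); // most recent response times
    private long responseSum = 0;

    RequestMetrics(long arrivalWindowMs, int responseBufferSize) {
        this.arrivalWindowMs = arrivalWindowMs;
        this.responseBufferSize = responseBufferSize;
    }

    void recordArrival(long nowMs) {
        arrivals.addLast(nowMs);
        evictOldArrivals(nowMs);
    }

    void recordResponse(long elapsedMs) {
        responseTimes.addLast(elapsedMs);
        responseSum += elapsedMs;
        if (responseTimes.size() > responseBufferSize) {
            responseSum -= responseTimes.removeFirst(); // keep only the last N requests
        }
    }

    /** Arrivals per millisecond within the last arrivalWindowMs milliseconds. */
    double arrivalRate(long nowMs) {
        evictOldArrivals(nowMs);
        return arrivals.size() / (double) arrivalWindowMs;
    }

    /** Mean of the buffered response times, or 0 if nothing has been recorded yet. */
    double averageResponseTime() {
        return responseTimes.isEmpty() ? 0.0 : (double) responseSum / responseTimes.size();
    }

    private void evictOldArrivals(long nowMs) {
        while (!arrivals.isEmpty() && arrivals.peekFirst() <= nowMs - arrivalWindowMs) {
            arrivals.removeFirst();
        }
    }
}
```

Keeping a running sum makes each `recordResponse` call constant-time, however large the response-time buffer is.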

When the above settings are completed, we can then start the PASE broker, and it will
