Study and Implementation of Job Migration on Light-weighted Peer-to-Peer Grid Systems

Master's Thesis, Master's Program of the Department of Digital Content and Technology, National Taichung University of Education

Advisor: Dr. 賴冠州

Graduate Student: 林士傑

June 2008

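Chinese Abstract

As data volumes grow and complexity increases, closer global coordination/cooperation and more effective use of resources are needed. Grid computing is a new technology for solving complex problems, and its main concept is a community view of sharing computing resources. The biggest challenge for Grids worldwide is how to share and integrate resources across different virtual organizations, administrations, and access rights. At present, some integrated organizations still have to designate a specific virtual organization when they dispatch a job. In addition, usage tends to concentrate on the nodes with stronger computing power, so the use of computing resources remains unbalanced. This study analyzes the advantages of Peer-to-Peer networks, taking the collection, integration, and use of computing resources as its main principles. We built a prototype of a light-weighted Peer-to-Peer Grid middleware together with a distributed resource broker, named the Service Oriented Roaming System, and designed and implemented a Peer-to-Peer Grid communication environment that spans virtual organizations. We also propose a job migration mechanism and load sharing policies that improve the utilization of idle computers and make the sharing of computing resources more efficient. We hope that, on the basis of this communication environment, more computing resources can be integrated effectively and their allocation improved.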

Abstract

As data grow larger and more complex, closer global coordination and cooperation are needed to use Grid resources efficiently. Grid computing is a new technology for solving such complex problems, and its main concept is sharing computing resources within a community. Unlike cluster computing, Grid computing is no longer limited to an intranet: a Grid system connects geographically distributed computing resources over a Wide Area Network (WAN). The biggest challenge for Grid projects is how to share and integrate computing resources across different virtual organizations (VOs). At present, users still submit a job to a specific virtual organization for execution, so resources that belong to different regions and VOs are not yet integrated and shared. This study analyzes the advantages of Peer-to-Peer networks for the collection, integration, and use of computing resources. We built a Peer-to-Peer Grid middleware, named the Service Oriented Roaming System, and designed a Peer-to-Peer communication model that spans multiple VOs. We also propose a job migration mechanism and load sharing policies to improve the utilization of idle computing resources. Built on the proposed communication environment, this work aims to improve the integration of computing resources and make their sharing more efficient.

List of Figures

Figure 2-1 Commercial Grid solutions taxonomy
Figure 2-2 Globus Toolkit 4 Services
Figure 2-3 The JXTA three-layer architecture
Figure 2-4 JXTA XML format
Figure 2-5 The JXTA project protocols
Figure 3-1 The architecture of Globus Toolkit
Figure 3-2 The architecture of SORS
Figure 3-3 The structure of traditional resource discovery in Grid system
Figure 3-4 The structure of resource discovery in SORS
Figure 3-5 Proposed Load Barrier Policy Algorithm
Figure 3-6 Proposed Heterogeneity Policy Algorithm
Figure 3-7 The Load Sharing scheme
Figure 4-1 Taiwan UniGrid website
Figure 4-2 Tiger Grid website
Figure 5-1 50 Jobs Total Response Time of Load Barrier Policy
Figure 5-2 50 Jobs CPU Usage of Load Barrier Policy
Figure 5-3 50 Jobs execution process of 10/100
Figure 5-4 50 Jobs execution process of 100/10
Figure 5-5 The Average Job Response Time of three VOs
Figure 5-6 Compare with the total response time of two policies

List of Tables

Table 1-1 Terms and their meanings
Table 2-1 The Grid computing applied to enterprises or products in recent years
Table 3-1 The comparison with Globus Toolkit and SORS functions
Table 4-1 Detail list of our computing nodes in this study
Table 4-2 The bandwidth speed of these four sites
Table 5-1 The computing nodes of Load Barrier Policy experiment
Table 5-2 The computing nodes of Heterogeneity Policy experiment

Chapter 1. Introduction

1.1 Motivation

With advances in science and technology, the Grid has gradually matured. Grid systems comprise a network infrastructure and a software architecture that together provide a platform of distributed computing resources. Thanks to the development of the Internet and of computing technology, Internet users can easily share not only files but also extensive computing resources. Grid systems provide network services, communication channels, and computing resources. Their main functions are to use distributed computing resources more efficiently and to provide users with transparent services within Virtual Organizations (VOs). These computing resources may be scattered across different organizations or regions. At present, Grid applications have been adopted in various fields such as medical simulation, medicine, and high-energy physics.

Grid systems can be broadly divided into computational Grids and data Grids. Computational Grids focus on sharing computing resources, and data Grids focus on sharing storage resources [43]. IBM introduced e-Business on Demand (eBoD) [12] in 2002, which gave rise to on-demand Grid development. In the eBoD vision, enterprise users consume computing resources the way they consume electricity: a user starts a service when it is needed and stops it when it is no longer needed. eBoD marked a major shift from the earlier focus on scalable storage capacity to choosing computing resources according to the user's demand. In Grid system frameworks, real-time demand is satisfied by using or exchanging computing resources all over the world, a concept known as computation on demand. Before Grid technology was developed, [...] computing more efficiently by using distributed resources, and laid the foundation for the development of Grid science.

Overall, the most important concepts of Grid systems are the capabilities of “dynamic computing resources” and “crossing multiple virtual organizations”. The term “Grid” was coined in the mid-1990s to denote a proposed distributed computing infrastructure for advanced science and engineering [18]. The name is an analogy with the electric power grid: users consume electricity without needing to know where the power comes from, and still receive cheap and reliable power [18]. The first complete definition of the Grid was given by Dr. Ian Foster, Dr. Carl Kesselman, and Dr. Tuecke. Dr. Ian Foster, often called the father of Grid computing, described Grid computing as solving the problem of sharing resources dynamically across multiple virtual organizations in a flexible, secure, and coordinated way. In July 2002, Dr. Foster stated three characteristics of a Grid more concretely [15]:

(1) A Grid coordinates resources that are not subject to centralized control.

(2) It uses standard, open, general-purpose protocols and interfaces.

(3) It delivers nontrivial qualities of service.

On the other hand, Peer-to-Peer file-sharing systems such as Napster [36], Gnutella [21], and Freenet [19] resemble Grid computing systems: both share resources contributed by a community of organizations. As Grid systems grow in scale, they must begin to address issues such as self-configuration, fault tolerance, and system scalability [16]. Many Peer-to-Peer studies provide solutions to these issues, because Peer-to-Peer systems focus on coping with instability, transient traffic, self-configuration, and fault tolerance. However, Peer-to-Peer development has mainly concentrated on vertically integrated applications rather than on building a universal [...] develop open standards, and ensure that Grid systems can be integrated with applications by providing a common interface. Of course, new technologies and tools will emerge as Grid systems continue to develop, so the standards will need to be reviewed and amended over time.

As Peer-to-Peer technologies are applied to more sophisticated and complex scenarios, such as decentralized structures, desktop applications, and network computing, a strong integration of Peer-to-Peer and Grid computing is increasingly anticipated, and it could mark a new beginning with new results in the future.

In this study, we present a light-weighted Grid middleware for the collection, integration, and use of computing resources. We expect this middleware to help jobs execute more efficiently and to improve the usage of idle computing resources.

1.2 Problems

Currently, the biggest challenge in Grid systems is how to establish and manage sharing relationships among different VOs, and it calls for new technologies. In some Grid system frameworks, a job submission must specify the VO that will execute it, and a middleware component (such as Condor) then allocates computing resources within that same VO. A credible Grid middleware should provide a convenient and flexible environment for Grid users, but different Grid systems use different types of middleware, which makes computing resources difficult to integrate.

Most Grid systems do not share a common communication channel (pipeline) between VOs. Together with security considerations and the differing goals of the VOs, this makes computing resources difficult to integrate. If we adopt Peer-to-Peer technology to [...]

The resource adjustment mechanisms in Grid systems can be divided into global and local mechanisms. Most global adjustment mechanisms are implemented by a resource broker or agent. The resource broker acts as the allocator of computing resources between the user and the resource provider: it helps users find a suitable machine for their jobs and completes the information access transaction. Currently, the Globus Toolkit does not provide a resource broker function and must be combined with other resource allocation systems. For example, EGEE (Enabling Grids for E-Science and Industry in Europe) [64] developed the gLite middleware, which dispatches jobs through a global resource broker. The global resource broker assigns each job to an appropriate computing element (CE) for execution. Once a job is waiting in the job queue of a CE, however, the global resource broker can no longer reschedule it, which shows that the resource broker is not dynamic enough.

In fact, computing resources in Grid systems are extremely dynamic. For example, resources may join or leave at any time, and nodes differ in computing power and network bandwidth. All of these factors influence the performance of job execution. In such a heterogeneous environment, a computing resource may be heavily or lightly loaded at any moment. Some computing nodes have high utilization because users tend to submit jobs first to the nodes with better computing power, while the many nodes with poorer computing power are often ignored. If we can improve the usage of computing resources and reduce the chance that they sit idle, the efficiency of job execution will improve. Many load balancing and load sharing studies address resource adjustment and resource imbalance through job migration mechanisms. Traditional approaches gave little consideration to significant heterogeneity [...] environment. Therefore, choosing the right measurement standard is an important issue in load balancing and load sharing studies.

1.3 Purposes

Grid systems should not only provide computing resources but also make the most effective use of them. The principal thrusts of this study are the collection, integration, and use of computing resources, with three main contributions: (1) integrating computing resources with Peer-to-Peer technology (a decentralized structure); (2) building a light-weighted Grid middleware with Peer-to-Peer technology; and (3) improving the utilization of idle computing resources through a job migration mechanism, thereby achieving efficient sharing of computing resources. At present, most Grid systems use the Globus Toolkit as their middleware and treat it as a major standard component. In this study, we develop a number of modules and provide a prototype of a light-weighted Grid middleware based on Peer-to-Peer technology, named the Service Oriented Roaming System (SORS). By constructing a unified communication pipeline in SORS, we make message transfer and job execution more efficient. The SORS modules are Basic Service, File Transfer, Information Service, Execution Management, and Load Sharing. Together, these modules achieve the integration of resource information, the use of computing resources, and load sharing. For load sharing management, we use a distributed strategy and consider the heterogeneity factors of the Grid system, including bandwidth and job execution capability. This lets each job choose the most appropriate computing resources to [...]

1.4 Restrictions

This study is subject to the following restrictions. Grid systems are deployed across different regions, virtual organizations, storage systems, and software architectures, and they change frequently. In our experiments we had to take the budget and the management of facilities into account. The experimental environment therefore uses the Grid organizations that our laboratory currently participates in (Taiwan UniGrid and Tiger Grid), and job migration is performed across sites and across multiple virtual organizations. The experimental environment is smaller than a production environment, but this lets the administrators control the state of the computing nodes at any time. The variation in the experimental results is also smaller than it would be in a large-scale Grid system.

Security is an important consideration in the Globus Toolkit. Different virtual organizations require different certificates to ensure security when their members use computing resources. Because this study focuses on crossing sites and multiple virtual organizations with Peer-to-Peer technology, and in order to keep complete control over all the computing nodes, we do not consider the Grid Security Infrastructure (GSI) of the Globus Toolkit [...]

1.5 Notations

The terms used in this thesis and their meanings are given in Table 1-1. These symbols are used in the following chapters.

Table 1-1 Terms and their meanings

Term                     Meaning
L_AL                     Local Average Load
R_AL                     Remote Average Load
L_LB                     Local Load Barrier
R_LB                     Remote Load Barrier
Job_idle                 The idle job in the job queue
Job_Size                 The size of a job
AVG_Bandwidth            The average bandwidth speed
Migration Cost           The job migration cost
Local_JRT                The job response time in the local site
Remote_JRT               The job response time in the remote site
AVG_JRT                  The average job response time of all jobs
Local_idle_length        The idle job length in the local site job queue
Local_running_length     The running job length in the local site job queue
Remote_idle_length       The idle job length in the remote site job queue
Remote_running_length    The running job length in the remote site job queue
Local_finish_time        The job finish time in the local site
Remote_finish_time       The job finish time in the remote site
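To make the notation concrete, the sketch below shows one way the terms in Table 1-1 could be combined into a migration decision. It is a minimal illustration under stated assumptions, not the policy proposed in this thesis: it assumes that Migration Cost is the transfer time Job_Size / AVG_Bandwidth, and that a job is worth migrating only when Remote_JRT plus that cost is smaller than Local_JRT. The function names and the units (megabytes, megabytes per second, seconds) are hypothetical.

```python
def migration_cost(job_size_mb, avg_bandwidth_mb_s):
    """Assumed cost model: seconds needed to transfer the job over the link."""
    if avg_bandwidth_mb_s <= 0:
        raise ValueError("AVG_Bandwidth must be positive")
    return job_size_mb / avg_bandwidth_mb_s


def should_migrate(local_jrt, remote_jrt, job_size_mb, avg_bandwidth_mb_s):
    """Assumed rule: migrate only if the remote site still wins
    after the transfer time has been added to Remote_JRT."""
    cost = migration_cost(job_size_mb, avg_bandwidth_mb_s)
    return remote_jrt + cost < local_jrt


# A 100 MB job over a 10 MB/s link costs 10 seconds to move.
print(migration_cost(100, 10))          # 10.0
print(should_migrate(60, 40, 100, 10))  # True:  40 + 10 < 60
print(should_migrate(60, 55, 100, 10))  # False: 55 + 10 >= 60
```

Keeping the cost model in its own function makes it easy to swap in a different estimate, for example one that also accounts for queue lengths at the remote site.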

The Grid system comprises a network infrastructure and a software architecture that provide a platform of distributed computing resources. Thanks to the development of the Internet and of cheaper computing technology, Internet users can easily share not only files but also extensive computing resources. The most important concepts of Grid systems are the capabilities of “dynamic computing resources” and “crossing multiple virtual organizations”. [...] communication channel (pipeline) between the VOs; together with security considerations and the differing goals of the VOs, this makes computing resources difficult to integrate. We adopt Peer-to-Peer technology to achieve decentralized sharing of computing resources, integrating them by managing and accessing resources from different VOs. In this way we can use computing resources more rationally and reduce the chance that they sit idle, which improves the efficiency of job execution.

The principal thrusts of this study are the collection, integration, and utilization of computing resources. The three main contributions are (1) integrating computing resources with Peer-to-Peer technology; (2) building a light-weighted Grid middleware with Peer-to-Peer technology; and (3) improving the utilization of idle computing resources through a job migration mechanism, thereby achieving the sharing of computing resources. Each job can then choose the most appropriate computing resources, which reduces the chance that resources sit idle.

Chapter 1 introduces Grid and Peer-to-Peer systems, the problems addressed, and the purposes of this study. Chapter 2 discusses related work, including Grid computing, Peer-to-Peer technology, JXTA, and migration technology. Chapter 3 presents the system design, including the development tools, an overview of the system framework, and the proposed algorithms. Chapter 4 describes the experimental environment. Chapter 5 explains the results and statistics obtained from the experiments. The last chapter offers suggestions, directions for future development, and conclusions summarizing this [...]

Chapter 2. Related Work

This chapter discusses related work in four sections: Section 2.1 Grid Computing, Section 2.2 Peer-to-Peer Technology, Section 2.3 JXTA Technology, and Section 2.4 Migration Technology. Section 2.1 describes the history of Grid computing, its application in enterprises, and the development of Grid middleware such as the Globus Toolkit, Condor, and the Condor Java API.

Peer-to-Peer networks are mostly applied to file sharing and instant messaging, and they differ from the traditional client/server architecture. Section 2.2 discusses the applications of Peer-to-Peer technology and its benefits, including a decentralized framework, good scalability, robustness, high performance, security, and load balancing. Section 2.2.3 describes the similarities and differences between Peer-to-Peer and Grid systems, and surveys related studies of Peer-to-Peer Grid systems.

Section 2.3 introduces JXTA, which we adopt to implement our Peer-to-Peer Grid system. The platform defines an XML-based framework for message exchange and network topology integration. JXTA provides a series of open protocols that allow devices (e.g., mobile phones, PDAs, and computers) to connect to one another as nodes.

Section 2.4 describes the development of migration technology and its application in Grid systems. Migration technology can be used to redistribute the load of a site or node. In a distributed system, load sharing can improve system efficiency through job migration. In [...]

2.1 Grid Computing

Grid computing is a kind of distributed system. It comprises a network infrastructure and a software framework, and it provides computing services through distributed hardware and software. The goals of Grid computing are to increase computing capacity, resource utilization, and access to resources by connecting large numbers of resources distributed across regions or organizations. In addition, a Grid system allows VOs all over the world to share resources with each other and satisfies users' large computing requirements.

Organizations that share resources or cooperate toward the same goal are called a VO in a Grid system. Resources include not only computers, application service providers, hardware, and networks, but also software, scientific instruments, and commercial information. In a Grid environment, every VO attaches great importance to mutual trust, communication, and coordination. Along with the available resources, a VO also provides application protocols and structures for the members who hold access rights [46].

In 2004, the European Union expanded the Grid application environment and established EGEE (Enabling Grids for E-Science and Industry in Europe), which aimed to expand e-Science research, education, and industrial applications. In the United States, the Grid3 project was established, drawing on the development experience of LCG [31], to promote the Open Science Grid. The Grid is expected to become an important aid in scientific research, education, and social change, and the collaboration of these organizations is advancing Grid science. Taiwan is also committed to Grid research: government agencies and schools are building Grid environments together and researching new theory and [...] many high schools and universities from central Taiwan joined this project. In 2006, the Taiwan UniGrid project [50] attracted more than 30 schools and coordinated with Academia Sinica and the National Center for High-Performance Computing. At present, Grid applications are in practical use in various fields such as medical simulation, medicine, and high-energy physics.

Figure 2-1 Commercial Grid solutions taxonomy [38] (figure labels: Clusters, Departmental Grids, Enterprise Grids, Partner Grids, Open Grids; technical complexity; cross-domain access; technology capacity frontiers for 2000, 2005, and 2007)

Grid computing is not only practiced in scientific experiments. Figure 2-1 shows the development of computing technologies in recent years: from early cluster computing to today's open-standard Grid environments, Grid computing has gradually widened its scope across a variety of commercial applications. Its powerful computing capacity lets researchers work on meteorological models or weapon simulations.

Meanwhile, Grid computing is spreading to traditional business computing applications, as shown in Table 2-1. After IBM presented eBoD in 2002, Amazon began selling Elastic Compute Cloud (EC2) [1] services on the Internet in 2006. EC2 provides on-demand, flexible computing services through web services, mainly to help developers use a wider range of computing resources more easily. In 2007, SUN also launched its On-Demand Computing [47] services, which are priced by rented CPU or by usage time, so customers can buy CPU computing resources according to their requirements. These on-demand computing services also represent the development of Grid computing, which is becoming closer to [...]

Table 2-1 Grid computing applied to enterprises or products in recent years

Time: 2002
Name: PLM (Product Life Management)
Company: SUN
Goal: PLM is introduced into manufacturing processes so that enterprises can integrate existing IT resources, such as processing power, storage devices, memory, and network bandwidth, at lower cost. Enterprises can let users allocate network resources flexibly and can also run compute-intensive applications.

Time: 2004
Name: World Community Grid [http://www.worldcommunityGrid.org/]
Company: IBM
Goal: In this project, researchers use information technology on the World Community Grid to analyze large numbers of cancer tissue microarrays (TMAs), so that many experiments can be processed in a short time.

Time: 2004
Name: Oracle 10g
Company: Oracle
Goal: Oracle Grid adopts lower-cost servers and modular storage equipment and balances the system load efficiently. Users get high performance and reliability at the lowest overall cost of information services. The current version is Oracle 11g.

Time: 2007
Name: HYDRAstor [http://www.necam.com/Storage/GridStorage.cfm]
Company: NEC
Goal: HYDRAstor uses Grid-based storage technology and was developed by NEC Laboratories America Inc. Building on NEC's business experience with servers and storage, it reduces the unit cost of storage to one-tenth of that of comparable earlier products.

In August 2006, Oracle Asia-Pacific (APAC) reported that, according to the overall Grid Index report, the use of Grid applications in business organizations in APAC was growing faster than in other regions of the world. The report also indicated that the number of business organizations in Asia-Pacific that had, or planned to establish, a Grid computing system had increased by 83 percent over 2005 (compared with lower growth of 45 percent in the United States and 7 percent in Europe). Meanwhile, awareness and understanding of basic Grid capabilities in Southeast Asia ranked among the top three in the world. Clive Longbottom, research director at Quocirca, attributed the increased use of Grid computing in the Asia-Pacific region to the new value it brings to business organizations and the attention it attracts. These observations show that both the deployment of Grid computing in enterprises and the acceptance of the Grid are growing [37].

Grid middleware integrates scattered computing resources and is responsible for coordination between computing nodes. One of the important components of Grid middleware is metadata. In the European Data Grid project, for example, 150 software engineers cooperated and completed over 300,000 lines of code. In short, the main purposes of Grid middleware are resource sharing, secure access, and resource management. Common Grid middleware includes the Globus Toolkit, the China Grid Support Platform [4], gLite [13], and UNICORE [52]. The Globus Toolkit is the most popular Grid middleware and has proposed many Grid-related standards. In 2001, the Institute of Physics at Academia Sinica joined the LCG (Worldwide LHC Computing Grid) project [31] of the European Organization for Nuclear Research (CERN). The core middleware of LCG is gLite, which adopts some components from the Globus Toolkit along with various packages developed by its own team. In addition to Grid middleware and application development, many Grid-related [...]

Traditional job scheduling and load balancing studies gave little consideration to the significant heterogeneity of Grid systems, or their experimental methods were not suitable for heterogeneous Grid systems. Recent studies have gradually taken the heterogeneous characteristics of Grid systems into account, including computing ability, bandwidth, and distance. The study in [42] adopted job arrival rates and job response time as load balancing factors. The studies in [33, 34] used total job response time as the assessment standard and grouped the computing nodes with powerful computing ability, assigning jobs to these groups first. The author of [54] considered the user's expected deadline and the migration cost of a job. Most of these studies ran their experiments and presented their results by simulation, which also shows how hard it is to experiment on a real Grid system.

2.1.1 Globus Toolkit

Globus provides a framework for applications to work with distributed, heterogeneous computing resources. The Globus project is developed by the Globus Alliance [65], whose members include Argonne National Laboratory and the University of Southern California and are devoted to developing the computing environment. IBM, Intel, HP, SUN, and other enterprises also support the Globus Toolkit.

Although Grid technology has not been under development for long, its core technology has already made great progress. Currently, most Grid projects are built on the Globus Toolkit protocols and services, shown in Figure 2-2. The Globus Toolkit is open source, free for users, and can be modified to suit users' demands. Its object-oriented structure provides many services, including resource monitoring, resource discovery, execution management, security infrastructure, and data management. Programmers can adopt [...] deployment. That is why the Globus Toolkit is so popular for building Grid systems. The latest open version is Globus Toolkit 5.3.

Figure 2-2 Globus Toolkit 4 Services
(Security: Authentication and Authorization, Delegation, Community Authorization, Credential Management. Data Management: GridFTP, Reliable File Transfer, OGSA-DAI, Replica Location, Data Replication. Execution Management: Grid Resource Allocation and Management, Community Scheduler Framework, Workspace Management, Grid Telecontrol Protocol. Information Services: Index, Trigger, WebMDS. Common Runtime: Java Runtime, C Runtime, Python Runtime.)

The Globus Toolkit provides the following functions:

Security

The security services in the Globus Toolkit authenticate users' identities, protect communication channels, determine who is allowed to perform which actions (authorization), and provide supporting functions such as managing user accounts and maintaining members' data.

Data Management

The data management services in the Globus Toolkit handle the location, transfer, access, and management of distributed data. GridFTP is a secure, reliable, high-performance transfer protocol that is optimized for data transfer between nodes. GridFTP provides parallel transfer and reliable transfer, and it supports transfer security and integrity based on GSI.

Information Services

The Monitoring and Discovery System (MDS) provides the information service components of the Globus Toolkit, which report the resources available in the Grid system and their state. For example, the discovery service can find a suitable node with better computing resources for a job.

Execution Management

Grid Resource Allocation and Management (GRAM)

GRAM is the key component of the execution management services in the Globus Toolkit. It helps users locate, submit, monitor, and remotely execute jobs in the Grid system. GRAM is not a task scheduler; rather, it communicates with different batch or cluster task schedulers through a single protocol.
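The idea of placing one protocol in front of many local schedulers can be sketched with an adapter pattern. The following is a minimal illustration and not the GRAM interface: the class names LocalScheduler, CondorAdapter, PbsAdapter, and Gateway are hypothetical, PBS is an assumed second scheduler that the text does not name, and the adapters only return a description string instead of contacting a real scheduler.

```python
from abc import ABC, abstractmethod


class LocalScheduler(ABC):
    """Hypothetical adapter interface; not part of the Globus Toolkit API."""

    @abstractmethod
    def submit(self, executable):
        """Hand a generic job request to one concrete scheduler."""


class CondorAdapter(LocalScheduler):
    def submit(self, executable):
        # A real adapter would talk to Condor here; this one only reports.
        return f"[condor] queued {executable}"


class PbsAdapter(LocalScheduler):
    def submit(self, executable):
        # Assumed second scheduler, included only to show the pattern.
        return f"[pbs] queued {executable}"


class Gateway:
    """Single entry point that hides which local scheduler runs the job."""

    def __init__(self):
        self._schedulers = {}

    def register(self, name, scheduler):
        self._schedulers[name] = scheduler

    def submit(self, name, executable):
        if name not in self._schedulers:
            raise KeyError(f"unknown scheduler: {name}")
        return self._schedulers[name].submit(executable)


gateway = Gateway()
gateway.register("condor", CondorAdapter())
gateway.register("pbs", PbsAdapter())

# The caller uses the same call regardless of the scheduler behind it.
print(gateway.submit("condor", "simulate"))
print(gateway.submit("pbs", "simulate"))
```

The gateway itself makes no scheduling decision, which mirrors the statement above that such a component is not a task scheduler.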

Common Runtime Components

Starting with Globus Toolkit version 4, the development team added the common runtime components, which consist of pre-web-services libraries and tools. These components are platform independent, establish various abstraction layers, and leverage functionality lower in the web services stack.

2.1.2 Condor [6]

Condor is a workload management system for compute-intensive jobs. The goal of the Condor project is to integrate large-scale distributed computing resources and to support the implementation, deployment, and evaluation of mechanisms and policies for high-throughput computing. Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. When users submit their [...] execution finishes, the user is notified. The job types that Condor supports include parallel applications (MPI applications), Java applications, DAGMan applications, and virtual machine applications. The execution state of a job can be classified as running, held, idle, and so on.

Important features of Condor include checkpointing, remote execution, and support for heterogeneous environments [6]. Among other Condor-related developments, the Japanese scholar Hidemoto Nakada, who works at the Grid Technology Research Center of the National Institute of Advanced Industrial Science and Technology, developed the “Condor Java API”, an application for controlling the Condor job queue that helps users submit and cancel jobs quickly. With it, developers can control job state more easily, and to [...]

2.2 Peer-to-Peer Technology

This section summarizes Peer-to-Peer networks and discusses the development of Peer-to-Peer technology in Grid computing systems.

2.2.1 Peer-to-Peer Network

Currently, Peer-to-Peer networks are mostly applied to file sharing and instant messaging applications. Unlike the traditional client/server architecture, all nodes in a Peer-to-Peer network are equal and act as server and client at the same time. Each node can share its own resources and obtain resources from other nodes. Without the client/server restriction, nodes have more control and better flexibility. Network bottlenecks are reduced (a popular server often causes congestion) and network bandwidth is used properly, so network performance improves greatly. A Peer-to-Peer network is structured as a large-scale decentralized system platform that maps object locations to nodes by their identifiers [40].

The authors of [3] propose the following definition of a Peer-to-Peer network:

“Peer-to-Peer systems are distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority.”

The characteristics of Peer-to-Peer networks include decentralization, high performance, excellent scalability, robustness, security, low cost, and load balancing [3, 26]. We describe these characteristics as follows:

Decentralization

The nodes in a Peer-to-Peer network are distributed. Each node can send messages to any other node without the intervention of a centralized server. This avoids some unexpected problems, such as bottlenecks and security and management issues.

Scalability

In a Peer-to-Peer network, self-organization allows any node to join or leave the topology at any time. As the number of users grows, the demand for services increases, but the overall system resources and services expand at the same time. User requirements can therefore be satisfied more easily, which is difficult to achieve in a traditional distributed system.

Robustness

The Peer-to-Peer architecture offers high fault tolerance and resistance to network attacks. Because services are scattered across many nodes, damage to some nodes or to part of the network has limited influence on the system as a whole. In general, if some nodes fail or leave the Peer-to-Peer network, the topology adjusts itself automatically and keeps the links with the other nodes. A Peer-to-Peer network can also reorganize itself continuously according to the network bandwidth, the number of nodes, and the system load.

Security/Privacy Protection

From the start of system design, the designer considers security, in particular how to prevent attacks. Peer-to-Peer systems are deployed on

between the nodes, without passing through a centralized server or node, which prevents private information from being leaked and reduces the possibility of eavesdropping.

Load Balancing

Nodes in a Peer-to-Peer network are both server and client, unlike the client/server architecture, which emphasizes the computing power and storage capacity of the server. Moreover, because resources are distributed over a number of nodes, the Peer-to-Peer network can achieve network load balancing.

In a Peer-to-Peer network, it is not easy to regulate user behavior or the quality of the nodes, which leads to many kinds of hazards, such as abuse of other nodes' resources. In a Grid system, by contrast, all computing nodes have security certificates and trusted accounts. We can use Peer-to-Peer technology to group the computing nodes while ensuring that resources are used effectively and accessed through a secure mechanism.

The super node is an important development in Peer-to-Peer technology. Compared with a general node, a super node is more powerful in computing power, bandwidth, storage capacity, and so on, so it can take on more work than other nodes [3]. The earliest application to adopt the super node concept is KaZaa [30], one of the most popular Peer-to-Peer programs in the world. According to the statistics, KaZaa has been downloaded more than 250 million times, and file transfers in KaZaa consumed 40 percent of the global network bandwidth. KaZaa combines the benefits of Napster and Gnutella. It uses a widely distributed architecture like that of Gnutella, which lets the system expand easily. Because KaZaa does not need to save file names in a centralized index server, the system automatically chooses nodes with better performance to become super nodes, and save the neighbor nodes

technologies, such as blind search and heuristic search. Many studies investigate search methods and try to find better paths or improve search speed. With super node indexing, search efficiency has improved greatly.

At present, the well-known communication software Skype is a Peer-to-Peer VoIP application that uses the same topology as KaZaa. The topologies of KaZaa and Skype are similar and lie between those of Napster [36] and Gnutella: both group the nodes by region, and these groups may be formed from nodes that are geographically close or that share similar resources. Therefore, sharing resources within the same region results in a high query frequency and volume.

2.2.2 Peer-to-Peer Grid System

Grid systems and Peer-to-Peer systems are both built on the community concept. While Grid computing has been widely used in scientific computing, Peer-to-Peer applications are still mostly used for multimedia file exchange. Grid computing provides computing resources to users, and resource management and task scheduling are important services of a Grid system. In addition, Grid systems focus on dealing with a more powerful, more diverse, and more highly interconnected set of resources.

From the resource provision point of view, a Grid system emphasizes Quality of Service (QoS), which a Peer-to-Peer system cannot fully guarantee. As Grid systems grow in scale, Grid developers face autonomic configuration and management issues [16]. In recent years, many projects, such as Jalapeno [48], SP2A [2], and JNGI [28], have combined Peer-to-Peer technology with the Grid system. Most of these projects apply it to resource collection and job management. JNGI and Jalapeno use Peer-to-Peer technology for job execution in the Grid system. In the job management, JNGI and Jalapeno

these two Peer-to-Peer Grid systems that Jalapeno collects the jobs that have already been submitted and spreads them throughout the system, without requiring any action or request from the submitter. In JNGI, by contrast, the job submitter must send a request to a worker group before submitting a job; only after that is the submitter allowed to assign the job to the worker group. Unfortunately, JNGI cannot distribute a large number of jobs to worker groups at the same time.

Besides the collection and utilization of computing resources, resource integration must address how to allocate large-scale resources and how to collect computing resources from different domains. A common problem in Grid systems is that users prefer powerful nodes, so the load concentrates on those nodes and resource usage becomes unbalanced. We can adopt Peer-to-Peer technology to help discover new resources quickly, add computing nodes more flexibly, and use computing resources

2.3 JXTA Technology

Peer-to-Peer technology became a popular topic because of music file-sharing programs such as Napster and Kazaa. Sun hoped to enable a wider range of applications through JXTA. For example, building a Peer-to-Peer network in an office can reduce the cost of a central server structure and avoid problems such as hot spots and system maintenance [3, 26]. Sun developed JXTA in 2001. JXTA is an open-source platform for Peer-to-Peer networking, and its name comes from the word “juxtapose”. The platform defines an XML-based framework for message exchange and network topology integration.

JXTA provides a series of open protocols that allow devices (e.g., mobile phones, PDAs, and computers) to connect to one another as nodes. In the JXTA virtual network, nodes behind a firewall or NAT can still communicate with other nodes directly [29].

2.3.1 JXTA Architecture

Overall, the JXTA framework can be divided into three layers, as follows:

Core Layer

This layer is the JXTA core. It packages the basic functions of a Peer-to-Peer network, including node search, transport (e.g., transmitting data through a firewall), group establishment, and security mechanisms.

Service Layer

The JXTA service layer supports the services needed for the normal operation of the Peer-to-Peer network. Different Peer-to-Peer environments support these services to different degrees, including peer discovery, index service, data sharing, certification, distributed structure and Public Key

Application Layer

The JXTA application layer supports the implementation and combination of applications that can be deployed on the Peer-to-Peer network, such as Peer-to-Peer message transfer, data and resource sharing, entertainment media content management and publishing, mail management, and auction systems.

2.3.2 JXTA Service

JXTA provides various network services, including web services, Remote Method Invocation (RMI), and Common Object Request Broker Architecture (CORBA), and these services are delivered through pipes. JXTA can also adopt other standards, such as Web Services Description Language (WSDL) and Simple Object Access Protocol (SOAP), to improve system efficiency. The following services are the most commonly used in JXTA. As long as at least one node survives, JXTA continues to provide these services.

Advertisements act as the communication bridge in the JXTA network. Like advertisements in the real world, a JXTA advertisement describes a resource, such as a peer, group, pipe, or service, and all of these resources need advertisements to communicate in the JXTA network. Advertisements use the XML format, as shown in Figure 2-4. XML has the following characteristics:

- Standard

XML is a standard defined by the World Wide Web Consortium (W3C) and is widely accepted in computer science fields.

- Global

XML parsers support the UTF-8 character set, an encoding of the Unicode standard, so users can write XML in the language of any country.

- Self-describing

An XML document is built from metadata, tags, and attributes, so users can describe the data format themselves.

- Extensible

their information. Because of these important characteristics, XML makes communication and editing easy in the JXTA protocols.

Figure 2-4 JXTA XML format

Peer Service

A peer can be any network device that implements one or more JXTA protocols, such as a PDA, phone, or personal computer. Each peer has a unique peer ID and can take different forms depending on its function.

PeerGroup Service

A peer group is a set of nodes that share the same sharing mechanism. A node can belong to multiple groups at any time. When a node is created, it first joins the “Net Peer Group”, and it can then join other groups. As with peers, each group has a unique group ID, and the group service supports many group types, including secure groups, groups with limited scope, and monitored peer groups. The JXTA protocols describe how nodes publish, discover, join, and monitor these groups.

<Peer>MyPeer</Peer>
<PeerId>urn:jxta:uuid-59616261646162614A78746150325033BCCEADFAD24D44C5ACF38BD18BFF009403</PeerId>
<TransportAddress>jxtatls://uuid-59616261646162614A78746150325033BCCEADFAD24D44C5ACF38BD18BFF009403</TransportAddress>
<TransportAddress>tcp://192.168.200.141:9701</TransportAddress>
<TransportAddress>tcp://210.240.197.6:9701</TransportAddress>
<TransportAddress>relay://uuid-59616261646162614A78746150325033BCCEADFAD24D44C5ACF38BD18BFF009403</TransportAddress>
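As a minimal sketch of how an application might read such an advertisement, the following Python fragment parses the fields of the listing above with the standard library. The wrapping `PeerAdvertisement` root element and the helper names are assumptions made for illustration (the listing shows only the inner elements), the peer ID is a shortened placeholder, and this is not the JXTA API.

```python
import xml.etree.ElementTree as ET

# Fields taken from the advertisement listing. The root element name is a
# hypothetical wrapper added so that the fragment is well-formed XML, and
# the uuid values are shortened placeholders.
ADVERTISEMENT = """
<PeerAdvertisement>
  <Peer>MyPeer</Peer>
  <PeerId>urn:jxta:uuid-EXAMPLE</PeerId>
  <TransportAddress>tcp://192.168.200.141:9701</TransportAddress>
  <TransportAddress>tcp://210.240.197.6:9701</TransportAddress>
  <TransportAddress>relay://uuid-EXAMPLE</TransportAddress>
</PeerAdvertisement>
"""


def parse_advertisement(xml_text):
    """Extract the peer name, peer ID, and transport addresses."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("Peer"),
        "peer_id": root.findtext("PeerId"),
        "addresses": [e.text for e in root.findall("TransportAddress")],
    }


def tcp_endpoints(advertisement):
    """Return (host, port) pairs for the tcp:// transport addresses only."""
    endpoints = []
    for address in advertisement["addresses"]:
        scheme, _, rest = address.partition("://")
        if scheme == "tcp":
            host, _, port = rest.rpartition(":")
            endpoints.append((host, int(port)))
    return endpoints


if __name__ == "__main__":
    ad = parse_advertisement(ADVERTISEMENT)
    print(ad["name"], tcp_endpoints(ad))
```

Because an advertisement may list several transport addresses for one peer, a caller would typically try each endpoint in turn until one is reachable.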


2.3.3 JXTA Protocol

In this section, we explain the JXTA protocols mentioned earlier.

Peer Discovery Protocol

Using advertisements, this protocol lets a node publish information about its resources, such as peers, groups, pipes, and other services, and lets nodes discover one another. The Peer Discovery Protocol defines two types of messages:

- A request format used to discover advertisements

- A response format for responding to a discovery request
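The request and response exchange can be pictured with a small in-memory model. This is a sketch under stated assumptions: the class names, the `threshold` field, and the attribute-matching rule are hypothetical and chosen for illustration; they do not reproduce the JXTA message format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Advertisement:
    kind: str            # e.g. "peer", "group", or "pipe"
    attributes: dict     # descriptive fields, e.g. {"Name": "MyPeer"}


@dataclass
class DiscoveryQuery:
    kind: str            # kind of advertisement being searched for
    attribute: str       # attribute name to match
    value: str           # required attribute value
    threshold: int = 10  # hypothetical cap on the number of results


@dataclass
class DiscoveryResponse:
    advertisements: list = field(default_factory=list)


class Peer:
    """A node that stores published advertisements and answers queries."""

    def __init__(self, name):
        self.name = name
        self.cache = []

    def publish(self, advertisement):
        self.cache.append(advertisement)

    def handle_query(self, query):
        matches = [
            ad for ad in self.cache
            if ad.kind == query.kind
            and ad.attributes.get(query.attribute) == query.value
        ]
        # Never return more advertisements than the requester asked for.
        return DiscoveryResponse(matches[:query.threshold])


if __name__ == "__main__":
    peer = Peer("site-a")
    peer.publish(Advertisement("peer", {"Name": "MyPeer"}))
    print(peer.handle_query(DiscoveryQuery("peer", "Name", "MyPeer")))
```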

Peer Information Protocol

Most Peer-to-Peer applications need to know what remote nodes are currently doing. When a node publishes its information, other nodes can learn its status, such as how long it has been running and how many messages it has published. This protocol provides two types of messages:

- Peer Info Query Message: queries the state of a node.

- Peer Info Response Message: returns the state of a node to the requesting node.

Peer Resolver Protocol

To let nodes resolve queries addressed to other nodes, JXTA defines the Peer Resolver Protocol (PRP). This protocol defines how a node exchanges queries and responses with another node. In general, the resolver service provides the following two types of messages:

- Resolver Query Message: the message type used to send a query.

Peer Endpoint Protocol

This protocol determines the route of a message and ensures that the message is delivered correctly to the endpoint node.

Pipe Binding Protocol

The Pipe Binding Protocol is one of the most frequently used protocols in JXTA. It is used to create virtual communication channels (pipes) between one node and one or many other nodes. This protocol allows a node to find a communication pipe and bind it to an endpoint.

Rendezvous Protocol

A rendezvous can be viewed as a gathering point where nodes exchange information with each other. By caching the information that nodes broadcast, a rendezvous peer keeps information about other nodes. A rendezvous peer thus helps nodes find other nodes, or redirects a query to another rendezvous peer. In other words, a rendezvous peer is a node that can handle requests from other nodes and can also delegate requests to other rendezvous peers. Its main purpose is to make searching for information in the local network convenient. Each node can be dynamically assigned to act, or not act, as a rendezvous peer. In a hybrid centralized Peer-to-Peer architecture, the central server acts as the rendezvous peer, whereas in a decentralized architecture there may be more than one rendezvous peer. The Rendezvous Protocol resolves the problems of node searching and incomplete query results.
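The caching and delegation behaviour described above can be sketched as follows. This is an illustrative model only: the `RendezvousPeer` class, its method names, and the `visited` set used to stop queries from looping are assumptions, not part of the JXTA specification.

```python
class RendezvousPeer:
    """A node that caches resource locations and delegates unknown queries."""

    def __init__(self, name):
        self.name = name
        self.index = {}       # resource name -> node that advertised it
        self.neighbours = []  # other rendezvous peers known to this one

    def register(self, resource, node):
        """Cache the information that a node has broadcast."""
        self.index[resource] = node

    def lookup(self, resource, visited=None):
        """Answer locally if possible, otherwise delegate to neighbours."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if resource in self.index:
            return self.index[resource]
        for peer in self.neighbours:
            if peer.name not in visited:  # avoid forwarding in a cycle
                found = peer.lookup(resource, visited)
                if found is not None:
                    return found
        return None


if __name__ == "__main__":
    a, b = RendezvousPeer("rdv-a"), RendezvousPeer("rdv-b")
    a.neighbours.append(b)
    b.neighbours.append(a)
    b.register("cpu-info", "node-7")
    print(a.lookup("cpu-info"))
```

Sharing one `visited` set along the delegation chain keeps the lookup finite even when rendezvous peers reference each other.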

[Figure: JXTA protocols. Standard Services Protocols: Peer Discovery Protocol, Pipe Binding Protocol, Peer Information Protocol, Rendezvous Protocol. Core Specification Protocols: Endpoint Routing Protocol, Peer Resolver Protocol.]

2.3.4 Java Virtual Machine

The Java compiler translates a source program into bytecode, which can be executed on any machine with a Java Virtual Machine (JVM). Because programs are translated into bytecode, a program can run across platforms and operating systems. Users can therefore execute the same program on different operating systems, such as Windows, UNIX, or Macintosh’s Mac

2.4 Migration Technology

In this section, we discuss migration technology, including process migration, migration strategies, job migration in the Grid system, and load sharing policies.

2.4.1 Process Migration

The term “process” appeared in the 1960s, introduced by the designers of the Multics system [7]. Process migration is the act of transferring the state of a process from one machine to another so that it can continue executing on the target machine. The earliest migration technologies were used in operating systems, where an interconnection framework allocated CPU resources to processes. Other applications of migration include image processing, memory paging, and so on. We summarize three advantages of migration technology: (1) It supports dynamic load balancing and reduces network traffic. Many operating systems have developed process migration mechanisms, such as Accent, Sprite, and V. (2) It supports resource sharing: a process can move to a node with a special hardware device, a large amount of free memory, or some other unique resource, which gives users access to more processing power. (3) It improves system administration, because transferring processes prevents many unexpected problems, such as a machine shutting down suddenly [45, 39, 35].

Migration technology can be divided into three actions:

Step 1. Suspend the process in the source node.

Step 2. Transfer the data, file and information state to the remote node.


These steps are the basic actions of a migration strategy. Many migration strategies have been derived from them to reduce system cost and raise system performance. Common migration strategies include Eager, Lazy, and Precopy. Eager is the simplest and most common strategy, because it transfers all information (data, files, and state) in a single transfer. When the system starts an Eager migration, it first suspends the process on the source node and begins transferring the information to the remote node. After the transfer is complete, the process is reconstructed on the remote node. The time and cost of Eager are higher than those of the other strategies [39], but it is the easiest to implement.
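The Eager strategy described above (suspend, transfer everything at once, reconstruct remotely) can be sketched in a few lines. This is a toy model under stated assumptions: the `Process` and `Node` classes are hypothetical, and `pickle` serialization stands in for the network transfer; a real migration would also move open files and other operating system state.

```python
import pickle


class Process:
    def __init__(self, pid, state):
        self.pid = pid
        self.state = state    # any picklable data describing the process
        self.running = True


class Node:
    def __init__(self, name):
        self.name = name
        self.processes = {}

    def start(self, process):
        process.running = True
        self.processes[process.pid] = process


def eager_migrate(pid, source, target):
    """Move one process with the Eager strategy; return bytes transferred."""
    # Suspend the process on the source node.
    process = source.processes.pop(pid)
    process.running = False
    # Transfer all information in a single transfer (simulated by pickling).
    payload = pickle.dumps((process.pid, process.state))
    # Reconstruct the process on the remote node and resume it.
    received_pid, received_state = pickle.loads(payload)
    target.start(Process(received_pid, received_state))
    return len(payload)


if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    a.start(Process(7, {"counter": 41}))
    print(eager_migrate(7, a, b), b.processes[7].state)
```

The whole state is sent while the process is suspended, which is why the strategy is easy to implement but costly for processes with large state.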

2.4.2 Job Migration in Grid environment

Migration technology has developed from process migration over an interconnection to large-scale, geographically distributed systems. Migration applications that connect computing machines through a Wide Area Network (WAN) are said to operate over an internet connection. Resources in the Grid computing environment are dynamic, and load balancing can be achieved by migrating jobs between the computing nodes.

The Grid is a geographically distributed and heterogeneous system, so a job must adapt to different computing capacities and be rescheduled continually. Most of the impact factors considered in the Grid concern the usage of site resources. Past load balancing and scheduling studies [32] considered only the CPU and memory state, but more impact factors should be considered in a real Grid environment.

Job migration in a distributed system must address three issues to facilitate load balancing: when to activate the migration strategy, how to implement the migration strategy, and the cost of migration. Activating the migration strategy

user’s requirements and the condition of the job, or on the node status (such as CPU or memory state), to determine whether to start the migration strategy. As for how to implement the migration strategy, it can be established on the computing node or on the site. The load sharing policy defines the transfer rules between the nodes, and we can adapt a load sharing policy to establish the migration strategy. In the next section, we describe and discuss load sharing policies.

2.4.3 Load Sharing Policies

A decentralized system gathers many nodes to share computing resources. Load sharing in a decentralized system helps improve the usage of computing resources and system performance, and it minimizes job execution time by moving workload to idle or lightly loaded computing resources. The study in [27] shows that when information is transferred from the source to the destination, the transfer is significantly affected by the network state, including network latency, communication delay, and so on.

Generally speaking, load sharing policies can be divided into dynamic, static, and adaptive policies. A dynamic load sharing policy uses system state information (the load of each site) and makes decisions based on the current system state. A static load sharing policy relies on prior knowledge, such as the average task-initiation rate and execution rate of each node. A dynamic policy can therefore make better decisions than a static one. An adaptive policy is a special case that changes its behavior when the system variables or the policy change [44]. Dynamic load sharing policies can be classified into sender-initiated, receiver-initiated, and symmetric-initiated policies. By using a load sharing policy, we can increase the usage of computing resources and decrease the job response time through

Sender-initiated policy

In the sender-initiated policy, an overloaded node transfers load to a lightly loaded node to reduce its own load [10]. This policy performs better when the system load is light or medium [35]. Under high system load, however, it generates many useless request messages (which other nodes reject until the number of requests reaches the default upper bound), leading to congestion of the communication channel. Too many requests may also cause delay and waste bandwidth.
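A minimal sketch of the sender-initiated behaviour, including the upper bound on requests, is shown below. The queue-length threshold, the `probe_limit` parameter, and the fixed probing order are assumptions made for illustration; the cited studies may select and probe candidate nodes differently.

```python
def sender_initiated(sender_queue, peers, threshold, probe_limit):
    """Try to hand one job from an overloaded sender to a lighter peer.

    sender_queue: list of jobs on the sending node.
    peers: dict mapping peer name to that peer's job queue (a list).
    Returns (receiver_name_or_None, number_of_request_messages_sent).
    """
    if len(sender_queue) <= threshold:
        return None, 0            # the sender is not overloaded
    probes = 0
    for name, queue in peers.items():
        if probes == probe_limit:
            break                 # upper bound on requests reached
        probes += 1               # one request message sent
        if len(queue) < threshold:
            queue.append(sender_queue.pop())   # request accepted
            return name, probes
        # Otherwise the request is rejected and the probe was wasted.
    return None, probes


if __name__ == "__main__":
    local = ["j1", "j2", "j3", "j4"]
    others = {"B": ["x", "y", "z"], "C": ["w"]}
    print(sender_initiated(local, others, threshold=2, probe_limit=3))
```

When every peer is above the threshold, all `probe_limit` requests are rejected, which illustrates the useless messages that the policy generates under heavy load.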

Receiver-initiated policy

When a lightly loaded node has the capacity to take on more load, it actively sends a request message to heavily loaded nodes to share their load. If the lightly loaded node cannot find a suitable loaded node, it simply executes its own jobs, so good performance is still assured. In a system with light or medium load, this policy performs worse than the sender-initiated policy. Just as the sender-initiated policy does in a heavily loaded system, the receiver-initiated policy generates many useless messages in a lightly loaded system. This delays information collection and can cancel out part of the performance gain. Nevertheless, unlike the sender-initiated policy in a heavily loaded system, the receiver-initiated policy in a lightly loaded system still obtains better efficiency: there are few jobs in the system, which means few job departures, so the system is rarely affected by the delay of data collection [10, 11].
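The receiver-initiated behaviour can be sketched in the same style. As before, the threshold, the `probe_limit` parameter, and the probing order are illustrative assumptions rather than the exact policy of the cited studies.

```python
def receiver_initiated(receiver_queue, peers, threshold, probe_limit):
    """Let a lightly loaded node pull one job from a heavily loaded peer.

    receiver_queue: list of jobs on the lightly loaded node.
    peers: dict mapping peer name to that peer's job queue (a list).
    Returns (donor_name_or_None, number_of_request_messages_sent).
    """
    if len(receiver_queue) >= threshold:
        return None, 0            # no spare capacity, so no request is sent
    probes = 0
    for name, queue in peers.items():
        if probes == probe_limit:
            break
        probes += 1               # one request message sent
        if len(queue) > threshold:
            receiver_queue.append(queue.pop())   # take over one job
            return name, probes
    # No heavily loaded peer found: the node keeps executing its own jobs.
    return None, probes


if __name__ == "__main__":
    idle_node = []
    others = {"B": ["x"], "C": ["p", "q", "r", "s"]}
    print(receiver_initiated(idle_node, others, threshold=2, probe_limit=3))
```

In a lightly loaded system no peer exceeds the threshold, so every probe is wasted, but the cost is small because few jobs are waiting.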

Symmetric-initiated policy

The symmetric-initiated policy combines the two policies above and therefore has the pros and cons of both. That is, using the sender-initiated policy in a heavily loaded system and using the receiver-initiated policy in a system with light or medium load both result in many useless

generate many useless messages, and consumption of communication capacity. Compared with the above two policies in cost, symmetric-initiated policy is a more

Chapter 3. System Design and Implementation

In this chapter, we discuss the design and implementation of our system. We adopt JXTA to implement a decentralized Grid middleware named the Service Oriented Roaming System (SORS). Section 3.1 describes the JXTA development tool used for our system. Section 3.2 presents our system architecture, including the Globus Toolkit architecture and the SORS architecture. Section 3.3 compares the Globus Toolkit with SORS in detail. Sections 3.4 and 3.5 explain the implementation techniques of information discovery and load sharing in our system. Section 3.6 describes the load barrier policy and the heterogeneity policy used in this study. Finally, Section 3.7 gives simple examples.

3.1 Development Tool

In this study, we adopt JXTA as the development tool. JXTA is an open-source platform that provides Peer-to-Peer services. Currently, JXTA supports the Java and C++ languages, and the latest version is JXTA 2.5.1.

3.2 System Framework Overview

Most Grid systems use the Globus Toolkit as middleware. The architecture of the Globus Toolkit can be divided into four layers: the Application layer, the Grid service layer, the Communication layer, and the Basic Grid Fabric layer, as shown in Figure 3-1. In this study, we would like to implement the functions of the Grid service layer, including execution

Figure 3-1 The architecture of Globus Toolkit

In this study, we develop a Peer-to-Peer Grid middleware prototype named the Service Oriented Roaming System (SORS). The term “roaming” reflects the fact that jobs can move between sites. The architecture of SORS is shown in Figure 3-2. SORS is a light-weighted middleware that collects computing resource information through a decentralized structure and achieves load sharing across sites and VOs. The modules of SORS are the Basic Service, File Transfer, Information Service, Execution Management, and Load Sharing.

In general, SORS provides the following functions:

Basic Service (configure)

This module supports basic settings and configuration, including peer and group initialization. It is the underlying structure of SORS.

File Transfer

This module uses sockets to transfer data, files, and messages, and implements pipes for communication. It mainly provides the transfer service and is an underlying structure of SORS.

Information Service

The information service covers the discovery and integration of computing resources. Resource discovery is divided into two parts: the first collects resource information between sites, and the second collects local resource information (between local peers). In this study, we build a decentralized structure for integrating information across sites and VOs, such as CPU, memory, and job queue length status.

Execution Management

The execution management module handles job management, job execution, and control of the job queue.

Load Sharing

In this load sharing module, we develop a load sharing strategy to share load between

Figure 3-2 The architecture of SORS

[Figure: layered architecture. Application: Portals, Collaboratories, High Energy Physics. High Level Service (SORS): Information Service, Execution Management, File Transfer, Configure, Load Sharing. JXTA Core Service: Peer Discovery Protocol, Peer Information Protocol, Peer Resolver Protocol, Peer Endpoint Protocol, Pipe Binding Protocol, Rendezvous Protocol. JVM. Fabric: Operating Systems, Disk, Network, Databases.]

3.3 Light-Weighted System

Developing a light-weighted Grid middleware is the target of our study. SORS is a light-weighted Grid middleware prototype that provides simple and efficient basic modules to integrate computing resources and achieve load sharing.

SORS substitutes for some of the original components of the Globus Toolkit, for example Information Service, Execution Management, and Data Management. Security is an important consideration in the Globus Toolkit: different VOs require different certificates to ensure system security when members use computing resources. Because we implement a cross-VO system with Peer-to-Peer technology on the JVM, we do not use the Globus Toolkit’s GSI in our system. Table 3-1 compares the functions of SORS and the Globus Toolkit.

Table 3-1 Comparison of Globus Toolkit and SORS functions

Function                    Globus Toolkit   SORS
System core size            Large            Small
Transfer                    Yes              Yes
Information service         Yes              Yes
Resource discovery          Yes              Yes
Execution management        Yes              Yes
Job migration (cross-VOs)   None             Yes

3.4 Information Discovery

Many Grid studies monitor resources through agents or brokers. In the Globus Toolkit architecture, resource allocation is limited to a single VO, and resource information is collected by web service applications such as Ganglia, MDS, and NWS.

collected in a centralized server. The information service of SORS supports resource discovery through Peer-to-Peer technologies. The resource information includes CPU speed, CPU type, total memory, free memory, network bandwidth, and job queue length. There are two kinds of nodes in a VO: super peers and general peers. The super peers are responsible for collecting and integrating site resources. The general peers are responsible for reporting their own status information to the super peer.
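The division of work between general peers and a super peer can be illustrated with a small data model. The field list follows the resource information named above; the class names, the units, and the choice of summary fields are assumptions made for this sketch and are not taken from the SORS implementation.

```python
from dataclasses import dataclass


@dataclass
class ResourceInfo:
    """Status record a general peer reports (units are assumed)."""
    cpu_speed_mhz: float
    cpu_type: str
    memory_total_mb: int
    memory_free_mb: int
    bandwidth_mbps: float
    job_queue_length: int


class SuperPeer:
    """Collects and integrates the resource information of one site."""

    def __init__(self, site):
        self.site = site
        self.reports = {}   # general peer name -> latest ResourceInfo

    def report(self, peer_name, info):
        # A newer report from the same peer replaces the older one.
        self.reports[peer_name] = info

    def site_summary(self):
        infos = list(self.reports.values())
        return {
            "site": self.site,
            "peers": len(infos),
            "memory_free_mb": sum(i.memory_free_mb for i in infos),
            "job_queue_length": sum(i.job_queue_length for i in infos),
        }


if __name__ == "__main__":
    sp = SuperPeer("Site A")
    sp.report("node1", ResourceInfo(2400.0, "x86", 2048, 1024, 100.0, 3))
    sp.report("node2", ResourceInfo(3000.0, "x86", 4096, 512, 100.0, 0))
    print(sp.site_summary())
```

A summary of this kind is what super peers of different sites could exchange, so that site-level load can be compared without contacting every general peer.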

Figure 3-3 The structure of traditional resource discovery in Grid system

[Figure: the nodes of each site (Node 1 to Node 3 at Sites B and C are shown) report CPU, memory, and bandwidth information to a monitor application/server.]

Figure 3-4 The structure of resource discovery in SORS

3.5 Load Sharing

Improving the efficiency of resource allocation is an important issue in the Grid system. This study focuses on resource allocation among sites with a decentralized structure. To allocate computing resources and manage jobs, this study adopts migration technology integrated with the Condor Java API, which accesses job status in the Condor job queue, to achieve job management and resource sharing across sites and VOs.

Suppose that the local site is overloaded. We can use the load sharing algorithm to migrate idle jobs from the local site's job queue to an underloaded remote site, reducing job waiting time and improving the utilization of computing resources. SORS takes system load balance into consideration to obtain the best job execution time and to increase the usage of idle computing resources.
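The idea of moving idle jobs from an overloaded local queue to an underloaded remote site can be sketched as follows. This is not the SORS load sharing algorithm: the `high_water` and `low_water` thresholds, the job dictionaries, and the rule of moving the most recently queued idle job first are hypothetical choices made for illustration.

```python
def share_load(local_queue, remote_queues, high_water, low_water):
    """Migrate idle jobs away from an overloaded local site.

    local_queue: list of job dicts, e.g. {"id": 3, "state": "idle"}.
    remote_queues: dict mapping site name to that site's job queue.
    A site counts as overloaded above high_water jobs and as
    underloaded below low_water jobs (both thresholds are assumptions).
    Returns the list of (job_id, destination_site) migrations performed.
    """
    moves = []
    for site, queue in remote_queues.items():
        while len(local_queue) > high_water and len(queue) < low_water:
            # Only idle jobs are migrated; running jobs stay where they are.
            idle = next(
                (job for job in reversed(local_queue) if job["state"] == "idle"),
                None,
            )
            if idle is None:
                return moves
            local_queue.remove(idle)
            queue.append(idle)
            moves.append((idle["id"], site))
    return moves


if __name__ == "__main__":
    local = [{"id": 1, "state": "run"}] + [
        {"id": i, "state": "idle"} for i in range(2, 6)
    ]
    remotes = {"Site B": [], "Site C": []}
    print(share_load(local, remotes, high_water=3, low_water=2))
```

Taking the idle job at the tail of the queue moves the job that would otherwise wait longest locally; a different selection rule could be substituted without changing the overall structure.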

