National Taichung University of Education
Department of Digital Content Technology, Master's Program
Master's Thesis
Advisor: Dr. 賴冠州
Study and Implementation of Job Migration on Light-weighted Peer-to-Peer Grid Systems
Graduate student: 林士傑
June 2008
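Chinese Abstract
As data volumes grow and complexity increases, closer global coordination and cooperation and more effective use of resources are needed. Grid computing is a new technology for solving complex problems, and its central idea is a community view of shared computing resources. The greatest challenge facing Grids worldwide is how to share and integrate resources across different virtual organizations, administrations and access rights. At present, when jobs are dispatched in some integrated organizations, a specific virtual organization must still be designated to execute them. In addition, usage tends to concentrate on the nodes with stronger computing power, so computing resources remain unevenly used.
This study analyzes the advantages of Peer-to-Peer networks, taking the collection, integration and use of computing resources as its guiding principles. We built a prototype of a Peer-to-Peer light-weighted Grid middleware together with a distributed Resource Broker, named the Service Oriented Roaming System, and designed and implemented a Peer-to-Peer Grid communication environment that spans Virtual Organizations. We also propose a job migration mechanism and load sharing policies that improve the utilization of idle computers and make the sharing of computing resources more efficient. We hope that, building on this communication environment, more computing resources can be integrated effectively and their allocation improved.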
Abstract
With the growth of large and complex data sets, closer global coordination and cooperation are needed to use Grid resources efficiently. Grid computing is a new technology for solving such complex problems, and its central idea is sharing computing resources within a community. Unlike cluster computing, Grid computing is not confined to an intranet: a Grid system connects geographically distributed computing resources over a Wide Area Network (WAN). The biggest challenge for Grid projects is how to share and integrate computing resources across different virtual organizations (VOs). At present, users still submit a job to a specific virtual organization for execution, so resources that belong to different regions and VOs are not yet integrated and shared.
This study analyzes the advantages of Peer-to-Peer networks for the collection, integration and use of computing resources. We built a Peer-to-Peer Grid middleware, named Service Oriented Roaming System, and designed a Peer-to-Peer communication model that crosses multiple VOs. We also propose a job migration mechanism and load sharing policies for improving the utilization of idle computing resources. Based on the proposed communication environment, this study could improve the integration of computing resources and make their sharing more efficient.
Table of Contents
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Motivation
1.2 Problems
1.3 Purposes
1.4 Restrictions
1.5 Notations
Chapter 2. Related Work
2.1 Grid Computing
2.1.1 Globus Toolkit
2.1.2 Condor
2.2 Peer-to-Peer Technology
2.2.1 Peer-to-Peer Network
2.2.2 Peer-to-Peer Grid System
2.3 JXTA Technology
2.3.1 JXTA Architecture
2.3.2 JXTA Service
2.3.3 JXTA Protocol
2.3.4 Java Virtual Machine
2.4 Migration Technology
2.4.1 Process Migration
2.4.2 Job Migration in Grid environment
2.4.3 Load Sharing Policies
Chapter 3. System Design and Implementation
3.1 Development Tool
3.2 System Framework Overview
3.3 Light-Weighted System
3.4 Information Discovery
3.6.1 Load barrier policy
3.6.2 Heterogeneity policy
3.7 A Simple Example
Chapter 4. Experimental Environment
4.1 Implementation Environment
Chapter 5. Experimental Results
5.1 Load Barrier Policy
5.2 Heterogeneity Policy
5.3 The ability of cross internet
Chapter 6. Conclusion and Future Work
References
Appendix A SORS User Guide
List of Figures
Figure 2-1 Commercial Grid solutions taxonomy
Figure 2-2 Globus Toolkit 4 Services
Figure 2-3 The JXTA three-layer architecture
Figure 2-4 JXTA XML format
Figure 2-5 The JXTA project protocols
Figure 3-1 The architecture of Globus Toolkit
Figure 3-2 The architecture of SORS
Figure 3-3 The structure of traditional resource discovery in Grid system
Figure 3-4 The structure of resource discovery in SORS
Figure 3-5 Proposed Load Barrier Policy Algorithm
Figure 3-6 Proposed Heterogeneity Policy Algorithm
Figure 3-7 The Load Sharing scheme
Figure 4-1 Taiwan UniGrid website
Figure 4-2 Tiger Grid website
Figure 5-1 50 Jobs Total Response Time of Load Barrier Policy
Figure 5-2 50 Jobs CPU Usage of Load Barrier Policy
Figure 5-3 50 Jobs execution process of 10/100
Figure 5-4 50 Jobs execution process of 100/10
Figure 5-5 The Average Job Response Time of three VOs
Figure 5-6 Compare with the total response time of two policies
List of Tables
Table 1-1 Terms and their meanings
Table 2-1 The Grid computing applied to enterprises or products in recent years
Table 3-1 The comparison with Globus Toolkit and SORS functions
Table 4-1 Detail list of our computing nodes in this study
Table 4-2 The bandwidth speed of these four sites
Table 5-1 The computing nodes of Load Barrier Policy experiment
Table 5-2 The computing nodes of Heterogeneity Policy experiment
Chapter 1. Introduction
1.1 Motivation
With advances in science and technology, the Grid has gradually taken shape. Grid systems comprise the network infrastructure and the software architecture that together provide platforms of distributed computing resources. Thanks to the development of the Internet and of computing technology, Internet users can not only share files easily but also share extensive computing resources. Grid systems provide network services, communication channels and computing resources. Their main functions are to use distributed computing resources more efficiently and to provide users with transparent services in Virtual Organizations (VOs). These computing resources may be scattered across different organizations or regions. At present, Grid applications have been adopted in various fields such as medical simulation, medicine and high-energy physics.
Grid systems can be broadly divided into computational Grids and data Grids. Computational Grids focus on sharing computing resources, and data Grids focus on sharing storage resources [43]. IBM introduced e-Business on Demand (eBoD) [12] in 2002, which gave rise to on-demand Grid development. In the eBoD vision, enterprise users consume computing resources the way they consume electricity: a user starts the service when it is needed and stops it when it is no longer needed. eBoD marked a major shift from the earlier focus on the scalability of storage capacity to selecting computing resources according to the user's demand. In Grid system frameworks, real-time demand is satisfied by using or exchanging computing resources all over the world, a concept known as computation on demand. Even before Grid technology was developed, distributed resources were used to compute more efficiently, and this work laid the foundation for the development of Grid science.
Overall, the most important concepts of Grid systems are “dynamic computing resources” and “crossing multiple virtual organizations”. The term “Grid” was coined in the mid-1990s to denote a proposed distributed computing infrastructure for advanced science and engineering [18]. The name borrows from the electric power grid: users of electricity need not know where the power comes from, yet they obtain cheap and reliable power [18]. The first complete definition of the Grid was given by Dr. Ian Foster, Dr. Carl Kesselman and Dr. Tuecke. Dr. Ian Foster, the father of Grid computing, described Grid computing as dynamically solving the problems of resource sharing across multiple virtual organizations, so that resources can be shared in a flexible, secure and coordinated way. In July 2002, Dr. Ian Foster stated three more concrete characteristics of Grid computing [15]. A Grid is a system that:
(1) coordinates resources that are not subject to centralized control;
(2) uses standard, open, general-purpose protocols and interfaces;
(3) delivers nontrivial qualities of service.
On the other hand, Peer-to-Peer file-sharing systems such as Napster [36], Gnutella [21] and Freenet [19] are similar to Grid computing systems: both share resources contributed by a community. As Grid systems grow in scale, they need to address issues such as self-configuration, fault tolerance and system scalability [16]. Many Peer-to-Peer studies provide solutions to these issues. Peer-to-Peer systems focus on coping with instability, transient flows, self-configuration and fault tolerance. However, Peer-to-Peer development has mainly concentrated on vertical integration and application development, rather than on constituting the universal
develop the open standard, and ensure that Grid systems can be integrated with applications by providing common interfaces. Of course, new technologies and tools will be produced as Grid systems develop in the future, which means that the standards also need to be reviewed and amended continually.
As Peer-to-Peer technologies move into more delicate and complex applications, such as decentralized structures, desktop applications and network computing, a strong integration of Peer-to-Peer and Grid computing is anticipated, and it would mark a new beginning with new results in the future.
In this study, we present a light-weighted Grid middleware for the collection, integration and use of computing resources. We expect this middleware to make job execution more efficient and to improve the usage of idle computing resources.
1.2 Problems
Currently, the biggest challenge in Grid systems is how to establish and manage sharing relationships among different VOs. Such challenges must be addressed by new technologies. In some Grid system frameworks, a job submission must specify the VO that will execute it, and a middleware (such as Condor) then allocates computing resources within that same VO. A credible Grid middleware should provide a convenient and flexible environment for Grid users, but different Grid systems use different types of middleware, which makes it difficult to integrate computing resources.
Most Grid systems do not have a common communication channel (pipeline) between VOs. Together with security considerations and the differing goals of the VOs, this makes it difficult to integrate computing resources. If we adopt Peer-to-Peer technology to
The resource adjustment mechanisms in Grid systems can be divided into global and local mechanisms. Most global adjustment mechanisms are deployed through a resource broker or agent. A resource broker acts as the allocator of computing resources between the user and the resource provider: it helps users find suitable machines for their jobs and completes the information access transaction. Currently, the Globus Toolkit does not provide a resource broker function and must work with other resource allocation systems to allocate computing resources. For example, EGEE (Enabling Grids for E-Science and Industry in Europe) [64] developed the middleware gLite, which places jobs through a global resource broker. The global resource broker assigns each job to an appropriate computing element (CE) for execution. Once a job is waiting in the job queue of a CE, however, the global resource broker can no longer re-schedule it. This shows that the resource broker is not sufficiently dynamic.
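The limitation described above can be illustrated with a small sketch. It is not gLite code: the classes `ComputingElement` and `StaticBroker`, the `speed` attribute and the placement rule (shortest estimated backlog) are all hypothetical, chosen only to show that a broker which decides once at submission time cannot react when a CE later slows down.

```python
from collections import deque


class ComputingElement:
    """Hypothetical CE with a FIFO job queue."""

    def __init__(self, name, speed):
        self.name = name
        self.speed = speed      # assumed: work units processed per second
        self.queue = deque()    # sizes of the jobs waiting at this CE

    def pending_time(self):
        # Estimated time needed to drain the queue at the current speed.
        return sum(self.queue) / self.speed


class StaticBroker:
    """Places each job once, at submission time, and never revisits it."""

    def __init__(self, elements):
        self.elements = elements

    def submit(self, job_size):
        # Assumed rule: choose the CE with the shortest estimated backlog.
        target = min(self.elements, key=lambda ce: ce.pending_time())
        target.queue.append(job_size)
        return target.name


ce_a = ComputingElement("ce_a", speed=10.0)
ce_b = ComputingElement("ce_b", speed=10.0)
broker = StaticBroker([ce_a, ce_b])
placements = [broker.submit(10.0) for _ in range(4)]

# ce_a becomes heavily loaded after the jobs were placed. The broker has no
# operation for moving queued jobs, so the imbalance persists.
ce_a.speed = 1.0
backlog_a = ce_a.pending_time()
backlog_b = ce_b.pending_time()
```

After the slowdown, the two queues hold the same jobs but very different backlogs. A job migration mechanism would move some of the waiting jobs from `ce_a` to `ce_b` at this point.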
In fact, computing resources in Grid systems are extremely dynamic. For example, computing resources may join or leave at any time, and they differ in computing power and network bandwidth. All of these factors influence the performance of job execution. In such a heterogeneous environment, a computing resource may be heavily or lightly loaded at any moment. Some computing nodes are highly utilized: users submit jobs first to the nodes with better computing power, while the many nodes with poor computing power are often ignored. If we can improve the usage of computing resources and reduce the chance that they sit idle, job execution will become more efficient. Many load balancing and load sharing studies address resource adjustment and resource imbalance through job migration mechanisms. Traditional approaches, however, paid little attention to environments with significant heterogeneity. Choosing the right measurement standard is therefore an important issue in load balancing and load sharing studies.
1.3 Purposes
Grid systems should not only provide computing resources but also make the most effective use of them. The principal thrusts of this study are the collection, integration and use of computing resources, and its three main contributions are to (1) integrate computing resources by Peer-to-Peer technology (a decentralized structure), (2) build a light-weighted Grid middleware by Peer-to-Peer technology, and (3) improve the utilization of idle computing resources through a job migration mechanism and thereby achieve efficient sharing of computing resources. At present, most Grid middleware is based on the Globus Toolkit, which serves as a major standard component. In this study, we develop a number of modules and provide a prototype of a light-weighted Grid middleware built on Peer-to-Peer technology, named Service Oriented Roaming System (SORS). By constructing a unified communication pipeline in SORS, we make message transfer and job execution more efficient. The modules of SORS are Basic Service, File Transfer, Information Service, Execution Management and Load Sharing. With the support of these modules, SORS integrates resource information, makes use of computing resources and shares load. For load sharing management, we use a distributed strategy and consider the heterogeneity factors of the Grid system, including bandwidth, job execution capability, and so on. Let the job choose the most appropriate computing resources to
1.4 Restrictions
This study is subject to the following restrictions. Grid systems are deployed across different regions, virtual organizations, storage systems and software architectures, and they change to a high degree. For our experiments, we took budget and facility management into account. The experimental environment uses the Grid organizations that our laboratory currently belongs to (Taiwan UniGrid and Tiger Grid), and job migration is carried out across sites and across multiple virtual organizations. The experimental environment is smaller in scale than a production environment, but this allows the managers to control the state of the computing nodes at any time. The variation in the experimental results is also smaller than it would be in large-scale Grid systems.
Security is an important consideration in the Globus Toolkit. Different virtual organizations require different certificates to ensure security when their members use computing resources. Because this study focuses on crossing sites and multiple virtual organizations by Peer-to-Peer technology, and in order to keep complete control of all the computing nodes, we do not consider the Grid Security Infrastructure (GSI) of Globus Toolkit
1.5 Notations
The terms used in this thesis and their meanings are given in Table 1-1. These symbols are used in the following chapters.
Table 1-1 Terms and their meanings
Term | Meaning
L_AL | Local Average Load
R_AL | Remote Average Load
L_LB | Local Load Barrier
R_LB | Remote Load Barrier
Job_idle | The idle job in the job queue
Job_Size | The size of a job
AVG_Bandwidth | The average bandwidth speed
Migration Cost | The job migration cost
Local_JRT | The job response time in local site
Remote_JRT | The job response time in remote site
AVG_JRT | The job average response time of all jobs
Local_idle_length | The idle job length in the local site job queue
Local_running_length | The running job length in the local site job queue
Remote_idle_length | The idle job length in the remote site job queue
Remote_running_length | The running job length in the remote site job queue
Local_finish_time | The job finish time in the local site
Remote_finish_time | The job finish time in the remote site
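To make the notation of Table 1-1 concrete, the following sketch groups the per-site terms into a small data structure and combines them in an illustrative migration test. Two points are assumptions rather than definitions taken from this chapter: the migration cost is modeled as transfer time, Job_Size / AVG_Bandwidth, and the rule in `should_migrate` is only one plausible way to combine the terms. The names `SiteView`, `migration_cost` and `should_migrate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteView:
    """Per-site quantities named after Table 1-1."""
    average_load: float        # L_AL or R_AL
    load_barrier: float        # L_LB or R_LB
    idle_length: int           # Local_idle_length or Remote_idle_length
    running_length: int        # Local_running_length or Remote_running_length
    job_response_time: float   # Local_JRT or Remote_JRT, in seconds


def migration_cost(job_size_mb, avg_bandwidth_mb_per_s):
    # ASSUMPTION: Migration Cost = Job_Size / AVG_Bandwidth (transfer time).
    if avg_bandwidth_mb_per_s <= 0:
        raise ValueError("AVG_Bandwidth must be positive")
    return job_size_mb / avg_bandwidth_mb_per_s


def should_migrate(local, remote, job_size_mb, avg_bandwidth_mb_per_s):
    # ASSUMPTION: illustrative rule only, not an algorithm defined here.
    if local.idle_length == 0:
        return False  # there is no Job_idle to move
    if local.average_load <= local.load_barrier:
        return False  # the local site is not overloaded
    if remote.average_load >= remote.load_barrier:
        return False  # the remote site has no spare capacity
    cost = migration_cost(job_size_mb, avg_bandwidth_mb_per_s)
    # Migrate only if the remote response time plus the transfer is faster.
    return remote.job_response_time + cost < local.job_response_time


local_site = SiteView(0.9, 0.7, idle_length=5, running_length=2,
                      job_response_time=120.0)
remote_site = SiteView(0.2, 0.7, idle_length=0, running_length=1,
                       job_response_time=60.0)
fast_link = should_migrate(local_site, remote_site, 100.0, 10.0)
slow_link = should_migrate(local_site, remote_site, 100.0, 1.0)
```

With these example values, the same 100 MB job is worth migrating over the fast link but not over the slow one, which shows why bandwidth has to be part of the decision in a heterogeneous environment.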
The Grid system comprises the network infrastructure and the software architecture that provide a platform of distributed computing resources. Thanks to the development of the Internet and of cheaper computing technology, Internet users can not only share files easily but also share extensive computing resources. The most important concepts of Grid systems are “dynamic computing resources” and “crossing multiple virtual organizations”. Most Grid systems do not have a common communication channel (pipeline) between VOs. Together with security considerations and the differing goals of the VOs, this makes it difficult to integrate computing resources. We adopt Peer-to-Peer technology to achieve decentralized sharing of computing resources and to integrate, manage and access computing resources from different VOs. In this way, we can use computing resources more rationally and reduce the chance that they sit idle, so that job execution becomes more efficient.
The principal thrusts of this study are the collection, integration and utilization of computing resources. Its three main contributions are to (1) integrate computing resources by Peer-to-Peer technology, (2) build a light-weighted Grid middleware by Peer-to-Peer technology, and (3) improve the utilization of idle computing resources through a job migration mechanism and thereby achieve the sharing of computing resources. Jobs can then choose the most appropriate computing resources, which reduces the chance that computing resources sit idle.
Chapter 1 introduces Grid and Peer-to-Peer systems, the problems and the purposes of this study. Chapter 2 discusses related work, including Grid computing, Peer-to-Peer technology, JXTA and migration technology. Chapter 3 discusses the system design, including the development tool, an overview of the system framework and the proposed algorithms. Chapter 4 describes the experimental environment of this study. Chapter 5 explains the results and statistics obtained from the experiments. The last chapter will include related suggestions, directions for future development, and conclusions for the summarization of this
Chapter 2. Related Work
This chapter discusses related work in four sections: section 2.1 Grid Computing, section 2.2 Peer-to-Peer Technology, section 2.3 JXTA Technology and section 2.4 Migration Technology. Section 2.1 describes the history of Grid computing, its application in enterprises and the development of Grid middleware such as Globus Toolkit, Condor and the Condor Java API.
Peer-to-Peer networks are mostly applied to file sharing and instant messaging applications and differ from the traditional client/server architecture. In section 2.2, we discuss the applications of Peer-to-Peer technology and its benefits, including a decentralized framework, great scalability, robustness, high performance, security and load balancing. In section 2.2.2, we describe the similarities and differences between Peer-to-Peer and Grid systems and the related studies of Peer-to-Peer Grid systems.
In section 2.3, we introduce JXTA, which we adopt to implement the Peer-to-Peer Grid system. This platform defines an XML-based framework for message exchange and network topology integration. JXTA provides a series of open protocols that allow devices (e.g., mobile phones, PDAs and computers) to connect to one another as nodes.
In section 2.4, we describe the development of migration technology and its application in Grid systems. With migration technology, we can redistribute the load of a site or node. In a distributed system, load sharing can improve system efficiency through job migration. In
2.1 Grid Computing
Grid computing is a kind of distributed system that comprises a network infrastructure and a software framework and provides computing services through distributed hardware and software. The goals of Grid computing are to improve computing capacity, resource utilization and resource access by connecting large numbers of resources that are distributed geographically or across organizations. In addition, a Grid system allows VOs all over the world to share resources with each other and satisfies users' large computing requirements.
Different organizations share resources or cooperate toward the same goal. In a Grid system, such a group of organizations is called a VO. Resources include not only computers, application service providers, hardware and network resources, but also software, scientific instruments and commercial information. In the Grid environment, every VO attaches great importance to mutual trust, communication and coordination. Along with the available resources, a VO also provides the application protocols and structures for members who hold access rights [46].
In 2004, the European Union expanded the Grid application environment and established EGEE (Enabling Grids for E-Science and Industry in Europe), which was planned to expand e-Science research, education and industrial applications. The United States established the Grid3 project, drawing on the development experience of LCG [31], to promote the Open Science Grid. The Grid is expected to become an important aid to scientific research, education and social change, and the collaboration of these organizations is promoting the development of Grid science. Taiwan is also committed to Grid research: government agencies and schools build Grid system environments together, and research in the new theory and
many high schools and universities in central Taiwan joined this project. In 2006, the Taiwan UniGrid project [50] attracted more than 30 schools and coordinated with Academia Sinica and the National Center for High-Performance Computing. At present, Grid applications are in practical use in various fields such as medical simulation, medicine and high-energy physics.
Figure 2-1 Commercial Grid solutions taxonomy [38]
[Figure: Clusters, Departmental Grids, Enterprise Grids, Partner Grids and Open Grids, plotted against technical complexity and cross-domain access, with technology capacity frontiers for 2000, 2005 and 2007]
Grid computing is not only practiced in scientific experiments. Figure 2-1 shows the development of computing technologies in recent years: from early cluster computing to the current open-standards Grid environment, Grid computing has gradually widened its scope across a variety of commercial applications. Its powerful computing capacity lets researchers work on meteorological models or weapon simulations.
Meanwhile, Grid computing is spreading to traditional business computing applications, as shown in Table 2-1. After IBM presented eBoD in 2002, Amazon began selling Elastic Compute Cloud (EC2) [1] services on the Internet in 2006. EC2 provides on-demand, flexible computing services through web services, mainly to help developers use the wider computing resources more easily. In 2007, SUN also launched its On-Demand Computing [47] services, which are priced by rented CPUs or by usage time, so that customers can buy CPU computing resources according to their requirements. These On-Demand Computing services also represent the development of Grid computing, and become more close to
Table 2-1 The Grid computing applied to enterprises or products in recent years
Time | Name | Company | Goal
2002 | PLM (Product Life Management) | SUN | PLM is brought into manufacturing processes so that enterprises can integrate existing IT resources, such as processing power, storage devices, memory and network bandwidth, at lower cost. Enterprises can allocate network resources flexibly and also run compute-intensive applications.
2004 | World Community Grid [http://www.worldcommunityGrid.org/] | IBM | In this project, researchers use the information technology of the World Community Grid to analyze large numbers of cancer tissue microarrays (TMAs) so that many experiments can be processed in a short time.
2004 | Oracle 10g | Oracle | Oracle Grid adopts lower-cost servers and modular storage equipment and balances system load efficiently. Users get high performance and reliability at the lowest overall cost of information services. The current version is Oracle 11g.
2007 | HYDRAstor [http://www.necam.com/Storage/GridStorage.cfm] | NEC | HYDRAstor uses Grid-based storage technology and was developed by NEC Laboratories America Inc. in the U.S. Drawing on NEC's experience with servers and storage, it reduces the unit cost of storage to one-tenth of that of similar earlier products.
In August 2006, Oracle Asia-Pacific (APAC) reported, citing the Overall Grid Index, that the use of Grid applications in business organizations is growing faster in APAC than in other regions of the world. The report also indicated that the number of business organizations in Asia-Pacific that had established, or planned to establish, a Grid computing system had increased by 83 percent over 2005 (compared with lower growth of 45 percent in the United States and 7 percent in Europe). Meanwhile, awareness and understanding of basic Grid capabilities in Southeast Asia ranked among the top three in the world. Clive Longbottom, research director of Quocirca, commented that the use of Grid computing increased in the Asia-Pacific region because Grid computing brings business organizations new value and attention. These findings show that both the use of Grid computing in enterprises and the acceptance of the Grid are growing [37].
Grid middleware integrates scattered computing resources and is responsible for coordination between computing nodes. One of the important components of Grid middleware is metadata. The European Data Grid project, for example, involved 150 software engineers and produced over 300,000 lines of code. In short, the main purposes of Grid middleware are resource sharing, secure access and resource management. Common Grid middleware includes Globus Toolkit, China Grid Support Platform [4], gLite [13] and UNICORE [52]. Globus Toolkit is the most popular Grid middleware and has proposed many Grid-related standards. In 2001, the Institute of Physics at Academia Sinica joined the LCG (Worldwide LHC Computing Grid) project [31] of the European Organization for Nuclear Research (CERN). The core middleware of LCG is gLite, which adopts some components from the Globus Toolkit along with various packages developed by its own team.
In addition to the Grid middleware and the application development, many Grid-related
Traditional job scheduling and load balancing studies paid little attention to the significant heterogeneity of Grid systems, or used experimental methods that did not suit heterogeneous Grid systems. Recent studies have gradually taken the heterogeneous characteristics of Grid systems into account, including computing ability, bandwidth and distance. The study in [42] adopted job arrival rates and job response time as load balancing factors. In [33, 34], the authors used total job response time as the assessment standard and grouped the computing nodes with powerful computing ability so that jobs are assigned to these groups first. In [54], the author considered the user's expected deadline and the migration cost of a job. Most of these studies ran their experiments and presented their results by simulation, which also shows how hard it is to experiment on a real Grid system.
2.1.1 Globus Toolkit
Globus provides a framework for applications to handle distributed heterogeneous computing resources. The Globus project is developed by the Globus Alliance [65], whose members include Argonne National Laboratory and the University of Southern California and are devoted to developing the computing environment. IBM, Intel, HP, SUN and other enterprises also support the Globus Toolkit.
Although Grid technology has not been under development for long, its core technology has already made great progress. Currently, most Grid projects are built on the Globus Toolkit protocols and services shown in Figure 2-2. Globus Toolkit is open source and free, and users can modify it to meet their needs. The object-oriented structure of Globus Toolkit provides many services, including resource monitoring, resource discovery, execution management, security infrastructure and data management. The programmer can adopt
deployment. That is why Globus Toolkit is so popular for building Grid systems. The latest open version at the time of writing is in the Globus Toolkit 4 series.
Figure 2-2 Globus Toolkit 4 Services
[Figure: Globus Toolkit 4 services grouped by category. Security: Credential Management, Authentication Authorization, Delegation, Community Authorization. Data Management: Data Replication, Replica Location, GridFTP, Reliable File Transfer, OGSA-DAI. Execution Management: Grid Telecontrol Protocol, Workspace Management, Community Scheduler Framework, Grid Resource Allocation and Management. Information Service: Trigger, Index, WebMDS. Common Runtime: Java Runtime, C Runtime, Python Runtime.]
Globus Toolkit provides the following functions:
Security
The security service in Globus Toolkit authenticates users' identities, protects communication channels, determines who is allowed to perform which actions (authorization), and offers other support functions such as managing user accounts and maintaining members' data.
Data Management
The data management service in Globus Toolkit handles the location, transfer, access and management of distributed data. GridFTP is a secure, reliable, high-performance transfer protocol optimized for data transfer between nodes. GridFTP provides parallel transfer and reliable transfer, and supports transfer security and integrity through GSI.
Information Services
The Monitoring and Discovery System (MDS) provides the information service components of the Globus Toolkit, covering the available resources and their state in the Grid system. For example, the discovery service can find a suitable node with better computing resources for a job.
Execution Management
Grid Resource Allocation and Management (GRAM)
GRAM is the key component of the execution management services in Globus Toolkit. It helps the user locate, submit, monitor and remotely execute jobs in the Grid system. GRAM is not a task scheduler. Instead, it communicates with different batch or cluster task schedulers through a single protocol.
Common Runtime Components
Since Globus Toolkit version 4, the development team has added the common runtime components, pre-web libraries and tools. These services are platform independent, establish various abstraction layers, and leverage functionality lower in the web service stack.
2.1.2 Condor [6]
Condor is a workload management system for compute-intensive jobs. The goal of the Condor project is to integrate large-scale distributed computing resources and to support the implementation, deployment, assessment mechanisms and adjustment strategies for high-throughput computing. Condor provides a job queue mechanism, scheduling policy, priority scheme, resource monitoring and resource management. When users submit their
execution finishes, the user is notified. Condor supports several job types, including parallel applications (MPI applications), Java applications, DAGMan applications and virtual machine applications, and the execution state of a job can be classified as run, held, idle, and so on. Important features of Condor include checkpointing, remote execution and support for heterogeneous environments [6]. Among other Condor-related developments, the Japanese scholar Hidemoto Nakada, who serves at the Grid Technology Research Center of the National Institute of Advanced Industrial Science and Technology, developed an application called “Condor Java API” that can control the Condor job queue and helps users submit and cancel their jobs quickly. With it, developers can control job state more easily, and to
2.2 Peer-to-Peer Technology
In this section, we give an overview of Peer-to-Peer networks and of the development of
Peer-to-Peer technology in Grid computing systems.
2.2.1 Peer-to-Peer Network
Currently, Peer-to-Peer networks are mostly applied to file sharing and instant messaging
applications. Unlike the traditional client/server architecture, the computer nodes in a
peer-to-peer network are all equal and play the server and client roles at the same time. Each
node can share its resources and can also obtain resources from other nodes. Under
this architecture, free of server or client restrictions, computer nodes have more autonomy
and better flexibility. Network bottlenecks are also reduced (a popular server often
causes congestion) and the network bandwidth is used more evenly, so
network performance improves greatly. A peer-to-peer network is structured as a large-scale
decentralized system platform that maps object locations to nodes by their
identifiers [40].
In [3], the authors propose the following definition of a Peer-to-Peer network:
Peer-to-Peer systems are distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority.
The characteristics of a peer-to-peer network include decentralization, high performance,
excellent scalability, robustness, security, price advantage, and load balancing [3, 26]. We
describe these characteristics as follows:
Decentralization
The computer nodes in a Peer-to-Peer network are distributed. Each node can send
messages to the others without the intervention of a centralized server. Therefore, the
network can avoid some unexpected problems, such as bottlenecks and security and management
issues.
Scalability
In a Peer-to-Peer network, self-organization lets every node join or leave the topology
at any time. As the number of users increases, not only does the demand for services
increase, but the overall system resources and services also expand simultaneously. The
user's requirements can therefore be satisfied more easily, which is not possible in a
traditional distributed system.
Robustness
The advantages of the Peer-to-Peer architecture are high fault tolerance and
resistance to network attacks. Because the services are scattered over various nodes, damage
to some nodes or network links has less influence on the system as a whole.
Generally speaking, if some nodes are lost or leave the Peer-to-Peer network, the topology
adjusts itself automatically and keeps the links with the other nodes. A Peer-to-Peer
network can also reorganize itself continually according to the network bandwidth, the
number of nodes, and the system load.
Security/Privacy Protection
When the system is first established, the system designer considers the security
problem of how to prevent attacks. Peer-to-Peer systems exchange information directly
between the nodes, without passing through a centralized server or node, which reduces the
risk of private information leaking and the possibility of eavesdropping.
Load Balancing
Every computer node in a Peer-to-Peer network is both server and client, unlike the
client/server architecture, which emphasizes the computing power and storage capacity of the
server. Meanwhile, because the resources are distributed over a number of nodes, the
Peer-to-Peer network can achieve network load balancing.
In a Peer-to-Peer network, it is not easy to regulate user behavior and the quality of
computer nodes, which leads to many different types of hazards, such as the abuse of other
nodes' resources. In a Grid system, the computing nodes all have security certificates and
reliable accounts. We can use Peer-to-Peer technology to group the computing nodes and
ensure that resources are used effectively under a safe access mechanism.
The super node is an important development in Peer-to-Peer technology. Compared with
a general node, a super node is more powerful in computing
power, bandwidth, storage capacity, and so on. Therefore, a super node can take on more work
than other nodes [3]. The earliest application to adopt the super node concept is KaZaa [30].
KaZaa is one of the most popular Peer-to-Peer programs in the world. According to the
statistics, KaZaa has been downloaded more than 250 million times, and file transfers in KaZaa
consumed 40 percent of the global network bandwidth. KaZaa combines the benefits of Napster
and Gnutella. It uses a widely distributed architecture
like Gnutella, which lets the system be expanded easily. Because KaZaa does not need to
save file names in a centralized index server, the system automatically chooses the nodes
with better performance to become super nodes, and saves the neighbor nodes
technologies, such as blind search and heuristic search. Many studies investigate search
methods and try to find better paths or improve the search speed. With super node
indexing, search efficiency has improved greatly.
At present, the well-known communication software Skype is a Peer-to-Peer VoIP
application that uses the same topology as KaZaa. The topologies of KaZaa and Skype
are similar and lie between those of Napster [36] and Gnutella. Both group the nodes into
regions, and these groups may be formed from nodes that are geographically close or that
share similar resources. Therefore, sharing resources within the same region brings a high
query frequency and quantity.
2.2.2 Peer-to-Peer Grid System
Grid systems and Peer-to-Peer systems are both built on the community concept. While Grid
computing has been widely used in computational science, Peer-to-Peer applications are
still mostly used for multimedia file exchange. Grid computing provides computing resources to
the user, and resource management and task scheduling are important services of a Grid
system. In addition, Grid systems focus on dealing with a more powerful, more diverse, and
more highly interconnected set of resources.
From the viewpoint of resource provision, Grid systems emphasize supplying Quality of Service
(QoS), which Peer-to-Peer systems cannot fully guarantee. Now, as Grid systems
grow in scale, Grid developers face autonomic configuration and management issues
[16]. In recent years, many projects, such as Jalapeno [48], SP2A [2], and JNGI [28], have
combined Peer-to-Peer technology with the Grid system. Most of these projects
apply it to resource collection and job management. JNGI and Jalapeno use Peer-to-Peer
technology for job execution in the Grid system, but the two systems differ in job
management. Jalapeno collects the jobs that have already been submitted
and spreads them throughout the system without asking the submitter for any action or
request. In JNGI, the job submitter must send a request to a worker group before submitting
a job; only after that is the submitter allowed to assign the job to the worker group.
Unfortunately, JNGI cannot distribute a large number of jobs to worker groups at the same
time.
Besides the collection and utilization of computing resources, the issues of resource
integration are how to allocate large-scale resources and how to collect computing resources
from different domains. A common problem in Grid systems is that users always prefer to use
the powerful nodes, so the load is concentrated on these nodes, which causes a
resource imbalance problem. We can adopt Peer-to-Peer technology to help discover
new resources quickly, add computing nodes more flexibly, and use computing resources
2.3 JXTA Technology
Peer-to-Peer technology became a popular topic because of music file-sharing
programs such as Napster and KaZaa. Sun hoped to foster more applications through JXTA.
For example, building a Peer-to-Peer network in an office could reduce the cost of a central
server structure and avoid problems such as hot spots and system maintenance [3, 26].
Sun developed JXTA in 2001. JXTA is an open-source platform for Peer-to-Peer networks,
and its name comes from the word "juxtapose". The platform defines an XML-based framework for
message exchange and network topology integration.
JXTA provides a series of open protocols that allow
devices (e.g., mobile phones, PDAs, and computers) to connect to one another as nodes. In the
JXTA virtual network, even nodes behind a firewall or NAT can still communicate with
other nodes directly [29].
2.3.1 JXTA Architecture
Overall, the JXTA framework can be divided into three layers, as follows:
Core Layer
This layer is the JXTA core. It gathers the basic functions of a Peer-to-Peer network in a
package, including node search, media transmission (e.g., transmitting media data
through a firewall), group establishment, and security mechanisms.
Service Layer
The JXTA service layer supports the Peer-to-Peer network services and normal operation.
However, different Peer-to-Peer environments support these services to different degrees,
including peer discovery, index service, data sharing, certification, distributed structure
and Public Key
Application Layer
The JXTA application layer supports the implementation and combination of applications, such
as Peer-to-Peer message transfer, data and resource sharing, entertainment media content
management and publishing, mail management, auction systems, and other functions that can be
deployed on a Peer-to-Peer network.
2.3.2 JXTA Service
JXTA provides various network services, including web services, Remote Method
Invocation (RMI), and Common Object Request Broker Architecture (CORBA), and these
services are delivered through pipes. JXTA can also adopt other standards, such as the Web
Services Description Language (WSDL) and the Simple Object Access Protocol (SOAP), to improve
system efficiency. The following services are the most commonly used in JXTA. As
long as at least one node survives, JXTA continues to provide these services.
Advertisement
Advertisements act as the bridge for conversation in the JXTA network. Like a real-world
advertisement, an advertisement in JXTA describes a resource, such as a peer,
group, pipe, or service, and all of these resources need advertisements to
communicate in the JXTA network. Advertisements use the XML format, as shown in
Figure 2-4. XML has the following characteristics:
- Standard
The XML standard was made by the World Wide Web Consortium (W3C) and has a
high degree of acceptance in computer science fields.
- Global
XML parsers support the UTF-8 character set, an encoding of the Unicode
standard, so users can edit XML in the language of any country.
- Self-describing
The XML format is constructed from metadata, tags, and attributes, so the user can
describe the data format within the document itself.
- Extensible
their information. Because of these important characteristics, XML makes
communication and editing through the JXTA protocols easy.
Figure 2-4 JXTA XML format
Peer Service
A peer can be any network device that implements one or more JXTA protocols, such as a
PDA, phone, or personal computer. Each node has a unique peer ID and can exist
in many types with different functions.
PeerGroup Service
A peer group is a set of nodes that use the same sharing mechanism within the group. At
any time, a node can belong to multiple groups. When a node is created, it
joins the "Net Peer Group" at the beginning, and then it can join other groups.
As with the peer service, each group has a unique group ID, and there are many
group types in the group services, including secure groups, limited-scope groups,
and monitoring peer groups. The JXTA protocols describe how a node publishes, discovers,
joins, and monitors these services.
<Peer>MyPeer</Peer>
<PeerId>urn:jxta:uuid-59616261646162614A78746150325033BCCEADFAD24D44C5ACF38BD18BFF009403</PeerId>
<TransportAddress>jxtatls://uuid-59616261646162614A78746150325033BCCEADFAD24D44C5ACF38BD18BFF009403</TransportAddress>
<TransportAddress>tcp://192.168.200.141:9701</TransportAddress>
<TransportAddress>tcp://210.240.197.6:9701</TransportAddress>
<TransportAddress>relay://uuid-59616261646162614A78746150325033BCCEADFAD24D44C5ACF38BD18BFF009403</TransportAddress>
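As an illustration of how an application might read such an advertisement, the following minimal Python sketch extracts the peer name, the peer ID, and the transport addresses from an abbreviated copy of the advertisement in Figure 2-4. It does not use the JXTA API. The `PeerAdvertisement` wrapper element and the function name `parse_advertisement` are assumptions introduced only so that the fragment can be parsed as a single XML document.

```python
import xml.etree.ElementTree as ET

# Abbreviated advertisement modeled on Figure 2-4. The <PeerAdvertisement>
# wrapper is an assumption, added so the fragment has a single root element.
ADVERTISEMENT = """
<PeerAdvertisement>
  <Peer>MyPeer</Peer>
  <PeerId>urn:jxta:uuid-59616261646162614A78746150325033BCCEADFAD24D44C5ACF38BD18BFF009403</PeerId>
  <TransportAddress>tcp://192.168.200.141:9701</TransportAddress>
  <TransportAddress>tcp://210.240.197.6:9701</TransportAddress>
</PeerAdvertisement>
"""


def parse_advertisement(xml_text):
    """Return the peer name, peer ID, and transport addresses as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("Peer"),
        "peer_id": root.findtext("PeerId"),
        "addresses": [element.text for element in root.findall("TransportAddress")],
    }


info = parse_advertisement(ADVERTISEMENT)
print(info["name"], info["addresses"])
```

Because the advertisement is plain XML, any standard parser can read it, which is the practical benefit of the self-describing property listed above.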
2.3.3 JXTA Protocol
In this section, we explain the JXTA protocols mentioned above.
Peer Discovery Protocol
Using advertisements, this protocol lets a node publish its resource information,
such as peers, groups, pipes, and other services, and allows nodes to discover
one another. The Peer Discovery Protocol defines two types of messages:
- A request format used to discover advertisements
- A response format for responding to a discovery request
Peer Information Protocol
Most Peer-to-Peer applications want to know what remote nodes are doing at a given moment.
When a node publishes its information, we can learn its status, such as how long it
has been up and how many messages it has published. This protocol provides two types
of messages:
- Peer Info Query Message: queries the state of a node.
- Peer Info Response Message: returns the state of a node to another node.
Peer Resolver Protocol
To let nodes resolve queries addressed to other nodes, JXTA defines the Peer Resolver
Protocol (PRP). This protocol defines how a node exchanges queries
and responses with another node. In general, the Resolver service provides the
following two types of messages:
- Resolver Query Message: the message format for sending a query.
Peer Endpoint Protocol
This protocol determines the route of a message and delivers messages to the endpoint node
correctly.
Pipe Binding Protocol
The Pipe Binding Protocol is one of the most used protocols in JXTA. It is used to create
virtual communication channels (pipes) between two or more nodes. This protocol
allows a node to find a communication pipe and
bind it to an endpoint.
Rendezvous Protocol
A Rendezvous peer can be viewed as a collection point where nodes exchange
information with each other. By caching the information that nodes broadcast, a Rendezvous
peer retains information about other nodes. Thus, a Rendezvous peer helps nodes find other
nodes or redirects a query request to another Rendezvous peer. In other words, a Rendezvous
peer is a node that can handle requests from other nodes and can also delegate requests to
other Rendezvous peers. Its main purpose is to make searching for information in the local
network convenient. Each node can be dynamically assigned to be a Rendezvous peer or not. In a
hybrid centralized Peer-to-Peer architecture, the central server acts as the Rendezvous
peer, whereas in a decentralized architecture there may be more than one Rendezvous peer.
Through the Rendezvous protocol, we can resolve the problems of node searching and
incomplete data queries.
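The lookup behavior described for the Rendezvous peer (answer from the local cache, otherwise redirect the query to another Rendezvous peer) can be sketched as follows. This is a simplified in-memory model, not the JXTA API: the class name `RendezvousPeer`, its methods, and the use of a plain dictionary as an advertisement are all hypothetical.

```python
class RendezvousPeer:
    """Hypothetical rendezvous peer that caches advertisements of other nodes."""

    def __init__(self, name):
        self.name = name
        self.cache = {}     # peer name -> advertisement (any dict)
        self.others = []    # other rendezvous peers this one can redirect to

    def publish(self, peer_name, advertisement):
        """Store the advertisement that a node announced to this rendezvous peer."""
        self.cache[peer_name] = advertisement

    def lookup(self, peer_name, visited=None):
        """Answer from the local cache, else delegate to other rendezvous peers."""
        visited = visited if visited is not None else set()
        visited.add(self.name)
        if peer_name in self.cache:
            return self.cache[peer_name]
        for other in self.others:
            if other.name not in visited:   # avoid query loops between peers
                found = other.lookup(peer_name, visited)
                if found is not None:
                    return found
        return None


rdv_a = RendezvousPeer("rdv-a")
rdv_b = RendezvousPeer("rdv-b")
rdv_a.others.append(rdv_b)
rdv_b.others.append(rdv_a)
rdv_b.publish("worker-1", {"address": "tcp://192.168.200.141:9701"})
print(rdv_a.lookup("worker-1"))
```

The `visited` set is a design choice of this sketch: it keeps two Rendezvous peers that know each other from forwarding the same query back and forth.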
[Figure: JXTA protocols. Standard Services Protocols: Peer Discovery Protocol, Pipe Binding Protocol, Peer Information Protocol, Rendezvous Protocol. Core Specification Protocols: Endpoint Routing Protocol, Peer Resolver Protocol.]
2.3.4 Java Virtual Machine
The Java compiler translates the source program into bytecode, which can be executed on any
machine with a Java Virtual Machine (JVM). Because the program is translated into bytecode, it
can be executed in a cross-platform or cross-OS manner. This means users can execute
the same program on different operating systems, such as Windows, UNIX, or Macintosh's Mac
2.4 Migration Technology
In this section, we discuss migration technology, including process migration,
migration strategies, job migration in the Grid system, and load sharing policies.
2.4.1 Process Migration
The term "process" appeared in the 1960s, introduced by the designers of the Multics system
[7]. Process migration is the act of transferring the state of a process from one machine to
another so that it can continue executing on the target machine. The earliest migration
technologies were used in operating systems, which allocated CPU
resources to processes via an inter-connection framework. Other migration applications are
used in image processing, memory paging, and so on. We summarize three advantages of
migration technology. (1) It helps dynamic load balancing and reduces network traffic. Many
operating systems have developed process migration mechanisms, such as Accent, Sprite, and V.
(2) It helps resource sharing: a specific node may have a special hardware device, a large
amount of free memory, or some other unique resource, and migration helps users access more
processing power. (3) It improves system administration, because transferring a process can
prevent many unexpected problems, such as a machine shutting down suddenly [45, 39, 35].
Migration technology can be divided into three actions:
Step 1. Suspend the process in the source node.
Step 2. Transfer the data, file and information state to the remote node.
These steps are the basic actions of a migration strategy. Based on these steps,
many migration strategies have been derived to reduce the system cost and raise the
system performance. Common migration strategies include Eager, Lazy, and Precopy.
Eager is the simplest and most common migration strategy, because it transfers
all information (data, files, and state) in a single transfer. When the system starts the
Eager migration strategy, the first step is to suspend the process on the source node and
begin transferring information to the remote node. After the information has been transferred
completely, the process is reconstructed on the remote node. The time and cost of Eager are
higher than those of the other strategies [39], but it is the easiest to implement.
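The Eager strategy described above (suspend on the source node, transfer all information in one pass, then reconstruct the process on the remote node) can be sketched with a small in-memory model. The `Process` and `Node` classes and the function `eager_migrate` are hypothetical names used only for illustration; a real system would serialize the state and send it over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    state: dict = field(default_factory=dict)   # data, files, and execution state
    status: str = "running"


@dataclass
class Node:
    name: str
    processes: dict = field(default_factory=dict)   # pid -> Process


def eager_migrate(pid, source, target):
    """Suspend on the source, copy everything in one transfer, rebuild on the target."""
    process = source.processes[pid]
    process.status = "suspended"                  # suspend the process on the source node
    snapshot = dict(process.state)                # transfer all information in one pass
    rebuilt = Process(pid=pid, state=snapshot)    # reconstruct the process remotely
    target.processes[pid] = rebuilt
    del source.processes[pid]                     # the source no longer owns the process
    return rebuilt


site_a = Node("site-a", {7: Process(7, {"counter": 41})})
site_b = Node("site-b")
moved = eager_migrate(7, site_a, site_b)
print(moved.status, site_b.processes[7].state)
```

The single `snapshot` copy is what makes the strategy simple to implement, and it is also why its transfer time grows with the full size of the process state.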
2.4.2 Job Migration in Grid environment
Migration technology has developed from process migration (inter-connection) to dealing
with large-scale, geographically distributed systems. Migration applications that connect
various computing machines through the Wide Area Network (WAN) are called
internet-connection. The resources in a Grid computing environment are dynamic, and we
can achieve load balancing by migrating jobs between the computing nodes.
The Grid is a cross-geographical and heterogeneous system, so a job must adapt to
different computing capacities and be rescheduled constantly. Most of the impact factors in
the Grid concern the usage of site resources. Past load balancing and scheduling studies
[32] consider only the CPU and memory state, but more impact
factors should be considered in a real Grid environment.
Job migration in a distributed system needs to address three issues to facilitate the
implementation of load balancing: when to activate the migration strategy, how to implement
the migration strategy, and the migration cost. Activating the migration strategy depends on
the user's requirements and the condition of the job, or on the node status (such as
CPU or memory state), which determine whether to start the migration strategy. As for how to
implement the migration strategy, we can establish it on the computing
node or the site. The load sharing policy defines the transfer rules between the nodes, and we
can adopt a load sharing policy to establish the migration strategy. In the next section, we
describe and discuss the load sharing policies.
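The first and third issues (when to activate migration, and whether the migration cost is justified) can be combined into one decision function. The sketch below is an assumption-laden illustration: the thresholds `cpu_threshold` and `min_free_memory_mb`, and the rule that a job is worth moving only if its remaining runtime exceeds the migration cost, are hypothetical choices and are not taken from the cited studies.

```python
def should_migrate(cpu_load, free_memory_mb, remaining_runtime_s, migration_cost_s,
                   cpu_threshold=0.8, min_free_memory_mb=256):
    """Return True when the node looks overloaded and migrating is worth its cost.

    All thresholds are illustrative defaults, not values from the text.
    """
    # Node status check: CPU or memory state signals an overloaded node.
    overloaded = cpu_load > cpu_threshold or free_memory_mb < min_free_memory_mb
    # Cost check: moving the job must take less time than letting it finish.
    worthwhile = remaining_runtime_s > migration_cost_s
    return overloaded and worthwhile


print(should_migrate(cpu_load=0.95, free_memory_mb=1024,
                     remaining_runtime_s=600, migration_cost_s=30))
```

A user requirement, such as a deadline, could be added as a further condition in the same function.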
2.4.3 Load Sharing Policies
A decentralized system gathers many nodes for sharing computing resources. Load
sharing in a decentralized system helps improve the usage of computing resources and the
system performance, and it minimizes the job execution time by sharing the workload with
idle or lightly loaded computing resources. The study in [27] shows that when information is
transferred from the source to the destination, it is significantly affected by the
network state, including the network latency, communication delay, and so on.
Generally speaking, load sharing policies can be divided into dynamic, static, and adaptive
policies. A dynamic load sharing policy uses system state information (the load of a site) and
makes decisions based on the current system state. A static load sharing policy relies on
prior knowledge, such as the average task-initiation rate and execution rate of each node.
A dynamic load sharing policy can therefore make better decisions than a static
one. An adaptive policy is a special case that adjusts itself
when system variables or the policy change [44]. Dynamic load sharing policies can be
classified into Sender-initiated, Receiver-initiated, and Symmetric-initiated policies. By
using a load sharing policy, we can increase the usage of computing resources and decrease the
job response time through
Sender-initiated policy
The sender-initiated policy transfers load from an overloaded system to a lightly loaded
system to reduce the sender's own load [10]. This policy gives better performance
under light or medium system load [35]. However, under high system load it
causes many useless request messages (which are rejected by other nodes until the
number of requests reaches the default upper bound), leading to congestion of the
communication channel. Too many requests may also cause delay and waste
bandwidth.
Receiver-initiated policy
When a lightly loaded node has the capacity to sustain more load, it actively sends a request
message to a heavily loaded node to share that node's load. When the lightly loaded node
cannot find a suitable heavily loaded node, it executes its own jobs, so good performance is
assured. In a lightly or moderately loaded system, the performance of this policy is worse
than that of the sender-initiated policy. Just as the sender-initiated policy does in a
heavily loaded system, the receiver-initiated policy in a lightly loaded system generates
many useless messages. This delays information collection and can cancel out some of the
performance improvement gained. Nevertheless, unlike the sender-initiated policy in a heavily
loaded system, the receiver-initiated policy in a lightly loaded system remains reasonably
efficient, because there are few jobs in the system and therefore few job departures, so it
is rarely influenced by the delay of data collection [10, 11].
Symmetric-initiated policy
The symmetric-initiated policy combines the above two policies and thus has the pros
and cons of both. That is, using the sender-initiated policy in a heavily loaded system and
using the receiver-initiated policy in a lightly or moderately loaded system both result in
many useless
generate many useless messages, and consume communication capacity.
Compared with the above two policies in cost, the symmetric-initiated policy is a more
Chapter 3. System Design and Implementation
In this chapter, we discuss the design and implementation of our system. We adopt JXTA to
implement a decentralized Grid middleware named the Service Oriented Roaming System
(SORS). In section 3.1, we describe the JXTA development tool for our system. In section 3.2,
we present our system architecture, including the Globus Toolkit architecture and the SORS
architecture. In section 3.3, we compare the Globus Toolkit with
SORS. In sections 3.4 and 3.5, we explain the implementation techniques for information
discovery and load sharing in our system. In section 3.6, we describe the load barrier policy
and the heterogeneity policy in this study. Finally, we present simple examples in section 3.7.
3.1 Development Tool
In this study, we adopt JXTA as the development tool. JXTA is an open-source platform
that provides Peer-to-Peer services. Currently, JXTA supports the Java and C++ languages,
and the latest version is JXTA 2.5.1.
3.2 System Framework Overview
Most Grid systems use the Globus Toolkit as middleware. The architecture of the
Globus Toolkit can be divided into four layers: the application layer, the Grid service
layer, the communication layer, and the basic Grid fabric layer, as shown in Figure 3-1. In
this study, we would like to implement the functions of the Grid service layer, including
execution
Figure 3-1 The architecture of Globus Toolkit
In this study, we develop a Peer-to-Peer Grid middleware system, named the Service
Oriented Roaming System (SORS), as a Grid middleware prototype. The term
"roaming" reflects that a job can move between sites. The architecture of
SORS is shown in Figure 3-2. SORS is a light-weighted middleware that collects computing
resource information through a decentralized structure and achieves cross-site/VO load
sharing. The modules of SORS are the Basic Service, File Transfer, Information Service,
Execution Management, and Load Sharing.
In general, the SORS provides the following functions:
Basic Service (configure)
This module supports the basic settings and configuration, including peer and group
initialization. It is the underlying structure of SORS.
File Transfer
This module uses sockets to transfer data, files, and messages, and implements pipes for
communication. It mainly provides the transfer service and is an underlying structure of
the SORS.
Information Service
The information service covers the discovery and integration of computing resources.
Computing resource discovery is divided into two parts: the first part collects
resource information between sites, and the second part collects the local resource
information (between local peers). In this study, we build a decentralized structure for
cross-site/VO integration of information such as CPU, memory, job queue length status,
and so on.
Execution Management
The execution management module handles job management, job execution, and
the control of the job queue.
Load Sharing
In this load sharing module, we develop a load sharing strategy to share load between
Figure 3-2 The architecture of SORS
[Figure 3-2 components: Application (Portals, Collaboratories, High Energy Physics, ...); High Level Service, SORS (Configure, File Transfer, Information Service, Execution Management, Load Sharing); JXTA Core Service (Peer Discovery, Peer Information, Peer Resolver, Peer Endpoint, Pipe Binding, and Rendezvous Protocols); JVM; Fabric (Operating Systems, Disk, Network, Databases)]
3.3 Light-Weighted System
Developing a light-weighted grid middleware is the target of our study. SORS is a
light-weighted grid middleware prototype that provides simple and efficient basic modules to
integrate computing resources and achieve load sharing.
SORS substitutes for part of the original components of the Globus Toolkit, for example the
Information Service, Execution Management, and Data Management. Security is an important
consideration in the Globus Toolkit: different VOs require different certificates to
ensure system security when members use computing resources. Because we
implement a cross-VO system with Peer-to-Peer technology on the JVM, we do not use the Globus
Toolkit's GSI in our system. We compare the functions of SORS and the Globus Toolkit
in Table 3-1.
Table 3-1 Comparison of Globus Toolkit and SORS functions
Function                     Globus Toolkit   SORS
System core size             large            small
Transfer                     Yes              Yes
Information service          Yes              Yes
Resource discovery           Yes              Yes
Execution management         Yes              Yes
Job Migration (Cross-VOs)    None             Yes
3.4 Information Discovery
Many Grid studies monitor resources through agents or brokers. In the Globus
Toolkit architecture, resource allocation is limited to a single VO, and resource
information is collected by web service applications, such as Ganglia, MDS, and NWS.
collected in a centralized server. The information services of SORS support resource
discovery through Peer-to-Peer technologies. The resource information includes CPU speed,
CPU type, total memory, free memory, network bandwidth, and job queue length.
There are two kinds of nodes in a VO: "super peers" and
"general peers". The super peers are responsible for collecting and
integrating the site's resources. The general peers are responsible for reporting their own
status information to the super peer.
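The division of work between general peers and the super peer can be sketched as follows. The field names mirror the resource information listed above, but the classes `ResourceInfo` and `SuperPeer`, their methods, and the choice of which fields to aggregate are hypothetical and are not taken from the SORS implementation.

```python
from dataclasses import dataclass


@dataclass
class ResourceInfo:
    """Status that a general peer reports; units are assumptions of this sketch."""
    cpu_speed_mhz: float
    cpu_type: str
    memory_total_mb: int
    memory_free_mb: int
    bandwidth_mbps: float
    job_queue_length: int


class SuperPeer:
    """Hypothetical super peer that collects and integrates one site's resources."""

    def __init__(self, site):
        self.site = site
        self.reports = {}   # general peer name -> latest ResourceInfo

    def report(self, peer_name, info):
        """Called by a general peer to supply its own status."""
        self.reports[peer_name] = info

    def site_summary(self):
        """Integrate the per-peer reports into one site-level view."""
        infos = list(self.reports.values())
        return {
            "site": self.site,
            "peers": len(infos),
            "memory_free_mb": sum(i.memory_free_mb for i in infos),
            "job_queue_length": sum(i.job_queue_length for i in infos),
        }


super_peer = SuperPeer("Site A")
super_peer.report("node-1", ResourceInfo(2400.0, "x86", 2048, 512, 100.0, 3))
super_peer.report("node-2", ResourceInfo(3000.0, "x86", 4096, 1024, 100.0, 1))
print(super_peer.site_summary())
```

Only the site-level summary would need to be exchanged between super peers of different sites, which keeps the cross-site traffic small.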
Figure 3-3 The structure of traditional resource discovery in Grid system
Figure 3-4 The structure of resource discovery in SORS
3.5 Load Sharing
Improving the efficiency of resource allocation is an important issue in Grid systems. This
study focuses on resource allocation among sites in a decentralized structure. To
allocate computing resources and manage jobs, this study adopts migration
technology integrated with the Condor Java API, which accesses the job status in the Condor
job queue, to achieve job management and resource sharing across sites/VOs.
Suppose that the local site is overloaded. We can use the load sharing algorithm to
migrate idle jobs from the local site's job queue to an underloaded remote site, reducing
the job waiting time and improving the utilization of computing resources. SORS takes
the system load balance into consideration to shorten the job execution time and to
increase the usage of idle computing resources.
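The scenario above, in which an overloaded local site moves idle jobs to an underloaded remote site, can be sketched as a sender-initiated loop. This is not the SORS algorithm itself: the function `share_load`, the queue-length threshold `overload_threshold`, and the probe bound `probe_limit` are hypothetical, and queue length is used as the only load measure for simplicity.

```python
def share_load(local_queue, remote_sites, overload_threshold=4, probe_limit=3):
    """Sender-initiated sketch: move idle jobs away from an overloaded local queue.

    local_queue: list of idle job names waiting at the local site (mutated).
    remote_sites: dict mapping site name -> list of queued jobs (mutated).
    Returns the list of (job, site) migrations that were performed.
    """
    migrations = []
    while len(local_queue) > overload_threshold:
        # Probe at most probe_limit remote sites, shortest queue first.
        candidates = sorted(remote_sites, key=lambda s: len(remote_sites[s]))[:probe_limit]
        target = next((s for s in candidates
                       if len(remote_sites[s]) < overload_threshold), None)
        if target is None:
            break   # every probed site is loaded too; stop sending requests
        job = local_queue.pop()             # take an idle job from the local queue
        remote_sites[target].append(job)    # hand it to the underloaded site
        migrations.append((job, target))
    return migrations


local = ["j1", "j2", "j3", "j4", "j5", "j6"]
remote = {"Site B": ["x1"], "Site C": ["y1", "y2", "y3", "y4"]}
moves = share_load(local, remote)
print(moves, local)
```

Bounding the number of probes reflects the concern raised for the sender-initiated policy in section 2.4.3: without a limit, an overloaded site in a heavily loaded system would keep sending requests that are rejected.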