Chapter 1. Introduction
1.1 Motivation
The Grid System is a distributed computing system that uses the Internet to connect multiple dynamic virtual organizations and their resources. It can be used in industry, science, and molecular computation to process and produce large amounts of data. The services and shared resources that a Grid system offers include computing power, data transfer, the software and information of remote hosts, and other resources accessible over the Internet. The system further adopts appropriate strategies to allocate and evaluate these resources in order to achieve the goals of resource sharing, selection, and integration.
These resources may be spread across supercomputers, storage systems, and other devices located in different regions and owned by different organizations. Today, Grid systems are applied in many fields, such as system simulation, medical science, and advanced physics.
The word “Grid” derives from “electric power grid”. The analogy likens the Grid system to the electric power system by “allowing the users to obtain the processing ability as easy as obtaining the electric power from the outlet on the wall” [20]. The father of the Grid, Dr. Ian Foster, defines the Grid in “The Grid: Blueprint for a New Computing Infrastructure” as follows: “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.” [17].
Infrastructure means that the Grid is a combination of various resources. It combines hardware such as CPUs, memory, and storage into a complete system framework, and it monitors and controls the results with software such as the Monitor and the Broker. Dependable means that the user can manage jobs from a high-level point of view without worrying about how each individual component in the lower layers processes them. The user also has a guarantee that jobs will be processed: if a job handed over to a resource fails, the lower-layer component resubmits it automatically, so the user does not need to execute the task again. Consistent means that the Grid must be built on a standard, because Grid services would be impracticable without one. Pervasive means that a user is able to obtain the service everywhere. Finally, inexpensive means that once the infrastructure has matured and is broadly used, services should be offered at low cost to satisfy user demand.
Dr. Ian Foster elaborated on resource management in “The Anatomy of the Grid: Enabling Scalable Virtual Organizations” (2001): “The real and specific problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” [18]. A Virtual Organization (VO) is a virtual institution formed by multiple individual units that share resources with each other to achieve common goals. A VO is connected through the Internet and is free of geographical restrictions. The Grid uses VOs to realize the concept of resource sharing.
In “What is the Grid? A Three Point Checklist” (2002), Dr. Ian Foster stated the criteria for identifying a Grid: a Grid “1) coordinates resources that are not subject to centralized control …2) … using standard, open, general-purpose protocols and interfaces…3) … to deliver nontrivial qualities of service” [19]. From all of the above, we conclude that a Grid is a distributed system that integrates resources from different organizations to solve large computational problems and handle extensive data.
Table 1.1 lists the major categories of grid applications and their characteristics [3, 17]. For example, the data-intensive category produces enormous amounts of data. The European Organization for Nuclear Research (CERN) built the Large Hadron Collider (LHC) to study the origin of the mass of elementary particles. More than 4300 scientists from 179 institutes in almost 40 countries are involved in this research. The experiment is expected to generate 15 petabytes of data per year. The WLCG (Worldwide LHC Computing Grid) project became the first Grid system developed to serve scientific applications. It meets the LHC experiments' needs for large-scale data computation, software design, data management, and system maintenance [1, 13].
Table 1.1 Five Major Classes of Grid Applications [17]
Category | Examples | Characteristics
Distributed supercomputing | Chemistry | Very large problems needing lots of CPU, memory, etc.
High throughput | Chip design | Harnessing many otherwise idle resources to increase aggregate throughput
On demand | Medical instrumentation, cloud detection | Remote resources integrated with local computation, often for bounded amount of time
Data intensive | Physics data | Synthesis of new information from many or large data sources
Collaborative | Education | Support communication or collaborative work between multiple participants
Whether the aim is to increase computing power or to make use of storage, allocating Grid resources effectively will be a major challenge in the future. This study focuses on job scheduling for the computing grid, using an economic model as its guiding principle. We attempt to design a proportional-share auction system that completes tasks more efficiently and meets users' needs.
Grid technology can be applied not only in academia but also by individuals once it matures. By then, individuals will have computing resources and storage as if they owned their own supercomputers, safely sharing and accessing resources without the restriction of distance [34]. For this reason, Dr. Satoshi Matsuoka described the future Grid system as “Everybody’s Supercomputer: Breaking the Traditional Supercomputer and Grid Economics” at the WoGTA’06 seminar.
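To make the idea of a proportional-share auction concrete, the following minimal sketch shows the generic textbook rule: each bidder receives a fraction of a resource equal to its bid divided by the sum of all bids. It is an illustration only, not the design developed in this study; the function name `proportional_share`, the user names, and the bid and capacity values are all hypothetical.

```python
from typing import Dict


def proportional_share(bids: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Split `capacity` among bidders in proportion to their bids.

    Hypothetical helper for illustration: bidder i receives
    capacity * bid_i / sum(all bids).
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    if any(bid < 0 for bid in bids.values()):
        raise ValueError("bids must be non-negative")

    total = sum(bids.values())
    if total == 0:
        # Nobody bid anything, so nothing is allocated.
        return {user: 0.0 for user in bids}

    return {user: capacity * bid / total for user, bid in bids.items()}


if __name__ == "__main__":
    # Two hypothetical users bid for 8 CPU units on one resource.
    shares = proportional_share({"alice": 30.0, "bob": 10.0}, capacity=8.0)
    for user, share in shares.items():
        print(f"{user}: {share} CPU units")
```

Under this rule a user who raises a bid obtains a larger share without excluding the others entirely, which is one reason proportional sharing is often considered for shared computing resources.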