Chapter 1 Introduction
1.3 Organization
The remaining chapters are organized as follows. Chapter 2 introduces Grid technology and financial services separately in more detail and then surveys financial services based on Grids. Chapter 3 introduces the numerical methods used in computational finance and the problems of migrating IT systems to Grids; Monte Carlo simulation is specifically chosen to take advantage of Grids, with option pricing and value at risk used as examples. Various grid platforms are also discussed. In Chapter 4, three platforms are deliberately chosen for the calculation of option pricing and value at risk to demonstrate what Grids can do. Finally, Chapter 5 concludes the study.
Chapter 2
Grid Technology for Financial Services
2.1 Information Technology (IT) and Financial Services
The finance industry involves a broad range of organizations that deal with the management of money. Among these organizations are banks, credit card companies, insurance companies, consumer finance companies, stock brokerages, investment funds and some government-sponsored enterprises. According to Gartner, the financial services industry represented 22.7% of the global market share in 2005. In a market of this size, the evidence of IT's impact on financial services cannot be ignored.
The structure of the industry has changed significantly in the last two decades as companies not traditionally viewed as financial service providers have taken advantage of opportunities created by technology to enter the market. New technology-based services have kept emerging. These changes are the result of the interaction of technology with other forces such as overall economic conditions, societal pressures, and the legal and regulatory environment in which the financial service industry operates. The effects of IT on the internal operations, the structure and the types of services offered by the financial service industry have been particularly profound (Phillips et al., 1984; Hauswald and Marquez, 2003; Griffiths and Remenyi, 2003). IT has been and continues to be both a motivator and a facilitator of change in the financial service industry, which ultimately drives the competitiveness of the industry. The change has been particularly radical since 1991, when the World Wide Web was invented by Tim Berners-Lee and his group for information sharing in the high-energy physics community. It was later introduced to the rest of the world and subsequently changed the way people do business.
Informational considerations have long been recognized to determine not only the degree of competition but also the pricing and profitability of financial services and instruments. Recent technological progress has dramatically affected the production and availability of information, thereby changing the nature of competition in such information-sensitive markets. Hauswald and Marquez (2003) investigate how advances in information technology (IT) affect competition in the financial services industry, particularly in credit, insurance, and securities markets. They focus on two aspects of improvement in IT: better processing and easier dissemination of information. In other words, technological progress affects competition in financial services along two dimensions: advances in the ability to process and evaluate information, and the ease of obtaining information generated by competitors. While better technology may result in improved information processing, it might also lead to low-cost or even free access to information through, for example, informational spillovers. They show that, in the context of credit screening, better access to information decreases interest rates and the returns from screening. On the other hand, an improved ability to process information increases interest rates and bank profits. Hence predictions regarding the pricing of financial claims hinge on the overall effect ascribed to technological progress. Their results indicate that, in financial markets in general, informational asymmetries drive profitability.
The viewpoint of Hauswald and Marquez is adopted in this study. Assuming that competitors in a dynamic financial market possess similar capacity, informational asymmetries may exist for only a matter of seconds, and they can now be created by outperforming competitors' underlying IT platforms. This is exactly what happened in the case of Optiver Taiwan mentioned in Chapter 1. The following sections discuss further what advances in Grid technology can offer.
2.2 Grid Technology
2.2.1 Definition of Grid
The term Grid was coined by Ian Foster (Foster and Kesselman, 2004), who gave the essence of the definition as quoted below:
“The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem solving and resource-brokering strategies emerging in industry, science, and engineering.
This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization.”
The definition is centered on the concept of a virtual organization, but it is too conceptual to explain what a Grid is. Foster therefore provides the additional checklist below to guard against possible logical pitfalls in the definition. A Grid is a system that:
1) coordinates resources that are not subject to centralized control
A Grid integrates and coordinates resources and users that live within different control domains—for example, the user’s desktop vs. central computing; different administrative units of the same company; or different companies; and addresses the issues of security, policy, payment, membership, and so forth that arise in these settings. Otherwise, we are dealing with a local management system.
2) using standard, open, general-purpose protocols and interfaces
A Grid is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access. As discussed further below, it is important that these protocols and interfaces be standard and open. Otherwise, we are dealing with an application specific system.
3) to deliver nontrivial qualities of service.
A Grid allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating for example to response time, throughput, availability, and security, and/or co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than that of the sum of its parts.
This definition of Grid is well accepted and has remained in stable use up to now. The virtual organization (VO) strongly implies community-driven, collaborative sharing of distributed resources. Advances in optical fiber networks in recent years have played a critical role in making Grids a reality. They are also the reason why the computing paradigm is now shifting to distributed/Grid computing.
Additionally, perhaps the most generally useful definition is that a grid consists of shared heterogeneous computing and data resources networked across administrative boundaries. Given such a definition, a grid can be thought of as both an access method and a platform, with grid middleware being the critical software that enables grid operation and ease of use. For the above and for more details on the fundamentals of Grids, the reader is referred to Foster and Kesselman (2004).
2.2.2 Essence of Grid Technology
To realize the above goal, a Grid must handle, at the technical level, the interoperability of middleware capable of communicating between heterogeneous computer systems across institutional boundaries. The Grid movement was started in 1996 by Foster and Kesselman (2004). Before their work, another branch of high performance computing, which focuses on connecting geographically distributed supercomputers to accomplish a single grand task, had been developed by Smarr and Catlett (1992). They coined the term metacomputing for this methodology, and their question was how to obtain unlimited computing power within physical limits such as Moore's Law. However, metacomputing remained of limited use because its narrow goal of pursuing top performance neglected practical use in the real world. The idea lives on and has generated many tools dedicated to high performance and high throughput computing, such as Condor (Litzkow, Livny and Mutka, 1988), Legion (Grimshaw and Wulf, 1997) and UNICORE (Almond and Snelling, 1999). Condor, as the name of the project suggests, is devised to scavenge large clusters of idle workstations. Legion is closer to the development of a world-wide virtual computer. The goal of UNICORE is simpler and more practical: it was developed because the German government decided to consolidate its five national supercomputer centers into a virtual one to reduce management cost and needed a software tool to integrate them. These tools were successful within their development scope. However, they fail to meet the first and second items of Foster's checklist in the previous section.
The emergence of Grids initially followed a path similar to that of Condor and Legion, whose development aimed at resource sharing in high performance computing. However, the Grid's vision of open standards and the concept of the virtual organization allow its development to go far beyond merely clustering supercomputers together. It gives a broader view of resource sharing, which is not limited to sizable computing cycles and storage space but extends to virtually any computing device that can connect to the Internet, such as sensors and sensor loggers, storage servers and computers. Since 1996, Foster and his team have been developing software tools to achieve this purpose. Their software, the Globus Toolkit (Foster and Kesselman, 2004), is now the de facto middleware for Grids. However, this ambitious development is still considered insufficient to meet the ever-growing complexity of grid systems.
As mentioned earlier, Grids are based on open specifications and standards. These allow all stakeholders within the virtual organization or grid to communicate with each other with ease and enable them to focus more on integrated value-creating activities. The open specifications and standards are produced by the community of the Open Grid Forum (OGF), which acts as a standards body and drafts, discusses and announces new standards at its regular meetings. Grid specifications and standards cover architecture, scheduling, resource management, system configuration, data, data movement, security and the Grid Security Infrastructure. In 2004, OGF announced a version of the Globus Toolkit that adopts both the open grid standard, the Open Grid Services Architecture (OGSA), and the more widely adopted World Wide Web standard, the Web Services Resource Framework (WSRF), which ultimately enables grids to tackle both the scalability and the complexity of very large grid systems.
2.3 Performance Enhancement via Grids
This section introduces two types of Grid systems, compute intensive and data intensive. The classification is based on the kinds of grid applications they serve.
Traditionally, a grid system provides a general platform to harvest compute cycles (or to scavenge them, if only idle resources are used) from a collection of resources across institutional administrative boundaries. In the real world, however, most applications are in fact data-centric. A trading center, for example, collects tick-by-tick volume data from all related financial markets and is driven by information flows, and is hence typically data-centric. Nevertheless, as noted in Section 2.1, the core competence still lies in enhancing the performance of the IT system. The following two subsections give more details on compute intensive and data intensive grid systems through a survey of current Grid developments aimed specifically at financial services. In some cases, e.g. high frequency data with real-time analysis, the two kinds of system have to work together to achieve better performance. Our emphasis is on compute intensive grid systems.
2.3.1 Compute Intensive Grid Systems
Recent developments in grid-based computational finance are scrutinized here, with remarks on each. Our major interest is whether split-second performance can be justified under the grid architecture. For a practical simulation, real market parameter data should also be used as input in real time. In addition, inter-system and inter-disciplinary issues, the geographical distribution of resources and the degree of virtualization are crucial to the success of such a grid. The chosen projects are reviewed and discussed as follows:
1.) PicsouGrid: This is a French Grid project for financial services. It provides a general framework for computational finance and targets applications such as options trading, option pricing, Monte Carlo simulation and aggregation of statistics (Stokes-Rees et al., 2007). The key to this development is the implementation of the middleware ProActive.
ProActive is an in-house Java library for distributed computing developed by INRIA Sophia Antipolis, France. It provides transparent asynchronous distributed method calls and is implemented on top of Java RMI. It is also used in commercial applications and provides a fault tolerance mechanism. The architecture is shown in Figure 2; apart from the software stack used, it is very similar to that of most grid applications. Option pricing was tested on approximately 894 CPUs. The underlying computer systems are heterogeneous, and the system is used for metacomputing. As a result, the system has to be specifically designed to orchestrate, synchronize and re-synchronize all the distributed processes for one calculation. Once a grid system requires synchronization between processes, which implies stronger coupling in the algorithm of interest, performance is seriously affected. No software remedy solves this problem; it has to be tackled at the level of the physical infrastructure, e.g. an optical fiber network with Layer 2 light paths.
Figure 2 Architecture of PicsouGrid for option pricing based on Monte Carlo simulation (Stokes-Rees, 2007).
2.) FinGrid: FinGrid stands for Financial Information Grid. Its study includes components for bootstrapping, sentiment analysis and multi-scale analysis, focusing on information integration and analysis, e.g. data mining. It exploits a huge collection of numerical and textual data simultaneously to emphasize the study of societal issues (Ahmad et al., 2004; Gillam, Ahmad and Dear, 2005; Ahmad, Gillam and Cheng, 2005). The architecture of FinGrid is shown in Figure 3.
Figure 3 The architecture of Financial Information Grid (FinGrid).
It is a typical three-tier system. The first tier lets the client send a request to one of the services, the Text Processing Service or the Time Series Service. The second tier executes parallel tasks in the main cluster and distributes them to a set of slave machines (nodes). The third tier connects the slave machines to the data providers. This work focuses on a small-scale, dedicated grid system. It takes in real, live numerical and textual data from, say, Reuters and performs sophisticated data mining analysis in real time. This is a good prototype for a financial grid. However, it will encounter problems similar to those of PicsouGrid if it is scaled up. The model is more successful in automatically combining real data with the analysis.
3.) IBM Japan collaborated with a life insurance company and adopted the PC grid concept to scavenge more compute cycles (Tanaka, 2003). In this work an integrated risk management system (see Fig. 4) is modified so that the future scenarios, marked by the red circle in Fig. 4, are sent via Grid middleware to a cluster of PCs. According to the size of the given PCs, the scenarios are then divided among the PCs in a work-balanced manner. This is the most typical use of compute intensive grid systems and a good practice for a production system. However, the key issues discussed in the two cases above are not addressed in this study. A similar architecture can also be found in EGrid (Leto et al., 2005).
Figure 4 Architecture of Integrated Risk Management System (Tanaka, S. 2003).
4.) UK e-Science developed grid service discovery for the financial markets sector, focusing on the integration of different knowledge flows (Bell and Ludwig, 2005). From the application's viewpoint, the business and technical architecture of financial service applications may be segmented by product, process or geographic concerns. Segmented inventories make inter-silo re-use difficult. A service integration model is therefore adopted, with a loosely coupled inventory containing differing explicit capability knowledge. Three use cases were specifically chosen in this work to explore the use of semantic searching:
Use-case 1 – Searching for trades executed with a particular counterparty
Use-case 2 – Valuing a portfolio of interest rate derivative products
Use-case 3 – Valuing an option based product
The use cases were chosen to provide examples of three distinct patterns of use: aggregation, standard selection and multiple selection. The architecture (see Fig. 5) is bound specifically to the use cases. The advantage of the grid in this case is that it can easily be tailored to specific user needs to integrate different applications, which is a crucial strength of using a grid.
Figure 5 The Semantic Discovery for Grid Services architecture (SEDI4G) (Bell and Ludwig, 2005).
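The compute intensive pattern that recurs in the projects above (independent Monte Carlo scenarios divided among nodes and aggregated afterwards) can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed systems. The function names, the capacity weights, the per-node seeds and the choice of a European call under geometric Brownian motion are all assumptions made for illustration, and the nodes are simulated by a sequential loop rather than by real grid middleware.

```python
import math
import random


def split_scenarios(total, capacities):
    """Divide `total` scenarios among workers in proportion to their capacity.

    `capacities` are hypothetical relative weights (e.g. CPU cores per PC).
    Leftover scenarios go to the workers with the largest fractional share.
    """
    weight = sum(capacities)
    exact = [total * c / weight for c in capacities]
    counts = [int(x) for x in exact]
    leftover = total - sum(counts)
    by_fraction = sorted(range(len(capacities)),
                         key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


def simulate_batch(n_paths, seed, s0, strike, rate, vol, maturity):
    """Sum of discounted call payoffs over `n_paths` simulated terminal prices."""
    rng = random.Random(seed)  # each worker gets its own seed (assumption)
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        terminal = s0 * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff_sum += discount * max(terminal - strike, 0.0)
    return payoff_sum


def price_on_grid(total_paths, capacities, **params):
    """Split the paths, run one batch per worker, then aggregate the sums."""
    counts = split_scenarios(total_paths, capacities)
    # On a real grid each batch would be dispatched to a separate node;
    # here the batches simply run one after another.
    partial_sums = [simulate_batch(n, seed=i, **params)
                    for i, n in enumerate(counts)]
    return sum(partial_sums) / total_paths


print(split_scenarios(10_000, [1, 1, 2]))  # -> [2500, 2500, 5000]
print(price_on_grid(100_000, [1, 1, 2], s0=100.0, strike=100.0,
                    rate=0.05, vol=0.2, maturity=1.0))
```

Because the batches share no state, the only communication is the final sum, which is why this kind of workload scales well on loosely coupled resources. For the parameters shown, the estimate should land near the Black-Scholes value of about 10.45. An algorithm that needs synchronization between batches would not decompose this way.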
2.3.2 Data Intensive Grid Systems
This subsection views Grids in financial services from the perspective of web services for the financial services industry, which is more on the transactional side. Once the bottleneck of compute cycles is solved, the data-centric nature of applications will play the key role again. The knowledge flowing back to the customized business logic should provide the best path for users to access the live data of interest. There has been no strong development focus on data intensive grid systems. Even in FinGrid (Ahmad et al., 2004), which claims to stream live data for real-time analysis, the data issue remains part of the compute grid. However, the need for dynamic data management is obvious, as mentioned in Ahmad et al. (2004). We therefore introduce and implement a dynamic data management software, Ring Buffer Network Bus (RBNB) DataTurbine, to serve this purpose.
RBNB DataTurbine was used recently to support a global environmental observatory network, which involves linking tens of thousands of sensors and obtaining the observed data online. It meets grid/cyberinfrastructure (CI) requirements with regard to data acquisition, instrument management and state-of-health monitoring, including reliable data capture and transport, persistent monitoring of numerous data channels, automated processing, event detection and analysis, integration across heterogeneous resources and systems, real-time tasking and remote operations, and secure access to system resources. To that end, streaming data middleware provides the framework for application development and integration.
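The ring buffer that gives RBNB its name can be illustrated with a minimal Python sketch: a fixed-capacity store in which the newest samples overwrite the oldest, queried over a recent time window. This is not the RBNB DataTurbine API. The class `RingBuffer`, the function `moving_average` and the tick data are hypothetical names and values chosen for illustration.

```python
from collections import deque


class RingBuffer:
    """Fixed-capacity store of (timestamp, value) samples.

    When the buffer is full, appending a new sample discards the oldest one.
    """

    def __init__(self, capacity):
        self._samples = deque(maxlen=capacity)

    def put(self, timestamp, value):
        """Append one sample from a data source."""
        self._samples.append((timestamp, value))

    def window(self, now, span):
        """Return the samples with now - span < timestamp <= now."""
        return [(t, v) for t, v in self._samples if now - span < t <= now]


def moving_average(buffer, now, span):
    """Average of the values in the most recent time window, or None if empty."""
    values = [v for _, v in buffer.window(now, span)]
    return sum(values) / len(values) if values else None


# Hypothetical feed: ten ticks, one per second, into a buffer of capacity 5.
ticks = RingBuffer(capacity=5)
for second in range(10):
    ticks.put(second, 100.0 + second)

print(len(ticks.window(now=9, span=100)))      # only the 5 newest ticks remain
print(moving_average(ticks, now=9, span=3))    # average over seconds 7, 8, 9
```

The fixed capacity bounds memory regardless of how long the source keeps streaming, at the cost of losing samples older than the buffer can hold. A production system would add archival for that reason.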
Use cases of RBNB DataTurbine include adaptive sampling rates, failure detection and correction, quality assurance and simple observation (see Tilak et al., 2007). Real-time data access can be used to generate interest and buy-in from various stakeholders. Real-time streaming data is a natural model for many applications in observing systems, in particular event detection and pattern recognition. Many of these applications involve filters over data values or, more generally, functions over sliding temporal windows. The RBNB DataTurbine middleware provides a modular, scalable, robust environment while providing security, configuration management, routing, and data archival services. The RBNB DataTurbine system acts as an intermediary between dissimilar data monitoring and analysis devices and applications. As shown in Figure 6, a modular architecture is used, in which a source or “feeder” program is a Java application that acquires data from an external live data source and feeds it into the RBNB server. Additional modules display and manipulate data fetched from the RBNB server. This allows flexible configuration where RBNB serves as a coupling between relatively simple and “single purpose” suppliers of data and consumers of data, both