
CHAPTER 3 THE WIS PLATFORM

3.1 System Overview

The previous chapter presented the many possibilities of Web automation, and there is still much more to be discovered. In this chapter the common properties of Web automation applications are organized and discussed, and an ideal architecture is proposed. An implementation of this ideal architecture, the WIS system, is then documented in detail.

The main concept of Web automation is to reuse existing services available on the Internet. It can be seen as the next step in reusability. Instead of the “functionality reuse” mentioned in the discussion of Web services in section 2.3.8, Web automation is concerned with “service reuse”, the reuse of entire services. For example, we can put online bookstore services, online library services and inter-library loan services together to create a powerful solution for providing books of any kind. Empowered by Web automation, such a solution does not merely list descriptions and links, as traditional portal sites do; the services actually cooperate. Continuing this example, which may be called a book agent, the user uses the metasearcher capability of the agent to search every potential provider of the book, whether the provider is a library or a bookstore. The agent may create two groups of providers that have the wanted book, one of libraries and one of bookstores. If the user selects the group of libraries, the data is passed to an inter-library loan system to acquire the book. If the bookstore group is selected, the agent may propose the best buy by comparing prices and conveniently send the request on the user’s behalf. As the example shows, entire services are reused, creating an altogether new experience for acquiring a book.
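The grouping and best-buy steps of the book agent can be sketched as follows. This is only an illustration under stated assumptions: the `Offer` record, the provider names and the prices are hypothetical, and the sketch starts from search results that are already available rather than performing the metasearch itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    provider: str                  # hypothetical provider name
    kind: str                      # "library" or "bookstore"
    price: Optional[float] = None  # only bookstores quote a price


def group_offers(offers):
    """Split the metasearch hits into a library group and a bookstore group."""
    groups = {"library": [], "bookstore": []}
    for offer in offers:
        groups[offer.kind].append(offer)
    return groups


def best_buy(bookstore_offers):
    """Return the cheapest priced bookstore offer, or None if there is none."""
    priced = [o for o in bookstore_offers if o.price is not None]
    return min(priced, key=lambda o: o.price) if priced else None


# Made-up results for one book, as a metasearch might return them.
offers = [
    Offer("City Library", "library"),
    Offer("Store A", "bookstore", 25.0),
    Offer("Store B", "bookstore", 19.5),
]
groups = group_offers(offers)
cheapest = best_buy(groups["bookstore"])
```

In a full agent, the library group would be handed to the inter-library loan system, and `cheapest` would be the offer proposed to the user.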

To reuse services, the first issue that needs to be solved is interoperability. Many protocols have been created for different interoperation scenarios. The increased interest in the metasearcher area has introduced various protocols, such as Z39.50, OAI and OpenURL. But there are always services that do not support these protocols, since they were originally designed for interaction with human users, not machines. As a result, the only thing we can be sure of about Web services is that they use well-known Web technologies that any competent Web browser handles without problems. The most common are the HTTP protocol and the HTML presentation language. Additional mechanisms such as cookies, security and scripts make the picture of common interoperation even more complicated. But for the ideal Web automation tool to be general purpose, these are the only things that should be relied upon.

Relying only on common Web technologies raises another problem. The user interfaces of Web services usually change over time. For example, many sites carry advertisements, which by their nature change frequently. For a human user this is not a big problem, because he or she can understand their meaning and skip them. But it is difficult for a computer to really “understand” the interface of a Web service. Some solutions for filtering out this noise were mentioned in previous sections. Intelligent solutions are hard to apply widely, because their heuristics are usually suitable only for specific cases. WIDL instead provides a language for describing the position of the required data, which gives a better chance of skipping unwanted changes. The WIS platform proposed in this thesis uses the WIDL solution, with some modifications.
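The idea of describing where the required data sits, rather than depending on the exact layout of a page, can be illustrated with a short sketch. It is not WIDL syntax and not the WIS implementation: it assumes, purely for illustration, that the wanted field is identified by an element `id`, and the two sample pages are made up.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags without a closing tag


class FieldExtractor(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html, target_id):
    parser = FieldExtractor(target_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


# The same page before and after an advertisement is inserted.
page_v1 = '<body><div id="title">Web <b>Automation</b></div></body>'
page_v2 = ('<body><div class="ad">Buy now!</div>'
           '<div id="title">Web <b>Automation</b></div></body>')
```

Both `extract(page_v1, "title")` and `extract(page_v2, "title")` yield the same text, because the description of the field does not depend on what precedes it in the page.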

Parallel processing is a common need of Web automation applications. In a typical metasearcher, for example, when the user submits a query, the query is dispatched to various sources at the same time, so that every source can do its job in parallel with the others. When a source finishes its job, the results are returned and the agent can do further processing while other sources may still be working on the request (Figure 3-1).

Figure 3-1: Multiple Processes in a Metasearch
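The dispatch pattern described above can be sketched with a thread pool. This is a minimal sketch, not the WIS process model: `search_source` is a hypothetical stand-in that sleeps to simulate network latency, and the source names and delays are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_source(name, delay, query):
    """Stand-in for one remote source; sleeps to simulate network latency."""
    time.sleep(delay)
    return name, [f"{query} @ {name}"]


def metasearch(query, sources):
    """Dispatch the query to all sources at once; yield results as they finish."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(search_source, name, delay, query)
                   for name, delay in sources.items()]
        for future in as_completed(futures):
            # A finished source can be processed while the others still work.
            yield future.result()


sources = {"slow-library": 0.3, "fast-bookstore": 0.05}
arrival_order = [name for name, _ in metasearch("web automation", sources)]
```

Results arrive in completion order rather than submission order, so the agent can start processing the fast source before the slow one has answered.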

From the metasearcher example, it seems that only the main process needs the privilege of creating other processes. But there may be cases in which a process needs to create sub-processes, and messaging between any pair of processes is needed. The WIS system supports arbitrary arrangements of processes through its various scopes. Scopes are discussed in detail in the following sections.

In a Web automation application, the need to interoperate with various sources at the same time may consume the resources of a server very quickly. In a three-tier architecture, the business logic tier carries out the Web automation tasks (Figure 3-2). For example, in a previous work, the VUCS system, the automation component is implemented as a DCOM object. When a user performs a search, the interface program calls the Web automation DCOM object, which interoperates with the target resources. To scale up to a large number of users, computing power can be increased by adding more servers in the business logic tier; but the network may eventually become the bottleneck of the system when multiple users are performing Web automation tasks on the server.

Figure 3-2: Web Automation in a Three Tier Model

The solution provided by WIS for this problem is to permit Web automation tasks to be performed at the client side, which reduces the load on the business logic tier. From Figure 3-3 we can see that WIS takes the place of the browser. This is especially efficient when users are widely spread across the network, because the Web automation task mainly consumes local network bandwidth, saving the bandwidth of the server. But for this architecture to work, some issues need to be solved.

Moving the Web automation task out of the business logic tier raises a new problem. When in the business logic tier, the Web automation component could easily work with other components and interact with the data tier. For example, in our book recommendation system, the Web automation task involves reading hyperlinks stored in the database, extracting the metadata from the online bookstores, and writing the new information back to the database. In the discussion of WIDL in section 2.1.2.1, many applications were mentioned, and many of them involve integration with the enterprise system. WIDL is mainly designed to run on the server, and any protocol can be used there, since WIDL must be supplemented by a traditional programming language anyway. But with the Web automation task moved to the client side, there needs to be a way to keep the interaction with the enterprise system. Web service technology plays this role by exporting functionality from the enterprise system to WIS.

Whenever the Web automation task needs support from the enterprise system, it obtains the interface information from the WSDL document and, with that definition, calls functions at the server side by using SOAP messages. Thus the Web automation task running in WIS can access whatever function it needs from the server, from anywhere in the network.
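As a rough illustration of what such a call looks like on the wire, the sketch below builds a SOAP 1.1 request envelope with the Python standard library. The operation name `SaveBookMetadata`, its parameters and the service namespace are hypothetical; in practice they would be taken from the WSDL document of the enterprise system. No network request is made.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:book-service"                 # hypothetical service namespace


def build_soap_request(operation, params):
    """Serialize one remote call as a SOAP 1.1 envelope string."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call: write extracted metadata back to the enterprise system.
request_xml = build_soap_request(
    "SaveBookMetadata", {"isbn": "0000000000", "title": "Example"}
)
```

The resulting string would be sent in the body of an HTTP POST to the endpoint named in the WSDL document, and the response envelope would be parsed in the same way.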

Figure 3-3: Three Tier Model with WIS

The business logic issue in the WIS architecture is solved, but there is still a problem with the interface. In the traditional three-tier model, modifications to interfaces can easily be made at the presentation tier. When the user requests a Web automation task from the server, an entirely new page is passed back to the browser whenever the server finishes it. But when entire Web automation tasks run at the client side, how can WIS create whole new pages to show progress or changes, which vary a lot from application to application?

One solution would be to go back to the two-tier model, where clients are designed for specific applications, making the business logic work tightly with the presentation on the desktop. With this approach, WIS would come with different tailored interfaces for different applications, and the maintenance nightmare of updating hundreds or thousands of desktops would come back. Another solution is to replace hard-coded interfaces with HTML interfaces downloaded from the server. To refresh the interface, the Web automation process can ask the server to construct a new page according to the given parameters and send it back. In this way WIS keeps its generality and its compliance with the three-tier model. Better performance can be achieved by using DHTML, which avoids a round trip every time the interface needs a refresh. The Web automation process can notify the user of any change without having to reload entire pages from the server. Any kind of report can be created in this way.

Many Web automation tasks are repetitive and need to be executed periodically. In the Website checking application, the check task is performed at intervals determined by the site manager. The solution provided by WIS comes from DHTML, which provides timed function calls.

Data returned from sources needs further processing before it is presented to the user. Likewise, users’ requests to the Web automation application need processing before they are sent to the sources. In a metasearcher, for example, the query given by the user is translated to a format that the target resource accepts, which may be different for every resource. Results from different resources may differ in many ways, such as date formats and the presence or absence of some fields. Reranking requires even further computing. WIDL, which plays only the role of an interface definition language, leaves this work to the complementary programming language.
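The kind of result normalization described here can be sketched in a few lines. The field names and the list of date formats are assumptions chosen for illustration; a real application would derive them from the sources it integrates.

```python
from datetime import datetime

# Assumed date formats used by the (hypothetical) sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")
# Assumed common schema for merged results.
FIELDS = ("title", "author", "date")


def normalize_date(text):
    """Return the date in ISO format, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_record(raw):
    """Map one source record onto the common schema, filling gaps with None."""
    record = {field: raw.get(field) for field in FIELDS}
    if record["date"] is not None:
        record["date"] = normalize_date(record["date"])
    return record


# One source omits the author and writes dates as day/month/year.
merged = [
    normalize_record({"title": "A", "author": "X", "date": "2004-03-05"}),
    normalize_record({"title": "B", "date": "05/03/2004"}),
]
```

Once every record shares the same fields and date format, later steps such as merging and reranking can treat results from all sources uniformly.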

Because it is difficult to find a single tool (if any exists) that can handle so many variations in data processing, WIS chooses to use a common programming language. The developer can use JavaScript or Java applets to perform the transformation task.

With this architecture, WIS moves the Web automation task from the server to the client, distributing the workload even further without sacrificing the convenience of its browser counterpart. Although WIS substitutes for the browser, it keeps the advantage of being a common client for the various applications. There is no need for a special version of WIS for every application, since it is designed to support any application. This preserves the original idea of a common client that simplifies the deployment of new applications, which usually happens in the maintenance phase.
