CHAPTER 2 RELATED WORK

2.3 Related Technologies

2.3.7 Web Interface Definition Language

An introduction to WIDL was given in section 2.1.2.1. Here we introduce the nuts and bolts of the WIDL language.

The WIDL definition is stored in an ASCII file, which is utilized by client programs at runtime to determine both the location of the service (URL) and the structure of documents that contain the desired data. Client programs access WIDL definitions from local files, naming services such as LDAP, HTTP servers or other URL access schemes, allowing centralized management of WIDL files. Unlike the way CORBA and DCE IDL are normally used, WIDL is interpreted at runtime. As a result, Service, Condition, and Variable definitions within WIDL files can be administered without requiring modification of client code. This usage model supports application-to-application linkages that are more robust and maintainable than if they were coded by hand.

There are three models for WIDL management:

• Client side: where WIDL definitions are collocated with a client program

• Naming service: where WIDL definitions are returned from directory services

• Server side: where WIDL definitions are collocated with or embedded within Web documents

Except for being expressed in XML, WIDL specifications closely correlate to existing IDLs. One significant difference is the notion of a WIDL record. A WIDL service may specify input or output variables within a particular interface.

The Web Interface Definition Language (WIDL) consists of six XML tags:

• <WIDL> defines an interface, which can contain multiple services and bindings

• <SERVICE> defines a service, which consists of input and output bindings

• <BINDING> defines a binding, which specifies input and output variables, as well as conditions for successful completion of a service

• <VARIABLE> defines input, output, and internal variables used by a service to submit HTTP requests, and to extract data from HTML/XML documents

• <CONDITION> defines success and failure conditions for the binding of output variables; specifies error messages

• <REGION> defines a region within an HTML/XML document; useful for extracting regular result sets which vary in size, such as the output of a search engine, or news
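Because a WIDL definition is ordinary XML, a client can read it at runtime with any XML parser. The following is a minimal sketch of that idea, not the WebMethods implementation: it uses the JDK's built-in DOM parser, and the `WidlReader` class, the `Demo` interface, the `Lookup` service, and the `http://example.com/` URL are all hypothetical names chosen for illustration. The attribute names (NAME, METHOD, URL, OUTPUT, TYPE, REFERENCE) follow those used in Figure 2-8.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class WidlReader {
    // Hypothetical definition: the names and URL are placeholders, not a real service.
    static final String SAMPLE =
          "<WIDL NAME=\"Demo\" VERSION=\"2.0\">"
        + "<SERVICE NAME=\"Lookup\" METHOD=\"GET\" URL=\"http://example.com/\" OUTPUT=\"out\">"
        + "<BINDING NAME=\"out\" TYPE=\"OUTPUT\">"
        + "<VARIABLE NAME=\"title\" TYPE=\"String\" REFERENCE=\"doc.title[0].text\"/>"
        + "<VARIABLE NAME=\"links\" TYPE=\"String[]\" REFERENCE=\"doc.a[].href\"/>"
        + "</BINDING></SERVICE></WIDL>";

    /** Parses WIDL text with the JDK's standard DOM parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("WIDL text is not well-formed XML", e);
        }
    }

    /** Returns the URL attribute of the named SERVICE, or null if there is no such service. */
    static String serviceUrl(Document widl, String serviceName) {
        NodeList services = widl.getElementsByTagName("SERVICE");
        for (int i = 0; i < services.getLength(); i++) {
            Element service = (Element) services.item(i);
            if (serviceName.equals(service.getAttribute("NAME"))) {
                return service.getAttribute("URL");
            }
        }
        return null;
    }

    /** Lists the NAME of every VARIABLE declared inside the named BINDING. */
    static List<String> variableNames(Document widl, String bindingName) {
        List<String> names = new ArrayList<>();
        NodeList bindings = widl.getElementsByTagName("BINDING");
        for (int i = 0; i < bindings.getLength(); i++) {
            Element binding = (Element) bindings.item(i);
            if (!bindingName.equals(binding.getAttribute("NAME"))) {
                continue;
            }
            NodeList vars = binding.getElementsByTagName("VARIABLE");
            for (int j = 0; j < vars.getLength(); j++) {
                names.add(((Element) vars.item(j)).getAttribute("NAME"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document widl = parse(SAMPLE);
        System.out.println(serviceUrl(widl, "Lookup"));
        System.out.println(variableNames(widl, "out"));
    }
}
```

Since the service location and variable bindings are looked up from the definition at call time, editing the WIDL file changes the client's behavior without recompiling it, which is the administrative benefit described above.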

One of the most important features of WIDL is the capability to reliably extract specific data elements from Web documents and map them to output parameters. Two candidate technologies for data extraction are pattern matching by regular expressions and pattern matching by tag patterns. Regular expressions are well suited to raw text files and poorly structured HTML documents. Tag patterns instead rely on the tag structure of the document and require the document to be parsed. The parsed document structure exposes relationships between document objects, enabling elements of a document to be accessed with an object model, described in section 2.3.2. Using an object model, an absolute reference to an element of an HTML document might be specified:

doc.p[0].text

This reference would retrieve the text of the first paragraph of a given document.

From both a development and an administrative point of view, pattern matching by regular expressions is more labor intensive to establish and maintain than an object model. Regular expressions are difficult to construct and prone to breakage as document structures change. For instance, the addition of formatting tags around data elements in HTML documents could easily derail the search for a pattern. An object model, on the other hand, can see through many such changes.
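The fragility argument can be illustrated with a small experiment. The sketch below is illustrative only and does not use WIDL's own object model: it stands in the JDK's XML DOM parser for a real HTML parser, so it assumes well-formed markup. The `ExtractionDemo` class, the sample pages, and the "Price" label are hypothetical.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ExtractionDemo {
    // A regular expression written against the exact markup of the original page.
    private static final Pattern PRICE = Pattern.compile("<p>Price: (\\d+)</p>");

    /** Regex extraction: returns the captured digits, or null when the pattern no longer matches. */
    static String byRegex(String html) {
        Matcher m = PRICE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** Object-model extraction: the text of the first paragraph, roughly doc.p[0].text. */
    static String byObjectModel(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // getTextContent concatenates the text of all descendants, ignoring inline tags.
            return doc.getElementsByTagName("p").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("document is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String before = "<html><body><p>Price: 42</p></body></html>";
        String after = "<html><body><p>Price: <b>42</b></p></body></html>";

        System.out.println(byRegex(before));        // 42
        System.out.println(byRegex(after));         // null: the added <b> tag breaks the pattern
        System.out.println(byObjectModel(before));  // Price: 42
        System.out.println(byObjectModel(after));   // Price: 42
    }
}
```

The object-model result still carries the "Price: " label; stripping such surrounding text is the kind of job the MASK attribute is described as handling.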

The <VARIABLE> element is used to describe input and output binding parameters. Common attributes are:

• NAME: Required identifier for calling programs.

• TYPE: Required. Specifies both the data type and dimension (for arrays) of the variable.

• REFERENCE: Optional. Specifies an object reference to extract data from the HTML, XML, or text document returned as the result of a service invocation.

• MASK: Optional. Masks permit the use of pattern matching and token collection to easily strip away unwanted labels and other text surrounding target data items.

The <REGION> element is used in output bindings to define targeted subregions of a document. This is useful in services that return variable arrays of information in structures that can be located between well-known elements of a page.

Regions are critical for poorly designed documents where it is otherwise impossible to differentiate between desired data elements (for instance, story links on a news page) and elements that also match the search criteria.

• NAME: Required. Specifies the name for a region. This name can then be used as the root of an object reference. For instance, a region named foo can be used in object references such as:

foo.p[0].text

• START: Required. An object reference that determines the beginning of a region.

• END: Required. An object reference that determines the end of a region.

<WIDL NAME="News" VERSION="2.0">
  <SERVICE NAME="TechWeb" METHOD="GET"
           URL="http://www.techWeb.com/" OUTPUT="techWebOut">
    <BINDING NAME="techWebOut" TYPE="OUTPUT">
      <REGION NAME="tops" START="doc.font['Last?Updated*']"
              END="doc.b['For?more*']" />
      <VARIABLE NAME="service" TYPE="String" VALUE="TECHWEB Top Stories" />
      <VARIABLE NAME="url" TYPE="String" REFERENCE="doc.url" />
      <VARIABLE NAME="stories" TYPE="String[]" REFERENCE="tops.a[].text" />
      <VARIABLE NAME="links" TYPE="String[]" REFERENCE="tops.a[].href" />
    </BINDING>
  </SERVICE>
</WIDL>

Figure 2-8: Extraction of data elements with regions

Figure 2-8 demonstrates the use of regions in a news service, where the number of news stories varies day to day. Regions permit the extraction of data elements relative to other features of a document. The tops region begins with an object whose text matches the pattern 'Last?Updated*' and ends with an object that matches 'For?more*'.

Variable references into the tops region collect arrays of anchors and anchor text, regardless of the fact that the sizes of the arrays change throughout the day. The object references within tops are vastly simplified by the processing already provided by the region definition:

tops.a[].text

tops.a[].href
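The behavior of the tops region can be approximated in a few lines. The sketch below is one possible reading of the region semantics, not the WIDL binding algorithm itself. It assumes that '?' matches exactly one character and '*' matches any run of characters, that a region covers the elements lying between the start and end markers in document order, and that the page is well-formed enough for the JDK's XML parser. The `RegionDemo` class and the sample page are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RegionDemo {
    // Hypothetical page; real-world HTML would need a tolerant HTML parser instead.
    static final String PAGE =
          "<html><body>"
        + "<a href=\"/home\">Home</a>"
        + "<font>Last Updated 10:00</font>"
        + "<a href=\"/s1\">Story one</a>"
        + "<a href=\"/s2\">Story two</a>"
        + "<b>For more news</b>"
        + "<a href=\"/about\">About</a>"
        + "</body></html>";

    /** Assumed wildcard semantics: '?' is one character, '*' is any run; the whole text must match. */
    static boolean globMatches(String glob, String text) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(text).matches();
    }

    /** Collects the anchors found after the start marker and before the end marker. */
    static List<Element> anchorsInRegion(Document doc, String startTag, String startGlob,
                                         String endTag, String endGlob) {
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        List<Element> anchors = new ArrayList<>();
        boolean inside = false;
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String tag = e.getTagName();
            if (!inside) {
                inside = tag.equals(startTag) && globMatches(startGlob, e.getTextContent());
            } else if (tag.equals(endTag) && globMatches(endGlob, e.getTextContent())) {
                break; // end marker reached; if it never appears, the region runs to the end
            } else if (tag.equals("a")) {
                anchors.add(e);
            }
        }
        return anchors;
    }

    /** Roughly tops.a[].text. */
    static List<String> texts(List<Element> anchors) {
        List<String> out = new ArrayList<>();
        for (Element a : anchors) out.add(a.getTextContent());
        return out;
    }

    /** Roughly tops.a[].href. */
    static List<String> hrefs(List<Element> anchors) {
        List<String> out = new ArrayList<>();
        for (Element a : anchors) out.add(a.getAttribute("href"));
        return out;
    }

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("document is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        List<Element> tops = anchorsInRegion(parse(PAGE), "font", "Last?Updated*", "b", "For?more*");
        System.out.println(texts(tops));
        System.out.println(hrefs(tops));
    }
}
```

The navigation links before and after the markers are excluded without any reference to how many stories the page contains, which is why the arrays can grow or shrink during the day without the definition changing.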

Object References

The default object model used by WIDL provides object references for accessing elements and properties of HTML and XML documents. This model is based on the DOM object model in JavaScript, but without the JavaScript method definitions.

Using the default object model, all elements of HTML and XML documents can be addressed in the following ways:

• By name: if the target element has a non-empty name attribute, it can be used in the reference. For example, the value of an HTML element <a name="foo"> can be referenced:

doc.foo.value

• By absolute indexing: each array of elements has a zero-based integer index, e.g.:

doc.headings[0].text

doc.p[1].text

• By relative indexing: directs the binding algorithm to search the VALUE attributes of each element in the array, until a match is found. The match must be complete, which requires the use of wildcard metacharacters for partial string matches. Note that the search will return the first matching element, if any:

doc.tr['*pattern*'].td[1].text

• By attribute matching: directs the binding algorithm to search an object's attributes until a match is found. Attribute matching is done with parentheses instead of square brackets:

doc.a(name='foo').href
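Two of these addressing forms map directly onto standard DOM calls. The sketch below shows absolute indexing (as in doc.p[1].text) and the parenthesized attribute-matching form (as in doc.a(name='foo').href). It is an approximation under stated assumptions: it uses the JDK's XML DOM rather than WIDL's object model, it does not parse reference strings, and the `ReferenceDemo` class, its method names, and the sample page are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ReferenceDemo {
    // Hypothetical well-formed page used only for illustration.
    static final String PAGE =
          "<html><body><p>first</p><p>second</p>"
        + "<a name=\"foo\" href=\"/target\">link</a>"
        + "<a name=\"bar\" href=\"/other\">other</a>"
        + "</body></html>";

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("document is not well-formed", e);
        }
    }

    /** Absolute indexing, in the spirit of doc.p[1].text; null when the index is out of range. */
    static String textAt(Document doc, String tag, int index) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (index < 0 || index >= nodes.getLength()) {
            return null;
        }
        return nodes.item(index).getTextContent();
    }

    /** Attribute matching, in the spirit of doc.a(name='foo').href; first match wins. */
    static String attributeWhere(Document doc, String tag, String attr, String value, String wanted) {
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (value.equals(e.getAttribute(attr))) {
                return e.getAttribute(wanted);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        System.out.println(textAt(doc, "p", 1));                                // second
        System.out.println(attributeWhere(doc, "a", "name", "foo", "href"));    // /target
    }
}
```

Absolute indexes are the simplest form but shift whenever elements are added earlier in the page; matching on an attribute value ties the reference to the element itself rather than to its position.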

The following properties are available for all objects. To return the text of a container:

.text/.txt

To return the value of a container:

.value/.val

To return the source of a container:

.source/.src

To return the index of a container:

.index/.idx

To return the fully qualified object reference:

.reference/.ref

Putting WIDL to Work

WIDL files can be hand-coded or developed interactively with command line or graphical tools, which provide aid for determining object references used in <VARIABLE>, <CONDITION>, and <REGION> declarations.

Once a WIDL file has been created, its use depends upon the implementation of products that can process and understand WIDL services. A Web integration platform based on WIDL needs to provide:

• A mechanism for retrieving WIDL files, either from a local file system, a directory service such as LDAP, or a URL

• An HTML and XML parser, and text pattern matching capabilities, providing an object model for accessing elements of Web documents

• HTTP and HTTPS support, to initiate requests and receive Web documents

Apart from these requirements, a WIDL processor could be delivered as a Java class or a Windows DLL, for integration directly with client applications, or as a standalone server with middleware interfaces, allowing thin-client access to Web automation functionality.

Generating Code

The primary purpose of WIDL is integration with corporate business applications.

In much the same way that DCE or CORBA IDL is used to generate code fragments, or "stubs," to be included in development projects, WIDL provides the necessary ingredients for generating Java, JavaScript, C/C++, and even Visual Basic client code.

WebMethods has developed a suite of Web Automation products for the development and management of WIDL files, as well as the generation of client code from WIDL files. Client stubs, which WebMethods calls "Weblets," present developers with local function calls, and encapsulate all the methods required to invoke a service that has been defined by a WIDL file.

import watt.api.*;

public class TrackPackage extends Object {
    [...]
    throws IOException, WattException, WattServiceException
    [...]

Figure 2-9 features a Java class generated from the package tracking WIDL presented earlier in Example 1. This class demonstrates the following methods that are part of the API that WebMethods has developed for processing WIDL:

• Context

• loadDocument

• invokeService

• getVariable

After declaring the variables that will be used by the TrackPackage class, a handle c to a new Context of the WebMethods Web automation runtime is created. All API calls are then made against this handle.

loadDocument loads and parses the specified WIDL file, in this case Shipping.widl. Loading the WIDL defines the services of the Shipping interface to the runtime.

invokeService actually submits the input parameters to the TrackPackage service, which makes the appropriate HTTP request and returns either a result set which contains the bound output variables or an error message specified by a <CONDITION/> statement within the <SERVICE/> definition. getVariable is then used to extract the values of the output variables and to assign them to class variables.

Within the Java application, the package tracking service looks like a simple instantiation of the TrackPackage class:

TrackPackage p = new TrackPackage("12345678");

In short, an application makes a call to a local function that has been generated by WIDL. The local function encapsulates the API calls to the WIDL processor. The WIDL processor:

• Loads the WIDL file from a local or remote file system

• Passes the function's input parameters as an HTTP request

• Parses the retrieved document to extract target data items

• Executes any conditional logic for error checking or service chaining

• Returns the extracted data into the output parameters of the calling function

Generated Java classes can be incorporated in standalone Java applications, Java applets, JavaScript routines, or server-side Java "servlets." Generated C/C++ code encapsulating Web services can be deployed as DLLs, shared libraries, or standalone executables. WebMethods' implementation, the Web Automation Platform, provides Java classes, a shared library, a Windows DLL, and an ActiveX control to support Visual Basic modules, which can be embedded in spreadsheets and other Microsoft Office applications.
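The second step in the processor list above, passing a function's input parameters as an HTTP request, can be sketched for the GET case as follows. This is an illustration only and does not reproduce the WebMethods API: the `RequestBuilder` class, the `buildGetUrl` method, the example URL, and the parameter names are hypothetical, and a real processor would also have to handle POST services and issue the request itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class RequestBuilder {
    /** Appends the input variables to the service URL as an encoded query string. */
    static String buildGetUrl(String serviceUrl, Map<String, String> inputs) {
        StringBuilder url = new StringBuilder(serviceUrl);
        // Start a query string, or extend one that the service URL already carries.
        char separator = serviceUrl.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> input : inputs.entrySet()) {
            url.append(separator)
               .append(encode(input.getKey()))
               .append('=')
               .append(encode(input.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input binding for a package tracking service.
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("TrackingNum", "12345678");
        inputs.put("Carrier", "Fast Ship");
        System.out.println(buildGetUrl("http://example.com/track", inputs));
        // http://example.com/track?TrackingNum=12345678&Carrier=Fast+Ship
    }
}
```

A generated stub would hide this step entirely: the caller supplies typed arguments, and the mapping from argument to request parameter comes from the input binding in the WIDL file.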
