A World-Wide Web server on a multicomputer system


A World-Wide Web Server on a Multicomputer System *

Chun-Hsing Wu, Chun-Chao Yeh, and Jie-Yong Juang

Dept. of Computer Science and Information Engineering

National Taiwan University

Taipei, Taiwan, 10617

Fax: 886-2-3628167

Email: [email protected]

Abstract

As the number of people browsing the World-Wide Web increases explosively, the workload of popular web servers also increases rapidly. A multicomputer system that was designed for I/O-intensive applications has been found to be quite suitable for serving as a web server. The system is composed of multiple clusters of multiprocessors interconnected by an interconnection network. The interconnection network is an ATM-like cell-based switching network that can be restructured so that the system can be scaled up to meet the increasing demands of web service. The multicomputer system is also found to support video streams effectively. The design of the system and the porting of web servers onto it are discussed in this paper.

1 Introduction

Since the initial World-Wide Web prototype was developed in 1990, it has grown rapidly and has become the most popular system on the Internet in recent years [2]. According to the Internet Domain Survey conducted in January 1996, about 76,000 systems now have the registered domain name www, up from only 600 in July 1994 [9]. Due to the increasing volume of web requests, it has become critical for the web server of a popular site to offer high performance and guarantee high availability. A web server must be able to serve multiple simultaneous requests promptly even at peak times. Besides, from the information providers’ point of view, the information provided by each server accumulates as time passes. Therefore, it is desirable for a web server to be scalable and to support better information searching capability. In the near future, supporting video streams will also become a basic requirement for a web server. Accordingly, a highly available, scalable machine with strong I/O capability will be necessary to run a web server.

To address these issues, we are developing a world-wide web server on top of a multicomputer machine designed and implemented in our laboratory. The machine uses a multistage switching network to connect multiple clusters of multiprocessors. Each processor can either be a simple CPU module or have individual storage devices and/or network adapters attached, depending on the needs of applications. As a result, multiple I/O devices can be accessed concurrently to provide higher I/O bandwidth than single-bus machines. An ATM-like cell-based interconnection network is designed to support more predictable and more efficient inter-processor communication. In addition, its restructurable architecture makes the machine scalable. The web server on the machine identifies different kinds of requests and assigns them to the processors optimized to handle each type of request. The web server can also work well as a proxy web server. Furthermore, with our multicomputer, a simple replication strategy can be applied with little overhead to achieve high availability.

*This work was partially supported by the National Science Council under grants NSC84-2221-E-002-004 and NSC85-2221-E-002-029.

In the following sections, we first describe the characteristics of existing WWW systems and the availability issues, and discuss some related work. Then, we present our multicomputer platform in Section 3. In Section 4, a general software structure of a web server proposed for multicomputers like ours is depicted. An implementation of the server is discussed in Section 5. Section 6 draws conclusions.

2 Issues of WWW Servers

In the World-Wide Web, user agents use the application-level, stateless Hypertext Transfer Protocol (HTTP) [3] to request documents or other kinds of objects from web servers (or from proxies/gateways to other Internet servers). In practice, a browser establishes a TCP connection to a web server before requesting a document. After fetching the document, it closes the connection immediately. The web server either feeds the browser the document or redirects it to contact other servers. A document may be a text file, either in plain format or in HTML, an image, or a motion-picture file. It may also be a virtual document, the actual data of which is generated on the fly. Most servers support the Common Gateway Interface (CGI) for virtual documents. Database queries and search requests can be implemented by this mechanism. There is also an extension to HTTP called server-push to handle dynamic documents consisting of multiple parts. Server-push allows simple animation to be implemented; nevertheless, it may occupy a connection for a longer time.

To reduce network traffic, a browser may contact a local proxy server (or caching server) for remote documents. The proxy server fetches remote documents for the browsers and then keeps a copy on its local disk. The next time a request asks for the same document, the proxy may feed the browser directly with the local copy [8].

According to several trace analysis studies [4, 1, 5], small documents are accessed more frequently than large documents. This observation is consistent with the general web page design rule to keep the front page small. Most documents are read-only or are modified infrequently compared with ordinary files. Virtual documents generated by database access usually invoke search queries, which may involve a large volume of read-only files. In summary, small files are accessed in most simple connections, and read-only file access contributes to most disk activity. This phenomenon yields another opportunity for optimizing a web server.

In a web server, disk writes are performed mainly for logging and, in the case of a proxy server, for caching remote documents on local disks. Disk writes in these cases are usually write-once, and the cached remote documents are always discardable. These features make the traditional weak consistency model of network file systems suitable for a web server and alleviate the consistency overhead of replicating files. Besides, the link information in a hypertext document may hint at which files will be requested soon. The link information may be used to design a more effective buffer replacement algorithm than that in a general file system.

Many browsers are multi-threaded. They are able to send multiple requests simultaneously for the in-lined images within an HTML document. This can increase the concurrency of a web server, but it also reduces the number of users that a web server can serve during peak time. Besides, a request may occupy a TCP connection for a long period of time if it asks for a large file or comes from a low-speed network. A proxy server needs even more connection capacity, since it must serve local proxy clients and also fetch remote documents when no cached file is available. However, the number of connections available to a server is limited by operating system resources such as TCP ports, mbufs, and process table entries. Fine-tuning the system alleviates the problem but does not eliminate it.

The existing redirection mechanism in the HTTP protocol can be adopted to improve availability and scalability [3]. Requests received by a central web machine can be redirected to a pool of web machines. This approach alleviates the workload of the central web machine, but it incurs network and connection overhead in redirection. In addition, the central web machine may still be the hot spot of the machine group. Furthermore, the same document returned by two different machines in the group will be considered by clients or proxy servers as two different copies, which makes global caching schemes ineffective.

In the design of NCSA’s scalable web server [6], a round-robin DNS approach distributes requests among a cluster of web servers that share the same alias host name. The authoritative DNS server in the cluster acts as a virtual router, distributing requests by rotating through the web servers that are alternately mapped to the shared alias name. This design eliminates the single point of failure and can dynamically increase the capacity of the virtual server. However, the result of a name resolution is cached in a client’s local name server for a period of time. Any further resolution request to the same local name server, even from different clients, reuses the cached result until the mapping expires. This may make the load distribution among the server cluster uneven. One method proposed to alleviate this effect is to shorten the time-to-live value of each resolution result so that the clients’ name servers query again sooner, but the additional DNS queries increase global traffic.
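The caching effect described in this paragraph can be illustrated with a small simulation. The following Python sketch is not taken from the NCSA design; the function name `simulate`, the request trace, and the TTL values are assumptions chosen only to show the trade-off between load balance and DNS query volume.

```python
from itertools import cycle


def simulate(requests, n_servers, ttl):
    """Replay (time, local_name_server) requests against round-robin DNS.

    Each local name server caches the last answer for `ttl` time units.
    Returns (requests per web server, number of authoritative DNS queries).
    """
    rotation = cycle(range(n_servers))   # authoritative server rotates answers
    cache = {}                           # local name server -> (web server, expiry)
    load = [0] * n_servers
    dns_queries = 0
    for t, name_server in requests:
        entry = cache.get(name_server)
        if entry is None or t >= entry[1]:
            entry = (next(rotation), t + ttl)
            cache[name_server] = entry
            dns_queries += 1
        load[entry[0]] += 1
    return load, dns_queries


# Hypothetical trace: 100 requests, 90% of them behind local name server "A".
trace = [(t, "B" if t % 10 == 0 else "A") for t in range(100)]
long_ttl = simulate(trace, n_servers=2, ttl=1000)
zero_ttl = simulate(trace, n_servers=2, ttl=0)
```

With the long TTL, each local name server pins all of its clients to one web server, so the load follows the population behind each name server. With a zero TTL the load is even, but every request costs one query to the authoritative server.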

Our proposed web server distributes requests transparently and more evenly among several clusters of multiprocessors. It can also tolerate single points of failure. Before describing the design of the web server, we first present the multicomputer architecture.

3 The NTU Cost-effective Multicomputer Clusters

In this section we present a cost-effective multicomputer architecture, called SIGMA (System-Integrated Growable Multicomputer Architecture), developed at National Taiwan University. The goal of the project was to develop clustered machines that offer a better cost-performance ratio than conventional supercomputers. Instead of using an expensive custom design approach, the NTU SIGMA machine is built from off-the-shelf components. It leverages the latest microprocessor technologies and integrates computing components with a proprietary interconnection network.

3.1 Architecture overview

Figure 1 shows an example of the SIGMA multicomputer architecture with 64 nodes connected. The system consists of two major entities: computing nodes and a network subsystem. A computing node can be as simple as a CPU module, or it can be a complete computer with proper I/O capabilities. The network subsystem is a multistage interconnection network (MIN). For instance, in Figure 1, the MIN is a three-stage Clos network [7], in which each stage consists of sixteen four-by-four switching elements. Each computing node contains a SIGMA Network Interface (SNI) to connect its bus interface to a port of the MIN.

3.2 SIGMA computing nodes

Each computing node consists of a CPU module and some optional I/O modules. Modules are connected via a standard I/O bus, and thus a vast array of commodity I/O adapters can be used in the node [10]. The nodes are physically separated from one another. Communication between nodes is achieved by message passing through the internal network subsystem. Messages can also be delivered through conventional LAN (Local Area Network) facilities if LAN devices are plugged into the nodes. The distributed nature of the architecture allows the system to survive device failures. The nodes need not all be the same; heterogeneous nodes can coexist in the system. Although most system devices are located separately, system resources can be shared effectively through the communication facilities, namely the MIN or the plugged-in LANs.

Figure 1: The SIGMA multicomputer architecture - A 64-node example

3.3 SIGMA cell communication network

The network subsystem, the SIGMA Cell Communication Network (SCCN), consists of two major parts: the SIGMA network interface (SNI) and a cell switching network. A message is chopped into small fixed-size data entities (cells) before it enters the switching network, and the cells are reassembled at the destination node. The SNI takes charge of (1) network protocol conversion (data partitioning/reassembly), (2) data buffering, (3) cell header checking/generation, (4) network link serialization/de-serialization, and (5) cell retransmission and link-level flow control. When transmitting, packets are injected into the SNI through the bus interface. They are then converted into cells and stored in the cell transmitting buffer. As soon as cells enter the buffer, they are fetched and serialized for sending through the network, cell by cell, whenever the requested channels are available. Upon receiving, similar steps are performed on the cells in the reverse direction. To overlap computing (protocol processing) and communication (cell sending/receiving), we use dual-port RAM as the cell buffer in the SNI. We also allow the transmitting buffer and the receiving buffer to be manipulated concurrently. In addition, SCCN is a self-routing network based on the destination information carried in the cell header. Cells in the network can therefore be routed individually.

Figure 2: A configuration of SIGMA multicomputer system with 64 nodes
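The partition/reassembly step performed by the SNI can be sketched in software as follows. This is only an illustration in Python: it uses the 64-byte cell with a 4-byte header given in Section 3.5.2, but the field layout inside the header (destination, last-cell flag, sequence number, payload length) is a hypothetical choice, since the paper does not specify it.

```python
CELL_SIZE = 64                           # bytes per SIGMA cell (Section 3.5.2)
HEADER_SIZE = 4                          # 2-byte hardware header + 2-byte adaptation header
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE   # 60 bytes of data per cell


def segment(message, dest):
    """Chop a message into fixed-size cells addressed to node `dest` (0..255).

    Hypothetical header layout: [dest, last-cell flag, sequence no., payload length].
    """
    chunks = [message[i:i + PAYLOAD_SIZE]
              for i in range(0, len(message), PAYLOAD_SIZE)] or [b""]
    if len(chunks) > 256:
        raise ValueError("message too long for a one-byte sequence number")
    cells = []
    for seq, chunk in enumerate(chunks):
        last = 1 if seq == len(chunks) - 1 else 0
        header = bytes([dest, last, seq, len(chunk)])
        # Pad the final chunk so that every cell is exactly CELL_SIZE bytes.
        cells.append(header + chunk.ljust(PAYLOAD_SIZE, b"\x00"))
    return cells


def reassemble(cells):
    """Rebuild the message at the destination, whatever order the cells arrive in."""
    ordered = sorted(cells, key=lambda cell: cell[2])
    return b"".join(cell[HEADER_SIZE:HEADER_SIZE + cell[3]] for cell in ordered)
```

Because each cell carries its own destination in the header, the cells of one message can be routed individually and still be reassembled correctly. A 60-byte packet occupies exactly one cell in this sketch.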

3.4 Cluster-based multicomputer system

The SIGMA multicomputer system is designed not only for parallel computing but also for interactive computing. Therefore, each node occupies more space than a node in MPP (Massively Parallel Processor) systems, and it is hard to put many nodes together on one PCB. One common solution is to partition the nodes into several clusters. In the SIGMA multicomputer system, each cluster consists of several processor nodes and an on-board interconnection network with an architecture similar to that shown in Figure 1 but with fewer stages. Figure 2 shows an example of partitioning a sixty-four-node system into sixteen clusters, with four nodes in each cluster.

3.5 Features of SIGMA multicomputer

Some features of the SIGMA multicomputer make it well suited to running a web server, although it can be applied to other applications as well. First, the system is expandable (scalable). To meet the huge resource demands of large-scale web servers, the size of a SIGMA machine can be increased incrementally. A SIGMA system can be upgraded module by module. For instance, one can simply insert one CPU module to enhance computing power, instead of adding a whole computer (such as a PC or workstation) to the system, as is needed in PC/workstation clusters with conventional LAN interconnection. Besides, the customized interconnection network of the SIGMA machine provides higher bandwidth and better system resource sharing. Second, the system allows concurrent I/O operations. Unlike scientific computing, web service is I/O-oriented, especially in disk I/O and networking. The SIGMA machine is a share-nothing architecture in which each node can have its own disk modules and network modules attached. Consequently, it provides not only large aggregate computing power and system memory, but also large disk and network I/O bandwidth. In addition, the design of the SIGMA interconnection network provides efficient communication to facilitate concurrent I/O.

3.5.1 Hardware flow control

Flow control supported at the hardware level contributes to fast message-passing communication in SIGMA. It prevents data loss due to receiving buffer overflow (in hubs or in destination nodes). For connection-oriented communication, any data loss would require retransmission of the packet, which wastes bandwidth and causes significant communication delay. Although higher-level flow-control protocols such as TCP/IP window-based flow control can also alleviate the problem, they incur larger overhead; moreover, they only reduce the likelihood of data loss and do not guarantee that it is prevented.

3.5.2 Cell-switching communication

Another important feature is cell switching. The cell size in SIGMA is fixed at 64 bytes. Four bytes of the cell are designated as the cell header: two of them are the hardware header, and the other two are the cell adaptation layer header. The sixty bytes of data payload can carry a complete ATM cell (53 bytes) or a minimum-length IP packet over Ethernet (60 bytes), which covers a large portion of the small control packets (e.g., ICMP, ARP, and RARP packets) used in the TCP/IP protocols. A sixty-byte packet fits one SIGMA cell without any waste, while it would need two ATM cells to carry such a packet. We also support multicasting in hardware. The current version of the SIGMA cell-switching network supports multicasting within a cluster (four nodes) and broadcasting (to all nodes) in the system. To respond to an urgent packet quickly, an emergency bit in the cell header can be set, which is identified by the hardware for immediate processing. Other benefits of cell switching over packet switching are summarized as follows:

• simple architecture: Compared with a variable-length (packet-based) architecture, a cell-based architecture is simpler. The simplicity of the architecture results in better performance. It not only simplifies the control logic but also eases the management of random access buffers.

• latency improvement: Small packets can interleave with large packets. In the example shown in Figure 4, small packets like p2 and p5 can be sent out quickly by interleaving with other large packets. These packets may be blocked by the large packets in packet-switched communication. Furthermore, transmission of large packets can benefit from the worm-hole effects of the network, as shown in Figure 3.

• more predictable transmitting time: The cell-interleaving characteristic (Figure 4) of the SIGMA network subsystem makes it behave like a TDMA (Time Division Multiple Access) network, where network bandwidth is divided into a set of time slots. Consequently, the time to transmit an S-byte packet can be bounded by N * S / B, where N is the number of nodes and B is the network bandwidth. Bounded transmitting time is important for real-time applications such as providing real-time video/audio streams in web servers.

Figure 3: Worm-hole effects of transmitting a packet across multiple switching hubs
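The latency and predictability arguments in the list above can be checked with a small model. The Python sketch below is an idealized illustration, not the SIGMA hardware scheduler: it assumes one shared link that carries one cell per time slot (B = 1 cell per slot), strict round-robin interleaving among the nodes, and packet sizes (in cells) that are invented for the example.

```python
from collections import deque


def packet_switched(sizes):
    """Completion time of each packet when packets are sent whole, in order."""
    done, t = {}, 0
    for i, size in enumerate(sizes, start=1):
        t += size
        done[i] = t
    return done


def cell_interleaved(sizes):
    """Completion time of each packet when nodes take turns sending one cell."""
    queue = deque((i, size) for i, size in enumerate(sizes, start=1) if size > 0)
    done, t = {}, 0
    while queue:
        i, remaining = queue.popleft()
        t += 1                              # one cell per time slot
        if remaining == 1:
            done[i] = t
        else:
            queue.append((i, remaining - 1))
    return done


# Hypothetical packet sizes in cells for five nodes; p2 and p5 are the small ones.
sizes = [5, 1, 3, 4, 1]
packet_times = packet_switched(sizes)
cell_times = cell_interleaved(sizes)
```

In this model the small packets p2 and p5 finish at slots 2 and 5 under cell interleaving, instead of slots 6 and 14 under packet switching, and every packet of S cells finishes within N * S slots, which corresponds to the N * S / B bound with B equal to one cell per slot.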

3.5.3 Cell pre-sink

To reduce communication latency, a cell pre-sink scheme is applied, in which all nodes in a cluster receive (sink) every cell transmitted to the cluster in advance, before the routing logic determines exactly which node each cell should go to. Once the routing tag of a cell is resolved, all nodes in the cluster are notified whether they are the right destination. If a node is, it continues to receive the remaining data segments of the cell; if not, it simply flushes the pre-sunk data of the cell and gets ready for the next cell. As a result, data can be sent at full speed without any delay due to routing tag processing.

4 Software Configuration of the Web Server on SIGMA

Since the SIGMA multicomputer is flexible in I/O device arrangement, it allows a large variety of software configurations for a web server. We propose a configuration based on the World-Wide Web's run-time behavior to take advantage of the SIGMA architecture.

The proposed software configuration of our web server is shown in Fig. 5. It is composed of several manager groups, each of which consists of several computing nodes. The number of nodes in each group depends on the workload of the web server and can be scaled up or down when necessary. Note that a computing node may run more than one kind of manager at the [...] via the SIGMA SCCN switching network. However, request managers, stream managers, and proxy managers may also have connections to external LANs. Caching can be done more effectively with this architecture, since each computing node is assigned a specific task (or tasks) and it is easier to design effective caching schemes based on the locality properties of individual tasks.

Figure 4: Cell interleaving. An example of packet transmission (five nodes, each with a packet of size pi); case (a): packet-switching; case (b): cell-switching, in which the small packets p2 and p5 complete first. TDi: completion time for packet i.

4.1 Load sharing among request managers

When a request arrives at the server, it is received by a request manager. Instead of a single request manager, multiple request managers receive requests simultaneously. A distributed decision method is used to distribute the workload evenly among them. In our design, all request managers share the same IP address. When a request packet arrives at a computing node where a request manager is running, the low-level network module peeks at its source IP address and then applies a distributed decision algorithm to determine whether to receive the packet or not. The packet is accepted by exactly one of the nodes and rejected by the others. The distributed decision algorithm is implemented in the driver, so by the time an HTTP request is forwarded up to the high-level request manager, its destination has already been decided. Only one request manager receives the request; the others do not even see it. This method reduces the overhead on the computing nodes. Note that, when connected to broadcast networks such as Ethernet, the computing nodes sharing the same IP address also share the same physical network address.

Figure 5: Software configuration of a world-wide web server on SIGMA multicomputer

4.2 Load redistribution

Three load redistribution strategies are used in our design to improve web server performance. In the first, the request manager serves the request locally; in the second, it redirects the request to another manager; in the third, it asks another computing node for help. These three strategies are applied to three kinds of requests of different natures. Requests for small documents are best served directly, because the documents are often cached in the request manager’s memory buffers. Requests for large files such as video streams usually take longer to process, and some requests require generating a virtual document such as database query results. Request managers redirect clients sending these kinds of requests to a stream manager. This approach alleviates the workload of request managers and improves availability. The third strategy suits proxy requests. Most current browsers are not able to redirect proxy requests. Moreover, it is difficult to transparently create another TCP connection between the client and another node to replace the existing connection (between the client and the request manager). Thus, a request manager must handle all proxy requests (requests to remote sites) by itself. This is fine if the requested documents are cached. If not, the request manager has to create an extra TCP connection to the remote server to fetch the document, and these two connections may stay open for a long time. TCP connections are valuable resources in a web server and should be used more effectively. To solve this problem, a request manager routes the request to a proxy manager if a local copy is not available. If the requested remote document is cached in the proxy manager, the proxy manager returns the document directly to the request manager. Otherwise, it is the proxy manager’s duty to fetch the remote document. The remote IP address is used as the hash key so that all proxy requests to the same remote site are handled by the same proxy manager. Furthermore, the proxy manager can use the hyperlink information in the documents to prefetch documents or clean the cache for optimization. By carefully applying these three strategies, the machine’s performance and availability can be enhanced.
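The choice among the three strategies, and the hashing of proxy requests by remote IP address, can be sketched as follows. This Python fragment is an illustration only: the names `Action`, `choose_action`, and `proxy_manager_for`, the 1 MB cut-off between small and large files, and the use of a simple modulo as the hash function are all assumptions, not details given in the paper.

```python
import ipaddress
from enum import Enum

LARGE_FILE_BYTES = 1 << 20   # assumed cut-off for "large" files; not from the paper


class Action(Enum):
    SERVE_LOCALLY = "serve locally"
    REDIRECT_TO_STREAM_MANAGER = "redirect to stream manager"
    FORWARD_TO_PROXY_MANAGER = "forward to proxy manager"


def choose_action(is_proxy, is_virtual, size_bytes, cached_locally):
    """Pick a load redistribution strategy for one request."""
    if is_proxy:
        # Proxy requests cannot be redirected, so the request manager keeps the
        # client connection and asks a proxy manager for help on a cache miss.
        if cached_locally:
            return Action.SERVE_LOCALLY
        return Action.FORWARD_TO_PROXY_MANAGER
    if is_virtual or size_bytes >= LARGE_FILE_BYTES:
        return Action.REDIRECT_TO_STREAM_MANAGER
    return Action.SERVE_LOCALLY


def proxy_manager_for(remote_ip, n_proxy_managers):
    """Map a remote site to one proxy manager, using its IP address as hash key."""
    return int(ipaddress.ip_address(remote_ip)) % n_proxy_managers
```

Since the hash depends only on the remote address, all requests for one remote site reach the same proxy manager, whose cache therefore sees every reference to that site.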

4.3 Video stream service

If all requests for the same video are served by the same stream manager, locality is better. However, this may cause a problem when a popular video program attracts a large number of requests in the same period: the load will not be distributed evenly among the stream managers. Therefore, the hashing algorithm in a request manager first redirects all requests for the same video program to the same stream manager, and each stream manager periodically broadcasts its load status to the request managers. Once the number of requests to a stream manager exceeds a threshold, request managers also take the source IP address into consideration to choose a second stream manager. In addition to supporting load distribution, a status message can also act as a probe message to check whether a request manager or a stream manager is still alive. Status broadcasting can be implemented by the low-level multicast mechanism of the SIGMA switching network.
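One possible reading of this selection rule is sketched below in Python. The paper does not give the hash function or say how the source IP address enters the choice, so the CRC-32 hash of the video name, the offset computed from the source address, and the name `pick_stream_manager` are assumptions made for illustration. The `loads` list stands for the most recent load status broadcast by each stream manager.

```python
import ipaddress
import zlib


def pick_stream_manager(video, source_ip, loads, threshold):
    """Choose a stream manager index for a request for `video`.

    loads[i] is the number of requests last reported by stream manager i.
    """
    n = len(loads)
    # Primary choice: hash of the video name only, for better locality.
    primary = zlib.crc32(video.encode("utf-8")) % n
    if n == 1 or loads[primary] <= threshold:
        return primary
    # Overloaded: use the source IP address to select a second manager.
    # The offset is in 1..n-1, so the result always differs from `primary`.
    offset = 1 + int(ipaddress.ip_address(source_ip)) % (n - 1)
    return (primary + offset) % n
```

Below the threshold every client is sent to the same manager for a given video. Above it, clients are spread over the remaining managers according to their source addresses, so a given client still reaches the same manager on repeated requests.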

In SIGMA, accessing remote memory is faster than accessing local disks. To improve the performance of video stream service, a large file is partitioned into pieces and stored across a set of striped storage managers. When a video file is requested, all of the striped storage managers access their local disks concurrently and return the video program to the agent via a single stream manager. Each striped storage manager keeps only its pieces of the file, and the managers can access their disks concurrently, so the total time to retrieve a whole video file is reduced.
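The striping scheme can be sketched as follows. This Python example is a software analogy only: the round-robin block placement, the block size, and the function names are assumptions, and a thread pool stands in for storage managers reading their local disks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data, n_managers, block_size):
    """Distribute fixed-size blocks of a file round-robin over storage managers."""
    stripes = [[] for _ in range(n_managers)]
    for index, offset in enumerate(range(0, len(data), block_size)):
        stripes[index % n_managers].append(data[offset:offset + block_size])
    return stripes


def read_blocks(blocks):
    """Stand-in for one storage manager reading its blocks from its local disk."""
    return list(blocks)


def fetch(stripes):
    """Stream manager: read all stripes concurrently, then restore block order."""
    with ThreadPoolExecutor(max_workers=max(1, len(stripes))) as pool:
        per_manager = list(pool.map(read_blocks, stripes))
    rounds = max((len(blocks) for blocks in per_manager), default=0)
    out = bytearray()
    for r in range(rounds):
        for blocks in per_manager:      # block r*n + m lives at manager m, slot r
            if r < len(blocks):
                out += blocks[r]
    return bytes(out)
```

With two striped storage managers, as in the implementation of Section 5, each manager holds roughly half of the blocks, and the stream manager merges the two sequences back into the original order.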

4.4 Storage managers

There are three kinds of storage managers in the system: cache storage managers, general storage managers, and striped storage managers. They work together with proxy managers, request managers, and stream managers, respectively, and maintain remote documents, local files, and video files, respectively. A SIGMA computing node running one of these managers should have local disks attached. Unlike cached remote documents, local documents and video files are replicated so that requests can continue to be served even when disk failures occur. Some of the reasons not to replicate remote documents are:

1. Remote documents can be fetched again later, after the cache storage manager recovers from a crash.

2. Leaving more disk space for caching other remote documents results in a better cache hit rate.

3. Replicating files dynamically incurs some overhead.

At present, local files and video files are replicated by a simple file duplication scheme, since web files are rarely modified by requests and inconsistency during the period of file transfer can be tolerated.

Figure 6: An implementation of the proposed world-wide web server on an eight-node SIGMA multicomputer

5 Implementation

In this section we present an implementation of the proposed world-wide web server on an eight-node SIGMA multicomputer. The server is composed of two mirrored clusters, as shown in Fig. 6. Request managers, proxy managers, and stream managers run in association with the corresponding general storage managers, cache storage managers, and striped storage managers. There are two striped storage managers in each cluster so that a large file can be segmented into two stripes. Local files are replicated in different clusters. Currently we are porting the reference library and the HTTP server released by the World-Wide Web Consortium onto our eight-node SIGMA prototype. Each node runs FreeBSD Unix. All nodes are also connected to an Ethernet LAN, except for the ones dedicated to striped storage managers.

Since the nodes running Request Manager A and Request Manager B have the same IP address and physical Ethernet address, all incoming packets destined to the shared IP address reach both Ethernet drivers. The drivers have been modified to support the distributed decision method. In the current implementation, each driver accepts only the packets in which the least significant bits of the source IP address match its unique node ID. The number of bits to be checked depends on the number of request managers in the system. In this way, all incoming requests are divided between Request Manager A and Request Manager B without any explicit dispatcher. A request is routed to one of three manager groups, and there are two managers in each group that can accept and serve requests concurrently. A video request, on the other hand, is served by two striped storage managers concurrently. This arrangement achieves a high degree of load sharing and improves system availability.
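The acceptance test applied by each modified Ethernet driver can be sketched as follows. The real filter lives in the FreeBSD driver; this Python version only mirrors the rule stated above. It assumes that node IDs are numbered from 0 and that the number of request managers is a power of two, so that every bit pattern is owned by exactly one node; the function name `accepts` is hypothetical.

```python
import ipaddress


def accepts(source_ip, node_id, n_request_managers):
    """Return True if the driver on node `node_id` should keep this packet.

    Every driver sees every packet (shared IP and Ethernet address) and
    decides locally, without exchanging messages with the other drivers.
    """
    bits = (n_request_managers - 1).bit_length()   # how many low bits to check
    mask = (1 << bits) - 1
    return (int(ipaddress.ip_address(source_ip)) & mask) == node_id
```

With two request managers only the lowest bit is checked, so clients with even source addresses are served by node 0 and clients with odd addresses by node 1. Each packet is kept by exactly one driver, and all packets of one TCP connection go to the same node because they carry the same source address.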

In addition to serving requests concurrently and accessing disks in parallel, request managers are designed to exploit temporal locality. Small general documents are requested more frequently, and they tend to be cached in the memory of request managers. Large files and video streams are not cached by request managers; requests for them are redirected to stream managers. Stream managers with replicated striped storage managers are designed to exploit spatial locality. Large disk blocks, sequential block placement, and prefetching of disk blocks for reads are implemented in the file system of the striped storage managers.

The system can be scaled up in different ways to adapt to changes in the request distribution. If the total number of requests increases, more computing nodes can be added to run as request managers. If proxy service becomes heavy, more computing nodes should be added to run proxy managers and cache storage managers. If demand for video stream service increases, more computing nodes for stream managers and striped storage managers are required. On the other hand, the system can also scale down gracefully when a manager or a cluster of nodes fails. The system is able to continue working with degraded performance.

6 Summary and Conclusions

In this paper we demonstrate that Internet computing is a suitable application of a multicomputer system. The proposed web server is composed of several groups of different managers. Each manager is arranged to run on one node of the multiprocessor clusters. Using a distributed decision method and hashing algorithms for dispatching requests and redistributing workload, computing nodes can share workload, handle requests concurrently, tolerate single points of failure, and act as a virtual server transparently to clients. By optimizing the processing of the three kinds of requests independently, the performance of the web server can also be improved significantly, even for video streams and caching of remote documents. Thus, we believe that a scalable multicomputer like ours is suitable for serving as a web server.

References

[1] M. Abrams, C. R. Standridge, G. Abdulla, S. Williams, and E. A. Fox, “Caching Proxies: Limitations and Potentials,” Fourth International World Wide Web Conference, 1995.

[2] T. Berners-Lee, R. Cailliau, J. Groff, and B. Pollermann, “World-Wide Web: The Information Universe,” Electronic Networking: Research, Applications, and Policy, Vol. 1, No. 2, 1992.

[3] T. Berners-Lee, R. Fielding, and H. Frystyk, “Hypertext Transfer Protocol - HTTP/1.0,” Internet Draft, Nov. 1995.

[4] H. Braun and K. Claffy, “Web Traffic characterization: an assessment of the impact of caching documents from NCSA’s web server,” Second World Wide Web Conference, Oct. 1994.

[5] R. J. Clark and M. H. Ammar, “Providing Scalable Web Service Using Multicast Delivery,” Second International Workshop on Services in Distributed and Networked Environments, 1995.

[6] E. D. Katz, M. Butler, and R. McGrath, “A Scalable HTTP Server: The NCSA Prototype,” Computer Networks and ISDN Systems, Vol. 27, No. 2, 1994.

[7] Y. J. Lin, J. M. Ho, C. C. Yeh, and J. Y. Juang, “Design of a Switching Module for Large-scale ATM Switch,” International Conference on Parallel and Distributed Systems, Taiwan, pp. 399-408, 1993.

[8] A. Luotonen and K. Altis, “World-Wide Web Proxies,” First International World Wide Web Conference, 1994.

[9] Network Wizards, “Internet Domain Survey,” URL: http://www.nw.com/zone/WWW/top.html, January 1996.

[10] C. C. Yeh, J. T. Lin, W. C. Kao, C. H. Wu, and J. Y. Juang, “A Multicomputer Server for I/O-Intensive Applications,” 12th IASTED International Conference on Applied Informatics, Austria, 1995.