SOM: Dynamic Push-Pull Channel Allocation Framework for Mobile Data Broadcasting

Jiun-Long Huang, Wen-Chih Peng, Member, IEEE, and Ming-Syan Chen, Fellow, IEEE

Abstract—In a mobile computing environment, the combined use of broadcast and on-demand channels can utilize the bandwidth effectively for data dissemination. We explore in this paper the problem of dynamic data and channel allocation with the number of communication channels and the number of data items given. We first derive the analytical models of the average access time when the data items are requested through the broadcast and on-demand channels. Then, we transform this problem into a guided search problem. In light of the theoretical properties derived, we devise algorithm SOM to obtain the optimal allocation of data and channels. Algorithm SOM is a composite algorithm which works with 1) a search strategy and 2) a broadcast program generation algorithm. According to the analytical models, we devise scheme BIS-Incremental on the basis of algorithm SOM, which is able to obtain solutions of high quality efficiently by employing binary interpolation search. In essence, scheme BIS-Incremental is guided to first explore the part of the search space that is more likely to contain the optimal solution, thereby leading to an efficient and effective search. It is shown by our simulation results that the solution obtained by scheme BIS-Incremental is of very high quality and is in fact very close to the optimal one. A sensitivity study on several parameters, including the number of data items and the number of communication channels, is conducted. The experimental results show that scheme BIS-Incremental scales very well, which is particularly important for its practical use in a mobile computing environment.

Index Terms—Data dissemination, dynamic data and channel allocation, mobile computing.


1 INTRODUCTION

In a mobile computing environment, a mobile user with a power-limited mobile computer can access various information via wireless communication. Applications such as stock activities, traffic reports, and weather forecasts have become increasingly popular in recent years [29]. It is noted that mobile computers use small batteries for their operations without directly connecting to any power source, and the bandwidth of wireless communication is limited in general. As a result, an important design issue in a mobile system is to conserve the energy and communication bandwidth of a mobile unit while giving mobile users the ability to access information from anywhere at any time [24].

A data delivery architecture in which a server continuously and repeatedly broadcasts data to a client community through a single broadcast channel was proposed in [1] in order to conserve the energy and communication bandwidth of a mobile computing system. In a push-based information system, a server generates a broadcast program to broadcast data to mobile users. This broadcast channel is also referred to as a broadcast disk from which mobile users can retrieve data [1], [7]. The mobile users need to wait for the data of interest to appear on the broadcast channel. The access time is defined as the time elapsed from the moment a user issues a data request to the point that the requested data item is read [15]. One objective of designing proper data allocation in the broadcast disks is to reduce the average access time of data items. Related research issues have attracted a considerable amount of attention, including on-demand broadcast [3], [4], [5], [6], data indexing [9], [14], [15], [17], [30], [33], and client cache management [4], [27], [31]. In addition, a significant amount of research effort has been devoted to developing index mechanisms [16], [20], [25] and data allocation algorithms [22], [23], [28], [34], [35] for multiple broadcast channels. Bandwidth allocation for multicell environments with frequency reuse and interference considered was studied in [32].

In addition to being operated in broadcast mode, channels can be operated in on-demand mode (i.e., unicast mode) in which a client explicitly sends data requests to retrieve the data items of interest [18], [19]. The major advantage of data broadcast is its scalability since the performance of the system does not depend on the number of clients listening to the broadcast channels. However, the performance degrades as the number of data items being broadcast increases. It has been shown that the combined use of the broadcast and on-demand channels can utilize bandwidth more efficiently for data dissemination [18], [19]. Hence, the problem of dynamic data and channel allocation is to dynamically partition a given total number of communication channels into broadcasting ones and on-demand ones and to dynamically allocate each data item on a broadcast or on-demand channel according to the system workload.

Prior studies of data and channel allocation can be classified into the following three categories: 1) pure on-demand, 2) pure broadcast, and 3) dynamic data and channel allocation. The pure on-demand algorithms are used in traditional client/server architectures. All channels are operated in on-demand mode, and all data items are allocated in the on-demand channels. Clients explicitly send data requests to the server to obtain the desired data items. This method is desirable when the number of requests is small and when energy saving is not an issue for the mobile devices. In pure broadcast, all channels are allocated in broadcast mode [1], [12], [22], [35], and all data items are broadcast repeatedly in broadcast channels. This method is useful when the access frequencies of data items are highly skewed (i.e., a small number of data items are of interest to a large group of users).

The authors are with the Department of Electrical Engineering, National Taiwan University, Taiwan, R.O.C.

E-mail: {jlhuang, wcpeng}@cs.nctu.edu.tw, [email protected].

Manuscript received 2 June 2004; revised 17 Feb. 2005; accepted 12 Mar. 2005; published online 15 June 2006.

For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-0178-0604.

Dynamic data and channel allocation algorithms are proposed to combine the respective merits of on-demand and broadcast modes and to adapt to changes in system parameters, including the data access frequencies and the number of users in the system [18], [19], [26]. In dynamic data and channel allocation, the system dynamically allocates broadcast and on-demand channels in accordance with data requests to achieve optimal data access performance. When the load is heavy, the broadcast channels may significantly relieve the load on on-demand channels by broadcasting frequently accessed data items. When the load is light, on-demand channels can take over to provide instantaneous access to data items.

In this paper, we study the problem of dynamic data and channel allocation. Consider the illustrative example shown in Fig. 1. Assume that the data items $R_i$, $1 \le i \le 15$, are of the same size and are sorted by their access frequencies. The number of channels in this example is assumed to be four. In the beginning, two channels are assigned as broadcast channels and the other two are on-demand ones. Five data items are put in broadcast channels and the broadcast program is shown in Fig. 1a. When the data request rate increases, $R_6$ is moved from the on-demand channel to the broadcast channel.¹ This will reduce the data request rate to on-demand channels and the expected waiting time in on-demand channels is, hence, reduced. The broadcast program is then rescheduled and the new broadcast program is shown in Fig. 1b. If the data request rate keeps increasing, as shown in Fig. 1c, one channel is reassigned to be a broadcast channel and three data items ($R_7$, $R_8$, and $R_9$) are moved from on-demand channels to broadcast channels. As the partition of broadcast and on-demand channels varies, the number of data items in those channels changes accordingly, showing the dynamic characteristics of this data and channel allocation problem.

We mention in passing that the authors of [26] provide an adaptive algorithm to allocate data items on broadcast and on-demand channels. However, they assume a fixed ratio of the on-demand bandwidth to the broadcast bandwidth. The work in [19] is designed to shuffle the loads among broadcast and on-demand channels to keep the load of on-demand channels in a predetermined region. In [18], the average access time of data items is formulated, and the optimal channel allocation is obtained according to the derived theoretical results. Both works [18] and [19] employed flat broadcast programs. A broadcast program is said to be flat if all data items appear with the same frequency in the broadcast program. On the other hand, a broadcast program is said to be hierarchical if data items of high access frequencies are broadcast at least as frequently as those of low access frequencies. It has been shown that hierarchical broadcast programs usually outperform flat broadcast programs [22], [23]. Hence, the algorithms proposed in [18] and [19] may not fully utilize network bandwidth. In view of this, we employ hierarchical broadcast programs in this paper in order to fully utilize the broadcast channels. This feature distinguishes this paper from others.

Explicitly, we explore in this paper the problem of dynamic data and channel allocation with the number of communication channels and the number of data items given. Gathering the access frequencies of data items is another research issue, since clients do not explicitly send data requests when the data items of interest are put in broadcast channels. Research works [13], [36] on gathering or estimating the data access frequencies in broadcast channels can complement our work. Differing from prior studies [18], [19], hierarchical broadcast programs are employed in our study. In this paper, we first describe the analytical models of broadcast and on-demand channels and transform the problem of dynamic data and channel allocation into a guided search problem. In light of the theoretical properties derived, we devise five pruning properties which effectively reduce the search space by removing the infeasible solutions from it. We then devise algorithm SOM (standing for SOlution Mapping) to obtain the optimal allocation of data and channels. Algorithm SOM is a composite algorithm which works with 1) a search strategy and 2) a broadcast program generation algorithm. According to the analytical models, we devise a search strategy called BIS (standing for Binary Interpolation Search) which is able to dynamically partition the data items and channels into broadcast and on-demand ones in accordance with the incoming requests. Then, based on algorithm SOM, we devise scheme BIS-Incremental to obtain solutions of high quality efficiently by employing BIS as the search strategy and $VF^K$ (standing for Variant-Fanout with the constraint $K$) as the broadcast program generation algorithm,² which greatly reduces the execution time. It is shown by our simulation results that the solutions obtained by scheme BIS-Incremental are of very high quality and are, in fact, very close to the optimal solutions. A sensitivity study on several parameters, including the number of data items and the number of communication channels, is conducted. Moreover, scheme BIS-Incremental scales very well, which is particularly important for its practical use in a mobile computing environment.

1. The criterion for data movement will be given in Section 4.

Fig. 1. An example scenario of dynamic data and channel allocation.

The rest of this paper is organized as follows: A description of the related work is given in Section 2. In addition, the problem of dynamic data and channel allocation is formulated. Then, the analytical models of broadcast channels, on-demand channels, and the overall system are given in Section 3. In Section 4, we transform the problem of dynamic data and channel allocation into a search problem and develop an efficient algorithm to address this problem based on the derived analytical models. The performance evaluation of the proposed algorithm is presented in Section 5. Finally, this paper concludes with Section 6.

2 PRELIMINARIES

2.1 Related Work

In [2], the architecture consisting of a single uplink channel and a broadcast channel is considered. A portion of time slots on the broadcast channel is allocated to transmit the data items which are explicitly requested by users via the uplink channel. These time slots are said to be in on-demand mode. On the other hand, the remaining time slots are used to transmit all data items according to a hierarchical broadcast program generated by the broadcast disk technique [1]. These time slots are said to be in broadcast mode. In [2], the ratio of the time slots in broadcast mode to those in on-demand mode is fixed and the broadcast program is static. As a consequence, the scheme proposed in [2] cannot adapt to changes in system workload.

The authors in [26] consider the environment with a broadcast channel, a downlink on-demand channel, and an uplink channel. The on-demand channel is dedicated to transmitting the data items which are explicitly requested by users via the uplink channel. Flat broadcast programs are employed and only the data items whose request rates are high enough will be allocated on the broadcast channel. The authors propose an algorithm to estimate the popularity of all data items and to dynamically determine the set of data items on the broadcast channel according to the system workload.

In [10], the information system consists of a broadcast channel and an uplink channel. The authors propose an algorithm to prioritize all data items according to the received data requests and the broadcast rates of these data items. Then, the algorithm allocates the data items with the highest priorities on the broadcast channel. Flat broadcast programs are used and the $(1, m)$ indexing technique [15] is employed to construct data indices. The authors also propose several energy-efficient data access protocols to minimize the power consumption of data access.

In [19], the authors consider environments with a single broadcast channel and multiple on-demand channels. The broadcast programs are assumed to be flat. The load of the on-demand channels is first divided into several regions. Then, the authors propose a data allocation algorithm to keep the load of the on-demand channels in a predetermined suboptimal region by dynamically allocating some data items to the broadcast channel. In addition, the proposed algorithm is able to adaptively adjust the data allocation according to the system workload.

The authors in [18] consider environments with multiple broadcast and on-demand channels. The broadcast programs on the broadcast channels are assumed to be flat. The authors first model the on-demand channels as an M/M/c queue. Then, the formulae of the average access time of the broadcast and on-demand channels are derived. With these analytical results, the authors propose a data and channel allocation algorithm to determine 1) the numbers of channels which are operated in broadcast and on-demand modes and 2) the data items which are allocated in the broadcast and on-demand channels according to the system workload. However, since the proposed algorithm does not employ hierarchical broadcast programs, the network bandwidth may not be fully utilized. The problem we address is similar to that considered in [18] but differs from it in that we also consider the generation of hierarchical broadcast programs to attain a higher network bandwidth utilization.

2.2 System Description and Problem Formulation

Denote the total number of data items as $n$ and the data items as $R_i$, $1 \le i \le n$. Naturally, the $n_B$ most frequently accessed data items are placed in broadcast channels and the other $n_O = n - n_B$ data items are in on-demand channels. Let $K = K_B + K_O$ represent the total number of channels, where $K_B$ and $K_O$ are the numbers of broadcast and on-demand channels, respectively. The problem of generating broadcast programs for $K_B$ broadcast channels can be viewed as the following discrete minimization problem: Given a set of $n_B$ data items with their access probabilities, partition them into $K_B$ parts so that the average access time of all data items is minimized [12], [22], [23], [35]. Note that once $K_B$ is decided, $K_O$ follows.

Fig. 2 shows the architecture of a data dissemination system. We assume that all data items are of the same size and read-only [18], [19]. After being powered on, without knowing the placement of the requested data item, a mobile device has to send a data item request via on-demand channels. If the requested data item is placed in an on-demand channel, the server replies with the data item directly. If the data item is in a broadcast channel, the server replies with the broadcast information containing the channel frequencies, the data identifiers, the data index information, and other auxiliary information [18]. After receiving the broadcast information, the mobile device stores it in local storage, listens to the broadcast channel, and waits for the requested data item. If a mobile device already has the broadcast information in its local storage, for each user request, the device checks whether the requested data item is placed in broadcast channels. If so, the device tunes to the channel where the required data item is placed and waits for the appearance of the requested data item. Otherwise, the device explicitly sends a data request to the server via an on-demand channel and the server returns the requested data item on the on-demand channel.
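The client-side decision described above can be sketched as follows. This is only an illustration under stated assumptions: the names `BroadcastInfo` and `choose_access` are hypothetical, and the broadcast information is reduced to a mapping from data identifiers to channel identifiers.

```python
from typing import Dict, Optional, Tuple

# Hypothetical local copy of the broadcast information: item id -> channel id.
BroadcastInfo = Dict[int, int]


def choose_access(item: int, info: Optional[BroadcastInfo]) -> Tuple[str, Optional[int]]:
    """Decide how a mobile device retrieves `item`.

    Returns ("request", None) when the device must send an explicit request
    over an on-demand channel, or ("tune", channel) when it can listen to a
    broadcast channel instead.
    """
    if info is None:
        # Just powered on: the placement is unknown, so ask the server.
        return ("request", None)
    channel = info.get(item)
    if channel is None:
        # The item is not in any broadcast program: pull it on demand.
        return ("request", None)
    # The item is broadcast: tune to its channel and wait for it to appear.
    return ("tune", channel)
```

For example, with `info = {1: 0, 2: 1}`, a request for item 2 leads the device to tune to channel 1, while a request for item 7 is sent to the server.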

With the above model, the problem of dynamic data and channel allocation we consider in this paper is formulated as follows.

Problem of dynamic data and channel allocation. Given $K$ channels, $n$ data items, and their access frequencies, we shall perform the following tasks to minimize the average access time of all data items:

1. Determine the numbers of broadcast and on-demand channels (i.e., $K_B$ and $K_O$), where $K = K_B + K_O$.

2. Determine the numbers of data items allocated to broadcast and on-demand channels (i.e., $n_B$ and $n_O$), where $n = n_B + n_O$.

3. Construct a hierarchical broadcast program in the $K_B$ broadcast channels with the $n_B$ most frequently accessed data items.

3 ANALYTICAL MODELS

The analytical models of the broadcast and on-demand channels are given in Section 3.1 and Section 3.2, respectively. In accordance with these analytical models, the overall average access time is formulated in Section 3.3. For better readability, Table 1 lists the symbols used in this paper.

3.1 Broadcast Channels

Since there is more than one data broadcast program for given $K_B$ and $n_B$, we use $W_B(K_B, n_B)$ to represent the minimal average access time of the data items allocated in broadcast channels. Let $C(K_1, n_1)$ be a configuration where $K_B = K_1$ and $n_B = n_1$. The optimal broadcast program can be obtained by executing a broadcast program generation algorithm.

Without considering the use of on-demand channels, the work in [22] explored the problem of generating broadcast programs with the number of broadcast channels (i.e., $K_B$) given. Specifically, the problem of generating broadcast programs for $K_B$ broadcast channels was transformed into a partition problem to partition the data items into $K_B$ partitions. The data items within the same partition are periodically broadcast in the same channel. Two algorithms, OPT and $VF^K$, were devised in [22] to generate hierarchical broadcast programs for multiple broadcast channels. Algorithm OPT is an A*-like algorithm which is able to generate the optimal broadcast program. However, OPT is quite time-consuming. On the other hand, $VF^K$ is a greedy, heuristic algorithm which is able to efficiently obtain broadcast programs which are shown to be very close to the optimal ones. Since the details of OPT and $VF^K$ are beyond the scope of this paper, interested readers are referred to [22]. To facilitate the design of scheme BIS-Incremental, an overview of $VF^K$ is given as follows.

Basically, $VF^K$ is a partition-based algorithm which divides all data items into $K$ partitions, where $K$ is the number of broadcast channels, and allocates the data items to the $K$ broadcast channels according to the resultant partitions. Initially, all data items, $R_1, R_2, \ldots, R_n$, are sorted in descending order of access frequency and placed in one partition. The average access time of a partition is defined as the average access time of the case that the data items of the partition are broadcast periodically in one broadcast channel. Then, the average access time of a broadcast program on multiple channels is the summation of the average access times of all partitions. In each cut, the partition with the largest average access time, say, $\{R_p, R_{p+1}, \ldots, R_q\}$, is selected, and the best cut point of the selected partition, say, $c$, which best reduces the average access time of the broadcast program, is determined. Then, the selected partition is cut into two partitions, $\{R_p, R_{p+1}, \ldots, R_c\}$ and $\{R_{c+1}, R_{c+2}, \ldots, R_q\}$. For $K_B$ broadcast channels, $K_B - 1$ cuts are sequentially performed to partition the data items into $K_B$ partitions. Finally, the resultant broadcast program is obtained by periodically broadcasting all data items within the same partition in one broadcast channel.

Fig. 2. The architecture of a data dissemination system.

TABLE 1 Description of Symbols

Then, we have the incremental property of $VF^K$ as follows. In the interest of space, the proofs of all properties and lemmas are given in the appendix.

Lemma 1 (Incremental Property). The execution of $VF^K$ on configuration $C(K_1, n_1)$ will generate $K_1$ data broadcast programs of $C(K_b, n_1)$, $1 \le K_b \le K_1$.

Lemma 1 means that the execution of $VF^K$ on configuration $C(K_1, n_1)$ will generate $K_1$ broadcast programs, which are the same as the results produced by $VF^K$ for configurations $C(K_B, n_1)$ where $K_B = 1, 2, 3, \ldots, K_1$.
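The cut-by-cut procedure and the incremental property can be illustrated with a small sketch. It is not the $VF^K$ algorithm of [22]: the cost function `partition_cost` (access probability of a partition times half its length, in units of one item broadcast time) is an assumption made here only for illustration, and all names are hypothetical.

```python
from typing import List, Tuple

Partition = Tuple[int, int]  # half-open index range [start, end) into the sorted items


def partition_cost(probs: List[float], part: Partition) -> float:
    """Assumed cost: access probability of the partition times half its cycle length."""
    start, end = part
    return sum(probs[start:end]) * (end - start) / 2.0


def greedy_partitions(probs: List[float], k: int) -> List[List[Partition]]:
    """Return the programs for 1, 2, ..., k channels produced by successive cuts.

    `probs` must be sorted in descending order. Entry j of the result holds
    the partitions after j cuts, i.e., the program for j + 1 channels.
    """
    parts: List[Partition] = [(0, len(probs))]
    programs = [list(parts)]
    for _ in range(k - 1):
        # Select the costliest partition that can still be split.
        splittable = [p for p in parts if p[1] - p[0] > 1]
        if not splittable:
            break
        start, end = max(splittable, key=lambda p: partition_cost(probs, p))
        # Choose the cut point that minimizes the combined cost of the two halves.
        cut = min(
            range(start + 1, end),
            key=lambda c: partition_cost(probs, (start, c)) + partition_cost(probs, (c, end)),
        )
        idx = parts.index((start, end))
        parts[idx:idx + 1] = [(start, cut), (cut, end)]
        programs.append(list(parts))
    return programs
```

Because each cut only refines the previous program, a single run for `k` channels also yields the programs for every smaller number of channels, which is the behavior Lemma 1 states for $VF^K$.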

3.2 On-Demand Channels

Let $W_O(K_O, n_O)$ denote the average access time of the data items placed in on-demand channels. Let $P^n_O(n_O)$ be the probability that the requested data item is in on-demand channels when there are $n_O$ data items placed in on-demand channels. We assume that the arrival process of user requests is a Poisson process with the arrival rate $\lambda$. It follows that the arrival process of requests received by on-demand channels is also a Poisson process with arrival rate $\lambda_O = \lambda \times P^n_O(n_O)$. As in [18], we assume that the queueing buffer is infinite. Thus, the on-demand channels are modeled as an M/M/c queueing system [11] with the arrival rate $\lambda_O$, the service rate $\mu$, and the channel number $c$. The average service time is $\frac{1}{\mu}$. Let the sizes of data items and data requests be $s$ and $r$, respectively. Hence, similarly to [18], the service rate of on-demand channels can be formulated as:

$$\mu = \frac{b}{s + r}.$$

Omitting the equation manipulation which can be found in [11], the average access time of the on-demand channels (i.e., the M/M/c queueing system where $c = K_O$) when $\rho < 1$ is

$$\text{Average access time} = \frac{1}{\mu} + \left( \frac{r^c}{c!\,(c\mu)\,(1 - \rho)^2} \right) p_0, \qquad (1)$$

where

$$\rho = \frac{\lambda_O}{c\mu}, \quad r = \frac{\lambda_O}{\mu}, \quad \text{and} \quad p_0 = \left( \sum_{n=0}^{c-1} \frac{r^n}{n!} + \frac{r^c}{c!\,(1 - \rho)} \right)^{-1}.$$

3.3 Overall Average Access Time

The probability that a user requests a data item placed in the broadcast channels is $P^n_B(n_B) = \sum_{i=1}^{n_B} Pr(R_i)$. On the other hand, the probability that a user requests a data item placed in the on-demand channels is

$$P^n_O(n_O) = \sum_{i=n-n_O+1}^{n} Pr(R_i) = 1 - \sum_{i=1}^{n_B} Pr(R_i) = 1 - P^n_B(n_B).$$

The minimal average access time of a data dissemination system can then be formulated as follows:

$$W_{optimal}(K, n) = \min_{0 \le K_B \le K,\; 0 \le n_B \le n} \{ W(K_B, n_B) \}, \qquad (2)$$

where

$$W(K_B, n_B) = P^n_B(n_B) \times W_B(K_B, n_B) + P^n_O(n_O) \times W_O(K_O, n_O)$$
$$= P^n_B(n_B) \times W_B(K_B, n_B) + (1 - P^n_B(n_B)) \times W_O(K - K_B, n - n_B).$$
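As a numerical illustration of (1) and (2), the following sketch evaluates the M/M/c access time and the overall access time of one configuration. It is a minimal sketch under assumptions: the function names are hypothetical, $W_B$ is supplied by the caller as a callable `w_b` (for example, the cost of a generated broadcast program), and returning infinity for an unstable queue or for a configuration without the channels it needs is a convention chosen here, not part of the model.

```python
from math import factorial, inf
from typing import Callable, Sequence


def mmc_access_time(lam_o: float, mu: float, c: int) -> float:
    """Average access time of an M/M/c queue as in (1); infinite when rho >= 1."""
    if c < 1:
        raise ValueError("at least one on-demand channel is required")
    rho = lam_o / (c * mu)
    if rho >= 1.0:
        return inf  # unstable queue: the access time does not converge
    r = lam_o / mu
    p0 = 1.0 / (
        sum(r ** n / factorial(n) for n in range(c))
        + r ** c / (factorial(c) * (1.0 - rho))
    )
    return 1.0 / mu + (r ** c / (factorial(c) * (c * mu) * (1.0 - rho) ** 2)) * p0


def overall_access_time(
    k_b: int,
    n_b: int,
    k: int,
    probs: Sequence[float],
    lam: float,
    mu: float,
    w_b: Callable[[int, int], float],
) -> float:
    """W(K_B, n_B) as in (2); `probs` holds Pr(R_i) in descending order."""
    n_o = len(probs) - n_b
    k_o = k - k_b
    # Assumed convention: items without a channel of their kind cannot be served.
    if (k_b == 0 and n_b > 0) or (k_o == 0 and n_o > 0):
        return inf
    p_b = sum(probs[:n_b])
    p_o = max(0.0, 1.0 - p_b)
    broadcast = p_b * w_b(k_b, n_b) if n_b > 0 else 0.0
    on_demand = p_o * mmc_access_time(lam * p_o, mu, k_o) if n_o > 0 else 0.0
    return broadcast + on_demand
```

With a single channel, `mmc_access_time` reduces to the familiar M/M/1 result $1/(\mu - \lambda_O)$, which is a convenient sanity check.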

With $K_B$ predetermined, the relationship among $W(K_B, n_B)$, $W_B(K_B, n_B)$, and $W_O(K - K_B, n - n_B)$ is plotted in Fig. 3. Note that $W_O(K - K_B, n - n_B)$ increases exponentially when $n_O$ increases (i.e., when $n_B$ decreases). It is evident that with too few data items in broadcast channels, the volume of requests at the servers may increase beyond their capacity, thereby making the service practically infeasible. On the other hand, the change of the average access time for the broadcast data items is smoother than that for the on-demand data items since the average access time of the broadcast data items only depends on the number of data items allocated to broadcast channels. In this study, the dynamic data and channel allocation algorithm designed will determine the proper values of $K_B$ and $n_B$ with the objective of minimizing the average access time of all data items.

4 SOM: SOLUTION MAPPING ON BROADCAST AND ON-DEMAND CHANNELS

In this section, we design algorithm SOM based on the analytical results in Section 3 to address the problem of dynamic data and channel allocation. In Section 4.1, we transform the problem of dynamic data and channel allocation into a search problem and give an overview of algorithm SOM. In Section 4.2, several properties to prune the infeasible solutions from the search space are given. Then, an efficient search strategy based on binary interpolation search, referred to as BIS, is devised in Section 4.3. Based on algorithm SOM, scheme BIS-Incremental, which is able to obtain nearly optimal solutions by employing BIS and the incremental property of $VF^K$, is then proposed. The complexity analysis of scheme BIS-Incremental is given in Section 4.4. Finally, an illustrative example is given in Section 4.5.

4.1 Problem Transformation and Overview of SOM

Given $K$ and $n$, for each configuration $C(K_B, n_B)$, $W_B(K_B, n_B)$ can be obtained by executing a broadcast program generation algorithm, and $W_O(K - K_B, n - n_B)$ can be calculated by the analytical model of the on-demand channels. As a result, the problem can be transformed into a search problem: to find the configuration with the minimal average access time by searching all configurations $C(K_B, n_B)$, where $0 \le K_B \le K$ and $0 \le n_B \le n$.

In this section, we design algorithm SOM to address the problem of dynamic data and channel allocation. In essence, algorithm SOM is a composite and generic algorithm which is composed of a search strategy and a broadcast program generation algorithm. Algorithm SOM consists of two major phases: the search space pruning phase and the solution searching phase. Fig. 4 shows the architecture of algorithm SOM. In the search space pruning phase, infeasible configurations are removed from the search space. Then, in the solution searching phase, a search strategy is used to guide the search for the optimal solution with the aid of the employed broadcast program generation algorithm and the analytical model of the on-demand channels. Note that algorithm SOM places no restriction on the broadcast program generation algorithm or on the modeling of the on-demand channels. Therefore, any improvement in hierarchical broadcast program generation or on-demand channel modeling can be integrated into algorithm SOM seamlessly.

4.2 Phase One: Search Space Pruning

Initially, the search space should contain all the configurations $C(K_B, n_B)$, where $0 \le K_B \le K$ and $0 \le n_B \le n$, since each could possibly be the optimal one. Hence, the size of the initial search space is $(K + 1) \times (n + 1)$. Since on-demand channels are modeled as an M/M/c queueing system, the average access time of the on-demand channels can be derived by (1). Hence, some infeasible configurations can be pruned by the following properties:

Property 1. All configurations in which $1 \le K_B \le K - 1$ and $n_B < K_B$ are pruned since those configurations will not be optimal.

Analogously, we have the following property:

Property 2. All configurations such that $n_B = n$ and $K_B < K$ are pruned since those configurations will not be optimal.

Omitting straightforward proofs, we also have the following three properties:

Property 3. All configurations such that $K_B = 0$ and $n_B > 0$ are pruned, since if there is no broadcast channel, no data item can be placed in broadcast channels. That is, $n_B$ must be 0 when $K_B = 0$.

Property 4. All configurations such that $K_O = 0$ and $n_O > 0$ are pruned since, if there is no on-demand channel, no data item can be placed in on-demand channels. That is, $n_O$ must be 0 when $K_O = 0$.

Property 5. All configurations in which $\rho = \frac{\lambda_O}{K_O \mu} \ge 1$ are pruned.

When $\rho$ of an M/M/c queueing system is larger than or equal to 1, the system is unstable. That is, the average access time does not converge and will increase drastically as time advances.

Fig. 5 shows an example search space where each square represents one configuration. A grey square indicates that the configuration is pruned, and the numbers inside a grey square identify the properties that prune it. Since the number of configurations pruned by Property 5 depends on other parameters such as the request arrival rate, we do not show the configurations pruned by Property 5 in Fig. 5.

Fig. 4. Architecture of algorithm SOM.

Lemma 2. When $K \ge 1$ and $n \ge K$, Properties 1 through 4 are able to prune $2n + \frac{(K-1)(K+2)}{2}$ configurations.

Lemma 3. 1) The lower bound of the ratio of pruned configurations is $\frac{4n + K^2 + K - 2}{2(n+1)(K+1)}$ when $K \ge 1$ and $n \ge K$. 2) When $n \gg K$, $n \gg 1$, and $K^2 \gg 1$, this ratio will converge to $\frac{K}{2n} + \frac{2}{K}$.
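The count in Lemma 2 can be checked by brute force. The sketch below enumerates the initial search space and applies Properties 1 through 4 exactly as stated; Property 5 is left out because it depends on the request arrival rate. The function names are hypothetical.

```python
from typing import Set, Tuple

Config = Tuple[int, int]  # (K_B, n_B)


def pruned_by_properties_1_to_4(k: int, n: int) -> Set[Config]:
    """Enumerate all (K_B, n_B) and collect those removed by Properties 1-4."""
    pruned: Set[Config] = set()
    for k_b in range(k + 1):
        for n_b in range(n + 1):
            k_o, n_o = k - k_b, n - n_b
            if (
                (1 <= k_b <= k - 1 and n_b < k_b)  # Property 1
                or (n_b == n and k_b < k)          # Property 2
                or (k_b == 0 and n_b > 0)          # Property 3
                or (k_o == 0 and n_o > 0)          # Property 4
            ):
                pruned.add((k_b, n_b))
    return pruned


def lemma2_count(k: int, n: int) -> int:
    """Closed form of Lemma 2; (K - 1)(K + 2) is always even."""
    return 2 * n + (k - 1) * (k + 2) // 2
```

For instance, with $K = 4$ and $n = 15$, both the enumeration and the closed form give 39 pruned configurations out of $(K + 1) \times (n + 1) = 80$, which matches the ratio $\frac{4n + K^2 + K - 2}{2(n+1)(K+1)} = \frac{78}{160}$ of Lemma 3.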

In Phase 1, after building the initial search space, algorithm SOM will prune the infeasible configurations according to Properties 1 through 5. Then, algorithm SOM will search the pruned search space for the optimal configuration in Phase 2.

4.3 Phase Two: Solution Searching

4.3.1 Design of Search Strategy BIS

In Phase 2 of algorithm SOM, a search strategy is employed to search the pruned search space for the optimal configuration. The optimal configuration can always be obtained by exhaustive search. However, exhaustive search evaluates every remaining configuration and therefore does not scale when the pruned search space is large.
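For reference, the exhaustive baseline can be sketched as below. It is a minimal sketch with hypothetical names: `w` stands for any function that returns $W(K_B, n_B)$, and the returned evaluation count makes the cost of the approach visible, since each evaluation may require one run of the broadcast program generation algorithm.

```python
from math import inf
from typing import Callable, Iterable, Optional, Tuple

Config = Tuple[int, int]  # (K_B, n_B)


def exhaustive_search(
    configs: Iterable[Config], w: Callable[[int, int], float]
) -> Tuple[Optional[Config], float, int]:
    """Evaluate every configuration and return (best, best cost, evaluations)."""
    best: Optional[Config] = None
    best_cost = inf
    evaluations = 0
    for k_b, n_b in configs:
        cost = w(k_b, n_b)
        evaluations += 1
        if cost < best_cost:
            best, best_cost = (k_b, n_b), cost
    return best, best_cost, evaluations
```

The number of evaluations equals the size of the pruned search space, which is what motivates a guided strategy.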

To achieve high scalability, we devise an efficient search strategy, referred to as BIS, based on the analytical models. BIS is a greedy algorithm that finds a suboptimal solution in the search space. In essence, BIS is guided to first explore the part of the search space that is more likely to contain the optimal solution. A configuration $C(K_1, n_1)$ is said to be "local optimal when $K_B = K_1$" if $W(K_1, n_1 - 1) \ge W(K_1, n_1)$ and $W(K_1, n_1 + 1) \ge W(K_1, n_1)$.

To facilitate the design of BIS, we employ the function LocalOptimalCheck to determine whether the input configuration is local optimal. LocalOptimalCheck($K_1$, $n_1$) returns LOCALOPTIMAL to notify BIS that the input configuration $C(K_1, n_1)$ is local optimal when $K_B = K_1$. Otherwise, it returns MINUS or PLUS to show that $W(K_1, n_1 - 1) < W(K_1, n_1)$ or $W(K_1, n_1 + 1) < W(K_1, n_1)$, respectively.

The algorithmic form of LocalOptimalCheck is as follows:

Function LocalOptimalCheck($K_B$, $n_B$)
1: Calculate($K_B$, $n_B - 1$)
2: Calculate($K_B$, $n_B + 1$)
3: if ($W(K_B, n_B - 1) < W(K_B, n_B)$) then
4:    return MINUS
5: else if ($W(K_B, n_B + 1) < W(K_B, n_B)$) then
6:    return PLUS
7: else /* $W(K_B, n_B - 1) \ge W(K_B, n_B)$ and $W(K_B, n_B + 1) \ge W(K_B, n_B)$ */
8:    return LOCALOPTIMAL
9: end if

Procedure Calculate($K_B$, $n_B$)
1: Calculate and store $W_B(K_B, n_B)$ and the corresponding broadcast program by the employed broadcast program generation algorithm if they have not been calculated.
2: Calculate and store $W_O(K - K_B, n - n_B)$ by (1) if it has not been calculated.
3: Calculate and store $W(K_B, n_B)$ by (2) if it has not been calculated.
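The listing above can be sketched in Python with the "calculate and store" step expressed as a cache. This is an illustration under assumptions: the class `Evaluator` and the callable `w` (which returns $W(K_B, n_B)$) are hypothetical, and treating an out-of-range $n_B$ as infinitely costly is a boundary convention chosen here that the listing does not specify.

```python
from math import inf
from typing import Callable, Dict, Tuple

MINUS, PLUS, LOCALOPTIMAL = "MINUS", "PLUS", "LOCALOPTIMAL"


class Evaluator:
    """Caches W(K_B, n_B) so that each configuration is computed at most once."""

    def __init__(self, w: Callable[[int, int], float], n: int) -> None:
        self._w = w
        self._n = n
        self._cache: Dict[Tuple[int, int], float] = {}
        self.evaluations = 0  # number of real (uncached) computations

    def calculate(self, k_b: int, n_b: int) -> float:
        if n_b < 0 or n_b > self._n:
            return inf  # assumed convention: outside the search space is never better
        key = (k_b, n_b)
        if key not in self._cache:
            self.evaluations += 1
            self._cache[key] = self._w(k_b, n_b)
        return self._cache[key]


def local_optimal_check(ev: Evaluator, k_b: int, n_b: int) -> str:
    """Compare W at n_B with its two neighbors, as in LocalOptimalCheck."""
    here = ev.calculate(k_b, n_b)
    if ev.calculate(k_b, n_b - 1) < here:
        return MINUS
    if ev.calculate(k_b, n_b + 1) < here:
        return PLUS
    return LOCALOPTIMAL
```

With a toy cost such as `w = lambda k_b, n_b: (n_b - 5) ** 2`, checking $n_B = 8$ returns MINUS and checking $n_B = 5$ returns LOCALOPTIMAL; the `evaluations` counter shows how many computations the cache avoids on repeated checks.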

Note that each invocation of LocalOptimalCheck will cause at least one execution of the broadcast program generation algorithm. That is costly. Therefore, we design function LocalOptimalPrediction to predict the position of the local optimal solution to reduce the total execution time by reducing the number of invocations of LocalOptimalCheck.

To facilitate the design of function LocalOptimalPrediction, we first design a method to calculate approximations of $W_B(K_B, n_B)$ and $W(K_B, n_B)$. Denote the approximations of $W_B(K_B, n_B)$ and $W(K_B, n_B)$ as $W_B'(K_B, n_B)$ and $W'(K_B, n_B)$, respectively. Fig. 6 shows the proposed approximation method, which calculates $W_B'(K_B, n_B)$ and $W'(K_B, n_B)$ by extrapolation. As shown in Fig. 6, the value of $W_B'(K_1, n_2)$, for each $n_2$, can be obtained by the extrapolation of $W_B(K_1, n_1)$ and $W_B(K_1, n_1 \pm 1)$. Then, we have the following equation:

$$\frac{W_B'(K_1, n_2)}{n_2 - n_1} = \frac{W_B(K_1, n_1 + \alpha) - W_B(K_1, n_1)}{\alpha},$$

where $\alpha = 1$ if LocalOptimalCheck($K_1$, $n_1$) returns PLUS, and $\alpha = -1$ if LocalOptimalCheck($K_1$, $n_1$) returns MINUS.

By solving the above equation, we have $W_B'(K_1, n_2)$ as

$$W_B'(K_1, n_2) = \frac{1}{\alpha} (n_2 - n_1) \bigl(W_B(K_1, n_1 + \alpha) - W_B(K_1, n_1)\bigr).$$

Since $W_O(K - K_1, n - n_2)$ can be obtained by (1), with $W_B'(K_1, n_2)$, $W'(K_1, n_2)$ can be obtained by the following equation:

$$W'(K_1, n_2) = P_B^n(n_2) \times W_B'(K_1, n_2) + \bigl(1 - P_B^n(n_2)\bigr) \times W_O(K - K_1, n - n_2). \tag{3}$$

LocalOptimalPrediction is employed to predict the position of the local optimal among the configurations with $K_B = K_1$ and $n_{Lower} \le n_B \le n_{Upper}$. First, LocalOptimalPrediction sets $n_1 = \lceil (n_{Lower} + n_{Upper})/2 \rceil$ and checks whether $W'(K_1, n_1 - 1) \ge W'(K_1, n_1)$ and $W'(K_1, n_1 + 1) \ge W'(K_1, n_1)$, that is, whether $W'(K_1, n_1)$ is local optimal. If so, LocalOptimalPrediction reports $C(K_1, n_1)$ as the possible configuration of the local optimal solution. Otherwise, if $W'(K_1, n_1 - 1) < W'(K_1, n_1)$, LocalOptimalPrediction is invoked recursively by setting $n_{Upper} = n_1 - 1$. Similarly, if $W'(K_1, n_1 + 1) < W'(K_1, n_1)$, LocalOptimalPrediction is invoked recursively by setting $n_{Lower} = n_1 + 1$. The algorithmic form of function LocalOptimalPrediction is as follows.

Function LocalOptimalPrediction($K_1$, $n_{Lower}$, $n_{Upper}$)
1: $n_1 \leftarrow \lceil (n_{Lower} + n_{Upper})/2 \rceil$
2: Calculate $W'(K_1, n_1)$, $W'(K_1, n_1 - 1)$, and $W'(K_1, n_1 + 1)$ by (3)
3: if ($W'(K_1, n_1 + 1) < W'(K_1, n_1)$) then
4:   return LocalOptimalPrediction($K_1$, $n_1 + 1$, $n_{Upper}$)
5: else if ($W'(K_1, n_1 - 1) < W'(K_1, n_1)$) then
6:   return LocalOptimalPrediction($K_1$, $n_{Lower}$, $n_1 - 1$)
7: else /* $W'(K_1, n_1)$ is local optimal */
8:   return $n_1$
9: end if
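The recursion of LocalOptimalPrediction can be sketched in Python as below. The sketch assumes a hypothetical callable `approx_w(k1, n_b)` that returns the approximation $W'(K_1, n_B)$; how that value is computed is deliberately left outside the sketch. One further assumption, not stated in the pseudocode, is that approximations outside the current interval are treated as infinite, which guarantees that the recursion terminates.

```python
import math
from typing import Callable


def local_optimal_prediction(
    approx_w: Callable[[int, int], float],
    k1: int,
    n_lower: int,
    n_upper: int,
) -> int:
    """Predict the n_B of the local optimum for K_B = k1 within [n_lower, n_upper].

    `approx_w` is a hypothetical stand-in for the approximation W'(K_1, n_B).
    """

    def w(n_b: int) -> float:
        # Assumption: positions outside the interval are never preferred.
        if n_b < n_lower or n_b > n_upper:
            return math.inf
        return approx_w(k1, n_b)

    # Ceiling of the midpoint, computed with integer arithmetic.
    n1 = (n_lower + n_upper + 1) // 2
    if w(n1 + 1) < w(n1):
        return local_optimal_prediction(approx_w, k1, n1 + 1, n_upper)
    if w(n1 - 1) < w(n1):
        return local_optimal_prediction(approx_w, k1, n_lower, n1 - 1)
    return n1


def toy_approx(k1: int, n_b: int) -> float:
    """Made-up approximation with its minimum at n_B = 6 (illustration only)."""
    return float(abs(n_b - 6))
```

For example, with the toy approximation and the interval $[4, 9]$, the midpoint 7 is tried first, the search then narrows to $[4, 6]$, and position 6 is finally reported.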

We now design search strategy BIS using LocalOptimalCheck and LocalOptimalPrediction. After the search space is pruned, BIS checks the unpruned configurations iteratively. In each iteration, BIS picks one value (denoted as $K_1$) from the possible values of $K_B$, sets $K_B = K_1$, and considers the configurations with $K_B = K_1$. Suppose that $n_{Max}$ and $n_{Min}$ are the maximum and minimum, respectively, of $n_B$ among all unpruned configurations with $K_B = K_1$. BIS sets $n_1 = \lceil (n_{Max} + n_{Min})/2 \rceil$ and checks whether or not the configuration $C(K_1, n_1)$ is the local optimal with $K_B = K_1$ by LocalOptimalCheck. If LocalOptimalCheck returns LOCALOPTIMAL, BIS memorizes configuration $C(K_1, n_1)$ as a candidate of the resultant configuration. Then, BIS steps into the next iteration by picking another value of $K_1$. Otherwise, when LocalOptimalCheck returns PLUS or MINUS, LocalOptimalPrediction is invoked to predict the position of the local optimal with $K_B = K_1$. Suppose that LocalOptimalPrediction reports that $C(K_1, n_2)$ has a high probability of being the local optimal when $K_B = K_1$. LocalOptimalCheck is invoked again to check whether $W(K_1, n_2)$ is the local optimal. In one iteration, BIS repeats the above procedure until the configuration predicted by LocalOptimalPrediction is indeed the local optimal (i.e., LocalOptimalCheck returns LOCALOPTIMAL). After picking all possible values of $K_B$, BIS stops and returns the best solution among the candidates.

For a better understanding of algorithm SOM and search strategy BIS, we design scheme BIS-Generic by employing BIS as the search strategy of algorithm SOM. Scheme BIS-Generic is not tied to any particular broadcast program generation algorithm and is able to cooperate with any of them seamlessly. The algorithmic form of scheme BIS-Generic is as below, and the procedure of search strategy BIS is described in lines 6 through 20.

Scheme BIS-Generic
Input: The data items sorted by their access frequencies and the number of communication channels.
Output: The numbers of broadcast channels and on-demand channels, the numbers of data items allocated to broadcast and on-demand channels, and the resultant broadcast program.
Note: Scheme BIS-Generic is not limited to any broadcast program generation algorithm.
1: Construct the search space and prune configurations according to Properties 1 through 5. /* Phase 1 */
2: Mark the unavailable configurations (i.e., $K_B > K$ or $K_B < 0$ or $n_B > n$ or $n_B < 0$) as calculated and set $W_B(K_B, n_B)$, $W_O(K - K_B, n - n_B)$, and $W(K_B, n_B)$ to be $\infty$.
3: for all pruned configurations $C(K_B, n_B)$ do
4:   Set $W_B(K_B, n_B)$, $W_O(K - K_B, n - n_B)$, and $W(K_B, n_B)$ to be $\infty$ and mark them as calculated.
5: end for
6: for ($K_B \leftarrow 0$ to $K$) do /* Phase 2 */
7:   Calculate the corresponding values of $n_{Max}$ and $n_{Min}$.
8:   $n_B \leftarrow \lceil (n_{Max} + n_{Min})/2 \rceil$
9:   Calculate($K_B$, $n_B$)
10:   while (LocalOptimalCheck($K_B$, $n_B$) $\ne$ LOCALOPTIMAL) do
11:     if (LocalOptimalCheck($K_B$, $n_B$) = PLUS) then
12:       $n_{Min} \leftarrow n_B + 1$
13:       $n_B \leftarrow$ LocalOptimalPrediction($K_B$, $n_{Min}$, $n_{Max}$)
14:     else /* LocalOptimalCheck($K_B$, $n_B$) = MINUS */
15:       $n_{Max} \leftarrow n_B - 1$
16:       $n_B \leftarrow$ LocalOptimalPrediction($K_B$, $n_{Min}$, $n_{Max}$)
17:     end if
18:   end while
19:   Keep track of the best $W(K_B, n_B)$ found so far (denoted $W_{optimal}(K, n)$), the corresponding configuration $C(K_B, n_B)$, and the broadcast program.
20: end for
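Phase 2 of scheme BIS-Generic (lines 6 through 20) can be sketched in Python as follows. This is an illustrative sketch only: `cost` and `approx` are hypothetical callables standing in for $W(K_B, n_B)$ and its approximation $W'(K_B, n_B)$, `n_ranges` is an assumed mapping from each $K_B$ to its unpruned interval $(n_{Min}, n_{Max})$, and the toy functions at the end are made up so that the search can be traced by hand.

```python
import math
from typing import Callable, Dict, Optional, Tuple


def local_check(w: Callable[[int], float], n_b: int) -> str:
    """LocalOptimalCheck for a fixed K_B, given a memoised cost w(n_B)."""
    w_minus, w_plus, w_here = w(n_b - 1), w(n_b + 1), w(n_b)
    if w_minus < w_here:
        return "MINUS"
    if w_plus < w_here:
        return "PLUS"
    return "LOCALOPTIMAL"


def predict(w_approx: Callable[[int], float], lo: int, hi: int) -> int:
    """LocalOptimalPrediction restricted to the interval [lo, hi]."""

    def w(n_b: int) -> float:
        # Assumption: positions outside [lo, hi] are never preferred.
        return w_approx(n_b) if lo <= n_b <= hi else math.inf

    n1 = (lo + hi + 1) // 2  # ceiling of the midpoint
    if w(n1 + 1) < w(n1):
        return predict(w_approx, n1 + 1, hi)
    if w(n1 - 1) < w(n1):
        return predict(w_approx, lo, n1 - 1)
    return n1


def bis_search(
    cost: Callable[[int, int], float],
    approx: Callable[[int, int], float],
    n_ranges: Dict[int, Tuple[int, int]],
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the best local optimum over all K_B values in n_ranges."""
    best_w, best_cfg = math.inf, None
    for k_b, (n_min, n_max) in n_ranges.items():
        cache: Dict[int, float] = {}

        def w(n_b: int) -> float:
            # Pruned or unavailable positions cost infinity; others are memoised.
            if not n_min <= n_b <= n_max:
                return math.inf
            if n_b not in cache:
                cache[n_b] = cost(k_b, n_b)
            return cache[n_b]

        lo, hi = n_min, n_max
        n_b = (lo + hi + 1) // 2
        while (verdict := local_check(w, n_b)) != "LOCALOPTIMAL":
            if verdict == "PLUS":
                lo = n_b + 1
            else:
                hi = n_b - 1
            n_b = predict(lambda m: approx(k_b, m), lo, hi)
        if w(n_b) < best_w:
            best_w, best_cfg = w(n_b), (k_b, n_b)
    return best_cfg, best_w


def toy_cost(k_b: int, n_b: int) -> float:
    """Made-up cost: for each K_B the minimum lies at n_B = 2 * K_B."""
    return float((n_b - 2 * k_b) ** 2 + 5 * abs(k_b - 2) + 50)


def toy_approx(k_b: int, n_b: int) -> float:
    """Made-up, deliberately biased approximation (minimum at 2 * K_B + 1)."""
    return float(abs(n_b - 2 * k_b - 1))


toy_ranges = {3: (4, 9), 2: (2, 9), 1: (1, 9)}
```

Because the prediction is only a hint, every predicted position is verified by the exact check before it is accepted. With the biased toy approximation, the search for $K_B = 2$ first lands on $n_B = 5$, is corrected by the check, and settles on $n_B = 4$.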

4.3.2 Employment of the Incremental Property of VFK

We now design scheme BIS-Incremental, which is able to obtain the local optimal solutions efficiently, by integrating the incremental property of VFK into scheme BIS-Generic. With the incremental property of VFK, the execution of VFK on configuration $C(K_1, n_1)$ will generate $K_1$ broadcast programs which are the same as the results produced by VFK for configurations $C(K_B, n_1)$ where $K_B = 1, 2, 3, \ldots, K_1$. To take advantage of the incremental property, the search strategy BIS should 1) search $K_B$ in decreasing order and 2) store the results of VFK obtained by the incremental property for future use. Note that the use of the incremental property of VFK does not affect the quality of the obtained solutions, and that VFK is required to be the broadcast program generation algorithm of scheme BIS-Incremental. The algorithmic form of scheme BIS-Incremental is given below. Since scheme BIS-Incremental is similar to scheme BIS-Generic, only the modifications are shown.

Scheme BIS-Incremental
Note: VFK is required to be the broadcast program generation algorithm.
6': for ($K_B \leftarrow K$ to 0) do

Procedure Calculate($K_B$, $n_B$)
1': Calculate $W_B(K_B, n_B)$ and the corresponding broadcast program by VFK if they have not been calculated. When VFK is executed, $W_B(k, n_B)$ for all $1 \le k \le K_B$ and the corresponding broadcast programs are also stored and marked as calculated.
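The bookkeeping implied by the modified step 1' can be sketched in Python as follows. The sketch does not implement VFK itself: `run_generator` is a hypothetical callable that, for a request $(K_B, n_B)$, returns the list $[W_B(1, n_B), \ldots, W_B(K_B, n_B)]$ in a single run, which is how the incremental property is assumed to surface here. The class name `IncrementalCache` and the toy generator are likewise made up for illustration.

```python
from typing import Callable, Dict, List, Tuple


class IncrementalCache:
    """Table of W_B values filled K_B entries at a time."""

    def __init__(self, run_generator: Callable[[int, int], List[float]]):
        # Hypothetical generator: (k_b, n_b) -> [W_B(1, n_b), ..., W_B(k_b, n_b)]
        self.run_generator = run_generator
        self.table: Dict[Tuple[int, int], float] = {}
        self.runs = 0  # number of generator executions so far

    def w_b(self, k_b: int, n_b: int) -> float:
        if (k_b, n_b) not in self.table:
            self.runs += 1
            results = self.run_generator(k_b, n_b)
            # Store the by-products for every smaller channel count as well.
            for k, value in enumerate(results, start=1):
                self.table.setdefault((k, n_b), value)
        return self.table[(k_b, n_b)]


def toy_generator(k_b: int, n_b: int) -> List[float]:
    """Made-up stand-in for VFK: returns n_b / k for k = 1..k_b."""
    return [n_b / k for k in range(1, k_b + 1)]
```

Searching $K_B$ in decreasing order is what makes the by-products useful: after one run for $(3, 7)$, the entries for $(2, 7)$ and $(1, 7)$ are already in the table, so later requests for them trigger no further run.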

4.4 Complexity Analysis

Since the most time-consuming portion of a BIS-based algorithm is the execution of the employed broadcast program generation algorithm, we derive the time complexity of a BIS-based algorithm by focusing on the executions of the employed broadcast program generation algorithm. The time complexity of binary interpolation search in the average case is $O(K \log n)$ and, therefore, the time complexity of schemes using BIS is $O(K \log n)$ × the time complexity of the broadcast program generation algorithm. By employing the incremental property, the amortized cost to construct a data broadcast program by VFK is $\frac{1}{K}$ × (time complexity of VFK). Therefore, the whole time complexity of scheme BIS-Incremental is

$$O(K \log n) \times \frac{1}{K} \times (\text{time complexity of VFK}) = O(\log n) \times (\text{time complexity of VFK}).$$

As shown in [22], with $n$ sorted data items and $K$ broadcast channels given, the time complexity of VFK is $K \times (O(K \log K) + O(n))$. The time complexity of scheme BIS-Incremental is, hence,

$$O(\log n) \times K \times \bigl(O(K \log K) + O(n)\bigr).$$

If $n \gg K$, the time complexity of scheme BIS-Incremental is $O(Kn \log n)$. In addition, scheme BIS-Incremental requires a table to store information on each configuration. For $K$ channels and $n$ data items, the size of this table is $(K + 1) \times (n + 1)$ and, hence, the space complexity of scheme BIS-Incremental is $O(K \times n)$.
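The simplification to $O(Kn \log n)$ can be spelled out as a short worked step. The condition $K \log K = O(n)$ written below is our reading of the assumption that $n$ is much larger than $K$; it is not a statement taken from the paper.

```latex
\begin{align*}
O(\log n) \times K \times \bigl(O(K \log K) + O(n)\bigr)
  &= O\bigl(K^{2} \log K \,\log n + K n \log n\bigr) \\
  &= O(K n \log n) \qquad \text{when } K \log K = O(n).
\end{align*}
```

Under that condition, the $K^{2} \log K \log n$ term is dominated by the $K n \log n$ term, which gives the stated bound.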

4.5 An Illustrative Example

In this section, we use a running example to illustrate the steps of scheme BIS-Incremental. Table 2 shows the parameters used in this example. The search steps are shown in Fig. 7, where the number inside a configuration indicates the order in which the configuration is checked by LocalOptimalCheck. The local optimal solution for each value of $K_B$ is marked by a thick border.

In Phase 1, Table 3 is constructed and $P_B^n(n_B)$ for all $0 \le n_B \le 10$ are calculated. Then, configurations are pruned according to Properties 1 through 5. For each pruned configuration $C(K_B, n_B)$, $W_B(K_B, n_B)$, $W_O(K - K_B, n - n_B)$, and $W(K_B, n_B)$ are initialized to be $\infty$. Consider the configuration $C(3, 3)$. The number of on-demand channels is $K_O = K - K_B = 4 - 3 = 1$. The data request arrival rate of the on-demand channels is

$$\lambda_O = \lambda \times P_O^n(3) = 20 \times \bigl(1 - P_B^n(3)\bigr) = 20 \times (1 - 0.486) \approx 10.284.$$

Because $\rho = \frac{\lambda_O}{K_O \times \mu} = \frac{10.28}{1 \times 10} = 1.028 > 1$, according to Property 5, this configuration is pruned.
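The Property 5 test applied to $C(3, 3)$ can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name `on_demand_utilisation` is made up, the total arrival rate 20 and the probability 0.486 are the values quoted in the example, and the service rate of 10 per channel is inferred from the example's arithmetic rather than read from Table 2.

```python
import math


def on_demand_utilisation(
    total_rate: float,
    p_broadcast: float,
    k_on_demand: int,
    service_rate: float,
) -> float:
    """Utilisation of the on-demand channels for one configuration.

    total_rate   -- total request arrival rate
    p_broadcast  -- probability that a request is for a broadcast item
    k_on_demand  -- number of on-demand channels (K - K_B)
    service_rate -- service rate of one on-demand channel (assumed value)
    """
    lambda_o = total_rate * (1.0 - p_broadcast)
    if k_on_demand == 0:
        # No on-demand channel: any on-demand load cannot be served.
        return math.inf if lambda_o > 0 else 0.0
    return lambda_o / (k_on_demand * service_rate)


# Configuration C(3, 3) of the running example: K = 4, so K_O = 1.
rho = on_demand_utilisation(20.0, 0.486, 1, 10.0)
is_pruned = rho >= 1.0
```

Here `rho` evaluates to approximately 1.028, so the configuration fails the stability test and is pruned, in agreement with the text.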

In Phase 2, scheme BIS-Incremental first examines configurations with $K_B = 4$. In this example, the only available configuration with $K_B = 4$ is $C(4, 10)$. We have $W_O(0, 0) = 0$ since it contains no on-demand channel. By executing VFK, we have $W_B(4, 10) = 118$ ms. By (2), we have

$$W(4, 10) = P_B^n(10) \times W_B(4, 10) + \bigl(1 - P_B^n(10)\bigr) \times W_O(0, 0) = 118 \text{ ms}.$$

Configuration $C(4, 10)$ is then checked by LocalOptimalCheck. Since configuration $C(4, 9)$ is pruned and configuration $C(4, 11)$ is unavailable, LocalOptimalCheck returns LOCALOPTIMAL, which means that configuration $C(4, 10)$ is local optimal when $K_B = 4$.

Next, configurations with $K_B = 3$ are checked. Since $K_B = 3$, the number of data items on broadcast channels is between four and nine. Scheme BIS-Incremental first checks the configuration with $K_B = 3$ and $n_B = \lceil (4 + 9)/2 \rceil = 7$. $W_O(1, 3)$, $W_B(3, 7)$, and $W(3, 7)$ are then calculated. Due to the incremental property of VFK, $W_B(2, 7)$ and $W_B(1, 7)$ are also obtained when VFK is executed on $K_B = 3$ and $n_B = 7$, and are stored in Table 3b for future use. Note that these two values are not available if other broadcast generation algorithms (e.g., OPT) are employed. Then, $W(3, 6)$ and $W(3, 8)$ are also calculated in order to check whether $W(3, 7)$ is the local optimal when $K_B = 3$. LocalOptimalCheck(3, 7) returns MINUS since $W(3, 6) < W(3, 7)$. Then, LocalOptimalPrediction is invoked and reports that $C(3, 6)$ has a high probability of being the local optimal solution. To check whether $W(3, 6)$ is indeed the local optimal, VFK is executed again to obtain $W_B(3, 5)$. Finally, LocalOptimalCheck(3, 6) returns LOCALOPTIMAL because $W(3, 6)$ is less than both $W(3, 5)$ and $W(3, 7)$.

TABLE 2. An Example Profile

The same procedure is executed on configurations with $K_B = 2$, 1, and 0, and the results are shown in Table 3. By tracking the optimal configurations for the different values of $K_B$, we can obtain the suboptimal configuration $C(2, 4)$. The configuration $C(2, 4)$ means that two channels are operated in broadcast mode and the top four hot data items (i.e., $R_1$, $R_2$, $R_3$, and $R_4$) are allocated to these two broadcast channels. The remaining channels are operated in on-demand mode and the remaining data items are allocated to the on-demand channels. The broadcast program of these two channels and four data items is obtained by executing VFK. Finally, the corresponding broadcast program is shown in Fig. 8.

5 PERFORMANCE EVALUATION

In order to evaluate the performance improvement achieved by algorithm SOM, we have designed a simulation model of a data dissemination system which is described in Section 5.1. Four schemes are developed based on algorithm SOM to address the problem of dynamic data and channel allocation. Then, four experiments are conducted in the following sections to examine the impact of different system parameters on the performance of all schemes.

5.1 Simulation Model

Similarly to the work in [18], [19], we set the system parameters as shown in Table 4. Also, the access frequency of the $i$th data item is assumed to be

$$Pr(R_i) = \frac{(1/i)^{\theta}}{\sum_{j=1}^{n} (1/j)^{\theta}},$$

where $\theta$ is the parameter of the Zipf distribution. Note that $\theta = 0$ indicates that the access frequencies are uniformly distributed (i.e., $Pr(R_i) = Pr(R_j)$ for all $i$, $j$). In addition, the access frequencies become increasingly skewed as $\theta$ increases. As pointed out in [8], the value of $\theta$ appears to be about 0.8 for traces from a homogeneous environment, and the value of $\theta$ appears to be around 0.7 for traces from a diversified user population. In addition, as observed in [21], the value of $\theta$ appears to be larger than 1 in busy Web sites. Hence, we set the default value of $\theta$ to be 0.9 and conduct an experiment with the value of $\theta$ varied to measure the effect of $\theta$. The simulator is coded in C++.

TABLE 3. $W_B(K_B, n_B)$, $W_O(K - K_B, n - n_B)$, and $W(K_B, n_B)$ for the Example (Time Unit: ms). (a) Table $P_B^n(n_B)$. (b) Table $W_B(K_B, n_B)$ / $W_O(K - K_B, n - n_B)$. (c) Table $W(K_B, n_B)$.

Fig. 8. The resultant solution of the running example.
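The Zipf-like access frequencies used in the simulation model can be generated with a short Python function. This is a minimal sketch for illustration (the simulator itself is coded in C++); the function name `zipf_probabilities` is made up here.

```python
from typing import List


def zipf_probabilities(n: int, theta: float) -> List[float]:
    """Access frequencies Pr(R_i) for i = 1..n under a Zipf-like distribution.

    theta = 0 yields a uniform distribution; larger theta yields more skew.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Unnormalised weight of the i-th item is (1 / i) ** theta.
    weights = [(1.0 / i) ** theta for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


uniform = zipf_probabilities(5, 0.0)
skewed = zipf_probabilities(5, 0.9)
```

With $\theta = 0$ every item receives the same frequency, while with the default $\theta = 0.9$ the frequencies decrease strictly with the item rank and still sum to one.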

We have implemented four schemes based on algorithm SOM. A scheme denoted by A-B means that A is the corresponding search strategy and B is the corresponding broadcast program generation algorithm. In addition to scheme BIS-Incremental, we implement scheme BIS-VFK to evaluate the effect of employing the incremental property of VFK by comparing it with scheme BIS-Incremental. Scheme BIS-VFK is an instance of scheme BIS-Generic that employs VFK as the broadcast program generation algorithm. To measure the effect of the search strategy, BIS, we also implement scheme ES-VFK, which adopts exhaustive search (abbreviated as ES) and VFK, respectively, as the search strategy and the broadcast program generation algorithm. For each configuration, since the optimal broadcast programs can be obtained by OPT, the optimal data and channel allocation can be obtained by collecting the optimal broadcast programs of all possible configurations in the search space and taking the optimal one among them. As a result, we implement scheme ES-OPT, which employs ES and OPT as the search strategy and the broadcast program generation algorithm, respectively, to obtain the optimal configurations and the corresponding broadcast programs for comparison purposes. Note that all of them are instances of the proposed algorithm SOM.

In addition to the above SOM-based schemes, scheme FLAT [18], which employs flat broadcast programs (i.e., allocates data items within broadcast channels with equal appearance frequencies), is also implemented in order to evaluate the benefit of using hierarchical broadcast programs. Note that, not being an instance of algorithm SOM, scheme FLAT does not employ any search strategy or broadcast program generation algorithm. A summary of these schemes is given in Table 5.

The following subsections show the average access times and execution times of all schemes in Experiments 1, 2, 3, and 4, respectively, and the parameters of each experiment are listed in Table 6. The ratio of pruned configurations of each scheme is also given to measure the effect of these parameters on configuration pruning. Due to the high complexity of OPT, scheme ES-OPT is much slower than the others. Hence, the execution time of scheme ES-OPT is not shown in the following figures. In addition, since scheme ES-VFK is slower than BIS-based schemes, the execution times of BIS-based schemes are shown in a separate subfigure to evaluate the effect of employing the incremental property of VFK.

5.2 Experiment 1: The Effect of Skewness of Access Frequencies

Fig. 9 shows the average access times, ratios of pruned configurations, and execution times of all schemes with the value of $\theta$ varied. The value of $\theta$ is set from 0 to 1.2.

As shown in Fig. 9a, the average access times of all schemes decrease as the value of $\theta$ increases. This is because, when the access frequencies are highly skewed, broadcasting hot data items can effectively reduce the load of the on-demand channels and, hence, reduce the average access times. We also observe that schemes employing hierarchical broadcast programs (i.e., OPT and VFK-based schemes) outperform scheme FLAT, especially when the access frequencies are highly skewed. In this example, the performance gain of VFK-based schemes over scheme FLAT increases from 0.5 percent to 32.14 percent as the value of $\theta$ increases from 0 to 1.2. This agrees with the fact that VFK and OPT outperform FLAT, especially when the access frequencies are highly skewed [22]. In addition, the results of VFK-based schemes are very close to those of scheme ES-OPT (i.e., optimal solutions).

Fig. 9b shows the ratio of the pruned configurations with the value of $\theta$ varied. Since scheme FLAT does not prune configurations, the pruning effect of scheme FLAT is omitted in this and the following experiments. We observe that the ratio of the pruned configurations decreases from 44.48 percent to 22.32 percent as the value of $\theta$ increases from 0 to 1.2. Since the number of all configurations and the number of configurations pruned by Properties 1 through 4 are not affected by the value of $\theta$, this situation results from the pruning effect of Property 5. Note that Property 5 prunes configurations in which $\rho \ge 1$. Considering an arbitrary configuration, the condition $\rho \ge 1$ holds when the request rate of the on-demand channels reaches a threshold (i.e., $\lambda_O \ge K_O \times \mu$). When $\theta$ increases, the access frequencies of cold items decrease. Therefore, on-demand channels can contain more data items without making $\lambda_O$ exceed the threshold. Since the number of configurations pruned by Property 5 decreases as $\theta$ increases, the ratio of pruned configurations decreases as the value of $\theta$ increases. In addition, since pruning is independent of the employed broadcast program generation algorithms, with the same parameters, the numbers of pruned configurations of all SOM-based schemes are the same.

TABLE 6. Parameters Used in the Experiments

As observed in Figs. 9c and 9d, the execution time of scheme ES-VFK increases as the value of $\theta$ increases. This is because ES examines all unpruned configurations and the effect of configuration pruning decreases as the value of $\theta$ increases. On the other hand, since search strategy BIS only checks a subset of unpruned configurations, with the same broadcast program generation algorithm, the execution time of scheme ES-OPT is more sensitive to the change of $\theta$ than that of BIS-based schemes. In this experiment, the execution time reduction of scheme BIS-VFK over scheme ES-VFK is around 98 percent, showing the high efficiency of BIS. In addition, the execution time reduction of scheme BIS-Incremental over scheme BIS-VFK increases from 5.26 percent to 20.67 percent as the value of $\theta$ increases from 0 to 1.2. This result shows the advantage of employing the incremental property of VFK.

5.3 Experiment 2: The Effect of the Number of Data Items

In Experiment 2, we investigate the performance of all schemes with the number of data items (i.e., $n$) varied. The number of data items is set from 2,000 to 10,000. As observed in Fig. 10a, the average access times of all schemes increase as the number of data items increases. The performance gain of scheme ES-OPT (i.e., optimal solution) over scheme FLAT increases from 35.94 percent to 39.08 percent as the number of data items increases, showing the advantage of employing hierarchical broadcast program generation algorithms. In addition, the results of VFK-based schemes are close to those of scheme ES-OPT, and the performance gain of VFK-based schemes over scheme FLAT ranges from 30.59 percent to 33.24 percent.

Fig. 10b shows that the ratio of pruned configurations slightly decreases from 24.13 percent to 23.31 percent as the number of data items increases. This result agrees with the analysis in Lemma 3 that the ratio of pruned configurations is only slightly affected by the value of $n$ since $K \ll n$ in this experiment. In addition, as shown in Figs. 10c and 10d, the execution time of each scheme increases as the value of $n$ increases. Although the ratio of pruned configurations only decreases slightly as the number of data items increases, the increment of the number of unpruned configurations is still in proportion to the increment of the number of data items since the number of all configurations (i.e., $(K + 1) \times (n + 1)$) increases as the value of $n$ increases. Hence, in execution time, scheme ES-OPT is more sensitive to the number of pruned configurations than BIS-based schemes since scheme ES-OPT scans all unpruned configurations. As a result, BIS-based schemes are more scalable than scheme ES-OPT. As shown in Figs. 10c and 10d, as the value of $n$ increases, the execution time reduction of scheme BIS-VFK over scheme ES-VFK increases from 93.69 percent to 99.06 percent. In addition, the execution time reduction of scheme BIS-Incremental over scheme BIS-VFK ranges from 18.57 percent to 22.78 percent. Since the employment of the incremental property of VFK does not affect the quality of the results, scheme BIS-Incremental is more scalable than scheme BIS-VFK with respect to the number of data items.

5.4 Experiment 3: The Effect of the Number of Channels

This experiment evaluates the effect of the number of channels (i.e., $K$), which is set from 3 to 15. As shown in Fig. 11a, the average access times of all schemes decrease as the number of channels increases. This result agrees with the intuition that an increase of bandwidth will decrease the average access time. However, the improvement in the average access time decreases as the number of channels increases. As a result, the determination of the number of channels should consider the balance between performance improvement and the number of channels used. We also observe that the performance gain of scheme ES-OPT over scheme FLAT ranges from 25.02 percent to 38.26 percent as the number of channels increases. In addition, the performance gain of VFK-based schemes over scheme FLAT ranges from 19.03 percent to 36.87 percent. These results show that the schemes employing hierarchical broadcast programs are able to utilize network bandwidth better than those employing flat broadcast programs.

Fig. 9. The results with the value of $\theta$ varied. (a) Average access time. (b) Ratio of the pruned configurations. (c) Execution time I. (d) Execution time II.

As shown in Fig. 11b, the ratio of pruned configurations decreases from 58.63 percent to 14.83 percent as the number of channels increases. According to the analysis in Lemma 3, the ratio of pruned configurations is dominated by $K$ rather than $n$ since $K \ll n$ in this experiment. As a result, the influence of the change of $K$ is more significant than that of the change of $n$. Figs. 11c and 11d show that the execution times of all schemes increase as the number of channels increases. This can be explained as follows: The execution times of all schemes are proportional to the number of unpruned configurations, which increases as the value of $K$ increases since the number of all configurations is $(K + 1) \times (n + 1)$ and the ratio of pruned configurations decreases as the value of $K$ increases. Since the execution time of scheme ES-OPT is more sensitive to the number of unpruned configurations than that of BIS-based schemes, BIS-based schemes are more scalable when the value of $K$ becomes large. As shown in Figs. 11c and 11d, the execution time reduction of scheme BIS-VFK over scheme ES-VFK ranges from 96.04 percent to 98.01 percent as the value of $K$ increases. In addition, the execution time reduction of scheme BIS-Incremental over scheme BIS-VFK increases from 11.36 percent to 27.87 percent. This result shows the high scalability of scheme BIS-Incremental.

Fig. 10. The results with the number of data items ($n$) varied. (a) Average access time. (b) Ratio of pruned configurations. (c) Execution time I. (d) Execution time II.

Fig. 11. The results with the number of channels ($K$) varied. (a) Average access time. (b) Ratio of pruned configurations. (c) Execution time I. (d) Execution time II.

5.5 Experiment 4: The Effect of the Number of Users

This experiment measures the effect of the number of users. Fig. 12a shows the average access times of all schemes with the number of users varied. The number of users is set from 200 to 1,000. As observed in Fig. 12a, the average access time reduction of scheme ES-OPT over scheme FLAT increases from 11.8 percent to 40.66 percent as the number of users increases. This is because the data request rate is proportional to the number of users. In addition, the increment of average access time decreases as the number of users increases. This can be explained as follows: When the number of users is small, most channels are allocated in on-demand mode and most data items are allocated to the on-demand channels. When the number of users becomes large, to reduce the increment of average access time, some channels are reallocated to broadcast mode and some data items are reallocated to the broadcast channels. Hence, the system becomes less sensitive to the number of users when the number of users increases. This result shows the advantage of the combined use of broadcast and on-demand channels. In this experiment, the performance gain of scheme ES-OPT over scheme FLAT increases from 1.10 percent to 46.20 percent as the number of users increases and the performance gain of VFK-based schemes over scheme FLAT ranges from 0.81 percent to 40.66 percent.

Fig. 12b shows that the ratio of pruned configurations increases from 20.36 percent to 30.71 percent as the value of $N$ increases. Similar to the situation when $\theta$ varies, the increase of the value of $N$ causes more configurations to be pruned by Property 5. Hence, the ratio of pruned configurations increases as the value of $N$ increases. This shows that the pruning properties remain effective when the number of users is high.

Figs. 12c and 12d show the execution time of each scheme with the value of $N$ varied. As a result of the effect shown in Fig. 12b, the execution times of all schemes decrease as the value of $N$ increases. In addition, since BIS-based schemes are less sensitive to the number of unpruned configurations than scheme ES-OPT, the change of the execution times of BIS-based schemes is smoother than that of scheme ES-OPT. In this experiment, the execution time reduction of scheme BIS-VFK over scheme ES-VFK is around 97 percent, and the execution time reduction of scheme BIS-Incremental over scheme BIS-VFK decreases from 22.94 percent to 13.22 percent.

Fig. 12. The results with the number of users ($N$) varied. (a) Average access time. (b) Ratio of pruned configurations. (c) Execution time I. (d) Execution time II.

5.6 Summary

In this section, we evaluate the performance of several instances of algorithm SOM. From the above experiments, we observe that the average access time of all schemes employing hierarchical broadcast programs (i.e., OPT and VFK-based schemes) is better than that of scheme FLAT, which employs flat broadcast programs. This result shows the advantage of using hierarchical broadcast program generation algorithms. The solutions obtained by VFK-based schemes are close to those of scheme ES-OPT due to the fact that the results of VFK are close to those of OPT.

We also observe that the execution time of BIS-based schemes is much shorter than that of scheme ES-OPT when the same broadcast program generation algorithm is employed. This is because BIS searches only the configurations with a high probability of being the optimal one instead of all configurations in the search space. Due to the combination of the merits of BIS and VFK, scheme BIS-VFK is able to obtain nearly optimal solutions efficiently. In addition, by employing the incremental property of VFK, scheme BIS-Incremental is able to obtain the same solutions as scheme BIS-VFK and is more efficient and scalable than scheme BIS-VFK.

6 CONCLUSIONS

In this paper, we explored the problem of dynamic data and channel allocation with the number of communication channels and the number of data items given. We first derived the analytical models of the average access time on broadcast and on-demand channels. Then, we transformed this problem into a guided search problem. In light of the theoretical properties derived, we devised algorithm SOM to obtain the optimal allocation of data and channels. According to the analytical models, we devised scheme BIS-Incremental based on SOM, which is able to obtain solutions of high quality efficiently by employing binary interpolation search and the incremental property of VFK. A sensitivity study on several parameters, including the number of data items and the number of communication channels, was conducted. Our simulation results showed that the solutions of scheme BIS-Incremental are of very high quality and are in fact very close to the optimal ones. In addition, the experimental results also showed that scheme BIS-Incremental scales very well, which is particularly important for its practical use in a mobile computing environment.

APPENDIX

ALL PROOFS

Proof of Property 1. Consider an arbitrary configuration $C$ in which $1 \le K_B \le K - 1$ and $n_B < K_B$. Since $n_B < K_B$, at least one broadcast channel does not contain any data item. Then, we can get another configuration $C'$ by reassigning the broadcast channel(s) without any data item as on-demand channel(s). $P_B^n$ is equal to $P_B'^{\,n}$ since no data item is reassigned. Since these reassigned broadcast channels contain no data item, the average access times in broadcast channels of $C$ and $C'$ are equal (i.e., $W_B = W_B'$). Since $C'$ has more on-demand channels than $C$, $W_O'$ is smaller than $W_O$. By (2), we have $W' < W$ and, as a result, $C$ is not the optimal since $C'$ is better than $C$. □

Proof of Lemma 1. Consider the procedure of VFK

mentioned above. The initial partitions of all configura-tions with the same parameters except KB are the same

(i.e., placing all data items in one partition). Then, the selected partitions to be cut and the best cut points for these configurations are the same. Hence, the results after the first cuts of all configurations with the same parameters except KB equal the result of VFK when

KB¼ 2. With the same reasoning, the results of the

nthcuts of all configurations with the same parameters except KBis equal to the result of VFKwhen KB¼ n þ 1.

This property follows. tu Proof of Lemma 2. When K ¼ 1, the size of the search

space is

ðK þ 1Þ ðn þ 1Þ ¼ ð1 þ 1Þ ðn þ 1Þ ¼ 2n þ 2: The feasible configurations are Cð1; nÞ and Cð0; 0Þ. Then, the number of configurations pruned by Properties 1 through 4 is 2n þ 2 2 ¼ 2n.

Considering the cases that K > 1, when KB¼ 0,

Properties 2 and 3 are able to prune n configurations. For each KB, 1 KB K 1, Properties 1 and 2 are able

to prune KB and one configuration, respectively. When

KB¼ K, Properties 1 and 4 are able to prune n

configurations. Then, the total number of configurations pruned by Properties 1 to 4 is

Number of configurations pruned by Properties 1-4: ¼ n þX K1 i¼1 ði þ 1Þ þ n ¼ 2n þðK 1ÞðK þ 2Þ 2 :

Consequently, we can conclude that the total num-ber of configurations pruned by Properties 1-4 is 2nþðK1ÞðKþ2Þ₂ . tu Proof of Lemma 3.Initially, the total number of configura-tions in the search space is ðn þ 1ÞðK þ 1Þ. When K 1 and n K, according to Lemma 2, the number of configurations pruned by Properties 1 through 5 is at least 2n þðK1ÞðKþ2Þ₂ . Then, the lower bound of the ratio of the pruned configurations can be formulated as follows:

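$$\text{Lower bound of the ratio of pruned configurations} = \frac{2n + \frac{(K - 1)(K + 2)}{2}}{(n + 1)(K + 1)} = \frac{4n + K^2 + K - 2}{2(n + 1)(K + 1)}.$$

When $n \gg 1$ and $K \gg 1$, the constant in the numerator is negligible relative to $4n + K^2$, so replacing $-2$ by $+4$ gives

$$\frac{4n + K^2 + K - 2}{2(n + 1)(K + 1)} \approx \frac{4n + K^2 + K + 4}{2(n + 1)(K + 1)} = \frac{1}{2}\left(\frac{K}{n + 1} + \frac{4}{K + 1}\right). \tag{16}$$

Note that $\frac{1}{K + 1} \approx \frac{1}{K}$ when $K \gg 1$. The approximated lower bound of the ratio of the pruned configurations when $n \gg 1$ and $K \gg 1$ is

$$\frac{1}{2}\left(\frac{K}{n + 1} + \frac{4}{K + 1}\right) \approx \frac{1}{2}\left(\frac{K}{n} + \frac{4}{K}\right) = \frac{K}{2n} + \frac{2}{K} \quad (\text{since } n \gg 1 \text{ and } K \gg 1),$$

proving Lemma 3. □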
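To get a feel for how tight the approximations in this proof are, the following minimal Python sketch evaluates the exact lower bound from Lemma 2, the intermediate form in (16), and the final approximation $\frac{K}{2n} + \frac{2}{K}$ for a few sample values. The function names and the sample values of $n$ and $K$ are illustrative assumptions and do not come from the paper's experiments.

```python
def exact_lower_bound(n: int, k: int) -> float:
    """(2n + (K - 1)(K + 2) / 2) / ((n + 1)(K + 1)), from Lemma 2."""
    return (2 * n + (k - 1) * (k + 2) / 2) / ((n + 1) * (k + 1))


def eq16_bound(n: int, k: int) -> float:
    """Intermediate form (16): (1/2) * (K / (n + 1) + 4 / (K + 1))."""
    return 0.5 * (k / (n + 1) + 4 / (k + 1))


def approx_bound(n: int, k: int) -> float:
    """Final approximation K / (2n) + 2 / K, intended for n >> 1 and K >> 1."""
    return k / (2 * n) + 2 / k


if __name__ == "__main__":
    # Hypothetical sample points; larger n and K should bring the three closer.
    for n, k in [(100, 5), (1000, 10), (10000, 100)]:
        print(n, k, exact_lower_bound(n, k), eq16_bound(n, k), approx_bound(n, k))
```

The gap between (16) and the exact bound is $\frac{3}{(n + 1)(K + 1)}$, which follows from the numerators differing by the constant 6, so it vanishes as $n$ and $K$ grow.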

ACKNOWLEDGMENTS

This work was done when the authors were with the National Taiwan University and was supported in part by the National Science Council of Taiwan, Republic of China, under Contract NSC93-2752-E-002-006-PAE.

REFERENCES

[1] S. Acharya, R. Alonso, M. Franklin, and S. Zdonik, “Broadcast Disks: Data Management for Asymmetric Communication Environments,” Proc. ACM SIGMOD Conf., pp. 198-210, Mar. 1995.

[2] S. Acharya, M. Franklin, and S. Zdonik, “Balancing Push and Pull for Data Broadcast,” Proc. ACM SIGMOD Conf., pp. 183-194, May 1997.

[3] S. Acharya and S. Muthukrishnan, “Scheduling On-Demand Broadcasts: New Metrics and Algorithms,” Proc. Fourth ACM/IEEE Int’l Conf. Mobile Computing and Networking, pp. 43-94, Oct. 1998.

[4] M. Agrawal, A. Manjhi, N. Bansal, and S. Seshan, “Improving Web Performance in Broadcast-Unicast Networks,” Proc. IEEE INFOCOM Conf., Mar.-Apr. 2003.

[5] D. Aksoy and M.J. Franklin, “Scheduling for Large-Scale On-Demand Data Broadcasting,” Proc. IEEE INFOCOM Conf., pp. 651-659, Mar. 1998.

[6] D. Aksoy, M.J. Franklin, and S. Zdonik, “Data Staging for On-Demand Broadcast,” Proc. 27th Int’l Conf. Very Large Data Bases, pp. 571-580, Sept. 2001.

[7] A. Bar-Noy, B. Patt-Shamir, and I. Ziper, “Broadcast Disks with Polynomial Cost Functions,” ACM/Kluwer Wireless Networks, vol. 10, no. 2, Mar. 2004.

[8] L. Breslau, P. Cao, G. Phillips, and S. Shenker, “Web Caching and Zipf-Like Distributions: Evidence and Implications,” Proc. IEEE INFOCOM Conf., Mar. 1999.

[9] M.-S. Chen, K.-L. Wu, and P.S. Yu, “Indexed Sequential Data Broadcasting in a Wireless Computing Environment,” Proc. 17th IEEE Int’l Conf. Distributed Computing Systems, pp. 124-131, May 1997.

[10] A. Datta, D.E. VanderMeer, A. Celik, and V. Kumar, “Broadcast Protocols to Support Efficient Retrieval from Databases by Mobile Users,” ACM Trans. Database Systems, vol. 24, no. 1, pp. 1-79, Mar. 1999.

[11] D. Gross and C.M. Harris, Fundamentals of Queueing Theory, third ed. John Wiley & Sons, 1998.

[12] C.-H. Hsu, G. Lee, and A.L.P. Chen, “A Near Optimal Algorithm for Generating Broadcast Programs on Multiple Channels,” Proc. 10th ACM Int’l Conf. Information and Knowledge Management, Nov. 2001.

[13] C.-L. Hu and M.-S. Chen, “Dynamic Data Broadcasting with Traffic Awareness,” Proc. 22nd IEEE Int’l Conf. Distributed Computing and Systems, July 2002.

[14] Q.L. Hu, W.-C. Lee, and D.L. Lee, “Indexing Techniques for Wireless Data Broadcast under Data Clustering and Scheduling,” Proc. Eighth ACM Int’l Conf. Information and Knowledge Management, pp. 351-718, Nov. 1999.

[15] T. Imielinski, S. Viswanathan, and B.R. Badrinath, “Data on Air: Organization and Access,” IEEE Trans. Knowledge and Data Eng., vol. 9, no. 9, pp. 353-372, June 1997.

[16] J. Juran, A.R. Hurson, N. Vijaykrishnan, and S. Kim, “Data Organization and Retrieval on Parallel Air Channels: Performance and Energy Issues,” ACM/Kluwer Wireless Networks, vol. 10, no. 2, Mar. 2004.

[17] S. Lee, D.P. Carney, and S. Zdonik, “Index Hint for On-Demand Broadcasting,” Proc. 19th IEEE Int’l Conf. Data Eng., Mar. 2003.

[18] W.-C. Lee, Q.L. Hu, and D.L. Lee, “A Study on Channel Allocation for Data Dissemination in Mobile Computing Environments,” ACM/Kluwer Mobile Networks and Applications, vol. 4, no. 5, pp. 117-129, May 1999.

[19] C.-W. Lin, H. Hu, and D.L. Lee, “Adaptive Realtime Bandwidth Allocation for Wireless Data Delivery,” ACM/Kluwer Wireless Networks, vol. 10, pp. 103-120, 2004.

[20] S.-C. Lo and A.L.P. Chen, “Optimal Index and Data Allocation in Multiple Broadcast Channels,” Proc. 16th Int’l Conf. Data Eng., pp. 293-702, Mar. 2000.

[21] V. Padmanabhan and L. Qiu, “The Content and Access Dynamics of a Busy Web Site: Findings and Implications,” Proc. IEEE SIGCOMM Conf., pp. 293-304, Aug.-Sept. 2000.

[22] W.-C. Peng and M.-S. Chen, “Efficient Channel Allocation Tree Generation for Data Broadcasting in a Mobile Computing Environment,” ACM/Kluwer Wireless Networks, vol. 9, no. 2, pp. 117-129, 2003.

[23] K. Prabhakara, K.A. Hua, and J.H. Oh, “Level Multi-Channel Air Cache Designs for Broadcasting in a Mobile Environment,” Proc. 16th Int’l Conf. Data Eng., pp. 167-186, Feb.-Mar. 2000.

[24] M. Satyanarayanan, “Pervasive Computing: Vision and Challenges,” IEEE Personal Comm., vol. 8, no. 4, pp. 10-17, Aug. 2001.

[25] N. Shivakumar and S. Venkatasubramanian, “Efficient Indexing for Broadcast Based Wireless Systems,” ACM/Baltzer Mobile Networks and Applications, vol. 4, no. 6, pp. 433-446, Jan. 1996.

[26] K. Stathatos, N. Roussopoulos, and J.S. Baras, “Adaptive Data Broadcast in Hybrid Networks,” Proc. 23rd Int’l Conf. Very Large Data Bases, pp. 326-335, 1997.

[27] C.-J. Su and L. Tassiulas, “Joint Broadcast Scheduling and User’s Cache Management for Efficient Information Delivery,” Proc. Fourth ACM/IEEE Int’l Conf. Mobile Computing and Networking, pp. 33-42, Oct. 1998.

[28] D.A. Tran, K. Hua, and K. Prabhakaran, “On the Efficient Use of Multiple Physical Channel Air Cache,” Proc. IEEE Wireless Communications and Networking Conf., pp. 17-21, 2002.

[29] WAP Forum, http://www.wapforum.org, 2003.

[30] J. Xu, W.-C. Lee, and X. Tang, “Exponential Index: A Parameterized Distributed Indexing Scheme for Data on Air,” Proc. Second ACM/USENIX Int’l Conf. Mobile Systems, June 2004.

[31] J.L. Xu, Q.L. Hu, W.-C. Lee, and D.L. Lee, “An Optimal Cache Replacement Policy for Wireless Data Dissemination under Cache Consistency,” Proc. 30th Int’l Conf. Parallel Processing, Sept. 2001.

[32] J.L. Xu, D.L. Lee, and B. Li, “On Bandwidth Allocation for Data Dissemination in Cellular Mobile Networks,” ACM/Kluwer Wireless Networks, vol. 9, no. 2, pp. 103-116, Mar. 2003.

[33] J.L. Xu, B. Zheng, W.-C. Lee, and D.K. Lee, “Energy Efficient Index for Querying Location-Dependent Data in Mobile Broadcast Environments,” Proc. 19th Int’l Conf. Data Eng., Mar. 2003.

[34] W.G. Yee, S.B. Navathe, E. Omiecinski, and C. Jermaine, “Bridging the Gap between Response Time and Energy-Efficiency in Broadcast Schedule Design,” Proc. Int’l Conf. Extending Data Base Technology, pp. 572-589, 2002.

[35] W.G. Yee, S.B. Navathe, E. Omiecinski, and C. Jermaine, “Efficient Data Allocation over Multiple Channels at Broadcast Servers,” IEEE Trans. Computers, vol. 51, no. 10, pp. 1231-1236, Oct. 2002.

[36] J.X. Yu, T. Sakata, and K.L. Tan, “Statistical Estimation of Access Frequencies in Data Broadcasting Environments,” ACM/Kluwer Wireless Networks, vol. 6, no. 2, pp. 89-98, Mar. 2000.