
A Cross-Layer Framework for Overhead Reduction, Traffic Scheduling, and Burst Allocation in IEEE 802.16 OFDMA Networks

Jia-Ming Liang, Student Member, IEEE, Jen-Jee Chen, Member, IEEE, You-Chiun Wang, Member, IEEE, and Yu-Chee Tseng, Senior Member, IEEE

Abstract—IEEE 802.16 orthogonal frequency-division multiple access (OFDMA) downlink subframes have a special 2-D channel-time structure. Allocating resources from such a 2-D structure incurs extra control overheads that hurt network performance. Existing solutions try to improve network performance by designing either the scheduler in the medium access control layer or the burst allocator in the physical layer, but the efficiency of overhead reduction is limited. In this paper, we point out the necessity of “codesigning” both the scheduler and the burst allocator to efficiently reduce overheads and improve network performance. Under the partial-usage-of-subcarriers model, we propose a cross-layer framework that covers overhead reduction, real-time and non-real-time traffic scheduling, and burst allocation. The framework includes a two-tier priority-based scheduler and a bucket-based burst allocator, which is more complete and efficient than prior studies. Both the scheduler and the burst allocator are tightly coupled together to solve the problem of arranging resources for data traffic. Given the available space and bucket design from the burst allocator, the scheduler can well utilize the frame resource, reduce real-time traffic delays, and maintain fairness. On the other hand, with priority knowledge and resource assignments from the scheduler, the burst allocator can efficiently arrange downlink bursts to satisfy traffic requirements with low complexity. Through analysis, the cross-layer framework is validated to give an upper bound on overheads and achieve high network performance. Extensive simulation results verify that the cross-layer framework significantly increases network throughput, maintains long-term fairness, alleviates real-time traffic delays, and enhances frame utilization.

Index Terms—Burst allocation, cross-layer design, fair scheduling, IEEE 802.16, Worldwide Interoperability for Microwave Access orthogonal frequency-division multiple access (WiMAX OFDMA).

Manuscript received March 15, 2010; revised January 20, 2011; accepted February 25, 2011. Date of publication March 10, 2011; date of current version May 16, 2011. The work of Y.-C. Tseng was supported in part by the Aiming for the Top University and Elite Research Center Development Plan, by the National Science Council under Grant 97-3114-E-009-001, Grant 97-2221-E-009-142-MY3, Grant 98-2219-E-009-019, Grant 98-2219-E-009-005, and Grant 99-2218-E-009-005, by the Industrial Technology Research Institute, Taiwan, by the Institute for Information Industry, Taiwan, by D-Link, and by Intel Corporation. The review of this paper was coordinated by Dr. N. Ansari.

J.-M. Liang, Y.-C. Wang, and Y.-C. Tseng are with the Department of Computer Science, National Chiao-Tung University, Hsin-Chu 30010, Taiwan (e-mail: jmliang@cs.nctu.edu.tw; wangyc@cs.nctu.edu.tw; yctseng@cs.nctu.edu.tw).

J.-J. Chen is with the Department of Electrical Engineering, National University of Tainan, Tainan 70005, Taiwan (e-mail: james.jjchen@ieee.org).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVT.2011.2125808

I. INTRODUCTION

The IEEE 802.16 standard [1] has been developed for wide-range broadband wireless access. The physical (PHY) layer employs the orthogonal frequency-division multiple access (OFDMA) technique, where a base station (BS) can simultaneously communicate with multiple mobile subscriber stations (MSSs) through a set of orthogonal subchannels. The standard supports the frequency-division duplex (FDD) and the time-division duplex (TDD) modes. This paper aims at the TDD mode. Under the TDD mode, the following two types of subcarrier grouping models are specified: 1) adaptive modulation and coding (AMC) and 2) partial usage of subcarriers (PUSC). AMC adopts a contiguous permutation strategy, which chooses adjacent subcarriers to constitute each subchannel and leverages channel diversity through the high correlation in channel gains. However, each MSS needs to report its channel quality on every subchannel to the BS. On the other hand, PUSC adopts a distributed permutation strategy, which randomly selects subcarriers from the entire frequency spectrum to constitute each subchannel. Thus, the subchannels could be more resistant to interference, and each MSS can report only the average channel quality to the BS. Because PUSC is more interference resistant and mandatory in the standard, this paper adopts the PUSC model. In this case, there is no issue of subchannel diversity (i.e., the qualities of all subchannels are similar), because the BS calculates the average quality for each subchannel based on MSSs’ reports [2], [3].

The BS manages network resources for MSSs’ data traffic, which is classified into real-time traffic [e.g., unsolicited grant service (UGS), real-time polling service (rtPS), and extended real-time polling service (ertPS)] and non-real-time traffic [e.g., non-real-time polling service (nrtPS) and best effort (BE)]. These network resources are represented by frames. Each frame consists of a downlink subframe and an uplink subframe. Each downlink subframe is a 2-D array over channel and time domains, as shown in Fig. 1. The resource unit that the BS allocates to MSSs is called a burst. Each burst is a 2-D subarray and needs to be specified by a downlink map information element (DL-MAP_IE, or simply IE) in the downlink map (DL-MAP) field. These IEs are encoded by the robust quaternary phase-shift keying (QPSK) 1/2 modulation and coding scheme (MCS) for reliability. Because the IEs occupy frame space and do not carry MSSs’ data, they are considered control overheads. Explicitly, how efficiently we can reduce IE overheads will


Fig. 1. Structure of an IEEE 802.16 OFDMA downlink subframe under the TDD mode.

significantly affect network performance, because it determines frame utilization. To manage resources for all data traffic, the standard defines a scheduler in the medium access control (MAC) layer and a burst allocator in the PHY layer. However, their designs are left as open issues to implementers.

This paper aims at codesigning both the scheduler and the burst allocator to improve network performance, which covers overhead reduction, real-time and non-real-time traffic scheduling, and burst allocation. The design of the scheduler should consider the following three issues.

• The scheduler should improve network throughput while maintaining long-term fairness. Because the BS may send data to MSSs at different transmission rates (depending on their channel conditions), the scheduler will prefer MSSs that use higher transmission rates but should avoid starving MSSs that use lower ones.

• The scheduler should satisfy the delay constraints of real-time traffic to avoid high packet-dropping ratios. However, it should also meet the requirements of non-real-time traffic.

• To well utilize the limited frame space, the scheduler has to reduce IE overheads when assigning resources to MSSs’ data traffic. This condition requires the knowledge of available frame space and burst arrangement design from the burst allocator.

On the other hand, the design of the burst allocator should address the following three issues.

• The burst allocator should arrange IEs and downlink bursts for the MSSs’ resource requests from the scheduler in the OFDMA channel-time structure to well utilize the frame space and reduce the control overhead. Under the PUSC model, because all subchannels are equally adequate for all MSSs, the problem of arranging IEs and downlink bursts becomes a 2-D mapping problem, which is NP-complete [4]. To simplify the burst arrangement problem, advance planning for the MSSs’ resource requests in the scheduler is needed. This condition requires codesign of the scheduler and the burst allocator.

• To satisfy traffic requirements such as real-time delay constraints, the burst allocator has to arrange bursts based on the traffic scheduling knowledge from the scheduler.

For example, bursts for urgent real-time traffic should first be allocated to avoid packet dropping.

• Simplicity is a critical concern, because a frame is typically 5 ms [5], which means that the burst allocation scheme needs to be executed every 5 ms.

In the literature, prior studies design solely either the scheduler [6]–[10] or the burst allocator [4], [11]–[14] to address the reduction of IE overheads. Nevertheless, we point out the necessity of the cross-layer design for the following three reasons. First, the amount of IE overheads highly depends on the number of scheduled MSSs and the number of fragmented bursts, where prior work handles the two issues by the scheduler and the burst allocator, respectively. However, if we only consider either issue, the efficiency of overhead reduction is limited. Second, without considering burst arrangements, the scheduler may fail to satisfy MSSs’ requirements, because extra IE overheads will occupy the limited frame space. Third, without considering the scheduling assignments, the burst allocator may kick out some important data of MSSs (due to running out of frame space). This case may cause unfairness among MSSs and high packet-dropping ratios of real-time traffic. Therefore, it is necessary to codesign both the scheduler and the burst allocator due to their inseparable dependency.

In this paper, we propose a cross-layer framework that contains a two-tier priority-based scheduler and a bucket-based burst allocator. The scheduler assigns priorities to MSSs’ traffic in a two-tier manner and allocates resources to the traffic based on its priority. In the first tier, traffic is differentiated by its type. Urgent real-time traffic is assigned the highest level-1 priority to avoid its packets being dropped in the next frame. Then, a γ ratio (0 < γ < 1) of nonurgent real-time traffic is assigned level-2 priority, and non-real-time traffic is given level-3 priority. The aforementioned design has two advantages. First, we can avoid generating too much urgent real-time traffic in the next frame. Second, non-real-time traffic can have the opportunity to be served to avoid being starved. In the second tier, traffic of the same type (i.e., the same priority level in the first tier) is assigned different priorities calculated from the following four factors:

1) current transmission rates; 2) average transmission rates; 3) admitted data rates; 4) queue lengths.

The BS can have knowledge of the aforementioned four factors, because all downlink traffic is queued in the BS, and MSSs report their average channel qualities to the BS [15]. Unlike traditional priority-based solutions that are partial to non-real-time traffic [10], our novel two-tier priority scheduling scheme not only prevents urgent real-time traffic from incurring packet dropping (through the first tier) but also maintains long-term fairness (through the second tier). The network throughput is also improved by giving a higher priority to MSSs that use higher transmission rates (in the second tier). In addition, the scheduler can adjust the number of MSSs to be served and assign resources to traffic according to the burst arrangement manner (from the burst allocator) to significantly reduce IE overheads. This design is neglected in prior studies


and has a significant impact on overhead reduction and network performance.

On the other hand, the burst allocator divides the free space of each downlink subframe into a special structure that consists of several “buckets” and then arranges bursts in a bucket-by-bucket manner. Given k requests to be filled in a subframe, we show that this burst allocation scheme generates at most k plus a small constant number of IEs. In addition, the burst allocator arranges bursts according to the priority design from the scheduler so that the burst allocation can satisfy MSSs’ traffic requirements. The aforementioned bucket-based design incurs very low computation complexity and can be implemented on most low-cost Worldwide Interoperability for Microwave Access (WiMAX) chips [16]. Explicitly, in our cross-layer framework, both the scheduler and the burst allocator are tightly coupled together to solve the problems of overhead reduction, real-time and non-real-time traffic scheduling, and burst allocation.

The major contributions of this paper are fourfold. First, we point out the necessity of codesigning both the scheduler and the burst allocator to improve network performance and propose a cross-layer framework that covers overhead reduction, real-time and non-real-time traffic scheduling, and burst allocation. Our framework is more complete and efficient than prior studies. Second, we develop a two-tier priority-based scheduler that distributes resources among MSSs according to their traffic types, transmission rates, and queue lengths. The proposed scheduler improves network throughput, guarantees traffic requirements, and maintains long-term fairness. Third, a low-complexity bucket-based scheme is designed for burst allocation, which significantly improves the utilization of downlink subframes. Fourth, we analyze the upper bound on the amount of IE overheads and the potential throughput degradation caused by the proposed burst allocator, which is used to validate the simulation experiments and provide guidelines for the setting of the burst allocator. Extensive simulations are also conducted, and their results validate that our cross-layer framework can achieve high network throughput, maintain long-term fairness, alleviate real-time traffic delays, and improve downlink subframe utilization.

The rest of this paper is organized as follows. Section II surveys the related work. Section III gives the problem formulation. The cross-layer framework is proposed in Section IV. Section V analyzes the expected network throughput of the proposed framework. Extensive simulation results are given in Section VI. Conclusions are drawn in Section VII.

II. RELATED WORK

Most of the prior studies on resource allocation in IEEE 802.16 OFDMA networks implement either the scheduler or the burst allocator. For the implementation of the scheduler, the study in [6] proposes a scheduling scheme according to MSSs’ signal-to-noise ratios (SNRs) to achieve rate maximization. The work in [7] proposes a utility function to evaluate the tradeoff between network throughput and long-term fairness. In [8], an opportunistic scheduler is proposed by adopting the instantaneous channel quality of each MSS to maintain fairness.

However, these studies do not consider the delay requirements of real-time traffic. The work in [9] tries to minimize the blocking probability of MSSs’ traffic requests, and thus, the packet-dropping ratios of real-time traffic may be reduced. Nevertheless, none of the aforementioned studies [6]–[9] address the issue of overhead reduction. The work in [10] tries to reduce IE overhead from the perspective of the scheduler, where the number of MSSs to be served in each subframe is reduced to avoid generating too many IEs. However, without the help of the burst allocator, the efficiency of overhead reduction becomes insignificant, and some important data (e.g., urgent real-time traffic) may not be allocated with bursts because the frame space runs out. In this case, some MSSs may encounter serious packet dropping.

On the other hand, several studies consider implementing the burst allocator. The work in [17] proposes a new control message for periodic resource assignment to reduce duplicate signaling. Reference [18] suggests piggybacking IEs on data packets to increase the utilization of downlink subframes. However, both studies [17], [18] involve modifying the standard. The work in [4] proposes two heuristics for burst allocation: The first heuristic scans the free space in a downlink subframe row by row to try to fully utilize the space, but it may generate a large number of IEs. The second heuristic presegments a subframe into several rectangles, and a request will choose a rectangle larger than itself for allocation; however, this scheme requires prior knowledge of the request distribution. The work in [11] first allocates bursts for large requests. Nevertheless, larger requests are not necessarily more important or urgent. Several studies consider allocating bursts in a column-by-column manner. In [12], bursts with the same MCS are combined into a large one. However, this scheme is not compliant with the standard, because a burst may contain requests from multiple MSSs. The study in [13] pads zero bits at the end of each column, which may cause low subframe utilization. The work in [14] adopts a backward columnwise allocation scheme, where the bursts are allocated from the bottom-right corner to the top-left corner of the subframe. However, this scheme requires 3n bursts for n MSSs in the worst case. As shown, existing research efforts may pad too many useless bits, generate too many IEs, or leave unused slot holes.

Few studies implement both the scheduler and the burst allocator, but they do not consider reducing the IE overhead. The studies in [19] and [20] try to arrange resources to MSSs to maximize their data rates and maintain fairness. However, they do not consider the delay requirements of real-time traffic. The studies in [21] and [22] develop a one-tier priority-based scheduler to allocate resources to each MSS to exactly satisfy its demand. Thus, the delay requirement of real-time traffic could be guaranteed, but the network throughput may be degraded. Nevertheless, all of the studies [19]–[22] neglect the issue of overhead reduction, which may lead to low subframe utilization and low network throughput. We will show by simulations in Section VI that, without reducing the IE overhead, the quality-of-service (QoS) requirements of MSSs’ traffic may not be satisfied, particularly when the network becomes saturated.

Table I compares the features of prior studies and our cross-layer framework. It is shown that our cross-layer framework


TABLE I

COMPARISON OF PRIOR WORK AND OUR CROSS-LAYER FRAMEWORK

covers all of the features. In addition, our cross-layer framework has the least computation complexity in burst allocation. Thus, it can be implemented on most low-cost WiMAX chips.

III. RESOURCE ALLOCATION PROBLEM

We consider the downlink communication in an IEEE 802.16 OFDMA network using the TDD mode. The mandatory PUSC model is adopted so that there is no issue of subchannel diversity (because MSSs will report only their average channel qualities to the BS, as mentioned in Section I). The BS supports multiple MSSs in a point-to-multipoint manner, where each MSS has its admitted real-time and non-real-time traffic rates. The BS has to arrange the radio resource for the MSSs according to their traffic demands.

The radio resource is divided into frames, where each frame is further divided into a downlink subframe and an uplink subframe (see Fig. 1). A downlink subframe is modeled by a 2-D array with X time units (in the time domain) and Y subchannels (in the frequency domain). The basic unit in the X × Y array is called a subchannel time slot (or simply a slot). Each downlink subframe is composed of the following three portions: 1) preamble; 2) control; and 3) data. The control portion contains a DL-MAP and an uplink map (UL-MAP) to indicate the downlink and uplink resource allocation in the current frame, respectively. The downlink allocation unit is a subarray, called a downlink burst (or simply a burst), in the X × Y array. Each burst is denoted by (x, y, w, h), where x is the starting time unit, y is the starting subchannel, w is the burst’s width, and h is the burst’s height. An MSS can own more than one burst in a subframe. However, no two bursts can overlap with each other. Fig. 1 gives some examples: Bursts 1 and 2 can coexist, but bursts 2 and 3 cannot. Each burst requires one IE in the DL-MAP to describe its size and location in the subframe. According to the standard, each burst carries the data of exactly one MSS. Explicitly, from the scheduler’s perspective, the number of bursts (and, thus, IEs) increases when more MSSs are scheduled. On the other hand, from the burst allocator’s perspective, more IEs are required when an MSS’s data are distributed over multiple bursts. An IE requires 60 b encoded by the QPSK 1/2 MCS [5]. Because each slot can carry 48 b with QPSK 1/2, an IE occupies 5/4 slots, which has a significant impact on the space available for allocating bursts in a downlink subframe.
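The burst model and the IE overhead arithmetic above can be sketched in a few lines of Python. The `Burst` class and `overlap` helper are illustrative names (not from the paper); the constants follow the values cited in the text (a 60-b IE at QPSK 1/2, 48 b per slot, so each IE costs 60/48 = 5/4 slots).

```python
from dataclasses import dataclass

IE_BITS = 60                # DL-MAP_IE size in bits (value cited in the text)
QPSK12_BITS_PER_SLOT = 48   # bits carried per slot at QPSK 1/2

def ie_overhead_slots(num_bursts: int) -> float:
    """Control overhead (in slots) for a given number of bursts,
    at one IE per burst."""
    return num_bursts * IE_BITS / QPSK12_BITS_PER_SLOT

@dataclass
class Burst:
    x: int  # starting time unit
    y: int  # starting subchannel
    w: int  # width (time units)
    h: int  # height (subchannels)

def overlap(a: Burst, b: Burst) -> bool:
    """Two bursts conflict iff their rectangles intersect in the X x Y array."""
    return not (a.x + a.w <= b.x or b.x + b.w <= a.x or
                a.y + a.h <= b.y or b.y + b.h <= a.y)
```

For example, four bursts cost `ie_overhead_slots(4) = 5.0` slots of DL-MAP space before any data are carried, which is why both fewer scheduled MSSs and fewer fragmented bursts matter.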

The resource allocation problem is formulated as follows. There are n MSSs in the network, where each MSS M_i, i = 1, . . . , n, is admitted with an average real-time data rate of R_i^rt (in bits/frame) and a minimal non-real-time data rate of R_i^nrt (in bits/frame). Let C_i be the current transmission rate¹ (in bits/slot) for the BS to send data to M_i, which may change over frames. The objective is to design a cross-layer framework that contains both the scheduler and the burst allocator to arrange bursts for MSSs such that we can reduce the IE overhead, improve the network throughput, achieve long-term fairness, alleviate real-time traffic delays, and maximally utilize downlink subframes. In addition, the design of the cross-layer framework should not be very complicated so that it can execute within a frame duration (i.e., 5 ms) and be implemented in most low-cost WiMAX chips. Note that the fairness index (FI) in [23] is adopted to evaluate the long-term fairness of a scheme as follows:

FI = \frac{\left(\sum_{i=1}^{n} SD_i\right)^2}{n \sum_{i=1}^{n} (SD_i)^2}

where SD_i is the share degree of M_i, which is calculated by

SD_i = \frac{\sum_{j=0}^{T-1} \left[\tilde{A}_i^{rt}(f_c - j) + \tilde{A}_i^{nrt}(f_c - j)\right]}{T \times \left(R_i^{rt} + R_i^{nrt}\right)}   (1)

where Ã_i^rt(x) and Ã_i^nrt(x) are the amounts of real-time and non-real-time traffic allocated to M_i in the xth frame, respectively, f_c is the current frame index, and T is the window size (in frames) over which we measure fairness. We denote by U_d(x) the utilization of the xth downlink subframe, which is defined as the ratio of the number of slots used to transmit data to X × Y. Thus, the average downlink utilization over T frames is \sum_{j=0}^{T-1} U_d(f_c - j)/T. Table II summarizes the notations used in this paper.
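The fairness metric above is Jain's index applied to the share degrees. A minimal Python sketch (function names are ours, not the paper's) makes the two definitions concrete:

```python
def fairness_index(share_degrees):
    """Jain's fairness index over the MSSs' share degrees SD_i:
    FI = (sum SD_i)^2 / (n * sum SD_i^2); FI = 1 means perfect fairness."""
    n = len(share_degrees)
    total = sum(share_degrees)
    sq = sum(d * d for d in share_degrees)
    return (total * total) / (n * sq) if sq > 0 else 0.0

def share_degree(alloc_rt, alloc_nrt, R_rt, R_nrt, T):
    """SD_i per Eq. (1): traffic actually allocated to M_i over the last
    T frames, divided by the admitted amount T * (R_rt + R_nrt)."""
    return (sum(alloc_rt[:T]) + sum(alloc_nrt[:T])) / (T * (R_rt + R_nrt))
```

For instance, equal share degrees give FI = 1, while serving only one of four MSSs gives FI = 1/4, matching the index's usual range of [1/n, 1].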

IV. PROPOSED CROSS-LAYER FRAMEWORK

Fig. 2 shows the system architecture of our cross-layer framework, which is composed of the following two components: 1) the two-tier priority-based scheduler and 2) the

¹The estimation of the transmission rate highly depends on the path loss, fading, and propagation model. Here, we assume that the BS can accurately estimate the transmission rate for each MSS and will discuss how we can conduct the estimation in Section VI.


TABLE II
SUMMARY OF NOTATIONS

Fig. 2. System architecture of the proposed cross-layer framework, where i = 1 . . . n.

bucket-based burst allocator. The transmission rate C_i for each MSS M_i (see Fig. 2, label 1) is periodically reported to the scheduler and the burst allocator. Each M_i's admitted rates R_i^rt and R_i^nrt (see Fig. 2, label 2) are sent to the scheduler when M_i first associates with the BS or when R_i^rt and R_i^nrt change. The scheduler also monitors the current amounts of queued real-time and non-real-time data B_i^rt and B_i^nrt (see Fig. 2, label 3). The burst allocator informs the scheduler of the bucket size Δ_bkt and the available free space FS in the current downlink subframe (see Fig. 2, label 4) to help the scheduler distribute resources among MSSs' traffic, where

FS = X × Y − (FCH size) − (UL-MAP size) − (size of DL-MAP control fields)   (2)

where FCH is the frame control header. The UL-MAP size can be known in advance, because the uplink subframe is allocated before the downlink subframe. The DL-MAP control fields contain all parts of the DL-MAP, except for the IEs, which are yet to be decided by the burst allocator. The scheduler's mission is to deliver each M_i's real-time and non-real-time resource assignments Q_i^rt and Q_i^nrt (see Fig. 2, label 5) to the burst allocator. Based on Q_i^rt and Q_i^nrt, the burst allocator arranges IEs and bursts for each M_i (see Fig. 2, label 6). The actual amounts of real-time and non-real-time traffic allocated to M_i are written, respectively, as A_i^rt and A_i^nrt (see Fig. 2, label 7) and are fed back to the scheduler for future scheduling.

In our cross-layer framework, the priority rule defined in the scheduler helps the burst allocator determine how bursts can be arranged for MSSs' traffic. On the other hand, the allocation rule defined in the burst allocator also helps the scheduler determine how resources can be assigned to MSSs' traffic. Both the priority and allocation rules act like tenons in the cross-layer framework, making the scheduler and the burst allocator cooperate tightly with each other.

Due to the NP-complete nature of the burst allocation problem and the hardware constraints of low-cost WiMAX chips, it is inefficient and even infeasible to derive an optimal solution for arranging IEs and bursts within a short frame duration. Therefore, to keep our burst allocator simple and efficient, we adopt a bucket concept as follows. The available free space FS in the current subframe is horizontally sliced into a number of buckets, each of size Δ_bkt (see Fig. 3 for an example). The size Δ_bkt serves as the allocation unit in our scheme. As shown, the scheduler always keeps (Q_i^rt + Q_i^nrt) as a multiple of Δ_bkt for each M_i. This way, the burst allocator can easily arrange bursts in a "bucket-by-bucket" manner, well utilize the frame resource, and generate quite few bursts and, thus, IEs (which will be proved to have an upper bound in Section IV-B). In addition, long-term fairness is achieved, because the actual allocation (A_i^rt, A_i^nrt) by the burst allocator is likely to be quite close to the assignment (Q_i^rt, Q_i^nrt) by the scheduler for each i = 1, . . . , n.

A. Two-Tier Priority-Based Scheduler

In each frame, the scheduler generates resource assignments (Q_i^rt, Q_i^nrt), i = 1, . . . , n, for the burst allocator. To generate these assignments, the scheduler adopts a two-tier priority rule. In the first tier, traffic is differentiated by its type and is given priority levels according to the following order:

P1: urgent real-time traffic whose packets will pass their deadlines at the end of this frame;

P2: real-time traffic ranked in the top γ ratio (0 < γ < 1) when sorted by importance;

P3: non-real-time traffic.


Fig. 3. Example of the bucket-based burst allocation with three buckets and four resource assignments.

Then, in the second tier, traffic of the same type is assigned different priorities by its importance, which is calculated from the following four factors:

1) current transmission rates; 2) average transmission rates; 3) admitted data rates; 4) queue lengths.

In particular, for priority level P2, we rank the importance of M_i's real-time traffic by

I_i^{rt} = C_i \times \frac{C_i}{C_i^{avg}} \times \frac{B_i^{rt}}{R_i^{rt}}.   (3)

Here, the importance I_i^rt involves the following three factors, which are multiplied together.

1) A higher transmission rate C_i gives M_i a higher rating to improve the network throughput.

2) A higher ratio C_i/C_i^avg gives M_i a higher rating to prevent starvation of MSSs with low average rates, where C_i^avg is the average transmission rate for the BS to send data to M_i over the most recent T frames. In particular, supposing that an MSS has encountered a bad channel condition for a long period (i.e., a lower C_i^avg value), we still prefer this MSS if it can now enjoy a higher transmission rate (i.e., C_i > C_i^avg). In addition, a higher C_i/C_i^avg value means that the MSS is currently in a better condition; therefore, we give it a higher priority to improve the potential throughput.

3) A higher ratio B_i^rt/R_i^rt gives M_i a higher rating to favor MSSs with more backlogs.

Similarly, for priority level P3, we rank the importance of M_i's non-real-time traffic by

I_i^{nrt} = C_i \times \frac{C_i}{C_i^{avg}} \times \frac{1}{S_i^{nrt}}   (4)

where S_i^nrt is the non-real-time rate satisfaction ratio of M_i over the most recent T frames, which is calculated by

S_i^{nrt} = \frac{\sum_{j=0}^{T-1} A_i^{nrt}(f_c - j)}{T \times R_i^{nrt}}.   (5)

A small S_i^nrt means that M_i's non-real-time traffic may be starved. Thus, a smaller S_i^nrt gives M_i a higher rating.
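The importance metrics of Eqs. (3)-(5) are simple products and can be sketched directly (function and argument names are ours, chosen to mirror the paper's symbols):

```python
def importance_rt(C, C_avg, B_rt, R_rt):
    """I_i^rt per Eq. (3): current rate x (rate / average rate)
    x (real-time backlog / admitted real-time rate)."""
    return C * (C / C_avg) * (B_rt / R_rt)

def satisfaction_nrt(alloc_nrt_history, R_nrt, T):
    """S_i^nrt per Eq. (5): non-real-time traffic allocated over the
    last T frames, divided by the admitted amount T * R_nrt."""
    return sum(alloc_nrt_history[:T]) / (T * R_nrt)

def importance_nrt(C, C_avg, S_nrt):
    """I_i^nrt per Eq. (4): a smaller satisfaction ratio raises priority."""
    return C * (C / C_avg) * (1.0 / S_nrt)
```

For example, an MSS whose current rate C = 4 is twice its average C_avg = 2 and whose backlog is twice its admitted rate gets I^rt = 4 x 2 x 2 = 16, outranking a fully satisfied peer with steady channel quality.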

The aforementioned two-tier priority rule not only prevents urgent real-time traffic from incurring packet dropping (through the first tier) but also maintains long-term fairness (through the second tier). The network throughput is also improved by giving a higher priority to MSSs that use higher transmission rates (in the second tier). In addition, by giving level-2 priority to a γ ratio of nonurgent real-time traffic, the amount of urgent real-time traffic in the next frame can be reduced, and non-real-time traffic can have the opportunity to send data.

In the following procedure, we present the detailed operations of our scheduler. Let e_i be a binary flag indicating whether an IE has been allocated for M_i, i = 1, . . . , n. Initially, we set all e_i = 0, i = 1, . . . , n. In addition, the free space FS is deducted by ((Y/Δ_bkt) − 1) × θ_IE to reserve space for potential extra IEs caused by the burst allocator (this case will be discussed in the next section), where θ_IE = 5/4 is the size of an IE.

1) Let U_i^rt be the amount of data of M_i's urgent real-time traffic in the current frame. For all M_i with U_i^rt > 0, we sort them by their C_i values in descending order. Then, we schedule the free space FS for each of them as follows until FS ≤ 0.

a) Reserve an IE for M_i by setting FS = FS − θ_IE. Then, set e_i = 1.

b) If FS > 0, assign resource Q_i^rt = min{FS × C_i, U_i^rt} to M_i and set FS = FS − Q_i^rt/C_i. Then, deduct Q_i^rt from B_i^rt.

2) After step 1, if FS > 0, we sort all M_i that have real-time traffic by their I_i^rt values [by (3)]. Then, we schedule the resource for each of them as follows until either all MSSs in the top γ ratio are examined or FS ≤ 0.

a) If e_i = 0, reserve an IE for M_i by setting FS = FS − θ_IE and e_i = 1.

b) If FS > 0, assign more resources δ = min{FS × C_i, B_i^rt} to M_i. Then, set Q_i^rt = Q_i^rt + δ and FS = FS − δ/C_i. Deduct δ from B_i^rt.

3) After step 2, if FS > 0, we sort all M_i by their I_i^nrt values [by (4)]. Then, we schedule the resource for each of them as follows until either all MSSs are examined or FS ≤ 0.

a) If e_i = 0, reserve an IE for M_i by setting FS = FS − θ_IE and e_i = 1.

b) If FS > 0, assign more resources δ = min{FS × C_i, B_i^nrt} to M_i. Then, set Q_i^nrt = δ and FS = FS − δ/C_i. Deduct δ from B_i^nrt.

4) Because the bucket size Δ_bkt is the allocation unit in our burst allocator, in this step, we fine-tune Q_i^rt and Q_i^nrt such that (Q_i^rt + Q_i^nrt) is aligned to a multiple of Δ_bkt for each M_i. To do so, we gradually remove some slots from Q_i^nrt and then from Q_i^rt until (((Q_i^rt + Q_i^nrt)/C_i) mod Δ_bkt) = 0. One exception is when much of the data in Q_i^rt is urgent, which makes removing any resource from M_i impossible. In this case, we instead add more slots to M_i until (((Q_i^rt + Q_i^nrt)/C_i) mod Δ_bkt) = 0. The aforementioned adjustment (i.e., removal and addition) may make the total resource assignment fall below or exceed the available resource FS. If so, we further remove some slots from the MSSs with less importance or add some slots to the MSSs with more importance until the total resource assignment is equal to the initial free space given by the burst allocator.

Fig. 4 illustrates the flowchart of the scheduler. To summarize, our scheduler generates the resource assignment according to the following three priorities: P1) urgent traffic; P2) real-time traffic; and P3) non-real-time traffic. Step 1 first schedules MSSs with urgent traffic to alleviate their real-time traffic delays. Step 2 schedules the top γ ratio of MSSs to reduce the number of MSSs that may have urgent traffic in the following frames. This step also helps reduce the IE overhead of future frames caused by urgent traffic, which is neglected by prior studies. Step 3 schedules MSSs with lower non-real-time satisfaction ratios to prevent them from starvation. Finally, step 4 reshapes all assignments such that each (Q^rt_i + Q^nrt_i) is divisible by Δ_bkt. This step helps the burst allocator fully utilize a downlink subframe.

We then analyze the time complexity of our scheduler. In step 1, sorting MSSs by their C_i values takes O(n lg n) time, and scheduling the resources for the MSSs with urgent traffic takes O(n) time. In step 2, sorting MSSs by their I^rt_i values requires O(n lg n) time, and scheduling the resources for the top γ ratio of MSSs requires at most O(γn) time. In step 3, sorting MSSs by their I^nrt_i values costs O(n lg n) time, and scheduling the resources for the MSSs with non-real-time traffic takes O(n) time. In step 4, reshaping all requests takes at most O(n) time. Thus, the total time complexity is O(n lg n + n + n lg n + γn + n lg n + n + n) = O(n lg n).
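The three scheduling passes above can be sketched in a few dozen lines. This is a minimal illustration, not the paper's implementation: it assumes per-MSS records with precomputed metrics I_rt and I_nrt [from (3) and (4)], omits step 4's Δ_bkt alignment, and the field names and top-γ rounding are our own choices.

```python
# Sketch of the two-tier priority-based scheduler, passes 1-3.
# Each MSS dict: rate C (bits/slot), urgent bits U_rt, backlogs B_rt and
# B_nrt (bits), and precomputed metrics I_rt, I_nrt (higher = more urgent
# interest / lower satisfaction -- an assumption on the metric's sign).

def schedule(mss_list, fs, theta_ie, gamma):
    """Assign slot resources in three priority passes; fs is the free
    space in slots, theta_ie the IE cost per scheduled MSS (slots).
    Returns {mss_id: (Q_rt, Q_nrt)} in bits."""
    q_rt = {m['id']: 0 for m in mss_list}
    q_nrt = {m['id']: 0 for m in mss_list}
    has_ie = set()

    def reserve_ie(m):
        nonlocal fs
        if m['id'] not in has_ie:      # one IE per scheduled MSS
            fs -= theta_ie
            has_ie.add(m['id'])

    # Pass 1 (P1): urgent real-time traffic, highest rate first.
    for m in sorted([m for m in mss_list if m['U_rt'] > 0],
                    key=lambda m: -m['C']):
        if fs <= 0:
            break
        reserve_ie(m)
        if fs > 0:
            grant = min(fs * m['C'], m['U_rt'])
            q_rt[m['id']] += grant
            fs -= grant / m['C']
            m['B_rt'] -= grant

    # Pass 2 (P2): top-gamma ratio of MSSs with remaining real-time data.
    rt = sorted([m for m in mss_list if m['B_rt'] > 0],
                key=lambda m: -m['I_rt'])
    for m in rt[:max(1, int(gamma * len(rt)))]:
        if fs <= 0:
            break
        reserve_ie(m)
        if fs > 0:
            delta = min(fs * m['C'], m['B_rt'])
            q_rt[m['id']] += delta
            fs -= delta / m['C']
            m['B_rt'] -= delta

    # Pass 3 (P3): non-real-time traffic, least-satisfied MSSs first.
    for m in sorted(mss_list, key=lambda m: -m['I_nrt']):
        if fs <= 0:
            break
        if m['B_nrt'] <= 0:
            continue
        reserve_ie(m)
        if fs > 0:
            delta = min(fs * m['C'], m['B_nrt'])
            q_nrt[m['id']] += delta
            fs -= delta / m['C']
            m['B_nrt'] -= delta

    return {m['id']: (q_rt[m['id']], q_nrt[m['id']]) for m in mss_list}
```

Note how each pass charges θ_IE once per newly scheduled MSS before granting data slots, mirroring substeps a) and b) of each step.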

B. Bucket-Based Burst Allocator

Fig. 4. Flowchart of the two-tier priority-based scheduler.

Ideally, the free space FS in (2) should accommodate each resource assignment (Q^rt_i, Q^nrt_i) calculated by the scheduler and its corresponding IE(s). However, because the burst allocation problem is NP-complete, our bucket-based heuristic tries to squeeze as many MSSs' assignments into FS as possible and to allocate one burst per assignment with a very high probability. If more than one burst is required, more IEs are needed, in which case some assignments that were originally arranged by the scheduler may be trimmed down or even kicked out by the burst allocator. Given the free space FS by (2), the bucket size Δ_bkt, and the assignments (Q^rt_i, Q^nrt_i) from the scheduler, our bucket-based heuristic works as follows.

1) Horizontally slice FS into Y/Δ_bkt buckets, each of height Δ_bkt, where Y is divisible by Δ_bkt. Fig. 3 shows an example by slicing FS into three buckets.

2) Let k be the number of resource assignments given by the scheduler. We reserve (k + (Y/Δ_bkt) − 1) × θ_IE slots for IEs at the left side of the subframe. In fact, the scheduler has also reserved the space for these IEs, and the purpose will become clear later on. Fig. 3 gives an example: because there are four assignments, 4 + 3 − 1 = 6 IEs are reserved.

3) We then assign bursts to satisfy these resource assignments according to their priorities originally defined in the scheduler. Because each assignment (Q^rt_i, Q^nrt_i) may have data mixed in the categories of P1, P2, and P3, we redefine its priority as follows.

a) An assignment with data in P1 has a higher priority than an assignment without data in P1.

b) Without the existence of data in P1, an assignment with data in P2 has a higher priority than an assignment without data in P2.

Fig. 5. Flowchart of the bucket-based burst allocator.

Then, bursts are allocated in a bucket-by-bucket manner. In particular, when an assignment (Q^rt_i, Q^nrt_i) is examined, it is placed starting from the previous stop point and fills up the bucket from right to left until either (Q^rt_i, Q^nrt_i) is satisfied or the left end of the bucket is encountered. In the latter case, we move to the right end of the next bucket and repeat the allocation process. In addition, this "cross-bucket" behavior requires one extra IE for the request. The operation is repeated until either all assignments are examined or all buckets are exhausted. Fig. 3 gives one example, where the four assignments are prioritized by (Q^rt_3, Q^nrt_3) > (Q^rt_1, Q^nrt_1) > (Q^rt_4, Q^nrt_4) > (Q^rt_2, Q^nrt_2). Assignment (Q^rt_1, Q^nrt_1) requires two IEs because it involves one cross-bucket behavior.

4) According to the allocation in step 3, we place each resource assignment (Q^rt_i, Q^nrt_i) into its burst(s). In addition, the amount of actual allocation is written into each (A^rt_i, A^nrt_i) and fed back to the scheduler for future scheduling.

Fig. 5 illustrates the flowchart of the burst allocator. We make some remarks as follows. First, because there are Y/Δ_bkt buckets, there are at most ((Y/Δ_bkt) − 1) cross-bucket burst assignments, and thus at most ((Y/Δ_bkt) − 1) extra IEs are needed. To accommodate this need, some assignments may be slightly trimmed down. Thus, (Q^rt_i, Q^nrt_i) and (A^rt_i, A^nrt_i) are not necessarily the same. However, the difference should be very small. Second, the bucket located at the boundary of reserved IEs and data (e.g., the third bucket in Fig. 3) may have some extra slots (e.g., the lower left corner of the third bucket). These extra slots are ignored in the aforementioned process for ease of presentation, but they can be used to allocate bursts to further improve space efficiency. Third, because each cross-bucket behavior requires one extra IE and there are Y/Δ_bkt buckets, the number of IEs required is bounded, as proved in Theorem 1.

Theorem 1: In the bucket-based burst allocator, the (k + (Y/Δ_bkt) − 1) IEs reserved in step 2 are sufficient for the burst allocation in step 3.

Proof: Given Y/Δ_bkt buckets b̂_1, b̂_2, . . ., b̂_{Y/Δ_bkt}, we can concatenate them into one virtual bucket b̂ with ((Y/Δ_bkt) − 1) joints. We then allocate one virtual burst for each request from the scheduler in b̂; therefore, we have at most k virtual bursts. Then, we replace each virtual burst by one real burst. However, we require one extra real burst whenever the replaced virtual burst crosses one joint. The worst case occurs when each of the ((Y/Δ_bkt) − 1) joints is crossed by one virtual burst. In this case, we require (k + (Y/Δ_bkt) − 1) real bursts to replace all virtual bursts. Because each real burst requires one IE, we have to reserve at most (k + (Y/Δ_bkt) − 1) IEs. ∎

In comparison, a naive burst allocation requires 3k IEs in the worst case if the allocation goes in a row-major or column-major way [14] (because each request may require up to three IEs). In our scheme, the bucket size Δ_bkt can be dynamically adjusted to reflect the "grain size" of our allocation. A larger grain size may cause fewer IEs but sacrifice resource utilization, whereas a smaller grain size may cause more IEs but improve resource utilization. We will discuss the effect of Δ_bkt in Section VI-F.

We then analyze the time complexity of our burst allocator. Because we allocate bursts in a zigzag manner, the time complexity is proportional to the number of bursts. By Theorem 1, we have at most (k + (Y/Δ_bkt) − 1) bursts. Because k ≤ n and Y/Δ_bkt is usually smaller than n, the time complexity is O(k + (Y/Δ_bkt) − 1) = O(n).

To conclude, the proposed scheduler and burst allocator depend on each other through the following two designs. First, the scheduler reserves the extra IE space caused by the bucket partition and arranges resources to MSSs' traffic so that the resource assignments align to buckets. Thus, we enhance the possibility that the burst allocator fully satisfies the resource assignments from the scheduler. Second, the burst allocator follows the priority rule of the scheduler to arrange bursts. Thus, even if the frame space is not enough to satisfy all traffic, urgent real-time traffic can still be assigned bursts to meet its approaching deadlines.

V. ANALYSIS OF NETWORK THROUGHPUT LOSS BY THE BUCKET-BASED SCHEME

Given an ideal scheduler, we analyze the loss of network throughput caused by our bucket-based burst allocator. To simplify the analysis, we assume that the network has only traffic of priority levels P1 and P3 and that each MSS has infinite data in P3. (Traffic of P2 will eventually become urgent traffic of P1.) Then, we calculate the difference between the expected throughput of our burst allocator and the maximum throughput of an ideal burst allocator. In the ideal burst allocator, the number of IEs is equal to the number of resource assignments from the scheduler. In addition, the frame resource is always first allocated to urgent traffic (P1) and then to non-real-time traffic (P3) with the highest transmission rate. It follows that two factors may degrade the network throughput of our burst allocator: 1) the extra IEs incurred by step 3 in Section IV-B and 2) the data padding of low-rate non-real-time traffic at the boundary between the data in P1 and P3. In particular, each burst must begin with the data in P1, followed by the data in P3. Furthermore, if the data in P3 covers more than one column, it must be sent at the highest transmission rate. If the data in P3 covers less than a column, it may be sent at a nonhighest transmission rate. The right-hand side of Fig. 3 shows these two possibilities, where P2 is empty. Note that, in the first possibility, all data in P3 must be transmitted at the highest rate; otherwise, the shaded area will be allocated to the data in P3 of other MSSs using the highest rate.

Following the aforementioned formulation, our objective is to find the throughput loss L of our burst allocator compared with the ideal burst allocator as

L = E[Õ] × c_high + E[S̃]    (6)

where Õ is the random variable that represents the number of extra IEs caused by buckets, and S̃ is the random variable that represents the throughput degradation (in bits) caused by the low-rate padding in the shaded area of the second possibility on the right-hand side of Fig. 3. To simplify the analysis, we assume that there are only two transmission rates c_high and c_low, where c_high > c_low. The probability that an MSS is in either rate is equal.

A. Calculation of E[Õ]

We first give an example to show how our analysis works. Suppose that we have three MSSs and three buckets. Each bucket has two arrangement units, each having Δ_bkt slots. Thus, there are, in total, six arrangement units, denoted by O1, O2, O3, O4, O5, and O6. Resources that are allocated to the three MSSs can be represented by two separators "|". For example, we list the following three possible allocations: 1) O1O2|O3O4|O5O6; 2) O1O2O3O4||O5O6; and 3) O1|O2O3O4O5|O6. In arrangement 1, we need no extra IE. In arrangement 2, MSS 2 receives no resource, but MSS 1 needs one extra IE. In arrangement 3, MSS 2 requires two extra IEs.
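This example can be checked exhaustively. The sketch below (our illustration, not the paper's) enumerates all stars-and-bars placements of the two separators among the six arrangement units and averages the number of bucket boundaries that receive no separator, i.e., the number of extra IEs.

```python
# Exhaustive check of the arrangement-unit/separator example above:
# 3 MSSs, 3 buckets, 2 arrangement units per bucket (6 units total).
# An extra IE is needed whenever a bucket boundary carries no separator.
from itertools import combinations

def extra_ies(bars, n_units, units_per_bucket, n_sep):
    """bars: separator positions in the stars-and-bars encoding of
    placing n_sep separators among n_units arrangement units."""
    slots = n_units + n_sep
    x = [0] * (n_units + 1)      # x[i] = separators after the ith unit
    unit, bset = 0, set(bars)
    for pos in range(slots):
        if pos in bset:
            x[unit] += 1
        else:
            unit += 1
    # interior bucket boundaries: after units X, 2X, ... (frame end excluded)
    boundaries = range(units_per_bucket, n_units, units_per_bucket)
    return sum(1 for b in boundaries if x[b] == 0)

n_units, per_bucket, n_sep = 6, 2, 2     # alpha = 6, X = 2, n - 1 = 2
placements = list(combinations(range(n_units + n_sep), n_sep))
avg = sum(extra_ies(p, n_units, per_bucket, n_sep)
          for p in placements) / len(placements)
print(len(placements), avg)              # → 28 1.5
```

For this 3-MSS/3-bucket configuration there are C(8, 2) = 28 placements and the average works out to 1.5 extra IEs, which matches what (8) gives for n = 3, B = 3, X = 2.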

We use arrangement units and separators to conduct the analysis. Suppose that we have n MSSs, Y/Δ_bkt (= B) buckets, and X × B (= α) arrangement units (i.e., each bucket has X arrangement units). This setting can be represented by arbitrarily placing (n − 1) separators along a sequence of α arrangement units. Bucket boundaries appear after each ith arrangement unit such that i is a multiple of X. Note that only (B − 1) bucket boundaries can cause extra IEs, as mentioned in Section IV. Whenever no separator appears at a bucket boundary, one extra IE is needed. There are, in total, (α + (n − 1))! / (α! (n − 1)!) ways of placing these separators. Let Ẽ be the random variable that represents the number of bucket boundaries at which at least one separator is inserted. The probability of (Ẽ = e) is calculated by

Prob[Ẽ = e] = [ C(B−1, e) × (α − (B−1−e) + (n−1−e))! / ((α − (B−1−e))! (n−1−e)!) ] / [ (α + (n−1))! / (α! (n−1)!) ].    (7)

Note that the term C(B−1, e) refers to the number of ways of choosing e boundaries from the (B − 1) bucket boundaries; each of these e boundaries is inserted by at least one separator, and the remaining (B − 1 − e) bucket boundaries must not be inserted by any separator. To understand the second term in the numerator of (7), denote by x_0 the number of separators before the first arrangement unit and by x_i the number of separators after the ith arrangement unit, i = 1, . . . , α. Explicitly, we have

x_0 + x_1 + · · · + x_α = n − 1,  ∀ x_i ∈ {0, 1, 2, . . .}.

However, when Ẽ = e, (B − 1 − e) of these x_i's must be 0, and e of these x_i's must be larger than or equal to 1. Then, this problem is equivalent to finding the number of combinations of

y_0 + y_1 + · · · + y_{α−(B−1−e)} = n − 1 − e,  ∀ y_j ∈ {0, 1, 2, . . .}.

It follows that there are (α − (B−1−e) + (n−1−e))! / ((α − (B−1−e))! (n−1−e)!) combinations. Therefore, E[Õ] can be obtained by

E[Õ] = Σ_{e=0}^{B−1} (number of extra IEs when Ẽ = e) × Prob[Ẽ = e]
     = Σ_{e=0}^{B−1} (B − 1 − e) × Prob[Ẽ = e].    (8)

B. Calculation of E[S̃]

Recall that E[S̃] is the expected throughput degradation caused by transmitting a burst at a low rate when the burst contains some data padding of non-real-time traffic. To calculate E[S̃], let us define Ñ_L as the random variable of the number of MSSs using the low transmission rate c_low. Because there is no throughput degradation for MSSs using the high transmission rate c_high, the overall expected throughput degradation is

E[S̃] = Σ_{m=1}^{n} E[S̃ | Ñ_L = m] × Prob[Ñ_L = m].    (9)

Let Ũ_i be the random variable that represents the amount of data of M_i's urgent traffic, i = 1, . . . , n. Here, we assume that Ũ_i is uniformly distributed over [1, R], where R ∈ ℕ. Let X̃^L_j be the random variable that represents the amount of throughput degradation (in bits) due to the data padding of M_j's non-real-time traffic when using c_low. Because the throughput degradation caused by MSSs using c_high is zero, we have

E[S̃ | Ñ_L = m] = E[ Σ_{j=1}^{m} X̃^L_j ].    (10)

Explicitly, X̃^L_i and X̃^L_j are independent of each other for any i ≠ j; therefore, we have

E[ Σ_{j=1}^{m} X̃^L_j ] = Σ_{j=1}^{m} E[X̃^L_j].    (11)

Now, let us define I^U_i as an indicator of whether M_i has urgent traffic, such that I^U_i = 1 if M_i has urgent traffic; otherwise, I^U_i = 0. Because the bursts of low-rate MSSs without urgent traffic do not contain the data padding of non-real-time traffic, no throughput degradation is caused by them. Therefore, we can derive

E[X̃^L_j] = E[X̃^L_j | I^U_j = 1] × Prob[I^U_j = 1] = ( Σ_{u=1}^{R} f(Ũ_j = u) / R ) × Prob[I^U_j = 1]    (12)

where

f(Ũ_j = u) = ( ⌈u/(Δ_bkt × c_low)⌉ − u/(Δ_bkt × c_low) ) × Δ_bkt × (c_high − c_low)

is a function representing the throughput degradation caused by a low-rate MSS with non-real-time data padding when Ũ_j = u.

By combining (9)–(12), we can derive

E[S̃] = Σ_{m=1}^{n} ( Σ_{j=1}^{m} ( Σ_{u=1}^{R} f(Ũ_j = u) / R ) × Prob[I^U_j = 1] ) × Prob[Ñ_L = m].    (13)

Finally, the throughput loss of our burst allocator can be calculated by combining (8) and (13) into (6).
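The closed forms (8) and (13) are straightforward to evaluate numerically. Below is a sketch under the stated assumptions (equal rate probabilities, so Prob[Ñ_L = m] is binomial); Prob[I^U_j = 1] is taken as a caller-supplied parameter p_urgent, whose closed form (1 − γ)^(T_D − 1) appears in Section VI-G, and the ceiling-based padding function f is our reading of the garbled (12), so treat it as an assumption.

```python
# Numerical evaluation of the throughput loss L = E[O~]*c_high + E[S~]
# from (6), with E[O~] by (8) and E[S~] by (9)-(13).
import math

def e_extra_ies(n, B, X):
    """E[O~] by (8): expected extra IEs over all separator placements."""
    alpha = X * B
    total = math.comb(alpha + n - 1, n - 1)
    acc = 0.0
    for e in range(0, min(B, n)):            # requires n - 1 - e >= 0
        ways = (math.comb(B - 1, e) *
                math.comb(alpha - (B - 1 - e) + (n - 1 - e), n - 1 - e))
        acc += (B - 1 - e) * ways / total
    return acc

def e_padding_loss(n, R, d_bkt, c_low, c_high, p_urgent):
    """E[S~] by (9)-(13) with Prob[rate = c_low] = 0.5 per MSS."""
    cap = d_bkt * c_low                      # bits per bucket column at c_low
    # f(u): padding loss when u urgent bits end mid-column at the low rate
    f = lambda u: (math.ceil(u / cap) - u / cap) * d_bkt * (c_high - c_low)
    e_xl = sum(f(u) for u in range(1, R + 1)) / R * p_urgent   # (12)
    e_s = 0.0
    for m in range(1, n + 1):
        p_m = math.comb(n, m) * 0.5 ** n     # binomial Prob[N~_L = m]
        e_s += m * e_xl * p_m                # (10)-(11), identical MSSs
    return e_s

def throughput_loss(n, B, X, R, d_bkt, c_low, c_high, p_urgent):
    """Combine (8) and (13) into (6)."""
    return (e_extra_ies(n, B, X) * c_high +
            e_padding_loss(n, R, d_bkt, c_low, c_high, p_urgent))
```

For the small example of Section V-A (n = 3, B = 3, X = 2), e_extra_ies returns 1.5, and the loss reduces to 1.5 × c_high when no MSS has urgent data (p_urgent = 0).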

VI. PERFORMANCE EVALUATION

To verify the effectiveness of our cross-layer framework, we develop a simulator in C++ based on the architecture in [15], as shown in Fig. 6. The simulator contains three layers: the traffic-generating module in the upper layer creates the MSSs' demands according to their real-time and non-real-time traffic requirements. In the MAC layer, the queuing module maintains the data queues for each MSS, and the scheduling module conducts the actions of the scheduler. In the PHY layer, the channel-estimating module simulates the channel conditions and estimates the transmission rate of each MSS, and the burst-allocating module conducts the actions of the burst allocator.

The arrows in Fig. 6 show the interaction between all the modules in our simulator. In particular, the traffic-generating module generates traffic and feeds it to the scheduling module for allocating resources and to the queuing module for simulating the queue of each traffic flow. The channel-estimating module sends the transmission rates of MSSs to both the scheduling and burst-allocating modules for their references. In addition, the scheduling and burst-allocating modules interact with each other, particularly in our scheme.

The simulator adopts a fast Fourier transform (FFT) size of 1024 and the PUSC zone category with reuse 1. The frame duration is 5 ms. This way, we have X = 12 and Y = 30. Six MCSs are adopted, denoted by the set MCS = {QPSK1/2, QPSK3/4, 16QAM1/2, 16QAM3/4, 64QAM2/3, 64QAM3/4}.

Fig. 6. Architecture of our C++ simulator.

TABLE III: AMOUNTS OF DATA CARRIED BY EACH SLOT AND THE MINIMUM REQUIRED SNR THRESHOLDS OF DIFFERENT MCSS

For the traffic-generating module, the types of real-time traffic include UGS, rtPS, and ertPS, whereas the types of non-real-time traffic include nrtPS and BE. Each MSS has an admitted real-time data rate R^rt_i of 0 ∼ 200 bits/frame and an admitted non-real-time data rate R^nrt_i of 0 ∼ 500 bits/frame. In each frame, each MSS generates 0 ∼ 2R^rt_i amount of real-time data and R^nrt_i ∼ 4R^nrt_i amount of non-real-time data.

For the channel-estimating module, we develop two scenarios to estimate the transmission rate of each MSS. The first scenario, called the Stanford University Interim (SUI) scenario, is based on the SUI path loss model recommended by the IEEE 802.16 Task Group [24]. In particular, each MSS roams inside the BS's signal coverage (the largest area in which the BS can communicate with each MSS using the lowest QPSK1/2 MCS) and moves following the random waypoint model with a maximal speed of 20 m/s [25]. The transmission rate of each MSS M_i is determined by its received SNR as

SNR(BS, M_i) = 10 · log_10( P̃(BS, M_i) / (BW · N_o) )

where BW is the effective channel bandwidth (in hertz), N_o is the thermal noise level, and P̃(BS, M_i) is the received signal power at M_i, which is defined by

P̃(BS, M_i) = (G_BS · G_Mi · P_BS) / L(BS, M_i)

where P_BS is the transmission power of the BS, G_BS and G_Mi are the antenna gains at the BS and M_i, respectively, and L(BS, M_i) is the path loss from the BS to M_i. Given M_i's SNR, the BS can determine M_i's MCS based on Table III. In particular, the BS chooses the highest MCS whose minimum required SNR is smaller than SNR(BS, M_i). Table IV lists the parameters used in the SUI scenario.
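The link-budget computation and the table lookup can be sketched as follows. Note that the SNR threshold values below are illustrative placeholders, not the actual entries of Table III, which this excerpt does not reproduce.

```python
# Sketch of the SUI-scenario rate selection: compute the received SNR
# from the link budget above, then pick the highest MCS whose minimum
# required SNR is met.
import math

def snr_db(p_bs_w, g_bs, g_ms, path_loss, bw_hz, noise_w_per_hz):
    """SNR(BS, Mi) = 10*log10( P~ / (BW * No) ), with
    P~ = G_BS * G_Mi * P_BS / L(BS, Mi)  (all quantities linear)."""
    p_rx = g_bs * g_ms * p_bs_w / path_loss
    return 10.0 * math.log10(p_rx / (bw_hz * noise_w_per_hz))

# (MCS name, hypothetical minimum SNR in dB), lowest to highest rate.
MCS_TABLE = [("QPSK1/2", 5.0), ("QPSK3/4", 8.0), ("16QAM1/2", 10.5),
             ("16QAM3/4", 14.0), ("64QAM2/3", 18.0), ("64QAM3/4", 20.0)]

def select_mcs(snr):
    """Highest MCS whose required SNR is met; None if even QPSK1/2
    cannot be supported (the MSS is outside the BS's coverage)."""
    chosen = None
    for name, threshold in MCS_TABLE:
        if snr >= threshold:
            chosen = name
    return chosen
```

As in the paper, a larger path loss lowers the SNR and pushes the MSS toward a lower MCS, which is why SUI-scenario throughput is lower near the coverage boundary.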

(11)

TABLE IV

SIMULATIONPARAMETERSUSED IN THESUI SCENARIO

Fig. 7. Six-state Markov chain to model the channel condition.

The second scenario, called the Markov scenario, adopts a six-state Markov chain [26] to simulate the channel condition of each MSS, as shown in Fig. 7. In particular, let MCS[i] be the ith MCS, i = 1, . . . , 6. Suppose that an MSS uses MCS[i] to fit its channel condition at the current frame. The probabilities that the MSS switches to MCS[i − 1] and MCS[i + 1] in the next frame are both (1/2)p_c, and the probability that it remains unchanged is 1 − p_c. For the boundary cases of i = 1 and i = 6, the probabilities of switching to MCS[2] and MCS[5], respectively, are both p_c. Unless otherwise stated, we set p_c = 0.5, and the initial i value of each MSS is randomly selected from 2 to 5.
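One transition of this chain can be sketched directly from the description above; the boundary states pass the full p_c to their single neighbor, while interior states split it evenly.

```python
# Sketch of the six-state Markov channel model: the MCS index moves to a
# neighbor with total probability pc (split evenly up/down in the
# interior; the whole pc goes to the only neighbor at the boundaries)
# and stays put with probability 1 - pc.
import random

def next_mcs(i, pc, rng=random):
    """One transition of the MCS index i in {1, ..., 6}."""
    r = rng.random()
    if i == 1:                      # boundary: only neighbor is MCS[2]
        return 2 if r < pc else 1
    if i == 6:                      # boundary: only neighbor is MCS[5]
        return 5 if r < pc else 6
    if r < pc / 2:
        return i - 1
    if r < pc:
        return i + 1
    return i
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible across frames.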

We compare our cross-layer framework with the high-rate-first (HRF) scheme [21], the modified proportional fair (MPF) scheme [10], the rate maximization with fairness consideration (RMF) scheme [20], and the QoS guarantee (QG) scheme [22]. HRF always first selects the MSS with the highest transmission rate C_i to serve. MPF assigns priorities to MSSs, where an MSS with a higher C_i value and a lower amount of received data is given a higher priority. RMF first allocates resources to unsatisfied MSSs according to their minimum requirements, where MSSs are sorted by their transmission rates. If resources remain, they are allocated to the MSSs with higher transmission rates. Similarly, QG first satisfies the minimum requirements of each MSS's traffic, which is divided into real-time and non-real-time traffic. Then, the remaining resources are allocated to MSSs with higher transmission rates. Because both HRF and MPF implement only the scheduler, we adopt the scheme in [4] as their burst allocator. In our framework, we use B = 5 buckets and set γ = 0.3 in P2, unless otherwise stated. In Section VI-F, we discuss the effects of these two parameters on the system performance. The duration of each experiment is at least 1000 frames.

A. Network Throughput

We first compare the network throughput under different numbers of MSSs (i.e., n), where the network throughput is defined as the amount of MSSs' data (in bits) transmitted by the BS during 1000 frames. We observe the case when the network becomes saturated, where there are 60 ∼ 90 MSSs to be served. Fig. 8 shows the simulation results under both the SUI and the Markov scenarios, where the trends are similar. Explicitly, when the number of MSSs grows, the throughput increases but eventually becomes steady when there are too many MSSs (i.e., n ≥ 80). The throughput under the SUI scenario is lower than that under the Markov scenario, because some MSSs may move around the boundary of the BS's coverage, leading to a lower SNR and, thus, a lower MCS. Under the Markov scenario, a higher p_c means that each MSS may change its MCS more frequently, and vice versa.

In general, both RMF and QG ignore the effect of IE overheads on network performance, so their throughput is degraded. Although HRF first serves MSSs with higher transmission rates, its throughput is not the highest. The reason is that HRF not only ignores the importance of IE overheads but also neglects the effect of the C_i/C_i^avg factor on potential throughput when scheduling traffic. The throughput of MPF is higher than that of RMF, QG, and HRF for the following two reasons. First, MPF prefers MSSs that use higher transmission rates, which is similar to HRF. However, HRF incurs higher IE overheads because of its scheduling methodology (which will be verified in Section VI-B). Second, both RMF and QG try to schedule every traffic flow in each frame, which generates too many IEs (in fact, we can postpone scheduling some traffic to reduce IE overheads while still guaranteeing long-term fairness, as will be verified in Sections VI-B and C). On the other hand, MPF enjoys higher throughput because it takes care of IE overheads from the viewpoint of the scheduler. In particular, our cross-layer framework has the highest throughput in most cases for the following two reasons. First, our scheduler assigns a higher priority to MSSs with higher C_i and C_i/C_i^avg values and thus lets MSSs receive their data at higher transmission rates. Second, both our scheduler and burst allocator can effectively decrease the number of IEs and acquire more subframe space for data transmission. Note that, when n = 90, our cross-layer framework tries to satisfy a large amount of urgent traffic to avoid its packets being dropped. In this case, its throughput is slightly lower than that of MPF, but our cross-layer framework can significantly reduce the real-time packet-dropping ratio, as will be shown in Section VI-D.

B. IE Overheads and Subframe Utilization

Fig. 9 shows the average number of IEs in each downlink subframe. As discussed earlier, HRF, RMF, and QG do not consider IE overheads; therefore, they generate a large number of IEs. The situation becomes worse when the number of MSSs grows, because each MSS needs to be allocated at least one burst (and, thus, one IE). By considering IE overheads in the scheduler, MPF can reduce the average number of IEs per frame. It can be observed that, when the number of MSSs grows, the number of IEs in MPF decreases. The reason is that MPF allocates more resources to MSSs in a frame to reduce the total number of scheduled MSSs, thus reducing the number of allocated bursts (and IEs). In Fig. 9, our cross-layer framework generates the smallest number of IEs per frame, because both the proposed scheduler and burst allocator consider IE overheads, and the framework can adjust the amount of nonurgent real-time traffic to be served to avoid generating too many bursts.

Fig. 8. Comparison on network throughput. (a) SUI scenario. (b) Markov scenario (p_c = 0.8). (c) Markov scenario (p_c = 0.2).

Fig. 9. Comparison on IE overheads. (a) SUI scenario. (b) Markov scenario.

Fig. 10. Comparison on subframe utilization. (a) SUI scenario. (b) Markov scenario.

IE overheads have a strong impact on the utilization of downlink subframes, as reflected in Fig. 10. Because HRF, RMF, and QG generate a large number of IEs, their subframe utilization is lower than that of MPF and our cross-layer framework. It can be observed that the number of buckets B significantly affects the subframe utilization of our cross-layer framework. In particular, a very large B (e.g., 30) reduces the amount of data carried in each bucket and thus generates many small bursts. On the other hand, a very small B (e.g., 1) may degrade the functionality of buckets, so some resource assignments may not fully utilize the bursts allocated to them. Based on Fig. 10, we suggest setting B = 5 to get the best utilization, and the analysis result in Section VI-G also validates this point.

C. Long-Term Fairness

Next, we verify whether each scheme can guarantee long-term fairness under a highly congested network, where there are 140 ∼ 200 MSSs. Fig. 11 shows the fairness indices (FIs) of all schemes. Recall that the network becomes saturated when there are 80 MSSs. Thus, it is impossible to achieve an FI of 1, because the network resource is not enough to satisfy the requirements of all traffic. Based on Fig. 11, HRF incurs the lowest index, because it always serves MSSs that use higher transmission rates. By considering the amount of allocated data of each MSS, MPF has a higher index than HRF. QG and RMF try to satisfy the minimum requirement of each traffic flow in every frame, thus leading to higher indices. Because RMF allocates the resources to MSSs sorted by their transmission rates, its index is lower than that of QG.

(13)

Fig. 11. Comparison on long-term fairness. (a) SUI scenario. (b) Markov scenario.

Fig. 12. Comparison on real-time packet-dropping ratios under different numbers of MSSs. (a) SUI scenario. (b) Markov scenario.

Our cross-layer framework has the highest FI (more than 0.85) due to the following two reasons. First, our priority-based scheduler only schedules γ ratio of nonurgent real-time traffic to avoid starving non-real-time traffic. Second, our cross-layer framework tries to reduce the IE overheads and acquire more frame space to allocate bursts for MSSs’ traffic. In this case, we have more resources to fairly distribute among MSSs. Thus, our cross-layer framework can maintain long-term fairness, even in a highly congested network.

D. Packet-Dropping Ratios of Real-Time Traffic

We then observe the packet-dropping ratios of real-time traffic, where each MSS generates 0 ∼ 2R^rt_i real-time data in each frame. When a real-time packet is not transmitted within six frames (i.e., 30 ms) after being generated, it is dropped. Fig. 12 shows the real-time packet-dropping ratios of all schemes under 10 ∼ 110 MSSs. Both HRF and MPF distribute resources to MSSs based on the transmission rates without considering the traffic types; therefore, their ratios begin rising when n ≥ 50. In this case, a large amount of non-real-time traffic competes with real-time traffic for the limited resource. On the other hand, the ratios of RMF and QG begin rising when n ≥ 90. Because both RMF and QG try to satisfy the minimum requirements of all traffic in each frame, they can avoid real-time packet dropping when the network is not saturated (i.e., n < 90). Our cross-layer framework has an almost zero ratio for the following three reasons. First, our priority-based scheduler assigns urgent real-time traffic the highest priority. In addition, it schedules a γ ratio of nonurgent real-time traffic to avoid generating too much urgent traffic in the following frames. Second, our bucket-based burst allocator arranges bursts based on the priorities from the scheduler; therefore, the bursts of urgent real-time traffic can be allocated first to avoid packet dropping. Third, both our scheduler and burst allocator try to reduce IE overheads, so more urgent real-time traffic can be served in each frame.

Fig. 13 shows the real-time packet-dropping ratios of all schemes under different admitted non-real-time data rates, where the network is saturated. Because MPF proportionally distributes resources among MSSs, it incurs the highest real-time packet-dropping ratio. On the other hand, because some MSSs with real-time traffic may have higher transmission rates, the ratio of HRF is lower than that of MPF. As discussed earlier, both RMF and QG try to satisfy the minimum requirement of each traffic flow, and thus their ratios become lower. Note that, because QG differentiates real-time traffic from non-real-time traffic, its ratio is lower than that of RMF. Our cross-layer framework always has a zero ratio, because the bursts of urgent real-time traffic are allocated first, and our framework can acquire more frame space to serve urgent real-time traffic by reducing IE overheads.

Because the trends under both the SUI and Markov scenarios are similar, we only show the results under the Markov scenario in the following experiments.

(14)

Fig. 13. Comparison on real-time packet-dropping ratios under different admitted non-real-time rates. (a) SUI scenario. (b) Markov scenario.

Fig. 14. Comparison on non-real-time satisfaction ratios of the bottom 10% MSSs under the Markov scenario.

E. Satisfaction Ratios of Non-Real-Time Traffic

Next, we measure the satisfaction ratios of non-real-time traffic [by (5)] under a saturated network. Fig. 14 shows the satisfaction ratios of non-real-time traffic of the bottom 10% MSSs. When the non-real-time rate is larger than 125 bits/frame, the ratio of HRF is zero, because these bottom 10% MSSs (whose transmission rates must be lower) are starved. The ratio of MPF starts diminishing when the non-real-time rate is larger than 250 bits/frame, because MPF proportionally distributes resources among traffic. By satisfying the minimum requirement of each traffic, the ratios of RMF and QG are close to one. Our cross-layer framework can have a ratio of nearly one for the bottom 10% MSSs, which means that non-real-time traffic will not be starved, although our scheme prefers real-time traffic.

F. Effects of System Parameters

We then observe the effects of the system parameters of our cross-layer framework on network throughput, subframe utilization, and IE overheads under a saturated network (i.e., 90 MSSs). Fig. 15 shows the impact of the number of buckets B on network throughput, utilization, and overhead ratio when Y = 32. Here, the overhead ratio is defined as the ratio of the number of slots used for MAP information (i.e., DL-MAP, UL-MAP, and IEs) to the total number of slots in a downlink subframe. In general, the utilization decreases when the overhead ratio increases, because they are complementary.

Fig. 15. Effect of the number of buckets B on the network throughput, subframe utilization, and IE overheads under the Markov scenario.

Based on Fig. 15, the utilization first increases and then decreases as B grows. The initial increase arises because, with fewer buckets, some resource assignments do not fully utilize their allocated bursts; the later decrease arises because the burst allocator generates too many bursts to fill the thinner buckets. The overhead ratio increases with B, because more IEs are generated. In addition, when B ≤ 4, the throughput increases as B grows, because more buckets can serve more requests. When B ≥ 8, this trend reverses, because more IEs are generated, lowering the utilization. Based on Fig. 15, we suggest setting B = 4∼8, because this range of B improves both throughput and utilization while keeping IE overheads low.

Fig. 16 shows the effects of γ and B on the real-time packet-dropping ratio and network throughput in our cross-layer framework. The real-time packet-dropping ratio decreases as γ grows, because more real-time traffic can be served. However, increasing γ may also decrease the throughput, because the scheduler must select more nonurgent real-time traffic to serve; some of this traffic may have lower transmission rates, which degrades the throughput. As noted above, a large B generates more IEs and thus reduces the utilization, so the throughput for larger B (e.g., B = 15 and 30) starts dropping earlier than for smaller B (e.g., B = 5 and 10). Based on Fig. 16, we suggest setting γ = 0.15∼0.45, because this range of γ not only improves the network throughput but also reduces the real-time packet-dropping ratio under different values of B.


Fig. 16. Effect of γ on the network throughput and real-time packet-dropping ratios under the Markov scenario. (a) Smaller B. (b) Larger B.

Fig. 17. Effect of the number of buckets (B) on the throughput loss L (by analysis) and the total network throughput (by simulation) under the Markov scenario.

G. Verification of Throughput Analysis

Finally, we verify our analytic results for the case where two transmission rates, c_low = 48 bits/slot and c_high = 96 bits/slot, are adopted. The probabilities that an MSS uses c_low and c_high are both 0.5. Then, the probability that m of the n MSSs use c_low is

Prob[Ñ_L = m] = C(n, m) × (Prob[rate = c_low])^m × (Prob[rate = c_high])^(n−m) = [n! / (m!(n−m)!)] × (0.5)^m × (0.5)^(n−m) = (0.5)^n × n! / (m!(n−m)!).
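This is simply a binomial distribution with success probability 0.5. A quick numerical sanity check (a sketch; n = 4 is chosen arbitrarily for illustration):

```python
from math import comb

def prob_m_low(n, m, p_low=0.5):
    """Prob[N_L = m]: exactly m of n MSSs use the low rate c_low,
    when each MSS independently uses c_low with probability p_low."""
    return comb(n, m) * p_low**m * (1 - p_low) ** (n - m)

# With p_low = 0.5 this reduces to (0.5)^n * n! / (m! (n-m)!).
n = 4
dist = [prob_m_low(n, m) for m in range(n + 1)]
print(dist)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(dist))  # the probabilities sum to 1
```

The symmetric shape reflects the equal 0.5 probabilities of the two rates; with unequal rate probabilities, `p_low` would simply shift the mass toward one end.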

In addition, the probability that an MSS M_i has urgent data is

Prob[I_i^U = 1] = (1 − γ)^(T_D − 1)

where T_D is the deadline (in frames) after which real-time data are dropped. Note that, because the scheduler serves all queued real-time data of the top γn MSSs in each frame, after (T_D − 1) frames, the probability that an MSS still has urgent data is no more than (1 − γ)^(T_D − 1). In our simulation, we set γ = 0.3, T_D = 6 frames, and R = 200 bits/frame.
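With the simulation settings above, the urgent-data probability bound evaluates to a small constant; a one-line check (a sketch of the formula, not of the simulator):

```python
def urgent_prob_bound(gamma, t_d):
    """Upper bound on the probability that an MSS still holds urgent
    real-time data after (T_D - 1) frames: (1 - gamma)^(T_D - 1)."""
    return (1 - gamma) ** (t_d - 1)

# Simulation settings from the text: gamma = 0.3, T_D = 6 frames.
print(round(urgent_prob_bound(0.3, 6), 5))  # 0.7 ** 5 = 0.16807
```

So under these settings fewer than 17% of MSSs are expected to hold urgent data in any given frame, which is what allows the top-γn serving rule to keep packet dropping low.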

Fig. 17 shows the analysis and simulation results. When B < 4, the throughput loss L decreases and the network throughput increases as B increases. On the other hand, when B > 8, L increases and the network throughput decreases as B increases.

This result indicates that the minimum of L from the analysis appears in the range B = [4, 8], where the maximum network throughput from the simulation also appears. Thus, our analysis and simulation results are consistent. Based on Fig. 17, we suggest setting B = 4∼8 to maximize the network throughput and minimize L, which matches the results in Fig. 15. Therefore, our analysis validates the simulation results and provides guidelines for configuring the burst allocator.

VII. CONCLUSION

In this paper, we have proposed a cross-layer framework that covers the issues of overhead reduction, real-time and non-real-time traffic scheduling, and burst allocation in an IEEE 802.16 OFDMA network. Compared with existing solutions, our framework is more complete, because it involves codesigning both the two-tier priority-based scheduler and the bucket-based burst allocator. Our scheduler reduces potential IE overheads by adjusting the number of MSSs to be served. With its two-tier priority rules, it guarantees real-time traffic delays, ensures satisfaction ratios of non-real-time traffic, and maintains long-term fairness. On the other hand, our burst allocator incurs low complexity and guarantees a bounded number (k + (Y/Δbkt) − 1) of IEs to accommodate data bursts. In addition, it follows the priority rule from the scheduler to avoid dropping packets of urgent real-time traffic. We have also analyzed the impact of the number of buckets on the throughput loss. Through both analyses and simulations, we show how to adjust the system parameters to reduce IE overheads, improve subframe utilization, and enhance network throughput. These results verify that such a cross-layer framework significantly improves the resource allocation and utilization of downlink communications in WiMAX networks. For future work, we will investigate how to optimize the scheduler and burst allocator for particular cases, e.g., various traffic characteristics and MSS densities. We will also consider extending our results to WiMAX relay networks.


