Chung Hua University Master's Thesis

Title: Developing the Hardware of a Worst-Case Fair Weighted Fair Queuing Scheduling Algorithm for High-Speed Networks

Develop the Hardware of the WF2Q Algorithm (Worst-Case Fair Weighted Fair Queuing) for High-Speed Networks

Department: Graduate Program, Department of Computer Science and Information Engineering
Student ID / Name: E09102007, Lin Chung-Chih
Advisor: Dr. Hsu Wen-Lung

June 2005
Abstract in Chinese

With the prevalence of the Internet, the rapid growth of upper-layer applications has brought an explosion of data traffic (the demand from video conferencing and Internet telephony being especially high). The lower-layer packet scheduler must let packets leave the router fairly and quickly to satisfy routers on backbone networks that must forward millions of packets per second.

Among fair packet scheduling algorithms, GPS, proposed in [4], gives the fairest answer to the packet scheduling problem. In a real network environment, however, packets are not fluid and cannot be divided arbitrarily, so a fair packet scheduling algorithm, Weighted Fair Queuing, was proposed that satisfactorily approximates the behavior of GPS. As network traffic grows rapidly, packet scheduling algorithms executed in software often cannot process the excessive number of packets efficiently; this overloads the router's CPU, overflows buffers, and causes packets that wait too long to be dropped, and TCP's retransmission of the lost packets further increases network congestion.

In this thesis we develop WF2Q [2] hardware to improve on the execution speed of solving the packet scheduling problem with WF2Q in software. We design and functionally verify the hardware in the Verilog HDL. Our comparative analysis shows that the hardware we designed is roughly one hundred times faster than the software.
Abstract

In today's network environment, upper-layer applications are increasing sharply, causing network traffic to grow fast. The lower-layer packet scheduler must be fair in letting packets leave the router, in order to keep up with the millions of packets transmitted per second on the backbone.

GPS [4] gives the fairest answer to packet scheduling; however, packets are not fluid in a real network environment. Therefore Weighted Fair Queuing (WFQ) was proposed as a packet-by-packet approximation of GPS, and J.C.R. Bennett and H. Zhang later proposed WF2Q to tighten this approximation [2].

Due to growing network traffic, a packet scheduling algorithm executed in software often cannot process excessive packets effectively; this overloads the system, overflows buffers, and drops packets, and the resulting retransmissions further increase network congestion.

We develop the hardware of WF2Q to overcome this software drawback and compare the performance of the two approaches. Our model shows that the hardware WF2Q exceeds the software approach by almost a hundred times.
誌 謝
本論文順利完成,首先要感謝我的指導教授 許文龍博士,在此期間敦敦 教悔與辛勤教導,對於教授教導及鼓勵,謹此表達心中最誠致的敬意及感謝。
此外要感同學幸洲、志卿、廣嶺、國松的互相鼓勵,還有關心我的同事及 好友們,感謝您們平時的協助及鼓勵,在我心情低落而有所倦怠時,能夠適時 的激發我的鬥志與積極的精神。
最後要感謝我的家人,在這段期間的支持及照顧,雖然有時感到疲憊與沮 喪,但溫柔的親情總能消除一切的疲勞,繼續努力不懈。
僅以本文獻給我最敬愛的師長、最親愛的家人及所有關心支持我的人,願 將這份研究成果與喜悅與您們分享。
林忠志 九十四年六月
Table of Contents

Abstract in Chinese
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivations
  1.2 Thesis Organization
Chapter 2 Background and Related Researches
  2.1 Packet process flow in router
  2.2 Requirements of a scheduling algorithm
  2.3 Overview of several queue scheduling algorithms
  2.4 GPS and WFQ
  2.5 The design flow
Chapter 3 The Overall Design of Our HWF2Q
  3.1 Explain WF2Q from C code
  3.2 Describe the detailed Verilog HDL code for our design
Chapter 4 Simulation and Performance Analysis
  4.1 Simulation Results
  4.2 Synthesis results and analysis
Chapter 5 Conclusions and Future Works
References
Appendix 1 Detailed Verilog code
Appendix 2 DC Script File
List of Figures

Figure 2.1: The General Block Diagram of Packet Flow in a Router
Figure 2.2: First-in, First-out (FIFO)
Figure 2.3: Priority Queuing
Figure 2.4: Fair Queuing
Figure 2.5: An example of generalized processor sharing
Figure 2.6: Example of WFQ and WF2Q
Figure 2.7: Design Flow
Figure 3.1: Main procedures of the WF2Q Algorithm
Figure 3.2: Block Diagram of HWF2Q
Figure 3.3: Circuit block diagram of our Verilog code
Figure 4.1: Test 6 result --- WF2Q
Figure 4.2: Waveform of test 6 result --- WF2Q
Figure 4.3: Test 5 result --- WF2Q
Figure 4.4: Test result --- WFQ
Figure 4.5: Test result --- WF2Q
List of Tables

Table 2.1: Compare GPS with WFQ
Table 4.1: Test models for HWF2Q
Table 4.2: The comparison between packet size and outline speed for HWF2Q
Table 4.3: Analyze procedures of WF2Q
Chapter 1 Introduction
In this chapter, we first give the research motivations and then describe the organization of this thesis.
1.1 Research Motivations
Along with advances in technology, communication systems and computer software and hardware have developed rapidly, causing a sudden increase in the data carried by networks as communication data are transmitted ever faster. Packets must therefore leave the router quickly to avoid congestion.

It is important to minimize the amount of congestion in the network, because congestion decreases overall packet throughput, increases end-to-end delay, and can lead to packet loss [1] if there is insufficient buffer memory to store all of the incoming packets. The transmitter retransmits the packets that have been dropped by the router.

There are many different queuing algorithms, each trying to find the correct balance between complexity, fairness, and control.

The scheduler does its best effort to select packets from the queues fairly. Since fairness is a problem, many researchers have devoted themselves to it. We try to improve the execution efficiency of the scheduling algorithm by implementing it in hardware.
1.2 Thesis Organization
This thesis is organized as follows: In chapter 2, we introduce background knowledge and related researches. In chapter 3, we will explain the overall design of our HWF2Q.
In chapter 4, our HWF2Q is simulated to verify its functional correctness and to analyze the performance of our design.
Finally, conclusions and future works are given in chapter 5.
Chapter 2
Background and Related Researches
In this chapter, an overview of the packet processing flow in a router is given in section 2.1. In section 2.2, we describe the requirements of a scheduling algorithm. In section 2.3, we introduce a number of scheduling algorithms and analyze them. Then we introduce GPS and WFQ in section 2.4. Finally, we describe the design flow in section 2.5.
2.1 Packet process flow in router
The packet processing flow in a router is illustrated in Figure 2.1.
Figure2.1: The General Block Diagram of Packet Flow in a Router
Classifying: every arriving packet is classified into a different priority queue according to the weight in its header field.

PHY: for incoming packets, this stage performs analog-to-digital conversion, translating electrical pulses into a digital bit stream.

MAC: the MAC performs access control for all the attached nodes according to whichever MAC algorithm it uses.

Scheduling: a scheduling algorithm selects packets according to each packet's service weight.
2.2 The requirements of a packet scheduling algorithm

1. Support the fair distribution of bandwidth among the different service classes competing for bandwidth on the output port.

2. Furnish protection between the different service classes on an output port, so that a poorly behaved queue cannot take bandwidth from, or delay, the other service classes assigned to that output port.

3. Provide an algorithm that can be implemented in hardware, so it can arbitrate access to bandwidth on the highest-speed router interfaces without negatively impacting system performance.

2.3 Overview of several queuing algorithms
In this section, we describe a number of queue scheduling disciplines and analyze them.
2.3.1 First-in, First-out (FIFO) Queuing
FIFO queuing is the most basic scheduling algorithm. In FIFO queuing, all packets are treated equally by placing them into a single queue and then servicing them in the same order in which they were placed into the queue. As illustrated in Figure 2.2, eight flows arrive at the router; the multiplexer sends their packets into the FIFO queue in order of arrival.
■ FIFO Analysis
1. Low computational load on the system.

2. Packets are not reordered, and the maximum delay is determined by the maximum depth of the queue [1].

3. A single FIFO queue does not allow routers to organize buffered packets and then service one class of traffic differently from other classes of traffic [3].

4. A bursting flow can consume the entire buffer space of a FIFO queue, which causes all other flows to be denied service until the burst is serviced. This can result in increased delay, jitter, and loss for the other well-behaved TCP and UDP flows traversing the queue.
Figure 2.2: First-in, First-out (FIFO)
2.3.2 Priority Queuing (PQ)
Priority queuing (PQ) is the basis for a class of queue scheduling algorithms that are designed to provide a relatively simple method of supporting differentiated service classes. As illustrated in Figure 2.3, packets are scheduled from the head of a given queue only if all queues of higher priority are empty.
■ Priority Queuing Analysis
If the amount of high-priority traffic is not policed, lower-priority traffic may experience excessive delay.
2.3.3 Fair Queuing
Fair queuing (FQ) was proposed by John Nagle in 1987 [3]. As illustrated in Figure 2.4, packets are first classified into flows by the system and then assigned to a queue that is dedicated to that flow. Queues are then serviced one packet at a time in round-robin order.

Figure 2.3: Priority Queuing

Figure 2.4: Fair Queuing
■ Fair Queuing Analysis
1. An extremely bursting or misbehaving flow does not degrade the quality of service delivered to other flows, because each flow is isolated into its own queue.

2. FQ is typically applied at the edges of the network, where subscribers connect to their service provider. Vendor implementations of FQ typically classify packets into 256, 512, or 1024 queues using a hash calculated across the source/destination address pair, the source/destination UDP/TCP port numbers, and the IP ToS byte.

2.4 GPS and WFQ
In this section, we describe Generalized Processor Sharing (GPS) and Weighted Fair Queuing (WFQ), and then discuss how WF2Q improves on WFQ.
2.4.1 GPS characteristics
Generalized Processor Sharing (GPS) allows different queues to have different service shares [4] by assigning a different weight to each queue. GPS has several nice properties. Since each flow has its own queue, a bursty flow (one that is sending a lot of data) will only distort itself and not the other flows. In [4] Parekh showed that in a network of GPS switches with flows that are leaky-bucket constrained, an end-to-end delay bound can be guaranteed.

GPS is work conserving (the server must be busy if there are packets waiting in the system) and operates at a fixed rate γ. It is characterized by weights (positive real numbers) given to the flows.
Let S_i(τ, t) be the amount of session i traffic served in the interval (τ, t]. A GPS server is then defined as one for which

    S_i(τ, t) / S_j(τ, t) >= φ_i / φ_j,   j = 1, 2, ..., N

for any session i that is backlogged throughout the interval [τ, t]. Summing over all sessions j,

    S_i(τ, t) · Σ_j φ_j >= (t − τ) γ φ_i,

and session i is guaranteed a rate of

    g_i = γ φ_i / Σ_j φ_j.

Figure 2.5 illustrates generalized processor sharing.
Variable-length packets arrive from both sessions over infinite-capacity links. For i = 1, 2, let A_i(0, t) be the amount of session i traffic that arrives at the system in the interval (0, t], and S_i(0, t) the amount of session i traffic served in that interval.
Figure 2.5 An example of generalized processor sharing
2.4.2 Weighted Fair Queuing (WFQ)
Because GPS is an idealized fluid model, it is impractical. Therefore the WFQ algorithm (Weighted Fair Queuing) was proposed to approximate the GPS behavior [4].

Weighted Fair Queuing (WFQ) is a packet scheduling technique allowing guaranteed-bandwidth services. The purpose of WFQ is to let several queues share the same link.
In WFQ, one packet at a time is selected and output from among the active queues. This works as follows. Each arriving packet is given virtual start and finish times. The virtual start time S(k, i) and the virtual finish time F(k, i) of the k-th packet in queue i are computed as

    S(k, i) = max{ F(k−1, i), V(a(k, i)) }
    F(k, i) = S(k, i) + L(k, i) / r(i)

with F(0, i) = 0, where a(k, i) and L(k, i) are the arrival time and the length of the packet, respectively, V is the system virtual time, and r(i) is queue i's service rate.
The packet selected for output is the packet with the smallest virtual finish time. In [4] Parekh describes a Packet GPS (PGPS) algorithm which is identical to the WFQ algorithm described here. He derives several relationships between a fluid GPS system and a corresponding packet WFQ system: the finish time of a packet in the WFQ system will be later than in the GPS system by at most the transmission time of a maximum-sized packet, and the number of bits serviced in a queue by the WFQ system will not fall behind the GPS system by more than one maximum-sized packet. The result is shown in Table 2.1.
Session 1 packets (arrival time, size): (1, 1), (2, 1), (3, 2), (11, 2)
Session 2 packets (arrival time, size): (0, 3), (5, 2), (9, 2)

Queue weights Ø1 = Ø2:
  Session 1 finish times: GPS 3, 5, 9, 13; PGPS 4, 5, 7, 13
  Session 2 finish times: GPS 5, 9, 11; PGPS 3, 9, 11

Queue weights Ø1 = 2·Ø2:
  Session 1 finish times: GPS 4, 5, 9, 13; PGPS 4, 5, 9, 13
  Session 2 finish times: GPS 4, 8, 11; PGPS 3, 7, 11

Table 2.1: Compare GPS with WFQ
2.4.3 Worst Case Fair Weighted Fair Queuing (WF2Q)
Although WFQ and WF2Q both take approaching GPS as their goal, WF2Q additionally checks whether the virtual clock has passed a packet's virtual start time. Without this check, WFQ can transmit packets too early. For example, in Figure 2.6 the first queue is assigned 50% of the bandwidth and each of the other queues 5%. Assume the packet size is 1 and the output port rate is 1. At time 0, eleven packets enter the first queue and one packet enters each of queues 2 to 11. In this situation, the k-th packet of the first queue has virtual start time 2(k−1) and virtual finish time 2k, while the first packet of each of queues 2 to 11 has virtual start time 0 and virtual finish time 20.

Because WFQ considers only the packets' virtual finish times, it transmits the first ten packets of the first queue before time 10 and only then starts to transmit the packets of queues 2 to 11. As Figure 2.6 shows, WFQ therefore has two shortcomings:

1. The delay jitter is too big: the first ten packets of the first queue leave the system back to back within the first 10 seconds, while the queue's 11th packet leaves much later; the difference between these departure times is too large.

2. WFQ can transmit too quickly: within 10 seconds, WFQ has already transmitted 10 packets of the first queue, whereas GPS would have transmitted only 5. In a system with many queues this can be even more serious.

Therefore WF2Q gives a better packet transmission order than WFQ.
Figure 2.6 An Example of WFQ and WF2Q
2.5 The Design Flow
Our design flow is shown in Figure 2.7: algorithm analysis, hardware modeling, system design, and simulation/synthesis.

Figure 2.7: Design Flow
Chapter 3
The Overall Design of Our HWF2Q
In section 3.1, we first show the four main procedures of the WF2Q algorithm, as in Figure 3.1, and then explain their functions with a C program. In section 3.2, we explain the overall design of our hardware WF2Q (HWF2Q).
3.1 Explain WF2Q from C program
In section 3.1, we first show the main procedures of the WF2Q algorithm, as in Figure 3.1.
Figure 3.1 Main procedures of WF2Q Algorithm
The overall procedure of the WF2Q algorithm is shown below.
[Procedure: WF2Q Algorithm]
★ Step 0: Setting the parameters
● Flow quantity
● Flow’s size: It is FIFO size of each flow.
● Each flow weight
● Initial virtual time: it contains the global virtual time and each flow's virtual start and virtual finish times.
★ Step 1: Look for the candidate flow with the earliest finish time.
In this procedure, we first find the eligible queues and then compare the virtual finish times among them.
[C code: Look for the candidate flow with the earliest finish time]

packet* Look_earliest_time( )
{
    Packet *pkt = NULL;     /* will hold the head-of-line packet of the winner */
    int i;
    double minF = DBL_MAX;  /* start from "infinity"; needs <float.h> */
    int flow = -1;

    for (i = 0; i < MAXFLOWS; i++) {
        if (!fs_[i].qcrtSize_) continue;           /* skip empty flows */
        if (fs_[i].S_ <= V && fs_[i].F_ < minF) {  /* eligible (S_ <= V) with the earliest finish */
            flow = i;
            minF = fs_[i].F_;
        }
    }
    /* dequeueing the head-of-line packet of the chosen flow is omitted in this excerpt */
    return pkt;
}
★ Step 2: Set the start and the finish times of the remaining packets in the queue.
This procedure calculates the transmission time and updates the virtual start time and the virtual finish time. The process is below:
★ Step 3: Update the global virtual clock
In this procedure, we first find the minimum virtual start time and then compare the global virtual clock with it. The process is below:
[C code: Set the start and the finish times of the remaining packets in the queue]

void Set_start_finish_time(Packet* nextPkt)
{
    fs_[flow].S_ = fs_[flow].F_;  /* the next packet inherits the previous finish time */
    fs_[flow].F_ = fs_[flow].S_ + (nextPkt->size() / fs_[flow].weight_);
    /* the weight parameter must not be 0 */
}
[C code: Update the virtual clock]

void Update_vt(double dequeuedS)  /* called with fs_[flow].S_ of the flow just served */
{
    double minS = dequeuedS;
    double W = 0;  /* must start from 0 */
    int i;

    for (i = 0; i < MAXFLOWS; ++i) {
        W += fs_[i].weight_;        /* compute the total flow weight */
        if (fs_[i].qcrtSize_)       /* find the minimum virtual start time among backlogged flows */
            if (fs_[i].S_ < minS) minS = fs_[i].S_;
    }
    V = max(minS, (V + ((double)pktSize / W)));  /* pktSize: size of the packet just served */
}
3.2 Overall Design
In this section, we first introduce the circuit block diagram of our design (HWF2Q). Then we describe the design in Verilog HDL.
3.2.1 Circuit Block Diagram
Figure 3.2 is a block diagram of the HWF2Q. In this diagram there is one circuit block, the Time Stamp generator (TS_generator). The FIFOs are buffers that temporarily store the packet headers; the Time Stamp generator calculates and updates the head-of-line packets' virtual times and then sends selection signals (sel_flow) to the queuing manager. The chosen packet is dequeued according to the signal received from the Timestamp Generator.

The TS_monitor is a 64-bit signal that indicates the content of the head-of-line packet. It provides the latest information, including the packet's size and arrival time, for the TS_generator to compute the timestamp.
Figure 3.2 Block Diagram of Hardware WF²Q
3.2.2 Describe our design by verilog HDL
Our HWF2Q includes the four main procedures shown in Figure 3.1. We describe the four main procedures in Verilog HDL. (The detailed code is in Appendix 1.)
★ Step 0: Setting the parameters
● Flow quantity
● Flow’s size: It is FIFO size of each flow.
● Each flow weight
● Initial virtual time: it contains the global virtual time and each flow's virtual start and virtual finish times.
module TS (clk, rst,
           linehead_1, linehead_2, linehead_3, linehead_4,
           sel_flow1, sel_flow2, sel_flow3, sel_flow4);  // flow quantity

input  [63:0] linehead_1, linehead_2, linehead_3, linehead_4;  // flow size (FIFO's input)
input  rst, clk;
output sel_flow1, sel_flow2, sel_flow3, sel_flow4;  // output to QM for selection
...
Txtime = linehead_3[15:0] * 5;  // flow3's weight is 20% (1/0.2 = 5)
★ Step 1: Look for the candidate flow with the earliest finish time.
st0: begin  // compare each flow's FT_vir with the other flows
    ...
    if (temp_FT_vir1 < temp_FT_vir2)  // the masked temp_FT_vir values are compared directly
    begin
        min_FT_vir = temp_FT_vir1;
        sel_flow1 = 1;
        sel_flow2 = 0;
        sel_flow3 = 0;
        sel_flow4 = 0;
        ...
    end
    else
    begin
        min_FT_vir = temp_FT_vir2;
        sel_flow1 = 0;
        sel_flow2 = 1;
        sel_flow3 = 0;
        sel_flow4 = 0;
        ...
★ Step 2: Set the start and the finish times of the remaining packets in the queue

st1: begin  // update the dequeued flow's start & finish time
    nextst = st2;
    if ({sel_flow1, sel_flow2, sel_flow3, sel_flow4} == 4'b1000)  // for flow1
    begin
        ST_vir1 = FT_vir1;               // propagate FT_vir to the next packet's ST_vir
        txtime  = linehead_1[15:0] * 5;  // flow1's weight is 20%
        FT_vir1 = ST_vir1 + txtime;      // update the finish time
    end
    else if ({sel_flow1, sel_flow2, sel_flow3, sel_flow4} == 4'b0100)
    ...
★ Step 3: Update the global virtual clock

First find the minimum virtual start time, and then compare the global virtual clock with it.

st0: begin  // compare each flow's FT_vir with the other flows
    ...
    pre_cal_vt = vt + linehead_1[15:0];  // pre-computed for use in st3
    ...
end

st2: begin  // find the minimum start time after updating the S, F times: min(S)
    nextst = st3;
    if (ST_vir1 < ST_vir2)
        min_ST_vir = ST_vir1;
    else
        min_ST_vir = ST_vir2;
    if (ST_vir3 < min_ST_vir)
        min_ST_vir = ST_vir3;
    if (ST_vir4 < min_ST_vir)
        min_ST_vir = ST_vir4;
end

st3: begin  // update the global virtual clock
    nextst = st0;
    if (min_ST_vir > pre_cal_vt)
        vt = min_ST_vir;
    else
        vt = pre_cal_vt;
    ...
3.2.3 Circuit Block diagram of our verilog code
Figure 3.3 Circuit Block diagram of our verilog code
Chapter 4
Simulation and Performance Analysis
In this chapter, we verify the functional correctness and efficiency of our design. Our design was written in Verilog HDL; the results of each test model are presented in section 4.1. The performance comparison between hardware and software is discussed in section 4.2.
4.1 Simulation Results
We set up different test models (Table 4.1) to verify that WF2Q properly handles packets under different weight and length situations. We describe several test models below.

Test    Session weights (Q1/Q2/Q3/Q4)   Packet lengths (Q1/Q2/Q3/Q4)   Check
Test 1  25% / 25% / 25% / 25%           same size                      Pass
Test 2  25% / 25% / 25% / 25%           random                         Pass
Test 3  50% / 20% / 20% / 10%           same size                      Pass
Test 4  50% / 20% / 20% / 10%           random                         Pass
Test 5  25% / 25% / 25% / 25%           gradually increasing size      Pass
Test 6  50% / 20% / 20% / 10%           50 / 10 / 10 / 10 bytes        Pass
Test 7  25% / 25% / 25% / 25%           50 / 10 / 10 / 10 bytes        Pass

Table 4.1: Test models
4.1.1 Test 6 result :
In this test, we use weights of 50% for Q1, 20% for Q2 and Q3, and 10% for Q4. Q1 has a fixed packet size of 50 bytes, while Q2, Q3, and Q4 all have a packet size of 10 bytes. The result is shown in Figure 4.1 and Figure 4.2. As we can see, all queues follow their weighted slopes and Q4 still gets its service, even though it has only a 10% bandwidth share. The larger step on Q1 is due to the bigger packet.
Figure 4.1: Test 6 result --- WF2Q (Q1: 50% BW; Q2, Q3: 20% BW; Q4: 10% BW)

Figure 4.2: Waveform of test 6 result --- WF2Q
4.1.2 Test 5 result :
The result is shown in Figure 4.3. In this test we use a weight of 25% for each of Q1, Q2, Q3, and Q4. The packet sizes in all four queues gradually increase starting from 10 bytes, and each flow receives an equal amount of traffic.

Figure 4.3: Test 5 result --- WF2Q
4.1.3 Compare test 3 result with WFQ
Figure 4.4 and Figure 4.5 use exactly the same testing condition: each flow's weight is Q1 50%, Q2 and Q3 20%, Q4 10%, and all packets have the same size. In this test we can see that both WFQ and WF2Q follow the weights' slopes; however, if we zoom in to a smaller time scale, we can see that WFQ cannot really handle the burst packets.

Figure 4.4: Test result (same packet size) --- WFQ (Q1: 50% BW; Q2, Q3: 20% BW; Q4: 10% BW)

Figure 4.5: Test result (same packet size) --- WF2Q
4.2 Synthesis results

For the entire test we assume that all packets have already been classified into four queuing buffers, that the scheduler extracts packets from those buffers, and that the packet header field indicates the class level of each packet. We have simulated and synthesized WF2Q and WFQ using Toshiba TC240C as our target library. It is a standard-cell library in a 0.25 µm technology with 2.5/3.3 V supplies [11]. In our synthesis results, the total grid area is 28001.5 to 33466.1, and the gate count with respect to the Toshiba standard cell CND2xL(4) is approximately 7000 gates.

We tried several different clock speeds in our script file, and the scheduler can run with a clock period down to 5 ns, which is 200 MHz. In this case the most critical path has a 3.42 ns data required time and a -2.07 ns data arrival time, meaning the slack is met. We also checked the difference in grid area when running at different clock speeds and found that increasing the clock speed does not affect the grid area much.
4.3 Performance Analysis
In this section we present the performance analysis of our HWF2Q.
4.3.1 Scheduler requirements for high-speed networks

The complexity of scheduling algorithms inhibits their use in high-speed switches, which must select a packet to transmit within a few microseconds. In addition, performance must scale to tens of thousands of connections multiplexed onto a single link, with connection throughput requirements varying over a wide range [12].

Internet backbone speeds now surpass the gigabit range, so inside the router the scheduler must keep up with the outline speed. The packet size of multimedia data (MPEG-4) is approximately 50 bytes, so the timing works out as follows:

(1) To = packet size / outline speed = 50 * 8 bits / 1 Gbps = 4 * 10^-7 s
(2) Ti = scheduling time

Ti is derived in section 4.3.2; the comparison is shown in Table 4.2.
Outline speed   Packet size (bytes)   To (s)       Ti software (s)   Ti hardware (s)   To > Ti (software)   To > Ti (hardware)
10 Giga         50                    4 * 10^-8    1 * 10^-6         5 * 10^-9         No                   Yes
10 Giga         100                   8 * 10^-8    1 * 10^-6         5 * 10^-9         No                   Yes
10 Giga         1000                  8 * 10^-7    1 * 10^-6         5 * 10^-9         No                   Yes
10 Giga         10000                 1 * 10^-6    1 * 10^-6         5 * 10^-9         No                   Yes

Table 4.2: The comparison between packet size and outline speed for HWF2Q (Ti in clocks at 1 GHz)
4.3.2 Methodology
We generated an X86 assembly code listing from the C code. From the detailed assembly code we were able to count instruction path lengths through the major code sections. By referring to the instruction-set table of the Pentium II CPU, we know the clock-cycle count of every instruction. The count is not perfectly accurate, however, because some instructions do not take a fixed number of clocks; for example, CMP reg,reg takes 2 or 3 clocks.
We compare the execution cycles of the WF2Q operations between hardware and software. The performance comparison between the hardware (refer to Figure 4.6) and software approaches is given in Table 4.3.
Function                                                     Hardware (our HWF2Q)   Software             Speed-up
                                                             total clock cycles     total clock cycles
Look for the candidate flow with the earliest finish time    1                      256                  256
Set the start and the finish times of the remaining
packets in the queue                                         1                      360                  360
Update the global virtual clock                              3                      501                  166
Total clocks                                                 5                      1117                 223

Table 4.3: Analyze procedures of WF2Q
Chapter 5
Conclusions and Future Works
In this thesis, we develop hardware for a weighted fair queuing algorithm for high-speed networks, to speed up the solving of the packet scheduling problem.

The primary benefit of WFQ is that an extremely bursty or misbehaving flow does not degrade the QoS delivered to other flows [3], because each flow is isolated into its own queue. If a flow attempts to consume more than its fair share of bandwidth, only its own queue is affected, so there is no impact on the performance of the other queues on the shared output port. However, WFQ does not pass certain test conditions, such as worst-case scheduling [2]. By adding two more steps, which compare each queue's virtual start time with the global virtual time, WF2Q avoids the issues that WFQ has.

There is related research that can be pursued in the future:

(1) Operating in coordination with bandwidth management to revise the WF2Q algorithm's virtual time, so that it conforms to Integrated Services.

(2) Implementing a packet classification algorithm is also an important direction.
References:

[1] L. L. Peterson and B. S. Davie, Computer Networks: A Systems Approach, Second Edition, 2000.
[2] J. C. R. Bennett and H. Zhang, "WF2Q: Worst-case fair weighted fair queueing," IEEE INFOCOM, pp. 120-126.
[3] Juniper Networks, "Supporting Differentiated Service Classes: Queue Scheduling Disciplines," Dec. 2001.
[4] A. Parekh, "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks," PhD dissertation, Massachusetts Institute of Technology, February 1992.
[5] K. Nichols et al., "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers," IETF RFC 2474, December 1998.
[6] L. Louis Zhang and Brent Beacham, "A Scheduler ASIC for a Programmable Packet Switch," IEEE, 2000.
[7] Massoud R. Hashemi, "A Multicast Single-Queue Switch With a Novel Mechanism."
[8] S. Golestani, "A self-clocked fair queueing scheme for broadband applications," in Proceedings of IEEE INFOCOM, pp. 636-646, April 1994.
[9] L. Kleinrock, Queueing Systems, Volume 2: Computer Applications, Wiley, 1976.
[10] A. Demers, S. Keshav, and S. Shenker, "Analysis and simulation of a fair queueing algorithm," Journal of Internetworking: Research and Experience, pp. 3-26, October 1990. Also in Proceedings of ACM SIGCOMM '89, pp. 3-12.
[11] Toshiba Corporation, "Toshiba Data Book: TC240C Series Primitive Cells, I/O Cells (Non-Linear Delay Models)," March 1999.
[12] J. L. Rexford, A. G. Greenberg, and F. G. Bonomi, "Hardware-Efficient Fair Queueing Architectures for High-Speed Networks," in IEEE INFOCOM '96, San Francisco, March 1996.
Appendix 1
Verilog code
module TS (clk,rst,
linehead_1, linehead_2, linehead_3, linehead_4, sel_flow1, sel_flow2, sel_flow3, sel_flow4);
input [63:0] linehead_1, linehead_2, linehead_3, linehead_4;  // FIFO's input
input rst, clk;
output sel_flow1, sel_flow2, sel_flow3, sel_flow4;
// output to QM for selection
reg sel_flow1, sel_flow2, sel_flow3, sel_flow4;
reg [63:0] ST_vir1, ST_vir2, ST_vir3, ST_vir4, //Flow's virtual start time
FT_vir1, FT_vir2, FT_vir3, FT_vir4, //Flow's virtual finish time
vt,  // global virtual time
pre_cal_vt, txtime, min_FT_vir, min_ST_vir;
parameter st0=0, st1=1, st2=2, st3=3;
reg[1:0] currentst, nextst;
wire c1,c2,c3,c4;
wire [63:0] temp_FT_vir1, temp_FT_vir2, temp_FT_vir3, temp_FT_vir4;
assign c1 = (ST_vir1 < vt);  // compare each flow's ST_vir with vt
assign c2 = (ST_vir2 < vt);
assign c3 = (ST_vir3 < vt);
assign c4 = (ST_vir4 < vt);
assign temp_FT_vir1 = (c1 == 1) ? FT_vir1 :
(64'b1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111);
assign temp_FT_vir2 = (c2 == 1) ? FT_vir2 :
(64'b1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111);
assign temp_FT_vir3 = (c3 == 1) ? FT_vir3 :  // this assign was missing in the original listing
(64'b1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111);
assign temp_FT_vir4 = (c4 == 1) ? FT_vir4 :
(64'b1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111);
always@(posedge clk or negedge rst) begin
if (!rst)  // rst is active low (negedge in the sensitivity list)
currentst <= st0;
else
currentst <= nextst;
end
always@(currentst or linehead_1 or linehead_2 or linehead_3 or linehead_4
        or temp_FT_vir1 or temp_FT_vir2 or temp_FT_vir3 or temp_FT_vir4
        or vt or ST_vir1 or ST_vir2 or ST_vir3 or ST_vir4) begin
case(currentst)
st0: begin // compare each flow's FT_Vir with each other //dequeue
nextst= st1;
if (temp_FT_vir1 < temp_FT_vir2)  // the masked temp_FT_vir values are compared directly
begin
min_FT_vir = temp_FT_vir1;
sel_flow1 =1;
sel_flow2 =0;
sel_flow3 =0;
sel_flow4 =0;
pre_cal_vt = vt + linehead_1[15:0];  // for st3 use
end
else
begin
min_FT_vir = temp_FT_vir2;
sel_flow1 =0;
sel_flow2 =1;
sel_flow3 =0;
sel_flow4 =0;
pre_cal_vt = vt + linehead_2[15:0];
// for st3 use
end
if (temp_FT_vir3 < min_FT_vir) begin
min_FT_vir = temp_FT_vir3;
sel_flow1 =0;
sel_flow2 =0;
sel_flow3 =1;
sel_flow4 =0;
pre_cal_vt = vt + linehead_3[15:0];
// for st3 use
end
if (temp_FT_vir4 < min_FT_vir) begin
min_FT_vir = temp_FT_vir4;
sel_flow1 =0;
sel_flow2 =0;
sel_flow3 =0;
sel_flow4 =1;
pre_cal_vt = vt + linehead_4[15:0];
// for st3 use
end
end
st1: begin  // update the dequeued flow's start & finish time
nextst = st2;
if ({sel_flow1, sel_flow2, sel_flow3, sel_flow4} ==4'b1000 ) // For flow1
begin
ST_vir1 = FT_vir1;
// propagate the FT_vir to the next packet's ST_vir
txtime = linehead_1[15:0] * 5;
// flow1's weight is 20%
FT_vir1 = ST_vir1 + txtime;  // update the finish time (missing in the original listing; cf. flows 2-4)
end
else if ({sel_flow1, sel_flow2, sel_flow3, sel_flow4} ==4'b0100 ) // For flow2
begin
ST_vir2 = FT_vir2 ;
// propagate the FT_vir to the next packet's ST_vir
txtime = linehead_2[15:0] * 5;
// flow2's weight is 20%
FT_vir2 = ST_vir2 + txtime;
end
else if ({sel_flow1, sel_flow2, sel_flow3, sel_flow4} ==4'b0010 ) // For flow3
begin
ST_vir3 = FT_vir3 ;
// propagate the FT_vir to the next packet's ST_vir
txtime = linehead_3[15:0] * 5;
// flow3's weight is 20%
FT_vir3 = ST_vir3 + txtime;
end
else if ({sel_flow1, sel_flow2, sel_flow3, sel_flow4} ==4'b0001) // For flow4
begin
ST_vir4 = FT_vir4 ;
// propagate the FT_vir to the next packet's ST_vir
txtime = linehead_4[15:0] * 5;
// flow4's weight is 20%
FT_vir4 = ST_vir4 + txtime;
end
end
st2: begin  // find the minimum start time after updating the S, F times: min(S)
nextst= st3;
if (ST_vir1 < ST_vir2)
min_ST_vir = ST_vir1;
else
min_ST_vir = ST_vir2;
if (ST_vir3 < min_ST_vir)
min_ST_vir = ST_vir3;
if (ST_vir4 < min_ST_vir)
min_ST_vir = ST_vir4;
end
st3: begin // update the virtual time
nextst= st0;
if(min_ST_vir > pre_cal_vt)
vt = min_ST_vir;
else
vt = pre_cal_vt;
end
endcase
end
endmodule