The Performance of the NS2 Network Simulator

using the Event-level Parallelism Approach

Student: Yan-Shiun Tzeng

Advisor: Shie-Yuan Wang

National Chiao Tung University

Institute of Network Engineering

Master's Thesis

A Thesis

Submitted to the Institute of Network Engineering, College of Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Computer Science

June 2007



ABSTRACT

In recent years, it has become more and more difficult to improve performance effectively by increasing CPU clock speed. Major CPU vendors such as Intel and AMD have all turned to multicore CPUs as the best way to gain additional performance. Desktop PCs and notebooks with modern multicore CPUs will become more and more popular, and their cost will continue to fall.

This thesis presents a novel and general parallel simulation approach for increasing network simulation speeds on modern multicore systems. We present the design and implementation of this approach and apply it to the NS2 network simulator. We show the performance speedups achieved by the NS2 network simulator under various network conditions. Finally, several possible future developments are proposed.

Keywords: CPU, dual-core, quad-core, multicore, network simulator, simulation, NS2, ELP, event-level parallelism, Thread.


List of Figures

2.1 Thread and process models: (a) Single-threaded Process, and (b) Multi-threaded Process
2.2 The architecture of the lightweight processes
2.3 The architecture of the SMP-version Linux kernel over a dual-core CPU
2.4 The model of the LinuxThreads library
2.5 The 1x1 model of the NPTL library
2.6 The MxN model of the NPTL library
2.7 The module-based platform provided by the NS2 network simulator
2.8 The module skeleton provided by the NS2 network simulator
3.1 The architecture of a parallel network simulator using the ELP approach for dual-core systems
3.2 State transition diagram of the master thread
3.3 State transition diagram of the worker thread
3.4 Lookahead value is amount of transmission time plus propagation delay
3.5 Two typical events: (a) Packet Arrival Event, and (b) Local Computation Event
3.6 A 3x3 grid wired network used to illustrate safe events found at the event level
3.7 The path lookahead of a path is the sum of the link lookaheads of all links on the path
3.8 A 3x3 grid wired network used to illustrate safe events found for local
3.9 Determining when an event can be moved from the event list to the safe event list for execution by worker threads
4.1 The relationship between the event list and the scheduler in the original architecture of the NS2 network simulator
4.2 The relationship between the event list, the safe event list, and the scheduler of the NS2 network simulator using the ELP approach for dual-core system
4.3 The event data structure modified to support ELP approach
4.4 The worker thread gets its current simulation time based on its worker thread ID
4.5 A typical race condition for acquiring random number
4.6 The flowchart for the modified run() function of the scheduler
4.7 The snapshots of the event list, the safe event list, and the two worker threads after the master thread performs two times finding-safe-event procedure
4.8 There are two different control messages between the master thread and the worker threads
4.9 The wired and wireless network protocol modules
4.10 A 7x7 grid wireless network used to illustrate group safe events found at the event level
4.11 The measuring period of the AMS mechanism
5.1 The connection pattern on a 5x5 grid wired network
5.2 Expected ELPs on a wired network: The network topology size is varied.
5.3 Performance speedup on a wired network: The network topology size is varied.
5.4 Expected ELPs on a wired network: The coding computation loop is varied.
5.5 Performance speedup on a wired network: The coding computation
5.6 Expected ELPs on a wired network: The link delay is varied.
5.7 Performance speedup on a wired network: The link delay is varied.
5.8 Expected ELPs on a wired network: The link bandwidth is varied.
5.9 Performance speedup on a wired network: The link bandwidth is varied.
5.10 Performance speedup on a wired network: The link bandwidth is varied and the coding computation loop is 2048000.
5.11 ELP Overhead on a wired network: The network topology size is varied.
5.12 ELP Overhead on a wired network: The coding computation loop is varied.
5.13 ELP Overhead on a wired network: The link delay is varied.
5.14 ELP Overhead on a wired network: The link bandwidth is varied.
5.15 The connection pattern on a 5x5 grid wireless network
5.16 Expected ELPs on a wireless network: The network topology size is varied.
5.17 Performance speedup on a wireless network: The network topology size is varied.
5.18 Expected ELPs on a wireless network: The coding computation loop is varied.
5.19 Ratio of local computation events to packet arrival events on wired and wireless networks
5.20 Performance speedup on a wireless network: The coding computation loop is varied.
5.21 Performance speedup on a wireless network: The coding computation loop is varied and the network topology size is 30.
5.22 Expected ELPs on a wireless network: The link bandwidth is varied.
5.23 Performance speedup on a wireless network: The link bandwidth is varied.
5.24 ELP Overhead on a wireless network: The coding computation loop is
5.25 ELP Overhead on a wireless network: The network topology size is varied.
5.26 ELP Overhead on a wireless network: The coding computation loop is varied.

List of Tables

3.1 Four different conditions regarding SrcNID and DstNID
5.1 Common simulation parameters for wired networks

Chapter 1 Introduction

The complexity of networks has grown as networks have developed. As a result, simulation users must spend more and more time simulating large-scale, complex network cases. Shortening the time required for such simulations has long been a goal. In the past, this goal could be achieved simply by using a faster CPU. However, because increases in CPU clock speed have become harder and harder to achieve in recent years, this simple approach is no longer effective.

Recognizing that it is increasingly hard to raise CPU clock rates, major CPU vendors such as Intel and AMD have all turned to multicore CPUs as the best way to gain additional performance. Many models of desktop PCs, notebook computers, and servers on the market already ship with dual-core or quad-core CPUs. Given this trend, uni-core systems are likely to disappear from the market within a few years.

Since multicore systems will be everywhere in the future, there is a strong desire to use the computing power of multiple cores simultaneously and effectively. (In the rest of this thesis, we use the more conventional term “CPU” to refer to a “core” when there is no ambiguity.) However, as pointed out in [1], it is difficult for an application (including a network simulator) to automatically gain performance speedups on multicore systems, because at any given time an application that is not multi-threaded can run on only one CPU. To gain performance speedups, an application needs to be made “multi-threaded” so that its threads can run on multiple CPUs simultaneously. However, making an application multi-threaded is not a simple task and does not necessarily provide good performance speedups [1].

In the context of network simulation, many methods have been proposed for parallel and distributed simulation over multiple CPUs to achieve performance speedups [2]. Most of these methods follow one of two approaches: the conservative approach and the optimistic approach. Either approach requires a simulation user to partition a simulated network into several portions and to modify the simulation code so that simulation clocks are synchronized among these portions to avoid causality errors. The conservative approach, although simpler to implement in the simulation code, usually results in very low simulation performance when lookahead values are tiny [2]. The optimistic approach, although it may achieve higher performance speedups than the conservative approach, is very complicated and requires substantial modifications to the simulation code.

Network simulation users want to enjoy performance speedups without changing the way they use existing sequential network simulators. If this can be done, then even if the performance speedup is small, a simulation user suffers no performance loss as long as the parallel network simulation is no slower than the corresponding sequential one. With this worst-case performance guarantee, and with the advantage of conducting parallel simulations in exactly the same way as sequential ones, simulation users will be more willing to conduct parallel simulations to achieve potential performance speedups on multicore systems.

This thesis proposes a novel and general parallel simulation approach, called the “Event-level Parallelism (ELP) approach,” to achieve the above goals. The idea is analogous to the “instruction-level parallelism” approach employed in the computer architecture and compiler research communities, which exploits instruction-level parallelism to achieve performance speedups over multiple CPUs without requiring the programmer to find the parallelism inherent and latent in an application.

The rest of this thesis is organized as follows. Chapter 2 gives an overview of parallel and distributed simulation methods and the state of the art of multi-threaded design in Linux systems. The NS2 network simulator is also introduced in that chapter. The ELP approach is presented and discussed in Chapter 3. We describe the detailed design and implementation of the ELP approach in Chapter 4. Chapter 5 presents experimental results to evaluate the performance speedups. Finally, we propose possible extensions to our implementation in Chapter 6 and conclude in Chapter 7.

Chapter 2 Background

The event-level parallelism approach resembles conventional parallel and distributed simulation approaches in that it executes a simulated case on multicore CPU systems. The challenge in the ELP approach is the same as in those approaches: to execute logical processes (LPs) concurrently while generating correct simulation results. In addition, a multi-threaded application running on multiple CPUs needs the support of an operating system (OS) and a user-level thread library.

This chapter first reviews related work and gives an overview of parallel and distributed simulation. It then explains the state of the art of multi-threaded design in the Linux kernel and in the user-level library. Finally, it introduces background knowledge of the NS2 network simulator.

2.1 Related Work

The parallelization of network simulations has been studied for some time. In [3], the authors attempt to parallelize the widely used open source network simulator ns but could support only simple point-to-point links with static routing and UDP traffic. Support for TCP connections, dynamic routing, and shared-medium networks was not provided because of the high complexity involved. In [4], the authors attempt to parallelize a widely used commercial network simulator, OPNET [5], but could support only simple UDP and IP protocols. Support for TCP connections and other protocols was not provided because of the extensive use of global state, zero-lookahead interactions, and pointer data structures in OPNET.

The second reason is that simulation users need to learn parallel simulation concepts and approaches to conduct parallel network simulations. This hinders users without such training from conducting parallel network simulations. In [6, 7], the authors use a federated approach that interconnects multiple ns simulation engines, each of which simulates a portion of the network, to simulate the whole network together. Although this approach reduces the required modification to the ns simulation code, a user still needs to modify the original ns scripts for parallel simulations. Even if an existing network simulator has been completely parallelized, a user still needs to learn the concepts of parallel simulation in order to partition a simulated network wisely into several portions and map them onto the available CPUs. Partitioning a large-scale network with a large number of nodes is tedious work for the user. Worse yet, finding a load-balanced partition that achieves good performance speedups is very difficult. To achieve good load balancing, the partition decision cannot be made simply by considering the number of nodes assigned to each portion. Instead, the amount of packet traffic that may be generated and processed in each portion should also be considered. However, such information is hard to obtain or estimate at the time when a user needs to partition a network for parallel simulation.

2.2 Parallel and Distributed Simulation Overview

Parallel discrete event simulation refers to the execution of a discrete event simulation program on parallel and distributed simulators. The major issue in parallel and distributed simulation is how to ensure that the results of a parallel discrete event simulation on multiple CPUs are correct. In the sequential execution paradigm, it is crucial that the simulation engine always selects the event with the smallest timestamp from its event list as the next one to process. If an event with a larger timestamp were executed before one with a smaller timestamp, the simulation could produce incorrect results. We call errors of this type “causality errors.”
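The sequential rule described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the `Event` record, the `EventList` alias, and the `run_sequential` function are hypothetical names chosen for this sketch and are not taken from NS2.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical event record: a timestamp plus an ID used only for illustration.
struct Event {
    double time;
    int id;
};

// Comparator that puts the smallest timestamp on top of the priority queue.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

using EventList = std::priority_queue<Event, std::vector<Event>, LaterFirst>;

// Sequential simulation loop: always process the smallest-timestamp event next.
// `handler` may schedule new events by pushing them into the list.
// Returns the event IDs in the order in which they were processed.
std::vector<int> run_sequential(
    EventList& events,
    const std::function<void(const Event&, EventList&)>& handler) {
    std::vector<int> order;
    double now = 0.0;
    while (!events.empty()) {
        Event e = events.top();
        events.pop();
        assert(e.time >= now);  // an older timestamp here would be a causality error
        now = e.time;
        order.push_back(e.id);
        handler(e, events);
    }
    return order;
}
```

Because the loop never moves the clock backwards, an event scheduled in the past by a faulty handler is caught by the assertion instead of silently corrupting the results.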

A discrete event simulation consisting of logical processes (LPs) that interact exclusively by exchanging timestamped messages obeys the local causality constraint if and only if each LP processes events in non-decreasing timestamp order. Adherence to this constraint is sufficient, although not always necessary, to guarantee that no causality errors occur. In other words, violating the local causality constraint does not always result in simulation errors, because two events within a single logical process may be independent of each other. In such a case, processing them out of non-decreasing timestamp order does not lead to a causality error.

The synchronization mechanisms for parallel and distributed simulation follow one of two approaches: the conservative approach and the optimistic approach. With the conservative approach, each logical process follows the local causality constraint and is therefore blocked until its (local and remote) events can be guaranteed safe to process. Events of each logical process are processed strictly in non-decreasing timestamp order to avoid any causality errors. With the optimistic approach, on the other hand, events can be processed out of non-decreasing timestamp order, and additional recovery mechanisms such as “rollback” [2] are used to repair any causality errors that result.
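To make the conservative blocking condition concrete, the following sketch uses a common textbook-style bound: a neighbouring LP whose clock is at time t and whose link has lookahead L cannot deliver an event earlier than t + L. This bound, and the names `InputLink`, `earliest_incoming_time`, and `is_safe`, are assumptions made for illustration; the text above does not prescribe a particular rule.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical description of one incoming link of a logical process (LP).
struct InputLink {
    double sender_clock;  // current simulation time of the neighbouring LP
    double lookahead;     // minimum delay before the neighbour can affect this LP
};

// Earliest timestamp at which any neighbour could still deliver a new event.
double earliest_incoming_time(const std::vector<InputLink>& links) {
    double bound = std::numeric_limits<double>::infinity();
    for (const InputLink& l : links) {
        bound = std::min(bound, l.sender_clock + l.lookahead);
    }
    return bound;
}

// Conservative rule: a local event is safe only if no neighbour can still
// send an event with a smaller timestamp.
bool is_safe(double event_time, const std::vector<InputLink>& links) {
    return event_time <= earliest_incoming_time(links);
}
```

With links whose clock-plus-lookahead values are 12 and 14, an event at time 11.5 is safe while one at 12.5 must wait. When lookahead values are tiny the bound stays close to the neighbours' clocks, so an LP is blocked most of the time.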

The ELP approach can easily avoid causality errors on multicore systems. The details of the ELP approach are presented in Chapter 3.

2.3 The State-of-the-art Multi-threaded Support

As mentioned previously, a multi-threaded application running on multiple CPUs needs the support of an operating system (OS) and a user-level thread library. We explain each of these components in the following subsections.

2.3.1 The Thread Design in the Linux Kernel

A process is defined as follows [8]: a process is an instance of a program in execution. When a process is created, it is almost identical to its parent. It receives a logical copy of the parent’s address space and executes the same code as the parent. The parent and child have separate copies of the data (stack and heap). Changes made by the child to a memory location are invisible to the parent, although the parent and child may share the memory pages containing the program code.

Figure 2.1: Thread and process models: (a) Single-threaded Process, and (b) Multi-threaded Process

A thread is one of possibly multiple execution flows of a process and is the basic unit of CPU scheduling. A process may contain more than one thread and is associated with at least one thread. The threads of a process access the same address space and the same kernel resources. In Fig. 2.1, each outer block denotes a process. As Fig. 2.1(a) shows, when only one thread is associated with a process, the program is called a single-threaded application or a single-threaded program. In Fig. 2.1(b), on the other hand, more than one thread is associated with a process. We call such a program a multi-threaded application or a multi-threaded program.
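The shared address space of a multi-threaded program can be illustrated with standard C++ threads. The sketch below is illustrative only; the function name `count_with_threads` is hypothetical, and the mutex is added here because concurrent updates to shared data must be synchronized.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Increment one shared counter from several threads of the same process.
// Every thread sees the same `counter` because threads share the address space.
int count_with_threads(int num_threads, int increments_per_thread) {
    int counter = 0;
    std::mutex lock;  // protects `counter` against concurrent updates
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(lock);
                ++counter;
            }
        });
    }
    for (std::thread& th : threads) th.join();
    return counter;
}
```

Had the work been split across separate processes instead, each child would have incremented its own copy of the counter and the parent's value would have remained unchanged.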

Most conventional operating systems have two types of threads: user threads and kernel threads. A user thread lives at the user level and is often created by a user-level library. A kernel thread, on the other hand, lives at the kernel level.

In earlier versions of the Linux operating system, the user threads of a multi-threaded application were created, handled, and scheduled entirely in the user-level library. All the user threads corresponded to a single process in the kernel. Therefore, when one of the threads was accessing Linux kernel resources via a system call, the others were blocked by the Linux kernel. Such a mechanism of multi-threaded support is not satisfactory.

Figure 2.2: The architecture of the lightweight processes (diagram labels: task_struct 1, task_struct 2, mm_struct, fs_struct, signal_struct)

A kernel thread is similar to a process and therefore shares many of its properties. Kernel threads can be scheduled independently. Before the version-2.6 release of the Linux system [9], the terms “process” and “kernel thread” were used interchangeably because there was no kernel thread design. The version-2.6 release of the Linux kernel supports kernel threads. A kernel thread is also called a lightweight process (LWP) in the Linux system. When a process is an instance of a single-threaded program, the process corresponds to exactly one lightweight process, and that lightweight process is often simply called a process.

The architecture of a lightweight process is illustrated in Fig. 2.2. Each lightweight process is associated with an independent task_struct. The task_struct is the most important data structure in the Linux kernel; it stores the essential information about the lightweight process. Some fields of the task_struct are pointers to memory where other resources in use are stored. A lightweight process and its parent process may share some resources in the kernel, such as the memory page table and the file descriptor table. As such, when one of them modifies a shared resource, the other immediately sees the change. The Linux kernel must synchronize them when they access a shared resource.

Figure 2.3: The architecture of the SMP-version Linux kernel over a dual-core CPU

A straightforward way to provide better multi-threaded support is to associate a lightweight process (or a kernel thread) with each user thread. Because a lightweight process can be scheduled, each user thread can then be scheduled independently by the Linux kernel.

The Linux operating system has supported and integrated the SMP architecture since its version-2.6 release. On an N-core CPU system, the SMP-version Linux kernel is capable of scheduling N lightweight processes on the N CPU cores, one per core. Therefore, it is also capable of scheduling N user threads on N CPU cores simultaneously when each user thread is associated with a schedulable kernel thread. For example, as Fig. 2.3 depicts, a dual-core CPU has two cores, and the scheduler can schedule two lightweight processes on the two CPU cores simultaneously.
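As a rough illustration of running one thread per core, the sketch below starts as many standard C++ threads as the system reports cores and gives each its own slice of work. The names `worker_count` and `parallel_sum` are hypothetical. Whether the threads really run on distinct cores at the same time is decided by the kernel scheduler, not by this code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Number of worker threads to start: one per core reported by the system.
// hardware_concurrency() may return 0 when the value is unknown, so fall back to 1.
unsigned worker_count() {
    return std::max(1u, std::thread::hardware_concurrency());
}

// Sum `data` by giving each thread its own contiguous slice.
// `workers` must be at least 1.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            // Each thread writes only its own slot, so no lock is needed.
            partial[w] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (std::thread& th : threads) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A call such as `parallel_sum(values, worker_count())` uses one thread per reported core; on a dual-core system this would typically mean two threads.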

After the version-2.6.18 release, the Linux operating system has enabled the SMP support and the LWP support by default.


Figure 2.4: The model of the LinuxThreads library

2.3.2 The Thread Design in the User-level Library

The IEEE defines the POSIX 1003.1 standards [10, 11] to specify the application programming interface (API) for software compatible with variants of Unix-like operating systems, such as FreeBSD, Linux, and Solaris. Three well-known user-level thread libraries have been developed.

The LinuxThreads Library

The first library to be developed was the LinuxThreads library [12]. Almost every conventional Linux distribution (a package containing not only a specific Linux system but also many popular applications), such as Fedora Core, Ubuntu, and Debian, has provided the LinuxThreads library to support multi-threading.

As Fig. 2.4 illustrates, the LinuxThreads library implements the Nx1 model: all user threads of a multi-threaded application are associated with one process (or lightweight process) in the Linux system. However, this architecture cannot fully support multi-threading because it provides multi-threaded support entirely at the user level. It has a number of problems, mainly owing to its implementation. For example, it uses the system call clone to create a new process sharing the parent’s address space, which causes problems for signal handling. LinuxThreads also needs the signals SIGUSR1 and SIGUSR2 for inter-thread communication, so it must reserve those signals for its own use. In this model, only one of the user threads can access kernel resources at any given time.

Figure 2.5: The 1x1 model of the NPTL library

Improving upon LinuxThreads required rewriting the user threads library. Two projects were started to address this requirement: Next Generation POSIX Threads (NGPT) and the Native POSIX Thread Library (NPTL). After these new projects started, the main maintainer of the LinuxThreads project suspended its development.

The NGPT Library

NGPT’s main development team included engineers from IBM. According to the results presented in [13], NPTL creates and destroys threads more efficiently than NGPT. For this reason, the NGPT library was abandoned in mid-2003, at around the same time that NPTL was released, and NPTL became the standard POSIX-compliant thread library in the Linux system. NPTL is now a fully integrated part of the GNU C Library (also called glibc).

Figure 2.6: The MxN model of the NPTL library

The NPTL Library

The NPTL project [13] was developed by a team that included developers from Red Hat, Inc. Its design began at almost the same time as the NGPT project. Like LinuxThreads, the NPTL library uses the system call clone. However, the NPTL library uses the system call to create a new lightweight process rather than a process. NPTL requires a kernel that provides the SMP and LWP facilities so that all kernel threads of a multi-threaded application can be scheduled independently while sharing the same address space. As mentioned above, the current version of the Linux kernel meets these requirements.

The NPTL library provides two different implementation models. One is the 1x1 architecture, in which each user thread corresponds one-to-one to a lightweight process in the Linux kernel, as Fig. 2.5 illustrates. This is the simplest implementation. In the 1x1 model, the scheduler of the Linux kernel can independently schedule all user threads of a multi-threaded application on multiple CPUs, so that one user thread may sleep while others are running. The threads may also access kernel resources concurrently.

The other model is the MxN architecture, which Fig. 2.6 depicts. M user threads are associated with N lightweight processes. The NPTL library needs to handle and schedule those user threads entirely at the user level. NPTL is capable of associating a scheduled user thread with an available lightweight process (or kernel thread). In other words, the MxN model of the NPTL library must perform context switching of user threads in the library. This increases the complexity of the implementation, and therefore no developer is implementing this model nowadays. All Linux distributions have adopted the NPTL library with the 1x1 model to support multi-threaded applications.

Figure 2.7: The module-based platform provided by the NS2 network simulator

2.4 The NS2 Network Simulator

NS2 is an object-oriented, module-based simulator developed as part of the VINT project at the University of California, Berkeley. It is a discrete event simulator targeted at networking research and provides substantial support for simulating TCP, UDP, and routing protocols over wired and wireless networks. Its performance depends on the number of events that it needs to process: the more events it needs to process, the slower its simulation speed will be.

It adopts an open-system architecture and provides a module-based platform. Fig. 2.7 is an example that depicts a wireless network topology consisting of three nodes and the organization of each node. In the module-based platform, a module skeleton is provided as shown in Fig. 2.8. Based on the skeleton, researchers can easily develop their own modules and integrate them into the simulator.

(a) The connector-module skeleton

    class Connector : public NsObject {
    protected:
        char *name_;
        NsObject *target_;
        NsObject *drop_;
        ...
    public:
        virtual void drop(Packet *p);
        virtual void send(Packet *p, Handler *h);
        ...
    };

(b) The bi-connector-module skeleton

    class BiConnector : public NsObject {
    protected:
        char *name_;
        NsObject *uptarget_;
        NsObject *downtarget_;
        NsObject *drop_;
        ...
    public:
        virtual void drop(Packet *p);
        virtual void sendUp(Packet *p, Handler *h);
        virtual void sendDown(Packet *p, Handler *h);
        ...
    };

Figure 2.8: The module skeleton provided by the NS2 network simulator
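How a researcher might derive a module from the connector skeleton can be sketched as follows. The sketch does not use the real NS2 headers: `Packet`, `Handler`, `NsObject`, and `Connector` below are minimal stand-ins that only mimic the parts named in Fig. 2.8, and `set_target`, `size_bytes`, `SizeFilter`, and `CountingSink` are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Minimal stand-ins for the NS2 types named in Fig. 2.8. These are NOT the
// real NS2 classes; they only mimic what this sketch needs.
struct Packet { int size_bytes; };
struct Handler {};

class NsObject {
public:
    virtual ~NsObject() {}
    virtual void recv(Packet* p, Handler* h) = 0;
};

class Connector : public NsObject {
public:
    void set_target(NsObject* t) { target_ = t; }  // hypothetical helper
    // Pass the packet on to the next module in the chain, if any.
    virtual void send(Packet* p, Handler* h) { if (target_) target_->recv(p, h); }
    // Discard the packet (the stand-in does nothing further).
    virtual void drop(Packet*) {}
protected:
    NsObject* target_ = nullptr;
};

// Hypothetical user module: forwards small packets and drops oversized ones.
class SizeFilter : public Connector {
public:
    explicit SizeFilter(int max_bytes) : max_bytes_(max_bytes) {}
    void recv(Packet* p, Handler* h) override {
        if (p->size_bytes > max_bytes_) { ++dropped_; drop(p); }
        else { send(p, h); }
    }
    int dropped() const { return dropped_; }
private:
    int max_bytes_;
    int dropped_ = 0;
};

// Hypothetical sink that counts the packets reaching it.
class CountingSink : public NsObject {
public:
    void recv(Packet*, Handler*) override { ++received_; }
    int received() const { return received_; }
private:
    int received_ = 0;
};
```

Chaining a `SizeFilter` in front of a `CountingSink` mirrors the way modules are linked through their targets: the new module only overrides the receive path and reuses `send()` and `drop()` from the skeleton.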

Chapter 3 The Event-level Parallelism Approach

3.1 The ELP Architecture Overview

In this chapter, we present the architecture of the ELP approach. Fig. 3.1 shows the architecture of a parallel network simulator using the ELP approach. For illustration purposes, the parallel network simulator depicted in this figure is for dual-core systems. It can be extended to quad-core systems simply by increasing the number of worker threads in the figure from two to four.

The ELP architecture has four important components, which we describe in turn below.

• Master Thread

The major responsibility of the master thread is to determine which events are currently safe to be executed in parallel without causing causality errors among themselves.

• Worker Thread

A worker thread executes only events that the master thread has determined to be safe and placed in the safe event list. It may insert newly generated events into the event list during event execution.

• Event List

All newly generated events are stored in the event list. The event list is accessed by both the master thread and the worker threads.

• Safe Event List

The safe event list stores the safe events to be executed in the future. Both the master thread and the worker threads can access this list.

Figure 3.1: The architecture of a parallel network simulator using the ELP approach for dual-core systems

For a dual-core system, only three simple threads need to run in the parallel network simulator: one master thread and two worker threads. The master thread determines which events are currently “safe” [2] to be executed by worker threads in parallel without causing causality errors among themselves. When such events are found, the master thread moves them from the event list to the safe event list and wakes up any worker thread that is sleeping while waiting for a safe event to execute. As Fig. 3.2 illustrates, if no new safe events can be found at the current time, the master thread sleeps and waits for a worker thread to wake it up later, when that worker thread has finished its current event computation. After being woken up, the master thread continues to look for new safe events. The master thread repeats the above operations until the whole simulation is finished (i.e., when the event list becomes empty).

Figure 3.2: State transition diagram of the master thread

At any given time, at least one worker thread must be busy executing an event. The ELP approach has this property because, as in a sequential simulation, the event with the smallest timestamp in the whole simulated network must be a safe event. Suppose that only one worker thread is currently busy executing an event. When that worker thread finishes its event computation and becomes idle because the safe event list holds no safe event for it to execute, the master thread must be able to find at least one safe event (at minimum, the event at the head of the event list) and move it (or them) to the safe event list. At that time, one or more worker threads will again have safe events to execute, as Fig. 3.3 depicts. With this property, the master thread is assured that, when all safe events have been processed, it will be woken up by a worker thread to continue finding more safe events in the event list. This property is also why the ELP approach can always achieve a performance speedup higher than or close to 1 even with tiny lookahead values: under such harsh conditions, the ELP approach can always degenerate to a sequential simulation approach. This is done simply by activating only one worker thread and asking the master thread to move the first event in the event list to the safe event list without checking whether it is a safe event. As such, the ELP approach will never result in a parallel simulation that runs much slower than a sequential simulation.

Figure 3.3: State transition diagram of the worker thread
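The guarantee that the head of the event list is always safe when no worker is busy can be sketched as a single step of the master thread. This is a simplified, single-threaded illustration under stated assumptions: `find_safe_events` and its parameters are hypothetical names, and the `is_safe` predicate merely stands in for the actual safety test, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical event record used only for this sketch.
struct Event {
    double time;
    int id;
};

// Move safe events from the time-ordered event list to the safe event list.
// `is_safe` stands in for the real safety test. If no worker is busy and
// nothing was judged safe, the head of the event list is moved anyway: it has
// the smallest timestamp, so it can always be executed.
// Returns the number of events moved.
std::size_t find_safe_events(std::deque<Event>& event_list,
                             std::deque<Event>& safe_list,
                             int busy_workers,
                             const std::function<bool(const Event&)>& is_safe) {
    std::size_t moved = 0;
    for (auto it = event_list.begin(); it != event_list.end();) {
        if (is_safe(*it)) {
            safe_list.push_back(*it);
            it = event_list.erase(it);
            ++moved;
        } else {
            ++it;
        }
    }
    if (moved == 0 && busy_workers == 0 && safe_list.empty() && !event_list.empty()) {
        safe_list.push_back(event_list.front());
        event_list.pop_front();
        moved = 1;
    }
    return moved;
}
```

With a predicate that never reports an event as safe, the function still hands out one event at a time whenever all workers are idle, which corresponds to the degenerate sequential behaviour described above.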

The event list stores the events to be executed in the future and sorts them based on their timestamps, which denote the times when the events should be executed, in a non-decreasing order. In a sequential network simulator running on a single CPU, these events will be dequeued and executed sequentially by the CPU to avoid causality errors. If new events are created (called “scheduled” in the simulation research field) when an event is executed by a worker thread, the newly generated events are inserted into the event list based on their timestamps.

In the ELP approach, two worker threads are created on a dual-core system to fully utilize the two CPUs. The safe events that have been found are stored and sorted in the safe event list, from which any worker thread can dequeue a safe event for execution when it finishes its current event computation and becomes idle.

As long as the two worker threads are busy executing events at all times, the two CPUs will be fully utilized and good performance speedups will result. The important job of the master thread is to find enough ELP for the target N-core system at all times. For an N-core system, maintaining N safe events in the safe event list at all times suffices to keep the N worker threads busy all the time.

When a worker thread finishes its current event computation and tries to dequeue a safe event but finds the safe event list empty, it wakes up the master thread and asks it to find more safe events and move them into the safe event list. If more safe events can be found, the master thread wakes up this worker thread, which then has a safe event to execute and becomes busy again. If no more safe events can be found (e.g., the other worker thread is executing an event that can affect the events at the front of the event list), this worker thread sleeps until the master thread has found new safe events, inserted them into the safe event list, and woken it up.

Since the master thread and worker threads all need to access the event list and the safe event list, accesses to these lists are coordinated and protected by locks to ensure data structure consistency. In addition to these two lists, any other global variables or data structures that may be accessed during event execution need to be protected by locks as well. One global variable that is accessed very frequently during event execution is the variable storing the current simulation time. However, because this variable is only read during event execution and its value can only be advanced by the event-processing loop of the network simulator (in the master thread), it need not be protected by locks. When a worker thread dequeues a safe event from the safe event list for execution, the timestamp of the event should be the current simulation time for this specific event. As such, one can store it in a local variable of this worker thread. When the worker thread executes this event and needs to get the current simulation time by calling an API function, this API function can retrieve the value of this local variable based on the thread ID and return it as the current simulation time for this specific event. No locking/unlocking overhead is needed for accessing the current simulation time of a simulated network.

3.2 The Affect Event Relationship

In this section, we present how to find safe events for parallel execution by worker threads. The number of safe events that can be found strongly depends on the lookahead value. Therefore, we briefly explain the concepts of lookahead and safe event here. More detailed explanations are presented in [2]. Suppose that a sequential network simulator has advanced its simulation clock to simulation time T, which is the timestamp of the next unprocessed (i.e., the first) event in the event list, and is going to execute that event. Suppose also that there is a constraint that a new event must be scheduled at least L units of simulation time into the future. Then it can be guaranteed that all new events scheduled during the execution of this event must have a timestamp greater than or equal to T + L. With this property, any event in the event list with a timestamp less than T + L can be safely processed without causing causality errors. These events are called “safe events” and L is the lookahead value for this example.
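The T + L rule above can be illustrated with a minimal sketch. The Event struct and the function name count_safe_events are hypothetical and are not part of NS2; the sketch only assumes an event list that is already sorted by timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event record: only the timestamp matters for this sketch.
struct Event {
    double time;  // simulation time at which the event must execute
};

// Returns how many events at the head of a timestamp-sorted event list are
// safe, i.e. have a timestamp strictly less than T + L, where T is the
// timestamp of the first event and L is the lookahead.
std::size_t count_safe_events(const std::vector<Event>& sorted_events,
                              double lookahead) {
    assert(lookahead >= 0.0);
    if (sorted_events.empty()) return 0;
    const double horizon = sorted_events.front().time + lookahead;
    std::size_t n = 0;
    while (n < sorted_events.size() && sorted_events[n].time < horizon) ++n;
    // The head event is always safe, even when the lookahead is zero.
    return n == 0 ? 1 : n;
}
```

With a zero lookahead the function still reports one safe event, which corresponds to the degenerate sequential case discussed earlier.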

In the context of network simulation, when a packet is transmitted over a link, one can associate a lookahead value with the link. This value is the sum of the signal propagation delay D over the link and the transmission time TxTime of the packet being transmitted over the link, where D is a fixed value and TxTime is the packet length divided by the link bandwidth. This is because, when a packet is transmitted over a link, only after this amount of time has elapsed will the packet arrive at the other end of the link and be completely received by the remote network interface. Suppose that the current simulation time is T1 and an event E1 is executed at the source node of a link, and during the event execution a packet is transmitted to the destination node of the link. As Fig. 3.4(a) depicts, if there is an event E2 in the event list with a timestamp T2 larger than T1, one can be sure that if T1 + D + TxTime > T2, E1 cannot affect E2. This means that in this condition E1 and E2 are both safe events and can be executed in parallel without affecting each other. On the other hand, if T1 + D + TxTime ≤ T2, as Fig. 3.4(b) illustrates, E1 can affect E2 because the arrived packet may change the internal state of the node where E2 resides before E2 is executed. In such a condition, only E1 is a safe event while E2 is not.

(a) E1 and E2 are safe events

(b) Only E1 is a safe event but E2 is not.

Figure 3.4: The lookahead value is the transmission time plus the propagation delay
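The link lookahead check can be sketched as follows. The Link struct, its field names, and the two function names are assumptions made for illustration; the units (seconds, bits, bits per second) are also assumed.

```cpp
#include <cassert>

// Hypothetical link description; the field names are illustrative only.
struct Link {
    double prop_delay_s;   // signal propagation delay D, in seconds
    double bandwidth_bps;  // link bandwidth, in bits per second
};

// Link lookahead = D + TxTime, where TxTime = packet length / bandwidth.
double link_lookahead(const Link& link, double packet_bits) {
    assert(link.bandwidth_bps > 0.0);
    return link.prop_delay_s + packet_bits / link.bandwidth_bps;
}

// E1 (executed at time t1 on the source node) cannot affect E2 (timestamp t2
// on the destination node) when t1 + D + TxTime > t2. This function returns
// true in the opposite case, i.e. when E1 can affect E2.
bool can_affect_over_link(double t1, double t2, const Link& link,
                          double packet_bits) {
    return t1 + link_lookahead(link, packet_bits) <= t2;
}
```

For example, with D = 0.5 s and a 500-bit packet on a 1000 bit/s link, the lookahead is 1.0 s, so an event at T1 = 1.0 cannot affect an event at T2 = 1.5 but can affect one at T2 = 3.0.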

When two nodes are not directly connected by a link, an earlier event E1 on one node may or may not affect a later event E2 on the other node. In the following, we present the rules used to check whether it is possible for E1 to affect E2. In the ELP approach, the data structure of each event stores the source node ID (SrcNID) and destination node ID (DstNID) of the event. If the event represents the arrival of a packet sent from node i to node j, then SrcNID is set to i and DstNID is set to j. A packet arrival event, when executed, will change the internal state (e.g., the buffer occupancy level) of the receiving node. On the other hand, if the event does not represent a packet arrival, it must represent a local computation, which only modifies the internal state of the node where this event is executed and does not transmit a packet to another node. For a local computation event that is to be executed on node k, both its SrcNID and DstNID are set to k. For example, as Fig. 3.5 depicts, there are currently three events in the simulated case. One is a local computation event and the others are packet arrival events. The local computation event E3 resides on Node 3, so its SrcNID and DstNID are both set to 3. The two arrival events E1 and E2 are sent from Node 1 to Node 2 and from Node 2 to Node 1, respectively. The SrcNID and DstNID of E1 are set to 1 and 2, and those of E2 are set to 2 and 1. The SrcNID and DstNID information of an event is readily available in a network simulator.

(a) Packet Arrival Event

(b) Local Computation Event

Figure 3.5: Two typical events: (a) Packet Arrival Event, and (b) Local Computation Event

In a protocol stack consisting of several protocol modules, a packet arrival event for the destination node of a link can only be scheduled by the bottom physical-layer protocol module, which models the operation of the network interface at the source node of the link. Consider the triggering of a “hello” timer, which is commonly used in a routing protocol module to send out a HELLO packet to inform other nodes that this node is still alive. In this example, although the upper-layer routing protocol module schedules a packet transmission event (not an arrival event), before this event reaches the bottom physical-layer protocol module, as it passes along all protocol modules, it is still considered a local computation event because the execution of this event only affects the internal state of the local node.

In the following, we discuss whether an event on one node can affect an event on another node.


Table 3.1: Four different conditions regarding SrcNID and DstNID

Rule      Condition
Rule 1    SrcNID1 ≠ SrcNID2 and DstNID1 ≠ DstNID2
Rule 2    SrcNID1 = SrcNID2 and DstNID1 ≠ DstNID2
Rule 3    SrcNID1 ≠ SrcNID2 and DstNID1 = DstNID2
Rule 4    SrcNID1 = SrcNID2 and DstNID1 = DstNID2

3.2.1 Packet Arrival Events

In this section, we assume that the two events considered are both packet arrival events. Denote E1’s SrcNID and DstNID as SrcNID1 and DstNID1 and E2’s SrcNID and DstNID as SrcNID2 and DstNID2. When one compares SrcNID1 with SrcNID2 and DstNID1 with DstNID2, the two comparison results (same or not) give the four conditions shown in Table 3.1. Among these conditions, we consider Rule 1 and Rule 2 preliminary safe conditions, meaning that E1 cannot affect E2 provided that a certain lookahead condition is met. Rule 3 and Rule 4 are considered unsafe because it is very likely that E1 can affect E2. For these two conditions, to avoid wasting CPU cycles, one can simply assume that E1 can affect E2 without further checking the lookahead conditions.

The reasons for the above checking rules are explained below using Fig. 3.6. In this figure, each node has a name Ni, where i denotes the ID of the node, and each link has a delay denoted as Lij, where i and j are the IDs of the two nodes of the link. Each node connects to each of its immediate neighbors by a link and each link has an output buffer associated with it. In this figure, Ea, Eb, Ec, Ed, and Ef represent packet arrival events and Ee represents a local computation event. Each packet arrival event is associated with an arrow denoting its SrcNID and DstNID. For example, the SrcNID and DstNID of Ea are 6 and 5, respectively, and those of Ed are 8 and 7, respectively. On the other hand, each local computation event is associated with a self-pointing arrow, and its SrcNID and DstNID are both the ID of the node where this event is to be executed. For example, the SrcNID and DstNID of


[Figure: 3x3 grid of nodes with links L12, L23, L14, L25, L36, L45, L56, L47, L58, L69, L78, and L89]

Figure 3.6: A 3x3 grid wired network used to illustrate safe events found at the event level

The condition of Rule 3, that SrcNID1 ≠ SrcNID2 and DstNID1 = DstNID2, is considered unsafe. An example for this condition is when one considers whether Ea can affect Eb assuming that the timestamp of Ea is smaller than that of Eb. Since a packet arrival can change the internal state of the receiving node, it is clear that Ea can affect Eb and thus the execution order of Ea and Eb should be maintained. As such, Ea and Eb should not be executed in parallel.

The condition of Rule 2, that SrcNID1 = SrcNID2 and DstNID1 ≠ DstNID2, is considered safe if a certain lookahead condition is met. An example for this condition is when one considers whether Eb can affect Ed assuming that the timestamp of Eb is smaller than that of Ed. Although the destination node of Eb, which is N5, is not the same as the destination node of Ed, which is N7, Eb can still affect Ed if there exists a path from N5 to N7. On the other hand, if there is no path from N5 to N7, one can be sure that Eb cannot affect Ed.

To determine whether an earlier event E1 on a node Src may affect a later event E2 on a node Dst, one needs to compute the minimum path lookahead among all possible paths from Src to Dst. The path lookahead of a path from Src to Dst is defined to be the sum of the link lookaheads of all links on the path. The computed minimum value represents the minimum amount of time that must elapse before an earlier event on Src can affect a later event on Dst. This is because Src may schedule (create) an event destined for Ni; upon receiving this event, node Ni may in turn schedule an event destined for Nj, ..., and Nk may in turn schedule an event destined for Dst. If the event scheduled by Nk has a smaller timestamp than E2 and thus gets executed prior to E2 on Dst, E1 can affect E2.

Figure 3.7: The path lookahead of a path is the sum of the link lookaheads of all links on the path

For the above case in which one would like to determine whether Eb on N5 may affect Ed on N7, one needs to compute the minimum path lookahead among all possible paths from N5 to N7. One can use an all-pairs shortest-path algorithm (for example, Dijkstra’s algorithm run from every node) to precompute and store the minimum path lookaheads between all pairs of nodes. Suppose that the precomputed minimum path lookahead of the path from N5 to N7 is PLA57 and, as Fig. 3.7 depicts, the timestamp of Eb is Tb and that of Ed is Td. Then if Tb + PLA57 > Td, Eb cannot affect Ed and they can be executed in parallel.
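The precomputation of minimum path lookaheads can be sketched as below. Note that this sketch uses the Floyd-Warshall algorithm rather than Dijkstra’s algorithm, because it yields all pairs in one short routine; the Matrix alias, kNoPath, and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
const double kNoPath = std::numeric_limits<double>::infinity();

// Computes the minimum path lookahead between all node pairs with the
// Floyd-Warshall algorithm. link_la[i][j] is the lookahead of the direct link
// from node i to node j, or kNoPath if no such link exists.
Matrix min_path_lookahead(Matrix link_la) {
    const std::size_t n = link_la.size();
    for (std::size_t i = 0; i < n; ++i) {
        assert(link_la[i].size() == n);
        link_la[i][i] = 0.0;  // a node reaches itself with no link delay
    }
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (link_la[i][k] + link_la[k][j] < link_la[i][j])
                    link_la[i][j] = link_la[i][k] + link_la[k][j];
    return link_la;
}

// Returns true when an earlier event with timestamp tb on node src can affect
// a later event with timestamp td on node dst, i.e. when
// tb + PLA[src][dst] <= td. An unreachable dst (infinite PLA) is never
// affected.
bool can_affect_via_path(const Matrix& pla, std::size_t src, std::size_t dst,
                         double tb, double td) {
    return tb + pla[src][dst] <= td;
}
```

Because the topology of a wired network is fixed, the matrix would only need to be computed once before the simulation starts, and each later check is a single table lookup.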

The condition of Rule 1, that SrcNID1 ≠ SrcNID2 and DstNID1 ≠ DstNID2, is considered safe and is treated in the same way as the previous condition in which SrcNID1 = SrcNID2. An example for this condition is when one considers whether Ec can affect Ed assuming that the timestamp of Ec is smaller than that of Ed. As with the previous condition, suppose that the precomputed minimum path lookahead of the path from N2 to N7 is PLA27, the timestamp of Ec is Tc, and that of Ed is Td. Then if Tc + PLA27 > Td, Ec cannot affect Ed and they can be executed in parallel.

The condition of Rule 4, that SrcNID1 = SrcNID2 and DstNID1 = DstNID2, is considered unsafe. An example for this condition is when one considers whether Eb can affect Ef assuming that the timestamp of Eb is smaller than that of Ef. Although a network interface can handle only one packet transmission at a time, it is possible that many packet arrival events with the same SrcNID and the same DstNID (i.e., they are transmitted over the same link) are scheduled and appear in the event list. For example, suppose that a link has a long delay and a high bandwidth, which makes the transmission time of a packet over this link less than the link delay. Then several transmitted packets may be simultaneously “in flight” over this link. It is clear that the order of these packet arrivals at the receiving node should be maintained and that they should not be executed in parallel. Otherwise, causality errors will result.
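The four rules of Table 3.1 can be summarized in a short sketch. The ArrivalEvent struct, the function names, and the pla callback (assumed to return the precomputed minimum path lookahead between two nodes) are hypothetical and only illustrate the decision logic described above.

```cpp
#include <cassert>
#include <functional>

// Hypothetical view of a packet arrival event (names are illustrative).
struct ArrivalEvent {
    double time;       // timestamp
    unsigned src_nid;  // SrcNID
    unsigned dst_nid;  // DstNID
};

// Rule numbers follow Table 3.1.
int classify_rule(const ArrivalEvent& e1, const ArrivalEvent& e2) {
    const bool same_src = e1.src_nid == e2.src_nid;
    const bool same_dst = e1.dst_nid == e2.dst_nid;
    if (!same_src && !same_dst) return 1;
    if (same_src && !same_dst) return 2;
    if (!same_src && same_dst) return 3;
    return 4;
}

// Decides whether the earlier event e1 can affect the later event e2.
// pla(a, b) is assumed to return the precomputed minimum path lookahead
// from node a to node b.
bool can_affect(const ArrivalEvent& e1, const ArrivalEvent& e2,
                const std::function<double(unsigned, unsigned)>& pla) {
    assert(e1.time <= e2.time);
    const int rule = classify_rule(e1, e2);
    if (rule == 3 || rule == 4) return true;  // same receiving node: unsafe
    // Rules 1 and 2: safe only if t1 + PLA(dst1, dst2) > t2.
    return e1.time + pla(e1.dst_nid, e2.dst_nid) <= e2.time;
}
```

Rules 3 and 4 return early without a lookahead lookup, which reflects the intent of not spending CPU cycles on events that arrive at the same node.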

3.2.2 Local Computation Events

Here we consider whether an earlier event E1 on one node may affect a later event E2 on another node, where either E1 or E2 or both are local computation events. Recall that for a local computation event, its SrcNID is the same as its DstNID. Therefore, when discussing a local computation event, it suffices to consider only its SrcNID or its DstNID.

As in the previous subsection, when either E1 or E2 or both are local computation events, the condition that DstNID1 = DstNID2 is considered unsafe. We only discuss the condition that DstNID1 ≠ DstNID2, using Fig. 3.8, which is similar to Fig. 3.6. In this figure, Ea and Ec represent packet arrival events and Eb and Ed represent local computation events.

Suppose that both E1 and E2 are local computation events. It is then easy to decide whether E1 can affect E2. If DstNID1 = DstNID2, E1 can affect E2 because they are executed on the same node. On the other hand, if DstNID1 ≠ DstNID2, E1 can still affect E2 through a path from node DstNID1 to node DstNID2. An example for this situation is when one considers whether Eb can affect Ed assuming that the timestamp of Eb is smaller than that of Ed. Although the destination node of Eb, which is N5, is not the same as the destination node of Ed, which is N8, Eb can still affect Ed if there exists a path from N5 to N8. On the other hand, if there is no path from N5 to N8, one can be sure that Eb cannot affect Ed. Like the previous treatments, one needs to compute the minimum path lookahead from node N5 to node N8. Let it be denoted as PLA58. If the timestamp of Eb plus PLA58 is greater than the timestamp of Ed, Eb cannot affect Ed; otherwise, Eb can affect Ed.

[Figure: 3x3 grid of nodes with links L12, L23, L14, L25, L36, L45, L56, L47, L58, L69, L78, and L89]

Figure 3.8: A 3x3 grid wired network used to illustrate safe events found for local computation events at the event level

So far, the lookaheads we have considered all come from the delays of physical links. However, it is possible that a local computation event can have a non-zero lookahead due to specific protocol designs. In such a situation, even if E1 and E2 are on the same node, if the timestamp of E1 plus its lookahead is greater than that of E2, E1 cannot affect E2.

For the case when E1 is a local computation event while E2 is a packet arrival event, if DstNID1 ≠ DstNID2, E1 may affect E2. An example for this situation is when one considers whether Eb can affect Ec assuming that the timestamp of Eb is smaller than that of Ec. The destination node of Eb is N5 and that of Ec is N2. Like the treatments for the previous situation, suppose that the minimum path lookahead of the path from N5 to N2 is PLA52. If the timestamp of Eb plus PLA52 is greater than the timestamp of Ec, Eb cannot affect Ec; otherwise, Eb can affect Ec.

For the case when E1 is a packet arrival event while E2 is a local computation event, if DstNID1 ≠ DstNID2, E1 may affect E2. An example for this situation is when one considers whether Ea can affect Eb assuming that the timestamp of Ea is smaller than that of Eb. The destination node of Ea is N6 and that of Eb is N5. In this situation, the minimum path lookahead PLA65 is computed. If the timestamp of Ea plus PLA65 is greater than the timestamp of Eb, Ea cannot affect Eb; otherwise, Ea can affect Eb.

3.2.3 Wireless Mobile Networks

The links we have considered so far are full-duplex fixed links. On such links, packet transmissions over different links are independent and do not affect each other. On a wireless channel, however, a transmitted packet is broadcast to all nodes that can sense the signal of the packet. As such, the network simulator needs to schedule a packet arrival event for every node that is within the interference range of the transmitting node.

To apply the ELP approach to a wireless network such as a mobile ad hoc network, the network simulator first uses the interference range of the wireless interface to construct the network topology, because link and path lookahead information is essential to the ELP approach. This is done by checking whether node j is within the interference range of node i. If this condition holds, it is considered that there is a wireless link from node i to node j. The lookahead of such a link is the link delay (which is the signal propagation delay from node i to node j) plus the packet transmission time over the link. After the network topology has been constructed, a wireless network can be viewed as a wired network and the checking rules for wired networks can be used for the wireless network. For a mobile ad hoc network, since nodes may move, the network topology constructed at the beginning of the simulation needs to be updated during the simulation. The update frequency depends on the maximum moving speed of mobile nodes and the required accuracy of simulation results. The mobile ad hoc network will not be discussed in this thesis.
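The topology construction for a static wireless network can be sketched as follows. The sketch assumes two-dimensional node positions, a single interference range shared by all nodes, a fixed packet transmission time, and a propagation speed equal to the speed of light; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Position { double x, y; };  // node coordinates in meters (assumed 2-D)

// Assumed signal propagation speed (speed of light), in meters per second.
const double kSignalSpeed = 3.0e8;

// Builds the link-lookahead matrix of a wireless network. A wireless link
// from node i to node j exists when j lies within the interference range of
// i; its lookahead is the propagation delay plus the packet transmission
// time. Entries without a link are set to infinity.
std::vector<std::vector<double>> build_wireless_lookaheads(
        const std::vector<Position>& nodes, double interference_range_m,
        double tx_time_s) {
    assert(interference_range_m >= 0.0 && tx_time_s >= 0.0);
    const double inf = std::numeric_limits<double>::infinity();
    const std::size_t n = nodes.size();
    std::vector<std::vector<double>> la(n, std::vector<double>(n, inf));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const double d = std::hypot(nodes[i].x - nodes[j].x,
                                        nodes[i].y - nodes[j].y);
            if (d <= interference_range_m)
                la[i][j] = d / kSignalSpeed + tx_time_s;
        }
    }
    return la;
}
```

The resulting matrix has the same form as the link lookaheads of a wired network, so the same path lookahead precomputation can be applied to it.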

3.3 The Safe Event Set

In the following, we present how the master thread finds the safe event set and when it moves these events from the event list to the safe event list for execution by worker threads. In Fig. 3.9, we show a snapshot of the event list, the safe event list, and the two worker threads. If there is an arrow from event Ei to event Ej, the master thread has determined that Ei cannot affect Ej. On the other hand, if there is no arrow from event Ei to event Ej, Ei can affect Ej. As shown in this figure, for discussion purposes we name the events stored in the event list, starting from the head, E1, E2, E3, E4, ..., respectively. The master thread tries to find a safe event set from these events and move them to the safe event list. To do so, for E1, it checks whether E1 can affect E2, E3, E4, E5, E6, ..., E(1 + MP). For E2, it checks whether E2 can affect E3, E4, E5, E6, E7, ..., E(2 + MP). That is, for E(i), it checks whether E(i) can affect E(i + 1), E(i + 2), E(i + 3), E(i + 4), ..., E(i + MP), where MP is a system parameter (standing for “Maximum Parallelism”) that depends on the degree of ELP that one would like to find. The value of MP also regulates the maximum number of events that the master thread checks for inclusion in the safe event set. Using an appropriate value for MP can provide adequate ELP for an N-core system while avoiding unnecessary waste of CPU cycles, as explained below.

Suppose that one would like to maintain N safe events in the safe event list at all times; then MP should be set to a value no less than N. For an N-core system, setting MP to N can potentially allow every worker thread to have an event to execute at the same time. However, because the number of events that can be included in a safe event set may be less than N (see below), and to avoid frequent sleep/wakeup interactions between worker threads and the master thread, it is suggested that MP be set to a larger value such as 2 * N. On the other hand, MP should not be set to too large a value. Otherwise, excessive CPU cycles will be wasted on many unnecessary “affect” relationship comparisons, reducing the CPU cycles that could otherwise be used to achieve higher performance speedups. Recall that the master thread constantly determines the “affect” relationships among MP events, an operation with O(MP²) time complexity. As such, if the N CPUs of an N-core system are already fully utilized, further increasing MP to find more ELP will not improve the achieved performance speedup but may instead harm it.

Figure 3.9: Determining when an event can be moved from the event list to the safe event list for execution by worker threads

A safe event set is constructed based on the above computed “affect” event relationships. Initially, it is an empty set {}. Starting from the first event E1 in the event list, event by event, one sequentially determines whether an event E(i) can be added to the safe event set. If any of the events in the safe event set can affect E(i), E(i) cannot be added to the set. On the other hand, to include E(i) in the safe event set, one must also check whether any of the earlier events E(1), ..., E(i − 2), and E(i − 1) that are not included in the current safe event set can affect E(i). If none of them can affect E(i), then E(i) can be added to the safe event set. The following is a step-by-step construction of the safe event set for the event list in Fig. 3.9: {}, {E1}, {E1, E2}, {E1, E2, E3}, {E1, E2, E3, E4}, {E1, E2, E3, E4, E7}, {E1, E2, E3, E4, E7, E8}. E5 cannot be added to the safe event set because E2 can affect it, and E6 cannot be added because E1 can affect it. In this figure, we use dashed lines to draw E1, E2, E3, E4, E7, and E8 in the event list to represent the fact that they have been moved to the safe event list.
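The construction above can be sketched as a small routine. The function name, the 0-based indexing, and the can_affect callback are assumptions made for illustration; limiting the scan to the first `window` events is one possible reading of how MP bounds the search, not a confirmed detail of the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the 0-based indices of the events that form the safe event set.
// Events are assumed to be indexed in event-list order (index 0 is the head).
// can_affect(j, i) is assumed to report whether the earlier event j can
// affect the later event i. Only the first `window` events are examined;
// this is one possible reading of the MP parameter.
std::vector<std::size_t> build_safe_event_set(
        std::size_t num_events, std::size_t window,
        const std::function<bool(std::size_t, std::size_t)>& can_affect) {
    std::vector<std::size_t> safe;
    const std::size_t limit = num_events < window ? num_events : window;
    for (std::size_t i = 0; i < limit; ++i) {
        bool affected = false;
        // E(i) must not be affected by any earlier event, whether or not
        // that earlier event is itself in the safe event set.
        for (std::size_t j = 0; j < i && !affected; ++j)
            affected = can_affect(j, i);
        if (!affected) safe.push_back(i);
    }
    return safe;
}
```

If the callback reports only that E2 can affect E5 and that E1 can affect E6, a run over eight events yields the indices of E1, E2, E3, E4, E7, and E8, which matches the step-by-step construction in the text.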

When an event stays in the safe event list, any available worker thread can dequeue it for execution. In Fig. 3.9, worker thread 1 has dequeued E1 and worker thread 2 has dequeued E2. We use dashed lines to draw E1 and E2 in the safe event list to represent the fact that they have been dequeued by worker threads and are still being processed. When a worker thread finishes its event computation, it tries to dequeue another event from the safe event list. If no safe event can be found, it wakes up the master thread and asks it to search for more safe events and move them from the event list into the safe event list. The master thread then uses the method described above to determine a new safe event set under the current condition (e.g., E1 is finished and removed from the safe event set). If new events can be added to the safe event set, the master thread moves them into the safe event list. As shown in this example, events in the event list need not be moved into the safe event list in sequential order. Even though E7 and E8 have been moved into the safe event list, E5 and E6 still stay in the event list. This situation is correct because one has confirmed that E5 and E6 cannot affect E7 and E8 before adding E7 and E8 to the safe event set. Later on, when E1 and E2 are finished and have been removed from the safe event set, E5 and E6 will become qualified to be added to the safe event list.

When a new event E′ is inserted into the event list (e.g., worker thread 1 may schedule this event when executing E1), one needs to check whether this event can be moved into the safe event list. Whether or not it can be moved into the safe event list does not affect the events already in the safe event list. This is because all possible situations (i.e., E1 schedules E′ and the execution of E′ may affect an event in the safe event list) have already been considered when determining the safe event set. To determine whether E′ can be moved into the safe event list, one uses the above method to construct a new safe event set based on the new event list. If E′ appears


Chapter 4 Design and Implementation

This chapter presents how to apply the ELP approach to the NS2 network simulator. We first describe the modification to the NS2 simulation engine and then the detailed design and implementation of the master thread, the worker thread, and the thread IPC. The modification of the wired and wireless modules on top of the NS2 network simulator is also presented.

4.1 The NS2 Simulation Engine Modification

The NS2 network simulator consists of C++ modules and TCL modules. Part of the information used during simulation is stored in the C++ modules, and the rest is stored in the TCL modules. To apply the ELP approach to the NS2 network simulator, some of these C++ and TCL modules need to be modified. In the following, we present the modified components of the NS2 simulation engine on which the ELP approach depends.

4.1.1 Scheduler

The NS2 network simulator uses the TCL description language to specify a simulated case. At the beginning of the simulation, the NS2 network simulator first reads this simulated case and configures the simulation environment. Then, the NS2 network simulator asks its scheduler to begin simulating this case (it calls the function run() of the scheduler to perform this simulation run). Before the ELP approach is applied, the scheduler dequeues the event with the smallest timestamp from the event list and executes events sequentially. The scheduler repeats these operations until the whole simulation is finished (when the event list becomes empty) or a special event is executed, which asks the simulation engine to stop the simulation.

Figure 4.1: The relationship between the event list and the scheduler in the original architecture of the NS2 network simulator

To apply the ELP approach to the NS2 network simulator, the scheduler needs to precompute the path lookaheads, create the worker threads, and prepare the event list and the safe event list before the simulation begins. We present these tasks in the following.

The functions schedule(), deque(), and cancel() of the scheduler are modified for the ELP approach because all of them access the event list or the safe event list. As Fig. 4.1 depicts, there is a single list in the original architecture of the NS2 network simulator, and the scheduler is responsible for maintaining this list, which stores unprocessed events. The scheduler provides three API functions to control it: schedule(), deque(), and cancel(). We briefly describe the purpose of these functions.

• schedule() Insert a new event, which is provided by the caller, into the event list.

• deque() Dequeue the event with the smallest timestamp from the event list.

• cancel() Remove the specified event, which is provided by the caller, from the event list.

Figure 4.2: The relationship between the event list, the safe event list, and the scheduler of the NS2 network simulator using the ELP approach for a dual-core system

Fig. 4.2 shows that there are two lists, the event list and the safe event list, in the network simulator using the ELP architecture, so the above functions of the scheduler must be modified. The simplest way to handle deque() is to regard the original list as the safe event list. The function then needs no change, because a worker thread calls it to obtain the safe event with the smallest timestamp. The modified function schedule() inserts a new event into the event list instead of the safe event list (the original list). The function cancel(), however, needs to be extensively modified. If the specified event that the caller wants to remove is not yet a safe event, this function should remove it from the event list. On the other hand, if it is already a safe event, this function should remove it from the safe event list instead.


    class Event;                     // forward declaration for the owner pointer

    struct ELP_Info {
        uint8_t flag_;
        unsigned int src_nid_;       // source node ID (SrcNID)
        unsigned int dst_nid_;       // destination node ID (DstNID)
        double time_;
        Event *owner;
        ...
    };

    class Event {
    public:
        ...
        double time_;
        struct ELP_Info elp_info_;   // ELP-specific information of this event
        ...
    };

Figure 4.3: The event data structure modified to support the ELP approach

Both the event list and the safe event list need to be protected by locks when the functions schedule(), deque(), and cancel() access them; otherwise, memory pointer errors can occur when more than one worker thread calls these functions. Therefore, we insert locking code based on the POSIX Threads library around these accesses.
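The locking scheme can be sketched with POSIX mutexes as shown below. This is a simplified illustration and not the actual NS2 code: the SimEvent struct, the ElpScheduler class, the mark_safe() helper, and the use of std::list are all assumptions; only schedule(), deque(), and cancel() correspond to functions named in the text.

```cpp
#include <cassert>
#include <list>
#include <pthread.h>

// Hypothetical, simplified event record used only for this sketch.
struct SimEvent {
    double time;
    bool is_safe;  // true once the event has been moved to the safe event list
};

// Sketch of a scheduler that keeps two timestamp-sorted lists, each guarded
// by its own POSIX mutex. It does not reproduce the actual NS2 classes.
class ElpScheduler {
public:
    ElpScheduler() {
        pthread_mutex_init(&event_lock_, nullptr);
        pthread_mutex_init(&safe_lock_, nullptr);
    }
    ~ElpScheduler() {
        pthread_mutex_destroy(&event_lock_);
        pthread_mutex_destroy(&safe_lock_);
    }
    // schedule(): new events always enter the event list.
    void schedule(SimEvent* e) {
        assert(e != nullptr);
        pthread_mutex_lock(&event_lock_);
        insert_sorted(event_list_, e);
        pthread_mutex_unlock(&event_lock_);
    }
    // Hypothetical helper for the master thread: move a safe event over.
    // Both locks are taken in a fixed order to avoid deadlock with cancel().
    void mark_safe(SimEvent* e) {
        pthread_mutex_lock(&event_lock_);
        pthread_mutex_lock(&safe_lock_);
        event_list_.remove(e);
        e->is_safe = true;
        insert_sorted(safe_list_, e);
        pthread_mutex_unlock(&safe_lock_);
        pthread_mutex_unlock(&event_lock_);
    }
    // deque(): workers take the safe event with the smallest timestamp,
    // or get nullptr when the safe event list is empty.
    SimEvent* deque() {
        pthread_mutex_lock(&safe_lock_);
        SimEvent* e = nullptr;
        if (!safe_list_.empty()) {
            e = safe_list_.front();
            safe_list_.pop_front();
        }
        pthread_mutex_unlock(&safe_lock_);
        return e;
    }
    // cancel(): remove e from whichever list currently holds it.
    void cancel(SimEvent* e) {
        pthread_mutex_lock(&event_lock_);
        pthread_mutex_lock(&safe_lock_);
        if (e->is_safe) safe_list_.remove(e);
        else event_list_.remove(e);
        pthread_mutex_unlock(&safe_lock_);
        pthread_mutex_unlock(&event_lock_);
    }

private:
    static void insert_sorted(std::list<SimEvent*>& lst, SimEvent* e) {
        auto it = lst.begin();
        while (it != lst.end() && (*it)->time <= e->time) ++it;
        lst.insert(it, e);
    }
    std::list<SimEvent*> event_list_;
    std::list<SimEvent*> safe_list_;
    pthread_mutex_t event_lock_;
    pthread_mutex_t safe_lock_;
};
```

Using one mutex per list lets worker threads dequeue safe events while another thread inserts into the event list; whether the thesis implementation uses one lock or two is not stated, so this split is a design assumption.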

4.1.2 Event Lists

As explained in the previous section, the original list in the NS2 network simulator is regarded as the safe event list in the ELP architecture. We need to create the event list to store unprocessed events that have not yet been determined to be safe events, and a simple linked-list structure is used to implement it. In addition to the event list, the data structure of the “event” also needs to be modified. As Fig. 4.3 illustrates, we expand this data structure to store the source node ID and destination node ID of the event in the “src_nid_” and “dst_nid_” fields, respectively. This information is used to determine whether the event is a safe event. To avoid undefined values in these fields, we assign the default value 65535 to them when an event is allocated, and an event whose source node ID is 65535 is always treated as an unsafe event by the master thread.


Figure 4.4: The worker thread gets its current simulation time based on its worker thread ID

4.1.3 Simulation Clock

In the sequential simulation mode, only one event is executing at any given time. As such, only one memory location is needed to store the current simulation time, which is accessed through an API function named “clock().” In the ELP approach, if there are N worker threads in the simulator, each of them may need to get the current simulation time during its event execution. However, the current simulation time of each worker thread may be different because it depends on the event that the worker thread is executing.

We must therefore store these current times separately and modify the API function clock(), which is called by a worker thread to get the current simulation time. For this reason, an array is used to store this information, and it is indexed by the worker thread ID. As Fig. 4.4 depicts, when a worker thread executes an event and needs to get the current simulation time by calling the API function clock(), this function retrieves the value from the array, which stores the current simulation times of all worker threads, based on the worker thread ID and returns it as the current simulation time for this specific event.
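A minimal sketch of such a per-worker clock is given below. All names are hypothetical: the accessor is called clock_now() here to avoid clashing with the C library function clock(), kMaxWorkers is an assumed upper bound, and the worker ID is kept in a thread_local variable as one possible way of mapping a thread to its array slot.

```cpp
#include <cassert>
#include <cstddef>

// Assumed upper bound on the number of worker threads for this sketch.
const std::size_t kMaxWorkers = 8;

// One clock slot per worker thread. Each slot is written only by its owning
// worker thread, so no lock is needed.
static double worker_clock[kMaxWorkers];

// Hypothetical per-thread worker ID, assigned once when the worker thread
// starts. thread_local gives every thread its own copy.
static thread_local std::size_t my_worker_id = 0;

void set_worker_id(std::size_t id) {
    assert(id < kMaxWorkers);
    my_worker_id = id;
}

// Called by a worker thread right after it dequeues a safe event.
void begin_event(double event_timestamp) {
    worker_clock[my_worker_id] = event_timestamp;
}

// Replacement for the single global clock: returns the simulation time of
// the event that the calling worker thread is executing.
double clock_now() {
    return worker_clock[my_worker_id];
}
```

Because each worker thread touches only its own slot, reads and writes of the simulation time need no locking, which is consistent with the lock-free access described in Chapter 3.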

4.1.4 Random Number

In the Linux system, a (single-threaded or multi-threaded) process is given a random table and it is shared among its user threads and kernel threads. However, this
