Figure 2.7: The module-based platform provided by the NS2 network simulator
dle, and schedule those user threads entirely at the user level. NPTL is capable of associating a scheduled user thread with an available lightweight process (or kernel thread). In other words, under the MxN model the library itself must perform context switching among user threads. This increases the complexity of the implementation, and developers therefore no longer implement this model. Linux distributions have adopted the NPTL library with the 1x1 model to support multi-threaded applications.
2.4 The NS2 Network Simulator
NS2 is an object-oriented, module-based simulator developed as part of the VINT project at the University of California, Berkeley. It is a discrete event simulator targeted at networking research and provides substantial support for simulating TCP, UDP, and routing protocols over wired and wireless networks. Its performance depends on the number of events it must process: the more events there are, the slower the simulation runs.
It adopts an open-system architecture and provides a module-based platform. Fig. 2.7 is an example that depicts a wireless network topology consisting of three nodes and the organization of each node. In the module-based platform, a module skeleton is provided as shown in Fig. 2.8. Based on the skeleton, researchers can easily develop their own modules and integrate them into the simulator.
class BiConnector : public NsObject {
    virtual void sendUp(Packet *p, Handler *h);
    virtual void sendDown(Packet *p, Handler *h);
    ...
};
class Connector : public NsObject {
    virtual void send(Packet *p, Handler *h);
    ...
};
Figure 2.8: The module skeleton provided by the NS2 network simulator
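To illustrate how a module might be built on such a skeleton, the following minimal sketch derives a packet-counting module from Connector. The Packet, Handler, NsObject, and Connector definitions here are simplified stand-ins written so that the example compiles on its own; they are not the real NS2 headers. CountingConnector, Sink, set_target, and all member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for NS2 types (assumptions, not the real NS2 headers).
struct Packet { int size_bytes; };
struct Handler {};

class NsObject {
public:
    virtual ~NsObject() = default;
    virtual void recv(Packet *p, Handler *h) = 0;
};

class Connector : public NsObject {
public:
    void set_target(NsObject *t) { target_ = t; }
    void recv(Packet *p, Handler *h) override { send(p, h); }
protected:
    // Forward the packet to the next module in the chain, if any.
    virtual void send(Packet *p, Handler *h) {
        if (target_ != nullptr) target_->recv(p, h);
    }
    NsObject *target_ = nullptr;
};

// Hypothetical user module: counts packets and bytes, then forwards them.
class CountingConnector : public Connector {
public:
    std::size_t packets() const { return packets_; }
    std::size_t bytes() const { return bytes_; }
protected:
    void send(Packet *p, Handler *h) override {
        assert(p != nullptr);
        ++packets_;
        bytes_ += static_cast<std::size_t>(p->size_bytes);
        Connector::send(p, h);  // reuse the base-class forwarding
    }
private:
    std::size_t packets_ = 0;
    std::size_t bytes_ = 0;
};

// Hypothetical sink that terminates the module chain.
class Sink : public NsObject {
public:
    void recv(Packet *, Handler *) override { ++received; }
    int received = 0;
};
```

Because the new module only overrides send, it can be placed between any two existing modules without changing either of them, which is the kind of integration the skeleton is meant to support.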
Chapter 3
The Event-level Parallelism Approach
3.1 The ELP Architecture Overview
In this chapter, we present the architecture of the ELP approach. Fig. 3.1 shows the architecture of a parallel network simulator using the ELP approach. For illustration purposes, the parallel network simulator depicted in this figure is for dual-core systems. It can be extended for quad-core systems by simply increasing the number of worker threads from two to four in this figure.
The ELP architecture has four important components. We describe each of them in turn below.
• Master Thread
The major responsibility of the master thread is to determine which events are currently safe to execute in parallel without causing causality errors among themselves.
• Worker Thread
A worker thread executes only the safe events that the master thread has identified and placed in the safe event list. It may insert newly generated events into the event list during event execution.
• Event List
All newly generated events are stored in the event list. The event list is accessed by both the master thread and the worker threads.
• Safe Event List
The safe event list stores the safe events to be executed in the future. Both master and worker threads can access this list.
[Figure 3.1 shows the Master Thread, Worker Thread 1, and Worker Thread 2.]
Figure 3.1: The architecture of a parallel network simulator using the ELP approach for dual-core systems
For a dual-core system, only three simple threads need to run in the parallel network simulator: one master thread and two worker threads.
The master thread determines which events are currently “safe” [2] to be executed by worker threads in parallel without causing causality errors among themselves. When such events are found, the master thread moves them from the event list to the safe event list and wakes up any worker thread that is sleeping while waiting for a safe event to execute. As Fig. 3.2 illustrates, if no new safe events can be found at the current time, the master thread sleeps and waits for a worker thread to wake it up once that worker thread has finished its current event computation. After being woken up, the master thread continues to look for new safe events. The master thread repeats these operations until the whole simulation is finished (i.e., when the event list becomes empty).
Figure 3.2: State transition diagram of the master thread
At any given time, at least one worker thread must be busy executing an event. The ELP approach has this property because, as in a sequential simulation, the event with the smallest timestamp in the whole simulated network must be safe to execute. Suppose that only one worker thread is currently busy executing an event. When that worker thread finishes its event computation and becomes idle because the safe event list is empty, the master thread must be able to find at least one safe event (at minimum, the event at the head of the event list) and move it (or them) to the safe event list. At that point, one or more worker threads again have safe events to execute, as Fig. 3.3 depicts. This property assures the master thread that, once all safe events have been processed, a worker thread will wake it up so that it can continue to find safe events in the event list. This property is also why the ELP approach can always achieve a performance speedup higher than or close to 1 even under tiny lookahead values: under such harsh conditions, the ELP approach can degenerate to a sequential simulation. This is done simply by activating only one worker thread and having the master thread move the first event in the event list to the safe event list without checking whether it is safe. As such, the ELP approach will never result in a parallel simulation that runs much slower than a sequential simulation.
Figure 3.3: State transition diagram of the worker thread
The event list stores the events to be executed in the future and keeps them sorted in non-decreasing order of their timestamps, which denote the times at which the events should be executed. In a sequential network simulator running on a single CPU, these events are dequeued and executed one by one to avoid causality errors. If new events are created (called “scheduled” in the simulation research field) while an event is being executed by a worker thread, the newly generated events are inserted into the event list according to their timestamps.
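The sequential behaviour described above can be sketched as follows. This is a minimal illustration and not NS2's scheduler: the names Simulator, Event, LaterFirst, schedule, now, and run are assumptions of the sketch, and a std::priority_queue stands in for the sorted event list.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

class Simulator;  // illustrative name

// An event carries the simulated time at which it must run.
struct Event {
    double timestamp;
    std::function<void(Simulator &)> action;
};

// Orders the priority queue so that the smallest timestamp is on top.
struct LaterFirst {
    bool operator()(const Event &a, const Event &b) const {
        return a.timestamp > b.timestamp;
    }
};

class Simulator {
public:
    // Insert ("schedule") an event; it may not lie in the simulated past.
    void schedule(double timestamp, std::function<void(Simulator &)> action) {
        assert(timestamp >= now_);
        event_list_.push(Event{timestamp, std::move(action)});
    }

    double now() const { return now_; }

    // Dequeue and execute events in non-decreasing timestamp order.
    // Returns the number of events processed.
    int run() {
        int processed = 0;
        while (!event_list_.empty()) {
            Event ev = event_list_.top();
            event_list_.pop();
            now_ = ev.timestamp;  // advance the simulation clock
            ev.action(*this);     // may schedule further events
            ++processed;
        }
        return processed;
    }

private:
    double now_ = 0.0;
    std::priority_queue<Event, std::vector<Event>, LaterFirst> event_list_;
};
```

The loop's running time grows with the number of events it dequeues, which is why a sequential simulator slows down as the event count increases.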
In the ELP approach, two worker threads are created on a dual-core system to fully utilize the two CPUs. Safe events that have been found are stored and sorted in the safe event list, from which any worker thread can dequeue a safe event and execute it once that worker thread finishes its current event computation and becomes idle.
As long as the two worker threads are busy executing events at all times, the two CPUs will be fully utilized and good performance speedups will result. The important job of the master thread is therefore to find enough event-level parallelism for the target N-core system at all times. For an N-core system, maintaining N safe events in the safe event list at all times suffices to keep the N worker threads busy all the time.
When a worker thread finishes its current event computation and tries to dequeue a safe event but finds the safe event list empty, it wakes up the master thread and asks it to find more safe events and move them into the safe event list. If more safe events can be found, the master thread wakes up this worker thread, which then has a safe event to execute and becomes busy again. If no more safe events can be found (e.g., because the other worker thread is executing an event that can affect the events at the front of the event list), this worker thread sleeps and waits for the master thread to wake it up later, once the master thread has found new safe events and inserted them into the safe event list.
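The wake-up protocol between the master thread and the worker threads can be sketched with one mutex and two condition variables. This is a minimal sketch under stated assumptions, not the thesis implementation: ElpSimulator, SafetyCheck, and all member names are hypothetical, and the rule that decides whether an event is safe is left as a caller-supplied placeholder because the text above does not specify it. The only rule built into the sketch is the property stated earlier, that the head of the event list is safe when no event is in flight and the safe event list is empty.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ElpSimulator;  // illustrative name, not from the source

struct Event {
    double timestamp;
    std::function<void(ElpSimulator &, double)> action;  // (simulator, event time)
};
struct LaterFirst {
    bool operator()(const Event &a, const Event &b) const {
        return a.timestamp > b.timestamp;
    }
};
// Placeholder safety test supplied by the caller (an assumption of this sketch).
using SafetyCheck = std::function<bool(const Event &head, int busy_workers)>;

class ElpSimulator {
public:
    explicit ElpSimulator(SafetyCheck is_safe) : is_safe_(std::move(is_safe)) {}

    // Insert a new event into the event list (may be called during event execution).
    void schedule(double timestamp, std::function<void(ElpSimulator &, double)> action) {
        std::lock_guard<std::mutex> lock(mutex_);
        event_list_.push(Event{timestamp, std::move(action)});
    }

    // Run the master on the calling thread plus `workers` worker threads.
    void run(int workers) {
        assert(workers > 0);
        std::vector<std::thread> pool;
        for (int i = 0; i < workers; ++i) pool.emplace_back([this] { worker(); });
        master();
        for (auto &t : pool) t.join();
    }

private:
    void master() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            bool moved = false;
            while (!event_list_.empty()) {
                // With nothing in flight and nothing queued, the head is always safe.
                bool head_trivially_safe = (busy_ == 0 && safe_list_.empty());
                if (!head_trivially_safe && !is_safe_(event_list_.top(), busy_)) break;
                safe_list_.push_back(event_list_.top());
                event_list_.pop();
                moved = true;
            }
            if (moved) worker_cv_.notify_all();  // wake sleeping workers
            if (event_list_.empty() && safe_list_.empty() && busy_ == 0) {
                finished_ = true;                // simulation is over
                worker_cv_.notify_all();
                return;
            }
            master_cv_.wait(lock);  // sleep until a worker wakes the master
        }
    }

    void worker() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            while (safe_list_.empty() && !finished_) {
                master_cv_.notify_one();  // ask the master for more safe events
                worker_cv_.wait(lock);
            }
            if (safe_list_.empty()) return;  // finished_ is set
            Event ev = std::move(safe_list_.front());
            safe_list_.pop_front();
            ++busy_;
            lock.unlock();
            ev.action(*this, ev.timestamp);  // execute the event outside the lock
            lock.lock();
            --busy_;
            master_cv_.notify_one();  // this worker finished an event
        }
    }

    SafetyCheck is_safe_;
    std::mutex mutex_;
    std::condition_variable master_cv_, worker_cv_;
    std::priority_queue<Event, std::vector<Event>, LaterFirst> event_list_;
    std::deque<Event> safe_list_;
    int busy_ = 0;
    bool finished_ = false;
};
```

Passing a check that always returns false makes the sketch behave like the degenerate sequential mode, since only the head event is ever released and only while no worker is busy. Passing a check that always returns true is valid only when the events are fully independent of one another.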
Since the master thread and the worker threads all need to access the event list and the safe event list, accesses to these lists are coordinated and protected by locks to ensure data structure consistency. Any other global variables or data structures that may be accessed during event execution need to be protected by locks as well. One global variable that is accessed very frequently during event execution is the variable storing the current simulation time. However, because this variable is only read during event execution and its value can only be advanced by the event-processing loop of the network simulator (in the master thread), it need not be protected by locks. When a worker thread dequeues a safe event from the safe event list for execution, the timestamp of that event is the current simulation time for that specific event. As such, one can store it in a variable local to this worker thread. When the worker thread executes the event and needs the current simulation time, it calls an API function, which retrieves the value of this local variable based on the thread ID and returns it as the current simulation time for this specific event. No locking/unlocking overhead is incurred when accessing the current simulation time of a simulated network.
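One way to realise such a per-thread simulation time is sketched below. It uses C++ thread_local storage in place of an explicit lookup keyed by thread ID, which is a substitution made for brevity. The namespace sim and the functions now, execute_event, and time_seen_on_new_thread are hypothetical names, not the simulator's API.

```cpp
#include <cassert>
#include <thread>

namespace sim {

// One copy per thread: the timestamp of the event this thread is executing.
thread_local double current_event_time = 0.0;

// Hypothetical API that protocol code calls to read the current simulation time.
inline double now() { return current_event_time; }

// Called by a worker thread right before it executes a dequeued safe event.
template <typename Action>
void execute_event(double timestamp, Action &&action) {
    assert(timestamp >= 0.0);        // this sketch assumes non-negative simulated time
    current_event_time = timestamp;  // no lock: only this thread sees this copy
    action();
}

// Usage helper: run one event on a fresh thread and report the time it observed.
inline double time_seen_on_new_thread(double timestamp) {
    double seen = -1.0;
    std::thread t([&] { execute_event(timestamp, [&] { seen = now(); }); });
    t.join();
    return seen;
}

}  // namespace sim
```

Each worker thread reads and writes only its own copy of current_event_time, so two workers executing events with different timestamps each observe the correct time without any locking.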