An Efficient Time-Based Checkpointing Protocol for Mobile Computing Systems over Mobile IP


© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.


CHI-YI LIN, SZU-CHI WANG and SY-YEN KUO∗

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan

Abstract. Time-based coordinated checkpointing protocols are well suited for mobile computing systems because they need no explicit coordination messages while keeping the advantages of coordinated checkpointing. However, without coordination, every process has to take a checkpoint during a checkpointing process. In this paper, an efficient time-based coordinated checkpointing protocol for mobile computing systems over Mobile IP is proposed. The protocol reduces the number of checkpoints per checkpointing process to nearly the minimum, so that fewer checkpoints need to be transmitted through the costly wireless link. The protocol also keeps the number and size of the messages transmitted in the wireless network small. In addition, the protocol is nonblocking because inconsistencies can be avoided by the piggybacked information in every message. Therefore, the protocol brings very little overhead to a mobile host with limited resources. Additionally, by taking advantage of reliable timers in mobile support stations, the time-based checkpointing protocol can adapt to wide area networks.

Keywords: mobile computing, fault tolerance, checkpointing and rollback-recovery

Category: Fault tolerance for IPv6 or IPv4-based mobile computing

1. Introduction

As computer and wireless communication technologies advance, the paradigm of mobile computing comes close to reality. Mobile users are able to access and exchange information while they are on the move. As a result, collaborative work can be done easily, no matter where the participating members are physically located. Moreover, since user mobility is supported, joint workers may be distributed over the wide area network, each of them connected to the network via a wireless link. For example, in a sensor network that carries out a real-time scientific computation, sensors with processing capability can be mobile and distributed. Traveling salespersons may rely on each other's information to make an appropriate decision at a particular time. In these scenarios, the most important thing is to make sure that the work is progressing, and to minimize the lost work if a failure occurs.

To achieve high reliability, checkpointing and rollback-recovery techniques are widely used in the parallel and distributed computing environment [4–6,8,18]. Recently, checkpointing protocols for mobile computing systems have also been proposed [1–3,7,9,12,14–17]. A common goal of checkpointing protocols for mobile environments is to avoid extra coordination messages and unnecessary checkpoints. Coordinated checkpointing protocols have the advantage of simplicity over uncoordinated checkpointing protocols in terms of recovery and garbage collection. Besides, the former requires less storage capacity for saving checkpoints. Traditional coordinated checkpointing protocols synchronize processes by exchanging coordination messages, but these coordination messages can be avoided by using time-based protocols [4,10–13,18], which use synchronized clocks or timers to indirectly coordinate the creation of checkpoints.

∗ Corresponding author. E-mail: [email protected]

However, time-based protocols require every process to take a checkpoint during a checkpointing process, and each checkpoint is saved in the local disk of a mobile host or sent to a fixed host. In this paper, we propose an efficient time-based checkpointing protocol that tries to reduce the number of checkpoints. The basic idea is that if a checkpoint initiator does not transitively depend on a process, the process does not have to take a checkpoint associated with the initiator. So, in our protocol, every process takes a soft checkpoint first, and the soft checkpoint is discarded when the process is found to be irrelevant to the initiator. Soft checkpoints are saved in the main memory of mobile hosts, and they are saved in the local disk at a later time only if they are not discarded. As a result, the number of disk accesses in the mobile hosts can be reduced.

With our protocol, the number of checkpoints is close to the minimum, and the number of coordinating messages is very small compared to other existing protocols. The protocol is also nonblocking because inconsistency between processes is avoided with the aid of the piggybacked information in each message.

The rest of this paper is organized as follows. Section 2 briefly introduces the related work. Section 3 describes the system model and the background. In section 4 we show how the timer synchronization can be improved for time-based protocols. In section 5 we present our checkpointing protocol based on the synchronization technique in section 4. Finally, section 6 concludes our work.


2. Related works

Prakash and Singhal [16] first proposed a checkpointing protocol for mobile computing systems that requires only a minimum number of processes to take checkpoints (called the min-process property) and does not block the underlying computation (called the nonblocking property) during checkpointing. However, Cao and Singhal [2] proved that a min-process nonblocking checkpointing algorithm does not exist. They also introduced the concept of the mutable checkpoint [3], which can be saved in the main memory or local disk of a mobile host, and is neither a tentative nor a permanent checkpoint. In their checkpointing algorithm, the number of new tentative or permanent checkpoints is minimal, excluding those mutable checkpoints that are discarded after a checkpointing process.

Time-based protocols remove the need for explicit coordination messages when taking a global checkpoint. Since timers cannot be perfectly synchronized, the consistency between all the checkpoints can still be a problem. In [11,13], the consistency problem is solved by disallowing message sending during a period after a timer expires, but this turns the checkpointing protocol into a blocking protocol. In [12], however, processes are nonblocking because the consistency problem is resolved by the information piggybacked in each message. Timer synchronization can also be done using the piggybacked information. But when the transmission delay between two mobile hosts becomes relatively large, the synchronization result will be less accurate. Since the reliability of a mobile support station is much higher than that of a mobile host, we can take advantage of the accurate clock or timer in a mobile support station as a reliable reference for mobile hosts.

3. System model and background

A mobile computing application is executed by a set of N processes running on several mobile hosts (MHs). Processes communicate with each other by sending messages. These messages are received and then forwarded to the destination host by mobile support stations (MSSs), which are interconnected by a fixed network. The mobility of MHs is supported by Mobile IP of IPv4 or IPv6, so that messages can be routed to a destination MH that is moving around in the network. An MH is associated with a Home Agent (HA) when it is within its home network, and with a Foreign Agent (FA) when it is within a foreign network.

To ensure ordered and reliable message deliveries, each message is associated with an increasing sequence number. To provide a computing system with fault tolerance capability, every process takes a checkpoint periodically. The time between two consecutive checkpoints is called a checkpoint period. Each checkpoint is associated with a monotonically increasing checkpoint number. In a coordinated checkpointing protocol, those checkpoints with the same checkpoint number from all the processes form a globally consistent checkpoint. The time interval after taking the kth checkpoint and before taking the (k + 1)th checkpoint is called the kth checkpoint interval.

In a system, each node (including MHs and MSSs) contains a system clock, with a typical clock drift rate ρ in the order of 10^-5 or 10^-6. The system clocks of MSSs can be synchronized using Internet time synchronization services such as the Network Time Protocol (NTP), which keeps the maximum deviation σ of all the clocks within tens of milliseconds. The synchronization of MSSs can be done periodically. However, in a wide area network environment, MSSs may belong to different organizations. So, instead of synchronizing the physical system clocks of MSSs, we use the clock synchronization protocol to synchronize the logical clocks. The clocks of MHs can be synchronized likewise, but explicit synchronization messages bring overhead to MHs because of the limited wireless bandwidth. In addition, the system clock of an MH may not be controllable by a user-level application. It is also possible that other applications are running concurrently in the MH, and some of them require transactions. Consequently, modifying system clocks to coordinate between processes may not be feasible. Therefore, to coordinate with each other, processes use synchronized timers instead of synchronized clocks. The advantages of using timers to coordinate the creation of checkpoints are that the checkpointing protocol does not have to rely on synchronized system clocks of the participating hosts, and that no explicit synchronization is needed.

Before a mobile computing application starts, a predefined checkpoint period T is set on the timers. When the local timer expires, the process saves its system state as a checkpoint. If all the timers expire at exactly the same time, the set of N checkpoints taken at the same instant forms a globally consistent checkpoint. Since timers are not perfectly synchronized, the checkpoints may not be consistent because of orphan messages. An orphan message m represents an inconsistent system state in which the event receive(m) is included in the system state while the event send(m) is not. Orphan messages may lead to the domino effect, which causes unbounded, cascading rollback propagation. As a result, in designing a checkpointing protocol, we need to ensure that no orphan message exists in a global checkpoint, so that the recovery can be free from the domino effect.
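To make the orphan-message condition concrete, the following minimal sketch checks a set of messages against per-process checkpoint times. It assumes, purely for illustration, that all events can be stamped on one global time axis; the names Message, orphan_messages and the sample values are hypothetical and do not come from the protocol itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    send_time: float   # global time of the event send(m) (illustrative assumption)
    recv_time: float   # global time of the event receive(m)


def orphan_messages(messages, checkpoint_time):
    """Return the messages whose receive is recorded but whose send is not.

    checkpoint_time maps a process name to the global time at which that
    process took its checkpoint.
    """
    orphans = []
    for m in messages:
        send_recorded = m.send_time <= checkpoint_time[m.sender]
        recv_recorded = m.recv_time <= checkpoint_time[m.receiver]
        if recv_recorded and not send_recorded:
            orphans.append(m)
    return orphans


# P1's timer expires slightly early (t = 10.0), P2's slightly late (t = 10.2).
ckpt = {"P1": 10.0, "P2": 10.2}
msgs = [
    Message("P1", "P2", send_time=10.05, recv_time=10.15),  # sent after P1's checkpoint
    Message("P2", "P1", send_time=9.0, recv_time=9.5),      # fully before both checkpoints
]
found = orphan_messages(msgs, ckpt)
```

In this example only the first message is reported: P2's checkpoint records its receipt, but P1's checkpoint was taken before it was sent.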

4. Improved timer synchronization

In this section we modify the timer synchronization mechanism of the adaptive checkpointing protocol proposed by Neves and Fuchs [12], so that the mechanism can adapt to wide area networks.

The mechanism of timer synchronization in [12] uses piggybacked timer information from the sender to adjust the timer at the receiver. When the sender sends a message, it piggybacks its “time to next checkpoint” (represented as timeToCkp) in the message. The receiver then uses the piggybacked timeToCkp to adjust its own timeToCkp. The checkpoint number of the sender is also piggybacked in the message, so that the receiver can act accordingly to avoid an orphan message. However, it is clear that if the timer of the sender is faulty, the erroneous timer information will be spread to the receiver. Besides, since the transmission delay between the sender and the receiver is variable, the timer information from the sender may not reflect the correct situation when the message finally arrives at the receiver.

Figure 1. Timer synchronization: (a) cnMSS > cnS = cnD, and (b) cnMSS < cnS = cnD.

To achieve more accurate timer synchronization, we can utilize the timers in MSSs as an absolute reference because these timers in fixed hosts are more reliable than those in MHs. We assume that the timers of the MSSs are synchronized every checkpoint period, which is not a problem for fixed hosts in a wired network. In our design, the local MSS of the receiver is responsible for piggybacking its own timeToCkp in every message destined to the receiver, because the MSS is the closest fixed host to the receiver.

In the system every MH and MSS maintains a checkpoint number. The checkpoint number is incremented whenever the local timer expires. In the following we use cnS, cnD, and cnMSS to represent the checkpoint numbers of the sender, the receiver, and the local MSS of the receiver, respectively. Just as in [12], the sender piggybacks its own checkpoint number cnS in each message. When the local MSS of the receiver receives the message, apart from timeToCkp, it also piggybacks cnMSS in the message. So there are three pieces of information piggybacked in a message when the message arrives at the receiver: cnS, cnMSS, and timeToCkp of the local MSS (represented as m.timeToCkp). Note that in practice messages take a minimum time tdmin to be delivered from an MSS to an MH in its cell. So, whenever the local timer of an MH is adjusted by m.timeToCkp, subtracting tdmin from m.timeToCkp makes the adjustment more accurate. In the following description we use the symbol Δ to represent minus tdmin. The relationship between the checkpoint numbers cnD, cnMSS, and cnS determines how the timer is adjusted, as described in the following cases:

I. Checkpoint numbers of the sender and the receiver are the same (cnS = cnD):

(1) cnMSS is equal to cnS and cnD (cnMSS = cnS = cnD). In this case, the receiver resets its timeToCkp to m.timeToCkp + Δ.

(2) cnMSS is larger than cnS and cnD (cnMSS > cnS = cnD, see figure 1(a)). It means that the timer of MHD is late compared to that of MSS2. So as soon as message m is processed, MHD takes a checkpoint with checkpoint number equal to cnMSS. Then MHD resets its timeToCkp to m.timeToCkp + Δ.

(3) cnMSS is smaller than cnS and cnD (cnMSS < cnS = cnD, see figure 1(b)). It means that the timers of MHS and MHD are both early compared to that of MSS2. In order to make the timer of MHD expire at around the right time, MHD resets its timeToCkp to checkpoint period T plus m.timeToCkp + Δ.

II. Checkpoint number of the sender is smaller than that of the receiver (cnS < cnD):

(1) cnMSS is equal to cnD (cnS < cnMSS = cnD). Since MHD and its local MSS are within the same checkpoint period, MHD just resets its timeToCkp to m.timeToCkp + Δ.

(2) cnMSS is equal to cnS (cnS = cnMSS < cnD). cnMSS < cnD means that the timer of MHD expires too early, so MHD resets its timeToCkp to checkpoint period T plus m.timeToCkp + Δ.

III. Checkpoint number of the sender is larger than that of the receiver (cnS > cnD):

(1) cnMSS is equal to cnD (cnS > cnMSS = cnD, see figure 2(a)). In this case, before MHD can process m, it has to take a checkpoint with checkpoint number equal to cnS; otherwise m becomes an orphan message. Then MHD resets its timeToCkp to checkpoint period T plus m.timeToCkp + Δ.

(2) cnMSS is equal to cnS (cnS = cnMSS > cnD, see figure 2(b)). This case is basically the same as the previous one: MHD has to take a checkpoint before processing m in order not to make m an orphan message. Since the timer of MHD is late compared to that of MSS2 (cnMSS > cnD), MHD then resets its timeToCkp to m.timeToCkp + Δ.
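The seven cases above can be condensed into one decision function. The sketch below is only an illustration under stated assumptions: the function name adjust_timer, its return convention, and the string tags "after_m" and "before_m" are hypothetical, and checkpoint-number combinations that the text does not enumerate simply fall through to the nearest listed case.

```python
def adjust_timer(cn_s, cn_d, cn_mss, m_time_to_ckp, period, delta):
    """Return (new_cn_d, forced_checkpoint, new_time_to_ckp) for the receiver.

    forced_checkpoint is None, "after_m" (checkpoint once m is processed)
    or "before_m" (checkpoint before m may be processed).
    delta is the correction term added to m.timeToCkp in the text.
    """
    if cn_s == cn_d:
        if cn_mss == cn_d:                                   # case I(1)
            return cn_d, None, m_time_to_ckp + delta
        if cn_mss > cn_d:                                    # case I(2): receiver timer is late
            return cn_mss, "after_m", m_time_to_ckp + delta
        return cn_d, None, period + m_time_to_ckp + delta    # case I(3): receiver timer is early
    if cn_s < cn_d:
        if cn_mss == cn_d:                                   # case II(1)
            return cn_d, None, m_time_to_ckp + delta
        return cn_d, None, period + m_time_to_ckp + delta    # case II(2): receiver timer is early
    # cn_s > cn_d: a checkpoint is needed first, otherwise m becomes an orphan
    if cn_mss == cn_d:                                       # case III(1)
        return cn_s, "before_m", period + m_time_to_ckp + delta
    return cn_s, "before_m", m_time_to_ckp + delta           # case III(2)


# Sender is one interval ahead of the receiver and of the receiver's MSS.
result = adjust_timer(cn_s=4, cn_d=3, cn_mss=3,
                      m_time_to_ckp=40.0, period=300.0, delta=-0.5)
```

Here the receiver must checkpoint before processing the message and then waits one extra period, which matches case III(1).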

Figure 2. Timer synchronization: (a) cnS > cnMSS = cnD, and (b) cnS = cnMSS > cnD.

From the above discussion, we can see that the receiver’s timer can be synchronized whenever a message is received. Since the synchronization information is piggybacked in every message, the sender’s timer can also be synchronized with its local MSS as soon as the sender receives the acknowledgement. In other words, the timers of both the sender and the receiver can be synchronized in every message exchange.

In the next section, our checkpointing protocol requires that at the end of a checkpoint interval, none of the MHs’ timers expires earlier than those of MSSs. To fulfill the requirement, we need to take the clock drifts of MHs and MSSs into account. The clock drift rates of the timers in MHs and MSSs are represented as ρMH and ρMSS, respectively. In the system model we also mentioned that after the clock synchronization, there exists a maximum deviation σ between two MSSs. In the following lemma, we show how the requirement is achieved.

Lemma 1. By setting Δ = σ + 2ρMSS × T + ρMH × 2T − tdmin in the algorithm, for any process that has received a message in the (cn − 1)th checkpoint interval, its (cn + 1)th checkpoint interval begins no earlier than that of an MSS.

Proof. Assume a process is in the (cn − 1)th checkpoint interval and it receives a message. It is straightforward that the maximum time deviation between any two MSSs after a time period T is σ + 2ρMSS × T. If receiving the message triggers a new checkpoint to be taken immediately, as in cases I(2), III(1), and III(2), the maximum time to the (cn + 1)th checkpoint is 2T. As a result, the maximum time deviation between the process and its MSS is ρMH × 2T − tdmin from receiving the message to taking the (cn + 1)th checkpoint. By setting Δ = σ + 2ρMSS × T + ρMH × 2T − tdmin, the adjustment of timeToCkp makes the local timer expire no earlier than that of an MSS for the cnth checkpoint interval. On the other hand, if receiving the message does not trigger a new checkpoint immediately, as in all other cases, the maximum time to the (cn + 1)th checkpoint is T. But multiplying 2T with ρMH in Δ ensures that even if the process does not receive any message during the cnth checkpoint interval, the process’s (cn + 1)th checkpoint interval will not begin earlier than that of an MSS.
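As a quick numerical illustration of the correction term in Lemma 1, the sketch below evaluates σ + 2ρMSS × T + ρMH × 2T − tdmin for one set of values. The function name timer_margin and the sample numbers (σ = 50 ms, a 5-minute period, tdmin = 5 ms) are assumptions chosen only to match the orders of magnitude mentioned in the system model; they are not measurements from the paper.

```python
def timer_margin(sigma, rho_mss, rho_mh, period, td_min):
    """Correction term of Lemma 1, all times in seconds."""
    return sigma + 2 * rho_mss * period + rho_mh * 2 * period - td_min


# Illustrative values only: 50 ms MSS deviation, drift rates of 1e-6 and 1e-5,
# a checkpoint period of 300 s and a minimum MSS-to-MH delay of 5 ms.
margin = timer_margin(sigma=0.05, rho_mss=1e-6, rho_mh=1e-5,
                      period=300.0, td_min=0.005)
```

With these inputs the margin is roughly 0.0516 s, so the drift terms contribute only a few milliseconds and the MSS deviation σ dominates.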

5. A time-based coordinated checkpointing protocol

In this section, we present our time-based checkpointing protocol based on the synchronization techniques described in section 4, which is applicable to mobile computing systems over Mobile IP.

5.1. Notations and data structures

• SoftCkptcn: The cnth soft checkpoint of a process, which is saved in the main memory of an MH.

• HardCkptcn: The cnth hard checkpoint of a process, which is saved in the local disk of an MH or the stable storage of the MH’s HA or FA.

• PermCkptcn: The cnth permanent checkpoint of a process, which is saved in the stable storage of the HA or FA. The system recovery line consists of N consistent permanent checkpoints, one from each process.

• Cellk: The wireless cell served by MSSk.

• Recvi: An array of N bits of process Pi maintained by Pi’s local MSS. In the beginning of every checkpoint interval, Recvi[j] is initialized to 0 for j = 1 to N, except that Recvi[i] always equals 1. When Pi receives a message m from Pj, and the receipt of m is confirmed by Pi’s MSS, Recvi[j] is set to 1.

• LastRecvi: The Recvi vector of the preceding checkpoint interval of process Pi, which is also maintained by Pi’s local MSS.

• RejectCPi: A variable that saves a checkpoint number for process Pi, maintained by Pi’s local MSS. When Pi is trying to transmit its hard checkpoint to the wired network, the local MSS rejects the transmission if the checkpoint number equals RejectCPi.

• CkptNumi: The current checkpoint number of Pi in the local MSS’s knowledge.
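The per-process variables kept by an MSS can be pictured as one small record. The following sketch is a hypothetical illustration: the class name DependencyState and its method names are not part of the protocol, and process indices are 0-based here rather than 1 to N.

```python
class DependencyState:
    """State an MSS keeps for one process P_i (index i, 0-based in this sketch)."""

    def __init__(self, i, n):
        self.i = i
        self.n = n
        self.ckpt_num = 0              # CkptNum_i
        self.recv = self._fresh()      # Recv_i
        self.last_recv = self._fresh() # LastRecv_i
        self.reject_cp = None          # RejectCP_i (no rejected checkpoint yet)

    def _fresh(self):
        vec = [0] * self.n
        vec[self.i] = 1                # Recv_i[i] always equals 1
        return vec

    def confirm_receipt(self, j):
        """The MSS has confirmed that P_i received a message from P_j."""
        self.recv[j] = 1

    def enter_interval(self, cn):
        """P_i entered checkpoint interval cn: Recv_i becomes LastRecv_i."""
        if cn > self.ckpt_num:
            self.ckpt_num = cn
            self.last_recv = self.recv
            self.recv = self._fresh()


state = DependencyState(i=0, n=3)
state.confirm_receipt(2)   # P_0 received a message from P_2
state.enter_interval(1)    # P_0 moves on to interval 1
```

After the rollover, last_recv holds the dependencies of the finished interval and recv starts again with only the process's own bit set.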

5.2. The checkpointing protocol

5.2.1. Checkpoint initiation

A process takes a soft checkpoint whenever its local timer expires. That is, the checkpointing process is intrinsically initiated in a distributed manner, by the event of timer expiration in each mobile host. From section 4 we know that a process may also be forced to take a soft checkpoint on receiving a message. In this case, timeToCkp of the process is effectively set to zero on receiving the message, so a soft checkpoint will be taken immediately. After a soft checkpoint has been taken, Pi resumes its computation.

We assume that during each checkpointing process, one of the N processes will play the role of the checkpoint initiator. During a checkpoint interval, the initiator sends a checkpoint request only to its local MSS (denoted by MSSinit). During the checkpointing process, MSSinit is then responsible for collecting and calculating the dependency relationship between the initiator and all other processes. Note that if there is no checkpoint initiator, the protocol becomes a traditional time-based checkpointing protocol, which makes no effort to reduce unnecessary checkpoints.

5.2.2. Maintaining dependency variables in MSSs

Since an MSS is responsible for forwarding messages to/from the MHs in its cell, we can make use of the MSS to maintain the dependency variables (Recv, LastRecv) for those processes in the cell. For example, process Pi in Cellk receives a message from Pj and then sends an ACK back to Pj via MSSk. By inspecting the ACK, MSSk knows that the message from Pj has been delivered, so MSSk sets Recvi[j] to 1. Note that the ACK is piggybacked with the checkpoint number of Pi, as described in section 4. From the piggybacked checkpoint number of Pi, MSSk can tell whether Pi has entered the next checkpoint interval or not. As soon as MSSk finds that Pi has entered a new checkpoint interval, MSSk saves the current Recvi as LastRecvi, resets Recvi, and then modifies Recvi accordingly. At the same time, MSSk also updates CkptNumi for Pi. Note that the variable RejectCP is also maintained in the MSS. We will explain it in section 5.3.2.

5.2.3. Calculating the dependency relationship

When the local timer of MSSinit expires, MSSinit broadcasts a Recv_Request message to all MSSs. At time Tdefer after receiving Recv_Request, each MSS sends to MSSinit the dependency vector (Recv or LastRecv) of every process in its cell. Here Tdefer is a tunable parameter chosen so that the last message sent by a process before the process’s timer expires is expected to arrive at the local MSS no later than Tdefer after the MSS’s timer expires. We can choose a proper Tdefer according to the QoS of the mobile network: the better the QoS, the smaller the Tdefer. A reasonable upper bound of Tdefer can be one half of a checkpoint period (T/2), where a checkpoint period is normally in the order of several minutes or more.

After receiving all the dependency vectors, MSSinit constructs an N × N dependency matrix D with one row per process. We adopt the algorithm in [2], which uses matrix multiplications to calculate all the processes on which the initiator transitively depends. After finishing the calculation, the final dependency vector Dinit is obtained, in which Dinit[i] = 1 represents that the initiator transitively depends on Pi in the preceding checkpoint interval.
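The effect of the dependency calculation can be sketched as follows. Note that this sketch does not reproduce the matrix-multiplication algorithm of [2]; it uses a plain fixed-point iteration over the rows of D, which reaches the same transitive closure for the initiator's row. The function name transitive_dependencies and the 0-based indices are illustrative assumptions.

```python
def transitive_dependencies(d, init):
    """Return D_init as a list of 0/1 flags.

    d[i][j] == 1 means that P_i received a message from P_j, i.e. row i is the
    dependency vector reported for P_i. init is the index of the initiator.
    """
    n = len(d)
    dep = list(d[init])
    dep[init] = 1
    changed = True
    while changed:                      # repeat until no new process is added
        changed = False
        for i in range(n):
            if dep[i]:
                for j in range(n):
                    if d[i][j] and not dep[j]:
                        dep[j] = 1      # initiator depends on P_i, P_i on P_j
                        changed = True
    return dep


# P0 received from P1, P1 received from P2, P3 talked to nobody.
D = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
d_init = transitive_dependencies(D, init=0)
```

For this matrix the initiator P0 depends on P1 directly and on P2 transitively, while P3 stays outside the dependency set.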

5.2.4. Discarding unnecessary soft checkpoints

A process can discard the newly taken soft checkpoint if the initiator does not depend on the process in the preceding checkpoint interval. To do that, MSSinit obtains a set S_Discardcn from Dinit, which consists of every process Pi such that Dinit[i] = 0, and then MSSinit sends a notification DISCARDcn to the processes in S_Discardcn. If process Pi receives DISCARDcn, it deletes the soft checkpoint from its main memory, and then renumbers the old HardCkptcn−1 as HardCkptcn. The MSS that delivers DISCARDcn to Pi sets Recvi = (LastRecvi ∨ Recvi) for Pi.

On the other hand, for those processes that do not receive DISCARDcn during the cnth checkpoint interval, the soft checkpoints are kept in the main memory. At the beginning of the (cn + 1)th checkpoint interval, the soft checkpoint is turned into a hard one before a new soft checkpoint can be taken.
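The discard step reduces to two small operations, sketched below with hypothetical helper names (discard_set, merge_after_discard) and 0-based process indices. Our reading, stated here as an assumption, is that the bitwise OR keeps the dependencies of the interval whose checkpoint was discarded visible in the current Recv vector.

```python
def discard_set(d_init):
    """Indices of the processes the initiator does not depend on (S_Discard)."""
    return {i for i, bit in enumerate(d_init) if bit == 0}


def merge_after_discard(last_recv, recv):
    """Recv_i <- LastRecv_i OR Recv_i, applied element by element."""
    return [a | b for a, b in zip(last_recv, recv)]


to_discard = discard_set([1, 1, 1, 0])
merged = merge_after_discard([1, 0, 1], [1, 1, 0])
```

In this example only the process with index 3 may drop its soft checkpoint, and the merged vector carries the dependencies of both intervals.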

5.2.5. Maintaining permanent checkpoints

In order to ensure the robustness of the recovery line, the hard checkpoints in an MH’s local disk should be transmitted to the stable storage of a fixed host periodically. In a mobile computing system based on Mobile IP, the stable storage of the home agent (HA) or foreign agent (FA) is an ideal place to store the permanent checkpoints for the processes in an MH.

The initial state of a process can be regarded as the first permanent checkpoint saved in the HA or FA of the process. So, there exists an initial recovery line which consists of N permanent checkpoints with checkpoint number 0. From section 5.2.4 we know that a set of N hard checkpoints with checkpoint number cn will be formed at the beginning of the (cn + 1)th checkpoint interval. During the (cn + 1)th checkpoint interval, each process sends its HardCkptcn to either the HA or FA, depending on its current location. When the HA/FA receives the checkpoint, it saves HardCkptcn in its stable storage as a permanent checkpoint PermCkptcn. After the HA/FA has collected all the checkpoints it should have received, it proposes to advance the recovery line to checkpoint number cn. By adopting any feasible total agreement protocol, when all the HAs and FAs have proposed, the recovery line is advanced to checkpoint number cn.
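The advance rule for the recovery line can be illustrated with a deliberately simplified, centralized check. This is not an agreement protocol: the text leaves the choice of protocol open, and the function advance_recovery_line, the proposals dictionary and the agent names are hypothetical stand-ins for its outcome.

```python
def advance_recovery_line(current_line, cn, proposals, agents):
    """Return the new recovery line number.

    proposals maps an agent name to the highest checkpoint number that agent
    has proposed; the line moves to cn only when every agent has proposed it.
    """
    if all(proposals.get(agent, -1) >= cn for agent in agents):
        return max(current_line, cn)
    return current_line


agents = ["HA1", "FA2"]
still_waiting = advance_recovery_line(2, 3, {"HA1": 3, "FA2": 2}, agents)
advanced = advance_recovery_line(2, 3, {"HA1": 3, "FA2": 3}, agents)
```

The first call keeps the line at 2 because FA2 has not yet collected all its checkpoints; the second call moves it to 3.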

5.2.6. Handling disconnections and handoffs

When an MH within its cnth checkpoint interval is about to disconnect from its local MSS (say MSSp), the processes on the MH are required to take a hard checkpoint with checkpoint number cn + 1. Then, the checkpoints are forwarded to the HA or FA by MSSp. Assume process Pi takes a hard checkpoint HardCkptcn+1. On receiving HardCkptcn+1, the HA/FA saves (i, HardCkptcn+1) in the stable storage, but HardCkptcn+1 does not overwrite PermCkptcn+1 of Pi at the moment. The reason is that HardCkptcn+1 may possibly be discarded during the (cn + 1)th checkpoint interval if Pi is in S_Discardcn+1. If HardCkptcn+1 is not discarded during the (cn + 1)th checkpoint interval, it is turned into a permanent checkpoint during the (cn + 2)th checkpoint interval.

For a disconnected process, its dependency information (Recv, LastRecv, RejectCP, CkptNum) is still kept in the MSS. If the process reconnects with another MSS at a later time, the old MSS then sends the dependency information of the process to the new MSS. For the handoff of an MH, the old MSS also forwards the dependency information of all the processes in the MH to the new MSS. If the handoff involves a change of agents, the old agent forwards the permanent checkpoints of the processes in the MH to the new agent.

In the following we present a formal description of our checkpointing algorithm:

I. Action at the initiator Pj:

01 send Checkpoint_Request to the local MSS; /* The MSS becomes MSSinit */

II. Actions at the MSSinit:

01 wait until the local timer expires; /* Enters a new checkpoint interval */
02 cn ← cn + 1; timeToCkp ← T;
03 send Recv_Request to all MSSs;
04 while (not receiving all Recv vectors from each MSS)
05     if (timeToCkp = 0)
06         exit; /* Abort checkpointing process for this time */
07 construct matrix D;
08 Dinit ← calculate(Recvinit, D); /* Recvinit is the Recv variable of the initiator */
09 S_Discardcn ← ∅;
10 for each Pi:
11     if (Dinit[i] = 0)
12         S_Discardcn ← S_Discardcn ∪ {Pi};
13 send DISCARDcn to all processes ∈ S_Discardcn;

III. Actions at process Pi when Timeout_Event is triggered for checkpoint interval cn:

01 if (SoftCkptcn ≠ NULL)
02     turn SoftCkptcn into HardCkptcn;
03 take SoftCkptcn+1;
04 cn ← cn + 1; timeToCkp ← nextTimeToCkp;

IV. Actions executed at an MSS, say MSSk, in the checkpoint interval cn:

01 upon relaying message m from Pi ∈ Cellk to Pj: /* Note that m can also be an ACK */
02     if (m.cni > CkptNumi) /* Pi has entered next ckpt interval but MSSk is not aware of that */
03     {
04         CkptNumi ← m.cni;
05         LastRecvi ← Recvi;
06         reset Recvi;
07         modify Recvi if necessary, then send m to Pj;
08     }
09     else if (m.cni = CkptNumi)
10         modify Recvi if necessary, then send m to Pj;
11     else /* m.cni < CkptNumi */
12         send m to Pj;
13 upon receiving Recv_Request from MSSinit:
14     wait (Tdefer);
15     for each i that Pi ∈ Cellk:
16         if (CkptNumi = cn)
17             send LastRecvi to MSSinit;
18         else /* CkptNumi < cn, and CkptNumi cannot be larger than cn */
19         {
20             for any j that a message from Pj is unacknowledged:
21                 Recvi[j] ← 1;
22             send Recvi to MSSinit;
23             LastRecvi ← Recvi; reset Recvi;
24             CkptNumi ← cn;
25         }
26 upon receiving DISCARDcn for Pi in Cellk from MSSinit:
27     if (Pi is disconnected)
28         forward DISCARDcn to the HA/FA of Pi;
29     else
30         forward DISCARDcn to Pi;
31     Recvi ← LastRecvi ∨ Recvi;
32 upon receiving Disconnect_Request from MHq in Cellk:
33     for each Pi in MHq: /* HardCkptcn+1 is included in the request */
34         send HardCkptcn+1 to the HA/FA of Pi;
35 upon receiving Handoff_Request from MHq in Cellk:
36     for each Pi in MHq: /* The id of the new MSS is included in the request */
37         send (Recvi, LastRecvi, RejectCPi, CkptNumi) to the new MSS of Pi;
38 upon the local timer expires:
39     for any i such that DISCARDcn for Pi ∈ Cellk is undelivered:
40         RejectCPi ← cn;
41     cn ← cn + 1; timeToCkp ← nextTimeToCkp;
42 upon receiving ForwardCP_Request(cn−1) from Pi ∈ Cellk:
43     if (cn−1 ≠ RejectCPi)
44         receive and then forward the checkpoint to the HA/FA of Pi;
45     else
46         reject the transmission;

V. Actions for any process Pi in the checkpoint interval cn:

01 upon sending HardCkptcn−1 to the HA or FA:
02     send ForwardCP_Request(cn−1) to the local MSS;
03     if (request not rejected)
04         send HardCkptcn−1 to the HA or FA;
05 upon receiving DISCARDcn:
06     discard SoftCkptcn; renumber HardCkptcn−1 as HardCkptcn;
07 upon expiration of the local timer:
08     nextTimeToCkp ← T; trigger Timeout_Event;
09 upon receiving a message m from Pj:
10     if (m.cnj = cn)
11     {
12         deliverMsgToProcess(m);
13         if (m.cnMSS = m.cnj)
14             timeToCkp ← m.timeToCkp + Δ;
15         else if (m.cnMSS > m.cnj)
16         {
17             cn ← m.cnMSS; nextTimeToCkp ← m.timeToCkp + Δ;
18             trigger Timeout_Event; /* A soft ckpt will be taken now */
19         }
20         else
21             timeToCkp ← T + m.timeToCkp + Δ; /* m.cnMSS < m.cnj */
22     }
23     else if (m.cnj < cn)
24     {
25         deliverMsgToProcess(m);
26         if (m.cnMSS = cn)
27             timeToCkp ← m.timeToCkp + Δ;
28         else
29             timeToCkp ← T + m.timeToCkp + Δ; /* m.cnMSS = m.cnj */
30     }
31     else /* m.cnj > cn */
32     {
33         if (m.cnMSS = cn)
34             nextTimeToCkp ← T + m.timeToCkp + Δ;
35         else
36             nextTimeToCkp ← m.timeToCkp + Δ; /* m.cnMSS = m.cnj */
37         cn ← m.cnj;
38         trigger Timeout_Event; /* A soft ckpt will be taken now */
39         wait until SoftCkptcn is taken;
40         deliverMsgToProcess(m);
41     }

5.3. Handling untimely delayed messages

Since message delivery time is inherently uncertain in both the wired and the wireless network, the checkpointing algorithm has to deal carefully with untimely delayed messages.

5.3.1. Untimely delayed Recv vectors

When MSSinit is collecting the Recv vectors, it is possible that because of network congestion or link failures in the wired network, some of the Recv vectors have not been received by the end of the checkpoint interval. In this case, the checkpointing process for this time has to be aborted (see code II, lines 04–06). In effect, aborting the checkpointing process does not stop the progression of the recovery line since every process has taken a soft checkpoint, and these soft checkpoints will be turned into hard ones at the beginning of the next checkpoint interval.

5.3.2. Untimely delayed DISCARDcn notifications

There is a chance that the discard notifications DISCARDcn cannot be delivered to some of the processes before their cnth checkpoint interval is over. For example, the scenario in figure 3 can happen for any Pi and Pj both in S_Discardcn: at the end of the cnth checkpoint interval, Pi has received DISCARDcn but Pj has not. The consequence is that Pi will renumber its HardCkptcn−1 as HardCkptcn and Pj will turn its SoftCkptcn into HardCkptcn. If there exists a message m sent by Pi after Pi’s HardCkptcn−1 and received by Pj before Pj’s SoftCkptcn, then m becomes an orphan message with respect to Pi’s HardCkptcn and Pj’s HardCkptcn. However, the local MSS of Pj is aware that Pj did not receive DISCARDcn during the cnth checkpoint interval, so it sets RejectCPj to cn. During the (cn + 1)th checkpoint interval, the MSS rejects the transmission of Pj’s HardCkptcn (see code IV, lines 38–46) so that the permanent checkpoint of Pj is not overwritten by the wrongly taken hard checkpoint. On Pj’s part, if the transmission of its HardCkptcn is rejected by the local MSS, Pj deletes HardCkptcn.
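The RejectCP mechanism can be sketched in a few lines. The helper names on_mss_timer_expiry and accept_forward_request, and the use of a dictionary for RejectCP, are illustrative assumptions; the logic follows the description above, in which a checkpoint is rejected exactly when its number equals RejectCP for that process.

```python
def on_mss_timer_expiry(cn, undelivered_discard, reject_cp):
    """Record every process whose DISCARD for interval cn was not delivered.

    undelivered_discard is the set of such processes in this cell and
    reject_cp maps a process to its RejectCP value. Returns the next cn.
    """
    for process in undelivered_discard:
        reject_cp[process] = cn
    return cn + 1


def accept_forward_request(ckpt_number, process, reject_cp):
    """True if the MSS should forward the hard checkpoint to the HA/FA."""
    return reject_cp.get(process) != ckpt_number


reject_cp = {}
next_cn = on_mss_timer_expiry(5, {"Pj"}, reject_cp)
pj_accepted = accept_forward_request(5, "Pj", reject_cp)
pi_accepted = accept_forward_request(5, "Pi", reject_cp)
```

In this run Pj missed its DISCARD notification for interval 5, so its checkpoint number 5 is refused, while Pi's checkpoint with the same number is forwarded normally.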

5.3.3. Untimely delayed acknowledgements

In our algorithm, Pi’s dependency vectors Recvi and LastRecvi are maintained by the local MSS (say MSSk). MSSk maintains these variables by inspecting the piggybacked information in an ACK sent by Pi, but an untimely delayed ACK could be a problem during the checkpointing process. Take figure 4 as an example: when MSSk is about to send Recvi to MSSinit, ACK.m has not arrived, so MSSk cannot tell at that instant whether or not to include the receipt of m in Recvi. In the proposed algorithm we adopt the following policy (see code IV, lines 18–25): when MSSk is about to send Recvi to MSSinit and it finds that such an unacknowledged message exists, Recvi[j] is set to 1. That is, MSSk presumes that the case in figure 4(a) always happens. But if ACK.m finally arrives and shows that figure 4(b) is true instead, the receipt of m is then included in Recvi of the cnth checkpoint interval.
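The conservative policy can be sketched in a few lines of Python. The function name and the set `unacked_senders` are assumptions introduced for illustration; code IV may organize this bookkeeping differently.

```python
from typing import List, Set


def recv_vector_to_send(recv: List[int], unacked_senders: Set[int]) -> List[int]:
    """Return the Recv vector that MSSk reports to MSSinit.

    recv[j] == 1 means a receipt from Pj has been confirmed by an ACK.
    unacked_senders holds every j whose message was forwarded to Pi but
    has not been acknowledged yet (hypothetical bookkeeping).
    """
    reported = list(recv)  # copy, so the MSS's own record is unchanged
    for j in unacked_senders:
        # Conservative policy: presume the receipt fell in this interval.
        reported[j] = 1
    return reported
```

Over-reporting a dependency can only cause an extra checkpoint to be kept, whereas under-reporting could leave an orphan message, which is why the sketch errs on the side of setting the bit.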

5.4. Rollback recovery

When a failure occurs, all the processes should roll back to the latest recovery line. Assume the latest recovery line is numbered as cn. Each process first checks whether its HardCkptcn is still in the local disk. If HardCkptcn is found, the process can roll back to the state of HardCkptcn because the content of HardCkptcn is identical to PermCkptcn. Otherwise, the process requests its PermCkptcn from the HA or FA. Note that a wrongly taken HardCkptcn of a process, as described in section 5.3.2, will not be used for recovery because the process will delete the checkpoint as soon as the transmission of HardCkptcn is rejected.

Figure 3. A possible scenario in which the delivery of DISCARDcn for Pj is delayed.

From the above description, we can see that with the help of local hard checkpoints, some of the processes can be recovered locally so that the recovery can be done efficiently.
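The local-first recovery path can be sketched as below. The names `recover`, `local_hard`, and `fetch_perm` are hypothetical; `fetch_perm` stands in for whatever request the process sends to the HA or FA.

```python
from typing import Callable, Dict


def recover(cn: int, local_hard: Dict[int, bytes],
            fetch_perm: Callable[[int], bytes]) -> bytes:
    """Return the state to restore for the recovery line numbered cn."""
    if cn in local_hard:
        # HardCkptcn is still on the local disk; it equals PermCkptcn.
        return local_hard[cn]
    # Otherwise request PermCkptcn from the HA or FA over the network.
    return fetch_perm(cn)
```

Only the second branch uses the wireless link, which is the source of the efficiency gain noted above.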

5.5. Proof of correctness

Lemma 2. If a process Pi receives a message from another process Pj during the (cn − 1)th checkpoint interval and Pj ∈ S_Discardcn, then Pi ∈ S_Discardcn.

Proof. Suppose Pi ∉ S_Discardcn. Then, from the proposed algorithm, the initiator transitively depends on Pi during the (cn − 1)th checkpoint interval. Since Pi depends on Pj, the initiator also transitively depends on Pj during the (cn − 1)th checkpoint interval. From the proposed algorithm, Pj ∉ S_Discardcn. A contradiction.
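The argument in the proof of lemma 2 can be checked on small examples with a sketch that computes the discard set as the complement of the initiator's transitive dependency set. The encoding `recv[i][j] == 1` for "Pi received a message from Pj", the function name, and the choice to count the initiator itself as a dependency are assumptions for illustration.

```python
from typing import List, Set


def discard_set(recv: List[List[int]], initiator: int) -> Set[int]:
    """Processes on which the initiator does not transitively depend.

    recv[i][j] == 1 means Pi received a message from Pj in the interval,
    i.e. Pi depends on Pj (assumed encoding).
    """
    depends = {initiator}
    stack = [initiator]
    while stack:
        i = stack.pop()
        for j, bit in enumerate(recv[i]):
            if bit and j not in depends:
                depends.add(j)
                stack.append(j)
    return set(range(len(recv))) - depends
```

Because the dependency set is closed under "received from", any receiver of a message from a process in the returned set must itself be in the returned set, which is the statement of lemma 2.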

Lemma 3. N permanent checkpoints with the same checkpoint number form a globally consistent checkpoint.

Proof. We prove it by induction. In the beginning, the N permanent checkpoints with checkpoint number 0 obviously form a globally consistent checkpoint. Assume there are N permanent checkpoints with checkpoint number k and they form a globally consistent checkpoint. In the proposed algorithm, if a process Pi receives a message m from another process Pj during the kth checkpoint interval, there are two possibilities:

Case 1: If Pj ∈ S_Discardk+1, there are two possibilities for Pj:

(1.1) Pj does not receive DISCARDk+1 during its (k + 1)th checkpoint interval. From lemma 1 we know Pj’s local MSS does not receive the ACK of DISCARDk+1 at the end of its (k + 1)th checkpoint interval. As a result, RejectCP_j is set to k + 1 and the preceding permanent checkpoint PermCkptk of Pj is renumbered as PermCkptk+1.

(1.2) Pj receives DISCARDk+1 during its (k + 1)th checkpoint interval. In this case, Pj discards its SoftCkptk+1 and the preceding permanent checkpoint PermCkptk of Pj is renumbered as PermCkptk+1.

From lemma 2 we know Pi ∈ S_Discardk+1. By the above discussion, whether or not Pi receives DISCARDk+1 during its (k + 1)th checkpoint interval, the preceding permanent checkpoint PermCkptk of Pi is renumbered as PermCkptk+1. Since the permanent checkpoints with checkpoint number k form a globally consistent checkpoint, there is no orphan message between the (k + 1)th permanent checkpoint of Pi and the (k + 1)th permanent checkpoint of Pj.

Figure 4. The ACK of m arrives after MSSk has sent Recvi to MSSinit: (a) receipt of m is in the (cn − 1)th checkpoint interval of Pi; (b) receipt of m is in the cnth checkpoint interval of Pi.

Case 2: If Pj ∉ S_Discardk+1, Pj does not receive DISCARDk+1 and turns its SoftCkptk+1 into HardCkptk+1 at the end of its (k + 1)th checkpoint interval. After that, Pj’s HardCkptk+1 is saved as Pj’s PermCkptk+1. From the proposed algorithm, Pj must send m before it takes SoftCkptk+1. Otherwise, Pi would take SoftCkptk+1 before processing m, which would cause m to be received within Pi’s (k + 1)th checkpoint interval. As a result, regardless of whether Pi’s PermCkptk is renumbered as PermCkptk+1 or Pi’s HardCkptk+1 is saved as PermCkptk+1, there is no orphan message between Pi’s PermCkptk+1 and Pj’s PermCkptk+1.

Thus, if the N permanent checkpoints with checkpoint number k form a globally consistent checkpoint, there is no orphan message between the (k + 1)th permanent checkpoints of any two processes. That is, N permanent checkpoints with checkpoint number k + 1 form a globally consistent checkpoint.

Theorem. The proposed algorithm always creates a consistent global checkpoint.

Proof. In the beginning there are N permanent checkpoints with checkpoint number 0, and they form the initial recovery line. Suppose there exist N permanent checkpoints with the same checkpoint number k. In the proposed algorithm, the recovery line advances to checkpoint number k + 1 only when all processes’ permanent checkpoints PermCkptk+1 are collected. From lemma 3, N permanent checkpoints with the same checkpoint number form a globally consistent checkpoint. Therefore, there always exists a consistent global checkpoint.

5.6. Performance of our algorithm

In this section we discuss the performance of our checkpointing algorithm, including the blocking time, the number of permanent checkpoints, and the number of coordinating messages. Then we show the comparison with other protocols in a table. Here are the notations used in the following text:

– Nmin: the number of processes that need to take checkpoints using the Koo–Toueg algorithm [8].

– Ndep: the average number of processes on which a process depends (1 ≤ Ndep ≤ N − 1).

– Cwireless: cost of sending a message in the wireless link.

– Cwired: cost of sending a message in the wired link.

– Cbroad: cost of broadcasting a message to all processes.

– Tckpt: the checkpointing time, including the delays incurred in transferring a checkpoint from a MH to its MSS and saving the checkpoint in the stable storage in the MSS or a fixed host.

5.6.1. Blocking time

The blocking time of our protocol is 0: the protocol is nonblocking, because inconsistencies are avoided by the information piggybacked on every message.

5.6.2. Number of new permanent checkpoints

In section 5.3.3, we described that if there is an unacknowledged message like the scenario depicted in figure 4, the MSS presumes that the case in figure 4(a) always happens. That is, the receipt of message m from Pj is included in the Recv vector of Pi’s (cn − 1)th checkpoint interval. If it turns out later that figure 4(b) is true instead, then there is a chance that Pj and the processes on which Pj depends should not have been included in the final dependency with the initiator. The consequence is that additional soft checkpoints may be turned into hard ones, which increases the number of new permanent checkpoints. If we choose a proper Tdefer such that untimely delayed ACKs are very rare, the number of new permanent checkpoints is close to the minimum.

5.6.3. Number of coordinating messages

In the algorithm, the only coordinating message transmitted in the wireless link is the discard notification to a process in the set S_Discardcn. The approximate number of discard notifications is N − Nmin. Messages sent in the wired link are the N Recv vectors sent by the MSSs to MSSinit, and the N − Nmin discard notifications sent by MSSinit to the MSSs that serve the processes in S_Discardcn.

5.6.4. Comparison with other algorithms

Table 1 compares the performance of our algorithm with the algorithms in [3,8,13]. Compared to the Neves–Fuchs algorithm, which is also time-based, our algorithm reduces the number of checkpoints to nearly the minimum, so that the total number of checkpoints transmitted onto the fixed network is reduced. Fewer checkpoints transmitted also means less power consumption for mobile hosts. For a mobile computing system, it is also critical to minimize the number and size of the messages transmitted in the wireless link. If we only consider the number of coordinating messages sent in the wireless link, our algorithm performs fairly well. For the size of the piggybacked information and the coordination message in the wireless link, our protocol outperforms the Cao–Singhal algorithm, with O(1) versus O(N). On the other hand, the cost of transmitting a message in the wired link is far less than that of transmitting in the wireless link. So, although our protocol requires O(N) coordinating messages in the wired network, the cost is affordable for a wired network with high bandwidth.

Table 1. Performance comparison.

Algorithm          Blocking time        Checkpoints   Messages
Koo–Toueg [8]      Nmin × Tckpt         Nmin          3 × Nmin × Ndep × (Cwired + Cwireless)
Neves–Fuchs [13]   σ + 2ρMHT − tdmin    N             2 × N × Cwireless
Cao–Singhal [3]    0                    Nmin          ≈ 2 × Nmin × (Cwired + Cwireless)
                                                      + min(Nmin × (Cwired + Cwireless), Cbroad)
Our algorithm      0                    ≈ Nmin        ≈ N × Cwired + (N − Nmin) × (Cwired + Cwireless)
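To make the Messages column of table 1 easier to compare for a given configuration, the four expressions can be evaluated with a short Python sketch. The function name, the parameter names, and the dictionary keys are chosen here for illustration; the entries for the Cao–Singhal algorithm and for our algorithm are approximations, as in the table.

```python
from typing import Dict


def message_costs(n: int, n_min: int, n_dep: float, c_wired: float,
                  c_wireless: float, c_broad: float) -> Dict[str, float]:
    """Evaluate the Messages column of table 1 for given parameters."""
    link = c_wired + c_wireless  # one message over a wired plus a wireless hop
    return {
        "Koo-Toueg": 3 * n_min * n_dep * link,
        "Neves-Fuchs": 2 * n * c_wireless,
        "Cao-Singhal": 2 * n_min * link + min(n_min * link, c_broad),
        "Ours": n * c_wired + (n - n_min) * link,
    }
```

The resulting ranking depends on the chosen parameters, in particular on how close Nmin is to N and on the ratio of Cwireless to Cwired.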

6. Conclusions

In this paper we proposed a time-based checkpointing protocol for mobile computing systems over Mobile IP. Our protocol reduces the number of checkpoints compared to the traditional time-based protocols. We also make use of the accurate timers in the MSSs to adjust the timers in the MHs, so that our protocol is well suited to a large mobile computing system with MHs spread across a wide area network. We also take advantage of the infrastructure provided by Mobile IP, so that the permanent checkpoints of the participating processes can be saved in the HA or FA depending on the process’s current location. Compared to other protocols, our protocol performs very well in minimizing the number and size of messages transmitted in the wireless media. Tracking and computing the dependency relationship between processes are performed in the MSSs, so that MHs are free from additional tasks during checkpointing.

References

[1] A. Acharya and B.R. Badrinath, Checkpointing distributed applications on mobile computers, in: Proceedings of International Conference on Parallel and Distributed Information Systems (September 1994) pp. 73–80.

[2] G. Cao and M. Singhal, On the impossibility of min-process non-blocking checkpointing and an efficient checkpointing algorithm for mobile computing systems, in: Proceedings of the 27th International Conference on Parallel Processing (August 1998) pp. 37–44.

[3] G. Cao and M. Singhal, Mutable checkpoints: A new checkpointing approach for mobile computing, IEEE Transactions on Parallel and Distributed Systems 12(2) (2001) 157–172.

[4] F. Cristian and F. Jahanian, A timestamp-based checkpointing protocol for long-lived distributed computations, in: Proceedings of the IEEE International Symposium on Reliable Distributed Systems (September 1991) pp. 12–20.

[5] E.N. Elnozahy, D.B. Johnson and W. Zwaenepoel, The performance of consistent checkpointing, in: Proceedings of the 11th Symposium on Reliable Distributed Systems (October 1992) pp. 39–47.

[6] E. Elnozahy, L. Alvisi, Y.M. Wang and D.B. Johnson, A survey of rollback recovery protocols in message passing systems, Technical report CMU-CS-99-148, School of Computer Science, Carnegie Mellon University (June 1999).

[7] H. Higaki and M. Takizawa, Checkpoint-recovery protocol for reliable mobile systems, in: Proceedings of the IEEE Symposium on Reliable Distributed Systems (October 1998) pp. 93–99.

[8] R. Koo and S. Toueg, Checkpointing and rollback-recovery for distributed systems, IEEE Transactions on Software Engineering (January 1987) 23–31.

[9] Y. Morita and H. Higaki, Hybrid checkpoint protocol for supporting mobile-to-mobile communication, in: Proceedings of the IEEE International Conference on Information Networking (January 2001) pp. 529–536.

[10] S. Neogy, A. Sinha and P.K. Das, Checkpoint processing in distributed systems software using synchronized clocks, in: International Conference on Information Technology: Coding and Computing (April 2001) pp. 555–559.

[11] N. Neves and W.K. Fuchs, Using time to improve the performance of coordinated checkpointing, in: Proceedings of the IEEE International Computer Performance and Dependability Symposium (September 1996) pp. 282–291.

[12] N. Neves and W.K. Fuchs, Adaptive recovery for mobile environments, Communications of the ACM (January 1997) 68–74.

[13] N. Neves and W.K. Fuchs, Coordinated checkpointing without direct coordination, in: Proceedings of the IEEE International Computer Performance and Dependability Symposium (September 1998) pp. 23–31.

[14] T. Park and H.Y. Yeom, An asynchronous recovery scheme based on optimistic message logging for mobile computing systems, in: Proceedings of the International Conference on Distributed Computing Systems (April 2000) pp. 436–443.

[15] T. Park, N. Woo and H.Y. Yeom, An efficient recovery scheme for mobile computing environments, in: IEEE International Conference on Parallel and Distributed Systems (June 2001) pp. 53–60.

[16] R. Prakash and M. Singhal, Low-cost checkpointing and failure recovery in mobile computing systems, IEEE Transactions on Parallel and Distributed Systems 7(10) (1996) 1035–1048.

[17] K.F. Ssu, B. Yao, W.K. Fuchs and N. Neves, Adaptive checkpointing with storage management for mobile environments, IEEE Transactions on Reliability 48(4) (December 1999) 315–324.

[18] Z. Tong, R.Y. Kain and W.T. Tsai, A low overhead checkpointing and rollback recovery scheme for distributed systems, in: Proceedings of the 8th Symposium on Reliable Distributed Systems (October 1989) pp. 12–20.

Chi-Yi Lin received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1995. From August to December 2000, he was a visiting researcher in AT&T Labs-Research at New Jersey. He is currently working toward the Ph.D. degree at the Department of Electrical Engineering, National Taiwan University. His research interests include fault tolerant distributed/mobile computing systems, and the dissemination of information in wireless networks. He is a student member of IEEE.


Szu-Chi Wang received the B.S. degree in computer science and information engineering from National Taiwan University in 1995. He is a Ph.D. candidate in the Department of Electrical Engineering at National Taiwan University. His research interests include distributed systems, fault-tolerant systems, and the dissemination of information in wireless networks.


Sy-Yen Kuo received the B.S. (1979) in electrical engineering from National Taiwan University, the M.S. (1982) in electrical and computer engineering from the University of California at Santa Barbara, and the Ph.D. (1987) in computer science from the University of Illinois at Urbana-Champaign. Since 1991 he has been with National Taiwan University, where he is currently a professor and the Chairman of Department of Electrical Engineering. He spent his sabbatical year as a visiting researcher at AT&T Labs-Research, New Jersey from 1999 to 2000. He was the Chairman of the Department of Computer Science and Information Engineering, National Dong Hwa University, Taiwan from 1995 to 1998, a faculty member in the Department of Electrical and Computer Engineering at the University of Arizona from 1988 to 1991, and an engineer at Fairchild Semiconductor and Silvar-Lisco, both in California, from 1982 to 1984. In 1989, he also worked as a summer faculty fellow at Jet Propulsion Laboratory of California Institute of Technology. His current research interests include mobile computing and networks, dependable distributed systems, software reliability, and optical WDM networks.

Professor Kuo is an IEEE Fellow. He has published more than 180 papers in journals and conferences. He received the distinguished research award (1997–2001) from the National Science Council, Taiwan. He was also a recipient of the Best Paper Award in the 1996 International Symposium on Software Reliability Engineering, the Best Paper Award in the simulation and test category at the 1986 IEEE/ACM Design Automation Conference (DAC), the National Science Foundation’s Research Initiation Award in 1989, and the IEEE/ACM Design Automation Scholarship in 1990 and 1991.