Two-Version-Based Concurrency Control and Recovery in Real-Time Client/Server Databases∗†


Tei-Wei Kuo, Yuan-Ting Kao^†, and Chin-Fu Kuo

Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan 106, ROC

†Department of Computer Science and Information Engineering National Chung Cheng University, Chiayi, Taiwan 621, ROC

Abstract

While there has been a significant amount of research in real-time concurrency control, little work has been done in logging and recovery for real-time databases. This paper proposes a two-version approach which considers both real-time concurrency control and recovery.

We propose a network-server-based architecture and algorithms which not only reduce the blocking time of higher-priority transactions and improve the response time of client-side read-only transactions, but also provide a diskless run-time logging mechanism and an efficient and predictable recovery procedure. The performance of the algorithms was verified by a series of simulation experiments that compared them with the well-known Priority Ceiling Protocol (PCP), the Read/Write PCP, the New PCP, and the 2-version two-phase locking protocol, with very encouraging results. The schedulability of higher-priority transactions and the response time of client-side read-only transactions were both greatly improved.

KEY WORDS AND PHRASES: Real-Time Database, Concurrency Control, Recovery, Read-Only Transactions, Client/Server Databases, Logging, Write Through Procedure.

∗Supported in part by research grants from the National Science Council under Grants NSC87-2213-E-194-018 and NSC87-2213-E-309-002.

†This paper is an extended version of a paper in the IEEE Third International High Assurance Systems Engineering Symposium. Yuan-Ting Kao graduated from the National Chung Cheng University under the thesis supervision of Tei-Wei Kuo.


1 Introduction

Real-time concurrency control has been an active research topic in the past decades. A number of researchers [5, 6, 7, 14, 16, 21, 17, 22, 18, 26, 27, 25, 30, 32] have proposed various effective mechanisms in the concurrency control of real-time data access. In particular, semantics-based concurrency control [6, 7, 16, 19, 26, 27, 25, 32] has been shown to improve the system schedulability significantly. Researchers also explored issues in processing read-only transactions, such as those [18, 19] based on the idea of dynamic adjustment of serializability order [22].

As issues in real-time concurrency control become better understood, the demand for system reliability is increasing. Although a lot of research [8, 12, 11, 24] has been done in logging and recovery for traditional databases, little work explores logging and recovery for real-time databases [13, 31]. Few studies consider issues related to both real-time concurrency control and durability. In particular, Gupta, Haritsa, and Ramamritham [10] proposed a new real-time commit protocol which allows transactions to “optimistically” borrow uncommitted data in a controlled manner to minimize the number of deadline violations. Sivasankaran, Ramamritham, and Stankovic [31] proposed a partitioned logging and recovery algorithm for real-time disk-resident databases. The log is partitioned according to data classes, such as critical and temporal ones, to provide parallel logging and recovery. Non-volatile RAM-based devices are used to reduce the unpredictability of real-time databases. Huang and Gruenwald [13] proposed a checkpointing technique for real-time main memory databases. A database is partitioned according to data types (persistent type vs. temporal type) and update frequencies. The system checkpoints each partition independently based on its update frequency and its temporal valid interval.

Besides the scarcity of work on logging and recovery for real-time databases, the close relationship between real-time concurrency control and recovery (and logging) has been largely ignored in the past decade. Note that a schedule is recoverable only if no transaction τ commits before any transaction from which τ reads commits. A real-time concurrency control protocol should ensure that conflicting transactions commit in the order of their read-from relationship. This paper proposes an integrated mechanism for concurrency control and recovery in real-time databases. A two-version-based concurrency control protocol, called the Two-Version Priority Ceiling Protocol (2VPCP), is proposed to reduce the blocking time of higher-priority transactions based on the idea of dynamic serializability adjustment [17, 19, 22] without relying on local data updates for transactions [17, 19, 22]. The 2VPCP protocol is then extended to a distributed environment to process read-only transactions at client-side systems locally. The resulting system can not only significantly boost the response time of read-only transactions issued at client-side systems, but also virtually eliminate the interference of conflicting data accesses between client-side read-only transactions and server-side transactions. The extended 2VPCP protocol not only associates each client-side system with a consistent database image for local processing of read-only transactions, but also provides an efficient recovery mechanism. The performance of the algorithms was verified by a series of simulation experiments, for which we had very encouraging results. Comparisons of different recovery mechanisms are also presented to demonstrate the capability of the two-version approach.

There are two major contributions in this paper: (1) The effectiveness of the two-version approach is shown in reducing the blocking time of higher-priority transactions and in improving the response time of client-side read-only transactions. Note that the results of this paper are orthogonal to any previous research in processing read-only transactions [18, 19], which considers weaker correctness criteria or access patterns of transactions. All transactions in our system are serializable. (2) A two-version (network-server-based) architecture is proposed to not only support a diskless run-time logging mechanism and an effective write-through procedure, but also provide an efficient and predictable recovery mechanism. The logging mechanism and the write-through procedure have virtually no impact on the executions of transactions in the system.

The rest of this paper is organized as follows: Section 2 extends the Read/Write Priority Ceiling Protocol (RWPCP) [30] into a two-version-based protocol called the Two-Version Priority Ceiling Protocol (2VPCP). The properties of the 2VPCP protocol are then proven. Section 3 further extends the 2VPCP protocol to a distributed environment to locally and efficiently process read-only transactions at client-side systems. The correctness of the extended protocol is proven. Section 4 proposes an efficient and predictable recovery mechanism based on the extended 2VPCP protocol. Section 5 provides experimental results which demonstrate the performance of the algorithms. Section 6 is the conclusion.

2 The 2VPCP Protocol

2.1 Overview

The Read/Write Priority Ceiling Protocol (RWPCP) [30] has shown the effectiveness of using read and write semantics in improving the performance of the Priority Ceiling Protocol (PCP) [29] in real-time concurrency control. While PCP only allows exclusive locks on data objects, RWPCP introduces a write priority ceiling WPL_i and an absolute priority ceiling APL_i for each data object O_i to emulate shared and exclusive locks, respectively. The write priority ceiling WPL_i of data object O_i is equal to the highest priority of transactions which may write O_i. The absolute priority ceiling APL_i of data object O_i is equal to the highest priority of transactions which may read or write O_i. When data object O_i is read-locked, the read/write priority ceiling RWPL_i of O_i is equal to WPL_i. When data object O_i is write-locked, the read/write priority ceiling RWPL_i of O_i is equal to APL_i. A transaction instance may lock a data object if its priority is higher than the highest read/write priority ceiling RWPL_i of the data objects locked by other transaction instances. When a data object O_i is write-locked, the setting of RWPL_i prevents any other transaction instance from write-locking O_i because RWPL_i is equal to APL_i. When a data object O_i is read-locked, the setting of RWPL_i only allows a transaction instance with a sufficiently high priority to read-lock O_i, in order to constrain the number of priority inversions for any transaction instance which may write-lock O_i, because RWPL_i is equal to WPL_i.

Lam and Hung [17] further sharpened the RWPCP by proposing the idea of dynamic adjustment of serializability order for hard real-time transactions, whereas Lin and Son [22] proposed the idea of dynamic adjustment of serializability order for optimistic real-time concurrency control. A delayed write procedure requires every transaction instance to only update data objects in its local space and to delay the updating of the database until the commitment of the transaction instance. With a delayed write procedure, a higher-priority transaction instance may preempt a lower-priority transaction instance by using the Thomas write rule when a write-write conflict exists. The read-write conflict between conflicting transaction instances is partially resolved by allowing a higher-priority transaction instance to read the database even though a lower-priority transaction instance has write-locked the data object. Because every transaction instance only updates data objects in its local space, this preemption in a read-write conflict lets the higher-priority transaction instance precede the lower-priority transaction instance in the serializability order.

Although the new protocol introduced by Lam and Hung [17] significantly reduces the blocking time of higher-priority transaction instances under RWPCP, every transaction instance may need extra space to keep its own local copy for each of its updated data objects because of the delayed write procedure. Moreover, only the response time of higher-priority transactions is improved, and the executions of read-only transaction instances tend to be serialized. Recovery has also received little consideration in the design of concurrency control protocols, including those in [17, 30].


Requested \ Locked    R      W      C
R                     yes    yes    no
W                     yes    no     no
C                     no     no     no

Table 1: The compatibility matrix of locks (rows: requested lock; columns: lock already held; R = read, W = write, C = certify).

This paper proposes a two-version approach which considers both real-time concurrency control and recovery. We propose to use the idea of two-version databases to replace a delayed write procedure to save the extra space needed by the procedure. The goal is to first propose a two-version variation of the RWPCP [30] to have the flexibility in the dynamic adjustment of transaction serializability order to favor higher-priority transactions and read-only transactions. We will then extend the protocol and the idea of two-version databases into distributed environments for efficient and local processing of read-only transactions and provide efficient and predictable failure recovery. Note that little work has been done for concurrency control in distributed real-time environments, e.g., [20, 15, 28]. Since there can be a large number of read-only transactions in many commercial database systems, how to improve the response time of read-only transactions is of paramount importance.

We assume that a transaction system consists of a fixed set of transactions. (This condition will be relaxed when local processing of read-only transactions is considered in Section 3.) Each data object has two versions: a consistent version and a working version, where the consistent version contains a data value updated by a committed transaction instance, and the working version contains a data value updated by an uncommitted transaction instance. There are three kinds of locks in the system: read, write, and certify.

Before a transaction reads (or writes) a data object, it must first read-lock (or write-lock) the data object. A read operation on a data object always reads from the consistent version of the data object. A write operation on a data object always updates the working version of the data object. It is required that, before a transaction commits, the transaction must transform each of its write locks into a certify lock on the same data object. As soon as a transaction obtains a certify lock on a data object, it can copy its updated working version of the data object to the consistent version. There is no requirement on the order or timing of lock transformations. The transformation of a write lock into a certify lock is considered as requesting a new certify lock. If the request of a certify lock by a transaction instance is not granted, the transaction is blocked by the system until the request is granted. When a transaction terminates, it must release all of its locks. The compatibility matrix of locks is shown in Table 1. (The well-known Two-Version Two-Phase Locking scheme has the same compatibility matrix [4, 9]. Note that the Two-Version Two-Phase Locking scheme cannot guarantee at most one priority inversion for real-time transactions and may suffer from the deadlock problem.) A certify lock is stronger than a write lock, and a write lock is stronger than a read lock. All transactions follow the two-phase locking (2PL) scheme. The details will be shown in later sections. Compared to the Read/Write Priority Ceiling Protocol (RWPCP) [30], the two-version locking mechanism could provide higher-priority (and read-only) transactions better opportunities to preempt lower-priority transactions. However, it would be at the cost of extra certify locks and the copying of the updated working version of data objects to the consistent version. The number of certify locks which must be obtained by a committing transaction is the same as the number of write locks already obtained by the committing transaction. The cost in copying the updated working version of data objects to the consistent version is also proportional to the number of write locks already obtained by the committing transaction.
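The compatibility matrix of Table 1 and the strength ordering of the three lock modes can be sketched as a small lookup structure. This is only an illustration: the names `Lock`, `COMPATIBLE`, and `compatible` are hypothetical and do not come from the paper.

```python
from enum import IntEnum


class Lock(IntEnum):
    """Lock modes, ordered by strength: read < write < certify."""
    READ = 1
    WRITE = 2
    CERTIFY = 3


# COMPATIBLE[requested][held] mirrors Table 1 (rows: requested, columns: held).
COMPATIBLE = {
    Lock.READ:    {Lock.READ: True,  Lock.WRITE: True,  Lock.CERTIFY: False},
    Lock.WRITE:   {Lock.READ: True,  Lock.WRITE: False, Lock.CERTIFY: False},
    Lock.CERTIFY: {Lock.READ: False, Lock.WRITE: False, Lock.CERTIFY: False},
}


def compatible(requested, held_by_others):
    """Return True if `requested` conflicts with none of the locks
    that other transactions hold on the same data object."""
    return all(COMPATIBLE[requested][held] for held in held_by_others)
```

For instance, a read request is compatible with a write lock held by another transaction because the reader uses the consistent version, whereas a certify request is incompatible with every held lock.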

Now, we will state our notation.

Notation:

• τ_i,j denotes the j-th instance of transaction τ_i. p_i and c_i are the period and worst-case computation time of transaction τ_i, respectively. If transaction τ_i is aperiodic, p_i is the minimal separation time between its consecutive requests. When there is no ambiguity, we use the terms “transaction” and “transaction instance” interchangeably.

• R_i,j denotes the j-th request of transaction τ_i. A transaction instance τ_i,j is initiated for each request of transaction τ_i. Once transaction instance τ_i,j is aborted, τ_i,j may be restarted or terminated, as required by the selected scheduling algorithm.

• The k-th critical section of a transaction instance τ_i,j is denoted as z_i,j,k and corresponds to the code segment between the k-th locking operation and its corresponding unlocking operation. We assume in this paper that critical sections are properly nested. In other words, if the locking operation of a semaphore is no later than the locking operation of another semaphore within a transaction instance, the corresponding unlocking operation of the former semaphore is no earlier than the corresponding unlocking operation of the latter semaphore. Note that this is one of the assumptions of PCP in handling the priority inversion problem.

• W(O_i) and C(O_i) denote the working version and consistent version of data object O_i, respectively.


2.2 The Basic 2VPCP Protocol

The Two-Version Priority Ceiling Protocol (2VPCP) is a two-version variation of the Read/Write Priority Ceiling Protocol [30]. The rationale behind the design of the 2VPCP protocol is to have flexibility in the adjustment of transaction serializability order to favor higher-priority transactions and read-only transactions. In later sections, we shall then extend the 2VPCP protocol into distributed environments for local processing of read-only transactions and efficient failure recovery.

In this section, we are interested in the context of uniprocessor priority-driven preemptive scheduling, and every transaction has a fixed priority. (This condition will be relaxed when local processing of read-only transactions is considered in Section 3.) The real-time database can be either memory-resident or disk-resident. As defined in [30], the write priority ceiling WPL_i of data object O_i is equal to the highest priority of transactions which may write O_i. The absolute priority ceiling APL_i of data object O_i is equal to the highest priority of transactions which may read or write O_i. Since 2VPCP adopts a two-data-version approach and introduces a new lock called the certify lock, the setting of the read/write priority ceiling RWPL_i of each data object O_i is modified as follows: The read/write priority ceiling RWPL_i of each data object O_i is set dynamically. When a transaction read-locks or write-locks O_i, RWPL_i is equal to WPL_i. When a transaction certify-locks O_i, RWPL_i is equal to APL_i. Note that any read operation on a data object always reads from the consistent version of the data object, and any write operation on a data object always writes into the working version of the data object. It is required that, before a transaction commits, the transaction must transform each of its write locks into a certify lock on the same data object. A certify lock on a data object secures the copying of the data value from its working version into the consistent version. No lock transformation is required for a read lock.

The rationale behind the setting of priority ceilings is as follows: When data object O_i is write-locked, RWPL_i is set as WPL_i to prevent any subsequent transaction from write-locking O_i because WPL_i is equal to the highest priority of transactions which may write O_i. Note that there is only one working version for each data object. When data object O_i is certify-locked, RWPL_i is set as APL_i so that no other subsequent transactions can lock O_i in any mode. This is to secure the copying of the data value from the working version of O_i into its consistent version. When data object O_i is read-locked, RWPL_i is set as WPL_i so that only transactions which have a priority higher than WPL_i can read-lock O_i afterward. This constraint is to prevent any transaction which might later write-lock O_i from being blocked by more than one lower-priority transaction which read-locks O_i.
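The ceiling settings above can be illustrated with a short sketch. It assumes that priorities are integers where a smaller number means a higher priority, that each transaction has a distinct priority, and that every data object has at least one potential writer (so WPL is defined). The function names `static_ceilings` and `rw_ceiling` are hypothetical.

```python
def static_ceilings(access):
    """Compute the static ceilings WPL and APL.

    access maps a transaction priority to {"read": objects, "write": objects}.
    Smaller number = higher priority (an assumption of this sketch).
    Returns (wpl, apl), two dicts keyed by data object.
    """
    wpl, apl = {}, {}
    for prio, ops in access.items():
        for obj in ops.get("write", ()):
            # A writer contributes to both the write and the absolute ceiling.
            wpl[obj] = min(wpl.get(obj, prio), prio)
            apl[obj] = min(apl.get(obj, prio), prio)
        for obj in ops.get("read", ()):
            # A reader contributes to the absolute ceiling only.
            apl[obj] = min(apl.get(obj, prio), prio)
    return wpl, apl


def rw_ceiling(obj, mode, wpl, apl):
    """Dynamic ceiling RWPL under 2VPCP: WPL for read and write locks,
    APL for certify locks."""
    if mode in ("read", "write"):
        return wpl[obj]
    if mode == "certify":
        return apl[obj]
    raise ValueError(f"unknown lock mode: {mode}")
```

Under RWPCP a write lock would raise the ceiling to APL; in this sketch only the certify lock does so, which is what gives higher-priority readers the chance to proceed on the consistent version.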


We now present the definition of 2VPCP:

1. A transaction instance, which has the highest priority among all ready transaction instances, is assigned the processor. If a transaction instance does not attempt to lock any data object, the transaction instance can preempt the execution of any transaction instance with a lower priority, whether or not the priorities are assigned or inherited. (Priority inheritance will be defined later.)

2. When a transaction instance τ_i,j attempts to read-lock, write-lock, or certify-lock a data object O_k, the priority of τ_i,j must be higher than the read/write priority ceilings of all data objects currently locked by transaction instances other than τ_i,j; otherwise, the lock request is blocked. If the priority of τ_i,j is higher than the read/write priority ceilings of all data objects currently locked by transaction instances other than τ_i,j, there are three cases to consider:

(a) If τ_i,j requests a read lock on O_k, then τ_i,j read-locks O_k, and the read/write priority ceiling RWPL_k of data object O_k is set as WPL_k.

(b) If τ_i,j requests a write lock on O_k, then τ_i,j write-locks O_k, and the read/write priority ceiling RWPL_k of data object O_k is set as WPL_k.

(c) If τ_i,j requests a certify lock on O_k, then τ_i,j certify-locks O_k, and the read/write priority ceiling RWPL_k of data object O_k is set as APL_k. Note that τ_i,j must have write-locked O_k before it requests a certify lock on O_k, and both APL_k and WPL_k are no less than the priority of τ_i,j.

If the priority of τ_i,j is no higher than the highest read/write priority ceiling of the data objects currently locked by transaction instances other than τ_i,j, then the lock request is blocked. Let O^∗ be the data object with the highest read/write priority ceiling of all data objects currently locked by transaction instances other than τ_i,j. If τ_i,j is blocked because of O^∗, τ_i,j is said to be blocked by the transaction instance that locked O^∗.

3. A transaction instance τ_i,j uses its assigned priority, unless it locks some data objects and blocks higher-priority transaction instances. If a transaction instance blocks a higher-priority transaction instance, it inherits the highest priority of the transaction instances blocked by τ_i,j. When a transaction instance unlocks a data object, it resumes the priority it had at the point of obtaining the lock on the data object. When a transaction instance is aborted, all transaction instances which inherit its priority must reset their priorities according to the definition of priority inheritance. The priority inheritance is transitive. Note that the resetting of priority inheritance can be efficiently implemented by using a stack data structure because there is no transitive blocking in transaction executions (please see Lemma 2). We refer interested readers to [29] for details.

4. All transaction instances follow a 2PL scheme. That is, no transaction instance is allowed to obtain any new lock after it releases any locks.
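The lock-granting test of rule 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes integer priorities where a smaller number means a higher priority, the caller passes the requester's current priority, and the bookkeeping structure `held` and the function name `try_lock` are hypothetical. Priority inheritance (rule 3) and the 2PL check (rule 4) are left out.

```python
def try_lock(requester, priority, obj, mode, held, wpl, apl):
    """Apply rule 2 of 2VPCP to one lock request.

    held maps (owner, object) -> RWPL set by that lock; it is updated in place.
    wpl / apl map each object to its write / absolute priority ceiling.
    Returns (granted, blocker); blocker is the owner of O* when blocked.
    """
    # Only locks held by other transaction instances matter.
    others = {key: c for key, c in held.items() if key[0] != requester}
    if others:
        # The numerically smallest ceiling is the highest one (that of O*).
        (blocker, _), ceiling = min(others.items(), key=lambda kv: kv[1])
        if priority >= ceiling:  # not strictly higher than every ceiling
            return False, blocker
    # Cases (a) and (b) set RWPL to WPL; case (c) sets it to APL.
    held[(requester, obj)] = apl[obj] if mode == "certify" else wpl[obj]
    return True, None
```

Because a certify request reuses the key of the earlier write lock on the same object, transforming a write lock into a certify lock simply raises the recorded ceiling from WPL to APL.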

The lock compatibility matrix shown in Table 1 is implicitly enforced through priority ceilings (please see Lemmas 3, 4, and 5). A transaction instance precedes another transaction instance in the serializability order if any of the following conditions is satisfied: (a) the latter reads from the consistent version of any data object updated by the former; (b) the former and the latter update the same data object, and the former write-locks the data object first. Theorem 4 shows that all 2VPCP schedules are serializable. Theorem 5 also shows that the serializability order of transaction instances is the same as their begin unlock message order. The priority ceilings are used to control priority inversion in the system.

A similar approach can be found in [30]. A transaction may be aborted because of a deadline violation. However, transaction aborting incurs low overheads because executing transactions update only the working versions of data objects. When a transaction commits, the transaction transforms each of its write locks into a certify lock on the same data object and copies its updated working version of the data object to the consistent version. The aborting of a transaction simply discards the working version, and any subsequent transaction can simply overwrite the working version. Since all transactions read from the consistent version of data objects, no cascading aborting is possible. Furthermore, the aborting of a transaction cannot cause any cascaded resetting of inherited priorities because there is no transitive blocking, as shown in Lemma 2 in the next section.
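The commit and abort behavior on the two versions can be sketched with a small class. The class name `TwoVersionObject` and its methods are hypothetical, and locking is assumed to be handled elsewhere: `certify` is meant to be called only while a certify lock is held.

```python
class TwoVersionObject:
    """A data object with a consistent version C(O) and a working version W(O)."""

    def __init__(self, value):
        self.consistent = value  # last value written by a committed transaction
        self.working = None      # value written by an uncommitted transaction
        self.dirty = False       # True while the working version holds an update

    def read(self):
        # Reads always come from the consistent version.
        return self.consistent

    def write(self, value):
        # Writes always go to the working version.
        self.working = value
        self.dirty = True

    def certify(self):
        # Commit path: copy the working version into the consistent version.
        if self.dirty:
            self.consistent = self.working
        self.discard()

    def discard(self):
        # Abort path: drop the working version; readers are unaffected.
        self.working = None
        self.dirty = False
```

Since `read` never returns an uncommitted value, discarding the working version on abort cannot invalidate anything another transaction has read, which is why no cascading abort arises.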

Example 1 A 2VPCP Schedule:

We illustrate the 2VPCP protocol by an example. Suppose that there are three transactions τ₁, τ₂, and τ₃ in a uniprocessor environment. Let the priorities of τ₁, τ₂, and τ₃ be 1, 2, and 3, respectively, where 1 is the highest, and 3 is the lowest. Suppose that τ₁ and τ₂ may read and write data object S₁, respectively, and τ₂ and τ₃ may read and write data object S₂, respectively. According to the definitions of ceilings, the write priority ceiling WPL₁ and the absolute priority ceiling APL₁ of S₁ are 2 and 1, respectively. The write priority ceiling WPL₂ and the absolute priority ceiling APL₂ of S₂ are 3 and 2, respectively.

At time 0, τ₃ starts execution. At time 2, τ₃ write-locks S₂ successfully, and RWPL₂ = WPL₂ = 3. At time 4, τ₂ arrives and preempts τ₃. At time 6, τ₂ write-locks S₁ successfully because the priority of τ₂ is higher than RWPL₂ = 3, and RWPL₁ is set to WPL₁ = 2. At time 8, τ₂ read-locks S₂ successfully because the priority of τ₂ is higher than RWPL₂ (RWPL₂ = WPL₂ = 3). Note that τ₃ is behind τ₂ in the serializability order although τ₃ write-locks S₂ before τ₂ read-locks S₂. At time 11, τ₁ arrives and preempts τ₂. At time 13, τ₁ read-locks S₁ successfully because the priority of τ₁ is higher than RWPL₁ and RWPL₂. RWPL₁ is equal to WPL₁ = 2. Note that τ₂ is behind τ₁ in the serializability order although τ₂ write-locks S₁ before τ₁ read-locks S₁. τ₁ then unlocks S₁ and commits at time 17 and 19, respectively. Right before time 21, τ₂ certify-locks S₁ successfully because the priority of τ₂ is higher than RWPL₂ (RWPL₂ = WPL₂ = 3), and it copies the working version of S₁ into the consistent version. At time 21, τ₂ unlocks S₂. At time 23, τ₂ unlocks S₁. At time 25, τ₂ commits, and τ₃ resumes its execution. Right before time 28, τ₃ certify-locks S₂ successfully and copies the working version of S₂ into the consistent version. At time 30, τ₃ commits.

Figure 1: A 2VPCP schedule (timelines of τ₁, τ₂, and τ₃ over time 0 to 32, annotated with the R_lock, W_lock, C_lock, and Unlock operations on S1 and S2)

For comparison, let us schedule these transactions according to the Read/Write Priority Ceiling Protocol (RWPCP), where a single version per data object is considered. As shown in Figure 2, the write-lock request of τ₂ on S₁ is rejected at time 6 because the priority of τ₂ is no higher than RWPL₂ = APL₂ = 2. The request is rejected under the RWPCP protocol because τ₂ may later read S₂, and the read would leave τ₂ behind τ₃ in the serializability order. As a result, τ₂ is blocked by τ₃. Note that the 2VPCP protocol lets τ₂ preempt τ₃, and τ₂ reads from the consistent version of S₂. As a result, τ₂ is not blocked by τ₃ under the 2VPCP protocol. Figure 2 also shows the blocking of τ₁ at time 13 under the RWPCP protocol.
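The different outcomes at time 6 follow directly from how the two protocols set the read/write priority ceiling of a write-locked object. The sketch below replays that single decision with the ceilings of Example 1. The names `ceiling` and `granted` are hypothetical, and a smaller number is assumed to mean a higher priority.

```python
WPL = {"S1": 2, "S2": 3}  # write priority ceilings from Example 1
APL = {"S1": 1, "S2": 2}  # absolute priority ceilings from Example 1


def ceiling(protocol, obj, mode):
    """Read/write priority ceiling set when `obj` is locked in `mode`."""
    if protocol == "RWPCP":
        # Single version: a write lock raises the ceiling to APL.
        return WPL[obj] if mode == "read" else APL[obj]
    if protocol == "2VPCP":
        # Two versions: only a certify lock raises the ceiling to APL.
        return APL[obj] if mode == "certify" else WPL[obj]
    raise ValueError(f"unknown protocol: {protocol}")


def granted(priority, locks_of_others, protocol):
    """locks_of_others is a list of (object, mode) held by other instances.
    The request is granted only if the priority is strictly higher
    (numerically smaller) than every resulting ceiling."""
    return all(priority < ceiling(protocol, o, m) for o, m in locks_of_others)


# Time 6: tau_2 (priority 2) requests a write lock on S1
# while tau_3 holds a write lock on S2.
print(granted(2, [("S2", "write")], "2VPCP"))  # True
print(granted(2, [("S2", "write")], "RWPCP"))  # False
```

The only difference between the two branches is the ceiling of the write-locked S₂: 3 under 2VPCP and 2 under RWPCP, which is exactly the threshold that the priority of τ₂ must exceed.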

Figure 2: An RWPCP schedule (timelines of τ₁, τ₂, and τ₃ over time 0 to 32, annotated with the R_lock, W_lock, and Unlock operations on S1 and S2)

This example demonstrates one of the goals in designing the 2VPCP protocol: a higher-priority transaction instance can utilize the consistent version of a data object without being blocked by a lower-priority transaction instance because of read/write conflicts. The serializability order of transaction instances is no longer determined by the order of the conflicting lock requests and can be adjusted according to the priorities of the transaction instances. □

2.3 Properties

Lemma 1 A (higher-priority) transaction instance τ_H can be blocked by another (lower-priority) transaction instance τ_L only if τ_L is executing in a critical section which later blocks τ_H (when τ_H is initiated).

Proof. According to the definitions of the 2VPCP protocol, τ_L can block τ_H only if τ_L directly blocks τ_H because of a lock request, or τ_L inherits a priority higher than the priority of τ_H. In either case, τ_L must be in a critical section to block τ_H. Furthermore, if τ_L is not executing in a critical section when τ_H is initiated, then τ_H can preempt τ_L because the priority of τ_L is then no higher than the priority of τ_H. □

Definition 1 [29] Transitive blocking is said to occur if a (higher-priority) transaction instance is directly blocked by another (lower-priority) transaction instance which, in turn, is directly blocked by a third (even lower-priority) transaction instance.


Lemma 2 No transitive blocking is possible.

Proof. This lemma can be proven by contradiction. Suppose that a transitive blocking happens among three distinct transaction instances τ₁, τ₂, and τ₃. Let τ₁ directly block τ₂, and τ₂ directly block τ₃. According to Lemma 1, τ₁ and τ₂ must be executing in critical sections to block τ₂ and τ₃, respectively. Let τ₁ and τ₂ be executing in critical sections z_1,i and z_2,j, respectively, when the transitive blocking occurs. Since τ₁ blocks τ₂, τ₁ must enter critical section z_1,i before τ₂ is initiated; otherwise, τ₂ will not be directly blocked by τ₁. By the definitions of the 2VPCP protocol, the read/write priority ceiling RWPL_k of the data object O_k locked by τ₁ when τ₁ enters z_1,i should be no lower than the priority of τ₂. However, when τ₂ requests a lock to enter critical section z_2,j (which blocks τ₃ later), its priority should still be no higher than the read/write priority ceiling RWPL_k of the data object O_k. In other words, τ₂ will not be allowed to enter critical section z_2,j until τ₁ leaves z_1,i, and the transitive blocking should not occur. □

Theorem 1 2VPCP is deadlock-free.

Proof. Since there is no transitive blocking (please see Lemma 2), a deadlock can only happen between two transaction instances. Let two distinct transaction instances τ₁ and τ₂ form a deadlock, and τ₁ enter critical section z_1,i which blocks τ₂ before τ₂ enters critical section z_2,j which blocks τ₁. Because critical section z_1,i blocks τ₂, the read/write priority ceiling RWPL_k of the data object O_k locked by τ₁ when τ₁ enters z_1,i should be no lower than the priority of τ₂. In other words, the lock request of τ₂ to enter critical section z_2,j should not succeed until τ₁ leaves z_1,i, and no deadlock should occur. □

Theorem 2 The maximum number of priority inversions per transaction instance is one.

Proof. Let a transaction instance τ_H be blocked by two distinct lower-priority transaction instances τ_L and τ_L′. Since there is no transitive blocking (please see Lemma 2), τ_L and τ_L′ must be executing in critical sections z_L,i and z_L′,j to directly block τ_H, respectively. Let τ_L enter critical section z_L,i before τ_L′ enters critical section z_L′,j. Since critical section z_L,i blocks τ_H, critical section z_L,i should also block τ_L′ because the priority of τ_H is higher than the priority of τ_L′. In other words, τ_L′ should not enter critical section z_L′,j to directly block τ_H until τ_L leaves z_L,i, and the maximum number of priority inversions per transaction instance should not be more than one. □

Note that when transactions do abort, Theorems 1 and 2 remain correct, provided that aborted transactions must unlock their data objects. This is because aborted transactions will not be in a deadlock cycle or introduce any further priority inversion.


We shall first prove Lemmas 3, 4, and 5 to show that all of the 2VPCP schedules comply with the compatibility matrix shown in Table 1:

Lemma 3 When data object O_k is read-locked by a transaction instance τ, no transaction instance can certify-lock O_k under the 2VPCP protocol.

Proof. The lemma can be proven by contradiction. Let a distinct transaction instance τ′ receive a certify lock on O_k when O_k is read-locked by a transaction instance τ under the 2VPCP protocol. By the definitions of the 2VPCP protocol, the priority of τ′ must be higher than RWPL_k (i.e., WPL_k), where RWPL_k is no less than the original priority of τ′. In other words, τ′ must inherit the priority of some transaction instance which is higher than RWPL_k to certify-lock O_k. Let z_i be the earliest critical section which τ′ enters and which later blocks some higher-priority transaction instance τ″ whose priority is higher than RWPL_k. Based on Lemmas 1 and 2, τ′ must be executing in critical section z_i before τ″ is initiated.

There are two cases to consider, depending on when τ′ enters critical section z_i. Suppose that data object O_k is read-locked by transaction instance τ before τ′ enters critical section z_i. Since the 2VPCP protocol should not allow τ′ to enter critical section z_i because the priority of τ′ is no higher than RWPL_k, a contradiction exists. (We assumed in the previous paragraph that τ′ enters a critical section which blocks a transaction instance with a priority higher than RWPL_k.)

Let data object O_k be read-locked by transaction instance τ after τ′ enters critical section z_i. The priority of τ must be higher than the priority of τ′; otherwise, data object O_k cannot be read-locked by transaction instance τ. If the priority of τ is really higher than the priority of τ′, then τ′ has no chance to regain the CPU and issue a certify lock on O_k unless τ is blocked (when τ′ issues a certify lock on O_k). Since there is no deadlock and no transitive blocking (please see Lemma 2 and Theorem 1), τ must be blocked by τ′. Since τ′ blocks τ, τ′ must be executing in a critical section which later blocks τ before τ is initiated (please see Lemma 1). It contradicts the assumption that data object O_k is read-locked by τ because τ has no way to read-lock O_k. □

Lemma 4 When data object O_k is write-locked by a transaction instance τ, no transaction instance can write-lock or certify-lock O_k under the 2VPCP protocol.

Proof. Since both read and write locks set RWPL_k as WPL_k, and the sets of transactions which may issue write or certify locks are the same, this lemma can be proven in a way similar to the proof of Lemma 3. □

Lemma 5 When data object O_k is certify-locked by a transaction instance τ, no transaction instance can lock O_k in any way under the 2VPCP protocol.

Proof. This lemma can be proven in a way similar to the proof of Lemma 3 (by replacing every occurrence of “read-lock by τ”, “RWPL_k”, and “certify-lock by τ'” with “certify-lock by τ”, “APL_k”, and “lock by τ'”, respectively). □

Theorem 3 All 2VPCP schedules satisfy the compatibility matrix shown in Table 1.

Proof. The correctness of this theorem directly follows from Lemmas 3, 4, and 5. □
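The lock compatibility that Lemmas 3, 4, and 5 establish can be illustrated with a small sketch. It assumes that Table 1 is the usual two-version 2PL matrix, in which read locks coexist with read and write locks; the names `Lock`, `compatible`, and `can_grant` are hypothetical and not part of the paper. The sketch covers lock-mode compatibility only and does not model the priority-ceiling conditions of the 2VPCP protocol.

```python
from enum import Enum


class Lock(Enum):
    READ = "read"
    WRITE = "write"
    CERTIFY = "certify"


# (held, requested) pairs that may coexist on one data object.
# Assumption: read/read and read/write are the only compatible pairs,
# as in two-version 2PL; every other pair conflicts (Lemmas 3, 4, 5).
_COMPATIBLE = {
    (Lock.READ, Lock.READ),
    (Lock.READ, Lock.WRITE),
    (Lock.WRITE, Lock.READ),
}


def compatible(held, requested):
    """True if `requested` may coexist with a lock `held` by another transaction."""
    return (held, requested) in _COMPATIBLE


def can_grant(held_by_others, requested):
    """True if `requested` is compatible with every lock held by other transactions."""
    return all(compatible(h, requested) for h in held_by_others)
```

For example, `can_grant([Lock.READ], Lock.CERTIFY)` is false, which is the situation ruled out by Lemma 3.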

Theorem 4 All 2VPCP schedules are serializable.

Proof. Since schedules generated by the 2-version 2PL protocol are (one-copy) serializable (1SR) [4, 9], and the set of schedules generated by the 2VPCP protocol is a subset of that generated by the 2-version 2PL protocol (please see Theorem 3), all 2VPCP schedules are serializable. Note that all schedules which satisfy the 2PL scheme and the compatibility matrix in Table 1 are 2-version 2PL schedules [4, 9]. Since all 2VPCP schedules satisfy the 2PL scheme and the compatibility matrix in Table 1, all 2VPCP schedules are 2-version 2PL schedules. □

3 Read-Only Transaction Processing

3.1 Overview

[Figure 3 labels: server computer (consistent version, work space version); network connection; client computers (consistent version).]

Figure 3: A client-server architecture for read-only transactions

The purpose of this section is to extend the 2VPCP protocol to local processing of read-only transactions, as shown in Figure 3. For the purpose of this paper, we assume that all updating transactions are submitted to the server-side system for execution. The main idea in this 2VPCP extension is to “duplicate” a consistent version of the database image at each client-side system to service read-only transactions locally at client-side systems. There are two major advantages of this approach: (1) A potentially large number of queries, i.e., read-only transactions, can be screened out of the normal operation of a real-time database system (at the server side). (2) Read-only transactions can be processed much faster and more efficiently at client-side systems without going through a potentially congested network.

As astute readers may notice, higher-priority read-only transactions at the server system are already favored by the 2VPCP protocol because they can read-lock and access any data object unless the consistent version of the data object is being modified (i.e., locked in certify mode). We surmise that, in normal operation, certify locks are held only briefly by server-side transactions because usually only a committing transaction tries to obtain a certify lock. The real question here is how to improve the response time of lower-priority (and even higher-priority) read-only transactions issued by users at client-side systems. The main idea is to maintain a consistent database image at each client-side system so that users' local read-only transactions, of lower or higher priority, respond faster without sacrificing the serializability of the entire system. In order to achieve this goal, we must build a serializability order of all transactions executing at client-side and server-side systems.

The technical question here is how to efficiently maintain a consistent database image at each client-side system which satisfies the above condition. Our approach is to let each of the client-side systems autonomously fabricate the consistent version of the server-side two-version database at the client side, such that all executions of read-only transactions at the client side can be properly inserted into the serializability order of transaction executions at the server side. In order to maintain a consistent database image at each client-side system, each transaction (or the system) must send client systems a message similar to a redo log (τ_i, object name, old value, new value) for each of its write operations (to the working version of the server-side database). Each of the client-side systems then maintains its consistent database image by observing these messages. Note that the client-side consistent database images will be used for efficient failure recovery in Section 4, and no processing of redo logs is needed again during failure recovery. Note also that the old value in each redo-log message can be omitted because it is never used.

For the rest of this paper, we assume that all messages sent in the network arrive at the destination in their sending order.

3.2 The Serializability-Order Rebuilding Mechanism

Because of the existence of a two-version database and the “preemptions” of conflicting transactions (which result in the effects of dynamic adjustment of serializability order) at the server-side system, the serializability order of transaction executions at the server side cannot be simply observed by the timestamps of successful conflicting locking requests issued by the server-side transactions. We propose to observe the serializability order of server-side transactions based on the order of the beginning of the shrinking phase of the server-side transaction instances. The beginning of the shrinking phase of any server-side transaction instance can be easily observed by the appearance of the first unlock request of the transaction instance. The information is purely syntactic and can be easily observed by the system with very low overheads because all server-side transaction instances issue lock and unlock requests to the system, regardless of whether the system tries to observe the beginning of their shrinking phase.

Let begin unlock denote the first unlocking operation of a transaction instance. We shall prove in the following theorem that the begin unlock order of transaction instances at the server side complies with the serializability order of the transaction instances executing at the server side. This observation provides a simple and efficient mechanism (presented in the next section) to determine the serializability order of server-side transactions.

Theorem 5 The begin unlock order of transaction instances (at the server side) complies with the serializability order of the transaction instances (at the server side).

Proof. This theorem can be proven by considering all of the combinations of conflicting r/w operations. To determine the serializability order of conflicting transaction instances, four cases must be considered, as shown in Figure 4, where WL_i(x), RL_i(x), CerL_i(x), and Begin_Unlock denote the write lock, read lock, certify lock, and begin unlock message of transaction τ_i. Note that the 2VPCP protocol satisfies the 2PL scheme and the compatibility matrix shown in Table 1 (please see Theorem 3 and the definitions of the 2VPCP protocol).

1. If there is a write/write conflict between two conflicting transaction instances, e.g., τ₁ and τ₂ in Figure 4.a, then the begin unlock order of transaction instances must comply with the serializability order of the transaction instances. This is because write and certify locks are incompatible with each other. Every write lock of a transaction instance must precede its begin unlock.

[Figure 4 panels, each showing the lock requests (WL, CerL, RL) and Begin_Unlock events of T₁ and T₂ along a time axis: (a) W/W conflict; (b) W/R conflict: W precedes R; (c) W/R conflict: R precedes W; (d) R/W conflict.]

Figure 4: Serializability order of conflicting transactions

2. If there is a write/read conflict between two conflicting transaction instances, and the read lock of a transaction instance is granted after the write lock and certify lock of another transaction instance, e.g., τ₁ and τ₂ in Figure 4.b, then the begin unlock order of transaction instances must comply with the serializability order of the transaction instances. This is because read and certify locks are incompatible. The unlocking of the certify lock must be earlier than the granting of the read lock.

3. If there is a write/read conflict between two conflicting transaction instances, and the read lock of a transaction instance is granted before the certify lock, but after the write lock of another transaction instance, e.g., τ₁ and τ₂ in Figure 4.c, then the begin unlock order of transaction instances must comply with the serializability order of the transaction instances. This is because read and certify locks are incompatible. The certify lock cannot be granted until the unlocking of the read lock. Since τ₂ may have accessed the consistent version, τ₂ is before τ₁ in the serializability order.

4. Suppose that there is a read/write conflict between two conflicting transaction instances, and the read lock of a transaction instance precedes the write lock of another transaction instance, e.g., τ₁ and τ₂ in Figure 4.d. Regardless of the order of the begin unlock of τ₁ and the write lock WL₂(x) of τ₂, the begin unlock order of transaction instances τ₁ and τ₂ must comply with the serializability order of the transaction instances. This is because the transformation of the write lock into the certify lock must be blocked by the read lock if the read lock is not released. Note that τ₁ may release a lock after τ₂ obtains a conflicting write lock when the begin unlock of τ₁ is after the write lock WL₂(x) of τ₂.

□
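As a minimal sketch of how the begin unlock order could be observed, the function below scans a server-side trace of lock and unlock requests and records each transaction at its first unlock. The trace format and the operation names (`"wlock"`, `"rlock"`, `"clock"`, `"unlock"`) are assumptions made for illustration only.

```python
def begin_unlock_order(events):
    """Return transaction ids ordered by their first unlock request.

    `events` is an iterable of (txn_id, operation) pairs in the order the
    server observed them. Operation names are hypothetical: "rlock",
    "wlock", "clock" (certify lock), and "unlock".
    """
    order = []
    seen = set()
    for txn, op in events:
        # Only the first unlock of a transaction marks its shrinking phase.
        if op == "unlock" and txn not in seen:
            seen.add(txn)
            order.append(txn)
    return order


# Trace in the spirit of Figure 4.c: T2 reads x after T1 write-locks it,
# and T1 can certify-lock x only after T2 has unlocked.
trace = [("T1", "wlock"), ("T2", "rlock"), ("T2", "unlock"),
         ("T1", "clock"), ("T1", "unlock")]
order = begin_unlock_order(trace)
```

Here `order` places T2 before T1, matching case 3 of the proof, in which τ₂ precedes τ₁ in the serializability order.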

Note that if the server-side system allows conflicting transaction instances to commit in an order different from their serializability order, then each client-side system must apply the redo logs of the committing transaction instance on the consistent database version of the system in order of their begin unlock messages, instead of their commit order. Theorem 5 provides the general relationship between the serializability order of server-side transaction instances and their order of begin unlock messages, regardless of whether the server-side system may or may not crash. However, we must emphasize that if the server-side system indeed allows conflicting transaction instances to commit in an order different from their serializability order, and the server may crash at any time, then some schedules of server-side transactions are not recoverable according to the definition of recoverable schedules [9]. In the next section, we shall address this recovery issue for the extended 2VPCP protocol at the server side.

3.3 The Extended 2VPCP Protocol

We now present the mechanism for extending the 2VPCP protocol to local processing of read-only transaction instances at client sides. We assume that messages sent in a network arrive at destination systems in the order of their sending times, and that the network is reliable. We are interested in a closed environment, where the network is closed and under reasonable control. A network protocol such as TCP/IP which supports reliable message transmissions is adopted. If the network fails, then the extended 2VPCP protocol, similar to many other distributed concurrency control protocols, will not work in a distributed environment.

Server-Side Transactions:

Each server-side transaction instance τ_i (scheduled by the 2VPCP protocol) is required to broadcast a message to all client-side systems under the following three circumstances. Note that the server system can broadcast the messages on behalf of the server-side transaction instances.

1. Before τ_i writes on the working version of a data object O_j, τ_i (or the server-side system) must broadcast a message similar to a redo log (τ_i, O_j, old value, new value) to all client-side systems.

2. Before τ_i commits, τ_i (or the system) must broadcast a message similar to a commit log (τ_i, commit) to all client-side systems.

3. When τ_i ﬁrst unlocks any data object, τ_i (or the server-side system) must broadcast a message (τ_i, begin unlock) to all client-side systems to signal the beginning of the shrinking period of τ_i.

Note that if the server system allows conflicting transaction instances to commit in an order different from their serializability order, and the server may crash at any time, then some schedules of server-side transactions are not recoverable according to the definition of recoverable schedules [9]. In order to maintain the recoverability of the system, the server system must delay the commitment of a transaction instance (i.e., the actual releasing of certify locks and the sending of the commit log to client-side systems) until all preceding transaction instances in the serializability order (i.e., their order of begin unlock messages) commit. However, if the server system never crashes, this delay of commitment is unnecessary. In other words, the delay is not required until Section 4, which addresses failure recovery.
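The delayed-commitment rule can be sketched as a small gate that releases commits strictly in begin unlock order. This is only an illustration under stated assumptions: the class `CommitGate` and its methods are hypothetical, every transaction ahead in the order is assumed to commit eventually (aborts are not modeled), and a transaction that requests commitment before any unlock is treated as beginning its shrinking phase at that point.

```python
class CommitGate:
    """Releases commits in begin-unlock order (hypothetical helper)."""

    def __init__(self):
        self._order = []     # txn ids in begin_unlock order, not yet committed
        self._ready = set()  # txns that requested commit but are still delayed

    def begin_unlock(self, txn):
        """Record the first unlock of `txn`."""
        if txn not in self._order:
            self._order.append(txn)

    def request_commit(self, txn):
        """Return the transactions whose commit may now proceed, in order."""
        # Assumption: a commit request without an earlier unlock counts as
        # the begin unlock of the transaction.
        self.begin_unlock(txn)
        self._ready.add(txn)
        released = []
        # Release from the head of the order while the head is ready.
        while self._order and self._order[0] in self._ready:
            head = self._order.pop(0)
            self._ready.remove(head)
            released.append(head)
        return released


gate = CommitGate()
gate.begin_unlock("tau_L")
gate.begin_unlock("tau_H")
first = gate.request_commit("tau_H")   # tau_H must wait for tau_L
second = gate.request_commit("tau_L")  # both may now commit, tau_L first
```

In this run the higher-priority transaction is held back until the lower-priority one, which unlocked first, has committed.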

Client-Side System:

Let each client-side system and the server-side system share the same consistent version of the database initially. A transaction instance is said to have committed at a client-side system if all of the redo logs of the transaction instance have been applied on the consistent database version of the client-side system. Note that the write through procedure of each committing transaction instance at the server-side computer cannot be completed until all redo logs and the commit log are delivered to all client-side systems. When a reliable broadcasting network is adopted, the server-side system may simply return from any sending operation and assume that all client-side systems will eventually receive the messages sent. The details regarding logging and commitment of server-side transaction instances will be discussed in Section 4.2.

During the system operation, each client-side system keeps all redo-log messages of server-side transaction instances which have not committed at the client-side system.

When a client-side system receives a commit message, the system applies the redo logs of the committing transaction instance on the consistent database version of the system atomically in order of their begin unlock messages. Note that a client-side system may be busy with other work, so its copy may temporarily lag behind the server-side copy. However, the client-side system will catch up once its local workload drops. Such a lag does not cause any problem because all 2VPCP schedules with local read-only transaction processing are serializable even though some clients might apply the logs later (please see Theorem 6).
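A minimal single-threaded sketch of this client-side bookkeeping follows. The tuple message format, the class `ClientImage`, and its method names are assumptions for illustration; the old value is omitted from redo messages, and the atomicity that the protocol obtains through locking is not modeled.

```python
class ClientImage:
    """Client-side consistent database image (hypothetical sketch)."""

    def __init__(self, initial):
        self.image = dict(initial)  # consistent database version
        self._pending = {}          # txn -> buffered (object, new_value) writes
        self._order = []            # txns in begin_unlock order, not yet applied
        self._committed = set()     # txns whose commit message has arrived

    def receive(self, msg):
        """Process one message: ("redo", txn, obj, new_value),
        ("begin_unlock", txn), or ("commit", txn)."""
        kind, txn = msg[0], msg[1]
        if kind == "redo":
            self._pending.setdefault(txn, []).append((msg[2], msg[3]))
        elif kind == "begin_unlock":
            self._order.append(txn)
            self._apply_ready()
        elif kind == "commit":
            self._committed.add(txn)
            self._apply_ready()

    def _apply_ready(self):
        # Apply redo logs in begin_unlock order, stopping at the first
        # transaction that has not committed yet.
        while self._order and self._order[0] in self._committed:
            txn = self._order.pop(0)
            self._committed.remove(txn)
            for obj, new_value in self._pending.pop(txn, []):
                self.image[obj] = new_value


messages = [("redo", "T1", "x", 1), ("redo", "T2", "y", 5),
            ("begin_unlock", "T2"), ("begin_unlock", "T1"),
            ("commit", "T1"), ("commit", "T2")]
client_a = ClientImage({"x": 0, "y": 0})
client_b = ClientImage({"x": 0, "y": 0})
for m in messages:
    client_a.receive(m)
    client_b.receive(m)
```

Because both clients process the same message sequence deterministically, they end with the same image, which is the property stated in Lemma 6.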

Each client-side system has a unique updating transaction τ_{Upd} which is responsible for atomically updating the consistent database image based on redo-log messages of committing transaction instances. Before transaction τ_{Upd} updates the consistent database image, it must write-lock the entire database image. Transaction τ_{Upd} may be periodic or aperiodic. The higher the priority of τ_{Upd}, the faster τ_{Upd} can update the consistent database image by processing the redo logs to reflect the database image updated by committed server-side transactions. The assignment of a high priority to τ_{Upd} can also help in reducing the recovery time because the recovery mechanism must reflect the consistent database image updated by all of the committed server-side transactions, and the mechanism depends on τ_{Upd} to process the redo logs. However, the high priority of τ_{Upd} may interfere with the executions of read-only transactions at the client side. Simulation experiments regarding the priority of τ_{Upd} will be included in Section 5.

Autonomous Read-Only Transaction Processing at Client-Side Systems:

Each client-side system should schedule all of its transaction instances including τ_{Upd} in a preemptive priority-driven fashion. Before a read-only transaction instance reads any data object, it must read-lock the entire database image. Note that the entire database can be locked or unlocked by simply locking or unlocking a global flag. (Simulation experiments will be done to justify the setting of the priority of τ_{Upd}.)
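The global flag can be pictured as a whole-database lock that admits many readers or one writer. The sketch below is a non-blocking state machine with hypothetical names (`GlobalFlag`, `try_read_lock`, and so on); it does not model preemptive priority-driven scheduling or what a transaction does when a lock request fails.

```python
class GlobalFlag:
    """Whole-database lock: many readers or one writer (hypothetical sketch)."""

    def __init__(self):
        self.readers = 0     # read-only transactions holding the read lock
        self.writer = False  # True while the updating transaction holds the write lock

    def try_read_lock(self):
        """Grant a read lock unless the image is write-locked."""
        if self.writer:
            return False
        self.readers += 1
        return True

    def read_unlock(self):
        if self.readers == 0:
            raise RuntimeError("no read lock is held")
        self.readers -= 1

    def try_write_lock(self):
        """Grant the write lock only when no reader or writer holds the flag."""
        if self.writer or self.readers > 0:
            return False
        self.writer = True
        return True

    def write_unlock(self):
        if not self.writer:
            raise RuntimeError("no write lock is held")
        self.writer = False
```

A read-only transaction would call `try_read_lock` before its first read, and the updating transaction would call `try_write_lock` before applying redo logs.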

Note that the consistent database image at the server-side system may not be the same as the consistent database image at some client-side systems. It is even possible that the τ_{Upd} transactions of different client-side systems process the redo logs of committed transaction instances at different speeds, due to different workloads at different client-side systems. However, as shown in Theorem 6, all 2VPCP schedules with local read-only transaction processing are serializable. Read-only transactions at different client-side systems may have consistent, but different views of the database.

Lemma 6 All client-side systems have the same consistent database image if they receive the same set of messages sent from the server-side system and apply them to the database image.

Proof. The correctness of this lemma follows directly from the assumption that the network delivers messages in a first-come-first-served fashion. □

Lemma 7 The database image maintained at client-side systems always satisfies the serializability order of server-side transaction instances.

Proof. The correctness of this lemma follows directly from Theorem 5 and the definitions of the database image maintenance mechanism. □

Theorem 6 All 2VPCP schedules with local read-only transaction processing are serializable.

Proof. Theorem 4 shows that all transactions at the server side are serializable. The problem is whether all client-side read-only transactions and all server-side transactions together are still serializable. Lemma 6 shows that all client-side systems have the same consistent database image if they receive the same set of messages sent from the server-side system and apply them to the database image. That is, there is no inconsistent view of the database among client-side transactions running on different client sides. Furthermore, Lemma 7 shows that the database image maintained at client-side systems always satisfies the serializability order of server-side transactions. A client-side read-only transaction is considered to occur immediately after the server-side transactions which commit at the corresponding client-side system before the read-only transaction read-locks and reads the consistent database image. We conclude that client-side read-only transactions and all server-side transactions together are serializable. □

4 Failure Recovery

4.1 Motivation

The purpose of this section is to further extend the 2VPCP protocol to the failure recovery of the server-side system. As astute readers may notice, the 2VPCP protocol can be applied to both memory-resident and disk-resident databases. For the rest of this section, we will focus our discussions on memory-resident databases. We shall also require that the 2VPCP protocol only allows conflicting server-side transactions to commit in their serializability order. The enforcement of commit order can be done easily by delaying the commitment of transactions. However, the delaying of the commitment of server-side transactions may increase the maximum number of priority inversions for a server-side transaction by one. This can be explained by the following example:

Let a higher-priority transaction instance τ_H be blocked by a lower-priority transaction instance τ_L under the 2VPCP protocol at the server side. The begin unlock message of τ_L will be before that of τ_H because of the adoption of the 2PL scheme. Since τ_H has a higher priority than τ_L does, τ_H may preempt τ_L (after τ_L unlocks the data object which blocks τ_H) and try to commit before τ_L commits. In order to let conflicting transaction instances commit according to their serializability order, i.e., the begin unlock message order, τ_H will be delayed to wait for τ_L to commit to keep the system recoverable. Because there is no transitive blocking, and the maximum number of priority inversions under the 2VPCP protocol is one, the extra number of priority inversions for τ_H due to the delaying of commitment is one.

[Figure 5 labels: server computer (consistent version, work space version); network connection; recovery servers (client computers) with a consistent version; consistent copy.]

Figure 5: A client-server architecture for failure recovery

The client computers adopted for local processing of read-only transactions (in Sec-