Multiple Writers - Introduction to Reliable and Secure Distributed Programming


4.4.1 Multiple Writers

All registers discussed so far have only a single writer. That is, our specifications of regular and atomic registers introduced in the previous sections do not provide any guarantees when multiple processes write to the same register. It is natural to ask what should happen in the case of multiple writers.

In order to answer this question, we need to formulate an appropriate validity property for multiple writers. Indeed, this property requires a read that is not concurrent with any write to return the last value written. But if two processes have written different values v and v′ concurrently, before some other process invokes a read operation, then what should this read return? Assuming we make it possible for the reader to return either v or v′, do we allow a concurrent reader, or a reader that comes even later, to return the other value? What about a failed write operation? If a process writes a value v and crashes before completing the write, does a reader need to return v, or can it return an older value?

In the following, we answer these questions and generalize the specification of atomic registers to multiple writers.

4.4.2 Specification

An (N, N) atomic register abstraction (NNAR) links together read and write operations in a stricter way than its single-writer relative. This register abstraction ensures that every failed write appears either as if it was never invoked or as if it completed, i.e., as if the operation was invoked and terminated. Clearly, a failed read operation may always appear as if it was never invoked. In addition, even in the face of concurrency, it must be that the values returned by reads could have been returned by a hypothetical serial execution, where every operation takes place at an indivisible point in time, which lies between the invocation event and the completion event of the operation.

An (N, N) atomic register is a strict generalization of a (1, N) atomic register in the sense that every execution of a (1, N) atomic register is also an execution of an (N, N) atomic register, but not vice versa. The interface and properties of an (N, N) atomic register abstraction are given in Module 4.3.

The hypothetical serial execution mentioned before is called a linearization of the actual execution. More precisely, a linearization of an execution is defined as a sequence of complete operations that appear atomically, one after the other, which contains at least all complete operations of the actual execution (and possibly some operations that were incomplete) and satisfies the following conditions:

1. every read returns the last value written; and

2. for any two operations o and o′, if o precedes o′ in the actual execution, then o also appears before o′ in the linearization.

We call an execution linearizable if there is a way to linearize it like this. With this notion, one can reformulate the atomicity property of an (N, N) atomic register in Module 4.3 as:

NNAR2’: Atomicity: Every execution of the register is linearizable.
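To make the two linearizability conditions concrete, here is a brute-force checker for tiny register histories, written as a Python sketch rather than anything from the text. It assumes each operation is recorded with invocation and completion times, treats a failed operation as having no completion time, drops failed reads, and tries every choice of including or omitting each failed write. The names `Op` and `is_linearizable` are hypothetical; the search is exponential and only meant for a handful of operations.

```python
from itertools import permutations
from typing import Any, List, NamedTuple, Optional


class Op(NamedTuple):
    kind: str             # "write" or "read"
    value: Any            # value written, or value returned by the read
    start: float          # invocation time
    end: Optional[float]  # completion time; None if the operation failed


def precedes(a: Op, b: Op) -> bool:
    """a precedes b in the actual execution: a completed before b was invoked."""
    return a.end is not None and a.end < b.start


def legal(seq, initial: Any) -> bool:
    """Condition 1: every read returns the last value written before it."""
    current = initial
    for op in seq:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def respects_real_time(seq) -> bool:
    """Condition 2: if o precedes o' in the execution, o also comes first in seq."""
    return not any(precedes(seq[j], seq[i])
                   for i in range(len(seq)) for j in range(i + 1, len(seq)))


def is_linearizable(history: List[Op], initial: Any = None) -> bool:
    complete = [op for op in history if op.end is not None]
    # Failed reads are dropped; each failed write is either included or omitted.
    pending_writes = [op for op in history if op.end is None and op.kind == "write"]
    for mask in range(1 << len(pending_writes)):
        chosen = [w for i, w in enumerate(pending_writes) if (mask >> i) & 1]
        for seq in permutations(complete + chosen):
            if respects_real_time(seq) and legal(seq, initial):
                return True
    return False


# Two concurrent writes; a later read sees "w", and a still later read sees "v".
h1 = [Op("write", "v", 0, 2), Op("write", "w", 1, 3),
      Op("read", "w", 4, 5), Op("read", "v", 6, 7)]
print(is_linearizable(h1))  # False

# A failed write of "x" that a read has observed must stay visible afterward.
h2 = [Op("write", "v", 0, 1), Op("write", "x", 2, None),
      Op("read", "x", 5, 6), Op("read", "x", 7, 8)]
print(is_linearizable(h2))  # True
```

The second history is linearizable only because the failed write is allowed to appear as if it completed, which is exactly the freedom NNAR2 grants.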

To implement (N, N) atomic registers, we adopt the same modular approach as for implementing (1, N) atomic registers. We first give a general transformation that implements an (N, N) atomic register using (1, N) atomic registers. This transformation uses only an array of underlying (1, N) atomic registers and no other way of exchanging information among the processes. We present it to illustrate the fundamental difference between both abstractions. We then give two direct and efficient implementations of (N, N) atomic registers in terms of a fail-stop algorithm and a fail-silent algorithm.

Module 4.3: Interface and properties of an (N, N) atomic register

Module:

Name: (N, N)-AtomicRegister, instance nnar.

Events:

Request: ⟨ nnar, Read ⟩: Invokes a read operation on the register.

Request: ⟨ nnar, Write | v ⟩: Invokes a write operation with value v on the register.

Indication: ⟨ nnar, ReadReturn | v ⟩: Completes a read operation on the register with return value v.

Indication: ⟨ nnar, WriteReturn ⟩: Completes a write operation on the register.

Properties:

NNAR1: Termination: Same as property ONAR1 of a (1, N) atomic register (Module 4.2).

NNAR2: Atomicity: Every read operation returns the value that was written most recently in a hypothetical execution, where every failed operation appears to be complete or does not appear to have been invoked at all, and every complete operation appears to have been executed at some instant between its invocation and its completion.

4.4.3 Transformation: From (1, N) Atomic to (N, N) Atomic Registers

We describe how to transform any (1, N) atomic register abstraction into an (N, N) atomic register abstraction, using no other primitives. To get an intuition of this transformation, recall the example of the atomic blackboard on which one teacher writes and from which multiple students read. A multi-writer register corresponds to a blackboard shared by multiple teachers for writing information that is read by a set of students. All teachers should write to a single common board and all students should read from this board. However, only the simpler boards constructed before are available, where every board allows only one teacher to write information. If every teacher uses his or her own board to write information, then it will not be clear for a student which information to select while still ensuring the atomicity of the common board, i.e., the illusion of one physical common board that all teachers share. The problem is that the student cannot recognize the latest information that was written. Indeed, if some teacher A writes v and then some other teacher B later writes w, then a student that looks at the common board afterward should see w. But how can the student know that w is indeed the latest information, given that what is available are simply individual boards, one for each teacher?

The solution is to coordinate the teachers so that they explicitly create a happened-before relation among the information they write. To this end, all teachers associate a global timestamp with every written value. When teacher B writes w, he or she first reads the board and finds v (written by teacher A) and an associated timestamp there. Teacher B now increments the timestamp and associates it with w, representing the very fact that w was written after v and is, therefore, more recent than v. This is the key idea of our transformation.

The transformation in Algorithm 4.8 implements one (N, N) atomic register instance nnar from multiple (1, N) atomic registers. Specifically, it uses an array of N underlying (1, N) atomic register instances, called onar.p for p ∈ Π. Every register instance onar.p stores a value and an associated timestamp. Basically, when a process p emulates a write operation with value v to register nnar, it first reads all underlying (1, N) registers. Then it selects the largest timestamp, increments it, and associates it with v, the value to be written. Finally, p writes the value and the associated timestamp to the register instance onar.p.

To read a value from the multi-writer register, a process p first reads all underlying registers and returns the value with the largest timestamp. It may occur that several registers store the same timestamp with different values. To resolve this ambiguity, process p orders such values according to the rank of the process that writes to the register. (Recall that the rank associates every process with an index between 1 and N.) In other words, process p determines the value with the highest timestamp/rank pair, ordered first by timestamp and second by rank. This defines a total order among the values stored in the underlying registers. We abstract away this order within the function highest(·), which we modify for this algorithm so that it operates on triples of the form (timestamp, rank, value) and returns the timestamp and value from the triple with the largest timestamp/rank pair in our total order.
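The modified function highest(·) can be sketched in a few lines of Python. This is an illustration, not the book's definition: it assumes ranks are integers, represents empty (⊥) slots of the list as `None` and skips them, and compares triples only on their (timestamp, rank) part so that the stored values never need to be comparable.

```python
from typing import Any, List, Optional, Tuple

Triple = Tuple[int, int, Any]  # (timestamp, rank, value)


def highest(readlist: List[Optional[Triple]]) -> Tuple[int, Any]:
    """Return (ts, value) of the triple with the largest (timestamp, rank) pair."""
    entries = [t for t in readlist if t is not None]  # skip ⊥ slots
    ts, _rank, value = max(entries, key=lambda t: (t[0], t[1]))
    return ts, value


print(highest([(3, 1, "a"), (4, 2, "b"), (4, 3, "c")]))  # (4, 'c')
```

Because every process has a distinct rank, two triples never tie on (timestamp, rank), so the order is total and the result does not depend on the order of the list.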

Correctness. The termination property of the (N, N) register follows directly from the corresponding condition of the underlying (1, N) registers.

To show that register instance nnar is atomic, we demonstrate that the nnar-read and nnar-write operations are linearizable, i.e., there exists a hypothetical serial execution with all complete operations of the actual execution, such that (1) every read returns the last value written and (2) for any two operations o1 and o2, if o1 precedes o2 in the actual execution then o1 also appears before o2 in the linearization.

Recall that the algorithm uses a total order on (timestamp, rank, value) tuples, implemented by the function highest(·), and selects the value to return in a read operation accordingly.

It is clear from the algorithm that the timestamps written by two serial operations on nnar are strictly increasing, i.e., if a nnar-write operation writes (ts, v) to an underlying register instance onar.q and a subsequent nnar-write operation writes (ts′, v′) to an underlying register instance onar.q′, then ts < ts′.

Algorithm 4.8: From (1, N) Atomic to (N, N) Atomic Registers

Implements:

    (N, N)-AtomicRegister, instance nnar.

Uses:

    (1, N)-AtomicRegister (multiple instances).

upon event ⟨ nnar, Init ⟩ do
    val := ⊥;
    writing := FALSE;
    readlist := [⊥]^N;
    forall q ∈ Π do
        Initialize a new instance onar.q of (1, N)-AtomicRegister with writer q;

upon event ⟨ nnar, Write | v ⟩ do
    val := v;
    writing := TRUE;
    forall q ∈ Π do
        trigger ⟨ onar.q, Read ⟩;

upon event ⟨ nnar, Read ⟩ do
    forall q ∈ Π do
        trigger ⟨ onar.q, Read ⟩;

upon event ⟨ onar.q, ReadReturn | (ts′, v′) ⟩ do
    readlist[q] := (ts′, rank(q), v′);
    if #(readlist) = N then
        (ts, v) := highest(readlist);
        readlist := [⊥]^N;
        if writing = TRUE then
            writing := FALSE;
            trigger ⟨ onar.self, Write | (ts + 1, val) ⟩;
        else
            trigger ⟨ nnar, ReadReturn | v ⟩;

upon event ⟨ onar.self, WriteReturn ⟩ do
    trigger ⟨ nnar, WriteReturn ⟩;
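For readers who prefer running code, the following Python sketch replays Algorithm 4.8 in a purely sequential setting. It is a simplification under stated assumptions: each onar.q is modeled as a plain object that enforces its single writer, operations take effect instantly (so concurrency and crashes are not represented), process q is identified with rank q in 1..N, and each underlying register is assumed to start with the pair (0, ⊥). The class names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Pair = Tuple[int, Any]  # (timestamp, value) stored in each onar.q


class OneNAtomicRegister:
    """Stand-in for a (1, N) atomic register onar.q with the single writer q.

    Sequential model only: operations take effect instantly."""

    def __init__(self, writer: int) -> None:
        self.writer = writer
        self.state: Pair = (0, None)  # assumed initial content (0, ⊥)

    def read(self) -> Pair:
        return self.state

    def write(self, caller: int, pair: Pair) -> None:
        if caller != self.writer:
            raise PermissionError(f"onar.{self.writer} may only be written by {self.writer}")
        self.state = pair


class TransformedNNAR:
    """Sketch of Algorithm 4.8; process q is identified with its rank q in 1..N."""

    def __init__(self, n: int) -> None:
        self.onar: Dict[int, OneNAtomicRegister] = {
            q: OneNAtomicRegister(q) for q in range(1, n + 1)}

    def _read_all(self) -> Tuple[int, Any]:
        # readlist[q] := (ts', rank(q), v') for every q, then highest(readlist)
        readlist: List[Tuple[int, int, Any]] = []
        for q, reg in self.onar.items():
            ts, v = reg.read()
            readlist.append((ts, q, v))
        ts, _rank, v = max(readlist, key=lambda t: (t[0], t[1]))
        return ts, v

    def write(self, p: int, v: Any) -> None:
        ts, _ = self._read_all()
        self.onar[p].write(p, (ts + 1, v))  # p writes only its own register onar.p

    def read(self, p: int) -> Any:
        # p is the reading process; it is not needed in this sequential model
        _, v = self._read_all()
        return v


reg = TransformedNNAR(3)
reg.write(1, "v")   # teacher A: onar.1 := (1, "v")
reg.write(2, "w")   # teacher B reads timestamp 1, so onar.2 := (2, "w")
print(reg.read(3))  # w
```

The single-writer check in `OneNAtomicRegister.write` mirrors the constraint that only the simpler one-teacher boards are available; all coordination happens through the timestamps.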

Now, construct the linearization as follows. First, include all nnar-write operations according to the total order of their associated timestamp/rank pairs. Second, consider each nnar-read operation o_r in the order in which its response occurs in the actual execution, take the timestamp ts and the rank of the writer q associated with the returned value v, and find the nnar-write operation o_w during which process q wrote (ts, v) to instance onar.q; place operation o_r into the linearization after o_w, immediately before the subsequent nnar-write operation.
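The two-step construction can be expressed as a short Python function. This sketch is ours, not the book's: operations are plain string labels, each write is keyed by its (timestamp, rank) pair, each read is given with the pair of the value it returned and listed in response order, and a read that returned the register's initial value is assumed to carry the pair (0, 0) and is placed before all writes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[int, int]   # (timestamp, rank of the writer)
INITIAL: Pair = (0, 0)   # assumed pair for the register's initial value


def linearize(writes: Dict[Pair, str], reads: List[Tuple[str, Pair]]) -> List[str]:
    """Build the sequence described in the text.

    writes: maps each write's (ts, rank) pair to a label for the operation.
    reads:  (label, pair of the returned value), listed in response order.
    """
    attached: Dict[Pair, List[str]] = defaultdict(list)
    for name, pair in reads:
        if pair != INITIAL and pair not in writes:
            raise ValueError(f"{name} returned a value that no write produced")
        attached[pair].append(name)              # keeps response order per write

    order: List[str] = list(attached[INITIAL])   # reads of the initial value go first
    for pair in sorted(writes):                  # total order: timestamp, then rank
        order.append(writes[pair])               # o_w ...
        order.extend(attached[pair])             # ... then the reads placed after it
    return order


print(linearize({(1, 1): "W_A", (1, 2): "W_B", (2, 1): "W_C"},
                [("R1", (1, 2)), ("R2", (1, 2)), ("R3", (2, 1))]))
# ['W_A', 'W_B', 'R1', 'R2', 'W_C', 'R3']
```

Sorting Python tuples compares the timestamp first and the rank second, which is exactly the timestamp/rank order used by highest(·).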

It is easy to see that the first condition of linearizability holds from the construction of the linearization, because each read returns the value of the latest preceding write.

To show the second condition of linearizability, consider any two operations o1 and o2 in the actual execution such that o1 precedes o2. There are four cases to consider:

1. If both are writes, they are in the correct order as argued earlier (their timestamps are strictly increasing).

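2. Suppose o1 is a read and o2 is a write. The algorithm for the write first reads the underlying registers, selects the highest timestamp/rank pair, and increments this timestamp by one for writing. Since o1 completed before o2 was invoked, the timestamp that o2 reads is at least as large as the timestamp associated with the value returned by o1, so o2 writes a strictly larger timestamp. Hence, o1 occurs before o2 in the linearization according to its construction.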

3. Suppose o1 is a write and o2 is a read. As in the previous case, the algorithm for the read first reads the timestamps from all underlying registers and chooses among them a value with a maximal timestamp/rank pair. Thus, o2 returns a value with an associated timestamp generated by o1 or by a subsequent write. Hence, the construction of the linearization places o2 after o1.

4. If o1 is a read and o2 is a read, the case is more complex. Suppose that o1 returns a value v1 and selected (ts1, r1) as the highest timestamp/rank pair, and o2 used a different highest timestamp/rank pair (ts2, r2) associated with the return value.

As o2 occurs after o1 in the actual execution, and as any intervening writes do not decrease the timestamp value, we have ts2 ≥ ts1. If ts2 > ts1, then the second condition holds by construction of the linearization.

Otherwise, if ts2 = ts1, consider process p1 with rank r1 and process p2 with rank r2. If r1 < r2, then the write of process p1 is placed into the linearization before the write of process p2, and, hence, also o1 is placed into the linearization before o2. If r1 = r2, then the read operations also occur in the correct order in the linearization. The last case, r1 > r2, however, cannot have occurred in the actual execution: when o2 is invoked, the underlying register instance onar.p1 still contains the pair (ts1, v1), and o2 would have selected (ts1, r1) as the highest timestamp/rank pair. But this contradicts the assumption made above that (ts1, r1) ≠ (ts2, r2).

Performance. Every write operation into the (N, N) atomic register requires N reads, one from each of the underlying (1, N) registers, and one write into a (1, N) register. Every read from the (N, N) register requires N reads, one from each of the underlying (1, N) registers.

Assume we apply the transformation of Algorithm 4.8 to the “Read-Impose Write-All” fail-stop algorithm (Algorithm 4.5) in order to obtain an (N, N) atomic register algorithm. Every read operation from the (N, N) register would involve N (parallel) communication roundtrips between the reader and all other processes.

Furthermore, every write to the (N, N) register would involve N (parallel) communication roundtrips between the writer and all other processes (to determine the largest timestamp), and then another communication roundtrip between the writer and all other processes (to perform the actual writing).

Similarly, assume we apply the transformation of Algorithm 4.8 to the “Read-Impose Write-Majority” algorithm (Algorithm 4.6–4.7) in order to obtain an (N, N) atomic register algorithm. Every read in the (N, N) register would involve N (parallel) communication roundtrips between the reader and a majority of the processes (to determine the latest value), and then N other communication roundtrips between the reader and a majority of the processes (to impose that value). Furthermore, every write to the (N, N) register would involve N (parallel) communication roundtrips between the writer and a majority (to determine the largest timestamp) and then another communication roundtrip between the writer and a majority (to perform the actual writing).
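As a quick check of the arithmetic in the two preceding paragraphs, the following Python sketch tabulates roundtrips per operation for a system of N processes. The function name and the base-algorithm labels are made up for illustration, and the counts simply restate the text's accounting, in which each of the N underlying reads counts as its own roundtrip even though they are issued in parallel.

```python
def transformed_roundtrips(n: int, base: str) -> dict:
    """Roundtrips per (N, N) operation when Algorithm 4.8 is applied to a (1, N)
    algorithm, counted as in the text (each of the N parallel reads is counted)."""
    if base == "read-impose-write-all":
        # (1, N) read: one roundtrip with all processes; (1, N) write: one roundtrip.
        return {"read": n, "write": n + 1}
    if base == "read-impose-write-majority":
        # (1, N) read: query a majority, then impose the value on a majority.
        return {"read": 2 * n, "write": n + 1}
    raise ValueError(f"unknown base algorithm: {base}")


for base in ("read-impose-write-all", "read-impose-write-majority"):
    print(base, transformed_roundtrips(5, base))
# read-impose-write-all {'read': 5, 'write': 6}
# read-impose-write-majority {'read': 10, 'write': 6}
```

In both cases the cost grows linearly with N, which is what motivates the direct implementations that follow.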

We present, in the following, two direct implementations of an (N, N) atomic register that are more efficient than the algorithms we obtain through the automatic transformations. Both algorithms use the two main ideas introduced by the transformation, namely, that a writer first consults the memory to obtain the highest timestamp that may have been used by the other writers, and that timestamp/rank pairs are used to extend the order on timestamps. We describe first a fail-stop algorithm and then a fail-silent algorithm.

4.4.4 Fail-Stop Algorithm: Read-Impose Write-Consult-All (N, N) Atomic Register

We describe the “Read-Impose Write-Consult-All” algorithm that implements an (N, N) atomic register in Algorithm 4.9. It uses the fail-stop system model with a perfect failure detector and extends the “Read-Impose Write-All” algorithm for (1, N) atomic registers (Algorithm 4.5) to deal with multiple writers.

In order to get an idea of the issue introduced by multiple writers, it is important to first figure out why the “Read-Impose Write-All” algorithm cannot afford multiple writers. Consider indeed two processes p and q trying to write concurrently to a register, implemented using the “Read-Impose Write-All” algorithm. Due to the use of acknowledgments for read and write operations, if the preceding operation completed and no other operation is invoked, processes p and q both store the same timestamp ts used by that operation. When they proceed to write, different values would become associated with the same timestamp.
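The clash just described, and the way a writer rank removes it, can be reproduced with two tiny update rules in Python. This is an illustrative sketch with hypothetical names: it assumes the single-writer algorithm adopts a delivered value only when its timestamp is strictly larger, and it gives p rank 1 and q rank 2. Two processes deliver the two WRITE messages in opposite orders.

```python
from typing import Any, Tuple

Msg = Tuple[int, int, Any]  # (timestamp, writer rank, value)


def deliver_ts_only(state: Msg, msg: Msg) -> Msg:
    """Single-writer rule: adopt the message only if its timestamp is larger."""
    return msg if msg[0] > state[0] else state


def deliver_ts_rank(state: Msg, msg: Msg) -> Msg:
    """Multi-writer rule: adopt the message if its (timestamp, rank) pair is larger."""
    return msg if (msg[0], msg[1]) > (state[0], state[1]) else state


# Both writers last saw timestamp 3 and therefore both write timestamp 4.
initial: Msg = (3, 1, "old")
from_p: Msg = (4, 1, "x")   # p has rank 1 (assumed)
from_q: Msg = (4, 2, "y")   # q has rank 2 (assumed)

for rule in (deliver_ts_only, deliver_ts_rank):
    r1 = rule(rule(initial, from_p), from_q)  # a process that delivers p's WRITE first
    r2 = rule(rule(initial, from_q), from_p)  # a process that delivers q's WRITE first
    print(rule.__name__, r1[2], r2[2])
# deliver_ts_only x y
# deliver_ts_rank y y
```

With timestamps alone, the two processes end up holding different values for the same timestamp; with timestamp/rank pairs, both converge on the value of the higher-ranked writer regardless of delivery order.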

To resolve this issue, the algorithm also stores the identity of the process that writes a value together with a timestamp, expressed through the writer’s rank, and uses it to determine the highest timestamp. Comparisons employ the same ordering of timestamp/rank pairs as in Algorithm 4.8. Apart from the addition of writer ranks and the modified comparison, Algorithm 4.9 is the same as Algorithm 4.5.

Algorithm 4.9: Read-Impose Write-Consult-All

Implements:

    (N, N)-AtomicRegister, instance nnar.

Uses:

    BestEffortBroadcast, instance beb;
    PerfectPointToPointLinks, instance pl;
    PerfectFailureDetector, instance P.

upon event ⟨ nnar, Init ⟩ do
    (ts, wr, val) := (0, 0, ⊥);
    correct := Π;
    writeset := ∅;
    readval := ⊥;
    reading := FALSE;

upon event ⟨ P, Crash | p ⟩ do
    correct := correct \ {p};

upon event ⟨ nnar, Read ⟩ do
    reading := TRUE;
    readval := val;
    trigger ⟨ beb, Broadcast | [WRITE, ts, wr, val] ⟩;

upon event ⟨ nnar, Write | v ⟩ do
    trigger ⟨ beb, Broadcast | [WRITE, ts + 1, rank(self), v] ⟩;

upon event ⟨ beb, Deliver | p, [WRITE, ts′, wr′, v′] ⟩ do
    if (ts′, wr′) is larger than (ts, wr) then
        (ts, wr, val) := (ts′, wr′, v′);
    trigger ⟨ pl, Send | p, [ACK] ⟩;

upon event ⟨ pl, Deliver | p, [ACK] ⟩ do
    writeset := writeset ∪ {p};

upon correct ⊆ writeset do
    writeset := ∅;
    if reading = TRUE then
        reading := FALSE;
        trigger ⟨ nnar, ReadReturn | readval ⟩;
    else
        trigger ⟨ nnar, WriteReturn ⟩;

Correctness. The termination property of the atomic register follows from the completeness property of the failure detector P and the underlying channels.

The atomicity property follows from the accuracy property of P and from the ordering of timestamp/rank pairs. This order is the same as in Algorithm 4.8, and the argument proceeds analogously.

For demonstrating that the operations of the algorithm are linearizable, we construct the hypothetical sequence of atomic operations as follows. First, include all write operations according to the order on the unique timestamp/rank pair that was included in the WRITE message triggered by the operation. Second, consider each read operation o_r in the order in which its response occurs in the actual execution, take the timestamp ts and the rank of the writer q associated with the returned value v, and find the write operation o_w during which process q wrote (ts, v); place operation o_r into the sequence after o_w, immediately before the subsequent write operation.

The first condition of linearizability is ensured directly by the construction of the linearization, because each read returns the value of the latest preceding write.

To show the second condition of linearizability, consider any two operations o1 and o2 in the actual execution such that o1 precedes o2. There are four cases to analyze:

1. If both operations are writes, then the process p2 executing o2 accessed its variable ts and incremented it. As the writer p1 of o1 has received an ACK message from p2 (because p2 has not been detected by P), the value of ts at process p2 is at least as large as the timestamp associated with o1. Thus, the two operations appear in the linearization in the correct order by construction.

2. Suppose o1 is a read and o2 is a write. As in the previous case, the algorithm for the writer reads its variable ts and increments it by one for writing. This implies that o1 occurs before o2 in the linearization.

3. Suppose o1 is a write and o2 is a read. The algorithm for o2 returns the value in variable val at the reader that is associated with the timestamp in variable ts. According to how processes update their variables and how write operations are ordered, variable ts contains a timestamp that is at least as large as the timestamp written by o1. This implies that o2 appears after o1 in the linearization.

4. If both operations are reads, suppose that o1 returns a value v1 associated with a timestamp/rank pair (ts1, r1), and o2 returns a value associated with a different timestamp/rank pair (ts2, r2). As o2 occurs after o1 in the actual execution, the reader in o1 received an acknowledgment from all nondetected processes, including the process that executes o2. The latter process (executing o2) may only have increased its ts variable in the algorithm; this argument implies that ts2 ≥ ts1.

We now distinguish two cases. If ts2 > ts1, then the condition of linearizability holds by construction. Otherwise, if ts2 = ts1, consider process p1 with rank r1 and process p2 with rank r2. If r1 < r2, then the write of process p1 is placed into the linearization before the write of process p2, and, hence, also o1 is placed into the linearization before o2, as required. Otherwise, it must hold that r1 > r2, because r1 ≠ r2. But this cannot occur in the actual execution. It would mean that the variable tuple (ts, wr) at the process executing o2 contains a timestamp/rank pair that is at least as large as (ts1, r1) according to the algorithm, because the process received a WRITE message containing (ts1, r1) during o1 and updated the variables. But o2 returns the value associated with (ts2, r2), with ts2 = ts1, and this means that r2 ≥ r1, a contradiction.

Performance. Every read and write in the (N, N) register requires two communication steps, and O(N) messages are exchanged for one operation.
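The following Python sketch replays the handlers of Algorithm 4.9 in a single thread, to show how one broadcast of [WRITE, ts, wr, v] both updates every process and collects the acknowledgments. It rests on simplifying assumptions that are ours: a broadcast reaches every process instantly and reliably, a crashed process neither updates nor acknowledges, a crash is reported by the perfect failure detector to everyone at once, and process ranks are 1..N. Class and method names are hypothetical.

```python
from typing import Any, Dict, Set


class Process:
    """Local state of one process in Algorithm 4.9 (rank = process id, assumed)."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.ts, self.wr, self.val = 0, 0, None  # (ts, wr, val) := (0, 0, ⊥)
        self.crashed = False

    def deliver_write(self, ts: int, wr: int, v: Any) -> bool:
        """Handle [WRITE, ts', wr', v']; return True if this process sends an ACK."""
        if self.crashed:
            return False
        if (ts, wr) > (self.ts, self.wr):  # "(ts', wr') is larger than (ts, wr)"
            self.ts, self.wr, self.val = ts, wr, v
        return True


class ReadImposeWriteConsultAll:
    """Sequential model: instant, reliable broadcast; crashes reported at once."""

    def __init__(self, n: int) -> None:
        self.procs: Dict[int, Process] = {r: Process(r) for r in range(1, n + 1)}
        self.correct: Set[int] = set(self.procs)  # maintained via <P, Crash | p>

    def crash(self, r: int) -> None:
        self.procs[r].crashed = True
        self.correct.discard(r)

    def _broadcast_and_wait(self, ts: int, wr: int, v: Any) -> None:
        writeset = {r for r, p in self.procs.items() if p.deliver_write(ts, wr, v)}
        if not self.correct <= writeset:  # "upon correct ⊆ writeset"
            raise RuntimeError("would block: an undetected process did not acknowledge")

    def read(self, r: int) -> Any:
        p = self.procs[r]
        readval = p.val                              # readval := val
        self._broadcast_and_wait(p.ts, p.wr, p.val)  # impose (ts, wr, val) on all
        return readval

    def write(self, r: int, v: Any) -> None:
        p = self.procs[r]
        self._broadcast_and_wait(p.ts + 1, p.rank, v)  # [WRITE, ts + 1, rank(self), v]


reg = ReadImposeWriteConsultAll(3)
reg.write(1, "v")
reg.crash(3)
reg.write(2, "w")   # p2 already holds ts = 1, so it writes the pair (2, 2)
print(reg.read(1))  # w
```

Because every write is acknowledged by all undetected processes, a later writer always finds the latest timestamp in its own local ts variable, which is why no separate consult phase with other processes is needed in this fail-stop setting.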

4.4.5 Fail-Silent Algorithm: Read-Impose Write-Consult-Majority (N, N) Atomic Register

We describe here how to obtain an algorithm that implements an (N, N) atomic register in a fail-silent model as an extension of our “Read-Impose Write-Majority” algorithm, i.e., Algorithm 4.6–4.7, that implements a (1, N) atomic register.

Let us again consider multiple writers in the single-writer formulation of the algorithm and examine why Algorithm 4.6–4.7 fails to implement an (N, N) atomic

Algorithm 4.10: Read-Impose Write-Consult-Majority (part 1, read and consult)

Implements:

    (N, N)-AtomicRegister, instance nnar.

Uses:

    BestEffortBroadcast, instance beb;
    PerfectPointToPointLinks, instance pl.

upon event ⟨ nnar, Init ⟩ do
    (ts, wr, val) := (0, 0, ⊥);
    acks := 0;
    writeval := ⊥;
    rid := 0;
    readlist := [⊥]^N;
    readval := ⊥;
    reading := FALSE;

upon event ⟨ nnar, Read ⟩ do
    rid := rid + 1;
    acks := 0;

In the document Introduction to Reliable and Secure Distributed Programming (pages 177–188)