
A combination of (1) a process abstraction, (2) a link abstraction, and possibly (3) a failure-detector abstraction defines a distributed-system model. In the following, we discuss several models that will be considered throughout this book to reason about distributed-programming abstractions and the algorithms used to implement them.

We also discuss important properties of abstraction specifications and algorithms that will be useful reasoning tools for the following chapters.

2.7.1 Combining Abstractions

Clearly, we will not consider all possible combinations of basic abstractions. On the other hand, it is interesting to discuss more than one possible combination to get an insight into how certain assumptions affect the design of an algorithm. We have selected six specific combinations to define several different models studied in this book.

Fail-stop. We consider the crash-stop process abstraction, where processes execute the deterministic algorithms assigned to them unless they crash, in which case they do not recover. Links are assumed to be perfect (Module 2.3). Finally, we assume the existence of a perfect failure detector (P) of Module 2.6.

As the reader will have the opportunity to observe, when comparing algorithms in this model with algorithms in other models discussed later, these assumptions substantially simplify the design of distributed algorithms.

Fail-noisy. We consider the crash-stop process abstraction together with perfect links (Module 2.3). In addition, we assume here the existence of the eventually perfect failure detector (✸P) of Module 2.8 or the eventual leader detector (Ω) of Module 2.9. This model represents an intermediate case between the fail-stop model and the fail-silent model (introduced next).

Fail-silent. We consider the crash-stop process abstraction together with perfect links (Module 2.3) only. This model does not assume any failure-detection or leader-election abstractions. That is, processes have no means to get any information about other processes having crashed.

Fail-recovery. This model uses the crash-recovery process abstraction, according to which processes may crash and later recover and still participate in the algorithm. Algorithms devised for this model have to cope with the consequences of amnesia, i.e., a process may forget what it did prior to crashing, although it may use stable storage to retain such information. Links are assumed to be stubborn (Module 2.2) and algorithms may rely on the eventual leader detector (Ω) of Module 2.9.

Fail-arbitrary. This is the most general of our distributed-system models and uses the fail-arbitrary (or Byzantine) process abstraction and the authenticated perfect links abstraction of Module 2.5. This model could also be called the fail-silent-arbitrary model.

When Byzantine process abstractions are considered together with authenticated perfect links and in combination with the Byzantine eventual leader-detector abstraction (Module 2.10), we call it the fail-noisy-arbitrary model.

Randomized. The randomized model is of a different nature than the other distributed-system models, and can be thought of as orthogonal to all of them. We use it for more general process abstractions than otherwise. Algorithms in the randomized system model are not necessarily deterministic; the processes may use a random source to choose among several steps to execute. Typically, the corresponding algorithms implement a given abstraction with some (hopefully high) probability. Randomization is sometimes the only way to solve a problem or to circumvent inherent inefficiencies of deterministic algorithms.
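The combinations listed above can be summarized as a small lookup table. The following Python sketch is only an illustration: the `SystemModel` type, its field names, and the helper function are hypothetical, and the entries simply restate the text (the fail-noisy entry names both detectors because either one may be assumed).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SystemModel:
    """Hypothetical record describing one combination of basic abstractions."""
    processes: str
    links: str
    detector: Optional[str]  # None: no failure detector or leader detector assumed


# Entries restate the model descriptions; the names are illustrative only.
MODELS = {
    "fail-stop": SystemModel("crash-stop", "perfect", "perfect (P)"),
    "fail-noisy": SystemModel(
        "crash-stop", "perfect",
        "eventually perfect or eventual leader (Omega)"),
    "fail-silent": SystemModel("crash-stop", "perfect", None),
    "fail-recovery": SystemModel(
        "crash-recovery", "stubborn", "eventual leader (Omega)"),
    "fail-arbitrary": SystemModel(
        "Byzantine", "authenticated perfect", None),
    "fail-noisy-arbitrary": SystemModel(
        "Byzantine", "authenticated perfect", "Byzantine eventual leader"),
}


def models_with_crash_stop_processes():
    """Names of the models that use the crash-stop process abstraction."""
    return sorted(name for name, model in MODELS.items()
                  if model.processes == "crash-stop")
```

Calling `models_with_crash_stop_processes()` returns the fail-stop, fail-noisy, and fail-silent models. The randomized model is left out of the table because it is orthogonal to the others.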

It is important to note that many abstractions in the book will be specified only for the three models with crash-stop processes, that is, for the fail-stop, fail-noisy, and fail-silent models. In other distributed system models, especially in the fail-recovery and the fail-arbitrary models, they must be formulated differently and represent a different abstraction, strictly speaking.

Moreover, many abstractions we study cannot be implemented in all models. For example, some abstractions that we will consider in Chap. 6 do not have fail-silent solutions or fail-arbitrary implementations, and it is not clear how to devise meaningful randomized solutions to such abstractions. For other abstractions, such solutions may exist but devising them is still an active area of research. This is, for instance, the case for randomized solutions to the shared memory abstractions we consider in Chap. 4.

2.7.2 Setup

In all system models considered here, the identities of all processes are defined before the execution begins and must be known globally. In practice, they are either configured through a manual process by an administrator or installed automatically by a membership service, which itself must be initialized.

The cryptographic abstractions also require keys to be distributed according to the identities of all processes. For instance, a MAC requires one shared symmetric key for every pair of processes; a digital signature scheme requires one public/private key pair for every process such that only the process itself knows its private key and all processes know the public keys of all others. Key distribution occurs outside the system model. In practice, a trusted agent distributes the necessary keys during system setup, typically at the same time as the identities of the processes in the system are defined.
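To make the pairwise-key requirement for MACs concrete, the following Python sketch plays the role of the trusted agent for a handful of processes. It is a minimal illustration under stated assumptions: the function names are hypothetical, and HMAC with SHA-256 from the Python standard library is used here as one possible MAC, not one prescribed by the text.

```python
import hashlib
import hmac
import secrets
from itertools import combinations


def setup_mac_keys(process_ids):
    """Hypothetical trusted setup: one random key per unordered pair."""
    return {frozenset(pair): secrets.token_bytes(32)
            for pair in combinations(process_ids, 2)}


def authenticate(keys, sender, receiver, message: bytes) -> bytes:
    """Compute a MAC on message with the key shared by sender and receiver."""
    key = keys[frozenset((sender, receiver))]
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(keys, sender, receiver, message: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare it with the tag in constant time."""
    expected = authenticate(keys, sender, receiver, message)
    return hmac.compare_digest(expected, tag)
```

With N processes this setup creates N(N − 1)/2 symmetric keys, whereas a digital signature scheme would need only N key pairs.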

2.7.3 Quorums

Quorums are a recurring tool for designing fault-tolerant algorithms for a set of N processes. A quorum is a set of processes with special properties.

A quorum in a system with N crash-fault process abstractions (according to the fail-stop, fail-noisy, fail-silent, or fail-recovery system model) is any majority of processes, i.e., any set of more than N/2 processes (equivalently, any set of ⌈(N + 1)/2⌉ or more processes). Several algorithms rely on quorums and exploit the fact that every two quorums overlap in at least one process. Note that even if f < N/2 processes fail by crashing, there is always at least one quorum of noncrashed processes in such systems.
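The two properties of majority quorums stated above can be checked by brute force for small systems. The Python sketch below is illustrative only; the function names are hypothetical. It enumerates the smallest quorums, which suffices because every larger quorum contains one of them.

```python
from itertools import combinations


def majority_quorum_size(n: int) -> int:
    """Smallest integer strictly greater than n/2, i.e., ceil((n + 1) / 2)."""
    return n // 2 + 1


def all_quorums_intersect(n: int) -> bool:
    """Check that every two minimal majority quorums share a process."""
    size = majority_quorum_size(n)
    quorums = [set(c) for c in combinations(range(n), size)]
    return all(a & b for a, b in combinations(quorums, 2))


def quorum_survives(n: int, f: int) -> bool:
    """True if the n - f processes that do not crash still form a quorum."""
    return n - f >= majority_quorum_size(n)
```

For example, with N = 5 a quorum has 3 processes, and a quorum of noncrashed processes remains after f = 2 crashes; with N = 4 and f = 2 the condition f < N/2 is violated and no quorum is guaranteed to remain.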

In a system consisting of arbitrary-fault process abstractions, two majority quorums may not intersect in a correct process. A Byzantine quorum tolerating f faults is a set of more than (N + f)/2 processes (equivalently, any set of ⌈(N + f + 1)/2⌉ or more processes). Two Byzantine quorums always overlap in at least one correct process. To see why this is the case, note that in any Byzantine quorum, there might be f Byzantine processes. Every Byzantine quorum thus contains more than

$$\frac{N + f}{2} - f = \frac{N - f}{2}$$

correct processes. Two disjoint Byzantine quorums together would have more than

$$\frac{N - f}{2} + \frac{N - f}{2} = N - f$$

correct members. But there are only N − f correct processes; hence, one correct process must occur in both Byzantine quorums.

Algorithms that rely on Byzantine quorums often need to make progress after obtaining some message from a Byzantine quorum of processes. Because up to f faulty processes may not respond, there must exist at least a Byzantine quorum of correct processes in the system, from which the desired response eventually arrives.

This condition is satisfied only when

$$N - f > \frac{N + f}{2},$$

or equivalently when N > 3f, as simple manipulation shows. Therefore, algorithms tolerating Byzantine faults usually require that only f < N/3 processes may fail.
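The manipulation is short: multiplying both sides of N − f > (N + f)/2 by 2 gives 2N − 2f > N + f, that is, N > 3f. The Python sketch below, with hypothetical function names, checks the same equivalence on integers and derives the largest number of Byzantine faults that a system of N processes can tolerate under this condition.

```python
def byzantine_quorum_size(n: int, f: int) -> int:
    """Smallest integer strictly greater than (n + f) / 2."""
    return (n + f) // 2 + 1


def correct_processes_form_quorum(n: int, f: int) -> bool:
    """True if the n - f correct processes alone contain a Byzantine quorum."""
    return n - f >= byzantine_quorum_size(n, f)


def max_tolerated_faults(n: int) -> int:
    """Largest f with n > 3f, i.e., floor((n - 1) / 3)."""
    return (n - 1) // 3
```

For instance, 4 processes tolerate 1 Byzantine fault and 7 processes tolerate 2, while 3 processes tolerate none.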

2.7.4 Measuring Performance

When we present a distributed algorithm that implements a given abstraction, we analyze its cost mainly using two metrics: (1) the number of messages required to terminate an operation of the abstraction, and (2) the number of communication steps required to terminate such an operation. For some algorithms, we also evaluate (3) the total communication size, which is the sum of the lengths of all messages sent by the algorithm, measured in bits. When evaluating the performance of algorithms in a crash-recovery model, besides the number of communication steps and the number of messages, we also consider (4) the number of accesses to stable storage (or the “logging operations”).

In general, we count the messages, communication steps, and disk accesses in specific executions of the algorithm, especially in failure-free executions. Such executions are more likely to happen in practice and are those for which the algorithms are optimized. It makes sense to plan for the worst, by providing means in the algorithms to tolerate failures, and hope for the best, by optimizing the algorithms for the case where failures do not occur. Algorithms whose performance degrades in proportion to the number of failures are sometimes called gracefully degrading algorithms.

Performance measurements are often stated in big-O notation, which provides only an upper bound on the asymptotic behavior of a function as its argument grows larger; usually the argument is N, the number of processes in the system.

More precisely, when a metric is O(g(N)) for a function g, it means that for all N bigger than some value N0, the measure is at most a constant times g(N) in absolute value. In this case, the metric is said to be “on the order of g(N).” For instance, a complexity of O(N) is also called linear in N, and a complexity of O(N²) is called quadratic in N.
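As an illustration of linear and quadratic message complexity, consider two hypothetical communication patterns that are not taken from this section: one process sending a message to every other process, and every process sending a message to every other process. The Python sketch below counts their messages and tests the big-O definition over a finite range of N only; such a test can refute a proposed bound but cannot prove an asymptotic one.

```python
def one_to_all_messages(n: int) -> int:
    """One sender, one message to each of the other n - 1 processes."""
    return n - 1


def all_to_all_messages(n: int) -> int:
    """Each of the n processes sends to each of the other n - 1 processes."""
    return n * (n - 1)


def is_bounded(metric, g, constant, n0, n_max) -> bool:
    """Check metric(n) <= constant * g(n) for all n0 <= n <= n_max."""
    return all(abs(metric(n)) <= constant * g(n)
               for n in range(n0, n_max + 1))
```

Here `is_bounded(one_to_all_messages, lambda n: n, 1, 1, 1000)` holds, consistent with O(N), and `is_bounded(all_to_all_messages, lambda n: n * n, 1, 1, 1000)` holds, consistent with O(N²). In contrast, the all-to-all count exceeds 10 · N as soon as N reaches 12, so that linear bound fails.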

Precise performance studies help select the most suitable algorithm for a given abstraction in a specific environment and conduct real-time analysis. Consider, for instance, an algorithm that implements the abstraction of perfect communication links, and hence ensures that every message sent by a correct process to a correct process is eventually delivered by the latter. Note the implications of this property on the timing guarantees: for every execution of the algorithm, and every message sent in that execution, there is a time delay within which the message is eventually delivered. The time delay is, however, defined a posteriori. In practice one would require that messages be delivered within some time delay defined a priori, for every execution and possibly every message. To determine whether a given algorithm provides this guarantee in a given environment, a careful performance study needs to be conducted on the algorithm, taking into account various aspects of the environment, such as the operating system, the scheduler, and the network. Such studies are out of the scope of this book. We present algorithms that are applicable to a wide range of distributed systems, where bounded delays cannot be enforced, and where specific infrastructure-related properties, such as real-time demands, are not strictly required.