Just like the act of smiling, the act of abstracting is restricted to very few natural species. By capturing properties that are common to a large and significant range of systems, abstractions help distinguish the fundamental from the accessory, and prevent system designers and engineers from reinventing, over and over, the same solutions for slight variants of the very same problems.
From the Basics . . . Reasoning about distributed systems should start by abstract-ing the underlyabstract-ing physical system: describabstract-ing the relevant elements in an abstract way, identifying their intrinsic properties, and characterizing their interactions, lead us to define what is called a system model. In this book we will use mainly two abstractions to represent the underlying physical system: processes and links.
The processes of a distributed program abstract the active entities that perform computations. A process may represent a computer, a processor within a computer, or simply a specific thread of execution within a processor. In the context of network security, a process may also represent a trust domain, a principal, or one administra-tive unit. To cooperate on some common task, the processes may typically need to exchange messages using some communication network. Links abstract the physical and logical network that supports communication among processes. It is possible to represent multiple realizations of a distributed system by capturing different prop-erties of processes and links, for instance, by describing how these elements may operate or fail under different environmental conditions.
Chapter 2 will provide a deeper discussion of the various distributed-system models that are used in this book.
. . . to the Advanced. Given a system model, the next step is to understand how to build abstractions that capture recurring interaction patterns in distributed applica-tions. In this book we are interested in abstractions that capture robust cooperation problems among groups of processes, as these are important and rather challeng-ing. The cooperation among processes can sometimes be modeled as a distributed agreement problem. For instance, the processes may need to agree on whether a certain event did (or did not) take place, to agree on a common sequence of actions to be performed (from a number of initial alternatives), or to agree on the order by which a set of inputs need to be processed. It is desirable to establish more sophis-ticated forms of agreement from solutions to simpler agreement problems, in an incremental manner. Consider, for instance, the following situations:
• In order for processes to be able to exchange information, they must initially agree on who they are (say, using IP addresses on the Internet) and on some com-mon format for representing messages. They may also need to agree on some way of exchanging messages (say, to use a reliable data stream for communication, like TCP over the Internet).
• After exchanging some messages, the processes may be faced with several alter-native plans of action. They may need to reach a consensus on a common plan, out of several alternatives, and each participating process may have initially its own plan, different from the plans of the other processes.
• In some cases, it may be acceptable for the cooperating processes to take a given step only if all other processes also agree that such a step should take place. If this condition is not met, all processes must agree that the step should not take place.
This form of agreement is crucial in the processing of distributed transactions, where this problem is known as the atomic commitment problem.
• Processes may not only need to agree on which actions they should execute but also need to agree on the order in which these actions should be executed. This form of agreement is the basis of one of the most fundamental techniques to replicate computation in order to achieve fault tolerance, and it is called the total-order broadcastproblem.
This book is about mastering the difficulty that underlies these problems, and devising abstractions that encapsulate such problems. The problems are hard because they require coordination among the processes; given that processes may fail or may even behave maliciously, such abstractions are powerful and sometimes not straightforward to build. In the following, we motivate the relevance of some of the abstractions covered in this book. We distinguish the case where the abstrac-tions emerge from the natural distribution of the application on the one hand, and the case where these abstractions come out as artifacts of an engineering choice for distribution on the other hand.
1.2.1 Inherent Distribution
Applications that require sharing or dissemination of information among several participant processes are a fertile ground for the emergence of problems that required distributed programming abstractions. Examples of such applications are information dissemination engines, multiuser cooperative systems, distributed shared spaces, process control systems, cooperative editors, distributed databases, and distributed storage systems.
Information Dissemination. In distributed applications with information dissem-ination requirements, processes may play one of the following roles: information producers, also called publishers, or information consumers, also called subscribers.
The resulting interaction paradigm is often called publish–subscribe.
Publishers produce information in the form of notifications. Subscribers register their interest in receiving certain notifications. Different variants of the publish–
subscribe paradigm exist to match the information being produced with the subscribers’ interests, including channel-based, subject-based, content-based, or type-based subscriptions. Independently of the subscription method, it is very likely that several subscribers are interested in the same notifications, which the system should broadcast to them. In this case, we are typically interested in having all sub-scribers of the same information receive the same set of messages. Otherwise the system will provide an unfair service, as some subscribers could have access to a lot more information than other subscribers.
Unless this reliability property is given for free by the underlying infrastructure (and this is usually not the case), the sender and the subscribers must coordinate to
agree on which messages should be delivered. For instance, with the dissemination of an audio stream, processes are typically interested in receiving most of the infor-mation but are able to tolerate a bounded amount of message loss, especially if this allows the system to achieve a better throughput. The corresponding abstraction is typically called a best-effort broadcast.
The dissemination of some stock exchange information may require a more reliable form of broadcast, called reliable broadcast, as we would like all active processes to receive the same information. One might even require from a stock exchange infrastructure that information be disseminated in an ordered manner. In several publish–subscribe applications, producers and consumers interact indirectly, with the support of a group of intermediate cooperative brokers. In such cases, agreement abstractions may be useful for the cooperation among the brokers.
Process Control. Process control applications are those where several software processes have to control the execution of a physical activity. Basically, the pro-cesses might be controlling the dynamic location of an aircraft or a train. They might also be controlling the temperature of a nuclear installation or the automation of a car production plant.
Typically, every process is connected to some sensor. The processes might, for instance, need to exchange the values output by their assigned sensors and output some common value, say, print a single location of the aircraft on the pilot control screen, despite the fact that, due to the inaccuracy or failure of their local sensors, they may have observed slightly different input values. This cooperation should be achieved despite some sensors (or associated control processes) having crashed or not observed anything. This type of cooperation can be simplified if all processes agree on the same set of inputs for the control algorithm, a requirement captured by the consensus abstraction.
Cooperative Work. Users located on different nodes of a network may cooperate in building a common software or document, or simply in setting up a distributed dialogue, say, for an online chat or a virtual conference. A shared working space abstraction is very useful here to enable effective cooperation. Such a distributed shared memory abstraction is typically accessed through read and write operations by the users to store and exchange information. In its simplest form, a shared work-ing space can be viewed as one virtual unstructured storage object. In more complex incarnations, shared working spaces may add a structure to create separate loca-tions for its users to write, and range all the way from Wikis to complex multiuser distributed file systems. To maintain a consistent view of the shared space, the pro-cesses need to agree on the relative order among write and read operations on the space.
Distributed Databases. Databases constitute another class of applications where agreement abstractions can be helpful to ensure that all transaction managers obtain a consistent view of the running transactions and can make consistent decisions on how these transactions are serialized.
Additionally, such abstractions can be used to coordinate the transaction man-agers when deciding about the outcome of the transactions. That is, the database
servers, on which a given distributed transaction has executed, need to coordi-nate their activities and decide whether to commit or abort the transaction. They might decide to abort the transaction if any database server detected a violation of the database integrity, a concurrency control inconsistency, a disk error, or simply the crash of some other database server. As we pointed out, the distributed pro-gramming abstraction of atomic commit (or commitment) provides such distributed cooperation.
Distributed Storage. A large-capacity storage system distributes data over many storage nodes, each one providing a small portion of the overall storage space.
Accessing stored data usually involves contacting multiple nodes because even a single data item may be spread over multiple nodes. A data item may undergo complex transformations with error-detection codes or error-correction codes that access multiple nodes, to protect the storage system against the loss or corruption of some nodes. Such systems distribute data not only because of the limited capacity of each node but also for increasing the fault-tolerance of the overall system and for reducing the load on every individual node.
Conceptually, the storage system provides a shared memory abstraction that is accessed through read and write operations, like the shared working space men-tioned before. But since it uses distribution also for the purpose of enhancing the overall resilience, it combines aspects of inherently distributed systems with aspects of artificially distributed systems, which are discussed next.
1.2.2 Distribution as an Artifact
Often applications that are not inherently distributed also use sophisticated abstrac-tions from distributed programming. This need sometimes appears as an artifact of the engineering solution to satisfy some specific requirements such as fault tolerance, load balancing, or fast sharing.
We illustrate this idea through state-machine replication, which is a powerful way to achieve fault tolerance in distributed systems. Briefly, replication consists in making a centralized service highly available by executing several copies of it on different machines that are assumed to fail independently. This ensures the con-tinuity of the service despite the failure of a subset of the machines. No specific hardware is needed: fault tolerance through replication is software-based. In fact, replication may also be used within an information system to improve the read acc-ess performance to data by placing it close to the procacc-esses where it is likely to be queried. For a service that is exposed to attacks over the Internet, for example, the same approach also tolerates malicious intrusions that subvert a limited number of the replicated nodes providing the service.
For replication to be effective, the different copies must be maintained in a con-sistent state. If the states of the replicas may diverge arbitrarily, it does not make sense to talk about replication. The illusion of one highly available service would fall apart and be replaced by that of several distributed services, each possibly failing independently. If replicas are deterministic, one of the simplest ways to guarantee full consistency is to ensure that all replicas receive the same set of requests in the
same order. Typically, such guarantees are enforced by an abstraction called total-order broadcast: the processes need to agree here on the sequence of messages they deliver. Algorithms that implement such a primitive are nontrivial, and providing the programmer with an abstraction that encapsulates these algorithms makes the design of a replicated service easier. If the replicas are nondeterministic then ensur-ing their consistency requires different orderensur-ing abstractions, as we will see later in this book. The challenge in realizing these abstractions lies in tolerating the faults that may affect the replicas, which may range from a simple process crash to being under the control of a malicious adversary.