A Study of the Agreement Problem in Generalized Asynchronous Networks (I)


National Science Council, Executive Yuan: Research Project Final Report


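Project type: Individual project
Project No.: NSC91-2213-E-003-005-
Project period: August 1, 2002 to July 31, 2003
Executing institution: Department (Graduate Institute) of Industrial Technology Education, National Taiwan Normal University
Principal investigator: 蕭顯勝 (H. S. Siu)
Report type: Condensed report
Availability: This project report is open to public inquiry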

February 17, 2004


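Project No.: 91-2213-E-003-005-
Project period: 2002.08.01 to 2003.07.31
Principal investigator: 蕭顯勝 (H. S. Siu), Department of Industrial Technology Education, National Taiwan Normal University
Email: hssiu@ite.ntnu.edu.tw

I. Abstract (Chinese)

In a real distributed system, the system components (processors and communication links) can suffer several failure modes at the same time (commonly called a hybrid fault model), the network topology may not be fully connected, and a fault-free processor generally does not know which system components are faulty. In addition, processors communicate asynchronously. However, no existing protocol solves the agreement problem in such a generalized asynchronous network. This study proposes protocols that solve the agreement problem in generalized asynchronous networks. We propose the protocol APBA (Asynchronous Protocol for Byzantine Agreement) to solve the Byzantine agreement problem in a generalized network. The specific research items are:

1. Study the failure modes of the system components.
2. Study the data communication method of asynchronous networks.
3. Propose the protocol APBA for solving the Byzantine agreement problem.
4. Prove that APBA tolerates the maximum number of faulty components.
5. Propose the protocol APFDA for solving the fault diagnosis agreement problem.
6. Prove that APFDA diagnoses the maximum number of faulty components.

We prove that the proposed protocols are optimal: in a generalized asynchronous network, they solve the problems above with the minimum number of message exchanges while tolerating the maximum number of faulty components.

Keywords: generalized asynchronous network, hybrid fault model, Byzantine agreement problem

Abstract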

In a real-life distributed system, the system components (processors, communication links, or both) can be subject to different types of failures simultaneously (the so-called hybrid fault model). The network topology may not be fully connected, and a fault-free processor does not know which components in the network are faulty. Communication between processors is asynchronous. However, no existing protocol is designed to solve the agreement problem in a generalized asynchronous network.

In this project, we propose a protocol, called APBA (Asynchronous Protocol for Byzantine Agreement), for solving the agreement problem in the generalized asynchronous network. Byzantine agreement is the most basic of the agreement problems: once it can be solved, the other agreement problems can be solved as well. More specifically, this study aims to achieve the following goals:

1. To study the failure model of the system components.

2. To study the communication method in the generalized asynchronous network.

3. To propose the protocol APBA to solve the Byzantine agreement problem.

4. To prove that the protocol APBA can tolerate the maximum number of faulty components.

5. To propose the protocol APFDA to solve the fault diagnosis agreement problem.

6. To prove that the protocol APFDA can diagnose the maximum number of faulty components.

We prove that the proposed protocols are optimal in terms of the minimum number of messages exchanged and the maximum number of faulty components tolerated.

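Keywords: generalized asynchronous network, mixed failure model, Byzantine agreement problem

II. Background and Objectives

In a real distributed system, processors and communication links can suffer several failure modes at the same time (commonly called a hybrid fault model), the network topology may not be fully connected, a fault-free processor generally does not know which system component is faulty, and processors communicate asynchronously. We call this network model a Generalized Asynchronous Network (GAN).

Processor and link faults fall into two classes: arbitrary faults and dormant faults [17,18]. A processor or link with an arbitrary fault can behave in unpredictable ways, whereas one with a dormant fault sends no messages or delays their delivery. Asynchronous communication means that communication between processors has no fixed bound on when it starts or ends; by contrast, in synchronous communication the processors agree in advance to start communicating at the same time and can finish transmitting within a preset time.

In such a network environment, the agreement problem must take the following into account:

1. Different processors must cooperate to solve the agreement problem and the fault diagnosis problem.
2. Because the network is not fully connected, a correct message may be corrupted when it passes through a faulty processor or link. For example, a message m sent from processor P to processor Q may pass through a processor R with an arbitrary fault and be changed into a message m'.
3. Processors and links are both subject to different failure modes, so the effects of the faulty behavior of both must be considered in order to avoid misjudging a fault-free processor or link as faulty.
4. A faulty component may actively spread incorrect information to distort the diagnosis result (for example, an arbitrary fault), or may fail to send information at all because of its own failure mode (for example, a dormant fault).
5. Communication is asynchronous, with no fixed time bound; for example, deciding whether a transmission has failed or is merely delayed is a crucial task in an asynchronous network.

A protocol for the agreement problem must let every fault-free processor obtain a common value that satisfies the following conditions:

(1) Agreement: all fault-free processors obtain the same common value v.
(2) Validity: if the source processor S is fault-free, then the common value v must be the initial value vs of S; that is, v = vs.

To date, no solution exists for the fault diagnosis agreement problem in the generalized case (processors and links subject to hybrid fault modes, and a network topology that may not be fully connected). In this project, we consider the agreement problem on a GAN and propose a solution to it.

III. Results and Discussion

In this project, an asynchronous distributed system is a set of autonomous, independent processors connected by an irregular set of communication links. Processor and link faults are divided into two classes: arbitrary faults and dormant faults. Communication is asynchronous; that is, communication between processors has no fixed bound on when it starts or ends. The system parameters are as follows:

(1) N: the set of all processors; each processor has a unique name, and the number of processors in the system is n (|N| = n).
(2) c: the connectivity of the network. By Menger's theorem [17], if the connectivity of the network is c, then any pair of processors is joined by c disjoint paths; that is, these c paths share only their start and end points.
(3) Pa: the number of processors with arbitrary faults.
(4) Pd: the number of processors with dormant faults.
(5) La: the number of communication links with arbitrary faults.
(6) Ld: the number of communication links with dormant faults.

In this project we seek a protocol that solves the agreement problem in the distributed system described above. First, we must determine the maximum fault tolerance of the system. We propose a Byzantine agreement protocol for generalized asynchronous networks, called APBA (Asynchronous Protocol for Byzantine Agreement). We will prove that APBA is optimal: it tolerates the maximum number of faulty components using the minimum amount of communication. Solving the agreement problem in an asynchronous network cannot be done with conventional send/receive primitives alone; a randomized algorithm is required. Such an algorithm is probabilistic and works roughly as follows.

Each fault-free processor P executes a sequence of stages, each consisting of two rounds. Initially, P receives the initial value v (v = 0 or 1) from the source processor S and sets x to v; it then executes stages until it finds the agreement value. Each stage s consists of the following:

Round 1: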

    P broadcasts (“1”, s, v), where v is the current value of x
    P receives the messages of the form (“1”, s, *)
    if the majority of the (“1”, s, *) messages have the same value v
        then y = v
        else y = null

Round 2:

    P broadcasts (“2”, s, v), where v is the current value of y
    P receives the messages of the form (“2”, s, *)
    if the majority of the (“2”, s, *) messages have the same value v and v <> null
        then x = v, perform decide(v), and exit
    else if some messages carry a value v <> null, but too few to form a majority
        then x = v
    else x = random(0, 1)   /* with equal probability */
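The two-round stage above can be sketched in Python. This is only a minimal illustration under stated assumptions, not APBA itself: every processor is fault-free, "majority" is read as "more than n/2 of all n processors", and asynchrony is modelled by letting each processor hear only n - t of the n broadcasts per round. The names `tally`, `run_protocol`, `t`, `seed`, and `max_stages` are hypothetical and do not come from the report.

```python
import random
from collections import Counter


def tally(received):
    """Return (value, count) for the most common non-null value, or (None, 0)."""
    counts = Counter(v for v in received if v is not None)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def run_protocol(initial, t, seed=0, max_stages=1000):
    """Run stages until every processor has decided; return (decisions, stages)."""
    rng = random.Random(seed)
    n = len(initial)
    x = list(initial)          # current value x of each processor
    decided = [None] * n       # decision of each processor, once made

    def heard(values):
        # Assumption: only n - t of the n broadcasts arrive in time.
        return [values[q] for q in rng.sample(range(n), n - t)]

    for stage in range(1, max_stages + 1):
        # Round 1: broadcast x; y is the majority value, or null (None).
        y = []
        for _ in range(n):
            v, count = tally(heard(x))
            y.append(v if count > n / 2 else None)

        # Round 2: broadcast y; decide, adopt a value, or flip a fair coin.
        new_x = []
        for p in range(n):
            v, count = tally(heard(y))
            if decided[p] is not None:
                new_x.append(decided[p])         # keep announcing the decision
            elif count > n / 2:
                decided[p] = v                   # decide(v)
                new_x.append(v)
            elif count > 0:
                new_x.append(v)                  # some support for v: adopt it
            else:
                new_x.append(rng.randint(0, 1))  # equal probability
        x = new_x
        if all(d is not None for d in decided):
            return decided, stage
    raise RuntimeError("no decision within max_stages")


if __name__ == "__main__":
    decisions, stages = run_protocol([0, 1, 0, 1, 1], t=1, seed=42)
    print(decisions, stages)
```

In this simplified setting all processors end with the same decision, and if they all start from the same value they decide that value in the first stage, which mirrors the Agreement and Validity conditions. Faulty processors and links, and the channel techniques APBA uses to mask them, are deliberately left out.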

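After an in-depth analysis of the various failure modes and their remedies, we propose a protocol named APBA to solve the Byzantine agreement problem. APBA combines a randomized model with secret-channel and virtual-channel techniques. It consists of a sequence of message-exchange stages, and data transmission uses the secret-channel and virtual-channel techniques to remove errors introduced by network transmission. Figure 1 shows the concept of APBA, in which █ marks data influenced by both the sender and intermediate system components; □ marks data influenced by intermediate system components; █ marks data influenced by the sender; and □ marks data unaffected by faulty components (that is, the common agreement value).

Figure 1. Conceptual diagram of APBA

IV. References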

[1] J. C. Adams and K. V. S. Ramarao, “Distributed diagnosis of Byzantine processors and links,” in Proc. Symp. on Distributed Computing Systems, 1989, pp. 562-569.

[2] R. W. Buskens and R. P. Bianchini, “Distributed on-line diagnosis in the presence of arbitrary faults,” in Proc. Symp. on Fault-Tolerant Computing, 1993, pp. 470-479.

[3] T. Chandra and S. Toueg, “Unreliable failure detectors for asynchronous systems,” in Proc. of the 10th ACM Symp. on Principles of Distributed Computing, 1991, pp. 325-340.

[4] N. Deo, Graph Theory with Applications to Engineering and Computer Science, Englewood Cliffs, NJ: Prentice-Hall, 1974.

[5] J. Martin, Telecommunications and the Computer, 3rd ed., Englewood Cliffs, NJ: Prentice-Hall, 1990.

[6] S. Mallela and G. M. Masson, “Diagnosable systems for intermittent faults,” IEEE Trans. on Computers, vol. 27, no. 6, pp. 560-566, 1978.

[7] S. Mallela and G. M. Masson, “Diagnosis without repair for hybrid fault situations,” IEEE Trans. on Computers, vol. 29, no. 6, pp. 461-470, 1980.

[8] A. Pelc, “Reliable communication in networks with Byzantine link failures,” Networks, vol. 22, no. 5, pp. 441-459, Aug. 1992.

[9] F. Preparata, G. Metze, and R. Chien, “On the connection assignment problem of diagnosable systems,” IEEE Trans. on Electronic Computers, vol. 16, no. 6, pp. 848-854, 1967.

[10] K. V. S. Ramarao and J. C. Adams, “On the diagnosis of Byzantine faults,” in Proc. Symp. on Reliable Distributed Systems, 1988, pp. 144-153.

[11] K. Shin and P. Ramanathan, “Diagnosis of processors with Byzantine faults in a distributed computing system,” in Proc. Symp. on Fault-Tolerant Computing, 1987, pp. 55-60.

[12] H. S. Siu, Y. H. Chin, and W. P. Yang, “Byzantine agreement in the presence of mixed faults on processors and links,” IEEE Trans. on Parallel and Distributed Systems, vol. 9, no. 4, pp. 335-345, 1998.

[13] H. S. Siu, Y. H. Chin, and W. P. Yang, “A note on consensus on dual failure modes,” IEEE Trans. on Parallel and Distributed Systems, vol. 7, no. 3, pp. 225-230, March 1996.

[14] H. S. Siu (also known as H. S. Hsiao), Y. H. Chin, and W. P. Yang, “Reaching fault diagnosis agreement under a hybrid fault model,” IEEE Trans. on Computers, vol. 49, no. 9, pp. 980-986, Sept. 2000.

[15] M. Stahl, R. Buskens, and R. Bianchini, “On-line diagnosis in general topology networks,” in Proc. of 1992 IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, 1992, pp. 114-121.

[16] N. H. Vaidya and D. K. Pradhan, “Safe system level diagnosis,” IEEE Trans. on Computers, vol. 43, no. 3, pp. 367-370, 1994.

[17] S. C. Wang, Y. H. Chin, and K. Q. Yan, “Reaching a fault detection agreement,” in Proc. Int. Conf. on Parallel Processing, 1990, pp. 251-258.

[18] C. L. Yang and G. M. Masson, “A distributed algorithm for fault diagnosis in systems with soft failures,” IEEE Trans. on Computers, vol. 37, no. 11, pp. 1476-1480, 1988.
