A Study on Optimal Checkpointing in Distributed Systems


Final Report of a Research Project Funded by the National Science Council, Executive Yuan


Project type: ■ Individual project □ Integrated project

Project number: NSC 89-2213-E-011-148

Project period: August 1, 2000 to July 31, 2001

Principal investigator: 邱舉明

Co-principal investigator:

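This report includes the following required attachments:

□ One report on an overseas business trip or study visit

□ One report on a business trip or study visit to mainland China

□ One report on attendance at an international conference, with one copy of the paper presented

□ One overseas research report for an international cooperative research project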

Executing institution: Department of Electrical Engineering, National Taiwan University of Science and Technology

October 13, 2001


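A Study on Optimal Checkpointing in Distributed Systems

Project number: NSC 89-2213-E-011-148

Project period: 2000/08/01 to 2001/07/31

Principal investigator: 邱舉明, Professor, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology

Email: [email protected]

Project participants: 蕭志明, 楊政儒, 賴志強, 張嘉樺, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology

I. Abstract (translated from the Chinese):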

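In a distributed system, adding a minimum set of forced checkpoints to eliminate useless checkpoints not only avoids wasting resources but also makes fault-tolerance techniques better suited to requirements such as real-time operation. In this research, we propose the notion of the "primary non-causal interval" (PNCI) and prove that a minimum set of forced checkpoints that frees the system of useless checkpoints can be found within the PNCIs alone. We further use graph theory to build a concise system model. Through the directed graph so constructed, the problem of finding a set of critical forced checkpoints is converted into the problem of ensuring that no path through specified nodes forms a cycle (SUBSET-FET). In addition, we propose a method to reduce the complexity of the original problem. After this reduction, the optimal solution can be found efficiently in many cases.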

Keywords: checkpoints, primary non-causal intervals, optimization, rollback recovery, useless checkpoints.

Abstract:

Inserting a minimum number of forced checkpoints in a distributed system so as to eliminate useless checkpoints has been an important issue for the provision of fault tolerance capability. In this research, we first introduce the notion of primary non-causal intervals and show that these intervals are the only candidates that need be considered for

placed. Our algorithm first converts the original problem to another problem on a directed graph that may reflect the existence of useless checkpoints. The new problem can be efficiently solved using existing methods.

Although our algorithm offers a near-optimal solution in general, optimal solutions could be obtained in many cases using reduction techniques.

Keywords: Checkpoints, primary non-causal intervals, optimum, rollback-recovery, useless checkpoints.

II. Background and Objectives:

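In distributed systems, checkpointing with rollback recovery is an important fault-tolerance technique, and many researchers have proposed mechanisms of this kind [1-4,6,8,13-21,23-27,29]. These methods take checkpoints periodically or aperiodically while a process executes. That is, they save the contents of the memory currently used by the application, together with the state of the processor registers, so that the process can be restored when an error occurs.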

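In a distributed system, processes communicate with one another by passing messages. When a process fails and rolls back to its previous checkpoint, it must reach system consistency with the other processes [3,16]. Consistency rules out the situation in which another process has received a message that the failed process sent before the failure, while the failed process, after rolling back, never re-executes that send, which would corrupt the computation of the system.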
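The consistency requirement above can be illustrated with a minimal sketch. It assumes a hypothetical representation that is not taken from the report: each process's checkpoint is given as the number of its events that precede the checkpoint, and each message records the event index of its send and of its receipt. A set of checkpoints is rejected when some message is recorded as received but not as sent.

```python
def is_consistent(cut, messages):
    """Check a set of checkpoints (one per process) for orphan messages.

    cut[p]   -- number of events of process p that precede its checkpoint
                (hypothetical representation, assumed for this sketch)
    messages -- tuples (sender, send_index, receiver, recv_index) with
                0-based event indices local to each process
    """
    for sender, send_index, receiver, recv_index in messages:
        received = recv_index < cut[receiver]
        sent = send_index < cut[sender]
        if received and not sent:
            # The receiver remembers a message the sender never sent.
            return False
    return True


# One message: sent as event 2 of P0, received as event 1 of P1.
messages = [("P0", 2, "P1", 1)]
print(is_consistent({"P0": 1, "P1": 3}, messages))  # receipt saved, send lost
print(is_consistent({"P0": 3, "P1": 3}, messages))  # both saved
```

A checkpoint that fails this test against every combination of checkpoints of the other processes is what the following paragraphs call a useless checkpoint.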

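In general, each process in the system takes checkpoints on its own, according to its individual needs; these are called basic checkpoints [10]. However, because of the inconsistency caused by the inter-process messages described above, some basic checkpoints lose their usefulness: they cannot form a consistent state with any checkpoint of the other processes. Such checkpoints are called useless checkpoints [12,18]. Useless checkpoints contribute nothing to rollback recovery and are pure waste. In the worst case, they lead to the domino effect [24]. Making every checkpoint in the system useful is therefore an important issue.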

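Solving the useless-checkpoint problem requires adding extra forced checkpoints. In recent years, various methods for adding forced checkpoints have been proposed. They fall roughly into two classes: coordinated and uncoordinated. In coordinated methods [8,15,19], when one process takes a basic checkpoint, the other processes must also stop to coordinate and, where necessary, take forced checkpoints in synchrony with that process. During normal operation this imposes excessive delay on the system and requires additional coordination messages.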

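Among uncoordinated mechanisms, the simplest method forces a process to take a checkpoint whenever it sends or receives a message. The system then needs no coordination for checkpointing and still avoids useless checkpoints. However, because these forced checkpoints are tied to message transmission, the checkpointing overhead becomes high as message traffic increases. Communication-induced methods [1,2,11,12,26] address this excess of forced checkpoints by piggybacking some information on every message. A process compares the information carried by a received message with its own information to decide whether to take a forced checkpoint. Even so, as message traffic increases or certain message patterns appear, the number of forced checkpoints can still become excessive.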
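To make the piggybacking idea concrete, here is a minimal sketch of one well-known family of communication-induced schemes, the index-based protocols in the style of those cited as [1,2]. It is not the method of this report, and the class and field names are assumptions introduced for illustration. Each process keeps a checkpoint index, attaches it to every outgoing message, and takes a forced checkpoint before delivering a message that carries a larger index.

```python
class Process:
    """Toy process following an index-based checkpointing rule (sketch)."""

    def __init__(self, name):
        self.name = name
        self.index = 0                      # index of the latest checkpoint
        self.checkpoints = [("initial", 0)]

    def basic_checkpoint(self):
        # A checkpoint taken on the process's own initiative.
        self.index += 1
        self.checkpoints.append(("basic", self.index))

    def send(self):
        # Piggyback the current index on the outgoing message.
        return {"sender": self.name, "index": self.index}

    def receive(self, message):
        # Compare the piggybacked index with the local one; a larger
        # incoming index triggers a forced checkpoint before delivery.
        if message["index"] > self.index:
            self.index = message["index"]
            self.checkpoints.append(("forced", self.index))


p, q = Process("P"), Process("Q")
p.basic_checkpoint()
q.receive(p.send())      # Q is behind, so it takes a forced checkpoint
q.receive(p.send())      # same index now, no further checkpoint
print(q.checkpoints)
```

The second `receive` shows why such schemes take fewer forced checkpoints than checkpointing on every message, while still reacting to the message pattern rather than to a global optimum.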

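In addition, some researchers have proposed the notion of the core z-cycle [21] to simplify the relationship between checkpoints and message patterns and thereby ease the choice of forced checkpoints. That notion, however, is confined to the study of a single useless checkpoint and does not consider the interactions among checkpoints. It therefore cannot correctly reflect the effect of one forced checkpoint on other core z-cycles, and the optimal positions for forced checkpoints cannot be found from it.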

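Overall, most current methods still rely on heuristic checkpointing. The question of how to find the minimum number of forced checkpoints that eliminates useless checkpoints (that is, the optimization problem) has not yet been studied in depth and warrants further research. The purpose of this research is to investigate this problem and, for a known communication pattern, to propose an efficient method for finding the optimal positions at which to add forced checkpoints.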

III. Results and Discussion:

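In this research, we first propose the notion of the "primary non-causal interval" (PNCI). A non-causal interval is an interval within a process that runs from a send event to a later deliver event. The primary non-causal interval is then defined as follows:

"If a non-causal interval contains no deliver event or send event, the interval is called a primary non-causal interval (PNCI)."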
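The two definitions can be sketched in a few lines. The sketch assumes a hypothetical encoding that is not part of the report: the communication events of one process are listed in execution order as the strings "send" and "deliver", with checkpoints left out. A non-causal interval is then any send followed later by a deliver, and a PNCI is a send immediately followed by a deliver.

```python
def non_causal_intervals(events):
    """All (i, j) with a send at position i and a later deliver at j."""
    return [(i, j)
            for i, first in enumerate(events) if first == "send"
            for j in range(i + 1, len(events)) if events[j] == "deliver"]


def primary_non_causal_intervals(events):
    """Non-causal intervals with no send or deliver event in between,
    i.e. a send that is directly followed by a deliver."""
    return [(i, i + 1)
            for i in range(len(events) - 1)
            if events[i] == "send" and events[i + 1] == "deliver"]


events = ["deliver", "send", "send", "deliver", "deliver", "send", "deliver"]
print(len(non_causal_intervals(events)))       # candidate intervals overall
print(primary_non_causal_intervals(events))    # the much smaller PNCI set
```

In this example there are seven non-causal intervals but only two PNCIs, which suggests how much the candidate set shrinks when only PNCIs need to be examined.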

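From the definition above, we can prove the following: if a set F of forced checkpoints frees the system of useless checkpoints, then we can always find another set F' of forced checkpoints, all of which lie in PNCIs, such that the system also has no useless checkpoints, where |F'| ≤ |F|. It follows that, to find a minimum set of forced checkpoints whose insertion removes all useless checkpoints from the system, only the positions of the PNCIs need to be considered.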

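Next, for a distributed system whose communication pattern is known, we propose an efficient method for finding the PNCIs in which forced checkpoints are best placed. The method has three steps: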

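1. Constructing the ZC directed graph:

The ZC directed graph reflects how Z-cycles form in the original system. For the system in Figure 1, the corresponding ZC directed graph is shown in Figure 2.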


Figure 1

Figure 2

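A PNCI appears in the ZC directed graph as a bidirectional edge. Inserting a forced checkpoint into a PNCI corresponds, in the ZC graph, to adding a forced-checkpoint node and turning the bidirectional edge into a unidirectional one, which makes it possible to remove cycles from the ZC directed graph. In other words, the problem of "finding the minimum number of forced-checkpoint positions among the PNCIs" can be converted into the problem of "removing the minimum number of edges from the ZC directed graph so that no path through a specified set of nodes forms a cycle", which is the well-known SUBSET-FET problem [9].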
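The target problem can be stated compactly in code. The following is a brute-force sketch for very small graphs only. It is not the construction of the ZC graph and not the approximation method of [9]; the function names, the `removable` parameter (standing in for the edge directions that a forced checkpoint could cut), and the example graph are all assumptions made for illustration.

```python
from itertools import combinations


def has_cycle_through(edges, special):
    """True if some directed cycle passes through a node in `special`."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    for s in special:
        # A cycle through s exists iff s can reach itself.
        stack, seen = list(succ.get(s, [])), set()
        while stack:
            u = stack.pop()
            if u == s:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(succ.get(u, []))
    return False


def min_subset_feedback_edges(edges, special, removable=None):
    """Smallest set of removable edges whose deletion leaves no cycle
    through any special node (exhaustive search, exponential time)."""
    candidates = list(edges if removable is None else removable)
    for k in range(len(candidates) + 1):
        for removed in combinations(candidates, k):
            remaining = [e for e in edges if e not in removed]
            if not has_cycle_through(remaining, special):
                return set(removed)
    return None  # the removable edges cannot break every such cycle


# Hypothetical graph: the cycle c1 <-> a must be broken, a <-> b need not be.
edges = [("c1", "a"), ("a", "c1"), ("a", "b"), ("b", "a")]
print(min_subset_feedback_edges(edges, {"c1"}))
```

Cycles that avoid every specified node are left alone, which is what distinguishes the subset variant from the ordinary feedback edge set problem.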

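2. Problem reduction

In general, the ZC directed graph is quite complex. It can often be simplified, however, by appropriate means. In particular, we can use strong connectivity from graph theory to decompose the directed graph; after decomposition we obtain the three subgraphs H1, H2 and H3 shown in Figure 3.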

Figure 3
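Decomposition by strong connectivity works because every directed cycle lies entirely inside one strongly connected component, so each component can be solved on its own. The sketch below uses Kosaraju's two-pass algorithm, chosen here for brevity; the report does not say which decomposition algorithm was used, and the example graph is hypothetical rather than the one in the figures.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: returns a list of node sets.

    Recursive for brevity, so suited to small graphs only.
    """
    graph, reverse = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Pass 1: record nodes in order of depth-first completion.
    visited, order = set(), []

    def visit(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v)
        order.append(u)

    for u in nodes:
        if u not in visited:
            visit(u)

    # Pass 2: sweep the reversed graph in decreasing completion order.
    assigned, components = set(), []

    def collect(u, component):
        assigned.add(u)
        component.add(u)
        for v in reverse[u]:
            if v not in assigned:
                collect(v, component)

    for u in reversed(order):
        if u not in assigned:
            component = set()
            collect(u, component)
            components.append(component)
    return components


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "a"), ("b", "c"),
         ("c", "d"), ("d", "c"), ("d", "e")]
print(strongly_connected_components(nodes, edges))
```

Components of size one without a self-loop contain no cycle and can be dropped, leaving smaller independent subproblems.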

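In general, SUBSET-FET is an NP-hard problem [9], but an existing method [9] can obtain a near-optimal solution in linear time. Moreover, in many cases further classification shows that most of the graphs are in fact reducible flow graphs [7], for which an optimal solution to SUBSET-FET can be found. For example, H1 in Figure 3 contains only one basic checkpoint, so it can be treated as a max-flow min-cut problem and an optimal solution follows naturally. H2 and H3 are both reducible flow graphs, so Ramachandran's method [22] yields optimal solutions. H1, H2 and H3 therefore have the optimal solutions {x₁}, {x₄} and {x₇, x₉}, respectively.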
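One way to read the max-flow min-cut remark for a subgraph with a single basic checkpoint is sketched below. This reduction is an assumption of the sketch and is not spelled out in the report: the checkpoint node s is split into an outgoing copy and an incoming copy, every cycle through s becomes a path from the first copy to the second, and the minimum number of edges to cut equals the maximum flow with unit capacities. The flow is computed with breadth-first augmenting paths (Edmonds-Karp), and the example graph is hypothetical.

```python
from collections import defaultdict, deque


def min_edges_to_break_cycles_through(edges, s):
    """Minimum number of edges whose removal destroys every cycle through s."""
    src, dst = (s, "out"), (s, "in")        # the two copies of node s
    cap = defaultdict(int)                  # residual capacities
    adj = defaultdict(set)
    for u, v in edges:
        a = src if u == s else u
        b = dst if v == s else v
        cap[(a, b)] += 1                    # unit capacity per edge
        adj[a].add(b)
        adj[b].add(a)                       # allow residual traversal

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flow                     # max flow equals min cut size
        # Push one unit of flow along the path found.
        v = dst
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "s"), ("a", "b")]
print(min_edges_to_break_cycles_through(edges, "s"))
```

Here all three cycles through s share the edge from c back to s, so a single cut suffices even though s has two outgoing edges.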

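3. Minimal set

The forced checkpoints obtained in the previous step are enough to keep the basic checkpoints from becoming useless, but we must also guarantee that these forced checkpoints do not themselves become useless. This requires extracting a minimal set from the previous solution, which can be done by checking the forced checkpoints one by one and removing those that are not needed.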
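The one-by-one check described above can be sketched as a simple pruning loop. The feasibility test is left abstract because the report does not give it in code form: `is_feasible` is a hypothetical callback that returns True when a given set of forced checkpoints still leaves the system without useless checkpoints, and the toy predicate in the example is invented purely to exercise the loop.

```python
def prune_to_minimal(candidates, is_feasible):
    """Drop every forced checkpoint whose removal keeps the set feasible.

    The result is minimal in the sense that no single remaining element
    can be removed; the order of examination may affect which minimal
    set is reached.
    """
    kept = list(candidates)
    for item in list(kept):
        trial = [c for c in kept if c != item]
        if is_feasible(trial):
            kept = trial
    return kept


# Toy predicate (assumption): feasible iff both x1 and x4 are present.
needed = {"x1", "x4"}
result = prune_to_minimal(["x1", "x2", "x4", "x7"],
                          lambda chosen: needed <= set(chosen))
print(result)
```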

IV. Self-Evaluation of Project Results:

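For distributed computing systems, this research proposes the notion of the PNCI to make it easier to find the minimum number of positions at which to add forced checkpoints so that no useless checkpoints exist in the system. Furthermore, we use graph theory to develop a systematic and efficient model of communication and checkpoints, which converts the selection [...] is of considerable significance for improvement. The preliminary results were submitted to, and accepted by, the 2001 IEEE Workshop on Real-Time Embedded Systems. The people who took part in the research have also gained a deeper understanding of the checkpointing problem in distributed systems.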

V. References:

[1] R. Baldoni, F. Quaglia and P. Fornara, “An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems”, Proc. IEEE Int. Symp. on Reliable Distributed Systems, pp. 27-34, 1997.

[2] D. Briatico, A. Ciuffoletti and L. Simoncini, “A Distributed Domino-Effect Free Recovery Algorithm”, Proc. IEEE Int. Symposium on Reliability in Distributed Software and Database Systems, pp. 207-215, 1984.

[3] K.M. Chandy and L. Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems”, ACM Trans. Comput. Syst., vol. 3, no. 1, pp. 63-75, Feb. 1985.

[4] G.-M. Chiu and C.-R. Young, “Efficient rollback-recovery technique in distributed computing systems”, IEEE Trans. Parallel and Distributed Syst., vol. 7, no. 6, pp. 565-577, June 1996.

[5] J.-F. Chiu and G.-M. Chiu, “Process-replication Technique for Fault-Tolerance and Performance Improvement in Distributed Computing Systems”, IEEE Int. Symp. on High-Performance Distributed Computing, August 1994.

[6] F. Cristian and F. Jahanian, “A time-based checkpointing protocol for long-lived distributed computations”, Proc. 10th Symposium on Reliable Distributed Systems, pp. 12-20, September 1991.

[7] Narsingh Deo, Graph Theory with Applications to Engineering and Computer Science, 1974.

[8] Elmootazbellah N. Elnozahy and Willy Zwaenepoel, “The performance of consistent checkpointing”, Proc. 11th Symp. on Reliable Distributed Systems, pp. 39-47, 1992.

[9] G. Even, J. Naor, B. Schieber and M. Sudan, “Approximating minimum feedback sets and multicuts in directed graphs”, 4th IPCO, pp. 14-28, 1995.

[10] J. Fleischmann and P.A. Wilsey, “Comparative analysis of periodic state saving techniques in time warp simulators”, Proc. 9th Workshop on Parallel and Distributed Simulation (PADS '95), pp. 50-58.

[11] J.M. Helary, A. Mostefaoui and M. Raynal, “Virtual Precedence in Asynchronous Systems: Concepts and Applications”, Proc. 11th Workshop on Distributed Algorithms, LNCS press, 1997.

[12] J.M. Helary, A. Mostefaoui, R.H.B. Netzer and M. Raynal, “Preventing Useless Checkpoints in Distributed Computations”, Proc. IEEE Int. Symposium on Reliable Distributed Systems, pp. 183-190, 1997.

[13] T. Juang and S. Venkatesan, “Crash recovery with little overhead”, Proc. 11th Int. Conf. Distributed Computing Systems, pp. 454-461, May 1991.

[14] T. Juang and S. Venkatesan, “Message and Optimal Crash Recovery in Tree Networks”, Proc. Int. Conf. on Parallel and Distributed Computer System, pp. 259-266, Dec. 1992.

[15] Richard Koo and Sam Toueg, “Checkpointing and rollback-recovery for distributed systems”, IEEE Trans. Software Eng., vol. SE-13, pp. 23-31, Jan. 1987.

[16] Leslie Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System”, Commun. ACM, vol. 21, no. 7, pp. 558-565, Jul. 1978.

[17] D. Manivannan and M. Singhal, “A low-overhead recovery technique using quasi-synchronous checkpointing”, Proc. IEEE Int. Conf. Distributed Comput. Syst., pp. 100-107, 1996.

[18] R.H.B. Netzer and J. Xu, “Necessary and Sufficient Conditions for Consistent Global Snapshots”, IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 2, pp. 165-169, 1995.

[19] N. Neves and W.K. Fuchs, “Using Time to Improve the Performance of Coordinated Checkpointing”, IEEE Int. Computer Performance and Dependability Symp., September 1996.

[20] B.R. Preiss, W.M. Loucks and I.D. Macintyre, “Effects of the checkpoint interval on time and space in time warp”, ACM Trans. on Modeling and Computer Simulation, vol. 4, no. 3, pp. 223-253, July 1994.

[21] F. Quaglia, R. Baldoni and B. Ciciani, “A Checkpointing Protocol Based on a Minimal Characterization of the ‘No-Z-Cycle’ Property”, Technical Report 01-98, Department of Information Science, University Roma, Italy, 1999.

[22] V. Ramachandran, “Finding a minimum feedback arc set in reducible flow graphs”, Journal of Algorithms, pp. 299-313, 1988.

[23] B. Randell, “System Structure for Software Fault Tolerance”, IEEE Trans. on Software Engineering, vol. SE-1, no. 2, pp. 220-232, 1975.

[24] D.L. Russell, “State Restoration in Systems of Communicating Processes”, IEEE Trans. Software Engineering, vol. SE-6, no. 2, pp. 183-194, 1980.

[25] K. Venkatesh, T. Radhakrishnan and H.L. Li, “Optimal Checkpointing and Local Recording for Domino-Free Rollback-Recovery”, Information Processing Letters, vol. 25, pp. 295-303, 1987.

[26] Y.M. Wang and W.K. Fuchs, “Lazy checkpoint coordination for bounding rollback propagation”, Proc. IEEE Symp. Reliable Distributed Syst., pp. 78-85, October 1993.

[27] Y.M. Wang, “Consistent global checkpoints that contain a given set of local checkpoints”, to appear in IEEE Trans. on Computers.

[28] Yi-Min Wang and W. Kent Fuchs, “Scheduling message processing for reducing rollback propagation”, FTCS 1992, pp. 204-211.

[29] Avi Ziv and J. Bruck, “An on-line algorithm for checkpoint placement”, IEEE Trans. on Computers, vol. 46, no. 9, September 1997.