
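National Science Council, Executive Yuan: Research Project Final Report

Concurrency Control for Intermittently Connected Databases

Final Research Report (Condensed Version)

Project type: Individual

Project number: NSC 95-2221-E-006-277-

Project period: August 1, 2006 to October 31, 2007

Host institution: Department (and Graduate Institute) of Accountancy, National Cheng Kung University

Principal investigator: 徐立群 (LihChyun Shu)

Project participants: Master's students serving as part-time assistants: 許書維, 陳韋翰, 葉光仁

Report attachments: Report on attendance at an international conference, and the paper presented

Handling: This project is open to public inquiry

January 29, 2008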

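National Science Council, Executive Yuan: Final Report for a Subsidized Research Project

Concurrency Control for Intermittently Connected Databases

Project type: ■ Individual project □ Integrated project

Project number: NSC 95-2221-E-006-277

Project period: August 1, 2006 to October 31, 2007

Principal investigator: 徐立群 (LihChyun Shu)

This report includes the following required attachments:

□ One report on an overseas business trip or study visit

□ One report on a business trip or study visit to mainland China

■ One report on attendance at an international academic conference, and one copy of the paper presented

□ One overseas research report for an international cooperative research project

Host institution: Department of Accountancy, National Cheng Kung University

January 25, 2008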

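National Science Council, Executive Yuan: Research Project Final Report

Concurrency Control for Intermittently Connected Databases

Project number: NSC 95-2221-E-006-277

Project period: August 1, 2006 to October 31, 2007

Principal investigator: 徐立群 (LihChyun Shu), Department of Accountancy, National Cheng Kung University

shulc@mail.ncku.edu.tw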

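1. Chinese Abstract

In an intermittently connected database environment, clients that are disconnected from the server each work asynchronously on a portion of the overall database. These clients usually connect to the server only when they need to download or update data. Because clients may modify shared data while disconnected, the system needs some synchronization mechanism to ensure data consistency. In this study, two important observations greatly simplify the two concurrency control protocols we design for intermittently connected databases. Although both protocols adopt an optimistic synchronization strategy and guarantee serializability, they differ in many characteristics, which makes them suitable for different applications.

Keywords: ubiquitous information access, data replication, concurrency control, serializability, reconciliation.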

Abstract

An intermittently connected database (ICDB) environment is one in which a number of clients work asynchronously while disconnected from the server, each operating on a part of the entire database. The clients connect to the server only when they need to download or update data. Because clients may update shared data while disconnected, synchronization mechanisms must be in place to ensure data consistency. In this study, we make two important observations that greatly simplify the design and analysis of two concurrency control protocols for ICDB systems. While both protocols use an optimistic strategy and guarantee serializable executions, they differ in other respects, which makes them appropriate for different applications.

Keywords: Ubiquitous information access, data replication, concurrency control, serializability, reconciliation.

2. Background and Objectives

As the price/performance ratio of mobile devices has improved steadily in recent years, it is now common for people to cache data on their mobile appliances and work on a local copy of the data while they are away from home or the office. Such a working environment has been termed an intermittently connected database (ICDB) system [1]. In an ICDB system, a number of clients work asynchronously while disconnected from the server, each operating on a part of the database. The clients connect to the server only when they need to download or update data. Many emerging applications have ICDB characteristics, e.g., sales force automation, insurance claim processing, and hospital work coordination.

When clients are allowed to modify replicated data, maintaining consistency of data copies across various locations becomes an important issue. Our approach to this problem is to model concurrent client activities as transactions. In the ICDB model we consider, every data item in the database has a primary copy that is stored on the server, but data can be replicated and updated at every mobile client, and committing local updates must be done via handshaking with the server. For this model, we observe that one can adopt the typical correctness criterion used for centralized database systems, i.e., serializability, rather than the more complicated criterion for replicated distributed databases, i.e., one-copy serializability. The main reason is that the master copy of every data item is kept on the server.

We say a client starts a committing session when it is ready to write back its local updates. We make a second important observation: although the activity a committing client performs between two consecutive committing sessions may consist of multiple tentative local transactions, it can in fact be modeled as a single transaction.
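The idea of treating all local work between two committing sessions as one transaction can be illustrated with a minimal sketch. The types `LocalTxn` and `CommittingTxn` and the function `collapse` are hypothetical names introduced here for illustration; the report does not specify data structures. The sketch assumes each local transaction records the values it read and wrote, and that a read of an item the client itself wrote earlier does not depend on the server copy.

```python
from dataclasses import dataclass, field


@dataclass
class LocalTxn:
    """One tentative transaction run on a disconnected client (hypothetical type)."""
    reads: dict   # item -> value observed by this local transaction
    writes: dict  # item -> value written by this local transaction


@dataclass
class CommittingTxn:
    """Net effect of all local transactions between two committing sessions."""
    read_set: dict = field(default_factory=dict)   # item -> value downloaded from the server
    write_set: dict = field(default_factory=dict)  # item -> final local value


def collapse(local_txns):
    """Merge a sequence of local transactions into one committing transaction."""
    merged = CommittingTxn()
    for txn in local_txns:
        for item, value in txn.reads.items():
            # A read depends on the server only if no earlier local write produced it.
            if item not in merged.write_set and item not in merged.read_set:
                merged.read_set[item] = value
        # Later writes overwrite earlier ones; only the final value is propagated.
        merged.write_set.update(txn.writes)
    return merged
```

For instance, if a first local transaction reads `x` and writes `y`, and a second reads `y` and `z` and writes `y` again, the merged transaction reads `x` and `z` from the server and writes only the final value of `y`.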

3. Results and Discussion

The above two observations greatly simplify the design of our concurrency control protocols and the reasoning about their correctness. Based on them, we propose two concurrency control protocols: (1) a match-and-go protocol (MAG) that aborts the local transactions sent from a connecting mobile client if any item in their read set has been changed to a different value by other mobile clients; (2) a graph-based scheduler (GBS) that maintains and tests the serialization graph of the history representing the execution the scheduler controls. We discuss a couple of observations that facilitate the scheduler's incremental construction of the graph. The two protocols differ in the types of histories they accept or reject, in the classes of serializability to which their generated histories belong, and in scalability, which makes them appropriate for different applications.
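The match-and-go rule described above can be sketched as a value comparison at commit time. This is a minimal illustration under stated assumptions, not the authors' implementation: the server database is modeled as a plain dictionary, and `mag_validate`, `server_db`, `read_set`, and `write_set` are hypothetical names.

```python
def mag_validate(server_db, read_set, write_set):
    """Commit write_set only if every downloaded value still matches the server.

    server_db: dict mapping item -> current primary-copy value (assumed model)
    read_set:  dict mapping item -> value the client downloaded
    write_set: dict mapping item -> value the client wants to propagate
    Returns True on commit, False on abort.
    """
    for item, downloaded in read_set.items():
        # Abort if another client has changed the item to a different value.
        if server_db.get(item) != downloaded:
            return False
    # All read values match, so the updates can be applied to the primary copies.
    server_db.update(write_set)
    return True
```

Because the check compares values rather than version numbers, an item that was overwritten with the same value does not cause an abort, which matches the "changed to a different value" wording of the rule.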

The two protocols are designed to be used as validation procedures when a client connects with its server. Our reconciliation criterion is based on global serializability, i.e., every concurrent execution of committed local transactions is equivalent to a sequential execution of the same transactions.

The histories produced by the two protocols are, however, members of different serializability classes. While GBS produces histories that are in the class of conflict serializability, the set of histories produced by MAG is shown to be in the class of view* serializability, a generalized class of view serializability. It can be shown that the set of histories that can be generated by GBS has a nonempty intersection with the set of histories that can be generated by MAG. However, these two sets are incomparable with respect to set inclusion.
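The conflict-serializability test that a graph-based scheduler relies on amounts to keeping the serialization graph acyclic. The following is a minimal sketch under stated assumptions: the class `SerializationGraph` and its methods are hypothetical, the conflict edges for a committing transaction are assumed to have been computed elsewhere, and the committing transaction is assumed to be a node not yet in the graph.

```python
class SerializationGraph:
    """Directed graph over committed transactions (illustrative sketch)."""

    def __init__(self):
        self.edges = {}  # transaction id -> set of successor transaction ids

    def has_path(self, src, dst):
        """Iterative depth-first search for a path from src to dst."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def try_add(self, txn, preds, succs):
        """Add a new node txn with edges pred -> txn and txn -> succ.

        Any new cycle must pass through txn, so one exists exactly when
        some successor can already reach some predecessor.
        Returns False (and leaves the graph unchanged) if a cycle would form.
        """
        for s in succs:
            for p in preds:
                if self.has_path(s, p):
                    return False
        self.edges.setdefault(txn, set()).update(succs)
        for p in preds:
            self.edges.setdefault(p, set()).add(txn)
        return True
```

Checking only paths from successors to predecessors keeps the test incremental: the existing graph is already acyclic, so the rest of it does not need to be re-examined.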

Another aspect that distinguishes these two protocols is that MAG requires each mobile client always to use the most recent changes made by other clients. Hence, the protocol is more suitable for applications such as stock trading, for which knowing and using the new data state is important. GBS, on the other hand, allows a mobile client to compute results based on data values that are not up to date. Hence, the protocol is more suitable for applications such as executive decision making, for which an approximate data state may be acceptable.

As for the reconciliation overhead, it is the server's responsibility to produce correct histories in the graph-based scheduler. On the other hand, the reconciliation task is distributed to each connecting mobile client in the match-and-go protocol. As a result, MAG is more scalable than GBS.

When processing a committing transaction, the basic versions of our protocols either accept or reject the entire transaction. When a client remains disconnected for an extended period of time and has executed many local transactions, or when the set of updated items of a committing transaction is large, it may be worthwhile to modify our protocols so that they accept partial local updates. For example, instead of one single write block $WB_j$ for a committing transaction $T_j$, GBS can keep track of multiple write blocks $WB_{j,i}$, $i = 1, 2, \ldots, n$, such that $WB_{j,i} \subseteq WB_{j,i+1}$ for $1 \le i < n$, and $WB_{j,n}$ is the complete set of data items the connecting client wishes to propagate back to the server. For reasons of atomicity, a disconnected client might choose to create a new write block whenever a local transaction completes. (A committing transaction consists of multiple local transactions.) GBS can then use a form of binary search to find the largest $WB_{j,i}$ whose inclusion in the serialization graph will not lead to a cycle. Note that larger write blocks may mean more conflicts among transactions, which can in turn lead to cycles in the graph.
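The binary search over nested write blocks can be sketched as follows. The sketch assumes the cycle test is monotone: because each block contains the previous one, a larger block can only add conflicts, so if one block creates a cycle every larger block does too. The function name `largest_acceptable_block` and the callback `creates_cycle` are hypothetical; the callback stands in for whatever graph test the scheduler performs.

```python
def largest_acceptable_block(write_blocks, creates_cycle):
    """Return the largest nested write block that keeps the graph acyclic.

    write_blocks:  list of sets, each a subset of the next (smallest first)
    creates_cycle: function taking a block and returning True if adding it
                   to the serialization graph would create a cycle
                   (assumed monotone in block size)
    Returns the chosen block, or None if even the smallest block is rejected.
    """
    lo, hi = 0, len(write_blocks)  # the answer is the block count in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always makes progress
        if creates_cycle(write_blocks[mid - 1]):
            hi = mid - 1           # block mid and all larger blocks are rejected
        else:
            lo = mid               # block mid and all smaller blocks are accepted
    return write_blocks[lo - 1] if lo > 0 else None
```

With n write blocks this performs about log2(n) cycle tests instead of n, which matters when a client has accumulated many local transactions while disconnected.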

As for MAG, instead of requiring all downloaded data of a committing transaction to have the same values, we can determine the subset of the read set of the committing transaction whose values remain the same at commit time. We then determine the subset of the write set of the committing transaction that depends on the unchanged read subset. It is not hard to see that we can still propagate this write subset without sacrificing serializability.
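The partial-acceptance variant of MAG can be sketched in two steps: find the read items whose values are unchanged, then keep only the writes that depend solely on those items. This is an illustrative sketch, not the authors' procedure. The names `partial_mag` and `depends_on` are hypothetical, and the sketch assumes the client can report, for each written item, the read-set items it depends on; a write with no declared dependencies is conservatively treated as depending on the whole read set.

```python
def partial_mag(server_db, read_set, write_set, depends_on):
    """Propagate the part of write_set that depends only on unchanged reads.

    server_db:  dict mapping item -> current primary-copy value (assumed model)
    read_set:   dict mapping item -> value the client downloaded
    write_set:  dict mapping item -> value the client wants to propagate
    depends_on: dict mapping written item -> set of read items it was computed from
    Returns the dict of writes that were actually applied.
    """
    # Step 1: the subset of the read set whose values are the same at commit time.
    unchanged = {item for item, value in read_set.items()
                 if server_db.get(item) == value}
    # Step 2: the subset of the write set that depends only on unchanged reads.
    accepted = {item: value for item, value in write_set.items()
                if depends_on.get(item, set(read_set)) <= unchanged}
    server_db.update(accepted)
    return accepted
```

When every read value is unchanged, the function accepts the whole write set, so it behaves like the basic all-or-nothing check in that case.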

Replica control strategies can be categorized as either pessimistic or optimistic. The two protocols we propose belong to the optimistic camp. The pessimistic approach does not allow two disconnected clients to perform conflicting operations at the same time. The optimistic approach, on the other hand, permits reads and writes everywhere, and hence must detect and resolve conflicts after they occur. A good overview of existing replication solutions in traditional distributed databases can be found in [2].

In the mobile data management domain, the optimistic approach is generally preferable to the pessimistic one, because under pessimistic schemes data needed by a client may be held by another disconnected client for an extended period of time. The optimistic approach, however, incurs abort and redo overheads when conflicts cannot be resolved. On the other hand, deadlock is a potential problem when locking is used to enforce the pessimistic strategy. An analytic study of conflict detection and resolution for the optimistic approach, and of deadlock detection and resolution for the pessimistic approach, can be found in [3].

Pitoura and Bhargava [4] proposed a replication model which takes varying connectivity conditions among communicating nodes into consideration. Nodes are divided into clusters where strongly connected sites belong to the same cluster. Copies within the same cluster are required to be consistent. Inter-cluster data inconsistency is made to be bounded. Two types of transactions are identified: weak and strict transactions. Weak transactions access local, potentially inconsistent copies and perform tentative updates. Strict transactions access consistent data and perform permanent updates. When disconnected, a client can still operate by employing weak transactions. To determine correct concurrent execution of weak and strict transactions, the proposed scheme uses intracluster and intercluster serialization graphs. The intercluster serialization graph is maintained at run time, which is analogous to our graph-based scheduler.

Commercial database products have also included support for maintaining data consistency in ICDB environments. For example, Sybase SQL Anywhere provides a reconciliation scheme in its MobiLink server synchronization technology [5], which is analogous to our match-and-go protocol. Unlike our approach, Sybase's scheme relies on the server to detect and resolve conflicts, thereby limiting its scalability. We have seen variants of our match-and-go protocol in the literature, e.g., [6]. To our knowledge, none has formally reasoned about the correctness of such protocols as we do.

4. Self-Evaluation of Project Results

In this study, we have proposed two concurrency control protocols for intermittently connected client-server databases. The protocol rules determine whether updates done by a committing client can be reflected back to the server. We make two important observations which greatly simplify the design and analysis of our protocols. In particular, we argue that one can adopt serializability, rather than one-copy serializability, as the underlying correctness criterion. In addition, the activities done by each committing client can be modelled by a single transaction, rather than one or more local transactions. The two protocols produce schedules that belong to different classes of serializability: view* serializability by the match-and-go protocol (MAG) and conflict serializability by the graph-based scheduler (GBS). When comparing their performance, we also consider a variant of MAG, called PCMAG, that takes the dependency of a transaction's write set on its read set into consideration.

Simulation results reveal that all three protocols commit more transactions as database size increases. All of them abort more transactions as client size increases. As data domain size increases, the success rate achieved by both MAG and PCMAG drops quickly, and stabilizes beyond a size of ten. GBS, however, is not sensitive to domain size changes. Generally speaking, GBS outperforms MAG and PCMAG in most scenarios. PCMAG can outperform GBS when the degree of dependency of transactions' write sets on their read sets is low.

At this time, we have published part of our results in [7]. We are working on the following issues: (1) Relaxing the constraint that only one connecting client is processed at a time; (2) Extending the match-and-go protocol in such a way that approximate values, rather than only exact values, can be accepted, while still guaranteeing some form of serializability.

5. References

[1] S. Mahajan, M. J. Donahoo, S. B. Navathe, M. H. Ammar, and S. Malik. “Grouping Techniques for Update Propagation in Intermittently Connected Databases,” in Proceedings of the International Conference on Data Engineering, 1998.

[2] B. Kemme and G. Alonso. “A new approach to developing and implementing eager database replication protocols,” ACM Trans. on Database Systems, Vol. 25, No. 3, 2000.

[3] J. Gray, P. Helland, P. O'Neil, and D. Shasha. “The dangers of replication and a solution,” in Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 1996.

[4] E. Pitoura and B. K. Bhargava. “Data Consistency in Intermittently Connected Distributed Systems,” IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 6, 1999.

[5] Sybase. “Building UltraLite Applications Using SQL Anywhere Studio,” Sybase, Inc., Student Guide, 2000.

[6] S. Chang and D. Curtis. “An Approach to Disconnected Operation in an Object-Oriented Database,” in Proceedings of the Third International Conference on Mobile Data Management, 2002.

[7] C.-M. Tsai and L. Shu. “Design and Evaluation of Protocols for Maintaining Data Consistency in Intermittently Connected Databases,” in Proceedings of IEEE EEE05 Workshop on Mobility, Agents, and Mobile Services, Hong Kong, March 2005.

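Report on Attendance at an International Academic Conference

Project number: NSC 95-2221-E-006-277

Project title: Concurrency Control for Intermittently Connected Databases

Name, institution, and title of traveler: 徐立群 (LihChyun Shu), Department of Accountancy, National Cheng Kung University

Conference dates and location: June 28 to June 29, 2007, Shanghai, China

Conference name: 9th International Conference on Visual Information Systems (VISUAL 2007)

Title of paper presented: The Influence of Perceived Quality by Adjusting Frames Per Second and Bits Per Frame under the Limited Bandwidth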

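1. Conference Participation

The International Conference on Visual Information Systems gives researchers and practitioners from the various fields of information science a venue for exchanging new ideas. Relevant fields include computer vision, databases, human-computer interaction, image processing, information visualization, and knowledge and information management. The conference had been held eight times before: in Melbourne in 1996, San Diego in 1997, Amsterdam in 1999, Lyon in 2000, Taiwan in 2002, Miami in 2003, San Francisco in 2004, and Amsterdam in 2005. The 2007 conference was held at Fudan University in Shanghai. The acceptance rate for submissions has stayed at around 50%, and an independent review process ensures the quality of the papers.

I attended mainly to present a paper reporting recent empirical research on video playback quality under limited network bandwidth, “The Influence of Perceived Quality by Adjusting Frames Per Second and Bits Per Frame under the Limited Bandwidth”, and received some insightful suggestions from peers. I also attended other authors' presentations to learn about the latest research trends.

Besides multimedia data processing, my other main research interest is data stream processing. For a year and a half I have been collaborating remotely with Professor 王洪亞 of the Department of Computer Science and Technology at Donghua University in Shanghai. Having observed that stream data have differing characteristics, we proposed a load shedding strategy whose goal is to keep an overloaded system running by reducing its load without compromising timeliness. Because VISUAL 2007 was held in Shanghai, after the conference I discussed with Professor 王洪亞 how to write up our results as a paper. After repeated deliberation we settled on the title “Loading shedding in processing data streams with stringent timing constraints”, fixed the topic of each section, and began writing the section contents.

This was the ninth VISUAL conference. It received 117 paper submissions from 15 countries and regions. Of these, 34% were given 20-minute oral presentations and a further 24% were given 15-minute oral presentations. Participating scholars came from the United States, Canada, Germany, France, Italy, Finland, Belgium, Australia, Japan, India, South Korea, Singapore, China, Hong Kong, and Taiwan, making this an international conference with quite broad participation. Accepted papers are to be included in Lecture Notes in Computer Science (LNCS). The two-day program covered Image and Video Retrieval, Visual Biometrics, Intelligent Visual Information Processing, Visual Data Mining, Low-level Visual Information Processing, Applications of Visual Information Systems, Ubiquitous and Mobile Visual Information Systems, and Semantics. My paper was scheduled in the Low-level Visual Information Processing session. After the presentation, two scholars asked questions. One asked whether this research could adjust the frame rate and the bits per frame in real time to suit the environment; the other asked whether the research proposed an algorithm to improve on this kind of problem. Neither issue is the focus of this research. Instead, we used large-scale statistical analysis to classify video features, and an empirical study to analyze how subjects perceive the different techniques when applied to video with different content characteristics.

During the coffee breaks I had more in-depth exchanges with other attendees. One was Xianping Fu from Tsinghua University in Beijing, a postdoctoral researcher specializing in image compression; we briefly discussed some image processing topics and recent developments in mainland Chinese academia. Another was Professor Guoping Qiu of 香港浸信基督教大學, who was also an organizer of this conference and was in his second year as a visiting professor in Hong Kong, having come from the Department of Computer Science at Fudan University. A third was Lamberto Ballan, a doctoral student from Florence University in Italy. This 27-year-old was visiting an Eastern country for the first time; besides this conference in Shanghai, he also attended ICME 2007, held in Beijing from July 2 to 5. These exchanges with peers helped me understand research trends and also expanded my academic network.

After VISUAL 2007, starting on June 30, I met with Professor 王洪亞 of the Department of Computer Science and Technology at Donghua University in Shanghai to discuss the timeliness-aware load shedding strategy we designed for data streams. Our motivation is that existing load shedding strategies for data streams drop data at random and may therefore jeopardize the timing requirements of applications. The method we designed for this problem takes guaranteeing timeliness as its primary consideration, and through theoretical derivation and simulation experiments we have confirmed its advantages. Meeting Professor 王洪亞 in person made it more efficient to settle how to write up the results as a paper: we decided on the paper title, fixed the topic of each section, and began writing the section contents.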

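2. Reflections on the Conference

We borrow a passage from VISUAL 2007 keynote speaker Dr. Michael Lew: “We are at the beginning of the digital age of information, a digital renaissance allowing us to communicate, share, and learn in a novel ways and resulting in the creation of new paradigms. …. Visual information retrieval is poised to give access to the myriad forms of images and video, comprising knowledge from individuals and cultures to scientific fields and artistic communities.” In the future, multimedia data will account for the majority of digital data, and people's lives will depend more and more on multimedia applications of all kinds. Yet the multimedia field, including visual information systems, still has many bottlenecks to overcome, and I must redouble my efforts to produce results in a fiercely competitive environment.

3. Recommendations

Many ethnic Chinese researchers take part in research on multimedia systems. If Taiwan is to play a leading role in this field, it will depend on promising young scholars continuing to invest in it and producing work of sufficient weight. In addition, only with greater joint investment from domestic industry, government, academia, and research institutions can Taiwan reach an internationally leading position in this important field.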
