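Chung Hua University

Master's Thesis, Chung Hua University

File Searching Techniques in Unstructured P2P Networks

On Improving File Searching in Unstructured Peer-to-Peer Systems

Department: Master's Program, Department of Computer Science and Information Engineering
Student ID and name: M09502022, Tung-Lu Yu (余東陸)
Advisor: Dr. Ching-Hsien Hsu (許慶賢)

August 2008 (ROC year 97)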


On Improving File Searching in Unstructured Peer-to-Peer Systems

By

Tung-Lu Yu

Advisor: Prof. Ching-Hsien Hsu

Department of Computer Science and Information Engineering Chung-Hua University

Hsin-Chu, 30012, Taiwan

July 2008

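Chinese Abstract

In recent years, P2P networks have become a new distributed computing model in which all clients share resources, including bandwidth, storage space and computing power. P2P architectures can be divided into centralized and decentralized architectures.

Decentralized architectures can be further divided into structured and unstructured models. In decentralized unstructured P2P networks, the most basic file search method is flooding. Because flooding generates a large number of useless, redundant query messages, network capacity cannot be fully utilized. In this thesis, we therefore propose a fully distributed improvement technique, Redundant Link Minimization (RLM), to reduce redundant query messages in P2P networks. RLM groups peers into clusters using their neighbor sets, identifies redundant logical links from the clustered topology, and constructs an optimized topology. In addition, RLM guarantees the coverage of file searches in the network.

Experimental results show that, in the best case, RLM generates only 11% of the query messages generated by flooding. The higher the link density of the network, the greater the improvement RLM achieves.

Keywords: distributed computing, unstructured network, P2P, Gnutella, flooding, RLM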

Abstract

In recent years, P2P has become a popular distributed computing model in which all clients provide resources, including bandwidth, storage space and computing power. P2P network topologies are divided into centralized and distributed ones, and distributed P2P networks are further divided into structured and unstructured ones. Flooding is the basic search method in distributed unstructured P2P networks. However, the blind flooding search mechanism causes a large volume of unnecessary traffic, which greatly limits the performance of P2P systems. In this thesis, we therefore propose a completely distributed technique, Redundant Link Minimization (RLM), to reduce unnecessary queries in P2P networks. RLM uses neighbor tables to cluster the network, determines the unnecessary links after clustering, and then builds an optimized network topology. According to the experiments, in the best case RLM generates only 11% of the queries generated by flooding. The more locality the network topology has, the more unnecessary queries RLM can eliminate.

Keywords: Distributed computing, unstructured network, P2P, Gnutella, Flooding, RLM

Acknowledgements

First of all, I would like to thank my research advisor, Prof. Ching-Hsien Hsu, for being a consistent source of support and encouragement.

Prof. Ching-Hsien Hsu is a conscientious and careful scholar. He gave me many suggestions, not only on the thesis but also on my life as a graduate student. I feel fortunate to be one of Prof. Hsu's graduate students. I would also like to thank the members of the P.D. Lab, who always supported me while I worked on this thesis.

Finally, I would like to thank my family and colleagues for their great support. Without the strength and encouragement they gave me over these two years, I could not have completed this thesis with a carefree mind.

List of Figures

Figure 1: Gnutella's Flooding
Figure 2: Redundant communications
Figure 3: Clustering in unstructured P2P network
Figure 4: Intra-cluster links elimination
Figure 5: Inter-cluster links elimination
Figure 6: Query amount
Figure 7: Proof of RLM (a) Optimization Intra-Cluster (b) Optimization Inter-Cluster
Figure 8: Paradigm of peer join
Figure 9: Paradigm of peer leaves (a) M-node (b) H-node (c) R-node
Figure 10: Files index cache (a) H-node without files index cache (b) H-node with files index cache
Figure 11: Example of SDC (a) query-heavy peer (b) response-heavy peer
Figure 12: Network structure with different S% (a) random network (S=100) (b) power-law network (S=10)
Figure 13: Performance evaluation of RLM in different peer degree (a) Query% (b) Response time%
Figure 14: Performance evaluation of RLM on different number of peers (a) Query% (b) Response time% (c) Average clustering coefficient
Figure 15: Performance evaluation of RLM and ACE on different S% (a) Query% (b) Response time% (c) Average clustering coefficient
Figure 16: Performance evaluation of RLM and ACE
Figure 17: Performance of RLM in dynamic environment (a) Query number (b) Response time%

List of Tables

Table 1: Notations and Terminologies
Table 2: The Probing Table (PT) of H-node and R-node (a) PT of P3 (b) PT of P11 (c) PT of P14
Table 3: Definitions of notations in analysis


CHAPTER 1 Introduction

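P2P is a shared network in which participants share part of the hardware resources they own (processing power, storage capacity, network connectivity, printers, and so on). These shared resources provide services and content through the network and can be accessed directly by other peers without passing through an intermediary. Each participant is both a provider (server) and a consumer (client) of resources (services and content). By application, P2P systems fall into the following types: file and content sharing, e.g., Napster, Gnutella, eDonkey, eMule and BitTorrent; harnessing peers' computing and storage capacity, e.g., SETI@home, Avaki and Popular Power; collaborative processing and service-sharing platforms based on the P2P model, e.g., JXTA, Magi, Groove and .NET My Service; instant messaging, e.g., ICQ, OICQ and Yahoo Messenger; and secure P2P communication and information sharing, e.g., Skype, Crowds and Onion Routing.

P2P networks have the following characteristics: decentralization, scalability, robustness (reliability), high performance, privacy protection and load balancing. Topology refers to the physical or logical interconnection among the computing units of a distributed system, and the topology among peers has always been an important basis for classifying systems. By topology, P2P systems can be divided into: centralized topology, e.g., Napster; decentralized unstructured topology, e.g., Gnutella [1, 2, 3]; decentralized structured topology, e.g., Tapestry, Chord [5], CAN and Pastry; and partially decentralized topology, e.g., KaZaA [4]. Decentralized unstructured networks organize the overlay network as a random graph, their peer degrees follow a power law, and they support complex queries. The most typical system of this kind, Gnutella, is a P2P file-sharing system. Its main difference from Napster is that Gnutella is a purely decentralized unstructured P2P system without an index server; it searches for files by flooding and random walks (Random Walker) over a completely random graph. The search depth is controlled by decrementing the TTL (Time To Live) of each message. Figure 1 illustrates the propagation path of a Gnutella query.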


Figure 1: Gnutella’s Flooding

1.1 Motivation

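Because an unstructured network organizes its overlay as a completely random graph, links between peers are not built according to any predefined topology. Such systems generally provide no performance guarantees, but they tolerate faults well, support complex queries, and are not greatly affected when peers join and leave frequently.

However, unstructured P2P systems have clear weaknesses in scalability, availability and persistence, especially when searching for rare resources. The simple, loose way the overlay is built is the main cause of these weaknesses. Searches in such systems generally rely on flooding or improved variants of it, so resource discovery in unstructured P2P systems often incurs heavy redundant communication, which consumes network bandwidth and prevents network capacity from being fully used.

Optimizing the overlay network can therefore alleviate these problems substantially. A sound topology optimization strategy supports search protocols more effectively and reduces redundant communication load, enabling unstructured P2P systems to be used more efficiently and more widely.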

1.2 Objectives

In this thesis, we propose a fully distributed improvement technique, Redundant Link Minimization (RLM), to reduce redundant query messages in P2P networks. RLM groups peers into clusters using their neighbor sets, identifies redundant logical links from the clustered topology, and then constructs an optimized topology.

We also propose combining RLM with a files index cache, which further reduces both the number of query messages and the response time.

1.3 Thesis organization

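The rest of this thesis is organized as follows. Chapter 2 introduces the problem and background. Chapter 3 presents the research architecture and the parameters used in the algorithms. Chapter 4 describes the algorithm and analyzes its cost. Chapter 5 presents the algorithms for dynamic environments. Chapter 6 proposes methods for improving RLM. Chapter 7 presents performance evaluations in static and dynamic environments. Finally, Chapter 8 concludes the thesis and outlines future research directions.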


CHAPTER 2 Background

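In distributed P2P networks, topologies are divided into structured and unstructured networks. In a distributed unstructured network, a peer usually knows only about its neighbors, peer degrees follow a power law, and complex queries such as keyword and fuzzy queries are supported.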

2.1 Resources Discovery Problem in Unstructured P2P Network

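In distributed unstructured P2P networks, data search generally relies on flooding. Its biggest drawbacks are that it wastes bandwidth by generating large numbers of unnecessary message copies and transmissions, and that the topology mismatch problem between the overlay and the underlying network [12] makes query response times long.

In Figure 2, node 1 sends a query to nodes 2 and 3. Nodes 2 and 3 do not know that they have received the same message, so each forwards it to the other, causing unnecessary message replication and transmission. Besides exchanging the same query with each other, nodes 2 and 3 also forward it to node 4. Only the copy from node 2, which reaches node 4 first, is useful; the remaining copies are redundant. Hence the queries 2→3, 3→2, 3→4 and 4→3 are redundant.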


Figure 2: Redundant communications
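The message count in Figure 2 can be reproduced with a small discrete-event simulation. This is a minimal sketch, assuming unit link latency, unlimited TTL (as in Section 3.1) and that simultaneous arrivals are resolved in send order; `flood` and `FIG2` are illustrative names, not part of the thesis.

```python
import heapq


def flood(adj, source):
    """Simulate Gnutella-style flooding with unit link latency and unlimited TTL.

    adj maps each peer ID to the set of its neighbours.
    Returns (total, duplicates, first_sender): the number of query messages sent,
    how many reached a peer that had already seen the query, and, for each peer,
    the neighbour whose copy arrived first (None for the source).
    """
    seen = {source}
    first_sender = {source: None}
    total = duplicates = 0
    events = []  # heap of (arrival_time, sequence_no, sender, receiver)
    seq = 0
    for nbr in sorted(adj[source]):
        heapq.heappush(events, (1, seq, source, nbr))
        seq += 1
        total += 1
    while events:
        t, _, sender, node = heapq.heappop(events)
        if node in seen:
            duplicates += 1  # redundant copy: the peer already has this query
            continue
        seen.add(node)
        first_sender[node] = sender
        for nbr in sorted(adj[node]):
            if nbr != sender:  # never echo the query back to the peer it came from
                heapq.heappush(events, (t + 1, seq, node, nbr))
                seq += 1
                total += 1
    return total, duplicates, first_sender


# Topology of Figure 2: triangle 1-2-3 plus node 4 attached to both 2 and 3.
FIG2 = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
```

On this topology, `flood(FIG2, 1)` sends 7 messages, 4 of which are duplicates (2→3, 3→2, 3→4 and 4→3), and records node 2 as the first sender to node 4. The total also equals N(d̄ − 1) + 1 = 4 × (2.5 − 1) + 1, the flooding estimate used in Section 4.4.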


2.2 Optimization Techniques in Unstructured P2P Network

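Researchers have proposed many optimization methods to address the shortcomings of unstructured P2P overlay networks. They fall mainly into four categories.

(1) Optimization of topological properties.

Given the characteristics of unstructured overlays, one feasible approach is to maintain a special topological structure in the overlay, strengthening its topological properties to make target discovery more efficient, as in [16, 20, 21, 22, 23, 24].

(2) Overlay optimization based on underlying network information.

Because peers connect arbitrarily, join and leave at random, and exchange information without regard to the underlying physical network, P2P systems generally suffer from topology mismatch [12, 15, 18, 19]. Given the importance of this problem, optimizing the overlay according to underlying network information to mitigate topology mismatch has become an important approach. [12] proposed Adaptive Connection Establishment (ACE) to reduce the effect of topology mismatch; without affecting search efficiency or scope, ACE reduced the communication overhead of Gnutella by 65%. The SBO method in [15] also optimizes the overlay by measuring communication delays between peers to solve topology mismatch, but unlike [12] it uses a peer-coloring mechanism to amortize the cost of overlay optimization. Methods of this kind are built on communication cost tables and require complex operations to maintain a specific topological structure in the overlay. In practice, the overlay optimization therefore converges slowly, which also limits the use of these methods in dynamic P2P environments.

(3) Overlay optimization based on differentiated peer roles.

Peers in a P2P system are heterogeneous, typically in processing power, network bandwidth and storage space [3], yet the overlay construction strategies of existing unstructured P2P systems often ignore this. Mismatches between a peer's role in the overlay and its own capabilities create multiple performance bottlenecks that hinder the operation of the whole network. KaZaA [4] was the first system to exploit peer heterogeneity: it divides peers into super peers and ordinary peers. Although KaZaA takes peer differences into account, its resource search still relies heavily on flooding. Other work [8, 9, 11, 17] uses super peers to maintain load balance and strengthen network robustness, and [13, 14] use super peers to store the keyword information of nearby peers to increase the search success rate, similar to an index or a location cache.

(4) Overlay optimization based on requested content.

Requests from P2P users are often highly correlated. In a P2P environment, resources are generally not distributed randomly across the system but are concentrated: most requests are served by a small number of content-rich peers. Likewise, the content each peer is interested in also tends to be concentrated. For these reasons, the overlay can be optimized using the content requested in the system to better support target search. Exploiting the concentration of peer interests, [10] proposed interest-based shortcuts to improve Gnutella: on top of Gnutella's overlay, special shortcuts connect peers with similar interests, and because of this interest locality the shortcuts speed up target discovery. [7] uses interest-based shortcuts to form groups of peers with the same interests; if a query is satisfied within the group, it is not forwarded outside the group. This reduces communication overhead, but the search scope may shrink.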


CHAPTER 3 Preliminaries

3.1 Research Architecture

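Network topologies are basically of two kinds: random networks and power-law networks. In a random network, links between peers are created at random when the network is built, so link density is uniform. A power-law network is closer to real networks: most peers have few links while a few peers have many links, and a Gnutella P2P network also exhibits a power-law structure after peers join and leave over time.

In our research, we assume that all peers have the same computing capability and that the bandwidth between any two peers is the same. The distance between two peers is represented by the message transmission time between them. Each peer maintains two sets: a neighbor set, which stores information about its neighbors, and a forward-list, which records the neighbors to which it forwards a received query. The time-to-live (TTL) of messages has no upper limit, so a query traverses every peer in the network.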

3.2 Notations and Terminologies

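In our architecture, peers are divided into three types: H-node, M-node and R-node. An H-node is the head of a cluster (cluster head), an M-node is a member of a cluster, and an R-node is a peer that has not been clustered; each R-node forms a cluster by itself. In Figure 3, circles represent peers, the number inside a circle is the peer ID, the number at the upper right of a peer is its Δ value, and black lines represent neighbor relationships between peers.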

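Table 1: Notations and Terminologies

N: Total number of peers in the network
Si: The set of 1-hop neighbors of Pi
di: Degree of Pi
Δi: The total number of edges (u, v) where u ∈ Si and v ∈ Si
Ci: The cluster that Pi belongs to
PiH: Pi is an H-node
PiM: Pi is an M-node
PiR: Pi is an R-node
Pi→Pj: Forwarding path from Pi to Pj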
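The per-peer state described in Sections 3.1 and 3.2 can be sketched as a small Python data class. Field and method names are hypothetical; distances are transmission times, as assumed in Section 3.1, and the sketch assumes that a new peer starts as an unclustered R-node that forwards to every neighbor, i.e. plain flooding before RLM prunes anything.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Peer:
    """Illustrative per-peer state for RLM (Sections 3.1 and 3.2)."""
    peer_id: int
    node_type: str = "R"            # 'H' (cluster head), 'M' (member) or 'R' (unclustered)
    cluster: Optional[int] = None   # C_i, named by the GUID of the cluster's H-node
    neighbor_set: Dict[int, int] = field(default_factory=dict)  # S_i: neighbour ID -> distance
    forward_list: Set[int] = field(default_factory=set)         # neighbours that receive forwarded queries

    @property
    def degree(self) -> int:
        return len(self.neighbor_set)  # d_i

    def add_neighbor(self, other_id: int, distance: int) -> None:
        # Before RLM runs, every neighbour is also a forwarding target (plain flooding).
        self.neighbor_set[other_id] = distance
        self.forward_list.add(other_id)

    def forward_targets(self, came_from: Optional[int] = None) -> List[int]:
        # TTL is unlimited, so the only filter is not echoing the query to its sender.
        return sorted(q for q in self.forward_list if q != came_from)
```

RLM then works purely by shrinking `forward_list`; `neighbor_set` is left intact so that the dynamic algorithms of Chapter 5 can still see every logical neighbor.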


CHAPTER 4 Redundant Link Minimization

4.1 Clustering

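In this thesis, we propose RLM, which effectively reduces the number of query messages while preserving the original file search scope.

As the example in Figure 2 shows, when the querying peer and its neighbors form a triangle [6, 12], unnecessary message transmissions occur, as among nodes 1, 2 and 3 in Figure 2. Deleting any one edge of the triangle eliminates these unnecessary query transmissions. The higher the Δ values of the peers selected as H-nodes, the more query messages are saved. H-nodes are selected as follows: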

Algorithm 1:

For all peer Pi do  // determine which peers are H-nodes
  1. compute Δi
  2. compare (Δi, di) with (Δj, dj) of every Pj ∈ Si
     if (Δi, di) > (Δj, dj) for all Pj ∈ Si
       Pi becomes an H-node (PiH)
     else
       do nothing
  Def: (Δi, di) > (Δj, dj) ⇔ (Δi > Δj) or (Δi = Δj and di > dj)

For all peer PiH do  // each H-node clusters its 1-hop neighbors
  1. PiH clusters all Pj ∈ Si; every such Pj becomes an M-node (PjM)
  2. the GUID of PiH is the name of the cluster

For all peer Pi not clustered by an H-node do  // determine which peers are R-nodes
  1. Pi becomes an R-node (PiR)
  2. check the type of every Pj ∈ Si
     if all Pj are R-nodes
       PiR becomes an H-node and clusters all PjR

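In Figure 3, the number at the upper right of each peer is its Δ value. Take node 3 as an example: node 3 compares its neighbor table with the one sent by node 2; both tables contain node 4, so node 3 increments its Δ value by 1. After comparing the neighbor tables sent by all of its neighbors, node 3 obtains Δ = 6. Once all peers have computed their Δ values, they compare them with one another. Nodes 3 and 11 have larger Δ values than all of their neighbors, so they become H-nodes. P14 is not clustered by any H-node, so it becomes P14R.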
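Algorithm 1 can be prototyped centrally on a known graph to see which peers it elects. This sketch computes Δi as the number of edges among Si, as in Table 1; the neighbor-table comparison described for Figure 3 counts each such edge twice (once from each endpoint's table), which doubles every Δ but leaves the election unchanged. The algorithm does not say how a peer adjacent to several H-nodes chooses among them, so the sketch assumes it joins the one with the largest (Δ, d). The graph `EXAMPLE` is hypothetical, not the topology of Figure 3.

```python
from itertools import combinations


def compute_delta(adj):
    """Delta_i: number of edges (u, v) with u and v both in S_i (Table 1)."""
    return {
        i: sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
        for i, nbrs in adj.items()
    }


def rlm_clustering(adj):
    """Sketch of Algorithm 1 on a known graph.

    adj maps each peer ID to the set of its neighbours. Returns (node_type, head):
    node_type[p] is 'H', 'M' or 'R'; head[p] names p's cluster (its H-node's ID).
    """
    delta = compute_delta(adj)
    # Python compares tuples lexicographically, which is exactly the Def of Algorithm 1:
    # (Delta_i, d_i) > (Delta_j, d_j)  <=>  Delta_i > Delta_j or (equal and d_i > d_j).
    key = {p: (delta[p], len(adj[p])) for p in adj}
    node_type, head = {}, {}
    # Part 1: a peer that beats every neighbour becomes an H-node.
    for p in adj:
        if all(key[p] > key[q] for q in adj[p]):
            node_type[p], head[p] = "H", p
    # Part 2: H-nodes cluster their 1-hop neighbours (assumption: a peer next to
    # several H-nodes joins the one with the largest (Delta, degree)).
    for p in adj:
        if p not in node_type:
            heads = [q for q in adj[p] if node_type.get(q) == "H"]
            if heads:
                node_type[p], head[p] = "M", max(heads, key=lambda q: key[q])
    # Part 3: every remaining peer is an R-node and forms its own cluster ...
    for p in adj:
        node_type.setdefault(p, "R")
        head.setdefault(p, p)
    # ... unless all of its neighbours are R-nodes: then it becomes an H-node.
    for p in sorted(adj):
        if node_type[p] == "R" and adj[p] and all(node_type[q] == "R" for q in adj[p]):
            node_type[p] = "H"
            for q in adj[p]:
                node_type[q], head[q] = "M", p
    return node_type, head


# Hypothetical example: two dense clusters joined through peer 5.
EXAMPLE = {
    0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {0, 3, 5},
    5: {4, 7}, 6: {7, 8, 9, 10}, 7: {5, 6, 8}, 8: {6, 7, 9}, 9: {6, 8}, 10: {6},
}
```

`rlm_clustering(EXAMPLE)` elects peers 0 and 6 as H-nodes (peer 8 ties peer 6 on Δ but loses on degree) and leaves peer 5, whose neighbors are both M-nodes, as an R-node, mirroring the role of P14 in Figure 3.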


Figure 3: Clustering in unstructured P2P network

4.2 Intra-Cluster Optimization

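Every peer collects information about its external links. An M-node, in addition to collecting this information, also deletes the links between peers of the same cluster, as follows: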

Algorithm 2:

For all peer Pi do  // collect link information and optimize intra-cluster links
  1. check the cluster affiliation of every Pj ∈ Si
     if Ci = Cj and Pi, Pj are both M-nodes
       delete Pj from the forward-list of Pi  // intra-cluster link optimization
     else
       Pi asks Pj to send Dis1 and Dis2  // collect link information
  2. Pi adds up Dis1, Dis2 and Dis3 and sends the sum to the H-node of Pi
  Def:
    Dis1: distance from Pj to the H-node of Pj
    Dis2: distance from Pj to Pi
    Dis3: distance from Pi to the H-node of Pi

… P11H, which then filter out the shortest path to each neighboring cluster.
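The intra-cluster pruning and link probing of Algorithm 2 can be sketched as follows. Assumptions: the topology, the link distances (10 time units each, the value used in the Chapter 7 simulations) and the node types are hypothetical and hard-coded; M-nodes are one hop from their H-node, so Dis1 and Dis3 are single link lengths; and only links whose endpoints lie in different clusters produce probing records, since a member's link to its own H-node is neither pruned nor probed.

```python
def intra_cluster_and_probe(adj, dist, node_type, head):
    """Sketch of Algorithm 2.

    adj: peer -> set of neighbours; dist: frozenset({u, v}) -> link distance;
    node_type/head: result of clustering ('H'/'M'/'R' and cluster name per peer).
    Returns (forward, records): the forward-lists after intra-cluster pruning and,
    per cluster name, the probing records (probe, remote, head_probe, head_remote, length).
    """
    def to_head(p):
        # Dis3 / Dis1: assumes an M-node is one hop from its H-node (0 for heads and R-nodes).
        return 0 if head[p] == p else dist[frozenset((p, head[p]))]

    forward = {p: set(adj[p]) for p in adj}  # before RLM, every neighbour is forwarded to
    records = {}
    for i in adj:
        for j in sorted(adj[i]):
            if head[i] == head[j]:
                if node_type[i] == "M" and node_type[j] == "M":
                    forward[i].discard(j)  # redundant link inside one cluster
            else:
                length = to_head(i) + dist[frozenset((i, j))] + to_head(j)  # Dis3 + Dis2 + Dis1
                records.setdefault(head[i], []).append((i, j, head[i], head[j], length))
    return forward, records


# Hypothetical topology with the types Algorithm 1 assigns to it (H-nodes 0 and 6, R-node 5).
EXAMPLE = {
    0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {0, 3, 5},
    5: {4, 7}, 6: {7, 8, 9, 10}, 7: {5, 6, 8}, 8: {6, 7, 9}, 9: {6, 8}, 10: {6},
}
TYPES = {0: "H", 1: "M", 2: "M", 3: "M", 4: "M", 5: "R", 6: "H", 7: "M", 8: "M", 9: "M", 10: "M"}
HEADS = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 5, 6: 6, 7: 6, 8: 6, 9: 6, 10: 6}
DIST = {frozenset((u, v)): 10 for u in EXAMPLE for v in EXAMPLE[u]}
```

Member-to-member links such as 1→2 disappear from the forward-lists, while the cross-cluster link 4→5 yields the record (4, 5, 0, 5, 20) at H-node 0, ready for the selection step of Algorithm 3.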


Figure 4: Intra-cluster links elimination

4.3 Inter-Cluster Optimization

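When there is more than one link between two clusters, all but one of them are redundant. Using the link information collected by Algorithm 2, H-nodes and R-nodes filter out the shortest link between clusters and delete the other, redundant links, as follows: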

Algorithm 3:

For all peer PiH and PiR do  // H-nodes and R-nodes optimize inter-cluster links
  1. check every PSij (cluster j ∈ CSi)
  2. choose the shortest path Phij in PSij
  3. if Pi is an H-node (PiH)  // H-node optimizes inter-cluster links
       notify the M-nodes of PiH to delete the other paths in PSij
     if Pi is an R-node (PiR)  // R-node optimizes inter-cluster links
       delete the other paths in PSij
  Def:
    CSi: the set of clusters neighboring cluster i
    PSij: the set of paths from cluster i to cluster j (cluster j ∈ CSi)
    Phij: the shortest path in PSij

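In Figure 5, dashed lines are the links to be deleted, and the number beside each link is its length. Every peer collects information about its links to other clusters and sends it to its H-node. Each link record contains five fields: the ID of the probing peer, the ID of the remote peer, the H-node of the probing peer, the H-node of the remote peer, and the sum of three lengths (probing peer to its H-node, probing peer to remote peer, and remote peer to its H-node). For link P2→P5, for example, the collected record is (2, 5, 3, 11, 10), and it is stored at P2's H-node, P3H.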

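Table 2 lists the link information collected by each H-node and R-node. From this information we find the following. For cluster 3 to … to cluster 3, the shortest path P16→P6 is kept. For cluster 11 to cluster 14 there is no other path, so P12→P14 is kept. For cluster 14 to cluster 3, the shortest path P14→P8 is kept. For cluster 14 to cluster 11 there is no other path, so P14→P12 is kept. All other links are redundant, and the M-nodes that own them are notified to delete them from their forward-lists.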

Figure 5: Inter-cluster links elimination

Table 2: The Probing Table (PT) of H-node and R-node (a) PT of P3 (b) PT of P11 (c) PT of P14
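A sketch of the selection step of Algorithm 3 at a single H-node or R-node, operating on probing records in the five-field format described above. Only the record (2, 5, 3, 11, 10) comes from the Figure 5 example; the other two records are hypothetical, and breaking ties by peer ID is an assumption.

```python
from collections import defaultdict


def select_inter_cluster_links(records):
    """Sketch of Algorithm 3 for one H-node or R-node.

    records: probing records (probe, remote, head_probe, head_remote, length) held by
    this node. Keeps the shortest link toward each neighbouring cluster (ties broken by
    peer IDs) and returns (kept, notify), where notify maps each owning peer to the
    remote peers it must drop from its forward-list.
    """
    by_target = defaultdict(list)
    for rec in records:
        by_target[rec[3]].append(rec)  # group by remote cluster j, i.e. build PS_ij
    kept, notify = set(), defaultdict(set)
    for recs in by_target.values():
        recs.sort(key=lambda r: (r[4], r[0], r[1]))  # shortest path Ph_ij first
        kept.add((recs[0][0], recs[0][1]))
        for probe, remote, *_ in recs[1:]:
            notify[probe].add(remote)  # redundant inter-cluster link owned by `probe`
    return kept, dict(notify)
```

With the records (2, 5, 3, 11, 10), (6, 16, 3, 11, 30) and (8, 14, 3, 14, 20), P2→P5 wins toward cluster 11, P8→P14 is the only path toward cluster 14, and peer 6 is the one M-node told to drop a link. Because records are grouped per direction, cluster 11 selects its own path back to cluster 3 independently, which is why both P12→P14 and P14→P12 survive in Figure 5.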


4.4 Analysis

We analyze the number of messages RLM spends on optimization (query amount). Table 3 defines the additional notation used in this section.

Table 3: Definitions of notations in analysis

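|H|: Number of H-nodes in the network
|M|: Number of M-nodes in the network
|R|: Number of R-nodes in the network
Rdi: Number of notifications PiH sends to its M-nodes to delete unnecessary inter-cluster links

Constructing the optimized paths costs messages in the following six items:

1. Each Pi sends its neighbor table to its neighbors: $\sum_{i=1}^{N} d_i$
2. Each Pi compares its Δ value with its neighbors: $\sum_{i=1}^{N} d_i$
3. Each PiH clusters its neighbors: $\sum_{x=1}^{|H|} d_x$

The remaining three items are the per-type terms $\sum_{y=1}^{|M|} d_y$ (M-nodes), $\sum_{z=1}^{|R|} d_z$ (R-nodes) and $\sum_{x=1}^{|H|} Rd_x$ (deletion notifications from H-nodes). Summing all six items gives Eq. (1):

$$\text{Query amount} = \sum_{i=1}^{N} d_i + \sum_{i=1}^{N} d_i + \sum_{x=1}^{|H|} d_x + \sum_{y=1}^{|M|} d_y + \sum_{z=1}^{|R|} d_z + \sum_{x=1}^{|H|} Rd_x \qquad (1)$$

where Px is an H-node, Py is an M-node, Pz is an R-node, and N = |H| + |M| + |R|. Because every peer is exactly one of these three types, $\sum_{x=1}^{|H|} d_x + \sum_{y=1}^{|M|} d_y + \sum_{z=1}^{|R|} d_z = \sum_{i=1}^{N} d_i$, so Eq. (1) becomes Eq. (2):

$$\text{Query amount} = \sum_{i=1}^{N} 3d_i + \sum_{x=1}^{|H|} Rd_x \qquad (2)$$

Writing $\sum_{i=1}^{N} 3d_i = 3N\bar{d}$, where $\bar{d}$ is the average peer degree, gives Eq. (3):

$$\text{Query amount} = 3N\bar{d} + \sum_{x=1}^{|H|} Rd_x \qquad (3)$$

Each notification removes at least one forward-list entry, so $\sum_{x=1}^{|H|} Rd_x \le N\bar{d}$, and the complexity of RLM optimization is therefore $O(N\bar{d})$. When one query is flooded through the network, it generates about $N(\bar{d}-1)+1$ query messages. Comparing RLM's one-time construction overhead with this flooding cost, we find that the cost of building the optimized topology is small enough to be negligible.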

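Figure 6 shows the overhead of RLM for different numbers of peers, with an average peer degree of 10. As the total number of peers increases, the number of messages RLM needs also increases, but the total always lies between 3·N·(average peer degree − 1) and 4·N·(average peer degree − 1), as the analysis predicts.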

Figure 6: Query amount
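Eq. (2) and the flooding estimate can be evaluated directly. The flooding count holds for any degree distribution: on a connected overlay with unlimited TTL, the source sends one copy per neighbor and every other peer forwards once to all neighbors except the sender, giving Σdi − (N − 1) = N(d̄ − 1) + 1 messages. The degrees and Rd values below are illustrative, not measured.

```python
def rlm_query_amount(degrees, rd):
    """Eq. (2): 3 * sum of d_i over all peers plus the Rd_x deletion notifications."""
    return 3 * sum(degrees) + sum(rd)


def flooding_query_amount(n, avg_degree):
    """One flooded query with unlimited TTL on a connected overlay: the source sends
    d_s copies and every other peer forwards d_i - 1, i.e. sum(d_i) - (n - 1)."""
    return n * (avg_degree - 1) + 1
```

With N = 1000 and every degree equal to 10 (and ΣRdx taken as 0 for illustration), RLM's one-time construction costs 30 000 messages, roughly 3.3 times the 9001 messages of a single flooded query.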

4.5 Proof of RLM

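The main goal of RLM is to reduce the number of redundant and useless query messages in the network. To achieve this, RLM deletes some unnecessary forwarding links. The question is whether RLM might also delete necessary links and split the topology into several isolated islands. We prove below that RLM does not delete necessary links and therefore keeps the same file search coverage as flooding.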
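The coverage property can also be checked mechanically on any concrete topology: with unlimited TTL, flooding reaches every peer connected to the source, so pruned forward-lists preserve coverage exactly when forwarding along them from every source still reaches that same set. A minimal breadth-first-search sketch; the two pruned forward-lists for the Figure 2 topology are hypothetical.

```python
from collections import deque


def reachable(links, source):
    """Peers reached by forwarding a query from source along links (unlimited TTL)."""
    seen = {source}
    queue = deque([source])
    while queue:
        p = queue.popleft()
        for q in links.get(p, ()):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen


def preserves_coverage(adj, forward):
    """True if, from every source, the pruned forward-lists reach what flooding over adj reaches."""
    return all(reachable(forward, s) == reachable(adj, s) for s in adj)


FIG2 = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
GOOD = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}  # hypothetical pruning that keeps coverage
BAD = {1: {2, 3}, 2: {1}, 3: {1}, 4: {2, 3}}   # hypothetical pruning that isolates peer 4
```

Such a check is useful as a regression test for a simulator: `GOOD` drops 2→3, 3→2, 3→4 and 4→3 yet still reaches every peer from every source, whereas `BAD` leaves peer 4 unreachable from peer 1.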


CHAPTER 5 Dynamic Network Adaptation

5.1 Peer Join

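In a P2P network, peers joining and leaving change the topology. To cope with this dynamism, we propose algorithms for peer join and peer leave. When a new peer joins, the bootstrapping node gives it information about several neighbors, and the new peer takes on a type according to the states of those neighbors, following these rules: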

Algorithm 4:

Case 1: Si contains an H-node; peer Pi does
  1. Pi becomes an M-node (PiM)
  2. PiM joins the cluster of its nearest H-node
     check the cluster affiliation of every Pj ∈ Si
       if Ci = Cj and Pj is an M-node
         delete Pj from the forward-list of Pi
       else
         send the link data of neighboring clusters to the H-node of PiM
  3. optimize inter-cluster links
Case 2: Si contains no H-node; peer Pi does
  1. Pi becomes an R-node (PiR)
  2. PiR connects with its nearest M-node
Case 3: all peers in Si are R-nodes; peer Pi does
  1. Pi becomes an H-node (PiH)
  2. PiH clusters Si
     every PjM checks the cluster affiliation of every Pk (Pj ∈ Si, Pk ∈ Sj, Pk ≠ Pi)
       if Cj = Ck
         delete Pk from the forward-list of Pj
       else
         send the link data of neighboring clusters to PiH
  3. optimize inter-cluster links

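In Figure 8, PN1, PN2 and PN3 are newly joined peers. When PN1 joins the network, the bootstrapping node gives it nodes 1, 3 and 14 as neighbors. After checking the states of its neighbors, PN1 finds that node 3 is an H-node, so it becomes PN1M and joins the cluster of P3H. … PN3H clusters P14R.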

Figure 8: Paradigm of peer join
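The type decision a joining peer makes in Algorithm 4 can be sketched as a pure function of the neighbor information handed out by the bootstrapping node. Assumptions: each neighbor is described as (peer ID, type, distance); because Cases 2 and 3 overlap as written (a neighborhood of only R-nodes also contains no H-node), Case 3 takes precedence; and `join_role` is a hypothetical name.

```python
def join_role(neighbours):
    """Sketch of the type decision in Algorithm 4.

    neighbours: list of (peer_id, node_type, distance) from the bootstrapping node.
    Returns (new_type, contact): the H-node to join for 'M', None for 'H'
    (the newcomer clusters its R-node neighbours), or the M-node to connect with for 'R'.
    """
    heads = [n for n in neighbours if n[1] == "H"]
    if heads:                                     # Case 1: join the nearest H-node
        return "M", min(heads, key=lambda n: n[2])[0]
    if all(n[1] == "R" for n in neighbours):      # Case 3: become a new cluster head
        return "H", None
    members = [n for n in neighbours if n[1] == "M"]
    return "R", min(members, key=lambda n: n[2])[0]  # Case 2: stay unclustered
```

For example, `join_role([(1, "M", 10), (3, "H", 10), (14, "R", 10)])` reproduces PN1's decision in Figure 8: the neighbors include the H-node P3, so PN1 becomes an M-node of cluster 3. The distances in this call are illustrative.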


5.2 Peer Leave

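When a peer leaves, the topology changes and existing forwarding paths may be broken. To preserve network connectivity and file search coverage, we propose a separate procedure for each type of leaving peer.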

Algorithm 5:

Leaving Pi is an M-node:
  1. PiM sends its neighbor set to PjH (Pj ∈ Si and Pj ∈ Ci)
  2. PjH updates its neighbor set and forward-list  // PjH inherits the neighbors of PiM
  3. PjH checks the cluster affiliation of every Pk ∈ Sj  // reduce the number of R-nodes
     if Pk is an R-node, PjH clusters it
  4. every newly joined Pk checks the cluster affiliation of every Pl ∈ Sk  // optimization
  5. execute Optimization

Leaving Pi is an M-node whose only neighbor is PjH (special case):  // reduce the number of clusters
  1. PiM sends its neighbor set to its neighbor PjH
  2. PjH updates its neighbor set and forward-list  // PjH inherits the neighbors of PiM
  3. PjH checks the type of every Pk ∈ Sj  // reduce the number of H-nodes
     if some Pk is an R-node
       PjH clusters it
     else if no Pk is an R-node but some Pk is an H-node
       PjH becomes PjM and joins the cluster of that Pk
     else
       PjH becomes PjR
  4. execute Optimization

Leaving Pi is an H-node:
  1. PiH dismisses its cluster; its members PjM (Pj ∈ Si and Pj ∈ Ci) no longer belong to a cluster
  2. PiH sends its neighbor set to the neighbor Pj with the largest degree
  Peer Pj then:
  1. Pj updates its neighbor set and forward-list  // Pj inherits the neighbors of PiH
  2. Pj becomes an H-node (PjH)
  3. PjH clusters every Pk ∈ Sj that does not belong to a cluster
  4. every Pk checks the cluster affiliation of every Pl ∈ Sk
  5. execute Optimization

Leaving Pi is an H-node with only one member PjM (special case):  // reduce the number of clusters
  1. PiH sends its neighbor set to PjM (Pj ∈ Si and Pj ∈ Ci)
  2. PjM updates its neighbor set and forward-list  // PjM inherits the neighbors of PiH
  3. PjM becomes PjH
  4. PjH checks the type of every Pk ∈ Sj
     if some Pk is an H-node, PjH becomes PjM and joins the cluster of that Pk
     if no Pk is an H-node but some Pk is an M-node, PjH becomes PjR
     if all Pk are R-nodes, PjH clusters them
  5. execute Optimization

Leaving Pi is an R-node:
  PiR sends its neighbor set to the neighbor Pj with the largest degree (Pj ∈ Si)
  Peer Pj then:
  1. Pj updates its neighbor set and forward-list  // Pj inherits the neighbors of PiR

Def:
  Optimization: optimization of intra-cluster and inter-cluster links
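The non-special H-node departure case of Algorithm 5 can be sketched on a centrally stored graph. Assumptions: inherited neighbor links are made symmetric, degree ties are broken by the larger peer ID, the subsequent Optimization step (Algorithms 2 and 3) is omitted, and the topology in the usage note is hypothetical.

```python
def h_node_leaves(adj, node_type, head, leaving):
    """Sketch of Algorithm 5, case 'leaving Pi is an H-node' (non-special case).

    Mutates adj (peer -> set of neighbours), node_type and head in place and
    returns the successor that becomes the new H-node.
    """
    # Step 1: dismiss the cluster; its members temporarily belong to no cluster.
    for p in adj:
        if p != leaving and head.get(p) == leaving:
            node_type[p], head[p] = None, None
    # Step 2: hand the neighbour set to the neighbour with the largest degree.
    successor = max(adj[leaving], key=lambda q: (len(adj[q]), q))
    for q in adj[leaving]:
        adj[q].discard(leaving)
        if q != successor:
            adj[q].add(successor)      # inherited link, assumed symmetric
            adj[successor].add(q)
    del adj[leaving], node_type[leaving], head[leaving]
    # Peer Pj, steps 2-3: become an H-node and cluster every unclustered neighbour.
    node_type[successor], head[successor] = "H", successor
    for q in adj[successor]:
        if head.get(q) is None:
            node_type[q], head[q] = "M", successor
    return successor
```

For instance, with `adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {0, 2}, 4: {2}}`, H-node 0 heading members 1, 2 and 3, and peer 4 an R-node, removing peer 0 promotes peer 2 (the highest-degree neighbor), which re-clusters 1 and 3 and leaves R-node 4 in its own cluster.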


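Figure 9(a) shows an M-node leaving. P6M leaves and notifies nodes 3, 16 and 4 to remove P6M from their neighbor sets. Because P6M belongs to the cluster of P3H, P6M sends its neighbor set to P3H, and P3H puts those neighbors into its forward-list.

Figure 9(b) shows an H-node leaving. P3H leaves, notifies all of its neighbors to remove P3H from their neighbor sets, and notifies its M-nodes that they no longer belong to a cluster. Because P3H …, the information about link P8→P12 is sent to P3H for checking; P8→P12 turns out to be redundant, so P12 is removed from P8's forward-list.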

Figure 9: Paradigm of peer leaves (a) M-node (b) H-node (c) R-node

CHAPTER 6 Optimization of RLM

6.1 Files Indexing Cache

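To improve RLM and shorten response times, we combine it with a files index cache. This function is added to H-nodes: each M-node sends its own file-list to the H-node of its cluster. When a query passes through an H-node and one of the H-node's cluster members satisfies the search condition, the H-node can send the response on that member's behalf. When an H-node forwards a query, it only needs to send it to the M-nodes that have links to other clusters; we call such M-nodes gateway-nodes [17].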
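The H-node side of the files index cache can be sketched as a small class. Assumptions: file-lists are sets of exact file names and matching is exact; after answering on behalf of its members, the H-node still forwards the query to its gateway-nodes (other than the one it came from), so the search scope is unchanged; class and method names are hypothetical.

```python
class HNode:
    """Illustrative H-node with a files index cache (Section 6.1)."""

    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.file_index = {}   # member peer ID -> set of file names from its file-list
        self.gateways = set()  # members that own at least one inter-cluster link

    def register_member(self, member_id, file_list, is_gateway=False):
        # Called when an M-node sends its file-list to its H-node.
        self.file_index[member_id] = set(file_list)
        if is_gateway:
            self.gateways.add(member_id)

    def handle_query(self, filename, came_from=None):
        """Return (owners, forward_to).

        owners: members whose file-list matches; the H-node answers for them directly.
        forward_to: gateway-nodes the query is still passed to, excluding its sender.
        """
        owners = sorted(m for m, files in self.file_index.items() if filename in files)
        forward_to = sorted(g for g in self.gateways if g != came_from)
        return owners, forward_to
```

Answering at the H-node shortens the response path, and forwarding only to gateway-nodes skips the members that have no external links, which is where the query savings in Figure 10 come from.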

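In Figure 10, P1M is the query source and P10M is the target; arrows show the direction in which the query is sent. The distance between neighboring peers is set to 10 time units, and a peer marked with G at its upper right is a gateway-node.

Figure 10(a) shows RLM without the files index cache: P1M receives the response from P10M after 50 + 50 = 100 time units. Figure 10(b) shows RLM with the files index cache: the query is answered as soon as it reaches P11H, and P1M receives the response after 40 + 40 = 80 time units.

… the H-node therefore sends the message only to the gateway-node P12M; when P12M receives the message from P14R, it passes it on to P11H. Thus, whereas 18 query messages were originally needed to reach every peer in the network, with the files index cache only 9 query messages achieve the same effect, reducing the number of query messages.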

(a) 1+6+2+2+6+1 = 18 queries; response time 50×2 = 100 time units
(b) 1+2+2+2+1+1 = 9 queries; response time 40×2 = 80 time units

Figure 10: Files index cache (a) H-node without files index cache (b) H-node with files index cache

6.2 Single Direction Connections

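In this section we propose a further improvement. [15] observes that P2P networks often contain two kinds of peers. A query-heavy peer issues many more queries than an ordinary peer; because RLM reduces the number of paths in the network, the response time of such a peer increases. A response-heavy peer answers many more queries, and for the same reason its response time also increases. [15] therefore proposed Single Direction Connections (SDC); following it, we propose a version of the algorithm for RLM.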

Algorithm 6:

For each Pi, monitor its own message traffic
  if Pi finds that it is a query-heavy peer
    Pi notifies every Pj ∈ Si that Pi is a query-heavy peer
    if Pj is not in the forward-list of Pi
      Pi adds Pj to its forward-list
  if Pi finds that it is a response-heavy peer
    Pi notifies every Pj ∈ Si that Pi is a response-heavy peer
    if Pi is not in the forward-list of Pj
      Pj adds Pi to its forward-list

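Figure 11(a) shows the forwarding relationships of every peer after RLM optimization. When P6 finds that it is a query-heavy peer, it notifies all of its 1-hop neighbors that P6 is query-heavy. Among P6's 1-hop neighbors, only P4 has no forwarding relationship with P6 in either direction, so P6 adds P4 to its forward-list and the link P6→P4 is established.

In Figure 11(b), P6 finds that it is a response-heavy peer and notifies all of its 1-hop neighbors that P6 is response-heavy. Among P6's 1-hop neighbors, only P4 has no forwarding relationship with P6 in either direction, so P4 adds P6 to its forward-list and the link P4→P6 is established.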


Figure 11: Example of SDC (a) query-heavy peer (b) response-heavy peer
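Algorithm 6 reduces to a few set operations on forward-lists. In this sketch the peer's role is passed in, because the traffic threshold that makes a peer query-heavy or response-heavy is not specified; the forward-lists in the usage note are a hypothetical reading of Figure 11 in which P4 is P6's only neighbor without a forwarding relationship.

```python
def apply_sdc(adj, forward, peer, role):
    """Sketch of Algorithm 6: add single-direction forwarding links around one peer.

    adj: peer -> set of neighbours (S_i); forward: peer -> forward-list (mutated in place).
    role: 'query-heavy' or 'response-heavy', as reported by the peer's own traffic monitor.
    Returns the list of links (sender, receiver) that were added.
    """
    added = []
    for j in sorted(adj[peer]):
        if role == "query-heavy" and j not in forward[peer]:
            forward[peer].add(j)   # the query-heavy peer now reaches j directly
            added.append((peer, j))
        elif role == "response-heavy" and peer not in forward[j]:
            forward[j].add(peer)   # j now forwards queries straight to the response-heavy peer
            added.append((j, peer))
    return added
```

With `adj = {6: {3, 4, 16}, 3: {6}, 4: {6}, 16: {6}}` and `forward = {6: {3, 16}, 3: {6}, 4: set(), 16: {6}}`, a query-heavy P6 gains the link P6→P4, and (starting again from the same forward-lists) a response-heavy P6 causes P4 to add P4→P6, matching the two panels of Figure 11.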


CHAPTER 7 Performance Evaluation

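To evaluate RLM, we simulated flooding and RLM in both dynamic and static network environments. In the simulations, the generated logical topologies contain 200 to 2000 logical peers, each peer has on average 2 to 10 neighbors (peer degree), and the communication cost between any two peers is set to 10 time units. For link construction, we ensure that there is a path between any two peers in the network. Most links are concentrated on S% of all peers, which gives the simulated network local link density: the smaller S is, the fewer peers most links are concentrated on, and the more links lie within those regions. In Figure 12, the number in each box is a percentage of the total number of peers and dots represent links. Figure 12(a) shows S = 100: links are spread evenly over all peers and the network is a uniform random network. Figure 12(b) shows S = 10: most links are concentrated on 10% of the peers, the remaining 90% of the peers have relatively few links, and the network is a power-law network.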


Figure 12: Network structure with different S% (a) random network (S=100) (b) power-law network (S=10)
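The simulations require connected overlays whose links concentrate on S% of the peers, but the generator itself is not described. One possible construction, given purely as a hypothetical sketch (the random spanning tree, the hub set and the 50% hub-to-hub probability are all assumptions, not the thesis's Borland C++ Builder simulator), is:

```python
import random


def generate_topology(n, avg_degree, s_percent, seed=0):
    """Hypothetical overlay generator with links concentrated on S% of the peers.

    1. A random spanning tree guarantees a path between any two peers.
    2. Every extra link has at least one endpoint in a hub set holding S% of the
       peers, so a small S concentrates links on few peers; S = 100 is uniform.
    Assumes n * avg_degree // 2 >= n - 1 and that enough candidate links exist.
    """
    rng = random.Random(seed)
    adj = {p: set() for p in range(n)}
    order = list(range(n))
    rng.shuffle(order)
    for idx in range(1, n):
        u, v = order[idx], order[rng.randrange(idx)]  # attach to an earlier tree node
        adj[u].add(v)
        adj[v].add(u)
    hubs = list(range(max(2, n * s_percent // 100)))
    edges, target = n - 1, n * avg_degree // 2
    while edges < target:
        u = rng.choice(hubs)
        v = rng.choice(hubs) if rng.random() < 0.5 else rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj
```

With S = 100 every peer is a hub and the extra links are uniform, approximating Figure 12(a); with S = 10 the hub peers absorb most extra links, as in Figure 12(b). The loop does not terminate if the requested number of links exceeds what the hub set can support, so very small S combined with a large average degree should be avoided.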

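We also computed the average clustering coefficient of the network in each case. The clustering coefficient of a peer measures the extent to which its neighbors are also neighbors of one another. If the peer has k neighbors, the total number of links that could exist among these k neighbors is k(k − 1)/2, and the peer's clustering coefficient is the number of links that actually exist among them divided by this number of possible links. The clustering coefficient of the whole network is the average over all peers. For example, if node i has five neighbors, the total number of links that could exist among them is 5 × 4 / 2 = 10.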
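The clustering coefficient defined above can be computed as follows. Its numerator for peer i is exactly Δi from Table 1, so the coefficient equals Δi / (di(di − 1)/2). Peers with fewer than two neighbors are assigned 0 here, which is an assumption; the five-neighbor graph in the usage note is hypothetical.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Links that actually exist among P_i's k neighbours, divided by k(k-1)/2."""
    k = len(adj[i])
    if k < 2:
        return 0.0  # assumption: undefined for k < 2, counted as 0 here
    links = sum(1 for u, v in combinations(sorted(adj[i]), 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Network clustering coefficient: the mean of the per-peer values."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)
```

In `{0: {1, 2, 3, 4, 5}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2, 4}, 4: {0, 3, 5}, 5: {0, 4}}`, node 0 has five neighbors with four links among them, so its coefficient is 4/10 = 0.4.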


7.1 Static Circumstance

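We analyze the effect of RLM in a static P2P network; the simulator was written in Borland C++ Builder 6. The main purpose of RLM is to reduce as far as possible the number of query messages generated during file search while keeping the same search scope as flooding. In each simulation, after a new topology is generated, a peer is chosen at random as the query source, and we count the number of query messages and the total response time (the sum of the response times of all peers) under flooding and under RLM.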

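Figure 13 shows the effect of RLM for different peer degrees. The diamond curve represents RLM and the square curve represents RLM combined with the files index cache. The simulated network has 1000 peers, S = 20, and an average peer degree of 2 to 10.

Figure 13(a) shows the reduction in query messages for different average peer degrees, with the number generated by flooding taken as 100%. As the average peer degree increases, local link density rises and the proportion of query messages eliminated keeps growing.

Figure 13(b) shows the total response time (the sum of the response times of all peers) for different average peer degrees, with flooding's total response time taken as 100%. As the average peer degree increases, a larger proportion of query messages is eliminated, so more shortcuts are deleted and the total response time becomes longer. Figure 13 also shows that RLM combined with the files index cache outperforms RLM alone in both total response time and number of query messages, although the additional improvement is small.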

Peer degree: 2, 4, 6, 8, 10
(a) Query% (flooding = 100%): RLM 99.98, 67.46, 48.09, 38.96, 29.55; RLM+cache 83.34, 63.12, 45.46, 37.11, 28.11
(b) Response time% (flooding = 100%): RLM 100.03, 114.65, 119.57, 122.39, 125.99; RLM+cache 95.49, 108.43, 112.45, 114.92, 117.96

Figure 13: Performance evaluation of RLM in different peer degree (a) Query% (b) Response time%

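Figure 14 shows the effect of RLM for different numbers of peers. The diamond curve represents RLM and the square curve represents RLM combined with the files index cache. The simulated networks have 200 to 2000 peers, S = 20, and an average peer degree of 10.

Figure 14(a) shows the reduction in query messages for different numbers of peers, with flooding taken as 100%. As the total number of peers increases, the clustering coefficient decreases slightly, so fewer query messages are eliminated.

Figure 14(b) shows the sum of the response times of all peers for different numbers of peers, with flooding's total response time taken as 100%. As the total number of peers increases, fewer query messages are eliminated, and the total response time is correspondingly shorter.

Figure 14(c) shows how the clustering coefficient changes with the number of peers: it decreases slowly as the number of peers grows, and the decrease becomes smaller as the network grows. Figure 14 thus shows that when the total number of peers increases and the clustering coefficient falls, fewer query messages are eliminated and the total response time is shorter.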


Table of Contents

Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Background
3 Preliminaries
4 Redundant Link Minimization
5 Dynamic Network Adaptation
6 Optimization of RLM
7 Performance Evaluation
8 Conclusions and Future Work
References
