Network Protocols: Design and Analysis-Peer-to-Peer File Sharing

Academic year: 2021


(1)

Network Protocols: Design and Analysis

Polly Huang EE NTU

http://cc.ee.ntu.edu.tw/~phuang phuang@cc.ee.ntu.edu.tw

(2)
(3)

File Sharing

• A straightforward term

• Basically, users want to pass files between computers on the network

(4)

The Traditional Way

• Client-server based

• Files are only kept in the servers

• Clients always get the files from the servers

• Never the other way around

(5)

For Example

• Downloading web pages from WWW

• Think of the web site keeping all those pages you browse as the server

• Think of your machine requesting pages as the client

(6)

The New Way

• Peer-to-peer based

• Files are kept wherever they are

• Each computer can download from another computer or upload files to another computer

(7)

Peer-to-Peer (P2P)

• There is no client-server distinction

• The computers are thus simply peers to each other

• Or think of it this way

(8)

As long as I have a computer on the Internet, I can put files online for others to use.

(9)

3 Major Components

• Connecting

• Searching

• Downloading

(10)

By Example

• Napster

• Gnutella

(11)

Napster

• The company

• The technology

• The prospects

(12)

The Beginning

• January 1999

• Shawn Fanning

• Freshman at Northeastern University

• If we could all share our MP3 files

(13)

The Technology

• Connecting

– Through a fixed Napster server

• Searching

– Done by the Napster server

• Downloading

(14)

Connecting

• Each peer connects to a fixed Napster server somewhere

• Upon connecting, each peer sends its own list of files to be shared

(15)

Illustrated

[Figure: each peer sends its list of shared files to the Napster server, which builds one directory]

Peer A: American Pie (Madonna), Promise (Nsync), …

Peer B: Dance With Me (J. Lo), She bangs (Ricky Martin), …

Peer C: Reflection (Christina Aguilera), Larger Than Life (Backstreet Boys), …

Napster server directory (title, artist, peer):

– American Pie, Madonna, A

– Promise, Nsync, A

– Dance With Me, J. Lo, B

– She bangs, Ricky Martin, B

– Reflection, Christina Aguilera, C

– Larger Than Life, Backstreet Boys, C

– …

(16)

Searching

• The peer sends the search query to the Napster server

• The Napster server performs the search

– Matching the directory

• The Napster server returns

– List of files matching the query and location of the files
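The connect and search steps above can be sketched as a small in-memory index. This is only an illustration: the class name `DirectoryServer`, its methods, and the substring matching rule are assumptions, not the actual Napster protocol.

```python
from collections import defaultdict

class DirectoryServer:
    """Toy central index: maps each shared title to the peers that hold it."""

    def __init__(self):
        self.index = defaultdict(set)   # title -> set of peer names

    def connect(self, peer, titles):
        # On connecting, a peer uploads the list of files it shares.
        for title in titles:
            self.index[title].add(peer)

    def search(self, query):
        # Assumed matching rule: case-insensitive substring match.
        q = query.lower()
        return sorted(
            (title, peer)
            for title, peers in self.index.items()
            if q in title.lower()
            for peer in peers
        )

server = DirectoryServer()
server.connect("A", ["American Pie - Madonna", "Promise - Nsync"])
server.connect("B", ["Dance With Me - J. Lo", "She bangs - Ricky Martin"])
server.connect("C", ["Reflection - Christina Aguilera",
                     "Larger Than Life - Backstreet Boys"])
print(server.search("american pie"))   # [('American Pie - Madonna', 'A')]
```

The server returns only the location; the file itself never passes through it.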

(17)

Illustrated

[Figure: Peers A, B, and C and the Napster server; the server matches the query against its directory of (title, artist, peer) entries: American Pie, Madonna, A; Promise, Nsync, A; Dance With Me, J. Lo, B; She bangs, Ricky Martin, B; Reflection, Christina Aguilera, C; Larger Than Life, Backstreet Boys, C; …]

(18)

Downloading

• Select the desired file from the returned search list

• Request the file directly from the location indicated on the returned list

(19)

Illustrated

[Figure: Peers A, B, and C and the Napster server; the returned entry is (American Pie, Madonna, A), followed by the message "Request: American Pie Madonna"]

(20)

Problems

• Napster servers

– Single point of failure

– Performance bottleneck

(21)

The Prospects

• December 1999

– The Recording Industry Association of America (RIAA) sued for copyright infringement, asking for damages of $100,000 each time a song is copied

• March 2001

– Judge ordered Napster to block copying of copyrighted songs

• July 2001

(22)

Obvious Target

• Being the first peer-to-peer file sharing system

(23)

Legacy

• Napster might be gone forever

• But the peer-to-peer file sharing systems that followed continue to prosper

(24)

Gnutella

• The company

• The technology

• The prospects

(25)

The Beginning

• Justin Frankel and Tom Pepper

• AOL acquires Nullsoft

(26)

The Technology

• Connecting

• Searching

(27)

Connecting

• Each peer connects to any peer already on the Gnutella network

• Upon connecting,

– The peer announces its presence to the neighboring peers

– The neighboring peers propagate the announcement until it reaches all peers on the network

• Upon receiving the announcement

– The contacted peer responds with a bit of information about itself

– For example, the number of files and amount of disk space on the particular peer to share with the network

(28)

Illustrated

[Figure: Peer A's announcement is propagated from peer to peer, and the contacted peers respond]

Peer A (10 files, 20MB): American Pie (Madonna), Promise (Nsync), …

Peer B (20 files, 50MB): Dance With Me (J. Lo), She bangs (Ricky Martin), …

Peer C (15 files, 40MB): Reflection (Christina Aguilera), Larger Than Life (Backstreet Boys), …

Messages shown: Announcement "A: 10 files, 20MB" (sent, then forwarded); Respond "B: 20 files, 50MB"; Respond …

(29)

Searching

• Similar to connecting

• Upon generating a search query

– The peer sends the search query to the neighboring peers

– The neighboring peers propagate the search query until it reaches all peers on the network

• Upon receiving the search query

– The contacted peer performs the search on its local file base and responds with the list of matched entries

(30)

Illustrated

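[Figure: the search query "Backstreet Boys" is propagated from peer to peer; each contacted peer performs matching on its local files, and the result is passed back]

Peer A (10 files, 20MB): American Pie (Madonna), Promise (Nsync), …

Peer B (20 files, 50MB): Dance With Me (J. Lo), She bangs (Ricky Martin), …

Peer C (15 files, 40MB): Reflection (Christina Aguilera), Larger Than Life (Backstreet Boys), …

Messages shown: Search "Backstreet Boys" (sent, then forwarded); Perform Matching (at each contacted peer); Result "C: Larger Than Life" (returned, then forwarded back)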

(31)

Query Flooding

• Gnutella

• No hierarchy

• Use bootstrap node to learn about others

• Join message

• Send query to neighbors

• Neighbors forward query

• If queried peer has object, it sends message back to querying peer
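Query flooding can be sketched as a breadth-first walk over the overlay with a hop limit. This is a simplified model, not the Gnutella wire protocol: the function `flood_query`, the `ttl` parameter, and the message counting are assumptions made for illustration.

```python
from collections import deque

def flood_query(neighbors, files, origin, keyword, ttl):
    """Flood a query up to `ttl` hops; returns (hits, messages_sent)."""
    seen = {origin}                       # peers that already handled the query
    queue = deque([(origin, None, ttl)])  # (peer, who sent it, hops left)
    hits, messages = [], 0
    while queue:
        peer, sender, hops_left = queue.popleft()
        if peer != origin:
            # Each contacted peer searches its own local files.
            hits += [(peer, f) for f in files.get(peer, []) if keyword in f]
        if hops_left == 0:
            continue
        for nxt in neighbors[peer]:
            if nxt == sender:
                continue                  # do not echo the query back
            messages += 1                 # every forwarded copy costs a message
            if nxt not in seen:           # duplicate copies are dropped on arrival
                seen.add(nxt)
                queue.append((nxt, peer, hops_left - 1))
    return hits, messages

neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
files = {
    "A": ["American Pie", "Promise"],
    "B": ["Dance With Me", "She bangs"],
    "C": ["Reflection", "Larger Than Life"],
}
print(flood_query(neighbors, files, "A", "Larger", ttl=2))  # ([('C', 'Larger Than Life')], 2)
print(flood_query(neighbors, files, "A", "Larger", ttl=1))  # ([], 1)
```

With `ttl=1` the query never reaches Peer C, so the file is not found even though it exists in the network.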

(32)

More Query Flooding

Pros

• Peers have similar responsibilities: no group leaders

• Highly decentralized

• No peer maintains directory info

Cons

• Excessive query traffic

• Query radius: may not find content even when it is present

• Bootstrap node

• Maintenance of overlay network

(33)

Downloading

• Select the desired file from the returned search list

• Request the file directly from the location indicated on the returned list

• The transfer is done using HTTP

– Each Gnutella peer has basic web functions built in

(34)

Illustrated

[Figure: the file is fetched with an HTTP Request for "Larger Than Life.mp3", answered by an HTTP Reply]

Peer A (10 files, 20MB): American Pie (Madonna), Promise (Nsync), …

Peer B (20 files, 50MB): Dance With Me (J. Lo), She bangs (Ricky Martin), …

Peer C (15 files, 40MB): Reflection (Christina Aguilera), Larger Than Life (Backstreet Boys), …

(35)

The Prospects

• Posted for one day, March 2000

• Immediately withdrawn due to a major performance concern

– The announcements, announcement responses, search queries, and query results all have to go around the entire network

– Lots of flooding

(36)

The Design Lesson

• Flooding is bad

• P2P system designers beware

• And the P2P systems that followed continue to evolve to a better state

(37)

Napster vs. Gnutella

Napster

• Peers are connected through a Napster server

• Upside

– Requests and replies are limited to 1 hop

– Searching is done at the potentially more powerful Napster server

– Search results are more uniform

• Downside

– Limited number of files shared

– Centralized directory

Gnutella

• Peers are connected directly to each other

• Upside

– Large number of files shared

– Decentralized directory

• Downside

– Flooding of requests and replies

– Searching is done at the resource-limited peer computers

(38)

Minor Difference

• Napster allows only MP3 file sharing

• Gnutella allows general file sharing

(39)

Peer-to-Peer Reading

Freenet [Clarke00a]

Chord [Stoica00a]

(40)

Freenet

Clarke, Sandberg, Wiley, Hong [Clarke00a]

(41)

Key ideas

• Share files anonymously

– Hide or randomize info to make it hard to find the source

– Encryption on file contents

• Peer-to-peer

– Who has something?

(42)

Basic Idea: Finding Data

• Generate a key from the filename

– it’s just a hash, a “random” 160-bit number
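As a minimal sketch of the idea, a descriptive string can be hashed to a 160-bit integer with SHA-1. The helper name `file_key` is hypothetical, and Freenet's real key derivation involves more steps than a single hash of the name.

```python
import hashlib

def file_key(name: str) -> int:
    """Hash a descriptive string to a 160-bit integer key (SHA-1)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()   # 20 bytes = 160 bits
    return int.from_bytes(digest, "big")

key = file_key("text/philosophy/sun-tsu/art-of-war")
print(f"{key:040x}")   # 40 hex digits; looks random but is fully determined by the name
```

Anyone who knows the exact string can recompute the key, which is what makes lookup possible without a directory.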

(43)

Basic Idea: Routing

• Throw data into a mesh of nodes

• Route queries towards keys

• Encourage locality around keys

– New data tends to go towards like keys

– Searches duplicate data

• Performance

– Worst case: flooding
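The routing idea can be sketched as a depth-first search that always tries the neighbor with the closest known key first, and caches data on the way back. This is a toy model under stated assumptions: small integer keys, the `route` function, the `htl` (hops-to-live) budget, and the node dictionary layout are all invented for illustration and are not Freenet's actual data structures.

```python
def route(nodes, start, key, htl):
    """Closest-key-first search with backtracking and a hops-to-live budget.

    nodes: name -> {"store": {key: data}, "table": {known_key: neighbor}}
    Returns (data or None, list of nodes visited).
    """
    visited, path = set(), []
    budget = [htl]                      # remaining forwarding hops

    def visit(name):
        visited.add(name)
        path.append(name)
        node = nodes[name]
        if key in node["store"]:
            return node["store"][key]
        # Try neighbors in order of how close their known key is to the target.
        ranked = sorted(node["table"].items(), key=lambda kv: abs(kv[0] - key))
        for _, neighbor in ranked:
            if neighbor in visited or budget[0] == 0:
                continue
            budget[0] -= 1
            data = visit(neighbor)
            if data is not None:
                node["store"][key] = data   # searches duplicate data along the path
                return data
        return None                      # dead end: caller backtracks

    return visit(start), path

nodes = {
    "A": {"store": {}, "table": {10: "B", 90: "C"}},
    "B": {"store": {}, "table": {12: "D"}},
    "C": {"store": {}, "table": {}},
    "D": {"store": {15: "art-of-war"}, "table": {}},
}
data, path = route(nodes, "A", key=15, htl=3)
print(data, path)   # art-of-war ['A', 'B', 'D']
```

After this search, A and B hold cached copies, so a repeat query at A is answered locally. When the closest-key guesses keep failing, the backtracking visits more and more nodes, which is the flooding worst case.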

(44)

Basic Idea: Anonymity

• When propagating requests, add randomness to obscure sender/receiver

– randomize initial TTL, forward past final TTL, etc., renumber sender id as query moves

• Data is encrypted and stored by key, so node owner doesn’t know contents

• Updates are hard

– either prohibit them

(45)

Naming

• First, strings that map to hashes

– text/philosophy/sun-tsu/art-of-war => 160-bit SHA-1 hash 0x12838482

• But how do we know which strings?

– Could be Prose:Philosophy:Chinese:Sun Tsu:Art of War => 0x8348234f

• And how do we browse?

– What other Philosophy texts?

(46)

Updating

• How do we update data in place?

– Can’t just replace data, because that allows denial of service

– Yet need to update data in place (ex. to maintain directories of keys)

• Single user

– Can use public key cryptography and indirection

(47)

Compare to Other Peer Systems

• Napster had a central database; Freenet’s is distributed

– Kazaa and Morpheus too (right?)

• Gnutella?

• Others have better

– Search

(48)

Does it work?

• Not clear if Freenet scales…

– With sparse keyspace, how much flooding?

• Vulnerable to DoS attacks…

– Record companies putting songs with 15s of music and then a raspberry

– No real way to stop this

• Not clear that search is sufficient…

• But very interesting design point

(49)

Comparing to Other p2p Systems

• Search: map data to hash, find hash in system with “directed semi-flooding”

• Update:

– insert: first search, then give data to neighbor with closest key

– update in place: hard

• Redundancy: data is cached at every step on search or insert

(50)

Chord

Stoica, Morris, Karger, Kaashoek, Balakrishnan

(51)

Key ideas

• 2nd generation peer-to-peer system

– Distributed lookup: maps key->node

• Strong statements about efficiency

– (compare to freenet’s loose clustering)

• Routing information

– Few neighbors (successors)

– Finger table to jump in ~2^n steps around ring

• Protocols to maintain ring structure even with joins/departures
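A finger table can be sketched in a few lines: finger i of node n points to the first node at or after n + 2^i on the ring. The 6-bit identifier space and the node IDs below are assumptions chosen to keep the numbers small.

```python
from bisect import bisect_left

M = 6                     # identifier bits, so the ring has 2**M = 64 positions (assumed)

def successor(ring, ident):
    """First node at or clockwise after `ident` on a sorted list of node IDs."""
    i = bisect_left(ring, ident % 2**M)
    return ring[i % len(ring)]        # wrap around past the largest ID

def finger_table(ring, n):
    """finger[i] = successor(n + 2**i) for i = 0 .. M-1."""
    return [successor(ring, n + 2**i) for i in range(M)]

ring = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print(finger_table(ring, 8))   # [14, 14, 14, 21, 32, 42]
```

Each node stores only M entries, yet the last finger reaches roughly halfway around the ring.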

(52)

Compare Search in Several Peer-to-Peer Systems

• Napster: central search engine

• Freenet: search towards keys, but no guarantees

• Chord:

– Map keys to linear search space

– Keep pointers (fingers) into exponential places around space

(53)

Basic Search in Key Space

• Finger table lets you quickly get around circle

– First step gets halfway there

– Next step gets a quarter

– etc.

• Takes advantage of the fact that, in the Internet, everything’s pretty close

– Goal is to ask few questions in logical space, not to ask questions of topologically near nodes
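The halving behaviour can be sketched as an iterative lookup that always jumps to the farthest finger that does not pass the key. This is a simulation over a global sorted list of node IDs, which a real node would not have; the 6-bit ring, the node IDs, and the helper names are assumptions.

```python
from bisect import bisect_left

M = 6
SIZE = 2**M

def successor(ring, ident):
    """First node at or clockwise after `ident` on a sorted list of node IDs."""
    i = bisect_left(ring, ident % SIZE)
    return ring[i % len(ring)]

def between(x, a, b):
    """True if x lies in the ring interval (a, b], going clockwise from a."""
    x, a, b = x % SIZE, a % SIZE, b % SIZE
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(ring, start, key):
    """Return (node responsible for key, nodes asked along the way)."""
    n, path = start, [start]
    while True:
        succ = successor(ring, n + 1)
        if between(key, n, succ):
            return succ, path             # our successor owns the key
        fingers = [successor(ring, n + 2**i) for i in range(M)]
        nxt = succ                        # fallback: plain successor pointer
        for f in reversed(fingers):       # farthest finger first
            if f != key % SIZE and between(f, n, key):
                nxt = f                   # farthest finger strictly before the key
                break
        n = nxt
        path.append(n)

ring = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print(lookup(ring, 8, 54))   # (56, [8, 42, 51])
```

Node 8 needs only two forwarding steps to find the owner of key 54 among ten nodes.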

(54)

Mapping Real Nodes to Key Space

• Must map keys to nodes to do search

• Not all keys have real nodes

– Nodes must cover whole space

– Pointers point to nodes that are present

(55)

Node Joins

• Must keep successors and finger table current

• Use successors for correctness

– Can always fall back on them to find a key

• Use finger table for performance

– Must update it, but can tolerate temporary errors

• Keep successor and predecessor so we can update our neighbors

(56)

Join Example

[Figure: the ring before node 6 joins and after node 6 joins]

When new node enters, it establishes its successor and predecessor and then builds its finger table, and moves any keys it now “owns”

(57)

Topology Maintenance

• Uses stabilization algorithm to confirm ring is correct

– Every 30s, confirm that your successor knows about you

• If not, fix either it or yourself

• Why would it be wrong?

• Why would you be wrong?

• Dealing with unexpected failures:

– Keep successor list of r next neighbors
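The stabilization check can be sketched with two methods, `stabilize` and `notify`, in the spirit of the Chord paper's pseudocode. The `Node` class, the 6-bit ring, and the loop that stands in for the 30-second timer are simplifications; failure handling and the successor list are left out.

```python
SIZE = 2**6   # assumed 6-bit identifier ring

def in_open(x, a, b):
    """True if x lies strictly inside the ring interval (a, b)."""
    x, a, b = x % SIZE, a % SIZE, b % SIZE
    if a == b:
        return x != a
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        # Ask the successor who it thinks precedes it; adopt that node if it
        # sits between us and our current successor (we were out of date).
        p = self.successor.predecessor
        if p is not None and in_open(p.id, self.id, self.successor.id):
            self.successor = p
        self.successor.notify(self)       # make sure our successor knows about us

    def notify(self, candidate):
        # candidate believes it is our predecessor; accept if it is closer.
        if self.predecessor is None or in_open(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

a, b, c = Node(8), Node(21), Node(14)
a.successor, b.successor = b, a
a.predecessor, b.predecessor = b, a
c.successor = b                 # node 14 joins knowing only its successor

for _ in range(2):              # stands in for the periodic timer
    for node in (c, a, b):
        node.stabilize()
print(a.successor.id, c.successor.id, b.successor.id)   # 14 21 8
```

Node 8 never hears from node 14 directly; it learns about it by asking its old successor, which is why the successor can be "wrong" for a short time after a join.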

(58)

Key Distribution

• Data is distributed unevenly

– Since data hashes and node IDs are random

– And node distribution around ring may be uneven

• To reduce this, create virtual nodes

– “More” nodes gives data more chance to even out
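The effect of virtual nodes can be explored with a small experiment: place each host at one or several hashed points on the ring and count how many keys each host owns. The host names, key names, and the choice of 8 hosts and 2000 keys are arbitrary assumptions for this sketch.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def h(text: str) -> int:
    """160-bit SHA-1 hash of a string, used as a ring position."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")

def key_load(nodes, vnodes, num_keys=2000):
    """Count how many of num_keys synthetic keys each real node owns."""
    # Each real node appears at `vnodes` pseudo-random points on the ring.
    ring = sorted((h(f"{node}#{v}"), node) for node in nodes for v in range(vnodes))
    points = [p for p, _ in ring]
    load = Counter({node: 0 for node in nodes})
    for k in range(num_keys):
        # A key belongs to the first point clockwise from its hash.
        i = bisect_left(points, h(f"file-{k}")) % len(ring)
        load[ring[i][1]] += 1
    return load

nodes = [f"node{i}" for i in range(8)]
for vnodes in (1, 32):
    load = key_load(nodes, vnodes)
    print(vnodes, "virtual node(s) per host: min", min(load.values()),
          "max", max(load.values()))
```

With more points per host, the gap between the lightest and heaviest host typically narrows, at the cost of more ring state to maintain.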

(59)

Other Performance Results

• Search path length scales as log(N), where N is the number of nodes

• Latency seems reasonable

(60)

Comparing to Other p2p Systems

• Search?

• Update?

• Redundancy?

• Other features?

(61)

Geographic Hash Tables

Ratnasamy, Karp, Yin, Yu, Estrin, Govindan, Shenker

(62)

Key ideas

• 3rd generation peer-to-peer systems

• Blends peer-to-peer and sensor networks

• Like Chord (hashing), but hashes to geographic space rather than virtual space

• Identifies data-centric storage as alternative to data-centric routing

(63)

Basic Idea

• Hash data to a physical location

– Spreads load

– Avoids sending all data to user (otherwise user becomes a hotspot)

• Use GPSR (Greedy Perimeter Stateless Routing) to get there

• Shift in thinking for sensor nets from search to storage

(64)

GPSR

• Greedy

– Most of the time

– Moves towards target

• Perimeter

– When holes are encountered

– Always required at the end
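The greedy half of GPSR can be sketched as: forward to the neighbor that is geographically closest to the target, and stop when no neighbor is closer than the current node. Perimeter mode is not implemented here; the sketch only reports where it would have to take over. The function names and the toy topology are assumptions.

```python
import math

def greedy_next_hop(positions, neighbors, current, target):
    """Neighbor strictly closer to target than `current`, or None if stuck."""
    def dist(name):
        return math.dist(positions[name], target)
    best = min(neighbors[current], key=dist, default=None)
    if best is None or dist(best) >= dist(current):
        return None        # local maximum: perimeter mode would take over here
    return best

def greedy_route(positions, neighbors, source, target):
    """Follow greedy hops until no neighbor makes progress."""
    path = [source]
    while True:
        nxt = greedy_next_hop(positions, neighbors, path[-1], target)
        if nxt is None:
            return path
        path.append(nxt)

positions = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 1)}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(greedy_route(positions, neighbors, "a", (3, 1)))   # ['a', 'b', 'c', 'd']
```

When the target is a hashed location with no node exactly on it, greedy forwarding always ends at some node that has no closer neighbor, which is why the perimeter step is needed at the end.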

(65)

Geographic Hash Table

• Store data at the home node

– Hash key into target location

– Defined as the node closest to the target location

– Get there with GPSR

• Place copies of data at home perimeter

– Why?

– How?

(66)

Geographic Hash Table

• Store data at the home node

– Hash key into target location

– Defined as the node closest to the target location

– Get there with GPSR

• Place copies of data at home perimeter

– Why?

• Reliability (in case home moves or dies)

– How?

• Have to walk perimeter to find home
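Hashing a key to a target location and picking the home node can be sketched as follows. The use of SHA-1, the field dimensions, and the node coordinates are assumptions; a real deployment would reach the home node through GPSR rather than a global search over all positions.

```python
import hashlib

def hash_to_location(key: str, width: float, height: float):
    """Map a data name to a point inside a width x height field."""
    d = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(d[:4], "big") / 2**32 * width     # first 4 bytes -> x
    y = int.from_bytes(d[4:8], "big") / 2**32 * height   # next 4 bytes -> y
    return x, y

def home_node(nodes, point):
    """The home node is the node geographically closest to the hashed point."""
    px, py = point
    return min(nodes, key=lambda n: (nodes[n][0] - px) ** 2 + (nodes[n][1] - py) ** 2)

nodes = {"n1": (10.0, 10.0), "n2": (80.0, 20.0), "n3": (50.0, 90.0)}
target = hash_to_location("elephant-sighting", 100.0, 100.0)
print(target, home_node(nodes, target))
```

Every node that hashes the same key computes the same target point, so producers and consumers of the data meet at the same home node without a directory.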

(67)

Structured Replication

• Perimeter replication for reliability

• Structured replication

– Why? For load balancing

– How:

• Map data to multiple locations (on a quad tree)

• (somehow) decide when to shift from just using one location to 4, or 16, or…
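One way to picture the quad-tree mapping: split the field into 4^d equal cells and repeat the hashed location at the same offset inside every cell. The function `mirror_points` and the square field are assumptions for this sketch, and it does not address how the depth d is chosen.

```python
def mirror_points(root, side, depth):
    """Return the 4**depth mirror images of `root` in a side x side field."""
    cells = 2 ** depth               # cells per axis at this depth
    cell = side / cells              # side length of one cell
    # Offset of the root inside its own cell; every mirror keeps this offset.
    ox, oy = root[0] % cell, root[1] % cell
    return [(i * cell + ox, j * cell + oy)
            for i in range(cells) for j in range(cells)]

print(mirror_points((3.0, 3.0), 100.0, 1))
# [(3.0, 3.0), (3.0, 53.0), (53.0, 3.0), (53.0, 53.0)]
```

A node can then store data at the mirror nearest to it, while a query has to visit all mirrors, so deeper trees trade cheaper storage for more expensive lookups.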

(68)

Comparing to Other p2p Systems

• Search?

• Update?

• Redundancy?

• Other features?

(69)
