Academic year: 2022


GnuShare: Enforcing Sharing in Gnutella-style Peer-to-Peer Networks

Antonio Garcia-Martinez, Department of Physics, agm@socrates.berkeley.edu

Michal Feldman, School of Information Management and Systems, mfeldman@sims.berkeley.edu

University of California at Berkeley

Abstract - The free riding phenomenon challenges the performance of peer-to-peer networks. In this paper we propose GnuShare, a mechanism that enforces sharing in Gnutella-style peer-to-peer networks. The key idea of the algorithm is to tie a user's quality of service to the degree to which he contributes to the system, and to do so in a completely decentralized way. Users who share files are compensated with additional connectivity to the network through the downloading node. This feature creates an incentive for users to share files, and thereby increases the percentage of sharing participants.

I. INTRODUCTION

The 'tragedy of the commons' is a phrase denoting the inevitably selfish use of public goods in the face of minimal stewardship. In computer networks the digital commons in question is typically bandwidth. Selfish behavior might lead to unfair allocation of resources. In the network layer, mechanisms such as fair queueing and admission control policies have largely solved the problem of selfish users maximizing their own profit to everyone else's detriment. With the rise of peer-to-peer (P2P) networks, the application layer of networked computers has suddenly become a public good accessible and exploitable by all, with disappointingly familiar results. As is often reported, in the popular Gnutella network, 66% of users make no contribution to the network's file offerings, and the top 1% of file-sharing nodes satisfy a full 47% of file requests [1]. Without even social disapproval as a motivating force (due to Gnutella's inherent anonymity), users devolve into their most selfish form, and the phenomenon of free riding is observed to a large degree.

This being the case, the challenge is to either make selfish use futile and unprofitable, or make generous and altruistic use lucrative, so that the more generous a user is, the better the service he receives.

However, tying quality of service to behavior is tricky in Gnutella, as, with no centralized authority, there is no guarantor of worth or credibility. And, as everyone is completely anonymous, the effect of reputation, either as a spur to good behavior or as a prod to bad, does not exist. Something more concrete and binding, perhaps inherent to the system protocol, is necessary to bring order to the anarchic P2P world.

II. RELATED WORK

Some previous work tried to avoid the free riding problem in peer-to-peer networks by enforcing sharing. One such example goes back to bulletin-board services (BBS), where users were permitted to download only after they had contributed to the system in the form of an upload. This scheme, however, is very easy to cheat: all it takes is to upload a file, which can be low-quality content or even a randomly generated text file.

Another example is the mechanism of caching and replication. In Freenet [3], for instance, objects are replicated along the reverse path of the search and thus spread all over the network, forcing users, even those who have not downloaded the file, to share. This solution, however, suffers from other kinds of misbehavior, such as the uploading of illegal material onto the system. In addition, there is no cost associated with deleting the replicated file from one's shared directory; consequently, there is nothing to prevent the user from doing so.

In the real world, money is the medium of trust and worth. Carrying this idea over to the network sphere, previous researchers have put forward a currency scheme as a way to keep Gnutella honest [2]. Using micropayments, Golle and colleagues showed that a P2P network where users were credited for uploads and debited for downloads reaches an equilibrium wherein users demonstrate more 'altruistic' behavior than otherwise. Implicit in the model, however, is a central authority that admits nodes to the network, as well as distributes and backs the common currency.

Most digital cash schemes require a trusted authority outside of the vendor-buyer loop to either certify the digital currency, ensure security properties such as single-spending, or tie the virtual money to something of inherent worth (i.e. cold hard cash).

However, if the Gnutella network is to maintain its unfettered and untamable nature, then such a solution is not viable.

To see this, imagine for a moment implementing such a scheme in the current Gnutella network. There is no central authority, so every node would be responsible for printing its own currency. Such a currency would basically be IOUs, which take the form of digitally signed certificates saying something like "I, node A, in transaction number X, downloaded 132 credits of files" and node A would tender the certificate to node B, from whom he just downloaded an MP3. This certificate would represent a debt from A to B, which B can later redeem in the form of his own download from A, using A's 132 credits to download a comparably-sized file. Our currency must be transitive if it is to be called such, so node B also has the option of using his newly-gotten currency to pay for a download at some other node (say node C), or paying C with his own minted currency, if he feels he is capable of backing his currency with his own bandwidth. Assuming B pays with A's money, C can now contact A and solicit a file download using A's transferred debt. Node A checks the signature to assure the certificate is authentic and issued by him, checks the transaction number to prevent double-spending, and then honors C's request to the amount specified. If node B would rather keep A's debt to himself, he can simply print more money against future uses of his bandwidth by those he paid. And so on.

However, there are some real problems with such a scheme. There is nothing to keep a node from printing endless currency of its own, downloading file after file, and then disappearing, leaving his creditors high and dry. Even assuming a node is honest, given the chaotic topology of the Gnutella network, a node that has been paid in transferred debt may well not be able to reach the issuing node after the money has changed hands multiple times. Gnutella queries follow a TTL-limited flood pattern, and a node sees only its horizon of nodes within the sprawling network. A possible solution to this latter problem is encoding the physical network address in the currency, making the IOU certificate a signed tuple {nodeID, transaction number, credits, IP address of issuer}. Then, the issuer of currency would really be writing checks against future use of his bandwidth, and an intemperate node would find himself bombarded with search queries and file requests from all over the network, limiting his own ability to download.
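The signed tuple {nodeID, transaction number, credits, IP address of issuer} and the issuer's two checks (authenticity and double-spending) can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names are hypothetical, and an HMAC under a secret known only to the issuer stands in for a digital signature. That substitution is an assumption; it is enough for the issuer to recognize its own IOUs, but unlike a public-key signature it does not let third parties verify them.

```python
import hashlib
import hmac
import json


def issue_iou(secret: bytes, node_id: str, txn: int, credits: int, ip: str) -> dict:
    """Build the IOU tuple and tag it with a MAC only the issuer can compute."""
    body = {"node_id": node_id, "txn": txn, "credits": credits, "ip": ip}
    payload = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return {**body, "tag": tag}


class Issuer:
    """Hypothetical issuing node: verifies its own IOUs, honors each only once."""

    def __init__(self, secret: bytes, node_id: str, ip: str):
        self.secret, self.node_id, self.ip = secret, node_id, ip
        self.next_txn = 0
        self.redeemed = set()  # transaction numbers already honored

    def issue(self, credits: int) -> dict:
        self.next_txn += 1
        return issue_iou(self.secret, self.node_id, self.next_txn, credits, self.ip)

    def redeem(self, iou: dict) -> bool:
        # Recompute the tag from the presented fields and compare.
        expected = issue_iou(self.secret, iou["node_id"], iou["txn"],
                             iou["credits"], iou["ip"])
        if not hmac.compare_digest(expected["tag"], iou["tag"]):
            return False  # forged or altered certificate
        if iou["txn"] in self.redeemed:
            return False  # double-spending attempt
        self.redeemed.add(iou["txn"])
        return True
```

Whoever ends up holding the certificate, whether B or some later node C, would present it to the issuer, whose `redeem` accepts it exactly once.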

Of course, this does nothing against the first form of fraud listed above of issuing rubber currency and then disappearing (or, for that matter, of simply not honoring the issued currency). It also punishes the offending node, if he is punished at all, too late. Until the moment of the crash, the offending node can search and download as well as the next node, and even after he is flooded, can search and see as much of the network as anyone else.

III. GNUSHARE

A. Desired properties

We would like to couple a given node's past behavior much more tightly to his present network service, ensuring that a departing node loses as much as he gains in leaving, and that dishonest nodes are forever plagued with poor service.

Among Gnutella users, the real coin of the realm is search ability, i.e., finding that file you really want.

In distributed hash-table systems, such as CAN [4] or Chord [5], finding the desired file is guaranteed by the system. Since Gnutella is unstructured, there are no guarantees on file searches. One finds files in relation to one's connectivity to the network, and the more connected one is to nodes that are themselves either file-rich or well-connected, the more successful the search. In this paper, we propose a new search and connection protocol for an unstructured P2P network like Gnutella, that serves users in relation to their contribution to the network in a completely decentralized way.

B. Algorithm description

One can intuitively understand our algorithm by examining the most basic operation of our network: a file transfer.

Figure 1: A downloads a file from B; consequently, A becomes B's neighbor. (Diagram labels: the file, the digital 'cash', and the resulting directed edge B → A.)

In our example, node A downloads a file from node B; a priori we assume B is A's neighbor in the Gnutella graph, but the reverse is not necessarily true (note: our graph is directed, unlike the current Gnutella de facto standard). In exchange B makes A its neighbor, with the right to forward search requests through A (and his neighbors, as a result). A gains nothing other than the file, and pays B for its service with promises of future query service.¹ The protocol is simple, but retains some subtlety. If A wishes to download from B again, he promises yet more future search service, reflected as an increment in the weight of the directed edge from B to A. B slowly spends this incurred debt by channeling search queries through A in the usual TTL-limited flood.

The edge from B to A is eventually spent when the total cost of the searches B has sent through A equals the weight of the edge, which is in turn determined by the number of downloads A has made from B. Searches and downloads have their own associated costs, Cs and Cd respectively, with Cs << Cd, allowing several searches for one download.

¹ One is reminded of the Sicilian proverb, "I don't do favors, I collect debts." That is the implicit philosophy of all nodes on the network.

Finally, searches are propagated as they started. That is, when A eventually receives a query from B and does not itself possess the wanted file, it forwards the search to its neighbors, paying the associated search fee out of its debts with other nodes. Figure 2 presents a search process that ends with a successful download, and figure 3 presents the pseudo code of the algorithm.
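The bookkeeping described in this subsection can be sketched as a small ledger of weighted directed edges. This is only an illustration under stated assumptions: the `Ledger` class, its method names, and the values 100 and 1 for Cd and Cs are hypothetical, chosen so that Cs << Cd.

```python
from collections import defaultdict

C_D = 100  # download payment Cd (illustrative value, not from the paper)
C_S = 1    # search payment Cs, chosen so that Cs << Cd


class Ledger:
    """credit[b][a] is the weight of the directed edge b -> a,
    i.e. how much search service node a still owes node b."""

    def __init__(self):
        self.credit = defaultdict(lambda: defaultdict(int))

    def download(self, downloader, uploader):
        # The downloader pays with a promise of future search service,
        # which adds Cd to the weight of the edge uploader -> downloader.
        self.credit[uploader][downloader] += C_D

    def neighbors(self, node):
        # Nodes through which `node` can still afford to send a search.
        return [n for n, w in self.credit[node].items() if w >= C_S]

    def search_via(self, node, neighbor):
        """Spend Cs of node's credit with neighbor; False once the edge is spent."""
        if self.credit[node][neighbor] < C_S:
            return False
        self.credit[node][neighbor] -= C_S
        return True
```

With these values, one download by A from B lets B route 100 searches through A before the edge B → A is spent. A node forwarding B's query onward would call `search_via` on its own edges, paying out of the debts other nodes owe it.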

The propagation of searches is a minor sticking point, and bears some reflection. By gut intuition, it would seem nodes would tend to ignore incoming searches, as they imply costs (whether the file is found on the node itself or on its neighbors). However, if nodes ignore searches, they forfeit the ability to charge someone for a download, and with it the debts and new neighbors that will be useful when they themselves launch queries. Nodes could also honor the query to the point of seeing whether they have the file, and then decline to pass on the search, saving the money they would spend on search fees in the process. However, in the current Gnutella protocol specification, the PONG message (in reply to a PING, the node find message) is routed back via the Gnutella overlay network to the originator of the PING. Hence, any node that purports to be a leaf node by not forwarding queries could be easily discovered through clever PINGing, assuming it follows the minimal Gnutella specification (which could also be easily checked). In the end, the cost of one search is so small, and the possible reward from providing an upload so large, that we believe most nodes would honestly respond to queries. And if they did not, they would not only be cheating the queriers themselves, but also all their neighbors, to whom they are denying the possibility of providing an upload, hence creating pressure for everyone to verify that their neighbors are dealing with searches honestly.

C. GnuShare Implementation

As can be seen, connectivity is the currency of our scheme. Bookkeeping is done via a form of digital cash. In the first step above, after downloading the file, A issues to B a signed certificate stating "I owe B X number of credits." The X can either be a fixed fee, or depend on other factors such as the size of the file, and would be established globally at the beginning of the life of the network. In actual practice, the certificate could take the form of several digital cash notes of small denomination (small enough to be used in lower-price searches), in which case each note would carry a serial number to avoid double-spending. Or, the certificate could simply contain the running balance between A and B, which A would reissue for the appropriate amount following every transaction.

Figure 2: (a) search and download flows. Every dashed edge in the search process is associated with Cs, the search payment. (b) resulting additional directed edge. (Legend: searching node, possessing node, neighbor, search (Cs), upload file, digital cash (Cd).)

In the end, the key point is that downloads are permitted to anyone that can reach a given node via a search (which already implies something about the searcher, as discussed in a moment) and that the cash thus issued is good only for barter. B cannot take A's money and give it to someone else. He can only collect his debt by making A work for him. Any node's connectivity is ultimately determined by the amassed certificates it possesses, bearing a node name (and IP address) and an amount. A given node only responds to searches if the querier can present a signed certificate showing some pending debt, whereupon the node processes the query and then either destroys the certificate if it is a denominated micropayment, or destroys it and issues the querier a new certificate with the corresponding search fee deducted if the certificate is of the running balance kind.
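The running-balance variant of the certificate can be sketched as follows. The names `BalanceCert` and `admit_query` are hypothetical, and signature checking is omitted to keep the example short; the sketch only shows the rule that a query is processed when the querier presents a pending debt, after which the certificate is reissued with the search fee deducted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BalanceCert:
    debtor: str    # node that owes search service (A in the text)
    creditor: str  # node entitled to send queries through the debtor (B)
    balance: int   # credits still owed


def admit_query(me: str, cert: BalanceCert,
                search_fee: int) -> Tuple[bool, Optional[BalanceCert]]:
    """Decide whether node `me` processes a query, and return the
    certificate the querier holds afterwards (None once the debt is spent)."""
    if cert.debtor != me or cert.balance < search_fee:
        return False, cert  # no pending debt of mine covers this search
    remaining = cert.balance - search_fee
    if remaining == 0:
        return True, None  # old certificate destroyed, nothing left to reissue
    # Old certificate destroyed, new one issued with the fee deducted.
    return True, BalanceCert(cert.debtor, cert.creditor, remaining)
```

A denominated micropayment note would correspond to the case where the balance equals the search fee: the note is consumed and nothing is reissued.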

D. Node entry and exit

Currently, entering peers in Gnutella undergo a bootstrap process that connects them to one of a hard-coded list of prominent and ever-present hosts. Those hosts then provide them with additional neighbors.

In our scheme, the entry procedure is similar. Entering nodes are fronted some money by prominent hosts, enough to launch a few queries, but not enough to persist on the network long without participating.

Nodes that leave the network have their debts with other nodes cancelled, but also have all their credit with other nodes cancelled. This can be enforced by having all cash issued to a random but globally unique node ID that changes every time a node enters the network. This creates additional pressure for nodes to stay connected, which is the only way they are useful to the network.
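One way to picture the per-entry node ID is the sketch below. It assumes a random UUID is an acceptable stand-in for the "random but globally unique node ID"; the `Session` class is hypothetical and not part of the paper.

```python
import uuid


class Session:
    """One stay on the network; certificates name the session ID, not the host."""

    def __init__(self):
        # A fresh random ID is drawn every time the node enters the network.
        self.node_id = uuid.uuid4().hex

    def honors(self, cert_debtor_id: str) -> bool:
        # Only debts issued under the current session ID bind this node.
        # Certificates naming an earlier ID match nobody after re-entry,
        # so old debts and old credits are cancelled together.
        return cert_debtor_id == self.node_id
```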

IV. SIMULATION

A. Performance measurement

To gauge the performance of our scheme, we have constructed a Gnutella-like network testbed. In Gnutella, queries are propagated in a controlled flood, with a TTL-limited search scope, and with a buffer at each node recording recently seen queries as an anti-looping measure. Following a successful search, downloads are conducted out of network and directly between the two peers.
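The controlled flood can be sketched in a few lines. This is not the authors' simulator: the function name and data layout are hypothetical, a single shared `seen` set stands in for the per-node buffers of recently seen queries (the two give the same result in this synchronous, breadth-first setting), and nodes forward the query whether or not they hold the file, as in plain Gnutella.

```python
from collections import deque


def flood(graph, has_file, origin, wanted, ttl):
    """Return the set of nodes holding `wanted` within `ttl` hops of `origin`.

    graph:    dict mapping a node to the list of its outgoing neighbors
    has_file: dict mapping a node to the set of files it shares
    """
    seen = {origin}   # anti-looping: each node handles the query at most once
    hits = set()
    frontier = deque([(origin, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted, the query is not forwarded further
        for neighbor in graph.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if wanted in has_file.get(neighbor, ()):
                hits.add(neighbor)
            frontier.append((neighbor, remaining - 1))
    return hits
```

In the GnuShare variant, each forwarding step would additionally cost the forwarding node Cs and would be restricted to edges with remaining weight.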

Our protocol is essentially a quality-of-service scheme, hence it is by that measure that we judge its effectiveness. The two principal metrics are the ratio of successful queries to total queries for a given grouping of nodes, as well as what we term the 'file horizon.' This latter term denotes the subset of files among the entire space of files that a node 'sees' from its place in the network, given its connectivity and the limits of a flood search. Given that our protocol imposes a highly dynamic topology, we feel that these two metrics best capture the quality an end-user sees. We will define success to be a sharp difference in the resulting quality metrics among malicious and generous nodes, punishing the former by bad service, and not excessively bothering the latter with worse absolute service compared to unmodified Gnutella.
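The two metrics can be written down compactly. The sketch below is an illustration with hypothetical function names; it assumes a node's own files are not counted in its file horizon, a detail the text does not specify.

```python
def hit_rate(outcomes):
    """Ratio of successful queries to total queries; outcomes is a list of bools."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def file_horizon(graph, files, origin, ttl):
    """Set of files held by nodes reachable from `origin` within `ttl` hops.

    graph: dict mapping a node to the list of its outgoing neighbors
    files: dict mapping a node to the set of files it shares
    """
    reached = {origin}
    frontier = [origin]
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor not in reached:
                    reached.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    reached.discard(origin)  # assumption: the node's own files are excluded
    visible = set()
    for node in reached:
        visible |= set(files.get(node, ()))
    return visible
```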

B. Methodology

Our simulation world is populated by three types of users: altruistic users who always share files, selfish users who never share files, and calculating users who choose to share or not share depending on how well they are doing at the moment. A calculating user becomes selfish, hiding all its files from the rest of the network, once the cash it has accumulated from other nodes (in other words, the weight and number of its outbound edges) passes a certain threshold. If it does poorly and threatens to become disconnected, it becomes altruistic and shares.
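The decision rule of a calculating user can be sketched as a two-threshold switch. The function name and the use of two separate thresholds (`high` and `low`) are assumptions made for illustration; the text only states that such a user turns selfish above a certain threshold and altruistic when it threatens to become disconnected.

```python
def next_sharing_state(credit: int, sharing: bool, high: int, low: int) -> bool:
    """Return True if a calculating user shares in the next time step.

    credit:  total weight of the user's outbound edges (accumulated cash)
    sharing: whether the user is currently sharing
    high:    above this, the user feels rich enough to turn selfish
    low:     below this, the user risks disconnection and turns altruistic
    """
    if credit > high:
        return False
    if credit < low:
        return True
    return sharing  # between the thresholds, keep the current behavior
```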

The starting topology is a power-law network of 1,000 nodes produced by the BRITE topology generator [6]. The initial weight of edges is an arbitrary amount chosen to allow several initial searches, but not allowing nodes to survive for long without an upload. Files are distributed among nodes in a Zipf's law distribution where the ith most file-rich node possesses a number of files proportional to i^(-α), with α = 0.9. The entire space of files varies from 100 to 10,000 files depending on the size and duration of the simulation. Each node chooses the files it queries at random from among the files it does not possess.
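A file allocation of this kind can be sketched as below. It assumes the usual Zipf form, in which the ith most file-rich node holds a number of files proportional to i^(-α); the function name, the rounding, and the floor of one file per node are illustrative choices, not taken from the paper.

```python
def zipf_file_counts(num_nodes: int, total_files: int, alpha: float = 0.9):
    """Number of files per node, ordered from most to least file-rich.

    The node of rank i gets a share proportional to i ** (-alpha).
    Rounding means the counts only approximately sum to total_files.
    """
    weights = [rank ** (-alpha) for rank in range(1, num_nodes + 1)]
    scale = total_files / sum(weights)
    # Assumption: every node holds at least one file.
    return [max(1, round(weight * scale)) for weight in weights]
```

For example, `zipf_file_counts(1000, 10000)` gives one count per node in a 1,000-node network, with the counts decreasing by rank.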

Each node also has an associated bandwidth, which restricts the number of simultaneous downloads and uploads it can maintain in a given timestep. In our initial simulator, that is a fixed number for all nodes. Future studies will distribute bandwidth to match that seen in real Gnutella networks[7].

Our basic experimental procedure was to run our simulator, either as normal Gnutella or moneyed GnuShare, for several averaged runs at one given set of parameters (to smooth out some of the random deviation), and then iterate over that parameter (be it Cs, Cd or the relative proportion of selfish/altruistic/calculating nodes, or anything else) and see the resulting effect on our metrics.

In addition, we consider the time evolution of our metrics for stable system parameters. By gauging the metrics for every time step, rather than at the end of a simulation, we reveal interesting information about the relative performance of the three types of users as the algorithm plays itself out.

V. EVALUATION

As stated above, our two principal metrics are hit rate, defined as the ratio of successful to total searches for a given type of user (although not necessarily successful downloads, due to bandwidth limitations), and file horizon, defined as the space of files a user can download at a given point in time. In addition, since this paper reports only initial results and our intuition for chaotic Gnutella-style networks is weak, we consider the average number of files as a rough measure of running search success, as well as average outdegree, as a measure of connectivity.

In fig. 3 we observe the dependence of hit rate on the ratio of spayment (the search payment) to dpayment (the download payment). For all types of nodes, system performance decreases as the search price becomes a significant fraction of a download payment. Such a dependence exists for all of our metrics, and finding the right values to use for a given set of network conditions was one of the principal challenges of these simulations. The dependence on this crucial parameter was so strong that we are convinced it must be adjusted dynamically to meet network conditions, as discussed below.

In fig. 4, we plot the results of one simulation of 1,000 nodes over 100 time steps, with a node population that is 30% altruistic, 30% selfish and 40% calculating. Very quickly, as seen in the average node degree plot 4b, the selfish nodes find themselves disconnected from the network, while altruistic nodes grow more connected. The calculating nodes start off selfish, and then rapidly turn altruistic as they begin to suffer the same fate as the constantly selfish nodes. Since throughout the simulation some calculating nodes are always acting altruistically and others selfishly, the line for calculating nodes is essentially a weighted average of the altruistic and selfish lines.

Performance-wise, we see that selfish nodes are also functionally excluded from the network. Their hit rate (plot 4c) goes to essentially zero, reflecting their isolation from the network. After the selfish nodes die off, the altruistic nodes (as well as those calculating nodes who are acting altruistically) have a mild increase in hit rate, due perhaps to the increasing dissemination of files among nodes willing to share them (which in turn is a result of selfish nodes no longer tying up bandwidth). The file horizon (plot 4d) follows hit rate closely, as one intuitively expects.

Figure 3: Average hit-rate of selfish, altruistic and calculating nodes, for various values of spayment/dpayment. (Axes: s-d-payment ratio against average hit rate.)

It is worth commenting that at no point does the GnuShare version outperform, at least according to these specific metrics, plain, unsecured Gnutella. However, before abandoning the currency idea as useless, recall that all the current metrics simply reflect topology, not end-user performance. For example, while the hit rate of GnuShare (at least for the parameters we have explored, which is nowhere near the entire space) is moderately less than that of Gnutella, the effective hit rate might in fact be higher.

By effective hit rate we mean the ratio of searches that are both successful and followed by a successful download to total searches; on this measure GnuShare may do better than the hit rate as earlier defined suggests. In other words, being able to find the file is not the same as being able to find it and download it. We feel that the improvement in available bandwidth following the exclusion of selfish nodes may well outweigh the moderate decrease in search ability following implementation of the currency scheme. Our simulation thus far does not take sufficient account of bandwidth as a system parameter, and future versions will remedy this shortcoming.

It is also worth mentioning the importance of the TTL parameter in both money and non-money simulations. In plot 4c, the upper line represents the hit rate of moneyless Gnutella, which is close to one throughout. This is probably an artifact of our use of a power-law graph as input. As prior studies have shown [8], random searches in power-law graphs are particularly efficient, and with a network size of only 1,000 and file space of 1,000, a TTL of 3 is enough to reach basically all files.

GnuShare presumably shares this same bias. However, here the TTL question is more subtle. Experimentation showed that the performance of GnuShare is related to TTL in non-intuitive ways, with some simulations showing drastically reduced performance for longer TTLs. We believe this is due to searches propagating needlessly throughout the network, imposing costs on the propagating nodes as the search expands. Hence, while for one given search a large TTL might be beneficial, for overall performance of the network controlling the TTL is essential.

Figure 4: Performance metrics over 100 timesteps (1000 nodes; TTL = 3; file space = 1000; s-d-payment ratio = 0.001). Panels, all plotted against time: (a) percentage of calculating users acting altruistically; (b) average node degree; (c) average hit rate; (d) average file horizon. Panels (b) to (d) show curves for selfish, altruistic, calculating and no-money nodes. The non-money curves represent the average of the selfish and the altruistic users, as their performance with regard to these metrics is about the same.

VI. FUTURE WORK

Our results can only be called provisional, if initially promising. Future versions of our simulator must endeavor to capture the subtle dynamics of bandwidth and connectivity that drive users' actions and decide quality of service. Currently, our calculating users guide their behavior only by the number and weight of their edges to the rest of the network. A more realistic measure would hinge on the availability of sought-after files, giving the user the choice of selfish behavior and mediocre service, or premium service for more altruistic behavior, depending on his needs at the moment and past success.

Another subtle question is the magnitude of the two driving parameters of the network: spayment and dpayment. Currently, these are static, but in any real network they will likely need to vary dynamically to handle changing network conditions. As seen in our preliminary results, the optimal values of spayment and dpayment vary for different conditions of connectivity, user populations, number of files, or search queries. Badly tuned, our scheme can cause outright disaster, shutting out honest and dishonest users alike. To achieve a dynamic equilibrium, users will need to vary these two parameters depending on locally perceived need. For example, if a user finds it is short on connectivity, it can choose to offer a download for less than usual, passing its selling 'price' along with the results of the flood search. The querying user will then likely cull the search results according to price, indirectly creating a market for a particular file, and regulating the amount of future work a node exacts for a given upload.

We stress, however, that these network machinations should be unseen by the user. Rational thought imposes a mental cost on the user, and so our vision of a real-world GnuShare client involves minimally sophisticated software that negotiates the bandwidth/information market with minor supervision. Via some high-level quality setting, a user can tune the tradeoff between bandwidth (and possibly download latency) and search ability, changing the setting when he sees fit.

VII. CONCLUSION

We have shown that in Gnutella-style networks, a simple currency (or really, barter) scheme can essentially block out selfish nodes without seriously harming the service of honest users. By tying the service the network gives a user directly to the service the user gives the network, both eventually benefit. The key challenge is in setting the key parameters of the exchange to match current network conditions, a calculation we feel needs to be done dynamically.

In the end, whether a currency-secured network can work as well or better than unsecured Gnutella is a subtle question, and in our opinion an open one.

REFERENCES

[1]. E. Adar and B. Huberman, "Free-riding on Gnutella," First Monday 5(10), Oct. 2000.

[2]. P. Golle, K. Leyton-Brown, I. Mironov, and M. Lillibridge, "Incentives for Sharing in Peer-to-Peer Networks," in Proceedings of the 3rd ACM Conference on Electronic Commerce, pages 264-267, Tampa, FL, October 2001.

[3]. I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong, "Freenet: A Distributed Anonymous Information Storage and Retrieval System," ICSI Workshop on Design Issues in Anonymity and Unobservability, July 2000.

[4]. S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A Scalable Content-Addressable Network," in Proc. ACM SIGCOMM 2001, August 2001.

[5]. I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications," in Proc. ACM SIGCOMM, San Diego, CA, August 2001.

[6]. BRITE topology generator, http://cs-www.bu.edu/faculty/matta/Research/BRITE/

[7]. S. Saroiu, P. Krishna Gummadi, and S. D. Gribble, "A Measurement Study of Peer-to-peer File Sharing Systems," in Proc. MMCN '02, San Jose, CA, January 2002.

[8]. L. A. Adamic, R. Lukose, A. Puniyani, and B. Huberman, "Search in Power-law Networks," Phys. Rev. E 64, 046135.
