Chapter 5: Simulation Results

5.5 Web Proxy Log Files

Parameter          Value
Log date           January 9, 2007
Duration           1 day
URL set            296,269 URLs
Queried URL set    1,415,075 URLs

Table 5-5-1 Profile of the web proxy trace log files

The trace log files were downloaded from [23]. They were generated by web proxies in the United States and follow the Squid log format. A record is selected for evaluating the dynamic Bloom filters if its result code [24, section 6.7] contains the keyword “HIT” but is not “TCP_NEGATIVE_HIT”. The latter indicates that the queried web document does not exist in the cache. The remaining HIT codes indicate that the web proxy holds the corresponding document in its cache. The profile of these trace log files is shown in Table 5-5-1.

Figure 5-4-1 Performance of dynamic Bloom filters verified by eDonkey tracker logs
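The record-selection rule above can be sketched in Python. This is a minimal sketch that rests on assumptions not stated in the text:

- Each trace line follows the Squid native access.log layout (time, elapsed, client, action/status, bytes, method, URL, ...).
- The helper names `is_cached_hit` and `extract_hit_urls` are hypothetical.
- The "URL set" is read as the distinct URLs among the selected records, and the "queried URL set" as all selected records in log order.

The actual IRCache traces may use a different or sanitized field layout.

```python
from typing import Iterable, List, Set, Tuple


def is_cached_hit(result_field: str) -> bool:
    """True if a Squid result field such as "TCP_MEM_HIT/200" counts as a hit.

    The action part (before "/") must contain "HIT" but must not be
    TCP_NEGATIVE_HIT, which the text treats as "document not in cache".
    """
    action = result_field.split("/", 1)[0]
    return "HIT" in action and action != "TCP_NEGATIVE_HIT"


def extract_hit_urls(lines: Iterable[str]) -> Tuple[Set[str], List[str]]:
    """Return (distinct hit URLs, hit URLs in log order).

    Assumes the Squid native access.log layout (hypothetical for these traces):
    time elapsed client action/status bytes method URL ident hierarchy/peer type
    """
    url_set: Set[str] = set()
    queried: List[str] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7:  # skip blank or truncated records
            continue
        result_field, url = fields[3], fields[6]
        if is_cached_hit(result_field):
            url_set.add(url)
            queried.append(url)
    return url_set, queried


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the real traces.
    sample = [
        "1168300800.000 5 10.0.0.1 TCP_HIT/200 512 GET http://a.example/ - NONE/- text/html",
        "1168300801.000 9 10.0.0.2 TCP_MISS/200 512 GET http://b.example/ - DIRECT/192.0.2.1 text/html",
        "1168300802.000 3 10.0.0.3 TCP_NEGATIVE_HIT/404 0 GET http://c.example/ - NONE/- text/html",
        "1168300803.000 2 10.0.0.1 TCP_MEM_HIT/200 512 GET http://a.example/ - NONE/- text/html",
    ]
    urls, queries = extract_hit_urls(sample)
    print(len(urls), len(queries))  # prints: 1 2
```

Testing the action string as a whole (rather than searching for "NEGATIVE") keeps the rule exactly as the text states it: any code containing "HIT" is accepted except TCP_NEGATIVE_HIT.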

In this experiment, the eight trace log files are merged and their URLs are reallocated to eight Bloom Filters to simulate the operation of cooperative web proxies. Each group is assigned 37,034 URLs. The URLs inserted into the dynamic Bloom filters are distributed randomly, while the queried URLs are issued in the same order as in the trace log files. For each queried URL, at most eight Bloom Filters need to be checked. The performance of the dynamic Bloom filters is shown in Figure 5-5-1. The results are similar to those of the first experiment: our scheme outperforms the linear and reverse query orders because, as before, the queried data exhibit temporal locality. Up to 43% of the membership-check cost is saved.
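The experiment can be approximated with the sketch below. It deals URLs randomly into eight groups and counts how many filters are probed per query under three query orders. It is an illustration under stated assumptions, not the thesis's implementation:

- Python sets stand in for Bloom filters, so there are no false positives.
- The function names are hypothetical.
- The adaptive order is modelled as a move-to-front rule. This is one reading of the conclusion's description of reordering a BF as soon as it is hit.

```python
import random
from typing import List, Sequence, Set


def partition_randomly(urls: Sequence[str], n_filters: int, seed: int = 0) -> List[Set[str]]:
    """Shuffle the URLs and deal them round-robin into n_filters groups.

    Group sizes differ by at most one (e.g. 296,269 URLs -> 37,033 or 37,034).
    """
    shuffled = list(urls)
    random.Random(seed).shuffle(shuffled)
    return [set(shuffled[i::n_filters]) for i in range(n_filters)]


def membership_check_cost(filters: Sequence[Set[str]], queries: Sequence[str], policy: str) -> int:
    """Total number of filters probed over all queries.

    policy: "linear" (0, 1, ..., n-1), "reverse" (n-1, ..., 0), or
    "move_to_front" (hypothetical stand-in for the adaptive order: the
    filter that answers a query is moved to the head of the order).
    """
    n = len(filters)
    order = list(range(n - 1, -1, -1)) if policy == "reverse" else list(range(n))
    probes = 0
    for q in queries:
        for pos, idx in enumerate(order):
            probes += 1
            if q in filters[idx]:  # sets have no false positives, unlike real BFs
                if policy == "move_to_front":
                    order.insert(0, order.pop(pos))
                break  # stop at the first filter that reports membership
    return probes


if __name__ == "__main__":
    # Hypothetical data: 8 filters of 100 URLs each and a bursty query stream
    # (10 requests for a URL in filter 0, then 10 for a URL in filter 7).
    filters = [{f"url-{b}-{i}" for i in range(100)} for b in range(8)]
    bursty = ["url-0-0"] * 10 + ["url-7-0"] * 10
    for policy in ("linear", "reverse", "move_to_front"):
        print(policy, membership_check_cost(filters, bursty, policy))
    # prints: linear 90, reverse 90, move_to_front 27 (one per line)
```

With a bursty query stream, the move-to-front order pays the full scan cost only on the first query of each burst. This is the temporal-locality effect the text relies on. A fixed order pays the same cost on every repeat.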

5.6 Changes in the False Positive Ratio of the Bloom-g Filter

Parameter                    Value
Popularity distribution      Random
Data set                     30,720 (from domain name set [20])
Queried data set             3,072,000 (excluding the data set)
Capacity of a BF             Randomly selected from [20]
Number of blocks per datum   1024, 1536 and 3072
Number of blocks             2, 3 and 4

Table 5-6-1 False Positive Ratio of N-MABF for Different Numbers of Blocks

The Bloom-1 Filter is very efficient to access, because a membership check needs only one memory access [12]. However, its false positive ratio is high because data are set non-uniformly across the blocks. The Bloom-g Filter was therefore proposed to solve this problem. It divides the k independent hash functions among multiple blocks, and the false positive ratio decreases as the number of blocks increases. With two blocks, the false positive ratio is already significantly improved. Once the number exceeds two, adding further blocks has a smaller effect on the false positive ratio. The improvement is shown in Table 5-6-2.

Figure 5-5-1 Performance of dynamic Bloom filters verified by web proxy trace logs

Value of N   False positive ratio of 10 N   False positive ratio of 20 N   False positive ratio of 30 N
2            1.65E-04                       3.26E-04                       5.02E-04
3            2.95E-05                       4.20E-05                       6.43E-05
4            2.54E-05                       3.22E-05                       4.52E-05

Table 5-6-2 False positive ratio of N-MABF caused by various N
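To make the Bloom-1/Bloom-g trade-off concrete, the sketch below builds such a filter and measures its false positive ratio empirically. The filter's m bits are split into machine words; each item is mapped to g words, and k/g bits are set in each. It is one reading of the construction in [12], not the N-MABF evaluated in Table 5-6-2. The class name `BloomG`, the 500 words of 64 bits, k = 8, n = 4000 and the SHA-256-based hashing are all hypothetical choices.

```python
import hashlib
from typing import Iterator, Tuple


def _hash(item: str, seed: int) -> int:
    """64-bit hash of item under a given seed (SHA-256 used only for convenience)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class BloomG:
    """Bloom-g sketch: m = num_words * word_bits bits; each item maps to g words
    and sets k/g bits in each of them (g = 1 corresponds to Bloom-1)."""

    def __init__(self, num_words: int, word_bits: int, k: int, g: int) -> None:
        if k % g != 0:
            raise ValueError("this sketch assumes g divides k")
        self.num_words, self.word_bits, self.k, self.g = num_words, word_bits, k, g
        self.words = [0] * num_words

    def _positions(self, item: str) -> Iterator[Tuple[int, int]]:
        """Yield (word index, bit index) pairs: g words, k/g bits per word."""
        bits_per_word = self.k // self.g
        for j in range(self.g):
            word = _hash(item, j) % self.num_words
            for t in range(bits_per_word):
                bit = _hash(item, self.g + j * bits_per_word + t) % self.word_bits
                yield word, bit

    def add(self, item: str) -> None:
        for word, bit in self._positions(item):
            self.words[word] |= 1 << bit

    def __contains__(self, item: str) -> bool:
        return all((self.words[word] >> bit) & 1 for word, bit in self._positions(item))


def false_positive_ratio(g: int, n: int = 4000, num_words: int = 500,
                         word_bits: int = 64, k: int = 8, queries: int = 20000) -> float:
    """Insert n members, then probe `queries` non-members and count positives."""
    bf = BloomG(num_words, word_bits, k, g)
    for i in range(n):
        bf.add(f"member-{i}")
    positives = sum(f"query-{i}" in bf for i in range(queries))
    return positives / queries


if __name__ == "__main__":
    for g in (1, 2, 4, 8):
        print(g, false_positive_ratio(g))
```

These hypothetical parameters give about 8 bits per item. With them, g = 1 should give a noticeably higher ratio than g = 4, which follows the direction of the trend in Table 5-6-2. The exact figures depend on the hash function and are not meant to reproduce the table.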

Chapter 6: Conclusions

Bloom Filters offer space efficiency and constant-time membership checks. Applying Bloom Filters to filter incoming data improves system performance by discarding irrelevant data. However, when Bloom Filters are used to manage the information of a distributed system, a membership check may have to be executed against many Bloom Filters. The average search cost therefore increases with the number of cooperative peers.
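A short calculation shows why. Assume, as a simplification not made in the text, that:

- a queried datum resides in exactly one of n Bloom Filters, each equally likely;
- the filters are checked in a fixed order until the first hit;
- false positives are ignored.

The expected number of membership checks is then

```latex
E[C] = \sum_{i=1}^{n} i \cdot \frac{1}{n} = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}
```

which grows linearly with n. For example, n = 8 gives 4.5 checks per query, and a datum held by none of the filters costs all n checks.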

This thesis focused on this issue and proposed a concise scheme for reducing the average cost of membership checks. Because web requests exhibit temporal locality, popular queried documents can be given higher query priority when multiple Bloom Filters are checked. We used three real trace logs, from a NASA web server, an eDonkey tracker server and web proxies, to compare the performance of our scheme with the linear and reverse query orders. Our scheme always performed better. This is because it immediately changes the query order of a BF once that BF is hit by a queried datum. Hence, the search cost of popular data drops rapidly and more memory accesses are saved.

References

[1] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, 1970.

[2] L. Li, et al., "A variable length counting Bloom filter," in Computer Engineering and Technology (ICCET), 2010 2nd International Conference on, 2010, pp. V3-504-V3-508.

[3] F. Bonomi, et al., "An improved construction for counting bloom filters," in Algorithms–ESA 2006, ed: Springer, 2006, pp. 684-695.

[4] D. Ficara, et al., "Multilayer compressed counting bloom filters," in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 2008, pp. 311-315.

[5] M. Mitzenmacher, "Compressed bloom filters," IEEE/ACM Transactions on Networking (TON), 2002.

[6] H. Song, et al., "IPv6 lookups using distributed and load balanced bloom filters for 100Gbps core router line cards," in INFOCOM 2009, IEEE, 2009, pp. 2518-2526.

[7] S. Dharmapurikar, et al., "Longest prefix matching using bloom filters," in Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, 2003, pp. 201-212.

[8] K. Shanmugasundaram, et al., "Payload attribution via hierarchical bloom filters," in Proceedings of the 11th ACM conference on Computer and communications security, 2004, pp. 31-41.

[9] P. Jokela, et al., "LIPSIN: line speed publish/subscribe inter-networking," in ACM SIGCOMM Computer Communication Review, 2009, pp. 195-206.

[10] P. B. Danzig, et al., "A case for caching file objects inside internetworks," 1993.

[11] L. Fan, et al., "Summary cache: a scalable wide-area web cache sharing protocol," IEEE/ACM Transactions on Networking (TON), 2000.

[12] Y. Qiao, et al., "One memory access bloom filters and their generalization," in INFOCOM, 2011 Proceedings IEEE, 2011, pp. 1745-1753.

[13] D. Guo, et al., "The dynamic bloom filters," IEEE Transactions on Knowledge and Data Engineering, 2010.

[14] M. Xiao, et al., "TMBF: Bloom filter algorithms of time-dependent multi bit-strings for incremental set," in Ultra Modern Telecommunications & Workshops, 2009. ICUMT'09. International Conference on, 2009.

[15] S. Jin and A. Bestavros, "Sources and characteristics of Web temporal locality," in Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2000. Proceedings. 8th International Symposium on, 2000, pp. 28-35.

[16] A. Mahanti, et al., "Temporal locality and its impact on Web proxy cache performance," Performance Evaluation, 2000.

[17] L. Breslau, et al., "Web caching and Zipf-like distributions: Evidence and implications," in INFOCOM'99. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, 1999, pp. 126-134.

[18] P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithms," in Usenix symposium on internet technologies and systems, 1997, pp. 193-206.

[19] P. Hoel, S. Port, and C. Stone, "Introduction to Stochastic Processes," Houghton Mifflin, 1972.

[20] Open Directory Project. Available: http://rdf.dmoz.org/

[21] NASA web server trace log file. Available: http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html

[22] eDonkey trace log file. Available: http://fabrice.lefessant.net/traces/edonkey2/

[23] Web proxies trace log files. Available: ftp://ftp.ircache.net/Traces/DITL-2007-01-09/

[24] Squid. Available: http://www.comfsm.fm/computing/squid/FAQ-6.html