Dynamic Reordering Bloom Filters

In this section, we introduce how to lower the cost of checking and maintaining membership across multiple Bloom Filters. Two factors are considered: (1) the policy for changing the query order of the Bloom Filters, and (2) reducing the overhead caused by changing that order. Our scheme improves search performance on the basis of TMBF.

3.1 Motivation

The distribution of queried web documents is unpredictable because of temporal locality: the popularity of the same web document may differ significantly between time slots. Recent research on membership checking across multiple Bloom Filters uses sequential search, because no relationship exists between data stored in different Bloom Filters. The cost of a membership check increases substantially if many frequently queried web documents are stored in Bloom Filters that come late in the query order. To mitigate this cost, this thesis proposes a concise scheme that immediately modifies the query order of each Bloom Filter according to the observed temporal locality. Bloom Filters that contain frequently queried web documents are assigned higher query orders; these are called Popular Bloom Filters in later paragraphs. Similarly, Bloom Filters that mostly contain infrequently queried web documents receive lower query orders.

3.2 Policy of Changing Query Priority

Because incoming data are unpredictable, no information is available for building relations between data stored in different Bloom Filters. Therefore, an optimal search scheme does not exist. Fortunately, queried data exhibit temporal locality.

This feature helps to identify which data are queried more frequently. For this reason, the query order can be sorted by popularity. Our policy is that a Bloom Filter is promoted one level in the query order each time it contains a queried datum, so that the order reflects the actual popularity of the data. A Bloom Filter that contains a large number of queried data is promoted repeatedly and therefore ends up with a higher query order.

Figure 3-2-1 illustrates how the query order of a BF is promoted, taking the red BF as an example. In the beginning, the red BF is third in the query order.

Then, the BF is promoted by one position when a queried datum C arrives. Other queried data keep arriving and promote their own BFs, but each such promotion only swaps two adjacent BFs. The main factor that raises the query order of a BF is still its popularity. In the example, the red BF is more popular than the others, so its query order rises step by step as datum C keeps arriving. Finally, the red BF holds the highest query order.
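The promotion rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis implementation: plain Python sets stand in for the Bloom Filters, and the names `query` and `filters` are hypothetical.

```python
def query(filters, datum):
    """Check the filters in their current order and promote the one that hits.

    `filters` is a list of (name, members) pairs; a Python set stands in
    for each Bloom Filter (an assumption made for illustration only).
    Returns the number of filters probed; on a miss, every filter is probed.
    """
    for pos, (_, members) in enumerate(filters):
        if datum in members:
            if pos > 0:
                # Promote by one level: swap with the filter just ahead.
                filters[pos - 1], filters[pos] = filters[pos], filters[pos - 1]
            return pos + 1
    return len(filters)


# The red filter starts third and holds the repeatedly queried datum C.
filters = [("blue", {"A"}), ("green", {"B"}), ("red", {"C"})]
costs = [query(filters, "C") for _ in range(3)]
print(costs)                          # [3, 2, 1]
print([name for name, _ in filters])  # ['red', 'blue', 'green']
```

Each hit moves the filter by only one position, so a single query for an unpopular datum cannot push its filter to the front. Note that this sketch physically swaps list entries, which corresponds to the costly movement of filters discussed next.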

However, physically switching two Bloom Filters to change their query order incurs too much overhead to be practical. Therefore, OMABF is introduced to solve this issue.

3.3 Promotion of System Performance

A membership check on a Bloom Filter requires multiple memory accesses, because the bits that represent a datum are randomly distributed over multiple memory blocks. The number of memory accesses can be reduced if the bits are rearranged into fewer memory blocks. To achieve this, we replace the original Bloom Filters with OMABFs. An OMABF uses one or more blocks to store a datum, and the bits are evenly shared among those blocks, so the number of memory accesses is significantly reduced. In addition, each block can be regarded as a tiny Bloom Filter for measurement and management purposes. Compared with the original Bloom Filter, measuring at block level gives a more accurate picture of the popularity of data. At the same time, the blocks can each be assigned a different query order. Therefore, the most popular blocks gain higher query order, which yields better performance.

Figure 3-2-1 Query order changing of multiple BFs
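The block layout described above can be illustrated with a minimal sketch. It assumes the simplest case of one block per datum; the class name `BlockedBloomFilter`, the parameter values, and the use of SHA-256 as the hash are all assumptions made for illustration and are not taken from the OMABF design itself.

```python
import hashlib


class BlockedBloomFilter:
    """Bloom filter in which all bits of one datum fall into a single block."""

    def __init__(self, num_blocks=8, block_bits=64, num_hashes=3):
        self.num_blocks = num_blocks
        self.block_bits = block_bits
        self.num_hashes = num_hashes
        self.blocks = [0] * num_blocks  # each int is used as a bit vector

    def _hash(self, datum, salt):
        # Salted SHA-256 stands in for a family of hash functions.
        digest = hashlib.sha256(f"{salt}:{datum}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def _locate(self, datum):
        # The block hash picks the block; the other hashes pick bits inside it.
        block = self._hash(datum, "block") % self.num_blocks
        mask = 0
        for i in range(self.num_hashes):
            mask |= 1 << (self._hash(datum, i) % self.block_bits)
        return block, mask

    def add(self, datum):
        block, mask = self._locate(datum)
        self.blocks[block] |= mask

    def __contains__(self, datum):
        # Only one block is read, which models a single memory access.
        block, mask = self._locate(datum)
        return (self.blocks[block] & mask) == mask


bf = BlockedBloomFilter()
bf.add("doc-1")
print("doc-1" in bf)  # True: a Bloom filter never reports a false negative
```

Because every datum touches exactly one block, each block behaves like a small independent Bloom Filter whose hits can be counted and whose query order can be managed separately.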

Figure 3-3-1 illustrates the competition that arises when more than one BF is popular and their popularities are close to each other. Queried data keep promoting their own BFs, so the query orders of the two BFs are exchanged again and again, which causes additional overhead.

Figure 3-3-2 shows the architecture after replacing BFs with OMABFs. An OMABF consists of multiple homogeneous blocks, and each block stores only a few of the data in the OMABF. Therefore, a really popular datum promotes only its own block. The other blocks keep their query order and do not compete with each other.

However, switching blocks to change the query order complicates updates to the Bloom Filters, because no information is available to trace where a block has moved. Thus, actually moving blocks is impractical. A concise data structure is introduced to solve this problem at the cost of a little extra memory space.

Figure 3-3-1 Competition between multiple BFs

Figure 3-3-2 Architecture of replacing BFs with OMABFs

3.4 Query Index

Moving blocks to change their query order creates extra overhead. Two costs are considered: (1) the extra memory accesses needed to exchange blocks between two Bloom Filters, and (2) updates applied to incorrect memory blocks. Based on these considerations, we propose indirect querying to avoid moving blocks. The query order of each block is recorded in an index, so the query order can be changed by modifying the data recorded in the index. This index is referred to as the Query Index (QI) in later paragraphs. The QI is an integer array that records the query order of specific blocks. Initially, the array is filled with increasing numbers, each mapping to one Bloom Filter, for the purpose of indirect query. When checking membership, the Bloom Filters are verified in the order stored in the QI. Once a Bloom Filter is found to contain the queried document, its number is exchanged with the preceding entry in the QI, so that this Bloom Filter is checked earlier in the next round. Figure 3-4-1 shows the architecture after adding the QI.

Since the QI is much smaller than the BFs, it can be stored in cache memory. Each QI manages the query order of the blocks that are located at the same position, as mapped by the hash function.

The final architecture of the Dynamic Reordering Bloom Filter is shown in Figure 3-4-2.

Figure 3-4-1 Architecture of adding Query Index

When a datum is accessed, the block hash function selects the blocks that are at the same position across many OMABFs. Each position is identified by a different number, and the Query Index with the same number as the blocks manages their query order. When a datum is queried, the blocks are checked in the order given by that Query Index. The Query Index may be modified once a block is found to contain the queried datum.
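Putting the pieces together, the lookup path can be sketched as below. The sketch assumes one Query Index per block position and uses Python sets in place of real OMABF blocks; the class name `DynamicReorderingFilters`, its methods, and the choice of `zlib.crc32` as the block hash are assumptions for illustration only.

```python
import zlib


class DynamicReorderingFilters:
    """Several OMABF stand-ins with one Query Index per block position."""

    def __init__(self, num_filters, num_positions):
        self.num_positions = num_positions
        # blocks[f][p] stands in for block p of OMABF f.
        self.blocks = [
            [set() for _ in range(num_positions)] for _ in range(num_filters)
        ]
        # qi[p] is the query order of the blocks at position p.
        self.qi = [list(range(num_filters)) for _ in range(num_positions)]

    def position(self, datum):
        # Block hash: maps a datum to one block position.
        return zlib.crc32(datum.encode()) % self.num_positions

    def add(self, fid, datum):
        self.blocks[fid][self.position(datum)].add(datum)

    def lookup(self, datum):
        """Return the id of the OMABF holding datum, or None."""
        p = self.position(datum)
        order = self.qi[p]
        for rank, fid in enumerate(order):
            if datum in self.blocks[fid][p]:
                if rank > 0:
                    # Promote this block within its own Query Index only.
                    order[rank - 1], order[rank] = order[rank], order[rank - 1]
                return fid
        return None


drbf = DynamicReorderingFilters(num_filters=3, num_positions=4)
drbf.add(2, "doc-c")
print(drbf.lookup("doc-c"))             # 2
print(drbf.qi[drbf.position("doc-c")])  # [0, 2, 1]
```

Only the Query Index of the hit position changes; the indexes of all other positions keep their order, which matches the observation that blocks at different positions do not compete with each other.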

Figure 3-4-2 Architecture of Dynamic Reordering Bloom Filter
