Previous Work on Bloom Filter Designs - 在多核心系統上之高效能雙層計數布隆過濾器

Chapter 2 Preliminary

2.1 Previous Work on Bloom Filter Designs

2.1.1 Simple Hash-based Technique

A membership querying function returns a value of true or false to identify the existence of a given input query. A straightforward way to implement a membership query function is to give an entry to each individual member. However, the total number of the members is usually much larger than the limitation of the memory size in a design. A hash function is a basic yet efficient solution for membership querying. An input query will be sent to a hash function, and a hashed value is returned to index the corresponding entry of the query.

However, the single-index hash table is prone to returning many false positives for different queries. For example, a computer system with n-bit memory addresses will introduce 2ⁿ

distinct memory locations. In an ideal case, a hash function needs 2ⁿ bits to distinguish each data location. But due to the storage limitation in real systems, the hash function is forced to map the 2ⁿ memory space to a mapping table with only m bits, where m is much smaller than 2ⁿ. The parameter m may vary according to accuracy requirements and available resources.

Thus, 2ⁿ – m datum could be hashed to a bit that has already been used by another data (Fig.

3(a)). This is referred as a “collision”. The collision problem could make the single-index hash function to report a false positive of membership querying. As an example shown in Fig.

3(b), there exists an element A belonging to a set S. The single-indexed hash unit provides A with a particular bit slot in the mapping table and sets this bit to 1. The value 1 indicates that A is in set S. But another element B that does not belong to S might also be hashed to the same

slot. This conflicting scenario pollutes the meaning of the returned value and creates the situation of a false positive. From this returned value, users cannot tell if the element B has really been assigned to the slot or not.

Fig. 3. (a) Hash collision. (b) Hash reports a false positive for element B.

2.1.2 Classic Bloom Filter (BF)

A Bloom filter is a space-efficient data structure proposed by Bloom in the 1970s [4].

Bloom filter uses multiple hash units for each element and sets several bits (depends on k, the number of hash functions) for each element. Fig. 4 shows how BF maps a single data to the mapping table with k = 3. A specific data C is considered in a particular set T only when all the corresponding hashed slots are set. Fig. 3 also shows another element D, which does not belong to set T, and the corresponding hashed slots. Two of the slots collide with two of C’s slots (colored). However, there is another slot of D that is not set, so the Bloom filter correctly reports D as not in the set T. The Bloom filter has less possibility that reports a false positive than a simple hash function because collision must happen in all of the k hash functions.

Fig. 4. The mapping mechanism of a Bloom filter (k = 3).

2.1.3 Counting Bloom Filter (CBF)

Classic Bloom filter provides a memory-effective way of reducing hash collisions by using multiple hashes. However, a classic Bloom filter suffers from two problems. First, as the number of hash function increases, its mapping slot is “polluted” or “saturated” faster

since the Bloom filter requires setting k bits for each element. Second, the classic Bloom filter does not support “deleting” or “resetting” the mapping slots. In other words, once a bit is set,

the classic Bloom filter has no mechanism to reset it. Eventually, the classic Bloom filter will be filled up with 1’s and loses its filtering functionality.

Fig. 5. Different data maps to same slots in mapping array. In (a), we cannot tell if a slot is mapped multiple times. In (b), we can decrease the counter to indicate removal of an element.

Since the multiple hash function is inevitable for Bloom filters, Fan et al. [6] proposed counting Bloom filter (CBF) to enable resetting a mapping slot. Counting Bloom filter adds an additional counter array along with the mapping slots of the classic Bloom filter. Each l-bit

counter is associated with a mapping slot in a one-to-one fashion. Whenever an element is inserted to a set, each hashed slot will increment its corresponding counter by 1 and sets the mapping slot to 1. Therefore, the counter indicates the number of elements hashed to it, as depicted in Fig. 5. On the other hand, whenever an element is removed, each slot will decrement its corresponding counter. When a counter is decreased to zero, its corresponding mapping slot will be reset to zero. With this resetting procedure, CBF achieves a lower false positive rate and hence reduces the impact of saturation of the classic Bloom filter.

2.1.4 Banked Bloom Filter (BBF)

Both BF and CBF requires k lookups from the mapping table because of k hash functions.

Making these lookups in a serial manner is inefficient and difficult to meet the timing constraint in hardware implementation. However, parallelizing k lookups requires large memory bandwidth, so each memory in the filter has to implement k read/write ports for querying and updating elements. Banked Bloom filter was proposed to address the issue [7].

Similar to the banked cache access, BBF supports required bandwidth by using banking instead of adding read/write ports. Assume a memory with p ports and each port has B banks, it can provide a maximum of p∙B simultaneous access as long as no more than p operations are accessing the same bank [7].

When applying Bloom filters to the local cache or interconnection of a SMP system, the filters are usually accessed at every cycle. Therefore, a delay in the filter is undesirable.

However, bank conflicts will stall the accessing procedure and make the filter to be ineffective.

Banked Bloom filter uses a hard-wired permutation table to prevent bank conflicts. Fig. 6(a) shows how a banked Bloom filter is organized. With four hash functions, the BBF is configured as four banks to provide a memory bandwidth of 1×4=4 accesses simultaneously.

Whenever there is a membership querying to the filter, the permutation table will return a sequence of bit sets to the multiplexers and guide the hash functions to the corresponding banks. Fig. 6(b) depicts a permutation table with four banks (k = 4).

Fig. 6. (a) Banked Bloom filter with four hash functions. (b) Hard-wired permutation table of BBF.

在文檔中在多核心系統上之高效能雙層計數布隆過濾器 (頁 17-22)