
0.6 Experimental Results

0.6.3 Micro-Benchmark Results

Test Procedure

The micro-benchmark test procedure in this study consists of four phases: insertion, update, query, and deletion. The NOR flash is entirely empty before the benchmark begins. The first phase sequentially inserts 25,000 consecutive keys. The second phase performs 800,000 key updates, selecting keys with a Gaussian random variable to create temporal locality in the access pattern. The third phase performs 800,000 key queries using the same random variable for key selection. The final phase removes all the keys in random order. The mean and the variance of the Gaussian random variable are the median of all the keys and one-sixth of the total number of keys, respectively.
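The four phases above can be sketched as a small benchmark driver. This is a minimal illustration, not the thesis's test harness: the `index` interface (`insert`/`update`/`query`/`delete`) is a hypothetical key-value API, and the Gaussian parameters follow the text literally (variance, not standard deviation, equal to one-sixth of the key count).

```python
import random

NUM_KEYS = 25_000
NUM_UPDATES = 800_000
NUM_QUERIES = 800_000

def gaussian_key(num_keys):
    """Draw a key with the distribution described above:
    mean = the median key, variance = num_keys / 6."""
    mean = num_keys // 2
    sigma = (num_keys / 6) ** 0.5   # standard deviation from the variance
    k = int(random.gauss(mean, sigma))
    return min(max(k, 0), num_keys - 1)   # clamp to the valid key range

def run_micro_benchmark(index):
    """`index` is any key-value store exposing insert/update/query/delete."""
    for key in range(NUM_KEYS):                  # phase 1: sequential insertion
        index.insert(key, key)
    for _ in range(NUM_UPDATES):                 # phase 2: Gaussian-skewed updates
        index.update(gaussian_key(NUM_KEYS), 0)
    for _ in range(NUM_QUERIES):                 # phase 3: Gaussian-skewed queries
        index.query(gaussian_key(NUM_KEYS))
    keys = list(range(NUM_KEYS))                 # phase 4: random-order deletion
    random.shuffle(keys)
    for key in keys:
        index.delete(key)
```

Because update and query share the same random variable, the hot keys created in phase 2 are also the keys most likely to be queried in phase 3, which is what gives the workload its temporal locality.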


Figure 15: The total execution time of the micro-benchmark under different flash sizes: 1 MB, 2 MB, 4 MB, and 8 MB

Fat Lists versus µ-Trees

Figure 14 shows the micro-benchmark results under different flash sizes. Before discussing these results, recall that NOR flash is very slow on writes and erases but extremely fast on reads.

The page size of the µ-tree is set to 512 bytes so that the tree can contain all data after the insertion phase.

In the insertion phase, the flash size does not affect the performance of fat lists or the µ-tree, because the minimum flash space needed to build the list and the tree is less than 1 MB.

Therefore, neither fat lists nor the µ-tree triggers any garbage-collection activity. The read time of the µ-tree is lower than that of fat lists because query performance in the µ-tree is determined by its height. The µ-tree is also a balanced tree, so its height grows very slowly. For this reason, it is hard to outperform the µ-tree in queries. However, the write time of the µ-tree is much higher than that of fat lists because the µ-tree has to rewrite the entire path on every structural modification of a leaf node. Since NOR flash is much slower on writes than on reads, fat lists outperform the µ-tree in the insertion phase.

Although there is no garbage-collection activity in the insertion phase, the minimum space needed to build the µ-tree after inserting all keys is greater than that of fat lists. This is because all data in the µ-tree is contained in leaf nodes, and every leaf node occupies a page. A 256-byte leaf node in the µ-tree therefore needs a full 512-byte page, while an object in fat lists is 353 bytes, including 20 data values, 20 keys, a 20-bit bitmap, and a pointer pool of size 7, two words each. The leaf nodes in the µ-tree contain slightly more data than the objects in fat lists, but the space they occupy is much larger. Therefore, building the µ-tree needs more space than building fat lists, and for the same flash size, the µ-tree is left with less free space than fat lists.

In the update phase, both fat lists and the µ-tree experience garbage-collection activity. As the flash size decreases, the performance of both fat lists and the µ-tree also decreases. The read time of the µ-tree is still lower than that of fat lists, but its write and erase times are much higher. This is not only because the µ-tree has to rewrite the entire path on every structural modification but also because it has less free space than fat lists. Rewriting the entire path increases the write time, and less free space increases the erase time. Therefore, fat lists outperform the µ-tree in the update phase.

The µ-tree outperforms fat lists in the query phase. Queries in a balanced tree are fast, and it is hard to outperform a tree in queries. This is a tradeoff between the performance of read-only queries and that of read-write operations. Because read operations are much faster than writes and erases, we choose to sacrifice the performance of read-only queries to improve the performance of read-write operations.

The deletion phase is also a read-write phase. When deleting keys, structural modifications occur less often than in the update phase. However, fat lists still outperform the µ-tree. Figure 15 shows the total execution time of the micro-benchmark under different flash sizes. Although the µ-tree outperforms fat lists in the query phase, fat lists are still two times faster than the µ-tree in total execution time because NOR flash is fast on reads. Overall, this micro-benchmark shows that fat lists are much faster than the µ-tree.

Maximum Level and Turnstile Size

This section analyzes the performance of fat lists under different settings. The first experiment evaluates fat lists using different maximum levels. Figure 16 presents the experimental results of the micro-benchmark. In the insertion phase, the difference in write time between different maximum-level settings is small, but the read time decreases as the maximum level increases. This is because a higher level can skip a greater distance than a lower level.

Therefore, the difference in total time between different settings of the maximum level


Figure 16: The micro-benchmark results of fat lists with different settings of the maximum level: (a) insertion phase, (b) update phase, (c) query phase, (d) deletion phase

is determined by the read time in the insertion phase. In the update phase, performance is better with the maximum level set to 4 or 5. Performance with the maximum level set to 3 is worse because the maximum level is too small, so much more time is spent locating the data. As the maximum level increases, the size of the pointer pool increases, and so does the object size. Larger objects consume space faster, which raises the frequency of garbage-collection activities. Thus, larger maximum-level settings perform worse because of the increased garbage-collection frequency.

In the query phase, performance is best with the maximum level set to 4, which differs from the result in the insertion phase. Because each object in fat lists contains many keys, the total number of objects is not large. As we use the mechanism of [22] to allocate the number of objects at each level, there are few high-level objects, and sometimes none apart from the head and tail objects. The benefit of skipping farther at a high level is then lost; worse, searching down from the highest level to the lower levels increases the read time. After garbage collection, the read time to find the target increases, which again reveals the drawback of having few high-level objects. In the deletion phase, the read time follows the same tendency as in the query phase, but the write time decreases as the maximum level increases. As mentioned for the update phase, increasing the maximum level increases the size of the pointer pool and the object


Figure 17: The total execution time of the micro-benchmark with different settings of the maximum level

size. When the maximum level is small, the pointer pool is small. Therefore, there is less spare room for updating pointers, which causes frequent object rewrites once no spare pointer slot remains. That is, with a small maximum level, the small object size decreases the frequency of garbage-collection activities, but the small pointer pool increases the frequency of object rewrites and thus the write time. Because the deletion phase performs fewer operations, the increased frequency of object rewrites caused by the small pointer pool becomes visible there.

Figure 17 shows the total execution time of the micro-benchmark with different settings of the maximum level. In this experiment, the better choices for the maximum level are 4 and 5.
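The tradeoff discussed above, where higher maximum levels allow longer skips but leave very few objects hooked on the top levels, can be illustrated with a level-assignment sketch. The thesis uses the allocation mechanism of [22], whose details may differ; this sketch assumes the common geometric rule with promotion probability 1/2.

```python
import random

def assign_level(max_level, p=0.5, rng=random.random):
    """Return a level in [1, max_level]: promote with probability p,
    so level i is reached with probability p**(i-1), capped at max_level."""
    level = 1
    while level < max_level and rng() < p:
        level += 1
    return level

def expected_objects_per_level(num_objects, max_level, p=0.5):
    """Expected number of objects hooked on each level under the rule above.
    An object of level L is hooked on all levels 1..L."""
    return [num_objects * p ** (lvl - 1) for lvl in range(1, max_level + 1)]
```

With roughly 1,250 objects (25,000 keys at 20 keys per object), the expected counts per level under this rule are about 1250, 625, 313, 156, and 78 for levels 1 through 5, so the population of high-level objects thins quickly as the maximum level grows, consistent with the observation that high-level objects become scarce.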

Figure 18 shows the results of evaluating fat lists using different turnstile sizes. As the turnstile size increases, the read time increases in all four phases, most visibly after the update phase. This is because the probability of probing an invalid object increases as the space utilization in NOR flash decreases. Thus, de-referencing a soft pointer may require extra probes before finding a valid object and successfully skipping. However, setting a small turnstile size may forfeit the benefit of soft pointers, namely the ability to skip farther at random. This phenomenon is visible in the query phase. Even so, the best performance in the update phase is obtained with the turnstile size set to 8. Because every turnstile has one spare block, larger turnstiles result in fewer spare blocks and lower space utilization in NOR flash.

Therefore, the write and erase times decrease as the turnstile size increases. Figure 19 shows the total execution time of the micro-benchmark with different settings of the turnstile


Figure 18: The micro-benchmark results of fat lists with different settings of the turnstile size: (a) insertion phase, (b) update phase, (c) query phase, (d) deletion phase

size. In this experiment, the better choices for the turnstile size are 8 and 16. Based on the above results, the recommended maximum level and turnstile size are 5 and 8, respectively.

Overhead of De-referencing Soft Pointers

De-referencing a soft pointer probes objects in a turnstile, but not all probes produce useful results. This section investigates the overhead caused by these extra probes.

To assist our discussion, we first define the different types of probing results: a null probe points to an invalid object or to free space in NOR flash, a back probe goes to an object whose key is smaller than the current key, a low probe points to an object that is not hooked on the current level, and an over probe refers to an object whose key is larger than the searched key. None of these probes are useful. All the other probes are useful and are called successful probes. Figure 20 illustrates the different types of probes.
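Under these definitions, classifying a single probe reduces to a short series of checks. This is a minimal sketch: the `Object` fields and the check order are assumptions for illustration, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Object:
    key: int
    valid: bool        # False once the object has been superseded or erased
    top_level: int     # the object is hooked on levels 1..top_level

def classify_probe(obj: Optional[Object], current_key: int,
                   searched_key: int, current_level: int) -> str:
    """Classify one probe made while de-referencing a soft pointer."""
    if obj is None or not obj.valid:
        return "null"          # invalid object or free space
    if obj.key < current_key:
        return "back"          # went backwards relative to the current key
    if obj.top_level < current_level:
        return "low"           # object is not hooked on the current level
    if obj.key > searched_key:
        return "over"          # overshot the searched key
    return "successful"        # a usable skip target
```

Only a "successful" probe lets the search skip forward; every other outcome forces another probe in the turnstile.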


Figure 19: The total execution time of the micro-benchmark with different settings of turnstile size

Figure 20: Different types of probes made when de-referencing a soft pointer.

This part of the experiment analyzes the soft pointers in phase 3 of the micro-benchmark. Table 0.6.3 shows the results. The rows titled "pointers visited" and "total probes" show how many soft pointers are de-referenced and how many probes are made during the test, respectively. The remaining rows break the probes down by type, expressed as the average number of probes of each type per pointer visited.

Almost every soft-pointer de-reference ends with a successful probe, except for the last soft pointer on each level visited before the searched key is found. The higher the maximum level, the more soft pointers end without a successful probe. With the maximum level set to 1, the ratio of successful probes is exactly 1 because there is only one level, and an object to skip to must always be found before the searched key is reached. Therefore, as the maximum level decreases, the ratio of successful probes increases.

Max. levels          5            4            3            1
Pointers visited     13,082,578   13,130,214   22,495,212   88,385,632
Total probes         34,529,843   28,286,792   35,443,585   106,887,758
Successful probes    0.775        0.837        0.940        1
Low probes           0.020        0.017        0.001        0
Over probes          0.676        0.487        0.192        0.057
Back probes          0.364        0.233        0.108        0.043
Null probes          0.804        0.580        0.334        0.109

Now consider the unsuccessful probes. As mentioned in Section 0.4.2, fat lists adopt an object-level space allocation policy, writing objects of the same level to nearby offsets within the blocks of a turnstile. The ratio of low probes is small, showing that this policy is effective. Null probes, back probes, and over probes are the most common types of unsuccessful probes. Provided that valid objects are randomly distributed across the entire NOR flash, the total numbers of back probes and over probes will be proportional to the total number of pointers visited. On the other hand, the total number of null probes depends not only on the total number of pointers visited but also on the space utilization in NOR flash. The lower the space utilization, the more null probes there will be.
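The dependence of null probes on space utilization can be illustrated with a simple model (an illustration under the stated randomness assumption, not a measurement from the thesis): if a fraction u of probed slots holds valid objects and probes land independently at random, the number of probes until a valid hit is geometric with mean 1/u, so each de-reference makes about 1/u - 1 null probes on average.

```python
import random

def simulate_null_probes(utilization, num_derefs=100_000, seed=42):
    """Monte-Carlo estimate of null probes per de-reference, assuming each
    probe independently hits a valid object with probability `utilization`."""
    rng = random.Random(seed)
    null_probes = 0
    for _ in range(num_derefs):
        while rng.random() >= utilization:   # keep probing until a valid hit
            null_probes += 1
    return null_probes / num_derefs

# The geometric model predicts 1/u - 1 null probes per de-reference:
# u = 0.5 gives about 1.0, u = 0.25 gives about 3.0, so halving the
# utilization roughly triples the null-probe overhead in this model.
```

This matches the qualitative trend in the table: the null-probe ratio rises and falls with how densely valid objects populate the flash.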

The discussion above shows that the total number of extra probes is a function of the total number of successful probes and the space utilization in NOR flash, meaning that overhead is manageable.

Initializing Speed

This experiment evaluates how many word reads the initialization procedure requires to initialize fat lists. The initialization procedure is inserted between the update phase and the query phase of the micro-benchmark because this is when fat lists contain the largest number of keys. The test was conducted under different maximum levels and turnstile sizes, with the total number of keys fixed at 25,000.

The results in Table 0.6.3 show that the initialization overhead drastically decreases as the maximum level increases. This is because the initialization procedure escalates

TS sizes \ Max. levels     1        3       4       5
8                          150.59   37.95   30.47   15.07
16                         184.03   58.74   30.47   20.13
32                         266.75   66.44   33.22   25.30


Figure 21: The macro-benchmark results of fat lists and the µ-tree with different numbers of range queries: (a) insertion phase, (b) range query phase

the current level for long-distance skips. Conversely, the initialization overhead increases when the turnstiles are large. This is because space utilization in NOR flash is inversely proportional to the turnstile size. Decreasing the space utilization increases the probability of making null probes when de-referencing soft pointers.

These results indicate that the maximum level has a much greater influence on the initialization overhead than the turnstile size. With the turnstile size and the maximum level set to 8 and 5, respectively, initializing fat lists of 25,000 keys takes only 15.07 microseconds.
