An Efficient R-Tree Implementation over Flash-Memory Storage Systems
∗Chin-Hsien Wu, Li-Pin Chang, Tei-Wei Kuo
Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan 106, ROC
Fax: +886-2-23628167
{d90003,d6526009,ktw}@csie.ntu.edu.tw
ABSTRACT
For many applications with spatial data management such as Geographic Information Systems (GIS), block-oriented access over flash memory could introduce a significant number of node updates. Such node updates could result in a large number of out-place updates and garbage collection over flash memory and damage its reliability. In this paper, we propose a very different approach which could efficiently handle fine-grained updates due to R-tree index access of spatial data over flash memory. The implementation is done directly over the flash translation layer (FTL) without any modifications to existing application systems. The feasibility of the proposed methodology is demonstrated with significant improvement on system performance, overheads on flash-memory management, and energy dissipation.
Categories and Subject Descriptors
C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED SYSTEMS]: Real-time and embedded systems; H.3.1 [Content Analysis and Indexing]: Indexing methods
General Terms
Design, Performance, Algorithm
Keywords
Flash Memory, GIS, R-Tree, Storage Systems, Embedded Systems, Spatial Index Structures
1. INTRODUCTION
Flash memory is now considered as an alternative to hard disks in many applications. The popularity of mobile networks also triggers a new wave of mobile applications over hand-held devices. Geographic Information Systems (GIS) are among the many popular applications over hand-held devices. One major technical question for the design of their storage systems is how to efficiently access various geographic information, such as electronic maps, and store information over hand-held devices. The support for applications with such needs of spatial index structures is highly important for the performance of embedded systems, especially as the capacity of flash memory has grown rapidly in recent years.
∗Supported in part by a research grant from the National Science Council under Grant NSC 91-2213-E-002-070 and a research grant from the Academia Sinica.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GIS’03, November 7–8, 2003, New Orleans, Louisiana, USA.
Copyright 2003 ACM 1-58113-730-3/03/0011 ...$5.00.
An R-tree [6, 4] is usually implemented as a non-memory-resident index structure for the access of a large collection of spatial data. The concept of R-trees was first proposed by Guttman [6], and later on an R*-tree variant was proposed by Beckmann, Kriegel, Schneider, and Seeger [4], where the difference between R-trees and R*-trees is in the overlapping of bounding boxes. Insertions, deletions, and re-balancing over R-trees often cause many sectors to be read and written back to the same locations. For disk storage systems, these operations are considered efficient, and R-tree nodes are usually grouped in contiguous sectors on a disk for further efficiency. Implementations over disks could not be applied directly over flash memory. For example, flash memory [8, 7, 11] could not be over-written (updated) unless it is erased first. As a result, out-of-date (or invalid) versions and the latest version of data might co-exist over flash memory. Furthermore, an erasable unit of a typical flash memory is relatively large, compared to the unit for reads and writes. After a certain number of sector writes, free space on flash memory would be low. Activities which consist of a series of reads, writes, and erases with the intention to reclaim free space would then start. These activities are called “garbage collection”, which is considered as overhead in flash-memory management. Note that writes and erases over flash memory take more time than reads. Frequent erasing of some particular locations of flash memory could quickly shorten the overall lifetime of flash memory, because each erasable unit has a limited cycle count on the erase operation. Any direct application of an R-tree disk implementation over flash memory could result in a severe performance degradation and significantly reduce its reliability. For example, intensive byte-wise operations on R-trees, due to object insertion, object deletion, and R-tree reorganization, could result in a large amount of data copying.
In this paper, we target an essential problem in the design of a mobile system which needs an intelligent management of spatial objects. We explore efficient R-tree implementations for better performance and energy consumption on mobile devices. In the implementation, a reservation buffer and a node translation table are proposed to reduce the number of unnecessary and frequent updates of information over flash-memory storage systems¹. The implementation is over the flash translation layer (FTL) [3, 2] for the compatibility of applications and systems, where FTL provides block-device emulation. Note that the concept of sectors is provided over FTL. As a result, FTL-based flash-memory storage systems could be easily ported to DOS-like operating systems. When an R-tree node is inserted, deleted, or modified, any newly generated objects would be temporarily held by the reservation buffer, where an object is an entry in an R-tree node. Since the reservation buffer only holds a limited number of objects, these objects should be flushed to flash memory in a timely fashion. We show that the proposed methodology could not only significantly improve the system performance but also reduce the overheads of flash-memory management and energy dissipation.
The rest of this paper is organized as follows: Section 2 provides an overview of flash memory. Section 3 provides the problem formulation. Section 4 introduces the R-tree implementation. Section 5 provides performance analysis of the approach. Section 6 shows experimental results. Section 7 is the conclusion.
2. FLASH MEMORY CHARACTERISTICS
A NAND² flash memory is organized in many blocks, and each block consists of a fixed number of pages. A block is the smallest unit of the erase operation, while reads and writes are handled in pages. The typical block size and page size of a NAND flash memory are 16KB and 512B, respectively.
Because flash memory is write-once, we do not overwrite data on update. Instead, data are written to free space, and the old versions of data are invalidated (or considered as dead). This update strategy is called “out-place update”. In other words, any existing data on flash memory could not be over-written (updated) unless it is erased. The pages that store live data and dead data are called “live pages” and “dead pages”, respectively. Because out-place update is adopted, we need a dynamic address translation mechanism to map a given LBA (logical block address) to the physical address where the valid data reside. Note that a “logical block” usually denotes a disk sector. To accomplish this objective, a RAM-resident translation table is adopted. The translation table is indexed by LBA’s, and each entry of the table contains the physical address of the corresponding LBA. If the system reboots, the translation table could be re-built by scanning the flash memory. Figure 1 illustrates the retrieval of data over flash memory in terms of the translation table.
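The out-place update and the address translation described above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the class `TinyFlash` and all of its fields are hypothetical names, free pages are simply consumed in address order, and garbage collection is not modelled.

```python
# Minimal sketch of out-place update with a RAM-resident translation table.
# All names are hypothetical; they are not taken from any FTL implementation.

PAGES_PER_BLOCK = 32  # 16KB block / 512B page, as stated in the text


class TinyFlash:
    def __init__(self, num_blocks):
        # Physical pages are addressed by (block, page); None means "free".
        self.pages = {(b, p): None
                      for b in range(num_blocks)
                      for p in range(PAGES_PER_BLOCK)}
        self.dead = set()               # addresses holding invalidated data
        self.table = {}                 # translation table: LBA -> (block, page)
        self.free = sorted(self.pages)  # free pages, consumed in address order

    def write(self, lba, data):
        """Out-place update: write to a free page, invalidate the old one."""
        new_addr = self.free.pop(0)
        self.pages[new_addr] = data
        old_addr = self.table.get(lba)
        if old_addr is not None:
            self.dead.add(old_addr)     # the old version becomes a dead page
        self.table[lba] = new_addr

    def read(self, lba):
        """Translate the LBA and return the live data."""
        return self.pages[self.table[lba]]


flash = TinyFlash(num_blocks=2)
flash.write(3, "v1")
flash.write(3, "v2")   # the update goes to a new page; (0, 0) becomes dead
```

After the two writes, LBA 3 is mapped to the second page written, and the page holding the first version is a dead page waiting to be recycled.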
The focus of this paper will be on NAND flash because it is designed mainly for storage systems.
[Figure 1 shows the address translation table (in main memory), indexed by LBA, next to the user data on flash memory. Each table entry holds a physical address (block, page). For LBA's 0 to 11 the entries are (0,3), (15,1), (0,1), (0,6), (0,4), (14,7), (13,0), (2,1), (41,2), (12,3), (1,3), and (200,4). An access to LBA = 3 is translated to the physical address (0,6).]
Figure 1: The logical block address “3” is mapped to the physical address “(0,6)” by the translation table.
¹ Similar data structures are adopted for B-tree management in [10]. However, because of the different nature of R-trees, the compacting of R-tree nodes and the packing of index units over flash-memory pages would be very different.
² There are two major types of flash memory in the current market: NAND flash and NOR flash. The NAND flash memory is specially designed for data storage, and the NOR flash is for EEPROM replacement.
After a certain number of page writes, free space on flash memory would be low. Activities which consist of a series of reads, writes, and erases with the intention to reclaim free space would then start. These activities are called “garbage collection” and are considered as overheads in flash-memory management. The objective of garbage collection is to recycle dead pages scattered over blocks so that they could become free pages after erasing. How to smartly choose blocks for erasing is the responsibility of a block-recycling policy. The block-recycling policy should try to minimize the overheads of garbage collection (caused by the copying of live data). Under the current technology, a flash-memory block has a limitation on the erase cycle count. For example, each block of a typical NAND flash memory could be erased 1 million (10^6) times. A worn-out block could suffer from frequent write errors. The “wear-levelling” policy should try to erase blocks over flash memory evenly so that a longer overall lifetime could be achieved. Note that wear-levelling activities could impose significant overheads on flash-memory storage systems if the access patterns have a strong locality on updates.
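As an illustration of what a block-recycling policy decides, the sketch below picks the block to erase with one common greedy rule: take the block with the most dead pages, so that few live pages must be copied. The rule, the function name `pick_victim`, and the page counts are assumptions made for this example; wear-levelling is not considered.

```python
# Hypothetical sketch of a greedy block-recycling policy.
# Each block is described by its numbers of live and dead pages.

def pick_victim(blocks):
    """Return the id of the block with the most dead pages.

    Ties are broken in favour of the block with fewer live pages,
    since every live page must be copied before the erase.
    """
    return max(blocks, key=lambda b: (blocks[b]["dead"], -blocks[b]["live"]))


blocks = {
    0: {"live": 30, "dead": 2},
    1: {"live": 4, "dead": 28},
    2: {"live": 10, "dead": 22},
}
victim = pick_victim(blocks)
copies_needed = blocks[victim]["live"]   # live pages to copy before erasing
```

With these counts, block 1 is chosen: erasing it reclaims 28 dead pages at the cost of copying 4 live pages.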
3. PROBLEM FORMULATION
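[Figure 2 shows the logical view of an R-tree index structure: on the left, the minimal bounding box of internal node R enclosing those of external nodes a, b, c, and d, which hold the objects with IDs 1 to 9; on the right, the corresponding tree hierarchy. The objects with IDs 10 to 15 are the ones to be inserted.]
Figure 2: An R-Tree (the max fanout = 5).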
Suppose that six spatial objects will be inserted into the R-tree shown in Figure 2. The R-tree has an internal node R and four external nodes a, b, c, and d. The minimal bounding box of node R contains the minimal bounding boxes of nodes a, b, c, and d. The hierarchical structure of the R-tree is shown at the right-hand side of Figure 2.
The insertions should check the minimal bounding boxes of nodes in the R-tree. The 1st object is inserted into node a, the 2nd object and the 3rd object are inserted into node b, the 4th object is inserted into node c, and the 5th object and the 6th object are inserted into node d, as shown in Figure 2. Let each R-tree node be stored in one page. Six updates of R-tree nodes (i.e., six page updates) occur. Such modifications of internal or external nodes are considered efficient over disk-like storage media because updates of nodes are usually localized on the corresponding nodes. Because of the characteristics of flash memory, updates of nodes must be done with out-place writes. Even though only a small portion of a node is modified, the information stored on the node must be invalidated, and a new page must be found for the update. The out-place updates for R-tree operations could result in the consumption of free pages and, in many cases, quickly trigger garbage collection. Garbage collection might also increase the energy consumption because writes and erases consume much more energy than reads, as shown in Table 1. The maintenance of R-trees over flash memory is complicated, especially when rebalancing (such as splitting or merging) might occur. Note that rebalancing could cause many pointers in nodes to be updated. As a result, much more free space is consumed when rebalancing is needed, and garbage collection is more frequent.
These observations motivate the research on R-tree implementations over flash memory. The objectives are to not only improve the system performance but also reduce the energy consumption.
Table 1: Performance of a Typical NAND Flash Memory
                               Page Read      Page Write     Block Erase
                               (512 bytes)    (512 bytes)    (16K bytes)
Performance (µs)               50             200            1,881
Energy consumption (µjoule)    99             237.6          422.4
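To make the cost of the six page updates in the example of this section concrete, the following sketch multiplies the page-write figures of Table 1 by the number of pages written. It is a rough estimate only: the function `write_cost` is a hypothetical helper, and the reads and erases caused by garbage collection are deliberately left out.

```python
# Figures from Table 1 (typical NAND flash memory).
PAGE_WRITE_US = 200.0   # time per 512-byte page write, in microseconds
PAGE_WRITE_UJ = 237.6   # energy per page write, in microjoules


def write_cost(num_pages):
    """Return (time in us, energy in uJ) for writing num_pages pages."""
    return num_pages * PAGE_WRITE_US, num_pages * PAGE_WRITE_UJ


# Six node updates, one page each, as in the example of Figure 2.
time_us, energy_uj = write_cost(6)
```

Under these assumptions the six updates take about 1,200 µs and about 1,425.6 µjoule for the writes alone.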
4. THE R-TREE IMPLEMENTATION OVER FLASH-MEMORY STORAGE SYSTEMS
4.1 Overview
[Figure 3 shows the logical view of the R-tree index structure (internal node R and external nodes a, b, c, and d) on top of the proposed layers: the reservation buffer (in RAM) holding the objects with IDs 10 to 15, the index units I1 to I6 packed into Sector 34 and Sector 35, the node translation table, the flash memory translation layer (FTL) (e.g., file systems using FTL), and the flash memory.]
Figure 3: System Architecture
The implementation of R-trees should be independent of the design of FTL and applications, as shown in Figure 3. The proposed R-tree implementation shall consider the characteristics of flash memory. The objective is to provide transparent and efficient accesses over R-tree index structures on flash-memory storage systems to reduce the number of unnecessary writes and to improve the system performance.
When an insertion or deletion request is received from an application, an “object” which contains the corresponding operation and data is created to denote the request. The object will be temporarily held by the reservation buffer.
Note that objects for deletions and insertions are created and handled in the same way. The reservation buffer is a write buffer residing in the main memory, as shown in Figure 3. All objects in the reservation buffer will be written to the flash memory in an on-demand fashion (the operation is presented in Section 4.3.1).
Each object in the reservation buffer has two parts: meta data and data. The meta data of an object contain the minimum bounding box, the operation, and pointers for housekeeping. Note that most of these meta data are traditionally stored in R-tree nodes to form the index structure. A data structure called an index unit is used to store the meta data of an object (please see Section 4.2). When a collection of objects is flushed from the reservation buffer, the corresponding index units will be created and packed into sectors, where each sector is a logical page on flash memory. The proposed R-tree implementation is responsible for packing index units into a small number of sectors and storing them over flash memory through FTL. Because index units in the same sector might belong to different R-tree nodes, a node translation table is adopted to maintain the corresponding locations of index units in an R-tree so that each R-tree node could be efficiently reconstructed during data retrieval.
4.2 Data Structures
Three data structures for the proposed R-tree implementation are presented in this section:
• The Reservation Buffer: The reservation buffer is a write buffer residing in the main memory. When an R-tree node is inserted, deleted, or modified, any newly generated objects would be temporarily held by the reservation buffer, where an object is an entry in an R-tree node. Objects in the reservation buffer represent operations which have not yet been applied to an R-tree.
• Index Units: The physical representation of an R-tree node consists of index units. Index units are created when objects in the reservation buffer are flushed to an R-tree index structure. An index unit contains the necessary meta data of an object to denote any modifications to the corresponding R-tree node. An index unit consists of the following components: data_ptr, parent_node, next_node, id, minimal bounding box, and op_flag. data_ptr, parent_node, next_node, and minimal bounding box are a pointer to the data, a pointer to the parent R-tree node, a pointer to the child R-tree node, and the minimal bounding box, respectively. id denotes the R-tree node to which the index unit belongs. op_flag represents the corresponding operation, i.e., an insertion (op_flag = i), a deletion (op_flag = d), or an update (op_flag = u). Note that modifications to a node usually modify only a small portion of the contents of the node. Index units are packed into few sectors by the proposed R-tree implementation for storing over flash memory to prevent the R-tree index structure from frequent updating (due to minor modifications to nodes). However, due to the packing of index units into sectors, the index units of an R-tree node might be scattered over different sectors of flash memory.
• The Node Translation Table: Since the index units of an R-tree node might be scattered over flash memory due to the proposed R-tree implementation, a node translation table is adopted to maintain the mapping of index units and the corresponding R-tree nodes so that each R-tree node could be efficiently reconstructed. The node translation table is an array of lists, where each entry of the array, i.e., a list, denotes an R-tree node. The entry of the array contains a list of the LBA’s of the sectors which contain index units of the corresponding R-tree node.
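The three data structures listed above can be written down as a minimal Python sketch. The field names of `IndexUnit` follow the components named in the text; everything else (the type choices, the `ReservationBuffer.add` method, and the use of a dictionary in place of an array of lists) is an assumption made for illustration, and the data part of an object is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MBR = Tuple[float, float, float, float]  # (x, y, w, h) of a bounding box


@dataclass
class IndexUnit:
    id: str                            # R-tree node the unit belongs to
    op_flag: str                       # 'i' insertion, 'd' deletion, 'u' update
    mbr: MBR                           # minimal bounding box
    data_ptr: Optional[int] = None     # pointer to the data (type assumed)
    parent_node: Optional[str] = None  # pointer to the parent R-tree node
    next_node: Optional[str] = None    # pointer to the child R-tree node


@dataclass
class ReservationBuffer:
    capacity: int
    objects: List[IndexUnit] = field(default_factory=list)

    def add(self, unit: IndexUnit) -> bool:
        """Hold a pending operation; return True when a flush is due."""
        self.objects.append(unit)
        return len(self.objects) >= self.capacity


# Node translation table: R-tree node id -> list of sector LBA's.
NodeTranslationTable = Dict[str, List[int]]

buf = ReservationBuffer(capacity=2)
first = buf.add(IndexUnit("a", "i", (0.0, 0.0, 1.0, 1.0)))
second = buf.add(IndexUnit("b", "i", (2.0, 2.0, 1.0, 1.0)))
table: NodeTranslationTable = {"a": [20, 24], "b": [20, 30]}
```

In this sketch the second call to `add` reports that the buffer is full, which is the point at which the packing of Section 4.3.1 would start.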
4.3 Manipulations of Data Structures
4.3.1 Packing of Index Units
The proposed R-tree implementation will pack index units into sectors and store them on flash memory through FTL.
The R-tree implementation should minimize the number of written sectors, where sectors are logical storage units provided by FTL.
We shall use an example to illustrate the idea: Suppose that the reservation buffer could hold up to six objects, as shown in Figure 3. Objects that correspond to the insertions of items with ID’s equal to 10, 11, 12, 13, 14, and 15 are left in the reservation buffer. Since the buffer is full, the proposed R-tree implementation first transforms the 6 objects into 6 index units {I1, I2, I3, I4, I5, I6}. According to the minimal bounding boxes of the objects and the minimal bounding boxes of leaf nodes a, b, c, and d, the six index units are partitioned into four disjoint sets, one per leaf node, to avoid scattering the index units of an R-tree node widely over sectors: {I1} for a, {I2, I3} for b, {I4} for c, and {I5, I6} for d.
Suppose that each sector could contain up to 3 index units.
{I1} and {I2, I3} are stored in the first sector (i.e., sector number 34). {I4} and {I5, I6} are stored in the second sector (i.e., sector number 35). The two sectors are then written to flash memory by the proposed R-tree implementation.
With the traditional approach in R-tree maintenance, up to six updates of R-tree nodes (i.e., six sector writes) would be needed. The packing problem can be defined as follows:
Definition 1. The packing problem of index units:
Given a collection T of disjoint sets of index units, a capacity constraint C of sectors, and a positive integer I, the packing problem is to find a partition of T into groups such that the total number of index units in each group is no more than C, and the number of groups in the partition is no more than I.
Theorem 1. The packing problem of index units is NP-Hard.³
³ A similar but simplified proof on the packing of information could also be found in [10].
Proof. The intractability of the problem could be shown by a reduction from the Bin-Packing [5] problem. The Bin-Packing problem can be defined as follows: Let S be a collection of items to be packed into bins, where the size of a bin is B, and each item x_i is of size size(x_i). Given an integer J, a Bin-Packing problem instance is to partition S into groups and store each group in a bin such that the total size of a group is no larger than B, and the number of groups in the partition is no more than J.
The reduction from a Bin-Packing problem instance into a packing problem instance for index units is as follows: Let the capacity constraint C of sectors be B, and the collection of disjoint sets of index units T be S, where each item x_i denotes a set of index units, and the number of index units in a set is equivalent to the size of the corresponding item. The constraint I on the number of sectors for the packing problem of index units is equal to J. If there exists a solution for the packing problem of index units, then the solution can be directly applied to the Bin-Packing problem instance. As described above, the reduction can be done in polynomial time. Since the Bin-Packing problem is NP-Hard [5], the packing problem of index units is NP-Hard. □
Note that the well-known FIRST-FIT approximation algorithm [9] has an approximation bound of no more than twice the optimal solution. Insertions and deletions of an R-tree node might not only update the corresponding nodes but also result in rebalancing of the R-tree index structure. In the proposed approach, deletions are handled by having “invalidation objects” in the reservation buffer. In other words, deletions could be considered as insertions because deletions are turned into invalidation objects for the modifications of the R-tree.
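The packing example of this section can be reproduced with a first-fit rule. The following is a minimal sketch rather than the paper's implementation: the function `first_fit` is a hypothetical helper that keeps each set of index units whole and assumes that no set is larger than a sector.

```python
def first_fit(sets, capacity):
    """Pack sets of index units into sectors with the first-fit rule.

    Each set goes, whole, into the first sector with enough room left;
    a new sector is opened when none fits. Sets are assumed to be no
    larger than the sector capacity.
    """
    sectors = []   # each sector is a list of sets
    room = []      # remaining capacity of each sector
    for s in sets:
        for k in range(len(sectors)):
            if len(s) <= room[k]:
                sectors[k].append(s)
                room[k] -= len(s)
                break
        else:
            # No existing sector has room: open a new one.
            sectors.append([s])
            room.append(capacity - len(s))
    return sectors


# The four sets of the example: index units grouped by leaf node a, b, c, d.
sets = [["I1"], ["I2", "I3"], ["I4"], ["I5", "I6"]]
sectors = first_fit(sets, capacity=3)
```

With a capacity of 3 index units per sector, the sketch places {I1} and {I2, I3} in the first sector and {I4} and {I5, I6} in the second one, which matches the two sector writes of the example.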
4.3.2 Updating of the Node Translation Table
[Figure 4 shows the R-tree (internal node R and external nodes a, b, c, and d) together with its node translation table. (a) Before flushing the reservation buffer, the lists of the external nodes are a: 20, 24; b: 20, 30; c: 21, 33; d: 21, 31. (b) After flushing the reservation buffer, index units I1 to I6 (objects 10 to 15) are stored in Sector 34 and Sector 35, and the lists become a: 20, 24, 34; b: 20, 30, 34; c: 21, 33, 35; d: 21, 31, 35.]
Figure 4: The situation after the flushing of the reservation buffer.
A node translation table is adopted to maintain the mapping of index units and the corresponding R-tree nodes so that each R-tree node could be efficiently reconstructed. Figure 4.(a) shows an R-tree with five nodes (one internal node and four external nodes) and its corresponding node translation table. Figure 4.(b) shows the R-tree and its node translation table after the reservation buffer in Figure 3 is flushed. When an R-tree node is visited, we collect all of the index units belonging to the visited node by scanning the sectors whose LBA’s (logical block addresses) are stored in the list of the node. For example, index units in sectors with LBA’s 20, 24, and 34 must be accessed to reconstruct R-tree node a, as shown in Figure 4.(b). On the other hand, a sector, e.g., that with LBA 20, might contain index units for more than one node, e.g., R-tree nodes a and b.
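The table update between Figure 4.(a) and Figure 4.(b) can be sketched as follows. The functions `record_flush` and `sectors_to_read` are hypothetical helpers, and a dictionary of lists stands in for the array of lists; the LBA's are the ones of the example.

```python
def record_flush(table, placement):
    """Append the LBA of each newly written sector to the lists of the
    R-tree nodes whose index units it contains.

    table:     node id -> list of sector LBA's
    placement: sector LBA -> node ids with index units in that sector
    """
    for lba, nodes in placement.items():
        for node in nodes:
            lbas = table.setdefault(node, [])
            if lba not in lbas:
                lbas.append(lba)
    return table


def sectors_to_read(table, node):
    """Sectors that must be scanned to reconstruct a node."""
    return list(table[node])


# Lists of the four leaf nodes before the flush, as in Figure 4.(a).
table = {"a": [20, 24], "b": [20, 30], "c": [21, 33], "d": [21, 31]}
# Sector 34 holds I1 (a) and I2, I3 (b); sector 35 holds I4 (c) and I5, I6 (d).
record_flush(table, {34: ["a", "b"], 35: ["c", "d"]})
```

After the flush, reconstructing node a requires the sectors with LBA's 20, 24, and 34, as stated in the text.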
The node translation table is an array of lists, where each entry of the array, i.e., a list, denotes an R-tree node. The entry of the array contains a list of the LBA’s of the sectors which contain index units of the corresponding R-tree node.
A large number of items in a list could result in the degradation of the system performance and an increase of the space overheads. A system parameter ω is used to restrict the maximum number of items in the lists of the node translation table. Once the number of items in a list grows over ω, the list must be compacted. The compaction of items in a list can be done by reading all sectors that contain the index units of the corresponding R-tree node and then writing them back to other available sectors. Note that since deletions are handled as variants of insertions, compaction also helps in eliminating redundant index units. We shall use the following example to illustrate the idea and the compaction process:
Index unit   Sector     id   op   mbr
1            Sector 1   C    i    (x1, y1, w1, h1)
2            Sector 1   A    i    (x2, y2, w2, h2)
3            Sector 1   B    i    (x3, y3, w3, h3)
4            Sector 2   A    i    (x4, y4, w4, h4)
5            Sector 2   D    i    (x5, y5, w5, h5)
6            Sector 2   C    i    (x6, y6, w6, h6)
7            Sector 3   C    i    (x7, y7, w7, h7)
8            Sector 3   B    i    (x8, y8, w8, h8)
9            Sector 3   C    d    (x9, y9, w9, h9)
Index units of node C collected from the three sectors: 1, 6, 7, and 9. Let (x1, y1, w1, h1) be equal to (x9, y9, w9, h9): index unit 9 (a deletion) invalidates index unit 1 (an insertion). The two remaining index units, 6 and 7, are stored in one sector.
Figure 5: Compacting of an R-Tree node C.
In Figure 5, some index units of R-tree node C are scattered in three sectors, i.e., Sectors 1, 2, and 3. Let the capacity of each sector be enough for three index units. Suppose that id, op, and mbr denote the identifier, the op_flag, and the minimal bounding box of an index unit, respectively, where other meta data are not shown in the figure for the simplicity of explanation. Suppose there are 9 index units in the three sectors. They are numbered from 1 to 9. Let (xi, yi) and (wi, hi) denote the coordinate of the lower-left corner and the dimension (i.e., the width and the height) of the minimal bounding box of the corresponding index unit (for i = 1 to 9), respectively. The compacting of the list that corresponds to R-tree node C involves the reading of the three sectors and the writing back of the index units. During the compaction process, the system finds that two index units, i.e., the first and ninth index units, have the same minimal bounding box, where one is an insertion, and the other is a deletion. As a result, the deletion causes the removal of the insertion. Since only two remaining index units, i.e., the sixth and seventh index units, belong to R-tree node C, they are written into an available sector. The node translation table is updated accordingly. In Section 5, we provide further analysis of the compaction overheads.
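The compaction of node C can be sketched as below. This is an illustrative sketch only: the function `compact`, the tuple layout `(id, op, mbr)`, and the numeric bounding boxes are assumptions, and a deletion is matched to an insertion purely by an equal minimal bounding box, as in the example of Figure 5.

```python
def compact(node_id, sectors):
    """Collect the index units of one node and cancel matched pairs.

    sectors: list of sectors, each a list of (id, op, mbr) index units.
    A deletion removes one earlier insertion with the same minimal
    bounding box; both units are then dropped.
    """
    kept = []
    for sector in sectors:
        for unit in sector:
            uid, op, mbr = unit
            if uid != node_id:
                continue                    # belongs to another R-tree node
            if op == "d":
                for k, (_, kop, kmbr) in enumerate(kept):
                    if kop == "i" and kmbr == mbr:
                        del kept[k]         # insertion and deletion cancel
                        break
                else:
                    kept.append(unit)       # no match found: keep the deletion
            else:
                kept.append(unit)
    return kept


# Hypothetical bounding boxes; index unit 9 has the same box as index unit 1.
m = {i: (float(i), float(i), 1.0, 1.0) for i in range(1, 10)}
m[9] = m[1]
sectors = [
    [("C", "i", m[1]), ("A", "i", m[2]), ("B", "i", m[3])],   # Sector 1
    [("A", "i", m[4]), ("D", "i", m[5]), ("C", "i", m[6])],   # Sector 2
    [("C", "i", m[7]), ("B", "i", m[8]), ("C", "d", m[9])],   # Sector 3
]
result = compact("C", sectors)
```

Only the sixth and seventh index units survive, so they fit into a single sector and the list of node C shrinks from three LBA's to one.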
5. SYSTEM ANALYSIS
The purpose of this section is to provide the analysis of the number of sector accesses for insertions under the proposed R-tree implementation, compared to those under the original one. The overheads of compaction will then be explored.
5.1 Analysis of the R-Tree Implementation
Suppose that $n$ spatial objects (with different minimal bounding boxes) are to be inserted. Consider an R-tree index with a height equal to $H$ on flash memory, where each R-tree node can be stored in a flash-memory sector. $H$ is bounded by $O(\log_{fanout}(m+n))$, where $m$ is the number of objects before the $n$ insertions occur. We shall first explore the number of sector accesses for the insertions of $n$ spatial objects under the original R-tree implementation:
Suppose that the number of node splittings is $N_{split}$ for the insertions of $n$ spatial objects. The numbers of reads and writes for the insertions are $R_R = O(n \cdot H)$ and $W_R = O(n + 3 \cdot N_{split})$, respectively, where $R_R$ and $W_R$ denote the numbers of reads and writes under the original R-tree implementation. $R_R$ is bounded by $O(n \cdot H)$ because of the locating of the leaf node for each insertion of a spatial object. $W_R$ is bounded by $O(n + 3 \cdot N_{split})$ because $n$ writes are needed to insert the $n$ spatial objects into the proper leaf nodes, and $3 \cdot N_{split}$ comes from $N_{split}$ splittings in which each splitting consists of two writes on the split nodes and one write on the parent node.
The number of sectors read for the insertions of $n$ spatial objects under the proposed R-tree implementation could be derived as follows: Given $\omega$ as the bound on the length of lists in the node translation table, the number of reads for the insertions is $R_{PR} = O(n \cdot H \cdot \omega)$, where $R_{PR}$ and $W_{PR}$ denote the numbers of reads and writes under the proposed R-tree implementation. $R_{PR}$ is $O(n \cdot H \cdot \omega)$ because each visiting of an R-tree node might involve the traversing of a list in the node translation table. $R_{PR}$, compared to $R_R$, shows that the proposed R-tree implementation might read more sectors in handling the insertions. In fact, the proposed R-tree implementation trades the number of reads for the number of writes.
The number of sectors written under the proposed R-tree implementation could be derived as follows: Let the capacity of the reservation buffer be $b$ spatial objects. As a result, the reservation buffer would be flushed at least $\lceil n/b \rceil$ times for the insertion of $n$ spatial objects. Let $N_{split}^{i}$ denote the number of node splittings needed to handle the $i$-th flushing of the reservation buffer. Obviously, $\sum_{i=1}^{\lceil n/b \rceil} N_{split}^{i} = N_{split}$ because the R-tree index structures under the proposed R-tree implementation and the original R-tree implementation are logically identical. For each single step of the reservation buffer flushing, we have $(b + N_{split}^{i} \cdot (fanout - 1) + N_{split}^{i} \cdot 2)$ index units to commit, where $fanout$ is the maximum fan-out of the R-tree. Note that the multiplication of $N_{split}^{i}$ and $(fanout - 1)$ in the formula denotes that each splitting will result in 2 new nodes, and the number of index units in the 2 new nodes is $(fanout - 1)$. Furthermore, the splitting might result in two updates of the parent node, because the minimal bounding boxes of the two new nodes for the splitting have not been reflected in the parent node. Therefore, $N_{split}^{i} \cdot 2$ index units are needed in the worst case. Suppose that an R-tree node could fit in a sector. That is, a sector could hold up to $(fanout - 1)$ index units. The number of sectors written by the $i$-th committing of the reservation buffer could be $(\frac{b}{\Lambda} + N_{split}^{i} + \frac{N_{split}^{i}}{\Lambda} \cdot 2)$, where $\Lambda = (fanout - 1)$. In order to flush out the reservation buffer for the insertions of $n$ spatial objects completely, we have to write at least

$$\sum_{i=1}^{\lceil n/b \rceil} \left( \frac{b}{\Lambda} + N_{split}^{i} + \frac{N_{split}^{i}}{\Lambda} \cdot 2 \right) = \left( \sum_{i=1}^{\lceil n/b \rceil} \frac{b}{\Lambda} \right) + \frac{N_{split}}{\Lambda} \cdot (\Lambda + 2) \approx \left( \sum_{i=1}^{\lceil n/b \rceil} \frac{b}{\Lambda} \right) + N_{split}$$

sectors. Since the proposed R-tree implementation adopts the FIRST-FIT approximation algorithm, the number of sectors written could be bounded by the following formula (the approximation bound would be no more than twice an optimal solution):

$$W_{PR} = O\left( 2 \cdot \left( \sum_{i=1}^{\lceil n/b \rceil} \frac{b}{\Lambda} \right) + N_{split} \right) = O\left( 2 \cdot \frac{n}{\Lambda} + N_{split} \right) \qquad (1)$$

$W_{PR}$ is apparently far less than $W_R$, since $\Lambda$ (that is, the maximum number of index units in a sector) is usually larger than 2. However, we should point out that the compaction of lists in the node translation table might introduce some run-time overheads, as discussed in the next section.
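To give a feeling for the gap between the two write bounds, the sketch below evaluates $n + 3 \cdot N_{split}$ and $2 \cdot n/\Lambda + N_{split}$ for one set of numbers. This is an illustration only: the constants hidden by the $O(\cdot)$ notation are ignored, and the values of `n`, `n_split`, and `fanout` are made-up inputs, not measurements.

```python
from math import ceil


def writes_original(n, n_split):
    """Sector writes of the original R-tree: n + 3 * N_split."""
    return n + 3 * n_split


def writes_proposed(n, n_split, fanout):
    """Sector writes of the proposed scheme, following Equation 1:
    2 * n / Lambda + N_split, with Lambda = fanout - 1."""
    lam = fanout - 1
    return ceil(2 * n / lam) + n_split


n, n_split, fanout = 1000, 50, 16   # illustrative values only
w_r = writes_original(n, n_split)
w_pr = writes_proposed(n, n_split, fanout)
```

For these inputs the original bound gives 1,150 sector writes and the proposed one 184, before any compaction overhead is added.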
5.2 Analysis for Node Compaction
A compaction process restricts the length of each list in the node translation table. In this section, we explore the overheads due to the compaction. Assume that $n$ spatial objects are inserted into an R-tree, and let the capacity of the reservation buffer be $b$ spatial objects. The reservation buffer would be flushed at least $\lceil n/b \rceil$ times. Note that each compaction of an R-tree node reads no more than $\omega$ sectors and then compacts and writes them back to one sector, where $\omega$ is the maximum list length in the node translation table (assuming that an R-tree node can be contained in one sector). The number of sectors written by the compaction can be derived as follows:

$$R_{compact} \cdot \sum_{i=1}^{\lceil n/b \rceil} \left(b + N_{split}^{i}\right) = R_{compact} \cdot (n + N_{split}),$$

where $R_{compact}$ denotes the ratio of spatial-object processing (including that of split nodes) that results in a compaction. Note that each flushing of the reservation buffer could produce at most $(b + N_{split}^{i})$ node modifications. $R_{compact} \cdot (n + N_{split})$ denotes the worst-case number of sectors written for the compaction during the insertions of $n$ spatial objects. Another overhead of the compaction comes from the sectors read for the compaction. No more than $R_{compact} \cdot (n + N_{split}) \cdot \omega$ sectors are read for the compaction. We must point out that $R_{compact} \cdot (n + N_{split}) \cdot \omega$ is bounded by $O(R_{PR})$. When

$$T_{write} \cdot \left(W_{PR} + R_{compact} \cdot (n + N_{split})\right) + T_{read} \cdot \left(R_{PR} + R_{compact} \cdot (n + N_{split}) \cdot \omega\right) \le T_{write} \cdot W_R + T_{read} \cdot R_R,$$

the proposed R-tree implementation outperforms the original R-tree implementation for the insertions of $n$ spatial objects, where $T_{write}$ and $T_{read}$ denote the time to write and to read a sector, respectively. The preferred bound on $R_{compact}$ can also be derived from the following formula:

$$R_{compact} \le \frac{T_{write} \cdot (W_R - W_{PR}) + T_{read} \cdot (R_R - R_{PR})}{T_{write} \cdot (n + N_{split}) + T_{read} \cdot (n + N_{split}) \cdot \omega} \tag{2}$$

$R_{compact}$ is actually influenced by the access pattern of insertions. When the insertions disperse over the whole R-tree, the lists in the node translation table could grow fast, because sectors could store index units that belong to different R-tree nodes, which lengthens the lists. Therefore, the compaction would be activated soon. In the next section, we discuss the impact of locality on the compaction.
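The break-even condition and the bound in Equation (2) can be checked numerically. The following is a minimal sketch under stated assumptions: the function names are ours, and the counts and timings in the example are hypothetical placeholders rather than measurements from the experiments. Only n = 8,590 objects, b = 80 objects, and ω = 4 are taken from the experimental setup in Section 6.

```python
def flush_count(n: int, b: int) -> int:
    """Minimum number of reservation-buffer flushes, i.e. ceil(n / b)."""
    return (n + b - 1) // b


def pr_cost(r_compact, t_write, t_read, w_pr, r_pr, n, n_split, omega):
    """Left-hand side of the inequality: time spent by PR, compaction included."""
    compactions = r_compact * (n + n_split)
    return t_write * (w_pr + compactions) + t_read * (r_pr + compactions * omega)


def r_cost(t_write, t_read, w_r, r_r):
    """Right-hand side of the inequality: time spent by the original R-tree."""
    return t_write * w_r + t_read * r_r


def r_compact_bound(t_write, t_read, w_r, w_pr, r_r, r_pr, n, n_split, omega):
    """Largest R_compact for which PR is no slower than R (Equation (2))."""
    numerator = t_write * (w_r - w_pr) + t_read * (r_r - r_pr)
    denominator = t_write * (n + n_split) + t_read * (n + n_split) * omega
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical sector counts and per-sector timings (ms), for illustration only.
    bound = r_compact_bound(t_write=0.2, t_read=0.05, w_r=10000, w_pr=1000,
                            r_r=20000, r_pr=40000, n=8590, n_split=600, omega=4)
    print("flushes:", flush_count(8590, 80))
    print("bound on R_compact:", bound)
```

At the bound itself the two sides of the inequality are equal, so any smaller value of R_compact leaves PR strictly faster under the same assumed counts.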
6. EXPERIMENTAL RESULTS
6.1 Experimental Setup and Performance Metrics
A NAND-based system prototype was built to evaluate the performance of the proposed R-tree implementation and the original R-tree implementation. For the rest of this section, let PR and R denote the proposed R-tree implementation and the original R-tree implementation, respectively. The system prototype was equipped with a 4MB NAND flash memory, whose performance is summarized in Table 1. Note that 4MB was sufficiently large for the experiments because the contents of spatial objects were not stored; only index units created from the spatial objects were stored. FTL was adopted to provide block-device emulation for PR and R, and a greedy policy [8] was adopted in FTL as the garbage collection policy. Two geographic files describing the roads and buildings of the Taipei map were adopted as the data sets for the experiments, as shown in Figure 6. The geographic files were in the shapefile format [1]. The numbers of spatial objects of the buildings and roads were 8,590 and 7,340, respectively.

Figure 6: The Taipei map.
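The setup names a greedy garbage collection policy without spelling it out. One common reading of "greedy" is to reclaim the block that holds the fewest valid pages, since it is the cheapest to copy out before erasure. The sketch below illustrates that reading only; the function name and the per-block counter of valid pages are hypothetical and are not taken from the prototype.

```python
from typing import Dict, Optional


def pick_victim_block(valid_pages: Dict[int, int]) -> Optional[int]:
    """Return the id of the block with the fewest valid pages.

    valid_pages maps a block id to its number of still-valid pages
    (a hypothetical bookkeeping structure). Ties are broken by the
    smaller block id; None is returned when there is no candidate.
    """
    if not valid_pages:
        return None
    return min(valid_pages, key=lambda block: (valid_pages[block], block))
```

For example, with valid-page counts `{0: 12, 1: 3, 2: 7}` the sketch selects block 1, because only three pages must be copied elsewhere before the block is erased.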
The parameters for the R-tree were as follows. For both PR and R, the fan-out of the R-tree structure was 16. For R, each R-tree node fitted in one sector. For PR, the reservation buffer could hold up to 80 objects (unless explicitly specified otherwise), and the length of each list in the node translation table was bounded by 4.

The performance of PR and R was evaluated in terms of several metrics: the average response time of insertions and modifications (deletions) of an R-tree index structure, the number of pages read, the number of pages written, and the number of blocks erased. Note that the average response time was calculated from the numbers of pages written, pages read, and blocks erased. We also explored the compaction overheads, the reservation buffer size, and energy consumption. Note that sector reads/writes were issued by the upper applications, and FTL translated sector reads/writes into page reads/writes to physically access the flash memory.
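The text states that the average response time was computed from the operation counts, but it does not give the formula. A minimal sketch, assuming a simple linear model in which each page read, page write, and block erase costs a fixed time, could look as follows. The `FlashTimings` type and its field names are hypothetical, and the per-operation times would come from Table 1, which is not reproduced in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlashTimings:
    """Per-operation times in milliseconds (placeholders for Table 1 values)."""
    page_read_ms: float
    page_write_ms: float
    block_erase_ms: float


def average_response_time(pages_read: int, pages_written: int,
                          blocks_erased: int, operations: int,
                          timings: FlashTimings) -> float:
    """Total flash time divided by the number of index operations served."""
    if operations <= 0:
        raise ValueError("operations must be positive")
    total_ms = (pages_read * timings.page_read_ms
                + pages_written * timings.page_write_ms
                + blocks_erased * timings.block_erase_ms)
    return total_ms / operations
```

Under this model a block erase is charged to the run in which it happens, so a run that triggers no garbage collection pays only for its page reads and writes.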
6.2 Initiation Time for R-Tree Index Structures
We measured the average response time of the insertions during the creation of an R-tree index structure, based on the two geographic files. A smaller response time denoted better efficiency in handling the insertions. The average response time could also reflect the overheads introduced by garbage collection. As shown in Figure 7(a), PR handled the insertions more efficiently than R, with a shorter average response time. Figure 7(b) and Figure 7(c) show the number of pages written and the number of pages read for the construction of the R-tree index structures. We could observe that the number of pages written under PR was only one-tenth of that under R. Since writing to flash memory eventually introduces garbage collection activities, the smaller number of pages written translated into the better response time mentioned above. Comparing the numbers of writes and reads in Figure 7(b) and Figure 7(c), we could observe that PR traded a larger number of reads for a reduced number of writes. We could also observe that no garbage collection occurred under PR, whereas 112 and 68 block erases were observed under R for the indexing of the two geographic files.

Figure 7: An R-tree index structure was entirely constructed based on the insertions of the spatial objects located in the two geographic files. Each panel compares PR and R for the buildings and the roads.
(a) Average Response Time (ms) for the Geographic Files of Buildings and Roads
(b) Number of Pages Being Written for the Geographic Files of Buildings and Roads
(c) Number of Pages Being Read for the Geographic Files of Buildings and Roads
6.3 Node Compaction Overheads
Figure 8: The overheads of the compaction were measured. Panels (a) to (c) compare PR and R for the buildings and the roads with randomization; panel (d) shows PR only, with and without randomization.
(a) Average Response Time (ms) for the Geographic Files of Buildings and Roads
(b) Number of Pages Being Written for the Geographic Files of Buildings and Roads
(c) Number of Pages Being Read for the Geographic Files of Buildings and Roads
(d) Number of Executions of the Compaction for the Geographic Files of Buildings and Roads
We wanted to know whether the spatial locality of the inserted objects has a significant impact on the overheads of the compaction. In this part of the experiments, we manipulated the spatial locality in the sequence of the inserted spatial objects. Two insertion sequences were considered. The first inserted the spatial objects sequentially according to their locations (i.e., top-down and then left-right), which gave high locality. The second randomly picked one spatial object at a time to insert, and therefore exhibited no locality. Note that the insertions for the experiments shown in Section 6.2 followed the sequential order.

The response time of the insertions, the number of pages written, and the number of pages read for the random insertion sequence under PR and R are shown in Figures 8(a), 8(b), and 8(c), respectively. Comparing Figure 7 with Figure 8, PR performed better when the input sequence was sequential. When the input sequence became random, the lists of the node translation table grew fast and the compaction was activated frequently, which imposed significant overheads on the handling of the insertions. The phenomenon was also observed in Figure 8(d), which shows the number of executions of the compaction of a list in the node translation table under the sequential and random sequences. This is strong evidence that the overheads of the compaction depend heavily on the spatial locality in the insertion sequence.
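The two insertion sequences can be sketched as follows. This is an illustration under assumptions that the paper does not state: each spatial object is reduced to a single (x, y) reference point (for instance the centre of its bounding box), y grows upward so that "top-down" means descending y, and the seed parameter exists only to make the random order repeatable.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) reference point of an object


def sequential_order(objects: Sequence[Point]) -> List[Point]:
    """High-locality order: top-down first, then left-right within a row."""
    return sorted(objects, key=lambda p: (-p[1], p[0]))


def random_order(objects: Sequence[Point], seed: int = 0) -> List[Point]:
    """No-locality order: a uniformly random permutation of the objects."""
    shuffled = list(objects)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

In the sequential order, consecutive insertions tend to fall into the same R-tree nodes, whereas the shuffled order spreads them over the whole tree.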
6.4 Performance for Data Modifications
Figure 9: The performance of modifications to an R-tree index structure was measured. The horizontal axis of each panel is the ratio of modifications (0.2 to 0.8). Panels (a) to (c) compare PR and R; panel (d) shows R only.
(a) Average Response Time (ms) for the Modifications of the Geographic File of Buildings
(b) Number of Pages Being Written for the Modifications of the Geographic File of Buildings
(c) Number of Pages Being Read for the Modifications of the Geographic File of Buildings
(d) Number of Erased Blocks for the Modifications of the Geographic File of Buildings
An experimental parameter, the modification ratio, was adopted to control the ratio of the number of modified spatial objects to the total number of spatial objects. Note that spatial objects were randomly chosen for modification based on the specified modification ratio, and each chosen object was modified once in the entire experiment. All spatial objects were first inserted, and some of them were then randomly selected for modification. There were 8,590 spatial objects in the geographic file of the buildings of Taipei city. The average response time of the insertions and modifications, the number of pages written, the number of pages read, and the number of blocks erased (for garbage collection) are shown in Figures 9(a), 9(b), 9(c), and 9(d), respectively.

When the modification ratio was increased, more spatial objects were modified in the experiments. Because the modifications resulted in byte-wise updates to the R-tree nodes, many page writes were needed. It was observed that PR substantially reduced the number of pages written without introducing a noticeable number of extra page reads. As a result, PR did not trigger any garbage collection in the experiments, owing to the small number of pages written. Frequent garbage collection would degrade the overall performance. Consequently, PR outperformed R according to the experimental results.
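The selection of objects to modify can be sketched as sampling without replacement, so that each chosen object is modified exactly once. The function name, the rounding of the sample size, and the seed are assumptions made for this illustration.

```python
import random
from typing import List, Sequence


def pick_objects_to_modify(object_ids: Sequence[int], ratio: float,
                           seed: int = 0) -> List[int]:
    """Choose round(ratio * N) distinct objects uniformly at random."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    count = round(len(object_ids) * ratio)  # rounding rule is an assumption
    return random.Random(seed).sample(list(object_ids), count)
```

With the 8,590 building objects and a modification ratio of 0.2, this sketch selects 1,718 distinct objects.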
6.5 Reservation Buffer Size and Energy Consumption
A large reservation buffer could be beneficial because more objects can be analyzed together before they are flushed; however, it could damage the reliability of the R-tree index structure in the event of a power failure. Reservation buffers of different sizes were evaluated to find a reasonably good setting. We evaluated the performance of PR for the indexing of the buildings file under different sizes of the reservation buffer. The size was varied from 20 objects to 100 objects in increments of 10 objects. The average response time was significantly reduced from 0.37 ms to 0.28 ms when the size of the reservation buffer was increased from 20 objects to 80 objects. Beyond that, the average response time was only linearly reduced from 0.28 ms to 0.24 ms, and no significant improvement could be observed. Since increasing the size of the reservation buffer could damage the reliability of the R-tree index structure, the recommended size of the reservation buffer for the experiments was 80 objects.

Energy consumption is also a critical issue for portable devices. According to the numbers of reads, writes, and erases generated in the experiments, we calculated the energy consumption under PR and R. The energy consumed by a read, a write, and an erase is included in Table 1. The energy consumption of the two approaches for indexing the two geographic files is shown in Table 2. The energy consumed under PR was clearly less than that under R. Since page writes and block erases consume relatively more energy than page reads, the energy consumption was reduced when PR traded extra reads for a smaller number of writes. Furthermore, the energy consumed by garbage collection was also reduced under PR, since PR consumed free space more slowly than R.
Table 2: Energy consumption of the proposed R-Tree and the original R-Tree (joule)

Creation         The Proposed R-Tree    The Original R-Tree
The Buildings    4.52                   6.43
The Roads        4.55                   5.63
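The relative energy saving implied by Table 2 can be worked out directly from its four values. The short calculation below is our own arithmetic on those figures, not a result reported by the authors; the function and variable names are illustrative.

```python
# Energy in joules from Table 2: (proposed R-tree, original R-tree).
ENERGY_JOULES = {
    "The Buildings": (4.52, 6.43),
    "The Roads": (4.55, 5.63),
}


def relative_saving(proposed: float, original: float) -> float:
    """Fraction of the original energy that the proposed approach avoids."""
    return (original - proposed) / original


if __name__ == "__main__":
    for name, (proposed, original) in ENERGY_JOULES.items():
        print(f"{name}: {relative_saving(proposed, original):.1%} less energy under PR")
    # prints:
    # The Buildings: 29.7% less energy under PR
    # The Roads: 19.2% less energy under PR
```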
7. CONCLUSION
In this paper, we propose an efficient R-tree implementation over flash-memory storage systems. The implementation sits over the flash translation layer (FTL) [3, 2] for compatibility with existing applications and systems. When an R-tree node is inserted, deleted, or modified, the corresponding objects are held by the R-tree implementation, which then transforms the objects into index units and packs the units into sectors. The objective is not only to improve the performance of flash-memory storage systems but also to reduce their energy consumption, which is an important issue in the design of portable devices. We conducted a series of experiments over a system prototype and obtained very encouraging results. For future research, we shall further explore the energy consumption issue for embedded systems, especially when various application semantics are considered. Research on the joint considerations of flash-memory storage systems and system/application programs might further reduce the energy dissipation of the entire system.
8. REFERENCES
[1] ESRI shapefile technical description. Technical report, ESRI.
[2] FTL logger exchanging data with FTL systems. Technical report, Intel Corporation.
[3] Understanding the flash translation layer (FTL) specification. Technical report, Intel Corporation.
[4] N. Beckmann, H. P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In Proc. ACM SIGMOD Intl. Symp. on the Management of Data, pages 322–331, 1990.
[5] M. R. Garey and D. S. Johnson. Computers and Intractability. 1979.
[6] A. Guttman. R-trees: A dynamic index structure for spatial searching. In Proc. ACM SIGMOD Intl. Symp. on the Management of Data, pages 45–57, 1984.
[7] K. Han-Joon and L. Sang-goo. A new flash memory management for flash storage system. In Proceedings of the Computer Software and Applications Conference, 1999.
[8] A. Kawaguchi, S. Nishioka, and H. Motoda. A flash-memory based file system. In USENIX Technical Conference on Unix and Advanced Computing Systems, 1995.
[9] V. V. Vazirani. Approximation Algorithms. Springer, 2001.
[10] C. H. Wu, L. P. Chang, and T. W. Kuo. An efficient B-tree layer for flash-memory storage systems. In The 9th International Conference on Real-Time and Embedded Computing Systems and Applications (RTCSA 2003), 2003.
[11] M. Wu and W. Zwaenepoel. eNVy: A non-volatile, main memory storage system. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, 1994.