To understand how a reduction in LLC-misses translates into an improvement in overall performance, we apply linear regression to the per-request cost (in cycles) versus the per-request LLC-misses for several sets of experimental results, and use the fitted models to quantify the cost attributable to LLC-misses. Table 4.5 shows some of the resulting linear models. For r = 16B, the per-request LLC-misses of the baseline model range from 4 to 11, which account for 57%-78% of the total cost with 1 thread. In these cases, the cache-aware model saves 20%-50% of the LLC-misses. However, since the cache-aware model has a higher base cost and more expensive LLC-misses, the overall cost saving is less than 30% in all cases.
r = 16B       Cache-aware    Baseline
1 thread      44x + 118      37x + 110
5 threads     68x + 209      60x + 136
10 threads    80x + 211      74x + 127
Table 4.5: Fitted linear models of per-request cost (in cycles) versus per-request LLC-misses (x). Each entry is a model fitted to data from 12 distinct (m, a) combinations.
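The arithmetic behind these models can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the experiments: the function names (fit_line, cost, miss_share, saved_cost) are hypothetical, only the 1-thread coefficients are taken from Table 4.5, and the inputs passed to saved_cost are assumed pairings rather than measured data points.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 1-thread models from Table 4.5, stored as (slope, intercept).
# slope     = cycles per LLC-miss
# intercept = base cost in cycles, independent of LLC-misses
BASELINE = (37, 110)
CACHE_AWARE = (44, 118)


def cost(model, misses):
    """Per-request cost in cycles predicted by a linear model."""
    slope, intercept = model
    return slope * misses + intercept


def miss_share(model, misses):
    """Fraction of the predicted cost that is attributed to LLC-misses."""
    slope, _ = model
    return slope * misses / cost(model, misses)


def saved_cost(base_misses, miss_reduction):
    """Fraction of baseline cost saved by the cache-aware model.

    base_misses    -- per-request LLC-misses of the baseline model
    miss_reduction -- fraction of those misses the cache-aware model avoids
                      (an assumed input, not a measured value)
    """
    base = cost(BASELINE, base_misses)
    aware = cost(CACHE_AWARE, base_misses * (1 - miss_reduction))
    return (base - aware) / base
```

Under the 1-thread baseline model, miss_share(BASELINE, 4) and miss_share(BASELINE, 11) evaluate to roughly 0.57 and 0.79, in line with the range quoted above. For an assumed request with 8 baseline misses and a 30% miss reduction, saved_cost(8, 0.3) is about 0.10: the higher slope and intercept of the cache-aware model absorb much of the benefit of the avoided misses.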
Chapter 5 Conclusion
In this thesis, we propose a cache-centric update request handling model built on a request-buffering data structure called cache-buckets. The model aims to alleviate the poor cache utilization of in-memory DBMSs under low-locality, update-intensive workloads. Our cache-aware batch update model aggregates multiple update requests into one batched update to obtain higher temporal locality of cache usage and to avoid re-referencing in-memory data buckets. The experimental results show that the cache-aware model has up to 4 times fewer cache misses and a 65% increase in throughput. However, because of non-negligible overhead, cache-miss penalties that do not dominate the total cost, and limited room for improvement in LLC-misses, the cache-aware model achieves only a slight improvement in overall throughput.
For future work, we would like to study the feasibility of designing a cache-centric, fault-tolerant storage system for NVRAM that is equipped with the cache-aware batch update model, in order to achieve cache-speed request handling.