In this paper, we present QueueBuffer, a lock-free, cache-friendly C++ template class. We show how memory coherence and consistency play important roles in writing correct lock-free code. The difficulty in writing lock-free programs comes from the lack of a memory model in the programming language. A memory model makes reasoning about the correctness of parallel programs more formal and easier: programmers need not worry that compilers or hardware might change the meaning expressed by the source code. For example, Java programmers can write parallel programs more easily and comfortably because the Java memory model guarantees sequential consistency for correctly synchronized programs [40]. In contrast, C and C++ have no memory model so far. On the bright side, the forthcoming C++ Standard defines one [18].
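To illustrate what a language-level memory model buys the programmer, the following is a minimal sketch of publishing ordinary data through an ordered atomic flag, written against the atomics of the forthcoming C++ Standard. The function name `publish_and_read` and the variable names are hypothetical and chosen only for this illustration; they are not part of QueueBuffer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo (names are assumptions, not from the paper):
// one thread publishes a plain int, another thread reads it.
int publish_and_read() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // ordered atomic flag
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // (1) write the data
        ready.store(true, std::memory_order_release);  // (2) publish it
    });

    std::thread consumer([&] {
        // (3) spin until the flag written in step (2) becomes visible
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // (4) ordered after (1) by the release/acquire pair
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Under the memory model, the release store in step (2) and the acquire load in step (3) forbid both the compiler and the hardware from reordering step (1) after the publication, so the read in step (4) observes the written value. Without such a model, the same reasoning would have to rely on platform-specific barriers.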
Cache is a double-edged sword: it can significantly improve or degrade application performance. Since programmers have no explicit control over the cache, writing cache-efficient programs without precise profiling is a challenge. In recent years, the shared cache on multicore systems has become an interesting topic. Tian suggested that the shared cache could serve as a communication mechanism among cores in place of the much slower main memory [32]. Using the shared cache this way, however, is not as easy as it might look. It is not clear whether we can ensure correctness with ordered atomic types and, at the same time, obtain the performance promised by the shared cache. How to use the shared cache as a communication mechanism among cores depends on the underlying system. For example, the data written by PROD might not reach the shared cache in time, in which case CONS incurs a cache miss. This topic needs further exploration.
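The correctness half of this question can be sketched with ordered atomic types alone. The following is a minimal single-producer/single-consumer ring, not the QueueBuffer implementation; the names `SpscRing`, `try_push`, `try_pop`, and `transfer_sum` are hypothetical, and the 64-byte alignment is an assumed cache-line size that varies across systems.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical SPSC ring for illustration only; not the paper's QueueBuffer.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "one slot is kept empty to tell full from empty");
public:
    // Called only by PROD. Returns false when the ring is full.
    bool try_push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[t] = v;                                 // write the element
        tail_.store(next, std::memory_order_release);  // then publish it
        return true;
    }

    // Called only by CONS. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[h];                                      // read the element
        head_.store((h + 1) % N, std::memory_order_release);  // then free the slot
        return true;
    }

private:
    // Assumption: 64-byte cache lines. Separate lines keep PROD's and
    // CONS's indices from falsely sharing a line.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) T slots_[N];
};

// Demo: PROD sends 1..count, CONS sums what it receives.
long long transfer_sum(int count) {
    SpscRing<int, 1024> ring;
    long long sum = 0;
    std::thread prod([&] {
        for (int i = 1; i <= count; ++i) {
            while (!ring.try_push(i)) std::this_thread::yield();
        }
    });
    std::thread cons([&] {
        int v = 0;
        for (int i = 0; i < count; ++i) {
            while (!ring.try_pop(v)) std::this_thread::yield();
            sum += v;
        }
    });
    prod.join();
    cons.join();
    return sum;
}
```

The release/acquire pairs guarantee that CONS sees each element PROD wrote, but they say nothing about where the data resides in the cache hierarchy when CONS reads it. Whether the element is served from the shared cache or from memory remains a property of the underlying system, which is the open question raised above.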
Bibliography
[1] H. Sutter. (2005) The free lunch is over. [Online]. Available: http://www.gotw.ca/publications/concurrency-ddj.htm
[2] R. Allen and K. Kennedy, Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers, 2002.
[3] R. Rangan, N. Vachharajani, M. Vachharajani, and D. August, “Decoupled software pipelining with the synchronization array,” in Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society Washington, DC, USA, 2004, pp. 177–188.
[4] G. Ottoni, R. Rangan, A. Stoler, and D. August, “Automatic thread extraction with decoupled software pipelining,” in Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2005, p. 118.
[5] R. Rangan, N. Vachharajani, A. Stoler, G. Ottoni, D. August, and G. Cai, “Support for high-frequency streaming in CMPs,” in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2006, pp. 259–272.
[6] J. Hennessy, D. Patterson, D. Goldberg, and K. Asanovic, Computer architecture: a quantitative approach. Morgan Kaufmann, 2007.
[7] L. Lamport, “How to make a multiprocessor computer that correctly executes multiprocess programs,” IEEE Transactions on Computers, vol. C-28, no. 9, pp. 690–691, 1979.
[8] S. Adve and K. Gharachorloo, “Shared memory consistency models: A tutorial,” Computer, vol. 29, no. 12, pp. 66–76, 1996.
[9] P. E. McKenney, “Memory ordering in modern microprocessors,” Linux Journal, vol. 30, pp. 52–57, 2005.
[10] J. Reinders, Intel threading building blocks: outfitting C++ for multi-core processor parallelism. O’Reilly Media, Inc., 2007, pp. 122–129.
[11] P. McKenney and I. Beaverton, “Memory Barriers: a Hardware View for Software Hackers,” 2009.
[12] S. Akhter and J. Roberts, Multi-core programming: increasing performance through software multi-threading. Intel Press, 2006, pp. 212–213.
[13] H. Sutter. (2005) The trouble with locks. [Online]. Available: http://www.drdobbs.com/cpp/184401930
[14] H. Sutter. (2009) volatile vs. volatile. [Online]. Available: http://www.drdobbs.com/hpc-high-performance-computing/212701484
[15] L. Lamport, “Proving the correctness of multiprocess programs,” IEEE Transactions on Software Engineering, pp. 125–143, 1977.
[16] S. Adve and H. Boehm, “Memory models: a case for rethinking parallel languages and hardware,” in Proceedings of the 28th ACM symposium on Principles of distributed computing. Citeseer, 2009, p. 2.
[17] S. Norton and M. DiPasquale, Thread Time: A Multi-Threaded Programming Guide.
[18] P. Becker. (2010) Programming languages - C++. [Online]. Available: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf
[19] C. SunSoft, Solaris multithreaded programming guide. Prentice-Hall, Inc. Upper Saddle River, NJ, USA, 1995, pp. 117–118.
[20] H. Sutter. (2008) Lock-free code: A false sense of security. [Online]. Available: http://www.drdobbs.com/cpp/210600279
[21] Intel® Threading Building Blocks - Design Patterns, Intel Corp., 2010. [Online]. Available: http://www.threadingbuildingblocks.org/uploads/81/91/Latest%20Open%20Source%20Documentation/Design_Patterns.pdf
[22] H. Sutter. (2008) Writing lock-free code: A corrected queue. [Online]. Available: http://www.ddj.com/high-performance-computing/210604448
[23] S. Padidar, “Parallel Program Verification: A Brief Introduction,” 2010.
[24] G. Holzmann, The SPIN model checker: Primer and reference manual. Addison Wesley Publishing Company, 2004.
[25] IBM System x3400, IBM, 2007. [Online]. Available: http://www-07.ibm.com/systems/includes/pdf/XSD02288USEN.pdf
[26] SUN SPARC ENTERPRISE T5120 SERVER, Oracle, 2009. [Online]. Available: http://www.oracle.com/us/products/servers-storage/servers/sparc-enterprise/t-series/035999.pdf
[27] Quad-Core Intel® Xeon® Processor 5300 Series, Intel Corp., 2006. [Online]. Available: http://www.intel.com/Assets/en_US/PDF/prodbrief/xeon-5300.pdf
[28] UltraSPARC T2 supplement to the UltraSPARC architecture 2007, Sun Microsystems, Inc., 2007. [Online]. Available: http://opensparc-t2.sunsource.net/specs/UST2-UASuppl-current-draft-P-EXT.pdf
[29] J. Henning, “SPEC CPU2006 benchmark descriptions,” ACM SIGARCH Computer Architecture News, vol. 34, no. 4, p. 17, 2006.
[30] C. Zilles, “Benchmark health considered harmful,” ACM SIGARCH Computer Architecture News, vol. 29, no. 3, p. 5, 2001.
[31] L. Dagum and R. Menon, “Open MP: An Industry-Standard API for Shared-Memory Programming,” IEEE Computational Science and Engineering, vol. 5, no. 1, pp. 46–55, 1998.
[32] T. Tian. (2007) Effective use of the shared cache in multi-core architectures. [Online]. Available: http://www.drdobbs.com/high-performance-computing/196902836
[33] Y. Zhang, K. Ootsu, T. Yokota, and T. Baba, “Clustered Decoupled Software Pipelining on Commodity CMP,” in 14th IEEE International Conference on Parallel and Distributed Systems, 2008. ICPADS’08, 2008, pp. 681–688.
[34] P. Lee, T. Bu, and G. Chandranmenon, “A lock-free, cache-efficient shared ring buffer for multi-core architectures,” in ACM/IEEE Symposium on Architectures for Networking and Communications Systems, 2009.
[35] T. Jablin, Y. Zhang, J. Jablin, J. Huang, H. Kim, and D. August, “Liberty Queues for EPIC Architectures,” in Proceedings of the Eighth Workshop on Explicitly Parallel Instruction Computer Architectures and Compiler Technology (EPIC), 2010.
[36] B. Lewis and D. Berg, Multithreaded programming with java technology. Prentice Hall
[37] R. Carver and K. Tai, Modern multithreading: implementing, testing, and debugging multithreaded Java and C++/Pthreads/Win32 programs. John Wiley and Sons, 2006, pp. 54, 77.
[38] H. Boehm and N. Maclaren. (2006) Should volatile acquire atomicity and thread visibility semantics? [Online]. Available: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html
[39] J. Giacomoni, T. Moseley, and M. Vachharajani, “FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue,” in Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming. ACM, 2008, pp. 43–52.
[40] J. Manson, W. Pugh, and S. Adve, “The Java memory model,” in Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. ACM, 2005, pp. 378–391.