Based on the evaluation in Section 4.3, procedure-based region formation performs well compared to NET. We believe a promising direction for future research is to apply ahead-of-time (AOT) compilation to procedure-based region formation. As in static binary translation, procedures can be analyzed and compiled before program execution. Traces, on the other hand, cannot be known before the program runs, so they are not suitable targets for AOT compilation. AOT compilation has several advantages. First, the compilation overhead of procedures at run time is completely eliminated. Second, more aggressive optimizations become possible when procedures are compiled ahead of time. For example, we may be able to collect path profiling information as input to feedback-directed optimizations in AOT compilation.

However, AOT compilation of procedures also poses challenges. The first is stripped applications, which have no symbol table information. The second is how to integrate traces with pre-compiled procedures: traces are usually identified more quickly than procedures, so forming traces can improve performance before hot procedures are identified.