
This paper has presented efficient and effective packing and analytical placement algorithms for large-scale heterogeneous FPGA design.

Our packing algorithm is composed of three stages that handle the different structures of heterogeneous components and datapath blocks. It produces a more placement-friendly netlist than VPR’s, which gives placers considerable room to improve solution quality. A V-shaped framework is proposed to enhance scalability while considering more exact design constraints than existing works. Moreover, our wirelength-driven analytical placement algorithm applies effective nonlinear optimization techniques and exploits the regularity of the datapaths to achieve scalability and high quality. A complex-block-alignment function is proposed to better handle heterogeneity, and a multilevel framework is applied to further enhance the scalability of our placement algorithm.
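As an illustration of the kind of nonlinear optimization involved in wirelength-driven analytical placement, the sketch below contrasts the exact half-perimeter wirelength (HPWL) of a net with a smooth log-sum-exp approximation that gradient-based solvers can differentiate. This summary does not restate which smooth model the placer uses, so the log-sum-exp form, the function names, and the smoothing parameter `gamma` are assumptions made for illustration only.

```python
import math


def hpwl(net):
    """Exact half-perimeter wirelength of one net given as [(x, y), ...]."""
    xs = [x for x, _ in net]
    ys = [y for _, y in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def _smooth_max(values, gamma):
    """gamma * log(sum(exp(v / gamma))): a smooth upper bound on max(values)."""
    m = max(values)  # shift by the maximum for numerical stability
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


def lse_wirelength(net, gamma=1.0):
    """Log-sum-exp wirelength (assumed model); approaches HPWL as gamma -> 0."""
    xs = [x for x, _ in net]
    ys = [y for _, y in net]
    # max(v) is smoothed directly; -min(v) is smoothed as max(-v).
    return (_smooth_max(xs, gamma) + _smooth_max([-x for x in xs], gamma)
            + _smooth_max(ys, gamma) + _smooth_max([-y for y in ys], gamma))


if __name__ == "__main__":
    net = [(0.0, 0.0), (3.0, 4.0), (1.0, 2.0)]  # hypothetical pin positions
    for gamma in (2.0, 0.5, 0.01):
        print(gamma, hpwl(net), lse_wirelength(net, gamma))
```

The smooth value never falls below the exact HPWL and exceeds it by at most 4·gamma·ln(n) for a net with n pins, so a smaller `gamma` gives a tighter but less smooth objective.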

We compared our packer and placer with the state-of-the-art FPGA CAD tool, VPR. The experiments were conducted on the Titan23 benchmark suite, a set of modern large-scale FPGA benchmarks. The experimental results show that our packer achieves a 199.80× speedup over VPR’s latest packing flow (VPR 7.0). Meanwhile, our placer achieves a 3.07× speedup with 6% shorter wirelength compared to VPR’s latest placement flow. Furthermore, the overall flow (packing and placement) achieves an 18.30× speedup with 6% shorter wirelength compared to VPR’s latest packing and placement flow.

There are three future research directions: analytical FPGA placement for segmented-length routing architectures, analytical FPGA placement that considers timing delay, and FPGA routing. We detail each direction below.

First, the prefabricated programmable routing architecture of modern FPGAs is segmented, with wire segments of several different lengths. Hence, the way to predict routing and estimate routability in FPGA placement could be very different from that in ASIC placement. Furthermore, the current wirelength model might not be accurate enough for routing estimation. As a result, how to utilize the various routing resources effectively and how to combine such an estimate with an analytical placement framework have become new challenges.

Second, in addition to wirelength and routability, timing is another critical issue in FPGA placement. Timing has been well explored in ASIC placement. In FPGA design, however, much work remains, because different routing resources can lead to different delays, which makes timing harder to predict.

The final direction is FPGA routing. Because the existing open-source FPGA router has demonstrated poor scalability on large-scale circuits, improving the scalability of FPGA routing is itself a research direction. Furthermore, a better understanding of FPGA routing could also help to build a quick routing forecast during placement.

