Chapter 1: Introduction
1.2 Contributions of This Dissertation
This dissertation proposes an energy-efficient, memory-centric on-chip data communication platform to handle the growing data communication and data storage demands of heterogeneous multi-core SoC designs. Fig. 1.6 presents the contribution matrix of this platform, which consists of a memory-centric OCIN and an on-demand memory sub-system.
The memory-centric OCIN provides the micro-architecture for data communication based on its building blocks: link wires, routers, and NIs. In this dissertation, all building blocks are analyzed and developed to realize energy-efficient multi-core SoCs. Additionally, the on-demand memory sub-system enhances memory bandwidth and reduces the total execution time of the whole system via the centralized MMU (c-MMU) and private MMUs (p-MMUs). Moreover, the NI provides a bridge between the on-demand memory sub-system, the memory-centric OCIN, and the heterogeneous PEs. The contributions of each block are described as follows.
1.2.1 Link Wires
For link wires, a novel self-calibrated, energy-efficient, and reliable channel design is proposed for OCINs. The proposed channels reduce energy consumption while maintaining reliability. The channels are developed using the self-calibrated voltage scaling technique together with the self-corrected green (SCG) coding scheme. SCG coding is a joint bus and error correction coding scheme that provides a reliable mechanism for channels. In addition, it achieves a significant reduction in energy consumption via a joint triplication bus power model for crosstalk avoidance. Based on the SCG coding scheme, the proposed self-calibrated voltage scaling technique adjusts the voltage swing to reduce energy. Furthermore, this technique tolerates timing variations.
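The role of triplication in a joint bus and error correction code can be illustrated with a generic sketch. The code below is not the SCG code itself; it assumes plain bit triplication with majority-vote decoding, and the function names are hypothetical. In such a layout the three copies of a bit occupy adjacent wires, so every wire has at least one neighbour carrying the same value, and a single wire error per group is corrected by the vote.

```python
# Generic triplication sketch (an assumption for illustration, not the SCG code).

def encode_triplication(bits):
    """Send every data bit on three adjacent wires."""
    return [b for b in bits for _ in range(3)]


def decode_triplication(wires):
    """Recover each data bit by majority vote over its three wires."""
    if len(wires) % 3 != 0:
        raise ValueError("wire count must be a multiple of 3")
    return [1 if sum(wires[i:i + 3]) >= 2 else 0
            for i in range(0, len(wires), 3)]


data = [1, 0, 1, 1]
sent = encode_triplication(data)

# Model a single wire error (for example, caused by a reduced voltage swing).
received = list(sent)
received[4] ^= 1  # flips one of the three wires carrying the second bit

recovered = decode_triplication(received)
```

Because one flipped wire in a group of three is outvoted, `recovered` equals `data` here; two errors in the same group would not be corrected by this simple scheme.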
1.2.2 Routers
Routers are the essential components of OCINs. The router architecture depends on the topology and flow control of the OCIN. A generic router architecture consists of a set of input buffers, an interconnect matrix, a set of output buffers, and control circuitry, including a routing controller, an arbiter, and an error detector. In this dissertation, a data-link two-level FIFO (first-in first-out) buffer architecture with a centralized shared buffer is proposed. Its shared buffer mechanism allows the output channels to share the centralized FIFO, so that each channel has sufficient buffer space. Additionally, the proposed architecture reduces area and power consumption while achieving the same performance.
In addition to the proposed two-level FIFO buffer, an adaptive congestion-aware routing algorithm with a quality-of-service guarantee arbitration mechanism is proposed for mesh OCINs. Depending on the traffic around the current node, the proposed routing algorithm provides not only minimal paths but also non-minimal paths for routing packets. Both minimal and non-minimal paths are based on the odd-even turn model to avoid deadlock and livelock problems. The choice between minimal and non-minimal paths depends on the buffer utilization of the neighboring nodes and a specific switching value. The adaptive algorithm thereby steers packets away from congested regions and distributed hotspots, which improves performance and reduces latency.
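The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: the coordinate convention (x grows east, y grows north), the function names, and the way the switching value is used as a utilization threshold are all hypothetical. The turn check encodes the two standard odd-even rules (no east-to-north or east-to-south turn in an even column, no north-to-west or south-to-west turn in an odd column). The sketch only filters illegal turns; it does not bound the number of detours.

```python
# Hypothetical sketch; names, coordinates and threshold use are assumptions.
EAST, WEST, NORTH, SOUTH, LOCAL = "E", "W", "N", "S", "LOCAL"
STEP = {EAST: (1, 0), WEST: (-1, 0), NORTH: (0, 1), SOUTH: (0, -1)}
OPPOSITE = {EAST: WEST, WEST: EAST, NORTH: SOUTH, SOUTH: NORTH}


def turn_allowed(col, heading, out_dir):
    """Odd-even turn rules for a packet travelling in direction `heading`."""
    if heading is None:              # freshly injected packet, no turn yet
        return True
    if out_dir == OPPOSITE[heading]:  # never send a packet straight back
        return False
    even = col % 2 == 0
    if heading == EAST and out_dir in (NORTH, SOUTH):
        return not even              # EN and ES turns forbidden in even columns
    if heading in (NORTH, SOUTH) and out_dir == WEST:
        return even                  # NW and SW turns forbidden in odd columns
    return True


def select_output(cur, dst, heading, utilization, threshold):
    """Pick an output port from neighbour buffer utilization (0.0 to 1.0).

    `utilization` maps each existing neighbour direction to its utilization.
    """
    if cur == dst:
        return LOCAL
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    legal = [d for d in utilization if turn_allowed(cur[0], heading, d)]
    minimal = [d for d in legal if STEP[d][0] * dx > 0 or STEP[d][1] * dy > 0]

    # Prefer the least-loaded minimal port that is below the switching value.
    free_minimal = [d for d in minimal if utilization[d] < threshold]
    if free_minimal:
        return min(free_minimal, key=utilization.get)

    # Otherwise detour through an uncongested non-minimal port, if one exists.
    detours = [d for d in legal
               if d not in minimal and utilization[d] < threshold]
    if detours:
        return min(detours, key=utilization.get)

    # Everything is congested: fall back to the least-loaded legal port.
    fallback = minimal or legal
    return min(fallback, key=utilization.get) if fallback else None


load = {EAST: 0.9, NORTH: 0.8, SOUTH: 0.2, WEST: 0.5}
detour_choice = select_output((3, 1), (4, 3), EAST, load, threshold=0.75)
minimal_choice = select_output((3, 1), (4, 3), EAST, load, threshold=0.95)
```

With a switching value of 0.75 both minimal ports (east and north) count as congested, so the packet detours south; raising the value to 0.95 keeps it on the less loaded minimal port to the north.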
An implementation of routing tables using content addressable memories (CAM) is also proposed. Moreover, this implementation is extended to next-generation IPv6 network routers using ternary content addressable memories (TCAM). As routing tables become larger, energy consumption and leakage current become increasingly important issues in the design of TCAM in nano-scale technologies. Therefore, a novel energy-efficient TCAM macro design is proposed for IPv6 applications. The proposed TCAM employs the concept of architecture and circuit co-design. To achieve an energy-efficient TCAM architecture, a butterfly match-line scheme and a hierarchy search-line scheme are developed to significantly reduce both the search time and the power consumption. The match-lines are also implemented using noise-tolerant XOR-based conditional keepers, which further reduce both the search time and the power consumption. To reduce the increasing leakage power in advanced technologies, the proposed TCAM design utilizes two power gating techniques, namely super cut-off power gating and multi-mode data-retention power gating.
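For readers unfamiliar with TCAM-based routing tables, the lookup behavior can be modeled in software. The sketch below is a functional model only, with a hypothetical class name and example prefixes; it says nothing about the match-line, search-line, or power gating circuits. Each entry stores a value and a care mask, and entries are kept in longest-prefix-first order so that the first hit is the longest match. A hardware TCAM compares all entries in parallel, whereas this model scans them in priority order.

```python
import ipaddress


class TernaryCamModel:
    """Functional model of a TCAM routing table (hypothetical, for illustration)."""

    def __init__(self, width=128):
        self.width = width
        self.entries = []  # list of (value, care_mask, next_hop)

    def add_prefix(self, network, next_hop):
        net = ipaddress.IPv6Network(network)
        length = net.prefixlen
        # Care mask: 1 = bit must match, 0 = "don't care" (the ternary X state).
        mask = ((1 << length) - 1) << (self.width - length)
        value = int(net.network_address) & mask
        self.entries.append((value, mask, next_hop))
        # Longest prefixes first, so the first match is the longest match.
        self.entries.sort(key=lambda e: bin(e[1]).count("1"), reverse=True)

    def search(self, address):
        key = int(ipaddress.IPv6Address(address))
        for value, mask, next_hop in self.entries:
            if key & mask == value:
                return next_hop
        return None


table = TernaryCamModel()
table.add_prefix("::/0", "default")
table.add_prefix("2001:db8::/32", "port A")
table.add_prefix("2001:db8:1::/48", "port B")

hop_specific = table.search("2001:db8:1::5")
hop_general = table.search("2001:db8:2::5")
hop_default = table.search("fe80::1")
```

The address `2001:db8:1::5` matches all three entries, and the model returns the next hop of the `/48` entry because it is the longest prefix.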
1.2.3 Network Interfaces
The NI, one of the building blocks of OCINs, is a major factor in network performance. In this dissertation, an efficient NI is proposed for the memory-centric OCIN to reduce data blocking through a buffer borrowing mechanism. By considering the borrowed memory blocks and the p-MMU, the size of the output queue in the NI can be adjusted dynamically. Additionally, the p-MMU can dynamically allocate memory resources for buffering the blocked network data. Therefore, the proposed NI can increase the performance of the memory-centric OCIN.
1.2.4 On-Demand Memory Sub-system
In this dissertation, a memory-centric on-chip data communication platform is presented for integrating heterogeneous PEs and is applied to wireless video entertainment systems. In this platform, an on-demand memory sub-system is developed for dynamically allocating memory resources and efficiently managing memory accesses. The contributions of the on-demand memory sub-system are described as follows.
A. Buffer borrowing mechanism for NIs
To reduce PE stalls caused by network data blocking, a novel buffer borrowing mechanism is proposed that borrows memory resources to buffer the blocked packets.
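The idea of borrowing memory to avoid stalls can be sketched with a small behavioral model. Everything in it is an assumption made for illustration: the class names, the block granularity, and the policy of returning a block as soon as it is no longer needed are hypothetical and are not taken from the proposed design.

```python
from collections import deque


class MemoryPool:
    """Stand-in for the memory blocks an MMU can lend out (hypothetical)."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks

    def borrow(self):
        if self.free_blocks == 0:
            return False
        self.free_blocks -= 1
        return True

    def give_back(self):
        self.free_blocks += 1


class OutputQueue:
    """NI output queue that grows by borrowing blocks when it is full."""

    def __init__(self, own_slots, pool, slots_per_block):
        self.own_slots = own_slots
        self.pool = pool
        self.slots_per_block = slots_per_block
        self.borrowed_blocks = 0
        self.packets = deque()

    @property
    def capacity(self):
        return self.own_slots + self.borrowed_blocks * self.slots_per_block

    def push(self, packet):
        """Return False only when the PE would have to stall."""
        if len(self.packets) >= self.capacity:
            if not self.pool.borrow():
                return False
            self.borrowed_blocks += 1
        self.packets.append(packet)
        return True

    def pop(self):
        packet = self.packets.popleft()
        # Return borrowed blocks once the remaining packets fit without them.
        while (self.borrowed_blocks
               and len(self.packets) <= self.capacity - self.slots_per_block):
            self.borrowed_blocks -= 1
            self.pool.give_back()
        return packet


pool = MemoryPool(free_blocks=1)
queue = OutputQueue(own_slots=2, pool=pool, slots_per_block=2)
accepted = [queue.push(p) for p in ["a", "b", "c", "d", "e"]]
```

In this run the third packet triggers a borrow, which lets the queue absorb four packets instead of two; only the fifth push is refused, because the pool has no block left to lend.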
B. Adaptive cache control
For multi-task applications, different PEs may have different memory requirements at runtime. Therefore, the proposed c-MMU supports memory resource re-allocation through an adaptive cache control scheme. Accordingly, the memory utilization of the system can be improved.
C. External Memory Interface for DDR3 DRAM
DDR3 DRAM devices are now widely used to provide large data storage. Therefore, an efficient external memory interface (EMI) is also designed to reschedule read/write commands for DDR3 DRAM, which reduces both execution time and energy consumption.
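Why command rescheduling helps can be seen from a small model. The scheduling policy of the proposed EMI is not described in this section, so the sketch below assumes a common open-row-first policy for a single bank: the scheduler issues the oldest pending command that hits the currently open row, never lets a command overtake an older command to the same address, and otherwise issues the oldest command. The names `Cmd`, `reschedule`, and `row_switches` are hypothetical, and the model has no aging limit, so it could starve old commands under sustained traffic.

```python
from typing import List, NamedTuple


class Cmd(NamedTuple):
    op: str   # "R" for read, "W" for write
    row: int
    col: int


def reschedule(cmds: List[Cmd]) -> List[Cmd]:
    """Open-row-first reordering for one bank (assumed policy, see lead-in)."""
    pending = list(cmds)
    issued = []
    open_row = None
    while pending:
        pick = 0  # default: the oldest pending command
        for i, cmd in enumerate(pending):
            # Keep the original order among commands to the same address.
            blocked = any(p.row == cmd.row and p.col == cmd.col
                          for p in pending[:i])
            if cmd.row == open_row and not blocked:
                pick = i
                break
        chosen = pending.pop(pick)
        issued.append(chosen)
        open_row = chosen.row
    return issued


def row_switches(cmds: List[Cmd]) -> int:
    """Count how often consecutive commands target different rows."""
    return sum(1 for a, b in zip(cmds, cmds[1:]) if a.row != b.row)


requests = [Cmd("R", 1, 0), Cmd("W", 2, 0), Cmd("R", 1, 4),
            Cmd("R", 2, 0), Cmd("W", 1, 8)]
scheduled = reschedule(requests)
```

In this example the original order changes rows four times, while the rescheduled order changes rows once. Each avoided row change saves a precharge and an activate, which is where the time and energy savings of such a policy come from. The write to row 2, column 0 still precedes the read of the same address.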
D. Inter-Layer Pre-Fetch Scheme for Scalable Video Coding
For wireless video entertainment systems, scalable video coding (SVC) is used to adapt to variations in wireless channels. Based on the data stream of SVC, an inter-layer pre-fetch scheme (IPS) is proposed to reduce the miss rate during SVC frame decoding.
E. Efficient Address Translator (AT) for SVC
An efficient address translator is proposed to allocate SVC data in DDR3 DRAM. The translated addresses improve DRAM access efficiency while processing SVC.