Chapter 1: Introduction

1.1  Motivation

Modern multi-core SoC designs face a number of problems caused by data communication among processor elements (PEs) and by memory accesses. As process technologies shrink, the ratio of interconnection delay to gate delay increases [1.2], indicating that on-chip interconnection architectures will dominate performance in future SoC designs. Furthermore, reducing power consumption is the primary challenge for current multi-core SoC designs in advanced technologies.

Thus, using an on-chip bus to create a platform is one solution for multi-core SoC designs. Such an on-chip bus platform provides interfaces between multiple processor elements and verification environments [1.3], [1.4]. However, the requirements for on-chip communication bandwidth and the number of PEs are growing continually beyond what standard on-chip buses can accommodate. Moreover, advanced SoC designs using nano-scale technologies face a number of challenges. First, the shared bus architecture will become a critical factor in development as an increasing number of processor elements are integrated. Existing bus architectures and techniques are not scalable and cannot meet the specific requirements associated with low power and high performance [1.5]. Second, the interconnect delay across the chip exceeds the average clock period of IP blocks, and the ratio of global interconnect delay to average clock period will continue to increase according to the International Technology Roadmap for Semiconductors (ITRS) [1.2]. Third, advanced technologies increase coupling effects among interconnects, such as capacitive and inductive crosstalk noise. The increased coupling worsens power-delay metrics and degrades signal integrity [1.6]. Fourth, system design and performance are limited by the complexity of the interconnection between different modules and blocks in a single-clock design [1.7].

As the design complexity of multi-core SoCs continues to increase, a global approach is needed to transport and manage on-chip communication traffic effectively and to optimize wire efficiency. Therefore, the process-independent network-on-chip (NoC) has been considered an effective solution for integrating a multi-core system. NoCs have been investigated as a way of dealing with the on-chip data communication challenges caused by the increasing scale of next-generation SoC designs [1.8], [1.9]. The most important characteristics of an NoC are its packet-switched approach [1.10] and its flexible, user-defined topology [1.11]. Furthermore, on-chip interconnection networks (OCINs) provide the micro-architecture and the building blocks for NoCs, including network interfaces, routers and link wires [1.12], [1.13]. The generic OCIN is based on a scalable network that considers all requirements associated with on-chip data communication and traffic. OCINs have several beneficial characteristics, namely low communication latency, low energy consumption and design-time specialization. The motivation for establishing OCINs is to achieve performance from a system communication perspective.

Multi-core SoCs have become a major architectural trend in modern data computing systems. Multiple PEs are integrated on a single chip or package to exploit the parallelism of applications and to achieve superior performance as well as energy efficiency. Because these systems are highly integrated, their designs and trade-offs are tightly coupled; a single design decision can significantly affect multiple design layers. Thus, for optimal results, designers have to consider multiple design layers (vertical exploration) and multiple architecture options (horizontal exploration) when mapping an application to an underlying multi-core system, as shown in Fig. 1.2 [1.14]. In multi-core SoC designs, data streaming can be divided into three parts: data computation, data storage and data communication. As the number of PEs in multi-core SoCs increases, the capability of data computation increases rapidly to satisfy the growing demands of mobile multimedia services [1.5]. Additionally, multi-core SoCs support multi-task processing based on parallel programming and task scheduling, as shown in Fig. 1.2. According to the task scheduling, the on-chip data communication platform builds the backbone of the parallel hardware architecture and provides data communication and data storage via the OCIN and the memory sub-system, respectively.

Furthermore, memory accesses and on-chip data communication dominate the overall performance of multi-core SoCs, as shown in Fig. 1.3. Therefore, the development of the memory sub-system in multi-core systems will dramatically affect overall performance. Moreover, the relative complexity of a video system increases year by year, as presented in Fig. 1.4, which indicates that large memory capacity and memory bandwidth are required for high-quality video processing or video processing with multiple scalable levels [1.15]. Therefore, the memory sub-system should provide large memory space and high memory-access bandwidth to satisfy the real-time requirements of video. Accordingly, large amounts of high-speed, low-power memory are indispensable for emerging multi-task and multi-system designs. These memories should be able to support the diverse memory requirements of different PEs in a wireless video entertainment system through a memory sub-system.

Fig. 1.2 Vertical exploration of a multi-core system. [1.14]

As process technologies shrink to the nano-scale, ever-increasing on-chip integration has led to a dramatic increase in system performance and system scale. Unfortunately, as performance and area improve, power dissipation and heat density increase substantially [1.16]. Accordingly, power dissipation has become a critical design issue in multi-core SoC designs. In multi-core SoC implementations of mobile systems, especially for handheld audio and video applications, low-power considerations dominate the overall performance because the battery life and geometry of mobile systems are limited [1.17], [1.18]. The demand for reliable designs will require designers to find new technologies and circuits that ensure high performance and long operating lifetimes, owing to the high cost of packaging and cooling in nano-scale CMOS technologies. Therefore, energy-efficient circuitry becomes one of the critical issues in multi-core SoC designs.

Fig. 1.3 Comparison between memory bandwidth, computation capability and communication efficiency in multi-core SoCs.

Fig. 1.4 Relative complexity of a video system. [1.15]

Fig. 1.5 Energy-efficient on-chip data communication platform with a memory-centric OCIN and an on-demand memory sub-system. (Legend: p-MMU: Distributed Memory Management Unit; NI: Network Interface.)

Based on the above crucial issues in multi-core SoCs, including the energy bound and the increasing requirements of data communication and data storage, this dissertation proposes an energy-efficient on-chip data communication platform, as shown in Fig. 1.5. The platform consists of a memory-centric OCIN and an on-demand memory sub-system. The memory-centric OCIN provides building blocks that work with the on-demand memory sub-system, including energy-efficient and reliable channels, a congestion-aware routing algorithm, an energy-efficient routing table, a two-level FIFO buffer and buffer-efficient NIs. In addition, the on-demand memory sub-system provides high-bandwidth and low-power memory accesses for multi-core SoCs via a centralized memory management unit (c-MMU) and private memory management units (p-MMUs). It can supply a variety of memory resources to different PEs based on their memory behaviors. Moreover, when video frames are decoded, the memory access characteristics of video decoders are generally regular and repetitive. Therefore, the on-demand memory sub-system can improve decoding performance via efficient memory management.

Implementation of Routing Table (Chap 6)
1. Routing table in OCINs

2. Extended routing table in IPv6 applications

Energy-Efficient Reliable Channels (Chap 3)
1. Self-corrected green coding
2. Self-calibrated voltage scaling

Efficient Network Interface (Chap 7)
1. Borrowing mechanism for network interface

Link Wires

On-Demand Memory Sub-System

Distributed Memory Management Unit (Chap 7)
1. Borrowing mechanism for network interface
2. Inter-layer pre-fetch for scalable video coding

Centralized Memory Management Unit (Chap 7)
1. Adaptive cache control
2. Efficient external memory interface for DDR3
3. Efficient address translator

The network interface provides a bridge between data computation, data storage and data communication.

Energy-Efficient Multi-Core SoC Designs

Fig. 1.6 The contribution matrix of energy-efficient memory-centric on-chip data communication