Chapter 5 Memory-Centric On-Chip Data Communication for
transmitting bandwidth for the multi-core platform is increasing year by year, as shown in Figure 5.3. However, overall system performance can be limited by task partitioning, task mapping, memory resource allocation, and memory data access. Figure 5.4 indicates the bottlenecks of multi-core platforms: insufficient memory bandwidth and memory capacity prevent high communication efficiency. With the ongoing development of multi-core and multi-task systems, both greater memory capacity and higher memory access bandwidth are required. Enabling multiple simultaneous memory accesses is necessary to improve memory bandwidth. However, adding memory read/write ports not only increases hardware complexity but also reduces memory performance and noise immunity. Conventional memory access methods therefore cannot provide enough memory bandwidth for a multi-core platform, and memory management in multi-core and multi-task platforms will become increasingly important. Reducing additional memory accesses and effectively increasing memory bandwidth are essential. For these reasons, a memory-centric on-chip data communication platform is proposed and introduced in the following section.
Figure 5.4 Comparison between memory bandwidth, memory capacity and communication efficiency in multi-core systems
5.2 Memory-Centric On-Chip Data Communication Platform
5.2.1 Overall Architecture
A memory-centric on-chip data communication platform is proposed, and its architecture is shown in Figure 5.5.
Heterogeneous processing elements, such as microprocessors and application-specific stream processors, can be integrated in the platform. Each processor element owns a distributed memory management unit (d-MMU). The d-MMU includes a local cache (D-cache and I-cache) and a cache controller that efficiently handles all memory requests generated by the processor element. It can also dynamically allocate unused cache space to buffer data in transit. If a processor element requires additional memory resources, the centralized memory resources, comprising a centralized cache and off-chip DRAM, can be used. These resources are controlled by a centralized memory management unit (c-MMU), which dynamically allocates and manages them according to the memory requirements of the processor elements.
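The request flow described above can be sketched as a simple fallback chain. This is a minimal sketch under stated assumptions, not the platform's actual design: the class names MemoryPool, CentralMMU, and DistributedMMU, the byte-granular bookkeeping, and the first-fit order (local cache, then centralized cache, then off-chip DRAM) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryPool:
    """A memory resource with a fixed capacity, tracked in bytes (hypothetical)."""
    name: str
    capacity: int
    used: int = 0

    def try_allocate(self, size: int) -> bool:
        """Reserve `size` bytes if they fit; report whether it succeeded."""
        if size <= self.capacity - self.used:
            self.used += size
            return True
        return False


@dataclass
class CentralMMU:
    """Assumed c-MMU behavior: try the centralized cache, then off-chip DRAM."""
    central_cache: MemoryPool
    dram: MemoryPool

    def allocate(self, size: int) -> Optional[str]:
        for pool in (self.central_cache, self.dram):
            if pool.try_allocate(size):
                return pool.name
        return None  # no centralized resource can satisfy the request


@dataclass
class DistributedMMU:
    """Assumed d-MMU behavior: serve requests locally before asking the c-MMU."""
    local_cache: MemoryPool
    c_mmu: CentralMMU

    def allocate(self, size: int) -> Optional[str]:
        if self.local_cache.try_allocate(size):
            return self.local_cache.name
        return self.c_mmu.allocate(size)
```

In this sketch, a processor element whose local cache is nearly full is served transparently from the centralized resources, and the returned name indicates which level of the hierarchy satisfied the request.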
For data communication between processor elements, the platform applies a message-passing technique. Processor elements transmit data to and receive data from one another through an on-chip interconnection network. A network interface packetizes data transmitted to the interconnection network and de-packetizes data received from it. Furthermore, to improve energy utilization for green computing, a power management unit can dynamically control the supply voltage and operating frequency of each processor element to reduce energy consumption.
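The role of the network interface can be illustrated with a short sketch of packetization and de-packetization. The packet format here is an assumption made for illustration: the header fields (src, dst, seq, last) and the 4-byte payload size are hypothetical and are not taken from the platform.

```python
from dataclasses import dataclass
from typing import List

PAYLOAD_BYTES = 4  # assumed payload size per packet, chosen for illustration


@dataclass(frozen=True)
class Packet:
    src: int        # ID of the sending processor element
    dst: int        # ID of the receiving processor element
    seq: int        # position of this packet within the message
    last: bool      # True on the final packet of the message
    payload: bytes


def packetize(src: int, dst: int, message: bytes) -> List[Packet]:
    """Split a message into fixed-size packets, each carrying a header."""
    chunks = [message[i:i + PAYLOAD_BYTES]
              for i in range(0, len(message), PAYLOAD_BYTES)] or [b""]
    return [Packet(src, dst, seq, seq == len(chunks) - 1, chunk)
            for seq, chunk in enumerate(chunks)]


def depacketize(packets: List[Packet]) -> bytes:
    """Reassemble a message from its packets, tolerating out-of-order arrival."""
    ordered = sorted(packets, key=lambda p: p.seq)
    if not ordered or not ordered[-1].last:
        raise ValueError("message is incomplete")
    if [p.seq for p in ordered] != list(range(len(ordered))):
        raise ValueError("message has missing or duplicated packets")
    return b"".join(p.payload for p in ordered)
```

The sequence number and the last-packet flag let the receiving side detect an incomplete message instead of silently delivering truncated data.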
In a heterogeneous multi-task platform, processor elements with different functions can have quite different memory requirements. For instance, video decoding requires more memory than the wireless processing unit. Moreover, system environment factors can affect the memory utilization of applications at runtime. In an integrated wireless video system, for example, wireless channels of different quality can lead to different memory behavior. Thus, an on-demand memory system with a multilevel memory hierarchy is applied to this platform. The memory system enables the processing elements to own different amounts of memory resources dynamically. The concept of the on-demand memory system is introduced in the following section.
Figure 5.5 labels: RISC; d-MMU = Distributed Memory Management Unit; NI = Network Interface
Figure 5.5 The architecture of memory-centric on-chip data communication platform
5.2.2 Concepts of On-Demand Memory System
In the on-demand memory system, a three-level memory hierarchy is constructed, as illustrated in Figure 5.6. At the first level, the distributed memory management unit (d-MMU) controls memory accesses. It includes a distributed cache and a cache controller for the processor elements. Furthermore, to improve the efficiency of data communication, the d-MMU can dynamically allocate unused space in the distributed cache to store packet data, preventing stalls caused by data blocking.
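The buffering behavior described above can be sketched as bookkeeping over cache lines. This is a minimal sketch, assuming line-granular allocation; the class DistributedCache and its method names are hypothetical, and a False return from buffer_packet stands for the case in which the sender would have to stall.

```python
class DistributedCache:
    """Hypothetical distributed cache whose unused lines can buffer packets."""

    def __init__(self, total_lines: int) -> None:
        self.total_lines = total_lines
        self.data_lines = 0      # lines holding ordinary cached data
        self.packet_lines = 0    # lines lent out as packet buffers

    def unused_lines(self) -> int:
        return self.total_lines - self.data_lines - self.packet_lines

    def cache_data(self, lines: int) -> None:
        """Occupy lines with ordinary cached data, limited to the unused space."""
        self.data_lines += min(lines, self.unused_lines())

    def buffer_packet(self, lines: int) -> bool:
        """Store an incoming packet in unused lines if enough of them exist."""
        if lines > self.unused_lines():
            return False  # no room: the packet is blocked and the sender stalls
        self.packet_lines += lines
        return True

    def release_packet(self, lines: int) -> None:
        """Return packet-buffer lines to the unused pool once data is consumed."""
        self.packet_lines -= min(lines, self.packet_lines)
```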
At the second level of the on-demand memory system, the centralized memory management unit (c-MMU) provides additional memory resources for the processor elements. The c-MMU includes a cache controller and a centralized cache. In addition, the configuration of the centralized cache can be adjusted dynamically according to the memory requirements of the processor elements. For example, if a processor element requires more memory than the others, it can
for green computing.
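One way to picture a centralized cache whose configuration follows the memory requirements of the processor elements is to divide its ways among them in proportion to demand. The policy shown, proportional sharing with largest-remainder rounding, is an assumption made for illustration and is not taken from the thesis; the function name partition_ways and the element names in the test values are hypothetical.

```python
from typing import Dict


def partition_ways(total_ways: int, demand: Dict[str, int]) -> Dict[str, int]:
    """Divide cache ways among processor elements in proportion to demand."""
    total_demand = sum(demand.values())
    if total_demand == 0:
        return {pe: 0 for pe in demand}
    # Integer share first, then hand out leftover ways by largest remainder.
    shares = {pe: total_ways * d // total_demand for pe, d in demand.items()}
    leftover = total_ways - sum(shares.values())
    by_remainder = sorted(
        demand,
        key=lambda pe: (total_ways * demand[pe]) % total_demand,
        reverse=True,
    )
    for pe in by_remainder[:leftover]:
        shares[pe] += 1
    return shares
```

Re-running the function whenever the demand figures change gives a new partition, so an element whose requirement grows at runtime receives a larger share at the expense of the others.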
Figure 5.6 Illustration of the memory hierarchy in on-demand memory system
To provide sufficient memory space, off-chip DRAM serves as the third level of the memory hierarchy. A DRAM controller is needed to access the off-chip DRAM devices. It includes an external memory interface and an address translator to improve memory access efficiency.
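As an illustration of what an address translator in a DRAM controller does, the sketch below maps a linear address to row, bank, and column fields. The thesis does not specify the mapping; the geometry constants and the row | bank | column bit layout are assumptions chosen for illustration. With this layout, consecutive row-sized blocks of the linear address space fall in different banks, a common way to let accesses to neighboring blocks overlap.

```python
from typing import NamedTuple

# Assumed DRAM geometry, for illustration only.
COLUMN_BITS = 10   # 1024 columns per row
BANK_BITS = 3      # 8 banks
ROW_BITS = 14      # 16384 rows per bank


class DramAddress(NamedTuple):
    row: int
    bank: int
    column: int


def translate(linear_address: int) -> DramAddress:
    """Map a linear address to (row, bank, column).

    Bit layout, from most to least significant: row | bank | column.
    """
    limit = 1 << (ROW_BITS + BANK_BITS + COLUMN_BITS)
    if not (0 <= linear_address < limit):
        raise ValueError("address outside the assumed DRAM range")
    column = linear_address & ((1 << COLUMN_BITS) - 1)
    bank = (linear_address >> COLUMN_BITS) & ((1 << BANK_BITS) - 1)
    row = linear_address >> (COLUMN_BITS + BANK_BITS)
    return DramAddress(row, bank, column)
```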
In the on-demand memory system, each processor element owns a private address space that can be allocated dynamically. Data is exchanged between processor elements through a message-passing mechanism over the on-chip interconnection network. Note that this thesis focuses on the on-demand memory system; the design of the interconnection network is not covered. In conclusion, the memory management units achieve adaptive memory resource allocation and improve memory utilization.