Chapter 2 Related Researches of Memory Systems

2.3 DRAM

2.3.2 DRAM Controller Techniques and Improvements

Depending on the target application or system, memory controllers can be divided into two classes: special-purpose and general-purpose. A special-purpose memory controller serves one specific kind of application and is tuned to reduce its memory access latency. In many multimedia applications, advanced video processing requires a large amount of data storage. To support real-time video, the system needs external memory to hold image frame data and motion information, but memory access is much slower than processor execution. Many researchers have shown that a memory management method that exploits the regular memory access behavior of video processing can significantly improve overall system performance.

Several approaches have been proposed to increase memory access efficiency for specific video coding applications.

Kim's memory interface architecture [2.13] reorganizes the data arrangement in synchronous DRAM to increase the row-hit rate. Park proposed a memory mode control approach [2.14] for an HDTV video decoder. It uses history-based prediction to predict whether the next command will be a row hit or a row miss. If a row miss is predicted, the current bank is precharged; if a row hit is predicted, the current row stays in the active state. The prediction is implemented by the finite state machine shown in Fig. 2.16.

Fig. 2.16 State machine for storing page hit history information.
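The idea of history-based row-hit prediction can be illustrated with a small sketch. The state machine of Fig. 2.16 is not reproduced here; the code below assumes a generic 2-bit saturating counter instead, and the names `RowHitPredictor` and `simulate` are hypothetical.

```python
class RowHitPredictor:
    """Assumed 2-bit saturating counter (not the FSM of Fig. 2.16).

    States 0-1 predict a row miss, states 2-3 predict a row hit.
    """

    def __init__(self, state=2):
        self.state = state

    def predict_hit(self):
        return self.state >= 2

    def update(self, was_hit):
        # Move toward 3 on a hit and toward 0 on a miss, saturating at both ends.
        if was_hit:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def simulate(outcomes):
    """Return the page policy chosen before each access.

    `outcomes` lists whether each access actually turned out to be a row hit.
    """
    predictor = RowHitPredictor()
    decisions = []
    for was_hit in outcomes:
        # Predicted hit: leave the row open. Predicted miss: precharge early.
        decisions.append("keep_open" if predictor.predict_hit() else "precharge")
        predictor.update(was_hit)
    return decisions


print(simulate([True, True, False, False, False, True]))
```

In this sketch the controller switches to early precharge only after two consecutive misses have lowered the counter below the threshold, so a single isolated miss does not change the policy.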

Chang proposed a two-layer external memory management unit [2.15] for an H.264/AVC decoder. The first layer is address translation (AT), which provides an efficient pixel data arrangement to reduce row misses. The second layer is the external memory interface (EMI). In the address translation layer, a data arrangement tailored to the H.264/AVC decoder increases memory bandwidth and reduces power consumption. To minimize the number of activate and precharge operations, a chessboard-based memory mapping is used, as shown in Fig. 2.17. The benefit is increased further by interleaving luma and chroma data: the interlaced memory mapping method puts a luminance block and its chrominance block in the same row of a bank. Because the decoder accesses a chrominance block after each luminance block, it does not need to re-activate the row when accessing the chrominance block, which reduces both latency and power consumption. To decrease the latency of row-miss and bank-miss cases, the physical addresses produced by the AT layer are stored in a dedicated command FIFO.

The command FIFO can then detect automatically whether a row miss or a bank miss will occur. Its architecture is shown in Fig. 2.18. The incoming address is compared with PAR. If the bank address and row address are the same as those in PAR, the hit bit of the previous command is set to one, which turns auto-precharge off for that command. Otherwise, the hit bit remains zero, so auto-precharge stays on and the latency of the following row miss is reduced.

Fig. 2.17 Interlaced method
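The effect of placing luminance and chrominance blocks in the same row can be estimated with a rough sketch. It assumes a single bank, equally sized block slots, a fixed number of slots per row, and a decoder that reads one chrominance block after each luminance block. The chessboard arrangement across banks is not modeled, and all function names and parameters are hypothetical.

```python
def count_activations(rows):
    """Count row activations for a sequence of row numbers in one bank."""
    open_row = None
    activations = 0
    for row in rows:
        if row != open_row:  # a different row needs precharge + activate
            activations += 1
            open_row = row
    return activations


def access_rows(num_blocks, blocks_per_row, interleaved):
    """Rows touched when reading luma block i followed by chroma block i."""
    rows = []
    for i in range(num_blocks):
        if interleaved:
            # Luma and chroma of the same block sit in adjacent slots.
            luma_slot, chroma_slot = 2 * i, 2 * i + 1
        else:
            # All luma blocks first, all chroma blocks in a separate region.
            luma_slot, chroma_slot = i, num_blocks + i
        rows.append(luma_slot // blocks_per_row)
        rows.append(chroma_slot // blocks_per_row)
    return rows


separate = count_activations(access_rows(8, 4, interleaved=False))
interlaced = count_activations(access_rows(8, 4, interleaved=True))
print(separate, interlaced)  # 16 4
```

Under these assumptions the separate layout activates a row on every access, while the interlaced layout activates a row only when the next row is reached.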

Fig. 2.18 Two architectures of the command FIFO. B = 1 indicates a bank hit; R = 1 indicates a row hit.
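The hit-bit mechanism of the command FIFO can be sketched as follows. This is a simplified model, not the architecture of Fig. 2.18. It assumes that PAR holds the bank and row address of the most recently queued command and that a row hit is detected by comparing those two fields. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Command:
    bank: int
    row: int
    col: int
    hit: int = 0  # 1: the next command reuses this row, so auto-precharge is off


class CommandFifo:
    def __init__(self):
        self.entries = []
        self.par = None  # assumed role of PAR: (bank, row) of the last queued command

    def push(self, bank, row, col):
        # Compare the incoming address with PAR; on a match, mark the
        # previous command so that it leaves the row open.
        if self.entries and self.par == (bank, row):
            self.entries[-1].hit = 1
        self.entries.append(Command(bank, row, col))
        self.par = (bank, row)

    def auto_precharge_flags(self):
        """True where a command would be issued with auto-precharge."""
        return [cmd.hit == 0 for cmd in self.entries]


fifo = CommandFifo()
for address in [(0, 5, 0), (0, 5, 8), (0, 6, 0), (1, 6, 0)]:
    fifo.push(*address)
print(fifo.auto_precharge_flags())  # [False, True, True, True]
```

Only the first command keeps its row open, because only the second command targets the same bank and row; every other command closes its row as soon as it completes.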

Kim [2.13] and Chang [2.15] reorganize the data arrangement, Park [2.14] proposed a history-based memory mode controller, and Zhu [2.16] and Hongqi [2.17] adjust the page size. All of these designs aim to reduce the total number of row misses and minimize DRAM access latency. In an advanced memory controller, rearranging data is necessary to reduce access latency. In addition, the advanced video coding standard H.264/AVC provides several new coding tools, including sub-pixel inter-prediction and variable block size motion compensation. Although these techniques reduce the bit rate and improve video quality, they require large memory bandwidth to fetch additional reference pixels for motion compensation (MC) and interpolation. Fortunately, designers can use data reuse schemes to reduce the DRAM bandwidth needed to load sub-pixel MC data. The interpolation window reuse (IWR) scheme [2.18] was proposed to reduce accesses to overlapped data. Li [2.19] proposed a cache-based architecture to reuse intra-MB overlapped data, and Chuang [2.20] proposed an IWR-like scheme with an N-way associative cache architecture to reuse intra-MB and inter-MB overlapped data.
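The amount of overlapped data that such reuse schemes target can be estimated with simple arithmetic. The sketch below is not the IWR scheme itself. It relies on the 6-tap luma interpolation filter of H.264/AVC, which requires a (W+5) x (H+5) reference window for a W x H block with a fractional motion vector. It also assumes the best case for reuse, in which all sixteen 4x4 blocks of a macroblock share one motion vector. The function names are hypothetical.

```python
TAPS = 6  # length of the H.264/AVC luma half-pel interpolation filter


def ref_window(width, height, taps=TAPS):
    """Reference pixels needed to interpolate one width x height block."""
    return (width + taps - 1) * (height + taps - 1)


def loads_without_reuse(mb_size=16, blk_size=4):
    """Each sub-block fetches its own window; overlaps are fetched again."""
    num_blocks = (mb_size // blk_size) ** 2
    return num_blocks * ref_window(blk_size, blk_size)


def loads_with_full_reuse(mb_size=16):
    """Assumed best case: all sub-blocks share one motion vector, so the
    union of their windows equals the window of the whole macroblock."""
    return ref_window(mb_size, mb_size)


print(loads_without_reuse())    # 16 * 9 * 9 = 1296
print(loads_with_full_reuse())  # 21 * 21 = 441
```

In this best case, reuse removes 855 of the 1296 pixel loads, which is roughly two thirds of the reference traffic for the macroblock. With differing motion vectors the windows overlap less and the saving is smaller.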

To improve bandwidth, Kang [2.21] and Heithecker [2.22] proposed multi-channel memory controllers. The multi-channel concept can also be applied to general-purpose memory controllers. In SoC design, a variety of processing elements are integrated into one chip. Because different applications have different memory needs, it is difficult to find a single topology that fits all of them well, so flexible and adaptable memory control is increasingly important in SoC systems. Furthermore, multi-core systems need multi-channel memory controllers to supply high bandwidth and to meet the memory requirements of different applications. Many researchers have developed efficient memory systems of various kinds. Lee [2.23] presents a multilayer, quality-aware memory controller that satisfies different memory access requirements.

Fig. 2.19 shows the configurations of the different layers of the proposed memory controller. Layer 0, the memory interface socket (MIS), is a configurable, programmable, and highly efficient SDRAM controller that lets designers rapidly integrate an SDRAM subsystem into their designs. Layer 1, the quality-aware scheduler (QAS), is a memory controller layer that provides quality-of-service guarantees, including minimum access latencies and fine-grained bandwidth allocation, for heterogeneous processing elements in SoC designs. Layer 2, the built-in address generator (BAG), is designed for multimedia processing elements; it effectively reduces address bus traffic and therefore further increases the efficiency of on-chip communication.

Fig. 2.19 Configurations of different layers of the proposed memory controller

Nikolov [2.24] presents an efficient multiprocessor platform that separates the data communication path from the memory data access path. Soininen [2.25] presents the smart memory tile architecture to improve memory bandwidth and performance.

Ipek [2.26] proposed a self-optimizing memory controller based on reinforcement learning. To adjust memory access scheduling dynamically, Zheng [2.27] proposed the ME-LREQ (Memory Efficiency-Least Request) policy.

In addition, many SoC and computer systems rely on DRAM devices to store data. Because of their 3-D (bank, row, column) structure, modern DRAM devices have non-uniform access latencies [2.28]. Consecutive accesses to the same row of the same bank have lower latency than accesses to different rows of the same bank, because no row conflict occurs. Many researchers have demonstrated that reordering memory requests and executing them out of order can significantly reduce the row conflict rate and improve memory bandwidth efficiency. Shao [2.28] proposed a burst scheduling mechanism to maximize bus utilization of the SDRAM device. With this scheduling, memory accesses to the same row of the same bank are clustered into bursts. Subsequently, Hu [2.29] proposed new memory access scheduling algorithms that overcome the starvation problem in burst scheduling.
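The benefit of clustering accesses to the same row can be illustrated with a simplified sketch. It is not the mechanism of [2.28]: it only groups pending requests by (bank, row) in order of first arrival and counts the resulting row activations under an assumed open-row policy. The function names are hypothetical.

```python
def cluster_into_bursts(requests):
    """Group (bank, row, col) requests by (bank, row), keeping arrival order
    within each burst and ordering bursts by their first request."""
    bursts = {}
    for bank, row, col in requests:
        bursts.setdefault((bank, row), []).append((bank, row, col))
    return [req for burst in bursts.values() for req in burst]


def row_activations(requests):
    """Count row activations, with one open row tracked per bank."""
    open_rows = {}
    activations = 0
    for bank, row, _ in requests:
        if open_rows.get(bank) != row:
            activations += 1
            open_rows[bank] = row
    return activations


pending = [(0, 1, 0), (0, 2, 0), (0, 1, 1), (0, 2, 1), (1, 7, 0), (0, 1, 2)]
print(row_activations(pending))                       # 6
print(row_activations(cluster_into_bursts(pending)))  # 3
```

Because this sketch always serves the oldest burst to completion, a long burst can delay requests to other rows indefinitely, which illustrates the starvation problem mentioned above.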