Chapter 4 Hierarchy Memory Management Units for On-Demand

4.2 Centralized Memory Management Unit Organization

4.2.2 External Memory Interface in DRAM controller

A memory hierarchy usually needs an off-chip memory as one of its levels to store large amounts of data. The external memory interface lets the system communicate with this external memory. To handle the heavy data transfer and storage demands of video processing, the external memory must provide high data bandwidth to meet real-time requirements. However, the bandwidth of the external memory is limited because the number of I/O pins is finite. The external memory interface must therefore achieve high bandwidth utilization, for example through command scheduling. This section introduces such an external memory interface.

4.2.2.1 Concept of External Memory Interface & DRAM Model

The external memory interface (EMI) is the interface between the on-chip system and the off-chip DRAM devices. It receives physical addresses from the address translation machine and generates DRAM commands to access DRAM data. The basic connection of the EMI is shown in Fig. 4.20. The EMI generates the commands defined in the DRAM specification, which was introduced in Chapter 2. Issuing DRAM commands is subject to various complex timing constraints, and the EMI must issue commands without any DRAM timing violation. To improve bandwidth efficiency, command scheduling is applied to reorder the DRAM commands. Because the banks in a DRAM device can operate in parallel, commands addressed to different banks can be issued without waiting on each other's per-bank timing constraints. Rescheduling DRAM commands therefore achieves higher bandwidth utilization than in-order issuing. The detailed architecture of the proposed EMI is described in the next section.

[Figure: block diagram of the EMI and the DRAM; signal labels: address, Write Data, Read Data, DATA, CMD/ADDR]

Fig. 4.20 Connection of EMI

In this work, the 1Gb DDR3 SDRAM model provided by Micron Inc. [4.8] is used. Several speed grades and configurations are available, as shown in Table 4.2; the 15E speed grade and the 64Meg x16 configuration are chosen. A DDR3 device has 8 independent banks. The EMI records the status of each bank and generates commands according to the corresponding bank state. Different speed grades and configurations may have different timing constraints, and the designer must follow these timing rules when building the memory interface. The detailed timing issues are described in the following sections.

Table 4.2 Micron's DDR3 configurations

4.2.2.2 Architecture of EMI

The architecture of the EMI is shown in Fig. 4.21. It consists of three types of finite state machines (the Bank FSMs, the command FSM, and the I/O FSM), FIFOs, a command scheduler, timing counters, and an I/O control circuit. The operation of the proposed EMI can be divided into three parts, each controlled by one type of finite state machine. These parts are introduced in the following sections.

[Figure: block diagram including Bank0 FSM, Bank1 FSM, Bank2 FSM, ..., Bank7 FSM]

Fig. 4.21 Architecture of EMI

4.2.2.2.1 Operation of EMI

The first part is command generation. To generate valid commands for DRAM access, eight Bank Finite State Machines (FSMs) record the status of the eight internal DRAM banks. The state diagram of the Bank FSM is shown in Fig. 4.22(a). When an input command addresses one of the DRAM banks, the state of the corresponding Bank FSM is checked. Depending on the bank status, the appropriate commands are sent to the Command Scheduler for rescheduling.
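The command generation step can be illustrated with a minimal Python sketch. It is not the state machine of Fig. 4.22(a), which is not reproduced here; it assumes a simplified two-state bank model (idle, or one row open) with an open-page policy, and the names `BankState` and `commands_for_access` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BankState:
    """Simplified bank record: which row is open, or None if the bank is idle."""
    open_row: Optional[int] = None


def commands_for_access(banks: List[BankState], bank: int, row: int,
                        is_write: bool) -> List[Tuple[str, int]]:
    """Return the DRAM commands needed for one access and update the bank state."""
    state = banks[bank]
    cmds: List[Tuple[str, int]] = []
    if state.open_row is not None and state.open_row != row:
        # Row conflict: close the open row before opening the new one.
        cmds.append(("PRECHARGE", bank))
        state.open_row = None
    if state.open_row is None:
        # Bank is idle: the target row must be activated first.
        cmds.append(("ACTIVATE", bank))
        state.open_row = row
    # Row is now open: issue the column access.
    cmds.append(("WRITE" if is_write else "READ", bank))
    return cmds


banks = [BankState() for _ in range(8)]                # eight independent banks
first = commands_for_access(banks, 0, 5, False)        # idle bank
second = commands_for_access(banks, 0, 5, False)       # same row: row hit
third = commands_for_access(banks, 0, 9, True)         # different row: row conflict
```

In this sketch the row hit needs only a column command, while the row conflict expands into three commands, which is the extra work the Command Scheduler later tries to hide.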

The second part is command issuing. After rescheduling, the DRAM commands are stored in the issue FIFO. When these commands are issued to the DRAM device, complex timing rules must be strictly observed. The command FSM issues each command at the right time without any timing violation. It is controlled by several timing counters, which record the remaining cycle margins of the different timing constraints.

When a command is issued, the related timing counters are set to a specific value and count down until they reach zero. The timing counters are checked whenever a new command is to be issued from the issue FIFO. If there is no timing violation, the command is issued to the DRAM; otherwise, the command stalls, and the EMI issues NOP commands to the external memory while waiting. The common timing parameters are shown in Table 4.3. In addition, the DRAM needs a long latency for power-up and initialization, including ZQ calibration and mode register loading. Fig. 4.22(b) shows the state diagram of the command FSM. It includes initialization states, issue states, and several waiting states. The initialization states handle DRAM initialization. The issue states send the appropriate DRAM commands to the I/O control block. The waiting states stall command issuing until the next command can be issued legally.
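The counter mechanism can be sketched in Python as follows. This is an illustrative model only: it keeps one down-counter per (previous command, next command) pair, ignores per-bank tracking and initialization, and uses a made-up constraint of 3 cycles rather than a value from Table 4.3. The names `TimingCounters` and `run` are hypothetical.

```python
class TimingCounters:
    """One down-counter per (previous command, next command) timing constraint."""

    def __init__(self, constraints):
        # constraints maps (prev_cmd, next_cmd) to a minimum spacing in cycles.
        self.constraints = constraints
        self.counters = {key: 0 for key in constraints}

    def can_issue(self, cmd):
        # A command is legal only if every counter guarding it has reached zero.
        return all(value == 0
                   for (_, nxt), value in self.counters.items() if nxt == cmd)

    def issue(self, cmd):
        # Issuing a command reloads every counter that it starts.
        for key in self.counters:
            if key[0] == cmd:
                self.counters[key] = self.constraints[key]

    def tick(self):
        # Called once per clock cycle: non-zero counters count down.
        for key, value in self.counters.items():
            if value > 0:
                self.counters[key] = value - 1


def run(issue_fifo, constraints):
    """Issue commands in FIFO order, inserting NOPs while a counter is non-zero."""
    timers = TimingCounters(constraints)
    pending = list(issue_fifo)
    trace = []
    while pending:
        if timers.can_issue(pending[0]):
            cmd = pending.pop(0)
            timers.issue(cmd)
            trace.append(cmd)
        else:
            trace.append("NOP")
        timers.tick()
    return trace


# Illustrative constraint: a READ may follow an ACTIVATE no sooner than 3 cycles later.
trace = run(["ACTIVATE", "READ"], {("ACTIVATE", "READ"): 3})
print(trace)  # ['ACTIVATE', 'NOP', 'NOP', 'READ']
```

A hardware implementation would presumably keep such counters per bank and per constraint; the sketch collapses them into one dictionary to keep the idea visible.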

The third part is I/O control. When a write command is issued, the write data must be sent after the column write latency. Likewise, when a read command is issued, the read data appears on the data bus after the column read latency. The I/O control block, governed by the I/O FSM, controls the timing of data access. It also controls the Data Strobe (DQS) signal, which is used to align the DRAM access data. Fig. 4.22(c) shows the state diagram of the I/O FSM.

[Figure: state diagrams of (a) the Bank FSM, (b) the command FSM, and (c) the I/O FSM]

Fig. 4.22 State diagram of EMI Finite State Machines

[Table columns: Parameter, Symbol]

Table 4.3 Common timing parameters of Micron DDR3 SDRAM

4.2.2.2.2 Command Scheduler

To improve bandwidth efficiency, the Command Scheduler reschedules the command sequence. Fully utilizing the DRAM bandwidth requires parallelizing accesses that address different banks. The scheduler applies a different scheduling strategy to each of the situations described below.

When successive accesses address different banks, a bank miss occurs. Fig. 4.23(a) shows the original command sequence without any scheduling, which has the worst bandwidth efficiency. Because the banks in a DRAM device can operate in parallel, the banks can be activated first and the column access commands issued afterwards, as shown in Fig. 4.23(b). The bank activation time is thereby hidden. However, no more than four ACTIVATE commands may be issued within any tFAW (MIN) period. If the number of successive accesses to different banks exceeds four, the optimal sequence interleaves ACTIVATE and column access commands, as shown in Fig. 4.23(c). The proposed Command Scheduler arranges the ACTIVATE and column access commands for different banks into this optimal sequence. Note that the cycle counts in Fig. 4.23 are based on the minimum clock cycle time defined in Table 4.3.
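The four-ACTIVATE window rule can be illustrated with a short Python sketch. It computes the earliest issue cycle of each ACTIVATE command, assuming a minimum ACTIVATE-to-ACTIVATE spacing (tRRD, which is not discussed in the text above) and a rolling tFAW window. The function name and the cycle values are illustrative assumptions, not values from Table 4.3 or from the proposed scheduler.

```python
def schedule_activates(num_acts, t_rrd, t_faw):
    """Earliest issue cycle of each ACTIVATE under tRRD and tFAW (in clock cycles)."""
    cycles = []
    for i in range(num_acts):
        earliest = 0
        if cycles:
            # Consecutive ACTIVATE commands must be at least tRRD apart.
            earliest = cycles[-1] + t_rrd
        if i >= 4:
            # At most four ACTIVATEs per rolling tFAW window: the fifth one
            # must wait until tFAW after the one issued four commands earlier.
            earliest = max(earliest, cycles[i - 4] + t_faw)
        cycles.append(earliest)
    return cycles


# Illustrative values only (assumptions, in clock cycles).
act_cycles = schedule_activates(num_acts=8, t_rrd=4, t_faw=20)
print(act_cycles)  # [0, 4, 8, 12, 20, 24, 28, 32]
```

With these assumed values the fifth ACTIVATE is delayed from cycle 16 to cycle 20 by the tFAW window, which leaves idle command slots that column access commands can fill when the two command types are interleaved.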

[Fig. 4.23 (fragment): ACT0 READ0 ACT1 READ1 ACT2 READ2 ... ACT7 READ7; 8 × tRCD = 72 cycles]

[...] subsequent read command after tWTR has been met. Bandwidth efficiency may therefore degrade when read and write commands are interleaved frequently. Fig. 4.24(a) illustrates read bursts issued after write bursts. If the successive read and write commands have no data dependency, their issue order can be exchanged to improve bandwidth efficiency. The example with scheduling is shown in Fig. 4.24(b).

[Figure: WRITE and READ burst sequences, (a) without scheduling and (b) with scheduling]

Fig. 4.24 Read/write scheduling
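The read/write exchange can be sketched in Python as below. This is a simplified illustration, not the proposed scheduler: it assumes that a data dependency means a read to an address written earlier in the same group, that addresses are compared at burst granularity, and that moving independent reads ahead of writes is the desired order. The function names are hypothetical.

```python
def reorder_reads_first(requests):
    """Hoist reads ahead of writes unless a read depends on an earlier write.

    requests is a list of (op, address) tuples with op "R" or "W".
    """
    out, reads, writes = [], [], []
    written = set()  # addresses written in the current group
    for op, addr in requests:
        if op == "R" and addr in written:
            # Read-after-write dependency: flush the current group so that
            # this read stays behind the write it depends on.
            out += reads + writes
            reads, writes, written = [], [], set()
        if op == "R":
            reads.append((op, addr))
        else:
            writes.append((op, addr))
            written.add(addr)
    return out + reads + writes


def write_to_read_turnarounds(sequence):
    """Count write-to-read transitions, each of which pays the tWTR delay."""
    return sum(1 for a, b in zip(sequence, sequence[1:])
               if a[0] == "W" and b[0] == "R")


original = [("W", 0), ("R", 1), ("W", 2), ("R", 3)]
scheduled = reorder_reads_first(original)
print(write_to_read_turnarounds(original))   # 2
print(write_to_read_turnarounds(scheduled))  # 0
```

Writes keep their relative order and a read never moves ahead of a write to the same address, so the reordering in this sketch preserves the data each read returns.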

When a row conflict occurs, PRECHARGE and ACTIVATE commands must be issued to close the open row and activate the new row. Fig. 4.25 shows an example of four successive row-conflict reads to different banks. With scheduling, the PRECHARGE and ACTIVATE commands can be issued in advance so that the precharge and activate times are hidden.

[Fig. 4.25 (fragment): (a) Without scheduling]

4.2.2.3 Bandwidth Improvement with Command Scheduler

The Command Scheduler improves memory bandwidth efficiency by rescheduling the DRAM commands. For irregular DRAM accesses in particular, it significantly reduces the average access time by hiding the additional cycles caused by bank and row conflicts. To measure DRAM access efficiency, we define the bandwidth utilization by the following equation.

[...] bandwidth utilization in the worst case. The successive access requests are random, so DRAM row and bank conflicts occur frequently. The remaining simulation settings are summarized in Table 4.4.

Test pattern configuration
  Burst length                     8
  Number of random r/w commands    2000

EMI configuration
  Reference model                  Micron DDR3-1333 MT41J128M8BY-15E
  Operating clock rate             666.67 MHz

Table 4.4 Simulation summary

Based on the bandwidth utilization defined above, the simulated bandwidth utilization is shown in Table 4.5. The Command Scheduler raises bandwidth utilization from 18.58% to 26.53%, a relative improvement of 42.8%.

                         Without scheduler    With scheduler    Improvement
Bandwidth utilization    18.58%               26.53%            42.8%

Table 4.5 Simulation of the bandwidth utilization
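As a quick check of the table, the improvement figure is a relative gain over the unscheduled case rather than a difference in percentage points. The short Python snippet below recomputes it from the two utilization values; the variable names are chosen here for illustration.

```python
# Bandwidth utilization values taken from Table 4.5 (in percent).
without_scheduler = 18.58
with_scheduler = 26.53

# Relative improvement over the unscheduled baseline.
relative_gain = (with_scheduler - without_scheduler) / without_scheduler * 100
# Absolute difference in percentage points, for comparison.
absolute_gain = with_scheduler - without_scheduler

print(f"{relative_gain:.1f}% relative, {absolute_gain:.2f} points absolute")
# prints "42.8% relative, 7.95 points absolute"
```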