
CHAPTER 4 SIMULATION RESULT AND ANALYSIS

4.3 Simulation Result and Analysis

4.3.2 DRAM Power Consumption First

Since the video phone scenario does not require the full 71.8% bandwidth utilization to finish all tasks, reducing bandwidth utilization to achieve lower power consumption is more favorable in an embedded system. To reduce power, we analyze the DRAM power consumption.

The DRAM power components listed in Section 2.2.3 can be divided into the background power, activate power, and read/write power.

The background power consists of the precharge power-down power, precharge standby power, active power-down power, active standby power, and refresh power.

In our simulation environment, the DRAM is always in the active standby state. Thus, the effective background power is the sum of the active standby power and the refresh power, and the background power is therefore fixed at all times.

The activate power is determined by the total number of ACTIVE commands and the task execution time. Thus, fewer ACTIVE commands or a longer task execution time lowers the activate power.

The read/write power is composed of the read power, write power, and I/O power. The read power and I/O power are determined by the total number of data reads and the task execution time; likewise, the write power is determined by the total number of data writes and the task execution time. To lower the read/write power, we can reduce the read or write data count and stretch the task execution time. However, for the same access pattern, the number of data reads or writes is determined by the data burst length. Since a memory controller with a shorter data burst length reads or writes fewer extra data, it consumes less power.

According to the above analysis, to reduce DRAM power consumption we should reduce the number of ACTIVE commands and the data burst length, and stretch the task execution time within the timing constraints.
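The decomposition above can be sketched numerically. The following is only a minimal illustration, not the thesis simulator: the function name and all energy and power parameter values (`e_activate`, `p_standby`, and so on) are assumed placeholders in the spirit of the Micron DDR power model [16].

```python
# Hypothetical sketch of the DRAM power decomposition described above.
# All parameter values are illustrative placeholders, not datasheet figures.

def dram_power(n_activate, n_reads, n_writes, exec_time_s,
               e_activate=2.0e-9,   # energy per ACTIVE command (J), assumed
               e_read=1.5e-9,       # energy per read data beat (J), assumed
               e_write=1.6e-9,      # energy per write data beat (J), assumed
               p_standby=0.060,     # active standby power (W), assumed
               p_refresh=0.010):    # refresh power (W), assumed
    """Average power = fixed background + per-event energy / execution time."""
    background = p_standby + p_refresh                     # fixed at all times
    activate = n_activate * e_activate / exec_time_s       # ACTIVE command power
    read_write = (n_reads * e_read + n_writes * e_write) / exec_time_s
    return background + activate + read_write

# Fewer ACTIVE commands, fewer data beats, or a longer execution time
# all lower the average power, as argued above.
p_short = dram_power(n_activate=1000, n_reads=8000, n_writes=4000, exec_time_s=1e-3)
p_long  = dram_power(n_activate=1000, n_reads=8000, n_writes=4000, exec_time_s=2e-3)
assert p_long < p_short
```

The event-driven terms scale with counts divided by execution time, while the background term is constant, which is why stretching the execution time within timing constraints lowers the average power.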

Taking Table 4-5, Table 4-6, Fig. 4-9, and Fig. 4-11 into consideration, we choose the memory controller with data burst length 4, the MFIFS transaction scheduling policy, and no bank-interleaving support. Although data burst length 2 may consume less power, its probability of timing violation is also larger. Taking Fig. 4-7 as a reference, we set all buffer sizes to 4 and the MFIFS threshold to 2 to suppress the increase of bandwidth utilization, which stretches the task execution time.

                                     Fixed priority   Round-robin
Average DRAM bandwidth utilization   42.9%            36.8%
Average DRAM power consumption       356.9 mW         423.4 mW

Table 4-10 Average DRAM bandwidth utilization and power consumption with different bus arbitration schemes

Table 4-10 lists the simulation results; there is no timing violation. Compared with the results in Section 4.3.1, the average DRAM power consumption is reduced by 26.3% with the fixed priority bus arbitration scheme and by 13.2% with round-robin.

Fig. 4-12 Average DRAM power consumption with different optimization policies (power in mW, broken down into total read/write power, total activate power, and total background power)

Fig. 4-12 shows each component of the average DRAM power consumption under the bandwidth utilization first and power consumption first policies. The background power is always fixed and accounts for about 30% of the total power consumption.

When the fixed priority bus arbitration scheme is applied with power consumption first, the activate and read/write power are reduced by 24.4% and 40.0%, respectively. When the round-robin bus arbitration scheme is applied with power consumption first, the activate power increases by 116.2%, but the 48.5% reduction in read/write power still lowers the total power consumption.

The result is not optimal. However, it is hard to find an optimal result without thorough simulations, since the related factors are not independent and affect each other.

Chapter 5

Hardware Implementation

This chapter contains two sections. Section 5.1 describes the hardware design of the memory controller. Section 5.2 presents the implementation result.

5.1 Hardware Design

Fig. 5-1 Hardware block diagram of the memory controller

In Section 4.3.1, we developed a high-performance memory controller architecture. In Section 4.3.2, a lower-power architecture with reduced performance, obtained by applying fewer techniques, was presented. In real applications, the latter should be implemented since it completes all tasks at lower cost. However, because we want to know the hardware cost when all techniques are applied, the former is implemented here.

Fig. 5-1 shows the hardware block diagram of the memory controller. The memory controller consists of five parts: the AXI interface, the input and output buffer, the transaction scheduler, the command translator, and the command controller.

The AXI interface handles the VALID/READY channel handshaking. In addition, it merges the write and read address channels into the single input address buffer using a round-robin scheme.
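The channel-merging behavior can be sketched in software. This is a behavioral illustration only, not the thesis RTL; the class and attribute names are hypothetical.

```python
# Hypothetical sketch: merge the AXI read and write address channels into
# one stream with a round-robin pick, as described above.
from collections import deque

class AddressMerger:
    def __init__(self):
        self.read_ch = deque()    # pending read addresses
        self.write_ch = deque()   # pending write addresses
        self.turn = "read"        # which channel has priority next

    def pick(self):
        """Round-robin: alternate channels whenever both have requests."""
        first, second = ((self.read_ch, self.write_ch)
                         if self.turn == "read"
                         else (self.write_ch, self.read_ch))
        for ch in (first, second):
            if ch:
                # hand the turn to the other channel after a successful pick
                self.turn = "write" if ch is self.read_ch else "read"
                return ch.popleft()
        return None  # no pending address on either channel

m = AddressMerger()
m.read_ch.extend([0x100, 0x104])
m.write_ch.append(0x200)
order = [m.pick(), m.pick(), m.pick()]
```

While both channels are non-empty the picks alternate, so neither the read nor the write path can starve the single input address buffer.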

The input and output buffer is composed of two input buffers and two output queues, each with 12 entries.

The transaction scheduler reorders input transactions according to the MFIFS transaction scheduling policy. We implement it with two components: the transaction reorder unit records the IDs of input transactions and reorders them, and the transaction issue unit fetches the corresponding transaction address and data for the command translator based on the ID output by the transaction reorder unit.

Fig. 5-2 Block diagram of transaction reorder unit

Fig. 5-2 shows the block diagram of the transaction reorder unit. The ID issue unit receives transaction IDs and addresses from the AXI interface and sends them to the corresponding ID buffer according to their bank addresses. Each ID buffer contains 12 entries. The threshold counter provides the successive access count to the ID selector for the output decision.
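The datapath just described (per-bank ID buffers, a threshold counter, and an ID selector) can be sketched as below. The precise MFIFS selection rule is defined in Chapter 3; here we only assume, for illustration, that the selector keeps serving the current bank until the successive access count reaches the threshold, then moves to the next non-empty bank buffer. All names are hypothetical.

```python
# Hypothetical sketch of the transaction reorder unit: per-bank ID buffers
# plus a threshold counter that limits successive same-bank selections.
# This illustrates the datapath above, not the exact MFIFS rule of Chapter 3.
from collections import deque

class ReorderUnit:
    def __init__(self, n_banks=4, threshold=2):
        self.id_buffers = [deque() for _ in range(n_banks)]  # per-bank ID buffers
        self.threshold = threshold
        self.count = 0    # successive accesses to the current bank
        self.bank = 0     # bank currently being served

    def push(self, txn_id, bank_addr):
        """ID issue unit: route an incoming ID to its bank's ID buffer."""
        self.id_buffers[bank_addr].append(txn_id)

    def select(self):
        """ID selector: stay on the current bank until the counter expires."""
        if self.count < self.threshold and self.id_buffers[self.bank]:
            self.count += 1
            return self.id_buffers[self.bank].popleft()
        # threshold reached or buffer empty: move to the next non-empty bank
        for step in range(1, len(self.id_buffers) + 1):
            b = (self.bank + step) % len(self.id_buffers)
            if self.id_buffers[b]:
                self.bank, self.count = b, 1
                return self.id_buffers[b].popleft()
        return None  # all ID buffers empty
```

Grouping successive accesses to one bank reduces ACTIVE commands (row re-opens), while the threshold bounds how long other banks must wait.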

The command translator consists of four bank controllers, which behave as stated in Section 3.5.1 to support bank interleaving.

The command controller has three states: one for DRAM power-up initialization, one for the SELF REFRESH power-down mode, and one for normal DRAM operation. During normal DRAM operation, the scoring function in the command scheduler determines which of the commands and addresses provided by the command translator is issued. The read and write unit converts data between the single data rate and double data rate domains.
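The three top-level states can be sketched as a small state machine. The state names and transition conditions below are illustrative placeholders, not the RTL encoding.

```python
# Hypothetical sketch of the command controller's three top-level states
# described above; names and transition triggers are assumed.
from enum import Enum, auto

class CtrlState(Enum):
    POWER_UP_INIT = auto()   # DRAM power-up initialization sequence
    SELF_REFRESH = auto()    # SELF REFRESH power-down mode
    NORMAL = auto()          # normal DRAM operation (command scheduling)

def next_state(state, init_done=False, enter_pd=False, exit_pd=False):
    """Advance the controller state given the current transition triggers."""
    if state is CtrlState.POWER_UP_INIT and init_done:
        return CtrlState.NORMAL
    if state is CtrlState.NORMAL and enter_pd:
        return CtrlState.SELF_REFRESH
    if state is CtrlState.SELF_REFRESH and exit_pd:
        return CtrlState.NORMAL
    return state  # otherwise hold the current state
```

Normal operation is only entered after initialization completes, and the power-down mode is entered and left exclusively from the normal state.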

5.2 Implementation Result

Design                   Proposed   Kun-Bin Lee's   ARM PL340
Clock Rate               166 MHz    100 MHz         166 MHz
Gate Count
  Transaction Scheduler  6688       12003           N/A
  Command Translator     4257       N/A             N/A
  Command Controller     11331      5362            N/A
  Total                  47590      17365           About 60K

Table 5-1 Implementation result and comparison

Table 5-1 lists the implementation result and a comparison with other designs. The total gate count of the proposed design is 47.6K, and over 50% of it is used for data buffering.

In Kun-Bin Lee’s design [9], dedicated channels are used for the masters. Thus, no input or output buffer is required and the gate count is largely reduced; the overall gate count is 17.4K. Excluding the input and output buffer, the gate count of our design is 22.3K, which is 28.2% more than that of Kun-Bin Lee’s. However, our design runs at 166 MHz rather than 100 MHz.
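The quoted percentages follow directly from the Table 5-1 entries, as a quick arithmetic cross-check shows:

```python
# Cross-check of the gate counts quoted above, using the Table 5-1 entries.
scheduler, translator, controller = 6688, 4257, 11331   # proposed design
proposed_total = 47590
lee_total = 17365                                       # Kun-Bin Lee's design [9]

logic = scheduler + translator + controller   # 22276 gates, i.e. about 22.3K
buffers = proposed_total - logic              # remainder is data buffering

assert buffers / proposed_total > 0.5                   # over 50% for buffering
assert 0.28 < (logic - lee_total) / lee_total < 0.29    # roughly 28% more logic
```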

As for the ARM PL340 [11], it is a configurable AXI-compliant soft IP and its detailed gate count is unknown.

Chapter 6

Conclusion and Future Work

As design scales grow larger and larger, accurately evaluating system performance at an early stage is the key to a successful design.

With the advancement of DRAM, data rate is no longer the most critical issue. Instead, the large power consumption must be taken into consideration in embedded systems. How to develop a memory controller that balances DRAM performance and power consumption is the problem today.

In this thesis, we propose a configurable multimedia platform simulator to evaluate the DRAM performance and power consumption introduced by the memory controller. The hardware is also implemented to show the cost of each technique applied.

With proper access patterns, the multimedia platform simulator can perform different scenarios. The memory controller part is well structured and can easily be modified to implement algorithms for evaluation. The overall simulator is based on the AXI protocol, so transactions can be transferred out of order, which is required by transaction reorder scheduling.

In the simulation of the video phone scenario, several techniques are combined to achieve high DRAM bandwidth utilization. With bank-interleaving support, the bandwidth utilization rises while the transaction latency is reduced. Based on the scenario characteristics, an MFIFS transaction scheduling policy is proposed; it increases DRAM bandwidth utilization and reduces power consumption simultaneously.

After tuning the buffer size and the MFIFS threshold, the effect of different bus arbitration schemes is almost eliminated. Based on the DRAM power component analysis and the preceding simulation results, a memory controller that consumes less power while meeting the timing constraints is presented.

Although the proposed solution works well, there are still some areas that could be improved. First, a power optimization method should be developed instead of relying on observation and testing. Second, since the system bus arbitration scheme plays an important role in system performance, a more suitable system bus arbitration scheme could be developed.

References

[1] Saleh, R., et al., “System-on-Chip: Reuse and Integration,” in Proceedings of the IEEE, vol. 94, pp. 1050-1069, June 2006.

[2] Xu, S. and Pollitt-Smith, H., “A TLM Platform for System-on-chip Simulation and Verification,” in IEEE VLSI-TSA International Symposium, pp. 220-220, April 2005.

[3] S. Przybylski, “Sorting Out The New DRAMs,” in Hot Chips Tutorial, Stanford, CA, 1997.

[4] Scott Rixner, et al., “Memory Access Scheduling,” in Proceedings of the 27th Annual International Symposium on Computer Architecture, Vancouver, Canada, pp. 128-138, June 2000.

[5] Delaluz, V., et al., “Hardware and Software Techniques for Controlling DRAM Power Modes,” in IEEE Transactions on Computers, vol. 50, pp. 1154-1173, Nov. 2001.

[6] Yongsoo Joo, et al., “Energy exploration and reduction of SDRAM memory systems,” in Proceedings of the 39th Design Automation Conference, pp. 892-897, June 2002.

[7] Ning-Yaun Ker and Chung-Ho Chen, “An Effective SDRAM Power Mode Management Scheme for Performance and Energy Sensitive Embedded Systems,” in Proceedings of the ASP-DAC 2003, pp. 515-518, Jan. 2003.

[8] Burchardt, A., et al., “A Real-time Streaming Memory Controller,” in Proceedings of Design, Automation and Test in Europe, vol. 3, pp. 20-25, 2005.

[9] Kun-Bin Lee, et al., “An Efficient Quality-aware Memory Controller for Multimedia Platform SoC,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, pp. 620-633, May 2005.

[10] MemMax 2.0 Multi-threaded DRAM Access Scheduler, Sonics Limited, http://www.sonicsinc.com/documets/MemMax_2.0_Data_Sheet.pdf

[11] PrimeCell AXI SDRAM Controller PL340, ARM Limited, http://www.arm.com/products/solutions/PL340AXIController.html

[12] AMBA Protocol, ARM Limited, http://www.arm.com/products/solutions/AMBAHomePage.html

[13] AXI Protocol, ARM Limited, http://www.arm.com/products/solutions/AMBA3AXI.html

[14] JEDEC Organization, http://www.jedec.org/

[15] DDR SDRAM Specification, JEDEC, http://www.jedec.org/download/default.cfm

[16] Jeff Janzen, “Calculating Memory System Power for DDR SDRAM,” in Micron Designline, Quarter 2, 2001.

[17] SystemC Community, http://www.systemc.org/

[18] Micron MT46V8M16 128Mb DDR SDRAM datasheet, http://download.micron.com/pdf/datasheets/dram/ddr/128MBDDRx4x8x16D.pdf
