
CHAPTER 4 SIMULATION RESULT AND ANALYSIS

4.3 SIMULATION RESULT AND ANALYSIS

4.3.1 DRAM Bandwidth Utilization First

Instead of trying all possible combinations of each parameter, we use a simpler method to develop the memory controller.

First, we determine the data burst length, since it affects the simulation results the most. Based on the chosen burst length, we evaluate the scoring functions for the command scheduler. We then use the selected burst length and scoring function to select a suitable transaction scheduling policy.

Throughout the memory controller development process, both AXI bus arbitration schemes (fixed priority and round-robin) are applied to observe the effect of the bus arbitration scheme.
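The staged selection described above can be sketched as a one-parameter-at-a-time search. This is a minimal illustration, not the simulator used in this work: the names `staged_search` and `toy_evaluate` and the toy weights are assumptions made up for the example, and a real run would invoke the simulator and return the measured bandwidth utilization.

```python
def staged_search(stages, defaults, evaluate):
    """Pick parameters one stage at a time, freezing each winner before moving on."""
    chosen = dict(defaults)
    runs = 0
    for name, candidates in stages:
        best_value, best_score = None, float("-inf")
        for value in candidates:
            score = evaluate({**chosen, name: value})
            runs += 1
            if score > best_score:
                best_value, best_score = value, score
        chosen[name] = best_value
    return chosen, runs


def toy_evaluate(config):
    """Hypothetical stand-in for one simulation run; the weights are NOT measured data."""
    weights = {
        "burst_length": {2: 0, 4: 1, 8: 2},
        "scoring": {"LTO": 0, "LTOT": 2, "LTWT": 1},
        "policy": {"FIFS": 1, "TLRR": 0, "MFIFS": 2},
    }
    return sum(weights[key][value] for key, value in config.items())


stages = [
    ("burst_length", [2, 4, 8]),
    ("scoring", ["LTO", "LTOT", "LTWT"]),
    ("policy", ["FIFS", "TLRR", "MFIFS"]),
]
defaults = {"burst_length": 2, "scoring": "LTO", "policy": "FIFS"}

chosen, runs = staged_search(stages, defaults, toy_evaluate)
exhaustive_runs = 3 * 3 * 3  # size of the full grid over the same candidates
print(chosen, runs, exhaustive_runs)
```

With three candidates per stage, the staged search needs 9 evaluations instead of 27 for the full grid, at the cost of possibly missing interactions between parameters.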

A. Simulation 1 – Choose a proper data burst length

Buffer size                      8 entries
Data burst length                2, 4, 8
Bank-interleaving support        Yes / No
Scoring function                 LTO
Transaction scheduling policy    FIFS

Table 4-4 Configuration of the simulator in simulation 1

In simulation 1, different data burst lengths are applied to the memory controller with/without bank-interleaving support. Table 4-4 lists the configuration of the simulator.

Data burst length            2           4           8
Without bank-interleaving    Violated    Violated    Violated
With bank-interleaving       Violated    Met         Met

Table 4-5 Timing constraint status when fixed priority bus arbitration scheme is applied

Fig. 4-2 Average DRAM bandwidth utilization with fixed priority bus arbitration scheme
(chart: bandwidth utilization for burst lengths 2, 4, and 8; series: without interleaving, with interleaving)

Table 4-5 shows whether the timing constraints are met by each configuration when the fixed priority bus arbitration scheme is applied. With bank-interleaving support, the timing constraints are met for every burst length except 2. Without it, they are violated for all three burst lengths.

Fig. 4-2 shows the average DRAM bandwidth utilization with the fixed priority bus arbitration scheme. Without bank-interleaving support, burst length 2 yields the lowest bandwidth utilization, 26.5%, while burst length 8 yields the highest, 41.6%. With bank-interleaving support, burst length 2 still yields the lowest bandwidth utilization, 27.1%, while burst length 8 remains the highest at 59.3%. Bank-interleaving support improves the average bandwidth utilization by 2.1%, 55.9%, and 42.8% for burst lengths 2, 4, and 8, respectively.

Data burst length            2           4           8
Without bank-interleaving    Violated    Violated    Met
With bank-interleaving       Violated    Met         Met

Table 4-6 Timing constraint status when round-robin bus arbitration scheme is applied

Fig. 4-3 Average DRAM bandwidth utilization with round-robin bus arbitration scheme
(chart: bandwidth utilization for burst lengths 2, 4, and 8; series: without interleaving, with interleaving)

Table 4-6 shows whether the timing constraints are met by each configuration when the round-robin bus arbitration scheme is applied. With bank-interleaving support, the timing constraints are met for every burst length except 2. Without it, only burst length 8 meets them.
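The two timing constraint tables can be combined to list the configurations that are safe under either arbitration scheme. The sketch below only transcribes Table 4-5 and Table 4-6; the dictionary layout and the helper name `met_under_all` are assumptions chosen for illustration.

```python
# Timing constraint status transcribed from Table 4-5 and Table 4-6.
# Key: (bank_interleaving, burst_length); value: True if the constraints are met.
FIXED_PRIORITY = {
    (False, 2): False, (False, 4): False, (False, 8): False,
    (True, 2): False, (True, 4): True, (True, 8): True,
}
ROUND_ROBIN = {
    (False, 2): False, (False, 4): False, (False, 8): True,
    (True, 2): False, (True, 4): True, (True, 8): True,
}


def met_under_all(*tables):
    """Return the configurations whose timing constraints are met in every table."""
    return sorted(cfg for cfg in tables[0] if all(t[cfg] for t in tables))


feasible = met_under_all(FIXED_PRIORITY, ROUND_ROBIN)
round_robin_only = met_under_all(ROUND_ROBIN)
print(feasible)
print(round_robin_only)
```

Only burst lengths 4 and 8 with bank-interleaving meet the timing constraints under both schemes, which narrows the candidates before bandwidth utilization is compared.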

Fig. 4-3 shows the average DRAM bandwidth utilization with the round-robin bus arbitration scheme. Without bank-interleaving support, burst length 2 yields the lowest bandwidth utilization, 16.5%, while burst length 8 yields the highest, 46.8%. With bank-interleaving support, burst length 2 still yields the lowest bandwidth utilization, 20.0%, while burst length 8 remains the highest at 67.0%. Bank-interleaving support improves the average bandwidth utilization by 21.5%, 60.3%, and 43.1% for burst lengths 2, 4, and 8, respectively.
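The improvement figures can be cross-checked as relative gains from the utilization values quoted in the text. The sketch below assumes the reported improvements are relative gains of the bank-interleaved case over the non-interleaved case at the same burst length. Burst length 4 is omitted because its absolute utilization is not stated in the text.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


# (without interleaving, with interleaving) utilization in percent, from the text.
reported = {
    ("fixed priority", 2): (26.5, 27.1),
    ("fixed priority", 8): (41.6, 59.3),
    ("round-robin", 2): (16.5, 20.0),
    ("round-robin", 8): (46.8, 67.0),
}

gains = {key: relative_gain(*pair) for key, pair in reported.items()}
for (scheme, burst), gain in gains.items():
    print(f"{scheme}, burst length {burst}: {gain:.1%}")
```

The recomputed gains agree with the reported improvements to within about half a percentage point. The small differences are presumably due to rounding of the quoted utilization values.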

Note that when the burst length is 2, the average bandwidth utilization with round-robin is 25% to 35% lower than that with fixed priority. When the burst length is 4, the average bandwidth utilizations with the two bus arbitration schemes are almost the same. When the burst length is 8, the average bandwidth utilization with round-robin is 10% to 15% higher than that with fixed priority. Because transactions from the same master are not tightly grouped in time, small gaps exist between two successive transactions. As a result, the fixed priority bus arbitration scheme often causes transactions from two masters to alternate. If the two masters are mapped to the same bank, row misses occur repeatedly. If the two masters generate transactions of different types in a period, read-write turnaround happens over and over in that period. These two effects diminish the bandwidth utilization improvement.
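The cost of alternating transactions can be illustrated with a toy count of row misses and read-write turnarounds. This is a simplified sketch, not the DRAM model of the simulator: it assumes one open row per bank, counts the first access to a bank as a miss, and uses made-up masters, banks, and rows.

```python
def count_row_misses(accesses):
    """Count row misses for (bank, row) accesses, keeping one open row per bank."""
    open_rows = {}
    misses = 0
    for bank, row in accesses:
        if open_rows.get(bank) != row:
            misses += 1
            open_rows[bank] = row
    return misses


def count_turnarounds(kinds):
    """Count switches between 'R' (read) and 'W' (write) in a command sequence."""
    return sum(1 for prev, cur in zip(kinds, kinds[1:]) if prev != cur)


# Hypothetical masters A and B share bank 0 but target different rows.
a = [(0, 10)] * 4
b = [(0, 20)] * 4
alternating = [access for pair in zip(a, b) for access in pair]
grouped = a + b

print(count_row_misses(alternating), count_row_misses(grouped))
print(count_turnarounds("RWRWRWRW"), count_turnarounds("RRRRWWWW"))
```

In this toy case the alternating order misses on every access and switches direction between every pair of commands, while the grouped order pays each penalty only once or twice.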

Regardless of which bus arbitration scheme is applied and whether bank-interleaving is supported, burst length 8 always achieves the highest bandwidth utilization. Hence, we set the data burst length to 8 in the following simulations.

B. Simulation 2 – Choose a proper scoring function

Buffer size                      8 entries
Data burst length                8
Bank-interleaving support        Yes
Scoring function                 LTO, LTOT, LTWT
Transaction scheduling policy    FIFS

Table 4-7 Configuration of the simulator in simulation 2

Different scoring functions are applied in simulation 2 and Table 4-7 lists the configuration of the simulator.

Fig. 4-4 Average DRAM bandwidth utilization with different scoring functions

Fig. 4-4 shows the average DRAM bandwidth utilization with different scoring functions. Both LTOT and LTWT work well when fixed priority is applied, improving the utilization by 17.7% and 17.4%, respectively. However, LTOT and LTWT perform poorly when round-robin is applied, degrading the utilization by 0.4% and 0.8%, respectively.

Since the fixed priority bus arbitration scheme provides less parallelism for bank-interleaving, hiding row miss latency with LTOT or LTWT improves the bandwidth utilization considerably. In contrast, round-robin already provides sufficient parallelism for bank-interleaving, so LTOT and LTWT may bring no benefit.

Because LTOT performs slightly better, we set the scoring function to LTOT.

C. Simulation 3 – Choose a proper transaction scheduling policy

Buffer size                      8 entries
Data burst length                8
Bank-interleaving support        Yes
Scoring function                 LTOT
Transaction scheduling policy    FIFS, TLRR, MFIFS

Table 4-8 Configuration of the simulator in simulation 3

Different transaction scheduling policies are evaluated in simulation 3 and Table 4-8 lists the configuration of the simulator.

Fig. 4-5 shows the average DRAM bandwidth utilization with different transaction scheduling policies. When the bus arbitration scheme is fixed priority, TLRR is 8.7% worse and MFIFS is 2.8% better than FIFS. When the bus arbitration scheme is round-robin, TLRR is 7.8% worse and MFIFS is 6.1% better than FIFS.

Since TLRR is designed for multimedia platforms with dedicated channels to the masters, it does not work well with the limited information available through a single on-chip bus and a finite buffer.

We therefore choose MFIFS as the final transaction scheduling policy. However, the performance of MFIFS may vary with the buffer size and threshold, so an extra simulation is performed.

Fig. 4-5 Average DRAM bandwidth utilization with different transaction scheduling policies

D. Simulation 4 – Choose a proper buffer size and threshold for MFIFS

Buffer size                      4, 8, 12, 16 entries
Data burst length                8
Bank-interleaving support        Yes
Scoring function                 LTOT
Transaction scheduling policy    MFIFS
MFIFS threshold                  2, 3, 4

Table 4-9 Configuration of the simulator in simulation 4

In simulation 4, different buffer sizes and thresholds are tested. Table 4-9 lists the configuration of the simulator.

Fig. 4-6 Average DRAM bandwidth utilization with different buffer sizes and thresholds
(chart: bandwidth utilization for buffer sizes 4, 8, 12, and 16; legend includes FP_TH2)

Fig. 4-6 shows the average DRAM bandwidth utilization with different buffer sizes and thresholds. When the buffer size is 4, the small buffer bounds the bandwidth utilization because the memory controller cannot collect sufficient information about pending transactions. When the buffer size is 12, the choice of bus arbitration scheme barely affects the bandwidth utilization.
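Why a small buffer limits the controller can be illustrated with a toy single-bank model in which the scheduler may only reorder the requests currently held in its buffer. This is an illustrative sketch, not the MFIFS policy or the simulator used here: the "prefer a buffered request that hits the open row, else take the oldest" rule, the function name, and the request stream are all assumptions.

```python
from collections import deque


def row_hits_with_buffer(rows, buffer_size):
    """Count row hits when only `buffer_size` pending requests are visible at once."""
    pending = deque(rows)
    buffer = []
    open_row = None
    hits = 0
    while pending or buffer:
        # Refill the buffer from the incoming stream in arrival order.
        while pending and len(buffer) < buffer_size:
            buffer.append(pending.popleft())
        # Prefer the oldest buffered request that hits the open row, else the oldest.
        choice = next((r for r in buffer if r == open_row), buffer[0])
        buffer.remove(choice)
        if choice == open_row:
            hits += 1
        open_row = choice
    return hits


stream = [1, 2, 1, 2, 1, 2, 1, 2]  # hypothetical row addresses in one bank
hits_by_size = {size: row_hits_with_buffer(stream, size) for size in (1, 2, 4)}
print(hits_by_size)
```

In this toy stream a larger buffer exposes more requests to the same row and therefore more row hits. A real controller must also bound how long older requests can be bypassed, which this sketch ignores.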

Fig. 4-7 and Fig. 4-8 present the average transaction latency and DRAM power consumption, respectively. In Fig. 4-7, a larger buffer size with equal transaction processing ability leads to longer latency. However, a larger threshold does not necessarily increase the average transaction latency, since it may slightly increase the latency of the other masters while significantly decreasing the latency of one master.
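The averaging effect behind the threshold observation can be shown with a small worked example. The per-master latencies below are hypothetical values chosen only to illustrate the arithmetic. They are not simulation results.

```python
from statistics import mean

# Hypothetical per-master average latencies in ns (illustrative, not measured).
small_threshold = {"M0": 100.0, "M1": 100.0, "M2": 100.0, "M3": 100.0}
large_threshold = {"M0": 60.0, "M1": 105.0, "M2": 105.0, "M3": 105.0}

avg_small = mean(list(small_threshold.values()))
avg_large = mean(list(large_threshold.values()))
print(avg_small, avg_large)
```

Here three masters each lose 5 ns while one master gains 40 ns, so the overall average drops even though most masters see slightly longer latency.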

Based on Fig. 4-6, with Fig. 4-7 and Fig. 4-8 as references, buffer size 12 and threshold 4 are chosen.

Fig. 4-7 Average transaction latency with different buffer sizes and thresholds
(chart: latency in ns for buffer sizes 4, 8, 12, and 16)

Fig. 4-8 Average DRAM power consumption with different buffer sizes and thresholds
(chart: power in mW for buffer sizes 4, 8, 12, and 16; legend includes FP_TH2)

E. Summary

Fig. 4-9 Average DRAM bandwidth utilization transition through the simulations
(chart: bandwidth utilization under fixed priority and round-robin; stages shown include with LTOT, with MFIFS, and with buffer size and threshold modification)

Fig. 4-10 Average transaction latency transition through the simulations
(chart: time in ns under fixed priority and round-robin; stages shown include with LTOT, with MFIFS, and with buffer size and threshold modification)

Fig. 4-11 Average DRAM power consumption transition through the simulations
(chart: power in mW under fixed priority and round-robin; stages shown include with LTOT, with MFIFS, and with buffer size and threshold modification)

Fig. 4-9, Fig. 4-10, and Fig. 4-11 show the transitions of the average DRAM bandwidth utilization, transaction latency, and power consumption, respectively, through the simulations.

In Fig. 4-10, when bank-interleaving is supported, the average transaction latency is reduced by 53.6% with the fixed priority bus arbitration scheme and by 33.9% with round-robin. This significant reduction occurs because bank-interleaving can efficiently hide DRAM operation latencies.

According to Fig. 4-9 and Fig. 4-11, when the bus arbitration scheme is fixed priority, the bandwidth utilization is improved by 72.8% with 36.1% more power consumption. When the bus arbitration scheme is round-robin, the bandwidth utilization is improved by 53.3% with 11.9% more power consumption.
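One way to relate the two figures is to compute how much bandwidth utilization is obtained per unit of power after the changes. This ratio is not a metric reported in this work. The sketch assumes that both percentages are relative to the same baseline configuration, and the function name is hypothetical.

```python
def utilization_per_power_gain(bandwidth_gain, power_gain):
    """Relative change in utilization per unit power, given two relative gains."""
    return (1 + bandwidth_gain) / (1 + power_gain) - 1


fixed_priority = utilization_per_power_gain(0.728, 0.361)
round_robin = utilization_per_power_gain(0.533, 0.119)
print(f"fixed priority: {fixed_priority:.1%}, round-robin: {round_robin:.1%}")
```

Under these assumptions, the utilization obtained per unit of power rises under both schemes, since the bandwidth gain outweighs the added power consumption in each case.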

Note that MFIFS with the buffer size and threshold modification slightly increases the bandwidth utilization while decreasing the power consumption by up to 13%.

In document: Smart Memory Controller Design for Video Applications (pages 48-57)