[Figure: bar chart. X-axis: Design (Wide Bus, Multi-port, Banked v1, Banked v2); y-axis: Percentage; bar labels in order of appearance: 100.00%, 308.63%, 56.97%, 50.99%]
Fig. 4-6 Total cache access energy of one frame
4.3 Hardware Simulation Result
The goal of our hardware simulation is to check the access timing and area of our designs.
We present the simulation results in two parts: timing comparison and area comparison.
4.3.1 Timing Comparison
Before presenting the timing comparison, we first describe the cache configuration of each design, which is shown in Table 4-2. In the bank-count field, the single-port (Wide Bus) design and the multi-port design each have only one data array, so each is treated as one bank. In the last field, the output width of the single-port design is 128 bits, which matches the bus width from the cache to the texture filter. The output width of banked v1 is also 128 bits per bank: all requested texels may lie in one bank, so the output of each bank has to be wide enough to deliver them.
Design name   # of banks   Access ports per bank   Output bits per access port
Wide Bus      1            1                       128
Multi-port    1            4                       32
Banked v1     4            1                       128
Banked v2     4            1                       32
Table 4-2 Number of banks and access ports of each design
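The configuration in Table 4-2 can be restated as a small data structure. The sketch below is only an illustration: the helper name `total_output_bits` and the idea of multiplying the three columns into an aggregate output width are our own assumptions, not part of the simulated designs.

```python
# Cache configurations copied from Table 4-2.
# Each entry: (number of banks, access ports per bank, output bits per access port)
CONFIGS = {
    "Wide Bus": (1, 1, 128),
    "Multi-port": (1, 4, 32),
    "Banked v1": (4, 1, 128),
    "Banked v2": (4, 1, 32),
}


def total_output_bits(banks, ports_per_bank, bits_per_port):
    """Aggregate output width over all banks and ports (illustrative helper)."""
    return banks * ports_per_bank * bits_per_port


totals = {name: total_output_bits(*cfg) for name, cfg in CONFIGS.items()}

for name, bits in totals.items():
    print(f"{name}: {bits} output bits in total")
```

Under this simple product, banked v1 is the only design whose aggregate output width exceeds 128 bits, because every one of its four banks must be able to deliver all requested texels by itself.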
We separate the cache access time into data access and tag access. Fig. 4-7 shows the data access time, with the Wide Bus design as the baseline. The data array access time of the banked designs is shorter than that of the original cache because each data bank has a smaller cache line size. In banked v1 and banked v2, the extra time comes from address control. This extra circuit delay does not lengthen the data access, although in banked v2 it is relatively long because the address control is more complex. The multi-port texture cache has a long access time because of the multi-port overhead.
[Figure: bar chart "Delay of Data Access". X-axis: Wide Bus, Multi-port, Banked v1, Banked v2; y-axis: Delay (ns); series: Data Array, Address Control]
Fig. 4-7 Delay of data access time
Fig. 4-8 shows the tag access time of each design. In banked v1 and banked v2, the extra time comes from the tag control and the multiplexer before the tag compare. The multi-port texture cache has a longer tag access time than the other designs.
[Figure: bar chart "Delay of Tag Access". X-axis: Wide Bus, Multi-port, Banked v1, Banked v2; y-axis: Delay (ns); series: Tag Array, Tag Control, Tag MUX]
Fig. 4-8 Delay of tag access
Fig. 4-9 shows the total access time of each design. The access time of banked v1 is only slightly longer than that of the single-port texture cache. Compared with the cycle times of GPUs built in the same 0.13 um process (Table 4-3), the cache access time of our designs still fits within one cycle.
[Figure: bar chart "Delay of Cache Access". X-axis: Design (Wide Bus, Multi-port, Banked v1, Banked v2); y-axis: Delay (ns); series: Data Total, Tag Total]
Fig. 4-9 Delay of cache access
GPU name          Core clock frequency   Cycle time
ATI Radeon X800   520 MHz                1.92 ns
GeForce 6800      400 MHz                2.5 ns
Table 4-3 Clock frequency of GPUs built in a 0.13 um process
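The cycle times in Table 4-3 follow directly from the clock frequencies (cycle time = 1 / frequency). The following minimal sketch reproduces them and checks whether an access time fits in one cycle. The function names are hypothetical, and the access times passed to `fits_in_one_cycle` are placeholder values, not measured results from our simulation.

```python
# Core clock frequencies copied from Table 4-3 (in MHz).
GPU_CLOCKS_MHZ = {
    "ATI Radeon X800": 520,
    "GeForce 6800": 400,
}


def cycle_time_ns(freq_mhz):
    """Cycle time in nanoseconds for a clock given in MHz (1000 / f)."""
    return 1000.0 / freq_mhz


def fits_in_one_cycle(access_time_ns, freq_mhz):
    """True if a cache access of the given length completes within one cycle."""
    return access_time_ns <= cycle_time_ns(freq_mhz)


cycle_times = {name: cycle_time_ns(f) for name, f in GPU_CLOCKS_MHZ.items()}

for name, t in cycle_times.items():
    print(f"{name}: {t:.2f} ns per cycle")

# Placeholder access time of 1.5 ns, chosen only to exercise the check.
print(fits_in_one_cycle(1.5, GPU_CLOCKS_MHZ["ATI Radeon X800"]))
```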
4.3.2 Area Comparison
The area comparison of each design is shown in Fig. 4-10. The banked texture caches need little extra circuitry. The largest part of the extra circuit in both banked v1 and banked v2 is the address control, because one of its components is a 32-bit 4-to-1 multiplexer. Each banked design contains four of these multiplexers, so the address control accounts for a large part of the extra circuit.
Design       Cache (um2)   Extra Circuit (um2)
Wide Bus     1910336.347   0
Multi-port   22390103.66   0
Banked v1    1876057.497   7957.404113
Banked v2    1561803.895   7034.019347
Fig. 4-10 Area comparison of each design
Although the address control takes a large part of the extra circuit in banked v1 and banked v2, the extra circuit itself is only a small fraction of each banked design. As Fig. 4-11 shows, the extra circuit accounts for only 0.45% of the area in banked v2 and only 0.43% in banked v1. In absolute terms, the extra circuit of banked v1 is larger than that of banked v2: as Fig. 4-10 shows, the extra circuit area is about seven thousand um2 in banked v2 and almost eight thousand um2 in banked v1.
[Figure: bar chart "Area Percentage of Extra Circuit". X-axis: Design (Banked v1, Banked v2); y-axis: Percentage, ticks from 0.41% to 0.46%]
Fig. 4-11 Percentage of extra circuit of banked texture cache
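The percentages in Fig. 4-11 can be approximated from the area values in Fig. 4-10. The sketch below assumes that the percentage is the extra circuit area divided by the total area (cache plus extra circuit); this denominator is our assumption, so the last digit may differ slightly from the values quoted in the text.

```python
# Area values in um2, copied from Fig. 4-10.
AREAS_UM2 = {
    "Banked v1": {"cache": 1876057.497, "extra": 7957.404113},
    "Banked v2": {"cache": 1561803.895, "extra": 7034.019347},
}


def extra_circuit_percentage(cache_area, extra_area):
    """Extra circuit area as a percentage of total area (assumed denominator)."""
    return 100.0 * extra_area / (cache_area + extra_area)


percentages = {
    name: extra_circuit_percentage(a["cache"], a["extra"])
    for name, a in AREAS_UM2.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}% of the area is extra circuit")
```

Banked v1 has the larger extra circuit in absolute terms, but its percentage is lower than that of banked v2 because the cache of banked v2 is smaller.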
Chapter 5 Discussion and Conclusion
5.1 Discussion
In this section, we compare the access time, cache area, and access power of each design and then discuss them. The block diagram of each design is shown in Fig. 5-1. We compare five texture cache designs, which are listed in Fig. 5-1(a) to Fig. 5-1(e).
[Figure: five block diagrams, panels (a) to (e)]
Fig. 5-1 Block diagram of each cache design: (a) One data array, output is 128 bits (b) Interleaved data banks with four tag arrays, output of each bank is 32 bits (c) Interleaved data banks with shared tag, output of each bank is 32 bits (d) Interleaved data banks with banked tag, output of each bank is 32 bits (e) Continuous data banks with banked tag, output of each bank is 128 bits
We rank the designs from 1 to 5 on each metric. A smaller number means better (shorter time, smaller area, or lower power), and a larger number means worse. Banked v2 with banked tag, which is design (e), has a small area and low access power. However, the access power of design (e) depends on the number of data banks accessed: it is lowest when the requested texels are all in one data bank and highest when they are spread over four data banks.
Design name   Access time   Cache area   Power per access
(a)           4             3            2
(b)           3             4            5
(c)           5             5            4
(d)           1             1            3
(e)           2             2            1*
Table 5-1 Comparison of each cache design
With larger cache lines, the requested texels are more likely to fall in a single data bank. We therefore consider a larger cache line size for banked v2 in order to access fewer data banks. However, in banked v2 the output width of each bank is equal to the line size, which means that the dynamic power of each bank will be high.
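The effect of line size on the chance of a single-bank access can be illustrated with a deliberately simplified model. The sketch below is not our simulation: it assumes a one-dimensional layout in which a request reads four consecutive 32-bit texels, each bank holds one cache line, and every texel-aligned start offset is equally likely. The function name and constants are hypothetical.

```python
TEXEL_BYTES = 4          # assumed 32-bit texels
TEXELS_PER_REQUEST = 4   # assumed four texels per request


def single_line_fraction(line_bytes):
    """Fraction of texel-aligned start offsets for which all requested
    texels fall inside one cache line (simplified 1-D model)."""
    span = TEXEL_BYTES * TEXELS_PER_REQUEST
    starts = range(0, line_bytes, TEXEL_BYTES)
    hits = sum(
        1 for s in starts
        if s // line_bytes == (s + span - 1) // line_bytes
    )
    return hits / len(starts)


for line_bytes in (16, 32, 64):
    frac = single_line_fraction(line_bytes)
    print(f"line size {line_bytes} bytes: {frac:.4f} of requests hit one line")
```

In this model the fraction grows with the line size, which matches the trend described above, while the per-bank output width (and hence the dynamic power of each access) grows with it.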