

Fig. 4-6 Total cache access energy of one frame (as a percentage of the Wide Bus design: Wide Bus 100.00%, Multi-port 308.63%, Banked v1 56.97%, Banked v2 50.99%)

4.3 Hardware Simulation Result

The goal of our hardware simulation is to evaluate the access time and area of our designs. We present the results in two parts: timing comparison and area comparison.

4.3.1 Timing Comparison

Before looking at the timing results, we first review the cache configuration of each design, shown in Table 4-2. In the "# of banks" column, the single-port (Wide Bus) design and the multi-port design have only one data array, so each is counted as one bank. In the last column, the output of the single-port design is 128 bits, which matches the bus width from the cache to the texture filter. The output of banked v1 is also 128 bits, because all the requested texels may lie in one bank, so the output width of each bank must be able to deliver all of them.

Design name | # of banks | Access ports per bank | Output bits per access port
Wide Bus | 1 | 1 | 128
Multi-port | 1 | 4 | 32
Banked v1 | 4 | 1 | 128
Banked v2 | 4 | 1 | 32

Table 4-2 Number of banks, access ports, and output width of each design
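To illustrate why a bank that may receive every requested texel needs a 128-bit output, the following minimal Python sketch counts how many texels of a 2x2 bilinear footprint land in each of four banks under two address mappings. The mappings (`interleaved_bank`, `continuous_bank`), the 4x4-texel cache line, and the 32-bit texel are hypothetical assumptions chosen for illustration, not the exact mappings used in banked v1 and banked v2.

```python
from collections import Counter

NUM_BANKS = 4      # four data banks, as in Table 4-2
TEXEL_BITS = 32    # assumed texel size (128-bit bus / 4 texels)
LINE_DIM = 4       # hypothetical: one cache line holds a 4x4 texel tile

def interleaved_bank(x: int, y: int) -> int:
    """Hypothetical interleaved mapping: the parity of (x, y) picks the bank,
    so any 2x2 footprint touches each bank exactly once."""
    return (y % 2) * 2 + (x % 2)

def continuous_bank(x: int, y: int) -> int:
    """Hypothetical continuous mapping: a whole cache line (tile) lives in one
    bank, and neighbouring tiles are spread over the four banks."""
    tx, ty = x // LINE_DIM, y // LINE_DIM
    return (ty % 2) * 2 + (tx % 2)

def footprint(x: int, y: int) -> list[tuple[int, int]]:
    """The four texels read by bilinear filtering at integer position (x, y)."""
    return [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]

def max_bits_per_bank(mapping, width: int = 16, height: int = 16) -> int:
    """Worst-case number of bits a single bank must deliver for one footprint."""
    worst = 0
    for y in range(height):
        for x in range(width):
            per_bank = Counter(mapping(tx, ty) for tx, ty in footprint(x, y))
            worst = max(worst, max(per_bank.values()) * TEXEL_BITS)
    return worst

print("interleaved:", max_bits_per_bank(interleaved_bank))
print("continuous:", max_bits_per_bank(continuous_bank))
```

Under the interleaved mapping every bank receives at most one texel, so a 32-bit port per bank suffices; the continuous mapping can send all four texels to one bank and therefore needs 128 bits per bank.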

We divide the cache access time into data access time and tag access time. The data access time is shown in Fig. 4-7, with the wide-bus design as the baseline. Fig. 4-7 shows that the data array access time of the banked designs is shorter than that of the original cache. This is because each data bank has a smaller cache line size. In banked v1 and banked v2, the additional time comes from the address control. In banked v1, this extra delay does not make the data access time longer than that of the original cache. In banked v2, however, the address control is more complex, so its delay is longer. Finally, the multi-port texture cache has a long data access time because of the multi-port overhead.

Fig. 4-7 Delay of data access time (delay in ns for Wide Bus, Multi-port, Banked v1, and Banked v2, split into data array and address control)

Fig. 4-8 shows the tag access time of each design. In banked v1 and banked v2, the additional time comes from the tag control and the multiplexer placed before the tag comparison. The multi-port texture cache has a longer tag access time than the other designs.

Fig. 4-8 Delay of tag access (delay in ns for Wide Bus, Multi-port, Banked v1, and Banked v2, split into tag array, tag control, and tag MUX)

Fig. 4-9 shows the total access time of each design. The access time of banked v1 is only slightly longer than that of the single-port texture cache. Compared with the cycle times of GPUs built in the same 0.13 um process (Table 4-3), the cache access time of our designs still fits within one clock cycle.

Fig. 4-9 Delay of cache access (delay in ns for Wide Bus, Multi-port, Banked v1, and Banked v2, showing total data access time and total tag access time)

GPU name | Core clock frequency | Cycle time
ATI Radeon X800 | 520 MHz | 1.92 ns
GeForce 6800 | 400 MHz | 2.5 ns

Table 4-3 Clock frequency of GPUs fabricated in a 0.13 um process
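The cycle times in Table 4-3 are simply the reciprocal of the core clock frequency. A small Python sketch of the one-cycle check is given below; the function names are hypothetical, and the 1.8 ns and 2.0 ns access times are placeholder values, not results read from Fig. 4-9.

```python
def cycle_time_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a core clock given in MHz."""
    return 1000.0 / freq_mhz

def fits_in_one_cycle(access_ns: float, freq_mhz: float) -> bool:
    """True if a cache access of access_ns completes within one clock period."""
    return access_ns <= cycle_time_ns(freq_mhz)

# Core clocks from Table 4-3.
gpus = {"ATI Radeon X800": 520.0, "GeForce 6800": 400.0}
for name, mhz in gpus.items():
    print(f"{name}: {cycle_time_ns(mhz):.2f} ns")

# Placeholder access times, for illustration only.
print(fits_in_one_cycle(1.8, 520.0))
print(fits_in_one_cycle(2.0, 520.0))
```

The faster part (520 MHz) sets the tighter bound, so a cache access that fits in about 1.92 ns also fits within the 2.5 ns cycle of the 400 MHz part.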

4.3.2 Area Comparison

The area comparison of the designs is shown in Fig. 4-10. The banked designs add little extra circuitry. In both banked v1 and banked v2, the largest part of the extra circuit is the address control, because it contains 32-bit 4-to-1 multiplexers. Each banked design has four 32-bit 4-to-1 multiplexers, so the address control takes a large part of the extra circuit in both kinds of banked design.
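One way to see why four 32-bit 4-to-1 multiplexers appear is to assume that each of the four banks selects, with its own multiplexer, the address of whichever of the four texel requests maps to it. That per-bank selection is our assumption, not a detail stated above, and `mux4to1_32` and `route_addresses` are hypothetical names in this behavioural Python sketch.

```python
from typing import List, Optional

MASK32 = 0xFFFF_FFFF  # keep values to 32 bits

def mux4to1_32(inputs: List[int], select: int) -> int:
    """Behavioural model of one 32-bit 4-to-1 multiplexer."""
    if len(inputs) != 4 or not 0 <= select < 4:
        raise ValueError("expected four inputs and a select value in 0..3")
    return inputs[select] & MASK32

def route_addresses(request_addrs: List[int],
                    bank_of_request: List[int]) -> List[Optional[int]]:
    """Hypothetical address control: bank b receives the address of the request
    that maps to it, chosen by its own 4-to-1 mux (four muxes for four banks).
    Assumes at most one request per bank; an idle bank receives None."""
    routed: List[Optional[int]] = []
    for bank in range(4):
        hits = [i for i, b in enumerate(bank_of_request) if b == bank]
        routed.append(mux4to1_32(request_addrs, hits[0]) if hits else None)
    return routed

# Four texel requests whose (hypothetical) bank indices are 2, 0, 3, 1.
print([hex(a) for a in route_addresses([0x100, 0x104, 0x200, 0x204], [2, 0, 3, 1])])
# ['0x104', '0x204', '0x100', '0x200']
```

Each multiplexer is 32 bits wide because it forwards a full address, which is consistent with the address control dominating the extra area.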

Design | Cache area (um²) | Extra circuit area (um²)
Wide Bus | 1910336.347 | 0
Multi-port | 22390103.66 | 0
Banked v1 | 1876057.497 | 7957.404113
Banked v2 | 1561803.895 | 7034.019347

Fig. 4-10 Area comparison of each design

Although the address control takes a large part of the extra circuit in banked v1 and banked v2, the extra circuit itself is only a small fraction of each banked design. As Fig. 4-11 shows, the extra circuit accounts for only 0.45% of the area in banked v2 and 0.42% in banked v1. The extra circuit of banked v1 is larger than that of banked v2: as Fig. 4-10 shows, it is almost 8,000 um² in banked v1 and about 7,000 um² in banked v2. Its percentage is nevertheless lower in banked v1 because the total area of banked v1 is larger.

Fig. 4-11 Percentage of extra circuit of banked texture cache (extra-circuit area as a percentage of total area for Banked v1 and Banked v2)
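The percentages in Fig. 4-11 can be recomputed from the areas in Fig. 4-10. The Python sketch below assumes that a design's total area is its cache area plus its extra circuit area; dividing by the cache area alone gives the same values to two decimal places.

```python
# Areas in um^2 as listed in Fig. 4-10.
AREAS = {
    "Banked v1": {"cache": 1876057.497, "extra": 7957.404113},
    "Banked v2": {"cache": 1561803.895, "extra": 7034.019347},
}

def extra_percentage(cache_um2: float, extra_um2: float) -> float:
    """Share of the extra circuit in the design's total area, in percent.
    Assumes total area = cache area + extra circuit area."""
    return 100.0 * extra_um2 / (cache_um2 + extra_um2)

for name, a in AREAS.items():
    pct = extra_percentage(a["cache"], a["extra"])
    print(f"{name}: extra {a['extra']:.0f} um^2, {pct:.2f}% of total area")
```

The absolute extra circuit is larger in banked v1, but it is divided by a larger total area, which is why its percentage comes out lower than that of banked v2.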

Chapter 5 Discussion and Conclusion

5.1 Discussion

In this section, we compare and discuss the access time, cache area, and access power of each design. The block diagram of each design is shown in Fig. 5-1. We compare five texture cache designs, shown in Fig. 5-1(a) to Fig. 5-1(e).

Fig. 5-1 Block diagram of each cache design: (a) one data array, 128-bit output; (b) interleaved data banks with four tag arrays, 32-bit output per bank; (c) interleaved data banks with a shared tag, 32-bit output per bank; (d) interleaved data banks with banked tags, 32-bit output per bank; (e) continuous data banks with banked tags, 128-bit output per bank

Table 5-1 ranks the designs from 1 to 5 on each metric. A smaller number is better (shorter time, smaller area, or lower power) and a larger number is worse. Banked v2 with banked tags, design (e), has a small area and low access power. However, the access power of design (e) depends on the number of data banks accessed: it is lowest when all the requested texels are in one data bank and highest when they are spread over four data banks.

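Design name | Access time | Cache area | Power per access
(a) | 4 | 3 | 2
(b) | 3 | 4 | 5
(c) | 5 | 5 | 4
(d) | 1 | 1 | 3
(e) | 2 | 2 | 1*

Table 5-1 Comparison of each cache design (rank 1 is best, 5 is worst; * depends on the number of data banks accessed)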

With larger cache lines, a larger fraction of requests find all their texels in one data bank. We therefore consider a larger cache line size for banked v2 to reduce the number of data bank accesses. However, in banked v2 the output width of each bank equals the line size, which means that the dynamic power of each bank access will be high.
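The effect of line size on the number of banks accessed can be estimated with a simple counting argument. The Python sketch below assumes, hypothetically, that each cache line holds a square tile of texels, that a 2x2 bilinear footprint is equally likely to start at any texel, and that adjacent lines sit in different banks, so the number of lines touched equals the number of banks accessed. `bank_access_stats` is an illustrative name, not part of the design.

```python
from fractions import Fraction

def bank_access_stats(tile: int) -> tuple:
    """Return (fraction of 2x2 footprints inside one cache line,
    expected number of lines, i.e. banks, touched per footprint).

    Assumes each line holds a tile x tile block of texels and that the
    footprint's top-left texel is uniformly distributed within a tile."""
    one_line = 0
    total_lines = 0
    for dy in range(tile):
        for dx in range(tile):
            cols = 2 if dx == tile - 1 else 1  # crosses a vertical tile edge
            rows = 2 if dy == tile - 1 else 1  # crosses a horizontal tile edge
            touched = cols * rows              # 1, 2, or 4 lines (banks)
            one_line += touched == 1
            total_lines += touched
    n = tile * tile
    return Fraction(one_line, n), Fraction(total_lines, n)

for tile in (2, 4, 8):
    frac, mean = bank_access_stats(tile)
    print(f"{tile}x{tile} line: one bank {float(frac):.1%}, avg banks {float(mean):.3f}")
```

Under these assumptions the fraction of single-bank requests is ((T-1)/T)^2 for a T x T line, so it grows with line size, while each access must read a wider line, which is the trade-off described above.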
