As on-chip L2 caches continue to grow, low-power L2 cache design has become an important issue in cache system design. In this thesis, we study the design of low-power L2 caches at the architecture level, with the aim of reducing their dynamic energy.
This chapter discusses the power consumption of an on-chip set-associative L2 cache. We analyze the dynamic read energy per access in a set-associative cache and introduce the basic concept of the way prediction approach. Our motivations and objectives are also presented here.
1.1 Power Consumption of an L2 Cache
In current desktop processor designs, an on-chip set-associative L2 cache is a standard component. A level-2 cache decreases the cache miss rate and thus enhances performance.
Generally speaking, a larger L2 cache decreases the cache miss rate further, but its power consumption also increases significantly. The characteristics of an L2 cache that concern us are:
1. The associativity is high (≥ 4).
– Higher associativity means that more cache lines must be read per access.
2. The cache line size is large (≥ 64 bytes).
– A larger cache line means more read energy each time a cache line is accessed.
3. Access frequency depends on L1 miss rate.
– The higher the L1 miss rate, the more L2 cache accesses occur, and thus the energy consumption of the L2 cache increases.
Figure 1-1 shows the power distribution of an overall processor (70 nm) [1]. A 1MB L2 cache accounts for 20% of the processor power: 11% of the processor power is the static power of the L2 cache, and the other 9% is its dynamic power. Much current research focuses on the static power of L2 caches and has already achieved savings of about 80% of static power.
However, only a few studies focus on the dynamic power of the L2 cache, which also increases as the L2 cache grows. Most importantly, much of the dynamic energy is consumed reading unnecessary tags and data. We therefore aim to save dynamic energy from this 9% share.
In a RAM-tagged N-way set-associative cache, N tags and N L2 cache lines are accessed concurrently. After the N tag comparisons, if a tag hit occurs, only one cache line is selected from the N L2 cache lines. Figure 1-2 shows the simple architecture of a 4-way set-associative cache. Assume that the required data is in way 2. In the conventional process, all tags and cache lines are accessed concurrently; that is, the unnecessary tags and data in the other ways (way 0, way 1 and way 3) are read out too. Compared to a set-associative cache with a sequential search approach, this conventional process reduces the cache access latency but consumes more energy.
Figure 1-1 Power Distribution of Overall Processor
Figure 1-2 A Normal Read Activity of 4-way Set-associative Cache
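The conventional read described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design: the names `Line` and `parallel_read` and the tag values are hypothetical, and the counters simply record how many tag and data arrays are activated per access.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One way of a cache set (hypothetical model, not the thesis's RTL)."""
    valid: bool = False
    tag: int = 0
    data: bytes = b""


def parallel_read(cache_set, tag):
    """Conventional read: the tag and data of every way are read concurrently."""
    n_ways = len(cache_set)
    tag_reads = n_ways   # all N tags are read and compared
    data_reads = n_ways  # all N cache lines are read out
    hit_way = None
    for way, line in enumerate(cache_set):
        if line.valid and line.tag == tag:
            hit_way = way
    data = cache_set[hit_way].data if hit_way is not None else None
    return hit_way, data, tag_reads, data_reads


# A 4-way set in which the required line sits in way 2 (illustrative values).
cache_set = [
    Line(True, 0x10, b"A"),
    Line(True, 0x20, b"B"),
    Line(True, 0x30, b"C"),
    Line(True, 0x40, b"D"),
]
hit_way, data, tag_reads, data_reads = parallel_read(cache_set, 0x30)
unnecessary_data_reads = data_reads - 1  # ways 0, 1 and 3 were read for nothing
```

In this example one of the four data reads is useful and the other three are wasted, which is the overhead that way prediction targets.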
Figure 1-3 shows the read energy per access in a 512KB 8-way L2 cache, which is a common configuration in desktop processors. Reading data and tags consumes 77.9% of the total read energy per access. Since only a single way holds the required data on a cache hit, 68.3% of the read energy is consumed accessing unnecessary tags and data. Our research focuses on eliminating this unnecessary read energy. To achieve this goal, a way prediction mechanism can be applied that identifies the way of the required data early, so that only a single way is activated if the way prediction is correct.
1.2 Way Prediction Concept
The basic idea of a way prediction scheme is to predict the way in which the required data may be located in a set-associative cache. The scheme probes the predicted way first, so only a single way is accessed in the first probe. If the prediction is correct, the access latency and energy consumption of the cache are similar to those of a direct-mapped cache whose size equals that of a single way. In general approaches, if the first probe misses, a second probe accesses all ways except the one probed first. In other words, if the prediction is wrong, the cache is accessed a second time to retrieve the desired data. Performance then degrades because the access latency of the cache becomes longer.
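The two-probe behaviour of a general way prediction scheme can be illustrated as follows. This is a minimal sketch under simplifying assumptions: a set is modelled as a plain list of tags, the function name `way_predicted_read` is hypothetical, and how the predicted way is obtained is left outside the model.

```python
def way_predicted_read(set_tags, tag, predicted_way):
    """Return (hit_way, ways_read, probes) for one access to one set."""
    # First probe: only the predicted way is activated.
    if set_tags[predicted_way] == tag:
        return predicted_way, 1, 1
    # Second probe: all ways except the one probed first.
    others = [w for w in range(len(set_tags)) if w != predicted_way]
    hit_way = next((w for w in others if set_tags[w] == tag), None)
    return hit_way, 1 + len(others), 2


# 4-way set with the required tag (0x30) in way 2 (illustrative values).
set_tags = [0x10, 0x20, 0x30, 0x40]
correct = way_predicted_read(set_tags, 0x30, predicted_way=2)
wrong = way_predicted_read(set_tags, 0x30, predicted_way=0)
```

A correct prediction reads one way in one probe; a wrong prediction ends up reading every way and needs a second probe, which is the latency penalty mentioned above.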
Figure 1-3 Read Energy per Access in a 512KB 8-way Cache
1.3 Motivation & Objective
This section discusses our motivations and objectives. The motivations focus on the benefits that prior research has not achieved. We also briefly introduce our design approach and goals.
Our three motivations are as follows:
1. Much dynamic read energy is consumed accessing unnecessary tags and cache lines in an L2 cache (about 68% in a 512KB 8-way L2 cache). If we can identify the way of the required data early, at least 68% of the dynamic read energy can be saved per access.
2. There is little research on way prediction for L2 caches. The best L2 cache power saving among these studies is about 47%, because the prediction accuracy is not high (70~80%). The optimal case should save 70~80% of the dynamic read energy.
3. If we can guarantee that the ways other than the predicted way need not be probed when the way prediction is wrong, both the power consumption and the access latency of the L2 cache can be improved.
The three objectives of our design are as follows:
1. Attach a table, called the way table, to a translation lookaside buffer (TLB) to record the way index of an L2 cache line when the line is placed into the L2 cache. The way index indicates the way number in a set-associative cache. The enhanced TLB is called the way-predicted TLB (WP-TLB). The merit of attaching the way table to a TLB is that the way table then need not store or compare tags.
2. When an address reference arrives at the WP-TLB, the way table is searched for the way index of the L2 cache line associated with this address reference, and this index is used for way prediction. If a way table miss occurs, all tags and data are read out, exactly as in a conventional access to a set-associative cache. If a way table hit occurs, only a single way is activated, whether the way prediction is correct or not. When a way table hit occurs and the prediction is wrong, we call this situation a miss way prediction.
3. When a miss way prediction occurs, ensure that the mispredicted line is not in any other way. The way index in the way table is always the latest information. If a wrong way index causes a miss way prediction, the corresponding L2 cache line must have been replaced and not brought back into the L2 cache since. Thus only a single way needs to be probed when a miss way prediction occurs.
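The three objectives can be illustrated with a small behavioural model. It is only a sketch under stated assumptions, not the design presented in Chapter 3: the class name `WayPredictedL2` is hypothetical, the way table is modelled as an unbounded dictionary keyed by line address (a stand-in for a table attached to TLB entries), and round-robin replacement is assumed purely for simplicity.

```python
class WayPredictedL2:
    """Behavioural sketch of an L2 cache read with a way table (hypothetical)."""

    def __init__(self, n_sets, n_ways):
        self.n_sets = n_sets
        self.n_ways = n_ways
        self.tags = [[None] * n_ways for _ in range(n_sets)]
        self.way_table = {}               # line address -> way index (assumed layout)
        self.next_victim = [0] * n_sets   # round-robin replacement (assumption)

    def _split(self, line_addr):
        # Set index from the low part of the line address, tag from the rest.
        return line_addr % self.n_sets, line_addr // self.n_sets

    def fill(self, line_addr):
        """Place a line into the cache and record its way index."""
        s, tag = self._split(line_addr)
        way = self.next_victim[s]
        self.next_victim[s] = (way + 1) % self.n_ways
        self.tags[s][way] = tag
        self.way_table[line_addr] = way   # objective 1: record on placement
        return way

    def read(self, line_addr):
        """Return (hit, ways_read) for one read access."""
        s, tag = self._split(line_addr)
        way = self.way_table.get(line_addr)
        if way is None:
            # Way table miss: conventional access, all ways are read.
            return tag in self.tags[s], self.n_ways
        # Way table hit: a single way is activated, right or wrong (objective 2).
        # A mismatch means the line was replaced, so it is an L2 miss (objective 3).
        return self.tags[s][way] == tag, 1


cache = WayPredictedL2(n_sets=4, n_ways=2)
cold = cache.read(5)        # way table miss: both ways are read
cache.fill(5)               # line 5 goes to set 1, way 0
warm = cache.read(5)        # way table hit, correct prediction
cache.fill(9)               # set 1, way 1
cache.fill(13)              # set 1, way 0: replaces line 5
stale = cache.read(5)       # miss way prediction: one way probed, line is absent
```

In the last access the recorded way index is stale, the single probed way holds a different tag, and the line is indeed absent from the other way, so no second probe is required.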
In the next chapter, we introduce the background of our research in more detail and discuss related work that can be applied to L2 caches. The details of our design are presented in Chapter 3, and the experimental results and discussion are given in Chapter 4. The last chapter concludes this research and outlines future work, which addresses the static power of the L2 cache and the behavior of our design in other cache environments.