Chapter 3 Design Approach
3.3 Way Table Design
Contents and utilization of a way table
Figure 3-5 Architecture Overview of Our Design
Figure 3-6 shows the proposed way table structure. The number of entries in the way table equals the number of TLB entries, and the number of fields per entry equals the memory page size divided by the L2 cache line size: if a page contains N L2 cache lines, N fields are attached to each way table entry. Each field contains a valid bit and a way index. The valid bit is one if the way index of the corresponding line has been recorded; while it is zero, all ways of the L2 cache are activated simultaneously. The valid bit therefore guarantees that we never use a way index that has not been recorded. Without it, the following case could occur: the way index of an L2 cache line is recorded, but the corresponding TLB entry is then replaced, and the way index is not backed up. When the line is referenced again, the TLB misses and the corresponding page is placed into a TLB entry. With no valid bit per way index, the field would hold an indeterminate way and could cause a miss way prediction, after which we would have to probe the other ways because the required data might be located in one of them. The way index is the number of the way in which the L2 cache line resides; for a 4-way L2 cache, two bits are needed to record it.
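The structure described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design itself: the names `Field`, `fields_per_entry`, `way_index_bits`, `make_way_table` and `storage_bits` are hypothetical, and the 4 KiB page size and 128-byte L2 line size used in the example are assumptions chosen for illustration (only the 128 entries and the 4-way L2 cache come from the text).

```python
from dataclasses import dataclass


@dataclass
class Field:
    valid: bool = False  # set once a way index has been recorded for this line
    way: int = 0         # meaningful only while valid is True


def fields_per_entry(page_size: int, l2_line_size: int) -> int:
    """N = number of L2 cache lines per page = fields per way table entry."""
    return page_size // l2_line_size


def way_index_bits(num_ways: int) -> int:
    """Bits needed to name one of num_ways ways (2 bits for a 4-way cache)."""
    return (num_ways - 1).bit_length()


def make_way_table(tlb_entries: int, page_size: int, l2_line_size: int):
    """One way table entry per TLB entry, each holding N fields."""
    n = fields_per_entry(page_size, l2_line_size)
    return [[Field() for _ in range(n)] for _ in range(tlb_entries)]


def storage_bits(tlb_entries: int, page_size: int, l2_line_size: int,
                 num_ways: int) -> int:
    """Total storage: entries x fields x (valid bit + way index bits)."""
    per_field = 1 + way_index_bits(num_ways)
    return tlb_entries * fields_per_entry(page_size, l2_line_size) * per_field


# Assumed sizes (not from the text): 4 KiB pages, 128-byte L2 lines.
table = make_way_table(tlb_entries=128, page_size=4096, l2_line_size=128)
total_bits = storage_bits(128, 4096, 128, num_ways=4)
print(len(table), len(table[0]), total_bits)
```

Under these assumed sizes each entry would hold 32 fields of 3 bits each; other page or line sizes change the number of fields per entry but not the layout of a field.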
Figure 3-6 Contents of the Way Table
Figure 3-7 shows the utilization rate, that is, the fraction of the N fields in a way table entry that have been recorded; a higher rate is better. With N fields per page, the average utilization over the SPEC2000 benchmarks is 37% for a 128-entry instruction way table and 63% for a 128-entry data way table. For the data way table, the highest rate is 95% and the lowest is 6%. The gap is large because data usage differs considerably from program to program. The 63% average indicates that most of the N fields of a page are put to use. For the instruction way table, the highest rate is 58% and the lowest is 25%; this result is caused by branch instructions.
The 37% average for instructions suggests that the number of fields per entry could be halved, for example by letting two L2 cache lines share one field. Leaving aside the resulting competition between cache lines, halving the number of fields requires an additional tag in each field to distinguish the L2 cache lines that share it, so the overall size of a way table with N/2 fields is similar to that of a way table with N fields. We conclude that reducing the number of fields only pays off when it is reduced to one-fourth or fewer, which may be worthwhile when the utilization rate is less than 25%.
The relation between L2 cache lines and fields of the way table
A page that resides in memory maps to a physical memory block. Figure 3-8 gives an example of how a memory reference maps to its dedicated field. We assume that virtual page 101110 maps to physical page 1101 and that a memory reference to address 101110 0101 arrives. After TLB translation, the physical address, which is divided into tag, set and line offset fields, is 1101 0101. The page offset contains four bits and the line offset contains two bits, so the memory page holds four L2 cache lines (A, B, C and D). The upper two bits of the page offset, which we call the field index, select the field of the way table entry to access. In this example, the field index 01 belongs to cache line B and selects field 1 of the corresponding way table entry. Therefore, whenever a memory reference reaches the way table or an L2 cache line is moved into the L2 cache, the correct field can be accessed directly by using its field index.
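The field index computation in this example can be written as a short Python sketch. The function name `field_index` and its parameter names are hypothetical; the bit widths (four page offset bits, two line offset bits) are those of the example in Figure 3-8.

```python
def field_index(phys_addr: int, page_offset_bits: int, line_offset_bits: int) -> int:
    """Return the way table field selected by a physical address.

    The field index is the part of the page offset above the line offset.
    """
    page_offset = phys_addr & ((1 << page_offset_bits) - 1)
    return page_offset >> line_offset_bits


# Example from the text: physical address 1101 0101,
# 4 page offset bits and 2 line offset bits.
phys_addr = 0b1101_0101
selected = field_index(phys_addr, page_offset_bits=4, line_offset_bits=2)
print(selected)  # field 1, which belongs to cache line B

# The four lines A, B, C and D of the page select fields 0 to 3.
all_fields = [field_index(0b1101_0000 | (i << 2), 4, 2) for i in range(4)]
```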
Figure 3-7 Utilization Rates in Way Table
Writing and updating the way table
Table 3-1 shows when the way table is written and updated, and the action taken in each case. There are two writing conditions. First, when an L2 cache line is moved into the L2 cache, its way index is recorded in the way table. Second, when the way table misses (the valid bit is 0) but the L2 cache hits, the way index of the line had been recorded earlier, but the page of the line was later swapped out of the TLB and the corresponding way table entry was not backed up, so the way index must be recorded again. When an L2 cache line is swapped out, we do not clear the valid bit of the corresponding field, because searching the way table for the field of the replaced cache line would incur too much overhead. Skipping this invalidation can cause a miss way prediction. However, our approach probes only a single way when a miss way prediction occurs, so the cost of not invalidating the field of a replaced cache line is small. The way index in the way table needs to be updated in one situation: when a miss way prediction occurs, which is discussed in the next paragraph.
Figure 3-8 Indexing of the Way Index in Way Table by Using “Field Index”
Figure 3-9 is an example of a miss way prediction. Originally, line B resides in way 1. When line A is brought from main memory into way 1, line B is swapped out of the L2 cache, but the valid bit of line B’s field is not cleared. When line B is referenced later, the valid bit and the way index of its field are 1 and 01, respectively. Only way 1 is probed on the first access, and line A resides in it. Thus a miss way prediction occurs, and the field must be updated with the new way index of line B. Note that if line B had been moved back into way 0, 2 or 3 in the meantime, its field would already have been updated with that way index instead of way 1. The field still holds a stale way index precisely because line B has not been moved into the L2 cache again since it was replaced by line A. A miss way prediction therefore tells us that line B is not in the L2 cache; in fact, our approach guarantees that a mispredicted line is not in the L2 cache. We do not need to spend extra delay activating the other ways of the corresponding set, and read energy is saved even when a miss way prediction occurs.
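The writing and updating rules, together with the scenario of Figure 3-9, can be replayed with a small Python model of a single 4-way L2 set. This is a simplified sketch under several assumptions: the names `Field` and `access` are hypothetical, way table fields are keyed directly by line name rather than by TLB entry and field index, and the way that receives an incoming line is passed in as `victim_way` because the replacement policy is not specified here.

```python
NUM_WAYS = 4


class Field:
    def __init__(self):
        self.valid = False  # way index recorded?
        self.way = 0        # recorded way (may be stale after an eviction)


def access(line, l2_set, fields, victim_way):
    """Reference `line` in one L2 set; return (ways_probed, l2_hit)."""
    f = fields[line]
    if f.valid:
        probed = [f.way]                # probe only the recorded way
    else:
        probed = list(range(NUM_WAYS))  # way table miss: activate all ways
    hit_way = next((w for w in probed if l2_set[w] == line), None)
    hit = hit_way is not None
    if not hit:
        # L2 miss. With valid == 1 this is a miss way prediction, which
        # implies the line is not in the L2 cache, so no other way is probed.
        # The field of the line being evicted is deliberately NOT invalidated.
        l2_set[victim_way] = line
        hit_way = victim_way
    # Record the way (first write, re-record, or update after misprediction).
    f.valid, f.way = True, hit_way
    return probed, hit


l2_set = [None] * NUM_WAYS
fields = {"A": Field(), "B": Field()}

r1 = access("B", l2_set, fields, victim_way=1)  # B is filled into way 1
r2 = access("A", l2_set, fields, victim_way=1)  # A replaces B in way 1
r3 = access("B", l2_set, fields, victim_way=2)  # stale index: only way 1 probed
r4 = access("B", l2_set, fields, victim_way=0)  # hit in way 2, single way probed
print(r1, r2, r3, r4)
```

In this replay the third access probes only way 1, finds line A there, and is treated as an L2 miss without activating ways 0, 2 or 3, which is the behaviour the paragraph above describes.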
Table 3-1 Timing of Writing and Updating Way Table
A basic example of saving dynamic energy
Figure 3-10 shows an example of saving dynamic energy. Line A is an L2 cache line that contains four L1 cache lines: A0, A1, A2 and A3. When the data of A1 is accessed for the first time, A (A0, A1, A2 and A3) is moved into the L2 cache and A1 is then moved into the L1 cache. The way index is written into the corresponding field of the way table. When A0, A2 and A3 are referenced later, the way index is found in the way table and only A’s way in the L2 cache is activated. Similarly, if A1 is evicted from the L1 cache while A is still in the L2 cache, a reference to A1 activates only a single way in the L2 cache. Finally, if A is evicted from the L2 cache but the corresponding field has not been flushed by a TLB replacement, dynamic read energy is still saved when A is referenced. This last situation is the miss way prediction described above.
Figure 3-9 An Example of Miss Prediction Way
Figure 3-10 An Example of Dynamic Energy Saving of L2 Cache by Way Prediction
The cases in which only a single way is activated (saving dynamic energy) are summarized below:
– When references to A0, A2 or A3 arrive.
– When A1 has been evicted from the L1 cache but A is still in the L2 cache.
– When A has been evicted from the L2 cache but the valid bit of the corresponding field is still 1.
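As a rough illustration of these cases, the following Python sketch counts how many L2 ways are activated for the access sequence of Figure 3-10. The count of activated ways is used only as a proxy for dynamic read energy, and the baseline is assumed to be a conventional 4-way cache that activates all ways on every L2 access. The function name `ways_activated` and the list `accesses` are hypothetical.

```python
NUM_WAYS = 4


def ways_activated(valid_bit: bool, num_ways: int = NUM_WAYS) -> int:
    """One way is activated when the field is valid, all ways otherwise."""
    return 1 if valid_bit else num_ways


# (description, valid bit of the way table field at the time of the L2 access)
accesses = [
    ("A1, first reference (field not yet recorded)", False),
    ("A0", True),
    ("A2", True),
    ("A3", True),
    ("A1 after eviction from L1, A still in L2", True),
    ("A after eviction from L2, valid bit still 1", True),
]

with_way_table = sum(ways_activated(valid) for _, valid in accesses)
conventional = NUM_WAYS * len(accesses)  # assumed baseline: all ways every time
print(with_way_table, conventional)
```

For this particular sequence the model activates 9 ways with the way table against 24 in the assumed baseline; the actual energy saving depends on the cache implementation and on the overhead of the way table itself, which this count does not capture.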