
3.3 Other Modifications

Since a private partitioning scheme splits the cache into private partitions, only intra-processor misses occur. As such, standard techniques to reduce the miss rate of a cache can be applied on a per-partition basis. Such techniques include varying the block size, varying the associativity, and changing the replacement policy.

3.3.1 Per-Partition Variable Block Size

Varying the block size can decrease the number of compulsory misses for an application, but this needs to be balanced against the accompanying increase in conflict and capacity misses. A brief overview of methods to detect the appropriate block size follows. One method is to have an external monitoring circuit (functionally identical to the cache but not containing data) that snoops cache accesses and calculates miss rates for different block sizes. Applied on a per-partition basis, a copy of each of these monitors is needed for every partition, so the overhead grows linearly.
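As a rough illustration of such a monitor, the minimal sketch below keeps a tag-only copy of one partition for each candidate block size and tracks its miss rate. The class and parameter names (ShadowMonitor, candidate_block_sizes, the 64 KB capacity, 8-way associativity, LRU) are illustrative assumptions, not part of the proposal.

```python
from collections import OrderedDict

class ShadowMonitor:
    """Tag-only shadow of one partition, simulated for one candidate block size."""
    def __init__(self, capacity_bytes, block_size, assoc):
        self.block_size = block_size
        self.assoc = assoc
        self.num_sets = capacity_bytes // (block_size * assoc)
        # One LRU-ordered tag store per set; no data is held.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_size
        idx = block % self.num_sets
        tag = block // self.num_sets
        tags = self.sets[idx]
        self.accesses += 1
        if tag in tags:
            tags.move_to_end(tag)          # refresh LRU position on a hit
        else:
            self.misses += 1
            if len(tags) >= self.assoc:
                tags.popitem(last=False)   # evict the LRU tag
            tags[tag] = True

    def miss_rate(self):
        return self.misses / self.accesses if self.accesses else 0.0

# One monitor per candidate block size, replicated per partition,
# which is why the overhead grows linearly with the number of partitions.
candidate_block_sizes = [32, 64, 128]
monitors = {bs: ShadowMonitor(64 * 1024, bs, assoc=8)
            for bs in candidate_block_sizes}
```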

Another option would be in-cache monitoring, reserving certain sets in the shared cache for a particular core and block size. This is not very scalable, as a large number of sets would be needed as the number of cores and candidate block sizes grows.

A third alternative would be to store a number of recently accessed addresses per core and estimate the number of hits each address would receive if the block size were varied. This has lower overhead than copying the cache and scales with the number of partitions. However, it does not take into account the effect of the replacement policy. One way to deal with that would be to remove an address from the list when the corresponding block is evicted from the cache. However, when the monitored block size is larger than the current block size, an address may be in the cache yet would not fit if the block size were actually larger; in that case the predicted number of hits would be incorrect.
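The sampled-address estimator could be organised roughly as follows. This is only a sketch under the assumption of one bounded sample list per core and power-of-two candidate block sizes; the names (AddressListEstimator, max_samples) are illustrative.

```python
class AddressListEstimator:
    """Per-core list of sampled addresses used to predict hits for other block sizes."""
    def __init__(self, candidate_block_sizes, max_samples=64):
        self.sizes = candidate_block_sizes
        self.max_samples = max_samples
        self.samples = []                                   # recently accessed addresses
        self.predicted_hits = {bs: 0 for bs in candidate_block_sizes}

    def on_access(self, addr):
        for bs in self.sizes:
            # A stored address would give a hit if it lies in the same
            # bs-sized block as the new access.
            if any(stored // bs == addr // bs for stored in self.samples):
                self.predicted_hits[bs] += 1
        self.samples.append(addr)
        if len(self.samples) > self.max_samples:
            self.samples.pop(0)

    def on_evict(self, evicted_addr, current_block_size):
        # Drop samples covered by the evicted block so the estimate follows the
        # replacement policy; for monitored sizes larger than the current block
        # size the estimate remains optimistic, as discussed above.
        self.samples = [a for a in self.samples
                        if a // current_block_size != evicted_addr // current_block_size]
```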

The block size can be adjusted either periodically or in response to an event. Such an event could be the cache being repartitioned, or the monitored miss-rate differential between the current block size and a monitored block size exceeding a threshold.
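A minimal sketch of the event-based trigger is given below, assuming per-partition miss rates are available (for example from monitors like the ones sketched earlier); the threshold value and the set_block_size interface are illustrative assumptions.

```python
THRESHOLD = 0.02    # miss-rate gap that justifies a change (illustrative value)

def maybe_resize(partition, miss_rate_by_block_size, current_block_size):
    # Pick the monitored block size with the lowest miss rate and switch to it
    # only if the gap to the current block size exceeds the threshold.
    best = min(miss_rate_by_block_size, key=miss_rate_by_block_size.get)
    gap = miss_rate_by_block_size[current_block_size] - miss_rate_by_block_size[best]
    if gap > THRESHOLD:
        partition.set_block_size(best)      # assumed partition interface
```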

The biggest question related to variable block sizes is how they can be implemented, and whether the cache can support that implementation on a per-partition basis or only globally across the whole cache. Since the cache size is conceptually fixed (changed only on repartitioning, not by the variable-block mechanism), changing the block size requires changing either the number of sets or the associativity.

Conceptually, combining two sets into one doubles the block size. This can be implemented at low cost by fetching multiple sequential blocks per miss and placing them in different sets, which means the block size can be adjusted in integer multiples of the minimum block size. It will, however, increase the number of memory requests, impacting performance. One issue is that the index function is unaware of the change in the number of sets and may index into the middle of a large block. Physically, the number of sets in the cache can stay the same, but the index function can be modified to ignore (set to 0) lower-order bits when the block size increases, counteracting this issue. Ignoring the lower-order bits means the block size must be a power of two, which it commonly is anyway.
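The modified index function can be sketched as follows, assuming the minimum block size, the current block size, and the number of sets are all powers of two; zeroing the low-order index bits keeps a large block from straddling two conceptual sets.

```python
def cache_index(addr, min_block_size, block_size, num_sets):
    raw_index = (addr // min_block_size) % num_sets
    # Number of low-order index bits to ignore when blocks are combined.
    ignore_bits = (block_size // min_block_size).bit_length() - 1
    return (raw_index >> ignore_bits) << ignore_bits

# Example (1024 sets): with a 64 B minimum block and a 128 B block size,
# the low index bit is forced to zero.
assert cache_index(0x1040, 64, 128, 1024) == 64
assert cache_index(0x1080, 64, 128, 1024) == 66
```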

This method for varying the block size is easily adapted to a per-partition basis. Each partition is associated with a block size, and the index function changes based on the partition's block size. This may slightly increase indexing time, since a lookup of the partition's block size is needed. The stored block size is also used to determine how many blocks to fetch sequentially from the next level of the memory hierarchy. When repartitioning combines partitions with different block sizes, nothing needs to be done: blocks are overwritten on demand and can still be accessed by either core through its own index function.
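A sketch of the per-partition state is shown below, assuming a small table indexed by partition id; the table lookup is the slight indexing overhead mentioned above, and all names and sizes are illustrative.

```python
MIN_BLOCK_SIZE = 64                           # bytes (illustrative)
partition_block_size = {0: 64, 1: 128}        # current block size per partition

def blocks_to_fetch(partition_id):
    # How many sequential minimum-sized blocks to request from the next level
    # of the memory hierarchy on a miss in this partition.
    return partition_block_size[partition_id] // MIN_BLOCK_SIZE

def index_ignore_bits(partition_id):
    # Low-order index bits the index function zeroes for this partition
    # (as in the cache_index sketch above).
    return (partition_block_size[partition_id] // MIN_BLOCK_SIZE).bit_length() - 1
```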

Changing the associativity is conceptually similar. The underlying structure of the cache in terms of the number of sets, blocks, and ways remains the same; it is the unit of addressing that changes. Conceptually, ways are combined to increase the block size. This can again be done by fetching multiple blocks on a miss, but this time the fetched blocks go in the same set, so multiple blocks need to be evicted. Eviction needs extra time to find the correct block to evict and then use it to find the additional blocks that must be evicted with it. This can be performed in parallel with evicting the first block and fetching the next block, so it has no impact on latency. Apart from this, no other major modifications are needed compared to the original cache structure. The block size per partition is stored and used when fetching and evicting a block. Combining partitions with two different block sizes may cause an issue when choosing which blocks to evict and insert, since the number of blocks in a set may not be a power of two; extra logic is required to deal with this situation.
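The way-combining case can be sketched as follows, assuming ways are grouped into aligned power-of-two clusters so the victim chosen by the replacement policy determines the whole group to evict; the function name is illustrative.

```python
def victim_way_group(victim_way, block_size, min_block_size):
    group = block_size // min_block_size            # ways combined per block
    first = (victim_way // group) * group           # align to the group boundary
    return list(range(first, first + group))        # all ways evicted together

# Example: with blocks combined 2:1, picking way 5 as the victim
# also evicts its partner way 4.
assert victim_way_group(5, 128, 64) == [4, 5]
```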

3.3.2 Per-Partition Variable Associativity

Varying the associativity can change the number of conflict misses for an application. Methods to detect the desired associativity are similar to those for detecting the desired block size. Again, an external monitoring circuit (functionally identical to the cache but not containing data) can snoop cache accesses and calculate miss rates for different associativities. Alternatively, in-cache monitoring can reserve certain sets in the shared cache for a particular core and associativity; this is less practical unless the associativity is only reduced, not increased, from the baseline. Similar to varying the block size, the associativity can be adjusted either periodically or in response to an event.

Since the cache size is conceptually fixed, varying the associativity can be achieved by changing either the block size or the number of sets. Changing the block size and associativity together was described previously: ways are combined. This only works to decrease the associativity from the baseline.

Changing the number of sets, however, can both increase and decrease the associativity. As partitions do not necessarily have an associativity that is a power of two, it is easiest to restrict changes in the number of sets in a partition to halving and doubling, which doubles or halves the associativity of the partition respectively. Physically the cache maintains the same structure, but conceptually the number of sets changes. Supporting this requires changes to both the index function and the replacement policy. When increasing the associativity, the number of LRU bits changes, so the replacement policy must deal with this (perhaps by randomly choosing the LRU block of one of the combined sets). When looking for a hit or a block to evict, the index function must search two or more sets, done by searching all combinations of the index LSBs that change. When decreasing the associativity, blocks that are conceptually in different sets share the same physical set; the replacement policy must ensure the correct blocks are evicted by checking the LSBs of the address tag. The biggest issue is what to do with existing data when increasing or decreasing the number of sets. The easiest approach is to invalidate any data in the conceptually new portions. The data could instead be kept, but this requires a large overhead to adjust and relocate it so that it exists in a valid location (able to be found, and not incorrectly causing a hit).
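A minimal sketch of the two lookup adjustments is given below, assuming power-of-two merge/split factors; the set grouping, tag layout, and function names are illustrative assumptions.

```python
def sets_to_search(addr, block_size, num_physical_sets, merge_factor):
    # Associativity increased: merge_factor physical sets form one conceptual
    # set, so the lookup enumerates all combinations of the changed index LSBs.
    physical = (addr // block_size) % num_physical_sets
    group_start = (physical // merge_factor) * merge_factor
    return list(range(group_start, group_start + merge_factor))

def in_conceptual_set(stored_tag, addr, block_size, num_physical_sets, split_factor):
    # Associativity decreased: split_factor conceptual sets share one physical
    # set, and the low-order bits of the stored tag identify the sub-set, so
    # hit checks and evictions compare them against the corresponding address bits.
    extra_index_bits = (addr // (block_size * num_physical_sets)) % split_factor
    return stored_tag % split_factor == extra_index_bits
```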

Varying the associativity by changing the number of sets can also be applied on a per-partition basis, as combining partitions with different associativities is similar to changing the associativity: the partitions first match their associativities, and then the merged partition readjusts its associativity to suit its new partition size.

3.3.3 Per-Partition Replacement Policies

Changing the replacement policy can decrease the number of conflict misses, specifically replacement misses. When implemented on a per-partition basis, partitions with different replacement policies can be combined at any time, so all replacement policies need to be active concurrently, updating their required tags on every access. On a miss, each replacement policy selects a block to evict, and the final victim is the one chosen by the replacement policy currently in use by the partition. This means the replacement-policy tags cannot be reused, increasing hardware cost. Also, since all replacement policies run concurrently on each access, power consumption increases.
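The concurrent bookkeeping could look roughly like the sketch below, which assumes two candidate policies (LRU and FIFO) per set; the class names are illustrative, and the duplicated metadata is exactly the hardware and power cost noted above.

```python
class LRUPolicy:
    def __init__(self, assoc):
        self.order = list(range(assoc))          # order[0] is least recently used
    def on_access(self, way):
        self.order.remove(way)
        self.order.append(way)
    def on_fill(self, way):
        self.on_access(way)                      # a filled block becomes MRU
    def victim(self):
        return self.order[0]

class FIFOPolicy:
    def __init__(self, assoc):
        self.queue = list(range(assoc))          # queue[0] is the oldest fill
    def on_access(self, way):
        pass                                     # hits do not reorder FIFO state
    def on_fill(self, way):
        self.queue.remove(way)
        self.queue.append(way)
    def victim(self):
        return self.queue[0]

class PerSetReplacement:
    """Replacement metadata for one set; every policy is updated on every access."""
    def __init__(self, assoc):
        self.policies = {"lru": LRUPolicy(assoc), "fifo": FIFOPolicy(assoc)}
    def on_access(self, way):
        for policy in self.policies.values():
            policy.on_access(way)
    def on_fill(self, way):
        for policy in self.policies.values():
            policy.on_fill(way)
    def victim(self, active_policy):
        # Each policy would pick its own victim; only the partition's currently
        # active policy decides, but all metadata stays valid for a later switch.
        return self.policies[active_policy].victim()
```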

The replacement policy to use can be selected in a similar fashion to the previous two methods: with an external monitor running different replacement policies, or with an in-cache monitor. Miss rates are recorded and the replacement policy is selected based on them.

