CHAPTER 3 PROBLEM MODELING
3.3 Direct Mapped Cache
Arranging objects for a general K-set direct mapped cache (K > 1) involves not only packing but also placement movements. Because the main memory is divided into K regions, there are K memory block sets. Each set B_k = {b_k, b_{k+1×K}, b_{k+2×K}, …} contains more than one memory block, where 0 ≤ k < K. The combination of the two movements creates a two-dimensional mapping that maps every object to a (set, block) pair, defined as follows.
Definition 3.3. fpp : O → S × Bk, where O is the object set, S represents the cache sets, and Bk represents the blocks in the k-th cache set.
The mapping transforms an object access trace OT into a block access trace BT, and each element of the BT is an ordered pair of the set and the block. According to the mapped cache set index k, the BT can be decomposed into K disjoint block access sub-traces, denoted as BTk, where 0 ≤ k < K. Meanwhile, the mapping of the one-page cache can be regarded as a special case: a one-dimensional mapping working on the subspace fpp : O → 1 × B0. As a result, in that special case the trace is no longer decomposable.
OT    abhecfafgbhcgdefegfcdbhfdahegdaf
BT    WWZYXYWYZWZXZXYYYZYXXWZYXWZYZXWY
BT0   WWXWWXXXXWXWXW
BT1   ZYYYZZZYYYZYZYZYZY
CBT0  WXWXWXWXW
CBT1  ZYZYZYZYZYZY
(a)

oi            a      b      c      d      e      f      g      h
bj            W      W      X      X      Y      Y      Z      Z
(set, block)  (0,0)  (0,0)  (0,1)  (0,1)  (1,0)  (1,0)  (1,1)  (1,1)
(b)
Figure 3.5. (a) An example of an object access trace, block access trace, block access sub-traces, and compressed block access sub-traces. (b) A legal fpp that maps eight objects to four memory blocks.
Consider accessing eight objects on a 2-set direct mapped cache. The OT in Figure 3.5(a) is an object access trace that consists of accesses to eight objects. Figure 3.5(b) shows an fpp that maps these objects to memory blocks, where each memory block is identified by a (set, block) pair. Figure 3.5(a) also shows the BT, which is converted from the OT by the mapping fpp, and the two decomposed sub-traces, BT0 and BT1.
Because memory blocks belonging to the same cache set contend for a single cache block, each block access sub-trace can be regarded as a standalone block access trace working on a one-page cache. In this respect, the number of cache misses caused by the block access trace BT can be calculated by the following formula:
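Written out, the chain of equalities discussed next could take the following form. The notation is an assumption made here for illustration only: Miss(·) denotes the misses on the K-set direct mapped cache, and Miss_1(·) denotes the misses on a one-page cache.

```latex
\[
\mathrm{Miss}(BT)
  \;=\; \sum_{k=0}^{K-1} \mathrm{Miss}(BT_k)
  \;=\; \sum_{k=0}^{K-1} \mathrm{Miss}_{1}(BT_k)
  \;=\; \sum_{k=0}^{K-1} \lvert CBT_k \rvert
\]
```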
Because the mapping fpp decomposes the original block access trace into K disjoint block access sub-traces BTk, the first equality states that the total number of misses equals the sum of the misses of all sub-traces. The second equality expresses that each sub-trace works on a one-page cache, so the original problem becomes a combination of one-page cache problems. According to the discussion of the one-page model, the number of misses caused by a block access trace is equal to the length of its compressed block access trace, which yields the last equality. The number of misses can therefore be calculated by summing the lengths of all compressed block access sub-traces, denoted as CBTk. For example, in Figure 3.5(a), CBT0 and CBT1 are the compressed block access sub-traces of BT0 and BT1, with lengths 9 and 12, respectively. The number of cache misses caused by the OT under the mapping fpp is therefore 21.
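The calculation for Figure 3.5 can be reproduced with a short Python sketch. The function and variable names (FPP, block_trace, sub_traces, compress, count_misses) are illustrative assumptions and do not come from the original text; the sketch also assumes that the cache starts empty, so the first access to each set counts as a miss.

```python
from itertools import groupby

# Mapping f_pp from Figure 3.5(b): object -> (set, block).
FPP = {
    "a": (0, 0), "b": (0, 0),
    "c": (0, 1), "d": (0, 1),
    "e": (1, 0), "f": (1, 0),
    "g": (1, 1), "h": (1, 1),
}
OT = "abhecfafgbhcgdefegfcdbhfdahegdaf"


def block_trace(ot, fpp):
    """Convert an object access trace into a trace of (set, block) pairs."""
    return [fpp[obj] for obj in ot]


def sub_traces(bt, num_sets):
    """Split BT into one sub-trace per cache set, preserving access order."""
    subs = [[] for _ in range(num_sets)]
    for s, b in bt:
        subs[s].append((s, b))
    return subs


def compress(trace):
    """Collapse each run of identical consecutive blocks into one access."""
    return [block for block, _ in groupby(trace)]


def count_misses(ot, fpp, num_sets):
    """Misses = sum of the compressed sub-trace lengths (cold cache assumed)."""
    bt = block_trace(ot, fpp)
    return sum(len(compress(sub)) for sub in sub_traces(bt, num_sets))


if __name__ == "__main__":
    print(count_misses(OT, FPP, 2))  # 9 + 12 = 21 for the trace of Figure 3.5
```

Calling count_misses(OT, FPP, 2) sums |CBT0| = 9 and |CBT1| = 12, which agrees with the 21 misses stated above.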
Figure 3.6. The components of an object access graph for the direct mapped cache.
The derivation of the formula shows why defining the one-page cache is essential. In particular, it implies that after objects are distributed to sets, the original problem becomes K sub-problems, each of which can be treated as a graph partitioning problem.
We can extend the graph model of the one-page cache to express the object access graph for the K-set direct mapped cache. Because the purpose of the mapping fpp is to assign each object to a (set, block) pair, applying it to a given object access graph generates a two-level partition graph OG’, as illustrated in Figure 3.6.
The components of OG’ include objects, partitions, and regions. The definitions of objects and partitions are the same as those for the one-page cache model. The disjoint regions enclose the partitions in the graph OG’. A region corresponds to a cache set, so the graph OG’ has K regions for a K-set direct mapped cache. The edges in OG’ can be classified into three types, described as follows.
Type-I Edges – The Interior edges within partitions, as previously defined.
Type-B Edges – The edges across different partitions (Blocks) but within the same region.
Type-S Edges – The edges across different regions (cache Sets).
These three types of edges classify the origins of cache hits and misses into the following categories.
Hit-I – An object pair (oi, oj) connected by a Type-I edge is located in the same memory block, so both objects are present in the cache block simultaneously. Therefore, transitions from oi to oj in the object access trace always cause cache hits.
Miss-B – An object pair (oi, oj) connected by a Type-B edge is located in two distinct memory blocks that belong to the same set. Because only one cache block is available for the memory blocks of a set, only one of oi and oj can reside in the cache block at a time. A transition from one to the other in the trace replaces one block with the other in the cache block, and this causes one cache conflict miss.
Hit-S and Miss-S – Objects (oi, oj) connected by a Type-S edge are located in different sets. Since each cache set works independently, a transition along a Type-S edge may cause either a cache hit or a miss. This ambiguity arises because the graph model is based on pair-wise trace information. Petrank and Rawitz [68][69] have stated that pair-wise information is insufficient for precisely estimating cache misses. In other words, all the activity that happened before the transition along a given Type-S edge jointly determines whether it causes a cache hit or a miss.
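The classification above can be checked on the trace of Figure 3.5 with a small simulation. The following Python sketch is only an illustration: the names edge_type and tally are assumptions, the cache is assumed to start empty, and the first access is left out of the tallies because it has no incoming edge.

```python
from collections import Counter

# Mapping f_pp from Figure 3.5(b): object -> (set, block).
FPP = {
    "a": (0, 0), "b": (0, 0),
    "c": (0, 1), "d": (0, 1),
    "e": (1, 0), "f": (1, 0),
    "g": (1, 1), "h": (1, 1),
}
OT = "abhecfafgbhcgdefegfcdbhfdahegdaf"


def edge_type(u, v, fpp):
    """Classify the transition u -> v as Type-I, Type-B, or Type-S."""
    (set_u, block_u), (set_v, block_v) = fpp[u], fpp[v]
    if set_u != set_v:
        return "S"                      # different regions (cache sets)
    return "I" if block_u == block_v else "B"


def tally(ot, fpp, num_sets):
    """Count hits and misses per edge type on a direct mapped cache."""
    resident = [None] * num_sets        # block currently held by each set
    result = Counter()
    prev = None
    for obj in ot:
        s, b = fpp[obj]
        hit = resident[s] == b
        resident[s] = b                 # a miss loads the block into its set
        if prev is not None:            # the first access has no incoming edge
            kind = edge_type(prev, obj, fpp)
            result[(kind, "hit" if hit else "miss")] += 1
        prev = obj
    return result


if __name__ == "__main__":
    for key, count in sorted(tally(OT, FPP, 2).items()):
        print(key, count)
```

On this trace every Type-I transition hits and every Type-B transition misses, while the Type-S transitions split into both hits and misses, which is the ambiguity that pair-wise information cannot resolve.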
Examining the classified origins of cache hits and misses suggests the strategy for the packing and placement technique. Decreasing the number and length of Type-B edges directly helps to decrease Miss-B. From the viewpoint of the one-page cache, minimizing the sum of the Type-B edge lengths is equivalent to generating the shortest CBTs. Meanwhile, for a given object access trace, |BT| is fixed across all object layouts, and the following relation holds:
$$\sum w_{i,j} = |OT| = |BT| = \mathrm{Length}(\text{Type-I-Edges}) + \mathrm{Length}(\text{Type-B-Edges}) + \mathrm{Length}(\text{Type-S-Edges}) \tag{3.6}$$
Because this total is fixed, Length(Type-B-Edges) is minimized when the sum of the other two items is maximized. That means we are looking to maximize Length(Type-I-Edges) + Length(Type-S-Edges). The next problem is to develop a method that finds a layout satisfying this goal. However, an optimal answer is hard to find. In the next chapter, we discuss this issue and propose heuristics for this goal.
On the other hand, assuming all small objects have already been packed into memory blocks, the remaining job is to distribute these blocks to sets, which is the placement problem for the K-set direct mapped cache. From the previous analysis of the packing and placement problem, we can propose another perspective for modeling the placement problem. In terms of the graph OG’, all Type-I edges are excluded from the placement problem because they were handled by the packing stage. The placement problem is therefore defined as follows.
Definition 3.4. Consider the block access graph BG = (B, E), where B = {b1, b2, …} represents the vertices corresponding to memory blocks, and E is the edge set constructed from the compressed block access trace. Each edge ei,j has a length wi,j derived from the trace information. The goal is to partition B into K subsets {B0, B1, …, BK-1} so as to maximize Equation (3.7). The edge set of BG is the union of the Type-B edges and the Type-S edges, and the objective function is the sum of the lengths of the Type-S edges.
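As a minimal illustration of Definition 3.4, the following Python sketch builds the block access graph from the compressed block trace of Figure 3.5(a), evaluates the Type-S edge length of a placement, and searches all placements exhaustively. The names block_graph, type_s_length, and best_placement are assumptions, edges are treated as undirected, and no limit is placed on the number of blocks per set. The exhaustive search is shown only because the example has four blocks; it is not the heuristic discussed in the next chapter.

```python
from collections import Counter
from itertools import groupby, product

BT = "WWZYXYWYZWZXZXYYYZYXXWZYXWZYZXWY"  # block trace of Figure 3.5(a)


def block_graph(bt):
    """Edge lengths w[{bi, bj}] counted from the compressed block trace."""
    cbt = [block for block, _ in groupby(bt)]
    # Consecutive entries of a compressed trace are always distinct blocks.
    return Counter(frozenset(pair) for pair in zip(cbt, cbt[1:]))


def type_s_length(graph, placement):
    """Total length of the edges whose endpoints lie in different sets."""
    total = 0
    for edge, length in graph.items():
        u, v = tuple(edge)
        if placement[u] != placement[v]:
            total += length
    return total


def best_placement(graph, blocks, num_sets):
    """Exhaustive search over all num_sets ** len(blocks) assignments."""
    best, best_value = None, -1
    for sets in product(range(num_sets), repeat=len(blocks)):
        placement = dict(zip(blocks, sets))
        value = type_s_length(graph, placement)
        if value > best_value:
            best, best_value = placement, value
    return best, best_value


if __name__ == "__main__":
    graph = block_graph(BT)
    figure_placement = {"W": 0, "X": 0, "Y": 1, "Z": 1}
    print(type_s_length(graph, figure_placement))
    print(best_placement(graph, "WXYZ", 2))
```

For this trace, the placement of Figure 3.5(b) yields a Type-S length of 17, while placing W with Y and X with Z yields 20, the maximum over all 2-set placements of these four blocks. The number of candidate placements grows as K to the power of the number of blocks, so this enumeration is practical only for very small examples.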
Proposition 2. The placement problem for the direct mapped cache is equivalent to the MAX k-CUT problem.
Papadimitriou and Yannakakis [34] show that an unweighted version of this problem is MAX-SNP-complete. Kann et al. [43] show that the MAX k-CUT problem, defined in Section 2.2, and its dual, the MIN k-PARTITION problem, are NP-hard. That means the placement problem is hard and cannot be solved exactly in polynomial time unless P = NP.
Some related studies treat the placement problem as a k-coloring problem (for example, Hashemi, Kaeli, and Calder [57] and Kalamatianos and Kaeli [58]). The coloring approach assigns different colors to two consecutively executed objects; in other words, the two objects are distributed to different cache sets. However, the k-coloring problem does not deal with edge lengths, so it can ignore the weighted affinity information between two objects. In contrast, modeling the placement problem as MAX k-CUT emphasizes the influence of temporal relationships. This is the difference between our placement approach and the others’.