
CHAPTER 3 PROBLEM MODELING

3.1 Object Access Trace

This section states the packing and placement problem formally. Consider a set of objects O = {o₁, o₂, o₃, ...}. These elements are the relocatable units to be placed in memory. Since the problem presumes that object sizes are irregular and not necessarily identical, the function size(o_i) denotes the size of a given object o_i. Likewise, the function addr(o_i) denotes the beginning address of o_i in memory.

The problem assumes that one of the three cache organizations sits between the processor (the program) and the main memory. A direct mapped or set associative cache is assumed to have K sets, and a cache block is M bytes in size. Because the cache exchanges raw data with the main memory in units of cache blocks, the main memory space is segmented into memory blocks. A memory block is also M bytes, identical to the size of a cache block, so that it fits into a cache block. The collection of memory blocks is defined as the set B = {b₀, b₁, b₂, ...}. From the program's point of view, it can access (load/store) arbitrary objects in the main memory. The bottom layer carries out the actual data accesses. When the program accesses object o_i, the cache system loads the memory block containing o_i from the main memory into a cache block. The index j of the loaded memory block b_j is given by (3.1).





j = ⌊addr(o_i) / M⌋ (3.1)

After that, the program accesses the object in the cache block. Since a direct mapped cache divides the memory space into K sets, the block b_j is located in set B_k, where k is calculated by (3.2).

k ≡ j (mod K) (3.2)
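Equations (3.1) and (3.2) can be sketched in a few lines of Python. The function names and the values M = 32, K = 4, and the address 200 are illustrative assumptions, not parameters taken from this study.

```python
def block_index(addr: int, M: int) -> int:
    """Equation (3.1): index j of the memory block b_j that contains byte address addr."""
    return addr // M


def set_index(j: int, K: int) -> int:
    """Equation (3.2): index k of the cache set B_k that block b_j maps to."""
    return j % K


M, K = 32, 4   # assumed block size in bytes and number of cache sets
addr = 200     # assumed beginning address of some object o_i

j = block_index(addr, M)
k = set_index(j, K)
print(j, k)  # 6 2
```

Integer division discards the offset of the address within its block, which is why every address inside the same M-byte block yields the same j.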

As the program continually accesses objects in the main memory, these activities can be recorded as a trace of the accessed objects, called the object access trace (OT). It lists the accessed objects in temporal order. Figure 3.1 illustrates how the object access trace is converted. It contains three traces. The first, the object access trace (OT), is composed of letters that denote objects. It is converted to an address trace (AT) by writing down the address of each object using the function addr(). Similarly, applying Equation (3.1) to each element of AT yields the block access trace (BT). In the figure, the horizontal line that divides each address number into two parts marks this conversion. The block access trace is the sequence of blocks swapped into the cache. A cache conflict miss arises when the accessed block does not match the block currently held in its cache set, and the system pays a penalty for loading the missing block into the cache.

Figure 3.1. The conversion of an object access trace to a block access trace. (Panel labels: PROGRAM, Address Trace (AT), Block Access Trace (BT).)
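The conversion chain OT → AT → BT can be illustrated with a minimal sketch. The six-object layout, the block size M = 16, and the sample trace below are hypothetical and are not the data of Figure 3.1.

```python
from typing import Dict, List


def to_address_trace(ot: List[str], addr: Dict[str, int]) -> List[int]:
    """Replace every object in the object access trace by its beginning address."""
    return [addr[o] for o in ot]


def to_block_trace(at: List[int], M: int) -> List[int]:
    """Apply Equation (3.1) to every address in the address trace."""
    return [a // M for a in at]


# Hypothetical layout: beginning address of each object.
addr = {"a": 0, "b": 8, "c": 16, "d": 24, "e": 40, "f": 52}
M = 16  # assumed block size in bytes

ot = list("abcadef")              # object access trace (OT)
at = to_address_trace(ot, addr)   # address trace (AT)
bt = to_block_trace(at, M)        # block access trace (BT)

print(at)  # [0, 8, 16, 0, 24, 40, 52]
print(bt)  # [0, 0, 1, 0, 1, 2, 3]
```

Objects a and b share block 0 in this layout, so the distinct objects at the start of OT collapse to the same block index in BT.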

Consider the object access trace shown in the first row of Figure 3.2(a). It is converted to a block access trace (BT) under the mapping shown in Figure 3.2(b), and the result is the second row of Figure 3.2(a). When the system is about to access b_j, it checks whether the cache block in set B_{j (mod K)} holds b_j.

Figure 3.2. (a) An example of object access trace, block access trace, and compressed block access trace in three rows. (b) A legal packing mapping that injects six objects to three memory blocks.

The goal of this problem is to find a layout scheme that assigns objects to the memory space. The layout scheme maps objects to blocks and, consequently, the object access trace to a block access trace. Once the new layout scheme is deployed, the resulting block access trace is expected to cause fewer cache misses on the K-set direct mapped cache.
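To make the objective concrete, the following sketch counts the misses that a block access trace causes on a K-set direct mapped cache. It is a simplified model under stated assumptions: the cache starts empty, so first-time (cold) loads are counted as misses too, and the two traces are hypothetical stand-ins for the same program before and after a re-layout.

```python
from typing import List, Optional


def count_misses(bt: List[int], K: int) -> int:
    """Count misses of block access trace bt on a K-set direct mapped cache."""
    cache: List[Optional[int]] = [None] * K  # one block slot per set, initially empty
    misses = 0
    for j in bt:
        k = j % K              # Equation (3.2): the set that block b_j maps to
        if cache[k] != j:      # the set holds a different block (or nothing)
            misses += 1
            cache[k] = j       # load b_j, evicting the previous occupant
    return misses


K = 2
bt_old = [0, 2, 0, 2, 1]  # blocks 0 and 2 both map to set 0 and evict each other
bt_new = [0, 1, 0, 1, 2]  # hypothetical trace after a different layout

print(count_misses(bt_old, K))  # 5
print(count_misses(bt_new, K))  # 3
```

A layout scheme is preferable when its block access trace yields the smaller count for the same object access trace.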

The problem has two preconditions. First, an object must be no larger than a memory block, i.e., ∀i, size(o_i) ≤ M. Consequently, a memory block can hold several objects. Assigning an address to an object is equivalent to determining both the memory block and the cache set to which the object belongs. Moreover, as the cache block gets larger (M increases), the horizontal line in Figure 3.1 moves progressively to the left, with the side effect that more objects fall into the same memory block. In other words, this problem considers “packing” objects into memory blocks and “placing” objects into cache sets simultaneously. This is the major difference between our study and related research that deals with the placement problem alone.

The second precondition disallows placing any object across memory blocks. Since an object is assumed to fit within a memory block, the entire object is restricted to lie within one memory block rather than straddling two. This condition prevents extra cache loads. Such a presumption is reasonable: real compilers, for instance, have a code/data alignment optimization pass [96]. The pass aligns instruction blocks or data items, prevents them from lying across a cache block boundary, and reduces extra fetches (as also suggested by Intel [11]).
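Both preconditions can be checked mechanically for a candidate layout. The sketch below is illustrative only: the function name, the sample layouts, and M = 16 are assumptions, and the size bound is taken as size(o_i) ≤ M, which is itself an assumption about the exact inequality. Overlap between objects is not checked here.

```python
from typing import Dict


def is_legal_layout(addr: Dict[str, int], size: Dict[str, int], M: int) -> bool:
    """Check the two preconditions for every object in a layout."""
    for obj, start in addr.items():
        s = size[obj]
        if s > M:
            return False  # precondition 1: the object must fit in one block
        first_block = start // M
        last_block = (start + s - 1) // M  # block of the object's last byte
        if first_block != last_block:
            return False  # precondition 2: the object must not cross a block boundary
    return True


M = 16
size = {"a": 10, "b": 6, "c": 16}

good = {"a": 0, "b": 10, "c": 16}  # b occupies bytes 10..15, inside block 0
bad = {"a": 0, "b": 12, "c": 16}   # b occupies bytes 12..17, crossing into block 1

print(is_legal_layout(good, size, M))  # True
print(is_legal_layout(bad, size, M))   # False
```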

The proposed approach uses information from the object access trace to construct the layout scheme by the packing and placement technique. The object access trace can be obtained by capturing the activities of executing benchmarks or real programs. Our study distinguishes several scopes for measuring the trace information, which differ in the connectivity of objects in the trace. Distinguishing these scopes is important because it affects the choice of methods for the packing and placement problem. The scopes are listed as follows.

• Degree-1 trace information

This counts the number of occurrences of each object in the entire object access trace, which reveals the popularity of objects. It is called “Degree-1” because the measuring scope is limited to one object, regardless of the objects that come before and after it in temporal order. For example, the profile information used in Path Flow Analyzer for PA-RISC (mentioned in [52]) and the work of Steinke et al. [98] and of Raman and August [1] fall into this category.

• Degree-2 trace information

Degree-2 trace information captures the pair-wise relation between two objects in the trace. In other words, it counts the occurrences of object pairs in the access trace. The symbol w_i,j denotes the number of occurrences of the segment ⟨o_i, o_j⟩ in the object access trace. The relation is undirected, so ⟨o_i, o_j⟩ is equivalent to ⟨o_j, o_i⟩. For example, consider the object access trace shown in the first row of Figure 3.2(a). Its Degree-2 trace information is expressed as the adjacency matrix in Figure 3.3(a).

Degree-2 trace information is used in several related studies, such as [54][57][58]. There are variations that incorporate different metrics to express the affinity between two objects, such as that of Gloy et al. [2].

• Degree-k trace information (k > 2)

Extending the idea of Degree-2 trace information, Degree-k trace information relates an object to the (k-1)-th object after it. The entering and leaving of an object is not decided by the preceding object alone. More than one object together composes the complete cache activity history. For instance, the analysis technique shown in Section 3.4 uses both Degree-2 and Degree-3 trace information to reflect the relations of objects entering and leaving. Its importance is stressed by Petrank and Rawitz [68][69], who suggest that pair-wise information is insufficient for solving the placement problem perfectly. In fact, no prior research uses Degree-k information to resolve placement problems, because manipulating such deep levels of affinity is difficult. One obvious issue is that k is a free choice. In our research, it serves as an auxiliary analysis tool. Because it cannot form the graph model, it is not used for solving the problem itself.
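The Degree-1 and Degree-2 scopes listed above can be measured with simple counters. This is a minimal sketch on a hypothetical six-access trace. Storing each undirected pair under a sorted tuple key is an implementation choice made here, not something prescribed by the text.

```python
from collections import Counter
from typing import List, Tuple


def degree1(ot: List[str]) -> Counter:
    """Degree-1: number of occurrences of each object in the trace."""
    return Counter(ot)


def degree2(ot: List[str]) -> Counter:
    """Degree-2: occurrences w_i,j of each undirected pair of adjacent objects."""
    w: Counter = Counter()
    for x, y in zip(ot, ot[1:]):             # every pair of consecutive accesses
        key: Tuple[str, str] = tuple(sorted((x, y)))  # (o_i, o_j) equals (o_j, o_i)
        w[key] += 1
    return w


ot = list("abcabd")  # hypothetical object access trace

print(degree1(ot))  # a and b occur twice, c and d once
print(degree2(ot))  # the pair (a, b) occurs twice, the other pairs once
```

In this trace the Degree-2 counts sum to 5, one per pair of consecutive accesses.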

Degree-2 trace information is especially useful because it can be transformed into a graph. An object access graph OG = (V, E) is constructed by the following rules:

(i) The vertex set V is equivalent to the object set O, that is, ∀i, v_i = o_i. The value s_i = […]

[…] e_i,j can be added to the graph OG to connect vertices o_i and o_j. The value w_i,j is given as the length of the edge e_i,j. Figure 3.3(b) is the object access graph of the sample trace listed above. The edges are labeled with the Degree-2 trace information.

Figure 3.3. (a) The adjacency matrix. (b) The object access graph. (c) Grouping the original object access graph into partitions.

The sum of the edge lengths of OG = (V, E) equals the length of the object access trace, which is also the length of the block access trace.

This is no coincidence, because summing up all w_i,j is the same as counting the occurrences of all segments in the trace. The object access graph is useful for handling the packing and placement problem in the following discussions.
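The relation between the edge lengths and the trace can be checked on a small example. The sketch below builds an object access graph from a hypothetical trace and sums its edge lengths. It rests on two assumptions: the length of a trace is read as its number of consecutive-pair segments (one less than the number of accesses), and consecutive accesses to the same object are skipped because they would form a self-loop.

```python
from typing import Dict, List, Tuple


def build_object_access_graph(ot: List[str]) -> Tuple[List[str], Dict[Tuple[str, str], int]]:
    """Return (V, E), where E maps each undirected edge e_i,j to its length w_i,j."""
    V = sorted(set(ot))                      # one vertex per distinct object
    E: Dict[Tuple[str, str], int] = {}
    for x, y in zip(ot, ot[1:]):             # every segment of the trace
        if x == y:
            continue                         # assumption: no self-loops in OG
        e = (x, y) if x < y else (y, x)      # undirected edge, canonical order
        E[e] = E.get(e, 0) + 1
    return V, E


ot = list("abcabdca")  # hypothetical trace with 8 accesses, hence 7 segments
V, E = build_object_access_graph(ot)
total = sum(E.values())

print(V)      # ['a', 'b', 'c', 'd']
print(total)  # 7
```

Because this trace has no two identical consecutive accesses, the total equals the number of segments exactly.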
