
Backgrounds – Cache Prefetch Methods

Chapter 2 Related Works and Backgrounds

2.2 Backgrounds – Cache Prefetch Methods

Hardware cache prefetching predicts future memory access patterns based on current or past access patterns, and attempts to move data likely to be accessed in the near future closer to the processor.

Hardware prefetchers range from very simple next-line prefetchers to more sophisticated stride or even repeated-pattern based predictors. The more advanced methods use tables to record history information about data accesses. This section presents several common prefetch methods. In Chapter 4, we pair both a simple sequential prefetcher and a modified correlation prefetcher with our cache miss type classifier.

2.2.1 Sequential prefetching

The simplest prefetch method is sequential prefetching, also called next-line prefetching, which fetches the cache lines that immediately follow the current cache line. Early sequential methods always prefetch after each cache miss, while more recent sequential methods do not issue prefetches until a sequential access pattern is detected. Once a sequential prefetch is issued and turns out to be correct, the prefetch degree is increased until prefetching can completely hide the latency of a miss to main memory. The prefetch degree is the maximum number of cache lines prefetched in response to a single prefetch request. Longer memory latencies require a higher degree so that prefetched data returns in time to avoid a cache miss.
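The behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the mechanism of any particular processor: the class name `SequentialPrefetcher`, the 64-byte line size, the doubling policy and the `max_degree` cap are all assumptions made for the example, and "the prefetch turned out to be correct" is approximated by "the next access continued the sequential pattern".

```python
class SequentialPrefetcher:
    """Toy next-line prefetcher with an adaptive degree (hypothetical model)."""

    def __init__(self, line_size=64, max_degree=8):
        self.line_size = line_size    # bytes per cache line (assumed)
        self.max_degree = max_degree  # upper bound on the degree (assumed)
        self.degree = 1               # lines prefetched per request
        self.last_line = None         # line number of the previous access

    def on_access(self, addr):
        """Return the list of addresses to prefetch for this access."""
        line = addr // self.line_size
        sequential = self.last_line is not None and line == self.last_line + 1
        self.last_line = line
        if not sequential:
            # No sequential pattern detected: reset and issue nothing.
            self.degree = 1
            return []
        # Prefetch the next `degree` lines after the current one.
        prefetches = [(line + i) * self.line_size
                      for i in range(1, self.degree + 1)]
        # Assumed policy: double the degree while the pattern continues.
        self.degree = min(self.degree * 2, self.max_degree)
        return prefetches


p = SequentialPrefetcher()
print(p.on_access(0))     # first access, no pattern yet
print(p.on_access(64))    # pattern detected, degree 1
print(p.on_access(128))   # pattern continues, degree 2
```

A real design would pick the degree from the measured memory latency rather than doubling blindly; the doubling here only illustrates that the degree grows while prefetches keep being useful.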

2.2.2 Table-Based Prefetching

There are two main kinds of table-based prefetching: Stride Prefetching and Correlation Prefetching. Stride Prefetching uses a table (Figure 2-2) to store stride-related local history information. The program counter (PC) of a load instruction indexes the table. Each table entry holds the load’s most recent stride (the difference between the load’s two most recent addresses), its last address (to allow computation of the next local stride), and state information describing the stability of the load’s recent stride behavior.

When a prefetch is triggered, addresses a+s, a+2s, …, a+ds are prefetched, where a is the load’s current target address, s is the detected stride, and d is the prefetch degree, an implementation-dependent look-ahead distance. More aggressive prefetch implementations use a higher value for d. Originally, Stride Prefetching used a look-ahead PC (LA-PC) to prefetch ahead.
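A minimal Python sketch of a PC-indexed stride table follows. It is a simplification under stated assumptions: the table is an unbounded dictionary rather than a fixed-size hardware table, the "state information" is reduced to a single `steady` flag that is set when the same non-zero stride is seen twice in a row, and the names `StridePrefetcher` and `on_load` are hypothetical.

```python
class StridePrefetcher:
    """Toy PC-indexed stride prefetcher (simplified, hypothetical model)."""

    def __init__(self, degree=2):
        self.degree = degree  # d: number of strides to look ahead
        self.table = {}       # PC -> {"last_addr", "stride", "steady"}

    def on_load(self, pc, addr):
        """Update the entry for `pc` and return addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this load is seen: just record its address.
            self.table[pc] = {"last_addr": addr, "stride": 0, "steady": False}
            return []
        stride = addr - entry["last_addr"]
        # Assumed stability rule: same non-zero stride twice in a row.
        entry["steady"] = stride == entry["stride"] and stride != 0
        entry["stride"] = stride
        entry["last_addr"] = addr
        if not entry["steady"]:
            return []
        # Prefetch a+s, a+2s, ..., a+ds.
        return [addr + k * stride for k in range(1, self.degree + 1)]


sp = StridePrefetcher(degree=3)
for a in (100, 108, 116):
    print(sp.on_load(0x400, a))
```

Because the table is indexed by the load's PC, interleaved loads with different strides do not disturb each other's entries, which is the "local history" property described above.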

Figure 2-2: The structure of the Stride Prefetching table

Markov Prefetching is an example of a correlation prefetching method. Correlation prefetching uses a history table to record consecutive address pairs. When a cache miss occurs, the miss address indexes the correlation table (Figure 2-3). Each entry in the Markov correlation table holds a list of addresses that have immediately followed the current miss address in the past. When a table entry is accessed, the members of its address list are prefetched, with the most recent miss address first. The left side of Figure 2-3 illustrates the state of the correlation table after processing the miss address stream shown at the top of the figure. Markov prefetching models the miss address stream as a Markov graph in which each node is an address and the arcs between nodes are labeled with the probabilities that the arc’s source node address will be immediately followed by the target node address. Each entry in the correlation table represents a node in an associated Markov graph, and its list of memory addresses represents arcs with the highest probabilities. Hence, the table maintains only a very crude approximation to the actual Markov probabilities. The right side of Figure 2-3 is the Markov transition graph that corresponds to the example miss address stream.
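The correlation table can be sketched as follows. This is a rough model, not the exact hardware organization: the table is an unbounded dictionary, each entry keeps at most `width` successor addresses ordered most recent first, and the class name, the `width` parameter and the miss stream used in the example are hypothetical (the stream is not the one shown in Figure 2-3).

```python
class MarkovPrefetcher:
    """Toy Markov (correlation) prefetcher (hypothetical model)."""

    def __init__(self, width=2):
        self.width = width     # successors kept per entry (assumed)
        self.table = {}        # miss address -> successors, most recent first
        self.prev_miss = None  # previous miss address

    def on_miss(self, addr):
        """Record the pair (prev_miss, addr); return addresses to prefetch."""
        if self.prev_miss is not None:
            successors = self.table.setdefault(self.prev_miss, [])
            if addr in successors:
                successors.remove(addr)
            successors.insert(0, addr)   # most recent successor goes first
            del successors[self.width:]  # keep only `width` successors
        self.prev_miss = addr
        # Prefetch everything that followed `addr` in the past.
        return list(self.table.get(addr, []))


m = MarkovPrefetcher(width=2)
for miss in "ABCACA":
    print(miss, m.on_miss(miss))
```

Keeping only the most recent successors, rather than counting how often each one occurred, is what makes the table a crude approximation of the transition probabilities in the Markov graph.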

2.2.3 Cache Prefetching using a Global History Buffer

In general, prefetch tables store prefetch history inefficiently. First, table data can become stale, and consequently reduce prefetch accuracy (the percent of prefetches that are actually used by the program before being evicted). Second, tables suffer from conflicts that occur when multiple access keys map to the same table entry. The most common solution for reducing table conflicts is to increase the number of table entries. However, this approach increases the table’s memory requirements, and increases the percentage of stale data held in the table. Third, tables have a fixed (and usually a small) amount of history per entry. Adding more prefetch history per entry creates new opportunities for effective prefetching, but the additional prefetch history also increases the table’s memory requirements and its percentage of stale data, which together can negate the advantages.

Figure 2-3: Markov Prefetching

To make prefetchers more efficient, the Global History Buffer approach uses an alternative prefetching structure that decouples table key matching from the storage of prefetch-related history information.

The overall prefetching structure has two levels (Figure 2-4).

• The Index Table (IT) is accessed with a key, as in conventional prefetch tables. The key may be a load instruction’s PC, a cache miss address, or some combination. The entries in the Index Table contain pointers into the Global History Buffer.

• The Global History Buffer (GHB) is an n-entry FIFO table (implemented as a circular buffer) that holds the n most recent L2 miss addresses. Each GHB entry stores a global miss address and a link pointer. The link pointers are used to chain the GHB entries into address lists. Each address list is the time-ordered sequence of addresses that have the same Index Table key.
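The two-level structure can be illustrated with a small Python model. It is a sketch under assumptions, not the hardware design: the Index Table is an unbounded dictionary, and pointers are modeled as monotonically increasing sequence numbers so that a pointer to an overwritten entry can be detected with a simple comparison (real hardware would use fixed-width pointers into the circular buffer). All class and method names are hypothetical.

```python
class GlobalHistoryBuffer:
    """Toy Index Table + GHB model (hypothetical, software-only sketch)."""

    def __init__(self, n=8):
        self.n = n
        self.addrs = [None] * n  # miss address held in each GHB slot
        self.links = [None] * n  # link pointer: previous entry, same key
        self.index_table = {}    # key -> pointer to most recent entry
        self.next_seq = 0        # sequence number of the next insertion

    def _live(self, seq):
        # A pointer is valid only if its entry has not been overwritten.
        return seq is not None and seq >= self.next_seq - self.n

    def insert(self, key, addr):
        """Push a miss address at the head of the FIFO and link it."""
        seq = self.next_seq
        slot = seq % self.n                        # circular buffer slot
        self.addrs[slot] = addr
        self.links[slot] = self.index_table.get(key)
        self.index_table[key] = seq
        self.next_seq += 1

    def history(self, key):
        """Walk the link pointers: addresses for `key`, most recent first."""
        out = []
        seq = self.index_table.get(key)
        while self._live(seq):
            slot = seq % self.n
            out.append(self.addrs[slot])
            seq = self.links[slot]
        return out


ghb = GlobalHistoryBuffer(n=4)
for key, addr in [("pcA", 100), ("pcB", 500), ("pcA", 108), ("pcA", 116)]:
    ghb.insert(key, addr)
print(ghb.history("pcA"))
```

Because the buffer is a FIFO, the oldest entries are overwritten first, so stale history ages out on its own; a chain simply ends when it reaches an overwritten entry.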

Depending on the key that is used for indexing the Index Table, any of a number of history-based prefetch methods can be implemented. In the following subsections we illustrate how the GHB can be used to implement correlation and stride prefetching. In addition, we illustrate more general forms of each (a total of eight prefetching methods).
