Chapter 2 Backgrounds and Related Works
2.2 Related Works
Many methods have been proposed to reduce BTB power. Because the BTB is both frequently accessed and large, it offers many opportunities for power saving, including:
1. BTB power management
2. Reduction of BTB access count
3. Reduction of BTB size
In this section, we focus on previous work on BTB size reduction. Two related works are introduced:
1. Partial Resolution in BTB, IEEE Transactions on Computers, 1997. [5]
2. Cost-Efficient BTB, Euro-Par Conference on Parallel Processing, 2000. [6]
Both works focus on reducing the width of each BTB entry, whereas, as we will see later in this thesis, our method reduces the number of BTB entries.
2.2.1 Related Work 1: Partial Resolution of BTB
The first work proposes a technique for reducing the tag width of the BTB. With this technique, named Partial Resolution, the tag field of each entry in a direct-mapped BTB can be shortened to 3 to 8 bits.
Tag comparison has long been a time- and power-consuming step in accessing cache-like structures such as the BTB. For a hit, the comparator must find every bit of the two tags equal, whereas a single mismatched bit is enough to declare a miss. This asymmetry creates an opportunity to shorten the tag field: a shortened tag can still detect an absence from the BTB unambiguously, although it may falsely indicate a presence. Figure 2-3 shows the basic idea of Partial Resolution.
Figure 2-3: (a) Conventional BTB (i.e., full tag) and (b) Partial Resolution in BTB.
As Figure 2-3(b) shows, a segment of the least significant tag bits is used as a shortened tag. A partial tag can therefore produce false hits, in which a non-branch instruction is mistaken for a branch that is stored in the BTB. The non-branch instruction then proceeds with the target address supplied by the BTB, and a pipeline flush is needed later to recover from this misprediction.
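The following minimal Python sketch illustrates both properties of a partial tag: a partial-tag mismatch is always a true miss, while a partial-tag match may be a false hit. It assumes hypothetical parameters that are not taken from the paper (a 64-entry direct-mapped BTB, 4-byte aligned instructions, and a 4-bit partial tag formed from the low-order tag bits); the class name PartialTagBTB and its methods are illustrative only.

```python
# Hypothetical parameters for illustration; not taken from the paper.
INDEX_BITS = 6        # 64-entry direct-mapped BTB
ALIGN_BITS = 2        # assume 4-byte aligned instructions
PARTIAL_TAG_BITS = 4  # keep only the 4 least significant tag bits


class PartialTagBTB:
    def __init__(self, tag_bits: int = PARTIAL_TAG_BITS):
        self.tag_bits = tag_bits
        # Each entry holds (partial_tag, target) or None when empty.
        self.entries = [None] * (1 << INDEX_BITS)

    def _split(self, pc: int):
        index = (pc >> ALIGN_BITS) & ((1 << INDEX_BITS) - 1)
        full_tag = pc >> (ALIGN_BITS + INDEX_BITS)
        partial = full_tag & ((1 << self.tag_bits) - 1)  # low-order tag bits only
        return index, partial

    def update(self, pc: int, target: int) -> None:
        index, partial = self._split(pc)
        self.entries[index] = (partial, target)

    def lookup(self, pc: int):
        index, partial = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == partial:
            return entry[1]  # predicted target; may be a false hit
        return None          # a partial-tag mismatch is always a true miss


btb = PartialTagBTB()
btb.update(0x1000, 0x3000)      # a branch at 0x1000 jumping to 0x3000
hit = btb.lookup(0x1000)        # genuine hit
false_hit = btb.lookup(0x2000)  # same index, same low 4 tag bits
miss = btb.lookup(0x1100)       # same index, different low tag bits

full = PartialTagBTB(tag_bits=24)  # 24 bits = full tag for 32-bit addresses here
full.update(0x1000, 0x3000)
```

With these parameters, the instruction at 0x2000 maps to the same entry as the branch at 0x1000 and shares its low 4 tag bits, so it receives the stored target although it was never inserted. Widening the stored tag to the full 24 bits removes that false hit.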
Note that this kind of misprediction cannot occur in a conventional BTB design; it only disrupts the pipeline flow and may even harm the accuracy of the direction predictor. According to the experimental results presented in the paper, only 3 to 8 tag bits are required to provide 99% of the accuracy that a full tag delivers in a direct-mapped BTB.
However, a BTB with higher associativity needs a longer tag field to maintain prediction accuracy.
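A rough estimate suggests why associativity matters. Assume, purely for illustration, that the stored t-bit partial tags behave like independent, uniformly distributed values. A lookup for an address that is not in the BTB is compared against the k valid entries of its set, so the probability of at least one false hit is approximately:

```latex
P_{\text{false}} = 1 - \left(1 - 2^{-t}\right)^{k} \approx k \cdot 2^{-t}
```

For t = 4, this gives 1/16 = 6.25% in a direct-mapped BTB (k = 1) but about 22.8% in a 4-way set (k = 4). Under this simplified model, keeping the false-hit rate constant requires roughly log2(k) additional tag bits: with t = 6 and k = 4 the estimate falls back to about 6.1%. These figures come from the model only, not from the measurements reported in the paper.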
2.2.2 Related Work 2: Cost-Efficient BTB
The second work presents a technique for shortening the target address field of the BTB. By storing only the low-order segment in which the branch target address differs from the branch address, the target address field can be reduced to its essential bits.
Target address field shortening exploits branch locality: a branch tends not to jump far from itself. Therefore, when the address of a branch instruction is compared with the address of its target, usually only a segment of low-order bits differs. Storing only this difference segment reduces the width of the target address field. When the branch is encountered again later, the correct target address is obtained by concatenating the high-order bits of the PC with the difference segment stored in the BTB. The difference segment is found by XORing the branch address and its target bit by bit and then locating the leading 1; the number of bits from the leading 1 down to the least significant bit, inclusive, gives the difference segment length. Because this length can vary greatly from branch to branch, a mechanism is needed to handle the variation while maintaining correctness and accuracy. The paper proposes two such mechanisms: the Paired-Entry BTB and the Variable Entry Size BTB.
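The XOR-and-leading-1 procedure and the concatenation step can be sketched in Python as follows. The function names and the fixed field width passed as field_bits are hypothetical; the sketch assumes only non-negative integer addresses.

```python
def diff_segment_length(branch: int, target: int) -> int:
    """Length of the low-order segment in which branch and target differ.

    XOR leaves a 1 in every differing bit; bit_length() returns the position
    of the leading 1 plus one, i.e. the bits from the leading 1 down to the
    least significant bit, inclusive.
    """
    return (branch ^ target).bit_length()


def compress_target(branch: int, target: int, field_bits: int):
    """Return the low-order target bits to store, or None if they do not fit."""
    if diff_segment_length(branch, target) > field_bits:
        return None  # the target field is too narrow for this branch
    return target & ((1 << field_bits) - 1)


def reconstruct_target(pc: int, stored: int, field_bits: int) -> int:
    """Concatenate the high-order bits of the PC with the stored segment."""
    return (pc & ~((1 << field_bits) - 1)) | stored


branch, target = 0x401A30, 0x401B08
seg = diff_segment_length(branch, target)       # XOR = 0x138, so 9 bits differ
stored = compress_target(branch, target, 12)    # keep the low 12 bits
rebuilt = reconstruct_target(branch, stored, 12)
too_narrow = compress_target(branch, target, 8)  # an 8-bit field cannot hold it
```

Storing field_bits bits (rather than exactly seg bits) is safe because every bit above the difference segment is identical in the branch and its target, so the PC supplies those bits correctly.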
Paired-Entry BTB
The Paired-Entry BTB makes every entry shorter than in a conventional BTB; for a long difference segment, two adjacent short entries are paired to form one longer entry. An extra control bit field attached to each BTB set indicates how the entries are used: as independent entries or as one long paired entry. Note that in paired mode, the tag field of one of the entries becomes part of the target address field. Physically, this requires a specially designed BTB in which the tag field can be configured to function as a data field. Figure 2-4 shows how the Paired-Entry BTB works.
Figure 2-4: Conventional BTB vs. Paired-Entry BTB.
As the figure shows, the target address field in the Paired-Entry BTB is noticeably shorter, and a control bit field is attached to the BTB. A 1 indicates that two entries are paired, so the tag field that sits between the two entries' target fields is reused as part of the new data field to hold the long address. A 0 indicates that the two entries function independently.
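A minimal Python sketch of one Paired-Entry BTB set follows, modelling a single set of two entries plus the pairing control bit. The field widths (8-bit tags, 8-bit short target fields, hence 2 × 8 + 8 = 24 target bits when paired) and the names PairedSet, Entry and slot are hypothetical assumptions for illustration; the caller-chosen slot merely stands in for a real replacement policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical field widths chosen for illustration (not from the paper).
TAG_BITS = 8
SHORT_BITS = 8                         # target field of one short entry
LONG_BITS = 2 * SHORT_BITS + TAG_BITS  # paired: both target fields + borrowed tag field


def low_bits(value: int, n: int) -> int:
    return value & ((1 << n) - 1)


@dataclass
class Entry:
    valid: bool = False
    tag: int = 0
    target: int = 0  # low-order target bits (covers the difference segment)


@dataclass
class PairedSet:
    """One BTB set: two short entries plus the pairing control bit."""
    paired: bool = False  # control bit: True (1) = entries form one long entry
    entries: List[Entry] = field(default_factory=lambda: [Entry(), Entry()])

    def insert(self, tag: int, branch: int, target: int, slot: int = 0) -> bool:
        seg_len = (branch ^ target).bit_length()  # difference segment length
        if seg_len <= SHORT_BITS:
            if self.paired:  # a short branch breaks the pair, freeing both halves
                self.paired = False
                self.entries = [Entry(), Entry()]
            # Hypothetical: the caller's slot stands in for a replacement policy.
            self.entries[slot] = Entry(True, tag, low_bits(target, SHORT_BITS))
            return True
        if seg_len <= LONG_BITS:
            # Pairing evicts whatever short entries occupied this set.
            self.paired = True
            low = low_bits(target, LONG_BITS)
            self.entries[0] = Entry(True, tag, low_bits(low, SHORT_BITS))
            # The second entry's tag field carries the middle target bits.
            self.entries[1] = Entry(True,
                                    low_bits(low >> SHORT_BITS, TAG_BITS),
                                    low >> (SHORT_BITS + TAG_BITS))
            return True
        return False  # too long even for a paired entry

    def lookup(self, tag: int, branch: int) -> Optional[int]:
        if self.paired:
            first, second = self.entries
            if first.valid and first.tag == tag:
                low = (first.target
                       | (second.tag << SHORT_BITS)
                       | (second.target << (SHORT_BITS + TAG_BITS)))
                return (branch & ~((1 << LONG_BITS) - 1)) | low
            return None
        for entry in self.entries:
            if entry.valid and entry.tag == tag:
                return (branch & ~((1 << SHORT_BITS) - 1)) | entry.target
        return None


s = PairedSet()
s.insert(1, 0x400100, 0x400140)          # 7-bit segment: short entry 0
s.insert(2, 0x400200, 0x400210, slot=1)  # 5-bit segment: short entry 1
s.insert(3, 0x400300, 0x412345)          # 17-bit segment: pairs the set
```

After the third insert, the set is in paired mode and the two short branches are gone, which is the replacement complication in which one long branch evicts two short ones.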
Variable Entry Size BTB
The Variable Entry Size BTB takes a more straightforward approach to long difference segments. Entries are divided into groups, typically one group per way, and each group is given a different target field length. A branch then enters the BTB by registering for an entry in a group that matches the length of its difference segment. Figure 2-5 illustrates the idea.
Figure 2-5: Conventional BTB vs. Variable Entry Size BTB.
The Variable Entry Size BTB handles long difference segments by reserving a certain number of entries with a long target address field.
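A minimal Python sketch of one set of a Variable Entry Size BTB follows, assuming (hypothetically) a 4-way set whose ways have target fields of 8, 8, 16 and 30 bits. The class name VariableEntrySet and the placement policy (narrowest fitting empty way first, otherwise evict the narrowest fitting way) are illustrative choices, not the paper's.

```python
from typing import List, Optional, Tuple

# Hypothetical per-way target field widths in bits (not from the paper):
# two narrow ways, one medium way, one wide way reserved for long segments.
WAY_WIDTHS = (8, 8, 16, 30)


class VariableEntrySet:
    """One 4-way BTB set in which each way is a group with its own target width."""

    def __init__(self, widths=WAY_WIDTHS):
        self.widths = tuple(widths)
        # Each way holds (tag, low-order target bits) or None when empty.
        self.ways: List[Optional[Tuple[int, int]]] = [None] * len(self.widths)

    def insert(self, tag: int, branch: int, target: int) -> Optional[int]:
        """Place the branch in a way wide enough for its difference segment.

        Returns the chosen way, or None if no group can hold the segment.
        """
        seg_len = (branch ^ target).bit_length()
        # Narrowest fitting ways first, so wide ways stay free for long branches.
        fits = sorted((w, i) for i, w in enumerate(self.widths) if w >= seg_len)
        if not fits:
            return None
        empty = [i for _, i in fits if self.ways[i] is None]
        # Placeholder policy: an empty fitting way, else evict the narrowest one.
        way = empty[0] if empty else fits[0][1]
        self.ways[way] = (tag, target & ((1 << self.widths[way]) - 1))
        return way

    def lookup(self, tag: int, branch: int) -> Optional[int]:
        for way, slot in enumerate(self.ways):
            if slot is not None and slot[0] == tag:
                mask = (1 << self.widths[way]) - 1
                return (branch & ~mask) | slot[1]
        return None


s = VariableEntrySet()
w1 = s.insert(1, 0x400100, 0x400140)  # 7-bit segment: narrow way 0
w2 = s.insert(2, 0x400300, 0x412345)  # 17-bit segment: only the 30-bit way fits
w3 = s.insert(3, 0x400400, 0x480000)  # 20-bit segment: evicts branch 2
```

In this usage, the second long branch evicts the first even though a narrow way is still free, a small instance of the unbalanced utilization discussed for this design.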
The Paired-Entry BTB is more dynamically adjustable, since entries are paired only as needed.
However, using a tag field as a data field requires hardware support, which makes the BTB structure more complex to implement. The Variable Entry Size BTB is less flexible in dealing with long difference segments: the number of entries of each target address length must be chosen carefully to achieve optimal performance.
Utilization across groups may also become unbalanced, for example when branches with long difference segments use up all the reserved long entries. Both methods complicate the BTB replacement policy: in the Paired-Entry BTB, a branch with a long difference segment may evict two short entries, while in the Variable Entry Size BTB, each group with a different target address length may need to maintain its own replacement policy.