Chapter 2. Fundamental Background and Related Work
2.3 Related Work
In this section, we first describe how Minimizing Switching Activities Scheduling (MSAS) works and then introduce Horizontal and Vertical Scheduling. Finally, we introduce Power Reduction Rotation Scheduling (PRRS) and Switching-Activity Minimization Loop Scheduling (SAMLS).
2.3.1 Minimizing Switching Activities Scheduling[12]
The algorithm, Minimizing Switching Activities Scheduling (MSAS), was designed to solve a special case of instruction-level energy-minimization scheduling, i.e., the case when switching activities play the most important role in energy consumption.
When Pbase is very small compared with α (in equation 2 in section 2.2), the energy of a schedule depends mainly on switching activities. For example, when Pbase equals 0.1 and α equals 1, the schedule length must be reduced by 10 control steps to offset a single bit switch.
Thus, the MSAS algorithm was designed to minimize switching activities as much as possible.
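The trade-off in this example can be checked with a small sketch. It assumes that equation 2 has the linear form energy = Pbase * (schedule length) + α * (number of bit switches); this form is an assumption made for illustration, and the function name and the two schedules below are hypothetical.

```python
def schedule_energy(length, bit_switches, p_base=0.1, alpha=1.0):
    """Energy under an ASSUMED linear model:
    p_base per control step plus alpha per bit switch."""
    return p_base * length + alpha * bit_switches


# Two hypothetical schedules: the second is 10 control steps shorter
# but incurs one more bit switch.
longer_fewer_switches = schedule_energy(length=30, bit_switches=5)
shorter_more_switches = schedule_energy(length=20, bit_switches=6)

# Under the assumed model the two energies agree up to floating-point rounding.
print(longer_fewer_switches, shorter_more_switches)
```

Under this assumed model, saving 10 control steps is worth exactly as much as avoiding one bit switch, which is why switching activities dominate when Pbase is small relative to α.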
On the other hand, for the sake of performance, the algorithm also tries to minimize the schedule length. Hence, MSAS minimizes switching activities as its first priority while still taking schedule length into account. Whereas most previous work focuses on a single functional unit, the algorithm takes advantage of the multiple functional units available in VLIW architectures.
In this algorithm, the input is the DAG defined in the previous section, and the output is a schedule with minimized switching activities. Because of the dependencies in the DAG, a node can be scheduled only after all of its parent nodes have been scheduled. The scheduling problem with switching-activity minimization is to find, in every scheduling step, a matching between functional units and ready nodes that minimizes the total switching activities of that step. This is equivalent to the min-cost weighted bipartite matching problem. Thus, in the first scheduling step, MSAS creates a weighted bipartite graph GBM in which the vertices on one side are the functional units, the vertices on the other side are the nodes in the ready list, and the weight of the edge between a node and a functional unit is the switching activities incurred when the node is assigned to that functional unit. It then assigns nodes according to the min-cost maximum bipartite matching.
After assigning the nodes, it updates the nodes in the ready list according to the DAG and the machine codes of the functional units. Then it repeatedly creates the weighted bipartite graph and assigns the nodes until all the nodes are scheduled.
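The loop described above can be sketched as follows. This is a minimal illustration, not the implementation from [12]. It assumes that every node takes one control step, that switching activities are the Hamming distance between the machine code last issued on a functional unit and the machine code of the candidate node, and that every unit starts from an all-zero code. A brute-force search stands in for the Hungarian Method, so it is only suitable for tiny inputs. All function names and the example DAG are hypothetical.

```python
from itertools import combinations, permutations


def hamming(a, b):
    """Number of differing bits between two equal-length binary strings."""
    return sum(x != y for x, y in zip(a, b))


def min_cost_matching(units, ready, cost):
    """Brute-force min-cost maximum matching (stand-in for the Hungarian Method)."""
    k = min(len(units), len(ready))
    best_pairs, best_cost = [], float("inf")
    for chosen_units in combinations(units, k):
        for chosen_nodes in permutations(ready, k):
            pairs = list(zip(chosen_units, chosen_nodes))
            total = sum(cost(u, v) for u, v in pairs)
            if total < best_cost:
                best_pairs, best_cost = pairs, total
    return best_pairs


def msas(nodes, edges, code, num_units, width):
    """nodes: node names; edges: (parent, child) pairs; code: name -> bit string."""
    parents = {v: set() for v in nodes}
    for p, c in edges:
        parents[c].add(p)
    last_code = {u: "0" * width for u in range(num_units)}  # assumed initial state
    done, schedule, total = set(), [], 0
    while len(done) < len(nodes):
        # A node is ready once all of its parents were scheduled in earlier steps.
        ready = [v for v in nodes if v not in done and parents[v] <= done]
        cost = lambda u, v: hamming(last_code[u], code[v])
        step = min_cost_matching(list(range(num_units)), ready, cost)
        for u, v in step:
            total += cost(u, v)
            last_code[u] = code[v]
            done.add(v)
        schedule.append(dict(step))  # unit -> node for this control step
    return schedule, total


# Hypothetical DAG with edges A -> C and B -> D, and two functional units.
nodes = ["A", "B", "C", "D"]
edges = [("A", "C"), ("B", "D")]
code = {"A": "0011", "B": "1100", "C": "0111", "D": "1000"}
schedule, total = msas(nodes, edges, code, num_units=2, width=4)
print(schedule, total)
```

In this example the second step assigns C to the unit that executed A and D to the unit that executed B, because those pairings differ in only one bit each.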
The schedule created by MSAS can reduce total switching activities since it considers the min-cost maximum bipartite matching in each step. It is known that finding a min-cost maximum bipartite matching takes O(n^3) time with the Hungarian Method[20]. Let N be the number of functional units. In every scheduling step, MSAS needs at most O((N+|V|)^3) time to find the minimum-weight maximum bipartite matching with the Hungarian Method, and there are at most |V| scheduling steps. Thus, the complexity of MSAS is O(|V|*(N+|V|)^3). Finding the minimum-weight maximum bipartite matching in every step is therefore time-consuming, and the resulting schedule may still not achieve the minimal switching activities in some cases. We show such a case in our motivation (section 2.4).
2.3.2 Horizontal Scheduling and Vertical Scheduling[13]
High performance and low power are both important objectives of compiler optimization. Thus, in [13], the authors propose a two-phase instruction scheduling approach.
In the first phase, instructions are scheduled by list scheduling for performance. Then, in the second phase, horizontal and vertical scheduling methods are employed to re-arrange the code and reduce power without incurring a performance penalty.
We first introduce the horizontal scheduling algorithm, which re-schedules the sub-instructions of a long instruction to minimize the switching activities of the instruction bus. Suppose we have n VLIW instructions that have been scheduled by list scheduling. Horizontal scheduling changes neither the control step of each long instruction nor the set of sub-instructions it contains; it only re-arranges the positions of the sub-instructions within a long instruction to reduce the switching activities. To re-arrange the positions, it creates a weighted bipartite graph GBM between the long instruction that has just been re-scheduled and the long instruction currently being re-scheduled. In GBM, the vertices on the two sides are the sub-instructions of the two long instructions, and the weight of an edge is the switching activities between the two sub-instructions it connects. Like MSAS, horizontal scheduling finds the min-cost maximum bipartite matching and re-arranges the position of each sub-instruction according to it. For example, in figure 2.4, U1 to U4 are the sub-instructions in the last long instruction already scheduled, and L1 to L4 are the sub-instructions of the long instruction to be re-scheduled. The algorithm creates a weighted bipartite graph between them, finds the min-cost maximum matching, and then re-arranges the positions of L1 to L4 according to the matching.
This algorithm repeatedly creates the weighted bipartite graph and re-arranges the positions of the sub-instructions of each long instruction, from the first long instruction to the last, in the schedule created by list scheduling. After all the long instructions have been re-scheduled, we obtain a schedule with lower total switching activities.
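The re-arrangement pass can be illustrated with the following sketch. It is not the implementation from [13]. It assumes that all functional units are identical, that every long instruction has the same number of slots, that switching activities are the Hamming distance between the machine codes occupying the same slot in consecutive long instructions, and that the first long instruction keeps its original order. Exhaustive search over slot orders stands in for the min-cost bipartite matching, and all names and machine codes are hypothetical.

```python
from itertools import permutations


def hamming(a, b):
    """Number of differing bits between two equal-length binary strings."""
    return sum(x != y for x, y in zip(a, b))


def horizontal_schedule(long_instructions):
    """Reorder the slots inside each long instruction; control steps are unchanged."""
    result = [list(long_instructions[0])]  # assumption: first instruction kept as is
    for current in long_instructions[1:]:
        previous = result[-1]
        # Pick the slot order with the fewest bit flips against the previous instruction.
        best = min(
            permutations(current),
            key=lambda order: sum(hamming(p, c) for p, c in zip(previous, order)),
        )
        result.append(list(best))
    return result


def bus_switching(long_instructions):
    """Total bit flips between consecutive long instructions, slot by slot."""
    return sum(
        hamming(p, c)
        for prev, cur in zip(long_instructions, long_instructions[1:])
        for p, c in zip(prev, cur)
    )


# Three hypothetical long instructions with two sub-instructions each.
before = [["0000", "1111"], ["1110", "0001"], ["0011", "1100"]]
after = horizontal_schedule(before)
print(bus_switching(before), bus_switching(after))
```

In this example only the second long instruction is reordered, and each sub-instruction stays in its original control step.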
Next, we introduce vertical scheduling. Vertical scheduling is similar to horizontal scheduling, but it allows sub-instructions to move across long instructions.
Fig. 2.4 An example of bipartite matching for horizontal scheduling
That is, horizontal scheduling does not change the control step of any sub-instruction, whereas vertical scheduling may. It does so by using a window of size w to build the weighted bipartite graph between the sub-instructions in the last long instruction already scheduled and those sub-instructions in the next w long instructions that satisfy the data dependence constraints. Horizontal scheduling can thus be seen as the special case of vertical scheduling in which the window size w = 1. Like horizontal scheduling, vertical scheduling repeatedly finds the min-cost maximum bipartite matching and re-arranges the position of each sub-instruction according to it, until all sub-instructions are scheduled.
These two algorithms present the essential idea of the low-power optimization in [13]. The approach requires the functional units of the target VLIW architecture to be identical, because sub-instructions can be swapped without a performance penalty only between identical functional units. However, in most VLIW architecture designs the functional units are divided into several classes, and swapping can only be done between functional units of the same class. This is the main constraint of the method.
2.3.3 Power Reduction Rotation Scheduling[11]
The algorithm, Power Reduction Rotation Scheduling (PRRS), was designed to minimize both switching activities and schedule length for loop applications and is based on rotation scheduling[3].
Rotation scheduling, presented in [3], is a scheduling technique used to optimize a loop schedule under resource constraints. The main goal of rotation scheduling is to reduce the schedule length of a loop application. It iteratively transforms a schedule into a more compact one.
Retiming[21] can be used to break the intra-iteration dependences between instructions in a loop application, so that rotation can be applied to reduce the schedule length. In each rotation step, the nodes in the first row of the schedule are rotated down. By doing so, the nodes in the first row are re-scheduled to the earliest available locations. From the retiming point of view, each such node is retimed once by drawing one delay from each of its incoming edges and adding one delay to each of its outgoing edges in the DFG. The new location of the node in the schedule must also obey the precedence relations in the retimed graph.
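The retiming step for a single rotated node can be sketched as follows. The DFG is represented as a dictionary from edges to delay counts. This representation, the function name, and the three-node loop are assumptions made for illustration and are not taken from [3] or [21].

```python
def rotate_node(delays, node):
    """Retime `node` once: draw one delay from each incoming edge and
    add one delay to each outgoing edge.

    delays: dict mapping (src, dst) edges of the DFG to delay counts.
    Returns a new dict; the input is left unchanged.
    """
    incoming = [edge for edge in delays if edge[1] == node]
    if any(delays[edge] < 1 for edge in incoming):
        raise ValueError("every incoming edge of the node needs at least one delay")
    retimed = dict(delays)
    for src, dst in delays:
        if dst == node:
            retimed[(src, dst)] -= 1
        if src == node:
            retimed[(src, dst)] += 1
    return retimed


# Hypothetical loop A -> B -> C -> A with one delay on the edge C -> A.
dfg = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 1}
retimed = rotate_node(dfg, "A")
print(retimed)
```

After the rotation, the delay has moved from the edge entering A to the edge leaving A, so B no longer depends on A within the same iteration, and the total number of delays on the cycle is unchanged.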
Power Reduction Rotation Scheduling is built entirely on rotation scheduling. In addition, each node that is rotated must be scheduled at the location with minimum switching activities, which is how the algorithm reduces power consumption. The disadvantage of PRRS is that it was designed for loop applications and cannot be used for non-loop applications. It also needs an initial schedule as its input, and creating that initial schedule takes extra time.
2.3.4 Switching-Activity Minimization Loop Scheduling[19]
Switching-Activity Minimization Loop Scheduling (SAMLS) is an improvement on PRRS that was developed to reduce both the schedule length and the switching activities of a loop application.
SAMLS is based on rotation scheduling and bipartite matching. In its first phase, it performs the same steps as PRRS. The schedule created by the first phase is then the input to the second phase, which performs the same steps as horizontal scheduling.
According to the experimental results, SAMLS reduces switching activities slightly more than PRRS does, but it takes much more time than PRRS.
2.4 Motivation
From the related work, we can observe that reducing switching activities is an important factor in reducing total power consumption, especially when switching activities play the most important role in energy consumption. Hence, we focus our work on reducing switching activities as much as possible, even though this may take more control steps to complete the application.
In section 2.3.1 and section 2.3.2, we can see that both algorithms create a weighted bipartite graph, and both MSAS and horizontal scheduling then find the min-cost maximum bipartite matching. Both of them try to find the minimal switching activities in each allocation. However, minimizing each allocation separately may not yield the minimal total switching activities. For example, figure 2.5(a) is a simple example of a DAG, and figure 2.5(b) is the schedule of figure 2.5(a) produced by MSAS. Assuming that the binary strings in figure 2.5(c) are the machine codes of these instructions, the total switching activities is 9.
However, figure 2.5(d) and figure 2.5(e) show that another schedule incurs fewer switching activities. We can observe that instruction A and instruction B have the same