Chapter 2 Background
2.2 Dynamic Voltage and Frequency Scaling
In this section, we describe the background of DVFS. The term DVFS (dynamic voltage and frequency scaling) means scaling the supply voltage together with the clock frequency. According to [2] and equation (1) below, each clock frequency has a lowest voltage at which the processor runs stably. Supplying only this lowest sufficient voltage for the current frequency therefore saves power. The frequency can be estimated by the following relation,
$$ f \propto \frac{(V_g - V_t)^2}{V_{dd}} \qquad (1) $$
where f is the frequency of the processor, Vdd the supply voltage, Vt the threshold voltage, and Vg the voltage of the input gate. The dominant source of power dissipation in a digital CMOS circuit is the dynamic power dissipation [2],
$$ P \propto c \cdot V_{dd}^{2} \cdot f \qquad (2) $$
where c is the total load capacitance of all gates. From equation (2), we can conclude that power is linear in frequency and quadratic in voltage. Energy consumption, however, is E = Pt, where the execution time t is inversely proportional to frequency. The frequency terms cancel, so energy is quadratic in voltage. If we consider both performance and energy consumption, there is therefore a trade-off between execution time (1/frequency) and energy (voltage).
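The proportionalities above can be checked with a small calculation. The following is a minimal sketch, assuming a CPU-bound program with a fixed cycle count and ignoring static (leakage) power; the function names and the scaling factors are illustrative and do not come from [2].

```python
def relative_power(v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to the baseline, from P ∝ c · Vdd² · f."""
    return v_scale ** 2 * f_scale


def relative_time(f_scale: float) -> float:
    """Execution time relative to the baseline, assuming t ∝ 1/f."""
    return 1.0 / f_scale


def relative_energy(v_scale: float, f_scale: float) -> float:
    """Energy relative to the baseline, from E = P · t."""
    return relative_power(v_scale, f_scale) * relative_time(f_scale)


# Hypothetical operating point: half the voltage and half the frequency.
print(relative_power(0.5, 0.5))   # 0.125 (power drops to one eighth)
print(relative_time(0.5))         # 2.0   (the program takes twice as long)
print(relative_energy(0.5, 0.5))  # 0.25  (energy depends only on the voltage scale)
```

Under these assumptions the frequency cancels out of the energy, which is why a slower run saves energy only if the voltage is lowered with it.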
There are some clear examples in figure 2-1 [9].
Figure 2-1 : Example of Energy-Saving in DVFS
In Figure 2-1 (A), the program runs at the highest frequency with the highest voltage and thus has the highest energy consumption, even though the power supply is turned off after the program finishes.
Under the given time constraint of 25 seconds, the voltage schedules in (B) and (C) achieve lower energy consumption.
2.2.1 Generic Flow of DVFS
Figure 2-2 describes the generic flow of a DVFS system. While a program is running, the DVFS system triggers a profiler either periodically or at specified points in the code. Once the profiler has collected the needed information, the system checks the runtime condition of the program and decides whether to scale the frequency and voltage. The system then analyzes the collected information to produce a suitable frequency and voltage for the current status. Finally, these continuous values are mapped to the closest discrete hardware settings.
[Figure 2-2 flowchart. Program; Trigger: reach a specified point in the program, or periodically triggered; Profile: collect past information about the program, such as execution time, idle time, instruction counts, memory access counts, cache miss counts, power, energy, battery status; Decide: depends on a policy such as idle time > threshold, cache miss > threshold, battery capacitance < threshold; Predict: predict the future behavior of the program according to the profile, e.g., it has the same idle time in the future as in the past]
Figure 2-2 : Generic Working Flow of DVFS
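One pass through this flow can be sketched in a few lines. This is a minimal illustration, not a particular system's implementation: the operating-point table, the idle-time threshold, and the function names are all hypothetical, and the prediction step simply assumes that the next period has the same idle fraction as the last one.

```python
# Hypothetical discrete hardware settings: (frequency in MHz, voltage in V).
OPERATING_POINTS = [(200, 0.9), (400, 1.0), (600, 1.1), (800, 1.2)]


def closest_setting(target_mhz, points=OPERATING_POINTS):
    """Map a continuous target frequency to the closest discrete setting.

    Ties are broken in favor of the higher frequency.
    """
    return min(points, key=lambda p: (abs(p[0] - target_mhz), -p[0]))


def dvfs_step(current_mhz, busy_time, idle_time, idle_threshold=0.2):
    """Run one profile / decide / predict / map step.

    busy_time and idle_time are the profiled values of the last period.
    """
    total = busy_time + idle_time
    idle_fraction = idle_time / total
    # Decide: scale only when the policy condition (idle time > threshold) holds.
    if idle_fraction <= idle_threshold:
        return closest_setting(current_mhz)
    # Predict: assume the same idle fraction in the future, so the same work
    # would just fill the period at this lower (continuous) frequency.
    target_mhz = current_mhz * (busy_time / total)
    # Map: snap the continuous value to a discrete hardware setting.
    return closest_setting(target_mhz)


print(dvfs_step(800, busy_time=6.0, idle_time=4.0))  # (400, 1.0)
print(dvfs_step(800, busy_time=9.0, idle_time=1.0))  # (800, 1.2)
```

Picking the closest setting follows the description above; a system that must never miss a deadline might instead round up to the next higher frequency.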
2.2.2 Implementation Levels of DVFS
DVFS can be implemented at a number of levels. These include the hardware level, operating system level, compiler level, virtual machine level, and application level. Nearly all DVFS research has focused on the first three levels. Though the hardware level provides mechanisms for reducing frequency and voltage, it also needs information about program behavior to decide when to apply these mechanisms. Techniques for deriving this information are too expensive to implement in bare hardware [10].
[Figure 2-3 diagram, with labels: Program (Application or Compiler), Virtual Machine, Operating System, Hardware]
Figure 2-3 : Implementation Levels of DVFS
Operating systems have more information, namely, about what programs are running and what resources they use. Thus, they can make DVFS decisions based on CPU usage patterns.
However, operating systems lack forward-looking information about program behavior and are hence limited to extrapolating future behavior from past behavior [3][4][11].
Compilers, however, receive an entire program as input. Thus, they can predict with greater accuracy the paths a program’s execution will take. Compilers can make DVFS decisions at a finer granularity than operating systems by inserting DVFS instructions into program regions such as basic blocks. Nevertheless, statically optimizing compilers lack runtime information and often resort to exhaustive simulation or previously collected offline profiles to decide what program regions should slow down and how much they should slow down. Once made, these decisions remain fixed for a program’s execution [12].
Like compilers, virtual machines have a model of future program behavior and can thus make more accurate power management decisions than operating systems or bare hardware.
However, unlike static compilers, virtual machines have an infrastructure allowing them to profile and reoptimize programs in execution. This dynamic optimization infrastructure allows virtual machines to continuously adapt power management decisions to varying execution behavior [6].
At the application level, programmers can make design decisions that reduce execution time and create opportunities for slowing down the processor. However, doing all of the analysis for DVFS at the application level may place too much of a burden on programmers [2].
2.2.3 Granularity Levels of DVFS
DVFS has been explored at different granularity levels. These include the interval level, intertask level, and intratask level. At the coarsest granularity are interval-based policies that regularly adjust processor speed based on prior workloads. The simplest algorithm of this kind is PAST [11]. PAST adjusts CPU speed at fixed-length intervals based on the idle and active cycles of the previous interval. If the idle cycles exceed a threshold, it slows down the processor. Otherwise, if the active cycles are higher, it speeds the processor up.
[Figure 2-4 plot: frequency versus time, with one frequency level for each of Interval0 through Interval4]
Figure 2-4 : Interval Level DVFS
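The interval-based rule described for PAST can be sketched as follows. This is a simplified illustration of the rule as stated above, not the exact algorithm of [11]: the threshold values, the step size, and the speed range are hypothetical, and speed is expressed as a fraction of the maximum frequency.

```python
def past_next_speed(speed, active_cycles, idle_cycles,
                    idle_threshold=0.5, busy_threshold=0.7,
                    step=0.25, min_speed=0.25):
    """Choose the speed for the next interval from the previous interval.

    All default parameter values are illustrative assumptions.
    """
    total = active_cycles + idle_cycles
    if total == 0:
        return speed  # nothing was profiled, keep the current speed
    idle_fraction = idle_cycles / total
    active_fraction = active_cycles / total
    if idle_fraction > idle_threshold:
        speed -= step   # mostly idle: slow the processor down
    elif active_fraction > busy_threshold:
        speed += step   # mostly busy: speed the processor up
    # Clamp to the range the hardware supports.
    return min(1.0, max(min_speed, speed))


speed = 1.0
# Hypothetical (active, idle) cycle counts for five consecutive intervals.
for active, idle in [(30, 70), (40, 60), (60, 40), (90, 10), (95, 5)]:
    speed = past_next_speed(speed, active, idle)
    print(speed)
# prints 0.75, 0.5, 0.5, 0.75, 1.0
```

Because the decision uses only the previous interval, the policy reacts one interval late to every change in workload, which is the limitation of extrapolating future behavior from past behavior.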
At a finer granularity are intertask policies that determine the execution frequencies of individual tasks. The simplest example of an intertask policy is energy-priority scheduling (EPS) [13]. This policy maintains an even workload distribution as new tasks enter a system, to minimize the battery drain rate. In every iteration, EPS schedules the task with the furthest deadline and the fewest overlapping tasks. It computes the minimum workload increase due to the new task and speeds up already scheduled tasks to make room and fill up slack.
[Figure 2-5 plot: frequency versus time, with one frequency level for each of Task0 through Task4]
Figure 2-5 : Intertask Level DVFS
Intratask approaches vary clock frequency and voltage within individual tasks. These approaches have been implemented in operating systems and compilers. An example of an OS-assisted intratask policy is the work of Dudani et al. [14]. To combine EDF scheduling with frequency scaling, Dudani et al. split each task the scheduler chooses into two subtasks, the later one running at full speed and the earlier one running slower. They choose the earlier subtask's speed to keep the combined execution time of both subtasks below the average execution time for the whole task.
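The arithmetic behind such a split can be sketched as follows. This is only one reading of the constraint described above, not the method of [14]: the time budget is passed in as a plain parameter, the split into cycle counts is assumed to be given, and the function name and units (Mcycles, MHz, seconds) are hypothetical.

```python
def earlier_subtask_speed(earlier_mcycles, later_mcycles, f_max_mhz, budget_s):
    """Lowest frequency (MHz) for the earlier subtask that keeps the
    combined execution time of both subtasks within budget_s.

    The later subtask is assumed to run at full speed, f_max_mhz.
    """
    later_time = later_mcycles / f_max_mhz       # seconds at full speed
    remaining = budget_s - later_time            # time left for the earlier part
    if remaining <= 0:
        raise ValueError("budget is too small even for the later subtask")
    required_mhz = earlier_mcycles / remaining
    if required_mhz > f_max_mhz:
        raise ValueError("budget cannot be met even at full speed")
    return required_mhz


# Hypothetical task: 60 Mcycles run slowly, then 40 Mcycles at 80 MHz,
# with a combined time budget of 2.0 s.
print(earlier_subtask_speed(60, 40, 80, 2.0))  # 40.0
```

In this example the later subtask takes 0.5 s at full speed, which leaves 1.5 s for the earlier subtask, so it can run at half the maximum frequency.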
Compiler-assisted intratask DVFS by Hsu and Kremer [15] discusses how to select regions where DVFS decisions should be made. The idea is to instrument a program with profiling code and execute the program to build a table of execution frequencies and average cycles for each region under all possible clock frequencies. Using this exhaustive approach, Hsu and Kremer select the region whose slowdown minimizes energy dissipation and incurs the smallest increase in runtime.
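The exhaustive selection can be illustrated with a small search over a profile table. This is a simplified sketch and not the formulation of [15]: it assumes that energy is proportional to cycles times voltage squared, it bounds the runtime increase with a hypothetical max_slowdown parameter, and the region names, cycle counts, and voltages are invented for the example.

```python
def region_cost(count, mcycles, freq_mhz, volts):
    """Time (s) and relative energy of one region at one frequency."""
    time_s = count * mcycles / freq_mhz
    energy = count * mcycles * volts ** 2  # assumption: E ∝ cycles · V²
    return time_s, energy


def select_region(counts, cycles, volts, max_slowdown):
    """Try every (region, frequency) pair and return the one with the lowest
    total energy whose runtime increase stays within max_slowdown.

    Returns None when no pair satisfies the bound.
    """
    f_max = max(volts)
    base = {r: region_cost(counts[r], cycles[r][f_max], f_max, volts[f_max])
            for r in counts}
    base_time = sum(t for t, _ in base.values())
    base_energy = sum(e for _, e in base.values())
    best, best_key = None, None
    for region in counts:
        for freq in cycles[region]:
            if freq == f_max:
                continue
            t, e = region_cost(counts[region], cycles[region][freq],
                               freq, volts[freq])
            total_time = base_time - base[region][0] + t
            total_energy = base_energy - base[region][1] + e
            if total_time > base_time * (1 + max_slowdown):
                continue
            key = (total_energy, total_time)  # energy first, then runtime
            if best_key is None or key < best_key:
                best, best_key = (region, freq), key
    return best


# Hypothetical profile: executions per region and average Mcycles per execution
# at each frequency. The memory-bound loop needs fewer CPU cycles when slowed.
counts = {"compute": 10, "mem_loop": 10}
cycles = {"compute": {800: 80, 400: 80}, "mem_loop": {800: 80, 400: 48}}
volts = {800: 1.2, 400: 1.0}
print(select_region(counts, cycles, volts, max_slowdown=0.15))  # ('mem_loop', 400)
```

With these numbers, slowing the compute-bound region would lengthen the run by 50%, while slowing the memory-bound loop costs only 10% and saves more energy, so the search picks the loop.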
[Figure 2-6 plot: frequency versus time, with a separate frequency level for each section of Task0. Section: a program unit, such as a function, loop, or basic block]
Figure 2-6 : Intratask Level DVFS