5.2 Execution Flow
5.2.3 Scheduler
We create a new governor to set the CPU frequency because, in Linux, the power manager and the CPU scheduler are independent, so the scheduler cannot set the CPU frequency directly. The scheduler cannot interrupt the power manager to request a frequency change, because the call may block, and the scheduler is not allowed to be blocked by anything in Linux. Instead, the governor periodically polls the scheduler to find out whether the scheduler wants to change the frequency. Also note that we cannot schedule the throughput guaranteed tasks while Linux boots, because Linux loads the power manager after loading the CPU scheduler.
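The polling arrangement can be illustrated with a minimal sketch. It is not the kernel implementation: the names FrequencyRequest, PollingGovernor, post, take, and poll_once are hypothetical, and a Python lock stands in for whatever non-sleeping primitive a kernel version would use. The point is only that the scheduler records its wish and returns at once, while the governor picks the wish up on its next poll.

```python
import threading
from typing import Optional


class FrequencyRequest:
    """Hypothetical mailbox: written by the scheduler, read by the governor."""

    def __init__(self) -> None:
        # Held only for a single swap; a kernel version would need a
        # primitive that never sleeps (assumption of this sketch).
        self._lock = threading.Lock()
        self._pending_khz: Optional[int] = None

    def post(self, khz: int) -> None:
        """Scheduler side: record the wanted frequency and return at once."""
        with self._lock:
            self._pending_khz = khz

    def take(self) -> Optional[int]:
        """Governor side: fetch and clear the pending request, if any."""
        with self._lock:
            khz, self._pending_khz = self._pending_khz, None
            return khz


class PollingGovernor:
    """Hypothetical governor that applies requests when it polls."""

    def __init__(self, mailbox: FrequencyRequest, initial_khz: int) -> None:
        self.mailbox = mailbox
        self.current_khz = initial_khz

    def poll_once(self) -> int:
        """Called periodically; applies a pending request if one exists."""
        khz = self.mailbox.take()
        if khz is not None:
            self.current_khz = khz
        return self.current_khz
```

In this sketch a request posted between two polls takes effect only at the next poll, which is the latency cost of polling instead of interrupting.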
The scheduler distributes the split tasks evenly among the cores for performance. Since split tasks have high priority, placing many of them on one core may delay their execution so that they fail to meet their throughput requirements.
To manage the run queues, the scheduler first locks all run queues. This prevents the timers from changing the data structures of the run queues, which would leave the data structures of the tasks in the run queues inconsistent. The scheduler then assigns tasks to cores and moves the tasks to their assigned run queues. After that, the scheduler notifies the governor to adjust the CPU frequency and then unlocks all run queues. Finally, the scheduler selects a new task to run on each core.
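The sequence above can be sketched as follows. This is an illustration under stated assumptions, not the kernel code: RunQueue, rebalance, and notify_governor are hypothetical names, the assignment is passed in as a ready-made mapping, and the final selection simply takes the head of each queue as a placeholder for the real selection rule.

```python
import threading


class RunQueue:
    """Hypothetical per-core run queue with its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.tasks = []


def rebalance(run_queues, assignment, notify_governor):
    """Move tasks to their assigned queues, then pick one task per core.

    assignment maps a task to the index of its target core; tasks that
    are not in the mapping stay where they are.
    """
    # Lock every run queue, always in index order so two callers
    # cannot deadlock each other.
    for rq in run_queues:
        rq.lock.acquire()
    try:
        # Move each task to the run queue of its assigned core.
        for core, rq in enumerate(run_queues):
            for task in list(rq.tasks):
                target = assignment.get(task, core)
                if target != core:
                    rq.tasks.remove(task)
                    run_queues[target].tasks.append(task)
        # Tell the governor to adjust the frequency before unlocking.
        notify_governor()
    finally:
        for rq in reversed(run_queues):
            rq.lock.release()

    # Select a task for each core (placeholder rule: head of the queue).
    picked = []
    for rq in run_queues:
        with rq.lock:
            picked.append(rq.tasks[0] if rq.tasks else None)
    return picked
```

Acquiring the locks in a fixed order is a design choice of the sketch; the text only states that all run queues are locked.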
Note that the management of the run queues does not wait for the currently running tasks to finish. If the scheduler waited for all running tasks to finish or migrate, the overhead would be tremendous. As a result, we simply manage the queues without waiting for running tasks to complete or migrate.
Due to this “no-wait” policy, a task may be running on one core while its task data structure has already been moved to the run queue of another core. We say that such a task is in an inconsistent state: it is running on one core, but it is in the run queue of another core.
After adjusting tasks among the run queues of all cores, the scheduler selects a task to run on each core. For a core c, the scheduler selects only from up to three run queues: the run queue of c and up to two run queues of c’s neighboring cores, where neighboring cores are those with adjacent indices. As a result, the scheduler locks at most three run queues to select a task. The scheduler may select a task to run on core c if all of the following four conditions are satisfied.
1. The task has available credits on this core.
2. The task is runnable.
3. The scheduler has not assigned this task to a core.
4. The task is not in the inconsistent state, or the task is in the inconsistent state, and the run queue of core c has this task. Note that if the scheduler selects a task in the inconsistent state (running on core c′) to run on a core c, then the scheduler must wait for the task to give up core c′ before migrating the task to core c.
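The four conditions and the neighbor rule can be written down as a small predicate. The sketch below is only an illustration: the Task fields, the function names, and the search order (own queue first, then the lower and higher neighbor) are assumptions, since the text does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    """Hypothetical task record; the field names are assumptions."""
    name: str
    runnable: bool = True
    credits: Dict[int, int] = field(default_factory=dict)  # core -> credits
    assigned: bool = False            # already given to a core this round
    running_on: Optional[int] = None  # core the task currently executes on
    queued_on: Optional[int] = None   # core whose run queue holds the task


def is_inconsistent(task: Task) -> bool:
    """Running on one core while queued on another."""
    return task.running_on is not None and task.running_on != task.queued_on


def eligible(task: Task, c: int) -> bool:
    """Check the four conditions for running the task on core c."""
    return (
        task.credits.get(c, 0) > 0                          # condition 1
        and task.runnable                                   # condition 2
        and not task.assigned                               # condition 3
        and (not is_inconsistent(task) or task.queued_on == c)  # condition 4
    )


def candidate_queues(c: int, n_cores: int) -> List[int]:
    """Core c and its neighbors with adjacent indices (at most three)."""
    return [i for i in (c, c - 1, c + 1) if 0 <= i < n_cores]


def select(run_queues: List[List[Task]], c: int) -> Optional[Task]:
    """Return the first eligible task from the candidate queues, if any."""
    for i in candidate_queues(c, len(run_queues)):
        for task in run_queues[i]:
            if eligible(task, c):
                return task
    return None
```

The sketch omits the waiting step: when an inconsistent task is selected, the scheduler must still wait for it to give up the core it is running on before migrating it.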
When a task wakes up after sleeping for several time periods, the scheduler puts the task into the run queue of an available core. If no core is available, the scheduler puts the task on a big core and then adjusts the CPU frequency of that big core to satisfy the throughput of all tasks on it. Note that we do not reschedule all tasks because of its huge
Chapter 6 Experiment
This chapter describes metrics that evaluate the effectiveness of our scheduler and the experimental performance results.
6.1 Methodology
Our target platform is a Juno ARM development board, an asymmetric multi-core platform consisting of high-performance big cores (two Cortex-A57) and energy-efficient little cores (four Cortex-A53).
The platform supports per-cluster DVFS, i.e., the clusters can run at different operating frequencies, so we can adjust the frequency of each cluster according to the schedule. Table 6.1 details the specifications of the related hardware and software. Both the Cortex-A57 and the Cortex-A53 cores can run at five frequency levels. Table 6.2 shows the available CPU frequency levels for each core type in Linaro release version 15.07.
We implement our energy-credit based scheduler on the Juno ARM development board.
The scheduler periodically generates schedules for both types of cores. A schedule consists of the frequencies of the cores and the energy credits of the tasks. The scheduler then assigns throughput guaranteed tasks to cores for execution according to their energy credits. We set the scheduling period to one second in our experiments.
We use VLC [16], a free and open source cross-platform multimedia player, as our benchmark. A VLC process consists of fifteen threads and each thread does different work. We consider each thread of VLC as a throughput guaranteed task.

Hardware
Processor    Dual Cluster, ARMv8 big.LITTLE configuration
             Dual-Core Cortex-A57 (2MB L2 cache)
Table 6.1: Specifications of Juno ARM Development board

Core type    CPU frequency levels
Cortex-A57   450   625   800   950   1100
Cortex-A53   450   575   700   775   850
Table 6.2: Available CPU frequency levels (MHz)
We can adjust the bit rate of the media player to control its workload. We use the video converter FFmpeg [7] to encode the video with a constant bit rate (CBR), so the video player consumes output from the decoder at a constant rate. As a result, we can control the workload of the media player by adjusting its bit rate.
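One common way to approximate CBR output with FFmpeg is to pin the target, minimum, and maximum video bit rates to the same value. The sketch below only builds such a command line; it is not the command used in the experiments, which the text does not give. The function name, the file names, and the buffer size of twice the bit rate are assumptions.

```python
def cbr_encode_command(src, dst, kbps):
    """Build an FFmpeg argument list that pins the video bit rate.

    src and dst are hypothetical file names. The buffer size of twice
    the bit rate is an assumption of this sketch, not a value from the
    experiments.
    """
    rate = f"{kbps}k"
    return [
        "ffmpeg",
        "-i", src,
        "-b:v", rate,       # target video bit rate
        "-minrate", rate,   # lower bound equal to the target
        "-maxrate", rate,   # upper bound equal to the target
        "-bufsize", f"{2 * kbps}k",
        dst,
    ]
```

The resulting list can be passed to subprocess.run on a machine with FFmpeg installed, for example cbr_encode_command("input.mp4", "output.mp4", 400) for a 400 kb/s video.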
The Linux priority system controls the priority of a thread with a nice value from -20 to 19. A nice value of -20 is the highest priority and 19 is the lowest. The default nice value of a task is 0, and we set the nice value of VLC to 0 in our experiments.
We use the ARM Energy Probe to measure the power consumption of the Juno ARM development board. The ARM Energy Probe can only measure the energy of the big cluster and the little cluster, so our data exclude the power consumption of other devices.
We use DS-5 Streaming [3] to analyze the power consumption reported by the ARM Energy Probe. DS-5 Streaming starts a background task, gator, to collect the data from the ARM Energy Probe. Gator is a high priority profiling task, so we set its nice value to -19.
We test three workload scenarios. The first is a light-weight workload, in which we run one VLC instance that plays a video encoded with a constant bit rate of 400 kb/s. In the second, medium-weight workload we run eight VLC instances simultaneously, each playing a video with the same bit rate as in the first case. In the third, heavy-weight workload we run one VLC instance that plays a high quality video without compression.
We compare the power consumption of our Energy-credit Based Scheduler (denoted as ECS) with that of the Completely Fair Scheduler (denoted as CFS) under different workloads. In the comparison, CFS uses two existing governors: on-demand and conservative. The on-demand governor is the default governor for the power manager in Linux, and the conservative governor is a power saving governor.
Recall that the on-demand and the conservative governors each have an up threshold and a down threshold. The governors raise the CPU frequency level when the CPU usage exceeds the up threshold, and they reduce the CPU frequency when the CPU usage falls below the down threshold. We set the up threshold to 80% and the down threshold to 20%.
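The threshold rule can be illustrated with a short sketch over the Cortex-A57 levels from Table 6.2. It assumes that the frequency moves by exactly one level per decision, which is a simplification: the real governors differ in how far they move at each step. The names A57_LEVELS_MHZ and next_level are hypothetical.

```python
# Cortex-A57 frequency levels in MHz, taken from Table 6.2.
A57_LEVELS_MHZ = [450, 625, 800, 950, 1100]


def next_level(index, usage, levels=A57_LEVELS_MHZ, up=0.80, down=0.20):
    """Return the index of the next frequency level.

    usage is the CPU usage as a fraction between 0 and 1. Moving one
    level at a time is an assumption of this sketch.
    """
    if usage > up and index < len(levels) - 1:
        return index + 1   # usage above the up threshold: step up
    if usage < down and index > 0:
        return index - 1   # usage below the down threshold: step down
    return index           # otherwise keep the current level
```

With these thresholds, any usage between 20% and 80% leaves the frequency unchanged, which prevents the level from oscillating under a steady load.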
We evaluate the power consumption of two configurations. In the first configuration there is only one scheduler, CFS, which schedules all processes, including VLC, gator, and other system services. In the second configuration there are two schedulers, CFS and ECS. ECS schedules the throughput guaranteed tasks of VLC, and CFS schedules gator and the other system services. We do not use ECS to schedule everything because we cannot control the workload of these system services, and these services do not need a throughput guarantee.