Contention-Aware Scheduling for Asymmetric Multi-core Processors
Xiaokang Fan, Yulei Sui, Jingling Xue
Programming Languages and Compilers Group
School of Computer Science and Engineering, UNSW Australia
ICPADS’15
Motivation
Most research on AMP scheduling does not consider shared resource contention.
◦Speedup-driven.
However, contention for shared resources affects application performance.
Contention-aware Scheduling for AMPs
Offline stage
◦Profiling.
◦Build a performance interference model.
Online stage
◦Schedule a set of applications to cores by considering both the speedup factor and the predicted performance interference.
Contention-aware Scheduling Framework
Offline Interference Model
For a target application and training benchmarks, collect
◦Individual pressure
The application’s access rate (access count per second) to a shared resource R.
◦Aggregate pressure
The pressure on a shared resource R when an application co-runs with other training benchmarks.
Aggregate Pressure
◦P_τ(R): aggregate pressure on shared resource R from the applications running in cluster τ.
◦C_τi(R): individual pressure of the i-th application running in cluster τ.
◦R: shared resource.
Shared cache, shared bus, and shared memory.
P_\tau(R) = \sum_{i=1}^{n} C_{\tau i}(R), where n is the number of applications running in cluster τ.
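The sum above can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the access rates are hypothetical and chosen only for illustration; they are not measurements from the talk.

```python
from typing import Dict, List

def aggregate_pressure(cluster_apps: List[Dict[str, float]], resource: str) -> float:
    """Sum the individual pressures on one shared resource over all
    applications running in the same cluster."""
    return sum(app[resource] for app in cluster_apps)

# Hypothetical individual pressures (accesses per second) of two
# applications running in the same cluster.
big_cluster = [
    {"cache": 1.5e6, "bus": 2.0e5},
    {"cache": 0.5e6, "bus": 1.0e5},
]

cache_pressure = aggregate_pressure(big_cluster, "cache")
bus_pressure = aggregate_pressure(big_cluster, "bus")
```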
Intra- and Inter-Cluster Interference
◦α, β, γ, σ, δ, θ, σ' are to be instantiated using linear regression with training results.
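As a rough illustration of fitting coefficients by linear regression, the sketch below does an ordinary least-squares fit of a single slope and intercept. It is a simplified stand-in: the actual model in the talk has several coefficients, and its exact form is not reproduced here. The function name and the data points are hypothetical.

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training points: aggregate pressure (arbitrary units)
# against measured performance degradation (percent).
pressures = [1.0, 2.0, 3.0, 4.0]
degradations = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(pressures, degradations)
```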
Performance Degradation
Example of Training
Performance Degradation Result
Online Scheduling
Given a list of applications along with their profiling information.
For an idle core c, sort the applications according to their big core speedups.
◦c is big core: descending order.
◦c is little core: ascending order.
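The ordering rule can be sketched in a few lines of Python. The function name and the speedup values are hypothetical; the only behavior taken from the slide is descending order for a big core and ascending order for a little core.

```python
from typing import Dict, List

def order_candidates(speedups: Dict[str, float], core_is_big: bool) -> List[str]:
    """Sort application names by big-core speedup: descending when the
    idle core is big, ascending when it is little."""
    return sorted(speedups, key=lambda app: speedups[app], reverse=core_is_big)

# Hypothetical big-core speedup factors obtained from profiling.
speedups = {"app_a": 1.8, "app_b": 2.4, "app_c": 1.2}

for_big_core = order_candidates(speedups, core_is_big=True)
for_little_core = order_candidates(speedups, core_is_big=False)
```

With this ordering, a big core first considers the application that benefits most from it, and a little core first considers the one that loses least by running there.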
Online Scheduling (Cont.)
Choose the first application in the list that satisfies the following condition and assign it to core c.
◦“After assigning the application to core c, the predicted slowdowns of applications running on cores are still under a predefined threshold.”
Can be estimated by updating the aggregate pressures.
Online Scheduling (Cont.)
Otherwise, choose the application that leads to the smallest total performance slowdown.
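The selection step, including the fallback, can be sketched as follows. This is a minimal sketch under assumptions: `pick_application` and the `predict_slowdowns` callback are hypothetical names, and the callback stands in for the interference model, returning the predicted slowdowns of all running applications if the candidate were assigned to the idle core.

```python
from typing import Callable, List, Optional, Sequence

def pick_application(
    ordered_apps: Sequence[str],
    predict_slowdowns: Callable[[str], List[float]],
    threshold: float,
) -> Optional[str]:
    """Return the first candidate whose assignment keeps every predicted
    slowdown under the threshold; if none qualifies, return the candidate
    with the smallest total predicted slowdown."""
    best_app: Optional[str] = None
    best_total = float("inf")
    for app in ordered_apps:
        slowdowns = predict_slowdowns(app)
        if all(s < threshold for s in slowdowns):
            return app
        total = sum(slowdowns)
        if total < best_total:
            best_app, best_total = app, total
    return best_app

# Hypothetical predictions: slowdowns of the running applications after
# assigning each candidate to the idle core.
predicted = {"x": [0.30, 0.10], "y": [0.15, 0.12], "z": [0.05, 0.05]}
chosen = pick_application(["x", "y", "z"], lambda app: predicted[app], threshold=0.2)
```

In this example, "x" is skipped because one predicted slowdown is not under the threshold, and "y" is chosen as the first candidate that satisfies the condition.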
Evaluation Environment
Versatile Express CoreTile
◦Two A15 and three A7
◦1.2 GHz and 1 GHz, respectively
Benchmarks
◦28 training benchmarks
From SPEC CPU 2006, MediaBench, MiBench
◦21 target applications
From SPEC CPU 2006
Prediction Accuracy
Comparison with Other Schedulers
◦Compared with the default scheduler
Average speedup: 12.04%
Maximum speedup: 28.32%
◦Compared with the speedup-factor-driven scheduler
Average speedup: 7.84%
Maximum speedup: 28.51%
Conclusion
This paper presents a new contention-aware workload scheduler for asymmetric multi-core processors.
◦An offline performance interference model for predicting the performance slowdown.
◦An online stage for scheduling an application to the most appropriate core type based on predicted performance interference.
The proposed scheduler can improve overall system performance by up to 28.32% and 28.51% compared with the default scheduler and the speedup-factor-driven scheduler, respectively.