Hierarchical Scheduling for Diverse Datacenter Workloads
Arka A. Bhattacharya, David Culler, Ali Ghodsi, Scott Shenker, and Ion Stoica
University of California, Berkeley
Eric Friedman
International Computer Science Institute, Berkeley
ACM SoCC’13
Hierarchical Scheduling
A feature of cloud schedulers.
Enables scheduling resources to reflect organizational priorities.
Hierarchical Share Guarantee
Assign to each node in the weighted tree some guaranteed share of the resources.
◦A node n_i is guaranteed to get at least an x share of resources from its parent, where

x = w_i / Σ_{n_j ∈ A(C(P(n_i)))} w_j

w_i: weight of node n_i
P(·): parent of a node
C(·): the set of children of a node
A(·): the subset of demanding nodes
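As a small illustration of this formula, the guarantee can be computed directly from the weights (the function name and example weights below are hypothetical):

```python
# Sketch of the hierarchical share guarantee: a node's guaranteed share
# of its parent's resources is its own weight divided by the total
# weight of the parent's demanding children (which include the node).
def guaranteed_share(node_weight, demanding_child_weights):
    return node_weight / sum(demanding_child_weights)

# A node of weight 1 whose parent has demanding children of weights
# 1, 1, and 2 is guaranteed 1 / (1 + 1 + 2) = 25% of the parent's share.
print(guaranteed_share(1, [1, 1, 2]))  # 0.25
```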
Example
Given 480 servers
[Tree diagram: the 480 servers divided down the weighted hierarchy according to node weights.]
Multi-resource Scheduling
Workloads in data centers tend to be diverse.
◦CPU-intensive, memory-intensive, or I/O-intensive.
◦Ignoring the actual resource needs of jobs leads to poor performance isolation and low throughput for jobs.
Dominant Resource Fairness (DRF)
A generalization of max-min fairness to multiple resource types.
Maximizes the minimum dominant share of the users in the system.
◦The dominant share s_i is the maximum among all of user i's resource shares.
◦The dominant resource is the resource corresponding to the dominant share.
s_i = max_{1 ≤ j ≤ m} { u_{i,j} / r_j }

u_{i,j}: amount of resource j allocated to user i
r_j: total amount of resource j in the system
Example
Dominant resource
◦Job 1: memory
◦Job 2: CPU
Dominant share
◦60%
How DRF Works
Given a set of users, each with a resource demand vector.
◦The resources required to execute one job.
Starts with every user being allocated with zero resources.
Repeatedly picks the user with the lowest dominant share.
Launches one of the user's jobs if there are enough resources available in the system.
Example
System with 9 CPUs and 18 GB RAM.
◦User A: <1 CPU, 4 GB>
◦User B: <3 CPUs, 1 GB>
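The procedure above can be sketched in Python; running it on this example reproduces DRF's allocation (this is an illustrative sketch, not the authors' implementation):

```python
# Illustrative DRF simulation: repeatedly give one task to the user
# with the lowest dominant share, while that user's next task still fits.
def drf(total, demands):
    alloc = {u: [0.0] * len(total) for u in demands}
    tasks = {u: 0 for u in demands}
    consumed = [0.0] * len(total)

    def dominant_share(u):
        return max(a / t for a, t in zip(alloc[u], total))

    while True:
        # users whose next task still fits in the remaining resources
        feasible = [u for u in demands
                    if all(c + d <= t for c, d, t
                           in zip(consumed, demands[u], total))]
        if not feasible:
            break
        u = min(feasible, key=dominant_share)  # lowest dominant share first
        for j, d in enumerate(demands[u]):
            alloc[u][j] += d
            consumed[j] += d
        tasks[u] += 1
    return tasks

# 9 CPUs and 18 GB RAM; A's tasks need <1 CPU, 4 GB>, B's <3 CPUs, 1 GB>
print(drf([9, 18], {"A": [1, 4], "B": [3, 1]}))  # {'A': 3, 'B': 2}
```

Both users end with the same dominant share of 2/3: A's 3 tasks use 12 of 18 GB of memory, B's 2 tasks use 6 of 9 CPUs.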
Hierarchical DRF (H-DRF)
Static H-DRF
Collapsed hierarchies
Naive H-DRF
Dynamic H-DRF
Static H-DRF
A static version of DRF to handle hierarchies.
Algorithm
◦Given the hierarchy structure and the amount of resources in the system.
◦Starts with every leaf node being allocated zero resources.
◦Repeatedly allocates resources to a leaf node until no more resources can be assigned to any node.
Resource Allocation in Static H-DRF
Start at the root of the tree and traverse down to a leaf.
At each step, pick the demanding child that has the smallest dominant share.
◦Internal nodes are assigned the sum of all the resources assigned to their immediate children.
Allocate to the leaf node an ε amount of its resource demands.
◦This increases the node's dominant share by ε.
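A single allocation step of this traversal might be sketched as follows (the tree encoding and all names are illustrative):

```python
# Sketch of one static H-DRF allocation step over a toy hierarchy.
# Internal nodes hold "children"; leaves hold a demand vector and a
# current allocation.
total = [10.0, 10.0]  # e.g. 10 CPUs and 10 GPUs

def dominant_share(vec):
    return max(v / t for v, t in zip(vec, total))

def consumption(node):
    # an internal node consumes the sum of its children's consumptions
    if "children" not in node:
        return node["alloc"]
    vecs = [consumption(c) for c in node["children"]]
    return [sum(col) for col in zip(*vecs)]

def allocate_step(root, eps=0.1):
    # walk down, always entering the demanding child with the smallest
    # dominant share, then grant the reached leaf eps of its demand
    node = root
    while "children" in node:
        demanding = [c for c in node["children"] if c.get("demanding", True)]
        node = min(demanding, key=lambda c: dominant_share(consumption(c)))
    node["alloc"] = [a + eps * d for a, d in zip(node["alloc"], node["demand"])]

leaf_cpu = {"demand": [1, 0], "alloc": [0.0, 0.0]}
leaf_gpu = {"demand": [0, 1], "alloc": [0.0, 0.0]}
allocate_step({"children": [leaf_cpu, leaf_gpu]})
print(leaf_cpu["alloc"])  # [0.1, 0.0]
```

With both leaves at dominant share zero, the tie goes to the first child, which receives ε of its demand.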
Example
Given 10 CPUs and 10 GPUs.
Weakness of Static H-DRF
Re-calculating the static H-DRF allocation from scratch for each job departure and arrival is computationally infeasible.
Collapsed Hierarchies
Converts the hierarchical scheduler into a flat one and applies the weighted DRF algorithm.
◦Works when only one resource is involved.
◦Violates the hierarchical share guarantee for internal nodes in the hierarchy.
Example
Given
Flatten
n_r
◦n1,1, demand <1,1>: weight 50%
◦n2,1, demand <1,0>: weight 25%
◦n2,2, demand <0,1>: weight 25%
Weighted DRF
Each user i is associated with a weight vector W_i = {w_i,1, …, w_i,m}.
◦w_i,j represents the weight of user i for resource j.
Dominant share:

s_i = max_{1 ≤ j ≤ m} { u_{i,j} / (w_i,j · r_j) }
Weighted DRF in Collapsed Hierarchies
Each node n_i has a weight w_i.
◦Let w_i,j = w_i for 1 ≤ j ≤ m.
◦The ratio between the dominant resources allocated to user a and user b then equals w_a / w_b.
s_i = max_{1 ≤ j ≤ m} { u_{i,j} / (w_i · r_j) }
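A minimal sketch of this weighted dominant share, assuming a single per-node weight applied to every resource (the allocations below are illustrative):

```python
# Weighted dominant share after collapsing the hierarchy: each node i
# carries one weight w_i for every resource, so
# s_i = max_j u_{i,j} / (w_i * r_j).
def weighted_dominant_share(alloc, total, weight):
    return max(u / (weight * r) for u, r in zip(alloc, total))

total = [10.0, 10.0]
# a node of weight 2 holding <5, 5> and a node of weight 1 holding <2.5, 0>
s_a = weighted_dominant_share([5.0, 5.0], total, 2.0)
s_b = weighted_dominant_share([2.5, 0.0], total, 1.0)
print(s_a, s_b)  # 0.25 0.25
```

The weighted shares are equal, while the dominant allocations (5 vs. 2.5) differ exactly by the weight ratio w_a / w_b = 2.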
Example
Given
Collapsed Hierarchies
n_r
◦n1,1, demand <1,1>: weight 50%
◦n2,1, demand <1,0>: weight 25%
◦n2,2, demand <0,1>: weight 25%
Naive H-DRF
A natural adaptation of the original DRF to the hierarchical setting.
The hierarchical share guarantee is violated for leaf nodes.
◦Starvation
Example
Static H-DRF
Naive H-DRF
Dominant share = 1.0
Dynamic H-DRF
Does not suffer from starvation.
Satisfies the hierarchical share guarantee.
Two key features:
◦Rescaling to minimum nodes
◦Ignoring blocked nodes
Rescaling to Minimum Nodes
Compute the resource consumption of an internal node as follows:
◦Find the demanding child with the minimum dominant share M.
◦Rescale every child's resource consumption vector so that its dominant share becomes M.
◦Add all the children's rescaled vectors to get the internal node's resource consumption vector.
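The rescaling rule can be sketched as follows, with vectors chosen to mirror the 10-CPU/10-GPU setting (names and values are illustrative):

```python
# "Rescaling to minimum nodes": an internal node's consumption vector is
# the sum of its children's vectors after each child is rescaled so that
# its dominant share equals the minimum M among the demanding children.
total = [10.0, 10.0]  # 10 CPUs and 10 GPUs

def dominant_share(vec):
    return max(v / t for v, t in zip(vec, total))

def rescaled_consumption(child_vectors):
    m = min(dominant_share(v) for v in child_vectors)  # minimum dominant share M
    out = [0.0] * len(total)
    for v in child_vectors:
        s = dominant_share(v)
        scale = m / s if s > 0 else 0.0
        for j, x in enumerate(v):
            out[j] += scale * x
    return out

# children holding <5, 0> (dominant share 0.5) and <0, 4> (0.4):
# the first is rescaled to <4, 0>, so the parent consumes <4, 4>
print(rescaled_consumption([[5.0, 0.0], [0.0, 4.0]]))
```

Summing the raw vectors would give the parent a dominant share of 0.5; rescaling to the minimum (0.4) prevents a heavily consuming child from inflating its parent's share.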
Example
Given 10 CPUs and 10 GPUs.
After n2,1 finishes a job and releases 1 CPU:
[Figure: n2,1's consumption drops to <0.4, 0>; the sibling vector <0, 1> is rescaled to the minimum dominant share 0.4, giving <0, 0.4>, so the internal node consumes <0.4, 0.4> (dominant share 0.4) rather than <0.5, 0.4> (dominant share 0.5).]
Ignoring Blocked Nodes
Dynamic H-DRF only considers non-blocked nodes for rescaling.
A leaf node is blocked if either
◦any of the resources it requires is saturated, or
◦the node is non-demanding.
An internal node is blocked if all of its children are blocked.
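The blocked-node test can be sketched as follows (the node representation and names are illustrative):

```python
# Blocked-node test: a leaf is blocked if it is non-demanding or needs
# a saturated resource; an internal node is blocked iff all of its
# children are blocked.
def leaf_blocked(leaf, saturated):
    if not leaf.get("demanding", True):
        return True
    return any(d > 0 and saturated[j] for j, d in enumerate(leaf["demand"]))

def node_blocked(node, saturated):
    if "children" not in node:
        return leaf_blocked(node, saturated)
    return all(node_blocked(c, saturated) for c in node["children"])

saturated = [True, False]  # CPUs saturated, GPUs still free
cpu_leaf = {"demand": [1, 0]}
gpu_leaf = {"demand": [0, 1]}
# the CPU leaf is blocked, but its parent still has an unblocked child
print(node_blocked({"children": [cpu_leaf, gpu_leaf]}, saturated))  # False
```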
Example
[Figure: static H-DRF allocation vs. the allocation without ignoring blocked nodes; dominant share = 1/3.]
Allocation Properties
Hierarchical Share Guarantees
Group Strategy-proofness
◦No group of users can misrepresent their resource requirements in such a way that all of them are weakly better off, and at least one of them is strictly better off.
Recursive Scheduling
No Population Monotonicity
◦PM: any node exiting the system should not decrease the resource allocation to any other node in the hierarchy tree.
Example
Evaluation - Hierarchical Sharing
49 Amazon EC2 servers
◦Dominant resource:
n1,1, n2,1, n2,2: CPU
n1,2: GPU
Result
Pareto efficiency: no node in the hierarchy can be allocated an extra task on the cluster without reducing the share of some other node.
Conclusion
Proposed H-DRF, a hierarchical multi-resource scheduler.
◦Avoids job starvation and maintains the hierarchical share guarantee.
Future work
◦DRF under placement constraints.
◦Efficient allocation vector updates.