Cloud Resource Scheduling for Online and Batch Applications
Kick-off meeting
Project Goal
Develop a resource management system that
◦Deploys different types of jobs to servers.
◦Dynamically adjusts the resource allocation according to job workloads.
◦Meets the Service Level Agreement (SLA) of latency-sensitive jobs.
◦Minimizes the cost.
Penalty of violating SLA
Types of Jobs
Interactive job
◦Latency-sensitive
◦Stateless
◦Strict SLA
Batch job
◦Consists of many (independent) tasks.
◦Soft deadline.
Examples
YouTube
◦Interactive: video streaming
◦Batch: flow analysis
Phone billing system
◦Interactive: rate querying/changing
◦Batch: calculating the phone bill per user.
Cost
“Penalty”
◦The price we have to pay for violating the SLA.
Each job has a different penalty rate.
◦P(J_a) = penalty_rate(J_a) * max(0, v), J_a ∈ I
v: percentage of SLA violations within a time window.
◦P(J_b) = penalty_rate(J_b) * max(0, d), J_b ∈ B
d: difference between the job's completion time and its deadline.
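The two penalty formulas can be written down directly. The following is a minimal Python sketch; the function names, and the choice of passing v as a plain number and the times in arbitrary consistent units, are assumptions made for illustration and do not come from the project.

```python
def interactive_penalty(penalty_rate: float, v: float) -> float:
    """P(J_a) = penalty_rate(J_a) * max(0, v) for an interactive job J_a.

    v is the percentage of SLA violations within a time window.
    """
    return penalty_rate * max(0.0, v)


def batch_penalty(penalty_rate: float, completion_time: float, deadline: float) -> float:
    """P(J_b) = penalty_rate(J_b) * max(0, d) for a batch job J_b.

    d is the difference between completion time and deadline.
    """
    d = completion_time - deadline
    return penalty_rate * max(0.0, d)
```

A batch job that finishes before its deadline has d < 0, so the max(0, d) term makes its penalty zero; finishing early earns no credit.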
Problem Definition
Given a set of batch jobs B, a set of interactive jobs I, the number of processors m, and a penalty bound C, is there a schedule that runs all jobs with a total penalty of no more than C?
◦NP-complete
◦Design heuristics to obtain schedules of reasonable quality.
Processor Allocation
Estimate the penalty of interactive jobs for different numbers of processors.
Estimate the penalty of batch jobs for different numbers of processors.
Determine the number of processors for each job.
◦With a limited number of processors, find the assignment that minimizes the total penalty.
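The allocation step above can be illustrated with a small sketch. It assumes that each job's penalty has already been estimated for every processor count from 0 to m, and it uses a plain dynamic program over those tables. This is a stand-in chosen for illustration, not necessarily the heuristic the project will adopt; the function name `allocate_processors`, the table layout, and the example numbers are all hypothetical.

```python
from typing import Dict, List, Tuple


def allocate_processors(
    penalty: Dict[str, List[float]], m: int
) -> Tuple[float, Dict[str, int]]:
    """Split at most m processors among jobs to minimize the total penalty.

    penalty[job][k] is the estimated penalty of `job` on k processors,
    for k = 0..m (so every list must have m + 1 entries).
    Returns (minimum total penalty, {job: number of processors}).
    """
    jobs = list(penalty)
    inf = float("inf")
    # best[i][p]: minimum penalty of the first i jobs using at most p processors
    best = [[0.0] * (m + 1)] + [[inf] * (m + 1) for _ in jobs]
    # choice[i][p]: processors given to job i when best[i + 1][p] was reached
    choice = [[0] * (m + 1) for _ in jobs]

    for i, job in enumerate(jobs, start=1):
        for p in range(m + 1):
            for k in range(p + 1):
                cand = best[i - 1][p - k] + penalty[job][k]
                if cand < best[i][p]:
                    best[i][p] = cand
                    choice[i - 1][p] = k

    # Walk back from the last job to recover the allocation.
    alloc: Dict[str, int] = {}
    p = m
    for i in range(len(jobs), 0, -1):
        k = choice[i - 1][p]
        alloc[jobs[i - 1]] = k
        p -= k
    return best[len(jobs)][m], alloc


# Hypothetical penalty estimates for one interactive and one batch job.
tables = {"web": [100, 40, 10, 0], "billing": [50, 20, 5, 0]}
total, alloc = allocate_processors(tables, m=3)
```

With these made-up tables and m = 3, the sketch gives two processors to "web" and one to "billing", for a total penalty of 30. The running time grows with the number of jobs times m squared, and the result is only as good as the penalty estimates fed into it.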
System Architecture
Implement the scheduling algorithm and evaluate the result.
Build the system on Docker / implement the scheduler in Kubernetes.
Consider other resources.
◦Memory, network, etc.