
Chapter 4 Proposed Time-Directed Dijkstra Algorithm

4.4 System Architecture

The proposed TD-D algorithm is not a standalone component; it works with other components to perform its function. Fig. 3 illustrates the system architecture for our TD-D algorithm and its workflow at time slot t. First, as illustrated in (a), the workload monitor monitors and records, in real time, the workload of each application running on the computing cloud. Since the raw monitored data is voluminous and unordered, the workload monitor processes it into ordered, usable statistics, usually the peak workload of each application in that time slot, and then sends these statistics to the reactive controller and the workload predictor, as illustrated in (b). In (c), once the reactive controller receives the workload statistics from the workload monitor, it performs its single but essential function: dynamically switching on VMs/servers whenever resource under-allocation is detected. A proactive, long-term controller such as our TD-D algorithm needs this function, because prediction error is always present and some resource under-allocation is therefore inevitable. We include this reactive, or short-term, controller in our system architecture as an auxiliary to our long-term, proactive controller and as a countermeasure to prediction error. The workload predictor also uses the workload statistics. As mentioned in the previous chapter, we use an existing workload prediction technique to provide a workload prediction for each application over the following W time slots. Finally, as illustrated in (d), the proactive controller receives the updated prediction from the workload predictor, runs our TD-D algorithm, and then sends control messages to the computing cloud to perform the required resource reallocation, as illustrated in (e).

Figure 3. Resource management system architecture and its workflow, assuming time slot = t. (a) Real-time workload data gathered from the Computing Cloud. (b) Real-time workload statistics, usually using the peak workload within a time slot. (c) Dynamically switch on new VMs/servers if under-allocation is detected. (d) Provide updated workload prediction from t + 1 to t + W. (e) Perform dynamic resource allocation at the beginning of each time slot, according to the direction of the proposed Time-Directed Dijkstra algorithm.
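The per-slot workflow of Fig. 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Cloud` model, the function names, and the plug-in `predictor` and `planner` callables are all hypothetical, each app's allocation is modeled as a plain VM count, and `planner` only marks where TD-D would run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Cloud:
    # Hypothetical model of the computing cloud: app name -> VMs currently on.
    allocated: Dict[str, int]

def peak_statistics(samples: Dict[str, List[int]]) -> Dict[str, int]:
    """(a)->(b): reduce the raw samples of slot t to one peak value per app."""
    return {app: max(values) for app, values in samples.items()}

def reactive_control(cloud: Cloud, peaks: Dict[str, int]) -> Dict[str, int]:
    """(c): switch on VMs for any app whose peak exceeded its allocation."""
    switched_on = {}
    for app, peak in peaks.items():
        shortfall = peak - cloud.allocated.get(app, 0)
        if shortfall > 0:
            cloud.allocated[app] = peak
            switched_on[app] = shortfall
    return switched_on

def run_slot(
    cloud: Cloud,
    samples: Dict[str, List[int]],
    predictor: Callable[[Dict[str, int], int], Dict[str, List[int]]],
    planner: Callable[[Dict[str, int], Dict[str, List[int]]], Dict[str, int]],
    W: int,
):
    peaks = peak_statistics(samples)                  # monitor output, step (b)
    extra = reactive_control(cloud, peaks)            # short-term controller, step (c)
    forecast = predictor(peaks, W)                    # step (d): app -> W predicted demands
    plan = planner(dict(cloud.allocated), forecast)   # proactive controller (TD-D goes here)
    cloud.allocated.update(plan)                      # step (e): apply the reallocation
    return extra, plan
```

For example, with allocations `{"a": 2, "b": 3}` and samples `{"a": [1, 4, 3], "b": [2, 2]}`, the reactive step switches on 2 VMs for app `a`; a naive predictor that repeats the peak and a planner that takes the first predicted slot then leave `{"a": 4, "b": 2}`.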

Chapter 5 Evaluation

In this chapter, we introduce our experiment settings, the approaches used for comparison, and the experimental results with discussion.

5.1 Experiment Settings

We built a simulation environment for our evaluation. First, we defined the parameters used in our evaluation, which are listed in Table 3. Note that the energy consumption unit we use here is a relative unit, so it carries no physical unit such as joules or kWh. That is, we set or = 1, then we get and from the value of and , by applying (3) and (4). The values of break-even time and can be determined by measuring the energy consumption of real servers and VMs, or set by operator policy. We also use the energy consumption measurement data from [12] for our operating cost parameter: the peak power consumption of a server with two Intel Xeon X5550 quad-core processors is 248 W. The resulting parameter values used in the evaluation are listed in Table 3.

Table 3. Parameter settings used in the evaluation

Parameter | Definition | Value
N | The number of applications | 30
MAX_VM_APP | The maximum number of VMs that can be allocated to each application |
NUM_SERVER | The number of available servers in the data center | ceiling(NUM_APP × MAX_VM_APP / C)
W | Prediction window size | 7 time slots
 | Same as defined in Table 2 | 9
 | Same as defined in Table 2 | 1
 | Same as defined in (3) | 3
 | Same as defined in (4) | 2
C | Same as defined in Table 2 | 6

Then we implement a synthetic workload generator that provides the predicted workload of every app for the following W time slots. Intuitively, one might use a workload generator whose workload follows a Gaussian distribution.

However, such a generator mostly produces workload that fluctuates around the mean value and rarely produces steadily increasing or decreasing workload, which does occur in real-world data centers during the transition between rush hours and off hours. We therefore implement a synthetic workload generator based on a discrete Gaussian Random Walk model. For each app, we regard the predicted resource demand as a series of W non-negative integer random variables $d_1, d_2, \ldots, d_W$, starting from an initial value $d_0$, and these random variables form a time-homogeneous Markov chain. The term “Gaussian Random Walk” means that for every step from $d_t$ to $d_{t+1}$, $0 \le t < W$, we have $d_{t+1} = d_t + X_t$, where $X_t$ is an integer random variable that follows the same Gaussian distribution $\mathcal{N}(0, \sigma^2)$ at every step. Note that since the number of VMs allocated to an app must range from 0 to MAX_VM_APP, we force any negative $d$ to 0 and any $d$ larger than MAX_VM_APP to MAX_VM_APP, which makes the state space of the Markov chain a closed communicating class. In addition, prediction error is always present in the real world, so we use Additive White Gaussian Noise (AWGN) to add prediction error to our predicted workload. AWGN works similarly to the Gaussian Random Walk in that both draw from a Gaussian distribution of the form $\mathcal{N}(0, \sigma^2)$, so we can adjust the degree of workload fluctuation and the severity of prediction error by choosing different variance values.
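A minimal sketch of this generator is shown below. Because the text does not say how the Gaussian samples are discretized, this sketch assumes each step and each AWGN sample is rounded to the nearest integer, and that the noisy series is clamped to the same range [0, MAX_VM_APP]; the function names and the example value MAX_VM_APP = 10 are hypothetical.

```python
import random
from typing import List

def clamp(x: int, lo: int, hi: int) -> int:
    """Force x into [lo, hi], as the text does for negative or too-large d."""
    return max(lo, min(hi, x))

def gaussian_random_walk(d0: int, W: int, variance: float,
                         max_vm_app: int, rng: random.Random) -> List[int]:
    """Return d_1..d_W with d_{t+1} = clamp(d_t + X_t), X_t ~ round(N(0, variance))."""
    sigma = variance ** 0.5
    series, d = [], d0
    for _ in range(W):
        # Assumption: the Gaussian step is rounded to an integer number of VMs.
        d = clamp(d + round(rng.gauss(0.0, sigma)), 0, max_vm_app)
        series.append(d)
    return series

def add_awgn(series: List[int], variance: float,
             max_vm_app: int, rng: random.Random) -> List[int]:
    """Add rounded zero-mean Gaussian noise to every slot (prediction error)."""
    sigma = variance ** 0.5
    return [clamp(d + round(rng.gauss(0.0, sigma)), 0, max_vm_app) for d in series]
```

With variance 0 the walk stays at its starting value, and a start above MAX_VM_APP is clamped down to it, which makes the boundary behavior easy to check.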

We categorize the approaches used in the evaluation into five classes, listed as follows:

(1) Resizing VM only, without break-even time: This class of approaches only resizes VMs and does not consider the break-even time, i.e., the switching cost. [6] can be categorized into this class.

(2) Resizing VM only, with break-even time: This class of approaches only resizes VMs and uses the break-even time rule to balance the operating cost and the switching cost.

(3) Resizing VM and server, without break-even time (on demand): This class of approaches resizes both VMs and servers but does not consider the break-even time, i.e., the switching cost. [8] and [13] can be categorized into this class.

(4) Resizing VM and server, with break-even time: This class of approaches resizes both VMs and servers and uses the break-even time rule to … into this class.

(5) The proposed Time-Directed Dijkstra algorithm.
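The simple break-even time rule used by classes (2) and (4) can be illustrated with a greedy, single-app sketch. It assumes, as is usual in break-even analysis, that switching a resource on costs as much energy as operating it for `break_even` time slots, and it charges nothing for switching off; the function names are hypothetical, and this is the rule-based baseline that TD-D is compared against, not TD-D itself.

```python
from typing import List

def break_even_allocation(demand: List[int], initial: int, break_even: int) -> List[int]:
    """Keep an idle unit on if it is needed again within `break_even` slots."""
    alloc, prev = [], initial
    for t, d in enumerate(demand):
        # Peak demand in the next break_even slots; beyond the window nothing is known.
        future_peak = max(demand[t + 1 : t + 1 + break_even], default=0)
        keep = min(prev, future_peak)   # idle units worth keeping rather than cycling
        a = max(d, keep)
        alloc.append(a)
        prev = a
    return alloc

def energy(alloc: List[int], initial: int, op_cost: float, break_even: int) -> float:
    """Operating cost per unit-slot plus switch-on cost (= break_even * op_cost each)."""
    switch_cost = break_even * op_cost
    ons, prev = 0, initial
    for a in alloc:
        ons += max(0, a - prev)
        prev = a
    return sum(alloc) * op_cost + ons * switch_cost
```

With demand `[4, 1, 1, 4, 0, 0, 0, 0]`, 4 units initially on, `op_cost = 1` and `break_even = 3`, the rule keeps 4 units through the two-slot dip and costs 16, while on-demand resizing (`alloc = demand`) costs 10 + 3 × 3 = 19. Because the rule is applied per app, the kept VMs may in turn force extra servers to stay on, which is the effect discussed in the results below.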

Note that since each approach targets its own scenario, we assume here that all approaches can perform time-horizon optimization over W time slots, which relaxes and improves some of them. In addition, since the performance of an approximation algorithm depends heavily on its implementation, our simulation results may not reveal the true performance of the approaches that use approximation algorithms; a well-implemented approximation algorithm can often get very close to the optimal solution.

Finally, to measure computing time objectively and credibly, we implement our algorithm as a single-threaded, single-process program running on an Intel Core i5-2500 3.3 GHz machine with 8 GB of RAM.

5.2 Experiment Results and Discussion

First, we evaluate energy savings using error-free workload information. In Table 4, we compare our algorithm with the two approaches that resize only VMs. The first two approaches consume far more energy because they do not resize at the server level, and the baseline energy consumption accounts for a significant fraction of the overall energy consumption of a running server [7]. Next, we compare our algorithm with the two approaches that resize at both the VM and server levels. The results are illustrated in Fig. 4. As the degree of workload fluctuation increases, energy consumption also increases because of the growing switching cost. We notice that the approach that considers break-even time consumes more energy than the one that uses on-demand resizing. As described in Chapter 3, when a simple break-even time rule is applied, we may need to allocate more servers to accommodate the VMs that are kept during VM break-even time events. As workload fluctuation becomes more severe, more VM break-even time events occur and more servers are wasted. This is a good example of why we need an optimal algorithm rather than a best-effort algorithm based on simple rules or heuristics.

Table 4. Energy consumption (relative energy units) of the proposed TD-D compared with approaches that resize only VMs

Variance of workload fluctuation | Resizing VMs only, without break-even time [6] | Resizing VMs only, with break-even time | TD-D (proposed)
2 | 2253 | 2234 | 248
2.38 | 2314 | 2288 | 292
2.72 | 2328 | 2297 | 311
3 | 2328 | 2292 | 350

Figure 4. Energy consumption comparison of the proposed TD-D with approaches that resize both VMs and servers

In the second part, we measure the computing time of our algorithm. Since the running time of our algorithm depends mainly on the level of workload fluctuation, we record the computing time under different fluctuation levels. For comparison, we also report the computing time of the local search approach, which comes closest to TD-D in energy savings. The results are shown in Table 5. The average computing time is acceptable for a long-term resource allocation algorithm, and in a high percentage of runs the algorithm completes within one time slot (3 minutes) and returns the optimal solution. In contrast, the local search approach never completes its search within three minutes because of its unbounded search space and the lack of a stopping criterion. Since we use the relative energy unit, that is, we set the = 1, and the proposed TD-D can be completed within one time slot, we conclude that the energy consumed in running our algorithm is no more than 1. Comparing the energy our algorithm saves with the energy and time it costs demonstrates its effectiveness and efficiency in energy saving.

Table 5. Comparison of average computing time and percentage of times an algorithm completed within 3 minutes

Algorithm | Variance used in workload generator: 2 | 2.38 | 2.72 | 3

Finally, we evaluate the reliability and effectiveness of our algorithm under prediction error. Besides the reactive controller described in Chapter 4, we re-run our TD-D algorithm every 2 time slots, instead of every W time slots, to counter prediction error. A resource over-allocation incurs extra energy consumption in the form of operating cost.

Figure 5. Comparison of energy consumption under different severity of prediction error

A resource under-allocation incurs extra switching cost, since the reactive controller has to switch on new VMs/servers to fulfill the demand. Again, we use the local search approach for comparison; the reactive controller and periodic re-execution are also applied to the local search approach. We set the variance used in the workload generator to 2.38, and the results are shown in Fig. 5. As prediction error becomes more severe, energy consumption grows, due to the extra operating cost caused by over-allocation and the extra switching cost caused by under-allocation. Our algorithm still saves more energy than the comparative approach under prediction error. Another measure of robustness to prediction error is the under-allocation ratio. This evaluation is important because some energy-efficient algorithms may accept the risk of under-allocation to achieve lower energy consumption. The results are shown in Table 6. In Table 6, the "Average VM under-allocation" columns give the average number of under-allocated VMs across the whole data center, while the "per application" columns give the average for each application, which is the factor that actually affects user experience. Our algorithm keeps the resource under-allocation ratio very low even under severe prediction error. It achieves low energy consumption while keeping a low under-allocation ratio thanks to a beneficial side effect of considering break-even time: in a break-even time event, we may choose to keep a temporarily unnecessary resource, thus avoiding under-allocation if a prediction error occurs and the resource demand does not actually go down.
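Re-running the planner every 2 time slots while the reactive controller covers any shortfall can be sketched for a single app as below. The `predict` and `plan` callables are hypothetical placeholders (TD-D would take the place of `plan`), and counting switch-ons as the only switching cost is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

def simulate(actual: List[int],
             predict: Callable[[int, int], List[int]],
             plan: Callable[[int, List[int]], List[int]],
             W: int, R: int) -> Tuple[List[int], int]:
    """Re-plan every R slots over a W-slot forecast; react to any shortfall."""
    assert 1 <= R <= W, "each stored plan must cover the re-planning interval"
    current, switch_ons = 0, 0
    schedule: List[int] = []
    history: List[int] = []
    for t, demand in enumerate(actual):
        if t % R == 0:
            schedule = plan(current, predict(t, W))   # W planned allocations from slot t
        planned = schedule[t % R]
        switch_ons += max(0, planned - current)       # proactive switch-ons
        current = planned
        if demand > current:                          # reactive controller
            switch_ons += demand - current
            current = demand
        history.append(current)
    return history, switch_ons
```

If the predictor always forecasts 2 VMs while the actual demand is `[2, 3, 3, 1]`, then with `W = 7` and `R = 2` the sketch returns `([2, 3, 3, 2], 4)`: the reactive controller tops up the allocation in slots 1 and 2, which is the extra switching cost described above.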

Table 6. VM under-allocation under different AWGN variances over 7 time slots

Variance of AWGN | Average VM under-allocation (VM / time slot), local search | Average VM under-allocation (VM / time slot), TD-D | Average VM under-allocation per application (VM / (time slot × application)), local search | Average VM under-allocation per application (VM / (time slot × application)), TD-D
0.02 | 0.27 | 0.30 | 0.01 | 0.01
0.04 | 0.86 | 0.76 | 0.03 | 0.03
0.06 | 1.20 | 1.13 | 0.04 | 0.04
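The two metrics in Table 6 can be computed as below, assuming (the text does not define it) that the under-allocation of an app in a slot is its VM shortfall, max(0, demand - allocation); the function name is hypothetical. The reported values are consistent with the per-application average being the data-center-wide average divided by N = 30; for example, 0.86 / 30 ≈ 0.029, which rounds to 0.03.

```python
from typing import List, Tuple

def under_allocation(demand: List[List[int]], alloc: List[List[int]]) -> Tuple[float, float]:
    """demand[a][t] and alloc[a][t] are VM counts for app a in slot t.

    Returns (VMs per time slot, VMs per time slot per application).
    """
    n_apps, n_slots = len(demand), len(demand[0])
    # Assumption: under-allocation = VM shortfall in each (app, slot) pair.
    total = sum(max(0, d - x)
                for d_row, a_row in zip(demand, alloc)
                for d, x in zip(d_row, a_row))
    return total / n_slots, total / (n_slots * n_apps)
```

For two apps over two slots with demand `[[3, 2], [1, 1]]` and allocation `[[2, 2], [1, 0]]`, the total shortfall is 2 VMs, giving 1.0 VM per slot and 0.5 VM per slot per application.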

Chapter 6 Conclusion

6.1 Concluding Remarks

In this paper, we introduce a minimum-energy resource allocation algorithm for cloud data centers called Time-Directed Dijkstra (TD-D). It produces optimal solutions by utilizing existing load prediction approaches. We first characterize the difficulties of resizing at both the VM and server levels, and then propose an optimal algorithm that seeks the best trade-off between operating cost and switching cost to achieve minimum energy consumption. We demonstrate the correctness of our algorithm and show that even such a high-complexity problem can be solved on a commodity machine in reasonable computing time. Compared with representative best-effort dynamic resource allocation algorithms, our optimal algorithm saves more energy under different levels of workload fluctuation. We also demonstrate the robustness and energy efficiency of our algorithm under prediction error.

6.2 Future Work

The next step is to use real-world workload traces to further evaluate our algorithm. Another direction for future work is a cluster version of our algorithm: as data centers grow ever larger, a cluster version must be developed to build a decentralized resource control system for reliability and scalability.

References

[1] John J. Prevost, KranthiManoj Nagothu, Brian Kelley, and Mo Jamshidi, “Prediction of Cloud Data Center Networks Loads Using Stochastic and Neural Models,” Proc. of the 6th International Conference on System of Systems Engineering, 2011, pp. 276-281.

[2] Truong Vinh Truong Duy, Yukinori Sato, and Yasushi Inoguchi, “Performance Evaluation of a Green Scheduling Algorithm for Energy Savings in Cloud Computing,” International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010.

[3] Daniel Gmach, Jerry Rolia, Ludmila Cherkasova, and Alfons Kemper, “Workload Analysis and Demand Prediction of Enterprise Data Center Applications,” IEEE 10th International Symposium on Workload Characterization, 2007.

[4] Arijit Khan, Xifeng Yan, Shu Tao, and Nikos Anerousis, “Workload Characterization and Prediction in the Cloud: A Multiple Time Series Approach,” IEEE Network Operations and Management Symposium (NOMS), 2012.

[5] Minghong Lin, Adam Wierman, Lachlan L. H. Andrew, and Eno Thereska, “Dynamic Right-Sizing for Power-Proportional Data Centers,” IEEE INFOCOM, 2011.

[6] Chunqiang Tang, Malgorzata Steinder, Michael Spreitzer, and Giovanni Pacifici, “A Scalable Application Placement Controller for Enterprise Data Centers,” Proceedings of the 16th International Conference on World Wide Web, ACM, 2007.

[7] Dara Kusic, Jeffrey O. Kephart, James E. Hanson, Nagarajan Kandasamy, and Guofei Jiang, “Power and Performance Management of Virtualized Computing Environments via Lookahead Control,” Cluster Computing, vol. 12, no. 1, pp. 1-15, March 2009.

[8] Anton Beloglazov, Jemal Abawajy, and Rajkumar Buyya, “Energy-aware Resource Allocation Heuristics for Efficient Management of Data Centers for Cloud Computing,” Future Generation Computer Systems, vol. 28, no. 5, pp. 755-768, May 2012.

[9] Danilo Ardagna, Barbara Panicucci, Marco Trubian, and Li Zhang, “Energy-Aware Autonomic Resource Allocation in Multitier Virtualized Environments,” IEEE Transactions on Services Computing, vol. 5, no. 1, 2012.

[10] Vinicius Petrucci, Orlando Loques, and Daniel Mossé, “A Dynamic Optimization Model for Power and Performance Management of Virtualized Clusters,” Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking, 2010, pp. 225-233.

[11] Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, and Albert Zomaya, “A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems,” Advances in Computers, vol. 82, 2011.

[12] Andrew Krioukov, Prashanth Mohan, Sara Alspaugh, Laura Keys, David Culler, and Randy Katz, “NapSAC: Design and Implementation of a Power-Proportional Web Cluster,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 1, pp. 102-108, January 2011.

[13] Norman Bobroff, Andrzej Kochut, and Kirk Beaty, “Dynamic Placement of Virtual Machines for Managing SLA Violations,” 10th IFIP/IEEE International Symposium on Integrated Network Management, 2007.

From the document: A Dynamic Resource Allocation Algorithm Considering Minimum Energy Consumption for Cloud Data Centers