
In our empirical study of forecasting power consumption, all the quality measures we applied identify the best-performing system consistently. As we have seen in Fig. 3, the power consumption exhibits seasonality; not surprisingly, WRASA works well for such weak irregularity. In this setting, the distinctive feature of the proposed measure (i.e., CTE), which is based on the tail of the error distribution, is not clearly visible. We therefore conduct a simulation study to investigate how well the proposed quality measure assesses a BDS with strong irregularity.

4.4.1 Simulation method

In this study we perform Monte Carlo simulations in which extreme irregular observations (jumps) are generated from two different patterns: (1) excessive volatility and (2) excessive volatility with regime switching. We generate 100 sequences of length 2^12. The trend of each sequence is based on a sine function whose amplitude and frequency are drawn from uniform distributions.

To generate the pattern (1) signals, following Sun and Meinl (2012), we add jumps to the trend. Jump occurrences are uniformly distributed (consistent with the Poisson arrival rates observed in many systems), and the jump heights are random numbers drawn from the skewed contaminated normal noise suggested by Chun et al. (2012). To generate the pattern (2) data, we repeat the method used for the pattern (1) signals but shift the trend up and down once in order to characterize a sequence of excessive volatility with regime switching. The amplitude of the shift is four times that of the original trend; see Sun et al. (2015).
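For concreteness, the following Python sketch shows one way to generate such signals. The sine trend, uniformly distributed jump times, and the fourfold regime shift follow the verbal description above, but every numerical parameter (amplitude and frequency ranges, jump intensity, contamination level, switch points) is an illustrative assumption rather than the authors' exact setting, and the two-component mixture is only a stand-in for the skewed contaminated normal noise of Chun et al. (2012).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_series(n=4096, regime_switch=False):
    """Illustrative sketch of the simulated signals described above.
    All numerical settings are placeholder assumptions."""
    t = np.arange(n)
    amp = rng.uniform(1.0, 5.0)          # assumed range for the amplitude
    freq = rng.uniform(0.001, 0.01)      # assumed range for the frequency
    trend = amp * np.sin(2 * np.pi * freq * t)

    # Pattern (1): jumps at uniformly distributed times, heights drawn from a
    # contaminated two-component normal mixture (stand-in for the skewed
    # contaminated normal noise).
    signal = trend.copy()
    n_jumps = rng.poisson(20)                        # assumed jump intensity
    jump_times = rng.integers(0, n, size=n_jumps)    # uniform jump occurrences
    contaminated = rng.random(n_jumps) < 0.1         # assumed 10% contamination
    heights = np.where(contaminated,
                       rng.normal(0.0, 10.0, n_jumps),   # heavy component
                       rng.normal(1.0, 2.0, n_jumps))    # main (shifted) component
    signal[jump_times] += heights

    # Pattern (2): additionally shift the trend up and then back down once,
    # with an amplitude four times that of the original trend.
    if regime_switch:
        lo, hi = n // 3, 2 * n // 3                  # assumed switch points
        signal[lo:hi] += 4 * amp
    return signal
```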

As in our empirical study, we need to choose a moving window for the training and forecasting operations. We set the in-sample size to 200 and the out-of-sample size to 35, 80, and 100. The number of window moves is then 720, and we generate 100 data series for each pattern. Therefore, for each pattern we test our algorithm 72,000 times for in-sample approximation and one-step-ahead forecasting; in total, our simulation comprises 216,000 runs. Cases 1–3 and Cases 4–6 are based on data patterns (1) and (2), respectively. The out-of-sample forecasting length is set to 35 for Cases 1 and 4, 80 for Cases 2 and 5, and 100 for Cases 3 and 6.
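The rolling-window evaluation can be sketched as follows. Here `forecast_one_step` is a hypothetical placeholder for the big data system under test (SOWDA, GOWDA, or WRASA), the naive last-value forecast is used purely to make the sketch runnable, and the bookkeeping of the out-of-sample span is simplified relative to the full experimental design.

```python
import numpy as np

def rolling_one_step_errors(series, in_sample=200, n_moves=720,
                            forecast_one_step=lambda window: window[-1]):
    """Slide a fixed-size training window through the series n_moves times
    and collect one-step-ahead forecast errors."""
    errors = np.empty(n_moves)
    for start in range(n_moves):
        window = series[start:start + in_sample]   # in-sample (training) data
        forecast = forecast_one_step(window)       # one-step-ahead forecast
        actual = series[start + in_sample]         # next observation
        errors[start] = actual - forecast
    return errors

# With 100 simulated series per pattern, this yields 720 x 100 = 72,000
# one-step-ahead test errors per pattern, as described above.
```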

For each run we collect the values of the six conventional criteria and the five newly proposed measures. We report the results in the next subsection.

4.4.2 Results and discussion

Table 5 reports the performance quality of the three big data systems evaluated by the six conventional criteria. When comparing the in-sample (training) performances in Case 1, WRASA is considered superior by MAPE, MAE, and RMSE, whereas SOWDA is preferred by CVM, KS, and Kuiper. Similarly, for the out-of-sample performance in Case 1, MAPE, MAE, and RMSE support WRASA while CVM, KS, and Kuiper support SOWDA. In Table 5 we highlight the best performance in bold. Obviously, we encounter difficulty in reconciling these criteria because the results are contradictory.

The contradictions fall into four classes. The first is noncoincidence among the three mean-error-based criteria (i.e., MAPE, MAE, and RMSE). For example, for the out-of-sample performance of Case 3, GOWDA is preferred by RMSE whereas MAE and MAPE support WRASA. Second, there is noncoincidence among the three extreme-error-based criteria (i.e., CVM, KS, and Kuiper). For example, for the in-sample performance of Case 4, SOWDA is preferred by CVM whereas KS and Kuiper support WRASA. Third, there is noncoincidence between the mean-error-based and the extreme-error-based criteria. For example, for the out-of-sample performance of Case 2, all mean-error-based criteria support WRASA while all extreme-error-based criteria favor GOWDA. Fourth, a criterion may fail to distinguish one system from another. For example, for the out-of-sample performance of Case 5, MAE cannot distinguish between GOWDA and WRASA because their values are tied, and KS cannot distinguish SOWDA from WRASA when comparing the in-sample performance of Case 6.

When we apply TE and CTE, the comparison becomes much easier. In Table 6, when we evaluate the in-sample performance, all the smallest values of TE and CTE point to WRASA; its in-sample performance is therefore better than that of the other two systems. When we compare the out-of-sample performance, we recognize from the results reported in Table 7 that TE does not always provide a coherent conclusion. For example, for Case 2, TE_0.95 supports GOWDA whereas TE_0.90 favors WRASA; meanwhile, TE_0.99 cannot distinguish the quality of GOWDA and WRASA (see footnote 5). Not surprisingly, CTE provides the coherent conclusion that WRASA always performs better. We have shown that CTE is a dynamic coherent measure since it satisfies all the properties given by Axiom 3.
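The following sketch shows how TE and CTE can be estimated empirically from a sample of forecast errors. It assumes, in the spirit of the value-at-risk and conditional value-at-risk treatment of Chun et al. (2012), that TE_α is the α-quantile of the absolute errors and CTE_α is the average of the largest (1 − α) fraction of absolute errors; this is an illustrative reading, not necessarily the authors' exact estimator.

```python
import numpy as np

def tail_error(errors, alpha=0.95):
    """Assumed empirical TE_alpha: the alpha-quantile of the absolute
    one-step-ahead forecast errors (analogous to value-at-risk)."""
    return np.quantile(np.abs(errors), alpha)

def conditional_tail_error(errors, alpha=0.95):
    """Assumed empirical CTE_alpha: the average of the largest (1 - alpha)
    fraction of absolute errors (analogous to conditional value-at-risk)."""
    abs_err = np.sort(np.abs(errors))
    k = int(np.ceil((1 - alpha) * abs_err.size))
    return abs_err[-k:].mean()

# Example: compare two hypothetical error samples at several confidence levels.
rng = np.random.default_rng(1)
errors_a = rng.standard_t(df=3, size=10_000)   # heavy-tailed errors
errors_b = rng.normal(0.0, 1.2, size=10_000)   # lighter-tailed but noisier errors
for alpha in (0.90, 0.95, 0.99):
    print(alpha,
          round(conditional_tail_error(errors_a, alpha), 3),
          round(conditional_tail_error(errors_b, alpha), 3))
```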

TE has several advantages. First, it is a simple quality measure with a straightforward interpretation. Second, it characterizes the complete error distribution when TE is specified for all α. Third, it focuses only on the part of the error distribution specified by α. Fourth, it can be estimated with parametric models once the error distribution is specified, and the estimation procedure is stable across different methods.

5 This tie is not an artifact of rounding.

Table 5  Comparison of the in-sample (modelling) and out-of-sample (forecasting) performances measured by six criteria for three big data systems (i.e., SOWDA, GOWDA, and WRASA) with the simulated data. The smallest values are highlighted in bold.

           In-sample                                                   Out-of-sample
           MAPE^d    MAE^b     RMSE^a    CVM^b     KS^b      Kuiper^b  MAPE^c    MAE       RMSE      CVM       KS^b      Kuiper^a
Case 1
  SOWDA    2.8609    6.0428    0.9947    1.0379    1.2090    2.2025    3.1413    0.6648    0.9646    0.3903    0.5565    0.9533
  GOWDA    2.7896    5.8902    1.4135    1.2691    1.2500    2.2670    3.0504    0.6456    0.9461    0.4086    0.5718    0.9827
  WRASA    2.5445    5.3772    0.8999    1.1001    1.2390    2.2385    3.0351    0.6424    0.9438    0.4059    0.5752    0.9889
Case 2
  SOWDA    2.8298    5.9772    0.9879    1.0270    1.1965    2.1810    4.6658    0.9860    1.3610    2.3552    1.1846    1.9354
  GOWDA    2.7454    5.7989    1.3855    1.2090    1.2530    2.2695    4.6378    0.9800    1.3562    2.3538    1.1634    1.8953
  WRASA    2.5232    5.3307    0.8929    1.1082    1.2570    2.2550    4.6254    0.9773    1.3549    2.3602    1.1704    1.9139
Case 3
  SOWDA    2.8590    6.0052    1.0208    0.9138    1.1650    2.1025    5.2998    1.1138    1.5161    2.8918    1.2516    2.0565
  GOWDA    2.8863    6.0630    1.4504    1.0890    1.1950    2.1435    5.2420    1.1016    1.5072    2.8406    1.2502    2.0351
  WRASA    2.6128    5.4872    0.9213    0.9882    1.1585    2.1010    5.2401    1.1012    1.5073    2.8597    1.2614    2.0576
Case 4
  SOWDA    5.2062   10.9598    1.5097    8.3170    2.4130    3.0445    3.7299    0.7862    1.0837    0.4759    0.5983    1.0334
  GOWDA    4.5318    9.5418    1.6574    8.7453    2.4515    3.0080    3.6405    0.7673    1.0650    0.4889    0.6036    1.0379
  WRASA    4.4446    9.3581    1.3021    8.6965    2.4075    2.9670    3.6288    0.7648    1.0631    0.4899    0.6142    1.0509
Case 5
  SOWDA    5.1722   10.8892    1.4991    8.3586    2.3820    3.0010    5.1520    1.0854    1.4590    2.5185    1.1732    1.9521
  GOWDA    4.4960    9.4653    1.6314    8.7791    2.4465    2.9910    5.1181    1.0782    1.4527    2.6139    1.2070    1.9974
  WRASA    4.4507    9.3720    1.2990    8.6072    2.4330    2.9990    5.1176    1.0782    1.4519    2.6037    1.1984    2.0014
Case 6
  SOWDA    5.1509   10.8440    1.4838    8.3519    2.3965    3.0235    5.3946    1.1364    1.5276    2.7675    1.2846    2.0928
  GOWDA    4.4693    9.4102    1.6120    8.6625    2.4400    2.9860    5.3528    1.1277    1.5226    2.7958    1.2989    2.1087
  WRASA    4.4493    9.3683    1.2994    8.5844    2.3965    2.9565    5.3423    1.1254    1.5193    2.7475    1.2876    2.0967

a, b, c, and d denote ×10^1, ×10^2, ×10^3, and ×10^4, respectively.

Table 6  Quality evaluation with TE and CTE at different confidence levels for in-sample (training) performances of three big data systems (i.e., SOWDA, GOWDA, and WRASA) with the simulated data. The smallest values are highlighted in bold.

           In-sample^a                                       In-sample^a
           TE_0.50   TE_0.75   TE_0.90   TE_0.95   TE_0.99   CTE_0.50  CTE_0.75  CTE_0.90  CTE_0.95  CTE_0.99
Case 1
  SOWDA    0.6799    1.1943    1.8204    2.4232    6.7708    1.6068    2.2979    3.5520    5.0277   10.4363
  GOWDA    0.5940    1.0297    1.5231    1.9042    3.6204    1.2454    1.6960    2.3725    3.0513    6.1440
  WRASA    0.5151    0.9094    1.3651    1.7223    3.5773    1.1323    1.5696    2.2645    3.0021    5.7232
Case 2
  SOWDA    1.1386    1.5751    2.2680    3.0041    6.1230    1.5971    2.2790    3.5115    4.9499   10.2339
  GOWDA    0.5913    1.0280    1.5226    1.8918    3.5748    1.2407    1.6889    2.3576    3.0261    6.1230
  WRASA    0.5220    0.9141    1.3724    1.7298    3.5321    1.1386    1.5751    2.2680    3.0041    5.6890
Case 3
  SOWDA    0.6855    1.2028    1.8561    2.5256    7.1467    1.6442    2.3666    3.6996    5.2630   10.6361
  GOWDA    0.5872    1.0208    1.5091    1.8872    3.8229    1.2414    1.6958    2.3891    3.1630    6.5599
  WRASA    0.5243    0.9203    1.3890    1.7574    3.7560    1.1618    1.6169    2.3590    3.0993    5.9082
Case 4
  SOWDA    1.1815    2.0537    3.0416    3.8101    7.4555    2.4849    3.3867    4.7422    6.0995   11.0469
  GOWDA    1.0731    1.8510    2.7011    3.2801    4.8327    2.1172    2.8008    3.6694    4.3751    6.9591
  WRASA    0.9073    1.5848    2.3596    2.9260    4.6202    1.8667    2.5141    3.3980    4.1808    6.6096
Case 5
  SOWDA    1.1863    2.0604    3.0493    3.8182    7.3980    2.4854    3.3810    4.7198    6.0468   10.8072
  GOWDA    1.0751    1.8516    2.7001    3.2738    4.8124    2.1173    2.8006    3.6646    4.3683    6.9476
  WRASA    0.9165    1.5987    2.3699    2.9216    4.6319    1.8745    2.5191    3.3956    4.1719    6.5948
Case 6
  SOWDA    1.1835    2.0532    3.0421    3.7938    7.3004    2.4751    3.3653    4.6857    5.9938   10.6652
  GOWDA    1.0761    1.8501    2.6987    3.2771    4.8358    2.1170    2.8008    3.6666    4.3736    6.7719
  WRASA    0.9151    1.5938    2.3663    2.9104    4.5879    1.8667    2.5057    3.3695    4.1260    6.5969

a denotes ×10^1.

Table 7  Quality evaluation with TE and CTE at different confidence levels for out-of-sample (forecasting) performances of three big data systems (i.e., SOWDA, GOWDA, and WRASA) with the simulated data. The smallest values are highlighted in bold.

           Out-of-sample                                     Out-of-sample
           TE_0.50   TE_0.75   TE_0.90   TE_0.95   TE_0.99   CTE_0.50  CTE_0.75  CTE_0.90  CTE_0.95  CTE_0.99
Case 1
  SOWDA    0.4997    0.8934    1.2971    1.8829    3.6543    1.0998    1.5153    2.1916    2.8565    4.1205
  GOWDA    0.4767    0.8732    1.2716    1.8254    3.6390    1.0749    1.4879    2.1549    2.8234    4.1036
  WRASA    0.4746    0.8670    1.2624    1.8180    3.6373    1.0698    1.4819    2.1499    2.8231    4.1015
Case 2
  SOWDA    0.7504    1.3397    2.1330    2.9630    4.2837    1.6291    2.2333    3.1110    3.7346    4.5810
  GOWDA    0.7436    1.3335    2.1276    2.9551    4.2806    2.2407    2.2407    3.1160    3.7402    4.5839
  WRASA    0.7388    1.3327    2.1240    2.9601    4.2806    1.6267    2.2326    3.1101    3.7339    4.5806
Case 3
  SOWDA    0.8525    1.5219    2.4831    3.2664    4.5814    2.5224    2.5224    3.4307    4.0294    4.7852
  GOWDA    0.8355    1.5024    2.4679    3.2666    4.5826    1.8320    2.5132    3.4297    4.0328    4.7870
  WRASA    0.8332    1.5040    2.4719    3.2715    4.5690    1.8317    2.5111    3.4292    4.0293    4.7741
Case 4
  SOWDA    0.6205    1.0802    1.5386    2.0920    3.7333    1.7237    1.7237    2.3923    3.0251    4.2306
  GOWDA    0.5981    1.0590    1.5143    2.0509    3.7052    1.2561    1.6967    2.3585    2.9884    4.2017
  WRASA    0.5954    1.0517    1.5160    2.0542    3.7080    1.2528    1.6950    2.3566    2.9873    4.1962
Case 5
  SOWDA    0.8566    1.4935    2.2958    3.0761    4.4892    2.3937    2.3937    3.2746    3.8909    4.7536
  GOWDA    0.8493    1.4868    2.2752    3.0535    4.4892    1.7665    2.3855    3.2626    3.8817    4.7532
  WRASA    0.8475    1.4881    2.2790    3.0597    4.4877    1.7656    2.3823    3.2592    3.8806    4.7492
Case 6
  SOWDA    0.8884    1.5726    2.4578    3.2160    4.5947    2.5236    2.5236    3.4186    4.0388    4.8445
  GOWDA    0.8780    1.5577    2.4405    3.2206    4.6037    1.8569    2.5161    3.4188    4.0445    4.8539
  WRASA    0.8775    1.5559    2.4337    3.2129    4.5809    1.8529    2.5090    3.4115    4.0292    4.8283

However, quality control using TE may lead to undesirable results for skewed error distributions. TE does not consider the properties of the error distribution beyond α, and it is non-convex and discontinuous for discrete error distributions.

The advantages of applying CTE are threefold. First, it has superior mathematical properties, as we have shown, and preserves some properties of TE, such as its easy interpretation and its complete characterization of the error distribution over all α. Second, it is continuous with respect to α and coherent. Third, when making decisions, we can optimize it with convex programming, since CTE_α(ω_1 K_1 + · · · + ω_n K_n) is a convex function of the weights ω_1, . . . , ω_n with ω_1 + · · · + ω_n = 1. However, challenges still remain when estimating CTE, as it is heavily affected by tail modeling.
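As an illustration of this convexity claim, the sketch below minimizes the empirical CTE of a weighted combination of error samples using the standard Rockafellar-Uryasev linear-programming representation of a conditional tail measure. The function name and the assumption that per-task error samples K_i are available as columns of `error_samples` are ours; this is not the authors' own optimization routine.

```python
import numpy as np
from scipy.optimize import linprog

def min_cte_weights(error_samples, alpha=0.95):
    """Minimize CTE_alpha(w_1 K_1 + ... + w_n K_n) over nonnegative weights
    summing to one, via the Rockafellar-Uryasev LP representation.

    error_samples: (m, n) array; column i holds m error realizations of
    task/system K_i (e.g., absolute forecast errors).
    """
    E = np.asarray(error_samples, dtype=float)
    m, n = E.shape
    # Decision vector: [w_1..w_n, t, u_1..u_m]
    c = np.concatenate([np.zeros(n), [1.0], np.full(m, 1.0 / ((1 - alpha) * m))])
    # Constraint u_j >= E_j.w - t, written as E_j.w - t - u_j <= 0
    A_ub = np.hstack([E, -np.ones((m, 1)), -np.eye(m)])
    b_ub = np.zeros(m)
    # Weights sum to one
    A_eq = np.concatenate([np.ones(n), [0.0], np.zeros(m)]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * n + [(None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:n], res.fun   # optimal weights and the minimized CTE value
```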

Coherence matters when we conduct system engineering, either by reconfiguring systems or by aggregating tasks. System engineering here refers to simple configurations, such as parallelization, and to complex ones, such as centralization, fragmentation, or distribution, as shown in Fig. 1. TE is not coherent because it does not satisfy Axiom 2-(2) in Sect. 2.3.1, which states that if we combine two independent tasks, the total error of the combination is not greater than the sum of the errors associated with each of them. When we reconfigure systems, we try to avoid creating any extra errors (or system instability), but this cannot be guaranteed. In our empirical study, all three big data systems apply only a plain configuration because the input data are simple high-frequency time series and the systems work on the single task of forecasting the load.

Therefore, we can apply TE and CTE for dynamic measurement following Definition 3.
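To make the point of Axiom 2-(2) concrete, the toy check below applies the empirical readings of TE and CTE sketched earlier to two independent tasks whose errors are usually zero but occasionally large. The error model is purely illustrative: the α-quantile measure (TE) of the combined errors can exceed the sum of the individual values, whereas the tail-mean measure (CTE) does not.

```python
import numpy as np

rng = np.random.default_rng(2)

def te(e, alpha):
    # assumed empirical TE: the alpha-quantile of the errors
    return np.quantile(e, alpha)

def cte(e, alpha):
    # assumed empirical CTE: average of the largest (1 - alpha) fraction of errors
    s = np.sort(e)
    k = int(np.ceil((1 - alpha) * s.size))
    return s[-k:].mean()

# Two independent tasks with rare, large errors (illustrative error model).
n = 1_000_000
e1 = np.where(rng.random(n) < 0.04, 10.0, 0.0)
e2 = np.where(rng.random(n) < 0.04, 10.0, 0.0)

alpha = 0.95
# TE of each task is 0, yet TE of the combined task is about 10,
# so the sum bound of Axiom 2-(2) fails for TE.
print("TE :", te(e1, alpha) + te(e2, alpha), "vs combined", te(e1 + e2, alpha))
# CTE of the combined task (about 10.3) stays below the sum of the
# individual CTEs (about 16), consistent with coherence.
print("CTE:", cte(e1, alpha) + cte(e2, alpha), "vs combined", cte(e1 + e2, alpha))
```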

Accuracy is ensured by systemic computability: many computational methods can be used to describe the probability distribution of the errors accurately, so we can obtain definite values for TE and CTE. Compared with conventional methods, more suitable models or computational methods allow us to further reduce the limitations of tail measures. If we interpret robustness as consistency and efficiency as comparability, then we can confidently apply CTE, since its robustness and efficiency make it a preferred method for quality control. Therefore, the only thing we need for conducting quality control is to choose or search for a proper α that reflects our own utility.

5 Conclusion

In this article we highlighted the challenges of big data systems and introduced some background information focusing on their quality management. We described the prevailing paradigm for quality management of these systems and proposed a new axiomatic framework for quality management of a big data system (BDS) based on the tail error (TE) and conditional tail error (CTE) of the system. From a decision analysis perspective, particularly for the CTE, a dynamic coherent mechanism can be applied to monitor these quality errors by continuously incorporating new information throughout the whole systemic process.

In addition, the time-invariance property of the conditional tail error has been well defined to ensure that the inherent characteristics of a BDS are able to satisfy quality requirements.

We applied the proposed methods to three big data systems (i.e., SOWDA, GOWDA, and WRASA) based on wavelet algorithms and conducted an empirical study to evaluate their performance in analyzing and forecasting big electricity demand data from France. Our empirical results confirmed the efficiency and robustness of the method proposed herein. In order to fully illustrate the advantages of the proposed method, we also provided a simulation study to highlight some features that could not be observed in our empirical study. We believe our results will enrich future applications in quality management.

As we have seen, the three big data systems investigated in this article are mainly fragmented (decentralized) systems in which the interconnection of the components is still hierarchical in the algorithms. When the underlying big data system exhibits strongly distributed features, for example when there are a tremendous number of interoperations, more attention should be paid to how to execute the dynamic coherent quality measure proposed in this article. In addition, robust tests focusing on validating its conditional correlation over time should be developed in future studies.

Acknowledgements The authors would like to thank the three anonymous reviewers and the guest editor for providing valuable comments. This work was supported in part by the Ministry of Science and Technology (MOST) under Grant 106-2221-E-009-006 and Grant 106-2221-E-009-049-MY2, in part by the “Aiming for the Top University Program” of National Chiao Tung University and the Ministry of Education, Taiwan, and in part by Academia Sinica AS-105-TP-A07 and Ministry of Economic Affairs (MOEA) 106-EC-17-A-24-0619.

References

Agarwal, R., Green, R., Brown, P., Tan, H., & Randhawa, K. (2013). Determinants of quality management practices: An empirical study of New Zealand manufacturing firms. International Journal of Production Economics, 142, 130–145.

Artzner, P., Delbaen, F., Eber, J., Heath, D., & Ku, K. (2007). Coherent multiperiod risk adjusted values and Bellman’s principle. Annals of Operations Research, 152, 5–22.

Baucells, M., & Borgonovo, E. (2013). Invariant probabilistic sensitivity analysis. Management Science, 59(11), 2536–2549.

Bion-Nadal, J. (2008). Dynamic risk measures: Time consistency and risk measures from BMO martingales. Finance and Stochastics, 12(2), 219–244.

Bion-Nadal, J. (2009). Time consistent dynamic risk processes. Stochastic Processes and their Applications, 119(2), 633–654.

Chen, Y., & Sun, E. (2015). Jump detection and noise separation with singular wavelet method for high-frequency data. Working paper of KEDGE BS.

Chen, Y., & Sun, E. (2018). Chapter 8: Automated business analytics for artificial intelligence in big data @x 4.0 era. In M. Dehmer & F. Emmert-Streib (Eds.), Frontiers in Data Science (pp. 223–251). Boca Raton: CRC Press.

Chen, Y., Sun, E., & Yu, M. (2015). Improving model performance with the integrated wavelet denoising method. Studies in Nonlinear Dynamics and Econometrics, 19(4), 445–467.

Chen, Y., Sun, E., & Yu, M. (2017). Risk assessment with wavelet feature engineering for high-frequency portfolio trading. Computational Economics. https://doi.org/10.1007/s10614-017-9711-7.

Cheridito, P., & Stadje, M. (2009). Time-inconsistency of VaR and time-consistent alternatives. Finance Research Letters, 6, 40–46.

Chun, S., Shapiro, A., & Uryasev, S. (2012). Conditional value-at-risk and average value-at-risk: Estimation and asymptotics. Operations Research, 60(4), 739–756.

David, H., & Nagaraja, H. (2003). Order statistics (3rd ed.). Hoboken: Wiley.

Deichmann, J., Roggendorf, M., & Wee, D. (2015). McKinsey Quarterly, November: Preparing IT systems and organizations for the Internet of Things. McKinsey & Company.

Hazen, B., Boone, C., Ezell, J., & Jones-Farmer, J. (2014). Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. International Journal of Production Economics, 154, 72–80.

Keating, C., & Katina, P. (2011). Systems of systems engineering: Prospects and challenges for the emerging field. International Journal of System of Systems Engineering, 2(2/3), 234–256.

Liu, Y., Muppala, J., Veeraraghavan, M., Lin, D., & Hamdi, M. (2013). Data center networks: Topologies, architectures and fault-tolerance characteristics. Berlin: Springer.

Maier, M. (1998). Architecting principles for systems-of-systems. Systems Engineering, 1(4), 267–284.

Mellat-Parst, M., & Digman, L. (2008). Learning: The interface of quality management and strategic alliances. International Journal of Production Economics, 114, 820–829.

O’Neill, P., Sohal, A., & Teng, W. (2015). Quality management approaches and their impact on firms’ financial performance—An Australian study. International Journal of Production Economics. https://doi.org/10.1016/j.ijpe.2015.07.015i.

Parast, M., & Adams, S. (2012). Corporate social responsibility, benchmarking, and organizational performance in the petroleum industry: A quality management perspective. International Journal of Production Economics, 139, 447–458.

Pham, H. (2006). System software reliability. Berlin: Springer.

Riedel, F. (2004). Dynamic coherent risk measures. Stochastic Processes and their Applications, 112(2), 185–200.

Shooman, M. (2002). Reliability of computer systems and networks: Fault tolerance analysis and design. Hoboken: Wiley.

Sun, E., Chen, Y., & Yu, M. (2015). Generalized optimal wavelet decomposing algorithm for big financial data. International Journal of Production Economics, 165, 161–177.

Sun, E., & Meinl, T. (2012). A new wavelet-based denoising algorithm for high-frequency financial data mining. European Journal of Operational Research, 217, 589–599.

Sun, W., Rachev, S., & Fabozzi, F. (2007). Fractals or I.I.D.: Evidence of long-range dependence and heavy tailedness from modeling German equity market returns. Journal of Economics and Business, 59, 575–595.

Sun, W., Rachev, S., & Fabozzi, F. (2009). A new approach for using Lévy processes for determining high-frequency value-at-risk predictions. European Financial Management, 15(2), 340–361.

Wu, S., & Zhang, D. (2013). Analyzing the effectiveness of quality management practices in China. Interna-tional Journal of Production Economics, 144, 281–289.
