
5.2 Result Analysis

5.2.4 CW-CostCovB Results

The CW-CostCovB algorithm is both cost-driven and coverage-driven: it lets us decide which factor, cost or coverage, matters more. The parameters fcov and fcost are the weighting factors of extra coverage and cost, respectively. When fcov is higher than fcost, extra coverage is emphasized. We analyze every setting with fcov = 0, 0.1, …, 1, in other words fcost = 1, 0.9, …, 0. On the EFCC/TFC curve in Figure 14 (a), fcov=0 and fcost=1 means that cost is emphasized most. With fcost=1, CW-CostCovB becomes CW-CostMin, and the selected test cases provide 62.3% extra coverage. From fcov=0 to fcov=1, however, the extra coverage only ranges from 62% to 69%. The extra coverage can hardly exceed 69% because the infrastructure functions already account for about 27% of the coverage. In addition, the COSTsel/COST curve rises sharply between fcov=0.3 and fcov=0.4, where the cost increases by about 1%. Whatever value of fcov is chosen, the cost stays between 1% and 3%. The cost is small yet the extra coverage is large because the selected test cases combine high function reachability with low cost. To compare the emphasis on extra coverage with the emphasis on cost, we take the two extreme results, fcov=0 and fcov=1, as illustrated in Figure 14 (b), and normalize both against fcov=0. For about 6% extra coverage (90.44% to 96.25%), we pay 2.6 times the cost (1.097% → 2.853%) and select 1.2 times as many test cases. Hence, we recommend fcov=0 as the better choice in the MPLS test area.
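The normalization against fcov=0 is plain ratio arithmetic. The following minimal Python check uses only the coverage and cost figures quoted in the paragraph above; the helper name `ratio_to_baseline` is illustrative and does not come from the algorithm itself.

```python
def ratio_to_baseline(value, baseline):
    """Express value as a multiple of baseline."""
    return value / baseline


# Figures quoted in the text for fcov=0 (baseline) and fcov=1.
coverage_fcov0, coverage_fcov1 = 90.44, 96.25  # coverage in percent
cost_fcov0, cost_fcov1 = 1.097, 2.853          # share of total cost in percent

extra_coverage = coverage_fcov1 - coverage_fcov0           # percentage points
cost_multiple = ratio_to_baseline(cost_fcov1, cost_fcov0)  # times the baseline cost

print(f"extra coverage: {extra_coverage:.2f} percentage points")
print(f"cost multiple:  {cost_multiple:.2f}x")
```

The check reproduces the quoted trade-off: roughly 5.8 percentage points of extra coverage for about 2.6 times the cost.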

Figure 14 CW-CostCovB results

5.2.5 CW-CovMax Results

The CW-CovMax algorithm is a cost-driven method. The client can apply different selection policies by setting different restriction times; we use 500 and 1,000 minutes here. As illustrated in Figure 15 (a), coverage is 99.63% at rt=500 and 100% at rt=1,000, so a small improvement in coverage comes at a high cost. We normalize rt=1,000 against rt=500 in Figure 15 (b). Coverage improves by only 0.37% at rt=1,000, but this requires 1.44 times as many test cases and 1.98 times the cost. This means that a few functions are each covered only by particular test cases, so raising the coverage from 99% to 100% requires selecting many more test cases. We conclude that choosing an appropriate restriction time is important. In addition, Figure 15 (a) shows that 15.58% of the test cases reach 100% coverage. In other words, the functions covered by all test cases can also be covered by just 15.58% of them. There are two possible explanations. First, the suite contains old-version test cases: new test cases are added without deleting old ones, so functions covered by old-version test cases are also covered by new-version test cases. Second, the granularity of coverage is coarse. When a function is covered by any test case, we mark the function as covered, whereas in practice different test cases may exercise different parts of that function. Because testing resources are limited and function coverage is easier to manage, we use function coverage as the coverage criterion. Fault detection capability may decrease with a coarser granularity. We provide only a case study here; the real impact of reducing test cases on fault detection capability in a system with a large code size is left for future work.
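The granularity argument can be made concrete with a small Python sketch. It is only an illustration: the test names, function names and branch labels are hypothetical, and `redundant_tests` is a helper assumed for this example, not part of the selection service. A test counts as redundant here when everything it covers is also covered by the other tests.

```python
def redundant_tests(coverage):
    """Return the tests whose covered items are all covered by the other tests."""
    redundant = []
    for name, items in coverage.items():
        others = set().union(*(v for k, v in coverage.items() if k != name))
        if items <= others:
            redundant.append(name)
    return redundant


def to_function_level(branch_coverage):
    """Project (function, branch) coverage down to function coverage."""
    return {name: {func for func, _branch in items}
            for name, items in branch_coverage.items()}


# Hypothetical data: an old test and a new test exercise different
# branches of the same function "parse".
BRANCH_LEVEL = {
    "old_test": {("parse", "error_path")},
    "new_test": {("parse", "ok_path"), ("route", "ok_path")},
}
FUNCTION_LEVEL = to_function_level(BRANCH_LEVEL)

print(redundant_tests(FUNCTION_LEVEL))  # old_test looks redundant
print(redundant_tests(BRANCH_LEVEL))    # no test is redundant
```

At function granularity the old test appears removable, while at branch granularity it still covers a path that no other test reaches. This is why a coarser criterion can reduce fault detection capability.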

Figure 15 CW-CovMax results

5.2.6 CW-CostMin-C Results

The CW-CostMin-C algorithm is a coverage-driven method: the client obtains the minimal cost for a required level of coverage. It extends the CW-CostMin algorithm. We select the cheapest test cases step by step until the required coverage is reached, and each step uses the CW-CostMin algorithm. Consequently, for n = {0, 10, …, 90}, the test cases selected at ecl=n+10 must contain those selected at ecl=n. We run the algorithm from ecl=10 to ecl=100. In Figure 16 (a), the coverage is 68.82% from ecl=10 to ecl=60, because the test case selected at ecl=10 already provides 68.82% coverage. New test cases are selected only from ecl=70 onward. Because the cost changes sharply between ecl=90 and ecl=100, we focus on this interval. At ecl=90 we select 1.79% of the test cases at 0.529% of the cost, whereas at ecl=100 we select 15.85% of the test cases at 12.74% of the cost. We normalize ecl=100 against ecl=90 in Figure 16 (b): going from 90% to 100% coverage takes 8.85 times as many test cases and 24.08 times the cost. This again means that a few functions are covered only by particular test cases, so the cost grows sharply when coverage rises from 90% to 100%. Choosing an appropriate ecl is therefore important.
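The step-by-step behaviour described above can be sketched as a greedy loop in Python. This is a minimal sketch under stated assumptions, not the CW-CostMin-C implementation: ranking candidates by newly covered functions per unit cost is an assumed stand-in for the CW-CostMin weight, and the function name, test names and costs are hypothetical.

```python
def select_until_coverage(tests, all_functions, ecl_percent):
    """Greedily add tests until at least ecl_percent of all_functions is covered.

    tests maps a test name to (cost, set of covered functions); cost must be > 0.
    Returns (selected names in pick order, total cost, coverage in percent).
    """
    covered, selected, total_cost = set(), [], 0
    remaining = dict(tests)
    # Integer comparison avoids floating-point issues at the threshold.
    while 100 * len(covered) < ecl_percent * len(all_functions):
        best_name, best_ratio = None, 0.0
        for name, (cost, funcs) in remaining.items():
            gain = len(funcs - covered)  # functions this test would newly cover
            if gain > 0 and gain / cost > best_ratio:
                best_name, best_ratio = name, gain / cost
        if best_name is None:  # no remaining test adds coverage
            break
        cost, funcs = remaining.pop(best_name)
        selected.append(best_name)
        covered |= funcs
        total_cost += cost
    return selected, total_cost, 100 * len(covered) / len(all_functions)


# Hypothetical toy suite: ten functions, four tests.
ALL_FUNCTIONS = {f"f{i}" for i in range(10)}
TOY_TESTS = {
    "boot": (1, {"f0", "f1", "f2", "f3", "f4", "f5"}),
    "a": (2, {"f6", "f7"}),
    "b": (4, {"f8"}),
    "c": (30, {"f9"}),
}

for ecl in (10, 60, 70, 90, 100):
    print(ecl, select_until_coverage(TOY_TESTS, ALL_FUNCTIONS, ecl))
```

In this toy suite the first pick already yields 60% coverage, so every ecl from 10 to 60 returns the same selection. Each larger ecl extends the previous selection, and the last function alone raises the total cost from 7 to 37. The same pattern, on a much larger scale, appears between ecl=90 and ecl=100 in the measured results.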

Figure 16 CW-CostMin-C results

5.2.7 PDF-SA Results

After explaining the results of all algorithms, we turn to the PDF-SA algorithm to see the improvement it brings to each of them. The Probability Density Function (PDF) and Cumulative Distribution Function (CDF) of the test intensity of functions are shown in Figure 17 (a). To make the analysis and the figure easier to read, the values of test intensity are aggregated into divisions. For example, 20% denotes a test intensity greater than or equal to 20% and less than 25%. As Figure 17 (a) shows, the PDF is irregular. The 0% and 100% divisions are larger than the others, meaning that two large portions of functions are covered by many test cases because of the initial procedures and special features. The CDF grows slowly except at 100%. Hence, we treat functions as infrastructure functions when tit=100. There are 6,463 infrastructure functions, which do not need to be considered by the selection algorithms. In other words, we can reduce the functional space by 27.73%. For comparison with the experimental results, two other thresholds are selected, as depicted in Figure 17 (b). For test intensity thresholds of 80% and 90%, 8,427 and 7,510 functions need not be considered, which reduces the functional space by 36.20% and 32.2%, respectively.

Figure 17 PDF-SA result

Figure 18 (a) plots five lines, one for each of the five algorithms: CW-NumMin, CW-CostMin, CW-CostCovB, CW-CovMax and CW-CostMin-C. The Y-axis is the execution time of each algorithm in seconds, and the X-axis is the infrastructure threshold used in the PDF-SA algorithm. As Figure 18 (a) shows, the different test intensity policies differ greatly, especially between CW and CW_tit100, because moving from CW to CW_tit100 reduces the functional space by 27.73%. The speed-up from CW_tit100 to CW_tit90 and from CW_tit90 to CW_tit80 is not remarkable because these steps remove only a further 4.47% and 4% of the functional space, respectively.

Figure 18 Performance Improving by PDF-SA

The execution time of the CW-CostCovB algorithm is larger than that of the other algorithms because it needs two parameters, fcov and fcost, and must also accumulate the cost and EFCC of all test cases to obtain CV(). Furthermore, we normalize the execution times of CW_tit100, CW_tit90 and CW_tit80 against the execution time of CW. As illustrated in Figure 18 (b), the execution times of CW_tit100, CW_tit90 and CW_tit80 fall to 10%~70% of the original execution time, most notably for CW-CovMax and CW-CostMin-C, because these two algorithms use FC()/COST() as the CW instead of the FCC()/COST() used in the other algorithms. On average, CW_tit100, CW_tit90 and CW_tit80 cut the algorithm runtime to 48.46%, 40.91% and 33.37% of the original, respectively. Because the selection algorithms rely heavily on set operations such as union, intersection and difference, the runtime drops to 48.46% even though the infrastructure functions of CW_tit100 make up only 27.73% of the functions.

Consequently, choosing a smaller tit reduces the runtime further. On the other hand, the results of the algorithms become unreasonable when too many functions are treated as infrastructure functions.

6 Conclusions

Regression testing becomes unmanageable for a large code base. Hence, we implemented a database-driven test case selection service and defined two metrics to characterize the coverage information: the function reachability of a test and the test intensity of a function. We then adapted algorithms from previous work to the practical circumstances of the existing automated regression test system, and devised test case selection strategies for different concerns.

The CW-NumMin algorithm reduces the number of selected test cases and the cost to 1/39 and 1/43, respectively. The CW-CostMin algorithm reduces the number of selected test cases and the cost to 1/39 and 1/91, respectively. The CW-CostCovB algorithm provides both cost-driven and coverage-driven tests; for about 6% extra coverage (90.44% to 96.25%), we pay 2.6 times the cost (1.097% → 2.853%) and select 1.2 times as many test cases. The CW-CovMax algorithm provides cost-driven tests: rt=1,000 yields 0.37% more coverage than rt=500 but requires 1.44 times as many test cases and 1.98 times the cost, so choosing an appropriate restriction time is important. The CW-CostMin-C algorithm provides coverage-driven tests: going from 90% to 100% coverage requires 8.85 times as many test cases and 24.08 times the cost, so choosing an appropriate effective-confidence level is important. With the PDF-SA algorithm, CW_tit100, CW_tit90 and CW_tit80 reduce the execution time to 48.46%, 40.91% and 33.37% of the original, respectively.

These algorithms use greedy heuristics and are applicable to the MPLS tests of IOS. The experimental results show that the number of test cases and the cost are reduced to 1/39 and 1/91, respectively. In addition, these algorithms provide cost-driven and coverage-driven tests.

Future work can proceed in four directions. First, the impact of not selecting test cases that cover modified functions should be examined. Second, we have to analyze the trade-off between function coverage and fault detection capability. Because the current system targets a large code base, adopting other criteria, such as condition/branch coverage, may degrade the efficiency of the testing system and increase the algorithm runtime. Third, if many test beds are dedicated to different features, regression testing can run in parallel across features; selecting the test cases to run on the different test beds will then involve complicated hand-off costs. Finally, the test coverage generated by the original test cases may have flaws. We can compare the effectiveness of test coverage under different kinds of traffic, such as attack tools, protocol fuzzers and real traffic.

