
Experimental Results and Discussions

There is no standard method in the literature for determining the parameter values of these models. In this study, the model parameters were set to values commonly adopted in past research. Moreover, the parameter values were calibrated so that the EW-OMR value at the training stage fell between 0.1 and 0.2, in order to avoid over-fitting.
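The calibration criterion can be sketched as a filter over candidate parameter settings. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function `calibrate`, the callable that returns the training-stage EW-OMR for a setting, and the parameter `min_leaf_size` with its scores are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Params = Dict[str, float]

def calibrate(candidates: Iterable[Params],
              training_ew_omr: Callable[[Params], float],
              low: float = 0.1, high: float = 0.2) -> List[Params]:
    """Keep parameter settings whose training-stage EW-OMR lies in [low, high].

    A training EW-OMR below `low` suggests the model fits the training data
    too closely (over-fitting); settings above `high` fall outside the target
    band as well, so both are rejected.
    """
    accepted = []
    for params in candidates:
        if low <= training_ew_omr(params) <= high:
            accepted.append(params)
    return accepted

# Hypothetical training-stage EW-OMR values for a hypothetical DT parameter.
toy_scores = {2: 0.04, 6: 0.12, 8: 0.16, 12: 0.24}
candidates = [{"min_leaf_size": k} for k in toy_scores]
accepted = calibrate(candidates, lambda p: toy_scores[p["min_leaf_size"]])
print([p["min_leaf_size"] for p in accepted])  # [6, 8]
```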

The experimental results for assessment accuracy are shown in Table 3 (EW-OMR) and Table 4 (Type I MR and Type II MR) for both types of software assessment models, and the results for assessment efficiency, in terms of TNoUVarsM, ANoUVarsAP and NoUVarsLAP, are shown in Table 5. In these tables, DT1 denotes a decision tree with the bi-splitting method; DT2 denotes a decision tree with the multi-splitting method of one value at each branch; and DT3 denotes a decision tree with the multi-splitting method of one subset of values at each branch.

Three modeling techniques, namely FF-BP ANN, DA and MNLR, were used to build the one-layered software assessment model, and three different splitting methods of the DT algorithm were used to build the multi-layered software assessment model. For each performance measure, the value reported for the one-layered and the multi-layered assessment model is the average over the three modeling techniques and over the three DT splitting methods, respectively. Moreover, each DT-based model produces two results, one unpruned and one pruned, so each DT performance value is the mean of these two outcomes and represents the model's average performance.
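The two-level averaging described above can be sketched as follows. The helper names are hypothetical; the group averages are recomputed from the Table 3 means, while the pruned and unpruned inputs in the last line are made-up values used only to show the first averaging step.

```python
from statistics import mean
from typing import Dict

def dt_model_result(unpruned: float, pruned: float) -> float:
    """First level: a DT model's value is the mean of its unpruned and pruned outcomes."""
    return mean([unpruned, pruned])

def group_average(results: Dict[str, float]) -> float:
    """Second level: plain average over the three techniques or splitting methods."""
    return mean(results.values())

# Mean EW-OMR values (%) taken from Table 3.
multi_layered = {"DT1": 27.33, "DT2": 29.53, "DT3": 31.72}
one_layered = {"FF-BP ANN": 26.07, "DA": 35.70, "MNLR": 35.63}

print(round(group_average(multi_layered), 2))       # 29.53
print(round(group_average(one_layered), 2))         # 32.47
print(dt_model_result(unpruned=30.0, pruned=24.0))  # 27.0 (hypothetical inputs)
```

The recomputed averages match the "Average" columns of Table 3.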

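Table 3: The experimental results of the EW-OMR measure

| EW-OMR | DT1 | DT2 | DT3 | Multi-layered average | FF-BP ANN | DA | MNLR | One-layered average |
|---|---|---|---|---|---|---|---|---|
| Mean | 27.33% | 29.53% | 31.72% | 29.53% | 26.07% | 35.70% | 35.63% | 32.47% |
| Std. Dev. | 4.51% | 1.38% | 1.25% | 2.38% | 3.56% | 3.64% | 2.00% | 3.07% |

DT1, DT2 and DT3 are multi-layered software assessment models; FF-BP ANN, DA and MNLR are one-layered software assessment models.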

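Table 4: The experimental results of the Type I MR and Type II MR measures

| Measure | Statistic | DT1 | DT2 | DT3 | Multi-layered average | FF-BP ANN | DA | MNLR | One-layered average |
|---|---|---|---|---|---|---|---|---|---|
| Type I MR | Mean | 16.48% | 7.28% | 18.92% | 14.35% | 11.11% | 25.57% | 19.44% | 18.71% |
| Type I MR | Std. Dev. | 13.34% | 2.43% | 5.34% | 7.04% | 13.18% | 5.01% | 6.00% | 8.06% |
| Type II MR | Mean | 53.03% | 84.85% | 63.64% | 67.17% | 63.64% | 60.61% | 75.76% | 66.67% |
| Type II MR | Std. Dev. | 19.05% | 4.29% | 12.86% | 12.07% | 29.69% | 22.68% | 11.34% | 21.24% |

DT1, DT2 and DT3 are multi-layered software assessment models; FF-BP ANN, DA and MNLR are one-layered software assessment models.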

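Table 5: The experimental results of the TNoUVarsM, ANoUVarsAP and NoUVarsLAP measures

| Measure | Statistic | DT1 | DT2 | DT3 | Multi-layered average | FF-BP ANN | DA | MNLR | One-layered average |
|---|---|---|---|---|---|---|---|---|---|
| TNoUVarsM | Mean | 2.67 | 0.50 | 2.83 | 2.00 | 27 | 5.33 | 6.33 | 12.89 |
| TNoUVarsM | Std. Dev. | 0.47 | 0 | 0.62 | 0.36 | 0 | 1.25 | 2.05 | 1.10 |
| ANoUVarsAP | Mean | 2.13 | 0.5 | 1.99 | 1.54 | 27 | 5.33 | 6.33 | 12.89 |
| ANoUVarsAP | Std. Dev. | 0.35 | 0 | 0.23 | 0.19 | 0 | 1.25 | 2.05 | 1.10 |
| NoUVarsLAP | Mean | 1.95 | 0.5 | 2.67 | 1.95 | 27 | 5.33 | 6.33 | 12.89 |
| NoUVarsLAP | Std. Dev. | 0.31 | 0 | 0.47 | 0.31 | 0 | 1.25 | 2.05 | 1.10 |

DT1, DT2 and DT3 are multi-layered software assessment models; FF-BP ANN, DA and MNLR are one-layered software assessment models.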

4.1 Assessment Accuracy

Table 3 shows the mean and standard deviation of the EW-OMR results for both kinds of software assessment models (lower values are better). The average EW-OMR of the three multi-layered software assessment models built with different DT splitting methods is 29.53%, and that of the three one-layered software assessment models is 32.47%. However, the FF-BP ANN model has the best (lowest) value, 26.07%, among all assessment models. Thus, all three DT-based multi-layered software assessment models are slightly less accurate than the FF-BP ANN one-layered model, but more accurate than the DA and MNLR one-layered models. Hence, it is hard to conclude which type of software assessment model has better assessment accuracy. Nevertheless, the results indicate that a multi-layered software assessment model can maintain acceptable assessment accuracy if appropriate values are given to the parameters of the DT assessment model.
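The ranking argument can be checked directly from the Table 3 means. This small sketch sorts the six models by mean EW-OMR and confirms that every DT-based model lies between FF-BP ANN and the DA and MNLR models.

```python
# Mean EW-OMR (%) per model, from Table 3; a lower EW-OMR is better.
ew_omr = {
    "DT1": 27.33, "DT2": 29.53, "DT3": 31.72,         # multi-layered
    "FF-BP ANN": 26.07, "DA": 35.70, "MNLR": 35.63,  # one-layered
}

ranking = sorted(ew_omr, key=ew_omr.get)
print(ranking)  # ['FF-BP ANN', 'DT1', 'DT2', 'DT3', 'MNLR', 'DA']

# Every DT model is worse than FF-BP ANN but better than both DA and MNLR.
between = all(
    ew_omr["FF-BP ANN"] < ew_omr[m] < min(ew_omr["DA"], ew_omr["MNLR"])
    for m in ("DT1", "DT2", "DT3")
)
print(between)  # True
```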

Table 4 shows the Type I MR and Type II MR results for both types of software assessment models. The average Type I MR of the three multi-layered software assessment models built with different DT splitting methods (14.35%) is lower than that of the three one-layered software assessment models (18.71%), whereas the average Type II MR values of the two types of models are nearly identical (67.17% versus 66.67%).
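For a two-class problem, the two partial misclassification rates can be computed as sketched below. The definitions are an assumption inferred from the discussion that follows (Type I MR as the misclassification rate of the low-cost class, Type II MR as that of the high-cost class); the class labels and example data are hypothetical.

```python
from typing import Sequence, Tuple

def type_i_ii_mr(actual: Sequence[str], predicted: Sequence[str],
                 low: str = "low", high: str = "high") -> Tuple[float, float]:
    """Return (Type I MR, Type II MR) for a two-class assessment problem.

    Assumed definitions: Type I MR = share of actual low-cost cases that are
    misclassified; Type II MR = share of actual high-cost cases that are
    misclassified.
    """
    low_cases = [p for a, p in zip(actual, predicted) if a == low]
    high_cases = [p for a, p in zip(actual, predicted) if a == high]
    type_i = sum(p != low for p in low_cases) / len(low_cases)
    type_ii = sum(p != high for p in high_cases) / len(high_cases)
    return type_i, type_ii

# Hypothetical imbalanced sample: eight low-cost cases, two high-cost cases.
actual = ["low"] * 8 + ["high"] * 2
predicted = ["low"] * 7 + ["high"] + ["low", "high"]
print(type_i_ii_mr(actual, predicted))  # (0.125, 0.5)
```

With only two high-cost cases, a single miss already yields a Type II MR of 50%, which hints at why rates on a minority class can be both high and unstable.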

The lower average Type I MR implies that the multi-layered software assessment model outperforms the one-layered software assessment model on this partial misclassification rate measure of assessment accuracy. In addition, the Type II MR values are much higher than the Type I MR values for every model, which implies that none of these models automatically copes with the imbalanced class distribution of the software measurement data. Two approaches are commonly used to address this problem. First, Type II MR can be lowered at the cost of a higher Type I MR; that is, software appraisers can accept higher misclassification rates for low-cost classes in exchange for lower misclassification rates for high-cost classes, reducing the probability and impact of misclassifying high-cost cases. Second, the imbalance in the model-building dataset can be reduced, which means that the collected software measurement data should be as complete and representative as possible.
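The first approach, trading a higher Type I MR for a lower Type II MR, is often implemented by shifting the decision threshold on a model's score for the high-cost class. The sketch below illustrates that general technique only; it is not taken from the study, and the scores, labels and thresholds are hypothetical.

```python
from typing import List, Tuple

def rates_at_threshold(scores: List[float], actual: List[str],
                       threshold: float) -> Tuple[float, float]:
    """(Type I MR, Type II MR) when cases scoring >= threshold are called high-cost."""
    predicted = ["high" if s >= threshold else "low" for s in scores]
    lows = [p for p, a in zip(predicted, actual) if a == "low"]
    highs = [p for p, a in zip(predicted, actual) if a == "high"]
    type_i = lows.count("high") / len(lows)    # low-cost cases flagged as high-cost
    type_ii = highs.count("low") / len(highs)  # high-cost cases missed
    return type_i, type_ii

# Hypothetical model scores (estimated likelihood that a case is high-cost).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.25, 0.40, 0.70]
actual = ["low"] * 7 + ["high"] * 3

for t in (0.5, 0.3):
    type_i, type_ii = rates_at_threshold(scores, actual, t)
    print(f"threshold={t}: Type I MR={type_i:.2f}, Type II MR={type_ii:.2f}")
```

Lowering the threshold from 0.5 to 0.3 raises Type I MR from about 0.14 to 0.57 and lowers Type II MR from about 0.67 to 0.33.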

4.2 Assessment Efficiency

According to the results in Table 5, the average TNoUVarsM, ANoUVarsAP and NoUVarsLAP values of the three DT-based multi-layered software assessment models (2.00, 1.54 and 1.95, respectively) are much lower, and therefore much better, than those of the three one-layered software assessment models (FF-BP ANN, DA and MNLR), which are 12.89 for all three indicators. Because a one-layered software assessment model has only one assessment layer, the values of these three indicators are identical for it. Hence, the multi-layered software assessment models clearly outperform the one-layered software assessment models in assessment efficiency.
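The efficiency comparison can be illustrated with a toy decision tree. The definitions used here are assumptions, since this section does not restate them: TNoUVarsM is taken as the number of distinct variables used anywhere in the model, and ANoUVarsAP as the unweighted average number of distinct variables along a root-to-leaf assessment path. NoUVarsLAP is omitted because its exact definition is not given here. The tree encoding and variable names are hypothetical.

```python
from statistics import mean
from typing import List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a class label (str); an internal node is
# (variable_name, {branch_value: subtree, ...}).
Tree = Union[str, tuple]

def path_variables(tree: Tree, used: frozenset = frozenset()) -> List[Set[str]]:
    """Distinct variables examined along each root-to-leaf assessment path."""
    if isinstance(tree, str):
        return [set(used)]
    var, children = tree
    paths = []
    for child in children.values():
        paths.extend(path_variables(child, used | {var}))
    return paths

def efficiency(paths: List[Set[str]]) -> Tuple[int, float]:
    total = len(set().union(*paths))       # assumed TNoUVarsM
    average = mean(len(p) for p in paths)  # assumed ANoUVarsAP (unweighted)
    return total, average

tree = ("size", {
    "small": "low",
    "large": ("complexity", {
        "low": "low",
        "high": ("defects", {"few": "medium", "many": "high"}),
    }),
})
print(efficiency(path_variables(tree)))  # (3, 2.25)

# A one-layered model reads all of its inputs in a single step: one "path".
one_layered = [{f"v{i}" for i in range(1, 28)}]
print(efficiency(one_layered))           # (27, 27)
```

The single-path case shows why the indicators coincide for a one-layered model, as noted above.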

Among the DT-based multi-layered software assessment models, the one using the multi-splitting method of one value at each branch (DT2) is superior to those using the other two splitting methods on all three assessment efficiency indicators. The DT model using the multi-splitting method of one subset of values at each branch (DT3) is the worst of the three on TNoUVarsM and NoUVarsLAP, although DT1 has a slightly higher ANoUVarsAP value (2.13 versus 1.99). Among the one-layered software assessment models, the DA model has lower values (5.33) of the TNoUVarsM, ANoUVarsAP and NoUVarsLAP indicators than the FF-BP ANN and MNLR models, and the FF-BP ANN assessment model (27) is the worst on these three indicators.
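Because the three indicators rank the DT models slightly differently, a per-indicator check of the Table 5 means is a simple way to confirm which splitting method is best and worst on each one:

```python
# Mean values from Table 5 for the DT-based multi-layered models (lower is better).
table5 = {
    "TNoUVarsM": {"DT1": 2.67, "DT2": 0.50, "DT3": 2.83},
    "ANoUVarsAP": {"DT1": 2.13, "DT2": 0.50, "DT3": 1.99},
    "NoUVarsLAP": {"DT1": 1.95, "DT2": 0.50, "DT3": 2.67},
}

# For each indicator: (best model, worst model).
summary = {
    indicator: (min(values, key=values.get), max(values, key=values.get))
    for indicator, values in table5.items()
}
for indicator, (best, worst) in summary.items():
    print(f"{indicator}: best={best}, worst={worst}")
```

DT2 is best on all three indicators; DT3 is worst on TNoUVarsM and NoUVarsLAP, while DT1 is worst on ANoUVarsAP.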

4.3 Synthetic Discussion

Considering both assessment accuracy and assessment efficiency, the multi-layered software assessment model has a clear advantage in assessment efficiency over the one-layered software assessment model. However, all multi-layered software assessment models have slightly worse EW-OMR values than the FF-BP ANN model. Hence, based on an empirical experiment using a software measurement dataset, it can be concluded that, compared with the best one-layered software assessment model, the multi-layered software assessment models slightly decrease assessment accuracy but provide better assessment efficiency.
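The accuracy versus efficiency trade-off can be quantified for each DT model against FF-BP ANN, the most accurate one-layered model, using the means in Tables 3 and 5. The two derived quantities below (the EW-OMR gap in percentage points and the relative reduction in TNoUVarsM) are illustrative choices, not measures defined in the study.

```python
# Means from Table 3 (EW-OMR, %) and Table 5 (TNoUVarsM).
reference = {"ew_omr": 26.07, "vars": 27}  # FF-BP ANN, the most accurate one-layered model
dt_models = {
    "DT1": {"ew_omr": 27.33, "vars": 2.67},
    "DT2": {"ew_omr": 29.53, "vars": 0.50},
    "DT3": {"ew_omr": 31.72, "vars": 2.83},
}

tradeoffs = {}
for name, m in dt_models.items():
    gap = m["ew_omr"] - reference["ew_omr"]    # accuracy lost, percentage points
    saved = 1 - m["vars"] / reference["vars"]  # share of variables no longer needed
    tradeoffs[name] = (gap, saved)
    print(f"{name}: EW-OMR +{gap:.2f} pp, {saved:.1%} fewer variables")
```

Even the least efficient DT model needs only about a tenth of the variables used by FF-BP ANN, at an EW-OMR cost of 1.26 to 5.65 percentage points across the three DT models.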

This finding suggests that an appraiser may not need to collect all measurement data (independent variables) to reach the required level of assessment accuracy in a particular software process or product assessment problem. Instead, the appraiser may adopt a multi-layered assessment model, such as a DT-based assessment model, and collect the measurement data gradually. This approach can save considerable data-collection effort while still maintaining acceptable assessment accuracy.
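Gradual data collection with a DT-based model can be sketched as a tree walk that requests a measurement only when the current node needs it. The tree encoding, the `collect` callback and the example variables are hypothetical; in practice `collect` might prompt the appraiser or query a measurement tool.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical encoding: a leaf is a class label; an internal node is
# (variable_name, {branch_value: subtree, ...}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]

def assess_gradually(tree: Tree, collect: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Walk the tree, collecting each measurement only when a node needs it.

    Returns the assessed class and the variables that actually had to be collected.
    """
    collected: List[str] = []
    node = tree
    while not isinstance(node, str):
        var, branches = node
        value = collect(var)  # e.g. ask the appraiser or query a measurement tool
        collected.append(var)
        node = branches[value]
    return node, collected

tree = ("size", {
    "small": "low",
    "large": ("complexity", {
        "low": "low",
        "high": ("defects", {"few": "medium", "many": "high"}),
    }),
})

# Measurements available for one hypothetical case; "defects" is never requested.
case = {"size": "large", "complexity": "low", "defects": "many"}
print(assess_gradually(tree, case.__getitem__))  # ('low', ['size', 'complexity'])
```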
