CHAPTER 4 EXPERIMENT DESIGN AND RESULTS
4.2 Performance Evaluation
Figure 5: The 24th simulation set.
This section describes how the performance of the DSM is evaluated in this study. When the experiment is designed, each instance is labeled as a theoretical non-outlier or a theoretical outlier according to its error term ε_t, so we can compare these labels with the identification results of the proposed DSM. There are four possible outcomes. In each term, the first character denotes the theoretical type and the second character denotes the type identified by the proposed DSM:
(1) N-O: A theoretical non-outlier that has been incorrectly identified as an outlier candidate.
(2) O-N: A theoretical outlier that has been incorrectly identified as a non-outlier.
(3) O-O: A theoretical outlier that has been correctly identified as an outlier candidate.
(4) N-N: A theoretical non-outlier that has been correctly identified as a non-outlier.
Table 4 defines the two types of misidentification.
Table 4: The possible experiment outcome measurement
Error Type | Meaning
Ratio of Type I error | The proportion of theoretical non-outliers that have been incorrectly identified as outlier candidates.
Ratio of Type II error | The proportion of theoretical outliers that have been incorrectly identified as non-outliers.
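As an illustration, the four outcomes and the two ratios of Table 4 can be computed as in the following minimal sketch. The function names and the toy labels are hypothetical and are not taken from the experiment; only the outcome codes and the ratio definitions follow the text.

```python
from collections import Counter


def outcome(theoretical_outlier, flagged):
    """Two-letter code: first letter = theoretical type, second = identified type."""
    first = "O" if theoretical_outlier else "N"
    second = "O" if flagged else "N"
    return first + "-" + second


def error_ratios(theoretical, flagged):
    """Count the four outcomes and return (counts, Type I ratio, Type II ratio)."""
    counts = Counter(outcome(t, f) for t, f in zip(theoretical, flagged))
    non_outliers = counts["N-O"] + counts["N-N"]
    outliers = counts["O-O"] + counts["O-N"]
    # Type I: share of theoretical non-outliers flagged as outlier candidates.
    type_1 = counts["N-O"] / non_outliers if non_outliers else 0.0
    # Type II: share of theoretical outliers identified as non-outliers.
    type_2 = counts["O-N"] / outliers if outliers else 0.0
    return counts, type_1, type_2


# Toy labels (hypothetical, for illustration only).
theoretical = [False, False, True, True, False, False, False, True]
flagged = [False, True, True, False, False, False, False, True]
counts, type_1, type_2 = error_ratios(theoretical, flagged)
print(dict(counts), type_1, type_2)
```

In this toy case one of five theoretical non-outliers is flagged and one of three theoretical outliers is missed.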
In practical applications, a Type II error is more serious than a Type I error, because it may be far more harmful to our decisions or operations. In other words, a Type II error may incur a greater loss for companies or organizations (Joo, Hong & Han, 2003). For this reason, we hope that the proportion of Type II errors is very low.
Here we take the 95th simulation set as an example; Table 5 shows its result. We also present the overall performance of the 100 simulation sets in Table 6 for discussion.
Table 5: The experiment result of the 95th simulation set
(Column headings: Time Stamp; M; Training block; Testing block; Learning block; Potential outlier; Evaluation.)
Table 6: The whole experiments’ performance
(Headings: Total of the 100 simulation sets; Total amount of O-O; Total amount of O-N; Total amount of N-O; Potential outlier; Testing block; Learning block.)
Table 5 shows that the accuracy of every window is higher than 95%, and some windows even reach 100%. Here, we adopt the Type I and Type II indicators of Table 4 to measure the experiment performance.
Across the 100 simulation sets, there are 281 theoretical outliers in total. Of these, 275 have been identified as outlier candidates: 79 were detected within the training block (28.7%) and 196 within the testing block (71.3%). This is a desirable result, since a theoretical outlier detected in the testing block is detected as early as possible. Note that some theoretical outliers are already in the training block when M = 1. So if this DSM is applied to long-term or unbounded time series data, it is likely that a larger share of theoretical outliers will be output as outlier candidates in the testing block.
In our study, we want the proportion of Type II errors to be as low as possible. With this in mind, we set a rule: once an instance is identified as an outlier candidate in any window, it is output to the decision maker the first time this happens. The decision maker then evaluates whether it is a real outlier. Under this mechanism, the decision maker only has to review the outlier candidates provided by the DSM, which amount to approximately 12% of the data. The DSM thus greatly reduces the amount of data to be reviewed and achieves the time-saving goal.
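The first-time output rule can be sketched as follows. This is a minimal illustration, assuming the outlier candidates of each window are available as a set of instance indices; the function name and the sample indices are hypothetical and are not taken from the experiments.

```python
def first_time_outputs(candidates_per_window):
    """Return (window number, instance index) pairs, reporting each
    instance only in the first window that flags it as a candidate."""
    already_output = set()
    report = []
    for m, candidates in enumerate(candidates_per_window, start=1):
        for index in sorted(candidates):
            if index not in already_output:
                already_output.add(index)
                report.append((m, index))
    return report


# Hypothetical candidate sets for three consecutive windows.
windows = [{3, 17}, {3, 17, 40, 104}, {17, 40}]
report = first_time_outputs(windows)
print(report)
```

Instances flagged again in later windows (3, 17 and 40 here) are not sent to the decision maker a second time.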
In Table 6, 6 theoretical outliers are not detected as outlier candidates, a proportion of approximately 2%. The total number of outlier candidates output is 2408, around 12% of all instances. As with the theoretical outliers, a large proportion of the outlier candidates are output in the testing block. Furthermore, the 6 theoretical outliers that are not identified as outlier candidates are distributed over 6 different simulation sets.
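The reported percentages can be checked with simple arithmetic. The sketch below uses only the counts stated in the text and assumes that every simulation set contains 200 instances, as in the sets shown in this chapter; the variable names are hypothetical.

```python
# Counts stated in the text.
total_outliers = 281       # theoretical outliers in the 100 sets
detected = 275             # identified as outlier candidates
detected_training = 79     # detected within the training block
detected_testing = 196     # detected within the testing block
candidates_output = 2408   # outlier candidates output in total

# Assumption: 100 simulation sets of 200 instances each.
total_instances = 100 * 200

missed = total_outliers - detected
type_2_ratio = missed / total_outliers
share_training = detected_training / detected
share_testing = detected_testing / detected
review_fraction = candidates_output / total_instances

print(missed, type_2_ratio, share_training, share_testing, review_fraction)
```

Under this assumption the figures agree with the text: 6 missed outliers (about 2%), a 28.7% / 71.3% split between training and testing blocks, and about 12% of the data sent for review.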
To find the causes of the Type II errors, we examine each of these experiment results. Below we discuss possible reasons for each misidentification and, given the characteristics of the application area, some possible solutions.
In the 23rd, 37th and 45th simulation sets, the undetected theoretical outlier is in the testing block of the last window, which consists of the 196th to 200th instances. Furthermore, the theoretical outlier lies close to the majority of the data.
Figure 6 shows all 200 instances of the 23rd simulation set. In this set, the 41st, 120th, 183rd and 196th instances are theoretical outliers. Only the 197th instance was not identified as an outlier candidate by the proposed DSM. In the last window, where M = 20, the training block consists of the 96th to 195th instances and the testing block consists of the 196th to 200th instances, where the 197th instance is a theoretical outlier and the 196th and the 198th to 200th instances are theoretical non-outliers. The result of the last window shows that the 120th and 183rd instances had already been correctly identified as outlier candidates. However, the 196th instance has been incorrectly identified as a non-outlier, and its value is close to the values of the neighboring instances. We think this small deviation between the theoretical outlier and its neighboring instances causes the misidentification of the 196th theoretical outlier as a non-outlier. However, if more time series data were available so that more windows could be processed, we think the DSM would have a high chance of identifying it successfully.
Figure 6: The 23rd simulation set.
Figures 7 and 8 show the 37th and the 45th simulation sets, respectively. The problem in these two sets is similar to that of the 23rd set discussed above: one theoretical outlier in the last window is not correctly identified as an outlier candidate. In the 37th set, the 197th instance lies close to the majority of the data; in the 45th set, the 196th instance shows the same feature. We think this small deviation from the majority of the data causes the 197th theoretical outlier in the 37th set and the 196th theoretical outlier in the 45th set to be incorrectly identified as non-outliers. Although these instances have not been identified as outlier candidates, we believe that, if more time series data were available so that more windows could be processed, the proposed DSM would have a high chance of identifying them correctly in the following windows.
Figure 7: The 37th simulation set.
Figure 8: The 45th simulation set.
In the 56th simulation set, Figure 9 shows that the 24th and 153rd instances are both theoretical outliers, but the 24th instance is not detected as an outlier candidate when M = 1, 2, 3 and 4. This theoretical outlier is detected as a potential outlier in the first window, where M = 1. Regrettably, in the subsequent windows it is regarded as a non-outlier until the instance is discarded. We think that, if more historical data with the same concept were available and processed together with this instance, this theoretical outlier might be identified as an outlier candidate. Another possible solution is to shrink the window size to fit the nature of the data, which might allow the DSM to detect it.
Figure 9: The 56th simulation set.
The 49th and 97th simulation sets are in a similar situation to the 56th set. The difference is that the theoretical outlier in the 49th and 97th sets is never detected even as a potential outlier. In Figures 10 and 11, although the 6th instance in both the 49th and the 97th sets appears easy to distinguish as an outlier candidate, the envelope module does not distinguish it correctly. Considering the whole window where M = 1 or 2, the majority of the data appears highly similar to the latter part of that window.
Owing to the similarity between the theoretical outlier and the majority of the data in these windows, we think the possible solutions are the same as for the 56th set. One is to have more historical data with the same concept, so that the theoretical outlier might be identified as an outlier candidate in an earlier window. The other is to adjust the window size to fit the nature of the data, which might allow the DSM to detect it. In addition, for the initial window, the DSM could run an extra window of a suitable size for the early instances.
Based on this simulation data, we realize that the trend of the data may make a theoretical outlier near the origin strongly similar to the instances in the later part of the window. We therefore suggest that the DSM run an extra window of an appropriately small size to identify such a theoretical outlier correctly.
Figure 10: The 49th simulation set.
Figure 11: The 97th simulation set.
Let us return to a successful experiment set; refer to Table 5 and Figure 3. In the 95th simulation set, the theoretical outliers are identified well in most windows. However, when M = 8, 9, 10 and 11, the SLFN incorrectly identifies a theoretical outlier as a non-outlier, which constitutes a Type II error case. Given the characteristics of the application area, we use an auxiliary decision support rule: if an outlier candidate has been output before, we do not spend further effort on it, because the decision maker has already determined whether this potential outlier is a real outlier or just a normal instance.
For discussion, we show the charts of some representative windows in the 95th experiment set below. First, we explain the elements of each chart. The green line is the fitting function trained by the DSM. The yellow upper and lower lines are the boundaries of the envelope. The envelope's bulk ε is set to 5. It is based on the standard deviation of the error term, which is 2; with the confidence level, we set the bulk to 2 × 2.5 = 5. The total width of the envelope is therefore 2ε = 10.
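The envelope arithmetic can be written out as a short sketch. The constant and function names are hypothetical, and treating a point exactly on the boundary as inside the envelope is an assumption; only the values 2, 2.5, 5 and 10 come from the text.

```python
SIGMA = 2.0                    # standard deviation of the error term
MULTIPLIER = 2.5               # factor chosen with the confidence level
EPSILON = SIGMA * MULTIPLIER   # bulk (half-width) of the envelope
TOTAL_WIDTH = 2 * EPSILON      # distance between the two boundaries


def in_envelope(observed, fitted, epsilon=EPSILON):
    """True if the observation lies within +/- epsilon of the fitted value.
    Assumption: a point exactly on the boundary counts as inside."""
    return abs(observed - fitted) <= epsilon


print(EPSILON, TOTAL_WIDTH)
print(in_envelope(12.0, 10.0), in_envelope(16.5, 10.0))
```

An observation 2 units from the fitted value falls inside the envelope, while one 6.5 units away falls outside and would be a potential outlier under this sketch.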
Next, we explain the meaning of the markers in the chart. The blue circle is a theoretical non-outlier identified as a non-outlier in the training block; it lies inside the envelope. The yellow triangle is a theoretical non-outlier that is identified and output as an outlier candidate. The white square is a theoretical outlier correctly identified as an outlier candidate. The red square is a theoretical outlier incorrectly identified as a non-outlier, which results in a Type II error. The white circle is a theoretical non-outlier identified as a non-outlier in the testing block.
Figure 12: The 95th simulation set’s moving windows when M=2
Figure 12 shows the 2nd window, in which the 6th to 105th instances compose the training block. There are 2 theoretical outliers in this window, the 32nd and the 55th instances, both in the training block. Looking at the trend, especially in the testing block, the data appears to be rising.
In this window, the proposed DSM successfully identified the theoretical outliers, the 32nd and 55th instances, as outlier candidates. On the other hand, it also output 2 theoretical non-outliers as outlier candidates: the 82nd instance, in the training block, and the 109th instance, in the testing block. These 2 instances are Type I errors.
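The block boundaries of each window can be sketched as follows, assuming a training block of 100 instances, a testing block of 5 instances and a step of 5 instances between consecutive windows. These sizes are inferred from the 2nd window described here and from the last testing block (the 196th to 200th instances); the constant and function names are hypothetical.

```python
TRAIN_SIZE = 100   # assumed length of the training block
TEST_SIZE = 5      # assumed length of the testing block
STEP = 5           # assumed shift between consecutive windows


def window_bounds(m):
    """Return ((train_start, train_end), (test_start, test_end)) for the
    m-th window, using 1-based inclusive instance indices."""
    train_start = (m - 1) * STEP + 1
    train_end = train_start + TRAIN_SIZE - 1
    test_start = train_end + 1
    test_end = train_end + TEST_SIZE
    return (train_start, train_end), (test_start, test_end)


for m in (1, 2, 20):
    print(m, window_bounds(m))
```

Under these assumptions the 2nd window trains on instances 6 to 105 and tests on instances 106 to 110, so the 82nd instance falls in the training block and the 109th in the testing block, and the 20th window tests on instances 196 to 200.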
Figure 13: The 95th simulation set’s moving windows when M=13
Figure 13 shows the 13th window. In this window the 119th and 162nd instances are theoretical outliers, and the DSM successfully identified both as outlier candidates. It is worth noting that the 162nd instance was correctly identified as an outlier candidate in the testing block.
Figure 14: The 95th simulation set’s moving windows when M=16
Figure 14 shows the 16th window. The concept has changed after t = 151. Even with this change of concept, the DSM still handles outlier detection well. The 82nd instance is incorrectly detected as an outlier candidate, which is a Type I error; given the intended application area, we still output this instance as an outlier candidate to the decision maker. The 119th and 162nd instances are still correctly identified as outlier candidates.
In summary, the 95th simulation set contains 4 theoretical outliers in total, and all of them have been detected. One promising result is that both the 119th and 162nd theoretical outliers are correctly identified as outlier candidates the first time they appear in the testing block. The 32nd and 55th theoretical outliers are correctly identified as outlier candidates when M = 1; that is, they are detected by the proposed DSM in the initial window. In total, 16 instances, merely 8% of the data in this simulation set, have been output as outlier candidates. This result shows that the DSM provides good decision support.
Another interesting issue in computer applications is the zero-day attack or vulnerability. A zero-day attack is a cyber-attack exploiting a vulnerability that has not been disclosed publicly (Bilge & Dumitras, 2012). In practice, there is almost no defense against a zero-day attack. Furthermore, an IDS that relies on signature-based scanning can hardly detect the attack while it remains publicly unknown.
According to the literature on zero-day attacks, the problem can be addressed with unsupervised learning techniques. Because the DSM is not informed that the concept drifts at t = 151 in our experiment, we treat the first theoretical outlier that appears at t ≥ 151 as a zero-day attack. In our research, the zero-day attacks in all 100 simulation sets have been detected successfully. Table 7 shows that 45% of the zero-day attacks are identified as outlier candidates in the testing block; that is, in 45 sets the zero-day attack is detected in the testing block.
Table 7: The detecting zero-day attack’s performance
Detected in which block | Testing block | Training block
Total 100 simulation sets | 45 | 55
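The way a zero-day attack is designated in this experiment can be sketched as follows. The function name is hypothetical; the change point t = 151 and the theoretical outlier positions of the 95th simulation set come from the text.

```python
CONCEPT_CHANGE_T = 151   # time stamp at which the concept drifts


def zero_day_instance(outlier_times, change_t=CONCEPT_CHANGE_T):
    """Return the first theoretical outlier at or after the concept change,
    or None if no theoretical outlier appears from that point on."""
    later = [t for t in outlier_times if t >= change_t]
    return min(later) if later else None


# Theoretical outliers of the 95th simulation set, as given in the text.
zero_day = zero_day_instance([32, 55, 119, 162])
print(zero_day)
```

For the 95th simulation set this designates the 162nd instance, which the text reports as detected in the testing block.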