
CHAPTER 5 EXPERIMENTS

5.2 Results


Figure 15. Algorithms with AUE

Note: The Remains attribute is excluded; the data cover 1998/7 to 2012/10; the y-axis starts from 45.

5.2.6 Adaptive Drift Ensemble

For ADE, we use k = 10 (the default value) in our experiments, i.e., there can be at most 10 active sub-classifiers in the ensemble. We convert each absolute weight into a relative weight normalized to the range 0 to 1. We assume that if the classification performance (e.g., accuracy) of one classifier is better than that of another, then the better classifier reflects the current concept with higher probability. Hence, when the relative weight of a sub-classifier is high, it indicates that the current concept (i.e., the underlying data distribution) is similar to, or the same as, the one on which the sub-classifier was trained. In the beginning, ADE trains one sub-classifier, and it adds a new sub-classifier to the ensemble whenever the drift level is reached. If the number of sub-classifiers then exceeds 10 (as we use k = 10), the worst-performing sub-classifier is dropped.
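The weight normalization and pruning described above can be sketched as follows. This is an illustrative Python sketch, not the thesis's implementation; the `Ensemble` and `SubClassifier` names are hypothetical, and the absolute weights are made-up numbers standing in for recent accuracy.

```python
from dataclasses import dataclass, field

@dataclass
class SubClassifier:
    name: str
    abs_weight: float          # e.g. recent accuracy of this sub-classifier

@dataclass
class Ensemble:
    k: int = 10                # at most k active sub-classifiers
    members: list = field(default_factory=list)

    def relative_weights(self):
        # Relative weight = absolute weight / sum of absolute weights,
        # so each value falls in [0, 1] and the values sum to 1.
        total = sum(m.abs_weight for m in self.members)
        return {m.name: m.abs_weight / total for m in self.members}

    def add(self, sub):
        # A new sub-classifier is added at the drift level; if the
        # ensemble then exceeds k members, drop the worst performer.
        self.members.append(sub)
        if len(self.members) > self.k:
            self.members.remove(min(self.members, key=lambda m: m.abs_weight))

ens = Ensemble(k=3)
for name, w in [("c1", 0.6), ("c2", 0.5), ("c3", 0.7), ("c4", 0.4)]:
    ens.add(SubClassifier(name, w))
rel = ens.relative_weights()   # "c4" was dropped as the worst of the four
```

With k = 3, adding the fourth sub-classifier triggers the pruning step, and the remaining relative weights sum to 1.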

(Figure 15 series: AUE Hoeffding Tree, AUE Hoeffding Adaptive Tree, and AUE Naïve Bayes; X-axis: 1998/8/1 to 2012/8/1.)

National Chengchi University

The total number of instances in the TAIEX Futures data is 3,598. We define the lifespan of a sub-classifier as the number of instances it covers (i.e., on which it is active). On average, a sub-classifier covers 1,043 instances when NB is used and 946 instances when HT is used; the standard deviation is 698 for NB and 732 for HT.

Figure 16 shows the results of ADE. Hoeffding Adaptive Tree with ADE fluctuates with a larger amplitude than the others. On average, the accuracy is 64.65 for Hoeffding Tree with ADE, 64.65 for Hoeffding Adaptive Tree with ADE, and 63.29 for Naïve Bayes with ADE; the standard deviations are 5.8, 7.5, and 6.06, respectively.

Figure 16. Result of Adaptive Drift Ensemble
Note: The y-axis starts from 50.

(Figure 16 series: ADE Hoeffding Tree, ADE Hoeffding Adaptive Tree, and ADE Naïve Bayes; Y-axis: 50 to 80; X-axis: 1998/8/1 to 2012/8/1.)


CHAPTER 6 DISCUSSIONS

6.1 Impact of Concept Drift

6.1.1 Existence

For the experiments, we used classification algorithms with and without DDM, a drift detector. The difference in results can be regarded as the impact of DDM. DDM handles concept drift, which causes a significant decrease in accuracy [34]. We can say that concept drift exists if algorithms with DDM achieve higher accuracy than the same algorithms without DDM, because the only difference between the two settings is DDM, which was designed to deal with concept drift, and the impact of concept drift is a decrease in accuracy.
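For readers unfamiliar with the detector, the following is a minimal Python sketch of DDM in the spirit of Gama et al. [34]; the implementation used in the experiments (and its exact thresholds) may differ in detail. It tracks the running error rate p and its standard deviation s, records their minimum, and signals a warning at p + s > p_min + 2·s_min and a drift at p + s > p_min + 3·s_min.

```python
import math

class DDM:
    """Sketch of the Drift Detection Method: monitors the online error
    rate of a base classifier and flags warning/drift levels."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances   # DDM's default minimum
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0            # running error rate (overwritten on 1st update)
        self.s = 0.0            # std. dev. of the error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error: 1 if the base classifier misclassified, else 0.
        Returns 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"     # ignore drifts within the first 30 instances
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + 3 * self.s_min:
            self.reset()        # drift level: retrain the model from here
            return "drift"
        if self.p + self.s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"

ddm = DDM()
# 50 correct predictions followed by a burst of errors
statuses = [ddm.update(e) for e in [0] * 50 + [1] * 20]
```

On this toy stream, the error burst after a long error-free run raises p + s above the drift threshold, and the detector resets itself, mirroring the "retrain after drift" behavior the existence test relies on.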

The following observations are drawn from the base experiment. Comparing Figure 8 with Figure 9, we observe that algorithms with DDM usually perform better than those without DDM. Moreover, Figure 17 shows that the accuracy of Hoeffding Tree with DDM is consistently higher than that of Hoeffding Tree without DDM. Similarly, Figure 18 shows that the accuracy of Naïve Bayes with DDM is consistently higher than that of Naïve Bayes without DDM. We think Hoeffding Adaptive Tree, which already contains a drift detector, is not appropriate to combine with DDM, another drift detector, so we do not give that comparison here. Therefore, we can accept the assumption that concept drift exists in the TAIEX Futures data.

Figure 17. Hoeffding Tree with or without DDM
Note: The y-axis starts from 50.

Figure 18. Naïve Bayes with or without DDM

(Figure 17 series: Hoeffding Tree and Hoeffding Tree with DDM; Figure 18 series: Naïve Bayes and Naïve Bayes with DDM; X-axis: 2000/1/1 to 2011/7/1.)


Note: The y-axis starts from 30.

There is also an interesting observation: for Naïve Bayes, there is a significant decrease and then increase between the third quarter of 2006 and the third quarter of 2009, the period of the financial tsunami. This phenomenon does not occur when the algorithm is used with DDM. Furthermore, there might be a relationship between accuracy and the trend of the stock.

The accuracy of Naïve Bayes without DDM bottoms out in the third quarter of 2007, when the stock began to fall heavily, and the stock began to rise once the accuracy had recovered to the level it held prior to the decline.

To reduce complexity and focus on the impact of concept drift, we used only the data from the current settlement month. Figure 19 gives the results of experiments based on our previous work; it also shows that DDM improves classification performance (in terms of accuracy) for both Naïve Bayes (NB) and Hoeffding Tree (HT). Thus, it is more important to investigate DDM than to explore algorithms that build better classification models.


Figure 19. NB and HT with or without DDM
Note: Y-axis is accuracy (from 45% to 85%), and X-axis represents time.

When the size of the time frame is set to a day, the average accuracy is 58.21 for NB and 58.08 for HT; with DDM, it is 72.58 for NB and 70.59 for HT. However, when the size of the time frame is set to an hour, the average accuracy is 55.12 for NB and 63.22 for HT; with DDM, it increases to 90.73 for NB and 91.25 for HT. DDM is helpful regardless of whether we use NB or HT, so we decided to focus on its extensions. We report more experimental results in the following.

6.1.2 Time Frame Granularity

We established the importance of DDM in the previous experiments. We also want to know the impact of concept drift at various time frame granularities of TAIEX Futures. The data were collected at the "minute" granularity, and we derive "hour"- and "day"-level data by aggregating the more detailed records. Figure 20 (Hoeffding Tree), Figure 21 (Hoeffding Adaptive Tree), and Figure 22 (Naïve Bayes) show the results at each time frame granularity: day, hour, and minute.
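The aggregation step can be sketched as below. This is a hedged illustration with made-up timestamps and prices, not the thesis's preprocessing pipeline: it collapses minute ticks into hourly and daily bars by keeping the last price seen in each bucket (the closing price).

```python
from collections import OrderedDict
from datetime import datetime

# Toy minute-level (timestamp, price) data; values are illustrative.
minute_prices = [
    (datetime(2008, 7, 1, 9, 0), 7100.0),
    (datetime(2008, 7, 1, 9, 1), 7105.0),
    (datetime(2008, 7, 1, 10, 0), 7080.0),
    (datetime(2008, 7, 1, 10, 30), 7090.0),
]

def aggregate(ticks, keyfunc):
    """Collapse fine-grained ticks into coarser bars: the bar's value
    is the last price seen inside each bucket (the closing price)."""
    bars = OrderedDict()
    for ts, price in ticks:
        bars[keyfunc(ts)] = price   # later ticks overwrite earlier ones
    return bars

# "Hour" granularity: truncate each timestamp to the start of its hour.
hourly = aggregate(minute_prices, lambda t: t.replace(minute=0, second=0))
# "Day" granularity: truncate to the start of the day.
daily = aggregate(minute_prices, lambda t: t.replace(hour=0, minute=0, second=0))
```

The same idea extends to open/high/low/volume fields; only the per-bucket reduction changes.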

We found that, no matter which algorithm was used, DDM improved accuracy significantly at all time frame granularities. This suggests that concept drift exists at all three granularities, though so far we do not know whether the drifts are the same across granularities. DDM also improved the stability of accuracy at all granularities. We know that when concept drift occurs, accuracy decreases, and this decrease can be observed at the finer time frames. Hence, it is the concept drift that DDM resolves, which makes the results both more accurate and more stable.

The frequency of concept drift should also be considered. DDM uses a default value (30 instances) and ignores a concept drift if fewer instances than this have been seen in the current concept. At the hour level, a sudden decrease in accuracy is not easy to discover, because the interval between two concept drifts may be smaller than 30 units of the time frame. The decrease in accuracy becomes obvious when the time frame is finer than the hour level.

Moreover, at the minute level, Hoeffding Adaptive Tree performed better than the other pure algorithms (without DDM), and DDM also gave Hoeffding Adaptive Tree a smaller improvement. Hoeffding Adaptive Tree contains an adaptively adjusted window for handling concept drift. However, the adaptive window contributes little when the time frame is coarser than the minute level. Hence, in the TAIEX Futures market, concept drift probably occurs at the hour level or finer.

DDM improved results more at finer time frame granularities; in other words, concept drift has more impact at finer granularities. The minute-level results reach almost 100% accuracy, so DDM's improvement there is smaller than at the hour level. On average, at the day level DDM improved accuracy by 16.00 and the standard deviation by 1.78; at the hour level DDM improved accuracy by 28.91 and the standard deviation by 6.69; at the minute level (ignoring Hoeffding Adaptive Tree), DDM improved accuracy by 24.58 and the standard deviation by 35.28.

Figure 20. Hoeffding Tree in various time frame granularities
(Panels: day, hour, and minute; each panel plots Hoeffding Tree and Hoeffding Tree with DDM; Y-axis: accuracy from 0 to 100; X-axis: 1998/8/1 to 2012/8/1.)

Figure 21. Hoeffding Adaptive Tree in various time frame granularities
(Panels: day, hour, and minute; each panel plots Hoeffding Adaptive Tree and Hoeffding Adaptive Tree with DDM; Y-axis: accuracy from 0 to 100; X-axis: 1998/8/1 to 2012/8/1.)

Figure 22. Naïve Bayes in various time frame granularities
(Panels: day, hour, and minute; each panel plots Naïve Bayes and Naïve Bayes with DDM; Y-axis: accuracy from 0 to 100; X-axis: 1998/8/1 to 2012/8/1.)

6.2 Types of Concept Drift

6.2.1 Sudden vs. Gradual

We want to know whether the concept drift in the TAIEX Futures market is sudden or gradual, because different types of concept drift require different handling methods. We validated the type of concept drift in two ways, and we found that it is sudden concept drift that occurs in the TAIEX Futures market. First, we compared the results of DDM and EDDM, which are designed for sudden and gradual concept drift, respectively. Second, we compared the results of algorithms with DDM against ensemble methods; Albert concluded that evolving ensemble methods are appropriate for dealing with gradual concept drift [29].

Figure 23 shows the results given by algorithms with DDM and EDDM. In terms of the average accuracy, NB gives 56.5%, NB with DDM gives 72.24%, and NB with EDDM gives 71.63%; HT gives 56.41%, HT with DDM gives 71.39%, and HT with EDDM gives 72.54%.

Sometimes DDM is better than EDDM, and sometimes it is not. For NB, the turning point is August 2004: before that point DDM is better than EDDM, and after it DDM is worse. Next, we would like to know which types of concept drift exist in the TAIEX Futures data. DDM can only deal with sudden concept drift, while EDDM can handle both sudden and gradual concept drift. Hence, if an algorithm performs better with EDDM than with DDM on the TAIEX Futures data, we can suppose that gradual concept drift is present. However, as Figure 23 shows, there is no obvious difference in classification performance (in terms of accuracy) between the results given by DDM and EDDM. Hence, it is more likely that sudden rather than gradual concept drift exists in the TAIEX Futures data.

Note: Y-axis is accuracy (from 60% to 95%), and X-axis represents time.

Figure 24 (Hoeffding Tree) and Figure 25 (Naïve Bayes) show the results of DDM, AWE, and AUE. AWE and AUE are evolving ensemble methods, which are able to handle gradual concept drift [29]. However, these ensemble methods perform worse than DDM, which is designed for sudden concept drift, over the whole period. Moreover, the ensemble methods perform only slightly better than the original algorithms. Compared with the original algorithms, DDM improved accuracy significantly, while the evolving ensemble methods improved it little. On average, for Hoeffding Tree, DDM gives an accuracy improvement of 12.78 over AWE and 13.52 over AUE; for Naïve Bayes, DDM gives 14.93 over AWE and 15.44 over AUE.

In summary, although DDM specifically targets sudden concept drift while EDDM handles both sudden and gradual concept drift, DDM worked almost as well as EDDM. Moreover, the evolving ensemble methods specifically target gradual concept drift, yet AWE and AUE improved accuracy only slightly compared with the original algorithms. This indicates that

(Figure 23 series: Hoeffding Tree with DDM and with EDDM; Naïve Bayes with DDM and with EDDM; X-axis: 1998/8/1 to 2012/8/1.)

sudden concept drift had a much greater impact than gradual concept drift in the TAIEX Futures market. Hence, we conclude that the main type of concept drift is sudden rather than gradual.

Figure 24. DDM vs. Ensemble Methods of Hoeffding Tree

Figure 25. DDM vs. Ensemble Methods of Naïve Bayes

(Figure 24 series: Hoeffding Tree with DDM, AWE Hoeffding Tree, and AUE Hoeffding Tree; Figure 25 series: Naïve Bayes with DDM, AWE Naïve Bayes, and AUE Naïve Bayes; Y-axis starts from 45; X-axis: 1998/8/1 to 2012/8/1.)

Figure 26 shows how the relative weight of each of the top 10 longest-active sub-classifiers (based on the different algorithms) changes over time. We have two findings: first, concept drift occurs frequently; second, reoccurring concept drift exists.

Each column in Figure 26 is for the results of an algorithm used in experiments. Each column contains 10 sub-figures, and a sub-figure is associated with a sub-classifier. For each sub-figure, the Y-axis is the relative weight ranging from 0 to 1. The relative weight is defined as the absolute weight of a sub-classifier divided by the sum of absolute weights of all sub-classifiers that are active at the same time. For each sub-figure, the X-axis represents time.

If a sub-classifier is not active during some time periods, its weight is zero during those time periods; if a sub-classifier is dropped after some point in time, its weight is zero after that point.

We focus on the longest-active sub-classifiers for each algorithm. However, some sub-classifiers begin at the same time and some end at the same time, which implies that they correspond to similar concepts. Referring to Figure 26, we denote the sub-figures for NB and HT as {NB, HT}. The similar concepts are as follows: {8, 9}, between December 1998 and November 2004; {10, 1}, between May 2001 and July 2006; and {3, 10}, between April 2004 and September 2009.

For different time intervals, if the average of a sub-classifier's relative weights is low but their standard deviation is high, the sub-classifier can handle well the concept that occurs in one specific time interval but not in the others. This also implies that the concept occurring in that interval differs from the others.
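This low-mean/high-spread heuristic can be expressed in a few lines. The sketch below is illustrative only: the threshold values and the weight series are made up, not taken from the experiments.

```python
from statistics import mean, pstdev

def interval_specific(weights, mean_cap=0.2, std_floor=0.15):
    """weights: one sub-classifier's relative weight sampled over time.
    A low mean combined with a high spread means the sub-classifier
    dominates only in a specific interval, i.e. the concept it was
    trained on differs from the concepts in the other intervals.
    The two thresholds are hypothetical."""
    return mean(weights) < mean_cap and pstdev(weights) > std_floor

# Mostly inactive, but dominant in one short burst:
bursty = [0.0] * 16 + [0.8, 0.9, 0.8, 0.7]
# Mildly useful the whole time:
steady = [0.15] * 20
```

Applied to these two toy series, only the bursty one is flagged as interval-specific.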

In Figure 26, we observe that a repeating pattern of high relative weights exists within a sub-figure (and it exists no matter which algorithm we use). ADE trains a sub-classifier for a concept, and when the relative weight of a sub-classifier is higher, the accuracy achieved by that sub-classifier is higher. At some point in time, if the concept is similar to the one for which a sub-classifier was trained, the sub-classifier achieves high accuracy and its relative weight is high; that is, the similar concept has occurred again. In other words, reoccurring concept drift exists in the TAIEX Futures data. For example, in Figure 26, at point A in time, the sub-classifier associated with sub-figure 1 of NB has the highest relative weight (meaning that it best handles the concept at that point). At point B in time, the sub-classifier associated with the same sub-figure has the highest relative weight again. The concept handled by the sub-classifier of sub-figure 1 of NB thus occurs first at point A and then at point B. Point A is around March 2005 and point B is around July 2008, so the interval for the reoccurring concept is about 3 years and 5 months. Shifting our focus to the sub-classifier associated with sub-figure 6 of HT in Figure 26, we similarly see that it has the highest relative weight at point C, around August 1999, and again at point D, around April 2005. In this case, the interval for the reoccurring concept is about 5 years and 8 months.

Furthermore, we observe that a concept occurs repeatedly around July 2008, January 2009, July 2009, and October 2010. Roughly speaking, July 2008 is the top (the worst situation) of the financial crisis and January 2009 is the bottom of it. In July 2009, the price of TAIEX returned to the level at which it was in July 2008. In October 2010, the price of TAIEX returned to the level at which it was before the financial crisis.


Figure 26. Top 10 longest-active sub-classifiers

Note: In each sub-figure, the Y-axis is the relative weight (from 0 to 1) of a sub-classifier and the X-axis represents time. Lines A and B belong to sub-figure 1 in the NB column, and lines C and D belong to sub-figure 6 in the HT column; each line marks a point in time with a high relative weight.

6.3 Characteristics of Adaptive Drift Ensemble

6.3.1 Comparison of Ensemble Methods

We also conducted experiments that use the evolving ensemble method AUE.

Figure 27 shows that ADE achieves higher accuracy than AUE. However, whether we use NB or HT, ADE performs worse than AUE between March 2008 and January 2009 and around April 2011; the TAIEX stock market fell during these periods. We also ran experiments with different chunk sizes for AUE and found that smaller chunk sizes performed better. We therefore set the chunk size to 17, which is lower than the number of instances in a month, to obtain the best performance for AUE; roughly speaking, there are 20 trading days (instances) in a month for TAIEX Futures. On average, the accuracy of AUE with NB is 58.86% and that of ADE with NB is 63.29%; AUE with HT shows an average accuracy of 61.25% and ADE with HT 64.65%.
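The chunking step underlying AWE/AUE-style processing can be sketched as follows. This is a simplified illustration, not AUE's actual MSE-based weighting: the stream is cut into fixed-size chunks (17 instances here, as in our setting), one candidate classifier is built per chunk, and members with lower error on the newest chunk receive larger votes.

```python
def chunks(stream, size=17):
    """Cut a data stream into fixed-size chunks; the trailing partial
    chunk is kept. AWE/AUE build one candidate classifier per chunk."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]

def chunk_weight(error_rate, eps=1e-6):
    """Simplified inverse-error weighting (a stand-in for AUE's
    MSE-based scheme): members that fit the newest chunk best
    get the largest vote."""
    return 1.0 / (error_rate + eps)

parts = chunks(list(range(40)))   # 40 instances -> chunks of 17, 17, 6
```

The chunk size directly bounds the latency of reaction: a drift can only be acted on at the next chunk boundary, which is why a chunk of 17 instances (under one trading month) keeps the latency small.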

Marrs, Hickey, and Black discussed the latency of reaction to concept drift [18].

If a model has higher latency, it needs more time to recover from concept drift (i.e., more time to regain the same level of accuracy). However, we do not observe an obvious difference in recovery time between the results given by AUE and by ADE. We explain this as follows. First, the chunk size is related to AUE's latency of reaction to concept drift: a small chunk size leads to low latency, which usually helps AUE achieve high accuracy. That is, for AUE we set the chunk size to a small value to reduce the impact of latency (the chunk size is 17, and there are only about 20 trading days in a month for TAIEX Futures). Second, DDM, used by ADE as the base handler to calculate and adjust the chunk size, does not change its status if the number of viewed instances between two drift levels is smaller than a certain value. That is, DDM makes no status change within a certain number of instances. Although ADE is designed to react immediately to concept drift, it makes no reaction if DDM makes no status change.

Figure 27. NB and HT with AUE and ADE
Note: Y-axis is accuracy (from 50% to 80%), and X-axis represents time.

6.3.2 Comparison of Handlers

Figure 28 shows the accuracy achieved by ADE using DDM or EDDM as the base handler. When DDM is used by ADE with NB, the average accuracy is 63.29%; with EDDM it is 64.6%. When DDM is used by ADE with HT, the average accuracy is 64.65%; with EDDM it is 63.76%. When the number of instances in a leaf of the tree is smaller than a threshold (whose default value is 200), HT will not split the leaf. Hence, HT performs poorly in training if concept drift occurs frequently. Considering the larger variance shown by EDDM in Figure 28, especially when it is
