Cluster Analysis for Time Series

Chapter 2 Literature Review

2.2 Cluster Analysis for Time Series

Since the k-means algorithm, one of the best-known clustering algorithms, was proposed some 50 years ago, cluster analysis has been widely used as a data analysis tool in various domains. Over the past two decades, time series clustering has proved effective in extracting useful information in many of these domains. In finance, clustering financial time series is a relatively new approach to analyzing the dynamic behavior of the series and to forecasting their future tendencies for decision making. Many financial problems have been studied using cluster analysis based on computational intelligence rather than conventional approaches.

Pattarin et al. (2004) propose a classification algorithm for mutual fund style analysis that combines different statistical techniques and exploits information readily available at low cost. In their analysis, time series of past returns are used to obtain a synthetic and informative description of highly complex contexts, which is useful in identifying the styles of mutual funds. Gafnychuk et al. (2004) use self-organizing methods to investigate time series data of the Dow Jones index. Basalto et al. (2007) apply a novel clustering procedure to the financial time series of the Dow Jones Industrial Average (DJIA) index to find companies that share similar behavior. The proposed techniques can extract relevant information from raw market data and yield meaningful hints as to the mutual time evolution of stocks. Karandikar et al. (2007) develop a volatility clustering model to forecast value at risk (VaR). The feasibility and benefits of the model are demonstrated on an electricity price return series. Zhu (2008) proposes a new model based on cluster analysis for oil futures price forecasting. The model is demonstrated using historical data from the NYMEX market, and the results show that it can effectively and stably improve the precision of oil futures price forecasting. Focardi and Fabozzi (2009) adopt a clustering-based methodology to determine the optimal tracking portfolio for tracking indexes. Papanastassiou (2009) discusses the classification and clustering of financial time series based on a parametric GARCH(1,1) representation to assess their riskiness.

Despite the prevalence of numerous clustering algorithms and their success in a number of different application domains, clustering remains difficult. When applying cluster analysis to time series, the method of data processing, feature extraction, and similarity measurement, as well as the topology of cluster construction, must all be determined. Liao (2005) organizes past research into three groups according to the data used: approaches that work directly with the raw data, in either the time or the frequency domain; approaches that work indirectly with features extracted from the raw data; and approaches that work indirectly with models built from the raw data. Defining an appropriate similarity measure and objective function is difficult when choosing a clustering algorithm. Jain (2010) emphasizes that “there is no best clustering algorithm” when comparing the results of different clustering algorithms.
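The three groups can be illustrated with a minimal Python sketch. It is not taken from any of the cited works: the function names, the choice of mean and standard deviation as features, and the use of a least-squares AR(1) coefficient as the "model" are all assumptions made for illustration, as are the toy series.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def summary_features(series):
    """Feature-based view (illustrative choice): mean and population std."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [mean, std]


def ar1_coefficient(series):
    """Model-based view (illustrative choice): least-squares slope of the
    demeaned series on its own first lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    den = sum(c * c for c in centered[:-1])
    return num / den if den else 0.0


# Hypothetical toy series, not data from the thesis.
series_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
series_b = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]     # same shape, shifted level
series_c = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # alternating behavior

# 1) Raw-data dissimilarity: compares the observations point by point.
raw_ab = euclidean(series_a, series_b)

# 2) Feature-based dissimilarity: compares extracted summary features.
feat_ab = euclidean(summary_features(series_a), summary_features(series_b))

# 3) Model-based dissimilarity: compares fitted model parameters.
model_ab = abs(ar1_coefficient(series_a) - ar1_coefficient(series_b))
model_ac = abs(ar1_coefficient(series_a) - ar1_coefficient(series_c))
```

In this toy setting the model-based measure treats the two trending series as identical despite their level shift, whereas the raw-data measure does not. This shows how the choice of representation changes which series count as similar.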

Furthermore, clustering methods can be classified into two categories, hierarchical and non-hierarchical, depending on whether the data objects are grouped into a tree of clusters.

There are generally two types of hierarchical clustering methods: agglomerative and divisive.

Agglomerative methods start by placing each object in its own cluster and then merge clusters into larger and larger clusters until all objects are in a single cluster or until a termination condition, such as reaching the desired number of clusters, is satisfied. Divisive methods do the opposite: they start with all objects in a single cluster and split it successively. Determining the number of clusters automatically is one of the most difficult problems in data clustering. Most methods for automatically determining the number of clusters cast it as a model selection problem.
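The agglomerative procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a method from the cited literature. It assumes Euclidean distance on the raw series, average linkage between clusters, and a user-supplied number of clusters as the termination condition; the function name `agglomerative` and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(series_list, n_clusters, distance=euclidean):
    """Average-linkage agglomerative clustering.

    Returns a list of clusters, each a sorted list of indices into
    series_list. Merging stops once n_clusters clusters remain.
    """
    # Start with every series in its own cluster.
    clusters = [[i] for i in range(len(series_list))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(
            distance(series_list[i], series_list[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > max(n_clusters, 1):
        # Find the closest pair of clusters and merge them.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: two rising and two falling series.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [4.1, 2.9, 2.2, 0.9],
]
result = agglomerative(toy, n_clusters=2)
```

With `n_clusters=2`, the two rising series and the two falling series end up in separate clusters. Setting `n_clusters=1` runs the merging until all objects are in a single cluster, which is the other termination condition described above.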

Although clustering remains a difficult problem, it offers two benefits for time series analysis. First, clustering can avoid improper assumptions about, and restrictions on, the data. Gershenfeld et al. (1999) propose a cluster-weighted model for time series analysis, which is a simple special case of the general theory of probabilistic networks but one that can handle most of the limitations of practical data sets without unduly constraining either the data or the user. They show that nonlinear, non-stationary, non-Gaussian, and discontinuous signals can be described by expanding the probabilistic dependence of the future on the past around local models of their relationship. Second, data objects with similar dynamic behavior in their evolution over time are pooled, which can help in data modeling. Fruhwirth-Schnatter and Kaufmann (2008) propose to pool multiple time series into several groups using finite-mixture models. Within a panel of time series, only those that display “similar” dynamic properties are pooled to estimate the parameters of the generating process. They estimate the groups of time series simultaneously with the group-specific model parameters using Bayesian Markov chain Monte Carlo simulation methods, and they document the efficiency gains in estimation and forecasting realized relative to overall pooling of the time series. D’Urso and Maharaj (2009) suggest that time series often display dynamic behavior in their evolution over time, which should be taken into account when attempting to cluster them. They propose a fuzzy clustering approach based on autocorrelation functions to determine and represent the underlying structure in the given time series.
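The idea of comparing series through their autocorrelation structure can be sketched as follows. This is only a crisp (non-fuzzy) distance between sample autocorrelation functions, not the fuzzy clustering approach of D’Urso and Maharaj (2009). The function names, the default of three lags, and the toy series are assumptions made for illustration.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    if variance == 0:
        # A constant series has no defined autocorrelation; use zeros.
        return [0.0] * max_lag
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / variance
        for lag in range(1, max_lag + 1)
    ]


def acf_distance(a, b, max_lag=3):
    """Euclidean distance between the autocorrelation vectors of a and b."""
    ra = autocorrelation(a, max_lag)
    rb = autocorrelation(b, max_lag)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))


# Hypothetical toy series: two trends at different levels and one zigzag.
trend_low = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
trend_high = [101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0]
zigzag = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_trends = acf_distance(trend_low, trend_high)  # same dynamics, other level
d_mixed = acf_distance(trend_low, zigzag)       # different dynamics
```

Because autocorrelations depend on how a series evolves rather than on its level, the two trending series are at zero distance from each other, while the zigzag series is far from both. A distance of this kind could be passed to any clustering routine that accepts a custom dissimilarity.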

Based on the literature, we apply cluster analysis to uncover the dynamic behavior of financial time series using a computational intelligence approach. The method of data processing, feature extraction, and similarity measurement, as well as the topology of the clusters constructed, are easy to determine.
