
Data Complex Simulation

4.1 Stock Dataset

4.1.1 Parameter Settings

For the stock dataset, the data are collected monthly, so the data size is small enough to examine the embedding results of different manifold learning methods over the entire reasonable parameter range. Before applying manifold learning, the stock datasets are normalized as follows: the data point from January 2000 is subtracted from every data point, and the result is divided by that same data point. With this normalization, we can simplify the analysis by relying on shift invariance to ignore the normalization effect and by using the same scale factors across all datasets.
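The normalization described above can be sketched as follows. This is a minimal illustration and not the code used in this work: the function name `normalize_to_reference`, the array layout (rows are months, columns are stock markets), and the price values are all assumptions made for the example.

```python
import numpy as np

def normalize_to_reference(prices, ref_index):
    """Subtract the reference row from every row, then divide by it (column-wise)."""
    prices = np.asarray(prices, dtype=float)
    ref = prices[ref_index]              # one reference value per market (column)
    return (prices - ref) / ref

# Hypothetical monthly closing prices: rows are months, columns are markets.
prices = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [ 90.0, 60.0]])

# Row 0 stands in for the January 2000 reference month (an assumption).
normalized = normalize_to_reference(prices, ref_index=0)
print(normalized)
```

After this step the reference month maps to zero in every market, and every other value is a relative change with respect to that month.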

The corresponding parameters for manifold learning are shown in Table 4.2.

Table 4.2: Parameter settings for stock dataset.

Neighborhood Relation            k-nn           ε-distance         Data index distance
Embedding Dimensions             3              3                  3
Parameter Range for 1995-2012    4 ≤ k ≤ 209    0.3 ≤ ε ≤ 7.3      2 ≤ k_idx ≤ 209
Parameter Range for 2000-2012    4 ≤ k ≤ 149    0.5 ≤ ε ≤ 11.1     2 ≤ k_idx ≤ 149

For k-nn, all usable values of k are included in the final analysis of eigenvalues.

For ε-distance, the smallest reasonable distance is chosen heuristically, while the largest distance includes all connections in the analysis of eigenvalues. For data index distance, the smallest usable parameter satisfies k_idx ≥ (d + 1)/2, while the maximum possible value is k_idx = N − 1, where N is the number of data points in the dataset. Conventional manifold learning favors using as few connections as possible to obtain embedding results that approximate the original data structure as closely as possible. Here, we take a different point of view and examine the behavior of manifold learning over the full parameter space.
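The three neighborhood relations can be sketched as adjacency matrices. The sketch rests on several assumptions that the text does not state: Euclidean distance is used, "data index distance" is read as connecting points whose time indices differ by at most k_idx, and d is taken to be the embedding dimension. The function names and the random data are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_graph(X, k):
    """Connect each point to its k nearest neighbors (directed, no self-loops)."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)                  # a point is never its own neighbor
    nearest = np.argsort(D, axis=1)[:, :k]
    A = np.zeros(D.shape, dtype=bool)
    A[np.arange(len(X))[:, None], nearest] = True
    return A

def eps_graph(X, eps):
    """Connect points whose distance is at most eps."""
    A = pairwise_distances(X) <= eps
    np.fill_diagonal(A, False)
    return A

def index_graph(n, k_idx):
    """Connect points whose time indices differ by at most k_idx (assumed reading)."""
    i = np.arange(n)
    gap = np.abs(i[:, None] - i[None, :])
    return (gap > 0) & (gap <= k_idx)

# Synthetic stand-in for a small dataset: N = 12 time points, 5 markets.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
N, d = len(X), 3

complete = ~np.eye(N, dtype=bool)
at_max = [knn_graph(X, N - 1),
          eps_graph(X, pairwise_distances(X).max()),
          index_graph(N, N - 1)]
print([bool(np.array_equal(A, complete)) for A in at_max])

# With k_idx = (d + 1) / 2 = 2, an interior point has d + 1 = 4 neighbors.
print(int(index_graph(N, (d + 1) // 2).sum(axis=1)[5]))
```

At the upper end of each parameter range all three constructions yield the same complete graph, so only the order in which connections are added differs between them.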

4.1.2 Results

The eigenvalues for the first dataset, obtained with different manifold learning methods and neighborhood selection approaches, are shown in Tables 4.3 and 4.4.

Since these eigenvalues are plotted until every data point has all other data points as its neighbors, the final eigenvalues for each manifold learning method are the same no matter which neighborhood selection approach is used. For different neighborhood selection approaches, the eigenvalue curves differ because connections are inserted in a different order.

Table 4.3: Eigenvalues for stock dataset starting from January 1995 from LLE, Isomap and Hessian LLE.

Methods    k-nn    ε-distance    Data index distance

LLE

Manifold learning methods are often regarded as designed to discover the major hidden structure of the original dataset using a small number of connections. In this full-range analysis, we can observe the particular behavior of each manifold learning method. For example, methods that rely on matrices built from weight computation are affected more significantly by the neighborhood selection approach than those that rely on matrices built from distance metrics. Another significant result is that the eigenvalues from Laplacian eigenmap are all 1 when the neighbors of each data point are all the other data points.
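One way to see why the Laplacian eigenmap spectrum degenerates on a fully connected graph is the small numerical sketch below. It assumes binary (0/1) weights and the normalized graph Laplacian, whose eigenvalues coincide with those of the generalized problem L y = λ D y; whether this matches the simple-minded version used in this work is an assumption, and the function name `laplacian_eigenvalues` is hypothetical.

```python
import numpy as np

def laplacian_eigenvalues(W):
    """Eigenvalues of the normalized Laplacian I - D^(-1/2) W D^(-1/2), ascending."""
    d = W.sum(axis=1)                            # node degrees
    L_sym = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))
    return np.linalg.eigvalsh(L_sym)

n = 10
W_no_loops = np.ones((n, n)) - np.eye(n)   # every point linked to all other points
W_loops = np.ones((n, n))                  # same, but each point also links to itself

vals_no_loops = laplacian_eigenvalues(W_no_loops)
vals_loops = laplacian_eigenvalues(W_loops)
print(np.round(vals_no_loops, 4))
print(np.round(vals_loops, 4))
```

Under these assumptions, apart from the single zero eigenvalue, all eigenvalues collapse to one repeated value: exactly 1 when self-connections are counted and N/(N − 1), which approaches 1 as N grows, when they are not. A spectrum of this kind no longer distinguishes embedding directions.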

Selected results from different manifold learning methods with specific neighborhood selection parameters are shown in Figures 4-1 to 4-3. The colors change at September 2001 and December 2003: blue lines and points are before September 2001, green lines and points are between September 2001 and December 2003, and red lines and points are after December 2003.

September 2001 marks the 9/11 terrorist attacks, while December 2003 is the time point after which significantly different stock behavior can be observed in most of the embedding results from manifold learning. The other time points marked in the figures are listed in Table 4.5. The embedding result from Laplacian eigenmap has a descent shape and contains less information, since the version of Laplacian eigenmap used in this work is the simple-minded version. For the other embedding results, the big jump from February to March 1997 and the larger movements and amplitudes after December 2003 are captured.

Table 4.4: Eigenvalues for stock dataset starting from January 1995 from Laplacian eigenmap, diffusion map and LTSA.

Methods    k-nn    ε-distance    Data index distance

Laplacian

Table 4.5: Meaning of selected time points.

Date                 Event
1995/01              Data start point #1
1997/02 - 1997/03    Before Asian financial crisis
2000/01              Data start point #2
2001/09              9/11 attack
2003/12 - 2004/01    Observation of difference
2008/01 - 2009/01    Subprime mortgage crisis
2010/01 - 2010/11    QE1
2010/11 - 2012/01    QE2
2012/01 - 2012/06    OT1

The eigenvalues for the second dataset, obtained with different manifold learning methods and neighborhood selection approaches, are shown in Tables 4.6 and 4.7.

The different number of data points and data dimensions produces significantly different eigenvalue results, but the basic behavior of the different manifold learning methods is similar.

Figure 4-1: Selected embedding results for stock dataset from LLE and Isomap.

(b) Laplacian eigenmap with k_idx = 81

Figure 4-2: Selected embedding results for stock dataset from Hessian LLE and Laplacian eigenmap.

Figure 4-3: Selected embedding results for stock dataset from diffusion map and LTSA.

Table 4.6: Eigenvalues for stock dataset starting from January 2000 from LLE, Isomap and Hessian LLE.

Methods    k-nn    ε-distance    Data index distance

LLE

Table 4.7: Eigenvalues for stock dataset starting from January 2000 from Laplacian eigenmap, diffusion map and LTSA.

Methods    k-nn    ε-distance    Data index distance

Laplacian

Selected results for specific parameters are shown in Figures 4-4 to 4-6. Since the starting time point is January 2000, the big jump that occurred in 1997 does not appear in the embedding results. This dataset includes fewer time points, while more stock markets are available for analysis. Similar embedding results can be observed although the dataset differs significantly from the previous one. For example, the simple-minded Laplacian eigenmap is also unsuitable for this dataset, and a similar observable difference in curvature can be found.

The datasets are normalized by subtracting the closing price at one reference time point and dividing by it, rather than by conventional normalization, because of the invariance properties of manifold learning. The former involves only the same shift and scaling factors for the two datasets, while the latter would introduce different shift and scaling factors, which increase the complexity and may affect the invariance criteria used in further analysis.

Figure 4-4: Selected embedding results for stock dataset from LLE and Isomap.

(b) Laplacian eigenmap with k_idx = 58

Figure 4-5: Selected embedding results for stock dataset from Hessian LLE and Laplacian eigenmap.

(a) Diffusion map with ε = 2

(b) LTSA from k_idx = 20

Figure 4-6: Selected embedding results for stock dataset from diffusion map and LTSA.
