2 Signal Processing and Feature Extraction Techniques
2.1 Singular Spectrum Analysis (SSA)
2.1.1 Decomposition Stage
Any continuous, nonzero, real-valued time series can be discretized, whether it is a single-channel series, 𝑥(𝑡), or a multi-channel series, 𝐱(𝑡). The discrete time series, 𝐱𝑘, has length 𝑁 (𝑁 > 2) and sampling interval ∆𝑡, meaning that 𝐱𝑘 = 𝐱(𝑘∆𝑡), where 1 ≤ 𝑘 ≤ 𝑁. In the multi-channel case, each sample is the vector
$$\mathbf{x}_k = [x_{k,1}\; x_{k,2}\; x_{k,3}\; \cdots\; x_{k,n}]^T \qquad (2.1)$$
where 𝑛 is the number of channels and the superscript 𝑇 denotes the matrix transpose. If 𝑛 equals 1, multi-channel SSA (M-SSA) reduces to SSA and 𝐱𝑘 becomes 𝑥𝑘. Moreover, although a constant sampling interval is normally assumed, the indices 1, 2, 3, …, 𝑁 can be interpreted not only as discrete time instants but also as labels of any other linearly ordered structure. The decomposition stage described below consists of two steps: embedding and singular value decomposition (SVD).
First Step: Embedding
Embedding is a standard procedure in time series analysis; once the embedding has been performed, the subsequent analysis can be varied according to the purpose of the investigation. SSA starts by embedding the time series into a trajectory matrix. By specifying an integer, 𝐿, called the window length,
doi:10.6342/NTU201901242
with the dimension 𝐿. When the dimension of the lagged vectors needs to be emphasized, they are called L-lagged vectors (or simply lagged vectors), and the trajectory matrix is called the L-trajectory matrix.
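As a minimal sketch of the embedding step for a single-channel series (𝑛 = 1), the Python function below builds a trajectory matrix from a one-dimensional array. Because Equation (2.2) is not reproduced here, the layout is an assumption: the 𝐾 = 𝑁 − 𝐿 + 1 lagged vectors are stored as rows, giving a 𝐾 × 𝐿 matrix (Equation (2.2) may use the transposed 𝐿 × 𝐾 layout). The helper name `trajectory_matrix` is hypothetical.

```python
import numpy as np


def trajectory_matrix(x: np.ndarray, L: int) -> np.ndarray:
    """Embed a single-channel series x (length N) with window length L.

    Assumption: the K = N - L + 1 lagged vectors x[j:j+L] are stored as
    rows, giving a K x L matrix; Equation (2.2) may use the transposed
    (L x K) layout instead.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    if not 1 < L < N:
        raise ValueError("window length must satisfy 1 < L < N")
    K = N - L + 1
    # Row j is the j-th lagged vector (x_j, x_{j+1}, ..., x_{j+L-1}).
    return np.array([x[j:j + L] for j in range(K)])


x = np.arange(1.0, 7.0)          # N = 6
X = trajectory_matrix(x, L=3)    # K = 4
print(X.shape)                   # (4, 3)
print(X)
```

For long series, NumPy's `sliding_window_view` (in `numpy.lib.stride_tricks`, NumPy 1.20 or later) returns the same 𝐾 × 𝐿 array as a read-only view, which avoids copying the data.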
Both the rows and the columns of the trajectory matrix, 𝐗, are sub-series of the original time series, and Equation (2.2) defines a one-to-one correspondence between the trajectory matrix and the original time series. When the window length, 𝐿, is sufficiently large, each lagged vector can be treated as a separate series and used to investigate the dynamic characteristics of the time series. The simplest example is the well-known ‘moving average’ method, in which the average of each lagged vector is computed; much more sophisticated approaches also exist. In any case, the window length must be large enough for each lagged vector to capture the essential part of the dynamic characteristics. From another point of view, both the row size and the column size must be large enough to allow a clear separation in the subsequent SVD, where the singular vectors describe the detailed content of the original time series and the singular values represent the energy associated with each frequency component (Bozzo et al., 2010).
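The link between lagged vectors and the moving average can be illustrated with a short sketch: averaging each lagged vector of length 𝐿 yields an 𝐿-point moving average. The test signal, the window length, the row-wise layout of lagged vectors, and the function name `lagged_vector_means` are illustrative assumptions; the result is cross-checked against `numpy.convolve`.

```python
import numpy as np


def lagged_vector_means(x: np.ndarray, L: int) -> np.ndarray:
    """Average each lagged vector x[j:j+L]; this is an L-point moving average."""
    x = np.asarray(x, dtype=float)
    K = x.size - L + 1
    # Lagged vectors stored as rows (K x L layout assumed).
    X = np.array([x[j:j + L] for j in range(K)])
    return X.mean(axis=1)


rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.3 * rng.standard_normal(200)
L = 10
ma = lagged_vector_means(x, L)
# Cross-check against a convolution-based moving average over the same window.
ref = np.convolve(x, np.ones(L) / L, mode="valid")
print(ma.shape, np.allclose(ma, ref))  # (191,) True
```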
The trajectory matrix in Equation (2.2) possesses an obvious symmetry property: the transposed matrix, 𝐗𝑇, is the trajectory matrix of the same time series with window length of 𝐾 rather than 𝐿.
The (𝑖, 𝑗)th element is 𝐱𝑖𝑗 = 𝐱𝑖+𝑗−1, so the trajectory matrix has equal elements on the (positive-sloping) skew-diagonals, or anti-diagonals, along which 𝑖 + 𝑗 is constant. Thus, the trajectory matrix is a Hankel matrix (or catalecticant matrix). This useful property, referred to as Hankel diagonals (skew-diagonals or anti-diagonals), is used in the final step to reconstruct the time series.
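The element rule 𝐱𝑖𝑗 = 𝐱𝑖+𝑗−1, the Hankel (anti-diagonal) structure, and the transposition symmetry can be checked numerically. The sketch below assumes the same row-wise layout of lagged vectors (a 𝐾 × 𝐿 matrix); the helper `trajectory_matrix` and the short test series are hypothetical. Because NumPy indices start at 0, the element rule becomes `X[i, j] == x[i + j]`.

```python
import numpy as np


def trajectory_matrix(x, L):
    """Hypothetical helper: stack the lagged vectors x[j:j+L] as rows (K x L)."""
    x = np.asarray(x, dtype=float)
    return np.array([x[j:j + L] for j in range(x.size - L + 1)])


x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0])  # N = 7
L = 3
K = x.size - L + 1                                 # K = 5
X = trajectory_matrix(x, L)                        # shape (K, L) = (5, 3)

# Element rule x_ij = x_{i+j-1} (1-based) becomes X[i, j] == x[i + j] (0-based).
assert all(X[i, j] == x[i + j] for i in range(K) for j in range(L))

# Hankel structure: every anti-diagonal (constant i + j) holds a single value.
# Flipping the columns turns anti-diagonals into ordinary diagonals.
flipped = np.fliplr(X)
for k in range(-(K - 1), L):
    d = flipped.diagonal(k)
    assert np.all(d == d[0])

# Symmetry: X^T is the trajectory matrix of the same series with window K.
assert np.array_equal(X.T, trajectory_matrix(x, K))
print("all checks passed")
```

There are 𝐾 + 𝐿 − 1 = 𝑁 anti-diagonals, one per sample of the series, which is why the trajectory matrix and the series determine each other uniquely.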
Second Step: Singular Value Decomposition
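The second step is to apply SVD to the trajectory matrix, 𝐗:
$$\mathbf{X} = \mathbf{U}\boldsymbol{\sigma}\mathbf{V}^T, \qquad \mathbf{U} = [\mathbf{u}_1\; \mathbf{u}_2\; \cdots\; \mathbf{u}_L] \qquad (2.3)$$
where 𝛔 holds the singular values, 𝜎1 ≥ 𝜎2 ≥ ⋯ ≥ 0, on its diagonal; the columns of 𝐔 are called the left-singular vectors and the columns of 𝐕 are called the right-singular vectors of the trajectory matrix. Assume that the rank of the trajectory matrix is 𝐷: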
$$D = \operatorname{rank}[\mathbf{X}] = \max\{i : \sigma_i > 0\} \qquad (2.4)$$
In most applications, the trajectory matrix has full rank, meaning that 𝐷 = min{𝐿, 𝐾}. The collection (𝜎𝑖, 𝐮𝑖, 𝐯𝑖), where 𝑖 = 1, 2, …, 𝐷, is called the 𝑖th eigentriple, and the eigentriples form the basis of the decomposition. Using the SVD, the trajectory matrix can be written as the sum of 𝐷 elementary matrices, 𝐗𝑖 = 𝜎𝑖𝐮𝑖𝐯𝑖𝑇:
$$\mathbf{X} = \sum_{i=1}^{D}\mathbf{X}_i = \mathbf{X}_1 + \mathbf{X}_2 + \cdots + \mathbf{X}_D \qquad (2.5)$$
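A minimal NumPy sketch of the SVD step follows, assuming the same 𝐾 × 𝐿 layout of lagged vectors as rows. It extracts the eigentriples (𝜎𝑖, 𝐮𝑖, 𝐯𝑖), estimates the rank 𝐷 with a relative tolerance (an assumption, since floating-point singular values are rarely exactly zero), and checks that the elementary matrices 𝐗𝑖 = 𝜎𝑖𝐮𝑖𝐯𝑖𝑇 sum back to 𝐗. The function names and the noisy test sinusoid are hypothetical.

```python
import numpy as np


def trajectory_matrix(x, L):
    # Lagged vectors as rows (K x L layout assumed, K = N - L + 1).
    x = np.asarray(x, dtype=float)
    return np.array([x[j:j + L] for j in range(x.size - L + 1)])


def eigentriples(X, tol=1e-10):
    """Return (sigma_i, u_i, v_i) for all singular values above a relative tolerance."""
    # Thin SVD: X = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    D = int(np.sum(s > tol * s[0]))  # numerical rank, cf. Equation (2.4)
    return [(s[i], U[:, i], Vt[i, :]) for i in range(D)]


rng = np.random.default_rng(1)
t = np.arange(100)
x = np.sin(2 * np.pi * t / 12) + 0.5 * rng.standard_normal(t.size)
X = trajectory_matrix(x, L=24)                   # K = 77

triples = eigentriples(X)
# Elementary matrices X_i = sigma_i u_i v_i^T, which sum back to X.
elementary = [s * np.outer(u, v) for s, u, v in triples]
print(len(triples), min(X.shape))                # rank D vs. min{L, K}
print(np.allclose(sum(elementary), X))           # True
```

With `full_matrices=False`, NumPy returns the thin SVD, so 𝐔 has min{𝐿, 𝐾} columns, matching the number of eigentriples of a full-rank trajectory matrix.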
Under the assumption that the time series is stationary, the SVD of the trajectory matrix in Equation (2.3) can be replaced by an eigendecomposition of the lag-covariance matrix, 𝐂, so that principal component analysis (PCA) is performed instead of SVD for the efficient analysis of large data sets. Two distinct methods are widely used to define the lag-covariance matrix: the BK approach (Broomhead and King, 1986) and the VG approach (Vautard and Ghil, 1989). An important observation is that the VG approach is equivalent to the BK approach with 𝐾 − 1 zeros padded before and after the original time series (Allen and Smith, 1996; Ghil and Taricco, 1997); hence, only the BK approach is introduced here:
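$$\mathbf{C} = \mathbf{X}^T\mathbf{X} = (\mathbf{U}\boldsymbol{\sigma}\mathbf{V}^T)^T(\mathbf{U}\boldsymbol{\sigma}\mathbf{V}^T) = \mathbf{V}\boldsymbol{\sigma}^2\mathbf{V}^T \qquad (2.7)$$
where the right-singular vector matrix, 𝐕, is now the eigenvector matrix of 𝐂, and the singular value matrix, 𝛔, is the square root of the eigenvalue matrix. In this approach, the left-singular vectors can be derived as:
$$\mathbf{u}_i = \frac{1}{\sigma_i}\mathbf{X}\mathbf{v}_i \qquad (2.8)$$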
and the elementary matrices in Equation (2.5) can be rewritten as:
$$\mathbf{X} = \sum_{i=1}^{D}\mathbf{X}_i = \sum_{i=1}^{D}\mathbf{X}\mathbf{v}_i\mathbf{v}_i^T = \mathbf{X}\mathbf{v}_1\mathbf{v}_1^T + \mathbf{X}\mathbf{v}_2\mathbf{v}_2^T + \cdots + \mathbf{X}\mathbf{v}_D\mathbf{v}_D^T \qquad (2.9)$$
These equations make SSA and M-SSA much more efficient when dealing with large data sets. The singular values (or eigenvalues) arranged in descending order are referred to as the singular spectrum, which can be used to decide which principal components are included in the next step.
The SVD in Equation (2.3) possesses a number of optimality properties. One is that, among all matrices of rank 𝑟 < 𝐷, the matrix $\hat{\mathbf{X}} = \sum_{i=1}^{r}\mathbf{X}_i$ provides the best approximation to the trajectory matrix, in the sense that the Frobenius norm of the error matrix, $\mathbf{X} - \hat{\mathbf{X}}$, is minimized (Golyandina and Zhigljavsky, 2013). Another optimal feature relates to the directions determined by the eigenvectors, 𝐯1, 𝐯2,