
2.2 Three Types of Semi-Metrics

2.2.2 Semi-Metrics Based on Functional PLS

In this subsection we first introduce partial least squares (PLS) regression and several forms of PLS, and then give a brief introduction to the semi-metric based on functional PLS.

Partial Least Squares Regression:

Partial least squares is a widely used method for modeling relations between a set of dependent variables and a large set of predictors. PLS generalizes and combines features from principal component analysis and multiple regression. It originated in the social sciences (particularly economics; Herman Wold, 1966) but first became popular in chemometrics due to Herman's son Svante (Geladi and Kowalski, 1986). It was first presented as an algorithm analogous to the power method (used for computing eigenvectors) but was subsequently given a proper statistical interpretation (Frank and Friedman, 1993; Höskuldsson, 1988; Helland, 1990; Tenenhaus, 1998).

PLS is usually used to predict a response variable Y from predictors X and to describe their common structure. If X has full rank, ordinary multiple regression can be applied. When the number of predictors is larger than the number of observations, however, X becomes singular and ordinary multiple regression is no longer practicable. Several methods have been developed to solve this problem, e.g. principal component regression. But since the components there are chosen to explain the variation of X, nothing guarantees that they are also suitable for explaining Y. PLS regression instead finds components of X that are also related to Y: it searches for a set of components (latent vectors) that performs a simultaneous decomposition of X and Y under the constraint that these components explain as much as possible of the covariance between X and Y.

Let X be a zero-mean (n × N) matrix and Y a zero-mean (n × M) matrix, where n denotes the number of data samples. PLS decomposes X and Y into the form

$$X = TP^{\top} + E, \qquad Y = UQ^{\top} + F \tag{1}$$

where T and U are (n × p) matrices of the p extracted components, the (N × p) matrix P and the (M × p) matrix Q are the loading matrices, and the (n × N) matrix E and the (n × M) matrix F are the residual matrices. PLS, which is based on the nonlinear iterative partial least squares (NIPALS) algorithm, finds weight vectors w and c such that

$$[\mathrm{cov}(t, u)]^2 = [\mathrm{cov}(Xw, Yc)]^2 = \max_{\|r\| = \|s\| = 1} [\mathrm{cov}(Xr, Ys)]^2$$

where $\mathrm{cov}(t, u) = t^{\top}u/n$ denotes the sample covariance between the components t and u. The NIPALS algorithm starts with a random initial value of the component u and repeats the following sequence of steps until convergence.

Step 1. $w = X^{\top}u/(u^{\top}u)$ (estimate X weights)
Step 2. $\|w\| \to 1$ (normalize w to unit length)
Step 3. $t = Xw$ (estimate X component)
Step 4. $c = Y^{\top}t/(t^{\top}t)$ (estimate Y weights)
Step 5. $\|c\| \to 1$ (normalize c to unit length)
Step 6. $u = Yc$ (estimate Y component)

In particular, if M = 1, then Y is a one-dimensional vector, denoted by y, and u = y. In this case the NIPALS procedure converges in a single iteration.
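As a concrete illustration, the following minimal NumPy sketch extracts a single pair of components along Steps 1–6, assuming centered matrices X (n × N) and Y (n × M); the function name `nipals_component` and the convergence settings are illustrative choices, not part of any reference implementation.

```python
import numpy as np

def nipals_component(X, Y, tol=1e-10, max_iter=500, seed=0):
    """Extract one pair of components (t, u) with weights (w, c) by NIPALS.
    X: centered (n x N) predictors; Y: centered (n x M) responses."""
    rng = np.random.default_rng(seed)
    # per the remark above: if M = 1, u is just y; otherwise start at random
    u = Y[:, 0] if Y.shape[1] == 1 else rng.standard_normal(X.shape[0])
    t_old = np.zeros(X.shape[0])
    for _ in range(max_iter):
        w = X.T @ u / (u @ u)          # Step 1: estimate X weights
        w /= np.linalg.norm(w)         # Step 2: normalize w
        t = X @ w                      # Step 3: estimate X component
        c = Y.T @ t / (t @ t)          # Step 4: estimate Y weights
        c /= np.linalg.norm(c)         # Step 5: normalize c
        u = Y @ c                      # Step 6: estimate Y component
        if np.linalg.norm(t - t_old) < tol * np.linalg.norm(t):
            break                      # component has stabilized
        t_old = t
    return t, u, w, c
```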

It can be shown that the weight vector w corresponds to the first eigenvector in the following chain of proportionalities: $w \propto X^{\top}u \propto X^{\top}Yc \propto X^{\top}YY^{\top}t \propto X^{\top}YY^{\top}Xw$. Since w is thus an eigenvector of $X^{\top}YY^{\top}X$, it is the first left singular vector of the matrix $X^{\top}Y$; similarly, the weight vector c is the corresponding right singular vector of $X^{\top}Y$.

The latent vectors t and u are then given as t = Xw and u = Yc, where the weight vector c is defined in Steps 4 and 5 above. Analogous eigenvalue formulations can be derived for the estimates of t, u, and c.
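This characterization can be checked numerically. The sketch below, reusing the hypothetical `nipals_component` from above, compares the NIPALS weight vectors with the leading singular vectors of $X^{\top}Y$ (singular vectors are only defined up to sign, so both orientations are compared).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 8)); X -= X.mean(axis=0)   # centered predictors
Y = rng.standard_normal((50, 3)); Y -= Y.mean(axis=0)   # centered responses

# Leading singular triplet of X'Y: left vector ~ w, right vector ~ c.
W, s, Ct = np.linalg.svd(X.T @ Y, full_matrices=False)
w_svd, c_svd = W[:, 0], Ct[0, :]

t, u, w, c = nipals_component(X, Y)
print(min(np.linalg.norm(w - w_svd), np.linalg.norm(w + w_svd)))  # ~ 0
print(min(np.linalg.norm(c - c_svd), np.linalg.norm(c + c_svd)))  # ~ 0
```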

Forms of PLS:

PLS is an iterative process. After the latent vectors t and u have been extracted, the matrices X and Y are deflated by subtracting their rank-one approximations based on t and u. Different deflation schemes give rise to several variants of PLS. By equation (1) the loading vectors p and q are computed as the coefficients of regressing X on t and Y on u, respectively: $p = X^{\top}t/(t^{\top}t)$ and $q = Y^{\top}u/(u^{\top}u)$.

1.) PLS Mode A:

PLS Mode A is based on rank-one deflation of the individual matrices using the corresponding latent and loading vectors. In this case, the X and Y matrices are deflated as $X = X - tp^{\top}$ and $Y = Y - uq^{\top}$. This method was originally proposed by Herman Wold (1966) to model the relations between different sets of data.
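In NumPy terms, and continuing the sketches above (with t, u the components just extracted), this Mode A deflation could look as follows:

```python
# PLS Mode A deflation: subtract each matrix's own rank-one approximation.
p_load = X.T @ t / (t @ t)      # loading p = X't/(t't)
q_load = Y.T @ u / (u @ u)      # loading q = Y'u/(u'u)
X = X - np.outer(t, p_load)     # X <- X - t p'
Y = Y - np.outer(u, q_load)     # Y <- Y - u q'
```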

2.) PLS1 and PLS2:

PLS1 (one of the two blocks of variables, the dependent or the independent one, consists of a single variable) and PLS2 (both blocks are multidimensional) are used as PLS regression methods. This form of PLS is the most popular PLS approach. Its main feature is that the relation between X and Y is treated as asymmetric.

The main assumptions of this form of PLS are:

(i) The latent vectors $\{t_i\}_{i=1}^{p}$ are good predictors of Y, where p denotes the number of extracted latent vectors.

(ii) A linear inner relation between the latent vectors t and u exists; that is,

U = TD + H

where D is a (p × p) diagonal matrix and H is a residual matrix.

The asymmetric assumption about the relation between the independent and dependent variables is translated into a deflation scheme. Because the latent vectors $\{t_i\}_{i=1}^{p}$ are good predictors of Y, they are also used to deflate Y; that is, a component of the regression of Y on t is removed from Y at each iteration of PLS:

$$X = X - tp^{\top} \quad \text{and} \quad Y = Y - tt^{\top}Y/(t^{\top}t) = Y - tc^{\top}$$

where the weight vector c is defined in Step 4 of NIPALS. This deflation scheme ensures that the extracted latent vectors $\{t_i\}_{i=1}^{p}$ are mutually orthogonal.
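A sketch of the full PLS2 extraction under this deflation scheme, again built on the hypothetical `nipals_component` above; it extracts several components and verifies numerically that the score vectors are mutually orthogonal.

```python
import numpy as np

def pls2_scores(X, Y, n_comp):
    """Extract n_comp PLS2 score vectors using the asymmetric deflation."""
    X, Y = X.copy(), Y.copy()
    scores = []
    for _ in range(n_comp):
        t, u, w, c = nipals_component(X, Y)
        p_load = X.T @ t / (t @ t)                  # p = X't/(t't)
        X = X - np.outer(t, p_load)                 # X <- X - t p'
        Y = Y - np.outer(t, Y.T @ t / (t @ t))      # Y <- Y - t t'Y/(t't)
        scores.append(t)
    return np.column_stack(scores)

T = pls2_scores(X, Y, n_comp=3)
print(np.round(T.T @ T, 6))   # off-diagonal entries should be ~ 0
```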

3.) SIMPLS:

In order to avoid the deflation steps at each iteration of PLS1 and PLS2, de Jong (1993) introduced another form of PLS, denoted SIMPLS. The SIMPLS approach directly finds weight vectors $\{\tilde{w}_i\}_{i=1}^{p}$ which are applied to the original matrix X. The requirement that the latent vectors $\{\tilde{t}_i\}_{i=1}^{p}$ be mutually orthogonal is kept.
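The sketch below is a simplified, illustrative rendering of de Jong's idea (all names are hypothetical, not from the original paper): the weights are computed from a deflated cross-product matrix $S = X^{\top}Y$ but are always applied to the untouched X, and the resulting scores remain orthogonal.

```python
import numpy as np

def simpls_scores(X, Y, n_comp):
    """Sketch of SIMPLS: weights act on the original X; scores stay orthogonal."""
    S = X.T @ Y                           # cross-product matrix
    loadings, scores = [], []
    for _ in range(n_comp):
        w = np.linalg.svd(S, full_matrices=False)[0][:, 0]  # dominant left sing. vec.
        t = X @ w                         # score from the *original* X
        t = t / np.linalg.norm(t)
        p = X.T @ t                       # X loading
        loadings.append(p)
        # deflate S: project its columns away from the span of the loadings
        P = np.column_stack(loadings)
        S = S - P @ np.linalg.pinv(P.T @ P) @ (P.T @ S)
        scores.append(t)
    return np.column_stack(scores)

T = simpls_scores(X, Y, n_comp=3)
print(np.round(T.T @ T, 6))   # ~ identity: orthogonality preserved
```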

Semi-Metrics Based on Functional PLS:

Let $v_1^q, \ldots, v_p^q$ be the vectors of $\mathbb{R}^J$ obtained by multivariate partial least squares regression (MPLSR), where q denotes the number of factors and p the number of scalar responses. Then the semi-metric based on MPLSR is defined as

$$d_q^{\mathrm{MPLSR}}(x_i, x_{i'}) = \sqrt{\sum_{k=1}^{p} \left( \sum_{j=1}^{J} w_j \, [x_i(t_j) - x_{i'}(t_j)] \, v_k^q(t_j) \right)^2}$$

where $w_1, \ldots, w_J$ are weights which define the approximate integration; a standard choice is $w_j = t_j - t_{j-1}$. When we consider only one scalar response (p = 1), the proximity between two discretized curves is measured along only one direction, which seems inadequate with regard to the complexity of functional data. However, as soon as we consider a multivariate response, this family of semi-metrics allows one to obtain very good results, as is the case in the curve discrimination context.
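To make the definition concrete, here is a hedged NumPy sketch of this semi-metric for two discretized curves. In practice the direction vectors (the columns of `V`) would come from an MPLSR fit; here they are random placeholders, and all names are illustrative.

```python
import numpy as np

def semimetric_mplsr(x1, x2, V, wq):
    """d(x1, x2) = sqrt( sum_k ( sum_j wq_j (x1_j - x2_j) V_jk )^2 ).
    x1, x2: curves sampled on a grid of J points; V: (J x p) directions;
    wq: length-J quadrature weights defining the approximate integration."""
    proj = (wq * (x1 - x2)) @ V        # p approximate integrals
    return float(np.sqrt(np.sum(proj ** 2)))

J, p = 100, 3
tgrid = np.linspace(0.0, 1.0, J)
wq = np.diff(tgrid, prepend=tgrid[0])  # standard choice w_j = t_j - t_{j-1}
rng = np.random.default_rng(2)
V = rng.standard_normal((J, p))        # placeholder for MPLSR directions
x1, x2 = np.sin(2 * np.pi * tgrid), np.cos(2 * np.pi * tgrid)
print(semimetric_mplsr(x1, x2, V, wq))
```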
