
CHAPTER 2. INTEGRATION OF INDEPENDENT COMPONENT ANALYSIS

2.2 MATERIALS AND METHODS

2.2.3 DATA ANALYSIS

2.2.3.1 INDEPENDENT COMPONENT ANALYSIS (ICA)

Independent component analysis (ICA) is a method that transforms observed multivariate data into statistically independent components (ICs) and expresses the ICs as linear combinations of the observed variables. ICA requires that the number of receptors (observed mixtures) be greater than or equal to the number of sources, and that the signals emitted by the sources have non-Gaussian distributions (Hyvärinen and Oja, 2000). The ICs are latent variables and cannot be observed directly, and the mixing matrix is also unknown. The purpose of the ICA algorithm is to determine the mixing matrix (M) or the separating matrix (W). To estimate the unknown sources, W is assumed to be the inverse of M, so that

$$\hat{s} = Wx = M^{-1}Ms \qquad (2.1)$$

where $\hat{s}$ is the estimate of the sources (s) and x = Ms represents the observed spectra of the objects.
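Eq. (2.1) can be checked numerically with a small synthetic example. The sketch below assumes a known 2 × 2 mixing matrix and two synthetic non-Gaussian sources (all values are illustrative, not data from this study); in a real ICA problem M is unknown, and an algorithm such as JADE recovers W only up to the order and scale of the sources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic non-Gaussian sources (rows of s): uniform and Laplacian.
s = np.vstack([rng.uniform(-1.0, 1.0, 1000), rng.laplace(0.0, 1.0, 1000)])

# Hypothetical mixing matrix M; in practice it is unknown.
M = np.array([[1.0, 0.5],
              [0.3, 2.0]])
x = M @ s  # observed mixtures, x = Ms

# With W = M^-1, Eq. (2.1) gives s_hat = W x = M^-1 M s = s.
W = np.linalg.inv(M)
s_hat = W @ x
print(np.allclose(s_hat, s))  # True
```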

In the present study, the JADE (joint approximate diagonalization of eigenmatrices) algorithm (Cardoso and Souloumiac, 1993; Cardoso, 1999) was employed to perform ICA. In general, JADE performs rapidly on spectral data because it works off-the-shelf, without parameters to tune, which is an advantage over other multivariate approaches such as principal component regression (PCR) and PLSR. Assuming that the spectra measured from the unknown mixtures are linear combinations of the spectra of the various components, the relationship can be expressed as:

$$A = MI \qquad (2.2)$$

The spectrum of every sample is thus a linear combination of m ICs. $A_{l \times n}$ is the matrix of spectra of l samples, each containing n values; $I_{m \times n}$ is the matrix of the m independent components; and $M_{l \times m}$ is the mixing matrix, which is related to the concentrations of the components in the mixture. The linear relationship between the mixing matrix (M) and the component concentrations (C) can be expressed as:

$$C = MB \qquad (2.3)$$

where B is the matrix of regression coefficients. In this way, the concentration of each component in the mixture can be determined by combining ICA with linear regression.
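The two-step use of Eqs. (2.2) and (2.3) can be sketched as follows. This is a minimal illustration, not the study's MATLAB program: it does not implement JADE, and instead assumes the IC spectra I are already available (here two synthetic Gaussian bands). The mixing matrix is then recovered by least squares from A = MI, and B is fitted by linear regression of the concentrations on M. Because ICA returns ICs only up to scale, any rescaling of I is absorbed into B.

```python
import numpy as np

rng = np.random.default_rng(1)
l, m, n = 30, 2, 200  # samples, ICs, wavelength points (illustrative sizes)

# Hypothetical IC spectra I (m x n), assumed already extracted by ICA.
wl = np.linspace(0.0, 1.0, n)
I = np.vstack([np.exp(-((wl - 0.3) / 0.05) ** 2),
               np.exp(-((wl - 0.7) / 0.08) ** 2)])

# Synthetic mixing matrix and spectra, A = MI (Eq. 2.2).
M_true = rng.uniform(0.1, 1.0, size=(l, m))
A = M_true @ I

# Step 1: recover M from the spectra by least squares, M = A I^+.
M_hat = A @ np.linalg.pinv(I)

# Synthetic concentrations linearly related to M, C = MB (Eq. 2.3).
B_true = np.array([[12.0], [-3.0]])
C = M_true @ B_true

# Step 2: estimate the regression coefficients B and predict C.
B_hat, *_ = np.linalg.lstsq(M_hat, C, rcond=None)
C_pred = M_hat @ B_hat
print(np.allclose(C_pred, C))  # True
```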

2.2.3.2 PARTIAL LEAST SQUARES REGRESSION (PLSR)

Partial least squares regression (PLSR), a typical method in chemometrics (Wold et al., 2001), has been widely applied in chemistry and engineering. When PLSR is applied to spectral analysis, the spectra can be regarded as being composed of several latent components, similar to principal components (PCs), each of which is expressed as a ‘factor’ in the PLSR algorithm. The factors are ordered by their influence, so that more important factors come earlier in the sequence (factor 1, then factor 2, and so on). Because PLSR uses information from the spectral bands, its results can be improved by selecting an appropriate number of factors and specific wavelength ranges. To avoid overfitting the PLSR model with too many factors, the number of factors in this study was selected according to the following principles: (1) the maximum number of factors was limited to one-tenth of the number of calibration samples plus 2 to 3; (2) a new factor was not added if it caused a rise in the prediction error; and (3) a new factor was not added if it resulted in a standard error of validation (SEV) smaller than the standard error of calibration (SEC).
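The three factor-selection principles can be written as a small decision function. This is a sketch under stated assumptions: the SEC and SEV values for each candidate number of factors are taken as given inputs (hypothetical, e.g. from repeated PLSR fits), "prediction error" in rule (2) is interpreted as SEV, and "not added" is interpreted as stopping the search at the first rejected factor.

```python
import math

def select_n_factors(sec, sev, n_calibration, extra=2):
    """Choose the number of PLSR factors from per-model SEC and SEV.

    sec[k] and sev[k] belong to a model with k + 1 factors (hypothetical
    inputs). `extra` is the "+2 to 3" allowance of rule (1).
    """
    # Rule 1: cap at one-tenth of the calibration samples plus 2 to 3.
    max_factors = min(len(sev), math.floor(n_calibration / 10) + extra)
    chosen = 1
    for k in range(1, max_factors):
        # Rule 2: do not add factor k + 1 if it raises the prediction error.
        if sev[k] > sev[k - 1]:
            break
        # Rule 3: do not add factor k + 1 if it makes SEV smaller than SEC.
        if sev[k] < sec[k]:
            break
        chosen = k + 1
    return chosen

sec = [0.90, 0.60, 0.45, 0.40, 0.30, 0.25]
sev = [1.00, 0.70, 0.55, 0.50, 0.52, 0.51]
print(select_n_factors(sec, sev, n_calibration=40))  # 4
```

With these illustrative numbers the fifth factor is rejected by rule (2) because SEV rises from 0.50 to 0.52.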

2.2.3.3 SPECTRAL PRETREATMENTS

The purpose of spectral pretreatment was to eliminate spectral variation that was not caused by the chemical information contained in the samples (de Noord, 1994). Three spectral pretreatments were applied to the raw NIR spectra of sucrose solutions and wax jambu in this study: (1) normalization; (2) first derivative with normalization; and (3) second derivative with normalization. Normalization scaled the absorbance spectra of all samples to the interval from -1 to 1. With a view to applying ICA to fast on-line inspection of fruit, the pretreatment parameters (the number of smoothing points and the derivative gap) were not optimized, in order to save computation time. The derivative gap was set to a minimal value of 2 so as to retain as many wavelength values as possible as model inputs.
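The three pretreatments can be sketched in NumPy as below. Two details are assumptions not stated in the text: normalization is applied to each spectrum (row) separately by min-max scaling, and the derivative is a simple finite difference with gap g, d[i] = x[i + g] - x[i], without smoothing, applied twice for the second derivative. The gap-derivative definition used by WinISI II or by the authors' MATLAB code may differ.

```python
import numpy as np

def normalize(spectra):
    """Min-max scale each spectrum (row) to the interval [-1, 1]."""
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against flat spectra
    return 2.0 * (spectra - lo) / span - 1.0

def gap_derivative(spectra, gap=2, order=1):
    """Simple gap derivative along the wavelength axis (no smoothing).

    Each pass computes x[i + gap] - x[i] and shortens the spectrum by `gap`.
    """
    out = spectra
    for _ in range(order):
        out = out[:, gap:] - out[:, :-gap]
    return out

def pretreat(spectra, method, gap=2):
    """Apply one of 'norm', 'd1+norm' or 'd2+norm' to an (n_samples, n_wl) array."""
    if method == "norm":
        return normalize(spectra)
    if method == "d1+norm":
        return normalize(gap_derivative(spectra, gap=gap, order=1))
    if method == "d2+norm":
        return normalize(gap_derivative(spectra, gap=gap, order=2))
    raise ValueError(f"unknown pretreatment: {method}")
```

Keeping the gap at 2 means the first and second derivatives lose only 2 and 4 wavelength points, respectively, which matches the stated aim of retaining as many inputs as possible.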

2.2.3.4 MODEL ESTABLISHMENT

This study used the mathematical software MATLAB (The MathWorks, Inc., Natick, MA, U.S.A.) to write ICA programs based on the JADE algorithm for establishing ICA spectral calibration models. The ICA results were compared with PLSR spectral calibration models built with the WinISI II chemometric software package (Infrasoft International, LLC., Port Matilda, PA, U.S.A.). For both wax jambu and sucrose solution samples, the analysis procedure of ICA and PLSR consisted of (1) selecting the calibration and validation sets, (2) applying spectral pretreatments, and (3) determining the best calibration model. Because the sucrose solutions were mixtures of sucrose powder and water, their composition was rather simple. Therefore, the full wavelength range (400 to 2498 nm) was used to compare the noise tolerance of ICA and PLSR, since noisier spectral bands (e.g., 2200 to 2498 nm) often affect the analysis results. For wax jambu, whose composition is more complicated than that of sucrose solutions, specific wavelength ranges had to be identified, which required an additional correlation analysis between wavelengths and sugar content. All sucrose solution samples and all wax jambu samples were used, separately, to assess the tolerance of ICA and PLSR. The samples were divided into calibration and validation sets at a ratio of 2:1 based on sugar content: all samples were ranked in ascending order of sugar content, the 1st and 2nd samples were assigned to the calibration set and the 3rd to the validation set, and the same sequence was repeated for the remaining samples. The same calibration and validation sets were used for both the ICA and PLSR analyses.
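The 2:1 calibration/validation assignment described above can be expressed compactly. The function name is hypothetical; the sketch ranks samples by ascending sugar content and sends ranks 1 and 2 of every group of three to calibration and rank 3 to validation. A stable sort is used so that ties keep their original order, which is an assumption, since the text does not say how ties were handled.

```python
import numpy as np

def split_by_sugar(sugar):
    """Return (calibration_idx, validation_idx) for a 2:1 split by sugar content."""
    order = np.argsort(sugar, kind="stable")  # sample indices, ascending sugar
    ranks = np.arange(len(order))
    calibration = order[ranks % 3 != 2]  # 1st and 2nd of every three
    validation = order[ranks % 3 == 2]   # 3rd of every three
    return calibration, validation

sugar = np.array([12.1, 8.4, 10.0, 9.5, 11.2, 7.9])
cal, val = split_by_sugar(sugar)
print(cal.tolist(), val.tolist())  # [5, 1, 2, 4] [3, 0]
```

Because the assignment follows the sugar ranking, both sets span the full range of sugar content, and the same indices can be reused for the ICA and PLSR models.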

After the spectral calibration models of sucrose solution and wax jambu had been built, they were used to predict the sugar contents of the calibration and validation sets. Predictability was evaluated using the following statistical parameters: the correlation coefficient of the calibration set (Rc), the standard error of calibration (SEC), the correlation coefficient of the validation set (rv), the standard error of validation (SEV), the bias, and the ratio of performance to deviation (RPD), as defined by:

where Yc and Yv represent the estimated sugar contents of the calibration set and the validation set, respectively; Yr is the reference sugar content; nc and nv are the numbers of samples in the calibration set and validation set; and SD is the standard deviation of sugar content within the validation set. RPD is one of the indices used to evaluate the performance of a model: a larger RPD indicates better predictive performance, and sufficiently high RPD values are considered adequate for analytical purposes in most NIR spectroscopy applications for agricultural products (Williams and Sobering, 1993).
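The evaluation statistics can be computed as in the sketch below. It uses one common set of definitions from the NIR literature, which is an assumption: SEC = sqrt(Σ(Yc − Yr)² / (nc − 1)), bias = Σ(Yv − Yr) / nv, a bias-corrected SEV = sqrt(Σ(Yv − Yr − bias)² / (nv − 1)), and RPD = SD / SEV. The study's own formulas may differ, for example in the denominators of SEC and SEV.

```python
import numpy as np

def calibration_stats(y_pred, y_ref):
    """Correlation coefficient and SEC for the calibration set (assumed definitions)."""
    y_pred, y_ref = np.asarray(y_pred, float), np.asarray(y_ref, float)
    r = np.corrcoef(y_pred, y_ref)[0, 1]
    sec = np.sqrt(np.sum((y_pred - y_ref) ** 2) / (len(y_ref) - 1))
    return r, sec

def validation_stats(y_pred, y_ref):
    """Correlation coefficient, bias, bias-corrected SEV and RPD (assumed definitions)."""
    y_pred, y_ref = np.asarray(y_pred, float), np.asarray(y_ref, float)
    n = len(y_ref)
    r = np.corrcoef(y_pred, y_ref)[0, 1]
    bias = np.mean(y_pred - y_ref)
    sev = np.sqrt(np.sum((y_pred - y_ref - bias) ** 2) / (n - 1))
    sd = np.std(y_ref, ddof=1)  # SD of reference sugar content in this set
    return r, bias, sev, sd / sev

y_ref = [10.0, 11.0, 12.0, 13.0]   # illustrative reference values
y_pred = [10.2, 11.1, 11.8, 13.3]  # illustrative predictions
print(calibration_stats(y_pred, y_ref))
print(validation_stats(y_pred, y_ref))
```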

Application of Independent Component Analysis to Near-Infrared Analysis of Biological Materials (pp. 33-39)