CHAPTER 2. INTEGRATION OF INDEPENDENT COMPONENT ANALYSIS
2.3 RESULTS AND DISCUSSION
2.3.1 SUCROSE SOLUTION
The 78 sucrose solution samples were divided at a ratio of 2:1 into 52 calibration
samples and 26 validation samples. The distribution of their sugar content (°Brix) is
shown in Table 2.1. The maximum values of the two sets differed by 0.2 °Brix, and the
differences in the minimum, mean, and standard deviation were all no greater than
0.5 °Brix; the coefficients of variation (CV) differed by only 0.02. The two sets
therefore met the requirement of having consistent sugar content distributions.
Table 2.1 Summary of sucrose solutions and sample sugar contents. Total samples (n =
78), calibration set (n = 52) and validation set (n = 26) were arranged to have
consistent distributions of sugar content.
Sucrose Solutions
                          Sugar Content (°Brix)
Group              n      Max.    Min.    Mean    SD      CV
Total Samples      78     19.00   0.40     9.83   5.48    0.56
Calibration Set    52     19.00   0.40     9.72   5.52    0.57
Validation Set     26     18.80   0.90    10.06   5.52    0.55
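The source does not state how the 2:1 split was carried out. One common way to obtain
calibration and validation sets with matching distributions is to sort the samples by
reference value and send every third one to the validation set. The sketch below
assumes that scheme; the helper names `split_two_to_one` and `summarize` and the
synthetic values are hypothetical and are not the thesis data.

```python
import statistics

def split_two_to_one(values):
    """Sort by reference value; the middle sample of every triple goes to validation."""
    ordered = sorted(values)
    calibration = [v for i, v in enumerate(ordered) if i % 3 != 1]
    validation = [v for i, v in enumerate(ordered) if i % 3 == 1]
    return calibration, validation

def summarize(values):
    """Return the statistics listed in Table 2.1 (CV taken as SD / mean)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {"n": len(values), "max": max(values), "min": min(values),
            "mean": mean, "sd": sd, "cv": sd / mean}

# Synthetic sugar contents for illustration only (78 evenly spaced values).
brix = [0.4 + 0.25 * i for i in range(78)]
cal, val = split_two_to_one(brix)

for name, group in (("Calibration", cal), ("Validation", val)):
    print(name, summarize(group))
```

With this scheme the calibration set keeps both extreme samples, so the validation
range always lies inside the calibration range, as it does in Table 2.1.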
2.3.1.1 SELECTION OF THE MOST APPROPRIATE NUMBER OF ICS
According to the definition of ICA, the observed signals can be decomposed into at
most as many ICs (independent components) as there are samples (Hyvärinen and Oja,
2000). This study used the full wavelength range (400 to 2498 nm) as the input to ICA,
applied ICA to the original spectra of the 52 calibration samples of sucrose solution
with 1 to 52 ICs, and observed the prediction error of the resulting calibration
model. The cases with and without normalization were both examined. With only one IC
the prediction error was high, so only the results for 2 to 50 ICs are shown. As shown
in Fig. 2.2, when the number of ICs increased to 4, the SEC of the case without
normalization decreased sharply to 0.14 °Brix and the SEV fell to 0.21 °Brix,
indicating that the number of ICs influences the predictability of the spectral
calibration model. However, applying more ICs did not necessarily improve the
calibration model, because the sucrose solutions were mixtures of only sucrose and
water; hence only the first 4 ICs were applied in the calibration model.
The results of ICA with normalized spectra can also be observed in Fig. 2.2. The
prediction error decreased greatly as the number of ICs increased to 7; the SEC and SEV
with 7 ICs were 0.12 and 0.22 °Brix, respectively. Normalization apparently gave less
variation in SEV than the original spectra did.
Fig. 2.2 Relationship between the numbers of ICs and errors of the predicted sugar
content for sucrose solutions. The most appropriate number of ICs for
normalized spectra was determined by the tendency of SEC (green-short dash
line) and SEV (blue-dash dot dot line) values.
2.3.1.2 SPECTRA DECOMPOSITION AND CORRELATION ANALYSIS OF
SUGAR CONTENT
After ICA, it is critical to examine whether the 7 ICs were statistically
independent. As an illustration, IC 1 and IC 4 were selected and their correlation is
shown in Fig. 2.3; the coefficient of determination (r²) was only 4.0 × 10⁻⁸,
indicating that IC 1 and IC 4 were independent of each other. The plots of every other
pair among the 7 ICs showed a distribution similar to that in Fig. 2.3, with all r²
values smaller than 0.243, conforming to the mutually independent characteristics of
ICs (Hyvärinen and Oja, 2000).
Fig. 2.3 Distribution of calibration and validation samples of sucrose solutions in IC
1-IC 4 space. IC 1 and IC 4 were randomly selected from the 7 ICs.
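The pairwise check described above can be sketched as follows. This is a minimal
illustration, not the procedure used in the study: `r_squared` and `max_pairwise_r2`
are hypothetical helper names, and the three toy columns stand in for IC scores.

```python
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def max_pairwise_r2(columns):
    """Largest r2 over every pair of columns (one column per IC)."""
    return max(r_squared(columns[i], columns[j])
               for i, j in combinations(range(len(columns)), 2))

# Toy scores for three components (illustrative values only).
ic_a = [1.0, -1.0, 1.0, -1.0]
ic_b = [1.0, 1.0, -1.0, -1.0]
ic_c = [1.0, -1.0, -1.0, 1.0]

print(max_pairwise_r2([ic_a, ic_b, ic_c]))
```

A small r² shows only that two components are uncorrelated, which is a necessary but
not a sufficient condition for statistical independence.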
Eq. 2.5 shows that the constituent information ‘sugar content’ should mainly
correspond to a specific IC, and there should be a high correlation between the values
of that IC in the mixing matrix and the sugar content. A plot was therefore made of the
reference sugar content against the values of each column (each IC) of the mixing
matrix. As shown in Fig. 2.4, the correlation coefficient (r) between IC 1 and the
reference sugar content reached 0.977, which meant that, with 7 ICs extracted, IC 1
carried most of the information in the spectra arising from the sugar content. The
results were in agreement with Westad (2005). The selection of the number of ICs is
therefore important, since it influences how the information is used after spectral
decomposition.
Fig. 2.4 Correlation between the values of IC 1 in the mixing matrix and the reference
sugar contents of sucrose solutions.
The regression coefficient matrix obtained from the NIR spectra and the reference
sugar contents of the calibration set is shown in Table 2.2; the values from top to
bottom correspond to IC 1 to IC 7. All values were compared in terms of their absolute
values. The value in the first row (IC 1) was the largest, followed by that of IC 4.
The results agreed with the order of correlation between each IC and the reference
sugar content, and indicated that the importance of each IC was independent of the IC
sequence. Each major constituent had its corresponding IC decomposed by ICA, in
which the IC contribution was clearly defined, so that all constituents of the mixtures
could be distinguished by ICA (Chen and Wang, 2001; Hahn and Yoon, 2006; Pasadakis and
Kardamakis, 2006; Kardamakis et al., 2007; Kaneko et al., 2008).
Table 2.2 Regression coefficient matrix of sucrose solutions with 7 ICs extracted
from the NIR spectra of the calibration set. Correlation between the absolute
value of each IC in the regression coefficient matrix and sugar content was
examined.
IC #    Regression Coefficient
1       -2.1811
2       -0.2843
3       -0.1843
4        1.2976
5        0.1876
6       -0.1334
7       -0.1416
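As a small worked check, the coefficients of Table 2.2 can be ranked by absolute
value. The dictionary below simply transcribes the table; the variable names are
illustrative.

```python
# Regression coefficients transcribed from Table 2.2 (IC number -> coefficient).
coefficients = {
    1: -2.1811,
    2: -0.2843,
    3: -0.1843,
    4: 1.2976,
    5: 0.1876,
    6: -0.1334,
    7: -0.1416,
}

# Rank the ICs from largest to smallest absolute coefficient.
ranked = sorted(coefficients, key=lambda ic: abs(coefficients[ic]), reverse=True)
print(ranked)  # [1, 4, 2, 5, 3, 7, 6]
```

The sign of a coefficient does not affect the ranking, which is why the comparison
is made on absolute values.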
The ICs decomposed from the spectra by ICA reflected the spectral characteristics of
the unknown mixture and, under ideal conditions, constituted the spectra of the pure
materials in this mixture (Chen and Wang, 2001; Hahn and Yoon, 2006; Pasadakis and
Kardamakis, 2006; Kardamakis et al., 2007). Since the sucrose solutions were mixtures
of sucrose and water, and the spectra comprised both constituents, the ICs decomposed
by ICA should reflect the characteristics of these two pure substances. For the
normalized original spectra of the calibration set, the order of the 7 ICs according
to their correlation with the reference sugar content was IC 1, 4, 2, 5, 3, 7, and 6.
The original NIR spectra of the calibration set and IC 1 are shown in Fig. 2.5(A) and
(B), and the reflectance spectrum of sucrose powder after Detrend treatment is shown
in Fig. 2.5(C). The peak positions of IC 1 (964, 1090, 1436, 2100, and 2276 nm)
matched the specific wavelength ranges of sugar content (C-H band) (Chang et al., 1998;
Park, 2003; Hahn and Yoon, 2006), and were also consistent with the absorption bands
seen in Fig. 2.5(C). IC 1 can therefore be considered to respond mainly to the sugar
content, in agreement with the above results. The other ICs correlated poorly with the
reference sugar content, and their absolute values in the regression coefficient
matrix were much smaller than that of IC 1, so they played only a supporting role.
Fig. 2.5 (A) Original NIR spectra of sucrose solutions, (B) IC 1 decomposed from
calibration sets, and (C) the reflectance spectrum of sucrose powder
post-Detrend.
2.3.1.3 SUGAR CONTENT QUANTIFICATION BASED ON ICA AND PLSR
Quantitative analyses of sugar content in sucrose solutions were conducted by ICA
and PLSR using the full wavelength range from 400 to 2498 nm. The best ICA spectral
calibration model was obtained with the normalized original spectra and 7 ICs, giving
Rc = 0.9998, SEC = 0.124 °Brix, rv = 0.9993, SEV = 0.216 °Brix, bias = 0.014 °Brix,
and RPD = 25.54. A comparison of the original spectra with and without normalization
showed that the calibration models yielded similar outcomes for the validation set,
whereas the SEC value improved when normalization was applied. Although derivatives
can correct the baseline shift of the original spectra and amplify the signal
characteristics, they may also amplify noise at the same time, making them unsuitable
for noisy spectral bands. The spectra were noisier in the range of 2200 to 2498 nm;
therefore, the predictability of the spectral calibration models decreased when
derivatives were attempted.
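The validation statistics quoted above can be computed as in the following sketch.
The formulas are one common convention (bias as the mean prediction error, SEV as the
bias-corrected standard deviation of the errors, RPD as the SD of the reference values
divided by SEV) and are an assumption here, since this section does not state them;
`validation_metrics` and the toy data are hypothetical.

```python
import math
import statistics

def validation_metrics(reference, predicted):
    """Bias, SEV and RPD under the conventions assumed in the lead-in."""
    errors = [p - r for p, r in zip(predicted, reference)]
    bias = statistics.mean(errors)
    # Bias-corrected standard error of the validation set.
    sev = math.sqrt(sum((e - bias) ** 2 for e in errors) / (len(errors) - 1))
    # Ratio of the reference standard deviation to the SEV.
    rpd = statistics.stdev(reference) / sev
    return {"bias": bias, "SEV": sev, "RPD": rpd}

# Toy reference and predicted values (illustrative only, in °Brix).
reference = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [2.1, 3.9, 6.2, 7.9, 10.1]
metrics = validation_metrics(reference, predicted)
print(metrics)

# Consistency check against the reported numbers: validation SD (Table 2.1) / SEV.
rpd_from_reported = 5.52 / 0.216
print(round(rpd_from_reported, 2))
```

The ratio of the validation SD in Table 2.1 to the reported SEV comes out close to the
reported RPD of 25.54, which suggests, but does not confirm, that the same definition
of RPD was used in the study.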
Table 2.3 Regression results by ICA and PLSR analyses for sucrose solutions.
2nd Derivative + Normalization 2 0.9990 0.243 20.96 0.9869 0.899 34.99 0.013 6.14
The spectral calibration models built by PLSR indicated that the best model was
acquired when the original spectra and 2 factors were employed, with the following
results: Rc = 0.9995, SEC = 0.181 °Brix, rv = 0.9985, SEV = 0.300 °Brix, bias =
0.069 °Brix, and RPD = 18.38 (Table 2.3). Moreover, with SEC = 0.192 °Brix and SEV =
0.546 °Brix for the 1st derivative with normalization, and SEC = 0.243 °Brix and SEV
= 0.899 °Brix for the 2nd derivative with normalization, the SEV values of the 1st and
2nd derivative models were about 2.8 and 3.7 times the corresponding SEC values. The
results showed that these PLSR spectral calibration models had poor predictability
when applied to the validation set.
Comparing the quantitative analysis results of ICA and PLSR, all ICA spectral
calibration models performed better than the PLSR models in predicting the calibration
and validation sets. This means that ICA extracts the characteristic information from
the spectra more effectively, not only improving the explanatory ability of the
calibration models for the calibration set, but also increasing the tolerance for the
validation set. The results also showed that ICA was preferable to PLSR because of its
much lower bias (Table 2.3). This finding became more obvious with normalization,
indicating that ICA had a better tolerance to the influences caused by factors other
than the chemical characteristics of the constituents in the samples, which helped to
build more robust spectral calibration models. In summary, for the sucrose solutions,
ICA achieved better quantitative analysis of sugar content than PLSR did, and
selecting a suitable number of ICs and suitable spectral pretreatments helped improve
the predictability of the spectral calibration models. The results for the sucrose
solutions also helped establish proper procedures and provided useful information for
conducting ICA of wax jambu.
2.3.2 WAX JAMBU
A total of 114 wax jambu samples were used; their sugar contents ranged from 6.4 to
14.5 °Brix. The average sugar content was 9.92 °Brix with a standard deviation of
1.61 °Brix. The samples were divided at a 2:1 ratio into 76 calibration samples and
38 validation samples (Table 2.4).
Table 2.4 Summary of wax jambu (Syzygium samarangense Merrill & Perry) and
sample sugar contents. Total samples (n = 114), calibration set (n = 76) and
validation set (n = 38) were arranged to have consistent distributions of
sugar content.
Wax Jambu
                          Sugar Content (°Brix)
Group              n      Max.    Min.    Mean    SD      CV
Total Samples      114    14.50   6.40    9.92    1.61    0.16
Calibration Set    76     14.50   6.40    9.89    1.61    0.16
Validation Set     38     14.00   7.10    9.99    1.62    0.16
2.3.2.1 CORRELATION ANALYSIS OF NIR SPECTRA AND SUGAR
CONTENT
Fig. 2.6 shows the distribution of the correlation coefficients between the original,
the 1st derivative and the 2nd derivative spectra of the wax jambu samples and their
sugar contents. The main absorption wavelengths of the original spectra were 676, 968,
and 1144 nm, of which 676 nm was located within the visible region of red light,
whereas 968 and 1144 nm were in the NIR region, belonging to the 2nd overtone of the
C-H bond. The main absorption wavelengths of the 1st derivative spectra were 626, 974,
1070, and 1406 nm, of which 626 nm was located in the visible region of orange light,
with a correlation of up to 0.808, while the remaining wavelengths were in the NIR
region. The main absorption wavelengths of the 2nd derivative spectra were located in
the visible region between orange light and red light, namely 594, 642, and 692 nm.
Fig. 2.6 shows that the wavelength range of 600 to 1098 nm was the major absorption
band, and that the 1st derivative spectra were most significantly correlated with the
sugar content (Chung et al., 2004). The spectral band of 650 to 700 nm, which belongs
to the absorption band of red light, was consistent with the color of wax jambu skin,
indicating that color information was also reflected in the spectrum.
Fig. 2.6 Correlation coefficient distributions of the spectra and the sugar content of wax
jambu through three different spectral pretreatments (original spectra, 1st
derivative spectra, and 2nd derivative spectra).
The NIR spectra of the wax jambu samples were analyzed in band regions of 100 nm: the
full spectrum range from 400 to 2498 nm was divided into 21 band regions, each of
which was analyzed separately. The spectra of the 76 wax jambu calibration samples
could have been decomposed into as many as 76 ICs; however, applying too many ICs
could easily lead to overfitting of the model. Hence, in this study ICA was
conducted with the limit of 30 ICs. The SEV showed no obvious trend when applying 1
water in the wax jambu samples, so it was necessary to avoid using the spectral bands of
1450 and 1900 nm that represent primarily water absorption. When applying 7 to 30 ICs
(Fig. 2.7), the SEV values in the ranges of 600 to 700 nm and 800 to 1098 nm were less
than 1 °Brix, as were those of the 1st and the 2nd derivative spectra. All three
spectral treatments agreed with the bands of higher correlation in Fig. 2.6, so the
specific wavelength regions for the spectrum analyses of wax jambu were selected from
the wavelength ranges of 600 to 700 nm and 800 to 1098 nm (Chung et al., 2004).
Fig. 2.7 Relationship between spectral bands and errors of the predicted sugar content
for wax jambu when applying 7 to 30 ICs. Full spectrum range from 400 to
2498 nm was divided into 21 band regions by taking every 100 nm as a band
region.
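The division into 21 band regions can be reproduced with the short sketch below. It
assumes that adjacent regions share their boundary wavelength and that the last region
is truncated at 2498 nm; `band_regions` is a hypothetical helper name.

```python
def band_regions(start=400, stop=2498, width=100):
    """Return (low, high) limits in nm; the last band is cut off at `stop`."""
    return [(low, min(low + width, stop)) for low in range(start, stop, width)]

bands = band_regions()
print(len(bands))           # 21
print(bands[0], bands[-1])  # (400, 500) (2400, 2498)
```

Each region can then be passed separately to the calibration procedure, and the
resulting SEV values compared across regions as in Fig. 2.7.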
2.3.2.2 SUGAR CONTENT QUANTIFICATION BASED ON ICA AND PLSR
2.3.2.2.1 ANALYSIS WITHOUT SPECTRAL PRETREATMENT
The ICA results of the spectral calibration model for wax jambu are shown in Table
2.5. The best spectral calibration model was found with the normalized 1st derivative
spectra and 10 ICs, resulting in Rc = 0.956, SEC = 0.471 °Brix, rv = 0.954, SEV = 0.489
°Brix, bias = -0.013 °Brix, RPD = 3.32. Among the 10 ICs applied for ICA, the order of
the initial 4 ICs, according to the correlation with reference sugar content, is IC 3, 7, 8,
and 6, with respective correlation coefficient (r) of -0.805, 0.647, -0.612, and 0.279. IC
3, 7, and 8 can be considered to respond mainly to the information of sugar content
(including fructose, glucose and sucrose) (Moneruzzaman et al., 2011; Tehrani et al.,
2011), as the composition of wax jambu is more complicated than that of sucrose
solution alone. Since the specific wavelengths used were within the wavelength range of
600 to 700 nm and 800 to 1098 nm, the spectra covered the 3rd overtone of C-H bond,
conforming to the results of Fig. 2.6 and 2.7. Additionally, the spectral calibration
models built after normalization used the characteristic information of 10 ICs, which is
in line with the SEV trend observed in Fig. 2.7. Moreover, the small values of bias
indicated that ICA had good tolerance to the influence caused by factors other than the
internal chemical composition of the samples.
The PLSR results of the spectral calibration model are shown in Table 2.5, with the
best spectral calibration model found in the normalized original spectra with 5 factors,
yielding Rc = 0.884, SEC = 0.753 °Brix, rv = 0.867, SEV = 0.816 °Brix, and bias =
0.238 °Brix. The specific wavelength regions used were within the wavelength range of
600 to 700 nm and 800 to 1098 nm, consistent with the aforementioned results.
Table 2.5 Regression results by ICA and PLSR analyses for wax jambu (without spectral pretreatment).
After comparing the results of ICA and PLSR quantitative analysis, it was found that
the ICA calibration model performed better than PLSR, since not only did it enhance the
predictability of the model but it also reduced the bias. The specific wavelengths used in
ICA and PLSR showed a high degree of coincidence. When applied to wax jambu
samples, the correlation analysis between NIR spectra and sugar content provided a
basis to select the appropriate specific wavelength regions.
2.3.2.2.2 ANALYSIS WITH SPECTRAL PRETREATMENT
To evaluate the best predictability of ICA models for wax jambu, ICA was further
performed with pretreatment and outlier removal procedures. After selecting the best
pretreatment parameters (the points of smoothing and the gap of derivative were both
3) and eliminating about one tenth of the samples as outliers (11 of the 114
samples), the best spectral calibration model was found, as shown in Table 2.6, with
the normalized 1st derivative spectra and 9 ICs, resulting in Rc = 0.988, SEC = 0.243
°Brix, rv = 0.971, SEV = 0.381 °Brix, bias = 0.001 °Brix, and RPD = 4.15. The PLSR
results under the same conditions were Rc = 0.983, SEC = 0.287 °Brix, rv = 0.963, SEV
= 0.426 °Brix, bias = -0.039 °Brix, and RPD = 3.71. With pretreatment and outlier
removal, the ICA spectral calibration model thus performed better than the PLSR model
in predicting both the calibration and validation sets.
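The two pretreatment parameters mentioned above can be illustrated with a simple
moving-average smoothing followed by a gap difference. The source does not specify the
exact algorithm or software, so this is only a sketch under those assumptions;
`moving_average` and `gap_derivative` are hypothetical helpers and the spectrum is a
toy ramp.

```python
def moving_average(signal, points=3):
    """Centered moving average; windows are shortened at both ends."""
    half = points // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def gap_derivative(signal, gap=3):
    """First difference between points that lie `gap` positions apart."""
    return [signal[i + gap] - signal[i] for i in range(len(signal) - gap)]

# Toy spectrum: a linear ramp, so the interior derivative should be constant.
spectrum = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
smoothed = moving_average(spectrum, points=3)
first_derivative = gap_derivative(smoothed, gap=3)
print(first_derivative)  # [2.5, 3.0, 3.0, 3.0, 2.5]
```

The values at both ends differ from the interior because the smoothing window is
shortened there; a larger gap or more smoothing points suppresses noise at the cost of
spectral resolution.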
Table 2.6 Regression results by ICA and PLSR analyses for wax jambu (with spectral pretreatment).
Compared with the previous literature (You, 2002; Lin, 2002; Chung et al., 2004), the
spectral calibration models built by ICA had higher predictability for wax jambu, since
the SEC values reported by You (2002), Chung et al. (2004) and Lin (2002) were 0.413
°Brix, 0.388 °Brix and 0.252 °Brix, respectively. Among them, the SEP values reported
by Chung et al. (2004), 0.262 °Brix, 0.207 °Brix and 0.322 °Brix, were all lower than
the corresponding SEC value (0.388 °Brix); these MLR analysis results seemed
unreasonable because the prediction sets were unknown to the calibration model, so the
SEP values should be higher than the SEC value. Even so, our ICA results listed in
Table 2.6 were better than those reported by Chung et al. (2004) and Lin (2002) in
terms of Rc, SEC, rp and RPD.
The results of ICA sugar content quantification based on NIR spectroscopy showed
that ICA can effectively extract the characteristic information in the spectra, and build
the spectral calibration models with desirable abilities to evaluate the concentration of
the constituents. It thus can be expected that integration of ICA with NIR spectroscopy
could become a powerful tool for quantitative analysis of specific targets.