3. Case studies
3.3 Regional estimation of groundwater Arsenic concentration through Systematical
Systematical Dynamic-neural Modeling
To reliably modelling water quality, it is important to understand the impacts of factors on real competence, the interaction and evolvement of factors within an operation system, and the measurements of factors. In this case study, we aim to present a novel model of case competence with good accuracy and predictability, in which certain assumptions are made for the nature of cases. The proposed SDM incorporates the GT into a dynamic neural network accompanied with three optional statistical techniques to tackle regional estimation problems for water quality management, and its implementation procedure is shown in Fig. 3.15. The SDM first effectively extracts the non-trivial factors that significantly affect the fluctuations of As concentrations through the GT. The NARX network is then utilized to obtain As concentration at ungauged sites with inputs consisting of the extracted non-trivial factors and the estimated As concentrations obtained from recurrent connections. The Bayesian regularization method is configured to control the network complexity for preventing over-fitting. The cross validation technique is used to produce a low-bias estimator of model
Fig. 3.15 Implementation procedure of the proposed SDM for regional estimation of As concentrations in groundwater
generalizability and thus provides a sensible criterion for model selection in the calibration stage. Finally, the IK is implemented to derive the probability map of As concentrations for detecting unsampled areas with As concentrations exceeding the WHO drinking water standard.
A. Study area
Yun-Lin County is located in the southwestern alluvial fan of the Chou-Shui River in Central Taiwan (Fig. 3.16). Based on hydrogeological settings, the southern Choushui River alluvial fan is classified mainly into the proximal-fan, the mid-fan and the distal-fan areas (Central Geological Survey, 1999), in which the coastal region of the Yun-Lin County is located in the distal-fan area. The hydrogeological formation of the distal-fan can be divided into six inter-layered sequences: three marine sequences; and three non-marine sequences. The non-marine sequences with coarse sediments (from medium sand to highly permeable gravel) are considered as aquifers, whereas the marine sequences with fine sediments are considered as aquitards. The annual average precipitation is 1417 mm, which mainly occurs during wet seasons (i.e., between May and September). Aquaculture is the primary revenue source for the inhabitants in the coastal region of Yun-Lin County. Due to high water demand but limited water supply,
Fig. 3.16 Locations of twenty-six groundwater wells at the Yun-Lin coastal area, Taiwan.
groundwater has become a vital water resource in this area for decades. In 1992 the Water Resources Agency installed 26 groundwater monitoring wells (well depths range from 8m to 110m) distributed in this area for groundwater quality monitoring, particularly for As pollution and other potential contamination in groundwater.
Approximately 757 million m3 of groundwater was extracted annually from the aquifers in this area, of which 268 million m3 was considered to be over-pumped (Liu et al., 2001). High As concentration (93.2±161 ugl-1) was detected in the monitoring wells in this area (WHO drinking water standard: 10 ugl-1). Liu et al. (2006) indicated that over-pumping groundwater induces dissolved oxygen and increases As mobility in water and thus the relatively high As content has accumulated and been deposited in the marine sequences with fine sediments.
B. Data collection and preliminary analysis
In this case study, sampling data of groundwater quality parameters were collected quarterly at 26 wells between 1992 and 1999, and the field sampling methods of As concentration was determined by hydride generation followed by atomic absorption spectroscopy, APHA Method 3500-arsenic Part B (APHA, 1992). The maintenance of groundwater monitoring wells is laborious and cost intensive, and therefore only six wells (#3, #6, #7, #12, #17 and #19) have continued monitoring groundwater quality after 1999. The proposed method intends to estimate the As fluctuations of 20 un-monitored wells based on other water quality parameters that are easier to measure.
We assume that 20 un-monitored wells are ungauged sites and 6 monitored wells are gauge stations (Fig. 3.16).
Table 3.9 Statistics of groundwater quality parameters at six gauging stations (wells) during 1992 and 2005.
Item unit #3 #6 #7 #12 #17 #19
Well depth m 22.8 17.0 19.0 19.6 8.4 14.9
Mean SD1 Mean SD Mean SD Mean SD Mean SD Mean SD
As ug/L 75.9 67.6 177.0 109.5 450.4 314.3 43.7 30.7 39.5 47.6 38.1 30.7
temp ℃ 25.8 1.0 25.7 0.9 25.8 1.0 25.9 1.3 26.1 1.4 26.1 1.3
pH 7.7 0.4 7.9 0.4 7.9 0.2 7.7 0.5 7.6 0.4 7.6 0.3
EC
μmho/cm 25℃
23383 18221 16509 8317 2209 912.2 21795 13886 1408 970.8 17295 7677
DO uS/cm 1040 7347 419.0 2923 51.0 349.1 1.3 1.0 1.3 1.0 1.3 1.0
Alk ug/L 356.2 137.3 560.2 138.5 504.4 84.8 384.1 123.9 315.0 55.7 504.2 143.0 TDS ug/L 15822 10285 10963 4149 1432 626.2 13994 7302 885.9 637.7 11790 5426 Cl- ug/L 6851 4652 4479 1819 391.9 215.6 5945 3419 233.6v236.3 4937 2155 SO42- ug/L 690.9 784.1 512.0 270.8 102.9 82.8 960.0 588.1 64.5 57.4 515.6 467.0
Na+ ug/L 3772 2604 2708 998.5 293.5 90.8 3297 1806 179.2v121.7 2756 1133 K+ ug/L 201.1 105.6 145.2 46.1 38.7 15.5 133.4 57.6 17.0v11.4 142.4 91.9 Mg2+ ug/L 598.8 954.6 254.7 205.5 73.2 30.3 427.5 296.4 32.2v22.6 323.8 156.3
Ca2+ ug/L 216.0 133.1 74.9 50.1 59.0 19.7 281.8 174.3 88.0 46.7 150.4 75.8
1standard deviation
A total of 270 (=45x6 wells) data sets of twelve water quality parameters【power
of hydrogen (pH), alkalinity (Alk), cadmium ion (Ca2+), chlorine ion (Cl-), total dissolved solid (TDS), electrical conductivity (EC), sodium ion (Na+), sulfate ion (SO4
2-), potassium ion (K+), dissolved oxygen (DO), magnesium ion (Mg2+) and temperature (temp)】were collected at six gauging stations (wells) between 1992 and
2005, which are used for model construction in this study. Table 3.9 shows the well
Table 3.10 Correlation matrix of As concentration and water quality parameters collected at six gauging stations (wells) during 1992 and 2005.
As pH Alk Ca2+ Cl- TDS EC Na+ SO4
2-K+ DO Mg2+ temp As 1.00 0.32 0.13 -0.34 -0.32 -0.31 -0.30 -0.30 -0.27 -0.26 0.19 -0.18 -0.07 pH 1.00 0.46 -0.56 -0.48 -0.43 -0.51 -0.44 -0.47 -0.37 0.26 -0.33 -0.03 Alk 1.00 -0.41 -0.27 -0.20 -0.28 -0.22 -0.33 -0.16 0.14 -0.17 0.04 Ca2+ 1.00 0.77 0.71 0.76 0.73 0.74 0.62 -0.22 0.52 0.02
Cl- 1.00 0.93 0.97 0.97 0.88 0.87 -0.16 0.55 -0.05
TDS 1.00 0.93 0.93 0.81 0.84 -0.18 0.51 -0.07
EC 1.00 0.96 0.88 0.86 -0.19 0.53 -0.06
Na+ 1.00 0.84 0.88 -0.19 0.52 -0.09
SO42- 1.00 0.72 -0.08 0.42 -0.06
K+ 1.00 -0.11 0.47 -0.07
DO 1.00 -0.12 -0.02
Mg2+ 1.00 0.02
Temp 1.00
depth, mean and standard deviation (SD) of groundwater quality parameters at these 6 gauging stations, in which high mean and variation of As concentration occur, especially at wells #6 and #7. The depth intervals of the three aquifers were <60m, 120-200m, and 280-350 m, respectively (Agricultural Engineering Research Center, 2008). This indicates that the 6 monitored wells (well depth: 8.4-22.8m) are in the same confined aquifer. Table 3.10 shows that all the correlation coefficients between As and twelve water quality parameters are smaller than 0.34 (in an absolute sense), which implies the difficulty in determining non-trivial factors that affect As concentration based solely on such traditional correlation analysis. Therefore, we adopt a more
sophisticated method to effectively extract non-trivial factors from water quality parameters for building an As concentration estimation model.
C. Results and discussion
1) Extracting effective water quality factors
The six wells (#3, #6, #7, #12, #17 and #19) that have sufficient water quality data are assumed as gauging stations for As concentration and are utilized by the GT. Data
sets of twelve water quality factors are first scaled to [-1,1], and a total of 4095 (=212-1)
values corresponding to all possible input combinations are derived through the GT.
The derived values are next sorted in an ascending order, in which values smaller than the 10th percentile (10 0.0089) are classified as the best group (F1 0)
whereas values bigger than the 90th percentile (90 0.136) are classified as the worst group (F9 0). Figure 3.17 shows the results of the GT, where blue bars represent the occurrence frequency of variables in the best group (F1 0) and red bars represent
the occurrence frequency of variables in the worst group can then be identified as the variables associated with higher blue bars and lower red bars simultaneously, and such ratios are shown by the dotted line in Fig. 3.17. And therefore we can extract a subset of input variables that ranks top three in the ratio ofF1 0to F9 0. The GT results
indicate that Alk, Ca2+ and pH value are the non-trivial factors for use in the estimation models (the NARX network and the BPNN).
Ratio
Fig. 3.17 Determination of non-trivial factors by the GT results.
These results are consistent with several studies, which indicated the increase in As leaching efficiency depends on high pH values and Alk concentration (Anawar et al., 2004; Kim et al., 2000; Kuo and Chang, 2010; Liu et al., 2003; Park et al., 2006; Pierce and Moore, 1982). The major C-containing species in the reducing condition in
groundwater are HCO3− and H2CO3 , which cause high pH values and Alk concentration (Wang et al., 2007). In addition, salinization and As enrichment are two main hydro-geochemical characteristics in the Yun-Lin coastal area, and they were estimated by the factor analysis (Wang et al., 2007). Respectable cation, such as calcium ions, and anion contents carried by seawater intrusion initially increased the ion strength in groundwater and induced As desorption (Appelo et al., 2002; Keon et al., 2001). On the one hand, As anions could sorb or bind using carbonates in natural systems (Bauer et al., 2008; Rothwell et al., 2009). Therefore the relationship between As and calcium
ions might be caused by the dissolution of calcium arsenates and/or the competitive desorption of calcium (Bothe and Brown, 1999; Mihaljevicet al., 2003; Nishimura and Robins, 1998). In the Yun-Lin coastal area, the shallow aquifer has suffered serious salinization that affects the concentration of the calcium ion due to the over-pumping of groundwater. This exercise gives evidence that the GT can effectively identify non-trivial and meaningful factors that affect the fluctuations of As concentration, compared with the identification difficulty raised by the traditional correlation matrix shown in Table 3.10.
2) Estimating As concentration at ungauged sites by the NARX network
In this case study, the NARX network is proposed to estimate the regional As concentration in Yun-Lin County. Variables Alk, Ca2+ and pH determined by the GT are used as exogenous inputs to the NARX network. The data sets collected at six gauging stations between 1992 and 2005 are used for model calibration. Therefore, the NARX network in the SP mode trained by the Bayesian regularization method is calibrated by a 30-fold cross validation. The log-sigmoid function and the linear function are the transfer functions used in the hidden and output layers of the NARX network, respectively. The most appropriate NARX network comprises two output-memory orders and 20 neurons in the hidden layer by trial and error method, and the effective number of network parameters (p) is 23.74.
To demonstrate the effectiveness and usefulness of the NARX network established, the BPNN is implemented for comparison purpose. The constructed BPNN consists of the same input variables as those of the NARX network and six neurons in the hidden layer. The hyperbolic tangent sigmoid function and the linear function are the transfer functions used in the hidden and output layers of the BPNN, respectively. The BPNN trained with the Levenberg-Marquardt optimization algorithm is also calibrated by a 30-fold cross validation. The results show that the average RMSE of the NARX network in the training and validation phases are 95.11 and 106.13 ugl-1, respectively, whereas the average RMSE of the BPNN in the training and validation phases considerably increases to 121.54 and 143.37 ugl-1, respectively. The results demonstrate that the NARX network has much better performance than the BPNN. It is noted that both models produce large errors (average RMSE), this is mainly due to the high uncertainty attached to the sampled values, where the mean (138.26 ugl-1) and standard deviation (205.25 ugl-1) of As concentration contributes to the poor model accuracy.
It is worth noting that the effective number of network parameters (p) for the
NARX network has been optimized from 141 to 23.74 after the re-calibration of the network by using the Bayesian regularization method. This demonstrates that the Bayesian regularization method can significantly reduce the effective number of network parameters and thus avoid the over-fitting problem caused in a rather complex
network structure. As a result, the NARX network produces suitable results and has similar performance in the training and validation phases (average RMSE: 95.11 ugl-1 and 106.13 ugl-1 accordingly). In contrast, the BPNN requires fewer neurons in the hidden layer to prevent the over-fitting problem but still performs poor in the validation phase (average RMSE: 143.37 ugl-1).
After model configuration, a total of 100 (=20x5 months) As concentration data collected at twenty assumed ungauged sites in five monitoring months (January 1995, October 1996, October 1997, September 1998 and January 1999) are utilized to test the two constructed models. In addition to RMSE, the normalized mean squared error (NMSE), R-square value (R2) and F test are also used as performance criteria in the testing phase. The NMSE is defined as:
ungauged sites in the same year, respectively, O represents the average of observed As concentrations in a certain year, and n is the length of data.
The results of model comparison in the testing phase are summarized in Table 3.11, which indicates the NARX network has much smaller RMSE as well as NMSE values and higher R2 values than the BPNN. Besides, when assessing the results of the
Table 3.11 Estimation performance of the NARX network and the BPNN for As concentration at 20 ungauged sites between 1995 and 1999 in the testing phase.
Estimation time
RMSE
(ugl-1) NMSE R2 F test p-value2
Data Mean (ugl-1)
Data SD1. (ugl-1)
NARX network
1995 Jan. 57.31 0.19 0.89 0.105 85.63 134.80
1996 Oct. 91.53 0.29 0.89 0.010 90.71 173.61
1997 Oct. 40.24 0.11 0.96 0.291 64.51 125.99
1998 Sept. 48.34 0.26 0.91 0.731 47.83 97.02
1999 Jan. 41.35 0.07 0.98 0.816 75.57 155.91
BPNN
1995 Jan. 109.02 0.69 0.41 0.025 85.63 134.80 1996 Oct. 158.70 0.88 0.17 0.002 90.71 173.61 1997 Oct. 142.42 1.35 0.18 0.070 64.51 125.99 1998 Sept. 114.63 1.47 0.29 0.302 47.83 97.02
1999 Jan. 140.21 0.85 0.25 0.004 75.57 155.91
1Standard deviation
2The null hypothesis is rejected at the 5% level (p-value < 0.05)
F test, the null hypothesis is rejected only at the 5% level of the estimation values in October 1996 for the NARX network, whereas the null hypothesis is rejected in January 1995, October 1996 and January 1999 for the BPNN. Figure 3.18 shows the scatter plots of observed and estimated As concentrations in five different months during 1995 and 1999 derived from the NARX network and the BPNN. The estimation values obtained from the NARX network are close to the ideal line and only have few underestimations at extremely high As concentrations, whereas the BPNN overestimates As concentrations at values lower than 200 ugl-1 and seriously underestimates As
Fig. 3.18 Scatter plots of observed and estimated As concentration (conc.) derived from the NARX network and the BPNN at twenty ungauged sites (1995-1999).
0 200 400 600
Estimated As conc. (ug/l) January 1995
0 200 400 600 800
Estimated As conc. (ug/l) October 1996
0 200 400 600
Estimated As conc. (ug/l) October 1997
0 100 300 500
Estimated As conc. (ug/l) September 1998
0 200 400 600 800
Estimated As conc. (ug/l) January 1999
0 200 400 600
Estimated As conc. (ug/l) January 1995
0 200 400 600 800
Estimated As conc. (ug/l) October 1996
0 200 400 600
Estimated As conc. (ug/l) October 1997
0 100 300 500
Estimated As conc. (ug/l) September 1998
0 200 400 600 800
Estimated As conc. (ug/l) January 1999
0 200 400 600
Estimated As conc. (ug/l) January 1995
0 200 400 600 800
Estimated As conc. (ug/l) October 1996
0 200 400 600
Estimated As conc. (ug/l) October 1997
0 100 300 500
Estimated As conc. (ug/l) September 1998
0 200 400 600 800
Estimated As conc. (ug/l) January 1999
0 200 400 600
Estimated As conc. (ug/l) January 1995
0 200 400 600 800
Estimated As conc. (ug/l) October 1996
0 200 400 600
Estimated As conc. (ug/l) October 1997
0 100 300 500
Estimated As conc. (ug/l) September 1998
0 200 400 600 800
Estimated As conc. (ug/l) January 1999
0 200 400 600
Estimated As conc. (ug/l) January 1995
0 200 400 600 800
Estimated As conc. (ug/l) October 1996
0 200 400 600
Estimated As conc. (ug/l) October 1997
0 100 300 500
Estimated As conc. (ug/l) September 1998
0 200 400 600 800
Estimated As conc. (ug/l) January 1999
As conc. (NARX network)
╳ As conc. (BPNN)
Fig. 3.19 Estimation results of As concentrations at ungauged well #14 during 1995 and 1999 in the testing phases of the NARX network and the BPNN.
concentrations at values higher than 200 ugl-1. Furthermore, Fig. 3.19 shows the estimation results of As concentrations at the ungauged well #14, which is located at the central Yun-Lin coastal area and is far from other gauging and ungauged wells, during 1995 and 1999 in the testing phases of the NARX network and the BPNN. Results indicate that the NARX network only underestimates As concentration in 1995, whereas the BPNN highly overestimates As concentration in all five testing years.
In sum, the NARX network adequately utilizes the information of model outputs through recurrent connections to the network itself for producing reliable estimations of As concentrations at twenty ungauged sites. Owing to the implementation of the Bayesian regularization method into the NARX network, the constructed network
0 50 100 150 200 250 300
A s co n c. (u g /l )
Observed As conc.As conc. (NARX network) As conc. (BPNN)
1995
Jan. 1999
Jan.
1996
Oct. 1997
Oct. 1998
Sept.
shows impressive generalizability and performs well in the testing phase, which can be proved through the similar NMSE values in five testing years (Table 3.11).
3) Deriving the risk map of As concentration through the IK
From the previous section, the NARX network can provide reliable point estimation of As concentration at twenty ungauged sites. The IK is next employed to estimate the regional spatial distribution and to compute the probability of the exposure to high As pollution in unsampled areas. Because the WHO drinking water standard for As concentration is 10ugl-1, the investigation of this study mainly focuses on the threshold of 10ugl-1. Therefore, the estimated As concentration obtained from the NARX network at twenty ungauged sites and the observed As concentration at six gauging stations are utilized to construct the semivariogram models for the IK.
The NARX network coupled with the IK can illustrate the unknown probability of the exposure to high As concentrations at neighboring areas of all 26 wells. If the observed and estimated As concentrations exceed the threshold set in the adjacent region, the IK will assign a high probability of concentration in the region of interest.
The probability maps of As concentration under the threshold of WHO drinking water standard (10ugl-1) in different time spans are shown in Fig. 3.20. In January 1995, high exceeding probabilities (>10 ugl-1) of As concentration occurred in northern and
(a) January 1995 (b) October 1996 (c) October 1997
(d) September 1998 (e) January 1999
Fig. 3.20 Exceeding probability maps of As concentration under the threshold of WHO drinking water standard (10ugl-1) between 1995 and 1999.
Probability As>10 μgl -1
1.0
0.0
southern areas, whereas both the surrounding area of well #5 and the central area (located between Old Huwei River and New Huwei River) had low exceeding probabilities of As concentration. In October 1996, the exceeding probability was high in the southern area of the Old Huwei River. In contrast, the exceeding probability of As concentration was gradually and significantly mitigated in the central and northern areas from October 1997 to January 1999, and the Old Huwei River could be deemed as a clear boundary between high and low As concentrations in an exceeding probability sense. These risk maps reveal the high As-prone areas. As a result, the information of the risk maps derived from the IK of the proposed SDM can consequently help decision makers manage groundwater quality and thus prevent residents from drinking or using toxic groundwater.