Statistical analysis - Comparing the performance of systematic and non-systematic citizen science data in predicting bird species richness

a) Observed species richness comparison

Because the BBS survey protocol used in 2009 differed from that of other years (9-minute point counts in 2009 versus 6-minute point counts from 2010–2017), the duration of each visit in 2009 was not comparable with that of other years. I therefore removed all 2009 visits from the BBS dataset.

To make the results comparable, I compiled the species records and survey durations of all points at a given BBS site within a visit into a single checklist. Compiling each visit at each site separately yielded a total of 2238 checklists across the 204 BBS sites in Taiwan. To match the BBS survey duration, I included only eBird checklists with a duration of 36 to 60 minutes, which retained 2164 eBird checklists. I then performed a two-tailed Wilcoxon rank-sum test to test for a difference in observed species richness between the two datasets.
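As a rough illustration of the rank-sum comparison described above, the following sketch computes the W statistic (the number of BBS/eBird pairs in which the BBS value is larger, with ties counted as one half) and a two-sided p-value from the normal approximation. The function name `rank_sum_test` and the species counts are hypothetical, and the approximation omits the tie correction that statistical packages usually apply, so it is not a reproduction of the analysis in this study.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation."""
    n1, n2 = len(x), len(y)
    # W counts the (x, y) pairs in which x is larger; ties count one half.
    w = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2
    # Standard deviation of W under the null hypothesis.
    # Simplification: no correction for ties is applied here.
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w, p_two_sided


# Hypothetical species counts per checklist (not data from this study).
bbs_richness = [15, 18, 12, 20]
ebird_richness = [9, 8, 11, 12]

w, p = rank_sum_test(bbs_richness, ebird_richness)
print(w, round(p, 3))
```

With samples as large as those used here (2238 and 2164 checklists), the pairwise loop would be slow; a rank-based implementation from a statistics library would be the practical choice.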

b) Species richness estimation methods

For the selected 14596 eBird checklists that fell within BBS sites, species richness was estimated for each checklist separately (checklist-based). Three non-parametric species richness estimators were applied to the eBird dataset: (1) the abundance-based estimator Chao1 (Chao, 1984; Colwell & Coddington, 1994; Chao & Chiu, 2014); (2) the Incidence-based Coverage Estimator (ICE) (Chao & Chiu, 2014), for which, as recommended by Chao and Chiu (2014), I set 10 individuals as the cut-off point separating infrequent from frequent species; and (3) the first-order Jackknife, an estimator based on the number of singleton species (Burnham & Overton, 1978; Colwell & Coddington, 1994). Chao1 estimation was performed using the “iNEXT” package (Hsieh et al., 2016); ICE and first-order Jackknife estimation were performed with the “vegan” package (Oksanen et al., 2016) in the R platform.
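To make the abundance-based idea concrete, the sketch below computes the classic Chao1 estimate from the number of singletons (f1) and doubletons (f2) in a single checklist, falling back to the bias-corrected form when there are no doubletons. This is a minimal illustration in Python rather than the R workflow used in this study; the function name `chao1` and the example counts are hypothetical, and the R packages named above may apply additional small-sample corrections.

```python
from collections import Counter


def chao1(abundances):
    """Classic Chao1 lower-bound richness estimate from per-species counts."""
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)              # observed species richness
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]        # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when the checklist has no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical individuals counted per species on one checklist.
checklist = [10, 5, 1, 1, 1, 2, 2]
print(chao1(checklist))
```

The estimate only exceeds the observed richness when rare species are present, which is why checklists that omit counts or record few singletons gain little from this correction.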

c) Evaluating the performance of species richness estimation methods

To quantify the performance of the species richness estimators on the eBird dataset, I calculated the bias of the estimated species richness from each eBird checklist against the compiled observed species richness from 2009–2017 at each BBS site separately (i.e., the asymptote of total species richness from accumulated annual surveys was assumed to be the known total species richness at each BBS site and likely to represent the local bird community) (Walther & Morand, 1998; Walther & Martin, 2001; Walther & Moore, 2005; Tingley et al., 2020). In other words, each eBird checklist produced one bias value (unless the eBird location intersected more than two BBS sites, in which case I treated the eBird checklist as belonging separately to each of the shared BBS sites). Bias was calculated with the following formula:

$$\mathrm{Bias}_{ij} = \frac{E_{ij} - A_i}{A_i}$$

where j indexes the eBird checklists within the i-th BBS site (i.e., the j-th sample at each BBS site) and i = 1 to 204 indexes the BBS sites. $E_{ij}$ is the estimated species richness of each eBird checklist; $A_i$ is the compiled observed species richness of the i-th BBS site from 2009 to 2017. The bias calculation was performed in Microsoft Excel 2019. Finally, I used one-tailed Wilcoxon rank-sum tests to identify the least biased of the three species richness estimators by comparing each pair of estimators. The least biased estimator was then used for species richness estimation when assessing the comparison of the two datasets in the following questions.
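The bias calculation can be sketched in a few lines. The example below is only an illustration of the formula, assuming each checklist is stored with the identifier of the BBS site it falls in; the site identifiers, richness values, and variable names are hypothetical, and the study itself performed this step in Microsoft Excel.

```python
def bias(estimate, reference):
    """Relative bias of a checklist's richness against the site's reference richness."""
    return (estimate - reference) / reference


# Hypothetical compiled BBS richness per site (A_i).
site_richness = {"site_001": 40, "site_002": 25}

# Hypothetical eBird checklists as (site identifier, estimated richness E_ij).
checklists = [("site_001", 12.0), ("site_001", 44.0), ("site_002", 25.0)]

biases = [(site, bias(e, site_richness[site])) for site, e in checklists]
for site, b in biases:
    print(site, round(b, 3))
```

A value of -1 corresponds to a checklist with no species relative to the site total, 0 to exact agreement, and positive values to an estimate above the compiled BBS richness.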

d) Determining the effect of duration on bias after species richness estimation

(1) Evaluating the effect of duration on observed species richness

Before examining the effect of duration on bias, I tested the effect of duration on observed species richness across all 14596 included eBird checklists.

I fitted four non-linear functions (the power, Gompertz, logistic, and Schumacher functions) independently using the least squares method (James et al., 2013). These four functions are used to estimate the asymptote of species richness as duration increases (Magurran & McGill, 2011), and their formulas are as follows:

where y is the observed species richness (the dependent variable) and x is the duration (the independent variable); a, b, and c denote the parameters to be estimated by the least squares method. Parameter estimation was performed with the “stats” package (Team & Worldwide, 2002) in the R platform.

To compare the goodness-of-fit of the four non-linear models, I compared the fitted curves using the Bayesian information criterion (BIC) (Gideon, 1978). BIC was used instead of the Akaike information criterion (AIC) because my objective was to explain the relationship between duration and observed species richness rather than to predict values (Shmueli, 2010).

Under the Bayesian probability framework, the probability of selecting the true model increases as the training sample size increases (Friedman et al., 2001; Magurran & McGill, 2011). BIC model selection was performed with the “AICcmodavg” package (Mazerolle & Mazerolle, 2019) in the R platform. The best-supported non-linear function was then used to model the relationship between duration and bias in the following analyses.

(2) Calculating the reduction of bias after species richness estimation

To compare the reduction of bias before and after estimating species richness at a standardized duration, I removed all 2009 visits from the BBS dataset, for the same reasons as above. Across the 14596 eBird checklists, I treated the duration of each eBird checklist as the independent variable; the bias derived from observed species richness and the bias derived from estimated species richness were each treated as the dependent variable in separate fits. Bias was calculated with the following formulas:

$$\mathrm{Bias}_{ij} = \frac{O_{ij} - A_i}{A_i}, \qquad \mathrm{Bias}_{ij} = \frac{E_{ij} - A_i}{A_i}$$

where $O_{ij}$ is the observed species richness of each eBird checklist; $E_{ij}$ is the estimated species richness of each eBird checklist (note that the estimation was based on the least biased estimation method); and $A_i$ is the compiled observed species richness of the i-th BBS site recorded from 2010 to 2017.

To test the effect of duration on bias across all 14596 included eBird checklists, I fitted the selected non-linear function described above to the independent and dependent variables using the least squares method (James et al., 2013). Parameter estimation was performed with the “stats” package (Team & Worldwide, 2002) in the R platform. Finally, based on the fitted non-linear function evaluated at 60 minutes, the reduction in bias was measured as the bias value after species richness estimation minus the bias value before species richness estimation.

(3) Evaluating improvement on proportion of species richness from eBird against BBS after species richness estimation

To evaluate the improvement in species richness from the eBird dataset relative to the BBS dataset after estimation, at a duration of 60 minutes, I included only BBS sites with 10 points (i.e., each visit to a BBS site totalled 60 minutes) and removed all visits from 2009. I calculated the average observed species richness per visit at each BBS site (i.e., the average number of species recorded per BBS visit). A total of 92 BBS sites were retained after selection (Figure 4), together with 6611 eBird checklists. I treated the duration of each eBird checklist as the independent variable; the bias derived from observed species richness and the bias derived from estimated species richness were each treated as the dependent variable in separate fits. Bias was calculated with the following formulas:

$$\mathrm{Bias}_{ij} = \frac{O_{ij} - A_i}{A_i}$$

where j indexes the eBird checklists within the i-th BBS site (i.e., the j-th sample at each BBS site) and i = 1 to 92 indexes the BBS sites. $O_{ij}$ is the observed species richness of each eBird checklist; $A_i$ is the average observed species richness per visit at the i-th BBS site recorded from 2010 to 2017.

$$\mathrm{Bias}_{ij} = \frac{E_{ij} - A_i}{A_i}$$

with i, j, and $A_i$ defined as above. $E_{ij}$ is the estimated species richness of each eBird checklist (note that the estimation was based on the least biased estimation method).

To test the effect of duration on bias across all 6611 included eBird checklists, I fitted the selected non-linear function described above to the independent and dependent variables using the least squares method (James et al., 2013). To evaluate the performance of the eBird dataset after species richness estimation, I used the fitted non-linear function at 60 minutes to standardize the comparison of bias before and after species richness estimation. Finally, the improvement in the proportion of species richness recorded by the eBird dataset after estimation was calculated from the bias formula.

Figure 4 Distribution of the 92 selected BBS sites across Taiwan (criterion: 10 points per site, 2010 to 2017), out of the original 457 BBS sites

Results

1. Observed species richness

After restricting the duration of both BBS and eBird checklists to 36–60 minutes per checklist, the BBS dataset (204 sites) had significantly higher observed species richness than the 2164 eBird checklists recorded within a 2×2 km square buffer around the centroid of each BBS site (W = 3826200, effect size = 0.503, p < 0.001) (Figure 5). The median observed species richness per checklist was 15 species for BBS (n = 2238) and 9 species for eBird (n = 2164). The inter-quartile range (IQR) was 9 for BBS and 8 for eBird (Figure 5).

Figure 5 Observed species richness per checklist recorded in the BBS and eBird datasets. The BBS dataset included 2238 visit-based checklists from a total of 204 sites. The eBird dataset included 2164 checklists. Both datasets had durations restricted to the range of 36–60 minutes.

2. The performance of species richness estimation methods

The Chao1 estimator (median bias = -0.693) was overall the least biased (W = 123690000, p < 0.05) compared with the other two estimators (median bias of ICE = -0.730; median bias of Jackknife = -0.773) against the compiled observed species richness of each BBS site (Table 1, Table 2 and Figure 6). The ICE estimator was less biased than the Jackknife (W = 119220000, p < 0.001) (Table 2). Estimates of species richness from eBird checklists varied among estimation methods but generally underestimated the true community size (bias < 0) (n = 14596) (Table 1). The spread of bias also differed among estimators: bias derived from the Chao1 estimator ranged from -0.987 to 5.602, whereas bias derived from the Jackknife estimator had a narrower range, from -1.000 to 1.000 (Table 1).

Table 1 Performance of three species richness estimation methods for the eBird dataset against observed species richness from the BBS dataset, evaluated by the result value of bias summarized by all included checklists (n = 14596). Bias was calculated to make a comparison among estimators.

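| Estimator | Mean | SD | Median | IQR | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Chao1 | -0.576 | 0.393 | -0.693 | 0.440 | -0.987 | 5.602 |
| ICE | -0.640 | 0.286 | -0.730 | 0.351 | -0.983 | 1.222 |
| Jackknife | -0.689 | 0.267 | -0.773 | 0.317 | -1.000 | 1.000 |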

Table 2 One-tailed Wilcoxon rank-sum test between species richness estimation methods

| Comparison | W–value | p–value |
|---|---|---|
| Chao1 vs. ICE | 123690000 | < 0.05* |
| Chao1 vs. Jackknife | 123690000 | < 0.05* |
| ICE vs. Jackknife | 119220000 | < 0.001*** |

Figure 6 Performance of the Chao1, ICE, and Jackknife estimators in species richness estimation. Bias was measured by comparing the result of each estimation method against the compiled species richness of each BBS site. (A) The difference of Chao1 subtracted from ICE estimator; (B) The difference of Chao1 subtracted from Jackknife estimator; (C) The difference of ICE subtracted from Chao1 estimator; (D) The difference of ICE subtracted from Jackknife estimator; (E) The difference of Jackknife subtracted from Chao1 estimator; (F) The difference of Jackknife subtracted from ICE estimator. Asterisks in the plots indicate the significance level between estimation methods by one-tailed Wilcoxon rank-sum test (p < 0.05 = *; p < 0.001 = ***). Note that only bias values from -0.05 to 0.05 are shown.

3. Relationship between duration and observed species richness

The power function was the best model to represent the relationship between duration and observed species richness, based on the BIC values (Table 3, Table 4 and Figure 7). As a result, the power function was selected to examine the effect of duration on bias in subsequent analyses.

Table 3 BIC model selection results from the relationship of duration and observed species richness

| Non-linear function | K | BIC | Delta_BIC | BICWt | Log-likelihood |
|---|---|---|---|---|---|
| Power function | 3 | 41262.13 | 0.0000 | 0.6921 | -20617.87 |
| Gompertz function | 4 | 41263.75 | 1.6198 | 0.3079 | -20614.28 |
| Logistic function | 4 | 41282.44 | 20.3053 | 0.0000 | -20623.62 |
| Schumacher function | 3 | 42041.85 | 779.7200 | 0.0000 | -21007.73 |
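The Delta_BIC and BICWt columns of Table 3 can be recovered from the BIC values alone. The sketch below assumes the standard definition of model weights, exp(-Delta/2) normalized over the candidate set; the variable names are illustrative, and the study itself obtained these columns from the “AICcmodavg” package in R.

```python
import math

# BIC values as reported in Table 3.
bic = {
    "Power": 41262.13,
    "Gompertz": 41263.75,
    "Logistic": 41282.44,
    "Schumacher": 42041.85,
}

best = min(bic.values())
# Difference of each model from the best (lowest) BIC.
delta = {name: value - best for name, value in bic.items()}
# Relative likelihood of each model, then normalized to sum to one.
rel_likelihood = {name: math.exp(-0.5 * d) for name, d in delta.items()}
total = sum(rel_likelihood.values())
weights = {name: rl / total for name, rl in rel_likelihood.items()}

for name in bic:
    print(f"{name:<11} delta = {delta[name]:8.2f}  weight = {weights[name]:.4f}")
```

Because the weights depend only on differences in BIC, a gap of about 1.6 between the power and Gompertz functions leaves the Gompertz function with roughly 31% of the weight, while the logistic and Schumacher functions receive essentially none.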

Table 4 Parameter estimates from the power function by least squares method on the relationship of duration and observed species richness

| Parameter | Estimate | Standard Error | t–value | p–value |
|---|---|---|---|---|
| a | 2.867213 | 0.059096 | 48.52 | <0.001*** |
| b | 0.304814 | 0.004471 | 68.17 | <0.001*** |

*Note: the power function formula is depicted above with parameters (a and b) to be estimated.

Residual standard error: 5.606 on 14594 degrees of freedom

Figure 7 The relationship of duration and observed species richness from eBird checklists (n = 14596). Power function (top right of the figure) was used to fit the relationship of duration and observed species richness by a least squares approach.

4. Bias reduction after species richness estimation

Underestimation is represented by negative bias (bias < 0), while overestimation is represented by positive bias (bias > 0). In general, as survey duration increased, both the observed and the estimated species richness of eBird checklists came closer to the observed species richness of BBS sites (Figure 8 and Figure 9). A non-linear power function described the effect of duration on the bias of eBird checklist species richness relative to BBS checklists (Table 5 and Table 6). Based on the power function at 60 minutes, bias moved closer to zero (from -0.61 to -0.50) after species richness was estimated with the Chao1 estimator; that is, species richness from the eBird dataset was overall closer to that of the BBS dataset after the Chao1 species richness estimation (Figure 8 and Figure 9). In addition, bias was significantly closer to zero after the Chao1 species richness estimation (V = 61101000, p < 0.05).

When comparing observed species richness between the eBird and BBS datasets, the power function fitted by least squares gave the eBird dataset a bias of -0.61 at 60 minutes (Figure 8), indicating that the eBird dataset recorded on average 39% of the BBS species richness at 60 minutes. Based on the power function, the eBird dataset did not reach the same observed species richness as the BBS dataset (bias = 0) at any duration between 6 and 780 minutes (Figure 8).
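The step from a bias value to a percentage follows directly from the bias formula. As a worked check, writing O for the eBird species richness and A for the BBS reference richness (the subscripts are dropped here for brevity), and taking the bias at 60 minutes from the fitted curve:

```latex
\mathrm{Bias} = \frac{O - A}{A} = \frac{O}{A} - 1
\quad\Longrightarrow\quad
\frac{O}{A} = 1 + \mathrm{Bias} = 1 + (-0.61) = 0.39
```

The same conversion applies to every percentage quoted in this section; for example, a bias of -0.50 corresponds to 50% of the BBS species richness.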

When comparing the Chao1-estimated species richness from the eBird dataset with the observed species richness in the BBS dataset, the power function gave the eBird dataset a bias of -0.50 at 60 minutes (Figure 9), indicating that the eBird dataset recorded on average 50% of the BBS species richness after the Chao1 species richness estimation. According to the power function, eBird checklists would need a

(Figure 9). The longest duration (780 minutes) among all eBird checklists (n = 14596), had a bias of -0.14 (Figure 9).

Table 5 Parameter estimates from the power function by least squares method on the relationship of duration and bias (observed species richness of eBird vs. observed species richness of BBS)

| Parameter | Estimate | Standard Error | t–value | p–value |
|---|---|---|---|---|
| a | 0.099773 | 0.002515 | 39.67 | <0.001*** |
| b | 0.330131 | 0.005414 | 60.98 | <0.001*** |

*Note: the power function is depicted above with parameters (a and b) to be estimated.

Residual standard error: 0.2595 on 14594 degrees of freedom

Figure 8 The relationship between the duration of eBird checklists and bias (observed species richness of eBird vs. observed species richness of BBS) across 204 BBS sites. The power function (top-right in the figure) was used to fit the relationship between bias and duration by a least squares approach. Bias was calculated with observed species richness from both the eBird and BBS datasets. A total of 14596 eBird checklists were included in the analyses. Note that the observed species richness in BBS used in the bias calculation was computed by compiling observed species richness from 2009–2017 for each of the 204 BBS sites separately.

Table 6 Parameter estimates from the power function by least squares method on the relationship of duration and bias (estimated species richness of eBird vs. observed species richness of BBS)

| Parameter | Estimate | Standard Error | t–value | p–value |
|---|---|---|---|---|
| a | 0.140924 | 0.004192 | 33.62 | <0.001*** |
| b | 0.310248 | 0.006439 | 48.18 | <0.001*** |

*Note: the power function is depicted above with parameters (a and b) to be estimated.

Residual standard error: 0.4049 on 14594 degrees of freedom

Figure 9 The relationship of duration on eBird checklists and bias (estimated species richness of eBird vs. observed species richness of BBS) across 204 BBS sites. The power function (top-right in the figure) was used to fit the relationship of bias and duration by a least squares approach. Bias was calculated with estimated species richness from eBird dataset and observed species richness from BBS dataset. A total of 14596 eBird checklists were included in the analyses. Note that bias calculation of observed species richness in BBS was computed by compiling observed species richness from 2009–2017 across each 204 BBS site separately. Since the minimum result value of bias is -1, I added -1 in order to scale the formula.

5. Improvement of proportion of species richness against BBS dataset after the Chao1 species richness estimation

Again, a non-linear power function described the effect of eBird checklist duration on bias (Table 7 and Table 8). In general, as survey duration increased, the observed and estimated species richness of eBird checklists came closer to the average observed species richness of BBS sites (Figure 10 and Figure 11). Based on the power function at 60 minutes, bias moved closer to zero (from -0.34 to -0.14) after species richness was estimated with the Chao1 estimator, indicating that the proportion of BBS species richness recorded by the eBird dataset rose from 66% to 86% (i.e., species richness from the eBird dataset was closer to the average observed species richness from the BBS dataset after the Chao1 species richness estimation) (Figure 10 and Figure 11). At 60 minutes, 4 eBird checklists had a bias > 1 before species richness estimation, whereas 13 checklists, more than three times as many (3.25 times), had a bias > 1 after the Chao1 species richness estimation; a bias > 1 means that the eBird checklist reported more than twice the BBS species richness (overestimation) (Figure 10 and Figure 11).

When comparing observed species richness between the eBird and BBS datasets, the power function fitted by least squares gave the eBird dataset a bias of -0.34 at 60 minutes (Figure 10); that is, the eBird dataset recorded on average 66% of the BBS species richness at 60 minutes. According to the power function, eBird checklists would need a duration of 221.89 minutes to reach zero bias (Figure 10).

When comparing the Chao1-estimated species richness from the eBird dataset with the average observed species richness in the BBS dataset, the power function gave the eBird dataset a bias of -0.14 at 60 minutes (Figure 11), indicating that the eBird dataset recorded on average 86% of the BBS species richness after the Chao1 species richness estimation. Although the Chao1 estimator increased the species richness attributed to eBird checklists, the eBird dataset still failed to reach the same species richness as the BBS dataset at 60 minutes even after the Chao1 estimator was applied. According to the power function, eBird checklists would need a duration of 96.42 minutes to reach zero bias after the Chao1 species richness estimation (Figure 11).
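The 60-minute bias values and zero-bias durations quoted above can be approximately reproduced from the parameter estimates in Tables 7 and 8. The sketch below assumes the fitted curve has the form bias = a × duration^b − 1; this form is inferred from the figure captions, which state that -1 was added to scale the power function, and is not spelled out in the text. The function names are hypothetical.

```python
def bias_at(duration, a, b):
    """Fitted bias at a given duration, assuming bias = a * duration**b - 1."""
    return a * duration ** b - 1


def zero_bias_duration(a, b):
    """Duration at which a * duration**b - 1 equals zero."""
    return (1 / a) ** (1 / b)


observed = (0.177926, 0.319615)   # a, b from Table 7 (observed richness)
estimated = (0.247240, 0.305866)  # a, b from Table 8 (Chao1 estimate)

for label, (a, b) in [("observed", observed), ("Chao1", estimated)]:
    print(label, round(bias_at(60, a, b), 2), round(zero_bias_duration(a, b), 1))
```

Under this assumed form, the computed values land close to the reported biases of -0.34 and -0.14 and to zero-bias durations of roughly 222 and 96 minutes; small differences from the reported 221.89 and 96.42 minutes would be expected from rounding of the published parameters.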

Table 7 Parameter estimates from the power function by least squares method on the relationship of duration and bias (observed species richness of eBird vs. average observed species richness of BBS)

| Parameter | Estimate | Standard Error | t–value | p–value |
|---|---|---|---|---|
| a | 0.177926 | 0.005805 | 30.65 | <0.001*** |
| b | 0.319615 | 0.007070 | 45.21 | <0.001*** |

*Note: the power function is depicted above with parameters (a and b) to be estimated.

Residual standard error: 0.4077 on 6609 degrees of freedom

Figure 10 The relationship between the duration of eBird checklists and bias (observed species richness of eBird vs. average observed species richness of BBS) across 92 BBS sites. The power function (top-right in the figure) was used to fit the relationship between bias and duration by a least squares approach. Bias was calculated with observed species richness from both the eBird and BBS datasets. A total of 6611 eBird checklists were included in the analyses. Note that the observed species richness in the BBS dataset used in the bias calculation was computed by averaging compiled observed species richness from visits in 2010–2017 for each of the 92 BBS sites separately. Since the minimum result value of bias is -1, I added -1 in order to scale the formula.

Table 8 Parameter estimates from the power function by least squares method on the relationship of duration and bias (estimated species richness of eBird vs. average observed species richness of BBS)

| Parameter | Estimate | Standard Error | t–value | p–value |
|---|---|---|---|---|
| a | 0.247240 | 0.009437 | 26.20 | <0.001*** |
| b | 0.305866 | 0.008325 | 36.74 | <0.001*** |

*Note: the power function is depicted above with parameters (a and b) to be estimated.

Residual standard error: 0.634 on 6609 degrees of freedom

Figure 11 The relationship of duration on eBird checklists and bias (estimated species richness of eBird vs. average observed species richness of BBS) across 92 BBS sites. The power function (top-right in the figure) was used to fit the relationship of bias and duration by a least squares approach. Bias was calculated with estimated species richness from eBird dataset and the average observed species richness from BBS dataset. A total of 6611 eBird checklists were included in the analyses. Note that bias calculation of observed species richness in BBS dataset was computed by averaging compiled observed species richness from visits in 2010–2017 across each 92 BBS site separately. Since the minimum result value of bias is -1, I added -1 in order to scale the formula.

Discussion

1. Non-linear relationship – the effect of duration on species richness and bias

In this study, I compared four non-linear models to examine the relationship between duration and species richness. The results showed that a power function was the best-performing model for explaining the relationship between duration and species richness, indicating that duration strongly affects the number of species recorded. The performance of the power function has also been evaluated by Flather (1996) who compared a total of nine non-linear models derived from the North American Breeding
