Statistical analyses and development of models

We divided the data into training and testing sets. We used the heuristic provided by Huberty (1994) to determine the proportion of the whole dataset to reserve for testing. The heuristic is as follows:

1/[1 + √(p - 1)]

where p is the number of environmental variables. Since we used 10 environmental variables, the heuristic gives 1/(1 + √9) = 1/4 of the data for testing, so the ratio of training to testing data was 3:1.
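As a quick check of the arithmetic above, the heuristic can be evaluated directly. This is only an illustrative sketch; the function name `huberty_test_fraction` is hypothetical and not part of any package used in the study.

```python
import math


def huberty_test_fraction(p: int) -> float:
    """Fraction of the whole dataset reserved for testing: 1 / (1 + sqrt(p - 1))."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (1.0 + math.sqrt(p - 1))


# With the 10 environmental variables used in this study
test_fraction = huberty_test_fraction(10)
train_to_test = (1.0 - test_fraction) / test_fraction
print(test_fraction, train_to_test)
```

With p = 10 the testing fraction is one quarter, which corresponds to the 3:1 split of training to testing data.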

In this study, we selected logistic regression, discriminant analysis, Ecological-Niche Factor Analysis (ENFA), Genetic Algorithm for Rule-set Production (GARP), and Maximum Entropy (Maxent) to analyse sambar deer distributions. Logistic regression and discriminant analysis are presence-absence models, while the other 3 are presence-only models. These models are all regularly used for predictions of species distributions (Teixeira et al., 2001; Hirzel et al., 2002; Brotons et al., 2004).

Logistic regression has been shown to be a powerful tool for analysing the effects of 1 or several independent variables, which may be discrete or continuous, on a dichotomous (presence/absence) or polytomous dependent variable (Hosmer & Lemeshow, 1989). Logistic regression takes the following form:

π(x) = e^{g(x)}/(1 + e^{g(x)}) or π(x) = 1/(1 + e^{-g(x)})

where π(x) represents the probability of occurrence of the target species. The function g(x) is obtained from a regression equation of the form:

g(x) = β₀ + β₁x₁ + β₂x₂ + … + β_px_p

where β₀ is a constant and β₁, β₂, …, β_p are the coefficients of the respective independent variables x₁, x₂, …, x_p (Hosmer & Lemeshow, 1989).
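The two equivalent forms of π(x) can be illustrated with a minimal sketch. The coefficient values below are made up for illustration and are not estimates from this study; the function names are likewise hypothetical.

```python
import math


def linear_predictor(beta0, betas, xs):
    """g(x) = beta0 + beta1*x1 + ... + betap*xp."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def occurrence_probability(beta0, betas, xs):
    """pi(x), using whichever of the two equivalent forms avoids overflow."""
    g = linear_predictor(beta0, betas, xs)
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))   # pi(x) = 1 / (1 + e^{-g(x)})
    e = math.exp(g)
    return e / (1.0 + e)                    # pi(x) = e^{g(x)} / (1 + e^{g(x)})


# Hypothetical coefficients for two environmental variables
beta0, betas = -1.0, [0.5, 0.25]
print(occurrence_probability(beta0, betas, [2.0, 0.0]))  # g(x) = 0, so pi(x) = 0.5
```

Switching between the two forms depending on the sign of g(x) is a numerical convenience only; both return the same probability.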

Discriminant analysis is a technique for classifying a set of observations into predefined classes on the basis of a set of variables (McLachlan, 2004). Based on the observations, the technique constructs a set of linear functions of the environmental variables, known as discriminant functions, of the form:

L = b₁x₁ + b₂x₂ + … + b_nx_n + c

where b₁, b₂, …, b_n are discriminant coefficients; x₁, x₂, …, x_n are the environmental variables; and c is a constant. These discriminant functions are used to predict the class of a new observation with an unknown class. For a k-class problem, k discriminant functions are constructed. Given a new observation, all k discriminant functions are evaluated, and the observation is assigned to class i if the i-th discriminant function has the highest value. Logistic regression and discriminant analysis were performed with SAS 9.0 (SAS Institute Inc., Cary, NC, USA).
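The assignment rule described above (evaluate every discriminant function and pick the class with the highest value) can be sketched as follows. The coefficients, constants, and class labels are hypothetical; in the study they would come from the SAS output.

```python
def discriminant_score(coeffs, constant, x):
    """L = b1*x1 + b2*x2 + ... + bn*xn + c for one class."""
    return sum(b * xi for b, xi in zip(coeffs, x)) + constant


def classify(functions, x):
    """Assign x to the class whose discriminant function is largest.

    functions maps a class label to a (coefficients, constant) pair.
    """
    return max(functions, key=lambda label: discriminant_score(*functions[label], x))


# Hypothetical two-class (k = 2) problem with two environmental variables
functions = {
    "absence": ([0.2, -0.1], 0.0),
    "presence": ([0.5, 0.3], -0.5),
}
print(classify(functions, [2.0, 1.0]))
print(classify(functions, [0.0, 1.0]))
```

For the first observation the "presence" function scores higher, and for the second the "absence" function does, so the two observations are assigned to different classes.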

The principle of ENFA is to compare the distributions of the environmental variables between the presence dataset and the whole study area (Hirzel et al., 2002). Like Principal Components Analysis, ENFA summarises several environmental variables into a few uncorrelated factors that explain most of the information. The output of ENFA includes eigenvalues and factor scores. The first factor is the marginality factor, which describes the difference between the mean habitat in the study area and the species mean. The remaining factors are the specialisation factors, which describe how specialised the species is with reference to the available habitat range in the study area (Hirzel et al., 2002). ENFA was developed using Biomapper 4.0 (Hirzel et al., 2007). After computing the factor scores, we used the algorithm of the medians to draw a habitat-suitability map for sambar deer.

GARP is a genetic algorithm that creates ecological niche models for species (Stockwell & Peters, 1999). The model describes environmental conditions under which a species should be able to maintain populations. GARP searches iteratively for non-random correlations between presence and environmental variables by using 4 types of rules: atomic, logistic regression, bioclimatic envelope, and negated bioclimatic envelope. Predicted presence is defined by these rules. We used the Desktop GARP application (version 1.1.6; http://www.nhm.ku.edu/desktopgarp/) and followed the normal procedure for implementation. The output of a GARP run is a binary map; hence, we applied a modification of the “best subsets” procedure described by Anderson et al. (2003). We ran 200 GARP models and selected the 20 models that had the highest predictive accuracy. The final GARP prediction was produced by summing the 20 selected models, which produced prediction values ranging from 0 to 20.
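The summation step can be sketched as follows. This is a simplified illustration, not the Desktop GARP procedure itself: it assumes each run is summarised by a single accuracy score and a binary map stored as a flat list of 0/1 values per grid cell, and the function name `sum_best_models` is hypothetical. How accuracy was measured is not specified here.

```python
def sum_best_models(models, n_best=20):
    """Sum the binary maps of the n_best models with the highest accuracy.

    models is a list of (accuracy, binary_map) pairs, where binary_map is a
    list of 0/1 predictions with one entry per grid cell.
    """
    best = sorted(models, key=lambda m: m[0], reverse=True)[:n_best]
    n_cells = len(best[0][1])
    return [sum(binary_map[i] for _, binary_map in best) for i in range(n_cells)]


# Toy example: 4 runs over 3 grid cells, keeping the best 2
toy_models = [
    (0.9, [1, 0, 1]),
    (0.6, [1, 1, 1]),
    (0.8, [1, 0, 0]),
    (0.5, [0, 1, 1]),
]
print(sum_best_models(toy_models, n_best=2))  # [2, 0, 1]
```

With 20 selected models, each cell value counts how many of them predict presence, which gives the 0 to 20 range described above.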

Maxent is a machine-learning technique that is based on the principle of maximum entropy (Pearson et al., 2004). It estimates the probability distribution of maximum entropy for each environmental variable across the study area with presence-only data (Pearson et al., 2004; Pearson et al., 2006). This distribution is calculated with the constraint that the expected value of each environmental variable under this estimated distribution matches its empirical average (Pearson et al., 2006).

Habitat suitability maps were calculated by applying Maxent models to all grid cells in the study area, using a logistic link function to yield probability values ranging from 0 to 1. Moreover, Maxent performs well with small sample sizes (Elith et al., 2006; Hernandez et al., 2006). Maxent models were developed using Maxent (version 3.3.1; http://www.cs.princeton.edu/~schapire/maxent/).

We used different methods for each model to determine the importance of each environmental variable. For logistic regression, the importance of each variable was determined by Wald Chi-Square statistics in the analysis of maximum likelihood estimates. For discriminant analysis, the importance of each variable was determined by standardised canonical discriminant function coefficients. For ENFA, variable importance was determined from the factor scores. For Maxent, variable importance was determined by (1) jackknife analysis of the mean gain with the training and test data, in addition to the area under the receiver operating characteristic curve (AUC); and (2) the mean percentage contribution of each environmental variable for the models (Phillips et al., 2006). Because of software restrictions, we were unable to evaluate the importance of variables for GARP.

To evaluate the performance of each model, we used the AUC (Fielding & Bell, 1997). The receiver operating characteristic curve is created by plotting the true-positive fraction against the false-positive fraction for all test points across all possible probability thresholds, and the AUC is the area under this curve. The AUC takes values between 0 and 1, with a value of 0.5 indicating that a model is no better than random. It is independent of prevalence and is considered a highly effective measure of the performance of ordinal score models (Manel et al., 2001; McPherson et al., 2004).
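For readers who want to reproduce the measure, the AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen presence point scores higher than a randomly chosen absence (or background) point, with ties counted as one half. The sketch below uses this pairwise formulation; it is not the routine used by the software in this study, and the scores are invented.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve from pairwise comparisons of test-point scores."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0      # presence point ranked above absence point
            elif p == a:
                wins += 0.5      # ties count as half
    return wins / (len(presence_scores) * len(absence_scores))


# Invented model outputs at test points
presence = [0.9, 0.8, 0.4]
absence = [0.7, 0.3, 0.2]
print(auc(presence, absence))  # 8 of the 9 pairs are ranked correctly
```

A model that scores every point identically yields 0.5 under this calculation, matching the "no better than random" benchmark.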

For conservation purposes, it is usually desirable to distinguish “suitable” from “unsuitable” areas by setting a threshold. If the predicted probability of occurrence is larger than the threshold, then it is considered to be a prediction of presence (Pearson et al., 2004). In this study, we calculated kappa statistics at different threshold probabilities of occurrence and selected the probability that generated the maximum kappa statistic as the threshold for each model (Freeman & Moisen, 2008).
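The threshold search can be sketched as below, using Cohen's kappa computed from the confusion matrix at each candidate threshold. The candidate grid (steps of 0.01), the tie-breaking rule (keep the lowest threshold), and the toy data are assumptions made for illustration only.

```python
def kappa(tp, fp, fn, tn):
    """Cohen's kappa from the four cells of a confusion matrix."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    chance_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if chance_agreement == 1.0:
        return 0.0  # degenerate case: only one class observed and predicted
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)


def max_kappa_threshold(scores, observed, thresholds):
    """Return (threshold, kappa) for the threshold that maximises kappa.

    A point is predicted present when its score is larger than the threshold.
    """
    best_t, best_k = None, float("-inf")
    for t in thresholds:
        pairs = [(s > t, o == 1) for s, o in zip(scores, observed)]
        tp = sum(1 for pred, obs in pairs if pred and obs)
        fp = sum(1 for pred, obs in pairs if pred and not obs)
        fn = sum(1 for pred, obs in pairs if not pred and obs)
        tn = sum(1 for pred, obs in pairs if not pred and not obs)
        k = kappa(tp, fp, fn, tn)
        if k > best_k:  # strict comparison keeps the lowest threshold on ties
            best_t, best_k = t, k
    return best_t, best_k


# Toy data: predicted probabilities and observed presence (1) / absence (0)
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
observed = [1, 1, 1, 0, 0, 0]
best_t, best_k = max_kappa_threshold(scores, observed, [i / 100 for i in range(100)])
print(best_t, best_k)
```

In this toy example no threshold separates the classes perfectly, so the maximum kappa is below 1 and is reached at more than one threshold.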

To obtain the most robust prediction map, we used ensemble forecasting, as described by Araújo & New (2007). We calculated the ensemble forecast as a weighted mean, weighting each model by its AUC (Araújo & New, 2007; Marmion et al., 2009; Thuiller et al., 2009; Oppel et al., 2012). The habitat suitability indices of the ensemble forecast ranged from 0 to 1. We summarised the areas of suitable habitat and optimal habitat by using 2 suitability index thresholds, 0.33 and 0.67, respectively. We selected these 2 thresholds arbitrarily, based on our knowledge of the status of Formosan sambar deer in Taiwan.
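A minimal sketch of the AUC-weighted mean follows. It assumes that every model's output has first been rescaled to the 0 to 1 range (for example, dividing the GARP sum by 20), which the text does not spell out, and that the two thresholds are applied inclusively; both points, and the function names, are assumptions.

```python
def ensemble_suitability(predictions, aucs):
    """AUC-weighted mean suitability per grid cell.

    predictions holds one list per model (values in 0..1, same cell order);
    aucs holds the corresponding model weights.
    """
    total_weight = sum(aucs)
    n_cells = len(predictions[0])
    return [
        sum(w * pred[i] for w, pred in zip(aucs, predictions)) / total_weight
        for i in range(n_cells)
    ]


def habitat_class(suitability, suitable=0.33, optimal=0.67):
    """Label a cell using the two suitability index thresholds."""
    if suitability >= optimal:
        return "optimal"
    if suitability >= suitable:
        return "suitable"
    return "unsuitable"


# Toy example: two models, two grid cells, invented AUC values
ensemble = ensemble_suitability([[1.0, 0.0], [0.5, 0.5]], [0.9, 0.6])
print([habitat_class(s) for s in ensemble])
```

Because the weights are normalised by their sum, the ensemble index stays within 0 to 1 whenever the individual model outputs do.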

II. Habitat selections at home-range and within-home-range scales

1. Study area

This study was conducted in Taroko National Park, situated in Hualian, Taichung, and Nantou counties of Taiwan. This mountainous national park encompasses an area of 920 km², with the Central Mountain Range passing through in a north-south direction and the Central-Cross-Island Highway crossing in an east-west direction. The elevation ranges from sea level to 3,700 m, while the area of deer tracking mostly ranges from 1,400 to 3,600 m. The mean annual temperatures are 17.5 °C, 12.5 °C, and 7.7 °C at elevations of 1,000 m, 2,000 m, and 3,000 m, respectively. Areas above 3,000 m in elevation receive snow during winter. The mean annual precipitation is over 2,000 mm, with higher precipitation from May to October. Dominant natural vegetation at medium-to-high elevation can be roughly classified into 10 types: (1) Yushania niitakayamensis thicket, (2) Juniperus squamata and Rhododendron pseudochrysanthum thicket, (3) Abies kawakamii forest, (4) Tsuga formosana forest, (5) Abies kawakamii and Tsuga formosana forest, (6) Picea morrisonicola forest, (7) Chamaecyparis formosana forest, (8) Pinus taiwanensis forest, (9) broadleaf-conifer mixed forest, and (10) evergreen broadleaf forest (Yang & Hsu, 2004). The site for deer capture was Mountain Panshi, which is about 3,000 m above sea level. Although this site is only approximately 9 km from its entrance, reaching it takes two days on foot because of the rugged terrain. Consequently, this site experiences little human disturbance and hunting pressure.
