A Study of Nonparametric Estimation of the Number of Shared Species (共有物種數的無母數估計探討)

Chapter 6 Conclusions and Recommendations


National Chengchi University (國立政治大學)

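As the sample size increases, the estimates move from underestimation to overestimation and finally approach the true value, which is consistent with the theoretical simulations.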

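Consider, from the perspective of the coverage probability of the 95% confidence interval, when sampling can stop while still estimating the true value accurately. Among the examples used in this study under sampling without replacement, the Jin Yong novels 【射鵰英雄傳】 (The Legend of the Condor Heroes) versus 【神鵰俠侶】 (The Return of the Condor Heroes) are the exception: for this pair, the 95% confidence-interval coverage rate of $\hat{S}^{v}_{12(1)}$ exceeds 90% once about 30% of the sample has been drawn, and that of $\hat{S}^{v}_{12(2)}$ exceeds 90% at about 35%.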

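For the remaining examples, $\hat{S}^{v}_{12(1)}$ needs about 40% of the sample for its 95% confidence-interval coverage rate to exceed 90%, and $\hat{S}^{v}_{12(2)}$ needs about 50% of the sample for its coverage rate to reach its maximum, which is above 90%.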

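Under sampling with replacement, the two estimators proposed in this thesis require, for the Jin Yong novels, roughly 1.5 to 2 times the sample size needed without replacement to reach the highest coverage rate. For the remaining examples, reaching a coverage rate above 90% requires roughly 2 times the sample size needed without replacement.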

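Judged by the coverage rate of the 95% confidence interval, the number of samples needed to estimate the true value accurately is still too large. If we instead ask when the true value falls inside the 95% confidence interval, then in these examples $\hat{S}^{v}_{12(1)}$ needs 25% to 35% of the sample without replacement and 30% to 50% with replacement for the true value to be contained in the confidence interval. $\hat{S}^{v}_{12(2)}$ needs 30% to 45% of the sample without replacement and 40% to 65% with replacement for the true value to be contained in the 95% confidence interval.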
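The two criteria discussed above (the share of 95% confidence intervals that contain the true value, and the smallest sampling fraction at which that share reaches a target) can be computed as in the following minimal sketch. It assumes the confidence intervals have already been produced by repeated simulation at each sampling fraction. The function names and the example intervals are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def coverage_rate(intervals: List[Interval], true_value: float) -> float:
    """Fraction of intervals that contain the true value."""
    if not intervals:
        raise ValueError("at least one interval is required")
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


def smallest_fraction_reaching(
    results: Dict[float, List[Interval]],
    true_value: float,
    target: float = 0.9,
) -> Optional[float]:
    """Smallest sampling fraction whose coverage rate is at least `target`.

    `results` maps a sampling fraction (e.g. 0.3 for 30%) to the intervals
    obtained from repeated simulations at that fraction.
    Returns None if no fraction reaches the target.
    """
    for fraction in sorted(results):
        if coverage_rate(results[fraction], true_value) >= target:
            return fraction
    return None


# Hypothetical intervals for a true shared-species count of 20.
example = {
    0.2: [(10, 18), (12, 19), (11, 17), (15, 22)],
    0.3: [(14, 24), (16, 25), (18, 26), (12, 19)],
    0.4: [(17, 24), (18, 25), (16, 23), (19, 26)],
}
print(smallest_fraction_reaching(example, true_value=20, target=0.9))
```

Both criteria need the true value, so they apply only when the population is known, as in the examples of this study.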

6.2 Recommendations

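The method proposed in this study works well for relatively flat geometric distributions (for example, geometric parameter 0.1 to 0.3) and for the examples in this thesis, but it requires the two communities to have the same sample size, which does not always match practical needs. Future work will continue to revise the method, for example by following the Chao estimator in allowing different sample sizes for the two communities, which would better suit practical applications.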
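The setting described above, two communities with geometric-type abundances and possibly different sample sizes, can be simulated as in the following minimal sketch. It only counts the shared species actually observed in both samples, which is a lower bound on the true number; it does not implement the estimators of this thesis or the Chao estimator. The truncated and renormalized geometric weights, the placement of shared species at the head of each abundance ranking, sampling individuals with replacement, and all function and parameter names are assumptions made for illustration.

```python
import random
from typing import List


def geometric_abundances(num_species: int, p: float) -> List[float]:
    """Relative abundances proportional to p * (1 - p)**k, renormalized to sum to 1."""
    weights = [p * (1 - p) ** k for k in range(num_species)]
    total = sum(weights)
    return [w / total for w in weights]


def observed_shared_species(
    num_shared: int,
    num_only1: int,
    num_only2: int,
    p: float,
    n1: int,
    n2: int,
    seed: int = 0,
) -> int:
    """Number of shared species seen in both samples (sizes n1 and n2 may differ)."""
    rng = random.Random(seed)
    shared = [("shared", i) for i in range(num_shared)]
    # Assumption: shared species are the most abundant ones in both communities.
    community1 = shared + [("only1", i) for i in range(num_only1)]
    community2 = shared + [("only2", i) for i in range(num_only2)]
    # Individuals are drawn with replacement according to species abundance.
    sample1 = rng.choices(
        community1, weights=geometric_abundances(len(community1), p), k=n1
    )
    sample2 = rng.choices(
        community2, weights=geometric_abundances(len(community2), p), k=n2
    )
    # Only shared species carry the same label in both communities.
    return len(set(sample1) & set(sample2))


# Unequal sample sizes: 50 individuals from community 1, 200 from community 2.
print(observed_shared_species(20, 10, 10, p=0.1, n1=50, n2=200, seed=1))
```

Raising `p` concentrates abundance on a few species, so rare shared species are seen less often at a given sample size.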


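When the distribution of shared species in real data resembles a geometric distribution with parameter 0.1 to 0.4, the estimation method proposed in this thesis performs reasonably well. In more extreme cases (for example, when both communities follow something close to a geometric distribution with parameter 0.5 or above), a very large sample is needed to estimate the true number of shared species accurately. Because this study uses only the first-order jackknife to estimate the number of shared species, a higher-order jackknife might reduce the required sample size. Which jackknife order to use in practice remains a question for further study.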
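To illustrate how the jackknife order changes an estimate, the sketch below implements the classical first- and second-order jackknife richness estimators for a single community, in the form associated with Burnham and Overton (1978). This is not the shared-species estimator of this thesis. Here `s_obs` is the number of observed species, `f1` and `f2` are the numbers of species found in exactly one and exactly two sampling units, and `n` is the number of sampling units. The function name and the example values are hypothetical.

```python
def jackknife_richness(s_obs: int, f1: int, f2: int, n: int, order: int = 1) -> float:
    """Single-community jackknife estimate of species richness.

    order=1: S_obs + f1 * (n - 1) / n
    order=2: S_obs + f1 * (2n - 3) / n - f2 * (n - 2)**2 / (n * (n - 1))
    """
    if n < 2:
        raise ValueError("at least two sampling units are required")
    if order == 1:
        return s_obs + f1 * (n - 1) / n
    if order == 2:
        return s_obs + f1 * (2 * n - 3) / n - f2 * (n - 2) ** 2 / (n * (n - 1))
    raise ValueError("only orders 1 and 2 are implemented in this sketch")


# Hypothetical counts: 30 observed species, 10 uniques, 4 duplicates, 10 units.
first = jackknife_richness(30, 10, 4, 10, order=1)
second = jackknife_richness(30, 10, 4, 10, order=2)
print(first, second)
```

Higher orders usually reduce bias at the cost of larger variance, which is one reason the choice of order is not obvious in practice.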

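When estimating the number of shared species, accuracy cannot be improved through endless sampling. When the distribution of shared species across the two communities is unusual (for example, when both communities follow something close to a geometric distribution with parameter 0.5 or above), the proposed method needs a larger sample, so deciding when sampling can stop while still giving an accurate estimate is an important question. With the population known, the examples in this study use two quantities as stopping criteria: the percentage of the sample at which the coverage rate is highest, and the minimum percentage of the sample at which the true value falls inside the confidence interval. These criteria can be validated on more real data in future work. With the population unknown, however, no stopping criterion is available. We recommend developing a stopping criterion that can be used whether or not the population is known.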
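As one possible starting point for a rule that does not need the true value, the sketch below stops sampling once the estimated probability of drawing an unseen species falls below a threshold. It uses Good's (1953) estimate, the number of species seen exactly once divided by the number of individuals drawn. This is a hypothetical single-community illustration, not a rule proposed or validated in this thesis; the threshold, the minimum number of draws, and the function names are assumptions.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def unseen_probability(sample: Iterable[Hashable]) -> float:
    """Good's estimate of the unseen-species probability: singletons / sample size."""
    counts = Counter(sample)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sample must not be empty")
    f1 = sum(1 for c in counts.values() if c == 1)
    return f1 / n


def stopping_index(
    stream: Iterable[Hashable],
    threshold: float = 0.05,
    min_draws: int = 20,
) -> Optional[int]:
    """Number of draws after which the estimate first drops to `threshold` or below.

    Returns None if the stream ends before the rule is satisfied.
    """
    counts: Counter = Counter()
    singletons = 0
    for i, species in enumerate(stream, start=1):
        counts[species] += 1
        if counts[species] == 1:
            singletons += 1  # new species, currently seen once
        elif counts[species] == 2:
            singletons -= 1  # no longer a singleton
        if i >= min_draws and singletons / i <= threshold:
            return i
    return None


print(unseen_probability(["a", "a", "b", "c"]))
```

A shared-species version would need to track unseen probabilities in both communities jointly, which this sketch does not attempt.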


References

English-language sources:

Bunge, J. and Fitzpatrick, M. (1993). Estimating the number of species: A review. Journal of the American Statistical Association, 88, pp. 364-373.

Burnham, K. P. and Overton, W. S. (1978). Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika, 65, pp. 625-633.

Chao, A. and Lee, S-M. (1992). Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87, pp. 210-217.

Chao, A., Ma, M.C., and Yang, M.C.K. (1993). Stopping rule and estimation for recapture debugging with unequal detection rates. Biometrika, 80, pp. 193-201.

Chao, A., Hwang, W-H, Chen, Y-C., and Kuo, C-Y. (2000). Estimating the number of shared species in two communities. Statistica Sinica, 10, pp. 227-246.

Chao, A., Pan, H. Y., and Chiang, S. C. (2008). The Petersen-Lincoln estimator and its extension to estimate the size of a shared population. Biometrical Journal, 50, pp. 957-970.

Clayton, M. K. and Frees, E. W. (1987). Nonparametric estimation of the probability of discovering a new species. Journal of the American Statistical Association, 82, pp. 305-311.

Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40, pp. 237-264.

Good, I. J. and Toulmin, G. H. (1956). The number of new species, and the increase in population coverage, when a sample is increased. Biometrika, 43, pp. 45-63.

Efron, B. and Thisted, R. (1976). Estimating the number of unseen species: How many words did Shakespeare know? Biometrika, 63, pp. 435-447.

Efron, B. and Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

Esty, W. W. (1983). A normal limit law for a nonparametric estimator of the coverage of a random sample. The Annals of Statistics, 11, pp. 905-912.

Esty, W. W. (1986). The efficiency of Good's nonparametric coverage estimator. The Annals of Statistics, 14, pp. 1257-1260.

Mao, C. X. and Lindsay, B. G. (2002). A Poisson model for the coverage problem with a genomic application. Biometrika, 89, pp. 669-681.

Otis, D., Burnham, K. P., White, G., and Anderson, D. R. (1978). Statistical inference from capture data on closed animal populations. Wildlife Monographs, 62, Washington, D.C.: The Wildlife Society.

Quenouille, M. H. (1949). Approximate tests of correlation in time-series. Journal of the Royal Statistical Society. Series B, 11, pp. 68-84.

Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43, pp. 353-360.

Schucany, W. R., Gray, H. L., and Owen, D. B. (1971). On bias reduction in estimation. Journal of the American Statistical Association, 66, pp. 524-533.

Sharot, T. (1976). The generalized jackknife: finite samples and subsample sizes. Journal of the American Statistical Association, 71, pp. 451-454.

Yue, C. J. (2009). Sequential sampling in the search new shared species. Technical report, Department of Statistics, National Chengchi University.

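Chinese-language sources:

余清祥 (1998). 統計在紅樓夢的應用 [Applications of statistics to Dream of the Red Chamber]. 國立政治大學學報 [National Chengchi University Journal], No. 76, pp. 303-327.

趙蓮菊 (1995). 種類知多少 [How many species are there?]. 數學傳播 [Mathmedia], No. 74, pp. 1-6.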


Appendix

Appendix Table 1. Probability of unseen shared species $v_1(n)$ when the number of shared species is 20

Source: A Study of Nonparametric Estimation of the Number of Shared Species, NCCU Academic Hub (政大學術集成), pp. 53-57.
