Common Statistical Plots

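Basic statistical plotting functions:

Table 8: Basic statistical plotting functions

Function                 Plot
stem(x)                  stem-and-leaf plot
hist(x)                  histogram
barplot(x)               bar chart
dotplot(x)               dot plot (epicalc package)
pie(x)                   pie chart
plot(x)                  general-purpose plotting function
boxplot(x)               box plot (box-and-whisker plot)
qqnorm(x), qqline(x)     normal probability plot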

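Example 1: Bar charts: barplot(height, names.arg, horiz, col, density)

A group of 25 people were surveyed about their beer preferences. The categories were: 1. domestic can, 2. domestic bottle, 3. draft beer, 4. imported.

The survey data are: 3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1

> beer = scan("d:/math work/data/beer.txt", sep = "")
> barplot(beer)                        # wrong: draws one bar per raw code
> barplot(table(beer))                 # correct: bar chart of the counts
> kind = c("domestic can", "domestic bottle", "draft beer", "imported")
> barplot(table(beer), horiz = FALSE, names.arg = kind, col = c(1, 2, 3, 4), density = 10)
> barplot(table(beer)/length(beer))    # bar chart of relative frequencies
> table(beer)/length(beer)             # relative frequency table
beer
   1    2    3    4
0.40 0.16 0.32 0.12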
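The relative frequency table above can be reproduced without the data file. The following is a minimal sketch that types the 25 responses in directly; the English category labels and the names `counts` and `rel.freq` are illustrative assumptions, not part of the original notes.

```r
# The 25 survey responses listed in the text, entered directly
# (the file path used in the text is specific to the author's machine).
beer <- c(3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2,
          1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1)

# Assumed English labels for codes 1 to 4
kind <- c("domestic can", "domestic bottle", "draft beer", "imported")

# Turn the codes into a labelled factor, then count and normalise
counts   <- table(factor(beer, levels = 1:4, labels = kind))
rel.freq <- prop.table(counts)      # same as counts / length(beer)
print(rel.freq)

# Bar chart of relative frequencies, shaded as in the example
barplot(rel.freq, col = 1:4, density = 10, ylab = "relative frequency")
```

Using a labelled factor keeps the category names attached to the counts, so the bars are labelled without passing `names.arg` separately.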

順伯的窩第頁共頁


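Example 2: Pie charts: pie(x, labels)

Usage is similar to barplot(), with a few extra options for annotating the data.

> beer.counts = table(beer)            # store the frequency table
> pie(beer.counts, cex = 2)            # simple pie chart; cex sets the label size
> names(beer.counts) = c("A", "B", "label", "name")   # category names
> pie(beer.counts)                     # the category names are printed on the chart
> pie(beer.counts, col = c("purple", "green2", "cyan", "white"))   # fill colours
> kind = c("domestic can", "domestic bottle", "draft beer", "imported")
> pie(table(beer), col = c(1, 2, 3, 4), labels = kind)   # colours may be given as numeric codes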
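Pie slices are easier to read when each label carries its percentage. The sketch below builds such labels from the same survey data; the label text and the variable names `pct` and `slice.labels` are assumptions made for illustration.

```r
# Survey responses from the bar chart example, entered directly
beer <- c(3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2,
          1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1)
kind <- c("domestic can", "domestic bottle", "draft beer", "imported")

beer.counts <- table(beer)

# Percentage of each category, rounded to whole numbers
pct <- as.vector(round(100 * beer.counts / sum(beer.counts)))

# Combine the category name and its percentage into one label per slice
slice.labels <- paste0(kind, " (", pct, "%)")

pie(beer.counts, labels = slice.labels,
    col = c("purple", "green2", "cyan", "white"))
```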

Example 3: Stem-and-leaf plots: stem(x, scale)

> scores = scan("d:/math work/data/scores.txt", sep = " ")
> stem(scores)

  The decimal point is 1 digit(s) to the right of the |

  0 | 000222344568
  1 | 23446
  2 | 38
  3 | 1

> stem(scores, scale = 2)

  The decimal point is 1 digit(s) to the right of the |

  0 | 000222344

Example 4: Histograms: hist(x, breaks, main, xlab, ylab, ...)

When there are many observations, a stem-and-leaf plot no longer displays the data well; a histogram can be used instead.

Suppose the weekly receipts of the top-ranked films at the box office were as follows (the text says the top 25; 26 values are listed):

29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4
0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1

> movies = scan("d:/math work/data/movies.txt", sep = "")
> hist(movies)                         # frequencies
> hist(movies, probability = TRUE)     # proportions (or probabilities)
> rug(jitter(movies))                  # add tick marks
> hist(movies, breaks = c(0, 1, 2, 3, 4, 5, 10, 20, max(movies)))   # user-defined class boundaries
> plot(cut(movies, 10))                # force the number of classes

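R usually chooses the number of classes (intervals) automatically, using Sturges' rule $k = 1 + \log_2 n$. If you insist on dividing the data into k classes, use plot(cut(data, k)) to draw the figure. For the earlier example we can also use hist() to tabulate the counts, setting the class boundaries with the breaks argument and suppressing the plot with plot=FALSE.

> cats = hist(sals, plot = FALSE, breaks = c(0, 1, 5, 50))
> names(cats$counts) = c("poor", "rich", "very rich")
> cats$counts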
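As a check on the class-count rule and on tabulating with breaks, here is a small sketch using the movie receipts listed in the histogram example. The values are typed in directly, and the names `n`, `k` and `h` are illustrative; `nclass.Sturges()` is the helper that hist() uses by default to suggest the number of classes.

```r
# Weekly receipts from the histogram example (26 values as listed)
movies <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7, 0.4, 0.4,
            0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1)

n <- length(movies)
k <- ceiling(1 + log2(n))            # Sturges' rule, rounded up to a whole number
print(c(n = n, k = k, builtin = nclass.Sturges(movies)))

# Tabulate with user-defined class boundaries and no plot
h <- hist(movies, breaks = c(0, 1, 2, 3, 4, 5, 10, 20, max(movies)), plot = FALSE)
print(h$counts)                      # one count per class (a, b]
```

hist() treats the suggested count only as a hint: it passes it to pretty() to get round break points, so the number of bars drawn can differ from k.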

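Example 5: Box plots: boxplot(x, horizontal, names, col)

A box plot summarizes data concisely and shows at a glance whether the data are symmetric or contain suspected outliers (extreme values).

A box plot displays a five-number summary. In its simplest form, the box runs from the lower limit (essentially $Q_1$) through the median to the upper limit (essentially $Q_3$), and the whiskers extend to the minimum and maximum. To expose possible outliers (extreme values), a common convention is to shorten each whisker so that it reaches no farther than $1.5$ times the box length beyond the box; points farther out are drawn individually.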
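The whisker convention can be checked numerically with boxplot.stats(), the function boxplot() relies on. The sketch below uses the movie receipts from the histogram example, typed in directly; the names `s`, `box.length` and `upper.fence` are illustrative. Note that the box edges are the hinges from fivenum(), which can differ slightly from the quartiles reported by summary().

```r
# Weekly receipts from the histogram example (26 values as listed)
movies <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7, 0.4, 0.4,
            0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1)

s <- boxplot.stats(movies, coef = 1.5)

# s$stats: lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(s$stats)

# Length of the box and the farthest point an upper whisker may reach
box.length  <- s$stats[4] - s$stats[2]
upper.fence <- s$stats[4] + 1.5 * box.length
print(upper.fence)

# Observations beyond the fences are reported (and plotted) individually
print(sort(s$out))

boxplot(movies, horizontal = TRUE, main = "Weekly box office")
```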

Figure: Histogram of movies, on the frequency scale and on the density scale (x-axis: movies)

Figure 9: Histogram with user-defined classes

> boxplot(movies, main = "Weekly box office", horizontal = TRUE)
> summary(movies)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.100   0.200   0.350   4.781   3.050  29.600

Figure 10: Box plot (x-axis: current receipts)

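This explains why Hollywood is so interested in blockbusters: a single true hit brings in more revenue than a large number of middling films.

> boxplot(iris[, 1:4], main = "Side-by-side box plots",
          names = names(iris)[1:4], col = c(2, 3, 4, 5))
> boxplot(Sepal.Length ~ Species, data = iris, main = "Side-by-side box plots",
          xlab = "Species", ylab = "Sepal length", col = c(2, 3, 4))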

Figure: Side-by-side box plots of the four iris measurements (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and of Sepal.Length by species (setosa, versicolor, virginica)

Annual numbers of lynx trapped in Canada, 1821-1934

Example 6: Frequency polygons

The following data are batting averages of the New York Yankees (source: http://www.espn.com):

.314, .289, .282, .279, .275, .267, .266, .265, .256, .250, .249, .211, .161

> x = c(.314, .289, .282, .279, .275, .267, .266, .265, .256, .250, .249, .211, .161)
> hist(x, col = gray(0.9))             # histogram plot
> result = hist(x)                     # store the results
> lines(c(min(result$breaks), result$mids, max(result$breaks)),
        c(0, result$counts, 0), type = "l")

Figure: Histogram of x (x-axis: x; y-axis: Frequency)

Figure 12: Histograms with an inappropriate class width or number of classes (panel label: d=0.05, k=6)

> result = hist(x)                     # store the results
> class.int = result$mids[2] - result$mids[1]     # class width
> x.pts = c(min(result$mids) - class.int, result$mids, max(result$mids) + class.int)
> y.pts = c(0, result$counts, 0)
> plot(x.pts, y.pts, type = "l", main = "frequency polygon")
> rug(x)                               # mark the data points on the plot

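Example 7: Probability density plots: density(x, ...)

The point of drawing a frequency polygon is to connect the histogram with the probability density of the population. The built-in function density() can be used for this.

> data(faithful)                       # 272 obs. of 2 variables: eruptions, waiting
> attach(faithful)                     # make the columns of faithful available by name
> hist(waiting, 15, prob = TRUE)       # about 15 classes for waiting, on the density scale rather than counts
> lines(density(waiting))              # add a smooth curve
> lines(density(waiting, bw = "SJ"), col = 'red')   # Sheather-Jones bandwidth selector
> hist(waiting, 15, prob = TRUE); lines(density(waiting, bw = 0.1))   # small bw: jagged curve
> hist(waiting, 15, prob = TRUE); lines(density(waiting, bw = 10))    # large bw: over-smoothed curve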
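The effect of the bandwidth can also be inspected numerically. The following sketch compares the default rule-of-thumb bandwidth with the "SJ" choice for the waiting variable and checks that the estimated curve has area close to 1, as a density should. It reads the column as faithful$waiting instead of using attach(); the names `bw.default`, `bw.sj`, `d` and `area` are illustrative.

```r
waiting <- faithful$waiting          # built-in data set; avoids attach()

# Bandwidths chosen by the two selectors
bw.default <- bw.nrd0(waiting)       # rule of thumb used by density() by default
bw.sj      <- bw.SJ(waiting)         # Sheather-Jones selector, as in bw = "SJ"
print(c(default = bw.default, SJ = bw.sj))

# Density estimate with the SJ bandwidth, overlaid on a density-scale histogram
d <- density(waiting, bw = "SJ")
hist(waiting, breaks = 15, probability = TRUE)
lines(density(waiting))              # default bandwidth
lines(d, col = "red")                # SJ bandwidth

# Approximate the area under the curve on the equally spaced evaluation grid
area <- sum(d$y) * diff(d$x)[1]
print(area)
```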

> bounds <- hist(waiting, right = TRUE, plot = FALSE)$breaks   # histogram break points
> options(digits = 2)                  # number of significant digits to print
> ecdf(waiting)(bounds)                # cumulative proportions of waiting at the break points
 [1] 0.000 0.015 0.096 0.217 0.305 0.357 0.393 0.493 0.691 0.893 0.978 0.996 1.000
> plot(bounds, ecdf(waiting)(bounds), type = "l",
       main = paste("Cumulative frequency polygon", "of variable waiting", sep = "\n"),
       ylab = "Frequencies", col = 2, lwd = 3)
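To see what ecdf() returns at the break points, the sketch below recomputes the same cumulative proportions by hand as the share of observations at or below each bound. The names `Fn`, `cum.prop` and `by.hand` are illustrative.

```r
waiting <- faithful$waiting          # built-in data set; avoids attach()

# Break points of the default histogram, without drawing it
bounds <- hist(waiting, right = TRUE, plot = FALSE)$breaks

# ecdf() returns a function; evaluating it gives cumulative proportions
Fn       <- ecdf(waiting)
cum.prop <- Fn(bounds)

# The same quantity computed directly: share of observations <= each bound
by.hand <- sapply(bounds, function(b) mean(waiting <= b))

print(round(cbind(bounds, cum.prop, by.hand), 3))
```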

Figure: Cumulative frequency polygon of variable waiting (x-axis: bounds)

> quantile(sals, 0.25)                 # first quartile Q1 of sals
 25%
1.25
> quantile(sals, 0.75)                 # third quartile Q3 of sals
 75%
7.25

Figure 14: Normal Q-Q plot (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)
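The notes list qqnorm() and qqline() in the function table but the call that produced the Q-Q plot did not survive. A minimal sketch follows, assuming the plot was drawn from the batting averages of the frequency polygon example (the sample axis range suggests this, but it is an assumption); the names `qq` and `r` are illustrative.

```r
# Batting averages from the frequency polygon example (assumed to be the plotted data)
x <- c(.314, .289, .282, .279, .275, .267, .266, .265, .256, .250, .249, .211, .161)

qqnorm(x)                            # sample quantiles against normal quantiles
qqline(x)                            # reference line through the quartiles

# The plotted coordinates can be retrieved without drawing
qq <- qqnorm(x, plot.it = FALSE)

# Correlation between theoretical and sample quantiles:
# values near 1 indicate points lying close to a straight line
r <- cor(qq$x, qq$y)
print(r)
```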

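2.4 Exercises

1. Which function would you use to draw a Pareto chart (a bar chart sorted by frequency)?

2. Which function would you use to draw a stacked bar chart?

3. Which function would you use to draw a pie chart (circular area chart)?

4. Which function would you use to draw a box plot?

5. Which function would you use to draw a histogram?

6. Enter the data 60 85 72 59 37 75 93 7 98 63 41 90 5 17 97 and make a stem-and-leaf plot.

7. For each of the data sets "bumpers", "firstchi" and "math" (UsingR package), draw a histogram. From the histogram, estimate the mean, median and standard deviation, then check your guesses with R.

8. For each of the data sets "south", "crime" and "aid", draw a stem-and-leaf plot and a box plot. Which data sets are skewed? Which contain outliers? Which are symmetric?

9. The data set "pi2000" (UsingR) contains the first 2000 digits of $\pi$. Look at the histogram: does it surprise you?

   (a) Find the proportions of the digits 1, 2 and 3.

   (b) Can you find the proportion of every digit $0 \sim 9$?

   (c) Which probability density function best describes the "pi2000" data?

10. A study of morphological variation in populations of the mountain brushtail possum (Lindenmayer, D. B., Viggers, K. L., Cunningham, R. B., and Donnelly, C. F. 1995); data set "possum".

   (a) Draw a histogram of the variable "hdlngth".

   (b) Draw a stem-and-leaf plot of the variable "hdlngth".

   (c) Draw a normal quantile plot of the variable "hdlngth".

   (d) Draw a density plot of the variable "hdlngth". What are the advantages and disadvantages of these different displays?

11. A study of morphological variation in populations of the mountain brushtail possum (Lindenmayer, D. B., Viggers, K. L., Cunningham, R. B., and Donnelly, C. F. 1995); data set "possum".

   (a) For the females, draw a relative frequency histogram of the variable "totlngth".

   (b) If the classes are set with breaks = 75 + (0:5)*5, how does the plot compare with the previous one?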

Two-Dimensional Data Analysis: R Software

3.1 Bivariate Data

Bar charts for categorical data, barplot(): plotting tabular data

Example 1: Stacked bar chart

(Two categorical variables.) Suppose a survey of students was carried out to assess whether students who smoke study less. The data are:

No.  Smokes  Study time
1    Y       less than 5 hours (code 1)
2    N       5-10 hours (code 2)
3    N       5-10 hours
4    Y       more than 10 hours (code 3)
5    N       more than 10 hours
6    Y       less than 5 hours
7    Y       5-10 hours
8    Y       less than 5 hours
9    N       more than 10 hours
10   Y       5-10 hours

> smokes = c("Y", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
> amount = c(1, 2, 2, 3, 3, 1, 2, 1, 3, 2)
> barplot(table(smokes, amount))       # stacked bars

Figure: Stacked bar chart of table(smokes, amount) (x-axis: 1, 2, 3) and side-by-side bar chart titled table(amount,smokes) (x-axis: N, Y; legend: less than 5, 5-10, more than 10)

Argument setting: beside=TRUE draws the bars side by side instead of stacked.

Example 2: Side-by-side bar chart

> barplot(table(amount, smokes), main = "table(amount,smokes)",
          beside = TRUE, legend.text = c("less than 5", "5-10", "more than 10"))
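Because the two smoking groups differ in size (6 smokers, 4 non-smokers), comparing raw counts can mislead. A possible refinement, sketched here as an assumption about what a reader might want next, is to plot column proportions with prop.table(); the labelled factor and the names `tab` and `p` are illustrative.

```r
smokes <- c("Y", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")

# Study-time codes 1 to 3 turned into a labelled factor
amount <- factor(c(1, 2, 2, 3, 3, 1, 2, 1, 3, 2), levels = 1:3,
                 labels = c("less than 5", "5-10", "more than 10"))

tab <- table(amount, smokes)         # rows: study time, columns: smoking status
print(tab)

# Proportions within each column, so each smoking group sums to 1
p <- prop.table(tab, margin = 2)
print(round(p, 2))

# Side-by-side bars of the proportions; legend taken from the row names
barplot(p, beside = TRUE, legend.text = TRUE,
        main = "Study time within each smoking group")
```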

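Box plots for categorical and numeric data, boxplot(): Suppose you have numeric data for several categories. Side-by-side box plots make it easy to compare them.

Example 3: A simple example is a drug trial in which you have data for an experimental group and a control group:

Experimental group: 5 5 5 13 7 11 11 9 8 9
Control group: 11 8 4 5 9 5 10 5 4 10

Entering the data in R: the data can also be stored as one numeric variable together with a second variable that records the category (experimental group coded 1, control group coded 2):

Values:   5 5 5 13 7 11 11 9 8 9 11 8 4 5 9 5 10 5 4 10
Category: 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

> x = c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9)
> y = c(11, 8, 4, 5, 9, 5, 10, 5, 4, 10)
> boxplot(x, y, names = c("graph1", "graph2"), col = c("red", "blue"), xlab = "kind",
          main = "study times")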

> amount = scan()                      # enter the data by copy and paste
1: 5 5 5 13 7 11 11 9 8 9 11 8 4 5 9 5 10 5 4 10
21:
Read 20 items
> category = scan()
1: 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
21:
Read 20 items
> boxplot(amount ~ category)           # box plots by category
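The formula form works the same way when the category is a labelled factor, which also gives readable axis labels. The sketch below builds the stacked data without scan() and adds a per-group median as a numeric companion to the plot; the group labels and the name `med` are illustrative assumptions.

```r
# Stacked values: first 10 are the experimental group, last 10 the control group
amount <- c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9,
            11, 8, 4, 5, 9, 5, 10, 5, 4, 10)

# Codes 1 and 2 as a labelled factor
category <- factor(rep(1:2, each = 10), labels = c("experimental", "control"))

# One box per level of the factor
boxplot(amount ~ category, col = c("red", "blue"), xlab = "group")

# Median of each group, matching the centre line of each box
med <- tapply(amount, category, median)
print(med)
```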

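3.1 Exercises

1. Students rate their teacher on a scale of 1-5. Suppose the answers to the first 3 questions are given in the table below:

Student  Ques.1  Ques.2  Ques.3
1        3       5       1
2        3       2       3
3        3       5       1
4        4       5       1
5        3       2       1
6        4       2       3
7        3       5       1
8        4       5       1
9        3       4       1
10       4       2       1

   (a) Enter the data using c(), scan(), or read.table().

   (b) Make a frequency table for question 1 and for question 2.

   (c) Make a contingency table of question 1 against question 2.

   (d) Use barplot() to make a stacked bar chart of question 1 and question 3.

   (e) Use barplot() to make a side-by-side bar chart of questions 1, 2 and 3.

2. Refer to the data of exercise 27 on page 22: a survey of the number of occupants in passenger cars and whether they wore seat belts, data file "car.dat".

   (a) Make a contingency table of the number of occupants against seat belt use.

   (b) Use barplot() to make a stacked bar chart of the number of occupants and seat belt use.

   (c) Use barplot() to make a side-by-side bar chart of the number of occupants and seat belt use.

3. Is past performance an indicator of future performance? A common belief is that a student who earns an A in one class will earn an A in the next. Is that so? The data set "grades" (UsingR) contains the grades students received in a math class together with their grades in the previous math class. Make a contingency table of the earlier and later grades and comment on it.

4. A hospital ran a drug trial and measured the following index values for the experimental and control groups:

   Experimental group: 86, 72, 74, 85, 76, 79, 82, 83, 83, 79, 82
   Control group: 81, 77, 63, 75, 69, 86, 81, 60

   Draw side-by-side box plots of the two groups.

5. The following data come from a survey of daily milk volume in nursing mothers who do or do not smoke:

   Smokers: 621, 793, 593, 545, 753, 655, 895, 767, 714, 598, 693
   Non-smokers: 947, 945, 1086, 1202, 973, 981, 930, 745, 903, 899, 961

   (a) Draw side-by-side box plots of the two groups.

   (b) Plot the data with dotchart().

6. Refer to the data of exercise 12 on page 20:

   (a) Make a contingency table of question 1 against question 2.

   (b) Make a bar chart of question 2 and question 3.

   (c) Make a side-by-side bar chart of questions 1, 2 and 3.

7. Wired magazine reported that, as of July 2003, spam accounted for more than $50\%$ of all e-mail and the share was climbing. Users may receive more than 100 e-mails a day, which makes spam a time-consuming and expensive reality. The table lists the volume of spam in commercial e-mail and the total volume of commercial e-mail by year, including some projected figures. Enter the data and recreate the table. Make a segmented bar chart showing the spam volume and the total e-mail volume.

        2000  2001  2002  2003  2004  2005
spam      50   110   225   315   390   450
total    125   210   375   475   590   700

Source: Wired magazine, September 2003 (unit: billions)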
