R for beginners
Emmanuel Paradis
1 What is R?
2 The few things to know before starting
2.1 The operator <-
2.2 Listing and deleting the objects in memory
2.3 The on-line help
3 Data with R
3.1 The ‘objects’
3.2 Reading data from files
3.3 Saving data
3.4 Generating data
3.4.1 Regular sequences
3.4.2 Random sequences
3.5 Manipulating objects
3.5.1 Accessing a particular value of an object
3.5.2 Arithmetics and simple functions
3.5.3 Matrix computation
4 Graphics with R
4.1 Managing graphic windows
4.1.1 Opening several graphic windows
4.1.2 Partitioning a graphic window
4.2 Graphic functions
4.3 Low-level plotting commands
4.4 Graphic parameters
5 Statistical analyses with R
6 The programming language R
6.1 Loops and conditional executions
6.2 Writing your own functions
7 How to go farther with R?
8 Index
The goal of the present document is to give a starting point for people newly interested in R. I have tried to simplify the explanations as much as I could to make them understandable by all, while giving useful details, sometimes with tables. Commands, instructions and examples are written in Courier font.
I thank Julien Claude, Christophe Declercq, Friedrich Leisch and Mathieu Ros for their comments and suggestions on an earlier version of this document. I am also grateful to all the members of the R Development Core Team for their considerable efforts in developing R and animating the discussion list ‘r-help’. Thanks also to the R users whose questions or comments helped me to write “R for beginners”.
© 2000, Emmanuel Paradis (20 October 2000)
1 What is R?
R is a statistical analysis system created by Ross Ihaka & Robert Gentleman (1996, J. Comput. Graph. Stat., 5: 299-314). R is both a language and a piece of software; its most remarkable features are:
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, matrices, and other complex operations,
• a large, coherent, integrated collection of tools for statistical analysis,
• numerous graphical facilities which are particularly flexible, and
• a simple and effective programming language which includes many facilities.
R is a language that can be considered a dialect of the S language, created at AT&T Bell Laboratories. S is available as the software S-PLUS, commercialized by MathSoft (see http://www.splus.mathsoft.com/ for more information). There are important differences in the design of R and S, but they are not of interest to us here: those who want to know more on this point can read the paper by Ihaka & Gentleman (1996) or the R-FAQ (http://cran.r-project.org/doc/FAQ/R-FAQ.html), a copy of which is also distributed with the software.
R is freely distributed under the terms of the GNU General Public License of the Free Software Foundation (for more information: http://www.gnu.org/). Its development and distribution are carried out by several statisticians known as the R Development Core Team. A key element in this development is the Comprehensive R Archive Network (CRAN).
R is available in several forms. The sources, written in C (with some routines in Fortran 77), can be compiled, essentially on Unix and Linux machines. Binaries are also available, ready for use after a very easy installation, for the systems listed in the following table.
Architecture     Operating system(s)
Intel            Windows 95/98/NT4.0/2000
                 Linux (Debian 2.2, Mandrake 7.1, Red Hat 6.x, SuSE 5.3/6.4/7.0)
PPC              MacOS
                 LinuxPPC 5.0
Alpha Systems    Digital Unix 4.0
                 Linux (Red Hat 6.x)
Sparc            Linux (Red Hat 6.x)
The files to install these binaries are at http://cran.r-project.org/bin/ (except for the Macintosh version¹), where you can also find the installation instructions for each operating system.
R is a language with many functions for statistical analyses and graphics. Graphics are displayed immediately in their own window and can be saved in various formats (for example, jpg, png, bmp, eps, or wmf under Windows; ps, bmp, or pictex under Unix). The results of a statistical analysis can be displayed on the screen, and some intermediate results (P-values, regression coefficients) can be written to a file or used in subsequent analyses. The R language allows the user, for instance, to program loops of commands to analyse several data sets in succession. It is also possible to combine different statistical functions in a single program to perform more complex analyses. R users may benefit from a large number of routines written for S and available on the Internet (for example, http://stat.cmu.edu/S/); most of these routines can be used directly with R.

¹ The Macintosh port of R has just been finished by Stefano Iacus <jago@mclink.it>, and should be available soon on CRAN.
At first, R may seem too complex for a non-specialist (for instance, a biologist). This may actually not be true. In fact, a prominent feature of R is its flexibility. Classical software (SAS, SPSS, Statistica, ...) displays (almost) all the results of an analysis. By contrast, R stores these results in an object, so an analysis can be done without displaying any result. The user may be surprised by this, but such a feature is very useful. Indeed, the user can extract only the part of the results that is of interest. For example, if one runs a series of 20 regressions and wants to compare the different regression coefficients, R can display only the estimated coefficients. The results will then take 20 lines, whereas classical software could well open 20 result windows. One could cite many other examples illustrating the superiority of a system such as R over classical software; I hope the reader will be convinced of this after reading this document.