# R for beginners

The goal of the present document is to give a starting point for people newly interested in R. I tried to simplify the explanations as much as I could to make them understandable by all, while giving useful details, sometimes with tables. Commands, instructions and examples are written in Courier font.

I t hank JB 7ulien Clau7 de, Christ op/ he Declercq, Friedrich Leisch and Mat hieu7 Ros for t heir comment s and su7 ggest ions on an earlier v? ersion of this docu7 ment. I am also grat efu7 l t o all the memb ers of t he R Dev? elop/ ment Core Team for t heir considerab le effort s in dev? elop/ ingRand animat ing t he discu7 ssion list ‘r-help/ ’. Thanks also t o t he R u7 sers whose qu7 est ions or comment s help/ ed me to writ e “R for b eginners”.

R is a statistical analy6 sis sy6 st em created b y6 Ross Ihaka & Rob ert Gent leman (1996, JI . ComJ Kput. GraL KphM . SN taL t., 5: 299-314). Ris b ot h a langu7 age and a software; it s most remarkab le feat7ures are:

• an effective data handling and storage facility,

• a suite of operators for calculations on arrays, matrices, and other complex operations,

• a large, coherent, integrated collection of tools for statistical analysis,

• numerous graphical facilities which are particularly flexible, and

• a simple and effective programming language which includes many facilities.
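To give a feel for the operators on arrays and matrices mentioned in the list above, here is a minimal sketch. The matrix `m` and the vector `v` are made-up values chosen only for illustration; the functions used (`matrix`, `t`, `solve`) and the `%*%` operator are part of base R.

```r
# A 2 x 2 matrix (filled column by column) and a vector; illustrative values only
m <- matrix(c(2, 0, 1, 3), nrow = 2)
v <- c(1, 2)

doubled   <- m * 2      # element-wise multiplication by a scalar
product   <- m %*% v    # matrix product (a 2 x 1 matrix)
transpose <- t(m)       # transpose of m
inverse   <- solve(m)   # inverse of m

product
```

Each result is stored in an object and is only shown when its name is typed, as with `product` on the last line.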

R is a l4 an5 gu7 age con5 si der. ed as a di al4 ect of< the l4an5 gu7 age S cr. eat ed b y6 the AP =T&T= QBel44l L

R ab or. ator. ies. S i s av? ai l4 ab l4 e as t h e sof< twar. e S-PD LR US S com> m> er. ci al4 i zed b y6 MT at h Sof< t (see h



t

 t p/ :/U /U www.sp/ l4 u7 s.m> at h sof< t .com> /U f< or. m> or. e in5 f< or. m> at i on5 ). T= h er. e ar. e i m> p/ or. tan5 ts di f< f< er. en5 ces i n5 t

 he concep/ t ions of R and S, b



u

7 t t hey6 are not of int erest t o u7 s here: t hose who want t o know more on t his p/ oint can read the p/ ap/ er b 6y Gent leman & Ihaka (1996) or t he R-FAQ (htt/p:/UU/cran.r-p/ rojV ect .org/Udoc/UFAQ/UR-FAQ.ht ml), a cop/ y6 of which is alse dist rib 7uted wit h t he soft ware.

R is f< .reel4y6 di st.rib7uted on5 th e t er. >ms of< the GNW US DP7ub4lic LR icen5 ce of< the FX .ree Sof<twar. e F

X

ou7 5ndation5 (f< or. >mor. e i5n<for. >mation5 : h tt/p:/U/Uwww.gn5 7u.or. g/U ); its dev? el4 op/ >men5 tan5 d di st.rib7ution5 ar. e carried on b



y

6 sev? eral st at ist icians known as t he R Dev? elop/ ment Core Team. A key6 -element in t

 his dev? elop/ ment is t he Comp/ rehensiv? e R Archiv? e NW et work (CRANW ).

R is available in several forms: the sources written in C (and some routines in Fortran77) ready to be compiled, essentially for Unix and Linux machines, or some binaries ready for use (after a very easy installation) according to the following table.

| Architecture | Operating system(s) |
|---|---|
| Intel | Windows 95/98/NT 4.0/2000 |
| Intel | Linux (Debian 2.2, Mandrake 7.1, Red Hat 6.x, SuSe 5.3/6.4/7.0) |
| PPC | MacOS |
| PPC | LinuxPPC 5.0 |
| Alpha Systems | Digital Unix 4.0 |
| Alpha Systems | Linux (Red Hat 6.x) |
| Sparc | Linux (Red Hat 6.x) |

The files to install these binaries are at http://cran.r-project.org/bin/ (except for the Macintosh version [1]) where you can find the installation instructions for each operating system as well.

[1] The Macintosh port of R has just been finished by Stefano Iacus <jago@mclink.it>, and should be available soon on CRAN.

R is a language with many functions for statistical analyses and graphics; the latter are visualized immediately in their own window and can be saved in various formats (for example, jpg, png, bmp, eps, or wmf under Windows, ps, bmp, pictex under Unix). The results from a statistical analysis can be displayed on the screen; some intermediate results (P-values, regression coefficients) can be written in a file or used in subsequent analyses. The R language allows the user, for instance, to program loops of commands to successively analyse several data sets. It is also possible to combine in a single program different statistical functions to perform more complex analyses. R users can benefit from a large number of routines written for S and available on the Internet (for example: http://stat.cmu.edu/S/); most of these routines can be used directly with R.

At first, R could seem too complex for a non-specialist (for instance, a biologist). This may not actually be true. In fact, a prominent feature of R is its flexibility. Whereas classical software (SAS, SPSS, Statistica, ...) displays (almost) all the results of an analysis, R stores these results in an object, so that an analysis can be done with no result displayed. The user may be surprised by this, but such a feature is very useful. Indeed, the user can extract only the part of the results which is of interest. For example, if one runs a series of 20 regressions and wants to compare the different regression coefficients, R can display only the estimated coefficients: thus the results will take 20 lines, whereas classical software could well open 20 results windows. One could cite many other examples illustrating the superiority of a system such as R compared to classical software; I hope the reader will be convinced of this after reading this document.
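The example of the 20 regressions can be sketched as follows. The data are simulated here purely for illustration, and the names `fits` and `coefs` are hypothetical; `lm` and `coef` are standard R functions for fitting a linear model and extracting its coefficients.

```r
set.seed(42)   # simulated data, for illustration only

# Run 20 regressions; each result is stored in an object and nothing is displayed
fits <- lapply(seq_len(20), function(i) {
  x <- rnorm(30)
  y <- 1 + 2 * x + rnorm(30)
  lm(y ~ x)
})

# Extract only the estimated coefficients: one row per regression
coefs <- t(sapply(fits, coef))
coefs
```

Typing `coefs` prints a table with 20 rows, one per regression, with the intercept and the slope in two columns. The full results remain available in `fits` if more detail is needed, for instance with `summary(fits[[1]])`.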

