
R Data Visualization


Academic year: 2021


http://www.hmwu.idv.tw


Han-Ming Wu (吳漢銘), Department of Statistics, National Chengchi University

R Data Visualization

ggplot2

Statistical graphics package

E03


What is ggplot2

High-level graphics system developed by Hadley Wickham.

Implements grammar of graphics from Leland Wilkinson.

Streamlines many graphics workflows for complex plots.

Syntax centered around main ggplot function.

Simpler qplot function provides many shortcuts.


The principle of a ggplot

A plot is built from three components:

Plot = data + aesthetics + geometry

data: a data frame (the dataset).

aesthetics:

• indicate the x and y variables,

• tell R how the data are displayed in the plot (e.g. color, size, and shape of points, height of bars).

geometry: the type of graphic (bar chart, histogram, box plot, line plot, density plot, dot plot, etc.)


General ggplot syntax

Other important parts of a plot:

Faceting applies the same type of graph to each subset of the data (e.g., for the variable gender, one graph for male and one for female).

Annotation lets you add text to the plot.

Summary statistics let you add descriptive statistics to a plot.

Scales control how data values are mapped to aesthetics, including the x and y axis limits.

> install.packages("ggplot2")

> library(ggplot2)

> ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)

General ggplot syntax

ggplot(data, aes(...)) + geom() + ... + stat() + ...
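As a rough sketch of how the optional parts listed above combine with the basic data + aesthetics + geometry recipe, the following adds a summary statistic, faceting, an annotation, and axis limits to one iris scatter plot. The annotation position and the axis limits are arbitrary illustration values, not taken from the slides.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +                                      # geometry
  stat_summary(fun = mean, geom = "point",            # summary statistic:
               color = "red", size = 3) +             # mean of y at each x value
  facet_grid(. ~ Species) +                           # faceting: one panel per species
  annotate("text", x = 7, y = 4.4, label = "iris") +  # annotation (repeated in each panel)
  scale_x_continuous(limits = c(4, 8)) +              # scales: axis limits
  scale_y_continuous(limits = c(2, 4.5))
p
```

Each `+` adds one more component to the plot object, so any of these lines can be dropped without affecting the others.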


Why ggplot2?

More elegant & compact code than base graphics.

More aesthetically pleasing than base graphics.

Very powerful for exploratory analysis.

Supports a continuum of expertise.

Easy to get started, plenty of power for complex figures.

Publication-quality figures.

Excellent themes can be created with a single command.

Its default colors are more attractive than those of base graphics.

Easy to visualize data with multiple variables.

Provides a platform for creating simple graphs that convey a wealth of information.

• Manual: http://had.co.nz/ggplot2/

• Introduction: http://www.ling.upenn.edu/~joseff/rstudy/summer2010_ggplot2_intro.html

• Book: http://had.co.nz/ggplot2/book/

• R Graphics Cookbook : http://www.cookbook-r.com/Graphs/


The R Graph Gallery

https://www.r-graph-gallery.com/index.html


Data Visualization with ggplot2: Cheat Sheet

https://www.rstudio.com/resources/cheatsheets/


Data Visualization with ggplot2: Cheat Sheet (continued)

ggplot2: Basics

ggplot2 extensions

https://exts.ggplot2.tidyverse.org/gallery/


Index plot

library(ggplot2)
library(grid)
library(gridExtra)

# Plot column x of iris against the observation index, colored by species
iris.index.plot <- function(x){
  ggplot(iris, aes(x=1:nrow(iris), y=iris[,x], color=Species)) +
    geom_point() +
    labs(x="index", y=names(iris)[x])
}
index.list <- lapply(1:4, iris.index.plot)
marrangeGrob(index.list, nrow=2, ncol=2, top="")

ggplot(iris, aes(x=1:150, y=Sepal.Length, color=Species)) + geom_point()

> # select numerical variables

> mydata <- infert

> dim(mydata)

> head(mydata)

> str(mydata)

> id <- sapply(mydata, is.numeric)

> id

> mydata.numeric <- mydata[, id]

> dim(mydata.numeric)

> head(mydata.numeric)

# is.integer, is.double,

# is.logical, is.character,

# is.POSIXt, is.POSIXlt,

# is.POSIXct


Box plots

library(ggplot2)

p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot()

p

p + coord_flip() # Rotate the box plot

# Notched box plot

ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot(notch=TRUE)

ggplot(iris, aes(x=Species, y=Sepal.Length)) +

geom_boxplot(outlier.colour="red", outlier.shape=8, outlier.size=3)

geom_boxplot understands the following aesthetics. Required: x, lower, upper, middle, ymin, ymax. Optional: alpha, colour, fill, group, linetype, shape, size, weight.


Box plot with dots

# stat_summary(): add mean points to a box plot
p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) +
  geom_boxplot()
p
p + stat_summary(fun=mean, geom="point", shape=2, size=4, col="red")
p + geom_dotplot(binaxis="y", stackdir="center")


Change box plot line colors

p <- ggplot(iris, aes(x=Species, y=Sepal.Length, color=Species)) + geom_boxplot()

p

# Use custom color palettes

p + scale_color_manual(values=c("orange", "purple", "darkgreen"))

# Use brewer color palettes

p + scale_color_brewer(palette="Set2")

> library(RColorBrewer)

> display.brewer.all()

# turn off legends

ggplot(iris, aes(x=Species, y=Sepal.Length, color=Species)) + geom_boxplot(show.legend = FALSE)

# remove the legend after the plot is created
p + theme(legend.position = "none")


Change box plot fill colors

p <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_boxplot()

p

p + scale_fill_manual(values=c("orange", "purple", "darkgreen"))
p + scale_fill_brewer(palette="Set2")


Change the legend position

p <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_boxplot()

p

p + theme(legend.position="top")
p + theme(legend.position="bottom")
p + theme(legend.position="none")

Allowed values for legend.position: "left", "top", "right", "bottom", and "none" (which hides the legend).


Change the order of items in the legend

> levels(iris$Species)

[1] "setosa" "versicolor" "virginica"

p <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_boxplot()

p

p + scale_x_discrete(limits=c("setosa", "versicolor"))

p + scale_x_discrete(limits=c("versicolor", "setosa", "virginica"))
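Note that `limits` on the x scale reorders (and, when a level is left out, drops) the groups on the axis, while the legend follows the fill scale. A possible alternative, sketched here under the assumption that changing the factor itself is acceptable, is to reorder the factor levels in a copy of the data so that the axis and the legend agree. The name `iris2` is only illustrative.

```r
library(ggplot2)

# Work on a copy so the built-in iris data set is left unchanged.
iris2 <- iris
iris2$Species <- factor(iris2$Species,
                        levels = c("versicolor", "setosa", "virginica"))

# Axis and legend now both follow the new level order, and no rows are dropped.
p2 <- ggplot(iris2, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot()
p2
```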


Box plot with multiple groups

> select.id <- sample(1:150, 50)

> if.selected <- 1:150 %in% select.id

> iris.sel <- cbind(iris, if.selected)

> head(iris.sel)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species if.selected
1          5.1         3.5          1.4         0.2  setosa        TRUE
2          4.9         3.0          1.4         0.2  setosa       FALSE
3          4.7         3.2          1.3         0.2  setosa       FALSE
4          4.6         3.1          1.5         0.2  setosa       FALSE
5          5.0         3.6          1.4         0.2  setosa        TRUE
6          5.4         3.9          1.7         0.4  setosa        TRUE
>
> p <- ggplot(iris.sel, aes(x=Species, y=Sepal.Length, fill=if.selected)) +
+   geom_boxplot() +
+   labs(title="iris data with selected IDs")

> p


Violin plots

ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_violin()

p <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_violin(trim=FALSE)

p

p + stat_summary(fun.data=mean_sdl, fun.args = list(mult = 1), geom="pointrange", color="purple")


Dot plots

ggplot(iris, aes(x=Species, y=Petal.Length)) + geom_dotplot(binaxis="y")

ggplot(iris, aes(x=Species, y=Petal.Length)) +
  geom_dotplot(binaxis="y", stackdir="center", stackratio=1.5, dotsize=0.5)

ggplot(iris, aes(x=Species, y=Petal.Length)) +
  geom_boxplot() +
  geom_dotplot(binaxis="y", stackdir="center", dotsize=0.5)

ggplot(iris, aes(x=Species, y=Petal.Length)) +
  geom_violin(trim = FALSE) +
  geom_dotplot(binaxis="y", stackdir="center", dotsize=0.5, colour="red")


Dot plots with summary statistics

p <- ggplot(iris, aes(x=Species, y=Petal.Length)) +
  geom_dotplot(binaxis="y", stackdir="center", dotsize=0.5)
p
p + stat_summary(fun=mean, geom="point", shape=17, size=3, color="red")

# Mean plus/minus a standard deviations, returned as y, ymin, ymax for fun.data
my_summary <- function(x, a=1) {
  m <- mean(x)
  ymin <- m - a * sd(x)
  ymax <- m + a * sd(x)
  c(y=m, ymin=ymin, ymax=ymax)
}
p + stat_summary(fun.data=my_summary, color="red")
p + stat_summary(fun.data=my_summary, color="blue", fun.args=list(a=2))


Histogram

> # install.packages("gridExtra")

> library(gridExtra)

> h1 <- ggplot(data=iris, aes(x=Sepal.Length)) + geom_histogram()

> h2 <- ggplot(data=iris, aes(x=Sepal.Length)) + geom_histogram(binwidth=1)

> h3 <- ggplot(data=iris, aes(x=Sepal.Length)) +
+   geom_histogram(color="black", fill="blue", bins = 10)

> h4 <- ggplot(data=iris, aes(x=Sepal.Length, color=Species)) + geom_histogram(binwidth = 1)

> grid.arrange(h1, h2, h3, h4, nrow=1, ncol=4)


Histogram

> p <- ggplot(data=iris, aes(x=Sepal.Length))

> p <- p + geom_histogram()

> p + facet_grid(Species~.)  # one row per species
> p + facet_grid(.~Species)  # one column per species


Histogram

> library(gridExtra)

> sl <- ggplot(iris, aes(x=Sepal.Length, fill=Species)) + geom_histogram(binwidth = 0.1)

> sw <- ggplot(iris, aes(x=Sepal.Width, fill=Species)) + geom_histogram(binwidth = 0.1)

> pl <- ggplot(iris, aes(x=Petal.Length, fill=Species)) + geom_histogram(binwidth = 0.1)

> pw <- ggplot(iris, aes(x=Petal.Width, fill=Species)) + geom_histogram(binwidth = 0.1)

> grid.arrange(sl, sw, pl, pw, nrow = 2)


Histogram

library(grid)
library(gridExtra)  # provides marrangeGrob()

# Histogram of column x of iris, filled by species
iris.hist <- function(x){
  ggplot(iris, aes(x=iris[,x], fill=Species)) +
    geom_histogram(binwidth = 0.1) +
    xlab(names(iris)[x])
}
hist.list <- lapply(1:4, iris.hist)
marrangeGrob(hist.list, nrow=2, ncol=2, top="")


Density plots

p1 <- ggplot(iris, aes(x=Sepal.Length)) + geom_density()

p2 <- ggplot(iris, aes(x=Sepal.Length, color=Species)) + geom_density()

p3 <- ggplot(iris, aes(x=Sepal.Length, fill=Species)) + geom_density()

# Add mean line

p1 + geom_vline(aes(xintercept=mean(Sepal.Length)), color="blue", linetype="dashed", size=1)

# Mean of each species
mu <- tapply(iris$Sepal.Length, iris$Species, mean)
mu.df <- data.frame(Sp=names(mu), grp.mean=mu)
head(mu.df)

p2 + geom_vline(data=mu.df, aes(xintercept=grp.mean, color=Sp), linetype="dashed")


Change line color, line type and fill color

# Change line color, line type and fill color
ggplot(iris, aes(x=Sepal.Length)) +
  geom_density(color="darkblue", fill="lightblue", linetype="dashed")

# Use semi-transparent fill

ggplot(iris, aes(x=Sepal.Length, fill=Species)) + geom_density(alpha=0.4)


Combine histogram and density plots

Create the histogram on a density scale using the computed variable ..density..

# Histogram with density plot
ggplot(iris, aes(x=Sepal.Length)) +
  geom_histogram(aes(y=..density..), colour="black", fill="lightblue") +
  geom_density(alpha=0.2, fill="red")

# Color by groups
phd <- ggplot(iris, aes(x=Sepal.Length, color=Species, fill=Species)) +
  geom_histogram(aes(y=..density..), alpha=0.5, position="identity") +
  geom_density(alpha=.2)
phd

# facets: split the plot into multiple panels
phd + facet_grid(Species ~ .)
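In recent ggplot2 releases the `..density..` notation is deprecated in favor of `after_stat(density)`. The sketch below assumes ggplot2 3.3.0 or later and uses an arbitrary `bins = 20`. It also checks what "density scale" means: the areas of the bars add up to 1, which is why the density curve can be overlaid directly.

```r
library(ggplot2)

# after_stat(density) is the newer spelling of ..density..
# (assumes ggplot2 >= 3.3.0).
hd <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 colour = "black", fill = "lightblue") +
  geom_density(alpha = 0.2, fill = "red")
hd

# On the density scale the bar areas (height x width) sum to 1.
bars <- layer_data(hd, 1)
sum(bars$density * (bars$xmax - bars$xmin))
```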


mtcars {datasets}

mtcars {datasets}: Motor Trend Car Road Tests (1974 Motor Trend US). A data frame of 32 cars with 11 attributes.

• mpg : Miles/(US) gallon (fuel economy)
• cyl : Number of cylinders
• disp: Displacement (cu. in., cubic inches)
• hp  : Gross horsepower
• drat: Rear axle ratio
• wt  : Weight (1000 lbs)
• qsec: 1/4 mile time (seconds)
• vs  : Engine shape (0 = V-shaped, 1 = straight)
• am  : Transmission (0 = automatic, 1 = manual)
• gear: Number of forward gears
• carb: Number of carburetors

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1


Bar plot (bar chart)

> # mtcars$cyl <- as.factor(mtcars$cyl)
> p <- ggplot(mtcars, aes(x=cyl)) +
+   geom_bar() +
+   labs(x="Number of cylinders (cyl)", y="Number of cars")
>
> p + coord_flip()
>
> cyl.df <- data.frame(table(mtcars$cyl))
> cyl.df
  Var1 Freq
1    4   11
2    6    7
3    8   14
> names(cyl.df) <- c("cyl", "count")
> cyl.df
  cyl count
1   4    11
2   6     7
3   8    14
>
> ggplot(cyl.df, aes(x=cyl, y=count)) +
+   geom_bar(stat="identity") +
+   labs(x="Number of cylinders (cyl)", y="Number of cars")

ggtitle("Main title"): adds a main title above the plot
xlab("X axis label"): changes the x axis label
ylab("Y axis label"): changes the y axis label
labs(title="Main title", x="X axis label", y="Y axis label"): changes the main title and axis labels


Bar plot

ggplot(cyl.df, aes(x=cyl, y=count)) +
  geom_bar(stat="identity", width=0.5, color="black", fill="steelblue")

p <- ggplot(cyl.df, aes(x=cyl, y=count, fill=cyl)) +
  geom_bar(stat="identity", width=0.5)
p
p + geom_text(aes(label=count), vjust=-0.3, size=4)
p + geom_text(aes(label=count), vjust=1.6, color="white", size=4)
p + scale_fill_manual(values=c("#999000", "#E69F99", "#56B4E9"))
p + scale_fill_brewer(palette="Dark2")
p + scale_fill_grey()


Bar plot

> iris.mean <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=mean)

> iris.mean

     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
> mydata <- cbind(stack(iris.mean[,-1]), Species = iris.mean$Species)
> mydata
   values          ind    Species
1   5.006 Sepal.Length     setosa
2   5.936 Sepal.Length versicolor
3   6.588 Sepal.Length  virginica
4   3.428  Sepal.Width     setosa
5   2.770  Sepal.Width versicolor
6   2.974  Sepal.Width  virginica
7   1.462 Petal.Length     setosa
8   4.260 Petal.Length versicolor
9   5.552 Petal.Length  virginica
10  0.246  Petal.Width     setosa
11  1.326  Petal.Width versicolor
12  2.026  Petal.Width  virginica
> ggplot(mydata, aes(x=ind, y=values, fill = Species)) +
+   geom_bar(stat="identity")

position_dodge for creating side-by-side barcharts.

Other position adjustments:

position_identity, position_jitterdodge, position_jitter, position_nudge, position_stack
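To make the effect of a position adjustment concrete, the sketch below rebuilds the table of group means and compares where the bars start under `position_stack()` and `position_dodge()`. `layer_data()` returns the data a layer actually draws. The object names `base`, `p.stack`, and `p.dodge` are only illustrative.

```r
library(ggplot2)

# Rebuild the long-format table of group means used on this slide.
iris.mean <- aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = mean)
mydata <- cbind(stack(iris.mean[, -1]), Species = iris.mean$Species)

base <- ggplot(mydata, aes(x = ind, y = values, fill = Species))
p.stack <- base + geom_bar(stat = "identity", position = position_stack())
p.dodge <- base + geom_bar(stat = "identity", position = position_dodge())

# Stacked bars start where the previous bar ends; dodged bars all start at 0.
range(layer_data(p.stack, 1)$ymin)
range(layer_data(p.dodge, 1)$ymin)
```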


Bar plot

ggplot(mydata, aes(x=ind, y=values, fill=Species)) +
  geom_bar(stat="identity", position="dodge") +
  geom_text(aes(label=values), vjust=1.4, color="white", position = position_dodge(0.9)) +
  labs(x="", y="mean")


Bar plot

> library(plyr)

> mydata.sorted <- arrange(mydata, ind, Species)

> mydata.sorted

   values          ind    Species
1   5.006 Sepal.Length     setosa
2   5.936 Sepal.Length versicolor
3   6.588 Sepal.Length  virginica
...
10  0.246  Petal.Width     setosa
11  1.326  Petal.Width versicolor
12  2.026  Petal.Width  virginica
> mydata.sorted$Species <- factor(mydata.sorted$Species, levels=rev(levels(mydata$Species)))
> mydata.cumsum <- ddply(mydata.sorted, "ind",
+                        transform, label.ypos=cumsum(values))
> mydata.cumsum
   values          ind    Species label.ypos
1   5.006 Sepal.Length     setosa      5.006
2   5.936 Sepal.Length versicolor     10.942
3   6.588 Sepal.Length  virginica     17.530
...
10  0.246  Petal.Width     setosa      0.246
11  1.326  Petal.Width versicolor      1.572
12  2.026  Petal.Width  virginica      3.598
>
> ggplot(mydata.cumsum, aes(x=ind, y=values, fill=Species)) +
+   geom_bar(stat="identity") +
+   geom_text(aes(y=label.ypos, label=values), vjust=1.6, color="white")
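The cumulative sums above place each label at the top of its segment. A possible shortcut, shown here as a sketch rather than as the slide's method, is to let `position_stack(vjust = 0.5)` stack the labels the same way the bars are stacked, which centers each label in its segment without plyr. Rounding the labels to two decimals is an arbitrary choice.

```r
library(ggplot2)

# Rebuild the long-format table of group means used on this slide.
iris.mean <- aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = mean)
mydata <- cbind(stack(iris.mean[, -1]), Species = iris.mean$Species)

# position_stack(vjust = 0.5) centers each label inside its own segment,
# so no cumulative sum has to be computed by hand.
pb <- ggplot(mydata, aes(x = ind, y = values, fill = Species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(values, 2)),
            position = position_stack(vjust = 0.5), color = "white")
pb
```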


Line plot

> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
> airquality$Month <- factor(airquality$Month)
> ggplot(airquality, aes(x=Day, y=Temp, group=Month, color=Month)) +
+   geom_line(aes(linetype=Month)) +
+   geom_point()


Line plot

> sales <- data.frame(
+   date = seq(Sys.Date(), length.out=100, by="1 day")[sample(100, 50)],
+   price = floor(rnorm(50, mean=100, sd=20))
+ )
> sales <- sales[order(sales$date), ]
> head(sales)
         date price
28 2020-08-10    73
27 2020-08-11    76
1  2020-08-12    95
22 2020-08-13   109
8  2020-08-16   107
18 2020-08-19   105

lp <- ggplot(data=sales, aes(x=date, y=price)) + geom_line()
lp
lp + scale_x_date(date_labels="%m/%d")


Line plot

library(scales)  # provides date_breaks()
lp + scale_x_date(breaks = date_breaks("1 week")) +
  theme(axis.text.x = element_text(angle=45))

range(sales$date)
# "2020-08-10" "2020-11-07"

amin <- as.Date("2020-09-01")
amax <- as.Date("2020-10-31")
lp + scale_x_date(limits = c(amin, amax))


Line plot

> mydata <- as.data.frame(matrix(rnorm(100), ncol=4))

> library(reshape2)

> head(mydata, 3)

         V1         V2          V3          V4
1 0.2846997 0.78283129 0.659318307 -0.04318195
2 1.2510186 1.59782230 0.187074422 -1.23161275
3 0.3827782 1.44676700 0.840148794 -0.20081868
> # id variable for position in matrix
> mydata$id <- 1:nrow(mydata)
> # reshape to long format
> mydata.lf <- melt(mydata, id.var="id")
> head(mydata.lf)
  id variable      value
1  1       V1  0.2846997
2  2       V1  1.2510186
3  3       V1  0.3827782
4  4       V1 -3.2994010
5  5       V1  1.4943630
6  6       V1  0.1557203
> tail(mydata.lf)
    id variable      value
95  20       V4  1.1655219
96  21       V4  0.5081844
97  22       V4 -1.2523577
98  23       V4 -2.3553732
99  24       V4  0.1542803
100 25       V4  0.6899416
> ggplot(mydata.lf, aes(x=id, y=value, group=variable, colour=variable)) +
+   geom_point() +
+   geom_line(aes(lty=variable))
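The same wide-to-long reshaping can also be sketched with `tidyr::pivot_longer()`, for readers who prefer tidyr to reshape2. This is an alternative and not what the slide uses. The seed is arbitrary, the rows come out ordered by id rather than by variable, and `variable` is a character column instead of a factor.

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
mydata <- as.data.frame(matrix(rnorm(100), ncol = 4))
mydata$id <- 1:nrow(mydata)

# Long format: one row per (id, variable) pair.
mydata.lf <- pivot_longer(mydata, cols = -id,
                          names_to = "variable", values_to = "value")
dim(mydata.lf)
head(mydata.lf)
```

The resulting table can be passed to the same `ggplot()` call as the melted version, since it has the same three columns.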


Adding straight lines and a regression line to a plot

pp <- ggplot(airquality, aes(x=Temp, y=Wind)) + geom_point()
pp

pp + geom_hline(yintercept=mean(airquality$Wind), linetype="dashed", color="red", size=2) +
  geom_vline(xintercept=mean(airquality$Temp), linetype="dotted", color="blue", size=2)

# Least-squares fit of Wind on Temp
beta <- lm(Wind ~ Temp, data=airquality)$coefficients
pp + geom_abline(intercept=beta[1], slope=beta[2], color="red", size=2) +
  ggtitle(paste0("y = ", round(beta[1], 2), " + ", round(beta[2], 2), " x"))


Adding line segments and arrows to a plot

pp + geom_segment(aes(x=70, y=5, xend=90, yend=12), color="red") +
  geom_segment(aes(x=65, y=17, xend=61.5, yend=19.5), color="blue",
               arrow = arrow(length = unit(0.5, "cm"))) +
  geom_point(aes(x=60, y=5), color="darkgreen", shape=13, size=4)

xy.df <- data.frame(x1=airquality$Temp[1:3], y1=airquality$Wind[1:3],
                    x2=airquality$Temp[4:6], y2=airquality$Wind[4:6])
pp + geom_segment(data=xy.df, aes(x=x1, y=y1, xend=x2, yend=y2), color=2:4, size=2)

?geom_curve


Scatterplot

> ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, shape=Species, color=Species)) + geom_point()

> p <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, shape=Species, color=Species))

> p <- p + geom_point()

> p

> p + geom_line(aes(y=Sepal.Width))


Scatterplot: axes

> p <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) +
+   geom_point()

> p

> p + coord_fixed(ratio=1)

> p + coord_fixed(ratio=0.5)

> p + coord_fixed(ratio=5)


Scatterplot: axes

# Change x and y axis limits

gp <- ggplot(airquality, aes(x=Temp, y=Wind)) + geom_point()

gp

gp + xlim(50, 100) + ylim(0, 25)

gp + expand_limits(x=c(50, 100), y=c(0, 25))

# Axis transformations

# trans: "log2", "log10", "sqrt"

gp + scale_x_continuous(trans="log2") + scale_y_continuous(trans="log2") + labs(x="log2(Temp)", y="log2(Wind)")

gp + coord_trans(x="log2", y="log2")
gp + scale_y_sqrt() # square root

gp + scale_y_reverse() # Reverse coordinates


Scatterplot: axes

library(scales)  # provides percent, dollar, scientific
gp + scale_y_continuous(labels = percent)
gp + scale_y_continuous(labels = dollar)
gp + scale_y_continuous(labels = scientific)

# The same formatters without attaching scales
gp + scale_y_continuous(labels=scales::percent_format())
gp + scale_y_continuous(labels=scales::dollar_format())
gp + scale_y_continuous(labels=scales::scientific_format())


Customize the appearance of the main title and axis labels

theme(plot.title = element_text(family, face, colour, size),
      axis.title.x = element_text(family, face, colour, size),
      axis.title.y = element_text(family, face, colour, size))

• family: font family
• face: font face. Possible values are "plain", "italic", "bold" and "bold.italic"

p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) +
  geom_boxplot() +
  ggtitle("Iris Dataset box plot") +
  xlab("Species") +
  ylab("Sepal length (Sepal.Length)")
p

p + theme(plot.title=element_text(color="red", size=20, face="bold.italic"),
          axis.title.x=element_text(color="blue", size=14, face="bold"),
          axis.title.y=element_text(color="darkgreen", size=14, face="bold"))

# Hide the main title and axis titles
p + theme(plot.title=element_blank(),
          axis.title.x=element_blank(),
          axis.title.y=element_blank())


Scatterplot: titles and point shapes

mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)

ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point() +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)")

ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point(size=2, color="blue", shape=3) +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)")

ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point(aes(size=qsec), color="darkgreen") +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)", size="1/4 mile time (qsec)",
       title="Motor Trend Car Road Tests dataset (mtcars)")

ggtitle("the main title")
xlab("the x axis label")
ylab("the y axis label")
labs(x="x label", y="y label", title="main title", fill="legend title")


Scatterplot: point shape, color, and size

# mtcars$cyl <- as.factor(mtcars$cyl)
ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl)) +
  geom_point() +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)", shape="Number of cylinders (cyl)")

ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl, color=cyl)) +
  geom_point(size=3) +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)",
       shape="Number of cylinders (cyl)", color="Number of cylinders (cyl)")

mtcars$am <- as.factor(mtcars$am)
ggplot(mtcars, aes(x=wt, y=mpg, shape=am, color=cyl, size=hp)) +
  geom_point() +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)", shape="Transmission (am)",
       color="Number of cylinders (cyl)", size="Horsepower (hp)")


Scatterplot: text labels

p <- ggplot(data=mtcars, aes(x=wt, y=mpg, label=rownames(mtcars))) +
  geom_point() +
  geom_text(size=3) +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)")
p
p + geom_label()

geom_text understands the following aesthetics. Required: x, y, label. Optional: alpha, angle, colour, family, fontface, group, hjust, lineheight, size, vjust.


Add text annotations to a graph

set.seed(123)
id <- sample(1:nrow(mtcars), 10)
mtcars.subset <- mtcars[id, ]
sp <- ggplot(mtcars.subset, aes(x=wt, y=mpg, label=rownames(mtcars.subset))) +
  geom_point()
sp + geom_text()
sp + geom_text(size=3)
sp + geom_text(hjust=0, vjust=0)
# fontface: 1 (normal), 2 (bold), 3 (italic), 4 (bold.italic)
sp + geom_text(aes(fontface=3))
sp + geom_label()


Size of text labels

sp + geom_text(aes(color=factor(cyl)))
sp + geom_text(aes(size=cyl))
sp + geom_text(aes(size=cyl)) + scale_size(range=c(3, 6))


Annotating a plot with text

sp + geom_text(x=3, y=25, label="Scatter plot")

sp + annotate(geom="text", x=3, y=25, label="Scatter plot", color="red")

geom_text() : adds text directly to the plot

geom_label() : draws a rectangle underneath the text, making it easier to read.

annotate() : adding small text annotations at a particular location on the plot

annotation_custom() : Adds static annotations that are the same in every panel

# compare to

sp + geom_text(aes(x=3, y=25), label="Scatter plot")
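One way to see why `annotate()` is usually preferred for a single fixed label is to count how many text items each version draws. The sketch below rebuilds a 10-row subset like the one on the earlier slide (without the row-name labels, which are not needed here). `geom_text()` inherits the 10-row data and draws the label once per row at the same spot, while `annotate()` draws it once. The names `p.text` and `p.annot` are only illustrative.

```r
library(ggplot2)

set.seed(123)
mtcars.subset <- mtcars[sample(1:nrow(mtcars), 10), ]
sp <- ggplot(mtcars.subset, aes(x = wt, y = mpg)) + geom_point()

p.text  <- sp + geom_text(x = 3, y = 25, label = "Scatter plot")
p.annot <- sp + annotate(geom = "text", x = 3, y = 25, label = "Scatter plot")

# Number of text items each version draws (layer 2 is the text layer).
nrow(layer_data(p.text, 2))
nrow(layer_data(p.annot, 2))
```

The overplotted copies in the `geom_text()` version are what can make the label look heavier or jagged compared with the annotated one.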


Non-overlapping text labels

# ggrepel: avoid overlapping text labels
# install.packages("ggrepel")
library(ggrepel)

sp2 <- ggplot(mtcars.subset, aes(x=wt, y=mpg, label=rownames(mtcars.subset))) +
  geom_point(color="red")
sp2 + geom_text(size = 3.5)
sp2 + geom_text_repel(size = 3.5)
sp2 + geom_label_repel(aes(fill=factor(cyl)), color="white", size=3.5) +
  theme(legend.position="bottom")

# Label only a subset of the points
label.sub <- subset(mtcars.subset, wt > 3 & mpg < 20)
sp2 + geom_label_repel(data=label.sub,
                       aes(label=rownames(label.sub), fill=factor(cyl)),
                       color="white", size=3.5) +
  theme(legend.position="bottom")


Pie chart

> carb.df <- data.frame(table(mtcars$carb))
> names(carb.df) <- c("carb", "Freq")
> carb.df
  carb Freq
1    1    7
2    2   10
3    3    3
4    4   10
5    6    1
6    8    1
>
> bar.pt <- ggplot(carb.df, aes(x="", y=Freq, fill=carb)) +
+   geom_bar(width=1, stat="identity") +
+   labs(x="", fill="Number of carburetors (carb)")
> bar.pt
>
> pie <- bar.pt + coord_polar("y", start=0)
> pie
> pie + scale_fill_brewer(palette="Set2")
> pie + scale_fill_grey() + theme_minimal()
>
> mtcars$carb <- factor(mtcars$carb)
> ggplot(mtcars, aes(x=factor(1), fill=carb)) +
+   geom_bar(width = 1) +
+   coord_polar("y")


Pie chart

cyl.df <- data.frame(table(mtcars$cyl))
names(cyl.df) <- c("cyl", "Freq")
cyl.df$Prop <- prop.table(cyl.df$Freq)
cyl.df

p.bar <- ggplot(cyl.df, aes(x=cyl, y=Freq, fill=cyl)) +
  geom_bar(width=1, stat="identity") +
  labs(x="", title="mtcars$cyl", fill="cyl")

p.bar.tmp <- ggplot(cyl.df, aes(x="", y=Freq, fill=cyl)) +
  geom_bar(width=1, stat="identity") +
  labs(x="", title="mtcars$cyl", fill="cyl")

p.pie <- p.bar.tmp + coord_polar("y", start=0) +
  theme_void() +
  geom_text(aes(label = paste0(round(Prop*100), "%")),
            position = position_stack(vjust = 0.5))

library(gridExtra)
grid.arrange(p.bar, p.pie, nrow=2)

Donut chart

https://www.datanovia.com/en/blog/how-to-create-a-pie-chart-in-r-using-ggplot2/


QQplot (quantile-quantile plot)

ggplot(airquality, aes(sample=Wind)) +
  stat_qq() +
  labs(title="QQplot for airquality$Wind")

ggplot(airquality, aes(sample=Wind, shape=Month, color=Month)) +
  stat_qq() +
  labs(title="QQplot for airquality$Wind of each Month")


Empirical Cumulative Distribution Function

ggplot(airquality, aes(x=Wind)) + stat_ecdf(geom = "point")

ggplot(airquality, aes(x=Wind)) +
  stat_ecdf(geom = "step") +
  labs(title="Empirical Cumulative Distribution Function", y = "F(Wind)", x="Wind")


Basic heatmap with ggplot2

library(tidyr)
xdata <- iris[, 1:4]
n <- nrow(xdata)
p <- ncol(xdata)
iris.df <- data.frame(x=rep(1:p, each=n), y=rep(1:n, p), value=gather(xdata)$value)
str(iris.df)

ggplot(iris.df, aes(x=x, y=y, fill=value)) +
  geom_raster() +
  scale_fill_gradient(low="white", high="black", na.value=NA) +
  scale_y_reverse() +
  labs(x="", y="", title="heatmap for iris data")

# The same data drawn with base graphics
image(t(iris[, 1:4])[, nrow(iris[, 1:4]):1])

http://www.hmwu.idv.tw/web/R/E06-hmwu_R-heatmap.pdf


Save a ggplot object

> # print(): print a ggplot to a file

> myplot <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
+   geom_point()
> pdf("myplot.pdf") # or png("myplot.png")
> print(myplot)
> dev.off()
windows
      2
> # ggsave: save the last ggplot
> ggplot(mtcars, aes(wt, mpg)) + geom_point()
> ggsave("myplot.png")
Saving 6.06 x 5.24 in image
>
> # ggsave: save a ggplot object
> ggsave(file="myplot2.pdf", plot=myplot,
+        device="pdf", scale=1.5)
Saving 9.09 x 7.86 in image

> getwd()

> list.files()
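When no size is given, `ggsave()` uses the size of the current graphics device, which is why the messages above report sizes such as 6.06 x 5.24 in. A minimal sketch of a more reproducible call fixes the size and resolution explicitly. The file name, the 6 x 4 inch size, and the 300 dpi value are arbitrary choices, and the file is written to a temporary directory.

```r
library(ggplot2)

myplot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

# Write to a temporary file so the working directory is left untouched.
out.file <- file.path(tempdir(), "myplot_fixed_size.png")
ggsave(out.file, plot = myplot, width = 6, height = 4, units = "in", dpi = 300)
file.exists(out.file)
```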


Colors (1)

ggplot(iris, aes(x=Species, y=Sepal.Length)) +
  geom_boxplot(fill="lightblue", color="darkred")

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point(color="blue")


Colors (2)

h: range of hues to use: [0, 360]

c: chroma (intensity of colour), maximum value varies depending on combination of hue and luminance.

l: luminance (lightness): [0, 100]

bp <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_boxplot()
bp
sp <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)
sp

bp + scale_fill_hue(l=50, c=40)
sp + scale_color_hue(l=30, c=40)
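To inspect the colors behind these hue scales without drawing a plot, the palette function from the scales package can be called directly. This is a sketch that assumes scales is installed (it is a dependency of ggplot2). `hue_pal()` takes the same `h`, `c`, and `l` arguments described above.

```r
library(scales)

# Default ggplot2 discrete colors: hues spread evenly, with c = 100 and l = 65.
default.cols <- hue_pal()(3)
default.cols

# Darker, less saturated variant matching scale_fill_hue(l = 50, c = 40).
muted.cols <- hue_pal(l = 50, c = 40)(3)
muted.cols
```

The resulting hex codes can also be passed to `scale_fill_manual(values = ...)` when the same colors are needed outside a hue scale.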


Colors (3)

bp + scale_fill_manual(values=c("coral", "deeppink", "slateblue2"))
sp + scale_color_manual(values=c("coral", "deeppink", "slateblue2"))

# Use RColorBrewer palettes
bp + scale_fill_brewer(palette="Dark2")
sp + scale_color_brewer(palette="Dark2")


Colors (4)

# Use gray colors; theme_classic() gives a white background
bp + scale_fill_grey() + theme_classic()
sp + scale_color_grey() + theme_classic()

bp + scale_fill_grey(start=0.8, end=0.2) + theme_classic()
sp + scale_color_grey(start=0.8, end=0.2) + theme_classic()


Colors (5)

# Continuous colors
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)

sc <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Petal.Length)) +
  geom_point(size=3)
sc

# Sequential color scheme
sc + scale_color_gradient(low="blue", high="red")

# Diverging color scheme
mid.value <- mean(iris$Petal.Length)
sc + scale_color_gradient2(midpoint=mid.value, low="blue", mid="white", high="red")

# Gradient between n colors
library(fields)  # provides tim.colors()
sc + scale_color_gradientn(colours=tim.colors(10))


ggplot2 themes

library(gridExtra)
p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot()
p1 <- p + theme_gray() + labs(title="gray")
p2 <- p + theme_bw() + labs(title="bw")
p3 <- p + theme_linedraw() + labs(title="linedraw")
p4 <- p + theme_light() + labs(title="light")
p5 <- p + theme_dark() + labs(title="dark")
p6 <- p + theme_minimal() + labs(title="minimal")
p7 <- p + theme_classic() + labs(title="classic")
grid.arrange(p, p1, p2, p3, p4, p5, p6, p7, nrow=2, ncol=4)


ggthemes: Extra Themes, Scales and Geoms for 'ggplot2'

# install.packages("ggthemes")
library(ggthemes)
sp <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point()
sp1 <- sp + theme_tufte() + labs(title="tufte")  # a minimalist theme
sp2 <- sp + theme_economist() + labs(title="economist")
sp3 <- sp + theme_stata() + labs(title="stata")
sp4 <- sp + theme_hc() + labs(title="hc")  # Highcharts JS
sp5 <- sp + theme_wsj() + labs(title="wsj")  # Wall Street Journal
grid.arrange(sp, sp1, sp2, sp3, sp4, sp5, nrow=2, ncol=3)

Themes in ggthemes: theme_base, theme_calc, theme_clean, theme_economist, theme_excel, theme_excel_new, theme_few, theme_fivethirtyeight, theme_foundation, theme_gdocs, theme_hc, theme_igray, theme_map, theme_pander, theme_par, theme_solarized, theme_solid, theme_stata, theme_tufte, theme_wsj.


facet_grid

Rows, columns, and multiple variables: a facet_grid example

ggplot2: Add name of variable used for facet_grid
https://stackoverflow.com/questions/39538226/ggplot2-add-name-of-variable-used-for-facet-grid/39538501

ggplot(mtcars, aes(x=wt, y=mpg, color=as.factor(carb), shape=as.factor(gear))) +
  geom_point(size=3) +
  facet_grid(cyl ~ am, labeller = label_both) +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)",
       color="Number of carburetors", shape="Number of forward gears")


Plot multiple datasets

# Both data frames use the same column names
set.seed(12345)
df.A <- data.frame(x = rnorm(10), y=rnorm(10))
df.B <- data.frame(x = rnorm(10), y=rnorm(10))
ggplot(df.A, aes(x, y)) +
  geom_point() +
  geom_point(data = df.B, color = "red", shape = 2, size = 5)

# The data frames use different column names
set.seed(12345)
df.A <- data.frame(xa = rnorm(10), ya=rnorm(10))
df.B <- data.frame(xb = rnorm(10), yb=rnorm(10))
ggplot(df.A, aes(x = xa, y = ya)) +
  geom_point() +
  geom_point(data = df.B, aes(x = xb, y = yb), color = "red", shape = 2, size = 5)

ggplot() +
  geom_point(data = df.A, aes(x = xa, y = ya)) +
  geom_point(data = df.B, aes(x = xb, y = yb),
             color = "red", shape = 2, size = 5)


Overlaying a line plot and a bar plot

test.df <- data.frame(Day = as.Date(c("2021-07-20", "2021-07-21",
                                      "2021-07-22", "2021-07-23",
                                      "2021-07-24")),
                      Number = c(2, 5, 4, 3, 4),
                      Percentage = c(0.70, 0.50, 0.95, 0.75, 0.3))

# Same axis for both series: the line is squeezed near zero
ggplot(test.df) +
  geom_bar(aes(x = Day, y = Number), stat = "identity") +
  geom_line(aes(x = Day, y = Percentage),
            size = 2, color = "blue")

# Rescale the line by 5 and add a secondary axis that divides by 5
ggplot(test.df) +
  geom_bar(aes(x = Day, y = Number), stat = "identity") +
  geom_line(aes(x = Day, y = Percentage * 5),
            size = 2, color = "blue") +
  scale_y_continuous(sec.axis = sec_axis(~./5,
                                         name = "Percentage"))
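The factor 5 above is hard-coded. One possible way to derive it from the data, sketched here as an assumption rather than the slide's method, is to take the ratio of the two maxima so that the highest point of the line reaches the top of the tallest bar. The secondary axis then applies the inverse transformation. The names `k` and `p2` are only illustrative.

```r
library(ggplot2)

test.df <- data.frame(
  Day = as.Date(c("2021-07-20", "2021-07-21", "2021-07-22",
                  "2021-07-23", "2021-07-24")),
  Number = c(2, 5, 4, 3, 4),
  Percentage = c(0.70, 0.50, 0.95, 0.75, 0.3)
)

# Factor that stretches Percentage to the range of Number.
k <- max(test.df$Number) / max(test.df$Percentage)

p2 <- ggplot(test.df) +
  geom_bar(aes(x = Day, y = Number), stat = "identity") +
  geom_line(aes(x = Day, y = Percentage * k), color = "blue") +
  # The secondary axis undoes the scaling so its labels read as percentages.
  scale_y_continuous(sec.axis = sec_axis(~ . / k, name = "Percentage"))
p2
```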


ggplot2: Geoms

Geoms: Use a geom function to represent data points, use the geom's aesthetic properties to represent variables. Each function returns a layer.


ggplot2: One variable

ggplot2: Two variables

ggplot2: Two variables (continued)

ggplot2: Three variables

ggplot2: Stats

ggplot2: Scales

ggplot2: Coordinate Systems, Position Adjustments

ggplot2: Labels, Legends, Faceting, Zooming
