http://www.hmwu.idv.tw
Han-Ming Wu, Department of Statistics, National Chengchi University
R Data Visualization
ggplot2
Statistical Graphics Package
E03
What is ggplot2?
A high-level graphics system developed by Hadley Wickham.
Implements Leland Wilkinson's grammar of graphics.
Streamlines many graphics workflows for complex plots.
Syntax centered around the main ggplot() function.
The simpler qplot() function provides many shortcuts (deprecated since ggplot2 3.4.0).
The principle of a ggplot
A plot is built from three components:
Plot = data + aesthetics + geometry
data: a data frame (the dataset).
aesthetics:
indicate the x and y variables,
tell R how the data are displayed in the plot
(e.g. the color, size and shape of points, the height of bars, etc.)
geometry: the type of graphic
(bar chart, histogram, box plot, line plot, density plot, dot plot, etc.)
General ggplot syntax
Other important parts of a plot:
Faceting draws the same type of graph for each subset of the data.
(e.g. for the variable gender, two graphs: one for males and one for females.)
Annotation lets you add text to the plot.
Summary statistics let you add descriptive statistics to a plot.
Scales control how data values are mapped to the plot, e.g. the x and y axis limits.
> install.packages("ggplot2")
> library(ggplot2)
> ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)
General ggplot syntax:
ggplot(data, aes(...)) + geom_*() + ... + stat_*() + ...
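A sketch of one plot that uses each part listed above. The particular choices (stat_smooth() as the summary layer, the x limits, the annotation text) are illustrative assumptions, not taken from the original slide.

```r
library(ggplot2)

# data + aesthetics + geometry, plus the optional parts listed above
p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +                                     # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # summary statistics (a fitted line)
  facet_wrap(~ Species) +                                    # faceting: one panel per species
  scale_x_continuous(limits = c(4, 8)) +                     # scales: x-axis limits
  annotate("text", x = 5, y = 4.5, label = "iris")           # annotation, repeated in every panel
print(p)
```

Each `+` adds one piece; removing a line removes only that part of the plot.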
Why ggplot2?
More elegant & compact code than base graphics.
More aesthetically pleasing than base graphics.
Very powerful for exploratory analysis.
Supports a continuum of expertise.
Easy to get started, plenty of power for complex figures.
Publication-quality figures.
Complete themes can be applied with a single command.
Its default colors are more attractive than those of base graphics.
Easy to visualize data with multiple variables.
Provides a platform for creating simple graphs that convey a wealth of information.
• Manual: http://had.co.nz/ggplot2/
• Introduction:
http://www.ling.upenn.edu/~joseff/rstudy/summer2010_ggplot2_intro.html
• Book: http://had.co.nz/ggplot2/book/
• R Graphics Cookbook : http://www.cookbook-r.com/Graphs/
The R Graph Gallery
https://www.r-graph-gallery.com/index.html
Data Visualization with ggplot2: Cheat Sheet
https://www.rstudio.com/resources/cheatsheets/
Data Visualization with ggplot2: Cheat Sheet
ggplot2: Basics
ggplot2 extensions
https://exts.ggplot2.tidyverse.org/gallery/
Index plot
library(ggplot2)
library(grid)
library(gridExtra)
# index plot of the x-th column of iris; .data[[...]] looks the column up by name
iris.index.plot <- function(x){
  ggplot(iris, aes(x=1:nrow(iris), y=.data[[names(iris)[x]]], color=Species)) + geom_point() +
    labs(x="index", y=names(iris)[x])
}
index.list <- lapply(1:4, iris.index.plot)
marrangeGrob(index.list, nrow=2, ncol=2, top="")
# a single index plot
ggplot(iris, aes(x=1:150, y=Sepal.Length, color=Species)) + geom_point()
> # select numerical variables
> mydata <- infert
> dim(mydata)
> head(mydata)
> str(mydata)
> id <- sapply(mydata, is.numeric)
> id
> mydata.numeric <- mydata[, id]
> dim(mydata.numeric)
> head(mydata.numeric)
# other type checks: is.integer, is.double,
# is.logical, is.character;
# is.POSIXt, is.POSIXlt, is.POSIXct are provided by the lubridate package
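A sketch combining the two parts of this slide: each numeric column picked out with sapply(infert, is.numeric) gets its own index plot, using the .data[[...]] pronoun to look columns up by name. Point size and the 4-column layout are arbitrary choices.

```r
library(ggplot2)
library(gridExtra)

# names of the numeric columns of infert (education is a factor and is dropped)
num.cols <- names(infert)[sapply(infert, is.numeric)]

# one index plot per numeric column; .data[[v]] looks a column up by its name
index.plots <- lapply(num.cols, function(v) {
  ggplot(infert, aes(x = seq_len(nrow(infert)), y = .data[[v]])) +
    geom_point(size = 0.8) +
    labs(x = "index", y = v)
})
grid.arrange(grobs = index.plots, ncol = 4)
```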
Box plots
library(ggplot2)
p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot()
p
p + coord_flip() # Rotate the box plot
# Notched box plot
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot(notch=TRUE)
ggplot(iris, aes(x=Species, y=Sepal.Length)) +
geom_boxplot(outlier.colour="red", outlier.shape=8, outlier.size=3)
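geom_boxplot understands the following aesthetics (x, lower, upper, middle, ymin and ymax are required; with the default stat_boxplot(), only x and y are supplied and the other five are computed from y):
x, y, lower, upper, middle, ymin, ymax, alpha, colour, fill, group, linetype, shape, size, weight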
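A sketch that supplies the required aesthetics directly with stat = "identity" instead of letting stat_boxplot() compute them. The whiskers here simply run to the minimum and maximum, an assumption that differs from stat_boxplot()'s 1.5 × IQR rule.

```r
library(ggplot2)

# quartiles, median and extremes of Sepal.Length for each species
sl <- split(iris$Sepal.Length, iris$Species)
box.df <- data.frame(
  Species = names(sl),
  ymin    = sapply(sl, min),
  lower   = sapply(sl, quantile, probs = 0.25),
  middle  = sapply(sl, median),
  upper   = sapply(sl, quantile, probs = 0.75),
  ymax    = sapply(sl, max)
)

# with stat = "identity" the box is drawn from the columns exactly as given
pb <- ggplot(box.df, aes(x = Species, ymin = ymin, lower = lower,
                         middle = middle, upper = upper, ymax = ymax)) +
  geom_boxplot(stat = "identity")
pb
```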
Box plot with dots
# stat_summary(): add mean points to a box plot
p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) +
  geom_boxplot()
p
p + stat_summary(fun=mean, geom="point", shape=2, size=4, col="red")
p + geom_dotplot(binaxis="y", stackdir="center")
Change box plot line colors
p <- ggplot(iris, aes(x=Species, y=Sepal.Length, color=Species)) + geom_boxplot()
p
# Use custom color palettes
p + scale_color_manual(values=c("orange", "purple", "darkgreen"))
# Use brewer color palettes
p + scale_color_brewer(palette="Set2")
> library(RColorBrewer)
> display.brewer.all()
# turn off legends
ggplot(iris, aes(x=Species, y=Sepal.Length, color=Species)) + geom_boxplot(show.legend = FALSE)
# remove the legend after the plot is created
p + theme(legend.position = "none")
Change box plot fill colors
p <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_boxplot()
p
p + scale_fill_manual(values=c("orange", "purple", "darkgreen"))
p + scale_fill_brewer(palette="Set2")
Change the legend position
p <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_boxplot()
p
p + theme(legend.position="top")
p + theme(legend.position="bottom")
p + theme(legend.position="none")
Possible values of legend.position:
"left", "top", "right" (the default), "bottom" and "none".
Change the order of items on the x axis
> levels(iris$Species)
[1] "setosa" "versicolor" "virginica"
p <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_boxplot()
p
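# show only setosa and versicolor (the virginica rows are dropped with a warning)
p + scale_x_discrete(limits=c("setosa", "versicolor"))
# reorder the boxes along the x axis
p + scale_x_discrete(limits=c("versicolor", "setosa", "virginica"))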
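scale_x_discrete(limits = ...) reorders only the x axis; the fill legend keeps the factor-level order. A sketch of two ways to reorder the legend itself: change the factor levels (the axis and the legend both follow), or pass breaks to the fill scale (only the legend changes). The object names iris2 and p2 are illustrative.

```r
library(ggplot2)

# reorder the factor levels: the x axis and the fill legend both follow the new order
iris2 <- iris
iris2$Species <- factor(iris2$Species, levels = c("versicolor", "setosa", "virginica"))
p2 <- ggplot(iris2, aes(x = Species, y = Sepal.Length, fill = Species)) + geom_boxplot()
p2

# or keep the data as is and reorder only the legend keys through the fill scale
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + geom_boxplot()
p + scale_fill_discrete(breaks = c("virginica", "versicolor", "setosa"))
```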
Box plot with multiple groups
> select.id <- sample(1:150, 50)
> if.selected <- 1:150 %in% select.id
> iris.sel <- cbind(iris, if.selected)
> head(iris.sel)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species if.selected
1          5.1         3.5          1.4         0.2  setosa        TRUE
2          4.9         3.0          1.4         0.2  setosa       FALSE
3          4.7         3.2          1.3         0.2  setosa       FALSE
4          4.6         3.1          1.5         0.2  setosa       FALSE
5          5.0         3.6          1.4         0.2  setosa        TRUE
6          5.4         3.9          1.7         0.4  setosa        TRUE
>
> p <- ggplot(iris.sel, aes(x=Species, y=Sepal.Length, fill=if.selected)) +
+   geom_boxplot() +
+   labs(title="iris data with selected IDs")
> p
Violin plots
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_violin()
p <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_violin(trim=FALSE)
p
# mean_sdl() (mean ± mult standard deviations) requires the Hmisc package to be installed
p + stat_summary(fun.data=mean_sdl, fun.args = list(mult = 1), geom="pointrange", color="purple")
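If Hmisc is not available, the same mean ± 1 SD ranges can be computed with base R and drawn with geom_pointrange(). A sketch under that assumption; the data frame name stats.df is illustrative.

```r
library(ggplot2)

# mean and mean ± 1 SD of Sepal.Length per species, computed with base R
sl <- split(iris$Sepal.Length, iris$Species)
stats.df <- data.frame(
  Species = factor(names(sl), levels = levels(iris$Species)),
  y       = sapply(sl, mean),
  ymin    = sapply(sl, mean) - sapply(sl, sd),
  ymax    = sapply(sl, mean) + sapply(sl, sd)
)

pv <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_violin(trim = FALSE) +
  geom_pointrange(data = stats.df, aes(x = Species, y = y, ymin = ymin, ymax = ymax),
                  color = "purple", inherit.aes = FALSE)
pv
```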
Dot plots
ggplot(iris, aes(x=Species, y=Petal.Length)) + geom_dotplot(binaxis="y")
ggplot(iris, aes(x=Species, y=Petal.Length)) + geom_dotplot(binaxis="y", stackdir="center",
stackratio=1.5, dotsize=0.5)
ggplot(iris, aes(x=Species, y=Petal.Length)) + geom_boxplot() +
  geom_dotplot(binaxis="y", stackdir="center", dotsize=0.5)
ggplot(iris, aes(x=Species, y=Petal.Length)) +
  geom_violin(trim = FALSE) +
  geom_dotplot(binaxis="y", stackdir="center", dotsize=0.5, colour="red")
Dot plots + summary statistics
p <- ggplot(iris, aes(x=Species, y=Petal.Length)) +
  geom_dotplot(binaxis="y", stackdir="center", dotsize=0.5)
p
p + stat_summary(fun=mean, geom="point", shape=17, size=3, color="red")
# custom summary: mean ± a standard deviations
my_summary <- function(x, a=1) {
  m <- mean(x)
  ymin <- m - a * sd(x)
  ymax <- m + a * sd(x)
  c(y=m, ymin=ymin, ymax=ymax)
}
p + stat_summary(fun.data=my_summary, color="red")
p + stat_summary(fun.data=my_summary, color="blue", fun.args=list(a=2))
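stat_summary() calls the fun.data function once per x group with that group's y values, and fun.args supplies the extra arguments. Calling my_summary() directly on a small vector shows what it returns; the sapply() line is a rough base-R imitation of the per-group step, not ggplot2's internal code.

```r
# my_summary() as defined above: mean ± a standard deviations
my_summary <- function(x, a = 1) {
  m <- mean(x)
  c(y = m, ymin = m - a * sd(x), ymax = m + a * sd(x))
}

my_summary(c(1, 2, 3))         # mean 2, sd 1: y = 2, ymin = 1, ymax = 3
my_summary(c(1, 2, 3), a = 2)  # y = 2, ymin = 0, ymax = 4

# apply it to the Petal.Length values of each species
sapply(split(iris$Petal.Length, iris$Species), my_summary)
```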
Histogram
> # install.packages("gridExtra")
> library(gridExtra)
> h1 <- ggplot(data=iris, aes(x=Sepal.Length)) + geom_histogram()
> h2 <- ggplot(data=iris, aes(x=Sepal.Length)) + geom_histogram(binwidth=1)
> h3 <- ggplot(data=iris, aes(x=Sepal.Length)) +
geom_histogram(color="black", fill="blue", bins = 10)
> h4 <- ggplot(data=iris, aes(x=Sepal.Length, color=Species)) + geom_histogram(binwidth = 1)
> grid.arrange(h1, h2, h3, h4, nrow=1, ncol=4)
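When neither bins nor binwidth is given, geom_histogram() uses 30 bins and prints a message suggesting a better value. A sketch that inspects the computed bins with layer_data(); the object names are illustrative.

```r
library(ggplot2)

h1 <- ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(bins = 30)
h2 <- ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(binwidth = 1)

# one row per bin in the computed layer data
nrow(layer_data(h1))   # 30 bins
nrow(layer_data(h2))   # as many width-1 bins as needed to cover the data range
# the bin counts add up to the 150 observations either way
sum(layer_data(h1)$count)
sum(layer_data(h2)$count)
```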
Histogram
> p <- ggplot(data=iris, aes(x=Sepal.Length))
> p <- p + geom_histogram()
> p + facet_grid(Species~.) #row
p + facet_grid(.~Species) #column
Histogram
> library(gridExtra)
> sl <- ggplot(iris, aes(x=Sepal.Length, fill=Species)) + geom_histogram(binwidth = 0.1)
> sw <- ggplot(iris, aes(x=Sepal.Width, fill=Species)) + geom_histogram(binwidth = 0.1)
> pl <- ggplot(iris, aes(x=Petal.Length, fill=Species)) + geom_histogram(binwidth = 0.1)
> pw <- ggplot(iris, aes(x=Petal.Width, fill=Species)) + geom_histogram(binwidth = 0.1)
> grid.arrange(sl, sw, pl, pw, nrow = 2)
Histogram
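library(ggplot2)
library(gridExtra)
# histogram of the x-th column of iris, looked up by name with .data[[...]]
iris.hist <- function(x){
  ggplot(iris, aes(x=.data[[names(iris)[x]]], fill=Species)) + geom_histogram(binwidth = 0.1) +
    xlab(names(iris)[x])
}
hist.list <- lapply(1:4, iris.hist)
marrangeGrob(hist.list, nrow=2, ncol=2, top="")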
Density plots
p1 <- ggplot(iris, aes(x=Sepal.Length)) + geom_density()
p2 <- ggplot(iris, aes(x=Sepal.Length, color=Species)) + geom_density()
p3 <- ggplot(iris, aes(x=Sepal.Length, fill=Species)) + geom_density()
# Add a mean line (linewidth needs ggplot2 >= 3.4.0; older versions use size)
p1 + geom_vline(xintercept=mean(iris$Sepal.Length), color="blue", linetype="dashed", linewidth=1)
# one dashed line per species at the group mean
mu <- tapply(iris$Sepal.Length, iris$Species, mean)
mu.df <- data.frame(Sp=names(mu), grp.mean=mu)
head(mu.df)
p2 + geom_vline(data=mu.df, aes(xintercept=grp.mean, color=Sp), linetype="dashed")
Change line color, line type and fill color
# Change line color, line type and fill color
ggplot(iris, aes(x=Sepal.Length)) +
geom_density(color="darkblue", fill="lightblue", linetype="dashed")
# Use semi-transparent fill
ggplot(iris, aes(x=Sepal.Length, fill=Species)) + geom_density(alpha=0.4)
Combine histogram and density plots
Create the histogram on a density scale using the computed variable density, written after_stat(density) (older code writes ..density.., which is deprecated).
# Histogram with density plot
ggplot(iris, aes(x=Sepal.Length)) +
  geom_histogram(aes(y=after_stat(density)), colour="black", fill="lightblue") +
  geom_density(alpha=0.2, fill="red")
# Color by groups
phd <- ggplot(iris, aes(x=Sepal.Length, color=Species, fill=Species)) +
  geom_histogram(aes(y=after_stat(density)), alpha=0.5, position="identity") +
  geom_density(alpha=.2)
phd
# facets: split the plot into multiple panels
phd + facet_grid(Species ~ .)
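Why the density scale lets a histogram share axes with geom_density(): each bar height is count / (n × binwidth), so the bar areas sum to 1, like the area under a density curve. A sketch that checks this on the computed layer data; the binwidth of 0.25 is an arbitrary choice.

```r
library(ggplot2)

h <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.25)
d <- layer_data(h)

# density = count / (n * binwidth), so the bar areas sum to 1
sum(d$density * (d$xmax - d$xmin))
all.equal(d$density, d$count / (sum(d$count) * 0.25))
```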
mtcars {datasets}
mtcars {datasets}: Motor Trend Car Road Tests (1974 Motor Trend US magazine), a data frame with 32 cars and 11 variables
• mpg : Miles/(US) gallon, fuel economy
• cyl : Number of cylinders
• disp: Displacement (cu.in., cubic inches), total engine displacement
• hp : Gross horsepower
• drat: Rear axle ratio
• wt : Weight (1000 lbs)
• qsec: 1/4 mile time (seconds)
• vs : Engine shape (0 = V-shaped, 1 = straight)
• am : Transmission (0 = automatic, 1 = manual)
• gear: Number of forward gears
• carb: Number of carburetors
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Bar plot (bar chart)
> # mtcars$cyl <- as.factor(mtcars$cyl)
> p <- ggplot(mtcars, aes(x=cyl)) +
+   geom_bar() +
+   labs(x="Number of cylinders (cyl)", y="Number of cars")
>
> p + coord_flip()
>
> cyl.df <- data.frame(table(mtcars$cyl))
> cyl.df
  Var1 Freq
1    4   11
2    6    7
3    8   14
> names(cyl.df) <- c("cyl", "count")
> cyl.df
  cyl count
1   4    11
2   6     7
3   8    14
>
> ggplot(cyl.df, aes(x=cyl, y=count)) +
+   geom_bar(stat="identity") +
+   labs(x="Number of cylinders (cyl)", y="Number of cars")
ggtitle("Main title"): adds a main title above the plot
xlab("X axis label"): changes the x-axis label
ylab("Y axis label"): changes the y-axis label
labs(title="Main title", x="X axis label", y="Y axis label"): changes the main title and axis labels at once
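geom_col() is the ggplot2 shorthand for geom_bar(stat = "identity"): it takes bar heights from y, whereas plain geom_bar() counts rows. A sketch comparing the two on the cylinder counts, using ggtitle(), xlab() and ylab() from the list above; the title text is illustrative.

```r
library(ggplot2)

cyl.df <- data.frame(table(mtcars$cyl))
names(cyl.df) <- c("cyl", "count")

# geom_col() uses y as the bar height, the same as geom_bar(stat = "identity")
p.col <- ggplot(cyl.df, aes(x = cyl, y = count)) +
  geom_col() +
  ggtitle("Cars by number of cylinders") +
  xlab("Number of cylinders (cyl)") +
  ylab("Number of cars")
p.col

# geom_bar() without stat = "identity" counts the rows itself
p.cnt <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p.cnt
```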
Bar plot
ggplot(cyl.df, aes(x=cyl, y=count)) +
  geom_bar(stat="identity", width=0.5, color="black", fill="steelblue")
p <- ggplot(cyl.df, aes(x=cyl, y=count, fill=cyl)) +
  geom_bar(stat="identity", width=0.5)
p
p + geom_text(aes(label=count), vjust=-0.3, size=4)
p + geom_text(aes(label=count), vjust=1.6, color="white", size=4)
p + scale_fill_manual(values=c("#999000", "#E69F99", "#56B4E9"))
p + scale_fill_brewer(palette="Dark2")
p + scale_fill_grey()
Bar plot
> iris.mean <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=mean)
> iris.mean
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
> mydata <- cbind(stack(iris.mean[,-1]), Species = iris.mean$Species)
> mydata
   values          ind    Species
1   5.006 Sepal.Length     setosa
2   5.936 Sepal.Length versicolor
3   6.588 Sepal.Length  virginica
4   3.428  Sepal.Width     setosa
5   2.770  Sepal.Width versicolor
6   2.974  Sepal.Width  virginica
7   1.462 Petal.Length     setosa
8   4.260 Petal.Length versicolor
9   5.552 Petal.Length  virginica
10  0.246  Petal.Width     setosa
11  1.326  Petal.Width versicolor
12  2.026  Petal.Width  virginica
> ggplot(mydata, aes(x=ind, y=values, fill = Species)) +
+   geom_bar(stat="identity")
By default the bars are stacked (position_stack); use position_dodge to place them side by side.
Other position adjustments:
position_identity, position_jitterdodge, position_jitter, position_nudge, position_stack
Bar plot
ggplot(mydata, aes(x=ind, y=values, fill=Species)) + geom_bar(stat="identity", position="dodge") +
geom_text(aes(label=values), vjust=1.4, color="white", position = position_dodge(0.9)) +
labs(x="", y="mean")
Bar plot
> library(plyr)
> mydata.sorted <- arrange(mydata, ind, Species)
> mydata.sorted
   values          ind    Species
1   5.006 Sepal.Length     setosa
2   5.936 Sepal.Length versicolor
3   6.588 Sepal.Length  virginica
...
10  0.246  Petal.Width     setosa
11  1.326  Petal.Width versicolor
12  2.026  Petal.Width  virginica
> mydata.sorted$Species <- factor(mydata.sorted$Species, levels=rev(levels(mydata$Species)))
> mydata.cumsum <- ddply(mydata.sorted, "ind",
+   transform, label.ypos=cumsum(values))
> mydata.cumsum
   values          ind    Species label.ypos
1   5.006 Sepal.Length     setosa      5.006
2   5.936 Sepal.Length versicolor     10.942
3   6.588 Sepal.Length  virginica     17.530
...
10  0.246  Petal.Width     setosa      0.246
11  1.326  Petal.Width versicolor      1.572
12  2.026  Petal.Width  virginica      3.598
>
> ggplot(mydata.cumsum, aes(x=ind, y=values, fill=Species)) +
+   geom_bar(stat="identity") +
+   geom_text(aes(y=label.ypos, label=values), vjust=1.6, color="white")
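plyr is retired, so a base-R sketch of the same label positions may be handy. It assumes ave(..., FUN = cumsum) as a stand-in for ddply(..., transform, cumsum): the running total is computed within each bar (each level of ind).

```r
library(ggplot2)

iris.mean <- aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = mean)
mydata <- cbind(stack(iris.mean[, -1]), Species = iris.mean$Species)

# sort by variable, then species (as plyr::arrange(mydata, ind, Species) does)
mydata <- mydata[order(mydata$ind, mydata$Species), ]
# running total within each bar, computed with ave() instead of plyr::ddply()
mydata$label.ypos <- ave(mydata$values, mydata$ind, FUN = cumsum)
mydata$Species <- factor(mydata$Species, levels = rev(levels(mydata$Species)))

ps <- ggplot(mydata, aes(x = ind, y = values, fill = Species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(y = label.ypos, label = values), vjust = 1.6, color = "white")
ps
```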
Line plot
> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
> airquality$Month <- factor(airquality$Month)
> ggplot(airquality, aes(x=Day, y=Temp, group=Month, color=Month)) +
+   geom_line(aes(linetype=Month)) +
+   geom_point()
Line plot
> sales <- data.frame(
+   date = seq(Sys.Date(), length.out=100, by="1 day")[sample(100, 50)],
+   price = floor(rnorm(50, mean=100, sd=20))
+ )
> sales <- sales[order(sales$date), ]
> head(sales)
         date price
28 2020-08-10    73
27 2020-08-11    76
1  2020-08-12    95
22 2020-08-13   109
8  2020-08-16   107
18 2020-08-19   105
lp <- ggplot(data=sales, aes(x=date, y=price)) + geom_line()
lp
lp + scale_x_date(date_labels="%m/%d")
Line plot
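# date_breaks is an argument of scale_x_date(); the date_breaks() function would need the scales package
lp + scale_x_date(date_breaks="1 week") + theme(axis.text.x = element_text(angle=45))
range(sales$date)
# "2020-08-10" "2020-11-07"  (depends on Sys.Date() when the data were generated)
amin <- as.Date("2020-09-01")
amax <- as.Date("2020-10-31")
lp + scale_x_date(limits = c(amin, amax))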
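Because the sales data above start at Sys.Date(), the fixed limits only overlap the data on the day the slide was made. A reproducible sketch, assuming a fixed (hypothetical) start date of 2020-08-10 and a seed of 1, combining date_breaks, date_labels and limits in one scale_x_date() call.

```r
library(ggplot2)

set.seed(1)
# fixed start date instead of Sys.Date(), so the limits below always overlap the data
all.days <- seq(as.Date("2020-08-10"), length.out = 100, by = "1 day")
sales <- data.frame(
  date  = sort(sample(all.days, 50)),
  price = floor(rnorm(50, mean = 100, sd = 20))
)

lp <- ggplot(sales, aes(x = date, y = price)) + geom_line()
# date_breaks / date_labels are arguments of scale_x_date(); no extra package needed
lp + scale_x_date(date_breaks = "2 weeks", date_labels = "%m/%d",
                  limits = as.Date(c("2020-09-01", "2020-10-31")))
```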
Line plot
> mydata <- as.data.frame(matrix(rnorm(100), ncol=4))
> library(reshape2)
> head(mydata, 3)
         V1         V2          V3          V4
1 0.2846997 0.78283129 0.659318307 -0.04318195
2 1.2510186 1.59782230 0.187074422 -1.23161275
3 0.3827782 1.44676700 0.840148794 -0.20081868
> # id variable for the position in the matrix
> mydata$id <- 1:nrow(mydata)
> # reshape to long format
> mydata.lf <- melt(mydata, id.vars="id")
> head(mydata.lf)
  id variable      value
1  1       V1  0.2846997
2  2       V1  1.2510186
3  3       V1  0.3827782
4  4       V1 -3.2994010
5  5       V1  1.4943630
6  6       V1  0.1557203
> tail(mydata.lf)
    id variable      value
95  20       V4  1.1655219
96  21       V4  0.5081844
97  22       V4 -1.2523577
98  23       V4 -2.3553732
99  24       V4  0.1542803
100 25       V4  0.6899416
> ggplot(mydata.lf, aes(x=id, y=value, group=variable, colour=variable)) +
+   geom_point() +
+   geom_line(aes(lty=variable))
Adding reference lines and a regression line to a plot
pp <- ggplot(airquality, aes(x=Temp, y=Wind)) + geom_point()
pp
# lines at the means (linewidth needs ggplot2 >= 3.4.0; older versions use size)
pp + geom_hline(yintercept=mean(airquality$Wind), linetype="dashed", color="red", linewidth=2) +
  geom_vline(xintercept=mean(airquality$Temp), linetype="dotted", color="blue", linewidth=2)
# least-squares line from lm()
beta <- lm(Wind ~ Temp, data=airquality)$coefficients
pp + geom_abline(intercept=beta[1], slope=beta[2], color="red", linewidth=2) +
  ggtitle(paste0("y = ", round(beta[1], 2), " + ", round(beta[2], 2), " x"))
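geom_smooth(method = "lm") fits the same least-squares line inside ggplot2, instead of passing lm() coefficients to geom_abline(); with se = TRUE (the default) it would also draw a confidence band. A sketch that checks the fitted values against the coefficients.

```r
library(ggplot2)

pp <- ggplot(airquality, aes(x = Temp, y = Wind)) + geom_point()
beta <- coef(lm(Wind ~ Temp, data = airquality))

# the same regression line, fitted by ggplot2 itself
ps <- pp + geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red")
ps

# the smoothed values lie on y = beta[1] + beta[2] * x
d <- layer_data(ps, 2)
all.equal(d$y, unname(beta[1] + beta[2] * d$x))
```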
Adding line segments and arrows to a plot
# constant positions inside aes() repeat the segment once per data row; annotate("segment", ...) draws it once
pp + geom_segment(aes(x=70, y=5, xend=90, yend=12), color="red") +
  geom_segment(aes(x=65, y=17, xend=61.5, yend=19.5), color="blue",
               arrow = arrow(length = unit(0.5, "cm"))) +
  geom_point(aes(x=60, y=5), color="darkgreen", shape=13, size=4)
# segments from (x1, y1) to (x2, y2) stored in a data frame
xy.df <- data.frame(x1=airquality$Temp[1:3], y1=airquality$Wind[1:3],
                    x2=airquality$Temp[4:6], y2=airquality$Wind[4:6])
pp + geom_segment(data=xy.df, aes(x=x1, y=y1, xend=x2, yend=y2), color=2:4, linewidth=2)
?geom_curve
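A sketch of geom_curve() through annotate(), which draws a single curved arrow; curvature sets how strongly it bends (0 gives a straight line). The coordinates reuse the blue arrow above, and curvature = 0.3 is an arbitrary choice.

```r
library(ggplot2)

pp <- ggplot(airquality, aes(x = Temp, y = Wind)) + geom_point()
# one curved arrow, drawn once (not once per data row)
pc <- pp + annotate("curve", x = 65, y = 17, xend = 61.5, yend = 19.5,
                    curvature = 0.3, color = "blue",
                    arrow = arrow(length = unit(0.3, "cm")))
pc
```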
Scatterplot
> ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, shape=Species, color=Species)) + geom_point()
> p <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, shape=Species, color=Species))
> p <- p + geom_point()
> p
> p + geom_line(aes(y=Sepal.Width))
Scatterplot, axes
> p <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) +
+   geom_point()
> p
> p + coord_fixed(ratio=1)
> p + coord_fixed(ratio=0.5)
> p + coord_fixed(ratio=5)
Scatterplot, axes
# Change x and y axis limits
gp <- ggplot(airquality, aes(x=Temp, y=Wind)) + geom_point()
gp
gp + xlim(50, 100) + ylim(0, 25)
gp + expand_limits(x=c(50, 100), y=c(0, 25))
# Axis transformations
# trans: "log2", "log10", "sqrt"
gp + scale_x_continuous(trans="log2") + scale_y_continuous(trans="log2") + labs(x="log2(Temp)", y="log2(Wind)")
gp + coord_trans(x="log2", y="log2")
gp + scale_y_sqrt() # square root
gp + scale_y_reverse() # Reverse coordinates
Scatterplot, axes
library(scales)   # percent, dollar and scientific come from the scales package
gp + scale_y_continuous(labels = percent)
gp + scale_y_continuous(labels = dollar)
gp + scale_y_continuous(labels = scientific)
# the same formatters without attaching scales
gp + scale_y_continuous(labels=scales::percent_format())
gp + scale_y_continuous(labels=scales::dollar_format())
gp + scale_y_continuous(labels=scales::scientific_format())
Customize the appearance of the main title and axis labels
theme(plot.title = element_text(family, face, colour, size),
      axis.title.x = element_text(family, face, colour, size),
      axis.title.y = element_text(family, face, colour, size))
• family : font family
• face : font face. Possible values are "plain", "italic", "bold" and "bold.italic"
• colour, size : text color and font size (in points)
p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot() +
  ggtitle("Iris Dataset Box Plot") + xlab("Species") +
  ylab("Sepal length (Sepal.Length)")
p
p + theme(plot.title=element_text(color="red", size=20, face="bold.italic"),
          axis.title.x=element_text(color="blue", size=14, face="bold"),
          axis.title.y=element_text(color="darkgreen", size=14, face="bold"))
# Hide the main title and axis titles
p + theme(plot.title=element_blank(),
          axis.title.x=element_blank(), axis.title.y=element_blank())
Scatterplot, titles and point shapes
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)
ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point() +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)")
ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point(size=2, color="blue", shape=3) + labs(x="Weight (wt)", y="Miles per gallon (mpg)")
ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point(aes(size=qsec), color="darkgreen") +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)", size="1/4 mile time (qsec)",
       title="Motor Trend Car Road Tests dataset (mtcars)")
ggtitle("the main title")
xlab("the x axis label")
ylab("the y axis label")
labs(x="x label", y="y label", title="main title",
     fill="legend title")
Scatterplot: point shape, color and size
# cyl must be a factor (see above): a continuous variable cannot be mapped to shape
# mtcars$cyl <- as.factor(mtcars$cyl)
ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl)) + geom_point() +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)", shape="Cylinders (cyl)")
ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl, color=cyl)) +
  geom_point(size=3) +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)", shape="Cylinders (cyl)", color="Cylinders (cyl)")
mtcars$am <- as.factor(mtcars$am)
ggplot(mtcars, aes(x=wt, y=mpg, shape=am, color=cyl, size=hp)) + geom_point() +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)", shape="Transmission (am)", color="Cylinders (cyl)", size="Horsepower (hp)")
Scatterplot, text labels
p <- ggplot(data=mtcars, aes(x=wt, y=mpg, label=rownames(mtcars))) + geom_point() +
  geom_text(size=3) +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)")
p
p + geom_label()
geom_text understands the following aesthetics (x, y and label are required):
x, y, label, alpha, angle, colour, family, fontface, group, hjust, lineheight, size, vjust
Add text annotations to a graph
set.seed(123)
id <- sample(1:nrow(mtcars), 10)
mtcars.subset <- mtcars[id, ]
sp <- ggplot(mtcars.subset, aes(x=wt, y=mpg, label=rownames(mtcars.subset))) + geom_point()
sp + geom_text()
sp + geom_text(size=3)
sp + geom_text(hjust=0, vjust=0)
# fontface: 1 (plain), 2 (bold), 3 (italic), 4 (bold.italic)
sp + geom_text(aes(fontface=3))
sp + geom_label()
Size of the label text
sp + geom_text(aes(color=factor(cyl)))
sp + geom_text(aes(size=cyl))
sp + geom_text(aes(size=cyl)) + scale_size(range=c(3, 6))
Adding text annotations to a plot
# fixed x, y and label: the text is drawn once per data row, so the copies overlap
sp + geom_text(x=3, y=25, label="Scatter plot")
# annotate() draws it exactly once
sp + annotate(geom="text", x=3, y=25, label="Scatter plot", color="red")
• geom_text(): adds text directly to the plot
• geom_label(): draws a rectangle underneath the text, making it easier to read
• annotate(): adds small text annotations at a particular location on the plot
• annotation_custom(): adds static annotations that are the same in every panel
# compare to
sp + geom_text(aes(x=3, y=25), label="Scatter plot")
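The difference between geom_text() with fixed positions and annotate() shows up in the layer data: the former keeps one row per observation, the latter has a single row. A sketch that rebuilds sp as above and counts the rows.

```r
library(ggplot2)

set.seed(123)
mtcars.subset <- mtcars[sample(1:nrow(mtcars), 10), ]
sp <- ggplot(mtcars.subset, aes(x = wt, y = mpg, label = rownames(mtcars.subset))) +
  geom_point()

p.text <- sp + geom_text(x = 3, y = 25, label = "Scatter plot")
p.anno <- sp + annotate(geom = "text", x = 3, y = 25, label = "Scatter plot", color = "red")

nrow(layer_data(p.text, 2))  # one copy of the label per row of mtcars.subset
nrow(layer_data(p.anno, 2))  # a single label
```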
Non-overlapping text labels
# ggrepel: avoid overlapping text labels
# install.packages("ggrepel")
library(ggrepel)
sp2 <- ggplot(mtcars.subset, aes(x=wt, y=mpg, label=rownames(mtcars.subset))) + geom_point(color="red")
sp2 + geom_text(size = 3.5)
sp2 + geom_text_repel(size = 3.5)
sp2 + geom_label_repel(aes(fill=factor(cyl)), color="white", size=3.5) + theme(legend.position="bottom")
# label only the cars with wt > 3 and mpg < 20
label.sub <- subset(mtcars.subset, wt > 3 & mpg < 20)
sp2 + geom_label_repel(data=label.sub,
  aes(label=rownames(label.sub), fill=factor(cyl)), color="white", size=3.5) +
  theme(legend.position="bottom")
Pie chart
> carb.df <- data.frame(table(mtcars$carb))
> names(carb.df) <- c("carb", "Freq")
> carb.df
  carb Freq
1    1    7
2    2   10
3    3    3
4    4   10
5    6    1
6    8    1
>
> bar.pt <- ggplot(carb.df, aes(x="", y=Freq, fill=carb)) +
+   geom_bar(width=1, stat="identity") +
+   labs(x="", fill="Carburetors (carb)")
> bar.pt
>
> pie <- bar.pt + coord_polar("y", start=0)
> pie
> pie + scale_fill_brewer(palette="Set2")
> pie + scale_fill_grey() + theme_minimal()
>
> mtcars$carb <- factor(mtcars$carb)
> ggplot(mtcars, aes(x=factor(1), fill=carb)) +
+   geom_bar(width = 1) +
+   coord_polar("y")
Pie chart
cyl.df <- data.frame(table(mtcars$cyl))
names(cyl.df) <- c("cyl", "Freq")
cyl.df$Prop <- prop.table(cyl.df$Freq)
cyl.df
p.bar <- ggplot(cyl.df, aes(x=cyl, y=Freq, fill=cyl)) + geom_bar(width=1, stat="identity") +
  labs(x="", title="mtcars$cyl", fill="cyl")
p.bar.tmp <- ggplot(cyl.df, aes(x="", y=Freq, fill=cyl)) +
  geom_bar(width=1, stat="identity") +
  labs(x="", title="mtcars$cyl", fill="cyl")
# percentage labels at the middle of each slice
p.pie <- p.bar.tmp + coord_polar("y", start=0) +
  theme_void() +
  geom_text(aes(label = paste0(round(Prop*100), "%")), position = position_stack(vjust = 0.5))
library(gridExtra)
grid.arrange(p.bar, p.pie, nrow=2)
Donut chart
https://www.datanovia.com/en/blog/how-to-create-a-pie-chart-in-r-using-ggplot2/
QQplot (quantile-quantile plot)
ggplot(airquality, aes(sample=Wind)) + stat_qq() +
labs(title="QQplot for airquality$Wind")
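# Month must be a factor here, as converted earlier with airquality$Month <- factor(airquality$Month)
ggplot(airquality, aes(sample=Wind, shape=Month, color=Month)) + stat_qq() +
  labs(title="QQplot for airquality$Wind of each Month")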
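stat_qq_line(), available in recent ggplot2 versions, adds the usual reference line through the first and third quartiles. The sketch also checks what stat_qq() plots: the sorted sample (y) against normal quantiles qnorm(ppoints(n)) (x). Axis titles are illustrative.

```r
library(ggplot2)

q <- ggplot(airquality, aes(sample = Wind)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  labs(title = "QQplot for airquality$Wind",
       x = "theoretical (normal) quantiles", y = "sample quantiles")
q

# points: sorted sample against normal quantiles at ppoints(n)
d <- layer_data(q, 1)
all.equal(d$x, qnorm(ppoints(nrow(airquality))))
all.equal(d$y, sort(airquality$Wind))
```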
Empirical Cumulative Distribution Function
ggplot(airquality, aes(x=Wind)) + stat_ecdf(geom = "point")
ggplot(airquality, aes(x=Wind)) + stat_ecdf(geom = "step") +
  labs(title="Empirical Cumulative Distribution Function", y = "F(Wind)", x="Wind")
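The step heights drawn by stat_ecdf() are the values of base R's ecdf(), i.e. the proportion of observations less than or equal to x. A sketch that checks this, dropping the -Inf/Inf padding rows that stat_ecdf() adds by default (pad = TRUE).

```r
library(ggplot2)

pe <- ggplot(airquality, aes(x = Wind)) + stat_ecdf(geom = "step")
d <- layer_data(pe)
d <- d[is.finite(d$x), ]   # drop the -Inf / Inf padding rows

# F(x) = proportion of Wind values <= x
all.equal(d$y, ecdf(airquality$Wind)(d$x))
```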
Basic heatmap with ggplot2
library(ggplot2)
library(tidyr)
xdata <- iris[, 1:4]
n <- nrow(xdata)
p <- ncol(xdata)
# long format: column index (x), row index (y) and cell value
iris.df <- data.frame(x=rep(1:p, each=n), y=rep(1:n, p), value=gather(xdata)$value)
str(iris.df)
ggplot(iris.df, aes(x=x, y=y, fill=value)) + geom_raster() +
  scale_fill_gradient(low="white", high="black", na.value=NA) + scale_y_reverse() +
  labs(x="", y="", title="heatmap for iris data")
# a base-graphics counterpart
image(t(iris[, 1:4])[, nrow(iris[, 1:4]):1])
http://www.hmwu.idv.tw/web/R/E06-hmwu_R-heatmap.pdf
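tidyr::gather() is superseded. A base-R sketch of the same long table: as.vector(as.matrix(...)) stacks the columns one after another, the order gather() produces. Using the column names on the x axis and geom_tile() instead of geom_raster() are choices made here, not taken from the slide.

```r
library(ggplot2)

xdata <- iris[, 1:4]
# column-major stacking: all of column 1, then column 2, ...
heat.df <- data.frame(
  variable = factor(rep(colnames(xdata), each = nrow(xdata)), levels = colnames(xdata)),
  row      = rep(seq_len(nrow(xdata)), times = ncol(xdata)),
  value    = as.vector(as.matrix(xdata))
)

hm <- ggplot(heat.df, aes(x = variable, y = row, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "black") +
  scale_y_reverse() +
  labs(x = "", y = "", title = "heatmap for iris data")
hm
```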
Save a ggplot object
> # print(): print a ggplot to a file
> myplot <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
+   geom_point()
> pdf("myplot.pdf") # or png("myplot.png")
> print(myplot)
> dev.off()
windows 
      2 
> # ggsave: save the last ggplot
> ggplot(mtcars, aes(wt, mpg)) + geom_point()
> ggsave("myplot.png")
Saving 6.06 x 5.24 in image
>
> # ggsave: save a ggplot object
> ggsave(file="myplot2.pdf", plot=myplot,
+   device="pdf", scale=1.5)
Saving 9.09 x 7.86 in image
> getwd()
> list.files()
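Without width and height, ggsave() takes the size of the current graphics device, which is why the saved sizes above depend on the open window. A sketch that fixes the size explicitly; the file name is hypothetical and the file is written to tempdir(). For raster formats such as png, dpi also sets the pixel resolution.

```r
library(ggplot2)

myplot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

# a fixed 6 x 4 inch output, independent of the current device
out.file <- file.path(tempdir(), "myplot_6x4.pdf")
ggsave(filename = out.file, plot = myplot, width = 6, height = 4, units = "in")
file.exists(out.file)
```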
Colors (1)
ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot(fill="lightblue", color="darkred")
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point(color="blue")
Colors (2)
Arguments of scale_fill_hue() / scale_color_hue():
h: range of hues to use, within [0, 360]
c: chroma (intensity of colour); the maximum value depends on the combination of hue and luminance
l: luminance (lightness), within [0, 100]
bp <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + geom_boxplot()
bp
sp <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)
sp
bp + scale_fill_hue(l=50, c=40)
sp + scale_color_hue(l=30, c=40)
Colors (3)
bp + scale_fill_manual(values=c("coral", "deeppink", "slateblue2"))
sp + scale_color_manual(values=c("coral", "deeppink", "slateblue2"))
# Use RColorBrewer palettes
bp + scale_fill_brewer(palette="Dark2")
sp + scale_color_brewer(palette="Dark2")
Colors (4)
# Use gray colors; theme_classic() gives a white background
bp + scale_fill_grey() + theme_classic()
sp + scale_color_grey() + theme_classic()
bp + scale_fill_grey(start=0.8, end=0.2) + theme_classic()
sp + scale_color_grey(start=0.8, end=0.2) + theme_classic()
Colors (5)
# Continuous colors
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)
sc <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width,
                       color=Petal.Length)) + geom_point(size=3)
sc
# Sequential color scheme
sc + scale_color_gradient(low="blue", high="red")
# Diverging color scheme
mid.value <- mean(iris$Petal.Length)
sc + scale_color_gradient2(midpoint=mid.value, low="blue", mid="white", high="red")
# Gradient between n colors (tim.colors() comes from the fields package)
library(fields)
sc + scale_color_gradientn(colours=tim.colors(10))
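If the fields package is not installed, a base-R palette works the same way with scale_color_gradientn(). A sketch assuming R >= 3.6.0, where grDevices::hcl.colors() is available; scale_color_viridis_c() is ggplot2's own continuous viridis scale.

```r
library(ggplot2)

sc <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Petal.Length)) +
  geom_point(size = 3)
# gradient through 10 colors from a base-R palette (no extra package)
sc2 <- sc + scale_color_gradientn(colours = hcl.colors(10, palette = "viridis"))
sc2
# ggplot2's built-in viridis scale for continuous data
sc + scale_color_viridis_c()
```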
Themes (ggplot2 themes)
library(gridExtra)
p <- ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot()
p1 <- p + theme_gray() + labs(title="gray")
p2 <- p + theme_bw() + labs(title="bw")
p3 <- p + theme_linedraw() + labs(title="linedraw")
p4 <- p + theme_light() + labs(title="light")
p5 <- p + theme_dark() + labs(title="dark")
p6 <- p + theme_minimal() + labs(title="minimal")
p7 <- p + theme_classic() + labs(title="classic")
grid.arrange(p, p1, p2, p3, p4, p5, p6, p7, nrow=2, ncol=4)
ggthemes: Extra Themes, Scales and Geoms for 'ggplot2'
# install.packages("ggthemes")
library(ggthemes)
sp <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point()
sp1 <- sp + theme_tufte() + labs(title="tufte") # a minimalist theme
sp2 <- sp + theme_economist() + labs(title="economist")
sp3 <- sp + theme_stata() + labs(title="stata")
sp4 <- sp + theme_hc() + labs(title="hc") # Highcharts JS
sp5 <- sp + theme_wsj() + labs(title="wsj") # Wall Street Journal
grid.arrange(sp, sp1, sp2, sp3, sp4, sp5, nrow=2, ncol=3)
Themes in ggthemes:
theme_base, theme_calc, theme_clean, theme_economist, theme_excel,
theme_excel_new, theme_few,
theme_fivethirtyeight, theme_foundation, theme_gdocs, theme_hc, theme_igray,
theme_map, theme_pander, theme_par,
theme_solarized, theme_solid, theme_stata, theme_tufte, theme_wsj.
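A theme can also be packaged as a function and made the session default with theme_set(), which returns the previous theme so it can be restored. A sketch with a hypothetical house style, theme_mine(), built from theme_bw() plus a few changed elements.

```r
library(ggplot2)

# hypothetical house style: theme_bw() with a few elements changed
theme_mine <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(legend.position = "bottom",
          plot.title = element_text(face = "bold"))
}

sp <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + geom_point()
sp + theme_mine() + labs(title = "theme_mine")

# make it the default for later plots, then restore the previous default
old <- theme_set(theme_mine())
sp
theme_set(old)
```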
facet_grid
Rows, columns and multiple variables: a facet_grid example
ggplot2: Add name of variable used for facet_grid
https://stackoverflow.com/questions/39538226/ggplot2-add-name-of-variable-used-for-facet-grid/39538501
ggplot(mtcars, aes(x=wt, y=mpg, color=as.factor(carb), shape=as.factor(gear))) + geom_point(size=3) +
  facet_grid(cyl ~ am, labeller = label_both) +
  labs(x="Weight (wt)", y="Miles per gallon (mpg)", color="Carburetors", shape="Forward gears")
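facet_grid() fixes the layout as rows (cyl) × columns (am). facet_wrap() lays the same combinations out as a sequence of panels and wraps them into the requested number of rows. A sketch for comparison; nrow = 2 is an arbitrary choice.

```r
library(ggplot2)

pw <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl + am, labeller = label_both, nrow = 2)
pw
```

label_both prints both the variable name and its value in each strip, as in the facet_grid example.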
Plot multiple datasets
set.seed(12345)
df.A <- data.frame(x = rnorm(10), y=rnorm(10))
df.B <- data.frame(x = rnorm(10), y=rnorm(10))
ggplot(df.A, aes(x, y)) +
  geom_point() +
  geom_point(data = df.B, color = "red", shape = 2, size = 5)
set.seed(12345)
df.A <- data.frame(xa = rnorm(10), ya=rnorm(10))
df.B <- data.frame(xb = rnorm(10), yb=rnorm(10))
ggplot(df.A, aes(x = xa, y = ya)) +
  geom_point() +
  geom_point(data = df.B, aes(x = xb, y = yb), color = "red", shape = 2, size = 5)
ggplot() +
  geom_point(data = df.A, aes(x = xa, y = ya)) +
  geom_point(data = df.B, aes(x = xb, y = yb),
             color = "red", shape = 2, size = 5)
Overlaying a line plot and a bar plot
test.df <- data.frame(Day = as.Date(c("2021-07-20", "2021-07-21",
"2021-07-22", "2021-07-23",
"2021-07-24")), Number = c(2, 5, 4, 3, 4),
Percentage = c(0.70, 0.50, 0.95, 0.75, 0.3) )
ggplot(test.df) +
  geom_bar(aes(x = Day, y = Number), stat = "identity") +
  geom_line(aes(x = Day, y = Percentage), linewidth = 2, color = "blue")
# scale the line up by 5 and label a secondary axis with the inverse transformation
ggplot(test.df) +
  geom_bar(aes(x = Day, y = Number), stat = "identity") +
  geom_line(aes(x = Day, y = Percentage * 5), linewidth = 2, color = "blue") +
  scale_y_continuous(sec.axis = sec_axis(~./5, name = "Percentage"))
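The factor 5 above is chosen by hand. A sketch that derives the factor k from the data (max Number / max Percentage) and uses it in both the line and the secondary-axis formula, so the tallest point of the line reaches the tallest bar. The variable k is introduced here for illustration.

```r
library(ggplot2)

test.df <- data.frame(
  Day = as.Date(c("2021-07-20", "2021-07-21", "2021-07-22", "2021-07-23", "2021-07-24")),
  Number = c(2, 5, 4, 3, 4),
  Percentage = c(0.70, 0.50, 0.95, 0.75, 0.3)
)

# scale factor derived from the data instead of a hard-coded constant
k <- max(test.df$Number) / max(test.df$Percentage)

p2 <- ggplot(test.df, aes(x = Day)) +
  geom_col(aes(y = Number)) +
  geom_line(aes(y = Percentage * k), linewidth = 2, color = "blue") +
  scale_y_continuous(sec.axis = sec_axis(~ . / k, name = "Percentage"))
p2
```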
ggplot2: Geoms
Geoms: Use a geom function to represent data points, use the geom's aesthetic properties to represent variables. Each function returns a layer.
ggplot2: One variable
ggplot2: Two variables
ggplot2: Two variables
ggplot2: Three variables
ggplot2: Stats
ggplot2: Scales
ggplot2: Coordinate Systems, Position Adjustments