Until now, we have used R’s basic plotting functionality. The
ggplot2 package, written by Hadley Wickham (https://hadley.nz/), is more
comprehensive and consistent. The extensive documentation, with
examples, for ggplot can be found at
https://ggplot2.tidyverse.org/reference/.
ggplot2 relies entirely on data frames for input.
Let’s make our first ggplot with a dataset of ablation data
(ablation, downloadable from the website):
ablation <- read.csv("ablation.csv", header = TRUE, stringsAsFactors = TRUE)
names(ablation)[names(ablation) == "SCORE"] <- "Score"
head(ablation)
## Measurement Experiment CellType Direction Time Score
## 1 LDLR-ABLATION E1909 WT ABL 0 2.82
## 2 LDLR-ABLATION E1909 WT ABL 5 11.37
## 3 LDLR-ABLATION E1909 WT ABL 10 9.03
## 4 LDLR-ABLATION E1909 WT ABL 20 28.27
## 5 LDLR-ABLATION E1909 WT ABL 30 42.86
## 6 LDLR-ABLATION E1909 A-KD ABL 0 6.99
You may have noticed that the format of the ablation
data frame is a bit peculiar. It is probably not what you are used to
getting from your colleagues, or working with yourself. It is, however,
in the canonical format for storing and manipulating data that you
should be using.
The hallmark of this canonical (tidy) format is that there is only one (set of) independently observed value(s) in each row. All of the other columns are identifying values, or variables. They explain what exactly was measured, i.e., its metadata.
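To make the idea concrete, here is a minimal sketch using a small made-up table. The values and the column names WT and A_KD are hypothetical and are not taken from the ablation data. The wide layout has one column per cell type; the tidy layout has one Score per row, with Time and CellType identifying it.

```r
# Hypothetical "wide" table: one row per time point, one column per cell type.
wide <- data.frame(
  Time = c(0, 5, 10),
  WT   = c(2.8, 11.4, 9.0),
  A_KD = c(7.0, 12.1, 15.3)
)

# Tidy version: stack the two measurement columns into a single Score column
# and record which cell type each value came from.
tidy <- data.frame(
  Time     = rep(wide$Time, times = 2),
  CellType = factor(rep(c("WT", "A_KD"), each = nrow(wide))),
  Score    = c(wide$WT, wide$A_KD)
)

tidy  # six rows, one observed Score in each
```

Each row of the tidy table can now be bound directly to plot parameters, which is what ggplot expects.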
library(ggplot2)
ggplot(ablation, aes(x = Time, y = Score)) + geom_point()
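At a minimum, the two things that you need to give ggplot are the data (a data frame, with aesthetics binding its columns to plot parameters; here, Time and Score are bound to x and y) and a geom that says how to draw it (here, geom_point()).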
ggplot gives you exquisite control over plotting parameters.
Here, we’ll change the color and size of the points.
ggplot(ablation, aes(x = Time, y = Score)) + geom_point(colour = "red", size = 4)
Aesthetics are used to bind plotting parameters to your data. The
aes() function defines which variables you want to plot,
and which plot parameters to map them to. Any aspect of the graph can be
tied to any variable.
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4)
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment, shape = CellType), size = 4)
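The difference between setting a parameter (outside aes()) and binding it to a variable (inside aes()) can be checked on a toy data frame. This is only a sketch: the toy data below is hypothetical and stands in for the ablation data. It uses layer_data() from ggplot2 to look at the values computed for a layer.

```r
library(ggplot2)

# Hypothetical toy data, standing in for the ablation data frame.
toy <- data.frame(
  Time       = rep(c(0, 5, 10), times = 2),
  Score      = c(3, 11, 9, 7, 15, 22),
  Experiment = factor(rep(c("E1", "E2"), each = 3))
)

# Set: a colour given outside aes() is a constant applied to every point.
p_set <- ggplot(toy, aes(x = Time, y = Score)) +
  geom_point(color = "red", size = 4)

# Bound: a colour given inside aes() is looked up per row from Experiment.
p_mapped <- ggplot(toy, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4)

# layer_data() returns the per-point values ggplot computed for a layer.
unique(layer_data(p_set)$colour)             # a single colour
length(unique(layer_data(p_mapped)$colour))  # one colour per Experiment level
```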
When using ggplot, layers are added to a ggplot object. You can add as many layers as you like.
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  geom_text(aes(label = CellType), hjust = -0.3, size = 3)
… and tidy it up a little
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  geom_text(aes(label = CellType), hjust = -0.3, size = 3) +
  xlim(0, 33)
It is sometimes useful to save off the base ggplot object and add layers in separate commands. The plot is only rendered when R “prints” the object. This is useful for several reasons:
p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement), size = 4)
p <- p + geom_line(aes(group = interaction(Experiment, Measurement, CellType),
                       color = Experiment,
                       linetype = CellType))
print(p) # plot gets rendered now
Sourcing a file will not automatically generate output, so here we explicitly ask for the plot to be printed.
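As a rough illustration of the point about rendering, the sketch below builds a plot object without drawing it, and then writes it to a file with ggsave(). The toy data and the temporary file name are assumptions made for the example.

```r
library(ggplot2)

# Hypothetical toy data, standing in for the ablation data frame.
toy <- data.frame(Time = c(0, 5, 10, 20, 30),
                  Score = c(3, 11, 9, 28, 43))

# Building the object draws nothing; it is just an R value we can keep adding to.
p_toy <- ggplot(toy, aes(x = Time, y = Score))
p_toy <- p_toy + geom_point(size = 4)

# Inside a sourced script, a function, or a loop, ask for the rendering explicitly.
print(p_toy)

# A saved object can also be written to a file; the path here is a temporary file.
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p_toy, width = 4, height = 3)
```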
Here we’ve added a layer that plots lines. We want a separate line
for each unique combination of Experiment, Measurement, and CellType.
The interaction() function takes a set of factors, and
computes a composite factor. To see what this does …
interaction(ablation$Experiment, ablation$Measurement, ablation$CellType)
## [1] E1909.LDLR-ABLATION.WT E1909.LDLR-ABLATION.WT E1909.LDLR-ABLATION.WT
## [4] E1909.LDLR-ABLATION.WT E1909.LDLR-ABLATION.WT E1909.LDLR-ABLATION.A-KD
## [7] E1909.LDLR-ABLATION.A-KD E1909.LDLR-ABLATION.A-KD E1909.LDLR-ABLATION.A-KD
## [10] E1909.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.WT E1915.LDLR-ABLATION.WT
## [13] E1915.LDLR-ABLATION.WT E1915.LDLR-ABLATION.WT E1915.LDLR-ABLATION.WT
## [16] E1915.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.A-KD
## [19] E1915.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.WT
## [22] E1921.LDLR-ABLATION.WT E1921.LDLR-ABLATION.WT E1921.LDLR-ABLATION.WT
## [25] E1921.LDLR-ABLATION.WT E1921.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.A-KD
## [28] E1921.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.A-KD
## [31] E1909.TfR-ABLATION.WT E1909.TfR-ABLATION.WT E1909.TfR-ABLATION.WT
## [34] E1909.TfR-ABLATION.WT E1909.TfR-ABLATION.WT E1909.TfR-ABLATION.A-KD
## [37] E1909.TfR-ABLATION.A-KD E1909.TfR-ABLATION.A-KD E1909.TfR-ABLATION.A-KD
## [40] E1909.TfR-ABLATION.A-KD E1915.TfR-ABLATION.WT E1915.TfR-ABLATION.WT
## [43] E1915.TfR-ABLATION.WT E1915.TfR-ABLATION.WT E1915.TfR-ABLATION.WT
## [46] E1915.TfR-ABLATION.A-KD E1915.TfR-ABLATION.A-KD E1915.TfR-ABLATION.A-KD
## [49] E1915.TfR-ABLATION.A-KD E1915.TfR-ABLATION.A-KD E1921.TfR-ABLATION.WT
## [52] E1921.TfR-ABLATION.WT E1921.TfR-ABLATION.WT E1921.TfR-ABLATION.WT
## [55] E1921.TfR-ABLATION.WT E1921.TfR-ABLATION.A-KD E1921.TfR-ABLATION.A-KD
## [58] E1921.TfR-ABLATION.A-KD E1921.TfR-ABLATION.A-KD E1921.TfR-ABLATION.A-KD
## 12 Levels: E1909.LDLR-ABLATION.A-KD ... E1921.TfR-ABLATION.WT
This composite factor is passed to the group aesthetic of
geom_line() to inform ggplot which data values go
together.
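The same behaviour is easier to see on a small case. The two factors below are hypothetical and are much shorter than the columns of the ablation data frame.

```r
# Two hypothetical factors describing four observations.
experiment <- factor(c("E1", "E1", "E2", "E2"))
cell_type  <- factor(c("WT", "KD", "WT", "KD"))

# interaction() pastes the levels together, one composite value per observation.
combo <- interaction(experiment, cell_type)

as.character(combo)  # "E1.WT" "E1.KD" "E2.WT" "E2.KD"
nlevels(combo)       # 4: every combination of 2 experiments and 2 cell types
```

Every distinct composite level becomes its own group, so geom_line() draws one line per level.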
We have also added the shape binding to geom_point().
The shape of each point is determined by the corresponding Measurement.
Note that ggplot prefers six or fewer distinct shapes (i.e., that there
are no more than six levels in the corresponding factor). You can,
however, define more using a command like
scale_shape_manual(values = 1:11).
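A minimal sketch of this, assuming a hypothetical factor with eight levels (more than the default shape palette handles):

```r
library(ggplot2)

# Hypothetical data: eight groups, one point each.
toy <- data.frame(x = 1:8, y = 1:8, grp = factor(letters[1:8]))

# Supply one shape code per level; codes are assigned in the order of the levels.
p8 <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(shape = grp), size = 4) +
  scale_shape_manual(values = 1:8)

print(p8)
```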
Here we specify the shapes we want to use, and dodge the points slightly (with position_dodge()) so they no longer sit directly on top of one another.
p <- ggplot(ablation, aes(x = Time, y = Score))
p + geom_point(aes(color = Experiment, shape = Measurement),
               size = 4, position = position_dodge(0.5)) +
  scale_shape_manual(values = c(1, 16))
We’ll show you how to draw a plot listing all the possible shapes at the end of this section.
When plotting many points, controlling opacity (using alpha) can also be useful.
p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement),
                    size = 4, alpha = 0.5)
print(p)
Some layers don’t plot data, but affect the plot in other ways. For
example, there are layers that control plot labels and the plot theme.
There are eight themes built in to ggplot2, and many more are available:
see, for example, the ggthemes package from CRAN, the example plots at
https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/,
or the R CHARTS site at https://r-charts.com/ggplot2/themes/. The
labs() function also modifies legend labels.
p <- p + labs(title = "Ablation", x = "Time (minutes)", y = "% Saturation")
p <- p + theme_bw() + theme(plot.title = element_text(hjust = 0.5))
print(p)
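To relabel a legend with labs(), name the aesthetic that the legend belongs to. The sketch below uses hypothetical toy data, and the label text "Experiment ID" is made up for the example.

```r
library(ggplot2)

# Hypothetical toy data, standing in for the ablation data frame.
toy <- data.frame(
  Time       = rep(c(0, 5, 10), times = 2),
  Score      = c(3, 11, 9, 7, 15, 22),
  Experiment = factor(rep(c("E1", "E2"), each = 3))
)

# The argument name in labs() matches the aesthetic: color = ... titles the
# colour legend, just as x = ... titles the x axis.
p_lab <- ggplot(toy, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  labs(x = "Time (minutes)", color = "Experiment ID")

print(p_lab)
```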
ggplot gives you control over the scales of your plot. There is one scale for each binding. In the plot we just made, there are four scales that we can manipulate: the x and y axes, color, and shape. We also previously introduced a linetype scale.
Let’s change our x-axis to include the 5-minute timepoint. This is achieved with yet another layer.
p + scale_x_continuous(breaks = c(0, 5, 10, 20, 30))
p + scale_x_continuous(breaks = unique(ablation$Time))
Tip: In the second example above, we have computed the breaks from the data, rather than listing them individually. This makes the code we are writing usable even when the data changes. This is an essential strategy for reproducibly analyzing data at scale.
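As a small sketch of this strategy on hypothetical toy data, the breaks can be computed once, sorted, and reused:

```r
library(ggplot2)

# Hypothetical toy data with the same time points as the ablation data.
toy <- data.frame(
  Time  = rep(c(0, 5, 10, 20, 30), times = 2),
  Score = c(3, 11, 9, 28, 43, 7, 15, 22, 30, 51)
)

# Derive the breaks from the data; sort() guards against unordered input.
time_breaks <- sort(unique(toy$Time))

p_breaks <- ggplot(toy, aes(x = Time, y = Score)) +
  geom_point(size = 4) +
  scale_x_continuous(breaks = time_breaks)

print(p_breaks)
```

If a new time point is added to the data, the axis picks it up without any change to the plotting code.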
We can also manipulate legends with scale layers.
p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement), size = 4)
p <- p + geom_line(aes(group = interaction(Experiment, Measurement, CellType),
                       color = Experiment,
                       linetype = CellType))
p + scale_shape_manual(values = c(1, 16),
                       labels = gsub("-ABLATION", "", levels(ablation$Measurement))) +
  scale_linetype_discrete(name = "Cell type") +
  scale_color_manual(values = c("darkmagenta", "orange", "chartreuse4"))
Here we provide the labels for the Measurement scale (remember that we used an aesthetic to bind shape to Measurement). Note that ggplot will always order the labels according to the levels of the underlying factor, so the labels should be provided in that order. If you want to change the order in which the legend elements are displayed, change the underlying factor.
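Changing the underlying factor might look like the following sketch. The vector is hypothetical; in the ablation data the same idea would apply to a column such as CellType.

```r
# A hypothetical factor; levels are sorted alphabetically by default.
cell_type <- factor(c("WT", "A-KD", "WT", "A-KD"))
levels(cell_type)  # "A-KD" "WT"

# Re-create the factor with the levels in the order the legend should show them.
cell_type <- factor(cell_type, levels = c("WT", "A-KD"))
levels(cell_type)  # "WT" "A-KD"
```

The values themselves are unchanged; only the level order, and therefore the legend order, differs.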
We have also changed the title of the CellType legend (the linetype binding) to be two words, and used a different color palette (for the binding to Experiment).
You can use built-in color palettes from ColorBrewer (https://colorbrewer2.org). To see all available palettes:
library(RColorBrewer)
display.brewer.all()
display.brewer.all(colorblindFriendly = TRUE)
and to use a ColorBrewer palette in your plot:
p + scale_color_brewer(palette = "Dark2")
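If you want the palette colors themselves, for example to pass them to scale_color_manual(), brewer.pal() from RColorBrewer returns them as a character vector. The sketch below uses hypothetical toy data with three experiments.

```r
library(ggplot2)
library(RColorBrewer)

# Three colours from the "Dark2" palette, as hexadecimal colour strings.
pal <- brewer.pal(n = 3, name = "Dark2")

# Hypothetical toy data with three experiments.
toy <- data.frame(
  Time       = rep(c(0, 5, 10), times = 3),
  Score      = c(3, 11, 9, 7, 15, 22, 5, 9, 14),
  Experiment = factor(rep(c("E1", "E2", "E3"), each = 3))
)

# Use the extracted colours as a manual colour scale.
p_pal <- ggplot(toy, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  scale_color_manual(values = pal)

print(p_pal)
```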
This plot is probably showing too much data at once. One approach to resolve this would be to make separate plots for the LDLR and TfR measurements. You can make multiple plots at once using facets. Here are a few options.
p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement),
                    size = 4)
p + facet_wrap(~Measurement)
p + facet_grid(. ~ Measurement)
p + facet_grid(Measurement ~ .)
p <- p + geom_line(aes(group = CellType, linetype = CellType))
p + facet_grid(Experiment ~ Measurement)
p + facet_grid(Measurement ~ Experiment)
In these plots, you can remove the color and shape legends entirely (an option that can be specified in each of the respective legend layers) …
p + facet_grid(Measurement ~ Experiment) +
  scale_color_discrete(guide = "none") +
  scale_shape_discrete(guide = "none")
… or you may no longer want to bind the Measurement and Experiment variables to shape and color at all.
Tip: The facet_wrap() function in ggplot can be used to wrap a 1D ribbon of plots into a 2D layout. You can also use the gridExtra
package to place independently generated plots on the same page.
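A minimal sketch of the gridExtra approach, assuming the package is installed and using hypothetical toy data. arrangeGrob() builds the combined layout, and grid.arrange() does the same and draws it immediately.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical toy data, standing in for the ablation data frame.
toy <- data.frame(Time = c(0, 5, 10, 20, 30),
                  Score = c(3, 11, 9, 28, 43))

# Two independently generated plots.
p_points <- ggplot(toy, aes(x = Time, y = Score)) + geom_point(size = 4)
p_lines  <- ggplot(toy, aes(x = Time, y = Score)) + geom_line()

# Build a one-row, two-column layout without drawing it yet ...
combined <- arrangeGrob(p_points, p_lines, ncol = 2)

# ... and draw both plots side by side on the same page.
grid.arrange(p_points, p_lines, ncol = 2)
```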
As promised above, here we show some code that prints a reference chart of all 26 available shapes (numbered 0 to 25). Note that several shapes can also have a fill aesthetic.
i <- 0:25; x <- i %/% 6; y <- i %% 6
df <- data.frame(name = i, row = x, column = y)
ggplot(df, aes(x = row, y = column)) +
  geom_point(shape = i, fill = "yellow", size = 4) +
  geom_text(label = i, vjust = 2) +
  scale_y_reverse(expand = expansion(add = 0.5)) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(), axis.text.y = element_blank())
For more inspiration on pretty R-based graphics, see https://www.r-graph-gallery.com/.