Up to now, we have used R’s basic plotting functionality. The ggplot2 package, written by Hadley Wickham (http://had.co.nz/), is more comprehensive and consistent. Extensive documentation for ggplot2, with examples, can be found at
https://ggplot2.tidyverse.org/reference/

ggplot2 relies entirely on data frames for input.

Let’s make our first ggplot with the ablation data that we imported at the beginning of this semester:

ablation <- read.csv("ablation.csv", header = TRUE, stringsAsFactors = TRUE)
names(ablation)[names(ablation) == "SCORE"] <- "Score"
head(ablation)
##     Measurement Experiment CellType Direction Time Score
## 1 LDLR-ABLATION      E1909       WT       ABL    0  2.82
## 2 LDLR-ABLATION      E1909       WT       ABL    5 11.37
## 3 LDLR-ABLATION      E1909       WT       ABL   10  9.03
## 4 LDLR-ABLATION      E1909       WT       ABL   20 28.27
## 5 LDLR-ABLATION      E1909       WT       ABL   30 42.86
## 6 LDLR-ABLATION      E1909     A-KD       ABL    0  6.99

You may have noticed that the format of the ablation data frame is a bit peculiar. The Excel sheet you imported for the plotting exercise is probably not what you are used to getting from your colleagues, or working with yourself. It is, however, in the canonical format for storing and manipulating data that you should be using.

The hallmark of this canonical (tidy) format is that there is only one (set of) independently observed value(s) in each row. All of the other columns are identifying values. They explain what exactly was measured, i.e., they are metadata.
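As a small illustration of the difference, the sketch below builds a hypothetical "wide" table (one column of scores per cell type, as spreadsheets are often laid out) and rearranges it into the tidy layout using base R. The values and the names wide and tidy are made up for illustration and are not part of the ablation data.

```r
# Hypothetical "wide" layout: one column of scores per cell type (made-up values)
wide <- data.frame(
  Time = c(0, 5, 10),
  WT   = c(1.0, 2.0, 3.0),
  AKD  = c(1.5, 2.5, 3.5)
)

# Tidy layout: one observed value (Score) per row; the other columns identify it
tidy <- data.frame(
  Time     = rep(wide$Time, times = 2),
  CellType = rep(c("WT", "A-KD"), each = nrow(wide)),
  Score    = c(wide$WT, wide$AKD)
)
tidy
```

Each row of tidy now holds a single score, and Time and CellType say which measurement it is.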

library(ggplot2)
ggplot(ablation, aes(x = Time, y = Score)) + geom_point()

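At a minimum, you need to give ggplot two things: the data frame together with the variables you want to plot (mapped with aes()), and the type of plot you want, added as a geom layer (here, geom_point()).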

ggplot gives you exquisite control over plotting parameters. Here, we’ll change the color and size of the points.

ggplot(ablation, aes(x = Time, y = Score)) + geom_point(color = "red", size = 4)

Aesthetics are used to bind plotting parameters to your data. The aes() function defines which variables you want to plot, and which plot parameters to map them to. Any aspect of the graph can be tied to any variable.

ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4)

ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment, shape = CellType), size = 4)
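The difference between setting a parameter and mapping it is easy to miss, so here is a minimal sketch using R's built-in mtcars data (an assumption made so that the example runs without ablation.csv). A value given outside aes() is applied to every point, whereas a variable given inside aes() gets its own scale and legend.

```r
library(ggplot2)

# Setting: every point is drawn in red; no legend is produced
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red", size = 4)

# Mapping: color follows the cyl variable; ggplot adds a legend for it
p_map <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)), size = 4)

print(p_set)
print(p_map)
```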

When using ggplot, layers are added to a ggplot object. You can add as many layers as you like.

ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  geom_text(aes(label = CellType), hjust = 0, size = 3)

It is sometimes useful to save the base ggplot object and add layers in separate commands, for example so that the same base can be reused with different layers. The plot is only rendered when R “prints” the object.

p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement), size = 4)
p <- p + geom_line(aes(group = interaction(Experiment, Measurement, CellType),
                       color = Experiment,
                       linetype = CellType))
print(p) # plot gets rendered now

Sourcing a file will not automatically generate output, so here we explicitly ask for the plot to be printed.

Here we’ve added a layer that plots lines. We want a separate line for each unique combination of Experiment, Measurement, and CellType. The interaction() function takes a set of factors, and computes a composite factor. To see what this does …

interaction(ablation$Experiment, ablation$Measurement, ablation$CellType)
##  [1] E1909.LDLR-ABLATION.WT   E1909.LDLR-ABLATION.WT  
##  [3] E1909.LDLR-ABLATION.WT   E1909.LDLR-ABLATION.WT  
##  [5] E1909.LDLR-ABLATION.WT   E1909.LDLR-ABLATION.A-KD
##  [7] E1909.LDLR-ABLATION.A-KD E1909.LDLR-ABLATION.A-KD
##  [9] E1909.LDLR-ABLATION.A-KD E1909.LDLR-ABLATION.A-KD
## [11] E1915.LDLR-ABLATION.WT   E1915.LDLR-ABLATION.WT  
## [13] E1915.LDLR-ABLATION.WT   E1915.LDLR-ABLATION.WT  
## [15] E1915.LDLR-ABLATION.WT   E1915.LDLR-ABLATION.A-KD
## [17] E1915.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.A-KD
## [19] E1915.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.A-KD
## [21] E1921.LDLR-ABLATION.WT   E1921.LDLR-ABLATION.WT  
## [23] E1921.LDLR-ABLATION.WT   E1921.LDLR-ABLATION.WT  
## [25] E1921.LDLR-ABLATION.WT   E1921.LDLR-ABLATION.A-KD
## [27] E1921.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.A-KD
## [29] E1921.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.A-KD
## [31] E1909.TfR-ABLATION.WT    E1909.TfR-ABLATION.WT   
## [33] E1909.TfR-ABLATION.WT    E1909.TfR-ABLATION.WT   
## [35] E1909.TfR-ABLATION.WT    E1909.TfR-ABLATION.A-KD 
## [37] E1909.TfR-ABLATION.A-KD  E1909.TfR-ABLATION.A-KD 
## [39] E1909.TfR-ABLATION.A-KD  E1909.TfR-ABLATION.A-KD 
## [41] E1915.TfR-ABLATION.WT    E1915.TfR-ABLATION.WT   
## [43] E1915.TfR-ABLATION.WT    E1915.TfR-ABLATION.WT   
## [45] E1915.TfR-ABLATION.WT    E1915.TfR-ABLATION.A-KD 
## [47] E1915.TfR-ABLATION.A-KD  E1915.TfR-ABLATION.A-KD 
## [49] E1915.TfR-ABLATION.A-KD  E1915.TfR-ABLATION.A-KD 
## [51] E1921.TfR-ABLATION.WT    E1921.TfR-ABLATION.WT   
## [53] E1921.TfR-ABLATION.WT    E1921.TfR-ABLATION.WT   
## [55] E1921.TfR-ABLATION.WT    E1921.TfR-ABLATION.A-KD 
## [57] E1921.TfR-ABLATION.A-KD  E1921.TfR-ABLATION.A-KD 
## [59] E1921.TfR-ABLATION.A-KD  E1921.TfR-ABLATION.A-KD 
## 12 Levels: E1909.LDLR-ABLATION.A-KD ... E1921.TfR-ABLATION.WT

This composite factor is passed to the group aesthetic of geom_line() to inform ggplot which data values go together.
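To check how many lines a composite grouping will produce, you can count its levels. The sketch below uses a small made-up design (the data frame d and its column values are hypothetical) rather than the ablation data.

```r
# Hypothetical design: 2 experiments x 2 cell types, 3 time points each
d <- expand.grid(
  Time       = c(0, 5, 10),
  Experiment = c("E1", "E2"),
  CellType   = c("WT", "KD")
)

g <- interaction(d$Experiment, d$CellType)
nlevels(g)   # number of distinct groups, i.e. lines geom_line() would draw
table(g)     # number of rows (time points) that fall in each group
```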

We have also added a new binding to geom_point(). The shape of each point is determined by the corresponding Measurement. Note that, by default, ggplot supplies only six distinct shapes, so the corresponding factor should have no more than six levels. You can, however, define more using a command like scale_shape_manual(values = 1:11).
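As a sketch of the manual approach, the example below maps a factor with eight levels to shape and supplies one shape code per level. The data frame d and its category column are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: one point for each of eight categories
d <- data.frame(x = 1:8, y = 1, category = factor(LETTERS[1:8]))

p_shapes <- ggplot(d, aes(x = x, y = y)) +
  geom_point(aes(shape = category), size = 4) +
  scale_shape_manual(values = 1:8)   # one shape code per factor level
print(p_shapes)
```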

Here we specify the shapes we want to use, and dodge the points horizontally so that they no longer sit directly on top of one another.

p <- ggplot(ablation, aes(x = Time, y = Score))
p + geom_point(aes(color = Experiment, shape = Measurement),
               size = 4, position = position_dodge(0.5)) +
  scale_shape_manual(values = c(1, 16))

We’ll show you how to draw a plot listing all the possible shapes at the end of this section.

When plotting many points, controlling opacity (using alpha) can also be useful.

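p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement),
                    size = 4, alpha = 0.5)
# keep the line layer from before so the plot still has a linetype legend
p <- p + geom_line(aes(group = interaction(Experiment, Measurement, CellType),
                       color = Experiment,
                       linetype = CellType))
print(p)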

Some layers don’t plot data, but affect the plot in other ways. For example, there are layers that control plot labels and the plot theme (there are eight themes built into ggplot2 and many more available at https://jrnold.github.io/ggthemes/index.html). The labs() function can also set legend titles.

p <- p + labs(title = "Ablation", x = "Time (minutes)", y = "% Saturation")
p <- p + theme_bw() + theme(plot.title = element_text(hjust = 0.5))
print(p)
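To set a legend title with labs(), name the aesthetic that the legend belongs to. The following is a minimal sketch on the built-in mtcars data (chosen here only so that the example is self-contained); the label texts are made up.

```r
library(ggplot2)

p_labs <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl))) +
  labs(title = "Fuel economy",
       x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Cylinders") +                       # title of the color legend
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))     # center the plot title
print(p_labs)
```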

ggplot gives you control over the scales of your plot. There is one scale for each binding. In the plot we just made, there are five scales that we can manipulate: the x and y axes and the three legends.

Let’s change our x-axis to include the 5-minute time point. This is achieved with yet another layer.

p + scale_x_continuous(breaks = c(0, 5, 10, 20, 30))

p + scale_x_continuous(breaks = unique(ablation$Time))

Tip: In the second example above, we have computed the breaks from the data, rather than listing them individually. This makes the code we are writing usable even when the data changes. This is an essential strategy for reproducibly analyzing data at scale.
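One way to take this further is to wrap the plot in a function, so that the breaks are recomputed for whatever data frame is passed in. This is only a sketch: plot_scores() is a hypothetical helper, and it assumes the data frame has columns named Time and Score.

```r
library(ggplot2)

# Hypothetical helper: x-axis breaks are derived from the data on each call
plot_scores <- function(d) {
  ggplot(d, aes(x = Time, y = Score)) +
    geom_point() +
    scale_x_continuous(breaks = sort(unique(d$Time)))
}

# Made-up data sets with different time points
d1 <- data.frame(Time = c(0, 5, 10, 20, 30), Score = c(1, 3, 4, 7, 9))
d2 <- data.frame(Time = c(0, 15, 45), Score = c(2, 5, 6))

print(plot_scores(d1))
print(plot_scores(d2))
```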

We can also manipulate legends with scale layers.

p + scale_shape_manual(values = c(1, 16), labels = c("LDLR", "TfR")) +
  scale_linetype_discrete(name = "Cell type") +
  scale_color_manual(values = c("brown", "orange", "green"))

Here we provide the labels for the Measurement scale (remember that we used an aesthetic to bind shape to Measurement). Note that ggplot will always order the labels according to the levels of the underlying factor, so the labels should be provided in that order. If you want to change the order in which the legend elements are displayed, change the underlying factor.

We have also changed the title of the CellType legend (the linetype binding) to be two words and used a different color palette (for the binding to Experiment).
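To change the order of legend entries, re-create the underlying factor with its levels in the order you want. The sketch below uses a small made-up data frame d (not the ablation data) to show the idea.

```r
library(ggplot2)

# Hypothetical data with a two-level factor
d <- data.frame(
  Time        = c(0, 10, 0, 10),
  Score       = c(1, 2, 3, 4),
  Measurement = factor(c("LDLR", "LDLR", "TfR", "TfR"))
)
levels(d$Measurement)   # default level order is alphabetical

# Re-create the factor with the levels in the order wanted for the legend
d$Measurement <- factor(d$Measurement, levels = c("TfR", "LDLR"))

p_order <- ggplot(d, aes(x = Time, y = Score)) +
  geom_point(aes(shape = Measurement), size = 4) +
  scale_shape_manual(values = c(1, 16))   # values are assigned in level order
print(p_order)
```

Because values are matched to levels in order, "TfR" now gets the first shape and appears first in the legend.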

You can use built-in color palettes from ColorBrewer (http://colorbrewer2.org). To see all available palettes:

library(RColorBrewer)
display.brewer.all()

and to use a ColorBrewer palette in your plot:

p + scale_color_brewer(palette = "Dark2")

This plot is probably showing too much data at once. One approach to resolve this would be to make separate plots for the LDLR and TfR measurements. You can make multiple plots at once using facets. Here are a few options.

p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement),
                    size = 4)

p + facet_grid(Measurement ~ .)

p + facet_grid(. ~ Measurement)

p <- p + geom_line(aes(group = CellType, linetype = CellType))
p + facet_grid(Experiment ~ Measurement)

p + facet_grid(Measurement ~ Experiment)

In these plots, you can remove the color and shape legends entirely (an option that can be set in each of the respective scale layers) …

p + facet_grid(Measurement ~ Experiment) +
  scale_color_discrete(guide = "none") + scale_shape_discrete(guide = "none")

… or you may no longer want to bind the Measurement and Experiment variables to shape and color at all.

Tip: The facet_wrap() function in ggplot can be used to wrap a 1D ribbon of plots into a 2D layout. You can also use the gridExtra package to place independently generated plots on the same page.
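As a minimal sketch of facet_wrap(), the example below uses the built-in mtcars data (an assumption, so that the code runs on its own). The carb variable takes six distinct values, so the six panels are wrapped into two rows of three.

```r
library(ggplot2)

p_wrap <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ carb, ncol = 3)   # one panel per value of carb, three per row
print(p_wrap)
```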

As promised above, here is some code that draws a reference chart of all 26 available shapes (codes 0 to 25). Note that shapes 21 to 25 can also take a fill aesthetic.

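# Lay out shape codes 0 to 25 on a grid, six per column
i <- 0:25
df <- data.frame(name = i, row = i %/% 6, column = i %% 6)
ggplot(df, aes(x = row, y = column)) +
  geom_point(shape = df$name, fill = "yellow", size = 4) +  # fill only shows for shapes 21 to 25
  geom_text(aes(label = name), vjust = 2) +                 # print each shape's code beneath it
  scale_y_reverse() +
  # hide axis titles and labels (and the x-axis ticks); grid positions carry no meaning
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(), axis.text.y = element_blank())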

For more inspiration for pretty R-based graphics, see https://www.r-graph-gallery.com/.