Introduction to ggplot

Getting set up
- Importing our dataset
- Our first plot
Manipulating aspects of our plot
Color palettes
Faceting plots
Printing all the possible shapes
Further inspiration

Getting set up

Up to now, we have used R’s base plotting functions. The ggplot2 package, written by Hadley Wickham (https://hadley.nz/), is more comprehensive and consistent. Its extensive documentation, with examples, can be found at
https://ggplot2.tidyverse.org/reference/

ggplot2 relies entirely on data frames for input.
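If your values are still in separate vectors, combine them into a data frame first and then pass that to ggplot(). The vector names time and score below are only illustrative. The values are the E1909 WT time course from the head() output further down.

```r
library(ggplot2)

# Two plain vectors (the E1909 WT time course from the head() output)
time  <- c(0, 5, 10, 20, 30)
score <- c(2.82, 11.37, 9.03, 28.27, 42.86)

# Combine them into a data frame; each vector becomes a named column
df <- data.frame(Time = time, Score = score)

# The data frame is passed as the first argument to ggplot()
p <- ggplot(df, aes(x = Time, y = Score)) + geom_point()
print(p)
```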

Importing our dataset

Let’s make our first ggplot with a dataset of ablation data (ablation, downloadable from the website):

library(ggplot2)  # ggplot() and the geom_*() functions come from this package
ablation <- read.csv("ablation.csv", header = TRUE, stringsAsFactors = TRUE)
names(ablation)[names(ablation) == "SCORE"] <- "Score"
head(ablation)

##     Measurement Experiment CellType Direction Time Score
## 1 LDLR-ABLATION      E1909       WT       ABL    0  2.82
## 2 LDLR-ABLATION      E1909       WT       ABL    5 11.37
## 3 LDLR-ABLATION      E1909       WT       ABL   10  9.03
## 4 LDLR-ABLATION      E1909       WT       ABL   20 28.27
## 5 LDLR-ABLATION      E1909       WT       ABL   30 42.86
## 6 LDLR-ABLATION      E1909     A-KD       ABL    0  6.99

You may have noticed that the format of the ablation data frame is a bit peculiar. It is probably not what you are used to getting from your colleagues, or working with yourself. It is, however, in the canonical format for storing and manipulating data that you should be using.

The hallmark of this canonical (tidy) format is that there is only one (set of) independently observed value(s) in each row. All of the other columns are identifying values, or variables. They explain what exactly was measured, i.e., its metadata.
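Here is a sketch that makes the contrast concrete. It starts from a hypothetical "wide" table, laid out the way a spreadsheet often is (one column per time point), and converts it to the tidy layout of ablation. It assumes the tidyr package is installed and uses its pivot_longer() function. The column names T0 to T30 are made up. The scores are the E1909 WT values shown above.

```r
library(tidyr)

# Hypothetical "wide" layout: one row per Experiment/CellType and one
# column per time point (T0 ... T30 are made-up column names)
wide <- data.frame(
  Experiment = "E1909",
  CellType   = "WT",
  T0 = 2.82, T5 = 11.37, T10 = 9.03, T20 = 28.27, T30 = 42.86
)

# Reshape to tidy (long) format: one observed Score per row, with the
# time point moved out of the column names and into its own column
long <- pivot_longer(wide,
                     cols = starts_with("T"),
                     names_to = "Time",
                     names_prefix = "T",
                     values_to = "Score")
long$Time <- as.numeric(long$Time)
long
```

Each row of long now has the same shape as a row of ablation: identifying variables, then a single Score.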

Our first plot

ggplot(ablation, aes(x = Time, y = Score)) + geom_point()

At a minimum, you need to give ggplot two things:

- The dataset (which must be a data frame), together with the variable(s) you want to plot, mapped with aes()
- The type of plot you want to make, added as a geom such as geom_point()
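One way to see the two pieces separately is to build the plot in two steps. This sketch uses a small data frame typed in by hand from the E1909 WT values shown above. ggplot() on its own records the data and mappings but contains no layers. Adding a geom is what chooses the type of plot.

```r
library(ggplot2)

# E1909 WT time course from the head() output above, typed in by hand
df <- data.frame(Time  = c(0, 5, 10, 20, 30),
                 Score = c(2.82, 11.37, 9.03, 28.27, 42.86))

# Step 1: data + aesthetic mappings only; no plot type chosen yet
base <- ggplot(df, aes(x = Time, y = Score))
length(base$layers)  # 0

# Step 2: adding a geom chooses the plot type; same data, same mappings
scatter_plot <- base + geom_point()
line_plot    <- base + geom_line()
length(scatter_plot$layers)  # 1
```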

Manipulating aspects of our plot

ggplot gives you exquisite control over plotting parameters.

Changing color and size

Here, we’ll change the color and size of the points.

ggplot(ablation, aes(x = Time, y = Score)) + geom_point(colour = "red", size = 4)

Binding variables to plotting parameters

Aesthetics are used to bind plotting parameters to your data. The aes() function defines which variables you want to plot, and which plot parameters to map them to. Almost any visual property of a layer (position, color, shape, size, line type, and so on) can be tied to a variable.

ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4)

ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment, shape = CellType), size = 4)
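Where a parameter goes matters. Outside aes(), it is set to a fixed value, as with colour = "red" above. Inside aes(), it is mapped to a variable. A common slip is to put a constant inside aes(), which ggplot then treats as data.

The sketch below contrasts the three cases on made-up toy data; the experiment names E1 and E2 are hypothetical. It uses layer_data() to inspect the colors ggplot actually computed.

```r
library(ggplot2)

# Hypothetical toy data (made-up values): two experiments, three time points each
df <- data.frame(Time = rep(c(0, 10, 20), times = 2),
                 Score = c(3, 9, 28, 7, 12, 30),
                 Experiment = rep(c("E1", "E2"), each = 3))

# Setting: every point is red and no color legend is created
set_red <- ggplot(df, aes(x = Time, y = Score)) +
  geom_point(colour = "red", size = 4)

# Mapping: the color is computed from Experiment and a legend is created
mapped <- ggplot(df, aes(x = Time, y = Score)) +
  geom_point(aes(colour = Experiment), size = 4)

# Pitfall: a constant inside aes() becomes a one-level variable called "red",
# so the points get the default palette color rather than red
wrong <- ggplot(df, aes(x = Time, y = Score)) +
  geom_point(aes(colour = "red"), size = 4)

unique(layer_data(set_red)$colour)
unique(layer_data(mapped)$colour)
unique(layer_data(wrong)$colour)
```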

Including multiple layers

When using ggplot, layers are added to a ggplot object. You can add as many layers as you like.

ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  geom_text(aes(label = CellType), hjust = -0.3, size = 3)

… and tidy it up a little (xlim() widens the x-axis so that the labels next to the rightmost points are not cut off)

ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  geom_text(aes(label = CellType), hjust = -0.3, size = 3) +
  xlim(0, 33)

Building up a ggplot object

It is sometimes useful to save the base ggplot object and add layers in separate commands. The plot is only rendered when R “prints” the object. This is useful for several reasons:

- We don’t need one huge command to create a plot; we can build it piecemeal.
- The plot is not rendered until it has received all of its information, which allows ggplot2 to be more intelligent than R’s built-in plotting commands when deciding how large a plot should be, what the best scale is, etc.

p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement), size = 4)
p <- p + geom_line(aes(group = interaction(Experiment, Measurement, CellType),
                       color = Experiment,
                       linetype = CellType))
print(p) # plot gets rendered now

Sourcing a file will not automatically generate output, so here we explicitly ask for the plot to be printed.
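The same rule applies inside for loops and functions: a ggplot object that is created but not printed is never drawn. The sketch below uses made-up toy data with hypothetical measurement names. It builds one plot per measurement, keeps the plots in a list, and prints each one explicitly.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) with two measurements
df <- data.frame(Time = rep(c(0, 10, 20), times = 2),
                 Score = c(3, 9, 28, 5, 14, 22),
                 Measurement = rep(c("LDLR", "TfR"), each = 3))

plots <- list()
for (m in unique(df$Measurement)) {
  p <- ggplot(subset(df, Measurement == m), aes(x = Time, y = Score)) +
    geom_point() +
    labs(title = m)
  plots[[m]] <- p  # keep the object for later use
  # Inside a loop, nothing is drawn unless the plot is printed explicitly
  print(p)
}
```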

Here we’ve added a layer that plots lines. We want a separate line for each unique combination of Experiment, Measurement, and CellType. The interaction() function takes a set of factors, and computes a composite factor. To see what this does …

  interaction(ablation$Experiment, ablation$Measurement, ablation$CellType)

##  [1] E1909.LDLR-ABLATION.WT   E1909.LDLR-ABLATION.WT   E1909.LDLR-ABLATION.WT  
##  [4] E1909.LDLR-ABLATION.WT   E1909.LDLR-ABLATION.WT   E1909.LDLR-ABLATION.A-KD
##  [7] E1909.LDLR-ABLATION.A-KD E1909.LDLR-ABLATION.A-KD E1909.LDLR-ABLATION.A-KD
## [10] E1909.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.WT   E1915.LDLR-ABLATION.WT  
## [13] E1915.LDLR-ABLATION.WT   E1915.LDLR-ABLATION.WT   E1915.LDLR-ABLATION.WT  
## [16] E1915.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.A-KD
## [19] E1915.LDLR-ABLATION.A-KD E1915.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.WT  
## [22] E1921.LDLR-ABLATION.WT   E1921.LDLR-ABLATION.WT   E1921.LDLR-ABLATION.WT  
## [25] E1921.LDLR-ABLATION.WT   E1921.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.A-KD
## [28] E1921.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.A-KD E1921.LDLR-ABLATION.A-KD
## [31] E1909.TfR-ABLATION.WT    E1909.TfR-ABLATION.WT    E1909.TfR-ABLATION.WT   
## [34] E1909.TfR-ABLATION.WT    E1909.TfR-ABLATION.WT    E1909.TfR-ABLATION.A-KD 
## [37] E1909.TfR-ABLATION.A-KD  E1909.TfR-ABLATION.A-KD  E1909.TfR-ABLATION.A-KD 
## [40] E1909.TfR-ABLATION.A-KD  E1915.TfR-ABLATION.WT    E1915.TfR-ABLATION.WT   
## [43] E1915.TfR-ABLATION.WT    E1915.TfR-ABLATION.WT    E1915.TfR-ABLATION.WT   
## [46] E1915.TfR-ABLATION.A-KD  E1915.TfR-ABLATION.A-KD  E1915.TfR-ABLATION.A-KD 
## [49] E1915.TfR-ABLATION.A-KD  E1915.TfR-ABLATION.A-KD  E1921.TfR-ABLATION.WT   
## [52] E1921.TfR-ABLATION.WT    E1921.TfR-ABLATION.WT    E1921.TfR-ABLATION.WT   
## [55] E1921.TfR-ABLATION.WT    E1921.TfR-ABLATION.A-KD  E1921.TfR-ABLATION.A-KD 
## [58] E1921.TfR-ABLATION.A-KD  E1921.TfR-ABLATION.A-KD  E1921.TfR-ABLATION.A-KD 
## 12 Levels: E1909.LDLR-ABLATION.A-KD ... E1921.TfR-ABLATION.WT

This composite factor is passed to the group aesthetic of geom_line() to inform ggplot which data values go together.
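The sketch below shows why the group aesthetic is needed by drawing the same made-up data with and without it. Without a grouping variable, geom_line() joins every point into a single line that zig-zags between the two cell types at each time point. With group = CellType, there is one line per cell type. layer_data() is used here only to count the groups ggplot computed.

```r
library(ggplot2)

# Hypothetical toy data (made-up values): two cell types at the same time points
df <- data.frame(Time = rep(c(0, 10, 20), times = 2),
                 Score = c(3, 9, 28, 7, 12, 30),
                 CellType = rep(c("WT", "A-KD"), each = 3))

# No grouping variable: all points are joined into a single line
one_line <- ggplot(df, aes(x = Time, y = Score)) +
  geom_line()

# group = CellType: one line per cell type
two_lines <- ggplot(df, aes(x = Time, y = Score)) +
  geom_line(aes(group = CellType))

length(unique(layer_data(one_line)$group))   # 1
length(unique(layer_data(two_lines)$group))  # 2
```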

We have also added the shape binding to geom_point(). The shape of each point is determined by the corresponding Measurement. Note that ggplot’s default shape palette handles at most six distinct shapes (i.e., no more than six levels in the corresponding factor). With more levels, it issues a warning and the points belonging to the extra levels are not drawn. You can, however, define more shapes yourself using a command like scale_shape_manual(values = 1:11).
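Here is a sketch of that workaround on a made-up data frame with eight categories; the letters A to H are placeholders. With the default palette, mapping shape to this eight-level factor would trigger the six-shape warning. scale_shape_manual() avoids this by supplying one shape code per level.

```r
library(ggplot2)

# Hypothetical toy data: eight categories, two more than the default palette supports
df <- data.frame(x = 1:8, y = 1:8, category = factor(LETTERS[1:8]))

# One shape code per level, so every point gets a shape
p <- ggplot(df, aes(x = x, y = y, shape = category)) +
  geom_point(size = 4) +
  scale_shape_manual(values = 1:8)
print(p)
```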

Modifying the default behavior of `geom`s

Here we specify the shapes we want to use, and dodge the points slightly (with position_dodge()) so that points at the same time no longer sit directly on top of one another.

p <- ggplot(ablation, aes(x = Time, y = Score))
p + geom_point(aes(color = Experiment, shape = Measurement),
                   size = 4, position = position_dodge(0.5)) +
  scale_shape_manual(values = c(1,16))
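The code above uses position_dodge(), which shifts each group (here, each Experiment/Measurement combination) sideways by a fixed amount. To scatter overlapping points randomly instead, use position_jitter().

Here is a minimal sketch on made-up data. The width of 0.5 is an arbitrary choice. height = 0 keeps the y values exact, and seed makes the random offsets reproducible.

```r
library(ggplot2)

# Hypothetical toy data (made-up values): several points share the same Time
df <- data.frame(Time = rep(c(0, 10), each = 3),
                 Score = c(5, 5, 5, 20, 20, 21))

# Random horizontal noise of at most 0.5 in either direction; y is left unchanged
p <- ggplot(df, aes(x = Time, y = Score)) +
  geom_point(size = 4,
             position = position_jitter(width = 0.5, height = 0, seed = 1))
print(p)
```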

We’ll show you how to draw a plot listing all the possible shapes at the end of this section.

When plotting many points, controlling opacity (using alpha) can also be useful.

p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement),
                    size = 4, alpha = 0.5)
print(p)

Controlling non-data plot elements

Some layers don’t plot data, but affect the plot in other ways. For example, there are layers that control plot labels and the plot theme. There are eight themes built into ggplot2, and many more are available elsewhere: see, for example, the ggthemes package on CRAN (example plots at https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/) or the R CHARTS site at https://r-charts.com/ggplot2/themes/. Besides the title and axis labels, the labs() function also sets legend titles.

p <- p + labs(title = "Ablation", x = "Time (minutes)", y = "% Saturation")
p <- p + theme_bw() + theme(plot.title = element_text(hjust = 0.5))  # hjust = 0.5 centers the title
print(p)
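Because labs() works per aesthetic, the same call can also rename legends. The legend title for a color mapping is set with colour = (or color =). Here is a sketch on made-up toy data; the title "Experiment ID" is just an example.

```r
library(ggplot2)

# Hypothetical toy data (made-up values)
df <- data.frame(Time = rep(c(0, 10, 20), times = 2),
                 Score = c(3, 9, 28, 7, 12, 30),
                 Experiment = rep(c("E1", "E2"), each = 3))

p <- ggplot(df, aes(x = Time, y = Score, colour = Experiment)) +
  geom_point(size = 4) +
  # Plot title, axis titles, and the title of the color legend
  labs(title = "Ablation", x = "Time (minutes)", y = "% Saturation",
       colour = "Experiment ID") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
print(p)
```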

ggplot gives you control over the scales of your plot. There is one scale for each binding. In the plot we just made, there are four scales that we can manipulate: the x and y axes, color, and shape. We also previously introduced a linetype scale.

Let’s change our x-axis to include the 5 minute timepoint. This is achieved with yet another layer.

p + scale_x_continuous(breaks = c(0, 5, 10, 20, 30))

p + scale_x_continuous(breaks = unique(ablation$Time))

Tip: In the second example above, we have computed the breaks from the data, rather than listing them individually. This makes the code we are writing usable even when the data changes. This is an essential strategy for reproducibly analyzing data at scale.

More scale manipulations

We can also manipulate legends with scale layers.

p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement), size = 4)
p <- p + geom_line(aes(group = interaction(Experiment, Measurement, CellType),
                       color = Experiment,
                       linetype = CellType))
p + scale_shape_manual(values = c(1,16), labels = gsub("-ABLATION", "", levels(ablation$Measurement))) +
         scale_linetype_discrete(name = "Cell type") +
         scale_color_manual(values = c("darkmagenta", "orange", "chartreuse4"))

Here we provide the labels for the Measurement scale (remember that we used an aesthetic to bind shape to Measurement). Note that ggplot will always order the labels according to the levels of the underlying factor, so the labels should be provided in that order. If you want to change the order in which the legend elements are displayed, change the underlying factor.

We have also changed the title of the CellType legend (the linetype binding) to be two words, and used a different color palette (for the binding to Experiment).
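Here is a minimal sketch of reordering a legend by changing the underlying factor, on made-up toy data. By default, factor() (and read.csv() with stringsAsFactors = TRUE) orders levels alphabetically, so LDLR comes before TfR. Passing levels = to factor() puts TfR first, and the labels given to the scale then follow that new order.

```r
library(ggplot2)

# Hypothetical toy data (made-up values)
df <- data.frame(Time = rep(c(0, 10), times = 2),
                 Score = c(3, 9, 5, 14),
                 Measurement = factor(rep(c("LDLR-ABLATION", "TfR-ABLATION"), each = 2)))
levels(df$Measurement)  # alphabetical: "LDLR-ABLATION" "TfR-ABLATION"

# Put TfR first by reordering the factor levels
df$Measurement <- factor(df$Measurement, levels = c("TfR-ABLATION", "LDLR-ABLATION"))

p <- ggplot(df, aes(x = Time, y = Score, shape = Measurement)) +
  geom_point(size = 4) +
  # values and labels are matched to the levels in their new order
  scale_shape_manual(values = c(1, 16), labels = c("TfR", "LDLR"))
print(p)
```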

Color palettes

You can use built-in color palettes from ColorBrewer (https://colorbrewer2.org). To see all available palettes:

library(RColorBrewer)
display.brewer.all()

To show only the colorblind-friendly palettes:

display.brewer.all(colorblindFriendly = TRUE)

and to use a ColorBrewer palette in your plot:

p + scale_color_brewer(palette = "Dark2")
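scale_color_brewer() chooses the colors for you. If you need the actual hex codes, for example to reuse them in scale_color_manual() or in another figure, RColorBrewer’s brewer.pal() returns them. Here is a sketch on made-up toy data with three hypothetical experiments.

```r
library(ggplot2)
library(RColorBrewer)

# Hypothetical toy data (made-up values): three experiments
df <- data.frame(Time = rep(c(0, 10), times = 3),
                 Score = c(3, 9, 5, 14, 4, 11),
                 Experiment = rep(c("E1", "E2", "E3"), each = 2))

# The first three colors of the Dark2 palette, as hex codes
dark2 <- brewer.pal(3, "Dark2")
dark2  # "#1B9E77" "#D95F02" "#7570B3"

# scale_color_brewer() assigns these same colors to the three experiments
p <- ggplot(df, aes(x = Time, y = Score, colour = Experiment)) +
  geom_point(size = 4) +
  scale_color_brewer(palette = "Dark2")
print(p)
```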

Faceting plots

This plot is probably showing too much data at once. One approach to resolve this would be to make separate plots for the LDLR and TfR measurements. You can make multiple plots at once using facets. Here are a few options.

p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement),
                    size = 4)

p + facet_wrap(~Measurement)

p + facet_grid(. ~ Measurement)

p + facet_grid(Measurement ~ .)

p <- p + geom_line(aes(group = CellType, linetype = CellType))
p + facet_grid(Experiment ~ Measurement)

p + facet_grid(Measurement ~ Experiment)

In these plots, you can remove the color and shape legends entirely (an option that can be specified in each of the respective scale layers) …

p + facet_grid(Measurement ~ Experiment) +
  scale_color_discrete(guide = "none") + 
  scale_shape_discrete(guide = "none")

… or you may no longer want to bind the Measurement and Experiment variables to shape and color at all.

Tip: The facet_wrap() function in ggplot can be used to wrap a 1D ribbon of plots into a 2D layout. You can also use the gridExtra package to place independently generated plots on the same page.
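Here is a sketch of that wrapping on made-up toy data with three hypothetical experiments. facet_wrap() with ncol = 2 places the three panels in a grid two panels wide, instead of one long row.

```r
library(ggplot2)

# Hypothetical toy data (made-up values): three experiments
df <- data.frame(Time = rep(c(0, 10, 20), times = 3),
                 Score = c(3, 9, 28, 7, 12, 30, 4, 10, 25),
                 Experiment = rep(c("E1", "E2", "E3"), each = 3))

# Three panels wrapped into a grid with at most two columns
p <- ggplot(df, aes(x = Time, y = Score)) +
  geom_point() +
  facet_wrap(~ Experiment, ncol = 2)
print(p)
```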

Printing all the possible shapes

As promised above, here is some code that prints a reference chart of all 26 available shapes. Note that shapes 21 to 25 also take a fill aesthetic (yellow in this chart).

i <- 0:25; x <- i %/% 6; y <- i %% 6
df <- data.frame(name = i, row = x, column = y)
ggplot(df, aes(x = row, y = column)) +
  geom_point(shape = i, fill = "yellow", size = 4) +
  geom_text(label = i, vjust = 2) +
  scale_y_reverse(expand = expansion(add = 0.5)) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(), axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

Further inspiration

For more inspiration into pretty R-based graphics, see https://www.r-graph-gallery.com/.

Introduction to ggplot

Luce

10 October 2024
