---
title: "Introduction to ggplot"
author: "Luce"
date: "14 October 2021"
output:
  html_document:
    toc: yes
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Getting set up

Until now, we have used R's basic plotting functionality. The *ggplot2* package, written by Hadley Wickham, is more comprehensive and consistent. Extensive documentation for `ggplot`, with examples, is available on the package website.

```{r load_ggplot2_and_friends, include = FALSE}
library(tidyverse)
```

ggplot2 relies entirely on data frames for input.

### Importing our dataset

Let's make our first ggplot with a dataset of ablation data (`ablation`, downloadable from the website):

```{r import_ablation_dataset}
# Read the CSV, converting character columns to factors
ablation <- read.csv("ablation.csv", header = TRUE, stringsAsFactors = TRUE)
# Rename the SCORE column to Score for consistency with the other columns
names(ablation)[names(ablation) == "SCORE"] <- "Score"
head(ablation)
```

You may have noticed that the format of the `ablation` data frame is a bit peculiar. It is probably not what you are used to getting from your colleagues, or working with yourself. It is, however, in the canonical format for storing and manipulating data that you _should_ be using. The hallmark of this canonical (tidy) format is that each row holds only one independently observed value (or set of values). All of the other columns are identifying values: they explain what exactly was measured, i.e., they are its metadata.

### Our first plot

```{r simple_initial_plot}
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point()
```

At a minimum, the two things that you need to give ggplot are:

- The dataset (which must be a data frame), and the variable(s) you want to plot
- The type of plot you want to make.

## Manipulating aspects of our plot

ggplot gives you exquisite control over plotting parameters.

### Changing color and size

Here, we'll change the color and size of the points.

```{r change_color_size}
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(color = "red", size = 4)
```

### Binding variables to plotting parameters

Aesthetics are used to bind plotting parameters to your data. The `aes()` function defines which variables you want to plot, and which plot parameters to map them to. Any aspect of the graph can be tied to any variable.

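The difference between *setting* a parameter to a fixed value and *binding* it to a variable can be sketched as follows. This is a minimal illustration rather than part of the original analysis: it assumes R's built-in `mtcars` data frame so that it runs without `ablation.csv`, and the object names `p_set` and `p_map` are made up for the example.

```{r sketch_set_vs_map}
library(ggplot2)

# Setting: the argument sits outside aes(), so every point is red.
# No scale or legend is created. (mtcars is a stand-in dataset.)
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red", size = 4)

# Binding: the argument sits inside aes(), so color follows a variable.
# ggplot builds a color scale and draws a legend for it.
p_map <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)), size = 4)

print(p_set)
print(p_map)
```

The chunk that follows applies the same idea to the `ablation` data.
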
```{r change_mappings}
# Color follows the Experiment variable
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4)

# Color follows Experiment, point shape follows CellType
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment, shape = CellType), size = 4)
```

### Including multiple layers

When using ggplot, layers are added to a ggplot object. You can add as many layers as you like.

```{r add_text_layer}
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  geom_text(aes(label = CellType), hjust = -0.3, size = 3)
```

... and tidy it up a little by widening the x-axis so the labels fit:

```{r fix_xlim}
ggplot(ablation, aes(x = Time, y = Score)) +
  geom_point(aes(color = Experiment), size = 4) +
  geom_text(aes(label = CellType), hjust = -0.3, size = 3) +
  xlim(0, 33)
```

### Building up a ggplot object

It is sometimes useful to save the base ggplot object and add layers in separate commands. The plot is only rendered when R "prints" the object. This is useful for several reasons:

- We don't need to write one huge command to create a plot; we can build it piecemeal.
- The plot is not rendered until it has received all of its information, which allows ggplot2 to be more intelligent than R's built-in plotting commands when deciding how large a plot should be, what the best scale is, etc.

```{r line_with_interaction}
p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement), size = 4)
p <- p + geom_line(aes(group = interaction(Experiment, Measurement, CellType),
                       color = Experiment,
                       linetype = CellType))
print(p) # plot gets rendered now
```

Sourcing a file will not automatically generate output, so here we explicitly ask for the plot to be printed.

Here we've added a layer that plots lines. We want a separate line for each unique combination of Experiment, Measurement, and CellType. The `interaction()` function takes a set of factors and computes a composite factor. To see what this does ...

```{r demonstrate_interaction}
interaction(ablation$Experiment, ablation$Measurement, ablation$CellType)
```

This composite factor is passed to the group aesthetic of `geom_line()` to tell ggplot which data values go together.

We have also added the shape binding to `geom_point()`. The shape of each point is determined by the corresponding Measurement. Note that ggplot's default shape palette handles at most six distinct shapes (i.e., no more than six levels in the corresponding factor). You can, however, define more using a command like `scale_shape_manual(values = 1:11)`.

### Modifying the default behavior of `geom`s

Here we specify the shapes we want to use, and also dodge the points (shift them sideways by group) so they no longer sit directly on top of one another.

```{r change_shape}
p <- ggplot(ablation, aes(x = Time, y = Score))
p + geom_point(aes(color = Experiment, shape = Measurement),
               size = 4,
               position = position_dodge(0.5)) +
  scale_shape_manual(values = c(1, 16))
```

We'll show you how to draw a plot listing all the possible shapes at the end of this section.

When plotting many points, controlling opacity (using `alpha`) can also be useful.

```{r change_opacity}
p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement),
                    size = 4, alpha = 0.5)
print(p)
```

### Controlling non-data plot elements

Some layers don't plot data, but affect the plot in other ways. For example, there are layers that control plot labels and the plot theme (there are eight themes built in to `ggplot2` and many more available; see, for example, the `ggthemes` package on CRAN). The `labs()` function also modifies legend labels.

```{r change_labels}
p <- p + labs(title = "Ablation", x = "Time (minutes)", y = "% Saturation")
# hjust = 0.5 centers the title
p <- p + theme_bw() + theme(plot.title = element_text(hjust = 0.5))
print(p)
```

ggplot gives you control over the scales of your plot. There is one scale for each binding.
In the plot we just made, there are four scales that we can manipulate: the x and y axes and the two legends (color and shape).

Let's change our x-axis to include the 5-minute timepoint. This is achieved with yet another layer.

```{r change_x_axis_ticks}
# Breaks listed by hand
p + scale_x_continuous(breaks = c(0, 5, 10, 20, 30))
# Breaks computed from the data
p + scale_x_continuous(breaks = unique(ablation$Time))
```

> **_Tip:_** In the second example above, we have computed the breaks from the data, rather than listing them individually. This makes the code we are writing usable even when the data changes. This is an essential strategy for reproducibly analyzing data at scale.

### More scale manipulations

We can also manipulate legends with scale layers.

```{r change_multiple_scales}
p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement), size = 4)
p <- p + geom_line(aes(group = interaction(Experiment, Measurement, CellType),
                       color = Experiment,
                       linetype = CellType))
p + scale_shape_manual(values = c(1, 16), labels = c("LDLR", "TfR")) +
  scale_linetype_discrete(name = "Cell type") +
  scale_color_manual(values = c("brown", "orange", "forestgreen"))
```

Here we provide the labels for the Measurement scale (remember that we used an aesthetic to bind shape to Measurement). Note that ggplot always orders the labels according to the levels of the underlying factor, so the labels should be provided in that order. If you want to change the order in which the legend elements are displayed, change the order of the levels in the underlying factor.

We have also changed the title of the CellType legend (the linetype binding) to be two words, and used a different color palette (for the binding to Experiment).

## Color palettes

You can use built-in color palettes from ColorBrewer.
To see all available palettes:

```{r show_colorbrewer}
library(RColorBrewer)
display.brewer.all()
```

and to use a ColorBrewer palette in your plot:

```{r use_colorbrewer}
p + scale_color_brewer(palette = "Dark2")
```

## Faceting plots

This plot is probably showing too much data at once. One way to resolve this is to make separate plots for the LDLR and TfR measurements. You can make multiple plots at once using facets. Here are a few options.

```{r demonstrate_facets}
p <- ggplot(ablation, aes(x = Time, y = Score))
p <- p + geom_point(aes(color = Experiment, shape = Measurement), size = 4)

# One panel per Measurement, stacked in rows, then side by side in columns
p + facet_grid(Measurement ~ .)
p + facet_grid(. ~ Measurement)

# Add lines, then facet on two variables at once
p <- p + geom_line(aes(group = CellType, linetype = CellType))
p + facet_grid(Experiment ~ Measurement)
p + facet_grid(Measurement ~ Experiment)
```

In these plots, you can remove the color and shape legends entirely (an option that can be specified in each of the respective scale layers) ...

```{r remove_legend_guides}
p + facet_grid(Measurement ~ Experiment) +
  scale_color_discrete(guide = "none") +
  scale_shape_discrete(guide = "none")
```

... or you may no longer want to bind the Measurement and Experiment variables to shape and color at all.

> **_Tip:_** The `facet_wrap()` function in ggplot can be used to wrap a 1D ribbon of plots into a 2D layout. You can also use the `gridExtra` package to place independently generated plots on the same page.

## Printing all the possible shapes

As promised above, here we show some code that prints a reference of all 26 available shapes. Note that several shapes can also have a fill aesthetic.

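Before the full reference, the note about fill can be illustrated on its own. This is a minimal sketch, not part of the original material: the data frame `fillable` and its column names are invented for the example. Shapes 21 to 25 draw an outline using `color` and an interior using `fill`, while the remaining shapes ignore `fill`.

```{r sketch_fillable_shapes}
library(ggplot2)

# Hypothetical helper data: one row per fillable shape (21-25)
fillable <- data.frame(shape_id = 21:25, x = 1:5, y = 1)

p_fill <- ggplot(fillable, aes(x = x, y = y)) +
  # color sets the outline, fill sets the interior
  geom_point(aes(shape = shape_id), color = "black", fill = "yellow",
             size = 6, stroke = 1) +
  # use the numbers in shape_id directly as shape codes
  scale_shape_identity() +
  geom_text(aes(label = shape_id), vjust = 3)

print(p_fill)
```

The reference plot of all shapes follows.
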
```{r plot_all_shapes}
i <- 0:25; x <- i %/% 6; y <- i %% 6
df <- data.frame(name = i, row = x, column = y)
ggplot(df, aes(x = row, y = column)) +
  geom_point(shape = 0:25, fill = "yellow", size = 4) +
  geom_text(label = i, vjust = 2) +
  scale_y_reverse() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank())
```

## Further inspiration

For more inspiration into pretty R-based graphics, see .