Homework Week 10

You’re going to be looking at the count matrix from a droplet-based scRNA-seq experiment on human cells.

1. Wrangling scRNA-seq data in R using Bioconductor packages. (11 pts total, plus up to 2 extra-credit points)

Download the count matrix (WT-1.dge.txt.gz) and read it into R.
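
As a rough sketch of the reading step: base R's read.table() can read a gzip-compressed text file directly. The example builds a tiny stand-in file so that it runs without the real data. The layout (tab-separated, one header row, identifiers in the first column) and all names in the toy file are assumptions for illustration; check the real file before relying on them.

```r
# Build a tiny stand-in file (hypothetical content, NOT the real WT-1 data).
# Assumed layout: tab-separated, header row, identifiers in column 1.
toy <- data.frame(GENE = c("g1", "g2", "g3"),
                  AAAC = c(0, 2, 1),
                  TTTG = c(5, 0, 0))
toy_path <- file.path(tempdir(), "toy.dge.txt.gz")
con <- gzfile(toy_path, "w")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# read.table() decompresses gzip files transparently when given the path.
counts <- as.matrix(read.table(toy_path, header = TRUE, row.names = 1,
                               sep = "\t"))
dim(counts)
counts[1:2, ]
```

For the actual assignment you would point read.table() at WT-1.dge.txt.gz instead of the toy path; for a large matrix this can take a while and use a lot of memory.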

Note: given the size of the matrix, you may have to do the analysis on the server, where you will have access only to the R console rather than RStudio (just type R after logging in and switching to your designated folder). To obtain images for your R Markdown report, you could, for example, write and compile the Rmd on your laptop. In that case, set the chunk options to eval=FALSE because the code will not actually be executed on your machine. To include images this way, compute them on the server, download them via scp, and integrate them into the report via the common markdown syntax: ![](path-to-image). For more info on code chunks, see here. Alternatively, without changing the chunk options, you could compile the html on the server, where rmarkdown::render("input.Rmd") (within R) carries out the same rendering that happens when you click “Knit” in RStudio.
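
The two routes described in the note could look roughly like this. This is a sketch only: the file name depth_hist.png and the simulated depths are placeholders, not part of the assignment data.

```r
# --- On the server: compute a figure and write it to a PNG file ---
# "depth_hist.png" and toy_depth are hypothetical placeholders.
set.seed(1)
toy_depth <- rpois(200, lambda = 1000)
out_file <- "depth_hist.png"
png(out_file, width = 600, height = 400)
hist(toy_depth, main = "Toy sequencing depths", xlab = "Counts per cell")
dev.off()

# --- On the laptop: keep the code visible but do not run it ---
# Put this in the setup chunk of the Rmd, then reference the downloaded
# image in the text with ![](depth_hist.png):
# knitr::opts_chunk$set(eval = FALSE)

# --- Alternative: render the whole report on the server ---
# rmarkdown::render("input.Rmd")
```

The last two calls are commented out because they only make sense inside an Rmd or next to an existing input.Rmd.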

What do the columns represent? What do the rows represent? (1pt)
Create a SingleCellExperiment object. (This resource is useful for getting to know the capabilities of the Bioconductor objects related to scRNA-seq.) (1pt)
Show a snippet of the count matrix that is now part of the SCE object. (0.5pt)
Calculate the number of reads (~ sequencing depth) for each of the first five cells, i.e., you should end up with 5 values. (1pt)
How many genes have non-zero counts in the first five cells? (1pt)
If you were to change the row- and column-names of the original matrix, how could you keep track of the original names? (1pt)
Following the simpleSingleCell workflow, generate histograms or density plots of (a) the total number of UMI counts per cell and (b) the total number of expressed genes per cell. (1pt, plus 1pt extra credit if you generate the plots with ggplot2)
- Describe in your own words what the two different histograms show and what that means for the data at hand. (2pts)
- For another extra-credit point, you could generate the histogram for “% mitochondrial reads”.

Note: You may find that the histograms are not as informative as you had hoped. You may find the combination of violin plots (geom_violin()) and beeswarm plots (ggbeeswarm::geom_quasirandom(alpha = 0.5)) more helpful. More details on beeswarm plots can be found here.
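
A minimal sketch of the violin-plus-beeswarm combination mentioned in the note, using simulated values. The data frame qc and its columns are made up for illustration; with real data you would plot your own per-cell QC metrics instead.

```r
library(ggplot2)

# Simulated per-cell totals (hypothetical values, not the assignment data).
set.seed(42)
qc <- data.frame(sample = "toy",
                 total_counts = rnbinom(300, mu = 2000, size = 2))

# Violin outline for the overall distribution, with the individual cells
# overlaid as semi-transparent quasirandom (beeswarm-like) points.
p <- ggplot(qc, aes(x = sample, y = total_counts)) +
  geom_violin() +
  ggbeeswarm::geom_quasirandom(alpha = 0.5) +
  labs(x = NULL, y = "Total counts per cell")

print(p)
```

The points make it easier to see sparse tails and small groups of outlying cells that a histogram bin can hide.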

Decide on some threshold for either QC parameter and remove the corresponding cells. (1pt)
Using the filtered data set, normalize the counts using scran and scater and judge whether the size factors calculated by computeSumFactors show the expected behavior as shown in Figure 6 of the simpleSingleCell workflow. (1pt)
- How can you access the normalized data matrix? (0.5pt)

2. scRNA-seq data wrangling in R using Seurat. (7pts)

Seurat can be installed via the usual install.packages routine.

Create a Seurat object (function: Seurat::CreateSeuratObject) (1pt)
Perform the same filtering that you chose to do on the SCE object. (1pt)
Normalize the data using Seurat’s function for this (NormalizeData). (0.5pt)
- How can you access the normalized data matrix? (Answer 7 of Seurat’s FAQ should be helpful here) (0.5pt)
For the first 10 cells, do pairwise comparisons for each cell of the normalized values from the Seurat object and the SCE object (scatter plots are fine; you may want to check out the GGally package, specifically the ggpairs function. We also recommend removing genes that have zero counts in all the samples). Explain what you see. (2pts)
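
To get a feel for ggpairs before applying it to the real objects, here is a small sketch on simulated values. Everything in it is a stand-in: norm_a and norm_b are toy matrices, and the log1p transform merely plays the role of "a second normalization"; it is not what Seurat or scran compute.

```r
# Toy matrix: 50 genes x 3 cells (hypothetical values).
set.seed(7)
norm_a <- matrix(rpois(150, lambda = 1), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:3)))

# Drop genes that are zero in all cells, as recommended above.
keep <- rowSums(norm_a) > 0
norm_a <- norm_a[keep, , drop = FALSE]

# Stand-in for values from a second method (assumption for illustration).
norm_b <- log1p(norm_a)

# One column per method/cell combination; ggpairs draws all pairwise panels.
df <- data.frame(A_cell1 = norm_a[, "cell1"], B_cell1 = norm_b[, "cell1"],
                 A_cell2 = norm_a[, "cell2"], B_cell2 = norm_b[, "cell2"])
p <- GGally::ggpairs(df)
print(p)
```

With 10 cells and two methods the full matrix of panels becomes large, so it may be easier to read if you plot a few cells at a time.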

Friederike Duendar and Luce Skrabanek

ANGSD Course 2019

3. Final question: what types of cells do you think you’re looking at? (1pt + 1 extra-credit point)