You’re going to be looking at the count matrix from a droplet-based scRNA-seq experiment on human cells.

1. Wrangling scRNA-seq data in R using Bioconductor packages. (11 pts total, plus up to 2 extra-credit points)

Note: Given the size of the matrix, you may have to do the analysis on the server, where you will not have access to RStudio, only to the R console (just type R after logging in and switching to your designated folder). To get images into your Rmarkdown report, you could, for example, write and compile the Rmd on your laptop. In that case, set the chunk options to eval=FALSE because the code will not actually be executed on your machine. You would then compute the images on the server, download them via scp, and integrate them into the report using the usual markdown syntax: ![](path-to-image). For more info on code chunks, see here. ALTERNATIVELY, without changing the chunk options, you could compile the html on the server: rmarkdown::render("input.Rmd") (run within R) carries out the same rendering that happens when you click “Knit” in RStudio.
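As a rough illustration of the "compute the image on the server" step, here is a minimal sketch that writes a plot to a PNG file from a console-only R session. The data (`umi_per_cell`) and the file name (`umi_hist.png`) are made up for this example, and it assumes the server's R installation supports the `png()` device; your actual plots will come from the homework matrix.

```r
# Minimal sketch: save a plot to a file without RStudio.
# NOTE: `umi_per_cell` is simulated illustration data, not the homework matrix.
set.seed(1)
umi_per_cell <- rpois(500, lambda = 2000)

# Hypothetical output location; on the server you would use your designated folder.
out_file <- file.path(tempdir(), "umi_hist.png")

png(out_file, width = 800, height = 600)  # open a PNG graphics device
hist(umi_per_cell, breaks = 50,
     main = "UMIs per cell", xlab = "Total UMI count")
dev.off()                                 # close the device so the file is written

# Laptop route: put eval=FALSE in the chunk options (or set it once with
#   knitr::opts_chunk$set(eval = FALSE)
# in a setup chunk) and include the downloaded file via ![](path-to-image).
# Server route: leave the chunk options alone and run, within R,
#   rmarkdown::render("input.Rmd")
```

The resulting file can then be downloaded with scp and referenced from the Rmd with the markdown image syntax mentioned above.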

Note: You may find that the histograms are not as informative as you had hoped. In that case, a combination of violin plots (geom_violin()) and beeswarm plots (ggbeeswarm::geom_quasirandom(alpha = 0.5)) may be more helpful. More details on beeswarm plots can be found here.
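The violin-plus-beeswarm combination could look roughly like the sketch below. It assumes the ggplot2 and ggbeeswarm packages are installed; the data frame `qc` and its columns (`sample`, `n_genes`) are simulated placeholders, not values from the homework matrix.

```r
library(ggplot2)
library(ggbeeswarm)

# NOTE: simulated per-cell QC values, for illustration only.
set.seed(42)
qc <- data.frame(
  sample  = "cells",                          # hypothetical grouping label
  n_genes = rnbinom(300, mu = 1500, size = 5) # hypothetical genes-per-cell counts
)

# Violin layer shows the overall distribution shape;
# the quasirandom (beeswarm-like) layer overlays the individual cells.
p <- ggplot(qc, aes(x = sample, y = n_genes)) +
  geom_violin(fill = "grey90") +
  ggbeeswarm::geom_quasirandom(alpha = 0.5) +
  labs(x = NULL, y = "Genes detected per cell")

print(p)
```

Compared with a histogram, this view does not depend on a choice of bin width, and the individual points make outlier cells easier to spot.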

2. scRNA-seq data wrangling in R using Seurat. (7 pts)

Seurat can be installed via the usual install.packages routine.

3. Final question: what types of cells do you think you’re looking at? (1 pt + 1 extra-credit point)

Hint: It’s a fairly homogeneous population, i.e. all cells would probably be given the same cell-type name, where that name would be something like “skin cell”. Explain your reasoning!

The point is awarded for your reasoning; there will be an extra-credit point if you identify the cell type correctly.