1. Describe at least 5 properties of RNA-seq read count data that researchers have identified – i.e. what are some of the reasons why we cannot take the read counts at face value? Please also describe the adjustments that need to be made to use them as reliable proxies for gene expression evaluations. (10pts)
  2. Understanding DESeq2’ S4 objects. (6pts total)

As we briefly discussed during class, DESeq2 (and many other bioconductor packages) often use very specific and elaborate R objects to combine observations (e.g. counts, normalized expression values, etc.) with corresponding meta data (e.g., gene IDs, sample IDs, sample conditions, gene ranges etc.). These objects are usually based off the S4 convention and if you are interested in the nitty-gritty details, we can recommend the bioconductor team’s write-up. If you do not know what type of object you are dealing with, class(object) is the function that releases that information. If the result isn’t one of the base R classes you learnt about during Luce’s class (e.g. data.frame, matrix, list), it is highly likely an S4 object. To get more background information, e.g. about how to interact or generate a certain object, you can always resort to the generic help call that you should be familiar with for functions, such as ?list or ?data.frame. To explore an S4 object more, you can use the function showClass, which expects the result of class(object) as an input (e.g. class(object) %>% showClass). You can think of S4 objects as sophisticated lists, i.e. they allow the combination of different types of objects. While the content of lists can be accessed via the same accessors as for data.frames (e.g. list$stored_object1 or list[["stored_object1"]], S4 objects use different accessors called slots, whose content you can retrieve using the @ symbol or a specific accessor function provided by the parent package, e.g. DESeqObject@counts or counts(DESeqObject). Which slots are present in a given S4 object can be determined via slotNames(object).

Use the functions we’ve described to inspect the objects that are returned by rlog() and DESeqDataSetFromMatrix, respectively. You can also try to just type their name in the console and hit return and see what happens.

  1. Exploratory analyses of DESeq.ds objects: generate a dendrogram and a PCA plot of our 10 samples and briefly explain the major insights you can derive from them. Can you infer what the pcaExplorer plots of the “Top Loadings” represent? (3pts)