DESeq.ds
objects: generate a dendrogram and a PCA plot of our 10 samples and briefly explain the major insights you can derive from them. You may use the code detailed in https://physiology.med.cornell.edu/faculty/skrabanek/lab/angsd/lecture_notes/08_practical_exploratory.pdf, but make sure to explain what every (!) function does and what type of output each function generates. Can you infer what the pcaExplorer
plots of the “Top Loadings” represent? (4pts)DESeq2
S4 objects. (5.5pts total)As we briefly discussed during class,
DESeq2
(and many other bioconductor packages) often use very specific and elaborate R objects to combine observations (e.g. counts, normalized expression values, etc.) with corresponding meta data (e.g., gene IDs, sample IDs, sample conditions, gene ranges etc.). These objects are usually based off theS4
convention and if you are interested in the nitty-gritty details, we can recommend the bioconductor team’s write-up. If you do not know what type of object you are dealing with,class(object)
is the function that releases that information. If the result isn’t one of the base R classes you learnt about during Luce’s class (e.g.data.frame
,matrix
,list
), it is highly likely an S4 object. To get more background information, e.g. about how to interact or generate a certain object, you can always resort to the generic help call that you should be familiar with for functions, such as?list
or?data.frame
. To explore an S4 object more, you can use the functionshowClass
, which expects the result ofclass(object)
as an input (e.g.class(object) %>% showClass
). You can think of S4 objects as sophisticated lists insofar as they allow the combination of different types of objects. But where the content of lists can be accessed via the same accessors that are used fordata.frames
(e.g.list$stored_object1
orlist[["stored_object1"]]
, S4 objects use different accessors calledslots
, whose content you can retrieve using the@
symbol or via specific accessor function provided by the parent package, e.g.DESeqObject@counts
orcounts(DESeqObject)
. Which slots are present in a given S4 object can be determined viaslotNames(object)
.
Use the functions we’ve described to inspect the objects that are returned by rlog()
and DESeqDataSetFromMatrix
, respectively. You can also try to just type their name in the console and hit return and see what happens.
Read the instructions here. Note that the getMethods()
described there has been deprecated and has been replaced by findMethods()
, which does not require you to specify the object type for which you’re retrieving the code.
Include the source code of the following DESeq2
functions in your homework’s html:
rlog()
estimateDispersions()
rlogData()
– this is a non-exported function from DESeq2
DESeq2
calculates a p-value that you can use to decide whether a given gene shows different expression values when comparing two conditions of interest. Start from what types of values are used as the measurements. You may consult the document on our website that contains a verbose summary of what we talked about in class. (3pts)nbinomWaldTest()
, identify the relevant line where the Wald statistic is calculated. Explain what the objects contain that are used for the calculation. (2pts for extra credit)Compile the
.Rmd
file and send both the.Rmd
and theHTML
files to angsd_wmc@zohomail.com by Saturday night. If you need support, get in touch with Merv on Thursday, 3-4pm.
(21pts)