Questions
- A somatic human cell contains about 6 picograms of DNA. How much DNA does a sperm cell contain? (1pt)
- How many human cells will you need to obtain 10 micrograms of DNA, as is requested by many sequencing protocols? (1pt)
- Describe one typical cause of DNA loss during DNA extraction. (1pt)
- Describe two functions of the adapters that are typically added during Illumina’s library preparations. Which types of sequences do they often contain? (2pts)
- Components of what kind of lab equipment have to be integrated in any type of Illumina sequencer? (1pt)
  - agarose gel
  - centrifuge
  - microwave
  - microscope
  - scale
Exercises
Similarly to last week, please make sure to document all the steps you take, including making folders etc.
The download from ENA sometimes takes very long (as in: hours, days). Make sure to start this early enough!
- Write a for-loop to download all fastq files of WT biological replicate no. 2 of the Gierlinski RNA-seq data set (UNIX). Try to have a solution that’s as generally applicable as possible. (3pts)
  - ENA accession number: ERP004763 (feel free to download from another entry point to the SRA)
  - You can download the summary of the sample information, including the URLs for the fastq files, by clicking on the TSV link on the Project page. You can also use wget to download this file: after you’ve downloaded it once, right-click on TSV, select “Copy link location”, and paste the URL from the clipboard into the terminal.
  - Use the information available at http://dx.doi.org/10.6084/m9.figshare.1416210 to figure out which ENA accession names (such as ERR458493) you need to obtain all the fastq files belonging to WT biological replicate no. 2. Use the UNIX commands you know to generate a file with the individual accession IDs and then use that list as input to your for-loop.
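The general shape of a loop driven by a list of accession IDs can be sketched as follows. This is only a minimal illustration under stated assumptions, not a solution: the accessions (RUN0001 etc.), the URLs, and the two-column layout of the toy table are all made up, and the real TSV from ENA may have more columns in a different order. The loop runs as a dry run that prints the wget commands instead of executing them.

```shell
# Scratch directory for the toy input files.
workdir=$(mktemp -d)

# Toy stand-in for the sample table (hypothetical accessions and URLs):
# column 1 = run accession, column 2 = URL of the fastq file.
printf 'run_accession\tfastq_ftp\n' > "$workdir/report.tsv"
printf 'RUN0001\tftp.example.org/fastq/RUN0001.fastq.gz\n' >> "$workdir/report.tsv"
printf 'RUN0002\tftp.example.org/fastq/RUN0002.fastq.gz\n' >> "$workdir/report.tsv"
printf 'RUN0003\tftp.example.org/fastq/RUN0003.fastq.gz\n' >> "$workdir/report.tsv"

# Accession IDs of interest, one per line (hypothetical selection).
printf 'RUN0001\nRUN0003\n' > "$workdir/ids.txt"

# Dry run: print each command. Set fetch="wget" to actually download.
fetch="echo wget"

planned=$(
  for id in $(cat "$workdir/ids.txt"); do
    # Look up the URL whose first column matches the current ID exactly.
    url=$(awk -F'\t' -v id="$id" '$1 == id { print $2 }' "$workdir/report.tsv")
    $fetch "ftp://${url}"
  done
)
echo "$planned"
```

Because the IDs come from a file and the URLs are looked up rather than typed in, the same loop can be reused for another sample by swapping the ID list only.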
- Why are there multiple fastq files for sample WT_2? What does each file represent? (1pt)
- Count the number of lines in each FASTQ file and keep a note of the results (UNIX). The zcat command allows you to see the contents of a gzipped file. A fastq file has 4 lines per read. Write a second for-loop that determines the number of reads per file. (3pts)
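The relationship between line counts and read counts can be tried out on toy data first. The sketch below is an assumption-laden illustration: it creates two tiny gzipped FASTQ files with made-up reads in a scratch directory, then loops over them, counts the lines with zcat and wc, and divides by 4. Reading via `zcat < file` is used here because it also works on systems where plain `zcat file.gz` expects a `.Z` suffix.

```shell
# Scratch directory with two toy gzipped FASTQ files (made-up reads).
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$workdir/a.fastq.gz"
printf '@r1\nGGCC\n+\nIIII\n' | gzip > "$workdir/b.fastq.gz"

summary=$(
  for f in "$workdir"/*.fastq.gz; do
    # Count lines of the decompressed file; $(( )) strips any padding from wc.
    n_lines=$(( $(zcat < "$f" | wc -l) ))
    # Each read occupies 4 lines: header, sequence, separator, qualities.
    n_reads=$(( n_lines / 4 ))
    echo "$(basename "$f") $n_lines $n_reads"
  done
)
echo "$summary"
```

Each output line lists the file name, the number of lines, and the number of reads, which makes it easy to redirect the result into a file for your notes.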
Project
- Identify and write down one or two biological or technical questions of interest for your project. Make sure to check the requirements for the project on the website. (1pt)
Compile the .Rmd file and send both the .Rmd and the HTML files to angsd_wmc@zohomail.com by Saturday night.