As last week, please document every step you take, including creating folders.

  1. Write a for-loop to download all fastq files of WT biological replicate no. 2 of the Gierlinski data set (UNIX).
    • ENA accession number: ERP004763 (feel free to download from another entry point to the SRA)
    • You can download the summary of the sample information, including the URLs for the fastq files, by clicking on the TEXT link on the Project page. You can also use wget to download this file, after you have clicked the link once (the URL needs to be generated for each dataset).
    • Use the mapping information available at http://dx.doi.org/10.6084/m9.figshare.1416210 to figure out which ENA accession names you need to obtain the samples of WT biological replicate no. 2. Use the UNIX commands you know to generate a file with the sample names, then use that list as input to your for-loop.
  2. Why are there multiple fastq files per sample? What does each file represent?
  3. Count the number of reads stored in each FASTQ file and keep a note of the results (UNIX). The zcat command allows you to see the contents of a gzipped file.
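
One possible way to organize steps 1 and 3 is sketched below; it is a starting point under stated assumptions, not the expected solution. The function names (select_runs, fastq_url, count_reads) are hypothetical. The sketch assumes that the figshare mapping file is tab-separated with a header line and the columns RunAccession, Lane, Sample, BiolRep, and that ENA stores each run at ftp.sra.ebi.ac.uk/vol1/fastq/<first 6 characters>/<accession>/<accession>.fastq.gz. Check both against the files you actually downloaded (e.g. with head) before relying on them.

```shell
# Print the run accessions for one sample type and biological replicate.
# ASSUMPTION: tab-separated mapping file with a header and the columns
# RunAccession, Lane, Sample, BiolRep (verify with `head`).
# $1 = mapping file, $2 = sample (e.g. WT), $3 = replicate number
select_runs() {
    awk -F'\t' -v s="$2" -v r="$3" \
        'NR > 1 && $3 == s && $4 == r { print $1 }' "$1"
}

# Build the download URL for one run accession.
# ASSUMPTION: ENA directory layout for 9-character accessions; compare
# with the URLs listed in the ENA summary file before using it.
# $1 = run accession
fastq_url() {
    prefix=$(echo "$1" | cut -c1-6)
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${prefix}/$1/$1.fastq.gz"
}

# Count the reads in a gzipped FASTQ file: each read occupies 4 lines.
# `gzip -dc` is equivalent to zcat and also works where zcat is picky
# about file extensions.
# $1 = gzipped FASTQ file
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Example usage (commented out because it needs the real files and
# network access; mapping.tsv and wt2_runs.txt are placeholder names):
#
#   select_runs mapping.tsv WT 2 > wt2_runs.txt
#   for acc in $(cat wt2_runs.txt); do
#       wget "$(fastq_url "$acc")"
#   done
#   for f in *.fastq.gz; do
#       echo "$f $(count_reads "$f")"
#   done
```

Keeping the accession list in its own file makes the download loop easy to rerun and gives you a record of exactly which runs you fetched, which helps with the documentation requirement.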

Unrelated to the Gierlinski data set:

  1. Identify and write down one or two biological or technical questions of interest for your project.

Compile the .Rmd file and send both the .Rmd and the HTML files to angsd_2019@zoho.com by Sunday night.