As last week, please document every step you take, including creating folders, etc.
- Write a for-loop to download all fastq files of WT biological replicate no. 2 of the Gierlinski data set (UNIX).
- ENA accession number: ERP004763 (feel free to download the data from another SRA entry point)
- You can download a summary of the sample information, including the URLs of the fastq files, by clicking the TEXT link on the Project page. Once you have clicked the link (the URL is generated separately for each dataset), you can also use wget to download this file.
- Use the mapping information available at http://dx.doi.org/10.6084/m9.figshare.1416210 to determine which ENA accessions correspond to the samples of WT biological replicate no. 2. Use the UNIX commands you know to generate a file listing those sample names, then use that list as the input to your for-loop.
- Why are there multiple fastq files per sample? What does each file represent?
- Count the number of reads stored in each FASTQ file and keep a note of the results (UNIX). The zcat command allows you to see the contents of a gzipped file.
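The download steps above can be sketched in bash as follows. This is a minimal sketch under several assumptions: the function names (wt_rep2_runs, fastq_urls, download_runs) and the example file names are hypothetical; the column positions used for the figshare mapping file are guesses that you should check with head before relying on them; and the ENA TEXT report is assumed to be tab-separated with a header row containing run_accession and fastq_ftp columns, where fastq_ftp lists ';'-separated URLs without a protocol prefix.

```shell
# ASSUMPTION: the mapping file is tab-separated with a header row.
# The column positions below are hypothetical; inspect the file and adjust them.
MAP_RUN_COL=1      # run accession
MAP_SAMPLE_COL=3   # sample / genotype (e.g. WT)
MAP_REP_COL=4      # biological replicate number

# Print the run accessions of WT biological replicate 2, one per line.
wt_rep2_runs() {
  awk -F'\t' -v r="$MAP_RUN_COL" -v s="$MAP_SAMPLE_COL" -v b="$MAP_REP_COL" \
    'NR > 1 && $s == "WT" && $b == 2 { print $r }' "$1" | sort -u
}

# Print the FASTQ URLs listed in the ENA TEXT report for one run accession.
# Columns are located by header name, so their order in the report does not matter.
fastq_urls() {
  local report="$1" run="$2"
  awk -F'\t' -v run="$run" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["run_accession"]) == run {
      n = split($(col["fastq_ftp"]), urls, ";")
      for (i = 1; i <= n; i++) print urls[i]
    }' "$report"
}

# For-loop over the accession list (one per line): download every FASTQ file of each run.
# wget -nc skips files that already exist locally.
download_runs() {
  local run_list="$1" report="$2" run url
  for run in $(cat "$run_list"); do
    for url in $(fastq_urls "$report" "$run"); do
      wget -nc "ftp://${url}"
    done
  done
}

# Example (hypothetical file names):
#   wt_rep2_runs ERP004763_sample_mapping.tsv > wt_rep2_runs.txt
#   download_runs wt_rep2_runs.txt ERP004763_report.txt
```

Keeping the accession list in its own file, as the assignment asks, also documents exactly which runs were downloaded.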
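The read count can be sketched in bash as below, assuming standard four-line FASTQ records (header, sequence, '+', qualities) with no wrapped sequence or quality lines; the function name count_reads is hypothetical. Because each read occupies four lines, the number of reads is the line count of the decompressed file divided by four.

```shell
# Print "<file><TAB><read count>" for every gzipped FASTQ file given as an argument.
# ASSUMPTION: each record is exactly four lines long.
count_reads() {
  local f lines
  for f in "$@"; do
    lines=$(zcat "$f" | wc -l)   # zcat streams the decompressed file to wc
    printf '%s\t%d\n' "$f" $(( lines / 4 ))
  done
}

# Example: count_reads *.fastq.gz > read_counts.txt
```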
Unrelated to the Gierlinski data set:
- Identify and write down one or two biological or technical questions of interest for your project.
Compile the .Rmd file and send both the .Rmd and the HTML files to angsd_2019@zoho.com by Sunday night.