  1. Write a script that will:
    • run BWA on one of the samples from the Gierlinski dataset
    • run STAR on the same sample
  2. Subset the aligned reads to select only those that map to chromosome I.
  3. Compare the output from BWA and STAR, and summarize any results or differences.
    • Which optional SAM fields does STAR add and what do they represent?
    • Which optional SAM fields does BWA add and what do they represent?
    • How does the interpretation of the mapping quality field differ between the two aligners?
    • Find a read that has been split in STAR. How did BWA handle the mapping of that read?
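
As a starting point for steps 2 and 3, here is a minimal Python sketch that keeps only the alignments on one chromosome and counts which optional tags (columns 12 and up) appear. It is not a reference solution. The toy SAM lines are made up for illustration, and the chromosome name `chrI` is an assumption: check the `@SQ` lines of your own header, since the name depends on the reference you indexed. In practice you would read from the real aligner output rather than from hard-coded strings.

```python
from collections import Counter

def parse_sam_line(line):
    """Split one SAM alignment line into a few mandatory fields and its optional tags."""
    fields = line.rstrip("\n").split("\t")
    record = {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }
    # Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    tags = {}
    for opt in fields[11:]:
        tag, _type, value = opt.split(":", 2)
        tags[tag] = value
    record["tags"] = tags
    return record

def records_on_chromosome(sam_lines, chrom):
    """Yield parsed records whose reference name (column 3) equals chrom."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        record = parse_sam_line(line)
        if record["rname"] == chrom:
            yield record

def tag_counts(records):
    """Count how often each optional tag occurs across the records."""
    counts = Counter()
    for record in records:
        counts.update(record["tags"].keys())
    return counts

# Toy input (made up): one header line plus two alignments per aligner.
SEQ = "A" * 50
QUAL = "I" * 50
TOY_STAR_SAM = [
    "@HD\tVN:1.6",
    f"read1\t0\tchrI\t100\t255\t20M500N30M\t*\t0\t0\t{SEQ}\t{QUAL}\tNH:i:1\tHI:i:1\tAS:i:48\tnM:i:0",
    f"read2\t0\tchrII\t200\t255\t50M\t*\t0\t0\t{SEQ}\t{QUAL}\tNH:i:1\tHI:i:1\tAS:i:49\tnM:i:0",
]
TOY_BWA_SAM = [
    "@HD\tVN:1.6",
    f"read1\t0\tchrI\t100\t60\t20M30S\t*\t0\t0\t{SEQ}\t{QUAL}\tNM:i:0\tMD:Z:20\tAS:i:20\tXS:i:0",
    f"read2\t0\tchrII\t200\t60\t50M\t*\t0\t0\t{SEQ}\t{QUAL}\tNM:i:0\tMD:Z:50\tAS:i:50\tXS:i:0",
]

if __name__ == "__main__":
    for name, sam in (("STAR", TOY_STAR_SAM), ("BWA", TOY_BWA_SAM)):
        on_chr1 = list(records_on_chromosome(sam, "chrI"))
        print(name, len(on_chr1), "record(s) on chrI; tags:", sorted(tag_counts(on_chr1)))
```

Comparing the CIGAR and MAPQ columns for the same read name in both outputs is one way to approach the last two questions above.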

Project work: (due on Feb 18!)

  1. Download at least one FASTQ file that you will be working with for your project. Document the following details:
    • where did you get it from?
    • what publication is it linked to?
    • who generated the data?
    • how was the nucleic acid (DNA/RNA) extracted?
    • what library prep was used?
    • what cell type was used?
    • what was the treatment/experimental condition?
    • what sequencing platform was used?
  2. Align the FASTQ file with an appropriate aligner (you may have to build a new index). Document:
    • parameters (and why you chose them)
    • summary of outcome and basic QC
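
For the basic QC summary, one possible starting point is to tally the FLAG column (column 2) of the SAM output. The sketch below is a minimal illustration, not a required method: the function name `summarize_flags` and the toy lines are hypothetical. It relies only on the FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). Existing tools such as `samtools flagstat` report similar counts and are usually the more practical choice on real files.

```python
def summarize_flags(sam_lines):
    """Count alignment records by category, based on the SAM FLAG bits."""
    summary = {
        "total": 0,
        "unmapped": 0,
        "secondary": 0,
        "supplementary": 0,
        "primary_mapped": 0,
    }
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and empty lines
        flag = int(line.split("\t")[1])
        summary["total"] += 1
        if flag & 0x4:        # segment unmapped
            summary["unmapped"] += 1
        elif flag & 0x100:    # secondary alignment
            summary["secondary"] += 1
        elif flag & 0x800:    # supplementary (e.g. chimeric) alignment
            summary["supplementary"] += 1
        else:
            summary["primary_mapped"] += 1
    return summary

# Toy input (made up): only the first columns matter for this tally.
TOY_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchrI\t100\t60\t50M",
    "r2\t16\tchrI\t300\t60\t50M",
    "r3\t4\t*\t0\t0\t*",
    "r4\t256\tchrII\t500\t0\t50M",
    "r5\t2048\tchrII\t900\t60\t20M30H",
]

if __name__ == "__main__":
    result = summarize_flags(TOY_SAM)
    mapped_fraction = result["primary_mapped"] / result["total"]
    print(result)
    print(f"primary mapped fraction: {mapped_fraction:.2f}")
```

Note that the total counts alignment records, not reads: a read with secondary or supplementary alignments contributes more than one line.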

Compile the .Rmd file and send both the .Rmd and the HTML files to angsd_2019@zoho.com by Monday night.