The goal is to create a package that contains:
The package should be installable via install.packages().
Exercises that involve reading in data and plotting should become part of the vignette (which is really just a good old Rmarkdown document in which you keep track of example analyses you’ve done with the functions of a given package).
Feel free to add to your package any other functions you come up with while working through the exercises.
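As a rough orientation, the directory layout behind such a package can be sketched in base R. Everything named here is an assumption for illustration: the package name fastqcReader, its location in a temporary directory, and the DESCRIPTION values. A package that is actually installable needs more than this (for example a NAMESPACE file), and helpers such as usethis::create_package() and devtools::document() are commonly used to generate those pieces.

```r
# Hypothetical package name and location; adjust to your own project.
pkg_name <- "fastqcReader"
pkg_dir  <- file.path(tempdir(), pkg_name)

# Standard sub-directories: R/ for functions, vignettes/ for Rmd documents,
# data/ for .rda objects shipped with the package.
for (d in c("R", "vignettes", "data")) {
  dir.create(file.path(pkg_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Minimal DESCRIPTION (all values are placeholders); the Imports field lists
# the packages that the package's code depends on.
desc <- data.frame(
  Package     = pkg_name,
  Title       = "Parse FastQC Result Files",
  Version     = "0.0.1",
  Description = "Helper functions for reading FastQC output into R.",
  Imports     = "data.table, magrittr",
  stringsAsFactors = FALSE
)
write.dcf(desc, file = file.path(pkg_dir, "DESCRIPTION"))

# Inspect the resulting layout.
list.files(pkg_dir, recursive = TRUE, include.dirs = TRUE)
```

Function files would then go into R/, the Rmd document into vignettes/, and saved data objects into data/.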
Use the reading_in function (shown below) as your first function in the newly generated package. Describe the steps you have to take in order to make that function part of the package. (1pt)
Update the DESCRIPTION file to note all the packages that this function depends on. (1pt)
Use the function to read FastQC’s diagnostic “Per base sequence quality” from a single fastqc_data.txt file into an R object. (1pt)
Start an Rmd document that will become your vignette.
Explain what the function’s sed command does. (1pt)
The @details section of the function’s documentation would be a good place to put it, too, but for the sake of the homework, just keep it in the vignette.
Read in the FastQC results of at least 4 fastq files that should cover 2 biological replicates and 2 technical replicates of each. Make sure to keep track of the sample name in the new R objects you’re creating. (2pts)
Combine the individual results into one data frame (see rbind(); if you’ve generated a list in the previous exercise, also look into the do.call() function). Save that composite data frame as an .rda object (with the save() function), giving the file the same name as the R object (e.g. combined_df.rda). (1pt)
Store the .rda file within the package infrastructure. (0.5pt)
eval=FALSE though (why?).
Here’s the function to get you started with your package:
#' Function for parsing the text output of FastQC
#'
#' This function extracts the values for a specific test run by FastQC on a
#' single fastq file.
#'
#' @param file string that specifies the path to an individual FastQC result file
#' (typically named "fastqc_data.txt")
#' @param test Indicate which test results should be extracted. Default:
#' "Per base sequence quality". Other options are, for example, "Per tile sequence quality",
#' "Per sequence quality score" etc.
#'
#' @return data.frame with the values of a single FastQC test result.
#'
#' @examples \dontrun{
#' res <- reading_in(file = "acinar-3_S9_L001_R1_001_fastqc/fastqc_data.txt")
#' }
reading_in <- function(file, test = "Per base sequence quality"){
  ## generate the string that will be used for the file parsing
  syscommand <- paste0("sed -n '/", test, "/,/END_MODULE/p' ", file, " | grep -v '^>>'")
  ## use the fread command, which can interpret UNIX commands on the fly to
  ## read in the correct portion of the FastQC result
  ## (note: the %>% pipe is not part of base R; it is provided by magrittr)
  dat <- data.table::fread(cmd = syscommand, header = TRUE) %>% as.data.frame
  return(dat)
}
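The exercises on combining several results with rbind() and do.call() and saving them with save() can be sketched as follows. This is only one possible approach, not the expected solution. To keep the sketch runnable without any FastQC output, fake_reading_in() is a hypothetical stand-in for reading_in() that returns made-up values, and the sample names and file paths are placeholders as well.

```r
# Hypothetical stand-in for reading_in(): returns a small data frame with
# made-up values so that the sketch runs without FastQC files.
fake_reading_in <- function(file) {
  data.frame(Base = 1:3, Mean = c(32.1, 33.4, 34.0))
}

# Hypothetical sample names mapped to placeholder result file paths.
files <- c(
  sampleA_rep1 = "sampleA_rep1_fastqc/fastqc_data.txt",
  sampleA_rep2 = "sampleA_rep2_fastqc/fastqc_data.txt"
)

# Read each file and record which sample the rows came from.
res_list <- lapply(names(files), function(smp) {
  df <- fake_reading_in(files[[smp]])
  df$sample <- smp
  df
})

# Stack the list of data frames into one composite data frame.
combined_df <- do.call(rbind, res_list)

# Save under a file name that matches the object name.
rda_file <- file.path(tempdir(), "combined_df.rda")
save(combined_df, file = rda_file)
```

Calling load(rda_file) in a fresh session restores an object named combined_df, which is why matching the file name to the object name makes the data easier to find later.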
Example plot:
Build the package and send it to angsd_wmc@zohomail.com by Saturday night. If you need support, get in touch with Merv on Thursday, 3-4pm.