r-make

Documentation - Output.

Files produced for each fastq file.

NOTE: for paired reads, all output (other than the .fastq.stats files) is named with respect to read one (R1) within a pair.

.stats: quality stats for all reads

column: position in read
count: number of bases
min: lowest quality score value found
max: highest quality score found
sum: sum of quality score values 
mean: mean quality score value
q1: 1st quartile quality score
med: median quality score
q3: 3rd quartile quality score
iqr: inter-quartile range
lw: left-whisker value
rw: right-whisker value
a_count: number of 'A' nucleotides counted
c_count: number of 'C' nucleotides counted
g_count: number of 'G' nucleotides counted
t_count: number of 'T' nucleotides counted
n_count: number of 'N' nucleotides counted
max-count: total number of bases 
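
The derived columns can be cross-checked against the raw ones. A minimal sketch, assuming one row of a .stats file has already been parsed into a dict keyed by the column names above (the `check_stats_row` helper and the sample values are hypothetical, not part of r-make):

```python
def check_stats_row(row):
    """Recompute mean and iqr for one read position from the raw columns."""
    count = int(row["count"])
    total = int(row["sum"])
    # mean quality = sum of quality scores / number of bases at this position
    mean = total / count if count else 0.0
    # inter-quartile range = 3rd quartile - 1st quartile
    iqr = int(row["q3"]) - int(row["q1"])
    return mean, iqr


# hypothetical values for read position 1
row = {"column": "1", "count": "1000", "sum": "33000", "q1": "31", "q3": "36"}
print(check_stats_row(row))
```

Comparing the recomputed values with the `mean` and `iqr` columns is a quick way to confirm that a file was parsed with the right column order.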

.bam: alignment results

qname: query template name
flag: bitwise flag
rname: reference sequence name
pos: 1-based leftmost mapping position
mapq: mapping quality
cigar: cigar string
rnext: reference name of the mate/next segment
pnext: position of the mate/next segment
tlen: observed template length
seq: sequence
qual: ascii of phred-scaled base quality
as: alignment score
xn: number of ambiguous bases
xm: number of mismatches
xo: number of open gaps
xg: number of gap extensions 
nm: edit distance to the reference
md: string for mismatching positions
yt: string representing alignment type
nh: number of reported alignments
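
The `flag` and `qual` fields are encoded rather than human-readable. The sketch below decodes both, using the bit values from the SAM specification and assuming Phred+33 quality encoding; the short labels and the function names are illustrative choices, not names used by r-make.

```python
# Flag bits as defined in the SAM specification; the labels are shorthand.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of every bit set in a bitwise flag."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def decode_qual(qual, offset=33):
    """Convert an ascii quality string to phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]


print(decode_flag(99))
print(decode_qual("II5!"))
```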

.bam.junctions.bed: junctions reported by tophat

chr: chromosome name
start: first base of the intron (1-based) 
stop: last base of the intron (1-based)
strand: defines the strand
intron_motif: 0=non-canonical;1=GT/AG;2=CT/AC;3=GC/AG;4=CT/GC;5=AT/AC;6=GT/AT
annotated: 0=unannotated;1=annotated (only if splice database used)
unique_support: # of uniquely mapping reads crossing the junction
multi_support: # of multi-mapping reads crossing the junction
overhang: maximum spliced alignment overhang
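
Because `start` and `stop` are both 1-based and inclusive, the intron length is `stop - start + 1`. A minimal sketch of reading one junction line, assuming the file is tab-delimited with the columns in the order listed above (the delimiter, the `describe_junction` helper, and the sample line are assumptions):

```python
# Motif codes as listed for the intron_motif column.
INTRON_MOTIFS = {
    0: "non-canonical",
    1: "GT/AG",
    2: "CT/AC",
    3: "GC/AG",
    4: "CT/GC",
    5: "AT/AC",
    6: "GT/AT",
}


def describe_junction(line):
    """Parse one tab-delimited junction line into a small summary dict."""
    fields = line.rstrip("\n").split("\t")
    start, stop = int(fields[1]), int(fields[2])
    return {
        "chr": fields[0],
        # both coordinates are inclusive, hence the +1
        "intron_length": stop - start + 1,
        "intron_motif": INTRON_MOTIFS[int(fields[4])],
        "annotated": fields[5] == "1",
        "unique_support": int(fields[6]),
        "multi_support": int(fields[7]),
    }


# hypothetical line; the values are made up for illustration
example = "chr1\t14830\t14969\t2\t2\t1\t5\t0\t38"
print(describe_junction(example))
```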

.bam.stats: quality stats for aligned reads

column: position in read
count: number of bases
min: lowest quality score value found
max: highest quality score found
sum: sum of quality score values 
mean: mean quality score value
q1: 1st quartile quality score
med: median quality score
q3: 3rd quartile quality score
iqr: inter-quartile range
lw: left-whisker value
rw: right-whisker value
a_count: number of 'A' nucleotides counted
c_count: number of 'C' nucleotides counted
g_count: number of 'G' nucleotides counted
t_count: number of 'T' nucleotides counted
n_count: number of 'N' nucleotides counted
max-count: total number of bases 

.bam.mapped: number of mapped reads

mapped: # of mapped reads
unmapped: # of unmapped reads

.bam.distribution: distribution of aligned reads

ercc: number of bases mapping to ercc spikeins
index1: number of bases mapping to truseq barcode 1
.
.
.
indexn: number of bases mapping to truseq barcode n
adapter: total number of bases mapping to illumina adapters
phix: number of bases mapping to phix174
ribosomal: number of bases mapping to ribosomal rna
mitochondrial: number of bases mapping to mitochondrial rna
junction: number of bases mapping to splice junctions
intergenic: number of bases mapping to intergenic regions
intron: number of bases mapping to introns
cds: number of bases mapping to coding regions
utr5: number of bases mapping to 5' utr
utr3: number of bases mapping to 3' utr
overlap: number of bases ambiguously mapping to multiple groups 

.bam.nvc: nucleotide intensities

pos: position in read
a: number of 'A' nucleotides counted at position
g: number of 'G' nucleotides counted at position
c: number of 'C' nucleotides counted at position
t: number of 'T' nucleotides counted at position
n: number of 'N' nucleotides counted at position
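
The raw counts are usually easier to compare across positions once they are converted to fractions. A minimal sketch, assuming the counts for one position have been read into a dict keyed by the column names above (`nucleotide_fractions` is a hypothetical helper):

```python
def nucleotide_fractions(counts):
    """Convert per-position nucleotide counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in counts}
    return {base: n / total for base, n in counts.items()}


# hypothetical counts at one read position
print(nucleotide_fractions({"a": 25, "g": 25, "c": 25, "t": 20, "n": 5}))
```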

.bam.qual: quality scores

pos: position in read
hits: mean quality score for aligned reads at position
no_hits: mean quality score for unaligned reads at position

.bam.strand: percent strand

forward: number of reads mapping to forward strand
reverse: number of reads mapping to reverse strand

.bam.error: error rate

pos: position in read
errors: number of errors detected
total: total number of bases
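
The error rate itself is not a column; it follows from `errors / total`. A minimal sketch, assuming the rows have been read as `(pos, errors, total)` tuples (`error_rates` is a hypothetical helper):

```python
def error_rates(rows):
    """Return the per-position error rate and the overall error rate."""
    rows = list(rows)
    per_position = {
        pos: (errors / total if total else 0.0) for pos, errors, total in rows
    }
    all_errors = sum(errors for _, errors, _ in rows)
    all_bases = sum(total for _, _, total in rows)
    overall = all_errors / all_bases if all_bases else 0.0
    return per_position, overall


# hypothetical values for two read positions
per_position, overall = error_rates([(1, 2, 1000), (2, 8, 1000)])
print(per_position, overall)
```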

.bam.duplicates: percent of duplicates

occurrence: number of times read was observed
position: number of duplicates by mapping position
sequence: number of duplicates by sequence composition

.bam.gc: gc content

gc: percent gc content
count: number of reads at percent gc 

.bam.gene: gene count

METHOD: for each gene, a composite model is created, consisting of the union of its isoforms minus any area that overlaps another gene's exons. each 'gene count' is the count for the corresponding composite model.

[figure: image from http://www-huber.embl.de/users/anders/HTSeq/doc/count.html]

gene: chromosome and gene name (separator = !)
count: number of reads mapping to gene
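
The composite model described under METHOD can be sketched with plain interval arithmetic. This is an illustration only, not the r-make implementation: it assumes all genes lie on one chromosome, that exons are half-open `(start, end)` intervals, and that strand is ignored; every function name is hypothetical.

```python
def merge(intervals):
    """Union of intervals: sort, then fuse any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def subtract(intervals, blockers):
    """Remove from merged `intervals` every part covered by merged `blockers`."""
    result = []
    for start, end in intervals:
        cursor = start
        for b_start, b_end in blockers:
            if b_end <= cursor or b_start >= end:
                continue  # blocker does not touch the remaining part
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < end:
            result.append((cursor, end))
    return result


def composite_model(gene, exons_by_gene):
    """Union of a gene's isoform exons minus overlap with other genes' exons."""
    own = merge(exons_by_gene[gene])
    others = merge(
        iv for name, ivs in exons_by_gene.items() if name != gene for iv in ivs
    )
    return subtract(own, others)


# hypothetical exons; gene_a has two overlapping isoform exons
exons = {
    "gene_a": [(100, 200), (150, 300), (500, 600)],
    "gene_b": [(250, 350), (580, 620)],
}
print(composite_model("gene_a", exons))
print(composite_model("gene_b", exons))
```

Reads would then be counted only against the intervals that survive the subtraction, so a read falling in a region shared by two genes contributes to neither composite model.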

.bam.gene.biotype: distribution of genes detected

biotype: ensembl meta-gene category
count: number of genes detected in category

.bam.genebody: counts across genebody

percentile: percentile of gene length
count: number of reads counted at percentile

.bam.tar: unannotated transcriptionally active regions

chr: chromosome name
start: maximum starting position of the region
stop: maximum ending position of the region
count: number of bases mapping to region

.bai: aligned bam index

beg: file offset of the start of a chunk
end: file offset of the end of a chunk

.unmapped.bam: unaligned reads

qname: query template name
flag: bitwise flag
rname: reference sequence name
pos: 1-based leftmost mapping position
mapq: mapping quality
cigar: cigar string
rnext: reference name of the mate/next segment
pnext: position of the mate/next segment
tlen: observed template length
seq: sequence
qual: ascii of phred-scaled base quality

Files produced for each lane.

.gene.bpkm: bpkm counts

gene: chromosome and gene name (separator = !)
start: maximum starting position of the gene
stop: maximum ending position of the gene
bpkm: number of bases mapping to gene
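
The page does not spell out how the bpkm value is normalized. The sketch below assumes the commonly used definition, bases per kilobase of gene model per million mapped bases; the formula, the function name, and the parameters are assumptions rather than something taken from r-make.

```python
def bpkm(bases_in_gene, gene_length_bp, total_mapped_bases):
    """Bases per kilobase of gene model per million mapped bases (assumed)."""
    if gene_length_bp <= 0 or total_mapped_bases <= 0:
        raise ValueError("gene length and total mapped bases must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped bases)
    return bases_in_gene * 1e9 / (gene_length_bp * total_mapped_bases)


# hypothetical gene: 5,000 bases on a 2 kb model, 10 million mapped bases
print(bpkm(5000, 2000, 10_000_000))
```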

.bam.distribution.ratio: select ratios

exon/intron: # bases mapped to exons / # to introns
exon/intergenic: # bases mapped to exons / # to intergenic reg.
3utr/5utr: # bases mapped to 3' utr / # to 5' utr
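
These ratios can be derived from the .bam.distribution values. A minimal sketch, assuming that "exon" bases are the sum of the `cds`, `utr5` and `utr3` categories (that grouping and the `distribution_ratios` helper are assumptions):

```python
def distribution_ratios(dist):
    """Compute the three ratios from a dict of .bam.distribution values."""
    # assumption: exonic bases = coding + 5' utr + 3' utr
    exon = dist["cds"] + dist["utr5"] + dist["utr3"]

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "exon/intron": ratio(exon, dist["intron"]),
        "exon/intergenic": ratio(exon, dist["intergenic"]),
        "3utr/5utr": ratio(dist["utr3"], dist["utr5"]),
    }


# hypothetical base counts
dist = {"cds": 700, "utr5": 100, "utr3": 200, "intron": 250, "intergenic": 50}
print(distribution_ratios(dist))
```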

.bam.genebody.cv: coefficient of variation across genebody

cv: coefficient of variation across all genes
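
A coefficient of variation is the standard deviation divided by the mean. The sketch below applies that to the per-percentile counts from .bam.genebody; using those counts as input and using the population standard deviation are both assumptions about how the value is produced.

```python
import statistics


def coefficient_of_variation(counts):
    """Standard deviation divided by mean for a list of per-percentile counts."""
    mean = statistics.fmean(counts)
    if mean == 0:
        return float("nan")
    # population standard deviation (an assumption; sample stdev also plausible)
    return statistics.pstdev(counts) / mean


# hypothetical counts; perfectly even coverage gives 0.0
print(coefficient_of_variation([10, 10, 10, 10]))
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
```

A value near 0 indicates even coverage along the gene body; larger values indicate coverage concentrated in part of the gene.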

.bam.duplicates.percent: percent of duplicates by occurrence

1_occurrence: % of reads with 1 occurrence
2_occurrences: % of reads with 2 occurrences
3_occurrences: % of reads with 3 occurrences
4_occurrences: % of reads with 4 occurrences

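.bam.mapped: number of mapped reads

columns as described for .bam.mapped under "Files produced for each fastq file."

.bam.distribution: distribution of aligned reads

columns as described for .bam.distribution under "Files produced for each fastq file."

.bam.duplicates: percent of duplicates

columns as described for .bam.duplicates under "Files produced for each fastq file."

.bam.error: error rate

columns as described for .bam.error under "Files produced for each fastq file."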

.bam.gene.count: gene count

here

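.bam.gene.biotype: distribution of genes detected

columns as described for .bam.gene.biotype under "Files produced for each fastq file."

.bam.genebody: counts across genebody

columns as described for .bam.genebody under "Files produced for each fastq file."

.bam.qual: quality scores

columns as described for .bam.qual under "Files produced for each fastq file."

.bam.strand: percent strand

columns as described for .bam.strand under "Files produced for each fastq file."

.bam.nvc: nucleotide intensities

columns as described for .bam.nvc under "Files produced for each fastq file."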

Files produced for each sample.

.bam.gene.count.matrix: matrix of gene counts for all lanes

gene: chromosome and gene name (separator = !)
sample_1: number of bases mapping to gene
.
.
.
sample_n: number of bases mapping to gene
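
A matrix in this shape can be assembled from per-sample counts keyed by the `chr!gene` identifier. A minimal sketch, assuming each sample's counts are already in a dict and that a gene missing from a sample counts as 0 (the zero fill, the function names, and the sample data are assumptions):

```python
def split_gene_key(key):
    """Split a 'chr!gene' identifier on the '!' separator."""
    chrom, name = key.split("!", 1)
    return chrom, name


def build_count_matrix(counts_by_sample):
    """Return (header, rows) with one row per gene and one column per sample."""
    samples = sorted(counts_by_sample)
    genes = sorted({g for counts in counts_by_sample.values() for g in counts})
    header = ["gene"] + samples
    rows = [
        [g] + [counts_by_sample[s].get(g, 0) for s in samples] for g in genes
    ]
    return header, rows


# hypothetical samples and gene identifiers
counts = {
    "sample_1": {"chr1!GENE_A": 120, "chr2!GENE_B": 30},
    "sample_2": {"chr1!GENE_A": 95},
}
header, rows = build_count_matrix(counts)
print(header)
print(rows)
print(split_gene_key("chr1!GENE_A"))
```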

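.bam.mapped: number of mapped reads

columns as described for .bam.mapped under "Files produced for each fastq file."

.bam.distribution: distribution of aligned reads

columns as described for .bam.distribution under "Files produced for each fastq file."

.bam.duplicates: percent of duplicates

columns as described for .bam.duplicates under "Files produced for each fastq file."

.bam.error: error rate

columns as described for .bam.error under "Files produced for each fastq file."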

.bam.feature: feature count

here

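.bam.gene: gene count

columns as described for .bam.gene under "Files produced for each fastq file."

.bam.gene.biotype: distribution of genes detected

columns as described for .bam.gene.biotype under "Files produced for each fastq file."

.bam.genebody: counts across genebody

columns as described for .bam.genebody under "Files produced for each fastq file."

.bam.qual: quality scores

columns as described for .bam.qual under "Files produced for each fastq file."

.bam.strand: percent strand

columns as described for .bam.strand under "Files produced for each fastq file."

.bam.nvc: nucleotide intensities

columns as described for .bam.nvc under "Files produced for each fastq file."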

p. zumbo