Dragen QC report template for multiple samples from fastqc metrics
By Synnøve Yndestad in R sequencing RNAseq
February 14, 2022
For RNAseq performed on the Illumina NovaSeq6000, a single run may contain 70 different samples. Aggregating and plotting the quality control (QC) metrics for all samples together makes it easy to spot samples with low sequencing quality within that run.
While MultiQC is an excellent tool for aggregating and visualizing QC metrics, my RNAseq project is run using the Dragen pipeline. Since no Dragen module for MultiQC had yet been implemented when I was processing the samples, I wrote my own version in the form of an Rmarkdown report template. It takes a folder of *.fastq_metrics.csv files generated by Dragen and produces an HTML report with plots made interactive by plotly.
An example report can be viewed here.
The Rmarkdown template and the example report can be found on my GitHub here.
Instructions for use:
1- Add a folder containing the *.fastq_metrics.csv files to the working directory.
The folder name will be assigned as RunID.
2- Change any run-specific details in the Description section to document them for future reference, e.g. what kind of samples were sequenced, which study they come from, which prep protocol generated the library and which Dragen version was used for processing.
3- Knit the report.
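For orientation, the convention from step 1 (folder name becomes RunID, every metrics file in the folder is one sample) could be reproduced in base R roughly as follows. This is a minimal sketch and not the template's actual code: the function name `read_run_metrics`, the column names, and the assumption that each file is a header-less CSV with four fields (section, mate, metric, value) are illustrative only.

```r
# Sketch only: read_run_metrics() and the column names are hypothetical,
# and the four-column, header-less CSV layout is an assumption.
read_run_metrics <- function(run_dir) {
  # The folder name doubles as the run identifier
  run_id <- basename(normalizePath(run_dir, mustWork = TRUE))

  files <- list.files(run_dir, pattern = "fastq_metrics\\.csv$",
                      full.names = TRUE)
  if (length(files) == 0) {
    stop("No *fastq_metrics.csv files found in ", run_dir)
  }

  per_sample <- lapply(files, function(f) {
    d <- read.csv(f, header = FALSE, stringsAsFactors = FALSE,
                  col.names = c("section", "mate", "metric", "value"))
    # Use the file name, minus its suffix, as the sample name
    d$sample <- sub("\\.?fastq_metrics\\.csv$", "", basename(f))
    d$run_id <- run_id
    d
  })

  # Stack all samples into one long data frame for plotting
  do.call(rbind, per_sample)
}
```

Calling `read_run_metrics("my_run_folder")` (a placeholder folder name) would return one long data frame with `sample` and `run_id` columns alongside the metrics, a convenient shape for plotting all samples in one figure.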
The resulting report will contain the kinds of plots listed below.
Plots produced by the report:
1- Read Mean Quality; Per-Sequence Quality Scores
Total number of reads at each average Phred-scale quality value. Each read's average quality is rounded to the nearest integer.
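To make this metric concrete: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means roughly one wrong base call per 1,000. The sketch below converts scores to error probabilities and computes the fraction of reads at or above a quality threshold. The metric labels of the form `Q30 Reads`, the helper names, and the read counts are assumptions made for illustration, not output from a real run.

```r
# Phred score -> probability that a base call is wrong
phred_to_error <- function(q) 10^(-q / 10)

# Hypothetical rows from the read mean quality section of one sample
reads <- data.frame(
  metric = c("Q20 Reads", "Q30 Reads", "Q36 Reads"),
  value  = c(1000, 4000, 15000)
)

# Extract the integer quality value from labels such as "Q30 Reads"
reads$quality <- as.integer(sub("^Q([0-9]+) Reads$", "\\1", reads$metric))

# Fraction of reads whose mean quality is at least `threshold`
fraction_at_least <- function(d, threshold) {
  sum(d$value[d$quality >= threshold]) / sum(d$value)
}

phred_to_error(30)            # 0.001
fraction_at_least(reads, 30)  # 0.95
```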
2- Positional Base Mean Quality; Per-Base Quality Scores
Average Phred-scale quality value of bases with a specific nucleotide at a given location in the read. Locations are listed first and can be either specific positions or ranges. The nucleotide is listed second and can be A, C, G, or T. N or ambiguous bases are assumed to have the system default value, usually QV2.
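Because the location (or range) and the nucleotide are both packed into the metric name, they have to be split apart before quality can be plotted against read position. A minimal sketch, assuming labels of the form `ReadPos <position> <base> Average Quality`; that label format, the helper name `parse_positional`, and the example values are assumptions for illustration.

```r
# Sketch only: the label format and parse_positional() are assumptions.
parse_positional <- function(metric, value) {
  pattern <- "^ReadPos ([0-9]+(-[0-9]+)?) ([ACGT]) Average Quality$"
  ok <- grepl(pattern, metric)        # keep only positional rows
  position <- sub(pattern, "\\1", metric[ok])
  data.frame(
    position = position,
    # Start of the position or range, for ordering along the x axis
    start = as.integer(sub("-.*$", "", position)),
    base = sub(pattern, "\\3", metric[ok]),
    mean_quality = as.numeric(value[ok]),
    stringsAsFactors = FALSE
  )
}

# Hypothetical labels and values
parse_positional(
  c("ReadPos 1 A Average Quality", "ReadPos 100-149 T Average Quality"),
  c(33.1, 35.8)
)
```

Keeping both the original `position` label and a numeric `start` lets single positions and ranges share one axis without losing the range information.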