Process Details¶
This section describes each Nextflow process in the pipeline, including its inputs, outputs, and core functionality.
downloadData¶
Purpose: Download the dataset from OSF; by default, this is "illumina-amplicon-capture-wgs.tar.gz".
- Input: None
- Output: Raw `.fastq.gz` files
referenceGenome¶
Purpose: Download the SARS-CoV-2 reference genome (NC_045512.2) using NCBI Entrez Direct.
- Input: None
- Output: `reference.fasta`
qc¶
Purpose: Quality control and cleaning of raw sequencing reads.
- Input: Paired-end FASTQ files
- Output:
    - Cleaned FASTQ files (`pair*.R1/2.clean.fastq.gz`)
    - FastQC reports (`.html`, `.zip`)
    - Fastp reports (`.json`, `.html`)
    - MultiQC summary
mapping¶
Purpose: Align reads to the reference genome and produce sorted, indexed BAM files.
- Input: Cleaned FASTQ files, `reference.fasta`
- Output: Sorted and indexed BAM files (`.sorted.bam`, `.bai`)
primerClipping¶
Purpose: Clip primer sequences using a CleanPlex BEDPE file and bamclipper. The process automatically downloads the `cleanplex.amplicons.bedpe` file and converts it to Unix format during execution.
- Input: Sorted BAM files
- Output: Primer-clipped BAM files (`.primerclipped.bam`)
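
The conversion to Unix format mentioned above can be illustrated with a small sketch. It assumes that "Unix format" means LF line endings (the usual reason a BEDPE file exported on Windows needs converting); the function name `to_unix_line_endings` is hypothetical and is not part of the pipeline, which may well use a command-line tool for this step.

```python
from pathlib import Path


def to_unix_line_endings(path: str) -> int:
    """Rewrite a file in place with LF line endings.

    Returns the number of CRLF (Windows-style) endings that were replaced.
    Assumption: "Unix format" in the process description refers to line endings.
    """
    raw = Path(path).read_bytes()
    # Count Windows-style endings before replacing them.
    n_crlf = raw.count(b"\r\n")
    # Replace CRLF first, then any stray CR that remains.
    converted = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    Path(path).write_bytes(converted)
    return n_crlf
```

Calling `to_unix_line_endings("cleanplex.amplicons.bedpe")` on a file that already uses LF endings leaves it unchanged and returns 0.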
variantCalling¶
Purpose: Call variants from aligned BAM files using freebayes.
- Input: Primer-clipped BAM files, `reference.fasta`
- Output: Raw VCF files (`${sample_id}.vcf`)
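
As a quick way to inspect the raw VCF output, the sketch below counts variant records and separates single-nucleotide substitutions from everything else. It relies only on the standard VCF column order (`CHROM`, `POS`, `ID`, `REF`, `ALT`, ...); the function name `count_vcf_records` is hypothetical, and the classification is deliberately simplistic rather than a description of how freebayes categorises variants.

```python
def count_vcf_records(vcf_text: str) -> dict:
    """Count data records in uncompressed VCF text.

    A record is counted as "snp" when REF and every ALT allele are a single
    base; all other records (indels, MNPs, symbolic alleles) go to "other".
    """
    counts = {"snp": 0, "other": 0}
    for line in vcf_text.splitlines():
        # Skip blank lines and header lines ("##meta" and "#CHROM").
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        ref = fields[3]
        alts = fields[4].split(",")
        is_snp = len(ref) == 1 and all(
            len(alt) == 1 and alt.upper() in "ACGTN" for alt in alts
        )
        counts["snp" if is_snp else "other"] += 1
    return counts
```

For a file on disk, the text can be passed in with `count_vcf_records(open("sample.vcf").read())`, where `sample.vcf` stands in for one `${sample_id}.vcf` output.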
consensusGeneration¶
Purpose: Generate consensus sequences using bcftools from VCFs.
- Input: VCF files, `reference.fasta`
- Output: FASTA files for consensus sequences (`${sample_id}.consensus.fasta`)
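
Conceptually, building a consensus means applying the called variants to the reference sequence. The toy sketch below shows this for single-base substitutions only, using 1-based positions as in VCF. It is an illustration of the idea, not a reimplementation of bcftools, which also handles indels, overlapping records, and masking; `apply_snps` is a hypothetical name.

```python
def apply_snps(reference: str, snps: list) -> str:
    """Apply single-base substitutions to a reference sequence.

    `snps` is a list of (pos, ref, alt) tuples with 1-based positions,
    mirroring the POS, REF, and ALT columns of a VCF record.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        # Guard against records that do not match the reference.
        if seq[pos - 1] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


# Two substitutions applied to a short made-up reference.
print(apply_snps("ACGTACGT", [(2, "C", "T"), (8, "T", "A")]))  # ATGTACGA
```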
mergeConsensus¶
Purpose: Merge consensus sequences from multiple samples into a single FASTA file.
- Input: FASTA files for consensus sequences (`${sample_id}.consensus.fasta`)
- Output: Combined consensus FASTA file (`combined_consensus.fasta`)
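
Merging per-sample FASTA files amounts to concatenating them while making sure each file ends with a newline, so that the next header starts on its own line. The following is a minimal sketch of that idea; `merge_fasta` is a hypothetical helper, and the pipeline itself may simply concatenate the files in a shell step.

```python
from pathlib import Path


def merge_fasta(inputs: list, output: str) -> int:
    """Concatenate FASTA files into `output`; return the number of records."""
    n_records = 0
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if not text:
                continue  # skip empty inputs
            # Each header line (starting with ">") marks one record.
            n_records += sum(
                1 for line in text.splitlines() if line.startswith(">")
            )
            # Ensure a trailing newline so records never run together.
            out.write(text if text.endswith("\n") else text + "\n")
    return n_records
```

A call such as `merge_fasta(sorted(Path(".").glob("*.consensus.fasta")), "combined_consensus.fasta")` would combine every consensus file in the current directory.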
pangolinLineage¶
Purpose: Assign SARS-CoV-2 lineages using pangolin.
- Input: Combined consensus FASTA file
- Output: `lineage_report.csv`, optional TSV and summary files
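
Once the lineage report exists, a short script can summarise how many samples were assigned to each lineage. The sketch below assumes the report contains `taxon` and `lineage` columns, as pangolin reports normally do; `tally_lineages` is a hypothetical helper and not part of the pipeline.

```python
import csv
import io
from collections import Counter


def tally_lineages(csv_text: str) -> Counter:
    """Count samples per lineage from the text of a pangolin lineage report.

    Assumption: the report has a "lineage" column in its header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["lineage"] for row in reader)
```

For a report on disk, `tally_lineages(open("lineage_report.csv").read()).most_common()` lists lineages from most to least frequent.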
consensusQC¶
Purpose: Assess consensus quality using president.
- Input: Consensus FASTA, `reference.fasta`
- Output: QC summary in the `output/` folder
phylogeny¶
Purpose: Perform multiple sequence alignment and build a phylogenetic tree.
- Input: Consensus FASTA file
- Output: `alignment.fasta`, IQ-TREE outputs (e.g., `.treefile`, `.log`)
For a visual overview, see the Workflow section.