Process Details

This section describes each Nextflow process in the pipeline, including its inputs, outputs, and core functionality.

`downloadData`

Purpose: Downloads the "illumina-amplicon-capture-wgs.tar.gz" dataset from OSF by default.

Input: None
Output: Raw .fastq.gz files

`referenceGenome`

Purpose: Downloads the SARS-CoV-2 reference genome (NC_045512.2) using NCBI Entrez Direct.

Input: None
Output: reference.fasta
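
As a rough illustration, the download could be done with Entrez Direct's `efetch`. This is a minimal sketch, not the pipeline's actual script: the function name `fetch_reference` is hypothetical, and it assumes `efetch` is on the PATH with network access available.

```shell
# Hypothetical helper: fetch a nucleotide record as FASTA with Entrez Direct.
# Assumes `efetch` (NCBI Entrez Direct) is installed; not taken from the pipeline source.
fetch_reference() {
    accession="${1:-NC_045512.2}"   # SARS-CoV-2 reference accession named above
    out="${2:-reference.fasta}"     # output file name named above
    efetch -db nucleotide -id "$accession" -format fasta > "$out"
}
```

Calling `fetch_reference` with no arguments would write `reference.fasta` into the current directory.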

`qc`

Purpose: Quality control and cleaning of raw sequencing reads.

Input: Paired-end FASTQ files
Output:
- Cleaned FASTQ files (pair*.R1/2.clean.fastq.gz)
- FastQC reports (.html, .zip)
- fastp reports (.json, .html)
- MultiQC summary
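
The outputs listed above suggest a fastp, FastQC, and MultiQC sequence. The sketch below shows one way such a step might be written for a single read pair. The function name `run_qc`, the report file names, and the choice to run FastQC on the cleaned reads are assumptions; the pipeline's real command line may differ.

```shell
# Hypothetical helper: clean one read pair with fastp, then report with FastQC and MultiQC.
# Assumes fastp, fastqc, and multiqc are installed. Report names are illustrative.
run_qc() {
    sample="$1"; r1="$2"; r2="$3"
    # Trim and filter the pair; write cleaned reads plus JSON and HTML reports.
    fastp -i "$r1" -I "$r2" \
          -o "${sample}.R1.clean.fastq.gz" -O "${sample}.R2.clean.fastq.gz" \
          --json "${sample}.fastp.json" --html "${sample}.fastp.html" &&
    # Per-file quality reports (.html and .zip) for the cleaned reads.
    fastqc "${sample}.R1.clean.fastq.gz" "${sample}.R2.clean.fastq.gz" &&
    # Aggregate every report found in the current directory.
    multiqc .
}
```

For example, `run_qc pair1 pair1.R1.fastq.gz pair1.R2.fastq.gz` would produce `pair1.R1.clean.fastq.gz` and `pair1.R2.clean.fastq.gz` alongside the reports.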

`mapping`

Purpose: Aligns reads to the reference genome and produces sorted, indexed BAM files.

Input: Cleaned FASTQ files, reference.fasta
Output: Sorted and indexed BAM files (.sorted.bam, .bai)
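
The aligner is not named here, so the following sketch assumes BWA-MEM together with samtools purely for illustration. The function name `map_reads` is hypothetical.

```shell
# Hypothetical helper: align a read pair and produce a sorted, indexed BAM.
# ASSUMPTION: BWA-MEM is used as the aligner; the pipeline may use a different tool.
map_reads() {
    sample="$1"; ref="$2"; r1="$3"; r2="$4"
    # Build the BWA index for the reference (needed once per reference).
    bwa index "$ref" &&
    # Align, then coordinate-sort the stream straight into a BAM file.
    # Note: without `set -o pipefail`, a failure in `bwa mem` is not detected here.
    bwa mem "$ref" "$r1" "$r2" | samtools sort -o "${sample}.sorted.bam" - &&
    # Write the .bai index next to the sorted BAM.
    samtools index "${sample}.sorted.bam"
}
```

A call such as `map_reads pair1 reference.fasta pair1.R1.clean.fastq.gz pair1.R2.clean.fastq.gz` would yield `pair1.sorted.bam` and `pair1.sorted.bam.bai`.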

`primerClipping`

Purpose: Clips primer sequences using bamclipper and a CleanPlex BEDPE file. The process automatically downloads the cleanplex.amplicons.bedpe file and converts it to Unix format during execution.

Input: Sorted BAM files
Output: Primer-clipped BAM files (.primerclipped.bam)
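
The line-ending conversion and the clipping call could look roughly like the sketch below. The download step is left out because the file's URL is not given here. The function name `clip_primers` and the intermediate file name `primers.unix.bedpe` are hypothetical, and `tr` stands in for whichever conversion tool the pipeline actually uses.

```shell
# Hypothetical helper: normalise the BEDPE file and clip primers with bamclipper.
# Assumes bamclipper.sh is on PATH; the BEDPE file is expected to exist locally already.
clip_primers() {
    bam="$1"; bedpe="$2"; threads="${3:-1}"
    # Strip carriage returns so the BEDPE file has Unix line endings.
    tr -d '\r' < "$bedpe" > primers.unix.bedpe &&
    # bamclipper writes <input name>.primerclipped.bam into the current directory.
    bamclipper.sh -b "$bam" -p primers.unix.bedpe -n "$threads"
}
```

For instance, `clip_primers pair1.sorted.bam cleanplex.amplicons.bedpe 4` would be expected to write `pair1.sorted.primerclipped.bam`.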

`variantCalling`

Purpose: Calls variants from the primer-clipped BAM files using freebayes.

Input: Primer-clipped BAM files, reference.fasta
Output: Raw VCF files (${sample_id}.vcf)
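
In its simplest form, a freebayes call takes the reference and one BAM file and writes VCF to standard output. The sketch below shows only that basic form; the function name `call_variants` is hypothetical, and any ploidy or filtering options the pipeline may set are not shown.

```shell
# Hypothetical helper: call variants for one sample with freebayes.
# Assumes freebayes is installed; tuning options are deliberately omitted.
call_variants() {
    sample="$1"; ref="$2"; bam="$3"
    # freebayes prints VCF to stdout, so redirect it into the per-sample file.
    freebayes -f "$ref" "$bam" > "${sample}.vcf"
}
```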

`consensusGeneration`

Purpose: Generates consensus sequences from the VCF files using bcftools.

Input: VCF files, reference.fasta
Output: FASTA files for consensus sequences (${sample_id}.consensus.fasta)
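
`bcftools consensus` expects a compressed, indexed VCF, so a typical sequence is compress, index, then apply the variants to the reference. The following is a sketch under that assumption; the function name `make_consensus` is hypothetical, and no variant filtering or masking is shown.

```shell
# Hypothetical helper: build a consensus FASTA from one sample's VCF.
# Assumes bcftools is installed; the pipeline may filter variants first.
make_consensus() {
    sample="$1"; ref="$2"; vcf="$3"
    # Compress to bgzip format, as required for indexing.
    bcftools view -Oz -o "${sample}.vcf.gz" "$vcf" &&
    bcftools index "${sample}.vcf.gz" &&
    # Apply the called variants to the reference sequence.
    bcftools consensus -f "$ref" -o "${sample}.consensus.fasta" "${sample}.vcf.gz"
}
```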

`mergeConsensus`

Purpose: Merges consensus sequences from multiple samples into a single FASTA file.

Input: FASTA files for consensus sequences (${sample_id}.consensus.fasta)
Output: Combined consensus FASTA file (combined_consensus.fasta)
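
Because FASTA is a plain-text format, merging can be as simple as concatenation. A minimal sketch, with the hypothetical function name `merge_consensus`:

```shell
# Hypothetical helper: concatenate per-sample FASTA files into one file.
# The first argument is the output file; all remaining arguments are inputs.
merge_consensus() {
    out="$1"; shift
    cat "$@" > "$out"
}
```

Usage might look like `merge_consensus combined_consensus.fasta *.consensus.fasta`. The glob does not match the output file itself, since `combined_consensus.fasta` does not end in `.consensus.fasta`.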

`pangolinLineage`

Purpose: Assigns SARS-CoV-2 lineages using pangolin.

Input: Combined consensus FASTA file
Output: lineage_report.csv, optional TSV and summary files
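
A basic pangolin invocation takes the query FASTA and, optionally, an output file name. The sketch below assumes that form; the function name `assign_lineages` is hypothetical, and any additional options the pipeline passes are not shown.

```shell
# Hypothetical helper: assign lineages to every sequence in a FASTA file.
# Assumes pangolin is installed and its data files are up to date.
assign_lineages() {
    fasta="${1:-combined_consensus.fasta}"
    pangolin "$fasta" --outfile lineage_report.csv
}
```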

`consensusQC`

Purpose: Assesses consensus quality using president.

Input: Consensus FASTA, reference.fasta
Output: QC summary in output/ folder

`phylogeny`

Purpose: Performs multiple sequence alignment and builds a phylogenetic tree with IQ-TREE.

Input: Consensus FASTA file
Output: alignment.fasta, IQ-TREE outputs (e.g., .treefile, .log)
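
The alignment tool is not named here, so the sketch below assumes MAFFT for the alignment step, followed by IQ-TREE for tree inference. The function name `build_tree` and the `IQTREE_BIN` variable are hypothetical; the IQ-TREE executable name varies between versions, which is why it is configurable.

```shell
# Hypothetical helper: align sequences, then infer a tree.
# ASSUMPTION: MAFFT is the aligner; the pipeline may use a different tool.
build_tree() {
    fasta="$1"
    # Executable name differs across IQ-TREE releases; override with IQTREE_BIN if needed.
    iqtree_bin="${IQTREE_BIN:-iqtree2}"
    # MAFFT writes the alignment to stdout.
    mafft --auto "$fasta" > alignment.fasta &&
    # IQ-TREE names its outputs after the alignment, e.g. alignment.fasta.treefile.
    "$iqtree_bin" -s alignment.fasta
}
```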

For a visual overview, see the Workflow section.
