Process Details

This section describes each Nextflow process in the pipeline, including its inputs, outputs, and core functionality.


downloadData

Purpose: Download the "illumina-amplicon-capture-wgs.tar.gz" dataset from OSF (the default dataset).

  • Input: None
  • Output: Raw .fastq.gz files

referenceGenome

Purpose: Download the SARS-CoV-2 reference genome (NC_045512.2) using NCBI Entrez Direct.

  • Input: None
  • Output: reference.fasta

qc

Purpose: Quality control and cleaning of raw sequencing reads.

  • Input: Paired-end FASTQ files
  • Output:
    • Cleaned FASTQ files (pair*.R1/2.clean.fastq.gz)
    • FastQC reports (.html, .zip)
    • Fastp reports (.json, .html)
    • MultiQC summary
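
The qc process consumes reads as R1/R2 pairs, so it can help to see how mates are matched up by sample. The sketch below is illustrative only and is not taken from the pipeline: the input naming scheme `<sample>.R1.fastq.gz` / `<sample>.R2.fastq.gz` and the helper names `group_read_pairs` and `clean_name` are assumptions. Only the `.R1/2.clean.fastq.gz` suffix comes from the output description above.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) input naming: <sample>.R1.fastq.gz / <sample>.R2.fastq.gz
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)\.R(?P<read>[12])\.fastq\.gz$")


def group_read_pairs(filenames):
    """Return {sample: (r1, r2)} for every sample that has both mates."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match:  # silently skip files that do not follow the scheme
            mates[match["sample"]][match["read"]] = name
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in sorted(mates.items())
        if "1" in reads and "2" in reads  # drop samples missing a mate
    }


def clean_name(sample, read):
    """Build the cleaned-read file name documented for the qc output."""
    return f"{sample}.R{read}.clean.fastq.gz"
```

In an actual Nextflow workflow this grouping is normally done by a channel factory rather than by hand; the Python version only makes the pairing rule explicit.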

mapping

Purpose: Align reads to the reference genome and produce sorted, indexed BAM files.

  • Input: Cleaned FASTQ files, reference.fasta
  • Output: Sorted and indexed BAM files (.sorted.bam, .bai)

primerClipping

Purpose: Clip primer sequences with bamclipper, using a CleanPlex BEDPE file. The process automatically downloads the cleanplex.amplicons.bedpe file and converts it to Unix format during execution.

  • Input: Sorted BAM files
  • Output: Primer-clipped BAM files (.primerclipped.bam)
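
As a rough illustration of the "converts it to Unix format" step, the sketch below normalizes line endings in a file. It assumes that "Unix format" refers to replacing Windows-style CRLF line endings with LF; the pipeline itself may use a different tool for this, and the function names here are hypothetical.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace CRLF (and any stray CR) line endings with LF."""
    # Handle CRLF first so that a pair is not turned into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file_in_place(path) -> None:
    """Rewrite the file at `path` with Unix line endings."""
    target = Path(path)
    target.write_bytes(to_unix_line_endings(target.read_bytes()))
```

For example, `convert_file_in_place("cleanplex.amplicons.bedpe")` would rewrite the downloaded file so that tab-separated fields at the end of each line no longer carry a trailing carriage return.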

variantCalling

Purpose: Call variants from aligned BAM files using freebayes.

  • Input: Primer-clipped BAM files, reference.fasta
  • Output: Raw VCF files (${sample_id}.vcf)

consensusGeneration

Purpose: Generate consensus sequences from VCF files using bcftools.

  • Input: VCF files, reference.fasta
  • Output: FASTA files for consensus sequences (${sample_id}.consensus.fasta)

mergeConsensus

Purpose: Merge consensus sequences from multiple samples into a single FASTA file.

  • Input: FASTA files for consensus sequences (${sample_id}.consensus.fasta)
  • Output: Combined consensus FASTA file (combined_consensus.fasta)
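
Merging per-sample consensus files amounts to concatenating FASTA records. The following is a minimal sketch of that idea, not the pipeline's actual implementation; the function name `merge_fasta` is hypothetical, and the newline guard is an added precaution so that a file lacking a trailing newline does not fuse its last sequence line with the next header.

```python
from pathlib import Path


def merge_fasta(inputs, output):
    """Concatenate FASTA files into `output`; return the number of records."""
    records = 0
    with open(output, "w") as out:
        # Sort for a reproducible record order across runs.
        for path in sorted(Path(p) for p in inputs):
            text = path.read_text()
            if not text:
                continue  # skip empty files
            if not text.endswith("\n"):
                text += "\n"  # keep the next header on its own line
            out.write(text)
            records += sum(1 for line in text.splitlines() if line.startswith(">"))
    return records
```

Calling `merge_fasta(["s1.consensus.fasta", "s2.consensus.fasta"], "combined_consensus.fasta")` (sample file names are placeholders) writes all records to one file and returns how many headers were copied, which gives a quick check that no sample was dropped.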

pangolinLineage

Purpose: Assign SARS-CoV-2 lineages using pangolin.

  • Input: Combined consensus FASTA file
  • Output: lineage_report.csv, optional TSV and summary files
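
Once lineage_report.csv exists, a common follow-up is to tally how many samples fall into each lineage. The sketch below shows one way to do that with the standard library. It assumes the report has a header row with a `lineage` column, as in typical pangolin output; the function name `count_lineages` is hypothetical and not part of the pipeline.

```python
import csv
from collections import Counter


def count_lineages(report_path):
    """Count samples per lineage in a pangolin-style CSV report."""
    with open(report_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Assumes a 'lineage' column; a KeyError signals a different layout.
        return Counter(row["lineage"] for row in reader)
```

The returned `Counter` supports `most_common()`, which lists lineages from most to least frequent.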

consensusQC

Purpose: Assess consensus quality using president.

  • Input: Consensus FASTA, reference.fasta
  • Output: QC summary in the output/ folder

phylogeny

Purpose: Perform multiple sequence alignment and build a phylogenetic tree.

  • Input: Consensus FASTA file
  • Output: alignment.fasta, IQ-TREE outputs (e.g., .treefile, .log)

For a visual overview, see the Workflow section.