Pipeline Workflow Diagram and Overview

This section outlines the logical structure of the pipeline and provides a visual overview of each step.

Workflow Summary

The pipeline consists of the following key stages:

Environment Setup: Initialize and configure the bioinformatics environment.
Data Preparation: Download the SARS-CoV-2 reference genome and the raw Illumina sequencing data.
Quality Control: Assess and clean raw sequencing reads using fastqc, fastp, and multiqc.
Mapping: Align cleaned reads to the SARS-CoV-2 reference genome using minimap2 and process alignments with samtools.
Primer Clipping: Remove primer sequences from alignments using bamclipper.
Variant Calling: Call genetic variants with freebayes.
Normalization: Normalize indels and split multiallelic sites using bcftools norm to prepare for consensus generation.
Consensus Generation: Generate consensus sequences using bcftools.
Lineage Annotation: Annotate sequences using pangolin.
Phylogenetic Analysis: Perform multiple sequence alignment with mafft and infer phylogeny using iqtree.
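
The stage list above can be sketched as a small dependency graph, which makes the execution order explicit. This is an illustration only, not the pipeline's actual code: the stage identifiers and the `execution_order` helper are hypothetical, and the dependency edges are assumptions inferred from the order of the summary (in particular, that both lineage annotation and phylogenetic analysis consume the consensus sequences). The tool names are taken from the list itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical model of the workflow summary.
# Each entry maps a stage name to (tools named in the summary, upstream stages).
# ASSUMPTION: the edges below are inferred from the listed order; the real
# wiring lives in main.nf and may differ.
STAGES = {
    "environment_setup": ((), ()),
    "data_preparation": ((), ("environment_setup",)),
    "quality_control": (("fastqc", "fastp", "multiqc"), ("data_preparation",)),
    "mapping": (("minimap2", "samtools"), ("quality_control",)),
    "primer_clipping": (("bamclipper",), ("mapping",)),
    "variant_calling": (("freebayes",), ("primer_clipping",)),
    "normalization": (("bcftools norm",), ("variant_calling",)),
    "consensus_generation": (("bcftools",), ("normalization",)),
    # ASSUMPTION: both downstream analyses read the consensus sequences.
    "lineage_annotation": (("pangolin",), ("consensus_generation",)),
    "phylogenetic_analysis": (("mafft", "iqtree"), ("consensus_generation",)),
}


def execution_order(stages):
    """Return the stage names in an order that respects every dependency."""
    graph = {name: set(deps) for name, (_tools, deps) in stages.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(STAGES), start=1):
        tools = ", ".join(STAGES[name][0]) or "no tool named in the summary"
        print(f"{step:2d}. {name}: {tools}")
```

Under these assumptions the last two stages have no dependency on each other, so a workflow engine would be free to run them in either order or in parallel.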

Note: Each step is implemented as a separate Nextflow process, and the processes are connected to one another in main.nf.

For process-specific logic and input/output, see the Process Details section.