Lineage Annotation and Consensus Quality Control

This section explains how the pipeline handles lineage annotation and consensus sequence quality assessment.


Lineage Annotation with Pangolin

pangolin is used to assign SARS-CoV-2 lineages based on the consensus sequences generated by the pipeline.

Input

  • consensus-seqs.fasta (combined consensus sequences for all samples)

Output

  • lineage_report.csv — Pangolin lineage calls with sample metadata
  • Additional summary files: summary.csv, pangolin.json (optional)

Usage

Pangolin is executed via the following command in the pipeline:

pangolin -t 8 consensus-seqs.fasta

Consensus QC with PRESIDENT

PRESIDENT evaluates consensus sequences by comparing them to the reference genome.

Checks Performed

  • Pairwise nucleotide identity
  • Ambiguous base counts
  • Masked regions

Input

  • reference.fasta
  • consensus-seqs.fasta

Output

  • Summary reports written to the output/ directory
  • .html and tabular output formats depending on PRESIDENT version

Usage

Executed as part of the pipeline:

president -r reference.fasta -q consensus-seqs.fasta -t 8 -a -p output/ -f consensus_

📈 These results help assess sequence reliability before downstream analysis like phylogenetics.