Lineage Annotation and Consensus Quality Control

This section explains how the pipeline handles lineage annotation and consensus sequence quality assessment.

Lineage Annotation with Pangolin

Pangolin assigns SARS-CoV-2 lineages to the consensus sequences generated by the pipeline.

Input

consensus-seqs.fasta (combined consensus sequences for all samples)

Output

lineage_report.csv — Pangolin lineage calls with sample metadata
Additional summary files: summary.csv, pangolin.json (optional)

Usage

The pipeline runs Pangolin with the following command:

pangolin -t 8 consensus-seqs.fasta
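
Once the report exists, a tally of the assigned lineages is a quick sanity check. The following is a minimal sketch and not part of the pipeline: it assumes the report contains a `lineage` column, as recent Pangolin releases write, and the helper names `count_lineages` and `count_lineages_in_report` are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def count_lineages(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count how many sequences were assigned to each lineage."""
    counts: Counter = Counter()
    for row in rows:
        # Assumption: the report exposes a "lineage" column.
        # Rows with an empty call are grouped under "Unassigned".
        lineage = (row.get("lineage") or "").strip() or "Unassigned"
        counts[lineage] += 1
    return counts


def count_lineages_in_report(path: str = "lineage_report.csv") -> Counter:
    """Read a Pangolin CSV report from disk and tally its lineage calls."""
    with open(path, newline="") as handle:
        return count_lineages(csv.DictReader(handle))
```

Calling `count_lineages_in_report().most_common(5)` would list the five most frequent lineages in the run.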

Consensus QC with PRESIDENT

PRESIDENT evaluates consensus sequences by comparing them to the reference genome.

Checks Performed

Pairwise nucleotide identity
Ambiguous base counts
Masked regions
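
To make the first two checks concrete, the sketch below computes them for a pair of sequences that have already been aligned. It is an illustration only and not PRESIDENT's implementation: the function names are hypothetical, the alignment step is assumed to have happened beforehand, and treating every character other than A, C, G or T as ambiguous is a simplifying assumption.

```python
def ambiguous_fraction(seq: str) -> float:
    """Fraction of non-gap positions that are not an unambiguous A, C, G or T."""
    bases = [b for b in seq.upper() if b != "-"]
    if not bases:
        return 0.0
    ambiguous = sum(1 for b in bases if b not in "ACGT")
    return ambiguous / len(bases)


def pairwise_identity(ref_aln: str, query_aln: str) -> float:
    """Identity over aligned columns where both sequences carry A, C, G or T."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        # Columns with a gap or an ambiguous base on either side are skipped.
        if r in "ACGT" and q in "ACGT":
            compared += 1
            if r == q:
                matches += 1
    return matches / compared if compared else 0.0
```

For example, `pairwise_identity("ACGTAC", "ACGAAN")` compares five columns (the final `N` is skipped) and finds four matches, giving 0.8.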

Input

reference.fasta
consensus-seqs.fasta

Output

Summary reports written to the output/ directory
HTML and tabular output formats, depending on the PRESIDENT version

Usage

The pipeline runs PRESIDENT with the following command:

president -r reference.fasta -q consensus-seqs.fasta -t 8 -a -p output/ -f consensus_

📈 These results help assess sequence reliability before downstream analyses such as phylogenetics.
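
Both steps can also be driven from a small script when they need to be rerun outside the pipeline. The following is a minimal sketch, assuming `pangolin` and `president` are on the `PATH`; the flags mirror the two commands shown in this section, while the function names and default arguments are hypothetical.

```python
import subprocess
from typing import List


def pangolin_cmd(fasta: str, threads: int = 8) -> List[str]:
    """Build the Pangolin command line used in this section."""
    return ["pangolin", "-t", str(threads), fasta]


def president_cmd(reference: str, query: str, outdir: str = "output/",
                  prefix: str = "consensus_", threads: int = 8) -> List[str]:
    """Build the PRESIDENT command line used in this section."""
    return ["president", "-r", reference, "-q", query,
            "-t", str(threads), "-a", "-p", outdir, "-f", prefix]


def run_annotation_and_qc(reference: str = "reference.fasta",
                          fasta: str = "consensus-seqs.fasta") -> None:
    """Run lineage annotation, then consensus QC, stopping at the first failure."""
    for cmd in (pangolin_cmd(fasta), president_cmd(reference, fasta)):
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(cmd, check=True)
```

Passing the commands as argument lists rather than a single string avoids shell quoting problems when file names contain spaces.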
