reference.fasta: The SARS-CoV-2 reference genome (NC_045512.2, the Wuhan-Hu-1 isolate).
Use: The template against which reads are mapped and variants are called.
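A minimal Python sketch for loading the reference and sanity-checking it before use. It assumes an uncompressed FASTA file whose header line starts with the accession; `read_fasta` is a hypothetical helper, not part of the pipeline.

```python
from typing import Dict, Iterable, List, Optional

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}.

    record_id is the first whitespace-delimited token of the header
    (hypothetical helper for illustration only).
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())  # normalise lowercase (soft-masked) bases
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

Usage: `with open("reference.fasta") as fh: ref = read_fasta(fh)`. For NC_045512.2 the single record should be 29,903 bases long, so `len(ref["NC_045512.2"])` is a quick check that the right reference was supplied.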
2. Pre-processing & Alignment
qc/
Content: Quality Control & Trimming
<sample_id>.fastp.html: An interactive HTML report showing read quality before and after trimming. Use it to confirm that adapters were removed and to review per-base quality scores.
<sample_id>.R1/R2.clean.fastq.gz: The "clean" paired-end reads, one file per read direction (R1 and R2). Adapters have been trimmed and low-quality bases removed; these files are the input for read mapping.
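To make "low-quality bases removed" concrete, here is a generic sketch of 3' sliding-window quality trimming on one FASTQ record. It assumes Phred+33 quality encoding; the window size and mean-quality threshold are hypothetical values, and this is an illustration of the idea, not necessarily how fastp trims or how it is configured in this pipeline.

```python
from typing import List, Tuple

def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(c) - 33 for c in qual]

def trim_3prime(seq: str, qual: str, window: int = 4,
                min_mean_q: float = 20.0) -> Tuple[str, str]:
    """Cut bases from the 3' end while the window ending at the current
    tail has a mean quality below min_mean_q (hypothetical defaults)."""
    scores = phred33(qual)
    end = len(scores)
    while end > 0:
        start = max(0, end - window)  # window shrinks near the 5' end
        win = scores[start:end]
        if sum(win) / len(win) >= min_mean_q:
            break  # tail window is good enough; stop trimming
        end -= 1
    return seq[:end], qual[:end]
```

Because the decision is made on a window mean, a few low-quality bases can survive next to high-quality ones (for example, `trim_3prime("ACGTACGTAC", "IIIIII####")` keeps `"ACGTACGT"`); that is the usual trade-off of window-based trimming versus cutting at the first bad base.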
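<sample_id>.primerclipped.bam: A modified BAM file in which PCR primer sequences have been "soft-clipped": the primer bases remain in each read record but are marked as clipped in the CIGAR string, so they are excluded from the aligned portion of the read and ignored during variant calling.
Why this matters: Amplicon sequencing uses synthetic primers to amplify the virus. Bases copied from a primer reflect the primer sequence rather than the viral template, so if they are not removed they can mask real mutations at primer-binding sites or introduce false ones.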
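The effect of soft-clipping on a single read can be sketched as a CIGAR rewrite. This is a simplified illustration under stated assumptions: only a forward primer at the read's left end is handled, coordinates are 0-based with an exclusive `primer_end`, and only the `M`, `I`, `D` and `S` operations are supported. `soft_clip_left_primer` and its helpers are hypothetical names, not the tool this pipeline uses; real primer clippers also handle reverse primers and BAM input/output.

```python
import re
from itertools import groupby
from typing import Optional, Tuple

def expand_cigar(cigar: str) -> str:
    """'3S5M' -> 'SSSMMMMM' (one character per operation)."""
    if not re.fullmatch(r"(\d+[MIDS])+", cigar):
        raise ValueError(f"unsupported CIGAR for this sketch: {cigar}")
    return "".join(op * int(n) for n, op in re.findall(r"(\d+)([MIDS])", cigar))

def compress_ops(ops: str) -> str:
    """'SSSMMMMM' -> '3S5M'."""
    return "".join(f"{len(list(g))}{op}" for op, g in groupby(ops))

def soft_clip_left_primer(start: int, cigar: str,
                          primer_end: int) -> Optional[Tuple[int, str]]:
    """Soft-clip aligned bases lying before primer_end at the read's left end.

    start is the 0-based reference position of the first aligned base.
    Returns (new_start, new_cigar), or None if the whole read is primer.
    """
    ops = expand_cigar(cigar)
    ref_pos = start
    clipped = []
    for i, op in enumerate(ops):
        if op == "S":
            clipped.append("S")  # existing soft clip is kept
        elif op == "M":
            if ref_pos >= primer_end:
                # First aligned base outside the primer: keep the rest as-is.
                return ref_pos, compress_ops("".join(clipped) + ops[i:])
            clipped.append("S")  # aligned base inside the primer -> soft clip
            ref_pos += 1
        elif op == "I":
            clipped.append("S")  # inserted base before the first kept M is clipped
        elif op == "D":
            ref_pos += 1  # deletion consumes reference only; drop it from the CIGAR
    return None
```

For example, a read aligned at position 100 with CIGAR `10M` and a primer ending at 105 becomes `(105, "5S5M")`: the read keeps all ten bases, but only the last five count as aligned.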
<sample_id>.consensus.fasta: The reconstructed viral genome for a single sample. This sequence represents the variant of the virus found in that patient or sample.
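One common way to build such a consensus is a majority rule applied position by position. The sketch below assumes the per-position read bases have already been extracted from the primer-clipped alignment. It ignores indels, and its `min_depth` and `min_freq` thresholds are hypothetical values, not this pipeline's settings. Positions with too little coverage or no dominant base become `N`.

```python
from collections import Counter
from typing import List, Sequence

def majority_consensus(pileup: Sequence[Sequence[str]], min_depth: int = 10,
                       min_freq: float = 0.75) -> str:
    """Build a consensus string from per-position lists of observed read bases.

    A position is 'N' if fewer than min_depth reads cover it, or if its most
    common base makes up less than min_freq of those reads (hypothetical rules).
    """
    out: List[str] = []
    for bases in pileup:
        depth = len(bases)
        if depth < min_depth:
            out.append("N")  # too little coverage to make a call
            continue
        base, count = Counter(bases).most_common(1)[0]
        out.append(base if count / depth >= min_freq else "N")
    return "".join(out)
```

Masking uncertain positions with `N` rather than falling back to the reference base keeps the consensus from silently reporting reference alleles where the sample was never actually observed.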