fasta — FASTA Parser

Source: crates/fastdpplot-core/src/fasta.rs


FastaRecord

A single parsed FASTA record.

pub struct FastaRecord {
    pub id: String,          // everything after '>' up to first whitespace
    pub description: String, // everything after first whitespace on the header line
    pub sequence: String,    // full sequence, newlines stripped, uppercased
    pub length: usize,       // sequence.len(), cached
}
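
The split between id and description described in the field comments can be illustrated with a small sketch. The helper name split_header is hypothetical and not part of the crate, and trimming the whitespace between the id and the description is an assumption about how the real parser behaves.

```rust
/// Split a FASTA header line into (id, description), following the field
/// comments on FastaRecord. `split_header` is a hypothetical helper name.
fn split_header(line: &str) -> (String, String) {
    // Drop the leading '>' if present, plus any trailing whitespace.
    let body = line.strip_prefix('>').unwrap_or(line).trim_end();
    // The id runs up to the first whitespace; the rest is the description.
    match body.split_once(char::is_whitespace) {
        // Assumption: extra whitespace before the description is discarded.
        Some((id, rest)) => (id.to_string(), rest.trim_start().to_string()),
        // No whitespace at all: the whole header is the id.
        None => (body.to_string(), String::new()),
    }
}
```

Under these assumptions, a header such as ">NM_001234.5 Homo sapiens BRCA1 mRNA, complete cds" yields the id "NM_001234.5" and the remainder as the description, while a header with no whitespace yields an empty description.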

parse_fasta

pub fn parse_fasta(path: &str) -> Result<Vec<FastaRecord>, FastDpError>

Parse all records from a FASTA file via a streaming BufReader.

Rules

  • Lines starting with > begin a new record.
  • Lines starting with ; are skipped (FASTA comment syntax).
  • All other non-empty lines are appended to the current record's sequence. Each line is trimmed and uppercased before concatenation, so newlines and leading/trailing whitespace are removed, but internal whitespace within a line is preserved.
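
The rules above can be sketched as a single pass over a buffered reader. This is a minimal illustration rather than the crate's implementation: collect_records is a hypothetical name, records are reduced to (header, sequence) pairs, and ignoring sequence lines that appear before the first header is an assumption.

```rust
use std::io::BufRead;

/// Collect (header, sequence) pairs from any buffered reader.
/// `collect_records` is a hypothetical name, not the crate's API.
fn collect_records<R: BufRead>(reader: R) -> std::io::Result<Vec<(String, String)>> {
    let mut records: Vec<(String, String)> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if line.starts_with(';') {
            // ';' comment lines are skipped.
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            // A '>' line starts a new record with an empty sequence.
            records.push((header.trim().to_string(), String::new()));
            continue;
        }
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        // Sequence line: trimmed and uppercased before concatenation.
        // Assumption: lines before the first header are ignored.
        if let Some((_, sequence)) = records.last_mut() {
            sequence.push_str(&trimmed.to_uppercase());
        }
    }
    Ok(records)
}
```

Because a byte slice implements BufRead, the sketch can be tried on in-memory text, for example collect_records(">seq1\nacgt\n".as_bytes()). Only the ends of each line are trimmed, so a line such as "ac gt" contributes "AC GT" with the internal space intact.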

Errors

Error                        Condition
FastDpError::Io              File cannot be opened.
FastDpError::NoRecords       The file is empty or contains no > headers.
FastDpError::EmptySequence   A record has a header but zero sequence bases.
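
The two content-related conditions can be pictured as a check over the collected records. The sketch below uses a stand-in enum, ContentError, because the actual payloads of the FastDpError variants are not shown here. The id field on EmptySequence and the function name check_records are assumptions made for illustration.

```rust
/// Stand-in for the two content-related error conditions.
/// The real FastDpError may carry different payloads.
#[derive(Debug, PartialEq)]
enum ContentError {
    NoRecords,
    EmptySequence { id: String },
}

/// Check (id, sequence) pairs against the two content rules.
/// `check_records` is a hypothetical helper, not part of the crate.
fn check_records(records: &[(String, String)]) -> Result<(), ContentError> {
    // No '>' header was seen at all.
    if records.is_empty() {
        return Err(ContentError::NoRecords);
    }
    // A header with zero sequence bases is rejected.
    for (id, sequence) in records {
        if sequence.is_empty() {
            return Err(ContentError::EmptySequence { id: id.clone() });
        }
    }
    Ok(())
}
```

In this sketch the first offending record decides the result; whether the real parser reports the first or the last empty record is not stated in the source.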

Example (Rust)

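use fastdpplot_core::fasta::parse_fasta;

// Assumes FastDpError implements std::error::Error so `?` can box it.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let records = parse_fasta("data/query.fasta")?;
    println!("First record: {} ({} bp)", records[0].id, records[0].length);
    Ok(())
}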

Multi-record FASTA files

All records are parsed and returned. The high-level pipeline (dp_input) uses only the first record from each file.


format_axis_label

pub fn format_axis_label(record: &FastaRecord, max_len: usize) -> String

Format a FASTA record as an axis label, truncated to max_len characters.

  • Returns "{id} | {description}" when the description is non-empty.
  • Returns just "{id}" when the description is empty.
  • Truncates the label and appends an ellipsis when the resulting string exceeds max_len.

The default max_len used in the pipeline is 80 characters.

Example

let label = format_axis_label(&record, 80);
// "NM_001234.5 | Homo sapiens BRCA1 mRNA, complete cds"
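
The label rules can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: RecordStub is a stand-in for FastaRecord, axis_label_sketch is a hypothetical name, the marker is assumed to be three ASCII dots, lengths are counted in characters, and the marker is assumed to count toward max_len.

```rust
/// Minimal stand-in for FastaRecord so the sketch compiles on its own.
struct RecordStub {
    id: String,
    description: String,
}

/// Sketch of the label rules. The "..." marker and character-based
/// counting are assumptions, not confirmed behavior of the crate.
fn axis_label_sketch(record: &RecordStub, max_len: usize) -> String {
    let full = if record.description.is_empty() {
        record.id.clone()
    } else {
        format!("{} | {}", record.id, record.description)
    };
    if full.chars().count() <= max_len {
        return full;
    }
    const MARKER: &str = "...";
    if max_len < MARKER.len() {
        // Too short to fit the marker: hard truncation only.
        return full.chars().take(max_len).collect();
    }
    // Reserve room for the marker so the result is exactly max_len characters.
    let keep = max_len - MARKER.len();
    let mut label: String = full.chars().take(keep).collect();
    label.push_str(MARKER);
    label
}
```

With the record from the example above and max_len set to 14, this sketch returns "NM_001234.5...". Counting characters rather than bytes avoids cutting a multi-byte UTF-8 character in half when a description contains non-ASCII text.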