io — Parquet and Text Loaders

Source: crates/fastdpplot-core/src/io.rs


load_parquet

pub fn load_parquet(path: &str) -> Result<Vec<DotPoint>, FastDpError>

Load the x and y coordinates, plus an optional value column (value, identity, or score), from a Parquet file.

Column name detection

Coordinate        Accepted column names
X                 x, X
Y                 y, Y
Value (optional)  value, identity, score

Column matching is case-insensitive. When no value column is found, every point receives value = 1.0.
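The detection rule above can be sketched in a few lines of standard-library Rust. This is an illustration only, not the code in io.rs: find_column and the three name constants are hypothetical names, the schema is modelled as a plain slice of column names, and the sketch assumes that the first matching column in schema order wins, which the documentation does not specify.

```rust
/// Accepted names per role, taken from the table above (compared
/// case-insensitively, so "x" also covers "X").
const X_NAMES: &[&str] = &["x"];
const Y_NAMES: &[&str] = &["y"];
const VALUE_NAMES: &[&str] = &["value", "identity", "score"];

/// Hypothetical helper: return the index of the first schema column whose
/// name matches one of `accepted`, ignoring ASCII case.
/// Assumption: the first match in schema order wins.
fn find_column(schema: &[&str], accepted: &[&str]) -> Option<usize> {
    schema
        .iter()
        .position(|name| accepted.iter().any(|a| name.eq_ignore_ascii_case(a)))
}

/// Points receive value = 1.0 when no value column was detected.
fn value_or_default(value: Option<f64>) -> f64 {
    value.unwrap_or(1.0)
}
```

With a schema of ID, X, y, Score, find_column returns index 1 for X_NAMES, 2 for Y_NAMES, and 3 for VALUE_NAMES. With no value column present it returns None, and value_or_default(None) supplies 1.0.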

Streaming behaviour

Record batches are read one at a time through a projection mask, so only the x and y columns (plus the value column, when present) are fetched. The Parquet file itself is never held in memory in its entirety.

Supported column types

Role   Accepted Arrow types
X, Y   Float32, Float64
Value  Float32, Float64

Column type restriction

If an X or Y column has an integer type, the loader returns FastDpError::UnsupportedType. Cast such columns to Float32 or Float64 before writing the Parquet file.

Errors

Error                         Condition
FastDpError::Io               File cannot be opened.
FastDpError::Parquet          Parquet format error.
FastDpError::Arrow            Arrow record-batch reading/iteration failed.
FastDpError::ColumnNotFound   x/y column not found.
FastDpError::UnsupportedType  Column has an unsupported Arrow type.

load_txt

pub fn load_txt(
    path: &str,
    sep: char,
    x_col: usize,
    y_col: usize,
    val_col: Option<usize>,
) -> Result<Vec<DotPoint>, FastDpError>

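Parse a delimited alignment table, typically tab- or space-separated. sep is the field separator; x_col, y_col, and val_col are the column indices to read.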

Rules

  • Lines starting with # are skipped.
  • If the first non-comment line cannot be parsed as numbers, it is treated as a header row and skipped.
  • Parsing is parallelised with rayon.
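The rules above can be sketched as a small sequential parser using only the standard library. This is not the implementation in io.rs, which parallelises with rayon. Point, ParseError, parse_row, and parse_table are hypothetical stand-ins for DotPoint, FastDpError, and the real internals. The sketch also assumes zero-based column indices, a default value of 1.0 when val_col is None, and that blank lines are skipped; none of these is stated for load_txt.

```rust
/// Hypothetical stand-in for DotPoint.
#[derive(Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
    value: f64,
}

/// Hypothetical stand-in for the MissingColumn and Parse error variants.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingColumn { line: usize },
    Parse { line: usize },
}

/// Extract the required columns from one already-split row.
fn parse_row(
    fields: &[&str],
    x_col: usize,
    y_col: usize,
    val_col: Option<usize>,
    line: usize,
) -> Result<Point, ParseError> {
    let get = |col: usize| -> Result<f64, ParseError> {
        let raw = fields.get(col).ok_or(ParseError::MissingColumn { line })?;
        raw.trim().parse::<f64>().map_err(|_| ParseError::Parse { line })
    };
    let x = get(x_col)?;
    let y = get(y_col)?;
    // Assumption: points default to value = 1.0 when no value column is given.
    let value = match val_col {
        Some(col) => get(col)?,
        None => 1.0,
    };
    Ok(Point { x, y, value })
}

/// Apply the rules: skip '#' lines, treat an unparsable first data line
/// as a header, and report 1-based line numbers in errors.
fn parse_table(
    text: &str,
    sep: char,
    x_col: usize,
    y_col: usize,
    val_col: Option<usize>,
) -> Result<Vec<Point>, ParseError> {
    let mut points = Vec::new();
    let mut first_data_line = true;
    for (idx, raw) in text.lines().enumerate() {
        // Assumption: blank lines are ignored along with comments.
        if raw.starts_with('#') || raw.trim().is_empty() {
            continue;
        }
        let fields: Vec<&str> = raw.split(sep).collect();
        match parse_row(&fields, x_col, y_col, val_col, idx + 1) {
            Ok(point) => points.push(point),
            // A non-numeric first data line is taken to be a header row.
            Err(ParseError::Parse { .. }) if first_data_line => {}
            Err(err) => return Err(err),
        }
        first_data_line = false;
    }
    Ok(points)
}
```

Because each row is parsed independently once the header decision has been made, the per-row step is the natural unit to hand to a parallel iterator. Note that splitting on a single space character yields empty fields when columns are padded with several spaces, so this sketch suits tab-separated input best.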

Errors

Error                       Condition
FastDpError::Io             File cannot be opened.
FastDpError::MissingColumn  A row has fewer fields than the required column index.
FastDpError::Parse          A required field cannot be parsed as a number (includes line number).

BLAST tabular format

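The columns of BLAST -outfmt 6 output are, in order: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore. Column indices are zero-based, so qstart is 6, sstart is 8, and pident is 2. To plot query start against subject start with percent identity as the value: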

python -m cli.main plot \
    --input blast.txt \
    --x-col 6 --y-col 8 --val-col 2 \
    --output dotplot.png
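
To avoid counting columns by hand, the index for each flag can be derived from the column order listed above. A minimal sketch, assuming zero-based indices as the command implies; BLAST_OUTFMT6 and blast_col are hypothetical names and not part of the crate:

```rust
/// Default column order of BLAST -outfmt 6, as listed above.
const BLAST_OUTFMT6: [&str; 12] = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
];

/// Hypothetical helper: zero-based index of a BLAST column name,
/// or None if the name is not a default outfmt 6 column.
fn blast_col(name: &str) -> Option<usize> {
    BLAST_OUTFMT6.iter().position(|col| *col == name)
}
```

blast_col("qstart"), blast_col("sstart"), and blast_col("pident") give the three indices used in the command above, and the same values would be passed as x_col, y_col, and val_col when calling load_txt directly.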