fastdpplot.io — Data Loading

Source: fastdpplot/io.py


load(path, **kwargs) → pd.DataFrame

Auto-detect and load a .parquet or delimited-text file via the Rust backend.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | required | Path to a .parquet file or a delimited text file. |
| sep | str | "\t" | Field separator for text files. Must be a single character. |
| x_col | int | 0 | Zero-based column index for the X coordinate (text files). |
| y_col | int | 1 | Zero-based column index for the Y coordinate (text files). |
| val_col | int \| None | None | Zero-based column index for the value/score. None → every point gets value=1.0. |

Returns

pd.DataFrame with columns ["x", "y", "value"].

Raises

  • ImportError — if the Rust extension is not compiled.
  • ValueError — if the file cannot be parsed (propagated from FastDpError).
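A minimal sketch of handling both documented exceptions in calling code. The wrapper name `try_load` is hypothetical and not part of fastdpplot; the sketch assumes only the exception types listed above.

```python
def try_load(path, **kwargs):
    """Return the loaded DataFrame, or None if loading is not possible.

    `try_load` is a hypothetical helper, not part of fastdpplot.
    """
    try:
        # Import inside the function so a missing package or an
        # uncompiled Rust extension is reported here, not at module import.
        from fastdpplot.io import load

        return load(path, **kwargs)
    except ImportError as exc:
        print(f"fastdpplot backend unavailable: {exc}")
        return None
    except ValueError as exc:
        print(f"could not parse {path!r}: {exc}")
        return None
```

Returning None keeps the caller's control flow simple; code that needs to distinguish the two failure modes can re-raise instead.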

Examples

from fastdpplot.io import load

# Parquet
df = load("hits.parquet")

# Tab-separated with explicit columns
df = load("blast.txt", sep="\t", x_col=6, y_col=8, val_col=2)

# Space-separated, no value column (all points get value=1.0)
df = load("coords.txt", sep=" ")
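The column options for text files can be illustrated with a pure-Python sketch. This is not the Rust implementation: `select_columns` is a hypothetical helper, and details such as numeric types, header rows, or comment handling in the real backend are not covered by this page, so the sketch simply parses every selected field as a float.

```python
import csv
import io


def select_columns(text, sep="\t", x_col=0, y_col=1, val_col=None):
    """Illustrate sep / x_col / y_col / val_col on an in-memory string.

    Hypothetical helper; mirrors the documented options only.
    """
    if len(sep) != 1:
        raise ValueError("sep must be a single character")
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=sep):
        if not fields:
            continue  # skip blank lines
        x = float(fields[x_col])
        y = float(fields[y_col])
        # val_col=None means every point gets value 1.0
        value = 1.0 if val_col is None else float(fields[val_col])
        rows.append((x, y, value))
    return rows


sample = "10 20\n30 40\n"
print(select_columns(sample, sep=" "))
# prints [(10.0, 20.0, 1.0), (30.0, 40.0, 1.0)]
```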

Parquet projection

When loading Parquet, only the columns named x/X, y/Y, and value/identity/score are read from disk. Other columns are skipped, keeping memory usage low.
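As a rough illustration of the projection rule, the sketch below filters a list of column names down to the ones named above. `projected_columns` is a hypothetical helper; how the real loader chooses between value, identity, and score when several are present is not stated here, so the sketch keeps every match.

```python
# Column names taken from the description above.
X_NAMES = ("x", "X")
Y_NAMES = ("y", "Y")
VALUE_NAMES = ("value", "identity", "score")


def projected_columns(available):
    """Return the subset of `available` names that the projection keeps.

    Hypothetical helper; input order is preserved.
    """
    wanted = set(X_NAMES + Y_NAMES + VALUE_NAMES)
    return [name for name in available if name in wanted]


print(projected_columns(["qseqid", "x", "y", "identity", "evalue"]))
# prints ['x', 'y', 'identity']
```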


load_bin(matrix_path, fasta_a, fasta_b) → dict

One-shot loader for a raw binary DP matrix paired with two FASTA sequence files.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| matrix_path | str | required | Path to the raw binary .bin matrix file. |
| fasta_a | str | required | Path to the query FASTA file (rows of the matrix). |
| fasta_b | str | required | Path to the subject FASTA file (columns of the matrix). |

Returns

A dict with keys:

| Key | Type | Description |
| --- | --- | --- |
| "df" | pd.DataFrame | Dot-point data with columns ["x", "y", "value"]. |
| "x_label" | str | Subject sequence axis label (formatted from FASTA header). |
| "y_label" | str | Query sequence axis label (formatted from FASTA header). |
| "x_range" | tuple[int, int] | (0, subject_length): full X axis span. |
| "y_range" | tuple[int, int] | (0, query_length): full Y axis span. |

Raises

  • ImportError — if the Rust extension is not compiled.
  • ValueError — propagated from Rust errors (size mismatch, unknown dtype, FASTA parse errors, …).

Examples

from fastdpplot.io import load_bin

result = load_bin(
    "data/dp_matrix.bin",
    "data/query.fasta",
    "data/subject.fasta",
)

print(result["x_label"])   # e.g. "NM_001234 | Homo sapiens BRCA1 mRNA"
print(result["x_range"])   # (0, 81189)
print(result["df"].shape)  # (N, 3)

Axis swap

If the matrix file size matches the transposed dimensions (subject × query) rather than (query × subject), fastdpplot swaps the axes automatically and emits a warning to stderr.

Only the first FASTA record is used

When a FASTA file contains multiple sequences, only the first record is used to derive the axis label and the axis length; the remaining records are ignored. This mirrors the behaviour of most DP tools that output a single pairwise matrix.
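The first-record rule can be sketched in plain Python as follows. This is an illustration only, not the library's Rust parser: `first_fasta_record` is a hypothetical helper, and the way fastdpplot formats the header into an axis label is not reproduced.

```python
def first_fasta_record(lines):
    """Return (header, sequence_length) for the first record in `lines`.

    Hypothetical helper; `lines` is any iterable of strings,
    for example an open text file.
    """
    header = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here: stop reading
            header = line[1:]
        elif header is not None:
            length += len(line)  # sequence may span several lines
    if header is None:
        raise ValueError("no FASTA record found")
    return header, length


fasta = [">seq1 first record", "ACGT", "AC", ">seq2 ignored", "GGGG"]
print(first_fasta_record(fasta))
# prints ('seq1 first record', 6)
```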