`fastdpplot.io` — Data Loading

Source: fastdpplot/io.py

`load(path, **kwargs) → pd.DataFrame`

Load a `.parquet` or delimited-text file via the Rust backend, detecting the format automatically.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `path` | `str` | required | Path to a `.parquet` file or a delimited text file. |
| `sep` | `str` | `"\t"` | Field separator for text files. Must be a single character. |
| `x_col` | `int` | `0` | Zero-based column index for the X coordinate (text files). |
| `y_col` | `int` | `1` | Zero-based column index for the Y coordinate (text files). |
| `val_col` | `int \| None` | `None` | Zero-based column index for the value/score. `None` → every point gets `value=1.0`. |

Returns

A `pd.DataFrame` with columns `["x", "y", "value"]`.

Raises

ImportError — if the Rust extension is not compiled.
ValueError — if the file cannot be parsed (propagated from FastDpError).

Examples

from fastdpplot.io import load

# Parquet
df = load("hits.parquet")

# Tab-separated with explicit columns
df = load("blast.txt", sep="\t", x_col=6, y_col=8, val_col=2)

# Space-separated, no value column (all points get value=1.0)
df = load("coords.txt", sep=" ")
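The text-file parameters (`sep`, `x_col`, `y_col`, `val_col`) can be pictured with a small pure-Python sketch. This is not the Rust implementation: `parse_points` is a hypothetical stand-in, and details such as skipping blank lines and converting every field to `float` are assumptions made only for illustration.

```python
import csv
import io


def parse_points(text, sep="\t", x_col=0, y_col=1, val_col=None):
    """Hypothetical stand-in for the text path of load(); not the real backend."""
    if len(sep) != 1:
        raise ValueError("sep must be a single character")
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=sep):
        if not fields:
            continue  # assumption: blank lines are ignored
        x = float(fields[x_col])
        y = float(fields[y_col])
        # With no value column, every point gets the documented default of 1.0.
        value = 1.0 if val_col is None else float(fields[val_col])
        rows.append({"x": x, "y": y, "value": value})
    return rows


sample = "10\t20\t0.95\n30\t40\t0.80\n"
with_values = parse_points(sample, val_col=2)
without_values = parse_points(sample)
```

Here `with_values` carries the scores from the third column, while `without_values` assigns `1.0` to both points, matching the `val_col=None` behaviour described in the parameter table.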

Parquet projection

When loading Parquet, only the columns named x/X, y/Y, and value/identity/score are read from disk. Other columns are skipped, keeping memory usage low.
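As a rough illustration of projection, the sketch below selects, from a list of column names, only those the note above says are read. `pick_projection` and the sample column names are hypothetical. How the Rust backend resolves a file that contains several candidate names (for example both `identity` and `score`) is not stated here, so the sketch simply keeps every match.

```python
X_NAMES = ("x", "X")
Y_NAMES = ("y", "Y")
VALUE_NAMES = ("value", "identity", "score")


def pick_projection(available):
    """Return the column names a projected read would keep, in file order."""
    wanted = set(X_NAMES + Y_NAMES + VALUE_NAMES)
    return [name for name in available if name in wanted]


# Hypothetical schema of a wide Parquet file.
columns = ["qseqid", "x", "y", "identity", "evalue", "bitscore"]
projected = pick_projection(columns)
```

A list like `projected` is the kind of argument that column-aware readers accept, for instance the `columns` parameter of `pandas.read_parquet`. Columns left out of the list are never decoded, which is where the memory saving comes from.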

`load_bin(matrix_path, fasta_a, fasta_b) → dict`

One-shot loader for a raw binary DP matrix paired with two FASTA sequence files.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `matrix_path` | `str` | required | Path to the raw binary `.bin` matrix file. |
| `fasta_a` | `str` | required | Path to the query FASTA file (rows of the matrix). |
| `fasta_b` | `str` | required | Path to the subject FASTA file (columns of the matrix). |

Returns

A dict with keys:

| Key | Type | Description |
| --- | --- | --- |
| `"df"` | `pd.DataFrame` | Dot-point data with columns `["x", "y", "value"]`. |
| `"x_label"` | `str` | Subject sequence axis label (formatted from FASTA header). |
| `"y_label"` | `str` | Query sequence axis label (formatted from FASTA header). |
| `"x_range"` | `tuple[int, int]` | `(0, subject_length)`: the full X axis span. |
| `"y_range"` | `tuple[int, int]` | `(0, query_length)`: the full Y axis span. |

Raises

ImportError — if the Rust extension is not compiled.
ValueError — propagated from Rust errors (size mismatch, unknown dtype, FASTA parse errors, …).

Examples

from fastdpplot.io import load_bin

result = load_bin(
    "data/dp_matrix.bin",
    "data/query.fasta",
    "data/subject.fasta",
)

print(result["x_label"])   # e.g. "NM_001234 | Homo sapiens BRCA1 mRNA"
print(result["x_range"])   # (0, 81189)
print(result["df"].shape)  # (N, 3)

Axis swap

If the matrix file size matches the transposed dimensions (subject × query) rather than (query × subject), fastdpplot swaps the axes automatically and emits a warning to stderr.

Only the first FASTA record is used

When a FASTA file contains multiple sequences, only the first record is used to derive the axis label and dimension. This mirrors the behaviour of most DP tools that output a single pairwise matrix.
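The first-record rule can be sketched in plain Python. `first_fasta_record` is a hypothetical helper, not part of fastdpplot. It returns the raw header and the sequence length, and makes no attempt to reproduce how the library formats the axis label from the header.

```python
import io


def first_fasta_record(handle):
    """Return (header, sequence_length) for the first FASTA record only."""
    header = None
    length = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record begins: ignore it and everything after
            header = line[1:]
        elif header is not None:
            length += len(line)  # sequence may span several lines
    if header is None:
        raise ValueError("no FASTA record found")
    return header, length


# In-memory example with two records; only the first one is used.
fasta = io.StringIO(
    ">seq1 first record\nACGT\nAC\n>seq2 second record\nGGGGGGGG\n"
)
header, length = first_fasta_record(fasta)
```

With this input, `header` is `"seq1 first record"` and `length` is `6`. The second record contributes nothing, which is the behaviour the note describes for both the axis label and the matrix dimension.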
