fastdpplot.io: Data Loading
Source: fastdpplot/io.py
load(path, **kwargs) → pd.DataFrame
Auto-detect and load a .parquet or delimited-text file via the Rust backend.
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | Path to a .parquet file or a delimited text file. |
| sep | str | "\t" | Field separator for text files. Must be a single character. |
| x_col | int | 0 | Zero-based column index for the X coordinate (text files). |
| y_col | int | 1 | Zero-based column index for the Y coordinate (text files). |
| val_col | int \| None | None | Zero-based column index for the value/score. If None, every point gets value=1.0. |
Returns
pd.DataFrame with columns ["x", "y", "value"].
Raises

- ImportError: if the Rust extension is not compiled.
- ValueError: if the file cannot be parsed (propagated from FastDpError).
Examples

```python
from fastdpplot.io import load

# Parquet
df = load("hits.parquet")

# Tab-separated with explicit columns
df = load("blast.txt", sep="\t", x_col=6, y_col=8, val_col=2)

# Space-separated, no value column (all points get value=1.0)
df = load("coords.txt", sep=" ")
```
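The column mapping that load() applies to text files can be modelled in a few lines of standard-library Python, which is handy for checking which x_col, y_col and val_col indices to pass before loading a large file. This is a hypothetical reference sketch (parse_points is not part of fastdpplot): it assumes the file has no header row, it uses the csv module's quoting rules (which may differ from the Rust parser), and it returns plain (x, y, value) tuples instead of a DataFrame.

```python
import csv
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float, float]


def parse_points(
    lines: Iterable[str],
    sep: str = "\t",
    x_col: int = 0,
    y_col: int = 1,
    val_col: Optional[int] = None,
) -> List[Point]:
    """Hypothetical model of load()'s text-file column mapping (not fastdpplot code)."""
    # Mirror the documented constraint on the separator.
    if len(sep) != 1:
        raise ValueError("sep must be a single character")
    points: List[Point] = []
    for recno, row in enumerate(csv.reader(lines, delimiter=sep), start=1):
        if not row:
            continue  # skip blank lines
        try:
            x = float(row[x_col])
            y = float(row[y_col])
            # No value column: every point gets value=1.0, as documented for load().
            value = float(row[val_col]) if val_col is not None else 1.0
        except (IndexError, ValueError) as exc:
            # Surface parse problems as ValueError, like load() does.
            raise ValueError(f"record {recno}: cannot parse {row!r}") from exc
        points.append((x, y, value))
    return points
```

Under those assumptions, parse_points(open("coords.txt"), sep=" ") is meant to mirror the column selection of load("coords.txt", sep=" ") on well-formed input.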
Parquet projection
When loading Parquet, only the columns named x/X, y/Y, and
value/identity/score are read from disk. Other columns are skipped,
keeping memory usage low.
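The projection rule above can be sketched as a small planning step that picks, for each output column, a matching source column name. plan_projection and columns_to_read are hypothetical helpers, not fastdpplot API. The precedence when several value names are present (value, then identity, then score) is an assumption based on the order the names are listed here, and this page does not say what happens when no value column exists, so the sketch simply reports None.

```python
from typing import Dict, Iterable, List, Optional

# Candidate source names for each output column, in the order listed above.
# Assumption: the first name found wins.
_CANDIDATES = {
    "x": ("x", "X"),
    "y": ("y", "Y"),
    "value": ("value", "identity", "score"),
}


def plan_projection(schema_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each output column to the first matching source column (or None)."""
    available = set(schema_names)
    return {
        out_col: next((name for name in names if name in available), None)
        for out_col, names in _CANDIDATES.items()
    }


def columns_to_read(plan: Dict[str, Optional[str]]) -> List[str]:
    """The only source columns that need to be read from disk."""
    return [name for name in plan.values() if name is not None]
```

From Python, the resulting list could be passed as the columns argument of pyarrow.parquet.read_table to get a similar read-only-what-you-need effect; according to this page, load() performs its projection in the Rust backend.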
load_bin(matrix_path, fasta_a, fasta_b) → dict
One-shot loader for a raw binary DP matrix paired with two FASTA sequence files.
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| matrix_path | str | required | Path to the raw binary .bin matrix file. |
| fasta_a | str | required | Path to the query FASTA file (rows of the matrix). |
| fasta_b | str | required | Path to the subject FASTA file (columns of the matrix). |
Returns

A dict with keys:

| Key | Type | Description |
|---|---|---|
| "df" | pd.DataFrame | Dot-point data with columns ["x", "y", "value"]. |
| "x_label" | str | Subject sequence axis label (formatted from the FASTA header). |
| "y_label" | str | Query sequence axis label (formatted from the FASTA header). |
| "x_range" | tuple[int, int] | (0, subject_length): the full X axis span. |
| "y_range" | tuple[int, int] | (0, query_length): the full Y axis span. |
Raises

- ImportError: if the Rust extension is not compiled.
- ValueError: propagated from Rust errors (size mismatch, unknown dtype, FASTA parse errors, …).
Examples

```python
from fastdpplot.io import load_bin

result = load_bin(
    "data/dp_matrix.bin",
    "data/query.fasta",
    "data/subject.fasta",
)
print(result["x_label"])   # e.g. "NM_001234 | Homo sapiens BRCA1 mRNA"
print(result["x_range"])   # e.g. (0, 81189)
print(result["df"].shape)  # (N, 3)
```
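The relationship between the matrix layout and the returned keys (matrix rows follow the query on the Y axis, matrix columns follow the subject on the X axis) can be sketched as follows. This is an illustration, not fastdpplot's implementation: read_cells and matrix_to_points are hypothetical names, and the sketch assumes a headerless, row-major matrix in native byte order whose element type is given as an array typecode. It also assumes every non-zero cell becomes a dot, which this page does not specify.

```python
import array
from typing import List, Sequence, Tuple

Point = Tuple[int, int, float]


def read_cells(raw: bytes, typecode: str = "f") -> array.array:
    """Decode raw bytes into matrix cells (assumption: native byte order, no header)."""
    cells = array.array(typecode)  # raises ValueError for an unknown typecode
    cells.frombytes(raw)           # raises ValueError if len(raw) is not a multiple of itemsize
    return cells


def matrix_to_points(
    cells: Sequence[float], query_len: int, subject_len: int
) -> Tuple[List[Point], Tuple[int, int], Tuple[int, int]]:
    """Turn a row-major (query x subject) matrix into (x, y, value) dots plus axis ranges."""
    if len(cells) != query_len * subject_len:
        raise ValueError(
            f"size mismatch: {len(cells)} cells, expected {query_len} x {subject_len}"
        )
    points: List[Point] = []
    for row in range(query_len):        # matrix rows follow the query -> Y axis
        base = row * subject_len
        for col in range(subject_len):  # matrix columns follow the subject -> X axis
            value = cells[base + col]
            if value != 0:              # assumption: zero cells are not drawn
                points.append((col, row, float(value)))
    x_range = (0, subject_len)          # full X span, like result["x_range"]
    y_range = (0, query_len)            # full Y span, like result["y_range"]
    return points, x_range, y_range
```

For a 2 x 3 matrix stored as [0, 1, 0, 2, 0, 3], the non-zero cells become the dots (1, 0, 1.0), (0, 1, 2.0) and (2, 1, 3.0), with x_range (0, 3) and y_range (0, 2).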
Axis swap
If the matrix file size matches the transposed dimensions (subject × query) rather than (query × subject), fastdpplot swaps the axes automatically and emits a warning to stderr.
Only the first FASTA record is used
When a FASTA file contains multiple sequences, only the first record is used to derive the axis label and the axis length. This mirrors the behaviour of most DP tools that output a single pairwise matrix.
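Reading only the first FASTA record can be sketched with the standard library. first_fasta_record and format_label are hypothetical helpers; the "ID | description" label style is inferred from the example x_label output on this page and may not match fastdpplot's exact formatting (for instance, for headers that have no description).

```python
from typing import Iterable, Optional, Tuple


def first_fasta_record(lines: Iterable[str]) -> Tuple[str, int]:
    """Return (header, sequence_length) for the first FASTA record only."""
    header: Optional[str] = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here: ignore the rest of the file
            header = line[1:].strip()
        elif header is not None:
            length += len(line)  # sequence may be wrapped over several lines
    if header is None:
        raise ValueError("no FASTA record found")
    return header, length


def format_label(header: str) -> str:
    """Hypothetical 'ID | description' label, inferred from the documented example."""
    parts = header.split(None, 1)
    if not parts:
        return ""
    return parts[0] if len(parts) == 1 else f"{parts[0]} | {parts[1].strip()}"
```

Usage: with open("data/query.fasta") as fh: header, query_length = first_fasta_record(fh). The length obtained this way is what the sketch would use for the (0, query_length) axis span.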