Architecture & Design
This page describes the overall architecture of fastdpplot, the data flow from input files to rendered image, and the design decisions behind the Rust/Python split.
High-level overview

    ┌─────────────────────────────────────────────────────────────┐
    │                       User interface                        │
    │      CLI (cli/main.py) · Python API · Jupyter notebook      │
    └──────────────────────────────┬──────────────────────────────┘
                                   │
                 ┌─────────────────▼──────────────────┐
                 │          Python wrappers           │
                 │   fastdpplot.io / plot / server    │
                 │    (thin, stateless, type-safe)    │
                 └─────────────────┬──────────────────┘
                                   │ PyO3 function calls
                 ┌─────────────────▼──────────────────┐
                 │       fastdpplot._rs (PyO3)        │
                 │  crates/fastdpplot-py/src/lib.rs   │
                 └─────────────────┬──────────────────┘
                                   │ Rust function calls
                 ┌─────────────────▼──────────────────┐
                 │    fastdpplot-core (pure Rust)     │
                 │   io · fasta · binmat · dp_input   │
                 │  binning · sparse · tile · types   │
                 └────────────────────────────────────┘
Layer responsibilities
CLI (cli/main.py)
- Argument parsing via argparse.
- Two sub-commands: plot (Parquet/text) and bin (binary + FASTA).
- Backward-compatible flat interface for existing scripts.
- Delegates entirely to fastdpplot.io and fastdpplot.plot.
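
The sub-command layout described above could look roughly like the sketch below. Only the sub-command names `plot` and `bin` come from this page. The flag names (`--query`, `--subject`, `--output`) and the way the flat legacy form is mapped onto `plot` are assumptions made for illustration, not the actual contents of cli/main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with `plot` and `bin` sub-commands (illustrative flags only)."""
    parser = argparse.ArgumentParser(prog="fastdpplot")
    sub = parser.add_subparsers(dest="command")

    # `plot`: Parquet or text dot-point tables.
    p_plot = sub.add_parser("plot", help="plot a Parquet/text table")
    p_plot.add_argument("input")
    p_plot.add_argument("--output", default="plot.png")  # hypothetical flag

    # `bin`: raw binary matrix plus the two FASTA files.
    p_bin = sub.add_parser("bin", help="plot a binary matrix + FASTA")
    p_bin.add_argument("matrix")
    p_bin.add_argument("--query", required=True)    # hypothetical flag
    p_bin.add_argument("--subject", required=True)  # hypothetical flag
    p_bin.add_argument("--output", default="plot.png")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse argv; anything not starting with a sub-command is treated as `plot ...`.

    This is one assumed way to keep a flat, pre-sub-command interface working.
    """
    if argv and argv[0] not in {"plot", "bin", "-h", "--help"}:
        argv = ["plot", *argv]
    return build_parser().parse_args(argv)
```

With this sketch, `parse(["data.parquet"])` and `parse(["plot", "data.parquet"])` yield the same namespace, so scripts written against the flat form keep working.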
Python wrappers (fastdpplot/)
| Module | Responsibility |
|---|---|
| io.py | File-type detection, call the right Rust loader, return a pandas DataFrame. |
| plot.py | datashader rasterisation, matplotlib axes, Panel server setup. |
| server.py | One-liner wrappers that load data then start the server. |
| sparse_convert.py | Bridge between _rs.to_coo and scipy.sparse. |
No data transformation happens in Python beyond DataFrame construction. All parsing and binning are in Rust.
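
As a rough illustration of the "detect the file type, then call the right loader" step in io.py, the sketch below dispatches on the file suffix. The loader functions are stand-ins that return a string. The real Rust loader names, the supported suffixes, and the DataFrame construction are not specified on this page and are assumed here.

```python
from pathlib import Path
from typing import Callable


# Stand-ins for the Rust loaders exposed through fastdpplot._rs.
# The real names and return types are assumptions.
def _load_parquet(path: str) -> str:
    return f"parquet:{path}"


def _load_text(path: str) -> str:
    return f"text:{path}"


# Assumed suffix-to-loader mapping.
_LOADERS: dict[str, Callable[[str], str]] = {
    ".parquet": _load_parquet,
    ".txt": _load_text,
    ".tsv": _load_text,
}


def load(path: str) -> str:
    """Pick a loader from the file suffix (case-insensitive) and call it."""
    suffix = Path(path).suffix.lower()
    try:
        loader = _LOADERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return loader(path)
```

Keeping the mapping in a single dictionary keeps the wrapper stateless: adding a format means adding one entry, with no branching logic on the Python side.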
PyO3 bridge (crates/fastdpplot-py/)
- Each #[pyfunction] is a thin wrapper: convert Python types → Rust types, call the core function, convert the result back.
- Errors are mapped to PyValueError via the map_err helper.
- All functions are registered in the _rs module by #[pymodule].
Rust core (crates/fastdpplot-core/)
Pure Rust, no Python dependency. Can be used as a Rust library independently.
Data flow — .bin + FASTA pipeline

    data/query.fasta ───┐
                        ├─► parse_fasta() ──► FastaRecord (id, desc, sequence, length)
    data/subject.fasta ─┘
    data/dp_matrix.bin ─► read_bin_matrix() ──► BinMatrix (flat Vec<i32>)
                          matrix_to_dotpoints() ──► Vec<DotPoint> (parallel, raw scores)
                          bin_points() ──► BinGrid (parallel histogram)
                          datashader / matplotlib ──► PNG / SVG / HTML
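
To make the middle stages concrete, here is a pure-Python sketch of what the matrix-to-points and binning steps compute. It is not the Rust implementation. The row-major layout, the `min_score` filter, the choice of x = column and y = row, and the integer scaling onto the canvas are all assumptions made for illustration.

```python
def matrix_to_dotpoints(flat, rows, cols, min_score=1):
    """Emit (x, y, score) for every cell whose score is >= min_score.

    `flat` is assumed to be a row-major buffer of length rows * cols.
    """
    if len(flat) != rows * cols:
        raise ValueError("flat buffer does not match rows * cols")
    return [
        (c, r, flat[r * cols + c])
        for r in range(rows)
        for c in range(cols)
        if flat[r * cols + c] >= min_score
    ]


def bin_points(points, rows, cols, width, height):
    """Count points per canvas cell; returns a height x width grid of counts."""
    grid = [[0] * width for _ in range(height)]
    for x, y, _score in points:
        # Scale matrix coordinates onto the canvas, clamping to the last cell.
        gx = min(x * width // cols, width - 1)
        gy = min(y * height // rows, height - 1)
        grid[gy][gx] += 1
    return grid


# A 4 x 4 matrix with one scoring cell per row, binned onto a 2 x 2 canvas.
flat = [0, 5, 0, 0,
        0, 0, 7, 0,
        3, 0, 0, 0,
        0, 0, 0, 9]
points = matrix_to_dotpoints(flat, rows=4, cols=4)
grid = bin_points(points, rows=4, cols=4, width=2, height=2)
```

The binned grid is what bounds the rendering cost: its size depends on the canvas, not on the size of the input matrix.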
Parallelism model
fastdpplot uses rayon for CPU parallelism:
| Operation | Strategy |
|---|---|
| TXT parsing | par_iter over pre-collected lines |
| matrix_to_dotpoints | par_iter().flat_map_iter() over rows |
| bin_points | par_iter().fold().reduce() with thread-local accumulators |
Thread count
Rayon automatically uses all logical CPUs. Set the RAYON_NUM_THREADS
environment variable to limit parallelism if needed.
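
The fold/reduce strategy listed for bin_points can be illustrated in Python: each worker folds its chunk into a private histogram, and the partial histograms are then summed. This is a sketch of the pattern only. The function names are hypothetical, and Python threads will not reproduce rayon's speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def fold_chunk(chunk, width, height):
    """Fold step: build a private histogram for one chunk of (gx, gy) cells."""
    local = [0] * (width * height)
    for gx, gy in chunk:
        local[gy * width + gx] += 1
    return local


def merge(a, b):
    """Reduce step: element-wise sum of two histograms."""
    return [x + y for x, y in zip(a, b)]


def parallel_histogram(cells, width, height, workers=4):
    """Split cells into one chunk per worker, fold each, then reduce."""
    size = max(1, -(-len(cells) // workers))  # ceiling division
    chunks = [cells[i:i + size] for i in range(0, len(cells), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: fold_chunk(c, width, height), chunks))
    # The zero histogram is the identity, so an empty input is handled too.
    return reduce(merge, partials, [0] * (width * height))
```

Because every worker writes only to its own accumulator, no locking is needed during the fold; the only shared work is the final merge, which costs O(width × height) per partial result.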
Memory model
| Scenario | Memory usage |
|---|---|
| Parquet loading | O(batch) — one record batch at a time |
| .bin loading | O(rows × cols × elem_size) — full matrix in RAM |
| Binning | O(width × height) per thread — tiny for typical canvas sizes |
| Sparse output | O(nnz) where nnz ≪ width × height |
Large binary matrices
A 100 000 × 100 000 f32 matrix is 40 GB in RAM. For matrices this
large, consider loading only a region or downsampling before calling
load_bin_matrix.
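
The rows × cols × elem_size estimate is easy to check before loading a file. The helper below is a hypothetical pre-flight check, not part of fastdpplot, and the set of dtype names is an assumption.

```python
# Element sizes in bytes. Which dtypes fastdpplot accepts is an assumption here.
ELEM_SIZE = {"u8": 1, "i16": 2, "i32": 4, "f32": 4, "f64": 8}


def matrix_bytes(rows: int, cols: int, dtype: str) -> int:
    """Bytes needed to hold a dense rows x cols matrix of the given dtype."""
    return rows * cols * ELEM_SIZE[dtype]


def fits_in_budget(rows: int, cols: int, dtype: str, budget_bytes: int) -> bool:
    """True if the dense matrix fits within the given memory budget."""
    return matrix_bytes(rows, cols, dtype) <= budget_bytes


# 100 000 x 100 000 f32 cells at 4 bytes each: 4 * 10**10 bytes, i.e. 40 GB.
big = matrix_bytes(100_000, 100_000, "f32")
```

A 10 000 × 10 000 f32 matrix, by contrast, needs 400 MB and fits comfortably on a typical workstation.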
GPU acceleration
fastdpplot detects cudf (RAPIDS cuDF) at import time in plot.py:

    import pandas as pd

    try:
        import cudf  # optional GPU backend
        _DataFrame = cudf.DataFrame
    except ImportError:
        cudf = None  # fall back to CPU DataFrames
        _DataFrame = pd.DataFrame

When cuDF is available, the dot-point DataFrame is transferred to GPU memory before being passed to datashader. All other processing (I/O, binning) remains on CPU in Rust.
Note
The GPU path is transparent — render_static, serve_interactive, and
show all work identically with or without cuDF.
Error handling
Rust errors are defined in crates/fastdpplot-core/src/error.rs using
thiserror. The PyO3 bridge maps every FastDpError variant to a Python
ValueError with a human-readable message.
| Rust error | Typical cause |
|---|---|
| Io | File not found, permission denied |
| SizeMismatch | Wrong dtype or swapped dimensions |
| CannotInferDtype | File size matches no dtype for given dimensions |
| DimMismatch | Matrix and FASTA lengths disagree even after transposition |
| NoRecords | Empty FASTA file |
| EmptySequence | FASTA record with no bases |
| ColumnNotFound | Parquet file missing x / y columns |
| Parse | Non-numeric field in a text table |
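
The CannotInferDtype condition in the table can be pictured as a size check: the file size must equal rows × cols × element size for some supported element width. The sketch below mirrors that check in Python and raises ValueError, the exception type the bridge uses. The candidate widths and the message text are assumptions, and since a size check alone cannot distinguish i32 from f32, it returns a width rather than a dtype.

```python
CANDIDATE_SIZES = (1, 2, 4, 8)  # assumed element widths in bytes


def infer_elem_size(file_size: int, rows: int, cols: int) -> int:
    """Return the element width whose dense matrix size equals file_size."""
    cells = rows * cols
    if cells <= 0:
        raise ValueError("rows and cols must be positive")
    matches = [s for s in CANDIDATE_SIZES if cells * s == file_size]
    if not matches:
        raise ValueError(
            f"cannot infer dtype: {file_size} bytes matches no element size "
            f"for a {rows} x {cols} matrix"
        )
    # Candidate widths are distinct, so at most one can match.
    return matches[0]


def describe(file_size: int, rows: int, cols: int) -> str:
    """Caller-side handling: every failure surfaces as a ValueError."""
    try:
        return f"{infer_elem_size(file_size, rows, cols)} bytes per element"
    except ValueError as exc:
        return f"error: {exc}"
```

Because all variants arrive as ValueError on the Python side, callers that need to tell them apart have to inspect the message text.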