
Architecture & Design

This page describes the overall architecture of fastdpplot, the data flow from input files to rendered image, and the design decisions behind the Rust/Python split.


High-level overview

┌─────────────────────────────────────────────────────────────┐
│                        User interface                        │
│  CLI (cli/main.py)  ·  Python API  ·  Jupyter notebook       │
└──────────────────────────────┬──────────────────────────────┘
             ┌─────────────────▼──────────────────┐
             │         Python wrappers             │
             │  fastdpplot.io / plot / server      │
             │  (thin, stateless, type-safe)       │
             └─────────────────┬──────────────────┘
                               │  PyO3 function calls
             ┌─────────────────▼──────────────────┐
             │        fastdpplot._rs (PyO3)        │
             │   crates/fastdpplot-py/src/lib.rs   │
             └─────────────────┬──────────────────┘
                               │  Rust function calls
             ┌─────────────────▼──────────────────┐
             │      fastdpplot-core (pure Rust)    │
             │  io · fasta · binmat · dp_input     │
             │  binning · sparse · tile · types    │
             └─────────────────────────────────────┘

Layer responsibilities

CLI (cli/main.py)

  • Argument parsing via argparse.
  • Two sub-commands: plot (Parquet/text) and bin (binary + FASTA).
  • Backward-compatible flat interface for existing scripts.
  • Delegates entirely to fastdpplot.io and fastdpplot.plot.

Python wrappers (fastdpplot/)

Module             Responsibility
-----------------  ------------------------------------------------------------------------
io.py              Detects the file type, calls the matching Rust loader, returns a pandas DataFrame.
plot.py            datashader rasterisation, matplotlib axes, Panel server setup.
server.py          One-liner wrappers that load data, then start the server.
sparse_convert.py  Bridge between _rs.to_coo and scipy.sparse.

No data transformation happens in Python beyond DataFrame construction. All parsing and binning are in Rust.

PyO3 bridge (crates/fastdpplot-py/)

  • Each #[pyfunction] is a thin wrapper: convert Python types → Rust types, call the core function, convert the result back.
  • Errors are mapped to PyValueError via the map_err helper.
  • All functions are registered in the _rs module by #[pymodule].

Rust core (crates/fastdpplot-core/)

Pure Rust, no Python dependency. Can be used as a Rust library independently.


Data flow — .bin + FASTA pipeline

data/query.fasta  ──┐
                    ├─► parse_fasta()   ──► FastaRecord (id, desc, sequence, length)
data/subject.fasta ─┘

data/dp_matrix.bin ─► read_bin_matrix()          ──► BinMatrix (flat Vec<i32>)

                       matrix_to_dotpoints()     ──► Vec<DotPoint>  (parallel, raw scores)

                       bin_points()              ──► BinGrid  (parallel histogram)

                       datashader / matplotlib   ──► PNG / SVG / HTML
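
The conversion step in the middle of this pipeline can be pictured with a small pure-Python sketch. It is not the Rust implementation: the function name mirrors the diagram, but the row-major layout, the (x, y, score) tuple shape, and the decision to skip zero cells are assumptions made for illustration only.

```python
from typing import List, Tuple

# Assumed point layout: (x = column index, y = row index, raw score).
DotPoint = Tuple[int, int, int]


def matrix_to_dotpoints(flat: List[int], rows: int, cols: int) -> List[DotPoint]:
    """Turn a row-major flat matrix into points, skipping zero cells (assumed)."""
    if len(flat) != rows * cols:
        raise ValueError(f"expected {rows * cols} cells, got {len(flat)}")
    return [
        (i % cols, i // cols, score)  # recover (column, row) from the flat index
        for i, score in enumerate(flat)
        if score != 0
    ]


# A 4 x 4 matrix with scores on the main diagonal.
flat = [
    5, 0, 0, 0,
    0, 7, 0, 0,
    0, 0, 9, 0,
    0, 0, 0, 3,
]
points = matrix_to_dotpoints(flat, rows=4, cols=4)
print(points)  # [(0, 0, 5), (1, 1, 7), (2, 2, 9), (3, 3, 3)]
```

Keeping the matrix flat and deriving coordinates from the index avoids a nested allocation per row, which is presumably why BinMatrix is stored as a single Vec.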

Parallelism model

fastdpplot uses rayon for CPU parallelism:

Operation            Strategy
-------------------  ----------------------------------------------------------
TXT parsing          par_iter over pre-collected lines
matrix_to_dotpoints  par_iter().flat_map_iter() over rows
bin_points           par_iter().fold().reduce() with thread-local accumulators
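
The fold/reduce strategy used for binning can be illustrated in Python. This is a sketch of the pattern only, not of the Rust code: the helper names, the (x, y) point shape, and the integer canvas mapping are hypothetical, and Python threads will not give the CPU speed-up that Rayon does. Each worker folds its chunk into a private grid, and the partial grids are then summed.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def fold_chunk(chunk, src_w, src_h, width, height):
    """Fold step: accumulate one chunk of points into a private grid."""
    local = [0] * (width * height)
    for x, y in chunk:
        gx = x * width // src_w   # map source column to canvas column
        gy = y * height // src_h  # map source row to canvas row
        local[gy * width + gx] += 1
    return local


def merge(a, b):
    """Reduce step: element-wise sum of two partial grids."""
    return [u + v for u, v in zip(a, b)]


def bin_points_parallel(points, src_w, src_h, width, height, workers=4):
    """Split points into chunks, fold each chunk separately, then reduce."""
    size = max(1, -(-len(points) // workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda c: fold_chunk(c, src_w, src_h, width, height), chunks)
        )
    return reduce(merge, partials, [0] * (width * height))


points = [(0, 0), (1, 1), (2, 2), (3, 3), (3, 0)]
grid = bin_points_parallel(points, src_w=4, src_h=4, width=2, height=2, workers=2)
print(grid)  # [2, 1, 0, 2]
```

Because every worker writes only to its own accumulator, no locking is needed until the final merge, and the merge cost depends on the canvas size rather than on the number of points.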

Thread count

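By default, Rayon sizes its global thread pool to the number of logical CPUs. To limit parallelism, set the RAYON_NUM_THREADS environment variable before the thread pool is first used, typically before starting the process.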
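
A minimal sketch of capping the thread count from Python follows. It assumes the extension relies on Rayon's default global pool, which reads RAYON_NUM_THREADS when the pool is first initialised, rather than building a custom pool. The helper name limit_rayon_threads is hypothetical and not part of fastdpplot.

```python
import os


def limit_rayon_threads(n: int) -> None:
    """Cap Rayon's global pool; call before importing the Rust extension."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    os.environ["RAYON_NUM_THREADS"] = str(n)


limit_rayon_threads(4)
# import fastdpplot  # import only after the variable is set
print(os.environ["RAYON_NUM_THREADS"])  # 4
```

Setting the variable in the shell before launching the script has the same effect and avoids any dependence on import order.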


Memory model

Scenario         Memory usage
---------------  ------------------------------------------------------------
Parquet loading  O(batch): one record batch at a time
.bin loading     O(rows × cols × elem_size): full matrix in RAM
Binning          O(width × height) per thread: tiny for typical canvas sizes
Sparse output    O(nnz), where nnz ≪ width × height
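
To make the O(nnz) figure for sparse output concrete, here is a small sketch of converting a dense grid into coordinate (COO) triplets. The function dense_to_coo is hypothetical and does not reflect the signature of _rs.to_coo, which this page does not document.

```python
from typing import List, Tuple


def dense_to_coo(grid: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Return (row indices, column indices, values) for the non-zero cells."""
    rows, cols, vals = [], [], []
    for r, line in enumerate(grid):
        for c, v in enumerate(line):
            if v != 0:
                rows.append(r)
                cols.append(c)
                vals.append(v)
    return rows, cols, vals


grid = [
    [3, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
rows, cols, vals = dense_to_coo(grid)
print(rows, cols, vals)  # [0, 2] [0, 2] [3, 1]
```

Only two of the twelve cells are stored here. With SciPy installed, triplets of this shape can be passed to scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 4)).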

Large binary matrices

A 100 000 × 100 000 f32 matrix holds 10^10 elements of 4 bytes each, which is 40 GB in RAM. For matrices this large, consider loading only a region or downsampling before calling load_bin_matrix.
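
The arithmetic behind this estimate is easy to script before loading a file. The sketch below is a standalone helper, not part of fastdpplot: the dtype names, the element-size table, and the idea of keeping every stride-th row and column are assumptions chosen for illustration.

```python
# Assumed element sizes in bytes; adjust to the dtypes your files use.
ELEM_SIZES = {"u8": 1, "i16": 2, "i32": 4, "f32": 4, "f64": 8}


def dense_matrix_bytes(rows: int, cols: int, dtype: str) -> int:
    """Bytes needed to hold a dense rows x cols matrix of the given dtype."""
    return rows * cols * ELEM_SIZES[dtype]


def min_stride(rows: int, cols: int, dtype: str, budget_bytes: int) -> int:
    """Smallest stride s such that keeping every s-th row and column fits."""
    s = 1
    while dense_matrix_bytes(-(-rows // s), -(-cols // s), dtype) > budget_bytes:
        s += 1
    return s


full = dense_matrix_bytes(100_000, 100_000, "f32")
print(full / 1e9)  # 40.0 (GB)

stride = min_stride(100_000, 100_000, "f32", budget_bytes=4_000_000_000)
print(stride)  # 4
```

With a 4 GB budget, a stride of 3 still needs roughly 4.4 GB, so the helper settles on 4, which yields a 25 000 × 25 000 matrix of 2.5 GB.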


GPU acceleration

fastdpplot detects cudf (RAPIDS cuDF) at import time in plot.py:

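import pandas as pd

try:
    import cudf  # optional dependency (RAPIDS cuDF)
    _DataFrame = cudf.DataFrame  # GPU-backed DataFrame
except ImportError:
    cudf = None  # lets later code test `cudf is not None`
    _DataFrame = pd.DataFrame  # CPU fallback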

When cuDF is available, the dot-point DataFrame is transferred to GPU memory before it is passed to datashader. All other processing (I/O, binning) remains on the CPU, in Rust.

Note

The GPU path is transparent — render_static, serve_interactive, and show all work identically with or without cuDF.


Error handling

Rust errors are defined in crates/fastdpplot-core/src/error.rs using thiserror. The PyO3 bridge maps every FastDpError variant to a Python ValueError with a human-readable message.

Rust error        Typical cause
----------------  ----------------------------------------------------------
Io                File not found, permission denied
SizeMismatch      Wrong dtype or swapped dimensions
CannotInferDtype  File size matches no dtype for given dimensions
DimMismatch       Matrix and FASTA lengths disagree even after transposition
NoRecords         Empty FASTA file
EmptySequence     FASTA record with no bases
ColumnNotFound    Parquet file missing x / y columns
Parse             Non-numeric field in a text table
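
Since every variant reaches Python as a ValueError, callers can handle them with a single except clause. The sketch below imitates the CannotInferDtype case in pure Python so that it runs without the extension. The infer_dtype helper, its candidate dtype list, and the assumption of a headerless file are all hypothetical and may differ from what the Rust core actually does.

```python
# Assumed candidates with distinct element sizes, so a match is unambiguous.
CANDIDATES = {"u8": 1, "i16": 2, "i32": 4, "f64": 8}


def infer_dtype(file_size: int, rows: int, cols: int) -> str:
    """Pick the dtype whose size explains file_size for a rows x cols matrix."""
    cells = rows * cols
    if cells <= 0:
        raise ValueError("rows and cols must be positive")
    for name, size in CANDIDATES.items():
        if cells * size == file_size:
            return name
    raise ValueError(
        f"cannot infer dtype: {file_size} bytes matches no candidate "
        f"for a {rows} x {cols} matrix"
    )


print(infer_dtype(64, rows=4, cols=4))  # i32

try:
    infer_dtype(60, rows=4, cols=4)  # truncated file or wrong dimensions
except ValueError as err:
    print(f"load failed: {err}")
```

The same try/except shape applies to real loader calls: catch ValueError and report the message, which the bridge keeps human-readable.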