Architecture¶
Bio Sea Pearl is a layered system that wraps legacy Perl bioinformatics code in a modern Python interface. This page describes the layers, how they interact, and where each component lives in the repository.
High-level layers¶
┌─────────────────────────────────────────────────────────────────┐
│ User-facing interfaces │
│ │
│ ┌──────────────┐ ┌────────────────┐ ┌──────────────────┐ │
│ │ biosea CLI │ │ FastAPI server │ │ Python imports │ │
│ │ (Typer) │ │ (api/server.py)│ │ (library usage) │ │
│ └──────┬───────┘ └───────┬────────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └───────────────────┼─────────────────────┘ │
│ │ │
│ ┌────────▼─────────┐ │
│ │ API layer │ │
│ │ src/bio_sea_pearl│ │
│ │ /api/ │ │
│ └───┬─────────┬───┘ │
│ │ │ │
│ ┌────────▼──┐ ┌───▼──────────────┐ │
│ │ Perl │ │ Python-native │ │
│ │ wrappers │ │ implementations │ │
│ └─────┬─────┘ └──────────────────┘ │
│ │ │
│ ┌────────▼──────────┐ │
│ │ Legacy scripts │ │
│ │ alignment/ │ │
│ │ markov/ │ │
│ │ seqtools/ │ │
│ └───────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Interface layer¶
Three entry points exist, and all converge on the same API functions:
| Entry point | Technology | Location |
|---|---|---|
biosea CLI |
Typer | src/bio_sea_pearl/cli.py |
| REST API | FastAPI + Uvicorn | api/server.py |
| Library import | Direct Python | from bio_sea_pearl.api import ... |
API layer¶
The src/bio_sea_pearl/api/ package exports seven public functions:
from bio_sea_pearl.api import (
align_sequences, # alignment_api.py
generate_walk, # markov_api.py
hamming_distance, # seqtools_api.py
levenshtein_distance, # seqtools_api.py
kmer_counts, # seqtools_api.py
build_fm_index, # bwt_api.py
search_fm_index, # bwt_api.py
)
Each function encapsulates the decision of whether to call a pure-Python implementation or delegate to a Perl subprocess.
Wrapper layer¶
src/bio_sea_pearl/perl_wrappers/ contains bridge functions that invoke Perl scripts via subprocess.run(). The wrappers:
- Locate the repository root using the
BIOSEA_REPO_ROOTenvironment variable, or by traversing three parent directories above the wrapper file. - Construct the appropriate
perlorpythoncommand line. - Execute the subprocess, capture stdout, and strip ANSI escape codes from the output.
- Parse the result into Python data types.
Native Python layer¶
Two subsystems have pure Python implementations that do not require Perl:
- BWT / FM-index (
src/bio_sea_pearl/bwt/) — suffix arrays, BWT construction, and FM-index backward search. - SeqTools ports (
src/bio_sea_pearl/seqtools_py/) — Hamming distance, Levenshtein distance, and k-mer counting.
Legacy Perl layer¶
The alignment/, markov/, and seqtools/ directories contain the original Perl (and in alignment's case, Python) scripts. These are standalone programs with their own CLI arguments and module systems. The Python wrapper layer calls them as subprocesses.
Repository layout¶
bio-sea-pearl/
├── src/bio_sea_pearl/ # Installable Python package
│ ├── cli.py # Typer CLI entrypoint
│ ├── api/ # Public API functions
│ │ ├── alignment_api.py # → align_sequences()
│ │ ├── markov_api.py # → generate_walk()
│ │ ├── seqtools_api.py # → hamming, levenshtein, kmer
│ │ └── bwt_api.py # → build_fm_index, search_fm_index
│ ├── bwt/ # FM-index (pure Python)
│ │ ├── transform.py # FMIndex class, suffix_array(), bwt()
│ │ └── find.py # Example search script
│ ├── seqtools_py/ # Python ports of Perl SeqTools
│ │ ├── hamming.py
│ │ ├── levenshtein.py
│ │ └── kmer.py
│ └── perl_wrappers/ # Subprocess bridges to Perl
│ ├── alignment_wrapper.py
│ ├── markov_wrapper.py
│ ├── seqtools_wrapper.py
│ └── term_utils.py # ANSI escape stripping
├── api/
│ └── server.py # FastAPI application
├── alignment/
│ ├── bin/ # align.py, align.pl, dotplot.pl, etc.
│ └── scoring/ # 80+ substitution matrices
├── markov/
│ ├── bin/ # randomwalk.pl, validate.pl
│ └── lib/ # MarkovChainFirstOrder.pm, HigherOrder.pm
├── seqtools/
│ ├── run.pl # Perl CLI dispatcher
│ └── SeqTools/ # Perl modules (hamming, levenshtein, etc.)
├── tests/ # pytest test suite
├── config/
│ └── logging.yaml # Logging configuration
├── Dockerfile # Python 3.12-slim + Perl
├── docker-compose.yml # Single-service Compose file
└── pyproject.toml # Hatchling build config
Execution flow¶
CLI path¶
User runs: biosea seqtools hamming ACGT AGGT
│
▼
cli.py → seqtools_hamming_cmd(seq1, seq2)
│
▼
api.seqtools_api.hamming_distance("ACGT", "AGGT")
│
├─ Try: seqtools_py.hamming.distance() ← pure Python
│ └─ Success → return 1
│
└─ Fallback: perl_wrappers.distance_hamming_perl()
└─ subprocess: perl -I seqtools -MSeqTools::hamming ...
REST API path¶
Client sends: POST /distance {"seq1":"kitten","seq2":"sitting","metric":"levenshtein"}
│
▼
api/server.py → validates request via Pydantic model
│
▼
api.seqtools_api.levenshtein_distance("kitten", "sitting")
│
├─ Try: seqtools_py.levenshtein.distance() ← pure Python
│ └─ Success → return 3
│
└─ Fallback: perl_wrappers.distance_levenshtein_perl()
└─ subprocess: perl one-liner with SeqTools::levenshtein
Alignment path (Perl-dependent)¶
User runs: biosea align seq1.fa seq2.fa --matrix BLOSUM62.mat --mode global
│
▼
cli.py → align_cmd(fasta1, fasta2, matrix, mode)
│
▼
api.alignment_api.align_sequences(fasta1, fasta2, matrix, mode)
│
▼
perl_wrappers.alignment_wrapper.run_alignment(...)
│
├─ Resolves repo root (BIOSEA_REPO_ROOT or parents[3])
├─ Prefers alignment/bin/align.py (Python Gotoh implementation)
├─ Falls back to alignment/bin/align.pl (Perl) if needed
└─ Returns ANSI-stripped alignment output
Python-native vs wrapper-based execution¶
The API layer in seqtools_api.py implements a try-Python-first strategy:
- If the pure Python port is importable (
_PY_PORT_AVAILABLEisTrue), call it. - If it raises a
ValueError, re-raise immediately — this is a genuine validation error. - If it raises any other exception, fall back to the Perl wrapper.
- If the Python port is not importable at all, go directly to Perl.
This means:
- BWT / FM-index: Always runs in pure Python. No Perl involved.
- Hamming / Levenshtein / k-mer: Runs in Python by default; Perl is the fallback.
- Alignment: Always delegates to the Perl wrapper layer (which internally runs
align.py, a standalone Python DP script, oralign.pl). - Markov: Always delegates to the Perl wrapper layer (calls
randomwalk.pl).
Configuration¶
Environment variables¶
| Variable | Purpose | Default |
|---|---|---|
BIOSEA_REPO_ROOT |
Root directory for locating Perl scripts and scoring matrices | Three levels above perl_wrappers/ |
Logging¶
The config/logging.yaml file defines a biosea logger that writes to stdout at INFO level:
Build system¶
The project uses Hatchling as its build backend. The wheel bundles:
src/bio_sea_pearl/— the Python packagealignment/,markov/,seqtools/— legacy scripts, included viaforce-includeapi/— the FastAPI server module
See pyproject.toml for the full build configuration.
Related pages¶
- CLI Reference — command-level documentation
- REST API — endpoint schemas
- Extending the Toolkit — how to add new modules