Bio Sea Pearl
Sequence analysis and bioinformatics utilities in Python and Perl.
Bio Sea Pearl is a dual-language bioinformatics toolkit that integrates mature Perl implementations with a modern Python interface. It provides pairwise sequence alignment, Markov chain simulation, sequence distance metrics, k-mer analysis, and full-text indexing via the Burrows–Wheeler Transform — accessible through a unified CLI, a REST API, or direct Python imports.
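To make "pairwise alignment with affine gap penalties (Gotoh algorithm)" concrete, here is a minimal, score-only global alignment in pure Python. It is an illustrative sketch, not the toolkit's Python or Perl implementation. The function name `gotoh_global_score`, the default scores, and the gap convention (a gap of length k scores `gap_open + (k - 1) * gap_extend`) are all assumptions.

```python
from math import inf


def gotoh_global_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                       gap_open: int = -3, gap_extend: int = -1) -> int:
    """Best global alignment score with affine gaps (Gotoh-style DP).

    Hypothetical helper for illustration only. A gap of length k scores
    gap_open + (k - 1) * gap_extend.
      M[i][j]: a[i-1] aligned to b[j-1]
      X[i][j]: alignment ending with a[i-1] opposite a gap
      Y[i][j]: alignment ending with b[j-1] opposite a gap
    """
    n, m = len(a), len(b)
    M = [[-inf] * (m + 1) for _ in range(n + 1)]
    X = [[-inf] * (m + 1) for _ in range(n + 1)]
    Y = [[-inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # Leading gaps: one open plus extensions.
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return int(max(M[n][m], X[n][m], Y[n][m]))


print(gotoh_global_score("ACGT", "ACGT"))  # → 8
print(gotoh_global_score("ACGT", "AT"))    # → 0 (A/A, one length-2 gap, T/T)
```

Three matrices are needed because, with affine penalties, the cost of a gap column depends on whether the previous column was already a gap. A local variant would roughly clamp scores at zero and report the best cell anywhere in the table; a traceback (omitted here) recovers the alignment itself.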
Core capabilities
| Subsystem | What it does | Implementation |
|---|---|---|
| Alignment | Global and local pairwise alignment with affine gap penalties, plus longest common subsequence (LCS) | Python + Perl (Gotoh algorithm) |
| Markov Chains | Train transition models on sequences and sample random walks | Perl (first-order and higher-order) |
| Sequence Tools | Hamming distance, Levenshtein distance, k-mer counting, pattern search | Python (native) + Perl (Inline C) |
| BWT & FM-index | Suffix arrays, Burrows–Wheeler Transform, FM-index substring search | Pure Python |
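The Markov chain row comes down to two steps: count transitions to estimate P(next | current), then draw successive states from those probabilities. The following first-order sketch in pure Python illustrates the idea. The package's own implementation is in Perl, and `train_transitions` and `sample_walk` are hypothetical names, not its API.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict
from typing import Iterable


def train_transitions(seqs: Iterable[str]) -> dict[str, dict[str, float]]:
    """Estimate first-order transition probabilities P(next | current).

    Hypothetical helper. A k-th order chain would instead key the table
    on the previous k symbols.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for seq in seqs:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, nxts in counts.items():
        total = sum(nxts.values())
        model[cur] = {nxt: c / total for nxt, c in nxts.items()}
    return model


def sample_walk(model: dict[str, dict[str, float]], start: str,
                length: int, seed: int | None = None) -> str:
    """Random walk of up to `length` symbols, starting from `start`."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        nxts = model.get(walk[-1])
        if not nxts:  # state was never seen with a successor: stop early
            break
        states = list(nxts)
        weights = [nxts[s] for s in states]
        walk.append(rng.choices(states, weights=weights)[0])
    return "".join(walk)


model = train_transitions(["ACGTACGT"])
print(sample_walk(model, "A", 8, seed=1))  # → ACGTACGT (every transition has probability 1)
```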
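Three ways to use it
The same functionality can be reached from the biosea command-line interface, through the FastAPI REST server, or by importing bio_sea_pearl directly in Python.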
Architecture at a glance
┌──────────┐      ┌──────────────┐
│  biosea  │      │   FastAPI    │
│   CLI    │      │ REST server  │
└────┬─────┘      └──────┬───────┘
     │                   │
     └─────────┬─────────┘
               │
      ┌────────▼────────┐
      │    API layer    │
      │ (bio_sea_pearl) │
      └───┬─────────┬───┘
          │         │
┌─────────▼───┐ ┌───▼──────────────┐
│  Wrappers   │ │   Pure Python    │
│  (Perl via  │ │ (BWT, seqtools,  │
│ subprocess) │ │  distances)      │
└────┬────────┘ └──────────────────┘
     │
┌────▼──────────┐
│ Perl scripts  │
│ & modules     │
│ (alignment,   │
│  markov,      │
│  seqtools)    │
└───────────────┘
The Python API layer is the single dispatch point. Whether a request arrives via the CLI, the REST API, or a direct import, it flows through the same API functions in src/bio_sea_pearl/api/. See Architecture for the full breakdown.
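The single-dispatch idea can be sketched in a few lines: front ends only parse input and format output, while all computation goes through one shared function. Everything below is hypothetical (the function names, the `biosea-sketch` program name, the `hamming` and `kmers` subcommands, and their arguments) and does not reproduce the real biosea CLI or the src/bio_sea_pearl/api/ module.

```python
from __future__ import annotations

import argparse
from collections import Counter


# --- Hypothetical API layer: the only place computation happens ---
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# --- Hypothetical CLI front end: parse, call the API layer, format ---
def cli_main(argv: list[str] | None = None) -> str:
    parser = argparse.ArgumentParser(prog="biosea-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    ham = sub.add_parser("hamming")
    ham.add_argument("a")
    ham.add_argument("b")
    km = sub.add_parser("kmers")
    km.add_argument("seq")
    km.add_argument("-k", type=int, default=3)
    args = parser.parse_args(argv)

    if args.command == "hamming":
        return str(hamming_distance(args.a, args.b))
    counts = kmer_counts(args.seq, args.k)
    return "\n".join(f"{kmer}\t{n}" for kmer, n in sorted(counts.items()))


# A direct import and the CLI reach the same function, so they must agree.
print(hamming_distance("GATTACA", "GACTATA"))          # → 2
print(cli_main(["hamming", "GATTACA", "GACTATA"]))     # → 2
```

A REST route would be a third adapter of the same shape, calling the same function and returning its result as JSON, so every entry point agrees by construction.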
Quick start
Installation guide · Quickstart tutorial
Documentation map
| Section | Description |
|---|---|
| Installation | Install from PyPI, source, or Docker |
| Quickstart | First commands across all subsystems |
| CLI Reference | Complete biosea command documentation |
| REST API | Endpoint schemas and interactive docs |
| Architecture | Layers, dispatch, data flow, repo layout |
| Alignment | Gotoh DP, scoring matrices, parallelisation |
| Markov Chains | Transition models, sampling, random walks |
| Sequence Tools | Distances, k-mers, Boyer–Moore search |
| BWT & FM-index | Suffix arrays, BWT, backward search |
| Extending | Adding modules, porting Perl to Python |
| Troubleshooting | Common errors and failure modes |
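For a compact view of how suffix arrays, the BWT, and backward search fit together, here is a naive pure-Python FM-index sketch. The class and method names are hypothetical and this is not the package's implementation. The suffix array is built by plain sorting, which is only suitable for small inputs, and `$` is assumed to be a sentinel that sorts before every other symbol and does not occur in the text.

```python
from __future__ import annotations


def suffix_array(text: str) -> list[int]:
    """Naive suffix array: suffix start positions sorted by suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])


class FMIndex:
    """Hypothetical minimal FM-index over text + sentinel."""

    def __init__(self, text: str, sentinel: str = "$"):
        if sentinel in text:
            raise ValueError("sentinel must not occur in the text")
        self.text = text + sentinel
        self.sa = suffix_array(self.text)
        # BWT[i] is the character preceding suffix sa[i] (wraps to the sentinel).
        self.bwt = "".join(self.text[i - 1] for i in self.sa)
        # C[c]: number of characters in the text that sort before c.
        self.C: dict[str, int] = {}
        total = 0
        for c in sorted(set(self.bwt)):
            self.C[c] = total
            total += self.bwt.count(c)
        # occ[c][i]: occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in self.C}
        for i, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def count_range(self, pattern: str) -> tuple[int, int]:
        """Backward search: half-open SA interval of suffixes starting with pattern."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def locate(self, pattern: str) -> list[int]:
        """Start positions of pattern in the original text, in ascending order."""
        lo, hi = self.count_range(pattern)
        return sorted(self.sa[lo:hi])


idx = FMIndex("banana")
print(idx.bwt)             # → annb$aa
print(idx.locate("ana"))   # → [1, 3]
```

Backward search processes the pattern right to left, narrowing one suffix-array interval per character using only `C` and `occ`, so counting matches never touches the original text.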