Bio Sea Pearl

Sequence analysis and bioinformatics utilities in Python and Perl.

Bio Sea Pearl is a dual-language bioinformatics toolkit that wraps mature Perl implementations in a modern Python interface. It provides pairwise sequence alignment, Markov chain simulation, sequence distance metrics, k-mer analysis, and full-text indexing via the Burrows–Wheeler Transform, all accessible through a unified CLI, a REST API, or direct Python imports.

Core capabilities

Subsystem	What it does	Implementation
Alignment	Global, local, and LCS pairwise alignment with affine gap penalties	Python + Perl (Gotoh algorithm)
Markov Chains	Train transition models on sequences and sample random walks	Perl (first-order and higher-order)
Sequence Tools	Hamming distance, Levenshtein distance, k-mer counting, pattern search	Python (native) + Perl (Inline C)
BWT & FM-index	Suffix arrays, Burrows–Wheeler Transform, FM-index substring search	Pure Python
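To make the Sequence Tools row concrete, here is a minimal pure-Python sketch of the three operations it names: Hamming distance, Levenshtein distance, and k-mer counting. The function names `hamming`, `levenshtein`, and `kmer_counts` are hypothetical and chosen for illustration; they are not the package's own implementations, which may differ in signature and behaviour.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using a rolling DP row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(hamming("ACGTACGT", "ACGTACGA"))    # 1
print(levenshtein("kitten", "sitting"))   # 3
print(kmer_counts("ACGTACGT", 3)["ACG"])  # 2
```

The rolling-row form of Levenshtein keeps memory proportional to the shorter dimension of the DP table, which matters once sequences grow beyond a few thousand characters.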

Three ways to use it

CLI:

biosea seqtools hamming ACGTACGT ACGTACGA
biosea bwt search --sequence ACGTACGT --pattern CGT
biosea align seq1.fa seq2.fa --mode global

REST API:

curl -X POST http://localhost:8000/distance \
    -H "Content-Type: application/json" \
    -d '{"seq1": "kitten", "seq2": "sitting", "metric": "levenshtein"}'

Python:

from bio_sea_pearl.api import hamming_distance, build_fm_index, search_fm_index

print(hamming_distance("ACGT", "AGGT"))  # 1

idx = build_fm_index("ACGTACGT")
print(search_fm_index(idx, "CGT"))       # [1, 5]
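For readers curious how an FM-index search can return `[1, 5]` for that input, the following is a small self-contained sketch of the general technique: build a suffix array, derive the BWT from it, and run backward search. It is an illustration under assumptions, not the package's code: the names `build_index` and `search` are hypothetical, the suffix array is built by naive sorting, and a `$` sentinel is assumed not to occur in the input.

```python
from collections import Counter


def build_index(seq: str):
    """Return (suffix array, C table, Occ table) for seq plus a '$' sentinel."""
    text = seq + "$"  # '$' sorts before A, C, G, T
    # Naive suffix array: fine for short inputs, quadratic or worse in general.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch]: number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (c == ch)
    return sa, c_table, occ


def search(index, pattern: str) -> list:
    """Backward search: return sorted 0-based start positions of pattern."""
    sa, c_table, occ = index
    if not pattern:
        return []
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


print(search(build_index("ACGTACGT"), "CGT"))  # [1, 5]
```

Each pattern character narrows the row range in constant time, so the search cost depends on the pattern length rather than on the length of the indexed sequence.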

Architecture at a glance

              ┌──────────┐    ┌───────────────┐
              │  biosea  │    │ FastAPI       │
              │   CLI    │    │ REST server   │
              └────┬─────┘    └──────┬────────┘
                   │                 │
                   └────────┬────────┘
                            │
                   ┌────────▼────────┐
                   │   API layer     │
                   │ (bio_sea_pearl) │
                   └───┬─────────┬───┘
                       │         │
              ┌────────▼───┐  ┌──▼────────────┐
              │  Wrappers  │  │ Pure Python   │
              │ (Perl via  │  │ (BWT, seqtools│
              │ subprocess)│  │  distances)   │
              └────┬───────┘  └───────────────┘
                   │
          ┌────────▼────────┐
          │  Perl scripts   │
          │  & modules      │
          │ (alignment,     │
          │  markov,        │
          │  seqtools)      │
          └─────────────────┘

The Python API layer is the single dispatch point. Whether a request arrives via the CLI, the REST API, or a direct import, it flows through the same API functions in src/bio_sea_pearl/api/. See Architecture for the full breakdown.
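The single-dispatch idea can be sketched in a few lines. In the example below, `hamming_distance` is a stand-in for an API-layer function, and `cli_main` and `rest_handler` are hypothetical front ends written only for this illustration; the real CLI and FastAPI server are not shown here and will differ in detail. The point is that both entry points do nothing but translate their input and delegate to the same function.

```python
import argparse
import json


def hamming_distance(seq1: str, seq2: str) -> int:
    """Stand-in for an API-layer function shared by every front end."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq1, seq2))


def cli_main(argv: list) -> str:
    """Hypothetical CLI front end: parse arguments, then call the API layer."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("seq1")
    parser.add_argument("seq2")
    args = parser.parse_args(argv)
    return str(hamming_distance(args.seq1, args.seq2))


def rest_handler(body: str) -> str:
    """Hypothetical REST front end: decode JSON, then call the same function."""
    payload = json.loads(body)
    result = hamming_distance(payload["seq1"], payload["seq2"])
    return json.dumps({"distance": result})


print(cli_main(["ACGT", "AGGT"]))                          # 1
print(rest_handler('{"seq1": "ACGT", "seq2": "AGGT"}'))    # {"distance": 1}
```

Keeping validation and computation in the shared function means a bug fix or behaviour change lands in every interface at once.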

Quick start

pip install bio-sea-pearl
biosea --help

Installation guide · Quickstart tutorial

Documentation map

Section	Description
Installation	Install from PyPI, source, or Docker
Quickstart	First commands across all subsystems
CLI Reference	Complete `biosea` command documentation
REST API	Endpoint schemas and interactive docs
Architecture	Layers, dispatch, data flow, repo layout
Alignment	Gotoh DP, scoring matrices, parallelisation
Markov Chains	Transition models, sampling, random walks
Sequence Tools	Distances, k-mers, Boyer–Moore search
BWT & FM-index	Suffix arrays, BWT, backward search
Extending	Adding modules, porting Perl to Python
Troubleshooting	Common errors and failure modes