Quickstart

This page walks through representative tasks using the biosea CLI. Each example maps directly to a subsystem in the toolkit.

Prerequisites

The examples below assume you have installed the package (see Installation). Alignment and Markov commands require Perl. Sequence tools and BWT commands work with Python alone.

Compute sequence distances

Hamming distance compares two equal-length strings position by position:

biosea seqtools hamming ACGTACGT ACGTACGA

Hamming distance: 1
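To make the position-by-position comparison concrete, here is a minimal Python sketch of the same computation. The function name `hamming` is illustrative and is not taken from the biosea source; the toolkit's own implementation may differ.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    # Pair up characters at the same index and count the mismatches.
    return sum(x != y for x, y in zip(a, b))


print(hamming("ACGTACGT", "ACGTACGA"))  # 1
```

Only the final position differs (T versus A), which matches the CLI output above.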

Levenshtein distance handles strings of different lengths by counting insertions, deletions, and substitutions:

biosea seqtools levenshtein kitten sitting

Levenshtein distance: 3
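The edit-distance recurrence can be sketched in a few lines of Python. This is the standard dynamic-programming formulation with unit costs, kept to two rows of the table; the name `levenshtein` is illustrative, and nothing here is claimed to mirror biosea's internals.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    # prev[j] is the distance between the already-processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The three edits are k→s, e→i, and the insertion of a trailing g.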

Count k-mers

Extract k-mer frequencies from a sequence:

biosea seqtools kmer ACGTACGT --k 3

{"ACG": 2, "CGT": 2, "GTA": 1, "TAC": 1}
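For reference, overlapping k-mer counting amounts to sliding a window of width k along the sequence. The sketch below uses `collections.Counter`; the function name `kmer_counts` is hypothetical and chosen for this example only.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> dict[str, int]:
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n has n - k + 1 windows (none if k > n).
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return dict(Counter(windows))


print(kmer_counts("ACGTACGT", 3))
# {'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1}
```

The six windows are ACG, CGT, GTA, TAC, ACG, CGT, which gives the same counts as the CLI output above.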

Search with BWT / FM-index

Build a transient FM-index and search for a pattern:

biosea bwt search --sequence ACGTACGT --pattern CGT

FM-index search positions: [1, 5]

Each position is the zero-based start offset of a match in the input sequence.
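The idea behind the search can be illustrated with a small, unoptimized FM-index in Python. This sketch builds the suffix array by naive sorting and stores a full occurrence table, which is fine for short inputs but not how a production index would be built. The names `build_fm_index` and `fm_search` are assumptions made for this example and do not come from biosea.

```python
def build_fm_index(text: str):
    """Return (suffix array, first-column offsets, occurrence table)."""
    text += "$"  # sentinel; sorts before the letters A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that sort strictly before c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return sa, first, occ


def fm_search(text: str, pattern: str) -> list[int]:
    """Backward search: return sorted zero-based start offsets of pattern."""
    if not pattern:
        return []
    sa, first, occ = build_fm_index(text)
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


print(fm_search("ACGTACGT", "CGT"))  # [1, 5]
```

The pattern is consumed right to left, and each step narrows the range of suffix-array rows whose suffixes start with the part of the pattern processed so far.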

Align two sequences

Prepare two FASTA files, then run a global alignment with a scoring matrix:

biosea align seq1.fa seq2.fa \
    --matrix alignment/scoring/BLOSUM62.mat \
    --mode global

Supported alignment modes:

Mode	Algorithm	Description
`global`	Needleman–Wunsch	End-to-end alignment of both sequences
`local`	Smith–Waterman	Best-scoring local subsequence alignment
`lcs`	Longest common subsequence (LCS)	Length of the longest common subsequence

Tip

If you omit --matrix, the alignment script uses its built-in default scoring. For protein sequences, specifying a substitution matrix (BLOSUM, PAM, VTML) is strongly recommended.
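To show what the `global` mode computes, here is a minimal Needleman–Wunsch scoring sketch in Python. It uses a simple match/mismatch/linear-gap scheme rather than a substitution matrix such as BLOSUM62, and the default values for `match`, `mismatch`, and `gap` are assumptions chosen for illustration, not biosea's defaults. It returns only the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Optimal global alignment score under a linear gap penalty."""
    # prev[j]: best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # a[:i] aligned against nothing but gaps
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(
                diagonal,           # align ca with cb
                prev[j] + gap,      # ca against a gap
                curr[j - 1] + gap,  # cb against a gap
            ))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # 1
```

With these parameters the best alignment of ACGT and AGT has three matches and one gap, giving 3 - 2 = 1. Replacing the match/mismatch term with a lookup into a substitution matrix would give matrix-based scoring.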

Generate a Markov random walk

Train a Markov chain on a FASTA file and sample a sequence:

biosea markov \
    --fasta training.fa \
    --length 100 \
    --start A \
    --order 1 \
    --method alias

This builds a first-order Markov chain from the sequences in training.fa, then generates a 100-character random walk starting from state A using the alias sampling method.

Use the REST API

Start the server:

uvicorn api.server:app --host 0.0.0.0 --port 8000

Then call any endpoint. For example, compute a Levenshtein distance:

curl -s -X POST http://127.0.0.1:8000/distance \
    -H "Content-Type: application/json" \
    -d '{"seq1": "kitten", "seq2": "sitting", "metric": "levenshtein"}' | python -m json.tool

{
    "metric": "levenshtein",
    "seq1": "kitten",
    "seq2": "sitting",
    "distance": 3
}
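The same request can be sent from Python using only the standard library. This is a sketch that assumes the `/distance` endpoint and JSON fields shown in the curl example above; the helper names `build_distance_request` and `post_distance` are made up for this example.

```python
import json
import urllib.request


def build_distance_request(seq1, seq2, metric="levenshtein",
                           base_url="http://127.0.0.1:8000"):
    """Build the POST request without sending it."""
    payload = json.dumps(
        {"seq1": seq1, "seq2": seq2, "metric": metric}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/distance",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_distance(seq1, seq2, metric="levenshtein",
                  base_url="http://127.0.0.1:8000"):
    """Send the request and return the decoded JSON response."""
    request = build_distance_request(seq1, seq2, metric, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_distance("kitten", "sitting")` should return a dictionary shaped like the JSON response above.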

See the REST API reference for all endpoints and request schemas.

Where to go next

Goal	Page
Full CLI command reference	CLI Reference
REST API endpoints	REST API
Understand the architecture	Architecture
Deep-dive into alignment	Alignment Module
Explore Markov chains	Markov Module
BWT and FM-index internals	BWT & FM-index
Extend the toolkit	Developer Guide