Output

Output Format

The output consists of two aligned sequences printed line by line, with each character color-coded:

* Green: matched characters
* Red: gaps
* Cyan: mismatches

Each output block includes the base range for easier visual indexing.
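The color scheme and block layout described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual rendering code: the ANSI escape codes, the block width of 60, the `[start-end]` label format, and the function names `classify`, `colorize`, and `format_blocks` are all assumptions made for the example. The range shown is the 1-based alignment column range, which may differ from how the tool counts bases.

```python
# Illustrative sketch only: names, ANSI codes, and block layout are assumptions.
GREEN, RED, CYAN, RESET = "\033[32m", "\033[31m", "\033[36m", "\033[0m"
COLORS = {"match": GREEN, "gap": RED, "mismatch": CYAN}


def classify(a: str, b: str) -> str:
    """Classify one alignment column as 'match', 'gap', or 'mismatch'."""
    if a == "-" or b == "-":
        return "gap"
    return "match" if a == b else "mismatch"


def colorize(row: str, other: str) -> str:
    """Wrap each character of `row` in the color of its alignment column."""
    return "".join(
        COLORS[classify(a, b)] + a + RESET for a, b in zip(row, other)
    )


def format_blocks(seq1: str, seq2: str, width: int = 60):
    """Yield text blocks, each labeled with its 1-based column range."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    for start in range(0, len(seq1), width):
        end = min(start + width, len(seq1))
        a, b = seq1[start:end], seq2[start:end]
        yield f"[{start + 1}-{end}]\n{colorize(a, b)}\n{colorize(b, a)}"


if __name__ == "__main__":
    for block in format_blocks("ACGT-ACGTA", "ACCTTACG-A", width=5):
        print(block)
```

Running the script in a terminal that understands ANSI colors prints two blocks, labeled `[1-5]` and `[6-10]`, with gaps in red and the single `G`/`C` mismatch in cyan.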

FM-Index Anchoring with Proteins

If you see the message "FM-index anchoring unavailable/failed. Falling back to MPI full DP" during protein alignment, this is expected behavior, not a bug.

* **Exact Matches vs. Similarity:** The FM-Index requires exact substring matches (k-mers, typically 5-8 characters) to build anchors. Distant protein sequences (<30% identity) often preserve *chemical similarity* rather than exact identity, meaning they may only share very short exact matches (2-3 amino acids).
* **Smart Fallback:** Lacking exact matches long enough to safely anchor, the program intentionally skips the FM-Index phase. It gracefully falls back to the full Smith-Waterman or Needleman-Wunsch DP matrix (using BLOSUM62) to ensure a biologically accurate alignment. 
* **Why not lower the k-mer size?** Forcing a very small k-mer size (such as `k=3`) on proteins would produce a large number of random, spurious seeds, which would degrade both accuracy and performance.
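The trade-off in the points above can be made concrete with a small Python sketch. It does not reproduce the program's FM-Index or its fallback logic. The helper names `shared_kmers` and `expected_random_hits` are hypothetical, the two sequences are toy examples, and the random-hit estimate assumes all 20 amino acids are equally likely, which real proteins only approximate.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a: str, b: str, k: int) -> set[str]:
    """Exact k-mers present in both sequences (candidate anchor seeds)."""
    return kmers(a, k) & kmers(b, k)


def expected_random_hits(n: int, m: int, k: int, alphabet: int = 20) -> float:
    """Expected number of chance k-mer matches between two random sequences.

    Assumes a uniform, independent residue distribution (a simplification).
    """
    pairs = max(0, n - k + 1) * max(0, m - k + 1)
    return pairs / alphabet ** k


if __name__ == "__main__":
    # Toy sequences: similar residue chemistry (K/R, V/I, I/L), few exact runs.
    a = "MKVLAAGIVG"
    b = "MRILSAGLVA"
    print(shared_kmers(a, b, 2))  # {'AG'}
    print(shared_kmers(a, b, 3))  # set()

    # Chance matches between two unrelated 1000-residue proteins.
    for k in (3, 6):
        print(k, expected_random_hits(1000, 1000, k))
```

Under these assumptions, two unrelated 1000-residue sequences are expected to share over a hundred 3-mer position pairs by chance, but far fewer than one 6-mer. Short seeds therefore carry little signal, while seeds long enough to be reliable are often missing between distant homologs.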

Additional Outputs (From Code)

Beyond alignment:

* DP matrices (text or binary)
* Traceback matrices
* LCS sequence output
* Indexed position ranges

Binary formats:

* Efficient storage for large matrices
* Row-major layout with metadata header
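As an illustration of what a row-major layout with a metadata header can look like, here is a minimal Python sketch. The actual on-disk format is not specified in this document, so everything here is hypothetical: the `DPM1` magic bytes, the header fields (magic, rows, cols), the little-endian 32-bit integer cells, and the function names `write_matrix` and `read_matrix`.

```python
import io
import struct

# Hypothetical header: 4-byte magic, then row and column counts (uint32, LE).
HEADER = struct.Struct("<4sII")
MAGIC = b"DPM1"


def write_matrix(fh, matrix: list[list[int]]) -> None:
    """Write the header, then every row back to back (row-major order)."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    fh.write(HEADER.pack(MAGIC, rows, cols))
    for row in matrix:
        # struct.pack raises struct.error if a row has the wrong length.
        fh.write(struct.pack(f"<{cols}i", *row))


def read_matrix(fh) -> list[list[int]]:
    """Read the header, then rebuild the matrix one row at a time."""
    magic, rows, cols = HEADER.unpack(fh.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a matrix file in the assumed format")
    row_fmt = struct.Struct(f"<{cols}i")
    return [list(row_fmt.unpack(fh.read(row_fmt.size))) for _ in range(rows)]


if __name__ == "__main__":
    buf = io.BytesIO()
    write_matrix(buf, [[0, 1, 2], [3, 4, 5]])
    buf.seek(0)
    print(read_matrix(buf))
```

With a fixed-size header and fixed-width cells, the cell at row `i`, column `j` sits at byte offset `HEADER.size + 4 * (i * cols + j)`, so a reader can seek directly to any cell without loading the whole matrix.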

Future Improvements

* Support multi-sequence alignment.
* Allow multiple FASTA entries.
* Export alignment results in standard formats (CLUSTAL, Stockholm).
* Web-based or GUI interface.