Theoretical Background

1. Longest Common Subsequence (LCS)

The LCS problem is solved using a dynamic programming table to compute the length and path of the longest subsequence common to both strings. The solution provides insights into evolutionary or structural similarity at a character level.

  • Time Complexity: $O(m \cdot n)$
  • Space Complexity: $O(m \cdot n)$
  • Alignment Type: Character-based (not contiguous)

2. Global Alignment (Needleman-Wunsch)

This method aligns the entire sequences using a scoring matrix initialized with gap penalties. It is ideal for full-length, homologous sequences.

  • Time Complexity: $O(m \cdot n)$
  • Uses a traceback matrix to reconstruct the alignment path.
  • Supports scoring for matches, mismatches, and gaps.

3. Local Alignment (Smith-Waterman)

Focuses on identifying the most similar subsequences. Suitable for aligning protein motifs or conserved regions.

  • Time Complexity: $O(m \cdot n)$
  • Incorporates zero in scoring to allow for local reset.
  • Traceback stops when score becomes zero.

Alignment Matrix and Mathematical Explanation

1. Longest Common Subsequence (LCS)

Goal: Find the longest subsequence common to two sequences $X = x_1x_2\ldots x_m$ and $Y = y_1y_2\ldots y_n$.


Matrix Definition

Let $C[i][j]$ denote the length of the LCS of the prefixes $x_1\ldots x_i$ and $y_1\ldots y_j$.

Recurrence Relation:

$$ C[i][j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ C[i-1][j-1] + 1 & \text{if } x_i = y_j \\ \max(C[i-1][j], C[i][j-1]) & \text{otherwise} \end{cases} $$

Backtracking Table:

  • D: Diagonal (match)
  • U: Up (skip from X)
  • L: Left (skip from Y)

In Code:

if (x[i - 1] == y[j - 1]) {
    c[i][j] = c[i - 1][j - 1] + 1;
    b[i][j] = 'D';
} else if (c[i - 1][j] >= c[i][j - 1]) {
    c[i][j] = c[i - 1][j];
    b[i][j] = 'U';
} else {
    c[i][j] = c[i][j - 1];
    b[i][j] = 'L';
}

2. Global Alignment (Needleman–Wunsch)

Goal: Align full sequences $X$ and $Y$ from start to end, minimizing penalties or maximizing score.


Matrix Definition

Let $S[i][j]$ denote the optimal alignment score between $x_1\ldots x_i$ and $y_1\ldots y_j$.

Initialization:

$$ S[i][0] = i \cdot \text{GAP},\quad S[0][j] = j \cdot \text{GAP} $$

Recurrence Relation:

$$ S[i][j] = \max \begin{cases} S[i-1][j-1] + \text{score}(x_i, y_j) \ S[i-1][j] + \text{GAP} \ S[i][j-1] + \text{GAP} \end{cases} $$

where:

$$ \text{score}(x_i, y_j) = \begin{cases} \text{MATCH} & \text{if } x_i = y_j \\ \text{MISMATCH} & \text{otherwise} \end{cases} $$

Traceback: Built from a traceback[][] matrix storing 'D', 'U', 'L' for path recovery.


3. Local Alignment (Smith–Waterman)

Goal: Find the highest scoring local region between two sequences, ideal for detecting motifs.


Matrix Definition

Let $H[i][j]$ be the highest scoring alignment ending at $x_i, y_j$.

Initialization:

$$ H[i][0] = 0,\quad H[0][j] = 0 $$

Recurrence Relation:

$$ H[i][j] = \max \begin{cases} 0 \ H[i-1][j-1] + \text{score}(x_i, y_j) \ H[i-1][j] + \text{GAP} \ H[i][j-1] + \text{GAP} \end{cases} $$

Reset to 0 allows alignment to start anywhere and stop at the best score.

Traceback: Starts from the max score cell and moves until score is zero.


Time and Space Complexities

Algorithm Time Complexity Space Complexity Alignment Type
LCS O(m·n) O(m·n) Character-only
Global (NW) O(m·n) O(m·n) Full sequence
Local (SW) O(m·n) O(m·n) Subsequence

Where $m = |X|$, $n = |Y|$