Sequence Modeling

The sequence module predicts NADPH responsiveness directly from protein sequences.

Backbone

  • ESM-2 protein language model
  • Pretrained on large protein sequence corpora

Training modes

  • Pooled embeddings (fast)
  • Residue representations (interpretability)
  • Token-level training (full fine-tuning)
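
The difference between the first two modes can be pictured as a change in tensor shape: residue representations keep one vector per position, while a pooled embedding collapses them into one vector per protein. The sketch below is only an illustration under assumptions: it uses random NumPy arrays in place of real ESM-2 outputs, and `mean_pool` is a hypothetical helper, not a function from this project.

```python
import numpy as np

def mean_pool(residue_repr, mask):
    """Average per-residue vectors over real (non-padding) positions.

    residue_repr: (batch, length, dim) per-residue representations
    mask:         (batch, length), 1 for real residues, 0 for padding
    """
    mask = mask.astype(residue_repr.dtype)[..., None]   # (batch, length, 1)
    summed = (residue_repr * mask).sum(axis=1)          # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1.0, None)       # (batch, 1), avoids division by zero
    return summed / counts

# Stand-in for backbone outputs: 2 sequences, padded to length 5, 8 features each.
rng = np.random.default_rng(0)
residue_repr = rng.normal(size=(2, 5, 8))
mask = np.array([[1, 1, 1, 0, 0],    # first sequence has 3 real residues
                 [1, 1, 1, 1, 1]])   # second sequence has 5

pooled = mean_pool(residue_repr, mask)
print(residue_repr.shape, pooled.shape)
```

Pooled vectors can be computed once and cached, which is presumably why that mode is the fast one; the per-residue array is what an interpretability analysis would inspect.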

Architecture

  • Attention pooling over residues
  • Fully connected prediction head
  • Supports classification and regression
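
As a rough illustration of how these pieces could fit together, the following NumPy sketch scores each residue, turns the scores into weights with a masked softmax, and feeds the weighted sum to a small fully connected head. The single learned scoring vector `w_att`, the one-hidden-layer head, and all sizes are assumptions made for the example; the module's actual layers may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(residue_repr, mask, w_att):
    """Weighted average of residues; weights come from a learned scoring vector.

    residue_repr: (batch, length, dim), mask: (batch, length), w_att: (dim,)
    """
    scores = residue_repr @ w_att                        # (batch, length)
    scores = np.where(mask.astype(bool), scores, -1e9)   # padding gets ~zero weight
    weights = softmax(scores, axis=1)                    # (batch, length)
    pooled = np.einsum("bl,bld->bd", weights, residue_repr)
    return pooled, weights

def prediction_head(pooled, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a linear output (hypothetical sizes)."""
    hidden = np.maximum(pooled @ w1 + b1, 0.0)
    return hidden @ w2 + b2                              # (batch, 1)

rng = np.random.default_rng(0)
dim, hidden_dim = 8, 4
residue_repr = rng.normal(size=(2, 5, dim))
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
w_att = rng.normal(size=dim)
w1, b1 = rng.normal(size=(dim, hidden_dim)), np.zeros(hidden_dim)
w2, b2 = rng.normal(size=(hidden_dim, 1)), np.zeros(1)

pooled, weights = attention_pool(residue_repr, mask, w_att)
output = prediction_head(pooled, w1, b1, w2, b2)

# Regression: use the output directly. Classification: pass it through a sigmoid.
probability = 1.0 / (1.0 + np.exp(-output))
print(weights.round(3))
```

The per-residue `weights` are a natural thing to inspect when asking which positions drive a prediction.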

Training features

  • Focal loss for class imbalance
  • Gradient accumulation for memory efficiency
  • Mixed precision for speed
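
For readers unfamiliar with focal loss, here is a minimal NumPy sketch of the binary form: cross-entropy scaled by `(1 - p_t) ** gamma`, so that well-classified examples contribute less and the rarer, harder class has more influence. The defaults `gamma=2.0` and `alpha=0.25` are commonly used values chosen for illustration, not settings taken from this project.

```python
import numpy as np

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Mean binary focal loss.

    logits:  raw model outputs, shape (n,)
    targets: 0/1 labels, shape (n,)
    gamma:   focusing strength; gamma=0 reduces to alpha-weighted cross-entropy
    alpha:   weight on the positive class (1 - alpha on the negative class)
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets)
    p = 1.0 / (1.0 + np.exp(-logits))                  # predicted P(y = 1)
    p_t = np.where(targets == 1, p, 1.0 - p)           # probability of the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    p_t = np.clip(p_t, 1e-12, 1.0)                     # guard against log(0)
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return loss.mean()

logits = np.array([4.0, -4.0, 0.1])    # two confident correct predictions, one uncertain
targets = np.array([1, 0, 1])

plain = binary_focal_loss(logits, targets, gamma=0.0, alpha=0.5)
focal = binary_focal_loss(logits, targets, gamma=2.0, alpha=0.5)
print(plain, focal)
```

With `gamma=0.0` and `alpha=0.5` the function returns half the ordinary binary cross-entropy, which makes a convenient sanity check.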

Labels

Derived from experimental data:

  • Classification: strong vs weak responders
  • Regression: transformed EC50 or NSS
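
The transform and the strong/weak cutoff are not specified here, so the sketch below only illustrates one plausible recipe: a negative log10 transform of EC50 (assumed to be in molar units, with lower EC50 meaning a stronger response) and a hypothetical threshold of 5.0 on the transformed scale. Both function names are made up for the example.

```python
import numpy as np

def pec50(ec50_molar):
    """Negative log10 of EC50 in molar units; larger values mean higher potency."""
    return -np.log10(np.asarray(ec50_molar, dtype=float))

def binarize(values, threshold):
    """Label 1 (strong responder) at or above the threshold, else 0 (weak)."""
    return (np.asarray(values) >= threshold).astype(int)

ec50 = [1e-6, 5e-5, 2e-4]                 # illustrative values, not experimental data
y_reg = pec50(ec50)                       # regression targets
y_cls = binarize(y_reg, threshold=5.0)    # classification targets
print(y_reg.round(2), y_cls)
```

A log transform of this kind compresses the wide dynamic range typical of EC50 measurements, which tends to make regression targets better behaved.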

Interpretation

The model learns sequence patterns associated with NADPH sensitivity, which lets it predict responsiveness for proteins that were not seen during training. In this way it links sequence information to biochemical behavior.