Sequence Modeling

The sequence module predicts NADPH responsiveness directly from protein sequences.

ESM2 Model Flexibility

Any ESM-2 checkpoint published by Meta under the facebook organization on Hugging Face can be used, from facebook/esm2_t6_8M_UR50D up to facebook/esm2_t48_15B_UR50D.

The smallest model (esm2_t6_8M_UR50D) was used during development because of GPU resource constraints; testing was performed on a Tesla T4 (16 GB VRAM). Larger models require more VRAM, so choose a model size that fits your hardware and set it in config.yaml.
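When swapping checkpoints, the main thing downstream code has to track is the embedding width, which grows with model size. The sketch below lists the public ESM-2 checkpoints with their layer counts and embedding widths. The helper names `embedding_dim` and `load_backbone` are hypothetical and not part of this project, and the config.yaml key that holds the model name is not shown here because it is not documented in this section.

```python
# Public ESM-2 checkpoints on the Hugging Face Hub: (layers, embedding width).
ESM2_CHECKPOINTS = {
    "facebook/esm2_t6_8M_UR50D": (6, 320),
    "facebook/esm2_t12_35M_UR50D": (12, 480),
    "facebook/esm2_t30_150M_UR50D": (30, 640),
    "facebook/esm2_t33_650M_UR50D": (33, 1280),
    "facebook/esm2_t36_3B_UR50D": (36, 2560),
    "facebook/esm2_t48_15B_UR50D": (48, 5120),
}


def embedding_dim(model_name: str) -> int:
    """Return the per-residue embedding width for a known ESM-2 checkpoint."""
    try:
        return ESM2_CHECKPOINTS[model_name][1]
    except KeyError:
        raise ValueError(f"Unknown ESM-2 checkpoint: {model_name!r}") from None


def load_backbone(model_name: str):
    """Load tokenizer and model (hypothetical helper).

    Requires the `transformers` package and network access, so the import
    is done lazily and nothing is downloaded until this function is called.
    """
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = EsmModel.from_pretrained(model_name)
    return tokenizer, model
```

The prediction head's input size would need to match `embedding_dim(...)` for whichever checkpoint is selected.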

Backbone

  • ESM-2 protein language model
  • Pretrained on large protein sequence corpora

Training modes

  • Pooled embeddings (fast)
  • Residue representations (interpretability)
  • Token-level training (full fine-tuning)

Architecture

  • Attention pooling over residues
  • Fully connected prediction head
  • Supports classification and regression
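To make the pooling step concrete, here is a dependency-free sketch of attention pooling: each residue vector gets a scalar score, the scores are softmax-normalized over non-padding positions, and the weighted sum becomes the protein-level vector passed to the prediction head. It assumes a single scoring vector (`scorer`), which is an illustrative choice; the project's actual layer may score residues differently and would operate on tensors rather than Python lists.

```python
import math


def attention_pool(residues, scorer, mask):
    """Collapse per-residue vectors into one protein-level vector.

    residues: list of L vectors, each of length D
    scorer:   length-D weight vector (learned in a real model)
    mask:     list of L bools, False for padding positions
    Returns (pooled_vector, attention_weights).
    """
    if not any(mask):
        raise ValueError("mask must keep at least one residue")

    # Score each residue; padding gets -inf so its softmax weight is zero.
    scores = [
        sum(w * x for w, x in zip(scorer, vec)) if keep else -math.inf
        for vec, keep in zip(residues, mask)
    ]

    # Softmax, subtracting the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of residue vectors, one output value per dimension.
    dim = len(residues[0])
    pooled = [
        sum(a * vec[d] for a, vec in zip(weights, residues)) for d in range(dim)
    ]
    return pooled, weights
```

The returned weights show which residues dominated the pooled representation, which is the usual starting point for residue-level interpretation.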

Training features

  • Focal loss for class imbalance
  • Gradient accumulation for memory efficiency
  • Mixed precision for speed
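As an illustration of how focal loss handles class imbalance, the sketch below computes the binary focal loss for a single example. It scales the usual cross-entropy term by (1 - p_t)^gamma, so examples the model already classifies well contribute little. The defaults gamma=2.0 and alpha=0.25 are the commonly cited values from the original focal loss paper and are assumptions here; the values used by this module are not stated in this section.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example.

    p:     predicted probability of the positive class
    y:     true label, 0 or 1
    gamma: focusing parameter; 0 removes the down-weighting of easy examples
    alpha: weight on the positive class (1 - alpha on the negative class)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)

    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma set to 0 the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check when tuning.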

Labels

Derived from experimental data:

  • Classification: strong vs weak responders
  • Regression: transformed EC50 or NSS
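The exact label definitions are not given in this section, so the following is only a sketch under stated assumptions: the EC50 transform is taken to be the negative base-10 logarithm (so larger targets mean higher sensitivity), and the strong/weak split is taken to be a simple EC50 cutoff. Both function names and the `threshold` parameter are hypothetical.

```python
import math


def regression_target(ec50):
    """Assumed transform: negative log10 of EC50 (higher = more sensitive)."""
    if ec50 <= 0:
        raise ValueError("EC50 must be positive")
    return -math.log10(ec50)


def class_label(ec50, threshold):
    """Assumed rule: 1 (strong responder) if EC50 is at or below the cutoff.

    `threshold` is a hypothetical cutoff in the same units as `ec50`.
    """
    return 1 if ec50 <= threshold else 0
```

A log transform of this kind is a common way to compress concentration values that span several orders of magnitude before regression.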

Interpretation

The model learns sequence patterns associated with NADPH sensitivity, which lets it predict responsiveness for proteins not seen during training. In this way it links sequence information to biochemical behavior.