Scientific Model
This section formalizes the mathematical framework underlying CETSAx–NADPH. The model integrates dose–response fitting, sensitivity scoring, and systems-level inference into a unified quantitative pipeline.
1. Dose–Response Model
Each protein \( i \) is observed across a set of concentrations \( c \in \mathbb{R}_{>0} \). The observed CETSA signal is denoted:
1.1 Logistic ITDR Model
The response is modeled using a 4-parameter logistic function:
where:
- \( E_0 \): baseline stability
- \( E_{\max} \): maximal stability shift
- \( EC_{50} \): half-maximal effective concentration
- \( h \): Hill coefficient (cooperativity)
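A minimal Python sketch of a four-parameter logistic curve with these parameters follows. It assumes the common form \( y(c) = E_0 + (E_{\max} - E_0) / (1 + (EC_{50}/c)^h) \), in which \( E_{\max} \) is read as the plateau response at saturating dose; if \( E_{\max} \) is instead the shift relative to baseline, the numerator would be \( E_{\max} \) alone. The name `logistic_4pl` is illustrative and not part of CETSAx–NADPH.

```python
def logistic_4pl(c, e0, emax, ec50, h):
    """Four-parameter logistic response at concentration c > 0.

    Assumed form: E0 + (Emax - E0) / (1 + (EC50 / c) ** h).
    """
    if c <= 0:
        raise ValueError("concentration must be positive")
    return e0 + (emax - e0) / (1.0 + (ec50 / c) ** h)


# At c = EC50 the response sits halfway between baseline and plateau.
print(logistic_4pl(1e-6, e0=1.0, emax=2.0, ec50=1e-6, h=1.5))
```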
1.2 Log-Dose Parameterization
Define:
Then the model becomes:
This parameterization improves numerical stability over wide concentration ranges.
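The equivalence of the dose-space and log-dose forms can be checked numerically. The sketch below assumes base-10 logarithms, \( x = \log_{10} c \), so that \( (EC_{50}/c)^h = 10^{h(\log_{10} EC_{50} - x)} \), and a 4-parameter logistic with \( E_{\max} \) as the upper plateau. Both function names are hypothetical.

```python
import math


def logistic_4pl(c, e0, emax, ec50, h):
    """Dose-space form (assumed): E0 + (Emax - E0) / (1 + (EC50 / c) ** h)."""
    return e0 + (emax - e0) / (1.0 + (ec50 / c) ** h)


def logistic_logdose(x, e0, emax, log_ec50, h):
    """Log-dose form with x = log10(c) and log_ec50 = log10(EC50)."""
    return e0 + (emax - e0) / (1.0 + 10.0 ** (h * (log_ec50 - x)))


# Compare both forms across six orders of magnitude in concentration.
for c in (1e-9, 1e-6, 1e-3):
    direct = logistic_4pl(c, 0.0, 1.0, 1e-6, 1.2)
    logged = logistic_logdose(math.log10(c), 0.0, 1.0, -6.0, 1.2)
    print(f"c={c:g}  dose-space={direct:.6f}  log-dose={logged:.6f}")
```

Working with \( \log EC_{50} \) also keeps the fitted parameter on a scale comparable to the others, which tends to help gradient-based optimizers.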
2. Parameter Estimation
Let \( \{(c_j, y_j)\}_{j=1}^n \) denote observed data for a protein.
2.1 Weighted Nonlinear Least Squares
Parameters \( \theta = (E_0, E_{\max}, \log EC_{50}, h) \) are estimated by minimizing:
where weights are defined as:
This emphasizes low-concentration behavior.
2.2 Regularization
To avoid unrealistic cooperativity:
where:
- \( h_0 = 1 \) (target Hill slope)
- \( \lambda > 0 \) controls regularization strength
Total loss:
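The objective described in this section can be sketched as a plain Python function. Any weighting that decreases with concentration emphasizes the low-dose region, so the sketch takes the weights \( w_j \) as an input rather than fixing a formula. It assumes a weighted squared-error data term plus a quadratic penalty \( \lambda (h - h_0)^2 \); the model callable, the function names, the value of `lam`, and the example weights \( 1/(1 + c_j) \) are all hypothetical.

```python
def total_loss(params, conc, y, weights, model, lam=0.1, h0=1.0):
    """Weighted squared error plus an assumed quadratic penalty on the Hill slope.

    params is (e0, emax, log_ec50, h); model(c, *params) returns the prediction.
    """
    data_term = sum(
        w * (obs - model(c, *params)) ** 2
        for c, obs, w in zip(conc, y, weights)
    )
    h = params[3]
    return data_term + lam * (h - h0) ** 2


def model(c, e0, emax, log_ec50, h):
    """Assumed 4-parameter logistic with a base-10 log EC50."""
    return e0 + (emax - e0) / (1.0 + (10.0 ** log_ec50 / c) ** h)


conc = [0.01, 0.1, 1.0, 10.0]
true_params = (0.0, 1.0, -0.5, 1.0)
y = [model(c, *true_params) for c in conc]
weights = [1.0 / (1.0 + c) for c in conc]  # illustrative low-dose emphasis

print(total_loss(true_params, conc, y, weights, model))             # exact fit, h = h0
print(total_loss((0.0, 1.0, -0.5, 3.0), conc, y, weights, model))   # steep slope is penalized
```

A function of this shape could be handed to a general-purpose minimizer such as `scipy.optimize.minimize`.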
2.3 Monotonicity Constraint
Let \( \tilde{y}_j \) be the isotonic regression estimate of \( y_j \). Then fitting is performed on:
ensuring monotonicity consistent with stabilization or destabilization.
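Isotonic regression can be computed with the pool-adjacent-violators algorithm. The sketch below is a dependency-free illustration, assuming the responses are already ordered by increasing concentration and that destabilization is handled by flipping the sign; `isotonic_fit` is a hypothetical name. In practice a library routine such as `sklearn.isotonic.IsotonicRegression` would be a natural choice.

```python
def isotonic_fit(y, increasing=True):
    """Least-squares monotone fit to y via pool-adjacent-violators."""
    values = [float(v) if increasing else -float(v) for v in y]
    blocks = []  # each block stores [sum of values, number of points]
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted if increasing else [-v for v in fitted]


# The dip at the third dose is pooled with its neighbor.
print(isotonic_fit([1.0, 3.0, 2.0, 4.0]))
```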
3. Fit Diagnostics
3.1 Coefficient of Determination
3.2 Effect Size
Only fits satisfying:
- \( R^2 > \tau_R \)
- \( \Delta_{\max} > \tau_\Delta \)
are retained.
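The two diagnostics and the retention rule can be sketched as follows. The coefficient of determination takes its usual form,

\[ R^2 = 1 - \frac{\sum_j (y_j - \hat{y}_j)^2}{\sum_j (y_j - \bar{y})^2}, \]

where \( \hat{y}_j \) is the fitted value and \( \bar{y} \) the mean response (notation assumed). The effect size is assumed here to be \( \Delta_{\max} = |E_{\max} - E_0| \), and the threshold values are placeholders rather than the framework's defaults.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")


def keep_fit(y, y_hat, e0, emax, tau_r=0.8, tau_delta=0.1):
    """Retain a fit only if both diagnostics clear their thresholds.

    Delta_max is assumed to be |Emax - E0|; tau_r and tau_delta are
    placeholder values chosen for illustration.
    """
    delta_max = abs(emax - e0)
    return r_squared(y, y_hat) > tau_r and delta_max > tau_delta


y = [1.00, 1.05, 1.30, 1.48, 1.50]
y_hat = [1.00, 1.06, 1.28, 1.47, 1.50]
print(r_squared(y, y_hat), keep_fit(y, y_hat, e0=1.0, emax=1.5))
```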
4. Sensitivity Scoring
Each protein is mapped to a scalar NADPH Sensitivity Score (NSS).
4.1 Feature Vector
Define:
4.2 Robust Scaling
Each feature \( f \) is transformed:
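A common robust choice is to center each feature on its median and divide by its interquartile range, \( \tilde{f} = (f - \operatorname{median}(f)) / \operatorname{IQR}(f) \). The sketch below assumes that convention (a MAD-based scale would be an equally plausible alternative); `robust_scale` is a hypothetical name.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; feature has no spread to scale by")
    return [(v - median) / iqr for v in values]


# One extreme value barely moves the median or IQR, unlike mean/std scaling.
print(robust_scale([1.0, 2.0, 3.0, 4.0, 100.0]))
```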
4.3 Directional Transformations
- EC50 is inverted:
- Other features remain monotonic with effect strength.
4.4 Composite Score
where:
- \( w_k \): feature weights
- \( \phi(\cdot) \): bounded transformation (e.g. sigmoid)
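One way to read the composite score is \( \mathrm{NSS}_i = \phi\left( \sum_k w_k z_{ik} \right) \), with \( z_{ik} \) the scaled, direction-corrected features and \( \phi \) the logistic sigmoid. The sketch below assumes that reading (applying \( \phi \) per feature before summing is another possibility). The feature names and weights are hypothetical placeholders, not the framework's values.

```python
import math


def nss(scaled_features, weights):
    """Composite score: sigmoid of the weighted sum of scaled features."""
    total = sum(weights[name] * scaled_features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical features, already robust-scaled. EC50 enters as -log10(EC50)
# so that higher values always mean a stronger effect.
weights = {"neg_log_ec50": 0.5, "delta_max": 0.3, "r2": 0.2}
strong = {"neg_log_ec50": 1.2, "delta_max": 0.8, "r2": 0.5}
weak = {"neg_log_ec50": -1.0, "delta_max": -0.4, "r2": 0.1}

print(nss(strong, weights), nss(weak, weights))
```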
Typical weighting:
5. Pathway Enrichment
Let \( S_i \) denote NSS for protein \( i \), and \( \mathcal{P} \subseteq \mathcal{V} \) a pathway.
5.1 Continuous Enrichment
Test:
using Mann–Whitney U:
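The rank-based comparison can be illustrated with a small dependency-free sketch. It assumes the one-sided alternative that scores inside \( \mathcal{P} \) tend to exceed those outside it, and it uses the normal approximation without tie or continuity correction, so it is only indicative for small sets. The function names are hypothetical; `scipy.stats.mannwhitneyu` would be the usual production choice.

```python
import math


def mann_whitney_u(in_pathway, background):
    """U statistic: number of (pathway, background) pairs won by the pathway.

    Ties count as half a win.
    """
    u = 0.0
    for a in in_pathway:
        for b in background:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def one_sided_p(u, n1, n2):
    """Normal-approximation p-value for the alternative 'pathway scores are higher'."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal


pathway_scores = [0.81, 0.77, 0.92, 0.68]
background_scores = [0.35, 0.52, 0.41, 0.70, 0.29, 0.48]
u = mann_whitney_u(pathway_scores, background_scores)
print(u, one_sided_p(u, len(pathway_scores), len(background_scores)))
```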
5.2 Over-Representation
Define hit set:
Test enrichment via Fisher’s exact test on contingency table:
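For the over-representation test, the one-sided Fisher p-value is the upper tail of a hypergeometric distribution. The sketch below assumes the 2×2 table is laid out as hits/non-hits by inside/outside the pathway, with hits defined by a score cutoff; the argument names are hypothetical. `scipy.stats.fisher_exact` provides the same test in library form.

```python
from math import comb


def fisher_one_sided(hits_in, hits_out, nonhits_in, nonhits_out):
    """P(at least `hits_in` pathway members among the hits) under the null."""
    total = hits_in + hits_out + nonhits_in + nonhits_out
    pathway_size = hits_in + nonhits_in
    n_hits = hits_in + hits_out
    upper = min(pathway_size, n_hits)
    tail = sum(
        comb(pathway_size, k) * comb(total - pathway_size, n_hits - k)
        for k in range(hits_in, upper + 1)
    )
    return tail / comb(total, n_hits)


# 3 of 4 pathway proteins are hits, against 1 of 4 background proteins.
print(fisher_one_sided(3, 1, 1, 3))
```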
6. Network Model
6.1 Co-Stabilization Matrix
Let \( \mathbf{x}_i \in \mathbb{R}^d \) be the dose-response vector for protein \( i \).
6.2 Graph Construction
Define graph \( G = (V, E) \):
with edge weight:
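A small sketch of how a co-stabilization graph could be assembled from dose-response vectors is given below. It assumes Pearson correlation as the similarity measure, an edge whenever the correlation meets a cutoff, and the correlation itself as the edge weight. The similarity measure, the cutoff value, and all names are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def build_edges(profiles, threshold):
    """Return {(u, v): correlation} for protein pairs at or above the threshold."""
    names = sorted(profiles)
    edges = {}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            r = pearson(profiles[u], profiles[v])
            if r >= threshold:
                edges[(u, v)] = r
    return edges


# Toy dose-response vectors: A and B rise together, C falls.
profiles = {"A": [0, 1, 2, 3], "B": [0, 2, 4, 6], "C": [3, 2, 1, 0]}
print(build_edges(profiles, threshold=0.8))
```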
6.3 Community Detection
Modules are identified by maximizing modularity:
where:
- \( k_i \): node degree
- \( m \): total edge weight
- \( c_i \): community assignment
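With these symbols, the standard (Newman) modularity of a weighted undirected graph is

\[ Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j), \]

where \( A_{ij} \) denotes the weighted adjacency matrix (symbol assumed) and \( k_i \) is taken as the weighted degree. The sketch below evaluates \( Q \) for a given partition; it does not search for the maximizing partition, which is typically done with a heuristic such as Louvain. The function and variable names are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges maps (u, v) pairs to positive weights, one entry per undirected
    edge and no self-loops; community maps each node to its module label.
    """
    m = sum(edges.values())
    degree = {}
    intra = 0.0
    for (u, v), w in edges.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
        if community[u] == community[v]:
            intra += 2.0 * w  # each edge appears twice in the sum over (i, j)
    degree_by_module = {}
    for node, k in degree.items():
        label = community[node]
        degree_by_module[label] = degree_by_module.get(label, 0.0) + k
    expected = sum(k * k for k in degree_by_module.values()) / (2.0 * m)
    return (intra - expected) / (2.0 * m)


edges = {("a", "b"): 1.0, ("c", "d"): 1.0}
print(modularity(edges, {"a": 0, "b": 0, "c": 1, "d": 1}))  # two clean modules
print(modularity(edges, {"a": 0, "b": 0, "c": 0, "d": 0}))  # everything lumped together
```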
7. Latent Representation
7.1 Feature Matrix
Let:
be the standardized feature matrix.
7.2 Principal Component Analysis
Compute:
Latent coordinates:
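One standard way to write this step, assuming PCA is computed through the singular value decomposition of the standardized matrix \( X \in \mathbb{R}^{n \times p} \) and that the first \( k \) components are retained (the symbols \( U, \Sigma, V, k \) are assumed notation):

\[ X = U \Sigma V^{\top}, \qquad Z = X V_k = U_k \Sigma_k, \]

where the columns of \( V_k \) are the first \( k \) right singular vectors (the principal axes) and the rows of \( Z \in \mathbb{R}^{n \times k} \) are the latent coordinates of each protein. An eigendecomposition of the covariance matrix \( X^{\top} X / (n - 1) \) yields the same axes.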
7.3 Factor Analysis
Assume:
where:
- \( Z \): latent factors
- \( \Lambda \): loadings
- \( \epsilon \sim \mathcal{N}(0, \Psi)\)
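In terms of these symbols, the classical linear factor model reads as follows, assuming \( X \) denotes the standardized feature matrix with one protein per row and \( \Psi \) is diagonal:

\[ X = Z \Lambda^{\top} + \epsilon, \qquad \operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi, \]

where the second identity holds when the latent factors have identity covariance and are independent of the noise. Unlike PCA, the diagonal \( \Psi \) gives each feature its own noise variance.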
8. Sequence-Based Model
Let protein sequence be:
8.1 Embedding
Using a pretrained model:
8.2 Attention Pooling
8.3 Prediction
where \( f_{\theta} \) is a neural network.
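Attention pooling collapses per-residue embeddings \( h_1, \dots, h_L \) into one fixed-length vector. A common form, assumed here, scores each residue against a single learned query vector \( q \), normalizes with a softmax, \( \alpha_t = \operatorname{softmax}_t(q^{\top} h_t) \), and returns \( z = \sum_t \alpha_t h_t \), which is then passed to \( f_{\theta} \). The sketch uses toy embeddings rather than the output of a pretrained model, and all names are hypothetical.

```python
import math


def attention_pool(embeddings, query):
    """Softmax-weighted average of residue embeddings.

    embeddings is a list of L vectors of dimension d; query has dimension d.
    Returns (attention weights, pooled vector).
    """
    scores = [sum(q * h for q, h in zip(query, row)) for row in embeddings]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    pooled = [
        sum(w * row[k] for w, row in zip(weights, embeddings))
        for k in range(dim)
    ]
    return weights, pooled


# Three residues with 2-dimensional toy embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, pooled = attention_pool(toy_embeddings, query=[2.0, 0.0])
print(weights, pooled)
```

The attention weights sum to one, so they can also be inspected per residue as a rough indication of which positions drive the pooled representation.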
9. Explainability
9.1 Saliency
9.2 Integrated Gradients
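Integrated gradients attribute a prediction to each input dimension by averaging the gradient along a straight path from a baseline \( x' \) to the input \( x \):

\[ \mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial f\left(x' + \alpha (x - x')\right)}{\partial x_i} \, d\alpha. \]

The sketch below approximates the integral with a midpoint Riemann sum and uses finite differences in place of automatic differentiation, on a toy function instead of the trained network. The helper names and the all-zero baseline are assumptions. The plain gradient computed by `gradient` is the quantity a simple saliency map is built from.

```python
def gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2.0 * eps))
    return grads


def integrated_gradients(f, x, baseline, steps=50):
    """Midpoint-rule approximation of integrated gradients."""
    totals = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(f, point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def toy_model(v):
    """Stand-in for the network: quadratic in the first input, linear in the second."""
    return v[0] ** 2 + 3.0 * v[1]


x, baseline = [2.0, 1.0], [0.0, 0.0]
attributions = integrated_gradients(toy_model, x, baseline)
# Completeness: attributions sum to f(x) - f(baseline), up to numerical error.
print(attributions, sum(attributions), toy_model(x) - toy_model(baseline))
```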
10. Summary
The CETSAx–NADPH framework defines a mapping:
Each transformation is explicitly defined and interpretable, allowing both statistical inference and mechanistic insight.