tfopt — Transcription Factor Optimization Framework
- Originally implemented by Julius Normann.
- This version has been modified and optimized for consistency & speed in submodules by Abhinav Mishra.
tfopt provides a flexible architecture for estimating transcriptional regulatory influence using mRNA time series data, TF protein dynamics, and phosphorylation site signals.
The package contains two main submodules:
- tfopt/evol: global optimization via multi-objective evolutionary algorithms
- tfopt/local: constrained optimization using SciPy solvers (e.g., SLSQP)
Both modules share a consistent data preparation pipeline and model formulation.
Model Equation
For each mRNA (indexed by i), the measured time series is represented by:
$$ \mathbf{R}_i = \left([mRNA]_i(t_1), [mRNA]_i(t_2), \dots, [mRNA]_i(T)\right) $$
Its predicted value is modeled as a weighted combination of the effects of transcription factors (TFs) that regulate it. Each TF (indexed by j) contributes in two ways:
- A protein component (when no phosphorylation site is reported) with time series $ TF_{i,j}(t) $
- A PSite component (when phosphorylation sites are available) with time series $ PSite_{k,j}(t) $ for each site k
These contributions are modulated by two sets of parameters:
- α-values: For each mRNA, the impact of TF j is weighted by $ \alpha_{i,j} $
- β-values: For each TF, a vector of weights:
$$ \beta_j = \left( \beta_{0,j}, \beta_{1,j}, \dots, \beta_{K_j,j} \right) $$
Here, $ \beta_{0,j} $ multiplies the raw TF protein signal, and the remaining terms multiply phosphorylation site contributions.
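As a minimal sketch of how these pieces combine for a single mRNA, the following pure-Python function multiplies each TF signal by its α-weight and by a β-weighted modulation term (β_0 plus the weighted PSite signals), matching the product form used in the optimization summary of this document. The function name `predict_mrna`, the dictionary layout, and the toy numbers are illustrative assumptions, not part of the tfopt API.

```python
def predict_mrna(alpha, beta, tf, psites):
    """Predicted mRNA time series for one gene (illustrative sketch).

    alpha  : dict, TF name -> alpha weight for this mRNA
    beta   : dict, TF name -> [beta_0, beta_1, ..., beta_K]
    tf     : dict, TF name -> TF protein time series (list of floats)
    psites : dict, TF name -> list of K PSite time series (may be missing)
    """
    n_times = len(next(iter(tf.values())))
    pred = [0.0] * n_times
    for name, a in alpha.items():
        b = beta[name]
        sites = psites.get(name, [])
        for t in range(n_times):
            # beta_0 plus the beta-weighted PSite signals at time t
            modulation = b[0] + sum(bk * s[t] for bk, s in zip(b[1:], sites))
            pred[t] += a * tf[name][t] * modulation
    return pred


# Hypothetical toy data: two TFs, two time points, one PSite on TF_B.
alpha = {"TF_A": 0.6, "TF_B": 0.4}
beta = {"TF_A": [1.0], "TF_B": [0.5, 0.5]}
tf = {"TF_A": [1.0, 2.0], "TF_B": [2.0, 2.0]}
psites = {"TF_B": [[1.0, 3.0]]}

pred = predict_mrna(alpha, beta, tf, psites)
```

For this toy input the prediction is approximately `[1.4, 2.8]`: TF_A contributes 0.6 and 1.2, and TF_B contributes 0.8 and 1.6.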
Objective Function
To estimate the best set of weights, we minimize the difference between measured and predicted expression over all genes and time points:
$$ \min_{\{\alpha,\beta\}} \quad \sum_i \sum_t \left( R_i(t) - \hat{R}_i(t) \right)^2 $$
This formulation supports multiple loss types (MSE, MAE, soft L1, Cauchy, etc.) implemented in both submodules.
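To make the loss options concrete, here is a small sketch of the four named losses applied to a list of residuals (measured minus predicted). The soft L1 and Cauchy forms follow the definitions used by `scipy.optimize.least_squares`, applied to the squared residual; the function name `loss`, the `kind` strings, and the averaging over residuals are assumptions, and the exact scaling inside tfopt may differ.

```python
import math


def loss(residuals, kind="mse"):
    """Aggregate residuals into a scalar loss (illustrative sketch)."""
    n = len(residuals)
    if kind == "mse":
        # mean squared error
        return sum(r * r for r in residuals) / n
    if kind == "mae":
        # mean absolute error
        return sum(abs(r) for r in residuals) / n
    if kind == "soft_l1":
        # smooth approximation of the absolute value: 2 * (sqrt(1 + r^2) - 1)
        return sum(2.0 * (math.sqrt(1.0 + r * r) - 1.0) for r in residuals) / n
    if kind == "cauchy":
        # grows logarithmically, so large residuals are strongly down-weighted
        return sum(math.log1p(r * r) for r in residuals) / n
    raise ValueError(f"unknown loss: {kind}")


residuals = [1.0, -1.0]
values = {k: loss(residuals, k) for k in ("mse", "mae", "soft_l1", "cauchy")}
```

The robust variants (soft L1, Cauchy) penalize large residuals less than MSE does, which can help when a few time points are noisy.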
Constraints
α-constraints (for each mRNA i):
$$ \sum_{j \in J_i} \alpha_{i,j} = 1, \quad 0 \le \alpha_{i,j} \le 1 $$
β-constraints (for each TF j):
$$ \sum_{q=0}^{K_j} \beta_{q,j} = 1, \quad -2 \le \beta_{q,j} \le 2 $$
These constraints keep the weights normalized and bounded, which makes them easier to interpret and keeps the fit stable.
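The two constraint sets can be checked with a few lines of code. The sketch below tests whether one α-row (all TFs for one mRNA) and one β-vector (all terms for one TF) satisfy the sum-to-one and bound conditions within a numerical tolerance. The function names and the tolerance value are assumptions for illustration.

```python
def alpha_feasible(alpha_row, tol=1e-6):
    """True if the alpha weights of one mRNA lie in [0, 1] and sum to 1."""
    in_bounds = all(-tol <= a <= 1.0 + tol for a in alpha_row)
    return in_bounds and abs(sum(alpha_row) - 1.0) <= tol


def beta_feasible(beta_vec, tol=1e-6):
    """True if the beta weights of one TF lie in [-2, 2] and sum to 1."""
    in_bounds = all(-2.0 - tol <= b <= 2.0 + tol for b in beta_vec)
    return in_bounds and abs(sum(beta_vec) - 1.0) <= tol


ok_alpha = alpha_feasible([0.25, 0.75])      # in bounds, sums to 1
bad_alpha = alpha_feasible([1.5, -0.5])      # sums to 1 but violates bounds
ok_beta = beta_feasible([2.0, -1.0])         # negative beta is allowed
bad_beta = beta_feasible([0.5, 0.25])        # in bounds but sums to 0.75
```

Note that a β-vector such as `[2.0, -1.0]` is feasible: unlike α, individual β-weights may be negative as long as they stay within the bounds and the vector sums to one.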
Optimization Problem Summary
The final optimization problem is:
$$ \min_{\{\alpha,\beta\}} \sum_i \sum_t \left( R_i(t) - \sum_{j\in J_i} \alpha_{i,j} \cdot TF_{i,j}(t) \cdot \left( \beta_{0,j} + \sum_{k=1}^{K_j} PSite_{k,j}(t) \cdot \beta_{k,j} \right) \right)^2 $$
subject to the constraints above. This enables estimation of regulatory influences in a biologically meaningful and data-driven manner.
Submodules
evol/: Global Evolutionary Optimization
- Implements multi-objective optimization using pymoo (NSGA2, AGEMOEA, SMSEMOA)
- Evaluates tradeoffs between fit error, α-constraint violation, and β-constraint violation
- Outputs Excel summaries, static and interactive plots, and HTML reports
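The three quantities being traded off can be sketched as a single function that returns an objective vector. This is only an illustration: measuring violation as the absolute deviation of each weight sum from 1 is an assumption, as are the function name and argument layout, and the wiring into a pymoo problem is omitted.

```python
def objectives(measured, predicted, alphas, betas):
    """Return (fit_error, alpha_violation, beta_violation).

    measured, predicted : lists of per-mRNA time series
    alphas              : list of alpha rows, one per mRNA
    betas               : list of beta vectors, one per TF
    """
    # sum of squared errors over all mRNAs and time points
    fit_error = sum(
        (m - p) ** 2
        for m_row, p_row in zip(measured, predicted)
        for m, p in zip(m_row, p_row)
    )
    # total deviation of each weight group from the sum-to-one condition
    alpha_violation = sum(abs(sum(row) - 1.0) for row in alphas)
    beta_violation = sum(abs(sum(vec) - 1.0) for vec in betas)
    return fit_error, alpha_violation, beta_violation


# Hypothetical toy values: one mRNA, one alpha row, one beta vector.
f = objectives(
    measured=[[1.0, 2.0]],
    predicted=[[1.0, 1.0]],
    alphas=[[0.5, 0.4]],
    betas=[[1.0, 0.5]],
)
```

Treating the violations as separate objectives, rather than hard constraints, lets an evolutionary search explore slightly infeasible candidates and exposes the tradeoff between fit quality and feasibility.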
local/: Constrained Local Optimization
- Implements deterministic solvers (e.g. SLSQP)
- Faster and more interpretable for small- to medium-scale systems
- Shares the same objective and constraint framework as evol
- Generates the same reports and plots as the global module
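A toy version of the constrained local fit can be written directly with `scipy.optimize.minimize` and the SLSQP method. The sketch below fits one mRNA regulated by two TFs, one of which has a single PSite; the data, variable names, parameter ordering, and starting point are all hypothetical and chosen only to show how the sum-to-one equalities and the bounds are passed to the solver.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: one mRNA, two TFs, four time points.
tf_a = np.array([1.0, 1.5, 2.0, 2.5])      # TF A protein signal, no PSites
tf_b = np.array([2.0, 1.8, 1.2, 1.0])      # TF B protein signal
psite_b = np.array([0.5, 1.0, 1.5, 2.0])   # single PSite signal on TF B


def predict(x):
    """x = [alpha_A, alpha_B, beta_0_B, beta_1_B]."""
    a_a, a_b, b0, b1 = x
    # TF A has only beta_0, which the sum-to-one constraint fixes at 1
    return a_a * tf_a + a_b * tf_b * (b0 + b1 * psite_b)


# Synthetic "measured" series generated from known feasible weights.
measured = predict(np.array([0.7, 0.3, 0.4, 0.6]))


def objective(x):
    # sum of squared errors between measured and predicted mRNA
    return float(np.sum((measured - predict(x)) ** 2))


constraints = [
    {"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0},  # alphas sum to 1
    {"type": "eq", "fun": lambda x: x[2] + x[3] - 1.0},  # betas sum to 1
]
bounds = [(0.0, 1.0), (0.0, 1.0), (-2.0, 2.0), (-2.0, 2.0)]
x0 = np.array([0.5, 0.5, 0.5, 0.5])  # feasible starting point

result = minimize(objective, x0, method="SLSQP",
                  bounds=bounds, constraints=constraints)
```

`result.x` holds the fitted weights and `result.fun` the final error. Because α and β enter the model as a product, the problem is non-convex, so a local solver may settle on a feasible point other than the one used to generate the data.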
Usage
From one level above the project root:
python -m phoskintime tfopt --mode evol
or
python -m phoskintime tfopt --mode local
Output will be saved in structured folders, including Excel files, plots, and an aggregated HTML report.