API Reference
Data Standardization & Cleanup
processing.cleanup
process_collecttri()
Processes the CollecTRI file to clean and filter mRNA-TF interactions. Removes complex interactions, filters by target genes, and saves the result.
format_site(site)
Formats a phosphorylation site string.
If the input is NaN or an empty string, returns an empty string. If the input contains an underscore ('_'), splits the string into two parts, converts the first part to uppercase, and appends the second part unchanged. Otherwise, converts the entire string to uppercase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
site
|
str
|
The phosphorylation site string to format. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
str |
The formatted phosphorylation site string. |
process_msgauss()
Processes the MS Gaussian data file to generate time series data.
process_msgauss_std()
Processes the MS Gaussian data file to compute transformed means and standard deviations.
process_routlimma()
Processes the Rout Limma table to generate time series data for mRNA.
update_gene_symbols(filename)
Updates the GeneID column in a CSV file by mapping GeneIDs to gene/protein symbols.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
The path to the CSV file to be updated. The file must contain a 'GeneID' column. |
required |
move_processed_files()
Moves or copies processed files to their respective directories.
Optimization Results Mapping
processing.map
map_optimization_results(tf_file_path, kin_file_path, sheet_name='Alpha Values')
Reads the TF-mRNA optimization results from an Excel file and maps mRNA to each TF.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tf_file_path
|
Path to the Excel file containing TF-mRNA optimization results. |
required | |
kin_file_path
|
Path to the Excel file containing Kinase-Phosphorylation optimization results. |
required | |
sheet_name
|
The name of the sheet in the Excel file to read from. Default is 'Alpha Values'. |
'Alpha Values'
|
Returns:
| Type | Description |
|---|---|
|
pd.DataFrame: A DataFrame containing the mapped TF, mRNA, Psite, and Kinase information. |
create_cytoscape_table(mapping_csv_path)
Creates a Cytoscape-compatible edge table from a mapping file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mapping_csv_path
|
str
|
Path to the input CSV file with columns: TF, TF_strength, mRNA, Psite, Kinase, Kinase_strength |
required |
Returns:
| Name | Type | Description |
|---|---|---|
table |
DataFrame
|
Edge table with columns [Source, Target, Interaction, Strength] |
add_kinetic_strength_columns(mapping_path, mapping__path, excel_path, suffix)
Adds kinetic strength columns to the mapping files based on the provided Excel file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mapping_path
|
str
|
Path to the first mapping file. |
required |
mapping__path
|
str
|
Path to the second mapping file. |
required |
excel_path
|
str
|
Path to the Excel file containing kinetic strength data. |
required |
suffix
|
str
|
Suffix to append to the output files. |
required |
generate_nodes(edge_df)
Infers node types and aggregates all phosphorylation sites per target node from phosphorylation edges.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
edge_df
|
DataFrame
|
Must have columns ['Source', 'Target', 'Interaction', 'Psite'] |
required |
Returns:
| Type | Description |
|---|---|
|
pd.DataFrame: DataFrame with columns ['Node', 'Type', 'Psite'] |
Kinase-Phosphorylation Optimization
Evolutionary Algorithms
kinopt.evol.config.constants
kinopt.evol.config.logconf
ColoredFormatter
Bases: Formatter
format(record)
Format the log record with ANSI color codes and elapsed time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
record
|
LogRecord
|
The log record to format. |
required |
Returns: str: The formatted log message with ANSI color codes.
remove_ansi(s)
staticmethod
Remove ANSI escape codes from a string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
s
|
str
|
The string from which to remove ANSI escape codes. |
required |
Returns: str: The string without ANSI escape codes.
setup_logger(name='phoskintime', log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5)
Function to set up a logger with both file and console handlers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Name of the logger. |
'phoskintime'
|
log_file
|
str
|
Path to the log file. If None, a default path is generated. |
None
|
level
|
int
|
Logging level (e.g., logging.DEBUG, logging.INFO). |
DEBUG
|
log_dir
|
str
|
Directory where log files are stored. |
LOG_DIR
|
rotate
|
bool
|
Whether to use rotating file handler. |
True
|
max_bytes
|
int
|
Maximum size of log file before rotation. |
2 * 1024 * 1024
|
backup_count
|
int
|
Number of backup files to keep. |
5
|
Returns:
| Name | Type | Description |
|---|---|---|
logger |
Logger
|
Configured logger instance. |
kinopt.evol.exporter.plotout
plot_residuals_for_gene(gene, gene_data)
Generates and saves combined residual-related plots for one gene with all psites in the legend.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
Gene identifier. |
required |
gene_data
|
dict
|
Dictionary with keys 'psites', 'observed', 'estimated', and 'residuals' containing data for all psites. |
required |
TIME_POINTS
|
ndarray or list
|
Time points corresponding to the series. |
required |
opt_analyze_nsga(problem, result, F, pairs, approx_ideal, approx_nadir, asf_i, pseudo_i, n_evals, hv, hist, val, hist_cv_avg, k, igd, best_objectives, waterfall_df, convergence_df, alpha_values, beta_values)
Function to generate and save various plots related to optimization results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
problem
|
The optimization problem instance. |
required | |
result
|
The result of the optimization run. |
required | |
F
|
Objective function values. |
required | |
pairs
|
Pairs of objectives to plot. |
required | |
approx_ideal
|
Approximate ideal point in objective space. |
required | |
approx_nadir
|
Approximate nadir point in objective space. |
required | |
asf_i
|
Index of the best solution in terms of the augmented weighted sum. |
required | |
pseudo_i
|
Index of the pseudo weights. |
required | |
n_evals
|
Number of evaluations at each generation. |
required | |
hv
|
Hypervolume values. |
required | |
hist
|
History of the optimization process. |
required | |
val
|
Values for convergence plot. |
required | |
hist_cv_avg
|
Average constraint violation history. |
required | |
k
|
Number of generations. |
required | |
igd
|
Inverted generational distance values. |
required | |
best_objectives
|
Best objectives found during the optimization process. |
required | |
waterfall_df
|
DataFrame containing waterfall plot data. |
required | |
convergence_df
|
DataFrame containing convergence data. |
required | |
alpha_values
|
Dictionary containing alpha values for parameters. |
required | |
beta_values
|
Dictionary containing beta values for parameters. |
required |
Returns:
| Type | Description |
|---|---|
|
None |
opt_analyze_de(long_df, convergence_df, ordered_optimizer_runs, x_values, y_values, val)
Function to generate and save various plots related to optimization results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
long_df
|
DataFrame
|
DataFrame containing parameter values and objective function values. |
required |
convergence_df
|
DataFrame
|
DataFrame containing convergence data. |
required |
ordered_optimizer_runs
|
DataFrame
|
DataFrame containing ordered optimizer runs. |
required |
x_values
|
list
|
X-axis values for the waterfall plot. |
required |
y_values
|
list
|
Y-axis values for the waterfall plot. |
required |
val
|
list
|
Values for the convergence plot. |
required |
Returns:
| Type | Description |
|---|---|
|
None |
kinopt.evol.exporter.sheetutils
output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, timepoints, OUT_FILE)
Function to output results to an Excel file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
P_initial
|
dict
|
Dictionary with initial parameters. |
required |
P_init_dense
|
ndarray
|
Dense matrix of initial parameters. |
required |
P_estimated
|
ndarray
|
Dense matrix of estimated parameters. |
required |
residuals
|
ndarray
|
Dense matrix of residuals. |
required |
alpha_values
|
dict
|
Dictionary with alpha values. |
required |
beta_values
|
dict
|
Dictionary with beta values. |
required |
result
|
str
|
Result string for logging. |
required |
timepoints
|
list
|
List of time points. |
required |
OUT_FILE
|
str
|
Output file path. |
required |
Returns:
| Type | Description |
|---|---|
|
None |
kinopt.evol.objfn.minfndiffevo
PhosphorylationOptimizationProblem
Bases: ElementwiseProblem
Single-objective constrained optimization problem for phosphorylation dynamics (Numba-accelerated).
Minimizes loss between observed and predicted phosphorylation levels subject to constraints that alpha and beta weights sum to 1.0 for each gene-psite and kinase group, respectively.
Objective
- minimize loss (MSE, autocorrelation, Huber, or MAPE)
Constraints g(x) <= 0: - for each alpha group: |sum(alpha_group) - 1| <= eps_eq - for each kinase beta group: |sum(beta_group) - 1| <= eps_eq
Attributes:
| Name | Type | Description |
|---|---|---|
P_initial |
dict
|
Dictionary mapping (gene, psite) tuples to data dictionaries. |
P_initial_array |
ndarray
|
Observed phosphorylation matrix with shape (i_max, t_max). |
K_index |
dict
|
Dictionary mapping kinase names to lists of (psite_label, row_idx) tuples. |
K_array |
ndarray
|
Kinase activity matrix with shape (n_k_rows, t_max). |
gp_offsets |
ndarray
|
Offset indices for gene-psite groups. |
gp_kinase_ids |
ndarray
|
Kinase IDs for alpha variables. |
k_offsets |
ndarray
|
Offset indices for kinase groups. |
k_psite_rows |
ndarray
|
Psite row indices for beta variables. |
num_alpha |
int
|
Total number of alpha variables. |
num_beta |
int
|
Total number of beta variables. |
eps_eq |
float
|
Tolerance for equality constraints. |
loss_id |
int
|
Loss type identifier. |
include_reg |
bool
|
Whether to include regularization. |
n_scalar |
float
|
Scalar factor for normalization. |
estimated_series(params)
Compute estimated phosphorylation series for given parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params
|
array - like
|
1D array of parameters [alpha_1, ..., alpha_N, beta_1, ..., beta_M]. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
ndarray |
Predicted phosphorylation matrix with shape (i_max, t_max). |
residuals(params)
Compute residuals between observed and estimated phosphorylation for given parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params
|
array - like
|
1D array of parameters [alpha_1, ..., alpha_N, beta_1, ..., beta_M]. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
ndarray |
Residual matrix (observed - estimated) with shape (i_max, t_max). |
kinopt.evol.objfn.minfnnsgaii
PhosphorylationOptimizationProblem
Bases: ElementwiseProblem
Multi-objective optimization
F[0] = main loss (error) F[1] = alpha sum-to-1 violations (aggregated) F[2] = beta sum-to-1 violations (aggregated)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
P_initial
|
dict
|
Dictionary with keys as (gene, psite) and values containing 'Kinases' and 'TimeSeries'. |
required |
P_initial_array
|
ndarray
|
Array of observed gene-psite data. |
required |
K_index
|
dict
|
Dictionary mapping each kinase to a list of (psite, time_series) tuples. |
required |
K_array
|
ndarray
|
Array of kinase-psite time-series data. |
required |
gene_psite_counts
|
list
|
List of integers indicating the number of kinases associated with each gene-psite. |
required |
beta_counts
|
dict
|
Dictionary indicating how many beta values correspond to each kinase-psite combination. |
required |
objective_function(params)
Computes the main objective function (loss) for the given parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params
|
ndarray
|
Parameter vector containing alpha and beta values. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
float |
Computed loss value based on the selected loss type (base, autocorrelation, huber, or mape). |
kinopt.evol.opt.optrun
choose_de_pop_size(problem)
Determine an appropriate population size for Differential Evolution (DE) algorithms.
The population size is calculated based on the number of decision variables, with bounds to ensure reasonable performance. DE algorithms benefit from population sizes that are multiples of 10.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
problem
|
The optimization problem instance with an 'n_var' attribute indicating the number of decision variables. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
int |
The calculated population size (multiple of 10, between 100 and 600). |
choose_nsga_pop_size(problem, n_obj=3)
Determine an appropriate population size for NSGA-based multi-objective algorithms.
The population size is scaled based on the problem dimensionality (number of decision variables) with heuristic thresholds. The size is rounded to multiples of 50 and enforced to be at least 10 times the number of objectives.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
problem
|
The optimization problem instance with an 'n_var' attribute indicating the number of decision variables. |
required | |
n_obj
|
int
|
Number of objectives in the problem. Defaults to 3. |
3
|
Returns:
| Name | Type | Description |
|---|---|---|
int |
The calculated population size (multiple of 50, at least 10*n_obj). |
binary_tournament_loss_cv(pop, P, eps_cv=1e-10, cv_mode='linf', **kwargs)
Robust binary tournament comparator for constrained optimization.
This function performs binary tournament selection with constraint handling using either true constraint violations (CV) or pseudo-constrained objectives. It supports both single-objective and multi-objective formulations.
Works for
A) single-objective: F has length 1 - if CV exists, use constraint-domination (CV first, then F) - else compare by F only B) pseudo-constrained objectives: F = [loss, alpha_violation, beta_violation] - feasibility-first based on F[1], F[2], then loss
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pop
|
Population of individuals with 'F' (objectives) and optionally 'CV' attributes. |
required | |
P
|
ndarray
|
Tournament pairs array of shape (n_tournaments, 2), where each row contains indices of two competing individuals. |
required |
eps_cv
|
float
|
Feasibility tolerance for constraint violations. Defaults to 1e-10. |
1e-10
|
cv_mode
|
str
|
Mode for aggregating constraint violations when using pseudo-constrained objectives. Options: 'linf' (max), 'l1' (sum), 'l2' (norm). Defaults to "linf". |
'linf'
|
**kwargs
|
Additional keyword arguments (unused, for compatibility). |
{}
|
Returns:
| Type | Description |
|---|---|
|
np.ndarray: Array of winning indices for each tournament, shape (n_tournaments,). |
Raises:
| Type | Description |
|---|---|
ValueError
|
If pressure is not 2 (only binary tournaments supported) or if cv_mode is not one of 'linf', 'l1', 'l2'. |
run_optimization(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, PhosphorylationOptimizationProblem)
Sets up and runs the multi-objective optimization problem for phosphorylation using an NSGA2 algorithm and a thread pool for parallelization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts
|
Data structures describing the problem (time-series data, kinases, etc.). |
required | |
PhosphorylationOptimizationProblem
|
class
|
The custom problem class to be instantiated. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
result |
The pymoo result object containing the optimized population and history. |
|
exec_time |
Execution time for the optimization. |
pick_best_loss_with_constraints_as_objectives(result, eps_cv=1e-10, cv_mode='l1', tie_tol=1e-12, tie_break='loss_then_l2')
Select the best solution from a population with constraints formulated as objectives.
This function assumes a specific objective structure where
F[:,0] = loss (minimize) F[:,1] = constraint violation 1 (minimize, ideally 0) F[:,2] = constraint violation 2 (minimize, ideally 0)
Selection rule
A) If any feasible solutions exist (cv1<=eps and cv2<=eps): choose minimum loss among feasible. B) Else: choose minimum aggregated CV; tie-break by loss; optional tie-break by ||X||2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
Pymoo result object containing the final population with 'F' and 'X' attributes. |
required | |
eps_cv
|
float
|
Feasibility tolerance for constraint violations. Defaults to 1e-10. |
1e-10
|
cv_mode
|
str
|
Mode for aggregating constraint violations. Options: 'l1' (sum), 'linf' (max), 'l2' (Euclidean norm). Defaults to "l1". |
'l1'
|
tie_tol
|
float
|
Tolerance for considering values as tied. Defaults to 1e-12. |
1e-12
|
tie_break
|
str
|
Tie-breaking strategy. Options: 'loss_then_l2' or 'loss_only'. Defaults to "loss_then_l2". |
'loss_then_l2'
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
A tuple containing: - best_solution: The selected individual from the population. - best_index_in_pop (int): The index of the best solution in the population. - info (dict): Dictionary with selection metadata including selection case, number of feasible solutions, and best objective values. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the objective array has fewer than 3 columns or if cv_mode is not one of 'l1', 'linf', 'l2'. |
post_optimization_nsga(result, weights=np.array([1.0, 1.0, 1.0]), ref_point=np.array([3, 1, 1]))
Post-process the result of a multi-objective NSGA-based optimization run.
This function analyzes the optimization history, computes convergence metrics (hypervolume, IGD+), identifies the best solution using constraint handling, and generates CSV reports for convergence and parameter scans.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
The final result object from the pymoo optimizer containing the final population, history, and objective values. |
required | |
weights
|
ndarray
|
Array of length 3 for weighting the objectives in decomposition-based selection. Defaults to [1.0, 1.0, 1.0]. |
array([1.0, 1.0, 1.0])
|
ref_point
|
ndarray
|
Reference point for hypervolume computation. Defaults to [3, 1, 1]. |
array([3, 1, 1])
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
A tuple containing 23 elements: - F: Final objective values array - pairs: Objective pairs for plotting [(0,1), (0,2), (1,2)] - n_evals: Number of evaluations per generation - hist_cv: Minimum constraint violation per generation - hist_cv_avg: Average constraint violation per generation - k: Generation index when first feasible solution appeared - metric_igd: IGDPlus metric object - metric_hv: Hypervolume metric object - best_solution: The selected best individual - best_objectives: Objective vector of best solution - optimized_params: Decision variables (X) of best solution - approx_nadir: Approximate nadir point - approx_ideal: Approximate ideal point - scores: Best solution's objective scores - best_index: Index of best solution in population - hist: Full optimization history - hist_hv: Hypervolume values per generation - hist_igd: IGD+ values per generation - convergence_df: DataFrame with iteration vs best objective - waterfall_df: DataFrame with all solutions and parameters - asf_i: Index of best solution by ASF decomposition - pseudo_weights_result: Result of pseudo-weights MCDM method - pairs (duplicate): Objective pairs - val: Best objective value per generation |
post_optimization_de(result, alpha_values, beta_values)
Post-process the result of a single-objective DE or GA optimization run.
This function extracts the final population, creates parameter labels from alpha and beta values, generates a parameter scan DataFrame sorted by objective value, and produces a convergence DataFrame showing the best objective per iteration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
The final result object from the pymoo optimizer (e.g., GA or DE result) containing the population, history, and objective values. |
required | |
alpha_values
|
dict
|
Dictionary mapping (gene, psite) tuples to dictionaries of {kinase: value} for alpha parameters. |
required |
beta_values
|
dict
|
Dictionary mapping (kinase, psite) tuples to beta parameter values. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
A tuple containing 6 elements: - ordered_optimizer_runs: DataFrame of all solutions sorted by objective value - convergence_df: DataFrame with iteration vs best objective value - long_df: Long-form DataFrame for parameter visualization with columns ['Individual', 'Objective Value (F)', 'Parameter', 'Parameter Value', 'Type'] - x_values: List of iteration indices selected for plotting - y_values: List of objective values corresponding to x_values - val: Best objective value per generation from history |
kinopt.evol.optcon.construct
pipeline(input1_path: str, input2_path: str, time_series_columns: list[str], scaling_method: str, split_point: float, segment_points: list[float], estimate_missing_kinases: bool, kinase_to_psites: dict[str, int])
Function to run the entire pipeline for loading and processing data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input1_path
|
str
|
Path to the first CSV file (HGNC data). |
required |
input2_path
|
str
|
Path to the second CSV file (kinase interactions). |
required |
time_series_columns
|
list[str]
|
List of time series columns to extract. |
required |
scaling_method
|
str
|
Method for scaling the data. |
required |
split_point
|
float
|
Split point for scaling. |
required |
segment_points
|
list[float]
|
Segment points for scaling. |
required |
estimate_missing_kinases
|
bool
|
Flag to estimate missing kinases. |
required |
kinase_to_psites
|
dict[str, int]
|
Dictionary mapping kinases to their respective psites. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
full_hgnc_df |
DataFrame
|
The scaled data from input1. |
interaction_df |
DataFrame
|
The subset/merged DataFrame from input2. |
observed |
DataFrame
|
Subset of full_hgnc_df merged with interaction_df. |
P_initial |
dict
|
Dictionary mapping gene-psite pairs to kinase relationships and time-series data. |
P_initial_array |
ndarray
|
Array containing observed time-series data for gene-psite pairs. |
K_array |
ndarray
|
Array containing time-series data for kinase-psite combinations. |
K_index |
dict
|
Mapping of kinases to their respective psite data. |
beta_counts |
dict
|
Mapping of kinase indices to the number of associated psites. |
gene_psite_counts |
list
|
List of counts of psites for each gene. |
n |
int
|
Number of unique gene-psite pairs. |
load_geneid_to_psites(input1_path=INPUT1)
Function to load geneid to psite mapping from input1.csv. Args: input1_path (str): Path to the first CSV file (HGNC data). Returns: geneid_psite_map (dict): Dictionary mapping gene IDs to sets of psites.
get_unique_kinases(input2_path=INPUT2)
Function to extract unique kinases from input2.csv. Args: input2_path (str): Path to the second CSV file (kinase interactions). Returns: kinases (set): Set of unique kinases extracted from the input2 file.
check_kinases()
Function to check if kinases from input2.csv are present in input1.csv.
Returns:
| Type | Description |
|---|---|
|
None |
kinopt.evol.utils.iodata
format_duration(seconds)
Returns a formatted string representing the duration in seconds, minutes, or hours.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
seconds
|
float
|
The duration in seconds. |
required |
Returns: str: The formatted duration string.
load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)
Function to load and scale data from CSV files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
estimate_missing
|
bool
|
If True, estimates missing values. |
required |
scaling_method
|
str
|
The scaling method to apply ('min_max', 'log', 'temporal', 'segmented', 'slope', 'cumulative'). |
required |
split_point
|
int
|
Column index for temporal scaling. |
required |
seg_points
|
list
|
List of column indices for segmented scaling. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
full_hgnc_df |
DataFrame
|
DataFrame with scaled time-series data. |
interaction_df |
DataFrame
|
DataFrame containing interaction data. |
observed |
DataFrame
|
DataFrame containing observed data. |
apply_scaling(df, time_series_columns, method, split_point, segment_points)
Function to apply different scaling methods to time-series data in a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df
|
DataFrame
|
Input DataFrame containing time-series data. |
required |
time_series_columns
|
list
|
List of column names to scale. |
required |
method
|
str
|
Scaling method ('min_max', 'log', 'temporal', 'segmented', 'slope', 'cumulative'). |
required |
split_point
|
int
|
Column index for temporal scaling. |
required |
segment_points
|
list
|
List of column indices for segmented scaling. |
required |
Returns:
| Type | Description |
|---|---|
|
pd.DataFrame: DataFrame with scaled time-series data. |
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results_dir
|
str
|
Path to the root result's directory. |
required |
output_file
|
str
|
Name of the generated global report file (placed inside results_dir). |
'report.html'
|
Returns:
| Type | Description |
|---|---|
|
None |
organize_output_files(*directories)
Function to organize output files into protein-specific folders and a general folder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
*directories
|
List of directories to organize. |
()
|
Returns:
| Type | Description |
|---|---|
|
None |
kinopt.evol.utils.params
extract_parameters(P_initial, gene_psite_counts, K_index, optimized_params)
Function to extract alpha and beta values from the optimized parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
P_initial
|
dict
|
Dictionary containing initial parameters for each gene-psite pair. |
required |
gene_psite_counts
|
list
|
List of counts for each gene-psite pair. |
required |
K_index
|
dict
|
Dictionary mapping kinases to their respective psite pairs. |
required |
optimized_params
|
list
|
List of optimized parameters. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
alpha_values |
dict
|
Dictionary containing alpha values for each gene-psite pair. |
beta_values |
dict
|
Dictionary containing beta values for each kinase-psite pair. |
compute_metrics(optimized_params: np.ndarray, P_initial: dict, P_initial_array: np.ndarray, K_index: dict, K_array: np.ndarray, gene_psite_counts: list, beta_counts: dict, n: int)
Function to compute error metrics for the estimated series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
optimized_params
|
list
|
List of optimized parameters. |
required |
P_initial
|
dict
|
Dictionary containing initial parameters for each gene-psite pair. |
required |
P_initial_array
|
ndarray
|
Array of initial parameters. |
required |
K_index
|
dict
|
Dictionary mapping kinases to their respective psite pairs. |
required |
K_array
|
ndarray
|
Array of kinases. |
required |
gene_psite_counts
|
list
|
List of counts for each gene-psite pair. |
required |
beta_counts
|
dict
|
List of counts for each kinase-psite pair. |
required |
n
|
int
|
Number of samples. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
P_estimated |
ndarray
|
Estimated series. |
residuals |
ndarray
|
Residuals between initial and estimated series. |
mse |
float
|
Mean Squared Error. |
rmse |
float
|
Root Mean Squared Error. |
mae |
float
|
Mean Absolute Error. |
mape |
float
|
Mean Absolute Percentage Error. |
r_squared |
float
|
R-squared value. |
Gradient-Based Algorithms
kinopt.local.config.constants
parse_args()
kinopt.local CLI. Defaults come from config.toml.
kinopt.local.config.logconf
kinopt.local.exporter.plotout
format_timepoints(tp, tol=1e-09)
Format timepoints with minimal decimals: - integers -> no decimal - non-integers -> one decimal
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tp
|
array - like
|
Timepoints (list or np.ndarray) |
required |
tol
|
float
|
Tolerance for floating-point integer check |
1e-09
|
Returns:
| Type | Description |
|---|---|
|
list[str]: Formatted labels |
plot_fits_for_gene(gene, gene_data, real_timepoints)
Function to plot the observed and estimated phosphorylation levels for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing observed and estimated data for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
export_outcomes_to_csv(outcomes, csv_path)
Export multistart optimization outcomes to CSV.
One row per start, scalar diagnostics only.
plot_cumulative_residuals(gene, gene_data, real_timepoints)
Function to plot the cumulative residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing the residuals for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
plot_autocorrelation_residuals(gene, gene_data, real_timepoints)
Function to plot the autocorrelation of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing the residuals for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
plot_histogram_residuals(gene, gene_data, real_timepoints)
Function to plot histograms of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing the residuals for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
plot_qqplot_residuals(gene, gene_data, real_timepoints)
Function to plot QQ plots of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing the residuals for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)
Read a multistart summary CSV and plot objective vs rank with point color = runtime.
Minimal, information-dense conventions: - x: rank (best -> worst) - y: final objective (fun) - color: runtime in seconds - optional: de-emphasize non-success / infeasible points (if columns exist)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
summary_csv
|
str | Path
|
Path to the multistart_summary.csv |
required |
out_path
|
str | Path | None
|
If provided, saves the figure (e.g. .png) |
None
|
figsize
|
tuple
|
Figure size in inches |
(8, 8)
|
x_col, y_col, c_col
|
Column names |
required | |
success_col, cv_col
|
Optional columns for styling (used if present) |
required | |
annotate_best
|
bool
|
Annotate the best run (rank=1 or min fun) |
True
|
Returns:
| Type | Description |
|---|---|
(fig, ax, df)
|
Matplotlib figure/axis and the loaded DataFrame |
kinopt.local.exporter.sheetutils
format_timepoints(tp, tol=1e-09)
Format timepoints with minimal decimals: - integers -> no decimal - non-integers -> one decimal
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tp
|
array - like
|
Timepoints (list or np.ndarray) |
required |
tol
|
float
|
Tolerance for floating-point integer check |
1e-09
|
Returns:
| Type | Description |
|---|---|
|
list[str]: Formatted labels |
plot_fits_for_gene(gene, gene_data, real_timepoints)
Function to plot the observed and estimated phosphorylation levels for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing observed and estimated data for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
export_outcomes_to_csv(outcomes, csv_path)
Export multistart optimization outcomes to CSV.
One row per start, scalar diagnostics only.
plot_cumulative_residuals(gene, gene_data, real_timepoints)
Function to plot the cumulative residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing the residuals for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
plot_autocorrelation_residuals(gene, gene_data, real_timepoints)
Function to plot the autocorrelation of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing the residuals for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
plot_histogram_residuals(gene, gene_data, real_timepoints)
Function to plot histograms of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing the residuals for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
plot_qqplot_residuals(gene, gene_data, real_timepoints)
Function to plot QQ plots of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
The name of the gene. |
required |
gene_data
|
dict
|
A dictionary containing the residuals for each psite of the gene. |
required |
real_timepoints
|
list
|
A list of timepoints corresponding to the observed and estimated data. |
required |
plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)
Read a multistart summary CSV and plot objective vs rank with point color = runtime.
Minimal, information-dense conventions: - x: rank (best -> worst) - y: final objective (fun) - color: runtime in seconds - optional: de-emphasize non-success / infeasible points (if columns exist)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
summary_csv
|
str | Path
|
Path to the multistart_summary.csv |
required |
out_path
|
str | Path | None
|
If provided, saves the figure (e.g. .png) |
None
|
figsize
|
tuple
|
Figure size in inches |
(8, 8)
|
x_col, y_col, c_col
|
Column names |
required | |
success_col, cv_col
|
Optional columns for styling (used if present) |
required | |
annotate_best
|
bool
|
Annotate the best run (rank=1 or min fun) |
True
|
Returns:
| Type | Description |
|---|---|
(fig, ax, df)
|
Matplotlib figure/axis and the loaded DataFrame |
output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, mse, rmse, mae, mape, r_squared)
Function to output the results of the optimization process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
P_initial
|
dict
|
Dictionary containing initial phosphorylation data. |
required |
P_init_dense
|
ndarray
|
Dense matrix of initial phosphorylation data. |
required |
P_estimated
|
ndarray
|
Dense matrix of estimated phosphorylation data. |
required |
residuals
|
ndarray
|
Dense matrix of residuals. |
required |
alpha_values
|
dict
|
Dictionary containing optimized alpha values. |
required |
beta_values
|
dict
|
Dictionary containing optimized beta values. |
required |
result
|
OptimizeResult
|
Result object from the optimization process. |
required |
mse
|
float
|
Mean Squared Error of the optimization. |
required |
rmse
|
float
|
Root Mean Squared Error of the optimization. |
required |
mae
|
float
|
Mean Absolute Error of the optimization. |
required |
mape
|
float
|
Mean Absolute Percentage Error of the optimization. |
required |
r_squared
|
float
|
R-squared value of the optimization. |
required |
export_params_npz(outcomes, path)
Export the optimized parameters to a compressed npz file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
outcomes
|
list
|
List of OptimizeResult objects. |
required |
path
|
str
|
Path to save the npz file. |
required |
kinopt.local.objfn.minfn
kinopt.local.opt.optrun
StartOutcome
dataclass
Outcome of a single optimization start.
:param start_id: ID of the start. :param seed: Seed used for the start. :param result: Result of the optimization. :param optimized_params: Optimized parameters. :param fun: Objective function value. :param success: Whether the optimization was successful. :param constr_violation: Constraint violation. :param runtime_s: Runtime of the optimization.
run_optimization(obj_fun, params_initial, opt_method, bounds, constraints)
Run optimization using the specified method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
obj_fun
|
Objective function to minimize. |
required | |
params_initial
|
Initial parameters for the optimization. |
required | |
opt_method
|
Optimization method to use (e.g., 'SLSQP', 'trust-constr'). |
required | |
bounds
|
Bounds for the parameters. |
required | |
constraints
|
Constraints for the optimization. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
result |
Result of the optimization. |
|
optimized_params |
Optimized parameters. |
multistart_run_optimization(obj_fun, params_initial, opt_method, bounds, constraints, n_starts=24, n_jobs=-1, base_seed=1234, init_strategy='hybrid', jitter_scale=0.15, prefer_feasible=True, logger=None)
Runs run_optimization multiple times in parallel and returns (best_result, best_params, outcomes).
Selection logic (sophisticated but simple): 1) If prefer_feasible: prefer (cv <= 0) or smallest constraint violation. 2) Then lowest objective. 3) Then success=True as tie-breaker. 4) Then shortest runtime as final tie-breaker.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
obj_fun
|
Objective function to optimize. |
required | |
params_initial
|
Initial parameters for optimization. |
required | |
opt_method
|
Optimization method to use (e.g., 'SLSQP', 'trust-constr'). |
required | |
bounds
|
Parameter bounds for optimization. |
required | |
constraints
|
Constraints for optimization. |
required | |
n_starts
|
Number of optimization starts to run (default: 24). |
24
|
|
n_jobs
|
Number of parallel jobs to run. -1 means use all processors (default: -1). |
-1
|
|
base_seed
|
Base seed for random number generation (default: 1234). |
1234
|
|
init_strategy
|
Strategy for sampling initial parameters: 'jitter', 'uniform', or 'hybrid' (default: 'hybrid'). |
'hybrid'
|
|
jitter_scale
|
Scale for jittering initial parameters (default: 0.15). |
0.15
|
|
prefer_feasible
|
If True, prefer feasible solutions over infeasible ones (default: True). |
True
|
|
logger
|
Logger instance for logging messages (default: None). |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
A tuple containing: - best_result: The optimization result object with the best outcome. - best_params: The optimized parameters corresponding to the best result. - outcomes: List of StartOutcome objects for all optimization starts. |
kinopt.local.optcon.construct
load_geneid_to_psites(input1_path=INPUT1)
Load the geneid to psite mapping from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input1_path
|
str
|
Path to the input CSV file containing geneid and psite information. |
INPUT1
|
Returns: defaultdict: A dictionary mapping geneid to a set of psites.
get_unique_kinases(input2_path=INPUT2)
Extract unique kinases from the input CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input2_path
|
str
|
Path to the input CSV file containing kinase information. |
INPUT2
|
Returns: set: A set of unique kinases.
check_kinases()
Check if kinases in input2.csv are present in input1.csv and log the results.
kinopt.local.utils.iodata
format_duration(seconds)
Formats a duration in seconds into a human-readable string. - If less than 60 seconds, returns in seconds. - If less than 3600 seconds, returns in minutes. - If more than 3600 seconds, returns in hours.
:param seconds: :return: Formatted string
load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)
Load and scale the data from the specified input files.
:param estimate_missing: :param scaling_method: :param split_point: :param seg_points: :return: Time series data, interaction data, observed data
apply_scaling(df, cols, method, split_point, seg_points)
Apply scaling to the specified columns of a DataFrame based on the given method. The scaling methods include: - 'min_max': Min-Max scaling - 'log': Logarithmic scaling - 'temporal': Temporal scaling (two segments) - 'segmented': Segmented scaling (multiple segments) - 'slope': Slope scaling - 'cumulative': Cumulative scaling
:param df: :param cols: :param method: :param split_point: :param seg_points: :return: df
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
For each gene folder (e.g. "ABL2"), the report will include: - All PNG plots and interactive HTML plots displayed in a grid with three plots per row. - Each plot is confined to a fixed size of 900px by 900px. - Data tables from XLSX or CSV files in the gene folder are displayed below the plots, one per row.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results_dir
|
str
|
Path to the root results directory. |
required |
output_file
|
str
|
Name of the generated global report file (placed inside results_dir). |
'report.html'
|
organize_output_files(*directories)
Function to organize output files into protein-specific folders. It moves files matching the pattern 'protein_name_*.{json,svg,png,html,csv,xlsx}' into a folder named after the protein (e.g., 'ABL2') and moves all other files into a 'General' folder within the same directory.
:param directories:
kinopt.local.utils.params
extract_parameters(P_initial, gene_kinase_counts, total_alpha, unique_kinases, K_index, optimized_params)
Extracts the alpha and beta parameters from the optimized parameters.
:param P_initial: :param gene_kinase_counts: :param total_alpha: :param unique_kinases: :param K_index: :param optimized_params: :return: Alpha and beta values as dictionaries
compute_metrics(optimized_params, P_init_dense, t_max, gene_alpha_starts, gene_kinase_counts, gene_kinase_idx, total_alpha, kinase_beta_starts, kinase_beta_counts, K_data, K_indices, K_indptr)
Computes the estimated series and various metrics based on the optimized parameters.
:param optimized_params: :param P_init_dense: :param t_max: :param gene_alpha_starts: :param gene_kinase_counts: :param gene_kinase_idx: :param total_alpha: :param kinase_beta_starts: :param kinase_beta_counts: :param K_data: :param K_indices: :param K_indptr: :return: Estimated series, residuals, MSE, RMSE, MAE, MAPE, R-squared
Fitting Analysis & Feasibility
kinopt.fitanalysis.helpers.postfit
goodnessoffit(estimated, observed)
Function to plot the goodness of fit and kullback-leibler divergence for estimated and observed values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
estimated
|
DataFrame
|
DataFrame containing estimated values. |
required |
observed
|
DataFrame
|
DataFrame containing observed values. |
required |
Returns:
| Type | Description |
|---|---|
|
None |
reshape_alpha_beta(alpha_values, beta_values)
Function to reshape alpha and beta values for plotting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
alpha_values
|
DataFrame
|
DataFrame containing alpha values. |
required |
beta_values
|
DataFrame
|
DataFrame containing beta values. |
required |
Returns: pd.DataFrame: Reshaped DataFrame containing both alpha and beta values.
perform_pca(df)
Function to perform PCA analysis on the given DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df
|
DataFrame
|
DataFrame containing the data for PCA analysis. |
required |
Returns:
| Type | Description |
|---|---|
|
pd.DataFrame: DataFrame with PCA results and additional columns for type and gene/psite information. |
plot_pca(result_df_sorted, y_axis_column)
Plot PCA or t-SNE results for each gene/psite. The function creates scatter plots with different markers for alpha and beta parameters, and adds labels for each point. The function also adjusts text labels to avoid overlap using the adjustText library.
:param result_df_sorted: DataFrame containing PCA or t-SNE results. :param y_axis_column: Column name for the y-axis values in the plot.
perform_tsne(scaled_data, df)
Perform t-SNE analysis on the given scaled data. The function returns a DataFrame with t-SNE results and additional columns for type and gene/psite information.
:param scaled_data: :param df:
:return: - pd.DataFrame: DataFrame with t-SNE results and additional columns.
additional_plots(df, scaled_data, alpha_values, beta_values, residuals_df)
Function to create additional plots including CDF, KDE, Boxplot, and Hierarchical Clustering.
:param df: :param scaled_data: :param alpha_values: :param beta_values: :param residuals_df:
create_sankey_from_network(output_dir, data, title)
Creates a Sankey diagram from the given data and saves it as an HTML file.
This function processes the input data to generate nodes and links for a Sankey diagram. It assigns colors to nodes and links based on their attributes and values, and uses Plotly to render the diagram. The resulting diagram is saved as an HTML file in the specified output directory.
:param output_dir: str The directory where the Sankey diagram HTML file will be saved. :param data: pd.DataFrame A DataFrame containing the data for the Sankey diagram. It must include the following columns: - 'Source': The source node of the link. - 'Target': The target node of the link. - 'Value': The value of the link, which determines the flow size. :param title: str The title of the Sankey diagram.
The function performs the following steps: 1. Initializes nodes and links for the Sankey diagram. 2. Maps node labels to indices and assigns colors to nodes. 3. Processes the data to create links between nodes, assigning colors based on link values. 4. Builds the Sankey diagram using Plotly. 5. Adds a color bar to explain the flow gradient. 6. Saves the Sankey diagram as an HTML file in the specified output directory.
important_connections(output_dir, data, top_n=20)
Extracts the top N most important connections based on their absolute values and saves them to a CSV file.
:param output_dir: str The directory where the CSV file will be saved. :param data: pd.DataFrame A DataFrame containing the connections with columns 'Source', 'Target', and 'Value'. :param top_n: int, optional The number of top connections to extract (default is 20).
The function sorts the connections by their absolute values in descending order, selects the top N connections, and saves them to a CSV file named 'top_connections.csv' in the specified output directory.
kinopt.optimality.KKT
generate_latex_table(summary_dict, table_caption, table=None)
Function to generate a LaTeX table from a summary dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
summary_dict
|
dict
|
Dictionary containing summary data. |
required |
table_caption
|
str
|
Caption for the LaTeX table. |
required |
table
|
str
|
Optional existing LaTeX table to append to. |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
str |
LaTeX formatted table as a string. |
print_primal_feasibility_results(primal_summary, alpha_violations, beta_violations, logger_obj=None)
Logs the primal feasibility summary and violation details.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
primal_summary
|
dict
|
Dictionary containing primal feasibility results. |
required |
alpha_violations
|
dict
|
Dictionary containing alpha constraint violations. |
required |
beta_violations
|
dict
|
Dictionary containing beta constraint violations. |
required |
logger_obj
|
Optional logger object to log the information. |
None
|
print_sensitivity_and_active_constraints(sensitivity_summary, active_constraints_summary, logger_obj=None)
Logs the sensitivity summary and active constraints summary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sensitivity_summary
|
dict
|
Dictionary containing sensitivity analysis results. |
required |
active_constraints_summary
|
dict
|
Dictionary containing active constraints summary. |
required |
logger_obj
|
Optional logger object to log the information. |
None
|
plot_constraint_violations(alpha_violations, beta_violations, out_dir)
Function to plot constraint violations for alpha and beta values. It creates a stacked bar plot showing the violations for each protein. The top 5 proteins with the highest violations are highlighted in red.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
alpha_violations
|
Series
|
Series containing alpha constraint violations. |
required |
beta_violations
|
Series
|
Series containing beta constraint violations. |
required |
out_dir
|
str
|
Directory to save the plot. |
required |
plot_sensitivity_analysis(sensitivity_analysis, out_dir)
Function to plot sensitivity analysis results. It creates a horizontal bar plot showing the mean, max, and min sensitivity for each protein.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sensitivity_analysis
|
DataFrame
|
DataFrame containing sensitivity analysis results. |
required |
out_dir
|
str
|
Directory to save the plot. |
required |
Returns:
| Type | Description |
|---|---|
|
None |
process_excel_results(file_path=OUT_FILE)
Function to process the Excel results file. It reads the alpha and beta values, estimated and observed values, validates normalization constraints, computes residuals and gradients, and generates LaTeX tables for the residuals and sensitivity summaries. It also performs sensitivity analysis and identifies high sensitivity sites. The results are returned as a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
file_path
|
str
|
Path to the Excel file containing results. |
OUT_FILE
|
Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.
post_optimization_results()
Function to process and visualize the results of the optimization.
Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.
TF-mRNA Optimization
Evolutionary Algorithms
tfopt.evol.config.constants
parse_args()
tfopt.evol CLI: bounds, loss, optimizer selection. Defaults come from config.toml.
tfopt.evol.config.logconf
tfopt.evol.exporter.plotout
plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)
Plot the estimated vs observed expression levels for a set of genes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
predictions
|
ndarray
|
Predicted expression levels. |
required |
expression_matrix
|
ndarray
|
Observed expression levels. |
required |
gene_ids
|
list
|
List of gene identifiers. |
required |
time_points
|
ndarray
|
Time points for the experiments. |
required |
regulators
|
ndarray
|
Matrix of regulators for each gene. |
required |
tf_protein_matrix
|
ndarray
|
Matrix of TF protein levels. |
required |
tf_ids
|
list
|
List of TF identifiers. |
required |
num_targets
|
int
|
Number of target genes to plot. |
required |
save_path
|
str
|
Directory to save the plots. |
OUT_DIR
|
compute_predictions(x, regulators, protein_mat, psite_tensor, n_reg, T_use, n_mRNA, beta_start_indices, num_psites)
Compute the predicted expression levels based on the optimization variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
ndarray
|
Optimization variables. |
required |
regulators
|
ndarray
|
Matrix of regulators for each gene. |
required |
protein_mat
|
ndarray
|
Matrix of TF protein levels. |
required |
psite_tensor
|
ndarray
|
Tensor of phosphorylation sites. |
required |
n_reg
|
int
|
Number of regulators. |
required |
T_use
|
int
|
Number of time points to use. |
required |
n_mRNA
|
int
|
Number of mRNAs. |
required |
beta_start_indices
|
list
|
List of starting indices for beta parameters. |
required |
num_psites
|
list
|
List of number of phosphorylation sites for each TF. |
required |
tfopt.evol.exporter.sheetutils
save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)
Save the optimization results to an Excel file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_ids
|
list
|
List of gene identifiers. |
required |
tf_ids
|
list
|
List of TF identifiers. |
required |
final_alpha
|
ndarray
|
Final alpha values. |
required |
final_beta
|
ndarray
|
Final beta values. |
required |
psite_labels_arr
|
list
|
List of phosphorylation site labels. |
required |
expression_matrix
|
ndarray
|
Observed expression levels. |
required |
predictions
|
ndarray
|
Predicted expression levels. |
required |
objective_value
|
float
|
Objective value from optimization. |
required |
reg_map
|
dict
|
Mapping of genes to regulators. |
required |
filename
|
str
|
Path to the output Excel file. |
OUT_FILE
|
tfopt.evol.objfn.minfn
TFOptimizationMultiObjectiveProblem
Bases: Problem
Represents a multi-objective optimization problem specific to transcription factor (TF) and mRNA synthesis dynamics.
This class is an extension of the Problem class and is designed to model complex biological
processes by incorporating various dynamic parameters like regulators, protein matrices,
psite tensors, and associated configurations. It supports parallel evaluation for
multi-thread usage, optimizing performance for large populations.
Attributes:
| Name | Type | Description |
|---|---|---|
n_mRNA |
int
|
Number of mRNA species in the system. |
n_TF |
int
|
Number of transcription factor species in the system. |
n_reg |
int
|
Number of regulators. |
n_psite_max |
int
|
Maximum number of potential p-sites. |
n_alpha |
int
|
Number of alpha parameters used in modeling. |
T_use |
int
|
Number of time units or steps to use in the evaluation. |
mRNA_mat |
ndarray
|
A matrix representing the mRNA dynamics. |
regulators |
ndarray
|
Array of regulator IDs associated with mRNA and TF interactions. |
protein_mat |
ndarray
|
A matrix representing the protein synthesis rates or patterns. |
psite_tensor |
ndarray
|
A tensor indicating probabilistic binding sites of proteins. |
beta_start_indices |
ndarray
|
Array indicating the starting indices of beta coefficients. |
num_psites |
ndarray
|
Array indicating the number of p-sites per transcription factor. |
no_psite_tf |
ndarray
|
Boolean array indicating TFs with zero p-sites. |
loss_type |
int
|
Configurable loss function type for the optimization. Defaults to 0. |
lam1 |
float
|
First regularization parameter for loss calculation. Defaults to 1e-3. |
lam2 |
float
|
Second regularization parameter for loss calculation. Defaults to 1e-3. |
max_threads |
int
|
Maximum number of threads to use for evaluation. 0 indicates automatic thread selection based on system capacity. |
__init__(n_var: int, n_mRNA: int, n_TF: int, n_reg: int, n_psite_max: int, n_alpha: int, mRNA_mat: np.ndarray, regulators: np.ndarray, protein_mat: np.ndarray, psite_tensor: np.ndarray, T_use: int, beta_start_indices: np.ndarray, num_psites: np.ndarray, no_psite_tf: np.ndarray, xl: Optional[np.ndarray] = None, xu: Optional[np.ndarray] = None, **kwargs)
Initializes the class with various parameters required for computational evaluation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_var
|
int
|
The number of variables. |
required |
n_mRNA
|
int
|
The number of mRNA molecules. |
required |
n_TF
|
int
|
The number of transcription factors (TFs). |
required |
n_reg
|
int
|
The number of regulators. |
required |
n_psite_max
|
int
|
The maximum number of p-sites. |
required |
n_alpha
|
int
|
The number of alpha coefficients. |
required |
mRNA_mat
|
ndarray
|
Matrix representing the mRNA data. |
required |
regulators
|
ndarray
|
Array representing the regulator mappings. |
required |
protein_mat
|
ndarray
|
Matrix representing the protein data. |
required |
psite_tensor
|
ndarray
|
Tensor representing the p-site data. |
required |
T_use
|
int
|
The time step or usage parameter. |
required |
beta_start_indices
|
ndarray
|
Array of start indices for beta calculations. |
required |
num_psites
|
ndarray
|
Array representing the number of p-sites. |
required |
no_psite_tf
|
ndarray
|
Boolean array indicating TFs with no associated p-sites. |
required |
xl
|
Optional[ndarray]
|
Optional lower-bound array for the variables. |
None
|
xu
|
Optional[ndarray]
|
Optional upper-bound array for the variables. |
None
|
**kwargs
|
Additional optional arguments such as "loss_type", "lam1", "lam2", and "threads". |
{}
|
tfopt.evol.opt.optrun
run_optimization(problem, total_dim, optimizer)
Execute multi-objective optimization using the specified algorithm.
This function configures and runs one of three multi-objective evolutionary algorithms (UNSGA3, SMSEMOA, or AGEMOEA) on the provided optimization problem. The algorithm is configured with appropriate genetic operators (two-point crossover and polynomial mutation) and terminated after 1000 generations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
problem
|
Problem
|
The pymoo Problem instance defining the optimization problem, including objectives, constraints, and variable bounds. |
required |
total_dim
|
int
|
Total number of decision variables (dimensions) in the optimization problem. Used to determine population size and mutation probability. |
required |
optimizer
|
int
|
Selector for the optimization algorithm: - 0: UNSGA3 (Unified NSGA-III) - Reference direction-based algorithm - 1: SMSEMOA - S-Metric Selection Evolutionary Multi-objective Algorithm - 2: AGEMOEA - Adaptive Geometry Estimation-based Multi-objective Evolutionary Algorithm |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Result |
A pymoo Result object containing the optimization outcomes, including: - X: Decision variables of the Pareto-optimal solutions - F: Objective function values of the Pareto-optimal solutions - algorithm: The algorithm instance used - Additional statistics and convergence information |
Notes
- Population size is set to 2 * total_dim (or larger for UNSGA3 if needed)
- Crossover probability: 0.9
- Mutation probability: 1.0 / total_dim
- Mutation distribution index (eta): 20
- Termination: Fixed at 1000 generations
- Random seed: 1 (for reproducibility)
- Duplicate elimination is enabled for all algorithms
- UNSGA3 automatically adjusts population size to match reference directions
tfopt.evol.optcon.construct
build_fixed_arrays(mRNA_ids, mRNA_mat, TF_ids, protein_dict, psite_dict, psite_labels_dict, reg_map)
Builds fixed-shape arrays from the input data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mRNA_ids
|
list
|
List of mRNA identifiers. |
required |
mRNA_mat
|
ndarray
|
Matrix of mRNA expression levels. |
required |
TF_ids
|
list
|
List of TF identifiers. |
required |
protein_dict
|
dict
|
Dictionary mapping TFs to their protein levels. |
required |
psite_dict
|
dict
|
Dictionary mapping TFs to their phosphorylation sites. |
required |
psite_labels_dict
|
dict
|
Dictionary mapping TFs to their phosphorylation site labels. |
required |
reg_map
|
dict
|
Mapping of genes to their regulators. |
required |
Returns: mRNA_mat (np.ndarray): Matrix of mRNA expression levels. regulators (np.ndarray): Matrix of regulators for each mRNA. protein_mat (np.ndarray): Matrix of TF protein levels. psite_tensor (np.ndarray): Tensor of phosphorylation sites. n_reg (int): Number of regulators. n_psite_max (int): Maximum number of phosphorylation sites across all TFs. psite_labels_arr (list): List of phosphorylation site labels for each TF. num_psites (np.ndarray): Array indicating the number of phosphorylation sites for each TF.
tfopt.evol.optcon.filter
load_raw_data()
Load raw data from files.
Returns:
| Name | Type | Description |
|---|---|---|
mRNA_ids |
List of mRNA gene identifiers. |
|
mRNA_mat |
Matrix of mRNA expression data. |
|
mRNA_time_cols |
Time points for mRNA data. |
|
TF_ids |
List of transcription factor identifiers. |
|
protein_dict |
Dictionary mapping TF_ids to their protein data. |
|
psite_dict |
Dictionary mapping TF_ids to their phosphorylation site data. |
|
psite_labels_dict |
Dictionary mapping TF_ids to their phosphorylation site labels. |
|
TF_time_cols |
Time points for TF data. |
|
reg_map |
Regulation map, mapping mRNA genes to their regulators. |
filter_mrna(mRNA_ids, mRNA_mat, reg_map)
Filter mRNA genes to only those with regulators present in the regulation map.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mRNA_ids
|
list
|
List of mRNA gene identifiers. |
required |
mRNA_mat
|
ndarray
|
Matrix of mRNA expression data. |
required |
reg_map
|
dict
|
Regulation map, mapping mRNA genes to their regulators. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
filtered_mRNA_ids |
list
|
List of filtered mRNA gene identifiers. |
filtered_mRNA_mat |
ndarray
|
Matrix of filtered mRNA expression data. |
update_regulations(mRNA_ids, reg_map, TF_ids)
Update the regulation map to only include relevant transcription factors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mRNA_ids
|
list
|
List of mRNA gene identifiers. |
required |
reg_map
|
dict
|
Regulation map, mapping mRNA genes to their regulators. |
required |
TF_ids
|
list
|
List of transcription factor identifiers. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
relevant_TFs |
set
|
Set of relevant transcription factors. |
filter_TF(TF_ids, protein_dict, psite_dict, psite_labels_dict, relevant_TFs)
Filter transcription factors to only those present in the relevant_TFs set.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
TF_ids
|
list
|
List of transcription factor identifiers. |
required |
protein_dict
|
dict
|
Dictionary mapping TF_ids to their protein data. |
required |
psite_dict
|
dict
|
Dictionary mapping TF_ids to their phosphorylation site data. |
required |
psite_labels_dict
|
dict
|
Dictionary mapping TF_ids to their phosphorylation site labels. |
required |
relevant_TFs
|
set
|
Set of relevant transcription factors. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
TF_ids_filtered |
list
|
List of filtered transcription factor identifiers. |
protein_dict |
dict
|
Filtered dictionary mapping TF_ids to their protein data. |
psite_dict |
dict
|
Filtered dictionary mapping TF_ids to their phosphorylation site data. |
psite_labels_dict |
dict
|
Filtered dictionary mapping TF_ids to their phosphorylation site labels. |
determine_T_use(mRNA_mat, TF_time_cols)
Determine the number of time points to use for the analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mRNA_mat
|
ndarray
|
Matrix of mRNA expression data. |
required |
TF_time_cols
|
list
|
Time points for TF data. |
required |
tfopt.evol.utils.iodata
load_mRNA_data(filename=INPUT3)
Load mRNA data from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
Path to the CSV file containing mRNA data. |
INPUT3
|
Returns: - mRNA_ids: List of mRNA gene identifiers (strings). - mRNA_mat: Matrix of mRNA expression data (numpy array). - time_cols: List of time columns (excluding "GeneID").
load_TF_data(filename=INPUT1)
Load TF data from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
Path to the CSV file containing TF data. |
INPUT1
|
Returns: - TF_ids: List of TF identifiers (strings). - protein_dict: Dictionary mapping TF identifiers to their protein data (numpy array). - psite_dict: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays). - psite_labels_dict: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings). - time_cols: List of time columns (excluding "GeneID" and "Psite").
load_regulation(filename=INPUT4)
Load regulation data from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
Path to the CSV file containing regulation data. |
INPUT4
|
Returns: - reg_map: Dictionary mapping mRNA genes to their regulators (list of TF identifiers).
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results_dir
|
str
|
Path to the directory containing gene folders. |
required |
organize_output_files(*directories)
Organizes output files from multiple directories into separate folders for each protein.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
directories
|
str
|
List of directories to organize. |
()
|
format_duration(seconds)
Format a duration in seconds into a human-readable string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
seconds
|
float
|
Duration in seconds. |
required |
Returns: str: Formatted duration string.
tfopt.evol.utils.params
create_no_psite_array(n_TF, num_psites, psite_labels_arr)
Create an array indicating whether each TF has no phosphorylation sites.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_TF
|
int
|
Number of transcription factors. |
required |
num_psites
|
list
|
List of number of phosphorylation sites for each TF. |
required |
psite_labels_arr
|
list
|
List of phosphorylation site labels for each TF. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
no_psite_tf |
ndarray
|
Array indicating whether each TF has no phosphorylation sites. |
compute_beta_indices(num_psites, n_TF)
Compute the starting indices for the beta parameters for each TF.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
list
|
List of number of phosphorylation sites for each TF. |
required |
n_TF
|
int
|
Number of transcription factors. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
beta_start_indices |
ndarray
|
Array of starting indices for the beta parameters. |
cum |
int
|
Total number of beta parameters. |
create_initial_guess(n_mRNA, n_reg, n_TF, num_psites, no_psite_tf)
Create the initial guess for the optimization variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_mRNA
|
int
|
Number of mRNAs. |
required |
n_reg
|
int
|
Number of regulators. |
required |
n_TF
|
int
|
Number of transcription factors. |
required |
num_psites
|
list
|
List of number of phosphorylation sites for each TF. |
required |
no_psite_tf
|
ndarray
|
Array indicating whether each TF has no phosphorylation sites. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
x0 |
ndarray
|
Initial guess for the optimization variables. |
n_alpha |
int
|
Number of alpha parameters. |
create_bounds(n_alpha, n_beta_total, lb, ub)
Create the lower and upper bounds for the optimization variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_alpha
|
int
|
Number of alpha parameters. |
required |
n_beta_total
|
int
|
Total number of beta parameters. |
required |
lb
|
float
|
Lower bound for the optimization variables. |
required |
ub
|
float
|
Upper bound for the optimization variables. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
xl |
ndarray
|
Lower bounds for the optimization variables. |
xu |
ndarray
|
Upper bounds for the optimization variables. |
get_parallel_runner()
Get a parallel runner for multi-threading.
Returns:
| Name | Type | Description |
|---|---|---|
runner |
Parallelization runner. |
|
pool |
ThreadPool instance for parallel execution. |
extract_best_solution(res, n_alpha, n_mRNA, n_reg, n_TF, num_psites, beta_start_indices)
Extract the best solution from the optimization results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
res
|
Optimization results. |
required | |
n_alpha
|
int
|
Number of alpha parameters. |
required |
n_mRNA
|
int
|
Number of mRNAs. |
required |
n_reg
|
int
|
Number of regulators. |
required |
n_TF
|
int
|
Number of transcription factors. |
required |
num_psites
|
list
|
List of number of phosphorylation sites for each TF. |
required |
beta_start_indices
|
ndarray
|
Array of starting indices for the beta parameters. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
final_alpha |
ndarray
|
Final alpha parameters. |
final_beta |
ndarray
|
Final beta parameters. |
best_objectives |
ndarray
|
Best objectives from the Pareto front. |
final_x |
ndarray
|
Final optimization variables. |
print_alpha_mapping(mRNA_ids, reg_map, TF_ids, final_alpha)
Print the mapping of transcription factors (TFs) to mRNAs with their corresponding alpha values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mRNA_ids
|
list
|
List of mRNA identifiers. |
required |
reg_map
|
dict
|
Mapping of genes to their regulators. |
required |
TF_ids
|
list
|
List of TF identifiers. |
required |
final_alpha
|
ndarray
|
Final alpha parameters (mRNA x TF). |
required |
print_beta_mapping(TF_ids, final_beta, psite_labels_arr)
Print the mapping of transcription factors (TFs) to their beta parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
TF_ids
|
list
|
List of TF identifiers. |
required |
final_beta
|
ndarray
|
Final beta parameters (TF x β). |
required |
psite_labels_arr
|
list
|
List of phosphorylation site labels for each TF. |
required |
Gradient-Based Algorithms
tfopt.local.config.constants
parse_args()
tfopt.local CLI: bounds and loss selection. Defaults come from config.toml.
tfopt.local.config.logconf
tfopt.local.exporter.plotout
plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)
Plots the estimated vs observed values for a given set of genes and their corresponding TFs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
predictions
|
ndarray
|
Predicted expression levels. |
required |
expression_matrix
|
ndarray
|
Observed expression levels. |
required |
gene_ids
|
list
|
List of gene identifiers. |
required |
time_points
|
ndarray
|
Time points for the experiments. |
required |
regulators
|
ndarray
|
Matrix of regulators for each gene. |
required |
tf_protein_matrix
|
ndarray
|
Matrix of TF protein levels. |
required |
tf_ids
|
list
|
List of TF identifiers. |
required |
num_targets
|
int
|
Number of target genes to plot. |
required |
save_path
|
str
|
Directory to save the plots. |
OUT_DIR
|
plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)
Creates a scatter plot visualizing multi-start optimization results with runtime overlay.
This function reads a CSV summary of multiple optimization runs and generates a scatter plot showing the relationship between run rank and final objective value, with runtime (or iterations) represented as color intensity. Successful and feasible runs are emphasized while unsuccessful or infeasible runs are shown with reduced opacity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
summary_csv
|
str or Path
|
Path to the CSV file containing multi-start optimization results. |
required |
out_path
|
str or Path
|
Path to save the output figure. If None, figure is not saved. Defaults to None. |
None
|
figsize
|
tuple
|
Figure size as (width, height) in inches. Defaults to (8, 8). |
(8, 8)
|
x_col
|
str
|
Column name for x-axis (run rank). If missing, will be created from y_col ranking. Defaults to "rank". |
'rank'
|
y_col
|
str
|
Column name for y-axis (final objective value). Defaults to "fun". |
'fun'
|
c_col
|
str
|
Column name for color mapping (typically runtime). Falls back to "nit" (iterations) if not found. Defaults to "runtime_s". |
'runtime_s'
|
success_col
|
str
|
Column name indicating optimization success status. Defaults to "success". |
'success'
|
cv_col
|
str
|
Column name for constraint violation values. Falls back to common alternatives if not found. Defaults to "constr_violation". |
'constr_violation'
|
annotate_best
|
bool
|
Whether to annotate the best (rank 1) point on the plot. Defaults to True. |
True
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
A tuple containing: - fig (matplotlib.figure.Figure): The generated figure object. - ax (matplotlib.axes.Axes): The axes object of the plot. - df (pd.DataFrame): The processed DataFrame with sorted results. |
Notes
- Points are considered feasible if constraint violation <= 1e-8
- Infeasible or unsuccessful runs are plotted with reduced opacity (0.25)
- If rank column is missing, it's automatically generated from objective values
- If runtime column is missing, falls back to iteration count or constant color
tfopt.local.exporter.sheetutils
save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)
Save the optimization results to an Excel file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_ids
|
list
|
List of gene identifiers. |
required |
tf_ids
|
list
|
List of TF identifiers. |
required |
final_alpha
|
ndarray
|
Final alpha values. |
required |
final_beta
|
ndarray
|
Final beta values. |
required |
psite_labels_arr
|
list
|
List of phosphorylation site labels. |
required |
expression_matrix
|
ndarray
|
Observed expression levels. |
required |
predictions
|
ndarray
|
Predicted expression levels. |
required |
objective_value
|
float
|
Objective value from optimization. |
required |
reg_map
|
dict
|
Mapping of genes to regulators. |
required |
filename
|
str
|
Path to the output Excel file. |
OUT_FILE
|
export_multistart_results(results)
Export multiple multistart optimization results to an Excel file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results
|
list
|
List of optimization results, each containing attributes like 'start_id', 'fun', 'success', etc. |
required |
Returns:
| Type | Description |
|---|---|
|
None |
save_multistart_solutions_npz(all_results, out_path)
Saves multistart optimization solutions to a compressed .npz file format.
This function aggregates optimization results into a structured format and saves them in a compressed NumPy .npz file. It processes the solutions, extracting relevant attributes such as optimization variables, function values, success status, and starting IDs, before saving them for later use.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
all_results
|
list
A list of optimization result objects. Each result object must have
the attributes |
required | |
out_path
|
str or Path The file path where the compressed .npz file will be saved. The path will be converted into a pathlib Path object if it is not already one. |
required |
tfopt.local.objfn.minfn
objective_(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type, lam1=1e-06, lam2=1e-06)
Originally implemented by Julius Normann.
This version has been modified and optimized for consistency & speed in submodules by Abhinav Mishra.
Computes a loss value using one of several loss functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Decision vector. |
required | |
expression_matrix
|
(n_genes x T_use) measured gene expression values. |
required | |
regulators
|
(n_genes x n_reg) indices of TF regulators for each gene. |
required | |
tf_protein_matrix
|
(n_TF x T_use) TF protein time series. |
required | |
psite_tensor
|
(n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros). |
required | |
n_reg
|
Maximum number of regulators per gene. |
required | |
T_use
|
Number of time points used. |
required | |
n_genes, n_TF
|
Number of genes and TF respectively. |
required | |
beta_start_indices
|
Integer array giving the starting index (in the β–segment) for each TF. |
required | |
num_psites
|
Integer array with the actual number of PSites for each TF. |
required | |
loss_type
|
Integer indicating the loss type (0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov). |
required |
Returns:
| Name | Type | Description |
|---|---|---|
loss |
The computed loss (a scalar). |
compute_predictions(x, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites)
Computes the predicted expression matrix based on the decision vector x.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Decision vector. |
required | |
regulators
|
(n_genes x n_reg) indices of TF regulators for each gene. |
required | |
tf_protein_matrix
|
(n_TF x T_use) TF protein time series. |
required | |
psite_tensor
|
(n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros). |
required | |
n_reg
|
Maximum number of regulators per gene. |
required | |
T_use
|
Number of time points used. |
required | |
n_genes
|
Number of genes. |
required | |
beta_start_indices
|
Integer array giving the starting index (in the β–segment) for each TF. |
required | |
num_psites
|
Integer array with the actual number of PSites for each TF. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
predictions |
(n_genes x T_use) predicted gene expression values. |
objective_wrapper(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)
Wrapper function for the objective function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Decision vector. |
required | |
expression_matrix
|
(n_genes x T_use) measured gene expression values. |
required | |
regulators
|
(n_genes x n_reg) indices of TF regulators for each gene. |
required | |
tf_protein_matrix
|
(n_TF x T_use) TF protein time series. |
required | |
psite_tensor
|
(n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros). |
required | |
n_reg
|
Maximum number of regulators per gene. |
required | |
T_use
|
Number of time points used. |
required | |
n_genes
|
Number of genes. |
required | |
beta_start_indices
|
Integer array giving the starting index (in the β–segment) for each TF. |
required | |
num_psites
|
Integer array with the actual number of PSites for each TF. |
required | |
loss_type
|
Integer indicating the loss type. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
loss |
The computed loss (a scalar). |
tfopt.local.opt.optrun
run_optimizer(x0, bounds, lin_cons, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)
Runs the optimization algorithm to minimize the objective function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x0
|
Initial guess for the optimization variables. |
required | |
bounds
|
Bounds for the optimization variables. |
required | |
lin_cons
|
Linear constraints for the optimization problem. |
required | |
expression_matrix
|
(n_genes x T_use) measured gene expression values. |
required | |
regulators
|
(n_genes x n_reg) indices of TF regulators for each gene. |
required | |
tf_protein_matrix
|
(n_TF x T_use) TF protein time series. |
required | |
psite_tensor
|
(n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros). |
required | |
n_reg
|
Maximum number of regulators per gene. |
required | |
T_use
|
Number of time points used. |
required | |
n_genes, n_TF
|
Number of genes and TF respectively. |
required | |
beta_start_indices
|
Integer array giving the starting index (in the β–segment) for each TF. |
required | |
num_psites
|
Integer array with the actual number of PSites for each TF. |
required | |
loss_type
|
Type of loss function to use. |
required |
Returns: result : Result of the optimization process, including the optimized parameters and objective value.
generate_multistart_x0(x0: np.ndarray, bounds: Sequence[Tuple[float, float]], n_starts: int, seed: int = 0, jitter_frac: float = 0.05, p_random: float = 0.3) -> List[np.ndarray]
Generates multiple starting points for multi-start optimization.
Builds a diverse list of starting points
- mostly: jitter around baseline x0
- some: fully random within bounds
jitter_frac is relative to (ub - lb). p_random is fraction of starts that are random-in-bounds.
Returns:
| Type | Description |
|---|---|
List[ndarray]
|
List[np.ndarray]: A list of starting points for optimization. |
run_optimizer_multistart(x0: np.ndarray, bounds, lin_cons, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type, run_optimizer_func, cfg: Optional[MultiStartConfig] = None, polish: bool = True)
Executes a multistart optimization loop with parallelization and optional polishing to find the best solution across multiple starting points. The function leverages a parallel approach for running multiple optimizations, selects the best result based on predefined sorting criteria, and optionally refines it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x0
|
ndarray
|
Initial guess for the optimization variables. |
required |
bounds
|
Bounds for the optimization variables, typically a sequence of (min, max) pairs. |
required | |
lin_cons
|
Linear constraints for the optimizer, defined as per specific optimizer requirements. |
required | |
expression_matrix
|
Input gene expression data utilized in the optimization process. |
required | |
regulators
|
Regulatory inputs or factors influencing the optimization process. |
required | |
tf_protein_matrix
|
Matrix representing transcription factor proteins relevant to the process. |
required | |
psite_tensor
|
Tensor containing phosphorylation site data used in the computations. |
required | |
n_reg
|
Number of regulators involved in the optimization. |
required | |
T_use
|
Specific configuration parameter determining time or iteration usage. |
required | |
n_genes
|
Number of genes considered within the problem scope. |
required | |
beta_start_indices
|
Indices indicating the start positions of beta parameters in the optimization. |
required | |
num_psites
|
Total number of phosphorylation sites accounted for in optimization. |
required | |
loss_type
|
Type of loss function used for evaluating optimization performance. |
required | |
run_optimizer_func
|
Optimization function to be executed for each starting point. |
required | |
cfg
|
Optional[MultiStartConfig]
|
Configuration object specifying multistart parameters such as number of starts, parallelization settings, and randomness. |
None
|
polish
|
bool
|
Indicates whether to perform a final optimization run initialized at the best solution. Defaults to True. |
True
|
Returns:
| Name | Type | Description |
|---|---|---|
Tuple |
A tuple containing: - The best result as determined by sorting criteria. - A list of sorted optimization results from all starting points. |
tfopt.local.optcon.construct
build_fixed_arrays(gene_ids, expression_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, reg_map)
Builds fixed-shape arrays from the input data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
- gene_ids
|
list of mRNA identifiers. |
required | |
- expression_matrix
|
array of shape (n_genes, T) with mRNA expression levels. |
required | |
- tf_ids
|
list of TF identifiers. |
required | |
- tf_protein
|
dict mapping TFs to their protein levels. |
required | |
- tf_psite_data
|
dict mapping TFs to their phosphorylation sites. |
required | |
- tf_psite_labels
|
dict mapping TFs to their phosphorylation site labels. |
required | |
- reg_map
|
mapping of genes to their regulators (TFs). |
required |
Returns:
| Type | Description |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
constraint_alpha_func(x, n_genes, n_reg)
For each gene, the sum of its alpha parameters must equal 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
ndarray
|
Decision vector. |
required |
n_genes
|
int
|
Number of genes. |
required |
n_reg
|
int
|
Number of regulators. |
required |
Returns:
| Type | Description |
|---|---|
|
np.ndarray: Array of constraints. |
constraint_beta_func(x, n_alpha, n_TF, beta_start_indices, num_psites, no_psite_tf)
For each TF, the sum of its beta parameters must equal 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
ndarray
|
Decision vector. |
required |
n_alpha
|
int
|
Number of alpha parameters. |
required |
n_TF
|
int
|
Number of transcription factors. |
required |
beta_start_indices
|
list
|
List of starting indices for beta parameters. |
required |
num_psites
|
list
|
List of number of phosphorylation sites for each TF. |
required |
no_psite_tf
|
list
|
List indicating if a TF has no phosphorylation site. |
required |
Returns:
| Type | Description |
|---|---|
|
np.ndarray: Array of constraints. |
build_linear_constraints(n_genes, n_TF, n_reg, n_alpha, beta_start_indices, num_psites, no_psite_tf)
Build linear constraints for the transcription factor optimization problem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_genes
|
int
|
Number of genes. |
required |
n_TF
|
int
|
Number of transcription factors. |
required |
n_reg
|
int
|
Number of regulators. |
required |
n_alpha
|
int
|
Number of alpha parameters. |
required |
beta_start_indices
|
list
|
List of starting indices for beta parameters. |
required |
num_psites
|
list
|
List of number of phosphorylation sites for each TF. |
required |
no_psite_tf
|
list
|
List indicating if a TF has no phosphorylation site. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
List of linear constraints. |
tfopt.local.optcon.filter
load_and_filter_data()
Load and filter data for the optimization problem.
Returns:
| Type | Description |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
prepare_data(gene_ids, expr_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, tf_time_cols, reg_map)
Prepares the data for optimization by filtering the expression matrix to match the number of time points and building fixed arrays.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_ids
|
list
|
List of gene IDs. |
required |
expr_matrix
|
ndarray
|
Gene expression matrix. |
required |
tf_ids
|
list
|
List of transcription factor IDs. |
required |
tf_protein
|
dict
|
Dictionary mapping TF IDs to their protein data. |
required |
tf_psite_data
|
dict
|
Dictionary mapping TF IDs to their phosphorylation site data. |
required |
tf_psite_labels
|
dict
|
Dictionary mapping TF IDs to their phosphorylation site labels. |
required |
tf_time_cols
|
list
|
Time columns for TF data. |
required |
reg_map
|
dict
|
Regulation map, mapping gene IDs to their regulators. |
required |
Returns: fixed_arrays (tuple): Tuple containing the fixed arrays: - expression_matrix: array of shape (n_genes, T) - regulators: array of shape (n_genes, n_reg) with indices into tf_ids. - tf_protein_matrix: array of shape (n_TF, T) - psite_tensor: array of shape (n_TF, n_psite_max, T), padded with zeros. - n_reg: maximum number of regulators per gene. - n_psite_max: maximum number of PSites among TFs. - psite_labels_arr: list (length n_TF) of lists of PSite names (padded with empty strings). - num_psites: array of length n_TF with the actual number of PSites for each TF. T_use (int): Number of time points used in the expression matrix.
tfopt.local.utils.iodata
min_max_normalize(df, custom_max=None)
Row-wise (per-sample) min-max normalize time-series columns starting with 'x'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df
|
DataFrame
|
Input DataFrame with time-series columns (x1-xN). |
required |
custom_max
|
float
|
If given, used as max for all rows. |
None
|
Returns:
| Type | Description |
|---|---|
|
pd.DataFrame: Normalized DataFrame with same shape. |
load_expression_data(filename=INPUT3)
Loads gene expression (mRNA) data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
Path to the CSV file containing mRNA data. |
INPUT3
|
Returns:
| Type | Description |
|---|---|
|
|
|
|
|
load_tf_protein_data(filename=INPUT1)
Loads TF protein data along with PSite information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
Path to the CSV file containing TF protein data. |
INPUT1
|
Returns: - tf_ids: List of TF identifiers (strings). - tf_protein: Dictionary mapping TF identifiers to their protein data (numpy array). - tf_psite_data: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays). - tf_psite_labels: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings). - time_cols: List of time columns (excluding "GeneID" and "Psite").
load_regulation(filename=INPUT4)
Returns a mapping from gene (source) to a list of TFs (targets).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
Path to the CSV file containing regulation data. |
INPUT4
|
Returns: - reg_map: Dictionary mapping gene identifiers to lists of TF identifiers.
summarize_stats(input3=INPUT3, input1=INPUT1, input4=INPUT4)
Summarizes statistics for the expression data (input3) and TF protein data (input1).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input3
|
str
|
Path to the expression data CSV file. |
INPUT3
|
input1
|
str
|
Path to the TF protein data CSV file. |
INPUT1
|
input4
|
str
|
Path to the mapping file CSV. |
INPUT4
|
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results_dir
|
str
|
Path to the root results directory. |
required |
output_file
|
str
|
Name of the generated global report file (placed inside results_dir). |
'report.html'
|
organize_output_files(*directories)
Function to organize output files into protein-specific folders.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
directories
|
str
|
List of directories to organize. |
()
|
tfopt.local.utils.params
get_optimization_parameters(expression_matrix, tf_protein_matrix, n_reg, T_use, psite_labels_arr, num_psites, lb, ub)
Prepare the optimization parameters for the optimization problem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
expression_matrix
|
ndarray
|
Gene expression matrix. |
required |
tf_protein_matrix
|
ndarray
|
TF protein matrix. |
required |
n_reg
|
int
|
Number of regulators. |
required |
T_use
|
int
|
Number of time points to use. |
required |
psite_labels_arr
|
list
|
List of phosphorylation site labels for each TF. |
required |
num_psites
|
ndarray
|
Array containing the number of phosphorylation sites for each TF. |
required |
lb
|
float
|
Lower bound for beta parameters. |
required |
ub
|
float
|
Upper bound for beta parameters. |
required |
Returns: x0 (np.ndarray): Initial guess for the optimization variables. n_alpha (int): Number of alpha parameters. beta_start_indices (np.ndarray): Starting indices for beta parameters. bounds (list): List of bounds for the optimization variables. no_psite_tf (np.ndarray): Array indicating whether each TF has no phosphorylation sites. n_genes (int): Number of genes. n_TF (int): Number of transcription factors.
postprocess_results(result, n_alpha, n_genes, n_reg, beta_start_indices, num_psites, reg_map, gene_ids, tf_ids, psite_labels_arr)
Post-process the optimization results to extract the final alpha and beta parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
OptimizeResult
|
The result of the optimization. |
required |
n_alpha
|
int
|
Number of alpha parameters. |
required |
n_genes
|
int
|
Number of genes. |
required |
n_reg
|
int
|
Number of regulators. |
required |
beta_start_indices
|
ndarray
|
Starting indices for beta parameters. |
required |
num_psites
|
ndarray
|
Array containing the number of phosphorylation sites for each TF. |
required |
reg_map
|
dict
|
Regulation map, mapping gene IDs to their regulators. |
required |
gene_ids
|
list
|
List of gene IDs. |
required |
tf_ids
|
list
|
List of transcription factor IDs. |
required |
psite_labels_arr
|
list
|
List of lists containing phosphorylation site labels. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
final_x |
ndarray
|
Final optimization result. |
final_alpha |
ndarray
|
Final alpha parameters reshaped into a matrix. |
final_beta |
ndarray
|
Final beta parameters reshaped into a matrix. |
Fitting Analysis
tfopt.fitanalysis.helper
Plotter
A class to plot various analysis results from an Excel file.
__init__(filepath, savepath)
Initializes the Plotter instance by loading data from the Excel file. Args: filepath (str): Path to the Excel file containing analysis results. savepath (str): Directory where the plots will be saved.
load_data()
Loads data from the specified Excel file. Args: filepath (str): Path to the Excel file. savepath (str): Directory where the plots will be saved.
plot_alpha_distribution()
Plots the distribution of alpha parameter values grouped by transcription factors (TFs) using a strip plot.
plot_beta_barplots()
Processes the beta values DataFrame and creates a separate bar plot for each unique transcription factor (TF).
plot_heatmap_abs_residuals()
Plots a heatmap of the absolute values of the residuals.
plot_goodness_of_fit()
Creates a scatter plot comparing observed vs. estimated values, fits a linear regression model, plots the 95% confidence interval, and labels points outside the confidence interval.
plot_kld()
Plots the Kullback-Leibler Divergence (KLD) for each mRNA. The KLD is calculated between the observed and estimated distributions of the mRNA expression levels.
plot_pca()
Plots a PCA (Principal Component Analysis) of the observed and estimated values.
plot_boxplot_alpha()
Plots a boxplot of the alpha values.
plot_boxplot_beta()
Plots a boxplot of the beta values.
plot_cdf_alpha()
Plots the cumulative distribution function (CDF) of the alpha values.
plot_cdf_beta()
Plots the cumulative distribution function (CDF) of the beta values.
plot_time_wise_residuals()
Plots the residuals over time for each mRNA.
ODE Modelling & Parameter Estimation
Configuration
config.cli
Command‑line entry point for the phoskintime pipeline.
Usage
Come one level up from the package root, it should be the working directory
(where you can see the project directory).
run everything with the default (local) solver
python phoskintime all
run only preprocessing
python phoskintime prep
run tfopt with local flavour
python phoskintime tfopt --mode local
run tfopt with evol flavour
python phoskintime tfopt --mode evol
run kinopt with local flavour
python phoskintime kinopt --mode local
run kinopt with evol flavour
python phoskintime kinopt --mode evol
run the model
python phoskintime model
run the integrated Global Model
python phoskintime networkmodel
prep()
Preprocess data (processing.cleanup).
tfopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))
Transcription-Factor-mRNA Optimisation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mode
|
str
|
local | evol |
Option('local', help='local | evol')
|
conf
|
Path | None
|
Path to TOML/YAML config. Uses defaults if omitted. |
Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.')
|
Returns: None
kinopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))
Kinase-Phosphorylation Optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mode
|
str
|
local | evol |
Option('local', help='local | evol')
|
conf
|
Path | None
|
Path to TOML/YAML config. Uses defaults if omitted. |
Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.')
|
Returns: None
model(conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to model config file. Uses defaults if omitted.'))
Run the model (runner.main).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
conf
|
Path | None
|
Path to model config file. Uses defaults if omitted. |
Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to model config file. Uses defaults if omitted.')
|
Returns: None
networkmodel(conf: Path | None = typer.Option('config.toml', '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to global model config file. Uses config.toml by default.'))
Run the integrated Global Model (networkmodel.runner).
This runs the unified optimization pipeline defined in the [networkmodel] section of your configuration.
clean()
Remove all pycache, .pyc, .nbc, and build artifacts recursively.
all(tf_mode: str = typer.Option('local', help='tfopt mode: local | evol'), kin_mode: str = typer.Option('local', help='kinopt mode: local | evol'), tf_conf: Path | None = typer.Option(None, help='tfopt config file'), kin_conf: Path | None = typer.Option(None, help='kinopt config file'), model_conf: Path | None = typer.Option(None, help='model config file'))
Run every stage in sequence. Preprocessing -> TF optimisation -> Kinase optimisation -> Model.
Note: This command does NOT run the global network simulation (networkmodel).
To run the global model, use the separate entry point phoskintime-global
(or python -m networkmodel.runner) after this command completes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tf_mode
|
str
|
tfopt mode: local | evol |
Option('local', help='tfopt mode: local | evol')
|
kin_mode
|
str
|
kinopt mode: local | evol |
Option('local', help='kinopt mode: local | evol')
|
tf_conf
|
Path | None
|
Path to TOML/YAML config. Uses defaults if omitted. |
Option(None, help='tfopt config file')
|
kin_conf
|
Path | None
|
Path to TOML/YAML config. Uses defaults if omitted. |
Option(None, help='kinopt config file')
|
model_conf
|
Path | None
|
Path to model config file. Uses defaults if omitted. |
Option(None, help='model config file')
|
Returns: None
config.config
parse_bound_pair(val)
Parse a string representing a pair of bounds (lower, upper) into a tuple of floats. The upper bound can be 'inf' or 'infinity' to represent infinity. Raises ValueError if the input is not in the correct format. Args: val (str): The string to parse, e.g., "0,3" or "0,infinity". Returns: tuple: A tuple containing the lower and upper bounds as floats.
parse_fix_value(val)
Parse a fixed value or a list of fixed values from a string. If the input is a single value, it returns that value as a float. If the input is a comma-separated list, it returns a list of floats. Raises ValueError if the input is not in the correct format. Args: val (str): The string to parse, e.g., "1.0" or "1.0,2.0". Returns: float or list: The parsed fixed value(s) as a float or a list of floats.
ensure_output_directory(directory)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
directory
|
str
|
The path to the directory to create. |
required |
Returns: None
parse_args()
Parse command-line arguments for the PhosKinTime script. This function uses argparse to define and handle the command-line options. It includes options for setting bounds, fixed parameters, bootstrapping, profile estimation, and input file paths. The function returns the parsed arguments as a Namespace object. The arguments include: --A-bound, --B-bound, --C-bound, --D-bound, --Ssite-bound, --Dsite-bound, --bootstraps, --input-excel-protein, --input-excel-psite, --input-excel-rna.
Returns: argparse.Namespace: The parsed command-line arguments.
log_config(logger, bounds, args)
Log the configuration settings for the PhosKinTime script. This function logs the parameter bounds bootstrapping iterations. It uses the provided logger to output the information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
logger
|
Logger
|
The logger to use for logging. |
required |
bounds
|
dict
|
The parameter bounds. |
required |
args
|
Namespace
|
The command-line arguments. |
required |
Returns: None
extract_config(args)
Extract configuration settings from command-line arguments. This function creates a dictionary containing the parameter bounds, bootstrapping iterations. The function returns the configuration dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
args
|
Namespace
|
The command-line arguments. |
required |
Returns: dict: The configuration settings.
score_fit(params, target, prediction, alpha=ALPHA_WEIGHT, beta=BETA_WEIGHT, gamma=GAMMA_WEIGHT, delta=DELTA_WEIGHT, mu=MU_WEIGHT)
Calculate the score for the fit of a model to target data. The score is a weighted combination of various metrics including mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), variance, and regularization penalty. The weights for each metric can be adjusted using the parameters alpha, beta, gamma, and delta. The regularization penalty is controlled by the reg_penalty parameter. The function returns the calculated score. Args: params (np.ndarray): The model parameters. target (np.ndarray): The target data. prediction (np.ndarray): The predicted data. alpha (float): Weight for RMSE. beta (float): Weight for MAE. gamma (float): Weight for variance. delta (float): Weight for MSE. mu (float): Regularization penalty weight. Returns: float: The calculated score.
future_times(n_new: int, ratio: Optional[float] = None, tp: np.ndarray = TIME_POINTS) -> np.ndarray
Extend ttime points by n_new points, each spaced by multiplying the previous interval by ratio. If ratio is None, it is inferred from the last two points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n_new
|
int
|
Number of new time points to generate. |
required |
ratio
|
float
|
Ratio to multiply the previous interval. Defaults to None. |
None
|
tp
|
ndarray
|
Existing time points. Defaults to TIME_POINTS. |
TIME_POINTS
|
Returns: np.ndarray: Extended time points.
config.constants
get_param_names_rand(num_psites: int) -> list
Generate parameter names for the random model.
Format: ['A', 'B', 'C', 'D'] +
['S1', 'S2', ..., 'S
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites. |
required |
Returns: list: List of parameter names.
get_param_names_ds(num_psites: int) -> list
Generate parameter names for distributive or successive models.
Format: ['A', 'B', 'C', 'D'] +
['S1', 'S2', ..., 'S
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites. |
required |
Returns: list: List of parameter names.
generate_labels_rand(num_psites: int) -> list
Generates labels for the states based on the number of phosphorylation sites for the random model. Returns a list with the base labels "R" and "P", followed by labels for all combinations of phosphorylated sites.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites. |
required |
Returns: list: List of state labels.
generate_labels_ds(num_psites: int) -> list
Generates labels for the states based on the number of phosphorylation sites for the distributive or successive models. Returns a list with the base labels "R" and "P", followed by labels for each individual phosphorylated state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites. |
required |
Returns: list: List of state labels.
location(path: str, label: str = None) -> str
Returns a clickable hyperlink string for supported terminals using ANSI escape sequences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str
|
The file path or URL. |
required |
label
|
str
|
The display text for the link. Defaults to the path if not provided. |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
str |
str
|
A string that, when printed, shows a clickable link in terminals that support ANSI hyperlinks. |
get_number_of_params_rand(num_psites)
Calculate the number of parameters required for the ODE system based on the number of phosphorylation sites.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites (1 to 4). |
required |
Returns:
| Name | Type | Description |
|---|---|---|
int |
Total number of parameters. |
get_bounds_rand(num_psites, ub=0, lower=0)
Generate bounds for the ODE parameters based on the number of phosphorylation sites.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites. |
required |
lower
|
float
|
Lower bound for parameters. |
0
|
upper
|
float
|
Upper bound for parameters. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
List of bounds as [lower, upper] for each parameter. |
config.logconf
ColoredFormatter
Bases: Formatter
Custom formatter to add colors to log messages and elapsed time.
This formatter uses ANSI escape codes to colorize the log messages based on their severity level.
It also includes a right-aligned clock that shows the elapsed time since the logger was initialized.
The elapsed time is displayed in a human-readable format (e.g., "1h 23m 45s").
The formatter is designed to be used with a logger that has a console handler.
The elapsed time is calculated from the time the logger was initialized and is displayed in a right-aligned format.
The formatter also ensures that the log messages are padded to a specified width, which can be adjusted using the width parameter.
The remove_ansi method is used to strip ANSI escape codes from the log message for accurate padding calculation.
The format method is overridden to customize the log message format, including the timestamp, logger name, log level, and message.
The setup_logger function is used to configure the logger with a file handler and a stream handler.
The file handler writes log messages to a specified log file, while the stream handler outputs log messages to the console.
The logger is set to the specified logging level, and the log file is created in the specified directory.
The log file is rotated based on size, and old log files are backed up.
format(record)
Format the log record with colors and elapsed time. This method overrides the default format method to customize the log message format. It includes the timestamp, logger name, log level, and message.
remove_ansi(s)
staticmethod
Remove ANSI escape codes from a string.
setup_logger(name=None, log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5, mp_file_logging='main_only')
Setup a logger with colored output and file logging. This function creates a logger with colored output for console messages :param name: :param log_file: :param level: :param log_dir: :param rotate: :param max_bytes: :param backup_count: :param mp_file_logging: - "off": disable file logging - "main_only": file logging only in main process - "per_process": file logging in each process :return: logger
Core Functions
protwise.paramest.normest
worker_find_lambda(lam: float, gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray) -> Tuple[float, float, str]
Worker function for a single lambda value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
lam
|
float
|
Regularization parameter. |
required |
gene
|
str
|
Gene name. |
required |
target
|
ndarray
|
Target data. |
required |
p0
|
ndarray
|
Initial parameter guess. |
required |
time_points
|
ndarray
|
Time points for the model fitting. |
required |
free_bounds
|
Tuple[ndarray, ndarray]
|
Parameter bounds for the optimization. |
required |
init_cond
|
ndarray
|
Initial conditions for the ODE solver. |
required |
num_psites
|
int
|
Number of phosphorylation sites. |
required |
p_data
|
ndarray
|
Measurement data for protein-phospho. |
required |
pr_data
|
ndarray
|
Reference data for protein. |
required |
Returns:
| Type | Description |
|---|---|
Tuple[float, float, str]
|
Tuple containing the lambda value, score, and weight key. |
find_best_lambda(gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray, lambdas=np.logspace(-2, 0, 10), max_workers: int = 4, per_lambda_timeout: float = 1800.0) -> Tuple[float, str]
Finds best lambda_reg to use in model_func.
normest(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps, use_regularization=USE_REGULARIZATION)
Function to estimate parameters for a given gene using ODE models.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
Gene name. |
required | |
pr_data
|
Protein data. |
required | |
p_data
|
Phosphorylation data. |
required | |
r_data
|
Reference data. |
required | |
init_cond
|
Initial conditions for the ODE solver. |
required | |
num_psites
|
Number of phosphorylation sites. |
required | |
time_points
|
Time points for the model fitting. |
required | |
bounds
|
Parameter bounds for the optimization. |
required | |
bootstraps
|
Number of bootstrap iterations. |
required | |
use_regularization
|
Whether to use regularization in the fitting process. |
USE_REGULARIZATION
|
Returns:
| Type | Description |
|---|---|
|
Tuple containing estimated parameters, model fits, error values, and regularization term. |
protwise.paramest.toggle
estimate_parameters(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps)
Wrapper around the normal estimation workflow in paramest/normest.py.
This function delegates parameter estimation to :func:paramest.normest.normest.
There is currently only one estimation mode (normal). A sequential estimation mode
does not exist; paramest/seqest.py is not part of this package.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
Gene name. |
required |
pr_data
|
array
|
Array of protein data. |
required |
p_data
|
array
|
Array of protein-phospho data. |
required |
r_data
|
array
|
Array of RNA data. |
required |
init_cond
|
array
|
Initial conditions for the model. |
required |
num_psites
|
int
|
Number of phosphorylation sites. |
required |
time_points
|
array
|
Time points for the data. |
required |
bounds
|
tuple
|
Bounds for the parameter estimation. |
required |
bootstraps
|
int
|
Number of bootstrap samples. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
model_fits |
list
|
List of model fits. |
estimated_params |
array
|
Estimated parameters. |
seq_model_fit |
array
|
Model fit array of shape (num_psites, len(time_points)). |
errors |
array
|
Errors in the estimation. |
reg_term |
float
|
Regularization term. |
Weights for Curve Fitting
protwise.models.weights
early_emphasis(pr_data, p_data, time_points, num_psites)
Function that calculates custom weights for early time points in a dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pr_data
|
2D numpy array of shape (num_psites, n_times) |
required | |
p_data
|
2D numpy array of shape (num_psites, n_times) |
required | |
time_points
|
1D numpy array of time points |
required | |
num_psites
|
Number of phosphorylation sites |
required |
Returns:
| Name | Type | Description |
|---|---|---|
custom_weights |
1D numpy array of weights for early time points |
get_protein_weights(gene, input1_path=Path(__file__).resolve().parent.parent / 'processing' / 'input1_wstd.csv', input2_path=Path(__file__).resolve().parent.parent / 'data' / 'input2.csv')
Function to extract weights for a specific gene from the input files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
Gene ID to filter the weights. |
required |
input1_path
|
Path
|
Path to the input1_wstd.csv file. |
parent / 'processing' / 'input1_wstd.csv'
|
input2_path
|
Path
|
Path to the input2.csv file. |
parent / 'data' / 'input2.csv'
|
Returns:
| Name | Type | Description |
|---|---|---|
weights |
ndarray
|
Extracted weights for the specified gene. |
full_weight(p_data_weight, use_regularization, reg_len)
Function to create a full weight array for parameter estimation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
p_data_weight
|
ndarray
|
The weight data to be processed. |
required |
use_regularization
|
bool
|
Flag to indicate if regularization is used. |
required |
reg_len
|
int
|
Length of the regularization term. |
required |
Returns:
| Type | Description |
|---|---|
|
numpy.ndarray: The full weight array. |
get_weight_options(target, t_target, num_psites, use_regularization, reg_len, early_weights, ms_gauss_weights)
Function to calculate weights for parameter estimation based on the target data and time points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
target
|
ndarray
|
The target data for which weights are calculated. |
required |
t_target
|
ndarray
|
The time points corresponding to the target data. |
required |
num_psites
|
int
|
Number of phosphorylation sites. |
required |
use_regularization
|
bool
|
Flag to indicate if regularization is used. |
required |
reg_len
|
int
|
Length of the regularization term. |
required |
early_weights
|
ndarray
|
Weights for early time points. |
required |
ms_gauss_weights
|
ndarray
|
Weights based on Gaussian distribution. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dict |
A dictionary containing different weight options. |
Parameter Estimation
protwise.paramest.core
process_gene(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps=0, out_dir=OUT_DIR)
Process a single gene by estimating its parameters and generating plots.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
Gene name. |
required |
protein_data
|
DataFrame
|
DataFrame containing protein-only data. |
required |
kinase_data
|
DataFrame
|
DataFrame containing kinase data. |
required |
mrna_data
|
DataFrame
|
DataFrame containing mRNA data. |
required |
time_points
|
list
|
List of time points for the experiment. |
required |
bounds
|
tuple
|
Bounds for parameter estimation. |
required |
bootstraps
|
int
|
Number of bootstrap iterations. Defaults to 0. |
0
|
out_dir
|
str
|
Output directory for saving results. Defaults to OUT_DIR. |
OUT_DIR
|
Returns:
| Type | Description |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
process_gene_wrapper(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps, out_dir=OUT_DIR)
Wrapper function to process a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
Gene name. |
required |
protein_data
|
DataFrame
|
DataFrame containing protein-only data. |
required |
kinase_data
|
DataFrame
|
DataFrame containing kinase data. |
required |
mrna_data
|
DataFrame
|
DataFrame containing mRNA data. |
required |
time_points
|
list
|
List of time points for the experiment. |
required |
bounds
|
tuple
|
Bounds for parameter estimation. |
required |
bootstraps
|
int
|
Number of bootstrap iterations. Defaults to 0. |
required |
out_dir
|
str
|
Output directory for saving results. Defaults to OUT_DIR. |
OUT_DIR
|
Returns:
| Name | Type | Description |
|---|---|---|
dict |
A dictionary containing the results of the gene processing. |
Confidence Intervals using Linearization
protwise.paramest.identifiability.ci
confidence_intervals(gene, popt, pcov, target, model, alpha_val=0.05)
Computes the confidence intervals for parameter estimates using Wald Intervals approach.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene
|
str
|
Gene name. |
required |
popt
|
ndarray
|
Optimized parameter estimates. |
required |
pcov
|
ndarray
|
Covariance matrix of the optimized parameters. |
required |
target
|
ndarray
|
Target data. |
required |
model
|
ndarray
|
Model predictions. |
required |
alpha_val
|
float
|
Significance level for confidence intervals. Defaults to 0.05. |
0.05
|
Returns:
| Name | Type | Description |
|---|---|---|
dict |
A dictionary containing the confidence intervals and other statistics. |
Knockout Analysis
protwise.knockout.helper
Perturbation & Parameter Sensitivity Analysis
protwise.sensitivity.analysis
compute_bound(value, perturbation=PERTURBATIONS_VALUE)
Computes the lower and upper bounds for a given parameter value for sensitivity analysis and perturbations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
value
|
float
|
The parameter value. |
required |
perturbation
|
float
|
The perturbation factor. |
PERTURBATIONS_VALUE
|
Returns:
| Name | Type | Description |
|---|---|---|
list |
A list containing the lower and upper bounds. |
define_sensitivity_problem_rand(num_psites, values)
Defines the Morris sensitivity analysis problem for the random model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites. |
required |
values
|
list
|
List of parameter values. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dict |
A dictionary containing the number of variables, parameter names, and bounds. |
define_sensitivity_problem_ds(num_psites, values)
Defines the Morris sensitivity analysis problem for the dynamic-site model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites. |
required |
values
|
list
|
List of parameter values. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dict |
A dictionary containing the number of variables, parameter names, and bounds. |
Model Diagram
protwise.models.diagram.helpers
powerset(iterable)
Return the list of all subsets (as frozensets) of the given iterable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
iterable
|
An iterable (e.g., list, set) to generate subsets from. |
required |
Returns: A list of frozensets representing all subsets of the input iterable.
state_label(state)
Convert a set of phosphorylation sites into a node label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state
|
A frozenset representing the phosphorylation state. |
required |
Returns: A string representing the label for the node.
create_random_diagram(x, num_sites, output_filename)
Create a random phosphorylation diagram.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Placeholder parameter, not used in this function. |
required | |
num_sites
|
The number of phosphorylation sites. |
required | |
output_filename
|
The name of the output file for the diagram. |
required |
create_distributive_diagram(x, num_sites, output_filename)
Create a distributive phosphorylation diagram.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Placeholder parameter, not used in this function. |
required | |
num_sites
|
The number of phosphorylation sites. |
required | |
output_filename
|
The name of the output file for the diagram. |
required |
create_successive_model(x, num_sites, output_filename)
Create a successive phosphorylation diagram.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Placeholder parameter, not used in this function. |
required | |
num_sites
|
The number of phosphorylation sites. |
required | |
output_filename
|
The name of the output file for the diagram. |
required |
Protein Wise Model Types
protwise.models.distmod
ode_core(y, t, A, B, C, D, S_rates, D_rates)
The core ODE system for the distributive phosphorylation model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y
|
array of concentrations |
required | |
t
|
time |
required | |
A
|
mRNA production rate |
required | |
B
|
mRNA degradation rate |
required | |
C
|
protein production rate |
required | |
D
|
protein degradation rate |
required | |
S_rates
|
phosphorylation rates for each site |
required | |
D_rates
|
dephosphorylation rates for each site |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dydt |
array of derivatives |
unpack_params(params, num_psites)
Function to unpack the parameters for the distributive ODE system.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params
|
array
|
Parameter vector containing A, B, C, D, S_1.S_n, Ddeg_1.Ddeg_m. |
required |
num_psites
|
int
|
Number of phosphorylation sites. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
A |
float
|
mRNA production rate. |
B |
float
|
mRNA degradation rate. |
C |
float
|
protein production rate. |
D |
float
|
protein degradation rate. |
S_rates |
array
|
Phosphorylation rates for each site. |
D_rates |
array
|
Dephosphorylation rates for each site. |
solve_ode(params, init_cond, num_psites, t)
Solve the ODE system for the distributive phosphorylation model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params
|
array of parameters |
required | |
init_cond
|
initial conditions |
required | |
num_psites
|
number of phosphorylation sites |
required | |
t
|
time points |
required |
Returns:
| Name | Type | Description |
|---|---|---|
sol |
solution of the ODE system |
|
P_fitted |
phosphorylated sites |
protwise.models.randmod
unpack_params(params, num_sites)
Unpack parameters for the Random model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params
|
array
|
Parameter vector containing A, B, C, D, S_1.S_n, Ddeg_1.Ddeg_m. |
required |
num_sites
|
int
|
Number of phosphorylation sites. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
A |
float
|
mRNA production rate. |
B |
float
|
mRNA degradation rate. |
C |
float
|
protein production rate. |
D |
float
|
protein degradation rate. |
S |
array
|
Phosphorylation rates for each site. |
Ddeg |
array
|
Degradation rates for phosphorylated states. |
ode_system(y, t, A, B, C, D, num_sites, S, Ddeg, mono_idx, forward, drop, fcounts, dcounts)
Compute the time derivatives of a random phosphorylation ODE system.
This function supports a large number of phosphorylation states by using precomputed transition indices to optimize speed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y
|
array
|
Current state vector [R, P, X_1, ..., X_m]. |
required |
t
|
float
|
Time (unused; present for compatibility with ODE solvers). |
required |
A
|
float
|
mRNA production rate. |
required |
B
|
float
|
mRNA degradation rate. |
required |
C
|
float
|
protein production rate. |
required |
D
|
float
|
protein degradation rate. |
required |
num_sites
|
int
|
Number of phosphorylation sites. |
required |
S
|
array
|
Phosphorylation rates for each site. |
required |
Ddeg
|
array
|
Degradation rates for phosphorylated states. |
required |
mono_idx
|
array
|
Precomputed indices for mono-phosphorylated states. |
required |
forward
|
array
|
Forward phosphorylation target states. |
required |
drop
|
array
|
Dephosphorylation target states. |
required |
fcounts
|
array
|
Number of valid forward transitions for each state. |
required |
dcounts
|
array
|
Number of valid dephosphorylation transitions for each state. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
out |
array
|
Derivatives [dR, dP, dX_1, ..., dX_m]. |
solve_ode(popt, y0, num_sites, t)
Integrate the ODE system for phosphorylation dynamics in random phosphorylation model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
popt
|
array
|
Optimized parameter vector [A, B, C, D, S_1.S_n, Ddeg_1.Ddeg_m]. |
required |
y0
|
array
|
Initial condition vector [R0, P0, X1_0, ..., Xm_0]. |
required |
num_sites
|
int
|
Number of phosphorylation sites. |
required |
t
|
array
|
Time points to integrate over. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
sol |
ndarray
|
Full ODE solution of shape (len(t), len(y0)). |
mono |
ndarray
|
1D array of fitted values for R (after OFFSET) and P states. |
protwise.models.succmod
ode_core(y, t, A, B, C, D, S_rates, D_rates)
The core of the ODE system for the successive ODE model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y
|
array
|
The current state of the system. |
required |
t
|
float
|
The current time. |
required |
A
|
float
|
The mRNA production rate. |
required |
B
|
float
|
The mRNA degradation rate. |
required |
C
|
float
|
The protein production rate. |
required |
D
|
float
|
The protein degradation rate. |
required |
S_rates
|
array
|
The phosphorylation rates for each site. |
required |
D_rates
|
array
|
The dephosphorylation rates for each site. |
required |
Returns: dydt (np.array): The derivatives of the state variables.
unpack_params(params, num_psites)
Function to unpack the parameters for the ODE system. The parameters are expected to be in the following order: A, B, C, D, S_rates, D_rates where S_rates and D_rates are arrays of length num_psites. The function returns the unpacked parameters as separate variables. :param params: array of parameters :param num_psites: number of phosphorylation sites :return: A, B, C, D, S_rates, D_rates
solve_ode(params, init_cond, num_psites, t)
Solve the ODE system using the given parameters and initial conditions. The function integrates the ODE system over time and returns the solution.
:param params: :param init_cond: :param num_psites: :param t: :return: solution, solution of phosphorylated sites
Steady-State Calculation
protwise.steady.initdist
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for distributive phosphorylation model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites in the model. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
list
|
A list of steady-state values for the variables [R, P, P_sites]. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the optimization fails to find a solution for the steady-state conditions. |
protwise.steady.initrand
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for random phosphorylation model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites in the model. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
list
|
A list of steady-state values for the variables [R, P, P_sites]. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the optimization fails to find a solution for the steady-state conditions. |
protwise.steady.initsucc
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for successive phosphorylation model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_psites
|
int
|
Number of phosphorylation sites in the model. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
list
|
A list of steady-state values for the variables [R, P, P_sites]. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the optimization fails to find a solution for the steady-state conditions. |
Plotting
protwise.plotting.plotting
Plotter
A class to encapsulate plotting functionalities for ODE model analysis.
Attributes:
| Name | Type | Description |
|---|---|---|
gene |
str
|
The gene or experiment name. |
out_dir |
str
|
The directory where plots will be saved. |
color_palette |
list
|
List of color codes used for plotting. |
plot_parallel(solution: np.ndarray, labels: list)
Plots a parallel coordinates plot for the given solution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
solution
|
ndarray
|
2D numpy array of shape (samples, features) representing the data. |
required |
labels
|
list
|
List of labels for the features in the solution. |
required |
pca_components(solution: np.ndarray, target_variance: float = 0.99)
Plots a scree plot showing the explained variance ratio for PCA components.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
solution
|
ndarray
|
2D numpy array of shape (samples, features) representing the data. |
required |
target_variance
|
float
|
The target variance to explain. Defaults to 0.99. |
0.99
|
plot_pca(solution: np.ndarray, components: int = 3)
Plots the PCA results for the given solution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
solution
|
ndarray
|
2D numpy array of shape (samples, features) representing the data. |
required |
components
|
int
|
Number of PCA components to plot. Defaults to 3. |
3
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
PCA result and explained variance ratio. |
plot_tsne(solution: np.ndarray, perplexity: int = 30)
Plots a t-SNE visualization of the given solution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
solution
|
ndarray
|
2D numpy array of shape (samples, features) representing the data. |
required |
perplexity
|
int
|
The perplexity parameter for t-SNE. Defaults to 30. |
30
|
Returns:
| Type | Description |
|---|---|
|
np.ndarray: The t-SNE result. |
plot_param_series(estimated_params: list, param_names: list, time_points: np.ndarray)
Plots the time series of estimated parameters over the given time points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
estimated_params
|
list
|
List of estimated parameters. |
required |
param_names
|
list
|
List of parameter names. |
required |
time_points
|
ndarray
|
Array of time points. |
required |
plot_profiles(data: pd.DataFrame)
Plots the profiles of estimated parameters over time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
DataFrame containing the time series data. |
required |
plot_model_fit(model_fit: np.ndarray, Pr_data: np.ndarray, P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)
Plots the model fit for mRNA, protein, and phosphorylated species across time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_fit
|
ndarray
|
Flattened model fit data (length = 9 + 14 + 14*num_psites). |
required |
Pr_data
|
ndarray
|
Protein data (14,). |
required |
P_data
|
ndarray
|
Phosphorylation data (num_psites + 2, 14). |
required |
R_data
|
ndarray
|
mRNA data (9,). |
required |
sol
|
ndarray
|
ODE solution array. |
required |
num_psites
|
int
|
Number of phosphorylation sites. |
required |
psite_labels
|
list
|
Labels for phosphorylation sites. |
required |
time_points
|
ndarray
|
Time points (14,). |
required |
plot_param_scatter(est_arr: np.ndarray, num_psites: int, time_vals: np.ndarray)
Plots scatter and density plots for parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
est_arr
|
ndarray
|
2D numpy array of estimated parameters. |
required |
num_psites
|
int
|
Number of phosphorylation sites. |
required |
time_vals
|
ndarray
|
Array of time values. |
required |
plot_heatmap(param_value_df: pd.DataFrame)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
param_value_df
|
DataFrame
|
DataFrame containing parameter values with 'Protein' as one of the columns. |
required |
plot_error_distribution(error_df: pd.DataFrame)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
error_df
|
DataFrame
|
DataFrame containing errors with 'MAE' as one of the columns. |
required |
plot_gof(merged_data: pd.DataFrame)
Plot the goodness of fit for the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
merged_data
|
DataFrame
|
Dataframe containing merged data. |
required |
plot_kld(merged_data: pd.DataFrame)
Plots the Kullback-Divergence for the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
merged_data
|
DataFrame
|
Dataframe containing merged data. |
required |
plot_params_bar(ci_results: dict, param_labels: list = None)
Plots bar plot for estimated parameter with 95% Confidence Interval.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ci_results
|
dict
|
Dictionary containing the results of the confidence intervals. |
required |
param_labels
|
list
|
List of parameter labels. Defaults to None. |
None
|
plot_knockouts(results_dict: dict, num_psites: int, psite_labels: list)
Plot wild-type and knockout simulation results for comparison.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results_dict
|
dict
|
Dictionary containing simulation results. |
required |
num_psites
|
int
|
Number of phosphorylation sites. |
required |
psite_labels
|
list
|
List of phosphorylation site labels. |
required |
plot_top_param_pairs(excel_path: str)
For each gene's '_perturbations' sheet in the Excel file, plot scatter plots for the parameter pairs with correlation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
excel_path
|
str
|
Path to the Excel file. |
required |
plot_model_perturbations(problem: dict, Si: dict, cutoff_idx: int, time_points: np.ndarray, n_sites: int, best_model_psite_solutions: np.ndarray, best_mrna_solutions: np.ndarray, best_protein_solutions: np.ndarray, psite_labels: list[str], protein_data_ref: np.ndarray, psite_data_ref: np.ndarray, rna_ref: np.ndarray, model_fit_sol: np.ndarray) -> None
Plot the best model perturbations for the given data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
problem
|
dict
|
The optimization problem. |
required |
Si
|
dict
|
The simulation index. |
required |
cutoff_idx
|
int
|
The cutoff index for the time points. |
required |
time_points
|
ndarray
|
The time points for the data. |
required |
n_sites
|
int
|
The number of phosphorylation sites. |
required |
best_model_psite_solutions
|
ndarray
|
The best model phosphorylation site solutions. |
required |
best_mrna_solutions
|
ndarray
|
The best model mRNA solutions. |
required |
best_protein_solutions
|
ndarray
|
The best model protein solutions. |
required |
protein_ref
|
The reference data for the protein. |
required | |
psite_labels
|
list[str]
|
The labels for the phosphorylation sites. |
required |
psite_data_ref
|
ndarray
|
The reference data for the phosphorylation sites. |
required |
rna_ref
|
ndarray
|
The reference data for mRNA. |
required |
plot_time_state_grid(samples: np.ndarray, time_points: np.ndarray, state_names: list)
Grid of strip plots per state showing variability across time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
samples
|
ndarray
|
shape (n_samples, n_timepoints, n_states) |
required |
time_points
|
ndarray
|
array of time points |
required |
state_names
|
list
|
list of state names |
required |
plot_phase_space(samples: np.ndarray, state_names: list)
Phase space plots: one state vs another for each simulation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
samples
|
ndarray
|
Shape (n_samples, n_timepoints, n_states) |
required |
state_names
|
list
|
List of state names (length = num_states) |
required |
plot_future_fit(P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)
Plots the model fit for the future time points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
P_data
|
ndarray
|
Data for phosphorylation sites. |
required |
R_data
|
ndarray
|
Data for mRNA. |
required |
sol
|
ndarray
|
Model solution. |
required |
num_psites
|
int
|
Number of phosphorylation sites. |
required |
psite_labels
|
list
|
Labels for phosphorylation sites. |
required |
time_points
|
ndarray
|
Time points for the data. |
required |
plot_regularization(excel_path: str)
Read every '
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
excel_path
|
str
|
Path to the Excel file. |
required |
plot_model_error(excel_path: str)
Read every '
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
excel_path
|
str
|
Path to the Excel file. |
required |
Utility Functions
common.utils.display
ensure_output_directory(directory)
Ensure the output directory exists. If it doesn't, create it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
directory
|
str
|
Path to the output directory. |
required |
load_data(excel_file, sheet='Estimated Values')
Load data from an Excel file. The default sheet is "Estimated Values".
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
excel_file
|
str
|
Path to the Excel file. |
required |
sheet
|
str
|
Name of the sheet to load. Default is "Estimated Values". |
'Estimated Values'
|
Returns:
| Type | Description |
|---|---|
|
pd.DataFrame: DataFrame containing the data from the specified sheet. |
format_duration(seconds)
Format a duration in seconds into a human-readable string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
seconds
|
float
|
Duration in seconds. |
required |
Returns: str: Formatted duration string.
merge_obs_est(filename)
Function to merge observed and estimated data from an Excel file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
Path to the Excel file containing observed and estimated data. |
required |
Returns:
| Type | Description |
|---|---|
|
pd.DataFrame: Merged DataFrame containing observed and estimated values for each gene and Psite. |
save_result(results, excel_filename)
Function to save results to an Excel file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results
|
list
|
List of dictionaries containing results for each gene. |
required |
excel_filename
|
str
|
Path to the output Excel file. |
required |
create_report(results_dir: str, output_file: str = f'{model_type}_report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results_dir
|
str
|
Path to the root result's directory. |
required |
output_file
|
str
|
Name of the generated global report file (placed inside results_dir). |
f'{model_type}_report.html'
|
organize_output_files(directories: Iterable[Union[str, Path]])
Organize output files into protein-specific folders and a general folder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
directories
|
Iterable[Union[str, Path]]
|
List of directories to organize. |
required |
common.utils.tables
generate_tables(xlsx_file_path)
Generate hierarchical tables from the XLSX file containing alpha and beta values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
xlsx_file_path
|
str
|
Path to the XLSX file containing alpha and beta values. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
containing protein, psite, and the corresponding table. |
save_tables(tables, output_dir)
Save the generated tables as LaTeX and CSV files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tables
|
list
|
List of tuples containing protein, psite, and the corresponding table. |
required |
output_dir
|
str
|
Directory to save the LaTeX and CSV files. |
required |
save_master_table(folder='latex', output_file='latex/all_tables.tex')
Save a master LaTeX file that includes all individual LaTeX files from the specified folder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
folder
|
str
|
The folder containing the individual LaTeX files. |
'latex'
|
output_file
|
str
|
The name of the master LaTeX file to be created. |
'latex/all_tables.tex'
|
common.utils.latexit
generate_latex_table(df, sheet_name)
Generate LaTeX code for a table from a DataFrame. Args: df (pd.DataFrame): DataFrame to convert to LaTeX. sheet_name (str): Name of the sheet for caption and label. Returns: str: LaTeX code for the table.
generate_latex_image(image_filename)
Generate LaTeX code for an image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
image_filename
|
str
|
Path to the image file. |
required |
Returns: str: LaTeX code for the image.
main(input_dir)
Main function to process Excel and PNG files in the input directory and generate LaTeX code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input_dir
|
str
|
Directory containing Excel and PNG files. |
required |
Global ODE Model
Core Data Structures & Topology
networkmodel.network
System State and Topology Management Module.
This module defines the core data structures that represent the biological system: 1. Index: Manages the mapping between biological names (Proteins, Sites) and numerical indices in the state vector. It handles the complex logic of "Proxy Redirection" for orphan Transcription Factors. 2. KinaseInput: Interpolates experimental kinase data onto the simulation time grid. 3. System: The central container holding all parameters, sparse matrices (CSR), and logic required to evaluate the differential equations. It acts as the bridge between high-level Python objects and the low-level Numba JIT kernels.
Index
Manages the indexing of the state vector Y and network topology.
The state vector Y is a flattened array containing all species. For $N$ proteins, the layout depends on the MODEL type:
- Distributive/Sequential: [mRNA_i, Prot_i, Site_i_1, Site_i_2, ...]
- Combinatorial: [mRNA_i, Prot_state_0, Prot_state_1, ..., Prot_state_2^n]
__init__(interactions: pd.DataFrame, tf_interactions: pd.DataFrame = None, kin_beta_map: dict = None, tf_beta_map: dict = None)
Index manager for the PhoskinTime ODE system. Handles protein/kinase mapping and redirects Orphan TFs to Kinase Proxies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
interactions
|
DataFrame
|
Cleaned Kinase-Substrate network (columns: protein, psite, kinase). |
required |
tf_interactions
|
DataFrame
|
Cleaned TF-Target network (columns: tf, target). |
None
|
kin_beta_map
|
dict
|
Optional dictionary of optimized kinase priors {name: beta_val}. |
None
|
tf_beta_map
|
dict
|
Optional dictionary of optimized TF priors {name: beta_val}. |
None
|
block(i: int) -> slice
Helper to get the range in the state vector for protein i.
KinaseInput
Manages the external Kinase signal inputs $K(t)$. Interpolates sparse experimental observations onto the solver's dense time grid.
eval(t)
Returns the kinase activity vector at time t (using step interpolation).
System
The central Simulation Object.
Holds:
1. Parameters (Arrays $A_i, B_i, \dots$).
2. Network Topology Matrices (Sparse CSR format).
3. Initial Conditions logic.
4. The rhs method (Python-side derivative calculation).
5. Argument packing logic for Numba JIT kernels.
update(c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale)
Updates the system parameters from the optimizer.
attach_initial_condition_data(df_prot, df_rna, df_pho)
Attaches experimental data used to calculate t=0 state.
set_initial_conditions()
Derives the y0 vector from attached experimental data.
rhs(t, y)
Python-side Right-Hand Side evaluation. Used primarily for testing or when live interactivity/debugging is needed.
Logic Flow:
1. Live-Drive: Calculate Kinase Activity $Kt = K_{data}(t) imes c_k$.
2. Signaling: Calculate Phospho-Drive $S = W \cdot Kt$.
3. Protein Aggregation: Sum phospho-states to get total protein $P$.
Crucially, if a protein is a Kinase (or Proxy), overwrite its value with $Kt$.
4. Regulation: Calculate TF inputs $TF_{in} = TF_{mat} \cdot P$.
5. Dynamics: Call model-specific _rhs kernel.
y0() -> np.ndarray
Returns the initial state vector y0.
odeint_args(S_cache=None)
Packs all system arrays into a tuple for the Numba JIT solver.
This method constructs the driver_map array, which tells the low-level solver
which proteins are actually Kinases (or Proxies) and should be "Driven" by
experimental data rather than simulated.
networkmodel.buildmat
Network Matrix Construction Module.
This script handles the construction of the two primary connectivity matrices used in the global simulation: 1. W Matrix (Kinase-Substrate): A sparse matrix defining the phosphorylation rates from kinases to specific phosphosites. It is built in parallel to handle large interactomes. 2. TF Matrix (Transcription Factor-Gene): A sparse matrix defining the transcriptional regulation strengths. It includes logic to handle "proxy" TFs (orphans mapped to kinases).
site_key(site: str) -> int
Extracts the numerical position from a phosphosite string for sorting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
site
|
str
|
e.g., "S473" or "Y1068". |
required |
Returns:
| Name | Type | Description |
|---|---|---|
int |
int
|
The integer position (e.g., 473). |
Raises:
| Type | Description |
|---|---|
ValueError
|
If no digits are found in the string. |
build_W_parallel(interactions: pd.DataFrame, idx, n_cores=4) -> sparse.csr_matrix
Constructs the Global Kinase-Substrate Matrix W using parallel processing.
The global matrix is a vertical stack of local matrices: $$ W_{global} = egin{bmatrix} W_{protein_1} \ W_{protein_2} \ dots \end{bmatrix} $$
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
interactions
|
DataFrame
|
Dataframe containing 'protein', 'psite', 'kinase', and 'alpha'. |
required |
idx
|
IndexMap
|
Object containing mappings (sites, k2i, proteins). |
required |
n_cores
|
int
|
Number of parallel processes to use. |
4
|
Returns:
| Type | Description |
|---|---|
csr_matrix
|
sparse.csr_matrix: The global interaction matrix. |
build_tf_matrix(tf_net, idx, tf_beta_map=None, kin_beta_map=None)
Constructs the Transcription Factor Regulation Matrix.
This matrix defines the influence of TFs on target genes. The edge weight is calculated as: $$ W_{ij} = lpha_{ij} imes eta_{tf} $$ where $lpha$ is the edge strength and $eta$ is the activity multiplier.
Proxy Logic:
If a TF is mapped as a "proxy" (i.e., an orphan TF whose activity is inferred
from a kinase), we use the kinase's beta value (kin_beta_map) instead of
the TF's beta.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tf_net
|
DataFrame
|
TF-Target interactions with 'tf', 'target', and 'alpha'. |
required |
idx
|
IndexMap
|
Object containing mappings (p2i, proxy_map). |
required |
tf_beta_map
|
dict
|
Multipliers for standard TFs. |
None
|
kin_beta_map
|
dict
|
Multipliers for kinases (used for proxies). |
None
|
Returns:
| Type | Description |
|---|---|
|
sparse.csr_matrix: An $(N, N)$ matrix where rows=targets, cols=TFs. |
networkmodel.params
Parameter Management and Transformation Module.
This module handles the interface between the biological model (which requires positive physical parameters like rate constants) and the numerical optimizer (which operates on a flat vector of decision variables).
Key Concepts:
1. Positivity Constraint: Biological rates cannot be negative. To enforce this
during optimization without using hard constraints, we use a Softplus transformation:
$$ P_{physical} = \ln(1 + e^{\theta_{raw}}) $$
This maps any real number $\theta$ to a positive value $P$.
2. Vectorization: The optimizer expects a single 1D array (theta). This module
packs all distinct parameter arrays ($A_i, B_i, \dots$) into this vector and
maintains a slices dictionary to unpack them back.
init_raw_params(defaults, custom_bounds=None)
Initializes the decision vector and bounds for the optimizer.
This function performs three main tasks:
1. Transformation: Converts default physical values to "raw" optimization space
using the inverse softplus function.
2. Flattening: Concatenates all parameter arrays into a single 1D vector theta0.
3. Bounds Mapping: Transforms physical bounds (min, max) into raw space bounds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
defaults
|
dict
|
Dictionary of initial physical values (e.g., {'A_i': [0.1, ...], ...}). |
required |
custom_bounds
|
dict
|
Dictionary of (min, max) tuples for specific keys.
Overrides the global |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
A tuple containing: - theta0 (np.ndarray): The initial flattened decision vector. - slices (dict): Mapping of parameter names to slice objects for unpacking. - xl (np.ndarray): Lower bounds vector in raw space. - xu (np.ndarray): Upper bounds vector in raw space. |
unpack_params(theta, slices)
Converts the raw decision vector back to physical parameters.
Applies the Softplus transformation: $$ P = \ln(1 + e^{\theta}) $$ This ensures that all output parameters are strictly positive, regardless of the values chosen by the optimizer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
theta
|
ndarray
|
The flat decision vector from the optimizer. |
required |
slices
|
dict
|
The slicing dictionary created by |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dict |
A dictionary of physical parameters keys (e.g., 'A_i', 'c_k') mapped to positive numpy arrays or scalars. |
Configuration & Data Loading
networkmodel.config
Configurations for PhoskinTime Global model.
This module serves as the central configuration hub for the application. It loads
settings from a config.toml file located in the project root and exports them
as global constants. These constants control every aspect of the simulation pipeline,
including:
- I/O Paths: Locations of interaction networks and experimental data.
- Model Topology: Selection of the kinetic model structure (Distributive vs. Sequential vs. Combinatorial).
- Solver Settings: Tolerances and step sizes for the ODE integrator.
- Optimization: Hyperparameters for the parameter estimation loop (Evolutionary Strategy).
- Regularization: Penalty terms to prevent overfitting.
- Metadata: Versioning and citation information.
networkmodel.io
Data Loading and Pre-processing Module.
This script is responsible for ingesting raw CSV/Excel files containing biological network definitions (Kinases, TFs) and experimental data (MS Proteomics, RNA-seq). It harmonizes identifiers, merges prior knowledge (Alpha/Beta weights), and reshapes time-series data into a standardized 'tidy' format for the modeling pipeline.
load_data(args)
Loads, cleans, and merges all required data files for the model.
Process: 1. Networks: Loads Kinase-Substrate and TF-Gene interactions. - Expands set-based notations (e.g., "{K1, K2}"). - Merges 'Alpha' (edge strength) and 'Beta' (node activity prior) values from optimization results. 2. Experimental Data: Loads MS and RNA-seq data. - Scales data (e.g., log-transformation, normalization). - Reshapes from "wide" (time points as columns) to "long" (time as a variable).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
args
|
Namespace
|
Configuration namespace containing file paths (kinase_net, tf_net, ms, rna, kinopt, tfopt). |
required |
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
A tuple containing: - df_kin_clean (pd.DataFrame): Kinase network with columns [protein, psite, kinase, alpha]. - df_tf_clean (pd.DataFrame): TF network with columns [tf, target, alpha]. - df_prot (pd.DataFrame): Protein abundance data [protein, time, fc]. - df_pho (pd.DataFrame): Phosphorylation data [protein, psite, time, fc]. - df_rna (pd.DataFrame): RNA abundance data [protein, time, fc]. - kin_beta_map (dict): Prior activity assumptions for kinases {kinase: beta}. - tf_beta_map (dict): Prior activity assumptions for TFs {tf: beta}. |
Physics Kernels (JIT)
networkmodel.models
Kinetic Model Kernels (Right-Hand Side Definitions).
This module contains the JIT-compiled kernels that define the system of Ordinary Differential Equations (ODEs) for different biological kinetic assumptions.
Supported Models: 1. Distributive (Model 0): Sites on the same protein are phosphorylated independently. State space scales linearly with sites ($N_{sites} + 2$). 2. Sequential (Model 1): Phosphorylation occurs in a strict order ($P_0 o P_1 o \dots$). Useful for processive kinases. State space is linear. 3. Combinatorial (Model 2): Every possible phosphorylation pattern is a distinct state. State space scales exponentially ($2^{N_{sites}} + 1$). Handles complex logic like logic gates. 4. Saturating (Model 4): Uses Michaelis-Menten kinetics for translation and phosphorylation. Prevents unbiological rate explosions at high concentrations.
calculate_synthesis_rate(Ai, tf_scale, u_raw)
Computes the transcription rate based on Transcription Factor (TF) input.
Uses a rational (Hill-like) function that is numerically stable and bounded.
Logic:
- Soft-Clipping: The raw input u is first mapped to (-1, 1) to prevent
numerical overflow/instability using $u_{norm} = u / (1 + |u|)$.
- Activation (u > 0): Rate increases from $A_i$ to $A_i imes (1 + tf_scale)$.
- Repression (u < 0): Rate decreases from $A_i$ to $A_i / (1 + tf_scale)$.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Ai
|
float
|
Basal synthesis rate. |
required |
tf_scale
|
float
|
Maximum fold-change factor. |
required |
u_raw
|
float
|
Raw total TF input (weighted sum of regulators). |
required |
Returns:
| Name | Type | Description |
|---|---|---|
float |
The calculated synthesis rate. |
saturating_rhs(y, dy, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, TF_inputs, S_all, offset_y, offset_s, n_sites)
Right-Hand Side for Model 4: Saturating Kinetics.
Unlike other models that use Mass Action kinetics (Rate = k * [S]), this model uses Michaelis-Menten forms (Rate = Vmax * [S] / (Km + [S])) for Translation and Phosphorylation. This prevents "runaway" kinetics where high protein concentrations lead to infinitely fast reactions.
Physics: - Translation: Limited by ribosome availability. - Phosphorylation: Limited by enzyme (kinase) availability relative to substrate.
distributive_rhs(y, dy, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, TF_inputs, S_all, offset_y, offset_s, n_sites)
Right-Hand Side for Model 0: Distributive (Independent) Binding.
Assumes that the phosphorylation status of one site does not affect others. We track the Unphosphorylated species $P$ and $N_{sites}$ distinct mono-phosphorylated species. This is a simplification that ignores multi-phosphorylated states but captures individual site dynamics efficiently.
sequential_rhs(y, dy, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, TF_inputs, S_all, offset_y, offset_s, n_sites)
Right-Hand Side for Model 1: Sequential Binding.
Assumes phosphorylation must happen in a specific order: $P_0 \xrightarrow{k_0} P_1 \xrightarrow{k_1} P_2 \dots \xrightarrow{k_{n-1}} P_n$
Dephosphorylation is assumed to be distributive (any state can revert to the previous one). This model is suitable for processive kinases or mechanisms with strict steric requirements.
combinatorial_rhs(y, dy, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, TF_inputs, S_cache, jb, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n)
Right-Hand Side for Model 2: Combinatorial Binding.
Tracks all $2^N$ possible states. This allows for complex logic (e.g., "Site A facilitates Site B").
The state transition logic is split into:
1. Dephosphorylation (Implicit Graph): Iterates over all states m. For every set bit in m,
calculates decay to m ^ lsb.
2. Phosphorylation (Explicit Graph): Uses pre-calculated trans_* arrays to apply forward
transitions based on available sites.
build_random_transitions(idx)
Precompute random phosphorylation transitions for all proteins (Model 2 Setup).
Constructs the sparse adjacency list for the hypercube state graph.
Returns arrays for Numba
trans_from, trans_to, trans_site (flattened) trans_off[i], trans_n[i] per protein i
Interpretation
for protein i: transitions are in slice [trans_off[i] : trans_off[i]+trans_n[i]] each transition uses site index 'j' (0..ns-1) to pick rate S_all[s_start + j]
Numerical Integration & Solvers
networkmodel.simulate
Simulation and Measurement Extraction Module.
This module orchestrates the numerical integration of the ODE system and processes the raw state trajectories into biologically meaningful observables (Fold Changes).
Key Responsibilities:
1. Solver Wrapper: Abstracting the choice between scipy.integrate.odeint (LSODA)
and a custom Numba-accelerated RK45 solver.
2. State Aggregation: Converting raw state vectors $Y$ into:
* Total Protein: Sum of unphosphorylated and phosphorylated forms.
* Site Phosphorylation: Sum of specific phospho-states (handling combinatorial logic if needed).
* RNA: Direct extraction from state vector.
3. Data Alignment: Interpolating or slicing the simulated timepoints to match experimental grids.
simulate_odeint(sys, t_eval, rtol, atol, mxstep)
Runs the ODE solver for the given system and timepoints.
Dispatches to either a custom Numba RK45 solver (faster for small/stiff systems with JIT) or SciPy's LSODA (robust standard).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sys
|
System
|
The system object containing parameters and topology. |
required |
t_eval
|
ndarray
|
Time points to return the solution at. |
required |
rtol
|
float
|
Relative tolerance. |
required |
atol
|
float
|
Absolute tolerance. |
required |
mxstep
|
int
|
Maximum number of internal steps per time interval. |
required |
Returns:
| Type | Description |
|---|---|
|
np.ndarray: The solution matrix $Y$ of shape (len(t_eval), state_dim). |
simulate_and_measure(sys, idx, t_points_p, t_points_r, t_points_pho)
Simulates the system and extracts 'Fold Change' (FC) predictions aligned with data.
Process: 1. Union Grid: Creates a master time grid containing all experimental timepoints. 2. Simulate: Integrates the system once over this master grid. 3. Extract & Normalize: - Calculates raw observables (e.g., Total Protein = Unphos + Phos). - Normalizes by the value at the baseline timepoint (t=0 for protein/phospho, t=4 for RNA). 4. Slice: Filters the result to match the specific timepoints requested for each modality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sys
|
System object. |
required | |
idx
|
Index object (topology map). |
required | |
t_points_*
|
Arrays of timepoints for Protein, RNA, and Phospho data. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
(df_prot, df_rna, df_phos) - Pandas DataFrames with columns [protein, time, pred_fc]. |
networkmodel.solvers
Adaptive Runge-Kutta Integrators (JIT-Compiled).
This module implements a custom Dormand-Prince (RK45) adaptive step-size solver tailored specifically for the stiff and sparse ODE systems generated by PhoskinTime.
Why a custom solver?
Standard SciPy solvers (odeint, solve_ivp) incur significant Python overhead when calling
the RHS function thousands of times per second. By implementing the entire integration loop
in Numba-compiled code, we achieve C-like performance while maintaining flexibility.
Key Features: 1. Bucketed Inputs: Efficiently handles time-varying Kinase inputs ($K(t)$) by detecting boundaries in the discretized input grid and adjusting steps accordingly. 2. Sparse Operations: RHS evaluations use custom sparse matrix-vector multiplication kernels that operate in-place to minimize memory allocation. 3. Hermite Interpolation: Provides dense output, allowing accurate evaluation at user-specified timepoints ($t_{eval}$) regardless of the internal adaptive steps taken.
csr_matvec_into(out, indptr, indices, data, x, n_rows)
In-place Sparse Matrix-Vector Multiplication: out = A * x.
Used for calculating signaling flux ($S = W \cdot K$) and TF inputs ($TF_{in} = A \cdot P$). Optimized for Numba with no temporary array creation.
rhs_model0_bucketed_into(dy, y, jb, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, Kt_work, S_all_work, P_vec_work, TF_in_work)
RHS wrapper for Model 0 (Distributive) optimized for the custom solver loop.
Differences from standard RHS:
- Writes directly into dy (no return).
- Takes pre-allocated work arrays (Kt_work, etc.) to avoid malloc.
- Takes jb (current time bucket index) to look up $K(t)$ instantly.
rhs_model1_bucketed_into(dy, y, jb, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, Kt_work, S_all_work, P_vec_work, TF_in_work)
RHS wrapper for Model 1 (Sequential) optimized for custom solver.
rhs_model2_bucketed_into(dy, y, jb, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, S_cache, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n, tf_deg, driver_map, P_vec, TF_inputs)
RHS wrapper for Model 2 (Combinatorial).
Special Handling:
- Uses S_cache (pre-computed W*Kt) instead of computing S_all on the fly.
- Live-Drive limitation: Because S_cache is pre-computed, we don't have access to
instaneous Kt vector here to drive P_vec. We fallback to standard protein summation.
rhs_model4_bucketed_into(dy, y, jb, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, Kt_work, S_all_work, P_vec_work, TF_in_work)
RHS wrapper for Model 4 (Saturating/Michaelis-Menten).
adaptive_rk45_model01(model_id, y0, t_eval, kin_grid, args, rtol=1e-05, atol=1e-07, dt_init=0.05, dt_min=1e-06, dt_max=1.0, safety=0.9, max_steps=2000000)
Adaptive RK45 Integrator (Models 0, 1, 4).
Implements the Dormand-Prince 5(4) pair with local error control.
Key Logic:
1. Step Proposal: Estimate error of current step using 4th/5th order difference.
2. Boundary Check: If next step crosses a kin_grid boundary (where input K(t) changes),
shorten step to land exactly on boundary.
3. Error Control: If error < tolerance, accept step and potentially grow dt.
If error > tolerance, reject and shrink dt.
4. Interpolation: If accepted step spans a t_eval point, interpolate and record solution.
adaptive_rk45_model2(y0, t_eval, kin_grid, args2, rtol=1e-05, atol=1e-07, dt_init=0.05, dt_min=1e-06, dt_max=1.0, safety=0.9, max_steps=2000000)
Adaptive RK45 Integrator specifically for Model 2 (Combinatorial).
Similar logic to adaptive_rk45_model01 but unpacks Model 2 specific structures
(S_cache, state transitions) and calls rhs_model2_bucketed_into.
networkmodel.jacspeedup
Numerical Solver Wrappers and JIT Kernels.
This module acts as the high-performance computational core for the simulation.
It bridges the gap between the high-level System definitions and the low-level
numerical integrators (Runge-Kutta methods).
Key responsibilities:
1. Solver Interface: Provides solve_custom to dispatch the correct integration
routine based on the selected kinetic model (Distributive, Sequential, etc.).
2. JIT Compilation: Uses Numba to compile time-critical functions (RHS evaluations,
matrix-vector multiplications) into machine code for speed.
3. Sparse Matrix Operations: Implements fast custom kernels for calculating signaling
inputs ($S = W \cdot K$) and transcriptional regulation ($TF_{in} = A_{tf} \cdot P$).
4. Jacobian Approximation: Provides finite-difference routines for estimating
the Jacobian matrix, which is essential for stiff solvers (though mostly used here for
diagnostics or implicit stepping if enabled).
solve_custom(sys, y0, t_eval, rtol, atol)
Dispatches the simulation to the appropriate adaptive Runge-Kutta solver based on the global MODEL type.
This function handles the setup of model-specific arguments, such as building the phosphorylation cache for the combinatorial model (Model 2) or packing the argument tuples for the standard models (0, 1, 4).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sys
|
System
|
The system object containing parameters and network matrices. |
required |
y0
|
ndarray
|
Initial state vector at t=0. |
required |
t_eval
|
ndarray
|
Time points where the solution is required. |
required |
rtol
|
float
|
Relative tolerance for error control. |
required |
atol
|
float
|
Absolute tolerance for error control. |
required |
Returns:
| Type | Description |
|---|---|
|
np.ndarray: The integrated solution matrix Y of shape (len(t_eval), n_states). |
csr_matvec(indptr, indices, data, x, n_rows)
Performs a fast sparse matrix-vector multiplication: $y = A \cdot x$.
This is used to calculate: 1. Signaling Input: $S_{all} = W_{global} \cdot Kt$ (Kinase -> Site) 2. TF Input: $TF_{in} = TF_{matrix} \cdot P_{vec}$ (TF -> Gene)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
indptr, indices, data
|
Arrays defining the CSR matrix A. |
required | |
x
|
ndarray
|
The dense vector to multiply. |
required |
n_rows
|
int
|
Number of rows in A. |
required |
Returns:
| Type | Description |
|---|---|
|
np.ndarray: The result vector y. |
csr_matvec_into(out, indptr, indices, data, x, n_rows)
In-place version of sparse matrix-vector multiplication to reduce memory allocation.
Writes the result directly into out.
build_S_cache_into(S_out, W_indptr, W_indices, W_data, kin_Kmat, c_k)
Pre-computes the signaling drive 'S' for every site at every time bucket.
For Model 2, calculating $S = W \cdot (K(t) \cdot c_k)$ at every micro-step is too slow. Instead, since $K(t)$ is discretized into buckets, we can pre-calculate the resulting S for each bucket.
Parallelized using prange for performance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
S_out
|
ndarray
|
Output cache (n_sites, n_time_buckets). |
required |
W_*
|
CSR arrays for the kinase-substrate interaction matrix. |
required | |
kin_Kmat
|
ndarray
|
Kinase activity profiles (n_kinases, n_buckets). |
required |
c_k
|
ndarray
|
Optimized kinase activity multipliers. |
required |
kin_eval_step(t, grid, Kmat)
Evaluates kinase activity at time t using step interpolation (Nearest Neighbor / Bucket).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
t
|
float
|
Current simulation time. |
required |
grid
|
ndarray
|
Time grid boundaries. |
required |
Kmat
|
ndarray
|
Kinase data matrix. |
required |
Returns:
| Type | Description |
|---|---|
|
np.ndarray: Vector of kinase activities at time t. |
rhs_nb_distributive(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map)
Right-Hand Side (RHS) function for Model 0 (Distributive / Independent Binding).
Calculates dy/dt = f(t, y) including:
1. Kinase inputs: Interpolates K(t) and maps to sites via sparse W.
2. TF inputs: Aggregates protein levels, maps to genes via sparse TF matrix,
and applies Hill-like squashing.
3. Local dynamics: Calls distributive_rhs for the specific kinetic equations.
rhs_nb_sequential(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map)
Right-Hand Side (RHS) function for Model 1 (Sequential Binding).
Almost identical to Distributive, but calls sequential_rhs.
rhs_nb_combinatorial(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, S_cache, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n, tf_deg, driver_map, P_vec, TF_inputs)
Right-Hand Side (RHS) function for Model 2 (Combinatorial Binding).
Differences from other models:
1. Uses S_cache instead of recomputing $S_{all}$.
2. Uses pre-allocated work arrays (P_vec, TF_inputs) passed as arguments.
3. Iterates over n_states (2^n_sites) rather than just linear sites.
rhs_nb_saturating(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map)
Right-Hand Side (RHS) for Model 4 (Saturating/Michaelis-Menten).
Identical setup to distributive, but calls saturating_rhs.
rhs_odeint(y, t, *args)
Standard python wrapper required for scipy.integrate.odeint or similar API.
fd_jacobian_nb_core_distributive(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, eps=1e-08)
Finite Difference Jacobian calculation for Distributive Model. Approximates J[i,j] = df_i/dy_j using forward differences. Useful for testing or if an implicit solver is required without an analytical Jacobian.
fd_jacobian_nb_core_sequential(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, eps=1e-08)
Finite Difference Jacobian for Sequential Model.
fd_jacobian_nb_core_combinatorial(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, S_cache, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n, tf_deg, driver_map, P_vec, TF_inputs, eps=1e-08)
Finite Difference Jacobian for Combinatorial Model.
fd_jacobian_nb_core_saturating(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, eps=1e-08)
Finite Difference Jacobian for Saturating Model.
fd_jacobian_odeint(y, t, *args)
Python wrapper for the Jacobian function.
networkmodel.steadystate
Initial Condition and Steady-State Logic Module.
This module is responsible for defining the starting state ($y_0$) of the ODE system.
It supports two primary modes:
1. Data-Driven Initialization (build_y0_from_data): Uses experimental data (Protein/RNA abundance
at t=0) to set the initial conditions directly. This is crucial for matching the
absolute scale of the measurements.
2. Analytical Steady-State (steady_state_*): Solves the algebraic equilibrium equations
($dy/dt = 0$) for a system where all kinetic parameters are set to 1.0. This is useful for
testing structural consistency or initializing systems without data.
build_y0_from_data(idx, df_prot, df_rna, df_pho, *, t_init=0.0, t0_pho=0.0, eps=1e-09, time_tol=1e-08, max_pho_frac=0.3)
Constructs the initial state vector $y_0$ strictly from experimental data.
Physics-Compliant Mass Balance:
1. Protein Total ($P_{tot}$): Taken from df_prot at t_init.
2. Phospho Mass: Taken from df_pho at t0_pho. Because phospho signals are often
relative intensities, we scale them such that the total phosphorylated mass does not
exceed max_pho_frac (e.g., 30%) of $P_{tot}$.
3. Unphosphorylated Mass ($P_0$): Calculated by conservation: $P_0 = P_{tot} - \sum P_{phos}$.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
idx
|
System index object. |
required | |
df_prot, df_rna, df_pho
|
Tidy dataframes of observations. |
required | |
t_init
|
float
|
The simulation start time (usually 0.0). |
0.0
|
max_pho_frac
|
float
|
Cap on the initial fraction of phosphorylated protein. |
0.3
|
Returns:
| Type | Description |
|---|---|
|
np.ndarray: The assembled initial condition vector. |
steady_state_distributive(idx, TF_inputs=None, tf_scale=1.0, verify_with_rhs=False)
Analytically computes the steady state for the Distributive Model assuming all kinetic parameters (A, B, C, D, E, k) are 1.0.
This serves as a structural validation of the network topology. If the topology is valid, a stable steady state should exist.
steady_state_sequential(idx, TF_inputs=None, tf_scale=1.0, verify_with_rhs=False)
Analytically computes the steady state for the Sequential Model.
Because the states form a linear chain ($P_0 \leftrightarrow P_1 \leftrightarrow \dots$), the steady state can be found by solving a tridiagonal linear system.
steady_state_combinatorial(idx, TF_inputs=None, jb=0, tf_scale=1.0, max_states_per_protein=4096, trans_from=None, trans_to=None, trans_site=None, trans_off=None, trans_n=None, verify_with_rhs=False)
Computes steady state for the Combinatorial Model.
This involves solving a sparse linear system $A P = b$ for each protein, where $A$ is the transition rate matrix of the $2^N$ states graph.
networkmodel.model_ivp
IVP Solver Interface Module.
This module provides factory functions that generate callable Right-Hand Side (RHS)
functions compatible with scipy.integrate.solve_ivp.
These factories bridge the gap between high-level Python solvers and the low-level,
JIT-compiled Numba kernels defined in networkmodel.models. They handle:
1. Type Safety: Ensuring all arrays are C-contiguous and float64 for Numba.
2. Input Normalization: Wrapping transcription factor (TF) inputs whether they are constants or time-dependent functions.
3. Closure Creation: Returning a simple fun(t, y) that closes over all static model parameters, optimizing solver overhead.
make_solve_ivp_fun_saturating(*, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, tf_input, S_all, offset_y, offset_s, n_sites)
Creates a fun(t, y) closure for the Saturating (Michaelis-Menten) Model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
A_i, B_i, C_i, D_i, Dp_i, E_i
|
ndarray
|
Kinetic parameters arrays. |
required |
tf_scale
|
float
|
Global scaling factor for TF activity. |
required |
tf_input
|
Transcription factor input (array or callable). |
required | |
S_all
|
ndarray
|
Pre-calculated Signaling drive matrix. |
required |
offset_y, offset_s
|
ndarray
|
Index offsets for state vectors and sites. |
required |
n_sites
|
ndarray
|
Number of phospho-sites per protein. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
callable |
A function |
make_solve_ivp_fun_distributive(*, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, tf_input, S_all, offset_y, offset_s, n_sites)
Creates a fun(t, y) closure for the Distributive (Independent) Model.
This model assumes phosphorylation sites on the same protein behave independently.
make_solve_ivp_fun_sequential(*, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, tf_input, S_all, offset_y, offset_s, n_sites)
Creates a fun(t, y) closure for the Sequential Binding Model.
This model enforces an order of phosphorylation (e.g., Site 1 must be phos before Site 2).
make_solve_ivp_fun_combinatorial(*, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, tf_input, S_cache, jb, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n)
Creates a fun(t, y) closure for the Combinatorial Model.
This model handles the full 2^n state space complexity using pre-computed transition tables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
S_cache
|
ndarray
|
Cached signaling matrix (pre-computed inputs). |
required |
jb
|
int
|
Time bucket index (for static step evaluation). |
required |
trans_*
|
Arrays defining the sparse state transition graph. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
callable |
A function |
Optimization & Loss Functions
networkmodel.optproblem
Multi-Objective Optimization Problem Definition.
This module defines the GlobalODE_MOO class, which wraps the biological simulation
and loss calculation into a format compatible with the pymoo optimization framework.
Key Concepts: 1. Elementwise Problem: Each member of the population (a parameter set) is evaluated individually. 2. Three Objectives: The optimizer simultaneously minimizes error for: * Protein Abundance * mRNA Abundance * Phosphorylation Levels 3. Prior Regularization: A penalty term ensures parameters stay close to biologically plausible priors derived from upstream analysis (KinOpt/TFOpt). This penalty is added to all objectives to steer the entire Pareto front toward realistic regions.
GlobalODE_MOO
Bases: ElementwiseProblem
Defines the multi-objective optimization problem for the ODE model.
Attributes:
| Name | Type | Description |
|---|---|---|
sys |
System
|
The biological system object (contains topology and matrices). |
slices |
dict
|
Mapping of parameter names to indices in the decision vector |
loss_data |
dict
|
Pre-processed experimental data arrays for fast loss calculation. |
defaults |
dict
|
Prior parameter values for regularization. |
lambdas |
dict
|
Hyperparameters controlling the strength of penalties/weights. |
time_grid |
ndarray
|
The common time grid for simulation. |
fail_value |
float
|
Penalty value assigned to objectives if simulation fails. |
__init__(sys, slices, loss_data, defaults, lambdas, time_grid, xl, xu, fail_value=1000000000000.0, elementwise_runner=None)
Initialize the MOO problem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sys
|
System object. |
required | |
slices
|
Parameter slice dictionary. |
required | |
loss_data
|
Dictionary of contiguous arrays for loss function. |
required | |
defaults
|
Dictionary of default/prior parameter values. |
required | |
lambdas
|
Dictionary containing weights for 'protein', 'rna', 'phospho', and 'prior'. |
required | |
time_grid
|
Simulation timepoints. |
required | |
xl
|
Lower bounds for parameters. |
required | |
xu
|
Upper bounds for parameters. |
required | |
fail_value
|
Objective value to return if ODE solver diverges. |
1000000000000.0
|
|
elementwise_runner
|
Optional runner for parallel execution. |
None
|
get_weight_options(time_points, *, rna_time_points=None, early_window=None, center=None, baseline=None, eps=1e-12)
Generates a library of time-dependent weighting schemes.
Time-series data often requires non-uniform weighting. For example: - Early Transients: Often contain the most kinetic information, so we might boost early timepoints. - Late Steady-State: Might be noisy or less relevant for initial kinetics.
Parameters
time_points : array-like All times you want schemes to support. rna_time_points : array-like or None Optional; used only for a couple RNA-friendly schemes. early_window : float or None Times <= early_window are "early". If None, uses ~20% quantile. center : float or None Center for gaussian/logistic. If None, uses median. baseline : float or None Baseline time (for "from_baseline" schemes). eps : float Numerical floor to prevent division by zero.
Returns
dict[str, callable]
Dictionary mapping scheme names (e.g., 'linear_early') to functions f(t) -> weight.
build_weight_functions(time_points_protein: np.ndarray, time_points_rna: np.ndarray, scheme_prot_pho: str = 'uniform', scheme_rna: str = 'uniform', early_window_prot_pho: float = 2.0, early_window_rna: float = 15.0) -> Tuple[Callable[[np.ndarray], np.ndarray], Callable[[np.ndarray], np.ndarray]]
Factory to build modality-specific weight functions based on configuration.
Parameters
time_points_protein: Timepoints for protein/phospho modality. time_points_rna: Timepoints for RNA modality. scheme_prot_pho: Weighting scheme name for protein/phospho (e.g., 'linear_early'). scheme_rna: Weighting scheme name for RNA (e.g., 'uniform'). early_window_prot_pho: Definition of 'early' (in minutes) for protein data. early_window_rna: Definition of 'early' (in minutes) for RNA data.
Returns
w_prot_pho, w_rna: Two callables that take time arrays and return weight arrays.
networkmodel.lossfn
High-Performance Loss Functions Module.
This module implements the objective functions used to quantify the discrepancy between model predictions ($Y$) and experimental data (Protein, RNA, Phospho).
Key Features:
1. JIT Compilation: All functions are decorated with @njit from Numba to
enable C-like performance, which is critical when the loss function is called
thousands of times during optimization.
2. Multi-Modal Support: Handles RNA, Total Protein, and Phospho-site data
simultaneously.
3. Robust Error Metrics: Supports multiple loss modes (Squared Error, Huber,
Pseudo-Huber, Charbonnier) to handle outliers in biological data.
4. Topology Awareness: Includes separate logic for standard models (linear states)
and combinatorial models (bitwise state aggregation).
sq(diff)
Standard Squared Error: (y - y_pred)^2.
huber(diff, delta=1.0)
Huber Loss. Behaves like Squared Error for small errors (<= delta) and Absolute Error for large errors. Robust to outliers.
pseudo_huber(diff, delta=1.0)
Pseudo-Huber Loss. A smooth approximation of Huber loss that is differentiable everywhere.
charbonnier(diff, eps=0.001)
Charbonnier Loss (differentiable L1). $\sqrt{diff^2 + \epsilon^2}$. Very robust to outliers.
log_cosh(diff)
Log-Hyperbolic Cosine Loss. Approx: x^2/2 for small x, abs(x) - log(2) for large x. Smoother than Huber; helps gradients flow better near zero.
cauchy_loss(diff, c=1.0)
Cauchy (Lorentzian) Loss. Scales logarithmically for large errors. Extremely robust to outliers; the influence of an outlier tends to zero.
poisson_scaled_mse(diff, pred_val, eps=1e-06)
Poisson-Scaled MSE. Weights the error by the inverse of the predicted intensity. Penalizes relative error more heavily at low values and allows more variance at high values.
geman_mcclure(diff, delta=1.0)
Geman-McClure Loss. Soft-saturating loss. As error -> infinity, loss -> constant. This strictly limits the maximum penalty any single data point can exert.
loss_function_noncomb(Y, p_prot, t_prot, obs_prot, w_prot, p_rna, t_rna, obs_rna, w_rna, p_pho, s_pho, t_pho, obs_pho, w_pho, prot_map, prot_base_idx, rna_base_idx, pho_base_idx)
Loss function for Models 0, 1, and 4 (Non-Combinatorial).
In these models, the state vector is linear: [RNA, Unphos, Phos_site1, Phos_site2, ...]
Calculates: 1. Protein: (Unphos + Sum(Phos_states)) / Baseline 2. RNA: RNA_state / Baseline 3. Phospho: Phos_state_j / Baseline
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Y
|
ndarray
|
Simulation trajectory (n_times x n_states). |
required |
*_prot, *_rna, *_pho
|
Arrays of indices (protein index, time index, etc.), observations, and weights for each data type. |
required | |
prot_map
|
ndarray
|
Lookup table for state offsets. |
required |
*_base_idx
|
Index of the normalization baseline timepoint (usually t=0). |
required |
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
(loss_protein, loss_rna, loss_phospho) |
loss_function_comb(Y, p_prot, t_prot, obs_prot, w_prot, p_rna, t_rna, obs_rna, w_rna, p_pho, s_pho, t_pho, obs_pho, w_pho, prot_map, prot_base_idx, rna_base_idx, pho_base_idx)
Loss function for Model 2 (Combinatorial).
In this model, states represent all $2^n$ combinations of phosphorylation. To calculate observables, we must aggregate these states: 1. Protein: Sum of all $2^n$ states. 2. Phospho Site j: Sum of all states where the $j$-th bit is 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Y
|
ndarray
|
Simulation trajectory. |
required |
networkmodel.optuna_solver
Native Optuna Optimization Engine (MOTPE).
This module provides an alternative optimization backend using Optuna's Multi-Objective Tree-structured Parzen Estimator (MOTPE). It is designed to replace or coexist with the Pymoo genetic algorithm.
Key Features:
1. Bayesian Optimization: Uses MOTPE to efficiently explore the parameter space
by modeling the probability density of good vs. bad solutions.
2. Persistent Storage: Saves all trial data to a local SQLite database (optimization.db),
allowing pause/resume and post-hoc analysis.
3. Live Dashboard: Optionally launches the optuna-dashboard for real-time
visualization of the Pareto front and parameter importances.
4. Vectorized Loss: Implements a high-performance objective function that avoids
Python loops during evaluation.
OptunaResult
dataclass
Standardized result container to ensure compatibility with existing export scripts.
This mimics the structure of Pymoo's Result object so that downstream plotting and export functions don't need to change.
NativeOptunaObjective
The optimization kernel designed specifically for Optuna.
This class is callable objective(trial) and handles the entire evaluation pipeline:
Parameter Suggestion -> System Update -> ODE Simulation -> Loss Calculation.
It pre-calculates array indices (_build_fast_indices) to enable fully vectorized
loss computation, which is critical for performance when running thousands of trials.
__call__(trial)
The main evaluation step called by Optuna.
run_optuna_solver(args, sys, loss_data, slices, xl, xu, defaults, lambdas, time_grid, df_prot, df_rna, df_pho, n_trials=5000)
Main driver function for the Optuna Optimization pipeline.
It replaces the Pymoo genetic algorithm with Optuna's MOTPE.
Workflow:
1. Augment loss data with fast indices.
2. Setup persistent SQLite storage (optimization.db).
3. Launch the Optuna Dashboard (background thread) for real-time monitoring.
4. Run the optimization loop.
5. Extract the Pareto front from the database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
args
|
Configuration namespace (seed, output_dir, etc.). |
required | |
sys
|
The system object. |
required | |
loss_data
|
Pre-computed data for loss function. |
required | |
slices
|
Parameter slices. |
required | |
xl, xu
|
Lower/Upper parameter bounds. |
required | |
n_trials
|
Number of evaluations to run. |
5000
|
Returns:
| Name | Type | Description |
|---|---|---|
OptunaResult |
Standardized result object containing Best Parameters (X) and Objectives (F). |
networkmodel.runner
Main Driver for PhosKinTime Global Model.
This script runs the optimization and analysis steps of the PhosKinTime Global Model. It loads the data, builds the model, and runs the optimization.
The script can be run in two modes: 1. Standard optimization: run_optuna_solver() 2. Sensitivity analysis: run_sensitivity_analysis()
The script can also be run in parallel using the multiprocessing module. This is useful for running the optimization on multiple cores. To enable parallel execution, set the CORES environment variable to the number of cores to use.
networkmodel.refine
Iterative Refinement and Optimization Zooming Module.
This module implements a multi-stage optimization strategy to fine-tune the biological model parameters. After an initial global search (Exploration), this module performs iterative "Refinement" passes (Exploitation) by:
- Zooming: Calculating a tighter bounding box around the best solutions found so far.
- Seeding: Creating a hybrid population that mixes the best previous solutions ("Warm Start") with fresh random samples within the new bounds to maintain diversity.
- Parallel Execution: Using multiprocessing to speed up the re-evaluation of the refined space.
get_refined_bounds(X, current_xl, current_xu, idx, padding=0.2)
Calculates tighter parameter bounds based on the spread of the current best solutions. Also logs a detailed "Biological Report" showing the physical ranges of these new bounds.
The new bounds are calculated as:
$$ [min(X) - padding imes range, max(X) + padding imes range] $$
clamped to the original current_xl and current_xu.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X
|
ndarray
|
The decision vectors of the current Pareto front. |
required |
current_xl
|
ndarray
|
The absolute lower bounds of the search space. |
required |
current_xu
|
ndarray
|
The absolute upper bounds of the search space. |
required |
idx
|
Index
|
The system index object (for mapping parameters to names). |
required |
padding
|
float
|
The expansion factor for the new bounds (default 20%). |
0.2
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
(new_xl, new_xu) arrays. |
create_multistart_population(X_best, pop_size, new_xl, new_xu)
Creates a hybrid population for the next optimization stage.
Strategy: - 50% Warm Start: Reuses the best individuals from the previous run (perturbed slightly). - 50% Fresh Sampling: Randomly samples new points within the refined bounds to prevent premature convergence to a local minimum.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X_best
|
ndarray
|
Array of best solutions from previous run. |
required |
pop_size
|
int
|
Target population size. |
required |
new_xl, new_xu
|
The new boundaries. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Population |
A Pymoo Population object ready for the algorithm. |
run_iterative_refinement(problem, prev_res, args, idx=None, max_passes=3, padding=0.25)
Executes the recursive refinement loop.
Repeatedly zooms in on the optimal region by: 1. Defining a tighter box around the current Pareto front. 2. Launching a new optimization run within that box. 3. Checking if the solution improved; if not, stopping early.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
problem
|
GlobalODE_MOO
|
The optimization problem instance. |
required |
prev_res
|
Result
|
The result object from the previous (global) run. |
required |
args
|
Namespace
|
Configuration arguments. |
required |
idx
|
Index
|
System index object for logging. |
None
|
max_passes
|
int
|
Maximum number of refinement iterations. |
3
|
padding
|
float
|
Initial padding factor for bounds expansion. |
0.25
|
Returns:
| Name | Type | Description |
|---|---|---|
Result |
The final, most refined optimization result. |
run_refinement(problem, prev_res, args, padding=0.25)
Legacy wrapper for a single refinement pass. Deprecated in favor of the recursive run_iterative_refinement.
networkmodel.scan
Hyperparameter Tuning Module (Optuna + Pymoo).
This module implements an "Outer Loop" optimization to tune the hyperparameters of the loss function itself (specifically, the regularization weights $\lambda_{protein}, \lambda_{RNA}, \dots$).
Architecture: Nested Optimization 1. Outer Loop (Optuna): Bayesian Optimization searches for the best set of objective weights ($\lambda$). 2. Inner Loop (Pymoo): For each set of weights, a short Genetic Algorithm (UNSGA3) runs to estimate how well the model could fit the data with those priorities.
Key Features:
* Live Dashboard: Launches the optuna-dashboard to visualize parameter importance and Pareto fronts in real-time.
* Pruning: Uses OptunaPruningCallback to terminate unpromising Pymoo runs early, saving compute time.
* Persistent Storage: Saves all trials to an SQLite database, enabling pause/resume functionality.
OptunaPruningCallback
Bases: Callback
Bridges Pymoo's internal generational loop with Optuna's pruning API.
This allows the outer Optuna loop to kill a running Pymoo optimization
if the intermediate results (at gen_step intervals) are significantly
worse than the median of previous trials.
BioObjective
The Function-Object (Functor) optimized by Optuna.
Represents one "Trial":
1. Receives hyperparameters ($\lambda$) from Optuna.
2. Constructs the GlobalODE_MOO problem with these weights.
3. Runs a reduced-scope Genetic Algorithm (UNSGA3).
4. Returns the best achieved error score.
run_hyperparameter_scan(args, sys, loss_data, defaults, time_grid, runner, slices, xl, xu)
Main driver for the Hyperparameter Scan.
Orchestrates the entire tuning process: 1. Sets up the persistent SQLite database. 2. Launches the Optuna Dashboard in a background thread. 3. Runs the optimization loop. 4. Exports results (Excel + Plots).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
args
|
Configuration arguments. |
required | |
sys
|
System object. |
required | |
loss_data
|
Pre-computed loss indices. |
required | |
defaults
|
Default parameters. |
required | |
time_grid
|
Simulation timepoints. |
required | |
runner
|
Parallel runner for Pymoo. |
required | |
slices
|
Parameter slices. |
required | |
xl, xu
|
Parameter bounds. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dict |
The best set of hyperparameters found. |
Analysis & Visualization
networkmodel.sensitivity
Global Sensitivity Analysis Module (Morris Method).
This module performs a comprehensive sensitivity analysis to determine which model parameters have the most significant impact on the system's output.
Methodology: It uses the Morris Method (Elementary Effects), which is computationally efficient for high-dimensional models. It works by: 1. Sampling: Generating $N$ trajectories through the parameter space, where each step changes one parameter. 2. Simulation: Running the ODE model for every sampled parameter set in parallel. 3. Analysis: Computing the mean absolute effect ($\mu^*$) and interaction/non-linearity ($\sigma$) for each parameter.
Outputs: * Sensitivity Indices: CSV ranking parameters by influence. * Perturbation Clouds: Visualizations showing how parameter uncertainty propagates to trajectory spread.
compute_bounds(params_dict, perturbation=SENSITIVITY_PERTURBATION)
Generates [lower, upper] bounds for each parameter based on a local perturbation.
For SALib, we define a hypercube around the optimal parameters found during calibration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params_dict
|
dict
|
The best-fit parameters. |
required |
perturbation
|
float
|
Fraction to vary parameters (e.g., 0.2 = +/- 20%). |
SENSITIVITY_PERTURBATION
|
Returns:
| Name | Type | Description |
|---|---|---|
dict |
Problem definition for SALib containing 'num_vars', 'names', and 'bounds'. |
run_sensitivity_analysis(sys, idx, fitted_params, output_dir, metric='total_signal')
Main driver for Morris Sensitivity Analysis.
Workflow:
1. **Define Problem:** Create bounds around `fitted_params`.
2. **Sample:** Generate Morris trajectories using SALib.
3. **Simulate:** Run parallel simulations for all samples.
4. **Analyze:** Compute Morris indices ($\mu^*$, $\sigma$).
5. **Visualize:** Plot influence bars and trajectory perturbation clouds.
[Image of parallel processing flow chart]
Args:
sys: The System object.
idx: Index object.
fitted_params: Dictionary of optimal parameters.
output_dir: Path to save results.
metric: Scalar metric for sensitivity target.
Returns:
pd.DataFrame: The computed sensitivity indices.
networkmodel.analysis
Steady-State Simulation and Analysis Module.
This script manages the simulation of the biological system to its steady state and performs a comprehensive post-simulation analysis. It includes two main functions:
simulate_until_steady: Integrates the ODEs over a long time horizon using log-spaced time steps to capture both fast initial kinetics and slow equilibration.plot_steady_state_all: A massive visualization and reporting pipeline that generates per-protein dynamic plots, convergence diagnostics, kinase dominance analysis, and global phosphorylation statistics.
This module is essential for establishing the baseline behavior of the cell model before perturbations (e.g., drug treatments) are applied.
simulate_until_steady(sys, t_max=1440.0, n_points=1000)
Simulates the system from t=0 to t_max (default 24h) to observe convergence.
Uses log-spacing for the time grid. This is crucial for biological systems where phosphorylation reactions happen in seconds/minutes, while transcriptional/translational changes happen over hours.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sys
|
System
|
The system object containing ODE definitions and parameters. |
required |
t_max
|
float
|
Maximum simulation time in minutes (default 1440m = 24h). |
1440.0
|
n_points
|
int
|
Number of time points in the evaluation grid. |
1000
|
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
(t_eval, Y) where t_eval is the time vector and Y is the state matrix. |
plot_steady_state_all(t, Y, sys, idx, output_dir)
Performs comprehensive analysis and plotting of the steady-state results.
This function generates: 1. Dynamic Plots: Time-series plots for RNA, Protein, and Phospho-sites for every protein. 2. Convergence Diagnostics: Histograms of derivatives at the final time point. 3. Phospho-fraction Summary: Table and plots of the % phosphorylated for all proteins. 4. Kinase Drive Analysis: Quantification of how much "signal" each kinase pushes into the network. 5. Dominance Analysis: Identifies which kinase is the primary driver for every single phosphorylation site. 6. Activity vs. Drive Scatter: Visualizes kinase efficiency (Active Conc vs. Network Output).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
t
|
ndarray
|
Time vector. |
required |
Y
|
ndarray
|
State matrix (time x variables). |
required |
sys
|
System
|
The system object (used for RHS evaluation and W matrix access). |
required |
idx
|
IndexMap
|
Object mapping names to state indices. |
required |
output_dir
|
str
|
Directory where results will be saved. |
required |
networkmodel.export
Exporting and Visualizing Simulation Results Module
Export and visualize simulation results from global model optimization.
This module provides functions for exporting and visualizing simulation results from global model optimization. It includes functions for exporting phosphorylation drive S, scanning prior regularization parameters, plotting phosphorylation drive S, generating diagnostic correlation plots, plotting residuals, and plotting boxplots of parameters across the Pareto front.
build_site_meta(idx)
Returns parallel arrays of length idx.total_sites.
Args:
idx: Index object containing model-specific information.
Returns
site_protein: np.ndarray of protein names for each site site_psite: np.ndarray of psite labels for each site site_local: np.ndarray of local site indices within each protein
save_pareto_3d(res, selected_solution=None, output_dir='out_moo')
Saves a high-quality 3D Scatter plot of the Pareto Front. Highlights the 'selected' balanced solution if provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
res
|
Optimization result object containing Pareto front data. |
required | |
selected_solution
|
Tuple of (fitness, decision variables) for the selected solution. |
None
|
|
output_dir
|
Directory where plots will be saved. |
'out_moo'
|
save_parallel_coordinates(res, selected_solution=None, output_dir='out_moo')
Saves a Parallel Coordinate Plot (PCP).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
res
|
Optimization result object containing Pareto front data. |
required | |
selected_solution
|
Tuple of (fitness, decision variables) for the selected solution. |
None
|
|
output_dir
|
Directory where plots will be saved. |
'out_moo'
|
create_convergence_video(res, output_dir='out_moo', filename='optimization_history.mp4')
Creates an animation of the Pareto Front evolution using standard Matplotlib.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
res
|
Optimization result object containing Pareto front data. |
required | |
output_dir
|
Directory where video will be saved. |
'out_moo'
|
|
filename
|
Name of the output video file. |
'optimization_history.mp4'
|
export_pareto_front_to_excel(res, sys, idx, slices, output_path, weights=(1.0, 1.0, 1.0), top_k_trajectories=None, t_points_p=None, t_points_r=None, t_points_ph=None, rtol=1e-05, atol=1e-07, mxstep=5000)
Export all Pareto solutions into one Excel workbook.
Writes sheets
- summary: objectives + scalar score + rank + weights
- params_genes: per-protein parameters for each sol_id (long format)
- params_kinases: per-kinase parameters for each sol_id (long format)
- traj_protein: trajectories per sol_id (long format)
- traj_rna: trajectories per sol_id (long format)
- traj_phospho: trajectories per sol_id (long format)
Notes
- This can get HUGE if you have many Pareto points. Use top_k_trajectories.
- The function assumes res.X and res.F exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
res
|
Optimization result object containing Pareto front data. |
required | |
sys
|
System object containing model-specific information. |
required | |
idx
|
Index object containing protein/kinase metadata. |
required | |
slices
|
Slices for trajectory data (e.g., time points for protein, RNA, and phospho data). |
required | |
output_path
|
Path to save the Excel workbook. |
required | |
top_k_trajectories
|
None = export trajectories for all solutions; else only top K by scalar score. |
None
|
|
t_points_p, t_points_r, t_points_ph
|
Optional time points for trajectory data. |
required |
plot_goodness_of_fit(df_prot_obs, df_prot_pred, df_rna_obs, df_rna_pred, df_phos_obs, df_phos_pred, output_dir, file_prefix='')
Plots goodness of fit for protein, RNA, and phosphorylation data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df_prot_obs
|
Protein observed data DataFrame. |
required | |
df_prot_pred
|
Protein predicted data DataFrame. |
required | |
df_rna_obs
|
RNA observed data DataFrame. |
required | |
df_rna_pred
|
RNA predicted data DataFrame. |
required | |
df_phos_obs
|
Phosphorylation observed data DataFrame. |
required | |
df_phos_pred
|
Phosphorylation predicted data DataFrame. |
required | |
output_dir
|
Directory where plots will be saved. |
required | |
file_prefix
|
Prefix for output file names. |
''
|
plot_gof_from_pareto_excel(excel_path: str, output_dir: str, plot_goodness_of_fit_func, df_prot_obs_all: pd.DataFrame, df_rna_obs_all: pd.DataFrame, df_phos_obs_all: pd.DataFrame, traj_protein_sheet: str = 'traj_protein', traj_rna_sheet: str = 'traj_rna', traj_phospho_sheet: str = 'traj_phospho', summary_sheet: str = 'summary', top_k: int = None, only_solutions=None, score_col: str = 'scalar_score')
Uses the Excel produced by export_pareto_front_to_excel to plot goodness of fit
- summary
- traj_protein (sol_id, protein, time, pred_fc)
- traj_rna (sol_id, protein, time, pred_fc)
- traj_phospho (sol_id, protein, psite, time, pred_fc)
export_results(sys, idx, df_prot_obs, df_rna_obs, df_phos_obs, df_pred_p, df_pred_r, df_pred_ph, output_dir)
Export pre-computed observed + predicted trajectories and model parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sys
|
System object containing model information. |
required | |
idx
|
Index of the solution to export. |
required | |
df_prot_obs
|
Protein observed data DataFrame. |
required | |
df_rna_obs
|
RNA observed data DataFrame. |
required | |
df_phos_obs
|
Phosphorylation observed data DataFrame. |
required | |
df_pred_p
|
Protein predicted data DataFrame. |
required | |
df_pred_r
|
RNA predicted data DataFrame. |
required | |
df_pred_ph
|
Phosphorylation predicted data DataFrame. |
required | |
output_dir
|
Directory where results will be saved. |
required |
save_gene_timeseries_plots(gene: str, df_prot_obs: pd.DataFrame, df_prot_pred: pd.DataFrame, df_rna_obs: pd.DataFrame, df_rna_pred: pd.DataFrame, df_phos_obs: pd.DataFrame, df_phos_pred: pd.DataFrame, output_dir: str, prot_times: np.ndarray = None, rna_times: np.ndarray = None, phos_times: np.ndarray = None, filename_prefix: str = 'ts', dpi: int = 300, phos_mode: str = 'per_psite', max_psites: int = None)
Save a 3-panel time-series plot for one protein link
- Protein observed vs predicted (fc vs fc_pred)
- RNA observed vs predicted
- Phosphorylation observed vs predicted (either mean across psites or per-psite lines
scan_prior_reg(out_dir)
Scan prior regularization parameters and save Pareto front plots.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
out_dir
|
Directory containing output files. |
required |
export_S_rates(sys, idx, output_dir, filename='S_rates_picked.csv', long=True)
Export phosphorylation drive S for optimized parameters. S is per-site and per time-runner (TIME_POINTS_PROTEIN / sys.kin_grid).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sys
|
System object containing model information. |
required | |
idx
|
Index of the solution to export. |
required | |
output_dir
|
Directory where results will be saved. |
required | |
filename
|
Name of the output file. |
'S_rates_picked.csv'
|
|
long
|
If True, output in long format (protein, psite, time, S); otherwise, wide format. |
True
|
plot_s_rates_report(csv_path: str | Path, out_pdf: str | Path = 'S_rates_report.pdf', *, time_col: str = 'time', value_col: str = 'S', protein_col: str = 'protein', psite_col: str = 'psite', log_x: bool = True, top_k_sites_per_protein: int | None = 24, max_sites_per_page: int = 12, ncols: int = 3, normalize_per_site: bool = False, heatmap_per_protein: bool = True, heatmap_cap_sites: int = 80, agg_duplicates: str = 'mean', dpi: int = 150) -> Path
Plot phosphorylation drive S for optimized parameters.
process_convergence_history(res, output_dir)
Extract and save optimization convergence history.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
res
|
Pymoo optimization result object with history attribute |
required | |
output_dir
|
Directory path to save outputs |
required |
Returns:
| Type | Description |
|---|---|
|
pd.DataFrame: Convergence history dataframe with columns [n_evals, gen, min_prot_mse, min_rna_mse, min_phos_mse] |
export_kinase_activities(sys, idx, output_dir, t_max=120, n_points=121)
Calculates and saves the effective Kinase Activity (Profile * c_k) over time.
export_param_correlations(res, slices, idx, output_dir, best_idx=None)
Plots kinase parameter correlation across Pareto front and gene parameter correlation for best solution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
res
|
Result object containing optimization results. |
required | |
slices
|
Dictionary of parameter slices. |
required | |
idx
|
Index object containing model indices. |
required | |
output_dir
|
Directory where results will be saved. |
required | |
best_idx
|
Index of the best solution to plot gene parameter correlations. |
None
|
export_residuals(sys, idx, df_prot, df_rna, df_phos, output_dir)
Calculates and plots residuals (Observed - Predicted) for the best solution. Helps identify systematic bias (e.g., 'Always under-predicts at t=10' or 'Always over-predicts at t=100').
export_parameter_distributions(res, slices, idx, output_dir)
Plots boxplots of parameters across the ENTIRE Pareto front. Shows which parameters are 'identifiable' (tight box) vs 'sloppy' (wide box).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
res
|
Optimization result object containing Pareto front data. |
required | |
slices
|
Dictionary mapping parameter names to their column indices in res.X. |
required | |
idx
|
Index object containing model-specific information. |
required | |
output_dir
|
Directory where plots will be saved. |
required |
networkmodel.dashboard_app
Dashboard application for PhosKinTime.
This module provides a Streamlit-based dashboard for visualizing and analyzing results from PhosKinTime simulations. It includes features for visualizing pareto frontiers, goodness of fit, residuals analysis, and more.
The dashboard is intended to be run from the command line, e.g.:
streamlit run run_dashboard.py -- --output_dir=/path/to/output
networkmodel.dashboard_bundle
Dashboard bundle saving and loading utilities.
This module provides functions to save and load a compact, dashboard-friendly bundle containing essential data for visualization. The bundle is designed to be lean, including only the necessary data for visualization, avoiding pickling of custom classes for robustness.
Variables stored in the bundle:
- args: command-line arguments
- picked_index: index of the protein picked for visualization
- frechet_scores: dictionary of Frechet distances for each protein
- lambdas: dictionary of lambda values for each protein
- solver_times: dictionary of solver times for each protein
- defaults: dictionary of default values for each protein
- slices: dictionary of slice values for each protein
- xl, xu: lower and upper bounds for each protein
Note that the bundle does not store the optimization result object (res) or the Pymoo algorithm (sys).
Utilities
networkmodel.utils
Utilities and Configuration Management Module.
This module provides essential helper functions and classes for:
1. Data Normalization: Functions to clean column names (_normcols), calculate fold-changes
relative to a baseline (normalize_fc_to_t0), and scale data for consistent modeling.
2. Configuration Loading: The load_config_toml function parses the config.toml file
into a structured PhosKinConfig dataclass, centralizing all run settings.
3. Dynamic Bound Calculation: calculate_bio_bounds intelligently sets parameter limits
based on the input data range and network topology, ensuring biological plausibility.
4. Math Helpers: JIT-compiled kernels for common operations like softplus (for positive parameters)
and time_bucket (for discrete input grids).
PhosKinConfig
dataclass
Immutable configuration object holding all settings for the simulation run.
Loaded from config.toml.
normalize_fc_to_t0(df)
Normalizes a Fold-Change (FC) DataFrame relative to the value at time t=0.
Logic: For each (protein, psite) group, find the FC at t=0. New_FC(t) = Raw_FC(t) / Raw_FC(0).
process_and_scale_raw_data(df, time_points, id_cols, scale_method='fc_start', epsilon=0.001)
Scales time-series data ensuring ALL outputs remain non-negative (>= 0).
This function handles the conversion from "Wide" format (columns x1, x2, ...) to "Long" format and applies the selected scaling transformation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
scale_method
|
str
|
|
'fc_start'
|
time_bucket(t, grid)
Finds the index j such that grid[j] <= t < grid[j+1].
Used for piecewise-constant input interpolation.
softplus(x)
Softplus activation: log(1 + exp(x)). Maps real numbers to positive numbers. Used for parameter transformation.
inv_softplus(y)
Inverse Softplus: log(exp(y) - 1). Maps positive numbers to real numbers.
pick_best_lamdas(F, weights)
Selects the best solution from a Pareto front based on a weighted sum of normalized objectives.
load_config_toml(path: str | Path) -> PhosKinConfig
Parses config.toml and returns a validated PhosKinConfig object.
Includes fallbacks and default values for missing keys.
get_parameter_labels(idx)
Generates descriptive labels for all parameters in the flattened decision vector.
calculate_bio_bounds(idx, df_prot, df_rna, tf_mat, kin_in)
Dynamically calculates optimization bounds by analyzing network topology, data dynamic ranges, and kinetic equilibrium requirements.
This ensures that the optimizer doesn't waste time searching parameter space regions that are mathematically impossible to yield the observed data (e.g., degradation rates that are too slow to match the observed decay).
Returns:
| Name | Type | Description |
|---|---|---|
dict |
Custom bounds for each parameter group. |
get_optimized_sets(idx, slices, xl, xu, eps=1e-14)
Identifies which entities (Proteins, Sites, Kinases) actually have free parameters being optimized (i.e., bounds are not collapsed).
Returns:
| Name | Type | Description |
|---|---|---|
opt_proteins |
set[str] proteins with any free variable among A/B/C/D/E |
|
opt_sites |
set[str] site labels like 'EGFR_Y1173' where Dp_i is free |
|
opt_kinases |
set[str] kinases with free c_k |
networkmodel.cache
Fast Loss Data Preparation Module.
This script handles the crucial pre-processing step for the optimization loop. Instead of performing slow dictionary lookups and string comparisons inside the loss function (which runs thousands of times), this module maps all experimental data (RNA, Protein, Phospho) onto integer-based grid indices once.
The output is a dictionary of contiguous NumPy arrays (indices, observations, weights) that can be passed directly to a fast JIT-compiled or Cython loss function.
prepare_fast_loss_data(idx, df_prot, df_rna, df_pho, time_grid)
Converts pandas DataFrames of experimental data into integer-based arrays aligned with the simulation state vector and time grid.
This function performs "pre-indexing":
1. Maps float timepoints to integer indices in time_grid.
2. Maps string protein names to integer state indices (p2i).
3. Maps specific phosphosites (e.g., "S473") to relative state offsets.
4. Constructs a prot_map for fast state slicing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
idx
|
IndexMap
|
Object containing mappings (p2i, sites, n_sites, etc.). |
required |
df_prot
|
DataFrame
|
Protein data columns [protein, time, fc, w]. |
required |
df_rna
|
DataFrame
|
RNA data columns [protein, time, fc, w]. |
required |
df_pho
|
DataFrame
|
Phospho data columns [protein, psite, time, fc, w]. |
required |
time_grid
|
array - like
|
The fixed time points of the simulation solver. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dict |
A dictionary containing keyed NumPy arrays (e.g., 'p_prot', 'obs_prot') ready for the fast loss function. |