Skip to content

API Reference

Data Standardization & Cleanup

processing.cleanup

process_collecttri()

Processes the CollecTRI file to clean and filter mRNA-TF interactions. Removes complex interactions, filters by target genes, and saves the result.

format_site(site)

Formats a phosphorylation site string.

If the input is NaN or an empty string, returns an empty string. If the input contains an underscore ('_'), splits the string into two parts, converts the first part to uppercase, and appends the second part unchanged. Otherwise, converts the entire string to uppercase.

Parameters:

Name Type Description Default
site str

The phosphorylation site string to format.

required

Returns:

Name Type Description
str

The formatted phosphorylation site string.

process_msgauss()

Processes the MS Gaussian data file to generate time series data.

process_msgauss_std()

Processes the MS Gaussian data file to compute transformed means and standard deviations.

process_routlimma()

Processes the Rout Limma table to generate time series data for mRNA.

update_gene_symbols(filename)

Updates the GeneID column in a CSV file by mapping GeneIDs to gene/protein symbols.

Parameters:

Name Type Description Default
filename str

The path to the CSV file to be updated. The file must contain a 'GeneID' column.

required

move_processed_files()

Moves or copies processed files to their respective directories.

Optimization Results Mapping

processing.map

map_optimization_results(tf_file_path, kin_file_path, sheet_name='Alpha Values')

Reads the TF-mRNA optimization results from an Excel file and maps mRNA to each TF.

Parameters:

Name Type Description Default
tf_file_path

Path to the Excel file containing TF-mRNA optimization results.

required
kin_file_path

Path to the Excel file containing Kinase-Phosphorylation optimization results.

required
sheet_name

The name of the sheet in the Excel file to read from. Default is 'Alpha Values'.

'Alpha Values'

Returns:

Type Description

pd.DataFrame: A DataFrame containing the mapped TF, mRNA, Psite, and Kinase information.

create_cytoscape_table(mapping_csv_path)

Creates a Cytoscape-compatible edge table from a mapping file.

Parameters:

Name Type Description Default
mapping_csv_path str

Path to the input CSV file with columns: TF, TF_strength, mRNA, Psite, Kinase, Kinase_strength

required

Returns:

Name Type Description
table DataFrame

Edge table with columns [Source, Target, Interaction, Strength]

add_kinetic_strength_columns(mapping_path, mapping__path, excel_path, suffix)

Adds kinetic strength columns to the mapping files based on the provided Excel file.

Parameters:

Name Type Description Default
mapping_path str

Path to the first mapping file.

required
mapping__path str

Path to the second mapping file.

required
excel_path str

Path to the Excel file containing kinetic strength data.

required
suffix str

Suffix to append to the output files.

required

generate_nodes(edge_df)

Infers node types and aggregates all phosphorylation sites per target node from phosphorylation edges.

Parameters:

Name Type Description Default
edge_df DataFrame

Must have columns ['Source', 'Target', 'Interaction', 'Psite']

required

Returns:

Type Description

pd.DataFrame: DataFrame with columns ['Node', 'Type', 'Psite']

Kinase-Phosphorylation Optimization

Evolutionary Algorithms

kinopt.evol.config.constants

kinopt.evol.config.logconf

ColoredFormatter

Bases: Formatter

format(record)

Format the log record with ANSI color codes and elapsed time.

Parameters:

Name Type Description Default
record LogRecord

The log record to format.

required

Returns: str: The formatted log message with ANSI color codes.

remove_ansi(s) staticmethod

Remove ANSI escape codes from a string.

Parameters:

Name Type Description Default
s str

The string from which to remove ANSI escape codes.

required

Returns: str: The string without ANSI escape codes.

setup_logger(name='phoskintime', log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5)

Function to set up a logger with both file and console handlers.

Parameters:

Name Type Description Default
name str

Name of the logger.

'phoskintime'
log_file str

Path to the log file. If None, a default path is generated.

None
level int

Logging level (e.g., logging.DEBUG, logging.INFO).

DEBUG
log_dir str

Directory where log files are stored.

LOG_DIR
rotate bool

Whether to use rotating file handler.

True
max_bytes int

Maximum size of log file before rotation.

2 * 1024 * 1024
backup_count int

Number of backup files to keep.

5

Returns:

Name Type Description
logger Logger

Configured logger instance.

kinopt.evol.exporter.plotout

plot_residuals_for_gene(gene, gene_data)

Generates and saves combined residual-related plots for one gene with all psites in the legend.

Parameters:

Name Type Description Default
gene str

Gene identifier.

required
gene_data dict

Dictionary with keys 'psites', 'observed', 'estimated', and 'residuals' containing data for all psites.

required
TIME_POINTS ndarray or list

Time points corresponding to the series.

required

opt_analyze_nsga(problem, result, F, pairs, approx_ideal, approx_nadir, asf_i, pseudo_i, n_evals, hv, hist, val, hist_cv_avg, k, igd, best_objectives, waterfall_df, convergence_df, alpha_values, beta_values)

Function to generate and save various plots related to optimization results.

Parameters:

Name Type Description Default
problem

The optimization problem instance.

required
result

The result of the optimization run.

required
F

Objective function values.

required
pairs

Pairs of objectives to plot.

required
approx_ideal

Approximate ideal point in objective space.

required
approx_nadir

Approximate nadir point in objective space.

required
asf_i

Index of the best solution in terms of the augmented weighted sum.

required
pseudo_i

Index of the pseudo weights.

required
n_evals

Number of evaluations at each generation.

required
hv

Hypervolume values.

required
hist

History of the optimization process.

required
val

Values for convergence plot.

required
hist_cv_avg

Average constraint violation history.

required
k

Number of generations.

required
igd

Inverted generational distance values.

required
best_objectives

Best objectives found during the optimization process.

required
waterfall_df

DataFrame containing waterfall plot data.

required
convergence_df

DataFrame containing convergence data.

required
alpha_values

Dictionary containing alpha values for parameters.

required
beta_values

Dictionary containing beta values for parameters.

required

Returns:

Type Description

None

opt_analyze_de(long_df, convergence_df, ordered_optimizer_runs, x_values, y_values, val)

Function to generate and save various plots related to optimization results.

Parameters:

Name Type Description Default
long_df DataFrame

DataFrame containing parameter values and objective function values.

required
convergence_df DataFrame

DataFrame containing convergence data.

required
ordered_optimizer_runs DataFrame

DataFrame containing ordered optimizer runs.

required
x_values list

X-axis values for the waterfall plot.

required
y_values list

Y-axis values for the waterfall plot.

required
val list

Values for the convergence plot.

required

Returns:

Type Description

None

kinopt.evol.exporter.sheetutils

output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, timepoints, OUT_FILE)

Function to output results to an Excel file.

Parameters:

Name Type Description Default
P_initial dict

Dictionary with initial parameters.

required
P_init_dense ndarray

Dense matrix of initial parameters.

required
P_estimated ndarray

Dense matrix of estimated parameters.

required
residuals ndarray

Dense matrix of residuals.

required
alpha_values dict

Dictionary with alpha values.

required
beta_values dict

Dictionary with beta values.

required
result str

Result string for logging.

required
timepoints list

List of time points.

required
OUT_FILE str

Output file path.

required

Returns:

Type Description

None

kinopt.evol.objfn.minfndiffevo

PhosphorylationOptimizationProblem

Bases: ElementwiseProblem

Single-objective constrained optimization problem for phosphorylation dynamics (Numba-accelerated).

Minimizes loss between observed and predicted phosphorylation levels subject to constraints that alpha and beta weights sum to 1.0 for each gene-psite and kinase group, respectively.

Objective
  • minimize loss (MSE, autocorrelation, Huber, or MAPE)

Constraints g(x) <= 0: - for each alpha group: |sum(alpha_group) - 1| <= eps_eq - for each kinase beta group: |sum(beta_group) - 1| <= eps_eq

Attributes:

Name Type Description
P_initial dict

Dictionary mapping (gene, psite) tuples to data dictionaries.

P_initial_array ndarray

Observed phosphorylation matrix with shape (i_max, t_max).

K_index dict

Dictionary mapping kinase names to lists of (psite_label, row_idx) tuples.

K_array ndarray

Kinase activity matrix with shape (n_k_rows, t_max).

gp_offsets ndarray

Offset indices for gene-psite groups.

gp_kinase_ids ndarray

Kinase IDs for alpha variables.

k_offsets ndarray

Offset indices for kinase groups.

k_psite_rows ndarray

Psite row indices for beta variables.

num_alpha int

Total number of alpha variables.

num_beta int

Total number of beta variables.

eps_eq float

Tolerance for equality constraints.

loss_id int

Loss type identifier.

include_reg bool

Whether to include regularization.

n_scalar float

Scalar factor for normalization.

estimated_series(params)

Compute estimated phosphorylation series for given parameters.

Parameters:

Name Type Description Default
params array - like

1D array of parameters [alpha_1, ..., alpha_N, beta_1, ..., beta_M].

required

Returns:

Name Type Description
ndarray

Predicted phosphorylation matrix with shape (i_max, t_max).

residuals(params)

Compute residuals between observed and estimated phosphorylation for given parameters.

Parameters:

Name Type Description Default
params array - like

1D array of parameters [alpha_1, ..., alpha_N, beta_1, ..., beta_M].

required

Returns:

Name Type Description
ndarray

Residual matrix (observed - estimated) with shape (i_max, t_max).

kinopt.evol.objfn.minfnnsgaii

PhosphorylationOptimizationProblem

Bases: ElementwiseProblem

Multi-objective optimization

F[0] = main loss (error) F[1] = alpha sum-to-1 violations (aggregated) F[2] = beta sum-to-1 violations (aggregated)

Parameters:

Name Type Description Default
P_initial dict

Dictionary with keys as (gene, psite) and values containing 'Kinases' and 'TimeSeries'.

required
P_initial_array ndarray

Array of observed gene-psite data.

required
K_index dict

Dictionary mapping each kinase to a list of (psite, time_series) tuples.

required
K_array ndarray

Array of kinase-psite time-series data.

required
gene_psite_counts list

List of integers indicating the number of kinases associated with each gene-psite.

required
beta_counts dict

Dictionary indicating how many beta values correspond to each kinase-psite combination.

required

objective_function(params)

Computes the main objective function (loss) for the given parameters.

Parameters:

Name Type Description Default
params ndarray

Parameter vector containing alpha and beta values.

required

Returns:

Name Type Description
float

Computed loss value based on the selected loss type (base, autocorrelation, huber, or mape).

kinopt.evol.opt.optrun

choose_de_pop_size(problem)

Determine an appropriate population size for Differential Evolution (DE) algorithms.

The population size is calculated based on the number of decision variables, with bounds to ensure reasonable performance. DE algorithms benefit from population sizes that are multiples of 10.

Parameters:

Name Type Description Default
problem

The optimization problem instance with an 'n_var' attribute indicating the number of decision variables.

required

Returns:

Name Type Description
int

The calculated population size (multiple of 10, between 100 and 600).

choose_nsga_pop_size(problem, n_obj=3)

Determine an appropriate population size for NSGA-based multi-objective algorithms.

The population size is scaled based on the problem dimensionality (number of decision variables) with heuristic thresholds. The size is rounded to multiples of 50 and enforced to be at least 10 times the number of objectives.

Parameters:

Name Type Description Default
problem

The optimization problem instance with an 'n_var' attribute indicating the number of decision variables.

required
n_obj int

Number of objectives in the problem. Defaults to 3.

3

Returns:

Name Type Description
int

The calculated population size (multiple of 50, at least 10*n_obj).

binary_tournament_loss_cv(pop, P, eps_cv=1e-10, cv_mode='linf', **kwargs)

Robust binary tournament comparator for constrained optimization.

This function performs binary tournament selection with constraint handling using either true constraint violations (CV) or pseudo-constrained objectives. It supports both single-objective and multi-objective formulations.

Works for

A) single-objective: F has length 1 - if CV exists, use constraint-domination (CV first, then F) - else compare by F only B) pseudo-constrained objectives: F = [loss, alpha_violation, beta_violation] - feasibility-first based on F[1], F[2], then loss

Parameters:

Name Type Description Default
pop

Population of individuals with 'F' (objectives) and optionally 'CV' attributes.

required
P ndarray

Tournament pairs array of shape (n_tournaments, 2), where each row contains indices of two competing individuals.

required
eps_cv float

Feasibility tolerance for constraint violations. Defaults to 1e-10.

1e-10
cv_mode str

Mode for aggregating constraint violations when using pseudo-constrained objectives. Options: 'linf' (max), 'l1' (sum), 'l2' (norm). Defaults to "linf".

'linf'
**kwargs

Additional keyword arguments (unused, for compatibility).

{}

Returns:

Type Description

np.ndarray: Array of winning indices for each tournament, shape (n_tournaments,).

Raises:

Type Description
ValueError

If pressure is not 2 (only binary tournaments supported) or if cv_mode is not one of 'linf', 'l1', 'l2'.

run_optimization(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, PhosphorylationOptimizationProblem)

Sets up and runs the multi-objective optimization problem for phosphorylation using an NSGA2 algorithm and a thread pool for parallelization.

Parameters:

Name Type Description Default
P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts

Data structures describing the problem (time-series data, kinases, etc.).

required
PhosphorylationOptimizationProblem class

The custom problem class to be instantiated.

required

Returns:

Name Type Description
result

The pymoo result object containing the optimized population and history.

exec_time

Execution time for the optimization.

pick_best_loss_with_constraints_as_objectives(result, eps_cv=1e-10, cv_mode='l1', tie_tol=1e-12, tie_break='loss_then_l2')

Select the best solution from a population with constraints formulated as objectives.

This function assumes a specific objective structure where

F[:,0] = loss (minimize) F[:,1] = constraint violation 1 (minimize, ideally 0) F[:,2] = constraint violation 2 (minimize, ideally 0)

Selection rule

A) If any feasible solutions exist (cv1<=eps and cv2<=eps): choose minimum loss among feasible. B) Else: choose minimum aggregated CV; tie-break by loss; optional tie-break by ||X||2.

Parameters:

Name Type Description Default
result

Pymoo result object containing the final population with 'F' and 'X' attributes.

required
eps_cv float

Feasibility tolerance for constraint violations. Defaults to 1e-10.

1e-10
cv_mode str

Mode for aggregating constraint violations. Options: 'l1' (sum), 'linf' (max), 'l2' (Euclidean norm). Defaults to "l1".

'l1'
tie_tol float

Tolerance for considering values as tied. Defaults to 1e-12.

1e-12
tie_break str

Tie-breaking strategy. Options: 'loss_then_l2' or 'loss_only'. Defaults to "loss_then_l2".

'loss_then_l2'

Returns:

Name Type Description
tuple

A tuple containing: - best_solution: The selected individual from the population. - best_index_in_pop (int): The index of the best solution in the population. - info (dict): Dictionary with selection metadata including selection case, number of feasible solutions, and best objective values.

Raises:

Type Description
ValueError

If the objective array has fewer than 3 columns or if cv_mode is not one of 'l1', 'linf', 'l2'.

post_optimization_nsga(result, weights=np.array([1.0, 1.0, 1.0]), ref_point=np.array([3, 1, 1]))

Post-process the result of a multi-objective NSGA-based optimization run.

This function analyzes the optimization history, computes convergence metrics (hypervolume, IGD+), identifies the best solution using constraint handling, and generates CSV reports for convergence and parameter scans.

Parameters:

Name Type Description Default
result

The final result object from the pymoo optimizer containing the final population, history, and objective values.

required
weights ndarray

Array of length 3 for weighting the objectives in decomposition-based selection. Defaults to [1.0, 1.0, 1.0].

array([1.0, 1.0, 1.0])
ref_point ndarray

Reference point for hypervolume computation. Defaults to [3, 1, 1].

array([3, 1, 1])

Returns:

Name Type Description
tuple

A tuple containing 23 elements: - F: Final objective values array - pairs: Objective pairs for plotting [(0,1), (0,2), (1,2)] - n_evals: Number of evaluations per generation - hist_cv: Minimum constraint violation per generation - hist_cv_avg: Average constraint violation per generation - k: Generation index when first feasible solution appeared - metric_igd: IGDPlus metric object - metric_hv: Hypervolume metric object - best_solution: The selected best individual - best_objectives: Objective vector of best solution - optimized_params: Decision variables (X) of best solution - approx_nadir: Approximate nadir point - approx_ideal: Approximate ideal point - scores: Best solution's objective scores - best_index: Index of best solution in population - hist: Full optimization history - hist_hv: Hypervolume values per generation - hist_igd: IGD+ values per generation - convergence_df: DataFrame with iteration vs best objective - waterfall_df: DataFrame with all solutions and parameters - asf_i: Index of best solution by ASF decomposition - pseudo_weights_result: Result of pseudo-weights MCDM method - pairs (duplicate): Objective pairs - val: Best objective value per generation

post_optimization_de(result, alpha_values, beta_values)

Post-process the result of a single-objective DE or GA optimization run.

This function extracts the final population, creates parameter labels from alpha and beta values, generates a parameter scan DataFrame sorted by objective value, and produces a convergence DataFrame showing the best objective per iteration.

Parameters:

Name Type Description Default
result

The final result object from the pymoo optimizer (e.g., GA or DE result) containing the population, history, and objective values.

required
alpha_values dict

Dictionary mapping (gene, psite) tuples to dictionaries of {kinase: value} for alpha parameters.

required
beta_values dict

Dictionary mapping (kinase, psite) tuples to beta parameter values.

required

Returns:

Name Type Description
tuple

A tuple containing 6 elements: - ordered_optimizer_runs: DataFrame of all solutions sorted by objective value - convergence_df: DataFrame with iteration vs best objective value - long_df: Long-form DataFrame for parameter visualization with columns ['Individual', 'Objective Value (F)', 'Parameter', 'Parameter Value', 'Type'] - x_values: List of iteration indices selected for plotting - y_values: List of objective values corresponding to x_values - val: Best objective value per generation from history

kinopt.evol.optcon.construct

pipeline(input1_path: str, input2_path: str, time_series_columns: list[str], scaling_method: str, split_point: float, segment_points: list[float], estimate_missing_kinases: bool, kinase_to_psites: dict[str, int])

Function to run the entire pipeline for loading and processing data.

Parameters:

Name Type Description Default
input1_path str

Path to the first CSV file (HGNC data).

required
input2_path str

Path to the second CSV file (kinase interactions).

required
time_series_columns list[str]

List of time series columns to extract.

required
scaling_method str

Method for scaling the data.

required
split_point float

Split point for scaling.

required
segment_points list[float]

Segment points for scaling.

required
estimate_missing_kinases bool

Flag to estimate missing kinases.

required
kinase_to_psites dict[str, int]

Dictionary mapping kinases to their respective psites.

required

Returns:

Name Type Description
full_hgnc_df DataFrame

The scaled data from input1.

interaction_df DataFrame

The subset/merged DataFrame from input2.

observed DataFrame

Subset of full_hgnc_df merged with interaction_df.

P_initial dict

Dictionary mapping gene-psite pairs to kinase relationships and time-series data.

P_initial_array ndarray

Array containing observed time-series data for gene-psite pairs.

K_array ndarray

Array containing time-series data for kinase-psite combinations.

K_index dict

Mapping of kinases to their respective psite data.

beta_counts dict

Mapping of kinase indices to the number of associated psites.

gene_psite_counts list

List of counts of psites for each gene.

n int

Number of unique gene-psite pairs.

load_geneid_to_psites(input1_path=INPUT1)

Function to load geneid to psite mapping from input1.csv. Args: input1_path (str): Path to the first CSV file (HGNC data). Returns: geneid_psite_map (dict): Dictionary mapping gene IDs to sets of psites.

get_unique_kinases(input2_path=INPUT2)

Function to extract unique kinases from input2.csv. Args: input2_path (str): Path to the second CSV file (kinase interactions). Returns: kinases (set): Set of unique kinases extracted from the input2 file.

check_kinases()

Function to check if kinases from input2.csv are present in input1.csv.

Returns:

Type Description

None

kinopt.evol.utils.iodata

format_duration(seconds)

Returns a formatted string representing the duration in seconds, minutes, or hours.

Parameters:

Name Type Description Default
seconds float

The duration in seconds.

required

Returns: str: The formatted duration string.

load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)

Function to load and scale data from CSV files.

Parameters:

Name Type Description Default
estimate_missing bool

If True, estimates missing values.

required
scaling_method str

The scaling method to apply ('min_max', 'log', 'temporal', 'segmented', 'slope', 'cumulative').

required
split_point int

Column index for temporal scaling.

required
seg_points list

List of column indices for segmented scaling.

required

Returns:

Name Type Description
full_hgnc_df DataFrame

DataFrame with scaled time-series data.

interaction_df DataFrame

DataFrame containing interaction data.

observed DataFrame

DataFrame containing observed data.

apply_scaling(df, time_series_columns, method, split_point, segment_points)

Function to apply different scaling methods to time-series data in a DataFrame.

Parameters:

Name Type Description Default
df DataFrame

Input DataFrame containing time-series data.

required
time_series_columns list

List of column names to scale.

required
method str

Scaling method ('min_max', 'log', 'temporal', 'segmented', 'slope', 'cumulative').

required
split_point int

Column index for temporal scaling.

required
segment_points list

List of column indices for segmented scaling.

required

Returns:

Type Description

pd.DataFrame: DataFrame with scaled time-series data.

create_report(results_dir: str, output_file: str = 'report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

Parameters:

Name Type Description Default
results_dir str

Path to the root result's directory.

required
output_file str

Name of the generated global report file (placed inside results_dir).

'report.html'

Returns:

Type Description

None

organize_output_files(*directories)

Function to organize output files into protein-specific folders and a general folder.

Parameters:

Name Type Description Default
*directories

List of directories to organize.

()

Returns:

Type Description

None

kinopt.evol.utils.params

extract_parameters(P_initial, gene_psite_counts, K_index, optimized_params)

Function to extract alpha and beta values from the optimized parameters.

Parameters:

Name Type Description Default
P_initial dict

Dictionary containing initial parameters for each gene-psite pair.

required
gene_psite_counts list

List of counts for each gene-psite pair.

required
K_index dict

Dictionary mapping kinases to their respective psite pairs.

required
optimized_params list

List of optimized parameters.

required

Returns:

Name Type Description
alpha_values dict

Dictionary containing alpha values for each gene-psite pair.

beta_values dict

Dictionary containing beta values for each kinase-psite pair.

compute_metrics(optimized_params: np.ndarray, P_initial: dict, P_initial_array: np.ndarray, K_index: dict, K_array: np.ndarray, gene_psite_counts: list, beta_counts: dict, n: int)

Function to compute error metrics for the estimated series.

Parameters:

Name Type Description Default
optimized_params list

List of optimized parameters.

required
P_initial dict

Dictionary containing initial parameters for each gene-psite pair.

required
P_initial_array ndarray

Array of initial parameters.

required
K_index dict

Dictionary mapping kinases to their respective psite pairs.

required
K_array ndarray

Array of kinases.

required
gene_psite_counts list

List of counts for each gene-psite pair.

required
beta_counts dict

List of counts for each kinase-psite pair.

required
n int

Number of samples.

required

Returns:

Name Type Description
P_estimated ndarray

Estimated series.

residuals ndarray

Residuals between initial and estimated series.

mse float

Mean Squared Error.

rmse float

Root Mean Squared Error.

mae float

Mean Absolute Error.

mape float

Mean Absolute Percentage Error.

r_squared float

R-squared value.

Gradient-Based Algorithms

kinopt.local.config.constants

parse_args()

kinopt.local CLI. Defaults come from config.toml.

kinopt.local.config.logconf

kinopt.local.exporter.plotout

format_timepoints(tp, tol=1e-09)

Format timepoints with minimal decimals: - integers -> no decimal - non-integers -> one decimal

Parameters:

Name Type Description Default
tp array - like

Timepoints (list or np.ndarray)

required
tol float

Tolerance for floating-point integer check

1e-09

Returns:

Type Description

list[str]: Formatted labels

plot_fits_for_gene(gene, gene_data, real_timepoints)

Function to plot the observed and estimated phosphorylation levels for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing observed and estimated data for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

export_outcomes_to_csv(outcomes, csv_path)

Export multistart optimization outcomes to CSV.

One row per start, scalar diagnostics only.

plot_cumulative_residuals(gene, gene_data, real_timepoints)

Function to plot the cumulative residuals for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing the residuals for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

plot_autocorrelation_residuals(gene, gene_data, real_timepoints)

Function to plot the autocorrelation of residuals for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing the residuals for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

plot_histogram_residuals(gene, gene_data, real_timepoints)

Function to plot histograms of residuals for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing the residuals for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

plot_qqplot_residuals(gene, gene_data, real_timepoints)

Function to plot QQ plots of residuals for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing the residuals for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)

Read a multistart summary CSV and plot objective vs rank with point color = runtime.

Minimal, information-dense conventions: - x: rank (best -> worst) - y: final objective (fun) - color: runtime in seconds - optional: de-emphasize non-success / infeasible points (if columns exist)

Parameters:

Name Type Description Default
summary_csv str | Path

Path to the multistart_summary.csv

required
out_path str | Path | None

If provided, saves the figure (e.g. .png)

None
figsize tuple

Figure size in inches

(8, 8)
x_col, y_col, c_col

Column names

required
success_col, cv_col

Optional columns for styling (used if present)

required
annotate_best bool

Annotate the best run (rank=1 or min fun)

True

Returns:

Type Description
(fig, ax, df)

Matplotlib figure/axis and the loaded DataFrame

kinopt.local.exporter.sheetutils

format_timepoints(tp, tol=1e-09)

Format timepoints with minimal decimals: - integers -> no decimal - non-integers -> one decimal

Parameters:

Name Type Description Default
tp array - like

Timepoints (list or np.ndarray)

required
tol float

Tolerance for floating-point integer check

1e-09

Returns:

Type Description

list[str]: Formatted labels

plot_fits_for_gene(gene, gene_data, real_timepoints)

Function to plot the observed and estimated phosphorylation levels for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing observed and estimated data for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

export_outcomes_to_csv(outcomes, csv_path)

Export multistart optimization outcomes to CSV.

One row per start, scalar diagnostics only.

plot_cumulative_residuals(gene, gene_data, real_timepoints)

Function to plot the cumulative residuals for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing the residuals for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

plot_autocorrelation_residuals(gene, gene_data, real_timepoints)

Function to plot the autocorrelation of residuals for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing the residuals for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

plot_histogram_residuals(gene, gene_data, real_timepoints)

Function to plot histograms of residuals for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing the residuals for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

plot_qqplot_residuals(gene, gene_data, real_timepoints)

Function to plot QQ plots of residuals for each psite of a gene.

Parameters:

Name Type Description Default
gene str

The name of the gene.

required
gene_data dict

A dictionary containing the residuals for each psite of the gene.

required
real_timepoints list

A list of timepoints corresponding to the observed and estimated data.

required

plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)

Read a multistart summary CSV and plot objective vs rank with point color = runtime.

Minimal, information-dense conventions: - x: rank (best -> worst) - y: final objective (fun) - color: runtime in seconds - optional: de-emphasize non-success / infeasible points (if columns exist)

Parameters:

Name Type Description Default
summary_csv str | Path

Path to the multistart_summary.csv

required
out_path str | Path | None

If provided, saves the figure (e.g. .png)

None
figsize tuple

Figure size in inches

(8, 8)
x_col, y_col, c_col

Column names

required
success_col, cv_col

Optional columns for styling (used if present)

required
annotate_best bool

Annotate the best run (rank=1 or min fun)

True

Returns:

Type Description
(fig, ax, df)

Matplotlib figure/axis and the loaded DataFrame

output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, mse, rmse, mae, mape, r_squared)

Function to output the results of the optimization process.

Parameters:

Name Type Description Default
P_initial dict

Dictionary containing initial phosphorylation data.

required
P_init_dense ndarray

Dense matrix of initial phosphorylation data.

required
P_estimated ndarray

Dense matrix of estimated phosphorylation data.

required
residuals ndarray

Dense matrix of residuals.

required
alpha_values dict

Dictionary containing optimized alpha values.

required
beta_values dict

Dictionary containing optimized beta values.

required
result OptimizeResult

Result object from the optimization process.

required
mse float

Mean Squared Error of the optimization.

required
rmse float

Root Mean Squared Error of the optimization.

required
mae float

Mean Absolute Error of the optimization.

required
mape float

Mean Absolute Percentage Error of the optimization.

required
r_squared float

R-squared value of the optimization.

required

export_params_npz(outcomes, path)

Export the optimized parameters to a compressed npz file.

Parameters:

Name Type Description Default
outcomes list

List of OptimizeResult objects.

required
path str

Path to save the npz file.

required

kinopt.local.objfn.minfn

kinopt.local.opt.optrun

StartOutcome dataclass

Outcome of a single optimization start.

:param start_id: ID of the start. :param seed: Seed used for the start. :param result: Result of the optimization. :param optimized_params: Optimized parameters. :param fun: Objective function value. :param success: Whether the optimization was successful. :param constr_violation: Constraint violation. :param runtime_s: Runtime of the optimization.

run_optimization(obj_fun, params_initial, opt_method, bounds, constraints)

Run optimization using the specified method.

Parameters:

Name Type Description Default
obj_fun

Objective function to minimize.

required
params_initial

Initial parameters for the optimization.

required
opt_method

Optimization method to use (e.g., 'SLSQP', 'trust-constr').

required
bounds

Bounds for the parameters.

required
constraints

Constraints for the optimization.

required

Returns:

Name Type Description
result

Result of the optimization.

optimized_params

Optimized parameters.

multistart_run_optimization(obj_fun, params_initial, opt_method, bounds, constraints, n_starts=24, n_jobs=-1, base_seed=1234, init_strategy='hybrid', jitter_scale=0.15, prefer_feasible=True, logger=None)

Runs run_optimization multiple times in parallel and returns (best_result, best_params, outcomes).

Selection logic (sophisticated but simple): 1) If prefer_feasible: prefer (cv <= 0) or smallest constraint violation. 2) Then lowest objective. 3) Then success=True as tie-breaker. 4) Then shortest runtime as final tie-breaker.

Parameters:

Name Type Description Default
obj_fun

Objective function to optimize.

required
params_initial

Initial parameters for optimization.

required
opt_method

Optimization method to use (e.g., 'SLSQP', 'trust-constr').

required
bounds

Parameter bounds for optimization.

required
constraints

Constraints for optimization.

required
n_starts

Number of optimization starts to run (default: 24).

24
n_jobs

Number of parallel jobs to run. -1 means use all processors (default: -1).

-1
base_seed

Base seed for random number generation (default: 1234).

1234
init_strategy

Strategy for sampling initial parameters: 'jitter', 'uniform', or 'hybrid' (default: 'hybrid').

'hybrid'
jitter_scale

Scale for jittering initial parameters (default: 0.15).

0.15
prefer_feasible

If True, prefer feasible solutions over infeasible ones (default: True).

True
logger

Logger instance for logging messages (default: None).

None

Returns:

Name Type Description
tuple

A tuple containing: - best_result: The optimization result object with the best outcome. - best_params: The optimized parameters corresponding to the best result. - outcomes: List of StartOutcome objects for all optimization starts.

kinopt.local.optcon.construct

load_geneid_to_psites(input1_path=INPUT1)

Load the geneid to psite mapping from a CSV file.

Parameters:

Name Type Description Default
input1_path str

Path to the input CSV file containing geneid and psite information.

INPUT1

Returns: defaultdict: A dictionary mapping geneid to a set of psites.

get_unique_kinases(input2_path=INPUT2)

Extract unique kinases from the input CSV file.

Parameters:

Name Type Description Default
input2_path str

Path to the input CSV file containing kinase information.

INPUT2

Returns: set: A set of unique kinases.

check_kinases()

Check if kinases in input2.csv are present in input1.csv and log the results.

kinopt.local.utils.iodata

format_duration(seconds)

Formats a duration in seconds into a human-readable string. - If less than 60 seconds, returns in seconds. - If less than 3600 seconds, returns in minutes. - If more than 3600 seconds, returns in hours.

:param seconds: :return: Formatted string

load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)

Load and scale the data from the specified input files.

:param estimate_missing: :param scaling_method: :param split_point: :param seg_points: :return: Time series data, interaction data, observed data

apply_scaling(df, cols, method, split_point, seg_points)

Apply scaling to the specified columns of a DataFrame based on the given method. The scaling methods include: - 'min_max': Min-Max scaling - 'log': Logarithmic scaling - 'temporal': Temporal scaling (two segments) - 'segmented': Segmented scaling (multiple segments) - 'slope': Slope scaling - 'cumulative': Cumulative scaling

:param df: :param cols: :param method: :param split_point: :param seg_points: :return: df

create_report(results_dir: str, output_file: str = 'report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

For each gene folder (e.g. "ABL2"), the report will include: - All PNG plots and interactive HTML plots displayed in a grid with three plots per row. - Each plot is confined to a fixed size of 900px by 900px. - Data tables from XLSX or CSV files in the gene folder are displayed below the plots, one per row.

Parameters:

Name Type Description Default
results_dir str

Path to the root results directory.

required
output_file str

Name of the generated global report file (placed inside results_dir).

'report.html'

organize_output_files(*directories)

Function to organize output files into protein-specific folders. It moves files matching the pattern 'protein_name_*.{json,svg,png,html,csv,xlsx}' into a folder named after the protein (e.g., 'ABL2') and moves all other files into a 'General' folder within the same directory.

:param directories:

kinopt.local.utils.params

extract_parameters(P_initial, gene_kinase_counts, total_alpha, unique_kinases, K_index, optimized_params)

Extracts the alpha and beta parameters from the optimized parameters.

:param P_initial: :param gene_kinase_counts: :param total_alpha: :param unique_kinases: :param K_index: :param optimized_params: :return: Alpha and beta values as dictionaries

compute_metrics(optimized_params, P_init_dense, t_max, gene_alpha_starts, gene_kinase_counts, gene_kinase_idx, total_alpha, kinase_beta_starts, kinase_beta_counts, K_data, K_indices, K_indptr)

Computes the estimated series and various metrics based on the optimized parameters.

:param optimized_params: :param P_init_dense: :param t_max: :param gene_alpha_starts: :param gene_kinase_counts: :param gene_kinase_idx: :param total_alpha: :param kinase_beta_starts: :param kinase_beta_counts: :param K_data: :param K_indices: :param K_indptr: :return: Estimated series, residuals, MSE, RMSE, MAE, MAPE, R-squared

Fitting Analysis & Feasibility

kinopt.fitanalysis.helpers.postfit

goodnessoffit(estimated, observed)

Function to plot the goodness of fit and kullback-leibler divergence for estimated and observed values.

Parameters:

Name Type Description Default
estimated DataFrame

DataFrame containing estimated values.

required
observed DataFrame

DataFrame containing observed values.

required

Returns:

Type Description

None

reshape_alpha_beta(alpha_values, beta_values)

Function to reshape alpha and beta values for plotting.

Parameters:

Name Type Description Default
alpha_values DataFrame

DataFrame containing alpha values.

required
beta_values DataFrame

DataFrame containing beta values.

required

Returns: pd.DataFrame: Reshaped DataFrame containing both alpha and beta values.

perform_pca(df)

Function to perform PCA analysis on the given DataFrame.

Parameters:

Name Type Description Default
df DataFrame

DataFrame containing the data for PCA analysis.

required

Returns:

Type Description

pd.DataFrame: DataFrame with PCA results and additional columns for type and gene/psite information.

plot_pca(result_df_sorted, y_axis_column)

Plot PCA or t-SNE results for each gene/psite. The function creates scatter plots with different markers for alpha and beta parameters, and adds labels for each point. The function also adjusts text labels to avoid overlap using the adjustText library.

:param result_df_sorted: DataFrame containing PCA or t-SNE results. :param y_axis_column: Column name for the y-axis values in the plot.

perform_tsne(scaled_data, df)

Perform t-SNE analysis on the given scaled data. The function returns a DataFrame with t-SNE results and additional columns for type and gene/psite information.

:param scaled_data: :param df:

:return: - pd.DataFrame: DataFrame with t-SNE results and additional columns.

additional_plots(df, scaled_data, alpha_values, beta_values, residuals_df)

Function to create additional plots including CDF, KDE, Boxplot, and Hierarchical Clustering.

:param df: :param scaled_data: :param alpha_values: :param beta_values: :param residuals_df:

create_sankey_from_network(output_dir, data, title)

Creates a Sankey diagram from the given data and saves it as an HTML file.

This function processes the input data to generate nodes and links for a Sankey diagram. It assigns colors to nodes and links based on their attributes and values, and uses Plotly to render the diagram. The resulting diagram is saved as an HTML file in the specified output directory.

:param output_dir: str The directory where the Sankey diagram HTML file will be saved. :param data: pd.DataFrame A DataFrame containing the data for the Sankey diagram. It must include the following columns: - 'Source': The source node of the link. - 'Target': The target node of the link. - 'Value': The value of the link, which determines the flow size. :param title: str The title of the Sankey diagram.

The function performs the following steps: 1. Initializes nodes and links for the Sankey diagram. 2. Maps node labels to indices and assigns colors to nodes. 3. Processes the data to create links between nodes, assigning colors based on link values. 4. Builds the Sankey diagram using Plotly. 5. Adds a color bar to explain the flow gradient. 6. Saves the Sankey diagram as an HTML file in the specified output directory.

important_connections(output_dir, data, top_n=20)

Extracts the top N most important connections based on their absolute values and saves them to a CSV file.

:param output_dir: str The directory where the CSV file will be saved. :param data: pd.DataFrame A DataFrame containing the connections with columns 'Source', 'Target', and 'Value'. :param top_n: int, optional The number of top connections to extract (default is 20).

The function sorts the connections by their absolute values in descending order, selects the top N connections, and saves them to a CSV file named 'top_connections.csv' in the specified output directory.

kinopt.optimality.KKT

generate_latex_table(summary_dict, table_caption, table=None)

Function to generate a LaTeX table from a summary dictionary.

Parameters:

Name Type Description Default
summary_dict dict

Dictionary containing summary data.

required
table_caption str

Caption for the LaTeX table.

required
table str

Optional existing LaTeX table to append to.

None

Returns:

Name Type Description
str

LaTeX formatted table as a string.

print_primal_feasibility_results(primal_summary, alpha_violations, beta_violations, logger_obj=None)

Logs the primal feasibility summary and violation details.

Parameters:

Name Type Description Default
primal_summary dict

Dictionary containing primal feasibility results.

required
alpha_violations dict

Dictionary containing alpha constraint violations.

required
beta_violations dict

Dictionary containing beta constraint violations.

required
logger_obj

Optional logger object to log the information.

None

print_sensitivity_and_active_constraints(sensitivity_summary, active_constraints_summary, logger_obj=None)

Logs the sensitivity summary and active constraints summary.

Parameters:

Name Type Description Default
sensitivity_summary dict

Dictionary containing sensitivity analysis results.

required
active_constraints_summary dict

Dictionary containing active constraints summary.

required
logger_obj

Optional logger object to log the information.

None

plot_constraint_violations(alpha_violations, beta_violations, out_dir)

Function to plot constraint violations for alpha and beta values. It creates a stacked bar plot showing the violations for each protein. The top 5 proteins with the highest violations are highlighted in red.

Parameters:

Name Type Description Default
alpha_violations Series

Series containing alpha constraint violations.

required
beta_violations Series

Series containing beta constraint violations.

required
out_dir str

Directory to save the plot.

required

plot_sensitivity_analysis(sensitivity_analysis, out_dir)

Function to plot sensitivity analysis results. It creates a horizontal bar plot showing the mean, max, and min sensitivity for each protein.

Parameters:

Name Type Description Default
sensitivity_analysis DataFrame

DataFrame containing sensitivity analysis results.

required
out_dir str

Directory to save the plot.

required

Returns:

Type Description

None

process_excel_results(file_path=OUT_FILE)

Function to process the Excel results file. It reads the alpha and beta values, estimated and observed values, validates normalization constraints, computes residuals and gradients, and generates LaTeX tables for the residuals and sensitivity summaries. It also performs sensitivity analysis and identifies high sensitivity sites. The results are returned as a dictionary.

Parameters:

Name Type Description Default
file_path str

Path to the Excel file containing results.

OUT_FILE

Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.

post_optimization_results()

Function to process and visualize the results of the optimization.

Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.

TF-mRNA Optimization

Evolutionary Algorithms

tfopt.evol.config.constants

parse_args()

tfopt.evol CLI: bounds, loss, optimizer selection. Defaults come from config.toml.

tfopt.evol.config.logconf

tfopt.evol.exporter.plotout

plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)

Plot the estimated vs observed expression levels for a set of genes.

Parameters:

Name Type Description Default
predictions ndarray

Predicted expression levels.

required
expression_matrix ndarray

Observed expression levels.

required
gene_ids list

List of gene identifiers.

required
time_points ndarray

Time points for the experiments.

required
regulators ndarray

Matrix of regulators for each gene.

required
tf_protein_matrix ndarray

Matrix of TF protein levels.

required
tf_ids list

List of TF identifiers.

required
num_targets int

Number of target genes to plot.

required
save_path str

Directory to save the plots.

OUT_DIR

compute_predictions(x, regulators, protein_mat, psite_tensor, n_reg, T_use, n_mRNA, beta_start_indices, num_psites)

Compute the predicted expression levels based on the optimization variables.

Parameters:

Name Type Description Default
x ndarray

Optimization variables.

required
regulators ndarray

Matrix of regulators for each gene.

required
protein_mat ndarray

Matrix of TF protein levels.

required
psite_tensor ndarray

Tensor of phosphorylation sites.

required
n_reg int

Number of regulators.

required
T_use int

Number of time points to use.

required
n_mRNA int

Number of mRNAs.

required
beta_start_indices list

List of starting indices for beta parameters.

required
num_psites list

List of number of phosphorylation sites for each TF.

required

tfopt.evol.exporter.sheetutils

save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)

Save the optimization results to an Excel file.

Parameters:

Name Type Description Default
gene_ids list

List of gene identifiers.

required
tf_ids list

List of TF identifiers.

required
final_alpha ndarray

Final alpha values.

required
final_beta ndarray

Final beta values.

required
psite_labels_arr list

List of phosphorylation site labels.

required
expression_matrix ndarray

Observed expression levels.

required
predictions ndarray

Predicted expression levels.

required
objective_value float

Objective value from optimization.

required
reg_map dict

Mapping of genes to regulators.

required
filename str

Path to the output Excel file.

OUT_FILE

tfopt.evol.objfn.minfn

TFOptimizationMultiObjectiveProblem

Bases: Problem

Represents a multi-objective optimization problem specific to transcription factor (TF) and mRNA synthesis dynamics.

This class is an extension of the Problem class and is designed to model complex biological processes by incorporating various dynamic parameters like regulators, protein matrices, psite tensors, and associated configurations. It supports parallel evaluation for multi-thread usage, optimizing performance for large populations.

Attributes:

Name Type Description
n_mRNA int

Number of mRNA species in the system.

n_TF int

Number of transcription factor species in the system.

n_reg int

Number of regulators.

n_psite_max int

Maximum number of potential p-sites.

n_alpha int

Number of alpha parameters used in modeling.

T_use int

Number of time units or steps to use in the evaluation.

mRNA_mat ndarray

A matrix representing the mRNA dynamics.

regulators ndarray

Array of regulator IDs associated with mRNA and TF interactions.

protein_mat ndarray

A matrix representing the protein synthesis rates or patterns.

psite_tensor ndarray

A tensor indicating probabilistic binding sites of proteins.

beta_start_indices ndarray

Array indicating the starting indices of beta coefficients.

num_psites ndarray

Array indicating the number of p-sites per transcription factor.

no_psite_tf ndarray

Boolean array indicating TFs with zero p-sites.

loss_type int

Configurable loss function type for the optimization. Defaults to 0.

lam1 float

First regularization parameter for loss calculation. Defaults to 1e-3.

lam2 float

Second regularization parameter for loss calculation. Defaults to 1e-3.

max_threads int

Maximum number of threads to use for evaluation. 0 indicates automatic thread selection based on system capacity.

__init__(n_var: int, n_mRNA: int, n_TF: int, n_reg: int, n_psite_max: int, n_alpha: int, mRNA_mat: np.ndarray, regulators: np.ndarray, protein_mat: np.ndarray, psite_tensor: np.ndarray, T_use: int, beta_start_indices: np.ndarray, num_psites: np.ndarray, no_psite_tf: np.ndarray, xl: Optional[np.ndarray] = None, xu: Optional[np.ndarray] = None, **kwargs)

Initializes the class with various parameters required for computational evaluation.

Parameters:

Name Type Description Default
n_var int

The number of variables.

required
n_mRNA int

The number of mRNA molecules.

required
n_TF int

The number of transcription factors (TFs).

required
n_reg int

The number of regulators.

required
n_psite_max int

The maximum number of p-sites.

required
n_alpha int

The number of alpha coefficients.

required
mRNA_mat ndarray

Matrix representing the mRNA data.

required
regulators ndarray

Array representing the regulator mappings.

required
protein_mat ndarray

Matrix representing the protein data.

required
psite_tensor ndarray

Tensor representing the p-site data.

required
T_use int

The time step or usage parameter.

required
beta_start_indices ndarray

Array of start indices for beta calculations.

required
num_psites ndarray

Array representing the number of p-sites.

required
no_psite_tf ndarray

Boolean array indicating TFs with no associated p-sites.

required
xl Optional[ndarray]

Optional lower-bound array for the variables.

None
xu Optional[ndarray]

Optional upper-bound array for the variables.

None
**kwargs

Additional optional arguments such as "loss_type", "lam1", "lam2", and "threads".

{}

tfopt.evol.opt.optrun

run_optimization(problem, total_dim, optimizer)

Execute multi-objective optimization using the specified algorithm.

This function configures and runs one of three multi-objective evolutionary algorithms (UNSGA3, SMSEMOA, or AGEMOEA) on the provided optimization problem. The algorithm is configured with appropriate genetic operators (two-point crossover and polynomial mutation) and terminated after 1000 generations.

Parameters:

Name Type Description Default
problem Problem

The pymoo Problem instance defining the optimization problem, including objectives, constraints, and variable bounds.

required
total_dim int

Total number of decision variables (dimensions) in the optimization problem. Used to determine population size and mutation probability.

required
optimizer int

Selector for the optimization algorithm: - 0: UNSGA3 (Unified NSGA-III) - Reference direction-based algorithm - 1: SMSEMOA - S-Metric Selection Evolutionary Multi-objective Algorithm - 2: AGEMOEA - Adaptive Geometry Estimation-based Multi-objective Evolutionary Algorithm

required

Returns:

Name Type Description
Result

A pymoo Result object containing the optimization outcomes, including: - X: Decision variables of the Pareto-optimal solutions - F: Objective function values of the Pareto-optimal solutions - algorithm: The algorithm instance used - Additional statistics and convergence information

Notes
  • Population size is set to 2 * total_dim (or larger for UNSGA3 if needed)
  • Crossover probability: 0.9
  • Mutation probability: 1.0 / total_dim
  • Mutation distribution index (eta): 20
  • Termination: Fixed at 1000 generations
  • Random seed: 1 (for reproducibility)
  • Duplicate elimination is enabled for all algorithms
  • UNSGA3 automatically adjusts population size to match reference directions

tfopt.evol.optcon.construct

build_fixed_arrays(mRNA_ids, mRNA_mat, TF_ids, protein_dict, psite_dict, psite_labels_dict, reg_map)

Builds fixed-shape arrays from the input data.

Parameters:

Name Type Description Default
mRNA_ids list

List of mRNA identifiers.

required
mRNA_mat ndarray

Matrix of mRNA expression levels.

required
TF_ids list

List of TF identifiers.

required
protein_dict dict

Dictionary mapping TFs to their protein levels.

required
psite_dict dict

Dictionary mapping TFs to their phosphorylation sites.

required
psite_labels_dict dict

Dictionary mapping TFs to their phosphorylation site labels.

required
reg_map dict

Mapping of genes to their regulators.

required

Returns: mRNA_mat (np.ndarray): Matrix of mRNA expression levels. regulators (np.ndarray): Matrix of regulators for each mRNA. protein_mat (np.ndarray): Matrix of TF protein levels. psite_tensor (np.ndarray): Tensor of phosphorylation sites. n_reg (int): Number of regulators. n_psite_max (int): Maximum number of phosphorylation sites across all TFs. psite_labels_arr (list): List of phosphorylation site labels for each TF. num_psites (np.ndarray): Array indicating the number of phosphorylation sites for each TF.

tfopt.evol.optcon.filter

load_raw_data()

Load raw data from files.

Returns:

Name Type Description
mRNA_ids

List of mRNA gene identifiers.

mRNA_mat

Matrix of mRNA expression data.

mRNA_time_cols

Time points for mRNA data.

TF_ids

List of transcription factor identifiers.

protein_dict

Dictionary mapping TF_ids to their protein data.

psite_dict

Dictionary mapping TF_ids to their phosphorylation site data.

psite_labels_dict

Dictionary mapping TF_ids to their phosphorylation site labels.

TF_time_cols

Time points for TF data.

reg_map

Regulation map, mapping mRNA genes to their regulators.

filter_mrna(mRNA_ids, mRNA_mat, reg_map)

Filter mRNA genes to only those with regulators present in the regulation map.

Parameters:

Name Type Description Default
mRNA_ids list

List of mRNA gene identifiers.

required
mRNA_mat ndarray

Matrix of mRNA expression data.

required
reg_map dict

Regulation map, mapping mRNA genes to their regulators.

required

Returns:

Name Type Description
filtered_mRNA_ids list

List of filtered mRNA gene identifiers.

filtered_mRNA_mat ndarray

Matrix of filtered mRNA expression data.

update_regulations(mRNA_ids, reg_map, TF_ids)

Update the regulation map to only include relevant transcription factors.

Parameters:

Name Type Description Default
mRNA_ids list

List of mRNA gene identifiers.

required
reg_map dict

Regulation map, mapping mRNA genes to their regulators.

required
TF_ids list

List of transcription factor identifiers.

required

Returns:

Name Type Description
relevant_TFs set

Set of relevant transcription factors.

filter_TF(TF_ids, protein_dict, psite_dict, psite_labels_dict, relevant_TFs)

Filter transcription factors to only those present in the relevant_TFs set.

Parameters:

Name Type Description Default
TF_ids list

List of transcription factor identifiers.

required
protein_dict dict

Dictionary mapping TF_ids to their protein data.

required
psite_dict dict

Dictionary mapping TF_ids to their phosphorylation site data.

required
psite_labels_dict dict

Dictionary mapping TF_ids to their phosphorylation site labels.

required
relevant_TFs set

Set of relevant transcription factors.

required

Returns:

Name Type Description
TF_ids_filtered list

List of filtered transcription factor identifiers.

protein_dict dict

Filtered dictionary mapping TF_ids to their protein data.

psite_dict dict

Filtered dictionary mapping TF_ids to their phosphorylation site data.

psite_labels_dict dict

Filtered dictionary mapping TF_ids to their phosphorylation site labels.

determine_T_use(mRNA_mat, TF_time_cols)

Determine the number of time points to use for the analysis.

Parameters:

Name Type Description Default
mRNA_mat ndarray

Matrix of mRNA expression data.

required
TF_time_cols list

Time points for TF data.

required

tfopt.evol.utils.iodata

load_mRNA_data(filename=INPUT3)

Load mRNA data from a CSV file.

Parameters:

Name Type Description Default
filename str

Path to the CSV file containing mRNA data.

INPUT3

Returns: - mRNA_ids: List of mRNA gene identifiers (strings). - mRNA_mat: Matrix of mRNA expression data (numpy array). - time_cols: List of time columns (excluding "GeneID").

load_TF_data(filename=INPUT1)

Load TF data from a CSV file.

Parameters:

Name Type Description Default
filename str

Path to the CSV file containing TF data.

INPUT1

Returns: - TF_ids: List of TF identifiers (strings). - protein_dict: Dictionary mapping TF identifiers to their protein data (numpy array). - psite_dict: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays). - psite_labels_dict: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings). - time_cols: List of time columns (excluding "GeneID" and "Psite").

load_regulation(filename=INPUT4)

Load regulation data from a CSV file.

Parameters:

Name Type Description Default
filename str

Path to the CSV file containing regulation data.

INPUT4

Returns: - reg_map: Dictionary mapping mRNA genes to their regulators (list of TF identifiers).

create_report(results_dir: str, output_file: str = 'report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

Parameters:

Name Type Description Default
results_dir str

Path to the directory containing gene folders.

required

organize_output_files(*directories)

Organizes output files from multiple directories into separate folders for each protein.

Parameters:

Name Type Description Default
directories str

List of directories to organize.

()

format_duration(seconds)

Format a duration in seconds into a human-readable string.

Parameters:

Name Type Description Default
seconds float

Duration in seconds.

required

Returns: str: Formatted duration string.

tfopt.evol.utils.params

create_no_psite_array(n_TF, num_psites, psite_labels_arr)

Create an array indicating whether each TF has no phosphorylation sites.

Parameters:

Name Type Description Default
n_TF int

Number of transcription factors.

required
num_psites list

List of number of phosphorylation sites for each TF.

required
psite_labels_arr list

List of phosphorylation site labels for each TF.

required

Returns:

Name Type Description
no_psite_tf ndarray

Array indicating whether each TF has no phosphorylation sites.

compute_beta_indices(num_psites, n_TF)

Compute the starting indices for the beta parameters for each TF.

Parameters:

Name Type Description Default
num_psites list

List of number of phosphorylation sites for each TF.

required
n_TF int

Number of transcription factors.

required

Returns:

Name Type Description
beta_start_indices ndarray

Array of starting indices for the beta parameters.

cum int

Total number of beta parameters.

create_initial_guess(n_mRNA, n_reg, n_TF, num_psites, no_psite_tf)

Create the initial guess for the optimization variables.

Parameters:

Name Type Description Default
n_mRNA int

Number of mRNAs.

required
n_reg int

Number of regulators.

required
n_TF int

Number of transcription factors.

required
num_psites list

List of number of phosphorylation sites for each TF.

required
no_psite_tf ndarray

Array indicating whether each TF has no phosphorylation sites.

required

Returns:

Name Type Description
x0 ndarray

Initial guess for the optimization variables.

n_alpha int

Number of alpha parameters.

create_bounds(n_alpha, n_beta_total, lb, ub)

Create the lower and upper bounds for the optimization variables.

Parameters:

Name Type Description Default
n_alpha int

Number of alpha parameters.

required
n_beta_total int

Total number of beta parameters.

required
lb float

Lower bound for the optimization variables.

required
ub float

Upper bound for the optimization variables.

required

Returns:

Name Type Description
xl ndarray

Lower bounds for the optimization variables.

xu ndarray

Upper bounds for the optimization variables.

get_parallel_runner()

Get a parallel runner for multi-threading.

Returns:

Name Type Description
runner

Parallelization runner.

pool

ThreadPool instance for parallel execution.

extract_best_solution(res, n_alpha, n_mRNA, n_reg, n_TF, num_psites, beta_start_indices)

Extract the best solution from the optimization results.

Parameters:

Name Type Description Default
res

Optimization results.

required
n_alpha int

Number of alpha parameters.

required
n_mRNA int

Number of mRNAs.

required
n_reg int

Number of regulators.

required
n_TF int

Number of transcription factors.

required
num_psites list

List of number of phosphorylation sites for each TF.

required
beta_start_indices ndarray

Array of starting indices for the beta parameters.

required

Returns:

Name Type Description
final_alpha ndarray

Final alpha parameters.

final_beta ndarray

Final beta parameters.

best_objectives ndarray

Best objectives from the Pareto front.

final_x ndarray

Final optimization variables.

print_alpha_mapping(mRNA_ids, reg_map, TF_ids, final_alpha)

Print the mapping of transcription factors (TFs) to mRNAs with their corresponding alpha values.

Parameters:

Name Type Description Default
mRNA_ids list

List of mRNA identifiers.

required
reg_map dict

Mapping of genes to their regulators.

required
TF_ids list

List of TF identifiers.

required
final_alpha ndarray

Final alpha parameters (mRNA x TF).

required

print_beta_mapping(TF_ids, final_beta, psite_labels_arr)

Print the mapping of transcription factors (TFs) to their beta parameters.

Parameters:

Name Type Description Default
TF_ids list

List of TF identifiers.

required
final_beta ndarray

Final beta parameters (TF x β).

required
psite_labels_arr list

List of phosphorylation site labels for each TF.

required

Gradient-Based Algorithms

tfopt.local.config.constants

parse_args()

tfopt.local CLI: bounds and loss selection. Defaults come from config.toml.

tfopt.local.config.logconf

tfopt.local.exporter.plotout

plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)

Plots the estimated vs observed values for a given set of genes and their corresponding TFs.

Parameters:

Name Type Description Default
predictions ndarray

Predicted expression levels.

required
expression_matrix ndarray

Observed expression levels.

required
gene_ids list

List of gene identifiers.

required
time_points ndarray

Time points for the experiments.

required
regulators ndarray

Matrix of regulators for each gene.

required
tf_protein_matrix ndarray

Matrix of TF protein levels.

required
tf_ids list

List of TF identifiers.

required
num_targets int

Number of target genes to plot.

required
save_path str

Directory to save the plots.

OUT_DIR

plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)

Creates a scatter plot visualizing multi-start optimization results with runtime overlay.

This function reads a CSV summary of multiple optimization runs and generates a scatter plot showing the relationship between run rank and final objective value, with runtime (or iterations) represented as color intensity. Successful and feasible runs are emphasized while unsuccessful or infeasible runs are shown with reduced opacity.

Parameters:

Name Type Description Default
summary_csv str or Path

Path to the CSV file containing multi-start optimization results.

required
out_path str or Path

Path to save the output figure. If None, figure is not saved. Defaults to None.

None
figsize tuple

Figure size as (width, height) in inches. Defaults to (8, 8).

(8, 8)
x_col str

Column name for x-axis (run rank). If missing, will be created from y_col ranking. Defaults to "rank".

'rank'
y_col str

Column name for y-axis (final objective value). Defaults to "fun".

'fun'
c_col str

Column name for color mapping (typically runtime). Falls back to "nit" (iterations) if not found. Defaults to "runtime_s".

'runtime_s'
success_col str

Column name indicating optimization success status. Defaults to "success".

'success'
cv_col str

Column name for constraint violation values. Falls back to common alternatives if not found. Defaults to "constr_violation".

'constr_violation'
annotate_best bool

Whether to annotate the best (rank 1) point on the plot. Defaults to True.

True

Returns:

Name Type Description
tuple

A tuple containing: - fig (matplotlib.figure.Figure): The generated figure object. - ax (matplotlib.axes.Axes): The axes object of the plot. - df (pd.DataFrame): The processed DataFrame with sorted results.

Notes
  • Points are considered feasible if constraint violation <= 1e-8
  • Infeasible or unsuccessful runs are plotted with reduced opacity (0.25)
  • If rank column is missing, it's automatically generated from objective values
  • If runtime column is missing, falls back to iteration count or constant color

tfopt.local.exporter.sheetutils

save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)

Save the optimization results to an Excel file.

Parameters:

Name Type Description Default
gene_ids list

List of gene identifiers.

required
tf_ids list

List of TF identifiers.

required
final_alpha ndarray

Final alpha values.

required
final_beta ndarray

Final beta values.

required
psite_labels_arr list

List of phosphorylation site labels.

required
expression_matrix ndarray

Observed expression levels.

required
predictions ndarray

Predicted expression levels.

required
objective_value float

Objective value from optimization.

required
reg_map dict

Mapping of genes to regulators.

required
filename str

Path to the output Excel file.

OUT_FILE

export_multistart_results(results)

Export multiple multistart optimization results to an Excel file.

Parameters:

Name Type Description Default
results list

List of optimization results, each containing attributes like 'start_id', 'fun', 'success', etc.

required

Returns:

Type Description

None

save_multistart_solutions_npz(all_results, out_path)

Saves multistart optimization solutions to a compressed .npz file format.

This function aggregates optimization results into a structured format and saves them in a compressed NumPy .npz file. It processes the solutions, extracting relevant attributes such as optimization variables, function values, success status, and starting IDs, before saving them for later use.

Parameters:

Name Type Description Default
all_results

list A list of optimization result objects. Each result object must have the attributes x (optimization solution vector) and fun (objective function value). Optionally, it can have success (indicating whether the optimization succeeded, defaults to False if not present) and start_id (identifier of the starting point, defaults to -1 if not present).

required
out_path

str or Path The file path where the compressed .npz file will be saved. The path will be converted into a pathlib Path object if it is not already one.

required

tfopt.local.objfn.minfn

objective_(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type, lam1=1e-06, lam2=1e-06)

Originally implemented by Julius Normann.

This version has been modified and optimized for consistency & speed in submodules by Abhinav Mishra.

Computes a loss value using one of several loss functions.

Parameters:

Name Type Description Default
x

Decision vector.

required
expression_matrix

(n_genes x T_use) measured gene expression values.

required
regulators

(n_genes x n_reg) indices of TF regulators for each gene.

required
tf_protein_matrix

(n_TF x T_use) TF protein time series.

required
psite_tensor

(n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros).

required
n_reg

Maximum number of regulators per gene.

required
T_use

Number of time points used.

required
n_genes, n_TF

Number of genes and TF respectively.

required
beta_start_indices

Integer array giving the starting index (in the β–segment) for each TF.

required
num_psites

Integer array with the actual number of PSites for each TF.

required
loss_type

Integer indicating the loss type (0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov).

required

Returns:

Name Type Description
loss

The computed loss (a scalar).

compute_predictions(x, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites)

Computes the predicted expression matrix based on the decision vector x.

Parameters:

Name Type Description Default
x

Decision vector.

required
regulators

(n_genes x n_reg) indices of TF regulators for each gene.

required
tf_protein_matrix

(n_TF x T_use) TF protein time series.

required
psite_tensor

(n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros).

required
n_reg

Maximum number of regulators per gene.

required
T_use

Number of time points used.

required
n_genes

Number of genes.

required
beta_start_indices

Integer array giving the starting index (in the β–segment) for each TF.

required
num_psites

Integer array with the actual number of PSites for each TF.

required

Returns:

Name Type Description
predictions

(n_genes x T_use) predicted gene expression values.

objective_wrapper(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)

Wrapper function for the objective function.

Parameters:

Name Type Description Default
x

Decision vector.

required
expression_matrix

(n_genes x T_use) measured gene expression values.

required
regulators

(n_genes x n_reg) indices of TF regulators for each gene.

required
tf_protein_matrix

(n_TF x T_use) TF protein time series.

required
psite_tensor

(n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros).

required
n_reg

Maximum number of regulators per gene.

required
T_use

Number of time points used.

required
n_genes

Number of genes.

required
beta_start_indices

Integer array giving the starting index (in the β–segment) for each TF.

required
num_psites

Integer array with the actual number of PSites for each TF.

required
loss_type

Integer indicating the loss type.

required

Returns:

Name Type Description
loss

The computed loss (a scalar).

tfopt.local.opt.optrun

run_optimizer(x0, bounds, lin_cons, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)

Runs the optimization algorithm to minimize the objective function.

Parameters:

Name Type Description Default
x0

Initial guess for the optimization variables.

required
bounds

Bounds for the optimization variables.

required
lin_cons

Linear constraints for the optimization problem.

required
expression_matrix

(n_genes x T_use) measured gene expression values.

required
regulators

(n_genes x n_reg) indices of TF regulators for each gene.

required
tf_protein_matrix

(n_TF x T_use) TF protein time series.

required
psite_tensor

(n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros).

required
n_reg

Maximum number of regulators per gene.

required
T_use

Number of time points used.

required
n_genes, n_TF

Number of genes and TF respectively.

required
beta_start_indices

Integer array giving the starting index (in the β–segment) for each TF.

required
num_psites

Integer array with the actual number of PSites for each TF.

required
loss_type

Type of loss function to use.

required

Returns: result : Result of the optimization process, including the optimized parameters and objective value.

generate_multistart_x0(x0: np.ndarray, bounds: Sequence[Tuple[float, float]], n_starts: int, seed: int = 0, jitter_frac: float = 0.05, p_random: float = 0.3) -> List[np.ndarray]

Generates multiple starting points for multi-start optimization.

Builds a diverse list of starting points
  • mostly: jitter around baseline x0
  • some: fully random within bounds

jitter_frac is relative to (ub - lb). p_random is fraction of starts that are random-in-bounds.

Returns:

Type Description
List[ndarray]

List[np.ndarray]: A list of starting points for optimization.

run_optimizer_multistart(x0: np.ndarray, bounds, lin_cons, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type, run_optimizer_func, cfg: Optional[MultiStartConfig] = None, polish: bool = True)

Executes a multistart optimization loop with parallelization and optional polishing to find the best solution across multiple starting points. The function leverages a parallel approach for running multiple optimizations, selects the best result based on predefined sorting criteria, and optionally refines it.

Parameters:

Name Type Description Default
x0 ndarray

Initial guess for the optimization variables.

required
bounds

Bounds for the optimization variables, typically a sequence of (min, max) pairs.

required
lin_cons

Linear constraints for the optimizer, defined as per specific optimizer requirements.

required
expression_matrix

Input gene expression data utilized in the optimization process.

required
regulators

Regulatory inputs or factors influencing the optimization process.

required
tf_protein_matrix

Matrix representing transcription factor proteins relevant to the process.

required
psite_tensor

Tensor containing phosphorylation site data used in the computations.

required
n_reg

Number of regulators involved in the optimization.

required
T_use

Specific configuration parameter determining time or iteration usage.

required
n_genes

Number of genes considered within the problem scope.

required
beta_start_indices

Indices indicating the start positions of beta parameters in the optimization.

required
num_psites

Total number of phosphorylation sites accounted for in optimization.

required
loss_type

Type of loss function used for evaluating optimization performance.

required
run_optimizer_func

Optimization function to be executed for each starting point.

required
cfg Optional[MultiStartConfig]

Configuration object specifying multistart parameters such as number of starts, parallelization settings, and randomness.

None
polish bool

Indicates whether to perform a final optimization run initialized at the best solution. Defaults to True.

True

Returns:

Name Type Description
Tuple

A tuple containing: - The best result as determined by sorting criteria. - A list of sorted optimization results from all starting points.

tfopt.local.optcon.construct

build_fixed_arrays(gene_ids, expression_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, reg_map)

Builds fixed-shape arrays from the input data.

Parameters:

Name Type Description Default
- gene_ids

list of mRNA identifiers.

required
- expression_matrix

array of shape (n_genes, T) with mRNA expression levels.

required
- tf_ids

list of TF identifiers.

required
- tf_protein

dict mapping TFs to their protein levels.

required
- tf_psite_data

dict mapping TFs to their phosphorylation sites.

required
- tf_psite_labels

dict mapping TFs to their phosphorylation site labels.

required
- reg_map

mapping of genes to their regulators (TFs).

required

Returns:

Type Description
  • expression_matrix: array of shape (n_genes, T) with mRNA expression levels.
  • regulators: array of shape (n_genes, n_reg) with TF indices.
  • tf_protein_matrix: array of shape (n_TF, T) with TF protein levels.
  • psite_tensor: array of shape (n_TF, n_psite_max, T) with phosphorylation sites.
  • n_reg: number of regulators.
  • n_psite_max: maximum number of phosphorylation sites across all TFs.
  • psite_labels_arr: list of labels for each TF's phosphorylation sites.
  • num_psites: array indicating the number of phosphorylation sites for each TF.

constraint_alpha_func(x, n_genes, n_reg)

For each gene, the sum of its alpha parameters must equal 1.

Parameters:

Name Type Description Default
x ndarray

Decision vector.

required
n_genes int

Number of genes.

required
n_reg int

Number of regulators.

required

Returns:

Type Description

np.ndarray: Array of constraints.

constraint_beta_func(x, n_alpha, n_TF, beta_start_indices, num_psites, no_psite_tf)

For each TF, the sum of its beta parameters must equal 1.

Parameters:

Name Type Description Default
x ndarray

Decision vector.

required
n_alpha int

Number of alpha parameters.

required
n_TF int

Number of transcription factors.

required
beta_start_indices list

List of starting indices for beta parameters.

required
num_psites list

List of number of phosphorylation sites for each TF.

required
no_psite_tf list

List indicating if a TF has no phosphorylation site.

required

Returns:

Type Description

np.ndarray: Array of constraints.

build_linear_constraints(n_genes, n_TF, n_reg, n_alpha, beta_start_indices, num_psites, no_psite_tf)

Build linear constraints for the transcription factor optimization problem.

Parameters:

Name Type Description Default
n_genes int

Number of genes.

required
n_TF int

Number of transcription factors.

required
n_reg int

Number of regulators.

required
n_alpha int

Number of alpha parameters.

required
beta_start_indices list

List of starting indices for beta parameters.

required
num_psites list

List of number of phosphorylation sites for each TF.

required
no_psite_tf list

List indicating if a TF has no phosphorylation site.

required

Returns:

Name Type Description
list

List of linear constraints.

tfopt.local.optcon.filter

load_and_filter_data()

Load and filter data for the optimization problem.

Returns:

Type Description
  • gene_ids (list): List of gene IDs.
  • expr_matrix (np.ndarray): Gene expression matrix.
  • expr_time_cols (list): Time columns for expression data.
  • tf_ids (list): List of transcription factor IDs.
  • tf_protein (dict): Dictionary mapping TF IDs to their protein data.
  • tf_psite_data (dict): Dictionary mapping TF IDs to their phosphorylation site data.
  • tf_psite_labels (dict): Dictionary mapping TF IDs to their phosphorylation site labels.
  • tf_time_cols (list): Time columns for TF data.
  • reg_map (dict): Regulation map, mapping gene IDs to their regulators.

prepare_data(gene_ids, expr_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, tf_time_cols, reg_map)

Prepares the data for optimization by filtering the expression matrix to match the number of time points and building fixed arrays.

Parameters:

Name Type Description Default
gene_ids list

List of gene IDs.

required
expr_matrix ndarray

Gene expression matrix.

required
tf_ids list

List of transcription factor IDs.

required
tf_protein dict

Dictionary mapping TF IDs to their protein data.

required
tf_psite_data dict

Dictionary mapping TF IDs to their phosphorylation site data.

required
tf_psite_labels dict

Dictionary mapping TF IDs to their phosphorylation site labels.

required
tf_time_cols list

Time columns for TF data.

required
reg_map dict

Regulation map, mapping gene IDs to their regulators.

required

Returns: fixed_arrays (tuple): Tuple containing the fixed arrays: - expression_matrix: array of shape (n_genes, T) - regulators: array of shape (n_genes, n_reg) with indices into tf_ids. - tf_protein_matrix: array of shape (n_TF, T) - psite_tensor: array of shape (n_TF, n_psite_max, T), padded with zeros. - n_reg: maximum number of regulators per gene. - n_psite_max: maximum number of PSites among TFs. - psite_labels_arr: list (length n_TF) of lists of PSite names (padded with empty strings). - num_psites: array of length n_TF with the actual number of PSites for each TF. T_use (int): Number of time points used in the expression matrix.

tfopt.local.utils.iodata

min_max_normalize(df, custom_max=None)

Row-wise (per-sample) min-max normalize time-series columns starting with 'x'.

Parameters:

Name Type Description Default
df DataFrame

Input DataFrame with time-series columns (x1-xN).

required
custom_max float

If given, used as max for all rows.

None

Returns:

Type Description

pd.DataFrame: Normalized DataFrame with same shape.

load_expression_data(filename=INPUT3)

Loads gene expression (mRNA) data.

Parameters:

Name Type Description Default
filename str

Path to the CSV file containing mRNA data.

INPUT3

Returns:

Type Description
  • gene_ids: List of gene identifiers (strings).
  • expression_matrix: Matrix of gene expression data (numpy array).
  • time_cols: List of time columns (excluding "GeneID").

load_tf_protein_data(filename=INPUT1)

Loads TF protein data along with PSite information.

Parameters:

Name Type Description Default
filename str

Path to the CSV file containing TF protein data.

INPUT1

Returns: - tf_ids: List of TF identifiers (strings). - tf_protein: Dictionary mapping TF identifiers to their protein data (numpy array). - tf_psite_data: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays). - tf_psite_labels: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings). - time_cols: List of time columns (excluding "GeneID" and "Psite").

load_regulation(filename=INPUT4)

Returns a mapping from gene (source) to a list of TFs (targets).

Parameters:

Name Type Description Default
filename str

Path to the CSV file containing regulation data.

INPUT4

Returns: - reg_map: Dictionary mapping gene identifiers to lists of TF identifiers.

summarize_stats(input3=INPUT3, input1=INPUT1, input4=INPUT4)

Summarizes statistics for the expression data (input3) and TF protein data (input1).

Parameters:

Name Type Description Default
input3 str

Path to the expression data CSV file.

INPUT3
input1 str

Path to the TF protein data CSV file.

INPUT1
input4 str

Path to the mapping file CSV.

INPUT4

create_report(results_dir: str, output_file: str = 'report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

Parameters:

Name Type Description Default
results_dir str

Path to the root results directory.

required
output_file str

Name of the generated global report file (placed inside results_dir).

'report.html'

organize_output_files(*directories)

Function to organize output files into protein-specific folders.

Parameters:

Name Type Description Default
directories str

List of directories to organize.

()

tfopt.local.utils.params

get_optimization_parameters(expression_matrix, tf_protein_matrix, n_reg, T_use, psite_labels_arr, num_psites, lb, ub)

Prepare the optimization parameters for the optimization problem.

Parameters:

Name Type Description Default
expression_matrix ndarray

Gene expression matrix.

required
tf_protein_matrix ndarray

TF protein matrix.

required
n_reg int

Number of regulators.

required
T_use int

Number of time points to use.

required
psite_labels_arr list

List of phosphorylation site labels for each TF.

required
num_psites ndarray

Array containing the number of phosphorylation sites for each TF.

required
lb float

Lower bound for beta parameters.

required
ub float

Upper bound for beta parameters.

required

Returns: x0 (np.ndarray): Initial guess for the optimization variables. n_alpha (int): Number of alpha parameters. beta_start_indices (np.ndarray): Starting indices for beta parameters. bounds (list): List of bounds for the optimization variables. no_psite_tf (np.ndarray): Array indicating whether each TF has no phosphorylation sites. n_genes (int): Number of genes. n_TF (int): Number of transcription factors.

postprocess_results(result, n_alpha, n_genes, n_reg, beta_start_indices, num_psites, reg_map, gene_ids, tf_ids, psite_labels_arr)

Post-process the optimization results to extract the final alpha and beta parameters.

Parameters:

Name Type Description Default
result OptimizeResult

The result of the optimization.

required
n_alpha int

Number of alpha parameters.

required
n_genes int

Number of genes.

required
n_reg int

Number of regulators.

required
beta_start_indices ndarray

Starting indices for beta parameters.

required
num_psites ndarray

Array containing the number of phosphorylation sites for each TF.

required
reg_map dict

Regulation map, mapping gene IDs to their regulators.

required
gene_ids list

List of gene IDs.

required
tf_ids list

List of transcription factor IDs.

required
psite_labels_arr list

List of lists containing phosphorylation site labels.

required

Returns:

Name Type Description
final_x ndarray

Final optimization result.

final_alpha ndarray

Final alpha parameters reshaped into a matrix.

final_beta ndarray

Final beta parameters reshaped into a matrix.

Fitting Analysis

tfopt.fitanalysis.helper

Plotter

A class to plot various analysis results from an Excel file.

__init__(filepath, savepath)

Initializes the Plotter instance by loading data from the Excel file. Args: filepath (str): Path to the Excel file containing analysis results. savepath (str): Directory where the plots will be saved.

load_data()

Loads data from the specified Excel file. Args: filepath (str): Path to the Excel file. savepath (str): Directory where the plots will be saved.

plot_alpha_distribution()

Plots the distribution of alpha parameter values grouped by transcription factors (TFs) using a strip plot.

plot_beta_barplots()

Processes the beta values DataFrame and creates a separate bar plot for each unique transcription factor (TF).

plot_heatmap_abs_residuals()

Plots a heatmap of the absolute values of the residuals.

plot_goodness_of_fit()

Creates a scatter plot comparing observed vs. estimated values, fits a linear regression model, plots the 95% confidence interval, and labels points outside the confidence interval.

plot_kld()

Plots the Kullback-Leibler Divergence (KLD) for each mRNA. The KLD is calculated between the observed and estimated distributions of the mRNA expression levels.

plot_pca()

Plots a PCA (Principal Component Analysis) of the observed and estimated values.

plot_boxplot_alpha()

Plots a boxplot of the alpha values.

plot_boxplot_beta()

Plots a boxplot of the beta values.

plot_cdf_alpha()

Plots the cumulative distribution function (CDF) of the alpha values.

plot_cdf_beta()

Plots the cumulative distribution function (CDF) of the beta values.

plot_time_wise_residuals()

Plots the residuals over time for each mRNA.

ODE Modelling & Parameter Estimation

Configuration

config.cli

Command‑line entry point for the phoskintime pipeline.

Usage

Come one level up from the package root, it should be the working directory

(where you can see the project directory).

run everything with the default (local) solver

python phoskintime all

run only preprocessing

python phoskintime prep

run tfopt with local flavour

python phoskintime tfopt --mode local

run tfopt with evol flavour

python phoskintime tfopt --mode evol

run kinopt with local flavour

python phoskintime kinopt --mode local

run kinopt with evol flavour

python phoskintime kinopt --mode evol

run the model

python phoskintime model

run the integrated Global Model

python phoskintime networkmodel

prep()

Preprocess data (processing.cleanup).

tfopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))

Transcription-Factor-mRNA Optimisation.

Parameters:

Name Type Description Default
mode str

local | evol

Option('local', help='local | evol')
conf Path | None

Path to TOML/YAML config. Uses defaults if omitted.

Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.')

Returns: None

kinopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))

Kinase-Phosphorylation Optimization.

Parameters:

Name Type Description Default
mode str

local | evol

Option('local', help='local | evol')
conf Path | None

Path to TOML/YAML config. Uses defaults if omitted.

Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.')

Returns: None

model(conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to model config file. Uses defaults if omitted.'))

Run the model (runner.main).

Parameters:

Name Type Description Default
conf Path | None

Path to model config file. Uses defaults if omitted.

Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to model config file. Uses defaults if omitted.')

Returns: None

networkmodel(conf: Path | None = typer.Option('config.toml', '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to global model config file. Uses config.toml by default.'))

Run the integrated Global Model (networkmodel.runner).

This runs the unified optimization pipeline defined in the [networkmodel] section of your configuration.

clean()

Remove all pycache, .pyc, .nbc, and build artifacts recursively.

all(tf_mode: str = typer.Option('local', help='tfopt mode: local | evol'), kin_mode: str = typer.Option('local', help='kinopt mode: local | evol'), tf_conf: Path | None = typer.Option(None, help='tfopt config file'), kin_conf: Path | None = typer.Option(None, help='kinopt config file'), model_conf: Path | None = typer.Option(None, help='model config file'))

Run every stage in sequence. Preprocessing -> TF optimisation -> Kinase optimisation -> Model.

Note: This command does NOT run the global network simulation (networkmodel). To run the global model, use the separate entry point phoskintime-global (or python -m networkmodel.runner) after this command completes.

Parameters:

Name Type Description Default
tf_mode str

tfopt mode: local | evol

Option('local', help='tfopt mode: local | evol')
kin_mode str

kinopt mode: local | evol

Option('local', help='kinopt mode: local | evol')
tf_conf Path | None

Path to TOML/YAML config. Uses defaults if omitted.

Option(None, help='tfopt config file')
kin_conf Path | None

Path to TOML/YAML config. Uses defaults if omitted.

Option(None, help='kinopt config file')
model_conf Path | None

Path to model config file. Uses defaults if omitted.

Option(None, help='model config file')

Returns: None

config.config

parse_bound_pair(val)

Parse a string representing a pair of bounds (lower, upper) into a tuple of floats. The upper bound can be 'inf' or 'infinity' to represent infinity. Raises ValueError if the input is not in the correct format. Args: val (str): The string to parse, e.g., "0,3" or "0,infinity". Returns: tuple: A tuple containing the lower and upper bounds as floats.

parse_fix_value(val)

Parse a fixed value or a list of fixed values from a string. If the input is a single value, it returns that value as a float. If the input is a comma-separated list, it returns a list of floats. Raises ValueError if the input is not in the correct format. Args: val (str): The string to parse, e.g., "1.0" or "1.0,2.0". Returns: float or list: The parsed fixed value(s) as a float or a list of floats.

ensure_output_directory(directory)

Parameters:

Name Type Description Default
directory str

The path to the directory to create.

required

Returns: None

parse_args()

Parse command-line arguments for the PhosKinTime script. This function uses argparse to define and handle the command-line options. It includes options for setting bounds, fixed parameters, bootstrapping, profile estimation, and input file paths. The function returns the parsed arguments as a Namespace object. The arguments include: --A-bound, --B-bound, --C-bound, --D-bound, --Ssite-bound, --Dsite-bound, --bootstraps, --input-excel-protein, --input-excel-psite, --input-excel-rna.

Returns: argparse.Namespace: The parsed command-line arguments.

log_config(logger, bounds, args)

Log the configuration settings for the PhosKinTime script. This function logs the parameter bounds bootstrapping iterations. It uses the provided logger to output the information.

Parameters:

Name Type Description Default
logger Logger

The logger to use for logging.

required
bounds dict

The parameter bounds.

required
args Namespace

The command-line arguments.

required

Returns: None

extract_config(args)

Extract configuration settings from command-line arguments. This function creates a dictionary containing the parameter bounds, bootstrapping iterations. The function returns the configuration dictionary.

Parameters:

Name Type Description Default
args Namespace

The command-line arguments.

required

Returns: dict: The configuration settings.

score_fit(params, target, prediction, alpha=ALPHA_WEIGHT, beta=BETA_WEIGHT, gamma=GAMMA_WEIGHT, delta=DELTA_WEIGHT, mu=MU_WEIGHT)

Calculate the score for the fit of a model to target data. The score is a weighted combination of various metrics including mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), variance, and regularization penalty. The weights for each metric can be adjusted using the parameters alpha, beta, gamma, and delta. The regularization penalty is controlled by the reg_penalty parameter. The function returns the calculated score. Args: params (np.ndarray): The model parameters. target (np.ndarray): The target data. prediction (np.ndarray): The predicted data. alpha (float): Weight for RMSE. beta (float): Weight for MAE. gamma (float): Weight for variance. delta (float): Weight for MSE. mu (float): Regularization penalty weight. Returns: float: The calculated score.

future_times(n_new: int, ratio: Optional[float] = None, tp: np.ndarray = TIME_POINTS) -> np.ndarray

Extend ttime points by n_new points, each spaced by multiplying the previous interval by ratio. If ratio is None, it is inferred from the last two points.

Parameters:

Name Type Description Default
n_new int

Number of new time points to generate.

required
ratio float

Ratio to multiply the previous interval. Defaults to None.

None
tp ndarray

Existing time points. Defaults to TIME_POINTS.

TIME_POINTS

Returns: np.ndarray: Extended time points.

config.constants

get_param_names_rand(num_psites: int) -> list

Generate parameter names for the random model. Format: ['A', 'B', 'C', 'D'] + ['S1', 'S2', ..., 'S'] + [parameter names for all combinations of dephosphorylation sites].

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites.

required

Returns: list: List of parameter names.

get_param_names_ds(num_psites: int) -> list

Generate parameter names for distributive or successive models. Format: ['A', 'B', 'C', 'D'] + ['S1', 'S2', ..., 'S'] + ['D1', 'D2', ..., 'D'].

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites.

required

Returns: list: List of parameter names.

generate_labels_rand(num_psites: int) -> list

Generates labels for the states based on the number of phosphorylation sites for the random model. Returns a list with the base labels "R" and "P", followed by labels for all combinations of phosphorylated sites.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites.

required

Returns: list: List of state labels.

generate_labels_ds(num_psites: int) -> list

Generates labels for the states based on the number of phosphorylation sites for the distributive or successive models. Returns a list with the base labels "R" and "P", followed by labels for each individual phosphorylated state.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites.

required

Returns: list: List of state labels.

location(path: str, label: str = None) -> str

Returns a clickable hyperlink string for supported terminals using ANSI escape sequences.

Parameters:

Name Type Description Default
path str

The file path or URL.

required
label str

The display text for the link. Defaults to the path if not provided.

None

Returns:

Name Type Description
str str

A string that, when printed, shows a clickable link in terminals that support ANSI hyperlinks.

get_number_of_params_rand(num_psites)

Calculate the number of parameters required for the ODE system based on the number of phosphorylation sites.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites (1 to 4).

required

Returns:

Name Type Description
int

Total number of parameters.

get_bounds_rand(num_psites, ub=0, lower=0)

Generate bounds for the ODE parameters based on the number of phosphorylation sites.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites.

required
lower float

Lower bound for parameters.

0
upper float

Upper bound for parameters.

required

Returns:

Name Type Description
list

List of bounds as [lower, upper] for each parameter.

config.logconf

ColoredFormatter

Bases: Formatter

Custom formatter to add colors to log messages and elapsed time. This formatter uses ANSI escape codes to colorize the log messages based on their severity level. It also includes a right-aligned clock that shows the elapsed time since the logger was initialized. The elapsed time is displayed in a human-readable format (e.g., "1h 23m 45s"). The formatter is designed to be used with a logger that has a console handler. The elapsed time is calculated from the time the logger was initialized and is displayed in a right-aligned format. The formatter also ensures that the log messages are padded to a specified width, which can be adjusted using the width parameter. The remove_ansi method is used to strip ANSI escape codes from the log message for accurate padding calculation. The format method is overridden to customize the log message format, including the timestamp, logger name, log level, and message. The setup_logger function is used to configure the logger with a file handler and a stream handler. The file handler writes log messages to a specified log file, while the stream handler outputs log messages to the console. The logger is set to the specified logging level, and the log file is created in the specified directory. The log file is rotated based on size, and old log files are backed up.

format(record)

Format the log record with colors and elapsed time. This method overrides the default format method to customize the log message format. It includes the timestamp, logger name, log level, and message.

remove_ansi(s) staticmethod

Remove ANSI escape codes from a string.

setup_logger(name=None, log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5, mp_file_logging='main_only')

Setup a logger with colored output and file logging. This function creates a logger with colored output for console messages :param name: :param log_file: :param level: :param log_dir: :param rotate: :param max_bytes: :param backup_count: :param mp_file_logging: - "off": disable file logging - "main_only": file logging only in main process - "per_process": file logging in each process :return: logger

Core Functions

protwise.paramest.normest

worker_find_lambda(lam: float, gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray) -> Tuple[float, float, str]

Worker function for a single lambda value.

Parameters:

Name Type Description Default
lam float

Regularization parameter.

required
gene str

Gene name.

required
target ndarray

Target data.

required
p0 ndarray

Initial parameter guess.

required
time_points ndarray

Time points for the model fitting.

required
free_bounds Tuple[ndarray, ndarray]

Parameter bounds for the optimization.

required
init_cond ndarray

Initial conditions for the ODE solver.

required
num_psites int

Number of phosphorylation sites.

required
p_data ndarray

Measurement data for protein-phospho.

required
pr_data ndarray

Reference data for protein.

required

Returns:

Type Description
Tuple[float, float, str]

Tuple containing the lambda value, score, and weight key.

find_best_lambda(gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray, lambdas=np.logspace(-2, 0, 10), max_workers: int = 4, per_lambda_timeout: float = 1800.0) -> Tuple[float, str]

Finds best lambda_reg to use in model_func.

normest(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps, use_regularization=USE_REGULARIZATION)

Function to estimate parameters for a given gene using ODE models.

Parameters:

Name Type Description Default
gene

Gene name.

required
pr_data

Protein data.

required
p_data

Phosphorylation data.

required
r_data

Reference data.

required
init_cond

Initial conditions for the ODE solver.

required
num_psites

Number of phosphorylation sites.

required
time_points

Time points for the model fitting.

required
bounds

Parameter bounds for the optimization.

required
bootstraps

Number of bootstrap iterations.

required
use_regularization

Whether to use regularization in the fitting process.

USE_REGULARIZATION

Returns:

Type Description

Tuple containing estimated parameters, model fits, error values, and regularization term.

protwise.paramest.toggle

estimate_parameters(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps)

Wrapper around the normal estimation workflow in paramest/normest.py.

This function delegates parameter estimation to :func:paramest.normest.normest. There is currently only one estimation mode (normal). A sequential estimation mode does not exist; paramest/seqest.py is not part of this package.

Parameters:

Name Type Description Default
gene str

Gene name.

required
pr_data array

Array of protein data.

required
p_data array

Array of protein-phospho data.

required
r_data array

Array of RNA data.

required
init_cond array

Initial conditions for the model.

required
num_psites int

Number of phosphorylation sites.

required
time_points array

Time points for the data.

required
bounds tuple

Bounds for the parameter estimation.

required
bootstraps int

Number of bootstrap samples.

required

Returns:

Name Type Description
model_fits list

List of model fits.

estimated_params array

Estimated parameters.

seq_model_fit array

Model fit array of shape (num_psites, len(time_points)).

errors array

Errors in the estimation.

reg_term float

Regularization term.

Weights for Curve Fitting

protwise.models.weights

early_emphasis(pr_data, p_data, time_points, num_psites)

Function that calculates custom weights for early time points in a dataset.

Parameters:

Name Type Description Default
pr_data

2D numpy array of shape (num_psites, n_times)

required
p_data

2D numpy array of shape (num_psites, n_times)

required
time_points

1D numpy array of time points

required
num_psites

Number of phosphorylation sites

required

Returns:

Name Type Description
custom_weights

1D numpy array of weights for early time points

get_protein_weights(gene, input1_path=Path(__file__).resolve().parent.parent / 'processing' / 'input1_wstd.csv', input2_path=Path(__file__).resolve().parent.parent / 'data' / 'input2.csv')

Function to extract weights for a specific gene from the input files.

Parameters:

Name Type Description Default
gene str

Gene ID to filter the weights.

required
input1_path Path

Path to the input1_wstd.csv file.

parent / 'processing' / 'input1_wstd.csv'
input2_path Path

Path to the input2.csv file.

parent / 'data' / 'input2.csv'

Returns:

Name Type Description
weights ndarray

Extracted weights for the specified gene.

full_weight(p_data_weight, use_regularization, reg_len)

Function to create a full weight array for parameter estimation.

Parameters:

Name Type Description Default
p_data_weight ndarray

The weight data to be processed.

required
use_regularization bool

Flag to indicate if regularization is used.

required
reg_len int

Length of the regularization term.

required

Returns:

Type Description

numpy.ndarray: The full weight array.

get_weight_options(target, t_target, num_psites, use_regularization, reg_len, early_weights, ms_gauss_weights)

Function to calculate weights for parameter estimation based on the target data and time points.

Parameters:

Name Type Description Default
target ndarray

The target data for which weights are calculated.

required
t_target ndarray

The time points corresponding to the target data.

required
num_psites int

Number of phosphorylation sites.

required
use_regularization bool

Flag to indicate if regularization is used.

required
reg_len int

Length of the regularization term.

required
early_weights ndarray

Weights for early time points.

required
ms_gauss_weights ndarray

Weights based on Gaussian distribution.

required

Returns:

Name Type Description
dict

A dictionary containing different weight options.

Parameter Estimation

protwise.paramest.core

process_gene(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps=0, out_dir=OUT_DIR)

Process a single gene by estimating its parameters and generating plots.

Parameters:

Name Type Description Default
gene str

Gene name.

required
protein_data DataFrame

DataFrame containing protein-only data.

required
kinase_data DataFrame

DataFrame containing kinase data.

required
mrna_data DataFrame

DataFrame containing mRNA data.

required
time_points list

List of time points for the experiment.

required
bounds tuple

Bounds for parameter estimation.

required
bootstraps int

Number of bootstrap iterations. Defaults to 0.

0
out_dir str

Output directory for saving results. Defaults to OUT_DIR.

OUT_DIR

Returns:

Type Description
  • gene: The gene being processed.
  • estimated_params: Estimated parameters for the gene.
  • model_fits: Model fits for the gene.
  • seq_model_fit: Sequential model fit for the gene.
  • errors: Error metrics (MSE, MAE).
  • final_params: Final estimated parameters.
  • param_df: DataFrame of estimated parameters.
  • gene_psite_data: Dictionary of gene-specific data.
  • psite_labels: Labels for phosphorylation sites.
  • pca_result: PCA result for the gene.
  • ev: Explained variance for PCA.
  • tsne_result: t-SNE result for the gene.
  • perturbation_analysis: Sensitivity analysis results.
  • perturbation_curves_params: Trajectories with parameters for sensitivity analysis.
  • knockout_results: Dictionary of knockout results.
  • regularization: Regularization value used in parameter estimation.

process_gene_wrapper(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps, out_dir=OUT_DIR)

Wrapper function to process a gene.

Parameters:

Name Type Description Default
gene str

Gene name.

required
protein_data DataFrame

DataFrame containing protein-only data.

required
kinase_data DataFrame

DataFrame containing kinase data.

required
mrna_data DataFrame

DataFrame containing mRNA data.

required
time_points list

List of time points for the experiment.

required
bounds tuple

Bounds for parameter estimation.

required
bootstraps int

Number of bootstrap iterations. Defaults to 0.

required
out_dir str

Output directory for saving results. Defaults to OUT_DIR.

OUT_DIR

Returns:

Name Type Description
dict

A dictionary containing the results of the gene processing.

Confidence Intervals using Linearization

protwise.paramest.identifiability.ci

confidence_intervals(gene, popt, pcov, target, model, alpha_val=0.05)

Computes the confidence intervals for parameter estimates using Wald Intervals approach.

Parameters:

Name Type Description Default
gene str

Gene name.

required
popt ndarray

Optimized parameter estimates.

required
pcov ndarray

Covariance matrix of the optimized parameters.

required
target ndarray

Target data.

required
model ndarray

Model predictions.

required
alpha_val float

Significance level for confidence intervals. Defaults to 0.05.

0.05

Returns:

Name Type Description
dict

A dictionary containing the confidence intervals and other statistics.

Knockout Analysis

protwise.knockout.helper

Perturbation & Parameter Sensitivity Analysis

protwise.sensitivity.analysis

compute_bound(value, perturbation=PERTURBATIONS_VALUE)

Computes the lower and upper bounds for a given parameter value for sensitivity analysis and perturbations.

Parameters:

Name Type Description Default
value float

The parameter value.

required
perturbation float

The perturbation factor.

PERTURBATIONS_VALUE

Returns:

Name Type Description
list

A list containing the lower and upper bounds.

define_sensitivity_problem_rand(num_psites, values)

Defines the Morris sensitivity analysis problem for the random model.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites.

required
values list

List of parameter values.

required

Returns:

Name Type Description
dict

A dictionary containing the number of variables, parameter names, and bounds.

define_sensitivity_problem_ds(num_psites, values)

Defines the Morris sensitivity analysis problem for the dynamic-site model.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites.

required
values list

List of parameter values.

required

Returns:

Name Type Description
dict

A dictionary containing the number of variables, parameter names, and bounds.

Model Diagram

protwise.models.diagram.helpers

powerset(iterable)

Return the list of all subsets (as frozensets) of the given iterable.

Parameters:

Name Type Description Default
iterable

An iterable (e.g., list, set) to generate subsets from.

required

Returns: A list of frozensets representing all subsets of the input iterable.

state_label(state)

Convert a set of phosphorylation sites into a node label.

Parameters:

Name Type Description Default
state

A frozenset representing the phosphorylation state.

required

Returns: A string representing the label for the node.

create_random_diagram(x, num_sites, output_filename)

Create a random phosphorylation diagram.

Parameters:

Name Type Description Default
x

Placeholder parameter, not used in this function.

required
num_sites

The number of phosphorylation sites.

required
output_filename

The name of the output file for the diagram.

required

create_distributive_diagram(x, num_sites, output_filename)

Create a distributive phosphorylation diagram.

Parameters:

Name Type Description Default
x

Placeholder parameter, not used in this function.

required
num_sites

The number of phosphorylation sites.

required
output_filename

The name of the output file for the diagram.

required

create_successive_model(x, num_sites, output_filename)

Create a successive phosphorylation diagram.

Parameters:

Name Type Description Default
x

Placeholder parameter, not used in this function.

required
num_sites

The number of phosphorylation sites.

required
output_filename

The name of the output file for the diagram.

required

Protein Wise Model Types

protwise.models.distmod

ode_core(y, t, A, B, C, D, S_rates, D_rates)

The core ODE system for the distributive phosphorylation model.

Parameters:

Name Type Description Default
y

array of concentrations

required
t

time

required
A

mRNA production rate

required
B

mRNA degradation rate

required
C

protein production rate

required
D

protein degradation rate

required
S_rates

phosphorylation rates for each site

required
D_rates

dephosphorylation rates for each site

required

Returns:

Name Type Description
dydt

array of derivatives

unpack_params(params, num_psites)

Function to unpack the parameters for the distributive ODE system.

Parameters:

Name Type Description Default
params array

Parameter vector containing A, B, C, D, S_1.S_n, Ddeg_1.Ddeg_m.

required
num_psites int

Number of phosphorylation sites.

required

Returns:

Name Type Description
A float

mRNA production rate.

B float

mRNA degradation rate.

C float

protein production rate.

D float

protein degradation rate.

S_rates array

Phosphorylation rates for each site.

D_rates array

Dephosphorylation rates for each site.

solve_ode(params, init_cond, num_psites, t)

Solve the ODE system for the distributive phosphorylation model.

Parameters:

Name Type Description Default
params

array of parameters

required
init_cond

initial conditions

required
num_psites

number of phosphorylation sites

required
t

time points

required

Returns:

Name Type Description
sol

solution of the ODE system

P_fitted

phosphorylated sites

protwise.models.randmod

unpack_params(params, num_sites)

Unpack parameters for the Random model.

Parameters:

Name Type Description Default
params array

Parameter vector containing A, B, C, D, S_1.S_n, Ddeg_1.Ddeg_m.

required
num_sites int

Number of phosphorylation sites.

required

Returns:

Name Type Description
A float

mRNA production rate.

B float

mRNA degradation rate.

C float

protein production rate.

D float

protein degradation rate.

S array

Phosphorylation rates for each site.

Ddeg array

Degradation rates for phosphorylated states.

ode_system(y, t, A, B, C, D, num_sites, S, Ddeg, mono_idx, forward, drop, fcounts, dcounts)

Compute the time derivatives of a random phosphorylation ODE system.

This function supports a large number of phosphorylation states by using precomputed transition indices to optimize speed.

Parameters:

Name Type Description Default
y array

Current state vector [R, P, X_1, ..., X_m].

required
t float

Time (unused; present for compatibility with ODE solvers).

required
A float

mRNA production rate.

required
B float

mRNA degradation rate.

required
C float

protein production rate.

required
D float

protein degradation rate.

required
num_sites int

Number of phosphorylation sites.

required
S array

Phosphorylation rates for each site.

required
Ddeg array

Degradation rates for phosphorylated states.

required
mono_idx array

Precomputed indices for mono-phosphorylated states.

required
forward array

Forward phosphorylation target states.

required
drop array

Dephosphorylation target states.

required
fcounts array

Number of valid forward transitions for each state.

required
dcounts array

Number of valid dephosphorylation transitions for each state.

required

Returns:

Name Type Description
out array

Derivatives [dR, dP, dX_1, ..., dX_m].

solve_ode(popt, y0, num_sites, t)

Integrate the ODE system for phosphorylation dynamics in random phosphorylation model.

Parameters:

Name Type Description Default
popt array

Optimized parameter vector [A, B, C, D, S_1.S_n, Ddeg_1.Ddeg_m].

required
y0 array

Initial condition vector [R0, P0, X1_0, ..., Xm_0].

required
num_sites int

Number of phosphorylation sites.

required
t array

Time points to integrate over.

required

Returns:

Name Type Description
sol ndarray

Full ODE solution of shape (len(t), len(y0)).

mono ndarray

1D array of fitted values for R (after OFFSET) and P states.

protwise.models.succmod

ode_core(y, t, A, B, C, D, S_rates, D_rates)

The core of the ODE system for the successive ODE model.

Parameters:

Name Type Description Default
y array

The current state of the system.

required
t float

The current time.

required
A float

The mRNA production rate.

required
B float

The mRNA degradation rate.

required
C float

The protein production rate.

required
D float

The protein degradation rate.

required
S_rates array

The phosphorylation rates for each site.

required
D_rates array

The dephosphorylation rates for each site.

required

Returns: dydt (np.array): The derivatives of the state variables.

unpack_params(params, num_psites)

Function to unpack the parameters for the ODE system. The parameters are expected to be in the following order: A, B, C, D, S_rates, D_rates where S_rates and D_rates are arrays of length num_psites. The function returns the unpacked parameters as separate variables. :param params: array of parameters :param num_psites: number of phosphorylation sites :return: A, B, C, D, S_rates, D_rates

solve_ode(params, init_cond, num_psites, t)

Solve the ODE system using the given parameters and initial conditions. The function integrates the ODE system over time and returns the solution.

:param params: :param init_cond: :param num_psites: :param t: :return: solution, solution of phosphorylated sites

Steady-State Calculation

protwise.steady.initdist

initial_condition(num_psites: int) -> list

Calculates the initial steady-state conditions for a given number of phosphorylation sites for distributive phosphorylation model.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites in the model.

required

Returns:

Name Type Description
list list

A list of steady-state values for the variables [R, P, P_sites].

Raises:

Type Description
ValueError

If the optimization fails to find a solution for the steady-state conditions.

protwise.steady.initrand

initial_condition(num_psites: int) -> list

Calculates the initial steady-state conditions for a given number of phosphorylation sites for random phosphorylation model.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites in the model.

required

Returns:

Name Type Description
list list

A list of steady-state values for the variables [R, P, P_sites].

Raises:

Type Description
ValueError

If the optimization fails to find a solution for the steady-state conditions.

protwise.steady.initsucc

initial_condition(num_psites: int) -> list

Calculates the initial steady-state conditions for a given number of phosphorylation sites for successive phosphorylation model.

Parameters:

Name Type Description Default
num_psites int

Number of phosphorylation sites in the model.

required

Returns:

Name Type Description
list list

A list of steady-state values for the variables [R, P, P_sites].

Raises:

Type Description
ValueError

If the optimization fails to find a solution for the steady-state conditions.

Plotting

protwise.plotting.plotting

Plotter

A class to encapsulate plotting functionalities for ODE model analysis.

Attributes:

Name Type Description
gene str

The gene or experiment name.

out_dir str

The directory where plots will be saved.

color_palette list

List of color codes used for plotting.

plot_parallel(solution: np.ndarray, labels: list)

Plots a parallel coordinates plot for the given solution.

Parameters:

Name Type Description Default
solution ndarray

2D numpy array of shape (samples, features) representing the data.

required
labels list

List of labels for the features in the solution.

required

pca_components(solution: np.ndarray, target_variance: float = 0.99)

Plots a scree plot showing the explained variance ratio for PCA components.

Parameters:

Name Type Description Default
solution ndarray

2D numpy array of shape (samples, features) representing the data.

required
target_variance float

The target variance to explain. Defaults to 0.99.

0.99

plot_pca(solution: np.ndarray, components: int = 3)

Plots the PCA results for the given solution.

Parameters:

Name Type Description Default
solution ndarray

2D numpy array of shape (samples, features) representing the data.

required
components int

Number of PCA components to plot. Defaults to 3.

3

Returns:

Name Type Description
tuple

PCA result and explained variance ratio.

plot_tsne(solution: np.ndarray, perplexity: int = 30)

Plots a t-SNE visualization of the given solution.

Parameters:

Name Type Description Default
solution ndarray

2D numpy array of shape (samples, features) representing the data.

required
perplexity int

The perplexity parameter for t-SNE. Defaults to 30.

30

Returns:

Type Description

np.ndarray: The t-SNE result.

plot_param_series(estimated_params: list, param_names: list, time_points: np.ndarray)

Plots the time series of estimated parameters over the given time points.

Parameters:

Name Type Description Default
estimated_params list

List of estimated parameters.

required
param_names list

List of parameter names.

required
time_points ndarray

Array of time points.

required

plot_profiles(data: pd.DataFrame)

Plots the profiles of estimated parameters over time.

Parameters:

Name Type Description Default
data DataFrame

DataFrame containing the time series data.

required

plot_model_fit(model_fit: np.ndarray, Pr_data: np.ndarray, P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)

Plots the model fit for mRNA, protein, and phosphorylated species across time.

Parameters:

Name Type Description Default
model_fit ndarray

Flattened model fit data (length = 9 + 14 + 14*num_psites).

required
Pr_data ndarray

Protein data (14,).

required
P_data ndarray

Phosphorylation data (num_psites + 2, 14).

required
R_data ndarray

mRNA data (9,).

required
sol ndarray

ODE solution array.

required
num_psites int

Number of phosphorylation sites.

required
psite_labels list

Labels for phosphorylation sites.

required
time_points ndarray

Time points (14,).

required

plot_param_scatter(est_arr: np.ndarray, num_psites: int, time_vals: np.ndarray)

Plots scatter and density plots for parameters.

Parameters:

Name Type Description Default
est_arr ndarray

2D numpy array of estimated parameters.

required
num_psites int

Number of phosphorylation sites.

required
time_vals ndarray

Array of time values.

required

plot_heatmap(param_value_df: pd.DataFrame)

Parameters:

Name Type Description Default
param_value_df DataFrame

DataFrame containing parameter values with 'Protein' as one of the columns.

required

plot_error_distribution(error_df: pd.DataFrame)

Parameters:

Name Type Description Default
error_df DataFrame

DataFrame containing errors with 'MAE' as one of the columns.

required

plot_gof(merged_data: pd.DataFrame)

Plot the goodness of fit for the model.

Parameters:

Name Type Description Default
merged_data DataFrame

Dataframe containing merged data.

required

plot_kld(merged_data: pd.DataFrame)

Plots the Kullback-Divergence for the model.

Parameters:

Name Type Description Default
merged_data DataFrame

Dataframe containing merged data.

required

plot_params_bar(ci_results: dict, param_labels: list = None)

Plots bar plot for estimated parameter with 95% Confidence Interval.

Parameters:

Name Type Description Default
ci_results dict

Dictionary containing the results of the confidence intervals.

required
param_labels list

List of parameter labels. Defaults to None.

None

plot_knockouts(results_dict: dict, num_psites: int, psite_labels: list)

Plot wild-type and knockout simulation results for comparison.

Parameters:

Name Type Description Default
results_dict dict

Dictionary containing simulation results.

required
num_psites int

Number of phosphorylation sites.

required
psite_labels list

List of phosphorylation site labels.

required

plot_top_param_pairs(excel_path: str)

For each gene's '_perturbations' sheet in the Excel file, plot scatter plots for the parameter pairs with correlation.

Parameters:

Name Type Description Default
excel_path str

Path to the Excel file.

required

plot_model_perturbations(problem: dict, Si: dict, cutoff_idx: int, time_points: np.ndarray, n_sites: int, best_model_psite_solutions: np.ndarray, best_mrna_solutions: np.ndarray, best_protein_solutions: np.ndarray, psite_labels: list[str], protein_data_ref: np.ndarray, psite_data_ref: np.ndarray, rna_ref: np.ndarray, model_fit_sol: np.ndarray) -> None

Plot the best model perturbations for the given data.

Parameters:

Name Type Description Default
problem dict

The optimization problem.

required
Si dict

The simulation index.

required
cutoff_idx int

The cutoff index for the time points.

required
time_points ndarray

The time points for the data.

required
n_sites int

The number of phosphorylation sites.

required
best_model_psite_solutions ndarray

The best model phosphorylation site solutions.

required
best_mrna_solutions ndarray

The best model mRNA solutions.

required
best_protein_solutions ndarray

The best model protein solutions.

required
protein_ref

The reference data for the protein.

required
psite_labels list[str]

The labels for the phosphorylation sites.

required
psite_data_ref ndarray

The reference data for the phosphorylation sites.

required
rna_ref ndarray

The reference data for mRNA.

required

plot_time_state_grid(samples: np.ndarray, time_points: np.ndarray, state_names: list)

Grid of strip plots per state showing variability across time.

Parameters:

Name Type Description Default
samples ndarray

shape (n_samples, n_timepoints, n_states)

required
time_points ndarray

array of time points

required
state_names list

list of state names

required

plot_phase_space(samples: np.ndarray, state_names: list)

Phase space plots: one state vs another for each simulation.

Parameters:

Name Type Description Default
samples ndarray

Shape (n_samples, n_timepoints, n_states)

required
state_names list

List of state names (length = num_states)

required

plot_future_fit(P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)

Plots the model fit for the future time points.

Parameters:

Name Type Description Default
P_data ndarray

Data for phosphorylation sites.

required
R_data ndarray

Data for mRNA.

required
sol ndarray

Model solution.

required
num_psites int

Number of phosphorylation sites.

required
psite_labels list

Labels for phosphorylation sites.

required
time_points ndarray

Time points for the data.

required

plot_regularization(excel_path: str)

Read every '_params' sheet in the Excel file, pull the Regularization value, and plot a horizontal bar chart of regularization vs. gene.

Parameters:

Name Type Description Default
excel_path str

Path to the Excel file.

required

plot_model_error(excel_path: str)

Read every '_params' sheet in the Excel file, pull the RMSE value, and plot a horizontal bar chart of RMSE vs. gene.

Parameters:

Name Type Description Default
excel_path str

Path to the Excel file.

required

Utility Functions

common.utils.display

ensure_output_directory(directory)

Ensure the output directory exists. If it doesn't, create it.

Parameters:

Name Type Description Default
directory str

Path to the output directory.

required

load_data(excel_file, sheet='Estimated Values')

Load data from an Excel file. The default sheet is "Estimated Values".

Parameters:

Name Type Description Default
excel_file str

Path to the Excel file.

required
sheet str

Name of the sheet to load. Default is "Estimated Values".

'Estimated Values'

Returns:

Type Description

pd.DataFrame: DataFrame containing the data from the specified sheet.

format_duration(seconds)

Format a duration in seconds into a human-readable string.

Parameters:

Name Type Description Default
seconds float

Duration in seconds.

required

Returns: str: Formatted duration string.

merge_obs_est(filename)

Function to merge observed and estimated data from an Excel file.

Parameters:

Name Type Description Default
filename str

Path to the Excel file containing observed and estimated data.

required

Returns:

Type Description

pd.DataFrame: Merged DataFrame containing observed and estimated values for each gene and Psite.

save_result(results, excel_filename)

Function to save results to an Excel file.

Parameters:

Name Type Description Default
results list

List of dictionaries containing results for each gene.

required
excel_filename str

Path to the output Excel file.

required

create_report(results_dir: str, output_file: str = f'{model_type}_report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

Parameters:

Name Type Description Default
results_dir str

Path to the root result's directory.

required
output_file str

Name of the generated global report file (placed inside results_dir).

f'{model_type}_report.html'

organize_output_files(directories: Iterable[Union[str, Path]])

Organize output files into protein-specific folders and a general folder.

Parameters:

Name Type Description Default
directories Iterable[Union[str, Path]]

List of directories to organize.

required

common.utils.tables

generate_tables(xlsx_file_path)

Generate hierarchical tables from the XLSX file containing alpha and beta values.

Parameters:

Name Type Description Default
xlsx_file_path str

Path to the XLSX file containing alpha and beta values.

required

Returns:

Name Type Description
tuple

containing protein, psite, and the corresponding table.

save_tables(tables, output_dir)

Save the generated tables as LaTeX and CSV files.

Parameters:

Name Type Description Default
tables list

List of tuples containing protein, psite, and the corresponding table.

required
output_dir str

Directory to save the LaTeX and CSV files.

required

save_master_table(folder='latex', output_file='latex/all_tables.tex')

Save a master LaTeX file that includes all individual LaTeX files from the specified folder.

Parameters:

Name Type Description Default
folder str

The folder containing the individual LaTeX files.

'latex'
output_file str

The name of the master LaTeX file to be created.

'latex/all_tables.tex'

common.utils.latexit

generate_latex_table(df, sheet_name)

Generate LaTeX code for a table from a DataFrame. Args: df (pd.DataFrame): DataFrame to convert to LaTeX. sheet_name (str): Name of the sheet for caption and label. Returns: str: LaTeX code for the table.

generate_latex_image(image_filename)

Generate LaTeX code for an image.

Parameters:

Name Type Description Default
image_filename str

Path to the image file.

required

Returns: str: LaTeX code for the image.

main(input_dir)

Main function to process Excel and PNG files in the input directory and generate LaTeX code.

Parameters:

Name Type Description Default
input_dir str

Directory containing Excel and PNG files.

required

Global ODE Model

Core Data Structures & Topology

networkmodel.network

System State and Topology Management Module.

This module defines the core data structures that represent the biological system: 1. Index: Manages the mapping between biological names (Proteins, Sites) and numerical indices in the state vector. It handles the complex logic of "Proxy Redirection" for orphan Transcription Factors. 2. KinaseInput: Interpolates experimental kinase data onto the simulation time grid. 3. System: The central container holding all parameters, sparse matrices (CSR), and logic required to evaluate the differential equations. It acts as the bridge between high-level Python objects and the low-level Numba JIT kernels.

Index

Manages the indexing of the state vector Y and network topology.

The state vector Y is a flattened array containing all species. For $N$ proteins, the layout depends on the MODEL type:

  • Distributive/Sequential: [mRNA_i, Prot_i, Site_i_1, Site_i_2, ...]
  • Combinatorial: [mRNA_i, Prot_state_0, Prot_state_1, ..., Prot_state_2^n]

__init__(interactions: pd.DataFrame, tf_interactions: pd.DataFrame = None, kin_beta_map: dict = None, tf_beta_map: dict = None)

Index manager for the PhoskinTime ODE system. Handles protein/kinase mapping and redirects Orphan TFs to Kinase Proxies.

Parameters:

Name Type Description Default
interactions DataFrame

Cleaned Kinase-Substrate network (columns: protein, psite, kinase).

required
tf_interactions DataFrame

Cleaned TF-Target network (columns: tf, target).

None
kin_beta_map dict

Optional dictionary of optimized kinase priors {name: beta_val}.

None
tf_beta_map dict

Optional dictionary of optimized TF priors {name: beta_val}.

None

block(i: int) -> slice

Helper to get the range in the state vector for protein i.

KinaseInput

Manages the external Kinase signal inputs $K(t)$. Interpolates sparse experimental observations onto the solver's dense time grid.

eval(t)

Returns the kinase activity vector at time t (using step interpolation).

System

The central Simulation Object.

Holds: 1. Parameters (Arrays $A_i, B_i, \dots$). 2. Network Topology Matrices (Sparse CSR format). 3. Initial Conditions logic. 4. The rhs method (Python-side derivative calculation). 5. Argument packing logic for Numba JIT kernels.

update(c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale)

Updates the system parameters from the optimizer.

attach_initial_condition_data(df_prot, df_rna, df_pho)

Attaches experimental data used to calculate t=0 state.

set_initial_conditions()

Derives the y0 vector from attached experimental data.

rhs(t, y)

Python-side Right-Hand Side evaluation. Used primarily for testing or when live interactivity/debugging is needed.

Logic Flow: 1. Live-Drive: Calculate Kinase Activity $Kt = K_{data}(t) imes c_k$. 2. Signaling: Calculate Phospho-Drive $S = W \cdot Kt$. 3. Protein Aggregation: Sum phospho-states to get total protein $P$. Crucially, if a protein is a Kinase (or Proxy), overwrite its value with $Kt$. 4. Regulation: Calculate TF inputs $TF_{in} = TF_{mat} \cdot P$. 5. Dynamics: Call model-specific _rhs kernel.

y0() -> np.ndarray

Returns the initial state vector y0.

odeint_args(S_cache=None)

Packs all system arrays into a tuple for the Numba JIT solver.

This method constructs the driver_map array, which tells the low-level solver which proteins are actually Kinases (or Proxies) and should be "Driven" by experimental data rather than simulated.

networkmodel.buildmat

Network Matrix Construction Module.

This script handles the construction of the two primary connectivity matrices used in the global simulation: 1. W Matrix (Kinase-Substrate): A sparse matrix defining the phosphorylation rates from kinases to specific phosphosites. It is built in parallel to handle large interactomes. 2. TF Matrix (Transcription Factor-Gene): A sparse matrix defining the transcriptional regulation strengths. It includes logic to handle "proxy" TFs (orphans mapped to kinases).

site_key(site: str) -> int

Extracts the numerical position from a phosphosite string for sorting.

Parameters:

Name Type Description Default
site str

e.g., "S473" or "Y1068".

required

Returns:

Name Type Description
int int

The integer position (e.g., 473).

Raises:

Type Description
ValueError

If no digits are found in the string.

build_W_parallel(interactions: pd.DataFrame, idx, n_cores=4) -> sparse.csr_matrix

Constructs the Global Kinase-Substrate Matrix W using parallel processing.

The global matrix is a vertical stack of local matrices: $$ W_{global} = egin{bmatrix} W_{protein_1} \ W_{protein_2} \ dots \end{bmatrix} $$

Parameters:

Name Type Description Default
interactions DataFrame

Dataframe containing 'protein', 'psite', 'kinase', and 'alpha'.

required
idx IndexMap

Object containing mappings (sites, k2i, proteins).

required
n_cores int

Number of parallel processes to use.

4

Returns:

Type Description
csr_matrix

sparse.csr_matrix: The global interaction matrix.

build_tf_matrix(tf_net, idx, tf_beta_map=None, kin_beta_map=None)

Constructs the Transcription Factor Regulation Matrix.

This matrix defines the influence of TFs on target genes. The edge weight is calculated as: $$ W_{ij} = lpha_{ij} imes eta_{tf} $$ where $lpha$ is the edge strength and $eta$ is the activity multiplier.

Proxy Logic: If a TF is mapped as a "proxy" (i.e., an orphan TF whose activity is inferred from a kinase), we use the kinase's beta value (kin_beta_map) instead of the TF's beta.

Parameters:

Name Type Description Default
tf_net DataFrame

TF-Target interactions with 'tf', 'target', and 'alpha'.

required
idx IndexMap

Object containing mappings (p2i, proxy_map).

required
tf_beta_map dict

Multipliers for standard TFs.

None
kin_beta_map dict

Multipliers for kinases (used for proxies).

None

Returns:

Type Description

sparse.csr_matrix: An $(N, N)$ matrix where rows=targets, cols=TFs.

networkmodel.params

Parameter Management and Transformation Module.

This module handles the interface between the biological model (which requires positive physical parameters like rate constants) and the numerical optimizer (which operates on a flat vector of decision variables).

Key Concepts: 1. Positivity Constraint: Biological rates cannot be negative. To enforce this during optimization without using hard constraints, we use a Softplus transformation: $$ P_{physical} = \ln(1 + e^{\theta_{raw}}) $$ This maps any real number $\theta$ to a positive value $P$. 2. Vectorization: The optimizer expects a single 1D array (theta). This module packs all distinct parameter arrays ($A_i, B_i, \dots$) into this vector and maintains a slices dictionary to unpack them back.

init_raw_params(defaults, custom_bounds=None)

Initializes the decision vector and bounds for the optimizer.

This function performs three main tasks: 1. Transformation: Converts default physical values to "raw" optimization space using the inverse softplus function. 2. Flattening: Concatenates all parameter arrays into a single 1D vector theta0. 3. Bounds Mapping: Transforms physical bounds (min, max) into raw space bounds.

Parameters:

Name Type Description Default
defaults dict

Dictionary of initial physical values (e.g., {'A_i': [0.1, ...], ...}).

required
custom_bounds dict

Dictionary of (min, max) tuples for specific keys. Overrides the global BOUNDS_CONFIG.

None

Returns:

Name Type Description
tuple

A tuple containing: - theta0 (np.ndarray): The initial flattened decision vector. - slices (dict): Mapping of parameter names to slice objects for unpacking. - xl (np.ndarray): Lower bounds vector in raw space. - xu (np.ndarray): Upper bounds vector in raw space.

unpack_params(theta, slices)

Converts the raw decision vector back to physical parameters.

Applies the Softplus transformation: $$ P = \ln(1 + e^{\theta}) $$ This ensures that all output parameters are strictly positive, regardless of the values chosen by the optimizer.

Parameters:

Name Type Description Default
theta ndarray

The flat decision vector from the optimizer.

required
slices dict

The slicing dictionary created by init_raw_params.

required

Returns:

Name Type Description
dict

A dictionary of physical parameters keys (e.g., 'A_i', 'c_k') mapped to positive numpy arrays or scalars.

Configuration & Data Loading

networkmodel.config

Configurations for PhoskinTime Global model.

This module serves as the central configuration hub for the application. It loads settings from a config.toml file located in the project root and exports them as global constants. These constants control every aspect of the simulation pipeline, including:

  1. I/O Paths: Locations of interaction networks and experimental data.
  2. Model Topology: Selection of the kinetic model structure (Distributive vs. Sequential vs. Combinatorial).
  3. Solver Settings: Tolerances and step sizes for the ODE integrator.
  4. Optimization: Hyperparameters for the parameter estimation loop (Evolutionary Strategy).
  5. Regularization: Penalty terms to prevent overfitting.
  6. Metadata: Versioning and citation information.

networkmodel.io

Data Loading and Pre-processing Module.

This script is responsible for ingesting raw CSV/Excel files containing biological network definitions (Kinases, TFs) and experimental data (MS Proteomics, RNA-seq). It harmonizes identifiers, merges prior knowledge (Alpha/Beta weights), and reshapes time-series data into a standardized 'tidy' format for the modeling pipeline.

load_data(args)

Loads, cleans, and merges all required data files for the model.

Process: 1. Networks: Loads Kinase-Substrate and TF-Gene interactions. - Expands set-based notations (e.g., "{K1, K2}"). - Merges 'Alpha' (edge strength) and 'Beta' (node activity prior) values from optimization results. 2. Experimental Data: Loads MS and RNA-seq data. - Scales data (e.g., log-transformation, normalization). - Reshapes from "wide" (time points as columns) to "long" (time as a variable).

Parameters:

Name Type Description Default
args Namespace

Configuration namespace containing file paths (kinase_net, tf_net, ms, rna, kinopt, tfopt).

required

Returns:

Name Type Description
tuple

A tuple containing: - df_kin_clean (pd.DataFrame): Kinase network with columns [protein, psite, kinase, alpha]. - df_tf_clean (pd.DataFrame): TF network with columns [tf, target, alpha]. - df_prot (pd.DataFrame): Protein abundance data [protein, time, fc]. - df_pho (pd.DataFrame): Phosphorylation data [protein, psite, time, fc]. - df_rna (pd.DataFrame): RNA abundance data [protein, time, fc]. - kin_beta_map (dict): Prior activity assumptions for kinases {kinase: beta}. - tf_beta_map (dict): Prior activity assumptions for TFs {tf: beta}.

Physics Kernels (JIT)

networkmodel.models

Kinetic Model Kernels (Right-Hand Side Definitions).

This module contains the JIT-compiled kernels that define the system of Ordinary Differential Equations (ODEs) for different biological kinetic assumptions.

Supported Models: 1. Distributive (Model 0): Sites on the same protein are phosphorylated independently. State space scales linearly with sites ($N_{sites} + 2$). 2. Sequential (Model 1): Phosphorylation occurs in a strict order ($P_0 o P_1 o \dots$). Useful for processive kinases. State space is linear. 3. Combinatorial (Model 2): Every possible phosphorylation pattern is a distinct state. State space scales exponentially ($2^{N_{sites}} + 1$). Handles complex logic like logic gates. 4. Saturating (Model 4): Uses Michaelis-Menten kinetics for translation and phosphorylation. Prevents unbiological rate explosions at high concentrations.

calculate_synthesis_rate(Ai, tf_scale, u_raw)

Computes the transcription rate based on Transcription Factor (TF) input.

Uses a rational (Hill-like) function that is numerically stable and bounded.

Logic: - Soft-Clipping: The raw input u is first mapped to (-1, 1) to prevent numerical overflow/instability using $u_{norm} = u / (1 + |u|)$. - Activation (u > 0): Rate increases from $A_i$ to $A_i imes (1 + tf_scale)$. - Repression (u < 0): Rate decreases from $A_i$ to $A_i / (1 + tf_scale)$.

Parameters:

Name Type Description Default
Ai float

Basal synthesis rate.

required
tf_scale float

Maximum fold-change factor.

required
u_raw float

Raw total TF input (weighted sum of regulators).

required

Returns:

Name Type Description
float

The calculated synthesis rate.

saturating_rhs(y, dy, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, TF_inputs, S_all, offset_y, offset_s, n_sites)

Right-Hand Side for Model 4: Saturating Kinetics.

Unlike other models that use Mass Action kinetics (Rate = k * [S]), this model uses Michaelis-Menten forms (Rate = Vmax * [S] / (Km + [S])) for Translation and Phosphorylation. This prevents "runaway" kinetics where high protein concentrations lead to infinitely fast reactions.

Physics: - Translation: Limited by ribosome availability. - Phosphorylation: Limited by enzyme (kinase) availability relative to substrate.

distributive_rhs(y, dy, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, TF_inputs, S_all, offset_y, offset_s, n_sites)

Right-Hand Side for Model 0: Distributive (Independent) Binding.

Assumes that the phosphorylation status of one site does not affect others. We track the Unphosphorylated species $P$ and $N_{sites}$ distinct mono-phosphorylated species. This is a simplification that ignores multi-phosphorylated states but captures individual site dynamics efficiently.

sequential_rhs(y, dy, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, TF_inputs, S_all, offset_y, offset_s, n_sites)

Right-Hand Side for Model 1: Sequential Binding.

Assumes phosphorylation must happen in a specific order: $P_0 \xrightarrow{k_0} P_1 \xrightarrow{k_1} P_2 \dots \xrightarrow{k_{n-1}} P_n$

Dephosphorylation is assumed to be distributive (any state can revert to the previous one). This model is suitable for processive kinases or mechanisms with strict steric requirements.

combinatorial_rhs(y, dy, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, TF_inputs, S_cache, jb, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n)

Right-Hand Side for Model 2: Combinatorial Binding.

Tracks all $2^N$ possible states. This allows for complex logic (e.g., "Site A facilitates Site B").

The state transition logic is split into: 1. Dephosphorylation (Implicit Graph): Iterates over all states m. For every set bit in m, calculates decay to m ^ lsb. 2. Phosphorylation (Explicit Graph): Uses pre-calculated trans_* arrays to apply forward transitions based on available sites.

build_random_transitions(idx)

Precompute random phosphorylation transitions for all proteins (Model 2 Setup).

Constructs the sparse adjacency list for the hypercube state graph.

Returns arrays for Numba

trans_from, trans_to, trans_site (flattened) trans_off[i], trans_n[i] per protein i

Interpretation

for protein i: transitions are in slice [trans_off[i] : trans_off[i]+trans_n[i]] each transition uses site index 'j' (0..ns-1) to pick rate S_all[s_start + j]

Numerical Integration & Solvers

networkmodel.simulate

Simulation and Measurement Extraction Module.

This module orchestrates the numerical integration of the ODE system and processes the raw state trajectories into biologically meaningful observables (Fold Changes).

Key Responsibilities: 1. Solver Wrapper: Abstracting the choice between scipy.integrate.odeint (LSODA) and a custom Numba-accelerated RK45 solver. 2. State Aggregation: Converting raw state vectors $Y$ into: * Total Protein: Sum of unphosphorylated and phosphorylated forms. * Site Phosphorylation: Sum of specific phospho-states (handling combinatorial logic if needed). * RNA: Direct extraction from state vector. 3. Data Alignment: Interpolating or slicing the simulated timepoints to match experimental grids.

simulate_odeint(sys, t_eval, rtol, atol, mxstep)

Runs the ODE solver for the given system and timepoints.

Dispatches to either a custom Numba RK45 solver (faster for small/stiff systems with JIT) or SciPy's LSODA (robust standard).

Parameters:

Name Type Description Default
sys System

The system object containing parameters and topology.

required
t_eval ndarray

Time points to return the solution at.

required
rtol float

Relative tolerance.

required
atol float

Absolute tolerance.

required
mxstep int

Maximum number of internal steps per time interval.

required

Returns:

Type Description

np.ndarray: The solution matrix $Y$ of shape (len(t_eval), state_dim).

simulate_and_measure(sys, idx, t_points_p, t_points_r, t_points_pho)

Simulates the system and extracts 'Fold Change' (FC) predictions aligned with data.

Process: 1. Union Grid: Creates a master time grid containing all experimental timepoints. 2. Simulate: Integrates the system once over this master grid. 3. Extract & Normalize: - Calculates raw observables (e.g., Total Protein = Unphos + Phos). - Normalizes by the value at the baseline timepoint (t=0 for protein/phospho, t=4 for RNA). 4. Slice: Filters the result to match the specific timepoints requested for each modality.

Parameters:

Name Type Description Default
sys

System object.

required
idx

Index object (topology map).

required
t_points_*

Arrays of timepoints for Protein, RNA, and Phospho data.

required

Returns:

Name Type Description
tuple

(df_prot, df_rna, df_phos) - Pandas DataFrames with columns [protein, time, pred_fc].

networkmodel.solvers

Adaptive Runge-Kutta Integrators (JIT-Compiled).

This module implements a custom Dormand-Prince (RK45) adaptive step-size solver tailored specifically for the stiff and sparse ODE systems generated by PhoskinTime.

Why a custom solver? Standard SciPy solvers (odeint, solve_ivp) incur significant Python overhead when calling the RHS function thousands of times per second. By implementing the entire integration loop in Numba-compiled code, we achieve C-like performance while maintaining flexibility.

Key Features: 1. Bucketed Inputs: Efficiently handles time-varying Kinase inputs ($K(t)$) by detecting boundaries in the discretized input grid and adjusting steps accordingly. 2. Sparse Operations: RHS evaluations use custom sparse matrix-vector multiplication kernels that operate in-place to minimize memory allocation. 3. Hermite Interpolation: Provides dense output, allowing accurate evaluation at user-specified timepoints ($t_{eval}$) regardless of the internal adaptive steps taken.

csr_matvec_into(out, indptr, indices, data, x, n_rows)

In-place Sparse Matrix-Vector Multiplication: out = A * x.

Used for calculating signaling flux ($S = W \cdot K$) and TF inputs ($TF_{in} = A \cdot P$). Optimized for Numba with no temporary array creation.

rhs_model0_bucketed_into(dy, y, jb, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, Kt_work, S_all_work, P_vec_work, TF_in_work)

RHS wrapper for Model 0 (Distributive) optimized for the custom solver loop.

Differences from standard RHS: - Writes directly into dy (no return). - Takes pre-allocated work arrays (Kt_work, etc.) to avoid malloc. - Takes jb (current time bucket index) to look up $K(t)$ instantly.

rhs_model1_bucketed_into(dy, y, jb, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, Kt_work, S_all_work, P_vec_work, TF_in_work)

RHS wrapper for Model 1 (Sequential) optimized for custom solver.

rhs_model2_bucketed_into(dy, y, jb, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, S_cache, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n, tf_deg, driver_map, P_vec, TF_inputs)

RHS wrapper for Model 2 (Combinatorial).

Special Handling: - Uses S_cache (pre-computed W*Kt) instead of computing S_all on the fly. - Live-Drive limitation: Because S_cache is pre-computed, we don't have access to instaneous Kt vector here to drive P_vec. We fallback to standard protein summation.

rhs_model4_bucketed_into(dy, y, jb, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, Kt_work, S_all_work, P_vec_work, TF_in_work)

RHS wrapper for Model 4 (Saturating/Michaelis-Menten).

adaptive_rk45_model01(model_id, y0, t_eval, kin_grid, args, rtol=1e-05, atol=1e-07, dt_init=0.05, dt_min=1e-06, dt_max=1.0, safety=0.9, max_steps=2000000)

Adaptive RK45 Integrator (Models 0, 1, 4).

Implements the Dormand-Prince 5(4) pair with local error control.

Key Logic: 1. Step Proposal: Estimate error of current step using 4th/5th order difference. 2. Boundary Check: If next step crosses a kin_grid boundary (where input K(t) changes), shorten step to land exactly on boundary. 3. Error Control: If error < tolerance, accept step and potentially grow dt. If error > tolerance, reject and shrink dt. 4. Interpolation: If accepted step spans a t_eval point, interpolate and record solution.

adaptive_rk45_model2(y0, t_eval, kin_grid, args2, rtol=1e-05, atol=1e-07, dt_init=0.05, dt_min=1e-06, dt_max=1.0, safety=0.9, max_steps=2000000)

Adaptive RK45 Integrator specifically for Model 2 (Combinatorial). Similar logic to adaptive_rk45_model01 but unpacks Model 2 specific structures (S_cache, state transitions) and calls rhs_model2_bucketed_into.

networkmodel.jacspeedup

Numerical Solver Wrappers and JIT Kernels.

This module acts as the high-performance computational core for the simulation. It bridges the gap between the high-level System definitions and the low-level numerical integrators (Runge-Kutta methods).

Key responsibilities: 1. Solver Interface: Provides solve_custom to dispatch the correct integration routine based on the selected kinetic model (Distributive, Sequential, etc.). 2. JIT Compilation: Uses Numba to compile time-critical functions (RHS evaluations, matrix-vector multiplications) into machine code for speed. 3. Sparse Matrix Operations: Implements fast custom kernels for calculating signaling inputs ($S = W \cdot K$) and transcriptional regulation ($TF_{in} = A_{tf} \cdot P$). 4. Jacobian Approximation: Provides finite-difference routines for estimating the Jacobian matrix, which is essential for stiff solvers (though mostly used here for diagnostics or implicit stepping if enabled).

solve_custom(sys, y0, t_eval, rtol, atol)

Dispatches the simulation to the appropriate adaptive Runge-Kutta solver based on the global MODEL type.

This function handles the setup of model-specific arguments, such as building the phosphorylation cache for the combinatorial model (Model 2) or packing the argument tuples for the standard models (0, 1, 4).

Parameters:

Name Type Description Default
sys System

The system object containing parameters and network matrices.

required
y0 ndarray

Initial state vector at t=0.

required
t_eval ndarray

Time points where the solution is required.

required
rtol float

Relative tolerance for error control.

required
atol float

Absolute tolerance for error control.

required

Returns:

Type Description

np.ndarray: The integrated solution matrix Y of shape (len(t_eval), n_states).

csr_matvec(indptr, indices, data, x, n_rows)

Performs a fast sparse matrix-vector multiplication: $y = A \cdot x$.

This is used to calculate: 1. Signaling Input: $S_{all} = W_{global} \cdot Kt$ (Kinase -> Site) 2. TF Input: $TF_{in} = TF_{matrix} \cdot P_{vec}$ (TF -> Gene)

Parameters:

Name Type Description Default
indptr, indices, data

Arrays defining the CSR matrix A.

required
x ndarray

The dense vector to multiply.

required
n_rows int

Number of rows in A.

required

Returns:

Type Description

np.ndarray: The result vector y.

csr_matvec_into(out, indptr, indices, data, x, n_rows)

In-place version of sparse matrix-vector multiplication to reduce memory allocation. Writes the result directly into out.

build_S_cache_into(S_out, W_indptr, W_indices, W_data, kin_Kmat, c_k)

Pre-computes the signaling drive 'S' for every site at every time bucket.

For Model 2, calculating $S = W \cdot (K(t) \cdot c_k)$ at every micro-step is too slow. Instead, since $K(t)$ is discretized into buckets, we can pre-calculate the resulting S for each bucket.

Parallelized using prange for performance.

Parameters:

Name Type Description Default
S_out ndarray

Output cache (n_sites, n_time_buckets).

required
W_*

CSR arrays for the kinase-substrate interaction matrix.

required
kin_Kmat ndarray

Kinase activity profiles (n_kinases, n_buckets).

required
c_k ndarray

Optimized kinase activity multipliers.

required

kin_eval_step(t, grid, Kmat)

Evaluates kinase activity at time t using step interpolation (Nearest Neighbor / Bucket).

Parameters:

Name Type Description Default
t float

Current simulation time.

required
grid ndarray

Time grid boundaries.

required
Kmat ndarray

Kinase data matrix.

required

Returns:

Type Description

np.ndarray: Vector of kinase activities at time t.

rhs_nb_distributive(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map)

Right-Hand Side (RHS) function for Model 0 (Distributive / Independent Binding).

Calculates dy/dt = f(t, y) including: 1. Kinase inputs: Interpolates K(t) and maps to sites via sparse W. 2. TF inputs: Aggregates protein levels, maps to genes via sparse TF matrix, and applies Hill-like squashing. 3. Local dynamics: Calls distributive_rhs for the specific kinetic equations.

rhs_nb_sequential(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map)

Right-Hand Side (RHS) function for Model 1 (Sequential Binding). Almost identical to Distributive, but calls sequential_rhs.

rhs_nb_combinatorial(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, S_cache, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n, tf_deg, driver_map, P_vec, TF_inputs)

Right-Hand Side (RHS) function for Model 2 (Combinatorial Binding).

Differences from other models: 1. Uses S_cache instead of recomputing $S_{all}$. 2. Uses pre-allocated work arrays (P_vec, TF_inputs) passed as arguments. 3. Iterates over n_states (2^n_sites) rather than just linear sites.

rhs_nb_saturating(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map)

Right-Hand Side (RHS) for Model 4 (Saturating/Michaelis-Menten). Identical setup to distributive, but calls saturating_rhs.

rhs_odeint(y, t, *args)

Standard python wrapper required for scipy.integrate.odeint or similar API.

fd_jacobian_nb_core_distributive(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, eps=1e-08)

Finite Difference Jacobian calculation for Distributive Model. Approximates J[i,j] = df_i/dy_j using forward differences. Useful for testing or if an implicit solver is required without an analytical Jacobian.

fd_jacobian_nb_core_sequential(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, eps=1e-08)

Finite Difference Jacobian for Sequential Model.

fd_jacobian_nb_core_combinatorial(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, S_cache, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n, tf_deg, driver_map, P_vec, TF_inputs, eps=1e-08)

Finite Difference Jacobian for Combinatorial Model.

fd_jacobian_nb_core_saturating(y, t, c_k, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, kin_grid, kin_Kmat, W_indptr, W_indices, W_data, n_W_rows, TF_indptr, TF_indices, TF_data, n_TF_rows, offset_y, offset_s, n_sites, tf_deg, driver_map, eps=1e-08)

Finite Difference Jacobian for Saturating Model.

fd_jacobian_odeint(y, t, *args)

Python wrapper for the Jacobian function.

networkmodel.steadystate

Initial Condition and Steady-State Logic Module.

This module is responsible for defining the starting state ($y_0$) of the ODE system. It supports two primary modes: 1. Data-Driven Initialization (build_y0_from_data): Uses experimental data (Protein/RNA abundance at t=0) to set the initial conditions directly. This is crucial for matching the absolute scale of the measurements. 2. Analytical Steady-State (steady_state_*): Solves the algebraic equilibrium equations ($dy/dt = 0$) for a system where all kinetic parameters are set to 1.0. This is useful for testing structural consistency or initializing systems without data.

build_y0_from_data(idx, df_prot, df_rna, df_pho, *, t_init=0.0, t0_pho=0.0, eps=1e-09, time_tol=1e-08, max_pho_frac=0.3)

Constructs the initial state vector $y_0$ strictly from experimental data.

Physics-Compliant Mass Balance: 1. Protein Total ($P_{tot}$): Taken from df_prot at t_init. 2. Phospho Mass: Taken from df_pho at t0_pho. Because phospho signals are often relative intensities, we scale them such that the total phosphorylated mass does not exceed max_pho_frac (e.g., 30%) of $P_{tot}$. 3. Unphosphorylated Mass ($P_0$): Calculated by conservation: $P_0 = P_{tot} - \sum P_{phos}$.

Parameters:

Name Type Description Default
idx

System index object.

required
df_prot, df_rna, df_pho

Tidy dataframes of observations.

required
t_init float

The simulation start time (usually 0.0).

0.0
max_pho_frac float

Cap on the initial fraction of phosphorylated protein.

0.3

Returns:

Type Description

np.ndarray: The assembled initial condition vector.

steady_state_distributive(idx, TF_inputs=None, tf_scale=1.0, verify_with_rhs=False)

Analytically computes the steady state for the Distributive Model assuming all kinetic parameters (A, B, C, D, E, k) are 1.0.

This serves as a structural validation of the network topology. If the topology is valid, a stable steady state should exist.

steady_state_sequential(idx, TF_inputs=None, tf_scale=1.0, verify_with_rhs=False)

Analytically computes the steady state for the Sequential Model.

Because the states form a linear chain ($P_0 \leftrightarrow P_1 \leftrightarrow \dots$), the steady state can be found by solving a tridiagonal linear system.

steady_state_combinatorial(idx, TF_inputs=None, jb=0, tf_scale=1.0, max_states_per_protein=4096, trans_from=None, trans_to=None, trans_site=None, trans_off=None, trans_n=None, verify_with_rhs=False)

Computes steady state for the Combinatorial Model.

This involves solving a sparse linear system $A P = b$ for each protein, where $A$ is the transition rate matrix of the $2^N$ states graph.

networkmodel.model_ivp

IVP Solver Interface Module.

This module provides factory functions that generate callable Right-Hand Side (RHS) functions compatible with scipy.integrate.solve_ivp.

These factories bridge the gap between high-level Python solvers and the low-level, JIT-compiled Numba kernels defined in networkmodel.models. They handle: 1. Type Safety: Ensuring all arrays are C-contiguous and float64 for Numba. 2. Input Normalization: Wrapping transcription factor (TF) inputs whether they are constants or time-dependent functions. 3. Closure Creation: Returning a simple fun(t, y) that closes over all static model parameters, optimizing solver overhead.

make_solve_ivp_fun_saturating(*, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, tf_input, S_all, offset_y, offset_s, n_sites)

Creates a fun(t, y) closure for the Saturating (Michaelis-Menten) Model.

Parameters:

Name Type Description Default
A_i, B_i, C_i, D_i, Dp_i, E_i ndarray

Kinetic parameters arrays.

required
tf_scale float

Global scaling factor for TF activity.

required
tf_input

Transcription factor input (array or callable).

required
S_all ndarray

Pre-calculated Signaling drive matrix.

required
offset_y, offset_s ndarray

Index offsets for state vectors and sites.

required
n_sites ndarray

Number of phospho-sites per protein.

required

Returns:

Name Type Description
callable

A function fun(t, y) -> dy compatible with ODE solvers.

make_solve_ivp_fun_distributive(*, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, tf_input, S_all, offset_y, offset_s, n_sites)

Creates a fun(t, y) closure for the Distributive (Independent) Model.

This model assumes phosphorylation sites on the same protein behave independently.

make_solve_ivp_fun_sequential(*, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, tf_input, S_all, offset_y, offset_s, n_sites)

Creates a fun(t, y) closure for the Sequential Binding Model.

This model enforces an order of phosphorylation (e.g., Site 1 must be phos before Site 2).

make_solve_ivp_fun_combinatorial(*, A_i, B_i, C_i, D_i, Dp_i, E_i, tf_scale, tf_input, S_cache, jb, offset_y, offset_s, n_sites, n_states, trans_from, trans_to, trans_site, trans_off, trans_n)

Creates a fun(t, y) closure for the Combinatorial Model.

This model handles the full 2^n state space complexity using pre-computed transition tables.

Parameters:

Name Type Description Default
S_cache ndarray

Cached signaling matrix (pre-computed inputs).

required
jb int

Time bucket index (for static step evaluation).

required
trans_*

Arrays defining the sparse state transition graph.

required

Returns:

Name Type Description
callable

A function fun(t, y) -> dy.

Optimization & Loss Functions

networkmodel.optproblem

Multi-Objective Optimization Problem Definition.

This module defines the GlobalODE_MOO class, which wraps the biological simulation and loss calculation into a format compatible with the pymoo optimization framework.

Key Concepts: 1. Elementwise Problem: Each member of the population (a parameter set) is evaluated individually. 2. Three Objectives: The optimizer simultaneously minimizes error for: * Protein Abundance * mRNA Abundance * Phosphorylation Levels 3. Prior Regularization: A penalty term ensures parameters stay close to biologically plausible priors derived from upstream analysis (KinOpt/TFOpt). This penalty is added to all objectives to steer the entire Pareto front toward realistic regions.

GlobalODE_MOO

Bases: ElementwiseProblem

Defines the multi-objective optimization problem for the ODE model.

Attributes:

Name Type Description
sys System

The biological system object (contains topology and matrices).

slices dict

Mapping of parameter names to indices in the decision vector x.

loss_data dict

Pre-processed experimental data arrays for fast loss calculation.

defaults dict

Prior parameter values for regularization.

lambdas dict

Hyperparameters controlling the strength of penalties/weights.

time_grid ndarray

The common time grid for simulation.

fail_value float

Penalty value assigned to objectives if simulation fails.

__init__(sys, slices, loss_data, defaults, lambdas, time_grid, xl, xu, fail_value=1000000000000.0, elementwise_runner=None)

Initialize the MOO problem.

Parameters:

Name Type Description Default
sys

System object.

required
slices

Parameter slice dictionary.

required
loss_data

Dictionary of contiguous arrays for loss function.

required
defaults

Dictionary of default/prior parameter values.

required
lambdas

Dictionary containing weights for 'protein', 'rna', 'phospho', and 'prior'.

required
time_grid

Simulation timepoints.

required
xl

Lower bounds for parameters.

required
xu

Upper bounds for parameters.

required
fail_value

Objective value to return if ODE solver diverges.

1000000000000.0
elementwise_runner

Optional runner for parallel execution.

None

get_weight_options(time_points, *, rna_time_points=None, early_window=None, center=None, baseline=None, eps=1e-12)

Generates a library of time-dependent weighting schemes.

Time-series data often requires non-uniform weighting. For example: - Early Transients: Often contain the most kinetic information, so we might boost early timepoints. - Late Steady-State: Might be noisy or less relevant for initial kinetics.

Parameters

time_points : array-like All times you want schemes to support. rna_time_points : array-like or None Optional; used only for a couple RNA-friendly schemes. early_window : float or None Times <= early_window are "early". If None, uses ~20% quantile. center : float or None Center for gaussian/logistic. If None, uses median. baseline : float or None Baseline time (for "from_baseline" schemes). eps : float Numerical floor to prevent division by zero.

Returns

dict[str, callable] Dictionary mapping scheme names (e.g., 'linear_early') to functions f(t) -> weight.

build_weight_functions(time_points_protein: np.ndarray, time_points_rna: np.ndarray, scheme_prot_pho: str = 'uniform', scheme_rna: str = 'uniform', early_window_prot_pho: float = 2.0, early_window_rna: float = 15.0) -> Tuple[Callable[[np.ndarray], np.ndarray], Callable[[np.ndarray], np.ndarray]]

Factory to build modality-specific weight functions based on configuration.

Parameters

time_points_protein: Timepoints for protein/phospho modality. time_points_rna: Timepoints for RNA modality. scheme_prot_pho: Weighting scheme name for protein/phospho (e.g., 'linear_early'). scheme_rna: Weighting scheme name for RNA (e.g., 'uniform'). early_window_prot_pho: Definition of 'early' (in minutes) for protein data. early_window_rna: Definition of 'early' (in minutes) for RNA data.

Returns

w_prot_pho, w_rna: Two callables that take time arrays and return weight arrays.

networkmodel.lossfn

High-Performance Loss Functions Module.

This module implements the objective functions used to quantify the discrepancy between model predictions ($Y$) and experimental data (Protein, RNA, Phospho).

Key Features: 1. JIT Compilation: All functions are decorated with @njit from Numba to enable C-like performance, which is critical when the loss function is called thousands of times during optimization. 2. Multi-Modal Support: Handles RNA, Total Protein, and Phospho-site data simultaneously. 3. Robust Error Metrics: Supports multiple loss modes (Squared Error, Huber, Pseudo-Huber, Charbonnier) to handle outliers in biological data. 4. Topology Awareness: Includes separate logic for standard models (linear states) and combinatorial models (bitwise state aggregation).

sq(diff)

Standard Squared Error: (y - y_pred)^2.

huber(diff, delta=1.0)

Huber Loss. Behaves like Squared Error for small errors (<= delta) and Absolute Error for large errors. Robust to outliers.

pseudo_huber(diff, delta=1.0)

Pseudo-Huber Loss. A smooth approximation of Huber loss that is differentiable everywhere.

charbonnier(diff, eps=0.001)

Charbonnier Loss (differentiable L1). $\sqrt{diff^2 + \epsilon^2}$. Very robust to outliers.

log_cosh(diff)

Log-Hyperbolic Cosine Loss. Approx: x^2/2 for small x, abs(x) - log(2) for large x. Smoother than Huber; helps gradients flow better near zero.

cauchy_loss(diff, c=1.0)

Cauchy (Lorentzian) Loss. Scales logarithmically for large errors. Extremely robust to outliers; the influence of an outlier tends to zero.

poisson_scaled_mse(diff, pred_val, eps=1e-06)

Poisson-Scaled MSE. Weights the error by the inverse of the predicted intensity. Penalizes relative error more heavily at low values and allows more variance at high values.

geman_mcclure(diff, delta=1.0)

Geman-McClure Loss. Soft-saturating loss. As error -> infinity, loss -> constant. This strictly limits the maximum penalty any single data point can exert.

loss_function_noncomb(Y, p_prot, t_prot, obs_prot, w_prot, p_rna, t_rna, obs_rna, w_rna, p_pho, s_pho, t_pho, obs_pho, w_pho, prot_map, prot_base_idx, rna_base_idx, pho_base_idx)

Loss function for Models 0, 1, and 4 (Non-Combinatorial).

In these models, the state vector is linear: [RNA, Unphos, Phos_site1, Phos_site2, ...]

Calculates: 1. Protein: (Unphos + Sum(Phos_states)) / Baseline 2. RNA: RNA_state / Baseline 3. Phospho: Phos_state_j / Baseline

Parameters:

Name Type Description Default
Y ndarray

Simulation trajectory (n_times x n_states).

required
*_prot, *_rna, *_pho

Arrays of indices (protein index, time index, etc.), observations, and weights for each data type.

required
prot_map ndarray

Lookup table for state offsets.

required
*_base_idx

Index of the normalization baseline timepoint (usually t=0).

required

Returns:

Name Type Description
tuple

(loss_protein, loss_rna, loss_phospho)

loss_function_comb(Y, p_prot, t_prot, obs_prot, w_prot, p_rna, t_rna, obs_rna, w_rna, p_pho, s_pho, t_pho, obs_pho, w_pho, prot_map, prot_base_idx, rna_base_idx, pho_base_idx)

Loss function for Model 2 (Combinatorial).

In this model, states represent all $2^n$ combinations of phosphorylation. To calculate observables, we must aggregate these states: 1. Protein: Sum of all $2^n$ states. 2. Phospho Site j: Sum of all states where the $j$-th bit is 1.

Parameters:

Name Type Description Default
Y ndarray

Simulation trajectory.

required

networkmodel.optuna_solver

Native Optuna Optimization Engine (MOTPE).

This module provides an alternative optimization backend using Optuna's Multi-Objective Tree-structured Parzen Estimator (MOTPE). It is designed to replace or coexist with the Pymoo genetic algorithm.

Key Features: 1. Bayesian Optimization: Uses MOTPE to efficiently explore the parameter space by modeling the probability density of good vs. bad solutions. 2. Persistent Storage: Saves all trial data to a local SQLite database (optimization.db), allowing pause/resume and post-hoc analysis. 3. Live Dashboard: Optionally launches the optuna-dashboard for real-time visualization of the Pareto front and parameter importances. 4. Vectorized Loss: Implements a high-performance objective function that avoids Python loops during evaluation.

OptunaResult dataclass

Standardized result container to ensure compatibility with existing export scripts.

This mimics the structure of Pymoo's Result object so that downstream plotting and export functions don't need to change.

NativeOptunaObjective

The optimization kernel designed specifically for Optuna.

This class is callable objective(trial) and handles the entire evaluation pipeline: Parameter Suggestion -> System Update -> ODE Simulation -> Loss Calculation.

It pre-calculates array indices (_build_fast_indices) to enable fully vectorized loss computation, which is critical for performance when running thousands of trials.

__call__(trial)

The main evaluation step called by Optuna.

run_optuna_solver(args, sys, loss_data, slices, xl, xu, defaults, lambdas, time_grid, df_prot, df_rna, df_pho, n_trials=5000)

Main driver function for the Optuna Optimization pipeline.

It replaces the Pymoo genetic algorithm with Optuna's MOTPE.

Workflow: 1. Augment loss data with fast indices. 2. Setup persistent SQLite storage (optimization.db). 3. Launch the Optuna Dashboard (background thread) for real-time monitoring. 4. Run the optimization loop. 5. Extract the Pareto front from the database.

Parameters:

Name Type Description Default
args

Configuration namespace (seed, output_dir, etc.).

required
sys

The system object.

required
loss_data

Pre-computed data for loss function.

required
slices

Parameter slices.

required
xl, xu

Lower/Upper parameter bounds.

required
n_trials

Number of evaluations to run.

5000

Returns:

Name Type Description
OptunaResult

Standardized result object containing Best Parameters (X) and Objectives (F).

networkmodel.runner

Main Driver for PhosKinTime Global Model.

This script runs the optimization and analysis steps of the PhosKinTime Global Model. It loads the data, builds the model, and runs the optimization.

The script can be run in two modes: 1. Standard optimization: run_optuna_solver() 2. Sensitivity analysis: run_sensitivity_analysis()

The script can also be run in parallel using the multiprocessing module. This is useful for running the optimization on multiple cores. To enable parallel execution, set the CORES environment variable to the number of cores to use.

networkmodel.refine

Iterative Refinement and Optimization Zooming Module.

This module implements a multi-stage optimization strategy to fine-tune the biological model parameters. After an initial global search (Exploration), this module performs iterative "Refinement" passes (Exploitation) by:

  1. Zooming: Calculating a tighter bounding box around the best solutions found so far.
  2. Seeding: Creating a hybrid population that mixes the best previous solutions ("Warm Start") with fresh random samples within the new bounds to maintain diversity.
  3. Parallel Execution: Using multiprocessing to speed up the re-evaluation of the refined space.

get_refined_bounds(X, current_xl, current_xu, idx, padding=0.2)

Calculates tighter parameter bounds based on the spread of the current best solutions. Also logs a detailed "Biological Report" showing the physical ranges of these new bounds.

The new bounds are calculated as: $$ [min(X) - padding imes range, max(X) + padding imes range] $$ clamped to the original current_xl and current_xu.

Parameters:

Name Type Description Default
X ndarray

The decision vectors of the current Pareto front.

required
current_xl ndarray

The absolute lower bounds of the search space.

required
current_xu ndarray

The absolute upper bounds of the search space.

required
idx Index

The system index object (for mapping parameters to names).

required
padding float

The expansion factor for the new bounds (default 20%).

0.2

Returns:

Name Type Description
tuple

(new_xl, new_xu) arrays.

create_multistart_population(X_best, pop_size, new_xl, new_xu)

Creates a hybrid population for the next optimization stage.

Strategy: - 50% Warm Start: Reuses the best individuals from the previous run (perturbed slightly). - 50% Fresh Sampling: Randomly samples new points within the refined bounds to prevent premature convergence to a local minimum.

Parameters:

Name Type Description Default
X_best ndarray

Array of best solutions from previous run.

required
pop_size int

Target population size.

required
new_xl, new_xu

The new boundaries.

required

Returns:

Name Type Description
Population

A Pymoo Population object ready for the algorithm.

run_iterative_refinement(problem, prev_res, args, idx=None, max_passes=3, padding=0.25)

Executes the recursive refinement loop.

Repeatedly zooms in on the optimal region by: 1. Defining a tighter box around the current Pareto front. 2. Launching a new optimization run within that box. 3. Checking if the solution improved; if not, stopping early.

Parameters:

Name Type Description Default
problem GlobalODE_MOO

The optimization problem instance.

required
prev_res Result

The result object from the previous (global) run.

required
args Namespace

Configuration arguments.

required
idx Index

System index object for logging.

None
max_passes int

Maximum number of refinement iterations.

3
padding float

Initial padding factor for bounds expansion.

0.25

Returns:

Name Type Description
Result

The final, most refined optimization result.

run_refinement(problem, prev_res, args, padding=0.25)

Legacy wrapper for a single refinement pass. Deprecated in favor of the recursive run_iterative_refinement.

networkmodel.scan

Hyperparameter Tuning Module (Optuna + Pymoo).

This module implements an "Outer Loop" optimization to tune the hyperparameters of the loss function itself (specifically, the regularization weights $\lambda_{protein}, \lambda_{RNA}, \dots$).

Architecture: Nested Optimization 1. Outer Loop (Optuna): Bayesian Optimization searches for the best set of objective weights ($\lambda$). 2. Inner Loop (Pymoo): For each set of weights, a short Genetic Algorithm (UNSGA3) runs to estimate how well the model could fit the data with those priorities.

Key Features: * Live Dashboard: Launches the optuna-dashboard to visualize parameter importance and Pareto fronts in real-time. * Pruning: Uses OptunaPruningCallback to terminate unpromising Pymoo runs early, saving compute time. * Persistent Storage: Saves all trials to an SQLite database, enabling pause/resume functionality.

OptunaPruningCallback

Bases: Callback

Bridges Pymoo's internal generational loop with Optuna's pruning API.

This allows the outer Optuna loop to kill a running Pymoo optimization if the intermediate results (at gen_step intervals) are significantly worse than the median of previous trials.

BioObjective

The Function-Object (Functor) optimized by Optuna.

Represents one "Trial": 1. Receives hyperparameters ($\lambda$) from Optuna. 2. Constructs the GlobalODE_MOO problem with these weights. 3. Runs a reduced-scope Genetic Algorithm (UNSGA3). 4. Returns the best achieved error score.

run_hyperparameter_scan(args, sys, loss_data, defaults, time_grid, runner, slices, xl, xu)

Main driver for the Hyperparameter Scan.

Orchestrates the entire tuning process: 1. Sets up the persistent SQLite database. 2. Launches the Optuna Dashboard in a background thread. 3. Runs the optimization loop. 4. Exports results (Excel + Plots).

Parameters:

Name Type Description Default
args

Configuration arguments.

required
sys

System object.

required
loss_data

Pre-computed loss indices.

required
defaults

Default parameters.

required
time_grid

Simulation timepoints.

required
runner

Parallel runner for Pymoo.

required
slices

Parameter slices.

required
xl, xu

Parameter bounds.

required

Returns:

Name Type Description
dict

The best set of hyperparameters found.

Analysis & Visualization

networkmodel.sensitivity

Global Sensitivity Analysis Module (Morris Method).

This module performs a comprehensive sensitivity analysis to determine which model parameters have the most significant impact on the system's output.

Methodology: It uses the Morris Method (Elementary Effects), which is computationally efficient for high-dimensional models. It works by: 1. Sampling: Generating $N$ trajectories through the parameter space, where each step changes one parameter. 2. Simulation: Running the ODE model for every sampled parameter set in parallel. 3. Analysis: Computing the mean absolute effect ($\mu^*$) and interaction/non-linearity ($\sigma$) for each parameter.

Outputs: * Sensitivity Indices: CSV ranking parameters by influence. * Perturbation Clouds: Visualizations showing how parameter uncertainty propagates to trajectory spread.

compute_bounds(params_dict, perturbation=SENSITIVITY_PERTURBATION)

Generates [lower, upper] bounds for each parameter based on a local perturbation.

For SALib, we define a hypercube around the optimal parameters found during calibration.

Parameters:

Name Type Description Default
params_dict dict

The best-fit parameters.

required
perturbation float

Fraction to vary parameters (e.g., 0.2 = +/- 20%).

SENSITIVITY_PERTURBATION

Returns:

Name Type Description
dict

Problem definition for SALib containing 'num_vars', 'names', and 'bounds'.

run_sensitivity_analysis(sys, idx, fitted_params, output_dir, metric='total_signal')

Main driver for Morris Sensitivity Analysis.

Workflow:
1.  **Define Problem:** Create bounds around `fitted_params`.
2.  **Sample:** Generate Morris trajectories using SALib.
3.  **Simulate:** Run parallel simulations for all samples.
4.  **Analyze:** Compute Morris indices ($\mu^*$, $\sigma$).
5.  **Visualize:** Plot influence bars and trajectory perturbation clouds.

[Image of parallel processing flow chart]

Args:
    sys: The System object.
    idx: Index object.
    fitted_params: Dictionary of optimal parameters.
    output_dir: Path to save results.
    metric: Scalar metric for sensitivity target.

Returns:
    pd.DataFrame: The computed sensitivity indices.

networkmodel.analysis

Steady-State Simulation and Analysis Module.

This script manages the simulation of the biological system to its steady state and performs a comprehensive post-simulation analysis. It includes two main functions:

  1. simulate_until_steady: Integrates the ODEs over a long time horizon using log-spaced time steps to capture both fast initial kinetics and slow equilibration.
  2. plot_steady_state_all: A massive visualization and reporting pipeline that generates per-protein dynamic plots, convergence diagnostics, kinase dominance analysis, and global phosphorylation statistics.

This module is essential for establishing the baseline behavior of the cell model before perturbations (e.g., drug treatments) are applied.

simulate_until_steady(sys, t_max=1440.0, n_points=1000)

Simulates the system from t=0 to t_max (default 24h) to observe convergence.

Uses log-spacing for the time grid. This is crucial for biological systems where phosphorylation reactions happen in seconds/minutes, while transcriptional/translational changes happen over hours.

Parameters:

Name Type Description Default
sys System

The system object containing ODE definitions and parameters.

required
t_max float

Maximum simulation time in minutes (default 1440m = 24h).

1440.0
n_points int

Number of time points in the evaluation grid.

1000

Returns:

Name Type Description
tuple

(t_eval, Y) where t_eval is the time vector and Y is the state matrix.

plot_steady_state_all(t, Y, sys, idx, output_dir)

Performs comprehensive analysis and plotting of the steady-state results.

This function generates: 1. Dynamic Plots: Time-series plots for RNA, Protein, and Phospho-sites for every protein. 2. Convergence Diagnostics: Histograms of derivatives at the final time point. 3. Phospho-fraction Summary: Table and plots of the % phosphorylated for all proteins. 4. Kinase Drive Analysis: Quantification of how much "signal" each kinase pushes into the network. 5. Dominance Analysis: Identifies which kinase is the primary driver for every single phosphorylation site. 6. Activity vs. Drive Scatter: Visualizes kinase efficiency (Active Conc vs. Network Output).

Parameters:

Name Type Description Default
t ndarray

Time vector.

required
Y ndarray

State matrix (time x variables).

required
sys System

The system object (used for RHS evaluation and W matrix access).

required
idx IndexMap

Object mapping names to state indices.

required
output_dir str

Directory where results will be saved.

required

networkmodel.export

Exporting and Visualizing Simulation Results Module

Export and visualize simulation results from global model optimization.

This module provides functions for exporting and visualizing simulation results from global model optimization. It includes functions for exporting phosphorylation drive S, scanning prior regularization parameters, plotting phosphorylation drive S, generating diagnostic correlation plots, plotting residuals, and plotting boxplots of parameters across the Pareto front.

build_site_meta(idx)

Returns parallel arrays of length idx.total_sites.

Args:

idx: Index object containing model-specific information.

Returns

site_protein: np.ndarray of protein names for each site site_psite: np.ndarray of psite labels for each site site_local: np.ndarray of local site indices within each protein

save_pareto_3d(res, selected_solution=None, output_dir='out_moo')

Saves a high-quality 3D Scatter plot of the Pareto Front. Highlights the 'selected' balanced solution if provided.

Parameters:

Name Type Description Default
res

Optimization result object containing Pareto front data.

required
selected_solution

Tuple of (fitness, decision variables) for the selected solution.

None
output_dir

Directory where plots will be saved.

'out_moo'

save_parallel_coordinates(res, selected_solution=None, output_dir='out_moo')

Saves a Parallel Coordinate Plot (PCP).

Parameters:

Name Type Description Default
res

Optimization result object containing Pareto front data.

required
selected_solution

Tuple of (fitness, decision variables) for the selected solution.

None
output_dir

Directory where plots will be saved.

'out_moo'

create_convergence_video(res, output_dir='out_moo', filename='optimization_history.mp4')

Creates an animation of the Pareto Front evolution using standard Matplotlib.

Parameters:

Name Type Description Default
res

Optimization result object containing Pareto front data.

required
output_dir

Directory where video will be saved.

'out_moo'
filename

Name of the output video file.

'optimization_history.mp4'

export_pareto_front_to_excel(res, sys, idx, slices, output_path, weights=(1.0, 1.0, 1.0), top_k_trajectories=None, t_points_p=None, t_points_r=None, t_points_ph=None, rtol=1e-05, atol=1e-07, mxstep=5000)

Export all Pareto solutions into one Excel workbook.

Writes sheets
  • summary: objectives + scalar score + rank + weights
  • params_genes: per-protein parameters for each sol_id (long format)
  • params_kinases: per-kinase parameters for each sol_id (long format)
  • traj_protein: trajectories per sol_id (long format)
  • traj_rna: trajectories per sol_id (long format)
  • traj_phospho: trajectories per sol_id (long format)
Notes
  • This can get HUGE if you have many Pareto points. Use top_k_trajectories.
  • The function assumes res.X and res.F exist.

Parameters:

Name Type Description Default
res

Optimization result object containing Pareto front data.

required
sys

System object containing model-specific information.

required
idx

Index object containing protein/kinase metadata.

required
slices

Slices for trajectory data (e.g., time points for protein, RNA, and phospho data).

required
output_path

Path to save the Excel workbook.

required
top_k_trajectories

None = export trajectories for all solutions; else only top K by scalar score.

None
t_points_p, t_points_r, t_points_ph

Optional time points for trajectory data.

required

plot_goodness_of_fit(df_prot_obs, df_prot_pred, df_rna_obs, df_rna_pred, df_phos_obs, df_phos_pred, output_dir, file_prefix='')

Plots goodness of fit for protein, RNA, and phosphorylation data.

Parameters:

Name Type Description Default
df_prot_obs

Protein observed data DataFrame.

required
df_prot_pred

Protein predicted data DataFrame.

required
df_rna_obs

RNA observed data DataFrame.

required
df_rna_pred

RNA predicted data DataFrame.

required
df_phos_obs

Phosphorylation observed data DataFrame.

required
df_phos_pred

Phosphorylation predicted data DataFrame.

required
output_dir

Directory where plots will be saved.

required
file_prefix

Prefix for output file names.

''

plot_gof_from_pareto_excel(excel_path: str, output_dir: str, plot_goodness_of_fit_func, df_prot_obs_all: pd.DataFrame, df_rna_obs_all: pd.DataFrame, df_phos_obs_all: pd.DataFrame, traj_protein_sheet: str = 'traj_protein', traj_rna_sheet: str = 'traj_rna', traj_phospho_sheet: str = 'traj_phospho', summary_sheet: str = 'summary', top_k: int = None, only_solutions=None, score_col: str = 'scalar_score')

Uses the Excel produced by export_pareto_front_to_excel to plot goodness of fit

  • summary
  • traj_protein (sol_id, protein, time, pred_fc)
  • traj_rna (sol_id, protein, time, pred_fc)
  • traj_phospho (sol_id, protein, psite, time, pred_fc)

export_results(sys, idx, df_prot_obs, df_rna_obs, df_phos_obs, df_pred_p, df_pred_r, df_pred_ph, output_dir)

Export pre-computed observed + predicted trajectories and model parameters.

Parameters:

Name Type Description Default
sys

System object containing model information.

required
idx

Index of the solution to export.

required
df_prot_obs

Protein observed data DataFrame.

required
df_rna_obs

RNA observed data DataFrame.

required
df_phos_obs

Phosphorylation observed data DataFrame.

required
df_pred_p

Protein predicted data DataFrame.

required
df_pred_r

RNA predicted data DataFrame.

required
df_pred_ph

Phosphorylation predicted data DataFrame.

required
output_dir

Directory where results will be saved.

required

save_gene_timeseries_plots(gene: str, df_prot_obs: pd.DataFrame, df_prot_pred: pd.DataFrame, df_rna_obs: pd.DataFrame, df_rna_pred: pd.DataFrame, df_phos_obs: pd.DataFrame, df_phos_pred: pd.DataFrame, output_dir: str, prot_times: np.ndarray = None, rna_times: np.ndarray = None, phos_times: np.ndarray = None, filename_prefix: str = 'ts', dpi: int = 300, phos_mode: str = 'per_psite', max_psites: int = None)

scan_prior_reg(out_dir)

Scan prior regularization parameters and save Pareto front plots.

Parameters:

Name Type Description Default
out_dir

Directory containing output files.

required

export_S_rates(sys, idx, output_dir, filename='S_rates_picked.csv', long=True)

Export phosphorylation drive S for optimized parameters. S is per-site and per time-runner (TIME_POINTS_PROTEIN / sys.kin_grid).

Parameters:

Name Type Description Default
sys

System object containing model information.

required
idx

Index of the solution to export.

required
output_dir

Directory where results will be saved.

required
filename

Name of the output file.

'S_rates_picked.csv'
long

If True, output in long format (protein, psite, time, S); otherwise, wide format.

True

plot_s_rates_report(csv_path: str | Path, out_pdf: str | Path = 'S_rates_report.pdf', *, time_col: str = 'time', value_col: str = 'S', protein_col: str = 'protein', psite_col: str = 'psite', log_x: bool = True, top_k_sites_per_protein: int | None = 24, max_sites_per_page: int = 12, ncols: int = 3, normalize_per_site: bool = False, heatmap_per_protein: bool = True, heatmap_cap_sites: int = 80, agg_duplicates: str = 'mean', dpi: int = 150) -> Path

Plot phosphorylation drive S for optimized parameters.

process_convergence_history(res, output_dir)

Extract and save optimization convergence history.

Parameters:

Name Type Description Default
res

Pymoo optimization result object with history attribute

required
output_dir

Directory path to save outputs

required

Returns:

Type Description

pd.DataFrame: Convergence history dataframe with columns [n_evals, gen, min_prot_mse, min_rna_mse, min_phos_mse]

export_kinase_activities(sys, idx, output_dir, t_max=120, n_points=121)

Calculates and saves the effective Kinase Activity (Profile * c_k) over time.

export_param_correlations(res, slices, idx, output_dir, best_idx=None)

Plots kinase parameter correlation across Pareto front and gene parameter correlation for best solution.

Parameters:

Name Type Description Default
res

Result object containing optimization results.

required
slices

Dictionary of parameter slices.

required
idx

Index object containing model indices.

required
output_dir

Directory where results will be saved.

required
best_idx

Index of the best solution to plot gene parameter correlations.

None

export_residuals(sys, idx, df_prot, df_rna, df_phos, output_dir)

Calculates and plots residuals (Observed - Predicted) for the best solution. Helps identify systematic bias (e.g., 'Always under-predicts at t=10' or 'Always over-predicts at t=100').

export_parameter_distributions(res, slices, idx, output_dir)

Plots boxplots of parameters across the ENTIRE Pareto front. Shows which parameters are 'identifiable' (tight box) vs 'sloppy' (wide box).

Parameters:

Name Type Description Default
res

Optimization result object containing Pareto front data.

required
slices

Dictionary mapping parameter names to their column indices in res.X.

required
idx

Index object containing model-specific information.

required
output_dir

Directory where plots will be saved.

required

networkmodel.dashboard_app

Dashboard application for PhosKinTime.

This module provides a Streamlit-based dashboard for visualizing and analyzing results from PhosKinTime simulations. It includes features for visualizing pareto frontiers, goodness of fit, residuals analysis, and more.

The dashboard is intended to be run from the command line, e.g.:

streamlit run run_dashboard.py -- --output_dir=/path/to/output

networkmodel.dashboard_bundle

Dashboard bundle saving and loading utilities.

This module provides functions to save and load a compact, dashboard-friendly bundle containing essential data for visualization. The bundle is designed to be lean, including only the necessary data for visualization, avoiding pickling of custom classes for robustness.

Variables stored in the bundle:

  • args: command-line arguments
  • picked_index: index of the protein picked for visualization
  • frechet_scores: dictionary of Frechet distances for each protein
  • lambdas: dictionary of lambda values for each protein
  • solver_times: dictionary of solver times for each protein
  • defaults: dictionary of default values for each protein
  • slices: dictionary of slice values for each protein
  • xl, xu: lower and upper bounds for each protein

Note that the bundle does not store the optimization result object (res) or the Pymoo algorithm (sys).

Utilities

networkmodel.utils

Utilities and Configuration Management Module.

This module provides essential helper functions and classes for: 1. Data Normalization: Functions to clean column names (_normcols), calculate fold-changes relative to a baseline (normalize_fc_to_t0), and scale data for consistent modeling. 2. Configuration Loading: The load_config_toml function parses the config.toml file into a structured PhosKinConfig dataclass, centralizing all run settings. 3. Dynamic Bound Calculation: calculate_bio_bounds intelligently sets parameter limits based on the input data range and network topology, ensuring biological plausibility. 4. Math Helpers: JIT-compiled kernels for common operations like softplus (for positive parameters) and time_bucket (for discrete input grids).

PhosKinConfig dataclass

Immutable configuration object holding all settings for the simulation run. Loaded from config.toml.

normalize_fc_to_t0(df)

Normalizes a Fold-Change (FC) DataFrame relative to the value at time t=0.

Logic: For each (protein, psite) group, find the FC at t=0. New_FC(t) = Raw_FC(t) / Raw_FC(0).

process_and_scale_raw_data(df, time_points, id_cols, scale_method='fc_start', epsilon=0.001)

Scales time-series data ensuring ALL outputs remain non-negative (>= 0).

This function handles the conversion from "Wide" format (columns x1, x2, ...) to "Long" format and applies the selected scaling transformation.

Parameters:

Name Type Description Default
scale_method str
  • 'raw' / 'none': No scaling. Returns raw intensities/counts.
  • 'fc_start': Standard Fold-Change (x_t / x_0).
  • 'robust_fc': Fold-Change with noise floor (x_t / (x_0 + eps)).
  • 'max_scale': Normalizes to [0, 1] range (x_t / x_max).
  • 'mean_scale': Centers data around 1.0 (x_t / x_mean).
  • 'l2_norm': Unit vector scaling (x_t / ||x||).
'fc_start'

time_bucket(t, grid)

Finds the index j such that grid[j] <= t < grid[j+1]. Used for piecewise-constant input interpolation.

softplus(x)

Softplus activation: log(1 + exp(x)). Maps real numbers to positive numbers. Used for parameter transformation.

inv_softplus(y)

Inverse Softplus: log(exp(y) - 1). Maps positive numbers to real numbers.

pick_best_lamdas(F, weights)

Selects the best solution from a Pareto front based on a weighted sum of normalized objectives.

load_config_toml(path: str | Path) -> PhosKinConfig

Parses config.toml and returns a validated PhosKinConfig object. Includes fallbacks and default values for missing keys.

get_parameter_labels(idx)

Generates descriptive labels for all parameters in the flattened decision vector.

calculate_bio_bounds(idx, df_prot, df_rna, tf_mat, kin_in)

Dynamically calculates optimization bounds by analyzing network topology, data dynamic ranges, and kinetic equilibrium requirements.

This ensures that the optimizer doesn't waste time searching parameter space regions that are mathematically impossible to yield the observed data (e.g., degradation rates that are too slow to match the observed decay).

Returns:

Name Type Description
dict

Custom bounds for each parameter group.

get_optimized_sets(idx, slices, xl, xu, eps=1e-14)

Identifies which entities (Proteins, Sites, Kinases) actually have free parameters being optimized (i.e., bounds are not collapsed).

Returns:

Name Type Description
opt_proteins

set[str] proteins with any free variable among A/B/C/D/E

opt_sites

set[str] site labels like 'EGFR_Y1173' where Dp_i is free

opt_kinases

set[str] kinases with free c_k

networkmodel.cache

Fast Loss Data Preparation Module.

This script handles the crucial pre-processing step for the optimization loop. Instead of performing slow dictionary lookups and string comparisons inside the loss function (which runs thousands of times), this module maps all experimental data (RNA, Protein, Phospho) onto integer-based grid indices once.

The output is a dictionary of contiguous NumPy arrays (indices, observations, weights) that can be passed directly to a fast JIT-compiled or Cython loss function.

prepare_fast_loss_data(idx, df_prot, df_rna, df_pho, time_grid)

Converts pandas DataFrames of experimental data into integer-based arrays aligned with the simulation state vector and time grid.

This function performs "pre-indexing": 1. Maps float timepoints to integer indices in time_grid. 2. Maps string protein names to integer state indices (p2i). 3. Maps specific phosphosites (e.g., "S473") to relative state offsets. 4. Constructs a prot_map for fast state slicing.

Parameters:

Name Type Description Default
idx IndexMap

Object containing mappings (p2i, sites, n_sites, etc.).

required
df_prot DataFrame

Protein data columns [protein, time, fc, w].

required
df_rna DataFrame

RNA data columns [protein, time, fc, w].

required
df_pho DataFrame

Phospho data columns [protein, psite, time, fc, w].

required
time_grid array - like

The fixed time points of the simulation solver.

required

Returns:

Name Type Description
dict

A dictionary containing keyed NumPy arrays (e.g., 'p_prot', 'obs_prot') ready for the fast loss function.