API Reference
Data Standardization & Cleanup
processing.cleanup
process_collecttri()
Processes the CollecTRI file to clean and filter mRNA-TF interactions. Removes complex interactions, filters by target genes, and saves the result.
format_site(site)
Formats a phosphorylation site string.
If the input is NaN or an empty string, returns an empty string. If the input contains an underscore ('_'), splits the string into two parts, converts the first part to uppercase, and appends the second part unchanged. Otherwise, converts the entire string to uppercase.
| Parameters: |
|
|---|
| Returns: |
|
|---|
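The formatting rules described for format_site can be sketched as below. This is a minimal illustration rather than the library's implementation: the name `format_site_sketch` is hypothetical, and the sketch assumes that only the first underscore splits the string and that the underscore is kept in the output.

```python
import math


def format_site_sketch(site):
    """Hypothetical restatement of the formatting rules described above."""
    # NaN (a float) or an empty string maps to an empty string.
    if isinstance(site, float) and math.isnan(site):
        return ""
    if site == "":
        return ""
    text = str(site)
    if "_" in text:
        # Assumption: split on the first underscore and keep it in the result.
        head, tail = text.split("_", 1)
        return f"{head.upper()}_{tail}"
    # No underscore: uppercase the whole string.
    return text.upper()
```

Under these assumptions, an input such as `"s123_abc"` keeps its suffix unchanged while the site prefix is uppercased.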
process_msgauss()
Processes the MS Gaussian data file to generate time series data.
process_msgauss_std()
Processes the MS Gaussian data file to compute transformed means and standard deviations.
process_routlimma()
Processes the Rout Limma table to generate time series data for mRNA.
update_gene_symbols(filename)
Updates the GeneID column in a CSV file by mapping GeneIDs to gene/protein symbols.
| Parameters: |
|
|---|
move_processed_files()
Moves or copies processed files to their respective directories.
Optimization Results Mapping
processing.map
map_optimization_results(tf_file_path, kin_file_path, sheet_name='Alpha Values')
Reads the TF-mRNA optimization results from an Excel file and maps mRNA to each TF.
| Parameters: |
|
|---|
| Returns: |
|
|---|
create_cytoscape_table(mapping_csv_path)
Creates a Cytoscape-compatible edge table from a mapping file.
| Parameters: |
|
|---|
| Returns: |
|
|---|
add_kinetic_strength_columns(mapping_path, mapping__path, excel_path, suffix)
Adds kinetic strength columns to the mapping files based on the provided Excel file.
| Parameters: |
|
|---|
generate_nodes(edge_df)
Infers node types and aggregates all phosphorylation sites per target node from phosphorylation edges.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Kinase-Phosphorylation Optimization
Evolutionary Algorithms
kinopt.evol.config.constants
kinopt.evol.config.logconf
ColoredFormatter
Bases: Formatter
format(record)
Format the log record with ANSI color codes and elapsed time.
| Parameters: |
|
|---|
Returns: str: The formatted log message with ANSI color codes.
remove_ansi(s)
staticmethod
Remove ANSI escape codes from a string.
| Parameters: |
|
|---|
Returns: str: The string without ANSI escape codes.
setup_logger(name='phoskintime', log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5)
Function to set up a logger with both file and console handlers.
| Parameters: |
|
|---|
| Returns: |
|
|---|
kinopt.evol.exporter.plotout
plot_residuals_for_gene(gene, gene_data)
Generates and saves combined residual-related plots for one gene with all psites in the legend.
| Parameters: |
|
|---|
opt_analyze_nsga(problem, result, F, pairs, approx_ideal, approx_nadir, asf_i, pseudo_i, n_evals, hv, hist, val, hist_cv_avg, k, igd, best_objectives, waterfall_df, convergence_df, alpha_values, beta_values)
Generates and saves plots for analyzing the NSGA optimization results.
| Parameters: |
|
|---|
| Returns: |
|
|---|
opt_analyze_de(long_df, convergence_df, ordered_optimizer_runs, x_values, y_values, val)
Generates and saves plots for analyzing the DE optimization results.
| Parameters: |
|
|---|
| Returns: |
|
|---|
kinopt.evol.exporter.sheetutils
output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, timepoints, OUT_FILE)
Function to output results to an Excel file.
| Parameters: |
|
|---|
| Returns: |
|
|---|
kinopt.evol.objfn.minfndiffevo
PhosphorylationOptimizationProblem
Bases: ElementwiseProblem
Single-objective constrained optimization problem for phosphorylation dynamics (Numba-accelerated).
Minimizes loss between observed and predicted phosphorylation levels subject to constraints that alpha and beta weights sum to 1.0 for each gene-psite and kinase group, respectively.
Objective
- minimize loss (MSE, autocorrelation, Huber, or MAPE)
Constraints g(x) <= 0:
- for each alpha group: |sum(alpha_group) - 1| <= eps_eq
- for each kinase beta group: |sum(beta_group) - 1| <= eps_eq
| Attributes: |
|
|---|
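The sum-to-one constraints can be written in the g(x) <= 0 form as in the following sketch. It is an illustration under assumptions, not the class's actual code: the helper name `sum_to_one_constraints`, the default `eps_eq=1e-6`, and the example groups are all hypothetical.

```python
def sum_to_one_constraints(groups, eps_eq=1e-6):
    """Return one g value per group; g <= 0 means the group is feasible.

    Each g is |sum(group) - 1| - eps_eq, so a group whose weights sum to 1
    within the tolerance eps_eq yields a non-positive value.
    """
    return [abs(sum(group) - 1.0) - eps_eq for group in groups]


# Illustrative alpha groups only: the first sums to 1, the second does not.
alpha_groups = [[0.25, 0.75], [0.5, 0.25]]
g = sum_to_one_constraints(alpha_groups)
```

The same helper would apply unchanged to the kinase beta groups, since both constraint families share the same form.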
estimated_series(params)
Compute estimated phosphorylation series for given parameters.
| Parameters: |
|
|---|
| Returns: |
|
|---|
residuals(params)
Compute residuals between observed and estimated phosphorylation for given parameters.
| Parameters: |
|
|---|
| Returns: |
|
|---|
kinopt.evol.objfn.minfnnsgaii
PhosphorylationOptimizationProblem
Bases: ElementwiseProblem
Multi-objective optimization
- F[0] = main loss (error)
- F[1] = alpha sum-to-1 violations (aggregated)
- F[2] = beta sum-to-1 violations (aggregated)
| Parameters: |
|
|---|
objective_function(params)
Computes the main objective function (loss) for the given parameters.
| Parameters: |
|
|---|
| Returns: |
|
|---|
kinopt.evol.opt.optrun
choose_de_pop_size(problem)
Determine an appropriate population size for Differential Evolution (DE) algorithms.
The population size is calculated based on the number of decision variables, with bounds to ensure reasonable performance. DE algorithms benefit from population sizes that are multiples of 10.
| Parameters: |
|
|---|
| Returns: |
|
|---|
choose_nsga_pop_size(problem, n_obj=3)
Determine an appropriate population size for NSGA-based multi-objective algorithms.
The population size is scaled based on the problem dimensionality (number of decision variables) with heuristic thresholds. The size is rounded to multiples of 50 and enforced to be at least 10 times the number of objectives.
| Parameters: |
|
|---|
| Returns: |
|
|---|
binary_tournament_loss_cv(pop, P, eps_cv=1e-10, cv_mode='linf', **kwargs)
Robust binary tournament comparator for constrained optimization.
This function performs binary tournament selection with constraint handling using either true constraint violations (CV) or pseudo-constrained objectives. It supports both single-objective and multi-objective formulations.
Works for
- A) single-objective: F has length 1
  - if CV exists, use constraint-domination (CV first, then F)
  - else compare by F only
- B) pseudo-constrained objectives: F = [loss, alpha_violation, beta_violation]
  - feasibility-first based on F[1], F[2], then loss
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
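Case B above (pseudo-constrained objectives) can be illustrated with a pairwise comparison like the one below. This is a hedged sketch, not the comparator itself: `tournament_winner` is a hypothetical name, the 'linf' aggregation is taken to be the maximum of the two violations, and the handling of two infeasible candidates (smaller violation wins) and of exact ties (first candidate wins) are assumptions.

```python
def tournament_winner(f_a, f_b, eps_cv=1e-10):
    """Return 0 if candidate a wins, 1 if candidate b wins.

    Each candidate is F = [loss, alpha_violation, beta_violation].
    """
    # 'linf' aggregation: the worst of the two violations.
    cv_a = max(f_a[1], f_a[2])
    cv_b = max(f_b[1], f_b[2])
    feas_a, feas_b = cv_a <= eps_cv, cv_b <= eps_cv

    # Feasibility first: a feasible candidate beats an infeasible one.
    if feas_a != feas_b:
        return 0 if feas_a else 1
    # Assumption: two infeasible candidates are ranked by smaller violation.
    if not feas_a and cv_a != cv_b:
        return 0 if cv_a < cv_b else 1
    # Otherwise compare by loss (ties go to the first candidate).
    return 0 if f_a[0] <= f_b[0] else 1
```

A feasible candidate with a large loss therefore still beats an infeasible candidate with a small loss, which is the point of the feasibility-first ordering.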
run_optimization(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, PhosphorylationOptimizationProblem)
Sets up and runs the multi-objective optimization problem for phosphorylation using an NSGA2 algorithm and a thread pool for parallelization.
| Parameters: |
|
|---|
| Returns: |
|
|---|
pick_best_loss_with_constraints_as_objectives(result, eps_cv=1e-10, cv_mode='l1', tie_tol=1e-12, tie_break='loss_then_l2')
Select the best solution from a population with constraints formulated as objectives.
This function assumes a specific objective structure where
- F[:,0] = loss (minimize)
- F[:,1] = constraint violation 1 (minimize, ideally 0)
- F[:,2] = constraint violation 2 (minimize, ideally 0)
Selection rule
- A) If any feasible solutions exist (cv1<=eps and cv2<=eps): choose minimum loss among feasible.
- B) Else: choose minimum aggregated CV; tie-break by loss; optional tie-break by ||X||2.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
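The selection rule can be sketched on a plain list of objective rows as follows. This is a simplified illustration under assumptions: `pick_best_index` is a hypothetical name, the aggregated CV uses the 'l1' mode as the sum of the two violations, and the `tie_tol` tolerance and the optional ||X||2 tie-break are omitted.

```python
def pick_best_index(F, eps_cv=1e-10):
    """Return the index of the chosen row; each row is [loss, cv1, cv2]."""
    feasible = [
        i for i, (_, cv1, cv2) in enumerate(F)
        if cv1 <= eps_cv and cv2 <= eps_cv
    ]
    if feasible:
        # Rule A: minimum loss among feasible rows.
        return min(feasible, key=lambda i: F[i][0])
    # Rule B: minimum aggregated violation (l1 sum), tie-break by loss.
    return min(range(len(F)), key=lambda i: (F[i][1] + F[i][2], F[i][0]))
```

For example, with rows `[[0.1, 0.5, 0.0], [0.9, 0.0, 0.0], [0.4, 0.0, 0.0]]` the sketch skips the infeasible first row despite its low loss and returns the index of the feasible row with loss 0.4.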
post_optimization_nsga(result, weights=np.array([1.0, 1.0, 1.0]), ref_point=np.array([3, 1, 1]))
Post-process the result of a multi-objective NSGA-based optimization run.
This function analyzes the optimization history, computes convergence metrics (hypervolume, IGD+), identifies the best solution using constraint handling, and generates CSV reports for convergence and parameter scans.
| Parameters: |
|
|---|
| Returns: |
|
|---|
post_optimization_de(result, alpha_values, beta_values)
Post-process the result of a single-objective DE or GA optimization run.
This function extracts the final population, creates parameter labels from alpha and beta values, generates a parameter scan DataFrame sorted by objective value, and produces a convergence DataFrame showing the best objective per iteration.
| Parameters: |
|
|---|
| Returns: |
|
|---|
kinopt.evol.optcon.construct
pipeline(input1_path: str, input2_path: str, time_series_columns: list[str], scaling_method: str, split_point: float, segment_points: list[float], estimate_missing_kinases: bool, kinase_to_psites: dict[str, int])
Function to run the entire pipeline for loading and processing data.
| Parameters: |
|
|---|
| Returns: |
|
|---|
load_geneid_to_psites(input1_path=INPUT1)
Function to load geneid to psite mapping from input1.csv.
Args: input1_path (str): Path to the first CSV file (HGNC data).
Returns: geneid_psite_map (dict): Dictionary mapping gene IDs to sets of psites.
get_unique_kinases(input2_path=INPUT2)
Function to extract unique kinases from input2.csv.
Args: input2_path (str): Path to the second CSV file (kinase interactions).
Returns: kinases (set): Set of unique kinases extracted from the input2 file.
check_kinases()
Function to check if kinases from input2.csv are present in input1.csv.
| Returns: |
|
|---|
kinopt.evol.utils.iodata
format_duration(seconds)
Returns a formatted string representing the duration in seconds, minutes, or hours.
| Parameters: |
|
|---|
Returns: str: The formatted duration string.
load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)
Function to load and scale data from CSV files.
| Parameters: |
|
|---|
| Returns: |
|
|---|
apply_scaling(df, time_series_columns, method, split_point, segment_points)
Function to apply different scaling methods to time-series data in a DataFrame.
| Parameters: |
|
|---|
| Returns: |
|
|---|
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
| Parameters: |
|
|---|
| Returns: |
|
|---|
organize_output_files(*directories)
Function to organize output files into protein-specific folders and a general folder.
| Parameters: |
|
|---|
| Returns: |
|
|---|
kinopt.evol.utils.params
extract_parameters(P_initial, gene_psite_counts, K_index, optimized_params)
Function to extract alpha and beta values from the optimized parameters.
| Parameters: |
|
|---|
| Returns: |
|
|---|
compute_metrics(optimized_params: np.ndarray, P_initial: dict, P_initial_array: np.ndarray, K_index: dict, K_array: np.ndarray, gene_psite_counts: list, beta_counts: dict, n: int)
Function to compute error metrics for the estimated series.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Gradient-Based Algorithms
kinopt.local.config.constants
parse_args()
kinopt.local CLI. Defaults come from config.toml.
kinopt.local.config.logconf
kinopt.local.exporter.plotout
format_timepoints(tp, tol=1e-09)
Format timepoints with minimal decimals:
- integers -> no decimal
- non-integers -> one decimal
| Parameters: |
|
|---|
| Returns: |
|
|---|
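The formatting convention can be illustrated with the sketch below. It is an assumption-laden example rather than the actual function: `format_timepoints_sketch` is a hypothetical name, the input is assumed to be an iterable of numbers, and the output is assumed to be a list of strings.

```python
def format_timepoints_sketch(tp, tol=1e-9):
    """Format each timepoint: integers without decimals, others with one."""
    labels = []
    for t in tp:
        nearest = round(t)
        if abs(t - nearest) <= tol:
            # Within tolerance of an integer: drop the decimal part.
            labels.append(str(int(nearest)))
        else:
            # Otherwise keep exactly one decimal place.
            labels.append(f"{t:.1f}")
    return labels
```

The tolerance lets values such as 4.0000000001, which arise from floating-point arithmetic, be treated as the integer 4.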
plot_fits_for_gene(gene, gene_data, real_timepoints)
Function to plot the observed and estimated phosphorylation levels for each psite of a gene.
| Parameters: |
|
|---|
export_outcomes_to_csv(outcomes, csv_path)
Export multistart optimization outcomes to CSV.
One row per start, scalar diagnostics only.
plot_cumulative_residuals(gene, gene_data, real_timepoints)
Function to plot the cumulative residuals for each psite of a gene.
| Parameters: |
|
|---|
plot_autocorrelation_residuals(gene, gene_data, real_timepoints)
Function to plot the autocorrelation of residuals for each psite of a gene.
| Parameters: |
|
|---|
plot_histogram_residuals(gene, gene_data, real_timepoints)
Function to plot histograms of residuals for each psite of a gene.
| Parameters: |
|
|---|
plot_qqplot_residuals(gene, gene_data, real_timepoints)
Function to plot QQ plots of residuals for each psite of a gene.
| Parameters: |
|
|---|
plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)
Read a multistart summary CSV and plot objective vs rank with point color = runtime.
Minimal, information-dense conventions:
- x: rank (best -> worst)
- y: final objective (fun)
- color: runtime in seconds
- optional: de-emphasize non-success / infeasible points (if columns exist)
| Parameters: |
|
|---|
| Returns: |
|
|---|
kinopt.local.exporter.sheetutils
format_timepoints(tp, tol=1e-09)
Format timepoints with minimal decimals:
- integers -> no decimal
- non-integers -> one decimal
| Parameters: |
|
|---|
| Returns: |
|
|---|
plot_fits_for_gene(gene, gene_data, real_timepoints)
Function to plot the observed and estimated phosphorylation levels for each psite of a gene.
| Parameters: |
|
|---|
export_outcomes_to_csv(outcomes, csv_path)
Export multistart optimization outcomes to CSV.
One row per start, scalar diagnostics only.
plot_cumulative_residuals(gene, gene_data, real_timepoints)
Function to plot the cumulative residuals for each psite of a gene.
| Parameters: |
|
|---|
plot_autocorrelation_residuals(gene, gene_data, real_timepoints)
Function to plot the autocorrelation of residuals for each psite of a gene.
| Parameters: |
|
|---|
plot_histogram_residuals(gene, gene_data, real_timepoints)
Function to plot histograms of residuals for each psite of a gene.
| Parameters: |
|
|---|
plot_qqplot_residuals(gene, gene_data, real_timepoints)
Function to plot QQ plots of residuals for each psite of a gene.
| Parameters: |
|
|---|
plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)
Read a multistart summary CSV and plot objective vs rank with point color = runtime.
Minimal, information-dense conventions:
- x: rank (best -> worst)
- y: final objective (fun)
- color: runtime in seconds
- optional: de-emphasize non-success / infeasible points (if columns exist)
| Parameters: |
|
|---|
| Returns: |
|
|---|
output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, mse, rmse, mae, mape, r_squared)
Function to output the results of the optimization process.
| Parameters: |
|
|---|
export_params_npz(outcomes, path)
Export the optimized parameters to a compressed npz file.
| Parameters: |
|
|---|
kinopt.local.objfn.minfn
kinopt.local.opt.optrun
StartOutcome
dataclass
Outcome of a single optimization start.
:param start_id: ID of the start.
:param seed: Seed used for the start.
:param result: Result of the optimization.
:param optimized_params: Optimized parameters.
:param fun: Objective function value.
:param success: Whether the optimization was successful.
:param constr_violation: Constraint violation.
:param runtime_s: Runtime of the optimization.
run_optimization(obj_fun, params_initial, opt_method, bounds, constraints)
Run optimization using the specified method.
| Parameters: |
|
|---|
| Returns: |
|
|---|
multistart_run_optimization(obj_fun, params_initial, opt_method, bounds, constraints, n_starts=24, n_jobs=-1, base_seed=1234, init_strategy='hybrid', jitter_scale=0.15, prefer_feasible=True, logger=None)
Runs run_optimization multiple times in parallel and returns (best_result, best_params, outcomes).
Selection logic, applied in order:
- 1) If prefer_feasible: prefer feasible starts (cv <= 0), otherwise the smallest constraint violation.
- 2) Then lowest objective.
- 3) Then success=True as tie-breaker.
- 4) Then shortest runtime as final tie-breaker.
| Parameters: |
|
|---|
| Returns: |
|
|---|
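The selection order can be expressed as a single sort key, as in this sketch. It is illustrative only: `OutcomeSketch` is a hypothetical stand-in carrying a subset of the StartOutcome fields, `select_best` is a hypothetical name, and clamping negative violations to zero so that all feasible starts compare equal is an assumption.

```python
from dataclasses import dataclass


@dataclass
class OutcomeSketch:
    """Hypothetical stand-in with a subset of the StartOutcome fields."""
    start_id: int
    fun: float
    success: bool
    constr_violation: float
    runtime_s: float


def select_best(outcomes, prefer_feasible=True):
    """Pick the best start using the lexicographic order described above."""
    def key(o):
        # Assumption: any cv <= 0 counts as feasible and compares as 0.
        cv = max(o.constr_violation, 0.0) if prefer_feasible else 0.0
        # Tuple order: violation, objective, success (True first), runtime.
        return (cv, o.fun, not o.success, o.runtime_s)

    return min(outcomes, key=key)


outcomes = [
    OutcomeSketch(0, 0.1, True, 0.5, 1.0),  # best objective but infeasible
    OutcomeSketch(1, 0.3, True, 0.0, 2.0),
    OutcomeSketch(2, 0.3, True, 0.0, 1.5),  # same objective, faster
]
best = select_best(outcomes)
```

With `prefer_feasible=False` the violation term is ignored and the start with the lowest objective wins regardless of feasibility.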
kinopt.local.optcon.construct
load_geneid_to_psites(input1_path=INPUT1)
Load the geneid to psite mapping from a CSV file.
| Parameters: |
|
|---|
Returns: defaultdict: A dictionary mapping geneid to a set of psites.
get_unique_kinases(input2_path=INPUT2)
Extract unique kinases from the input CSV file.
| Parameters: |
|
|---|
Returns: set: A set of unique kinases.
check_kinases()
Check if kinases in input2.csv are present in input1.csv and log the results.
kinopt.local.utils.iodata
format_duration(seconds)
Formats a duration in seconds into a human-readable string.
- If less than 60 seconds, returns in seconds.
- If less than 3600 seconds, returns in minutes.
- Otherwise, returns in hours.
:param seconds: Duration in seconds.
:return: Formatted string
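The thresholds can be sketched as below. The exact output format is not documented here, so the two-decimal values and the `sec`/`min`/`hr` unit labels are assumptions, as is the name `format_duration_sketch`.

```python
def format_duration_sketch(seconds):
    """Pick a unit based on magnitude (hypothetical output format)."""
    if seconds < 60:
        return f"{seconds:.2f} sec"
    if seconds < 3600:
        return f"{seconds / 60:.2f} min"
    # 3600 seconds or more is reported in hours.
    return f"{seconds / 3600:.2f} hr"
```

For instance, under this assumed format 90 seconds is reported as `1.50 min`.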
load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)
Load and scale the data from the specified input files.
:param estimate_missing:
:param scaling_method:
:param split_point:
:param seg_points:
:return: Time series data, interaction data, observed data
apply_scaling(df, cols, method, split_point, seg_points)
Apply scaling to the specified columns of a DataFrame based on the given method. The scaling methods include:
- 'min_max': Min-Max scaling
- 'log': Logarithmic scaling
- 'temporal': Temporal scaling (two segments)
- 'segmented': Segmented scaling (multiple segments)
- 'slope': Slope scaling
- 'cumulative': Cumulative scaling
:param df:
:param cols:
:param method:
:param split_point:
:param seg_points:
:return: df
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
For each gene folder (e.g. "ABL2"), the report will include:
- All PNG plots and interactive HTML plots displayed in a grid with three plots per row.
- Each plot is confined to a fixed size of 900px by 900px.
- Data tables from XLSX or CSV files in the gene folder are displayed below the plots, one per row.
| Parameters: |
|
|---|
organize_output_files(*directories)
Function to organize output files into protein-specific folders. It moves files matching the pattern 'protein_name_*.{json,svg,png,html,csv,xlsx}' into a folder named after the protein (e.g., 'ABL2') and moves all other files into a 'General' folder within the same directory.
:param directories:
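The file-sorting behaviour described above can be sketched for a single directory as follows. This is a minimal illustration, not the library's code: `organize_sketch` is a hypothetical name, and the sketch assumes the protein name is everything before the first underscore in the file name.

```python
import re
import shutil
from pathlib import Path

_EXTS = ("json", "svg", "png", "html", "csv", "xlsx")
# Assumption: '<protein>_<anything>.<ext>' with no underscore in <protein>.
_PATTERN = re.compile(r"^([^_]+)_.+\.(?:%s)$" % "|".join(_EXTS))


def organize_sketch(directory):
    """Move matching files into per-protein folders, the rest into 'General'."""
    directory = Path(directory)
    # Snapshot the listing first, since folders are created while iterating.
    for path in list(directory.iterdir()):
        if not path.is_file():
            continue
        match = _PATTERN.match(path.name)
        folder = directory / (match.group(1) if match else "General")
        folder.mkdir(exist_ok=True)
        shutil.move(str(path), str(folder / path.name))
```

Calling the sketch on a directory containing `ABL2_fit.png` and `summary.txt` would leave `ABL2/ABL2_fit.png` and `General/summary.txt`.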
kinopt.local.utils.params
extract_parameters(P_initial, gene_kinase_counts, total_alpha, unique_kinases, K_index, optimized_params)
Extracts the alpha and beta parameters from the optimized parameters.
:param P_initial:
:param gene_kinase_counts:
:param total_alpha:
:param unique_kinases:
:param K_index:
:param optimized_params:
:return: Alpha and beta values as dictionaries
compute_metrics(optimized_params, P_init_dense, t_max, gene_alpha_starts, gene_kinase_counts, gene_kinase_idx, total_alpha, kinase_beta_starts, kinase_beta_counts, K_data, K_indices, K_indptr)
Computes the estimated series and various metrics based on the optimized parameters.
:param optimized_params:
:param P_init_dense:
:param t_max:
:param gene_alpha_starts:
:param gene_kinase_counts:
:param gene_kinase_idx:
:param total_alpha:
:param kinase_beta_starts:
:param kinase_beta_counts:
:param K_data:
:param K_indices:
:param K_indptr:
:return: Estimated series, residuals, MSE, RMSE, MAE, MAPE, R-squared
Fitting Analysis & Feasibility
kinopt.fitanalysis.helpers.postfit
goodnessoffit(estimated, observed)
Plots the goodness of fit and the Kullback-Leibler divergence for estimated and observed values.
| Parameters: |
|
|---|
| Returns: |
|
|---|
reshape_alpha_beta(alpha_values, beta_values)
Function to reshape alpha and beta values for plotting.
| Parameters: |
|
|---|
Returns: pd.DataFrame: Reshaped DataFrame containing both alpha and beta values.
perform_pca(df)
Function to perform PCA analysis on the given DataFrame.
| Parameters: |
|
|---|
| Returns: |
|
|---|
plot_pca(result_df_sorted, y_axis_column)
Plot PCA or t-SNE results for each gene/psite. The function creates scatter plots with different markers for alpha and beta parameters, and adds labels for each point. The function also adjusts text labels to avoid overlap using the adjustText library.
:param result_df_sorted: DataFrame containing PCA or t-SNE results.
:param y_axis_column: Column name for the y-axis values in the plot.
perform_tsne(scaled_data, df)
Perform t-SNE analysis on the given scaled data. The function returns a DataFrame with t-SNE results and additional columns for type and gene/psite information.
:param scaled_data:
:param df:
:return: pd.DataFrame: DataFrame with t-SNE results and additional columns.
additional_plots(df, scaled_data, alpha_values, beta_values, residuals_df)
Function to create additional plots including CDF, KDE, Boxplot, and Hierarchical Clustering.
:param df:
:param scaled_data:
:param alpha_values:
:param beta_values:
:param residuals_df:
create_sankey_from_network(output_dir, data, title)
Creates a Sankey diagram from the given data and saves it as an HTML file.
This function processes the input data to generate nodes and links for a Sankey diagram. It assigns colors to nodes and links based on their attributes and values, and uses Plotly to render the diagram. The resulting diagram is saved as an HTML file in the specified output directory.
:param output_dir: str. The directory where the Sankey diagram HTML file will be saved.
:param data: pd.DataFrame. A DataFrame containing the data for the Sankey diagram. It must include the following columns:
- 'Source': The source node of the link.
- 'Target': The target node of the link.
- 'Value': The value of the link, which determines the flow size.
:param title: str. The title of the Sankey diagram.
The function performs the following steps:
1. Initializes nodes and links for the Sankey diagram.
2. Maps node labels to indices and assigns colors to nodes.
3. Processes the data to create links between nodes, assigning colors based on link values.
4. Builds the Sankey diagram using Plotly.
5. Adds a color bar to explain the flow gradient.
6. Saves the Sankey diagram as an HTML file in the specified output directory.
important_connections(output_dir, data, top_n=20)
Extracts the top N most important connections based on their absolute values and saves them to a CSV file.
:param output_dir: str. The directory where the CSV file will be saved.
:param data: pd.DataFrame. A DataFrame containing the connections with columns 'Source', 'Target', and 'Value'.
:param top_n: int, optional. The number of top connections to extract (default is 20).
The function sorts the connections by their absolute values in descending order, selects the top N connections, and saves them to a CSV file named 'top_connections.csv' in the specified output directory.
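The ranking step can be illustrated without pandas as in the sketch below. It is a simplified stand-in, not the function itself: `top_connections` is a hypothetical name, the rows are plain dictionaries rather than a DataFrame, and writing 'top_connections.csv' is omitted.

```python
def top_connections(rows, top_n=20):
    """Return the top_n rows ranked by absolute 'Value', largest first."""
    return sorted(rows, key=lambda r: abs(r["Value"]), reverse=True)[:top_n]


# Illustrative rows only; the node names are made up for the example.
rows = [
    {"Source": "K1", "Target": "P1", "Value": 0.2},
    {"Source": "K2", "Target": "P1", "Value": -0.9},
    {"Source": "K3", "Target": "P2", "Value": 0.5},
]
top_two = top_connections(rows, top_n=2)
```

Ranking by absolute value means a strongly negative connection (here -0.9) outranks a weaker positive one.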
kinopt.optimality.KKT
generate_latex_table(summary_dict, table_caption, table=None)
Function to generate a LaTeX table from a summary dictionary.
| Parameters: |
|
|---|
| Returns: |
|
|---|
print_primal_feasibility_results(primal_summary, alpha_violations, beta_violations, logger_obj=None)
Logs the primal feasibility summary and violation details.
| Parameters: |
|
|---|
print_sensitivity_and_active_constraints(sensitivity_summary, active_constraints_summary, logger_obj=None)
Logs the sensitivity summary and active constraints summary.
| Parameters: |
|
|---|
plot_constraint_violations(alpha_violations, beta_violations, out_dir)
Function to plot constraint violations for alpha and beta values. It creates a stacked bar plot showing the violations for each protein. The top 5 proteins with the highest violations are highlighted in red.
| Parameters: |
|
|---|
plot_sensitivity_analysis(sensitivity_analysis, out_dir)
Function to plot sensitivity analysis results. It creates a horizontal bar plot showing the mean, max, and min sensitivity for each protein.
| Parameters: |
|
|---|
| Returns: |
|
|---|
process_excel_results(file_path=OUT_FILE)
Function to process the Excel results file. It reads the alpha and beta values, estimated and observed values, validates normalization constraints, computes residuals and gradients, and generates LaTeX tables for the residuals and sensitivity summaries. It also performs sensitivity analysis and identifies high sensitivity sites. The results are returned as a dictionary.
| Parameters: |
|
|---|
Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.
post_optimization_results()
Function to process and visualize the results of the optimization.
Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.
TF-mRNA Optimization
Evolutionary Algorithms
tfopt.evol.config.constants
parse_args()
tfopt.evol CLI: bounds, loss, optimizer selection. Defaults come from config.toml.
tfopt.evol.config.logconf
tfopt.evol.exporter.plotout
plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)
Plot the estimated vs observed expression levels for a set of genes.
| Parameters: |
|
|---|
compute_predictions(x, regulators, protein_mat, psite_tensor, n_reg, T_use, n_mRNA, beta_start_indices, num_psites)
Compute the predicted expression levels based on the optimization variables.
| Parameters: |
|
|---|
tfopt.evol.exporter.sheetutils
save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)
Save the optimization results to an Excel file.
| Parameters: |
|
|---|
tfopt.evol.objfn.minfn
TFOptimizationMultiObjectiveProblem
Bases: Problem
Represents a multi-objective optimization problem specific to transcription factor (TF) and mRNA synthesis dynamics.
This class is an extension of the Problem class and is designed to model complex biological processes by incorporating various dynamic parameters like regulators, protein matrices, psite tensors, and associated configurations. It supports parallel evaluation for multi-thread usage, optimizing performance for large populations.
| Attributes: |
|
|---|
__init__(n_var: int, n_mRNA: int, n_TF: int, n_reg: int, n_psite_max: int, n_alpha: int, mRNA_mat: np.ndarray, regulators: np.ndarray, protein_mat: np.ndarray, psite_tensor: np.ndarray, T_use: int, beta_start_indices: np.ndarray, num_psites: np.ndarray, no_psite_tf: np.ndarray, xl: Optional[np.ndarray] = None, xu: Optional[np.ndarray] = None, **kwargs)
Initializes the class with various parameters required for computational evaluation.
| Parameters: |
|
|---|
tfopt.evol.opt.optrun
run_optimization(problem, total_dim, optimizer)
Execute multi-objective optimization using the specified algorithm.
This function configures and runs one of three multi-objective evolutionary algorithms (UNSGA3, SMSEMOA, or AGEMOEA) on the provided optimization problem. The algorithm is configured with appropriate genetic operators (two-point crossover and polynomial mutation) and terminated after 1000 generations.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Notes
- Population size is set to 2 * total_dim (or larger for UNSGA3 if needed)
- Crossover probability: 0.9
- Mutation probability: 1.0 / total_dim
- Mutation distribution index (eta): 20
- Termination: Fixed at 1000 generations
- Random seed: 1 (for reproducibility)
- Duplicate elimination is enabled for all algorithms
- UNSGA3 automatically adjusts population size to match reference directions
tfopt.evol.optcon.construct
build_fixed_arrays(mRNA_ids, mRNA_mat, TF_ids, protein_dict, psite_dict, psite_labels_dict, reg_map)
Builds fixed-shape arrays from the input data.
| Parameters: |
|
|---|
Returns:
- mRNA_mat (np.ndarray): Matrix of mRNA expression levels.
- regulators (np.ndarray): Matrix of regulators for each mRNA.
- protein_mat (np.ndarray): Matrix of TF protein levels.
- psite_tensor (np.ndarray): Tensor of phosphorylation sites.
- n_reg (int): Number of regulators.
- n_psite_max (int): Maximum number of phosphorylation sites across all TFs.
- psite_labels_arr (list): List of phosphorylation site labels for each TF.
- num_psites (np.ndarray): Array indicating the number of phosphorylation sites for each TF.
tfopt.evol.optcon.filter
load_raw_data()
Load raw data from files.
| Returns: |
|
|---|
filter_mrna(mRNA_ids, mRNA_mat, reg_map)
Filter mRNA genes to only those with regulators present in the regulation map.
| Parameters: |
|
|---|
| Returns: |
|
|---|
update_regulations(mRNA_ids, reg_map, TF_ids)
Update the regulation map to only include relevant transcription factors.
| Parameters: |
|
|---|
| Returns: |
|
|---|
filter_TF(TF_ids, protein_dict, psite_dict, psite_labels_dict, relevant_TFs)
Filter transcription factors to only those present in the relevant_TFs set.
| Parameters: |
|
|---|
| Returns: |
|
|---|
determine_T_use(mRNA_mat, TF_time_cols)
Determine the number of time points to use for the analysis.
| Parameters: |
|
|---|
tfopt.evol.utils.iodata
load_mRNA_data(filename=INPUT3)
Load mRNA data from a CSV file.
| Parameters: |
|
|---|
Returns:
- mRNA_ids: List of mRNA gene identifiers (strings).
- mRNA_mat: Matrix of mRNA expression data (numpy array).
- time_cols: List of time columns (excluding "GeneID").
load_TF_data(filename=INPUT1)
Load TF data from a CSV file.
| Parameters: |
|
|---|
Returns:
- TF_ids: List of TF identifiers (strings).
- protein_dict: Dictionary mapping TF identifiers to their protein data (numpy array).
- psite_dict: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays).
- psite_labels_dict: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings).
- time_cols: List of time columns (excluding "GeneID" and "Psite").
load_regulation(filename=INPUT4)
Load regulation data from a CSV file.
| Parameters: |
|
|---|
Returns:
- reg_map: Dictionary mapping mRNA genes to their regulators (list of TF identifiers).
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
| Parameters: |
|
|---|
organize_output_files(*directories)
Organizes output files from multiple directories into separate folders for each protein.
| Parameters: |
|
|---|
format_duration(seconds)
Format a duration in seconds into a human-readable string.
| Parameters: |
|
|---|
Returns: str: Formatted duration string.
tfopt.evol.utils.params
create_no_psite_array(n_TF, num_psites, psite_labels_arr)
Create an array indicating whether each TF has no phosphorylation sites.
| Parameters: |
|
|---|
| Returns: |
|
|---|
compute_beta_indices(num_psites, n_TF)
Compute the starting indices for the beta parameters for each TF.
| Parameters: |
|
|---|
| Returns: |
|
|---|
create_initial_guess(n_mRNA, n_reg, n_TF, num_psites, no_psite_tf)
Create the initial guess for the optimization variables.
| Parameters: |
|
|---|
| Returns: |
|
|---|
create_bounds(n_alpha, n_beta_total, lb, ub)
Create the lower and upper bounds for the optimization variables.
| Parameters: |
|
|---|
| Returns: |
|
|---|
get_parallel_runner()
Get a parallel runner for multi-threading.
| Returns: |
|
|---|
extract_best_solution(res, n_alpha, n_mRNA, n_reg, n_TF, num_psites, beta_start_indices)
Extract the best solution from the optimization results.
| Parameters: |
|
|---|
| Returns: |
|
|---|
print_alpha_mapping(mRNA_ids, reg_map, TF_ids, final_alpha)
Print the mapping of transcription factors (TFs) to mRNAs with their corresponding alpha values.
| Parameters: |
|
|---|
print_beta_mapping(TF_ids, final_beta, psite_labels_arr)
Print the mapping of transcription factors (TFs) to their beta parameters.
| Parameters: |
|
|---|
Gradient-Based Algorithms
tfopt.local.config.constants
parse_args()
tfopt.local CLI: bounds and loss selection. Defaults come from config.toml.
tfopt.local.config.logconf
tfopt.local.exporter.plotout
plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)
Plots the estimated vs observed values for a given set of genes and their corresponding TFs.
| Parameters: |
|
|---|
plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)
Creates a scatter plot visualizing multi-start optimization results with runtime overlay.
This function reads a CSV summary of multiple optimization runs and generates a scatter plot showing the relationship between run rank and final objective value, with runtime (or iterations) represented as color intensity. Successful and feasible runs are emphasized while unsuccessful or infeasible runs are shown with reduced opacity.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Notes
- Points are considered feasible if constraint violation <= 1e-8
- Infeasible or unsuccessful runs are plotted with reduced opacity (0.25)
- If rank column is missing, it's automatically generated from objective values
- If runtime column is missing, falls back to iteration count or constant color
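The rules in these notes can be sketched without any plotting code. The example below works on plain dictionaries keyed by the default column names from the signature (`fun`, `success`, `constr_violation`, `rank`); the helper name `style_runs` and the full opacity of 1.0 for emphasized runs are assumptions.

```python
FEAS_TOL = 1e-8      # feasibility threshold quoted in the notes
FADED_ALPHA = 0.25   # opacity for infeasible or unsuccessful runs


def style_runs(runs):
    """Return copies of the run dicts with 'rank' and 'alpha' filled in."""
    styled = [dict(r) for r in runs]
    # Generate ranks from objective values when none is supplied (1 = best).
    order = sorted(range(len(styled)), key=lambda i: styled[i]["fun"])
    for rank, i in enumerate(order, start=1):
        styled[i].setdefault("rank", rank)
    for r in styled:
        feasible = r["constr_violation"] <= FEAS_TOL
        r["alpha"] = 1.0 if (r["success"] and feasible) else FADED_ALPHA
    return styled


runs = [
    {"fun": 2.0, "success": True, "constr_violation": 0.0},
    {"fun": 1.0, "success": True, "constr_violation": 1e-3},
    {"fun": 3.0, "success": False, "constr_violation": 0.0},
]
for r in style_runs(runs):
    print(r["rank"], r["fun"], r["alpha"])
```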
tfopt.local.exporter.sheetutils
save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)
Save the optimization results to an Excel file.
| Parameters: |
|
|---|
export_multistart_results(results)
Export multiple multistart optimization results to an Excel file.
| Parameters: |
|
|---|
| Returns: |
|
|---|
save_multistart_solutions_npz(all_results, out_path)
Saves multistart optimization solutions to a compressed .npz file format.
This function aggregates optimization results into a structured format and saves them in a compressed NumPy .npz file. It processes the solutions, extracting relevant attributes such as optimization variables, function values, success status, and starting IDs, before saving them for later use.
| Parameters: |
|
|---|
tfopt.local.objfn.minfn
objective_(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type, lam1=1e-06, lam2=1e-06)
Originally implemented by Julius Normann.
This version has been modified and optimized for consistency & speed in submodules by Abhinav Mishra.
Computes a loss value using one of several loss functions.
| Parameters: |
|
|---|
| Returns: |
|
|---|
compute_predictions(x, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites)
Computes the predicted expression matrix based on the decision vector x.
| Parameters: |
|
|---|
| Returns: |
|
|---|
objective_wrapper(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)
Wrapper function for the objective function.
| Parameters: |
|
|---|
| Returns: |
|
|---|
tfopt.local.opt.optrun
run_optimizer(x0, bounds, lin_cons, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)
Runs the optimization algorithm to minimize the objective function.
| Parameters: |
|
|---|
Returns: result : Result of the optimization process, including the optimized parameters and objective value.
generate_multistart_x0(x0: np.ndarray, bounds: Sequence[Tuple[float, float]], n_starts: int, seed: int = 0, jitter_frac: float = 0.05, p_random: float = 0.3) -> List[np.ndarray]
Generates multiple starting points for multi-start optimization.
Builds a diverse list of starting points:
- mostly: jittered around the baseline x0
- some: fully random within bounds
jitter_frac is relative to (ub - lb). p_random is the fraction of starts that are random within bounds.
| Returns: |
|
|---|
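The strategy described above can be sketched with the standard library alone. This is not the package implementation: it uses `random.Random` instead of NumPy, treats `p_random` as a per-start probability (so the random fraction holds only in expectation), assumes finite bounds, and keeps the unmodified baseline as the first start. The name `multistart_x0_sketch` is hypothetical.

```python
import random


def multistart_x0_sketch(x0, bounds, n_starts, seed=0, jitter_frac=0.05, p_random=0.3):
    """Return n_starts points: the baseline, jittered copies, and random draws."""
    rng = random.Random(seed)
    starts = [list(x0)]  # assumption: the baseline itself is the first start
    while len(starts) < n_starts:
        if rng.random() < p_random:
            # Fully random point inside the box.
            point = [rng.uniform(lb, ub) for lb, ub in bounds]
        else:
            # Gaussian jitter scaled by the bound width, clipped into the box.
            point = [
                min(max(xi + rng.gauss(0.0, jitter_frac * (ub - lb)), lb), ub)
                for xi, (lb, ub) in zip(x0, bounds)
            ]
        starts.append(point)
    return starts[:n_starts]


points = multistart_x0_sketch([0.5, 0.5], [(0.0, 1.0), (0.0, 1.0)], n_starts=5, seed=1)
print(len(points))
```

Fixing the seed keeps the set of starts reproducible across runs, which makes multistart summaries comparable.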
run_optimizer_multistart(x0: np.ndarray, bounds, lin_cons, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type, run_optimizer_func, cfg: Optional[MultiStartConfig] = None, polish: bool = True)
Executes a multistart optimization loop with parallelization and optional polishing to find the best solution across multiple starting points. The function leverages a parallel approach for running multiple optimizations, selects the best result based on predefined sorting criteria, and optionally refines it.
| Parameters: |
|
|---|
| Returns: |
|
|---|
tfopt.local.optcon.construct
build_fixed_arrays(gene_ids, expression_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, reg_map)
Builds fixed-shape arrays from the input data.
| Parameters: |
|
|---|
| Returns: |
|
|---|
constraint_alpha_func(x, n_genes, n_reg)
For each gene, the sum of its alpha parameters must equal 1.
| Parameters: |
|
|---|
| Returns: |
|
|---|
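As an illustration of the sum-to-one rule, the sketch below returns one residual per gene. It assumes the alpha parameters occupy the first `n_genes * n_reg` entries of `x`, gene by gene; that layout and the name `alpha_constraint_residuals` are assumptions.

```python
def alpha_constraint_residuals(x, n_genes, n_reg):
    """Return one residual per gene: the sum of that gene's alphas minus 1."""
    residuals = []
    for g in range(n_genes):
        block = x[g * n_reg:(g + 1) * n_reg]  # alphas of gene g (assumed layout)
        residuals.append(sum(block) - 1.0)
    return residuals


# Two genes with two regulators each; the trailing entry stands in for betas.
x = [0.25, 0.75, 0.5, 0.5, 9.0]
print(alpha_constraint_residuals(x, n_genes=2, n_reg=2))
```

A solver treats the constraint as satisfied when every residual is zero within its tolerance.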
constraint_beta_func(x, n_alpha, n_TF, beta_start_indices, num_psites, no_psite_tf)
For each TF, the sum of its beta parameters must equal 1.
| Parameters: |
|
|---|
| Returns: |
|
|---|
build_linear_constraints(n_genes, n_TF, n_reg, n_alpha, beta_start_indices, num_psites, no_psite_tf)
Build linear constraints for the transcription factor optimization problem.
| Parameters: |
|
|---|
| Returns: |
|
|---|
tfopt.local.optcon.filter
load_and_filter_data()
Load and filter data for the optimization problem.
| Returns: |
|
|---|
prepare_data(gene_ids, expr_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, tf_time_cols, reg_map)
Prepares the data for optimization by filtering the expression matrix to match the number of time points and building fixed arrays.
| Parameters: |
|
|---|
Returns:
fixed_arrays (tuple): Tuple containing the fixed arrays:
- expression_matrix: array of shape (n_genes, T).
- regulators: array of shape (n_genes, n_reg) with indices into tf_ids.
- tf_protein_matrix: array of shape (n_TF, T).
- psite_tensor: array of shape (n_TF, n_psite_max, T), padded with zeros.
- n_reg: maximum number of regulators per gene.
- n_psite_max: maximum number of PSites among TFs.
- psite_labels_arr: list (length n_TF) of lists of PSite names (padded with empty strings).
- num_psites: array of length n_TF with the actual number of PSites for each TF.
T_use (int): Number of time points used in the expression matrix.
tfopt.local.utils.iodata
min_max_normalize(df, custom_max=None)
Row-wise (per-sample) min-max normalize time-series columns starting with 'x'.
| Parameters: |
|
|---|
| Returns: |
|
|---|
load_expression_data(filename=INPUT3)
Loads gene expression (mRNA) data.
| Parameters: |
|
|---|
| Returns: |
|
|---|
load_tf_protein_data(filename=INPUT1)
Loads TF protein data along with PSite information.
| Parameters: |
|
|---|
Returns:
- tf_ids: List of TF identifiers (strings).
- tf_protein: Dictionary mapping TF identifiers to their protein data (numpy array).
- tf_psite_data: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays).
- tf_psite_labels: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings).
- time_cols: List of time columns (excluding "GeneID" and "Psite").
load_regulation(filename=INPUT4)
Returns a mapping from gene (source) to a list of TFs (targets).
| Parameters: |
|
|---|
Returns: - reg_map: Dictionary mapping gene identifiers to lists of TF identifiers.
summarize_stats(input3=INPUT3, input1=INPUT1, input4=INPUT4)
Summarizes statistics for the expression data (input3) and TF protein data (input1).
| Parameters: |
|
|---|
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
| Parameters: |
|
|---|
organize_output_files(*directories)
Function to organize output files into protein-specific folders.
| Parameters: |
|
|---|
tfopt.local.utils.params
get_optimization_parameters(expression_matrix, tf_protein_matrix, n_reg, T_use, psite_labels_arr, num_psites, lb, ub)
Prepare the optimization parameters for the optimization problem.
| Parameters: |
|
|---|
Returns:
- x0 (np.ndarray): Initial guess for the optimization variables.
- n_alpha (int): Number of alpha parameters.
- beta_start_indices (np.ndarray): Starting indices for beta parameters.
- bounds (list): List of bounds for the optimization variables.
- no_psite_tf (np.ndarray): Array indicating whether each TF has no phosphorylation sites.
- n_genes (int): Number of genes.
- n_TF (int): Number of transcription factors.
postprocess_results(result, n_alpha, n_genes, n_reg, beta_start_indices, num_psites, reg_map, gene_ids, tf_ids, psite_labels_arr)
Post-process the optimization results to extract the final alpha and beta parameters.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Fitting Analysis
tfopt.fitanalysis.helper
Plotter
A class to plot various analysis results from an Excel file.
__init__(filepath, savepath)
Initializes the Plotter instance by loading data from the Excel file.
Args:
- filepath (str): Path to the Excel file containing analysis results.
- savepath (str): Directory where the plots will be saved.
load_data()
Loads data from the specified Excel file. Args: filepath (str): Path to the Excel file. savepath (str): Directory where the plots will be saved.
plot_alpha_distribution()
Plots the distribution of alpha parameter values grouped by transcription factors (TFs) using a strip plot.
plot_beta_barplots()
Processes the beta values DataFrame and creates a separate bar plot for each unique transcription factor (TF).
plot_heatmap_abs_residuals()
Plots a heatmap of the absolute values of the residuals.
plot_goodness_of_fit()
Creates a scatter plot comparing observed vs. estimated values, fits a linear regression model, plots the 95% confidence interval, and labels points outside the confidence interval.
plot_kld()
Plots the Kullback-Leibler Divergence (KLD) for each mRNA. The KLD is calculated between the observed and estimated distributions of the mRNA expression levels.
plot_pca()
Plots a PCA (Principal Component Analysis) of the observed and estimated values.
plot_boxplot_alpha()
Plots a boxplot of the alpha values.
plot_boxplot_beta()
Plots a boxplot of the beta values.
plot_cdf_alpha()
Plots the cumulative distribution function (CDF) of the alpha values.
plot_cdf_beta()
Plots the cumulative distribution function (CDF) of the beta values.
plot_time_wise_residuals()
Plots the residuals over time for each mRNA.
ODE Modelling & Parameter Estimation
Configuration
config.cli
Command‑line entry point for the phoskintime pipeline.
Usage
Run the commands from one level above the package root, i.e. the working directory
in which the project directory is visible.
Run everything with the default (local) solver:
python phoskintime all
Run only preprocessing:
python phoskintime prep
Run tfopt with the local flavour:
python phoskintime tfopt --mode local
Run tfopt with the evol flavour:
python phoskintime tfopt --mode evol
Run kinopt with the local flavour:
python phoskintime kinopt --mode local
Run kinopt with the evol flavour:
python phoskintime kinopt --mode evol
Run the model:
python phoskintime model
prep()
Preprocess data (processing.cleanup).
tfopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))
Transcription-Factor-mRNA Optimisation.
| Parameters: |
|
|---|
Returns: None
kinopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))
Kinase-Phosphorylation Optimization.
| Parameters: |
|
|---|
Returns: None
model(conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to model config file. Uses defaults if omitted.'))
Run the model (bin.main).
| Parameters: |
|
|---|
Returns: None
all(tf_mode: str = typer.Option('local', help='tfopt mode: local | evol'), kin_mode: str = typer.Option('local', help='kinopt mode: local | evol'), tf_conf: Path | None = typer.Option(None, help='tfopt config file'), kin_conf: Path | None = typer.Option(None, help='kinopt config file'), model_conf: Path | None = typer.Option(None, help='model config file'))
Run every stage in sequence. Preprocessing -> TF optimisation -> Kinase optimisation -> Model.
| Parameters: |
|
|---|
Returns: None
config.config
parse_bound_pair(val)
Parse a string representing a pair of bounds (lower, upper) into a tuple of floats. The upper bound can be 'inf' or 'infinity' to represent infinity. Raises ValueError if the input is not in the correct format.
Args: val (str): The string to parse, e.g., "0,3" or "0,infinity".
Returns: tuple: A tuple containing the lower and upper bounds as floats.
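A minimal sketch of the parsing behaviour described here, written against the documented inputs only. The name `parse_bound_pair_sketch` and the error message are hypothetical; the real function may validate more strictly.

```python
import math


def parse_bound_pair_sketch(val):
    """Parse 'lower,upper' into a tuple of floats; upper may be 'inf' or 'infinity'."""
    parts = [p.strip() for p in val.split(",")]
    if len(parts) != 2:
        raise ValueError(f"Expected 'lower,upper', got {val!r}")
    lower_text, upper_text = parts
    lower = float(lower_text)  # float() raises ValueError on non-numeric text
    if upper_text.lower() in ("inf", "infinity"):
        upper = math.inf
    else:
        upper = float(upper_text)
    return lower, upper


print(parse_bound_pair_sketch("0,3"))
print(parse_bound_pair_sketch("0,infinity"))
```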
parse_fix_value(val)
Parse a fixed value or a list of fixed values from a string. If the input is a single value, it returns that value as a float. If the input is a comma-separated list, it returns a list of floats. Raises ValueError if the input is not in the correct format.
Args: val (str): The string to parse, e.g., "1.0" or "1.0,2.0".
Returns: float or list: The parsed fixed value(s) as a float or a list of floats.
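The single-value versus list behaviour can be sketched as follows. The name `parse_fix_value_sketch` is hypothetical, and the sketch relies on `float()` to raise ValueError for malformed input.

```python
def parse_fix_value_sketch(val):
    """Return a float for '1.0' and a list of floats for '1.0,2.0'."""
    if "," in val:
        return [float(part) for part in val.split(",")]
    return float(val)


print(parse_fix_value_sketch("1.0"))
print(parse_fix_value_sketch("1.0,2.0"))
```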
ensure_output_directory(directory)
| Parameters: |
|
|---|
Returns: None
parse_args()
Parse command-line arguments for the PhosKinTime script. This function uses argparse to define and handle the command-line options. It includes options for setting bounds, fixed parameters, bootstrapping, profile estimation, and input file paths. The function returns the parsed arguments as a Namespace object. The arguments include: --A-bound, --B-bound, --C-bound, --D-bound, --Ssite-bound, --Dsite-bound, --bootstraps, --input-excel-protein, --input-excel-psite, --input-excel-rna.
Returns: argparse.Namespace: The parsed command-line arguments.
log_config(logger, bounds, args)
Log the configuration settings for the PhosKinTime script. This function logs the parameter bounds and the number of bootstrapping iterations. It uses the provided logger to output the information.
| Parameters: |
|
|---|
Returns: None
extract_config(args)
Extract configuration settings from command-line arguments. This function creates and returns a dictionary containing the parameter bounds and the number of bootstrapping iterations.
| Parameters: |
|
|---|
Returns: dict: The configuration settings.
score_fit(params, target, prediction, alpha=ALPHA_WEIGHT, beta=BETA_WEIGHT, gamma=GAMMA_WEIGHT, delta=DELTA_WEIGHT, mu=MU_WEIGHT)
Calculate the score for the fit of a model to target data. The score is a weighted combination of mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), variance, and a regularization penalty. The weights for the metrics are set by alpha, beta, gamma, and delta; the regularization penalty is weighted by mu.
Args:
- params (np.ndarray): The model parameters.
- target (np.ndarray): The target data.
- prediction (np.ndarray): The predicted data.
- alpha (float): Weight for RMSE.
- beta (float): Weight for MAE.
- gamma (float): Weight for variance.
- delta (float): Weight for MSE.
- mu (float): Regularization penalty weight.
Returns: float: The calculated score.
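The weighted combination can be illustrated as below. Several details are assumptions because the reference does not state them: the variance is taken over the residuals, the regularization penalty is the L2 norm of `params`, and the default weights of 1.0 are placeholders rather than the package constants. The name `score_fit_sketch` is hypothetical.

```python
import math


def score_fit_sketch(params, target, prediction,
                     alpha=1.0, beta=1.0, gamma=1.0, delta=1.0, mu=1.0):
    """Weighted sum of MSE, RMSE, MAE, residual variance and an L2 penalty."""
    residuals = [t - p for t, p in zip(target, prediction)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    mean_res = sum(residuals) / n
    # Assumption: "variance" means the variance of the residuals.
    variance = sum((r - mean_res) ** 2 for r in residuals) / n
    # Assumption: the penalty is the L2 norm of the parameter vector.
    penalty = math.sqrt(sum(p * p for p in params))
    return delta * mse + alpha * rmse + beta * mae + gamma * variance + mu * penalty


print(score_fit_sketch([3.0, 4.0], target=[0.0, 0.0], prediction=[1.0, 1.0]))
```

A lower score means a better fit; raising `mu` trades fit quality for smaller parameter values.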
future_times(n_new: int, ratio: Optional[float] = None, tp: np.ndarray = TIME_POINTS) -> np.ndarray
Extend the time points by n_new points, each spaced by multiplying the previous interval by ratio. If ratio is None, it is inferred from the last two points.
| Parameters: |
|
|---|
Returns: np.ndarray: Extended time points.
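A small sketch of the geometric extension, using plain lists. How the ratio is inferred is an assumption here (the quotient of the last two time points), and the default `tp` is a placeholder rather than the package's TIME_POINTS. The name `future_times_sketch` is hypothetical.

```python
def future_times_sketch(n_new, ratio=None, tp=(1.0, 2.0, 4.0)):
    """Append n_new points, growing the last interval by `ratio` each step."""
    times = list(tp)
    if ratio is None:
        # Assumption: infer the ratio as the quotient of the last two points.
        ratio = times[-1] / times[-2]
    interval = times[-1] - times[-2]
    for _ in range(n_new):
        interval *= ratio
        times.append(times[-1] + interval)
    return times


print(future_times_sketch(2))
```

With the placeholder points 1, 2, 4 the inferred ratio is 2, so the last interval of 2 grows to 4 and then 8.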
config.constants
get_param_names_rand(num_psites: int) -> list
Generate parameter names for the random model.
Format: ['A', 'B', 'C', 'D'] +
['S1', 'S2', ..., 'S
| Parameters: |
|
|---|
Returns: list: List of parameter names.
get_param_names_ds(num_psites: int) -> list
Generate parameter names for distributive or successive models.
Format: ['A', 'B', 'C', 'D'] +
['S1', 'S2', ..., 'S
| Parameters: |
|
|---|
Returns: list: List of parameter names.
generate_labels_rand(num_psites: int) -> list
Generates labels for the states based on the number of phosphorylation sites for the random model. Returns a list with the base labels "R" and "P", followed by labels for all combinations of phosphorylated sites.
| Parameters: |
|
|---|
Returns: list: List of state labels.
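To illustrate how the label list grows with the number of sites, the sketch below enumerates every non-empty combination of sites after the base labels "R" and "P". The label format ("P1", "P12", ...) and the name `generate_labels_rand_sketch` are hypothetical; only the base labels come from the description above.

```python
from itertools import combinations


def generate_labels_rand_sketch(num_psites):
    """Return 'R', 'P', then one label per non-empty combination of sites."""
    labels = ["R", "P"]
    sites = range(1, num_psites + 1)
    for size in range(1, num_psites + 1):
        for combo in combinations(sites, size):
            # Hypothetical format: 'P' followed by the site indices.
            labels.append("P" + "".join(str(s) for s in combo))
    return labels


print(generate_labels_rand_sketch(2))
```

The list has 2 + (2**num_psites - 1) entries, which is why the random model becomes large quickly as sites are added.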
generate_labels_ds(num_psites: int) -> list
Generates labels for the states based on the number of phosphorylation sites for the distributive or successive models. Returns a list with the base labels "R" and "P", followed by labels for each individual phosphorylated state.
| Parameters: |
|
|---|
Returns: list: List of state labels.
location(path: str, label: str = None) -> str
Returns a clickable hyperlink string for supported terminals using ANSI escape sequences.
| Parameters: |
|
|---|
| Returns: |
|
|---|
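One common way to build such a link is the OSC 8 terminal escape sequence. The sketch below assumes that mechanism and a `file://` URI; whether the package uses exactly this sequence is not stated in the reference, and the name `location_sketch` is hypothetical.

```python
from pathlib import Path


def location_sketch(path, label=None):
    """Wrap `label` (or the path) in an OSC 8 hyperlink pointing at the file."""
    uri = Path(path).resolve().as_uri()  # absolute file:// URI
    text = label if label is not None else str(path)
    # OSC 8 layout: ESC ] 8 ; ; URI ESC \  text  ESC ] 8 ; ; ESC \
    return f"\033]8;;{uri}\033\\{text}\033]8;;\033\\"


print(location_sketch("results/report.html", "report"))
```

Terminals without OSC 8 support typically show only the label text.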
get_number_of_params_rand(num_psites)
Calculate the number of parameters required for the ODE system based on the number of phosphorylation sites.
| Parameters: |
|
|---|
| Returns: |
|
|---|
get_bounds_rand(num_psites, ub=0, lower=0)
Generate bounds for the ODE parameters based on the number of phosphorylation sites.
| Parameters: |
|
|---|
| Returns: |
|
|---|
config.logconf
ColoredFormatter
Bases: Formatter
Custom formatter to add colors to log messages and elapsed time.
This formatter uses ANSI escape codes to colorize the log messages based on their severity level.
It also includes a right-aligned clock that shows the elapsed time since the logger was initialized, displayed in a human-readable format (e.g., "1h 23m 45s").
The formatter is designed to be used with a logger that has a console handler.
The formatter also ensures that the log messages are padded to a specified width, which can be adjusted using the width parameter.
The remove_ansi method is used to strip ANSI escape codes from the log message for accurate padding calculation.
The format method is overridden to customize the log message format, including the timestamp, logger name, log level, and message.
The setup_logger function is used to configure the logger with a file handler and a stream handler.
The file handler writes log messages to a specified log file, while the stream handler outputs log messages to the console.
The logger is set to the specified logging level, and the log file is created in the specified directory.
The log file is rotated based on size, and old log files are backed up.
format(record)
Format the log record with colors and elapsed time. This method overrides the default format method to customize the log message format. It includes the timestamp, logger name, log level, and message.
remove_ansi(s)
staticmethod
Remove ANSI escape codes from a string.
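Stripping color codes before measuring a string's width can be sketched with a regular expression. The pattern below covers only SGR color sequences (`ESC [ ... m`), which is an assumption about what the formatter emits; the name `remove_ansi_sketch` is hypothetical.

```python
import re

# Matches SGR sequences such as "\x1b[31m" (red) and "\x1b[0m" (reset).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def remove_ansi_sketch(s):
    """Return `s` without ANSI color escape sequences."""
    return ANSI_SGR.sub("", s)


colored = "\x1b[31mERROR\x1b[0m done"
plain = remove_ansi_sketch(colored)
# Padding should be computed from the visible length, not the raw length.
print(len(colored), len(plain))
```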
setup_logger(name='phoskintime', log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5, mp_file_logging='main_only')
Set up a logger with colored output and file logging. This function creates a logger with colored output for console messages.
Parameters: name, log_file, level, log_dir, rotate, max_bytes, backup_count, mp_file_logging.
mp_file_logging options:
- "off": disable file logging
- "main_only": file logging only in the main process
- "per_process": file logging in each process
Returns: logger
Core Functions
paramest.normest
worker_find_lambda(lam: float, gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray) -> Tuple[float, float, str]
Worker function for a single lambda value.
| Parameters: |
|
|---|
| Returns: |
|
|---|
find_best_lambda(gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray, lambdas=np.logspace(-2, 0, 10), max_workers: int = 4, per_lambda_timeout: float = 1800.0) -> Tuple[float, str]
Finds best lambda_reg to use in model_func.
normest(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps, use_regularization=USE_REGULARIZATION)
Function to estimate parameters for a given gene using ODE models.
| Parameters: |
|
|---|
| Returns: |
|
|---|
paramest.toggle
estimate_parameters(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps)
This function allows for the selection of the estimation mode and handles the parameter estimation process accordingly.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Weights for Curve Fitting
models.weights
early_emphasis(pr_data, p_data, time_points, num_psites)
Function that calculates custom weights for early time points in a dataset.
| Parameters: |
|
|---|
| Returns: |
|
|---|
get_protein_weights(gene, input1_path=Path(__file__).resolve().parent.parent / 'processing' / 'input1_wstd.csv', input2_path=Path(__file__).resolve().parent.parent / 'data' / 'input2.csv')
Function to extract weights for a specific gene from the input files.
| Parameters: |
|
|---|
| Returns: |
|
|---|
full_weight(p_data_weight, use_regularization, reg_len)
Function to create a full weight array for parameter estimation.
| Parameters: |
|
|---|
| Returns: |
|
|---|
get_weight_options(target, t_target, num_psites, use_regularization, reg_len, early_weights, ms_gauss_weights)
Function to calculate weights for parameter estimation based on the target data and time points.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Parameter Estimation
paramest.core
process_gene(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps=0, out_dir=OUT_DIR)
Process a single gene by estimating its parameters and generating plots.
| Parameters: |
|
|---|
| Returns: |
|
|---|
process_gene_wrapper(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps, out_dir=OUT_DIR)
Wrapper function to process a gene.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Confidence Intervals using Linearization
paramest.identifiability.ci
confidence_intervals(gene, popt, pcov, target, model, alpha_val=0.05)
Computes the confidence intervals for parameter estimates using the Wald interval approach.
| Parameters: |
|
|---|
| Returns: |
|
|---|
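A Wald interval takes each estimate plus or minus a critical value times its standard error, where the standard error is the square root of the matching diagonal entry of the covariance matrix. The sketch below uses the standard normal quantile from the standard library; the package may use a t-distribution quantile instead, so treat that choice and the name `wald_intervals_sketch` as assumptions.

```python
import math
from statistics import NormalDist


def wald_intervals_sketch(popt, pcov, alpha_val=0.05):
    """Return a (lower, upper) pair for each estimate in popt."""
    # Two-sided critical value, about 1.96 for alpha_val = 0.05.
    z = NormalDist().inv_cdf(1.0 - alpha_val / 2.0)
    intervals = []
    for i, estimate in enumerate(popt):
        se = math.sqrt(pcov[i][i])  # standard error from the covariance diagonal
        intervals.append((estimate - z * se, estimate + z * se))
    return intervals


print(wald_intervals_sketch([1.0, 0.5], [[0.04, 0.0], [0.0, 0.01]]))
```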
Knockout Analysis
knockout.helper
Perturbation & Parameter Sensitivity Analysis
sensitivity.analysis
compute_bound(value, perturbation=PERTURBATIONS_VALUE)
Computes the lower and upper bounds for a given parameter value for sensitivity analysis and perturbations.
| Parameters: |
|
|---|
| Returns: |
|
|---|
define_sensitivity_problem_rand(num_psites, values)
Defines the Morris sensitivity analysis problem for the random model.
| Parameters: |
|
|---|
| Returns: |
|
|---|
define_sensitivity_problem_ds(num_psites, values)
Defines the Morris sensitivity analysis problem for the distributive or successive models.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Model Diagram
models.diagram.helpers
powerset(iterable)
Return the list of all subsets (as frozensets) of the given iterable.
| Parameters: |
|
|---|
Returns: A list of frozensets representing all subsets of the input iterable.
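A powerset of this kind can be sketched with `itertools`. The name `powerset_sketch` is hypothetical and the ordering of the returned subsets (by increasing size) is an assumption.

```python
from itertools import chain, combinations


def powerset_sketch(iterable):
    """Return every subset of the input as a frozenset, smallest first."""
    items = list(iterable)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


states = powerset_sketch([1, 2, 3])
print(len(states))
```

Each subset can stand for one phosphorylation state, with the empty set as the unphosphorylated protein.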
state_label(state)
Convert a set of phosphorylation sites into a node label.
| Parameters: |
|
|---|
Returns: A string representing the label for the node.
create_random_diagram(x, num_sites, output_filename)
Create a random phosphorylation diagram.
| Parameters: |
|
|---|
create_distributive_diagram(x, num_sites, output_filename)
Create a distributive phosphorylation diagram.
| Parameters: |
|
|---|
create_successive_model(x, num_sites, output_filename)
Create a successive phosphorylation diagram.
| Parameters: |
|
|---|
Model Types
models.distmod
ode_core(y, t, A, B, C, D, S_rates, D_rates)
The core ODE system for the distributive phosphorylation model.
| Parameters: |
|
|---|
| Returns: |
|
|---|
unpack_params(params, num_psites)
Function to unpack the parameters for the distributive ODE system.
| Parameters: |
|
|---|
| Returns: |
|
|---|
solve_ode(params, init_cond, num_psites, t)
Solve the ODE system for the distributive phosphorylation model.
| Parameters: |
|
|---|
| Returns: |
|
|---|
models.randmod
unpack_params(params, num_sites)
Unpack parameters for the Random model.
| Parameters: |
|
|---|
| Returns: |
|
|---|
ode_system(y, t, A, B, C, D, num_sites, S, Ddeg, mono_idx, forward, drop, fcounts, dcounts)
Compute the time derivatives of a random phosphorylation ODE system.
This function supports a large number of phosphorylation states by using precomputed transition indices to optimize speed.
| Parameters: |
|
|---|
| Returns: |
|
|---|
solve_ode(popt, y0, num_sites, t)
Integrate the ODE system for phosphorylation dynamics in the random phosphorylation model.
| Parameters: |
|
|---|
| Returns: |
|
|---|
models.succmod
ode_core(y, t, A, B, C, D, S_rates, D_rates)
The core of the ODE system for the successive ODE model.
| Parameters: |
|
|---|
Returns: dydt (np.array): The derivatives of the state variables.
unpack_params(params, num_psites)
Function to unpack the parameters for the ODE system. The parameters are expected in the following order: A, B, C, D, S_rates, D_rates, where S_rates and D_rates are arrays of length num_psites. The function returns the unpacked parameters as separate variables.
Parameters:
- params: array of parameters
- num_psites: number of phosphorylation sites
Returns: A, B, C, D, S_rates, D_rates
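The documented parameter order translates directly into slicing. The sketch below uses plain lists and the hypothetical name `unpack_params_sketch`.

```python
def unpack_params_sketch(params, num_psites):
    """Split a flat vector ordered as A, B, C, D, S_rates, D_rates."""
    A, B, C, D = params[:4]
    S_rates = list(params[4:4 + num_psites])
    D_rates = list(params[4 + num_psites:4 + 2 * num_psites])
    return A, B, C, D, S_rates, D_rates


flat = [1.0, 2.0, 3.0, 4.0, 0.1, 0.2, 0.5, 0.6]
print(unpack_params_sketch(flat, num_psites=2))
```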
solve_ode(params, init_cond, num_psites, t)
Solve the ODE system using the given parameters and initial conditions. The function integrates the ODE system over time and returns the solution.
Parameters: params, init_cond, num_psites, t.
Returns: solution, solution of phosphorylated sites
Steady-State Calculation
steady.initdist
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for the distributive phosphorylation model.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
steady.initrand
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for the random phosphorylation model.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
steady.initsucc
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for the successive phosphorylation model.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
Plotting
plotting.plotting
Plotter
A class to encapsulate plotting functionalities for ODE model analysis.
| Attributes: |
|
|---|
plot_parallel(solution: np.ndarray, labels: list)
Plots a parallel coordinates plot for the given solution.
| Parameters: |
|
|---|
pca_components(solution: np.ndarray, target_variance: float = 0.99)
Plots a scree plot showing the explained variance ratio for PCA components.
| Parameters: |
|
|---|
plot_pca(solution: np.ndarray, components: int = 3)
Plots the PCA results for the given solution.
| Parameters: |
|
|---|
| Returns: |
|
|---|
plot_tsne(solution: np.ndarray, perplexity: int = 30)
Plots a t-SNE visualization of the given solution.
| Parameters: |
|
|---|
| Returns: |
|
|---|
plot_param_series(estimated_params: list, param_names: list, time_points: np.ndarray)
Plots the time series of estimated parameters over the given time points.
| Parameters: |
|
|---|
plot_profiles(data: pd.DataFrame)
Plots the profiles of estimated parameters over time.
| Parameters: |
|
|---|
plot_model_fit(model_fit: np.ndarray, Pr_data: np.ndarray, P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)
Plots the model fit for mRNA, protein, and phosphorylated species across time.
| Parameters: |
|
|---|
plot_param_scatter(est_arr: np.ndarray, num_psites: int, time_vals: np.ndarray)
Plots scatter and density plots for parameters.
| Parameters: |
|
|---|
plot_heatmap(param_value_df: pd.DataFrame)
| Parameters: |
|
|---|
plot_error_distribution(error_df: pd.DataFrame)
| Parameters: |
|
|---|
plot_gof(merged_data: pd.DataFrame)
Plot the goodness of fit for the model.
| Parameters: |
|
|---|
plot_kld(merged_data: pd.DataFrame)
Plots the Kullback-Leibler divergence (KLD) for the model.
| Parameters: |
|
|---|
plot_params_bar(ci_results: dict, param_labels: list = None)
Plots a bar plot of the estimated parameters with their 95% confidence intervals.
| Parameters: |
|
|---|
plot_knockouts(results_dict: dict, num_psites: int, psite_labels: list)
Plot wild-type and knockout simulation results for comparison.
| Parameters: |
|
|---|
plot_top_param_pairs(excel_path: str)
For each gene's '_perturbations' sheet in the Excel file, plot scatter plots for the parameter pairs with correlation.
| Parameters: |
|
|---|
plot_model_perturbations(problem: dict, Si: dict, cutoff_idx: int, time_points: np.ndarray, n_sites: int, best_model_psite_solutions: np.ndarray, best_mrna_solutions: np.ndarray, best_protein_solutions: np.ndarray, psite_labels: list[str], protein_data_ref: np.ndarray, psite_data_ref: np.ndarray, rna_ref: np.ndarray, model_fit_sol: np.ndarray) -> None
Plot the best model perturbations for the given data.
| Parameters: |
|
|---|
plot_time_state_grid(samples: np.ndarray, time_points: np.ndarray, state_names: list)
Grid of strip plots per state showing variability across time.
| Parameters: |
|
|---|
plot_phase_space(samples: np.ndarray, state_names: list)
Phase space plots: one state vs another for each simulation.
| Parameters: |
|
|---|
plot_future_fit(P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)
Plots the model fit for the future time points.
| Parameters: |
|
|---|
plot_regularization(excel_path: str)
Read every '
| Parameters: |
|
|---|
plot_model_error(excel_path: str)
Read every '
| Parameters: |
|
|---|
Utility Functions
utils.display
ensure_output_directory(directory)
Ensure the output directory exists. If it doesn't, create it.
| Parameters: |
|
|---|
load_data(excel_file, sheet='Estimated Values')
Load data from an Excel file. The default sheet is "Estimated Values".
| Parameters: |
|
|---|
| Returns: |
|
|---|
format_duration(seconds)
Format a duration in seconds into a human-readable string.
| Parameters: |
|
|---|
Returns: str: Formatted duration string.
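A sketch of one possible formatting scheme. The exact output format of `format_duration` is not given here; the "1h 23m 45s" style is borrowed from the elapsed-time example in the ColoredFormatter description and is an assumption, as is the name `format_duration_sketch`.

```python
def format_duration_sketch(seconds):
    """Format seconds as '1h 23m 45s', dropping leading zero units."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


print(format_duration_sketch(5025))
```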
merge_obs_est(filename)
Function to merge observed and estimated data from an Excel file.
| Parameters: |
|
|---|
| Returns: |
|
|---|
save_result(results, excel_filename)
Function to save results to an Excel file.
| Parameters: |
|
|---|
create_report(results_dir: str, output_file: str = f'{model_type}_report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
| Parameters: |
|
|---|
organize_output_files(directories: Iterable[Union[str, Path]])
Organize output files into protein-specific folders and a general folder.
| Parameters: |
|
|---|
utils.tables
generate_tables(xlsx_file_path)
Generate hierarchical tables from the XLSX file containing alpha and beta values.
| Parameters: |
|
|---|
| Returns: |
|
|---|
save_tables(tables, output_dir)
Save the generated tables as LaTeX and CSV files.
| Parameters: |
|
|---|
save_master_table(folder='latex', output_file='latex/all_tables.tex')
Save a master LaTeX file that includes all individual LaTeX files from the specified folder.
| Parameters: |
|
|---|
utils.latexit
generate_latex_table(df, sheet_name)
Generate LaTeX code for a table from a DataFrame.
Args:
- df (pd.DataFrame): DataFrame to convert to LaTeX.
- sheet_name (str): Name of the sheet for caption and label.
Returns: str: LaTeX code for the table.
generate_latex_image(image_filename)
Generate LaTeX code for an image.
| Parameters: |
|
|---|
Returns: str: LaTeX code for the image.
main(input_dir)
Main function to process Excel and PNG files in the input directory and generate LaTeX code.
| Parameters: |
|
|---|