API Reference

Data Standardization & Cleanup

processing.cleanup

process_collecttri()

Processes the CollecTRI file to clean and filter mRNA-TF interactions. Removes complex interactions, filters by target genes, and saves the result.

format_site(site)

Formats a phosphorylation site string.

If the input is NaN or an empty string, returns an empty string. If the input contains an underscore ('_'), splits the string into two parts, converts the first part to uppercase, and appends the second part unchanged. Otherwise, converts the entire string to uppercase.

Parameters:
  • site (str) –

    The phosphorylation site string to format.

Returns:
  • str

    The formatted phosphorylation site string.
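
The rule above can be sketched as follows. This is a minimal illustration, not the library's source: it assumes the underscore is kept when the two parts are rejoined and that `None` is treated like NaN.

```python
import math


def format_site(site):
    """Sketch of the formatting rule described above (not the library source)."""
    # Treat None, NaN, and empty strings as "no site" (None handling is an assumption).
    if site is None or (isinstance(site, float) and math.isnan(site)):
        return ""
    site = str(site)
    if site == "":
        return ""
    if "_" in site:
        # Uppercase only the part before the first underscore.
        # Assumption: the underscore is preserved when rejoining.
        head, tail = site.split("_", 1)
        return f"{head.upper()}_{tail}"
    return site.upper()
```

Under these assumptions, `format_site("s123_abc")` gives `"S123_abc"`, while `format_site("y15")` gives `"Y15"`.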

process_msgauss()

Processes the MS Gaussian data file to generate time series data.

process_msgauss_std()

Processes the MS Gaussian data file to compute transformed means and standard deviations.

process_routlimma()

Processes the Rout Limma table to generate time series data for mRNA.

update_gene_symbols(filename)

Updates the GeneID column in a CSV file by mapping GeneIDs to gene/protein symbols.

Parameters:
  • filename (str) –

    The path to the CSV file to be updated. The file must contain a 'GeneID' column.

move_processed_files()

Moves or copies processed files to their respective directories.

Optimization Results Mapping

processing.map

map_optimization_results(tf_file_path, kin_file_path, sheet_name='Alpha Values')

Reads the TF-mRNA and kinase-phosphorylation optimization results from Excel files and maps mRNA targets, phosphorylation sites, and kinases to each TF.

Parameters:
  • tf_file_path

    Path to the Excel file containing TF-mRNA optimization results.

  • kin_file_path

    Path to the Excel file containing Kinase-Phosphorylation optimization results.

  • sheet_name

    The name of the sheet in the Excel file to read from. Default is 'Alpha Values'.

Returns:
  • DataFrame

    A DataFrame containing the mapped TF, mRNA, Psite, and Kinase information.

create_cytoscape_table(mapping_csv_path)

Creates a Cytoscape-compatible edge table from a mapping file.

Parameters:
  • mapping_csv_path (str) –

    Path to the input CSV file with columns: TF, TF_strength, mRNA, Psite, Kinase, Kinase_strength

Returns:
  • table( DataFrame ) –

    Edge table with columns [Source, Target, Interaction, Strength]

add_kinetic_strength_columns(mapping_path, mapping__path, excel_path, suffix)

Adds kinetic strength columns to the mapping files based on the provided Excel file.

Parameters:
  • mapping_path (str) –

    Path to the first mapping file.

  • mapping__path (str) –

    Path to the second mapping file.

  • excel_path (str) –

    Path to the Excel file containing kinetic strength data.

  • suffix (str) –

    Suffix to append to the output files.

generate_nodes(edge_df)

Infers node types and aggregates all phosphorylation sites per target node from phosphorylation edges.

Parameters:
  • edge_df (DataFrame) –

    Must have columns ['Source', 'Target', 'Interaction', 'Psite']

Returns:
  • DataFrame

    DataFrame with columns ['Node', 'Type', 'Psite']

Kinase-Phosphorylation Optimization

Evolutionary Algorithms

kinopt.evol.config.constants

kinopt.evol.config.logconf

ColoredFormatter

Bases: Formatter

format(record)

Format the log record with ANSI color codes and elapsed time.

Parameters:
  • record (LogRecord) –

    The log record to format.

Returns: str: The formatted log message with ANSI color codes.

remove_ansi(s) staticmethod

Remove ANSI escape codes from a string.

Parameters:
  • s (str) –

    The string from which to remove ANSI escape codes.

Returns: str: The string without ANSI escape codes.
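
Stripping ANSI codes is usually done with a regular expression. The sketch below shows one common pattern; the exact pattern used by `ColoredFormatter.remove_ansi` is not documented here, so the constant name `ANSI_PATTERN` and the expression itself are assumptions.

```python
import re

# Matches CSI sequences such as "\x1b[31m" (red) or "\x1b[0m" (reset).
# Assumption: only this family of escape codes needs to be removed.
ANSI_PATTERN = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def remove_ansi(s):
    """Return s with ANSI escape sequences removed."""
    return ANSI_PATTERN.sub("", s)


clean = remove_ansi("\x1b[31mERROR\x1b[0m done")
```

This is useful when the same formatted message goes to both a colored console and a plain-text log file.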

setup_logger(name='phoskintime', log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5)

Function to set up a logger with both file and console handlers.

Parameters:
  • name (str, default: 'phoskintime' ) –

    Name of the logger.

  • log_file (str, default: None ) –

    Path to the log file. If None, a default path is generated.

  • level (int, default: DEBUG ) –

    Logging level (e.g., logging.DEBUG, logging.INFO).

  • log_dir (str, default: LOG_DIR ) –

    Directory where log files are stored.

  • rotate (bool, default: True ) –

    Whether to use rotating file handler.

  • max_bytes (int, default: 2 * 1024 * 1024 ) –

    Maximum size of log file before rotation.

  • backup_count (int, default: 5 ) –

    Number of backup files to keep.

Returns:
  • logger( Logger ) –

    Configured logger instance.
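
A logger with a rotating file handler and a console handler can be assembled from the standard library alone. The sketch below is an approximation of what the parameters above suggest; the function name `setup_logger_sketch`, the log file name, and the format string are hypothetical, and the real function additionally uses `ColoredFormatter`.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def setup_logger_sketch(name, log_dir, level=logging.DEBUG,
                        max_bytes=2 * 1024 * 1024, backup_count=5):
    """Attach a rotating file handler and a console handler to a logger."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep messages out of the root logger

    # Assumption: the log file is named after the logger.
    file_handler = RotatingFileHandler(
        os.path.join(log_dir, f"{name}.log"),
        maxBytes=max_bytes, backupCount=backup_count)
    console_handler = logging.StreamHandler()

    fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    for handler in (file_handler, console_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


demo_dir = tempfile.mkdtemp()
demo_logger = setup_logger_sketch("phoskintime_demo", demo_dir)
demo_logger.info("logger ready")
```

With `max_bytes` at 2 MiB and `backup_count` at 5, at most six files (the active log plus five backups) exist at any time.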

kinopt.evol.exporter.plotout

plot_residuals_for_gene(gene, gene_data)

Generates and saves combined residual-related plots for one gene with all psites in the legend.

Parameters:
  • gene (str) –

    Gene identifier.

  • gene_data (dict) –

    Dictionary with keys 'psites', 'observed', 'estimated', and 'residuals' containing data for all psites.

  • TIME_POINTS (ndarray or list) –

    Time points corresponding to the series.

opt_analyze_nsga(problem, result, F, pairs, approx_ideal, approx_nadir, asf_i, pseudo_i, n_evals, hv, hist, val, hist_cv_avg, k, igd, best_objectives, waterfall_df, convergence_df, alpha_values, beta_values)

Function to generate and save various plots related to optimization results.

Parameters:
  • problem

    The optimization problem instance.

  • result

    The result of the optimization run.

  • F

    Objective function values.

  • pairs

    Pairs of objectives to plot.

  • approx_ideal

    Approximate ideal point in objective space.

  • approx_nadir

    Approximate nadir point in objective space.

  • asf_i

    Index of the best solution according to the augmented scalarization function (ASF).

  • pseudo_i

    Index of the solution selected by pseudo-weights.

  • n_evals

    Number of evaluations at each generation.

  • hv

    Hypervolume values.

  • hist

    History of the optimization process.

  • val

    Values for convergence plot.

  • hist_cv_avg

    Average constraint violation history.

  • k

    Number of generations.

  • igd

    Inverted generational distance values.

  • best_objectives

    Best objectives found during the optimization process.

  • waterfall_df

    DataFrame containing waterfall plot data.

  • convergence_df

    DataFrame containing convergence data.

  • alpha_values

    Dictionary containing alpha values for parameters.

  • beta_values

    Dictionary containing beta values for parameters.

Returns:
  • None

opt_analyze_de(long_df, convergence_df, ordered_optimizer_runs, x_values, y_values, val)

Function to generate and save various plots related to optimization results.

Parameters:
  • long_df (DataFrame) –

    DataFrame containing parameter values and objective function values.

  • convergence_df (DataFrame) –

    DataFrame containing convergence data.

  • ordered_optimizer_runs (DataFrame) –

    DataFrame containing ordered optimizer runs.

  • x_values (list) –

    X-axis values for the waterfall plot.

  • y_values (list) –

    Y-axis values for the waterfall plot.

  • val (list) –

    Values for the convergence plot.

Returns:
  • None

kinopt.evol.exporter.sheetutils

output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, timepoints, OUT_FILE)

Function to output results to an Excel file.

Parameters:
  • P_initial (dict) –

    Dictionary with initial parameters.

  • P_init_dense (ndarray) –

    Dense matrix of initial parameters.

  • P_estimated (ndarray) –

    Dense matrix of estimated parameters.

  • residuals (ndarray) –

    Dense matrix of residuals.

  • alpha_values (dict) –

    Dictionary with alpha values.

  • beta_values (dict) –

    Dictionary with beta values.

  • result (str) –

    Result string for logging.

  • timepoints (list) –

    List of time points.

  • OUT_FILE (str) –

    Output file path.

Returns:
  • None

kinopt.evol.objfn.minfndiffevo

PhosphorylationOptimizationProblem

Bases: ElementwiseProblem

Custom optimization problem for phosphorylation analysis.

Defines the constraints, bounds, and objective function for optimizing alpha and beta parameters across gene-psite-kinase relationships.

Attributes:
  • P_initial (dict) –

    Mapping of gene-psite pairs to kinase relationships and time-series data.

  • P_initial_array (ndarray) –

    Observed time-series data for gene-psite pairs.

  • K_index (dict) –

    Mapping of kinases to their respective psite data.

  • K_array (ndarray) –

    Array containing time-series data for kinase-psite combinations.

  • gene_psite_counts (list) –

    Number of kinases per gene-psite combination.

  • beta_counts (dict) –

    Mapping of kinase indices to the number of associated psites.

__init__(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, **kwargs)

Initializes the optimization problem with given data and constraints.

Parameters:
  • P_initial (dict) –

    Mapping of gene-psite pairs to kinase relationships and time-series data.

  • P_initial_array (ndarray) –

    Observed time-series data for gene-psite pairs.

  • K_index (dict) –

    Mapping of kinases to their respective psite data.

  • K_array (ndarray) –

    Array containing time-series data for kinase-psite combinations.

  • gene_psite_counts (list) –

    Number of kinases per gene-psite combination.

  • beta_counts (dict) –

    Mapping of kinase indices to the number of associated psites.

objective_function(params)

Computes the loss value for the given parameters using the selected loss type.

Parameters:
  • params (ndarray) –

    Decision variables vector.

Returns:
  • float

    Computed loss value.

kinopt.evol.objfn.minfnnsgaii

PhosphorylationOptimizationProblem

Bases: ElementwiseProblem

Multi-objective optimization problem for phosphorylation analysis.

Objectives:
  • Minimize the sum of squared residuals (main objective).
  • Minimize violations of the constraints on alpha (secondary objective).
  • Minimize violations of the constraints on beta (tertiary objective).

__init__(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, **kwargs)

Initializes the multi-objective optimization problem.

Parameters:
  • P_initial (dict) –

    Mapping of gene-psite pairs to kinase relationships and time-series data.

  • P_initial_array (ndarray) –

    Observed time-series data for gene-psite pairs.

  • K_index (dict) –

    Mapping of kinases to their respective psite data.

  • K_array (ndarray) –

    Array containing time-series data for kinase-psite combinations.

  • gene_psite_counts (list) –

    Number of kinases per gene-psite combination.

  • beta_counts (dict) –

    Mapping of kinase indices to the number of associated psites.

objective_function(params)

Computes the loss value for the given parameters using the selected loss type.

Parameters:
  • params (ndarray) –

    Decision variables vector.

Returns:
  • float

    Computed loss value.

kinopt.evol.opt.optrun

run_optimization(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, PhosphorylationOptimizationProblem)

Sets up and runs the multi-objective optimization problem for phosphorylation using an NSGA2 algorithm and a thread pool for parallelization.

Parameters:
  • P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts –

    Data structures describing the problem (time-series data, kinases, etc.).

  • PhosphorylationOptimizationProblem (class) –

    The custom problem class to be instantiated.

Returns:
  • result

    The pymoo result object containing the optimized population and history.

  • exec_time

    Execution time for the optimization.

post_optimization_nsga(result, weights=np.array([1.0, 1.0, 1.0]), ref_point=np.array([3, 1, 1]))

Post-processes the result of a multi-objective optimization run.

Parameters:
  • result

    The final result object from the optimizer (e.g., a pymoo result).

  • weights (ndarray, default: array([1.0, 1.0, 1.0]) ) –

    Array of length 3 for weighting the objectives.

  • ref_point (ndarray, default: array([3, 1, 1]) ) –

    Reference point for hypervolume computations.

Returns:
  • dict

    A dictionary with keys:

      • 'best_solution': The best individual from the weighted scoring.
      • 'best_objectives': Its corresponding objective vector.
      • 'optimized_params': The individual's decision variables (X).
      • 'scores': Weighted scores for each solution in the Pareto front.
      • 'best_index': The index of the best solution according to weighted score.
      • 'hist_hv': The hypervolume per generation.
      • 'hist_igd': The IGD+ per generation.
      • 'convergence_df': The DataFrame with iteration vs. best objective value for each iteration in the result history.
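
The weighted scoring behind 'scores' and 'best_index' can be illustrated without any optimizer installed. The sketch below assumes a plain weighted sum of the three objectives in which lower is better; the actual scoring rule is not spelled out in this reference, and the helper name and sample front are hypothetical.

```python
def weighted_scores(front, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of objectives for each solution on a Pareto front."""
    return [sum(w * f for w, f in zip(weights, objectives))
            for objectives in front]


# Hypothetical Pareto front: one row per solution, one column per objective.
front = [
    [0.40, 0.10, 0.20],
    [0.25, 0.30, 0.05],
    [0.90, 0.00, 0.00],
]

scores = weighted_scores(front)
# Assumption: all objectives are minimized, so the lowest score wins.
best_index = min(range(len(scores)), key=scores.__getitem__)
best_objectives = front[best_index]
```

Changing `weights` shifts the trade-off: a larger first weight favors solutions with a smaller residual objective.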

post_optimization_de(result, alpha_values, beta_values)

Post-processes the result of a multi-objective optimization run.

Parameters:
  • result

    The final result object from the optimizer (e.g., a pymoo result).

Returns:
  • dict

    A dictionary with keys:

      • 'best_solution': The best individual from the weighted scoring.
      • 'best_objectives': Its corresponding objective vector.
      • 'optimized_params': The individual's decision variables (X).
      • 'scores': Weighted scores for each solution in the Pareto front.
      • 'best_index': The index of the best solution according to weighted score.
      • 'hist_hv': The hypervolume per generation.
      • 'hist_igd': The IGD+ per generation.
      • 'convergence_df': The DataFrame with iteration vs. best objective value for each iteration in the result history.

kinopt.evol.optcon.construct

pipeline(input1_path: str, input2_path: str, time_series_columns: list[str], scaling_method: str, split_point: float, segment_points: list[float], estimate_missing_kinases: bool, kinase_to_psites: dict[str, int])

Function to run the entire pipeline for loading and processing data.

Parameters:
  • input1_path (str) –

    Path to the first CSV file (HGNC data).

  • input2_path (str) –

    Path to the second CSV file (kinase interactions).

  • time_series_columns (list[str]) –

    List of time series columns to extract.

  • scaling_method (str) –

    Method for scaling the data.

  • split_point (float) –

    Split point for scaling.

  • segment_points (list[float]) –

    Segment points for scaling.

  • estimate_missing_kinases (bool) –

    Flag to estimate missing kinases.

  • kinase_to_psites (dict[str, int]) –

    Dictionary mapping kinases to their respective psites.

Returns:
  • full_hgnc_df( DataFrame ) –

    The scaled data from input1.

  • interaction_df( DataFrame ) –

    The subset/merged DataFrame from input2.

  • observed( DataFrame ) –

    Subset of full_hgnc_df merged with interaction_df.

  • P_initial( dict ) –

    Dictionary mapping gene-psite pairs to kinase relationships and time-series data.

  • P_initial_array( ndarray ) –

    Array containing observed time-series data for gene-psite pairs.

  • K_array( ndarray ) –

    Array containing time-series data for kinase-psite combinations.

  • K_index( dict ) –

    Mapping of kinases to their respective psite data.

  • beta_counts( dict ) –

    Mapping of kinase indices to the number of associated psites.

  • gene_psite_counts( list ) –

    List of counts of psites for each gene.

  • n( int ) –

    Number of unique gene-psite pairs.

load_geneid_to_psites(input1_path=INPUT1)

Function to load the gene ID to psite mapping from input1.csv.

Parameters:
  • input1_path (str, default: INPUT1 ) –

    Path to the first CSV file (HGNC data).

Returns:
  • geneid_psite_map( dict ) –

    Dictionary mapping gene IDs to sets of psites.

get_unique_kinases(input2_path=INPUT2)

Function to extract unique kinases from input2.csv.

Parameters:
  • input2_path (str, default: INPUT2 ) –

    Path to the second CSV file (kinase interactions).

Returns:
  • kinases( set ) –

    Set of unique kinases extracted from the input2 file.

check_kinases()

Function to check if kinases from input2.csv are present in input1.csv.

Returns:
  • None

kinopt.evol.utils.iodata

format_duration(seconds)

Returns a formatted string representing the duration in seconds, minutes, or hours.

Parameters:
  • seconds (float) –

    The duration in seconds.

Returns: str: The formatted duration string.

load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)

Function to load and scale data from CSV files.

Parameters:
  • estimate_missing (bool) –

    If True, estimates missing values.

  • scaling_method (str) –

    The scaling method to apply ('min_max', 'log', 'temporal', 'segmented', 'slope', 'cumulative').

  • split_point (int) –

    Column index for temporal scaling.

  • seg_points (list) –

    List of column indices for segmented scaling.

Returns:
  • full_hgnc_df( DataFrame ) –

    DataFrame with scaled time-series data.

  • interaction_df( DataFrame ) –

    DataFrame containing interaction data.

  • observed( DataFrame ) –

    DataFrame containing observed data.

apply_scaling(df, time_series_columns, method, split_point, segment_points)

Function to apply different scaling methods to time-series data in a DataFrame.

Parameters:
  • df (DataFrame) –

    Input DataFrame containing time-series data.

  • time_series_columns (list) –

    List of column names to scale.

  • method (str) –

    Scaling method ('min_max', 'log', 'temporal', 'segmented', 'slope', 'cumulative').

  • split_point (int) –

    Column index for temporal scaling.

  • segment_points (list) –

    List of column indices for segmented scaling.

Returns:
  • DataFrame

    DataFrame with scaled time-series data.

create_report(results_dir: str, output_file: str = 'report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

Parameters:
  • results_dir (str) –

    Path to the root results directory.

  • output_file (str, default: 'report.html' ) –

    Name of the generated global report file (placed inside results_dir).

Returns:
  • None

organize_output_files(*directories)

Function to organize output files into protein-specific folders and a general folder.

Parameters:
  • *directories

    List of directories to organize.

Returns:
  • None

kinopt.evol.utils.params

extract_parameters(P_initial, gene_psite_counts, K_index, optimized_params)

Function to extract alpha and beta values from the optimized parameters.

Parameters:
  • P_initial (dict) –

    Dictionary containing initial parameters for each gene-psite pair.

  • gene_psite_counts (list) –

    List of counts for each gene-psite pair.

  • K_index (dict) –

    Dictionary mapping kinases to their respective psite pairs.

  • optimized_params (list) –

    List of optimized parameters.

Returns:
  • alpha_values( dict ) –

    Dictionary containing alpha values for each gene-psite pair.

  • beta_values( dict ) –

    Dictionary containing beta values for each kinase-psite pair.

compute_metrics(optimized_params: np.ndarray, P_initial: dict, P_initial_array: np.ndarray, K_index: dict, K_array: np.ndarray, gene_psite_counts: list, beta_counts: dict, n: int)

Function to compute error metrics for the estimated series.

Parameters:
  • optimized_params (ndarray) –

    List of optimized parameters.

  • P_initial (dict) –

    Dictionary containing initial parameters for each gene-psite pair.

  • P_initial_array (ndarray) –

    Array of initial parameters.

  • K_index (dict) –

    Dictionary mapping kinases to their respective psite pairs.

  • K_array (ndarray) –

    Array of kinases.

  • gene_psite_counts (list) –

    List of counts for each gene-psite pair.

  • beta_counts (dict) –

    Mapping of kinase indices to the number of associated psites.

  • n (int) –

    Number of samples.

Returns:
  • P_estimated( ndarray ) –

    Estimated series.

  • residuals( ndarray ) –

    Residuals between initial and estimated series.

  • mse( float ) –

    Mean Squared Error.

  • rmse( float ) –

    Root Mean Squared Error.

  • mae( float ) –

    Mean Absolute Error.

  • mape( float ) –

    Mean Absolute Percentage Error.

  • r_squared( float ) –

    R-squared value.
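
The error metrics listed above follow standard definitions. The sketch below computes them for two flat series using only the standard library; it assumes residuals are observed minus estimated and that zero observations are skipped in MAPE, neither of which is confirmed by this reference. The function name is hypothetical.

```python
import math


def error_metrics(observed, estimated):
    """Return (mse, rmse, mae, mape, r_squared) for two equal-length series."""
    # Assumption: residual = observed - estimated.
    residuals = [o - e for o, e in zip(observed, estimated)]
    n = len(residuals)
    ss_res = sum(r * r for r in residuals)

    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n

    # Assumption: zero observations are skipped to avoid division by zero.
    nonzero = [(o, r) for o, r in zip(observed, residuals) if o != 0]
    mape = (100.0 * sum(abs(r / o) for o, r in nonzero) / len(nonzero)
            if nonzero else float("nan"))

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r_squared = 1.0 - ss_res / ss_tot
    return mse, rmse, mae, mape, r_squared


observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 5.0]
mse, rmse, mae, mape, r_squared = error_metrics(observed, estimated)
```

For this toy input only the last point is off by one, so MSE is 0.25, RMSE is 0.5, MAE is 0.25, MAPE is 6.25 percent, and R-squared is 0.8.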

Gradient-Based Algorithms

kinopt.local.config.constants

parse_args()

Parses command-line arguments for the optimization script using argparse. The arguments cover the optimization bounds, the loss function type, estimation of missing kinases, the scaling method for time-series data, and the optimization method.

Returns:
  • tuple

    The parsed arguments, in order:

      • lower_bound (float): Lower bound for the optimization.
      • upper_bound (float): Upper bound for the optimization.
      • loss_type (str): Type of loss function to use.
      • estimate_missing (bool): Whether to estimate missing kinase-psite values.
      • scaling_method (str): Method for scaling time-series data.
      • split_point (int): Split point for temporal scaling.
      • segment_points (list of int): Segment points for segmented scaling.
      • method (str): Optimization method to use.
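
An argparse parser that returns such a tuple might look like the sketch below. The flag names, defaults, and allowed choices are hypothetical and chosen only to mirror the returned field names; the real script's command-line interface may differ.

```python
import argparse


def parse_args_sketch(argv=None):
    """Parse optimization options and return them as a tuple (illustrative only)."""
    parser = argparse.ArgumentParser(description="Local optimization options (sketch)")
    # All flag names and defaults below are assumptions.
    parser.add_argument("--lower_bound", type=float, default=0.0)
    parser.add_argument("--upper_bound", type=float, default=1.0)
    parser.add_argument("--loss_type", type=str, default="base")
    parser.add_argument("--estimate_missing", action="store_true")
    parser.add_argument("--scaling_method", type=str, default="min_max",
                        choices=["min_max", "log", "temporal",
                                 "segmented", "slope", "cumulative"])
    parser.add_argument("--split_point", type=int, default=0)
    parser.add_argument("--segment_points", type=int, nargs="*", default=[])
    parser.add_argument("--method", type=str, default="SLSQP")
    args = parser.parse_args(argv)
    return (args.lower_bound, args.upper_bound, args.loss_type,
            args.estimate_missing, args.scaling_method, args.split_point,
            args.segment_points, args.method)


parsed = parse_args_sketch(
    ["--upper_bound", "4", "--segment_points", "2", "5", "--method", "trust-constr"])
```

Passing `argv` explicitly keeps the function testable; calling it with no argument falls back to `sys.argv`.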

kinopt.local.config.logconf

kinopt.local.exporter.plotout

plot_fits_for_gene(gene, gene_data, real_timepoints)

Function to plot the observed and estimated phosphorylation levels for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing observed and estimated data for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

plot_cumulative_residuals(gene, gene_data, real_timepoints)

Function to plot the cumulative residuals for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing the residuals for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

plot_autocorrelation_residuals(gene, gene_data, real_timepoints)

Function to plot the autocorrelation of residuals for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing the residuals for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

plot_histogram_residuals(gene, gene_data, real_timepoints)

Function to plot histograms of residuals for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing the residuals for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

plot_qqplot_residuals(gene, gene_data, real_timepoints)

Function to plot QQ plots of residuals for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing the residuals for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

kinopt.local.exporter.sheetutils

plot_fits_for_gene(gene, gene_data, real_timepoints)

Function to plot the observed and estimated phosphorylation levels for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing observed and estimated data for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

plot_cumulative_residuals(gene, gene_data, real_timepoints)

Function to plot the cumulative residuals for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing the residuals for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

plot_autocorrelation_residuals(gene, gene_data, real_timepoints)

Function to plot the autocorrelation of residuals for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing the residuals for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

plot_histogram_residuals(gene, gene_data, real_timepoints)

Function to plot histograms of residuals for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing the residuals for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

plot_qqplot_residuals(gene, gene_data, real_timepoints)

Function to plot QQ plots of residuals for each psite of a gene.

Parameters:
  • gene (str) –

    The name of the gene.

  • gene_data (dict) –

    A dictionary containing the residuals for each psite of the gene.

  • real_timepoints (list) –

    A list of timepoints corresponding to the observed and estimated data.

output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, mse, rmse, mae, mape, r_squared)

Function to output the results of the optimization process.

Parameters:
  • P_initial (dict) –

    Dictionary containing initial phosphorylation data.

  • P_init_dense (ndarray) –

    Dense matrix of initial phosphorylation data.

  • P_estimated (ndarray) –

    Dense matrix of estimated phosphorylation data.

  • residuals (ndarray) –

    Dense matrix of residuals.

  • alpha_values (dict) –

    Dictionary containing optimized alpha values.

  • beta_values (dict) –

    Dictionary containing optimized beta values.

  • result (OptimizeResult) –

    Result object from the optimization process.

  • mse (float) –

    Mean Squared Error of the optimization.

  • rmse (float) –

    Root Mean Squared Error of the optimization.

  • mae (float) –

    Mean Absolute Error of the optimization.

  • mape (float) –

    Mean Absolute Percentage Error of the optimization.

  • r_squared (float) –

    R-squared value of the optimization.

kinopt.local.objfn.minfn

kinopt.local.opt.optrun

run_optimization(obj_fun, params_initial, opt_method, bounds, constraints)

Run optimization using the specified method.

Parameters:
  • obj_fun

    Objective function to minimize.

  • params_initial

    Initial parameters for the optimization.

  • opt_method

    Optimization method to use (e.g., 'SLSQP', 'trust-constr').

  • bounds

    Bounds for the parameters.

  • constraints

    Constraints for the optimization.

Returns:
  • result

    Result of the optimization.

  • optimized_params

    Optimized parameters.

kinopt.local.optcon.construct

load_geneid_to_psites(input1_path=INPUT1)

Load the geneid to psite mapping from a CSV file.

Parameters:
  • input1_path (str, default: INPUT1 ) –

    Path to the input CSV file containing geneid and psite information.

Returns: defaultdict: A dictionary mapping geneid to a set of psites.
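
Building a gene-to-psites mapping with `defaultdict(set)` can be sketched as below. The column names 'GeneID' and 'Psite' and the sample rows are assumptions about the layout of input1.csv, and the sketch reads from a file-like object rather than a path so that it runs without any files on disk.

```python
import csv
import io
from collections import defaultdict


def load_geneid_to_psites_sketch(csv_file):
    """Map each gene ID to the set of psites listed for it."""
    geneid_psite_map = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # Assumption: the CSV has 'GeneID' and 'Psite' columns.
        geneid_psite_map[row["GeneID"]].add(row["Psite"])
    return geneid_psite_map


# Hypothetical sample data standing in for input1.csv.
sample = io.StringIO(
    "GeneID,Psite,x1\n"
    "ABL2,Y_261,1.0\n"
    "ABL2,S_620,0.8\n"
    "EGFR,Y_1172,1.2\n"
)
mapping = load_geneid_to_psites_sketch(sample)
```

Using a set per gene means repeated rows for the same psite collapse into a single entry.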

get_unique_kinases(input2_path=INPUT2)

Extract unique kinases from the input CSV file.

Parameters:
  • input2_path (str, default: INPUT2 ) –

    Path to the input CSV file containing kinase information.

Returns: set: A set of unique kinases.

check_kinases()

Check if kinases in input2.csv are present in input1.csv and log the results.

kinopt.local.utils.iodata

format_duration(seconds)

Formats a duration in seconds into a human-readable string:
  • If less than 60 seconds, returns the duration in seconds.
  • If less than 3600 seconds, returns the duration in minutes.
  • Otherwise, returns the duration in hours.

Parameters:
  • seconds –

    The duration in seconds.

Returns:
  • str

    The formatted duration string.
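
The thresholds above translate into a short function. The sketch below assumes two decimal places and the unit labels "sec", "min", and "hr"; the actual output format is not given in this reference.

```python
def format_duration_sketch(seconds):
    """Format a duration using the 60 s and 3600 s thresholds described above."""
    # Assumption: two decimals and short unit labels.
    if seconds < 60:
        return f"{seconds:.2f} sec"
    if seconds < 3600:
        return f"{seconds / 60:.2f} min"
    return f"{seconds / 3600:.2f} hr"


short_run = format_duration_sketch(45)
medium_run = format_duration_sketch(90)
long_run = format_duration_sketch(7200)
```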

load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)

Load and scale the data from the specified input files.

Parameters:
  • estimate_missing
  • scaling_method
  • split_point
  • seg_points

Returns:
  • Time series data, interaction data, observed data

apply_scaling(df, cols, method, split_point, seg_points)

Apply scaling to the specified columns of a DataFrame based on the given method. The scaling methods include:
  • 'min_max': Min-Max scaling
  • 'log': Logarithmic scaling
  • 'temporal': Temporal scaling (two segments)
  • 'segmented': Segmented scaling (multiple segments)
  • 'slope': Slope scaling
  • 'cumulative': Cumulative scaling

Parameters:
  • df
  • cols
  • method
  • split_point
  • seg_points

Returns:
  • df
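
Two of the listed methods are simple enough to sketch on a single time series. The example below is an illustration only: it assumes scaling is applied per row, that 'log' means log(1 + x), and that a constant series maps to zeros under 'min_max'. The remaining methods are omitted because their exact definitions are not given here.

```python
import math


def scale_series(values, method):
    """Scale one time series with 'min_max' or 'log' (illustrative only)."""
    if method == "min_max":
        lo, hi = min(values), max(values)
        if hi == lo:
            # Assumption: a constant series maps to all zeros.
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]
    if method == "log":
        # Assumption: log(1 + x), which keeps zero at zero.
        return [math.log1p(v) for v in values]
    raise ValueError(f"method not covered by this sketch: {method}")


scaled = scale_series([2.0, 4.0, 6.0], "min_max")
logged = scale_series([0.0, 1.0], "log")
```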

create_report(results_dir: str, output_file: str = 'report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

For each gene folder (e.g. "ABL2"), the report will include:
  • All PNG plots and interactive HTML plots displayed in a grid with three plots per row.
  • Each plot is confined to a fixed size of 900px by 900px.
  • Data tables from XLSX or CSV files in the gene folder are displayed below the plots, one per row.

Parameters:
  • results_dir (str) –

    Path to the root results directory.

  • output_file (str, default: 'report.html' ) –

    Name of the generated global report file (placed inside results_dir).

organize_output_files(*directories)

Function to organize output files into protein-specific folders. It moves files matching the pattern 'protein_name_*.{json,svg,png,html,csv,xlsx}' into a folder named after the protein (e.g., 'ABL2') and moves all other files into a 'General' folder within the same directory.

Parameters:
  • *directories
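
The sorting rule described for this function can be sketched with `pathlib` and `shutil`. The regular expression below is an assumption about how the protein name is recognized (letters and digits before the first underscore), and the demo files are hypothetical.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumption: the protein name is the alphanumeric prefix before the first underscore.
PROTEIN_FILE = re.compile(r"^([A-Za-z0-9]+)_.+\.(json|svg|png|html|csv|xlsx)$")


def organize_output_files_sketch(*directories):
    """Move protein-prefixed files into per-protein folders, the rest into 'General'."""
    for directory in map(Path, directories):
        # Snapshot the file list first, because folders are created while moving.
        files = [p for p in directory.iterdir() if p.is_file()]
        for path in files:
            match = PROTEIN_FILE.match(path.name)
            folder = directory / (match.group(1) if match else "General")
            folder.mkdir(exist_ok=True)
            shutil.move(str(path), str(folder / path.name))


# Demo in a temporary directory with hypothetical file names.
demo = Path(tempfile.mkdtemp())
for name in ("ABL2_fit.png", "ABL2_params.csv", "summary.txt"):
    (demo / name).write_text("demo")
organize_output_files_sketch(demo)
```

After the call, the two `ABL2_*` files sit in `demo/ABL2` and `summary.txt` sits in `demo/General`.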

kinopt.local.utils.params

extract_parameters(P_initial, gene_kinase_counts, total_alpha, unique_kinases, K_index, optimized_params)

Extracts the alpha and beta parameters from the optimized parameters.

Parameters:
  • P_initial
  • gene_kinase_counts
  • total_alpha
  • unique_kinases
  • K_index
  • optimized_params

Returns:
  • Alpha and beta values as dictionaries

compute_metrics(optimized_params, P_init_dense, t_max, gene_alpha_starts, gene_kinase_counts, gene_kinase_idx, total_alpha, kinase_beta_starts, kinase_beta_counts, K_data, K_indices, K_indptr)

Computes the estimated series and various metrics based on the optimized parameters.

Parameters:
  • optimized_params
  • P_init_dense
  • t_max
  • gene_alpha_starts
  • gene_kinase_counts
  • gene_kinase_idx
  • total_alpha
  • kinase_beta_starts
  • kinase_beta_counts
  • K_data
  • K_indices
  • K_indptr

Returns:
  • Estimated series, residuals, MSE, RMSE, MAE, MAPE, R-squared

Fitting Analysis & Feasibility

kinopt.fitanalysis.helpers.postfit

goodnessoffit(estimated, observed)

Function to plot the goodness of fit and Kullback-Leibler divergence for estimated and observed values.

Parameters:
  • estimated (DataFrame) –

    DataFrame containing estimated values.

  • observed (DataFrame) –

    DataFrame containing observed values.

Returns:
  • None

reshape_alpha_beta(alpha_values, beta_values)

Function to reshape alpha and beta values for plotting.

Parameters:
  • alpha_values (DataFrame) –

    DataFrame containing alpha values.

  • beta_values (DataFrame) –

    DataFrame containing beta values.

Returns: pd.DataFrame: Reshaped DataFrame containing both alpha and beta values.

perform_pca(df)

Function to perform PCA analysis on the given DataFrame.

Parameters:
  • df (DataFrame) –

    DataFrame containing the data for PCA analysis.

Returns:
  • DataFrame

    DataFrame with PCA results and additional columns for type and gene/psite information.

plot_pca(result_df_sorted, y_axis_column)

Plot PCA or t-SNE results for each gene/psite. The function creates scatter plots with different markers for alpha and beta parameters, and adds labels for each point. The function also adjusts text labels to avoid overlap using the adjustText library.

Parameters:
  • result_df_sorted –

    DataFrame containing PCA or t-SNE results.

  • y_axis_column –

    Column name for the y-axis values in the plot.

perform_tsne(scaled_data, df)

Perform t-SNE analysis on the given scaled data. The function returns a DataFrame with t-SNE results and additional columns for type and gene/psite information.

Parameters:
  • scaled_data
  • df

Returns:
  • pd.DataFrame: DataFrame with t-SNE results and additional columns.

additional_plots(df, scaled_data, alpha_values, beta_values, residuals_df)

Function to create additional plots including CDF, KDE, Boxplot, and Hierarchical Clustering.

Parameters:
  • df
  • scaled_data
  • alpha_values
  • beta_values
  • residuals_df

create_sankey_from_network(output_dir, data, title)

Creates a Sankey diagram from the given data and saves it as an HTML file.

This function processes the input data to generate nodes and links for a Sankey diagram. It assigns colors to nodes and links based on their attributes and values, and uses Plotly to render the diagram. The resulting diagram is saved as an HTML file in the specified output directory.

Parameters:
  • output_dir (str) –

    The directory where the Sankey diagram HTML file will be saved.

  • data (DataFrame) –

    A DataFrame containing the data for the Sankey diagram. It must include the following columns:
      - 'Source': The source node of the link.
      - 'Target': The target node of the link.
      - 'Value': The value of the link, which determines the flow size.

  • title (str) –

    The title of the Sankey diagram.

The function performs the following steps:
  1. Initializes nodes and links for the Sankey diagram.
  2. Maps node labels to indices and assigns colors to nodes.
  3. Processes the data to create links between nodes, assigning colors based on link values.
  4. Builds the Sankey diagram using Plotly.
  5. Adds a color bar to explain the flow gradient.
  6. Saves the Sankey diagram as an HTML file in the specified output directory.
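Steps 1 to 3 can be sketched without Plotly. The example below is an illustration under assumptions, not the package's code: the helper name `build_sankey_inputs` is hypothetical, rows are plain dictionaries with the documented 'Source', 'Target', and 'Value' keys, and the two-color rule for links is a stand-in for whatever gradient the library applies.

```python
def build_sankey_inputs(rows):
    """Turn rows with 'Source', 'Target', 'Value' into Sankey node and link lists."""
    labels = []      # node labels in first-seen order
    index_of = {}    # node label -> integer node index
    links = {"source": [], "target": [], "value": [], "color": []}
    for row in rows:
        for label in (row["Source"], row["Target"]):
            if label not in index_of:
                index_of[label] = len(labels)
                labels.append(label)
        links["source"].append(index_of[row["Source"]])
        links["target"].append(index_of[row["Target"]])
        # The magnitude sets the flow width ...
        links["value"].append(abs(row["Value"]))
        # ... and the sign is kept in the link color (illustrative choice).
        links["color"].append("red" if row["Value"] < 0 else "blue")
    return labels, links
```

The returned lists line up with the `node.label`, `link.source`, `link.target`, `link.value`, and `link.color` attributes of Plotly's `go.Sankey` trace.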

important_connections(output_dir, data, top_n=20)

Extracts the top N most important connections based on their absolute values and saves them to a CSV file.

Parameters:
  • output_dir (str) –

    The directory where the CSV file will be saved.

  • data (DataFrame) –

    A DataFrame containing the connections with columns 'Source', 'Target', and 'Value'.

  • top_n (int, default: 20 ) –

    The number of top connections to extract.

The function sorts the connections by their absolute values in descending order, selects the top N connections, and saves them to a CSV file named 'top_connections.csv' in the specified output directory.
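The ranking rule can be shown in a few lines. This is a minimal sketch assuming rows are plain dictionaries rather than a DataFrame; the helper name `top_connections` is hypothetical, and writing the result to 'top_connections.csv' is left out.

```python
def top_connections(rows, top_n=20):
    """Return the top_n rows ranked by absolute 'Value', largest first."""
    ranked = sorted(rows, key=lambda row: abs(row["Value"]), reverse=True)
    return ranked[:top_n]
```

Because the sort key is the absolute value, a strongly negative connection outranks a weakly positive one.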

kinopt.optimality.KKT

generate_latex_table(summary_dict, table_caption, table=None)

Function to generate a LaTeX table from a summary dictionary.

Parameters:
  • summary_dict (dict) –

    Dictionary containing summary data.

  • table_caption (str) –

    Caption for the LaTeX table.

  • table (str, default: None ) –

    Optional existing LaTeX table to append to.

Returns:
  • str

    LaTeX formatted table as a string.

print_primal_feasibility_results(primal_summary, alpha_violations, beta_violations, logger_obj=None)

Logs the primal feasibility summary and violation details.

Parameters:
  • primal_summary (dict) –

    Dictionary containing primal feasibility results.

  • alpha_violations (dict) –

    Dictionary containing alpha constraint violations.

  • beta_violations (dict) –

    Dictionary containing beta constraint violations.

  • logger_obj

    Optional logger object to log the information.

print_sensitivity_and_active_constraints(sensitivity_summary, active_constraints_summary, logger_obj=None)

Logs the sensitivity summary and active constraints summary.

Parameters:
  • sensitivity_summary (dict) –

    Dictionary containing sensitivity analysis results.

  • active_constraints_summary (dict) –

    Dictionary containing active constraints summary.

  • logger_obj

    Optional logger object to log the information.

plot_constraint_violations(alpha_violations, beta_violations, out_dir)

Function to plot constraint violations for alpha and beta values. It creates a stacked bar plot showing the violations for each protein. The top 5 proteins with the highest violations are highlighted in red.

Parameters:
  • alpha_violations (Series) –

    Series containing alpha constraint violations.

  • beta_violations (Series) –

    Series containing beta constraint violations.

  • out_dir (str) –

    Directory to save the plot.

plot_sensitivity_analysis(sensitivity_analysis, out_dir)

Function to plot sensitivity analysis results. It creates a horizontal bar plot showing the mean, max, and min sensitivity for each protein.

Parameters:
  • sensitivity_analysis (DataFrame) –

    DataFrame containing sensitivity analysis results.

  • out_dir (str) –

    Directory to save the plot.

Returns:
  • None

process_excel_results(file_path=OUT_FILE)

Function to process the Excel results file. It reads the alpha and beta values, estimated and observed values, validates normalization constraints, computes residuals and gradients, and generates LaTeX tables for the residuals and sensitivity summaries. It also performs sensitivity analysis and identifies high sensitivity sites. The results are returned as a dictionary.

Parameters:
  • file_path (str, default: OUT_FILE ) –

    Path to the Excel file containing results.

Returns:
  • dict

    Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.

post_optimization_results()

Function to process and visualize the results of the optimization.

Returns:
  • dict

    Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.

TF-mRNA Optimization

Evolutionary Algorithms

tfopt.evol.config.constants

parse_args()

Parse command line arguments for the PhosKinTime optimization problem. This function uses argparse to handle input parameters for the optimization process. The parameters include:
  • lower_bound: Lower bound for the optimization variables (default: -2).
  • upper_bound: Upper bound for the optimization variables (default: 2).
  • loss_type: Type of loss function to use (default: 0). Options: 0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov.
  • optimizer: Global evolutionary optimization method (default: 0). Options: 0: NSGA2, 1: SMSEMOA, 2: AGEMOEA.

Returns:
  • tuple

    (lower_bound, upper_bound, loss_type, optimizer): the lower and upper bounds for the optimization variables, the type of loss function to use, and the global evolutionary optimization method.

Raises:
  • argparse.ArgumentError – If an invalid argument is provided.
  • SystemExit – If the script is run with invalid arguments.
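A parser with these defaults might look like the sketch below. The function name `parse_args_sketch`, the flag spellings (`--lower_bound` and so on), and the argument types are assumptions for illustration; only the parameter names, defaults, and option ranges come from the description above.

```python
import argparse


def parse_args_sketch(argv=None):
    """Parse the four documented settings; flag names are hypothetical."""
    parser = argparse.ArgumentParser(description="TF-mRNA optimization settings (illustrative)")
    parser.add_argument("--lower_bound", type=float, default=-2.0,
                        help="lower bound for the optimization variables")
    parser.add_argument("--upper_bound", type=float, default=2.0,
                        help="upper bound for the optimization variables")
    # 0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov
    parser.add_argument("--loss_type", type=int, default=0, choices=list(range(7)))
    # 0: NSGA2, 1: SMSEMOA, 2: AGEMOEA
    parser.add_argument("--optimizer", type=int, default=0, choices=list(range(3)))
    args = parser.parse_args(argv)
    return args.lower_bound, args.upper_bound, args.loss_type, args.optimizer
```

Passing an out-of-range value such as `--loss_type 9` makes argparse print a usage message and exit, which is where the documented SystemExit comes from.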

tfopt.evol.config.logconf

tfopt.evol.exporter.plotout

plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)

Plot the estimated vs observed expression levels for a set of genes.

Parameters:
  • predictions (ndarray) –

    Predicted expression levels.

  • expression_matrix (ndarray) –

    Observed expression levels.

  • gene_ids (list) –

    List of gene identifiers.

  • time_points (ndarray) –

    Time points for the experiments.

  • regulators (ndarray) –

    Matrix of regulators for each gene.

  • tf_protein_matrix (ndarray) –

    Matrix of TF protein levels.

  • tf_ids (list) –

    List of TF identifiers.

  • num_targets (int) –

    Number of target genes to plot.

  • save_path (str, default: OUT_DIR ) –

    Directory to save the plots.

compute_predictions(x, regulators, protein_mat, psite_tensor, n_reg, T_use, n_mRNA, beta_start_indices, num_psites)

Compute the predicted expression levels based on the optimization variables.

Parameters:
  • x (ndarray) –

    Optimization variables.

  • regulators (ndarray) –

    Matrix of regulators for each gene.

  • protein_mat (ndarray) –

    Matrix of TF protein levels.

  • psite_tensor (ndarray) –

    Tensor of phosphorylation sites.

  • n_reg (int) –

    Number of regulators.

  • T_use (int) –

    Number of time points to use.

  • n_mRNA (int) –

    Number of mRNAs.

  • beta_start_indices (list) –

    List of starting indices for beta parameters.

  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

tfopt.evol.exporter.sheetutils

save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)

Save the optimization results to an Excel file.

Parameters:
  • gene_ids (list) –

    List of gene identifiers.

  • tf_ids (list) –

    List of TF identifiers.

  • final_alpha (ndarray) –

    Final alpha values.

  • final_beta (ndarray) –

    Final beta values.

  • psite_labels_arr (list) –

    List of phosphorylation site labels.

  • expression_matrix (ndarray) –

    Observed expression levels.

  • predictions (ndarray) –

    Predicted expression levels.

  • objective_value (float) –

    Objective value from optimization.

  • reg_map (dict) –

    Mapping of genes to regulators.

  • filename (str, default: OUT_FILE ) –

    Path to the output Excel file.

tfopt.evol.objfn.minfn

TFOptimizationMultiObjectiveProblem

Bases: Problem

Originally implemented by Julius Normann.

This version has been modified and optimized for consistency & speed in submodules by Abhinav Mishra.

Multi-objective formulation of the transcription factor (TF) optimization problem. The class inherits from the Problem class in the pymoo library and defines three objectives: f1 (error), f2 (alpha violation), and f3 (beta violation).

__init__(n_var, n_mRNA, n_TF, n_reg, n_psite_max, n_alpha, mRNA_mat, regulators, protein_mat, psite_tensor, T_use, beta_start_indices, num_psites, no_psite_tf, xl=None, xu=None, **kwargs)

Initialize the multi-objective optimization problem.

Parameters:
  • n_var (int) –

    Number of decision variables.

  • n_mRNA (int) –

    Number of mRNAs.

  • n_TF (int) –

    Number of transcription factors.

  • n_reg (int) –

    Number of regulators.

  • n_psite_max (int) –

    Maximum number of phosphorylation sites.

  • n_alpha (int) –

    Number of alpha parameters.

  • mRNA_mat (ndarray) –

    Matrix of mRNA measurements.

  • regulators (ndarray) –

    Matrix of regulators for each mRNA.

  • protein_mat (ndarray) –

    Matrix of TF protein levels.

  • psite_tensor (ndarray) –

    Tensor of phosphorylation sites.

  • T_use (int) –

    Number of time points to use.

  • beta_start_indices (list) –

    List of starting indices for beta parameters.

  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

  • no_psite_tf (list) –

    List indicating if a TF has no phosphorylation site.

  • xl (ndarray, default: None ) –

    Lower bounds for decision variables. Defaults to None.

  • xu (ndarray, default: None ) –

    Upper bounds for decision variables. Defaults to None.

objective_(x, mRNA_mat, regulators, protein_mat, psite_tensor, n_reg, T_use, n_mRNA, beta_start_indices, num_psites, loss_type, lam1=0.001, lam2=0.001)

Computes a loss value for transcription factor optimization using evolutionary algorithms.

Parameters:
  • x (ndarray) –

    Optimization variables.

  • mRNA_mat (ndarray) –

    Matrix of mRNA measurements.

  • regulators (ndarray) –

    Matrix of regulators for each mRNA.

  • protein_mat (ndarray) –

    Matrix of TF protein levels.

  • psite_tensor (ndarray) –

    Tensor of phosphorylation sites.

  • n_reg (int) –

    Number of regulators.

  • T_use (int) –

    Number of time points to use.

  • n_mRNA (int) –

    Number of mRNAs.

  • beta_start_indices (list) –

    List of starting indices for beta parameters.

  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

  • loss_type (int) –

    Type of loss function to use.

  • lam1 (float, default: 0.001 ) –

    L1 penalty coefficient. Defaults to 1e-3.

  • lam2 (float, default: 0.001 ) –

    L2 penalty coefficient. Defaults to 1e-3.

Returns:
  • float

    Computed loss value.
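The `loss_type` codes (0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov) can be illustrated with commonly used definitions. The formulas below are assumptions, not taken from the package: soft L1, Cauchy, and Arctan use the usual robust-loss forms applied to squared residuals, and the use of `lam1` as the Tikhonov weight is a guess. The helper name `loss_value` is hypothetical.

```python
import math


def loss_value(residuals, params, loss_type, lam1=1e-3, lam2=1e-3):
    """Scalar loss over residuals, optionally penalizing the parameter vector."""
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    if loss_type == 0:    # MSE
        return mse
    if loss_type == 1:    # MAE
        return sum(abs(r) for r in residuals) / n
    if loss_type == 2:    # soft L1 (assumed form)
        return sum(2.0 * (math.sqrt(1.0 + r * r) - 1.0) for r in residuals) / n
    if loss_type == 3:    # Cauchy (assumed form)
        return sum(math.log(1.0 + r * r) for r in residuals) / n
    if loss_type == 4:    # Arctan (assumed form)
        return sum(math.atan(r * r) for r in residuals) / n
    l1 = sum(abs(p) for p in params)
    l2 = sum(p * p for p in params)
    if loss_type == 5:    # Elastic Net: MSE plus L1 and L2 penalties
        return mse + lam1 * l1 + lam2 * l2
    if loss_type == 6:    # Tikhonov: MSE plus an L2 penalty (weight choice assumed)
        return mse + lam1 * l2
    raise ValueError(f"unknown loss_type: {loss_type}")
```

The robust variants (2 to 4) grow more slowly than the squared error for large residuals, so single outlying time points influence the fit less.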

tfopt.evol.opt.optrun

run_optimization(problem, total_dim, optimizer)

Run the optimization using the specified algorithm and problem.

Parameters:
  • problem (Problem) –

    The optimization problem to solve.

  • total_dim (int) –

    Total number of dimensions in the problem.

  • optimizer (int) –

    The optimizer to use (0 for NSGA2, 1 for SMSEMOA, 2 for AGEMOEA).

Returns:
  • res( Result ) –

    The result of the optimization.

tfopt.evol.optcon.construct

build_fixed_arrays(mRNA_ids, mRNA_mat, TF_ids, protein_dict, psite_dict, psite_labels_dict, reg_map)

Builds fixed-shape arrays from the input data.

Parameters:
  • mRNA_ids (list) –

    List of mRNA identifiers.

  • mRNA_mat (ndarray) –

    Matrix of mRNA expression levels.

  • TF_ids (list) –

    List of TF identifiers.

  • protein_dict (dict) –

    Dictionary mapping TFs to their protein levels.

  • psite_dict (dict) –

    Dictionary mapping TFs to their phosphorylation sites.

  • psite_labels_dict (dict) –

    Dictionary mapping TFs to their phosphorylation site labels.

  • reg_map (dict) –

    Mapping of genes to their regulators.

Returns:
  • mRNA_mat (np.ndarray): Matrix of mRNA expression levels.
  • regulators (np.ndarray): Matrix of regulators for each mRNA.
  • protein_mat (np.ndarray): Matrix of TF protein levels.
  • psite_tensor (np.ndarray): Tensor of phosphorylation sites.
  • n_reg (int): Number of regulators.
  • n_psite_max (int): Maximum number of phosphorylation sites across all TFs.
  • psite_labels_arr (list): List of phosphorylation site labels for each TF.
  • num_psites (np.ndarray): Array indicating the number of phosphorylation sites for each TF.

tfopt.evol.optcon.filter

load_raw_data()

Load raw data from files.

Returns:
  • mRNA_ids

    List of mRNA gene identifiers.

  • mRNA_mat

    Matrix of mRNA expression data.

  • mRNA_time_cols

    Time points for mRNA data.

  • TF_ids

    List of transcription factor identifiers.

  • protein_dict

    Dictionary mapping TF_ids to their protein data.

  • psite_dict

    Dictionary mapping TF_ids to their phosphorylation site data.

  • psite_labels_dict

    Dictionary mapping TF_ids to their phosphorylation site labels.

  • TF_time_cols

    Time points for TF data.

  • reg_map

    Regulation map, mapping mRNA genes to their regulators.

filter_mrna(mRNA_ids, mRNA_mat, reg_map)

Filter mRNA genes to only those with regulators present in the regulation map.

Parameters:
  • mRNA_ids (list) –

    List of mRNA gene identifiers.

  • mRNA_mat (ndarray) –

    Matrix of mRNA expression data.

  • reg_map (dict) –

    Regulation map, mapping mRNA genes to their regulators.

Returns:
  • filtered_mRNA_ids( list ) –

    List of filtered mRNA gene identifiers.

  • filtered_mRNA_mat( ndarray ) –

    Matrix of filtered mRNA expression data.
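One way to read this filter is sketched below with plain lists. The helper name `filter_mrna_sketch` is hypothetical, and treating a gene with an empty regulator list as unregulated is an assumption.

```python
def filter_mrna_sketch(mRNA_ids, mRNA_mat, reg_map):
    """Keep only genes that have at least one regulator listed in reg_map."""
    keep = [i for i, gene in enumerate(mRNA_ids) if reg_map.get(gene)]
    filtered_ids = [mRNA_ids[i] for i in keep]
    filtered_mat = [mRNA_mat[i] for i in keep]   # rows stay aligned with the ids
    return filtered_ids, filtered_mat
```

Filtering ids and matrix rows with the same index list keeps the two outputs aligned.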

update_regulations(mRNA_ids, reg_map, TF_ids)

Update the regulation map to only include relevant transcription factors.

Parameters:
  • mRNA_ids (list) –

    List of mRNA gene identifiers.

  • reg_map (dict) –

    Regulation map, mapping mRNA genes to their regulators.

  • TF_ids (list) –

    List of transcription factor identifiers.

Returns:
  • relevant_TFs( set ) –

    Set of relevant transcription factors.
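A possible reading of this step is sketched below. The helper name `update_regulations_sketch` is hypothetical, and the in-place update of `reg_map` is an assumption about how the map is narrowed.

```python
def update_regulations_sketch(mRNA_ids, reg_map, TF_ids):
    """Drop regulators without TF data and return the set of TFs still in use."""
    known = set(TF_ids)
    relevant = set()
    for gene in mRNA_ids:
        kept = [tf for tf in reg_map.get(gene, []) if tf in known]
        reg_map[gene] = kept       # in-place update (assumption)
        relevant.update(kept)
    return relevant
```

The returned set is the kind of value that `filter_TF` expects as its `relevant_TFs` argument.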

filter_TF(TF_ids, protein_dict, psite_dict, psite_labels_dict, relevant_TFs)

Filter transcription factors to only those present in the relevant_TFs set.

Parameters:
  • TF_ids (list) –

    List of transcription factor identifiers.

  • protein_dict (dict) –

    Dictionary mapping TF_ids to their protein data.

  • psite_dict (dict) –

    Dictionary mapping TF_ids to their phosphorylation site data.

  • psite_labels_dict (dict) –

    Dictionary mapping TF_ids to their phosphorylation site labels.

  • relevant_TFs (set) –

    Set of relevant transcription factors.

Returns:
  • TF_ids_filtered( list ) –

    List of filtered transcription factor identifiers.

  • protein_dict( dict ) –

    Filtered dictionary mapping TF_ids to their protein data.

  • psite_dict( dict ) –

    Filtered dictionary mapping TF_ids to their phosphorylation site data.

  • psite_labels_dict( dict ) –

    Filtered dictionary mapping TF_ids to their phosphorylation site labels.

determine_T_use(mRNA_mat, TF_time_cols)

Determine the number of time points to use for the analysis.

Parameters:
  • mRNA_mat (ndarray) –

    Matrix of mRNA expression data.

  • TF_time_cols (list) –

    Time points for TF data.

tfopt.evol.utils.iodata

load_mRNA_data(filename=INPUT3)

Load mRNA data from a CSV file.

Parameters:
  • filename (str, default: INPUT3 ) –

    Path to the CSV file containing mRNA data.

Returns:
  • mRNA_ids: List of mRNA gene identifiers (strings).
  • mRNA_mat: Matrix of mRNA expression data (numpy array).
  • time_cols: List of time columns (excluding "GeneID").

load_TF_data(filename=INPUT1)

Load TF data from a CSV file.

Parameters:
  • filename (str, default: INPUT1 ) –

    Path to the CSV file containing TF data.

Returns:
  • TF_ids: List of TF identifiers (strings).
  • protein_dict: Dictionary mapping TF identifiers to their protein data (numpy array).
  • psite_dict: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays).
  • psite_labels_dict: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings).
  • time_cols: List of time columns (excluding "GeneID" and "Psite").

load_regulation(filename=INPUT4)

Load regulation data from a CSV file.

Parameters:
  • filename (str, default: INPUT4 ) –

    Path to the CSV file containing regulation data.

Returns:
  • reg_map: Dictionary mapping mRNA genes to their regulators (list of TF identifiers).

create_report(results_dir: str, output_file: str = 'report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

Parameters:
  • results_dir (str) –

    Path to the directory containing gene folders.

organize_output_files(*directories)

Organizes output files from multiple directories into separate folders for each protein.

Parameters:
  • directories (str, default: () ) –

    List of directories to organize.

format_duration(seconds)

Format a duration in seconds into a human-readable string.

Parameters:
  • seconds (float) –

    Duration in seconds.

Returns:
  • str

    Formatted duration string.
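The exact output format is not documented here. The sketch below shows one plausible scheme, assuming the unit switches from seconds to minutes to hours at 60 s and 3600 s; the helper name `format_duration_sketch` and the unit labels are hypothetical.

```python
def format_duration_sketch(seconds):
    """Format a duration with the largest unit that keeps the value readable."""
    if seconds < 60:
        return f"{seconds:.2f} sec"
    if seconds < 3600:
        return f"{seconds / 60:.2f} min"
    return f"{seconds / 3600:.2f} hr"
```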

tfopt.evol.utils.params

create_no_psite_array(n_TF, num_psites, psite_labels_arr)

Create an array indicating whether each TF has no phosphorylation sites.

Parameters:
  • n_TF (int) –

    Number of transcription factors.

  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

  • psite_labels_arr (list) –

    List of phosphorylation site labels for each TF.

Returns:
  • no_psite_tf( ndarray ) –

    Array indicating whether each TF has no phosphorylation sites.

compute_beta_indices(num_psites, n_TF)

Compute the starting indices for the beta parameters for each TF.

Parameters:
  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

  • n_TF (int) –

    Number of transcription factors.

Returns:
  • beta_start_indices( ndarray ) –

    Array of starting indices for the beta parameters.

  • cum( int ) –

    Total number of beta parameters.
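The starting indices amount to a running sum over the per-TF beta counts. The sketch below assumes one beta per phosphorylation site. The `extra` argument is a hypothetical knob for layouts that reserve additional slots per TF, and the helper name `compute_beta_indices_sketch` is likewise not part of the package.

```python
def compute_beta_indices_sketch(num_psites, n_TF, extra=0):
    """Return the first beta index of each TF and the total number of betas."""
    starts = []
    cum = 0
    for i in range(n_TF):
        starts.append(cum)              # first beta slot of TF i
        cum += num_psites[i] + extra    # advance past this TF's slots
    return starts, cum
```

With `num_psites = [2, 0, 3]` the second and third TF share the start index 2, because the second TF occupies no slots under this assumption.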

create_initial_guess(n_mRNA, n_reg, n_TF, num_psites, no_psite_tf)

Create the initial guess for the optimization variables.

Parameters:
  • n_mRNA (int) –

    Number of mRNAs.

  • n_reg (int) –

    Number of regulators.

  • n_TF (int) –

    Number of transcription factors.

  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

  • no_psite_tf (ndarray) –

    Array indicating whether each TF has no phosphorylation sites.

Returns:
  • x0( ndarray ) –

    Initial guess for the optimization variables.

  • n_alpha( int ) –

    Number of alpha parameters.

create_bounds(n_alpha, n_beta_total, lb, ub)

Create the lower and upper bounds for the optimization variables.

Parameters:
  • n_alpha (int) –

    Number of alpha parameters.

  • n_beta_total (int) –

    Total number of beta parameters.

  • lb (float) –

    Lower bound for the optimization variables.

  • ub (float) –

    Upper bound for the optimization variables.

Returns:
  • xl( ndarray ) –

    Lower bounds for the optimization variables.

  • xu( ndarray ) –

    Upper bounds for the optimization variables.
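A bounds vector for a decision vector laid out as alphas followed by betas can be sketched as follows. The split into separate alpha and beta ranges is an assumption: here `lb` and `ub` are applied to the beta segment and the alphas are kept in [0, 1] through the hypothetical `alpha_bounds` argument. The helper name `create_bounds_sketch` is also hypothetical.

```python
def create_bounds_sketch(n_alpha, n_beta_total, lb, ub, alpha_bounds=(0.0, 1.0)):
    """Build lower and upper bound lists of length n_alpha + n_beta_total."""
    a_lo, a_hi = alpha_bounds
    xl = [a_lo] * n_alpha + [lb] * n_beta_total
    xu = [a_hi] * n_alpha + [ub] * n_beta_total
    return xl, xu
```

Passing `alpha_bounds=(lb, ub)` applies the same range to every variable, which is the other natural reading of the parameter descriptions.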

get_parallel_runner()

Get a parallel runner for multi-threading.

Returns:
  • runner

    Parallelization runner.

  • pool

    ThreadPool instance for parallel execution.

extract_best_solution(res, n_alpha, n_mRNA, n_reg, n_TF, num_psites, beta_start_indices)

Extract the best solution from the optimization results.

Parameters:
  • res

    Optimization results.

  • n_alpha (int) –

    Number of alpha parameters.

  • n_mRNA (int) –

    Number of mRNAs.

  • n_reg (int) –

    Number of regulators.

  • n_TF (int) –

    Number of transcription factors.

  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

  • beta_start_indices (ndarray) –

    Array of starting indices for the beta parameters.

Returns:
  • final_alpha( ndarray ) –

    Final alpha parameters.

  • final_beta( ndarray ) –

    Final beta parameters.

  • best_objectives( ndarray ) –

    Best objectives from the Pareto front.

  • final_x( ndarray ) –

    Final optimization variables.

print_alpha_mapping(mRNA_ids, reg_map, TF_ids, final_alpha)

Print the mapping of transcription factors (TFs) to mRNAs with their corresponding alpha values.

Parameters:
  • mRNA_ids (list) –

    List of mRNA identifiers.

  • reg_map (dict) –

    Mapping of genes to their regulators.

  • TF_ids (list) –

    List of TF identifiers.

  • final_alpha (ndarray) –

    Final alpha parameters (mRNA x TF).

print_beta_mapping(TF_ids, final_beta, psite_labels_arr)

Print the mapping of transcription factors (TFs) to their beta parameters.

Parameters:
  • TF_ids (list) –

    List of TF identifiers.

  • final_beta (ndarray) –

    Final beta parameters (TF x β).

  • psite_labels_arr (list) –

    List of phosphorylation site labels for each TF.

Gradient-Based Algorithms

tfopt.local.config.constants

parse_args()

Parse command line arguments for the PhosKinTime optimization problem. This function uses argparse to handle input parameters for the optimization process. The parameters include:
  • lower_bound: Lower bound for the optimization variables (default: -2).
  • upper_bound: Upper bound for the optimization variables (default: 2).
  • loss_type: Type of loss function to use (default: 0). Options: 0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov.

Returns:
  • lower_bound, upper_bound, loss_type

tfopt.local.config.logconf

tfopt.local.exporter.plotout

plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)

Plots the estimated vs observed values for a given set of genes and their corresponding TFs.

Parameters:
  • predictions (ndarray) –

    Predicted expression levels.

  • expression_matrix (ndarray) –

    Observed expression levels.

  • gene_ids (list) –

    List of gene identifiers.

  • time_points (ndarray) –

    Time points for the experiments.

  • regulators (ndarray) –

    Matrix of regulators for each gene.

  • tf_protein_matrix (ndarray) –

    Matrix of TF protein levels.

  • tf_ids (list) –

    List of TF identifiers.

  • num_targets (int) –

    Number of target genes to plot.

  • save_path (str, default: OUT_DIR ) –

    Directory to save the plots.

tfopt.local.exporter.sheetutils

save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)

Save the optimization results to an Excel file.

Parameters:
  • gene_ids (list) –

    List of gene identifiers.

  • tf_ids (list) –

    List of TF identifiers.

  • final_alpha (ndarray) –

    Final alpha values.

  • final_beta (ndarray) –

    Final beta values.

  • psite_labels_arr (list) –

    List of phosphorylation site labels.

  • expression_matrix (ndarray) –

    Observed expression levels.

  • predictions (ndarray) –

    Predicted expression levels.

  • objective_value (float) –

    Objective value from optimization.

  • reg_map (dict) –

    Mapping of genes to regulators.

  • filename (str, default: OUT_FILE ) –

    Path to the output Excel file.

tfopt.local.objfn.minfn

objective_(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type, lam1=1e-06, lam2=1e-06)

Originally implemented by Julius Normann.

This version has been modified and optimized for consistency & speed in submodules by Abhinav Mishra.

Computes a loss value using one of several loss functions.

Parameters:
  • x ( ) –

    Decision vector.

  • expression_matrix ( ) –

    (n_genes x T_use) measured gene expression values.

  • regulators ( ) –

    (n_genes x n_reg) indices of TF regulators for each gene.

  • tf_protein_matrix ( ) –

    (n_TF x T_use) TF protein time series.

  • psite_tensor ( ) –

    (n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros).

  • n_reg ( ) –

    Maximum number of regulators per gene.

  • T_use ( ) –

    Number of time points used.

  • n_genes ( ) –

    Number of genes.

  • beta_start_indices

    Integer array giving the starting index (in the β–segment) for each TF.

  • num_psites ( ) –

    Integer array with the actual number of PSites for each TF.

  • loss_type ( ) –

    Integer indicating the loss type (0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov).

Returns:
  • loss

    The computed loss (a scalar).

compute_predictions(x, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites)

Computes the predicted expression matrix based on the decision vector x.

Parameters:
  • x ( ) –

    Decision vector.

  • regulators ( ) –

    (n_genes x n_reg) indices of TF regulators for each gene.

  • tf_protein_matrix ( ) –

    (n_TF x T_use) TF protein time series.

  • psite_tensor ( ) –

    (n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros).

  • n_reg ( ) –

    Maximum number of regulators per gene.

  • T_use ( ) –

    Number of time points used.

  • n_genes ( ) –

    Number of genes.

  • beta_start_indices

    Integer array giving the starting index (in the β–segment) for each TF.

  • num_psites ( ) –

    Integer array with the actual number of PSites for each TF.

Returns:
  • predictions

    (n_genes x T_use) predicted gene expression values.

objective_wrapper(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)

Wrapper function for the objective function.

Parameters:
  • x ( ) –

    Decision vector.

  • expression_matrix ( ) –

    (n_genes x T_use) measured gene expression values.

  • regulators ( ) –

    (n_genes x n_reg) indices of TF regulators for each gene.

  • tf_protein_matrix ( ) –

    (n_TF x T_use) TF protein time series.

  • psite_tensor ( ) –

    (n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros).

  • n_reg ( ) –

    Maximum number of regulators per gene.

  • T_use ( ) –

    Number of time points used.

  • n_genes ( ) –

    Number of genes.

  • beta_start_indices

    Integer array giving the starting index (in the β–segment) for each TF.

  • num_psites ( ) –

    Integer array with the actual number of PSites for each TF.

  • loss_type ( ) –

    Integer indicating the loss type.

Returns:
  • loss

    The computed loss (a scalar).

tfopt.local.opt.optrun

run_optimizer(x0, bounds, lin_cons, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)

Runs the optimization algorithm to minimize the objective function.

Parameters:
  • x0 ( ) –

    Initial guess for the optimization variables.

  • bounds ( ) –

    Bounds for the optimization variables.

  • lin_cons ( ) –

    Linear constraints for the optimization problem.

  • expression_matrix ( ) –

    (n_genes x T_use) measured gene expression values.

  • regulators ( ) –

    (n_genes x n_reg) indices of TF regulators for each gene.

  • tf_protein_matrix ( ) –

    (n_TF x T_use) TF protein time series.

  • psite_tensor ( ) –

    (n_TF x n_psite_max x T_use) matrix of PSite signals (padded with zeros).

  • n_reg ( ) –

    Maximum number of regulators per gene.

  • T_use ( ) –

    Number of time points used.

  • n_genes ( ) –

    Number of genes.

  • beta_start_indices ( ) –

    Integer array giving the starting index (in the β–segment) for each TF.

  • num_psites ( ) –

    Integer array with the actual number of PSites for each TF.

  • loss_type ( ) –

    Type of loss function to use.

Returns:
  • result

    Result of the optimization process, including the optimized parameters and objective value.

tfopt.local.optcon.construct

build_fixed_arrays(gene_ids, expression_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, reg_map)

Builds fixed-shape arrays from the input data.

Parameters:
  • gene_ids –

    list of mRNA identifiers.

  • expression_matrix –

    array of shape (n_genes, T) with mRNA expression levels.

  • tf_ids –

    list of TF identifiers.

  • tf_protein –

    dict mapping TFs to their protein levels.

  • tf_psite_data –

    dict mapping TFs to their phosphorylation sites.

  • tf_psite_labels –

    dict mapping TFs to their phosphorylation site labels.

  • reg_map –

    mapping of genes to their regulators (TFs).

Returns:
    • expression_matrix: array of shape (n_genes, T) with mRNA expression levels.
    • regulators: array of shape (n_genes, n_reg) with TF indices.
    • tf_protein_matrix: array of shape (n_TF, T) with TF protein levels.
    • psite_tensor: array of shape (n_TF, n_psite_max, T) with phosphorylation sites.
    • n_reg: number of regulators.
    • n_psite_max: maximum number of phosphorylation sites across all TFs.
    • psite_labels_arr: list of labels for each TF's phosphorylation sites.
    • num_psites: array indicating the number of phosphorylation sites for each TF.
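The padding that turns ragged per-TF data into fixed shapes can be sketched with nested lists. This illustration is not the package's code: the helper name `build_fixed_arrays_sketch` is hypothetical, the explicit `T` argument and the -1 padding value for missing regulators are assumptions, and the expression and protein matrices are omitted for brevity.

```python
def build_fixed_arrays_sketch(gene_ids, tf_ids, tf_psite_data, tf_psite_labels, reg_map, T):
    """Pad regulator lists and phosphorylation-site series to fixed shapes."""
    tf_index = {tf: i for i, tf in enumerate(tf_ids)}
    # Regulators as indices into tf_ids, one row per gene.
    reg_lists = [[tf_index[tf] for tf in reg_map.get(g, []) if tf in tf_index]
                 for g in gene_ids]
    n_reg = max((len(r) for r in reg_lists), default=0)
    regulators = [r + [-1] * (n_reg - len(r)) for r in reg_lists]  # -1 = no regulator (assumed)
    num_psites = [len(tf_psite_data.get(tf, [])) for tf in tf_ids]
    n_psite_max = max(num_psites, default=0)
    psite_tensor, psite_labels_arr = [], []
    for tf, count in zip(tf_ids, num_psites):
        rows = [list(series[:T]) for series in tf_psite_data.get(tf, [])]
        rows += [[0.0] * T for _ in range(n_psite_max - count)]    # zero padding
        psite_tensor.append(rows)
        labels = list(tf_psite_labels.get(tf, []))
        psite_labels_arr.append(labels + [""] * (n_psite_max - len(labels)))
    return regulators, psite_tensor, n_reg, n_psite_max, psite_labels_arr, num_psites
```

Keeping `num_psites` alongside the padded tensor lets later code ignore the zero-filled rows.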

constraint_alpha_func(x, n_genes, n_reg)

For each gene, the sum of its alpha parameters must equal 1.

Parameters:
  • x (ndarray) –

    Decision vector.

  • n_genes (int) –

    Number of genes.

  • n_reg (int) –

    Number of regulators.

Returns:
  • np.ndarray: Array of constraints.
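The constraint can be expressed as one residual per gene that is zero when that gene's alphas sum to 1. The sketch below assumes the alphas occupy the first n_genes x n_reg entries of `x`, stored gene by gene; the helper name `constraint_alpha_sketch` is hypothetical.

```python
def constraint_alpha_sketch(x, n_genes, n_reg):
    """Return sum(alpha) - 1 for each gene; all zeros means the constraint holds."""
    return [sum(x[g * n_reg:(g + 1) * n_reg]) - 1.0 for g in range(n_genes)]
```

Entries of `x` beyond the alpha segment (the betas) are ignored by this function.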

constraint_beta_func(x, n_alpha, n_TF, beta_start_indices, num_psites, no_psite_tf)

For each TF, the sum of its beta parameters must equal 1.

Parameters:
  • x (ndarray) –

    Decision vector.

  • n_alpha (int) –

    Number of alpha parameters.

  • n_TF (int) –

    Number of transcription factors.

  • beta_start_indices (list) –

    List of starting indices for beta parameters.

  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

  • no_psite_tf (list) –

    List indicating if a TF has no phosphorylation site.

Returns:
  • np.ndarray: Array of constraints.

build_linear_constraints(n_genes, n_TF, n_reg, n_alpha, beta_start_indices, num_psites, no_psite_tf)

Build linear constraints for the transcription factor optimization problem.

Parameters:
  • n_genes (int) –

    Number of genes.

  • n_TF (int) –

    Number of transcription factors.

  • n_reg (int) –

    Number of regulators.

  • n_alpha (int) –

    Number of alpha parameters.

  • beta_start_indices (list) –

    List of starting indices for beta parameters.

  • num_psites (list) –

    List of number of phosphorylation sites for each TF.

  • no_psite_tf (list) –

    List indicating if a TF has no phosphorylation site.

Returns:
  • list

    List of linear constraints.

tfopt.local.optcon.filter

load_and_filter_data()

Load and filter data for the optimization problem.

Returns:
    • gene_ids (list): List of gene IDs.
    • expr_matrix (np.ndarray): Gene expression matrix.
    • expr_time_cols (list): Time columns for expression data.
    • tf_ids (list): List of transcription factor IDs.
    • tf_protein (dict): Dictionary mapping TF IDs to their protein data.
    • tf_psite_data (dict): Dictionary mapping TF IDs to their phosphorylation site data.
    • tf_psite_labels (dict): Dictionary mapping TF IDs to their phosphorylation site labels.
    • tf_time_cols (list): Time columns for TF data.
    • reg_map (dict): Regulation map, mapping gene IDs to their regulators.

prepare_data(gene_ids, expr_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, tf_time_cols, reg_map)

Prepares the data for optimization by filtering the expression matrix to match the number of time points and building fixed arrays.

Parameters:
  • gene_ids (list) –

    List of gene IDs.

  • expr_matrix (ndarray) –

    Gene expression matrix.

  • tf_ids (list) –

    List of transcription factor IDs.

  • tf_protein (dict) –

    Dictionary mapping TF IDs to their protein data.

  • tf_psite_data (dict) –

    Dictionary mapping TF IDs to their phosphorylation site data.

  • tf_psite_labels (dict) –

    Dictionary mapping TF IDs to their phosphorylation site labels.

  • tf_time_cols (list) –

    Time columns for TF data.

  • reg_map (dict) –

    Regulation map, mapping gene IDs to their regulators.

Returns:
    • fixed_arrays (tuple): Tuple containing the fixed arrays:
        - expression_matrix: array of shape (n_genes, T).
        - regulators: array of shape (n_genes, n_reg) with indices into tf_ids.
        - tf_protein_matrix: array of shape (n_TF, T).
        - psite_tensor: array of shape (n_TF, n_psite_max, T), padded with zeros.
        - n_reg: maximum number of regulators per gene.
        - n_psite_max: maximum number of PSites among TFs.
        - psite_labels_arr: list (length n_TF) of lists of PSite names (padded with empty strings).
        - num_psites: array of length n_TF with the actual number of PSites for each TF.
    • T_use (int): Number of time points used in the expression matrix.
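The zero padding of psite_tensor and the empty-string padding of psite_labels_arr can be illustrated with a small sketch. The helper name pad_psites and the toy inputs are hypothetical; only the padding behaviour described above is taken from the documentation.

```python
import numpy as np

def pad_psites(tf_ids, psite_data, psite_labels, T):
    """Stack ragged per-TF PSite series into one zero-padded tensor."""
    num_psites = np.array([len(psite_data[tf]) for tf in tf_ids], dtype=int)
    n_psite_max = int(num_psites.max()) if len(tf_ids) else 0
    tensor = np.zeros((len(tf_ids), n_psite_max, T))
    labels = []
    for i, tf in enumerate(tf_ids):
        for j, series in enumerate(psite_data[tf]):
            tensor[i, j, :] = np.asarray(series, dtype=float)[:T]
        # Pad label lists with empty strings up to n_psite_max.
        missing = n_psite_max - int(num_psites[i])
        labels.append(list(psite_labels[tf]) + [""] * missing)
    return tensor, labels, num_psites

tf_ids = ["TF1", "TF2"]
psite_data = {"TF1": [[1, 2, 3], [4, 5, 6]], "TF2": []}
psite_labels = {"TF1": ["S10", "T20"], "TF2": []}
tensor, labels, num_psites = pad_psites(tf_ids, psite_data, psite_labels, T=3)
```

Keeping num_psites alongside the padded tensor lets later code ignore the padded rows.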

tfopt.local.utils.iodata

min_max_normalize(df, custom_max=None)

Row-wise (per-sample) min-max normalize time-series columns starting with 'x'.

Parameters:
  • df (DataFrame) –

    Input DataFrame with time-series columns (x1-xN).

  • custom_max (float, default: None ) –

    If given, used as max for all rows.

Returns:
  • pd.DataFrame: Normalized DataFrame with same shape.
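A minimal sketch of row-wise min-max scaling on a plain array is shown below. It is not the package implementation: the helper name min_max_rows is hypothetical, it works on a NumPy array rather than a DataFrame, and the handling of constant rows and of custom_max is an assumption.

```python
import numpy as np

def min_max_rows(values, custom_max=None):
    """Scale each row to [0, 1] using its own minimum and maximum."""
    values = np.asarray(values, dtype=float)
    row_min = values.min(axis=1, keepdims=True)
    # Assumption: custom_max, if given, replaces the per-row maximum.
    row_max = values.max(axis=1, keepdims=True) if custom_max is None else custom_max
    span = row_max - row_min
    # Assumption: constant rows map to zeros instead of dividing by zero.
    span = np.where(span == 0, 1.0, span)
    return (values - row_min) / span

scaled = min_max_rows([[1, 2, 3], [10, 10, 10]])
scaled_custom = min_max_rows([[1, 2, 3]], custom_max=5)
```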

load_expression_data(filename=INPUT3)

Loads gene expression (mRNA) data.

Parameters:
  • filename (str, default: INPUT3 ) –

    Path to the CSV file containing mRNA data.

Returns:
    • gene_ids: List of gene identifiers (strings).
    • expression_matrix: Matrix of gene expression data (numpy array).
    • time_cols: List of time columns (excluding "GeneID").

load_tf_protein_data(filename=INPUT1)

Loads TF protein data along with PSite information.

Parameters:
  • filename (str, default: INPUT1 ) –

    Path to the CSV file containing TF protein data.

Returns:
    • tf_ids: List of TF identifiers (strings).
    • tf_protein: Dictionary mapping TF identifiers to their protein data (numpy array).
    • tf_psite_data: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays).
    • tf_psite_labels: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings).
    • time_cols: List of time columns (excluding "GeneID" and "Psite").

load_regulation(filename=INPUT4)

Returns a mapping from gene (source) to a list of TFs (targets).

Parameters:
  • filename (str, default: INPUT4 ) –

    Path to the CSV file containing regulation data.

Returns:
    • reg_map: Dictionary mapping gene identifiers to lists of TF identifiers.

summarize_stats(input3=INPUT3, input1=INPUT1, input4=INPUT4)

Summarizes statistics for the expression data (input3) and TF protein data (input1).

Parameters:
  • input3 (str, default: INPUT3 ) –

    Path to the expression data CSV file.

  • input1 (str, default: INPUT1 ) –

    Path to the TF protein data CSV file.

  • input4 (str, default: INPUT4 ) –

    Path to the mapping file CSV.

create_report(results_dir: str, output_file: str = 'report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

Parameters:
  • results_dir (str) –

    Path to the root results directory.

  • output_file (str, default: 'report.html' ) –

    Name of the generated global report file (placed inside results_dir).

organize_output_files(*directories)

Function to organize output files into protein-specific folders.

Parameters:
  • directories (str, default: () ) –

    List of directories to organize.

tfopt.local.utils.params

get_optimization_parameters(expression_matrix, tf_protein_matrix, n_reg, T_use, psite_labels_arr, num_psites, lb, ub)

Prepare the optimization parameters for the optimization problem.

Parameters:
  • expression_matrix (ndarray) –

    Gene expression matrix.

  • tf_protein_matrix (ndarray) –

    TF protein matrix.

  • n_reg (int) –

    Number of regulators.

  • T_use (int) –

    Number of time points to use.

  • psite_labels_arr (list) –

    List of phosphorylation site labels for each TF.

  • num_psites (ndarray) –

    Array containing the number of phosphorylation sites for each TF.

  • lb (float) –

    Lower bound for beta parameters.

  • ub (float) –

    Upper bound for beta parameters.

Returns:
    • x0 (np.ndarray): Initial guess for the optimization variables.
    • n_alpha (int): Number of alpha parameters.
    • beta_start_indices (np.ndarray): Starting indices for beta parameters.
    • bounds (list): List of bounds for the optimization variables.
    • no_psite_tf (np.ndarray): Array indicating whether each TF has no phosphorylation sites.
    • n_genes (int): Number of genes.
    • n_TF (int): Number of transcription factors.

postprocess_results(result, n_alpha, n_genes, n_reg, beta_start_indices, num_psites, reg_map, gene_ids, tf_ids, psite_labels_arr)

Post-process the optimization results to extract the final alpha and beta parameters.

Parameters:
  • result (OptimizeResult) –

    The result of the optimization.

  • n_alpha (int) –

    Number of alpha parameters.

  • n_genes (int) –

    Number of genes.

  • n_reg (int) –

    Number of regulators.

  • beta_start_indices (ndarray) –

    Starting indices for beta parameters.

  • num_psites (ndarray) –

    Array containing the number of phosphorylation sites for each TF.

  • reg_map (dict) –

    Regulation map, mapping gene IDs to their regulators.

  • gene_ids (list) –

    List of gene IDs.

  • tf_ids (list) –

    List of transcription factor IDs.

  • psite_labels_arr (list) –

    List of lists containing phosphorylation site labels.

Returns:
  • final_x( ndarray ) –

    Final optimization result.

  • final_alpha( ndarray ) –

    Final alpha parameters reshaped into a matrix.

  • final_beta( ndarray ) –

    Final beta parameters reshaped into a matrix.

Fitting Analysis

tfopt.fitanalysis.helper

Plotter

A class to plot various analysis results from an Excel file.

__init__(filepath, savepath)

Initializes the Plotter instance by loading data from the Excel file.

Parameters:
  • filepath (str) –

    Path to the Excel file containing analysis results.

  • savepath (str) –

    Directory where the plots will be saved.

load_data()

Loads data from the Excel file given at initialization. The method takes no arguments; it uses the filepath and savepath supplied to __init__.

plot_alpha_distribution()

Plots the distribution of alpha parameter values grouped by transcription factors (TFs) using a strip plot.

plot_beta_barplots()

Processes the beta values DataFrame and creates a separate bar plot for each unique transcription factor (TF).

plot_heatmap_abs_residuals()

Plots a heatmap of the absolute values of the residuals.

plot_goodness_of_fit()

Creates a scatter plot comparing observed vs. estimated values, fits a linear regression model, plots the 95% confidence interval, and labels points outside the confidence interval.

plot_kld()

Plots the Kullback-Leibler Divergence (KLD) for each mRNA. The KLD is calculated between the observed and estimated distributions of the mRNA expression levels.

plot_pca()

Plots a PCA (Principal Component Analysis) of the observed and estimated values.

plot_boxplot_alpha()

Plots a boxplot of the alpha values.

plot_boxplot_beta()

Plots a boxplot of the beta values.

plot_cdf_alpha()

Plots the cumulative distribution function (CDF) of the alpha values.

plot_cdf_beta()

Plots the cumulative distribution function (CDF) of the beta values.

plot_time_wise_residuals()

Plots the residuals over time for each mRNA.

ODE Modelling & Parameter Estimation

Configuration

config.cli

Command‑line entry point for the phoskintime pipeline.

Usage

Run the commands below from one level above the package root, that is, from the directory in which the project directory is visible. That directory should be the working directory.

run everything with the default (local) solver

python phoskintime all

run only preprocessing

python phoskintime prep

run tfopt with local flavour

python phoskintime tfopt --mode local

run tfopt with evol flavour

python phoskintime tfopt --mode evol

run kinopt with local flavour

python phoskintime kinopt --mode local

run kinopt with evol flavour

python phoskintime kinopt --mode evol

run the model

python phoskintime model

prep()

Preprocess data (processing.cleanup).

tfopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))

Transcription-Factor-mRNA Optimisation.

Parameters:
  • mode (str, default: Option('local', help='local | evol') ) –

    local | evol

  • conf (Path | None, default: Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.') ) –

    Path to TOML/YAML config. Uses defaults if omitted.

Returns: None

kinopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))

Kinase-Phosphorylation Optimization.

Parameters:
  • mode (str, default: Option('local', help='local | evol') ) –

    local | evol

  • conf (Path | None, default: Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.') ) –

    Path to TOML/YAML config. Uses defaults if omitted.

Returns: None

model(conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to model config file. Uses defaults if omitted.'))

Run the model (bin.main).

Parameters:
  • conf (Path | None, default: Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to model config file. Uses defaults if omitted.') ) –

    Path to model config file. Uses defaults if omitted.

Returns: None

all(tf_mode: str = typer.Option('local', help='tfopt mode: local | evol'), kin_mode: str = typer.Option('local', help='kinopt mode: local | evol'), tf_conf: Path | None = typer.Option(None, help='tfopt config file'), kin_conf: Path | None = typer.Option(None, help='kinopt config file'), model_conf: Path | None = typer.Option(None, help='model config file'))

Run every stage in sequence. Preprocessing -> TF optimisation -> Kinase optimisation -> Model.

Parameters:
  • tf_mode (str, default: Option('local', help='tfopt mode: local | evol') ) –

    tfopt mode: local | evol

  • kin_mode (str, default: Option('local', help='kinopt mode: local | evol') ) –

    kinopt mode: local | evol

  • tf_conf (Path | None, default: Option(None, help='tfopt config file') ) –

    Path to TOML/YAML config. Uses defaults if omitted.

  • kin_conf (Path | None, default: Option(None, help='kinopt config file') ) –

    Path to TOML/YAML config. Uses defaults if omitted.

  • model_conf (Path | None, default: Option(None, help='model config file') ) –

    Path to model config file. Uses defaults if omitted.

Returns: None

config.config

parse_bound_pair(val)

Parse a string representing a pair of bounds (lower, upper) into a tuple of floats. The upper bound can be 'inf' or 'infinity' to represent infinity. Raises ValueError if the input is not in the correct format.

Parameters:
  • val (str) –

    The string to parse, e.g., "0,3" or "0,infinity".

Returns: tuple: A tuple containing the lower and upper bounds as floats.
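The parsing rule for a bound pair can be sketched in a few lines. The helper name parse_bounds is hypothetical and the error message is illustrative; the accepted inputs follow the description of parse_bound_pair.

```python
import math

def parse_bounds(val):
    """Parse 'lower,upper' into a tuple of floats; upper may be 'inf' or 'infinity'."""
    parts = [p.strip() for p in val.split(",")]
    if len(parts) != 2:
        raise ValueError(f"Expected 'lower,upper', got {val!r}")
    lower = float(parts[0])  # float() raises ValueError on malformed numbers
    if parts[1].lower() in ("inf", "infinity"):
        upper = math.inf
    else:
        upper = float(parts[1])
    return lower, upper

finite = parse_bounds("0,3")
unbounded = parse_bounds("0,infinity")
```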

parse_fix_value(val)

Parse a fixed value or a list of fixed values from a string. If the input is a single value, it returns that value as a float. If the input is a comma-separated list, it returns a list of floats. Raises ValueError if the input is not in the correct format.

Parameters:
  • val (str) –

    The string to parse, e.g., "1.0" or "1.0,2.0".

Returns: float or list: The parsed fixed value(s) as a float or a list of floats.
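A small sketch of the single-value versus list behaviour follows. The helper name parse_fixed is hypothetical; it relies on float() to raise ValueError for malformed input.

```python
def parse_fixed(val):
    """Return a float for a single value, or a list of floats for 'a,b,...'."""
    numbers = [float(part.strip()) for part in val.split(",")]
    return numbers[0] if len(numbers) == 1 else numbers

single = parse_fixed("1.0")
several = parse_fixed("1.0,2.0")
```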

ensure_output_directory(directory)

Parameters:
  • directory (str) –

    The path to the directory to create.

Returns: None

parse_args()

Parse command-line arguments for the PhosKinTime script. This function uses argparse to define and handle the command-line options. It includes options for setting bounds, fixed parameters, bootstrapping, profile estimation, and input file paths. The function returns the parsed arguments as a Namespace object. The arguments include: --A-bound, --B-bound, --C-bound, --D-bound, --Ssite-bound, --Dsite-bound, --bootstraps, --input-excel-protein, --input-excel-psite, --input-excel-rna.

Returns: argparse.Namespace: The parsed command-line arguments.

log_config(logger, bounds, args)

Log the configuration settings for the PhosKinTime script. This function logs the parameter bounds and the number of bootstrapping iterations, using the provided logger to output the information.

Parameters:
  • logger (Logger) –

    The logger to use for logging.

  • bounds (dict) –

    The parameter bounds.

  • args (Namespace) –

    The command-line arguments.

Returns: None

extract_config(args)

Extract configuration settings from command-line arguments. This function creates and returns a dictionary containing the parameter bounds and the number of bootstrapping iterations.

Parameters:
  • args (Namespace) –

    The command-line arguments.

Returns: dict: The configuration settings.

score_fit(params, target, prediction, alpha=ALPHA_WEIGHT, beta=BETA_WEIGHT, gamma=GAMMA_WEIGHT, delta=DELTA_WEIGHT, mu=MU_WEIGHT)

Calculate the score for the fit of a model to target data. The score is a weighted combination of mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), variance, and a regularization penalty. The weights alpha, beta, gamma, and delta adjust the error metrics, and mu controls the regularization penalty.

Parameters:
  • params (np.ndarray) –

    The model parameters.

  • target (np.ndarray) –

    The target data.

  • prediction (np.ndarray) –

    The predicted data.

  • alpha (float, default: ALPHA_WEIGHT ) –

    Weight for RMSE.

  • beta (float, default: BETA_WEIGHT ) –

    Weight for MAE.

  • gamma (float, default: GAMMA_WEIGHT ) –

    Weight for variance.

  • delta (float, default: DELTA_WEIGHT ) –

    Weight for MSE.

  • mu (float, default: MU_WEIGHT ) –

    Regularization penalty weight.

Returns: float: The calculated score.
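A weighted score of this kind might look like the sketch below. Several details are assumptions and not taken from the package: the helper name weighted_score, the default weights of 1.0, the use of the residual variance as the variance term, and the squared L2 norm of params as the regularization penalty.

```python
import numpy as np

def weighted_score(params, target, prediction,
                   alpha=1.0, beta=1.0, gamma=1.0, delta=1.0, mu=1.0):
    """Combine RMSE, MAE, variance, MSE and a penalty into one score."""
    residual = np.asarray(target, dtype=float) - np.asarray(prediction, dtype=float)
    mse = np.mean(residual ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(residual))
    variance = np.var(residual)  # assumption: variance of the residuals
    penalty = np.sum(np.asarray(params, dtype=float) ** 2)  # assumption: squared L2 norm
    return float(alpha * rmse + beta * mae + gamma * variance + delta * mse + mu * penalty)

perfect = weighted_score([0.0, 0.0], [1, 2, 3], [1, 2, 3])
offset = weighted_score([1.0], [0, 0], [1, 1])
```

With these assumptions a perfect fit with zero parameters scores 0, and larger values indicate a worse fit.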

future_times(n_new: int, ratio: Optional[float] = None, tp: np.ndarray = TIME_POINTS) -> np.ndarray

Extend the time points by n_new points, each spaced by multiplying the previous interval by ratio. If ratio is None, it is inferred from the last two points.

Parameters:
  • n_new (int) –

    Number of new time points to generate.

  • ratio (float, default: None ) –

    Ratio to multiply the previous interval. Defaults to None.

  • tp (ndarray, default: TIME_POINTS ) –

    Existing time points. Defaults to TIME_POINTS.

Returns: np.ndarray: Extended time points.
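The geometric spacing rule can be sketched as follows. The helper name extend_times is hypothetical, it returns a plain list rather than an ndarray, and the way the ratio is inferred when it is omitted (last point divided by the one before it) is an assumption about what "inferred from the last two points" means.

```python
def extend_times(tp, n_new, ratio=None):
    """Append n_new points; each new interval is the previous one times ratio."""
    times = [float(t) for t in tp]
    if ratio is None:
        # Assumption: infer the ratio as the quotient of the last two points.
        ratio = times[-1] / times[-2]
    step = times[-1] - times[-2]
    for _ in range(n_new):
        step *= ratio
        times.append(times[-1] + step)
    return times

extended = extend_times([1.0, 2.0, 4.0], n_new=2, ratio=2.0)
inferred = extend_times([1.0, 2.0, 4.0], n_new=1)
```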

config.constants

get_param_names_rand(num_psites: int) -> list

Generate parameter names for the random model. Format: ['A', 'B', 'C', 'D'] + ['S1', 'S2', ..., 'S<num_psites>'] + [parameter names for all combinations of dephosphorylation sites].

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites.

Returns: list: List of parameter names.

get_param_names_ds(num_psites: int) -> list

Generate parameter names for distributive or successive models. Format: ['A', 'B', 'C', 'D'] + ['S1', 'S2', ..., 'S<num_psites>'] + ['D1', 'D2', ..., 'D<num_psites>'].

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites.

Returns: list: List of parameter names.
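The naming scheme for the distributive or successive models can be reproduced with a short sketch. The helper name param_names_ds is hypothetical, and it assumes that the S and D names each run from 1 to num_psites.

```python
def param_names_ds(num_psites):
    """Build ['A', 'B', 'C', 'D'] + S1..Sn + D1..Dn for n = num_psites."""
    base = ["A", "B", "C", "D"]
    s_names = [f"S{i}" for i in range(1, num_psites + 1)]
    d_names = [f"D{i}" for i in range(1, num_psites + 1)]
    return base + s_names + d_names

names = param_names_ds(2)
```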

generate_labels_rand(num_psites: int) -> list

Generates labels for the states based on the number of phosphorylation sites for the random model. Returns a list with the base labels "R" and "P", followed by labels for all combinations of phosphorylated sites.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites.

Returns: list: List of state labels.
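Enumerating one label per combination of phosphorylated sites can be sketched with itertools. The helper name state_labels_rand and the label format ("P" followed by the site indices) are hypothetical; the package may label states differently.

```python
from itertools import combinations

def state_labels_rand(num_psites):
    """Return 'R', 'P', then one label per non-empty combination of sites."""
    labels = ["R", "P"]
    sites = range(1, num_psites + 1)
    for size in range(1, num_psites + 1):
        for combo in combinations(sites, size):
            # Hypothetical format: sites 1 and 2 phosphorylated -> 'P12'.
            labels.append("P" + "".join(str(site) for site in combo))
    return labels

labels_two_sites = state_labels_rand(2)
```

Under this scheme the list has 2 + (2^n - 1) entries for n sites, since every non-empty subset of sites is its own state.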

generate_labels_ds(num_psites: int) -> list

Generates labels for the states based on the number of phosphorylation sites for the distributive or successive models. Returns a list with the base labels "R" and "P", followed by labels for each individual phosphorylated state.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites.

Returns: list: List of state labels.

location(path: str, label: str = None) -> str

Returns a clickable hyperlink string for supported terminals using ANSI escape sequences.

Parameters:
  • path (str) –

    The file path or URL.

  • label (str, default: None ) –

    The display text for the link. Defaults to the path if not provided.

Returns:
  • str( str ) –

    A string that, when printed, shows a clickable link in terminals that support ANSI hyperlinks.
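Terminal hyperlinks are commonly written with the OSC 8 escape sequence. The sketch below assumes that this is the sequence meant here; the helper name hyperlink is hypothetical.

```python
def hyperlink(target, label=None):
    """Wrap label (or target) in an OSC 8 hyperlink escape sequence."""
    text = label if label is not None else target
    # ESC ] 8 ; ; target ESC \  text  ESC ] 8 ; ; ESC \
    return f"\033]8;;{target}\033\\{text}\033]8;;\033\\"

link = hyperlink("https://example.com", "docs")
bare = hyperlink("results/report.html")
```

Terminals without hyperlink support typically show only the label text.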

get_number_of_params_rand(num_psites)

Calculate the number of parameters required for the ODE system based on the number of phosphorylation sites.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites (1 to 4).

Returns:
  • int

    Total number of parameters.

get_bounds_rand(num_psites, ub=0, lower=0)

Generate bounds for the ODE parameters based on the number of phosphorylation sites.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites.

  • lower (float, default: 0 ) –

    Lower bound for parameters.

  • ub (float, default: 0 ) –

    Upper bound for parameters.

Returns:
  • list

    List of bounds as [lower, upper] for each parameter.

config.logconf

ColoredFormatter

Bases: Formatter

Custom formatter to add colors to log messages and elapsed time.

This formatter uses ANSI escape codes to colorize the log messages based on their severity level. It also includes a right-aligned clock that shows the elapsed time since the logger was initialized, displayed in a human-readable format (e.g., "1h 23m 45s"). The formatter is designed to be used with a logger that has a console handler.

The formatter ensures that the log messages are padded to a specified width, which can be adjusted using the width parameter. The remove_ansi method is used to strip ANSI escape codes from the log message for accurate padding calculation. The format method is overridden to customize the log message format, including the timestamp, logger name, log level, and message.

The setup_logger function is used to configure the logger with a file handler and a stream handler. The file handler writes log messages to a specified log file, while the stream handler outputs log messages to the console. The logger is set to the specified logging level, and the log file is created in the specified directory. The log file is rotated based on size, and old log files are backed up.

format(record)

Format the log record with colors and elapsed time. This method overrides the default format method to customize the log message format. It includes the timestamp, logger name, log level, and message.

remove_ansi(s) staticmethod

Remove ANSI escape codes from a string.
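Stripping color codes before measuring a string's length is what makes the padding accurate. A minimal sketch is shown below; the names strip_ansi and pad_visible are hypothetical, and the regular expression only covers color and style (SGR) sequences.

```python
import re

# Matches SGR sequences such as "\x1b[31m" (red) or "\x1b[0m" (reset).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text):
    """Remove ANSI color/style codes from text."""
    return ANSI_SGR.sub("", text)

def pad_visible(text, width):
    """Right-pad text so its visible (code-free) length reaches width."""
    visible = len(strip_ansi(text))
    return text + " " * max(0, width - visible)

colored = "\x1b[31mERROR\x1b[0m"
padded = pad_visible(colored, 10)
```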

setup_logger(name='phoskintime', log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5)

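Set up a logger with colored console output and file logging.

Parameters:
  • name (default: 'phoskintime' ) –

    Name of the logger.

  • log_file (default: None ) –

    Log file to write messages to.

  • level (default: logging.DEBUG ) –

    Logging level for the logger.

  • log_dir (default: LOG_DIR ) –

    Directory in which the log file is created.

  • rotate (default: True ) –

    Whether the log file is rotated based on size.

  • max_bytes (default: 2 * 1024 * 1024 ) –

    Size at which the log file is rotated.

  • backup_count (default: 5 ) –

    Number of old log files kept as backups.

Returns: logger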

Core Functions

paramest.normest

worker_find_lambda(lam: float, gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray) -> Tuple[float, float, str]

Worker function for a single lambda value.

Parameters:
  • lam (float) –

    Regularization parameter.

  • gene (str) –

    Gene name.

  • target (ndarray) –

    Target data.

  • p0 (ndarray) –

    Initial parameter guess.

  • time_points (ndarray) –

    Time points for the model fitting.

  • free_bounds (Tuple[ndarray, ndarray]) –

    Parameter bounds for the optimization.

  • init_cond (ndarray) –

    Initial conditions for the ODE solver.

  • num_psites (int) –

    Number of phosphorylation sites.

  • p_data (ndarray) –

    Measurement data for protein-phospho.

  • pr_data (ndarray) –

    Reference data for protein.

Returns:
  • Tuple[float, float, str]

    Tuple containing the lambda value, score, and weight key.

find_best_lambda(gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray, lambdas=np.logspace(-2, 0, 10), max_workers: int = os.cpu_count()) -> Tuple[float, str]

Finds best lambda_reg to use in model_func.
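The search amounts to scoring each candidate lambda with a worker and keeping the one with the lowest score. The sketch below is illustrative: log_grid, best_lambda, and toy_score are hypothetical names, the toy score stands in for a real model fit, and threads are used only to keep the example self-contained (the package may parallelize differently).

```python
from concurrent.futures import ThreadPoolExecutor

def log_grid(start_exp, stop_exp, num):
    """Logarithmically spaced values from 10**start_exp to 10**stop_exp."""
    step = (stop_exp - start_exp) / (num - 1)
    return [10 ** (start_exp + i * step) for i in range(num)]

def best_lambda(score_fn, lambdas, max_workers=4):
    """Score every lambda in parallel and return (best lambda, best score)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score_fn, lambdas))
    best_index = min(range(len(lambdas)), key=scores.__getitem__)
    return lambdas[best_index], scores[best_index]

def toy_score(lam):
    # Hypothetical stand-in for fitting the model with regularization lam.
    return (lam - 0.1) ** 2

grid = log_grid(-2, 0, 10)  # same range as the default lambdas
lam, score = best_lambda(toy_score, grid)
```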

normest(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps, use_regularization=USE_REGULARIZATION)

Function to estimate parameters for a given gene using ODE models.

Parameters:
  • gene

    Gene name.

  • pr_data

    Protein data.

  • p_data

    Phosphorylation data.

  • r_data

    Reference data.

  • init_cond

    Initial conditions for the ODE solver.

  • num_psites

    Number of phosphorylation sites.

  • time_points

    Time points for the model fitting.

  • bounds

    Parameter bounds for the optimization.

  • bootstraps

    Number of bootstrap iterations.

  • use_regularization

    Whether to use regularization in the fitting process.

Returns:
  • Tuple containing estimated parameters, model fits, error values, and regularization term.

paramest.toggle

estimate_parameters(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps)

Selects the estimation mode and runs the parameter estimation process accordingly.

Parameters:
  • gene (str) –

    Gene name.

  • pr_data (array) –

    Array of protein data.

  • p_data (array) –

    Array of protein-phospho data.

  • r_data (array) –

    Array of RNA data.

  • init_cond (array) –

    Initial conditions for the model.

  • num_psites (int) –

    Number of phosphorylation sites.

  • time_points (array) –

    Time points for the data.

  • bounds (tuple) –

    Bounds for the parameter estimation.

  • bootstraps (int) –

    Number of bootstrap samples.

Returns:
  • model_fits( list ) –

    List of model fits.

  • estimated_params( array ) –

    Estimated parameters.

  • seq_model_fit( array ) –

    Sequence model fit.

  • errors( array ) –

    Errors in the estimation.

  • reg_term( float ) –

    Regularization term.

Weights for Curve Fitting

models.weights

early_emphasis(pr_data, p_data, time_points, num_psites)

Function that calculates custom weights for early time points in a dataset.

Parameters:
  • pr_data

    2D numpy array of shape (num_psites, n_times)

  • p_data

    2D numpy array of shape (num_psites, n_times)

  • time_points

    1D numpy array of time points

  • num_psites

    Number of phosphorylation sites

Returns:
  • custom_weights

    1D numpy array of weights for early time points

get_protein_weights(gene, input1_path=Path(__file__).resolve().parent.parent / 'processing' / 'input1_wstd.csv', input2_path=Path(__file__).resolve().parent.parent / 'kinopt' / 'data' / 'input2.csv')

Function to extract weights for a specific gene from the input files.

Parameters:
  • gene (str) –

    Gene ID to filter the weights.

  • input1_path (Path, default: parent / 'processing' / 'input1_wstd.csv' ) –

    Path to the input1_wstd.csv file.

  • input2_path (Path, default: parent / 'kinopt' / 'data' / 'input2.csv' ) –

    Path to the input2.csv file.

Returns:
  • weights( ndarray ) –

    Extracted weights for the specified gene.

full_weight(p_data_weight, use_regularization, reg_len)

Function to create a full weight array for parameter estimation.

Parameters:
  • p_data_weight (ndarray) –

    The weight data to be processed.

  • use_regularization (bool) –

    Flag to indicate if regularization is used.

  • reg_len (int) –

    Length of the regularization term.

Returns:
  • numpy.ndarray: The full weight array.

get_weight_options(target, t_target, num_psites, use_regularization, reg_len, early_weights, ms_gauss_weights)

Function to calculate weights for parameter estimation based on the target data and time points.

Parameters:
  • target (ndarray) –

    The target data for which weights are calculated.

  • t_target (ndarray) –

    The time points corresponding to the target data.

  • num_psites (int) –

    Number of phosphorylation sites.

  • use_regularization (bool) –

    Flag to indicate if regularization is used.

  • reg_len (int) –

    Length of the regularization term.

  • early_weights (ndarray) –

    Weights for early time points.

  • ms_gauss_weights (ndarray) –

    Weights based on Gaussian distribution.

Returns:
  • dict

    A dictionary containing different weight options.

Parameter Estimation

paramest.core

process_gene(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps=0, out_dir=OUT_DIR)

Process a single gene by estimating its parameters and generating plots.

Parameters:
  • gene (str) –

    Gene name.

  • protein_data (DataFrame) –

    DataFrame containing protein-only data.

  • kinase_data (DataFrame) –

    DataFrame containing kinase data.

  • mrna_data (DataFrame) –

    DataFrame containing mRNA data.

  • time_points (list) –

    List of time points for the experiment.

  • bounds (tuple) –

    Bounds for parameter estimation.

  • bootstraps (int, default: 0 ) –

    Number of bootstrap iterations. Defaults to 0.

  • out_dir (str, default: OUT_DIR ) –

    Output directory for saving results. Defaults to OUT_DIR.

Returns:
    • gene: The gene being processed.
    • estimated_params: Estimated parameters for the gene.
    • model_fits: Model fits for the gene.
    • seq_model_fit: Sequential model fit for the gene.
    • errors: Error metrics (MSE, MAE).
    • final_params: Final estimated parameters.
    • param_df: DataFrame of estimated parameters.
    • gene_psite_data: Dictionary of gene-specific data.
    • psite_labels: Labels for phosphorylation sites.
    • pca_result: PCA result for the gene.
    • ev: Explained variance for PCA.
    • tsne_result: t-SNE result for the gene.
    • perturbation_analysis: Sensitivity analysis results.
    • perturbation_curves_params: Trajectories with parameters for sensitivity analysis.
    • knockout_results: Dictionary of knockout results.
    • regularization: Regularization value used in parameter estimation.

process_gene_wrapper(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps, out_dir=OUT_DIR)

Wrapper function to process a gene.

Parameters:
  • gene (str) –

    Gene name.

  • protein_data (DataFrame) –

    DataFrame containing protein-only data.

  • kinase_data (DataFrame) –

    DataFrame containing kinase data.

  • mrna_data (DataFrame) –

    DataFrame containing mRNA data.

  • time_points (list) –

    List of time points for the experiment.

  • bounds (tuple) –

    Bounds for parameter estimation.

  • bootstraps (int) –

    Number of bootstrap iterations.

  • out_dir (str, default: OUT_DIR ) –

    Output directory for saving results. Defaults to OUT_DIR.

Returns:
  • dict

    A dictionary containing the results of the gene processing.

Confidence Intervals using Linearization

paramest.identifiability.ci

confidence_intervals(gene, popt, pcov, target, model, alpha_val=0.05)

Computes confidence intervals for the parameter estimates using the Wald interval approach.

Parameters:
  • gene (str) –

    Gene name.

  • popt (ndarray) –

    Optimized parameter estimates.

  • pcov (ndarray) –

    Covariance matrix of the optimized parameters.

  • target (ndarray) –

    Target data.

  • model (ndarray) –

    Model predictions.

  • alpha_val (float, default: 0.05 ) –

    Significance level for confidence intervals. Defaults to 0.05.

Returns:
  • dict

    A dictionary containing the confidence intervals and other statistics.
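A Wald interval is the estimate plus or minus a critical value times the standard error, where the standard error is the square root of the matching diagonal entry of the covariance matrix. The sketch below uses the normal quantile from the standard library as the critical value; that choice is an assumption (the package may use a t-distribution), and the helper name wald_intervals is hypothetical.

```python
import math
from statistics import NormalDist

def wald_intervals(popt, pcov, alpha_val=0.05):
    """Return (lower, upper) Wald intervals, one per parameter."""
    # Assumption: normal critical value; about 1.96 for alpha_val = 0.05.
    z = NormalDist().inv_cdf(1 - alpha_val / 2)
    intervals = []
    for i, estimate in enumerate(popt):
        std_err = math.sqrt(pcov[i][i])
        intervals.append((estimate - z * std_err, estimate + z * std_err))
    return intervals

lower, upper = wald_intervals([1.0], [[0.04]])[0]
```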

Knockout Analysis

knockout.helper

Perturbation & Parameter Sensitivity Analysis

sensitivity.analysis

compute_bound(value, perturbation=PERTURBATIONS_VALUE)

Computes the lower and upper bounds for a given parameter value for sensitivity analysis and perturbations.

Parameters:
  • value (float) –

    The parameter value.

  • perturbation (float, default: PERTURBATIONS_VALUE ) –

    The perturbation factor.

Returns:
  • list

    A list containing the lower and upper bounds.

define_sensitivity_problem_rand(num_psites, values)

Defines the Morris sensitivity analysis problem for the random model.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites.

  • values (list) –

    List of parameter values.

Returns:
  • dict

    A dictionary containing the number of variables, parameter names, and bounds.

define_sensitivity_problem_ds(num_psites, values)

Defines the Morris sensitivity analysis problem for the distributive or successive models.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites.

  • values (list) –

    List of parameter values.

Returns:
  • dict

    A dictionary containing the number of variables, parameter names, and bounds.
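A problem dictionary of this shape (number of variables, names, bounds) can be assembled as in the sketch below. The helper names, the key names num_vars, names, and bounds, the multiplicative form of the bounds, and the default perturbation of 0.5 are all assumptions made for illustration.

```python
def perturbation_bounds(value, perturbation=0.5):
    """Assumed form: scale the value down and up by the perturbation factor."""
    return [value * (1 - perturbation), value * (1 + perturbation)]

def sensitivity_problem_ds(num_psites, values, perturbation=0.5):
    """Build a problem dictionary for the distributive/successive parameters."""
    names = (["A", "B", "C", "D"]
             + [f"S{i}" for i in range(1, num_psites + 1)]
             + [f"D{i}" for i in range(1, num_psites + 1)])
    if len(values) != len(names):
        raise ValueError("values must contain one entry per parameter name")
    return {
        "num_vars": len(names),
        "names": names,
        "bounds": [perturbation_bounds(v, perturbation) for v in values],
    }

problem = sensitivity_problem_ds(1, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```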

Model Diagram

models.diagram.helpers

powerset(iterable)

Return the list of all subsets (as frozensets) of the given iterable.

Parameters:
  • iterable

    An iterable (e.g., list, set) to generate subsets from.

Returns: A list of frozensets representing all subsets of the input iterable.
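One common way to build such a list uses itertools, as in the sketch below. The helper name all_subsets is hypothetical; the ordering of subsets (by increasing size) is a property of this sketch and not necessarily of the package function.

```python
from itertools import chain, combinations

def all_subsets(iterable):
    """Return every subset of the input as a frozenset, smallest first."""
    items = list(iterable)
    tuples = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(t) for t in tuples]

subsets = all_subsets([1, 2, 3])
```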

state_label(state)

Convert a set of phosphorylation sites into a node label.

Parameters:
  • state

    A frozenset representing the phosphorylation state.

Returns: A string representing the label for the node.

create_random_diagram(x, num_sites, output_filename)

Create a random phosphorylation diagram.

Parameters:
  • x

    Placeholder parameter, not used in this function.

  • num_sites

    The number of phosphorylation sites.

  • output_filename

    The name of the output file for the diagram.

create_distributive_diagram(x, num_sites, output_filename)

Create a distributive phosphorylation diagram.

Parameters:
  • x

    Placeholder parameter, not used in this function.

  • num_sites

    The number of phosphorylation sites.

  • output_filename

    The name of the output file for the diagram.

create_successive_model(x, num_sites, output_filename)

Create a successive phosphorylation diagram.

Parameters:
  • x

    Placeholder parameter, not used in this function.

  • num_sites

    The number of phosphorylation sites.

  • output_filename

    The name of the output file for the diagram.

Model Types

models.distmod

ode_core(y, t, A, B, C, D, S_rates, D_rates)

The core ODE system for the distributive phosphorylation model.

Parameters:
  • y

    array of concentrations

  • t

    time

  • A

    mRNA production rate

  • B

    mRNA degradation rate

  • C

    protein production rate

  • D

    protein degradation rate

  • S_rates

    phosphorylation rates for each site

  • D_rates

    dephosphorylation rates for each site

Returns:
  • dydt

    array of derivatives
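
To make the roles of the arguments concrete, here is a sketch of one plausible mass-action form in which every site is phosphorylated independently from the unphosphorylated pool. The state layout `y = [R, P, P_1, ..., P_n]` and the exact rate terms are assumptions; the equations in `models.distmod` may differ.

```python
def ode_core_sketch(y, t, A, B, C, D, S_rates, D_rates):
    """Right-hand side for y = [R, P, P_1, ..., P_n] (assumed layout)."""
    R, P = y[0], y[1]
    sites = y[2:]

    # mRNA: constant production, first-order degradation.
    dR = A - B * R

    # Unphosphorylated protein: translation and degradation, loss to each
    # site, and return flux from dephosphorylation (assumed to feed P).
    dP = C * R - D * P
    dP -= sum(S_rates) * P
    dP += sum(d * p for d, p in zip(D_rates, sites))

    # Each site is filled directly from P, independently of the others.
    d_sites = [s * P - d * p for s, d, p in zip(S_rates, D_rates, sites)]

    return [dR, dP] + d_sites


# t is unused but kept so the signature matches what ODE solvers expect.
print(ode_core_sketch([1.0, 2.0, 0.5, 0.25], 0.0,
                      1.0, 0.5, 2.0, 0.25, [0.5, 0.25], [1.0, 2.0]))
```

In this form the phosphorylation and dephosphorylation fluxes cancel when summed over all protein species, so total protein changes only through translation and degradation.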

unpack_params(params, num_psites)

Function to unpack the parameters for the distributive ODE system.

Parameters:
  • params (np.array) –

    Parameter vector containing A, B, C, D, S_1, ..., S_n, Ddeg_1, ..., Ddeg_m.

  • num_psites (int) –

    Number of phosphorylation sites.

Returns:
  • A (float) –

    mRNA production rate.

  • B (float) –

    mRNA degradation rate.

  • C (float) –

    protein production rate.

  • D (float) –

    protein degradation rate.

  • S_rates (array) –

    Phosphorylation rates for each site.

  • D_rates (array) –

    Dephosphorylation rates for each site.
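
The unpacking amounts to slicing a flat vector. The sketch below assumes the layout `[A, B, C, D, S_1..S_n, D_1..D_n]` with exactly `num_psites` entries in each rate block; that block length and the length check are assumptions for the example.

```python
def unpack_params_sketch(params, num_psites):
    """Split a flat parameter vector into named pieces (assumed layout)."""
    params = list(params)
    expected = 4 + 2 * num_psites
    if len(params) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(params)}")

    A, B, C, D = params[:4]
    S_rates = params[4:4 + num_psites]             # one rate per site
    D_rates = params[4 + num_psites:expected]      # one rate per site
    return A, B, C, D, S_rates, D_rates


A, B, C, D, S_rates, D_rates = unpack_params_sketch(
    [1.0, 0.1, 2.0, 0.2, 0.5, 0.6, 0.05, 0.06], num_psites=2
)
print(S_rates, D_rates)
```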

solve_ode(params, init_cond, num_psites, t)

Solve the ODE system for the distributive phosphorylation model.

Parameters:
  • params

    array of parameters

  • init_cond

    initial conditions

  • num_psites

    number of phosphorylation sites

  • t

    time points

Returns:
  • sol

    solution of the ODE system

  • P_fitted

    phosphorylated sites

models.randmod

unpack_params(params, num_sites)

Unpack parameters for the Random model.

Parameters:
  • params (array) –

    Parameter vector containing A, B, C, D, S_1, ..., S_n, Ddeg_1, ..., Ddeg_m.

  • num_sites (int) –

    Number of phosphorylation sites.

Returns:
  • A (float) –

    mRNA production rate.

  • B (float) –

    mRNA degradation rate.

  • C (float) –

    protein production rate.

  • D (float) –

    protein degradation rate.

  • S (array) –

    Phosphorylation rates for each site.

  • Ddeg (array) –

    Degradation rates for phosphorylated states.

ode_system(y, t, A, B, C, D, num_sites, S, Ddeg, mono_idx, forward, drop, fcounts, dcounts)

Compute the time derivatives of a random phosphorylation ODE system.

This function supports a large number of phosphorylation states by using precomputed transition indices to optimize speed.

Parameters:
  • y (array) –

    Current state vector [R, P, X_1, ..., X_m].

  • t (float) –

    Time (unused; present for compatibility with ODE solvers).

  • A (float) –

    mRNA production rate.

  • B (float) –

    mRNA degradation rate.

  • C (float) –

    protein production rate.

  • D (float) –

    protein degradation rate.

  • num_sites (int) –

    Number of phosphorylation sites.

  • S (array) –

    Phosphorylation rates for each site.

  • Ddeg (array) –

    Degradation rates for phosphorylated states.

  • mono_idx (array) –

    Precomputed indices for mono-phosphorylated states.

  • forward (array) –

    Forward phosphorylation target states.

  • drop (array) –

    Dephosphorylation target states.

  • fcounts (array) –

    Number of valid forward transitions for each state.

  • dcounts (array) –

    Number of valid dephosphorylation transitions for each state.

Returns:
  • out (array) –

    Derivatives [dR, dP, dX_1, ..., dX_m].

solve_ode(popt, y0, num_sites, t)

Integrate the ODE system for phosphorylation dynamics in the random phosphorylation model.

Parameters:
  • popt (array) –

    Optimized parameter vector [A, B, C, D, S_1, ..., S_n, Ddeg_1, ..., Ddeg_m].

  • y0 (array) –

    Initial condition vector [R0, P0, X1_0, ..., Xm_0].

  • num_sites (int) –

    Number of phosphorylation sites.

  • t (array) –

    Time points to integrate over.

Returns:
  • sol (ndarray) –

    Full ODE solution of shape (len(t), len(y0)).

  • mono (ndarray) –

    1D array of fitted values for R (after OFFSET) and P states.

models.succmod

ode_core(y, t, A, B, C, D, S_rates, D_rates)

The core of the ODE system for the successive ODE model.

Parameters:
  • y (array) –

    The current state of the system.

  • t (float) –

    The current time.

  • A (float) –

    The mRNA production rate.

  • B (float) –

    The mRNA degradation rate.

  • C (float) –

    The protein production rate.

  • D (float) –

    The protein degradation rate.

  • S_rates (array) –

    The phosphorylation rates for each site.

  • D_rates (array) –

    The dephosphorylation rates for each site.

Returns:
  • dydt (np.array) –

    The derivatives of the state variables.

unpack_params(params, num_psites)

Unpacks the parameters for the ODE system. The parameters are expected in the following order: A, B, C, D, S_rates, D_rates, where S_rates and D_rates are arrays of length num_psites.

Parameters:
  • params

    array of parameters

  • num_psites

    number of phosphorylation sites

Returns:
  • A, B, C, D, S_rates, D_rates

    The unpacked parameters as separate variables.

solve_ode(params, init_cond, num_psites, t)

Solve the ODE system using the given parameters and initial conditions. The function integrates the ODE system over time and returns the solution.

Parameters:
  • params
  • init_cond
  • num_psites
  • t

Returns:
  • solution, solution of phosphorylated sites

Steady-State Calculation

steady.initdist

initial_condition(num_psites: int) -> list

Calculates the initial steady-state conditions for a given number of phosphorylation sites for the distributive phosphorylation model.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites in the model.

Returns:
  • list

    A list of steady-state values for the variables [R, P, P_sites].

Raises:
  • ValueError

    If the optimization fails to find a solution for the steady-state conditions.

steady.initrand

initial_condition(num_psites: int) -> list

Calculates the initial steady-state conditions for a given number of phosphorylation sites for the random phosphorylation model.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites in the model.

Returns:
  • list

    A list of steady-state values for the variables [R, P, P_sites].

Raises:
  • ValueError

    If the optimization fails to find a solution for the steady-state conditions.

steady.initsucc

initial_condition(num_psites: int) -> list

Calculates the initial steady-state conditions for a given number of phosphorylation sites for the successive phosphorylation model.

Parameters:
  • num_psites (int) –

    Number of phosphorylation sites in the model.

Returns:
  • list

    A list of steady-state values for the variables [R, P, P_sites].

Raises:
  • ValueError

    If the optimization fails to find a solution for the steady-state conditions.

Plotting

plotting.plotting

Plotter

A class to encapsulate plotting functionalities for ODE model analysis.

Attributes:
  • gene (str) –

    The gene or experiment name.

  • out_dir (str) –

    The directory where plots will be saved.

  • color_palette (list) –

    List of color codes used for plotting.

plot_parallel(solution: np.ndarray, labels: list)

Plots a parallel coordinates plot for the given solution.

Parameters:
  • solution (ndarray) –

    2D numpy array of shape (samples, features) representing the data.

  • labels (list) –

    List of labels for the features in the solution.

pca_components(solution: np.ndarray, target_variance: float = 0.99)

Plots a scree plot showing the explained variance ratio for PCA components.

Parameters:
  • solution (ndarray) –

    2D numpy array of shape (samples, features) representing the data.

  • target_variance (float, default: 0.99 ) –

    The target variance to explain. Defaults to 0.99.
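
To show what a target variance means in practice, the sketch below finds the smallest number of components whose cumulative explained variance ratio reaches the target. The helper name and the idea that the scree plot marks this cut-off are assumptions; the input is simply a list of per-component variance ratios such as a PCA fit would report.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, target_variance=0.99):
    """Return the smallest k whose cumulative variance ratio meets the target."""
    cumulative = accumulate(explained_variance_ratio)
    for k, total in enumerate(cumulative, start=1):
        if total >= target_variance:
            return k
    # Target never reached (e.g. rounding): fall back to all components.
    return len(explained_variance_ratio)


ratios = [0.5, 0.25, 0.125, 0.125]
print(components_for_variance(ratios, target_variance=0.75))
```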

plot_pca(solution: np.ndarray, components: int = 3)

Plots the PCA results for the given solution.

Parameters:
  • solution (ndarray) –

    2D numpy array of shape (samples, features) representing the data.

  • components (int, default: 3 ) –

    Number of PCA components to plot. Defaults to 3.

Returns:
  • tuple

    PCA result and explained variance ratio.

plot_tsne(solution: np.ndarray, perplexity: int = 30)

Plots a t-SNE visualization of the given solution.

Parameters:
  • solution (ndarray) –

    2D numpy array of shape (samples, features) representing the data.

  • perplexity (int, default: 30 ) –

    The perplexity parameter for t-SNE. Defaults to 30.

Returns:
  • np.ndarray: The t-SNE result.

plot_param_series(estimated_params: list, param_names: list, time_points: np.ndarray)

Plots the time series of estimated parameters over the given time points.

Parameters:
  • estimated_params (list) –

    List of estimated parameters.

  • param_names (list) –

    List of parameter names.

  • time_points (ndarray) –

    Array of time points.

plot_profiles(data: pd.DataFrame)

Plots the profiles of estimated parameters over time.

Parameters:
  • data (DataFrame) –

    DataFrame containing the time series data.

plot_model_fit(model_fit: np.ndarray, Pr_data: np.ndarray, P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)

Plots the model fit for mRNA, protein, and phosphorylated species across time.

Parameters:
  • model_fit (ndarray) –

    Flattened model fit data (length = 9 + 14 + 14*num_psites).

  • Pr_data (ndarray) –

    Protein data (14,).

  • P_data (ndarray) –

    Phosphorylation data (num_psites + 2, 14).

  • R_data (ndarray) –

    mRNA data (9,).

  • sol (ndarray) –

    ODE solution array.

  • num_psites (int) –

    Number of phosphorylation sites.

  • psite_labels (list) –

    Labels for phosphorylation sites.

  • time_points (ndarray) –

    Time points (14,).
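
The stated length of `model_fit` (9 + 14 + 14*num_psites) suggests a flat vector made of an mRNA block, a protein block, and one block per phosphorylation site. The sketch below splits such a vector; the block order (mRNA first, then protein, then sites) is an assumption, as is the helper name.

```python
def split_model_fit(model_fit, num_psites, n_rna=9, n_prot=14):
    """Split a flattened fit into mRNA, protein, and per-site series."""
    model_fit = list(model_fit)
    expected = n_rna + n_prot + n_prot * num_psites
    if len(model_fit) != expected:
        raise ValueError(f"expected length {expected}, got {len(model_fit)}")

    rna = model_fit[:n_rna]
    protein = model_fit[n_rna:n_rna + n_prot]
    rest = model_fit[n_rna + n_prot:]
    # One block of n_prot time points per phosphorylation site.
    psites = [rest[i * n_prot:(i + 1) * n_prot] for i in range(num_psites)]
    return rna, protein, psites


rna, protein, psites = split_model_fit(range(9 + 14 + 28), num_psites=2)
print(len(rna), len(protein), [len(p) for p in psites])
```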

plot_param_scatter(est_arr: np.ndarray, num_psites: int, time_vals: np.ndarray)

Plots scatter and density plots for parameters.

Parameters:
  • est_arr (ndarray) –

    2D numpy array of estimated parameters.

  • num_psites (int) –

    Number of phosphorylation sites.

  • time_vals (ndarray) –

    Array of time values.

plot_heatmap(param_value_df: pd.DataFrame)

Parameters:
  • param_value_df (DataFrame) –

    DataFrame containing parameter values with 'Protein' as one of the columns.

plot_error_distribution(error_df: pd.DataFrame)

Parameters:
  • error_df (DataFrame) –

    DataFrame containing errors with 'MAE' as one of the columns.

plot_gof(merged_data: pd.DataFrame)

Plot the goodness of fit for the model.

Parameters:
  • merged_data (DataFrame) –

    Dataframe containing merged data.

plot_kld(merged_data: pd.DataFrame)

Plots the Kullback-Leibler divergence for the model.

Parameters:
  • merged_data (DataFrame) –

    Dataframe containing merged data.

plot_params_bar(ci_results: dict, param_labels: list = None)

Plots a bar plot of the estimated parameters with 95% confidence intervals.

Parameters:
  • ci_results (dict) –

    Dictionary containing the results of the confidence intervals.

  • param_labels (list, default: None ) –

    List of parameter labels. Defaults to None.

plot_knockouts(results_dict: dict, num_psites: int, psite_labels: list)

Plot wild-type and knockout simulation results for comparison.

Parameters:
  • results_dict (dict) –

    Dictionary containing simulation results.

  • num_psites (int) –

    Number of phosphorylation sites.

  • psite_labels (list) –

    List of phosphorylation site labels.

plot_top_param_pairs(excel_path: str)

For each gene's '_perturbations' sheet in the Excel file, plot scatter plots for the parameter pairs with correlation.

Parameters:
  • excel_path (str) –

    Path to the Excel file.

plot_model_perturbations(problem: dict, Si: dict, cutoff_idx: int, time_points: np.ndarray, n_sites: int, best_model_psite_solutions: np.ndarray, best_mrna_solutions: np.ndarray, best_protein_solutions: np.ndarray, psite_labels: list[str], protein_data_ref: np.ndarray, psite_data_ref: np.ndarray, rna_ref: np.ndarray, model_fit_sol: np.ndarray) -> None

Plot the best model perturbations for the given data.

Parameters:
  • problem (dict) –

    The optimization problem.

  • Si (dict) –

    The sensitivity indices.

  • cutoff_idx (int) –

    The cutoff index for the time points.

  • time_points (ndarray) –

    The time points for the data.

  • n_sites (int) –

    The number of phosphorylation sites.

  • best_model_psite_solutions (ndarray) –

    The best model phosphorylation site solutions.

  • best_mrna_solutions (ndarray) –

    The best model mRNA solutions.

  • best_protein_solutions (ndarray) –

    The best model protein solutions.

  • protein_data_ref (ndarray) –

    The reference data for the protein.

  • psite_labels (list[str]) –

    The labels for the phosphorylation sites.

  • psite_data_ref (ndarray) –

    The reference data for the phosphorylation sites.

  • rna_ref (ndarray) –

    The reference data for mRNA.

plot_time_state_grid(samples: np.ndarray, time_points: np.ndarray, state_names: list)

Grid of strip plots per state showing variability across time.

Parameters:
  • samples (ndarray) –

    shape (n_samples, n_timepoints, n_states)

  • time_points (ndarray) –

    array of time points

  • state_names (list) –

    list of state names

plot_phase_space(samples: np.ndarray, state_names: list)

Phase space plots: one state vs another for each simulation.

Parameters:
  • samples (ndarray) –

    Shape (n_samples, n_timepoints, n_states)

  • state_names (list) –

    List of state names (length = num_states)

plot_future_fit(P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)

Plots the model fit for the future time points.

Parameters:
  • P_data (ndarray) –

    Data for phosphorylation sites.

  • R_data (ndarray) –

    Data for mRNA.

  • sol (ndarray) –

    Model solution.

  • num_psites (int) –

    Number of phosphorylation sites.

  • psite_labels (list) –

    Labels for phosphorylation sites.

  • time_points (ndarray) –

    Time points for the data.

plot_regularization(excel_path: str)

Read every '_params' sheet in the Excel file, pull the Regularization value, and plot a horizontal bar chart of regularization vs. gene.

Parameters:
  • excel_path (str) –

    Path to the Excel file.

plot_model_error(excel_path: str)

Read every '_params' sheet in the Excel file, pull the RMSE value, and plot a horizontal bar chart of RMSE vs. gene.

Parameters:
  • excel_path (str) –

    Path to the Excel file.

Utility Functions

utils.display

ensure_output_directory(directory)

Ensure the output directory exists. If it doesn't, create it.

Parameters:
  • directory (str) –

    Path to the output directory.
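
A minimal sketch of the create-if-missing behaviour using the standard library; the function name is hypothetical and the real helper may log or validate the path as well.

```python
import os


def ensure_output_directory_sketch(directory):
    """Create the directory (and any missing parents) if it does not exist."""
    # exist_ok=True makes the call safe to repeat on an existing directory.
    os.makedirs(directory, exist_ok=True)
```

Because of `exist_ok=True`, calling it before every write is harmless.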

load_data(excel_file, sheet='Estimated Values')

Load data from an Excel file. The default sheet is "Estimated Values".

Parameters:
  • excel_file (str) –

    Path to the Excel file.

  • sheet (str, default: 'Estimated Values' ) –

    Name of the sheet to load. Default is "Estimated Values".

Returns:
  • pd.DataFrame: DataFrame containing the data from the specified sheet.

format_duration(seconds)

Format a duration in seconds into a human-readable string.

Parameters:
  • seconds (float) –

    Duration in seconds.

Returns:
  • str

    Formatted duration string.
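
The output format is not specified above. As an illustration only, the sketch below assumes an `"1h 2m 5s"` style and rounds to whole seconds.

```python
def format_duration_sketch(seconds):
    """Format seconds as e.g. '1h 2m 5s' (format assumed for illustration)."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


print(format_duration_sketch(3725))
```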

merge_obs_est(filename)

Function to merge observed and estimated data from an Excel file.

Parameters:
  • filename (str) –

    Path to the Excel file containing observed and estimated data.

Returns:
  • pd.DataFrame: Merged DataFrame containing observed and estimated values for each gene and Psite.

save_result(results, excel_filename)

Function to save results to an Excel file.

Parameters:
  • results (list) –

    List of dictionaries containing results for each gene.

  • excel_filename (str) –

    Path to the output Excel file.

create_report(results_dir: str, output_file: str = f'{model_type}_report.html')

Creates a single global report HTML file from all gene folders inside the results directory.

Parameters:
  • results_dir (str) –

    Path to the root results directory.

  • output_file (str, default: f'{model_type}_report.html' ) –

    Name of the generated global report file (placed inside results_dir).

organize_output_files(directories: Iterable[Union[str, Path]])

Organize output files into protein-specific folders and a general folder.

Parameters:
  • directories (Iterable[Union[str, Path]]) –

    List of directories to organize.

utils.tables

generate_tables(xlsx_file_path)

Generate hierarchical tables from the XLSX file containing alpha and beta values.

Parameters:
  • xlsx_file_path (str) –

    Path to the XLSX file containing alpha and beta values.

Returns:
  • tuple

    A tuple containing the protein, psite, and the corresponding table.

save_tables(tables, output_dir)

Save the generated tables as LaTeX and CSV files.

Parameters:
  • tables (list) –

    List of tuples containing protein, psite, and the corresponding table.

  • output_dir (str) –

    Directory to save the LaTeX and CSV files.

save_master_table(folder='latex', output_file='latex/all_tables.tex')

Save a master LaTeX file that includes all individual LaTeX files from the specified folder.

Parameters:
  • folder (str, default: 'latex' ) –

    The folder containing the individual LaTeX files.

  • output_file (str, default: 'latex/all_tables.tex' ) –

    The name of the master LaTeX file to be created.
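
One way to build such a master file is to emit an `\input{...}` line per table file. The sketch below does that; the use of `\input`, the alphabetical ordering, and skipping the master file itself are assumptions, not documented behaviour.

```python
import os


def save_master_table_sketch(folder="latex", output_file="latex/all_tables.tex"):
    """Write a LaTeX file that inputs every .tex file found in `folder`."""
    master_name = os.path.basename(output_file)
    tex_files = sorted(
        name for name in os.listdir(folder)
        if name.endswith(".tex") and name != master_name  # skip the master
    )
    with open(output_file, "w", encoding="utf-8") as handle:
        for name in tex_files:
            # Forward slashes keep the path valid for LaTeX on any platform.
            handle.write(f"\\input{{{folder}/{name}}}\n")
    return tex_files
```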

utils.latexit

generate_latex_table(df, sheet_name)

Generate LaTeX code for a table from a DataFrame.

Parameters:
  • df (pd.DataFrame) –

    DataFrame to convert to LaTeX.

  • sheet_name (str) –

    Name of the sheet for caption and label.

Returns:
  • str

    LaTeX code for the table.

generate_latex_image(image_filename)

Generate LaTeX code for an image.

Parameters:
  • image_filename (str) –

    Path to the image file.

Returns:
  • str

    LaTeX code for the image.
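
For orientation, a sketch of the kind of snippet such a function might return. The `figure` environment, placement option, and width are assumptions chosen for the example; the actual output may include a caption or label.

```python
def generate_latex_image_sketch(image_filename):
    """Return a LaTeX figure snippet that includes the given image."""
    return "\n".join([
        "\\begin{figure}[htbp]",
        "  \\centering",
        f"  \\includegraphics[width=\\textwidth]{{{image_filename}}}",
        "\\end{figure}",
    ])


print(generate_latex_image_sketch("plots/fit.png"))
```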

main(input_dir)

Main function to process Excel and PNG files in the input directory and generate LaTeX code.

Parameters:
  • input_dir (str) –

    Directory containing Excel and PNG files.