API Reference
Data Standardization & Cleanup
processing.cleanup
process_collecttri()
Processes the CollecTRI file to clean and filter mRNA-TF interactions. Removes complex interactions, filters by target genes, and saves the result.
format_site(site)
Formats a phosphorylation site string.
If the input is NaN or an empty string, returns an empty string. If the input contains an underscore ('_'), splits the string into two parts, converts the first part to uppercase, and appends the second part unchanged. Otherwise, converts the entire string to uppercase.
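The rules above can be sketched as follows. This is only an illustration of the described behaviour, not the package's code: the name format_site_sketch is hypothetical, and keeping the underscore between the two parts and splitting on the first underscore only are assumptions.

```python
import math


def format_site_sketch(site):
    """Illustrative version of the formatting rules described above."""
    # NaN or an empty string means "no site".
    if isinstance(site, float) and math.isnan(site):
        return ""
    if site == "":
        return ""
    text = str(site)
    if "_" in text:
        # Split once: uppercase the first part, keep the second part as-is.
        head, tail = text.split("_", 1)
        return f"{head.upper()}_{tail}"  # assumption: the underscore is kept
    return text.upper()
```

For example, format_site_sketch("s12_a") yields "S12_a" and format_site_sketch("y45") yields "Y45".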
process_msgauss()
Processes the MS Gaussian data file to generate time series data.
process_msgauss_std()
Processes the MS Gaussian data file to compute transformed means and standard deviations.
process_routlimma()
Processes the Rout Limma table to generate time series data for mRNA.
update_gene_symbols(filename)
Updates the GeneID column in a CSV file by mapping GeneIDs to gene/protein symbols.
move_processed_files()
Moves or copies processed files to their respective directories.
Optimization Results Mapping
processing.map
map_optimization_results(tf_file_path, kin_file_path, sheet_name='Alpha Values')
Reads the TF-mRNA optimization results from an Excel file and maps mRNA to each TF.
create_cytoscape_table(mapping_csv_path)
Creates a Cytoscape-compatible edge table from a mapping file.
add_kinetic_strength_columns(mapping_path, mapping__path, excel_path, suffix)
Adds kinetic strength columns to the mapping files based on the provided Excel file.
generate_nodes(edge_df)
Infers node types and aggregates all phosphorylation sites per target node from phosphorylation edges.
Kinase-Phosphorylation Optimization
Evolutionary Algorithms
kinopt.evol.config.constants
kinopt.evol.config.logconf
ColoredFormatter
Bases: Formatter
format(record)
Format the log record with ANSI color codes and elapsed time.
Returns: str: The formatted log message with ANSI color codes.
remove_ansi(s)
staticmethod
Remove ANSI escape codes from a string.
Returns: str: The string without ANSI escape codes.
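A minimal sketch of stripping ANSI escape codes with a regular expression. The name remove_ansi_sketch and the pattern are illustrative assumptions; the pattern used by the actual staticmethod may differ.

```python
import re

# Matches CSI sequences such as "\x1b[31m" (colour) or "\x1b[0m" (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def remove_ansi_sketch(s):
    """Return the string with ANSI CSI escape sequences removed."""
    return ANSI_CSI.sub("", s)
```

This kind of helper is typically used so that log files receive plain text while the console keeps its colours.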
setup_logger(name='phoskintime', log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5)
Function to set up a logger with both file and console handlers.
kinopt.evol.exporter.plotout
plot_residuals_for_gene(gene, gene_data)
Generates and saves combined residual-related plots for one gene with all psites in the legend.
opt_analyze_nsga(problem, result, F, pairs, approx_ideal, approx_nadir, asf_i, pseudo_i, n_evals, hv, hist, val, hist_cv_avg, k, igd, best_objectives, waterfall_df, convergence_df, alpha_values, beta_values)
Function to generate and save various plots related to optimization results.
opt_analyze_de(long_df, convergence_df, ordered_optimizer_runs, x_values, y_values, val)
Function to generate and save various plots related to optimization results.
kinopt.evol.exporter.sheetutils
output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, timepoints, OUT_FILE)
Function to output results to an Excel file.
kinopt.evol.objfn.minfndiffevo
PhosphorylationOptimizationProblem
Bases: ElementwiseProblem
Custom optimization problem for phosphorylation analysis.
Defines the constraints, bounds, and objective function for optimizing alpha and beta parameters across gene-psite-kinase relationships.
__init__(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, **kwargs)
Initializes the optimization problem with given data and constraints.
objective_function(params)
Computes the loss value for the given parameters using the selected loss type.
kinopt.evol.objfn.minfnnsgaii
PhosphorylationOptimizationProblem
Bases: ElementwiseProblem
Multi-objective optimization problem for phosphorylation analysis.
Objectives:
- Minimize sum of squared residuals (main objective).
- Minimize violations of constraints for alpha (secondary objective).
- Minimize violations of constraints for beta (tertiary objective).
__init__(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, **kwargs)
Initializes the multi-objective optimization problem.
objective_function(params)
Computes the loss value for the given parameters using the selected loss type.
kinopt.evol.opt.optrun
run_optimization(P_initial, P_initial_array, K_index, K_array, gene_psite_counts, beta_counts, PhosphorylationOptimizationProblem)
Sets up and runs the multi-objective optimization problem for phosphorylation using an NSGA2 algorithm and a thread pool for parallelization.
post_optimization_nsga(result, weights=np.array([1.0, 1.0, 1.0]), ref_point=np.array([3, 1, 1]))
Post-processes the result of a multi-objective optimization run.
post_optimization_de(result, alpha_values, beta_values)
Post-processes the result of a differential evolution (DE) optimization run.
kinopt.evol.optcon.construct
pipeline(input1_path: str, input2_path: str, time_series_columns: list[str], scaling_method: str, split_point: float, segment_points: list[float], estimate_missing_kinases: bool, kinase_to_psites: dict[str, int])
Function to run the entire pipeline for loading and processing data.
load_geneid_to_psites(input1_path=INPUT1)
Function to load geneid to psite mapping from input1.csv.
Args:
input1_path (str): Path to the first CSV file (HGNC data).
Returns:
geneid_psite_map (dict): Dictionary mapping gene IDs to sets of psites.
get_unique_kinases(input2_path=INPUT2)
Function to extract unique kinases from input2.csv.
Args:
input2_path (str): Path to the second CSV file (kinase interactions).
Returns:
kinases (set): Set of unique kinases extracted from the input2 file.
check_kinases()
Function to check if kinases from input2.csv are present in input1.csv.
kinopt.evol.utils.iodata
format_duration(seconds)
Returns a formatted string representing the duration in seconds, minutes, or hours.
Returns: str: The formatted duration string.
load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)
Function to load and scale data from CSV files.
apply_scaling(df, time_series_columns, method, split_point, segment_points)
Function to apply different scaling methods to time-series data in a DataFrame.
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
organize_output_files(*directories)
Function to organize output files into protein-specific folders and a general folder.
kinopt.evol.utils.params
extract_parameters(P_initial, gene_psite_counts, K_index, optimized_params)
Function to extract alpha and beta values from the optimized parameters.
compute_metrics(optimized_params: np.ndarray, P_initial: dict, P_initial_array: np.ndarray, K_index: dict, K_array: np.ndarray, gene_psite_counts: list, beta_counts: dict, n: int)
Function to compute error metrics for the estimated series.
Gradient-Based Algorithms
kinopt.local.config.constants
parse_args()
Parses command-line arguments for the optimization script. This function uses argparse to handle various parameters related to the optimization process. The parameters include bounds for the optimization, loss function types, estimation of missing kinases, scaling methods for time-series data, and the optimization method to be used.
:return: A tuple containing the parsed arguments:
- lower_bound (float): Lower bound for the optimization.
- upper_bound (float): Upper bound for the optimization.
- loss_type (str): Type of loss function to use.
- estimate_missing (bool): Whether to estimate missing kinase-psite values.
- scaling_method (str): Method for scaling time-series data.
- split_point (int): Split point for temporal scaling.
- segment_points (list of int): Segment points for segmented scaling.
- method (str): Optimization method to use.
kinopt.local.config.logconf
kinopt.local.exporter.plotout
plot_fits_for_gene(gene, gene_data, real_timepoints)
Function to plot the observed and estimated phosphorylation levels for each psite of a gene.
plot_cumulative_residuals(gene, gene_data, real_timepoints)
Function to plot the cumulative residuals for each psite of a gene.
plot_autocorrelation_residuals(gene, gene_data, real_timepoints)
Function to plot the autocorrelation of residuals for each psite of a gene.
plot_histogram_residuals(gene, gene_data, real_timepoints)
Function to plot histograms of residuals for each psite of a gene.
plot_qqplot_residuals(gene, gene_data, real_timepoints)
Function to plot QQ plots of residuals for each psite of a gene.
kinopt.local.exporter.sheetutils
plot_fits_for_gene(gene, gene_data, real_timepoints)
Function to plot the observed and estimated phosphorylation levels for each psite of a gene.
plot_cumulative_residuals(gene, gene_data, real_timepoints)
Function to plot the cumulative residuals for each psite of a gene.
plot_autocorrelation_residuals(gene, gene_data, real_timepoints)
Function to plot the autocorrelation of residuals for each psite of a gene.
plot_histogram_residuals(gene, gene_data, real_timepoints)
Function to plot histograms of residuals for each psite of a gene.
plot_qqplot_residuals(gene, gene_data, real_timepoints)
Function to plot QQ plots of residuals for each psite of a gene.
output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, mse, rmse, mae, mape, r_squared)
Function to output the results of the optimization process.
kinopt.local.objfn.minfn
kinopt.local.opt.optrun
run_optimization(obj_fun, params_initial, opt_method, bounds, constraints)
Run optimization using the specified method.
kinopt.local.optcon.construct
load_geneid_to_psites(input1_path=INPUT1)
Load the geneid to psite mapping from a CSV file.
Returns: defaultdict: A dictionary mapping geneid to a set of psites.
get_unique_kinases(input2_path=INPUT2)
Extract unique kinases from the input CSV file.
Returns: set: A set of unique kinases.
check_kinases()
Check if kinases in input2.csv are present in input1.csv and log the results.
kinopt.local.utils.iodata
format_duration(seconds)
Formats a duration in seconds into a human-readable string.
- If less than 60 seconds, returns the duration in seconds.
- If less than 3600 seconds, returns the duration in minutes.
- Otherwise (3600 seconds or more), returns the duration in hours.
:param seconds: :return: Formatted string
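The thresholds described for format_duration can be sketched as below. The function name format_duration_sketch, the two-decimal formatting, and the unit labels ("sec", "min", "hr") are assumptions for illustration; the package's exact output string is not documented here.

```python
def format_duration_sketch(seconds):
    """Pick a unit based on the size of the duration."""
    if seconds < 60:
        return f"{seconds:.2f} sec"
    if seconds < 3600:
        return f"{seconds / 60:.2f} min"
    # 3600 seconds or more is reported in hours.
    return f"{seconds / 3600:.2f} hr"
```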
load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)
Load and scale the data from the specified input files.
:param estimate_missing: :param scaling_method: :param split_point: :param seg_points: :return: Time series data, interaction data, observed data
apply_scaling(df, cols, method, split_point, seg_points)
Apply scaling to the specified columns of a DataFrame based on the given method. The scaling methods include:
- 'min_max': Min-Max scaling
- 'log': Logarithmic scaling
- 'temporal': Temporal scaling (two segments)
- 'segmented': Segmented scaling (multiple segments)
- 'slope': Slope scaling
- 'cumulative': Cumulative scaling
:param df: :param cols: :param method: :param split_point: :param seg_points: :return: df
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
For each gene folder (e.g. "ABL2"), the report will include:
- All PNG plots and interactive HTML plots displayed in a grid with three plots per row.
- Each plot is confined to a fixed size of 900px by 900px.
- Data tables from XLSX or CSV files in the gene folder are displayed below the plots, one per row.
organize_output_files(*directories)
Function to organize output files into protein-specific folders. It moves files matching the pattern 'protein_name_*.{json,svg,png,html,csv,xlsx}' into a folder named after the protein (e.g., 'ABL2') and moves all other files into a 'General' folder within the same directory.
:param directories:
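A minimal sketch of the file organization described for organize_output_files, using only the standard library. The name organize_sketch and the rule that the protein name is the alphanumeric prefix before the first underscore are assumptions; the package may detect protein names differently.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumption: protein name = alphanumeric prefix before the first underscore.
PROTEIN_FILE = re.compile(r"^([A-Za-z0-9]+)_.*\.(json|svg|png|html|csv|xlsx)$")


def organize_sketch(*directories):
    """Move protein-prefixed files into per-protein folders, the rest into 'General'."""
    for directory in directories:
        root = Path(directory)
        # Snapshot the files first because folders are created while moving.
        files = [p for p in root.iterdir() if p.is_file()]
        for path in files:
            match = PROTEIN_FILE.match(path.name)
            folder = match.group(1) if match else "General"
            target = root / folder
            target.mkdir(exist_ok=True)
            shutil.move(str(path), str(target / path.name))


# Usage on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ABL2_fit.png").write_text("plot")
    Path(tmp, "summary.txt").write_text("notes")
    organize_sketch(tmp)
    demo_layout = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
```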
kinopt.local.utils.params
extract_parameters(P_initial, gene_kinase_counts, total_alpha, unique_kinases, K_index, optimized_params)
Extracts the alpha and beta parameters from the optimized parameters.
:param P_initial: :param gene_kinase_counts: :param total_alpha: :param unique_kinases: :param K_index: :param optimized_params: :return: Alpha and beta values as dictionaries
compute_metrics(optimized_params, P_init_dense, t_max, gene_alpha_starts, gene_kinase_counts, gene_kinase_idx, total_alpha, kinase_beta_starts, kinase_beta_counts, K_data, K_indices, K_indptr)
Computes the estimated series and various metrics based on the optimized parameters.
:param optimized_params: :param P_init_dense: :param t_max: :param gene_alpha_starts: :param gene_kinase_counts: :param gene_kinase_idx: :param total_alpha: :param kinase_beta_starts: :param kinase_beta_counts: :param K_data: :param K_indices: :param K_indptr: :return: Estimated series, residuals, MSE, RMSE, MAE, MAPE, R-squared
Fitting Analysis & Feasibility
kinopt.fitanalysis.helpers.postfit
goodnessoffit(estimated, observed)
Function to plot the goodness of fit and Kullback-Leibler divergence for estimated and observed values.
reshape_alpha_beta(alpha_values, beta_values)
Function to reshape alpha and beta values for plotting.
Returns: pd.DataFrame: Reshaped DataFrame containing both alpha and beta values.
perform_pca(df)
Function to perform PCA analysis on the given DataFrame.
plot_pca(result_df_sorted, y_axis_column)
Plot PCA or t-SNE results for each gene/psite. The function creates scatter plots with different markers for alpha and beta parameters, and adds labels for each point. The function also adjusts text labels to avoid overlap using the adjustText library.
:param result_df_sorted: DataFrame containing PCA or t-SNE results. :param y_axis_column: Column name for the y-axis values in the plot.
perform_tsne(scaled_data, df)
Perform t-SNE analysis on the given scaled data. The function returns a DataFrame with t-SNE results and additional columns for type and gene/psite information.
:param scaled_data: :param df:
:return: - pd.DataFrame: DataFrame with t-SNE results and additional columns.
additional_plots(df, scaled_data, alpha_values, beta_values, residuals_df)
Function to create additional plots including CDF, KDE, Boxplot, and Hierarchical Clustering.
:param df: :param scaled_data: :param alpha_values: :param beta_values: :param residuals_df:
create_sankey_from_network(output_dir, data, title)
Creates a Sankey diagram from the given data and saves it as an HTML file.
This function processes the input data to generate nodes and links for a Sankey diagram. It assigns colors to nodes and links based on their attributes and values, and uses Plotly to render the diagram. The resulting diagram is saved as an HTML file in the specified output directory.
:param output_dir: str. The directory where the Sankey diagram HTML file will be saved.
:param data: pd.DataFrame. A DataFrame containing the data for the Sankey diagram. It must include the following columns:
- 'Source': The source node of the link.
- 'Target': The target node of the link.
- 'Value': The value of the link, which determines the flow size.
:param title: str. The title of the Sankey diagram.
The function performs the following steps:
1. Initializes nodes and links for the Sankey diagram.
2. Maps node labels to indices and assigns colors to nodes.
3. Processes the data to create links between nodes, assigning colors based on link values.
4. Builds the Sankey diagram using Plotly.
5. Adds a color bar to explain the flow gradient.
6. Saves the Sankey diagram as an HTML file in the specified output directory.
important_connections(output_dir, data, top_n=20)
Extracts the top N most important connections based on their absolute values and saves them to a CSV file.
:param output_dir: str. The directory where the CSV file will be saved.
:param data: pd.DataFrame. A DataFrame containing the connections with columns 'Source', 'Target', and 'Value'.
:param top_n: int, optional. The number of top connections to extract (default is 20).
The function sorts the connections by their absolute values in descending order, selects the top N connections, and saves them to a CSV file named 'top_connections.csv' in the specified output directory.
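The ranking step can be sketched with the standard library alone. This sketch takes a list of dictionaries instead of a DataFrame, and the name top_connections_sketch is hypothetical; only the column names, the absolute-value ordering, and the output file name come from the description above.

```python
import csv
import tempfile
from pathlib import Path


def top_connections_sketch(output_dir, rows, top_n=20):
    """Keep the top_n rows with the largest |Value| and write them to CSV."""
    ranked = sorted(rows, key=lambda r: abs(float(r["Value"])), reverse=True)
    top = ranked[:top_n]
    out_path = Path(output_dir) / "top_connections.csv"
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Source", "Target", "Value"])
        writer.writeheader()
        writer.writerows(top)
    return top


# Usage with made-up edges, written to a temporary directory.
demo_rows = [
    {"Source": "A", "Target": "B", "Value": 0.2},
    {"Source": "A", "Target": "C", "Value": -0.9},
    {"Source": "B", "Target": "C", "Value": 0.5},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_top = top_connections_sketch(tmp, demo_rows, top_n=2)
    demo_written = (Path(tmp) / "top_connections.csv").exists()
```

Sorting on the absolute value keeps strong negative connections alongside strong positive ones.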
kinopt.optimality.KKT
generate_latex_table(summary_dict, table_caption, table=None)
Function to generate a LaTeX table from a summary dictionary.
print_primal_feasibility_results(primal_summary, alpha_violations, beta_violations, logger_obj=None)
Logs the primal feasibility summary and violation details.
print_sensitivity_and_active_constraints(sensitivity_summary, active_constraints_summary, logger_obj=None)
Logs the sensitivity summary and active constraints summary.
plot_constraint_violations(alpha_violations, beta_violations, out_dir)
Function to plot constraint violations for alpha and beta values. It creates a stacked bar plot showing the violations for each protein. The top 5 proteins with the highest violations are highlighted in red.
plot_sensitivity_analysis(sensitivity_analysis, out_dir)
Function to plot sensitivity analysis results. It creates a horizontal bar plot showing the mean, max, and min sensitivity for each protein.
process_excel_results(file_path=OUT_FILE)
Function to process the Excel results file. It reads the alpha and beta values, estimated and observed values, validates normalization constraints, computes residuals and gradients, and generates LaTeX tables for the residuals and sensitivity summaries. It also performs sensitivity analysis and identifies high sensitivity sites. The results are returned as a dictionary.
Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.
post_optimization_results()
Function to process and visualize the results of the optimization.
Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.
TF-mRNA Optimization
Evolutionary Algorithms
tfopt.evol.config.constants
parse_args()
Parse command line arguments for the PhosKinTime optimization problem. This function uses argparse to handle input parameters for the optimization process. The parameters include:
- lower_bound: Lower bound for the optimization variables (default: -2).
- upper_bound: Upper bound for the optimization variables (default: 2).
- loss_type: Type of loss function to use (default: 0). Options: 0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov.
- optimizer: Global evolutionary optimization method (default: 0). Options: 0: NSGA2, 1: SMSEMOA, 2: AGEMOEA.
:returns: A tuple of:
- lower_bound: Lower bound for the optimization variables.
- upper_bound: Upper bound for the optimization variables.
- loss_type: Type of loss function to use.
- optimizer: Global evolutionary optimization method.
:rtype: tuple
:raises argparse.ArgumentError: If an invalid argument is provided.
:raises SystemExit: If the script is run with invalid arguments.
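To make the loss_type codes more concrete, here is a sketch of the first five options applied to a list of residuals. It is not the package's implementation: the name loss_sketch is hypothetical, the soft L1, Cauchy, and arctan forms follow the conventions SciPy uses for robust losses in least_squares, and averaging over residuals is an assumption. Elastic Net (5) and Tikhonov (6) add penalties on the parameters and are left out.

```python
import math


def loss_sketch(residuals, loss_type=0):
    """Mean loss over residuals for loss_type codes 0-4 (illustrative only)."""
    n = len(residuals)
    squared = [r * r for r in residuals]
    if loss_type == 0:  # MSE
        return sum(squared) / n
    if loss_type == 1:  # MAE
        return sum(abs(r) for r in residuals) / n
    if loss_type == 2:  # soft L1: 2 * (sqrt(1 + z) - 1), with z = r^2
        return sum(2.0 * (math.sqrt(1.0 + z) - 1.0) for z in squared) / n
    if loss_type == 3:  # Cauchy: ln(1 + z)
        return sum(math.log1p(z) for z in squared) / n
    if loss_type == 4:  # arctan: arctan(z)
        return sum(math.atan(z) for z in squared) / n
    raise ValueError("loss types 5 and 6 need regularization terms not sketched here")
```

The robust variants (2 to 4) grow more slowly than the squared error for large residuals, which reduces the influence of outlying time points.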
tfopt.evol.config.logconf
tfopt.evol.exporter.plotout
plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)
Plot the estimated vs observed expression levels for a set of genes.
compute_predictions(x, regulators, protein_mat, psite_tensor, n_reg, T_use, n_mRNA, beta_start_indices, num_psites)
Compute the predicted expression levels based on the optimization variables.
tfopt.evol.exporter.sheetutils
save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)
Save the optimization results to an Excel file.
tfopt.evol.objfn.minfn
TFOptimizationMultiObjectiveProblem
Bases: Problem
Originally implemented by Julius Normann.
This version has been modified and optimized for consistency & speed in submodules by Abhinav Mishra.
Multi-objective optimization problem for TF optimization.
This class defines a multi-objective optimization problem for the transcription factor (TF) optimization problem. It inherits from the Problem class in the pymoo library. The problem is defined with three objectives: f1 (error), f2 (alpha violation), and f3 (beta violation).
__init__(n_var, n_mRNA, n_TF, n_reg, n_psite_max, n_alpha, mRNA_mat, regulators, protein_mat, psite_tensor, T_use, beta_start_indices, num_psites, no_psite_tf, xl=None, xu=None, **kwargs)
Initialize the multi-objective optimization problem.
objective_(x, mRNA_mat, regulators, protein_mat, psite_tensor, n_reg, T_use, n_mRNA, beta_start_indices, num_psites, loss_type, lam1=0.001, lam2=0.001)
Computes a loss value for transcription factor optimization using evolutionary algorithms.
tfopt.evol.opt.optrun
run_optimization(problem, total_dim, optimizer)
Run the optimization using the specified algorithm and problem.
tfopt.evol.optcon.construct
build_fixed_arrays(mRNA_ids, mRNA_mat, TF_ids, protein_dict, psite_dict, psite_labels_dict, reg_map)
Builds fixed-shape arrays from the input data.
Returns:
- mRNA_mat (np.ndarray): Matrix of mRNA expression levels.
- regulators (np.ndarray): Matrix of regulators for each mRNA.
- protein_mat (np.ndarray): Matrix of TF protein levels.
- psite_tensor (np.ndarray): Tensor of phosphorylation sites.
- n_reg (int): Number of regulators.
- n_psite_max (int): Maximum number of phosphorylation sites across all TFs.
- psite_labels_arr (list): List of phosphorylation site labels for each TF.
- num_psites (np.ndarray): Array indicating the number of phosphorylation sites for each TF.
tfopt.evol.optcon.filter
load_raw_data()
Load raw data from files.
filter_mrna(mRNA_ids, mRNA_mat, reg_map)
Filter mRNA genes to only those with regulators present in the regulation map.
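The filtering idea behind filter_mrna can be sketched with plain lists. The name filter_mrna_sketch is hypothetical, and treating a gene with an empty regulator list as "no regulators" is an assumption; the real function works on a NumPy matrix.

```python
def filter_mrna_sketch(mrna_ids, mrna_rows, reg_map):
    """Keep only genes that have at least one regulator in reg_map."""
    # reg_map.get(gene) is falsy for missing genes and for empty lists.
    keep = [i for i, gene in enumerate(mrna_ids) if reg_map.get(gene)]
    kept_ids = [mrna_ids[i] for i in keep]
    kept_rows = [mrna_rows[i] for i in keep]
    return kept_ids, kept_rows
```

Filtering ids and rows with the same index list keeps the expression rows aligned with their gene identifiers.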
update_regulations(mRNA_ids, reg_map, TF_ids)
Update the regulation map to only include relevant transcription factors.
filter_TF(TF_ids, protein_dict, psite_dict, psite_labels_dict, relevant_TFs)
Filter transcription factors to only those present in the relevant_TFs set.
determine_T_use(mRNA_mat, TF_time_cols)
Determine the number of time points to use for the analysis.
tfopt.evol.utils.iodata
load_mRNA_data(filename=INPUT3)
Load mRNA data from a CSV file.
Returns:
- mRNA_ids: List of mRNA gene identifiers (strings).
- mRNA_mat: Matrix of mRNA expression data (numpy array).
- time_cols: List of time columns (excluding "GeneID").
load_TF_data(filename=INPUT1)
Load TF data from a CSV file.
Returns:
- TF_ids: List of TF identifiers (strings).
- protein_dict: Dictionary mapping TF identifiers to their protein data (numpy array).
- psite_dict: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays).
- psite_labels_dict: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings).
- time_cols: List of time columns (excluding "GeneID" and "Psite").
load_regulation(filename=INPUT4)
Load regulation data from a CSV file.
Returns: - reg_map: Dictionary mapping mRNA genes to their regulators (list of TF identifiers).
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
organize_output_files(*directories)
Organizes output files from multiple directories into separate folders for each protein.
format_duration(seconds)
Format a duration in seconds into a human-readable string.
Returns: str: Formatted duration string.
tfopt.evol.utils.params
create_no_psite_array(n_TF, num_psites, psite_labels_arr)
Create an array indicating whether each TF has no phosphorylation sites.
compute_beta_indices(num_psites, n_TF)
Compute the starting indices for the beta parameters for each TF.
create_initial_guess(n_mRNA, n_reg, n_TF, num_psites, no_psite_tf)
Create the initial guess for the optimization variables.
create_bounds(n_alpha, n_beta_total, lb, ub)
Create the lower and upper bounds for the optimization variables.
get_parallel_runner()
Get a parallel runner for multi-threading.
extract_best_solution(res, n_alpha, n_mRNA, n_reg, n_TF, num_psites, beta_start_indices)
Extract the best solution from the optimization results.
print_alpha_mapping(mRNA_ids, reg_map, TF_ids, final_alpha)
Print the mapping of transcription factors (TFs) to mRNAs with their corresponding alpha values.
print_beta_mapping(TF_ids, final_beta, psite_labels_arr)
Print the mapping of transcription factors (TFs) to their beta parameters.
Gradient-Based Algorithms
tfopt.local.config.constants
parse_args()
Parse command line arguments for the PhosKinTime optimization problem. This function uses argparse to handle input parameters for the optimization process. The parameters include:
- lower_bound: Lower bound for the optimization variables (default: -2).
- upper_bound: Upper bound for the optimization variables (default: 2).
- loss_type: Type of loss function to use (default: 0). Options: 0: MSE, 1: MAE, 2: soft L1, 3: Cauchy, 4: Arctan, 5: Elastic Net, 6: Tikhonov.
:return: lower_bound, upper_bound, loss_type
tfopt.local.config.logconf
tfopt.local.exporter.plotout
plot_estimated_vs_observed(predictions, expression_matrix, gene_ids, time_points, regulators, tf_protein_matrix, tf_ids, num_targets, save_path=OUT_DIR)
Plots the estimated vs observed values for a given set of genes and their corresponding TFs.
tfopt.local.exporter.sheetutils
save_results_to_excel(gene_ids, tf_ids, final_alpha, final_beta, psite_labels_arr, expression_matrix, predictions, objective_value, reg_map, filename=OUT_FILE)
Save the optimization results to an Excel file.
tfopt.local.objfn.minfn
objective_(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type, lam1=1e-06, lam2=1e-06)
Originally implemented by Julius Normann.
This version has been modified and optimized for consistency & speed in submodules by Abhinav Mishra.
Computes a loss value using one of several loss functions.
compute_predictions(x, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites)
Computes the predicted expression matrix based on the decision vector x.
objective_wrapper(x, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)
Wrapper function for the objective function.
tfopt.local.opt.optrun
run_optimizer(x0, bounds, lin_cons, expression_matrix, regulators, tf_protein_matrix, psite_tensor, n_reg, T_use, n_genes, beta_start_indices, num_psites, loss_type)
Runs the optimization algorithm to minimize the objective function.
Returns: result : Result of the optimization process, including the optimized parameters and objective value.
tfopt.local.optcon.construct
build_fixed_arrays(gene_ids, expression_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, reg_map)
Builds fixed-shape arrays from the input data.
constraint_alpha_func(x, n_genes, n_reg)
For each gene, the sum of its alpha parameters must equal 1.
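The sum-to-one condition on alpha can be written as one residual per gene that should equal zero. The sketch below assumes the alpha parameters occupy the first n_genes * n_reg entries of x, stored gene by gene; that layout and the name alpha_constraint_residuals are assumptions, not confirmed by this reference.

```python
def alpha_constraint_residuals(x, n_genes, n_reg):
    """Return sum(alpha for gene g) - 1 for every gene; zero means satisfied."""
    residuals = []
    for g in range(n_genes):
        # Assumed layout: gene g owns x[g * n_reg : (g + 1) * n_reg].
        block = x[g * n_reg:(g + 1) * n_reg]
        residuals.append(sum(block) - 1.0)
    return residuals
```

An equality-constrained solver would drive each residual to zero; entries of x beyond the alpha block (such as beta parameters) are ignored by this check.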
constraint_beta_func(x, n_alpha, n_TF, beta_start_indices, num_psites, no_psite_tf)
For each TF, the sum of its beta parameters must equal 1.
build_linear_constraints(n_genes, n_TF, n_reg, n_alpha, beta_start_indices, num_psites, no_psite_tf)
Build linear constraints for the transcription factor optimization problem.
tfopt.local.optcon.filter
load_and_filter_data()
Load and filter data for the optimization problem.
prepare_data(gene_ids, expr_matrix, tf_ids, tf_protein, tf_psite_data, tf_psite_labels, tf_time_cols, reg_map)
Prepares the data for optimization by filtering the expression matrix to match the number of time points and building fixed arrays.
Returns:
fixed_arrays (tuple): Tuple containing the fixed arrays:
- expression_matrix: array of shape (n_genes, T)
- regulators: array of shape (n_genes, n_reg) with indices into tf_ids.
- tf_protein_matrix: array of shape (n_TF, T)
- psite_tensor: array of shape (n_TF, n_psite_max, T), padded with zeros.
- n_reg: maximum number of regulators per gene.
- n_psite_max: maximum number of PSites among TFs.
- psite_labels_arr: list (length n_TF) of lists of PSite names (padded with empty strings).
- num_psites: array of length n_TF with the actual number of PSites for each TF.
T_use (int): Number of time points used in the expression matrix.
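The zero padding of psite_tensor and the empty-string padding of psite_labels_arr can be illustrated with nested lists. The name pad_psites_sketch and the dictionary inputs are assumptions made for this sketch; the package builds NumPy arrays and may differ in details.

```python
def pad_psites_sketch(tf_ids, psite_data, psite_labels, T):
    """Build an (n_TF, n_psite_max, T) nested list padded with zeros."""
    n_psite_max = max((len(psite_data.get(tf, [])) for tf in tf_ids), default=0)
    tensor, labels, counts = [], [], []
    for tf in tf_ids:
        # Truncate each PSite series to the first T time points.
        series = [list(s[:T]) for s in psite_data.get(tf, [])]
        names = list(psite_labels.get(tf, []))
        counts.append(len(series))
        missing = n_psite_max - len(series)
        # Pad with all-zero series and empty-string labels.
        tensor.append(series + [[0.0] * T for _ in range(missing)])
        labels.append(names + [""] * missing)
    return tensor, labels, counts, n_psite_max
```

Keeping the true counts alongside the padded tensor lets later steps ignore the padding rows.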
tfopt.local.utils.iodata
min_max_normalize(df, custom_max=None)
Row-wise (per-sample) min-max normalize time-series columns starting with 'x'.
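Row-wise min-max normalization of the columns whose names start with 'x' can be sketched as follows. Rows are plain dictionaries here rather than a DataFrame, the name min_max_rows_sketch is hypothetical, mapping a constant row to zeros is an assumption, and the custom_max option is not modelled because its semantics are not described in this reference.

```python
def min_max_rows_sketch(rows):
    """Scale each row's 'x*' columns to [0, 1] using that row's own min and max."""
    out = []
    for row in rows:
        cols = [k for k in row if k.startswith("x")]
        scaled = dict(row)
        if cols:
            values = [row[k] for k in cols]
            lo, hi = min(values), max(values)
            span = hi - lo
            for k in cols:
                # Assumption: a constant row maps to 0.0 to avoid dividing by zero.
                scaled[k] = (row[k] - lo) / span if span else 0.0
        out.append(scaled)
    return out
```

Because the minimum and maximum are taken per row, each gene's time series is scaled independently of the others.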
load_expression_data(filename=INPUT3)
Loads gene expression (mRNA) data.
load_tf_protein_data(filename=INPUT1)
Loads TF protein data along with PSite information.
Returns:
- tf_ids: List of TF identifiers (strings).
- tf_protein: Dictionary mapping TF identifiers to their protein data (numpy array).
- tf_psite_data: Dictionary mapping TF identifiers to their phosphorylation site data (list of numpy arrays).
- tf_psite_labels: Dictionary mapping TF identifiers to their phosphorylation site labels (list of strings).
- time_cols: List of time columns (excluding "GeneID" and "Psite").
load_regulation(filename=INPUT4)
Returns a mapping from gene (source) to a list of TFs (targets).
Parameters: |
|
---|
Returns: - reg_map: Dictionary mapping gene identifiers to lists of TF identifiers.
summarize_stats(input3=INPUT3, input1=INPUT1, input4=INPUT4)
Summarizes statistics for the expression data (input3) and TF protein data (input1).
Parameters: |
|
---|
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
Parameters: |
|
---|
organize_output_files(*directories)
Function to organize output files into protein-specific folders.
Parameters: |
|
---|
tfopt.local.utils.params
get_optimization_parameters(expression_matrix, tf_protein_matrix, n_reg, T_use, psite_labels_arr, num_psites, lb, ub)
Prepare the optimization parameters for the optimization problem.
Parameters: |
|
---|
Returns:
x0 (np.ndarray): Initial guess for the optimization variables.
n_alpha (int): Number of alpha parameters.
beta_start_indices (np.ndarray): Starting indices for beta parameters.
bounds (list): List of bounds for the optimization variables.
no_psite_tf (np.ndarray): Array indicating whether each TF has no phosphorylation sites.
n_genes (int): Number of genes.
n_TF (int): Number of transcription factors.
postprocess_results(result, n_alpha, n_genes, n_reg, beta_start_indices, num_psites, reg_map, gene_ids, tf_ids, psite_labels_arr)
Post-process the optimization results to extract the final alpha and beta parameters.
Parameters: |
|
---|
Returns: |
|
---|
Fitting Analysis
tfopt.fitanalysis.helper
Plotter
A class to plot various analysis results from an Excel file.
__init__(filepath, savepath)
Initializes the Plotter instance by loading data from the Excel file.
Args:
filepath (str): Path to the Excel file containing analysis results.
savepath (str): Directory where the plots will be saved.
load_data()
Loads data from the specified Excel file. Args: filepath (str): Path to the Excel file. savepath (str): Directory where the plots will be saved.
plot_alpha_distribution()
Plots the distribution of alpha parameter values grouped by transcription factors (TFs) using a strip plot.
plot_beta_barplots()
Processes the beta values DataFrame and creates a separate bar plot for each unique transcription factor (TF).
plot_heatmap_abs_residuals()
Plots a heatmap of the absolute values of the residuals.
plot_goodness_of_fit()
Creates a scatter plot comparing observed vs. estimated values, fits a linear regression model, plots the 95% confidence interval, and labels points outside the confidence interval.
plot_kld()
Plots the Kullback-Leibler Divergence (KLD) for each mRNA. The KLD is calculated between the observed and estimated distributions of the mRNA expression levels.
plot_pca()
Plots a PCA (Principal Component Analysis) of the observed and estimated values.
plot_boxplot_alpha()
Plots a boxplot of the alpha values.
plot_boxplot_beta()
Plots a boxplot of the beta values.
plot_cdf_alpha()
Plots the cumulative distribution function (CDF) of the alpha values.
plot_cdf_beta()
Plots the cumulative distribution function (CDF) of the beta values.
plot_time_wise_residuals()
Plots the residuals over time for each mRNA.
ODE Modelling & Parameter Estimation
Configuration
config.cli
Command‑line entry point for the phoskintime pipeline.
Usage
Change to the directory one level above the package root and use it as the working directory
(the directory in which the project directory is visible).
run everything with the default (local) solver
python phoskintime all
run only preprocessing
python phoskintime prep
run tfopt with local flavour
python phoskintime tfopt --mode local
run tfopt with evol flavour
python phoskintime tfopt --mode evol
run kinopt with local flavour
python phoskintime kinopt --mode local
run kinopt with evol flavour
python phoskintime kinopt --mode evol
run the model
python phoskintime model
prep()
Preprocess data (processing.cleanup).
tfopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))
Transcription-Factor-mRNA Optimisation.
Parameters: |
|
---|
Returns: None
kinopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))
Kinase-Phosphorylation Optimization.
Parameters: |
|
---|
Returns: None
model(conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to model config file. Uses defaults if omitted.'))
Run the model (bin.main).
Parameters: |
|
---|
Returns: None
all(tf_mode: str = typer.Option('local', help='tfopt mode: local | evol'), kin_mode: str = typer.Option('local', help='kinopt mode: local | evol'), tf_conf: Path | None = typer.Option(None, help='tfopt config file'), kin_conf: Path | None = typer.Option(None, help='kinopt config file'), model_conf: Path | None = typer.Option(None, help='model config file'))
Run every stage in sequence. Preprocessing -> TF optimisation -> Kinase optimisation -> Model.
Parameters: |
|
---|
Returns: None
config.config
parse_bound_pair(val)
Parse a string representing a pair of bounds (lower, upper) into a tuple of floats. The upper bound can be 'inf' or 'infinity' to represent infinity. Raises ValueError if the input is not in the correct format.
Args:
val (str): The string to parse, e.g., "0,3" or "0,infinity".
Returns: tuple: A tuple containing the lower and upper bounds as floats.
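A minimal sketch of the parsing behaviour described above, under the assumption that the two bounds are separated by a single comma. The name parse_bound_pair_sketch is hypothetical and the package's own function may handle more cases.

```python
import math
from typing import Tuple


def parse_bound_pair_sketch(val: str) -> Tuple[float, float]:
    """Parse 'lower,upper' into floats; 'inf' or 'infinity' means no upper limit."""
    parts = [p.strip() for p in val.split(",")]
    if len(parts) != 2:
        raise ValueError(f"Expected 'lower,upper', got {val!r}")
    lower = float(parts[0])  # float() raises ValueError on malformed numbers
    upper_text = parts[1].lower()
    upper = math.inf if upper_text in ("inf", "infinity") else float(upper_text)
    return lower, upper
```

For example, parse_bound_pair_sketch("0,infinity") yields a lower bound of 0.0 and an unbounded upper limit.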
parse_fix_value(val)
Parse a fixed value or a list of fixed values from a string. If the input is a single value, it returns that value as a float. If the input is a comma-separated list, it returns a list of floats. Raises ValueError if the input is not in the correct format.
Args:
val (str): The string to parse, e.g., "1.0" or "1.0,2.0".
Returns: float or list: The parsed fixed value(s) as a float or a list of floats.
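The single-value versus list behaviour can be sketched as follows. The name parse_fix_value_sketch is hypothetical, and the sketch relies on float() to raise ValueError for malformed input.

```python
from typing import List, Union


def parse_fix_value_sketch(val: str) -> Union[float, List[float]]:
    """Return a float for '1.0' and a list of floats for '1.0,2.0'."""
    if "," in val:
        # An empty or non-numeric element makes float() raise ValueError.
        return [float(part) for part in val.split(",")]
    return float(val)
```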
ensure_output_directory(directory)
Parameters: |
|
---|
Returns: None
parse_args()
Parse command-line arguments for the PhosKinTime script. This function uses argparse to define and handle the command-line options. It includes options for setting bounds, fixed parameters, bootstrapping, profile estimation, and input file paths. The function returns the parsed arguments as a Namespace object. The arguments include: --A-bound, --B-bound, --C-bound, --D-bound, --Ssite-bound, --Dsite-bound, --bootstraps, --input-excel-protein, --input-excel-psite, --input-excel-rna.
Returns: argparse.Namespace: The parsed command-line arguments.
log_config(logger, bounds, args)
Log the configuration settings for the PhosKinTime script. This function logs the parameter bounds and the number of bootstrapping iterations. It uses the provided logger to output the information.
Parameters: |
|
---|
Returns: None
extract_config(args)
Extract configuration settings from command-line arguments. This function creates and returns a dictionary containing the parameter bounds and the number of bootstrapping iterations.
Parameters: |
|
---|
Returns: dict: The configuration settings.
score_fit(params, target, prediction, alpha=ALPHA_WEIGHT, beta=BETA_WEIGHT, gamma=GAMMA_WEIGHT, delta=DELTA_WEIGHT, mu=MU_WEIGHT)
Calculate the score for the fit of a model to target data. The score is a weighted combination of mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), variance, and a regularization penalty. The weights for each metric can be adjusted using the parameters alpha, beta, gamma, and delta. The regularization penalty is controlled by the mu parameter. The function returns the calculated score.
Args:
params (np.ndarray): The model parameters.
target (np.ndarray): The target data.
prediction (np.ndarray): The predicted data.
alpha (float): Weight for RMSE.
beta (float): Weight for MAE.
gamma (float): Weight for variance.
delta (float): Weight for MSE.
mu (float): Regularization penalty weight.
Returns: float: The calculated score.
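One way such a weighted score could be assembled is sketched below. Several details are assumptions not stated in this reference: the variance is taken over the residuals, the regularization penalty is the L2 norm of the parameters, and the default weights of 1.0 are placeholders for the package constants. The name score_fit_sketch is hypothetical.

```python
import math


def score_fit_sketch(params, target, prediction,
                     alpha=1.0, beta=1.0, gamma=1.0, delta=1.0, mu=1.0):
    """Weighted sum of RMSE, MAE, residual variance, MSE, and an L2 penalty."""
    residuals = [t - p for t, p in zip(target, prediction)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    mean_r = sum(residuals) / n
    variance = sum((r - mean_r) ** 2 for r in residuals) / n  # assumption: residual variance
    l2_penalty = math.sqrt(sum(p * p for p in params))  # assumption: L2 norm of params
    return alpha * rmse + beta * mae + gamma * variance + delta * mse + mu * l2_penalty
```

A perfect fit with all-zero parameters scores 0.0, and every term can only increase the score, so lower values indicate a better fit.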
future_times(n_new: int, ratio: Optional[float] = None, tp: np.ndarray = TIME_POINTS) -> np.ndarray
Extend the time points by n_new points, where each new interval is the previous interval multiplied by ratio. If ratio is None, it is inferred from the last two points.
Parameters: |
|
---|
Returns: np.ndarray: Extended time points.
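The extension rule can be sketched with plain lists. Two points are assumptions: when ratio is None it is taken as the quotient of the last two time points, and the default tp below is an illustrative placeholder rather than the package's TIME_POINTS. The name future_times_sketch is hypothetical.

```python
from typing import List, Optional, Sequence


def future_times_sketch(n_new: int, ratio: Optional[float] = None,
                        tp: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> List[float]:
    """Append n_new points; each new interval is the previous one times ratio."""
    times = list(tp)
    if ratio is None:
        ratio = times[-1] / times[-2]  # assumption: quotient of the last two points
    interval = times[-1] - times[-2]
    for _ in range(n_new):
        interval *= ratio
        times.append(times[-1] + interval)
    return times
```

With the placeholder grid, future_times_sketch(2) continues the doubling pattern; passing ratio=1.0 instead yields evenly spaced new points.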
config.constants
get_param_names_rand(num_psites: int) -> list
Generate parameter names for the random model.
Format: ['A', 'B', 'C', 'D'] +
['S1', 'S2', ..., 'S
Parameters: |
|
---|
Returns: list: List of parameter names.
get_param_names_ds(num_psites: int) -> list
Generate parameter names for distributive or successive models.
Format: ['A', 'B', 'C', 'D'] +
['S1', 'S2', ..., 'S
Parameters: |
|
---|
Returns: list: List of parameter names.
generate_labels_rand(num_psites: int) -> list
Generates labels for the states based on the number of phosphorylation sites for the random model. Returns a list with the base labels "R" and "P", followed by labels for all combinations of phosphorylated sites.
Parameters: |
|
---|
Returns: list: List of state labels.
generate_labels_ds(num_psites: int) -> list
Generates labels for the states based on the number of phosphorylation sites for the distributive or successive models. Returns a list with the base labels "R" and "P", followed by labels for each individual phosphorylated state.
Parameters: |
|
---|
Returns: list: List of state labels.
location(path: str, label: str = None) -> str
Returns a clickable hyperlink string for supported terminals using ANSI escape sequences.
Parameters: |
|
---|
Returns: |
|
---|
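Terminal hyperlinks of this kind are commonly produced with the OSC 8 escape sequence. The sketch below assumes that mechanism and a file:// target; whether the package uses exactly this sequence is not stated here, and the name location_sketch is hypothetical.

```python
from pathlib import Path
from typing import Optional


def location_sketch(path: str, label: Optional[str] = None) -> str:
    """Wrap a label in an OSC 8 hyperlink pointing at a local file."""
    uri = Path(path).resolve().as_uri()  # absolute file:// URI
    text = label if label is not None else path
    # ESC ] 8 ; ; URI ESC \  opens the link, ESC ] 8 ; ; ESC \  closes it.
    return f"\033]8;;{uri}\033\\{text}\033]8;;\033\\"
```

Terminals without OSC 8 support generally print only the label text.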
get_number_of_params_rand(num_psites)
Calculate the number of parameters required for the ODE system based on the number of phosphorylation sites.
Parameters: |
|
---|
Returns: |
|
---|
get_bounds_rand(num_psites, ub=0, lower=0)
Generate bounds for the ODE parameters based on the number of phosphorylation sites.
Parameters: |
|
---|
Returns: |
|
---|
config.logconf
ColoredFormatter
Bases: Formatter
Custom formatter to add colors to log messages and elapsed time.
This formatter uses ANSI escape codes to colorize the log messages based on their severity level.
It also includes a right-aligned clock that shows the elapsed time since the logger was initialized.
The elapsed time is displayed in a human-readable format (e.g., "1h 23m 45s").
The formatter is designed to be used with a logger that has a console handler.
The elapsed time is calculated from the time the logger was initialized and is displayed in a right-aligned format.
The formatter also ensures that the log messages are padded to a specified width, which can be adjusted using the width parameter.
The remove_ansi method is used to strip ANSI escape codes from the log message for accurate padding calculation.
The format method is overridden to customize the log message format, including the timestamp, logger name, log level, and message.
The setup_logger function is used to configure the logger with a file handler and a stream handler.
The file handler writes log messages to a specified log file, while the stream handler outputs log messages to the console.
The logger is set to the specified logging level, and the log file is created in the specified directory.
The log file is rotated based on size, and old log files are backed up.
format(record)
Format the log record with colors and elapsed time. This method overrides the default format method to customize the log message format. It includes the timestamp, logger name, log level, and message.
remove_ansi(s)
staticmethod
Remove ANSI escape codes from a string.
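A minimal sketch of stripping color codes so that padding is computed on the visible text. The regular expression covers only SGR (color and style) sequences, which is an assumption about what the formatter emits; the names ANSI_SGR, remove_ansi_sketch, and pad_visible are hypothetical.

```python
import re

# Matches SGR sequences such as "\x1b[31m" (red) or "\x1b[0m" (reset).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def remove_ansi_sketch(s: str) -> str:
    """Return s without ANSI color/style escape codes."""
    return ANSI_SGR.sub("", s)


def pad_visible(s: str, width: int) -> str:
    """Right-pad s so that its visible (escape-free) length reaches width."""
    visible_len = len(remove_ansi_sketch(s))
    return s + " " * max(0, width - visible_len)
```

Padding against len(s) directly would count the invisible escape bytes and misalign a right-aligned clock column.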
setup_logger(name='phoskintime', log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5)
Set up a logger with colored output and file logging. This function creates a logger with colored output for console messages.
Returns: logger
Core Functions
paramest.normest
worker_find_lambda(lam: float, gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray) -> Tuple[float, float, str]
Worker function for a single lambda value.
Parameters: |
|
---|
Returns: |
|
---|
find_best_lambda(gene: str, target: np.ndarray, p0: np.ndarray, time_points: np.ndarray, free_bounds: Tuple[np.ndarray, np.ndarray], init_cond: np.ndarray, num_psites: int, p_data: np.ndarray, pr_data: np.ndarray, lambdas=np.logspace(-2, 0, 10), max_workers: int = os.cpu_count()) -> Tuple[float, str]
Finds best lambda_reg to use in model_func.
normest(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps, use_regularization=USE_REGULARIZATION)
Function to estimate parameters for a given gene using ODE models.
Parameters: |
|
---|
Returns: |
|
---|
paramest.toggle
estimate_parameters(gene, pr_data, p_data, r_data, init_cond, num_psites, time_points, bounds, bootstraps)
This function allows for the selection of the estimation mode and handles the parameter estimation process accordingly.
Parameters: |
|
---|
Returns: |
|
---|
Weights for Curve Fitting
models.weights
early_emphasis(pr_data, p_data, time_points, num_psites)
Function that calculates custom weights for early time points in a dataset.
Parameters: |
|
---|
Returns: |
|
---|
get_protein_weights(gene, input1_path=Path(__file__).resolve().parent.parent / 'processing' / 'input1_wstd.csv', input2_path=Path(__file__).resolve().parent.parent / 'kinopt' / 'data' / 'input2.csv')
Function to extract weights for a specific gene from the input files.
Parameters: |
|
---|
Returns: |
|
---|
full_weight(p_data_weight, use_regularization, reg_len)
Function to create a full weight array for parameter estimation.
Parameters: |
|
---|
Returns: |
|
---|
get_weight_options(target, t_target, num_psites, use_regularization, reg_len, early_weights, ms_gauss_weights)
Function to calculate weights for parameter estimation based on the target data and time points.
Parameters: |
|
---|
Returns: |
|
---|
Parameter Estimation
paramest.core
process_gene(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps=0, out_dir=OUT_DIR)
Process a single gene by estimating its parameters and generating plots.
Parameters: |
|
---|
Returns: |
|
---|
process_gene_wrapper(gene, protein_data, kinase_data, mrna_data, time_points, bounds, bootstraps, out_dir=OUT_DIR)
Wrapper function to process a gene.
Parameters: |
|
---|
Returns: |
|
---|
Confidence Intervals using Linearization
paramest.identifiability.ci
confidence_intervals(gene, popt, pcov, target, model, alpha_val=0.05)
Computes the confidence intervals for parameter estimates using the Wald interval approach.
Parameters: |
|
---|
Returns: |
|
---|
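A Wald interval takes each estimate plus or minus a critical value times its standard error, where the standard error is the square root of the matching diagonal entry of the covariance matrix. The sketch below assumes a normal critical value (the package may use a t-distribution instead) and omits the gene, target, and model arguments; the name wald_intervals_sketch is hypothetical.

```python
import math
from statistics import NormalDist


def wald_intervals_sketch(popt, pcov, alpha_val=0.05):
    """Return (lower, upper) Wald intervals for each parameter estimate."""
    # Two-sided critical value, about 1.96 for alpha_val = 0.05.
    z = NormalDist().inv_cdf(1 - alpha_val / 2)
    intervals = []
    for i, estimate in enumerate(popt):
        std_err = math.sqrt(pcov[i][i])  # standard error from the diagonal
        intervals.append((estimate - z * std_err, estimate + z * std_err))
    return intervals
```

Because the interval is symmetric around the estimate, it can extend below zero even for parameters that are constrained to be non-negative.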
Knockout Analysis
knockout.helper
Perturbation & Parameter Sensitivity Analysis
sensitivity.analysis
compute_bound(value, perturbation=PERTURBATIONS_VALUE)
Computes the lower and upper bounds for a given parameter value for sensitivity analysis and perturbations.
Parameters: |
|
---|
Returns: |
|
---|
define_sensitivity_problem_rand(num_psites, values)
Defines the Morris sensitivity analysis problem for the random model.
Parameters: |
|
---|
Returns: |
|
---|
define_sensitivity_problem_ds(num_psites, values)
Defines the Morris sensitivity analysis problem for the distributive or successive models.
Parameters: |
|
---|
Returns: |
|
---|
Model Diagram
models.diagram.helpers
powerset(iterable)
Return the list of all subsets (as frozensets) of the given iterable.
Parameters: |
|
---|
Returns: A list of frozensets representing all subsets of the input iterable.
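Enumerating all subsets as frozensets can be sketched with itertools. The name powerset_sketch is hypothetical, and the ordering by subset size is a choice of this sketch.

```python
from itertools import chain, combinations


def powerset_sketch(iterable):
    """Return all subsets of the input as a list of frozensets."""
    items = list(iterable)
    # combinations(items, r) yields every subset of size r, for r = 0..len(items).
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]
```

For n phosphorylation sites this produces 2**n states, which is why the random model grows quickly with the number of sites.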
state_label(state)
Convert a set of phosphorylation sites into a node label.
Parameters: |
|
---|
Returns: A string representing the label for the node.
create_random_diagram(x, num_sites, output_filename)
Create a random phosphorylation diagram.
Parameters: |
|
---|
create_distributive_diagram(x, num_sites, output_filename)
Create a distributive phosphorylation diagram.
Parameters: |
|
---|
create_successive_model(x, num_sites, output_filename)
Create a successive phosphorylation diagram.
Parameters: |
|
---|
Model Types
models.distmod
ode_core(y, t, A, B, C, D, S_rates, D_rates)
The core ODE system for the distributive phosphorylation model.
Parameters: |
|
---|
Returns: |
|
---|
unpack_params(params, num_psites)
Function to unpack the parameters for the distributive ODE system.
Parameters: |
|
---|
Returns: |
|
---|
solve_ode(params, init_cond, num_psites, t)
Solve the ODE system for the distributive phosphorylation model.
Parameters: |
|
---|
Returns: |
|
---|
models.randmod
unpack_params(params, num_sites)
Unpack parameters for the Random model.
Parameters: |
|
---|
Returns: |
|
---|
ode_system(y, t, A, B, C, D, num_sites, S, Ddeg, mono_idx, forward, drop, fcounts, dcounts)
Compute the time derivatives of a random phosphorylation ODE system.
This function supports a large number of phosphorylation states by using precomputed transition indices to optimize speed.
Parameters: |
|
---|
Returns: |
|
---|
solve_ode(popt, y0, num_sites, t)
Integrate the ODE system for phosphorylation dynamics in the random phosphorylation model.
Parameters: |
|
---|
Returns: |
|
---|
models.succmod
ode_core(y, t, A, B, C, D, S_rates, D_rates)
The core of the ODE system for the successive ODE model.
Parameters: |
|
---|
Returns: dydt (np.array): The derivatives of the state variables.
unpack_params(params, num_psites)
Function to unpack the parameters for the ODE system. The parameters are expected to be in the following order: A, B, C, D, S_rates, D_rates, where S_rates and D_rates are arrays of length num_psites. The function returns the unpacked parameters as separate variables.
Args:
params: array of parameters.
num_psites: number of phosphorylation sites.
Returns: A, B, C, D, S_rates, D_rates
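The parameter layout described above can be sketched with simple slicing. The length check is an addition of this sketch, not something the reference states, and the name unpack_params_sketch is hypothetical.

```python
def unpack_params_sketch(params, num_psites):
    """Split [A, B, C, D, S_1..S_n, D_1..D_n] into its components."""
    expected = 4 + 2 * num_psites
    if len(params) != expected:  # assumption: fail early on a wrong length
        raise ValueError(f"Expected {expected} parameters, got {len(params)}")
    A, B, C, D = params[:4]
    S_rates = params[4:4 + num_psites]
    D_rates = params[4 + num_psites:4 + 2 * num_psites]
    return A, B, C, D, S_rates, D_rates
```

The same slicing works on NumPy arrays, where S_rates and D_rates come back as array views.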
solve_ode(params, init_cond, num_psites, t)
Solve the ODE system using the given parameters and initial conditions. The function integrates the ODE system over time and returns the solution.
Returns: solution, solution of phosphorylated sites
Steady-State Calculation
steady.initdist
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for the distributive phosphorylation model.
Parameters: |
|
---|
Returns: |
|
---|
Raises: |
|
---|
steady.initrand
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for the random phosphorylation model.
Parameters: |
|
---|
Returns: |
|
---|
Raises: |
|
---|
steady.initsucc
initial_condition(num_psites: int) -> list
Calculates the initial steady-state conditions for a given number of phosphorylation sites for the successive phosphorylation model.
Parameters: |
|
---|
Returns: |
|
---|
Raises: |
|
---|
Plotting
plotting.plotting
Plotter
A class to encapsulate plotting functionalities for ODE model analysis.
Attributes: |
|
---|
plot_parallel(solution: np.ndarray, labels: list)
Plots a parallel coordinates plot for the given solution.
Parameters: |
|
---|
pca_components(solution: np.ndarray, target_variance: float = 0.99)
Plots a scree plot showing the explained variance ratio for PCA components.
Parameters: |
|
---|
plot_pca(solution: np.ndarray, components: int = 3)
Plots the PCA results for the given solution.
Parameters: |
|
---|
Returns: |
|
---|
plot_tsne(solution: np.ndarray, perplexity: int = 30)
Plots a t-SNE visualization of the given solution.
Parameters: |
|
---|
Returns: |
|
---|
plot_param_series(estimated_params: list, param_names: list, time_points: np.ndarray)
Plots the time series of estimated parameters over the given time points.
Parameters: |
|
---|
plot_profiles(data: pd.DataFrame)
Plots the profiles of estimated parameters over time.
Parameters: |
|
---|
plot_model_fit(model_fit: np.ndarray, Pr_data: np.ndarray, P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)
Plots the model fit for mRNA, protein, and phosphorylated species across time.
Parameters: |
|
---|
plot_param_scatter(est_arr: np.ndarray, num_psites: int, time_vals: np.ndarray)
Plots scatter and density plots for parameters.
Parameters: |
|
---|
plot_heatmap(param_value_df: pd.DataFrame)
Parameters: |
|
---|
plot_error_distribution(error_df: pd.DataFrame)
Parameters: |
|
---|
plot_gof(merged_data: pd.DataFrame)
Plot the goodness of fit for the model.
Parameters: |
|
---|
plot_kld(merged_data: pd.DataFrame)
Plots the Kullback-Leibler divergence (KLD) for the model.
Parameters: |
|
---|
plot_params_bar(ci_results: dict, param_labels: list = None)
Plots a bar plot of the estimated parameters with their 95% confidence intervals.
Parameters: |
|
---|
plot_knockouts(results_dict: dict, num_psites: int, psite_labels: list)
Plot wild-type and knockout simulation results for comparison.
Parameters: |
|
---|
plot_top_param_pairs(excel_path: str)
For each gene's '_perturbations' sheet in the Excel file, plot scatter plots for the parameter pairs with correlation.
Parameters: |
|
---|
plot_model_perturbations(problem: dict, Si: dict, cutoff_idx: int, time_points: np.ndarray, n_sites: int, best_model_psite_solutions: np.ndarray, best_mrna_solutions: np.ndarray, best_protein_solutions: np.ndarray, psite_labels: list[str], protein_data_ref: np.ndarray, psite_data_ref: np.ndarray, rna_ref: np.ndarray, model_fit_sol: np.ndarray) -> None
Plot the best model perturbations for the given data.
Parameters: |
|
---|
plot_time_state_grid(samples: np.ndarray, time_points: np.ndarray, state_names: list)
Grid of strip plots per state showing variability across time.
Parameters: |
|
---|
plot_phase_space(samples: np.ndarray, state_names: list)
Phase space plots: one state vs another for each simulation.
Parameters: |
|
---|
plot_future_fit(P_data: np.ndarray, R_data: np.ndarray, sol: np.ndarray, num_psites: int, psite_labels: list, time_points: np.ndarray)
Plots the model fit for the future time points.
Parameters: |
|
---|
plot_regularization(excel_path: str)
Read every '
Parameters: |
|
---|
plot_model_error(excel_path: str)
Read every '
Parameters: |
|
---|
Utility Functions
utils.display
ensure_output_directory(directory)
Ensure the output directory exists. If it doesn't, create it.
Parameters: |
|
---|
load_data(excel_file, sheet='Estimated Values')
Load data from an Excel file. The default sheet is "Estimated Values".
Parameters: |
|
---|
Returns: |
|
---|
format_duration(seconds)
Format a duration in seconds into a human-readable string.
Parameters: |
|
---|
Returns: str: Formatted duration string.
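A minimal sketch of one possible human-readable format. It borrows the "1h 23m 45s" style mentioned for the logger's elapsed-time clock, which is an assumption; the actual output format of format_duration is not specified here, and the name format_duration_sketch is hypothetical.

```python
def format_duration_sketch(seconds: float) -> str:
    """Format seconds as e.g. '1h 23m 45s', dropping leading zero units."""
    total = int(seconds)  # fractional seconds are truncated in this sketch
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)
```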
merge_obs_est(filename)
Function to merge observed and estimated data from an Excel file.
Parameters: |
|
---|
Returns: |
|
---|
save_result(results, excel_filename)
Function to save results to an Excel file.
Parameters: |
|
---|
create_report(results_dir: str, output_file: str = f'{model_type}_report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
Parameters: |
|
---|
organize_output_files(directories: Iterable[Union[str, Path]])
Organize output files into protein-specific folders and a general folder.
Parameters: |
|
---|
utils.tables
generate_tables(xlsx_file_path)
Generate hierarchical tables from the XLSX file containing alpha and beta values.
Parameters: |
|
---|
Returns: |
|
---|
save_tables(tables, output_dir)
Save the generated tables as LaTeX and CSV files.
Parameters: |
|
---|
save_master_table(folder='latex', output_file='latex/all_tables.tex')
Save a master LaTeX file that includes all individual LaTeX files from the specified folder.
Parameters: |
|
---|
utils.latexit
generate_latex_table(df, sheet_name)
Generate LaTeX code for a table from a DataFrame. Args: df (pd.DataFrame): DataFrame to convert to LaTeX. sheet_name (str): Name of the sheet for caption and label. Returns: str: LaTeX code for the table.
generate_latex_image(image_filename)
Generate LaTeX code for an image.
Parameters: |
|
---|
Returns: str: LaTeX code for the image.
main(input_dir)
Main function to process Excel and PNG files in the input directory and generate LaTeX code.
Parameters: |
|
---|