API Reference
This reference only documents importable modules and packages that exist in this repository.
Optimization
kinopt.local.config.constants
parse_args()
Parse command-line arguments for the kinopt.local CLI. Defaults come from config.toml.
kinopt.local.config.logconf
kinopt.local.exporter.plotout
format_timepoints(tp, tol=1e-09)
Format timepoints with minimal decimals:
- integers -> no decimal
- non-integers -> one decimal
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tp | array-like | Timepoints (list or np.ndarray) | required |
| tol | float | Tolerance for floating-point integer check | 1e-09 |
Returns:
| Type | Description |
|---|---|
| list[str] | Formatted labels |
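Here is a minimal sketch of the formatting rule above, using only the standard library. It assumes a timepoint counts as an integer when it lies within tol of the nearest whole number. The sketch iterates over any sequence of numbers, so it also works on a NumPy array. The actual implementation may differ in detail.

```python
def format_timepoints(tp, tol=1e-9):
    """Return one label per timepoint: integers without decimals, others with one."""
    labels = []
    for t in tp:
        value = float(t)
        # Assumption: "integer" means within `tol` of the nearest whole number.
        if abs(value - round(value)) < tol:
            labels.append(str(int(round(value))))
        else:
            labels.append(f"{value:.1f}")
    return labels


print(format_timepoints([0, 0.5, 1.0, 2.3, 4.0000000001]))
# ['0', '0.5', '1', '2.3', '4']
```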
plot_fits_for_gene(gene, gene_data, real_timepoints)
Function to plot the observed and estimated phosphorylation levels for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene | str | The name of the gene. | required |
| gene_data | dict | A dictionary containing observed and estimated data for each psite of the gene. | required |
| real_timepoints | list | A list of timepoints corresponding to the observed and estimated data. | required |
export_outcomes_to_csv(outcomes, csv_path)
Export multistart optimization outcomes to CSV.
One row per start, scalar diagnostics only.
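A minimal sketch of a one-row-per-start export follows. It assumes each outcome carries the StartOutcome fields documented under kinopt.local.opt.optrun. The local StartOutcome dataclass and the SCALAR_FIELDS list are illustrative stand-ins, and the column set and order of the real CSV are assumptions. Non-scalar fields (result, optimized_params) are skipped to match "scalar diagnostics only".

```python
import csv
import os
import tempfile
from dataclasses import dataclass
from typing import Any


@dataclass
class StartOutcome:
    # Illustrative stand-in mirroring the documented StartOutcome fields.
    start_id: int
    seed: int
    result: Any
    optimized_params: Any
    fun: float
    success: bool
    constr_violation: float
    runtime_s: float


# Assumed column set: every field except the non-scalar result/optimized_params.
SCALAR_FIELDS = ["start_id", "seed", "fun", "success", "constr_violation", "runtime_s"]


def export_outcomes_to_csv(outcomes, csv_path):
    """Write one CSV row per start, keeping only scalar diagnostics."""
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=SCALAR_FIELDS)
        writer.writeheader()
        for outcome in outcomes:
            writer.writerow({name: getattr(outcome, name) for name in SCALAR_FIELDS})


outcomes = [
    StartOutcome(0, 1234, None, [0.1, 0.9], 0.52, True, 0.0, 1.8),
    StartOutcome(1, 1235, None, [0.4, 0.6], 0.47, True, 0.0, 2.1),
]
path = os.path.join(tempfile.mkdtemp(), "multistart_summary.csv")
export_outcomes_to_csv(outcomes, path)
```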
plot_cumulative_residuals(gene, gene_data, real_timepoints)
Function to plot the cumulative residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene | str | The name of the gene. | required |
| gene_data | dict | A dictionary containing the residuals for each psite of the gene. | required |
| real_timepoints | list | A list of timepoints corresponding to the observed and estimated data. | required |
plot_autocorrelation_residuals(gene, gene_data, real_timepoints)
Function to plot the autocorrelation of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene | str | The name of the gene. | required |
| gene_data | dict | A dictionary containing the residuals for each psite of the gene. | required |
| real_timepoints | list | A list of timepoints corresponding to the observed and estimated data. | required |
plot_histogram_residuals(gene, gene_data, real_timepoints)
Function to plot histograms of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene | str | The name of the gene. | required |
| gene_data | dict | A dictionary containing the residuals for each psite of the gene. | required |
| real_timepoints | list | A list of timepoints corresponding to the observed and estimated data. | required |
plot_qqplot_residuals(gene, gene_data, real_timepoints)
Function to plot QQ plots of residuals for each psite of a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene | str | The name of the gene. | required |
| gene_data | dict | A dictionary containing the residuals for each psite of the gene. | required |
| real_timepoints | list | A list of timepoints corresponding to the observed and estimated data. | required |
plot_multistart_summary_runtime_overlay(summary_csv, out_path=None, figsize=(8, 8), x_col='rank', y_col='fun', c_col='runtime_s', success_col='success', cv_col='constr_violation', annotate_best=True)
Read a multistart summary CSV and plot the final objective against rank, coloring each point by runtime.
Plot conventions:
- x: rank (best -> worst)
- y: final objective (fun)
- color: runtime in seconds
- optional: non-successful or infeasible points are de-emphasized (only if the corresponding columns exist)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| summary_csv | str \| Path | Path to multistart_summary.csv | required |
| out_path | str \| Path \| None | If provided, the figure is saved to this path (e.g. .png) | None |
| figsize | tuple | Figure size in inches | (8, 8) |
| x_col, y_col, c_col | str | Column names for the x-axis, y-axis, and point color | 'rank', 'fun', 'runtime_s' |
| success_col, cv_col | str | Optional columns used for styling (only if present) | 'success', 'constr_violation' |
| annotate_best | bool | Annotate the best run (rank=1 or minimum fun) | True |
Returns:
| Type | Description |
|---|---|
| tuple | (fig, ax, df): Matplotlib figure, axes, and the loaded DataFrame |
kinopt.local.exporter.sheetutils
output_results(P_initial, P_init_dense, P_estimated, residuals, alpha_values, beta_values, result, mse, rmse, mae, mape, r_squared)
Function to output the results of the optimization process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| P_initial | dict | Dictionary containing initial phosphorylation data. | required |
| P_init_dense | ndarray | Dense matrix of initial phosphorylation data. | required |
| P_estimated | ndarray | Dense matrix of estimated phosphorylation data. | required |
| residuals | ndarray | Dense matrix of residuals. | required |
| alpha_values | dict | Dictionary containing optimized alpha values. | required |
| beta_values | dict | Dictionary containing optimized beta values. | required |
| result | OptimizeResult | Result object from the optimization process. | required |
| mse | float | Mean Squared Error of the optimization. | required |
| rmse | float | Root Mean Squared Error of the optimization. | required |
| mae | float | Mean Absolute Error of the optimization. | required |
| mape | float | Mean Absolute Percentage Error of the optimization. | required |
| r_squared | float | R-squared value of the optimization. | required |
export_params_npz(outcomes, path)
Export the optimized parameters to a compressed npz file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| outcomes | list | List of OptimizeResult objects. | required |
| path | str | Path to save the npz file. | required |
kinopt.local.objfn.minfn
kinopt.local.opt.optrun
StartOutcome
dataclass
Outcome of a single optimization start.
Parameters:
| Name | Description |
|---|---|
| start_id | ID of the start. |
| seed | Seed used for the start. |
| result | Result of the optimization. |
| optimized_params | Optimized parameters. |
| fun | Objective function value. |
| success | Whether the optimization was successful. |
| constr_violation | Constraint violation. |
| runtime_s | Runtime of the optimization, in seconds. |
run_optimization(obj_fun, params_initial, opt_method, bounds, constraints)
Run optimization using the specified method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj_fun |  | Objective function to minimize. | required |
| params_initial |  | Initial parameters for the optimization. | required |
| opt_method |  | Optimization method to use (e.g., 'SLSQP', 'trust-constr'). | required |
| bounds |  | Bounds for the parameters. | required |
| constraints |  | Constraints for the optimization. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| result |  | Result of the optimization. |
| optimized_params |  | Optimized parameters. |
multistart_run_optimization(obj_fun, params_initial, opt_method, bounds, constraints, n_starts=24, n_jobs=-1, base_seed=1234, init_strategy='hybrid', jitter_scale=0.15, prefer_feasible=True, logger=None)
Runs run_optimization multiple times in parallel and returns (best_result, best_params, outcomes).
Selection logic:
1. If prefer_feasible is True, prefer feasible starts (cv <= 0); among infeasible starts, prefer the smallest constraint violation.
2. Then the lowest objective value.
3. Then success=True as a tie-breaker.
4. Then the shortest runtime as the final tie-breaker.
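The selection order can be expressed as a single lexicographic sort key. This is a sketch, not the package's code. Candidate is a hypothetical stand-in for StartOutcome, and feasibility is tested with a strict cv <= 0; the real implementation may apply a tolerance.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical stand-in for StartOutcome with only the fields used for ranking.
    start_id: int
    fun: float
    success: bool
    constr_violation: float
    runtime_s: float


def selection_key(c, prefer_feasible=True):
    """Smaller tuples are better; tuple order encodes the four rules."""
    # Rule 1: feasible starts (cv <= 0) share 0.0; infeasible ones rank by violation.
    violation = max(c.constr_violation, 0.0) if prefer_feasible else 0.0
    # Rules 2-4: objective, then success (negated so True sorts first), then runtime.
    return (violation, c.fun, not c.success, c.runtime_s)


def pick_best(candidates, prefer_feasible=True):
    return min(candidates, key=lambda c: selection_key(c, prefer_feasible))


starts = [
    Candidate(0, fun=0.30, success=True, constr_violation=1e-3, runtime_s=2.0),
    Candidate(1, fun=0.45, success=False, constr_violation=0.0, runtime_s=1.5),
    Candidate(2, fun=0.45, success=True, constr_violation=0.0, runtime_s=3.0),
]
print(pick_best(starts).start_id)                         # 2
print(pick_best(starts, prefer_feasible=False).start_id)  # 0
```

Python compares tuples element by element, so min() with this key applies the rules in order. Setting prefer_feasible=False neutralizes the first rule, and the objective then decides.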
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj_fun |  | Objective function to optimize. | required |
| params_initial |  | Initial parameters for optimization. | required |
| opt_method |  | Optimization method to use (e.g., 'SLSQP', 'trust-constr'). | required |
| bounds |  | Parameter bounds for optimization. | required |
| constraints |  | Constraints for optimization. | required |
| n_starts |  | Number of optimization starts to run. | 24 |
| n_jobs |  | Number of parallel jobs to run; -1 means use all processors. | -1 |
| base_seed |  | Base seed for random number generation. | 1234 |
| init_strategy |  | Strategy for sampling initial parameters: 'jitter', 'uniform', or 'hybrid'. | 'hybrid' |
| jitter_scale |  | Scale for jittering initial parameters. | 0.15 |
| prefer_feasible |  | If True, prefer feasible solutions over infeasible ones. | True |
| logger |  | Logger instance for logging messages. | None |
Returns:
| Name | Type | Description |
|---|---|---|
|  | tuple | A tuple (best_result, best_params, outcomes): best_result is the optimization result object with the best outcome; best_params are the optimized parameters corresponding to it; outcomes is a list of StartOutcome objects for all optimization starts. |
kinopt.local.optcon.construct
load_geneid_to_psites(input1_path=INPUT1)
Load the geneid to psite mapping from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input1_path
|
str
|
Path to the input CSV file containing geneid and psite information. |
INPUT1
|
Returns: defaultdict: A dictionary mapping geneid to a set of psites.
get_unique_kinases(input2_path=INPUT2)
Extract unique kinases from the input CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input2_path
|
str
|
Path to the input CSV file containing kinase information. |
INPUT2
|
Returns: set: A set of unique kinases.
check_kinases()
Check if kinases in input2.csv are present in input1.csv and log the results.
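The three functions above can be sketched with the csv module and set operations. Several details are assumptions made for illustration:
- the column names GeneID, Psite, and Kinase;
- a one-kinase-per-row layout for input2.csv;
- file-object arguments instead of paths.

The real check_kinases takes no arguments and logs its findings rather than returning them.

```python
import csv
import io
from collections import defaultdict


def load_geneid_to_psites(csv_file):
    """Map each GeneID to the set of its phosphosites (assumed columns GeneID, Psite)."""
    mapping = defaultdict(set)
    for row in csv.DictReader(csv_file):
        mapping[row["GeneID"]].add(row["Psite"])
    return mapping


def get_unique_kinases(csv_file):
    """Collect unique kinase names (assumed one per row in a Kinase column)."""
    return {row["Kinase"] for row in csv.DictReader(csv_file)}


def check_kinases(geneid_to_psites, kinases):
    """Return kinases with no GeneID entry; `in` on a defaultdict adds no keys."""
    return sorted(k for k in kinases if k not in geneid_to_psites)


input1 = io.StringIO("GeneID,Psite\nABL2,Y261\nABL2,S620\nAKT1,T308\nABL2,Y261\n")
input2 = io.StringIO("Kinase\nABL2\nSRC\nAKT1\nSRC\n")

psites = load_geneid_to_psites(input1)
kinases = get_unique_kinases(input2)
print(check_kinases(psites, kinases))  # ['SRC']
```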
kinopt.local.utils.iodata
format_duration(seconds)
Formats a duration in seconds as a human-readable string:
- under 60 seconds: returned in seconds
- under 3600 seconds: returned in minutes
- otherwise: returned in hours
Parameters: seconds (duration in seconds). Returns: formatted string.
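This sketch implements the thresholds above. The unit labels ("sec", "min", "hr") and the two-decimal precision are assumptions; only the 60 s and 3600 s cut-offs come from the description.

```python
def format_duration(seconds):
    """Human-readable duration using the 60 s / 3600 s cut-offs."""
    if seconds < 60:
        return f"{seconds:.2f} sec"
    if seconds < 3600:
        return f"{seconds / 60:.2f} min"
    return f"{seconds / 3600:.2f} hr"


print(format_duration(42))    # 42.00 sec
print(format_duration(150))   # 2.50 min
print(format_duration(5400))  # 1.50 hr
```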
load_and_scale_data(estimate_missing, scaling_method, split_point, seg_points)
Load and scale the data from the specified input files.
Parameters: estimate_missing, scaling_method, split_point, seg_points. Returns: time series data, interaction data, and observed data.
apply_scaling(df, cols, method, split_point, seg_points)
Apply scaling to the specified columns of a DataFrame based on the given method. Supported methods:
- 'min_max': min-max scaling
- 'log': logarithmic scaling
- 'temporal': temporal scaling (two segments)
- 'segmented': segmented scaling (multiple segments)
- 'slope': slope scaling
- 'cumulative': cumulative scaling
Parameters: df, cols, method, split_point, seg_points. Returns: df.
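This sketch covers the two simplest methods, 'min_max' and 'log', applied to one time series (one row across the time columns) held in a plain list. Several choices are assumptions:
- scaling per row rather than per column;
- the log1p variant of logarithmic scaling;
- the zero-range fallback.

The segment-based methods, which depend on split_point and seg_points, are not reproduced.

```python
import math


def scale_series(values, method):
    """Scale one time series (one row across the time columns) with a simple method."""
    if method == "min_max":
        lo, hi = min(values), max(values)
        span = hi - lo
        # A flat series has no range; map it to zeros instead of dividing by zero.
        return [0.0 if span == 0 else (v - lo) / span for v in values]
    if method == "log":
        # log1p keeps zeros finite; assumes all values are >= 0.
        return [math.log1p(v) for v in values]
    raise ValueError(f"method not covered by this sketch: {method!r}")


print(scale_series([2.0, 4.0, 6.0], "min_max"))  # [0.0, 0.5, 1.0]
```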
create_report(results_dir: str, output_file: str = 'report.html')
Creates a single global report HTML file from all gene folders inside the results directory.
For each gene folder (e.g. "ABL2"), the report includes:
- all PNG plots and interactive HTML plots, displayed in a grid with three plots per row, each confined to a fixed size of 900px by 900px;
- data tables from the XLSX or CSV files in the gene folder, displayed below the plots, one per row.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results_dir | str | Path to the root results directory. | required |
| output_file | str | Name of the generated global report file (placed inside results_dir). | 'report.html' |
organize_output_files(*directories)
Function to organize output files into protein-specific folders. It moves files matching the pattern 'protein_name_*.{json,svg,png,html,csv,xlsx}' into a folder named after the protein (e.g., 'ABL2') and moves all other files into a 'General' folder within the same directory.
Parameters: *directories, the directories whose files are organized.
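This sketch implements the file-sorting rule, assuming the protein name is everything before the first underscore. The regular expression is only an illustration. A name such as multistart_summary.csv would also match it, so the real function may rely on a list of known protein names instead.

```python
import re
import shutil
import tempfile
from pathlib import Path

EXTENSIONS = ("json", "svg", "png", "html", "csv", "xlsx")
# Assumption: the protein name is the text before the first underscore.
PATTERN = re.compile(r"^([A-Za-z0-9]+)_.+\.(?:" + "|".join(EXTENSIONS) + r")$")


def organize_output_files(*directories):
    for directory in map(Path, directories):
        # Snapshot the listing first so newly created folders are not revisited.
        for path in list(directory.iterdir()):
            if not path.is_file():
                continue
            match = PATTERN.match(path.name)
            target = directory / (match.group(1) if match else "General")
            target.mkdir(exist_ok=True)
            shutil.move(str(path), str(target / path.name))


demo = Path(tempfile.mkdtemp())
for name in ["ABL2_fit.png", "ABL2_params.csv", "notes.txt"]:
    (demo / name).write_text("placeholder")
organize_output_files(demo)
print(sorted(p.relative_to(demo).as_posix() for p in demo.rglob("*") if p.is_file()))
# ['ABL2/ABL2_fit.png', 'ABL2/ABL2_params.csv', 'General/notes.txt']
```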
kinopt.local.utils.params
extract_parameters(P_initial, gene_kinase_counts, total_alpha, unique_kinases, K_index, optimized_params)
Extracts the alpha and beta parameters from the optimized parameters.
Parameters: P_initial, gene_kinase_counts, total_alpha, unique_kinases, K_index, optimized_params. Returns: alpha and beta values as dictionaries.
Fit Analysis
kinopt.fitanalysis.helpers.postfit
goodnessoffit(estimated, observed)
Function to plot the goodness of fit and the Kullback-Leibler divergence between estimated and observed values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| estimated | DataFrame | DataFrame containing estimated values. | required |
| observed | DataFrame | DataFrame containing observed values. | required |
Returns:
| Type | Description |
|---|---|
| None |  |
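One common way to compute the Kullback-Leibler divergence for a pair of profiles is to normalize each to sum to 1 and evaluate D_KL(P || Q) = sum_i p_i log(p_i / q_i). The sketch below makes three assumptions:
- inputs are non-negative;
- observed values form P and estimated values form Q;
- probabilities are clamped at a small eps to avoid log(0).

How goodnessoffit normalizes its DataFrames is not stated here.

```python
import math


def kl_divergence(observed, estimated, eps=1e-12):
    """D_KL(P || Q) with P from observed and Q from estimated, each normalized to sum to 1."""
    p_total, q_total = sum(observed), sum(estimated)
    # Clamp tiny probabilities so log() never sees zero.
    p = [max(v / p_total, eps) for v in observed]
    q = [max(v / q_total, eps) for v in estimated]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```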
reshape_alpha_beta(alpha_values, beta_values)
Function to reshape alpha and beta values for plotting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alpha_values | DataFrame | DataFrame containing alpha values. | required |
| beta_values | DataFrame | DataFrame containing beta values. | required |
Returns: pd.DataFrame: Reshaped DataFrame containing both alpha and beta values.
perform_pca(df)
Function to perform PCA analysis on the given DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame containing the data for PCA analysis. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with PCA results and additional columns for type and gene/psite information. |
plot_pca(result_df_sorted, y_axis_column)
Plot PCA or t-SNE results for each gene/psite. The function creates scatter plots with different markers for alpha and beta parameters, and adds labels for each point. The function also adjusts text labels to avoid overlap using the adjustText library.
Parameters: result_df_sorted, the DataFrame containing PCA or t-SNE results; y_axis_column, the column name for the y-axis values in the plot.
perform_tsne(scaled_data, df)
Perform t-SNE analysis on the given scaled data. The function returns a DataFrame with t-SNE results and additional columns for type and gene/psite information.
Parameters: scaled_data, df. Returns: pd.DataFrame with t-SNE results and additional columns.
additional_plots(df, scaled_data, alpha_values, beta_values, residuals_df)
Function to create additional plots including CDF, KDE, Boxplot, and Hierarchical Clustering.
Parameters: df, scaled_data, alpha_values, beta_values, residuals_df.
create_sankey_from_network(output_dir, data, title)
Creates a Sankey diagram from the given data and saves it as an HTML file.
This function processes the input data to generate nodes and links for a Sankey diagram. It assigns colors to nodes and links based on their attributes and values, and uses Plotly to render the diagram. The resulting diagram is saved as an HTML file in the specified output directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_dir | str | The directory where the Sankey diagram HTML file will be saved. | required |
| data | pd.DataFrame | The data for the Sankey diagram. Must include the columns 'Source' (source node of the link), 'Target' (target node of the link), and 'Value' (value of the link, which determines the flow size). | required |
| title | str | The title of the Sankey diagram. | required |
The function performs the following steps:
1. Initializes nodes and links for the Sankey diagram.
2. Maps node labels to indices and assigns colors to nodes.
3. Processes the data to create links between nodes, assigning colors based on link values.
4. Builds the Sankey diagram using Plotly.
5. Adds a color bar to explain the flow gradient.
6. Saves the Sankey diagram as an HTML file in the specified output directory.
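Steps 1 to 3 can be sketched without Plotly. Node labels get indices in order of first appearance, and each link records a source index, target index, flow size, and color. Two simplifications are assumed here: links are colored by sign, and abs(Value) is used as the flow width. The real function instead maps link values onto a gradient shown with a color bar.

```python
def build_sankey_inputs(rows):
    """rows: (Source, Target, Value) tuples -> node labels plus parallel link lists."""
    labels, index = [], {}

    def node_id(label):
        # Assign indices in order of first appearance.
        if label not in index:
            index[label] = len(labels)
            labels.append(label)
        return index[label]

    links = {"source": [], "target": [], "value": [], "color": []}
    for source, target, value in rows:
        links["source"].append(node_id(source))
        links["target"].append(node_id(target))
        # Flow widths must be non-negative; the sign is carried by the color instead.
        links["value"].append(abs(value))
        links["color"].append("rgba(31,119,180,0.6)" if value >= 0 else "rgba(214,39,40,0.6)")
    return labels, links


labels, links = build_sankey_inputs([
    ("SRC", "ABL2_Y261", 0.8),
    ("AKT1", "ABL2_Y261", -0.3),
    ("SRC", "AKT1_T308", 0.5),
])
print(labels)           # ['SRC', 'ABL2_Y261', 'AKT1', 'AKT1_T308']
print(links["source"])  # [0, 2, 0]
```

With Plotly installed, these lists would feed plotly.graph_objects.Sankey(node=dict(label=labels), link=links).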
important_connections(output_dir, data, top_n=20)
Extracts the top N most important connections based on their absolute values and saves them to a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_dir | str | The directory where the CSV file will be saved. | required |
| data | pd.DataFrame | The connections, with columns 'Source', 'Target', and 'Value'. | required |
| top_n | int | The number of top connections to extract. | 20 |
The function sorts the connections by their absolute values in descending order, selects the top N connections, and saves them to a CSV file named 'top_connections.csv' in the specified output directory.
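The following is a standard-library sketch of the same selection, using a list of dicts in place of the DataFrame. With pandas, the corresponding ordering would be data.sort_values("Value", key=lambda s: s.abs(), ascending=False).head(top_n). The sketch also returns the selected rows, which the documented function may not do.

```python
import csv
import os
import tempfile


def important_connections(output_dir, rows, top_n=20):
    """Keep the top_n rows by |Value| and write them to top_connections.csv."""
    top = sorted(rows, key=lambda r: abs(r["Value"]), reverse=True)[:top_n]
    with open(os.path.join(output_dir, "top_connections.csv"), "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["Source", "Target", "Value"])
        writer.writeheader()
        writer.writerows(top)
    return top


rows = [
    {"Source": "SRC", "Target": "ABL2_Y261", "Value": 0.2},
    {"Source": "AKT1", "Target": "GSK3B_S9", "Value": -0.9},
    {"Source": "MAPK1", "Target": "ELK1_S383", "Value": 0.5},
]
out_dir = tempfile.mkdtemp()
print([r["Value"] for r in important_connections(out_dir, rows, top_n=2)])  # [-0.9, 0.5]
```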
Optimality and KKT Utilities
kinopt.optimality.KKT
generate_latex_table(summary_dict, table_caption, table=None)
Function to generate a LaTeX table from a summary dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| summary_dict | dict | Dictionary containing summary data. | required |
| table_caption | str | Caption for the LaTeX table. | required |
| table | str | Optional existing LaTeX table to append to. | None |
Returns:
| Name | Type | Description |
|---|---|---|
|  | str | LaTeX formatted table as a string. |
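This sketch shows one plausible two-column layout (metric name, value) for the table. The following choices are assumptions, not taken from the KKT module:
- the tabular layout and the Metric/Value header;
- the 4-significant-digit float format;
- the escaping of LaTeX special characters.

```python
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#", "$": r"\$"}


def escape_latex(text):
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(text))


def generate_latex_table(summary_dict, table_caption, table=None):
    """Render summary_dict as a two-column table; append to `table` if given."""
    lines = [r"\begin{table}[ht]", r"\centering", r"\begin{tabular}{ll}", r"\hline",
             r"Metric & Value \\", r"\hline"]
    for key, value in summary_dict.items():
        cell = f"{value:.4g}" if isinstance(value, float) else str(value)
        lines.append(f"{escape_latex(key)} & {escape_latex(cell)} \\\\")
    lines += [r"\hline", r"\end{tabular}",
              rf"\caption{{{escape_latex(table_caption)}}}", r"\end{table}"]
    rendered = "\n".join(lines)
    # Appending keeps any previously generated table above the new one.
    return rendered if table is None else f"{table}\n\n{rendered}"


print(generate_latex_table({"max_residual": 0.123456, "n_sites": 42}, "Residual summary"))
```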
print_primal_feasibility_results(primal_summary, alpha_violations, beta_violations, logger_obj=None)
Logs the primal feasibility summary and violation details.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| primal_summary | dict | Dictionary containing primal feasibility results. | required |
| alpha_violations | dict | Dictionary containing alpha constraint violations. | required |
| beta_violations | dict | Dictionary containing beta constraint violations. | required |
| logger_obj |  | Optional logger object to log the information. | None |
print_sensitivity_and_active_constraints(sensitivity_summary, active_constraints_summary, logger_obj=None)
Logs the sensitivity summary and active constraints summary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sensitivity_summary | dict | Dictionary containing sensitivity analysis results. | required |
| active_constraints_summary | dict | Dictionary containing active constraints summary. | required |
| logger_obj |  | Optional logger object to log the information. | None |
plot_constraint_violations(alpha_violations, beta_violations, out_dir)
Function to plot constraint violations for alpha and beta values. It creates a stacked bar plot showing the violations for each protein. The top 5 proteins with the highest violations are highlighted in red.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alpha_violations | Series | Series containing alpha constraint violations. | required |
| beta_violations | Series | Series containing beta constraint violations. | required |
| out_dir | str | Directory to save the plot. | required |
plot_sensitivity_analysis(sensitivity_analysis, out_dir)
Function to plot sensitivity analysis results. It creates a horizontal bar plot showing the mean, max, and min sensitivity for each protein.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sensitivity_analysis | DataFrame | DataFrame containing sensitivity analysis results. | required |
| out_dir | str | Directory to save the plot. | required |
Returns:
| Type | Description |
|---|---|
| None |  |
process_excel_results(file_path=OUT_FILE)
Function to process the Excel results file. It reads the alpha and beta values, estimated and observed values, validates normalization constraints, computes residuals and gradients, and generates LaTeX tables for the residuals and sensitivity summaries. It also performs sensitivity analysis and identifies high sensitivity sites. The results are returned as a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str | Path to the Excel file containing results. | OUT_FILE |
Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.
post_optimization_results()
Function to process and visualize the results of the optimization.
Returns: dict: Dictionary containing the processed results, including alpha and beta values, estimated and observed values, constraint violations, residuals summary, sensitivity summary, and high sensitivity sites.
Data Processing
processing.cleanup
process_collecttri()
Processes the CollecTRI file to clean and filter mRNA-TF interactions. Removes complex interactions, filters by target genes, and saves the result.
format_site(site)
Formats a phosphorylation site string.
If the input is NaN or an empty string, returns an empty string. If the input contains an underscore ('_'), splits the string into two parts, converts the first part to uppercase, and appends the second part unchanged. Otherwise, converts the entire string to uppercase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| site | str | The phosphorylation site string to format. | required |
Returns:
| Name | Type | Description |
|---|---|---|
|  | str | The formatted phosphorylation site string. |
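This is a direct sketch of the rules above, with two assumptions. First, the underscore is kept between the two parts. Second, math.isnan stands in for the pandas-style NaN check the real function probably uses, so only None, NaN floats, and empty strings map to "".

```python
import math


def format_site(site):
    """Uppercase a phosphosite label; keep any text after the first underscore as-is."""
    # NaN-like inputs (None, float NaN) and empty strings become "".
    if site is None or (isinstance(site, float) and math.isnan(site)) or site == "":
        return ""
    site = str(site)
    if "_" in site:
        head, tail = site.split("_", 1)
        return f"{head.upper()}_{tail}"
    return site.upper()


print(format_site("y261"), format_site("s15_ab"))  # Y261 S15_ab
```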
process_msgauss()
Processes the MS Gaussian data file to generate time series data.
process_msgauss_std()
Processes the MS Gaussian data file to compute transformed means and standard deviations.
process_routlimma()
Processes the Rout Limma table to generate time series data for mRNA.
update_gene_symbols(filename)
Updates the GeneID column in a CSV file by mapping GeneIDs to gene/protein symbols.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | The path to the CSV file to be updated. The file must contain a 'GeneID' column. | required |
move_processed_files()
Moves or copies processed files to their respective directories.
processing.map
map_optimization_results(tf_file_path, kin_file_path, sheet_name='Alpha Values')
Reads the TF-mRNA optimization results from an Excel file and maps mRNA to each TF.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tf_file_path |  | Path to the Excel file containing TF-mRNA optimization results. | required |
| kin_file_path |  | Path to the Excel file containing Kinase-Phosphorylation optimization results. | required |
| sheet_name |  | The name of the sheet in the Excel file to read from. | 'Alpha Values' |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | A DataFrame containing the mapped TF, mRNA, Psite, and Kinase information. |
create_cytoscape_table(mapping_csv_path)
Creates a Cytoscape-compatible edge table from a mapping file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mapping_csv_path | str | Path to the input CSV file with columns: TF, TF_strength, mRNA, Psite, Kinase, Kinase_strength | required |
Returns:
| Name | Type | Description |
|---|---|---|
| table | DataFrame | Edge table with columns [Source, Target, Interaction, Strength] |
add_kinetic_strength_columns(mapping_path, mapping__path, excel_path, suffix)
Adds kinetic strength columns to the mapping files based on the provided Excel file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mapping_path | str | Path to the first mapping file. | required |
| mapping__path | str | Path to the second mapping file. | required |
| excel_path | str | Path to the Excel file containing kinetic strength data. | required |
| suffix | str | Suffix to append to the output files. | required |
generate_nodes(edge_df)
Infers node types and aggregates all phosphorylation sites per target node from phosphorylation edges.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| edge_df | DataFrame | Must have columns ['Source', 'Target', 'Interaction', 'Psite'] | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with columns ['Node', 'Type', 'Psite'] |
Runtime Configuration
config.cli
Command‑line entry point for the phoskintime pipeline.
kinopt(mode: str = typer.Option('local', help='local | evol'), conf: Path | None = typer.Option(None, '--conf', file_okay=True, dir_okay=False, writable=False, help='Path to TOML/YAML config. Uses defaults if omitted.'))
Kinase-Phosphorylation Optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mode | str | Optimization mode: local or evol. | 'local' |
| conf | Path \| None | Path to TOML/YAML config. Uses defaults if omitted. | None |
Returns: None
clean()
Remove all pycache, .pyc, .nbc, and build artifacts recursively.
all(kin_mode: str = typer.Option('local', help='kinopt mode: local | evol'), kin_conf: Path | None = typer.Option(None, help='kinopt config file'))
Run every stage in sequence: preprocessing, then kinase optimisation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kin_mode | str | kinopt mode: local or evol. | 'local' |
| kin_conf | Path \| None | Path to TOML/YAML config. Uses defaults if omitted. | None |
Returns: None
config.logconf
ColoredFormatter
Bases: Formatter
Custom formatter that adds colors and elapsed time to log messages.
It colorizes each message with ANSI escape codes according to its severity level and appends a right-aligned clock showing the time elapsed since the logger was initialized, in a human-readable format (e.g., "1h 23m 45s"). Messages are padded to a fixed width, adjustable through the width parameter; remove_ansi strips ANSI escape codes so that the padding is computed on the visible text. The overridden format method builds each line from the timestamp, logger name, log level, and message. The formatter is designed for a logger with a console handler.
The setup_logger function configures a logger with two handlers: a file handler that writes messages to a log file in the specified directory, and a stream handler that writes them to the console. The logger is set to the specified logging level, and the log file is rotated by size, with old log files kept as backups.
format(record)
Format the log record with colors and elapsed time. This method overrides the default format method to customize the log message format. It includes the timestamp, logger name, log level, and message.
remove_ansi(s)
staticmethod
Remove ANSI escape codes from a string.
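Here is a compact sketch of the behavior described above, using only the logging, re, and time modules. The color table, the default width of 100 columns, the format string, and the h/m/s clock layout are assumptions chosen to match the description, not the module's actual values.

```python
import logging
import re
import time

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
RESET = "\x1b[0m"
LEVEL_COLORS = {
    logging.DEBUG: "\x1b[36m",       # cyan
    logging.INFO: "\x1b[32m",        # green
    logging.WARNING: "\x1b[33m",     # yellow
    logging.ERROR: "\x1b[31m",       # red
    logging.CRITICAL: "\x1b[1;31m",  # bold red
}


class ColoredFormatter(logging.Formatter):
    def __init__(self, fmt="%(asctime)s %(name)s %(levelname)s %(message)s", width=100):
        super().__init__(fmt)
        self.start = time.monotonic()  # elapsed time is measured from here
        self.width = width

    @staticmethod
    def remove_ansi(s):
        return ANSI_RE.sub("", s)

    def format(self, record):
        base = super().format(record)
        message = f"{LEVEL_COLORS.get(record.levelno, '')}{base}{RESET}"
        hours, rest = divmod(int(time.monotonic() - self.start), 3600)
        minutes, seconds = divmod(rest, 60)
        clock = f"{hours}h {minutes:02d}m {seconds:02d}s"
        # Pad on the visible length so escape codes do not skew the alignment.
        pad = max(self.width - len(self.remove_ansi(message)) - len(clock), 1)
        return f"{message}{' ' * pad}{clock}"


demo = logging.getLogger("colored.demo")
handler = logging.StreamHandler()
handler.setFormatter(ColoredFormatter(width=80))
demo.addHandler(handler)
demo.warning("disk almost full")
```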
format_duration(seconds)
Format elapsed seconds as a compact human-readable duration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seconds |  | Elapsed time in seconds. | required |
Returns:
| Type | Description |
|---|---|
| str | A string in seconds, minutes, or hours depending on duration. |
setup_logger(name=None, log_file=None, level=logging.DEBUG, log_dir=LOG_DIR, rotate=True, max_bytes=2 * 1024 * 1024, backup_count=5, mp_file_logging='main_only')
Set up a logger with colored output and file logging.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name |  | Logger name. If omitted, the caller module name is used. | None |
| log_file |  | Optional log file path. If omitted, a dated file is created in | None |
| level |  | Logging level for handlers and the logger. | DEBUG |
| log_dir |  | Directory for file logs. | LOG_DIR |
| rotate |  | Whether to use | True |
| max_bytes |  | Maximum rotating log file size. | 2 * 1024 * 1024 |
| backup_count |  | Number of rotated log files to keep. | 5 |
| mp_file_logging |  | Multiprocessing file logging policy: | 'main_only' |
Returns:
| Type | Description |
|---|---|
|  | The configured logger. |
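This is a reduced sketch of the file-plus-console setup with size-based rotation, built on logging.handlers.RotatingFileHandler from the standard library. It differs from the documented function in several ways:
- name and log_file are required here;
- the default-naming logic, log_dir, and the mp_file_logging policy are omitted;
- the console handler uses a plain formatter, whereas the real function would presumably attach ColoredFormatter.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def setup_logger(name, log_file, level=logging.DEBUG, rotate=True,
                 max_bytes=2 * 1024 * 1024, backup_count=5):
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:  # already configured: avoid duplicate handlers
        return logger
    os.makedirs(os.path.dirname(log_file) or ".", exist_ok=True)
    if rotate:
        # Rolls over to log_file.1, log_file.2, ... once the file reaches max_bytes.
        file_handler = RotatingFileHandler(log_file, maxBytes=max_bytes, backupCount=backup_count)
    else:
        file_handler = logging.FileHandler(log_file)
    file_handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    console_handler = logging.StreamHandler()  # plain formatter in this sketch
    for handler in (file_handler, console_handler):
        handler.setLevel(level)
        logger.addHandler(handler)
    logger.propagate = False  # keep records from being printed twice via the root logger
    return logger
```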
config_loader
load(mode: str, section: str) -> dict[str, Any]
cached
Load configuration for a specific mode and section.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mode | str | The mode to load configuration for. | required |
| section | str | The section to load configuration for. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | The loaded configuration. |
ensure_dirs() -> None
Ensure that the necessary directories exist for the project.
Returns:
| Type | Description |
|---|---|
| None |  |