PhosKinTime Data Preprocessing & Mapping
This workflow prepares and maps time-series data for kinase and transcription factor optimization models from raw proteomics and transcriptomics datasets.
Structure
phoskintime/
├── processing/
│ ├── cleanup.py # Data cleaning and preparation
│ └── map.py # Optimization result mapping and network table generation
├── raw/ # Input CSVs (CollecTRI, MS Gaussian, Rout Limma)
├── kinopt/data/ # Kinase model inputs
├── tfopt/data/ # TF model inputs
└── data/ # Network export for Cytoscape
Scripts Overview
cleanup.py
Performs the following steps:
- TF-mRNA Interaction Cleanup
  - Filters complex interactions in CollecTRI
  - Keeps only TFs matching phospho-interactions in input2.csv
- Proteomics Data Transformation
  - Transforms MS Gaussian predictions with 2^mean
  - Formats phosphorylation sites, saves to input1.csv
- Error Propagation
  - Computes std propagation: σ_y = 2^x * ln(2) * σ_x
  - Saves to input1_wstd.csv
- Transcriptomics Cleanup
  - Transforms Rout Limma values with 2^x
  - Saves to input3.csv
- Gene Symbol Mapping
  - Replaces Ensembl/Entrez IDs with gene symbols (using MyGeneInfo)
- File Management
  - Moves cleaned files to kinopt/data/ and tfopt/data/
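The back-transformation and error-propagation steps above can be illustrated with a minimal sketch. It assumes the predicted means and standard deviations are on a log2 scale; the function name `back_transform` and the sample values are hypothetical and are not taken from cleanup.py.

```python
import math


def back_transform(log2_mean, log2_std):
    """Map a log2-scale mean and std to the linear scale.

    y = 2**x, and first-order error propagation gives
    sigma_y = |dy/dx| * sigma_x = 2**x * ln(2) * sigma_x.
    """
    linear_mean = 2.0 ** log2_mean
    linear_std = linear_mean * math.log(2) * log2_std
    return linear_mean, linear_std


# Hypothetical log2 predictions for one phosphosite at three time points.
log2_means = [0.0, 1.0, -1.0]
log2_stds = [0.1, 0.2, 0.1]

for x, sx in zip(log2_means, log2_stds):
    y, sy = back_transform(x, sx)
    print(f"x={x:+.1f}  y={y:.3f}  sigma_y={sy:.4f}")
```

The propagated standard deviation scales with the back-transformed mean, so sites with larger fold changes carry proportionally larger absolute uncertainty.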
map.py
This script processes optimization results for transcription factors (TFs) and kinases, mapping their interactions with mRNA and phosphorylation sites. It generates Cytoscape-compatible edge and node tables for network visualization.
Key Features:
- TF-mRNA Mapping: Extracts non-zero optimization results and groups mRNA by associated TFs and their strengths.
- Kinase-Phosphorylation Mapping: Maps kinases to mRNA and phosphorylation sites based on optimization results.
- Cytoscape Table Generation: Creates edge and node tables for network visualization, including interaction types and strengths.
- Kinetic Strength Integration: Adds kinetic strength columns to mapping files for further analysis.
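As a rough illustration of the edge and node table generation described above, the sketch below builds both tables from (source, target, strength) triples and drops zero-strength pairs. The function names, column names, role labels, and sample data are all assumptions made for this example; the actual layout produced by map.py may differ.

```python
import csv
import io


def build_cytoscape_tables(tf_edges, kinase_edges):
    """Return (edges, nodes) built from (source, target, strength) triples.

    Zero-strength pairs are dropped, mirroring the non-zero filter
    described for the optimization results.
    """
    edges = []
    nodes = {}
    for interaction, triples, source_role in (
        ("TF-mRNA", tf_edges, "TF"),
        ("Kinase-Psite", kinase_edges, "Kinase"),
    ):
        for source, target, strength in triples:
            if strength == 0:
                continue
            edges.append({
                "Source": source,
                "Target": target,
                "Interaction": interaction,
                "Strength": strength,
            })
            # A regulator role replaces an earlier "Target" role;
            # targets never overwrite an existing role.
            nodes[source] = source_role
            nodes.setdefault(target, "Target")
    return edges, nodes


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical optimization results.
tf_edges = [("TF1", "GENE_A", 0.8), ("TF2", "GENE_A", 0.0)]
kinase_edges = [("KIN1", "GENE_A_S10", 0.5)]

edges, nodes = build_cytoscape_tables(tf_edges, kinase_edges)
print(to_csv(edges, ["Source", "Target", "Interaction", "Strength"]))
```

Keeping the interaction type and strength on each edge lets Cytoscape style edges by type and weight without a second lookup table.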
Inputs
Place the following raw data in processing/raw/:
- CollecTRI.csv
- MS_Gaussian_updated_09032023.csv
- Rout_LimmaTable.csv
- input2.csv (phospho interactions)
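Before running the cleanup, it can help to confirm that all four inputs are present. The check below is a hypothetical pre-flight helper, not part of the shipped scripts; `missing_inputs` and `REQUIRED_INPUTS` are names invented for this sketch.

```python
from pathlib import Path

# File names as listed in the Inputs section.
REQUIRED_INPUTS = (
    "CollecTRI.csv",
    "MS_Gaussian_updated_09032023.csv",
    "Rout_LimmaTable.csv",
    "input2.csv",
)


def missing_inputs(raw_dir):
    """Return the required file names that are absent from raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in REQUIRED_INPUTS if not (raw_dir / name).is_file()]


missing = missing_inputs("processing/raw")
if missing:
    print("Missing raw inputs:", ", ".join(missing))
else:
    print("All raw inputs found.")
```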
Outputs
File | Description |
---|---|
input1.csv | Phospho time series (KinOpt, TFOpt) |
input1_wstd.csv | Same as above + standard deviation |
input2.csv | Phospho kinase-interaction metadata |
input3.csv | mRNA time series (TFOpt) |
input4.csv | Clean TF-mRNA interactions |
mapping.csv | Mapped TF → mRNA with Kinase + Psite |
mapping_.csv | Cytoscape-compatible edge list |
nodes.csv | Cytoscape node roles |
Notes
- Complex TF interactions (e.g. COMPLEX:TF1/TF2) are excluded.
- Kinase-only proteins not appearing in CollecTRI (e.g. PAK2) are excluded from TF mapping.
- Unmappable GeneIDs are printed at runtime.
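The first two notes can be sketched as a simple filter. This is a minimal illustration that assumes interactions arrive as (TF, target) pairs and that complexes are marked with a `COMPLEX:` prefix as in the example above; the function name, the pair layout, and the gene names other than PAK2 are hypothetical.

```python
def filter_tf_interactions(interactions, phospho_genes):
    """Keep simple TF-target pairs whose TF has phospho data.

    interactions: iterable of (tf, target) pairs.
    phospho_genes: set of gene symbols present in the phospho interactions.
    """
    kept = []
    for tf, target in interactions:
        # Drop complexes such as "COMPLEX:TF1/TF2".
        if tf.startswith("COMPLEX:"):
            continue
        # Drop TFs with no matching phospho interaction.
        if tf not in phospho_genes:
            continue
        kept.append((tf, target))
    return kept


# Hypothetical data: PAK2 has phospho data but never appears as a TF,
# so it contributes no rows to the TF mapping.
interactions = [
    ("TF1", "GENE_A"),
    ("COMPLEX:TF1/TF2", "GENE_B"),
    ("TF3", "GENE_C"),
]
phospho_genes = {"TF1", "PAK2"}

kept = filter_tf_interactions(interactions, phospho_genes)
print(kept)
```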