PhosKinTime Data Preprocessing & Mapping

This workflow cleans raw proteomics and transcriptomics datasets into time-series inputs for the kinase and transcription factor (TF) optimization models, then maps the optimization results into network tables.

Structure

phoskintime/
├── processing/
│   ├── cleanup.py     # Data cleaning and preparation
│   └── map.py         # Optimization result mapping and network table generation
├── raw/               # Input CSVs (CollecTRI, MS Gaussian, Rout Limma)
├── kinopt/data/       # Kinase model inputs
├── tfopt/data/        # TF model inputs
└── data/              # Network export for Cytoscape

Scripts Overview

`cleanup.py`

Performs the following steps:

TF-mRNA Interaction Cleanup
- Filters out complex interactions in CollecTRI
- Keeps only TFs matching phospho-interactions in input2.csv

Proteomics Data Transformation
- Transforms MS Gaussian predictions with 2^mean
- Formats phosphorylation sites and saves the result to input1.csv

Error Propagation
- Propagates the standard deviation through the 2^x transform, where x is the predicted mean and σ_x its standard deviation:
  σ_y = 2^x * ln(2) * σ_x
- Saves to input1_wstd.csv

Transcriptomics Cleanup
- Transforms Rout Limma values with 2^x
- Saves to input3.csv

Gene Symbol Mapping
- Replaces Ensembl/Entrez IDs with gene symbols (using MyGeneInfo)

File Management
- Moves cleaned files to kinopt/data/ and tfopt/data/
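
The transformation and error propagation steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in `cleanup.py`; the function name `back_transform`, the row layout, and the example values are assumptions made for the sketch.

```python
import math


def back_transform(mean_x: float, std_x: float) -> tuple[float, float]:
    """Apply y = 2**x and propagate the standard deviation to first order.

    sigma_y = |dy/dx| * sigma_x = 2**x * ln(2) * sigma_x
    """
    y = 2.0 ** mean_x
    sigma_y = y * math.log(2.0) * std_x
    return y, sigma_y


# Hypothetical rows: (gene, phosphosite, predicted mean, predicted std).
rows = [
    ("GENE_A", "S_10", 1.0, 0.5),
    ("GENE_A", "Y_25", 0.0, 0.1),
]

for gene, psite, mean_x, std_x in rows:
    value, std = back_transform(mean_x, std_x)
    print(f"{gene} {psite}: value={value:.3f}, std={std:.3f}")
```

The propagated value is a first-order (delta method) approximation, so it is most reliable when σ_x is small relative to the curvature of 2^x.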

`map.py`

This script processes optimization results for transcription factors (TFs) and kinases, mapping their interactions with mRNA and phosphorylation sites. It generates Cytoscape-compatible edge and node tables for network visualization.

Key Features:

TF-mRNA Mapping: Extracts non-zero optimization results and groups mRNA by associated TFs and their strengths.
Kinase-Phosphorylation Mapping: Maps kinases to mRNA and phosphorylation sites based on optimization results.
Cytoscape Table Generation: Creates edge and node tables for network visualization, including interaction types and strengths.
Kinetic Strength Integration: Adds kinetic strength columns to mapping files for further analysis.
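
As a rough sketch of the table generation described above, the snippet below builds an edge list and a node-role mapping from optimization results. It is not the implementation in `map.py`: the function `build_tables`, the column names (`Source`, `Target`, `Interaction`, `Strength`), the role labels, and the example strengths are all assumptions chosen for illustration.

```python
import csv
import io


def build_tables(tf_results, kinase_results):
    """Build edge rows and a node -> role mapping.

    Each result is a (regulator, target, strength) tuple. Pairs with a
    strength of zero are skipped, mirroring the non-zero filter.
    """
    edges = []
    roles = {}
    groups = (
        ("TF", "regulation", tf_results),
        ("Kinase", "phosphorylation", kinase_results),
    )
    for role, interaction, results in groups:
        for source, target, strength in results:
            if strength == 0:
                continue
            edges.append({
                "Source": source,
                "Target": target,
                "Interaction": interaction,
                "Strength": strength,
            })
            # A regulator takes its group's role; if a node regulates in
            # both groups, the later group (Kinase) wins in this sketch.
            roles[source] = role
            # A target keeps any role it already has.
            roles.setdefault(target, "Target")
    return edges, roles


# Hypothetical optimization results with made-up strengths.
tf_results = [("TP53", "CDKN1A", 0.8), ("TP53", "MDM2", 0.0), ("MYC", "CDKN1A", -0.3)]
kinase_results = [("ATM", "TP53", 0.6)]
edges, roles = build_tables(tf_results, kinase_results)

# Write the edge table as CSV text; a real script would write to a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Source", "Target", "Interaction", "Strength"])
writer.writeheader()
writer.writerows(edges)
print(buffer.getvalue())
```

Keeping roles in a separate mapping lets the node table be written independently of the edge table, which matches how Cytoscape imports network and node attribute tables separately.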

Inputs

Place the following raw data in processing/raw/:

CollecTRI.csv
MS_Gaussian_updated_09032023.csv
Rout_LimmaTable.csv
input2.csv (phospho interactions)

Outputs

File	Description
`input1.csv`	Phospho time series (KinOpt, TFOpt)
`input1_wstd.csv`	Phospho time series with propagated standard deviations
`input2.csv`	Phospho kinase-interaction metadata
`input3.csv`	mRNA time series (TFOpt)
`input4.csv`	Clean TF-mRNA interactions
`mapping.csv`	Mapped TF → mRNA with Kinase + Psite
`mapping_.csv`	Cytoscape-compatible edge list
`nodes.csv`	Cytoscape node roles

Notes

Complex TF interactions (e.g. COMPLEX:TF1/TF2) are excluded.
Kinase-only proteins not appearing in CollecTRI (e.g. PAK2) are excluded from TF mapping.
Gene IDs that cannot be mapped to a gene symbol are printed at runtime.
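
The two exclusion rules in these notes can be illustrated with a small filter. This is a sketch under stated assumptions rather than the logic in `cleanup.py`: the function `clean_tf_interactions`, the `(tf, target)` tuple layout, and the placeholder gene names are hypothetical, and the `COMPLEX:` prefix is taken from the example in the note above.

```python
def clean_tf_interactions(interactions, phospho_genes):
    """Drop complex entries and TFs absent from the phospho interaction set."""
    kept = []
    for tf, target in interactions:
        if tf.startswith("COMPLEX:"):
            continue  # complex TF interactions are excluded
        if tf not in phospho_genes:
            continue  # keep only TFs that also appear in the phospho data
        kept.append((tf, target))
    return kept


# Hypothetical TF -> target rows and phospho gene set.
interactions = [
    ("COMPLEX:TF1/TF2", "GENE_A"),
    ("TF1", "GENE_A"),
    ("TF3", "GENE_B"),
]
phospho_genes = {"TF1", "PAK2"}

cleaned = clean_tf_interactions(interactions, phospho_genes)
print(cleaned)
```

In this example PAK2 is present in the phospho set but has no TF row, so it never enters the TF mapping, which is the behavior the second note describes.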