API Reference
Welcome to the API reference for PhosphoVelocity. This documentation is automatically generated from the Python source code.
Model & Inference
Core modules for Gaussian Process modeling and Bayesian inference.
Gaussian Process Surrogate
PhosphositeGP
GP surrogate for a single phosphosite trajectory.
Incorporates per-observation uncertainty directly (heteroscedastic noise) and uses a Matérn 5/2 kernel, which is better suited to biochemical dynamics.
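As a rough illustration of what heteroscedastic noise means here, the NumPy sketch below conditions a Matérn 5/2 GP on observations whose squared standard deviations sit on the covariance diagonal, after dropping non-finite points. It is not the `PhosphositeGP` implementation: `matern52` and `gp_posterior` are hypothetical names, the hyperparameters are fixed rather than optimised, and `normalize_y` is omitted.

```python
import numpy as np


def matern52(t1, t2, length_scale=30.0, signal_var=1.0):
    """Matérn 5/2 covariance between two 1-D arrays of time points (minutes)."""
    a = np.sqrt(5.0) * np.abs(np.subtract.outer(t1, t2)) / length_scale
    return signal_var * (1.0 + a + a**2 / 3.0) * np.exp(-a)


def gp_posterior(times, values, uncertainties, grid, length_scale=30.0, signal_var=1.0):
    """Posterior mean and std of f at `grid` with per-observation noise (hypothetical helper)."""
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    s = np.asarray(uncertainties, dtype=float)
    grid = np.asarray(grid, dtype=float)

    # Drop non-finite observations before conditioning.
    keep = np.isfinite(t) & np.isfinite(y) & np.isfinite(s)
    t, y, s = t[keep], y[keep], s[keep]

    # Heteroscedastic noise: observation i contributes sigma_i**2 to the diagonal.
    K = matern52(t, t, length_scale, signal_var) + np.diag(s**2)
    K_star = matern52(grid, t, length_scale, signal_var)

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_star @ alpha

    v = np.linalg.solve(L, K_star.T)
    var = signal_var - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Observations with a large sigma_i pull the posterior mean only weakly, which is the practical effect of modelling the noise per observation.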
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| length_scale | float | Initial length-scale for the Matérn kernel (in minutes). | 30.0 |
| normalize_y | bool | Whether to normalize the target variable before fitting. | True |
Source code in src/phospho_velocity/model/gp.py
fit(times, values, uncertainties=None)
Fit GP to observed (time, log2_intensity) pairs.
Non-finite values are silently removed before fitting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| times | array-like, shape (n,) | | required |
| values | array-like, shape (n,) | | required |
| uncertainties | array-like, shape (n,) | Standard deviations of the observations. If provided, they are squared and passed as the | None |
Returns:
| Type | Description |
|---|---|
| self | |
Source code in src/phospho_velocity/model/gp.py
fit_parallel(instances, times_list, values_list, uncertainties_list=None, n_jobs=-1, desc='Fitting GPs')
Fit a list of PhosphositeGP instances in parallel, showing a tqdm progress bar.
Other methods (predict, predict_derivative) are unaffected; this function only replaces a sequential loop of .fit() calls.
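The dispatch pattern can be sketched with the standard library. This is not the package's implementation (which reports progress with tqdm); `ToyModel` and `fit_all_parallel` are hypothetical names, and a thread pool is used so the example runs anywhere. With a process-based backend each worker fits a copy of the object, which is why fitted instances should be returned and used rather than relying on in-place mutation.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


class ToyModel:
    """Hypothetical stand-in for a per-site model exposing .fit()."""

    def __init__(self):
        self.mean_ = None
        self.is_fitted = False

    def fit(self, times, values, uncertainties=None):
        finite = [v for v in values if math.isfinite(v)]  # drop NaN/inf before fitting
        self.mean_ = sum(finite) / len(finite)
        self.is_fitted = True
        return self


def fit_all_parallel(instances, times_list, values_list, uncertainties_list=None, n_jobs=-1):
    """Call .fit() on every instance concurrently and return them in input order."""
    if uncertainties_list is None:
        uncertainties_list = [None] * len(instances)
    if not len(instances) == len(times_list) == len(values_list) == len(uncertainties_list):
        raise ValueError("all input lists must have the same length")

    max_workers = (os.cpu_count() or 1) if n_jobs == -1 else n_jobs
    jobs = zip(instances, times_list, values_list, uncertainties_list)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so result i belongs to instance i.
        return list(pool.map(lambda job: job[0].fit(*job[1:]), jobs))
```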
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| instances | list of PhosphositeGP | Pre-constructed GP objects (one per (site, cell_line) group). | required |
| times_list | list of np.ndarray | Time arrays, one per instance. | required |
| values_list | list of np.ndarray | Intensity arrays, one per instance. | required |
| uncertainties_list | list of np.ndarray or None | Uncertainty arrays, one per instance. Pass None instead of a list to use the fixed noise floor for all instances. | None |
| n_jobs | int | Number of parallel jobs. -1 uses all available cores. | -1 |
| desc | str | Label for the tqdm progress bar. | 'Fitting GPs' |
Returns:
| Type | Description |
|---|---|
| list of PhosphositeGP | The same instances, each with _gp and _is_fitted populated. |
Source code in src/phospho_velocity/model/gp.py
Observation Model
Observation model for phosphopeptide log-ratios.
Based on the uncertainty propagation described in Robin et al. (2019): the localization probability of a phosphosite determines how much uncertainty is attributed to that measurement.
ObservationModel
Gaussian observation model for phosphopeptide log-ratios.
The likelihood is: p(y | mu, sigma) = Normal(y; mu, sigma)
Optionally, a Robin et al. 2019 Gamma-variance model can be fitted to empirical log-ratio data via `fit_variance_model`, after which `sigma_from_model` returns the predicted sigma for any log-ratio magnitude.
Parameters: none.
Source code in src/phospho_velocity/model/observation.py
fit_variance_model(log_ratios, sd_values, min_bin_size=20)
Fit a Robin et al. 2019 exponential variance model.
Models the relationship between |log_ratio| and the observed SD as
$$
b(x) = A \cdot \exp(B \cdot x), \quad x = \log(|\text{log\_ratio}| + \varepsilon)
$$
where the parameters (A, B) are estimated by log-linear regression on per-bin median SDs.
After this method is called, sigma_mode is automatically switched to "variance_model" and `sigma_from_model` becomes available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_ratios | ndarray, shape (N,) | Observed peptide log-ratios. | required |
| sd_values | ndarray, shape (N,) | Corresponding observed standard deviations. | required |
| min_bin_size | int | Minimum number of data points per bin (default 20). | 20 |
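A minimal NumPy sketch of this procedure, under stated assumptions: bins are consecutive runs of `min_bin_size` points after sorting by x, each bin is summarised by its median x and median SD, and ε defaults to 1e-3 (the value used by the package is not documented here). `fit_exponential_variance` and `sigma_from_params` are hypothetical names, not the `ObservationModel` API.

```python
import numpy as np


def fit_exponential_variance(log_ratios, sd_values, min_bin_size=20, eps=1e-3):
    """Estimate (A, B) in b(x) = A * exp(B * x) with x = log(|log_ratio| + eps)."""
    lr = np.asarray(log_ratios, dtype=float)
    sd = np.asarray(sd_values, dtype=float)

    # Keep finite pairs with a strictly positive SD (log(SD) is taken below).
    ok = np.isfinite(lr) & np.isfinite(sd) & (sd > 0)
    x = np.log(np.abs(lr[ok]) + eps)
    sd = sd[ok]

    # Sort by x and cut into consecutive bins of min_bin_size points.
    order = np.argsort(x)
    x, sd = x[order], sd[order]
    n_bins = len(x) // min_bin_size
    if n_bins < 2:
        raise ValueError("need at least two bins to estimate the slope B")

    bin_x, bin_sd = [], []
    for i in range(n_bins):
        start = i * min_bin_size
        stop = len(x) if i == n_bins - 1 else start + min_bin_size  # last bin takes leftovers
        bin_x.append(np.median(x[start:stop]))
        bin_sd.append(np.median(sd[start:stop]))

    # log b = log A + B * x, so a least-squares line returns (slope B, intercept log A).
    B, log_A = np.polyfit(bin_x, np.log(bin_sd), deg=1)
    return float(np.exp(log_A)), float(B)


def sigma_from_params(log_ratio, A, B, eps=1e-3):
    """Evaluate the fitted model at one or more log-ratios."""
    x = np.log(np.abs(np.asarray(log_ratio, dtype=float)) + eps)
    return A * np.exp(B * x)
```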
Source code in src/phospho_velocity/model/observation.py
log_likelihood(observed, predicted, sigma)
Gaussian log-likelihood.
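Written out, the log-density is $\log p(y \mid \mu, \sigma) = -\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{(y-\mu)^2}{2\sigma^2}$. A NumPy sketch under a hypothetical name (`gaussian_log_likelihood`); the method on this class may validate or broadcast its inputs differently:

```python
import numpy as np


def gaussian_log_likelihood(observed, predicted, sigma):
    """Elementwise log N(observed; predicted, sigma**2); sigma is a standard deviation."""
    y = np.asarray(observed, dtype=float)
    mu = np.asarray(predicted, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z = (y - mu) / sigma
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * z**2
```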
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| observed | float or ndarray | Observed log2 intensity or log-ratio values. | required |
| predicted | float or ndarray | Predicted values from the GP or ODE model. | required |
| sigma | float or ndarray | Observation noise standard deviation. | required |
Returns:
| Type | Description |
|---|---|
| float or ndarray | Log-likelihood value(s). |
Source code in src/phospho_velocity/model/observation.py
sigma_from_model(log_ratio)
Predict sigma using the fitted exponential variance model.
Requires `fit_variance_model` to have been called first.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_ratio | float or ndarray | Log-ratio magnitude(s) for which to predict sigma. | required |
Returns:
| Type | Description |
|---|---|
| float or ndarray | Predicted sigma value(s). |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If :meth: |
Source code in src/phospho_velocity/model/observation.py
peptide_log_ratio_uncertainty(localization_prob, min_sigma=0.1, max_sigma=2.0)
Map localization probability to observation uncertainty.
High localization probability → low sigma (reliable measurement). Low localization probability → high sigma (uncertain measurement).
sigma = max_sigma - (max_sigma - min_sigma) * localization_prob
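The mapping is linear in the localization probability. A NumPy sketch under a hypothetical name (`localization_sigma`); like the formula above, it does not clip probabilities outside [0, 1]:

```python
import numpy as np


def localization_sigma(localization_prob, min_sigma=0.1, max_sigma=2.0):
    """sigma = max_sigma - (max_sigma - min_sigma) * p, applied elementwise."""
    p = np.asarray(localization_prob, dtype=float)
    return max_sigma - (max_sigma - min_sigma) * p
```

With the defaults, p = 0.5 gives sigma = 2.0 - 1.9 * 0.5 = 1.05.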
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| localization_prob | float or ndarray | Phosphosite localization probability in [0, 1]. | required |
| min_sigma | float | Minimum uncertainty (at localization_prob = 1). | 0.1 |
| max_sigma | float | Maximum uncertainty (at localization_prob = 0). | 2.0 |
Returns:
| Type | Description |
|---|---|
| float or ndarray | Uncertainty sigma value(s). |
Source code in src/phospho_velocity/model/observation.py
Bayesian Inference (PyMC)
Bayesian GP inference using PyMC.
Implements a fully Bayesian Gaussian Process model for phosphosite trajectories, yielding posterior distributions over the latent trajectory and its derivative (velocity).
BayesianPhosphositeModel
Bayesian GP model for a single phosphosite trajectory.
Uses PyMC's latent GP with an ExpQuad (squared exponential) covariance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_samples | int | Number of MCMC samples per chain. | 500 |
| n_tune | int | Number of tuning steps. | 500 |
| target_accept | float | NUTS target acceptance rate. | 0.9 |
| random_seed | int | Random seed for reproducibility. | 42 |
| network | KinaseSubstrateNetwork | If provided, the regularization prior | None |
| site_id | str | Site identifier used with network to look up | None |
Source code in src/phospho_velocity/inference/gp_inference.py
fit(times, values, uncertainties=None)
Fit the Bayesian GP model via NUTS.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| times | array-like, shape (n,) | Observed time points. | required |
| values | array-like, shape (n,) | Observed log2 intensities (may contain NaN). | required |
| uncertainties | array-like, shape (n,) | Per-observation sigma. Defaults to 0.5 for all observations. | None |
Returns:
| Type | Description |
|---|---|
| InferenceData | Posterior samples. |
Source code in src/phospho_velocity/inference/gp_inference.py
posterior_predictive(idata, grid_times)
Compute the mean and standard deviation of the posterior predictive distribution at grid_times.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idata | InferenceData | | required |
| grid_times | array-like, shape (m,) | | required |
Returns:
| Name | Type | Description |
|---|---|---|
| mean | ndarray, shape (m,) | |
| std | ndarray, shape (m,) | |
Source code in src/phospho_velocity/inference/gp_inference.py
posterior_velocity(idata, grid_times, eps=0.5)
Velocity posterior via the GP derivative kernel.
For each posterior draw (ℓ_i, σ_f_i, noise_i), instantiates a `phospho_velocity.model.gp_derivative.GPDerivative` and conditions it on the observed values v_obs (stored during `fit`) using the per-observation sigma noise_i + unc.
The total velocity uncertainty is computed via the law of total variance: Var[v] = E[Var[v | θ]] + Var[E[v | θ]].
Falls back to a finite-difference approximation of the posterior predictive when fewer than 2 observed time points are available.
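The final combination step can be illustrated on its own. Given per-draw velocity means and variances (how those come out of the derivative kernel is not shown), the law of total variance averages the within-draw variances and adds the spread of the per-draw means. `combine_draws` is a hypothetical helper, not part of the documented API.

```python
import numpy as np


def combine_draws(draw_means, draw_vars):
    """Collapse per-draw (mean, variance) arrays of shape (S, m) into an overall mean and std.

    Var[v] = E[Var[v | theta]] + Var[E[v | theta]], with expectations over the S draws.
    """
    means = np.asarray(draw_means, dtype=float)
    variances = np.asarray(draw_vars, dtype=float)
    total_mean = means.mean(axis=0)
    total_var = variances.mean(axis=0) + means.var(axis=0)  # population variance over draws
    return total_mean, np.sqrt(total_var)
```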
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idata | InferenceData | | required |
| grid_times | array-like, shape (m,) | | required |
| eps | float | Finite-difference step used only for the fallback path. | 0.5 |
Returns:
| Name | Type | Description |
|---|---|---|
| vel_mean | ndarray, shape (m,) | |
| vel_std | ndarray, shape (m,) | |
Source code in src/phospho_velocity/inference/gp_inference.py
HierarchicalBayesianPhosphositeModel
Hierarchical GP model for processing batches of phosphosites.
Source code in src/phospho_velocity/inference/gp_inference.py
fit_and_predict_batch(df_batch, grid_times, group_col='_group_id', time_col='time_min', value_col='log2_intensity_norm', uncert_col='sigma')
Fit the hierarchical model and compute posterior velocity directly. Returns a dictionary mapping group_id to (vel_mean, vel_std, vel_q5, vel_q95).
Source code in src/phospho_velocity/inference/gp_inference.py
Velocity Computation
Higher-level wrappers for estimating velocities across full datasets.
Velocity computation for phosphosite time courses.
Provides GP-derivative velocity estimators operating on tidy long-format DataFrames of phosphosite time-course data. Velocity is derived analytically from the joint Gaussian Process over (f, df/dt) using the Matérn 5/2 derivative kernel, with no finite-difference approximation.
VelocityEstimator
Estimate phosphosite velocity from time-course data via GP derivative.
Velocity is computed analytically as the posterior mean of the GP derivative, conditioned on the observations, using the Matérn 5/2 derivative kernel.
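To make the derivative-kernel idea concrete: for the Matérn 5/2 kernel with $c = \sqrt{5}/\ell$ and $\tau = t - t'$, the cross-covariance between $f'(t)$ and $f(t')$ is $\partial k/\partial t = -\sigma_f^2 \,\tfrac{c^2 \tau}{3}\,(1 + c|\tau|)\,e^{-c|\tau|}$, and the prior variance of $f'$ is $5\sigma_f^2/(3\ell^2)$. The NumPy sketch below conditions on the observations with these terms. It assumes fixed hyperparameters and uses hypothetical function names; it is not the `VelocityEstimator` code.

```python
import numpy as np


def matern52(t1, t2, ell=30.0, sf2=1.0):
    """Matérn 5/2 covariance k(t1, t2)."""
    c_r = np.sqrt(5.0) * np.abs(np.subtract.outer(t1, t2)) / ell
    return sf2 * (1.0 + c_r + c_r**2 / 3.0) * np.exp(-c_r)


def matern52_dt1(t1, t2, ell=30.0, sf2=1.0):
    """Cross-covariance Cov(f'(t1), f(t2)) = dk(t1, t2)/dt1."""
    tau = np.subtract.outer(t1, t2)
    c = np.sqrt(5.0) / ell
    return -sf2 * (c**2 * tau / 3.0) * (1.0 + c * np.abs(tau)) * np.exp(-c * np.abs(tau))


def gp_velocity(times, values, uncertainties, grid, ell=30.0, sf2=1.0):
    """Posterior mean and std of df/dt on `grid`, conditioned on noisy observations."""
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    K = matern52(t, t, ell, sf2) + np.diag(np.asarray(uncertainties, dtype=float) ** 2)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

    K_vf = matern52_dt1(grid, t, ell, sf2)      # shape (m, n)
    vel_mean = K_vf @ alpha

    prior_var = sf2 * 5.0 / (3.0 * ell**2)      # Var[f'(t)] = -k''(0)
    v = np.linalg.solve(L, K_vf.T)
    vel_var = prior_var - np.sum(v**2, axis=0)
    return vel_mean, np.sqrt(np.maximum(vel_var, 0.0))
```

A useful sanity check is that the velocity mean equals the time derivative of the GP posterior mean, so a central finite difference of the posterior mean should agree closely with `vel_mean`.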
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grid_times | array-like | Dense time grid for GP evaluation. Defaults to | None |
| length_scale | float | GP length-scale parameter passed to :class: | 30.0 |
| signal_var | float | GP signal variance passed to :class: | 1.0 |
| network | KinaseSubstrateNetwork | If provided, used to derive per-site GP hyperparameters via :meth: | None |
| prior_strength | float | Scaling factor that controls how strongly the network connectivity modulates the per-site hyperparameters. Defaults to 1.0. | 1.0 |
Source code in src/phospho_velocity/velocity/compute.py
length_scale (property)
Alias for `base_length_scale` (backward compatibility).
signal_var (property)
Alias for `base_signal_var` (backward compatibility).
estimate_site_velocity(times, values, uncertainties=None)
Estimate velocity for a single (site, cell_line) pair.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| times | array-like, shape (n,) | | required |
| values | array-like, shape (n,) | | required |
| uncertainties | array-like, shape (n,) | | None |
Returns:
| Type | Description |
|---|---|
| dict with keys: | |
Source code in src/phospho_velocity/velocity/compute.py
run_bayesian_velocity(df, value_col='log2_intensity_norm', time_col='time_min', site_col='site_id', cell_line_col='cell_line', n_samples=1000, n_tune=1000, n_grid=60, batch_size=20, method='hierarchical_bayesian', extend_to=None, network=None)
Run Bayesian GP velocity for all (site, cell_line) groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | | required |
| value_col | str | | 'log2_intensity_norm' |
| time_col | str | | 'time_min' |
| site_col | str | | 'site_id' |
| cell_line_col | str | | 'cell_line' |
| n_samples | int | | 1000 |
| n_tune | int | | 1000 |
| n_grid | int | Number of points in the evaluation time grid (default 60). | 60 |
| batch_size | int | Number of trajectories to process simultaneously (hierarchical only). | 20 |
| method | str | 'bayesian' (independent GPs) or 'hierarchical_bayesian' (batched partial pooling). | 'hierarchical_bayesian' |
| extend_to | float | Extend time grid to this value. | None |
| network | KinaseSubstrateNetwork | If provided, used to set network-informed priors on GP signal variance. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: site_id, cell_line, time_min, velocity_mean, velocity_sd, velocity_q5, velocity_q95. |
Source code in src/phospho_velocity/velocity/compute.py
Data Processing
Modules for parsing raw MaxQuant files and preprocessing the data.
Input / Output
MaxQuant Phospho(STY)Sites.txt parser.
Parses the wide-format MaxQuant phosphoproteomics output into a tidy long-format DataFrame suitable for downstream GP-based time-course reconstruction and velocity modeling.
MaxQuantAdapter
Adapter for MaxQuant Phospho(STY)Sites.txt files.
Wraps the module-level parsing functions as a `phospho_velocity.io.base.PhosphoInputAdapter`-compatible object.
The module-level functions are kept for backward compatibility.
Source code in src/phospho_velocity/io/maxquant.py
parse(path)
Parse a MaxQuant Phospho(STY)Sites.txt file.
Delegates to `parse_phosphosites`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the file. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Canonical long-format DataFrame. |
Source code in src/phospho_velocity/io/maxquant.py
validate_columns(df)
Return the canonical columns that are missing from df.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | | required |
Returns:
| Type | Description |
|---|---|
| list of str | |
Source code in src/phospho_velocity/io/maxquant.py
find_intensity_columns(cols)
Strictly match only raw intensity columns such as 'Intensity ES2_1'. The following are NOT matched:
- Intensity L
- Intensity H
- Intensity normalized
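One way to express this strict match is a fully anchored regular expression. The pattern below is an assumption (alphanumeric sample name, underscore, integer replicate) and `find_raw_intensity_columns` is a hypothetical name; the package's own rule may differ.

```python
import re

# Assumed shape of a raw intensity column: "Intensity <alphanumeric sample>_<replicate>".
RAW_INTENSITY = re.compile(r"^Intensity [A-Za-z0-9]+_\d+$")


def find_raw_intensity_columns(cols):
    """Return the column names that look like raw per-sample intensities, in input order."""
    return [c for c in cols if RAW_INTENSITY.match(c)]
```

Because the pattern is anchored at both ends, a suffixed name such as `Intensity ES2_1___1` is rejected as well.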
Source code in src/phospho_velocity/io/maxquant.py
make_site_id(row)
Build a stable site ID from a MaxQuant phosphosite row.
The Leading proteins field from MaxQuant may optionally carry colon-delimited genomic annotation appended by external tools, giving the following full site_id schema (all parts colon-separated within the leading-proteins segment, with an underscore before the phosphosite suffix):
`<protein_id>:<genomic_mapping>:<aa_substitution>:<nucleotide_change>:<codon_change>:<sift_prediction>:<polyphen_score>_<phosphosite>`
Example:
`ENSP00000263026:map.16/22269867/G,T:p.Q361R:n.A1556G:c.cAa/cGa:SIFTprediction.tolerated:PolyPhenScore.0_S74`
For unannotated entries the simpler form is used:
`ENSP00000001146_S460`
Component descriptions:
- protein_id: Ensembl protein accession, e.g. ENSP00000263026.
- genomic_mapping: chromosome/position/ref,alt, e.g. map.16/22269867/G,T. Note: contains a literal comma.
- aa_substitution: HGVS protein change, e.g. p.Q361R.
- nucleotide_change: transcript-level HGVS, e.g. n.A1556G.
- codon_change: codon-level notation, e.g. c.cAa/cGa.
- sift_prediction: e.g. SIFTprediction.tolerated.
- polyphen_score: e.g. PolyPhenScore.0.
- phosphosite: single amino-acid + position code, e.g. S74 (only the first position from Positions within proteins is used, to avoid repeated entries like S74;74;385;269).
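The assembly amounts to string concatenation; any annotation already present in Leading proteins is carried through unchanged. The sketch assumes the row exposes MaxQuant's `Leading proteins`, `Amino acid` and `Positions within proteins` fields and uses the leading-proteins value verbatim; `build_site_id` is a hypothetical name, and the real `make_site_id` may normalise fields differently.

```python
def build_site_id(row):
    """Assemble '<leading proteins>_<amino acid><first position>' from a row mapping."""
    protein_part = str(row["Leading proteins"])
    amino_acid = str(row["Amino acid"])
    # Keep only the first entry of e.g. "74;74;385;269" to avoid repeated positions.
    first_position = str(row["Positions within proteins"]).split(";")[0].strip()
    return f"{protein_part}_{amino_acid}{first_position}"
```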
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| row | Series | A row from Phospho(STY)Sites.txt. | required |
Returns:
| Type | Description |
|---|---|
| str | Opaque but human-readable site identifier. Downstream code must treat this as an opaque string; use :func: |
Source code in src/phospho_velocity/io/maxquant.py
parse_phosphosites(filepath, sep='\t')
Parse MaxQuant Phospho(STY)Sites.txt into tidy long-format DataFrame.
Rows corresponding to contaminants (Potential contaminant == "+") and reverse decoy hits (Reverse == "+") are removed. Rows with NaN intensity are kept to preserve the missingness structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path to | required |
| sep | str | Column separator (default | '\t' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Tidy long-format table with columns: |
Source code in src/phospho_velocity/io/maxquant.py
parse_sample_meta(colname)
Parse a column name such as 'Intensity ES2_1' into (cell_line='ES2', replicate=1).
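A sketch of this parsing step with a capturing regular expression. The accepted pattern is an assumption, and `parse_sample_meta_sketch` is a hypothetical name.

```python
import re

# Assumed column shape: "Intensity <alphanumeric cell line>_<integer replicate>".
_SAMPLE_COLUMN = re.compile(r"^Intensity ([A-Za-z0-9]+)_(\d+)$")


def parse_sample_meta_sketch(colname):
    """'Intensity ES2_1' -> ('ES2', 1); raises ValueError for any other shape."""
    match = _SAMPLE_COLUMN.match(colname)
    if match is None:
        raise ValueError(f"unexpected intensity column name: {colname!r}")
    cell_line, replicate = match.groups()
    return cell_line, int(replicate)
```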
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| colname | str | Column name such as | required |
Returns:
| Type | Description |
|---|---|
| tuple of (str, int) | |
Raises:
| Type | Description |
|---|---|
| ValueError | If the column name does not match the expected pattern. |
Source code in src/phospho_velocity/io/maxquant.py
Preprocessing & Normalization
Preprocessing and normalization utilities.
Implements log2 transformation, per-sample median normalization, log-fold-change computation, and missingness handling for phosphoproteomics time-course data.
assign_time(df, time_map=None, allow_missing=False)
Add time_min column from cell_line + replicate mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Long-format DataFrame with | required |
| time_map | dict or `phospho_velocity.config.TimeMap` | Mapping | None |
| allow_missing | bool | When | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | Copy of df with added |
Source code in src/phospho_velocity/preprocessing/normalize.py
compute_log_fold_change(df, value_col='log2_intensity', baseline_time=0.0)
Compute log2 fold-change vs. baseline time per (site_id, cell_line).
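Because the values are already on a log2 scale, the fold change is a subtraction of the baseline. The pure-Python sketch below works on a list of records rather than a DataFrame. Averaging replicate baselines, returning NaN for groups without a baseline, and the `log2_fc` output key are assumptions; `log_fold_change` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def log_fold_change(records, value_key="log2_intensity", baseline_time=0.0):
    """Return copies of `records` with log2_fc = value - mean baseline value of its group."""
    baselines = defaultdict(list)
    for r in records:
        if r["time_min"] == baseline_time:
            baselines[(r["site_id"], r["cell_line"])].append(r[value_key])

    out = []
    for r in records:
        base_values = baselines.get((r["site_id"], r["cell_line"]))
        base = mean(base_values) if base_values else float("nan")  # no baseline -> NaN
        out.append({**r, "log2_fc": r[value_key] - base})
    return out
```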
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Long-format DataFrame with | required |
| value_col | str | Column with log2 intensity values. | 'log2_intensity' |
| baseline_time | float | Time point to use as baseline (default 0.0). | 0.0 |
Returns:
| Type | Description |
|---|---|
| DataFrame | Copy of df with |
Source code in src/phospho_velocity/preprocessing/normalize.py
median_normalize(df, value_col='log2_intensity', group_cols=None)
Median-center values within each cell_line (the default grouping) while preserving the global scale.
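Reading "preserving the global scale" as adding the overall median back after centering each group (an interpretation, not confirmed by the source), the operation looks like this in plain Python. `median_center` is a hypothetical name, and NaNs are passed through untouched.

```python
import math
from statistics import median


def median_center(values, groups):
    """Subtract each group's median, then add back the global median of all finite values."""
    finite = [v for v in values if math.isfinite(v)]
    global_median = median(finite)

    per_group = {}
    for v, g in zip(values, groups):
        if math.isfinite(v):
            per_group.setdefault(g, []).append(v)
    group_median = {g: median(vs) for g, vs in per_group.items()}

    return [v - group_median[g] + global_median if math.isfinite(v) else v
            for v, g in zip(values, groups)]
```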
Source code in src/phospho_velocity/preprocessing/normalize.py
Visualization
Plotting utilities for trajectories, confidence intervals, and heatmaps.
Visualization utilities for phosphosite trajectories and velocities.
plot_site_overview(vel_df, output_path=None)
Create a multi-panel overview figure and optionally save it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vel_df | DataFrame | Velocity DataFrame from :func: | required |
| output_path | str | File path to save the figure (PNG/PDF/SVG). If None, the figure is returned without saving. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| fig | matplotlib Figure | |
Source code in src/phospho_velocity/plotting/trajectory.py
plot_site_trajectory(times, values, trajectory_mean, trajectory_std=None, velocity_mean=None, velocity_std=None, velocity_q5=None, velocity_q95=None, site_id='Unknown Site', ax=None)
Plot GP trajectory and velocity for a single phosphosite.
Handles both standard GP uncertainty (mean ± 2σ) and Bayesian credible intervals (q5 to q95).
Source code in src/phospho_velocity/plotting/trajectory.py
plot_velocity_heatmap(vel_df, site_col='site_id', time_col='time_min', value_col='velocity', ax=None)
Plot heatmap of velocity across sites × time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vel_df | DataFrame | | required |
| site_col | str | | 'site_id' |
| time_col | str | | 'time_min' |
| value_col | str | | 'velocity' |
| ax | matplotlib Axes | | None |
Returns:
| Name | Type | Description |
|---|---|---|
| fig | matplotlib Figure | |
| ax | matplotlib Axes | |
Source code in src/phospho_velocity/plotting/trajectory.py
Network & Configuration
Network priors and pipeline configuration.
Kinase-Substrate Network
Kinase-substrate network prior.
Provides a data structure and utilities for incorporating kinase–substrate network knowledge (e.g. from KinomeXplorer / NetworKIN / Creixell et al. 2015) as regularization priors into the velocity model.
KinaseSubstrateNetwork
Kinase–substrate interaction network.
Stores directed kinase → substrate-site edges with optional scores.
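A minimal sketch of such a store keeps two dictionaries so that both lookup directions are constant-time. `TinyNetwork` and `add_edge` are hypothetical names; returning sorted lists is a choice made here for deterministic output, not documented behaviour of `get_kinases` or `get_substrates`.

```python
from collections import defaultdict


class TinyNetwork:
    """Directed kinase -> substrate-site edges with optional scores."""

    def __init__(self):
        self._by_kinase = defaultdict(dict)  # kinase -> {substrate_site: score}
        self._by_site = defaultdict(dict)    # substrate_site -> {kinase: score}

    def add_edge(self, kinase, substrate_site, score=None):
        self._by_kinase[kinase][substrate_site] = score
        self._by_site[substrate_site][kinase] = score

    def get_kinases(self, site):
        # .get avoids inserting empty entries for unknown sites
        return sorted(self._by_site.get(site, {}))

    def get_substrates(self, kinase):
        return sorted(self._by_kinase.get(kinase, {}))
```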
Parameters: none.
Source code in src/phospho_velocity/network/kinase_substrate.py
get_kinases(site)
Return kinases for site.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
site
|
str
|
|
required |
Returns:
| Type | Description |
|---|---|
list of str
|
|
Source code in src/phospho_velocity/network/kinase_substrate.py
get_substrates(kinase)
Return substrate sites for kinase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kinase
|
str
|
|
required |
Returns:
| Type | Description |
|---|---|
list of str
|
|
Source code in src/phospho_velocity/network/kinase_substrate.py
load_from_file(filepath, sep='\t')
Load kinase–substrate pairs from a delimited file (tab-separated by default).
Two column layouts are accepted:
- Legacy format: kinase, substrate_site, (score).
- Current format (produced by make_network_file.py): kinase, substrate, site, (score). substrate_site is derived as "{substrate}_{site}" so that it matches the site_id keys produced by the MaxQuant parser.
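Layout detection can be sketched with the standard `csv` module. Column names come from the list above; the function name `read_network_edges`, the file-like input and the tuple output are hypothetical (the documented method stores edges on the network and returns self).

```python
import csv
import io


def _parse_score(row):
    """Optional score column: missing or empty -> None."""
    raw = row.get("score")
    return float(raw) if raw not in (None, "") else None


def read_network_edges(handle, sep="\t"):
    """Return (kinase, substrate_site, score) tuples from either column layout."""
    reader = csv.DictReader(handle, delimiter=sep)
    cols = set(reader.fieldnames or [])
    legacy = {"kinase", "substrate_site"} <= cols
    current = {"kinase", "substrate", "site"} <= cols
    if not (legacy or current):
        raise ValueError(f"unrecognised network columns: {sorted(cols)}")

    edges = []
    for row in reader:
        # Current layout: rebuild the "{substrate}_{site}" key used by the MaxQuant parser.
        site = row["substrate_site"] if legacy else f'{row["substrate"]}_{row["site"]}'
        edges.append((row["kinase"], site, _parse_score(row)))
    return edges
```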
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path to the network file. | required |
| sep | str | Column separator. | '\t' |
Returns:
| Type | Description |
|---|---|
| self | |
Raises:
| Type | Description |
|---|---|
| ValueError | If neither the legacy nor the current column layout is detected. |
Source code in src/phospho_velocity/network/kinase_substrate.py
pssm_weight(site_sequence, pssm_matrix, aa_order='ACDEFGHIKLMNPQRSTVWY')
Score a sequence window against a position-specific scoring matrix.
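Scoring a window against a PSSM sums, over positions, the matrix entry for the residue observed at that position. In this NumPy sketch (hypothetical name `pssm_score`), residues not in `aa_order`, such as padding characters, contribute zero, and a length mismatch raises; both are assumptions about edge-case handling.

```python
import numpy as np


def pssm_score(site_sequence, pssm_matrix, aa_order="ACDEFGHIKLMNPQRSTVWY"):
    """Sum of per-position PSSM entries for the residues in `site_sequence`."""
    pssm = np.asarray(pssm_matrix, dtype=float)
    if len(site_sequence) != pssm.shape[0]:
        raise ValueError("window length must equal the number of PSSM rows")
    column = {aa: j for j, aa in enumerate(aa_order)}
    total = 0.0
    for position, residue in enumerate(site_sequence.upper()):
        j = column.get(residue)
        if j is not None:  # unknown or padding residues add nothing
            total += pssm[position, j]
    return float(total)
```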
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| site_sequence | str | Amino-acid sequence window (e.g. a 15-mer centered on the phosphosite). | required |
| pssm_matrix | ndarray, shape (L, 20) | PSSM scores; rows = positions, columns = amino acids in aa_order. | required |
| aa_order | str | Amino acid ordering for PSSM columns. | 'ACDEFGHIKLMNPQRSTVWY' |
Returns:
| Type | Description |
|---|---|
| float | Cumulative log-odds score. |
Source code in src/phospho_velocity/network/kinase_substrate.py
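The cumulative score is the sum, over positions i, of the PSSM entry in row i and in the column for residue seq[i]. Below is a minimal NumPy sketch of that sum. pssm_score is a hypothetical name. Two behaviors are assumptions made here and may differ from pssm_weight: residues absent from aa_order (such as "_" padding or "X") contribute zero, and a length mismatch raises ValueError.

```python
import numpy as np

AA_ORDER = "ACDEFGHIKLMNPQRSTVWY"


def pssm_score(seq: str, pssm: np.ndarray, aa_order: str = AA_ORDER) -> float:
    """Sum the PSSM entry for each (position, residue) pair in the window."""
    if len(seq) != pssm.shape[0]:
        # Assumption: the window length must match the number of PSSM rows
        raise ValueError("sequence length must equal the number of PSSM rows")
    col = {aa: j for j, aa in enumerate(aa_order)}
    total = 0.0
    for i, aa in enumerate(seq):
        # Assumption: residues outside aa_order contribute 0
        j = col.get(aa)
        if j is not None:
            total += float(pssm[i, j])
    return total


pssm = np.zeros((3, 20))
pssm[0, AA_ORDER.index("R")] = 1.5  # favour Arg at position 0
pssm[1, AA_ORDER.index("S")] = 2.0  # favour Ser at position 1
print(pssm_score("RSX", pssm))  # 3.5
```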
regularization_prior(site_id, sites_df=None)
Compute regularization weight based on network connectivity.
Sites with more kinase connections receive a lower regularization weight, which corresponds to a stronger prior towards the network-implied trajectory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| site_id | str |  | required |
| sites_df | DataFrame | Not currently used; reserved for future extensions. | None |
Returns:
| Type | Description |
|---|---|
| float | Regularization weight in (0, 1]. |
Source code in src/phospho_velocity/network/kinase_substrate.py
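The docstring fixes only two properties: the weight lies in (0, 1], and it decreases as the number of kinase connections grows. One simple function with both properties is w(k) = 1 / (1 + k), where k is the number of kinases linked to the site. This formula is an assumption for illustration and is not necessarily what regularization_prior computes. regularization_weight is a hypothetical name.

```python
def regularization_weight(n_kinases: int) -> float:
    """Hypothetical mapping: 1.0 for unconnected sites, decreasing towards 0."""
    if n_kinases < 0:
        raise ValueError("n_kinases must be non-negative")
    # 1 / (1 + k) stays in (0, 1] and decreases monotonically with k
    return 1.0 / (1.0 + n_kinases)


print([regularization_weight(k) for k in range(4)])  # [1.0, 0.5, 0.3333333333333333, 0.25]
```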
build_default_network()
Return an empty KinaseSubstrateNetwork.
Serves as a placeholder when no network file is available.
Returns:
| Type | Description |
|---|---|
| KinaseSubstrateNetwork |  |
Source code in src/phospho_velocity/network/kinase_substrate.py
Pipeline Configuration
Configuration system for phospho-velocity pipeline.
Uses Python dataclasses to keep dependencies minimal (no pydantic required). Supports loading from and saving to YAML files.
GenericInputSchema (dataclass)
Column-name mapping for generic long-format input files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| site_id_col | str | Column containing phosphosite identifiers. | 'site_id' |
| gene_col | str | Column containing gene/protein names. | 'gene' |
| cell_line_col | str | Column containing cell line or sample labels. | 'cell_line' |
| replicate_col | str | Column containing replicate indices. | 'replicate' |
| intensity_col | str | Column containing (log2) intensity or ratio values. | 'intensity' |
| locprob_col | str | Column containing localization probabilities. When | None |
Source code in src/phospho_velocity/config.py
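A schema like this is typically used to rename user columns to canonical names before parsing, for example with pandas' DataFrame.rename(columns=...). The sketch below mirrors the defaults listed above in a hypothetical InputColumns dataclass. It assumes that the canonical name is the field name without its "_col" suffix and that unset (None) columns are skipped. Neither convention is confirmed by the source.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class InputColumns:
    """Hypothetical mirror of the GenericInputSchema defaults."""
    site_id_col: str = "site_id"
    gene_col: str = "gene"
    cell_line_col: str = "cell_line"
    replicate_col: str = "replicate"
    intensity_col: str = "intensity"
    locprob_col: Optional[str] = None


def rename_map(schema: InputColumns) -> dict:
    """Map user column names to canonical names, skipping unset columns."""
    out = {}
    for f in fields(schema):
        user_name = getattr(schema, f.name)
        if user_name is not None:
            # Assumed convention: canonical name = field name minus "_col"
            out[user_name] = f.name.removesuffix("_col")
    return out


schema = InputColumns(site_id_col="PSite", intensity_col="log2_ratio")
print(rename_map(schema))
```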
ModelConfig (dataclass)
GP model / MCMC parameters.
Source code in src/phospho_velocity/config.py
PipelineConfig (dataclass)
Top-level pipeline configuration.
Source code in src/phospho_velocity/config.py
PreprocessingConfig (dataclass)
Preprocessing parameters.
Source code in src/phospho_velocity/config.py
TimeMap (dataclass)
Mapping from (cell_line, replicate) to time in minutes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mapping | dict | Dictionary with | dict() |
Source code in src/phospho_velocity/config.py
from_csv(path) (classmethod)
Load a TimeMap from a CSV file.
The CSV must have a header row and the columns cell_line, replicate, and time_min.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the CSV file. | required |
Returns:
| Type | Description |
|---|---|
| TimeMap |  |
Source code in src/phospho_velocity/config.py
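The CSV contract above maps directly onto csv.DictReader. The sketch below parses text rather than a path to stay self-contained. For a real file, the same reader would wrap open(path, newline=""). read_time_csv is a hypothetical helper. Converting replicate to int and time_min to float are assumptions, the former chosen to match the (str, int) pairs that validate reports.

```python
import csv
import io


def read_time_csv(text: str) -> dict:
    """Parse cell_line, replicate, time_min rows into {(cell_line, replicate): minutes}."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {"cell_line", "replicate", "time_min"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Assumption: replicates are integer labels, times are floats in minutes
    return {
        (row["cell_line"], int(row["replicate"])): float(row["time_min"])
        for row in reader
    }


text = "cell_line,replicate,time_min\nHeLa,1,0\nHeLa,2,15\n"
print(read_time_csv(text))  # {('HeLa', 1): 0.0, ('HeLa', 2): 15.0}
```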
from_dict(d) (classmethod)
Construct a TimeMap from a plain dictionary.
The dictionary keys must be either 2-tuples (cell_line, replicate) or strings of the form "cell_line_replicate", which are split on the last underscore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | dict | Source mapping. | required |
Returns:
| Type | Description |
|---|---|
| TimeMap |  |
Source code in src/phospho_velocity/config.py
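Splitting on the last underscore matters when the cell-line label itself contains underscores. str.rpartition does exactly that split. The sketch below normalizes both accepted key forms. normalize_key is a hypothetical helper, the cell-line label "MCF_7" is a made-up example, and casting the replicate to int is an assumption.

```python
def normalize_key(key) -> tuple:
    """Accept (cell_line, replicate) tuples or "cell_line_replicate" strings."""
    if isinstance(key, tuple) and len(key) == 2:
        cell_line, replicate = key
    elif isinstance(key, str):
        # Split on the LAST underscore so labels like "MCF_7" survive intact
        cell_line, sep, replicate = key.rpartition("_")
        if not sep:
            raise ValueError(f"no underscore in key {key!r}")
    else:
        raise TypeError(f"unsupported key type: {type(key).__name__}")
    # Assumption: replicate labels are converted to int
    return str(cell_line), int(replicate)


d = {"MCF_7_1": 0.0, ("HeLa", 2): 30.0}
print({normalize_key(k): v for k, v in d.items()})  # {('MCF_7', 1): 0.0, ('HeLa', 2): 30.0}
```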
from_json(path) (classmethod)
Load a TimeMap from a JSON file.
The JSON must be either an object whose keys are "cell_line_replicate" strings and whose values are times in minutes (floats), or a list of {"cell_line": ..., "replicate": ..., "time_min": ...} objects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the JSON file. | required |
Returns:
| Type | Description |
|---|---|
| TimeMap |  |
Source code in src/phospho_velocity/config.py
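Both accepted JSON shapes should produce the same mapping. The sketch below dispatches on the decoded top-level type and parses a string instead of a file. parse_time_json is a hypothetical helper, and the int/float conversions are assumptions.

```python
import json


def parse_time_json(text: str) -> dict:
    """Hypothetical sketch accepting both documented JSON layouts."""
    data = json.loads(text)
    out = {}
    if isinstance(data, dict):
        # Object form: {"cell_line_replicate": minutes}
        for key, minutes in data.items():
            cell_line, sep, rep = key.rpartition("_")
            if not sep:
                raise ValueError(f"no underscore in key {key!r}")
            out[(cell_line, int(rep))] = float(minutes)
    elif isinstance(data, list):
        # List form: [{"cell_line": ..., "replicate": ..., "time_min": ...}]
        for rec in data:
            out[(rec["cell_line"], int(rec["replicate"]))] = float(rec["time_min"])
    else:
        raise ValueError("expected a JSON object or list")
    return out


as_object = '{"HeLa_1": 0, "HeLa_2": 15}'
as_list = (
    '[{"cell_line": "HeLa", "replicate": 1, "time_min": 0},'
    ' {"cell_line": "HeLa", "replicate": 2, "time_min": 15}]'
)
print(parse_time_json(as_object) == parse_time_json(as_list))  # True
```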
validate(df, cell_line_col='cell_line', replicate_col='replicate')
Return (cell_line, replicate) pairs in df not covered by the mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Long-format data with cell_line_col and replicate_col columns. | required |
| cell_line_col | str | Column name for cell line labels. | 'cell_line' |
| replicate_col | str | Column name for replicate indices. | 'replicate' |
Returns:
| Type | Description |
|---|---|
| list of (str, int) | Uncovered (cell_line, replicate) pairs; empty if all are covered. |
Source code in src/phospho_velocity/config.py
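validate amounts to a set difference between the (cell_line, replicate) pairs that occur in the data and the keys of the mapping. The following minimal pandas sketch assumes the mapping is a plain dict keyed by (str, int) tuples. uncovered_pairs is a hypothetical helper name.

```python
import pandas as pd


def uncovered_pairs(df, mapping, cell_line_col="cell_line", replicate_col="replicate"):
    """Return (cell_line, replicate) pairs present in df but absent from mapping."""
    # One row per distinct pair, in order of first appearance
    pairs = df[[cell_line_col, replicate_col]].drop_duplicates()
    return [
        (str(cl), int(rep))
        for cl, rep in pairs.itertuples(index=False, name=None)
        if (str(cl), int(rep)) not in mapping
    ]


df = pd.DataFrame({
    "cell_line": ["HeLa", "HeLa", "HeLa"],
    "replicate": [1, 1, 2],
    "intensity": [0.1, 0.3, 0.2],
})
print(uncovered_pairs(df, {("HeLa", 1): 0.0}))  # [('HeLa', 2)]
```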
load_config(path)
Load a PipelineConfig from a YAML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to YAML configuration file. | required |
Returns:
| Type | Description |
|---|---|
| PipelineConfig |  |
Source code in src/phospho_velocity/config.py
save_config(config, path)
Serialize config to a YAML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | PipelineConfig |  | required |
| path | str | Destination path. | required |
Source code in src/phospho_velocity/config.py
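Because the configuration is built from plain dataclasses, saving and loading reduce to converting nested dataclasses to a dict and back. The YAML step sits in between: with PyYAML, the dict would be written with yaml.safe_dump and read back with yaml.safe_load. The stdlib-only sketch below shows the dict round trip. ModelSketch, PipelineSketch, their fields, and from_plain are hypothetical placeholders, not the real ModelConfig/PipelineConfig fields or loader.

```python
from dataclasses import asdict, dataclass, field, fields, is_dataclass


@dataclass
class ModelSketch:
    seed: int = 0  # hypothetical field


@dataclass
class PipelineSketch:
    output_dir: str = "results"  # hypothetical field
    model: ModelSketch = field(default_factory=ModelSketch)


def from_plain(cls, data: dict):
    """Rebuild a (possibly nested) dataclass from a plain dict, ignoring unknown keys."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # keep the dataclass default
        value = data[f.name]
        # Recurse into nested dataclass fields
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_plain(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


cfg = PipelineSketch(output_dir="out", model=ModelSketch(seed=7))
plain = asdict(cfg)  # the dict a YAML dumper would serialize
print(plain)  # {'output_dir': 'out', 'model': {'seed': 7}}
print(from_plain(PipelineSketch, plain) == cfg)  # True
```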
Command Line Interface
The internal CLI orchestrator.
Command-line interface for phospho-velocity.
Provides subcommands: parse, preprocess, fit, run.
build_parser()
Build the top-level argument parser.
Source code in src/phospho_velocity/cli/run.py
main(argv=None)
Entry point for the phospho-velocity CLI.
Source code in src/phospho_velocity/cli/run.py
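A parser with the four subcommands listed above can be built with argparse subparsers. The main(argv=None) signature lets argparse fall back to sys.argv[1:] when called as an entry point, while tests can pass an explicit list. The sketch below is hypothetical. It registers no per-subcommand options because the real flags are not documented here, and it only reports which subcommand would run.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch: one subparser per pipeline stage."""
    parser = argparse.ArgumentParser(prog="phospho-velocity")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("parse", "preprocess", "fit", "run"):
        sub.add_parser(name, help=f"{name} stage")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    # argv=None makes argparse read sys.argv[1:]; a list makes main testable
    args = build_parser().parse_args(argv)
    print(f"would dispatch to the {args.command!r} subcommand")
    return 0
```

For example, main(["fit"]) prints the chosen subcommand and returns 0. An unknown subcommand makes argparse exit with status 2.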