Reproducibility & Code
Philosophy
This Python port was written to be:
- transparent – every step is in a plain Python script,
- reproducible – Snakemake encodes the full dependency graph,
- extensible – you can plug in new models, new datasets, or new constructs without rewriting everything.
Environments
You can reproduce the full analysis using:
- uv or pip with requirements.txt,
- Python 3.9–3.11.
All versions are pinned in pyproject.toml / requirements.txt.
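A quick way to check a local environment against these requirements is sketched below. The helper names (python_is_supported, installed_versions) and the package list are hypothetical and not part of the pipeline; the authoritative pins remain those in pyproject.toml / requirements.txt. Only the standard library (sys, importlib.metadata) is used.

```python
import sys
from importlib import metadata

# Supported interpreter range stated in this section: Python 3.9-3.11.
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def python_is_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return SUPPORTED_MIN <= tuple(version[:2]) <= SUPPORTED_MAX


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    status = "supported" if python_is_supported() else "NOT supported"
    print("Python", ".".join(map(str, sys.version_info[:3])), status)
    # Illustrative package list; compare the output with requirements.txt.
    for pkg, ver in installed_versions(["numpy", "scipy", "snakemake"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```

Comparing the printed versions with the pinned ones is left manual here, since the exact pin format of the project files is not reproduced in this section.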
Workflow (Snakemake)
The core analysis is orchestrated by the Snakefile:
- snakemake -j 4 runs the full pipeline (up to four jobs in parallel) and generates:
  - intermediate parameter tables in parameters/,
  - diagnostic plots in plots/,
  - a final integrated report in report/.
The rules:
- encode dependencies between steps (1–7),
- enforce that later steps only run once earlier parameters are available,
- can clean up intermediate directories automatically, if configured to do so.
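The dependency logic these rules encode can be illustrated with the standard-library graphlib module (available from Python 3.9, the minimum version above). This is a sketch, not the Snakefile: the step names and edges below are hypothetical placeholders, whereas Snakemake infers the real graph by matching each rule's inputs to other rules' outputs.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies between steps 1-7, for illustration only.
# Each key lists the steps whose outputs it needs before it can run.
STEP_DEPENDENCIES = {
    "step1": set(),
    "step2": {"step1"},
    "step3": {"step1"},
    "step4": {"step2", "step3"},
    "step5": {"step4"},
    "step6": {"step4"},
    "step7": {"step5", "step6"},
}


def execution_order(dependencies):
    """Return one valid order in which every step runs after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def runnable_steps(dependencies, completed):
    """Steps not yet run whose prerequisites have all completed."""
    done = set(completed)
    return sorted(
        step
        for step, prereqs in dependencies.items()
        if step not in done and prereqs <= done
    )


if __name__ == "__main__":
    print(execution_order(STEP_DEPENDENCIES))
    # After step1 finishes, both steps that depend only on it become runnable.
    print(runnable_steps(STEP_DEPENDENCIES, {"step1"}))
```

static_order() raises graphlib.CycleError if the dependencies form a cycle, much as Snakemake reports an error for a cyclic workflow.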
You can visualise the DAG (the dot command comes from Graphviz, which must be installed separately) with:
snakemake --dag | dot -Tpdf > dag.pdf
Determinism
The pipeline is designed to be deterministic given the same inputs:
- ODE integration uses scipy.integrate.solve_ivp with fixed settings,
- optimisation uses scipy.optimize.curve_fit with explicit initial guesses,
- plotting is scripted and does not involve random jitter.
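The first two measures can be sketched as follows. The first-order decay model, time grid and noise-free synthetic data are illustrative assumptions, not the port's actual model or data. LSODA with the tolerances shown is also an assumed choice (it is the ODEPACK algorithm that deSolve::lsoda wraps as well); the port's own solve_ivp settings may differ. Because nothing in the solver or the optimiser is random, fitting the same data twice returns the same estimate.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

# Fixed integrator settings (assumed values): method and tolerances are set
# explicitly rather than left to defaults.
SOLVER_KWARGS = dict(method="LSODA", rtol=1e-8, atol=1e-10)


def decay_rhs(t, y, alpha):
    """Illustrative first-order decay, dy/dt = -alpha * y (assumed model)."""
    return -alpha * y


def simulate(t, alpha, y0=1.0):
    """Integrate the decay ODE and return y at the time points in t."""
    t = np.asarray(t, dtype=float)
    sol = solve_ivp(decay_rhs, (t[0], t[-1]), [y0], t_eval=t,
                    args=(alpha,), **SOLVER_KWARGS)
    return sol.y[0]


def fit_alpha(t, y, alpha0=0.1):
    """Fit alpha with an explicit initial guess and a non-negativity bound."""
    popt, _ = curve_fit(simulate, t, y, p0=[alpha0], bounds=(0.0, np.inf))
    return popt[0]


if __name__ == "__main__":
    t = np.linspace(0.0, 48.0, 25)   # illustrative time grid
    y = np.exp(-0.15 * t)            # synthetic, noise-free data
    first, second = fit_alpha(t, y), fit_alpha(t, y)
    print(first, second)             # identical estimates on repeated runs
```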
Any remaining variability (e.g. from multithreaded BLAS/OpenMP routines, which can change the order of floating-point operations) should be minor and should not affect qualitative conclusions.
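If threading-related differences matter, a common approach is to pin the OpenMP and BLAS thread counts before NumPy is first imported. The variable names below are the standard ones for OpenMP, OpenBLAS and MKL; which of them a given NumPy/SciPy build honours depends on how it was compiled, and the pin_threads helper is hypothetical rather than part of the pipeline.

```python
import os

# Thread-count variables read by OpenMP and by common BLAS backends.
# Backends typically read them when the library loads, so they should be
# set before NumPy is first imported; setting them later may have no effect.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def pin_threads(n=1, environ=os.environ):
    """Set every listed thread-count variable to n and return the values set."""
    for var in THREAD_VARS:
        environ[var] = str(n)
    return {var: environ[var] for var in THREAD_VARS}


pin_threads(1)

import numpy as np  # noqa: E402  (deliberately imported after pinning threads)
```

The same effect can be obtained for a whole run from the shell, e.g. OMP_NUM_THREADS=1 snakemake -j 4, since child processes inherit the environment.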
Validation vs original R workflow
To confirm that the Python port faithfully reproduces the original CasTuner analysis, the following outputs of the two implementations were compared:
- mCherry degradation rate α
- Repression/derepression delays
- Up- and down-regulation half-times
- Hill parameters K and n, including their ordering across constructs
- Representative Hill curves, time courses and ODE trajectories
Results:
- parameter rankings and curve shapes match very closely,
- absolute values differ modestly, consistent with:
  - different ODE solvers (deSolve::lsoda vs solve_ivp),
  - different non-linear optimisers and tolerance settings.
In short, the Python implementation captures the same biological trends as the original R code and closely matches its quantitative behaviour, while being easier to integrate into modern Python-based analysis pipelines.
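The two kinds of agreement described above (matching rankings, modestly different absolute values) can be quantified per parameter with a rank correlation and a relative difference. The sketch below is a hypothetical helper, not the comparison script used for this validation; it assumes both inputs list one estimate per construct in the same construct order and that the R estimates are non-zero.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the (average) ranks."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])


def compare_parameters(r_values, py_values):
    """Summarise agreement between R and Python estimates of one parameter.

    spearman_rho: do the two implementations order the constructs the same way?
    max_rel_diff: largest relative difference in absolute values, |py - r| / |r|.
    """
    r = np.asarray(r_values, dtype=float)
    py = np.asarray(py_values, dtype=float)
    rel_diff = np.abs(py - r) / np.abs(r)
    return {
        "spearman_rho": spearman_rho(r, py),
        "max_rel_diff": float(rel_diff.max()),
    }
```

A rho close to 1 with a small but non-zero max_rel_diff would correspond to the pattern reported above; how the R outputs are exported for such a comparison is not covered here.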
Extending the pipeline
You can:
- add new steps as additional scripts plus Snakemake rules,
- plug in alternative models (e.g. multi-state chromatin models) while reusing the data-loading, gating and plotting utilities already present.
The documentation for each step should give you enough context to do this without breaking the overall flow of the analysis.
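One possible shape for such an extension is sketched below: a new model registers an ODE right-hand side under a name and is integrated by a shared function, leaving data loading, gating and plotting untouched. The registry, the register_model decorator, simulate_model and the three-state chromatin model are hypothetical illustrations, not existing pipeline code or a model used by CasTuner.

```python
from typing import Callable, Dict, Sequence

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical model registry; the pipeline's real extension points may differ.
MODELS: Dict[str, Callable] = {}


def register_model(name):
    """Decorator that stores an ODE right-hand side f(t, y, *params) in MODELS."""
    def decorator(rhs):
        MODELS[name] = rhs
        return rhs
    return decorator


@register_model("three_state_chromatin")
def three_state_rhs(t, y, k_act_to_int, k_int_to_sil, k_int_to_act):
    """Illustrative active <-> intermediate -> silent model (assumed).

    y = [active, intermediate, silent] fractions; their sum is conserved.
    """
    active, intermediate, silent = y
    return [
        -k_act_to_int * active + k_int_to_act * intermediate,
        k_act_to_int * active - (k_int_to_sil + k_int_to_act) * intermediate,
        k_int_to_sil * intermediate,
    ]


def simulate_model(name, y0, t_eval, params: Sequence[float]):
    """Integrate a registered model; returns an array of shape (states, times)."""
    t_eval = np.asarray(t_eval, dtype=float)
    sol = solve_ivp(MODELS[name], (t_eval[0], t_eval[-1]), y0,
                    t_eval=t_eval, args=tuple(params),
                    method="LSODA", rtol=1e-8, atol=1e-10)
    if not sol.success:
        raise RuntimeError(sol.message)
    return sol.y
```

The returned trajectories could then be passed to the existing plotting utilities, and a matching Snakemake rule would make the new step part of the DAG.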