Reproducibility & Code

Philosophy

This Python port was written to be:

  • transparent – every step is in a plain Python script,
  • reproducible – Snakemake encodes the full dependency graph,
  • extensible – you can plug in new models, new datasets, or new constructs without rewriting everything.

Environments

You can reproduce the full analysis using:

  • uv or pip with requirements.txt,
  • Python 3.9–3.11.

All dependency versions are pinned in pyproject.toml / requirements.txt.
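
One possible setup, assuming a POSIX shell (adapt to your environment as needed):

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

or, equivalently, with uv:

uv venv && source .venv/bin/activate
uv pip install -r requirements.txt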

Workflow (Snakemake)

The core analysis is orchestrated by the Snakefile:

  • snakemake -j 4
    runs the full pipeline and generates:

    • intermediate parameter tables in parameters/,
    • diagnostic plots in plots/,
    • final integrated report in report/.

The rules in the Snakefile:

  • encode the dependencies between steps (1–7),
  • ensure that later steps only run once the parameters from earlier steps are available,
  • clean up intermediate directories automatically, if configured to do so.
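
As a rough illustration of how a rule expresses these dependencies (the rule, script, and file names below are hypothetical placeholders, not the actual Snakefile contents):

rule fit_hill:
    # hypothetical step: fitting Hill parameters requires the degradation
    # rates estimated by an earlier rule, so Snakemake will not run this
    # rule until that file exists
    input:
        "parameters/degradation_rates.csv"
    output:
        "parameters/hill_fits.csv",
        "plots/hill_curves.pdf"
    shell:
        "python scripts/fit_hill.py {input} {output}"

Because inputs and outputs are declared explicitly, Snakemake derives the execution order (the DAG) from the rules themselves.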

You can visualise the DAG with:

snakemake --dag | dot -Tpdf > dag.pdf

Determinism

The pipeline is designed to be deterministic given the same inputs:

  • ODE integration uses scipy.integrate.solve_ivp with fixed settings,
  • optimisation uses scipy.optimize.curve_fit with explicit initial guesses,
  • plotting is scripted and does not involve random jitter.

Any remaining variability (e.g. due to local BLAS/OpenMP threading) should be minor and not affect qualitative conclusions.
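
As an illustrative sketch only (the model function, parameter values, and solver settings below are placeholders, not the pipeline's actual code), this is the kind of fixed-setting solve_ivp and explicit-initial-guess curve_fit usage meant above:

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

def decay(t, y, alpha=0.5):
    # simple first-order decay standing in for the real ODE model
    return -alpha * y

# explicit method and tolerances: the same inputs give the same trajectory
sol = solve_ivp(decay, t_span=(0.0, 10.0), y0=[1.0],
                method="LSODA", rtol=1e-8, atol=1e-10, dense_output=True)

def exp_model(t, alpha):
    return np.exp(-alpha * t)

t_obs = np.linspace(0.0, 10.0, 50)
y_obs = sol.sol(t_obs)[0]  # noise-free "observations" from the ODE solution

# explicit initial guess (p0) instead of relying on curve_fit defaults
popt, _ = curve_fit(exp_model, t_obs, y_obs, p0=[0.1])
print(popt)  # recovers alpha close to 0.5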

Validation vs original R workflow

To confirm that the Python port faithfully reproduces the original CasTuner analysis, the following quantities were compared between the two implementations:

  • mCherry degradation rate α
  • Repression/derepression delays
  • Up- and down-regulation half-times
  • Hill parameters K and n, including their ordering across constructs
  • Representative Hill curves, time courses and ODE trajectories

Results:

  • parameter rankings and curve shapes match very closely,
  • absolute values differ modestly, consistent with:

    • different ODE solvers (deSolve::lsoda vs solve_ivp),
    • different non-linear optimisers and tolerance settings.

In short, the Python implementation captures the same biological trends and quantitative behaviour as the original R code, while being easier to integrate into modern Python-based analysis pipelines.

Extending the pipeline

You can:

  • add new steps as additional scripts + Snakemake rules,
  • plug in alternative models (e.g. multi-state chromatin models) by reusing the data-loading, gating, and plotting utilities already present (see the sketch at the end of this section).

The documentation for each step should give you enough context to do this without breaking the rest of the pipeline.
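
For instance, a minimal sketch of what an alternative model could look like (the function names, rates, and state layout below are hypothetical, not part of this repository):

import numpy as np
from scipy.integrate import solve_ivp

# hypothetical two-state chromatin model: y = (active, silenced) fractions
def two_state_rhs(t, y, k_on, k_off):
    active, silenced = y
    return [k_off * silenced - k_on * active,
            k_on * active - k_off * silenced]

def simulate_two_state(t_eval, k_on=0.2, k_off=0.05, y0=(1.0, 0.0)):
    sol = solve_ivp(two_state_rhs, (t_eval[0], t_eval[-1]), y0,
                    args=(k_on, k_off), t_eval=t_eval, method="LSODA")
    return sol.y  # trajectories, one row per state

# trajectories in this shape can then be handed to whatever fitting and
# plotting utilities the existing steps already use
traj = simulate_two_state(np.linspace(0.0, 72.0, 100))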