Workflow

Overview

The full pipeline proceeds in five stages. All stages are driven by config.yaml and orchestrated by Snakemake.

flowchart TD
    A[".RData files\n(QDNAseq + liquidCNA outputs)"]
    B["tumorfits extract-data\n→ data/patient_data/"]
    C["Subclonal_ratio_estimates.extended.txt\n+ OV_patientDNA_sampleList.txt"]
    D["tumorfits ode\n→ ode_gof_points.csv"]
    E["tumorfits pde\n→ results/pde/"]
    F["tumorfits heatmap\n→ results/heatmaps/"]
    G["tumorfits mesh-view\n→ results/mesh_view/"]

    A --> B
    C --> D
    D --> E
    D --> F
    D --> G

Stage 1: Data extraction

Rule: extract_data

Reads every .RData file under data/ and writes each data.frame R object as a CSV to data/patient_data/<patient_id>/.

Input format: binary R .RData archives produced by QDNAseq and liquidCNA.

Output: structured CSV files used for diagnostics, QC, and manual inspection.
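The extraction step can be pictured with a short sketch. This is not the tumorfits implementation: it assumes the third-party pyreadr package is used to read the archives, and it assumes the patient ID is the .RData file name without its extension. The helper names csv_path_for and extract_rdata are hypothetical.

```python
from pathlib import Path


def csv_path_for(rdata_path, object_name, out_root="data/patient_data"):
    """Return the CSV path for one data.frame found in one .RData file.

    Assumption: the patient ID is the .RData file name without its extension.
    """
    patient_id = Path(rdata_path).stem
    return Path(out_root) / patient_id / f"{object_name}.csv"


def extract_rdata(rdata_path, out_root="data/patient_data"):
    """Write every data.frame in an .RData file to CSV and return the paths."""
    # Third-party dependency (assumed); imported lazily so that
    # csv_path_for() stays usable without it.
    import pyreadr

    written = []
    # pyreadr.read_r returns a dict-like mapping of R object names to DataFrames.
    for name, frame in pyreadr.read_r(str(rdata_path)).items():
        target = csv_path_for(rdata_path, name, out_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        frame.to_csv(target, index=False)
        written.append(target)
    return written
```

Keeping the path logic in its own function makes the output layout easy to check without any R data on hand.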

Note

This step is only required if the data/patient_data/ directory is missing. The repository ships with pre-extracted CSVs so this step can be skipped for the bundled dataset.

Stage 2: ODE model fitting

Rule: ode_fit

Fits the well-mixed ODE model to each patient's longitudinal subclonal fraction and CA125 data using multi-start L-BFGS-B optimisation.
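Multi-start optimisation can be sketched as follows. The loss function here is a hypothetical stand-in, not the ODE misfit used by the pipeline, and the bounds and seed are illustrative. The default of 8 starts mirrors ode.n_starts in config.yaml. The sketch relies on scipy.optimize.minimize with method="L-BFGS-B".

```python
import numpy as np
from scipy.optimize import minimize


def multistart_lbfgsb(objective, bounds, n_starts=8, seed=0):
    """Run L-BFGS-B from several random starting points and keep the best fit."""
    rng = np.random.default_rng(seed)
    lower = np.array([b[0] for b in bounds], dtype=float)
    upper = np.array([b[1] for b in bounds], dtype=float)
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)  # random start inside the box
        result = minimize(objective, x0, method="L-BFGS-B", bounds=bounds)
        if best is None or result.fun < best.fun:
            best = result
    return best


def toy_loss(p):
    # Hypothetical stand-in for the ODE misfit; minimum at (0.3, 1.5).
    return (p[0] - 0.3) ** 2 + (p[1] - 1.5) ** 2


best = multistart_lbfgsb(toy_loss, bounds=[(0.0, 1.0), (0.0, 5.0)])
```

Restarting from several points reduces the chance of reporting a poor local minimum, at the cost of one full optimisation per start.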

Key outputs:

- ode_gof_points.csv: long-table CSV, one row per parameter per patient
- Per-patient state-trajectory plots in results/ode_diag/
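A long table with one row per parameter per patient is often easier to inspect once pivoted to one record per patient. The sketch below assumes column names patient_id, parameter, and value; these are hypothetical, so check the header of ode_gof_points.csv before reusing it. The inline rows are placeholders, not pipeline results.

```python
import csv
import io
from collections import defaultdict


def long_to_wide(rows):
    """Group long-format rows into {patient_id: {parameter: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["patient_id"]][row["parameter"]] = float(row["value"])
    return dict(wide)


# Placeholder rows with assumed column names; not real pipeline output.
sample = io.StringIO(
    "patient_id,parameter,value\n"
    "P1,param_a,0.12\n"
    "P1,param_b,3.0\n"
    "P2,param_a,0.20\n"
)
table = long_to_wide(csv.DictReader(sample))
```

For a real run, replace the StringIO object with an open file handle on results/ode_gof_points.csv.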

See Mathematical Model for the equations.

Stage 3: PDE model

Rule: pde_run

Runs the 1-D reaction–diffusion PDE using ODE parameter estimates as initialisation. Optionally re-fits the diffusion coefficients (D_S, D_R).

Output: per-patient fit plots and parameter CSVs in results/pde/.
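To illustrate what a 1-D reaction–diffusion update looks like numerically, here is a generic explicit finite-difference sketch with zero-flux boundaries. It is not the solver used by tumorfits pde. The unit-length domain, the diffusion coefficients, the initial profiles, and the absent reaction term are all placeholder assumptions; only the grid size of 200 is taken from pde.n_cells in config.yaml.

```python
import numpy as np


def step(u, D, dx, dt, reaction=None):
    """Advance one density profile by a single explicit Euler step."""
    padded = np.pad(u, 1, mode="edge")  # repeat edge values: zero-flux boundaries
    laplacian = (padded[2:] - 2.0 * u + padded[:-2]) / dx**2
    growth = reaction(u) if reaction is not None else 0.0
    return u + dt * (D * laplacian + growth)


n_cells = 200                      # matches pde.n_cells in config.yaml
dx = 1.0 / n_cells                 # assumed unit-length domain
D_S, D_R = 1e-3, 5e-4              # placeholder diffusion coefficients
dt = 0.4 * dx**2 / max(D_S, D_R)   # below the stability limit dx^2 / (2 D)

x = (np.arange(n_cells) + 0.5) * dx
S = np.exp(-((x - 0.5) ** 2) / 0.005)  # placeholder initial profiles
R = 0.1 * S
S0_mass = S.sum() * dx

for _ in range(500):
    S = step(S, D_S, dx, dt)       # no reaction term: pure diffusion
    R = step(R, D_R, dx, dt)
```

With no reaction term, the zero-flux boundaries conserve total mass, which makes a convenient sanity check before adding growth or drug-kill terms from the Mathematical Model page.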

Stage 4: Heatmaps

Rule: heatmaps

Generates space-time heatmaps of S(x, t) and R(x, t) from the PDE simulation without re-fitting. One PNG per patient.

Stage 5: Mesh visualisation

Rule: mesh_view

Runs a 2-D FEniCS reaction–diffusion simulation and produces three PyVista off-screen PNG visualisations per patient:

| File | Content |
| --- | --- |
| <pid>_resistance_zones.png | Spatial map of resistant cell fraction |
| <pid>_streamlines.png | Cell-density gradient streamlines |
| <pid>_drug_efficacy.png | Drug kill-rate overlay |
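The naming scheme above makes it straightforward to check whether a patient's visualisations were produced. The following is a minimal sketch. The helper names are hypothetical, and the default directory results/mesh_view is taken from the flowchart in the Overview.

```python
from pathlib import Path

# Suffixes of the three PNGs listed for each patient.
MESH_VIEW_SUFFIXES = ("resistance_zones", "streamlines", "drug_efficacy")


def mesh_view_outputs(pid, out_dir="results/mesh_view"):
    """Return the three expected PNG paths for one patient ID."""
    return [Path(out_dir) / f"{pid}_{suffix}.png" for suffix in MESH_VIEW_SUFFIXES]


def missing_outputs(pid, out_dir="results/mesh_view"):
    """Return the expected PNG paths that do not exist on disk yet."""
    return [path for path in mesh_view_outputs(pid, out_dir) if not path.exists()]
```

For example, calling missing_outputs with a patient ID after a run lists any of the three files that still need to be generated.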

Running with Snakemake

# Full pipeline
snakemake --cores all --configfile config.yaml

# Dry-run
snakemake -n --configfile config.yaml

# Single target (runs only the rules needed to build this file)
snakemake --cores all results/ode_gof_points.csv

# Clean
snakemake clean --cores 1

config.yaml structure

data:
  root: "data"
  subclonal_ratios: "data/liquidCNA_results/..."
  sample_list: "data/OV_patientDNA_sampleList.txt"
  patient_data_dir: "data/patient_data"

cohort:
  flags: "yes,maybe"
  time_unit: "months"
  use_ca125_updated: true
  drop_failed: true

ode:
  out_points: "results/ode_gof_points.csv"
  n_starts: 8
  ...

pde:
  out_dir: "results/pde"
  n_cells: 200
  ...

All parameters are documented inline in config.yaml.
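A quick structural check can catch a missing section or key before a long run. The sketch below works on a config that has already been loaded into a dict, for instance with a YAML parser. It assumes that every key shown in the excerpt above is required and that cohort.flags is a comma-separated list; neither is confirmed by the pipeline. The function names are hypothetical.

```python
# Keys taken from the config.yaml excerpt; treating them all as required
# is an assumption.
REQUIRED_KEYS = {
    "data": ["root", "subclonal_ratios", "sample_list", "patient_data_dir"],
    "cohort": ["flags", "time_unit", "use_ca125_updated", "drop_failed"],
    "ode": ["out_points", "n_starts"],
    "pde": ["out_dir", "n_cells"],
}


def missing_keys(cfg):
    """Return missing sections ("ode") and missing keys ("ode.n_starts")."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        block = cfg.get(section)
        if not isinstance(block, dict):
            problems.append(section)
            continue
        problems.extend(f"{section}.{key}" for key in keys if key not in block)
    return problems


def cohort_flags(cfg):
    """Split cohort.flags, assumed to be comma-separated, into a list."""
    return [flag.strip() for flag in cfg["cohort"]["flags"].split(",")]


# Example dict; data.subclonal_ratios is left out on purpose.
cfg = {
    "data": {
        "root": "data",
        "sample_list": "data/OV_patientDNA_sampleList.txt",
        "patient_data_dir": "data/patient_data",
    },
    "cohort": {
        "flags": "yes,maybe",
        "time_unit": "months",
        "use_ca125_updated": True,
        "drop_failed": True,
    },
    "ode": {"out_points": "results/ode_gof_points.csv", "n_starts": 8},
    "pde": {"out_dir": "results/pde", "n_cells": 200},
}
problems = missing_keys(cfg)
```

Here problems reports the deliberately omitted data.subclonal_ratios entry, and cohort_flags(cfg) yields the two flag values as separate strings.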