Contributing

Contributions to CETSAx–NADPH are welcome, but they need to meet a clear standard. This project prioritizes correctness, clarity, and biological relevance over feature volume.


Philosophy

This is not a generic software project. It is a scientific tool.

Contributions should:

  • improve correctness of the model or analysis
  • increase interpretability
  • enhance reproducibility
  • extend functionality in a biologically meaningful way

Avoid adding features that increase complexity without clear scientific value.


What You Can Contribute

1. Bug fixes

  • Incorrect results or edge-case failures
  • Numerical instability in fitting or scoring
  • Data handling inconsistencies

2. Performance improvements

  • Faster curve fitting
  • Reduced memory usage in sequence models
  • Better parallelization

3. New analysis modules

Examples:

  • alternative scoring metrics
  • improved network inference
  • new clustering or latent methods

These should integrate cleanly into the existing pipeline.


4. Sequence modeling improvements

  • better training strategies
  • improved explainability methods
  • integration of structural features

5. Documentation

  • clearer explanations
  • missing usage examples
  • better interpretation guidance

What Not to Contribute

  • UI layers or dashboards without analytical value
  • loosely tested experimental features
  • large refactors without justification
  • redundant implementations of existing functionality

Development Setup

Clone the repository and install in editable mode:

```bash
git clone https://github.com/bibymaths/cetsax.git
cd cetsax

uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install -e .
```

---

## Coding Guidelines

### General

* Keep functions small and focused
* Prefer explicit logic over abstraction
* Avoid hidden side effects

---

### Data handling

* Use `pandas.DataFrame` consistently
* Preserve column naming conventions (`id`, `condition`, metrics)
* Do not silently modify input data
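
A minimal sketch of these rules, not code from the project: the helper `add_scaled_metric`, the `value` metric column, and the `REQUIRED_COLUMNS` tuple are hypothetical names chosen for illustration. The function checks for the expected columns and works on a copy, so the caller's frame is left untouched.

```python
import pandas as pd

# Hypothetical required columns; the real pipeline's schema may differ.
REQUIRED_COLUMNS = ("id", "condition")


def add_scaled_metric(df: pd.DataFrame, metric: str = "value") -> pd.DataFrame:
    """Return a new frame with a min-max scaled copy of `metric`.

    The input frame is never modified.
    """
    missing = [c for c in (*REQUIRED_COLUMNS, metric) if c not in df.columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")

    out = df.copy()  # work on a copy, never on the caller's data
    lo, hi = out[metric].min(), out[metric].max()
    span = hi - lo
    # Constant (or all-missing) column: scaling is undefined, so report NaN.
    out[f"{metric}_scaled"] = (out[metric] - lo) / span if span > 0 else float("nan")
    return out


raw = pd.DataFrame(
    {"id": ["P1", "P2", "P3"], "condition": ["NADPH"] * 3, "value": [0.2, 0.5, 0.8]}
)
scaled = add_scaled_metric(raw)
```

Returning a new frame costs some memory, but it makes each pipeline step safe to re-run and easier to debug.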

---

### Numerical code

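* Avoid numerically unstable transformations (e.g. exponentiating large values, dividing by near-zero quantities)
* Document assumptions (e.g. scaling, bounds)
* Prefer reproducible, deterministic behavior (e.g. fix random seeds)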
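
To illustrate two of these points, here is a small sketch using only the standard library. Both function names are hypothetical and the logistic function is a generic example, not the project's fitting model. The first function avoids overflow for large inputs; the second draws resampling indices from a locally seeded generator so repeated runs give the same result.

```python
import math
import random


def stable_sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + exp(-x)) without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


def bootstrap_indices(n: int, seed: int) -> list[int]:
    """Draw n indices with replacement from a locally seeded generator."""
    rng = random.Random(seed)  # local RNG: no dependence on global state
    return [rng.randrange(n) for _ in range(n)]
```

The naive form `1 / (1 + math.exp(-x))` raises `OverflowError` for strongly negative `x`, which is the kind of failure that only shows up on unusual data.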

---

### Deep learning

* Keep training and inference clearly separated
* Avoid unnecessary GPU memory usage
* Document all hyperparameters

---

## Testing

Before submitting:

* Run the pipeline on a small dataset
* Verify outputs are consistent across repeated runs and with results from before your change
* Check edge cases (missing data, low variance, etc.)

If possible:

* add unit tests
* include reproducible examples
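
As a rough sketch of what such tests might look like, the example below checks a hypothetical scoring helper, `zscore`, against the edge cases listed above. The helper is invented for illustration and is not part of the package; the test functions use plain `assert` statements.

```python
import math


def zscore(values: list[float], min_std: float = 1e-12) -> list[float]:
    """Hypothetical scoring helper: z-scores that ignore NaNs when estimating
    mean and spread, and return NaN everywhere if the spread is too small."""
    finite = [v for v in values if not math.isnan(v)]
    if len(finite) < 2:
        return [math.nan] * len(values)
    mean = sum(finite) / len(finite)
    std = math.sqrt(sum((v - mean) ** 2 for v in finite) / (len(finite) - 1))
    if std < min_std:
        return [math.nan] * len(values)
    return [(v - mean) / std for v in values]  # NaN inputs stay NaN


def test_missing_values_are_propagated():
    scores = zscore([1.0, math.nan, 3.0])
    assert math.isnan(scores[1])
    assert scores[0] == -scores[2]


def test_low_variance_returns_nan():
    assert all(math.isnan(s) for s in zscore([2.0, 2.0, 2.0]))


def test_typical_input():
    scores = zscore([1.0, 2.0, 3.0])
    assert abs(sum(scores)) < 1e-12
```

Functions named `test_*` follow the naming convention that pytest discovers automatically, assuming the project uses pytest as its test runner.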

---

## Submitting Changes

### 1. Create a branch

```bash
git checkout -b feature/your-feature-name
```

### 2. Make focused commits

  • One logical change per commit
  • Clear commit messages

### 3. Open a Pull Request

Include:

  • what the change does
  • why it is needed
  • how it was tested

If relevant, include before/after results.


Review Process

Pull requests are evaluated based on:

  • correctness
  • clarity
  • consistency with existing design
  • biological relevance

Changes may be rejected if they:

  • complicate the system unnecessarily
  • introduce ambiguity in interpretation
  • lack sufficient justification

Style Expectations

  • Write code as if it will be read, not just executed
  • Avoid unnecessary cleverness
  • Prefer clarity over brevity

Communication

If you plan a larger change:

  • open an issue first
  • describe the idea clearly
  • wait for feedback before implementing

This avoids wasted effort.


Summary

Contribute if you can:

  • make the model more accurate
  • make the outputs more interpretable
  • make the system more robust

Do not contribute just to add features.

The goal is a tool that produces results you would trust in a paper.