Contributing¶

Contributions to CETSAx–NADPH are welcome, but they need to meet a clear standard. This project prioritizes correctness, clarity, and biological relevance over feature volume.

Philosophy¶

This is not a generic software project. It is a scientific tool.

Contributions should:

improve correctness of the model or analysis
increase interpretability
enhance reproducibility
extend functionality in a biologically meaningful way

Avoid adding features that increase complexity without clear scientific value.

What You Can Contribute¶

1. Bug fixes¶

Incorrect results or edge-case failures
Numerical instability in fitting or scoring
Data handling inconsistencies

2. Performance improvements¶

Faster curve fitting
Reduced memory usage in sequence models
Better parallelization

3. New analysis modules¶

Examples:

alternative scoring metrics
improved network inference
new clustering or latent methods

These should integrate cleanly into the existing pipeline.

4. Sequence modeling improvements¶

better training strategies
improved explainability methods
integration of structural features

5. Documentation¶

clearer explanations
missing usage examples
better interpretation guidance

What Not to Contribute¶

UI layers or dashboards without analytical value
loosely tested experimental features
large refactors without justification
redundant implementations of existing functionality

Development Setup¶

Clone the repository and install in editable mode:

```bash
git clone https://github.com/bibymaths/cetsax.git
cd cetsax-nadph

uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
pip install -e .
````

---

## Coding Guidelines

### General

* Keep functions small and focused
* Prefer explicit logic over abstraction
* Avoid hidden side effects

---

### Data handling

* Use `pandas.DataFrame` consistently
* Preserve column naming conventions (`id`, `condition`, metrics)
* Do not silently modify input data

---

### Numerical code

* Avoid unstable transformations
* Document assumptions (e.g. scaling, bounds)
* Prefer reproducible deterministic behavior

---

### Deep learning

* Keep training and inference clearly separated
* Avoid unnecessary GPU memory usage
* Document all hyperparameters

---

## Testing

Before submitting:

* Run the pipeline on a small dataset
* Verify outputs are consistent
* Check edge cases (missing data, low variance, etc.)

If possible:

* add unit tests
* include reproducible examples

---

## Submitting Changes

### 1. Create a branch

```bash id="p2x7oy"
git checkout -b feature/your-feature-name

2. Make focused commits¶

One logical change per commit
Clear commit messages

3. Open a Pull Request¶

Include:

what the change does
why it is needed
how it was tested

If relevant, include before/after results.

Review Process¶

Pull requests are evaluated based on:

correctness
clarity
consistency with existing design
biological relevance

Changes may be rejected if they:

complicate the system unnecessarily
introduce ambiguity in interpretation
lack sufficient justification

Style Expectations¶

Write code as if it will be read, not just executed
Avoid unnecessary cleverness
Prefer clarity over brevity

Communication¶

If you plan a larger change:

open an issue first
describe the idea clearly
wait for feedback before implementing

This avoids wasted effort.

Summary¶

Contribute if you can:

make the model more accurate
make the outputs more interpretable
make the system more robust

Do not contribute just to add features.

The goal is a tool that produces results you would trust in a paper.