Motif Scoring
motif_scoring.py – batch kinase-motif scorer.
Replaces the legacy subprocess-based C-binary calls for motif scoring. All sequences are scored in a single batched call; no per-sequence subprocess or loop is used.
`score_sequences(id_seq: dict[str, str], id_pos_res: dict[str, dict[int, str]] | None = None) -> dict[str, dict[int, dict[str, dict[str, tuple[str, str, float]]]]]`
Score all sequences in id_seq against the bundled kinase motif atlas.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `id_seq` | `dict[str, str]` | Mapping of protein-id → full protein sequence. | required |
| `id_pos_res` | `dict[str, dict[int, str]] \| None` | Optional mapping of protein-id → {1-based position: residue}. When provided, only those positions are scored. When | `None` |
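The two arguments can be sketched as follows. This is a minimal illustration only: the sequences are made up, `phospho_sites` is a hypothetical helper that is not part of `motif_scoring.py`, and the import path in the trailing comment is an assumption about how the module is exposed.

```python
def phospho_sites(seq: str, residues: str = "STY") -> dict[int, str]:
    """Hypothetical helper: map each 1-based position to its residue,
    keeping only the residues listed in `residues`."""
    return {pos: aa for pos, aa in enumerate(seq, start=1) if aa in residues}


# Made-up protein ids and sequences, for illustration only.
id_seq = {
    "P1": "MKRSLTAY",
    "P2": "AAYGG",
}

# Restrict scoring to explicit positions: protein-id -> {1-based position: residue}.
id_pos_res = {pid: phospho_sites(seq) for pid, seq in id_seq.items()}

# The call itself would then look like this (import path assumed):
# from motif_scoring import score_sequences
# result = score_sequences(id_seq, id_pos_res)
```

Because positions are 1-based, `enumerate(seq, start=1)` avoids the off-by-one error that plain 0-based indexing would introduce.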
Returns:
| Type | Description |
|---|---|
| `dict[str, dict[int, dict[str, dict[str, tuple[str, str, float]]]]]` | Nested dict: `result[protein_id][pos][tree][kinase] = (res, peptide, score)` |

where:

- *pos* is the 1-based sequence position
- *tree* is the top-level classifier type (e.g. `'KIN'`, `'SH2'`, `'1433'`)
- *kinase* is the specific kinase / group name (e.g. `'PKA_group'`)
- *res* is the phospho-residue character (`'S'`, `'T'`, or `'Y'`)
- *peptide* is the display window from the motif scorer
- *score* is the posterior probability in `[0, 1]`
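One way to consume the nested return value is to flatten it into rows. The sketch below assumes only the structure described above; `iter_hits` is a hypothetical helper, and `mock_result` is hand-written placeholder data (the peptide strings, the `'PKC_group'` name, and the scores are invented), not real scorer output.

```python
from typing import Iterator

# Alias for the nested structure described in the Returns section.
Result = dict[str, dict[int, dict[str, dict[str, tuple[str, str, float]]]]]
Hit = tuple[str, int, str, str, str, str, float]


def iter_hits(result: Result, min_score: float = 0.0) -> Iterator[Hit]:
    """Hypothetical helper: yield one flat row per (protein, pos, tree, kinase)
    whose score is at least `min_score`."""
    for protein_id, by_pos in result.items():
        for pos, by_tree in by_pos.items():
            for tree, by_kinase in by_tree.items():
                for kinase, (res, peptide, score) in by_kinase.items():
                    if score >= min_score:
                        yield (protein_id, pos, tree, kinase, res, peptide, score)


# Hand-written placeholder shaped like the documented return value.
mock_result: Result = {
    "P1": {
        4: {
            "KIN": {
                "PKA_group": ("S", "MKRSLTA", 0.82),
                "PKC_group": ("S", "MKRSLTA", 0.31),
            }
        }
    }
}

# Keep hits with score >= 0.5, highest score first.
hits = sorted(iter_hits(mock_result, min_score=0.5), key=lambda h: h[-1], reverse=True)
```

Flat rows like these are convenient to write to a TSV file or load into a DataFrame; the 0.5 threshold is arbitrary and chosen only for the example.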