
Graph Scoring

compute_networkin_score(motif_likelihood: float, context_likelihood: float) -> float

Combine motif and STRING context likelihoods into the final NetworKIN score.

The NetworKIN score is defined as the product of the motif likelihood (from pynetphorest sequence scoring) and the STRING network context likelihood (derived from graph-based shortest-path distance).
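A minimal sketch of the product rule applied to several candidate kinases for one phosphosite. The kinase names and likelihood values are illustrative placeholders, not real pynetphorest or STRING output, and `networkin_score` is a hypothetical local stand-in that mirrors the one-line product in the source listing.

```python
# Hypothetical stand-in mirroring compute_networkin_score: a plain product.
def networkin_score(motif_likelihood: float, context_likelihood: float) -> float:
    return motif_likelihood * context_likelihood


# Illustrative (motif, context) likelihood ratios for a single phosphosite.
candidates = {
    "KinaseA": (2.0, 1.5),   # both factors > 1: combined score exceeds either
    "KinaseB": (4.0, 0.25),  # strong motif, weak network context
    "KinaseC": (0.0, 5.0),   # zero motif likelihood forces a zero score
}

scores = {name: networkin_score(m, c) for name, (m, c) in candidates.items()}

# Print candidates from highest to lowest combined score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score}")
```

Because the score is multiplicative, a likelihood ratio below 1 in either factor pulls the combined score down, and a zero in either factor makes it zero regardless of the other.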

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| motif_likelihood | float | Likelihood ratio from the motif scoring step (>= 0). | required |
| context_likelihood | float | Likelihood ratio from the STRING network context scoring step (>= 0). | required |

Returns:

| Type | Description |
|---|---|
| float | NetworKIN score = motif_likelihood × context_likelihood. |

Examples:

>>> compute_networkin_score(2.0, 1.5)
3.0
>>> compute_networkin_score(0.0, 5.0)
0.0
Source code in src/pynetworkin/graph_scoring.py
def compute_networkin_score(
    motif_likelihood: float,
    context_likelihood: float,
) -> float:
    """
    Combine motif and STRING context likelihoods into the final NetworKIN score.

    The NetworKIN score is defined as the product of the motif likelihood
    (from pynetphorest sequence scoring) and the STRING network context
    likelihood (derived from graph-based shortest-path distance).

    Parameters
    ----------
    motif_likelihood : float
        Likelihood ratio from the motif scoring step (>= 0).
    context_likelihood : float
        Likelihood ratio from the STRING network context scoring step (>= 0).

    Returns
    -------
    float
        NetworKIN score = motif_likelihood × context_likelihood.

    Examples
    --------
    >>> compute_networkin_score(2.0, 1.5)
    3.0
    >>> compute_networkin_score(0.0, 5.0)
    0.0
    """
    return motif_likelihood * context_likelihood

filter_and_rank_predictions(predictions: list[dict[str, Any]], min_networkin: float = 2.0, min_motif: float = 0.05, top_k: int = 5) -> list[dict[str, Any]]

Filter and rank kinase–substrate predictions by NetworKIN score.

Removes predictions below minimum score thresholds, then keeps only the top-k kinase candidates per (target protein, phosphosite position) pair, sorted by descending NetworKIN score.
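The filtering and ranking steps can be sketched in plain Python without pandas. This is a hedged illustration, not the library's implementation: `filter_and_rank_sketch` is a hypothetical name, the `"Kinase"` key in the sample rows is an illustrative label column, and the sketch assumes the same strict `>` comparisons and the same ordering (Name and Position ascending, NetworKIN score descending) as the pandas code in the source listing.

```python
from itertools import groupby
from typing import Any


def filter_and_rank_sketch(
    predictions: list[dict[str, Any]],
    min_networkin: float = 2.0,
    min_motif: float = 0.05,
    top_k: int = 5,
) -> list[dict[str, Any]]:
    # 1. Keep only rows strictly above both thresholds.
    kept = [
        row
        for row in predictions
        if row["NetworKIN score"] > min_networkin
        and row["Motif probability"] > min_motif
    ]
    # 2. Sort by Name and Position ascending, then NetworKIN score descending.
    kept.sort(key=lambda r: (r["Name"], r["Position"], -r["NetworKIN score"]))
    # 3. Keep the first top_k rows of each (Name, Position) group.
    ranked: list[dict[str, Any]] = []
    for _, group in groupby(kept, key=lambda r: (r["Name"], r["Position"])):
        ranked.extend(list(group)[:top_k])
    return ranked


# Illustrative rows; "Kinase" is a hypothetical label column.
rows = [
    {"Name": "P1", "Position": 15, "NetworKIN score": 3.0, "Motif probability": 0.2, "Kinase": "A"},
    {"Name": "P1", "Position": 15, "NetworKIN score": 5.0, "Motif probability": 0.1, "Kinase": "B"},
    {"Name": "P1", "Position": 15, "NetworKIN score": 4.0, "Motif probability": 0.3, "Kinase": "C"},
    {"Name": "P1", "Position": 15, "NetworKIN score": 1.5, "Motif probability": 0.5, "Kinase": "D"},
    {"Name": "P2", "Position": 7, "NetworKIN score": 2.5, "Motif probability": 0.06, "Kinase": "E"},
]

result = filter_and_rank_sketch(rows, top_k=2)
print([r["Kinase"] for r in result])  # ['B', 'C', 'E']
```

Row D is dropped by the score threshold, A is the third-best candidate for (P1, 15) and falls outside `top_k=2`, and E survives as the only candidate for (P2, 7).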

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | list of dict | Raw prediction rows as returned by compile_predictions or recover_predictions. Each dict must contain the keys "Name", "Position", "NetworKIN score", and "Motif probability". | required |
| min_networkin | float | Threshold on the NetworKIN score; a prediction is retained only if its score is strictly greater than this value. | 2.0 |
| min_motif | float | Threshold on the motif probability; a prediction is retained only if its motif probability is strictly greater than this value. | 0.05 |
| top_k | int | Maximum number of kinase predictions to keep per (protein, position) pair. | 5 |

Returns:

| Type | Description |
|---|---|
| list of dict | Filtered and ranked prediction rows in the same format as the input, sorted by Name and Position (ascending), then by NetworKIN score (descending). |

Notes

Predictions recovered by the false-negative recovery step (recovered = True) may have "Motif probability" set to -1.0 as a sentinel value. Because the filter keeps only rows whose motif probability is strictly greater than min_motif, such rows are excluded unless min_motif is set below -1.0; setting it to exactly -1.0 still excludes them.
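Because both filters in the source use a strict `>` comparison, a `-1.0` sentinel survives only when `min_motif` is strictly below `-1.0`. A minimal check of that boundary, using a hypothetical `passes_filters` helper that reproduces only the threshold predicate; the `"recovered"` key is assumed from the note above and is not read by the predicate.

```python
SENTINEL = -1.0  # "Motif probability" placeholder on recovered rows


def passes_filters(row: dict, min_networkin: float = 2.0, min_motif: float = 0.05) -> bool:
    # Same strict comparisons as the pandas mask in filter_and_rank_predictions.
    return row["NetworKIN score"] > min_networkin and row["Motif probability"] > min_motif


# Illustrative recovered row; "recovered" is an assumed flag column.
recovered = {"NetworKIN score": 3.2, "Motif probability": SENTINEL, "recovered": True}

print(passes_filters(recovered))                  # False: default min_motif is 0.05
print(passes_filters(recovered, min_motif=-1.0))  # False: -1.0 > -1.0 is False
print(passes_filters(recovered, min_motif=-1.5))  # True
```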

Source code in src/pynetworkin/graph_scoring.py
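# Module-level imports this excerpt relies on (assumed from the names used).
from typing import Any

import pandas as pd
from loguru import logger


def filter_and_rank_predictions(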
    predictions: list[dict[str, Any]],
    min_networkin: float = 2.0,
    min_motif: float = 0.05,
    top_k: int = 5,
) -> list[dict[str, Any]]:
    """
    Filter and rank kinase–substrate predictions by NetworKIN score.

    Removes predictions below minimum score thresholds, then keeps only
    the top-*k* kinase candidates per (target protein, phosphosite position)
    pair, sorted by descending NetworKIN score.

    Parameters
    ----------
    predictions : list of dict
        Raw prediction rows as returned by ``compile_predictions`` or
        ``recover_predictions``.  Each dict must contain the keys
        ``"Name"``, ``"Position"``, ``"NetworKIN score"``, and
        ``"Motif probability"``.
    min_networkin : float, optional
        Predictions are retained only if their NetworKIN score is strictly
        greater than this value.  Default is ``2.0``.
    min_motif : float, optional
        Predictions are retained only if their motif probability is strictly
        greater than this value.  Default is ``0.05``.
    top_k : int, optional
        Maximum number of kinase predictions to keep per (protein, position)
        pair.  Default is ``5``.

    Returns
    -------
    list of dict
        Filtered and ranked prediction rows in the same format as the input.
        The list is sorted by (``Name``, ``Position``, ``NetworKIN score``
        descending).

    Notes
    -----
    Predictions recovered by the false-negative recovery step
    (``recovered = True``) may have ``"Motif probability"`` set to ``-1.0``
    as a sentinel value.  Because the motif filter uses a strict ``>``
    comparison, such rows are excluded unless ``min_motif`` is set below
    ``-1.0``.
    """
    df = pd.DataFrame(predictions)
    if df.empty:
        logger.warning("No predictions available after scoring")
        return predictions

    filtered = df[
        (df["NetworKIN score"] > min_networkin) & (df["Motif probability"] > min_motif)
    ].copy()
    filtered = filtered.sort_values(
        ["Name", "Position", "NetworKIN score"], ascending=[True, True, False]
    )
    filtered = filtered.groupby(["Name", "Position"], as_index=False).head(top_k)

    logger.success("Retained {} predictions after ranking and filtering", len(filtered))
    return filtered.to_dict(orient="records")