PWM Scoring
Score sequences against a known PSSM with optional spatial weighting.
pyprego.compute
PWM scoring / energy computation.
Mirrors the core compute_pwm / compute_local_pwm functions from the
R prego package. Given a PSSM and optional spatial model, compute the predicted
PWM energy for each input sequence.
All computation uses NumPy arrays; the interfaces accept and return arrays so that a torch backend could be swapped in later with minimal changes.
compute_pwm
compute_pwm(sequences: list[str] | ndarray, pssm: DataFrame, spat: DataFrame | None = None, *, spat_min: int = 1, spat_max: int | None = None, bidirect: bool = True, prior: float = 0.01, func: str = 'logSumExp') -> np.ndarray
Compute PWM energy scores for sequences given a PSSM and spatial model.
Mirrors the R compute_pwm() function. For each sequence, slides the
PSSM across all valid positions, computes the log-likelihood at each
window, applies spatial weighting, and aggregates via logSumExp or max.
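The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not pyprego's implementation: the helper name `sketch_pwm_score` is hypothetical, the PSSM is passed as a bare `(motif_len, 4)` probability array rather than a DataFrame, only the forward strand is scored, and spatial weighting is assumed to enter as one multiplicative factor per window.

```python
import numpy as np

BASES = "ACGT"  # column order assumed for the PSSM array


def sketch_pwm_score(seq, pssm, spat_factors=None, func="logSumExp"):
    """Slide `pssm` (motif_len x 4, strictly positive) along `seq` and aggregate.

    Hypothetical helper for illustration only; raises ValueError on bases
    outside A, C, G, T.
    """
    log_pssm = np.log(pssm)
    motif_len = log_pssm.shape[0]
    idx = np.array([BASES.index(b) for b in seq.upper()])
    rows = np.arange(motif_len)
    n_windows = len(seq) - motif_len + 1

    # Log-likelihood of each window: sum of log p(base) over motif positions.
    window_ll = np.array(
        [log_pssm[rows, idx[i:i + motif_len]].sum() for i in range(n_windows)]
    )

    if spat_factors is not None:
        # Assumed form of spatial weighting: scale each window's likelihood,
        # which is an addition of log(factor) in log space.
        window_ll = window_ll + np.log(spat_factors)

    if func == "max":
        return window_ll.max()

    # Numerically stable log(sum(exp(window_ll))).
    m = window_ll.max()
    return m + np.log(np.sum(np.exp(window_ll - m)))


# Toy motif with consensus "ACG".
pssm = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
])
print(sketch_pwm_score("TTACGTT", pssm))              # logSumExp over all windows
print(sketch_pwm_score("TTACGTT", pssm, func="max"))  # best single window
```

The logSumExp result is never smaller than the max result, because it sums the likelihood of every window rather than keeping only the best one.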
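| PARAMETER | DESCRIPTION |
|---|---|
| `sequences` | DNA sequences. TYPE: `list[str] \| ndarray` |
| `pssm` | PSSM DataFrame (pos, A, C, G, T). TYPE: `DataFrame` |
| `spat` | Spatial model DataFrame (bin, spat_factor). If … TYPE: `DataFrame \| None` DEFAULT: `None` |
| `spat_min` | Minimum position in the sequence to consider (1-based, as in R). TYPE: `int` DEFAULT: `1` |
| `spat_max` | Maximum position in the sequence to consider. TYPE: `int \| None` DEFAULT: `None` |
| `bidirect` | Score both orientations and combine. TYPE: `bool` DEFAULT: `True` |
| `prior` | Uniform prior added to PSSM probabilities. TYPE: `float` DEFAULT: `0.01` |
| `func` | Combination function used to aggregate the per-window scores (logSumExp or max). TYPE: `str` DEFAULT: `'logSumExp'` |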
| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | 1-D array of scores, one per sequence. |
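To make the input layout concrete, the sketch below builds a `pssm` and a `spat` DataFrame with the documented column names and illustrates two of the parameters. Several details are assumptions rather than documented behavior: the `pos` and `bin` values are illustrative, `add_prior` and `position_slice` are hypothetical helpers, the prior is assumed to be added to every probability and each row then renormalized, and `spat_max=None` is assumed to mean "up to the end of the sequence".

```python
import numpy as np
import pandas as pd

# PSSM with the documented columns: one row per motif position.
pssm = pd.DataFrame({
    "pos": [0, 1, 2],  # illustrative numbering
    "A": [0.7, 0.1, 0.1],
    "C": [0.1, 0.7, 0.1],
    "G": [0.1, 0.1, 0.7],
    "T": [0.1, 0.1, 0.1],
})

# Spatial model with the documented columns: one factor per bin.
spat = pd.DataFrame({
    "bin": [0, 1, 2],  # illustrative binning
    "spat_factor": [0.5, 1.0, 0.5],
})


def add_prior(pssm_df, prior=0.01):
    """Assumed handling of `prior`: add it to each probability, then renormalize rows."""
    probs = pssm_df[["A", "C", "G", "T"]].to_numpy(dtype=float) + prior
    return probs / probs.sum(axis=1, keepdims=True)


def position_slice(seq_len, spat_min=1, spat_max=None):
    """Convert a 1-based inclusive [spat_min, spat_max] range to a 0-based slice."""
    stop = seq_len if spat_max is None else min(spat_max, seq_len)
    return slice(spat_min - 1, stop)


probs = add_prior(pssm)
print(probs.sum(axis=1))                       # each row sums to 1 after renormalizing
print("ACGTACGT"[position_slice(8, 3, 5)])     # bases at 1-based positions 3 to 5
```

A non-zero prior keeps every probability strictly positive, so taking logarithms never produces negative infinity for a base the motif was not observed with.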
compute_local_pwm
compute_local_pwm(sequences: list[str] | ndarray, pssm: DataFrame, *, spat: DataFrame | None = None, bidirect: bool = True, prior: float = 0.01) -> np.ndarray
Compute per-position PWM scores across each sequence.
Mirrors the R compute_local_pwm() function. At each valid position,
computes the log-likelihood of the PSSM alignment. Positions where the
PSSM does not fit are set to NaN.
In the R implementation, compute_local_pwm_cpp extracts a substring of
motif_len at each position and calls integrate_energy on it. With a
single-bin uniform spatial factor, this is equivalent to computing
logSumExp(forward_score, rc_score) at each position when bidirect=True,
or just the forward score when bidirect=False.
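The per-position behavior described above, for the uniform single-bin case, can be sketched as follows. This is an illustration under assumptions, not the library's code: `sketch_local_scores` is a hypothetical helper, the PSSM is a bare `(motif_len, 4)` array in A, C, G, T column order, scores are indexed by window start, and the trailing positions where the motif no longer fits are the ones set to NaN.

```python
import numpy as np

BASES = "ACGT"  # column order assumed for the PSSM array


def sketch_local_scores(seq, pssm, bidirect=True):
    """Per-position log-likelihoods for one sequence (hypothetical helper)."""
    log_pssm = np.log(pssm)
    # Reverse-complement motif: reverse the positions and swap A<->T, C<->G.
    # With A, C, G, T column order, reversing the columns does the swap.
    log_pssm_rc = log_pssm[::-1, ::-1]
    motif_len = log_pssm.shape[0]
    idx = np.array([BASES.index(b) for b in seq.upper()])
    rows = np.arange(motif_len)

    out = np.full(len(seq), np.nan)  # NaN wherever the motif does not fit
    for i in range(len(seq) - motif_len + 1):
        window = idx[i:i + motif_len]
        fwd = log_pssm[rows, window].sum()
        if bidirect:
            rc = log_pssm_rc[rows, window].sum()
            out[i] = np.logaddexp(fwd, rc)  # logSumExp of the two strands
        else:
            out[i] = fwd
    return out


# Toy motif with consensus "ACG".
pssm = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
])
scores = sketch_local_scores("TTACGTT", pssm, bidirect=False)
print(scores)
print(np.nanargmax(scores))  # start of the best-scoring window, ignoring NaN
```

Because of the NaN padding, downstream reductions need the NaN-aware NumPy variants such as `np.nanmax` or `np.nanargmax`.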
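| PARAMETER | DESCRIPTION |
|---|---|
| `sequences` | DNA sequences. TYPE: `list[str] \| ndarray` |
| `pssm` | PSSM DataFrame. TYPE: `DataFrame` |
| `spat` | Spatial model DataFrame. If provided, spatial weighting is applied. If … TYPE: `DataFrame \| None` DEFAULT: `None` |
| `bidirect` | Score both orientations. TYPE: `bool` DEFAULT: `True` |
| `prior` | Uniform prior. TYPE: `float` DEFAULT: `0.01` |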
| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | 2-D array of shape … |