PWM Scoring

Score sequences against a known PSSM with optional spatial weighting.

pyprego.compute

PWM scoring / energy computation.

Mirrors the core compute_pwm / compute_local_pwm functions from the R prego package. Given a PSSM and an optional spatial model, these functions compute the predicted PWM energy for each input sequence.

All computation uses NumPy arrays; the interfaces accept and return arrays so that a torch backend could be swapped in later with minimal changes.

compute_pwm

compute_pwm(sequences: list[str] | ndarray, pssm: DataFrame, spat: DataFrame | None = None, *, spat_min: int = 1, spat_max: int | None = None, bidirect: bool = True, prior: float = 0.01, func: str = 'logSumExp') -> np.ndarray

Compute PWM energy scores for sequences given a PSSM and an optional spatial model.

Mirrors the R compute_pwm() function. For each sequence, slides the PSSM across all valid positions, computes the log-likelihood at each window, applies spatial weighting, and aggregates via logSumExp or max.
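The sliding-window procedure described above can be sketched in plain NumPy. This is an illustration only, not the pyprego implementation: the helper names `window_log_likelihoods` and `aggregate` are hypothetical, the matrix columns are assumed to be in A, C, G, T order, sequences are assumed to contain only those four uppercase bases and to be at least as long as the motif, and spatial weights are assumed to enter as additive terms in log space.

```python
import numpy as np

BASES = "ACGT"  # assumed column order of the probability matrix


def window_log_likelihoods(seq, probs):
    """Log-likelihood of the motif at every start position where it fits.

    probs has shape (motif_len, 4); seq may contain only A, C, G, T.
    """
    idx = np.array([BASES.index(base) for base in seq])
    motif_len = probs.shape[0]
    log_probs = np.log(probs)
    rows = np.arange(motif_len)
    n_windows = len(seq) - motif_len + 1
    return np.array(
        [log_probs[rows, idx[s:s + motif_len]].sum() for s in range(n_windows)]
    )


def aggregate(window_scores, log_spat=None, func="logSumExp"):
    """Combine per-window scores into one score per sequence."""
    if log_spat is not None:
        # Assumption: spatial factors are applied as log-space offsets.
        window_scores = window_scores + log_spat
    if func == "max":
        return window_scores.max()
    # Numerically stable logSumExp: subtract the peak before exponentiating.
    peak = window_scores.max()
    return peak + np.log(np.exp(window_scores - peak).sum())


# Toy two-position motif that prefers "AT".
motif = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])
scores = window_log_likelihoods("GATC", motif)  # windows: GA, AT, TC
total = aggregate(scores)                       # logSumExp over windows
best = aggregate(scores, func="max")            # best single window
```

With `func="max"` the score reflects only the strongest site, whereas logSumExp also credits weaker sites elsewhere in the sequence.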

PARAMETER	DESCRIPTION
`sequences`	DNA sequences. TYPE: `list[str] \| ndarray`
`pssm`	PSSM DataFrame (pos, A, C, G, T). TYPE: `DataFrame`
`spat`	Spatial model DataFrame (bin, spat_factor). If `None`, uniform spatial weighting is used. TYPE: `DataFrame \| None` DEFAULT: `None`
`spat_min`	Minimum position in the sequence to consider (1-based, as in R). TYPE: `int` DEFAULT: `1`
`spat_max`	Maximum position in the sequence to consider. `None` means the full sequence length is used. TYPE: `int \| None` DEFAULT: `None`
`bidirect`	Score both strand orientations (forward and reverse complement) and combine them. TYPE: `bool` DEFAULT: `True`
`prior`	Uniform prior added to PSSM probabilities. TYPE: `float` DEFAULT: `0.01`
`func`	Function used to aggregate scores across positions: `"logSumExp"` or `"max"`. TYPE: `str` DEFAULT: `'logSumExp'`

RETURNS	DESCRIPTION
`ndarray`	1-D array of scores, one per sequence.
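Two of the parameters above are easier to see in code. The sketch below is hedged: this page does not spell out how `prior` enters the computation or whether `spat_max` is inclusive, so it assumes the prior is added to every probability with each row then renormalized, and that `spat_min` / `spat_max` are inclusive 1-based bounds as in R. The helper names `add_prior` and `crop_one_based` are hypothetical and not part of pyprego.

```python
import numpy as np


def add_prior(probs, prior=0.01):
    """Add a uniform prior to each probability, then renormalize every row.

    Assumption: this is one common smoothing convention; it keeps every
    entry strictly positive so that log() never sees a zero.
    """
    smoothed = probs + prior
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def crop_one_based(seq, spat_min=1, spat_max=None):
    """Map inclusive 1-based bounds onto a 0-based Python slice."""
    end = len(seq) if spat_max is None else spat_max
    return seq[spat_min - 1:end]


# A position that observed only "A" would give log(0) without a prior.
hard = np.array([[1.0, 0.0, 0.0, 0.0]])
soft = add_prior(hard)

# Positions 3..6 (1-based, inclusive) of an 8-base sequence.
region = crop_one_based("ACGTACGT", spat_min=3, spat_max=6)
```

The off-by-one handling matters when porting calls from R: under these assumptions, position 1 in R corresponds to index 0 in Python.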

compute_local_pwm

compute_local_pwm(sequences: list[str] | ndarray, pssm: DataFrame, *, spat: DataFrame | None = None, bidirect: bool = True, prior: float = 0.01) -> np.ndarray

Compute per-position PWM scores across each sequence.

Mirrors the R compute_local_pwm() function. At each valid position, computes the log-likelihood of the PSSM alignment. Positions where the PSSM does not fit are set to NaN.

In the R implementation, compute_local_pwm_cpp extracts a substring of motif_len at each position and calls integrate_energy on it. With a single-bin uniform spatial factor, this is equivalent to computing logSumExp(forward_score, rc_score) at each position when bidirect=True, or just the forward score when bidirect=False.
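The uniform-weight case described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pyprego or R code: the function name `local_scores` is hypothetical, the matrix columns are assumed to be in A, C, G, T order (so the reverse-complement matrix is the original with both axes reversed), the input is assumed to contain only those four uppercase bases, and NaN is assumed to fill the trailing positions where a full window no longer fits.

```python
import numpy as np

BASES = "ACGT"  # assumed column order; reversing it yields T, G, C, A


def local_scores(seq, probs, bidirect=True):
    """Per-position motif score, NaN where the window does not fit."""
    idx = np.array([BASES.index(base) for base in seq])
    motif_len = probs.shape[0]
    rows = np.arange(motif_len)
    fwd_log = np.log(probs)
    # Reverse complement: flip position order and swap A<->T, C<->G.
    rc_log = np.log(probs[::-1, ::-1])
    out = np.full(len(seq), np.nan)
    for start in range(len(seq) - motif_len + 1):
        window = idx[start:start + motif_len]
        fwd = fwd_log[rows, window].sum()
        if bidirect:
            rc = rc_log[rows, window].sum()
            out[start] = np.logaddexp(fwd, rc)  # logSumExp of two values
        else:
            out[start] = fwd
    return out


# Toy motif preferring "AC"; its reverse complement prefers "GT".
motif = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.7, 0.1, 0.1]])
both = local_scores("ACGT", motif)
forward_only = local_scores("ACGT", motif, bidirect=False)
```

In this toy case the bidirectional scores are high at both the "AC" and the "GT" window, while the forward-only scores are high only at "AC".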

PARAMETER	DESCRIPTION
`sequences`	DNA sequences. TYPE: `list[str] \| ndarray`
`pssm`	PSSM DataFrame (pos, A, C, G, T). TYPE: `DataFrame`
`spat`	Spatial model DataFrame. If provided, spatial weighting is applied. If `None`, uniform weighting (factor=1) is used. TYPE: `DataFrame \| None` DEFAULT: `None`
`bidirect`	Score both strand orientations (forward and reverse complement). TYPE: `bool` DEFAULT: `True`
`prior`	Uniform prior added to PSSM probabilities. TYPE: `float` DEFAULT: `0.01`

RETURNS	DESCRIPTION
`ndarray`	2-D array of shape `(n_sequences, seq_length)` with per-position scores. Positions where the PSSM window does not fit contain NaN.
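Because the returned array contains NaN, ordinary reductions such as `max` would propagate NaN. A minimal sketch of NaN-aware post-processing follows; the array `local` is hand-written for illustration and is not actual output from `compute_local_pwm`.

```python
import numpy as np

# Illustrative stand-in for a (n_sequences, seq_length) result:
# two sequences of length 5, with NaN where the window did not fit.
local = np.array([
    [-1.2, -0.4, -3.0, np.nan, np.nan],
    [-2.5, -2.9, -0.1, np.nan, np.nan],
])

# NaN-aware reductions skip the padded positions.
best_score = np.nanmax(local, axis=1)   # strongest site per sequence
best_pos = np.nanargmax(local, axis=1)  # its 0-based start position
```

Note that `np.nanargmax` raises a `ValueError` for a row that is entirely NaN, so sequences shorter than the motif may need to be filtered out first.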