Types

Core data structures and type definitions used throughout pyprego.

pyprego.types

Core type definitions for pyprego.

This module defines the data structures used throughout the package. PSSM matrices and spatial models are represented as pandas DataFrames to keep things simple, inspectable, and consistent with the R prego package.

NUCLEOTIDES module-attribute

NUCLEOTIDES = ('A', 'C', 'G', 'T')

RegressionResult dataclass

Container for the output of pyprego.regression.regress_pwm.

ATTRIBUTE DESCRIPTION
pssm

PSSM DataFrame (pos, A, C, G, T) for the inferred motif.

TYPE: DataFrame

spat

Spatial model DataFrame (bin, spat_factor).

TYPE: DataFrame

pred

Predicted PWM score for each input sequence.

TYPE: ndarray

consensus

Consensus sequence derived from the PSSM.

TYPE: str

r2

R-squared between the predictions and the response (continuous response).

TYPE: float | None

ks

Kolmogorov-Smirnov (KS) statistic (binary response).

TYPE: float | None

seed_motif

The seed motif / kmer that initialised the regression.

TYPE: str | None

bidirect

Whether the model is bidirectional (uses reverse complement).

TYPE: bool

spat_min

Minimum spatial position used.

TYPE: int

spat_max

Maximum spatial position used.

TYPE: int | None

seq_length

Length of input sequences.

TYPE: int | None

_predict_fn

Internal prediction function (set after fitting).

TYPE: Callable | None
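
To make the relationship between pssm and consensus concrete, the sketch below derives a consensus string by taking the highest-weight base at each position. This is only one plausible rule: pyprego may use a different one (for example, ambiguity codes for near-ties), and argmax_consensus and toy_pssm are hypothetical names used purely for illustration.

```python
# Column order mirrors the NUCLEOTIDES constant documented above.
NUCLEOTIDES = ("A", "C", "G", "T")


def argmax_consensus(rows):
    """Return the highest-weight base at each motif position.

    `rows` holds one (A, C, G, T) tuple of weights per position.
    Illustrative helper only; it is not part of pyprego.
    """
    consensus = []
    for weights in rows:
        if len(weights) != len(NUCLEOTIDES):
            raise ValueError("each position needs exactly 4 weights")
        # Index of the largest weight; ties resolve to the first base.
        best = max(range(len(NUCLEOTIDES)), key=lambda i: weights[i])
        consensus.append(NUCLEOTIDES[best])
    return "".join(consensus)


# A made-up three-position motif.
toy_pssm = [
    (0.7, 0.1, 0.1, 0.1),
    (0.1, 0.1, 0.7, 0.1),
    (0.1, 0.1, 0.1, 0.7),
]
toy_consensus = argmax_consensus(toy_pssm)
```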

predict

predict(sequences: list[str] | ndarray) -> np.ndarray

Predict PWM scores for new sequences.

PARAMETER DESCRIPTION
sequences

DNA sequences to score.

TYPE: list[str] | ndarray

RETURNS DESCRIPTION
ndarray

Predicted scores, one per sequence.

RAISES DESCRIPTION
RuntimeError

If the model has not been fitted yet.
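
The documented RuntimeError can be pictured as a guard on the internal _predict_fn attribute. The sketch below models only that guard, under the assumption that predict delegates to _predict_fn once it is set. ToyResult is a hypothetical stand-in, not the real RegressionResult, and the GC-fraction scorer is a placeholder for a fitted PWM scorer.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ToyResult:
    """Hypothetical stand-in for RegressionResult; only the guard is modelled."""

    _predict_fn: Optional[Callable[[Sequence[str]], list]] = None

    def predict(self, sequences):
        # Mirror the documented behaviour: fail loudly on an unfitted model.
        if self._predict_fn is None:
            raise RuntimeError("model has not been fitted yet")
        return self._predict_fn(sequences)


# Calling predict before a prediction function is attached raises.
unfitted = ToyResult()
try:
    unfitted.predict(["ACGT"])
    guard_triggered = False
except RuntimeError:
    guard_triggered = True


def gc_fraction(seqs):
    """Placeholder scorer: fraction of G or C bases in each sequence."""
    return [(s.count("G") + s.count("C")) / len(s) for s in seqs]


fitted = ToyResult(_predict_fn=gc_fraction)
scores = fitted.predict(["ACGT", "GGCC"])
```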

to_dict

to_dict() -> dict[str, Any]

Serialise the result to a plain dictionary (for YAML/JSON export).
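
The layout of the dictionary returned by to_dict is not specified here. As a rough illustration of what "plain" means for YAML/JSON export, the sketch below converts DataFrame and ndarray values into built-in Python containers. The helper to_plain, the field subset, and the column-to-list layout are all assumptions, not pyprego's actual output format.

```python
import json

import numpy as np
import pandas as pd


def to_plain(value):
    """Convert pandas/NumPy objects to built-in types (hypothetical helper)."""
    if isinstance(value, pd.DataFrame):
        # One list per column; an assumed layout, chosen for readability.
        return value.to_dict(orient="list")
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):
        # NumPy scalars become plain Python scalars.
        return value.item()
    return value


# Made-up values shaped like a few RegressionResult attributes.
toy_fields = {
    "pssm": pd.DataFrame(
        {"pos": [0, 1], "A": [0.7, 0.1], "C": [0.1, 0.1],
         "G": [0.1, 0.7], "T": [0.1, 0.1]}
    ),
    "pred": np.array([0.2, 0.9]),
    "consensus": "AG",
    "r2": np.float64(0.5),
    "ks": None,
}

plain = {name: to_plain(value) for name, value in toy_fields.items()}
encoded = json.dumps(plain)  # succeeds because only built-in types remain
```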

pssm_dataframe

pssm_dataframe(matrix: ndarray) -> pd.DataFrame

Create a PSSM DataFrame from a (L, 4) NumPy array.

PARAMETER DESCRIPTION
matrix

Array of shape (L, 4) with columns ordered A, C, G, T.

TYPE: ndarray

RETURNS DESCRIPTION
DataFrame

DataFrame with columns pos, A, C, G, T.

RAISES DESCRIPTION
ValueError

If matrix does not have exactly 4 columns.
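
A minimal sketch of the documented behaviour, written with plain pandas and NumPy rather than by calling pyprego. The name pssm_dataframe_sketch is hypothetical, and the 0-based pos column is an assumption, since the indexing convention is not stated here.

```python
import numpy as np
import pandas as pd

NUCLEOTIDES = ("A", "C", "G", "T")


def pssm_dataframe_sketch(matrix):
    """Build a (pos, A, C, G, T) DataFrame from an (L, 4) array."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.ndim != 2 or matrix.shape[1] != len(NUCLEOTIDES):
        raise ValueError(f"expected shape (L, 4), got {matrix.shape}")
    df = pd.DataFrame(matrix, columns=list(NUCLEOTIDES))
    # Assumption: positions are numbered from 0; pyprego may differ.
    df.insert(0, "pos", np.arange(matrix.shape[0]))
    return df


toy_matrix = np.array([[0.7, 0.1, 0.1, 0.1],
                       [0.1, 0.1, 0.1, 0.7]])
toy_df = pssm_dataframe_sketch(toy_matrix)
```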

pssm_to_array

pssm_to_array(pssm: DataFrame) -> np.ndarray

Extract the (L, 4) NumPy array from a PSSM DataFrame.

PARAMETER DESCRIPTION
pssm

PSSM DataFrame with at least columns A, C, G, T.

TYPE: DataFrame

RETURNS DESCRIPTION
ndarray

Array of shape (L, 4).
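
Because the input only needs "at least" the columns A, C, G, T, the extraction can be sketched as a by-name column selection, which ignores extra columns and does not depend on their order in the DataFrame. pssm_to_array_sketch is a hypothetical name and the float conversion is an assumption.

```python
import numpy as np
import pandas as pd

NUCLEOTIDES = ("A", "C", "G", "T")


def pssm_to_array_sketch(pssm):
    """Return the A, C, G, T columns as an (L, 4) float array."""
    # Selecting by name fixes the output column order to A, C, G, T.
    return pssm[list(NUCLEOTIDES)].to_numpy(dtype=float)


# Columns deliberately out of order, with an extra `pos` column.
toy_pssm = pd.DataFrame({
    "T": [0.1, 0.7],
    "pos": [0, 1],
    "A": [0.7, 0.1],
    "C": [0.1, 0.1],
    "G": [0.1, 0.1],
})
toy_array = pssm_to_array_sketch(toy_pssm)
```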

spatial_dataframe

spatial_dataframe(bins: ndarray, factors: ndarray) -> pd.DataFrame

Create a spatial model DataFrame.

PARAMETER DESCRIPTION
bins

1-D array of bin start positions.

TYPE: ndarray

factors

1-D array of spatial factors (same length as bins).

TYPE: ndarray

RETURNS DESCRIPTION
DataFrame

DataFrame with columns bin and spat_factor.
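
The spatial model can be sketched the same way. The version below assumes nothing beyond the two documented columns. The explicit shape checks are an addition for illustration, since no exception is documented for this function, and spatial_dataframe_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def spatial_dataframe_sketch(bins, factors):
    """Build a (bin, spat_factor) DataFrame from two 1-D arrays."""
    bins = np.asarray(bins)
    factors = np.asarray(factors, dtype=float)
    # These checks are illustrative; pyprego may validate differently.
    if bins.ndim != 1 or factors.ndim != 1:
        raise ValueError("bins and factors must be 1-D")
    if bins.shape[0] != factors.shape[0]:
        raise ValueError("bins and factors must have the same length")
    return pd.DataFrame({"bin": bins, "spat_factor": factors})


# Four bins starting at 0, 50, 100, 150, each with a neutral factor of 1.
toy_spat = spatial_dataframe_sketch(np.arange(0, 200, 50), np.ones(4))
```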